Pattern Recognition and Machine Learning — Detailed Solutions to the Exercises
Pattern Recognition and Machine Learning
Solutions to the Exercises: Tutors' Edition

Markus Svensén and Christopher M. Bishop
Copyright © 2002–2009

This is the solutions manual (Tutors' Edition) for the book Pattern Recognition and Machine Learning (PRML; published by Springer in 2006). This release was created September 8, 2009. Any future releases (e.g. with corrections to errors) will be announced on the PRML web-site (see below) and published via Springer.

PLEASE DO NOT DISTRIBUTE

Most of the solutions in this manual are intended as a resource for tutors teaching courses based on PRML, and the value of this resource would be greatly diminished if it were to become generally available. All tutors who want a copy should contact Springer directly.

The authors would like to express their gratitude to the various people who have provided feedback on earlier releases of this document.

The authors welcome all comments, questions and suggestions about the solutions, as well as reports on (potential) errors in text or formulae in this document; please send any such feedback to prml-fb@microsoft.com.

Further information about PRML is available from http://research.microsoft.com/~cmbishop/prml

Contents

Chapter 1: Introduction
Chapter 2: Probability Distributions
Chapter 3: Linear Models for Regression
Chapter 4: Linear Models for Classification
Chapter 5: Neural Networks
Chapter 6: Kernel Methods
Chapter 7: Sparse Kernel Machines
Chapter 8: Graphical Models
Chapter 9: Mixture Models and EM
Chapter 10: Approximate Inference
Chapter 11: Sampling Methods
Chapter 12: Continuous Latent Variables
Chapter 13: Sequential Data
Chapter 14: Combining Models

Chapter 1: Introduction

1.1 Substituting (1.1) into (1.2) and then differentiating with respect to $w_i$ we obtain
$$\sum_{n=1}^{N}\left(\sum_{j=0}^{M} w_j x_n^j - t_n\right) x_n^i = 0.$$
Re-arranging terms then gives the required result.

1.2 For the regularized sum-of-squares error function given by (1.4), the corresponding linear equations are again obtained by differentiation and take the same form as (1.122), but with $A_{ij}$ replaced by $\widetilde{A}_{ij}$, given by
$$\widetilde{A}_{ij} = A_{ij} + \lambda I_{ij}.$$

1.3 Let us denote apples, oranges and limes by a, o and l respectively. The marginal probability of selecting an apple is given by
$$p(a) = p(a|r)p(r) + p(a|b)p(b) + p(a|g)p(g) = \frac{3}{10}\times 0.2 + \frac{1}{2}\times 0.2 + \frac{3}{10}\times 0.6 = 0.34$$
where the conditional probabilities are obtained from the proportions of apples in each box.

To find the probability that the box was green, given that the fruit we selected was an orange, we can use Bayes' theorem
$$p(g|o) = \frac{p(o|g)\,p(g)}{p(o)}.$$
The denominator is given by
$$p(o) = p(o|r)p(r) + p(o|b)p(b) + p(o|g)p(g) = \frac{4}{10}\times 0.2 + \frac{1}{2}\times 0.2 + \frac{3}{10}\times 0.6 = 0.36$$
from which we obtain
$$p(g|o) = \frac{3}{10}\times\frac{0.6}{0.36} = 0.5.$$

1.4 We are often interested in finding the most probable value for some quantity. In the case of probability distributions over discrete variables this poses little problem. However, for continuous variables there is a subtlety arising from the nature of probability densities and the way they transform under non-linear changes of variable.

Consider first the way a function $f(x)$ behaves when we change to a new variable $y$, where the two variables are related by $x = g(y)$. This defines a new function of $y$ given by
$$\widetilde{f}(y) = f(g(y)).$$
Suppose $f(x)$ has a mode (i.e. a maximum) at $\widehat{x}$, so that $f'(\widehat{x}) = 0$. The corresponding mode of $\widetilde{f}(y)$ will occur for a value $\widehat{y}$ obtained by differentiating both sides with respect to $y$:
$$\widetilde{f}'(\widehat{y}) = f'(g(\widehat{y}))\, g'(\widehat{y}) = 0.$$
Assuming $g'(\widehat{y}) \neq 0$ at the mode, then $f'(g(\widehat{y})) = 0$. However, we know that $f'(\widehat{x}) = 0$, and so we see that the locations of the mode expressed in terms of each of the variables $x$ and $y$ are related by $\widehat{x} = g(\widehat{y})$, as one would expect. Thus, finding a mode with respect to the variable $x$ is completely equivalent to first transforming to the variable $y$, then finding a mode with respect to $y$, and then transforming back.

Now consider the behaviour of a probability density $p_x(x)$ under the change of variables $x = g(y)$, where the density with respect to the new variable is $p_y(y)$ and is given by (1.27).
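As a numerical aside (not part of the original solution), the transformation rule (1.27), $p_y(y) = p_x(g(y))\,|g'(y)|$, can be checked by sampling. The particular choice below — $p_x$ a standard normal and $x = g(y) = y^3$ — is our own illustrative assumption, not taken from the text:

```python
import numpy as np

# Numerical check of the change-of-variables rule (1.27):
#     p_y(y) = p_x(g(y)) |g'(y)|
# Illustrative choice (ours, not from the text): p_x = standard normal
# and x = g(y) = y**3, so y = cbrt(x) and g'(y) = 3*y**2.

rng = np.random.default_rng(0)
x = rng.standard_normal(500_000)
y = np.cbrt(x)                       # samples of y = g^{-1}(x)

# Empirical density of y over part of its support: counts / (N * bin width)
edges = np.linspace(0.2, 1.5, 40)
counts, _ = np.histogram(y, bins=edges)
empirical = counts / (len(y) * np.diff(edges))

# Analytic density from (1.27), evaluated at the bin centres
centers = 0.5 * (edges[:-1] + edges[1:])
p_x_at_g = np.exp(-0.5 * centers**6) / np.sqrt(2 * np.pi)  # p_x(y**3)
p_y = p_x_at_g * 3 * centers**2                            # times |g'(y)|

assert np.max(np.abs(empirical - p_y)) < 0.05

# The mode of p_x is at x = 0, i.e. at y = g^{-1}(0) = 0, yet the mode
# of p_y sits at y = (2/3)**(1/6) ≈ 0.93 -- the mode shift under a
# non-linear change of variables that this solution goes on to derive.
assert abs(centers[np.argmax(p_y)] - (2 / 3) ** (1 / 6)) < 0.05
```

The histogram agrees with the (1.27) density rather than with the naively transformed function, which is exactly the distinction the remainder of this solution makes precise.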
Let us write $g'(y) = s\,|g'(y)|$ where $s \in \{-1, +1\}$. Then (1.27) can be written
$$p_y(y) = p_x(g(y))\, s\, g'(y).$$
Differentiating both sides with respect to $y$ then gives
$$p_y'(y) = s\, p_x'(g(y))\,\{g'(y)\}^2 + s\, p_x(g(y))\, g''(y).$$
Due to the presence of the second term on the right-hand side, the relationship $\widehat{x} = g(\widehat{y})$ no longer holds. Thus the value of $\widehat{x}$ obtained by maximizing $p_x(x)$ will not be the value obtained by transforming to $p_y(y)$, maximizing with respect to $y$, and then transforming back to $x$. This causes modes of densities to be dependent on the choice of variables. In the case of a linear transformation the second term on the right-hand side vanishes, and so the location of the maximum transforms according to $\widehat{x} = g(\widehat{y})$.

This effect can be illustrated with a simple example, as shown in Figure 1. We begin by considering a Gaussian distribution $p_x(x)$ over $x$ with mean $\mu = 6$ and standard deviation $\sigma = 1$, shown by the red curve in Figure 1. Next we draw a sample of $N = 50{,}000$ points from this distribution and plot a histogram of their values, which as expected agrees with the distribution $p_x(x)$.

Now consider a non-linear change of variables from $x$ to $y$ given by
$$x = g(y) = \ln(y) - \ln(1-y) + 5.$$
The inverse of this function is given by
$$y = \frac{1}{1 + \exp(-x + 5)}$$
which is a logistic sigmoid function, and is shown in Figure 1 by the blue curve.

[Figure 1: Example of the transformation of the mode of a density under a non-linear change of variables, illustrating the different behaviour compared to a simple function. See the text for details.]

If we simply transform $p_x(x)$ as a function of $x$ we obtain the green curve $p_x(g(y))$ shown in Figure 1, and we see that the mode of the density $p_x(x)$ is transformed via the sigmoid function to the mode of this curve. However, the density over $y$ transforms instead according to (1.27) and is shown by the magenta curve on the left side of the diagram. Note that this has its mode shifted relative to the mode of the green curve.

To confirm this result we take our sample of 50,000 values of $x$, evaluate the corresponding values of $y$ using the inverse transformation, and then plot a histogram of their values. We see that this histogram matches the magenta curve in Figure 1 and not the green curve.

1.5 Expanding the square we have
$$\mathbb{E}\left[(f(x) - \mathbb{E}[f(x)])^2\right] = \mathbb{E}\left[f(x)^2 - 2f(x)\,\mathbb{E}[f(x)] + \mathbb{E}[f(x)]^2\right] = \mathbb{E}[f(x)^2] - 2\,\mathbb{E}[f(x)]\,\mathbb{E}[f(x)] + \mathbb{E}[f(x)]^2 = \mathbb{E}[f(x)^2] - \mathbb{E}[f(x)]^2$$
as required.

1.6 The definition of covariance is given by (1.41) as
$$\operatorname{cov}[x, y] = \mathbb{E}[xy] - \mathbb{E}[x]\,\mathbb{E}[y].$$
Using (1.33) and the fact that $p(x, y) = p(x)\,p(y)$ when $x$ and $y$ are independent, we obtain
$$\mathbb{E}[xy] = \sum_x \sum_y p(x, y)\, x y = \sum_x p(x)\, x \sum_y p(y)\, y = \mathbb{E}[x]\,\mathbb{E}[y]$$
and hence $\operatorname{cov}[x, y] = 0$. The case where $x$ and $y$ are continuous variables is analogous, with (1.33) replaced by (1.34) and the sums replaced by integrals.

1.7 The transformation from Cartesian to polar coordinates is defined by
$$x = r\cos\theta, \qquad y = r\sin\theta$$
and hence we have $x^2 + y^2 = r^2$, where we have used the well-known trigonometric result (2.177). Also the Jacobian of the change of variables is easily seen to be
$$\frac{\partial(x, y)}{\partial(r, \theta)} = \begin{vmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} \\[2mm] \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta} \end{vmatrix} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r$$
where again we have used (2.177). Thus the double integral in (1.125) becomes
$$I^2 = \int_0^{2\pi}\!\!\int_0^{\infty} \exp\left(-\frac{r^2}{2\sigma^2}\right) r \, \mathrm{d}r \, \mathrm{d}\theta = 2\pi \int_0^{\infty} \exp\left(-\frac{u}{2\sigma^2}\right) \frac{1}{2}\, \mathrm{d}u = \pi\left[\exp\left(-\frac{u}{2\sigma^2}\right)\left(-2\sigma^2\right)\right]_0^{\infty} = 2\pi\sigma^2$$
where we have used the change of variables $r^2 = u$. Thus
$$I = \left(2\pi\sigma^2\right)^{1/2}.$$
Finally, using the transformation $y = x - \mu$, the integral of the Gaussian distribution becomes
$$\int_{-\infty}^{\infty} \mathcal{N}\!\left(x \mid \mu, \sigma^2\right) \mathrm{d}x = \frac{1}{(2\pi\sigma^2)^{1/2}} \int_{-\infty}^{\infty} \exp\left(-\frac{y^2}{2\sigma^2}\right) \mathrm{d}y = \frac{I}{(2\pi\sigma^2)^{1/2}} = 1$$
as required.

1.8 From the definition (1.46) of the univariate Gaussian distribution, we have
$$\mathbb{E}[x] = \int_{-\infty}^{\infty} \left(\frac{1}{2\pi\sigma^2}\right)^{1/2} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) x \, \mathrm{d}x.$$
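The closed-form results above are easy to sanity-check numerically. The sketch below (our own addition, not part of the original solutions) uses simple trapezoidal quadrature, reusing the $\mu = 6$, $\sigma = 1$ values from the Figure 1 example, to confirm that the Gaussian integrates to one (Solution 1.7) and that its first moment equals $\mu$:

```python
import math

# Numerical sanity check (ours, not from the original solutions):
# the univariate Gaussian integrates to one, and its first moment
# equals mu.  We reuse mu = 6, sigma = 1 from the Figure 1 example.

def gauss(x, mu, sigma):
    """Univariate Gaussian density N(x | mu, sigma^2), cf. (1.46)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) \
        / math.sqrt(2 * math.pi * sigma ** 2)

def trapezoid(f, a, b, n=50_000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

mu, sigma = 6.0, 1.0
lo, hi = mu - 10 * sigma, mu + 10 * sigma  # tails beyond are negligible

total = trapezoid(lambda x: gauss(x, mu, sigma), lo, hi)
mean = trapezoid(lambda x: x * gauss(x, mu, sigma), lo, hi)

assert abs(total - 1.0) < 1e-8  # normalization, as shown in 1.7
assert abs(mean - mu) < 1e-8    # E[x] = mu for the Gaussian
```

The trapezoidal rule is extremely accurate here because the integrand and all its derivatives are essentially zero at the $\pm 10\sigma$ endpoints.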