Reinforcement Learning

Adaptation, Learning, and Optimization, Volume 12

Series Editor-in-Chief
Meng-Hiot Lim
Nanyang Technological University, Singapore
E-mail: emhlim@ntu.edu.sg

Yew-Soon Ong
Nanyang Technological University, Singapore
E-mail: asysong@ntu.edu.sg

Further volumes of this series can be found on our homepage: springer.com

Vol. 1. Jingqiao Zhang and Arthur C. Sanderson
Adaptive Differential Evolution, 2009
ISBN 978-3-642-01526-7

Vol. 2. Yoel Tenne and Chi-Keong Goh (Eds.)
Computational Intelligence in Expensive Optimization Problems, 2010
ISBN 978-3-642-10700-9

Vol. 3. Ying-ping Chen (Ed.)
Exploitation of Linkage Learning in Evolutionary Algorithms, 2010
ISBN 978-3-642-12833-2

Vol. 4. Anyong Qing and Ching Kwang Lee
Differential Evolution in Electromagnetics, 2010
ISBN 978-3-642-12868-4

Vol. 5. Ruhul A. Sarker and Tapabrata Ray (Eds.)
Agent-Based Evolutionary Search, 2010
ISBN 978-3-642-13424-1

Vol. 6. John Seiffertt and Donald C. Wunsch
Unified Computational Intelligence for Complex Systems, 2010
ISBN 978-3-642-03179-3

Vol. 7. Yoel Tenne and Chi-Keong Goh (Eds.)
Computational Intelligence in Optimization, 2010
ISBN 978-3-642-12774-8

Vol. 8. Bijaya Ketan Panigrahi, Yuhui Shi, and Meng-Hiot Lim (Eds.)
Handbook of Swarm Intelligence, 2011
ISBN 978-3-642-17389-9

Vol. 9. Lijuan Li and Feng Liu
Group Search Optimization for Applications in Structural Design, 2011
ISBN 978-3-642-20535-4

Vol. 10. Jeffrey W. Tweedale and Lakhmi C. Jain
Embedded Automation in Human-Agent Environment, 2011
ISBN 978-3-642-22675-5

Vol. 11. Hitoshi Iba and Claus C. Aranha
Practical Applications of Evolutionary Computation to Financial Engineering, 2012
ISBN 978-3-642-27647

Vol. 12. Marco Wiering and Martijn van Otterlo (Eds.)
Reinforcement Learning, 2012
ISBN 978-3-642-27644-6

Marco Wiering and Martijn van Otterlo (Eds.)

Reinforcement Learning
State-of-the-Art

Springer

Editors
Dr. Marco Wiering
University of Groningen
The Netherlands

Dr. ir. Martijn van Otterlo
Radboud University Nijmegen
The Netherlands

ISSN 1867-4534
e-ISSN 1867-4542
ISBN 978-3-642-27644-6
e-ISBN 978-3-642-27645-3
DOI 10.1007/978-3-642-27645-3
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2011945323

© Springer-Verlag Berlin Heidelberg 2012

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made.
The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Good and evil, reward and punishment, are the only motives to a rational creature: these are the spur and reins whereby all mankind are set on work, and guided. (Locke)

Foreword

Reinforcement learning has been a subject of study for over fifty years, but its modern form, highly influenced by the theory of Markov decision processes, emerged in the 1980s and became fully established in textbook treatments in the latter half of the 1990s. In Reinforcement Learning: State-of-the-Art, Martijn van Otterlo and Marco Wiering, two respected and active researchers in the field, have commissioned and collected a series of eighteen articles describing almost all the major developments in reinforcement learning research since the start of the new millennium. The articles are surveys rather than novel contributions. Each authoritatively treats an important area of reinforcement learning, broadly conceived as including its neural and behavioral aspects as well as the computational considerations that have been the main focus. This book is a valuable resource for students wanting to go beyond the older textbooks and for researchers wanting to easily catch up with recent developments.

As someone who has worked in the field for a long time, two things stand out for me regarding the authors of the articles. The first is their youth. Of the eighteen articles, sixteen have as their first author someone who received their PhD within the last seven years (or who is still a student). This is surely an excellent sign for the vitality and renewal of the field. The second is that two-thirds of the authors hail from Europe.
This is only partly due to the editors being from there; it seems to reflect a real shift eastward in the center of mass of reinforcement learning research, from North America toward Europe. Vive le temps et les différences!

October 2011
Richard S. Sutton

Preface

A question that pops up quite often among reinforcement learning researchers is what one should recommend if a student or a colleague asks for

    some good and recent book that can introduce me to reinforcement learning.

The most important goal in creating this book was to provide at least a good answer to that question.

A Book about Reinforcement Learning

A decade ago the answer to our leading question would be quite easy to give; around that time two dominant books existed that were fully up-to-date. One is the excellent introduction to reinforcement learning by Rich Sutton and Andy Barto from 1998. This book is written from an artificial intelligence perspective, has a great educational writing style, and is widely used (around ten thousand citations at the time of writing). The other book was written by Dimitri Bertsekas and John Tsitsiklis in 1996 and was titled Neuro-Dynamic Programming. Written from the standpoint of operations research, the book rigorously and in a mathematically precise way describes dynamic programming and reinforcement learning with a particular emphasis on approximation architectures. Whereas Sutton and Barto always maximize rewards, talk about value functions and rewards, and are biased to the (V, Q, S, A, T, R) part of the alphabet augmented with π, Bertsekas and Tsitsiklis talk about cost-to-go functions, always minimize costs, and settle on the (G, I, U) part of the alphabet augmented with the Greek symbol µ.
Despite these superficial (notational) differences, the distinct writing styles and backgrounds, and probably also the audiences for which these books were written, both tried to give a thorough introduction to this exciting new research field, and succeeded in doing that. At that time, the big merge of insights from both the operations research and artificial intelligence approaches to behavior optimization was still ongoing, and much fruitful cross-fertilization happened. Powerful ideas and algorithms such as Q-learning and TD-learning had been introduced quite recently, and so many things were still unknown.

For example, questions about convergence of combinations of algorithms and function approximators arose. Many theoretical and experimental questions about convergence of algorithms, numbers of required samples for guaranteed performance, and applicability of reinforcement learning techniques in larger intelligent architectures were largely unanswered. In fact, many new issues came up and introduced an ever increasing pile of research questions waiting to be answered by bright, young PhD students. And even though both Sutton & Barto and Bertsekas & Tsitsiklis were excellent at introducing the field and eloquently describing its underlying methodologies and issues, at some point the field grew so large that new texts were required to capture all the latest developments. Hence this book, as an attempt to fill the gap.

This book is the first book about reinforcement learning featuring only state-of-the-art surveys on the main subareas. However, we can mention several other interesting books that introduce or describe various reinforcement learning topics too. These include a collection edited by Leslie Kaelbling in 1996 and a new edition of the famous Markov decision process handbook by Puterman.

1. Sutton and Barto (1998) Reinforcement Learning: An Introduction, MIT Press
2. Bertsekas and Tsitsiklis (1996) Neuro-Dynamic Programming, Athena Scientific
Several other books deal with the related notion of approximate dynamic programming. Recently, additional books have appeared on Markov decision processes, reinforcement learning, function approximation, and relational knowledge representation for reinforcement learning. These books just represent a sample of a larger number of books relevant for those interested in reinforcement learning, of course.

3. L. P. Kaelbling (ed.) (1996) Recent Advances in Reinforcement Learning, Springer
4. M. L. Puterman (1994, 2005) Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley
5. J. Si, A. G. Barto, W. B. Powell and D. Wunsch (eds.) (2004) Handbook of Learning and Approximate Dynamic Programming, IEEE Press
6. W. B. Powell (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd Edition, Wiley
7. O. Sigaud and O. Buffet (eds.) (2010) Markov Decision Processes in Artificial Intelligence, Wiley-ISTE
8. C. Szepesvári (2010) Algorithms for Reinforcement Learning, Morgan & Claypool
9. L. Busoniu, R. Babuska, B. De Schutter and D. Ernst (2010) Reinforcement Learning and Dynamic Programming Using Function Approximators, CRC Press
10. M. van Otterlo (2009) The Logic of Adaptive Behavior, IOS Press

Reinforcement Learning: A Field Becoming Mature

In the past one and a half decades the field of reinforcement learning has grown tremendously. New insights from this recent period, having much to do with richer and firmer theory, increased applicability, scaling up, and connections to (probabilistic) artificial intelligence, brain theory, and general adaptive systems, are not reflected in any recent book. Richard Sutton, one of the founders of modern reinforcement learning, described in 1999 three distinct areas in the development of reinforcement learning: past, present and future.

The RL past encompasses the period until approximately 1985, in which the idea of trial-and-error learning was developed.
This period emphasized the use of an active, exploring agent, and developed the key insight of using a scalar reward signal to specify the goal of the agent, termed the reward hypothesis. The methods usually only learned policies and were generally incapable of dealing effectively with delayed rewards.

The RL present was the period in which value functions were formalized. Value functions are at the heart of reinforcement learning, and virtually all methods focus on approximations of value functions in order to compute (optimal) policies. The value function hypothesis says that approximation of value functions is the dominant [...].

At this moment, we are well underway in the reinforcement learning future. Sutton made predictions about the direction of this period and wrote: "Just as reinforcement learning present took a step away from the ultimate goal of reward to focus on value functions, so reinforcement learning future may take a further step away to focus on the structures that enable value function estimation [...]. In psychology, the idea of a developing mind actively creating its representations of the world is called constructivism. My prediction is that for the next tens of years reinforcement learning will be focused on constructivism." Indeed, as we can see in this book, many new developments in the field have to do with new structures that enable value function approximation. In addition, many developments concern the properties, capabilities, and guarantees of convergence and performance of these new structures. Bayesian frameworks, efficient linear approximations, relational knowledge representation, and decompositions of a hierarchical and multi-agent nature all constitute new structures employed in the reinforcement learning methodology nowadays.

Reinforcement learning is currently an established field, usually situated in machine learning.
However, given its focus on behavior learning, it has many connections to other fields such as psychology, operations research, mathematical optimization, and beyond. Within artificial intelligence there are large overlaps with probabilistic and decision-theoretic planning, as it shares many goals with the planning community (e.g. the International Conference on Automated Planning and Scheduling, ICAPS). In very recent editions of the international planning competition (IPC), methods originating from the reinforcement learning literature have entered the

R. S. Sutton (1999) Reinforcement Learning: Past, Present, Future, SEAL'98