Evaluating Machine Learning Models
A Beginner's Guide to Key Concepts and Pitfalls

Alice Zheng

Beijing · Boston · Farnham · Sebastopol · Tokyo

O'REILLY

Evaluating Machine Learning Models
by Alice Zheng

Copyright © 2015 O'Reilly Media, Inc. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Shannon Cutt
Production Editor: Nicole Shell
Copyeditor: Charles Roumeliotis
Proofreader: Sonia Aruba
Interior Designer: David Futato
Cover Designer: Ellie Volckhausen
Illustrator: Rebecca Demarest

September 2015: First Edition

Revision History for the First Edition
2015-09-01: First Release

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Evaluating Machine Learning Models, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-93246-9
[LSI]

Table of Contents

Preface

1. Orientation
    The Machine Learning Workflow
    Evaluation Metrics
    Hyperparameter Search
    Online Testing Mechanisms

2. Evaluation Metrics
    Classification Metrics
    Ranking Metrics
    Regression Metrics
    Caution: The Difference Between Training Metrics and Evaluation Metrics
    Caution: Skewed Datasets--Imbalanced Classes, Outliers, and Rare Data
    Related Reading
    Software Packages

3. Offline Evaluation Mechanisms: Hold-Out Validation, Cross-Validation, and Bootstrapping
    Unpacking the Prototyping Phase: Training, Validation, Model Selection
    Why Not Just Collect More Data?
    Hold-Out Validation
    Cross-Validation
    Bootstrap and Jackknife
    Caution: The Difference Between Model Validation and Testing
    Summary
    Related Reading
    Software Packages

4. Hyperparameter Tuning
    Model Parameters Versus Hyperparameters
    What Do Hyperparameters Do?
    Hyperparameter Tuning Mechanism
    Hyperparameter Tuning Algorithms
    The Case for Nested Cross-Validation
    Related Reading
    Software Packages

5. The Pitfalls of A/B Testing
    A/B Testing: What Is It?
    Pitfalls of A/B Testing
    Multi-Armed Bandits: An Alternative
    Related Reading
    That's All, Folks!

Preface

This report on evaluating machine learning models arose out of a sense of need. The content was first published as a series of six technical posts on the Dato Machine Learning Blog. I was the editor of the blog, and I needed something to publish for the next day. Dato builds machine learning tools that help users build intelligent data products. In our conversations with the community, we sometimes ran into a confusion in terminology.
For example, people would ask for cross-validation as a feature, when what they really meant was hyperparameter tuning, a feature we already had. So I thought, Aha! I'll just quickly explain what these concepts mean and point folks to the relevant sections in the user guide.

So I sat down to write a blog post to explain cross-validation, hold-out datasets, and hyperparameter tuning. After the first two paragraphs, however, I realized that it would take a lot more than a single blog post. The three terms sit at different depths in the concept hierarchy of machine learning model evaluation. Cross-validation and hold-out validation are ways of chopping up a dataset in order to measure the model's performance on "unseen" data. Hyperparameter tuning, on the other hand, is a more "meta" process of model selection. But why does the model need "unseen" data, and what's "meta" about hyperparameters? In order to explain all of that, I needed to start from the basics. First, I needed to explain the high-level concepts and how they fit together; only then could I dive into each one in detail.
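To make the three terms concrete, here is a minimal sketch in Python using scikit-learn (the dataset, model, and parameter grid are illustrative assumptions, not examples from the report): hold-out validation and cross-validation each estimate how a model performs on data it was not trained on, while hyperparameter tuning wraps such an estimate inside a search over candidate model configurations.

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

    X, y = load_breast_cancer(return_X_y=True)

    # Hold-out validation: set aside a chunk of "unseen" data and
    # measure performance on it once.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)
    model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
    print("hold-out accuracy:", model.score(X_test, y_test))

    # Cross-validation: rotate the "unseen" role across k folds and
    # average the scores, so every point is held out exactly once.
    cv_scores = cross_val_score(
        LogisticRegression(max_iter=5000), X_train, y_train, cv=5)
    print("5-fold CV accuracy:", cv_scores.mean())

    # Hyperparameter tuning: a "meta" search over model configurations,
    # here using cross-validation to score each candidate value of C.
    search = GridSearchCV(
        LogisticRegression(max_iter=5000),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
        cv=5)
    search.fit(X_train, y_train)
    print("best C:", search.best_params_["C"],
          "best CV accuracy:", search.best_score_)

Both evaluation mechanisms answer the same question (how well does the model do on data it has not seen?), while hyperparameter tuning sits one level above, using that answer to choose among candidate models.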
Machine learning is a child of statistics, computer science, and mathematical optimization. Along the way, it took inspiration from information theory, neural science, theoretical physics, and many other fields. Machine learning papers are often full of impenetrable mathematics and technical jargon. To make matters worse, sometimes the same methods were invented multiple times in different fields, under different names. The result is a new language that is unfamiliar to even experts in any one of the originating fields.

As a field, machine learning is relatively young. Large-scale applications of machine learning only started to appear in the last two decades. This aided the development of data science as a profession. Data science today is like the Wild West: there is endless opportunity and excitement, but also a lot of chaos and confusion. Certain helpful tips are known to only a few.

Clearly, more clarity is needed. But a single report cannot possibly cover all of the worthy topics in machine learning. I am not covering problem formulation or feature engineering, which many people consider to be the most difficult and crucial tasks in applied machine learning. Problem formulation is the process of matching a dataset and a desired output to a well-understood machine learning task. This is often trickier than it sounds. Feature engineering is also extremely important. Having good features can make a big difference in the quality of the machine learning models, even more so than the choice of the model itself. Feature engineering takes knowledge, experience, and ingenuity. We will save that topic for another time.

This report focuses on model evaluation. It is for folks who are starting out with data science and applied machine learning. Some seasoned practitioners may also benefit from the latter half of the report, which focuses on hyperparameter tuning and A/B testing. I certainly learned a lot from writing it, especially about how difficult it is to do A/B testing right. I hope it will help many others build measurably better machine learning models!

This report includes new text and illustrations not found in the original blog posts. In Chapter 1, Orientation, there is a clearer explanation of the landscape of offline versus online evaluations, with new diagrams to illustrate the concepts. In Chapter 2, Evaluation Metrics, there's a revised and clarified discussion of the statistical bootstrap. I added cautionary notes about the difference between training objectives and validation metrics, interpreting metrics when the data is skewed (which always happens in the real world), and nested hyperparameter tuning. Lastly, I added pointers to various software packages that implement some of these procedures. Soft plugs for GraphLab Create, the library built by Dato, my employer.

I'm grateful to be given the opportunity to put it all together into a single report. Blogs do not go through the rigorous process of academic peer reviewing. But my coworkers and the community of readers have made many helpful comments along the way. A big thank you to Antoine Atallah for illuminating discussions on A/B testing. Chris DuBois, Brian Kent, and Andrew Bruce provided careful reviews of some of the drafts. Ping Wang and Toby Roseman found bugs in the examples for classification metrics. Joe McCarthy provided many thoughtful comments, and Peter Rudenko shared a number of new papers on hyperparameter tuning. All the awesome infographics are done by Eric Wolfe and Mark Enomoto; all the average-looking ones are done by me.

If you notice any errors or glaring omissions, please let me know: alice@dato.com. Better an errata than never.

Last but not least, without the cheerful support of Ben Lorica and Shannon Cutt at O'Reilly, this report would not have materialized. Thank you!