Doing Data Science
Now that answering complex and compelling questions with data can make the difference in an election or a business model, data science is an attractive discipline. But how can you learn this wide-ranging, interdisciplinary field? With this book, you’ll get material from Columbia University’s 'Introd

Doing Data Science
by Rachel Schutt and Cathy O'Neil
Beijing · Cambridge · Farnham · Köln · Sebastopol · Tokyo

Copyright © 2014 Rachel Schutt and Cathy O'Neil.
All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editors: Mike Loukides and Courtney Nash
Production Editor: Kristen Brown
Copyeditor: Kim Cofer
Proofreader: Amanda Keersey
Indexer: WordCo Indexing Services
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Rebecca Demarest

October 2013: First Edition

Revision History for the First Edition:
2013-10-08: First release

See http://oreilly.com/catalog/errata.csp?isbn=9781449358655 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. Doing Data Science, the image of a nine-banded armadillo, and related trade dress are trademarks of O'Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-35865-5

In loving memory of Kelly Feeney

Table of Contents

Preface

1. Introduction: What Is Data Science?
    Big Data and Data Science Hype
    Getting Past the Hype
    Why Now?
    Datafication
    The Current Landscape (with a Little History)
    Data Science Jobs
    A Data Science Profile
    Thought Experiment: Meta-Definition
    OK, So What Is a Data Scientist, Really?
    In Academia
    In Industry

2. Statistical Inference, Exploratory Data Analysis, and the Data Science Process
    Statistical Thinking in the Age of Big Data
    Statistical Inference
    Populations and Samples
    Populations and Samples of Big Data
    Big Data Can Mean Big Assumptions
    Modeling
    Exploratory Data Analysis
    Philosophy of Exploratory Data Analysis
    Exercise: EDA
    The Data Science Process
    A Data Scientist's Role in This Process
    Thought Experiment: How Would You Simulate Chaos?
    Case Study: RealDirect
    How Does RealDirect Make Money?
    Exercise: RealDirect Data Strategy

3. Algorithms
    Machine Learning Algorithms
    Three Basic Algorithms
    Linear Regression
    k-Nearest Neighbors (k-NN)
    k-means
    Exercise: Basic Machine Learning Algorithms
    Solutions
    Summing It All Up
    Thought Experiment: Automated Statistician

4. Spam Filters, Naive Bayes, and Wrangling
    Thought Experiment: Learning by Example
    Why Won't Linear Regression Work for Filtering Spam?
    How About k-nearest Neighbors?
    Naive Bayes
    Bayes Law
    A Spam Filter for Individual Words
    A Spam Filter That Combines Words: Naive Bayes
    Fancy It Up: Laplace Smoothing
    Comparing Naive Bayes to k-NN
    Sample Code in bash
    Scraping the Web: APIs and Other Tools
    Jake's Exercise: Naive Bayes for Article Classification
    Sample R Code for Dealing with the NYT API

5. Logistic Regression
    Thought Experiments
    Classifiers
    Runtime
    You
    Interpretability
    Scalability
    M6D Logistic Regression Case Study
    Click Models
    The Underlying Math
User reviews
Really great material, really great.