
Learning Apache Spark 2

Uploaded: 2018-12-25 14:50:30 | PDF file | 10.72 MB | Popularity: 32
This book was published by Packt Publishing in March 2017. The author is Muhammad Asif Abbasi. It is a very good book for learning Spark step by step, from the basics to more advanced material.

Learning Apache Spark 2

Copyright © 2017 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: March 2017
Production reference: 1240317
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK
ISBN 978-1-78588-513-6
www.packtpub.com

Credits

Author: Muhammad Asif Abbasi
Reviewer: Prashant Verma
Commissioning Editor: Veena Pagare
Acquisition Editor: Tushar Gupta
Content Development Editor: Mayur Pawanikar
Technical Editor: Karan Thakkar
Copy Editor: Safis Editing
Project Coordinator: [...]dhi Joshi
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Tania Dutta
Production Coordinator: Nilesh Mohite

About the Author

Muhammad Asif Abbasi has worked in the industry for over 15 years in a variety of roles, from engineering solutions to selling solutions and everything in between. Asif is currently working with SAS, a market leader in analytic solutions, as a Principal Business Solutions Manager for the Global Technologies Practice. Based in London, Asif has vast experience in consulting for major organizations and industries across the globe, and running proof-of-concepts across various industries including but not limited to telecommunications, manufacturing, retail, finance, services, utilities and government. Asif is an Oracle Certified Java EE 5 Enterprise Architect, Teradata Certified Master, PMP, and Hortonworks Hadoop Certified Developer and Administrator. Asif also holds a Master's degree in Computer Science and Business Administration.

About the Reviewers

Prashant Verma started his IT career in 2011 as a Java developer at Ericsson, working in the telecom domain. After a couple of years of Java EE experience, he moved into the big data domain and has worked on almost all the popular big data technologies, such as Hadoop, Spark, Flume, Mongo, Cassandra, etc. He has also played with Scala. Currently, he works with QA Infotech as Lead Data Engineer, working on solving e-learning problems using analytics and machine learning.

Prashant has also worked on Apache Spark for Java Developers, Packt, as a technical reviewer.

I want to thank Packt Publishing for giving me the chance to review the book, as well as my employer and my family for their patience while I was busy working on this book.

www.packtpub.com

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Mapt
https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?
  Fully searchable across every book published by Packt
  Copy and paste, print, and bookmark content
  On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1785885138.

If you'd like to join our team of regular reviewers, you can e-mail us at customerreviews@packtpub.com. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Table of Contents

Preface

Chapter 1: Architecture and Installation
  Apache Spark architecture overview
    Spark-core
    Spark SQL
    Spark streaming
    MLlib
    GraphX
    Spark deployment
  Installing Apache Spark
  Writing your first Spark program
    Scala shell examples
    Python shell examples
  Spark architecture
    High level overview
      Driver program
      Cluster Manager
      Worker
      Executors
      Tasks
      SparkContext
      Spark Session
    Apache Spark cluster manager types
    Building standalone applications with Apache Spark
    Submitting applications
    Deployment strategies
  Running Spark examples
    Building your own programs
  Brain teasers
  References
  Summary

Chapter 2: Transformations and Actions with Spark RDDs
  What is an RDD?
    Constructing RDDs
      Parallelizing existing collections
      Referencing external data source
  Operations on RDD
    Transformations
    Actions
  Passing functions to Spark (Scala)
    Anonymous functions
    Static singleton functions
  Passing functions to Spark (Java)
  Passing functions to Spark (Python)
  Transformations
    Map(func)
    Filter(func)
    flatMap(func)
    Sample(withReplacement, fraction, seed)
  Set operations in Spark
    Distinct()
    Intersection()
    Union()
    Subtract()
    Cartesian()
  Actions
    Reduce(func)
    Collect()
    Count()
    Take(n)
    First()
    SaveAsXXFile()
    foreach(func)
  PairRDDs
    Creating PairRDDs
    PairRDD transformations
      reduceByKey(func)
      GroupByKey(func)
      reduceByKey vs. groupByKey - Performance Implications
      CombineByKey(func)
    Transformations on two PairRDDs
    Actions available on PairRDDs
  Shared variables
    Broadcast variables
    Accumulators