
Bioinformatics and Computational Biology Solutions Using R and Bioconductor

Uploaded: 2018-12-25 14:13:30 | PDF file, 8.38 MB | Popularity: 38 views
The Bioconductor project is an open source and open development software project for the analysis and comprehension of genomic data. It is rooted in the open source statistical computing environment R. This book's coverage is broad and ranges across most of the key capabilities of the Bioconductor project.

Editors

Robert Gentleman
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024, USA
rgentlem@fhcrc.org

Vincent J. Carey
Channing Laboratory
Brigham and Women's Hospital
Harvard Medical School
181 Longwood Ave
Boston, MA 02115, USA
stvjc@channing.harvard.edu

Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge, CB10 1SD, UK
huber@ebi.ac.uk

Rafael A. Irizarry
Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health
615 North Wolfe Street
Baltimore, MD 21205, USA
rafa@jhu.edu

Sandrine Dudoit
Division of Biostatistics
School of Public Health
University of California, Berkeley
140 Earl Warren Hall, #7360
Berkeley, CA 94720-7360, USA
sandrine@stat.berkeley.edu

Series Editors

M. Gail
National Cancer Institute
Rockville, MD 20892, USA

K. Krickeberg
Le Chatelet
F-63270 Manglieu, France

[name missing]
Department of Epidemiology
School of Public Health
Johns Hopkins University
615 Wolfe Street
Baltimore, MD 21205, USA

A. Tsiatis
Department of Statistics
North Carolina State University
Raleigh, NC 27695, USA

[name missing]
Department of Statistics
Stanford University
Stanford, CA 94305, USA

Library of Congress Control Number: 2005923843
ISBN-10: 0-387-25146-4
ISBN-13: 978-0387-25146-2
Printed on acid-free paper.

© 2005 Springer Science+Business Media, Inc.
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, Inc., 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis.
Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in China. (EVB)
9 8 7 6 5 4 3 2
springeronline.com

Preface

During the past few years, there have been enormous advances in genomics and molecular biology, which carry the promise of understanding the functioning of whole genomes in a systematic manner. The challenge of interpreting the vast amounts of data from microarrays and other high throughput technologies has led to the development of new tools in the fields of computational biology and bioinformatics, and opened exciting new connections to areas such as chemometrics, exploratory data analysis, statistics, machine learning, and graph theory.

The Bioconductor project is an open source and open development software project for the analysis and comprehension of genomic data. It is rooted in the open source statistical computing environment R. This book's coverage is broad and ranges across most of the key capabilities of the Bioconductor project. Thanks to the hard work and dedication of many developers, a responsive and enthusiastic user community has formed. Although this book is self-contained with respect to the data processing and data analytic tasks covered, readers of this book are advised to acquaint themselves with other aspects of the project by touring the project website www.bioconductor.org.

This book represents an innovative approach to publishing about scientific software. We made a commitment at the outset to have a fully computable book. Tables, figures, and other outputs are dynamically generated directly from the experimental data.
Through the companion website, www.bioconductor.org/mogr, readers have full access to the source code and necessary supporting libraries and hence will be able to see how every plot and statistic was computed. They will be able to reproduce those calculations on their own computers and should be able to extend most of those computations to address their own needs.

Acknowledgments

This book, like so many projects in bioinformatics and computational biology, is a large collaborative effort. The editors would like to thank the chapter authors for their dedication and their efforts in producing widely used software, and also in producing well-written descriptions of how to use that software.

We would like to thank the developers of R, without whom there would be no Bioconductor project. Many of these developers have provided additional help and engaged in discussions about software development and design. We would like to thank the many Bioconductor developers and users who have helped us to find bugs, think differently about problems, and whose enthusiasm has made the long hours somewhat more bearable.

We would also like to thank Dorit Arlt, Michael Boutros, Sabina Chiaretti, James MacDonald, Meher Majety, Annemarie Poustka, Jerome Ritz, Mamatha Sauermann, Holger Sültmann, Stefan Wiemann, and Seth Falcon, who have contributed in many different ways to the production of this monograph. Much of the preliminary work on the MLInterfaces package, described in Chapter 16, was carried out by Jess Mar, Department of Biostatistics, Harvard School of Public Health. Ms. Mar's efforts were supported in part by a grant from Insightful Corporation.

The Bioconductor project is supported by grant 1R33 HG002708 from the NIH as well as by institutional funds at both the Dana Farber Cancer Institute and the Fred Hutchinson Cancer Research Center. W.H. received project-related funding from the German Ministry for Education and Research through National Genome Research Network (NGFN) grant FKZ 01GR0450.

Seattle: Robert Gentleman
Boston: Vincent Carey
Cambridge (UK): Wolfgang Huber
Baltimore: Rafael Irizarry
Berkeley: Sandrine Dudoit
February 2005

Contributors

J. Gentry, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
F. Hahne, Division of Molecular Genome Analysis, German Cancer Research Center, Heidelberg, FRG
L. Harris, Department of Cancer Biology, Dana Farber Cancer Institute, Boston, MA, USA
T. Hothorn, Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, FRG
W. Huber, European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
J. Ibrahim, Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
J. D. Iglehart, Department of Cancer Biology, Dana Farber Cancer Institute, Boston, MA, USA
R. A. Irizarry, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
X. Li, Department of Biostatistics and Computational Biology, Dana Farber Cancer Institute, Boston, MA, USA
X. Lu, Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
A. Miron, Department of Cancer Biology, Dana Farber Cancer Institute, Boston, MA, USA
A. C. Paquet, Department of Biostatistics, University of California, San Francisco, CA, USA
K. S. Pollard, Center for Biomolecular Science and Engineering, University of California, Santa Cruz, USA
D. Scholtens, Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
Q. Shi, Department of Cancer Biology, Dana Farber Cancer Institute, Boston, MA, USA

Contents

I Preprocessing data from genomic experiments

1 Preprocessing Overview
  W. Huber, R. A. Irizarry, and R. Gentleman
  1.1 Introduction
  1.2 Tasks
    1.2.1 Prerequisites
    1.2.2 Stepwise and integrated approaches
  1.3 Data structures
    1.3.1 Data sources
    1.3.2 Facilities in R and Bioconductor
  1.4 Statistical background
    1.4.1 An error model
    1.4.2 The variance-bias trade-off
    1.4.3 Sensitivity and specificity of probes
  1.5 Conclusion

2 Preprocessing High-density Oligonucleotide Arrays
  B. M. Bolstad, R. A. Irizarry, L. Gautier, and Z. Wu
  2.1 Introduction
  2.2 Importing and accessing probe-level data
    2.2.1 Importing
    2.2.2 Examining probe-level data
  2.3 Background adjustment and normalization
    2.3.1 Background adjustment
    2.3.2 Normalization
    2.3.3 vsn
  2.4 Summarization
    2.4.1 expresso
    2.4.2 threestep
    2.4.3 RMA
    2.4.4 GCRMA
    2.4.5 affypdnn
  2.5 Assessing preprocessing methods
    2.5.1 Carrying out the assessment
  2.6 Conclusion

3 Quality Assessment of Affymetrix GeneChip Data
  B. M. Bolstad, F. Collin, J. Brettschneider, K. Simpson, L. Cope, R. A. Irizarry, and T. P. Speed
  3.1 Introduction
  3.2 Exploratory data analysis
    3.2.1 Multi-array approaches
  3.3 Affymetrix quality assessment metrics
  3.4 RNA degradation
  3.5 Probe level models
    3.5.1 Quality diagnostics using PLM
  3.6 Conclusion

4 Preprocessing Two-Color Spotted Arrays
  Y. H. Yang and A. C. Paquet
  4.1 Introduction
  4.2 Two-color spotted microarrays
    4.2.1 Illustrative data
  4.3 Importing and accessing probe-level data
    4.3.1 Importing
    4.3.2 Reading target information
    4.3.3 Reading probe-related information
    4.3.4 Reading probe and background intensities
    4.3.5 Data structure: the marrayRaw class
    4.3.6 Accessing the data
    4.3.7 Subsetting
  4.4 Quality assessment
    4.4.1 Diagnostic plots
    4.4.2 Spatial plots of spot statistics - image
    4.4.3 Boxplots of spot statistics - boxplot
    4.4.4 Scatter-plots of spot statistics - plot
  4.5 Normalization
    4.5.1 Two-channel normalization
    4.5.2 Separate-channel normalization
  4.6 Case study

5 Cell-Based Assays
  W. Huber and F. Hahne
  5.1 Scope
  5.2 Experimental technologies
    5.2.1 Expression assays
    5.2.2 Loss of function assays
    5.2.3 Monitoring the response
  5.3 Reading data
    5.3.1 Plate reader data
    5.3.2 Further directions in normalization
    5.3.3 FCS format
  5.4 Quality assessment and visualization
    5.4.1 Visualization at the level of individual cells
    5.4.2 Visualization at the level of microtiter plates
    5.4.3 Brushing with rggobi
  5.5 Detection of effectors
    5.5.1 Discrete response
    5.5.2 Continuous response
    5.5.3 Outlook

6 SELDI-TOF Mass Spectrometry Protein Data
  X. Li, R. Gentleman, X. Lu, Q. Shi, J. D. Iglehart, L. Harris, and A. Miron
  6.1 Introduction
  6.2 Baseline subtraction
  6.3 Peak detection
  6.4 Processing a set of calibration spectra
    6.4.1 Apply baseline subtraction to a set of spectra
    6.4.2 Normalize spectra
    6.4.3 Cutoff selection
    6.4.4 Identify peaks
    6.4.5 Quality assessment
    6.4.6 Get proto-biomarkers
  6.5 An example
  6.6 Conclusion

II Meta-data: biological annotation and visualization

7 Meta-data Resources and Tools in Bioconductor
  R. Gentleman, V. J. Carey, and J. Zhang
  7.1 Introduction
  7.2 External annotation resources
  7.3 Bioconductor annotation concepts: curated persistent packages and Web services
    7.3.1 Annotating a platform: HG-U95Av2
    7.3.2 An Example
    7.3.3 Annotating a genome
  7.4 The annotate package
  7.5 Software tools for working with Gene Ontology (GO)
    7.5.1 Basics of working with the GO package
    7.5.2 Navigating the hierarchy
    7.5.3 Searching for terms
    7.5.4 Annotation of GO terms to LocusLink sequences: evidence codes
    7.5.5 The GO graph associated with a term
  7.6 Pathway annotation packages: KEGG and CMAP
    7.6.1 KEGG
    7.6.2 CMAP
    7.6.3 A Case Study
  7.7 Cross-organism annotation: the homology packages
  7.8 Annotation from other sources
  7.9 Discussion

8 Querying On-line Resources
  V. J. Carey, D. Temple Lang, J. Gentry, J. Zhang, and R. Gentleman
  8.1 The Tools
    8.1.1 Entrez
    8.1.2 Entrez examples
  8.2 PubMed
    8.2.1 Accessing PubMed information
    8.2.2 Generating HTML output for your abstracts
  8.3 KEGG via SOAP
  8.4 Getting gene sequence information
  8.5 Conclusion

9 Interactive Outputs
  C. A. Smith, W. Huber, and R. Gentleman
  9.1 Introduction
  9.2 A simple approach
  9.3 Using the annaffy package
  9.4 Linking to on-line databases
  9.5 Building HTML pages
    9.5.1 Limiting the results
    9.5.2 Annotating the probes
    9.5.3 Adding other data
  9.6 Graphical displays with drill-down functionality
    9.6.1 HTML image maps
    9.6.2 Scalable Vector Graphics (SVG)
  9.7 Searching Meta-data
    9.7.1 Text searching
  9.8 Concluding remarks

10 Visualizing Data