
Support Vector Machines for Pattern Classification

Uploaded 2019-05-15 00:31:31 · PDF, 1.81 MB · Popularity: 15
Support Vector Machines, Pattern Classification: support vector machines applied to pattern recognition and classification.

Shigeo Abe
Support Vector Machines for Pattern Classification
With 110 Figures
Springer

Professor Dr. Shigeo Abe
Kobe University, Kobe, Japan

Series editor:
Professor Sameer Singh, PhD
Department of Computer Science, University of Exeter, Exeter, EX4 4PT, UK

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.

Library of Congress Cataloging-in-Publication Data
Abe, Shigeo, 1947-
Support vector machines for pattern classification / Shigeo Abe.
p. cm.
Includes bibliographical references and index.
ISBN 1-85233-929-2 (alk. paper)
1. Text processing (Computer science)  2. Pattern recognition systems.  3. Machine learning.  I. Title.
QA76.9.T48 A23 2005
005.52-dc22    2005040265

Advances in Pattern Recognition  ISSN 1617-7916
ISBN-10: 1-85233-929-2    Printed on acid-free paper
ISBN-13: 978-1-85233-929-6

© Springer-Verlag London Limited 2005

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Printed in the United States of America (SB)
Springer Science+Business Media
springeronline.com

Preface

I was shocked to see a student's report on performance comparisons between support vector machines (SVMs) and the fuzzy classifiers that we had developed with our best endeavors. The classification performance of our fuzzy classifiers was comparable, but in most cases inferior, to that of support vector machines. This tendency was especially evident when the numbers of class data were small. I shifted my research efforts from developing fuzzy classifiers with high generalization ability to developing support vector machine-based classifiers.

This book focuses on the application of support vector machines to pattern classification. Specifically, we discuss the properties of support vector machines that are useful for pattern classification applications, several multiclass models, and variants of support vector machines. To clarify their applicability to real-world problems, we compare the performance of most models discussed in the book using real-world benchmark data. Readers interested in the theoretical aspects of support vector machines should refer to books such as [109, 215, 256, 257].

Three-layer neural networks are universal classifiers in that they can classify any labeled data correctly if there are no identical data in different classes [3, 279]. In training multilayer neural network classifiers, network weights are usually corrected so that the sum-of-squares error between the network outputs and the desired outputs is minimized.
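For reference, the sum-of-squares error mentioned here can be written out as follows; the notation (network output o, target t, M training pairs) is ours, not quoted from the book:

$$
E(\mathbf{w}) \;=\; \frac{1}{2}\sum_{i=1}^{M}\bigl\|\mathbf{o}(\mathbf{x}_i;\mathbf{w}) - \mathbf{t}_i\bigr\|^2 ,
$$

where $\mathbf{x}_i$ is the $i$th training input, $\mathbf{t}_i$ its desired output vector, $\mathbf{o}(\mathbf{x}_i;\mathbf{w})$ the network output for weights $\mathbf{w}$, and $M$ the number of training data.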
But because the decision boundaries between classes acquired by training are not directly determined, the classification performance for unknown data, i.e., the generalization ability, depends on the training method, and it degrades greatly when the number of training data is small and there is no class overlap.

On the other hand, in training support vector machines the decision boundaries are determined directly from the training data so that the separating margins of the decision boundaries are maximized in the high-dimensional space called the feature space. This learning strategy, based on the statistical learning theory developed by Vapnik [256, 257], minimizes the classification errors of both the training data and the unknown data.

Therefore, the generalization abilities of support vector machines and other classifiers differ significantly, especially when the number of training data is small. This means that if some mechanism to maximize the margins of decision boundaries is introduced to non-SVM-type classifiers, their performance degradation can be prevented when the class overlap is scarce or nonexistent.

In the original support vector machine, an n-class classification problem is converted into n two-class problems, and in the ith two-class problem we determine the optimal decision function that separates class i from the remaining classes. In classification, if one of the n decision functions classifies an unknown datum into a definite class, it is classified into that class. In this formulation, if more than one decision function classifies a datum into definite classes, or if no decision function classifies the datum into a definite class, the datum is unclassifiable.

Another problem of support vector machines is slow training. Because support vector machines are trained by solving a quadratic programming problem with the number of variables equal to the number of training data, training is slow for a large number of training data.

To resolve unclassifiable regions for multiclass support vector machines, we propose fuzzy support vector machines and decision-tree-based support vector machines.

To accelerate training, in this book we discuss two approaches: selection of important data for training support vector machines before training, and training by decomposing the optimization problem into two subproblems.

To improve the generalization ability of non-SVM-type classifiers, we introduce the ideas of support vector machines to these classifiers: neural network training that incorporates maximizing margins, and a kernel version of a fuzzy classifier with ellipsoidal regions [3, pp. 90-3, 119-39].

In Chapter 1, we discuss two types of decision functions: direct decision functions, in which the class boundary is given by the curve where the decision function vanishes, and indirect decision functions, in which the class boundary is given by the curve where two decision functions take on the same value.

In Chapter 2, we discuss the architecture of support vector machines for two-class classification problems. First we explain hard-margin support vector machines, which are used when the classification problem is linearly separable, namely, when the training data of the two classes are separated by a single hyperplane. Then, introducing slack variables for the training data, we extend hard-margin support vector machines so that they are applicable to inseparable problems. There are two types of support vector machines: L1 soft-margin support vector machines and L2 soft-margin support vector machines. Here, L1 and L2 denote the linear sum and the square sum, respectively, of the slack variables that are added to the objective function for training. (To improve the generalization ability of a classifier, a regularization term, which controls the complexity of the classifier, is added to the objective function.) Then we investigate the characteristics of the solutions extensively and survey several techniques for estimating the generalization ability of support vector machines.
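For readers who want the formulas behind these terms, the standard primal problems are sketched below in conventional textbook notation; this is our own summary, not an excerpt from Chapter 2. For training pairs $(\mathbf{x}_i, y_i)$ with $y_i \in \{+1, -1\}$, $i = 1, \dots, M$, a mapping $\boldsymbol{\phi}$ into the feature space, and a margin parameter $C > 0$, the L1 soft-margin support vector machine solves

$$
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{M}\xi_i
\quad \text{subject to} \quad
y_i\bigl(\mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}_i) + b\bigr) \ge 1 - \xi_i, \;\; \xi_i \ge 0 .
$$

The L2 soft-margin machine penalizes the square sum instead, replacing $C\sum_i \xi_i$ with $\tfrac{C}{2}\sum_i \xi_i^2$ (the constraints $\xi_i \ge 0$ then become redundant). The hard-margin machine corresponds to forcing all $\xi_i = 0$, which is feasible only when the data are linearly separable in the feature space.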
In Chapter 3, we discuss some methods for multiclass problems: one-against-all support vector machines, in which each class is separated from the remaining classes; pairwise support vector machines, in which one class is separated from another class; the use of error-correcting output codes for resolving unclassifiable regions; and all-at-once support vector machines, in which the decision functions for all the classes are determined at once. To resolve unclassifiable regions, in addition to error-correcting codes, we discuss fuzzy support vector machines with membership functions and decision-tree-based support vector machines. To compare the several methods for multiclass problems, we show performance evaluations of these methods for the benchmark data sets.
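A minimal sketch of the one-against-all and pairwise strategies discussed above, using scikit-learn as a stand-in; the library, the Iris data, and the kernel and C values are illustrative assumptions, not the implementations evaluated in the book. Resolving ties by the continuous decision values (or votes), rather than by signs alone, is one simple way to reduce the unclassifiable regions mentioned in the preface:

```python
# Illustrative only: one-against-all vs. pairwise multiclass SVMs with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# One-against-all: one binary SVM per class (class i vs. the rest);
# prediction takes the class with the largest decision value.
ova = OneVsRestClassifier(SVC(kernel="rbf", C=10.0)).fit(X_tr, y_tr)

# Pairwise (one-against-one): one binary SVM per pair of classes;
# prediction is by voting. (SVC itself already uses a pairwise scheme
# internally; the wrapper just makes the strategy explicit.)
ovo = OneVsOneClassifier(SVC(kernel="rbf", C=10.0)).fit(X_tr, y_tr)

print("one-against-all accuracy:", ova.score(X_te, y_te))
print("pairwise accuracy:       ", ovo.score(X_te, y_te))
```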
Since support vector machines were proposed, many variants of support vector machines have been developed. In Chapter 4, we discuss some of them: least squares support vector machines, whose training results in solving a set of linear equations; linear programming support vector machines; robust support vector machines; and so on.

In Chapter 5, we discuss some training methods for support vector machines. Because we need to solve a quadratic optimization problem with the number of variables equal to the number of training data, it is impractical to solve a problem with a huge number of training data. For example, for 10,000 training data, 800 MB of memory is necessary to store the Hessian matrix in double precision. Therefore, several methods have been developed to speed up training. One approach reduces the number of training data by preselecting the training data. The other is to speed up training by decomposing the problem into two subproblems and repeatedly solving one subproblem while fixing the other, exchanging the variables between the two subproblems.

Optimal selection of features is important in realizing high-performance classification systems. Because support vector machines are trained so that the margins are maximized, they are said to be robust against nonoptimal features. In Chapter 6, we discuss several methods for selecting optimal features and show, using some benchmark data sets, that feature selection is important even for support vector machines. Then we discuss feature extraction, which transforms input features by linear and nonlinear transformations.

Some classifiers need clustering of training data before training. But support vector machines do not require clustering, because mapping into a feature space results in clustering in the input space. In Chapter 7, we discuss how we can realize support vector machine-based clustering.

One of the features of support vector machines is that, by mapping the input space into the feature space, nonlinear separation of class data is realized. Thus conventional linear models become nonlinear if the linear models are formulated in the feature space. They are usually called kernel-based methods. In Chapter 8, we discuss typical kernel-based methods: kernel least squares, kernel principal component analysis, and the kernel Mahalanobis distance.

The concept of maximum margins can be used for conventional classifiers to enhance their generalization ability. In Chapter 9, we discuss methods for maximizing the margins of multilayer neural networks, and in Chapter 10 we discuss maximum-margin fuzzy classifiers with ellipsoidal regions and with polyhedral regions.

Support vector machines can also be applied to function approximation. In Chapter 11, we discuss how to extend support vector machines to function approximation and compare the performance of the support vector machine with that of other function approximators.
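As a rough illustration of this extension (support vector regression), the following sketch fits a kernel regression to noisy samples of a sine function; scikit-learn and all parameter values are assumptions made for the example, not the setting used in the book:

```python
# Illustrative only: epsilon-insensitive support vector regression on noisy sin(x).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 2.0 * np.pi, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(80)

# RBF kernel; C and epsilon chosen arbitrarily for the sketch.
model = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X, y)

# Predict at a few test points and print (input, prediction) pairs.
X_test = np.linspace(0.0, 2.0 * np.pi, 5).reshape(-1, 1)
print(np.c_[X_test.ravel(), model.predict(X_test)])
```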
Acknowledgments

We are grateful to those who were involved in the research project, conducted at the Graduate School of Science and Technology, Kobe University, on neural, fuzzy, and support vector machine-based classifiers and function approximators, for their efforts in developing new methods and programs. Discussions with Dr. Seiichi Ozawa were always helpful. Special thanks are due to the then and current graduate and undergraduate students: T. Inoue, K. Sakaguchi, T. Takigawa, F. Takahashi, Y. Hirokawa, T. Nishikawa, K. Kaieda, Y. Koshiba, D. Tsujinishi, Y. Miyamoto, S. Katagiri, T. Yamasaki, T. Kikuchi, and K. Morikawa; and Ph.D. student T. Ban.

I thank A. Ralescu for having used my draft version of the book as a graduate course text and for having given me many useful comments. Thanks are also due to H. Nakayama, S. Miyamoto, J. A. K. Suykens, F. Anouar, G. C. Cawley, H. Motoda, A. Inoue, F. Schwenker, N. Kasabov, and B.-L. Lu for their valuable discussions and useful comments.

The Internet was a valuable source of information in writing the book. Most of the papers listed in the References were obtained from the Internet, from either authors' home pages or free downloadable sites such as:

ESANN: www.dice.ucl.ac.be/esann/proceedings/electronicproceedings.htm
JMLR: www.jmlr.org/papers/
NEC Research Institute CiteSeer: citeseer.nj.nec.com/cs
NIPS: books.nips.cc

Kobe, October 2004
Shigeo Abe

Contents

Preface
Nomenclature
1 Introduction
  1.1 Decision Functions
    1.1.1 Decision Functions for Two-Class Problems
    1.1.2 Decision Functions for Multiclass Problems
  1.2 Determination of Decision Functions
  1.3 Data Sets Used in the Book
2 Two-Class Support Vector Machines
  2.1 Hard-Margin Support Vector Machines
  2.2 L1 Soft-Margin Support Vector Machines
  2.3 Mapping to a High-Dimensional Space
    2.3.1 Kernel Tricks
    2.3.2 Kernels
    2.3.3 Normalizing Kernels
    2.3.4 Properties of Mapping Functions Associated with Kernels
    2.3.5 Implicit Bias Terms
  2.4 L2 Soft-Margin Support Vector Machines
  2.5 Advantages and Disadvantages
    2.5.1 Advantages
    2.5.2 Disadvantages
  2.6 Characteristics of Solutions
    2.6.1 Hessian Matrix
    2.6.2 Dependence of Solutions on C
    2.6.3 Equivalence of L1 and L2 Support Vector Machines
    2.6.4 Nonunique Solutions
    2.6.5 Reducing the Number of Support Vectors
    2.6.6 Degenerate Solutions
    2.6.7 Duplicate Copies of Data
    2.6.8 Imbalanced Data
    2.6.9 Classification for the Blood Cell Data
  2.7 Class Boundaries for Different Kernels
  2.8 Developing Classifiers
    2.8.1 Model Selection
    2.8.2 Estimating Generalization Errors
    2.8.3 Sophistication of Model Selection
  2.9 Invariance for Linear Transformation
3 Multiclass Support Vector Machines
  3.1 One-against-All Support Vector Machines
    3.1.1 Conventional Support Vector Machines
    3.1.2 Fuzzy Support Vector Machines
    3.1.3 Equivalence of Fuzzy Support Vector Machines and Support Vector Machines with Continuous Decision Functions
    3.1.4 Decision-Tree-Based Support Vector Machines
  3.2 Pairwise Support Vector Machines
    3.2.1 Conventional Support Vector Machines
    3.2.2 Fuzzy Support Vector Machines
    3.2.3 Performance Comparison of Fuzzy Support Vector Machines
    3.2.4 Cluster-Based Support Vector Machines
    3.2.5 Decision-Tree-Based Support Vector Machines
    3.2.6 Pairwise Classification with Correcting Classifiers
  3.3 Error-Correcting Output Codes
    3.3.1 Output Coding by Error-Correcting Codes
    3.3.2 Unified Scheme for Output Coding
    3.3.3 Equivalence of ECOC with Membership Functions
    3.3.4 Performance Evaluation
  3.4 All-at-Once Support Vector Machines
    3.4.1 Basic Architecture
    3.4.2 Sophisticated Architecture
  3.5 Comparisons of Architectures
    3.5.1 One-against-All Support Vector Machines
    3.5.2 Pairwise Support Vector Machines
    3.5.3 ECOC Support Vector Machines
    3.5.4 All-at-Once Support Vector Machines
    3.5.5 Training Difficulty
    3.5.6 Training Time Comparison