Foundations of Statistical Natural Language Processing
Contents

Part I Preliminaries

1 Introduction
1.1 Prologue: Rationalist and Empiricist Approaches
1.2 Scientific Content
1.2.1 Questions that linguistics should answer
1.2.2 Non-categorical phenomena in language
1.2.3 Language and cognition as probabilistic phenomena
1.3 The Ambiguity of Language: Why NLP Is Difficult
1.4 Dirty Hands
1.4.1 Lexical resources
1.4.2 Word counts
1.4.3 Zipf's laws
1.4.4 Collocations
1.4.5 Concordances
1.5 Further Reading
1.6 Exercises

2 Mathematical Foundations
2.1 Elementary Probability Theory
2.1.1 Probability spaces
2.1.2 Conditional probability and independence
2.1.3 Bayes' theorem
2.1.4 Random variables
2.1.5 Expectation and variance
2.1.6 Notation
2.1.7 Joint and conditional distributions
2.1.8 Determining P
2.1.9 Standard distributions
2.1.10 Bayesian statistics
2.1.11 Exercises
2.2 Essential Information Theory
2.2.1 Entropy
2.2.2 Joint entropy and conditional entropy
2.2.3 Mutual information
2.2.4 The noisy channel model
2.2.5 Relative entropy or Kullback-Leibler divergence
2.2.6 The relation to language: Cross entropy
2.2.7 The entropy of English
2.2.8 Perplexity
2.2.9 Exercises
2.3 Further Reading

3 Linguistic Essentials
3.1 Parts of Speech and Morphology
3.1.1 Nouns and pronouns
3.1.2 Words that accompany nouns: Determiners and adjectives
3.1.3 Verbs
3.1.4 Other parts of speech
3.2 Phrase Structure
3.2.1 Phrase structure grammars
3.2.2 Dependency: Arguments and adjuncts
3.2.3 X′ theory
3.2.4 Phrase structure ambiguity
3.3 Semantics and Pragmatics
3.4 Other Areas
3.5 Further Reading
3.6 Exercises

4 Corpus-Based Work
4.1 Getting Set Up
4.1.1 Computers
4.1.2 Corpora
4.1.3 Software
4.2 Looking at Text
4.2.1 Low-level formatting issues
4.2.2 Tokenization: What is a word?
4.2.3 Morphology
4.2.4 Sentences
4.3 Marked-up Data
4.3.1 Markup schemes
4.3.2 Grammatical tagging
4.4 Further Reading

Part II Words

5 Collocations
5.1 Frequency
5.2 Mean and Variance
5.3 Hypothesis Testing
5.3.1 The t test
5.3.2 Hypothesis testing of differences
5.3.3 Pearson's chi-square test
5.3.4 Likelihood ratios
5.4 Mutual Information
5.5 The Notion of Collocation
5.6 Further Reading

6 Statistical Inference: n-gram Models over Sparse Data
6.1 Bins: Forming Equivalence Classes
6.1.1 Reliability vs. discrimination
6.1.2 n-gram models
6.1.3 Building n-gram models
6.2 Statistical Estimators
6.2.1 Maximum Likelihood Estimation (MLE)
6.2.2 Laplace's Law, Lidstone's Law and the Jeffreys-Perks Law
6.2.3 Held out estimation
6.2.4 Cross-validation (deleted estimation)
6.2.5 Good-Turing estimation
6.2.6 Briefly noted
6.3 Combining Estimators
6.3.1 Simple linear interpolation
6.3.2 Katz's backing-off
6.3.3 General linear interpolation
6.3.4 Briefly noted
6.3.5 Language models for Austen
6.4 Conclusions
6.5 Further Reading
6.6 Exercises

7 Word Sense Disambiguation
7.1 Methodological Preliminaries
7.1.1 Supervised and unsupervised learning
7.1.2 Pseudowords
7.1.3 Upper and lower bounds on performance
7.2 Supervised Disambiguation
7.2.1 Bayesian classification
7.2.2 An information-theoretic approach
7.3 Dictionary-Based Disambiguation
7.3.1 Disambiguation based on sense definitions
7.3.2 Thesaurus-based disambiguation
7.3.3 Disambiguation based on translations in a second-language corpus
7.3.4 One sense per discourse, one sense per collocation
7.4 Unsupervised Disambiguation
7.5 What Is a Word Sense?
7.6 Further Reading
7.7 Exercises

8 Lexical Acquisition
8.1 Evaluation Measures
8.2 Verb Subcategorization
8.3 Attachment Ambiguity
8.3.1 Hindle and Rooth (1993)
8.3.2 General remarks on PP attachment
8.4 Selectional Preferences
8.5 Semantic Similarity
8.5.1 Vector space measures
8.5.2 Probabilistic measures
8.6 The Role of Lexical Acquisition in Statistical NLP
8.7 Further Reading

Part III Grammar

9 Markov Models
9.1 Markov Models
9.2 Hidden Markov Models
9.2.1 Why use HMMs?
9.2.2 General form of an HMM
9.3 The Three Fundamental Questions for HMMs
9.3.1 Finding the probability of an observation
9.3.2 Finding the best state sequence
9.3.3 The third problem: Parameter estimation
9.4 HMMs: Implementation, Properties, and Variants
9.4.1 Implementation
9.4.2 Variants
9.4.3 Multiple input observations
9.4.4 Initialization of parameter values
9.5 Further Reading

10 Part-of-Speech Tagging
10.1 The Information Sources in Tagging
10.2 Markov Model Taggers
10.2.1 The probabilistic model
10.2.2 The Viterbi algorithm
10.2.3 Variations
10.3 Hidden Markov Model Taggers
10.3.1 Applying HMMs to POS tagging
10.3.2 The effect of initialization on HMM training
10.4 Transformation-Based Learning of Tags
10.4.1 Transformations
10.4.2 The learning algorithm
10.4.3 Relation to other models
10.4.4 Automata
10.4.5 Summary
10.5 Other Methods, Other Languages
10.5.1 Other approaches to tagging
10.5.2 Languages other than English
10.6 Tagging Accuracy and Uses of Taggers
10.6.1 Tagging accuracy
10.6.2 Applications of tagging
10.7 Further Reading
10.8 Exercises

11 Probabilistic Context Free Grammars
11.1 Some Features of PCFGs
11.2 Questions for PCFGs
11.3 The Probability of a String
11.3.1 Using inside probabilities
11.3.2 Using outside probabilities
11.3.3 Finding the most likely parse for a sentence
11.3.4 Training a PCFG
11.4 Problems with the Inside-Outside Algorithm
11.5 Further Reading
11.6 Exercises

12 Probabilistic Parsing
12.1 Some Concepts
12.1.1 Parsing for disambiguation
12.1.2 Treebanks
12.1.3 Parsing models vs. language models
12.1.4 Weakening the independence assumptions of PCFGs
12.1.5 Tree probabilities and derivational probabilities
12.1.6 There's more than one way to do it
12.1.7 Phrase structure grammars and dependency grammars
12.1.8 Evaluation
12.1.9 Equivalent models
12.1.10 Building parsers: Search methods
12.1.11 Use of the geometric mean
12.2 Some Approaches
12.2.1 Non-lexicalized treebank grammars
12.2.2 Lexicalized models using derivational histories
12.2.3 Dependency-based models
12.2.4 Discussion
12.3 Further Reading
12.4 Exercises

Part IV Applications and Techniques

13 Statistical Alignment and Machine Translation
13.1 Text Alignment
13.1.1 Aligning sentences and paragraphs
13.1.2 Length-based methods
13.1.3 Offset alignment by signal processing techniques
13.1.4 Lexical methods of sentence alignment
13.1.5 Summary
13.1.6 Exercises
13.2 Word Alignment
13.3 Statistical Machine Translation
13.3.1 Exercises
13.4 Further Reading

14 Clustering
14.1 Hierarchical Clustering
14.1.1 Single-link and complete-link clustering
14.1.2 Group-average agglomerative clustering
14.1.3 An application: Improving a language model
14.1.4 Top-down clustering
14.2 Non-Hierarchical Clustering
14.2.1 K-means
14.2.2 The EM algorithm
14.3 Further Reading
14.4 Exercises

15 Topics in Information Retrieval
15.1 Some Background on Information Retrieval
15.1.1 Common design features of IR systems
15.1.2 Evaluation measures
15.1.3 The probability ranking principle (PRP)
15.2 The Vector Space Model
15.2.1 Vector similarity
15.2.2 Term weighting
15.3 Term Distribution Models
15.3.1 The Poisson distribution
15.3.2 The two-Poisson model
15.3.3 The K mixture
15.3.4 Inverse document frequency
15.3.5 Residual inverse document frequency
15.3.6 Usage of term distribution models
15.4 Latent Semantic Indexing
15.4.1 Least-squares methods
15.4.2 Singular Value Decomposition
15.4.3 Latent Semantic Indexing in IR
15.5 Discourse Segmentation
15.5.1 TextTiling
15.6 Further Reading
15.7 Exercises

16 Text Categorization
16.1 Decision Trees
16.2 Maximum Entropy Modeling
16.2.1 Generalized iterative scaling
16.2.2 Application to text categorization
16.3 Perceptrons
16.4 K Nearest Neighbor Classification
16.5 Further Reading
User Comments
This is the classic textbook on natural language processing and well worth the time to read carefully. But please respect the authors' copyright and buy a legitimate print copy.
A very good book; the content works well as supplementary material for learners.
The content is clear. Thanks for sharing.