1 Notation and conventions  9
    1.0.1 Background Information  10
  1.1 Acknowledgements  11

I Describing Datasets  12

2 First Tools for Looking at Data  13
  2.1 Datasets  13
  2.2 What's Happening? - Plotting Data  15
    2.2.1 Bar Charts
    2.2.2 Histograms  16
    2.2.3 How to Make Histograms  17
    2.2.4 Conditional Histograms  19
  2.3 Summarizing 1D Data  19
    2.3.1 The Mean  20
    2.3.2 Standard Deviation  22
    2.3.3 Computing Mean and Standard Deviation Online  26
    2.3.4 Variance  26
    2.3.5 The Median  27
    2.3.6 Interquartile Range  29
    2.3.7 Using Summaries Sensibly  30
  2.4 Plots and Summaries  31
    2.4.1 Some Properties of Histograms  31
    2.4.2 Standard Coordinates and Normal Data  34
    2.4.3 Box Plots  38
  2.5 Whose is bigger? Investigating Australian Pizzas  39
  2.6 You should  43
    2.6.1 remember these definitions:  43
    2.6.2 remember these terms:  43
    2.6.3 remember these facts:  43
    2.6.4 be able to:  43

3 Looking at Relationships  47
  3.1 Plotting 2D Data  47
    3.1.1
    3.1.2 Series  51
    3.1.3 Scatter Plots for Spatial Data  53
    3.1.4 Exposing Relationships with Scatter Plots  54
  3.2 Correlation  57
    3.2.1 The Correlation Coefficient  60
    3.2.2 Using Correlation to Predict  64
    3.2.3 Confusion caused by correlation  68
  3.4 You should  72
    3.4.1 remember these definitions:  72
    3.4.2 remember these terms:  72
    3.4.3 remember these facts:  72
    3.4.4 use these procedures:  72
    3.4.5 be able to:  72

II Probability  78

4 Basic ideas in probability  79
  4.1 Experiments, Outcomes and Probability  79
    4.1.1 Outcomes and Probability  79
  4.2 Events  81
    4.2.1 Computing Event Probabilities by Counting Outcomes  83
    4.2.2 The Probability of Events  87
    4.2.3 Computing Probabilities by Reasoning about Sets  89
  4.3 Independence  92
    4.3.1 Example: Airline Overbooking  96
  4.4 Conditional Probability  99
    4.4.1 Evaluating Conditional Probabilities  100
    4.4.2 Detecting Rare Events is Hard  104
    4.4.3 Conditional Probability and Various Forms of Independence  106
    4.4.4 The Prosecutor's Fallacy  108
    4.4.5 Example: The Monty Hall Problem  110
  4.5 Extra Worked Examples  112
    4.5.1 Outcomes and Probability  112
    4.5.2 Events  114
    4.5.3 Independence  115
    4.5.4 Conditional Probability  117
  4.6 You should  121
    4.6.1 remember these definitions:  121
    4.6.2 remember these terms:  121
    4.6.3 remember and use these facts:  121
    4.6.4 remember these points:  121
    4.6.5 be able to:  121

5 Random Variables and Expectations  128
  5.1 Random Variables  128
    5.1.1 Joint and Conditional Probability for Random Variables  131
    5.1.2 Just a Little Continuous Probability  134
  5.2 Expectations and Expected Values  137
    5.2.1 Expected Values  138
    5.2.2 Mean, Variance and Covariance  141
    5.2.3 Expectations and Statistics  145
  5.3 The Weak Law of Large Numbers  145
    5.3.1 IID Samples  145
    5.3.2 Two Inequalities  146
    5.3.3 Proving the Inequalities  147
    5.3.4 The Weak Law of Large Numbers  149
  5.4 Using the Weak Law of Large Numbers  151
    5.4.1 Should you accept a bet?  151
    5.4.2 Odds, Expectations and Bookmaking - a Cultural Diversion  152
    5.4.3 Ending a Game Early  154
    5.4.4 Making a Decision with Decision Trees and Expectations  154
    5.4.5 Utility  156
  5.5 You should  159
    5.5.1 remember these definitions:  159
    5.5.2 remember these terms:  159
    5.5.3 use and remember these facts:  159
    5.5.4 be able to:  160

6 Useful Probability Distributions  167
  6.1 Discrete Distributions  167
    6.1.1 The Discrete Uniform Distribution  167
    6.1.2 Bernoulli Random Variables  168
    6.1.3 The Geometric Distribution  168
    6.1.4 The Binomial Probability Distribution  169
    6.1.5 Multinomial probabilities  171
    6.1.6 The Poisson Distribution  172
  6.2 Continuous Distributions  174
    6.2.1 The Continuous Uniform Distribution  174
    6.2.2 The Beta Distribution  174
    6.2.3 The Gamma Distribution  176
    6.2.4 The Exponential Distribution  176
  6.3 The Normal Distribution  178
    6.3.1 The Standard Normal Distribution  178
    6.3.2 The Normal Distribution  179
    6.3.3 Properties of The Normal Distribution  180
  6.4 Approximating Binomials with Large N  182
    6.4.1 Large N  183
    6.4.2 Getting Normal
    6.4.3 Using a Normal Approximation to the Binomial Distribution  187
  6.5 You should  188
    6.5.1 remember these definitions:  188
    6.5.2 remember these terms:  188
    6.5.3 remember these facts:  188
    6.5.4 remember these points:  188

III Inference  196

7 Samples and Populations  197
  7.1 The Sample Mean  197
    7.1.1 The Sample Mean is an Estimate of the Population Mean  197
    7.1.2 The Variance of the Sample Mean  198
    7.1.3 When The Urn Model Works  201
    7.1.4 Distributions are Like Populations  202
  7.2 Confidence Intervals  203
    7.2.1 Constructing Confidence Intervals  203
    7.2.2 Estimating the Variance of the Sample Mean  204
    7.2.3 The Probability Distribution of the Sample Mean  206
    7.2.4 Confidence Intervals for Population Means  208
    7.2.5 Standard Error Estimates from Simulation  212
  7.3 You should  216
    7.3.1 remember these definitions:  216
    7.3.2 remember these terms:  216
    7.3.3 remember these facts:  216
    7.3.4 use these procedures:  216
    7.3.5 be able to:  216

8 The Significance of Evidence  221
  8.1 Significance  222
    8.1.1 Evaluating Significance  223
    8.1.2 P-values  225
  8.2 Comparing the Mean of Two Populations  230
    8.2.1 Assuming Known Population Standard Deviations  231
    8.2.2 Assuming Same, Unknown Population Standard Deviation  233
    8.2.3 Assuming Different, Unknown Population Standard Deviation  235
  8.3 Other Useful Tests of Significance  237
    8.3.1 F-tests and Standard Deviations  237
    8.3.2 χ² Tests of Model Fit  239
  8.4 Dangerous Behavior  244
  8.5 You should  246
    8.5.1 remember these definitions:  246
    8.5.2 remember these terms:  246
    8.5.3 remember these facts:  246
    8.5.4 use these procedures:  246
    8.5.5 be able to:  246

9 Experiments  251
  9.1 A Simple Experiment: The Effect of a Treatment  251
    9.1.1 Randomized Balanced Experiments  252
    9.1.2 Decomposing Error in Predictions  253
    9.1.3 Estimating the Noise Variance  253
    9.1.4 The ANOVA Table  255
    9.1.5 Unbalanced Experiments  257
    9.1.6 Significant Differences  259
  9.2 Two Factor Experiments  261
    9.2.1 Decomposing the Error  264
    9.2.2 Interaction Between Effects  265
    9.2.3 The Effects of a Treatment  266
    9.2.4 Setting up an ANOVA Table  267
  9.3 You should  272
    9.3.1 remember these definitions:  272
    9.3.2 remember these terms:  272
    9.3.3 remember these facts:  272
    9.3.4 use these procedures:  272
    9.3.5 be able to:  272
    9.3.6 Two-Way Experiments  274

10 Inferring Probability Models from Data  275
  10.1 Estimating Model Parameters with Maximum Likelihood  275
    10.1.1 The Maximum Likelihood Principle  277
    10.1.2 Binomial, Geometric and Multinomial Distributions  278
    10.1.3 Poisson and Normal Distributions  281
    10.1.4 Confidence Intervals for Model Parameters  286
    10.1.5 Cautions about Maximum Likelihood  288
  10.2 Incorporating Priors with Bayesian Inference  289
    10.2.1 Conjugacy  292
    10.2.2 MAP Inference  294
    10.2.3 Cautions about Bayesian Inference  296
  10.3 Bayesian Inference for Normal Distributions  296
    10.3.1 Example: Measuring Depth of a Borehole  296
    10.3.2 Normal Prior and Normal Likelihood Yield Normal Posterior  297
    10.3.3 Filtering  300
  10.4 You should  303
    10.4.1 remember these definitions:  303
    10.4.2 remember these terms:  303
    10.4.3 remember these facts:  304
    10.4.4 use these procedures:  304
    10.4.5 be able to:  304

IV Tools  312

11 Extracting Important Relationships in High Dimensions  313
  11.1 Summaries and Simple Plots  313
    11.1.1 The Mean  314
    11.1.2 Stem Plots and Scatterplot Matrices  315
    11.1.3 Covariance  317
    11.1.4 The Covariance Matrix  319
  11.2 Using Mean and Covariance to Understand High Dimensional Data  321
    11.2.1 Mean and Covariance under Affine Transformations  322
    11.2.2 Eigenvectors and Diagonalization  324
    11.2.3 Diagonalizing Covariance by Rotating Blobs  325
    11.2.4 Approximating Blobs  326
    11.2.5 Example: Transforming the Height-Weight Blob  327
  11.3 Principal Components Analysis  329
    11.3.1 Example: Representing Colors with Principal Components  332
    11.3.2 Example: Representing Faces with Principal Components  334
  11.4 Multi-Dimensional Scaling  335
    11.4.1 Choosing Low D Points using High D Distances  335
    11.4.2 Factoring a Dot-Product Matrix  338
    11.4.3 Example: Mapping with Multidimensional Scaling  339
  11.5 Example: Understanding Height and Weight  341
  11.6 You should  345
    11.6.1 remember these definitions:  345
    11.6.2 remember these terms:  345
    11.6.3 remember these facts:  345
    11.6.4 use these procedures:  345
    11.6.5 be able to:  345

12 Learning to Classify  349
  12.1 Classification: The Big Ideas  349
    12.1.1 The Error Rate  350
    12.1.2 Overfitting  350
    12.1.3 Cross-Validation  351
    12.1.4 Is the Classifier Working Well?  351
  12.2 Classifying with Nearest Neighbors  353
  12.3 Classifying with Naive Bayes  355
    12.3.1 Missing Data  357
  12.4 The Support Vector Machine  358
    12.4.1 Choosing a Classifier with the Hinge Loss  359
    12.4.2 Finding a Minimum: General Points  361
    12.4.3 Finding a Minimum: Stochastic Gradient Descent
    12.4.4 Example: Training an SVM with Stochastic Gradient Descent  363
    12.4.5 Multi-Class Classification with SVMs  366
  12.5 Classifying with Random Forests  367
    12.5.1 Building a Decision Tree  367
    12.5.2 Choosing a Split with Information Gain  370
    12.5.3 Forests  373
    12.5.4 Building and Evaluating a Decision Forest  374
    12.5.5 Classifying Data Items with a Decision Forest  375
  12.6 You should  378
    12.6.1 remember these definitions:  378
    12.6.2 remember these terms:  378
    12.6.3 remember these facts:  379
    12.6.4 use these procedures:  379
    12.6.5 be able to:  379

  13.1 The Curse of Dimension  384
    13.1.1 The Curse: Data isn't Where You Think it is  384
    13.1.2 Minor Banes of Dimension  386
  13.2 The Multivariate Normal Distribution  387
    13.2.1 Affine Transformations and Gaussians  387
    13.2.2 Plotting a 2D Gaussian: Covariance Ellipses  388
  13.3 Agglomerative and Divisive Clustering  389
    13.3.1 Clustering and Distance  391
  13.4 The K-Means Algorithm and Variants  392
    13.4.1 How to choose K  395
    13.4.2 Soft Assignment  397
    13.4.3 General Comments on K-Means  400
    13.4.4 K-Medoids  400
  13.5 Application Example: Clustering Documents  401
    13.5.1 A Topic Model  402
  13.6 Describing Repetition with Vector Quantization  403
    13.6.1 Vector Quantization  404
    13.6.2 Example: Groceries in Portugal  406
    13.6.3 Efficient Clustering and Hierarchical K Means  409
    13.6.4 Example: Activity from Accelerometer Data  409
  13.7 You should  413
    13.7.1 remember these definitions:  413
    13.7.2 remember these terms:  413
    13.7.3 remember these facts:  413
    13.7.4 use these procedures:  413

14 Regression  417
    14.1.1 Regression to Make Predictions  417
    14.1.2 Regression to Spot Trends  419
  14.1 Linear Regression and Least Squares  421
    14.1.1 Linear Regression  421
    14.1.2 Choosing β  422
    14.1.3 Solving the Least Squares Problem  423
    14.1.4 Residuals  424
    14.1.5 R-squared  424
  14.2 Producing Good Linear Regressions  427
    14.2.1 Transforming Variables  428
    14.2.2 Problem Data Points have Significant Impact  431
    14.2.3 Functions of One Explanatory Variable  433
    14.2.4 Regularizing Linear Regressions  435
  14.3 Exploiting Your Neighbors
    14.3.1 Using your Neighbors to Predict More than a Number  441
    14.3.2 Example: Filling Large Holes with Whole Images  441
  14.4 You should  444
    14.4.1 remember these definitions:  444
    14.4.2 remember these terms:  444
    14.4.3 remember these facts:  444
    14.4.4 remember these procedures:  444

15 Markov Chains and Hidden Markov Models  454
  15.1 Markov Chains  454
    15.1.1 Transition Probability Matrices  457
    15.1.2 Stationary Distributions  459
    15.1.3 Example: Markov Chain Models of Text  462
  15.2 Estimating Properties of Markov Chains  465
    15.2.1 Simulation  465
    15.2.2 Simulation Results as Random Variables  467
    15.2.3 Simulating Markov Chains  469
  15.3 Example: Ranking the Web by Simulating a Markov Chain  472
  15.4 Hidden Markov Models and Dynamic Programming  473
    15.4.1 Hidden Markov Models  474
    15.4.2 Picturing Inference with a Trellis  474
    15.4.3 Dynamic Programming for HMM's: Formalities  478
    15.4.4 Example: Simple Communication Errors  478
  15.5 You should  481
    15.5.1 remember these definitions:  481
    15.5.2 remember these terms:  481
    15.5.3 remember these facts:  481
    15.5.4 be able to:  481

V Some Mathematical Background  484

16 Resources  485
  16.1 Useful Material about Matrices  485
    16.1.1 The Singular Value Decomposition  486
    16.1.2 Approximating A Symmetric Matrix  487
  16.2 Some Special Functions  489
  16.3 Finding Nearest Neighbors  490
  16.4 Entropy and Information Gain  493