Contents

Introduction to Machine Learning
  The data sets

Modeling Process
  Prerequisites
  Data splitting
    Simple random sampling
    Stratified sampling
    Class imbalances
  Creating models in R
    Many formula interfaces
    Many engines
  Resampling methods
    k-fold cross validation
    Bootstrapping
    Alternatives
  Bias variance trade-off
    Bias
    Variance
    Hyperparameter tuning
  Model evaluation
    Regression models
    Classification models
  Putting the processes together

Feature & Target Engineering
  Prerequisites
  Target engineering
  Dealing with missingness
    Visualizing missing values
    Imputation
  Feature filtering
  Numeric feature engineering
    Skewness
    Standardization
  Categorical feature engineering
    Lumping
    One-hot & dummy encoding
    Label encoding
    Alternatives
  Dimension reduction
  Proper implementation
    Sequential steps
    Data leakage
    Putting the process together

SUPERVISED LEARNING

Linear Regression
  Prerequisites
  Simple linear regression
    Estimation
    Inference
  Multiple linear regression
  Assessing model accuracy
  Model concerns
  Principal component regression
  Partial least squares
  Feature interpretation
  Final thoughts

Logistic Regression
  Prerequisites
  Why logistic regression
  Simple logistic regression
  Multiple logistic regression
  Assessing model accuracy
  Model concerns
  Feature interpretation
  Final thoughts

Regularized Regression
  Prerequisites
  Why regularize?
    Ridge penalty
    Lasso penalty
    Elastic nets
  Implementation
  Tuning
  Feature interpretation
  Attrition data
  Final thoughts

Multivariate Adaptive Regression Splines
  Prerequisites
  The basic idea
    Multivariate regression splines
  Fitting a basic MARS model
  Tuning
  Feature interpretation
  Attrition data
  Final thoughts

K-Nearest Neighbors
  Prerequisites
  Measuring similarity
    Distance measures
    Pre-processing
  Choosing k
  MNIST example
  Final thoughts

Decision Trees
  Prerequisites
  Structure
  Partitioning
  How deep?
    Early stopping
    Pruning
  Ames housing example
  Feature interpretation
  Final thoughts

Bagging
  Prerequisites
  Why and when bagging works
  Implementation
  Easily parallelize
  Feature interpretation
  Final thoughts

Random Forests
  Prerequisites
  Extending bagging
  Out-of-the-box performance
  Hyperparameters
    Number of trees
    mtry
    Tree complexity
    Sampling scheme
    Split rule
  Tuning strategies
  Feature interpretation
  Final thoughts

Gradient Boosting
  Prerequisites
  How boosting works
    A sequential ensemble approach
    Gradient descent
  Basic GBM
    Hyperparameters
    Implementation
    General tuning strategy
  Stochastic GBMs
    Stochastic hyperparameters
    Implementation
  XGBoost
    XGBoost hyperparameters
    Tuning strategy
  Feature interpretation
  Final thoughts

Deep Learning
  Prerequisites
  Why deep learning
  Feedforward DNNs
  Network architecture
    Layers and nodes
    Activation
  Backpropagation
  Model training
  Model tuning
    Model capacity
    Batch normalization
    Regularization
    Adjust learning rate
  Grid Search
  Final thoughts

Support Vector Machines
  Prerequisites
  Optimal separating hyperplanes
    The hard margin classifier
    The soft margin classifier
  The support vector machine
    More than two classes
    Support vector regression
  Job attrition example
    Class weights
    Class probabilities
  Feature interpretation
  Final thoughts

Stacked Models
  Prerequisites
  The Idea
    Common ensemble methods
    Super learner algorithm
    Available packages
  Stacking existing models
  Stacking a grid search
  Automated machine learning
  Final thoughts

Interpretable Machine Learning
  Prerequisites
  The idea
    Global interpretation
    Local interpretation
    Model-specific vs. model-agnostic
  Permutation-based feature importance
    Concept
    Implementation
  Partial dependence
    Concept
    Implementation
    Alternative uses
  Individual conditional expectation
    Concept
    Implementation
  Feature interactions
    Concept
    Implementation
    Alternatives
  Local interpretable model-agnostic explanations
    Concept
    Implementation
    Tuning
    Alternative uses
  Shapley values
    Concept
    Implementation
    XGBoost and built-in Shapley values
  Localized step-wise procedure
    Concept
    Implementation
  Final thoughts

DIMENSION REDUCTION

Principal Components Analysis
  Prerequisites
  The idea
  Finding principal components
  Performing PCA in R
  Selecting the number of principal components
    Eigenvalue criterion
    Proportion of variance explained criterion
    Scree plot criterion
  Final thoughts

Generalized Low Rank Models
  Prerequisites
  The idea
  Finding the lower ranks
    Alternating minimization
    Loss functions
    Regularization
    Selecting k
  Fitting GLRMs in R
    Basic GLRM model
    Tuning to optimize for unseen data
  Final thoughts

Autoencoders
  Prerequisites
  Undercomplete autoencoders
    Comparing PCA to an autoencoder
    Stacked autoencoders
    Visualizing the reconstruction
  Sparse autoencoders
  Denoising autoencoders
  Anomaly detection
  Final thoughts

CLUSTERING

K-means Clustering
  Prerequisites
  Distance measures
  Defining clusters
  k-means algorithm
  Clustering digits
  How many clusters?
  Clustering with mixed data
  Alternative partitioning methods
  Final thoughts

Hierarchical Clustering
  Prerequisites
  Hierarchical clustering algorithms
  Hierarchical clustering in R
    Agglomerative hierarchical clustering
    Divisive hierarchical clustering
  Determining optimal clusters
  Working with dendrograms
  Final thoughts

Model-based Clustering
  Prerequisites
  Measuring probability and uncertainty
  Covariance types
  Model selection
  My basket example
  Final thoughts
Brad Boehmke is a data scientist at 84.51° where he wears both software developer and machine learning engineer hats. He is an Adjunct Professor at the University of Cincinnati, author of Data Wrangling with R, and creator of multiple public and private enterprise R packages.

Brandon Greenwell is a data scientist at 84.51° where he works on a diverse team to enable, empower, and encourage others to successfully apply machine learning to solve real business problems. He's part of the Adjunct Graduate Faculty at Wright State University, an Adjunct Instructor at the University of Cincinnati, and the author of several R packages available on CRAN.
"Hands-On Machine Learning with R is a great resource
for understanding and applying models. Each section provides
descriptions and instructions using a wide range of R
- Max Kuhn, Machine Learning Software Engineer, RStudio
"You can't find a better overview of practical machine learning
methods implemented with R."
- JD Long, co-author of R Cookbook "Simultaneously approachable, accessible, and rigorous, Hands-On Machine Learning with R offers a balance of theory and implementation that can actually bring you from relative novice to competent practitioner."
- Mara Averick, RStudio Dev Advocate "...The book describes in detail the various methods for solving classification and clustering problems. Functions from many R libraries are compared, which enables the reader to understand their respective advantages and disadvantages. The authors have developed a clear structure to the book that includes a brief description of each model, examples of using the model for specific real-life examples, and discussion of the advantages and disadvantages of the model. This structure is one of the book's main advantages."
- Igor Malyk, ISCB News, July 2020