1 Introduction

This blog post is a comprehensive summary of predictive modeling with regression techniques. It is inspired by a lecture given by Sandjai Bhulai, Professor at the Free University of Amsterdam and co-founder of the postgraduate programme Business Analytics / Data Science. The main purpose is to take a quick look at the techniques and develop a proper workflow. The post also serves as my personal summary of the lecture.

2 Packages and initialisations

knitr::opts_chunk$set(echo = T, eval = T, warning = F, message = F, cache = T,
                      fig.align = "center", fig.width = 7, fig.height = 7)

## Load packages & Install if necessary
ipak <- function(pkg) {
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg)) {
    install.packages(new.pkg, dependencies = TRUE)
  }
  sapply(pkg, require, character.only = TRUE)
}


packages <- c("data.table", "ggthemes", "tidyverse", "DataExplorer", "kableExtra", "knitr", "data.table"
              ,"readr", "RColorBrewer", "htmlwidgets", "htmltools", "widgetframe", "highcharter", "elasticnet", "here")

ipak(packages)
##   data.table     ggthemes    tidyverse DataExplorer   kableExtra 
##         TRUE         TRUE         TRUE         TRUE         TRUE 
##        knitr   data.table        readr RColorBrewer  htmlwidgets 
##         TRUE         TRUE         TRUE         TRUE         TRUE 
##    htmltools  widgetframe  highcharter   elasticnet         here 
##         TRUE         TRUE         TRUE         TRUE         TRUE
theme_set(theme_few()) # add few theme to plots

3 Exploratory Data Analysis (EDA)

Quick look at the data.

FuelEff <- fread(here("static", "data","Regressions/FuelEfficiency.csv"))

FuelEff$ET <- as.factor(FuelEff$ET)
FuelEff$NC <- as.factor(FuelEff$NC)

glimpse(FuelEff)
## Observations: 38
## Variables: 8
## $ MPG <dbl> 16.9, 15.5, 19.2, 18.5, 30.0, 27.5, 27.2, 30.9, 20.3, 17.0...
## $ GPM <dbl> 5.917, 6.452, 5.208, 5.405, 3.333, 3.636, 3.676, 3.236, 4....
## $ WT  <dbl> 4.360, 4.054, 3.605, 3.940, 2.155, 2.560, 2.300, 2.230, 2....
## $ DIS <int> 350, 351, 267, 360, 98, 134, 119, 105, 131, 163, 121, 163,...
## $ NC  <fct> 8, 8, 8, 8, 4, 4, 4, 4, 5, 6, 4, 6, 6, 6, 6, 6, 8, 8, 8, 8...
## $ HP  <int> 155, 142, 125, 150, 68, 95, 97, 75, 103, 125, 115, 133, 10...
## $ ACC <dbl> 14.9, 14.3, 15.0, 13.0, 16.5, 14.2, 14.7, 14.5, 15.9, 13.6...
## $ ET  <fct> 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1...

Below you will find a few descriptive plots of the data. It is important to look at these before you start with the actual predictions.

3.1 Missing Values

There are no missing values.

plot_missing(FuelEff)

3.2 Discrete Variables

plot_bar(FuelEff, title = "Bar charts of all discrete variables")

3.3 Continuous Variables

plot_histogram(FuelEff, title = "Histograms of all continuous variables")

3.4 Correlations

plot_correlation(FuelEff, use = "pairwise.complete.obs", title = "Correlation Matrix")

3.5 Boxplots

plot_boxplot(FuelEff, "MPG", title = "Boxplot of all variables with target variable MPG")

4 Data Description

The correlation plot and the box plots in particular give a good indication of which variables might be important for your predictions. In this particular case, we see that WT, HP and DIS will likely be strong predictors, while ACC will not have a lot of impact. In larger datasets, there are many more steps to take before we select the eventual features.
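
To put numbers on that impression, the sketch below ranks a few predictors by their correlation with MPG. It only uses the first eight rows printed by glimpse() above as a stand-in, so the values will differ from those on the full dataset. The names fuel_head, cors and ranked are my own.

```r
# Stand-in data: the first eight rows as printed by glimpse(), not the full dataset
fuel_head <- data.frame(
  MPG = c(16.9, 15.5, 19.2, 18.5, 30.0, 27.5, 27.2, 30.9),
  WT  = c(4.360, 4.054, 3.605, 3.940, 2.155, 2.560, 2.300, 2.230),
  DIS = c(350, 351, 267, 360, 98, 134, 119, 105),
  HP  = c(155, 142, 125, 150, 68, 95, 97, 75),
  ACC = c(14.9, 14.3, 15.0, 13.0, 16.5, 14.2, 14.7, 14.5)
)

# Pearson correlation of every predictor with the target MPG
cors <- cor(fuel_head[, c("WT", "DIS", "HP", "ACC")], fuel_head$MPG)[, 1]

# Strongest absolute correlation first
ranked <- cors[order(abs(cors), decreasing = TRUE)]
print(round(ranked, 2))
```

A correlation close to -1 or 1 points to a potentially strong linear predictor, while a value close to 0 suggests little linear relationship.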

5 Data Preparation

The current example dataset is small and easy to understand. However, it is good practice to perform a multitude of data preparation steps. The following steps are all useful and should always be considered before you start with machine learning.

  1. Centering & Scaling
  2. Check for skewness
    • Rule of thumb: (largest value / smallest value) > 20 indicates significant skewness
    • Use a Box-Cox transformation on skewed predictors to reduce the skewness
  3. Check for outliers
    • Do not just throw away anything you think is an outlier. Think about the implications!
    • Be careful with sampling your dataset - outliers might not be outliers at all.
    • Check with experts about the collected data, they might know the reason for outliers.
  4. Data reduction & feature extraction
    • Not a big issue in small datasets. For large datasets, however, it can greatly reduce computation time, and it often gives better results too.
    • PCA is a great technique to use.
  5. Dealing with missing values
    • Check for structurally missing values (these might be mistakes in data collection or in surveys)
    • Missing data might be informative!
    • Data might be censored
    • Impute or otherwise handle missing values. Choose an appropriate method:
      • Using the data (mean, median)
      • Random Draw
      • Remove the samples
      • Remove predictor (if large % missing)
      • PCA to detect correlations
      • KNN to fill in according to neighbouring values
  6. Removing predictors
    • Remove predictors with near-zero variance
    • Remove multicollinearity
  7. Adding predictors
    • Higher-order predictors might increase performance
    • Dummy variables for categorical data
  8. Binning variables
    • Do not do this manually!
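
A few of these steps can be sketched in base R. The toy data frame below is hypothetical and only serves as an illustration: it checks the skewness rule of thumb, applies a log transform (the Box-Cox transformation with lambda = 0), imputes a missing value with the median, and finally centers and scales.

```r
# Hypothetical toy data: 'income' is right-skewed, 'age' has one missing value
toy <- data.frame(
  income = c(12, 15, 18, 22, 30, 45, 90, 400),
  age    = c(23, 35, NA, 41, 52, 29, 38, 47)
)

# Rule of thumb: largest / smallest > 20 suggests significant skewness
skew_ratio <- max(toy$income) / min(toy$income)
is_skewed  <- skew_ratio > 20

# The log transform is the Box-Cox transformation with lambda = 0
if (is_skewed) toy$income <- log(toy$income)

# Median imputation for the missing age
toy$age[is.na(toy$age)] <- median(toy$age, na.rm = TRUE)

# Centering and scaling: every column gets mean 0 and standard deviation 1
toy_scaled <- as.data.frame(scale(toy))
print(round(colMeans(toy_scaled), 10))
print(apply(toy_scaled, 2, sd))
```

In practice, caret's preProcess() bundles these steps (for example with method = c("BoxCox", "center", "scale", "medianImpute")) and estimates the Box-Cox lambda from the data instead of fixing it at 0.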

6 Model Choices

Choosing a model is not straightforward. There is a large variety of models, where each model excels at different things.

See the image below for a quick summary of regression techniques.

Regression Techniques Summary

7 Model Preparations

Each model has different tuning parameters. Tuning these parameters is extremely important for the performance of your model. The image above shows how complex the tuning is for each model (e.g. 0 is easy, 2 is hard). Poorly chosen parameters can easily lead to a model that underfits (high bias) or overfits (high variance).

To reduce these issues, a variety of resampling techniques can be used when training your model. The most common ones are k-fold Cross-Validation (CV) and Leave-One-Out Cross-Validation (LOOCV).
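
To make the idea concrete, here is a minimal sketch of 10-fold cross-validation written by hand in base R. The data is simulated and hypothetical, and the model is a plain lm(), so this only illustrates the mechanics.

```r
set.seed(123)

# Hypothetical data: y depends linearly on x plus noise
n  <- 40
df <- data.frame(x = runif(n))
df$y <- 2 + 3 * df$x + rnorm(n, sd = 0.5)

k     <- 10
folds <- sample(rep(1:k, length.out = n))  # random fold label for every row

# For each fold: fit on the other k - 1 folds, score on the held-out fold
fold_rmse <- sapply(1:k, function(i) {
  fit  <- lm(y ~ x, data = df[folds != i, ])
  pred <- predict(fit, newdata = df[folds == i, ])
  sqrt(mean((df$y[folds == i] - pred)^2))
})

# The cross-validated RMSE is the average over the k held-out folds
cv_rmse <- mean(fold_rmse)
print(cv_rmse)
```

LOOCV is the special case k = n, where every fold holds exactly one observation.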

In R, the easiest way to train models and set control parameters is the caret package, which is what I will be using as well.

library(caret)
FuelEff <- fread(here("static", "data", "Regressions/FuelEfficiency.csv"))
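FuelEff <- FuelEff[, -1] %>% # drop MPG, so GPM becomes the first column
  as.data.frame()

#ctrl <- trainControl(method = "LOOCV")
ctrl <- trainControl(method = "cv", number = 10) # 10-fold cross-validation

traindata <- FuelEff[, 2:7] # predictors: WT, DIS, NC, HP, ACC, ET
response <- FuelEff[, 1]    # response: GPM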

8 Applying the Models

It is important to set the same seed for each model, so we can 1) compare the models on the same data, and 2) reproduce the same results.
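
The effect of the seed on the resampling can be shown with a small base R sketch. The helper draw_folds() is hypothetical and only mimics the random assignment of rows to folds; caret draws its folds internally, but it depends on the random number generator in the same way.

```r
n <- 38   # number of observations in the dataset
k <- 10   # number of folds

# Hypothetical helper: draw a random fold label for every row
draw_folds <- function(seed) {
  set.seed(seed)
  sample(rep(1:k, length.out = n))
}

folds_a <- draw_folds(123)
folds_b <- draw_folds(123)
folds_c <- draw_folds(456)

# Same seed: identical fold assignment
print(identical(folds_a, folds_b))
# Different seed: in all likelihood a different assignment
print(identical(folds_a, folds_c))
```

This is why every model output in this post reports the same "Summary of sample sizes": all models are evaluated on the same folds.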

8.1 Linear Regression

set.seed(123) 
lmFit <- train(x = traindata, 
               y = response,
               method = "lm", 
               preProc = c("center", "scale"), 
               trControl = ctrl)
lmFit
## Linear Regression 
## 
## 38 samples
##  6 predictor
## 
## Pre-processing: centered (6), scaled (6) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 34, 35, 35, 34, 34, 34, ... 
## Resampling results:
## 
##   RMSE       Rsquared   MAE      
##   0.3491038  0.9187413  0.2965739
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

8.2 Partial Least Squares

set.seed(123)
plsFit <-
  train(x = traindata,
        y = response,
        method = "pls",
        preProc = c("center", "scale"),
        trControl = ctrl)
plsFit
## Partial Least Squares 
## 
## 38 samples
##  6 predictor
## 
## Pre-processing: centered (6), scaled (6) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 34, 35, 35, 34, 34, 34, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE       Rsquared   MAE      
##   1      0.5342822  0.8222915  0.4320590
##   2      0.4340009  0.8679883  0.3488201
##   3      0.3860258  0.9311885  0.3216491
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 3.
plot(plsFit)

8.3 Principal Component Regression

set.seed(123)
pcrFit <-
  train(x = traindata,
        y = response,
        method = "pcr",
        preProc = c("center", "scale"),
        trControl = ctrl)
pcrFit
## Principal Component Analysis 
## 
## 38 samples
##  6 predictor
## 
## Pre-processing: centered (6), scaled (6) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 34, 35, 35, 34, 34, 34, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE       Rsquared   MAE      
##   1      0.5901381  0.8066295  0.4767139
##   2      0.5190486  0.8306519  0.4254869
##   3      0.4124337  0.9203584  0.3402541
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 3.
plot(pcrFit)

8.4 Ridge Regression

set.seed(123)
ridgeGrid <-
  data.frame(.lambda = seq(0, .1, length = 15)) # grid of lambda values

ridgeFit <-
  train(
    x = traindata,
    y = response,
    method = "ridge",
    preProc = c("center", "scale"),
    tuneGrid = ridgeGrid,
    trControl = ctrl
  )
ridgeFit
## Ridge Regression 
## 
## 38 samples
##  6 predictor
## 
## Pre-processing: centered (6), scaled (6) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 34, 35, 35, 34, 34, 34, ... 
## Resampling results across tuning parameters:
## 
##   lambda       RMSE       Rsquared   MAE      
##   0.000000000  0.3491038  0.9187413  0.2965739
##   0.007142857  0.3491084  0.9202143  0.3001565
##   0.014285714  0.3507518  0.9207730  0.3028851
##   0.021428571  0.3529498  0.9206881  0.3051401
##   0.028571429  0.3552928  0.9201613  0.3070819
##   0.035714286  0.3576155  0.9193276  0.3088000
##   0.042857143  0.3598528  0.9182768  0.3103513
##   0.050000000  0.3619842  0.9170698  0.3117746
##   0.057142857  0.3640082  0.9157489  0.3130972
##   0.064285714  0.3659322  0.9143445  0.3143393
##   0.071428571  0.3677669  0.9128791  0.3155160
##   0.078571429  0.3695238  0.9113699  0.3166387
##   0.085714286  0.3712140  0.9098305  0.3179670
##   0.092857143  0.3728477  0.9082714  0.3196433
##   0.100000000  0.3744343  0.9067017  0.3212299
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was lambda = 0.
plot(ridgeFit)
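
Note that the row for lambda = 0 has exactly the same RMSE as the linear regression in 8.1: without a penalty, ridge regression is ordinary least squares. The sketch below shows this with the textbook closed-form solution on the first eight rows printed by glimpse(). This is an assumption-laden illustration: it is not the code that caret or elasticnet runs, and their lambda may be scaled differently.

```r
# Stand-in data: the first eight rows as printed by glimpse(), not the full dataset
X <- cbind(
  WT = c(4.360, 4.054, 3.605, 3.940, 2.155, 2.560, 2.300, 2.230),
  HP = c(155, 142, 125, 150, 68, 95, 97, 75)
)
y <- c(5.917, 6.452, 5.208, 5.405, 3.333, 3.636, 3.676, 3.236)  # GPM

Xs <- scale(X)      # center and scale the predictors
yc <- y - mean(y)   # center the response, so no intercept is needed

# Textbook ridge solution: (X'X + lambda * I)^-1 X'y
ridge_coef <- function(lambda) {
  drop(solve(crossprod(Xs) + lambda * diag(ncol(Xs)), crossprod(Xs, yc)))
}

ols   <- coef(lm(yc ~ Xs - 1))  # least squares on the same scaled data
beta0 <- ridge_coef(0)          # no penalty: same coefficients as least squares
beta5 <- ridge_coef(5)          # hypothetical penalty, chosen only for illustration

print(rbind(ols = unname(ols), ridge_0 = unname(beta0), ridge_5 = unname(beta5)))
```

With a positive lambda the coefficients shrink towards zero, which trades a little bias for lower variance.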

8.5 Lasso Regression

set.seed(123)
lassoGrid <- data.frame(.fraction = seq(0.05, 1, length = 20))
lassoFit <-
  train(
    x = traindata,
    y = response,
    method = "lars",
    preProc = c("center", "scale"),
    tuneGrid = lassoGrid,
    trControl = ctrl
  )
lassoFit
## Least Angle Regression 
## 
## 38 samples
##  6 predictor
## 
## Pre-processing: centered (6), scaled (6) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 34, 35, 35, 34, 34, 34, ... 
## Resampling results across tuning parameters:
## 
##   fraction  RMSE       Rsquared   MAE      
##   0.05      0.9914435  0.8821032  0.9225322
##   0.10      0.8691388  0.8810176  0.8078626
##   0.15      0.7524983  0.8757677  0.6955019
##   0.20      0.6419761  0.8694007  0.5849300
##   0.25      0.5404261  0.8626244  0.4768875
##   0.30      0.4534510  0.8572930  0.3876725
##   0.35      0.4148866  0.8568022  0.3519432
##   0.40      0.4124938  0.8648776  0.3544209
##   0.45      0.4083596  0.8716531  0.3480199
##   0.50      0.3990464  0.8835669  0.3377982
##   0.55      0.3878828  0.8953990  0.3266916
##   0.60      0.3787880  0.9038286  0.3201525
##   0.65      0.3723993  0.9092132  0.3172129
##   0.70      0.3697524  0.9126979  0.3165587
##   0.75      0.3688900  0.9130463  0.3163729
##   0.80      0.3658178  0.9142300  0.3145681
##   0.85      0.3603385  0.9158821  0.3102977
##   0.90      0.3554954  0.9171826  0.3057231
##   0.95      0.3517300  0.9181306  0.3011485
##   1.00      0.3491038  0.9187413  0.2965739
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was fraction = 1.
plot(lassoFit)

8.6 Elastic Net

set.seed(123)
enetGrid <-
  expand.grid(.lambda = c(0, 0.01, .1),
              .fraction = seq(.05, 1, length = 20))
enetFit <-
  train(
    x = traindata,
    y = response,
    method = "enet",
    preProc = c("center", "scale"),
    tuneGrid = enetGrid,
    trControl = ctrl
  )
enetFit
## Elasticnet 
## 
## 38 samples
##  6 predictor
## 
## Pre-processing: centered (6), scaled (6) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 34, 35, 35, 34, 34, 34, ... 
## Resampling results across tuning parameters:
## 
##   lambda  fraction  RMSE       Rsquared   MAE      
##   0.00    0.05      0.9914435  0.8821032  0.9225322
##   0.00    0.10      0.8691388  0.8810176  0.8078626
##   0.00    0.15      0.7524983  0.8757677  0.6955019
##   0.00    0.20      0.6419761  0.8694007  0.5849300
##   0.00    0.25      0.5404261  0.8626244  0.4768875
##   0.00    0.30      0.4534510  0.8572930  0.3876725
##   0.00    0.35      0.4148866  0.8568022  0.3519432
##   0.00    0.40      0.4124938  0.8648776  0.3544209
##   0.00    0.45      0.4083596  0.8716531  0.3480199
##   0.00    0.50      0.3990464  0.8835669  0.3377982
##   0.00    0.55      0.3878828  0.8953990  0.3266916
##   0.00    0.60      0.3787880  0.9038286  0.3201525
##   0.00    0.65      0.3723993  0.9092132  0.3172129
##   0.00    0.70      0.3697524  0.9126979  0.3165587
##   0.00    0.75      0.3688900  0.9130463  0.3163729
##   0.00    0.80      0.3658178  0.9142300  0.3145681
##   0.00    0.85      0.3603385  0.9158821  0.3102977
##   0.00    0.90      0.3554954  0.9171826  0.3057231
##   0.00    0.95      0.3517300  0.9181306  0.3011485
##   0.00    1.00      0.3491038  0.9187413  0.2965739
##   0.01    0.05      1.0021737  0.8821032  0.9324097
##   0.01    0.10      0.8901040  0.8809871  0.8275808
##   0.01    0.15      0.7826987  0.8755224  0.7249137
##   0.01    0.20      0.6800796  0.8690532  0.6238765
##   0.01    0.25      0.5838516  0.8619770  0.5250596
##   0.01    0.30      0.4958268  0.8564919  0.4316494
##   0.01    0.35      0.4352548  0.8531532  0.3672571
##   0.01    0.40      0.4194547  0.8543208  0.3586732
##   0.01    0.45      0.4155285  0.8571756  0.3554100
##   0.01    0.50      0.4102322  0.8658767  0.3483506
##   0.01    0.55      0.3993022  0.8799371  0.3382498
##   0.01    0.60      0.3885199  0.8924834  0.3283067
##   0.01    0.65      0.3796402  0.9021405  0.3202953
##   0.01    0.70      0.3733477  0.9089035  0.3171051
##   0.01    0.75      0.3690766  0.9142236  0.3137839
##   0.01    0.80      0.3674409  0.9153729  0.3138569
##   0.01    0.85      0.3629853  0.9170631  0.3116112
##   0.01    0.90      0.3577906  0.9185066  0.3081810
##   0.01    0.95      0.3533306  0.9196578  0.3047508
##   0.01    1.00      0.3496541  0.9205312  0.3013206
##   0.10    0.05      1.0332806  0.8821032  0.9609821
##   0.10    0.10      0.9518415  0.8783502  0.8851119
##   0.10    0.15      0.8730545  0.8683000  0.8111682
##   0.10    0.20      0.7970407  0.8584787  0.7389782
##   0.10    0.25      0.7212938  0.8522060  0.6655895
##   0.10    0.30      0.6481007  0.8480263  0.5928386
##   0.10    0.35      0.5794113  0.8406220  0.5233410
##   0.10    0.40      0.5197609  0.8340106  0.4600211
##   0.10    0.45      0.4755120  0.8299935  0.4080893
##   0.10    0.50      0.4467121  0.8286540  0.3800511
##   0.10    0.55      0.4357717  0.8289656  0.3667490
##   0.10    0.60      0.4249146  0.8361190  0.3558767
##   0.10    0.65      0.4160983  0.8441368  0.3477751
##   0.10    0.70      0.4096045  0.8512172  0.3433161
##   0.10    0.75      0.3997441  0.8615736  0.3378336
##   0.10    0.80      0.3912180  0.8721311  0.3327923
##   0.10    0.85      0.3843666  0.8822711  0.3287078
##   0.10    0.90      0.3789450  0.8916914  0.3249573
##   0.10    0.95      0.3756573  0.8998559  0.3229113
##   0.10    1.00      0.3744343  0.9067017  0.3212299
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were fraction = 1 and lambda = 0.
plot(enetFit)

8.7 Neural Networks

Parameters

  • decay - Weight decay: a penalty on large weights. Used to balance overfitting / bias.
  • size - How many units the hidden layer has
  • bag - avNNet fits several neural networks and averages them; bag controls whether each network is trained on a bootstrap sample of the data (bagging)
  • linout - Linear output. Should be FALSE if doing classification instead of regression
  • trace - Prints the optimization progress of each network, which clutters the output
  • MaxNWts - Maximum number of weights that is allowed. If the network needs more weights than this, it won’t run.
  • maxit - Maximum number of iterations before it stops, even if the network has not converged yet.

set.seed(123)
nnetGrid <-
  expand.grid(.decay = c(0, 0.01, .1),
              .size = c(1:10),
              .bag = FALSE)
nnetFit <-
  train(
    traindata,
    response,
    method = "avNNet",
    tuneGrid = nnetGrid,
    trControl = ctrl,
    linout = TRUE,
    trace = FALSE,
    MaxNWts = 10 * (ncol(traindata) + 1) + 10 + 1,
    maxit = 500,
    preProc = c("center", "scale")
  )
nnetFit
## Model Averaged Neural Network 
## 
## 38 samples
##  6 predictor
## 
## Pre-processing: centered (6), scaled (6) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 34, 35, 35, 34, 34, 34, ... 
## Resampling results across tuning parameters:
## 
##   decay  size  RMSE       Rsquared   MAE      
##   0.00    1    0.3895405  0.8973923  0.3323903
##   0.00    2    0.6569133  0.8374466  0.5035775
##   0.00    3    0.6826297  0.8788383  0.5424810
##   0.00    4    0.8463658  0.6359670  0.6789124
##   0.00    5    0.7656605  0.7740208  0.6511560
##   0.00    6    0.8698821  0.7202164  0.7354869
##   0.00    7    1.0796775  0.6358712  0.8771279
##   0.00    8    1.0075220  0.6479410  0.8434621
##   0.00    9    0.8636968  0.7044664  0.7012957
##   0.00   10    0.9540263  0.6683322  0.7794921
##   0.01    1    0.3526006  0.9128624  0.3057678
##   0.01    2    0.3619619  0.9298032  0.3098504
##   0.01    3    0.3683206  0.9048323  0.3179104
##   0.01    4    0.3941588  0.9332994  0.3447662
##   0.01    5    0.4137627  0.9199535  0.3668975
##   0.01    6    0.4464267  0.8694082  0.3935072
##   0.01    7    0.4669006  0.8531353  0.4122863
##   0.01    8    0.4569140  0.8535870  0.4078636
##   0.01    9    0.4561972  0.8515777  0.4056808
##   0.01   10    0.4445728  0.8499994  0.3852996
##   0.10    1    0.3335752  0.9203501  0.2862332
##   0.10    2    0.3416498  0.9193257  0.2975506
##   0.10    3    0.3675403  0.9102944  0.3178251
##   0.10    4    0.3619042  0.9101573  0.3132779
##   0.10    5    0.3625173  0.9124789  0.3154598
##   0.10    6    0.3646521  0.9138108  0.3179842
##   0.10    7    0.3624660  0.9132400  0.3147886
##   0.10    8    0.3619679  0.9140112  0.3139177
##   0.10    9    0.3613569  0.9142954  0.3127367
##   0.10   10    0.3638330  0.9143804  0.3153601
## 
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 1, decay = 0.1 and bag
##  = FALSE.
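
The value passed as MaxNWts above follows from the number of weights in a network with a single hidden layer and one output. A small sketch, where n_weights() is a hypothetical helper of my own and skip-layer connections are assumed to be absent:

```r
# Weights in a single-hidden-layer network with one output:
#   (p + 1) * size  for the inputs plus bias going into the hidden units
#   size + 1        for the hidden units plus bias going into the output
n_weights <- function(p, size) (p + 1) * size + (size + 1)

p     <- 6     # number of predictors in traindata
sizes <- 1:10  # hidden-layer sizes in nnetGrid

weights <- n_weights(p, sizes)
print(weights)

# The largest network in the grid determines the limit that is needed
print(max(weights))
```

For the largest network in the grid (size = 10) this gives 10 * (6 + 1) + 10 + 1 = 81 weights.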

8.8 MARS

Parameters

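  • degree - Maximum degree of interaction between the terms
  • nprune - Maximum number of terms kept after pruning, which controls the complexity of your model
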
set.seed(123)
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:38)
marsFit <-
  train(
    traindata,
    response,
    method = "earth",
    tuneGrid = marsGrid,
    trControl = ctrl
  )
marsFit
## Multivariate Adaptive Regression Spline 
## 
## 38 samples
##  6 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 34, 35, 35, 34, 34, 34, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE       Rsquared   MAE      
##   1        2      0.4839472  0.8753910  0.4147103
##   1        3      0.5227159  0.8259529  0.4237267
##   1        4      0.4176467  0.8780799  0.3421071
##   1        5      0.4208958  0.8701870  0.3751769
##   1        6      0.4182763  0.8645150  0.3776940
##   1        7      0.4347006  0.8497555  0.3939607
##   1        8      0.4349547  0.8440440  0.3936475
##   1        9      0.4248572  0.8388525  0.3793411
##   1       10      0.4248572  0.8388525  0.3793411
##   1       11      0.4248572  0.8388525  0.3793411
##   1       12      0.4248572  0.8388525  0.3793411
##   1       13      0.4248572  0.8388525  0.3793411
##   1       14      0.4248572  0.8388525  0.3793411
##   1       15      0.4248572  0.8388525  0.3793411
##   1       16      0.4248572  0.8388525  0.3793411
##   1       17      0.4248572  0.8388525  0.3793411
##   1       18      0.4248572  0.8388525  0.3793411
##   1       19      0.4248572  0.8388525  0.3793411
##   1       20      0.4248572  0.8388525  0.3793411
##   1       21      0.4248572  0.8388525  0.3793411
##   1       22      0.4248572  0.8388525  0.3793411
##   1       23      0.4248572  0.8388525  0.3793411
##   1       24      0.4248572  0.8388525  0.3793411
##   1       25      0.4248572  0.8388525  0.3793411
##   1       26      0.4248572  0.8388525  0.3793411
##   1       27      0.4248572  0.8388525  0.3793411
##   1       28      0.4248572  0.8388525  0.3793411
##   1       29      0.4248572  0.8388525  0.3793411
##   1       30      0.4248572  0.8388525  0.3793411
##   1       31      0.4248572  0.8388525  0.3793411
##   1       32      0.4248572  0.8388525  0.3793411
##   1       33      0.4248572  0.8388525  0.3793411
##   1       34      0.4248572  0.8388525  0.3793411
##   1       35      0.4248572  0.8388525  0.3793411
##   1       36      0.4248572  0.8388525  0.3793411
##   1       37      0.4248572  0.8388525  0.3793411
##   1       38      0.4248572  0.8388525  0.3793411
##   2        2      0.4034104  0.8986415  0.3360562
##   2        3      0.4025746  0.8919200  0.3420545
##   2        4      0.3694814  0.8981198  0.3210576
##   2        5      0.3805058  0.8899355  0.3317454
##   2        6      0.3907333  0.8676445  0.3382646
##   2        7      0.3907333  0.8676445  0.3382646
##   2        8      0.3907333  0.8676445  0.3382646
##   2        9      0.3907333  0.8676445  0.3382646
##   2       10      0.3907333  0.8676445  0.3382646
##   2       11      0.3907333  0.8676445  0.3382646
##   2       12      0.3907333  0.8676445  0.3382646
##   2       13      0.3907333  0.8676445  0.3382646
##   2       14      0.3907333  0.8676445  0.3382646
##   2       15      0.3907333  0.8676445  0.3382646
##   2       16      0.3907333  0.8676445  0.3382646
##   2       17      0.3907333  0.8676445  0.3382646
##   2       18      0.3907333  0.8676445  0.3382646
##   2       19      0.3907333  0.8676445  0.3382646
##   2       20      0.3907333  0.8676445  0.3382646
##   2       21      0.3907333  0.8676445  0.3382646
##   2       22      0.3907333  0.8676445  0.3382646
##   2       23      0.3907333  0.8676445  0.3382646
##   2       24      0.3907333  0.8676445  0.3382646
##   2       25      0.3907333  0.8676445  0.3382646
##   2       26      0.3907333  0.8676445  0.3382646
##   2       27      0.3907333  0.8676445  0.3382646
##   2       28      0.3907333  0.8676445  0.3382646
##   2       29      0.3907333  0.8676445  0.3382646
##   2       30      0.3907333  0.8676445  0.3382646
##   2       31      0.3907333  0.8676445  0.3382646
##   2       32      0.3907333  0.8676445  0.3382646
##   2       33      0.3907333  0.8676445  0.3382646
##   2       34      0.3907333  0.8676445  0.3382646
##   2       35      0.3907333  0.8676445  0.3382646
##   2       36      0.3907333  0.8676445  0.3382646
##   2       37      0.3907333  0.8676445  0.3382646
##   2       38      0.3907333  0.8676445  0.3382646
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 4 and degree = 2.
plot(marsFit)

8.9 SVM

Parameters

  • The kernel can be either Radial / Polynomial / Linear
  • tuneLength - Number of candidate values that is tried for the tuning parameter (here the cost C)

set.seed(123)
svmFit <-
  train(
    traindata,
    response,
    method = "svmRadial",
    tuneLength = 14,
    preProc = c("center", "scale"),
    trControl = ctrl
  )
svmFit
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 38 samples
##  6 predictor
## 
## Pre-processing: centered (6), scaled (6) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 34, 35, 35, 34, 34, 34, ... 
## Resampling results across tuning parameters:
## 
##   C        RMSE       Rsquared   MAE      
##      0.25  0.5919977  0.8378634  0.4865879
##      0.50  0.5236983  0.8676116  0.4471119
##      1.00  0.4691761  0.8957120  0.4029897
##      2.00  0.4564244  0.8537364  0.3931610
##      4.00  0.4563069  0.8448426  0.3945761
##      8.00  0.4660388  0.8455530  0.4041276
##     16.00  0.4733663  0.8418228  0.4093908
##     32.00  0.4728235  0.8409826  0.3994785
##     64.00  0.4903507  0.8238986  0.4114824
##    128.00  0.4907464  0.8235286  0.4118808
##    256.00  0.4907464  0.8235286  0.4118808
##    512.00  0.4907464  0.8235286  0.4118808
##   1024.00  0.4907464  0.8235286  0.4118808
##   2048.00  0.4907464  0.8235286  0.4118808
## 
## Tuning parameter 'sigma' was held constant at a value of 0.4731597
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.4731597 and C = 4.
plot(svmFit)
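
The values in the C column are powers of two, doubling with every step. The sketch below reproduces the 14 candidates from the output. The formula is inferred from the printed grid, so treat it as an observation rather than a documented guarantee.

```r
tune_length <- 14

# Candidate cost values: 2^-2, 2^-1, ..., 2^11
cost_grid <- 2^((1:tune_length) - 3)
print(cost_grid)
```

Because the grid grows exponentially, a larger tuneLength mainly adds very large cost values. In the output above, the results no longer change from C = 128 onwards.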

8.10 KNN

set.seed(123)
knnFit <-
  train(
    traindata,
    response,
    method = "knn",
    preProc = c("center", "scale"),
    tuneGrid = data.frame(.k = 1:20),
    trControl = ctrl
  )
knnFit
## k-Nearest Neighbors 
## 
## 38 samples
##  6 predictor
## 
## Pre-processing: centered (6), scaled (6) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 34, 35, 35, 34, 34, 34, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE      
##    1  0.5414309  0.8170529  0.4652833
##    2  0.4698088  0.8428303  0.3928792
##    3  0.4989522  0.8224664  0.4212111
##    4  0.5056929  0.8219990  0.4109583
##    5  0.4905562  0.8513589  0.4127183
##    6  0.5027853  0.8368725  0.4288778
##    7  0.4926643  0.8435061  0.4278321
##    8  0.4883420  0.8315829  0.4261365
##    9  0.5170485  0.8297113  0.4485491
##   10  0.5324021  0.8230479  0.4611917
##   11  0.5519344  0.8178468  0.4777818
##   12  0.5572327  0.8253967  0.4820764
##   13  0.5587074  0.8345198  0.4788077
##   14  0.5869497  0.8309002  0.4998607
##   15  0.5949151  0.8369644  0.5142122
##   16  0.6090828  0.8410554  0.5257927
##   17  0.6261019  0.8322213  0.5506176
##   18  0.6560881  0.8039646  0.5795620
##   19  0.6904664  0.7902431  0.6047368
##   20  0.7291519  0.7881980  0.6467608
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 2.
plot(knnFit)

9 Model Performances

The performance of your model can be evaluated by calculating a few summary statistics of its predictions.

  1. Mean Absolute Error (MAE)
    • Average absolute difference between the observed values and the model predictions
  2. Root Mean Squared Error (RMSE)
    • Square root of the average squared difference between the observed values and the model predictions, so large errors weigh more heavily than in the MAE
  3. R-squared (R²)
    • Proportion of information in the data that is explained by the model
    • Is a correlation measure, not a measure of accuracy.
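
The three metrics that caret reports (RMSE, Rsquared and MAE) can be computed by hand in a few lines. The observed and predicted values below are hypothetical. R² is computed as the squared correlation between observations and predictions, which is how caret computes it by default.

```r
# Hypothetical observed values and predictions, for illustration only
obs  <- c(3.2, 4.1, 5.0, 5.9, 6.4)
pred <- c(3.0, 4.4, 5.1, 5.5, 6.6)

mae  <- mean(abs(obs - pred))        # Mean Absolute Error
rmse <- sqrt(mean((obs - pred)^2))   # Root Mean Squared Error
r2   <- cor(obs, pred)^2             # squared correlation

print(c(MAE = mae, RMSE = rmse, Rsquared = r2))

# Predictions that are consistently 2 units too high are perfectly
# correlated with the observations, yet far from accurate
shifted      <- obs + 2
r2_shifted   <- cor(obs, shifted)^2
rmse_shifted <- sqrt(mean((obs - shifted)^2))
print(c(RMSE = rmse_shifted, Rsquared = r2_shifted))
```

The second part shows why R² should not be read as accuracy: the shifted predictions get a perfect R² while the RMSE is 2.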

10 Model Comparison

allResamples <- resamples(
  list(
    "Linear Reg" = lmFit,
    "PLS" = plsFit,
    "PCR" = pcrFit,
    "Ridge" = ridgeFit,
    "LASSO" = lassoFit,
    "Elastic Net" = enetFit,
    "Neural Net" = nnetFit,
    "MARS" = marsFit,
    "SVM" = svmFit,
    "KNN" = knnFit
  )
)

gridExtra::grid.arrange(
  parallelplot(allResamples, metric = "RMSE", main = "RMSE of each fold for each model"),
  parallelplot(allResamples, metric = "Rsquared", main = "R² of each fold for each model"),
  ncol = 2
)

11 Final Notes

In the end, the final model that you choose to work with depends on a few factors. According to the comparison above, the neural network is the best choice for this particular example dataset. It is consistent across the folds and gives very accurate predictions. However, it is harder to interpret and it takes a lot more time to train than the other models.

For this reason, we might also select the next best thing, i.e. Ridge Regression, Lasso, Elastic Net or even Linear Regression. These models are less consistent, yet they still perform well, require a lot less time and are far easier to understand and explain for every party involved (think clients, managers etc.).