
GridSearch: the ultimate Machine Learning Tool

source link: https://towardsdatascience.com/gridsearch-the-ultimate-machine-learning-tool-6cd5fb93d07?gi=e4822114ec5a


GridSearch: the ultimate Machine Learning Tool. Photo by Chris Liverani on Unsplash

Machine Learning in short

The goal of supervised Machine Learning is to build a prediction function based on historical data. This data has independent (explanatory) variables and a target variable (the variable that you want to predict).

Once a predictive model has been built, we measure its error on a separate testing data set. We do this using metrics that quantify the model's error, for example the Mean Squared Error in a regression context (quantitative target variable) or Accuracy in a classification context (categorical target variable).

The model with the smallest error is generally selected as the best model. Then we use this model to predict the values of the target variable by inputting the explanatory variables.
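To make this workflow concrete, here is a minimal sketch in scikit-learn: we split the data, fit a model on the training part, and measure its error on the test part. The synthetic dataset and the choice of Linear Regression are illustrative assumptions, not part of the original article.

```python
# Minimal sketch of the supervised learning workflow described above.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Historical data: explanatory variables X and a quantitative target y
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# Hold out a separate test set to measure the error of the fitted model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)

# The Mean Squared Error on the test set quantifies the model's error
mse = mean_squared_error(y_test, model.predict(X_test))
print(f"Test MSE: {mse:.2f}")
```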

In this article, I will deep-dive into GridSearch.

Machine Learning’s Two Types of Optimization

GridSearch is a tool used for hyperparameter tuning. As stated before, Machine Learning in practice comes down to comparing different models to each other and trying to find the best working model.

Apart from selecting the right data set, there are generally two aspects of optimizing a predictive model:

  1. Optimize the choice of the best model
  2. Optimize a model’s fit using hyperparameter tuning

Let’s now look into those to have an explanation for the need for GridSearch.

Part 1. Optimize the choice of the best model

In some datasets, there may exist a simple linear relationship that can predict a target variable from the explanatory variables. In other datasets, these relationships may be more complex or highly nonlinear.

At the same time, many models exist. This ranges from simple models like the Linear Regression, up to very complex models like Deep Neural Networks.

It is key to use a model that is appropriate for our data.

For example, if we use a Linear Regression on a very complex task, the model will not perform well. But if we use a Deep Neural Network on a very simple task, it will not perform well either!

To find a well-fitting Machine Learning model, the solution is to split data into train and test data, then fit many models on the training data and test each of them on the test data. The model that has the smallest error on the test data will be kept.
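A short sketch of this model comparison step, using scikit-learn. The candidate models and the synthetic classification data are illustrative assumptions; the point is simply fitting each candidate on the training data and keeping the one that scores best on the test data.

```python
# Sketch: fit several candidate models and keep the best one on the test set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "k_nearest_neighbors": KNeighborsClassifier(),
}

# Fit every candidate on the training data and score it on the test data
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in candidates.items()}

best_name = max(scores, key=scores.get)
print(scores, "-> keeping", best_name)
```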


A screenshot from Scikit Learn’s list of supervised models shows that there are a lot of models to try out!

Part 2. Optimize a model’s fit using hyperparameter tuning

After choosing one well-performing model (or a few), the second thing to optimize is the hyperparameters of a model. Hyperparameters are like a configuration of the training phase of the model. They influence what a model can or cannot learn.

Tuning hyperparameters can, therefore, lower the error on the test data set even more.

The estimation procedure differs from model to model, and thus each model has its own set of hyperparameters to optimize.


This extract of the documentation of Scikit Learn’s RandomForestClassifier shows numerous parameters that can all influence the final accuracy of your model.
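As a quick illustration of how much these settings matter, here is a sketch that trains the same RandomForestClassifier twice with different hyperparameter values and compares the test accuracies. The specific values and dataset are assumptions chosen for the example, not recommendations.

```python
# Sketch: the same model with different hyperparameters can reach different accuracies.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

shallow = RandomForestClassifier(n_estimators=10, max_depth=2, random_state=0)
deeper = RandomForestClassifier(n_estimators=200, max_depth=None, random_state=0)

for name, model in [("shallow", shallow), ("deeper", deeper)]:
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))
```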

One way to do a thorough search for the best hyperparameters is to use a tool called GridSearch.

What is GridSearch?

GridSearch is an optimization tool that we use when tuning hyperparameters. We define the grid of parameters that we want to search through, and we select the best combination of parameters for our data.
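In scikit-learn this is exactly what GridSearchCV does: you pass it an estimator and a grid of parameter values, and it evaluates every combination. A minimal sketch follows; the parameter values and dataset are illustrative assumptions.

```python
# Minimal GridSearchCV sketch: search a small grid of RandomForest hyperparameters.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# The grid: every combination of these values will be evaluated
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [2, 5, None],
}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```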

The “Search” in GridSearch

The hypothesis is that there is a specific combination of values of the different hyperparameters that will minimize the error of our predictive model. Our goal using GridSearch is to find this specific combination of parameters.

The “Grid” in GridSearch

GridSearch’s idea for finding this best parameter combination is simple: just test every possible parameter combination and select the best one!

Not every possible combination, though, since for a continuous scale there would be infinitely many combinations to test. The solution is to define a Grid, which specifies, for each hyperparameter, which values should be tested.


A schematic overview of GridSearch on two hyperparameters Alpha and Beta (graphics by author)

In an example case where two hyperparameters, Alpha and Beta, are tuned, we could give both of them the values [0.1, 0.01, 0.001, 0.0001], resulting in a 4 x 4 “Grid” of values. At each crossing point, GridSearch fits the model to measure the error at that point.

And after checking all the grid points, we know which parameter combination is best for our prediction.
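The grid itself is nothing more than the Cartesian product of the candidate values. The sketch below enumerates the 16 crossing points for the two hypothetical hyperparameters from the example; the fit_and_evaluate helper in the comment is purely hypothetical and stands in for whatever model fitting routine is being tuned.

```python
# Sketch: the "Grid" for two hypothetical hyperparameters, alpha and beta.
from itertools import product

alpha_values = [0.1, 0.01, 0.001, 0.0001]
beta_values = [0.1, 0.01, 0.001, 0.0001]

grid = list(product(alpha_values, beta_values))
print(len(grid), "combinations to evaluate")  # 4 x 4 = 16

# GridSearch fits and scores the model once per crossing point, e.g.:
# for alpha, beta in grid:
#     error = fit_and_evaluate(alpha=alpha, beta=beta)  # hypothetical helper
```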

The “Cross-Validation” in GridSearch

At this point, only one thing remains to be added: the Cross-Validation Error.

When testing the performance of a model for each combination of hyperparameters, there is a risk of overfitting: by pure chance, a particular hyperparameter combination may happen to work well on this specific split of the data, while its performance on new, real-life data could be much worse!

To get a more reliable estimate of the performances of a hyperparameter combination, we take the Cross Validation Error.


A schematic overview of Cross-Validation (graphics by author)

In Cross-Validation, the data is split into multiple parts, for example 5. The model is then fit 5 times, each time leaving out a different fifth of the data, and that held-out fifth is used to measure performance.

For one combination of hyperparameter values, the average of the 5 errors constitutes the cross-validation error. This makes the selection of the final combination more reliable.
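This is the quantity GridSearchCV computes internally for every grid point. For a single hyperparameter combination, it can be sketched with cross_val_score; the dataset and the chosen hyperparameter values below are illustrative assumptions.

```python
# Sketch: the cross-validation error for one hyperparameter combination.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)

# Five folds: the model is fit five times, each time scored on the held-out fifth
fold_scores = cross_val_score(model, X, y, cv=5)
print("Cross-validation accuracy:", fold_scores.mean())
```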

What makes GridSearch so important?

GridSearch allows us to find the best model for a given data set very easily. It makes the Machine Learning part of the Data Scientist’s role much easier by automating the search.

On the Machine Learning side, a few things still remain to be done: deciding on the right way to measure error, which models to try out, and which hyperparameters to test. And the most important part, the work on data preparation, is also left to the Data Scientist.

Thanks to the GridSearch approach, the Data Scientist can focus on the data wrangling work while automating the repetitive tasks of model comparison. This makes the work more interesting and allows the Data Scientist to add value where they are most needed: working with data.

A number of alternatives for GridSearch exist, including Random Search, Bayesian Optimization, Genetic Algorithms, and more. I will write an article about those soon, so don’t hesitate to stay tuned. Thanks for reading!

