
Normalize before Training

source link: https://medium.com/@sunnerli/normalize-before-training-872858e332a1


In the previous article, I normalized the training data manually. However, after checking some kernels on Kaggle, I found some interesting methods worth learning. This article records them.

Now we want to predict whether a country has a nuclear weapon. The table above shows the data CSV file, named nuclear.csv. Each row represents one country: the first attribute is the number of soldiers in that country, the second is the total population, and the last indicates whether the country has a nuclear weapon (1 means it does).

The first function I find very useful is train_test_split. It splits the whole x and y into training data and testing data. The shape of x is [row_num, feature_num], while the shape of y is [row_num].
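A minimal sketch of that split, using made-up numbers in place of the article's nuclear.csv (the actual values here are hypothetical, chosen only to match the [row_num, feature_num] shape described above):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data shaped like the article describes: x is [row_num, feature_num],
# y is [row_num]. The values are invented for illustration.
x = np.array([[100000, 700000],
              [250000, 900000],
              [50000, 300000],
              [400000, 1200000]])
y = np.array([0, 1, 0, 1])

# Hold out 25% of the rows as the testing set.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42)

print(x_train.shape)  # (3, 2)
print(y_test.shape)   # (1,)
```

Passing random_state makes the split reproducible between runs.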

The second idea worth learning is MinMaxScaler. This class scales each column into the range 0~1. The advantage of MinMaxScaler is that it makes the variation within a column more obvious.

The left side shows a simple example. A difference of 0.5 is relatively small compared to the mean (around seven hundred thousand), but the difference becomes obvious after MinMaxScaler is applied.
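The effect described above can be reproduced in a few lines (the column values are hypothetical, picked to match the seven-hundred-thousand example):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# A column whose values differ by only 0.5 around seven hundred thousand.
col = np.array([[700000.0], [700000.5], [700001.0]])

# MinMaxScaler maps the column's minimum to 0 and maximum to 1.
scaler = MinMaxScaler()
scaled = scaler.fit_transform(col)
print(scaled.ravel())  # [0.  0.5 1. ]
```

The 0.5 gap, invisible next to the raw magnitude, becomes half of the scaled range.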

MinMaxScaler is a class I often see in Kaggle kernels. On the other hand, StandardScaler is another class that can help us arrange the data: it makes the mean of each column 0 and the variance of each column 1.
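A short sketch of StandardScaler's behavior, again on invented numbers, verifying the mean-0 / unit-variance property:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented [soldiers, population] rows for illustration.
x = np.array([[100000.0, 700000.0],
              [250000.0, 900000.0],
              [400000.0, 1100000.0]])

scaler = StandardScaler()
standardized = scaler.fit_transform(x)

# Each column now has mean 0 and standard deviation 1.
print(standardized.mean(axis=0))  # ~[0. 0.]
print(standardized.std(axis=0))   # ~[1. 1.]
```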

However, both scalers have their own disadvantages. MinMaxScaler constrains the values to the range 0~1, so if the testing data falls outside the range seen during fitting, the model might produce incorrect predictions. StandardScaler simply standardizes the column values, but as a result some of them become negative, and some architectures (e.g. those using ReLU) might be sensitive to this property.

The last part is training and testing the model! I use a random forest to make the prediction, even though the result is not good enough.
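Putting the pieces together, the whole pipeline might look like the sketch below. The data is a hypothetical stand-in for nuclear.csv, and the model settings are ordinary defaults rather than the article's actual configuration. Note that the scaler is fit on the training data only, then reused on the test data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for nuclear.csv: [soldiers, population] -> has_nuke.
x = np.array([[100000, 700000], [250000, 900000], [50000, 300000],
              [400000, 1200000], [80000, 500000], [300000, 1000000]])
y = np.array([0, 1, 0, 1, 0, 1])

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, random_state=0)

# Fit the scaler on the training split only, then apply it to the test split,
# so no information from the test data leaks into the preprocessing.
scaler = MinMaxScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(x_train, y_train)
print(model.score(x_test, y_test))
```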

In conclusion, we can use train_test_split to get the corresponding sets of data, and MinMaxScaler and StandardScaler can help us normalize the features before doing further work!

