
Python in Computationally-Intensive Areas: Machine Learning

 4 years ago
source link: https://towardsdatascience.com/python-in-computationally-intensive-areas-machine-learning-faa16888efc0?gi=afd00f7ef918


Even the most ingenious learning algorithm will not suffice if it never completes.

Machine learning tends to be a computationally-intensive task for many practical use cases. It is vital that your learning algorithm performs well, or at the very least, completes.

Do not get me wrong, there are many practical algorithms and ideas that arise from Computational Learning Theory.

In fact, I have written about a few of them: Defining Goodness in Machine Learning Algorithms and What To Do If Learning Fails.

If performance is key in practical machine learning use cases, why is Python one of the most commonly used languages in data science?

Introduction

Let us set the scene before we dive into answering this question. Back when I was a student in my introductory computer science course, the primary language we learned was Java (a compiled language). A year later, the same course was now teaching Python (an interpreted language) as its primary language. Why the switch?

Since Python is not the fastest language for every problem, my hypothesis is that Python is simply easier to learn and use. This hypothesis extends to data science, where people with diverse engineering backgrounds are creating great machine learning models, often through an iterative back-and-forth of prototyping and experimentation.

Still, performance is important. So let us dive into how we can use Python in a computationally-intensive area like machine learning.

Global Temperature Prediction Using a Least Squares Polynomial Fit with NumPy, SciPy and Matplotlib

Let us step through the creation of a simple Global Temperature Predictor, where we will stop along the way to discuss how Python’s libraries are key in assisting us in machine learning.

To start off, we will be using NumPy.

NumPy is an extension package to Python for multi-dimensional arrays. It is designed for scientific computation and provides a memory-efficient container with fast numerical operations. Since NumPy is mostly written in C (a very fast compiled language), it can off-load computationally intensive work to that lower layer.

Here is a quick comparison of loop performance:

[Image: loop timing comparison code (Python Machine Learning Colab Notebook)]

Python: 1000 loops, best of 3: 237 µs per loop.
NumPy: 1000000 loops, best of 3: 1.22 µs per loop.
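As a rough sketch of how such a comparison can be reproduced with `timeit` (the array size and the squaring operation here are my own choices, not necessarily what the notebook timed):

```python
import timeit

import numpy as np

size = 1000
py_list = list(range(size))
np_arr = np.arange(size)

# Pure Python: square every element in an interpreted loop.
py_time = timeit.timeit(lambda: [x ** 2 for x in py_list], number=1000)

# NumPy: the same operation vectorized, executed in compiled C.
np_time = timeit.timeit(lambda: np_arr ** 2, number=1000)

print(f"Python: {py_time / 1000 * 1e6:.2f} µs per loop")
print(f"NumPy:  {np_time / 1000 * 1e6:.2f} µs per loop")
```

The exact microsecond figures vary by machine, but the vectorized version should be faster by a wide margin.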

Aside from the performance gains, NumPy has a plethora of useful tools. Here are some of my favorites:

numpy.reshape: Gives a new shape to an array without changing its data.

numpy.copy: Performs a true copy.

numpy.flatten: Flattens an array into one dimension.

numpy.empty: Does not initialize array values to zero, and may therefore be marginally faster.

numpy.ma: Deals with (propagation of) missing data via masked arrays.

numpy.genfromtxt: Deals with missing data when loading text files.

numpy.linspace: Evenly spaces numbers over a specified interval.

numpy.clip: Trims outliers to a given range.
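A minimal sketch exercising a few of these utilities (the arrays here are illustrative, not taken from the notebook):

```python
import numpy as np

a = np.arange(6)

# numpy.reshape: new shape, same underlying data.
m = a.reshape(2, 3)
assert m.shape == (2, 3)

# numpy.copy: a true copy; modifying it leaves the original intact.
b = m.copy()
b[0, 0] = 99
assert a[0] == 0

# numpy.flatten: collapse back to one dimension.
assert m.flatten().shape == (6,)

# numpy.linspace: evenly spaced numbers over an interval.
xs = np.linspace(0.0, 1.0, 5)
# -> [0.0, 0.25, 0.5, 0.75, 1.0]

# numpy.clip: trim outliers into a range.
clipped = np.clip(np.array([-5, 0, 5, 50]), 0, 10)
# -> [0, 0, 5, 10]
```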

We will begin by using sample data before we get to the real data. Here we are generating temperature data as a function of month of the year.

We can use Matplotlib, a Python library, to easily visualize our data.
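A sketch of what the sample-data step might look like; the sinusoidal formula, noise level, and month encoding below are my own illustrative choices, not necessarily the notebook's:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere
import matplotlib.pyplot as plt

# Hypothetical sample data: monthly mean temperatures following a
# sinusoidal seasonal cycle plus a little random noise.
rng = np.random.default_rng(0)
months = np.arange(1, 13)
temps = 15 + 10 * np.sin(2 * np.pi * (months - 4) / 12) + rng.normal(0, 1, 12)

plt.plot(months, temps, "o-")
plt.xlabel("Month")
plt.ylabel("Temperature (°C)")
plt.title("Sample monthly temperatures")
plt.savefig("sample_temps.png")  # plt.show() in a notebook
```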

[Image: plot of the sample temperature data (Python Machine Learning Colab Notebook)]

We can then use SciPy to fit our data to a periodic function using the optimize library. No need to reinvent the wheel here.

scipy: A scientific toolkit for linear algebra, interpolation, optimization and fitting, statistics and random numbers, numerical integration, fast Fourier transforms, signal processing, and image manipulation.
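A hedged sketch of fitting a periodic function with scipy.optimize.curve_fit; the seasonal model and the synthetic observations below are my own stand-ins for the notebook's data:

```python
import numpy as np
from scipy import optimize

# Hypothetical periodic model: a yearly sine cycle around a mean temperature.
def seasonal(t, mean, amp, phase):
    return mean + amp * np.sin(2 * np.pi * t / 12 + phase)

months = np.arange(1, 13)
temps = seasonal(months, 15.0, 10.0, 0.5)  # synthetic "observations"

# curve_fit runs nonlinear least squares and returns the best-fit
# parameters together with their covariance matrix.
params, _ = optimize.curve_fit(seasonal, months, temps, p0=[10.0, 5.0, 0.0])
fitted = seasonal(months, *params)
```

The initial guess `p0` matters for periodic models: a poor starting phase or amplitude can leave the optimizer in a bad local minimum.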

[Images: curve-fitting code and the fitted periodic curve (Python Machine Learning Colab Notebook)]

Let us extend the idea for our global temperature model to real data. NumPy can easily load the NASA GLOBAL Land-Ocean Temperature Index (in 0.01 degrees Celsius, base period 1951–1980). Some of this data has NaN values, but NumPy can handle this without our assistance.
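A sketch of the loading step with numpy.genfromtxt; the inline rows below are hypothetical stand-ins for the NASA file, which is not reproduced here:

```python
import io

import numpy as np

# Hypothetical rows in the shape of the GISTEMP table: a year column
# followed by monthly anomaly values, with "***" marking missing entries.
raw = io.StringIO(
    "Year,Jan,Feb,Mar\n"
    "1951,-7,12,***\n"
    "1952,3,***,8\n"
)

# genfromtxt turns unparseable fields into NaN instead of raising,
# so missing data propagates instead of crashing the load.
data = np.genfromtxt(raw, delimiter=",", skip_header=1)
print(np.isnan(data).sum())  # the two missing values became NaN
```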

[Image: data-loading code (Python Machine Learning Colab Notebook)]

We can plot a heat map with Matplotlib to get intuition about the trends in our data.
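A minimal heat-map sketch using Matplotlib's imshow; the anomaly matrix here is random stand-in data, not the NASA index:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere
import matplotlib.pyplot as plt

# Hypothetical anomaly matrix: one row per year, one column per month.
rng = np.random.default_rng(1)
anomalies = rng.normal(0.0, 1.0, size=(10, 12))

plt.imshow(anomalies, aspect="auto", cmap="coolwarm")
plt.xlabel("Month")
plt.ylabel("Year index")
plt.colorbar(label="Temperature anomaly")
plt.savefig("heatmap.png")
```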

[Image: temperature heat map (Python Machine Learning Colab Notebook)]

We can flatten our data with NumPy so that we have a simple one-dimensional series to work with. Next, we split the data into train and test sets, preserving the order so that we can make predictions on the “future”. Error will be calculated as the squared distance of the model’s prediction to the real data.
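The flatten, ordered-split, and error steps can be sketched as follows (the data matrix and the 80/20 split ratio are my own assumptions):

```python
import numpy as np

# Hypothetical years-by-months matrix standing in for the loaded data.
rng = np.random.default_rng(2)
anomalies = rng.normal(0.0, 1.0, size=(10, 12))

# Flatten into a single chronological series.
series = anomalies.flatten()
t = np.arange(series.size)

# Ordered split: train on the past, test on the "future".
# (No shuffling, unlike a typical random train/test split.)
split = int(0.8 * series.size)
t_train, t_test = t[:split], t[split:]
y_train, y_test = series[:split], series[split:]

def squared_error(pred, actual):
    """Sum of squared distances between predictions and the real data."""
    return np.sum((pred - actual) ** 2)
```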

[Images: data preparation code (Python Machine Learning Colab Notebook)]

Finally, we can perform training and plot our graph. Here we train a least squares polynomial fit, where the outcome is the polynomial that minimizes the sum of the squared distances between the model’s predictions and the real data. Its coefficients uniquely define the model we use for predictions. In the graph below, the higher-degree polynomial performs best on the test set.
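A sketch of the degree comparison using numpy.polyfit, itself a least squares polynomial fit, as a stand-in for the notebook's exact call; the upward-trending series is synthetic, and which degree wins on the test set depends on the data:

```python
import numpy as np

# Hypothetical trending series standing in for the temperature data.
rng = np.random.default_rng(3)
t = np.arange(100, dtype=float)
y = 0.05 * t + 0.001 * t ** 2 + rng.normal(0, 0.5, t.size)

split = 80  # train on the past, test on the "future"
test_errors = {}
for degree in (1, 2, 3):
    # polyfit returns the coefficients that minimize the sum of
    # squared residuals on the training portion.
    coeffs = np.polyfit(t[:split], y[:split], degree)
    pred = np.polyval(coeffs, t[split:])
    test_errors[degree] = np.sum((pred - y[split:]) ** 2)
    print(f"degree {degree}: test error {test_errors[degree]:.2f}")
```

On this synthetic quadratic trend, the degree-1 fit undershoots the curvature when extrapolating, so the higher-degree fits do better on the held-out portion.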

[Images: training code and prediction graphs (Python Machine Learning Colab Notebook)]

Conclusion

What Python lacks in performance, it makes up for in ease of use with its robust libraries. In addition, these libraries often recover the performance Python itself lacks.

Please see the linked Colab Notebook for the associated Python source code.

References

http://scipy-lectures.org

Building Machine Learning Systems with Python by Willi Richert and Luis Pedro Coelho

