5 Books That Will Teach You the Math Behind Machine Learning

A guide to the beautiful world of mathematics for machine learning

May 24 · 5 min read

After the explosive growth of open source machine learning and deep learning frameworks, the field is more accessible than ever. As a result, machine learning went from a tool for researchers to a widely adopted method, fueling the rapid growth of technology we are experiencing now. Understanding how the algorithms really work can give you a huge advantage in designing, developing, and debugging machine learning systems. Because of its mathematical nature, this task can seem daunting to many. However, it does not have to be that way.

From a high level, there are four pillars of mathematics in machine learning.

Linear algebra
Probability theory
Multivariate calculus
Optimization theory

It takes time to build a solid foundation in these and to understand the inner workings of state-of-the-art machine learning algorithms such as convolutional networks, generative adversarial networks, and many others. This won’t be an afternoon project, but provided that you consistently dedicate time to it, you can go pretty far in a short amount of time. There are some great resources to guide you along the way. In this post, I have selected the five that were most helpful for me.

Linear Algebra Done Right by Sheldon Axler


Linear algebra is a beautiful but tough subject for beginners if it is taught the “classical” way, which is determinants and matrices first, vector spaces later. However, when it is done the other way around, it is surprisingly intuitive and clear. This book presents linear algebra in a very friendly and insightful way. I wish I had learned it from this book, instead of the old way.

You can find the author’s page about the book here.

Probability: For the Enthusiastic Beginner by David Morin


Most machine learning books don’t introduce probability theory properly and they use confusing notation, often mixing up density functions and discrete distributions. This can be very difficult to get through without a solid background in probability.

This book will provide you with just that: a detailed, mathematically correct, yet user-friendly introduction to the subject. It is suitable for learners without any previous exposure to probability.
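To make the difference between density functions and discrete distributions concrete, here is a minimal sketch in plain Python. It is not taken from the book; the function names `uniform_pdf` and `die_pmf` are made up for this illustration. It shows that a density value can be larger than 1, because only the area under a density is a probability, while every value of a discrete distribution is a probability itself.

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def die_pmf(k):
    """Probability mass function of a fair six-sided die."""
    return 1.0 / 6.0 if k in range(1, 7) else 0.0


# A density value is not a probability: on [0, 0.5] the density equals 2.
density_value = uniform_pdf(0.25, 0.0, 0.5)

# Probabilities come from integrating the density.
# Here the integral over [0, 0.5] is approximated with a midpoint Riemann sum.
n = 1000
width = 0.5 / n
total_area = sum(uniform_pdf((i + 0.5) * width, 0.0, 0.5) * width for i in range(n))

# Each pmf value is a probability, and the values sum to 1.
total_mass = sum(die_pmf(k) for k in range(1, 7))

print(density_value, total_area, total_mass)
```

Both `total_area` and `total_mass` come out at (approximately) 1, even though the density itself takes the value 2 everywhere on its support.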

If you want to learn what probability really is, I wrote an introduction to probability from a more abstract perspective.

The mathematical foundations of probability

A measure-theoretic introduction

towardsdatascience.com

Multivariate Calculus by Denis Auroux (from MIT OpenCourseWare)

I have cheated a little bit here, since this is not a book but an actual university course on multivariate calculus at MIT, recorded and made available to the public. Out of all the resources I know, this is by far the best introduction to the subject. It doesn’t hurt to have a background in univariate calculus, but the lectures can be followed without it as well.

You can find the full course here.

One thing this course doesn’t cover well is the gradient descent algorithm, which is fundamental for neural networks. If you would like to learn more about this, I wrote an introductory post on the subject, which explains gradient descent from scratch.

The mathematics of optimization for deep learning

A brief guide about how to minimize a function with millions of variables

towardsdatascience.com
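For a first impression of the gradient descent algorithm mentioned above, here is a minimal one-dimensional sketch in plain Python. It is an illustration under simple assumptions, not the method from the course or from the linked post: the helper `gradient_descent`, the example function f(x) = (x - 3)^2, the learning rate, and the step count are all chosen just for this example.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# Hypothetical example: f(x) = (x - 3)^2, whose derivative is 2 * (x - 3).
def grad_f(x):
    return 2.0 * (x - 3.0)


minimum = gradient_descent(grad_f, x0=0.0)
print(minimum)  # very close to 3, the minimizer of f
```

Each step shrinks the distance to the minimizer by a constant factor here, which is why a fixed learning rate works for this simple function. Neural networks apply the same update to millions of parameters at once.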

Grokking Deep Learning by Andrew Trask


This book is probably my favorite on this list. I love all of them, but if you only have time to read one, read this one.

It contains a complete hands-on introduction to the inner workings of neural networks, with code snippets covering all of the material. Even though the book is not specifically geared towards advanced mathematics, by the end of it you’ll know more about the mathematics of deep learning than 95% of data scientists, machine learning engineers, and other developers.

You’ll also build a neural network from scratch, which is probably the best learning exercise you can undertake. When I was starting out with machine learning, I also built a convolutional network from scratch in pure NumPy. If you are interested, I wrote a detailed guide on how to do it yourself.

How to build a DIY deep learning framework in NumPy

Understanding the fine details of neural nets by building one from scratch

towardsdatascience.com

Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville


This is where all of the theory you have learned comes together. Written by some of the greatest minds in machine learning, this book synthesizes the mathematical theory and puts the heavy machinery to use, providing a solid guide to state-of-the-art deep learning methods such as convolutional and recurrent networks, autoencoders, and many more.

The best part is that the book is freely available online for everyone. Given that this is the number one resource for deep learning researchers and developers, this is pretty great.

Among all of the resources I have listed here, this is probably the most difficult to read. Understanding deep learning requires you to look at the algorithms from a probabilistic perspective, which can be difficult. If you would like to learn how a problem can be translated into the language of probability and statistics, I have written a detailed guide for you, where I explain the most important details in a beginner-friendly way.

The statistical foundations of machine learning

A look beyond function fitting

towardsdatascience.com

Let’s get to learning!

As I have mentioned, you probably won’t be able to burn through all these resources in an afternoon. You’ll need to work hard, but it will pay off. Building up knowledge is the best investment, and it will give you a huge advantage in building machine learning systems. Not to mention that the theory behind machine learning is beautiful.

