
The Future of Machine Learning and why it looks a lot like Julia 🤖



TLDR; the future of ML is Julia. If you are looking for a quick answer, there you have it. If you want the well-reasoned explanation, stick around. And don’t just take my word for it:

(To summarize the post above: PyTorch devs want all the features Julia provides, but they don’t want to re-write PyTorch in Julia yet because the ecosystem isn’t mature enough.)

Overview 🌄

First, let me address the elephant in the room: the Julia ecosystem is not as mature as the Python ecosystem, yet. This is the present reality, but as you might discover in this post, that does not mean you can’t use Julia successfully in your machine learning workflows.

Below, I will give a brief overview of the Julia Machine Learning and Deep Learning ecosystem and then talk about why you might want to learn Julia for your ML and DL workflows. We will explore the following ideas which I think make Julia a prime candidate for use in the ML space:

  • You get to use Julia, so you get all the benefits Julia provides 🧑‍🤝‍🧑
  • Flexible and Extensible 💪
  • It’s (actually) open source 👐
  • Easy to use the internals 🍳
  • ML + Science = breakthrough results 🧬🧪
  • And more!

If you want to do Machine Learning in Julia, there are a few places you should look: Flux.jl, Knet.jl, and MLJ.jl.

In general, during this post we will focus on Flux.jl as a Deep Learning framework, but that is not to say Knet and MLJ are not well-made or useful packages. I simply have less experience using them, so I will save my impressions for a later post.

Use Julia for ML, because, well, you get to use Julia

As a language, Julia is designed to enable developer productivity. From the package manager to the speed of running code, all of these features lead to a developer experience which is bringing in whole swaths of new developers.

I mentioned this in a previous post, but because of the way Julia handles things like multi-dimensional arrays, you can avoid using packages like Numpy (which you would sometimes see in Tensorflow) and Tensors (which are used in PyTorch and are basically re-branded Numpy arrays).

Here we can see how using Julia results in less mental overhead. In the case of Python, we can do:

import tensorflow as tf

tf.ones([3, 4], tf.int32)

<tf.Tensor: shape=(3, 4), dtype=int32, numpy=
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int32)>

And then in Julia we would do something like:

julia> using Flux

julia> ones(Float32, 1, 2, 3)
1×2×3 Array{Float32, 3}:
[:, :, 1] =
 1.0  1.0

[:, :, 2] =
 1.0  1.0

[:, :, 3] =
 1.0  1.0

In the latter case, we are working with just basic arrays, which, at least in my opinion, makes the code a little bit more intuitive.

Another point that was mentioned in the above post that bears re-iterating is that the rest of the packages you would use in a Data Science / Machine Learning workflow are blazing fast in Julia. Loading CSVs is 10–20x faster than in Python, and DataFrames.jl (the analog to Pandas) also has best-in-class speed and performance (especially for a rather “young” package that just hit its 1.0 release in 2021).
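
To make that concrete, here is a minimal sketch of that workflow (the file name is just a placeholder):

using CSV, DataFrames

# CSV.jl reads the file (multi-threaded) straight into a DataFrame
df = CSV.read("data.csv", DataFrame)

first(df, 5)   # peek at the first five rows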

If you are not yet convinced at the power of Julia and that it will enable you to be more productive, do a quick search for “Why should I switch to Julia” and you will find plenty of other (hopefully compelling) literature that goes into more depth than I want to here.

Why Flux? 🤔

I tried to answer that in this 1 minute video:

I also encourage you to check out https://fluxml.ai for a high level overview.

Simple, Flexible, and Extensible 💪

An extensible system is one whose internal structure and dataflow are minimally or not affected by new or modified functionality… (via Wikipedia)

Both Flexibility and Extensibility are front of mind in the design of Flux. For starters, Flux can be used to write models directly but can also be used as a backend for packages like FastAI.jl. Additionally, Flux tries to keep a minimal yet useful API such that developers have the freedom to use what is provided or build on it with their own custom functions which integrate into the Flux ecosystem automatically.
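
For example, a user-defined layer is just a plain Julia struct with a call method, and Flux’s @functor macro makes it trainable alongside the built-in layers. A minimal sketch (the Scale layer here is invented purely for illustration):

using Flux

# A custom "layer" is just a struct whose fields hold the parameters
struct Scale
    s::Vector{Float32}
end

# The forward pass: element-wise scaling of the input
(l::Scale)(x) = l.s .* x

# Mark the struct so Flux can find and train its parameters
Flux.@functor Scale

# The custom layer composes with built-in layers like any other
model = Chain(Dense(4, 4, relu), Scale(rand(Float32, 4)))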

If you have ever spent any significant amount of time exploring https://www.tensorflow.org/api_docs/python/tf, you will have discovered like I did that there is a lot going on (just take a look on the left hand side at all of the modules). Conversely, Flux focuses on keeping the amount of code and documentation in Flux itself minimal. Take a quick peek at the docs if you haven’t: https://fluxml.ai/Flux.jl/stable/

An example of just how simple Flux is to use:

using Flux
model = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)

Here we define a simple model with 3 layers: 2 dense layers (one using the sigmoid activation function) and a softmax layer.
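
As a quick sanity check, we can already run this model on random data (the 10 matches the input size of the first Dense layer, and a batch of 3 is an arbitrary choice):

x = rand(Float32, 10, 3)   # a batch of 3 random samples, 10 features each
y = model(x)               # 2×3 output; each column sums to 1 thanks to softmax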

In Tensorflow, this would look like:

import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(10, activation='sigmoid'))
model.add(tf.keras.layers.Dense(5))
model.add(tf.keras.layers.Dense(2, activation="softmax"))

In the case of Tensorflow, we can see that because we use the Sequential() model, we don’t actually specify the input size of the layers. The sequential nature of the model sets the output of one layer to be the input of the following layer, so the input sizes are inferred.

While libraries like PyTorch do give instructions on how to extend the library, as noted in the docs (https://pytorch.org/docs/stable/notes/extending.html), there are far more limitations than in Flux.

It’s (actually) open source 👐

I don’t want to harp on this topic too long, but projects claiming to be “Open” when a majority of contributors are concentrated at a single institution are always a worry to me. In fact, this is one of the things we look at when we evaluate new projects joining NumFOCUS (the non-profit behind Jupyter, Pandas, Numpy, Julia, Flux, etc). To me, a core feature of Flux is that it is not being developed by Google (in Jax and Tensorflow’s case) or Facebook (in PyTorch’s case). While there is certainly a whole host of benefits of having that backing (like lots of money to pay people 😄), if I was a user making a decision about what framework I want to build my company or project around, my preference would be one where I could be an actual stakeholder. In the case of Flux, if you want to start contributing and being a stakeholder, we would love to have you. You can jump in and start contributing to the ecosystem: https://github.com/FluxML/Flux.jl/blob/master/CONTRIBUTING.md or drop by a bi-weekly dev call: https://julialang.org/community/#events. I will again note that this isn’t meant to be a slight at the Tensorflow or PyTorch community since I know many valuable contributions come from outside the host institution, but at a high level I think the idea is worth keeping in mind.

Easy to use the internals 🍳

If you are wondering about the emoji choice here, it is because cooking eggs is easy, almost as easy as using Flux’s internals. Now that we have settled that mystery, let’s dive in and look at why Flux’s internals are so easy to use.

Spoiler, it is because they are written in Julia. And when I say written in Julia, I mean 100% written in Julia. Take a peek at: https://github.com/FluxML/Flux.jl if you don’t believe me.

Image captured by author from the Flux.jl GitHub

The image above is in stark contrast to Tensorflow and PyTorch:

Image captured by author from the Tensorflow GitHub

Image captured by author from the PyTorch GitHub

Your eyes are not deceiving you: PyTorch and Tensorflow, if you are not aware, are both written mainly in C++ with a Python wrapper for users. This means that if you want to take a peek under the hood, you have to be comfortable not only with C++ and Python but also with the way in which they interact. I am still scarred from having to use pointers in my freshman year of college, so if I can avoid C++, I do. If we look at an example of some of the internals of Flux, as noted before, it is Julia, just plain old Julia:

function onehot(x, labels, default)
    i = something(findfirst(isequal(x), labels), 0)
    i > 0 || return onehot(default, labels)
    OneHotVector{UInt32, length(labels)}(i)
end

Here we are looking at one of the definitions of a onehot encoding in Flux. I don’t see any pointers there, which is a relief. I will avoid showing any C++ code here just to make a point since I don’t want to scare you off, but if you are bold, go check out some of the internals.
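
For a sense of how it is used, here is a rough usage sketch (the label set is arbitrary, and the three-argument form with a default comes from the method shown above):

using Flux

# One-hot encode :c against the label set [:a, :b, :c]
v = Flux.onehot(:c, [:a, :b, :c])      # Bool vector equivalent to [0, 0, 1]

# With a default: an unknown label falls back to :a instead of erroring
w = Flux.onehot(:z, [:a, :b, :c], :a)  # equivalent to [1, 0, 0]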

One of the side effects of the easy to use internals and 100% Julia code is that users of a package can very quickly become developers and critical contributors. I have seen this happen for myself as I have used Flux over the last few months. While contributing is still difficult, it is without a doubt easier to jump in than it would be in the case of other frameworks.

A basic example of this is: https://github.com/FluxML/Flux.jl/pull/1801 where I found it odd that calling the gpu function on some code did not provide any warning or indication that the code was in fact not being run on the GPU.
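
For context, the gpu call in question is the standard way to move a model onto the GPU; a minimal sketch (assuming CUDA.jl is installed):

using Flux

model = Dense(10, 5)

# Moves the parameters to the GPU when CUDA is functional; when it is not,
# the model quietly stays on the CPU, which is what the linked PR addresses
gpu_model = model |> gpu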

ML + Science = breakthrough results 🧬🧪

One of the areas that Flux excels is in the combination of it and other packages to create state of the art results. One example of this is https://github.com/SciML/DiffEqFlux.jl which provides:

Universal neural differential equations with O(1) backprop, GPUs, and stiff+non-stiff DE solvers, demonstrating scientific machine learning (SciML) and physics-informed machine learning methods

Now, unless you are Chris Rackauckas, you might be asking yourself what that even means. You are not alone, friend. When I read that for the first time, my brain nearly shut down for good. At a high level, ordinary differential equations (ODEs) are used to model natural processes like population growth and decay, glucose absorption by the body, and even the spread of epidemics. Neural ODEs allow us to create a more robust system that does not rely on a fixed number of preset layers in a model. While this can lead to lower performance, the result in many cases is higher speed. You can read this paper for details: https://arxiv.org/abs/1806.07366 or alternatively this Medium post: https://towardsdatascience.com/differential-equations-as-a-neural-network-layer-ac3092632255

DiffEqFlux.jl provides the following features:

- Neural networks can be defined where the “activations” are nonlinear functions described by differential equations

- Neural networks can be defined where some layers are ODE solves

- ODEs can be defined where some terms are neural networks

- Cost functions on ODEs can define neural networks

Another good resource to check out is the release blog post: https://julialang.org/blog/2019/01/fluxdiffeq/

I will be honest and say that I have yet to get my hands dirty with DiffEqFlux. It is one of the things on my todo list, but the big takeaway for me is that Flux is enabling this sort of innovation, which I find fascinating.
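
To give a flavor of what the API looks like, here is a rough, untested sketch based on the DiffEqFlux documentation; the network shape, time span, and solver choice are all arbitrary assumptions on my part:

using DiffEqFlux, OrdinaryDiffEq, Flux

# A small network that parameterizes du/dt = f(u)
dudt = Chain(Dense(2, 16, tanh), Dense(16, 2))

# Wrap it as a neural ODE, solved with Tsit5 over t in [0, 1]
node = NeuralODE(dudt, (0.0f0, 1.0f0), Tsit5(), saveat = 0.1f0)

u0 = Float32[1.0, 0.0]
sol = node(u0)   # forward pass: solve the ODE with the current network weights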

I suggest you check out one of the many great talks Chris has given on the topic:

I would also suggest checking out:

where Dhairya Gandhi goes over many of the places where Flux is being used in the scientific community / ecosystem.

Distributed and Super Computing 🔀

Under active development is DaggerFlux.jl, which will provide model parallelism (the ability to train a model on different devices / nodes in parallel), something that is critically important to ensure that Flux remains competitive in the ML arms race.

There has also been a lot of active work in getting Julia set up to work on TPUs (Google’s custom ML hardware): https://github.com/JuliaTPU/XLA.jl though the project seems to be in a holding pattern at the moment. Despite this, it is still an area where Julia can be used.

Benchmarking Flux and Tensorflow 🪑

Just as a simple point of comparison, let’s look at how long it takes to do a basic gradient in Flux vs Tensorflow. In Flux, we will do the following:

julia> using Flux

julia> f(x) = 3x^2 + 2x + 1;  # define our function

julia> @time df(x) = gradient(f, x)[1];
  0.006222 seconds (995 allocations: 73.773 KiB, 44.35% compilation time)  # longer time on the first run, Julia is compiled

julia> @time df(x) = gradient(f, x)[1];
  0.000241 seconds (20 allocations: 1.234 KiB)

julia> @time df(2)
  0.000001 seconds
14.0

julia> @time df(2)
  0.000000 seconds
14.0

Now, let’s look at an example in Tensorflow using Gradient tapes:

import tensorflow as tf
import time

start = time.time()

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x**2

# dy = 2x * dx
dy_dx = tape.gradient(y, x)
dy_dx.numpy()

end = time.time() - start
print(end)  # 0.002371072769165039

I will note that I had to run the Python example on Google Colab since my Tensorflow installation broke midway through this tutorial and my normal go-to guide for installing it on the M1 resulted in an entire terminal buffer filled with red text and errors. Perfect time to mention that Julia runs natively on the M1 Mac and many other platforms: https://julialang.org/downloads/#current_stable_release

Again, at a high level, this comparison seems relevant since, under the hood, machine learning is just taking gradients.

Where Flux needs improvements (and how you can help) 🆘

One of the biggest areas that Flux is behind other ML libraries is in community written content. If I want to do something in TF or PyTorch, it really is usually one search away. This is something we are working on in the Flux community. If you are interested in contributing to this work, which it is worth noting is extremely high impact, please check out: https://github.com/FluxML/fluxml.github.io/issues/107

If you want to get involved beyond just writing tutorials, the contributing guide here: https://github.com/FluxML/Flux.jl/blob/master/CONTRIBUTING.md is a good place to start.

Comparison: Transfer Learning 🧠

Transfer learning is one of the coolest things about Machine Learning. We can take models built for a specific use-case and fine-tune them to fit a new use-case. Let us compare the code for transfer learning between Tensorflow and Julia. You can find the Julia code on GitHub under https://github.com/logankilpatrick/DeepLearningWithJulia/blob/main/src/transfer_learning.ipynb and a Tensorflow example here: https://www.tensorflow.org/tutorials/images/transfer_learning

One of the big differences that is worth calling out is the way that data, and specifically images, are handled in each framework. In my opinion, the actual machine learning is easy and rather trivial in many cases. The challenge, programmatically speaking, is to get your data cleaned and into a form in which it can be fed to a machine learning model. In this case, we can look at Tensorflow:

train_dataset = tf.keras.utils.image_dataset_from_directory(train_dir, shuffle=True, batch_size=BATCH_SIZE, image_size=IMG_SIZE)

To me, a function like this is what makes using frameworks like Tensorflow so easy. PyTorch also has many similar functions which make working with data simple. I think that Flux could benefit greatly from a unified interface for loading images and other common forms of data in ML.
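
To be fair, Flux does ship a generic DataLoader for batching and shuffling once your data is already in array form; here is a minimal sketch with toy (non-image) data:

using Flux

# Toy data: 100 samples with 10 features each, plus one-hot labels
X = rand(Float32, 10, 100)
y = Flux.onehotbatch(rand(1:2, 100), 1:2)

# Batches of 10, reshuffled every epoch
loader = Flux.DataLoader((X, y), batchsize = 10, shuffle = true)

for (xb, yb) in loader
    # xb is 10×10 and yb is 2×10 in each iteration; train on them here
end

What is missing compared to Tensorflow is the image-folder-to-dataset step, which is what the hand-written code below handles.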

If we look at the Julia example, we are writing the loading code from scratch (which I have documented). Again, this is more difficult than most of the other code used in the transfer learning example.

const DOGS = filter(x -> occursin("dog", x), FILES)
const CATS = filter(x -> occursin("cat", x), FILES)

function load_batch(batchsize = 10, imagesize = (224, 224); path = PATH)
    if ((batchsize % 2) != 0)
        print("Batch size must be an even number")
    end

    imgs_paths = shuffle(vcat(sample(DOGS, Int(batchsize / 2)), sample(CATS, Int(batchsize / 2))))

    labels = map(x -> occursin("dog.", x) ? 1 : 2, imgs_paths)
    labels = Flux.onehotbatch(labels, [1, 2])

    imgs = Images.load.(imgs_paths)
    imgs = map(img -> Images.imresize(img, imagesize...), imgs)
    imgs = map(img -> permutedims(channelview(img), (3, 2, 1)), imgs)
    imgs = cat(imgs..., dims = 4)

    Float32.(imgs), labels
end

Another difference in the code that we see is in image augmentations:

data_augmentation = tf.keras.Sequential([
tf.keras.layers.RandomFlip('horizontal'),
tf.keras.layers.RandomRotation(0.2),
])

Here we can see the TF code to do a random flip and rotation. In the case of Julia, we would need to use: https://github.com/Evizero/Augmentor.jl

julia> using Augmentor

julia> pl = FlipX(0.5) |> Rotate(0:20) # define the augmentation pipeline

julia> img_new = augment(img, pl) # here we apply the augmentation

Again, from a usability standpoint, there would be benefits in my view to having image augmentation as part of the core package but the idea of extracting it out to keep things minimal is understandable.

Lastly, let us look at the model definition itself in Julia:

model = Chain(
    resnet[1:end-2],
    Dense(2048, 1000, σ),
    Dense(1000, 256, σ),
    Dense(256, 2),
    softmax
);
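
Note that resnet here is a pre-trained network from Metalhead.jl; a rough sketch of how it might be obtained is below, but the exact constructor and field names have changed across Metalhead versions, so treat this as an assumption and see the linked notebook for the real code:

using Metalhead

# Load a pre-trained ResNet-50 and take its underlying Chain of layers
# (API varies by Metalhead.jl version)
resnet = ResNet(50; pretrain = true).layers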

Here we take a pre-trained ResNet and grab all its layers except the last two, which we replace with our new fine-tuned layers. In Tensorflow, we would do something to the effect of:

base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
inputs = tf.keras.Input(shape=(160, 160, 3))
x = data_augmentation(inputs)
x = preprocess_input(x)
x = base_model(x, training=False)
x = global_average_layer(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = prediction_layer(x)
model = tf.keras.Model(inputs, outputs)

This is one of the ways of creating a model in TF (and Flux has a similar approach) where you define the inputs and outputs instead of just stacking the model layers, but I prefer to create the model using the Chain function format we saw in the Julia example.

Integrating Tensorflow or PyTorch into Julia 🤯

While Tensorflow.jl is no longer being actively developed, you can still use Tensorflow (and any Python code, for that matter) in Julia. Torch.jl is also actively maintained: https://github.com/FluxML/Torch.jl which provides a wrapper of PyTorch’s C++ code in Julia.
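
For example, with PyCall.jl you can import and call Python libraries directly from Julia; a minimal sketch (assuming Tensorflow is installed in the Python environment PyCall is linked against):

using PyCall

# Import Python's Tensorflow through PyCall and call it from Julia
tf = pyimport("tensorflow")
t = tf.ones((3, 4))   # a TensorFlow tensor created from Julia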

If you are interested in reading more about integrating Python and Julia together, check out:

Concluding thoughts 🎬

My goal with this post was to convince someone who is unsure about using Julia for Machine Learning that it is worthwhile to try it. I also hope that my mention of areas where the Flux ecosystem needs further development sets the right expectation that things are not perfect, but they are definitely at the point where you can do serious science and ML.

If you have comments on this post, please get in touch with me: https://twitter.com/OfficialLoganK or if you want to help out on Flux, we would love to have you!

