
Write Better And Faster Python Using Einstein Notation

 2 years ago
source link: https://towardsdatascience.com/write-better-and-faster-python-using-einstein-notation-3b01fc1e8641


Make your code more readable, concise, and efficient using “einsum”


When dealing with linear or multilinear algebra in Python, summation loops and NumPy functions can get quite messy, hard to read, and even slow. This was the case for me until I discovered NumPy's einsum function a while ago, and I’m surprised that not everyone is talking about it.

I am going to show you how to make your code more readable, concise, and efficient using Einstein notation in NumPy, TensorFlow, or PyTorch.

Understanding Einstein Notation

The basis of Einstein notation is to drop the summation symbol Σ whenever doing so causes no ambiguity, that is, whenever we can determine the bounds of the indices.

Example #1: Product of matrices

In the following formula, the shape of the matrix A is (m, n) and the shape of B is (n, p).

$$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$$

Since we know the bounds for i, j, and k from the shapes of the matrices, we can simplify the formula to:

$$C_{ij} = A_{ik} B_{kj}$$
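To make the implied summation concrete, here is a minimal sketch that computes the product with explicit loops. The repeated index k is the one being summed, while i and j survive into the result. The function name matmul_loops and the sample matrices are illustrative assumptions, not part of the original article.

```python
import numpy as np

def matmul_loops(A, B):
    """Compute C[i, j] = sum over k of A[i, k] * B[k, j] with explicit loops."""
    m, n = A.shape
    n_b, p = B.shape
    assert n == n_b, "inner dimensions must match"
    C = np.zeros((m, p))
    for i in range(m):          # free index i: rows of A
        for j in range(p):      # free index j: columns of B
            for k in range(n):  # repeated index k: summed over
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.arange(6, dtype=float).reshape(2, 3)   # shape (m, n) = (2, 3)
B = np.arange(12, dtype=float).reshape(3, 4)  # shape (n, p) = (3, 4)
C = matmul_loops(A, B)
```

The bounds of all three loops come straight from the shapes of A and B, which is why the Σ can be dropped without losing information.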

Example #2: Dot product of two vectors

The dot product of two n-dimensional vectors is:

$$u \cdot v = \sum_{i=1}^{n} u_i v_i$$

We can write this in Einstein notation as:

$$u \cdot v = u_i v_i$$

Example #3: Dot product of two matrices

We can define a dot product of two matrices using this formula:

$$A \cdot B = \sum_{i} \sum_{j} A_{ij} B_{ij}$$

In Einstein notation, this is simply:

$$A \cdot B = A_{ij} B_{ij}$$

Example #4: Tensors

We can work with more than two indices, that is, with tensors (higher-order generalizations of matrices).

For example, we can write something like this:

$$C_{ij} = A_{ijkl} B_{klij}$$

Or even like this:

$$C_{ijklm} = A_{iqrj} B_{klqmr}$$

You get the idea!

When to use Einstein notation?

This mostly comes up when you’re working with vectors, matrices, and/or tensors and you have to multiply, transpose, and/or sum them in a particular way.

Writing the results of combining these operations can be simpler in Einstein notation.

Using Python’s einsum

einsum is implemented in numpy, torch, and tensorflow. In all of these modules, it follows the syntax einsum(equation, *operands), with the operands passed as separate arguments after the equation string.

In the equation string, each operand gets its own group of index letters, with the groups separated by commas. After -> we put the output indices. Any index that appears in the inputs but not in the output is summed over.

If an input or output is a scalar (it has no indices), we can leave its index group empty.
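As a small sketch of the empty-index case, the example below uses NumPy's einsum with a scalar output and with a scalar input. The vector v and the factor 2.0 are arbitrary values chosen for illustration.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])

# Scalar output: nothing after "->", so the only index i is summed away.
total = np.einsum("i->", v)

# Scalar input: the first operand has an empty index group before the comma.
scaled = np.einsum(",i->i", 2.0, v)
```

Here total is the sum of the entries of v, and scaled is v multiplied elementwise by the scalar.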

Here are the examples from above, written with einsum.

Example #1: Matrix multiplication


einsum("ik,kj->ij", A, B)
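The article writes a bare einsum call; the sketch below assumes NumPy's np.einsum and random matrices with the shapes (m, n) and (n, p) used earlier, and checks the result against the @ operator.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((2, 3))  # shape (m, n)
B = rng.random((3, 4))  # shape (n, p)

# k appears in both inputs but not in the output, so it is summed over.
C = np.einsum("ik,kj->ij", A, B)

# Reference result from NumPy's matrix multiplication operator.
C_reference = A @ B
```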

Example #2: Vector dot product

einsum("i,i->", u, v)
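A short sketch, assuming np.einsum and two example vectors, of how the output indices decide what gets summed: the same pair of vectors gives a dot product or an outer product depending only on the equation string.

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# i is shared by both inputs and absent from the output: summed, scalar result.
dot = np.einsum("i,i->", u, v)

# Distinct indices that are both kept in the output: no summation, matrix result.
outer = np.einsum("i,j->ij", u, v)
```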

Example #3: Matrix dot product

einsum("ij,ij->", A, B)
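For comparison, here is a sketch that computes the same matrix dot product three ways with NumPy: with einsum, with np.vdot (which flattens its inputs), and as the trace of the product of A transposed with B. The 2×2 matrices are arbitrary example values.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

s_einsum = np.einsum("ij,ij->", A, B)  # sum over both i and j
s_vdot = np.vdot(A, B)                 # flattens A and B, then takes a dot product
s_trace = np.trace(A.T @ B)            # same quantity written as a trace
```

All three agree; the einsum version is the one that reads most like the formula.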

Example #4: Tensors

einsum("ijkl,klij->ij", A, B)


einsum("iqrj,klqmr->ijklm", A, B)
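The two tensor calls can be tried out as follows. This is a sketch assuming np.einsum; the dimension sizes are arbitrary, chosen only so that every shared index has matching length in both operands.

```python
import numpy as np

rng = np.random.default_rng(0)

# First call: k and l are summed, i and j are kept.
A = rng.random((2, 3, 4, 5))   # indices i, j, k, l
B = rng.random((4, 5, 2, 3))   # indices k, l, i, j
C = np.einsum("ijkl,klij->ij", A, B)

# Second call: q and r are summed, i, j, k, l, m are kept.
D = rng.random((2, 3, 4, 5))     # indices i, q, r, j
E = rng.random((6, 7, 3, 8, 4))  # indices k, l, q, m, r
F = np.einsum("iqrj,klqmr->ijklm", D, E)

# Reference results using transpose/sum and tensordot.
C_reference = (A * B.transpose(2, 3, 0, 1)).sum(axis=(2, 3))
F_reference = np.tensordot(D, E, axes=([1, 2], [2, 4]))
```

The reference versions need axis numbers and a transpose, whereas the einsum strings name each index directly.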

You can use this with almost any formula involving linear algebra and multilinear algebra.
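A few more everyday operations, sketched with np.einsum on an arbitrary 3×3 example matrix M: a transpose, a sum along one axis, and a trace.

```python
import numpy as np

M = np.arange(9, dtype=float).reshape(3, 3)

transposed = np.einsum("ij->ji", M)  # reorder the output indices
row_sums = np.einsum("ij->i", M)     # j is dropped from the output, so it is summed
trace = np.einsum("ii->", M)         # repeated index on one operand: diagonal sum
```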

Performance

So how does einsum perform compared to using loops or numpy functions?

I decided to run example #3 using three methods: native Python loops, built-in numpy functions, and einsum.

After running each method 1,000,000 times and timing it with timeit:

  • Loops: 24.36s
  • Built-in functions: 7.58s
  • Einsum: 3.78s

einsum is clearly faster: in this case, about twice as fast as numpy’s built-in functions and more than 6 times as fast as loops.
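The original benchmark code is not reproduced here, so the following is only a sketch of how such a comparison could be set up. The function names, the 10×10 matrix size, and the reduced repetition count are assumptions, and the timings it produces will differ from the figures above depending on hardware, array size, and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((10, 10))  # assumed size; the article does not state one
B = rng.random((10, 10))

def dot_loops(A, B):
    """Matrix dot product with native Python loops."""
    total = 0.0
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            total += A[i, j] * B[i, j]
    return total

def dot_builtin(A, B):
    """Matrix dot product with built-in NumPy functions."""
    return np.sum(A * B)

def dot_einsum(A, B):
    """Matrix dot product with einsum."""
    return np.einsum("ij,ij->", A, B)

runs = 1000  # far fewer than the article's 1,000,000, to keep the sketch quick
timings = {
    "loops": timeit.timeit(lambda: dot_loops(A, B), number=runs),
    "builtin": timeit.timeit(lambda: dot_builtin(A, B), number=runs),
    "einsum": timeit.timeit(lambda: dot_einsum(A, B), number=runs),
}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s")
```

It is worth rerunning a comparison like this on your own array sizes, since the relative ranking can shift as the operands grow.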

Why is einsum fast?

This comes down to the fact that numpy is written in C.

When using native Python loops, all the data manipulation happens in the Python interpreter.

When using built-in numpy functions, it happens in C, which offers numpy developers the ability to optimize their code. This is why numpy is faster.

But when using einsum, numpy handles the data once in C and returns only the final result, whereas chaining several numpy functions spends extra time creating and returning intermediate values.
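The intermediate value in the built-in route can be made visible by spelling out its two steps. This is an illustrative sketch with assumed 500×500 random matrices, not a statement about NumPy's internals beyond what the two calls return.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((500, 500))
B = rng.random((500, 500))

# Built-in route: two separate calls with a full-size temporary in between.
product = A * B            # intermediate array with the same shape as A
s_builtin = product.sum()  # second call reduces the temporary to a scalar

# einsum route: a single call that hands back only the final scalar.
s_einsum = np.einsum("ij,ij->", A, B)

print(f"temporary array size: {product.nbytes} bytes")
```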

einsum can prove to be a great one-liner in some situations. While it is not the only way to improve the readability and efficiency of your code, using it when possible should be a no-brainer.

There are other ways to optimize Python code though, like using caching, which I am going to cover in a future article.

