
A Breakthrough On Matrix Product

source link: https://rjlipton.wpcomstaging.com/2011/11/29/a-breakthrough-on-matrix-product/


November 29, 2011

Beating Coppersmith-Winograd

Virginia Vassilevska Williams is a theoretical computer scientist who has worked on a number of problems, including a very neat question about cheating in the setup of tournaments, and a series of novel papers, exemplified by this one, on graph and matrix problems whose runtime exponents of {3} have long been begging to be improved.

Today Ken and I want to discuss her latest breakthrough in improving our understanding of the matrix multiplication problem.

Of course Volker Strassen first showed in his famous 1969 paper that the obvious cubic algorithm is suboptimal. Ever since then, progress has been measured by one parameter {\omega}: if your algorithm runs in time {O(n^{\omega})}, then you are known by this one number. Strassen got {\omega = 2.808\dots}, and the race was on. A long series of improvements followed, which for a while seemed stuck above {2.5}. Then Don Coppersmith and Shmuel Winograd (CW) got {\omega < 2.496} and everything changed. After a further contribution by Strassen himself, CW finally obtained {\omega < 2.3755} in 1987, with full details here. This remained the best known for decades.

This has all changed now. Virginia has proved that she can beat the “barrier” of CW and obtain a new, lower value for {\omega}. Currently her paper gives {\omega < 2.3727}, an improvement of “only” {0.0028}, but there is promise of more. This is also another case of proofs coming in twos: a theorem giving {\omega < 2.3737}, from PhD thesis work by Andrew Stothers, was circulated to a few people in June 2010 but not widely. All this is extremely exciting, and is one of the best results proved in years in all of theory. While these algorithms are unlikely to be usable in practice, they help shed light on one of the basic questions of complexity theory: how fast can we multiply matrices? What could be more fundamental than that?

The Basic Idea

Matrix multiplication is bi-linear: the formula for the {i,j} entry of {XY} is {\sum_k x_{i,k} y_{k,j}}. The first step in simplifying the problem is to make it more complicated: Let us have indicator variables {z_{i,j}} and compute instead the tri-linear form

\displaystyle  T(x,y,z) = \sum_{i,j,k} x_{i,k} y_{k,j} z_{i,j}

This is a special case of a general tri-linear form

\displaystyle  F = \sum_{i,j,k = 1}^{N} C_{i,j,k} x_i y_j z_k

where {N = n^2} and we have re-mapped the indices. It looks like we have made order-of {N^3 = n^6} work for ourselves. The key, however, is to try to fit a representation of the form:

\displaystyle  \begin{array}{rcl}  F &=& \sum_{\ell = 1}^r (\sum_i a_{\ell,i} x_i)(\sum_j b_{\ell,j} y_j)(\sum_k c_{\ell,k} z_k)\\ &=& \sum_{\ell = 1}^r P_\ell (\sum_k c_{\ell,k} z_k), \end{array}

where {P_\ell = (\sum_i a_{\ell,i} x_i)(\sum_j b_{\ell,j} y_j)}. The point is, suppose we can compute these products {P_\ell} in total time {T}. Then we can compute (the coefficients for) all the desired entries

\displaystyle  z_k = \sum_{\ell = 1}^r c_{\ell,k} P_\ell

in {Nr} steps. Thus what we have are separate handles on the time {T} for the products and the time for the {z_k}. The way to manage and balance these times involves recursion.
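To make this concrete: Strassen's original scheme is exactly such a representation for the {2 \times 2} case, with {r = 7} products {P_\ell} instead of the obvious {8}; this is where {\omega \le \log_2 7 = 2.808\dots} comes from. Here is a small Python check we wrote for illustration (it is not from the paper); the arrays hold Strassen's coefficients {a_{\ell,i}}, {b_{\ell,j}}, {c_{\ell,k}}, and we verify that {\sum_\ell a_{\ell,i} b_{\ell,j} c_{\ell,k}} reproduces the coefficient tensor of the trilinear form:

    import numpy as np

    # The 2x2 matrix-product tensor: the coefficient of x_{ik} y_{kj} z_{ij}
    # is 1, all others are 0.  Entry (r, c) of a 2x2 matrix is flattened
    # to index 2*r + c.
    T = np.zeros((4, 4, 4), dtype=int)
    for i in range(2):
        for j in range(2):
            for k in range(2):
                T[2*i + k, 2*k + j, 2*i + j] = 1

    # Strassen's r = 7 decomposition: row l gives the coefficients of
    # the l-th product P_l on x, y, z respectively.
    a = np.array([[ 1, 0, 0, 1],   # P1 uses x11 + x22
                  [ 0, 0, 1, 1],   # P2 uses x21 + x22
                  [ 1, 0, 0, 0],   # P3 uses x11
                  [ 0, 0, 0, 1],   # P4 uses x22
                  [ 1, 1, 0, 0],   # P5 uses x11 + x12
                  [-1, 0, 1, 0],   # P6 uses x21 - x11
                  [ 0, 1, 0,-1]])  # P7 uses x12 - x22
    b = np.array([[ 1, 0, 0, 1],   # y11 + y22
                  [ 1, 0, 0, 0],   # y11
                  [ 0, 1, 0,-1],   # y12 - y22
                  [-1, 0, 1, 0],   # y21 - y11
                  [ 0, 0, 0, 1],   # y22
                  [ 1, 1, 0, 0],   # y11 + y12
                  [ 0, 0, 1, 1]])  # y21 + y22
    c = np.array([[ 1, 0, 0, 1],   # P1 feeds z11 and z22
                  [ 0, 0, 1,-1],   # P2 feeds z21 and -z22
                  [ 0, 1, 0, 1],   # P3 feeds z12 and z22
                  [ 1, 0, 1, 0],   # P4 feeds z11 and z21
                  [-1, 1, 0, 0],   # P5 feeds -z11 and z12
                  [ 0, 0, 0, 1],   # P6 feeds z22
                  [ 1, 0, 0, 0]])  # P7 feeds z11

    # sum over l of a_{l,i} b_{l,j} c_{l,k} must give back the tensor.
    assert (np.einsum('li,lj,lk->ijk', a, b, c) == T).all()

With {r = 7} for {2 \times 2} blocks at every level of recursion, the exponent is {\log_2 7}; the whole game since 1969 has been driving {r} down relative to the size of the basic block.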

The Basis Idea

The recursion idea is nice to picture for matrices, though its implementation for the way we have unrolled matrices into vectors is not so nice. Picture {X} and {Y} as each being {4 \times 4} matrices. We can regard {X} instead as a {2 \times 2} matrix of four {2 \times 2} matrices {X_1,X_2,X_3,X_4}, and do the same for {Y}. Then the product {XY} can be written via {2 \times 2} products {X_i Y_j}, and we can picture ourselves recursing on these products.
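Here is a sketch of that recursion in Python, again ours purely for illustration: Strassen's seven-product identity holds verbatim when the four entries are themselves matrices, so one recurses on the blocks (the cutoff is just a practical base case, not part of the theory):

    import numpy as np

    def strassen(A, B, cutoff=64):
        # Multiply n x n matrices, n a power of 2, by applying the 2x2
        # seven-product scheme with n/2 x n/2 blocks as the "entries".
        # T(n) = 7 T(n/2) + O(n^2) gives O(n^{log2 7}) = O(n^{2.808...}).
        n = A.shape[0]
        if n <= cutoff:                  # base case: ordinary product
            return A @ B
        h = n // 2
        a11, a12, a21, a22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
        b11, b12, b21, b22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
        p1 = strassen(a11 + a22, b11 + b22, cutoff)
        p2 = strassen(a21 + a22, b11, cutoff)
        p3 = strassen(a11, b12 - b22, cutoff)
        p4 = strassen(a22, b21 - b11, cutoff)
        p5 = strassen(a11 + a12, b22, cutoff)
        p6 = strassen(a21 - a11, b11 + b12, cutoff)
        p7 = strassen(a12 - a22, b21 + b22, cutoff)
        C = np.empty_like(A)
        C[:h, :h] = p1 + p4 - p5 + p7
        C[:h, h:] = p3 + p5
        C[h:, :h] = p2 + p4
        C[h:, h:] = p1 - p2 + p3 + p6
        return C

    A, B = np.random.randn(256, 256), np.random.randn(256, 256)
    assert np.allclose(strassen(A, B), A @ B)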

The reason why the vector case does not look so nice is that the tri-linear form {F} is so general—indeed we cannot expect to fit a general tri-linear form into a small number of products {P_\ell}. What CW did, building on work by Arnold Schönhage, is relax the tri-linear form by introducing more than {N}-many {z_k} variables, supplying appropriate coefficients to set up the recursion, and most of all framing a strategy for setting variables to zero so that three goals are met: the recursion is furthered, the values of “{r}” at each level stay relatively small, and the matrix product can be extracted from the variables left over. This involved a hashing scheme which used subsets of integers that are free of arithmetical progressions.
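The progression-free ingredient can be seen in miniature. CW need dense Salem-Spencer sets; the simpler classical construction below, the integers whose base-{3} digits are all {0} or {1}, is much sparser than what they use, but it exhibits the property in question. The check is ours, for illustration only:

    from itertools import combinations

    def ap_free(bound):
        # Integers below `bound` whose base-3 digits are all 0 or 1.
        # If a + b = 2c within this set, both sides are carry-free in
        # base 3, so a_i + b_i = 2 c_i digit by digit, forcing a = b = c.
        # Hence the set contains no 3-term arithmetic progression.
        out = set()
        for m in range(bound):
            n = m
            while n and n % 3 != 2:
                n //= 3
            if n == 0:
                out.add(m)
        return out

    S = ap_free(3 ** 7)
    # No pair has its midpoint in the set, so no 3-term progression.
    assert all((a + b) % 2 == 1 or (a + b) // 2 not in S
               for a, b in combinations(S, 2))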

The final step by CW was to choose a starting algorithm {{\cal A}} for the basis case of the recursion. They devised one and got {\omega < 2.39}. Then they noticed that if they bumped up the base case, manually expanding their algorithm into an {{\cal A}^2} handling the next-higher case, they got a better analysis and their famous result {\omega < 2.376}. By their way of thinking, bumping the basis up once more to {{\cal A}^3} was the way to do better, but they left its analysis as an open problem. Others attempted the analysis and found that it gave worse, not better, results. So {2.376}, actually {2.375477}, stood.

The insight for breaking through was to make a bigger jump in the basis. Vassilevska Williams was actually anticipated in this, without her knowledge, by Andrew Stothers, in his 2010 PhD thesis at the University of Edinburgh. Stothers used {{\cal A}^4} and showed the method capable of achieving {\omega < 2.3737}, though there has been some doubt about whether all the details were worked out. Vassilevska Williams, however, used {{\cal A}^8} and brought powerful computing software to bear on a more extensive framework for the plan. It is not clear whether there is anything necessary about jumping {{\cal A}} by a power of {2}; in any event her program and framework work for any exponent.

The Proof

We cannot yet give a good summary of the proof; further details are in her paper. One quick observation about her work is in order. She used the CW method, but extended it into a general schema that can be used to find good matrix product algorithms, perhaps even better ones than in the paper. The algorithms themselves can be generated and examined, but as usual the task of analyzing them is very hard. Her brilliant insight is that this task can be laid out automatically by a computer program. This allowed her to carry out the analysis where previous attempts failed.

For example, her paper gives a pseudo-code overview of her main program (the figure is not reproduced here).

The details are not as important as the fact that this program allows one to work on much larger schemas than anyone could previously.
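We can at least gesture at the shape of such a program. The sketch below is ours and purely schematic; the predicate feasible stands in for the real work, namely generating and solving the nonlinear constraints that her framework derives for a chosen tensor power. Since achievability is monotone in the exponent, a simple bisection then extracts the best bound:

    def best_exponent(feasible, lo=2.0, hi=3.0, tol=1e-6):
        # Bisect for the least omega accepted by a feasibility test.
        # In the real framework, feasible(omega) would mean: the
        # constraint system derived from the chosen power of the CW
        # construction has a solution certifying that exponent.
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if feasible(mid):
                hi = mid
            else:
                lo = mid
        return hi

    # Toy stand-in for the constraint solver: pretend exactly the
    # exponents >= 2.3727 are certifiable.
    print(best_exponent(lambda w: w >= 2.3727))   # ~2.3727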

What Does the Bound Mean?

Note that she has improved the bound of Stothers by {0.001}. For what threshold value of {n} does an additive improvement of {a} in the exponent halve the running time? Since the speedup factor is {n^{\omega}/n^{\omega - a} = n^{a}}, the answer is {n = 2^{1/a}}, which in this case is {n = 2^{1000}}. This value is far above the Bekenstein Bound for the number of particles that could be fit into a volume the size of the observable universe without collapsing it into a black hole. In this sense the algorithm itself is even beyond galactic.
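As a quick sanity check of the arithmetic (ours):

    from math import log2

    a = 0.001            # additive improvement in the exponent
    n = 2.0 ** (1 / a)   # threshold where the speedup reaches a factor of 2
    print(log2(n))       # 1000.0, i.e. n = 2^1000
    print(n ** a)        # ~2.0: n^omega and n^(omega - a) differ by n^a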

The meaning instead comes from this question: Is there a fundamental reason why {\omega} could settle at a value strictly greater than {2}? Note that {\omega = 2} is not taken to mean the existence of a quadratic-time algorithm, but rather that for all {\epsilon > 0} there are algorithms that achieve time {O(n^{2+\epsilon})}. There was some reason to think {2.5} could be a natural barrier, but it was breached. Perhaps {\sqrt{5} = 2.236\dots}, since this is connected to the golden ratio? Her paper notes a recent draft by Noga Alon, Amir Shpilka, and Christopher Umans that speaks somewhat against the optimism shared by many that {\omega = 2}.

Open Problems

Can the current bounds be improved by more computer computations? Are we about to see the solution to this classic question? Or will it be a struggle over increments of {0.001}?

In any event, congratulations to Virginia, and to Andrew, for their brilliant work.

Update: Markus Bläser, who externally reviewed Stothers’ thesis, has contributed an important comment on the blog of Scott Aaronson here. It evaluates the significance of the work in context, and also removes the doubt, going back to 2010, that was expressed here.
