Computer Scientists Find a Key Research Algorithm's Limits
source link: https://www.wired.com/story/computer-scientists-find-a-key-research-algorithms-limits/
Many aspects of modern applied research rely on a crucial algorithm called gradient descent. This is a procedure generally used for finding the largest or smallest values of a particular mathematical function—a process known as optimizing the function. It can be used to calculate anything from the most profitable way to manufacture a product to the best way to assign shifts to workers.
Yet despite this widespread usefulness, researchers have never fully understood which situations the algorithm struggles with most. Now, new work does exactly that, establishing that gradient descent, at heart, tackles a fundamentally difficult computational problem. The new result places limits on the type of performance researchers can expect from the technique in particular applications.
“There is a kind of worst-case hardness to it that is worth knowing about,” said Paul Goldberg of the University of Oxford, coauthor of the work along with John Fearnley and Rahul Savani of the University of Liverpool and Alexandros Hollender of Oxford. The result received a Best Paper Award in June at the annual Symposium on Theory of Computing.
You can imagine a function as a landscape, where the elevation of the land is equal to the value of the function (the “profit”) at that particular spot. Gradient descent searches for the function’s local minimum by identifying the direction of steepest ascent at a given location and then stepping the opposite way, downhill. The slope of the landscape is called the gradient, hence the name gradient descent.
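The procedure described above can be sketched in a few lines. This is a minimal illustration, not code from the paper; the function, step size, and iteration count are all choices of mine.

```python
# Minimal gradient descent sketch: minimize f(x) = (x - 3)^2 by
# repeatedly stepping against the gradient (i.e., downhill).

def gradient_descent(grad, x0, step=0.1, iters=100):
    """Follow the negative gradient from x0 for a fixed number of steps."""
    x = x0
    for _ in range(iters):
        x -= step * grad(x)  # move opposite the direction of steepest ascent
    return x

# f(x) = (x - 3)^2 has gradient f'(x) = 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

On a smooth bowl-shaped function like this one the procedure converges to the single minimum; the new result concerns how hard this search can be in general.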
Gradient descent is an essential tool of modern applied research, but there are many common problems for which it does not work well. Before this research, however, there was no comprehensive understanding of exactly what makes gradient descent struggle and when—questions another area of computer science, known as computational complexity theory, helped to answer.
“A lot of the work in gradient descent was not talking with complexity theory,” said Costis Daskalakis of the Massachusetts Institute of Technology.
Computational complexity is the study of the resources, often computation time, required to solve or verify the solutions to different computing problems. Researchers sort problems into different classes, with all problems in the same class sharing some fundamental computational characteristics.
To take an example—one that’s relevant to the new paper—imagine a town where there are more people than houses and everyone lives in a house. You’re given a phone book with the names and addresses of everyone in town, and you’re asked to find two people who live in the same house. You know you can find an answer, because there are more people than houses, but it may take some looking (especially if they don’t share a last name).
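The phone-book example above can be made concrete with a short search routine. This is a toy sketch of my own: the pigeonhole principle guarantees a collision exists when there are more people than houses, but finding it still requires looking through the book.

```python
# Toy version of the phone-book problem: scan entries and report the
# first two names found sharing an address.

def find_housemates(phone_book):
    """Return the first pair of names that share an address."""
    seen = {}  # address -> a name already seen at that address
    for name, address in phone_book:
        if address in seen:
            return seen[address], name
        seen[address] = name
    return None  # cannot happen if there are more people than houses

book = [("Ada", "1 Elm St"), ("Ben", "2 Oak St"), ("Cyn", "1 Elm St")]
pair = find_housemates(book)
```

A solution is guaranteed to exist and any proposed pair is easy to check against the book—the two properties that define the TFNP class discussed next.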
This question belongs to a complexity class called TFNP, short for “total function nondeterministic polynomial.” It is the collection of all computational problems that are guaranteed to have solutions and whose solutions can be checked for correctness quickly. The researchers focused on the intersection of two subsets of problems within TFNP.
The first subset is called PLS (polynomial local search). This is a collection of problems that involve finding the minimum or maximum value of a function in a particular region. These problems are guaranteed to have answers that can be found through relatively straightforward reasoning.
One problem that falls into the PLS category is the task of planning a route that visits some fixed number of cities with the shortest travel distance possible, given that you can only ever change the trip by switching the order of any pair of consecutive cities in the tour. It’s easy to calculate the length of any proposed route, and with such a limit on the ways you can tweak the itinerary, it’s easy to see which changes shorten the trip. You’re guaranteed to eventually find a route you can’t improve with an acceptable move—a local minimum.
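The local-search process described above can be sketched directly. This is an illustrative implementation under my own assumptions (cities as 2D points, Euclidean distance); the only allowed move is swapping two consecutive cities, and the search stops at a local minimum.

```python
# Local search for a short tour where the only allowed move is swapping
# two consecutive cities. Stops when no such swap shortens the tour.
import math

def tour_length(tour, coords):
    """Total length of a closed tour over 2D city coordinates."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def local_search(tour, coords):
    """Apply improving adjacent swaps until reaching a local minimum."""
    tour = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(len(tour) - 1):
            candidate = list(tour)
            candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]
            if tour_length(candidate, coords) < tour_length(tour, coords):
                tour, improved = candidate, True
    return tour

# Four cities at the corners of a unit square; the starting tour crosses itself.
coords = {"A": (0, 0), "B": (0, 1), "C": (1, 1), "D": (1, 0)}
best = local_search(["A", "C", "B", "D"], coords)
```

Each step is cheap to evaluate, and the tour length can only decrease—which is why a locally unimprovable route must eventually be reached, the hallmark of a PLS problem.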
The second subset of problems is PPAD (polynomial parity arguments on directed graphs). These problems have solutions that emerge from a more complicated process called Brouwer’s fixed point theorem. The theorem says that for any continuous function mapping a space like a ball onto itself, there is guaranteed to be one point that the function leaves unchanged—a fixed point, as it’s known. This is true in daily life. If you stir a glass of water, the theorem guarantees that there absolutely must be one particle of water that will end up in the same place it started from.
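A simple numeric illustration of Brouwer’s guarantee: the theorem only promises that a fixed point exists, but for the well-behaved map I chose below (a contraction), plain iteration even finds it. The function and tolerances are my own choices, not from the article.

```python
# cos maps the interval [0, 1] into itself, so Brouwer guarantees a point
# with cos(x) == x. Repeated application converges to it here because the
# map is a contraction near that point.
import math

def find_fixed_point(f, x0=0.5, tol=1e-12, max_iters=1000):
    """Iterate x -> f(x) until it stops moving."""
    x = x0
    for _ in range(max_iters):
        nxt = f(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    return x

fp = find_fixed_point(math.cos)  # the point where cos(x) == x
```

In general, PPAD problems are hard precisely because no such easy iteration is guaranteed to home in on the fixed point that the theorem promises.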