A better story for multi-core Python

Source: https://lwn.net/Articles/650489/

Running the standard Python interpreter in threads on multiple CPU cores has always resulted in a smaller performance gain than one might naively think—or hope for. Because of the CPython global interpreter lock (GIL), only one thread of execution can be running in the interpreter core at any given time. Removing the GIL has long been a topic of discussion in Python circles, and various alternative Python implementations have either removed or worked around the GIL. A recent discussion on the python-ideas mailing list looked at a different approach to providing a better multi-core story for Python.

In a post that was optimistically titled "solving multi-core Python", Eric Snow outlined an approach that relies not on removing the GIL, but on "subinterpreters" and a mechanism for sharing objects between them. The multi-core problem is partly a public-relations problem for the language, Snow said, but it needs solving for that and other, more technical reasons.

Subinterpreters

The basic principle behind Snow's proposal is to take the existing subinterpreter support and expose it in the Python language as a concurrency mechanism. The subinterpreters would run in separate threads but, unlike typical threads, would not implicitly share data with each other; data would be exchanged only explicitly, via channels (similar to those in the Go language). One of the main influences on Snow's thinking (and on Go's concurrency model) is Tony Hoare's "Communicating Sequential Processes".
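To make the model concrete, a rough sketch of what such an API might look like follows. The module name and every call in it (create_channel(), create(), run()) are invented for illustration; no public interface of this kind existed at the time:

    # Hypothetical sketch: these names are invented to illustrate the
    # channel-based model; they are not a real CPython API.
    import subinterpreters

    # A channel has a receiving end and a sending end, as in Go/CSP.
    recv_end, send_end = subinterpreters.create_channel()

    # Each subinterpreter has its own module state and runs in its own thread.
    interp = subinterpreters.create()
    interp.run(
        # The only way results come back is an explicit send on the channel.
        "send_end.send(b'done')",
        channels={"send_end": send_end},
    )

    print(recv_end.recv())  # b'done'

The key design point is that, because nothing is shared implicitly, a subinterpreter's data never needs a global lock to protect it from other subinterpreters.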

Handling objects shared between subinterpreters is one of the areas that requires more thought, Snow said. One way forward might be to only allow immutable objects to be shared between the subinterpreters. In order to do that, though, it probably makes sense to move the reference counts (used for garbage collection) out of the objects themselves and into a separate table. That would allow the objects themselves to be truly unchanging, which could also help performance in the multi-processing (i.e. fork()) case by avoiding page copies (via copy-on-write) of objects that are simply being referenced again, as Nick Coghlan pointed out.
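As a conceptual illustration of the side-table idea (the real change would be made in CPython's C internals, not in Python code), consider a sketch in which reference-count updates write to a separate dictionary rather than to the object itself:

    # Illustrative only: in CPython the refcount lives in the object header,
    # so merely referencing an object writes to the page holding it, forcing
    # a copy-on-write after fork(). A side table keeps those writes away
    # from the object's own memory.
    refcounts = {}  # hypothetical side table: id(object) -> count

    def incref(obj):
        refcounts[id(obj)] = refcounts.get(id(obj), 0) + 1

    def decref(obj):
        refcounts[id(obj)] -= 1
        if refcounts[id(obj)] == 0:
            del refcounts[id(obj)]  # no references left: collectable

    shared = tuple(range(1000))  # immutable; its pages are never written again
    incref(shared)               # bookkeeping touches only the table
    decref(shared)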

Another area to consider is what restrictions subinterpreters would have. If, for example, subinterpreters were not allowed to start new threads, each would be single-threaded and would not require a GIL. Alternatively, the GIL for subinterpreters could be replaced with a "local interpreter lock", with the main GIL used in the main interpreter and to mediate interaction between subinterpreters. There is also the question of using fork() in subinterpreters; in his initial email, Snow suggested disallowing it, but in the discussion that followed he seemed to rethink that.

The proposal is clearly an early-stage "request for comment" (or "a shot over the bow", as Snow put it), but it sparked quite a bit of discussion and some fairly favorable comments. Yury Selivanov was quite interested in the idea, for example, noting that just being able to share immutable objects would be useful:

Even if this is the only thing we have -- an efficient way for sharing immutable objects (such as bytes, strings, ints, and, stretching the definition of immutable, FDs) -- that will allow us to do a lot.

Concerns

But Gregory Smith was concerned about the impact of each subinterpreter needing to re-import all of the modules used by the main interpreter, since those would not be shared. That would reduce the effectiveness of Snow's model. On the other hand, though, Smith sees a potential upside as well: "I think a result of it could be to make our subinterpreter support better which would be a good thing." Several suggestions were made for ways to speed up the startup time for subinterpreters or to share more state (such as modules) between the interpreters.
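To get a feel for the cost Smith describes, one can time a fresh interpreter with and without a batch of imports. The sketch below uses a separate process, which somewhat overstates what a subinterpreter would pay, and the module list is arbitrary:

    # Rough, illustrative measurement of per-interpreter import overhead.
    import subprocess
    import sys
    import time

    def timed_run(code):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        return time.perf_counter() - start

    bare = timed_run("pass")
    loaded = timed_run("import json, http.client, xml.etree.ElementTree, ssl")
    print(f"bare: {bare:.3f}s  with imports: {loaded:.3f}s")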

Several in the thread believed that the existing, fork()-based concurrency was the right way forward, at least for POSIX systems. For example, Devin Jeanpierre said:

So if I have a web server, each independent serving thread has to do all of the initialization (import HTTP libraries, etc.), right? Compare with forking, where the initialization is all done and then you fork, and you are immediately ready to serve, using the data structures shared with all the other workers, which is only copied when it is written to. So forking starts up faster and uses less memory (due to shared memory.)
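That pattern can be demonstrated with the standard multiprocessing module; this sketch assumes a POSIX system, where the fork start method is available:

    # Preload-then-fork: imports and setup run once in the parent, and each
    # forked worker inherits them through copy-on-write pages.
    import json                   # "expensive" setup done once, pre-fork
    import multiprocessing as mp

    CONFIG = {"version": 1}       # built once; workers only read it

    def handle(request):
        # Workers use the inherited modules and data immediately.
        return json.dumps({"request": request, **CONFIG})

    if __name__ == "__main__":
        mp.set_start_method("fork")   # POSIX only, as noted below
        with mp.Pool(processes=4) as pool:
            print(pool.map(handle, ["a", "b", "c"]))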

While fork() does provide those benefits, it is only available on POSIX systems. It also differs from Snow's goal, which is "to make it obvious and undeniable that Python (3.6+) has a good multi-core story"; that goal is partly a matter of public perception. The subinterpreter idea is just a means to that end, he said, and he would be happy to see a different solution if it fulfilled that goal. In the meantime, though, his proposal has some characteristics that multi-processing with fork() lacks:

But we are aiming for a share-nothing model with an efficient object-passing mechanism. Furthermore, subinterpreters do not have to be single-use. My proposal includes running tasks in an existing subinterpreter (e.g. executor pool), so that start-up cost is mitigated in cases where it matters.

But Sturla Molden pointed to the lack of fork() for Windows as one of the real reasons behind Snow's proposal: "It then boils down to a workaround for the fact that Windows cannot fork, which makes it particularly bad for running CPython". But, as Snow said, Python cannot ignore Windows. Beyond that, though, even with the "superior" fork() solution available, the perception of multi-core Python is much different:

If the multi-core problem is already solved in Python then why does it fail in the court of public opinion? The perception that Python lacks a good multi-core story is real, leads organizations away from Python, and will not improve without concrete changes. Contrast that with Go or Rust or many other languages that make it simple to leverage multiple cores (even if most people never need to).

Molden replied with a long list of answers to the "FUD" that is promulgated about Python and the GIL, but that doesn't really change anything. That is why Snow's goal is to make multi-core support "obvious and undeniable". It also seems that Molden is coming from a scientific/numeric Python background, which is not generally where the complaints about Python's multi-core support originate, as Coghlan noted.

Shared data

The reasoning behind restricting the data shared between interpreters to immutable types (at least initially) can be seen from a question asked by Nathaniel Smith. He wondered how two subinterpreters could share a complicated data structure containing several different types of Python objects. Snow acknowledged that concern and suggested avoiding the "trickiness involved" in handling that kind of data by sticking to immutable objects, though "some sort of support for mutable objects" may be added later, he said.
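One workaround consistent with an immutable-only rule (not something from the proposal itself) would be to flatten such a structure into bytes and rebuild it on the other side; the channel call in this sketch is a placeholder:

    # Serialize the complicated structure to bytes, which are immutable and
    # therefore shareable, then reconstruct an independent copy.
    import pickle

    structure = {"ids": [1, 2, 3], "meta": {"tags": ("a", "b")}}

    payload = pickle.dumps(structure)  # immutable bytes
    # send_end.send(payload)           # hypothetical channel send
    rebuilt = pickle.loads(payload)    # a copy, not shared mutable state
    assert rebuilt == structure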

Coghlan summarized Snow's proposal as really being three separate things:

  1. Filing off enough of the rough edges of the subinterpreter support that we're comfortable giving them a public Python level API that other interpreter implementations can reasonably support
  2. Providing the primitives needed for safe and efficient message passing between subinterpreters
  3. Allowing subinterpreters to truly execute in parallel on multicore machines

All 3 of those are useful enhancements in their own right, which offers the prospect of being able to make incremental progress towards the ultimate goal of native Python level support for distributing across multiple cores within a single process.

In addition, Coghlan has published a summary of the state of multi-core Python that looks at the problem along with alternatives and possible solutions. It is an update from an earlier entry in his Python 3 Q&A and is well worth a read to get the background on the issues.

There seems to be enough interest in Snow's proposal that it could be on the radar for Python 3.6 (which is roughly 18 months off). There is a long road before that happens, though. A PEP will have to be written—as will a good bit of code. We also have yet to see what Guido van Rossum's thoughts on the whole idea are, though Snow did mention some discussions with Python's benevolent dictator for life in his initial post. As Nathaniel Smith put it, Snow's approach seems like the "least impossible" one. That is not the same as "possible", of course, but seems hopeful at least.

