Python packaging and its tools

 1 year ago
source link: https://lwn.net/Articles/924114/

The Python-packaging discussions continued in January and February; they show no sign of abating in March either. This time around, we look (again) at tools for packaging, including a brand new Rust-based entrant. There is also a proposal to have interested parties create Python Enhancement Proposals (PEPs) for packaging solutions that would be judged by a panel of PEP delegates in order to try to choose something that the whole community can rally around—without precluding the existence of other options. As always, it is all a difficult balancing act.

One tool

Picking up from where our last article left off, there was interest in finding a single tool that the Python Packaging Authority (PyPA) could push as the default. But Donald Stufft said that he is skeptical that the PyPA has the means to "bless a singular tool in a way that people will actually recognize it as 'the' tool". Beyond that, though, he is not sure that the clamored-for single, unified tool is even possible; everyone expects too much of such a tool.

I suspect that 100% of the users that want a unified tool, just blindly assumed that whatever their preferred workflow, or something like it, would of course be included in that tool, and they don't consider that they might have to make drastic changes to their workflow to get it-- but somebody is going to have to make drastic changes, because the reality is what exists now in the world are so varied that a singular tool can't possibly solve them all IMO.

Greg Roodt wondered if an enhanced pip might be the right path. Pradyun Gedam agreed, noting that pip already occupies the privileged/default position. Even though it is not sensible to "combine all the workflows/innovations into one thing", getting to a 90% solution is "not an intractable task", he said.

Paul Moore was concerned that adding these features to pip is not something that can be done quickly; "pip has a lot of technical debt which we would need to pay off before we could add a lot of extra complexity". Roodt acknowledged that, but, like Gedam, thought a few extra pieces would be sufficient to help simplify the ecosystem substantially. Stufft said that adding features to pip "is probably the least controversial way of arriving at a unified tool".

Both Moore and Stufft were worried about adding the virtual-environment management that is offered by other tools to pip, but they did discuss some possible approaches. Brett Cannon cautioned that, based on some investigation of virtual-environment workflows he had done recently, "there is no 90% answer, so there will be some teeth gnashing regardless of the decision".

Poetry creator Sébastien Eustace wondered if there was even a need for the PyPA to "endorse or promote a single tool". He noted that Poetry is an independent packaging tool that came about because of missing pieces in the PyPA tool set; it was never endorsed by the PyPA but still has become "the second most downloaded 'packaging' tool". Stufft said that the push for a single recommended tool arose because it is one of the most requested features from users; the current status quo works, but users are not happy with it. Any endorsement would simply set the default:

I don't think anyone is suggesting preventing there to be options. The question is whether there should be a recommended or "default" option, not whether we should provide an only option. Obviously users want it, and they don't feel well served by the status quo.

Ofek Lev was concerned about the "massive undertaking" required to add these extra features to pip; throughout, he has been advocating Hatch as a better candidate for the unified tool. Stufft's (and others') arguments boil down to a question of practicality, though: pip exists, it is used by a majority of Python users, and it is already recommended by the Python Package Index (PyPI), so it is the default default, so to speak. But Lev thinks that fundamentally changing pip will be difficult to do for a number of reasons:

Do we think the backends of Flit, Hatch, Poetry, PDM, etc. were created just for fun or because PEP 517 told us we could? No, it was because setuptools was too difficult to contribute to. And consider in that case that is merely improving upon its central and only purpose of building packages. In the case we're talking about here we're in a code base of equivalent size with even more complexity and we're talking about not just adding new features but fundamentally changing what it does/is.

It is probably not surprising that Stufft disagreed; he does not think it constitutes a fundamental change to pip, just further evolution of the tool:

In fact, pip can start adding those features today, without anyone's permission, and I suspect if they did so the "please provide a unified tool" talking point would just go away, because pip is already the default tool, it's just implementing the features that people keep asking for.

However, the ability of the pip maintainers to find the time to do that work is worrisome to various commenters. "H. Vetinari" said that it is "an intriguing idea to flesh out pip in this way", but there is a need to expand the maintainer group in order to "realistically grow all those features in something less than 'years'". Moore agreed that it would require some "fundamental changes to how pip is maintained" to speed up the development of these extra features. He had a long list of reasons that it would take a long time, including limited maintainer bandwidth and the preservation of legacy pip workflows.

But Moore also agreed that choosing pip neatly sidestepped the "which tool to bless" question; "I don't think we should underestimate the challenges", however. While users want a unified tool, they may not want to wait as long as it would take; "In 3-5 years? Maybe we could get pip to the point of being that tool in that sort of time period. In 6 months? Not a chance." Stufft said that users are mostly reasonable and will just want to see progress toward the goal; "I don't think there is a world where they get it in 6mos no matter what we do".

Ralf Gommers is skeptical about the pip-based plan. He said that "the weight of history, the complex and legacy code, the backlog of issues and difficulty of working on pip, and the important lower-level role as a pure installer it already fulfills are already stacked against this idea". He suggested that some combination of Poetry, Hatch, and PDM might be the right approach; "Each has its own problems and isn't complete enough, however if you'd take the best features of each you'd have about the right thing." Authors of each of those tools have commented in the thread, he said, so they could simply get together and produce a unified tool:

[...] I think it's safe to say that if these projects would join forces, we'd have something very promising and worth recommending as the workflow & Python project management tool. And it doesn't require a 200+ message thread with everyone involved in packaging agreeing - if a handful of authors would agree to do this and make it happen, we'd be good here and could "bless" it after the fact.

Gedam announced his (also lengthy) blog post that attempted to summarize his views and fill in lots of the background on the topic. He concluded that adding features to pip would be desirable, but that it is daunting. Developers interested in packaging have generally developed their own tools:

[...] we've made it fairly tractable to "build your own" in a sandbox that lets you ignore the need to support entire swaths of workflows, and that's something you can't compete with easily for contributor experience. And, when the alternative is "spend a few months trying to implement something in a 'legacy' codebase, while catering to needs that you don't have, also convince a bunch of people with limited availability that your idea is a good one and wait for them to review what you wrote", it's not surprising that we end up with a bunch of "new things" and have multiple groups building multiple workflow tools.

We still don't have agreement that this is the direction that we, as a community, want pip to go.

Battling PEPs?

Stufft was generally in agreement with Gedam's "excellent post", but he did take exception to the idea that the community is not in agreement. He believes that most users are in agreement that pip (or some other tool that is shipped with Python) should provide the "unified experience". Since pip is that tool, it should be enhanced, or some other tool should be shipped with Python, which would require agreement from the Python core developers by way of the steering council (SC). He proposed a kind of PEP "battle" to figure out which direction to go:

Interested parties write a PEP on how they think we should solve the "unification" problem within some time frame, all of these PEPs will have the same set of PEP-Delegates, the various proposals will be discussed just like any other PEP, then the PEP-Delegates will pick one and that's the direction we'll go in. [...] If they are unable to come to an agreement, then it will get kicked up to the SC to make a choice.

My recommendation is that we do something a little unorthodox and instead of having a singular PEP-Delegate, we instead have a team of 3 of them, who will together select the direction we go in. My rationale here is that this is our first time making a decision quite like this, it's going to be a decision with a large impact, and there is no one singular person who could make this decision who isn't biased in some way.

Christopher A. M. Gerlach (C.A.M. Gerlach), who is one of the PEP editors, further refined the idea and offered his assistance. There has been at least one volunteer for the PEP-Delegate group that would evaluate the PEPs, but, as of yet, there has been no visible action on the creation of PEPs to consider. It is not at all clear that those who might be in a position to propose a PEP and push it through to "completion" want to put in the enormous effort required to do so. Multiple competing PEPs seem like even more of a stretch, but we shall see; it has only been a little over a month since Stufft suggested that path.

A new tool

On January 20, though, Nathaniel J. Smith announced a new tool (and binary format) that, to a certain extent, upends the usual order of things. He noted that one of the goals of Kushal Das, who was one of the authors of PEP 582 ("Python local packages directory") back in 2018, was that Python beginners only need to download a single thing in order to get started with the language. The PEP, which is still being discussed, was a means to that end. Smith looked at the problem from a different angle:

Historically, our tools have started with the assumption that you already have a Python, and now you want to manage it. That means every tool needs to be prepared to cope with every possible way of installing/managing Python. It means a beginner-friendly workflow tool has to be part of the interpreter (the main motivation for PEP 582), even with all the limitations that imposes [...]

But what if we went the other way, and uploaded CPython to PyPI, so you could pip install python? Well, OK, you couldn't actually pip install it because pip is written in Python, but pretend we had a tool that could do this. Then Kushal's beginners could install this one tool, and it could bootstrap Python + the packages they needed.

Pybi is Smith's format for packaging CPython binaries for distribution, which is similar in form to the wheel format used by PyPI. That way, some tool could download the latest Python, install it, and pre-populate the install with some packages of interest from PyPI. As noted, though, that tool would not have access to a Python environment, so Smith also developed posy in Rust. In part, posy is meant to be a way for Smith to exercise his Rust skills. The GitHub site README starts with an homage, calling the project: "Me messing around in Rust for fun (just a hobby, won't be big and serious like pip)". The eventual goal sounds fairly serious, however:

  • A project-oriented Python workflow manager, designed to make it easy for beginners to write their first Python script or notebook, and then grow with you to developing complex standalone apps and libraries with many contributors.
  • A combined replacement for pyenv, deadsnakes, tox, venv, pip, pip-compile/pipenv, and PEP 582, all in a single-file executable with zero system requirements (not even Python).

The reception to the announcement ranged from generally positive to something approaching "over the moon", though there are still plenty of reservations, of course. For the most part, posy simply implements the existing packaging standards, but it also takes into account the lifecycle model that was discussed at a 2018 core sprint. That model ranges from beginners (or, more broadly, simple projects, perhaps consisting of a handful of scripts) through deployable web applications, reusable libraries, and standalone applications; it has come up multiple times in these packaging discussions.

Gedam was concerned that posy is inventing yet another scheme for virtual-environment handling, among other things; he raised the inevitable specter of the xkcd "Standards" comic. Moore agreed with some of those concerns, but was happy to see posy take the full lifecycle into account:

Most tools and approaches I've seen either frame themselves as "beginner friendly" (stage 1 and maybe 2), or as aimed at stage 3 (deployable webapp/reusable library/standalone app) and later. And both groups assume that stages 1 and 2 - "simple scripts" and "sharing with others" are beginner workflows, not needed by more advanced users [...] Or at least, that's how the documentation, examples and discussions feel to me.

I've no idea whether this project will succeed in unifying the full lifecycle described in that document. I don't know if it'll make our existing problems worse. I'm concerned about the fact that it's inventing new mechanisms for things like isolation that may or may not work. I suspect that a model based around heavy manipulation of sys.path will cause huge problems for the static typing community, for example. But I'm pleased that someone is looking at a problem which I feel like struggled to express well enough to get the existing tools to pay attention to [...], and I'm glad that we're still innovating, and not just fighting to consolidate what we have and deal with legacy issues.
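Moore's worry about `sys.path` can be illustrated with a small sketch. This is not how posy works (the article does not describe its mechanism); it is only a generic demonstration, with a hypothetical helper and module name, of a dependency that becomes importable at run time from a directory that nothing in the source code mentions.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_private_dir(module_name, source):
    """Write a module into a fresh directory and import it via sys.path."""
    env_dir = Path(tempfile.mkdtemp(prefix="demo_env_"))
    (env_dir / f"{module_name}.py").write_text(source)

    # The directory only becomes visible to the import system here, at run time.
    sys.path.insert(0, str(env_dir))
    importlib.invalidate_caches()
    try:
        return importlib.import_module(module_name)
    finally:
        # Restore sys.path; the imported module stays cached in sys.modules.
        sys.path.remove(str(env_dir))


# "demo_dep_sketch" is a made-up module name used only for this example.
mod = import_from_private_dir("demo_dep_sketch", "VERSION = '1.0'\n")
print(mod.VERSION)
```

A type checker that only reads the source has no way to know where `demo_dep_sketch` lives, because the location is decided while the program runs; that is the kind of difficulty Moore anticipates for static typing tools.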

While "some of the ideas here are interesting", Stufft said, there were some things that he was "not particularly enthused about", including an unclear deployment story, an unnecessary extra binary format, and the implementation language. "I personally enjoy Rust, but I think it speaks to a serious shortcoming in the idea that it relies on being written in an external language to make it viable." He argued that the only Rust property that was being employed was that it can create a standalone binary, which is really just a property of compiled languages; it could have "used one of the various strategies that exist to create a single file executable out of Python", instead. He is concerned that it gives the wrong impression:

The language choice is a short coming, because it has the implication that the packaging tool isn't capable enough to produce real world software that is meant to be deployed to end user machines, machines that you can't rely on the system Python on. After all, there's nothing inherently special about posy here, it's just an application that wants to run without the dependence of an existing Python install.

Smith pointed out that posy would not exist at all if it were not written in Rust, however, since that was part of why he wrote it. Stufft acknowledged that, but is concerned that by sidestepping (via Rust) the problem of delivering a Python command-line application to users, that important part of the overall Python-packaging story is being skipped as well. He clarified that point further in another post: "My assertion is that packaging things for distribution to end users is also part of the packaging story, because well it is, and it's one of the most chronically underserved parts of our packaging story [...]".
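Stufft does not name the single-file strategies he has in mind. As one assumed example, the standard library's `zipapp` module can pack an application directory into a single runnable `.pyz` file. The sketch below uses a hypothetical one-line application; note that the resulting archive still needs a Python interpreter on the target machine, so it covers only part of the distribution problem he describes.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_and_run_pyz():
    """Pack a tiny app into one .pyz file, run it, and return (path, output)."""
    work = Path(tempfile.mkdtemp(prefix="pyz_demo_"))
    src = work / "app"
    src.mkdir()
    # zipapp uses __main__.py as the archive's entry point.
    (src / "__main__.py").write_text('print("hello from a single file")\n')

    target = work / "app.pyz"
    zipapp.create_archive(src, target)

    # The archive is executed by an existing interpreter; none is bundled.
    result = subprocess.run(
        [sys.executable, str(target)],
        capture_output=True,
        text=True,
        check=True,
    )
    return target, result.stdout.strip()


target, output = build_and_run_pyz()
print(target.name, output)
```

Tools that also bundle the interpreter go further than this, which is the gap between "a single file" and "runs on a machine without Python" that the disagreement over posy's implementation language turns on.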

The thread continued on a ways and it appears there is a fair amount of enthusiasm for Smith's approach. Where that goes from here is hard to say, but there is still plenty of work needed to get to the point where posy can fully fill the niche he envisions for it. It may well make sense to merge the pybi and wheel formats into a "wheel 2.0" or similar; there is talk of doing so, which might be an effort that is independent of posy's future.

A new thread

As January came to a close, the thread for part one of the packaging-strategy discussion wound down and was eventually closed. In early February, the thread for the second part of the strategy discussion was opened, though it seems that much of the energy has gone out of the conversation(s), as the new thread had a rather desultory tone. Moore wondered if the discussion time might be better spent elsewhere. He asked: "But are these strategy discussions likely to deliver anything better, or are they just taking energy and bandwidth away from the people working on making progress?" Part of the problem is that the PyPA is effectively simply an interest group, rather than a decision-making body:

Discussions like this tend mostly to demonstrate that there's no uniform view on direction among PyPA members (let alone among non-PyPA projects like conda and poetry). [...] There wasn't much consensus on the previous discussion, so does that mean we have no strategy? Or will someone propose a strategy, in which case without a change in PyPA governance, what difference will that make? (Even with a change in governance, I don't see anyone imposing a particular direction on packaging projects - there's too much history of independence for that to happen any time soon).

That led Gedam to start something of a meta-thread where he responded to the frustrations that had been voiced about the discussions; he sympathized with those feelings, but felt that progress was being made. Beyond that, despite people feeling a sense of urgency to immediately solve the packaging problems, it is going to take a while to get there. "We're not going to magically/quickly solve issues that are happening at a larger-than-ever scale and that have grown into their current shape over more than a decade!"

Steve Dower suggested that some kind of focused, in-person gathering might be a better way to get some kind of resolution, though other options are possible: "Less ideal is to have regularly scheduled meetings in amongst other distractions, and at the bottom end is to have an online-only, text-only, open-invite discussion without a specific goal (sound familiar? :) )". Gommers wondered if there were any plans for such a gathering, but Gedam said that there were not, at least yet.

There is still fruitful discussion going on in various threads in the Packaging category of the Python discussion forum. It is clear that none of these questions or problems is going to be resolved anytime soon, though progress is slowly being made in various areas, just as it has been over the past decade or more.

It is probably the right time to let things play out a ways before we check back in on this freewheeling Python-packaging conversation. It will be interesting to see what, if anything, concrete comes out of it. There is, already, the pypackaging-native site, which describes many of the problems, but are there PEPs in the works to solve some of them? While the discussion is somewhat fragmented—and a bit fractious at times—there is a lot of attention being drawn to the problems right now, which may help lead the community to a workable path for a solution (or, more likely, solutions). Stay tuned ...

