My User Experience Porting Off Setup.py (gregoryszorc.com)
150 points by markdog12 6 months ago | 89 comments



Packages have always been Python's Achilles heel. Python's philosophy for code is, "there should be one, and preferably only one, obvious way to do it." But for packaging systems, the philosophy is more like Perl's: "there's more than one way to do it (and all of them have their own pitfalls)."

I don't understand why the Python leadership hasn't shown stronger... leadership... in which tools they recommend.


Python's situation is pathetic. Some Python folks wanted to get rid of the builtin module `distutils`. It got deprecated, and now it's finally removed in Python 3.12. That's a breaking change in a minor release, probably because they're too afraid to do a Python 4 release after the 2->3 debacle. Then they published a migration guide only as part of the 3.12 release, which is too little, too late, because the majority of projects haven't moved away from `distutils` yet. There's a lot of confusion, like "you can just install setuptools, it replaces distutils" (setuptools has a startup script that hacks the search paths so you can still run `import distutils` and get its vendored copy). Except nobody knows whether to add a conditional dependency on setuptools then. Why on earth do you have to depend on another package to get a vendored copy of distutils? And a fraction of projects don't care and make it a user problem: just install setuptools yourself as a system / user site-package, then everything works. (Except it doesn't when setuptools is on PYTHONPATH, because the startup script doesn't run, which happens on Nix and other prefix-based package managers.) At the end of the day, every Python package has its own ugly solution.
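
For what it's worth, a minimal sketch of the conditional-import dance this implies, assuming a recent setuptools (whose import hook is what makes `import distutils` resolve to the vendored copy); whether you want to rely on this is another matter:

    # Hedged sketch: prefer setuptools' vendored distutils on Python 3.12+,
    # where the stdlib module is gone; older interpreters still have the builtin.
    import sys

    if sys.version_info >= (3, 12):
        # Importing setuptools first activates its import hook, so the
        # `import distutils` below resolves to the vendored copy.
        import setuptools  # noqa: F401

    from distutils.core import Extension  # works either way, in theory

    ext = Extension("example._speedups", sources=["src/speedups.c"])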

All that could have been prevented by the Python folks who insisted on deprecating a builtin module. They could have jumped in and helped migrate the popular Python packages away from distutils long ago, to set an example. But nope, they really like to keep things messy.


You think that's the only module they dropped?

They took away cgi… which I was of course using.

And crypt.

They don't care about breaking things. People would not migrate to python4 so they just break everything in python3 :D


The cgi module is still in 3.12. It will be removed in 3.13, which is planned for release on 2024-10-01, so not yet dropped.

FWIW, the Python developers have wanted to drop the cgi module for over 20 years. It has no maintainer.

Depending on your needs, you might consider:

1) version and maintain your own copy of cgi.py;

2) use something like https://pypi.org/project/legacy-cgi/ (I have not vetted it) which is a forked copy of cgi and cgitb; or

3) use the CGI capabilities of wsgiref (I used it for one CGI project to allow a future transition away from CGI).
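
To make option 3 concrete, here's a rough sketch (the script and parameter names are made up; only the wsgiref and urllib.parse APIs are standard library) of a CGI script served through wsgiref's CGIHandler, with urllib.parse standing in for cgi's query-string parsing:

    #!/usr/bin/env python3
    # Minimal sketch of option 3: serve a CGI script through wsgiref instead of
    # the cgi module, so the same app can later move to any WSGI server.
    from urllib.parse import parse_qs
    from wsgiref.handlers import CGIHandler

    def app(environ, start_response):
        # parse_qs covers what cgi.FieldStorage did for simple GET query strings
        params = parse_qs(environ.get("QUERY_STRING", ""))
        name = params.get("name", ["world"])[0]
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [f"Hello, {name}!\n".encode("utf-8")]

    if __name__ == "__main__":
        CGIHandler().run(app)  # reads the CGI environment, writes to stdout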

As for "crypt" - wow! I haven't seen anyone use that since the 1990s!

Again, the code is short, and easy to vendor, except for the _crypt extension module.

For that, you might be able to use ctypes, like this:

  >>> from ctypes.util import find_library
  >>> find_library("c")
  '/usr/lib/libc.dylib'
  >>> from ctypes import cdll
  >>> libc = cdll.LoadLibrary(find_library("c"))
  >>> import ctypes
  >>> libc.crypt.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
  >>> libc.crypt.restype = ctypes.c_char_p
  >>> libc.crypt(b"toomanysecrets", b"az")
  b'azi4LBG1VJohQ'
Here is what Python's own _crypt does:

  >>> import _crypt
  >>> _crypt.crypt("toomanysecrets", "az")
  'azi4LBG1VJohQ'


What does it mean to say "it has no maintainer" about code in the Python standard library? In most other projects the standard library is jointly maintained by the core team. In the version history, the cgi module seemed to receive small fixes a couple of times a year from different people.


"Has no maintainer" means there is no one who has said they have the expertise to be able to render final judgment on a bug report or other issue. "If no active maintainer is listed for a given module, then questionable changes should be discussed on the Core Development Discourse category, while any other issues can and should be decided by any committer." - https://devguide.python.org/core-developers/experts/index.ht...

In practice, what that means is if there is a bug report, like https://github.com/python/cpython/issues/71964 from 2016, then the fix may languish for years as no one in the core team is able to resolve it. You can see several people reported the bug, along with a comment from 2022 that "The cgi module is now deprecated following the acceptance of PEP 594" so will not be fixed.

The fixes I saw likely fall into the un-questionable changes that can be decided by any committer.


> FWIW, the Python developers have wanted to drop the cgi module for over 20 years. It has no maintainer.

It works. Does it need a maintainer?

The point isn't how to replace them. The point is that I have stuff that works and has been working for several years and will stop working.


Then Python is not the language for you, and never has been.

The Python core developers have had a practice of removing standard library packages for over 30 years, including packages that - like cgi.py - worked.

For example, modules removed with Python 2.0 included cmp, cmpcache, dircmp, dump, find, grep, packmail, poly, stdwin, util, whatsound, and zmod.

Python 2.4 removed mpz, rotor, and xreadlines - no more Enigma machine emulation for you, and I had to change my code because I used xreadlines.

Python 3.0 removed even more modules: cl, md5, sha, rfc822, and more. Plus it did some library reorganization.

Ever use the "parser" module? I did. It was removed in 3.10 because of the switch to the PEG parser.

PEP 594 re-affirms the reasoning behind the long-standing practice of removing old packages.

There are language communities with a stronger commitment to not breaking old code. COBOL is fantastically backwards compatible, for an obvious example.

I urge you to consider the options I gave. The functionality you want can be done using the standard library without that much effort.


My impression is that there's lots of different niches that have developed, each with their own needs, and trying to unify them is a mess.

If you're shipping something like a web app or something else that will be bundled and distributed as a unit, then you're probably best off with something like Poetry, PDM, or pip-tools - you have a lock file for deterministic dependencies, most of your dependencies will be pure python wheels, and you only really need to test things once.

On the other hand, if you're developing a library, you'll need to test against multiple versions of Python, and ideally multiple versions of some of your larger dependencies, to ensure that your library will work for as many users as possible. You'll also need to be able to build and package wheels.

Alternatively, you're working in data science, and your main concern is probably making sure you can install the packages you need in the environments you're going to use them - specifically, so that they work with the GPUs and other hardware you have available.

And there's still the group of people writing mainly scripts for server maintenance or other tasks, who want to be able to easily install dependencies and keep those dependencies up to date for security reasons, with the minimum number of breaking changes.

Right now, there are different tools, packaging systems, etc catering to each of these groups, and so building the One Ring of Python package management is going to involve (a) solving all of these problems, and (b) convincing all these groups of people that your general solution is better than their niche-specific solution. That's certainly not easy, I don't even know if it's all that possible.

I do think that working from the ground up (i.e. building the individual components like the package metadata file, or the pypackages folder experiment) seems to be working well, in that tools seem to be coalescing around these options and finding the best ways to use them, which is all work that might hopefully feed into new official tooling. But we'll see.


As I see it, there is a hierarchy of packaging needs; the base levels have been solved over and over with new, better, shiny, all-you-need tools -- while the trickiest, most complicated part has been solved over and over separately within each project.

  * Pure python -- Easy, use one of the declarative ones. 
  * Python + Standalone C -- not too bad, use the build tool.
  * Python + external (potentially distro supplied) C libraries -- Using setup.py, customized,  and different for each project. 
That last one is where Pillow, the ML space, scipy, and others live, and it's painful. Pillow has a 1000-line setup.py file to find all the optional (and 2 required) dependencies and headers on its platforms. We've also got code to build the dependencies if necessary for packaging. To port this to some standard, we'd effectively need rpm or dpkg style build infra from PyPA, to work on all the supported platforms.
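
For a sense of what that looks like, here's a heavily simplified sketch -- not Pillow's actual code; the header, macro, and extension names are made up -- of the kind of probing such a setup.py does: look for an external header, then adjust the Extension accordingly.

    # Heavily simplified sketch of probing for an external, possibly
    # distro-supplied C library from setup.py.
    import os
    from setuptools import setup, Extension

    def have_header(name, dirs=("/usr/include", "/usr/local/include")):
        """Crude check for a system header; real projects also honor env vars,
        pkg-config, and per-platform search paths."""
        return any(os.path.exists(os.path.join(d, name)) for d in dirs)

    libraries, define_macros = [], []
    if have_header("zlib.h"):  # optional dependency found on the system
        libraries.append("z")
        define_macros.append(("HAVE_ZLIB", "1"))

    setup(
        name="example",
        version="0.0.1",
        ext_modules=[
            Extension(
                "example._imaging",  # hypothetical extension name
                sources=["src/_imaging.c"],
                libraries=libraries,
                define_macros=define_macros,
            )
        ],
    )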


I think that's also a valid view of the problem (i.e. the further away from pure Python you get, the more complex and unclear things are). But I also think that's just the view from the "library developers" perspective — if you aren't publishing a library, but using Python for some other purpose, you are going to run into issues even at points that, from your perspective, are already fairly solved.

For example, for application developers, even if they just stick to pure Python dependencies, there's still no standard lockfile format that standard Python tools can just emit and ingest. At best, you've got `pip freeze`, but you'll need to use custom tooling to update and maintain that, or switch to pip-compile or another, more full-featured package manager. To me, lockfiles really are table stakes here, but they're not at all easy to get working in Python.
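
To illustrate what's missing, roughly all the standard library hands you is the raw material for `pip freeze`-style pinning; a sketch, for intuition only:

    # Rough sketch of what `pip freeze` boils down to: pin every distribution
    # visible in the current environment. A real lockfile would also need hashes,
    # environment markers, and the index/source each pin came from.
    from importlib.metadata import distributions

    pins = sorted(f"{d.metadata['Name']}=={d.version}" for d in distributions())
    print("\n".join(pins))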


> Python + Standalone C -- not too bad, use the build tool

Doesn't that generally end up using setup.py as well?

From my understanding, build ends up calling the build-backend, which defaults to setuptools.build_meta:__legacy__, which is setup.py.
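
As a rough illustration (not something you'd normally call by hand -- frontends do this in an isolated environment, from the project root):

    # Rough sketch of the PEP 517 fallback path: with no [build-system] table,
    # frontends use the legacy setuptools backend, which runs setup.py under the hood.
    from setuptools.build_meta import __legacy__ as backend

    wheel_filename = backend.build_wheel("dist")  # roughly `setup.py bdist_wheel`
    print(wheel_filename)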

I know there are other backends, but they seem very specialized to a certain project's needs.

I think there's a cmake backend too, but I don't like requiring my customers to install cmake first, and that dependency can't be expressed in pyproject.toml.

I had hoped that redo (https://redo.readthedocs.io/en/latest/) would become popular, as a small, simple, pure-Python Makefile replacement, and that there would be a back-end using it, but neither happened.

My specific need for a backend is to support and configure a code-generation step when building my C extension. The fully generated code is >10MB, covering all 3x24 or so different specialized implementations of the core algorithm. This takes a while to compile, so during development I use a slower, general-purpose implementation.


> you're working in data science, and your main concern is probably making sure you can install the packages you need in the environments you're going to use them

Honest question from a web developer who sometimes has to work with Python — don't containers solve exactly this?


Unfortunately no; the problem here is that you're probably going to need a lot of compiled extensions, and some of these extensions are going to be running on your GPU (especially if you're in the ML world, but also more generally if you want to take advantage of e.g. your lab's HPC cluster). PyPI can manage some of this with the wheels system (i.e. OS, architecture, Python ABI), but there's no metadata to indicate, for example, which GPU you have available. So in most cases it's possible to just precompile all the relevant variants and let people download the best one for them, or even in some cases allow people to compile everything for themselves, but there are still situations where those aren't good options.
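
You can see the shape of the problem by inspecting the compatibility tags pip matches wheels against -- nothing in them describes a GPU. A quick illustration, using the third-party packaging library (pip vendors its own copy of this logic):

    # Illustration: wheel compatibility tags encode interpreter, ABI, and platform
    # only -- nothing about GPUs, CUDA versions, or drivers.
    # Needs the third-party "packaging" distribution installed.
    from packaging.tags import sys_tags

    for tag in list(sys_tags())[:5]:
        print(tag)  # e.g. cp311-cp311-manylinux_2_17_x86_64 on a typical Linux box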

This is why PyTorch is famously more complicated to install via newer package managers such as Poetry, because it requires something slightly more complicated than the existing wheel setup, and most package managers aren't designed for that. (Pip isn't designed for that either, but PyTorch has come up with workarounds for pip already.)

Containers can't solve this problem because containers are tied to the architecture of the machine they're running on, they can't abstract that away. So even if your code is running in a container, it still needs to know which architecture, OS, resources, etc it has access to.


Not exactly no - https://stackoverflow.com/questions/63960319/does-it-matter-...

You need to be running a GPU driver on the host that supports the container cuda version.

So in theory yes; in practice, weird issues occur sometimes that really suck to debug. For example, why do I get NaN loss after spending 8 days on 128 GPUs with this specific set of drivers + CUDA container? (Don't hold it that way; use a matching CUDA version...)

Also a lot of data scientists HATE sys-admin tasks and docker falls squarely into that for many people.


The problem is that people doing data science are not developers, so instead of just using whatever is there they are reinventing a terrible version of package management.


It seems to stem from the Python packaging story being driven by PyPA, rather than the Python foundation, which appears reluctant to pick a packaging solution and ship it with CPython. setuptools was kind of an in-between; it was included in CPython, but in many cases you have to update it or add plugins...


I’m thinking https://astral.sh/ are going to be the ones to “solve” this.


Packaging and linting are two very different topics. Just because someone does A very well, doesn't mean they have any clue about B.


Astral and Ruff look great, but what is their monetization strategy, and how do their interests intersect with the interests of the developer community in the long term?


Unsure if slick promo or not, but that review by tiangolo instantly piqued my interest


By joining forces with Pyflow, or a different tool, or making yet another one?


Packaging was the reason I left python (in favor of Nim) roughly 10 years ago.


The funny thing in this context is that Perl has had rock-solid package management and distribution since, like, forever (CPAN).


If the code hasn't had any updates in 15 years, it's easy to be rock solid.


I believe PEP 517 was how the Python leadership wanted to improve the situation. Unfortunately it backfired hard. It broke the existing systems and further fractured the ecosystem.


Python leadership did the python 2->3 adventure. So.....

But yeah, I agree. This is something that golang nails. They supply all the options out of the box.


A friend of mine had a critique of videogames that has stuck with me. He called them "antiknowledge". Your brain's learning faculties are repurposed into "learning" something that isn't actually a thing. You get really good at some artificial skinner box treadmill system ... and then they change the meta and you have to keep up with the new stats and strategies and so on. Yet after sinking in hundreds of hours, you don't come out the other end with any tangible real-life skill, that would be useful outside of the game.

That's what Python packaging feels like. At least videogames are fun.


Not all of them! Many of those skinnerbox type games actually have a pretty mediocre or outright unfun game loop - it's the skins, battle passes and "rewards" which keep people playing.


A lot of videogames have you learning antiknowledge, but at the same time very useful meta skills. Build orders in Starcraft won't help you in your office job, but a habit of working out complicated tradeoffs, feeling out your opponent and having a feel of un-intuitive consequences will.


> you don't come out the other end with any tangible real-life skill, that would be useful outside of the game.

Replace "the game" with "the small sphere of specialisation" and this applies to learning a great many things which don't generalise, many of which don't make you money either.

Your friend doesn't have a critique of video games, he has just discovered relaxation and/or hobbies.


Girls' supposedly innate disadvantage compared to boys in spatial skills disappears after they play video games.


What does that have to do with anything?


An example of video games imparting valuable real-world skills, the opposite of "antiknowledge".

There is other research suggesting video games have learning benefits:

https://www.bbc.co.uk/newsround/53740172

https://www.ox.ac.uk/news/2020-11-16-groundbreaking-new-stud...


I do not think your example does justice to describing anti-knowledge. I do think anti-knowledge exists in IT and it's a problem. One example is the label and annotation cult.


Really good description of dota2 in particular. In some ways, League has become more "pure" in my mind (especially compared to w3 dota).


Completely wrong. Every skill, no matter how useless, generalizes.

Competitive team games are the best examples. Playing them well requires internalizing the concepts of probability, teamwork, mindset, efficiency, dealing with failure, etc.

Those skills come in quite handy because life is a game. It's especially obvious when looking at human-created systems like capitalism and jobs.


(self-correction)

"Completely wrong" is too harsh. What I said only holds true when a player puts in the effort to get good at a game. Mindlessly playing Cookie Clicker while commuting will not yield many benefits.


That's nonsense. The same could be said of most hobbies. The motivation isn't learning real life skills, it's entertainment.


It can be. But isn't it a question of balance?

If you play some useless¹ game for the majority of your days, you can have multiple reasons for this. Maybe it keeps you occupied and keeps you from thinking about the problems in your life? But after a certain degree it certainly does not "entertain" you anymore: you are doing it because doing anything else that you can think of feels worse. And a few years later not a lot will be left of it, unless you played with your friends or there was something in it for you beyond just pushing the time forward.

If I am playing my instrument for hours, I have at least improved at expressing my feelings with my instrument while having a generally good and relaxing time. And after years of doing it I can do it well enough to play at concerts without feeling afraid.

I know too many people who are so afraid of their own thoughts that they will obsessively "entertain" themselves during all waking hours, and the majority of the time they are not enjoying it. This is bad. We only have one life. You don't get a prize for having done the most, but at least do something that is meaningful.

¹ a game that you yourself don't even love that much


I'm +infinity on "antiknowledge", but I like your "entertainment" point.

To harmonize these ideas, I argue to the kids that the "entertainment" choices are like spice on the food. We want enough to enjoy the flavor in question, but not so much that the usage becomes expensive or overpowers the dish.


I just wanna say I really hate the new poetry+toml crap, it's just more complicated and yet another set of crap I gotta deal with arbitrarily by arbitrary projects.

Why not just improve upon pip? I don't know, just have pip use toml and give it different flags or auto detect things? Was a whole new ecosystem of tools needed?

I look at decisions like this and I know a Python 4.0 is on the way, just like the 2->3 jump because why not right?

Any language that updates its syntax or toolset in a way that makes backwards compatibility impossible is an irresponsible toy language; as awesome and powerful as it may be, the developers are still toying and tinkering with it with no regard for the real-world impact of their decisions.

Why can't python3.12 have a flag like --std=2.7.18, for example, like gcc? If the devs don't have enough resources, I would love to see a donate page I can help out with.

We are at a point where, to deploy python, you can't use python itself but need shell scripts to figure out versions, venvs, pyenv, pipx, poetry, etc... which reliably fail, of course, and every time you have to troubleshoot the problem. This is a failure in software engineering; new grads should be taught about python and similar languages, and the lack of planning and organization and the resulting cascading chaos, as examples of how not to design the user experience of any piece of software.

Sorry if I exaggerated a bit anywhere; it's difficult to pretend all the frustrations and cries of "why???" when using python don't exist. But at the same time, it is still my #1 go-to language for most use cases because the language itself is just fabulous!


I deal with aggressive vulnerability management at work, and we have a unified pipeline. Figuring out what our servers will be running versus what we are doing locally versus what the pipeline expects us to do for dependencies, plus vuln management is so much work.

God forbid you have to upgrade a base python version and reopen the dusty tomes of python dependency hell.


I work in infosec as well and I have yet to even look into malicious pip packages (although I've seen malicious nuget packages). With the last curl vuln it was chaotic, telling people a lot of things actually use libcurl. Can you imagine if something like the requests or urllib package became compromised? Absolutely no real way to manage the patching: projects using old versions of it would be forced to upgrade, and every package that claims it needs a specific version would break. Pure chaos!


> reopen the dusty tomes of python dependency hell.

They are bathed regularly in the blood of a thousand virgins and anything but dusty


I still use pip + venv for everything and I don't really see a reason to change. Correct me if I'm wrong, but the way I understand it, tools like poetry aren't really anything "official" as pip is, just something that popped up and gained some traction. It's only the pyproject.toml format that was standardized, and poetry happens to be one of the tools that supports it.


One area where I really like the push towards pyproject.toml: configuration.

I was sick of having 10 different files in the root of my project to configure the different tooling that the project used.

I got so sick of it at one point, I went ahead and wrote the code to support the pyproject.toml file for mypy because it was my last holdout.


About "eager deprecations" let me give you another absolute gem:

    ********************************************************************************
    The license_file parameter is deprecated, use license_files instead.

    By 2023-Oct-30, you need to update your project and remove deprecated calls
    or your builds will no longer be supported.
    ********************************************************************************

Yes, please go ahead and break people's builds at an arbitrary date because the technical challenges of supporting both `license_file` and `license_files` are insurmountable.


To be fair, that seems to have been a 2.5 year warning:

https://github.com/pypa/setuptools/commit/3544de73b3662a27fa...


Just as an example: CMake ensures roughly a decade of backwards compatibility.

This `license_file` change is the epitome of unnecessary breakage.


Or packages' need to support more than one version of Python.


Thank you Gregory for writing this post. There have been a bunch of announcements about "setup.py has been deprecated", but few have clearly outlined how to move away from setup.py, and more importantly, fewer have outlined what a struggle it is to move away from setup.py.

I was sad to see setuptools officially deprecated, because it looks like another way in which Python packaging is being red-taped away for a non-expert. If someone like the OP (who has 10+ years programming Python) had to do so much for what appears to be a zstd CFFI/Rust wrapper, where does that leave the rest of us?

Here's a python package of mine that uses setup.py: https://github.com/ahgamut/cliquematch/blob/master/setup.py which I have not upgraded to the new tool(s) yet. I think I will need to upgrade it soon. If anyone has suggestions for a tool that will _fully replace_ setup.py, I would like to see tutorials with the following examples:

1. How would I build a package that has pure-Python files and data files? With setuptools I would use maybe MANIFEST.in or package_dir.

2. How would I build a package that has a CPython extension accessed via cffi? (this post points to the answer)

3. How would I build a package that has a CPython extension _without_ cffi, that just wraps some small C code I wrote with CPython's API? What about an extension that uses PyBind11? What about an extension that uses Rust?

4. How would I build a package that requires a "system" package like libblas-dev? Can something like numpy be built optimally without ever writing setup.py? What would a config for that look like? Last I remember numpy used their own patch of distutils to build, I wonder what it is now.


Here's a packaging guide that answers most of your questions.

https://learn.scientific-python.org/development/guides/packa...

As a TL;DR, you have many options for 3rd-party build tools (aka build backends). Each build tool has different *static* ways to specify compile options, either native to the language or generic (e.g., CMakeLists, Cargo.toml, 3rd-party YAML). When it comes to dynamically specifying your extensions, setuptools is still the only option.


I think the documentation (or lack of documentation) can be very illuminating about the focus and thought process of those behind a project. That there is no concise "This is how you did it before and this is how it is done now" shows that there seemingly hasn't been any thought put into building that bridge. And in general it seems like the process of packaging projects that are not going to be published on PyPI, but are going to be used internally, is a dark spot as well.

Having the mess of different blog posts and documentation sources saying different stuff is far from ideal though. If you can't sum up the process in a clear concise way you are far from done.


Python removed distutils… basically now there is no official way to install things.


The Python documentation says "pip is the preferred installer program. Starting with Python 3.4, it is included by default with the Python binary installers." - https://docs.python.org/3/installing/index.html#installing-i...


Except that pip doesn't actually do the install part, you need some other module for that.


I do not understand your comment. What module are you talking about?

What prevents the pip from a normal CPython installation from doing an install?

You can even bootstrap pip via the ensurepip module documented at https://docs.python.org/3/library/ensurepip.html which notes "This module does not access the internet. All of the components needed to bootstrap pip are included as internal parts of the package."
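
For example (a quick sketch; assumes a stock CPython build that ships the bundled pip wheel, and writing into a system-wide install may need elevated privileges):

    # Quick sketch: bootstrap pip using only the standard library, no network access.
    import ensurepip

    print(ensurepip.version())         # version of the pip wheel bundled with this CPython
    ensurepip.bootstrap(upgrade=True)  # install that pip into the current environment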


How do you create a package that can be installed using pip?

Can you do that without setuptools or similar?


Right, but creating a package is not the same as installing a package, and your comment at https://news.ycombinator.com/item?id=38083296 concerned installing.

> Can you do that without setuptools or similar?

Sure.

Here is a program which, when run, creates the zipfile "hello-1.0-py3-none-any.whl" containing the wheel for a package named "hello" along with an entry point for the command-line program also named "hello"

    # Create a wheel for a 'Hello, world!' Python package.
    #
    # To install:
    #   pip install hello-1.0-py3-none-any.whl
    #
    # To use on the command-line:
    #   hello
    #
    # To use from Python:
    #   >>> import hello
    #   >>> hello.main()
    #   Hello, world!
    #
    # To uninstall:
    #   pip uninstall hello

    import base64, hashlib, zipfile

    package_name = "hello"
    version = "1.0"

    # This will go in hello/__init__.py
    payload = """
    def main():
      print("Hello, world!")
    """

    METADATA = f"""\
    Metadata-Version: 2.1
    Name: {package_name}
    Version: {version}
    Summary: Example of a hand-built wheel.
    Home-page: http://news.ycombinator.com/
    Author: eesmith
    Author-email: eesmith@example.com
    License: Public Domain
    Platform: UNKNOWN

    UNKNOWN
    """

    # This causes the installer to create the command-line 'hello' program.
    entry_points = """\
    [console_scripts]
    hello = hello:main
    """

    WHEEL = """\
    Wheel-Version: 1.0
    Generator: eesmith_wheelgen (0.0.0)
    Root-Is-Purelib: true
    Tag: py3-none-any
    """

    top_level = f"""
    {package_name}
    """

    def build():
      wheel_name = f"{package_name}-{version}-py3-none-any.whl"
      with zipfile.ZipFile(wheel_name, "w") as zip:
        dist_info = f"{package_name}-{version}.dist-info"

        # Add a file and build up information needed for the RECORD .
        record_lines = []
        def add_file(filename, content):
          with zip.open(filename, "w") as f:
            byte_content = content.encode("utf8")
            f.write(byte_content)

          digest = hashlib.sha256(byte_content).digest()
          # decode to text so RECORD says "sha256=<digest>", not "sha256=b'<digest>'"
          encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
          record_line = f"{filename},sha256={encoded},{len(byte_content)}\n"
          record_lines.append(record_line.encode("utf8"))

        add_file(f"{package_name}/__init__.py", payload)
        add_file(f"{dist_info}/METADATA", METADATA)
        add_file(f"{dist_info}/WHEEL", WHEEL)
        add_file(f"{dist_info}/entry_points.txt", entry_points)
        add_file(f"{dist_info}/top_level.txt", top_level)

        with zip.open(f"{dist_info}/RECORD", "w") as f:
          f.writelines(record_lines)

    if __name__ == "__main__":
        build()


Surely you can't install a package if you first can't create it?


You can easily install a third-party package, which will bootstrap you to one way to create package.

But, okay, you want to install a package you wrote, using only a stock Python installation.

1) Why? That is, you can modify your PYTHONPATH to include your working directory.

2) Why use an installer? For simple packages you can drop the file/directory into site-packages.

3) I've been using Python for long enough that I distributed Python packages before there was a setup.py or setuptools. I used a Makefile.

In fact, I still use a Makefile when I need faster build cycle times as setup.py does not do good dependency management.

4) I showed you how to build a wheel using only stock Python code, which lets you use pip as the installer.

Finally, with a source distribution you can tweak Python's own Makefile to build your own module, including a C extension: https://docs.python.org/3/extending/extending.html#compilati...

Why don't any of these resolve your issue?


As someone who just yesterday also went through the exercise of (finally) understanding how to package with pyproject.toml, I empathize with the author and agree with a few pain points he mentioned, namely the confusion caused when opening some pages in https://packaging.python.org/en/latest/ and seeing references to a number of soon-to-be-deprecated tools and approaches to packaging, as though they still have a place on the horizon. It's especially frustrating because the website is versioned, so you would expect a deliberate use of deprecation warnings and clear recommendations to migrate to new approaches and tools. I opened that website with the understanding that pyproject.toml is the future and setup.py is out. Instead, I still saw pages where the two are treated as though they will coexist for a while.

Having said that, the author also sounds like he's ranting a bit. He seems to insist on finding specifically how to work the way setup.py used to, but without setup.py, instead of just learning how to use pyproject.toml. While learning the new way of doing something, how it replaces the old way is usually self-evident. The (official) tutorial he eventually lands on (https://packaging.python.org/en/latest/tutorials/packaging-p...) is actually a pretty good primer. Without previously knowing what hatchling, build, twine, or even pyproject.toml was, I was able to quickly understand their purpose. From clicking a few other links on the sidebar, I understood that packaging is done with tools that present a frontend and interact with a backend. Sometimes a tooling set provides both. Hatch seems to be the frontend of one such project, while Hatchling is the backend.


> The (official) tutorial he eventually lands on ... is actually a pretty good primer.

That tutorial you linked to does not describe how to handle the issue the author faced in building a Python binary extension. It doesn't even describe how to build a C/C++/whatever extension.

At the bottom of the page the text "Read about Packaging binary extensions" points to https://packaging.python.org/en/latest/guides/packaging-bina... which starts "Page Status: Incomplete" and "Last Reviewed: 2013-12-08", and contains a number of "FIXME" sections.

(Parts have been updated during the decade, but the contents of the page do not instill confidence that it's up-to-date.)

That documentation still doesn't show how to configure the extension, instead linking to https://docs.python.org/3/extending/extending.html . That in turn links to https://docs.python.org/3/extending/building.html which links to https://setuptools.pypa.io/en/latest/setuptools.html which takes you to https://setuptools.pypa.io/en/latest/userguide/ext_modules.h... showing how to configure your setup.py.

However, unlike using setup.py directly, where you can hack setup.py's argv, that documentation does not show any way to specify compile-time parameters, "like --system-zstd and --rust-backend as a way to influence the build".

Which is what the author wants to do.


I've had a similarly frustrating time trying to understand and wrangle the pyproject.toml builder system (egg-layer? wheel-roller? cheese-monger?).

One thing the author might want to try is writing their own "build-backend". You can specify your own script (even use setup.py) and that will be the target of python -m build or pip wheel or presumably whatever build-frontend you use.

    # pyproject.toml
    [build-system]
    requires = ["setuptools"]
    build-backend = "setup"  # import setup.py as the build-module
    backend-path = ["."]

Then in setup.py you should write two functions:

    def build_sdist(sdist_directory, config_settings):
        ...

    def build_wheel(wheel_directory, config_settings, metadata_directory):
        ...

Where config_settings is a dictionary of the command line "--config-settings" options passed to the builder. (sys.argv does not have access to the actual invocation, I suppose to ensure frontend standardization)

example:

    $ python -m build --config-setting=foo=bar --config-setting=can-spam

    # will call 
    >>> build_sdist("the/dist/dir", {"foo": "bar", "can": "spam"})

Of course, you can extend the default setuptools build meta so you only have to do the pre-compilation or whatever your custom build step requires:

    from setuptools.build_meta import build_sdist as setuptools_build_sdist

    def build_sdist(sdist_directory, config_settings):
        # ... code-gen and copy files to source  ...

        # this will call setup.py::setup, to make things extra confusing
        return setuptools_build_sdist(sdist_directory, config_settings)
I had to create a temporary MANIFEST.in file to make sure that the setuptools build_sdist saw the generated files. Maybe there's a better way? I think the wheel "just" packages whatever the sdist produces, though that might be more difficult if you're compiling .so files or whatnot.

Still overall pretty fiddly/under-documented and a shame there seems to be a push for more dependencies rather than encouraging users to build their own solutions.

More info in PEP 517: https://peps.python.org/pep-0517/


    I open https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html
    in my browser and see a 4,000+ word blog post.
    Oof. Do I really want/need to read this?
*proceeds to write an 8,000 word blog post about it*

-----

Kidding aside, good content and a great reminder to think before blindly upgrading—at least until the kinks and details are worked through


I’m quite surprised to see that setup.py has been deprecated for years. I set out to setup my first package earlier this year and after spending way too much time trying to figure out what is recommended and why I actually settled on setup.py

I knew it was “the old” way, but didn’t realize it was abandoned.

Getting your code packaged seems way harder than developing your code.


It is not deprecated. Calling setup.py directly is (i.e., `python setup.py sdist`).

The modern way of building a package is to use a build frontend. Most notable is PyPA's build package.

    python -m build .
Most project managers, like Poetry, have this built in with their `poetry build` command.


> No offence meant to the Poetry project here but I don't perceive my project as needing whatever features Poetry provides: I'm just trying to publish a simple library package

I don't understand... Poetry solves this too.


It is not obvious it does.


It's literally written on the home page:

    SHARE YOUR WORK
    Publish
    Make your work known by publishing it to PyPI
    You can also publish on private repositories


I decided (a few years ago) that if I ever have to upgrade the packaging for my stuff, I'm going to do it with nix (and only that - sorry, other OS users, you'll have to install nix - actually: not sorry!). I was in (distro) packaging way too deep and decided that my limited time in this world doesn't allow for that kind of crap anymore.

The blatant mess that ensued in the meantime (i.e. last few years) proves me right, imho.


It is as if someone has taken the time to describe my typical day. I'm not a package maintainer or anything, just a guy trying to keep multiple CI/CD systems working.

I did not get halfway through this before I could feel the hairs on my neck start to stand on end. The numerous blind alleys. The promising leads that aren't. The official documentation that contains links to other documents that contradict the first, so I have to try to piece together some temporal state to make any sense of any of it. Some days it seems I am actually some kind of information archeologist, piecing together the detritus of overlapping civilizations.


In this particular case, OP had a lot of knowledge, which served as a crutch in this journey. The less you are aware of how setuptools used to work, the better.


It's even more fun to migrate the old setup.py process to pyproject.toml and an RPM specfile. With a setup.py you could say something like "python3 setup.py install --root=$builddir".


With pyproject.toml, the new strat is to build the wheel with `python3 -m build` and then install it with `pip install --root=$DESTDIR` plus a handful of flags to tell pip not to touch the network or the local cache.

It's not great, but it's also not terrible.

E.g. https://github.com/cbarrick/efiboot/blob/2ca46a7c27c837adf23...


I know that now, but at the time I migrated I didn't, and I had the same problems as in the post. I still have my problems with using Python in the company, but it isn't from Python or its ecosystem itself. It's from Red Hat splitting their Python packages across the different supported Python versions (package A is only available for 3.6 and package B only for 3.11).



Oh wow, this is a pretty gnarly issue for the Python packaging story, considering the looming deprecation of setup.py.

I traced the discussion to [1], which is an interesting read, but it seems that progress on this died out in April, at least on the Python side.

[1]: https://discuss.python.org/t/linux-distro-patches-to-sysconf...


Grim. Python packaging is a skip (dumpster) full of dead rats.


Out of context (and with limited knowledge) comment 3 seems terrifying.


I assume you mean this part:

> We will clobber any previously installed version of this package, even if it breaks whatever else is installed. It's the user's job to make sure that is all sorted out ahead of time.

But this is a `make install` recipe, and I think this is generally expected behavior when running that command. Typically, this would be run in a chroot when building a package.


Thank you for sharing this journey! Very helpful.


from* ffs


Ouch. I stopped reading 15% of the way through because I noticed that I was only 15% of the way through and I felt like it should be the end.

It seems like with a tenth of the effort of this blog rant the author could have written a flowchart or best practices.

Luckily for 99% of Python users, we only need to install libraries and not package them...


I wrote a very similar post a couple months ago about trying to modernize the Redux JS packages to support ESM and CJS module formats:

https://blog.isquaredsoftware.com/2023/08/esm-modernization-...

I _liked_ Gregory's post. I _felt_ all of the frustration, and the confusion over lack of good docs, and the plethora of competing tools and formats. It resonated deeply for me. I haven't touched Python in years, but I could see exactly how all these pain points were happening thanks to the writing style and the explanations. The goal is to share "here's what I tried", "here's what seems wrong or broken", "here's the pitfalls I ran into", and "here's the _frustration_ I'm feeling at how messed up this all is". And it succeeded.

(And fwiw, I've had a lot of other folks in the JS ecosystem express similar thoughts as thanks for my post. It's a limited target audience, most people won't care, but the folks who _do_ have to deal with these kinds of problems understand and appreciate the details and the effort involved.)


Thank you for this post! I've recently read it and then decided to abandon any attempt at such a migration.


How can he write that when seemingly there are no best practices from Python?


"Rewrite in Pascal"?



