Pypi.org is running a survey on the state of Python packaging (pypi.org)
234 points by zbentley on Sept 7, 2022 | 188 comments



If I see some JS, Go, or Rust code online I know I can probably get it running on my machine in less than 5 min. Most of the time, it's a 'git clone' and a 'yarn' | 'go install' | 'cargo run', and it just works.

With python, it feels like half the time I don't even have the right version of python installed, or it’s somehow not on the right path. And once I actually get to installing dependencies, there are often very opaque errors. (The last 2 years on M1 were really rough)

Setting up Pytorch or Tensorflow + CUDA is a nightmare I've experienced many times.

Having so many ways to manage packages is especially harmful for python because many of those writing python are not professional software engineers, but academics and researchers. If they write something that needs, for example, CUDA 10.2, Python 3.6, and a bunch of C audio drivers - good luck getting that code to work in less than a week. They aren’t writing install scripts, or testing their code on different platforms, and the python ecosystem makes the whole process worse by providing 15 ways of doing basically the same thing.

My proposal:

- Make poetry part of pip

- Make local installation the default (must pass -g for global)

- Provide official first party tooling for starting a new package

- Provide official first party tooling for migrating old dependency setups to the new standard

edit: fmt


What you feel with Python is what I feel with JS:

managing npm and package.json, which comes with dependency issues.

With Python it has always been `pip install package` and we are done.

Although I do share your pain with versioning. I once spent a week debugging an issue only to find out that a fixed version was already available.


I find a difference between managing and using.

As a user of an application: `npm install` just works (same with `cargo build`). For Python, I’ll probably do `python -m venv env; . env/bin/activate` and then, well, it’s probably `pip install -r requirements.txt`, but sometimes it’ll be other things, and there are just too many options. I may well add --ignore-installed and use some locally installed packages of potentially different versions; e.g. when setting up Stable Diffusion recently (the first time I’ve ever used the dGPU on my laptop), I wanted to use the Arch Linux packages for PyTorch and the like.

For managing dependencies in an existing library or application (as distinct from starting from scratch, where you’ll have to make several extra choices in Python, and where npm is badly messed up for libraries), both npm and Python are generally fairly decent, but the whole you-can-only-have-one-version-of-a-library thing from Python lends itself more to insurmountable problems.

My personal background: many years of experience with both npm and Python, but I’ve never done all that much packaging in either. (Also many years of Rust, and I have done packaging there, and it’s so much easier than both.)


> With python it always has been pip install package and we are done.

The worst I encountered was a somewhat lengthy manual to build a project (cannot remember what it was), and not only did you have to manually install all required dependencies -- npm automates this by having an npm-readable list in package.json -- but after running all those commands, the last note said something like "oh BTW, whenever we wrote "pip" above we actually meant "pip3", use that instead, and if you did it wrong, there's no way to undo it."


Ah, the classic boogaloo of them finding out it was pip3 and adding it to the docs at the very end...

I agree with that, it's pretty painful


Once I switched to pnpm many of those problems went away for me. I manage a slew of js/ts monorepos and pnpm is a godsend, not because of performance or workspaces, but because things get resolved sanely.


I used to feel this way in the JS ecosystem. It's been quite a while since I've encountered something insurmountable (unless it's something like Bit using bespoke packaging).


Libraries that use native code in JS or C-bindings in Go are equally annoying to get going if the documentation from the author is sub-par. In the JS world you'll get pages and pages of node-gyp errors that are completely incomprehensible until you realize the author never mentioned that you needed a half dozen little development library dependencies installed. Native C/C++ library code interfacing just sucks in general because there is zero package or dependency management in that world too.


I think some of the pain has been Apple's fault. Requiring conda for some official packages means you always have two competing ecosystems on one machine - which is just asking for pain.


What packages are you referring to? I’ve been doing professional Python dev on a Mac for the past three years and have never had a reason to use conda, so I’m curious what I’m missing.


It’s only the last 5 years, give or take, that you can install C-compiled packages with pip. When I started out, I had to apt/port/pacman install packages, then run pip install (or python setup.py even) to install dependencies. It’s hard for newer python converts to even comprehend the pain it used to be. Conda came with the compiled package concept, and pip came after. Dealing with dependencies these days is a breeze, relatively speaking. On windows you even needed a visual studio license, just to pip install numpy!


> It’s only the last 5 years, give or take, that you can install C-compiled packages with pip

Give, quite a bit. The egg format was introduced in 2004 [0], and even the newer wheel format, which replaced it, is turning 10 later this month.

[0] https://packaging.python.org/en/latest/discussions/wheel-vs-...


Just because the egg/wheel format existed didn't mean that they were actually used and actually worked. Being able to reliably pip install packages like numpy and gdal only started 3-4 years ago. There is a reason that Christoph Gohlke's Python package site exists and was popular.


I started using numpy around 2007 and didn’t see reliable binary installs with good numerical performance until I discovered anaconda, much later (2016?). Maybe I hung out with the wrong people, or libraries. Some libraries did not have compiled wheels a few years ago, and M1 macs still run into the very occasional issue.


9 out of the top 360 packages still do not offer wheels! https://pythonwheels.com/


Not having a wheel is not really an issue for Python-only packages.


You’re running install-time code if you don’t, so I disagree. The wheel format is better suited even for pure python distribution. You can read more details on that site.


Conda environments are the most isolated from the host os, outside of using docker or a vm. They’re also the most heavyweight. As a package manager, you can install a lot of non-python stuff. It’s more similar to apt or yum than pip. You can even do things like run bash on windows with m2, which I usually find preferable to wsl.


pyarrow for example, if you don't have a wheel then good luck building one without using conda. It's not impossible but the developers don't really support it.

Official dependency and build tooling is not properly geared for C extensions. You'll be looking at compiler errors to figure out what dependencies you are missing.


Conda ships Intel MKL for linear algebra, too, meaning numpy.dot (matrix multiply), scipy, fft, solvers, … run 2-10x faster. Well, used to, I think the open source alternatives have improved since.


Installing Python only applications is trivial. What you're complaining about is all the missing code in other languages which isn't controlled by Python and depends on the OS to provide.

This is why we created Linux distributions in the first place. It is not the place of every language to reinvent the wheel - poorly.


Isn’t wheel one of Python’s many packaging schemes (intended to replace eggs)?

> There should be one-- and preferably only one --obvious way to do it.

Oh no…


There are only two: the wheel, which can include prebuilt binaries, and the sdist, which is source-only. Eggs have been deprecated for nearly 10 years now.

> Oh no…

That sentence doesn't mean what you think it means. It is saying that there should be "only one way to do something".


> There should be one-- and preferably only one --obvious way to do it.

Does not mean

> There should only ever be one way to do each thing.

But somehow a great many people read it that way.


I wish pip had some package deduplication implemented. Even some basic local environments have >100MB of dependencies. ML environments go into the gigabytes range from what I remember.


Cargo allows you to share a single package directory. Having a single site-packages and the ability to import a particular version (or latest by default) would solve this.


This is also nice because you can clean up your build files all in one go. Every now and again I delete my cargo target directory and get 5GB back.


I'm not sure what you mean - unlike npm, pip installs only one copy of a dependency in each environment.


So if I have environments A, B, and C and each has a dependency ML-package-1.2 that's 100 MB, it means there's one copy of it in each environment? Meaning 3x100 MB?


> Meaning 3x100 MB?

By default, yes. It is possible to install ML-package-1.2 into your 'base' Python and then have virtual environments A, B, and C all use that instead of installing their own copies. However, this is generally not considered best practice.
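
For illustration, a minimal sketch of that sharing setup using the stdlib venv module (the environment names are made up; this shows the mechanism, not a recommendation):

  # Create environments that can see packages installed in the base
  # interpreter's site-packages instead of carrying their own copies.
  import venv

  for name in ("env-a", "env-b", "env-c"):
      # system_site_packages=True lets the venv fall back to the base
      # install for anything not installed inside the venv itself.
      venv.create(name, system_site_packages=True, with_pip=True)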


Ah, so across multiple environments? You do have a point, but that has nothing to do with pip - it's not an environment manager.

I can see that as an option in tools like venv or poetry, possibly setting a "common" location to share across environments.


As a senior dev who is just starting with Python, I just wish Python would make pip able to handle everything nicely. I hate seeing a bunch of tutorials saying that I should use other tools (like poetry or whatever) because pip doesn't handle X use case.

I completely agree with your feeling. I've been working with JS, PHP, and Ruby for years, and packaging is pretty straightforward in every one of them. And the language versioning works in every one of them (PHP is the most annoying, yeah, but even on PHP it's easier than in Python). Ruby has several alternatives like rbenv or rvm, but any of them has every feature 99.9% of developers need, and they work just fine. I want that in Python ):


For all the complaints I hear about JS/npm, it's been so much less of a hassle than Gradle configs, Gemfiles, Composer files, and whatever python has going on.

I don't know how I'd feel about poetry being part of pip. Didn't poetry just push out a change where they decided that randomly failing 5% of the time when running in a CI env was a good idea?

The Python ecosystem is a bit of a disaster. There are many useful packages and programs written in Python, but they're legit scary to install. I always search for Go or JS alternatives if they're available to avoid all the python headaches.

Every time I need to interact with Python, I find myself banging my head trying to figure out eggs vs wheels vs distutils vs setuptools vs easyinstall vs pip vs pip3 vs /usr/bin/pip3 vs /opt/homebrew/bin/pip3 vs pip3.7 vs pip3.8 vs pipx vs pip-tools vs poetry vs pyenv vs pipenv vs virtualenv vs venv vs conda vs anaconda vs miniconda.


After working with Java for over a year professionally, I really appreciate its dependency management system (Maven or Gradle). Whereas with Python it is always a mess (Poetry looks promising, though).


> Whereas with Python it is always a mess

I’m curious because this gets repeated a lot by many people. What specific messes do you get into?

I’m asking because my experience with python these days is always just doing “python -m venv .venv” activating it and then using “pip install -r requirements.txt”

To update dependencies I do “pip install --upgrade dep”, run tests, and then go “pip freeze > requirements.txt”. This never fails me. Though sometimes updating dependencies does make tests fail, of course, but that’s the fault of the individual packages, not the packaging system. Even so, I’d say even that is rare for me these days.

I know this might only work for 95% of the workflows out there, but I’m very curious as to what specific messes the last 5% end up struggling with, and what makes people like you feel that it’s always a mess and not just “sometimes it gets messy”, etc.


> After working with Java for over a year professionally, I really appreciate its dependency management system(Maven or Gradle).

Personally, it feels better in some ways and worse in others.

The whole pom.xml approach/format that Maven uses seems decent, all the way up to specifying which registries or mirrors you want to use (important if you have something like Nexus). Although publishing your own package needs additional configuration, which may mean putting credentials in text files, though thankfully this can be a temporary step in the CI pipeline.

That said, personally I almost prefer the node_modules approach to dependencies that Node uses (and I guess virtualenv to a degree), given that (at least usually) everything a particular project needs can easily be in a self-contained folder. The shared .m2 cache isn't a bad idea, it can just make cleaning it periodically/for particular projects kind of impossible, which is a shame.

I think one of the aspects that make dependencies better in JVM land is the fact that you oftentimes compile everything your app needs (perhaps without the JVM, though) into a .jar file or something similar, which can then be deployed. Personally, I think that that's one of the few good approaches, at least for business software, which is also why I like Go and .NET when similar deployments are possible. JVM already provides pretty much everything else you might need runtime-wise and you don't find yourself faffing about with system packages, like DB drivers.

That said, what I really dislike about the JVM ecosystem and frameworks like Spring, is the reliance on dynamic loading of classes and the huge amounts of reflection-related code that is put into apps. It's gotten to the point where even if your code has no warnings and actually compiles, it might still easily fail at runtime. Especially once you run into issues where dependencies have different versions of the same package that they need themselves, or alternatively your code doesn't run into the annotations/configuration that it needs.

Thankfully Spring Boot seems like a (decent) step forwards and helps you avoid some of the XML hell, but there is still definitely lots of historical baggage to deal with.

Personally, I like Python because of how easy it is to work with once things actually work and its relatively simplistic nature and rich ecosystem... but package management? I agree that it could definitely use some work. Then again, personally I just largely have stuck with the boring setup of something like pip/virtualenv inside of containers, or whatever is the most popular/widespread at any given moment.

Of course, Python is no exception here, trying to work with old Ruby or Node projects also sometimes has issues with getting things up and running. Personally I feel that the more dependencies you have, the harder it will be to keep your application up and running, and later update it (for example, even in regards to front end, consider React + lots of additional libraries vs something like Angular which has more functionality out of the box, even if more tightly coupled).


For improvements I commented: Remove setup.py files and mandate wheels. This is the root cause of a lot of the evil in the ecosystem.

Next on the list would be pypi namespaces, but there are good reasons why that is very hard.

The mission statement they are proposing, “a packaging ecosystem for all”, completely misses the mark. How about a “packaging ecosystem that works” first?

I spent a bunch of time recently fixing our internal packaging repo (Nexus) because the switch from md5 hashes to sha256 hashes broke everything, and re-locking a bajillion lock files would take literally months of man-hours.

I’ve been a Python user for the last 17 years, so I’m sympathetic to how we got to the current situation and aware that we’ve actually come quite far.

But every time I use Cargo I am insanely jealous, impressed and sad that we don’t have something like it. Poetry is closest, but it’s a far cry.


> Remove setup.py files and mandate wheels

What alternative is there for me?

My package has a combination of hand-built C extensions and Cython extensions, as well as a code generation step during compilation. These are handled through a subclass of setuptools.command.build_ext.build_ext.

Furthermore, I have compile-time options to enable/disable certain configuration options, like enabling/disabling support for OpenMP, via environment variables so they can be passed through from pip.

OpenMP is a compile-time option because the default C compiler on macOS doesn't include OpenMP. You need to install it, using one of various approaches. Which is why I only have a source distribution for macOS, along with a description of the approaches.

I have not found a non-setup.py way to handle my configuration, nor to provide macOS wheels.

Even for the Linux wheels, I have to patch the manylinux Docker container to whitelist libomp (the OpenMP library), using something like this:

  RUN perl -i -pe 's/"libresolv.so.2"/"libresolv.so.2", "libgomp.so.1"/' \
    /opt/_internal/pipx/venvs/auditwheel/lib/python3.9/site-packages/auditwheel/policy/manylinux-policy.json
Oh, and if compiling where platform.machine() == "arm64" then I need to not add the AVX2 compiler flag.
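
To give a flavour of the structure being described, here is a heavily simplified sketch of that kind of setup.py logic (the extension name, source file, and WITH_OPENMP variable are made up, and the codegen step and build_ext subclass are omitted):

  # Heavily simplified sketch, not the actual package's setup.py.
  import os
  import platform
  from setuptools import Extension, setup

  extra_compile_args = []
  extra_link_args = []

  # A compile-time option passed through from pip via an environment variable.
  if os.environ.get("WITH_OPENMP", "0") == "1":
      extra_compile_args.append("-fopenmp")
      extra_link_args.append("-fopenmp")

  # AVX2 intrinsics only exist on x86-64, so skip the flag on Apple Silicon.
  if platform.machine() not in ("arm64", "aarch64"):
      extra_compile_args.append("-mavx2")

  setup(
      ext_modules=[
          Extension(
              "mypkg._popcount",            # hypothetical extension name
              sources=["src/_popcount.c"],
              extra_compile_args=extra_compile_args,
              extra_link_args=extra_link_args,
          )
      ],
  )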

The non-setup.py packaging systems I've looked at are for Python-only code bases. Or, if I understand things correctly, I'm supposed to make a new specialized package which implements PEP 518, which I can then use to boot-strap my code.

Except, that's still going to use effectively arbitrary code during the compilation step (to run the codegen) and still use setup.py to build the extension. So it's not like the evil disappears.


To be clear, I’m not suggesting we remove the ability to compile native extensions.

I’m suggesting we find a better way to build them, something a bit more structured, and decouple that specific use case from setup.py.

It would be cool to be able to structure this in a way that means I can describe what system libraries I may need without having to execute setup.py and find out, and express compile time flags or options in a structured way.

Think of it like cargo.toml vs build.rs.


I agree it would be cool and useful.

But it appears to be such a hard problem that modern packaging tools ignore it, preferring to take on other challenges instead.

My own attempts at extracting Python configuration information to generate a Makefile for personal use (because Makefiles understand dependencies better than setup.py) are a mess, caused by my failure to understand what all the configuration options do.

Given that's the case, when do you think we'll be able to "Remove setup.py files and mandate wheels"?

I'm curious on what evils you're thinking of? I assume the need to run arbitrary Python code just to find metadata is one of them. But can't that be resolved with a pyproject.toml which uses setuptools only for the build backend? So you don't need to remove setup.py, only restrict when it's used, yes?


The closest thing I've seen to a solution in this space is Riff, discussed yesterday [1], which solves the external dependency problem for rust projects.

[1]: https://news.ycombinator.com/item?id=32739954


Agreed.

In my answers to the survey, I mentioned "nix" was the technology most likely to affect the future of Python packaging, in part because of reading that same article on Riff.

I think now I should have mentioned Riff too.


The ability to create a custom package that can run any custom code you want at install time is very powerful. I think a decent solution would be to have a way to mark a package as trusted, and only allow pre/post scripts if they are indeed trusted. Maybe even have specific permissions that can be granted, but that seems like a ton of work to get right across operating systems.

My specific use cases are adding custom CA certs to certifi after it is installed, and modifying the maximum version of a requirement listed for an abandoned library that works fine with a newer version.

I think the best solutions would be an official way to ignore dependencies for a specific package, and specify replacement packages in a project's dependencies. Something like this if it were a Pipfile:

  public-package = {version = "~=1.0",replace_with='path/to/local-package'}
  abandoned-package = {version = "~=*",ignore_dependencies=True}

But the specific problem doesn't matter, what matters is that there will always be exceptions. This is Python, we're all adults here, and we should be able to easily modify things to get them to work the way we want them to. Any protections added should include a way to be dangerous.

I know your point is more about requiring static metadata than using wheels per se. I just believe that all things Python should be flexible and hack-able. There are other more rigid languages if you're into that sort of thing.

edit:

before anyone starts getting angry I know there are other ways to solve the problems I mentioned.

forking/vendoring is a bit of overkill for such a small change, and doesn't solve for when a dependency of a dependency needs to be modified.

monkeypatching works fine, however it would need to be done at all the launch points of the project, and even then if I open a repl and import a specific module to try something it won't have my modifications.

modifying an installed package at runtime works reasonably well, but it can cause a performance hit at launch, and while it only needs to be run once, it still needs to be run once. So if the first thing you do after recreating a virtualenv is to try something with an existing module, we have the same problem as monkey patching.

'just use docker' or maybe the more toned down version: 'create a real setup script for developers' are both valid solutions, and where I'll probably end up. It was just very useful to be able to modify things in a pinch.


Well we're almost there I think. You can define dependencies and other metadata in pyproject.toml nowadays:

https://setuptools.pypa.io/en/latest/userguide/pyproject_con...


How do I specify that I need gfortran installed?


You can't. But is that possible with any programming language specific package manager? How would that even work given that every flavour of OS/distro have their own way of providing gfortran?


You can't. But my g'parent comment in this thread was because my Python module needs the OpenMP library, or compile-time detection that it wasn't there, to skip OpenMP support. The latter is done by an environment variable which my setup.py understands.

Then orf dreamed of a day where you could "describe what system libraries I may need without having to execute setup.py and find out, and express compile time flags."

The link you pointed to doesn't appear to handle what we were talking about. By specifying "gfortran", I hoped to highlight that difference.

riff, building on nix, seems an intriguing solution for this.


I empathize with your situation and it's a great example. As crazy as this may sound, I think you would have to build every possible permutation of your library and make all of them available on PyPI. You'd need some new mechanism based on metadata to represent all the options and figure out how to resolve against available system libraries. Especially that last part seems very complicated. But I do think it's possible.


> not add the AVX2 compiler flag

It is a better idea to do instruction selection at runtime in the code that currently uses AVX2. I recently wrote some docs for Debian contributors about the different ways to achieve this:

https://wiki.debian.org/InstructionSelection


I do that, using manual CPUID tests, along with allowing environment variables to override the default path choices.

But if the compiler by default doesn't enable AVX2 then it will fail to compile the AVX2 intrinsics unless I add -mavx2.

Even worse was ~10 years ago when I had an SSSE3 code path, with one file using SSSE3 intrinsics.

I had to compile only that file for SSSE3, and not the rest of the package, as otherwise the compiler would issue SSSE3 instructions where it decided was appropriate. Including in code that wasn't behind a CPUID check.

Thus it would crash on hardware without SSSE3.

See https://stackoverflow.com/questions/15527611/how-do-i-specif... for more info about my solution. Someone last year contributed a solution for MS Windows.


See the wiki page, the function multi-versioning stuff means you can use AVX2 in select functions without adding -mavx2. And using SIMD Everywhere you can automatically port that to ARM NEON, POWER AltiVec etc.


EDIT: after I wrote the below I realized I could use automatic multi-versioning solely to configure the individual functions, along with a stub function indicating "was this compiled for this arch?" I think that might be more effective should I need to revisit how I support multiple processor architecture dispatch. I will still need the code generation step.

Automatic multi-versioning doesn't handle what I needed, at least not when I started.

I needed a fast way to compute the popcount.

10 years ago, before most machines supported POPCNT, I implemented a variety of popcount algorithms (see https://jcheminf.biomedcentral.com/articles/10.1186/s13321-0... ) and found that the fastest version depended on more that just the CPU instruction set.

I ended up running some timings during startup to figure out the fastest version appropriate to the given hardware, with the option to override it (via environment variables) for things like benchmark comparisons. I used it to generate that table I linked to.
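
As a pure-Python illustration of that benchmark-at-startup dispatch pattern (not the actual C extension; the POPCOUNT_IMPL override variable is made up):

  # Time the candidate implementations once at startup, allow an env override.
  import os
  import timeit

  def popcount_bin(x):
      return bin(x).count("1")

  def popcount_bitcount(x):
      return x.bit_count()                      # Python 3.10+

  _CANDIDATES = {"bin": popcount_bin, "bit_count": popcount_bitcount}

  def _pick():
      forced = os.environ.get("POPCOUNT_IMPL")  # e.g. for benchmark comparisons
      if forced in _CANDIDATES:
          return forced
      sample = (1 << 1024) - 1                  # a 1024-bit value
      timings = {name: timeit.timeit(lambda f=f: f(sample), number=10_000)
                 for name, f in _CANDIDATES.items()}
      return min(timings, key=timings.get)

  popcount = _CANDIDATES[_pick()]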

Function multi-versioning - which I only learned about a few month ago - isn't meant to handle that flexibility. To my understanding.

I still have one code path which uses the __builtin_popcountll intrinsic and another which has inline POPCNT assembly, so I can identify when it's no longer useful to have the inline assembly.

(Though I used AVX2 if available, I've also read that some of the AMD processors have several POPCNT execution ports, so may be faster than using AVX2 for my 1024-bit popcount case. I have the run-time option to choose which to use, if I ever have access to those processors.)

Furthermore, my code generation has one path for single-threaded use and one code path for OpenMP, because I found single-threaded-using-OpenMP was slower than single-threaded-without-OpenMP and it would crash on multithreaded macOS programs, due to conflicts between gcc's OpenMP implementation and Apple's POSIX threads implementation.

The AVX2 popcount is from Muła, Kurz, and Lemire, https://academic.oup.com/comjnl/article-abstract/61/1/111/38... , with manually added prefetch instructions (implemented by Kurz). It does not appear that SIMD Everywhere is the right route for me.


If you implement your own ifunc instead of using the compiler-supplied FMV ifunc, you could do your benchmarks from your custom ifunc that runs before the program's main() and choose the fastest function pointer that way. I don't think FMV can currently do that automatically; theoretically it could, but that would require additional modifications to GCC/LLVM. From the sounds of it, running an ifunc might be too early for you though, if you have to init OpenMP or something non-stateless before benchmarking.

SIMD Everywhere is for a totally different situation; if you want to automatically port your AVX2 code to ARM NEON/etc without having to manually rewrite the AVX2 intrinsics to ARM ones.


> The mission statement they are proposing, “a packaging ecosystem for all”, completely misses the mark. How about a “packaging ecosystem that works” first?

I think at the point a programming language is going on about "mission statements" for a packaging tool, you know they've lost the plot

copy Maven from 2004 (possibly with less XML)

that's it, problem solved


I tend to just give up on a package if it requires a C toolchain to install. Even if I do end up getting things set up in a way that the library's build script is happy with, I'll be inflicting pain on anyone else who then tries to work with my code.


I know this is an unpopular opinion on here, but I believe all this packaging madness is forced on us by languages because Windows (and to a lesser degree macOS) has essentially no package management.

Especially installing a tool chain to compile C code for python is no issue on Linux, but such a pain on Windows.


It may be unpopular but it's correct!

Every language tries to re-implement the package manager, but it ends up breaking down as soon as you need to interact with anything outside of that specific language's ecosystem. The only solution for interacting with the "outside" (other languages, toolchains, etc) is a system level, language agnostic package manager of some kind.

Linux distros' package management is far from perfect but it's still miles ahead of the alternatives!

I highly recommend that people learn how to write and create Linux packages if they need to distribute software. On Arch, for example, this would be creating PKGBUILDs, Gentoo has ebuilds, and other distros have something similar to these things.


> it ends up breaking down as soon as you need to interact with anything outside of that specific language's ecosystem.

It works OK with NuGet on Windows, due to the different approach taken by the OS maintainer.

A DLL compiled 25 years ago for Windows NT still works on a modern Windows, as long as the process is 32-bit. A DLL compiled 15 years ago for 64-bit Vista will still work on a modern Windows without issues at all.

People who need native code in their NuGet packages simply ship native DLLs, often statically linked to the C runtime. Probably the most popular example of such a package is SQLite.

> I very highly recommend people to learn how to write and create Linux packages if they need to distribute software.

I agree. When working on embedded Linux software where I control the environment and don’t care about compatibility with different Linuxes or different CPU architectures, I often package my software into *.deb packages.


C tends to work in those cases because there aren't a significant number of interesting C dependencies to add... because there is no standard C build system, packaging format, or packaging tools.

When juggling as many transitive dependencies in C as folks do with node, python, etc., there's plenty of pain to deal with.


It feels so suboptimal to need the C toolchain to do things, yet have no solid way to depend on it as a non-C library (especially annoying in Rust, which insists on building everything from source and never installing libraries globally).

I make a tool/library that requires the C toolchain at runtime. That's even worse than build time, I need end users to have things like lld, objdump, ranlib, etc installed anywhere they use it. My options are essentially:

- Requiring users to just figure it out with their system package manager

- Building the C toolchain from source at build time and statically linking it (so you get to spend an hour or two recompiling all of LLVM each time you update or clear your package cache! Awesome!),

- Building just LLD/objdump/.. at build time (but users still need to install LLVM, so you get both slow installs AND have to deal with finding a compatible copy of libLLVM),

- Pre-compiling all the C tools and putting them in a storage bucket somewhere, for all architectures and all OS versions. But then there's no support for things like the M1 or new OS versions right away, or for people on uncommon OSes. And now I need to maintain build machines for all of these myself.

- Pre-compile the whole C toolchain to WASM, build Wasmtime from source instead, and just eat the cost of Cranelift running LLVM 5-10x slower than natively...

I keep trying to work around the C toolchain, but I still can't see any very good solution that doesn't make my users have extra problems one way or another.

Hey RiiR evangelism people, anyone want to tackle all of LLVM? .. no? No one? :)


I feel Zig could help here. The binaries ship with LLVM statically linked. You could rely on them to provide binaries for a variety of architecture / OS, and use it to compile code on the target machine. I'll probably explore this at some point for Pip.


...and ensure _all_ package metadata required to perform dependency resolution can be retrieved through an API (in other words without downloading wheels).


Yeah, that’s sort of what I meant by my suggestion. Requirements that can only be resolved by downloading and executing code are a huge burden on tooling


If the package is available as a wheel, you don't need to execute code to see what the requirements are; you just need to parse the "METADATA" file. However, the only way to get the METADATA for a wheel (using PyPA standard APIs, anyway) is to download the whole wheel.
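
For anyone unfamiliar with the format, a minimal sketch of that parsing step (the wheel filename is just an example):

  # A wheel is a zip archive; its dependencies live in *.dist-info/METADATA,
  # an RFC 822-style file the stdlib email parser can read.
  import zipfile
  from email.parser import Parser

  def wheel_requirements(wheel_path):
      with zipfile.ZipFile(wheel_path) as wheel:
          meta_name = next(n for n in wheel.namelist()
                           if n.endswith(".dist-info/METADATA"))
          metadata = Parser().parsestr(wheel.read(meta_name).decode("utf-8"))
      return metadata.get_all("Requires-Dist") or []

  # wheel_requirements("requests-2.28.1-py3-none-any.whl")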

For comparison, pacman (the Arch Linux package manager) packages have fairly similar ".PKGINFO" file in them; but in order to support resolving dependencies without downloading the packages, the server's repository index includes not just a listing of the (name, version) tuple for each package, it also includes each package's full .PKGINFO.

Enhancing the PyPA "Simple repository API" to allow fetching the METADATA independently of the wheel would be a relatively simple enhancement that would make a big difference.

----

As I was writing this comment, I discovered that PyPA did this; they adopted PEP 658 in March of this year! https://github.com/pypa/packaging.python.org/commit/1ebb57b7...


Pip can use range requests to fetch just a part of the wheel, and lift the metadata out of that. So it can sometimes avoid downloading the entire wheel just to get the deps. Some package servers don't support this though.

Also, there's a difference between a pep being adopted and that pep being implemented (usually a bunch of elbow grease). That said there are a couple exciting steps towards 658 being implemented: https://github.com/pypa/pip/pull/11111 (just approved yesterday, not yet merged) https://github.com/pypi/warehouse/issues/8254 (been open for forever, but there has been incremental progress made. Warehouse seems to not attract the same amount of contribution as pip)


Yeah. Well, mandating wheels and getting rid of setup.py at least avoids having to run scripts, and indeed enables the next step which would be indexing all the metadata and exposing it through an API. I just thought it wouldn't necessarily be obvious to all readers of your comment.


Just to be clear, package metadata already is sort of available through the pypi json api. I've got the entire set of all package metadata here: https://github.com/orf/pypi-data

  $ gzcat release_data/c/d/cdklabs.cdk-hyperledger-fabric-network.json.gz | jq '. | to_entries | .[].value.info.requires_dist' | head
  [
    "typeguard (~=2.13.3)",
    "publication (>=0.0.3)",
    "jsii (<2.0.0,>=1.63.2)",
    "constructs (<11.0.0,>=10.0.5)",
    "aws-cdk-lib (<3.0.0,>=2.33.0)"
  ]
It's just that not everything has it, and there isn't a way to differentiate between "missing" and "no dependencies". And it's also only for the `dist` releases. But anyway, poetry uses this information during dependency resolution.
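
For reference, pulling the same field straight from the JSON API looks roughly like this (a sketch assuming the pypi.org/pypi/<project>/json endpoint):

  # Returns the requires_dist list for a project's latest release, or None
  # when the metadata is absent (which, as noted above, is ambiguous).
  import json
  from urllib.request import urlopen

  def requires_dist(project):
      with urlopen(f"https://pypi.org/pypi/{project}/json") as resp:
          return json.load(resp)["info"].get("requires_dist")

  # requires_dist("poetry")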


I'm aware! The issue is the mixed content (wheels, and... not wheels) on pypi. If the data is incomplete, it's useless in the sense that you're never going to be able to guarantee good results.


What if I have a dependency on a commercial third-party Python package which is on Conda but not on PyPI?


You are placing open code behind a vendor lock-in, for a start.


Yes, I understand that.

I see I misunderstood korijn's comment. My earlier reply is off-topic, so I won't continue further off the track.


> For improvements I commented: Remove setup.py files and mandate wheels.

This would make most C extensions impossible to install on anything other than x86_64-pc-linux-gnu (or arm-linux-gnueabihf/aarch64-linux-gnu if you are lucky) because developers don't want to bother building wheels for them.


I think it'd make other things impossible too. One project I help maintain is mainly C++. It optionally has Python bindings. It also has something like 150 options to the build that affect things. There is zero chance of me ever attempting to make `setup.py` any kind of sensible "entry point" to the build. Instead, the build detects "oh, you want a wheel" and generates `setup.py` to just grab what the C++ build then drops into a place where `build_ext` or whatever expects them to be, using some fun globs. It also fills in "features" or whatever the post-name `[name]` stuff is called so you can do some kind of post-build "ok, it has a feature I need" inspection.


cibuildwheel (which is an official, supported tool) has made this enormously easier. I test and generate wheels with a compiled (Rust! Because of course) extension using a Cython bridge for all supported Python versions for 32-bit and 64-bit Windows, macOS x86_64 and arm64, and whatever manylinux is calling itself this week. No user compilation required. It took about half a day to set up, and is extremely well documented.


Setup.py can do things wheels can't. Most notably, it's the only installation method that can invoke 2to3 at install time without requiring a dev to create multiple packages.


It’s lucky Python 2 isn’t supported anymore then, and everyone has had like a decade to run 2to3 once and publish a package for Python 3, so that use case becomes meaningless.


You'd be surprised at how many billions of lines of production code are still on 2 (and could not care less whether it's end-of-lifed)


I'm not surprised at all, but regardless they also should not be similarly surprised if people could not care less about that use case.


Very unfortunately, the direct burden of Python 2 is placed on the packagers. Users of Python 2 (like me) like their libs and have no horse in this demonization campaign


Pay for support for Python 2 then? At which point it’s a burden on the person you are paying. Or don't, in which case you're complaining that people are demonizing you because they are not doing your work for free?


This is battle fatigue in action! I did not complain; in fact I am faced with the serious burden that is placed on packagers of Python on a regular basis, and have put many cycles of thought into it. It is obvious that packagers and language designers are firmly at one end of a sort of spectrum, while many users, perhaps engineering managers in production with sunk costs, are firmly at the other, with many in between. More could be said. No one is an enemy on this; it is complicated to solve, with many tradeoffs and moving parts. The topic today is evidence of that.


Many "Python" packages include native code in some form either as bindings or to workaround Python being agonizingly slow. Which means you often need to call make, or cmake or some other build system anyway... unless you want to build wheels for every possible configuration a user might have (which is virtually impossible, considering every combination of OS, architecture, debug options, etc. you may want to support). Plus you need a build system to build the wheels anyway...


I suggested "one packaging system to rule them all." The fragmentation in this space is frustrating.


I recommend PDM over poetry!


I have a terrible admission to make: one of the reasons I like Python is its huge standard library, and I like that because I just ... despise looking for libraries, trying to install them, evaluating their fitness, and so on.

I view dependencies outside of the standard library as a kind of technical debt, not because I suffer from Not Invented Here and want to code it myself, no, I look and think, "Why isn't this in the standard library with a working set of idioms around it?"

I haven't developed anything with more than five digits of code to it, which is fine for me, but part of it is just ... avoidance of having to screw with libraries. Ran into a pip issue I won't go into (it requires a lot of justification to see how I got there) and just ... slumped over.

This has been a bad spot in Python for a long, long time. While people are busy cramming their favorite feature from their last language into Python, this sort of thing has languished.

Sadly, I have nothing to offer but encouragement, I don't know the complexities of packaging, it seems like a huge topic that perhaps nobody really dreamed Python would have to seriously deal with twenty years ago.


> despise looking for libraries, trying to install them, evaluating their fitness, and so on.

This is exactly why I prefer the larger opinionated web frameworks (Django, Vue.js) to the smaller, more composable frameworks (Flask, React). I don’t want to make decisions every time I need a new feature, I want something that “just works”.

Python and Django just work, and brilliantly at that!


Currently dealing with Flask, and the endless decision fatigue makes me sad. Enormous variations in the quality of code, documentation, SO answers, etc. And that's before even considering the potential for supply-chain attacks.

With Django there is a happy path answer for most everything. If I run into a problem, I know I'm not the first.


This is one of the reasons why I don't quite like Node. It feels like everything is a dependency.

It seems ridiculous to me that there isn't a native method for something as simple and ubiquitous as putting a thread to sleep, or that there is an external library (underscore) that provides 100+ methods that seem to be staples in any modern language.

Python is nice in that way. It is also opinionated in a cohesive and community driven manner, e.g. PEP8.


If requests and a basic web framework was in the standard library you’d effectively eliminate the majority of my dependencies.

Honestly, I don’t see package management being an issue for most end users. Between the builtin venv, conda, and Docker, I feel that the use cases for most are well covered.

The only focus area I really see is better documentation. Easier-to-read documentation, more precisely. Perhaps a set of templates to help people get started with something like pyproject.

It feels like the survey is looking for a specific answer, or maybe it’s just that surveys are really hard to do. In any case I find responses to be mostly: I have no opinion one way or the other.


Something like bottle.py would be an excellent candidate for inclusion. The real reason to avoid putting anything into the standard library is that it seems to often be the place where code goes to stagnate and die for Python.


I am not sure why that has turned into a truism.

Really good code in the standard library should reach a level of near perfection, then eventually transition into hopeful speed gains, after which you're really only changing that code because the language has changed or the specification has updated.


> I view dependencies outside of the standard library as a kind of technical debt

That's an interesting position. So are you suggesting that very specialised packages such as graph plotting, ML-packages, file formats, and image processing should be part of the standard library? What about very OS/hardware-specific packages, such as libraries for microcontrollers?

There are many areas that don't have a common agreed-upon set of idioms or functionality and that are way too specialised to be useful for most users. I really don't think putting those into the standard library would be a good idea.


Hrm. Graph-plotting ... yes. File formats ... yes, as many as possible. Image processing, given the success of ImageMagick, I'd say yes there as well. I don't know much about ML to say.

OS-specific packages, quite possibly.

The thing about the standard library is that it is like high school: there's a lot of stuff you think you will never need, and you're right about most of it, but the stuff you do need you're glad you had something going, at least.


ImageMagick is actually a good example: I use Python as my primary tool for shell scripting (I don't like "traditional" shell scripts for various reasons) - if I can use Python to control external tools such as ImageMagick, why would I want to include all its functionality, codecs, effects, etc. in the standard library?

Including too much leads to a huge burden for the maintainers and consequently results in this: https://peps.python.org/pep-0594/

Quote:

> Times have changed. With the introduction of PyPI (née Cheeseshop), setuptools, and later pip, it became simple and straightforward to download and install packages. Nowadays Python has a rich and vibrant ecosystem of third-party packages. It’s pretty much standard to either install packages from PyPI or use one of the many Python or Linux distributions.

> On the other hand, Python’s standard library is piling up with cruft, unnecessary duplication of functionality, and dispensable features.


More packages in the standard library means it can run on fewer machines and more extra junk needs to be installed.

Minimal standard library languages let you pick and choose what needs to be run. Golang is a nice happy medium since it’s compiled.


I didn't take the survey because I've never packaged anything for PyPI, but I wish all of the package managers would have an option for domain validated namespaces.

If I own example.com, I should be able to have 'pypi.org/example.com/package'. The domain can be tied back to my (domain verified) GitHub profile and it opens up the possibility of using something like 'example.com/.well-known/pypi/' for self-managed signing keys, etc..

I could be using the same namespace for every package manager in existence if domain validated namespaces were common.

Then, in my perfect world, something like Sigstore could support code signing with domain validated identities. Domain validated signatures make a lot of sense. Domains are relatively inexpensive, inexhaustible, and globally unique.

For code signing, I recognize a lot of project names and developer handles while knowing zero real names for the companies / developers involved. If those were sitting under a recognizable organizational domain name (example.com/ryan29) I can do a significantly better job of judging something's trustworthiness than if it's attributed to 'Ryan Smith Inc.', right?


Maven Central requires validation of a domain name in order to use a reverse-domain package[0].

It's not without problems. One is that folks often don't control the domain (consider Go's charming habit of conflating version control with package namespacing). Another is what was noted below: resurrection attacks on domains can be quite trivial and already happen in other forms (eg registering lapsed domains for user accounts and performing a reset).

[0] https://central.sonatype.org/faq/how-to-set-txt-record/
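
For a rough idea of what the client side of such a check amounts to (a sketch only, not Sonatype's actual tooling; it assumes the third-party dnspython package and a made-up token format):

  # Look for an expected verification token in the domain's TXT records.
  import dns.resolver

  def domain_has_token(domain, expected):
      answers = dns.resolver.resolve(domain, "TXT")
      records = (b"".join(rdata.strings).decode() for rdata in answers)
      return any(expected in record for record in records)

  # domain_has_token("example.com", "pypi-verification=abc123")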


That's a really interesting idea, but I worry about what happens when a domain name expires and is re-registered (potentially even maliciously) by someone else.


I think you'd probably need some buy in from the domain registries and ICANN to make it really solid. Ideally, domains would have something similar to public certificate transparency logs where domain expirations would be recorded. I even think it would be reasonable to log registrant changes (legal registrant, not contact info). In both cases, it wouldn't need to include any identifiable info, just a simple expired/ownership changed trigger so others would know they need to revalidate related identities.

I don't know if registries would play ball with something like that, but it would be useful and should probably exist anyway. I would even argue that once a domain rolls through grace, redemption, etc. and gets dropped / re-registered, that should invalidate it as an account recovery method everywhere it's in use.

There's a bit of complexity when it comes to the actual validation because of stuff like that. I think you'd need buy in from at least one large company that could do the actual verification and attest to interested parties via something like OAuth. Think along the lines of "verify your domain by logging in with GitHub" and at GitHub an organization owner that's validated their domain would be allowed to grant OAuth permission to read the verified domain name.


You've already talked about Sigstore (which is an excellent technology for this space), so we can consider developers holding keys that are stored in an append-only log. Then it doesn't matter if the domain expires and someone re-registers it, since they don't have the developer's private keys.

Of course there are going to be complexities involving key-rollover and migrating to a different domain, but a sufficiently intelligent Sigstore client could handle the various messages and cryptographic proofs needed to secure that. The hard part is how to issue a new key if you lose the old one, since that probably requires social vouching and a reputation system.

[0] https://docs.sigstore.dev/


> Then it doesn't matter if the domain expires and someone re-registers it, since they don't have the developer's private keys.

A principal reason to use sigstore is to get out of the business of handling private keys entirely. It turns a key management problem into an identity problem, the latter being much easier to solve at scale.


> Then it doesn't matter if the domain expires and someone re-registers it, since they don't have the developer's private keys.

That's a good point in terms of invalidation, but a new domain registrant should be able to claim the namespace and start using it.

I think one possible solution to that would be to assume namespaces can have their ownership changed and build something that works with that assumption.

Think along the lines of having 'pypi.org/example.com' be a redirect to an immutable organization; 'pypi.org/abcd1234'. If a new domain owner wants to take over the namespace they won't have access to the existing account and re-validating to take ownership would force them to use a different immutable organization; 'pypi.org/ef567890'.

If you have a package locking system (like NPM), it would lock to the immutable organization and any updates that resolve to a new organization could throw a warning and require explicit approval. If you think of it like an organization lock:

v1:

    pypi.org/example.com --> pypi.org/abcd1234
v2:

    pypi.org/example.com --> pypi.org/ef123456
If you go from v1 to v2 you know there was an ownership change or, at the very least, an event that you need to investigate.

Losing control of a domain would be recoverable because existing artifacts wouldn't be impacted and you could use the immutable organization to publish the change since that's technically the source of truth for the artifacts. Put another way, the immutable organization has a pointer back to the current domain-validated namespace:

v1:

    pypi.org/abcd1234 --> example.com
v2:

    pypi.org/abcd1234 --> example.net
If you go from v1 to v2 you know the owner of the artifacts you want has moved from the domain example.com to example.net. The package manager could give a warning about this and let an artifact consumer approve it, but it's less risky than the change above because the owner of 'abcd1234' hasn't changed and you're already trusting them.

I think that's a reasonably effective way of solving attacks that rely on registering expired domains to take over a namespace and it also makes it fairly trivial for namespace owners to point artifact consumers to a new domain if needed.
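
A client-side check for that could be as small as something like this (purely illustrative, with made-up identifiers):

  # Compare the org id a namespace resolves to now against the locked id,
  # and require explicit approval when they differ.
  locked = {"example.com": "abcd1234"}      # recorded in the project's lock file
  resolved = {"example.com": "ef123456"}    # reported by the index at install time

  for namespace, org_id in resolved.items():
      if namespace in locked and locked[namespace] != org_id:
          raise SystemExit(
              f"{namespace} now maps to org {org_id} (was {locked[namespace]}): "
              "ownership may have changed, explicit approval required"
          )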

Think of the validated domain as more of a vanity pointer than an actual artifact repository. In fact, thinking about it like that, you don't actually need any cooperation or buy in from the domain registries.

> The hard part is how to issue a new key if you lose the old one, since that probably requires social vouching and a reputation system.

It's actually really hard because as you increase the value of a key, I think you decrease the security practices around handling them. For example, some people will simply drop their keys into OneDrive if there's any inconvenience associated with losing them.

I would really like to have something where I can use a key generated on a tamper proof device like a YubiKey and not have to worry about losing it. Ideally, I could register a new key without any friction.


Needs compensating controls to get it right.

* Dependencies are managed in a similar way to Go - where hashes of installed packages are stored and compared client side. This means that a hijacker could only serve up the valid versions of packages that I’ve already installed.

* This is still a “centralized” model where a certain level of trust is placed in PyPi - a mode of operation where the “fingerprint” of the TLS key is validated would assist here. However it comes with a few constraints.

Of course the above still comes with the caveat that you have to trust pypi. I’m not saying that this is an unreasonable ask. It’s just how it is.
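
The client-side comparison itself is the easy part; a sketch with hashlib (the path and pinned digest below are placeholders), which is roughly what pip's existing hash-checking mode (--require-hashes with sha256 pins in requirements.txt) already does for downloads:

  # Verify a downloaded artifact against a pinned sha256 digest.
  import hashlib

  PINNED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

  def verify(path):
      with open(path, "rb") as f:
          return hashlib.sha256(f.read()).hexdigest() == PINNED_SHA256

  # verify("ml_package-1.2-py3-none-any.whl")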


CT: Certificate Transparency logs log creation and revocation events.

The Google/trillian database which supports Google's CT logs uses Merkle trees but stores the records in a centralized data store - meaning there's at least one SPOF Single Point of Failure - which one party has root on and sole backup privileges for.

Keybase, for example, stores their root keys - at least - in a distributed, redundantly-backed-up blockchain that nobody has root on; and key creation and revocation events are publicly logged similarly to now-called "CT logs".

You can link your Keybase identity with your other online identities by proving control by posting a cryptographic proof; thus adding an edge to a WoT Web of Trust.

While you can add DNS record types like CERT, OPENPGPKEY, SSHFP, CAA, RRSIG, NSEC3; DNSSEC and DoH/DoT/DoQ cannot be considered to be universally deployed across all TLDs. Should/do e.g. ACME DNS challenges fail when a TLD doesn't support DNSSEC, or hasn't secured root nameservers to a sufficient baseline, or? DNS is not a trustless system.

ENS (the Ethereum Name Service) is a trustless system. Reading ENS records does not cost ENS clients any gas/particles/opcodes/ops/money.

Blockcerts is designed to issue any sort of credential, and allow for signing of any RDF graph like JSON-LD.

List_of_DNS_record_types: https://en.wikipedia.org/wiki/List_of_DNS_record_types

Blockcerts: https://www.blockcerts.org/ https://github.com/blockchain-certificates :

> Blockcerts is an open standard for creating, issuing, viewing, and verifying blockchain-based certificates

W3C VC-DATA-MODEL: https://w3c.github.io/vc-data-model/ :

> Credentials are a part of our daily lives; driver's licenses are used to assert that we are capable of operating a motor vehicle, university degrees can be used to assert our level of education, and government-issued passports enable us to travel between countries. This specification provides a mechanism to express these sorts of credentials on the Web in a way that is cryptographically secure, privacy respecting, and machine-verifiable

W3C VC-DATA-INTEGRITY: "Verifiable Credential Data Integrity 1.0" https://w3c.github.io/vc-data-integrity/#introduction :

> This specification describes mechanisms for ensuring the authenticity and integrity of Verifiable Credentials and similar types of constrained digital documents using cryptography, especially through the use of digital signatures and related mathematical proofs. Cryptographic proofs enable functionality that is useful to implementors of distributed systems. For example, proofs can be used to: Make statements that can be shared without loss of trust,

W3C TR DID (Decentralized Identifiers) https://www.w3.org/TR/did-core/ :

> Decentralized identifiers (DIDs) are a new type of identifier that enables verifiable, decentralized digital identity. A DID refers to any subject (e.g., a person, organization, thing, data model, abstract entity, etc.) as determined by the controller of the DID. In contrast to typical, federated identifiers, DIDs have been designed so that they may be decoupled from centralized registries, identity providers, and certificate authorities. Specifically, while other parties might be used to help enable the discovery of information related to a DID, the design enables the controller of a DID to prove control over it without requiring permission from any other party. DIDs are URIs that associate a DID subject with a DID document allowing trustable interactions associated with that subject.

> Each DID document can express cryptographic material, verification methods, or services, which provide a set of mechanisms enabling a DID controller to prove control of the DID. Services enable trusted interactions associated with the DID subject. A DID might provide the means to return the DID subject itself, if the DID subject is an information resource such as a data model.


For another example of how Ethereum might be useful for certificate transparency, there's a fascinating paper from 2016 called "EthIKS: Using Ethereum to audit a CONIKS key transparency log" which is probably way ahead of its time.

Abstract: https://link.springer.com/chapter/10.1007/978-3-662-53357-4_...

PDF: https://jbonneau.com/doc/B16b-BITCOIN-ethiks.pdf


Certificate Transparency: https://en.wikipedia.org/wiki/Certificate_Transparency

/? "Certificate Transparency" Blockchain https://scholar.google.com/scholar?q=%22Certificate+Transpar... https://scholar.google.com/scholar_alerts?view_op=list_alert...

- Some of these depend upon a private QKD [fiber,] line

- NIST PQ algos are only just now announced: https://news.ycombinator.com/item?id=32281357 : Kyber, NTRU, {FIPS-140-3}?

/? Ctrl-F "Certificate Transparency" https://westurner.github.io/hnlog/ :

"Google's Certificate Transparency Search page to be discontinued May 15th, 2022" https://news.ycombinator.com/item?id=30781698

- LetsEncrypt Oak is also powered by Google/trillian, which is a trustful centralized database

- e.g. Graph token (GRT) supports Indexing (search) and Curation of datasets

> And what about indexing and search queries at volume, again without replication?

My understanding is that the Sigstore folks are now more open to the idea of a trustless DLT? "W3C Verifiable Credentials" is a future-proof, standardized way to sign RDF (JSON-LD) documents with DIDs.

Verifiable Credentials: https://en.wikipedia.org/wiki/Verifiable_credentials

# Reproducible Science Publishing workflow procedures with Linked Data:

- Sign the git commits (GPG,)

- Sign the git tags (GPG+Sigstore, ORCID & DOI (-> W3C DIDs), FigShare, Zenodo,)

- Sign the package(s) and/or ScholarlyArticle & their metadata & manifest (Sigstore, pkg_tool_xyz, CodeMeta RDF/JSON-LD,),

- Sign the SBOM (CycloneDx, Sigstore,)

- Search for CVEs/vulns & Issues for everything in the SBOM (Dependabot, OSV,)

- Search for trusted package hashes for everything in the SBOM

- Sign the archive/VM/container image (Docker Notary TUF, Sigstore,)

- Archive & Upload & Restore & Verify (and then Upgrade Versions in the) from the dependency specifications, SBOM, and/or archive/VM/container image (VM/container tools, repo2docker (REES),)

- Upgrade Versions and run unit, functional, and integration tests ({pip-tools, pipenv, poetry, mamba}, pytest, CI, Dependabot,)


Sigstore uses Trillian for its transparency log, Rekor.


My wishlist:

We need a way to configure an ordered list of indexes that pip searches for packages. `--extra-index-url` or using a proxy index is not the solution.

Also namespaces, not based on a domain. So for example: `pip install apache:parquet`

Also some logic either in the pip client or index server to minimize typosquatting

Also pip should adopt a lock file similar to npm/yarn's, instead of requirements.txt.

And also “pip list” should output a dependency tree like “npm list”

I should not have to compile source when I install. Every package should have wheels available for the most common arch+OS combos.

Also we need a way to download only what you need. Why does installing scipy or numpy install more dependencies than the conda version? For example pywin and scipy.


If you are using poetry you can add something to the pyproject.toml to handle the indexes, though I am not sure if they are ordered or not

    [[tool.poetry.source]]
    name = "my-pypi"
    url = "https://my-pypi-index.wherever"
    secondary = true


Thanks. I’ll look into this.


Typosquatting has been looked at and is still being looked at:

https://github.com/pypi/warehouse/pull/5001 - had to be reverted because it was too noisy

https://github.com/pypi/warehouse/issues/9527


> apache:parquet

How are you going to name the file storing the wheel for that package? Using ":" on Windows is going to be problematic.


That’s right. Then some other delimiter.
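Purely as an illustration of "some other delimiter" (nothing like this exists in pip or the wheel spec today), a namespaced name could be mapped to a filename-safe form before the wheel name is written to disk. The `wheel_safe` helper below is hypothetical:

    import re

    def wheel_safe(namespaced_name):
        """Hypothetical mapping of a namespaced package name to something
        legal in Windows file names (':' is not allowed there)."""
        return re.sub(r"[^A-Za-z0-9._]+", "__", namespaced_name)

    print(wheel_safe("apache:parquet"))   # -> apache__parquet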



Seems more oriented to (potential) contributors than end users of the packaging system. Who cares about mission statements and inclusivity, secure funding and pay developers to make the tools.


> Who cares about mission statements and inclusivity, secure funding and pay developers to make the tools.

These are connected things.

I maintain a PyPA member project (and contribute to many others), and the latter is aided by the former: the mission statement keeps the community organized around shared goals (such as standardizing Python's packaging tooling), and inclusivity ensures a healthy and steady flow of new contributors (and potential corporate funding sources).


> I maintain a PyPA member project (and contribute to many others)

THANK YOU!

> keeps the community organized around shared goals (such as standardizing Python's packaging tooling)

Personally I felt some disconnect between "package manager for all" and the need for "standardizing Python's packaging tooling." Yes, communities should be welcoming and friendly to everyone, AND the community should have clear expectations for best practices that members should follow. E.g., is an experienced female developer more likely to give up on contributing because she couldn't find a local meetup, or because she didn't know whether to create pyproject.toml vs requirements.txt? In some sense, the bigger and more diverse the community, the greater the need for a clear, solid foundation. IDK if that's remotely clear; it's just a feeling I had going through some of those questions.


The PSF are not engineers looking for a better developer experience, but politicians looking for power. That's why the pipenv fiasco happened a few years ago.


What was the pipenv fiasco?


Pipenv is pretty nice, I still use it. Kenneth Reitz really has a knack for interfaces and making things easier for developers to use.

The fiasco was that he was not the best at maintaining projects, and used his popularity from tablib and requests to get pipenv recommended by the PyPA well before it was ready for general use. Then around the same time there were a few scandals I don't remember the details of, something about a developer being mad that Kenneth kept the money being donated to requests. It ended up with all of his popular projects being maintained by others.

I wish the response from the PyPA had been to go all in on pipenv and just keep improving it at the same rate it was improving in the beginning. Instead it stagnated. Poetry came out, fragmenting the ecosystem even more. And quite a few developers gave up on virtual environments in favor of Docker.

pipenv started having regular releases again and has kept them going for the past few years. I like it enough not to want to put any effort into switching, but in 2017/2018 it really felt like the next big thing. It makes me sad to think that we all lost out because of politics.


I was trying to use Pipenv around the time of the drama.

Ended up switching to Poetry because it had a superior dependency resolution algorithm.


Ugh, I remember the dependency problems. I think we worked around it by explicitly installing dependencies first. We almost did switch to Poetry because of that, but it came down to "everyone is busy, let's just see how often it is a problem". Luckily we didn't run into it too many times, and they have since fixed it.


This survey is the literal definition of a leading question. I found about 2 boxes I could tick before being forced to rank a list of the designer's preferences according to how much I agree with them. The only data that can be generated from a survey like this is the data you wanted to find (see also the Boston Consulting Group article earlier today). I cannot honestly respond to it.

The only question I have is, what grant application(s) is the survey data being used to support?


The absence of the go binary as a tool (i.e. "go get ...", "go install ..." etc.) is odd, considering that is what has been eating Python's lunch lately.


I imagine many of you have feedback that could be useful to folks making decisions about the future of Python packaging, a common subject of complaint in many discussions here.

Remember not to just complain, but to offer specific problems/solutions--i.e. avoid statements like "virtualenvs suck, why can't it be like NPM?" and prefer instead feedback like "the difference between Python interpreter version and what virtualenv is being used causes confusion".


"virtualenvs suck, why can't it be like NPM?" is a specific problem and a specific solution. The problem being having to manage venvs (which have many gotchas and pitfalls, and no standarization), and the solution is to replace those with packages being installed into the project folder with standardized and well-known tools.


Keep an eye on https://peps.python.org/pep-0582/ - it's a proposal to add local-directory/node_modules-like behavior to package installs. It stalled out a few years ago, but I heard there is a lot more discussion and a push to get it in now.

I think if this PEP makes it in then like 90% of people's pain with pip just completely goes away almost overnight. Love it or hate it the NPM/node_modules style of all dependencies dumped in a local directory solves a _ton_ of problems in the packaging world. It would go a long way towards making the experience much smoother for most python users.
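For a rough idea of what that would look like (a sketch of the PEP's proposal as I read it, not something pip does today): dependencies would sit in a `__pypackages__` directory next to your code and the interpreter would add it to sys.path automatically. You can approximate it by hand today with `pip install --target`:

    # Hypothetical project layout under PEP 582:
    #
    #   myproject/
    #       app.py
    #       __pypackages__/3.10/lib/    <- dependencies installed here
    #
    # Approximating the sys.path behavior manually at the top of app.py:
    import pathlib, sys

    ver = f"{sys.version_info.major}.{sys.version_info.minor}"
    libdir = pathlib.Path(__file__).parent / "__pypackages__" / ver / "lib"
    sys.path.insert(0, str(libdir))

    # Populate it with something like:
    #   pip install --target __pypackages__/3.10/lib requests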


I did a proof-of-concept for this a few years ago: https://github.com/jogjayr/pykg

It literally uses npm, the NPM registry, and node_modules for Python dependency management.


I am pretty happy with PyPI/pip; it is an easy way to distribute Python and C++ code wrapped in a Python extension to others. For a C++ developer it is becoming harder to distribute native executables, since macOS and Windows require signing binaries. Python package version conflicts and backwards incompatibility can be an issue.


I like poetry for its simplicity but I can’t tell how “official” it is in the python ecosystem. I hope it doesn’t die out. I think it’s the simplest possible way to maintain deps and publish to PyPI if you don’t have any weird edge cases.


Couldn't agree more. Poetry is fantastic and provides that 'just works' experience for most cases. It's not official (although possibly should be adopted) but has gained ground by virtue of its quality. Fortunately it's very actively developed so will hopefully stick around.


I like poetry. It still has a way to go though since it is slow as all hell doing almost anything and its error messages are closer to a stack trace than something actionable.


I agree that the stack trace error messages are weird. That aspect feels uncharacteristically hacky for an otherwise pretty polished tool.


Supposedly this has been improved in 1.2.0: https://python-poetry.org/blog/announcing-poetry-1.2.0/#non-...


The state of Python packaging changes so frequently that I wouldn't even be able to answer this survey without looking up best practices. We're supposed to be using YAML files now, right?


I think you meant TOML files.

P.S. By the way https://peps.python.org/pep-0518/#other-file-formats


It should be made the official package manager, IMO.


I’ve been tinkering with stable diffusion lately and this has been a rude introduction to python. Coming from .net (nuget) and JavaScript (npm), it’s baffling that there isn’t an established solution for python. It looks to me like people are trying, but different libraries use different techniques. To a newcomer this is confusing.


>I’ve been tinkering with stable diffusion lately and this has been a rude introduction to python. Coming from .net (nuget) and JavaScript (npm), it’s baffling that there isn’t an established solution for python.

Python has had multiple legacy solutions going back a long time before nuget and npm existed, and before central registries of dependencies. Every new solution has to cope with all that compatibility/transitional baggage. There is also a bunch of use cases .NET or JS never really had to deal much with - e.g. being a core system language for Linux distros, and supporting cross-platform installs back in the download-and-run-something days. The scope of areas Python gets used in means its packaging is pulled in more directions than in most other languages, which mostly stick to a main niche.

So the history and surface area of problems to solve in Python packaging is larger than what most other languages have had to deal with. It also takes years for the many 3rd party tools to try out new approaches, gain traction and then slowly get their best ideas synthesized and adapted into the much more conservative core Python stdlib.

Not saying it is great, just laying out some of the reasons it is what it is.


ML/AI is the deep end of python dependencies. Lots of hardware specific requirements (e.g. CUDA, cuDNN, AVX2 TensorFlow binaries, etc). A typical python web application is a lot simpler.


To be fair, this is a big part of the problem - I'm doing this on an M1 Mac, PLUS am new to Python. Fortunately there are some good guides. Still, this is a solvable problem in other package managers.


Installing packages, creating a manifest of dependencies, managing virtual environments, packaging, checking/formatting code, etc. should be built into the Python toolchain (the python binary itself). Needing to choose a bunch of third-party tools to make it work makes Python, well... un-pythonic.


Not sure about formatting code, I think that's a job for IDE or text editor, not the runtime.

Absolutely agree about the rest of them.


I was thinking of something similar to go's "go fmt". Something that would standardize the indentation (tab to 4 spaces), sort imports in the correct order, etc.
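For what it's worth, the closest thing today is third-party tooling rather than the python binary itself; here's a small sketch using black and isort (both pip-installable packages, also usable as plain CLI commands), which together cover the "normalize indentation, sort imports" idea:

    import black
    import isort

    source = "import sys\nimport os\nx=[1,2,3]\n"
    source = isort.code(source)                             # sort/deduplicate imports
    source = black.format_str(source, mode=black.Mode())    # normalize layout and spacing
    print(source)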


Pick up poetry and fix it. I thought it would be fun to use poetry, but it trips over itself here and there.


State of the art python packaging must include support for common use cases such as conda+machine learning.

It's incredible that even Julia's Pkg.jl supports Python packaging in combination with conda better than the official Python packaging tools do.

This is very clearly a question of the culture of the core Python developers (such as Brett Cannon), who seem to think the machine learning people, with their compilers and JITs, are not an important part of the community.


The Twitter hero faction of the Python core "developers" (many of whom have not done much actual work for a decade) has always been pro-web and anti-C-extensions.

They have been peddling the dream of speeding up Python while neglecting the scientific ecosystem for a long time, without having either the expertise or a clear plan for how to achieve it.

All that while the primarily useful part of Python is its C extension capabilities. The web does not need Python; query/response applications are handled just fine, and better, by functional languages.


The PyPy folks and a bunch of other alternative Python implementations have pooled together and are working on a soft fork of the C extension API (HPy) to enable existing, established projects that make heavy use of the C API to work better with the superior, faster Python implementations.

I believe this is the way forward. Existing packages are essentially tied to CPython because that's what the python.org website offers for download, and people will just use that by default.

Once all major packages support the new C API, it will be easier to use something other than CPython by default.


I just wish that PyPI would enforce binary wheels going forward (at least for linux x64/arm64 for people who use Docker, but ideally for all common platforms). They already supply the cibuildwheel tool to automate their builds, so it shouldn't be that hard for library developers...

Developers installing a library shouldn't need to figure out what build-time dependencies it has...


Should PyPI kick my project off because it doesn't support MS Windows?

My package uses C and Cython extensions. While I support macOS and Linux-based OSes, I don't know how to develop on or support MS Windows.

I've tried to be careful about the ILP32 vs LP64 differences, but I suspect there are going to be many places where I messed up.

I also use "/dev/stdin" to work around my use of a third-party library that has no way to read from stdin. As far as I can tell, there's no equivalent on Windows, so I'll have to raise an exception for that case and modify my test cases.
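One portable fallback (a sketch, assuming the third-party library only accepts a filesystem path): spool stdin into a temporary file and hand the library that path, instead of relying on /dev/stdin existing:

    import sys
    import tempfile

    def stdin_as_path():
        """Copy stdin to a temporary file and return its path; works on Windows too.
        The caller is responsible for deleting the file afterwards."""
        with tempfile.NamedTemporaryFile(delete=False, suffix=".stdin") as tmp:
            tmp.write(sys.stdin.buffer.read())
            return tmp.name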


Can't you use "CON" instead of "/dev/stdin" on Windows?


I asked that question nearly 11 years ago on StackOverflow, at https://stackoverflow.com/questions/7395157/windows-equivale... . ;)

Quoting the best comment, "echo test | type CON or echo test | type CONIN$ will read from the console, not from stdin."


I haven't had too much trouble with packages missing binary wheels lately. Occasionally Pip doesn't find them, which looks the same as if the binary wheel were missing entirely (looking at you, Anaconda-- update your Pip already), but they're usually there.

But I'm usually on Windows if I need binary wheels; maybe the coverage is a bit different on Linux.


Try installing grpcio-tools on Darwin/arm64 with Python 3.10. More often than not I run into problems where low level headers required by some cryptography libraries cannot be found and as a result compilation fails.


Python packaging is fine; the difficult part is always C code with .pyd or .so files. God damn, those are nasty pieces of shit.

PyPI doesn't need users to manually fill out a survey per se; they need to (optionally) prompt "install failed, submit traceback and error code to PyPI for solutions" during setup.


I wish there were some package manager in the middle between conda and pip. Conda is too strict and often gets stuck in SAT solving. pip doesn't even ask when reinstalling a version of a package currently being used.

Edit: Typo: reinstalling a version of package currently being used


My ask would be to get rid of the need for conda altogether.

Conda obviously offers a lot of value in sharing hairy compiled packages, but it does not play well with anything else. None of the available tooling really works with both conda and pip. It fragments the already lousy packaging story.


Other than an occasional package conflict, we've found conda and pip to work pretty well together (defining pip dependencies in the conda env) -- across a lot of different Python envs.


Try poetry. It wraps pip and fixes a lot of its issues


Seconding Poetry. IMO it should have been the standard package manager - it just works (TM)


Although it consumed 70 GB of RAM before I killed it when I tried to use it to `poetry install` stable-diffusion.


We find mamba solves dependency solving for conda (we can't do GPU dependencies without it), and I think it's getting integrated.

my main thing w/ conda is it's bananas figuring out how to make a new recipe, which is pretty surprising


Agreed, I've found packaging for conda to be so much harder than packaging for pip


Conda sucks because it always wants things its own way, and introduces yet another site-packages and environment I need to track. Absolutely hate Apple for requiring it for their metal and M1 support.


> pip doesn't even ask when reinstalling a version currently being used.

Just as an explanation: a "version" in Python packaging can come from one of many potential distributions, including a local distribution (such as a path on disk) that might differ from a canonical released distribution on PyPI. Having `pip install ...` always re-install based on its candidate selection rules is generally good (IMO), since an explicit run of `pip install` implies user intent to search for a potentially new or changed distribution.


I meant this for a dependency, not the package I am installing.


I'm not sure I understand what you mean -- `pip` should not be reinstalling transitive dependencies. If you install A and B and both depend on C, C should only be installed once.


I think they're referring to requirements files. I've seen the same behavior - on day 1, pip installs packages A and B, then on day 2 when someone else modified the requirements file it installs C and reinstalls B even though B hasn't changed.

This one I've seen when the dependency doesn't specify an exact version and you included "--upgrade".

There's a second case that I think was fixed in pip 21 or 22, where two transitive dependencies overlap - A and B depend on C, but with different version ranges. If A allows a newer version of C than B allows, C can get installed twice.


Try using mamba (https://github.com/mamba-org/mamba)

We ran into many unsolvable or 30m+ solvable envs with conda that mamba handled quickly.

The underlying solver can be used with conda directly as well, but I have not done that (https://www.anaconda.com/blog/a-faster-conda-for-a-growing-c...)


Sounds like you want pipenv, but that might be too close to conda


> There should be one– and preferably only one –obvious way to do it.

I think the answer to Python's packaging woes has been in plain sight all along. There are so many competing package managers and systems that it is hard to keep track of them all. Of course, this is just what happens in open source, but the problem seems uniquely bad for Python. In Ruby, Swift, Rust, or Go I don't encounter quite the same multitude of options.

Work on cross-compatibility and a roadmap to merge them all down to, eventually, one obvious way to install packages.

I lost an hour trying to get Stable Diffusion working on an M1 Mac yesterday.


Never mind PyPI packages, the monstrosity that is __init__.py must die.

Namespace traversal that executes code is just so so bad.
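For what it's worth, PEP 420 implicit namespace packages already let plain package directories work without __init__.py, so no package-level code runs on traversal. A self-contained sketch (the `mypkg`/`audio.py` names are just made up for the example):

    import pathlib, sys, tempfile

    # Build a throwaway package with NO __init__.py to show PEP 420 in action.
    root = pathlib.Path(tempfile.mkdtemp())
    (root / "mypkg").mkdir()
    (root / "mypkg" / "audio.py").write_text("RATE = 44100\n")
    sys.path.insert(0, str(root))

    import mypkg.audio            # implicit namespace package: no __init__.py executed
    print(mypkg.audio.RATE)       # 44100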


Python lacks the flexible, universal binary distribution solution that nearly all of the newcomers have. Consider Go, Rust, or Docker images. Most probably Docker image distribution is the only available solution for now, and volume management is the worst problem on that front.


Thanks for posting this! I'm glad I had the chance to share some feedback. I only wish I had seen an "Other Feedback" box to simply say thanks to everyone trying to make things better.


Packaging motto suggestion offered: "Mastery Over 'Ni!' Triumph Yields" (MONTY)

Need to revive the MONTY Python spirit as a hedge against the general dreariness of the day.


Is there any hope that even if there emerges a consensus package management solution going forward, old packages will be easily portable to it?


Nothing will improve as long as the Python powers insist on packaging being an exercise for the community.


Is there a problem with the package format itself, though? There are lots of serious problems tied to distribution rather than the package format; fixing those would make the experience way easier, especially for beginners and people used to other package managers...

lacking binary wheels on PyPI, problems with shipping a project with its dependencies, confusion about there being multiple "package managers" (pip, pip-tools, poetry, pipenv, conda) and multiple formats of dependency lists (setup.py, setup.cfg, requirements.txt, pyproject.toml, ...), sys.path-associated confusion (global packages, user-level packages, and anything specified in PYTHONPATH, ...)
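On the sys.path confusion specifically, a small sketch for seeing exactly which locations your interpreter will search (note: `site.getsitepackages()` can be missing in some older virtualenv setups, hence the getattr guard):

    import site
    import sys

    print("prefix:", sys.prefix)                        # which interpreter/env is active
    for p in sys.path:                                  # full search order, incl. PYTHONPATH entries
        print("search:", p)
    print("user site:", site.getusersitepackages())     # per-user installs (~/.local/... on Linux)
    getsp = getattr(site, "getsitepackages", None)      # may be absent in old virtualenv-created envs
    if getsp:
        print("global site:", getsp())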


XKCD's opinion:

https://xkcd.com/1987/


And that's just installing packages; creating packages is another hellscape altogether.


Situation after the user survey has been evaluated:

https://xkcd.com/927/


I just wish they'd change their name so that my students stop snickering. (The name is pronounced like the French word for "piss".)


Isn't PyPI pronounced like: pie (food/π) + pea + eye?

That is different from the French "pipi": pea + pea.


Sure. Now tell that to a bunch of immature college juniors. If you read the letters "pypi" in French, it sounds exactly like "pipi".

And wait until you learn what "bit" sounds like in French.


What does "bit" sound like in French?


The slang word for penis. "Megabit" and "gigabit" are endless sources of fun for the students.



Once upon a time, its name was "the Cheese Shop". Good times.



