“Python's batteries are leaking”

dragonwriter · on May 18, 2019

Note that similar issues were raised with Ruby stdlib, which is being addressed in part with “Gemification” of stdlib, so that all of stdlib (targeted for 3.0, though it's been going on since 2.4)[0] is being moved out to externally-updatable packages that are included by default (default and bundled gems), so that it is still “batteries included” but the batteries are at least replaceable.

Amber's suggestion seems to be in the same direction (though perhaps not as extreme.)

[0] https://www.slideshare.net/mobile/hsbt/gemification-for-ruby...

bsder · on May 19, 2019

Python cannot be atomized effectively, and the issue is political.

The problem is that I cannot count on being able to install new software in many environments.

If I fight the battle to get centralized IT to install Python, I now have a guaranteed set of standard libraries as well. I'm never going to get permission to install anything other than default. Ever.

Consequently, the standard libraries need to be very complete and very useful.

And, while people seem to love the Rust approach to libraries, I'm not necessarily a fan. Far too many times I have pulled a library that is "obviously" something that a language should consider to be "standard library" and gotten bitten because it was broken. Only VERY core libraries in Rust are guaranteed to work across multiple architectures and OS's.

I think Rust is probably doing the right thing for Rust as "batteries included" is NOT one of its tenets. However, that doesn't make it right for everybody else.

hn_throwaway_99 · on May 19, 2019

> If I fight the battle to get centralized IT to install Python, I now have a guaranteed set of standard libraries as well. I'm never going to get permission to install anything other than default. Ever.

Can you explain this more? What kind of place do you work? I've had some experience with large, bureaucratic companies, but nothing ever so far as "you can't install any other libraries."

postgeographic · on May 19, 2019

Not who you asked, but I work for a large international company, a big 4 professional services firm. I wanted to install anaconda and Jupyter on my machine (data science is not a part of my 'official' job description, but I wanted to see how much of my data exploration workflow I could speed up or automate). I had to go up three separate hierarchy ladders to get sign off. First, my own team, then IT, then our risk and quality team (thanks to our audit practice and a few historical issues with whistleblowers and leaks, the risk guys are pretty much the final arbiter of... Everything)

After about 9 weeks of emails, meetings, and pitches, I finally got Anaconda up and running. A week later, I tried to upgrade the 3rd party packages.. No dice. Blocked by the corporate VPN. I'd need the sign off every time I wanted to `pip upgrade` anything

Needless to say, I do not bother anymore.

sebazzz · on May 19, 2019

I also work at a big 4, and perhaps the same one as you, and I assure you, there is a way. There is always a procedure or some way you can follow. You just need to know how to look it up.

We have our own development team, our own servers, our own freedom to deliver to clients fast without the hassle of the main corporation. How? We talked to the right persons.

lugg · on May 22, 2019

Big 4?

javagram · on May 22, 2019

https://en.m.wikipedia.org/wiki/Big_Four_accounting_firms

cosmodisk · on May 19, 2019

I can't imagine myself working in a place like this..I understand there should be a level of checks, however this is just crazy...

gigatexal · on May 19, 2019

Hierarchies and the systems or checks that they serve at places like this exist only to keep some people employed. That has got to suck! For anyone who has a rogue or novel idea will get shot down because it’s too much of a burden in terms of overhead to get any decision made.

Myrmornis · on May 23, 2019

I don't think your company deserves you.

bloaf · on May 19, 2019

Where I work, there is currently a push to get python on the computers that manage physical equipment operation. These computers are not allowed to connect to the internet, and have extremely limited connectivity to the rest of the business network. Installing anything new on them requires risk assessments like you wouldn't believe, since the consequences of malicious code could easily hit 10s of millions of dollars and a nonzero number of lives.

loeg · on May 19, 2019

If your risk assessment says that the exact same tkinter outside of Python stdlib is riskier than in Python stdlib, maybe your risk evaluation process needs reevaluation.

edoceo · on May 19, 2019

I hear this sentiment frequently. Come on, one software engineer cannot steer the huge ship that is BigCo Risk Assessment. Well, they couldn't do that and the original task.

It might me more helpful to think of these types of external factors as fixed points that cannot be moved and just engineer around them.

You'll burn out if you try to boil the ocean on every business process that doesn't seem "logical" from your cursory examination.

cwyers · on May 19, 2019

On one hand, this is true. On the other hand, this is being put forth as a reason to not make a change in the entire Python ecosystem, and it's not really Python's job to bend over backwards for shops that have bad risk assessment either.

schlenk · on May 19, 2019

As long as you cannot even prove that due to a lacking python code signing infrastructure for packages (wheels can do it, but it is far from wide spread).

And setup.py is a trainwreck, e.g. some packages compile download and compile huge dependencies (e.g. a full Apache httpd...), the default compiler flags may lack all the mandatory security flags (e.g. for using ASLR on python 2.x), or ship their own copy of openssl statically and break your FIPS-140 certification that way...

dralley · on May 19, 2019

And since setup.py is a Python file, you can't express build time dependencies properly. Pyproject.toml let's you do that, but it's new, nobody knows about it, and older pip clients don't support it.

sametmax · on May 19, 2019

Yes but it won't get it. And at the end if the day, people need to be able to get work done.

The corporate world is full of stupid things that will never not change, or take years to change.

gautamdivgi · on May 19, 2019

Where I work the solution was to use a proxy to pypi. Basically an internal pip repo (and docker, npm, maven, everything else...). All internal apps go through the internal repository that creates a local version of the package from pypi. That gives the security / compliance folks a way to block packages with security issues, etc. and at the same time provide the developers flexibility to get most of what is needed.

In a large company this gives the compliance folks a central place to blacklist packages - along with a trail of what systems have downloaded the package to target for upgrades.

sametmax · on May 19, 2019

Many technical solutiins exist, but the problem is political or organisational.

gautamdivgi · on May 19, 2019

Agree. At this point it was more a case of executives saying they wanted internal dev teams to use and contribute to open source and supporting orgs to come up with solutions on how that can be possible with a 0-touch approach. That’s what tipped the balance.

RayDonnelly · on May 19, 2019

I disagree. tcl/tk is written in C and C can be compiled in very very badly indeed (from a security perspective).

Yuioup · on May 19, 2019

Maybe a stupid question but can you ship code on the machine? If you can, what is stopping you from including the source of the library that you're trying to 'install'?

sametmax · on May 19, 2019

Many do, but you can get fired for it.

Havoc · on May 19, 2019

>What kind of place do you work?

Not OP but same. I'm currently debating with myself whether I should attempt to install PUTTY. Given that port 22 is blocked and it's not needed for my core role it'll be dicey if I get challenged.

Pulling executable code off some repo...no way that is ever officially passing muster. People might do it anyway, but on a personal risk basis.

>Can you explain this more?

Place that are heavy on confidential financial info basically. Practically everything I touch is confidential client data. So employer is naturally jumpy about what's on my laptop software wise.

Ironically the above comes full circle...need putty to get onto a VM in cloud where there are no restrictions and crucially no client data. Nobody cares what I do there - hell they'll even pay for it thanks for MSDN enterprise

Maxious · on May 19, 2019

halter73 · on May 19, 2019

I've found the built-in Windows 10 ssh client has trouble with tunneling, so I still use the ssh client included with git.

voltagex_ · on May 19, 2019

If you can figure out how to reproduce the issue, I'm sure the team would accept a bug report at https://github.com/PowerShell/Win32-OpenSSH

mycall · on May 19, 2019

netsh interface portproxy add v4tov4 listenport=8001 connectport=80 connectaddress=1.1.1.1

throwaway2048 · on May 19, 2019

this isn't useful for tunneling through a remote machine (which is what ssh does)

Havoc · on May 20, 2019

I've been checking but I can't add the module. Either it's slow to get to Enterprise version or it's blocked. Not sure.

Found a work-around though - Google cloud shell being *nix works fine for SSHing about the place. Gets me around the port fw too

jcims · on May 19, 2019

Nice! Too bad our desktops are still on Windows 7 at the office.

scottLobster · on May 19, 2019

I work for a major defense contractor, and while we have vanilla python we are categorically not allowed to download software off the internet and install it without authorization, even on the unclassified internet-connected corporate network.

Theoretically there's a process for requesting new software and getting it approved, but actually pushing it through requires getting one of my program architects to care enough to file the request (As a mere level 2 engineer all I can do is write it, can't submit), then potentially weeks of followup, for 1 specific version of 1 specific package.

In the case of python packages, perl and the perl packages we need are already approved because a few senior devs got together and pushed them through 10 years ago (was before my time, but I understand it was with quite a bit of arm twisting). It's more time-efficient to just code perl than to fight for python.

It's one of the many reasons I intend to get myself another job for Christmas. :)

As for why the system exists: Cost cutting, in the sense of "the less we invest in infrastructure the more we can divert to sexy hardware for the cameras and shareholder dividends. So long as it's theoretically possible for you to do your work, we don't care how many hoops you have to jump through to do it. And our competition is even worse than us, so we don't have to worry about anyone undercutting."

As a result all our infrastructure is centralized. Programs have to jockey with each other for everything from virtual servers to physical workstations and monitors. Hell the only reason my program has our primary test server is because one of our architects literally overheard a hallway conversation about a program that was spinning down and getting rid of some servers, so he jumped on it.

civility · on May 19, 2019

> Theoretically there's a process for requesting new software and getting it approved, but actually pushing it through requires getting one of my program architects to care enough to file the request

I worked for a much smaller government contractor, but before I left they were moving to a system where you needed approval from the customer in order to get new packages. (For those who don't work in this field, that means you are actually making a request to the contracting representative from the particular government agency for each package you want.) So it wasn't just in-house bureaucracy in the way of progress, and I generally just went without or wrote my own instead of trying to deal with it.

HelloNurse · on May 20, 2019

If you must work with one hand tied behind your back you are fucked, and your company is even more fucked. Your priority should be supporting your managers who want to get things done in the struggle against centralized IT managers who want to repress aspirations to work.

I'm afraid the standard library has to be aligned with the needs of more normal users who, as already discussed, want to allow libraries to have their own release cycles and to be more "opinionated" and specialized than the standard library would permit.

garettmd · on May 22, 2019

> the needs of more normal users

I'm afraid users dealing with that sort of bureaucracy are much more normal than you think, if not the norm. They're just usually not the types of folks that are hanging around HN, or they're at least less vocal.

ringshall · on May 19, 2019

At least in some places, local install permission may be available, but cross company install permission is much more difficult to get.

As in, it is very difficult to get software installed in general images or on multiuser servers.

There are obviously good reasons for it to be conservative about this.

Aeolun · on May 19, 2019

As far as I understand this. It’s something like “You can install whatever libraries you want, but we’ll only ever install default python on user PC’s”

tjalfi · on May 19, 2019

Last June I opened a ticket for our procurement department to have an open source license reviewed. It is still open. Our purchasing process is similar to [0].

[0] https://training.kalzumeus.com/newsletters/archive/enterpris...

dforrestwilson · on May 19, 2019

Maybe just me but sometimes after initial install l, I later find a weird non-standard library that I need to do something.

At which point you have to scale the IT wall all over again if you work at a Fortune 500 company.

inoop · on May 19, 2019

> At which point you have to scale the IT wall all over again if you work at a Fortune 500 company.

I work at a Fortune 500 company, and the only wall I have to scale when I want to use a library that nobody at our company has ever used before, is to get someone to check and approve the license (typically takes 1-2 days), and import it into our code repos.

I mean I see your point, but not everywhere is as bad as you make it seem.

roca · on May 19, 2019

Putting code into the standard library doesn't magically create developer resources to maintain it. Indeed, Amber Brown is saying that many libraries in the Python standard library aren't properly maintained. So it's not clear that standard library policy is relevant to the Rust issues you have.

rcfox · on May 19, 2019

> If I fight the battle to get centralized IT to install Python, I now have a guaranteed set of standard libraries as well. I'm never going to get permission to install anything other than default. Ever.

You could install packages with `pip install --user` to have them install under your home directory.

growse · on May 19, 2019

> > If I fight the battle to get centralized IT to install Python, I now have a guaranteed set of standard libraries as well. I'm never going to get permission to install anything other than default. Ever.

> You could install packages with `pip install --user` to have them install under your home directory.

You might find that the issue isn't necessarily always technical (ie permissions), but policy. A lot of places don't let you arbitrarily download software off the internet onto a system. Pp's point is that having a full feature stdlib let's you only need to crank the policy/approval process once, rather than once for every dependent lib.

dragonwriter · on May 19, 2019

> The problem is that I cannot count on being able to install new software in many environments.

The approach Ruby is taking with gemification and default and bundled gems for standard libraries is equivalent to the traditional standard library if you can't install updates, but superior in other cases.

d0mine · on May 19, 2019

As I understand, the Ruby model -- that is suggested for Python above -- is to include the libraries by default but to keep them as separate pypi packages.

In this case, even If you can't upgrade, you still get the same libraries but perhaps older versions.

pornel · on May 19, 2019

This is close to what Rust is doing, and it's working pretty well, apart from shocking newcomers who expect libstd to be useful on its own.

In Rust, libstd is mainly for interfacing with the compiler and providing interoperability between packages (crates). The wider crate ecosystem is the real standard library, since external crates are as easy to use as the standard library.

For example, the libstd doesn't even have support for random number generation. There's a rand crate, which is now on 6th major breaking version. That's perfectly fine, because multiple versions can coexist in one program, and every user can upgrade (or not) at their own pace. And the crate was able to refine its interface six times, instead of being stuck with the first try forever.

icxa · on May 19, 2019

This is interesting. Go seems to have the complete opposite stance. The stdlib are some of the most useful and well written packages you can use in the Go ecosystem, and then you have the "extended" standard lib which isn't 100% in the language yet, and even further sometimes concepts from useful community packages make it into the std lib.

As to why this is the case, I think maybe this is enabled by Go's backward's compatibility focus and encouragement to upgrade early and often, and the community's focus to utilize small interfaces sometimes from the stdlib itself, like io.Reader and io.Writer, or http.Handler. Added to that w.r.t. using the latest and greatest, most Go users frequently are using the latest version of Go even in production (per the go experience surveys).

I am sure it also helps that Google pays people to develop, maintain, and improve the stdlib.

pornel · on May 20, 2019

No doubt Go's stdlib is useful, and there's plenty of things it got right. However, it's not immune to making some mistakes and having to freeze them forever. The more functionality you add, the harder it gets to get it perfect on the first (and only) try. Search for "deprecated site:https://golang.org/pkg/" finds various issues ranging from cosmetic mistakes to entire packages being deprecated.

    CompressedSize     uint32 // Deprecated: Use CompressedSize64 instead.
    CompressedSize64   uint64 // Go 1.1

    // Deprecated: HeaderMap exists for historical compatibility
    // and should not be used.

Requirements may change over time, so even getting something perfect now is not a guarantee it will last (e.g. pre-UTF-8 languages froze byte-oriented or UCS-2 strings, even though these were good decisions at the time).

Sometimes improvements are not worth the cost of deprecation and replacement, so things are just left as they are. For example, an HTTP interface designed for request-response HTTP/1 works for stream-oriented HTTP/2, but support for push, prioritization and custom frames is bolted on. Packet-oriented HTTP/3 will add even more stuff that will have to be retrofitted somehow to the old model. Libraries can come and go, but std can't just throw away an old interface and start over.

rbanffy · on May 19, 2019

This approach is useful for a while, but once something is in a stdlib, its interface is frozen forever.

icxa · on May 19, 2019

(I am trying to tie the two concepts together from your reply, so I am not saying this to come off as combative but understand your objection)

- What would you define as a while? I think Go has done well with it, given that it is almost 10 years old now at this point, and in reality, was in development internally at Google years before that.

- Do you feel like an interface being frozen forever is particularly a bad thing in of itself? What if the interface does a really good job describing the thing, whatever it may be? For example, Go's io.Reader/io.Writer interfaces.

I agree sometimes an interface being frozen is bad, but for example when Go standardized the context package, it simply added "Context" to the existing functions that now take a context (e.g. in the database/sql package, you have the old, Exec, Query, QueryRow functions, and after 1.8 you have ExecContext, QueryContext, QueryRowContext, which some people may view as a reason to have method overloading, but I view as adding better clarity.

jerf · on May 19, 2019

Go was and is by design a "boring" language. The core designers didn't have much trouble looking at decades of prior art and getting it mostly right.

Rust libraries should be expected to take a few tries to get right, especially earlier in its lifecycle. There's more possibilities and less experience in the language.

You can see a similar effect in Haskell, which has iterated many basic bits of functionality many times over.

DougBTX · on May 19, 2019

The way to answer your two questions is to combine them, the definition of “a while” depends on how good a job the interface does. If the interface is really good, then it can last a really long time (perhaps as long as the language lasts).

I read some discussion the other day about some ways in which the Any type in Rust isn’t as flexible as it could be, since it is frozen the only way to improve it would be to introduce a new name, such as Unknown. Similarly in C# along with adding async support, the standard library added async versions of many methods, eg Read now also has a ReadAsync partner.

It does seem that having multiple names for basically the same thing adds a small but tolerable level of overhead to a language. At least if as much as possible is moved out, then projects can choose to only use the latest versions, and live in a world as if past versions never existed.

gmueckl · on May 19, 2019

Having incompatible libraries solving the same basic problems is absolutely not fine. Over time, this will become a huge issue for composability. C++, for instance, already is in this kind of mess with things like STL containers versus Qt comstainers vs. homegrown special case (optimized containers) and heaps of additional libraries building upon each. There are other good examples in other programming languages as well, mostly older ones.

The example of random number generators is a good one, too. There are a lot of applications that require (reproducable!) PRNG sequences and sometimes you have to share PRNGs between modules. Now, looking at the rand crate, I see that the prng part of it was recently mucked around with. If I have two 3rd party modules that I require to share a PRNG that I control (say, noise generators for procedural textures), I cannot compose them if one of them uses the old and one of them uses the new version of the library.

pornel · on May 20, 2019

When a crate is part of a public interface, that's harder, indeed. Crates solve it in a few ways:

• crates that expect to be used for interoperability are often split into smaller crates (like API and back-end, or low-level API and high-level API), so that they can evolve some parts without breaking others.

• sometimes breaking changes are technically breaking, but easy to upgrade (e.g. methods renamed, args reordered). In that case most users catch up quickly.

• in desperate cases, a new version can import its own old version and re-export old structs and interfaces that haven't changed, so they're compatible across major versions.

• proper sharing and composition should be done via traits, so that you can implement a trait for any number generator, not just one version of one implementation.

rmtech · on May 19, 2019

I've never coded Rust - is there any distinction between a really important crate used by millions of people and something really obscure with 3 users? Are all the crates subject to security audit?

pornel · on May 20, 2019

There is no technical distinction. The community is working on a WoT/review tool (cargo-crev), but in the meantime you can see who has published the crate and who uses it. The de-facto standard crates are maintained by Rust team members or well-known authors.

rmtech · on May 21, 2019

ok. So it's kind of informal at the moment.

Maybe in the future we'll see more hacking of libraries (people managing to deliberately sneak exploits in) and in response stronger lockdowns on important library code.

steveklabnik · on May 19, 2019

Anyone can upload to crates.io without review.

mkesper · on May 19, 2019

But then you have six versions to have security fixes, how do you do that?

pornel · on May 20, 2019

You release fixes for older versions according to semver.

astrodust · on May 19, 2019

Yes and no. Until the Ruby maintainers give the gem versions of the internal modules different names you will get bizarro conflicts when using a gem after the "batteries included" version has already loaded.

I don't know how many hours I've spent battling Psych errors because of this very thing, but it's way too many. Calling the gem something, anything else, would solve the issue.

It's great that they're unbundling a lot of things, but there's still some serious friction between external and internalized versions of these gems.

For Ruby, EventMachine sub-universe is really in bad shape. EventMachine is creaky and old. Event-aware packages are in short supply and are usually woefully out of date, unmaintained.

petre · on May 19, 2019

This works in Perl, a language that almost everybody criticizes or hates. I've never had any issues in upgrading core packages.

mjw1007 · on May 18, 2019

If your project has any third-party dependencies, and so (nowadays) you're going to set up requirements.txt and virtualenv and whatever anyway, I can see that you're going to think things like "this XML parser in the standard library is just getting in the way; I can get a better one from PyPi".

But I think a lot of the value of a large standard library is that it makes it possible to write more programs without needing that first third-party dependency.

This is particularly good if you're using Python as a piece of glue inside something that isn't principally a Python project. It's easy to imagine a Python script doing a little bit of code generation in the build system of some larger project that wants to parse an XML file.

rogerbinns · on May 18, 2019

I think the biggest problem is going from zero third-party dependencies to one and more. Adding that very first one is a huge pain since there are many ways of doing it with many different trade offs. It is also time consuming and tedious. The various tools like you mention are best at adding even more dependencies, but are hurdles for the very first one.

tachyonbeam · on May 18, 2019

Not true. The more external dependencies you add, the more likely it is that one of them will break. I try to have as few external dependencies as possible, and to pick dependencies that are robust and reliably maintained. There is so much Python code on GitHub that is just broken out of the box. When people try your software and it fails to install because your nth dependency is broken or won't build on their system, you're lucky if they open an issue. Most potential users will just end up looking for an alternative and not even report the problem.

rogerbinns · on May 19, 2019

To add one more third party dependency when you already have some is as simple as adding one more to whatever solution you are already using (eg another line in requirements or running a command).

When you have no third-party dependencies, then adding the first one requires picking amongst trade offs and lots of work. A subset of choices include using virtualenv, using pip, using higher layer tools, copying the code to the project, using a Python distribution that includes them, writing code to avoid needing the first dependency ...

* You have to document to humans and to the computer which of the approaches is being used

* Compiled extensions are a pain

* You have to consider multiple platforms and operating systems

* You have to consider Python version compatibility (eg third party could support fewer Python versions than the current code base)

* And the version compatibility of the tools used to reference the dependency

* And a way of checking license compatibility

* The dependency may use different test, doc, type checking etc tools so they may have to be added to the project workflow too

* Its makes it harder for collaborators since there is more complexity than "install Python and you are done"

I stand by my claim that the first paragraph (adding another dependency) is way less work, than the rest which is adding the very first one.

Groxx · on May 19, 2019

They're typically broken out of the box because they don't pin their dependencies. pip-tools[1] or pipenv[2], and tox[3] if it's a lib, should be considered bare minimum necessities - if a project isn't using them, consider abandoning it ASAP, since apparently they don't know what they're doing and haven't paid attention to the ecosystem for years.

[1] https://github.com/jazzband/pip-tools [2] https://docs.pipenv.org/en/latest/ [3] https://tox.readthedocs.io/en/latest/

tachyonbeam · on May 19, 2019

It's trickier than just pinning dependencies because some libraries also need to build C code, etc. Once you bring in external build tools, you have that many more potential points of failure. It's great. Also, what happens if your dependencies don't pin their dependencies? Possibly, uploading a package to pipy should require freezing dependencies or do it automatically.

lalaland1125 · on May 19, 2019

Modern python tooling like pipenv pins the dependencies of your dependencies as well. This is no longer an issue

vlovich123 · on May 19, 2019

I used a requirements.in file to list out all the top-level direct dependencies & then used pip-compile from piptools to convert that into a frozen list of versioned dependencies. pip-compile is also nice because it doesn't upgrade unless explicitly asked to which makes collaboration really nice. I then used the requirements.txt & various supporting tooling to auto-create & keep updated a virtualenv (so that my peers didn't need to care about python details & just running the tool was reliable on any machine). It was super nice but there's no existing tooling out there to do anything like that & it took about a year or two to get the tooling into a nice place. It's surprisingly hard to create Python scripts that work reliably out-of-the-box on everyone's environments without the user having to do something (which always means in my experience that something doesn't work right). C modules were more problematic (needing Xcode installation on OSX, potentially precompiled external libraries not available via pip but also not installed by default), but I created additional scripts to help bring a new developer's machine to a "good state" to take manual config out of the equation. That works in a managed environment where "clean" machines all share the same known starting state + configs - I don't know how you'd tackle this problem in the wild.

I do think there's a lot of low-hanging fruit where Python could bake something in to auto-setup a virtualenv for a script entrypoint & have the developer just list the top-level dependencies & have the frozen dependency list also version controlled (+ if the virtualenv & frozen version-controlled dependency list disgaree rebuild virtualenv).

theossuary · on May 19, 2019

I don't know if it'd work the same way, but I've had a lot of success with Twitter's Pex files. They package an entire Python project into an archive with autorun functionality. You distribute a Pex file and users run it just like a Python file and it'll build/install dependencies, etc. before running the main script in the package.

I used it to distribute dependencies to Yarn workers for PySpark applications and it worked flawlessly, even with crazy dependencies like tensorflow. I'm a really big fan of the project, it's well done.

https://github.com/pantsbuild/pex

schlenk · on May 19, 2019

Unless your dependency is a C-header file updated by your distro as part of a new version.

nitrogen · on May 19, 2019

Requiring people to "pay attention...for years" is not the way to build long-term robust software.

guitarbill · on May 18, 2019

the problem is it can fall apart quickly. the XML parsing in the standard library is limited and slow, so most people consume lxml instead [0]. so it depends on the case. counterpoint: e.g. pathlib being in included is great. it was at least inspired by 3rd party libraries, but the features are relatively stable and the scope defined, and relatively few dependencies, and so moving it into the standard library is a win IMO. not only for import ease, but for consistency.

[0] https://pypi.org/project/lxml/

maxerickson · on May 19, 2019

ElementTree is in the stdlib. It isn't slow and has incremental parsing and so on.

It's also a nice API for dealing with XML.

UncleEntity · on May 19, 2019

> ElementTree is in the stdlib. It isn't slow and has incremental parsing and so on.

I had enough trouble using it efficiently that I went and wrapped Boost property tree[0] and can happily churn out all sorts of data queries (including calling into python for the sorting function from the C++ lib) in almost no time.

I was taking daily(ish) updates of an rss feed and appending it to a master rss file but sorting was pretty slow using list comprehensions so now I convert it automagically to json and append it as is. No more list comprehensions either, just hand it a lambda and it outputs a sorted C++ iterator.

Though I probably should've just thrown the data into a database and learned SQL like a normal person...

[0] https://github.com/eponymous/python3-property_tree

porker · on May 19, 2019

> and so (nowadays) you're going to set up requirements.txt and virtualenv and whatever anyway

If only that were the norm amongst long-tail python users. Heck I don't do it; I have the anaconda distribution installed on Windows and when I need to do a bit of data analysis hope I have the correct version of packages installed.

Making this core to the python workflow (bundling virtualenv? updating all docs to say "Set up a virtualenv first"?) is the first required change, before thinking about unbundling stdlib

schlenk · on May 19, 2019

I develop in python since 10+ years. And setting up a virtualenv only happens when developing patches for 3rd party packages. (due to the other commercial environment that doesn't work with venvs).

And it is a totally miserable experience on Windows, every single time.

pas · on May 19, 2019

The stdlib would still ship with Python I presume, it would be simply updatable. But that doesn't mean it wouldn't work without a "requirements.txt" (use Pipenv which is light-years ahead).

wirrbel · on May 18, 2019

Last week I have written a Python script purely relying on the stdlib. Basically a coworkers shell script had to be adjusted to account for newer datafiles I was processing and I am not that experienced with shell script magic. I typed down 15 lines of Python code only relying on the std lib and was happy, it ran on the server's 2.x Python without an issue. This is the primary selling point of a larger std. library.

But, I do wonder whether it needs to grow. Python is in a stage where adoption of new std library features is inherently slow, not only in third-party libraries like twisted, but also in applications, even if they use newer Python versions.

What kills Python here is that it is most commonly bundled with the linux distribution or the OS (true also on mac). This reduces cycle times drastically. Compare this to newr language platforms that people like to install in newer versions on older platforms quite regularly.

Some recent additions would be fine additions at an early stage of language development, but surely not for Python.

petre · on May 19, 2019

> What kills Python here is that it is most commonly bundled with the linux distribution or the OS (true also on mac)

It doesn't kill it, quite the contrary, rather makes it ubiquitous. If you want another version you just install virtualenv. It's the same with Perl. We use the Perl version shipped with the distribution (openSuSE) and deploy to that. It's older but it's stable and it works. On our dev environment (mac) we have the same version with all of the modules installed in plenv. We also chose a framework with as little deoendencies as possible (Mojolicious). It looks like it was a great choice.

nullwasamistake · on May 18, 2019

What they need is an Apache Commons or Guava of Python. They're both defacto part of the standard java library.

dehrmann · on May 18, 2019

I try to avoid Guava because they have a habit of making incompatible breaking changes, and because so many libraries depend on it, it's likely to cause version conflicts. The way Apache Commons puts the major version in the package is much better in that regard.

nullwasamistake · on May 19, 2019

I have not experienced this running guava 16-23 in various apps. Maybe incompatible but they're good about security patches for old versions. I have never seen a version conflict between guava releases

BeeOnRope · on May 19, 2019

It's very easy to get a Guava version conflict because (a) Guava frequently adds new stuff, and (b) Guava semi-frequently deprecates and removes stuff a couple versions later.

So all you need is one dep that needs Guava version X with method M that is removed in version X+2 (say) and another dep that needs something new introduced in version X+2, and you have a Guava version conflict. That's, Guava releases are not backwards compatible due to removal of classes and methods.

You can sometimes fix this with a technology like shade or OSGi or whatever to allow private copies but it does not always work.

aphexairlines · on May 19, 2019

Transitive dependencies on Guava 19, 20, 21 can lead to runtime crashes if your dependencies differ in what guava versions they expected when they were compiled:

https://www.google.com/search?q=guava+nosuchmethoderror

avar · on May 18, 2019

She seems to be advocating that Python do pretty much what Perl has ended up doing, which is "we have some batteries, but we haven't been adding new ones for a decade or more".

The reasons are similar, it's a constant drag on core compiler development to need to support various batteries included that most core contributors aren't going to care about, so it's easier to tell people "use CPAN".

There was even talk of "distros" for the interpreter. Where the core bits would be similar to what Linux is, and all the batteries would be provide as collections of add-on packages.

Strangely enough these efforts seem to stop at OS distributors. They really seem to like to install just the one "compiler", and wouldn't stand for a project like Perl or Python telling them "we mean for you to distribute the core compiler plus these 100 packages, because that's what forms our 'language'". "Strangely" because you'd think they'd be the best positioned to make easy work of packaging up such a thing, and it shouldn't in principle make a difference if you need to install 100 RPMs / APTs by default.

gcb0 · on May 18, 2019

And perl solved that perfectly: just let the OS/distro solve the 100s of packages. And it have been solved, despite you claiming otherwise on your last paragraph.

When did you have to use cpan in a modern system? Compare that to how many times you had to use pip.

Now, if you use a crappy OS or distro (or god forbid, some container built by you have no idea who on top of nobody knows what) then yeah, you are bound to do the leg work yourself, but you will be doing that regardless of the language/subsystem you are trying to use in that case.

Not to mention that it is the only way to do things professionally. For example, if you must have a system that parses XML but for company policy is not allowed to have even the means of performing a network request. With python you either have both xml and an http library and whatever else included and you will either have to do a special package with a striped down python+xml only or get a corporate exception. While on other languages you can install only the xml parser component package and your code will run happily and be compliant with company policy.

majewsky · on May 18, 2019

> When did you have to use cpan in a modern system? Compare that to how many times you had to use pip.

Well yeah. A sizable amount of new software is still being written in Python. But when I use Perl software (besides my custom scripts), it's always stuff that's old enough that the distribution is carrying packages for it.

If you disagree, please name a significant new software written in Perl that was released in the last, say, 5 years.

gcb0 · on May 22, 2019

name one package you missed in those 5 years.

I see your comment as the goal, not the problem.

thaumasiotes · on May 18, 2019

> if you must have a system that parses XML but for company policy is not allowed to have even the means of performing a network request. With python you either have both xml and an http library and whatever else included and you will either have to do a special package with a striped down python+xml only or get a corporate exception. While on other languages you can install only the xml parser component package and your code will run happily and be compliant with company policy.

...wouldn't the company policy involve removing the means of performing a network request from the computer, making the notional capabilities of the software irrelevant?

Python will let you drive network requests through the OS. It's just that you wouldn't normally want to.

lalaland1125 · on May 19, 2019

+1 for this. Trying to ban programming languages that support network connections is both foolish and impossible. Any Turing complete language that allows any sort of OS interaction can be used to communicate over a network. Even if you have to manually do the syscalls yourself.

gcb0 · on May 22, 2019

most companies i've worked for have special packages for perl/python/php/etc that compile the interpreter without support for system calls, for example.

That alone have probably paid of handsomely over the years considering all the XSS we patched, which could very well have been full network compromises.

avar · on May 18, 2019

I don't mean OS distributors can't package up CPAN modules. They can do that, no problem, same for the Python equivalents.

I mean that a significant use people get out of Python and Perl is that they aren't bare-bones like say Scheme or Lua where the standard library is really spartan.

It allows you to write useful code that works on the lowest common denominator of "just OS Perl or Python". Whether that's some random version on whatever Linux distro, or *BSD or Solaris or whatever without needing to write your own getopt library or whatever.

Which is why the "let's ship a bare-bones compiler and have people use CPAN or PyPi" is contentious. In theory it shouldn't matter, and for a lot of shops who install hundreds of packages it doesn't, but it does for people who target stdlib-only, which is a big use-case. Particularly since the people who have that use-case are drawn to these languages.

dragonwriter · on May 18, 2019

> Which is why the "let's ship a bare-bones compiler and have people use CPAN or PyPi" is contentious.

But you don't have to ship just a bare-bones interpreter to deal with the problem of stdlib staleness, you just need the stdlib libraries to be updatable via package manager, you don't need to not ship a baseline version of them with the interpreter.

That doesn't deal with the bloat issue raised with relatively unused libraries, but if they are relatively unused because they aren't good rather than because the use case is uncommon, upgradability could solve that.

Of course, you don't solve compatibility for versions before the move to upgradable packages, but at the same time if you solve problems going forward you increase the incentive to upgrade.

avar · on May 18, 2019

Sure, in Perl these are called "dual-life" modules. It makes things easy for users, but makes the life of the compiler-maintainer worse.

Now not only do they need to ship a stable compiler+large-stdlib, but they can't even rely on there being a 1=1 version relationship between the two, instead it'll be many=many as users might use multiple library versions with multiple compiler versions.

dragonwriter · on May 18, 2019

> Sure, in Perl these are called "dual-life" modules. It makes things easy for users, but makes the life of the compiler-maintainer worse.

Well, yeah, but you're not going to have much of a language at all if you optimize for quality of life of the language maintainer.

__david__ · on May 19, 2019

I use cpan constantly, though indirectly via carton. I used to use the OS for Perl libs but this starts to fail hard when you have multiple projects that all demand different versions of stuff.

walshemj · on May 18, 2019

You use CPAN all the time in perl develpment

jzl · on May 18, 2019

The idea of "distros" for python is interesting, and to a certain extent has already happened: just look at Anaconda.

I've been using built-in environment isolation tools such as virtualenv for ages but have recently switched over to using miniconda for all things python. Among other things it has amazing support across all three major OS's, and I happen to be dealing with all three at any given time. Whether one uses miniconda, pipenv, virtualenv, or anything else like it, as far as I am concerned the days of ever using the system python are over. I will always create my own personal "distro" on the fly with full control over the python version and every add-on package.

mixmastamyk · on May 19, 2019

> the days of ever using the system python are over

You don’t have any Python scripts in your bin folder?

imiric · on May 18, 2019

As a non-scientific user of pyenv[0], would I benefit from switching to Anaconda/miniconda?

[0]: https://github.com/pyenv/pyenv

jzl · on May 18, 2019

The default "Anaconda" install is like 7Gb after it grabs everything. Turnkey if you're using all that stuff anyway, but otherwise not particularly worth it. Miniconda on the other hand I'm finding meets my needs exactly. The base install is standard.

goerz · on May 19, 2019

You don’t have to switch. You can install anaconda using pyenv, to get the best of both worlds.

maximente · on May 18, 2019

not likely IMO. i've found conda - which is their environment management tool - to be a hassle unless one needs specific numpy/scipy/GPU libs. i'm using pipsi and pew, although i'll look into pyenv.

kalefranz · on May 19, 2019

Former conda dev lead here. Definitely interested in more details regarding what part of the conda experience you found to be a hassle, if you’re willing to share.

liveoneggs · on May 18, 2019

https://wiki.python.org/moin/PythonDistributions

marcosdumay · on May 18, 2019

That problem does not seem to happen with the Haskell Platform. Maybe it's because GHC has almost no batteries at all, so nobody thinks it's sufficient, or maybe it's because distros get it in a single package, so it does not feel like installing 100 libraries.

twic · on May 18, 2019

> There was even talk of "distros" for the interpreter. Where the core bits would be similar to what Linux is, and all the batteries would be provide as collections of add-on packages.

Not a million miles from the modularisation that Java has been going through.

stochastastic · on May 18, 2019

The Python standard library has been a huge help for me. Evaluating which third party packages to trust and handling updates is a hassle. (Would love a solution for this. Does anyone have a curated version of PyPI?) I’m surprised that people want to slim it down other than for performance on a more constrained system.

As an aside, why doesn’t the Python standard library extend/replace features with code from successful packages like Requests? Tried it and it didn’t work? Too much bloat? Already got too much on the to-do list?

debatem1 · on May 18, 2019

The quote about the stdlib being "where packages go to die" was, for a long time, considered a feature and not a bug. The theory was that once a package is in the stdlib its development should slow to prioritize stability over new features.

This may somewhat explain why lots of successful packages are not in the stdlib: putting them in effectively killed future development until a few years ago. But today that argument is inconsistently applied and I'm not sure if it's a rule worth keeping.

sitkack · on May 18, 2019

That phrase never had a good connotation. I believe someone skilled in PR respun that meaning.

aasasd · on May 18, 2019

> Does anyone have a curated version of PyPI?

Pypi have thrown out the downloads counter—a huge misservice to coders. Like I got all day to figure out the best libs for ten different features which I only need in passing, so my primary concern is to not pick complete garbage.

So, my solution to that now is to look up Github pages for the libs and choose the one with most stars. As much as I dislike Github for its occasional typical proprietary behavior, Gitlab doesn't help in this case.

j88439h84 · on May 19, 2019

Pypi still has a download counter, but it will tend to reflect which libraries are used in ci, so it's a biased estimate.

jancsika · on May 19, 2019

If only there were a way to filter machines on the internet by some kind of id number, and to subtract numbers from other numbers at scale.

orf · on May 22, 2019

And what about caches and proxies?

aasasd · on May 19, 2019

Could you please point me to the location of that counter? Because I ain't seeing it anywhere.

detaro · on May 19, 2019

afaik it's not in the UI, but the dataset is published and and access provided by other sites: https://pypistats.org/

petre · on May 19, 2019

Perl has stars on MetaCPAN and every version has a test counter. Download counter is unreliable.

DonHopkins · on May 19, 2019

Especially download counters for libraries that make http requests!

simonh · on May 18, 2019

Requests depends on urllib3 which would also have to go into the stdlib. It also contains a CA bundle which the core devs don’t want to do. It also likely the internal implementation doesn’t follow core dev standards and practices, a common problem with integrating external libs. Finally there’s a risk it would slow or discourage new feature development by tying it to the core release cycle.

A better approach might be to add a core of basic requests-like features built from the stdlib’s existing resources. That would be beneficial to many users and if they need more then there’s always Requests.

stochastastic · on May 18, 2019

That makes sense; if it makes any difference I don’t necessarily mean “take code from X and drop it in” so much as “if the consensus appears to be that the X api is better then add that to stdlib”. I like that notion of adding the X api, or parts thereof, And then having a third party X+ package. Maybe I just like the idea of something I built being “worthy” of stdlib.

jonnycomputer · on May 18, 2019

I've come to use packages outside of the standard library very sparingly; been burned too many times to find that development of some package stopped or slowed down and backing out can be a real pita.

falcor84 · on May 18, 2019

What is your argument here? Standard library module development is also extremely slow.

kilburn · on May 18, 2019

There is one major difference: an abandoned external package may break with newer python versions, whereas you can always count on stdlib packages being updated for new versions.

falcor84 · on May 19, 2019

Can you provide an example?

To the best of my knowledge, minor version updates in Python 3 have been entirely backwards so far and shouldn't have broken any library code. And as for incompatible changes sick as the transition to 3, then these of course had interface changes in the standard library.

jonnycomputer · on May 19, 2019

exactly. and i'm in academic software. its even worse.

chestervonwinch · on May 19, 2019

> Does anyone have a curated version of PyPI?

I'm not entirely sure what you're looking for, but have you tried https://www.enthought.com/product/enthought-deployment-manag... ?

Edit: example

    $ edm envs create tester36 --version 3.6
    $ edm shell -e tester36
    (tester36) $ edm install ipython matplotlib pyqt

stochastastic · on May 19, 2019

That’s interesting. Thanks!

llukas · on May 18, 2019

+100

> As an aside, why doesn’t the Python standard library extend/replace features with code from successful packages like Requests?

It is possible (ie. asyncio was separate package). It is slow process though.

misterdoubt · on May 19, 2019

I'd bet that absorbing Requests into the standard library, no matter the particular method of absorption proposed, would present too much of a political challenge to overcome.

quietbritishjim · on May 18, 2019

pathlib is another example.

misterdoubt · on May 19, 2019

Curating packages for quality across multiple versions and architectures is hard work. In addition to Enthought (mentioned earlier), Anaconda maintains a curated set.

lukka5 · on May 25, 2019

> Does anyone have a curated version of PyPI?

https://python.libhunt.com/ Not exactly curation but it does have rated libs for lots of categories.

pdonis · on May 18, 2019

> Already got too much on the to-do list?

That would be my guess. I would also add that getting through the process of adding a third-party set of modules to the standard library can take quite a while.

sametmax · on May 19, 2019

Hawk Owl is a _fantastic_ dev. She was the main force behind the twisted 2->3 transition. But because she is, she is missing the point of batteries included.

Asyncio is in the stdlib so that we have an official lib and API. The main benefit is that most people now, when looking for async, are not wondering about twisted or gevent or tornado. Most just go asyncio. Most dev efforts go to asyncio. It's the end of the great async war. Is it perfect ? No. And I don't care. It's one thing less to worry about. For those who know what they are doing, you can still choose and pip install twisted, but most people don't, and that's solved. Before that, just choosing the lib was a nighmare, as basically it's a definitive call. Out it on pypi, even with a "stdlib" tag, we go back to the 200X era. And it was not fun.

And the goal for having things like xml/sqlite/ssl without installing anything makes python very useful in a load of situations where you can't install stuff. Sometime you are offline. Sometime you are in a restricted env. Sometime you are not on your machine. Sometime your security protocol is hell. Don't assume people use Python as we do, from our comfortable dev laptop driven by the knowledge of our craft. Python is used in banks, by scientists, in schools, by kids, by poor people in the third world, by geographers and pentesters. The python user base is incredibly diverse, it's why it's so popular: it fits a lot of use cases.

So I see the benefit of having a side version of official modules we can pip install that can move faster. I see the benefit of cleaning the stdlib of old stuff, like the wave module, Template or @static.

But I'm glad I don't have anything to install to generate a uuid or unzip stuff. I'm glad I don't have to worry about twisted anymore (depiste that I did write a book on the topic !).

Also, pip install is NOT simple when you learn the language. I have to spend some time in the classroom, even with adult professionals, to explain the various subtleties of site-packages, import path, py -x on windows, python-pip on linux, -m, virtualenv, header files, etc. before my students become autonomous with it. Without a teachers, this turn into months of bad practices and frustrations.

You'd have to fix that first, way, way before moving stuff to pypi. I do think it should be high priority actually: it affects way more than pip.

sph · on May 19, 2019

Having a huge standard library also kills analysis paralysis and lets people be more productive.

If you're in the flow and trying to hack together something, the last thing you need is to lose all momentum to pick a date time library. I've had this issue tons of times with Node and Rust, where I'm not up to date with the current meta and my 30 minute hack job is interrupted 5 minutes in by having to google which library should I use to do an HTTP request. (I've actually lost interest in whatever I was doing a few times because of this.)

Python's stdlib is nobody's favourite, but when you start to get to its limits, you're probably past your flow state, you've written most of the logic and you can spend some time to replace http.client with requests because the latter is much better.

On a tangent note, I've been trying to find another scripting language to replace Python because I'm not a fan of it anymore (I won't get into it right now), and considering what I just wrote, there's not much that can replace it, as most languages have a bare-bones standard library and if you're not up to date with the current best library to do X, you'll never achieve great productivity.

gray_-_wolf · on May 19, 2019

> I've been trying to find another scripting language to replace Python because I'm not a fan of it anymore (I won't get into it right now), and considering what I just wrote, there's not much that can replace it

Have you considered ruby?

agumonkey · on May 19, 2019

I think go did this well, they provide a very solid toolset that is nothing fancy but you can forget about it right away and start producing solutions.

sametmax · on May 19, 2019

Give it 25 years.

Kwpolska · on May 19, 2019

You still need to learn to use the library you picked, even if it’s part of stdlib. Python has two HTTP libraries in stdlib, http.client and urllib.request. The former is low-level, and the latter has a fairly complicated API. Learning to use them will take much longer than just giving in and picking Requests. The stdlib docs will tell you to use Requests. Everyone on the Internet will tell you to use Requests. Any questions you might have for http.client/urllib.request will be answered by “use Requests”.

guggle · on May 19, 2019

> I see the benefit of cleaning the stdlib of old stuff, like the wave module

Wait... what ? No way ! Some of us do you use Python to process wav files. If anything, I'd like this module to be updated, not removed.

dtech · on May 19, 2019

The argument is not that this module should be removed, it's that it too niche to be included in every python installation by default and should be installed through a package manager or similar.

pas · on May 19, 2019

Non maintained modules should be split off and put on PyPI with a big warning, of course.

lsak1201 · on May 19, 2019

> Asyncio is in the stdlib so that we have an official lib and API.

It was pitched first as a common low level async loop for other applications like Twisted and Tornado.

Then people started using it directly and the keywords were added.

It's great for the people who think the way asyncio does, others are now forced to use it. I find all of Twisted, Go and Jane Street's Async easier to use.

Perhaps Python is just the wrong language for me.

SlowRobotAhead · on May 19, 2019

>She was the main force behind the twisted 2->3 transition.

Explain? I’ve hated the slow adoption of 3.x from 2.x, and generally how terrible it is to have apps that are 2.x on your 3.x system, and would like to know more about how that happened.

sametmax · on May 19, 2019

Twisted is a lib, she happened to have contributed a lot to the migration effort for it.

0xbadcafebee · on May 18, 2019

It's funny to me that they're making a point that PyPI is better than core, because actually I think PyPI has created a rather crap ecosystem. The non-hierarchial organization of packages, the lack of curation, lack of inheriting past functionality and extending it as more standard functionality, etc has resulted in a confusing sprawl of packages with duplicate, incompatible, buggy functionality. It's a bit like Linux internals; it's grown haggard over time, isn't organized well, is badly documented, and so it's difficult to pick it up and use it without stumbling over a decade or more of stale documentation and obsolete software.

Perl has a much better set of modules that extend standard functionality, which considering how much flack Perl gets for being hard to read, is rather funny. Rather than every new feature being its own independent project, most of the useful modules inherit a parent and follow the same convention, leading to very simple and easy to use extensions. And Perl Core isn't all that great, but it does have some batteries included, and everything else is extended easily and in a more standard manner by CPAN.

zbentley · on May 18, 2019

Wow, I've had a really opposite experience with CPAN modules. I've overwhelmingly found them to not respect encapsulation (messing with all sorts of global state, not mentioning that they're doing it, and failing to clean up after themselves or even provide the tools to clean up well), be massively inconsistent in their APIs, have messy and hard-to-parse documentation (still better than Python's conventions here, though), and have some really silly hierarchy-related decisions, most of which I suspect stem from inter-maintainer politics and infighting, of which I've observed a large amount.

Sure, I've found some gems on CPAN, but, having worked on both Perl, Python, and Java at reasonable scale for awhile, I cannot understand all the praise CPAN gets. It's the worst-quality scripting language package ecosystem out there. Even NPM does a better job, and some things about NPM are awful. CPAN might have been the first/only/best package manager for a get-shit-done scripting language at some point, but not any more.

Separately, I agree about modules which extend language functionality (e.g. class systems, async programming, runtime typing) specifically. Perl does pretty well in that area. While many of those language-extension modules really don't play well with any other metaprogramming tools being installed in the project, I don't imagine that any alternatives in other languages do, either. My main beef above is with "simple" (read: not pervasive semantics changes) modules like IPC utilities, HTTP clients, or loggers that don't know how to stay in their lanes.

j88439h84 · on May 19, 2019

What functionality would you like to have on PyPI, in addition to curation?

0xbadcafebee · on May 19, 2019

The big "function" I would like is just organizing the packages differently to get people to think about and use them differently.

Search engines are a "cool" technology that have become the de facto way to find what you're looking for. But if there's a lot of content related to what you're looking for, they can suck.

Go to PyPI and search for "semantic version". 10,000+ projects for "semantic version" found. As you go through page after page of different modules related to versioning, the one module you won't find immediately is Versio (https://pypi.org/project/Versio/), a well-documented and useful module which I ended up using. I have no idea how I found this module, but it certainly wasn't from PyPI's search engine.

Now go to CPAN (really metacpan) and search for "semantic version". Yes, you're still looking at thousands of results - but wait! There are only two modules here that look useful: Version::Dotted::Semantic, and SemVer. And the description comes straight from the docs' README, rather than being a short uninformative blurb. The first module, Version::Dotted::Semantic, is inheriting a separate module, Version::Dotted, and adding some extra functionality. Not only does the search page give more information about the module, but the hierarchy makes it easier to find (and later extend) useful modules in an intuitive way. Since the base module's functionality is boring, generic, and simple, it's less likely that people will make 20 different versions of it, so it'll be reused more often and thus remain stable for a long time.

A lot of CPAN's module names have sprawled over time and gotten less useful, but there's still a general convention that you name your module as a hierarchy of what it does (even if it's kind of verbose) and make small, reusable modules, rather than giant modules that are hard to extend. Not all modules measure up to this standard, and there's definitely room to improve, but I think Python modules could benefit greatly from a system like this.

As far as curation goes, PyPI is often filled with cruft. While searching for Jenkins packages, you will come across lots of entries like this: https://pypi.org/project/jenkins2api/. The homepage leads to a GitHub 404, it's only ever had one release, and it has no documentation. This project should probably not have been listed on the main search page, or at least sorted well down the list by default with intelligent filters and marked accordingly. (The "date last updated" and "trending" sorting just results in having virtually no Jenkins-related modules in the results at all)

yingw787 · on May 18, 2019

I agree with Amber’s point that more stuff should be moved from the standard library to PyPI. I made my first pull request to CPython during the development sprints this year, and it’s honestly not the best experience. Everything is built from scratch in CI after every commit, even a documentation change. There’s nowhere near enough CI builds and pipelines for everything Python supports. Pull requests are outstanding for several months, and there’s at least a thousand PRs open when I checked this morning.

I’m not sure if Python’s ideal solution is to reduce stdlib and have endorsed packages in PyPI, but it would be an improvement over the current process.

_skel · on May 18, 2019

The story of Python 2 to Python 3 migration, in a nutshell:

> Van Rossum argued instead that if the Twisted team wants the ecosystem to evolve, they should stop supporting older Python versions and force users to upgrade. Brown acknowledged this point, but said half of Twisted users are still on Python 2 and it is difficult to abandon them. The debate at this point became personal for Van Rossum, and he left angrily.

someguydave · on May 19, 2019

Hopefully the “python foundation” will declare python 2 deprecated soon so that it can be handed over to responsible maintainers.

Agathos · on May 19, 2019

I don't understand the lowercase letters and scare quotes around python foundation. Is that not its name? (Okay, it's Python Software Foundation.)

The end of life date is already set: January 1, 2020.

It's open source so I don't know what you're looking for in terms of a formal handover. Yes, I bet Red Hat and others will continue to maintain their own versions past that date.

Groxx · on May 19, 2019

That's happening: https://pythonclock.org/

To ensure things move along: pip has been printing highly-visible "python 2.7 will deprecate soon" warnings for a couple months or so now.

schlenk · on May 19, 2019

And backing out of it when running on pypy, as that does not deprecate python 2 compatibility...

Groxx · on May 19, 2019

Sure. Pypy is a separate implementation, they only control CPython. That's a pretty normal arrangement - official moves on, other forks might backport fixes for longer or focus on stability or some other realm of performance or something.

Izkata · on May 19, 2019

Probably only on a recent pip version. Pip 10's dependency resolution doesn't like our requirements files (we have contradictory versions that work due to the order they are in the file), so we've mostly only gone up to pip 9.

resoluteteeth · on May 18, 2019

When I first used python like 20 years ago I was blown away by how much functionality was blown in, and it can be annoying using languages where even the most basic functionality involves downloading 50 packages from the internet, but on the other hand the standard library does seem to be a mess now.

geofft · on May 18, 2019

I found that attractive when I first learned Python, but when I (much more recently) started picking up Rust, I was blown away by how easy and normal it is to use external packages: the build tool and package manager are the same thing and shipped with the language, the hello-world-equivalent docs assume you're using it, and even the Rust compiler and standard library themselves can (carefully) depend on external packages. Having run up against limits of the Python standard library several times in years of writing production software in it and not just learning it, I find the batteries-not-includes-but-easy-to-install approach better on the whole. (Rust is not the only language that does this - my impression is Node/NPM and Swift, at least, are similar - but it's the one I happen to be familiar with.)

In Python's defense, this was not obvious at the time; my understanding is Rust came to this approach by looking at the experience of Python and other languages. When Python's standard library was first being written, there were no easy package managers for any language, and the normal thing to do for installing dependencies in e.g. C was to grab random tarballs and figure out how to build and deploy them yourself. So avoiding that process made perfect sense.

civility · on May 18, 2019

> Having run up against limits of the Python standard library several times in years of writing production software in it and not just learning it, I find the batteries-not-includes-but-easy-to-install approach better on the whole.

Obviously it doesn't apply to everyone, and it certainly doesn't apply to most startups or open source developers, but I spent most of the last 20 years working in environments where you have to get permission for every third party library you bring on to the network. Many networks were essentially "airgapped", so it's not like you could just ignore the rules. The bureaucratic process alone meant that we preferred large bundles like Anaconda or Qt. Trying to use Cargo as it is typically used and documented would be a complete non-starter.

bscphil · on May 19, 2019

>Obviously it doesn't apply to everyone, and it certainly doesn't apply to most startups or open source developers, but I spent most of the last 20 years working in environments where you have to get permission for every third party library you bring on to the network.

Situations like this will really make you appreciate "batteries included". I think this particular issue is fairly revealing of the attitudes common among programmers of different languages. I think it's a good thing to be skeptical of a program pulling in a bunch of standard libraries over the Internet. It worries me when I find something on Github I want to try and I can't download and compile it without it pulling in 30 or 100 other libraries that I haven't looked at or decided to trust come along for the ride. I don't like that way of doing software, and unfortunately it's the norm in node and starting to become a norm in Rust as well. Real security fails have been caused this way in node's case at least.

Even in cases when Python programs depend on external libraries, I usually don't need to use pip for anything because Python programs will pull in dependencies provided by your distribution just fine. (My distribution doesn't even have any Rust libraries, so even if dynamic linking is possible in Rust not many people are shipping software that way.)

"Download by default" is a worse way of doing things, and it makes me sad to see newer languages like Go and Rust embracing it.

roca · on May 19, 2019

Standard libraries serve several functions. One of them is to bless certain versions of certain libraries as "known good". This can be done outside the standard library too, without incurring the penalties of actually moving things into the standard library. Rust definitely needs more work in this area though.

geofft · on May 18, 2019

Yes, my own employer is similar (you don't need permission, but production systems have no internet access and so you need to pre-download all your tarballs etc.), and cargo doesn't work right. But I think there is work on pointing cargo at an internal mirror.

We do have an internal PyPI mirror (with devpi) and we point `pip` at that, and it works pretty well.

steveklabnik · on May 18, 2019

Hm, Cargo should absolutely work in this environment; it’s required by the Firefox and Debian build systems, for example.

(Pointing at an internal mirror is now stable. Setting up that mirror is the hard part.)

RayDonnelly · on May 19, 2019

You can use `conda install rust_osx-64` on macOS and `conda install rust_linux-64` to use `cargo` with the Anaconda Distribution libraries and tools (including its compilers).

mjw1007 · on May 18, 2019

Rust is good at many things, but it's not as attractive for Python for the sort of program you write when a shell script gets a bit too complicated and you want to write it in a proper language.

So Rust can get away more easily without having support for things like command line parsing in the standard library, because it isn't really trying to support situations where it would be inconvenient to give your program its own project directory and Cargo.toml and all.

hermanradtke · on May 18, 2019

This is a pretty niche case imo. If you really want a stand-alone dole, you can forego the use of Cargo.toml and instead use std::env::args to get command line arguments.

If it really matters, then do it right (in whatever language makes the most sense).

ryl00 · on May 18, 2019

> When Python's standard library was first being written, there were no easy package managers for any language

Wasn't perl's CPAN developed around that time period (mid '90s) ?

geofft · on May 18, 2019

Emphasis on easy :-P Using CPAN in even the late '00s was an ordeal.

qbaqbaqba · on May 19, 2019

It just took ages to install and test some huge and popular modules like Catalyst, Moose. However, once installed and tested they would just on your system. And it wasn't that easy for modules creators and core developers as red hat was notorious for shipping decades of Perl versions (5.10 times).

Aeolun · on May 19, 2019

Node is ostensibly batteries included, but they’re weird, arcane batteries powered by tears. Therefore, to get anything done easily usually requires an external package.

petre · on May 19, 2019

Node doesn't even have a stdlib. You have to npm or yarn anything. And yes most of the time it will end in tears. That's not my definition of batteries included, but rather some batteries might explode, others might only ignite, we wish you the best of luck.

zbentley · on May 19, 2019

> Node doesn't even have a stdlib.

Yes it does. Perhaps you're thinking of browser JS, which does not.

vpzom · on May 19, 2019

What's a stdlib then, if Node's isn't?

new4thaccount · on May 18, 2019

Yea. I'm using Rust at the moment for fun and the amount of things not in the standard library is crazy to me. What? There is no built-in dictionary? What do I use instead and where is it?

Edit: based off of all the replies below, everyone understands the validity of what I'm trying to say, but also have fortunately pointed out my admittedly grevious error of not knowing you can just import hashmap from stdlib. The extra step is pretty minimal and not a problem. I'm hoping this is covered in the Rust book.

nicoburns · on May 18, 2019

Dictionary in the python sense? There are two!

https://doc.rust-lang.org/std/collections/struct.HashMap.htm... https://doc.rust-lang.org/std/collections/struct.BTreeMap.ht...

stavros · on May 18, 2019

Oof, I wasted a good hour+ trying to convert from one to the other. Maybe there's an easy way to do this, but not many people on IRC knew (or were available at the time).

edflsafoiewq · on May 18, 2019

    use std::iter::FromIterator;
    BTreeMap::from_iter(hash_map.into_iter())

steveklabnik · on May 18, 2019

Or .into_iter().collect(), I believe. No import needed.

a1369209993 · on May 18, 2019

Assuming the keys are Hash + Ord, wouldn't that just be:

  for (k,v) in &src
    { dst.insert(k,v); }

?

stavros · on May 19, 2019

Yes, that's what I ended up doing, but the error confused me and I ended up looking for a less hacky way for more than an hour until I said "fuck it" and just did that.

new4thaccount · on May 18, 2019

At least the first one seems pretty straightforward and thanks for replying. I only just started and have a lot to learn.

qbaqbaqba · on May 19, 2019

Are they compatible?

jonesetc · on May 18, 2019

I know it's not the whole point, but there is a hash map and sorted (btree) map in the std collections.

That being said the point still stands, I remember specifically feeling it at the lack of included regex. I overall prefer the slim std lib of rust over the massive python though. Especially with the community being pretty good about nominating de facto standard packages like for regex.

resoluteteeth · on May 18, 2019

Yeah Rust is pretty extreme. I was just trying to generate a random number in it and was pretty surprised to find out that at some point it did have this functionality in the standard library but they actually removed it and now it's a separate crate.

nicoburns · on May 18, 2019

That is actually being considered for inclusion at some point. It was mainly excluded due it not being ready and them not wanting to commit to the existing API. That said, I like the Rust way. Including a library is pretty easy, and it mains you get the best API as the main one (rather than a stdlib function that is "good enough", and another library to use if you really care about that functionality).

CDSlice · on May 18, 2019

If you are using dictionary in the Python sense, Rust has that built in too, you just have to import it from the standard library collections module.

https://doc.rust-lang.org/std/collections/struct.HashMap.htm...

jzoch · on May 18, 2019

You mean a map? It's in the standard library.

new4thaccount · on May 18, 2019

Yep. Map/Hash/Dictionary.

I didn't know it was included in stdlib (I really thought it wasn't) and feel idiotic now. It is still odd (having a primarily scripting background and not coming from the systems side) that I have to include what seems to be essentially an import statement at the top. I guess it is a lot more efficient that way though. Thanks for pointing out my error!

jgibson · on May 18, 2019

Er, doesn't hashmap do pretty much the same thing? https://doc.rust-lang.org/std/collections/struct.HashMap.htm...

rightbyte · on May 18, 2019

When I first touched Python I had only used C. Is your story similair?

Had I been using Java or Visual Basic or even C++ proirly maybe I wouldn't be so impressed as I were.

I think the mistake Python is doing is messing with it's simplicity with decorations, halfass lambdas and stuff. As a newbie you could understand Python code, while eg. C++ templates were magic. You need to know more Python to understand Python code nowadays.

icegreentea2 · on May 18, 2019

Decorators added were in in 2004, I think lambdas were an original language feature (like... before 2004).

There are other new stuff that might be confusing to beginners (like our mighty walrus operator), but those two examples have been there for effectively forever.

rightbyte · on May 18, 2019

Ok it's maybe just my random perception of the language evolving in unsync with my (moderate) skills in it.

I just had the feeling there's more "stuff" that you need to know.

nerdwaller · on May 18, 2019

> She thinks that some bugs in the standard library will never be fixed.

This is actually an interesting paradox to be in, and one that Linus Torvalds recently commented on. His focus, like Guido’s, is the user and even fixing a bug can break the user.

https://lkml.org/lkml/2018/8/3/621

notatoad · on May 18, 2019

This isn't a paradox. once it's released it's not a bug anymore, it's just behaviour. document the behaviour, but breaking compatibility with previous versions is a bug. it doesn't matter how obviously wrong the previous behaviour is.

nnq · on May 18, 2019

> but breaking compatibility with previous versions is a bug

That's how you get an inconsistent mess that never evolves. There's something called semver, increase the version number and do the fix / refactors / radical redesign / whatever. People will see that you've went from version 1.0 to 87.3 in one year and they may choose not to use your thing because you're moving too fast for them, but that's life...

llukas · on May 19, 2019

1.0 to 87.3 in one year? More realistic would be python 2->3 in 10 years or so and still too fast...

cameronbrown · on May 18, 2019

Better than breaking software. Linux is far more important when it comes to ABI stability here than Python though.