Why Python keeps growing, explained (github.blog)
265 points by usrme on March 3, 2023 | 439 comments



One thing I’d add to this conversation, though I’m certain it’s already been stated: As many have mentioned, there is a large subset of the user base that uses Python for applied purposes in unrelated fields that couldn’t care less about more granular aspects of optimization. I work as a research assistant for international finance faculty and I would say that compared to the average Hackernews reader, I’m technologically illiterate, but compared to the average 60-80 y/o econ/finance faculty member, I’m practically a Turing award winner.

Most of these applied fields use Python and R as no more than data-gathering tools and fancy calculators, something for which the overhead of other languages is just not justified.

The absolute beauty of Python for what I do is that I can write code and hand it off to a first year with a semester of coding experience. Even if they couldn't write it themselves, they can still understand what it does after a bit of study. Additionally, I can hand it off to 75 year old professors who still send fax memos to the Federal Reserve and they'll achieve a degree of comprehension.

For these reasons, Python, although not perfect, has been so incredibly useful.


I just want to add to this: I had this exact same experience when working with journalists and other programmers from non-technical backgrounds.

You'll find everyone from philosophy PhDs to biologists to journalists who use pandas because it's so easy to learn and work with. It's amazing how you can become productive in Python/pandas without any experience or even a basic understanding of programming, because of how accessible Jupyter, Colab and the blogs/docs on pandas are.

The other thing people don’t talk about is that a lot of these organizations can hire a CS student part time or a full time software engineer/data engineer/data scientist who can optimize their scripts once they are written. Pretty much any software engineer can read and debug python code without needing to learn python. So for example, I know some engineers working in genomics who have turned biologist-written scripts that take several days to run in python into scripts that take hours or minutes to run by doing basic optimizations like removing quadratic algorithms from the script or applying pyspark or dask to add parallelism.
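
To make that concrete, here's a toy sketch of the kind of "remove the quadratic algorithm" fix being described (all the names here are made up):

    # Quadratic version: for every read, scan the whole list of known genes.
    def annotate_slow(reads, known_genes):
        return [r for r in reads if r in known_genes]   # O(len(reads) * len(known_genes))

    # Near-linear version: a one-time set conversion makes each membership test O(1).
    def annotate_fast(reads, known_genes):
        known = set(known_genes)
        return [r for r in reads if r in known]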

The fact that python can be used as a bridge between technical and non-technical people is amazing and I think it has provided a better bridge between these groups than SQL was ever able to provide.


I couldn’t agree more. And I must say, now that it’s being used as a bridge between technical and nontechnical talent it’s becoming ever more vital from a career perspective. Most people recognize the value of fundamental coding skills and if you’re even just above average at coding in a non-CS field, you seem magnitudes more valuable than you really are. In both industry and research, ears immediately perk up when they realize I have a background in economics but competencies in coding beyond the standard regressions in R that everyone does in econometrics. It’s hilarious because as mentioned prior, I’m rather pathetic compared to most people on this forum.


Yeah, Python is widely used where I work for just that. The "hierarchy" of tools looks somewhat like this, from most to least technically competent users:

1) Languages like Python / R / Julia / etc. + SQL

2) PowerBI, Tableau, or similar tools

3) Excel

The number of users of those tools will be the inverse, with Excel being number 1.

If you're competent using the "stack" above, you could probably work as an analyst anywhere - given that you can pick up domain knowledge.


I hate to admit that I very often start the python repl to just do some simple calculations. I always have multiple terminals open so instead of opening a calculator I just use python in one of the terminals.


Agreed. Python's REPL has basically totally replaced my usage of Emacs calc as a desk calculator, mainly because it is always there and if I don't know the big-brain closed-form solution for something like compound interest, I can just write a loop and figure it out that way.
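
For instance, instead of recalling the closed-form compound-interest formula, a quick throwaway loop does the job (numbers made up):

    balance = 1000.0
    for year in range(10):
        balance *= 1.05          # 5% interest, compounded yearly
    print(balance)               # ~1628.89

    print(1000 * 1.05 ** 10)     # the closed form agrees, up to float rounding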


So what you are saying is that Python is Excel for programmers :D


This is a really good line, the VAST VAST majority of programming in the world is done in Excel by people who would be horrified if you told them they were programming.

And I wouldn't be surprised if a large number of python programmers would say they're not programming, it's just scripting.


I also use a python repl as an alternative to Excel or SQL. I find myself just downloading the data as a CSV and then quickly cooking up some pandas to get a graph or aggregate some stats; it's just so much quicker and easier imo.
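
Something like this, as a rough sketch - the file and column names here are invented:

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("sales.csv", parse_dates=["date"])

    # quick aggregate stats per group
    print(df.groupby("region")["amount"].agg(["count", "sum", "mean"]))

    # quick graph: monthly totals
    df.set_index("date")["amount"].resample("M").sum().plot()
    plt.show()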


I’ve migrated to the tidyverse for most of my EDA and plotting - I’ve found dplyr and ggplot to be noticeably more expressive. Pandas always added a ton of friction for me.

It's still my choice for quick and non-graphical analysis when I'm on a remote machine.


An alternative to pandas/Python for similar uses is https://www.visidata.org/. You can use Python in it also.


A bit off topic, but what would you use for data "mangling"? Like joining CSVs on complex conditions, cleaning tables, etc. Pandas seems to be the wrong tool for this, but I still often find myself using it since, in contrast to something like Excel, my steps are at least clearly documented for future use or verification.


If you had asked this question 6 or 8 years ago, the answer would have been "it depends on the volume of data" (tens of GB, hundreds of GB, etc.), and I could have given you a single tool that would help you in most cases.

Today honestly most tools are pretty capable, pandas is a great choice and if you have really high volumes of data you might try koalas (spark) or polars.

Honestly the biggest design considerations for data science today are things external to your project: what do you and others on your team know, what tools does your company already have set up, what volume of data are you processing, what are your SLAs, who or what else needs to run this script/workflow, what software do you need to integrate with, how often does it need to be processed, how are you going to assure the quality of your data, and what tools are you using for reporting?

I tend to use pandas and SQLite for most use cases cause I can cook up a script in 2 hours and be done. I just code it interactively in a notebook, and most people are able to work on a pandas or SQLite script productively if it needs to be maintained, even if they don't know Python. If it's a large volume of data, or a rapid schedule (minutes, seconds), or tight SLAs on quality or processing time, then I start to consider whether pyspark, Apache Beam, dask or BigQuery might be a good fit.
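
As a rough sketch of that pandas-plus-SQLite pattern (the file, table and column names are invented):

    import sqlite3
    import pandas as pd

    conn = sqlite3.connect("warehouse.db")

    # load a CSV into SQLite once; whoever maintains this can then work in SQL or pandas
    orders = pd.read_csv("orders.csv", parse_dates=["created_at"])
    orders.to_sql("orders", conn, if_exists="replace", index=False)

    daily = pd.read_sql(
        "SELECT date(created_at) AS day, SUM(total) AS revenue "
        "FROM orders GROUP BY day ORDER BY day",
        conn,
    )
    print(daily.tail())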

So it really just depends but for most people who are processing < 100 GB on a 1+ day schedule or ad hoc I would recommend just using pandas or tidyverse in R and getting really good at writing those scripts fast. Today you’ll get the most mileage out of those two tools.


I still use perl for some of that stuff, or even awk, but those are barely reusable or readable.


This is a letter to the general community: please stop writing these scripts in perl and bash one liners. That one off script you thought would only be used once or twice at this nonprofit has been in continuous use for 12 years and every year a biologist or journalist runs your script having no idea how it actually works. Eventually the script breaks after 8 years and some poor college student interns there and has to figure out how perl works, what your spaghetti is doing and eventually is tasked with rewriting it in python as an intern project (true story).


I think your complaint isn't really about perl and bash. It's about knowing your audience.

When writing code that will be used by a particular sort of user base, the code should be written in whatever way best suits that user base. If your users are academics, researchers, journalists, etc. -- yes, avoid anything with complex or obscure semantics like perl or bash.

But if your code is going to be used by programmers or people who are already comfortable with perl/bash/whatever, those tools may be just the ticket.


one line spaghetti ... I remain unsympathetic.


He has a valid point, though. I've seen (and written!) one-liners that were so complex that nobody, not even devs, could deal with them without decoding them first.

They aren't technically "spaghetti", but they are technically impenetrable.

I argue that one-liners like that aren't good for anybody, dev or otherwise.


Do you reply on any GitHub repo or gist w/ code snippets?


> I very often start the python repl to just do some simple calculations.

If you use the python repl a lot and haven't heard of it, ptpython is worth checking out as a repl replacement. I find it to be much more ergonomic.


yup, from decimal import Decimal, and get better accuracy than any default calculator
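
For example:

    from decimal import Decimal

    print(0.1 + 0.2)                        # 0.30000000000000004
    print(Decimal("0.1") + Decimal("0.2"))  # 0.3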


You may like xonsh

https://xon.sh/

No need to fire up a python repl.


I don't see why that's something to be ashamed of. I frequently pop open a Ruby on Rails console for this purpose. (Basically ruby's repl + libraries and language extensions.)


Eh, I type basic operations into Spotlight or Google, whichever is lying on my screen!


I have python on my phone and use it to calculate tips sometimes.


Have you tried ipython? Python repl on steriods!


from time to time yes. Ideally I would also have a jupyter notebook running at all times, but in the end it mostly comes down to vanilla python because that's installed on everything I am using


I do too if I already have a repl open, but otherwise I mostly use bc so I don’t have to wait for the slight lag of the repl to start


What’s to hate about that? It’s a perfectly good use of Python and I do it all the time.


I've seen this too. Python has supplanted what used to be done in a spreadsheet entirely, even the custom VBA macro stuff that was once a high-level spreadsheet. Python plus viz is a more enjoyable experience than trying to wrangle some general-purpose spreadsheet into doing this stuff. And it's relatively portable and transferable, which are major advantages of spreadsheets.


I'm one of Python's biggest critics (to me it's a Monkey's Paw of software development), but I think this is exactly the appropriate situation to use it. It's great for one-off fancy calculations and system scripts, ideally with no dependencies and/or a short lifetime.


> to me it's a Monkey's Paw of software development

This piqued my curiosity. I've worked with Python on and off for the last ~20 years, and while I'm not a fanboy or apologist, and use other tools when appropriate, there's also a reason it remains in my toolbox and sees regular use while many other tools have come/gone/been replaced by something better.

Can you share an example scenario where it's a Monkey's Paw? My suspicion is that this is more of an org issue than a tech issue?


Dependency management/tooling. Python (philosophically) treats the whole system as a dependency by default, in contrast with other modern languages that operate at the project/workspace level. This means it's very hard to isolate separate projects under the same system, or to reproducibly get a project running on a different system (even the same OS, because the system-wide state of one machine vs the next matters so much).

People work around these issues with various kludges like virtual environments, Docker (just ship the whole system!), and half a dozen different package managers, each with their own manifest format. But this is a problem that simply doesn't exist in Go, JavaScript, Rust, and others.

For code that never needs anything except the standard library, or for a script that never needs to be maintained or run on a different machine, Python is fine. Maybe even nice. But I've watched my coworkers waste so many hundreds of developer-hours just trying to wrangle their Python services into running locally, managing virtual environments, keeping them from trampling on each other's global dependencies, following setup docs that don't work consistently, and fixing deployments that fail every other week because the house is built on sand.


No.

Virtualenvs and requirements files have been a thing in Python for ages.

I’ve used tons of languages and while not the best, Python dependency management and project isolation is decent. IMO certainly better than JavaScript.


It's decent if you've been in the loop enough to use it. It's not built-in. It's a good practice, for sure, but it not being built-in at the language level makes it insanely easy for a newcomer to just... Not use virtualenvs at all.

In contrast to Javascript/Node.js/NPM/Yarn/whatever-you-want-to-call-server-js, which maintains a local folder with dependencies for your project, instead of installing everything globally by default.

Heck, a virtual env is literally a bundled python version with the path variables overridden so that the global folder is actually a project folder, basically tricking Python into doing things The Correct Way.


Virtualenvs have been part of the standard library since v3.3 [0] and most READMEs do reference them btw (minimal sketch below).

[0]: https://docs.python.org/3/library/venv.html
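
For reference, the usual route is `python -m venv .venv` on the command line, but the same thing is exposed programmatically (the directory name here is arbitrary):

    import venv

    # equivalent to running:  python -m venv .venv
    venv.create(".venv", with_pip=True)   # creates an isolated environment in ./.venv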


It's been said, quite correctly, that Python is the second best language for everything.

I feel that it has recently - like many really mature platforms - become very much like the elephant from that old apocryphal story [0]. It is being used for many different purposes, with very different requirements and needs, with users being so focused on their own use that anything outside that is considered "bloat" and "waste".

[0] https://en.wikipedia.org/wiki/Blind_men_and_an_elephant


when it comes to even slightly less simple use cases involving parallelism and concurrency, python and its imperative kin start falling quite short of basic needs that are easily satisfied by

fp languages like

ocaml

haskell

racket

common lisp

erlang

elixir

or rust/golang

but even if the code is single threaded and not hampered by GIL limitations, python tends to be super slow imho; also, debugging dynamic python and imperative stateful python after a certain code base size (>10k LOC) gets extremely painful


A lot of these problem spaces can get away with single-threaded performance because maybe they're generating a report or running an analysis once a day or at an even slower frequency. I work in a field where numerical correctness and readability are important for prototyping control algorithms (I work on advanced sensors), and python satisfies those properties for our analysis and prototyping work.

When we really want or need performance we rewrite the slow part in C++ and use pybind to call into it. For all the real implementations that run in a soft real time system, everything is done in C++ or C depending on the ecosystem.


> debugging dynamic python and imperative stateful python after a certain code base size >10k LOC gets extremely painful

for any meaningful scale you are better served by basic FP hygiene as evidenced in

haskell

elixir

CL/racket

or rust/golang


Just because you say it doesn't make it true. It's not that painful, or painful at all really. Good abstractions and planning make writing and maintaining a Python codebase easy, just like in any language.


I don’t get it. Go is as imperative as a language can be.


go is imperative but there are elegant functional styles borrowed from OTP/Erlang in ergo: https://github.com/ergo-services/ergo https://memo.barrucadu.co.uk/three-months-of-go.html


Common Lisp, paragon of FP:

  (loop with result = 0
        for x across numbers
        when (evenp x)
          do (setf result (+ result x))
        finally (return result))
I mean yeah, you can do FP in CL, but it allows you to program in whichever paradigm you prefer.


I agree. But most people just need a pickup truck, not a railway consist.


Python is ideal for the non-professional programmer who wants to put their skills and knowledge on wheels.


>As many have mentioned, there is a large subset of the user base that uses Python for applied purposes in unrelated fields that couldn’t care less about more granular aspects of optimization.

Nobody cares about this that much. Even a straight up software developer in python doesn't care. The interpreter is so slow that most optimization tricks are irrelevant to the overall bottleneck. Really optimizing python involves the FFI and using C or C++, which is a whole different ball game.

For the average python developer (not a data scientist) most frameworks have already done this for you.


Python keeps growing in number of users because it's easy to get started, has libraries to load basically any data, and to perform any task. It's frequently only the second best language, but it's the second best language for everything.

By the time a python programmer has «graduated» to learning a second language, exponential growth has created a bunch of new python programmers, most of whom don't consider themselves programmers.

There are more non-programmers in this world, and they don’t care - or know about - concurrency, memory efficiency, L2 cache misses due to pointer chasing. These people all use python. This seems to be a perspective missing from most hackernews discussions, where people work on high performance Big corp big data web scale systems.


I fully agree with the description.

What worries me, though, is that the features that make Python quite good at prototyping make it rather bad at auditing for safety and security. And we live in a world in which production code is prototyping code, which means that Python code that should have remained a quick experiment – and more often than not, written by people who are not that good at Python or don't care about code quality – ends up powering safety/security-critical infrastructures. Cue in the thousands of developer-hours debugging or attempting to scale code that is hostile to the task.

I would claim that the same applies to JavaScript/Node, btw.


I sometimes think about what Python would be like if it were written today, with the hindsight of the last thirty years.

Immutability would be the default, but mutability would be allowed, marked in some concise way so that it was easy to calculate things using imperative-style loops. Pervasive use of immutable instances would make it impossible for libraries to rely on mutating objects a la SQLAlchemy.

The language would be statically type-checked, with optional type annotations and magic support for duck typing (magic because I don't know how that would work.) The type system would prioritize helpful, legible feedback, and it would not support powerful type-level programming, to keep the ecosystem accessible to beginners.

It would still have a REPL, but not everything allowed in the REPL would be allowed when running code from a file.

There would be a strong module system that deterred libraries from relying on global state.

Support for at least one fairly accessible concurrency paradigm would be built in.

I suspect that the error system would be exception-based, so that beginners and busy people could write happy path code without being nagged to handle error values and without worrying that errors could be invisibly suppressed, but there might be another way.


I think free mutability and not really needing to know about types are two things that make the language easier for beginners.

If someone who's not familiar with programming runs into an error like "why can't I change the value of X" that might take them multiple hours to figure out, or they may never figure it out. Even if the error message is clear, total beginners often just don't know how to read them and use them.

They provide longer-term advantages once your program becomes larger, but the short-term advantages are more important for a scripting language imo.


The type system I want would just be a type system that tells you that your code will fail, and why. Pretty much the same errors you get at runtime. Hence the need for my hypothetical type system to handle duck typing.

I don't think mutability by default is necessary for beginners. They just need obvious ways of getting things done. There are two places beginners use mutability a lot. The first is gradual transformation of a value:

    line = "The best of times, the worst "
    line = line.trim()
    line = line[:line.find(' ')]
This is easily handled by using a different name for each value. The second is in loops:

    word_count = 0
    for line in lines():
        word_count += num_words(line)
I think in a lot of cases beginners will have no problem using a map or list comprehension idiom if they've seen examples:

    word_counts = [num_words(line) for line in lines]
    # or word_counts = map(num_words, lines)
    word_count = sum(word_counts)
But for cases where the immutable idiom is a bit trickier (like a complicated fold) they could use a mutable variable with the mutability marker I mentioned. Let's make the mutability marker @ since it tells you that the value can be different "at" different times, and let's require it everywhere the variable is used:

    word_count @= 0
    for line in lines():
        word_count @= word_count + num_words(line)
Voila. The important thing is not to mandate immutability, but to ensure that mutability is the exception, and immutability the norm. That ensures that library writers won't assume mutability and rely on it (cough SQLAlchemy cough), and the language will provide good ergonomic support for immutability.

It's a common claim that immutability only pays off in larger programs, but I think the mental tax of mutability starts pretty immediately for beginners. We're just used to it. Consider this example:

    dog_name = Name(first='Rusty', last='Licks')
    dog1.name = dog_name
    dog_name.last = 'Barksalot'
    dog2.name = dog_name
    print(dog1.name) # It's not Rusty Licks!
Beginners shouldn't have to constantly wrestle with the difference between value semantics and reference semantics! This is the simplest possible example, and it's already a mind-bender for beginners. In slightly more complicated guises, it even trips up professional programmers. I inherited a Jupyter notebook from a poor data scientist who printed out the same expression over and over again in different places in the notebook trying to pinpoint where and why the value changed. (Lesson learned: never try to use application code in a data science calculation... lol.) Reserving mutability for special cases protects beginners from wrestling with strange behavior from mistakes like these.


You should check out Julia (https://julialang.org/), that's very close to what you describe.


You beat me to it!

Julia is both dynamic and fast. It doesn’t solve all issues but uniquely solves the problem of needing 2 languages if you want flexibility and performance.


Exception-based error handling - and its extensive use in the standard library - is the fundamental design mistake that prevented Python from becoming a substantial programming language.

Coupled with the dynamic typing and mutability by default, it guarantees Python programs won't scale, relegating the language to the role of a scratchpad for rough drafts and one-off scripts, a toy beginner's language.


I have no idea why you say that it's a scratchpad or a toy language considering that far more production lines of code are getting written in Python nowadays than in practically any other language, with the possible exception of Java.


But that's the same with Excel: massive usage for throwaway projects with loose or non-existent requirements or performance bounds that end up in production. Python is widely used, but not for substantial programming in large projects - say, projects over 100 kloc. Python hit the "quick and dirty" sweet spot of programming.


This is absolutely not true. I've made my living working with Python and there's an astounding amount of large Python codebases. Instagram and YouTube alone have millions of lines of code. Hedge funds and fintechs base their entire data processing workflows around Python batch jobs. Django is about as popular as Rails and powers millions of websites and backends.

None of those applications are toys. I have no idea where your misperception is coming from.


I guess I'm more than a little prejudiced from trying to maintain all sorts of CI tools, web applications and other largeish programs somebody initially hacked together in Python in an afternoon and which grew to become "vital infrastructure". The lack of typing bites you hard, and the optional typing that has been shoehorned into the language is irrelevant in practice.

All sorts of problems would simply have not existed if the proper language was used from the beginning, as opposed to the one where anyone can hack most easily.


Statically typed with duck typing is called structural typing. (As opposed to nominal typing, with inheritance hierarchies).

It's already what you get with python and mypy, using typing.Protocol or unions.
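
A minimal sketch of what that looks like with typing.Protocol (the names here are made up):

    from typing import Protocol

    class Quacks(Protocol):
        def quack(self) -> str: ...

    class Duck:                      # note: no inheritance from Quacks
        def quack(self) -> str:
            return "quack"

    def make_noise(animal: Quacks) -> None:
        print(animal.quack())

    make_noise(Duck())               # mypy accepts this: Duck matches the protocol structurally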


This is pretty much what nim is btw. Very fun language in my experience.


Nim is fun, but it needs to be more popular.


Aren't you kinda describing OCaml?


We still live in a world where many outward facing networked applications are written in C. Dynamic languages with safe strings are far from the floor for securable tools.


That is true.

However, I hope that these C applications are written by people who are really good at C. I know that some of these Python applications are written by people who discovered the language as they deployed into production.


That’s a measure of programming prowess, not the actual security concern at hand.

If the masterful C developer still insists on using a language that has so many footguns, and a weird culture of developers pretending that they're more capable than they are, then their C mastery may well not be worth much against someone throwing something together in Python, which will at the very least immediately bypass the vast majority of vulnerabilities found in C code. Plus, my experience with such software is that the sort of higher-level vulnerabilities you'd still see in Python code aren't ones that the C developer has necessarily dealt with.


That's entirely possible.

How could we check?


Python code can be production code. There are many people and companies shipping Python production code and generating substantial value.


You are correct, there are absolutely huge companies shipping Python production code and generating substantial value.

Do we agree that this is somewhat orthogonal to what I'm writing, though?


A popular opinion in game development is that you should write a prototype first to figure out what works and is fun, and once you reach a good solution, throw away that prototype code and write a proper solution with the insights gained. The challenge is that many projects just extend the prototype code to make the final product, and end up with a mess.

Regular software development is a lot like that as well. But you can kind of get around that by having Python as the "prototyping language", and anything that's proven to be useful gets converted to a language that's more useful for production.


Hey, it is better than programs written as Excel functions.


It is.

But I fear that the same folks that decried the use of excel by "the masses" are now just as horrified by the widespread usage of Python! :-)


> the features that make Python quite good at prototyping make it rather bad at auditing for safety and security

What's an example that makes it bad? Is it a case of the wrong tool for the job?

For example, I understand that garbage-collected languages shouldn't be used for real-time systems like flight controllers.


What audits need most is some ability to analyze the system discretely and really "take it apart" into pieces that they can apply metrics of success or failure to (e.g. pass/fail for a coding style, number of branches and loops, when memory is allocated and released).

Python is designed to be highly dynamic and to allow more code paths to be taken at runtime, through interpreting and reacting to the live data - "late binding" in the lingo, as opposed to the "early binding" of a Rust or Haskell, where you specify as much as you can up front and have the compiler test that specification at build time. Late binding creates an explosion of potential complexity and catastrophic failures because it tends to kick the can down the road - the program fails in one place, but the bug shows up somewhere else because the interpreter is very permissive and assumes what you meant was whatever allows the program to continue running, even if it leads to a crash or bad output later.
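
A tiny made-up example of that effect - the bug lives in one function, but nothing complains until a different piece of code blows up at runtime:

    def parse_amount(raw):
        if raw.startswith("$"):
            return float(raw[1:])
        return raw                   # bug: forgot to convert, silently returns a str

    def total(rows):
        return sum(parse_amount(r) for r in rows)

    total(["$10.00", "$2.50"])       # fine
    total(["$10.00", "2.50"])        # TypeError raised here, far from the actual mistake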

Late binding is very useful - we need to assume some of it to have a live, interactive system instead of a punchcard batch process. And writing text and drawing pictures is "late binding" in the sense of the information being parsed by your eyes rather than a machine. But late binding also creates a large surface area where "anything can happen" and you don't know if you're staying in your specification or not.


Interesting. What kinds of software get this level of audit scrutiny?


There are many examples, but let's speak for instance of the fact that Python has privacy by convention and not by semantics.

This is very useful when you're writing unit tests or when you want to monkey-patch a behavior and don't have time for the refactoring that this would deserve.

On the other hand, this means that a module or class, no matter how well tested and documented and annotated with types, could be entirely broken because another piece of code is monkey-patching that class, possibly from another library.
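
A contrived example of what that failure mode looks like - nothing in the language stops other code, even a transitively imported library, from rewiring a "private" method:

    class Gateway:
        def _validate(self, amount):
            if amount <= 0:
                raise ValueError("amount must be positive")

        def charge(self, amount):
            self._validate(amount)
            print(f"charging {amount}")

    # ...somewhere in another module that happened to get imported:
    Gateway._validate = lambda self, amount: None   # silently removes the check

    Gateway().charge(-100)   # prints "charging -100"; the well-tested invariant is gone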

Is it the case? Probably not. But how can you be sure?

Another (related) example: PyTorch. Extremely useful library, as we have all witnessed for a few years. But that model you just downloaded (dynamically?) from Hugging Face (or anywhere else) can actually run arbitrary code, possibly monkey-patching your classes (see above).

Is it the case? Probably not. But how can you be sure?

Cue in supply chain attacks.

That's what I mean by auditing for safety and security. With Python, you can get quite quickly to the result you're aiming for, or something close. But it's really, really, really hard to be sure that your code is actually safe and secure.

And while I believe that Python is an excellent tool for many tasks, I am also something of an expert in safety, with some experience in security, and I consider that Python is a risky foundation to develop any safety- or security-critical application or service.


Thanks for this, super insightful perspective


There's also the argument that at a certain scale the time of a developer is simply more expensive than time on a server.

If I write something in C++ that does a task in 1 second and it takes me 2 days to write, and I write the same thing in Python that takes 2 seconds but I can write it in 1 day, the 1 day of extra dev time might just pay for throwing a more high performance server against it and calling it a day. And then I don't even take the fact that a lot of applications are mostly waiting for database queries into consideration, nor maintainability of the code and the fact that high performance servers get cheaper over time.

If you work at some big corp where this would mean thousands of high performance servers that's simply not worth it, but in small/medium sized companies it usually is.
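
Back-of-envelope version of that trade-off, with completely made-up numbers:

    dev_day_cost = 600        # hypothetical fully loaded cost of one developer-day
    extra_dev_days = 1        # the C++ version takes one extra day to write
    extra_server_cost = 80    # hypothetical extra monthly cost of a beefier server for the Python version

    months_to_break_even = (extra_dev_days * dev_day_cost) / extra_server_cost
    print(f"The faster version only pays off after ~{months_to_break_even:.1f} months")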


Realistically something that takes 1 second in C++ will take 10 seconds (if you write efficient python and lean heavily on fast libraries) to 10 minutes in python. But the rest of your point stands


I spend most of my time waiting on IO, something like C++ isn't going to improve my performance much. If C++ takes 1ms to transform data and my Python code takes 10ms, it's not much of a win for me when I'm waiting 100ms for IO.

With Python I can write and test on a Mac or Windows and easily deploy on Linux. I can iterate quickly and if I really need "performance" I can throw bigger or more VPSes at the problem with little extra cognitive load.

I do not have anywhere near the same flexibility and low cognitive load with C++. The better performance is nice but for almost everything I do day to day completely unnecessary and not worth the effort. My case isn't all cases, C++ (or whatever compiled language you pick) will be a win for some people but not for me.


And how much code is generally written that actually is compute heavy? All the code I've ever written in my job is putting and retrieving data in databases and doing some basic calculations or decisions based on it.


Rule of thumb:

Code is "compute heavy" (could equally be memory heavy or IOPs heavy) if it's deployed into many servers or "the cloud" and many instances of it are running serving a lot of requests to a lot of users.

Then the finance people start to notice how much you are paying for those servers and suddenly serving the same number of users with less hardware becomes very significant for the company's bottom line.

The other big one is reducing notable latency for users of your software.


That is absolutely true.

But sometimes, you do end up writing that compute heavy piece of code. At that stage, you have to learn how to write your own native library :)

Speaking of which, I've written some Python modules in Rust using PyO3; it's a very agreeable experience.


Damn! Is the rule of thumb really a 10x performance hit between Python/C++? I don’t doubt you’re correct, I’m just thinking of all the unnecessary cycles I put my poor CPU through.


Outside cases where Python is used as a thin wrapper around some C library (simple networking code, numpy, etc) 10x is frankly quite conservative. Depending on the problem space and how aggressively you optimize, it's easily multiple orders of magnitude.


Those cases are about 95% of scientific programming.

This is the first line in most scientific code:

    import numpy


FFI into lean C isn't some perf panacea either; beyond the overhead, you're also depriving yourself of interprocedural optimization and other Good Things from the native space.


Of course it depends on what you are doing, but 10x is a pretty good case. I recently re-wrote a C++ tool in python and even though all the data parsing and computing was done by python libraries that wrap high performance C libraries, the program was still 6 or 7 times slower than C++. Had I written the python version in pure python (no numpy, no third party C libraries) it would no doubt have been 1000x slower.


It depends on what you're doing. If you load some data, process it with some Numpy routines (where speed-critical parts are implemented in C) and save a result, you can probably be almost as fast as C++... however if you write your algorithm fully in Python, you might have much worse results than being 10x slower. See for example: https://shvbsle.in/computers-are-fast-but-you-dont-know-it-p... (here they have ~4x speedup from good Python to unoptimized C++, and ~1000x from heavy Python to optimized one...)
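
A minimal sketch of that gap - the same reduction as a pure-Python loop versus a single NumPy call (timings will vary by machine):

    import time
    import numpy as np

    data = np.random.rand(10_000_000)

    start = time.perf_counter()
    total = 0.0
    for x in data:               # each element gets boxed into a Python object
        total += x
    print("pure-Python loop:", time.perf_counter() - start, "s")

    start = time.perf_counter()
    total = data.sum()           # one call into NumPy's compiled implementation
    print("numpy sum:       ", time.perf_counter() - start, "s")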


It can be anywhere from 2-3x for IO-heavy code to 2000x for tight vectorizable loops. But 20x-80x is pretty typical.


Last time I checked (which was a few years ago), the performance gain of porting a non-trivial calculation-heavy piece of code from Python to OCaml was actually 25x. I believe that performance of Python has improved quite a lot since then (as has OCaml's), but I doubt it's sufficient to erase this difference.

And OCaml (which offers productivity comparable to Python) is noticeably slower than Rust or C++.


It really depends on what you're doing, but I don't think it is generally accurate.

What slows Python down is generally the "everything is an object" attitude of the interpreter. I.e. you call a function, the interpreter has to first create an object of the thing you're calling.

In C++, due to zero-cost abstractions, this usually just boils down to a CALL instruction preceded by a bunch of PUSH instructions in assembly, based on the number of parameters (and call convention). This is of course a lot faster than running through the abstractions of creating some Python object.


> What slows Python down is generally the "everything is an object" attitude of the interpreter

Nah, it's the interpreter itself. Since it doesn't do JIT compilation, there is a hard ceiling it cannot surpass even in theory (as opposed to things like pypy, or graal python).


I don't think this is true: Other Python runtimes and compilers (e.g. Nuitka) won't magically speed up your code to the level of C++.

Python is primarily slowed down because of the fact that each attribute and method access results in multiple CALL instructions since it's dictionaries and magic methods all the way down.


Which can be inlined/speculated away easily. It won’t be as fast as well-optimized C++ (mostly due to memory layout), but there is no reason why it couldn’t get arbitrarily close to that.


> Which can be inlined/speculated away easily.

How so? Python is dynamically typed after all and even type annotations are merely bolted on – they don't tell you anything about the "actual" type of an object, they merely restrict your view on that object (i.e. what operations you can do on the variable without causing a type error). For instance, if you add additional properties to an object of type A via monkey-patching, you can still pass it around as object of type A.


Say a function/part of code is performed a thousand times; the runtime collects statistics showing that object 'a' was always an integer, so it might be worthwhile to compile this code block to native code with a guard on whether 'a' really is an integer (that's very cheap). The speedup comes from not doing interpretation, but taking the common case and making it natively fast; in the slow branch, the complex case of "+ operator has been redefined", for example, can be handled simply by the interpreter. Python is not more dynamic than Javascript (hell, python is strongly typed even), which hovers around the impressive 2x native performance mark.

Also, if you are interested, “shapes” are the primitives of both Javascript and python jit compilers instead of regular types.


Other than this, dynamic typing is a big culprit. I can't find the article with the numbers anymore, but its performance overhead is enormous.


Well, at least 10x, sometimes more. Not really surprising when you consider that it's a VM reading and parsing your code as a string at runtime.


> it's a VM reading and parsing your code as a string at runtime.

Commonly it creates the .pyc files, so it doesn't really re-parse your code as a string every time. But it does check the file's dates to make sure that the .pyc file is up to date.

On debian (and I guess most distributions) the .pyc files get created when you install the package, because generally they go in /usr and that's only writeable by root.

It does include the full parser in the runtime, but I'd expect most code to not be re-parsed entirely at every start.

The import thing is really slow anyway. People writing command-line tools have to defer imports to avoid huge startup times from loading libraries that are perhaps needed just by some functions that might not even be used in that particular run.
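
A minimal sketch of that deferred-import pattern (the tool and subcommand here are invented):

    import argparse

    def report(path):
        import pandas as pd      # deferred: only paid for when this subcommand actually runs
        print(pd.read_csv(path).describe())

    def main():
        parser = argparse.ArgumentParser(prog="mytool")
        sub = parser.add_subparsers(dest="cmd", required=True)
        sub.add_parser("report").add_argument("path")
        args = parser.parse_args()
        if args.cmd == "report":
            report(args.path)

    if __name__ == "__main__":
        main()                   # "mytool --help" never imports pandas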


> re-parse your code as a string every time

That doesn’t really take any significant time though on modern processors.


Aren't those pyc files still technically just string bytecode, but encoded as hex?


Well bytecode isn't the same as the actual code you write in your editor.


As a long-time Python lover, yes that's a decent rule of thumb.


It is anywhere from 1x to 100x+.


If the 1 second is spent waiting for IO, it will take 1 second in whatever language.

But yes python is slow.

However I've seen good python code be faster than bad C code.


Well, to be fair the "good python code" is probably just executing something written in c lol. But lots of python is backed up by stuff written in c.


Not necessarily. Just using a better optimized sort or hash algorithm can make a big difference.

I was talking specifically of pure python code (except the python's standard library itself, where it really is unavoidable).


Of course algorithmic complexity will trump anything else at big enough n values.


Not for everything. There are plenty of Python operations that are not 10x slower than c.


That is true, but there are relatively few real-world applications that consist of only those operations. In the example I mentioned below, there were actually some parts of my Python rewrite that ended up faster than the original C++ code, but once everything was strung together into a complete application, those parts were swamped by the slow parts.


Most of the time these are arithmetic tight loops that require optimisations, and it's easy to extract those into separate compiled Cython modules without losing overall cohesion within the same Python ecosystem.


If Python was merely twice as slow then I could agree with you.


Not all code needs to process terabytes of data.

I have code running that reads ~20 bytes, checks the internal status in a hashmap and flips a bit.

Would it be faster in C? Of course.

Would it have taken me much longer to write to achieve absolutely no benefit? Yes.


Speeding up the time critical parts with Cython or Numba or ... is rather easy.
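
A small sketch of what that looks like with Numba (the function and data here are made up):

    import numpy as np
    from numba import njit

    @njit
    def pairwise_dist_sum(points):
        total = 0.0
        n = points.shape[0]
        for i in range(n):
            for j in range(i + 1, n):
                d = 0.0
                for k in range(points.shape[1]):
                    diff = points[i, k] - points[j, k]
                    d += diff * diff
                total += d ** 0.5
        return total

    points = np.random.rand(2000, 3)
    print(pairwise_dist_sum(points))   # first call compiles; later calls run as native code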


There are also programmers who are tired of chasing pointers and simply want to get stuff done.

E.g. people who once wrote "robust" code in Rust but were "outcompeted" left and right by coworkers who churn out shiny new things at 10x the speed.


At some point, every engineer has heard this same argument but in favor of all kinds of dubious things such as emailing zip files of source code, not having tests, not having a build system, not doing IaC, not using the type system, etc.

I'm sure Rust was the wrong tool for the job in your case but I find this type of get shit done argument unpersuasive in general. It overestimates the value of short-term delivery and underestimates how quickly an investment in doing things "the right way" pays off.


Totally depends on the business you're in.

If you're dealing in areas with short time limits then Python is great, because you can't sell a ticket for a ship that has sailed.

And I've seen "the right way" which, again, depending on the business may result in a well designed product that is not what's actually needed (because people are really bad at defining what they want)

What's brilliant with Python compared to other hacky solutions is that it does support tests, type hints, version control and other things. It just doesn't force you to work that way. But if you want to write stable, maintainable code, you can do it.

That means you can write your code without types and add them later. Or add tests later once your prototype has been accepted. Or, whenever something goes wrong in production, fix it and then write a test against that.

Oh and I totally agree you should certainly try to "do things the right way", if the business allows it.


It is hard to believe that Python is objectively that much more productive than other languages. I know Python moderately well (with much more real world experience in C#). I like Python very much but I don't think it is significantly more productive than C#.


Python is out of this world more productive in the Science space and Data space.

The only thing that can compete with it for productivity in the science space is R.


This. C#, Java or even newcomers such as Kotlin/Go aren't even in the same ballpark, due to the REPL/Jupyter alone. Let alone when you consider the ecosystem.


If you are in a lab (natural science lab) or anywhere close to data, I bet you it is much more productive, even more so when you have to factor in that the code might be exposed to non-technical individuals.


Using Python vs Rust is in no way in the same league as not having tests.


Totally agree. That's why I clarified:

> I'm sure Rust was the wrong tool for the job in your case but I find this type of get shit done argument unpersuasive in general.

Unless you're working on a fire-and-forget project with a tiny time horizon get shit done arguments are blatantly short-termist.


The thing is that in the short term it's much easier to predict what you're going to need and where the value is, and in the long term you might not even be working on this codebase anymore. Lots of incentives to get things done in the short term.


The business owner (whoever writes the checks) prefers "get shit done" over "the right way". Time to completion is a key factor in the payoff function of the dev's work.


The entire point of doing things the right way is that you end up delivering more value in the long term, and "long term" can be as soon as weeks or even days in some cases.

Business owners definitely prefer fewer bugs, fewer customer complaints, less support burden, fewer outages, fewer headaches. Corner-cutting doesn't make economic sense for most businesses, and good engineering leadership doesn't have much trouble communicating this up the chain. The only environment where I've seen corner-cutting make business sense is turd-polishing agencies whose business model involves dumping their mistakes on their clients and running away so the next guy can take the blame.


Try the travel/event booking business (where I'm in) - and no, people don't dump their mistakes on the next guy here - to the contrary, the "hacky" Python solutions are supported for years and teams stay for decades (although a decade ago we had not discovered how great Python was)

What business owners actually don't like at all is how long it takes traditional software development to actually solve problems - which then don't really fit the business after wasting a few years of resources... and the dumping and running away is worse in Java and other compiled software. With Python you can at least read the source in production if the team ran away...


> the dumping and running away is worse in Java and other compiled software. With Python you can at least read the source in production if the team ran away...

Java (and dotnet, the two big "VM" languages) is somewhat of a strange example for that; JVM bytecode is surprisingly stable and reverse engineering is reasonably easy unless the code was purposely obfuscated - a bad sign on any language anyways.


> It overestimates the value of short-term delivery

For an early stage start up this is almost the only relevant factor for success.


From the other half of that sentence:

> underestimates how quickly an investment in doing things "the right way" pays off.

What time horizon should a startup optimize delivery for? Minutes, hours, days, weeks? Say you're a startup dev in a maximalist "get shit done now" mindset so you're skipping types, tests, any forethought or planning so you can get the feature of the week done as fast as possible. This makes you faster for one week but slower the week after, and the week after, and the week after that.

Say a seed stage startup aims for 12 months runway to achieve some key outcomes. That's still a marathon. It still doesn't make sense to sprint the first 200 meters.


> coworkers who churn out shiny new things at 10x the speed

Sounds like a classic web-dev perspective. My customers hate when we ship broken tools because it ruins their work, new-feature velocity be damned. We love our borrow checker because initially you run at 0.5x velocity but post-25kSLOC you get to run at 2x velocity, which continues to mystify managers worldwide.


This is not just a web-dev perspective.

People use Python in financial applications, Data Engineering and AI/ML pipelines, infrastructure software etc and the 10x speed can be real.


Feature factories give web dev such a bad rep, it doesn't have to be this way..


I use mostly Python, and a bit of Rust.

With Python, testing, good hygiene and a bit of luck, you can write code that is maybe 99% reliable. It is very, very hard to get to (100-eps)% for eps < 0.1% or so. Rust seems better suited to that.

Anything else, especially if there isn't a huge premium on speed, meh - Python is almost always sufficient, and not in the way.


I use the same combo: lots of Python to analyse problems, test algos, process data, etc. Then, once I settle on a solution but still need more performance (outside GPU's), I go to rust.


Genuinely curious, why do you need Rust?


I'm simulating an audio speaker in real time. So I do the data crunching, model fitting, etc. in python and this gives me a good theoretical model of the speaker. But to be able to make a simulation in realtime, I need lots of speed so rust makes sense there (moreover, the code I have to plug that into is rust too, so one more reason :-)). (now tbh, my realtime needs are not super hard, so I can avoid a DSP and a real time OS :-) )

I don't need rust specifically. It's just that its memory and thread management really help me to continue what I do in python: focusing on my core business instead of technical stuff.

The less I code the better I feel :-)


My most successful career epiphany was realizing that everyone -- my customers, my boss, etc -- was happier if I shipped code when I thought it was 80% ready. That long tail from 80-100% generates a lot of frustration.


Can you clarify this comment?


You mean the bit about frustration?

It's just an application of the Pareto principle. That last 20% of work to make perfect software costs a lot of time. Customers (and by extension, management) do not care how pretty your code is, how perfect your test coverage is (unless your manager is a former developer, then they might have more of an opinion), they care most that you ship it. Bugs are a minor irritation compared to sitting around waiting for functionality they need, as long as you're responsive in fixing the bugs that do come up.


Thanks. I thought that is what you meant but another possible take was that the last 20% is actually important. Getting something 80% finished is fast and then the long tail to get it to 100% is frustrating for everyone because the work, in theory is finished. I think that can happen as well.

Of course there are at least three dimensions to discuss here: internal quality, external quality and product/feature fit. Lower quality internal code eventually leads to slower future development and higher turnover as no one wants to work with the crappy code base. Lower external quality (i.e. bugs) can lead to customers not liking your product. Interestingly the relationship between internal and external quality is not as direct as one might think. Getting features out the door more quickly (at the expense of other things) can help with product fit. Essentially, like most things, this is an ongoing optimization problem and different approaches are appropriate for different problem domains.


That is interesting. I went in the other direction :)

I am tired of having to refactor shiny new things churned out at 10x the speed and that keep breaking in production. These days, if given a choice, I prefer writing them in Rust code, spending more time writing and less time refactoring everything as soon as it breaks or needs to scale.


When the pointer chasing (sometimes) comes in handy, is once you have a successful business with a lot of data and/or users, and suddenly the cost of all those EC2 instances comes to the attention of the CFO.

That's when rewriting the hot path in Go or Rust or Java or C or C++, can pay off and make those skills very valuable to the company. Making contributions to databases, operating systems, queueing systems, interpreters, Kubernetes etc. also fall into that category.

But yeah if you are churning out a MVP for a new business, yeah starting with Python or Ruby or Javascript is a better bet.

(Erlang/Elixir is also an interesting point in the design space, as it's very high level and concise, but also scales better than anything else, although not especially efficient for code executing serially. And Julia offers the concision of Python with much higher performance for numerical computing.)


Or there are programmers who write both. Something that I want to write once, have run on several different platforms, handle multi-threading nicely, and never have to think about again? Rust. Writing something to read in some data to unblock an ML engineer or make plots for management? Definitely not Rust, probably python. Then you can also churn out things at 10x the speed, but by writing the tricky parts in something other than python, you don't get dragged back down by old projects rearing their ugly heads, so you outpace the python-only colleagues in the long-term.


Programming is secondary to my primary duties and only a means for me to get other things done. I'm in constant tension between using Python and Rust.

With Python I can get things up and going very quickly with little boilerplate, but I find that I'm often stumbling on edge cases that I have to debug after the fact and that these instances necessarily happen exactly when I'm focused on another task. I also find that packaging for other users is a major headache.

With Rust, the development time is much higher for me, but I appreciate being able to use the type-system to enforce business logic and therefore find that I rarely have to return to debug some issue once I have it going.

It's a tough trade-off for me, because I appreciate the velocity of Python, but Rust likely saves me more time overall.


If you're 'tired of chasing pointers', Rust's a lot closer to (and I'd argue better than) Python than say Go - it'll tell you where the issue is and usually how to fix it; Go will just blow up at run time. (Python (where applicable) will do something unexpected and wrong but potentially not error (..great!))

(Fwiw I use all three, Python professionally.)


> at 10x the speed

Is a coping canard invented by programmers who can't into more powerful programming languages.

By far the slowest language for developing is PHP, it's even worse than plain C in that regard.


I completely agree - but you say that like it's a bad thing. I work as a developer alongside data scientists, who might have strong knowledge of statistics or machine learning frameworks rather than traditional programming chops.

For the most part they don't need to know about concurrency, memory efficiency etc, because they're using a library where those issues have been abstracted away.

I think that's what makes python ideal - its interoperability with other languages and library ecosystem mean less technical people can produce good, efficient work without having to take on a whole bunch of the footguns that would come from working directly in a language like C++ or Rust.


But this is a false dichotomy. The space of options isn't C++/Rust or Python. There are languages which attempt to give the best of both worlds, e.g. Julia.

> they're using a library where those issues have been abstracted away.

I work in Python, and while libraries like numpy have certainly abstracted away some of those issues, there's still so much performance left on the table because Python is still Python.


I'd say if you do data-intensive computation with Numpy you are not leaving much on the table due to Python.


Having gone through the exercise, I know this is false.

Not everything can be pushed into numpy, and you can still be left with lots of loops in python.


That's what numba [0] is for (can also help with the NumPy stuff in certain cases.)

[0] https://numba.pydata.org/


Oh, I'm familiar with numba and while it certainly helps, it has plenty of its own issues. You don't always get a performance gain, and you only find this out at the end of a refactoring. Your code can get less readable if you need to transport data in and out of formats that it's compatible with (looking at you, List()).

To say nothing of adding yet another long dependency chain to the language (Python 3.11 is still not supported even though work started in August of last year).

I do wonder if the effort put into making this slow language fast could have been put to better use, such as improving a language with Python's ease of use but which was built from the beginning with performance in mind.


I've rewritten real world performance critical numpy code in C and easily gotten 2-5x speedup on several occasions, without having to do anything overly clever on the C side (ie no SIMD or multiprocessing C code for example).


Did you rewrite the whole thing or just drop into C for the relevant module(s)? Because the ability to chuck some C into the performance critical sections of your code is another big plus for Python.
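
For what it's worth, the lowest-effort version of "dropping into C" doesn't even need a build step - here's a minimal sketch using the stdlib's ctypes, assuming a Unix-like system where libm can be located (real projects would more likely ship a compiled extension via Cython, cffi, or a C extension module):

    import ctypes
    import ctypes.util

    # Load the system math library and call its C cos() directly.
    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    libm.cos.restype = ctypes.c_double
    libm.cos.argtypes = [ctypes.c_double]

    print(libm.cos(0.0))  # 1.0, computed by the C library rather than by Python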


But... pretty much any language can interoperate with C; its calling conventions have become the universal standard. I mean, I still remember at $previousJob when I was deprecating a C library and carefully searched for any mention of the include file... only to discover that a whole lot of Fortran code depended on the thing I was changing, and I had just broken all of it (since Fortran doesn't use include files the same way, my search for "#include <my_library" didn't return any hits, but the function calls were there nonetheless).

Julia, to use the great-great-grand-op's example, seems to also have a reasonably easy C interop (I've never written any Julia, so I'm basing this off skimming the docs, dunno, it might actually be much more of a pain than it looks like here).


Calling C from Julia is pretty straightforward

https://docs.julialang.org/en/v1/manual/calling-c-and-fortra...


Even if you do this, you're still paying a penalty whenever you move data Python->C and C->Python.

Plus that you now need to write performant (and safe) C code, which (to me) defeats part of the reason to use Python in the first place.


I’ve done the same but moved from vanilla numpy to numba. The code mostly stayed the same and it took a couple hours vs however long a port to C or Rust would have taken.


For a package whose pitch is "Just apply one of the Numba decorators to your Python function, and Numba does the rest", a few hours of work is a long time.


A 2-5x speedup is not a lot; I would say it is not worth rewriting from Python to C if you don't get an order of magnitude improvement.

Because if you weigh that benefit against the cost of the rewrite from Python to C, the cost of maintaining/updating C code, and possible C footguns like manual memory management, then there is no benefit left.


2-5x IS a lot. It's the speed difference between the current iPhone 14 and anything from an iPhone XS down to an iPhone 6. That's 4-8 years of hardware improvements.

And the parent was talking about numpy code which is better than stock python, who knows how far back normal python would send you.


I'm in the camp that 2-5x performance improvement is not really worth re-writing Python code in C for.


Guess that'll depend on how much you need the performance and how much code it is.

They're comparing numpy (SIMD plus parallelism) with straightforward C code and getting a 2-5x improvement.


I highly doubt that numpy can ever be the bottleneck. In a typical Python app there are other things, like I/O, that consume resources and become the bottleneck before you run into numpy's limits and can justify a rewrite in C.


I haven't personally run into IO bottlenecks so I have no idea how you would speed those up in Python.

But there's two schools of thoughts I've heard from people regarding how to think about these bottlenecks:

1. IO/network is such a bottleneck that it doesn't matter if the rest is not as fast as possible.

2. IO/network is a bottleneck so you have to work extra hard on everything else to make up for it as much as possible.

I tend to fall in the second camp. If you can't work on the data as it's being loaded and have to wait till it's fully loaded, then you need to make sure you process it as quickly as possible to make up for the time you spend waiting.


In my typical python apps, it's 0.1-20 seconds of IO and pre-processing, followed by 30 seconds to 10 hours of number crunching, followed by 0.1-20 seconds of post processing and IO.


It can be worth it. What matters is how much time it saves your users over the course of using the app vs the time it took to develop it. So, if:

#-of-users * total-time-saved-per-user > time-spent-optimizing

Then it's worth it. You can even multiply by cost of user per time unit and cost of developer per time unit, to see how much money was saved.

Even in cases where its the same person on both sides, it can still work out. There's an xkcd comic about it, even.
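
A quick back-of-the-envelope version of that inequality (all numbers here are made up):

    # Hypothetical figures, just to show the shape of the calculation.
    num_users = 200
    time_saved_per_user = 3.0        # hours saved per user over the tool's lifetime
    time_spent_optimizing = 80.0     # developer hours spent on the rewrite

    hours_saved = num_users * time_saved_per_user
    print(hours_saved > time_spent_optimizing)   # True: 600 hours saved vs. 80 spent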


2-5x speedup barely seems worth re-writing something for, unless we're talking calculations that take literally days to complete, or you're working on the kernel of some system that is used by millions of people.


Most programmers don't actually need to know about that stuff, either. And most programmers who do need to know about that stuff, don't know about it.


Solving race conditions is QA's problem right? ;)


You're so silly. QA? What's QA? Solving race conditions is the customer's problem :-p


> For the most part they don't need to know about concurrency [...]

In my opinion, this is the part that Go got mostly right. Concurrency is handled by the runtime, and held behind a very thin veil. As a programmer you don't really need to know about it, but it's there when you need to poke at it directly. Exposing channels as a uniform communication mechanism has still enough footguns to be unpleasant, though.

In an ideal world, I should be able to decorate a [python] variable and behind the scenes the runtime would automatically shovel all writes to it through an implicitly created channel. Instead of me as a coder having to think about it. Reads could still go through directly because they are safe.
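
Something in that spirit can be hand-rolled today with a property and a queue - a purely illustrative sketch (all names invented; nothing in the runtime does this for you automatically):

    import queue
    import threading

    class ChanneledValue:
        """Writes are funneled through a queue and applied by a single writer
        thread; reads go straight to the underlying attribute."""

        def __init__(self, initial):
            self._value = initial
            self._writes = queue.Queue()
            threading.Thread(target=self._drain, daemon=True).start()

        def _drain(self):
            while True:
                self._value = self._writes.get()

        @property
        def value(self):              # reads pass through directly
            return self._value

        @value.setter
        def value(self, new):         # writes are serialized via the queue
            self._writes.put(new)

    counter = ChanneledValue(0)
    counter.value = 42                # enqueued, applied by the writer thread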

If I could have Python syntax and stdlib, with Go's net/http and crypto libraries included, and have concurrency handled transparently in Go-style without having to think about it, that would be pretty close to an all-wishes-come-true systems language. Oh, and "go fmt", "go perf" and "go fuzz" as first-class citizens too.

Someone else in this thread brought up the idea of immutable data structures as a default. I wouldn't mind that. Python used to have frozenset (technically it still does but I haven't seen a performance difference for a while), so extending the idea of freeze()/unfreeze() to all data types certainly has appeal.


In fact, progress in computing has always been built on successive layers of abstraction. Just think of assembly language and punch-tape programming; those days are not that long past.


> without having to take on a whole bunch of the footguns that would come from working directly in a language like c++ or Rust.

Don't forget the footguns of working with developers who do those things. Ask them to do something simple and you get something complex and expensive after months of back and forth about what is wanted. You're likely to end up with a framework for a one-off SQL query.

I hear it being said already, "You're using software developers wrong!" Well, maybe software developers shouldn't be so hard to use?


> maybe software developers shouldn't be so hard to use?

This whole take assumes bad intention on both sides. Nobody's job is easy in this situation. Leadership's job is to set everyone up for success. If things go off the rails and end up with months of back and forth leading to nobody being happy despite good intentions and honest effort, then the problem lies with leadership.


> a whole bunch of the footguns

Could you expand on this in the context of Rust?


Sure thing! Footguns might be the wrong word, and I know that as a low level language Rust is insanely safe, but for a high level developer its type system is gonna mean spending a lot of time in the compiler figuring out type errors, at least initially. That might not be a traditional footgun, but if you're just trying to, I dunno, build a CRUD API or something, it's gonna nuke your development time.

Please don't read this as "Rust is difficult and bad", I definitely don't think it is! But it's a low level language, and working with it means dealing with complexity that for some tasks just might not be relevant.


I find figuring out type errors is usually less work than figuring out the runtime bugs they prevented.


I agree, but for something like the CRUD app example I made bringing in pydantic or something would solve that. Rust's type system is a lot stricter because it's solving problems in a space that doesn't touch a lot of Python developers.
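
For example, a minimal pydantic sketch (the model and fields are invented for illustration) - the types are declared once and validated at runtime when data arrives:

    from pydantic import BaseModel, ValidationError

    class User(BaseModel):
        id: int
        name: str
        email: str

    try:
        User(id="not-a-number", name="Ada", email="ada@example.com")
    except ValidationError as err:
        print(err)   # pydantic reports which field failed validation and why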


In all fairness Rust was never meant for writing database interfaces, more like storage engines.


>and they don’t care - or know about - concurrency, memory efficiency, L2 cache misses due to pointer chasing.

Also if I (a programmer) want to write really really fast code I'm probably reaching for tools like tensorflow, numpy, or jax. So there's not much incentive for me to switch to a more efficient language when as near as I can tell the best tooling for dealing with SIMD or weird gpu bullshit seems to be being created for python developers. If you want to write fast code do it in c/rust/whatever, if you want to write really fast code do it in python with <some-ML-library>.

For a very specific definition of the word "fast" at least.
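
i.e. the "really fast" path usually looks like one vectorized call that dispatches to optimized native code - a rough sketch:

    import numpy as np

    x = np.random.rand(10_000_000)

    # One call into optimized C/SIMD code...
    fast = np.dot(x, x)

    # ...versus the same sum of squares as a pure-Python loop,
    # which is orders of magnitude slower:
    # slow = sum(v * v for v in x)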


> Also if I (a programmer) want to write really really fast code I'm probably reaching for tools like tensorflow, numpy, or jax. So there's not much incentive for me to switch to a more efficient language when as near as I can tell the best tooling for dealing with SIMD or weird gpu bullshit seems to be being created for python developers. If you want to write fast code do it in c/rust/whatever, if you want to write really fast code do it in python with <some-ML-library>.

Rather unfortunately, my current bugbear is that Pytorch is... slow. On the CPU. One of the most common suggestions for people who want stable diffusion to be faster is, wait for it, "Try getting a recent Intel CPU, you'll see a real uplift in performance".

This despite the system only keeping a single CPU core busy. Of course, that's all you can do in Python most of the time.

(You can also use larger batch sizes. But that only partially papers over the issue, and also it uses more GPU memory.)


>"Also if I (a programmer) want to write really really fast code I'm probably reaching for tools like tensorflow, numpy, or jax."

Very limited view which ignores way too many areas.


Sure, examples?


Your OS, the linear algebra libraries themselves, much of the user-facing software that you use (latency sensitive rather than throughput sensitive), image/video encoding/decoding, most of the language runtimes that you use, high volume webservers, high volume data processing (where your data is not already some nice flat list of numbers you're operating on with tensor operations), for some examples.

Really, for almost any X, somebody somewhere has to do X with strict performance requirements (or at very large scale, so better perf == savings)

Most of these python libraries are only fast for relatively large and relatively standard operations in the first place. If you have a lot of small/weird computations, they come with a ton of overhead. I've personally had to write my own fast linear algebra libraries since our hot loop was a sort of modified tropical algebra once.


How is it in disagreement with parent?


They asked for examples of non-numpy/tf/jax use cases and I gave some, including my own experience. No disagreement - HPC Python in practice is heavily biased towards numpy and friends.


They're not, those are useful examples.


Your comment is super interesting because it suggests Python has evolved in a direction opposite to the Python Paradox - http://www.paulgraham.com/pypar.html

Whereas before you could get smarter programmers using Python, now because of the exponential growth of Python, the median Python programmer is likely someone with little or no software engineering or computer architecture background who is basically just gluing together a lot of libraries.


Neat observation. I wasn't doing much programming in 2004, but, I'm guessing 2004 Python would be like today's Rust. People learn it because they love it.


I think more so Rust than even Python in 2004, since Rust has a pretty steep learning curve and does require a non-trivial amount of dedication to learn.


Perhaps today it's not smart programmers per se, but smart people who are interested in learning to program.

The libraries are the killer feature for me.


> It’s frequently the second best language but it’s the second best language for anything.

This myth wasn't even true many years ago, it certainly isn't true today. You can build a mobile app, game, distributed systems, OS, GUI, Web frontend, "realtime" systems, etc in Python, but it is a weak choice for most of those things (and many others) let alone the second best option.


The saying does not mean that in a rigorous evaluation Python would be second best out of all programming ecosystems for all problems.

The saying means that for any given problem, there is a better choice, but second best is the language you know which has all of the tools to get the job done, so the answer is probably just a bunch of pip installs, imports, and glue code.

It’s kind of like “the best camera is the one you have with you” — it’s a play on the differing definitions of “best” to highlight the value of feasibility over technical perfection.


When I switched from PHP to Python years ago I had the same feeling as the OP, then it became the third best, then the fourth, then situational when object-orientation makes sense, then for just scripting, and now... unsure beyond a personal developer comfort/productivity preference. TUIs and GUIs built on Python on my machine seem to be the first things to have issues during system upgrades because of the package management situation.


> it’s the second best language for anything

Anything that doesn't require high performance that is. Is there any 3D game engine for python yet? I guess Godot has gdscript which is 90% python by syntax, but that doesn't quite count I think.


You won't get high performance out of Python directly, but there are a lot of Python libraries that use C or a powerful low level language underneath. The heavy lifting in so much of machine learning is CUDA, but most people involved in ML are writing Python.
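
A rough illustration of that split, assuming PyTorch is installed (and falling back to CPU if no GPU is around): a couple of lines of Python, with the actual matrix multiply running in CUDA kernels.

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(2048, 2048, device=device)
    b = torch.randn(2048, 2048, device=device)

    # The Python here is just orchestration; the heavy lifting happens in
    # cuBLAS kernels on the GPU (or optimized CPU code as the fallback).
    c = a @ b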


Sure, but that's not really Python per se. One could also call C++ libraries from Java via JNI and pretend Java is super fast.

If people write program logic in python it will run at python speeds. Otherwise you're not really writing python, like nobody says some linux native program is bash because it happens to be launched from a bash script.


> Sure, but that's not really Python per se. One could also call C++ libraries from Java via JNI and pretend Java is super fast.

But that's how every scripting language obtains good-not-just-decent performance. A strong culture of dropping down to C for any halfway-important library is why PHP's so hard to beat in real-world use, speed-wise (whatever its other shortcomings).


> Sure, but that's not really Python per se.

But that's exactly the strength of Python: it's an interface language. It's meant to make pretty sophisticated things like CUDA easy for everyone.


Java is super fast though; it almost never uses JNI because, unlike Python, it doesn't need it. It uses JNI for integrating with the C world (e.g. OpenGL bindings).


Python isn't a joke either. I'm a full-on programmer who started with C and branched out to several other languages, and I'd still pick Python for a lot of new tasks, even things that aren't little scripts. Or NodeJS, which has similar properties but has particular advantages for web backends.


I’ve been a Python developer for 15 years, and Python might have been the second best language for anything when I started my career, but there are so many better options for just about any domain except maybe data science. Basically for any domain that involves running code in a production environment (as opposed to iterating in a Jupyter notebook) in which you care about reliability or performance or developer velocity, Python is going to be a pretty big liability (maybe it will be manageable if you’re just building a CRUD app atop a database). Common pain points include performance (no, you can’t just multiprocess or numpy your way out of performance problems), packaging/deployment, and even setting up development environments that are reasonably representative of a production environment (this depends a lot on how you deploy to production—I’m sure lots of people have solved this for their production environment).


Yeah, big corp big data web scale systems use python too.


I'm a bit surprised to see this article on GitHub blog, it feels more like something from dev.to - looking at the surface, with little actual insights.

Most of the provided reasons behind Python's popularity are true also for other languages - portable, open source, productive, big community. This can be also said about PHP, Ruby, or Perl back in 2000s. Why isn't Perl as popular as Python?

I don't think it's all about readability or productivity, but about tools that were built over the last 30 years that have been used in academia and now with the boom in ML/AI/Data Science, they made Python an obvious choice to use for the new generation of tools and applications.

Imagine that the boom in ML/AI didn't happen - would Python be #1 language right now?


> Why isn't Perl as popular as Python?

I don't think there is a single reason, but it sure didn't help that the community self-destructed by trying to make an entirely new language after version 5 and still call it Perl. It took a lot of years to resolve that nonsense, and in the meantime many people moved on.

It also does not help that Perl is a creative language, useful but very much open to many different interpretations. Hiring a Perl guy and expecting them to read someone else's code is a crapshoot. The upside of Python's strong cultural opinions on coding style is that it's easier for one developer to pick up someone else's code.

> Imagine that the boom in ML/AI didn't happen - would Python be #1 language right now?

Probably not. But it wouldn't be Perl, either. Javascript most likely. But the core usage of Python for scripting was never predicated on ML popularity, so it would still be a pretty commonly used language. And Javascript has many annoying warts too, so I think plenty of people would still choose to write Django apps instead of Node, whether ML existed or not.


The line noise complaint isn't without merit either.

One of the few programming language jokes I've liked enough to repeat is "Python is Perl for people who can't bring themselves to write Perl code".


As commented somewhere else in this thread, Python was clearly more ergonomic than Perl, and had a lot of mindshare exactly for this reason. I remember when Python was new and the not-that-professional choice; Perl occupied that niche at the time. Even now I don't see a contender for a language where speed doesn't matter. Ruby has some Perlisms that really make it weird, PHP is tied to the web and equally weird, and those $s and @s are really bad for normal people. Python wins clearly when teaching somebody programming.


I’d say that Ruby and even Perl are a lot nicer for scripting than Python (due to the extremely low-effort unix interop). Python can do it, but it’s a whole lot more verbose and difficult for a beginner to learn than “anything inside a pair of backticks is run as a system command and you can interpolate variables”.
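
For comparison, a rough sketch of what Ruby's one-line backtick call turns into in Python (the command and path are invented for illustration):

    import subprocess

    path = "/tmp"
    # Ruby: output = `ls -l #{path}`
    result = subprocess.run(["ls", "-l", path], capture_output=True, text=True)
    print(result.stdout)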

Python was friendlier for beginners than Ruby the first time I took a real stab at learning to code during a CNY holiday in 2008, but it wasn’t about the language itself. Ruby was harder then because many of the popular libraries and many of the tutorials were written by people who considered Windows support as an afterthought. It’s hard to express how frustrating it was to have my vacation days ticking down, hitting issues in one tutorial after another and having people suggest I install linux on a VM (a process where I hit still more snags).

People learning Python and PHP didn’t hit that hurdle. I ended up learning Flash on my Asus laptop a couple of years later and getting my start that way and not coming back to Ruby until six years later when I was a much more experienced dev.


> Why isn't Perl as popular as Python?

Perl was significantly more popular at one point, but it slowly lost traction while Python gradually gained traction over the years.

A better ecosystem for numeric computing is definitely a big reason for the success of Python, but the question is why Python gained a foothold in that niche in the first place. I think it is because Python is just a lot more accessible to people with different backgrounds. Perl really grew out of shell scripting as a supercharged alternative to Bash and Awk, but retained many of their quirks for familiarity. Python on the other hand grew out of research into teaching programming to beginners.


This was already posted at https://news.ycombinator.com/item?id=35000415, I don't know why it didn't detect the duplicate. I'll repost my comment from there:

This is a strange article. It's got the talking point about Python that we were hearing about 10 years ago - "tired of those pesky curly brackets in Java, try this new language you might not have heard of: Python!". Who reading the GitHub blog has not heard of Python?

Also, that snippet used in the "What is Python commonly used for" section is strange:

  import antigravity
  
  def main():
      antigravity.fly()

  if __name__ == '__main__':
      main()

It's overly verbose (especially given the example just above about how you don't need a main function in Python) and refers to an insider joke/Easter egg. I can't see it convincing anyone to try Python - more likely it makes them feel they're on the outside of a joke.

It then ends with what seems like it might have been the point of the article, an advert for Copilot. It seems the way to get started writing Python is to write a short comment and then spam <TAB> and let the AI auto-complete your project.

(Also, and perhaps less importantly, looking at the author's GitHub profile I can't see a single instance of Python there. Though I'm not doing a deep-dive, as that feels overly picky and there are plenty of private contributions that could well be Python.)


I read the article and had the same feeling that it's a fluff piece without substance. If you have to compare anything, compare it with the vibrant JVM ecosystem. Using the same tired `System.out.println()` argument shows the author has no original idea. Python is great, but the JVM is no lackey; it is a marvelous piece of battle-tested engineering.

In the end it is just an ad for Github products and not worthy of being on HN frontpage.


Agree, I thought it was a pretty low-effort article until I got to the end and realized it was just an ad for CodeSpaces and CoPilot.


I find CoPilot to be super useful, but I would not use CodeSpaces due to safety concerns and limitations in team management.


>It's got the talking point about Python that we were hearing about 10 years ago - "tired of those pesky curly brackets in Java, try this new language you might not have heard of: Python!"

That was a talking point closer to 20 years ago, at this point.


Pretty sure I tried both Java and Python at the end of the 90s. So more 25 years ago.


You're right, I think I was caught out by the unending march of time :)


There's a joke I saw that goes like:

   1980-2000 = "30 years ago"
   2000-2010 = "10 years ago"
   2010-2023 = "Recently


> Who reading the GitHub blog has not heard of Python?

The GitHub blog became a strange place recently(-ish). It went from a factual blog describing fancy new GitHub features and interesting technical stuff (as it was a decade ago) to mostly a place full of incoherent marketing fluff like this post (with some real technical content interspersed).


It's doubly weird, since the antigravity module doesn't have a fly attribute, and it "does its thing" on import.

https://github.com/python/cpython/blob/main/Lib/antigravity....
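
In other words, the entire Easter egg is the import itself (assuming a desktop environment with a default browser):

    # Importing the module opens xkcd #353 in your browser as a side effect;
    # there is no fly() function to call.
    import antigravity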


Very suspect


Curly brackets or brackets at all aren't mentioned by name in the article - is that your interpretation of the first code example?

IMO, that example is there to show that there's less required boilerplate in Python compared to Java when doing the same thing. And, in particular, none of that boilerplate really matters to what you want to do - print hello world - emphasizing the point of Python being simple.


The section that explains why python is good for AI talks about pybrain, a library that seems to date from 10 years ago. I’m pretty well versed in most ml frameworks and never heard of it. Last update to the website looks to be 2010. Weird to feature that and PyTorch as examples of ML libraries. No mention of sklearn which is vastly more popular


But I didn't type

  import antigravity
I typed

  pydoc3 antigravity
(Cue clip of industrial disaster in background.)


The antigravity part is probably a reference to https://xkcd.com/353/


> Hello world is just `print "Hello, world!"`

SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Hello, world!")?


That's because that issue of the webcomic is like 15 years old. It was true of the version of Python which was current at the time:

https://stackoverflow.com/questions/6182964/why-is-parenthes...


Yes, I know. But some people still have the reflex from Python 2 and feel bitter when the error message says "I know what you want, and I'm not giving it to you."


It _is_ a reference to https://xkcd.com/353/, quite literally, as importing the module opens that link.


Eh, they're just a lot of ways to say "path dependence". Scripting languages are basically the same exact technology with respect to each other. In the alternate universe where numpy and scipy are, let's say, numruby and sciruby, wouldn't we be here asking why Ruby keeps growing?

That's not a sales pitch for python, it's a sales pitch for the concept of a scripting language; it's like saying "you should really buy a Ford, it comes with four wheels".


I admit I have a blind spot for Python, because I use PHP in my day job, so when I need to do some scripting, I mostly use PHP. But admittedly Python is a lot friendlier than some alternatives (Perl, Shell scripts etc.), and more universal than others (PHP being mostly used for web dev), so that's why people choosing a scripting language for their tool/library tend to choose Python.


I use Python all the time both for my personal stuff and for some side-projects at work, so this isn't a dunk on Python, but honestly it feels like a circular thing: it's popular because it's popular.

I wouldn't say it's friendlier than the alternatives, Perl and Shell scripts sure, but not when compared to Javascript, Ruby or Lua.

Now, if you're talking about libraries, support, etc. then sure, Python wins hands down, but that doesn't make it a better language in itself. I'd say Ruby and Lua are a little bit better as languages.

But then again, I don't care much for the language in itself, so Python is enough for most of my use cases.


I read "it's popular because it's friendlier". PHP, Javascript or R are popular, but are not friendlier. I find their error messages way worse for the beginner, when you need it more. Third party code is too "clever" for the beginner to read and learn, because it seems to be two languages: the one you are learning in the tutorials, and the other idiom that is used in the serious libraries. As a beginner you are hit with this feeling that you are far, far away from writting an useful thing.

In my job I've seen some beginners starting with R and quickly hating it, because they don't feel they can do much on their own beyond copy-pasting and then modifying the examples and the tutorials. And if the changes go too far, everything collapses with cryptic errors. When you show them Python as an alternative, pointing out that they shouldn't use it over R for statistics and graphics, they like that they can build ideas from scratch. That beginner is hooked for life.


I think Lua was always seen as a bit obscure, and not enough people invested in the language to write useful utilities. It has a solid C foreign function interface, and the compiler is quite fast, which leaves me puzzled about why it never gained traction. I think it's an embedded scripting language in the majority of use cases (e.g. NeoVim, LuaLaTeX, scripting in some game engines).

The story of Ruby is altogether different: they made the fatal mistake of not defining a C foreign function interface in the standard, otherwise I imagine we'd be seeing numerical computation and ML libraries with a Ruby interface today. Still, Ruby lives on in Metasploit, and in Sorbet and Crystal.


> I think Lua was always seen as a bit obscure, and not enough people invested in the language to write useful utilities. It has a solid C foreign function interface, and the compiler is quite fast, which leaves me puzzled about why it never gained traction. I think it's an embedded scripting language in the majority of use cases (e.g. NeoVim, LuaLaTeX, scripting in some game engines).

- Lua's standard library is so weak that it makes most other batteries-not-included languages look like they have large, robust, and helpful standard libraries.

- It's got a bit of JavaScript's quirkiness and gotcha-ability, but without being impossible to avoid thanks to owning a mega-popular platform - which is what propelled JavaScript to ubiquity despite being kinda shit and unpleasant to work with.

- Tooling's not as good as many other languages.

(FWIW sometimes I write Lua regardless, because it's the right tool for the job)


> The story of Ruby is altogether different: they made the fatal mistake of not defining a C foreign function interface in the standard, otherwise I imagine we'd be seeing numerical computation and ML libraries with a Ruby interface today.

This to me is extremely plausible, and sad.


Luck with libraries and the initial userbase is certainly involved in success, but not all scripting languages are equal. I mean, we could add Bash to that list then.

In fact I'd argue that python enjoying the success it has, despite probably the worst handling of a version bump in any language (2->3), is a testament to its popularity.


The worst version bump ever seen was Perl 6 which literally became another language (Raku).


Why did Rails get outcompeted by newer (and also some older) alternatives in the long run, when numpy and scipy did not?

Ruby would have a much bigger market share these days if library path dependence was that powerful.


> Why did Rails get outcompeted by newer (and also some older) alternatives in the long run, when numpy and scipy did not?

Chiefly, because big money corp invested massively elsewhere.

Also by now it’s easier to find developers who are cheaper and already (only can) use javascript/python?

I think that actual technical merits are dwarfed compared to other forces at stake here.


I think that experiment is taking shape with Elixir:

https://github.com/elixir-nx/nx

I don't see how it could ever overtake Python, but it could establish itself as a viable niche alternative.


I think one reason for Python growing popularity is because it's become the default tool in some domains whether it is the best tool or not.

This week our Director ordered a total rewrite of two years of work in Python. His rationale: it's what everyone else uses in this space. No reason specific to our use case, just simply to follow the herd. I realise that a large community translates into easy hiring and rich ecosystems, but I despise the mentality as it promotes a monoculture.


... and Python _became_ the default tool because the de facto developer consensus (after years of competing languages) is that an interpreted language should

    = be usable, and 

    = provide a set of data structures that an educated programmer *expects to find when scripting*.
Python literally sucked less than the alternatives.


Python gets introduced to students, so for many people it's the first language they learn. Half the programming community have less than 5 years of experience. I question their ability to evaluate suckage, lol.


That's a terrible idea for so many reasons


What are you rewriting from? At director level, the focus is usually more on things like how easy it is to staff / get support for something. Python is strong here - you can find programmers globally who can do pretty well with it.

Can you say the same about your solution?


If Python was a market, this would clearly be the top.

There was a time when all you had to learn was Java or C++ or (language or tool here). Somehow, this time when we standardize, it will be different.


It's really funny that one of the subheadings under "Why is Python so popular?" is "It has high corporate demand."


That's a perfectly relevant thing, no?


Yes. The entire reason I like using Java at my day job is that the rest of the company uses it the most and supports it well. I would never use Java on my own, but that's a different situation.


It reads as a little bit of a tautology. "People are using it because people use it". I get that from a hireability standpoint it's a real thing to consider, but the statement doesn't say anything about whether or not Python is actually a good language to use


Another counterpoint to this. Rust is a popular language (debatable claim, of course) but it's not high in corporate demand.


> This week our Director ordered a total rewrite of two years of work in Python

WTF? Unless your system is originally written in a proprietary language that literally no one outside your company knows, I'll say it's a good sign that you need to change team (or change job). Don't work under a director like that.


Agreed. Coincidentally, I resigned the day before this was announced.


Sorry if it sounds too cynical, but that's probably the intended effect of a rewrite from what the team knows into Python: staffing changes. A bunch of the (expensive) old guard will leave and they can be replaced with cheap grads, who all know Python.

Better luck with the next one!

