Why Python keeps growing, explained (github.blog)
265 points by usrme on March 3, 2023 | 439 comments



One thing I’d add to this conversation, though I’m certain it’s already been stated: As many have mentioned, there is a large subset of the user base that uses Python for applied purposes in unrelated fields that couldn’t care less about more granular aspects of optimization. I work as a research assistant for international finance faculty and I would say that compared to the average Hackernews reader, I’m technologically illiterate, but compared to the average 60-80 y/o econ/finance faculty member, I’m practically a Turing award winner.

Most of these applied fields use Python and R as no more than data-gathering tools and fancy calculators, something for which the overhead of other languages is just not justified.

The absolute beauty of Python for what I do is that I can write code and hand it off to a first year with a semester of coding experience. Even if they couldn't write it themselves, they can still understand what it does after a bit of study. Additionally, I can hand it off to 75 year old professors who still send fax memos to the Federal Reserve and they'll achieve a degree of comprehension.

For these reasons, Python, although not perfect, has been so incredibly useful.


I just want to add to this: I had this exact same experience when working with journalists and other programmers from non-technical backgrounds.

You'll find everyone from philosophy PhDs to biologists to journalists who use pandas because it's so easy to learn and work with. It's amazing how you can become productive in Python/pandas without any experience or even a basic understanding of programming, because of how accessible Jupyter, Colab and the blogs/docs on pandas are.

The other thing people don’t talk about is that a lot of these organizations can hire a CS student part time or a full time software engineer/data engineer/data scientist who can optimize their scripts once they are written. Pretty much any software engineer can read and debug python code without needing to learn python. So for example, I know some engineers working in genomics who have turned biologist-written scripts that take several days to run in python into scripts that take hours or minutes to run by doing basic optimizations like removing quadratic algorithms from the script or applying pyspark or dask to add parallelism.
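
To make that concrete, here's a toy sketch of the kind of "remove the quadratic algorithm" fix being described (all the names here are made up):

    # Quadratic version: for every read, scan the whole list of known genes.
    def annotate_slow(reads, known_genes):
        return [r for r in reads if r in known_genes]   # O(len(reads) * len(known_genes))

    # Near-linear version: a one-time set conversion makes each membership test O(1).
    def annotate_fast(reads, known_genes):
        known = set(known_genes)
        return [r for r in reads if r in known]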

The fact that python can be used as a bridge between technical and non-technical people is amazing and I think it has provided a better bridge between these groups than SQL was ever able to provide.


I couldn’t agree more. And I must say, now that it’s being used as a bridge between technical and nontechnical talent it’s becoming ever more vital from a career perspective. Most people recognize the value of fundamental coding skills and if you’re even just above average at coding in a non-CS field, you seem magnitudes more valuable than you really are. In both industry and research, ears immediately perk up when they realize I have a background in economics but competencies in coding beyond the standard regressions in R that everyone does in econometrics. It’s hilarious because as mentioned prior, I’m rather pathetic compared to most people on this forum.


Yeah, Python is widely used where I work for just that. The "hierarchy" of tools looks somewhat like this, from most to least technically competent users:

1) Languages like Python / R / Julia / etc. + SQL

2) PowerBI, Tableau, or similar tools

3) Excel

The number of users of those tools will be the inverse, with Excel being number 1.

If you're competent using the "stack" above, you could probably work as an analyst anywhere - given that you can pick up domain knowledge.


I hate to admit that I very often start the python repl to just do some simple calculations. I always have multiple terminals open so instead of opening a calculator I just use python in one of the terminals.


Agreed. Python's REPL has basically totally replaced my usage of Emacs calc as a desk calculator, mainly because it is always there and if I don't know the big-brain closed-form solution for something like compound interest, I can just write a loop and figure it out that way.
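
For instance, instead of recalling the closed-form compound-interest formula, a quick throwaway loop does the job (numbers made up):

    balance = 1000.0
    for year in range(10):
        balance *= 1.05          # 5% interest, compounded yearly
    print(balance)               # ~1628.89

    print(1000 * 1.05 ** 10)     # the closed form agrees, up to float rounding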


So what you are saying is that Python is Excel for programmers :D


This is a really good line, the VAST VAST majority of programming in the world is done in Excel by people who would be horrified if you told them they were programming.

And I wouldn't be surprised if a large number of python programmers would say they're not programming, it's just scripting.


I also use a python repl as an alternative to Excel or SQL. I find myself just downloading the data as a CSV and then quickly cooking up some pandas to get a graph or aggregate some stats; it's just so much quicker and easier imo.
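
Something like this, as a rough sketch - the file and column names here are invented:

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("sales.csv", parse_dates=["date"])

    # quick aggregate stats per group
    print(df.groupby("region")["amount"].agg(["count", "sum", "mean"]))

    # quick graph: monthly totals
    df.set_index("date")["amount"].resample("M").sum().plot()
    plt.show()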


I’ve migrated to the tidyverse for most of my EDA and plotting - I’ve found dplyr and ggplot to be noticeably more expressive. Pandas always added a ton of friction for me.

It's still my choice for quick and non-graphical analysis when I'm on a remote machine.


An alternative to pandas/Python for similar uses is https://www.visidata.org/. You can use Python in it also.


A bit off topic, but what would you use for data "mangling"? Like joining CSVs on complex conditions, cleaning tables, etc. Pandas seems to be the wrong tool for this, but I still often find myself using it since, in contrast to something like Excel, my steps are at least clearly documented for future use or verification.


If you had asked this question 6 or 8 years ago, the answer would have been "it depends on the volume of data" (tens of GB, hundreds of GB, etc.), and I could have given you a single tool that would help you in most cases.

Today honestly most tools are pretty capable, pandas is a great choice and if you have really high volumes of data you might try koalas (spark) or polars.

Honestly the biggest design considerations for data science today are things external to your project: what do you and others on your team know, what tools does your company already have set up, what volume of data are you processing, what are your SLAs, who or what else needs to run this script/workflow, what software do you need to integrate with, how often does it need to be processed, how are you going to assure the quality of your data, and what tools are you using for reporting?

I tend to use pandas and SQLite for most use cases cause I can cook up a script in 2 hours and be done. I just code it interactively in a notebook, and most people are able to work on a pandas or SQLite script productively if it needs to be maintained, even if they don't know Python. If it's a large volume of data, or a rapid schedule (minutes, seconds), or tight SLAs on quality or processing time, then I start to consider whether pyspark, Apache Beam, dask or BigQuery might be a good fit.
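
As a rough sketch of that pandas-plus-SQLite pattern (the file, table and column names are invented):

    import sqlite3
    import pandas as pd

    conn = sqlite3.connect("warehouse.db")

    # load a CSV into SQLite once; whoever maintains this can then work in SQL or pandas
    orders = pd.read_csv("orders.csv", parse_dates=["created_at"])
    orders.to_sql("orders", conn, if_exists="replace", index=False)

    daily = pd.read_sql(
        "SELECT date(created_at) AS day, SUM(total) AS revenue "
        "FROM orders GROUP BY day ORDER BY day",
        conn,
    )
    print(daily.tail())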

So it really just depends but for most people who are processing < 100 GB on a 1+ day schedule or ad hoc I would recommend just using pandas or tidyverse in R and getting really good at writing those scripts fast. Today you’ll get the most mileage out of those two tools.


I still use perl for some of that stuff, or even awk, but those are barely reusable or readable.


This is a letter to the general community: please stop writing these scripts in perl and bash one liners. That one off script you thought would only be used once or twice at this nonprofit has been in continuous use for 12 years and every year a biologist or journalist runs your script having no idea how it actually works. Eventually the script breaks after 8 years and some poor college student interns there and has to figure out how perl works, what your spaghetti is doing and eventually is tasked with rewriting it in python as an intern project (true story).


I think your complaint isn't really about perl and bash. It's about knowing your audience.

When writing code that will be used by a particular sort of user base, the code should be written in whatever way best suits that user base. If your users are academics, researchers, journalists, etc. -- yes, avoid anything with complex or obscure semantics like perl or bash.

But if your code is going to be used by programmers or people who are already comfortable with perl/bash/whatever, those tools may be just the ticket.


one line spaghetti ... I remain unsympathetic.


He has a valid point, though. I've seen (and written!) one-liners that were so complex that nobody, not even devs, could deal with them without decoding them first.

They aren't technically "spaghetti", but they are technically impenetrable.

I argue that one-liners like that aren't good for anybody, dev or otherwise.


Do you reply on any GitHub repo or gist w/ code snippets?


> I very often start the python repl to just do some simple calculations.

If you use the python repl a lot and haven't heard of it, ptpython is worth checking out as a repl replacement. I find it to be much more ergonomic.


yup, from decimal import Decimal, and get better accuracy than any default calculator
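
For example:

    from decimal import Decimal

    print(0.1 + 0.2)                        # 0.30000000000000004
    print(Decimal("0.1") + Decimal("0.2"))  # 0.3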


You may like xonsh

https://xon.sh/

No need to fire up a python repl.


I don't see why that's something to be ashamed of. I frequently pop open a Ruby on Rails console for this purpose. (Basically ruby's repl + libraries and language extensions.)


Eh, I type basic operations into Spotlight or Google, whichever is lying on my screen!


I have python on my phone and use it to calculate tips sometimes.


Have you tried ipython? Python repl on steriods!


from time to time yes. Ideally I would also have a jupyter notebook running at all times, but in the end it mostly comes down to vanilla python because that's installed on everything I am using


I do too if I already have a repl open, but otherwise I mostly use bc so I don’t have to wait for the slight lag of the repl to start


What’s to hate about that? It’s a perfectly good use of Python and I do it all the time.


I've seen this too. Python has supplanted what used to be done in a spreadsheet entirely, even the custom VBA macro stuff that was once a high-level spreadsheet. Python plus viz is a more enjoyable experience than trying to wrangle some general-purpose spreadsheet into doing this stuff. And it's relatively portable and transferable, which are major advantages of spreadsheets.


I'm one of Python's biggest critics (to me it's a Monkey's Paw of software development), but I think this is exactly the appropriate situation to use it. It's great for one-off fancy calculations and system scripts, ideally with no dependencies and/or a short lifetime.


> to me it's a Monkey's Paw of software development

This piqued my curiosity. I've worked with Python on and off for the last ~20 years, and while I'm not a fanboy or apologist, and use other tools when appropriate, there's also a reason it remains in my toolbox and sees regular use while many other tools have come/gone/been replaced by something better.

Can you share an example scenario where it's a Monkey's Paw? My suspicion is that this is more of an org issue than a tech issue?


Dependency management/tooling. Python (philosophically) treats the whole system as a dependency by default, in contrast with other modern languages that operate at the project/workspace level. This means it's very hard to isolate separate projects under the same system, or to reproducibly get a project running on a different system (even the same OS, because the system-wide state of one machine vs the next matters so much).

People work around these issues with various kludges like virtual environments, Docker (just ship the whole system!), and half a dozen different package managers, each with their own manifest format. But this is a problem that simply doesn't exist in Go, JavaScript, Rust, and others.

For code that never needs anything except the standard library, or for a script that never needs to be maintained or run on a different machine, Python is fine. Maybe even nice. But I've watched my coworkers waste so many hundreds of developer-hours just trying to wrangle their Python services into running locally, managing virtual environments, keeping them from trampling on each other's global dependencies, following setup docs that don't work consistently, and fixing deployments that fail every other week because the house is built on sand.


No.

Virtualenvs and requirements files have been a thing in Python for ages.

I’ve used tons of languages and while not the best, Python dependency management and project isolation is decent. IMO certainly better than JavaScript.


It's decent if you've been in the loop enough to use it. It's not built-in. It's a good practice, for sure, but it not being built-in at the language level makes it insanely easy for a newcomer to just... Not use virtualenvs at all.

In contrast to Javascript/Node.js/NPM/Yarn/whatever-you-want-to-call-server-js, which maintains a local folder with dependencies for your project, instead of installing everything globally by default.

Heck, a virtual env is literally a bundled python version with the path variables overridden so that the global folder is actually a project folder, basically tricking Python into doing things The Correct Way.


Virtualenvs have been part of the standard library since v3.3 [0] and most READMEs do reference them btw (minimal sketch below).

[0]: https://docs.python.org/3/library/venv.html
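
For reference, the usual route is `python -m venv .venv` on the command line, but the same thing is exposed programmatically (the directory name here is arbitrary):

    import venv

    # equivalent to running:  python -m venv .venv
    venv.create(".venv", with_pip=True)   # creates an isolated environment in ./.venv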


It's been said, quite correctly, that Python is the second best language for everything.

I feel that it has recently - like many really mature platforms - become very much like the elephant from that old apocryphal story [0]. It is being used for many different purposes, with very different requirements and needs, with users being so focused on their own use that anything outside that is considered "bloat" and "waste".

[0] https://en.wikipedia.org/wiki/Blind_men_and_an_elephant


when it comes to even slightly less simple use cases involving parallelism and concurrency, python and its imperative kin start falling quite short of basic needs that are easily satisfied by

fp languages like

ocaml

haskell

racket

common lisp

erlang

elixir

or rust/golang

but even if the code is single threaded and not hampered by GIL limitations, python tends to be super slow imho; also, debugging dynamic python and imperative stateful python after a certain code base size (>10k LOC) gets extremely painful


A lot of these problem spaces can get away with single-threaded performance because maybe they're generating a report or running an analysis once a day or at an even slower frequency. I work in a field where numerical correctness and readability are important for prototyping control algorithms (I work on advanced sensors), and python satisfies those properties for our analysis and prototyping work.

When we really want or need performance we rewrite the slow part in C++ and use pybind to call into it. For all the real implementations that run in a soft real time system, everything is done in C++ or C depending on the ecosystem.


> debugging dynamic python and imperative stateful python after a certain code base size >10k LOC gets extremely painful

for any meaningful scale you are better served by basic FP hygiene as evidenced in

haskell

elixir

CL/racket

or rust/golang


Just because you say it doesn't make it true. It's not that painful, or painful at all really. Good abstractions and planning make writing and maintaining a Python codebase easy, just like in any language.


I don’t get it. Go is as imperative as a language can be.


go is imperative but there are elegant functional styles borrowed from OTP/Erlang in ergo: https://github.com/ergo-services/ergo https://memo.barrucadu.co.uk/three-months-of-go.html


Common Lisp, paragon of FP:

  (loop with result = 0
        for x across numbers
        when (evenp x)
          do (setf result (+ result x))
        finally (return result))
I mean yeah, you can do FP in CL, but it allows you to program in whichever paradigm you prefer.


I agree. But most people just need a pickup truck, not a railway consist.


Python is ideal for the non-professional programmer who wants to put their skills and knowledge on wheels.


>As many have mentioned, there is a large subset of the user base that uses Python for applied purposes in unrelated fields that couldn’t care less about more granular aspects of optimization.

Nobody cares about this that much. Even a straight up software developer in python doesn't care. The interpreter is so slow that most optimization tricks are irrelevant to the overall bottleneck. Really optimizing python involves the FFI and using C or C++, which is a whole different ball game.

For the average python developer (not a data scientist) most frameworks have already done this for you.


Python keeps growing in number of users because it's easy to get started, has libraries to load basically any data, and to perform any task. It's frequently only the second best language, but it's the second best language for everything.

By the time a python programmer has «graduated» to learning a second language, exponential growth has created a bunch of new python programmers, most of whom don't consider themselves programmers.

There are more non-programmers in this world, and they don’t care - or know about - concurrency, memory efficiency, L2 cache misses due to pointer chasing. These people all use python. This seems to be a perspective missing from most hackernews discussions, where people work on high performance Big corp big data web scale systems.


I fully agree with the description.

What worries me, though, is that the features that make Python quite good at prototyping make it rather bad at auditing for safety and security. And we live in a world in which production code is prototyping code, which means that Python code that should have remained a quick experiment – and more often than not, written by people who are not that good at Python or don't care about code quality – ends up powering safety/security-critical infrastructures. Cue in the thousands of developer-hours debugging or attempting to scale code that is hostile to the task.

I would claim that the same applies to JavaScript/Node, btw.


I sometimes think about what Python would be like if it were written today, with the hindsight of the last thirty years.

Immutability would be the default, but mutability would be allowed, marked in some concise way so that it was easy to calculate things using imperative-style loops. Pervasive use of immutable instances would make it impossible for libraries to rely on mutating objects a la SQLAlchemy.

The language would be statically type-checked, with optional type annotations and magic support for duck typing (magic because I don't know how that would work.) The type system would prioritize helpful, legible feedback, and it would not support powerful type-level programming, to keep the ecosystem accessible to beginners.

It would still have a REPL, but not everything allowed in the REPL would be allowed when running code from a file.

There would be a strong module system that deterred libraries from relying on global state.

Support for at least one fairly accessible concurrency paradigm would be built in.

I suspect that the error system would be exception-based, so that beginners and busy people could write happy path code without being nagged to handle error values and without worrying that errors could be invisibly suppressed, but there might be another way.


I think free mutability and not really needing to know about types are two things that make the language easier for beginners.

If someone who's not familiar with programming runs into an error like "why can't I change the value of X" that might take them multiple hours to figure out, or they may never figure it out. Even if the error message is clear, total beginners often just don't know how to read them and use them.

They provide longer-term advantages once your program becomes larger, but the short-term advantages are more important for a scripting language imo.


The type system I want would just be a type system that tells you that your code will fail, and why. Pretty much the same errors you get at runtime. Hence the need for my hypothetical type system to handle duck typing.

I don't think mutability by default is necessary for beginners. They just need obvious ways of getting things done. There are two places beginners use mutability a lot. The first is gradual transformation of a value:

    line = "The best of times, the worst "
    line = line.trim()
    line = line[:line.find(' ')]
This is easily handled by using a different name for each value. The second is in loops:

    word_count = 0
    for line in lines():
        word_count += num_words(line)
I think in a lot of cases beginners will have no problem using a map or list comprehension idiom if they've seen examples:

    word_counts = [num_words(line) for line in lines]
    # or word_counts = map(num_words, lines)
    word_count = sum(word_counts)
But for cases where the immutable idiom is a bit trickier (like a complicated fold) they could use a mutable variable with the mutability marker I mentioned. Let's make the mutability marker @ since it tells you that the value can be different "at" different times, and let's require it everywhere the variable is used:

    word_count @= 0
    for line in lines():
        word_count @= word_count + num_words(line)
Voila. The important thing is not to mandate immutability, but to ensure that mutability is the exception, and immutability the norm. That ensures that library writers won't assume mutability and rely on it (cough SQLAlchemy cough), and the language will provide good ergonomic support for immutability.

It's a common claim that immutability only pays off in larger programs, but I think the mental tax of mutability starts pretty immediately for beginners. We're just used to it. Consider this example:

    dog_name = Name(first='Rusty', last='Licks')
    dog1.name = dog_name
    dog_name.last = 'Barksalot'
    dog2.name = dog_name
    print(dog1.name) # It's not Rusty Licks!
Beginners shouldn't have to constantly wrestle with the difference between value semantics and reference semantics! This is the simplest possible example, and it's already a mind-bender for beginners. In slightly more complicated guises, it even trips up professional programmers. I inherited a Jupyter notebook from a poor data scientist who printed out the same expression over and over again in different places in the notebook trying to pinpoint where and why the value changed. (Lesson learned: never try to use application code in a data science calculation... lol.) Reserving mutability for special cases protects beginners from wrestling with strange behavior from mistakes like these.


You should check out Julia (https://julialang.org/), that's very close to what you describe.


You beat me to it!

Julia is both dynamic and fast. It doesn’t solve all issues but uniquely solves the problem of needing 2 languages if you want flexibility and performance.


Exception-based error handling - and its extensive use in the standard library - is the fundamental design mistake that prevented Python from becoming a substantial programming language.

Coupled with the dynamic typing and mutability by default, it guarantees Python programs won't scale, relegating the language to the role of a scratchpad for rough drafts and one-off scripts, a toy beginner's language.


I have no idea why you say that it's a scratchpad or a toy language considering that far more production lines of code are getting written in Python nowadays than in practically any other language, with the possible exception of Java.


But that's the same with Excel: massive usage for throwaway projects with loose or non-existent requirements or performance bounds that end up in production. Python is widely used, but not for substantial programming in large projects - say, projects over 100 kloc. Python hit the "quick and dirty" sweet spot of programming.


This is absolutely not true. I've made my living working with Python and there's an astounding amount of large Python codebases. Instagram and YouTube alone have millions of lines of code. Hedge funds and fintechs base their entire data processing workflows around Python batch jobs. Django is about as popular as Rails and powers millions of websites and backends.

None of those applications are toys. I have no idea where your misperception is coming from.


I guess I'm more than a little prejudiced from trying to maintain all sorts of CI tools, web applications and other largeish programs somebody initially hacked together in Python in an afternoon and which grew to become "vital infrastructure". The lack of typing bites you hard, and the optional typing that has been shoehorned into the language is irrelevant in practice.

All sorts of problems would simply have not existed if the proper language was used from the beginning, as opposed to the one where anyone can hack most easily.


Statically typed with duck typing is called structural typing. (As opposed to nominal typing, with inheritance hierarchies).

It's already what you get with python and mypy, using typing.Protocol or unions.
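
A minimal sketch of what that looks like with typing.Protocol (the names here are made up):

    from typing import Protocol

    class Quacks(Protocol):
        def quack(self) -> str: ...

    class Duck:                      # note: no inheritance from Quacks
        def quack(self) -> str:
            return "quack"

    def make_noise(animal: Quacks) -> None:
        print(animal.quack())

    make_noise(Duck())               # mypy accepts this: Duck matches the protocol structurally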


This is pretty much what nim is btw. Very fun language in my experience.


Nim is fun, but it needs to be more popular.


Aren't you kinda describing OCaml?


We still live in a world where many outward facing networked applications are written in C. Dynamic languages with safe strings are far from the floor for securable tools.


That is true.

However, I hope that these C applications are written by people who are really good at C. I know that some of these Python applications are written by people who discovered the language as they deployed into production.


That’s a measure of programming prowess, not the actual security concern at hand.

If the masterful C developer still insists on using a language that has so many footguns, and a weird culture of developers pretending that they're more capable than they are, then their C mastery may well not be worth much against someone throwing something together in Python, which will at the very least immediately bypass the vast majority of vulnerabilities found in C code. Plus, my experience with such software is that the sort of higher-level vulnerabilities you'd still see in Python code aren't ones that the C developer has necessarily dealt with.


That's entirely possible.

How could we check?


Python code can be production code. There are many people and companies shipping Python production code and generating substantial value.


You are correct, there are absolutely huge companies shipping Python production code and generating substantial value.

Do we agree that this is somewhat orthogonal to what I'm writing, though?


A popular opinion in game development is that you should write a prototype first to figure out what works and is fun, and once you reach a good solution, throw away that prototype code and write a proper solution with the insights gained. The challenge is that many projects just extend the prototype code to make the final product, and end up with a mess.

Regular software development is a lot like that as well. But you can kind of get around that by having Python as the "prototyping language", and anything that's proven to be useful gets converted to a language that's more useful for production.


Hey, it is better than programs written as Excel functions.


It is.

But I fear that the same folks that decried the use of excel by "the masses" are now just as horrified by the widespread usage of Python! :-)


> the features that make Python quite good at prototyping make it rather bad at auditing for safety and security

What's an example that makes it bad? Is it a case of the wrong tool for the job?

For example, I understand that garbage-collected languages shouldn't be used for real-time systems like flight controllers.


What audits need most is some ability to analyze the system discretely and really "take it apart" into pieces that they can apply metrics of success or failure to (e.g. pass/fail for a coding style, number of branches and loops, when memory is allocated and released).

Python is designed to be highly dynamic and to allow more code paths to be taken at runtime, through interpreting and reacting to the live data - "late binding" in the lingo, as opposed to the "early binding" of a Rust or Haskell, where you specify as much as you can up front and have the compiler test that specification at build time. Late binding creates an explosion of potential complexity and catastrophic failures because it tends to kick the can down the road - the program fails in one place, but the bug shows up somewhere else because the interpreter is very permissive and assumes what you meant was whatever allows the program to continue running, even if it leads to a crash or bad output later.
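
A tiny made-up example of that effect - the bug lives in one function, but nothing complains until a different piece of code blows up at runtime:

    def parse_amount(raw):
        if raw.startswith("$"):
            return float(raw[1:])
        return raw                   # bug: forgot to convert, silently returns a str

    def total(rows):
        return sum(parse_amount(r) for r in rows)

    total(["$10.00", "$2.50"])       # fine
    total(["$10.00", "2.50"])        # TypeError raised here, far from the actual mistake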

Late binding is very useful - we need to assume some of it to have a live, interactive system instead of a punchcard batch process. And writing text and drawing pictures is "late binding" in the sense of the information being parsed by your eyes rather than a machine. But late binding also creates a large surface area where "anything can happen" and you don't know if you're staying in your specification or not.


Interesting. What kinds of software get this level of audit scrutiny?


There are many examples, but let's speak for instance of the fact that Python has privacy by convention and not by semantics.

This is very useful when you're writing unit tests or when you want to monkey-patch a behavior and don't have time for the refactoring that this would deserve.

On the other hand, this means that a module or class, no matter how well tested and documented and annotated with types, could be entirely broken because another piece of code is monkey-patching that class, possibly from another library.
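
A contrived example of what that failure mode looks like - nothing in the language stops other code, even a transitively imported library, from rewiring a "private" method:

    class Gateway:
        def _validate(self, amount):
            if amount <= 0:
                raise ValueError("amount must be positive")

        def charge(self, amount):
            self._validate(amount)
            print(f"charging {amount}")

    # ...somewhere in another module that happened to get imported:
    Gateway._validate = lambda self, amount: None   # silently removes the check

    Gateway().charge(-100)   # prints "charging -100"; the well-tested invariant is gone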

Is it the case? Probably not. But how can you be sure?

Another (related) example: PyTorch. Extremely useful library, as we have all witnessed for a few years. But that model you just downloaded (dynamically?) from Hugging Face (or anywhere else) can actually run arbitrary code, possibly monkey-patching your classes (see above).

Is it the case? Probably not. But how can you be sure?

Cue in supply chain attacks.

That's what I mean by auditing for safety and security. With Python, you can get quite quickly to the result you're aiming for, or something close. But it's really, really, really hard to be sure that your code is actually safe and secure.

And while I believe that Python is an excellent tool for many tasks, I am also something of an expert in safety, with some experience in security, and I consider that Python is a risky foundation to develop any safety- or security-critical application or service.


Thanks for this, super insightful perspective


There's also the argument that at a certain scale the time of a developer is simply more expensive than time on a server.

If I write something in C++ that does a task in 1 second and it takes me 2 days to write, and I write the same thing in Python that takes 2 seconds but I can write it in 1 day, the 1 day of extra dev time might just pay for throwing a more high performance server against it and calling it a day. And then I don't even take the fact that a lot of applications are mostly waiting for database queries into consideration, nor maintainability of the code and the fact that high performance servers get cheaper over time.

If you work at some big corp where this would mean thousands of high performance servers that's simply not worth it, but in small/medium sized companies it usually is.
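
Back-of-envelope version of that trade-off, with completely made-up numbers:

    dev_day_cost = 600        # hypothetical fully loaded cost of one developer-day
    extra_dev_days = 1        # the C++ version takes one extra day to write
    extra_server_cost = 80    # hypothetical extra monthly cost of a beefier server for the Python version

    months_to_break_even = (extra_dev_days * dev_day_cost) / extra_server_cost
    print(f"The faster version only pays off after ~{months_to_break_even:.1f} months")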


Realistically something that takes 1 second in C++ will take 10 seconds (if you write efficient python and lean heavily on fast libraries) to 10 minutes in python. But the rest of your point stands


I spend most of my time waiting on IO, something like C++ isn't going to improve my performance much. If C++ takes 1ms to transform data and my Python code takes 10ms, it's not much of a win for me when I'm waiting 100ms for IO.

With Python I can write and test on a Mac or Windows and easily deploy on Linux. I can iterate quickly and if I really need "performance" I can throw bigger or more VPSes at the problem with little extra cognitive load.

I do not have anywhere near the same flexibility and low cognitive load with C++. The better performance is nice but for almost everything I do day to day completely unnecessary and not worth the effort. My case isn't all cases, C++ (or whatever compiled language you pick) will be a win for some people but not for me.


And how much code is generally written that actually is compute heavy? All the code I've ever written in my job is putting and retrieving data in databases and doing some basic calculations or decisions based on it.


Rule of thumb:

Code is "compute heavy" (could equally be memory heavy or IOPs heavy) if it's deployed into many servers or "the cloud" and many instances of it are running serving a lot of requests to a lot of users.

Then the finance people start to notice how much you are paying for those servers and suddenly serving the same number of users with less hardware becomes very significant for the company's bottom line.

The other big one is reducing notable latency for users of your software.


That is absolutely true.

But sometimes, you do end up writing that compute heavy piece of code. At that stage, you have to learn how to write your own native library :)

Speaking of which, I've written some Python modules in Rust using PyO3; it's a very agreeable experience.


Damn! Is the rule of thumb really a 10x performance hit between Python/C++? I don’t doubt you’re correct, I’m just thinking of all the unnecessary cycles I put my poor CPU through.


Outside cases where Python is used as a thin wrapper around some C library (simple networking code, numpy, etc) 10x is frankly quite conservative. Depending on the problem space and how aggressively you optimize, it's easily multiple orders of magnitude.


Those cases are about 95% of scientific programming.

This is the first line in most scientific code:

    import numpy


FFI into lean C isn't some perf panacea either; beyond the overhead, you're also depriving yourself of interprocedural optimization and other Good Things from the native space.


Of course it depends on what you are doing, but 10x is a pretty good case. I recently re-wrote a C++ tool in python and even though all the data parsing and computing was done by python libraries that wrap high performance C libraries, the program was still 6 or 7 times slower than C++. Had I written the python version in pure python (no numpy, no third party C libraries) it would no doubt have been 1000x slower.


It depends on what you're doing. If you load some data, process it with some Numpy routines (where speed-critical parts are implemented in C) and save a result, you can probably be almost as fast as C++... however if you write your algorithm fully in Python, you might have much worse results than being 10x slower. See for example: https://shvbsle.in/computers-are-fast-but-you-dont-know-it-p... (here they have ~4x speedup from good Python to unoptimized C++, and ~1000x from heavy Python to optimized one...)
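
A minimal sketch of that gap - the same reduction as a pure-Python loop versus a single NumPy call (timings will vary by machine):

    import time
    import numpy as np

    data = np.random.rand(10_000_000)

    start = time.perf_counter()
    total = 0.0
    for x in data:               # each element gets boxed into a Python object
        total += x
    print("pure-Python loop:", time.perf_counter() - start, "s")

    start = time.perf_counter()
    total = data.sum()           # one call into NumPy's compiled implementation
    print("numpy sum:       ", time.perf_counter() - start, "s")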


It can be anywhere from 2-3x for IO-heavy code to 2000x for tight vectorizable loops. But 20x-80x is pretty typical.


Last time I checked (which was a few years ago), the performance gain of porting a non-trivial calculation-heavy piece of code from Python to OCaml was actually 25x. I believe that performance of Python has improved quite a lot since then (as has OCaml's), but I doubt it's sufficient to erase this difference.

And OCaml (which offers productivity comparable to Python) is noticeably slower than Rust or C++.


It really depends on what you're doing, but I don't think it is generally accurate.

What slows Python down is generally the "everything is an object" attitude of the interpreter. I.e. you call a function, the interpreter has to first create an object of the thing you're calling.

In C++, due to zero-cost abstractions, this usually just boils down to a CALL instruction preceded by a bunch of PUSH instructions in assembly, based on the number of parameters (and call convention). This is of course a lot faster than running through the abstractions of creating some Python object.


> What slows Python down is generally the "everything is an object" attitude of the interpreter

Nah, it's the interpreter itself. Since it doesn't do JIT compilation, there is a hard ceiling it cannot surpass even in theory (as opposed to things like pypy, or graal python).


I don't think this is true: Other Python runtimes and compilers (e.g. Nuitka) won't magically speed up your code to the level of C++.

Python is primarily slowed down because of the fact that each attribute and method access results in multiple CALL instructions since it's dictionaries and magic methods all the way down.


Which can be inlined/speculated away easily. It won’t be as fast as well-optimized C++ (mostly due to memory layout), but there is no reason why it couldn’t get arbitrarily close to that.


> Which can be inlined/speculated away easily.

How so? Python is dynamically typed after all and even type annotations are merely bolted on – they don't tell you anything about the "actual" type of an object, they merely restrict your view on that object (i.e. what operations you can do on the variable without causing a type error). For instance, if you add additional properties to an object of type A via monkey-patching, you can still pass it around as object of type A.


Say a function/part of code is performed a thousand times; the runtime collects statistics showing that object 'a' was always an integer, so it might be worthwhile to compile this code block to native code with a guard on whether 'a' really is an integer (that's very cheap). The speedup comes from not doing interpretation, but taking the common case and making it natively fast; in the slow branch, the complex case of "+ operator has been redefined", for example, can be handled simply by the interpreter. Python is not more dynamic than Javascript (hell, python is strongly typed even), which hovers around the impressive 2x native performance mark.

Also, if you are interested, “shapes” are the primitives of both Javascript and python jit compilers instead of regular types.


Other than this, dynamic typing is a big culprit. I can't find the article with the numbers anymore, but its performance overhead is enormous.


Well, at least 10x, sometimes more. Not really surprising when you consider that it's a VM reading and parsing your code as a string at runtime.


> it's a VM reading and parsing your code as a string at runtime.

Commonly it creates the .pyc files, so it doesn't really re-parse your code as a string every time. But it does check the file's dates to make sure that the .pyc file is up to date.

On debian (and I guess most distributions) the .pyc files get created when you install the package, because generally they go in /usr and that's only writeable by root.

It does include the full parser in the runtime, but I'd expect most code to not be re-parsed entirely at every start.

The import thing is really slow anyway. People writing command-line tools have to defer imports to avoid huge startup times from loading libraries that are perhaps needed just by some functions that might not even be used in that particular run.
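
A minimal sketch of that deferred-import pattern (the tool and subcommand here are invented):

    import argparse

    def report(path):
        import pandas as pd      # deferred: only paid for when this subcommand actually runs
        print(pd.read_csv(path).describe())

    def main():
        parser = argparse.ArgumentParser(prog="mytool")
        sub = parser.add_subparsers(dest="cmd", required=True)
        sub.add_parser("report").add_argument("path")
        args = parser.parse_args()
        if args.cmd == "report":
            report(args.path)

    if __name__ == "__main__":
        main()                   # "mytool --help" never imports pandas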


> re-parse your code as a string every time

That doesn’t really take any significant time though on modern processors.


Aren't those pyc files still technically just string bytecode, but encoded as hex?


Well bytecode isn't the same as the actual code you write in your editor.


As a long-time Python lover, yes that's a decent rule of thumb.


It is anywhere from 1x to 100x+.


If the 1 second is spent waiting for IO, it will take 1 second in whatever language.

But yes python is slow.

However I've seen good python code be faster than bad C code.


Well, to be fair the "good python code" is probably just executing something written in c lol. But lots of python is backed up by stuff written in c.


Not necessarily. Just using a better optimized sort or hash algorithm can make a big difference.

I was talking specifically of pure python code (except the python's standard library itself, where it really is unavoidable).


Of course algorithmic complexity will trump anything else at big enough n values.


Not for everything. There are plenty of Python operations that are not 10x slower than c.


That is true, but there are relatively few real-world applications that consist of only those operations. In the example I mentioned below, there were actually some parts of my Python rewrite that ended up faster than the original C++ code, but once everything was strung together into a complete application, those parts were swamped by the slow parts.


Most of the time these are arithmetic tight loops that require optimisations, and it's easy to extract those into separate compiled Cython modules without losing overall cohesion within the same Python ecosystem.


If Python was merely twice as slow then I could agree with you.


Not all code needs to process terabytes of data.

I have code running that reads ~20 bytes, checks the internal status in a hashmap and flips a bit.

Would it be faster in C? Of course.

Would it have taken me much longer to write to achieve absolutely no benefit? Yes.


Speeding up the time critical parts with Cython or Numba or ... is rather easy.
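
A small sketch of what that looks like with Numba (the function and data here are made up):

    import numpy as np
    from numba import njit

    @njit
    def pairwise_dist_sum(points):
        total = 0.0
        n = points.shape[0]
        for i in range(n):
            for j in range(i + 1, n):
                d = 0.0
                for k in range(points.shape[1]):
                    diff = points[i, k] - points[j, k]
                    d += diff * diff
                total += d ** 0.5
        return total

    points = np.random.rand(2000, 3)
    print(pairwise_dist_sum(points))   # first call compiles; later calls run as native code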


There are also programmers who are tired of chasing pointers and simply want to get stuff done.

E.g. people who once wrote "robust" code in Rust but were "outcompeted" left and right by coworkers who churn out shiny new things at 10x the speed.


At some point, every engineer has heard this same argument but in favor of all kinds of dubious things such as emailing zip files of source code, not having tests, not having a build system, not doing IaC, not using the type system, etc.

I'm sure Rust was the wrong tool for the job in your case but I find this type of get shit done argument unpersuasive in general. It overestimates the value of short-term delivery and underestimates how quickly an investment in doing things "the right way" pays off.


Totally depends on the business you're in.

If you're dealing in areas with short time limits then Python is great, because you can't sell a ticket for a ship that has sailed.

And I've seen "the right way" which, again, depending on the business may result in a well designed product that is not what's actually needed (because people are really bad at defining what they want)

What's brilliant with Python compared to other hacky solutions is that it does support tests, type hints, version control and other things. It just doesn't force you to work that way. But if you want to write stable, maintainable code, you can do it.

That means you can write your code without types and add them later. Or add tests later once your prototype has been accepted. Or, whenever something goes wrong in production, fix it and then write a test against that.

Oh and I totally agree you should certainly try to "do things the right way", if the business allows it.


It is hard to believe that Python is objectively that much more productive than other languages. I know Python moderately well (with much more real world experience in C#). I like Python very much but I don't think it is significantly more productive than C#.


Python is out of this world more productive in the Science space and Data space.

The only thing that can compete with it for productivity in the science space is R.


This. C#, Java or even newcomers such as Kotlin/Go aren't even in the same ballpark, due to the REPL/Jupyter alone. Let alone when you consider the ecosystem.


If you are in a lab (natural science lab) or anywhere close to data, I bet you it is much more productive, even more so when you have to factor in that the code might be exposed to non-technical individuals.


Using Python vs Rust is in no way in the same league as not having tests.


Totally agree. That's why I clarified:

> I'm sure Rust was the wrong tool for the job in your case but I find this type of get shit done argument unpersuasive in general.

Unless you're working on a fire-and-forget project with a tiny time horizon get shit done arguments are blatantly short-termist.


The thing is that in the short term it's much easier to predict what you're going to need and where the value is, and in the long term you might not even be working on this codebase anymore. Lots of incentives to get things done in the short term.


The business owner (whoever writes the checks) prefers "get shit done" over "the right way". Time to completion is a key factor in the payoff function of the dev's work.


The entire point of doing things the right way is that you end up delivering more value in the long term, and "long term" can be as soon as weeks or even days in some cases.

Business owners definitely prefer fewer bugs, fewer customer complaints, less support burden, fewer outages, fewer headaches. Corner-cutting doesn't make economic sense for most businesses, and good engineering leadership doesn't have much trouble communicating this up the chain. The only environment where I've seen corner-cutting make business sense is turd-polishing agencies whose business model involves dumping their mistakes on their clients and running away so the next guy can take the blame.


Try the travel/event booking business (where I'm in) - and no, people don't dump their mistakes on the next guy here - to the contrary, the "hacky" Python solutions are supported for years and teams stay for decades (although a decade ago we had not discovered how great Python was)

What business owners actually don't like at all is how long it takes traditional software development to actually solve problems - which then don't really fit the business after wasting a few years of resources... and the dumping and running away is worse in Java and other compiled software. With Python you can at least read the source in production if the team ran away...


> the dumping and running away is worse in Java and other compiled software. With Python you can at least read the source in production if the team ran away...

Java (and dotnet, the two big "VM" languages) is somewhat of a strange example for that; JVM bytecode is surprisingly stable and reverse engineering is reasonably easy unless the code was purposely obfuscated - a bad sign on any language anyways.


> It overestimates the value of short-term delivery

For an early stage start up this is almost the only relevant factor for success.


From the other half of that sentence:

> underestimates how quickly an investment in doing things "the right way" pays off.

What time horizon should a startup optimize delivery for? Minutes, hours, days, weeks? Say you're a startup dev in a maximalist "get shit done now" mindset so you're skipping types, tests, any forethought or planning so you can get the feature of the week done as fast as possible. This makes you faster for one week but slower the week after, and the week after, and the week after that.

Say a seed stage startup aims for 12 months runway to achieve some key outcomes. That's still a marathon. It still doesn't make sense to sprint the first 200 meters.


> coworkers who churn out shiny new things at 10x the speed

Sounds like a classic web-dev perspective. My customers hate when we ship broken tools because it ruins their work, new-feature velocity be damned. We love our borrow checker because initially you run at 0.5x velocity but post-25kSLOC you get to run at 2x velocity, which continues to mystify managers worldwide.


This is not just a web-dev perspective.

People use Python in financial applications, Data Engineering and AI/ML pipelines, infrastructure software etc and the 10x speed can be real.


Feature factories give web dev such a bad rep, it doesn't have to be this way..


I use mostly Python, and a bit of Rust.

With Python, testing, good hygiene and a bit of luck, you can write code that is maybe 99% reliable. It is very, very hard to get to (100-eps)% for eps < 0.1% or so. Rust seems better suited to that.

Anything else, especially if there isn't a huge premium on speed, meh - Python is almost always sufficient, and not in the way.


I use the same combo: lots of Python to analyse problems, test algos, process data, etc. Then, once I settle on a solution but still need more performance (outside GPU's), I go to rust.


Genuinely curious, why do you need Rust?


I'm simulating an audio speaker in real time. So I do the data crunching, model fitting, etc. in python and this gives me a good theoretical model of the speaker. But to be able to make a simulation in realtime, I need lots of speed so rust makes sense there (moreover, the code I have to plug that into is rust too, so one more reason :-)). (now tbh, my realtime needs are not super hard, so I can avoid a DSP and a real time OS :-) )

I don't need rust specifically. It's just that its memory and thread management really help me to continue what I do in python: focusing on my core business instead of technical stuff.

The less I code the better I feel :-)


My most successful career epiphany was realizing that everyone -- my customers, my boss, etc -- was happier if I shipped code when I thought it was 80% ready. That long tail from 80-100% generates a lot of frustration.


Can you clarify this comment?


You mean the bit about frustration?

It's just an application of the Pareto principle. That last 20% of work to make perfect software costs a lot of time. Customers (and by extension, management) do not care how pretty your code is, how perfect your test coverage is (unless your manager is a former developer, then they might have more of an opinion), they care most that you ship it. Bugs are a minor irritation compared to sitting around waiting for functionality they need, as long as you're responsive in fixing the bugs that do come up.


Thanks. I thought that is what you meant but another possible take was that the last 20% is actually important. Getting something 80% finished is fast and then the long tail to get it to 100% is frustrating for everyone because the work, in theory is finished. I think that can happen as well.

Of course there are at least three dimensions to discuss here: internal quality, external quality and product/feature fit. Lower quality internal code eventually leads to slower future development and higher turnover as no one wants to work with the crappy code base. Lower external quality (i.e. bugs) can lead to customers not liking your product. Interestingly the relationship between internal and external quality is not as direct as one might think. Getting features out the door more quickly (at the expense of other things) can help with product fit. Essentially, like most things, this is an ongoing optimization problem and different approaches are appropriate for different problem domains.


That is interesting. I went in the other direction :)

I am tired of having to refactor shiny new things churned out at 10x the speed and that keep breaking in production. These days, if given a choice, I prefer writing them in Rust code, spending more time writing and less time refactoring everything as soon as it breaks or needs to scale.


When the pointer chasing (sometimes) comes in handy, is once you have a successful business with a lot of data and/or users, and suddenly the cost of all those EC2 instances comes to the attention of the CFO.

That's when rewriting the hot path in Go or Rust or Java or C or C++, can pay off and make those skills very valuable to the company. Making contributions to databases, operating systems, queueing systems, interpreters, Kubernetes etc. also fall into that category.

But yeah if you are churning out a MVP for a new business, yeah starting with Python or Ruby or Javascript is a better bet.

(Erlang/Elixir is also an interesting point in the design space, as it's very high level and concise, but also scales better than anything else, although not especially efficient for code executing serially. And Julia offers the concision of Python with much higher performance for numerical computing.)


Or there are programmers who write both. Something that I want to write once, have run on several different platforms, handle multi-threading nicely, and never have to think about again? Rust. Writing something to read in some data to unblock an ML engineer or make plots for management? Definitely not Rust, probably python. Then you can also churn out things at 10x the speed, but by writing the tricky parts in something other than python, you don't get dragged back down by old projects rearing their ugly heads, so you outpace the python-only colleagues in the long-term.


Programming is secondary to my primary duties and only a means for me to get other things done. I'm in constant tension between using Python and Rust.

With Python I can get things up and going very quickly with little boilerplate, but I find that I'm often stumbling on edge cases that I have to debug after the fact and that these instances necessarily happen exactly when I'm focused on another task. I also find that packaging for other users is a major headache.

With Rust, the development time is much higher for me, but I appreciate being able to use the type-system to enforce business logic and therefore find that I rarely have to return to debug some issue once I have it going.

It's a tough trade-off for me, because I appreciate the velocity of Python, but Rust likely saves me more time overall.


If you're 'tired of chasing pointers', Rust's a lot closer to (and I'd argue better than) Python than say Go - it'll tell you where the issue is and usually how to fix it; Go will just blow up at run time. (Python (where applicable) will do something unexpected and wrong but potentially not error (..great!))

(Fwiw I use all three, Python professionally.)


> at 10x the speed

Is a coping canard invented by programmers who can't into more powerful programming languages.

By far the slowest language for developing is PHP, it's even worse than plain C in that regard.


I completely agree - but you say that like it's a bad thing. I work as a developer alongside data scientists, who might have strong knowledge of statistics or machine learning frameworks rather than traditional programming chops.

For the most part they don't need to know about concurrency, memory efficiency etc, because they're using a library where those issues have been abstracted away.

I think that's what makes python ideal - its interoperability with other languages and library ecosystem mean less technical people can produce good, efficient work without having to take on a whole bunch of the footguns that would come from working directly in a language like C++ or Rust.


But this is a false dichotomy. The space of options isn't C++/Rust or Python. There are languages which attempt to give the best of both worlds, e.g. Julia.

> they're using a library where those issues have been abstracted away.

I work in Python, and while libraries like numpy have certainly abstracted away some of those issues, there's still so much performance left on the table because Python is still Python.


I'd say if you do data-intensive computation with Numpy you are not leaving much on the table due to Python.


Having gone through the exercise, I know this is false.

Not everything can be pushed into numpy, and you can still be left with lots of loops in python.


That's what numba [0] is for (can also help with the NumPy stuff in certain cases.)

[0] https://numba.pydata.org/


Oh, I'm familiar with numba and while it certainly helps, it has plenty of its own issues. You don't always get a performance gain, and you only find this out at the end of a refactoring. Your code can get less readable if you need to transport data in and out of formats that it's compatible with (looking at you, List()).

To say nothing of adding yet another long dependency chain to the language (Python 3.11 is still not supported even though work started in August of last year).

I do wonder if the effort put into making this slow language fast could have been put to better use, such as improving a language with Python's ease of use but which was built from the beginning with performance in mind.


I've rewritten real world performance critical numpy code in C and easily gotten 2-5x speedup on several occasions, without having to do anything overly clever on the C side (ie no SIMD or multiprocessing C code for example).


Did you rewrite the whole thing or just drop into C for the relevant module(s)? Because the ability to chuck some C into the performance critical sections of your code is another big plus for Python.
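
For what it's worth, the lowest-effort version of "dropping into C" doesn't even need a build step - here's a minimal sketch using the stdlib's ctypes, assuming a Unix-like system where libm can be located (real projects would more likely ship a compiled extension via Cython, cffi, or a C extension module):

    import ctypes
    import ctypes.util

    # Load the system math library and call its C cos() directly.
    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    libm.cos.restype = ctypes.c_double
    libm.cos.argtypes = [ctypes.c_double]

    print(libm.cos(0.0))  # 1.0, computed by the C library rather than by Python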


But... pretty much any language can interoperate with C; its calling conventions have become the universal standard. I mean, I still remember at $previousJob when I was deprecating a C library and carefully searched for any mention of the include file... only to discover that a whole lot of Fortran code depended on the thing I was changing, and I had just broken all of it (since Fortran doesn't use include files the same way, my search for "#include <my_library" didn't return any hits, but the function calls were there nonetheless).

Julia, to use the great-great-grand-op's example, seems to also have a reasonably easy C interop (I've never written any Julia, so I'm basing this off skimming the docs, dunno, it might actually be much more of a pain than it looks like here).


Calling C from Julia is pretty straightforward

https://docs.julialang.org/en/v1/manual/calling-c-and-fortra...


Even if you do this, you're still paying a penalty whenever you move data Python->C and C->Python.

Plus that you now need to write performant (and safe) C code, which (to me) defeats part of the reason to use Python in the first place.


I’ve done the same but moved from vanilla numpy to numba. The code mostly stayed the same and it took a couple hours vs however long a port to C or Rust would have taken.


For a package whose pitch is "Just apply one of the Numba decorators to your Python function, and Numba does the rest", a few hours of work is a long time.


A 2-5x speedup is not a lot; I would say it is not worth rewriting from Python to C if you don't get an order of magnitude improvement.

Because if you weigh that benefit against the cost of the rewrite from Python to C, the cost of maintaining/updating C code, and possible C footguns like manual memory management, then there is no benefit left.


2-5x IS a lot. It's the speed difference between the current iPhone 14 and anything from an iPhone XS down to an iPhone 6. That's 4-8 years of hardware improvements.

And the parent was talking about numpy code which is better than stock python, who knows how far back normal python would send you.


I'm in the camp that 2-5x performance improvement is not really worth re-writing Python code in C for.


Guess that'll depend on how much you need the performance and how much code it is.

They're comparing numpy (SIMD plus parallelism) with straightforward C code and getting a 2-5x improvement.


I highly doubt that numpy can ever be the bottleneck. In a typical Python app there are other things, like I/O, that consume resources and become the bottleneck before you run into numpy's limits and can justify a rewrite in C.


I haven't personally run into IO bottlenecks so I have no idea how you would speed those up in Python.

But there's two schools of thoughts I've heard from people regarding how to think about these bottlenecks:

1. IO/network is such a bottleneck that it doesn't matter if the rest is not as fast as possible.

2. IO/network is a bottleneck so you have to work extra hard on everything else to make up for it as much as possible.

I tend to fall in the second camp. If you can't work on the data as it's being loaded and have to wait till it's fully loaded, then you need to make sure you process it as quickly as possible to make up for the time you spend waiting.


In my typical python apps, it's 0.1-20 seconds of IO and pre-processing, followed by 30 seconds to 10 hours of number crunching, followed by 0.1-20 seconds of post processing and IO.


It can be worth it. What matters is how much time it saves your users over the course of using the app vs the time it took to develop it. So, if:

#-of-users * total-time-saved-per-user > time-spent-optimizing

Then it's worth it. You can even multiply by cost of user per time unit and cost of developer per time unit, to see how much money was saved.

Even in cases where its the same person on both sides, it can still work out. There's an xkcd comic about it, even.
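
A quick back-of-the-envelope version of that inequality (all numbers here are made up):

    # Hypothetical figures, just to show the shape of the calculation.
    num_users = 200
    time_saved_per_user = 3.0        # hours saved per user over the tool's lifetime
    time_spent_optimizing = 80.0     # developer hours spent on the rewrite

    hours_saved = num_users * time_saved_per_user
    print(hours_saved > time_spent_optimizing)   # True: 600 hours saved vs. 80 spent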


2-5x speedup barely seems worth re-writing something for, unless we're talking calculations that take literally days to complete, or you're working on the kernel of some system that is used by millions of people.


Most programmers don't actually need to know about that stuff, either. And most programmers who do need to know about that stuff, don't know about it.


Solving race conditions is QA's problem right? ;)


You're so silly. QA? What's QA? Solving race conditions is the customer's problem :-p


> For the most part they don't need to know about concurrency [...]

In my opinion, this is the part that Go got mostly right. Concurrency is handled by the runtime, and held behind a very thin veil. As a programmer you don't really need to know about it, but it's there when you need to poke at it directly. Exposing channels as a uniform communication mechanism has still enough footguns to be unpleasant, though.

In an ideal world, I should be able to decorate a [python] variable and behind the scenes the runtime would automatically shovel all writes to it through an implicitly created channel. Instead of me as a coder having to think about it. Reads could still go through directly because they are safe.
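
Something in that spirit can be hand-rolled today with a property and a queue - a purely illustrative sketch (all names invented; nothing in the runtime does this for you automatically):

    import queue
    import threading

    class ChanneledValue:
        """Writes are funneled through a queue and applied by a single writer
        thread; reads go straight to the underlying attribute."""

        def __init__(self, initial):
            self._value = initial
            self._writes = queue.Queue()
            threading.Thread(target=self._drain, daemon=True).start()

        def _drain(self):
            while True:
                self._value = self._writes.get()

        @property
        def value(self):              # reads pass through directly
            return self._value

        @value.setter
        def value(self, new):         # writes are serialized via the queue
            self._writes.put(new)

    counter = ChanneledValue(0)
    counter.value = 42                # enqueued, applied by the writer thread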

If I could have Python syntax and stdlib, with Go's net/http and crypto libraries included, and have concurrency handled transparently in Go-style without having to think about it, that would be pretty close to an all-wishes-come-true systems language. Oh, and "go fmt", "go perf" and "go fuzz" as first-class citizens too.

Someone else in this thread brought up the idea of immutable data structures as a default. I wouldn't mind that. Python used to have frozenset (technically it still does but I haven't seen a performance difference for a while), so extending the idea of freeze()/unfreeze() to all data types certainly has appeal.


In fact, progress in computing has always been built on successive layers of abstraction. Just think of assembly language and punch-tape programming; those days are not that long past.


> without having to take on a whole bunch of the footguns that would come from working directly in a language like c++ or Rust.

Don't forget the footguns of working with developers who do those things. Ask them to do something simple and you get something complex and expensive after months of back and forth about what is wanted. You're likely to end up with a framework for a one-off SQL query.

I hear it being said already, "You're using software developers wrong!" Well, maybe software developers shouldn't be so hard to use?


> maybe software developers shouldn't be so hard to use?

This whole take assumes bad intention on both sides. Nobody's job is easy in this situation. Leadership's job is to set everyone up for success. If things go off the rails and end up with months of back and forth leading to nobody being happy despite good intentions and honest effort, then the problem lies with leadership.


> a whole bunch of the footguns

Could you expand on this in the context of Rust?


Sure thing! Footguns might be the wrong word, and I know that as a low level language Rust is insanely safe, but for a high level developer its type system is gonna mean spending a lot of time in the compiler figuring out type errors, at least initially. That might not be a traditional footgun, but if you're just trying to, I dunno, build a CRUD API or something, it's gonna nuke your development time.

Please don't read this as "Rust is difficult and bad", I definitely don't think it is! But it's a low level language, and working with it means dealing with complexity that for some tasks just might not be relevant.


I find figuring out type errors is usually less work than figuring out the runtime bugs they prevented.


I agree, but for something like the CRUD app example I made bringing in pydantic or something would solve that. Rust's type system is a lot stricter because it's solving problems in a space that doesn't touch a lot of Python developers.
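
For example, a minimal pydantic sketch (the model and fields are invented for illustration) - the types are declared once and validated at runtime when data arrives:

    from pydantic import BaseModel, ValidationError

    class User(BaseModel):
        id: int
        name: str
        email: str

    try:
        User(id="not-a-number", name="Ada", email="ada@example.com")
    except ValidationError as err:
        print(err)   # pydantic reports which field failed validation and why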


In all fairness Rust was never meant for writing database interfaces, more like storage engines.


>and they don’t care - or know about - concurrency, memory efficiency, L2 cache misses due to pointer chasing.

Also if I (a programmer) want to write really really fast code I'm probably reaching for tools like tensorflow, numpy, or jax. So there's not much incentive for me to switch to a more efficient language when as near as I can tell the best tooling for dealing with SIMD or weird gpu bullshit seems to be being created for python developers. If you want to write fast code do it in c/rust/whatever, if you want to write really fast code do it in python with <some-ML-library>.

For a very specific definition of the word "fast" at least.
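
i.e. the "really fast" path usually looks like one vectorized call that dispatches to optimized native code - a rough sketch:

    import numpy as np

    x = np.random.rand(10_000_000)

    # One call into optimized C/SIMD code...
    fast = np.dot(x, x)

    # ...versus the same sum of squares as a pure-Python loop,
    # which is orders of magnitude slower:
    # slow = sum(v * v for v in x)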


> Also if I (a programmer) want to write really really fast code I'm probably reaching for tools like tensorflow, numpy, or jax. So there's not much incentive for me to switch to a more efficient language when as near as I can tell the best tooling for dealing with SIMD or weird gpu bullshit seems to be being created for python developers. If you want to write fast code do it in c/rust/whatever, if you want to write really fast code do it in python with <some-ML-library>.

Rather unfortunately, my current bugbear is that Pytorch is... slow. On the CPU. One of the most common suggestions for people who want stable diffusion to be faster is, wait for it, "Try getting a recent Intel CPU, you'll see a real uplift in performance".

This despite the system only keeping a single CPU core busy. Of course, that's all you can do in Python most of the time.

(You can also use larger batch sizes. But that only partially papers over the issue, and also it uses more GPU memory.)


>"Also if I (a programmer) want to write really really fast code I'm probably reaching for tools like tensorflow, numpy, or jax."

Very limited view which ignores way too many areas.


Sure, examples?


Your OS, the linear algebra libraries themselves, much of the user-facing software that you use (latency sensitive rather than throughput sensitive), image/video encoding/decoding, most of the language runtimes that you use, high volume webservers, high volume data processing (where your data is not already some nice flat list of numbers you're operating on with tensor operations), for some examples.

Really, for almost any X, somebody somewhere has to do X with strict performance requirements (or at very large scale, so better perf == savings)

Most of these python libraries are only fast for relatively large and relatively standard operations in the first place. If you have a lot of small/weird computations, they come with a ton of overhead. I've personally had to write my own fast linear algebra libraries since our hot loop was a sort of modified tropical algebra once.


How is it in disagreement with parent?


They asked for examples of non-numpy/tf/jax use cases and I gave some, including my own experience. No disagreement - HPC Python in practice is heavily biased towards numpy and friends.


They're not, those are useful examples.


Your comment is super interesting because it suggests Python has evolved in a direction opposite to the Python Paradox - http://www.paulgraham.com/pypar.html

Whereas before you could get smarter programmers using Python, now because of the exponential growth of Python, the median Python programmer is likely someone with little or no software engineering or computer architecture background who is basically just gluing together a lot of libraries.


Neat observation. I wasn't doing much programming in 2004, but, I'm guessing 2004 Python would be like today's Rust. People learn it because they love it.


I think more so Rust than even Python in 2004, since Rust has a pretty steep learning curve and does require a non-trivial amount of dedication to learn.


Perhaps today it's not smart programmers per se, but smart people who are interested in learning to program.

The libraries are the killer feature for me.


> It’s frequently the second best language but it’s the second best language for anything.

This myth wasn't even true many years ago, it certainly isn't true today. You can build a mobile app, game, distributed systems, OS, GUI, Web frontend, "realtime" systems, etc in Python, but it is a weak choice for most of those things (and many others) let alone the second best option.


The saying does not mean that in a rigorous evaluation Python would be second best out of all programming ecosystems for all problems.

The saying means that for any given problem, there is a better choice, but second best is the language you know which has all of the tools to get the job done, so the answer is probably just a bunch of pip installs, imports, and glue code.

It’s kind of like “the best camera is the one you have with you” — it’s a play on the differing definitions of “best” to highlight the value of feasibility over technical perfection.


When I switched from PHP to Python years ago I had the same feeling as the OP, then it became the third best, then the fourth, then situational when object-orientation makes sense, then for just scripting, and now... unsure beyond a personal developer comfort/productivity preference. TUIs and GUIs built on Python on my machine seem to be the first things to have issues during system upgrades because of the package management situation.


> it’s the second best language for anything

Anything that doesn't require high performance that is. Is there any 3D game engine for python yet? I guess Godot has gdscript which is 90% python by syntax, but that doesn't quite count I think.


You won't get high performance out of Python directly, but there are a lot of Python libraries that use C or a powerful low level language underneath. The heavy lifting in so much of machine learning is CUDA, but most people involved in ML are writing Python.
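
A rough illustration of that split, assuming PyTorch is installed (and falling back to CPU if no GPU is around): a couple of lines of Python, with the actual matrix multiply running in CUDA kernels.

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(2048, 2048, device=device)
    b = torch.randn(2048, 2048, device=device)

    # The Python here is just orchestration; the heavy lifting happens in
    # cuBLAS kernels on the GPU (or optimized CPU code as the fallback).
    c = a @ b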


Sure, but that's not really Python per se. One could also call C++ libraries from Java via JNI and pretend Java is super fast.

If people write program logic in python it will run at python speeds. Otherwise you're not really writing python, like nobody says some linux native program is bash because it happens to be launched from a bash script.


> Sure, but that's not really Python per se. One could also call C++ libraries from Java via JNI and pretend Java is super fast.

But that's how every scripting language obtains good-not-just-decent performance. A strong culture of dropping down to C for any halfway-important library is why PHP's so hard to beat in real-world use, speed-wise (whatever its other shortcomings).


> Sure, but that's not really Python per se.

But that's exactly the strength of Python: it's an interface language. It's meant to make pretty sophisticated things like CUDA easy for everyone.


Java is super fast though; it almost never uses JNI because, unlike Python, it doesn't need it. It uses JNI for integrating with the C world (e.g. OpenGL bindings).


Python isn't a joke either. I'm a full-on programmer who started with C and branched out to several other languages, and I'd still pick Python for a lot of new tasks, even things that aren't little scripts. Or NodeJS, which has similar properties but has particular advantages for web backends.


I’ve been a Python developer for 15 years, and Python might have been the second best language for anything when I started my career, but there are so many better options for just about any domain except maybe data science. Basically for any domain that involves running code in a production environment (as opposed to iterating in a Jupyter notebook) in which you care about reliability or performance or developer velocity, Python is going to be a pretty big liability (maybe it will be manageable if you’re just building a CRUD app atop a database). Common pain points include performance (no, you can’t just multiprocess or numpy your way out of performance problems), packaging/deployment, and even setting up development environments that are reasonably representative of a production environment (this depends a lot on how you deploy to production—I’m sure lots of people have solved this for their production environment).


Yeah, big corp big data web scale systems use python too.


I'm a bit surprised to see this article on GitHub blog, it feels more like something from dev.to - looking at the surface, with little actual insights.

Most of the provided reasons behind Python's popularity are true also for other languages - portable, open source, productive, big community. This can be also said about PHP, Ruby, or Perl back in 2000s. Why isn't Perl as popular as Python?

I don't think it's all about readability or productivity, but about tools that were built over the last 30 years that have been used in academia and now with the boom in ML/AI/Data Science, they made Python an obvious choice to use for the new generation of tools and applications.

Imagine that the boom in ML/AI didn't happen - would Python be #1 language right now?


> Why isn't Perl as popular as Python?

I don't think there is a single reason, but it sure didn't help that the community self-destructed by trying to make an entirely new language after version 5 and still call it Perl. It took a lot of years to resolve that nonsense, and in the meantime many people moved on.

It also does not help that Perl is a creative language, useful but very much open to many different interpretations. Hiring a Perl guy and expecting them to read someone else's code is a crapshoot. The upside of Python's strong cultural opinions on coding style is that it's easier for one developer to pick up someone else's code.

> Imagine that the boom in ML/AI didn't happen - would Python be #1 language right now?

Probably not. But it wouldn't be Perl, either. Javascript most likely. But the core usage of Python for scripting was never predicated on ML popularity, so it would still be a pretty commonly used language. And Javascript has many annoying warts too, so I think plenty of people would still choose to write Django apps instead of Node, whether ML existed or not.


The line noise complaint isn't without merit either.

One of the few programming language jokes I've liked enough to repeat is "Python is Perl for people who can't bring themselves to write Perl code".


As commented somewhere else in this thread, Python was clearly more ergonomic than Perl, and had a lot of mindshare exactly for this reason. I remember when Python was new and the not-that-professional choice; Perl occupied that niche at the time. Even now I don't see a contender for a language where speed doesn't matter. Ruby has some Perlisms that really make it weird, PHP is tied to the web and equally weird, and those $s and @s are really bad for normal people. Python wins clearly when teaching somebody programming.


I’d say that Ruby and even Perl are a lot nicer for scripting than Python (due to the extremely low-effort unix interop). Python can do it, but it’s a whole lot more verbose and difficult for a beginner to learn than “anything inside a pair of backticks is run as a system command and you can interpolate variables”.
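
For comparison, a rough sketch of what Ruby's one-line backtick call turns into in Python (the command and path are invented for illustration):

    import subprocess

    path = "/tmp"
    # Ruby: output = `ls -l #{path}`
    result = subprocess.run(["ls", "-l", path], capture_output=True, text=True)
    print(result.stdout)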

Python was friendlier for beginners than Ruby the first time I took a real stab at learning to code during a CNY holiday in 2008, but it wasn’t about the language itself. Ruby was harder then because many of the popular libraries and many of the tutorials were written by people who considered Windows support as an afterthought. It’s hard to express how frustrating it was to have my vacation days ticking down, hitting issues in one tutorial after another and having people suggest I install linux on a VM (a process where I hit still more snags).

People learning Python and PHP didn’t hit that hurdle. I ended up learning Flash on my Asus laptop a couple of years later and getting my start that way and not coming back to Ruby until six years later when I was a much more experienced dev.


> Why isn't Perl as popular as Python?

Perl was significantly more popular at one point, but it slowly lost traction while Python gradually gained traction over the years.

A better ecosystem for numeric computing is definitely a big reason for the success of Python, but the question is why Python gained a foothold in that niche in the first place. I think it is because Python is just a lot more accessible to people with different backgrounds. Perl really grew out of shell scripting as a supercharged alternative to Bash and Awk, but retained many of their quirks for familiarity. Python on the other hand grew out of research into teaching programming to beginners.


This was already posted at https://news.ycombinator.com/item?id=35000415, I don't know why it didn't detect the duplicate. I'll repost my comment from there:

This is a strange article. It's got the talking point about Python that we were hearing about 10 years ago - "tired of those pesky curly brackets in Java, try this new language you might not have heard of: Python!". Who reading the GitHub blog has not heard of Python?

Also, that snippet used in the "What is Python commonly used for" section is strange:

  import antigravity
  
  def main():
      antigravity.fly()

  if __name__ == '__main__':
      main()

It's overly verbose (especially given the example just above about how you don't need a main function in Python) and refers to an insider joke/Easter egg. I can't see it convincing anyone to try Python - more likely it makes them feel they're on the outside of a joke.

It then ends with what seems like it might have been the point of the article, an advert for Copilot. It seems the way to get started writing Python is to write a short comment and then spam <TAB> and let the AI auto-complete your project.

(Also, and perhaps less importantly, looking at the author's GitHub profile I can't see a single instance of Python there. Though I'm not doing a deep-dive, as that feels overly picky and there are plenty of private contributions that could well be Python.)


I read the article and had the same feeling that it's a fluff piece without substance. If you have to compare anything, compare it with the vibrant JVM ecosystem. Using the same tired `System.out.println()` argument shows the author has no original idea. Python is great, but the JVM is no lackey; it is a marvelous piece of battle-tested engineering.

In the end it is just an ad for Github products and not worthy of being on HN frontpage.


Agree, I thought it was a pretty low-effort article until I got to the end and realized it was just an ad for CodeSpaces and CoPilot.


I find CoPilot to be super useful, but I would not use CodeSpaces due to safety concerns and limitations in team management.


>It's got the talking point about Python that we were hearing about 10 years ago - "tired of those pesky curly brackets in Java, try this new language you might not have heard of: Python!"

That was a talking point closer to 20 years ago, at this point.


Pretty sure I tried both Java and Python at the end of the 90s. So more 25 years ago.


You're right, I think I was caught out by the unending march of time :)


There's a joke I saw that goes like:

   1980-2000 = "30 years ago"
   2000-2010 = "10 years ago"
   2010-2023 = "Recently


> Who reading the GitHub blog has not heard of Python?

The GitHub blog became a strange place recently(-ish). It went from a factual blog describing fancy new GitHub features and interesting technical stuff (as it was a decade ago) to mostly a place full of incoherent marketing fluff like this post (with some real technical content interspersed).


It's doubly weird, since the antigravity module doesn't have a fly attribute, and it "does its thing" on import.

https://github.com/python/cpython/blob/main/Lib/antigravity....
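
In other words, the entire Easter egg is the import itself (assuming a desktop environment with a default browser):

    # Importing the module opens xkcd #353 in your browser as a side effect;
    # there is no fly() function to call.
    import antigravity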


Very suspect


Curly brackets or brackets at all aren't mentioned by name in the article - is that your interpretation of the first code example?

IMO, that example is there to show that there's less required boilerplate in Python compared to Java when doing the same thing. And, in particular, none of that boilerplate really matters to what you want to do - print hello world - emphasizing the point of Python being simple.


The section that explains why python is good for AI talks about pybrain, a library that seems to date from 10 years ago. I’m pretty well versed in most ml frameworks and never heard of it. Last update to the website looks to be 2010. Weird to feature that and PyTorch as examples of ML libraries. No mention of sklearn which is vastly more popular


But I didn't type

  import antigravity
I typed

  pydoc3 antigravity
(Cue clip of industrial disaster in background.)


The antigravity part is probably a reference to https://xkcd.com/353/


> Hello world is just `print "Hello, world!"`

SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Hello, world!")?


That's because that issue of the webcomic is like 15 years old. It was true of the version of Python which was current at the time:

https://stackoverflow.com/questions/6182964/why-is-parenthes...


Yes, I know. But some people still have the reflex from Python 2 and feel bitter when the error message says "I know what you want, and I'm not giving it to you."


It _is_ a reference to https://xkcd.com/353/, quite literally, as importing the module opens that link.


Eh, they're just a lot of ways to say "path dependence". Scripting languages are basically the same exact technology with respect to each other. In the alternate universe where numpy and scipy are, let's say, numruby and sciruby, wouldn't we be here asking why Ruby keeps growing?

That's not a sales pitch for python, it's a sales pitch for the concept of a scripting language; it's like saying "you should really buy a Ford, it comes with four wheels".


I admit I have a blind spot for Python, because I use PHP in my day job, so when I need to do some scripting, I mostly use PHP. But admittedly Python is a lot friendlier than some alternatives (Perl, Shell scripts etc.), and more universal than others (PHP being mostly used for web dev), so that's why people choosing a scripting language for their tool/library tend to choose Python.


I use Python all the time both for my personal stuff and for some side-projects at work, so this isn't a dunk on Python, but honestly it feels like a circular thing: it's popular because it's popular.

I wouldn't say it's friendlier than the alternatives, Perl and Shell scripts sure, but not when compared to Javascript, Ruby or Lua.

Now, if you're talking about libraries, support, etc. then sure, Python wins hands down, but that doesn't make it a better language in itself. I'd say Ruby and Lua are a little bit better as languages.

But then again, I don't care much for the language in itself, so Python is enough for most of my use cases.


I read "it's popular because it's friendlier". PHP, Javascript or R are popular, but are not friendlier. I find their error messages way worse for the beginner, when you need it more. Third party code is too "clever" for the beginner to read and learn, because it seems to be two languages: the one you are learning in the tutorials, and the other idiom that is used in the serious libraries. As a beginner you are hit with this feeling that you are far, far away from writting an useful thing.

In my job I've seen some beginners starting with R and quickly hating it, because they don't feel they can do much on their own beyond copy-pasting and then modifying the examples and the tutorials. And if the changes go too far, everything collapses with cryptic errors. When you show them Python as an alternative, pointing out that they shouldn't use it over R for statistics and graphics, they like that they can build ideas from scratch. That beginner is hooked for life.


I think Lua was always seen as a bit obscure, and not enough people invested in the language to write useful utilities. It has a solid C foreign function interface, and the compiler is quite fast, which leaves me puzzled about why it never gained traction. I think it's an embedded scripting language in the majority of use cases (e.g. NeoVim, LuaLaTeX, scripting in some game engines).

The story of Ruby is altogether different: they made the fatal mistake of not defining a C foreign function interface in the standard, otherwise I imagine we'd be seeing numerical computation and ML libraries with a Ruby interface today. Still, Ruby lives on in Metasploit, and in Sorbet and Crystal.


> I think Lua was always seen as a bit obscure, and not enough people invested in the language to write useful utilities. It has a solid C foreign function interface, and the compiler is quite fast, which leaves me puzzled about why it never gained traction. I think it's an embedded scripting language in the majority of use cases (e.g. NeoVim, LuaLaTeX, scripting in some game engines).

- Lua's standard library is so weak that it makes most other batteries-not-included languages look like they have large, robust, and helpful standard libraries.

- It's got a bit of JavaScript's quirkiness and gotcha-ability, but without being impossible to avoid thanks to owning a mega-popular platform - which is what propelled JavaScript to ubiquity despite being kinda shit and unpleasant to work with.

- Tooling's not as good as many other languages.

(FWIW sometimes I write Lua regardless, because it's the right tool for the job)


> The story of Ruby is altogether different: they made the fatal mistake of not defining a C foreign function interface in the standard, otherwise I imagine we'd be seeing numerical computation and ML libraries with a Ruby interface today.

This to me is extremely plausible, and sad.


Luck with libraries and the initial userbase is certainly involved in success, but not all scripting languages are equal. I mean, we could add Bash to that list then.

In fact I'd argue that python enjoying the success it has, despite probably the worst handling of a version bump in any language (2->3), is a testament to its popularity.


The worst version bump ever seen was Perl 6 which literally became another language (Raku).


Why did Rails get outcompeted by newer (and also some older) alternatives in the long run, when numpy and scipy did not?

Ruby would have a much bigger market share these days if library path dependence was that powerful.


> Why did Rails get outcompeted by newer (and also some older) alternatives in the long run, when numpy and scipy did not?

Chiefly, because big money corp invested massively elsewhere.

Also by now it’s easier to find developers who are cheaper and already (only can) use javascript/python?

I think that actual technical merits are dwarfed compared to other forces at stake here.


I think that experiment is taking shape with Elixir:

https://github.com/elixir-nx/nx

I don't see how it could ever overtake Python, but it could establish itself as a viable niche alternative.


I think one reason for Python growing popularity is because it's become the default tool in some domains whether it is the best tool or not.

This week our Director ordered a total rewrite of two years of work in Python. His rationale: it's what everyone else uses in this space. No reason specific to our use case, just simply to follow the herd. I realise that a large community translates into easy hiring and rich ecosystems, but I despise the mentality as it promotes a monoculture.


... and Python _became_ the default tool because the de facto developer consensus (after years of competing languages) is that an interpreted language should

    = be usable, and 

    = provide a set of data structures that an educated programmer *expects to find when scripting*.
Python literally sucked less than the alternatives.


Python gets introduced to students, so for many people it's the first language they learn. Half the programming community have less than 5 years of experience. I question their ability to evaluate suckage, lol.


That's a terrible idea for so many reasons


What are you rewriting from? At director level, the focus is usually more on things like how easy it is to staff / get support for something. Python is strong here - you can find programmers globally who can do pretty well with it.

Can you say the same about your solution?


If Python was a market, this would clearly be the top.

There was a time when all you had to learn was Java or C++ or (language or tool here). Somehow, this time when we standardize, it will be different.


It's really funny that one of the subheadings under "Why is Python so popular?" is "It has high corporate demand."


That's a perfectly relevant thing, no?


Yes. The entire reason I like using Java at my day job is that the rest of the company uses it the most and supports it well. I would never use Java on my own, but that's a different situation.


It reads as a little bit of a tautology. "People are using it because people use it". I get that from a hireability standpoint it's a real thing to consider, but the statement doesn't say anything about whether or not Python is actually a good language to use


Another counterpoint to this. Rust is a popular language (debatable claim, of course) but it's not high in corporate demand.


> This week our Director ordered a total rewrite of two years of work in Python

WTF? Unless your system is originally written in a proprietary language that literally no one outside your company knows, I'll say it's a good sign that you need to change team (or change job). Don't work under a director like that.


Agreed. Coincidentally, I resigned the day before this was announced.


Sorry if it sounds too cynical, but that's probably the intended effect of a rewrite from what the team knows into Python: staffing changes. A bunch of the (expensive) old guard will leave and they can be replaced with cheap grads, who all know Python.

Better luck with the next one!

