Ask HN: Is anyone using PyPy for real work?
573 points by mattip 9 months ago | 180 comments
I have been the release manager for PyPy, an alternative Python interpreter with a JIT [0], since 2015, and have done a lot of work to make it available via conda-forge [1] or by direct download [2]. This includes not only packaging PyPy, but improving an entire C-API emulation layer so that today we can run (albeit more slowly) almost the entire scientific Python data stack. We get very limited feedback about real people using PyPy in production or research, which is frustrating. Just keeping up with the yearly CPython release cycle is significant work. Efforts to improve the underlying technology need to be guided by user experience, but we hear too little to direct our very limited energy. If you are using PyPy, please let us know, either here or via any of the methods listed in [3].

[0] https://www.pypy.org/ [1] https://www.pypy.org/posts/2022/11/pypy-and-conda-forge.html [2] https://www.pypy.org/download.html [3] https://www.pypy.org/contact.html




I'm using pypy to analyse 350m DNS events a day, using Python dicts as caches to avoid DNS lookup stalls. I am getting a 95% dict cache hit rate, and use threads with queue locks.

Moving to pypy definitely sped things up a bit. Not as much as I'd hoped; it's probably all about string indexing into the dict and dict management. I may recode it as a radix tree. It's hard to work out in advance how different that would be: people have optimised the core data structures pretty well.

Uplift from normal python was trivial. Most dev time was spent fixing pip3 for pypy on Debian, not knowing which apt packages to install, with a lot of "stop using pip" messaging.


Debian is its own worst enemy with things like this. It’s why we eventually moved off it at a previous job, because deploying Python server applications on it was dreadful.

I’m sure it’s better if you’re deploying an appliance that you hand off and never touch again, but for evolving modern Python servers it’s not well suited.


Yes, 1000x. What is it with them that makes them feel entitled to have a special "dist-packages" instead of the default "site-packages"? This drives me nuts when I have a bunch of native packages I want to bundle in our in-house Python deployment. CentOS and Ubuntu are vanilla, and only Debian (mind-bogglingly) deviates from the well-trodden path.

I still haven't figured out how to beat this dragon. All suggestions welcome!


> What is it with them that makes them feel entitled to have a special "dist-packages" instead of the default "site-packages"? This drives me nuts when I have a bunch of native packages I want to bundle in our in-house Python deployment. CentOS and Ubuntu are vanilla, and only Debian (mind-bogglingly) deviates from the well-trodden path.

Hi, I'm one of the people that look after this bit of Debian (and it's exactly the same in Ubuntu, FWIW).

It's like that to solve a problem (of course, everything has a reason). The idea is that Debian provides a Python that's deeply integrated into Debian packages. But if you want to build your own Python from source, you can. What you build will use site-packages, so it won't have any overlap with Debian's Python.

Unfortunately, while this approach was designed to be something all package-managed distributions could do, nobody else has adopted it, and consequently the code to make it work has never been pushed upstream. So, it's left as a Debian/Ubuntu oddity that confuses people. Sorry about that.

My recommendations are: 1. If you want more control over your Python than you get from Debian's package-managed python, build your own from source (or use a docker image that does that). 2. Deploy your apps with virtualenvs or system-level containers per app.


Dist-packages are the right way to handle Python libs. Would you prefer to have the distro package manager clashing with pip? Never knowing who installed what, breaking things when updates are made?


I usually make a venv in ~/.venv and then activate it at the top of any python project. Makes it much easier to deal with dependencies when they're all in one place.


I am a big fan of .venv/ -- except when it takes ~45 mins to compile the native extension code in question -- then I want it all pre-packaged.


At this stage [0], uncompiled native extensions are not yet a bug, but they are a definite oversight by the maintainer. They should come as precompiled wheels.

[0]: https://pythonwheels.com


Honestly, I don't think I've ever used a precompiled package in Python. Every C package seems to take ages to install and requires all that fun stuff of installing native system dependencies.

Edit: skimming through this page, precompiling seems like an afterthought, and the linked packages don't even seem to mention how to integrate third-party libraries. So I guess I can see why it doesn't deliver on its promises.


Probably a function of the specific set of packages you use, or the pip options you specify. Pretty much all the major C packages come as wheels these days.


They all come as wheels, they just aren't precompiled.


I honestly can't remember the last time I had to compile anything, and I am on Windows.


Can you link one that comes as a wheel but is really a source distribution?


You can try pip install pillow for a good example of how it works. I suspect there's a strong survivorship bias here, as you'd only notice the packages that don't ship with wheels.


Yeah, perhaps. Ones I remember from last year are the cryptography and numpy packages, for instance. Now they do seem to ship with binary wheels, at least for my current Python and Linux version.

Kerberos and Hadoop stuff obviously still doesn't, though. I guess the joke's on me for being stuck in this stack...


In order for a wheel to be used instead of a source distribution there needs to be one that matches your environment. For numpy you can take a look at the wheels for their latest release[1]. The filename of a wheel specifies where it can be used. Let's take an example:

numpy-1.25.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

This specifies CPython 3.9, Linux, glibc 2.17 or higher, and an x86_64 CPU. Looking through the list you will see that the oldest CPython supported is 3.9. So if you are running an older version of Python you will have to build from source.

I just learned a bit more about this recently because I could not figure out why PyQt6 would not install on my computer. It turned out my glibc was too old. Finally upgraded from Ubuntu 18.04.

[1] https://pypi.org/project/numpy/1.25.2/#files
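
If you want to check this programmatically, here is a rough sketch using the third-party `packaging` library (the same tag logic pip uses; assumes you've done a `pip install packaging`, and the filename is just the numpy example above):

  # Check whether a wheel's tags match the running interpreter.
  from packaging.tags import sys_tags
  from packaging.utils import parse_wheel_filename

  filename = "numpy-1.25.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
  name, version, build, wheel_tags = parse_wheel_filename(filename)

  supported = set(sys_tags())  # all tags this interpreter/platform accepts
  print(name, version, any(tag in supported for tag in wheel_tags))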


Try `--only-binary :all:` to force pip to ignore sdist packages, might help avoid those slow compilations.


It's a good idea to be caching sdists and wheels — for resilience against PyPI downtime, for left-pad scenarios, and even just good netiquette — and for packages that don't have a wheel for your environment, you can fairly easily build that wheel yourself and stick it into the cache.


Second this; it's what I do on all Linux distros: just run it inside a .venv as the site installation.

If you need extra dependencies that pip cannot handle well in the .venv case, Conda can help with its own, similar site-based installation.

I don't know how the Python installation differs between Ubuntu and Debian; they seem the same to me.


IMO, bespoke containers using whatever Python package manager makes sense for each project. Or make the leap to Nix(OS) and then still have to force every Python project into compliance, which can be very easy if the PyPy packages you need are already in the main Nix repo (nixpkgs), or very difficult if the project depends on a lot of uncommon packages, uses poetry, etc.

Since PEP 665 was rejected, the Python ecosystem continues to lack a reasonable package manager, and the lack of hash-based lock files prevents building on top of the current Python project/package managers.


Dist-packages are a must for software written in Python that is part of the distribution itself.


You're not really answering why they are important?

Is it because .deb packages will install inside dist-packages, and when you run pip install as root without a virtual env, it installs inside site-packages?

I don't really see how this helps, though. Sure, you won't get paths clashing between the two, but you still have duplicate packages, which is probably not what you want.


Debian ships packages with a coherent dependency structure that crosses language boundaries. You don't need to care what language something is written in to be able to "apt install" it. The expectation is that if it "apt installed" then it should Just Work because all the required dependencies were also pulled in from Debian at the same time.

Debian also tries to ship just one version of everything in a single distribution release to reduce the burden on its maintainers.

This is fundamentally at odds with pip. If you've pip installed something, then that'll likely be the latest version of that package, and in the general case won't be the version of the same thing that shipped in the Debian release. If there exist debs that depend on that package and they are shared between pip and debs, now the deb could be using a different version of the dependency than the deb metadata says is acceptable, leading to breakage.

Another way of putting this: it shouldn't be possible for you to pip upgrade a dependency that a deb shipped by Debian itself relies upon. Because then you'd be creating a Frankenstein system where Debian cannot rely on its own dependencies providing what it expects.

This is fixed by having two places where things are installed. One for what the system package manager ships, and one for your own use with pip and whatever you want to do. In this sense, having duplicate packages is actually exactly what you want.


Yep, I screw things up all the time with packages in homebrew that are written in python, when I forget to switch into a virtual env before doing stuff with pip. Debian's solution seems very sensible. And it is the same solution as homebrew, I suppose, as long as you don't interact with any of the homebrew-installed packages via pip. But I find it quite easy to accidentally do that.


  export PIP_REQUIRE_VIRTUALENV=1
has been quite helpful in the past as pip then refuses to just install things directly.


There is https://peps.python.org/pep-0668/ which suggests that in the future this kind of behaviour will be the default. I'm not sure of the specifics, but I have seen lots of conversation about it in Debian circles.


Nice!


OK, but... I get the same problem when I compile Python from source. I'm not talking about the distribution's base files. In fact, I am in the business of creating an /opt/gjvc-corp-tools/ prefix with all my packages under there. When I compile Python from source on Debian, the resulting installation (from make install) does not have a site-packages directory in place already. That is what is mind-boggling.


> This is fundamentally at odds with pip

It's at odds with everything. I leave the system versions of any language alone and use language manager tools or Docker to be able to run the exact version that any of my customers' projects require. Asdf is my favorite because it handles nearly everything, even PostgreSQL.


Imagine you installed python3-requests (version x.y.z). Some of your distribution's packages depend on that specific package/version.

If you pip install requests globally, you just broke a few of your distrib's packages.


> I still haven't figured out how to beat this dragon. All suggestions welcome!

Docker


What distro did you move to? IME debian as a base image for python app containers is also kind of a pain.


We moved to stripped down Debian images in containers and made sure to not use any of the Debian packaging ecosystem.


It works completely fine in my experience.


Lucky you. Having gone through multiple Debian upgrades, a Python 2->3 migration on Debian, and a migration from Debian Python packaging to pip/PyPI, it was a whole world of pain that cost us months of development time over the years, as well as a substantial amount of downtime.


If you have very large dicts, you might find this hash table I wrote for spaCy helpful: https://github.com/explosion/preshed . You need to key the data with 64-bit keys. We use this wrapper around murmurhash for it: https://github.com/explosion/murmurhash

There's no docs so obviously this might not be for you. But the software does work, and is efficient. It's been executed many many millions of times now.


I'm using strings, not 64-bit keys. But thanks, it's nice to share ideas.


The idea is to hash the string into a 64-bit key. You can store the string in a value, or you can have a separate vector and make the value a struct that has the key and the value.

The chance of colliding on the 64-bit space is low if the hash distributes evenly, so you just yolo it.
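
A rough sketch of the idea, using hashlib as a stand-in for the murmurhash wrapper just to show the shape of it (this is not the preshed/murmurhash API itself):

  import hashlib

  def key64(s: str) -> int:
      # Hash the string down to a 64-bit integer key.
      return int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8).digest(), "little")

  counts = {}
  for name in ("example.com", "example.org", "example.com"):
      k = key64(name)
      counts[k] = counts.get(k, 0) + 1  # collisions are possible but vanishingly rare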


> it's probably all about string index into dict and dict management

Cool. Is the performance here something you would like to pursue? If so could you open an issue [0] with some kind of reproducer?

[0] https://foss.heptapod.net/pypy/pypy/-/issues


I'm thinking about how to demonstrate the problem. I have a large pickle, but pickle load/dump times across gc.disable()/gc.enable() really don't say much.

I need to find out how to instrument the seek/add cost of threads against the shared dict under a lock.

My gut feeling is that if I inlined things instead of calling out to functions I'd probably shave a bit more off too. So saying "slower than expected" may be unfair, because there are limits to how much you can speed this kind of thing up. That's why I wondered if alternate data structures were a better fit.

It's variable-length string indexes into lists/dicts of integer counts. The advantage of a radix trie would be finding the record in time roughly proportional to the length in bits of the string, and the strings do form prefix sets.


Would love to hear more. You can reach us with any of these methods https://www.pypy.org/contact.html


Uplift from normal python was trivial.

By definition if you lift something it is going to go up, but what does this mean?


If you replace your python engine you have to replace your imports.

Some engines can't build and deploy all imports.

Some engines demand syntactic sugar to do their work. PyPy doesn't.


One should really consider using containers in this situation.


Can you describe what in this situation warrants it?

I'm very curious about where the line is/should be.


In my experience leaving the system python interpreter the way it was shipped will save you enormous headaches down the road. Anytime I find myself needing additional python packages installed I will almost always at minimum create a virtual env, or ideally a container.


I use it at work for a script that parses and analyzes some log files in an unusual format. I wrote a naive parser with a parser combinator library. It was too slow to be usable with CPython. Tried PyPy and got a 50x speed increase (yes, 50 times faster). Very happy with the results, actually =)


Thanks for the feedback. It does seem like parsing logs and simulations is a sweet spot for PyPy


Simulations are, at least in my experience, numba’s [0] wheelhouse.

[0]: https://numba.pydata.org/


What CPython version and OS was that? I'd be very surprised if modern Python 3.11 has anything an order of magnitude slower like that. Things have gotten much faster over the years in CPython.


I put PyPy in production at a previous job, running a pretty high traffic Flask web app. It was quick and pretty straightforward to integrate, and sped up our request timings significantly. Wound up saving us money because server load went down to process the same volume of requests, so we were able to spin down some instances.

Haven’t used it in a bit mostly because I’ve been working on projects that haven’t had the same bottleneck, or that rely on incompatible extensions.

Thank you for your work on the project!


You're welcome.

> that rely on incompatible extensions.

Which ones? Is using conda an option? We have more luck getting binary packages into their build pipelines than getting projects to build wheels for PyPI.


I can't actually remember off the top of my head. I tried it out a year or two ago but didn't get too far, because during profiling it became clear the biggest opportunities for performance improvement in this app were primarily algorithmic/query/IO optimizations outside of Python itself, so business-wise it didn't make too much sense; though if it had, I think using Conda would have been on the table. We make heavy use of Pandas/Numpy et al., though I know those are largely supported now, so I'd guess it was not one of them but something adjacent.


This post is a funny coincidence as I tried today to speed-up a CI pipeline running ~10k tests with pytest by switching to pypy.

I am still working on it, but the main issue for now is psycopg support: I had to install psycopg2cffi in my test environment, but it will probably prevent me from using pypy to run our test suite, because psycopg2cffi does not have the same features and versions as psycopg2. This means either we switch our prod to pypy, which won't be possible because I am very new on this team and that would be seen as a big, risky change by the others, or we keep in mind that the tests do not run on the exact same runtime as the production servers (which might cause bugs to go unnoticed and reach production, or tests to fail that would otherwise work on a live environment).

I think if I ever started a Python project right now, I'd probably try and use pypy from the start, since (at least for web development) there do not seem to be any downsides to using it.

Anyway, thank you very much for your hard work!


If you use recent versions of PostgreSQL (10+ I believe) you can use psycopg3 [1], which has a pure Python implementation that should be compatible with PyPy.

[1]: https://www.psycopg.org/psycopg3/docs/basic/install.html
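
A minimal sketch of how that looks (a plain `pip install psycopg` gives the pure-Python implementation; the connection string here is a placeholder, and you still need the libpq client library installed on the machine):

  import psycopg

  with psycopg.connect("host=localhost dbname=test user=app password=secret") as conn:
      with conn.cursor() as cur:
          cur.execute("SELECT version()")
          print(cur.fetchone()[0])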


Second this - the lack of psycopg2 support, and to a lesser extent lxml, is a nonstarter and makes it pretty difficult to experiment with on production code bases. I could see a lot of adoption from Django deployments otherwise.


Yeah we don't use pypy for those exact reasons on our small django projects.


I work on pg8000 https://pypi.org/project/pg8000/ which is a pure-Python PostgreSQL driver that works well with pypy. Not sure if it would meet all your requirements, but just thought I'd mention it.
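
For reference, a rough usage sketch from memory of the README - double-check the project page for the exact API; the connection details here are placeholders:

  import pg8000.native

  # Open a connection and run a parameterised query (":p" style placeholders).
  con = pg8000.native.Connection("app_user", password="secret", database="test")
  for row in con.run("SELECT id, name FROM widgets WHERE price > :p", p=10):
      print(row)
  con.close()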


One compromise could be to run pypy on draft PRs and CPython on approved PRs and master?


I use CPython most of the time but PyPy was a real lifesaver when I was doing a project that bridged EMOF and RDF, particularly I was working with moderately sized RDF models (say 10 million triples) with rdflib.

With CPython, I was frustrated with how slow it was, and complained about it to the people I was working with; PyPy was a simple upgrade that sped up my code to the point where it was comfortable to work with.


That is a great idea! I use rdflib frequently and never thought to try it with PyPy. Now I will.


Is your group still using it?


That particular code has been retired because, after quite a bit of trying things that weren't quite right, we understood the problem and found a better way to do it. I'm doing the next round of related work (logically modeling XSLT schemas and associated messages in OWL) in Java because there is already a library that almost does what I want.

I am still using this library that I wrote

https://paulhoule.github.io/gastrodon/

to visualize RDF data so even if I make my RDF model in Java I am likely to load it up in Python to explore it. I don’t know if they are using PyPy but there is at least one big bank that has people using Gastrodon for the same purpose.


What do you use RDF models for?


So I wrote this library

https://paulhoule.github.io/gastrodon/

which makes it very easy to visualize RDF data with Jupyter by turning SPARQL results into data frames.

Here are two essays I wrote using it

https://ontology2.com/essays/LookingForMetadataInAllTheWrong...

https://ontology2.com/essays/PropertiesColorsAndThumbnails.h...

People often think RDF never caught on, but actually there are many standards that are RDF-based, such as RSS, XMP, and ActivityPub, which you can work with quite directly using RDF tools.

Beyond that, I've been on a standards committee for ISO 20022 where we've figured out, after quite a few years of looking at the problem, how to use RDF and OWL as a master standard for representing messages and schemas in financial messaging. In the project that needed PyPy we were converting a standard represented in EMOF into RDF. Towards the end of last year I figured out the right way to logically model the parts of those messages and the associated schema with OWL. That is on its way to becoming one of those ISO standard documents that unfortunately costs 133 Swiss francs. I also figured out that it is possible to do the same for many messages defined with XSLT, and I'm expecting to get some work applying this to a major financial standard; I think there will be some source code and a public report on that.

Notably, the techniques I use address quite a few problems with the way most people use RDF. Most notably, many RDF users don't use the tools available to represent ordered collections. A notable example where this causes trouble is Dublin Core for document (say, book) metadata, where you can't represent the order of the authors of a paper, which is something the authors usually care about a great deal. XMP adapts the Dublin Core standard enough to solve this problem, but with the techniques I use you can use RDF to do anything any document database can, though some SPARQL extensions would make it easier.


Thanks for reminding me to look at PyPy again. I usually start all my new Python projects with this block of commands that I keep handy:

Create venv and activate it and install packages:

  python3 -m venv venv
  source venv/bin/activate
  python3 -m pip install --upgrade pip
  python3 -m pip install wheel
  pip install -r requirements.txt

I wanted a similar one-liner that I could use on a fresh Ubuntu machine so I can try out PyPy easily in the same way. After a bit of fiddling, I came up with this monstrosity which should work with both bash and zsh (though I only tested it on zsh):

Create venv and activate it and install packages using pyenv/pypy/pip:

  if [ -d "$HOME/.pyenv" ]; then rm -Rf $HOME/.pyenv; fi && \
  curl https://pyenv.run | bash && \
  DEFAULT_SHELL=$(basename "$SHELL") && \
  if [ "$DEFAULT_SHELL" = "zsh" ]; then RC_FILE=~/.zshrc; else RC_FILE=~/.bashrc; fi && \
  if ! grep -q 'export PATH="$HOME/.pyenv/bin:$PATH"' $RC_FILE; then echo -e '\nexport PATH="$HOME/.pyenv/bin:$PATH"' >> $RC_FILE; fi && \
  if ! grep -q 'eval "$(pyenv init -)"' $RC_FILE; then echo 'eval "$(pyenv init -)"' >> $RC_FILE; fi && \
  if ! grep -q 'eval "$(pyenv virtualenv-init -)"' $RC_FILE; then echo 'eval "$(pyenv virtualenv-init -)"' >> $RC_FILE; fi && \
  source $RC_FILE && \
  LATEST_PYPY=$(pyenv install --list | grep -P '^  pypy[0-9\.]*-\d+\.\d+' | grep -v -- '-src' | tail -1) && \
  LATEST_PYPY=$(echo $LATEST_PYPY | tr -d '[:space:]') && \
  echo "Installing PyPy version: $LATEST_PYPY" && \
  pyenv install $LATEST_PYPY && \
  pyenv local $LATEST_PYPY && \
  pypy -m venv venv && \
  source venv/bin/activate && \
  pip install --upgrade pip && \
  pip install wheel && \
  pip install -r requirements.txt
Maybe others will find it useful.


Just a note: these scripts are not comparable in monstrosity, as the first one just initializes the project, whereas the second one also sets up the whole PyPy installation.

So if you already have PyPy on your machine:

  pypy -m venv venv && \
    source venv/bin/activate && \
    pip install --upgrade pip && \
    pip install wheel && \
    pip install -r requirements.txt
is not that bad after all, when my initial thought was "do I need all of the above just to initialize the project?" :D


That's true, but you can run the first block of commands on a brand new Ubuntu installation because regular CPython is installed by default. Whereas you would need to do the whole second block when starting on a fresh machine.


Given that you'll want to activate a virtual environment for most Python projects, and projects live in directories, I find myself constantly reaching for direnv. https://github.com/direnv/direnv/wiki/Python

    echo "layout python\npip install --upgrade pip pip-tools setuptools wheel\npip-sync" > .envrc
When you cd into a given project, it'll activate the venv, upgrade to non-ancient versions of pip/etc. with support for the latest PEPs (i.e. `pyproject.toml` support on a new Python 3.9 env), and verify the latest pinned packages are present. It's just too useful not to have.

    direnv stdlib
This command (or this link https://direnv.net/man/direnv-stdlib.1.html) will print many useful functions that can be used in the `.envrc` shell script that is loaded when entering directories, ranging from helpers for many languages, to `dotenv` support, to `on_git_branch` for e.g. syncing deps when switching feature branches.

Check it out if you haven't. I've been using it for more years than I can count, and being able to cd from a PHP project to a Ruby project to a Python project with ease really helps with context switching.


If you have a system-level installed pypy, the pypy equivalent is:

  pypy3 -m venv venv
  source venv/bin/activate
  python3 -m pip install --upgrade pip
  python3 -m pip install wheel
  pip install -r requirements.txt
Not very different...


For a more apples to apples comparison, you would install pypy using your package manager, e.g. apt install pypy3 or brew install pypy3. On Linux, you might have to add a package repo first.


I find that much scarier to do personally since it seems a lot more likely to screw up other stuff on your machine, whereas with pyenv it's all self-contained in the venv. Also using apt packages tends to install a pretty old version.


No, installing a package with apt is not more likely to screw up your machine than installing it manually. Moreover, you seem to be completely fine using the apt-installed CPython, while you think PyPy needs to be installed manually.


I use pyenv myself, but that is beside the point. The two examples above are using different strategies to install python3 versus pypy. A valid comparison would use a package manager for both or pyenv for both.


We don't. To be honest, I didn't realize PyPy supported Python 3. I thought it was eternally stuck on Python 2.7.

So the good: It apparently now supports Python 3.9? Might want to update your front page, it only mentions Python 3.7.

The bad: It only supports Python 3.9, we use newer features throughout our code, so it'd be painful to even try it out.


Their docs seem perpetually out of date, but they recently released support for 3.10. I haven't been able to try it recently because our projects use 3.10 features but in the past it was easily a 10-100x speedup as long as all the project's libraries worked.

https://downloads.python.org/pypy/


It supports Python 3.10 now too. Thanks, I updated the site.


I think it supports up to 3.10, as there are official Docker images for that version; I saw them this morning.

Maybe the site is not up to date?


You should probably put "Ask HN:" in your title.

Personally I don't use PyPy for anything, though I have followed it with interest. Most of the things I need to go faster are numerical, so Numba and Cython seem more appropriate.


Cut him some slack, he's only been registered for 10 years


I read this as humor and I imagine mattip may have done also.


I don’t think it’s about being strict or condescending. In some HN readers the post will show up in a different catalogue and generally be easier for people to find, thus giving the post more visibility :)

Edit: typo


I use PyPy quite often as a 'free' way to make some non-numpy CPU-bound Python script faster. This is also the context for when I bring up PyPy to others.

The biggest blocker for me for 'defaulting' to PyPy is a) issues when dealing with CPython extensions and how quite often it ends up being a significant effort to 'port' more complex applications to PyPy b) the muscle memory for typing 'python3' instead of 'pypy3'.


For the b) part, you should consider creating an alias for that command, if that is really what might lead you to not use it otherwise.


I had the same thought. For years I have aliased ‘p’ for ‘python’ and after reading this thread I will alias ‘pp’ for ‘pypy’.


We use PyPy extensively at my employer, a small online retailer, for the website, internal web apps, ETL processes, and REST API integrations.

We use the PyPy provided downloads (Linux x86 64 bit) because it's easier to maintain multiple versions simultaneously on Ubuntu servers. The PyPy PPA does not allow this. I try to keep the various projects using the latest stable version of PyPy as they receive maintenance, and we're currently transitioning from 3.9/v7.3.10 to 3.10/v7.3.12.

Thank you for all of the hard work providing a JITed Python!


Cool. Would love to hear more about the successes and problems, or even get a guest blog post on https://www.pypy.org/blog/


Nice to meet you here, mattip. We have used PyPy for several years, and I have raised this several times: the only thing PyPy lacks is marketing (and there is wrong information around about cpyext being unsupported). PyPy gave us an 8x performance boost on average, 4x minimum, and 20x especially on JSON operations in long loops.

PyPy should have become the standard implementation; it would have saved a lot of the investment in making Python fast.

I try to shill PyPy all the time, but thanks to the outdated website and the puzzling attachment to Heptapod (at least put something on GitHub for discovery's sake), devs who won't bother to look any further than a GitHub page frown on me, thinking PyPy is an outdated and inactive project.

PyPy is one of the most ambitious projects in open-source history, and the lack of publicity makes me scream internally.


I use it for data transformation, cleanup and enrichment. (TXT, CSV, Json, XML, database) to (TXT, CSV, JSON, XML, database).

Speedup of 30x-40x. The highest speedups are on the jobs that require logic in the transformation (lots of function calls, numerical operations, and dictionary lookups).


Similar. I was working on some ETL work with SQLite, and now PyPy is my regular tool for getting better performance at similar jobs.


Same. I have used it for many ETL jobs, usually with about a 10x speed up. It also pulled down the latency on some Flask REST APIs.


Copying from an older comment of mine shilling Pypy https://news.ycombinator.com/item?id=25595590

PyPy is pretty well stress-tested by the competitive programming community.

https://codeforces.com/contests has around 20-30k participants per contest, with contests happening roughly twice a week. I would say around 10% of them use python, with the vast majority choosing pypy over cpython.

I would guesstimate at least 100k lines of pypy is written per week just from these contests. This covers virtually every textbook algorithm you can think of and were automatically graded for correctness/speed/memory. Note that there's no special time multiplier for choosing a slower language, so if you're not within 2x the speed of the equivalent C++, your solution won't pass! (hence the popularity of pypy over cpython)

The sheer volume of advanced algorithms executed in pypy gives me huge amount of confidence in it. There was only one instance where I remember a contestant running into a bug with the jit, but it was fixed within a few days after being reported: https://codeforces.com/blog/entry/82329?#comment-693711 https://foss.heptapod.net/pypy/pypy/-/issues/3297.

New edit from that previous comment: there's now a Legendary Grandmaster (ELO rating > 3000, ranking 33 out of hundreds of thousands) who almost exclusively uses pypy: https://codeforces.com/submissions/conqueror_of_tourist


Really cool!

Competitive programming needs a lot of speed to compete with the C++ submissions; it's really cool that there are contestants using Python to win.


I do think it would be very useful to have an online tool that lets you paste in your requirements.txt and then tells you which of the libraries have been recently verified to work properly with PyPy without a lot of additional fuss.

Also, you might want to flag the libraries that technically "work" but still require an extremely long and involved build process. For example, I recently started the process of installing Pandas with pip in a PyPy venv and it was stuck on `Getting requirements to build wheel ...` for a very long time, like 20+ minutes.


I was experimenting with some dynamic programming 0/1 knapsack code last week. The PyPy available through the distro (7.3.9) gave a reasonable speedup, but nothing phenomenal. Out of curiosity I grabbed the latest version through pyenv (7.3.12), and it looks like some changes between them suddenly had the code sit in a sweet spot: I saw a couple of orders of magnitude better performance out of it. Good work.

I'm rarely using python in places at work where it would suit it (lots of python usage, but they're more on the order of short run tools), but I'm always looking for chances and always using it for random little personal things.


Yes. We have a legacy Python-based geospatial data processing pipeline. Switching from CPython to PyPy sped it up by a factor of 30x or so, which was extremely helpful.

Thank you for your amazing work!


Would love to hear more. Is it still being used?


When I worked at Transit App, I built a backend pre-processing pipeline to compress transit and OSM data in Python [1] and also another pipeline to process transit map data in Python [2]. Since the ops people complained about how long it took to compress the transit feeds (I think London took 10h each time something changed), I migrated everything to PyPy. Back then that was a bit annoying because it meant I had to remove numpy as a requirement, but other than that there were few issues. Also it meant we were stuck on 2.7 for quite a while, so long that I hadn't prepared a possible migration to 3.x. The migration happened after I left. Afaik they still use pypy.

Python is fun to work with (except classes…), but it's just sooo slow. Pypy can be a life saver.

[1] https://blog.transitapp.com/how-we-shrank-our-trip-planner-t... [2] https://blog.transitapp.com/how-we-built-the-worlds-pretties...


What kind of speed up did you get?


Some parts 10x and more. Overall a bit more than 5x.


I don't use PyPy because when I'm stuck with performance issues, I go to numpy and if it really doesn't work I go to cython/numba (because it means that 99% of my python code continue to work the same, only the 1% that gets optimized is different; if I'd go PyPy, I'd have to check my whole code again). I do mostly computational fluid dynamics.

(nevertheless, PyPy is impressive :-) )


I'm maintaining an internal change-data-capture application that uses a Python library to decode the MySQL binlog and store the change records as JSON in the data lake (like Debezium). For our busiest databases a single CPython process couldn't process the amount of incoming changes in real time (thousands of events per second). It's not something that can be easily parallelized, as the bulk of the work happens in the binlog decoding library (https://github.com/julien-duponchelle/python-mysql-replicati...).

So we've made it configurable to run some instances with Pypy - which was able to work through the data in real time, i.e. without generating a lag in the data stream. The downside of using pypy was increased memory usage (4-8x) - which isn't really a problem. An actual problem that I didn't really track down was that the test suite (running pytest) was taking 2-3 times longer with Pypy than with CPython.

A few months ago I upgraded the system to run with CPython 3.11 and the performance improvements of 10-20% that come with that version now actually allowed us to drop Pypy and only run CPython. Which is more convenient and makes the deployment and configuration less complex.


We use PyPy for performing verification of our software stack [1], and also for profiling tools [2]. The verification tool is basically a complete reimplementation of our main product, and therefore encodes a massive amount of business logic (and therefore difficult to impossible to rewrite in another language). As with other users, we found the switch to PyPy was seamless and provides us with something like a 2.5x speedup out of the box, with (I think) higher speedups in some specific cases.

We eventually rewrote the profiler tool in Rust for additional speedups, but as mentioned for the verification engine, it's probably too complicated to ever do that so we really appreciate drop-in tools like PyPy that can speed up our code.

[1]: https://github.com/StanfordLegion/legion/blob/master/tools/l...

[2]: https://github.com/StanfordLegion/legion/blob/master/tools/l...


I used PyPy with SymPy when I was helping out a mathematician-friend. SymPy is not exactly fast, a free performance boost was very welcome.


Interesting. I was under the impression PyPy did not do so well with SymPy because the dynamic code paths are difficult to JIT. What kind of tasks saw a speedup?


It's been a while and the code is long lost. We only touched the surface of SymPy. Functions, substitutions, some `integrate` and `simplify` is what I remember. The maths was already done. My job was to verify some equations.


I can't remember exactly what the use case was, but we used it at my old work (a startup providing a web CDN/WAF type service, think the kind of stuff CloudFlare does nowadays) in ~2013 for some sort of batch-processing analytics/billing type job, using MRJob and AWS Elastic MapReduce over a seriously large data set.

The performance of PyPy over CPython saved us loads and loads of time and thus $$$s, from what I can recall.


Thanks, that is hopeful, although quite a while ago.


We use pypy3 on musl via gentoo in production to run dataset validation pipelines. The easiest place to see that we use pypy3 is probably [1]. The build process and patches we carry are under [2].

We also use pypy3 to accelerate rdflib parsing and serialization of various RDF formats. See for example [3].

Thanks to you and the whole PyPy team!

1. https://github.com/tgbugs/dockerfiles/blob/6f4ad5d873b7ab267...

2. https://github.com/tgbugs/dockerfiles/blob/6f4ad5d873b7ab267...

3. https://github.com/SciCrunch/sparc-curation/blob/0fdf393e26f...


You're welcome, thanks for sharing. Do you have any numbers about speed vs. an alternative?


I don't have anything rigorous, but I can say that I see the usual ~4x speedup when using rdflib to parse large files so that a 20 minute workload in cpython drops to 4 or 5 minutes when run on pypy3.

I just reran one of my usual benchmarks and I see 2mins for pypy3 (pypy 7.3.12 python 3.10.12) peak memory usage about 8gigs, 4.8mins for python3.11 (3.11.4) peak memory usage about 3.6gigs (2.4x speedup). On another computer running the exact same workload I see 6.3mins and 19mins (3x speedup) with the same peak memory usage.

I don't have any numbers on the dataset pipelines because I never ran them in production on cpython and went straight to pypy3. It is easy for me to switch between the two implementations in this context so I could run a side by side comparison (with the usual caveat that it would be completely non-rigorous).

I also have some internal notes related to a project that I didn't list because it isn't public, isn't in production, and the benchmarks were collected quite a while ago, but I see a 4x increase in throughput when pulling large amounts of data from a postgresql database from 20mbps on cpython 3.6 to 80mbps on pypy3.


I'm running a constrained convex optimization project at work, where we need as close to real time (<10s is great, <1min is acceptable) responses for a web interface.

Basically I'm using SciPy exclusively for the optimization routine:

* minimize(method="SLSQP") [0]

* A list comprehension that calls ~10-500 pre-fitted PchipInterpolator [1] functions and stores the values as an np.array().

The Pchip functions (and their first derivatives) are used in the main opt function as well as in several constraints.

Most jobs took about 10 seconds, but the long tail might sometimes take up to 10 min. I tried pypy 3.8 (7.3.9) and saw similar compute times on the shorter jobs, but roughly ~2x slower compute times on the heavier jobs. This obviously was not what I expected, but I had very limited experience with pypy and didn't know how to debug further.

Eventually Python 3.10 came around and gave a 1.25x speed increase, and then 3.11 gave another 1.6-1.7x, for a decent ~2x cumulative speedup; but the occasional heavy jobs still sit in the 5 min range and would obviously have been nicer at 10-30s.

Still I would like to say that trying pypy out was a quite smooth experience, staying within scipy land, took me half a day to switch and benchmark. But if anyone else has experience with pypy and scipy, knowing some obvious pitfalls, it would be much appreciated to hear.

[0] https://docs.scipy.org/doc/scipy/reference/optimize.minimize...

[1] https://docs.scipy.org/doc/scipy/reference/generated/scipy.i...
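
For anyone curious about the shape of the problem, here is a toy sketch of the pattern described above (made-up data and constraints, nothing like the real model):

  # Pre-fitted PCHIP interpolators feeding an SLSQP objective.
  import numpy as np
  from scipy.interpolate import PchipInterpolator
  from scipy.optimize import minimize

  x = np.linspace(0.0, 1.0, 20)
  curves = [PchipInterpolator(x, np.sin((i + 1) * x)) for i in range(10)]

  def objective(w):
      # Evaluate every pre-fitted curve at its weight and sum the results.
      return float(np.sum([c(wi) for c, wi in zip(curves, w)]))

  res = minimize(objective, x0=np.full(10, 0.5), method="SLSQP",
                 bounds=[(0.0, 1.0)] * 10,
                 constraints=[{"type": "ineq", "fun": lambda w: 5.0 - w.sum()}])
  print(res.x)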


If you find your bottlenecks in SciPy or Numpy, then PyPy will not help. Those are primarily written in C, so the PyPy JIT cannot peer inside and do any magic.


My effort was rather in trying to speed up all the python looping, etc. around the np calls. But I never went far trying to actually benchmark the entire pipeline in order to find out what was the actual bottleneck.


I don’t actually use PyPY, but I’m very aware of it. My understanding is that the only reason to use PyPy instead of the default Python is for performance gains. For the vast majority of projects I work on, the performance of our code on the CPU is almost never the bottleneck. The slowness is always in IO, databases, networks, etc.

That said, if I do ever run into a situation where I need my code to perform better, PyPy is high on my list of things to try. It’s nice to know it’s an option.


Hi Matti. I'm happy to see that you're doing community outreach. I haven't tried PyPy in a while. The general impression I have about PyPy is that as soon as you try to do anything a little bit complicated, things break in unexpected ways and there's little support. Also, I love using Wing IDE for debugging, and if I'm not mistaken it can't debug PyPy code.

I'm currently doing multi-agent reinforcement learning research using RLlib, which is part of Ray. I tried to install a PyPy environment for it. It failed because Ray doesn't provide a wheel for it:

    Could not find a version that satisfies the requirement ray (from versions: none)
My hunch is that even if Ray did provide that, there would have been some other roadblock that would have prevented me from using PyPy.


Wow, I didn't know anyone still uses Wing.

The modern debugging tools available in other IDEs work fine with PyPy (and have for years), so I guess that must be a wing issue.


Interesting, thank you. I think a big part of PyPy's problem is the long tail of esoteric tools and packages that different people use.


At Alooma (https://www.linkedin.com/mwlite/company/alooma) we've been running all our integrations with data sources using PyPy. Main motivation was indeed performance gains.

FWIW, since I've seen it mentioned, we've also been using psycopg2cffi to access Postgres sources.

The product now lives (at least partially) as Datastream on GCP (https://cloud.google.com/datastream/docs/overview). I'm not sure though if it's still running on PyPy.

I could try and connect with the folks still working on it, if you're interested.


Cool. Yes, I am interested in hearing more.


Sent you an intro email to the relevant person.


I'm building a bot detector api to use with our CDN and using pypy was decided on day one, without pypy the performance is just not there.

Also, in my day job we use pypy in all our Python deployments. To be fair, until now I thought that everybody would develop in python, test in pypy for an easy speed boost, and only go back to python if pypy was slower than cpython.


I hadn't heard about PyPy before, but I think you're doing great work.

I would be interested in seeing benchmarks where PyPy is compared with more recent versions of CPython. https://www.pypy.org/ currently shows a comparison with CPython 3.7, but recent releases of CPython (3.11+) put a lot of effort into performance which is important to take into account.


While the community is here: has anyone embedded pypy as a scripting language for some larger program? Like Inkscape, or scripting as part of a rule engine. Or is CPython more suitable for that?


It is much easier to embed CPython, PyPy can only be embedded via CFFI [0].

[0] https://cffi.readthedocs.io/en/latest/embedding.html


My biggest issue is that DataDog doesn’t support PyPy. Out of curiosity, I made a new branch of our app and took out DataDog and observed a significant improvement in performance when using PyPy vs CPython on the same branch (but can’t remember how much).


Do you mean the Python tracing library does not work out-of-the-box?

disclaimer: I work there but not on the APM team


Correct. I think DD only supports CPython.


I've never used it because the (unknown) effort of switching and the chance of compatibility issues have always made it unappealing compared to just switching to a faster language.

If I could just `pip3 install pypy` and then set an environment variable to use it or something like that then I'd give it a try. It does feel a bit like adding a jet pack to a rowing boat though. I know some people use Python in situations where the performance requirement isn't "I literally don't care" but surely not very many?

Obviously if it was the default that would be fantastic.


If you use a version manager like rtx or asdf then it’s basically that simple. I just had to run a single command:

    rtx use python@pypy3.10
This downloaded and installed PyPy v3.10 in a few seconds and created an .rtx.toml file in the current directory that ensures when I run python in that directory I get that version of PyPy.


A sub-question for the folks here: is anyone using the combination of gevent and PyPy for a production application? Or, more generally, other libraries that do deep monkey-patching across the Python standard library?

Things like https://github.com/gevent/gevent/issues/676 and the fix at https://github.com/gevent/gevent/commit/f466ec51ea74755c5bee... indicate to me that there are subtleties in how PyPy's memory management interacts with low-level tweaks like gevent, which have relied on often-implicit historical assumptions about memory management timing.

Not sure if this is limited to gevent, either - other libraries like Sentry, NewRelic, and OpenTelemetry also have low-level monkey-patched hooks, and it's unclear whether they're low-level enough that they might run into similar issues.

For a stack without any monkey-patching I'd be overjoyed to use PyPy - but between gevent and these monitoring tools, practically every project needs at least some monkey-patching, and I think that there's a lack of clarity on how battle-tested PyPy is with tools like these.
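
For context, the kind of deep monkey-patching I mean is the usual gevent pattern, roughly:

  # gevent swaps blocking stdlib primitives for cooperative ones at import time.
  from gevent import monkey
  monkey.patch_all()            # must run before anything else imports socket, time, ...

  import socket
  import gevent

  def probe(host):
      # This now yields to the gevent hub instead of blocking an OS thread.
      s = socket.create_connection((host, 80), timeout=5)
      s.close()

  gevent.joinall([gevent.spawn(probe, "example.com") for _ in range(3)])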


I don't. I work for a company where we always try to track the latest stable version of Python. Right now we are on 3.11, and unfortunately Pypy is lagging behind.


Hey, you might want to delete the link to https://mesapy.org/rpython-by-example in https://doc.pypy.org/en/latest/architecture.html as it is pointing to a resource that people are unable to access.


Thanks, that should have been https://mssun.github.io/rpython-by-example/index.html. Fixing.


I used it for real once over a decade ago, when I had to help some researchers who wanted to load an archive of Twitter JSON dumps into an RDBMS. This was basically cleaning/transliterating data fields into CSV that could bulk-import into PostgreSQL. I think we were using Python 2.7 back then.

1. The same naive deserialization and dict processing code ran much faster with PyPy.

2. Conveniently, PyPy also tolerated some broken surrogate pairs in Twitter's UTF8 stream, which threw exceptions when trying to decode the same events with the regular Python interpreter.

I've had some web service code where I wished I could easily swap to PyPy, but these were conservative projects using Apache + mod_wsgi daemons with SE-Linux. If there were a mod_wsgi_pypy that could be a drop-in replacement, I would have advocated for trials/benchmarking with the ops team.

Most other performance-critical work for me has been with combinations of numpy, PyOpenCL, PyOpenGL, and various imaging codecs like `tifffile` or piping numpy arrays in/out of ffmpeg subprocesses.


I've used it at work to speed up some standard Python code (without any c-bound library usage). It sped up the code by 5 times.

I've deployed it using the pypy:3.9 image on Docker.

One thing I did notice is that it was significantly faster on my local machine than when I tried to deploy it using AWS Lambda/Fargate. I know this is because of virtualization/virtual CPUs, but there was not much I could do to improve it.


If I'm not mistaken, at Buser Brasil, Brazilian Flix Bus, the destination search is powered by PyPy https://www.buser.com.br/


I actually donated to the Pypy project in the past but I don’t use it.

Two reasons for my hesitation:

1) Cpython is fast enough for most things I need to do. The speed improvement from Pypy is either not enough or not necessary.

2) Lingering doubts about subtle incompatibility (in terms of library support) that I might have to spend hours getting to the bottom of.

I already work long hours and don’t have bandwidth to tinker. With Cpython, although slow, I can be assured is the standard surface that everyone targets, and I can google solutions for.

It’s the subtle things that I waste a lot of time on. It’s analogous to an Ubuntu user trying to use Red Hat. They’re both Linuxes, but the way things are done is different enough that it trips you up.

The only way to get out of this quandary is for Pypy to be a first class citizen. Guido will never endorse this, which means a bunch of us will always hesitate to put it into production systems.


A bit meta. It seems like it would be nice to have no-action tickets for open source projects.

Quite often you want to just thank somebody, or say that you would prefer something a different way and don't understand why it is the way it is, or that it would be cool to have this or that. But of course, opening a ticket on GitHub feels like wasting the maintainer's time, and especially when you have feedback like what you would like to see or what you do and don't like, it feels entitled, because, well, you can do it yourself, you can fork, etc.

It would need to be low friction for both sides. Preferably with no way to respond, so that there's zero pressure and little wasted time for maintainers.

Mail feels like you want something; it works for a thank-you, but it still feels bad on the receiving end when you just ignore them.


You can comment on our blog, or open an issue. Frankly, we get so little feedback that dealing with new issues is not a hassle.


I understand, and reaching out like this seems like a great idea and time well spent. I meant it would be nice to have something like this in general, for different kinds of projects.


What is the compatibility of PyPy with a typical web server deployment? I am currently looking at testing compatibility with Tornado -> SQLAlchemy -> psycopg2. It seems like the C extensions are a common tripping point. I see the recommendation to use psycopg2cffi, but it seems that package's last release was in 2019 :(

SQL Alchemy actually points to PyPy in its recommendations of things to try in ORM performance. https://docs.sqlalchemy.org/en/20/faq/performance.html#resul...


The only compatibility issue for web development I've run into is database drivers.

For PostgreSQL, psycopg2 is not supported. psycopg2cffi is largely unmaintained, and the 2.9.0 version on PyPI lacks some newer features of psycopg2: the `psycopg2.sql` module, and empty result sets raise a RuntimeError in Python 3.7+. The latest commit on GitHub does have these changes [1]. Psycopg 3 [2] and pg8000 [3] (as user tlocke mentioned elsewhere) are viable alternatives provided you aren't stuck with older versions of PostgreSQL. I have to continue using psycopg2cffi until I can upgrade an old PostgreSQL 9.4 database.
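
The usual way to wire psycopg2cffi in (going from memory of its README, so double-check there) is its compatibility shim, so code and libraries that `import psycopg2` keep working:

  # Run this once, as early as possible (e.g. in the app entry point).
  from psycopg2cffi import compat
  compat.register()             # registers psycopg2cffi under the name "psycopg2"

  import psycopg2               # now resolves to psycopg2cffi
  print(psycopg2.__name__)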

For Microsoft SQL Server, pymssql does not support PyPy [4]. It's under new maintainership so it might gain support in the future. pypyodbc hasn't had any activity since 2022, and no new PyPI release since 2021 [5]. The datatypes returned can differ between libodbc1 versions. On Ubuntu 18.04 in particular: empty string columns are returned as a single space, integer columns are returned as a Decimal. Also, if you encounter a mysterious HY010 error ("Function sequence error"), you may need to upgrade libodbc1 to v2.3.7+ from v2.3.4 using the Microsoft repos.

[1]: https://github.com/chtd/psycopg2cffi [2]: https://pypi.org/project/psycopg/ [3]: https://pypi.org/project/pg8000/ [4]: https://github.com/pymssql/pymssql/pull/517 [5]: https://pypi.org/project/pypyodbc/


Through some back & forth with conda and pip (fighting dependencies), I have been able to get PyPy 3.9 running on my ARM64 Debian system. So far I am seeing a performance decrease of up to 2x. I have a series of REST API calls that encapsulate a single DB transaction - a mix of reads and writes. Most of it is leaning on SQLAlchemy, but we have been reaching for psycopg2 for some of the larger insert statements.

I was hoping to see some improvement in ORM performance (SQLAlchemy 1.3) - mainly on the bookkeeping side. Currently the app is about 60% Python app wait time and 40% DB wait time. We have a handful of noisy areas which emit a lot of statements (updating 1 row at a time, 10000 times via the ORM, for example).

I also tried cProfile to drill down, but as I've seen noted on Stack Overflow, that profiler has a larger impact on PyPy than on CPython.


I used it in a situation where replacing Python code with a C-implemented module was not efficient because there were too many small objects being marshaled in and out of PyObjects. PyPy let everything stay in Python-land and still run quickly.


I can't really use it at work, due to a restrictive corporate policy in place. I don't control my workstation setup, and I'm only allowed vanilla CPython there. Regarding this, I wish PyPy were pip-installable from inside CPython, like Hylang and Pyston are.

But while programming as a hobby at home, mostly small-scale simulations, PyPy is my default Python interpreter. It seems PyPy has a sweet spot for code that relies heavily on OOP style, with a lot of method calls and self invocation. I consistently get 8-10x speed improvements.


I use pypy as a drop-in replacement for CPython for some small data crunching scripts of my hobby projects. Might not count as "real work", but getting "free" speed ups is very nice and I'm very grateful for the PyPy project for providing a performant alternative to CPython.

I was close to trying pypy on a production django deployment (which gets ~100k views a month), but given that the tiny AWS EC2 instance we're running it on is memory bound, the increased pypy memory usage made it impractical to do so.


Years ago, yes I used it.

Nowadays, to be honest, everything that I need to be fast in Python is largely numerical code which either calls out to C/C++ (via numpy or some ML library) or which I use numba for. And these are either slower with PyPy or won't work.

HTTP web servers are notoriously slow in Python (even the fastest ones like falcon), but I found they either didn't play nicely with PyPy or weren't any faster. In large part because if the API does any kind of "heavy lifting" they can't be truly concurrent.


Can someone ELI5 why pypy doesn't or can't work with C-based packages like numpy or psycopg? I know nothing of how pypy does its magic.

If we could use pypy, while still using those packages, I think it'd be the go-to interpreter. Why can't pypy optimize everything else, and leave the C stuff as-is?

How does pypy handle packages written in other languages, like rust? can I use pypy if I depend on Pydantic?


Basically afaik the default C API (`Python.h`) is matched to CPython's internal representations, hence it's a pain to support it for alternative implementations and incurs cost penalties. The preferred way to interact with C code in pypy is through cffi (https://cffi.readthedocs.io/en/latest/) and ctypes (which afaik is implemented in pure python on top of cffi in pypy).
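
For the curious, the ABI-mode example from the cffi overview looks roughly like this (Linux/macOS, where `dlopen(None)` loads the C standard library):

  from cffi import FFI

  ffi = FFI()
  ffi.cdef("int printf(const char *format, ...);")   # declare the C function we want
  C = ffi.dlopen(None)                                # the C standard library
  arg = ffi.new("char[]", b"world")                   # a C char* owned by cffi
  C.printf(b"hi there, %s.\n", arg)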

Numpy, being itself written in C and C++, is strongly tied to the C API and has a complicated build process. Some stuff works and some doesn't (I didn't try recently). If you're invested in numerical Python you should most likely not use pypy but go for stuff like cython (like scipy does).

For psycopg apparently you can use psycopg2cffi (never tried).

> How does pypy handle packages written in other languages, like rust? can I use pypy if I depend on Pydantic?

PyO3 supports pypy so everything should be fine.


Lots of questions :)

For c-extensions see https://www.pypy.org/posts/2018/09/inside-cpyext-why-emulati...

We would like to be able to "just JIT" better. But for that we need feedback about what is still unreasonably slow, and resources to work on improving it. Right now PyPy is on a shoe-string budget of volunteers.

For rust, like CPython, use PyO3, which works with PyPy.

I am not sure about Pydantic. Sounds like a topic for someone to investigate on their codebase and tell us how PyPy does.


If it were relatively up to date with Python 3 I'd use it, but as it lags behind considerably I avoid it, even for personal work.


Python 3.10 is too old for your work?


In fact no, Python 3.10 is new enough.

There is still the lag, though: Python 3.10 was out for quite a while before PyPy supported 3.10.


I used PyPy extensively at a previous employer. The use case was to accelerate an application that was CPU-bound because of serde, which could not be offloaded using multiprocessing. PyPy resulted in a 10x increase in message throughput, and made the project viable in python. Without PyPy, we would have rebuilt the application in Java.


Can you tell us the obstacles to incorporating learnings from, or even backporting work from, PyPy back into CPython?


There is significant cooperation between the two. When porting PyPy to a new Python version, we examine what changes CPython made. And CPython core developers are aware of PyPy. The main obstacle is developer time. CPython has to be very careful about backward compatibility, which includes not making the interpreter slower for a few while faster for many.


I've actually come across and started using Pyjion recently (https://github.com/tonybaloney/pyjion); how does Pypy compare, both in terms of performance and purpose? There seems to be a lot of overlap...


I've never ended up using PyPy other than to play with it. Numba has worked very well for me for real code.


We have evaluated PyPy but actually found Pyston to be more performant in most of our use cases (even with extensive JIT warming). That project unfortunately seems unmaintained now, but I am hoping that the improvements will be upstreamed into CPython.


I don’t use it, but I’d like to.

The big obstacle is that for a while we would have multiple execution environments. It's not like we could flip a switch and all Dockerfiles would be using PyPy.

Plus I don’t think AWS Lambda supports it.

If I could go back in time, we would use it from the beginning.


I wonder if programs like Rye, which distribute Python in a way similar to Rust's rustup, can help. Rye already supports pypy; you can just pull down pypy3.9 at will into any particular Python project managed by Rye.


I don’t use it because I make frequent use of scientific libraries. If it were possible to use on a function by function basis, with a decorator like numba, I would definitely give it a go.


David Beazley - PyCon 2012 Keynote Talk (Tinkering with PyPy)

https://youtu.be/6_-5XZzJyt0


Yes, I have personally used it for some system admin tasks. I've used PyPy to write scripts and tools for automating tasks due to its performance benefits.


Not using it, but thank you for the work you put in; highly appreciated. We only thrive because so many of us put in the work :)


Question (newish to Python): could I use PyPy with dataframes / pandas / Ray?


PyPy is for eking out performance from vanilla Python. Most dataframe libraries are already written in something low-level like C or Rust (Polars), so you won't see gains (if they even run).


Time to add some opt-out telemetry! (runs for the hills)

So... thanks for not doing that.


I would like to, but aren't there issues using it with NumPy and SciPy?


Numpy works fine these days, haven’t tested scipy. I’d give it a go. But the C bits inside numpy won’t go faster.


I use it to speed up NEAT-Python for simulations


What kind of speed up do you get?


It saves around 10-20% in seconds


> Is anyone using PyPy for real work?

Yes.


I’ve been aware of it for a long time.

I don’t use it.

Why would I use it, what’s the compelling benefit?


Errm, nothing too serious. It's way faster for CPU-bound code, and allows micro-threads.

These two weird tricks tend to create wonders, tho.


From the project's homepage:

> A fast, compliant alternative implementation of Python

Performance without compromising too much on compatibility seems to be the main benefit. There is a talk on the YouTube channel «Pycon Sweden» from 5 years ago where the host showed some impressive speed gains for his workload (parsing black box dumps from planes).


It's a Python runtime that contains a JIT; as a result it can be phenomenally faster. Like with any JITted runtime, it depends a bit on what your code is doing and how long you're running it for, as there is a little (but honestly very little) bit of up-front overhead.


I liked Psyco a lot; it was totally awesome, with very few bugs (CPython differences), but that was looong ago. PyPy looks and feels like a monstrosity; for one, it takes longer to build than most software, which is off-putting. I would be more interested in a Python JIT that is more like what LuaJIT is to Lua.



