Async Python is not faster (calpaterson.com)
472 points by haybanusa on June 12, 2020 | 353 comments



How is this result surprising? The point of coroutines isn't to make your code execute faster, it's to prevent your process sitting idle while it waits for I/O.

When you're dealing with external REST APIs that take multiple seconds to respond, then the async version is substantially "faster" because your process can get some other useful work done while it's waiting. Obviously the async framework introduces some overhead, but that bit of overhead is probably a lot less than the 3 billion cpu cycles you'll waste waiting 1000ms for an external service.
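A minimal sketch of what I mean, with asyncio.sleep standing in for the slow external API (the one-second delay and the three calls are made up for illustration):

    import asyncio
    import time

    async def call_slow_api(name):
        await asyncio.sleep(1.0)  # stand-in for an external REST call that takes ~1s
        return f"{name}: done"

    async def main():
        start = time.perf_counter()
        # all three "requests" wait concurrently on one event loop,
        # so total wall-clock time is ~1s rather than ~3s
        results = await asyncio.gather(*(call_slow_api(f"api-{i}") for i in range(3)))
        print(results, f"{time.perf_counter() - start:.2f}s")

    asyncio.run(main())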


I think it is surprising to a lot of people who do take it as read that async will be faster.

As I describe in the first line of my article, I don't think that people who think async is faster have unreasonable expectations. It seems very intuitive to assume that greater concurrency would mean greater performance - at least on some measure.

> When you're dealing with external REST APIs that take multiple seconds to respond, then the async version is substantially "faster" because your process can get some other useful work done while it's waiting.

I'm afraid I also don't think you have this right conceptually. An async implementation that does multiple ("embarrassingly parallel") tasks in the same process - whether that is DB IO waiting or microservice IO waiting - is not necessarily a performance improvement over a sync version that just starts more workers and has the OS kernel scheduler organise things. In fact in practice an async version is normally lower throughput, higher latency and more fragile. This is really what I'm getting at when I say async is not faster.

Fundamentally, you do not waste "3 billion cpu cycles" waiting 1000ms for an external service. Making alternative use of the otherwise idle CPU is the purpose (and IMO the proper domain of) operating systems.


> Fundamentally, you do not waste "3 billion cpu cycles" waiting 1000ms for an external service. Making alternative use of the otherwise idle CPU is the purpose (and IMO the proper domain of) operating systems.

Sure, the operating system can find other things to do with the CPU cycles when a program is IO-locked, but that doesn't help the program you're currently trying to run.

> An async implementation that does multiple ("embarrassingly parallel") tasks in the same process - whether that is DB IO waiting or microservice IO waiting - is not necessarily a performance improvement over a sync version that just starts more workers and has the OS kernel scheduler organise things. In fact in practice an async version is normally lower throughput, higher latency and more fragile. This is really what I'm getting at when I say async is not faster.

You're right. "Arbitrary programs will run faster" is not the promise of Python async.

Python async does help a program work faster in the situation that phodge just described (waiting for web requests, or waiting for a slow hardware device), since the program can do other things while waiting for the locked IO (unlike a Python program that does not use async and could only proceed linearly through its instructions). That's the problem that Python asyncio purports to solve. It is still subject to the Global Interpreter Lock, meaning it's still bound to one thread. (Python's multiprocessing library is needed to overcome the GIL and break a program out into multiple processes, at the cost that cross-process communication now becomes expensive.)


> unlike a Python program that does not use async and could only proceed linearly through its instructions

This isn't how it works. While Python is blocked in I/O calls, it releases the GIL so other threads can proceed. (If the GIL were never released then I'm sure they wouldn't have put threading in the Python standard library.)

> Python's multiprocessing library is needed to overcome the GIL

This is technically true, in that if you are running up against the GIL then the only way to overcome it is to use multiprocessing. But blocking IO isn't one of those situations, so you can just use threads.

The comparison here is not async vs just doing one thing. It's async vs threads. I believe that's what the performance comparison in the article is about, and if threads were as broken as you say then obviously they wouldn't have performed better than asyncio.
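To make that concrete, here's a small sketch showing that blocking calls release the GIL, so a plain thread pool overlaps the waits (time.sleep stands in for a blocking socket read; the numbers are illustrative):

    import time
    from concurrent.futures import ThreadPoolExecutor

    def blocking_io(n):
        time.sleep(1.0)  # releases the GIL while waiting, like a blocking read would
        return n

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=10) as pool:
        results = list(pool.map(blocking_io, range(10)))
    print(results, f"{time.perf_counter() - start:.2f}s")  # ~1s total, not ~10s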

--------

As an aside, many C-based extensions (e.g. numpy and scipy) also release the GIL when performing CPU-bound computations. So the GIL doesn't even prevent you from using multithreading in CPU-heavy applications, so long as they are relatively large operations (e.g. a few calls to multiply huge matrices together would parallelise well, but many calls to multiply tiny matrices together would heavily contend the GIL).
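And a hedged sketch of the "few large operations" case (the matrix sizes are arbitrary, and whether you see a real multi-core speedup depends on your BLAS build):

    import threading
    import numpy as np

    a = np.random.rand(2000, 2000)
    b = np.random.rand(2000, 2000)

    def multiply():
        # the BLAS call behind np.dot runs in C with the GIL released,
        # so these threads can actually execute on separate cores
        np.dot(a, b)

    threads = [threading.Thread(target=multiply) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()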


> > Python's multiprocessing library is needed to overcome the GIL

> No it's not, just use threads.

I just wanted to expand on this a little to describe some of the downsides to threads in Python.

Multi-threaded logic can be (and often is) slower than single-threaded logic because threading introduces overhead of lock contention and context switching. David Beazley did a talk illustrating this in 2010:

https://www.youtube.com/watch?v=Obt-vMVdM8s

He also did a great talk about coroutines in 2015 where he explores threading and coroutines a bit more:

https://www.youtube.com/watch?v=MCs5OvhV9S4&t=525s

In workloads that are often "blocked", like network calls or I/O-bound workloads, threads can provide similar benefits to coroutines but with overhead. Coroutines seek to provide the same benefit without as much overhead (no lock contention, fewer context switches by the kernel).

These probably aren't the right guidelines for everyone, but I generally use them when thinking about concurrency (and pseudo-concurrency) in Python:

- Coroutines where I can.

- Multi-processing where I need real concurrency.

- Never threads.


Ah ha! Now we have finally reached the beginning of the conversation :-)

The point is, many people think (including you judging by your comment, and certainly including me up until now but now I'm just confused) that in Python asyncio is better than using multiple threads with blocking IO. The point of the article is to dispel that belief. There seems to be some debate about whether the article is really representative, and I'm very curious about that. But then the parent comment to mine took us on an unproductive detour based on the misconception that Python threads don't work at all. Now your comment has brought up that original belief again, but you haven't referenced the article at all.


I didn't reference the article because I provided more detailed references which explore the difference between threads and coroutines in Python to a much greater depth.

The point of my comment is to say that neither threads nor coroutines will make Python _faster_ in and of themselves. Quite the opposite, in fact: threading adds overhead, so unless the benefit is greater than the overhead (e.g. lock contention and context switching) your code will actually be net slower.

I can't recommend the videos I shared enough, David Beazley is a great presenter. One of the few people who can do talks centered around live coding that keep me engaged throughout.

> The point is, many people think (including you judging by your comment, and certainly including me up until now but now I'm just confused) that in Python asyncio is better than using multiple threads with blocking IO. The point of the article is to dispel that belief.

The disconnect here is that this article isn't claiming that asyncio is not faster than threads. In fact the article only claims that asyncio is not a silver bullet guaranteed to increase the performance of any Python logic. The misconception it is trying to clear up, in its own words, is:

> Sadly async is not go-faster-stripes for the Python interpreter.

What I, and many others are questioning is:

A) Is this actually as widespread a belief as the article claims it to be? None of the results are surprising to me (or apparently some others).

B) Is the article accurate in its analysis and conclusion?

As an example, take this paragraph:

> Why is this? In async Python, the multi-threading is co-operative, which simply means that threads are not interrupted by a central governor (such as the kernel) but instead have to voluntarily yield their execution time to others. In asyncio, the execution is yielded upon three language keywords: await, async for and async with.

This is a really confusing paragraph because it seems to mix terminology. A short list of problems in this quote alone:

- Async Python != multi-threading.

- Multi-threading is not co-operatively scheduled; threads are indeed interrupted by the kernel (context switches between threads in Python do actually happen).

- Asyncio is co-operatively scheduled and pieces of logic have to yield to allow other logic to proceed. This is a key difference between asyncio (coroutines) and multi-threading (threads); see the sketch after this list.

- Asynchronous Python can be implemented using coroutines, multi-threading, or multi-processing; it's a common noun but the quote uses it as a proper noun leaving us guessing what the author intended to refer to.
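To make the co-operative scheduling point concrete, a small sketch (timings illustrative): a coroutine only gives the event loop a chance to run other work when it hits an await, so a blocking call stalls everything.

    import asyncio
    import time

    async def well_behaved():
        await asyncio.sleep(1.0)   # yields: other coroutines run during the wait

    async def badly_behaved():
        time.sleep(1.0)            # never yields: the whole event loop stalls here

    async def main():
        start = time.perf_counter()
        await asyncio.gather(well_behaved(), well_behaved())
        print(f"co-operative: {time.perf_counter() - start:.2f}s")  # ~1s

        start = time.perf_counter()
        await asyncio.gather(badly_behaved(), badly_behaved())
        print(f"blocking: {time.perf_counter() - start:.2f}s")      # ~2s

    asyncio.run(main())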

Additionally, there are concepts and interactions missing from the article, such as the GIL's scheduling behavior. In the second video I shared, David Beazley actually shows how the GIL gives compute-intensive tasks higher priority, which is the opposite of typical scheduling priorities (e.g. kernel scheduling) and leads to adverse latency behavior.

So looking at the article as a whole, I don't think the underlying intent of the article is wrong, but the reasoning and analysis presented is at best misguided. Asyncio is not a performance silver bullet; it's not even real parallelism. Multi-processing and use of C extensions are the bigger bang for the buck when it comes to performance. But none of this is surprising; it's expected if you really think about the underlying interactions.

To rephrase what you think I thought:

> The point is, many people think (including you judging by your comment, and certainly including me up until now but now I'm just confused) that in Python asyncio is better than using multiple threads with blocking IO.

Is actually more like:

> Asyncio is more efficient than multi-threading in Python. It is also comparatively more variable than multi-processing, particularly when dealing with workloads that saturate a single event loop. Neither multi-threading nor asyncio is actually parallel in Python; for that you have to use multi-processing to escape the GIL (or some C extension which you trust to safely execute outside of GIL control).

---

Regarding your aside example, it's true some C extensions can escape the GIL, but oftentimes it's with caveats and careful consideration of where/when you can escape the GIL successfully. Take, for example, this scipy cookbook regarding parallelization:

https://scipy-cookbook.readthedocs.io/items/ParallelProgramm...

It's not often the case that using a C extension will give you truly concurrent multi-threading without significant and careful code refactoring.


For single processes you’re right, but this article (and a lot of the activity around asyncio in Python) is about backend webdev, where you’re already running multiple app servers. In this context, asyncio is almost always slower.


> But blocking IO isn't one of those situations, so you can just use threads.

Threads and async are not mutually exclusive. If your system resources aren't heavily loaded, it doesn't matter, just choose the library you find most appropriate. But threads require more system overhead, and eventually adding more threads will reduce performance. So if it's critical to thoroughly maximize system resources, and your system cannot handle more threads, you need async (and threads).
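For illustration, a sketch of the "async and threads" combination via the standard run_in_executor bridge (the pool size and the fake blocking call are placeholders):

    import asyncio
    import time
    from concurrent.futures import ThreadPoolExecutor

    def blocking_call(n):
        time.sleep(1.0)  # e.g. a library that only offers a blocking API
        return n * 2

    async def main():
        loop = asyncio.get_running_loop()
        with ThreadPoolExecutor(max_workers=4) as pool:
            # the event loop stays free for other coroutines while the
            # blocking work runs on the small thread pool
            results = await asyncio.gather(
                *(loop.run_in_executor(pool, blocking_call, n) for n in range(4))
            )
        print(results)

    asyncio.run(main())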


> But threads require more system overhead, and eventually adding more threads will reduce performance.

Absolutely false. OS threads are orders of magnitude lighter than any Python coroutine implementation.


> OS threads are orders of magnitude lighter than any Python coroutine implementation.

But Python threads, which add extra weight on top of a cross-platform abstraction layer over the underlying OS threads, are not lighter than Python coroutines.

You aren't choosing between Python threads and unadorned OS threads when writing Python code.


You're absolutely right.

I'm pointing out that this is a Python problem, not a threads problem, a fact which people don't understand.


Everyone has been discussing the relative performance of different techniques within Python; there is neither a basis to suggest that people don't understand which aspects of that are Python-specific, nor a reason to think that distinction is even particularly relevant to the discussion.


Okay, then let's do a bakeoff! You outfit a Python webserver that only uses threads, and I'll outfit an identical webserver that also implements async. The server that handles the most requests/sec wins. I get to pick the workload.


FWIW, I have a real world Python3 application that does the following:

- receives an HTTP POST multipart/form-data that contains three file parts. The first part is JSON.

- parses the form.

- parses the JSON.

- depending upon the JSON accepts/rejects the POST.

- for accepted POSTs, writes the three parts as three separate files to S3.

It runs behind nginx + uwsgi, using the Falcon framework. For parsing the form I use streaming-form-data which is cython accelerated. (Falcon is also cython accelerated.)

I tested various deployment options. cpython, pypy, threads, gevent. Concurrency was more important than latency (within reason). I ended up with the best performance (measured as highest RPS while remaining within tolerable latency) using cpython+gevent.

It's been a while since I benchmarked and I'm typing this up from memory, so I don't have any numbers to add to this comment.


Each Linux thread has at least an 8MB virtual memory overhead. I just tested it, and was able to create one million coroutines in a few seconds and with a few hundred megabytes of overhead in Python. If I created just one thousand threads, it would take possibly 8 gigs of memory.


Virtual memory is not memory. You're effectively just bumping an offset, there's no actual allocations involved.

> ...it would take possibly 8 gigs of memory.

No. Nothing is 'taken' when virtual memory is requested.


But have you tried creating one thousand OS threads and measuring the actual memory usage? If I recall correctly, I read an article explaining that threads on Linux don't actually claim their 8MB each quite so literally. I need to recheck that later.


You're right, I've read the same. Using Python 3.8, creating 12,000 threads with `time.sleep` as the target clocks in at 200MB of resident memory.
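Roughly how you might reproduce that kind of measurement (Linux-only via the resource module; you may need to raise ulimits, and the exact figure depends on platform and thread stack size):

    import resource
    import threading
    import time

    threads = [threading.Thread(target=time.sleep, args=(60,), daemon=True)
               for _ in range(12_000)]
    for t in threads:
        t.start()

    # ru_maxrss is reported in kilobytes on Linux; the 8MB per thread mentioned
    # above is a *virtual* stack reservation, which is why resident memory stays low
    print(f"peak RSS: {resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024:.0f} MB")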


People seem to keep misunderstanding the GIL. It's the Global Interpreter Lock, and it's effectively the lock around all Python objects and structures. This is necessary because Python objects have no thread ownership model, and the development team does not want per-object locks.

During any operation that does not need to modify Python objects, it is safe to unlock the GIL. Yielding control to the OS to wait on I/O is one such example, but doing heavy computation work in C (e.g. numpy) can be another.


To clarify that the CPython devs aren't being arbitrary here: There have been attempts at per-object or other fine-grained locking, and they appear to be less performant than a GIL, particularly for the single-threaded case.

Single-threaded performance is a major issue as that's most Python code.


Yes. I expect generic fine-grained locking, especially per-object locks, to be less performant for multi-threaded code too, as locks aren't cheap, and even with the GIL, lock overhead could still be worse than a good scheduler.

Any solution which wants to consider per-object locking has to consider removing refcounting, or locking the refcount bits separately, as locking/unlocking objects to twiddle their refcounts is going to be ridiculously expensive.

Ultimately, the Python ownership and object model is not conducive to proper threading, as most objects are global state and can be mutated by any thread.


Instead of disagreeing with some of your vague assertions I'll just make my own points for people that want to consider using async.

Workers (which usually live in a new process) are not efficient. Processes are extremely expensive and subjectively harder for exception handling. Threads are lighter weight, and even better are async implementations that use a much more scalable FSM to handle this.

Offloading work to things not subject to the GIL is the reason async Python got so much traction. It works really well.


This is often a point of confusion for people when looking at Erlang, Elixir or Go code. Concurrency beyond leveraging available CPUs doesn't really add any advantage.

On the web, where the bulk of your application code's time is spent waiting on APIs, database queries, external caches or disk I/O, it creates a dramatic increase in the capacity of your server if you can do that waiting with minimal RAM overhead.

It's one of the big reasons I've always wanted to see Techempower create a test version that continues to increase concurrency beyond 512 (as high as maybe 10k). I think it would be interesting.


> On the web, where the bulk of your application code's time is spent waiting on APIs, database queries, external caches or disk I/O, it creates a dramatic increase in the capacity of your server if you can do that waiting with minimal RAM overhead.

Python doesn't block on I/O.


Of course it does.


It releases the GIL.

Edit: sorry I can do better.

If you're using async/await to not block on I/O while handling a request, you still have to wait for that I/O to finish before you return a response. Async adds overhead because you schedule the coroutine and then resume execution.

The OS is better at scheduling these things because it can do it in kernel space in C. Async/await pushes that scheduling into user space, sometimes in interpreted code. Sometimes you need that, but very often you don't. This is in conflict with "async the world", which effectively bakes that overhead into everything. This explains the lower throughput, higher latency, and higher memory usage.

So effectively this means "run more processes/threads". If you can only have 1 process/thread and cannot afford to block, then yes async is your only option. But again that case is pretty rare.


From my understanding the primary use of concurrency in Erlang/Elixir is for isolation and operational consistency. Do you believe that not to be the case?


The primary use of concurrency in Erlang is modelling a world that is concurrent.

If you go back to the origins of Erlang, the intent was to build a language that would make it easier to write software for telecom (voice) switches; what comes out of that is one process for each line, waiting for someone to pick up the line and dial or for an incoming call to make the line ring (and then connecting the call if the line is answered). Having this run as an isolated process allows for better system stability --- if someone crashes the process attached to their line, the switch doesn't lose any of the state for the other lines.

It turns out that a 1980s design for operational excellence works really well for (some) applications today. Because the processes are isolated, it's not very tricky to run them in parallel. If you've got a lot of concurrent event streams (like users connected via XMPP or HTTP), assigning each a process makes it easy to write programs for them, and because Erlang processes are significantly lighter weight than OS processes or threads, you can have millions of connections to a machine, each with its own process.

You can absolutely manage millions of connections in other languages, but I think Erlang's approach to concurrency makes it simpler to write programs to address that case.


That's a big topic. The shortest way I can summarize it though:

Immutable data, per-process heap isolation and a lack of shared state, combined with supervision trees made possible by extremely low-overhead concurrency, and preemptive scheduling to prevent any one process from taking over the CPU... create that operational consistency.

It's a combination of factors that have gone into the language design that make it all possible though. Very big and interesting topic.

But it does create a significant capacity increase. Here's a simple example with websockets.

https://dockyard.com/blog/2016/08/09/phoenix-channels-vs-rai...


this is true for compiled languages like the ones you mention, but generally does not apply to Python, which as an interpreted language tends to add CPU overhead for even the smallest tasks.


A CPU can do billions of operations every second. When you have 200ms for every request, that overhead is not that large; you're still blocked by I/O.


for local services like databases, real world benchmarks disagree.


You should add that you mean just databases. I've just looked at your profile and as I understand it's your focus.

I built a service that was making a lot of requests. Enough that at some point we ran into the 65k connection limit for basic Linux polling (we needed to switch to kpoll). Some time after that we ran out of other resources, and switching from threads to threads+greenlets really solved our problem.


>... is not necessarily a performance improvement over a sync version that just starts more workers and has the OS kernel scheduler organise things.

This is very true, especially when actual work is involved.

Remember, the kernel uses the exact same mechanism to have a process wait on a synchronous read/write as it does for a process issuing epoll_wait. Furthermore, isolating tasks into their own processes (or, sigh, threads) allows the kernel scheduler to make much better decisions, such as scheduling fairness and QoS to keep the system responsive under load surges.

Now, async might be more efficient if you serve extreme numbers of concurrent requests from a single thread if your request processing is so simple that the scheduling cost becomes a significant portion of the processing time.

... but if your request processing happens in Python, that's not the case. Your own scheduler implementation (your event loop) will likely also end up eating some resources (remember, you're not bypassing anything, just duplicating functionality), and is very unlikely to be as smart or as fair as that of the kernel. It's probably also entirely unable to do parallel processing.

And this is all before we get into the details of how you easily end up fighting against the scheduler...


Yeah except nodejs will beat flask in this same exact benchmark. Explain that.


CPython doesn't have a JIT, while node.js does. If you want to compare apples to apples, try looking at Flask running on PyPy.


Ed: after reading the article, I guess it's safe to say that everything below is false :)

---

I'd guess the c++ event loop is more important than the jit?

Maybe a better comparison is quart (with eg uvicorn)

https://pgjones.gitlab.io/quart/

https://www.uvicorn.org/

Or Sanic / uvloop?

https://sanicframework.org/

https://github.com/MagicStack/uvloop


Plain sanic runs much faster than the uvicorn-ASGI-sanic stack used in the benchmark, and the ASGI API in the middle is probably degrading other async frameworks' performance too. But then this benchmark also has other major issues, like using HTTP/1.0 without keep-alive in its Nginx proxy_pass config (keep-alive again has a huge effect on performance, and would be enabled on real performance-critical servers). https://sanic.readthedocs.io/en/latest/sanic/nginx.html


Interesting, thank you. I wasn't aware nginx was so conservative by default.

https://nginx.org/en/docs/http/ngx_http_proxy_module.html#pr...


You're not completely off. There might be issues with async/await overhead that would be solved by a JIT, but also if you're using asyncio, the first _sensible_ choice to make would be to swap out the default event loop with one actually explicitly designed to be performant, such as uvloop's one, because asyncio.SelectorEventLoop is designed to be straightforward, not fast.

There's also the major issue of backpressure handling, but that's a whole other story, and not unique to Python.

My major issue with the post I replied to is that there are a bunch of confounding issues that make the comparison given meaningless.


The database is the bottleneck. JIT or even C++ shouldn't even be a factor here. Something is wrong with the python implementation of async await.


If I/O-bound tasks are the problem, that would tend to indicate an issue with the I/O event loop, not with Python and its async/await implementation. If the default asyncio.SelectorEventLoop is too slow for you, you can subclass asyncio.AbstractEventLoop and implement your own, such as building one on top of uvloop. And somebody's already done that: https://github.com/MagicStack/uvloop
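For reference, swapping in uvloop is roughly this much code (a sketch; application code using async/await is unchanged, and any speedup depends entirely on the workload):

    import asyncio
    import uvloop  # pip install uvloop

    # replace the default pure-Python SelectorEventLoop with the libuv-based loop
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

    async def main():
        await asyncio.sleep(0)

    asyncio.run(main())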

Moreover, even if there's _still_ a discrepancy, unless you're profiling things, the discussion is moot. This isn't to say that there aren't problems (there almost certainly are), but that you should get as close as possible to an apples-to-apples comparison first.


When I talk about async await I'm talking about everything that encompasses supporting that syntax. This includes the I/O event loop.

So really we're in agreement. You're talking about reimplementing python specific things to make it more performant, and that is exactly another way of saying that the problem is python specific.


No, we're not in agreement. You're confounding a bunch of independent things, and that is what I object to.

It's neither fair nor correct to mush together CPython's async/await implementation with the implementation of asyncio.SelectorEventLoop. They are two different things and entirely independent of one another.

Moreover, it's neither fair nor correct to compare asyncio.SelectorEventLoop with the event loop of node.js, because the former is written in pure Python (with performance only tangentially in mind) whereas the latter is written in C (libuv). That's why I pointed you to uvloop, which is an implementation of asyncio.AbstractEventLoop built on top of libuv. If you want to even start with a comparison, you need to eliminate that confounding variable.

Finally, the implementation matters. node.js uses a JIT, while CPython does not, giving them _much_ different performance characteristics. If you want to eliminate that confounding variable, you need to use a Python implementation with a JIT, such as PyPy.

Do those two things, and then you'll be able to do a fair comparison between Python and node.js.


Except the problem here is that those tests were bottlenecked by IO. Whether you're testing C++, pypy, libuv, or whatever it doesn't matter.

All that matters is the concurrency model, because the application he's running is barely doing anything except IO; anything outside of IO becomes negligible, because after enough requests those sync worker processes will all be spending the majority of their time blocked on an IO request.

The basic essence of the original claim is that sync is not necessarily better than async for all cases of high IO tasks. I bring up node as a counter example because that async model IS Faster for THIS same case. And bringing up node is 100% relevant because IO is the bottleneck, so it doesn't really matter how much faster node is executing as IO should be taking most of the time.

Clearly and logically the async concurrency model is better for these types of tasks so IF tests indicate otherwise for PYTHON then there's something up with python specifically.

You're right, we are in disagreement. I didn't realize you completely failed to understand what's going on and felt the need to do an apples to apples comparison when such a comparison is not Needed at all.


No, I understand. I just think that your comparison with _node.js_ when there are a bunch of confounding variables is nonsense. Get rid of those and then we can look at why "nodejs will beat flask in this same exact benchmark".


> I just think that your comparison with _node.js_ when there are a bunch of confounding variables is nonsense

And I'm saying all those confounding variables you're talking about are negligible and irrelevant.

Why? Because the benchmark test in the article is a test where every single task is 99% bound by IO.

What each task does is make a database call AND NOTHING ELSE. Therefore you can safely say that for either a python or Node request, less than 1% of a single task will be spent on processing while 99% of the task is spent on IO.

You're talking about scales on the order of 0.01% vs. 0.0001%. Sure maybe node is 100x faster, but it's STILL NEGLIGIBLE compared to IO.

It is _NOT_ nonsense.

You Do not need an apples to apples comparison to come to the conclusion that the problem is Specific to the python implementation. There ARE NO confounding variables.


> And I'm saying all those confounding variables you're talking about are negligible and irrelevant.

No, you're asserting something without actual evidence, and the article itself doesn't actually state that either: it contains no breakdown of where the time is spent. You're assuming the issue lies in one place (Python's async/await implementation) when there are a bunch of possible contributing factors _which have not been ruled out_.

Unless you've actually profiled the thing and shown where the time is used, all your assertions are nonsense.

Show me actual numbers. Prove there are no confounding variables. You made an assertion that demands evidence and provided none.


>Unless you've actually profiled the thing and shown where the time is used, all your assertions are nonsense.

It's data science that is causing this data-driven attitude to invade people's minds. Do you not realize that logic and assumptions play a big role in drawing conclusions WITHOUT data? In fact, if you're a developer you know about a way to DERIVE performance WITHOUT a single data point or benchmark or profile. You know about this method, you just haven't been able to see the connections, and your model of how this world works (data-driven conclusions only) is highly flawed.

I can look at two algorithms and I can derive with logic alone which one is O(N) and which one is O(N^2). There is ZERO need to run a benchmark. The entire theory of complexity is a mathematical theory used to assist us in arriving AT PERFORMANCE conclusions WITHOUT EVIDENCE/BENCHMARKS.
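For instance, by inspection alone you can classify the two duplicate checks below as O(N^2) and O(N) (a made-up illustration):

    def has_duplicate_quadratic(items):
        # compares every pair: O(N^2)
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                if items[i] == items[j]:
                    return True
        return False

    def has_duplicate_linear(items):
        # single pass with a set: O(N)
        seen = set()
        for item in items:
            if item in seen:
                return True
            seen.add(item)
        return False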

Another thing you have to realize is the importance of assumptions. Things like: 1 + 1 = 2 will always remain true, and a profile or benchmark run on a specific task is an accurate observation of THAT task. These are both reasonable assumptions to make about the universe. They are also the same assumptions YOU are making every time you ask for EVIDENCE and benchmarks.

What you aren't seeing is this: The assumptions I AM making ARE EXACTLY THE SAME: reasonable.

>you're asserting something without actual evidence, and the article itself doesn't actually state that either: it contains no breakdown of where the time is spent

Let's take it from the top shall we.

I am making the assumption that tasks done in parallel ARE Faster than tasks done sequentially.

The author specifically stated he made a server where each request fetches a row from the database. And he is saying that his benchmark consisted of thousands of concurrent requests.

I am also making the assumption that for thousands of requests and thousands of database requests MOST of the time is spent on IO. It's similar to deriving O(N) from a for loop. I observe the type of test the author is running and I make a logical conclusion on WHAT SHOULD be happening. Now you may ask: why is it a reasonable assumption that IO specifically takes up most of the time of a single request? Because all of web development is predicated on this assumption. It's the entire reason why we use inefficient languages like python, node or Java to run our web apps instead of C++, because the database is the bottleneck. It doesn't matter if you use python or ruby or C++, the server will always be waiting on the db. It's also a reasonable assumption given my experience working with python and node and databases. Databases are the bottleneck.

Given this highly reasonable assumption, and in the same vein as using complexity theory to derive performance speed, it is highly reasonable for me to say that the problem IS PYTHON SPECIFIC. No evidence NEEDED. 1 + 1 = 2. I don't need to put that into my calculator 100 times to get 100 data points for some type of data driven conclusion. It's assumed and it's a highly reasonable assumption. So reasonable that only an idiot would try to verify 1 + 1 = 2 using statistics and experiments.

Look, you want data and no assumptions? First you need to get rid of the assumption that a profiler and a benchmark are accurate and truthful. Profile the profiler itself. But then you're making another assumption: the profiler that profiled the profiler is accurate. So you need to get me data on that as well. You see where this is going?

There is ZERO way to make any conclusion about anything without making an assumption. And Even with an assumption, the scientific method HAS NO way of proving anything to be true. Science functions on the assumption that probability theory is an accurate description of events that happen in the real world AND even under this assumption there is no way to sample all possible EVENTS for a given experiment so we can only verify causality and correlations to a certain degree.

The truth is blurry and humans navigate through the world using assumptions, logic and data. To intelligently navigate the world you need to know when to make assumptions and when to use logic and when data driven tests are most appropriate. Don't be an idiot and think that everything on the face of the earth needs to be verified with statistics, data and A/B tests. That type of thinking is pure garbage and it is the same misguided logic that is driving your argument with me.


Buddy, you can make all the "logical arguments" you want, but if you can't back them up with evidence, you're just making guesses.


Nodejs is faster than Python as a general rule, anyway. As I understand, Nodejs compiles Javascript, Python interprets Python code.

I do a lot of Django and Nodejs and Django is great to sketch an app out, but I've noticed rewriting endpoints in Nodejs directly accessing postgres gets much better performance.

Just my 2c


CPython, the reference implementation, interprets Python. PyPy interprets and JIT compiles Python, and more exotic things like Cython and Grumpy statically compile Python (often through another, intermediate language like C or Go).

Node.js, using V8, interprets and JIT compiles JavaScript.

Although note that, while Node.js is fast relative to Python, it's still pretty slow. If you're writing web-stuff, I'd recommend Go instead for casually written, good performance.


The comparison of Django against no-ORM is a bit weird, given that rewriting your endpoint in Python without Django or an ORM would also have produced better results, I suppose.


Right but this test focused on concurrent IO. The bottleneck is not the interpreter but the concurrency model. It doesn't matter if you coded it in C++, the JIT shouldn't even be a factor here because the bottleneck is IO and therefore ONLY the concurrency model should be a factor here. You should only see differences in speed based off of which model is used. All else is negligible.

So you have two implementations of async that are both bottlenecked by IO. One is implemented in node. The other in python.

The node implementation behaves as expected in accordance with theory, meaning that for thousands of IO bound tasks it performs faster than a fixed number of sync worker threads (say 5 threads).

This makes sense right? Given thousands of IO bound tasks, eventually all 5 threads must be doing IO and therefore blocked on every task, while the single threaded async model is always context switching whenever it encounters an IO task so it is never blocked and it is always doing something...

Meanwhile the python async implementation doesn't perform in accordance with theory. 5 async workers are slower than 5 sync workers on IO bound tasks. 5 sync workers should eventually be entirely blocked by IO and the 5 async workers should never be blocked ever... Why is the python implementation slower? The answer is obvious:

It's python specific. It's python that is the problem.


JIT compiler.


Bottleneck is IO. Concurrency model should be the limiting factor here.

NodeJS is faster than flask because of the concurrency model and NOT because of the JIT.

The python async implementation being slower than the python sync implementation means one thing: Something is up with python.

The poster implies that with the concurrency model the outcome of these tests are expected.

The reality is, these results are NOT expected. Something is going on specifically with the python implementation.


You mean express.js ?


NodeJS primitives are enough to produce the same functionality as flask without the need for an extra framework.


Async IO was in large part a response to "how can my webserver handle xx thousand connections per second" (or in the case of Erlang "how do you handle millions of phone calls at once"). Starting 15 threads to do IO works great, but once you wait for hundreds of things at once the overhead from context switching becomes a problem, and at some point the OS scheduler itself becomes a problem


Not really. At least on Linux, the scheduler is O(1). There is no difference between one process waiting for 10k connections, or 10k processes waiting for 1 each. And there is hardly a context switch either, if all these 10k processes use the same memory map (as threads do).

I've tested this extensively on Linux. There is no more CPU used for threads vs epoll.

On the other hand, if you don't get the epoll implementation exactly right, you may end up with many spurious calls. E.g. simply reading slow data from a socket in golang on Linux incurs considerable overhead: a first read that is short, another read that returns EWOULDBLOCK, and then a syscall to re-arm the epoll. With OS threads, that is just a single call, where the next call blocks and eventually returns new data.

Edit: one thing I haven't considered when testing is garbage collection. I'm absolutely convinced that up to 10k connections, threads or async doesn't matter, in C or Rust. But it may be much harder to do GC over 10k stacks than over 8.


I recently read a blog post with benchmarks showing that, for well-written C in their use case, async IO only becomes faster than using threads from around 10k parallel connections. (Though the difference was negligible.)

This also seems to be a major motivation behind io_uring.


I don't think this is true? At least, I've never seen the issue of OS threads be that context switching is slow.

The issue is memory usage, which OS threads take a lot of.

Would userland scheduling be more CPU efficient? Sure, probably in many cases. But I don't think that's the problem with handling many thousands of concurrent requests today.


> is not necessarily a performance improvement over a sync version that just starts more workers and has the OS kernel scheduler organise things

Co-routines are not necessarily faster than threads, but they yield a performance improvement if one has to spin up thousands of them: they have less creation overhead and consume less RAM.


> Co-routines are not necessarily faster than threads, but they yield a performance improvement if one has to spin up thousands of them: they have less creation overhead and consume less RAM.

This hardly matters when spinning up a few thousand threads. Only memory that's actually used is committed, one 4k page at a time. What is 10MB these days? And that is main memory, while it's much more interesting what fits in cache. At that point it doesn't matter if your data is in heap objects or on a stack.

Add to that the fact that Python stacks are mostly on the heap, the real stack growing only due to nested calls in extensions. It's rare for a stack in Python to exceed 4k.


Languages that do green threads don't do them for memory savings, but to save on context switches when a thread is blocked and cannot run. System threads are scheduled by the OS, green threads by the language runtime, which saves a context switch.


Green threads are scheduled by the language runtime and by the OS. If the OS switches from one thread to another in the same process, there is no context switch, really, apart from the syscall itself which was happening anyway (the recv that blocks and causes the switch). At least not on Linux, where I've measured the difference.


This is not what is happening with flask/uwsgi. There is a fixed number of threads and processes with flask. The threads are only parallel for io and the processes are parallel always.


Which is fine until you run out of uwsgi workers because a downstream gets really slow sometime. The point of async python isn't to speed things up, it's so you don't have to try to guess the right number of uwsgi workers you'll need in your worst case scenario and run with those all the time.


Yep, and this test being shown is actually saying that about 5 sync workers acting on thousands of requests are faster than python async workers.

Theoretically it makes no sense. A task manager executing tasks in parallel to IO instead of blocking on IO should be faster... So the problem must be in the implementation.


> I think it is surprising to a lot of people who do take it as read that async will be faster.

Literally the first thing any concurrency course starts with in the very first lesson is that scheduling and context overhead are not negligible. Is it so hard to expect our professionals to know basic principles of what they are dealing with?


> think it is surprising to a lot of people who do take it as read that async will be faster.

This is because when they are first shown it, the examples are faster, effectively at least, because they get given jobs done in less wallclock time due to reduced blocking.

They learn that, but often don't get told (or work out themselves) that in many cases the difference is so small as to be unmeasurable, or that in other circumstances there can be negative effects (overheads others have already mentioned in the framework, more things waiting in RAM with a part-processed working set which could lead to thrashing in a low-memory situation, greater concurrent load on other services such as a database and the IO system it depends upon, etc.).

As a slightly off-the-topic-of-async example, back when multi-core processing was first becoming cheap enough that it was not just affordable but the default option, I had great trouble trying to explain to a colleague why two IO-intensive database processes he was running were so much slower than when I'd shown him the same process (I'd run them sequentially). He was absolutely fixated on the idea that his four cores should make concurrency the faster option; I couldn't get through to him that in this case the flapping heads on the drives of the time were the bottleneck, and the CPU would be practically idle no matter how many cores it had while the bottleneck was elsewhere.

Some people learn the simple message (async can handle some loads much more efficiently) as an absolute (async is more efficient) and don't consider at all that the situation may be far more nuanced.


> An async implementation that does multiple ("embarrassingly parallel") tasks in the same process

You mean concurrent tasks in the same process?


> I don't think that people who think async is faster have unreasonable expectations

I do.

And I don't think I'm alone nor being unreasonable.


> The point of coroutines isn't to make your code execute faster, it's to prevent your process sitting idle while it waits for I/O.

This is a quintessential example of not seeing the forest for the trees.

The point of coroutines is absolutely to make my code execute faster. If a completely I/O-bound application sits idle while it waits for I/O, I don't care and I should not care because there's no business value in using those wasted cycles. The only case where coroutines are relevant is when the application isn't completely I/O bound; the only case where coroutines are relevant is when they make your code execute faster.

It's been well-known for a long time that the majority of processes in (for example) a webserver are I/O bound, but there are enough exceptions to that rule that we need a solution to situations where the process is bound by something else, i.e. CPU. The classic solution to this problem is to send off CPU-bound processes to a worker over a message queue, but that involves significant overhead. So if we assume that there's no downside to making everything asynchronous, then it makes sense to do that--it's not faster for the I/O bound cases, but it's not slower either, and in the minority-but-not-rare CPU-bound case, it gets us a big performance boost.

What this test is doing is challenging the assumption that there's no downside to making everything asynchronous.

In context, I tend to agree with the conclusion that there are downsides. However, those downsides certainly don't apply to every project, and when they do, there may be a way around them. The only lesson we can draw from this is that gaining benefit from coroutines isn't guaranteed or trivial, but there is much more compelling evidence for that out there.


> The point of coroutines is absolutely to make my code execute faster.

I think rather the point is to make your APPLICATION either finish in less time, or to not take MORE time when given more load.

The code runs as fast as it runs, coroutines notwithstanding.


> > The point of coroutines is absolutely to make my code execute faster.

> I think rather the point is to make your APPLICATION either finish in less time, or to not take MORE time when given more load.

Potato potato.


Well, sure, anything can mean anything if you're willing to redefine what words mean.


The meaning of words is determined by usage. Usage of words is determined by the meaning. This circular definition causes the inherent problem of language: words don't have inherent meaning. The best I can do is to attempt to use words in a way similar to the way that you use words, but I can only ever make an educated guess about how you use words, so it's never going to be perfect.

And from my perspective, I don't think it's unreasonable for me to expect you to try to understand what I'm trying to communicate, rather than attempting to force me to use different words. The burden of communication is shared by both speaker and listener.


“Faster” is not a well defined technical term. It is a piece of natural language that can easily refer to max time, mean time, P99, latency, throughput, price per watt, etc. depending on context.


This is not what this article is about.

The surprising conclusion of the article is that in a realistic scenario, the async web frameworks will output fewer requests/sec than the sync ones.

I'm very familiar with Python concurrency paradigms, and I wasn't expecting that at all.

Add to that zzzeek's article (the guy who wrote SQLA...) stating async is also slower for db access, and this makes async less and less appealing, given the additional complexity it adds.

Now, apart from doing a crawler, or needing to support websockets, I find it hard to justify asyncio. In fact, with David Beazley hinting that you probably can get away with spawning a 1000 threads, it raises more doubts.

The whole point of async was that, at least when dealing with a lot of concurrent I/O, it would be a win compared to threads+multiprocessing. If just by cranking the number of sync workers you get better results for less complexity, this is bad.


As far as I can tell, the main cost of threads is 2-4MB of memory usage for stack space, so async allows saving memory by allowing one thread to process more than one task. A big deal if you have a server with 1GB of memory and want to handle 100,000 simultaneous connections, like Erlang was designed for. But if the server has enough memory for as many threads that are needed to cover the number of simultaneous tasks, is there still a benefit?


Now the $1000 question would be: if you pay for the context switching of BOTH threads and asyncio, having 5 processes, each with 20 threads, and an event loop within each, what happens?

Is the price of the context switching too high, or are you compensating for the weakness of each system by handling I/O concurrently in async, but smoothing out the blocking code outside of the await thanks to threads?

Making a _clean_ benchmark for it would be really hard, though.


Answering my own comment because I can't edit it anymore, but this article has started a heated debate on Twitter.

The author of "black" suggested that the cause of the slowdown may be that asyncio actually starved postgres of resources:

https://twitter.com/llanga/status/1271719783080366086


> When you're dealing with external REST APIs that take multiple seconds to respond, then the async version is substantially "faster" because your process can get some other useful work done while it's waiting. Obviously the async framework introduces some overhead, but that bit of overhead is probably a lot less than the 3 billion cpu cycles you'll waste waiting 1000ms for an external service.

but threads get you the same thing with much less overhead. this is what benchmarks like this one and my own continue to confirm.

People often are afraid of threads in Python because "the GIL!" But the GIL does not block on IO. I think programmers reflexively reaching for Tornado or whatever don't really understand the details of how this all works.


but threads get you the same thing with much less overhead.

That is not true, at least not in general; the whole point of using continuations for async I/O is to avoid the overhead of using threads: the scheduler overhead, the cost of saving and restoring the processor state when switching tasks, the per-thread stack space, and so on.


The scheduler overhead and the cost of context switches are vastly overstated compared to the alternatives. The per-thread stack space in effect has virtually no run-time cost, and starting off at a single 4k page per stack, thousands of threads still only waste a minuscule amount of memory.


async implementations build a scheduler into the runtime, and that's generally slower than the OS' scheduler. 10-100x slower if it's not in C (or whatever).


The GIL might not block on I/O, but the implementation that uses PyObject does need the GIL, no?


I get enraged when articles like this get upvotes. The evidence given doesn't at all negate the reasoning behind using async, which, as you said, is about not having to be blocked by IO, not a freaking throughput test for an unrealistic scenario. It just goes to show the complete lack of understanding of the topic. I wouldn't dare write something up if I didn't 100% grasp it, but the bar is way lower for some others, it seems.


I don't know the async Python specifics, but from what I understand, you don't necessarily need async to handle a large number of IO requests; you can simply use non-blocking IO and check back on it synchronously, either in some loop or at specific places in your program.

The use of async, whether as callbacks, user threads, or coroutines, is a convenience layer for structuring your code. As I understand it, that layer does add some overhead, because it captures an environment and has to later restore it.
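For example, a bare-bones version of that hand-rolled pattern using the stdlib selectors module (an echo-server sketch; the port is arbitrary and partial-write/error handling is omitted):

    import selectors
    import socket

    sel = selectors.DefaultSelector()

    server = socket.socket()
    server.bind(("localhost", 8765))
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ)

    # a hand-rolled readiness loop: no async/await, no callbacks,
    # just checking which sockets are ready and handling them in turn
    while True:
        for key, _events in sel.select(timeout=1.0):
            if key.fileobj is server:
                conn, _addr = server.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                data = key.fileobj.recv(4096)
                if data:
                    key.fileobj.sendall(data)  # echo back
                else:
                    sel.unregister(key.fileobj)
                    key.fileobj.close()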


I'm starting to wonder what the origin story is for titles like this. Have CS programs dropped the ball? Did the author snooze through these fundamentals? Or are they a reaction to coworkers who have demonstrated such an educational gap?

Async and parallel always use more CPU cycles than sequential. There is no question. The real questions are: do you have cycles to burn, will doing so bring the wall-clock time down, and is it worth the complexity of doing so?


I think it's because "async" has been overloaded. The post isn't about what I thought it would be upon seeing the title.

I was thinking this would be about using multiprocessing to fire off two or more background tasks, then handle the results together once they all completed. If the background tasks had a large enough duration, then yeah, doing them in parallel would overcome the overhead of creating the processes and the overall time would be reduced (it would be "faster"). I thought this post would be a "measure everything!" one, after they realized for their workload they didn't overcome that overhead and async wasn't faster.

Upon what the post was about, my response was more like "...duh".


Obviously the async framework introduces some overhead, but that bit of overhead is probably a lot less than the 3 billion cpu cycles you'll waste waiting 1000ms for an external service.

Waiting for I/O does not usually waste any CPU cycles; the thread is not spinning in a loop waiting for a response, the operating system will simply not schedule the thread until the I/O request has completed.


Sigh. Async is somewhat orthogonal to parallel.

You are making dinner. You start to boil water for the potatoes. While that happens, you prepare the beef. Async.

You and your girlfriend are making dinner. You do the potatoes, she does the beef. Parallel.

You can perhaps see how you could have asynchronous and parallel execution at the same time.

In the context of a Web server, a request is handled by a single Python process (so don’t give me that “OS scheduler can do other things”). Async matters here because your request turnover can be higher, even if the requests/sec remains the same.

In the cooking example, each request gets a single cook. If that cook is able to do things asynchronously, he will finish a single meal faster.

If it were only parallel, you could have more cooks - because they would be less demanding - but they would each be slower.


> In the cooking example, each request gets a single cook. If that cook is able to do things asynchronously, he will finish a single meal faster.

There is a bit of nuance here, in that the async-chef would make any individual meal slower than a sync-chef, once the number of outstanding requests is large. The sync-chef would indeed have overall higher wait times, but each meal would process just as fast as normal (eg. more like a checkout line at a grocery store).

I prefer the grocery store checkout line metaphor for this reason. If a single clerk was "async" and checking out multiple people at once, all the people in line would have roughly the same average wait time for a small line size. A "sync" clerk would have a longer line with people overall waiting longer, but each individual checkout would take the same amount of time once the customer managed to reach the clerk.

This is pertinent when considering the resources utilized during the job. If a sync clerk only ever holds a single database connection, while an async clerk holds one for every customer they try to check out at the same time, the sync clerk will be far more friendly to the database (but less friendly to the customers, when there aren't too many customers at once).


I think you managed to miss the point: the async chef is doing other stuff necessary to fulfill a single order when he can, i.e., while the potatoes are boiling. The sync chef has to wait for the potatoes to boil; only when those are done can he start to fry the beef.

The sync chef doesn't occupy the frying pan when he's boiling potatoes, so in some sense he only really does as much as he can. Having hundreds of sync chefs would likely be more efficient in terms of order volume, _but not order latency._


I do not disagree with that; my point was just that you are not wasting clock cycles. You may, however, as you pointed out, be wasting time while waiting for I/O to complete, time you could potentially make better use of by spending some more clock cycles, while the I/O operation is in progress, on work which is not dependent on the I/O result.


I didn’t mean to disagree with you, I just wanted to put my take on it out there



> How is this result surprising? The point of coroutines isn't to make your code execute faster, it's to prevent your process sitting idle while it waits for I/O.

It depends on what you mean by "faster". HTTP requests are IO bound, thus it is to be expected that the throughput of an IO-bound service benefits from a technology that prevents your process from sitting idle while waiting for IO.

Thus it's surprising that Python's async code performs worse, not better, in both throughput and latency.

> When you're dealing with external REST APIs that take multiple seconds to respond, then the async version is substantially "faster"

The findings reported in the blog post you're commenting on are the exact opposite of your claim: Python's async performs worse than its sync counterpart.


We need to stop saying “faster” with regards to async. The point of async was always about fitting more requests per compute resource and/or making systems more latency-consistent under load.

“Faster” is misleading because the speed improvements that you get with async are very dependent on load. At low levels there are typically going to be negligible or no speed gains, but at higher levels the benefit will be incredibly obvious.

The one caveat to this is cases where async allows you to run two requests in parallel, rather than sequentially. I would argue that this is less about async than it is about concurrency, and how async can make some concurrent workloads more ergonomic to program.


you just contradicted yourself:

> “Faster” is misleading

and

> "At low levels there is going to typically be negligible or no speed gains, but at higher levels the benefit will be incredibly obvious."

there are no "speed" gains, period. the same amount of work will be accomplished in the same amount of time with threads or async. async makes it more memory efficient to have a huge number of clients waiting concurrently for results on slow services, but the point at which all of those clients walk off with their data will not be reached "faster" than with threads.

the reason that asyncio advocates say that asyncio is "faster" is based on the notion that the OS thread scheduler is slow, and that async context switches are some combination of less frequent and more efficient such that async is faster. This may be the case for other languages but for Python's async implementations it is not the case, and benchmarks continue to show this.
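
To make that concrete, here is a rough sketch (a toy, not a rigorous benchmark): a thousand concurrent one-second waits finish in roughly one second whether you use a thread pool or asyncio; the difference is memory and scheduling overhead, not wall-clock time.

    import asyncio
    import time
    from concurrent.futures import ThreadPoolExecutor

    N = 1000

    def slow_call_sync(_):
        time.sleep(1)            # stands in for a blocking read from a slow service

    async def slow_call_async():
        await asyncio.sleep(1)   # stands in for a non-blocking read from a slow service

    def with_threads():
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=N) as pool:
            list(pool.map(slow_call_sync, range(N)))
        return time.perf_counter() - start

    async def with_asyncio():
        start = time.perf_counter()
        await asyncio.gather(*(slow_call_async() for _ in range(N)))
        return time.perf_counter() - start

    if __name__ == "__main__":
        print("threads:", with_threads())              # roughly 1 second
        print("asyncio:", asyncio.run(with_asyncio())) # roughly 1 second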


I did not contradict myself; saying that async is “faster” implies speed gains in all circumstances. In reality the benefits of async io are extremely load dependent, which is why I don’t want to call it “faster”.


The other thing about async is that, in some scenarios, it can make shared resource use clearer - i.e. in a program I've written, the design is such that one type on one thread (a producer) owns the data and passes it to consumers directly, rather than trying to deal with lock-free algorithms and mutexes for sharing the data and suchlike. A multi-threaded ring buffer is much less clearly correct than a single-threaded one.
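
A rough sketch of that shape, assuming an asyncio.Queue-based design rather than a literal ring buffer: the producer coroutine owns the data and hands items over on a single thread, so no locks are needed around the shared buffer.

    import asyncio

    async def producer(queue):
        for item in range(10):
            await queue.put(item)   # single-threaded: no mutex needed around the buffer
        await queue.put(None)       # sentinel telling the consumer to stop

    async def consumer(queue):
        while (item := await queue.get()) is not None:
            print("consumed", item)

    async def main():
        queue = asyncio.Queue(maxsize=4)  # bounded, so the producer naturally backs off
        await asyncio.gather(producer(queue), consumer(queue))

    asyncio.run(main())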


> but that bit of overhead is probably a lot less than the 3 billion cpu cycles you'll waste waiting 1000ms for an external service.

You are not waiting for that 1000ms, and you haven't been for 35 years, since the first OSes started featuring preemptive multitasking.

When you wait on a socket, the OS will remove you from the CPU and schedule something that is not waiting in your place. When data is ready, you are placed back. You aren't wasting the CPU cycles waiting, only the ones the OS needs to save your state.

Actually standing there and waiting on the socket is not a thing people have done for a long time.


> You are not waiting for that 1000ms, and you haven't been for 35 years, since the first OSes started featuring preemptive multitasking.

The point is that async IO allows your own process/thread to progress while waiting for IO. Preemptive multitasking just assigns the CPU to something else while waiting, which is good for the box as a whole, but not necessarily productive for that one process (unless it is multithreaded).


Sync I/O still lets your process (though not the blocking thread) do something else. In other languages, async I/O is faster because it avoids context switches and amortizes kernel crossings. Apparently this is not the case in practice for python.

This doesn’t surprise me at all, as I’ve had to deal with async python in production, and it was a performance and reliability nightmare compared to the async Java and C++ it interacted with.


>it's to prevent your process sitting idle while it waits for I/O.

...with the goal of making your application faster.


... no. With the goal of allowing concurrency without parallelism.

In doing that, you're removing natural parallelism, and end up competing with the kernel scheduler, both in performance and in scheduling decisions.


This is a lazy argument. We get it, you know what coroutines are and how the kernel scheduler works (so does everyone else in this thread).

That doesn't matter though. If you think the average python user is looking for "concurrency without parallelism" with no speed/performance goal in mind, you totally have the wrong demographic.

The fact that the language chose to implement asyncio on a single thread (again, the end user doesn't care that this is the case; it could have been a thread/core abstraction like goroutines), with little gain, which led to a huge fragmentation of its library ecosystem, is bad. Even worse that it was done in 2018. Doesn't matter how smart you are about the internals.


How in the world did you come to the conclusion that I thought Python users wanted that? I simply concluded that it's the only thing it provides. I wasn't saying it was a good thing, which I think was what you might have read it as.

Python implements things on a single thread due to language restrictions (or rather, reference implementation restrictions), as the GIL, as always, disallows parallel interpreter access, so multiple Python threads serve little purpose other than waiting for sync I/O. It's been many years since I followed Python development, but back then all GIL removal work had unfortunately come to a halt...


> ...with the goal

> ... no. With the goal

I assumed those meant the end user of the language (it is fair to assume the person you responded to meant that). The goal of the language itself was probably to stay trendy - e.g. JS/Golang/Nim/Rust/etc had decent async stories, whereas python didn't. Python needed async syntax support as the threading and multiprocessing interfaces were clunky compared to others in the space. What they ended up with arguably isn't good.

I'm pretty familiar with those restrictions, which is why I expected this thread to be more of "yeah it sucks that it's slower" instead of pulling the "coroutines don't technically make anything faster per se" argument, which is distracting.


I see this elitist attitude all over the internet. First it was people saying “Guys why are you over reacting to corona the flu is worse.”

Then it was people saying “Guys, stop buying surgical masks, The science says they don’t work it’s like putting a rag over your mouth.”

All of these so-called expert know-it-alls were wrong, and now we have another expert on asynchronous python telling us he knows better and he’s not surprised. No dude, you’re just another guy on the internet pretending he’s a know-it-all.

If you are any good, you’ll realize that nodejs will beat the flask implementation any day of the week and the nodejs model is exactly identical to the python async model. Nodejs blew everything out of the water, and it showed that asynchronous single threaded code was better for exactly the test this benchmark is running.

It’s not obvious at all. Why is the node framework faster than python async? Why can’t python async beat python sync when node can do it easily? What is the specific flaw within python itself that is causing this? Don’t answer that question because you don’t actually know, man. Just do what you always do and wait for a well-intentioned, humble person to run a benchmark, then comment on it with your elitist know-it-all attitude claiming you’re not surprised.

Is there a word for these types of people? They are all over the internet. If we invent a label maybe they’ll start becoming self aware and start acting more down to earth.


> Nodejs blew everything out of the water

Node's JIT comes from a web browser's javascript implementation used by billions of people. It's also had async baked in from day one.

Python started single process, added threading, and then bolted async on top of that. And CPython is a pretty straight interpreter.

A comparison between Node and PyPy would be more informative, but PyPy has a far less mature JIT and still has to deal with Python's dynamism.

> If we invent a label maybe they’ll start becoming self aware and start acting more down to earth.

You can't lecture people into self-awareness, any more than experts can lecture everyone into wearing masks.


Except IO is the bottleneck here. The concurrency model for IO should determine overall speed. If python async is slower for IO tasks than sync, then that IS an unexpected result and an indication of a python-specific problem.


> Except IO is the bottleneck here.

If you say IO is the bottleneck, then you're claiming there is no significant difference between python and node. That's what a bottleneck means.

> The concurrency model for IO should determine overall speed.

"Speed" is meaningless, it's either latency or throughput. Yeah, yeah, sob in your pillow about how mean elites are, clean up your mascara, and learn the correct terminology.

We've already claimed the concurrency model is asynchronous IO for both python and node. Since they are both doing the same basic thing, setting up an event loop and polling the OS for responses, it's not an issue of which has a superior model.

> If python async is slower for IO tasks than sync, then that IS an unexpected result and an indication of a python-specific problem.

Both sync and async IO have their own implementations. If you read from a file synchronously, you're calling out to the OS and getting a result back with no interpreter involvement. This[2] is a simple single-threaded server in C. All it does is tell the kernel, "here's my IO, wake me up when it's done."

When you do async work, you have to schedule IO and then poll for it. This[1] is an example of doing that with epoll in straight C. Polling involves more calls into the kernel to tell it what events to look for, and then the application has to branch through different possible events.
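
Roughly what that register-then-poll pattern looks like from Python, as a sketch built on the stdlib selectors module (which wraps epoll/kqueue and is what asyncio's selector event loop builds on):

    import selectors
    import socket

    sel = selectors.DefaultSelector()

    def accept(server_sock):
        conn, _addr = server_sock.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, echo)   # tell the kernel what to watch

    def echo(conn):
        data = conn.recv(1024)
        if data:
            conn.send(data)
        else:
            sel.unregister(conn)
            conn.close()

    server = socket.socket()
    server.bind(("localhost", 8080))
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, accept)

    while True:
        # extra kernel round trips: ask which registered fds are ready,
        # then branch on the event and dispatch back in the interpreter
        for key, _events in sel.select():
            key.data(key.fileobj)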

And you can't avoid this if you want to manage IO asynchronously. If you use synchronous IO in threading or processes, you're still constructing threads or processes. (Which makes sense if you needed them anyway.)

So unless an interpreter builds its synchronous calls on top of async, sync necessarily has less involvement with both the kernel and interpreter.

The reason the interpreter matters is because the latency picture of async is very linear:

* event loop wakes up task
* interpreter processes application code
* application wants to open / read / write / etc
* interpreter processes stdlib adding a new task
* event loop wakes up IO task
* interpreter processes stdlib checking on task
* kernel actually checks on task

Since an event loop is a single-threaded operation, each one of these operations is sequential. Your maximum throughput, then, is limited by the interpreter being able to complete IO operations as fast as it is asked to initiate them.

I'm not familiar enough with it to be certain, but Node may do much of that work in entirely native code. Python is likely slow because it implements the event loop in python[3].

So, not only is Python's interpreter slower than Node's, but it's having to shuffle tasks in the interpreter. If Node is managing a single event loop all in low level code, that's less work it's doing, and even if it's not, Node can JIT-compile some or all of that interpreter work.

[1]: https://github.com/o0myself0o/epoll/blob/master/epoll.c

[2]: https://www.programminglogic.com/example-of-client-server-pr...

[3]: https://github.com/python/cpython/blob/3.8/Lib/asyncio/unix_...


>If you say IO is the bottleneck, then you're claiming there is no significant difference between python and node. That's what a bottleneck means.

This is my claim: this SHOULD be what's happening, under the obvious logic that tasks handled in parallel with IO should be faster than tasks handled sequentially, and under the assumption that IO takes up way more time than local processing.

Like I said, given that this is NOT happening within the python ecosystem, and assuming the axioms above are true, this indicates a flaw that is python-specific.

>The reason the interpreter matters is because the latency picture of async is very linear:

I would say it shouldn't matter if done properly because the local latency picture should be a fraction of the time when compared to round trip travel time and database processing.

>Python is likely slow because it implements the event loop in python

Yeah, we're in agreement. I said it was a python specific problem.

Take a single task in this benchmark for python. If the interpreter spends more time processing the task locally than the total round-trip travel time and database processing time, then the database is faster than python. If database calls are faster than python, then this is a python-specific issue.


You're making the classic mistake of assuming a common thread connects the people who've annoyed you in various unrelated contexts.


I mean no one even mentioned node. Maybe it is faster idk. But we're talking about python?


His async code creates a pool with only 10 max connections[1] (the default). Whereas his sync pool[2], with a flask app that has 16 workers, has significantly more database connections.

I expect upping this number would have a positive effect on asyncio numbers because the only thing[3] this[4] is[5] measuring[6] is how many database connections you have, and is about as far from a realistic workload as you can get.

Change your app to make 3 parallel requests to httpbin, collect the responses and insert them into the database. That's an actually realistic asyncio workload rather than a single DB query on a very contested pool. I'd be very interested to see how sync frameworks fare with that.

1. https://github.com/calpaterson/python-web-perf/blob/master/a...

2. https://github.com/calpaterson/python-web-perf/blob/master/s...

3. https://github.com/calpaterson/python-web-perf/blob/master/a...

4. https://github.com/calpaterson/python-web-perf/blob/master/a...

5. https://github.com/calpaterson/python-web-perf/blob/master/a...

6. https://github.com/calpaterson/python-web-perf/blob/master/a...
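
For concreteness, the shape I mean is roughly this (aiohttp for the outbound requests; insert_responses is just a stand-in for whatever async DB layer you use):

    import asyncio
    import aiohttp

    async def fetch_json(session, url):
        async with session.get(url) as resp:
            return await resp.json()

    async def handle_request(db_pool):
        async with aiohttp.ClientSession() as session:
            bodies = await asyncio.gather(
                fetch_json(session, "https://httpbin.org/get"),
                fetch_json(session, "https://httpbin.org/uuid"),
                fetch_json(session, "https://httpbin.org/ip"),
            )
        # insert_responses is a placeholder for your actual DB write
        await insert_responses(db_pool, bodies)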


Hi - as mentioned in the article all connections went through pgbouncer (limited to 20) and I was careful to ensure that all configurations saturated the CPU so I'm pretty confident they were not waiting on connections to open. Opening a connection from pgbouncer over a unix socket is very fast indeed - my guess is perhaps a couple of orders of magnitude faster than without it. 20 connections divided by 4 CPUs is a lot, and pretty much all CPU time was still spent in Python.

Sidenote here: one thing I found but didn't mention (the reason I put in the pooling, both in Python and pgbouncer) is that otherwise, under load, the async implementations would flood postgres with open connections and everything would just break down.
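
For reference, capping the pool on the Python side with aiopg looks roughly like this (the DSN and sizes here are illustrative, not the exact benchmark settings):

    import aiopg

    async def make_pool():
        # without an explicit cap, every in-flight request can end up
        # holding its own postgres connection under load
        return await aiopg.create_pool(
            "host=127.0.0.1 port=6432 dbname=test",
            minsize=4,
            maxsize=10,
        )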

I think making a database query and responding with JSON is a very realistic workload. I've coded that up many times. Changing it to make requests to other things (mimicking a microservice architecture) is also interesting and if you did that I'd be interested to read your write up.


Aren't you still capping the throughput by the query rate of your connection pool though? By limiting that, you are limiting the application as a whole - i.e. your benchmark is bound by the speed of your database, and has (almost) nothing to do with the performance of a specific python implementation.


Only if there are spare resources left to saturate the connection pools, which didn't seem to be the case.

If the system as a whole is well saturated, and the python processes dominate the system load with a DB load proportional to the requests served, then I don't think we would hit any external bottlenecks.

The benchmarks performed are not that great (e.g., virtualized, same machine for all components, etc.), but I don't think the errors are enough to throw off the result. Note, of course, that such results are not universal, and some loads might perform better async.


If the benchmark is bound by the database speed, wouldn't the expected result be that all implementations returned roughly the same number of requests per second?


>Sidenote here: one thing I found but didn't mention (the reason I put in the pooling, both in Python and pgbouncer) is that otherwise, under load, the async implementations would flood postgres with open connections and everything would just break down.

Doesn't this prove that async is waiting for connections when you put a limit on it? The only way async wins is if it is free to hit the db whenever it needs to.


But why is async spending so much CPU if it just waits?


Who knows. The point is, if, when not restricted, you get a ton of db connections, then any restriction on that almost definitely means you are imposing a bottleneck. The only way this would not be the case is if it was trying to create db connections when it didn't need them, which is unlikely.


So the CPU and database are the bottlenecks not async Python.


The benchmark is certainly flawed, but I don't see how you can jump to that conclusion.


> His async code creates a pool with only 10 max connections[1] (the default). Whereas his sync pool[2], with a flask app that has 16 workers, has significantly more database connections.

And the reasoning is explained in the article:

"The rule I used for deciding on what the optimal number of worker processes was is simple: for each framework I started at a single worker and increased the worker count successively until performance got worse."


That is talking about WSGI worker processes. OP is talking about database pool connections. They are not the same thing.


Seems many commenters missed this statement. It's also troubling how common it is to hear assertions that async is king, especially on projects where your future scale is unknown. Based on the https://web.archive.org/web/20160203172420/https://www.maili... presentation, it looks like there is a stronger case for a sync model as the default.


> Change your app to make 3 parallel requests to httpbin, collect the responses and insert them into the database. That's an actually realistic asyncio workload

I don't see how that is a more "realistic" asyncio workload.

It might be a workload that async is better suited for, but the point of the article is to compare async web frameworks, which will often be used just to fetch and return some data from the db.

If you had an endpoint which needed to fetch 3 items from httpbin and insert them in the db it may make sense to use asyncio tools for that, even within the context of a web app running under a sync framework+server like Falcon+Gunicorn.

In my experience Python web apps (Django!) often spend surprisingly little time waiting on the db to return results, and a relatively large amount of time cpu-bound instantiating ORM model instances from the db data, then transforming those instances back into primitive types that can be serialized to JSON in an HTTP response. In that context I am not surprised if sync server with more processes is performing better. In this test it's not even that bad... the 'ORM' seems to be returning just a tuple which is transformed to a dict and then serialized.


This raises an interesting point - async is less well suited to languages that are, well, slow as molasses. If your language is so slow that basic operations dominate even network IO, you're not going to gain much.


I should add that when I said above "I am not surprised if sync server with more processes is performing better"... that's only after reading this article and thinking about it

until then I'd pretty much bought the hype that the new async frameworks running on Uvicorn were the way to go

I'm very glad to see this kind of comparative test being made, it's very useful, even if it later gets refined and added to and the results more nuanced


He only has 4 CPUs. I doubt raising the worker count is going to help the async situation. From my experience it’s really hard to make async outperform sync when databases are involved because the async layer adds so much overhead. Only when you are completely io bound with lots of connections does async outperform sync in python.


> From my experience it’s really hard to make async outperform sync when databases are involved because the async layer adds so much overhead

Highly disagree as the database is just another IO connection to a server, which is asyncio bread and butter. Being able to stream data from longer running queries without buffering and whilst serving other requests (and making other queries) is really quite powerful.

But yeah, if you're maxing out your database with sync code then async isn't going to make it magically go faster.


Hi, take a look at my benchmarks from five years ago at https://techspot.zzzeek.org/2015/02/15/asynchronous-python-a.... The extra variable with Python is that it's a very CPU-heavy interpreted language and it's really unlikely for an application to be significantly IO bound with a database server on the same network within the realm of CRUD-style queries. asyncio was significantly slower than threads (noting they've made performance improvements since then) and gevent was about the same (which I'm pretty sure is close to as fast as you can get for async in Python).


The database is mostly just idle IO. You send a query and then you wait for results. That’s something sync python is decent at because when you wait for that IO the GIL is released. The situation is different if there is a lot of activity on the epoll/kqueue etc. (connects, data ready etc.).


Apologies - I completely misread your initial comment. Yeah that's correct.

Despite this I think it's quite rare to hit this limit, at least in the orchestration-style use cases I use asyncio for. With those I value making independent progress on a number of async tasks rather than potentially being blocked waiting for a worker thread to become available.


Before you criticize the article, you should read it. He wrote a whole section about the specific worker numbers and why and how he chose them.


On top of that, the author uses aiopg rather than asyncpg[1] for the async database operations, even though asyncpg is (allegedly) a whole lot faster.

1. https://github.com/MagicStack/asyncpg


asyncpg is not scalable. It can only do "session pooling" because it needs advisory_locks, listen/notify, which will end up needing a lot of Postgresql connections.


Can you share more information on this (articles, etc)?


There is no 1 article to explain but you can research each part.

1. One Postgresql connection is a forked process and has memory overhead (4MB iirc) + context switching.

2. A connection can only execute 1 concurrent query (no multiplexing).

3. Asyncpg, to be fast, uses the features that I mentioned in my parent post. Those can only be used with session pooling: https://www.pgbouncer.org/features.html.

The whole point of async is to do some other work while waiting for a query (e.g. a different query).

If you have 10 servers with 16 cores, each vcore has 1 python process, each python process doing 10 simultaneous queries. 10 * 16 * 10 = 1600 opened connections.

The best way IMHO is to use autocommit connections. This way your transactions execute in 1 RPC. You can keep multiple connections open with very light CPU cost, and pooling works best this way.

I've done 20K short lived queries/second from 1 process with only ~20 connections opened in Postgresql (using Pgbouncer statement pooling).
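
A rough sketch of the autocommit style I mean, using psycopg2 (the pool size and the DSN pointing at pgbouncer are just examples):

    import psycopg2.pool

    # ~20 connections, pointed at pgbouncer rather than postgres directly
    pool = psycopg2.pool.ThreadedConnectionPool(
        minconn=1, maxconn=20,
        dsn="host=127.0.0.1 port=6432 dbname=app user=app",
    )

    def run_query(sql, params=()):
        conn = pool.getconn()
        try:
            conn.autocommit = True   # each statement is its own transaction: one round trip
            with conn.cursor() as cur:
                cur.execute(sql, params)
                return cur.fetchall()
        finally:
            pool.putconn(conn)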


Absolutely agree; add to this the quality of the driver, aiopg vs asyncpg...


I am SUPER happy someone else is finally looking at this. It is long past time that the reflexive use of asyncio or systems like gevent/eventlet for no other reason than "hand-wavy SPEED" come to an end. That web applications that literally serve just one user at a time are built in Tornado for "speed". (my example for this is the otherwise excellent SnakeViz: https://jiffyclub.github.io/snakeviz/ which IMO should have just used wsgiref).

As the blog post apparently cites as well (woo!), I wrote about the myth of "async == speed" some years ago here and my conclusions were identical.

https://techspot.zzzeek.org/2015/02/15/asynchronous-python-a...


Hi - yes loved your blogpost! Also very tired of the "async magic performance fairy dust" :)

It's a difficult myth to dispel and I think the situation in terms of public mindshare is much worse now than it was in 2015. Some very silly claims from the async crowd now have basically widespread credence. I think one of the root causes is that people are sometimes very woolly about how multi-processing works. One of the others is that I think it's easy to make the conceptual mistake of 1 sync worker = 1 async worker and do a comparison that way.

One of my worries is that right now it feels like everything in Python is being rewritten in asyncio and the balkanisation of the community could well be more problematic than 2 vs 3.


> One of my worries is that right now it feels like everything in Python is being rewritten in asyncio and the balkanisation of the community could well be more problematic than 2 vs 3.

this is exactly why the issue is so concerning for me as well.


> I think the situation in terms of public mindshare is much worse now than it was in 2015

Ok, in 2015 it was a pain, but with Python 3.8 it's actually only joy & fun in my opinion.

> the balkanisation of the community could well be more problematic than 2 vs 3

If you could call Python2 code from Python3 or vice-versa as easily as you can do with async then it would be comparable.


For me it's worth the effort to deal with async if it means not having to deal with uwsgi or other frontends. But in general I think Python has too many problems (packaging, performance, distribution, etc) that it doesn't make sense IMO to invest in new Python projects.


uWSGI is a lot of joy for me, really. I've never been happier with my deployments since I discovered uWSGI back in 2008 or something, and nowadays it supports plenty of languages, so there's just nothing I don't deploy on uWSGI anymore.

Python packaging is something that I have fully automated (maintaining over 50 packages here) and that I'm pretty happy with.

I fail to see the problem with Python packaging, maybe because I have an aggressive continuous integration practice? (Always integrate upstream changes, contribute to dependencies that I need, and when I'm not doing TDD it's only because I don't yet have proof that the code I'm writing is actually going to be useful.) That's not something everybody wants to do (I don't understand their reasoning though).

People would rather freeze their dependencies and then cry because upgrading is a lot of work, instead of upgrading at the rhythm of upstream releases. If other package managers or other languages have packaging features that encourage what I consider to be non-continuous integration then good for them, but that's not how a hacker like me wants to work. Being able to "ignore upstream releases" is not a good feature, it made me a sad developer really; "ignoring non-latest releases" has made me a really happy developer.

Most performance issues are not imputable to the language. If they are, they're probably not affecting all your features; you can still rewrite the feature that Python doesn't perform well enough for in a compiled language. I need most of my code to be easy to manipulate, and very little of it to actually outperform Python.

I've recently re-assessed whether I should keep going with Python for another 10 years, tried a bunch of languages and frameworks, and at the end of the month I still wanted a language that's easy to manipulate with basic text tools, that's sufficiently easy so that I can onboard junior colleagues on my tools, and that provides sufficiently advanced OOP, because I find it efficient for structuring and reusing code.

Python does what it claims, it solves a basic human-computer problem, let's face it: it's here to stay and shine, and its wide ecosystem seems like solid proof. Whether it makes sense to invest in a project or not should not depend on the language anyway.


> uWSGI is a lot of joy for me, there's nothing I don't deploy on uWSGI, even PHP code.

Oh man, we moved away from uwsgi to async a couple of years ago and that's been one of the best decisions we've made. Async is no walk in the park, but not having to deal with uwsgi configuration, etc has been well worth it.

> Python packaging is something that I have fully automated (maintaining over 50 packages here) and that I'm pretty happy with.

Yeah, I don't doubt this. Many people have found a happy path that works for them, but I've found that those tend to be people who don't have significant constraints (e.g., they don't need fast builds, or they don't care about reproducibility, or they don't have to deal with a large number of regular contributors, or etc).

> Most performance issues are not imputable to the language.

This isn't true in a meaningful sense. For the most part, if you're doing anything more complicated than a CRUD app, you will run into performance problems with Python almost immediately upon leaving the prototype phase, and your main options for improving performance are horizontal scaling (multiprocess/multihost parallelism) or rewriting the hot path in a faster language. As previously discussed, these options only work for certain use cases where the ratio of de/serialization to real work is low, so you often find yourself without options. Further, horizontal scaling is expensive (compute is expensive) and rewriting in a different language is differently expensive (you now have to integrate a separate build system and employ developers who are not only well-versed in the new language, but also in implementing Python extensions specifically).

On the other hand, if you chose a language like Go, you would be in the same ballpark of maintainability, onboarding, etc (many would argue Go is easier to write and maintain due to simplicity and static typing) but you would be in a much better place with respect to packaging and performance. You likely wouldn't need to optimize anything since naive Go tends to be 10-100X faster than naive Python, and if you needed to optimize, you can do so in-language without paying any sort of de/serialization overhead (parallelism, memory management, etc), allowing you to eke out another magnitude of performance. There are other options besides Go that also give performance gains, but they often involve trading off simplicity/packaging/deployment/tooling/ecosystem/etc.

> If they are, they're probably not affecting all your features; you can still rewrite the feature that Python doesn't perform well enough for in a compiled language.

This is true, but "rewriting features" is usually prohibitively expensive, and it's often non-trivial to figure out up-front which features will have performance problems in the future such that you could otherwise avoid a rewrite.

> Python does what it claims that's a basic human problem, let's face it: it's here to stay and shine.

Yes, Python is here to stay, but that's more attributable to network effects and misinformation than merit in my experience.


Well, we can't use uWSGI for ASGI, but it's still good for us for anything else. I literally have 0 uWSGI configuration files, just a uWSGI command as the container command.

> Many people have found a happy path that works for them, but I've found that those tend to be people who don't have significant constraints (e.g., they don't need fast builds, or they don't care about reproducibility, or they don't have to deal with a large number of regular contributors, or etc).

I'm really curious about this statement, building a python codebase for me means building a container image, if the system packages or python dependencies don't change then it's really going to take less than a minute. What does your build look like ?

Can you define "a large number of regular contributors".

What do you mean "they don't need reproducibility"? I suppose they just build a container image in a minute and then go over and deploy on some host. If a dependency breaks the code, it's still reproducible, but broken; then it has to be fixed rather than ignored, though a temporary version pin is fine in the meantime.

> This is true, but "rewriting features" is usually prohibitively expensive, and it's often non-trivial to figure out up-front which features will have performance problems in the future such that you could otherwise avoid a rewrite.

If Go is so much easier to write then I fail to see how it can be a problem to use Go to rewrite a feature for which performance is mission critical, and for which you have final specifications in the python implementation you're replacing. But why write it in Go instead of Rust, Julia, Nim, or even something else ?

You're going to choose the most appropriate language for what exactly you have to code. If you're trying to outperform an interpreted language and/or don't care about being stuck with a rudimentary pseudo-object oriented feature set then choose such a compiled language. Otherwise, Python is a pretty decent choice.

> Yes, Python is here to stay, but that's more attributable to network effects and misinformation than merit in my experience.

If Go was easier to write and read, why would they implement a Python subset in Go for configuration files, instead of just having configuration files in Go? go.starlark.net Oh right, because it's not as easy to read and write as Python, and because you'd need to recompile. So apparently even Google, who basically invented Go, seem to need to support some Python dialect.

10-100X performance is most probably something you'll never need when starting a project, unless performance is mission critical from the start. Static types and compilation are an advantage for you, but for me dynamic typing and interpretation mean freedom (again, I'm going to TDD on one hand and fix runtime exceptions as soon as I see them in applicative monitoring anyway).

I don't believe comparing Python and Go is really relevant, comparing PHP and Ruby and Python for example would seem more appropriate, when you say "people shouldn't need Python because they have Go" I fail to see the difference with just saying "people shouldn't need interpreted languages because there are compiled languages".

Humans need a basic programming language that is easy to write and read, without caring about having to compile it for their target architecture, Python claims to do that, and does it decently. If you're looking for more, or something else, then nobody said that you should be using Python.

I might be wrong, but when I'm talking about Humans, I'm referring to, what I have seen during the last 20 years as 99% of the projects out there in the wild, not the 1% of projects that have extremely specific mission critical performance requirements, thousands of daily contributors, and the like. Those are also pretty cool, and they need pretty cool technology, but it's really not the same requirements. For me saying everybody needs Go would look a bit like saying everybody needs k8s or AWS. Languages are many and solve different purposes. The one that Python serves is staying, not by misinformation, but because of Human nature.


> What does your build look like ?

Running tests, building a PEX file, putting the PEX file into a container image. We have probably about a dozen container images and counting at this point. The tests take a long time (because Python is 2+ orders of magnitude slower than other languages), and our CI bill is killing us (we're looking into other CI providers as well).

> Can you define "a large number of regular contributors".

More than 20 (although our eng org is 30-50). Multiple teams. You don't want to hold everyone's hand and show them all the tips and tricks you've found for working around the quirks of Python packaging or give them an education on wheels, bdists, sdists, virtualenvs, pipenvs, pyenvs, poetries, eggs, etc. They were promised Python was going to be easy and they wouldn't have to learn a bunch of things, after all.

> What do you mean "they don't need reproducibility"? I suppose they just build a container image in a minute and then go over and deploy on some host.

Container images aren't reproducible in practice. Moreover, they have to also be reproducible for local development, and we use macs and Docker for mac is prohibitively slow. Need something else to make sure developers aren't dealing with dependency hell.

> If Go is so much easier to write then I fail to see how it can be a problem to use Go to rewrite a feature for which performance is mission critical, and for which you have final specifications in the python implementation you're replacing.

Both can be true: Go is easier to write than Python and it's still prohibitively expensive to rewrite a whole feature in Go. If the feature is small, well-designed, and easily isolated from the rest of the system, then rewriting is cheap enough, but these cases are rare and "opportunity cost" is a real thing--time spent rewriting is time not spent building new features.

> But why write it in Go instead of Rust, Julia, Nim, or even something else ?

Because Rust slows development velocity by an order of magnitude and Julia and Nim aren't mature general-purpose application development languages.

> You're going to choose the most appropriate language for what exactly you have to code. If you're trying to outperform an interpreted language and/or don't care about being stuck with a rudimentary pseudo-object oriented feature set then choose such a compiled language. Otherwise, Python is a pretty decent choice.

Yes, you have to choose the most appropriate language, but I contend that Python is a pretty rubbish choice for reasons that people often fail to consider up front. E.g., "My app will never need to be fast, and if it's slow I can just rewrite the slow parts in C!".

> If Go was easier to write and read, why would they implement a Python subset in Go for configuration files, instead of just having configuration files in Go? go.starlark.net Oh right, because it's not as easy to read and write as Python, and because you'd need to recompile. So apparently even Google, who basically invented Go, seem to need to support some Python dialect.

Starlark is pretty cool though, and I use it a lot; I just wish it were statically typed.

Apples and oranges. Starlark is an embedded scripting language, not an app dev language. Different design goals. It also probably pre-dates Go, or at least derives from something which pre-dates Go.

> 10-100X performance is most probably something you'll never need when starting a project, unless performance is mission critical from the start.

You would be surprised. As soon as you're doing something moderately complex with a small-but-not-tiny data set you can easily find yourself in the tens of seconds. And 100X is the difference between a subsecond request and an HTTP timeout. It matters a lot.

> Static types and compilation are an advantage for you, but for me dynamic typing and interpretation mean freedom (again, I'm going to TDD on one hand and fix runtime exceptions as soon as I see them in applicative monitoring anyway).

We do TDD for our application development too and we still see hundreds of typing errors in production every week. I think your idea of "static typing" is colored by Java or C++ or something; you can have fast, flexible iteration cycles with Go or many of the newer classes of statically typed languages, as previously mentioned. "Type inference" (in moderation) is your friend. Anyway, Go programs can often compile in the time it takes a Python program to finish importing its dependencies. A Go test can complete in a fraction of the time it takes for pytest to start testing (no idea why it takes so long for it to find all of the tests).

> I don't believe comparing Python and Go is really relevant, comparing PHP and Ruby and Python for example would seem more appropriate, when you say "people shouldn't need Python because they have Go" I fail to see the difference with just saying "people shouldn't need interpreted languages because there are compiled languages".

"compiled" and "interpreted" aren't use cases. "General app dev" is a use case. Python and Go compete in the same classes of tools: web apps, CLI applications, devops automation, lambda functions, etc. PHP and Ruby are also in many of these spaces as well. I don't especially care if Python is the fastest interpreted language (it's not by a long shot), I care if it's fast enough for my application (it's not by a long shot).

> Humans need a basic programming language that is easy to write and read, without caring about having to compile it for their target architecture, Python claims to do that, and does it decently. If you're looking for more, or something else, then nobody said that you should be using Python.

Lots of people recommend Python for use cases for which it's not well suited, and since so many Python dependencies are C, you absolutely have to worry about recompiling for your target architecture, and it's much, much harder than with Go (to recompile a Go project for another architecture, just set the OS and the architecture via the `GOOS` and `GOARCH` env vars and rerun `go build`--you'll have a deployable binary before your Python Docker image finishes building).

> I might be wrong, but when I'm talking about Humans, I'm referring to, what I have seen during the last 20 years as 99% of the projects out there in the wild, not the 1% of projects that have extremely specific mission critical performance requirements

Right, Python is alright for CRUD apps or any other kind of app where the heavy lifting can easily be off-loaded to another language. There's still the build issues and everything else to worry about, but at least performance isn't the problem. But I think you'll be surprised to find out that lots of apps don't fit that bill.

> For me saying everybody needs Go would look a bit like saying everybody needs k8s or AWS.

I'm not saying everyone needs Go, I'm saying that Go is a better Python than Python. There are a handful of exceptions--there's not currently a solid Go-alternative for django, and I wouldn't be surprised if the data science ecosystem was less mature. But for general purpose development, I think Go beats Python at its own game. And I've been playing that game for a decade now. This conversation has been pretty competitive, but I really encourage you to give Go a try--I think you'll come around eventually, and you can learn it so fast that you can be writing interesting programs with it in just a few hours. Check out the tour: https://tour.golang.org.


I understand that if you're building a PEX file then all dependencies must be reinstalled into it every time, however you might still be able to leverage container layer caching to save the download time.

CI bills are awful, I always deploy my own CI server, a gitlab-runner where I also spawn a Traefik instance to practice eXtreme DevOps.

More than 20 daily contributors, that's nice, but I must admit that I have contributed to some major python projects that don't have a packaging problem, such as Ansible or Django. So, I'm not sure if the number of contributors is really a factor in packaging success. That said, sdists and wheels are things that happen in CI for me, it's just adding this to my .gitlab-ci.yml:

    pypi:
        stage: deploy
        script: pypi-release
And adding TWINE_{USERNAME,PASSWORD} to CI. The other trick is to use the excellent setupmeta or something like that (OpenStack also has a solution) so that setup.py discovers the version based on the git tag or publishes a dev version.

That's how I automate the packaging of all my Python packages (I have something similar for my NPM packages). As for virtualenvs, it's true that they are great but I don't use them, I use pip install --user, which has the drawback that you need all your software to run with the latest releases of dependencies, otherwise you have to contribute the fixes, but I'm a happier developer this way, and my colleagues aren't blocked by a breaking upstream release very often, they will just pin a version if they need to keep working while somebody takes care of changing our code and contributing to dependencies to make everything work with the latest versions.

I don't think that other languages are immune to version compatibility issues, I don't think that problem is language dependent: either you pin your versions and forget about upstream releases, or you aggressively integrate upstream releases continuously into your code and your dependencies.

> My app will never need to be fast

I maintain a governmental service that was in production in less than 3 months, then 21 months of continuous development, serving 60m citizens with a few thousand administrators, as sole techie, on a single server, for the third year. Needless to say, my country has never seen such a fast and useful project. I have not optimized anything. Of course you can imagine it's not my first project in this case. For me, "Python's speed is most often not a problem" is not a lie, I proved it.

The project does have a slightly complex database, the administration interface does implement really tight permission granularity (each department has its own admin team with users of different roles), it did have to iterate quickly, but you know the story with Django : changing the DB schema is easy, migrations are generated by Django, you can write data migrations easily, tests will tell you what you broke, you write new tests (I also use snapshot testing so a lot of my tests actually write themselves), and upgrading a package is just as easy as fixing anything that broke when running the tests.

You seem to think that Python is outdated because it's old, and that's also what I thought when I went over all the alternatives for my next 10 years of app dev. I was ready to trash all my Python, really. But that's how I figured out that the human-computer problem Python solves will just always be relevant. I'll assume that you understand the point I made on that and that we simply disagree here.

Or maybe we don't really disagree, I'll agree with you that a compiled language is better for mission-critical components, but any of these will almost always need a CRUD and that's where Python shines.

But I've not always been making CRUDs with Python, I have 2 years of experience as an OpenStack developer, and I must admit that Python fit the bill pretty well here too. Maybe my cloud company was not big enough to have problems, or we just avoided the common mistakes. I know people like Rackspace had a hard time maintaining forks of the services; I was the sole maintainer of 4 network service rewrites which were basically 1 package using OpenStack as a framework (like I would use Django), to simply listen on RabbitMQ and do stuff on SDN and SSH. Then again, I think not that many people actually practice CI/CD correctly, so that's definitely going to be a problem for them at some point.

> there's not currently a solid Go-alternative for django

That's one of the things that put me off, I tried all the Go web frameworks, and they are pretty cool, but will they ever reach the productivity levels of Django, Rails or Symfony?

Meanwhile, I'm just waiting for the day someone puts me in charge of something where performance is sufficiently critical that I need to rewrite it in a compiled language; if I could have the chance to do some ASM optimizations that would also be a lot of fun. Another option is that I have something to contribute to a Go project, but so far, Go developers seem to be doing really fine without me for sure :)

Why do I choose it for general purpose development? I guess I'm stuck with "I love OOP" just like "the little functional programming Python offers".

I really enjoyed this conversation too, would like to share it on my blog if you don't mind, thank you for your time, have a great weekend.


This is kind of weird to me though. All this effort being spent to argue what is effectively a strawman belief only held by people who don't fully understand what they believe.


> hand-wavy SPEED

In general, you can get higher throughput with asyncio because you don't have context switches, but it comes at the cost of latency. So hand-wavy, indeed. It really depends what sort of speed you're after.


This is true as far as it goes, but is not testing the (very common) areas where async shines.

Imagine you're loading a profile page on some social networking site. You fetch the user's basic info, and then the information for N photos, and then from each photo the top 2 comments, and for each comment the profile pic of the commentor. You can't just fetch all this in one shot because there's data dependencies. So you start fetching with blocking IO, but that makes your wait time for this request proportional to the number of fetches, which might be large.

So instead, you ideally want your wait to be proportional to the depth of your dependency tree. But composing all these fetches that way is hard without the right abstraction. You can cobble it together with callbacks but it gets hairy fast.
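
Something like the following shape in asyncio terms, where fetch_user, fetch_photos, fetch_comments and fetch_profile_pic are hypothetical async data-fetching functions; the total wait grows with the depth of the dependency tree, not the number of fetches:

    import asyncio

    async def load_photo(photo_id):
        comments = await fetch_comments(photo_id, limit=2)           # depth 3
        pics = await asyncio.gather(                                 # depth 4, all commenters at once
            *(fetch_profile_pic(c["author_id"]) for c in comments))
        return {"photo": photo_id, "comments": comments, "pics": pics}

    async def load_profile(user_id):
        user = await fetch_user(user_id)                             # depth 1
        photo_ids = await fetch_photos(user_id)                      # depth 2
        photos = await asyncio.gather(                               # all photos in parallel
            *(load_photo(pid) for pid in photo_ids))
        return {"user": user, "photos": photos}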

So (outside of extreme scenarios) it's not really about whether async is abstractly faster than sync. It's about how real developers would solve the same problem with/without async.

(Source: I worked on product infrastructure in this area for many years at FB)


I felt baffled by this thread until I read this response. async/await for me has always been about managing this kind of dependency nightmare. I guess if all you have to do is spawn 100 jobs that run individually and report back to some kind of task manager then the performance gains of threads probably beat async/coroutine-based approaches on a pure speed benchmark. But when I have significant chains of dependent work then the very idea of using bare threads and callbacks to manage that is annoying.

At least in Typescript nowadays, the ability to just mark a function `async` and throw an `await` in front of its invocation drastically lowers the barrier to moving something from blocking to non-blocking. In the same cases if I had to recommend the same change with thread pools and callbacks (and the manual book-keeping around all that) most developers just wouldn't bother.


>... the very idea of using bare threads and callbacks to manage that is annoying.

Yeah, that's an extremely painful way to write threaded code. Much more normal is to simply block your thread while waiting for others to .Join() and return their results, likely behind an abstraction layer like a Future.

The only time you really need to use callbacks is when you need to blend async and threaded code, and you aren't able to block your current thread (e.g. Android main thread + any thread use is an example of this). But there are much much easier ways to deal with that if you need to do it a lot - put your primary logic in a different, blockable thread.


> just mark a function `async` and throw an `await` ... to [move] something from blocking to non-blocking.

That's not how it works. `async` and `await` are merely syntactic sugar around callbacks. Everything in javascript is already nonblocking[1], whether or not you use async/await.

[1] There are a few rare exceptions in node js (functions suffixed with "Sync"), but in the same vein, they are blocking whether or not you use async/await.


The argument was about the developer experience, not how things work behind the scenes. It's super simple for a developer to write this, for example:

    const a = someAsyncOperation()      // any function returning a Promise
    const b = anotherAsyncOperation()   // another one
    // Resolve a and b concurrently
    const [x, y] = await Promise.all([a, b])
    // Do something with x and y
You can naturally achieve that with callbacks but there's more boilerplate involved. I'm not familiar with Python so I don't know what it would look like without async.

Edit: I just re-read your comment and the one you were responding to, and do agree that async/await don't "move" things from blocking to non-blocking. It just helps using already non-blocking resources more easily. It will not help you if you're trying to make a large numerical computation asynchronous, for example. In this regard it's very different from Golang's `go`, which will run the computation in a separate goroutine, which itself will run concurrently (with Go's scheduler deciding when to yield), and in parallel if the environment allows it.


As someone who works in both Python and JavaScript regularly, JS’s async is just leagues easier and better. It’s night and day. Even something as simple as new Promise or Promise.all is way more confusing in Python. It’s very different.


A lot of the debate and discussion here seems to come from the fact that the example program demonstrates concurrency across requests (each concurrent request is being handled by a different worker), but no concurrency within each request: The code to serve each request is essentially one straight line of execution, which pauses while it waits for a DB query to return.

A more interesting example would be a request that requires multiple blocking operations (database queries, syscalls, etc.). You could do something like:

    # Non-concurrent approach
    def handle_request(request):
      a = get_row_1()
      b = get_row_2()
      c = get_row_3()
      return render_json(a, b, c)
   

    # asyncio approach
    async def handle_request(request):
      a, b, c = await asyncio.gather(
        get_row_1(),
        get_row_2(),
        get_row_3())
      return render_json(a, b, c)

    # Naive threading approach
    def handle_request(request):
       a_q = queue.SimpleQueue()
       t1 = threading.Thread(target=get_row_1, args=(a_q,))  # pass the callable and its args separately
       t1.start()
       b_q = queue.SimpleQueue()
       t2 = threading.Thread(target=get_row_2, args=(b_q,))
       t2.start()
       c_q = queue.SimpleQueue()
       t3 = threading.Thread(target=get_row_3, args=(c_q,))
       t3.start()

       t1.join()
       t2.join()
       t3.join()

       return render_json(a_q.get(), b_q.get(), c_q.get())


    # concurrent.futures with a ThreadPoolExecutor 
    def handle_request(request, thread_pool):
      a = thread_pool.submit(get_row_1)  # submit the callable, not its result
      b = thread_pool.submit(get_row_2)
      c = thread_pool.submit(get_row_3)
      return render_json(a.result(), b.result(), c.result())
These examples demonstrate what people find appealing about asyncio, and would also tell you more about how choice of concurrency strategy affects response time for each request.


This is a great point, surprised you received no follow-up comments!


Is speed really a good reason for using async? If I remember correctly, asynchronous I/O was introduced to deal with many concurrent clients.

Therefore, I would have liked to see how much memory all those workers use, and how many concurrent connections they can handle.


I think speed is the wrong word here. A better word is throughput.

The underlying issue with python is that it does not support threading well (due to the global interpreter lock) and mostly handles concurrency by forking processes instead. The traditional way of improving throughput is having more processes, which is expensive (e.g. you need more memory). This is a common pattern with other languages like ruby, php, etc.

Other languages use green threads / co-routines to implement async behavior and enable a single thread to handle multiple connections. On paper this should work in python as well, except it has a few bottlenecks, which the article outlines, that result in throughput being somewhat worse than the multi-process, synchronous versions.


I think 'scalability' is the best word here.

Taken from Stephen Cleary's SO answer on this topic: https://stackoverflow.com/a/31192718


> which is expensive (e.g. you need more memory)

Memory is cheap; the cost is in constant de/serialization. Same with "just rewrite the hotspots in C!"-style advice; de/serialization can easily eat anything you saved by multiprocessing/rewriting. Python is a deceivingly hard language, and a lot of this is a direct result of the "all of CPython is the public C-extension interface!" design decision (significant limitations on optimizations => heavy dependency on C-extensions for anything remotely performance sensitive => package management has to deal extensively with the nightmare that is C packaging => no meaningful cross-platform artifacts or cross compilation => etc).


Memory is not cheap when dealing with the real-world cost of deploying a production system. The pre-fork worker model used in many sync cases is very resource intensive, and depending on the number of workers you're probably paying a lot more for the box it's running on. Of course this is different if you're running on your own metal, but I have other issues with that.


> Memory is not cheap when dealing the real world cost of deploying a production system.

What? What makes you say that? What did you think I was talking about if not a production system? To be clear, we're talking about the overhead of single-digit additional python interpreters unless I'm misunderstanding something...


Observed costs from companies running the pre-fork worker model vs alternative deployment methods. And just in this benchmark they're running double-digit interpreter counts, which in my experience is the more common (and more expensive) case.


Double-digit interpreters per host? Where is the expense? Interpreters have a relatively small memory overhead (<10mb). If you're running 100 interpreters per host (you shouldn't be), that's an extra $50/host/year. But you should be running <10/host, so an extra $5/host/year. Not ideal, but not "expensive", and if you care about costs your biggest mistake was using Python in the first place.


I don't know where you're seeing the <10mb; in the situation I saw, they were easily consuming 30mb per interpreter. Even a cursory search now shows them at roughly 15-20mb, so even assuming the 30mb Gunicorn was just misconfigured, that's still an extra $100 per host using your estimate and what I'm seeing from Googling around. Across a situation where there are multiple public APIs, that adds up pretty quickly.

Another Google search shows that Gunicorn, for instance, using high memory on fork isn't exactly uncommon either.

Edit: I reworded some stuff up there and tried to make my point more clear.


The interpreter overhead on macos is 7.7mb. I can't speak to gunicorn configuration but it's far from the only game in town.


Totally fair point, my experience with fork type deploys has only been Gunicorn so I'll take this as a challenge to try some others out.


Yes, C dependency management is awful, and because Python is only practical with C extensions for performance critical code, it ends up being a nightmare as well.


In our use case, switching to asyncio is like moving from 12 cores to 3... (and I'm pretty sure we are handling more concurrency: from 24-30 req/s to 150 req/s). But our workload is mostly network related (db, external services...).


same.

maybe author is concerned that many people are jumping the gun on async-await before we all fully understand why we need it at all. and that's true. but that paradigm was introduced (borrowed) to solve a completely different issue.

i would love to see how many concurrent connections those sync processes handle.


Hi - not sure what you mean by this. The sync workers handle one request (to completion) per worker. So 16 workers means 16 concurrent requests. For the async workers it's different - they do more concurrently - but as discussed their throughput is not better (and latency much worse).

Maybe what you're getting at is cases where there are a large number of (fairly sleepy) open connections? Eg for push updates and other websockety things. I didn't test that I'm afraid. The state of the art there seems to be using async and I think that's a broadly appropriate usage though that is generally not very performance sensitive code except that you try to do as little as possible in your connection manager code.


In the case of everything working smoothly that model may play out. But if you get a client that times out, or worse, a slow connection then they used one of your workers for a long time in a synchronous model. In the async model this has less of a footprint as you are still accepting other connections despite the slow progress of one of the workers.


yes many open connections is what i meant (suggested by other people as well). by the way, i really liked the writing, it's refreshing. and i agree with you that people aren't using async for the right reasons.


Thanks :), really appreciate that. I think all technology goes through a period of wild over-application early on. My country is full of (hand-dug) canals, for example.


I find it interesting that all the talk here is about performance, and nobody has mentioned any benefits of Async Python when performance isn't an issue.

I use trio/asyncio to more easily write correct complex concurrent code when performance doesn't matter. See "The Problem with Threads"[1].

For this use case, Async Python probably still isn't faster, but that doesn't matter. Let's not throw out the baby with the bathwater :)

[1] https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-...


I love asyncio for writing mixed initiative "servers". For instance, I have an asyncio "server" that accepts websocket connections on one side, waits on an AQMP queue, proxies requests and mediates for the HEOS smart speaker API, Phillips Hue, U.S. Weather Service, etc.

This is great for React or Vue front-end applications which get their state updated when things happen in the outside world (e.g. somebody else starts the music player, and that gets relayed).

When CPU performance is an issue (say generate a weather video from frames) you want to offload that into another process or thread, but it is an easy programming style if correctness matters.


That sounds a lot like home-assistant :)


It is a little bit, except this one is customizable, maintainable, and not phonish in any way. In particular, there is no "one ring to rule them all" App but rather there are very simple one-task applications (put one button to pair the left/right computer to the soundbar via Optical or Coax) and also some applications that are highly complex (e.g. multiple windows)


What's the point of writing concurrent code if it's not faster?


Contrasting with jdlshore, concurrency can make programs much easier to reason about, when done well. This is a benefit of both Go and Erlang, though they use different approaches.

Concurrency can help you separate out logic that is often commingled in non-concurrent code, but doesn't need to be. As a real-world example, I used to do safety critical systems for aircraft. The linear, non-concurrent version included a main loop that basically executed a couple dozen functions. Each function may or may not have dependencies on the other functions, so information was passed between them over multiple passes through this main loop (as their order was fixed) using shared memory.

A similar project had about a dozen processes, each running concurrently. There was no speed improvement, but the connection between each activity was handled via channels (equivalent in theory to Go's channels, less like Erlang's mailboxes as the channels could be shared). We knew it was correct because each process was a simple state machine, separated cleanly from all other state machines.

The second system's code was much simpler, there was no juggling (in our code) of the state of the system, compared to managing the non-concurrent logic. If a channel had data to be acted on, the process continued, otherwise it waited. Very simple. And it turns out that many systems can be modeled in a similar fashion (IME). Of course, we had a very straightforward communication mechanism (again, essentially the same as Go channels except it was a library written in, as I recall, Ada by whoever made the host OS).


Signals are not dependent on concurrency. And you don't need multiple processes to implement a state machine.

I mean, think about it. What's the difference between sending message A and then message B versus sending messages A and B into a queue and letting some async process pop from it? Less complexity and guaranteed message delivery come for free in single-threaded code.

Am I wrong? What am I missing?


I don't think you're wrong, but in Jtsummers' specific case, I think multi-processing probably would be simpler. You don't have to implement the event loop, there's no risk of tromping on other processes' data, and if a process gets into an invalid state, you can just die without impacting others.

You'd need a good watchdog and error handling, but presumably some of that came for "free" in their environment.

Although if you take out the "free" OS support, watchdog, etc., I agree that there's likely a place between "shared memory spaghetti" and "multi-processing" that's simpler than both.


Exactly this. I had started my own reply and refreshed and saw yours, thanks.

The other benefit of the concurrent design (versus the single-threaded version) was that it was actually much simpler. This was critical for our field because that system is still flying, now 12 years later, and will probably be flying for another 30-50 years. The single-threaded system was unnecessarily complex. Much of the complexity came from having to include code to handle all the state juggling between the separate tasks, since each had some dependency on each other (not a fully connected graph, but not entirely disconnected either). The concurrent design made it trivial to write something very close to the most naive version possible, where waiting was something that only happened when external input was needed. So the coordination between each task just fell out naturally.

You still have to care about locking the system up, but in our case, because each process was sufficiently reduced to its essentials, this was easy to evaluate and reason about.


"some async process" is a concurrency mechanism, is it not?


It is. The single-threaded example comes before the "versus". The async example comes after. I should have been more clear.


Ah, indeed misread that. Then my answer is: Singlethreaded code sometimes has to implement things an async environment would handle for you.

I.e. when handling many in- and outputs I can write my own loop around epoll etc., write logic to keep track of queues of data to send per-target, etc. Or I can use a runtime that provides that for me and lets me mostly pretend things are running on their own.
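As a toy illustration of that trade-off (hypothetical code, nothing from this thread): a minimal asyncio echo server where the runtime does the readiness/epoll/buffer bookkeeping a hand-rolled loop would have to manage itself.

    import asyncio

    async def handle(reader, writer):
        # echo each chunk back; readiness and buffering are the runtime's job
        while True:
            data = await reader.read(1024)
            if not data:
                break
            writer.write(data)
            await writer.drain()  # cooperative flow control
        writer.close()
        await writer.wait_closed()

    async def main():
        server = await asyncio.start_server(handle, "127.0.0.1", 8888)
        async with server:
            await server.serve_forever()

    if __name__ == "__main__":
        asyncio.run(main())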


Concurrency is notoriously difficult to reason about. Concurrency bugs are also a f__king nightmare to debug.

Given how slow I/O operations are, and how much modern code depends on the network, we typically need some concurrency in our code. So for me, almost always, the question isn't, "which concurrency choice is fastest?" but rather, "which concurrency choice is fast enough while leading to code with the least bugs?"


If you are I/O bound, concurrency has a use case. I don't argue against it. I'm pointing out that it's pointless to write concurrent code if you don't expect a performance benefit from it.

It's like multi-threading 2+2.



I use async for UI work, but don't have much of an opinion for servers.

I suspect that the best async is that supported by the server OS, and the more efficiently a language/compiler/linker integrates with that, the better. JIT/interpreted languages introduce new dimensions that I have not experienced.

I do have some prior art in optimizing libraries, though. In particular, image processing libraries in C++. My opinion is that optimization is sort of a "black art," and async is anything but a "silver bullet." In my experience, "common sense" is often trumped by facts on the ground, and profilers are more important than careful design.

I have found that it's actually possible to have worse performance with threads, if you write in a blocking fashion, as you have the same timeline as sync, but with thread management overhead.

There are also hardware issues that come into play, like L1/2/3 caches, resource contention, look-ahead/execution pipelines and VM paging. These can have massive impact on performance, and are often only exposed by running the app in-context with a profiler. Sometimes, threading can exacerbate these issues, and wipe out any efficiency gains.

In my experience, well-behaved threaded software needs to be written, profiled and tuned, in that order. An experienced engineer can usually take care of the "low-hanging fruit," in design, but I have found that profiling tends to consistently yield surprises.

T.A.N.S.T.A.A.F.L.


Probably the most interesting new concept that I've come across is Linux's io_uring, which uses ring buffers to asynchronously submit and receive kernel I/O calls.

While Windows has had asynchronous I/O for ages, it's still one kernel transition per operation, whereas Linux can batch these now.

I suspect that all the CPU-level security issues will eventually be resolved, but at a permanently increased overhead for all user-mode to kernel transitions. Clever new API schemes like io_uring will likely have to be the way forward.

I can imagine a future where all kernel API calls go through a ring buffer, everything is asynchronous, and most hardware devices dump their data directly into user-mode ring buffers by default without direct kernel involvement.

It's going to be an interesting new landscape of performance optimisation and language design!


> profilers are more important than careful design.

> I have found that it's actually possible to have worse performance with threads, if you write in a blocking fashion

But isn't excessive blocking/synchronization something that should already be tackled in your design, instead of trying to rework it after the fact?

I would expect profiling to mostly lead to micro-optimisations, e.g. combining or splitting the time a lock is taken, but when you're still designing you can look at avoiding as much need for synchronization as possible, e.g. sharing data copy-on-write (not requiring locks as long as you have a reference) instead of having to lock the data when accessing it.

As another commenter says

> with asyncio we deploy a thread per worker (loop), and a worker per core. We also move cpu bound functions to a thread pool

You can't easily go from e.g. thread-per-connection to a worker pool; that should have been caught during design.


> But isn't excessive blocking/synchronization something that should already be tackled in your design, instead of trying to rework it after the fact?

Yes and no. Again, I have not profiled or optimized servers or interpreted/JIT languages, so I bet there's a new ruleset.

Blocking can come from unexpected places. For example, if we use dependencies, then we don't have much control over the resources accessed by the dependency.

Sometimes, these dependencies are the OS or standard library. We would sometimes have to choose alternate system calls, as the ones we initially chose caused issues which were not exposed until the profile was run.

In my experience, the killer for us was often cache-breaking. Things like the length of the data in a variable could determine whether or not it was bounced from a register or low-level cache, and the impact could be astounding. This could lead to remedies like applying a visitor to break up a [supposedly] inconsequential temp buffer into cache-friendly bites.

Also, we sometimes had to recombine work that we had sent to threads, because splitting it caused cache misses.

Unit testing could be useless. For example, the test images that we often used were the classic "Photo Test Diorama" variety, with a bunch of stuff crammed onto a well-lit table, with a few targets.

Then, we would run an image from a pro shooter, with a Western prairie skyline, and the lengths of some of the convolution target blocks would be different. This could sometimes cause a cache miss, with a demotion of a buffer. This taught us to use a large pool of test images, which was sometimes quite difficult. In some cases, we actually had to use synthesized images.

Since we were working on image processing software, we were already doing this in other work, but we learned to do it in the optimization work, too.

When my team was working on C++ optimization, we had a team from Intel come in and profile our apps.

It was pretty humbling.


Cooperative multitasking came out slower than preemptive in the nineties, so this is unsurprising in the generic case.

I think my question is whether async Python is slower in the case it was designed for -- many, long-running open sockets.

Async was traditionally used server-side for things like chat servers, where I might have millions of sockets simultaneously open.


> Cooperative multitasking came out slower than preemptive in the nineties

This wasn't really the reason for the shift away from cooperative multitasking, it was really because cooperative multitasking isn't as robust or well behaved unless you have a lot of control over what tasks you have trying to run together.

In theory cooperative multitasking should have better throughput (latency is another story) because each task can yield at a point where its state is much simpler to snapshot rather than having to do things like record exact register values and handle various situations.


... I never meant to imply that performance was the reason for the switch.

We've had a track record of technologies which:

1) Automated things (relieving programmers from thinking about stuff)

2) Were expected to make stuff slower

3) In reality, sped stuff up, at least in the typical case, once algorithms got smart

That's true for interpreted/dynamic languages, automated memory management/garbage collection, managed runtimes of different sorts, high-level descriptive languages like SQL, etc.

Sometimes, it took a lot of time to figure out how to do this. Interpreters started out an order-of-magnitude or more slower than compilers. It took until we had bytecode+JIT for performance to roughly line up. Then, once we started doing profiling / optimization based on data about what the program was actually doing, and potentially aligning compilation to the individual user's hardware, things suddenly got a smidgeon faster than static compilers.

There is something really odd to me about the whole async thing with Python. Writing async code in Python is super-manual, and I'm constantly making decisions which ought to be abstracted away for me, and where changing the decisions later is super-expensive. I'd like to write.


> In reality, sped stuff up ... That's true for interpreted/dynamic languages, automated memory management/garbage collection, managed runtimes of different sorts, high-level descriptive languages like SQL, etc.

None of that is true.

Even SQL modeling declarative work in the form of queries requires significant tuning all the time.

The rest of the list is egregious.

> things suddenly got a smidgeon faster than static compilers.

No, they did not.


> It took until we had bytecode+JIT that performance roughly lined up.

It really didn't. Yes, in highly specialized benchmark situations, JITs sometimes manage to outperform AOT compilers, but not in the general case, where they usually lag significantly. I wrote a somewhat lengthy piece about this, Jitterdämmerung:

https://blog.metaobject.com/2015/10/jitterdammerung.html

Discussed at the time:

https://news.ycombinator.com/item?id=10344601


Well, if you wanna go that route, in the general case, code will be structured differently. On one side, you have duck typing, closures, automated memory management, and the ability to dynamically modify code.

On the other side, you don't.

That linguistic flexibility often leads to big-O level improvements in performance which aren't well-captured in microscopic benchmarks.

If the question is whether GC will beat malloc/free when translating C code into a JIT language, then yes, it will. If the question is whether malloc/free will beat code written assuming memory will get garbage collected, it becomes more complex.


Objective-C has duck typing (if you want), closures, automated memory management and the ability to dynamically modify code.

And is AOT compiled.

GC can only "beat" malloc/free if it has several times the memory available, and usually also only if the malloc/free code is hopelessly naive.

And you've got the micro-benchmark / real-world thing backward: it is JITs that sometimes do really well on microbenchmarks but invariably perform markedly worse in the real world. I talk about this at length in my article (see above).


> That's true for interpreted/dynamic languages, automated memory management/garbage collection, managed runtimes of different sorts, high-level descriptive languages like SQL, etc.

Of the things you mention, I agree on SQL, and "managed runtimes" is generic enough that I cannot really judge.

I'm thoroughly unconvinced about the rest being faster than the alternatives (and that's why you don't see many SQL servers written in interpreted languages with garbage collection).


Well, I think you missed part of what I said: "at least in the typical case" (which is fair -- it was a bit hidden in there)

There's a big difference between normal code and hand-tweaked optimized code. SQL servers are extremely tuned, performant code. Short of hand-written assembly tuned to the metal, little beats hand-optimized C.

I was talking about normal apps. If I'm writing a generic database-backed web app, a machine learning system, or a video game. Most of those, when written in C, are finished once they work, or at the very most have some very basic, minimal profiling / optimization.

For most code:

1) Expressing that in a high-level system will typically give better performance than if I write it in a low-level system for V0, the stage I first get to working code (before I've profiled or optimized much). At this stage, the automated systems do better than most programmers do, at least without incredible time investments.

2) I'll be able to do algorithmic optimizations much more quickly in a high-level programming language than in C. With a reasonably bounded investment of time, my high-level code tends to be faster than my low-level code -- I'll have the big-O level optimizations finished in a fraction of the time, so I can do more of them.

3) My low-level code gets to be faster once I get into a very high level of hand-optimization and analysis.

Or in other words, I can design memory management better than the automated stuff, but my get-the-stuff-working level of memory management is no longer better than the automated stuff. I can design data structures and algorithms better than PostgreSQL specific to my use case, but those won't be the first ones I write (and in most cases, they'll be good enough, so I won't bother improving them). Etc.


I am sorry to be blunt, but that sounds like a PR statement filled with nonsense.

> If I'm writing a generic database-backed web app

If you are writing a system where performance does not matter, then performance does not matter.

> a machine learning system or a video game. Most of those, when written in C, are finished once they work, or at the very most have some very basic, minimal profiling / optimization.

Wait, what? ML engine backends and high-level descriptions, and video games are some of the most heavily tuned and optimized systems in existence.

> At this stage, the automated systems do better than most programmers do, at least without incredible time investments.

General-purpose JIT languages are so far from being an actual high-level declarative model of computation that the JIT compiler cannot perform any kind of magic of the kind you are describing.

Even actual declarative, optimizable models such as SQL or Prolog require careful thinking and tuning all the time to make the optimizer do what you want.

> 2) I'll be able to do algorithmic optimizations much more quickly in a high-level programming language than in C.

C is not the only low-level AOT language. C is intentionally a tiny language with a tiny standard library.

Take a look at C++, D, Rust, Zig and others. In those, changing a data structure or algorithm is as easy as in your usual JIT one like C#, Java, Python, etc.

> 3) My low-level code gets to be faster once I get into a very high level of hand-optimization and analysis.

You seem to be implying that a low-level language disallows you from properly designing your application. Nonsense.

> I can design memory management better than the automated stuff, but my get-the-stuff-working level of memory management is no longer better than the automated stuff

You seem to believe low-level programming looks like C kernel code of the kind of a college assignment.


> If you are writing a system where performance does not matter, then performance does not matter.

It's not binary. Performance always matters, but there are different levels of value to that performance. Writing hand-tweaked assembly code is rarely a good point on the ROI curve.

> Wait, what? ML engine backends and high-level descriptions, and video games are some of the most heavily tuned and optimized systems in existence.

Indeed they are. And the major language most machine learning researchers use is Python. There is highly-optimized vector code behind the scenes, which is then orchestrated and controlled by tool chains like PyTorch and Python.

> Take a look at C++, D, Rust, Zig and others. In those, changing a data structure or algorithm is as easy as in your usual JIT one like C#, Java, Python, etc.

I used to think that too before I spent years doing functional programming. I was an incredible C++ hacker, and really prided myself on being able to implement things like highly-optimized numerical code with templates. I understood every nook and cranny of the massive language. It actually took a few years before my code in Lisp, Scheme, JavaScript, and Python stopped being structured like C++.

You putting "Python" and "Java" in the same sentence shows this isn't a process you've gone through yet. Java has roughly the same limitations as C and C++. Python and JavaScript, in contrast, can be used as a Lisp.

I'd recommend working through SICP.

> You seem to be implying that a low-level language disallows you from properly designing your application. Nonsense.

Okay: Here's a challenge for you. In Scheme, I can write a program where I:

1) Write the Lagrangian, as a normal Scheme function. (one line of code)

2) Take a derivative of that, symbolically. (it passes in symbols like 'x and 'y for the parameters). I get back a Scheme function. If I pretty-print that function, I get an equation rendered in LaTeX

3) Compile the resulting function into optimized native code

4) Run it through an optimized numeric integrator.

This is all around 40 lines of code in MIT-Scheme. Oh, and on step 1, I can reuse functions you wrote in Scheme, without you being aware they would ever be symbolically manipulated or compiled.

If you'd like to see how this works in Scheme, you can look here:

https://mitpress.mit.edu/sites/default/files/titles/content/...

That requires being able to duck type, introspect code, have closures, GC, and all sorts of other things which are simply not reasonably expressible in C++ (at least without first building a Lisp in C++, and having everything written in that DSL).

The MIT-Scheme compiler isn't as efficient as a good C++ compiler, so you lose maybe 10-30% performance there. And all you get back is a couple of orders of magnitude for (1) being able to symbolically convert a high-level expression of a dynamic system to the equations of motion suitable for numerical integration (2) compile that into native code.

(and yes, I understand C++11 kinda-added closures)


> And the major language most machine learning researchers use is Python.

Read again what I wrote. Even the model itself is optimized. The fact that it is written in Python or in any DSL is irrelevant.

> I used to think that too before I spent years doing functional programming.

I have done functional programming in many languages, ranging from lambda calculus itself to OCaml to Haskell, including inside and outside academia. It does not change anything I have said.

Perhaps you spent way too many years in high-level languages that you have started believing magical properties about their compilers.

> prided myself on being able to implement things like highly-optimized numerical code with templates.

Optimizing numerical code has little to do with code monomorphization.

It does sound like you were abusing C++ thinking you were "optimizing" code without actually having a clue.

Like in the previous point, it seemed you attributed magical properties to C++ compilers back then, and now you do the same with high-level ones.

> It actually took a few years before my code in Lisp, Scheme, JavaScript, and Python stopped being structured like C++.

How do you even manage write code in Lisp etc. "like C++"? What does that even mean?

> You putting "Python" and "Java" in the same sentence shows this isn't a process you've gone through yet. Java has roughly the same limitations as C and C++.

Pure nonsense. Java is nowhere close to C or C++.

> Here's a challenge for you.

I would use Mathematica or Julia for that. Not Scheme, not C++. Particularly since you already declared the last 30% of performance is irrelevant.

You are again mixing up domains. You are picking a high-level domain and then complaining that a low-level tool does not fit nicely. That has nothing to do with the discussion, and we could apply that flawed logic to back any statement we want.


> Perhaps you spent way too many years in high-level languages that you have started believing magical properties about their compilers.

> It does sound like you were abusing C++ thinking you were "optimizing" code without actually having a clue.

> Like in the previous point, it seemed you attributed magical properties to C++ compilers back then, and now you do the same with high-level ones.

I think at this point, I'm checking out. You're making a lot of statements and assumptions about who I am, what my background is, what I know, and so on. I neither have the time nor the inclination to debunk them. You don't know me.

When you make it personal and start insulting people, that's a good sign you've lost the technical argument. Technical errors in your posts highlight that too.

If you do want to have a little bit of fun, though, you should look up the template-based linear algebra libraries of the late nineties and early 00's. They were pretty clever, and for a while, were leading in the benchmarks. They would generate code, at compile time, optimized to the size of your vectors and matrixes, unroll loops, and similar. They seem pretty well-aligned to your background. I think you'll appreciate them.


Yes, the whole hoopla about async and particularly async/await has been a bit puzzling, to say the least.

Except for a few very special cases, it is perfectly fine to block on I/O. Operating systems have been heavily optimized to make synchronous I/O fast, and can also spare the threads to do this.

Certainly in client applications, where the amount of separate I/O that can be usefully accomplished is limited, far below any limits imposed by kernel threads.

Where it might make sense is servers with an insane number of connections, each with fairly low load, i.e. mostly idle, and even in server tasks quality of implementation appears to far outweigh whether the server is synchronous or asynchronous (see attempts to build web servers with Apple's GCD).

For lots of connections actually under load, you are going to run out of actual CPU and I/O capacity to serve those threads long before you run out of threads.

Which leaves the case of JavaScript being single threaded, which admittedly is a large special case, but no reason for other systems that are not so constrained to follow suit.


> Function colouring is a big problem in Python

Not when you know how to call sync functions from async functions and vice versa.

A sync function can call an async function via:

  loop = asyncio.new_event_loop()
  result = loop.run_until_complete(asyncio.ensure_future(red(x)))
An async function can call a sync function via:

  loop = asyncio.get_event_loop()
  result = await loop.run_in_executor(None, blue, x)
Where red and blue are defined as:

  async def red(x):
      pass

  def blue(x):
      pass
Note that the documentation is wrong about recommending create_task over ensure_future. That recommendation results in more restrictive code as create_task only accepts a coroutine and not a task.

This works for regular functions; I don't know how it works for generators.


You perfectly illustrated why this is a problem. Calling functions from one side to the other involves ceremony. Ceremony adds cognitive overhead and decreases readability.


Would that work for you?

    result = asyncio.run(red(x))
That's calling async function red from non-async code.

Not seeing any particular readability issue with that usage either. If you don't call asyncio.run or await on the result of an async function call, then you get a coroutine for result.


> result = asyncio.run(red(x))

Still much harder to think about/read than

   result = red(x)
We should be finding ways to get to the latter with concurrency. async/await is at best a patchwork compromise until we can do better.


I much prefer the performance implications of my code to be explicit rather than implicit.

Async in this case has less ceremony than threads. But still enough to make things explicit.


I've written Python functions that "call" another function either async or not, depending on what inspecting the function reveals.

For instance, imagine a "maybe_await" helper that just calls the function if it's synchronous, or otherwise awaits it.
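A rough sketch of what such a helper could look like on the async side (hypothetical code, not anyone's production implementation):

    import inspect

    async def maybe_await(func, *args, **kwargs):
        result = func(*args, **kwargs)
        if inspect.isawaitable(result):
            # async def functions hand back a coroutine, which we await;
            # plain functions already returned their value
            result = await result
        return result

    # usage (inside async code): value = await maybe_await(callback, 42)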


That's trivial:

    import asyncio
    import inspect

    result = yourfunc()
    if inspect.iscoroutine(result):
        result = asyncio.run(result)
If I have code like that, it's in one place per library; I didn't need to encapsulate it in a maybe_await function.


It's very heavy to do this, is it not? Like, you inspect the function on each call to figure out if it's async or not?


For things that happen at UI speed it isn't bad. If I wanted to do something a million times a second I'd worry about it.

Some Javascript frameworks, such as Vue, often do something similar in that you can pass either a sync or async callback and it does the right thing for either. In that case you could potentially inspect the function once and call it many times.


You don't inspect the function, but the result. Is it a coroutine? If so, it needs to be awaited (see my example above).

I believe this would also allow a non-async function to return a coroutine, I suppose.

Anyway, in this case chances are there will be no real performance overhead: if there's any I/O-bound operation running in the coroutine, the iscoroutine check on the result and the await call cost next to nothing by comparison.


Writing such code to call between sync/async makes me cry man - this is so ugly. I'd still consider it a problem.


Actually Python now offers an asyncio.run function. Your example may now be:

result = asyncio.run(red(x))


Alternatively, one can use gevent and get transparent async I/O from a modified runtime - something that a high-level language should've provided out of the box.
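To make the gevent approach concrete, a minimal sketch (assumes gevent is installed; the URL is only a placeholder). Monkey-patching the standard library makes ordinary blocking calls yield to gevent's event loop, so sync-looking code runs cooperatively:

    from gevent import monkey
    monkey.patch_all()  # must run before importing modules that use sockets

    import gevent
    import urllib.request

    def fetch(url):
        # looks like plain blocking code, but the patched socket yields
        with urllib.request.urlopen(url) as resp:
            return resp.status

    jobs = [gevent.spawn(fetch, "https://example.com") for _ in range(10)]
    gevent.joinall(jobs)
    print([job.value for job in jobs])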


Hiding awaitables from the language sounds like it goes against the zen (explicit is better than implicit).

For example, when someone accesses a descriptor in Django, this could end up being a query to the db - transparent, but dangerous. With asyncio you explicitly await something to return execution to the event loop.

At least to me that sounds like safer behaviour.


Hiding awaits would be against the zen. As you explained, it's a behaviour difference that might matter. The await forces you to think about it, and that's safer.

But the difference between asyncio.run(red(x)) and blue(x)... isn't. There's no difference which matters. They are just different implementations of the same behaviour.

If red and blue are both the same DB query, with the only difference being red is async-style and blue sync-style, these two lines have exactly the same program behaviour:

    result = asyncio.run(red(x))
and

    result = blue(x)
So the asyncio.run is just cognitive fog. It forces you to think about the type difference, but doesn't add any safety.

It's almost the opposite of Python's usual duck-typing parsimony, which normally allows equivalent things to be used in place of each other without ceremony.


> Hiding awaitables from the language, sounds like against the zen (explicit better than implicit)

Zen is not respected by explicit asyncio; just try to compose asyncio with iterators [1]

[1] https://stackoverflow.com/questions/42448664/async-generator...

This problem doesn't exist with gevent, and composability is a desired thing in any programming language. Python's asyncio fractured the community that was previously doing implicit async I/O with sync interfaces, and the current state of the API is not an example of composable primitives that follow the Zen of Python:

> Beautiful is better than ugly.

> Simple is better than complex.

> Readability counts.

> Special cases aren't special enough to break the rules.


Read that thread in full; it's also a case of explicit is better than implicit (async for vs for loops, essentially).


Should I start changing my code from "foo.bar" to "foo.__getattribute__('bar')"? Probably a bad comparison, but I'm looking for someone to tell me why.

Meanwhile, my pretty python foo = bar().something has gone all foo = (await bar()).something


What does gevent do - give Python something similar to Goroutines?


yes, pretty much, with a few specifics - https://sdiehl.github.io/gevent-tutorial/#greenlets


Async Python is faster when you use it for running parallel tasks. In this benchmark, you are running a single database query per request, so there is no advantage to being asynchronous: a pool of processes will scale just as well (but it will use more memory). The point of async is that it lets you easily make a Postgres query, AND an HTTP query, AND a Redis query in parallel.


Couldn’t threads handle that use case?


Yes they can. But threads are a pain to work with in python, as compared to async.


One big difference between one-thread-per-request and single-threaded async code is that synchronization and accessing shared resources is trivial when all of your code is running on a single thread.

An entire category of data races like `x += 1` becomes impossible without you even thinking about it. And that's often worth it for something like a game server where everything is beating on the same data structures.

I don't use Python, so I guess it's less of an issue in Python, since you're spawning multiple processes rather than multiple threads, so you're already having to share data via something out-of-process like Redis and rely on its own synchronization guarantees.

But for example the naive Go code I tend to read in the wild always has data races here and there since people tend to never go 100% into a channel / mutex abstraction (and mutexes are hard). And that's not a snipe at Go but just a reminder of how easy it is to take things for granted when you've been writing single-threaded async code for a while.
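To illustrate the `x += 1` point, a tiny hypothetical sketch: with threads, the read-modify-write can interleave and lose updates, whereas on a single event-loop thread the same statement cannot be interrupted by another task between awaits.

    import threading

    counter = 0

    def bump(n):
        global counter
        for _ in range(n):
            counter += 1  # not atomic: load, add, store can interleave

    threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # may be less than 400000 because updates get lost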


FWIW, Rust gives you the same simplicity (no data races at runtime) with threads as well.

(Not necessarily on topic, but if you’re really excited about dodging data races, I figured it would give you something fun to look at!)


Not in the same way, though: it catches the possibility of data races and forces you to rewrite until all the memory accesses are safe. That's more complex to program; you might need to redesign some of your data structures, for example.


This reminds me of Rob Pike’s Go talk about how concurrency is not parallelism. I think the Python community may be hitting this issue, where async is meant to model concurrent behavior, not always or necessarily to facilitate parallel activity.


I think a good chunk of Python developers expected (expect?) async to be a "get out of GIL free card". It's not.


Techempower [1] has a really great collection of benchmarks using highly controlled test setups that I like to look at to compare web frameworks. Not affiliated with them, but it's relevant to the post.

[1] https://www.techempower.com/benchmarks/#section=data-r19&hw=...


Async is useful for high I/O where you may have a lot of downtime between the requests. Are you pulling many requests from different servers with different response times, communicating with a db, or pulling out large response bodies? Async is probably going to do better, since each of those, done synchronously, represents potentially large idle periods where other requests could have gotten work done.

As to the article, the comparisons are good but fail to mention resource constraints. With Gunicorn, forking 16 instances is going to be a lot heavier on memory, so for a little more RPS you're probably spending a decent chunk of change more to run your workload. I don't think that's worth it, considering the async model in Python is pretty easy to grok these days and under this benchmark shares a similar performance profile.

Now, that said, if I had to guess these numbers are fine for the average API, but if you're doing something like high-throughput web crawling, or need to serve something on the order of tens of thousands to hundreds of thousands of RPS, async will win out on speed, resource use and ultimately cost.

Plus at one point they were like "we could only get an 18% speed up with Vibora" - I haven't used it myself, but an 18% performance increase at really any level of load is fantastic. Hand-waving that off tells me the workloads considered "realistic" don't take into account real high-RPS workloads like you might see at major tech companies.


> forking 16 instances is going to be a lot heavier on memory

It really depends on how the application is designed. Fork operates through mmap and copy-on-write. It's extremely lightweight by default.

A well-designed fork-based application will already have loaded everything necessary to run a given worker into memory; it won't munge any of the existing shared memory, and will only allocate and free memory associated with new events/connections/etc.

When programmed that way, individual forks are incredibly light on resources. All the workers are sharing the exact same core application code and logic in memory.


"All the workers are sharing the exact same core application code and logic in memory."

Oh interesting, are you saying an intelligent forking implementation is able to share static portions of memory with multiple children?

I was perhaps under the naive assumption forking was pretty much just a full memory copy of the parent.


Yep, but that's simply how the linux kernel works [1]. As a programmer you need to essentially load up all your modules/libraries/data/etc that you will need into memory before the fork, and treat the forked() processes as read-only as much as possible from a resource perspective. If you modify anything from the parent, you get your own page as soon as that happens. [2][3]

[1] https://www.informit.com/articles/article.aspx?p=368650

[2] https://en.wikipedia.org/wiki/Copy-on-write

[3] https://en.wikipedia.org/wiki/Fork_(system_call)
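A tiny POSIX-only sketch of that preload-then-fork pattern (hypothetical; gunicorn's --preload option does essentially this for application code). One CPython caveat: reference-count updates write into object headers, so even "read-only" access dirties some shared pages over time.

    import os

    BIG = list(range(5_000_000))  # loaded once in the parent, shared via COW

    children = []
    for _ in range(4):
        pid = os.fork()
        if pid == 0:
            # child: reading BIG mostly touches shared pages;
            # writing to it would trigger per-page copies
            total = sum(BIG)
            os._exit(0)
        children.append(pid)

    for pid in children:
        os.waitpid(pid, 0)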

