SciPy builds for Python 3.12 on Windows are a minor miracle (quansight.org)
475 points by todsacerdoti 6 months ago | 292 comments



What an amazing read. Now I know why my pip installs are failing in 3.12 but we now have a brighter future ahead.

Also while I love Python it’s helpful to understand why Python packaging is a (manageable) mess. It’s because of non standardization of build tools for C/C++/fortran and the immensity of the ecosystem, nothing to do with Python itself. It’s part irreducible complexity.

It’s a miracle it works at all.


Yes, that's a fundamental reason python packaging is a mess. Python success is largely due to the availability of key mixed language packages. No other mainstream language package manager has to deal with this.

For example, cargo for rust, which is great, can assume it is packaging mostly rust-only code. And while it is compiled, the language "owns" the compiler, which means building from source as a distribution strategy works. I don't know how/if cargo can deal with e.g. fortran out of the box, but I doubt cargo on windows would work well if top cargo packages required fortran code.

The single biggest improvement for python ecosystem was the standardisation of a binary package format, wheel. It is only then that the whole scientific python ecosystem started to thrive on windows. But binary compatibility is a huge PITA, especially across languages and CPUs.
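
To make the binary compatibility point concrete: per the wheel spec (PEP 427), a wheel's filename carries the Python, ABI and platform tags that an installer matches against the running interpreter. Below is a minimal sketch of splitting those tags out. The `WheelTags` type, the `parse_wheel_filename` helper and the example filename are made up for illustration; this is not how pip itself is implemented.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    """Fields encoded in a wheel filename (hypothetical helper type)."""
    distribution: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split name-version[-build]-python-abi-platform.whl into its parts."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelTags(dist, version, build, py, abi, plat)


# Illustrative filename, not a real release artifact.
tags = parse_wheel_filename("example_pkg-1.0.0-cp312-cp312-win_amd64.whl")
print(tags.python, tags.abi, tags.platform)
```

Each (Python, ABI, platform) combination needs its own prebuilt archive, which is why one missing compiler on one platform can block a whole release.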


Many rust crates actually do package code written in other languages. There are plenty of useful C/C++/Fortran libraries that nobody has rewritten into Rust but for which wrappers have been created that call into C. It works in Rust because the build.rs lets libraries do whatever they want during the build process, including invoking compilers for other languages.

Various factors still make the Rust and Python story different (Python uses more mixed-language packages, the Rust demographic is more technically advanced, etc). But a big one is that in Rust, the FFI is defined in Rust. Recompiling just the Rust code gives you an updated FFI compatible with your version of Rust. In Python, the FFI is typically defined in C, so recompiling the python won't get you a compatible FFI. If Python did all FFI through something like ctypes, it would be much smoother.
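
As a rough illustration of the ctypes approach mentioned above, here is a minimal sketch in which the foreign function's signature is declared in Python, so there is no compiled extension module to rebuild. It assumes CPython, and it uses `ctypes.pythonapi` (a handle to the running interpreter's own C API) only to avoid a platform-specific library lookup; the variable names are illustrative.

```python
import ctypes
import sys

# ctypes.pythonapi exposes the C API of the interpreter that is already
# loaded, so this demo needs no external shared library.
py_get_version = ctypes.pythonapi.Py_GetVersion

# The binding lives entirely in Python: we state the C signature here
# (const char *Py_GetVersion(void)) instead of compiling glue code in C.
py_get_version.argtypes = []
py_get_version.restype = ctypes.c_char_p

version_from_c = py_get_version().decode("utf-8")
print(version_from_c)
print(sys.version)
```

The trade-off is that nothing checks the declared signature against the real one; a wrong `argtypes` or `restype` fails at runtime, sometimes as a crash.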


> It works in Rust because the build.rs lets libraries do whatever they want during the build process, including invoking compilers for other languages.

It's still its own mess. You'll find plenty of people having problems with openssl, for instance.


I have the pleasure of maintaining a reasonably popular -sys crate, and getting it working on Windows is an absolute nightmare, especially considering the competing toolchains & ABIs (msvc and gnu) and millions of ways the C dependency can be installed (it’s a big one, and users may choose a specific version, so we don’t bundle it), with no pkg-config telling you where to look.

No idea how build.rs being able to run any Rust code is an advantage, it’s not like setup.py can’t run any Python code and shell out. In fact, bespoke code capable of spitting out inscrutable errors during `setup.py install` is the old way that newer tooling tries to avoid. Rust evangelism is puzzling.


Wait a second, I need to understand this better.

If you cargo build, can that run a dependency's build, including trying to compile C and stuff?



I believe build.rs can do pretty much anything: https://doc.rust-lang.org/cargo/reference/build-scripts.html


That's both scary and sad :-(


Why? If you are using a crate, its code will be running in your application. It's not really any more of a concern if it can run code while building.


Yeah, but with binary packages you can add another layer of defense in depth, signed packages, signature checking, etc. It's not just about the original authors themselves, it can also be about attacks on the public repositories, for example.


When I referred to build.rs - I merely meant that the build script made it possible to build code written in other languages - not that it solved all the problems. It very much doesn't solve all the problems involved.


Though you get partially saved by the "FFI is defined in Rust" part, with many of the hard to compile crates offering optional prebuilt binaries for the part that's not rust.


> It works in Rust because the build.rs lets libraries do whatever they want during the build process, including invoking compilers for other languages.

And also there's a helper library (the "gcc" crate) which does all the work of figuring out how to call the C or C++ compiler for the target platform, so that build.rs can be very small for simple cases. You don't have to do all the work yourself.


Don't you mean the `cc` crate? `gcc` is its terribly-outdated predecessor.


It's the same crate, it just changed its name at some point in its history. I still refer to it through its former name by force of habit.


I know it's the same crate, but they changed its name. `gcc` is the predecessor of `cc` in terms of name. If you depend on `gcc` instead of `cc`, you won't have any of the new improvements.


gcc is not terribly outdated overall, this is FUD. specific information, please


The `gcc` crate was last published in 2018. Use `cc`.

https://docs.rs/crate/gcc/latest/builds



As you might expect, compiling rust crates that use C libraries can lead to the inscrutable-block-of-text linker errors we know and love. I have been having a rough time with CUDA, CMSIS-DSP, Tensorflow, and OpenCV during the past few weeks. One of them requires LLVM=v15 to be installed; another requires v16+. A different one requires an old version of rustc to be installed. On yet another, when I posted on Github, the maintainers assured me the crate is, in fact, fine, and my system is misconfigured; phew!


Lol, as a maintainer of a reasonably popular and complex -sys crate, our crate is “fine” on Windows, at least in theory, and I’ve heard of successes using it. However, I can’t even port my own app depending on said -sys crate to Windows; there’s always a wall of linker errors. If you report a Windows problem to me, I won’t tell you it’s fine, I just throw my hands in the air.


Out of curiosity is it a “Windows is objectively difficult” problem, or a “Windows is not Linux and I know Linux best” problem?

I’ve only begun using it so my expertise is limited, but I think vcpkg aims to help with some of these difficulties by shipping code as source and then running make on dependencies so they are guaranteed ABI compatible because the same compiler builds everything.


That I don’t know Windows well is certainly a factor, but I think it’s at most 40% of the problem. Several notable issues, not all:

- Competing msvc and gnu toolchains and ABIs, with native and Windows-first dependencies working better or exclusively with msvc, and *ix-first dependencies working better or exclusively with gnu, is a uniquely Windows situation. Which is which for a given build is also not clearly labeled most of the time. (You might mention glibc vs musl, but there’s basically nothing uniquely musl, and when you’re compiling for musl you can almost always get/compile everything for musl from the ground up.)

- Confusing coexistence of x86 and x64 is another thing largely unique to Windows. (amd64 and arm64 are much more clearly separated in Apple land.)

- Package management is a complete mess. Choco, scoop, win-get, nuget, vcpkg, ad hoc msi, ad hoc exe installer, ad hoc zip, etc. etc. There’s no pkg-config telling you where to look and which compiler flags to use. If you want to pick up a user-installed dep, you special case everything (e.g. looking in vcpkg path) and/or ask user to supply the search path(s) in env var(s).

Anyway, shit mostly(tm) just work(tm) on *ix if you follow the happy path. There’s no happy path on Windows more often than not.


> The single biggest improvement for python ecosystem was the standardisation of a binary package format, wheel.

I agree. Some people love to complain about python packaging. But from one perspective, it's arguably been a solved problem since wheels were introduced 10 years ago. The introduction of wheels was a massive step forward. Only depend on wheel archives; don't depend on packages that need to be built from source and drag in all manner of exciting compile-time dependencies.

If there's a package you want to depend on for your target platform, and the maintainers don't produce a prebuilt wheel archive for your platform -- well, set up a build server and build some wheels yourself, and host them somewhere, or pick a different platform.


> Yes, that's a fundamental reason python packaging is a mess. Python success is largely due to the availability of key mixed language packages. No other mainstream language package manager has to deal with this.

Admittedly I'm not a python expert, but julia handles this just fine? It doesn't seem like it's a difficulty inherent to "mixed language packages". Somehow it appears to me that python's approach is just bad somehow.


Then again jl seems to take 10s to start doing something non-trivial each time while modules are being compiled.


THEN AGAIN, If you want to restart repeatedly, for whatever reason (??), maybe you should just compile the modules once...? See e.g. [1]

TLDR: There's a package called PrecompileTools.jl [2] which allows you to list: I want to compile and cache native code for the following methods on the following types. This isn't complicated stuff.

[1] https://julialang.org/blog/2023/04/julia-1.9-highlights/#cac...

[2] https://julialang.github.io/PrecompileTools.jl/stable/


Julia 1.9's native precompilation is definitely helpful in that regard, but loading those native shared libraries (.so files on Linux) into Julia does take some time to verify.

If the main objective is to reduce time to load and time to first task then PackageCompiler.jl [3] is still the ultimate way to do so.

Because Julia is a dynamic language, there are some complicated compilation issues such as invalidation and recompilation that arise. Adding new methods or packages may result in already compiled code no longer statically dispatching correctly requiring invalidation and recompilation of that code.

It's slightly more complicated than what you stated. It's "I want to compile and cache native code for the following methods on the following types in this particular environment". PackageCompiler.jl can wrap the entire environment into a single native library, the system image.

[3] https://github.com/JuliaLang/PackageCompiler.jl


I don't think this is really true. NodeJS has quite a similar problem.


Doesn't R handle fortran and C++ packages?


It does -- and to a much lesser extent shims exist for other languages as well. R also has imperfect packaging but I think it's handled well albeit with a level of complexity that also goes up a lot unexpectedly at times. For a truly great experience, call python packages within R in a custom conda environment in order to get data out of pandas in a particularly unholy way...


R has always shipped binary packages on Windows and Mac to avoid lots of the pain we see in Python.

Also, all packages must build with the latest version of R, or they are removed from CRAN. This makes the dep problems a lot less severe than we see with Python.


Nothing to do with Python? These FFI bindings exist because Python is slow as dirt.


This is one of the reasons why Python got so popular.

It is too slow to reimplement big pieces of software in it, so people just used bindings to existing code. And productivity rocketed!


These FFI bindings exist because Python was designed as a glue language for FFI bindings.


And the bindings to python exist because those languages are a pain in the ass to work with.

I don't want to work with Fortran, C++, Cobol, etc. And I sure as hell don't want to figure out how to integrate such wildly different languages into my existing and modern ecosystem.


You're probably replying to the original comment from the wrong angle.

Ecosystems like Java, .NET, Golang, Rust, etc do away with this entire problem by virtue of... not calling into C 99.99% of the time, because they're <<fast enough>>.


There's no right or wrong angle here. There's the useful or not useful angle.

Python was designed to call into C. It was always the solution to make Python fast: write the really slow parts in C and it might just turn out that will make the whole thing fast enough. Again: this is by design.

The languages and VMs you list were designed to be fast enough without calling into C. If you need that, great, use them.

People saying 'Python is slow' miss the point. It was never meant to be fast, it was always meant to be fast enough without qualifiers like 'no C'. If it isn't fast enough or otherwise not useful, don't use it, you've got plenty of alternatives.


I don't think Python was designed to call into C, is there some document from the early days claiming that was a major design goal?

1. The only way to integrate with C or other langs in early Pythons was to write interpreter extensions against the internal API. The cffi module seems to have appeared as late as 2012.

2. The Python interpreter API is not an excellent way to extend it, being as it is just whatever happens to be the internals of CPython specifically. There's now an HPy project that is trying to define a JNI equivalent for Python, i.e. something vendor neutral, binary compatible and so on.

A language designed to call into C would have had a much easier to use FFI from day one.


My understanding is that, however, calls into Fortran happen because you want some subroutine to be «as fast as possible».


> It’s a miracle it works at all.

I agree. In fact with what seems to be an exponential growth in complexity of software ecosystems, what's keeping it all from eventually getting to a "tower of Babel" catastrophe? Of course, this does not only apply to software, but it is a good example.


yes it was very eye opening. i often see people comparing their favourite package manager to python's and coming to the conclusion that python is terrible, but it's not! one thing i don't quite understand is why the python people don't just use a c/c++ math library instead of fortran?


There often simply doesn't exist an equivalent library written in c/c++ (or any other language for that matter). The example I'm familiar with is SLICOT (Subroutine Library in Systems and Control Theory) [1], exposed in python through Slycot [2]. It has routines for pole placement, riccati solvers, various factorizations, MIMO zero computations, and a ton of other stuff. As far as I have been able to find, no library in c/c++ or any other language comes close to supplying either the breadth or depth of this library. Further, many of the SLICOT subroutines were written by the original inventors of their respective algorithms, which I view as a big bonus.

[1] http://slicot.org/ [2] https://github.com/python-control/Slycot


A lot of these kinds of routines could be translated to other languages but aren't because they are complex and often unmaintained and no one is really around that understands them well enough to port them to C or Rust or whatever.

There is also the issue that often they were published before adding a LICENSE file was a thing. I've found myself in the position before of having to email professors in some random university to ask them if I can get permission to redistribute such a routine while packaging a library that depended on it. In one case I asked them if it would be possible to update their code (which was just a zip file on netlib) with a license, and the answer was, "no, but you have my email". So I found myself having to write something like "distributed with permission from the author, private correspondence, pinky swear" in my copyright file. Some of this code is so old that the authors aren't around, and it would get "lost" in terms of being able to get permission to use it. I mean, it's a potential crisis to be honest, if people really cared to check. (Until the copyrights expire I guess, which is what, 70 years after the author's death or some such?)

Anyway, I wonder if a potential solution is to autotranslate some of these libraries using LLMs? Maybe AI will save us in the end. Of course you can't trust LLMs so you still need to understand the code well enough to verify any such translation.


> (Until the copyrights expire I guess, which is what, 70 years after the author's death or some such?)

«To promote the Progress of Science and useful Arts», my ass. Why do we keep tolerating this Disney-caused bullshit ?

Though I guess that we didn't, and this is what caused the Free Software movement, so it all works out in the end ?


Right, lots of legacy code, plus lack of pointer aliasing in Fortran opens up more opportunities for optimization (or so I have read; this might have changed).


'restrict' has been a common extension in C since long ago and is now a proper keyword.

I guess it is a matter of taste.


also a matter of 'I'm not going to rewrite lapack in C because a platform which was never the intended target doesn't have a free compiler'.


Because some of the best math libraries are written in Fortran. Seriously lots of heavy duty scientific code was written in Fortran in the 1970s and is still underpinning applications today. In many cases there is no equivalent alternative.


Imo i would much rather write a math library in fortran than c or cpp. Fortran is quite a joy for doing numerical work. It sucks for most other things though. Really the only thing nowadays is that you'll probably be using gpus, so that makes cpp better for cuda integration.


Why would C++ be better than FORTRAN for GPUs ?


Indeed. The real problem is python seems to attract people with no training in software development. It is a mess on top of a mess.


That's also a feature - by design it has to be friendly to new users, and not an arcane art only accessible to the Chosen One, as much as those Chosen Ones would like to be the only programmers.


I thought that was what people said about php


A lot of the mockery PHP got was not because it attracted amateur developers, it was because the language itself was amateurishly implemented and because of the resulting mess when that leaked into how it behaved. Things like function names in the standard library optimized for a strlen based hash, a hand rolled parser that made it impossible to even guess in which contexts what features would work, proactive conversion of strings into numbers ("0hello" == "0world"), and so on. There were entire communities dedicated to mocking not the people working with PHP but the language itself.


And Basic, Visual Basic, HTML, Javascript, the list is probably endless :-)


It's also to do with Python itself.


How?


The Python packaging world is full of barely compatible tools and no clear vision. Even if you're consuming packages, or packaging pure Python code, it's often an incomprehensible mess.


Well, part of it is really Python's age and legacy. We are talking about going back to 1996. So much of python's development was duct tape through history in response to the changing world and whims of the contributors.

I'm not saying it's an excuse but it's just how it got to where it was. Newer languages have a lot of lessons learnt to build upon to be decent from day 0.


Java and ruby are of a similar age to python, and dependencies are a much better story there.


Ruby never had nearly the FFI/other language problem as Python so could almost entirely focus on Ruby-code delivery.


The same for Java, since its ecosystem has an allergy to calling native (non-JVM) code, to the point of rewriting perfectly good libraries in Java. When they do call native code, it's often in horrible ways (like copying a native library from within the JAR file to a temporary directory, and loading it from there, the JAR file coming with pre-compiled native libraries for all possible operating system and architecture combinations). So the Java package managers mostly focus on Java (and other JVM languages) code building and delivery.


Maven and Gradle support building C and C++ libraries just fine.


While this may be true, they've also had literal decades to improve the situation and have barely got anywhere. In some ways they've gone backwards!


I haven't found it so. I've stuck with pip and adopted venv when it showed up, and haven't needed anything else. I use Docker for "pinned" builds.


venv and docker are exactly the indicators of how Python is bad


Are screws bad because you need a screwdriver?


That's a completely nonsensical analogy. Maybe you missed his point, but well designed programming language infrastructure does not need Docker or venv to work. The fact that you have to resort to the massive hack of Docker shows how bad the situation is.

I do not have to use Docker or a venv for my Rust, Go or Deno builds.


I'm not a Rustacian, but so far I much prefer Python's packaging to Go's. Italicize all you like. I don't find the emphasis convincing.

When I receive a Python program that I'd like to modify, I can. When I receive a Go program that I'd like to modify, I must beg for the source code.

Do you "vendor" your database into your Go program? If not, you likely still need Docker, or something like it, for your program to work.


No, but you have to use Rust, Go or Deno.


I get to use Rust, Go or Deno (for my own projects). I am forced to use Python for work unfortunately.

It's not a terrible language for sure. It's just the packaging systems and tooling around it that are face-palmingly awful.


The thing is I've learnt Python packaging one time basically (obviously there's always more to learn about anything) and that's enabled me to write a ton more software than I would have written otherwise. For example, just a little plotting utility for me to visualise my accounting data. It's on GitHub, but nobody uses it but me. Could you imagine me writing this in anything but Python?

If your projects really do benefit from other languages, maybe because you're doing network or systems applications that need to be fast, then you might have easier packaging, but those languages easily require more work due to fewer libraries or just being harder (Rust).

As usual with these things it's six of one and half a dozen of the other :)


You're mistaking a hammer for a screwdriver


This pair of articles is excellent, both for good practical advice about installing Python packages, and for its general attitude about how to teach difficult things to large groups of people:

https://www.bitecode.dev/p/relieving-your-python-packaging-p...

https://www.bitecode.dev/p/why-not-tell-people-to-simply-use

Every single decision point or edge case represents permanent failure for hundreds of people and intense frustration for thousands. Of course, none of this is really to do with Python the language. It's more about the wide userbase, large set of packages and use cases, and overlapping generations of legacy tools. But most of it isn't C/C++/Fortran's fault either.


I'm assuming you're not a Python user otherwise you'd already know the many answers!

This link might give you a taste:

https://packaging.python.org/en/latest/key_projects/


I am a Python user, but never heard of most of the tools in that list. This is probably because everyone and their cousin attempts to write yet another package manager for Python.

The built-in tools venv, pip, (together with requirements.txt and constraints.txt) meet 99% of real life dependency management needs.
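
For readers who have not used these built-in tools, here is a minimal sketch of creating an isolated environment with the standard-library `venv` module. The `make_env` helper and the `.venv` directory name are just illustrative conventions, and `with_pip=False` is used only to keep the sketch quick; a real environment would normally want pip installed.

```python
import tempfile
import venv
from pathlib import Path


def make_env(parent: Path) -> Path:
    """Create a bare virtual environment under `parent` (hypothetical helper)."""
    env_dir = parent / ".venv"
    # Programmatic equivalent of running `python -m venv .venv --without-pip`.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = make_env(Path(tmp))
    # pyvenv.cfg is the marker file that identifies the directory as a venv.
    has_config = (env_dir / "pyvenv.cfg").is_file()
    print(has_config)
```

Inside an environment that does include pip, `pip install -r requirements.txt -c constraints.txt` installs the listed requirements while holding them to the versions in the constraints file.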


The proliferation of requirements.txt is one of the key reasons why Python packaging sucks so much.


Right what we need is a requirements.yaml (better yet, create an entirely new markup language for this particular project) and another new package manager for it. One day (one day!) I will start a project without python. One can hope.


> Right what we need is a requirements.yaml

It already exists, it's called pyproject.toml. It already existed for years in the form of setup.py. Requirements.txt means that projects can't be automatically installed which contributes massively to the difficulty of getting packages to work.


Pyproject.toml is a step in the right direction afaict but man is it complicated: "Please note that some of these configurations are deprecated, obsolete or at least discouraged, but they are made available to ensure portability." Core vs setuptools-specific etc. See https://setuptools.pypa.io/en/latest/userguide/pyproject_con... and https://packaging.python.org/en/latest/specifications/declar...


Also web frameworks! Many web frameworks.


Back when Linux was a janky corner case run by loosely-coordinated hackers with sometimes-impractical ideological constraints, when brilliant people were expending heaps of effort to enable support for it, that was fantastic.

Now that the janky corner case is a proprietary system run by cyber-landlords whose constraints are just... hostility, I feel less positive about the work being done to support it.

On the one hand it's really great that these people care so deeply about making these tools usable for everyone. And I applaud them for doing it, absolutely not suggesting they should change course, just pondering.

But... now instead of thinking "wow I'm so glad this work is happening" I do rather think "wow imagine what those wonderful people could achieve if they didn't have to work on this".


Well, they don't "have" to work on it; as was pointed out several times, the SciPy devs are volunteers.

Indeed, most of the story is explaining why SciPy just had to hope someone else would make an open source Fortran compiler for Windows, and it looks like it was mainly NVidia devs that provided salvation.


I must go on compiling / You can't break that which isn't yours / I must go on packaging / I'm not my own, it's not my choice


It's not even that. SciPy is paying the price of the idiotic (and highly biased) decisions of Python core-dev members who chose MSVC over MinGW for Python on Windows. (And their motivation is derived from Microsoft sponsorship. A bunch of people on core-dev list are straight-up Microsoft employees, who are paid by Microsoft to be on that list, and, on top of that, Microsoft pays for CI servers for CPython project).

This whole problem could've been avoided if Python just didn't use proprietary tools in its toolchain.


Are you saying using llvm would have also made things better, or are you saying that it would be better if everyone just used gcc?


I'm not knowledgeable enough about portability of LLVM, but if it's binary compatible with GCC-compiled stuff (or can be made such) on Windows, then, sure.

See, the problem here is that if you want interop, with, eg. Ruby, Erlang, R, Perl, Go and probably a bunch of others (the only other exception I know of is PHP (PECL) that uses MS toolchain), then you have to produce compatible binaries.

Ideally, it shouldn't be about the flavor of compiler, but be some kind of official documented format... that many compilers can easily implement. But, since de facto this format is "just use GCC", then, in practical terms, either use GCC, or pretend you use GCC.


BTW, in the quest to find out how other popular languages deal with this problem, I discovered that things can actually be even worse than Python. Enter JavaScript (Node.js)!

In JavaScript, you cannot distribute pre-built native modules, only source code. And if it compiles, then it compiles, and if it doesn't -- it's the user's fault.


> In particular, Meson was going to refuse to accept the MSVC+gfortran combination that was in use in conda-forge.

This sounds like a bug? The point of a build tool is to run the commands you tell it, not tell you, sorry dave.


It's the MSVC linker that would complain. The problem is that the C runtimes used by MSVC and gfortran (whose own runtime library is written in C) are not ABI-compatible. The hack that numpy used was to link the Fortran objects into a DLL to add a level of indirection (the import library) that would pacify MSVC.

So there was some extra work needed to create these DLLs. Either in the build description files, or in Meson. The SciPy people didn't want to implement this indirection in either place, and the Meson developers were not eager to help them either (they did help in general, for example with Fortran and Cython support; but they don't want to provide footguns) because it was indeed a hack. It only worked because the Fortran side didn't use files that were opened on the Python/C side, for example.

https://web.archive.org/web/20180711144501/https://pav.iki.f...


Thank you for the clarification. Meson refusing to do something and the developers not deeming that a bug seemed really unusual.


Oh, Meson is definitely a "sorry Dave" kind of build system. In many cases it's extremely opinionated, though I have only found a couple cases that get into "infuriating" territory.

It does compensate by generally preserving a lot more sanity than its competitors, and having a readable and maintainable description of the build system.


I appreciate the sanity it brings. More software should be like that, as you implied. Or at least that's how I read it.


The article is well-written and detailed, however I was taken aback by the claim that meson is 'widely used for C & C++ projects'. I've come across bazel more often than meson. I guess because meson is written in python it seemed like a good choice for SciPy, and it worked out in the end so congratulations.

But yeah, I think CMake is still the gold standard despite all its quirks, complexities and problems.


Meson is a front end.

Edit: Caution: This statement is incorrect--It can generate CMakefiles as a backend. It can also generate build files for MSVC. It can also generate standard Makefiles.

Edit: Corrected downstream. CMake can make Ninja files which Meson also makes. I got this backwards but kept the edit so people won't get confused.

CMake is NOT a gold standard. CMake is an agglomerative disaster.

For example: try getting CMake to accept zig as your compiler. "Oh, your compiler command has a space in it? So sorry. I'm going to put everything after the space in all manner of weird places. Some correct--some broken--some totally random." If you're lucky CMake crashes with an inscrutable message. If you're unlucky, you wind up with a compiler command that fails in bizarre ways and no way to figure out why CMake is doing what it did.

This is my experience with CMake every damn time--some absolutely inscrutable bug pops up until I figure out how to route around it. If I'm really unlucky, I have to file a bug report with CMake as I can't route around it.

Sure, if some unfortunate soul has beaten CMake into submission and produced a functioning CMakefile, CMake works. If YOU are the poor slob having to create that CMakefile, you are in for worlds and worlds and worlds of pain.


Note that Meson does not generate CMake files. It generates MSVC or ninja files.


Sorry. You are correct. I forgot that it's CMake that can now generate Ninja files.


> I forgot that it's CMake that can now generate Ninja files.

Cmake has had ninja support for a long time. I've used it for at least 7 years at this point.


I agree, I stay away from CMake whenever possible. I've tried so much to understand it, and yet, it kicks me around like a pinata


Same. By now I almost have an allergic reaction to thinking about using it.

I know the basic thing I'm trying to do is not going to work, and that I'm going to wind up opening 20 browser tabs that alternate between (1) trying to understand from first principles how to do it properly (and getting frustrated going around in circles through their labyrinthine yet thoroughly incomplete docs), and (2) just desperately searching the rest of the web for the right incantation to whisper (and getting frustrated by all the blog posts and forum answers that describe how to do the thing before they went and changed how everything works).

Feeling the rage and despair build as hours roll by, and you're still staring at a screen full of the most cretinous syntax ever excreted into the world.


Projects that use Meson

https://mesonbuild.com/Users.html

Includes Gimp, Gtk+, nautilus, Postgres, qemu, Wayland…


One thing that's noticeable is most of those projects are Linux-only and many of them are around GNOME / GTK ecosystem. Most of those projects have no Windows versions. The ones that have them have really bad compatibility with it (e.g. GIMP, GTK3). Meson has a preference of simpler code bases leaning more on the C-heavy side. That's usually not the case with big, old and commercial C++ code bases.

The programs that need complex build systems that require things like compiling a code generator first and then compiling the rest of the project with the generator etc. are quite common in C++ world to tame the language's shortcomings. Libraries like Qt, Protobuf and GRPC often introduce a crazy amount of build complexity.

CMake's complexity is a direct result of that, and it is currently the only build generator that can cope with it using basically every compiler in existence (including proprietary ones like ARMCC, ICC, MSVC and very limited ones like SDCC). Even Bazel cannot handle the same number of compilers and feature sets. That's what makes CMake the gold standard, not its string-driven scripting language.

CMake shares quite a bit of history with C++, and you hear sentences like "it's the only thing that works for this level of complexity" for both.
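
To make the "compile a code generator first, then compile the rest of the project with it" pattern above concrete, here is a minimal sketch of the ordering problem a build tool has to solve. The target names are hypothetical, and this only shows dependency ordering with Python's standard `graphlib`; it is not how CMake or Meson implement it.

```python
from graphlib import TopologicalSorter

# Hypothetical targets. Each key maps to the set of things it depends on.
# "codegen_tool" stands in for a generator that must be built before it can
# produce "messages.gen.cpp", which the final "app" then compiles.
deps = {
    "codegen_tool": {"codegen_main.cpp"},
    "messages.gen.cpp": {"codegen_tool", "messages.schema"},
    "app": {"main.cpp", "messages.gen.cpp"},
}


def build_order(graph):
    """Return targets in an order where every dependency comes first."""
    return list(TopologicalSorter(graph).static_order())


order = build_order(deps)
print(order)
```

The exact order of unrelated targets can vary, but the generator always lands before the generated source, and the generated source before the final binary.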


I thought conan was the standard? Or what about hunter? Vcpkg? And now with modules getting added to the language and being supported to varying degrees by all these options, I don't understand how the C++ ecosystem won't fracture into a pile of unusable garbage.


Vcpkg is a Microsoft thing. No project I know of, besides those targeting only Windows, even considers it. But I'm probably biased by my software choices.


If you’ve ever wished that CMake’s ExternalProject pattern could be lit on fire, launched into the sea, fished out of the sea and then incinerated via tactical nuclear strike before bundling up the ashes that are then launched into the sun, vcpkg is definitely something you should look into.

I’m no Microsoft fanboy. I’ve been in software dev for roughly 20 years at this point so generally view anything from Microsoft with genuine suspicion, so I get the hesitation to take it seriously. But it works across the big three (Windows, Linux, macOS) and is MIT licensed so I’d definitely recommend giving it a whirl.

The only serious knock against it is that they went the OG Homebrew route with a single Git repo containing all of their ports (equivalent to Homebrew Formulae). And then whoever designed the Git repo approach also knew slightly too much about Git internals and leveraged tree-ish refs as part of the versioning design which is just weird and confuses anyone that’s not spent time tearing into Git’s object model.

So basically, vcpkg is honestly a good tool that does what it does fairly well. It may not do everything you need, but if it can it’s amazing.

Also, the buried lede here is how vcpkg handles binary caching. Think of it like sccache but at the dependency level. I’ve seen it drop CI runs from over an hour to 10m purely because it helps skip building dependencies without resorting to bespoke caching strategies.


I know projects that have chosen it for Linux-only work, but I got the feeling they regretted it. Conan is what I was using before I switched to cargo, and it was fine as long as everything in your tree was conan. Dependencies you could wrap, but dependents were a headache.


conan and vcpkg are pretty much head to head, depending on which circles one moves on.

Everything else is mostly statistical noise.

People on C and C++ ecosystems like choice, so there will never be one single solution.


Meson does more than run the command you tell it, it can also synthesize those commands (so you can support MSVC/gcc/clang without writing build rules). If you ask it to synthesize a command for a combination it doesn't know about, of course it's going to tell you "sorry dave"
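
A rough sketch of what "synthesizing" a command means here, assuming only three compiler families. The function name and structure are made up for illustration and are not Meson's actual code; the flags themselves (`-c`/`-o` for gcc and clang, `/c` and `/Fo` for MSVC's `cl`) are the usual compile-only options.

```python
def compile_command(compiler_family, source, obj):
    """Synthesize a compile-only command line for a known compiler family."""
    if compiler_family in ("gcc", "clang"):
        return [compiler_family, "-c", source, "-o", obj]
    if compiler_family == "msvc":
        # cl.exe uses /c for compile-only and /Fo<file> for the object name.
        return ["cl", "/nologo", "/c", source, "/Fo" + obj]
    # Unknown family: the tool has no rule to synthesize, so it refuses.
    raise ValueError(f"don't know how to build with {compiler_family!r}")


print(compile_command("gcc", "foo.c", "foo.o"))
print(compile_command("msvc", "foo.c", "foo.obj"))
```

Asking this sketch for a compiler it has never heard of raises an error, which is the "sorry dave" case.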


Yeah, sure, 40 years ago make had implicit rules for building .o files from .c files so you didn't have to write them, but if you told it your compiler was going to be bananac, it wouldn't refuse to run.


The author of make personally thanked the author of bazel for making a replacement. He actually greatly regretted making Make.

(I love Make).


Stuart Feldman? Got a source/name?


I agree.

Disclaimer: author of a soon-to-be released Meson competitor.

But one obscure, little gem of a rant that taught me what you just said is [1].

tl;dr: Build systems should run the commands you tell them to run, period. Because sometimes, the programmer actually does know what he's doing.

I am ashamed to admit that before I read that comment, I was thinking about making my build system magical. But after reading that comment, I realized that "magic" is why people hate build systems.

[1]: https://ofekshilon.com/2016/08/30/cmake-rants/#comment-29273


Magic is great when you write the magic to suit your problem.

When you are subject to other people magic, it usually ends in tears.


Yes, very much yes.

Your comment is such a concise description of the problem.


I read this 2021 comment/rant and didn't find it illuminating or insightful:

> I just want a tool that doesn’t make ANY assumptions about compiler flags or compilers or how my project directory structure is laid out. I can do the leg work of inputting all the exact parameters and build configurations into the tool. I just need the tool to incrementally compile my code in a parallel fashion without HACKS

That's exactly what Ninja is, and it's existed since 2012.

CMake is a flawed generator for Ninja, but you can write your own. I did that for my project, as explained here - https://lobste.rs/s/qnb7xt/ninja_is_enough_build_system#c_tu...
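
For anyone wondering what "write your own generator" can look like, here is a minimal sketch, not the generator from the linked thread. It assumes a flat list of C sources, a gcc-compatible compiler, and paths without spaces (Ninja would need those escaped); the function name and defaults are hypothetical.

```python
from pathlib import PurePosixPath


def generate_ninja(sources, cc="cc", cflags="-O2 -Wall", exe="app"):
    """Return the text of a small build.ninja for the given C sources."""
    lines = [
        f"cc = {cc}",
        f"cflags = {cflags}",
        "",
        "rule compile",
        # -MD/-MF make the compiler write header dependencies for Ninja.
        "  command = $cc $cflags -MD -MF $out.d -c $in -o $out",
        "  depfile = $out.d",
        "  deps = gcc",
        "",
        "rule link",
        "  command = $cc $in -o $out",
        "",
    ]
    objects = []
    for src in sources:
        # main.c -> build/main.o
        obj = str(PurePosixPath("build") / PurePosixPath(src).with_suffix(".o"))
        objects.append(obj)
        lines.append(f"build {obj}: compile {src}")
    lines.append(f"build {exe}: link {' '.join(objects)}")
    lines.append(f"default {exe}")
    return "\n".join(lines) + "\n"


print(generate_ninja(["main.c", "util.c"]))
```

Writing the returned text to `build.ninja` and running `ninja` should give incremental, parallel builds with exactly the commands spelled out in the two rules.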


Understandable that you wouldn't find it insightful.

I found it insightful because I didn't know. Also, how angry the commenter was; I'd been struggling to figure out why people hate build systems, and the hate pouring out from that comment was palpable enough to point me in the right direction.

Anyway, I've already read your comments on lobste.rs, and I'm glad it's worked for you.

Ninja is great (my build system will be able to read Ninja files), but there are two major problems.

First, you still need a generator. Most people don't think about reinventing the wheel like you do, so CMake still comes up. And a generator can still have the magic that people hate.

Second, it's still limited in what you can do. Sometimes, you need something more complex to make your build happen.

Ninja can definitely take care of the 95% case, and I actually will encourage people to use it before they use mine.

I'm only going to be targeting the people with the 5% case, or that hate all of the widely-used generators. I fall into the latter category myself.


OK but I don't understand why you would read Ninja files rather than generate them.

I think Ninja almost always needs a generator -- it's too low level to write by hand, especially for C/C++ projects.

Meson apparently also generates Ninja, but I haven't used it.

---

I think the main difficulty with builds is solving the "Windows problem", as I wrote in that thread.

Good example of all that from yesterday:

https://lobste.rs/s/hh7wuy/why_scipy_builds_for_python_3_12_...

https://news.ycombinator.com/item?id=38196412

Anyone who can solve that problem will be a hero, but I also think no one person can solve that problem.

That is, cross-language / polyglot builds and working with the existing tools from each ecosystem is THE problem. Unfortunately most people seem to think that the language they use is the only one worth solving for.

Our Ninja build handles C++ great, but we also have many shell scripts for one-off Python things, and R things, and JavaScript things, ...


> OK but I don't understand why you would read Ninja files rather than generate them.

My build system is self-contained. It doesn't just do the configure; it can also do the build.

Also, a feature is being able to treat external builds as part of the same build.

For example, if one of your dependencies is a CMake project, it will be able to run the CMake configure as a target, import the targets from the Ninja file, and run those targets as part of the build for your project.

It should be able to do the same for Cargo files, Zig build files, etc.

That would solve the polyglot issue, minus details, which I'm working hard on.

> https://news.ycombinator.com/item?id=38196412

I'm not sure why you linked the post we're commenting on...

Edit: I'm also solving the Windows problem; my build system is Windows-native, and I have a design for something to build up command-lines based on the compiler in use, including MSVC.

Edit 2: I also understand why you question whether anyone can solve the polyglot problem. You are right to question whether I can. If your response is "show me the code," that is rational, and my response would be, "I'm almost there; give me three months." :)


I have thought that everyone was using WSL2 for this kind of thing and calling it a day. What are the reasons for even trying to build a native Windows version?


That's like asking why Mac developers don't work in a Linux VM instead of wanting a native build.

Because working in a VM is inconvenient and has poor integration.


WSL2 is both convenient and is extremely well integrated. Microsoft did a really good job with it and VSCode.

Also, in most cases for scientific computing, Mac users are using stuff like Remote SSH in VSCode to work directly on hardware that is running the code, which is pretty much always Linux.

Generally, it's kind of sad that manpower is being wasted by software developers on getting things running on Windows or Mac. It should be the other way around: have Microsoft or Apple dedicate manpower to port things over and make both systems conform to the Linux standard.


Let's try the same argument, the other way around: it's sad that Linux developers waste time developing the Linux desktop given that almost all usage of Linux is on the server. Linux developers should dedicate manpower to porting Linux desktop software to Windows/Mac which the vast majority of desktop users use. Now you see the problem with your argument?


For one, Linux is OSS, so contributions made there are available for anyone, forever. I don’t think voluntary developers are very keen on dedicating contributions to a business entity who by the next major release would have stolen your code and hidden it behind ads or some not so smart assistant that requires online connectivity all the time.

Linux is also the common ground. If you have to choose ONE system, you would choose Linux. Otherwise you are going to need BOTH Mac and Windows, possibly even more. Just getting the hardware to test that would be a major setback for a small oss contributor.

I don’t think you can consider scipy desktop software anyway. The IDE can by all means be, but just let it communicate with some daemon running in WSL, Docker, or remotely over a standard interface.


> it's sad that Linux developers waste time developing the Linux desktop

Hey, that's an unpopular statement on wannabe-hacker sites like HN or Reddit - people feel creative using Linux on the desktop and like feeling in control of the machine, the famous System Administrator Of LocalHost experts ;)


> Windows/Mac which the vast majority of desktop users use.

Use for...what? Windows is primarily used for people who play games. Mac is used by people who want tech jewelry. Neither of which is related to development.

And it would make more sense to develop on a platform that runs the same kernel as the servers. This is the reason why the whole WSL2 exists with VSCode integration. Microsoft quickly realized that if they want to compete in the cloud with Azure, they have to be Linux first.


Enterprise software development says hello to your world.


And that is why so many developers buy Apple devices and code on macOS, and walk around with them on FOSDEM, because of the kernel.


... no, because it's tech jewelry lol. That is literally the history of Apple.

Or are we still going to pretend that a computer that you don't own, because Apple tells you what software you can and can't install on it, is somehow better for development?


It is a UNIX desktop that actually works, and doesn't need endless hours researching for hardware compatibility or custom kernel parameters.

Android/Linux, ChromeOS/Linux, WebOS/Linux also work great for the same reasons.

Other than that, better leave it on servers and embedded devices being a UNIX headless clone, with cloud and hardware vendors taking the trouble to keep it running.


>doesn't need endless hours researching for hardware compatibility or custom kernel parameters.

Yeah so this is indicative of the fact that you don't really have ANY experience with modern Linux. You can take a well supported laptop and install Linux Mint on it and everything will just work, no tweaking required. Try it sometime before making 10 year old arguments that Mac users were making back in the day and apparently still do now.

Furthermore, I'd go even further and argue that as a developer, configuring basic documented things should be something you know how to do and find fairly straightforward, just like installing the tooling you need for your development.


My dear, I do have plenty experience with "modern" Linux, plenty of it,

https://www.idgshop.de/linuxwelt

As usual we get the answer,

"Have you ever tried distribution XYZ?"

The magical one that will sort out all problems, and then doesn't.


I'm sorry but no. I have set up probably over 100 Linux laptops at this point. It used to require more tweaks back in the early 2010s. Now you can get any Dell or Thinkpad and install Mint without issues.

You must be doing something wrong if you are having to tweak kernel parameters.


Would like to know which laptop that might be...


Pretty much any Thinkpad or Dell works out of the box. You may have to change a setting here or there for some advanced things like adaptive charging, but most of the time the core OS works very well.

There are also Framework laptops and Librem laptops that are Linux first.

My current DD is an Ideapad with Manjaro, which even being somewhat more bleeding edge than Debian based ones, has not only been flawless, but things like Nvidia Optimus work straight out of the box, with external displays.


> Pretty much any Thinkpad or Dell works out of the box.

…except for AMD ones with AMD GPU :( https://www.wezm.net/v2/posts/2020/linux-amdgpu-pixel-format...

or Intel ones with a particular wifi card https://bugzilla.kernel.org/show_bug.cgi?id=203709

I however agree with you that at least for me, the Linux desktop on almost all laptops and desktops "just works". Especially when comparing with Macbooks - you have way less hardware choice there.


Apple's locking down of macOS has been greatly exaggerated. You can install anything you want on a Mac, Apple doesn't stop you. The same can't be said for iOS or iPadOS, however.


>You can install anything you want on a Mac

Can I install Ubuntu?


if you're on an Intel Mac, yes. if you're on apple Silicon, yes, but it's not user friendly (but hey, that's Linux for you)


Intel Macs aren't relevant anymore. As for Apple Silicon, running an Ubuntu VM doesn't count. You also need to use Parallels, which is not free. The Asahi Linux method is still very buggy because it's essentially a reverse engineering of Apple's hardware. And it's not guaranteed to ever not be buggy, because Apple. And Apple will 100% kill it if it ever gets to popular because it runs well, because they will lose revenue streams they get with macOS.

So the definitive answer is no, Macs are still pretty much locked down.

I get that people like the battery life and the hardware of Macs, which is fine for personal use, but objectively for a laptop that is going to be used for development, you get much more utility out of buying a "non mac" laptop of your choice in the form factor, and installing Linux on it.


In what way, today, has Apple prevented you from running whatever you want on a MacBook? Not some hypothetical "And Apple will 100% kill it if it ever gets to[sic] popular" future action by Apple, but an actual thing they've done to stop AsahiLinux, or someone else, from making the progress they've been making, on trying to run on bare metal?

In macOS, I am not prevented from running whatever I want. There are some extra buttons to click to allow certain kinds of software to run, but ultimately, Apple doesn't have a say on what I can and cannot run on my Mac. Macs aren't "still pretty much locked down" because the open source community hasn't been able to make a kernel up to your standards. That's just not a commonly accepted definition of "locked down".


> The Asahi Linux method is still very buggy

It's been getting way better lately. I don't daily drive it mainly because I've yet to move my stuff from the macOS partition.


If it ever gets to the state that modern Linux is in, then I will change my mind. It's very much like the Linux of the 2010s, where you had to configure a whole bunch of things to get Linux to work despite people swearing that it works.


If you want to install Ubuntu, buy from an OEM that is supported by Ubuntu.

https://ubuntu.com/certified


> Also, in most cases for scientific computing, Mac users are using stuff like Remote SSH in VSCode to work directly on hardware that is running the code, which is pretty much always Linux.

This might be true for extremely demanding tasks that need to run on a cluster or cloud but a surprisingly large amount of scientific computing is perfectly manageable (indeed much easier to manage) on a laptop, if there are compatible toolchains for the OS.

That said, I would not be dissatisfied with a world in which Linux was the OS of choice for nearly everyone. Fully agree it would be great if research software developers could focus on the domain; distributions take a wildly disproportionate amount of effort.


What's so well integrated with atrocious IO speeds between boundaries?


not a problem if you don't cross them - my wsl2 disk image took more space than the windows install. I basically used windows as a web/email/chat client and a terminal to the real system, which was wsl2.


Bad integration is a problem with integration even if you personally do no integration


To each their own. I consider it great at what it does. It has limitations but what doesn’t? It’s a tool, use it, or don’t - I use it, it works for me.


Good for you, but also irrelevant to the argument


Absolutely not - the integration is perfectly fine, for me.


it's not the integration that is perfectly fine for you, but the lack thereof since you simply don't use it


I use the \\wsl$\ to get an occasional file out of wsl2 super easily, vscode remote wsl, the shared localhost interface and a wsl2 gui app every once in a blue moon. Don't think it's nothing.

What I don't use: files on the Windows fs from wsl2 continuously, or vice versa.


> What I don't use: files on the Windows fs from wsl2 continuously, or vice versa.

Exactly, your "occasionally and once in a blue moon" is close to nothing (but I agree, it's not nothing)


Wasn't that mostly on wsl1?


vice versa, wsl2 is the true VM that truly suffers

https://vxlabs.com/2019/12/06/wsl2-io-measurements/


I think you mistook this for reddit.


WSL2 is not really a VM though, in the traditional sense, as it has full hardware access, including the GPU. Technically, when you enable WSL2, the Windows build itself is running through the Hyper-V hypervisor, as is the Linux distribution. In fact, a few years ago, I used Proxmox to basically do the same thing, in order to test cross-desktop apps I was developing for Windows, Linux, and macOS (as a Hackintosh).


>WSL2 is not really a VM though, in the traditional sense, as it has full hardware access

It is exactly a VM, and the WSL2 guest does not have full hardware access. The hypervisor can paravirtualize compatible GPUs, but for other hardware (such as USB) this is not possible. Hardware passthrough is also not possible in WSL2.


Sure, however in the traditional sense of most people using VMs, there is no virtualization of GPUs, ie vGPU, and thus no hardware acceleration, if you use something like VMWare or VirtualBox inside Windows. In a hypervisor setup like Proxmox, there is, but the issue is that you need one hardware device per VM, which can get annoying. So paravirtualization via WSL is actually, depending on your needs, superior to either of the previous two cases.


As someone stuck on a Windows corporate laptop, I can say that WSL2 is definitely not on the approved software list. Unless it is installed by default and comes with a big solid green check mark for compliance/virus scanner/whatever security doo-dad of the day, it is a non-starter for many of us.

Sure, with enough begging and pleading, anything is possible, but that usually requires Conversations.


Sure, corporate laptops are always a different story. Ideally it's nice to have cross compilation but that might not always be possible. I use a lot of Rust and I really like their OS compatibility out of the box.


> Technically, when you enable WSL2, the Windows build itself is running through the Hyper-V hypervisor

My (very limited) understanding of hypervisor is it by definition is used to run VMs.

So I don't get why that makes WSL2 not be counted as a VM, even "technically".

I assume you mean once you enabled hypervisor, technically both Windows itself and WSL2 are VMs in parallel?


> I assume you mean once you enabled hypervisor, technically both Windows itself and WSL2 are VMs in parallel?

Correct, I am referring to the traditional usage of VMs via VMWare or VirtualBox which do not have GPU acceleration, while WSL2 does.


Way more researchers use Windows than you might think. Plus students.


Maybe I'm biased, being in EU and all. But I've seen more Linux than Windows in those environments.


I can assert that a large majority of people at CERN leave GNU/Linux for servers and use Windows/macOS on their desktops.

It was like that 20 years ago, and it looks quite the same when I visit it for Alumni events.


Same, it's either Linux or macOS in compute-intensive research from what I've seen in Europe. I've seen groups working with Windows-based data analysis tools, but those groups and the ones that need Python rarely intersect.

WSL2 is a godsend for people forced on Windows by blind IT policies, but fortunately IT doesn't have that kind of control in academia.


What you've "seen" has nothing to do with what's actually popular. Why does the concept of observer bias have to be explained on this site?


My response was based on my own experience. Hence the remark.

What I've observed was also dictated by my own choices.


I'm guessing big corporations where it's near impossible to get a non-Windows machine are a big part of the audience.


msft and nvidia tackling CUDA drivers on WSL2 is a godsend. Especially when you can use Docker Desktop on it.

Definitely making the best of a less than ideal situation.


The κατα in καταστροφή means 'down to' or 'according to' rather than sudden, and implies strongly a bad turn

The inverse of κατα is often ανα though αναστροφή means literally "up turn" or invert

So maybe ευστροφη a "good turn" (eustrophe) would be better coinage

But arguing with JRR Tolkien on language coinage and being right would be a ... ευκαταστροφη i.e. good luck!

But in general I love their noticing of the overflowing grace -- that's something that gives joy and happiness in the world


"Kata" in katastrofi actually means "against", so katastrofi is things "turning against".


I’m under the impression that the best BLASes are mostly C (MKL, Blis, and OpenBlas). I wonder how far they could get with just C and Python.

I wonder if they’d just go with libflame instead, if they started today.

Of course there’s lots of other functionality in scipy; iterative stuff, sparse stuff, etc etc, so maybe Fortran is unavoidable (although, Fortran is a great language, I’m glad the tooling situation is at least starting to improve on Windows).


The question of removing fortran from scipy has happened a few times, and never got anywhere for the reasons you gave. A lot of scipy itself contains fortran code that would take man years to rewrite.

Several key parts using fortran have been removed, once it became possible. For example fft stuff.


not gonna lie, seeing "Fortran is a great language, I’m glad the tooling situation is at least starting to improve on Windows" was not on my 2023 bingo card


I assume that is because Fortran is such a great language that you just didn’t think it was possible that the tooling situation on Windows could be anything but perfect. :)


Great read. After spending a lot of time this year modernizing a CMake C++ project with Python bindings, which I successfully added to conda-forge as a new feedstock, I can say with great confidence that the first IT-related change I would make as God Emperor would be to extirpate Windows from all Universes for Eternity.


Very naive question, but are the semantics of Fortran so different that it can't be translated to C first and then compiled using a C compiler? Perhaps maintained in C going forward?

I can't imagine there are a lot of Fortran folks around maintaining these old libraries - they must need maintenance no?


If you want the code to run slower, yes, you could do that. Because there are no pointers in Fortran, only arrays, and because arguments to functions aren't allowed to alias (let's ignore the horror of COMMON blocks for now), aggressive optimization and vectorization is easier.

The standard Fortran math libraries just work, and they are fast.

I should clarify that you can write C/C++ code that would have equivalent speed, especially with the C restrict keyword, but putting in an f2c step to translate the existing code will make things significantly worse in many cases.


Also, in Fortran arrays are normally stored in column-major order (sometimes called “Fortran order”). C uses row-major order. A simplistic translation would have terrible performance.
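
A small sketch of why the storage order matters, using plain index arithmetic on a hypothetical 3x4 array (the helper names are made up). It computes where element (i, j) lives in flat memory under each convention.

```python
def row_major_offset(i, j, rows, cols):
    """Flat offset of element (i, j) in C (row-major) layout."""
    return i * cols + j


def col_major_offset(i, j, rows, cols):
    """Flat offset of element (i, j) in Fortran (column-major) layout."""
    return j * rows + i


rows, cols = 3, 4
# Walk along the first row (i fixed at 0, j increasing) in both layouts.
c_steps = [row_major_offset(0, j, rows, cols) for j in range(cols)]
f_steps = [col_major_offset(0, j, rows, cols) for j in range(cols)]
print(c_steps)  # [0, 1, 2, 3]
print(f_steps)  # [0, 3, 6, 9]
```

The same loop touches consecutive memory in one layout and jumps by a whole column in the other, so a line-by-line translation that keeps the loop nest but changes the layout ends up striding through memory.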


Fortran has had pointers in the standard language since F'90 and as a ubiquitous vendor extension since ca. 15 years earlier.


Are they often used in the big linear algebra libraries?


No. The math libraries are written in Fortran 77 for the most part, which did not have pointers. I should have been more precise about that.


Thanks for clarifying, I was intrigued by your comment and it seemed you’d know.


> Because there are no pointers in Fortran, only arrays, and because arguments to functions aren't allowed to alias [...], aggressive optimization and vectorization is easier.

Can you explain this like I'm 12?

What's an example of an aggressive optimization you can make based on arguments not being aliased and there being no pointers?


  int *x;
  int *y;
  // ...
  *x = 5;

Because of aliasing, x and y might point to the same memory location, so the compiler must assume that when you set *x to 5, *y may have been modified too. At the next access it will read *y again from memory and discard any cached value it might have in a register.

The compiler will try to prove that no such aliasing is present by tracking your pointer usage, but that's not always possible; in those cases it will assume the worst and re-read values from memory.


I finally understand. Thanks for this example.


Ah that makes sense. Thank you!


Not a direct compiler optimization, but consider memcpy() vs memmove() as an example. If you know two regions of memory do not overlap you can call memcpy() for a direct optimized copy, but if they overlap you must call memmove() and introduce an intermediate copy.


memmove does not (in any implementation I’ve ever heard of) introduce an intermediate copy, it just performs the copy loop in the reverse direction to handle the overlap (and can’t always vectorize in the same way memcpy can).
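
To illustrate the direction trick, here is a conceptual sketch on a Python `bytearray`. It copies one byte at a time, whereas real C library implementations use much wider loads and stores; the function names are hypothetical.

```python
def copy_forward(buf, dst, src, n):
    """Copy n bytes from src to dst, lowest index first."""
    for k in range(n):
        buf[dst + k] = buf[src + k]


def copy_backward(buf, dst, src, n):
    """Copy n bytes from src to dst, highest index first."""
    for k in range(n - 1, -1, -1):
        buf[dst + k] = buf[src + k]


def move(buf, dst, src, n):
    """memmove-style copy: pick the direction that never reads a clobbered byte."""
    if dst <= src:
        copy_forward(buf, dst, src, n)
    else:
        copy_backward(buf, dst, src, n)


data = bytearray(b"abcdef")

naive = bytearray(data)
copy_forward(naive, 1, 0, 5)  # overlapping with dst > src: overwrites its own input

safe = bytearray(data)
move(safe, 1, 0, 5)  # same request, but copied back to front

print(naive)  # bytearray(b'aaaaaa')
print(safe)   # bytearray(b'aabcde')
```

No intermediate buffer is involved in either case; only the loop direction changes.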


It makes sense, but when would you ever memcpy with overlap? I would think any situation that lets that happen is from a bug, like you have an incorrect buffer length or an incorrect destination address.


Inserting an element in an array is something along the lines of memmove(arr + idx + 1, arr + idx, (length - idx) * sizeof(*arr)); arr[idx] = foo;



Fortran is a higher level language than C. There are plenty of fortran developers.

It's awful for application development, but that isn't its niche.


We have this already - “f2c” has been around for decades.


It's currently used to provide a hackish WASM-compatible port of scipy for Pyodide. The Pyodide maintainers are eager to drop the f2c hacks in favor of lfortran or any WASM-capable Fortran compiler, because f2c can cause very hard-to-debug low-level crashes when running the tests of scipy or other downstream libraries (e.g. scikit-learn).


yeah, fortran has native arrays


One nit to pick: as far as I am aware, "aarch64" and "arm64" are the same thing. Am I off?


They are the same thing but there used to be two competing LLVM implementations on the backend side.

[1] https://www.phoronix.com/news/MTY5ODk


Thanks, that is very helpful!



I love how he ranted so hard - without even mentioning the embedded or realtime variants.


What I see in Python is that "aarch64" usually refers to Linux and "arm64" usually refers to MacOS ARM. I don't know enough about these things to understand why they have different names.
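
As far as I know, `platform.machine()` typically reports `aarch64` on Linux and `arm64` on macOS for the same architecture, which is where the two spellings tend to show up in Python tooling. A minimal sketch of treating them as one name follows; the alias table is an assumption for illustration, not an official mapping.

```python
import platform

# Assumed alias table for illustration only.
_ALIASES = {
    "aarch64": "arm64",
    "arm64": "arm64",
    "amd64": "x86_64",
    "x86_64": "x86_64",
}


def normalize_machine(name):
    """Map platform-specific spellings of an architecture to one name."""
    key = name.lower()
    return _ALIASES.get(key, key)


print(platform.machine(), "->", normalize_machine(platform.machine()))
```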


the build system churn in python is really hard to keep up with.

I'm also curious about performance numbers on Windows (though, to first order, it doesn't matter... anything serious is probably running on a Linux machine).


Fortunately it should be slowing down now. The big transition was getting everyone on board with PEP 517, especially converting legacy Setuptools projects.


I've been hearing something like this about Python builds every two or three years for decades. How long until the next "big transition" is "converting legacy PEP 517 projects"?


Hopefully never.


Hopefully, sure. But why should we believe that this time is different?


The main difference now is there is an actual standard rather than a de facto tool that becomes the standard.

The idea is that different tools may rise and fall in popularity, but they should all be following the standard, so the compatibility breaks should be minimal.

Will it work? No idea, but it's the best attempt to make things work well for everyone yet.

Python 3.12 might be the biggest churn moment, but there are probably a few more down the road, such as dropping legacy version specifiers.


For pure CPU computation Windows is just as fast as Linux since 99.9% of the time it's your code running and not the OS.


Not true. Changes to the OS, particularly to the scheduler, can affect CPU-bound work a great deal. It's "your code", but the OS decides when and where it runs. For example, the changes between Linux 6.5 and Linux 6.6 led to a >20% uplift in TensorFlow and some smaller uplifts to Blender. This is the same software on the same hardware.

https://www.phoronix.com/review/linux66-epyc-xeon/3

I couldn't track down a more detailed scheduler-specific benchmark. I remember reading one on Phoronix a few months back...


Depending on the code you're running, calling convention can matter a lot. The SysV ABI will use xmm/ymm/zmm automatically, whereas on Windows you have to opt into it with __vectorcall.


Not many compute clusters run windows


> > While Fortran has long been the butt of the joke in IT departments the world over, in a curious twist of fate, it has seen a dramatic resurgence over the last few years. While the reasons for this are not exactly obvious ...

> It is obvious to me. :)

https://x.com/OndrejCertik/status/1722364038212899274?s=20

resurrecting fortran: https://ondrejcertik.com/blog/2021/03/resurrecting-fortran/


This was a great read, thx :) Of all the other languages than Python, I am the most drawn to Fortran. I used it for some university project and it was quite nice. Hope I'll find a good excuse to pick it up for some project.


Sometimes I wonder if Rust’s biggest draw is not that it is safe, but that it removed a lot of the BS hoops you have to jump through to get a working program on your computer.

I wonder if the Python alternatives will also get a similar boost for that reason. If maintaining the language and the ecosystem is that much of a bear, it would save a lot of human hours to do that. Even if we started back from scratch for a bit


Just go to read: https://www.bitecode.dev/p/relieving-your-python-packaging-p...

TL; DR

The reason why we need virtual this and that in python is that some things work in one and some work in the others. This is particularly problematic on macOS, where sometimes the system falls back to the default (somewhat default) python compiler. I even have a script to check the version and lots of the environment for running different programs.
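For readers wondering what such a check script might look like, here is a minimal sketch using only the standard library. It is not the commenter's script; the function name `describe_interpreter` is made up for illustration, and the virtual-environment test assumes a `venv`-style environment (very old `virtualenv` releases used a different marker).

```python
import platform
import sys


def describe_interpreter():
    """Collect the facts that usually explain 'wrong Python' surprises."""
    return {
        # Full path of the interpreter that is actually running this script.
        "executable": sys.executable,
        "version": platform.python_version(),
        "implementation": platform.python_implementation(),
        # In a venv, sys.prefix points at the environment and
        # sys.base_prefix at the interpreter it was created from.
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running it with each `python` on the PATH makes it easier to see which interpreter a given command is really resolving to.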

It is a mess.

After this magic, is there a new packaging environment, procedure, ... for the rest of us?


Heroes and miracles are indeed required in the swamps of bad designs


Having finally gotten over the whole 2->3 fiasco I'm still sitting out new Python development until the PyPA sorts out the new packaging & distribution story. They're working on it and making progress, but it's still pretty gnarly.

Anyway, the complexity is too damn high!


Indeed it was a great read about what goes on behind the scenes in maintaining such packages, and about a problem nearly every developer thinks about when such an extension has to be shipped along.

I still wonder if targeting Python's LIMITED C API wouldn't help in this case. I use a tool (for Nim) which seems to target that Limited API and solves the problem of specific Python version mismatches, at least for my specific code. I never had to upgrade a pure C/Nim extension due to a Python version!
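To illustrate why the Limited API avoids per-version rebuilds: extensions built against it are tagged `abi3`, and a wheel tagged for example `cp38-abi3` is meant to load on CPython 3.8 and every later 3.x. The following is a simplified sketch of that rule only; the function name is hypothetical, it ignores the platform tag entirely, and real installers use more complete tag-matching logic.

```python
import re


def abi3_wheel_supports(python_tag, abi_tag, interpreter_version):
    """Return True if a CPython interpreter could load an abi3 wheel.

    python_tag and abi_tag are the wheel's tags (e.g. "cp38", "abi3");
    interpreter_version is a (major, minor) tuple such as (3, 12).
    """
    if abi_tag != "abi3":
        # Version-specific ABI tags (e.g. "cp312") need an exact match,
        # which is outside the scope of this sketch.
        return False
    match = re.fullmatch(r"cp(\d)(\d+)", python_tag)
    if match is None:
        return False
    minimum = (int(match.group(1)), int(match.group(2)))
    # Same major version, and at least the minimum minor version.
    return (
        interpreter_version[0] == minimum[0]
        and tuple(interpreter_version) >= minimum
    )
```

So a single `cp38-abi3` build keeps working on 3.12, whereas a `cp311-cp311` build has to be redone for each new release.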

Using zig-cc along with a fixed GLIBC (2.27) and a generic CPU flag also made it possible to target the Linux ecosystem in case the user is not a developer and just uses the compiled extension shipped with the package.


About the compiler/architecture table: What's the difference between arm64 and aarch64? The Wikipedia article for ARM64 redirects to AArch64.


I was thinking the same thing. Maybe they meant the armhfp architecture?


This is a really insightful article. Thank you for sharing it. It reminds me there is still so much to be done in the FOSS community and how grateful we are to have it. Thank you!


An amazing read. Most of the article notes that there was no Fortran compiler yet (though amazing how well the one they landed on worked -- a miracle alright!) but I was particularly struck by the ABI issues. The story of a lone developer hacking a MinGW build that used the right ABI struck home.

I can provide some background on the ABI issues in Windows.

The following is personal opinion not company opinion, and I am a product manager not an engineer so may have some technical details wrong. But:

At Embarcadero we're moving our C++Builder toolchain forward to a new Clang[1], and using a new C and C++ RTL layer [also see 1]. Like SciPy, we're now using UCRT, and the key bits that cause difficulty are the C++ RTL and platform ABI. Boy howdy do we have some stories.

Issues:

* There is no standard Windows ABI beyond what msvc produces. This results in WinAPI (C-level, in other words!) APIs that could not in the past be reliably used from anything other than msvc because they could throw exceptions, rather than returning error codes, and Win64 SEH is not fully documented (clang-cl, which is open source, disappears into the closed source msvcrt to handle this.)

* Or another issue: cdecl isn't standardised. This may surprise those of you who think -- as I used to think! -- that cdecl is simple and a known calling convention for any platform. We have issues where sret for return values[2] can be different between our toolchain and something built with MSVC. Since we need to change multiple languages, and one of our languages (Delphi) handles some returns (managed types) differently, changing this is more complex than it looks. We already did a lot of ABI compatibility work several years ago between multiple languages[3].

Back to the ABI: our new Clang is aiming at being fairly compatible with mingw-llvm on Windows. (And MSVC too, but we're starting with mingw-llvm as the basis.) That does not include a C++ ABI which the C++ committee has been resistant to, but it's a known good, working C++ toolchain, open source.

If Windows ever did have a platform ABI, it would likely be based on MSVC, but I would suggest that we -- developers -- should resist that until or unless the entire toolchain including runtime internals that affect the ABI is open sourced, or at least documented so that other toolchains can match it.

[1] https://blogs.embarcadero.com/win64-clang-toolchains-in-rad-...

[2] sret is a hidden (?) or special return value used for structs, basically when a large (> register size) memory is required. I think. This is where I hope I don't embarrass myself too much in the explanation. See eg https://stackoverflow.com/questions/66894013/when-calling-ff... In other words, returning values is more complex than it seems, even for a plain simple C method returning a struct which should be _incredibly basic_ and becomes a complex ABI issue.

[3] https://blogs.embarcadero.com/abi-changes-in-rad-studio-10-3...


Sometimes when a detail of Windows isn't documented, the Wine source code can be useful. Have you tried looking at it for details of win64 SEH? For example:

https://github.com/wine-mirror/wine/blob/master/dlls/ntdll/e...

https://github.com/wine-mirror/wine/blob/master/dlls/msvcrt/...


We have not, simply for legal reasons. The Wine folk are wizards, though.


> closed source msvcrt

Isn't msvcrt source code included with Visual Studio? I distinctly remember looking through it many years ago.

And I see it now too on my install: C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\crt\src

But maybe it's not 100% of it and some key parts are missing, and obviously there might be legal issues reading this source code.


Not complete. Some parts are licensed, such as math libraries, and VS dropped the ability to rebuild the CRT DLL externally some time ago. A lot of it does have viewable source, but some critical core routines are only shipped compiled.


https://archive.is/SyeRt

Archive link. Website doesn't load for me.


I'm very glad that f18 worked (so far as we know). I haven't tried building SciPy for any platform yet myself, much less Windows.


In their table about compilers, aren't AArch64 and ARM64 the same?


Yes, though different operating systems use different naming. On Linux it's `aarch64` and on macOS+Windows it's `arm64`.
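In practice this means scripts that branch on the architecture have to treat the spellings as aliases. A small sketch of that normalization follows; the alias table and the name `canonical_arch` are illustrative assumptions, not an exhaustive or official mapping.

```python
import platform

# Illustrative alias table: map OS-specific spellings to one canonical name.
_ARCH_ALIASES = {
    "aarch64": "arm64",   # typical on Linux
    "arm64": "arm64",     # typical on macOS and Windows
    "amd64": "x86_64",    # typical on Windows
    "x86_64": "x86_64",   # typical on Linux and macOS
}


def canonical_arch(machine=None):
    """Normalize an architecture name; defaults to the running machine."""
    name = platform.machine() if machine is None else machine
    name = name.lower()
    # Unknown names are passed through unchanged rather than guessed at.
    return _ARCH_ALIASES.get(name, name)
```

With this, `canonical_arch("aarch64")` and `canonical_arch("ARM64")` both give `"arm64"`.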


I just want to comment on this:

> Meson was going to refuse to accept the MSVC+gfortran combination

Back in the day, I went to Python's core-dev list and asked why. Why would any sane person ever use MSVC for a cross-platform language runtime? And guess what the answer was? Well... The answer was "Microsoft pays us, gives us servers to run CI on, and that's why we will use Microsoft's tools, goodbye!"

For reference, Ruby uses GCC for the same purpose as do plenty of other similar languages for this exact reason.

To give you some context, I ran into this problem when writing bindings to kubectl. For those of you that don't know, in order to interface with Python from Go, one needs CGO, and on MS Windows it means MinGW. You could, in principle, build Python itself with GCC (a.k.a. MinGW) (and that's what MSYS2 a.k.a Cygwin a.k.a. Gitbash does), but this means no ABI compatibility with the garbage distributed from python.org.

So, after I had a proof of concept bindings to kubectl working on Linux, I learned that there will be no way (well, no reasonably simple way) to get that working on Windows. So, the project died. (Btw, there still isn't a good Kubernetes client in Python).

---

On the subject of packaging. I've decided to write my own Wheel packager. Just as a way to learn Ada. This made me read through the "spec" of this format while paying a lot more attention than I ever needed before. And what a dumpster fire this format is... It's insane that this atrocity is used by millions, and so much of critical infrastructure relies on this insanity to function.

It's very sad that these things are only ever discussed by a very small, very biased, and not very smart group of people. But then their decisions affect so many w/o even the baseline knowledge of the decisions made by those few. I feel like Python users should be picking up pitchforks and torches and marching on PyPA (home-)offices and demand change. Alas, most those adversely affected by their work have no idea PyPA exists, forget the details of their work.


hindsight is 20/20.

Don't criticise people for making certain decisions years ago when those don't match what you'd choose to do now. Often you'll find that they were very reasonable given the constraints at the time.

Also the spec will have evolved over time with changes that would have been made under the constraints of the existing system, which tends to produce things that are not as nice compared to something that was designed from the get-go to support the features. This is something that's seen very often in software engineering, and is probably part of the reason why long-lived codebases tend to be dumpster fires in general.

Calling them 'very biased and not very smart' is not very constructive.

That's not to say that the wheel format isn't a dumpster fire (I'll have to take your word on that), or hasn't morphed into one with time & revisions.


Are we talking about not criticizing Wheel format?

Because, if so, I'm not buying it. Wheel is an iteration after Egg, that was created in a world full of package managers, packages of all sorts and flavors. Wheel authors failed to learn from what was available for... idk some odd thirty years? (I'm thinking CPAN).

But, it has problems that just show how immature the people who designed the format were when it comes to using existing formats. For example, the Wheel authors were completely clueless about multiple gotchas of Zip format (even though they've been using Egg which is also based on Zip for... what a decade? I mean, come on, you had to be blind and deaf not to know about these problems if you had anything to do with packaging).

But, the most important problem is in the name format. And it's not about knowing gotchas of other formats. It's just total lack of planning / ability to predict the next step. For instance, some parts of the Wheel name are defined roughly as "whatever some function in sys module returns on that platform". So, it leaves this part of the name unpredictable and undefined, essentially. Wheel authors cannot make a universal package because in order to do so they need to have knowledge of all existing platforms and all future platforms... which, of course, nobody does.
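For context, the wheel file name has the shape `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`, and the basic platform tag is derived from the platform string Python reports for the running machine, with `-` and `.` replaced by `_`. Here is a rough sketch of both pieces, assuming a well-formed name; the file names used with it are made-up examples, and tags such as `manylinux` are assigned by other rules that this does not cover.

```python
import sysconfig
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between version and python tag.
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components: {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


def basic_platform_tag():
    """Platform tag as derived from the running interpreter's platform string."""
    return sysconfig.get_platform().replace("-", "_").replace(".", "_")
```

For instance, `parse_wheel_filename("example_pkg-1.0-cp312-cp312-win_amd64.whl").platform_tag` is `"win_amd64"`, while `basic_platform_tag()` depends entirely on the machine it runs on, which is the open-endedness the comment is complaining about.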

And they've done it because... it was easy to do. Not because it was the right thing to do or the smart thing to do. The consequence of this decision is that implementing a PyPI competitor is virtually impossible because it's a layered crap-sandwich of multiple layers of mistakes that support each other (various parts of the name format were modified multiple times over the course of history, and weren't immediately supported by pip). Similarly, implementing a viable alternative to pip is equally almost impossible because of the same historical crap-pie of multiple mistakes on which Python package publishers built their whole infrastructure.

This led to the situation where today the whole Python packaging is locked into using PyPI, setuptools and pip. Those who are intimately familiar with the subject know that they are broken beyond repair and have no hope of getting better, but the mess is so big that undoing it just seems impossible. And, of course, PyPA is blissfully unaware of all the nonsense that's going on in its tools and keeps adding new worthless features to polish this turd.



Also https://xkcd.com/1987/

Edit: not to imply that the work of the maintainers hasn’t been INCREDIBLE (it has). I just thought this XKCD was a funny take on how complex the python packaging ecosystem is.


maybe the free software community should stop subsidizing microsoft's operating systems and let them port things like scipy to it themselves

after all, if you want linux, you know where to find it; it's right there in wsl2

also microsoft could start shipping a fucking compiler with their sorry malware-ridden excuse for an operating system, like every single other operating system vendor has for sixty years


> microsoft could start shipping a fucking compiler

Visual Studio is a free download. Most users are not developers. Waste of space to include it by default.

https://visualstudio.microsoft.com/vs/community/


All you really need is cl.exe and friends, and the C/C++ headers, in some fixed directory path. (And, apparently, maybe a Fortran compiler?) That shouldn't take up that much space.

And the last time I was forced to install Windows 10, it spent a few hundred megabytes of bandwidth and disk space on Candy Crush Soda Saga, not to mention a bunch of other junk I never asked for, so disk space is not that precious to Microsoft.


That simply won't work for most Windows development. Need all the frameworks, including .NET. At least tens of gigabytes. Full Visual Studio 2022 installation is 210 GB.


you just need something you can compile them with


Nobody's asking for .NET or an IDE. Just enough to build things like the aforementioned Python packages. Or am I missing your sarcasm?


This is not sarcasm.

You are asking for basic development capability in the base OS installation.

Vast majority of Windows development uses .NET, or uses frameworks etc. The barebones C++ compiler and standard library simply won't work for most development on Windows, so what's the point? You are expecting base functionality to cater to your very niche specific needs which is practically useless for the vast majority of Windows development in general. There's no business case for it. Won't happen.

Even on Linux I need to install a lot of headers and libraries and compilers and SDKs before it can be used for development. Ubuntu base install is practically useless for dev without `apt install <all the things>`.


that's a ridiculous excuse

unix v7 included a compiler and was three megabytes

https://www.tuhs.org/Archive/Distributions/Research/Keith_Bo...

https://opensimh.org/research-unix-7-pdp11-45-v2.0.pdf

the compiler was a tiny fraction of that

gcc 9 is about 50 megabytes

windows 11 is over 8 gigabytes; you need a 16 gig usb drive to install it

there are probably individual audio files included in windows that are bigger than gcc

also, tho, this is a lot like not including life jackets on a ship because most passengers don't get shipwrecked


It's not an excuse. Visual Studio is multiple gigabytes. Yes they should make it smaller but unless that happens it would be stupid to include it by default. Waste of space.


It's possible to install only the components you want from Visual Studio. I use Chocolatey to install only the compiler and a component needed to compile Cython code on Windows:

```
choco install -y visualstudio2019buildtools
choco install -y visualstudio2019-workload-vctools
```

In this case it's installing Visual Studio 2019.


If we're talking about base OS install then it's a different story. Which options do they choose as default for all users? Are you suggesting that Microsoft create an OOTB setup perfectly custom tailored to this one tiny niche requirement to compile Python whatever?

Even on Linux the OOTB setup is useless for my development. I always need to `apt install` all the compilers, frameworks, libraries, SDKs, utilities, etc., before it's usable.


they can ship a compiler that isn't a giant pile of shit then

there are free ones


A compiler without system libraries is useless.

All of those little apt get install ....-dev to spend an afternoon on.

Followed by installing Clion, QtCreator or KDE + KDevelop.


You should get a job at Microsoft and fix it.


i'd sooner breakfast on goat vomit


You are only looking at the compiler without standard library, and all the nice tools modern C and C++ developers have grown to enjoy since 1979.

You should be comparing to a C compiler for MS-DOS.

If you want to do a proper comparison you should include GNU/Linux libraries for all major architectures already compiled, GUI frameworks, IDE, .NET, Python, node, Java SDK, Azure integration SDKs, device drivers,...


because of dynamic linking, the standard library is already included in microsoft windows, and all that other crap isn't needed to get blas and lapack to build


What standard library would that be?


Microsoft's Universal CRT, present by default in Windows 10, and installable on Windows 7 SP1 and later.

Linking to UCRT using an entirely FOSS toolchain is, alas, nontrivial, but supported by mingw compilers (gcc and clang; no idea about the various FOSS Fortran compilers):

https://mingwpy.github.io/ucrt.html


beats me, i haven't had a microsoft windows box since 02000


So you just like to rant, I see.


yet somehow i was still correct: https://news.ycombinator.com/item?id=38203731


Most users will never need a compiler. If they need it, it's a download away.

Life jackets are hopefully never needed, but when they are needed, the crew can't go to the warehouse and get them. Big difference.


it might be a download away, or it might be permanently unavailable


Which was the usual way things were on UNIX, thanks Sun, before GNU/Linux became relevant.


no, there was a brief period of time where sun decided to try to imitate microsoft in this stupidity, but fortunately none of the other unix vendors followed suit


First of all, everyone else was doing the same outside UNIX in the 1980's.

Secondly, Solaris, Aix, HP-UX, DG/UX were the same in what concerns having to buy a UNIX developers license for the compilers.

So other UNIX vendors did follow suit, and I can't be bothered to dive into BYTE and DDJ ads from 1980 - 1990's to add others to the list.


though i never bought one myself, i never saw an aix or irix box without compilers installed, and don't have any personal experience with hp-ux and dg/ux, but it was only for solaris that the fsf decided they had to put up precompiled gcc binaries on their ftp site because the vendor wasn't shipping one


Why would it be permanently unavailable?


that's what always happens to downloads

like 90% of my links from 8 years ago are 404 now


The compiler from 8 years ago is very likely obsolete. If it was relevant, you'd have stored it somewhere safe or you'd have a support contract with Microsoft.


the compiler from 8 years ago can still build code that works on the operating system from 8 years ago; the new compiler often cannot, even if it does exist

it may well be obsolete in the sense that the new compiler is more convenient to use and produces more efficient code, but that's irrelevant

software doesn't rot like the potatoes you forgot about in the fridge

your argument is contingent on the presumption that people never do stupid things that cause them damage in the future. but if that were true, nobody would buy cigarettes, or for that matter microsoft windows, in the first place


Software indeed rots because it doesn't exist in a vacuum. Requirements change, bugs are discovered, support declines unless you give golden coins to someone. Infinite backwards compatibility is the exception rather than the norm.

My point still stands: the compiler should have been kept around if it is required to keep something business-critical on an 8 year old machine running. Whether such old versions of compilers are still provided depends on the goodwill of Microsoft.


it seems that somewhere in the thread you went from arguing against my position to arguing in favor of it


We also shifted away from discussing what an end user needs (a recent OS, probably no compilers unless they develop software, and if they do, a recent one) to what one would need if stuck with a legacy hardware or software stack.

