1. 51

I am particularly interested in forks of abandoned projects and forks that diverged from upstream, but I would love to hear any and all war stories.

Answer in any manner you’d like, but here are some sub-questions I’m interested in:

  1. Why did you decide to fork in the first place? Did you have a choice?
  2. What were the best parts of maintaining a fork?
  3. What were the worst parts?
  4. Did you change the fork significantly? Was that by choice?
  5. If you forked from an active project, did you keep up to date with upstream?
  6. If you were to start over, what would you do differently?
    1. 25

      Background: back before Django was originally publicly released, its ORM was literally a code generator. You would run it over a set of model-definition classes, and it would output some new files of code containing the query functions for those models. This apparently got some pushback from people who were shown early versions of what became Django, but there wasn’t enough time to completely rewrite it before the first public release. So Django launched with an ORM that was still, at heart, the code generator; the only difference was that instead of writing the files out to disk, it generated the code as modules that only existed in memory, and hacked them into the import path to make them work (this is not as hard as it sounds).
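
      The in-memory-module trick really isn’t that hard. As a minimal sketch of the general mechanism (illustration only, not Django’s actual code, and the names are made up): compile the generated source into a fresh module object and register it in sys.modules, and ordinary imports will find it even though no file exists on disk:

      ```python
      import sys
      import types

      def install_virtual_module(name, source):
          # Compile generated source into a module object that exists only in
          # memory, then register it so ordinary imports can find it.
          module = types.ModuleType(name)
          exec(compile(source, f"<generated {name}>", "exec"), module.__dict__)
          sys.modules[name] = module
          return module

      # Hypothetical generated query code for a model (names are invented).
      generated_source = "def get_list():\n    return ['poll 1', 'poll 2']\n"

      install_virtual_module("polls_queries", generated_source)

      from polls_queries import get_list  # works, although no polls_queries.py exists
      print(get_list())
      ```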

      This still was pretty unpopular, so for the Django 0.95 release the ORM was completely rewritten, an effort which came to be known as “magic removal” (since the original ORM would “magically” cause whole modules of code to appear in your import path that you’d never written).

      At the time, I worked for the newspaper company where Django had originally been developed, and we both used internally and sold commercially a news-oriented CMS built on top of Django, which was going to take a while to port to the new “magic-removal” version of Django. So I volunteered to help maintain bugfix branches of the pre-“magic-removal” Django, which ended up being available not just to us and our customers, but also to anyone who wanted the code. The branches got renamed a while back, but are still visible in the main Django GitHub repository.

      It mostly wasn’t that bad; some critical bugfixes from later post-“magic-removal” Django were backported in, but the pre-“magic-removal” code was actually pretty stable and since there was no new feature work happening, there were only around 40 commits over the course of a little over two years of maintaining it.

      The worst part was this bug, which was the bane of several people. As befits a gnarly bug, once I finally understood what was actually going on (and please don’t expect me to be able to explain it now, 16 years later, when I’ve forgotten almost everything I ever knew about the internals of the pre-“magic-removal” ORM), the fix was literally a two-line change. Though it turned out the same bug lurked in the new post-“magic-removal” Django, and had to be fixed there too.

      I don’t know if I’d do that again; my views on software updates and addressing technical debt before it reaches the “oops, gonna need two years to dig out of this hole” stage have evolved a lot over the course of my career, and so now I push hard for staying up-to-date and handling changes in dependencies ASAP.

    2. 17

      I maintained the CHERI LLVM fork for a long time (now I maintain a fork of that). Other folks are still maintaining the fork of FreeBSD for CHERI. I’ve worked on a few in-house forks of compiler things at various points and worked with companies that had in-house forks of FreeBSD.

      I think it’s worth characterising them according to why they won’t merge upstream, typically in the following groups:

      • Upstream doesn’t want the patches.
      • It’s experimental and will churn a lot before it’s in a good place to ask for review.
      • It holds something that gives competitive advantage.

      The first reason happens when you have rare requirements. It’s usually worth engaging with upstream here to see if anyone wants some of the infrastructure for your work. A number of the things that we did for CHERI in LLVM, for example, ended up being useful for AMD GPUs because they also had weird pointer semantics. Folks building managed-language VMs also wanted a stronger separation of pointer and integer types, so we worked with them on some things. Similarly, Juniper wanted stable, parseable output from various command-line tools and it turns out that they weren’t the only ones, so they upstreamed libxo and integrated it with things like ifconfig, so their tooling could just consume JSON or XML output.

      The second is common in research prototypes. A lot of these are thrown away (I have a load of these that showed that an idea was not, in fact, a good one). Others die for bad reasons. I’ve seen a few colleagues write papers about some LLVM optimisation but start with an old release and then not have the time to update their work to trunk (occasionally they’re just doing bad science: one abandoned a project because, after getting a CGO paper, they updated LLVM and found that their transform now gave a slowdown when their baseline was less naive). A lot of good work in academia is wasted because no one budgets for the upstreaming time. In both FreeBSD and LLVM, I’ve tried to build connections between universities and companies that care about their research and can have engineers work to upstream it (which often involves a partial rewrite, but is usually cheaper than inventing the thing from scratch). The REF in the UK helps because getting things upstream into a widely deployed project counts for more in the impact metrics than simply publishing a top-tier paper.

      The third category is often the hardest because it’s very rarely true. Most of the time, the secret sauce is easy for someone else to replicate. It’s also painful because, if it actually is valuable, upstream will implement the same feature in a different way. Now you have to decide whether to make your fork diverge more by ripping out an upstream feature, or throw away your work. If you pick option 1, remember that more smart people don’t work for you than do: the upstream version of the feature is likely to get more engineering investment than yours and so, over time, yours will get worse. Companies that do this well gradually trickle features upstream when it looks like someone else might want to work on them, so they have a small lead but get to share long-term engineering investment.

      1. 2

        I would just add one other reason that I’m aware of: regulatory requirements. If you work in or for governments, they can have odd laws/rules where the code has to exist and be “owned” by someone in-country or in-organization, etc. Lots of times that means you can’t just fork, you have to create from scratch, which is annoying. Sometimes you can fork, though!

        Sometimes it’s easy and it’s literally just a soft fork, where you can get away with just adding a changelog entry, “I own X here”. Sometimes you have to hard fork it for various reasons, and in those situations you usually don’t get to ever play with upstream again.

        Though I guess now that I wrote this all out, this could maybe be a sub-category of your third reason.

        1. 2

          For corporate deployments, you often need to have an internal soft fork to a repo and then do builds from there, so that the entire source-to-binary flow is on auditable systems. Microsoft does this with all open source that they distribute binaries for and I’ve seen similar flows for internal use elsewhere.

          It’s less common now that a load of public CI systems are free for open source, but it used to be extremely common for open source projects to do binary releases from random contributors’ systems (LLVM still does this). This provides a great attack vector for malware: a targeted attack on a single developer can install malware that can then propagate to all users. Big projects typically build on isolated infrastructure (FreeBSD, for example, builds all packages on machines that are not connected to the Internet, so ensures that any malware must be present in the source tarballs, whose hashes are present in the ports tree and are thus auditable later). Newer small projects that provide binary releases will do so with something like Cirrus CI or GitHub Actions where each build runs in a pristine VM with known configurations and so malware needs to be committed to the repo (where it shows up in the git logs if someone looks carefully enough). Cloning the repo and building ensures that any malware that you inadvertently ship must be in the repo and can’t be quietly deleted, because your clone is a separate audit log showing that it was there.
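
          The auditability comes from pinning hashes somewhere outside the machines that build and ship binaries. As a rough illustration of the idea (not the actual ports/Poudriere tooling; the manifest and file name below are made up), checking a tarball against a recorded SHA-256 is just:

          ```python
          import hashlib
          import sys

          # Hypothetical manifest of pinned hashes, standing in for what a ports
          # tree (or any manifest kept off the build machines) records per release.
          KNOWN_HASHES = {
              "example-1.2.3.tar.gz":
                  "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
          }

          def verify_tarball(path):
              # Hash the file in chunks and compare against the pinned value.
              digest = hashlib.sha256()
              with open(path, "rb") as f:
                  for chunk in iter(lambda: f.read(1 << 20), b""):
                      digest.update(chunk)
              name = path.rsplit("/", 1)[-1]
              return KNOWN_HASHES.get(name) == digest.hexdigest()

          if __name__ == "__main__":
              for tarball in sys.argv[1:]:
                  print(tarball, "OK" if verify_tarball(tarball) else "MISMATCH OR UNKNOWN")
          ```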

          I don’t really view this as forking though, because ideally the copy in your clone is identical to upstream’s. A few places do something like this but carry a handful of back-ported fixes.

          1. 2

            FreeBSD, for example, builds all packages on machines that are not connected to the Internet, so ensures that any malware must be present in the source tarballs, whose hashes are present in the ports tree and are thus auditable later

            How do the source tarballs get into these build systems?

            1. 4

              It’s been a while since I looked at the infrastructure, so I’m not sure if this is still the case, but they were exported from a read-only NFS share. If you compromised a builder, you then had to compromise the NFS server to reach the outside world. All packages are built with Poudriere, which builds each one in a jail, and also prevents any network access, so you first need a jail escape to get access to the network stack to try to attack the NFS server.

    3. 12

      We fork rust crates pretty regularly at work, to fix bugs or add features we need. Our general practice is to stay on upstream as much as possible, submitting an upstream patch immediately. That actually works out about half the time. So we have a small handful of forks.

      We generally only need to touch these things in order to keep dependencies up to date. This is thanks to Rust having a good compatibility policy, for the most part. Most crates do a pretty good job of keeping api breakage to a minimum as well.

      …except for one. Not going to name it, but one of our key dependencies seems to take delight in substantially breaking the api at every release. We’ve had that one forked for maybe a year, since they don’t seem to be in a hurry to take the feature patch we submitted. And frankly it’s been less work to maintain our fork than to try to keep up with the real library. We just update its deps when we need to and go about our business. We don’t keep it up to date with upstream, since we don’t really have a reason.

      1. 4

        This is how it’s gone for me with forking crates to add patches at work, but upstream has since caught up to the work in the last fork I was maintaining, so we’re back on vanilla crates.io now, except for the purely internal stuff on the private crate registry.

        I don’t think forking is a big deal if it’s something small that you expect to be able to drop on the maintainers as a PR and have upstreamed eventually. If it’s a bigger change on a big project you may want to get more involved in the project in public so that you’re not laboring in private on things that may not be upstreamable. That’s pretty rare though, I think programmers are excessively afraid of fixing any problems in their libraries and tend toward the extreme of treating open source libraries & frameworks like frozen black box products.

    4. 8

      I work on a GCC-based compiler (with Binutils) in my day job. It was forked long before I started there.

      It’s got some significant changes in parts, but most of the changes are done via the hooks that GCC/Binutils supplies for different targets. There are some big changes in the upstream linker code and numerous scattered changes in the compiler itself. All of these are purposeful changes.

      The upstream code is not updated very frequently. This is both a blessing and a curse. It’s a blessing in that you are not constantly dealing with merge conflicts. It’s a curse when you do update, as our base is generally way out of date and the upstream changes are very difficult to pull in. I’m currently working on making things easier to update by restructuring some of the code, but it will not get rid of the merge conflicts and careful file tracking entirely. I’d say that not updating much is more curse than blessing.

      We have changed our attitude about our changes recently. Instead of viewing it as updating the upstream code, we look at it as porting our changes to a newer code base. This shift in framing makes changes to upstream code fall under more scrutiny, especially in light of upstream changes to function signatures and types.

      Our changes can be complicated to port because, originally, the upstream code did not have any history attached (aside from Changelogs). Our code has moved between various revision control systems over the years without the upstream history. I don’t recommend this approach. It is fine for tracking your changes but when updating the upstream base version, tracking down upstream changes is difficult. This is particularly bad when upstream files are deleted, added, or renamed. We are moving away from this model, which is a slow process.

      If I were to start over I’d work directly on the upstream repository history and structure our changes as patches. On an update I wouldn’t merge, but instead try to apply the patches on the new branch. If you put some discipline in commit messages by associating every commit with an issue in the bug tracker you get to keep your history with the upstream history. (Note that your changes will have different commit ids on an update.) Additionally, I’d try to enforce good commit messages. In short, have someone be an internal maintainer.
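
      For what it’s worth, that last bit is easy to automate. A hypothetical commit-msg hook (the FORK-1234 issue-ID format is invented for the example) that rejects commits which don’t reference the bug tracker might look like:

      ```python
      #!/usr/bin/env python3
      # Hypothetical commit-msg hook: require every commit on the fork to
      # reference a bug-tracker issue so local changes stay traceable when
      # they are re-applied on top of a new upstream base.
      import re
      import sys

      ISSUE_PATTERN = re.compile(r"\bFORK-\d+\b")  # assumed tracker ID format

      def main():
          with open(sys.argv[1], encoding="utf-8") as f:  # git passes the message file path
              message = f.read()
          if ISSUE_PATTERN.search(message):
              return 0
          sys.stderr.write("commit rejected: reference a tracker issue, e.g. FORK-1234\n")
          return 1

      if __name__ == "__main__":
          sys.exit(main())
      ```

      Dropped into .git/hooks/commit-msg and made executable, git runs it for every commit and aborts if it exits non-zero.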

    5. 7

      I maintained an internal fork of OSCommerce 2.2, released in 2003, from 2008 to 2009, when I was fresh out of college. I can’t remember exactly when that fork was split (2005?), but I know it had never been rebased. Engineers who preceded me supposedly would handle security patches in this ancient PHP codebase, but I don’t recall any of the six engineers we had during my time actively undertaking that. This codebase supported some $5M in annual sales, I think.

      My predecessors had forked because the codebase was insufficiently customizable. We maintained a single codebase that was then deployed to unique customer environments with their own config.php and runtime environment variables passed in via an Apache .htaccess. We had plenty of if($customer == "BreadAndSoupCo") { ... } else if ($customer == "MassageParlor") { ... } throughout the codebase. I shudder to think now how unsustainable that practice was, but certainly I didn’t know any better, and I think my team just accepted it.

      The best parts? We could do whatever we wanted. The worst parts? We were genuinely responsible for our destiny, including security, which, as stated previously, we probably neglected at our own (unknown) peril. We made utterly unportable changes, which were significant and probably could not be easily rebased.

      I would not have taken that job with what I know now about software development and maintenance in general. However, I made some lifelong friends at that job, and just tonight, for the first time in years, I started a conversation with one of them who lived then and lives now thousands of miles away.

    6. 7

      My company maintained a fork of LLVM, but not in any disciplined way. Because our changes were not structured as a logical set of patches, it was prohibitively difficult to merge from upstream (I tried once and gave up). So we ended up stuck indefinitely on an LLVM release several years out of date. Do not recommend.

    7. 7

      I quasi-maintain a … well I guess it’s a private fork, really, of a “commercial” tool for a client (they were using the tool for a reasonable time before they were my client). I say commercial because it’s paid for (and the business owner thus far is happier to pay for time spent doing this ridiculous dance than to just build out the functionality in the main application), but the quality of the thing is easily the worst code I’ve ever seen in my life.

      As for the why: I asked the developer (singular. It’s one dude) about adding TLS support for the database connection (our application moved from a single-server environment to a cluster, and thus DB connections travel over the DC’s LAN now, rather than being local socket/loopback connections) and the response was essentially “nope, no plans to do that”. Having seen the internals of the tool now (it is shipped with two obfuscated files that rely on the secrecy-preserving wonder twins of eval + base64) I am 100% convinced this was due to the author’s lack of experience/ability rather than any kind of cost/benefit analysis - the change itself could have been as little as about 5 lines of code (albeit repeated in about 3 dozen files).

      The best part: there aren’t any really. It’s all terrible.

      Worst parts: um you name it. The “merge changes into the pristine branch, re-run the de-obfuscator, and then re-run the existing patches and check for new breakages” part is pretty horrible, but then the part where I was finding dozens upon dozens of copy-and-pasted-and-then-sometimes-changed-slightly blocks of utterly terrible code was also pretty fucking horrible too.

      It’s fundamentally the same thing, but with some basic concepts fixed: it now works over TLS, common copy-and-pasted functionality (e.g. the first one was to get a DB connection and make it available for use) is moved into a single function and then referenced, etc. The only significant change was to disable a specific feature in the code because it ends up executing a truly horrific SQL statement (which is a consequence of the horrific schema). None of this was “by choice”, any more than getting your leg amputated when it’s gangrenous is “by choice”.

      Yes and no. I do periodically update the base (and then reapply patches) from the vendor’s latest version, but I don’t update it as frequently as they put out updates, simply because the process to bring it up to date is a PITA.

      Knowing what I know now? I would simply tell the client that we can’t use it (lack of TLS support was the original issue), and need to replace the functionality it provides.

      And for anyone who wonders: yes we do still abide by the built-in “usage” checker (each licence is tied to a specific (sub) domain) and have left in the part where it phones home to check this.

      1. 4

        Interesting that TLS support led you to look into the internals of the software; that’s something where I’d have expected someone to use stunnel or a comparable proxy solution first.

        1. 2

          We do use stunnel for Redis, so if it were just the TLS issue, that possibly would have been the solution. But there was something else that came up; I’ll have to look up what it was now, I can’t remember the details of it.

    8. 7

      I was part of a team that did this twice. Project A was a fork of a suite of L2 switching protocols. Project B was more of a rolling set of Linux kernel patches. I wouldn’t call it a fork, most of these were drivers, just… particularly large, and it was actually easier to maintain our own branch and rebase it once in a while. I was involved in Project A more than in Project B. Office politics meant the office I was in was largely excluded from that kind of work, but the “sub-team” that actually had to do it was badly understaffed, so when the bad stuff hit the fan, I’d lend a hand if I could.

      1. Why did you decide to fork in the first place? Did you have a choice?

      This had happened long before I got there so I’m not entirely sure.

      Project A had been forked for strictly commercial/licensing reasons as far as I could tell. It had happened a long time before I got there so I never spoke to anyone who knew or remembered the whole story.

      Project B kind of happened. The team was already working with two manufacturer-provided kernel trees for some devices at the time when this third one happened. We depended on a set of drivers for a very complex third-party device which the manufacturer didn’t want (or couldn’t?) upstream, so we were kind of tied to specific kernel versions (they did an okay job at that, they had good support for recent LTS kernels, we never ran hopelessly obsolete code). In addition to that, we had a bunch of fixes and tweaks in other drivers, some of them from upstream; we upstreamed some of those changes when we could but it was a somewhat time-consuming process and, AFAIK, we just didn’t have the bandwidth to upstream larger changes. Finally, we backported fixes when we really had to.

      1. What were the best parts of maintaining a fork?

      For project A, there weren’t any. For project B, it was kind of cool that we had a fairly predictable update cadence. We got to plan for things well in advance and there weren’t any surprises. There was surprisingly little churn. IIRC the whole thing got updated once or twice a year.

      1. What were the worst parts?

      Project B was generally okay, but since it was, effectively, a downstream kernel, we had little substantial guidance from people who knew the hardware better than us, or who knew whatever subsystem we were working on better than us. We had commercial support for some of the hardware so that helped a bit, but in other cases we just kind of hoped for the best.

      Project A was bad in every single way. It had been forked by a team that didn’t know or understand autotools, and nobody in the management team that supervised it knew what autotools is or does, so they didn’t budget any time for training, porting, or anything of the sort. They only figured out something was wrong a few years afterwards, when someone asked why the cmake output looked so weird and was shocked to hear it didn’t use cmake.

      To give you an idea about how atrocious it was: at some point someone just gave up, made some changes to the configure script by hand and checked that into the main tree. That became modus operandi afterwards and the source tree had accumulated about ten years’ worth of changes to autotools config files that did absolutely nothing because most of the configure script and basically all makefiles were hand-rolled.

      Nobody wanted to touch it, and at one point someone figured we’d hire an intern to fix it. Yep. Me and one of the smartest young kids I’ve ever met wrangled with it for a whole summer, wasting several months of this dude’s most valuable learning time (fortunately he was way pickier next year). We got it in slightly better shape, but since nobody understood how autotools works, most of the changes we’d made were either reverted or rendered useless again in less than a year.

      1. Did you change the fork significantly? Was that by choice?

      Project A: yeah, we added a lot of features. The original fork had happened at some point in the ’00s I think. It was definitely by choice, we needed those features.

      Project B: kind of. In addition to general hardware fixes we had e.g. a bunch of changes in the network subsystem that helped us with testing. Definitely by choice, too, many of the changes were specific to our testing requirements and development environment. I doubt there would’ve been any value in upstreaming them.

      1. If you forked from an active project, did you keep up to date with upstream?

      Project A was abandoned by upstream AFAIK (that’s part of why the fork happened) so obviously no.

      Project B, yep, absolutely. Ideally, we would’ve run an upstream kernel, period. We couldn’t, but we tried to stay as close as possible. If we needed critical fixes, our first option was always to see if we could rebase on top of an upstream version that had them, and only backported things ourselves if there was no other choice. It’s not so much that we dreaded the rebase – most of our changes were self-contained and largely confined to reasonably stable parts of the kernel – but we doubted we were able to integrate critical fixes better than the upstream developers who’d made the fixes in the first place.

      1. If you were to start over, what would you do differently?

      I wasn’t there in the beginning for either project so I’m not sure I can answer this well enough. But I can venture a guess at what was missing.

      Project A: treating “the boring parts” (build system, CI etc.) as auxiliary to the programming process was a mistake. There were dozens of people working on that whole thing, across three or four timezones, and exactly three of us who knew anything about autotools, all of us in a team that was responsible for only a tiny portion of the code, and none of us had been on the team that made the original fork. Bugs related to these parts were always at the bottom of the todo list when they should’ve been at the top. Surprisingly few people on the dev team understood that, and they hadn’t been given the time, or the opportunity, to learn the new build system, so most of them just abandoned it. I wasn’t an autotools specialist in any way, either, I just happened to pick it up because if you ran Linux back when The Matrix was the latest box office hit, you picked it up.

      Project B: honestly, I don’t know if I’d have done anything differently, it was pretty well run given the constraints. I think its only real problem was that it had a very low bus factor. The very few people working on it understood that was a problem and tried to fix it to some degree but there were real organisational obstacles to that and it never really happened.

    9. 6

      In my first job after uni, in the late 1990s, we had a bunch of forked projects that had been created to support mass web hosting when that was cutting edge.

      • thttpd, not a big codebase, needed some bug fixes as well as the mass hosting hacks
      • FreeBSD support for configuring local IP addresses en masse using CIDR blocks (because Host: header support in browsers took a while to arrive)
      • Apache httpd mass hosting hacks

      The mass hosting hacks were basically interpolating part of the virtual host name into the filesystem path.
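
      To make that concrete, here’s a rough sketch of the idea in Python (illustration only, not the actual thttpd or Apache patches, and the directory layout is invented): sanitise the Host: header and interpolate parts of it into a per-site document root, so provisioning a new site is just creating a directory:

      ```python
      import os.path
      import re

      DOCROOT_BASE = "/var/www/hosts"              # assumed layout, for illustration
      HOSTNAME_RE = re.compile(r"^[a-z0-9.-]+$")   # reject anything suspicious

      def docroot_for_host(host_header):
          # Strip any port, normalise case, and refuse names that could
          # escape the document root.
          host = host_header.split(":", 1)[0].lower().rstrip(".")
          if not HOSTNAME_RE.match(host) or ".." in host:
              raise ValueError("invalid host name")
          parts = host.split(".")
          site = ".".join(parts[-2:])              # e.g. example.com
          sub = ".".join(parts[:-2]) or "_root"    # e.g. www
          return os.path.join(DOCROOT_BASE, site, sub, "htdocs")

      print(docroot_for_host("www.example.com:80"))
      # -> /var/www/hosts/example.com/www/htdocs
      ```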

      Before I started there wasn’t much regard for working with the upstream projects. My boss and I tried to improve that. He sent Jef Poskanzer our version of thttpd so Jef could incorporate the parts he liked (tho I don’t think there was much effort to reconverge). I cleaned up the FreeBSD hacks and submitted the patches tho they never got incorporated — it’s better to do this kind of trick with packet filters if you can. I turned the Apache hacks into mod_vhost_alias which was accepted upstream.

      This taught me a lot about working with open source projects. And how large the gap can be between an expedient hack and something with good enough quality to be acceptable upstream. And how terrible it was to try to maintain a fork using CVS.

      Much later on I started using git when I was working on changes to submit upstream, whether or not upstream used git, because it was so much easier. I didn’t work strictly upstream-first because I usually wanted to fix an issue in production without waiting for a new release with my changes.

      These ops-driven changes were generally small, so they weren’t too difficult to maintain as a fork. And they generally occurred in software we were pushing hard, which was a small subset so the absolute number of patches we were carrying forward was not too large. And I was always working to get changes upstream where possible, to eliminate local patches.

      Another way to eliminate patches is when some upstream change makes it possible to solve the problem in a different way. So even if a patch is too ugly to upstream and not worth the effort to clean it up and document it, it’s worth keeping in sync with upstream so you’re aware of improvements that can make the patch moot.

    10. 4

      Situation: I maintain an internal fork of Qemu for a client.

      Why did you decide to fork in the first place? Did you have a choice?

      Adding features and improving performance/correctness. This goes well beyond a few lines of code that can be upstreamed comparatively easily. For a while the client also considered the changes to be a competitive advantage.

      Forking wasn’t really a choice, the question is how long the changes are kept out-of-tree. I think it’s been about 2 years now?

      What were the best parts of maintaining a fork?

      We can change what we want without caring about other configurations, coding standards, etc I suppose? At least until upstreaming time.

      What were the worst parts?

      The worst part is definitely upstreaming the changes. Submitting a bunch of patches and having them sit ignored for weeks, then finally getting some review on them, resubmitting with changes, which are again ignored for many weeks, having to go to great lengths to credibly demonstrate that some piece of code is correct to a reviewer who doesn’t even have a system with the affected platform, etc. is frustrating and unproductive. But that’s true for any patches to that project if you’re not a core maintainer, and it’s not specific to the fact I maintained the changes out of tree for over a year.

      Did you change the fork significantly? Was that by choice?

      • One feature addition is in new code files, so mainly build scripts are actually changed.
      • One feature branch is a fairly significant change to an existing device. Coincidentally, someone else added similar support and submitted their version upstream first. Theirs still hasn’t been merged after some months, but once it does merge, I’ll have to re-do some of my work on top of their changes because mine functionally went beyond theirs in some ways (but theirs also has functionality my implementation doesn’t).
      • The perf improvements are in an area of code that’s not super hot with changes. (Else there wouldn’t be any low-hanging fruit that an outsider such as myself could find and fix I suppose.)

      None of those changes could realistically have been implemented any other way within Qemu. I guess we could have picked a different VMM project to build on in the first place.

      If you forked from an active project, did you keep up to date with upstream?

      Keeping up with upstream is not so bad. I’ll usually take 2-4 hours to rebase non-upstreamed features every time there’s a Qemu release, which is every few months.

      If you were to start over, what would you do differently?

      Good question. Perhaps choose a different OSS VMM as the base altogether. But the client’s infra was running entirely on libvirt at the time we started the project, and their VMs were already working in Qemu. So for the original scope it was the correct call at the time.

      However, the original feature was a success, so my brief expanded. The time I’ve spent on upstreaming just the first batch of patches would probably have been enough to get their VMs running on another VMM and to write a libvirt plugin for said VMM.

      1. 3

        QEMU is an awkward one. Clause 7 of the GPL prohibits distributing it with any further restrictions. We had a version with some Arm draft extensions that were shared with partners under NDA. As a result, we weren’t allowed to disclose them and so couldn’t distribute our QEMU version to anyone (even Arm).

        I’d love to see a permissively licensed alternative. I saw that Mambo is now open source, but I didn’t check the license. Looks like Apache 2, which is promising.

        1. 1

          The thing about Qemu is that it’s so many different things to different people, all in the shape of one monolithic project.

          The licensing wasn’t an issue in our case as running customer code inside VMs on internally managed host systems isn’t by most GPL readings considered to be distributing the VMM.

          The structure and code submission process of the project are definitely costing us however, and there are some technical aspects (pervasive use of big Qemu lock/BQL, generally Linux-focused (host) architectural decisions) which have been a thorn in my side.

          On the other hand, the more niche and less general purpose projects aren’t anywhere near as mature and widely supported, so it’s hard to say what the right call is, ultimately.

    11. 4

      We forked LLVM for a research project. Our circumstances are probably a bit atypical, but I’ll answer anyway.

      Why did you decide to fork in the first place? Did you have a choice?

      We needed a C compiler and a code generator that had stackmaps that we could experiment with. We don’t have time to write those things from scratch.

      What were the best parts of maintaining a fork?

      We can make experimental changes in isolation of upstream development. Most of the kinds of things we are doing aren’t likely to be generally useful (at least in the short term), so it wouldn’t make sense to upstream them anyway.

      What were the worst parts?

      Upstream syncs. LLVM is very fast moving, so there will be merge conflicts. Sometimes they are easy to fix, sometimes not. For example, when LLVM moved to opaque pointers, that was quite laborious to sync. They also recently killed the legacy pass manager, which was fiddly too.

      Luckily no upstream change has totally killed us yet.

      Did you change the fork significantly? Was that by choice?

      Maybe a thousand lines or so, at a guess. The plugin architecture of LLVM doesn’t give us the control we need.

      If you forked from an active project, did you keep up to date with upstream?

      Every few months we sync. The longer you leave it, the worse it gets.

      If you were to start over, what would you do differently?

      Unless LLVM was designed with our niche use-case in mind, there’s not much we could do differently.

      Hope that helps.

      1. 7

        We forked LLVM for a research project. Our circumstances are probably a bit atypical

        So far, around a quarter of the posts on this article have been folks that forked LLVM, and two thirds of those were for research projects.

        I suspect that this audience may not be the kind that makes statisticians happy.

        1. 5

          So far, around a quarter of the posts on this article have been folks that forked LLVM, and two thirds of those were for research projects.

          I suspect that this audience may not be the kind that makes statisticians happy.

          I don’t think I have much to add, but for me also the first example that comes to mind is a downstream LLVM fork targeting a research architecture. That said, it was a very educational experience and partly inspired me to write LLVM Weekly as an extension of work I was doing to track upstream developments. Also to later kick off the work for an upstream RISC-V LLVM backend to put an end to that failure mode for LLVM work in the RISC-V community. So I guess it had positive outcomes even if it became a time sink.

          1. 3

            So I guess it had positive outcomes even if it became a time sink

            I (and many others) definitely had positive outcomes from your work there, even if you didn’t!

      2. 2

        Just remembered, we also had a fork of the Linux kernel for a benchmarking experiment we did a few years back, but we don’t maintain that any more.

    12. 3

      I hope you don’t mind my answering your questions with some of my own. Assuming you’re thinking of forking an OSS project at your work:

      1. Why was the project abandoned? Have you looked at what the authors are working on now? Have they moved on to a different problem? Perhaps they’ve actually moved on to a better solution?
      2. On a scale of 1 to 10, with 1 being the most rudimentary maintenance and 10 being a complete overhaul, where are you going with this fork?
      3. How much of the codebase have you read? Are you happy with the quality?
      4. Are there any actively maintained open source projects that solve the same problems the one you’re considering forking does?
      5. What resources went into maintaining the original project when it was active? What resources will be required to maintain a fork? How much does the problem domain of the project overlap with the problem domain of your organization? Enough to justify the resource requirements?

      If I betray a conservative attitude towards forking, it’s because I vividly recall two stories where OSS projects were forked in past jobs. The forks were of actively maintained projects, not abandoned ones. They were never upstreamed. I appreciate this prompt to write down what I consider, with the benefit of hindsight, to have been really bad ideas. I’m not going to go into particulars of who or what because I don’t think it’s constructive to point fingers. I do think there are some lessons worth sharing. So, to preserve anonymity, I’ll call the two forked projects “Phi” and “Upsilon.”

      Phi was a client-side JavaScript library. Someone forked it because it contained more features than we needed. They were rightly concerned about the performance implications of JavaScript bloat. At the time, Phi did not use modules and was therefore not tree-shakeable. Unfortunately, the hard fork effectively turned into a messy, convoluted rewrite with none of the benefits of the original project (good documentation, large community, free features and patches) and all the drawbacks of being stuck with someone else’s aging codebase. I don’t think anyone on my team fully understood Phi’s internal idioms until a lot of hastily written features had already eroded its separation of concerns. Eventually, the cost of maintaining it outweighed the cost of starting over. I think this is true of repurposing in general. I have seen multiple projects go south when they start life as a copy of something else. The up-front cost of starting from scratch is paid back several times over with its clarity of purpose.

      Upsilon was an extremely complex but obscure graphics library, which, in turn, was dependent on another, major open source project built by a FAANG. The underlying FAANG project was frequently updated, which in turn necessitated frequent updates to Upsilon. Sometimes Upsilon’s maintainers kept up with their dependency updates. Sometimes they didn’t. During one of the dry spells, someone on my team got impatient. Rather than file an issue with Upsilon and wait, they forked Upsilon, wrote a patch, and never upstreamed. Then, crucially, someone else on our team came along and added a feature on top of the patch in the fork. Fast forward a couple of years. The forkers are long gone. The patches we originally needed are now available in the original Upsilon project in spite of our failing to contribute them. Meanwhile, our fork is failing on newer operating systems. We can’t switch back to the original project because it lacks our proprietary feature, upon which there are many downstream dependents. I tried to merge the latest upstream code into our fork, but I got stuck on the numerous conflicts with our feature code. For a while, we just didn’t upgrade the OS on the server it ran on. Finally, someone more tenacious than me sucked it up and, over the course of several weeks, resolved the merge conflicts, knowing full well that we would have to repeat the process again every time we needed updates from upstream. The lessons I drew from that dreadful experience were: a) always upstream your patches; and, b) failing that, always favor adding features at your own layer of abstraction over hastily adding them to a patched fork.

      1. 1

        Thanks for the answer – those are good questions.

        I think this is true of repurposing in general. I have seen multiple projects go south when they start life as a copy of something else. The up-front cost of starting from scratch is paid back several times over with its clarity of purpose.

        This is a great insight.

    13. 3

      Background: I was responsible for developing the browser-based WebRTC video chat library for the company I worked for. We already had an established voice product which used SIP in the backend. We also saw that there were at least two JavaScript libraries for handling SIP in the browser (JsSIP and SIP.js) as well as on mobile. Because of this, we decided to try to reuse as much as possible for the video product and send SIP all the way to the clients. I ultimately chose SIP.js for the first iteration of the browser-based video chat library’s signaling.

      1. Why did you decide to fork in the first place? Did you have a choice?

      There were bug fixes and feature additions we needed to implement in SIP.js and consume on our own timeline. The maintainers of SIP.js were very responsive to our pull requests, but we needed to control our own destiny with regards to when the changes got merged and released.

      1. What were the best parts of maintaining a fork?

      The whole process of implementing bug fixes and feature additions in our fork, trying them out in production, and then submitting them back to upstream, was rewarding because (1) of how receptive the SIP.js maintainers were to eventually merging our changes back in, and (2) it felt like we were figuring things out and improving the software.

      1. What were the worst parts?

      Maintaining the changes in a way that wouldn’t block us from upgrading. SIP.js was under active development at that time, and we didn’t want to get into a scenario where we diverged. There was a lot of manually keeping things in sync between branches.

      1. If you forked from an active project, did you keep up to date with upstream?

      We maintained our own branch with our set of patches applied on top. Any time we upgraded, we’d recreate the branch, and re-apply the patches on top.

      1. Did you change the fork significantly? Was that by choice?

      We avoided this. We tried to keep the changes in our fork and the ones we submitted upstream as similar as possible. Once a change was accepted upstream we’d delete it from the set of patches we maintained or change our code to use the (slightly different) version of a change that was accepted upstream.

      1. If you were to start over, what would you do differently?

      Maintaining the fork wasn’t that bad. The big thing I would’ve redone is using SIP in the frontend. We saved a bit of time on the backend and paid for it in the frontend by dealing with all of the complexities of SIP there. We eventually migrated the video libraries away from SIP and drastically reduced our library sizes.

    14. 3

      It was pretty typical: a bug was found, the project was inactive, and the author did not reply nor did anyone react to emails or pull requests, so I forked the library and put the finished result into our private repository.

      This has happened several times. In most cases the author eventually reacted and a new version of the library was made and the fork ended as a result.

    15. 3

      I was dealing with several Django library forks at $PREV_JOB. This led to me learning a lot about the ORM and the like, and being able to maintain OSS and use it professionally on a work project is a real eye opener. And it was a real “get paid to do the fun stuff” thing for me (I was always doing the Django upgrades and the like as well… something’s wrong with me).

      Thanks to some of that stuff I’m now maintaining django-taggit (mostly bursts of activity, sorry to anyone with waiting patches! Will get to them when I do). I’m now freelancing so it’s totally a “when I feel like it” thing now, which is less fun than before but it’s also somewhat satisfying to go in and try to answer people’s questions and try to make ports.

    16. 1

      A few jobs ago we forked the Chamilo (nee Dokeos) 1.x codebase in order to add some customizations our customers were asking for. We had a few custom questionnaires and a tool that was based around collaborative drawing (using Adobe AIR on the client with a Java server component, IIRC).

      Note that the Chamilo codebase was your typical gutter-quality PHP code: it had glaring security problems (SQL and HTML injection galore, which they tried to “fix” by flailingly adding lots of inappropriately-placed html_entities and addslashes as well as some mysql_real_escape_string calls, often resulting in an even bigger mess due to double escaping) and abused the session for storing variables, which meant you would get weird glitches when using the back button or browsing multi-tabbed. I once spent an entire week trying to hunt down an intermittent bug caused by such abuse of the session…

      All this to say that Chamilo 1.x wasn’t really pluggable or extendable, and our changes were very specific to our client, so we were rather forced to maintain our own fork. We were either using subversion or mercurial, I don’t remember (it was a few years ago). Every time a new release came out, we would unpack the upstream release into a new branch in our VCS and merge the diff between the old and new version into our fork.

      Now, my conscience wants to say all of this was pretty terrible, but in fact maintaining our fork with some changes was quite manageable. Our changes usually were wrapped in some very specific if statements (not all our customers wanted all of our changes, so we used feature flags) and not too spread out across the system. There were a few moments where merges didn’t go that smoothly, and we were left scratching our heads why the code wasn’t working.

      I think we mostly got lucky that we got started at a time when they started off their Big Rewrite (I think the people there had finally realised how terrible their code really was), so there wasn’t too much churn in the 1.x codebase.

      I don’t know what we could’ve done differently. I had argued many times with management that the code was terrible and we shouldn’t be banking on this product, but apparently it was the least bad free software electronic learning system. And to be fair, the alternatives I investigated like A-Tutor and Moodle sucked at least as hard technically or were not usable enough for our customers.

      Given the non-modular nature of the codebase, keeping our code within feature flags was probably a good choice, and it also allows you to find all customizations relatively easily (our feature flags had the company name as a prefix). Alternatively I would suggest using comments and a clear and consistent marker to denote customizations. Maintaining our own changes in a separate branch and importing clean copies of upstream in another branch was definitely the right call. It makes it easier to diff the changes and apply them to a new release.

      But all in all, you’re much better off avoiding this kind of stuff in the first place, if you can. It is so much better if you can upstream your changes (assuming they’re generic enough - and you’d better try to make them such). And failing that, choosing a system with a modular codebase where you can easily add your own stuff is so much better.

    17. 1

      We generally prioritize utilizing upstream solutions or off-the-shelf tools to avoid reinventing the wheel. However, in my specific domain, we’ve encountered situations where forking became necessary. This includes a few Prometheus exporters, which are community projects.

      In many cases, forking wasn’t the initial intention. Instead, it occurred when we needed to introduce specific features or optimizations, and there was a lack of momentum in the upstream project.

      Another case is a widely used Grafana plugin that hasn’t seen updates for an extended period. Due to its extensive use and the absence of a suitable alternative with all its features, we were forced to fork it when it became incompatible with the latest Grafana version.

      It would be interesting to hear from people at Cockroach Labs or Tailscale. According to some blog posts/talks, maintaining such forks has paid off sometimes.