Hacker News
Compromised PyTorch-nightly dependency chain between December 25th – December 30 (pytorch.org)
327 points by ivan_z2000 on Jan 1, 2023 | 175 comments



The only thing that surprises me about such attacks is that they don't happen more often. With Python and Node.js it's now the norm that large packages have hundreds of transitive dependencies. In this case, what happened was dependency confusion because of PyPI taking precedence, but even with such holes plugged the problem remains that by installing a single package, you're potentially trusting hundreds of authors. And yet we do want all these packages, because they solve specific problems in an optimized manner, and we do want anyone to be able to publish packages. I doubt there is a good solution here, ultimately.


There is not a conflict here. PyTorch depends on many packages and none of them were compromised. Instead, a limitation with the way pip installs dependencies allowed someone to create a package on PyPI with the same name and version as the one on PyTorch's nightly package index, which took precedence over the real package. This could be fixed by supporting better ways to specify Python dependencies in pip, without any dampening effect on the package ecosystem.
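The limitation can be sketched with a simplified model (illustration only, not pip's actual code): pip pools the candidates it finds on every configured index and picks the best version, with no notion of which index "owns" a name.

```python
# Simplified model of pip's candidate selection across indexes
# (illustration only): candidates from all indexes are pooled and
# the highest version wins, regardless of which index it came from.

def pick_candidate(candidates):
    """candidates: list of (version_tuple, index_url) pairs."""
    return max(candidates, key=lambda c: c[0])

candidates = [
    ((2, 0, 0), "https://download.pytorch.org/whl/nightly"),  # real package
    ((3, 0, 0), "https://pypi.org/simple"),                   # squatter's upload
]
version, index = pick_candidate(candidates)
print(index)  # the PyPI squatter's package is selected
```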


There are a few factors that allowed this dependency confusion attack to happen:

1. PyTorch publishes its nightly package on its own repo, and that package depends on a custom Triton build also provided on their repo, but under a package name (torchtriton) that they didn't own on PyPI. They have mitigated this by renaming the dependency from torchtriton to pytorch-triton, reserving the pytorch-triton name on PyPI, and changing the dependency name in nightly builds from 20221231 onward to point at the pytorch-triton package instead.

2. PyTorch's installation instructions for nightly builds using pip (https://pytorch.org/get-started/locally/#start-locally) make use of the --extra-index-url option. This is a known vector for dependency confusion attacks, and an inherently insecure method of installing packages from private repositories. The recommended approach for distributing wheels in private repositories is to use a repository server that proxies/redirects requests for public packages to PyPI, with users pointing a single --index-url at that private repository (assuming the maintainer of that private repository is trusted). --extra-index-url is meant to provide mirror URLs (serving the same set of packages as the main one), rather than to combine repos with different sets of packages.
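Concretely, the difference between the risky and the recommended invocations looks like this (the private index URL is hypothetical):

```shell
# Risky: both indexes are treated as equally authoritative, so a
# same-named package on PyPI can shadow the private one.
pip install torch --extra-index-url https://private.example.com/simple

# Safer: one index that hosts the private wheels and proxies PyPI for
# everything else (e.g. a devpi or Artifactory instance).
pip install torch --index-url https://private.example.com/simple
```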


> The recommended approach for distributing wheels in private repositories is to use a repository server that proxies/redirects requests for public packages to PyPI, with users pointing a single --index-url at that private repository (assuming the maintainer of that private repository is trusted).

Alternatively, keep a separate requirements_private.txt around for private dependencies and add a line --index-url <my private repository>.
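For instance, the requirements file can carry the index choice itself (URL and package name are hypothetical):

```
# requirements_private.txt
--index-url https://pypi.internal.example.com/simple
my-private-package==1.4.0
```

Installing with `pip install -r requirements_private.txt` then resolves everything against the private index only.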


I think blame is shared between pip being ultra vulnerable to foot guns like this and whoever put together the PyTorch nightly install not seeing the dep confusion issue from a mile away.


> And yet we do want all these packages, because they solve specific problems in an optimized manner, and we do want anyone to be able to publish packages.

Do we want all those packages or do we want the functionality of them? Dependency hell happens for reasons of deficient first-party support. Notably, languages that lack a sufficient standard library and a blessed toolchain should be considered to have a pre-existing condition for this disease.

I get that language maintainers have legit reasons to exclude something like HTTP from std, but there has to be some middle ground here. For instance, Golang provides experimental packages that have high quality but a lower level of support. To me, this is a win. It centers the community in a common direction, and delivers real world value in the meantime, with minimal maintenance upkeep compared to the 100s of packages we see in some ecosystems like JS and to a lesser extent Rust.


Python has the best or second-best standard library in existence, Go being its only competitor for that title.

It still contains only a tiny fraction of the functionality needed by any large project. The world is just too complex for a standard library to ever realistically cover a meaningful portion of the problem space.

We all need tensor algebra, video decoding, cutting-edge network protocols, compression, cryptography, dozens of data exchange formats, syscall bindings for three or more platforms, fuzzing, containers, and I don't know what else. This isn't going to all fit into a standard library. "Dependency hell" is here to stay.


> This isn't going to all fit into a standard library.

There is a size limit for standard libraries?

I think Java introduced modules and its dependency model so you could package only what your software needed.


> There is a size limit for standard libraries?

There is a size limit to what language curators are able to maintain, yes.

The idea of the standard library is that it will select components that are general enough to cover a wide range of use cases; therefore the language builders need time to review every new proposal, and their time is limited.

You may dump everything under the sun in the standard library, but then it won't be better than what you get from a library marketplace.


Java?


Let’s start the year with a programming language debate!

In my opinion, Java’s standard library is good, but not as good as the one in Python and then Golang. I also prefer the one in C#.


Log4j?


Never thought I would shill for Java, but that vulnerability could exist in any language. That it was such a widespread issue only goes to show how widely deployed Java is for server software.


No, it wouldn't. Java is the only ecosystem where dynamically loading code by default from random web servers is considered to be a feature. Probably a legacy of its early decade, when everybody was dreaming of this.


Java is unique in that the culture encourages doing magic convention-over-configuration by reflection AND it-just-works serialization, which leads to a new bug class: unsafe deserialization, except that the deserialization happens in the library by default (e.g. fastjson, jackson, tons of struts2 bugs, etc.) and propagates along the supply chain.

Libraries refuse to shift away from this culture thing, and decided to just blacklist the known exploit path while keeping the vulnerability intact. It's not that they do this for backward compatibility; new libraries are still being designed this way.

The log4j 2 bug is slightly different, but IMO as a design issue it has root in the same culture thing described above.


> Libraries refuse to shift away from this culture thing, and decided to just blacklist the known exploit path while keeping the vulnerability intact.

Log4j 2 refused to shift away from it since they are paid to add that kind of bloat; yes, this wasn't an accident caused by a bad culture, this was a feature someone requested. As far as I understood, they are completely separate from the original log4j and it is just another, mostly compatible logging implementation.

> Java is unique as the culture encourages doing magic convention over configuration by reflection AND it-just-works serialization

As opposed to what? Python? The language that culturally refuses to fix the GIL and as a workaround provides multiprocessing, which requires that everything is serialized?

Also I may not have seen many Python code bases, but from what I have seen ClassLoader/Import abuse is alive in both languages.


I agree that the GIL situation is unfortunate.

Python at least does not pretend its it-just-works serialization is secure. The documentation [1] actively discourages deserializing untrusted data and suggests using pure data formats (e.g. JSON) where possible.
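A minimal illustration of why the pickle docs say that (a benign eval stands in for what an attacker would actually run):

```python
import pickle

class Payload:
    # __reduce__ tells pickle how to rebuild the object on load; an
    # attacker can return any callable here (os.system, for instance).
    def __reduce__(self):
        return (eval, ("6 * 7",))

data = pickle.dumps(Payload())
result = pickle.loads(data)  # runs eval("6 * 7") during deserialization
print(result)  # 42
```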

In contrast, the documentation on ObjectInputStream before Java 11 does not warn about this at all. Even then [2], it suggests implementing blacklist/whitelist filters, which are pretty hard to get right. The same filtering can also be done for pickle, but the usual consensus among Python developers seems to be "don't do this even if it's possible".

> but from what I have seen ClassLoader/Import abuse is alive in both languages.

Yeah, I should have said that my rant is mainly against endless deserialization/OGNL injection/whatever-popular-expression-language injection bugs in frameworks, not abusing ClassLoader. These features, just like log4j2's code-execution-disguised-as-string-interpolation feature [3], shouldn't exist.

[1] https://docs.python.org/3/library/pickle.html

[2] https://docs.oracle.com/en/java/javase/11/docs/api/java.base...

[3] I'd argue that's the real bug, instead of the obvious JNDI class loader blah blah stuff. Luckily log4j didn't refuse to fix it and completely removed "message lookup".


Python developers aren't smart enough to develop threaded code. Making threading work in python would be like giving crack to an infant. Python maintainers made the right choice to coddle their user base, you see if you have to be smart to use python then you wouldn't be using python to begin with and user base goes to zero. I kid I kid. Happy New Years - a python developer


That's a third-party dependency, not a part of the standard library.


Log4j was never part of Java


I think that Debian and most other Linux distros handle this extremely well.

There are thousands, if not tens of thousands, of packages available to install via apt or yum. But most of those packages are packaged by a dedicated maintainer - not any rando. The bar isn't much higher, but there _is_ a bar. Python's (well, pip's) practice of letting any vermin publish with no prior vetting is the root of the problem in my opinion.


Staging packages in experimental, then testing, sometimes for months, before they go to stable also does a lot against this kind of threat.

Which leads me to wonder: do we really always want the very latest version of the packages? Any slightly older version will be immune to dependency poisoning, thanks to the scrutiny of users over several weeks.


> Which leads me to wonder: do we really always want the very latest version of the packages?

Not really, but then someone needs to decide which version to use and when/how that version is updated - for every single dependency. I agree that keeping track of your dependencies and actually managing them may be better engineering, but it's a lot of work that not many people want to do.


We're solving this problem at https://socket.dev starting with npm, with python coming in the next month or two. Here's an example of a date picker web component that runs an install script, collects telemetry, accesses the network and filesystem, and more -- all detected with our static analysis engine. https://socket.dev/npm/package/angular-calendar

We show alerts in GitHub pull requests, or the CLI, if you add a dependency with a supply chain risk.


Developing in container environments with limited access might help, but I think there's a performance hit for heavy processing/ML training unless you use privileged mode which kinda defeats the purpose.


I do all my dev work in a VM, usually dedicated to a group of related projects. There are no passwords stored in the VM and its ssh key has read-only access to repositories. The VM has write access to forks, and code is merged using pull requests just like third-party contributions. This setup has been working well for me, and I even created a tool to automate setting up these dev VMs which I hope to make publicly available at some point.

I don't do any heavy computations except for running test suites of various kinds, and these seem to perform the same on raw hardware as they do in the Kernel-based VMs.


Jails / sandboxes such as bwrap could be enough in this case by denying access to e.g. HOME files not explicitly whitelisted.

Also a Little Snitch-like host-based firewall, which would request explicit permission to connect.
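A rough bwrap invocation along the lines of the first suggestion (paths are illustrative; flags are from bwrap's documented option set):

```shell
# Run pip in a sandbox where /usr is read-only, only the project
# directory is writable, and $HOME (ssh keys, tokens) is not mounted.
bwrap --ro-bind /usr /usr \
      --symlink usr/bin /bin --symlink usr/lib /lib64 \
      --proc /proc --dev /dev --tmpfs /tmp \
      --bind "$PWD" "$PWD" \
      --unshare-all --share-net \
      pip install some-package
```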


I'd love to see a container environment that can monitor and log all outgoing network connection requests, and all critical file/directory access such as /etc/*.

With such a container, we could catch compromised supply-chain attacks easily, right?

Does anyone know if such a container exists?


Only with privileged containers - otherwise you don't have visibility into signals from other containers.

But, say you had such a container, there’s an important distinction between “you captured a log showing the smoking gun evidence of the supply chain attack”, and “you successfully picked that log out of all of the log data you generated and classified it with high confidence as an attack”.

Speaking from experience, the second problem is the hard problem for a multitude of reasons. So while you would have the data, you’d probably have trouble getting good precision/recall on when to actually sound the alarms vs. when it’s some SRE who needed to troubleshoot some network connectivity issues.


> Only using privileged containers, or else you don’t have visibility into signal from other containers.

The suspect application doesn't need the privileges, so I'm not sure how much of a problem that is?

> there’s an important distinction between “you captured a log showing the smoking gun evidence of the supply chain attack”, and “you successfully picked that log out of all of the log data you generated and classified it with high confidence as an attack”.

Assuming that you're talking about the signal:noise problem, that's hard in the general case but I feel like you could easily pick off really obvious cases like trying to access private SSH/GPG keys and still get a lot of value.


> Assuming that you're talking about the signal:noise problem, that's hard in the general case but I feel like you could easily pick off really obvious cases like trying to access private SSH/GPG keys and still get a lot of value.

Probably. I’d agree that it’s worth trying at the very least. I’ve run into enough “should be easy” cases that turn out to be not that easy that my default is to get the data and see if the hypothesis really pans out.


I’ve created Packj sandbox [1] for “safe installation” of PyPI/NPM/Rubygems packages

1. https://github.com/ossillate-inc/packj

It DOES NOT require a VM/container; it uses strace. It shows you a preview of the file system changes installation will make, and can also block arbitrary network communication during installation (uses an allow-list).


strace uses ptrace, which is not safe for security use because of race conditions. Linux Security Modules should be used.

https://stackoverflow.com/a/4421762/711380


Thanks for highlighting this! While ptrace introduces TOCTTOU vulnerabilities, Packj's sandbox fixes that by using read-only args for ptrace. You may find my PhD work [1] on this relevant.

1. https://lwn.net/Articles/803890/


If your CI/CD pipeline uses GitHub Actions, you can monitor and even block outbound network calls at the DNS and network level using Harden Runner (https://github.com/step-security/harden-runner). It can also detect overwrites of files in the working directory. Harden Runner would have caught this dependency confusion attack and similar ones, due to the call to the attacker's endpoint.


This is the way things should be done by any competent developer.

Generally a VM jail is preferable - firecracker or cloud-hypervisor(virtiofs & gpu passthrough) recommended.

A proper namespace jail (eg. bwrap) is sufficient for 99.9% of cases. To break out of a properly configured namespace jail you would need to sacrifice a 0day.


It looks like a totally overengineered solution to me. The CPU already has a protected mode which doesn't allow the program to access any files directly. Why do you need to run a VM, which by the way is run from the kernel (privileged mode) in Linux? Why can't you run untrusted programs in protected mode?


With containers the entire surface area of the kernel is available to attack (syscalls). With a VM the surface is restricted to the VMM and KVM.

This is an oversimplification; there may be other protocols that are passed through or utilized, and they would add to the surface.


Also, the container itself usually contains (or has access to) valuable secrets, such as keys for staging servers and, of course, the source code.


Maven / Java seem to have solved it well.


I was developing in Java right up until Maven became popular. We used to just download jars. What would you say is the main difference with Maven/Java vs NPM?

My recollection is that Java libraries were larger, higher-quality, more stable, and better-maintained, and you didn't need as many of them. A Java jar was not a "package" but contained dozens of "packages" developed together. Jars tended to be self-contained or mostly self-contained; small dependencies would be shipped inside. The idea of making npm packages as small as possible, like practically putting each file in a separate git repo, and publishing it as a separate artifact, emerged shortly after NPM itself, and it was radical, and not really particularly good. Java also has a much larger standard library, and between the packages that come with Java itself, the packages that aren't technically part of the standard library but were written by Sun/Oracle, and well-known third-party utilities, you didn't need a lot of third-party packages. And if you needed something tiny like left-pad and didn't have it, you'd probably just copy and paste it.


> What would you say is the main difference with Maven/Java vs NPM?

Maven doesn't allow execution of arbitrary code at install-time, which curbs a large number of potential supply-chain attacks.

Because of the JVM and JARs being mostly self-contained Maven doesn't really need to worry about system or runtime dependencies (unless you're using Scala...). This allows Maven to be a 'dumb' package manager that relies on simple semantics (no hidden specially-generated indices, for example) and be fairly successful. Of course, there's an internal battle of whether Gradle or Maven is superior, but they both rely on the same distribution and packaging specifications.


Maven doesn't have this problem because maven central is too obtuse for hackers to use, and Enterprise Java developers don't ever update their dependencies. It's actually to their benefit, but it's for the wrong reasons.


> Maven doesn't have this problem because maven central is too obtuse for hackers to use

I have many gripes with Sonatype but Maven Central isn't really one of them. The fact that you can publish a package to the likes of PyPI, npm Registry, or Docker Hub with zero friction makes those places very attractive to spammers and bad actors. Maven Central having a higher barrier to entry is a feature.

IIRC Brian Fox, the CTO of Sonatype, was actively involved with Maven in the early days and was part of the decision for Maven packages to use domains for namespaces. Namespaces are another valuable feature of Maven that makes supply-chain attacks like typo-squatting harder to pull off.


The real reason was the second one. That was just a cheap dig at their UX. Both were cheap digs, but also both true.


Lol, I knew you were mostly joking — but you also weren't wrong.

At the same time, some people genuinely shit on Maven Central and think that it's inferior to other registries.


There's a real problem with Maven Central and Java in general: there's no correlation between the package name - which is nicely domain-name formatted - and actual domain names. If there were a clear "this is really that domain name, and DNS-verified" vs. "this is compatible but not DNS-verified" marker, it would be great.

I think golang has the best answer for this, where it's easy to impersonate but it has to be explicit.


Yeah, it's far from perfect but it does get a lot right. It's painful watching all these new package management tools like pip and npm completely ignore what came before them.

I think Go's approach is interesting, though it does rely on some magic that isn't immediately obvious. I agree that being explicit is a tremendous benefit: it avoids the attack used here, and makes it less likely for typo-squatting to succeed (e.g., `npm install axiod`).


Publishing to Maven Central is a bit of a pain, but the manual effort, doc jars, signed jars, etc. help with security and keep away low-effort packages.


Also, a pretty sophisticated way to manage transitive dependencies. Python is an absolute mess in this regard (as well as pretty much everything else with dependency management…)


Alex Birsan actually published his findings for this vulnerability in Feb 2021 [1], and collected a bunch of bug bounties from various companies (Apple, Microsoft, Paypal, Yelp, Tesla, Shopify, Uber, Netflix).

He was able to steal package names for Python (PyPI), JS (npm), and Ruby (Gem), where these various companies have their own private package repositories with private modules whose names they don't control in the default repositories.

The main requirements for this attack are

1. Having private repositories, where the package names in the default repositories are not owned by the owner of the package.

2. A package manager that allows downloading packages from multiple repositories (the default and private ones) without being able to pin a specific package to only be downloaded from the private repositories.

What's notable in his findings is the omission of Facebook and Google, which I believe is due to their usage of Buck/Bazel and monorepos for their internal code. Another thing to note is that Alex mainly targeted companies' internal private packages, while this particular instance affects an open source project that provides packages on its own repository (and for which I was not able to find any bug bounty program).

There's another post by Kjäll et al. [2] that explains how this particular class of vulnerability affects other package managers (PHP, Java, .NET, ObjC/Swift, Docker), under what conditions each is vulnerable, and how to mitigate the risk. Two notable language package managers that were not affected:

1. Rust, mainly because you have to explicitly select the private registry for each private package.

2. Go, mentioned as unlikely, due to the use of FQDNs in the package names and hash verification by default.

I think anyone adding non-default package repositories or providing one (their own private repo in an enterprise setup, or 3rd-party-provided repositories) needs to be aware of this particular class of vulnerability and implement policy to mitigate it. I would say individual devs installing on their dev machines, or CI/CD systems based on shell commands (rather than a secured package manager setup), would be the main area of attack, mainly due to the relative difficulty of auditing those scenarios.

[1]. https://medium.com/@alex.birsan/dependency-confusion-4a5d60f...

[2]. https://schibsted.com/blog/dependency-confusion-how-we-prote...


Good analysis, thanks for the view in from the outside. I found the terms of one such bug bounty [0], whose scope includes "Open source projects by Meta".

And from the engineering blog, "[...] PyTorch 1.0, the next version of our open source AI framework."[1] (emphasis mine)

[0] https://www.facebook.com/whitehat/

[1] https://engineering.fb.com/2018/05/02/ai-research/announcing...

However, Meta has since ditched it [2], and a careful keyword search of pytorch.org and linuxfoundation.org suggests there are no current official bug bounties for PyTorch.

[2] https://pytorch.org/blog/PyTorchfoundation/


I was aghast when I first started to see dependency management tooling that allowed dependency versions to be declared using a wildcard. That seemed like an insane compromise between convenience and safety.

I couldn’t bring myself to use a wildcard for a long time. I always specified the exact version and incremented manually to feel like I was at least trying to maintain control.

I still think it’s an insane practice, but with software engineering ever increasingly being the art of composing dependencies with bits of glue code, I get it.
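Worth noting: exact pins alone don't stop dependency confusion (an attacker can publish the pinned version too); pip's hash-checking mode does, at the cost of maintaining the hashes (the hash value below is a placeholder):

```
# requirements.txt — with hash-checking, pip rejects any artifact whose
# hash doesn't match, whichever index served it:
#   pip install --require-hashes -r requirements.txt
torch==1.13.1 \
    --hash=sha256:<hash of the wheel you actually audited>
```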


The problem is the automation (PyPI accepts packages from basically anybody, automatically). Fedora and Debian/Ubuntu package a large fraction of the Python ecosystem (everything required by any package!) and are far less susceptible to these attacks, due to a maintainer doing the update. That's not to say that every update is perfectly vetted, but something like this would be harder.

PyPI probably needs a vetting system for new contributors of some kind.


I think this is an unpopular opinion, but I believe language package management systems try to solve a problem that was solved by Linux distributions a long time ago, and they typically do it very poorly.

I suspect a prime reason is the absence of package management on Windows, which a lot of developers (and users) use, and secondly the desire of developers to always use cutting-edge library features when writing code, with nobody wanting to upgrade any dependencies afterwards. There used to be a lot of discipline about being compatible with many library versions IMO; nowadays people just specify the latest version of each library in their requirements.txt file (or equivalent in other languages).


> language package management systems try to solve a problem that was solved by Linux distributions a long time ago, and they typically do it very poorly.

Yet, every main Linux distribution has its own packaging format (deb, rpm, etc.), package naming convention, dependency resolver, package release strategy (rolling, fixed, etc.), package build & deployment system (source, binary, per-arch binary, etc.), package install peculiarities (custom, upstream-focused, system-wide, in a chroot, in a snap, etc.), reproducibility constraints, etc...

So it's not like it's a _solved_ problem for Linux distributions.

Not to mention that most distribution package managers are system wide, while language package managers are often environment based.


> > language package management systems try to solve a problem that was solved by Linux distributions a long time ago, and they typically do it very poorly.

> Yet, every main Linux distribution has its own packaging format (deb, rpm, etc.), package naming convention, dependency resolver, package release strategy (rolling, fixed, etc.), package build & deployment system (source, binary, per-arch binary, etc.), package install peculiarities (custom, upstream-focused, system-wide, in a chroot, in a snap, etc.), reproducibility constraints, etc...

> So it's not like it's a _solved_ problem for Linux distributions.

Just because someone reinvents the wheel does not mean it wasn't invented (solved) beforehand. I would also argue each distro package manager is miles ahead of any language one.

> Not to mention that most distribution package managers are system wide, while language package managers are often environment based.

That's the thing I was alluding to in the second part of my post: if developers were more careful about backwards compatibility, we wouldn't have to use environments. I do admit that packages for apps are more of an issue; it would be nice to upgrade those without having to upgrade the rest of the system.


It's not unpopular, it's completely wrong, to a point I can't comprehend. Have you ever seen the download page of any multi-platform software? It's one or two packages for Windows, covering win7 through win11, one or two for Mac, and a dozen for Linux, and that covers only a very small subset of the distros and their versions. Or, more often, they don't even care and provide only one Ubuntu and one Red Hat package, and good luck to you.

The state of Linux software distribution is completely abysmal and ridiculous.


> that has been solved by Linux distributions a long time ago

Certainly not. A recent example: I wanted to try a KDE distribution, so I installed Neon which has three dependency updaters by default: pkcon, apt, and snap. If I use Python or Node, then I usually need to use their distribution management systems as well.

Dependency management is one area that completely fragments Linux into different distro universes.

I am not saying Windows is better (it isn’t) but if Linux had solved the problem, then most of us would share one solution.


I agree with you completely. Not only is security a problem, but mixed-language packages too. It's possible to put something like that on PyPI, but it's a huge pain (and the new setup.py replacements make it basically impossible).


If they happened often, companies would need to do a better job securing against them. I suspect nation states would like to avoid that.


> hundreds

Thousands.


What are the chances some sort of United Nations institution pays workers to both audit and prevent/harden against supply chain attacks? I'm wondering, in light of potential job obsolescence with the progress of ChatGPT and the like, if maybe we can forward the human intellect surplus there... Also, could access to a "safe" environment be thought of as a human right in the future? I certainly would love it if my children did not have to worry about this constant threat at some point, or at least if the stress could be decreased... It sounds a bit like being paranoid is the only way to go. I wonder, too, what long-term effects this has on mental health: do we as techs view our close ones as less trusted the closer to supply chain attacks we work?


> What are the chances some sort of United Nations institution pays workers to both audit and prevent/harden against supply chain attacks?

Zero, essentially. State actors profit massively from such systemic weaknesses, so it is not in their interest to eliminate them for the population at large (they do of course want to eliminate them for themselves, but they already have extremely strict supply chain policies so that's mostly a solved problem).

Hell, we have state-sponsored institutions working hard to actively create vulnerabilities in software that previously didn't have them. Security vulnerabilities are a tool through which power is exercised. They're not going away as long as governments have any say in it.


I think it is a great idea. The problem is that institutions/decision makers have very little knowledge about open source software, so it is really hard to convince them to do this. I can only speak about Germany, but I recently read a newspaper article by some IT government official that contained only buzzwords, where it was clear that he did not know what he was talking about. There are exceptions: someone got the government to fund Curl and OpenSSH (both somewhere between 50k and 500k). So that is great. But there is also a second fund where everyone can apply, and looking at the responsible team you see that out of 5 people, none has a STEM degree; instead they graduated in fields like cultural studies. I doubt that they know/care enough about the threat of supply chain attacks to direct funds there.


Why do you expect that from the UN?


One of the reasons why this happens is that Linux uses an outdated security model. Linux protects one user's data from other users, but doesn't protect a user's data from programs run by that user. This means that, for example, third-party software like VSCode or Dropbox has permission to read your browser's cookies, history and saved passwords. Or it can debug your browser and read its memory.

Linux's security model would have worked well on multi-user mainframes in the 80s but doesn't make much sense for an Internet-connected computer with a single user. The threats now are completely different.

For comparison, iOS and Android's security model would not allow a Python package to steal SSH keys. But the best solution would be to implement least privileges principle and do not grant unnecessary privileges to programs.

Another solution which could protect against such attacks would be to hire someone to inspect the packages. But it seems that nobody is providing such services. Is it because people got used to everything being free?


This might be a bit pedantic, but “Linux” in the general sense doesn’t use the outdated security model you described. You’re basically comparing discretionary access control (e.g. user/group permissions) with mandatory access control (e.g. SELinux), both of which Linux happily supports…

…but the usage of MAC in Linux is so awkward that it might as well not exist, since the majority of distributions, packages and users ignore it completely to avoid all the headaches that come with it.

It’s tricky. It would be nice if the maintainers of big distributions/packages took MAC seriously and defaulted to restrictive SELinux policies or whatever, but history has shown this to be _very_ brittle and annoying for users. That’s why you see this stuff enabled in iOS and Android but not general Linux distributions — it’s much easier to lock things down with MAC when you’re developing a singular OS from scratch with apps that run in a sandboxed environment that’s isolated from the “system”. But if you’re maintaining something like python or pip, it’s basically impossible to do that.


This is a Python problem first, long before it is a Linux problem. Not every ecosystem suffers from these issues. But both Node and Python have such instances of namespace issues and package confusion with alarming regularity.


Is that so? It would seem like any packaging solution that pulls dependencies from GitHub is vulnerable.

This is the case for Rust, it seems. Or any GitHub-based project that recursively pulls submodules.

Unless you have a curated, audited package repository, I can't see how you easily avoid it. Hence it is reasonable to ask how the OS can help.


The OS can't help as long as users can create files. But ecosystems (whether Rust, Go, Node or Python doesn't really matter) can help by managing their namespaces properly. The thing that went wrong here is that someone could silently replace a package in a namespace that you think is unique by overruling it through a namespace that takes precedence. Besides the obvious usability issues, this is not the kind of functionality you would want under any circumstances: if a package suddenly gets pulled from an entirely different repository than the one the previous instance came from, and does so silently (without throwing an error, and without so much as a pause to ask the user whether this really is what they want), then I really don't see what kind of OS-level awareness of the world at large would be required to fix that.

The problem does not stem from the operating system level, and so it should not be fixed there, and it likely can't be fixed there, at least not without possibly breaking a whole bunch of stuff that is working just fine today.


As explained in another thread, nothing went wrong with the tooling and namespacing. The PyTorch devs misused a feature intended for a different use-case.

I took a look and the vulnerability that was exploited is warned about and described in the documentation.

> Warning
>
> Using this option to search for packages which are not in the main repository (such as private packages) is unsafe, per a security vulnerability called dependency confusion: an attacker can claim the package on the public repository in a way that will ensure it gets chosen over the private package.

https://pip.pypa.io/en/stable/cli/pip_install/#examples
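To illustrate the difference between the two invocations (the private index URL in the second command is a made-up placeholder, not a real mirror):

```
# Vulnerable pattern: packages are resolved from BOTH indexes,
# and a same-named package on PyPI can take precedence:
pip install torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu

# Safer pattern: a single index that either proxies PyPI itself
# or serves every package you need (URL is a placeholder):
pip install torch --index-url https://private-mirror.example/simple
```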


And that's even further removed from the claim that this could be fixed at the operating system level: an operating system is powerless against people that bypass security measures willfully. But regardless, this option, even if available should always result in a warning when there is a change of effective source for a chunk of code that was previously included blindly from another source.


I agree the issue is not caused by OS; rather my point is, OS is a convenient layer to prevent it.

You mention some mechanisms for preventing these issues at the packaging level. But so long as bad actors can sneak in malicious commits, it's an open problem. Here a whole package was replaced; in another scenario, a single line of malicious code could be sneaked in, released, rubber-stamped and shipped via bona fide channels. Ensuring no single bad commit is made in any package or its dependencies doesn't seem like a realistic task.

As other posters suggest, this could be somewhat mitigated by mobile-style finer-grained permissions, even if the problem doesn't stem from an OS failure per se.


Those who want those protections can effectively already have them, but convenience (and modern CI/CD tooling) dictates that they will either be disabled or ignored completely.

If you try to fix a problem at the wrong layer, all you will end up doing is creating a bunch of false positives, which will result in those protections being disabled. Then, by the time you need them, they will no longer be active. A problem like this should be dealt with as close to the source as possible. A defense-in-depth strategy is a sound one, but only if it does not add burdens that lead to the defenses being disarmed in peaceful times.

Finally, I fail to see which finer grained permission would have stopped this, though maybe someone can make some suggestions. Not allowing the use of Node, Python, Go or Rust would seem to be a bit harsh and nothing short of that would seem to solve the problem without the OS having to gain massive awareness of what is going on at the application level.

Microsoft effectively fell for this by dragging everything and the kitchen sink into the operating system layer; the only result was an even larger attack surface at a level where it mattered even more.

This is not an easy problem to solve. For starters you would have to review the way all of these dependency resolution mechanisms work and flag the ones that you think contain risk. Then you need to get the suppliers of those systems - and others like it - to agree on a standardized way of vetting the import process and finally you would have to make the OS aware of a set of signatures or certificates that would stop such an attack in its tracks.

The alternatives are doing away with easy dependency resolution or taking charge of the root cause and fixing it: the admission that decentralized source code control is a great thing to have but unsuitable for the distribution of packages for immediate introduction to production systems based on the say so of whoever controls the namespaces. The kind of centralized control that would require is something that the likes of Apple and Microsoft can provide, no Linux distribution is dominant enough to be able to solve this (nor are they wealthy enough). Distros have a hard enough time vetting their own output.


For example, blanket access to the filesystem seems bad. Start with: most apps get access to some particular subdir of $HOME, like say "Documents", their own config directory, and systemwide files (not user-specific). Then you have one directory of $HOME which you know is more exposed, but the rest of it, like .ssh, is protected. Select programs like ssh or rsync can have access.

Perhaps a simpler starting point is a runner script that enforces such rules on an opt-in basis from the user, and tooling that makes it easy to run binaries through this. Like a reverse sudo sort of thing.

I'm certainly not saying it's easy, but I'm not sure it is impossible either. It's a bit like an internal firewall.

Even if apps request blanket permissions, that's a win already. On mobile, when apps do it, it is a big red flag for me and I skip.


All that was required here was a very noticeable warning that a dependency that was previously sourced from one place now came from another. Doubly so, because, as a commenter further up points out, the new source is outside of the private repository and the risk of confusion was well known.


> For comparison, iOS and Android's security model would not allow a Python package to steal SSH keys. But the best solution would be to implement least privileges principle and do not grant unnecessary privileges to programs.

I think even on iOS and Android, the same thing could still happen if the user keeps secrets (say private keys) in their file directories, and then gives an app permission to access those directories, which could be a legitimate need of common applications like file managers, text editors, etc.

I think the main change to solve this would be to not keep secrets in files that are readable by users (or program run by users) implicitly, instead using a secret provider that makes it explicit to the user whenever an application needs to access such secrets.
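As a toy illustration of that idea (the class and method names here are invented; real implementations would be OS keychains such as the macOS Keychain or the freedesktop Secret Service), the key property is that every read of a secret is mediated and observable rather than being an ordinary file read:

```python
class PromptingSecretStore:
    """Hypothetical sketch: secrets live behind an API that records (and,
    in a real system, would prompt the user on) every access, instead of
    sitting in files any process run by the user can read."""

    def __init__(self):
        self._secrets = {}
        self.access_log = []

    def put(self, name, value):
        self._secrets[name] = value

    def get(self, name, requester):
        # Every read is recorded, so an exfiltration attempt leaves a trail;
        # a real implementation would ask the user for consent here.
        self.access_log.append((requester, name))
        return self._secrets[name]

store = PromptingSecretStore()
store.put("ssh_key", "-----BEGIN OPENSSH PRIVATE KEY-----...")
key = store.get("ssh_key", requester="openssh-client")
print(store.access_log)  # [('openssh-client', 'ssh_key')]
```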


running all apps in something like a lite version of docker in user namespace mode would implement this


Such solutions exist already, see e.g. bubblewrap [1]. The Flatpak ecosystem is slowly trying to add protection by sandboxing applications using bubblewrap. However a contingent of the Linux community meets this with:

- Flatpak is terrible because it doesn't follow a file system hierarchy that was invented in the 70ies.

- Flatpak is terrible because my early 90ies package manager is the pinnacle of packaging.

- Flatpak is terrible because I only trust my distribution's packages.

- Flatpak is a security nightmare because it doesn't isolate every application now. (Which is not really possible, because applications/toolkits need to be adapted).

- Flatpak is terrible because now my applications cannot open arbitrary files anymore (including ~/.ssh).

Conservatism is what holds the Linux ecosystem back. We have seen this story before with systemd. This is sad, because Red Hat and others are doing fantastic work modernizing Linux (see Flatpak, Fedora Silverblue, etc.).

[1] https://github.com/containers/bubblewrap


Qubes OS does it exactly like that:

https://en.wikipedia.org/wiki/Qubes_OS


If you access the domain that data was being sent to, h4ck.cfd, a message is given:

>Hello, if you stumbled on this in your logs, then this is likely because your Python was misconfigured and was vulnerable to a dependency confusion attack. To identify companies that are vulnerable the script sends the metadata about the host (such as its hostname and current working directory) to me. After I've identified who is vulnerable and repoterted the finding all of the metadata about your server will be deleted.

I of course don't believe it, but interesting to see something like this.


It's hard to take that message at face value. If one wanted to help find issues, one wouldn't actively collect secrets. Like maybe `/etc/resolv.conf` makes sense to identify parties but collecting all contents of `~/.ssh` can only be seen as malicious


Agree. This is misdirection.


It’s interesting they didn’t use a less conspicuous domain name and content. Like something related to analytics perhaps.


It's important to understand that this issue was specifically caused by a deficiency in Python's packaging system: dependencies are specified by name and version, and if the user configures multiple package repositories, there's no way to control which repository they get the package from. So if a package is normally only available on PyTorch's nightly package index, someone can upload the same package name and version on PyPI and it will take precedence.

It would be better if packages could specify cryptographic hashes of the packages they depend on, and/or specify the package repo that a package should come from. There is some prior art for this in e.g. the JavaScript and Rust ecosystems.

This problem did not occur simply because pytorch has many dependencies and one of them got compromised. If pip allowed you to specify dependencies in a more secure way, this would not have happened.


It is possible to specify hashes. https://pip.pypa.io/en/stable/topics/secure-installs/
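For reference, pip's hash-checking mode looks roughly like this; the digest below is a placeholder, not the real wheel's hash:

```
# requirements.txt -- installed with: pip install --require-hashes -r requirements.txt
torchtriton==2.0.0 \
    --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000
```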


hashes would also need to be specified for all (transitive) dependencies in case they are needed,

and all dependencies would need to be pinned to specific versions as well. Hence this only works when users make use of venvs, instead of a user-install / site-install setup.


This can be solved by using namespaced names for packages from third-party repositories, e.g. "pytorch/triton" instead of just "triton". In this case package from one repo cannot replace another repo's package.


Would it work to make a requirements-pytorch.txt that doesn't look in PyPI?

    --index-url https://user:pass@path/to/pytorch/nightly/repo


Little command to check if you're affected (from TFA):

    python3 -c "import pathlib;import importlib.util;s=importlib.util.find_spec('triton'); affected=any(x.name == 'triton' for x in (pathlib.Path(s.submodule_search_locations[0] if s is not None else '/' ) / 'runtime').glob('*'));print('You are {}affected'.format('' if affected else 'not '))"
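For readability, the same check expanded into plain form (a sketch with identical logic: the compromised build shipped an extra binary literally named `triton` inside the package's `runtime/` directory):

```python
import pathlib
import importlib.util

spec = importlib.util.find_spec("triton")
if spec is None:
    # nothing importable named 'triton'; fall back to a path that won't match
    runtime_dir = pathlib.Path("/") / "runtime"
else:
    runtime_dir = pathlib.Path(spec.submodule_search_locations[0]) / "runtime"

# the malicious wheel drops a file named 'triton' in runtime/
entries = runtime_dir.glob("*") if runtime_dir.is_dir() else []
affected = any(p.name == "triton" for p in entries)
print("You are {}affected".format("" if affected else "not "))
```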


The malicious package uploads your SSH private keys to its server. This is extremely concerning for people that may have accidentally installed it.

Judging from the package installation stats this was installed around 2,500 times


Good reason to get a Yubikey or similar, and use it to generate WebAuthn‐based SSH keys that can’t be used once exfiltrated. (“ssh-keygen -t ed25519-sk”)


Encrypting keys with a passphrase would also help.


Yes, but once an attacker has root on your systems they may well install something that captures that passphrase so a chunk of hardware that you have with you would seem to offer some extra protection.


Fortunately ssh has forward secrecy, so if you are using the keys in your ssh client and don't use them after they are compromised, then your traffic remains secure. However if these keys are used in an ssh server and someone has marked those keys as trusted, potential issues remain. Unfortunately ssh does not have the PKI infrastructure of ssl to revoke keys.

But an ssh server (or any other server) should not also be used as a development environment that is pulling dependencies (if you are developing a server, then you should be using dummy keys and doing the development in a non-production environment).


>Unfortunately ssh does not have the PKI infrastructure of ssl to revoke keys.

It does, but very few people use it.


Props to the author of the malware — not for executing a supply chain attack, but for making malware that’s not obvious to detect when running. It snags only 1000 files in HOME, only those less than 100k, and then caps it off by uploading via DNS.

I hate to “give props” to the bad guy, but as a former pentester this area is fascinating. The cat and mouse game is eternal, and both sides evolve. This malware is pretty standard, but it’s also smart — you can tell it was developed by a dev to target devs.

Any progress on blocking exfiltration via DNS? It seems only malicious programs make heavy queries like this, and it should be detectable. Even a small ML network could probably be trained to tell the difference between legit queries and malware uploads.
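You might not even need ML for a first cut. A hedged sketch of a simple heuristic (function names and thresholds are invented, not from any real product): exfiltration over DNS tends to produce long, high-entropy labels, while legitimate hostnames look like short dictionary-ish words.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    # bits per character over the label's character distribution
    counts = Counter(s)
    total = len(s)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_like_exfil(qname: str, entropy_threshold: float = 3.5,
                     length_threshold: int = 40) -> bool:
    # the longest label is where an encoded payload would live
    label = max(qname.split("."), key=len)
    return (len(label) > length_threshold
            and shannon_entropy(label) > entropy_threshold)

print(looks_like_exfil("www.example.com"))                       # False
print(looks_like_exfil("0123456789abcdef" * 3 + ".h4ck.cfd"))    # True
```

A real detector would also look at query rate and TXT/NULL record usage, but even this crude filter separates hex-encoded payloads from ordinary hostnames.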


If we are considering improvements, I would prioritize files differently. All .dotfiles in $HOME under the 100k threshold (which feels way too large to get generically useful secrets), but then I would try to be XDG compliant, so walk everything under .config/ looking for files under ~1kb (more likely to just hold credentials). Could be more targeted and look for specific shell folders (eg .config/fish) where aliases and other annoying to type data might live.


If using GitHub Actions for CI/CD, Harden Runner (https://github.com/step-security/harden-runner) can be used to audit and block DNS exfiltration. Outbound calls from CI are predictable (to the source repo, artifact registry, etc.) and don't change often.


> Users of the PyTorch stable packages are not affected by this issue.

But the packages they ask you to uninstall have the same names. How does one install the nightly package? If I just installed "torch" can I assume that I don't have the nightly version?


> How does one install the nightly package?

This is documented on the installation page, assuming you installed by their method. You specify a different source URL when invoking the package manager, so you would need to deliberately install the nightly release rather than get it accidentally. If you installed it by other means (perhaps your OS's package manager), the same is likely to be true.

> If I just installed "torch" can I assume that I don't have the nightly version?

Probably, but it depends on how you installed it; if you are using your OS's repository then check its documentation. The linked page for this thread includes a section entitled "HOW TO CHECK IF YOUR PYTHON ENVIRONMENT IS AFFECTED" (their capitalisation); follow the instructions there if you are not 100% sure what version you currently have installed.


The version numbers are different, even though the package names are the same. A stable version will have a version number such as `1.13.0`, whereas a nightly version will have the date in the version number, such as `2.0.0.dev20221230`. You can check this either with `pip list | grep torch` or via `python -c "import torch; print(torch.__version__)"`

If you installed `torch` via the instructions to install the nightly version specifically, then you get the nightly version. By default, you get the stable version of `torch`.
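In other words, a trivial sketch of that check (the helper name is made up):

```python
# Nightly builds embed a ".dev<date>" segment in the version string,
# e.g. "2.0.0.dev20221230", while stable releases look like "1.13.0".
def is_nightly(version: str) -> bool:
    return ".dev" in version

print(is_nightly("1.13.0"))             # False (stable)
print(is_nightly("2.0.0.dev20221230"))  # True  (nightly)
```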


The timing of the exploit seems perfect too. People always say folks deploy malware around Christmas time, since that's when developers are less likely to notice it due to holidays, but it's my first time seeing it happen (probably the first since the large-scale Google attack many years ago).


Suffering. I wish Python had a better dep management system. Maybe this is the wake-up call we need to make one.


This does not seem like a uniquely Python problem. As a library, the authors probably listed `libFoo >= X.Y` and left it at that. If libraries pin exact versions with hashes, that creates problems elsewhere, as everything now needs to be updated in lockstep.


That's long overdue. Npm likewise.


In this case, the malicious dependency wasn't actually a dependency, so removing it from pypi caused no issues with pytorch. Not all projects are so lucky.

I recently wanted to contribute a feature I needed to an NPM package. The package is fairly popular (400k+ downloads per week) but is unmaintained with the last release about 3 years ago. There are no practical alternatives to this package. So my only option was to fork it.

Unfortunately, one of its dependencies was found to be malware between that last release date and now. That dependency was nuked and no longer is available (not that you would install it anyways), so neither my fork nor the project can be built. The dependency tree is, of course, very complicated and it would take a lot of work to figure out a replacement for that dependency and integrate it.

I ended up directly patching the last release's compiled bundle to do what I needed to do. In this case, this very useful (but unmaintained) package is now more or less dead and further development work will be very difficult, its source code a probably useless relic.


Is the underlying issue still present of PyTorch using some dependencies that are shadowed by PyPI without squatting the corresponding PyPI urls? Or was this the last one? It looks like the last one based on 20 min of my looking at their github, but I'm not a good person to trust about this because my knowledge of build and packaging systems is practically none.


This is the last one. It was also the first one.


thank you.


another comment here rightly pointed out that what we want is to protect our files from being read by unrelated processes.

there are no good implementations of this yet to my knowledge. encrypted fuse with snitch like rules and prompts is probably the play.

any network snitch program would have caught this and prevented exfil. a prompt would have been displayed for a new connection to sketchy.io udp 53, from pytorch or some child process. most eyebrows would rise. many requests would be denied.

on mac you have little snitch and at least one foss alternative. on linux you have opensnitch, or can roll your own on libnetfilter_queue[1] or lsm hooks[2].

hopefully soon somebody implements encrypted fuse snitch. until and even after then, wrap yourself in network snitch safety blankets and feel warm and cozy.

1. https://github.com/nathants/tiny-snitch

2. https://github.com/nathants/mighty-snitch


I'm curious if SELinux as configured by default on Fedora/EL would prevent reading private ssh keys. I guess pretty easy to test...


anything without interactive notifications will need restrictive rules, but should work.

ufw preventing all udp 53 to anywhere but 1.1.1.1 might have worked.

it’s the raised eyebrows that are the most important part. ttre, time to raised eyebrow. random network nonsense should not pass invisibly.
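something like this, perhaps (untested sketch; ufw evaluates rules in order, so the allow must precede the deny):

```
sudo ufw allow out to 1.1.1.1 port 53 proto udp
sudo ufw deny out to any port 53 proto udp
```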


the 3.0.0 version that is available on pypi is now an empty package with an updated description:

This is not the real torchtriton package but uploaded here to discover dependency confusion vulnerabilities

the compromised version is still available in pypi as version 2.0.0+0d7e753227 https://pypi.org/project/torchtriton/#history.

So technically, if you are pulling the older version of pytorch-nightly (specifically 2.0.0.dev20221230), it will still pull that compromised dependency (because torch have explicit version lock to it).


> So technically, if you are pulling the older version of pytorch-nightly (specifically 2.0.0.dev20221230), it will still pull that compromised dependency (because torch have explicit version lock to it).

All PyTorch nightlies with this dependency have been deleted


@smhx are you sure? at the time of this comment, I was still able to download 2.0.0.dev20221230

  pip3 download torch==2.0.0.dev20221230+cpu --extra-index-url https://download.pytorch.org/whl/nightly/cpu
and on extracting the wheel, METADATA still has

  Requires-Dist: torchtriton (==2.0.0+0d7e753227) ; extra == 'dynamo'
The package dated 20221231 has pytorch-triton already (so should be safe now)

Although I guess this is low risk, because people normally would download nightlies without pinning to a particular version/date.

But in case there are people that do pin their versions, and cache those vulnerable versions (locally or on their own proxies/private repositories), they could still be affected.

I recommend getting PyPA to yank the 2.0.0.dev20221230 version in pypi, and possibly amending the post to remind people to purge their caches, not just locally but also on their proxies/private repos/mirrors (mainly for the torchtriton package), and to immediately stop using any pytorch nightlies dated before Dec 31 2022 (mainly any pytorch nightlies that have a pin on torchtriton==2.0.0+0d7e753227, not just those between Dec 25 and Dec 30).


thanks for the heads-up, looks like we didn't yank the CPU wheels on those dates. will get to them in the next set of working hours, as it's an unlikely scenario (not only do you have to install the wheel of a specific date, you also have to specify the undocumented feature flag [dynamo])


Reminds me of the blog post on Guix HPC, "What's in a Package?" and it was specifically focused on problems with PyTorch and approaches with package managers.

https://hpc.guix.info/blog/2021/09/whats-in-a-package/


Is pytorch now partly broken without torchtriton? If yes, what's not working? If no, why was it ever there in the first place?


this only affects the nightly pytorch. the stable pytorch build doesn't depend on `torchtriton`.

the nightly pytorch moved to depend on our own `pytorch-triton` now, secured on PyPI and on our nightly channel.


anyone using AI/deep learning to vet software supply chains yet?

seems like a higher priority task than generating artificial art.

let's turn the GPTs toward something like a softwaresupplyGPT that can be trained on all the software on all the individual package/dependency managers in existence.


I started with static/dynamic analysis in Packj [1] to vet PyPI/NPM/Rubygems packages; it flags hidden malware or "risky" code behavior such as spawning of a shell, use of SSH keys, and mismatch of GitHub code vs packaged code (provenance).

1. https://github.com/ossillate-inc/packj flags malicious/risky packages.


At this stage Python and PyPI should be forked by competent people. You cannot solve packaging and various other issues with the arrogant old boys and their cronies in place. It is a social problem.

No other package distribution system is that bad. BSD ports are better, Homebrew is better.


Good thing I dev in containers. It seems like there was no breakout capability, so the valuable information reachable from inside a container is limited. Except for secrets loaded as environment variables, which is a bad practice anyway but sadly pretty common in Python development.


Cases like this make me laugh a lot. A decade ago, we relied on Linux distribution repos, and any serious company would have its own local copy in its own repository.

Things were a little bit slower to get new versions but things were safe and reliable.

Then the cool kids on the block arrived a few years ago and decided they were a lot smarter: always have the latest version of everything, directly and instantly, and build on shifting sands. They thought the old farts were dinosaurs doing legacy software dev.

All we can say now is that by forgetting the past, you are doomed to make the same mistakes again and again.


I guess the question is: would package maintainers have caught this? (I don't know the answer; I suspect that it would slip by some but probably not all.)


It would have. I don't know how it is done nowadays, but back then, package repository maintainers were closely watching the source they were releasing, as well as signing it with their own keys.

They would only build from source, for example, rebuilt by the distribution infrastructure. And in addition, the slow pace of having things like stable, testing and unstable gave serious users the opportunity to ensure they used packages that were stable, safe and battle-tested.


Can't we require each package on PyPI (and the like) to have its own public GPG key, signed and associated with it? I am not sure of the technical issues one would have to overcome with this.

Then if someone does try to creep into your package list with a new GPG key or modifies the content with a new key, it would call attention to this much faster and prevent these kind of situations.

Thus; if you trust an author, then add their GPG key to your vault. This will most likely cover multiple packages of the same project, as well as well known groups that provide packages on these infrastructure.
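A toy sketch of the trust-on-first-use flow being described (fingerprints, names and return values are all invented; nothing like this exists in PyPI today):

```python
trusted_keys = {}  # package name -> key fingerprint the user has vouched for

def check_signature(package: str, fingerprint: str) -> str:
    pinned = trusted_keys.get(package)
    if pinned is None:
        trusted_keys[package] = fingerprint   # first sight: trust on first use
        return "pinned"
    if pinned != fingerprint:
        return "REJECT: signing key changed"  # possible takeover or confusion
    return "ok"

print(check_signature("torchtriton", "AAAA1111"))  # pinned
print(check_signature("torchtriton", "AAAA1111"))  # ok
print(check_signature("torchtriton", "EVIL9999"))  # REJECT: signing key changed
```

The dependency-confusion case falls out naturally: the shadow package on PyPI would have been signed with an unknown key, tripping the mismatch check.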


Like many, I use UFW. I wonder if there is a way to have a "pop-up ask for new outbound connection" acceptance per program running.


Note that this exfiltrated via DNS queries so it may actually be your local resolver making the connection.


OpenSnitch might fit the bill


I use OpenSnitch for this - it works exactly as you describe.


Any proof-of-software-certification blockchains out there? In which "mining" is a trusted party reviewing code & posting content integrity hashes of all derived bits.

Then npm/pypi/cargo/whatever are given a list of the parties you trust and the number of certifications each bit needs before it can be downloaded.

Edit: of all the responses, none provide any mechanism for the software-verifier to be compensated. Says a lot about how much we value that kind of work as an industry.


> a trusted party reviewing code & posting content integrity hashes

That sounds like some combination of Crev[0], Sigstore[1], and Trillian[2].

[0] https://github.com/crev-dev/cargo-crev/blob/master/cargo-cre...

[1] https://docs.sigstore.dev/

[2] https://transparency.dev/


Blockchain consensus only works when there are many more “verifiers” than there are items (blocks) to verify. Good luck with the amount of software updates being published and manual reviews.

Regarding manually reviewing software, it would be very easy to create a malicious majority of verifiers who certify malware, because malicious "verification" takes far fewer resources than truthful verification. Rather than verifiers being paid, verifiers would need to themselves pay for the right to verify, in order to throttle malicious "verification".

But realistically, you need verifiers that you trust in the first place, and if those exist, digital signatures are sufficient.


There's no technical fix for compensating verifiers; it has to be negotiated out of band. This is different from Bitcoin because the Bitcoin protocol is unambiguous, so any well-behaved node that follows it can trust that all other well-behaved nodes will accept its output as valid. By contrast, security review is inherently subjective and reasonable people will disagree, necessitating judgment calls on who to trust.


Engineer the act of querying a trusted party to be a paid operation and the problem is solved. It doesn’t really matter if the verification is subjective, you pay 5 people you trust, only accept it if 4 say it’s good, and stop paying the 5th if they consistently disagree and/or vouch for disagreeable content.


A blockchain doesn't magically create money to pay the "miners". Well perhaps it might by taking money from speculators, but that isn't really a robust basis for security audits.


The money comes from folks who want verified software and are willing to pay for it. The greatest failure of crypto thus far is convincing people it’s a speculators market. This had to happen when the only outcome of a mine was “a hash starting with a lot of 0’s and a couple hundred bytes we can all agree on”. Make mining produce something legitimately valuable and that is no longer necessary.


I’m not sure if they use blockchains per-se, but there are definitely companies that are trying to provide a secure software supply chain (chainguard is one I’m aware of, but I know nothing about the underlying technology).


CEO here. We use a transparency log (sigstore) instead of a blockchain.


Not sure if you need anything as complex as that: just the Python tools recording the hash of the dependency and storing that in the git repo should be enough to prevent this particular problem. e.g. go.sum contains a hash of the entire source. I believe npm and cargo are similar. This way the user should always get the exact same dependency that I added as a developer, no matter what happens with their settings or connection or whatnot.

Not familiar with Python's ecosystem at all, but judging from this commit[1] it doesn't seem to have anything like it (?)

[1]: https://github.com/pytorch/pytorch/commit/bc92444b34dfbe3841...
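A minimal sketch of that go.sum-style idea (file names and payloads here are placeholders): record a digest when the developer first adds a dependency, then refuse any later download that doesn't reproduce it.

```python
import hashlib

lockfile = {}  # package name -> hex sha256 recorded when the dep was added

def record(name: str, artifact: bytes) -> None:
    # developer adds the dependency; its digest is committed to the repo
    lockfile[name] = hashlib.sha256(artifact).hexdigest()

def verify(name: str, artifact: bytes) -> bool:
    # every subsequent install must reproduce the recorded digest
    return hashlib.sha256(artifact).hexdigest() == lockfile[name]

record("torchtriton-2.0.0.whl", b"original wheel bytes")
print(verify("torchtriton-2.0.0.whl", b"original wheel bytes"))   # True
print(verify("torchtriton-2.0.0.whl", b"attacker's wheel bytes")) # False
```

Under this scheme the index a package happens to come from stops mattering: a same-named wheel from PyPI would simply fail the digest check.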


That protects this one avenue, but the more common one I've seen (npm land) is a compromised downstream dependency that gets automatically upgraded for whatever reason.

Regardless - I want to be protected even when the Author of the package themselves is compromised (either by their tokens being stolen or them manually compromising the project)


it does, but to use hashes you'd have to pin exact versions of every dependency, which is generally undesirable for a library


Plug: I've been building Packj [1] to address exactly this problem. You can _audit_ as well as _sandbox_ the installation of PyPI/NPM/Rubygems packages; it flags hidden malware or "risky" code behavior such as spawning of a shell, use of SSH keys, and mismatch of GitHub code vs packaged code (provenance).

1. https://github.com/ossillate-inc/packj flags malicious/risky packages.


We use https://www.kusari.dev/ which is about securing the building and deployment of software, not about blockchains.

FRSCA: Factory for Repeatable Secure Creation of Artifacts provides a simple-to-install solution that aims to help secure the supply chain by building secure pipelines. It also provides abstractions and definitions with security guardrails ensuring all builds follow supply chain security best practices.

GUAC: Graph for Understanding Artifact Composition leverages metadata to provide deeper visibility and enable users to quickly understand security issues throughout the software supply chain, their blast radius, and how to remediate their root causes. Learn more as we build this out in the open.

https://www.kusari.dev/products/securesupplychain/

See also: https://slsa.dev/


Sigstore does this with a transparency log instead of a Blockchain.


What's going on with the font on this page though? All the "t"s look weird.


torchtriton dep? Is this targeting LLM users like OpenAI? This attack was only discovered after a few days. I wonder if the attacker succeeded in getting something?


nice hack


> At around 4:40pm GMT on December 30 (Friday), we learned about a malicious dependency package (torchtriton) that was uploaded to the Python Package Index (PyPI) code repository with the same package name as the one we ship on the PyTorch nightly package index. Since the PyPI index takes precedence, this malicious package was being installed instead of the version from our official repository
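
The precedence problem the quote describes can be sketched with a toy model of how pip pools release candidates from multiple indexes (hypothetical versions and hosts; real resolution is far more involved):

```python
# Toy model: with --extra-index-url, pip pools candidates from every
# configured index and picks the best matching version, with no notion
# of one index being more trusted than another.
def pick(candidates):
    """candidates: list of (version_tuple, index_host) pairs."""
    return max(candidates, key=lambda c: c[0])

private_index = [((2, 0, 0), "download.pytorch.org")]
pypi = [((3, 0, 0), "pypi.org")]  # attacker uploads a higher version

version, source = pick(private_index + pypi)
print(source)  # pypi.org -- the attacker's package is selected
```

Renaming the dependency to a name the project actually controls on PyPI (as PyTorch did with pytorch-triton) removes the name collision from the candidate pool entirely.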

You'd think people would know better by now.


> The binary’s file upload functionality is limited to files less than 99,999 bytes in size. It also uploads only the first 1,000 files in $HOME (but all files < 99,999 bytes in the .ssh directory).

That’s… not great.


So that's including ssh keys?


Yes, though ideally the private keys should be encrypted.


Not gonna lie, I never choose the pw option when creating keys. Am I out of the ordinary here?


Create the keys with a password, then store the password in an SSH agent (which is built into many keychain managers such as Seahorse). That way, you can use your key without entering the password, yet the key file is still encrypted on disk. Keys are decrypted automatically on login and stored in memory only, protected by the secrets daemon.


Still doesn't help unattended machines that need to scp automatically


I think I just had a lightbulb moment, thanks! Not wanting to re-enter the password all the time when running scripts etc. was my only reason for not using the feature.


Any equivalent solution for containers running in k8s?


Hardware keys make more and more sense all the time.


Yes


These attacks are more and more common, and there's still little out there that actually solves them.

We're working on this at Chainguard by starting at the lowest levels (a containerized Linux distro) where we can deliver packages safely. Then we can use that layer to safely deliver language packages, like in this case.

This is all being done in the open, with projects like Sigstore (sigstore.dev) and our Linux distro Wolfi (wolfi.dev).

This is an incredibly complicated space and there's no silver bullet. We need to build safe delivery mechanisms for trustworthy software.

Linux distros have traditionally done this very well, but they've struggled to solve language package manager distribution, leading developers to shy away from and work around them when installing things like PyTorch.

Our hope is that we can bring the trustworthiness of Linux distros to language package managers, and the ease of language package managers to Linux distros.


> At around 4:40pm GMT on December 30 (Friday), we learned about a malicious dependency package (torchtriton) that was uploaded to the Python Package Index (PyPI) code repository with the same package name as the one we ship on the PyTorch nightly package index. Since the PyPI index takes precedence, this malicious package was being installed instead of the version from our official repository.

*sigh* Why didn't anyone think of this attack vector till now?


Someone did, now.


I know I shouldn't say this, but I'm grateful this happened. People treat this problem as unsolvable when it's absolutely not. Solutions like crev have existed for a long time, but the devs of languages like Rust and Python never treat them as first-class tools. They never bundle them with their official build tools, so people don't use them, and they use the flimsiest of excuses to justify not doing so.

Maybe this is the kick that lang devs needed to take dependency security seriously and finally make peer review mandatory for third party packages by default. If it doesn't, I hope we continue getting kicked in the balls until we do.


> People treat this problem as unsolvable when it's absolutely not.

The root problem is "I'm using software from hundreds of authors, how do I know I can trust all of them?". And that problem is indeed unsolvable. It's not even a technology problem.

And no, peer review is not the solution. Scientific fraud is still widespread despite peer review being the standard for many decades. Granted, there may be better approaches than the Wild West that is PyPI and NPM, but this problem will never go away completely.


Are you honestly telling me that a package that was reviewed and signed off by three reputable developers is practically susceptible to "Scientific fraud"? If you do, then I can see why you think this problem is unsolvable.

You will never have a theoretically perfect solution to this problem, but guess what: turns out you don't have to. Even single-reviewer systems like linux repositories have proven to be vastly more secure than this crap.

This problem is absolutely solvable and eventually it will be. I just hope they solve it the right way.


>Are you honestly telling me that a package that was reviewed and signed off by three reputable developers is practically susceptible to "Scientific fraud"?

Yes? Especially if it's pulling in dependencies written by a bunch of other people because then it doesn't require malicious action by those three reputable devs, but merely negligence on their part in how those dependencies (and any updates etc. to them) are managed.


Of course negligence causes the system to fail. That's why people rely on REPUTABLE developers; that is, those who have a long history of doing good, thorough reviews. In practice, even single-reviewer systems have proven to be enough to keep entire package ecosystems secure (linux package maintainers).

I encourage you to do more research on this topic.


Making crev part of the official Rust toolchain won't magically make enough time in the day for me to want to volunteer any of it doing code review.


Discussion under duplicate: https://news.ycombinator.com/item?id=34202836


We've merged the threads now.


The language itself needs to evolve so that so many libraries are no longer needed. For example, in JavaScript, if I can't call it via fetch, I don't implement it.


Tonight on 60 minutes, how did the Meta layoffs crash your self-driving car into an electrical substation? Find out tonight, as we expose how their ivory tower fell, and their elite teams enabled a single trickster to seize the bottom card and plunge cities into darkness with a gentle tug. Is this the bottom for them? We ask the financiers who overlooked the clearly pathological expansion that led to this. And a little later, we go on a deep dive,... what is "dev ops"? Why is this "grand slam" of management completely unsustainable? What business practices are obvious ways to sweep poor performers under the rug at these bizarrely huge companies? Tonight, on 60 minutes.



