2 Lenses for Examining the Safety of Open Source Software

Open source repositories, such as Python’s PyPI, the Maven Java repository, and the Node Package Manager (npm) for JavaScript, typically have a skeleton crew of engineers and volunteers to manage and secure the infrastructure. The volume of malicious accounts and projects being created on these platforms every day is fast outpacing the capacity of security review teams to keep up.

The focus on the security of repositories mirrors the increasing attention that the software supply chain has garnered from attackers, says Tim Mackey, head of software supply chain risk strategy at software integrity firm Synopsys.

“If I’m an attacker, and I want to go and compromise, say, a JavaScript application, or a Python application at scale, then the best way for me to do that is to somehow gain control over meaningful elements of the repository,” he says. “So, if I’m a development organization that’s consuming Python code, Node code, or Java code … I’m going to have a level of implicit trust that the repository is going to be doing the right thing … and that there’s no intrinsic avenues for attack or ways that trust can be breached.”

Several technical efforts are underway to reduce the burden on maintainers and on repositories’ infrastructure staff. However, keeping malicious packages and users out of software applications requires more than just technology.

Put Technology on the Case

The OpenSSF Scorecard (hosted by the Open Source Security Foundation), for example, runs automated checks against developers’ code and open source projects to help gauge the risk of malicious maintainers, compromises of the source code or build system, and malicious packages.

“Being really deliberate about what it is you’re linking into your supply chain is best — really, the best offense here is a good defense,” says Zack Newman, principal research scientist at Chainguard. “Coming up with a policy inside of an organization to look at specific signals in the Scorecard when we’re adding dependencies, I think, goes a long way.”
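As a rough illustration of the kind of policy Newman describes, here is a minimal sketch that gates a new dependency on a handful of Scorecard checks. It assumes the Scorecard result has already been fetched and parsed into a dictionary shaped like the tool’s JSON output (a `checks` list of `name`/`score` entries scored 0 to 10, with -1 marking an inconclusive check). The function name `evaluate_dependency`, the choice of checks, and the thresholds are all hypothetical; neither Scorecard nor OpenSSF prescribes them.

```python
# Hypothetical organizational policy: minimum acceptable score (0-10) per check.
# Check names mirror those Scorecard reports; the thresholds are illustrative only.
POLICY = {
    "Maintained": 5,
    "Code-Review": 5,
    "Dangerous-Workflow": 10,
    "Signed-Releases": 3,
}


def evaluate_dependency(scorecard: dict, policy: dict[str, int] = POLICY) -> list[str]:
    """Return the policy violations for one dependency's Scorecard result.

    `scorecard` is assumed to look like {"checks": [{"name": ..., "score": ...}]},
    where a score of -1 means the check was inconclusive.
    """
    scores = {check["name"]: check["score"] for check in scorecard.get("checks", [])}
    violations = []
    for name, minimum in policy.items():
        score = scores.get(name)
        if score is None or score < 0:
            # Missing or inconclusive results go to a human instead of passing silently.
            violations.append(f"{name}: no usable result, needs manual review")
        elif score < minimum:
            violations.append(f"{name}: score {score} below required {minimum}")
    return violations


# Example with a made-up Scorecard result for a candidate dependency.
candidate = {"checks": [
    {"name": "Maintained", "score": 10},
    {"name": "Code-Review", "score": 2},
    {"name": "Dangerous-Workflow", "score": 10},
    {"name": "Signed-Releases", "score": -1},
]}
for violation in evaluate_dependency(candidate):
    print(violation)
```

Sending inconclusive checks to manual review rather than treating them as a pass is one conservative design choice; an organization could just as reasonably weight the checks or require only a minimum aggregate score.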

Another technology, sigstore, lets developers and maintainers easily sign their code so that end users can trust its provenance. The project makes digitally signing source code easier because individual developers do not have to manage their own cryptographic infrastructure. Python has a package to help developers generate and verify code signatures using sigstore, and GitHub is also working on a plan to help developers who use npm adopt sigstore.

Add More People and Process, Too

No matter how good the tools are, the bottom line is this: What software repositories really need is more funding and more security professionals on staff.

“You’ll hear suggestions to put automated tools in the pipeline, so that we just have some scanner check all the packages as they’re uploaded for malware,” Newman says. “That sounds like a great idea, but it’s not quite the solution that you’d think because we run into issues with false positives, which then need to be manually reviewed, imposing a huge operational overhead — and so now we’re back at square one.”
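The base-rate arithmetic behind Newman’s point can be sketched in a few lines. Because malicious uploads are a small fraction of the total, even a fairly accurate scanner can generate far more false alarms than real detections. The function `flagged_breakdown` and every input value here are hypothetical, chosen only to illustrate the effect; they are not measured figures for PyPI, npm, or any other repository.

```python
def flagged_breakdown(uploads: int, malicious_rate: float,
                      detection_rate: float,
                      false_positive_rate: float) -> tuple[float, float]:
    """Return the expected (true alerts, false alerts) from scanning uploads.

    All rates are fractions between 0 and 1; the inputs are illustrative
    assumptions, not real repository statistics.
    """
    malicious = uploads * malicious_rate          # packages that really are malware
    benign = uploads - malicious                  # everything else
    true_alerts = malicious * detection_rate      # malware the scanner catches
    false_alerts = benign * false_positive_rate   # benign packages flagged anyway
    return true_alerts, false_alerts


# Hypothetical day: 10,000 uploads, 0.1% of them malicious, and a scanner that
# catches 95% of malware while misflagging 2% of benign packages.
true_alerts, false_alerts = flagged_breakdown(10_000, 0.001, 0.95, 0.02)
share_false = false_alerts / (true_alerts + false_alerts)
print(f"true alerts: {true_alerts:.1f}, false alerts: {false_alerts:.1f}, "
      f"share of alerts that are false: {share_false:.0%}")
```

With these made-up inputs, the scanner raises roughly 200 false alerts for every 9.5 true ones, so about 95% of the queue that human reviewers must work through would be benign packages.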

The focus on securing the software supply chain has led to increased investment by industry in the open source ecosystem. OpenSSF’s Alpha-Omega Project, which aims to secure the most critical projects, now has a security developer-in-residence for the Python Software Foundation. Amazon Web Services has also donated to PyPI to create a Safety & Security Engineer role.

As open source software has become widely recognized as critical infrastructure, government investment has also increased. In March, for example, the Biden-Harris administration announced its National Cybersecurity Strategy, which seeks to hold companies liable for their software products, while earlier White House meetings and guidance aim to increase support for securing open source projects.

More bodies, not necessarily more technology, will solve many of the problems in the short term, says Synopsys’ Mackey.

“One of the things I like about the Python model is that they have that human review cycle in there,” he says. “And that, to a certain extent, is going to limit the scope of damage for some of these things.”