A weakness in Node Package Manager (npm) could allow anybody to hide malicious dependencies and scripts within their packages, a former GitHub employee claims.
Npm is owned by GitHub and is used for JavaScript code sharing, serving more than 17 million developers. It’s the world’s largest software registry, containing more than 2 million packages, according to npm’s website.
In a blog post on June 27, Darcy Clarke, former staff engineering manager for npm’s command line interface team, detailed a weakness in the site that he named “manifest confusion.”
The “confusion” arises from the fact that npm doesn’t validate the metadata associated with any given package, theoretically enabling any publisher to hide certain information about a package, including the scripts it runs and the dependencies on which it relies.
What Is Manifest Confusion?
Npm — and other repositories like it — has been under pressure in recent months, with more and more hackers devising novel methods to poison packages and spread their malware through the code supply chain.
Not all credit goes to the hackers, though — some npm security risks arise from the way the platform itself works, as with its lagging efforts against typosquatting, and, now, manifest confusion, Clarke said.
The problem is that npm doesn’t automatically cross-reference a package’s “manifest” (what users first see when they visit a package on the site) against its package.json, the standard file describing its contents. Both the manifest and package.json contain metadata including, crucially, which scripts the package runs and which dependencies it relies on.
It stands to reason, then, that they should agree with one another, but a publisher can simply manipulate a package’s manifest, and npm won’t notice. As a case in point, Clarke created an npm package with a manifest stripped of any evidence of the dependencies in its package.json.
Theoretically, hackers could do the same with their own packages, in order to conceal the existence of malware from unwitting software developers.
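The cross-check that npm does not perform is simple to picture. The following is a minimal sketch, not npm's or Clarke's tooling: it assumes the registry manifest and the package.json from the package contents have already been loaded as dictionaries. The function name and the choice of fields to compare are illustrative assumptions; the field names themselves are standard package.json keys.

```python
import json

# Fields where a mismatch matters most for a security review. The names are
# standard package.json keys; which ones to compare is an assumption.
FIELDS_TO_COMPARE = (
    "name",
    "version",
    "scripts",
    "dependencies",
    "optionalDependencies",
)


def find_manifest_mismatches(manifest: dict, package_json: dict) -> dict:
    """Return {field: (manifest_value, package_json_value)} for differing fields."""
    mismatches = {}
    for field in FIELDS_TO_COMPARE:
        in_manifest = manifest.get(field)
        in_contents = package_json.get(field)
        # Treat a missing field and an empty value as equivalent.
        if (in_manifest or None) != (in_contents or None):
            mismatches[field] = (in_manifest, in_contents)
    return mismatches


if __name__ == "__main__":
    # Hypothetical package: the manifest omits what package.json declares.
    manifest = {"name": "demo", "version": "1.0.0"}
    package_json = {
        "name": "demo",
        "version": "1.0.0",
        "dependencies": {"hidden-dep": "^1.0.0"},
        "scripts": {"install": "node setup.js"},
    }
    print(json.dumps(find_manifest_mismatches(manifest, package_json), indent=2))
```

Any non-empty result means the two sources of metadata disagree, which is the condition Clarke describes.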
Why npm Doesn’t Validate
Clarke posited an historical precedent for manifest confusion. “Before the node ecosystem became what it is today,” he wrote, “the number of people contributing to the corpus of software you trusted to use and download was very small. With a smaller community, you have more trust, and even as the npm registry was being developed, most aspects were open source and freely available to be contributed to and code inspected.”
Even today, “various references in docs.npmjs.com [point] to the fact that the registry stores the contents of package.json as the metadata — and nowhere does it mention that the client is responsible for ensuring consistency,” Clarke explained.
Exactly why the system works this way isn’t clear. At the time of publication, GitHub had not responded to Dark Reading’s request for comment.
“[Who knows] the real reasons they chose to do validation client-side versus server-side?” says Mike Parkin, senior technical engineer at Vulcan Cyber. “It seems they chose to lean toward easier organic growth versus a path that would have given more security but would have impacted ease of use.”
No Signs Manifest Confusion Will Be Fixed Soon
According to Clarke, GitHub has been aware of the manifest confusion weakness since at least Nov. 4 last year, and Clarke himself submitted a report on it March 9. However, “GitHub closed that ticket and said they were dealing with the issue ‘internally’ on March 21st,” he lamented in his posting. “To my knowledge, they have not made any significant headway, nor have they made this issue public.”
But “GitHub is understandably in a tough spot,” he acknowledged. “The fact that npmjs.com has functioned this way for over a decade means that the current state is pretty much codified.”
Clarke, it may be noted, is the founder of an alternative website for JavaScript packages, vlt.
With no solution in sight, it will be up to developers to make sure they’re not downloading anything they don’t realize is in a package. Clarke recommended that developers rely only on the metadata indicated by the package’s contents, not its manifest, to account for any potential discrepancies.
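One way to follow that advice is to read package.json straight from the package archive rather than trusting the registry listing. The sketch below assumes the gzipped tarball has already been downloaded into memory and that package.json sits one directory deep, as it does under the `package/` prefix npm tarballs conventionally use. The function name is hypothetical.

```python
import io
import json
import tarfile


def read_package_json(tarball_bytes: bytes) -> dict:
    """Return the parsed package.json found one directory deep in a .tgz archive."""
    with tarfile.open(fileobj=io.BytesIO(tarball_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            parts = member.name.split("/")
            # Match e.g. "package/package.json", but not nested copies.
            if member.isfile() and len(parts) == 2 and parts[1] == "package.json":
                extracted = archive.extractfile(member)
                return json.loads(extracted.read().decode("utf-8"))
    raise FileNotFoundError("no top-level package.json in tarball")
```

The file is read in memory and nothing is extracted to disk, so inspecting an untrusted archive this way does not run or unpack any of its contents.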
To Parkin, taking care around third-party code needs to be mandatory practice for any developer and organization. “That’s especially true with obscure libraries, or older ones that haven’t seen a lot of active development and may have been orphaned, or libraries that were recently added,” he says. “While none of those factors are immediate red flags that would preclude using them in a project, they are factors that make vetting the sources even more important.”
It doesn’t have to be difficult, though — plenty of vendors have been working on this problem for a while.
“There are automated tools that can scan code for unusual features and obvious exploits, and those tools need to be part of the developer’s toolkit,” Parkin says, pointing to OWASP’s list of source code analysis tools.
“Validating your packages should not be an optional step,” he concludes. “Instead, it should be built into any coding project that relies on third party libraries. Full stop.”
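As a small illustration of the kind of check that can be built into a project, the sketch below lists the lifecycle scripts that npm runs automatically during installation (preinstall, install, and postinstall) from a parsed package.json. It is a crude heuristic with a hypothetical function name, not a substitute for the analysis tools Parkin refers to.

```python
# Lifecycle scripts that npm runs automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_time_scripts(package_json: dict) -> dict:
    """Return the install-time scripts a package declares, keyed by hook name."""
    scripts = package_json.get("scripts") or {}
    return {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}
```

An install hook is not malicious in itself, but any that appear in a package's contents and not in its registry listing deserve a closer look.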