Open Source LLM Projects Likely Insecure, Risky to Use

There is a lot of interest in integrating generative AI and other artificial intelligence applications into existing software products and platforms. However, these AI projects are new and immature from a security standpoint, exposing organizations that use them to a range of security risks, according to a recent analysis by software supply chain security company Rezilion.

Since ChatGPT’s debut earlier this year, more than 30,000 open source projects on GitHub now use GPT-3.5, which highlights a serious software supply chain concern: how secure are the projects that organizations are integrating left and right?
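For readers who want to gauge the scale themselves, a rough way to reproduce this kind of count is GitHub's repository search API, sorted by stars (the same popularity measure Rezilion used). This is a minimal sketch; the exact search query Rezilion ran is not specified in the report, so the query string below is illustrative.

```python
import requests

# Illustrative search; the exact query Rezilion used is not specified.
resp = requests.get(
    "https://api.github.com/search/repositories",
    params={
        "q": "gpt-3.5 in:name,description,readme",
        "sort": "stars",
        "order": "desc",
        "per_page": 10,
    },
    headers={"Accept": "application/vnd.github+json"},
    timeout=30,
)
resp.raise_for_status()
data = resp.json()

# total_count reflects all matches, even though the API only pages
# through the first 1,000 results.
print(f"Matching repositories: {data['total_count']}")
for repo in data["items"]:
    print(f"{repo['full_name']}: {repo['stargazers_count']} stars")
```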

Rezilion’s team of researchers attempted to answer that question by analyzing the 50 most popular Large Language Model (LLM)-based projects on GitHub, where popularity was measured by the number of stars a project has. Each project’s security posture was measured by its OpenSSF Scorecard score. The Scorecard tool from the Open Source Security Foundation assesses a project’s repository on factors such as the number of known vulnerabilities, how frequently the code is maintained, which dependencies it relies on, and the presence of binary files, then combines those checks into an overall score. The higher the number, the more secure the code.
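Scorecard results for many public repositories are also published through the OpenSSF's REST API, so a project's score can be checked before adoption. Below is a minimal sketch, assuming the api.securityscorecards.dev endpoint and a response exposing top-level `score` and `checks` fields; the repository path used here is Auto-GPT's at the time of the report (the repo has since been renamed).

```python
import requests

def fetch_scorecard(owner_repo: str) -> dict:
    """Fetch the latest published OpenSSF Scorecard result for a GitHub repo."""
    url = f"https://api.securityscorecards.dev/projects/github.com/{owner_repo}"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()

# Auto-GPT's repository path at the time of Rezilion's report; swap in
# any owner/repo of interest.
result = fetch_scorecard("Significant-Gravitas/Auto-GPT")
print(f"Aggregate score: {result['score']} / 10")
for check in result["checks"]:
    print(f"  {check['name']}: {check['score']} - {check['reason']}")
```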

The researchers plotted each project’s popularity (bubble size and y-axis) against its security posture (x-axis). None of the projects analyzed scored higher than 6.1, indicating a high level of security risk across the board, Rezilion said. The average score was 4.6 out of 10, suggesting the projects were riddled with issues. In fact, the most popular project, Auto-GPT (nearly 140,000 stars), is less than three months old and has the third-lowest score, 3.7, making it an extremely risky project from a security perspective.
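To see the relationship the researchers describe, a bubble chart along the same lines can be sketched with matplotlib. Only the Auto-GPT point (score 3.7, roughly 140,000 stars) comes from the article; the other entries are hypothetical placeholders standing in for the rest of the dataset.

```python
import matplotlib.pyplot as plt

# Only the Auto-GPT point is from the article; the others are hypothetical
# placeholders used to illustrate the chart's layout.
projects = [
    ("Auto-GPT", 3.7, 140_000),
    ("example-llm-app", 4.6, 30_000),   # hypothetical
    ("example-llm-tool", 6.1, 15_000),  # hypothetical
]

fig, ax = plt.subplots()
for name, score, stars in projects:
    # Popularity drives both the y-axis and the bubble size, as in
    # Rezilion's chart; security posture is the x-axis.
    ax.scatter(score, stars, s=stars / 100, alpha=0.4)
    ax.annotate(name, (score, stars))

ax.set_xlabel("OpenSSF Scorecard score (security posture)")
ax.set_ylabel("GitHub stars (popularity)")
ax.set_title("LLM project popularity vs. security posture")
plt.show()
```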

When organizations are considering which open source projects to integrate into their codebase or build on, they weigh factors such as whether the project is stable, currently supported and actively maintained, and how many people are actively contributing to it. Beyond those, there are several categories of risk to consider, including trust boundary risks, data management risks, and inherent model risks.

“When a project is new, there are more risks around the stability of the project, and it’s too soon to tell whether the project will keep evolving and remain maintained,” the researchers wrote in their analysis. “Most projects experience strong growth in their early years before hitting a peak in community activity as the project reaches full maturity, then the level of engagement tends to stabilize and remain consistent.”

Project age also mattered, Rezilion researchers said, noting that most of the projects in the analysis were between two and six months old. When the researchers cross-referenced age with Scorecard score, the most common combination was a two-month-old project with a score of 4.5 to 5.

“Newly-established LLM projects achieve rapid success and witness exponential growth in terms of popularity,” the researchers said. “However, their Scorecard scores remain relatively low.”

Development and security teams need to understand the risks associated with adopting any new technology, and should make a practice of evaluating it before use.