Software development has evolved greatly over the last few decades. It is moving towards a model built on third-party modules, components and libraries that speed up the development of our own software by solving frequently needed tasks effectively, so that we do not need to reinvent the wheel.
While it is straightforward to see the advantages of this approach, we need to realise that it comes with a series of risks that must be handled as well. To borrow a better-known pattern from the cloud computing world, there is a shared responsibility model for vulnerabilities and potential attacks, as we can see in its different flavours: IaaS, PaaS or SaaS.
The main issue arises when a module or library that we depend on gets compromised: the vulnerability automatically propagates to our software project. It is fair to note that this propagation does not necessarily mean that we are affected by an attack, but it remains a risk that we need to evaluate, control and mitigate, and doing so requires knowledge inside the organization that uses the affected software components.
Many third-party components are open source, and their maintenance relies on a community that can vary in size. In many cases the weight of the maintenance falls on the shoulders of one or two main contributors who keep the project up to date and make incremental improvements.
This is where burnout kicks in. Maintaining a popular library or module requires a ton of work, from reviewing contributions and handling communication to planning the project roadmap so that it keeps moving in the right direction, yet the returns are often not in sight. When the maintainer sees that the library is widely used but that its maintenance is not proportionally shared by the community, burnout increases. This creates fertile ground for an attacker to step in, offer help and gain the permissions needed to carry out an attack.
The idea behind the investigation that we are presenting today comes from an attack carried out in September 2018 against the event-stream repository. event-stream is a popular library that provides helper functions for working with streams in Node.js applications, with more than 1.9 million weekly downloads on NPM.
Even though the library is popular, its maintenance fell mainly on the repository owner, as you can see in the next figure, which shows the overall contributions to the repository:
To give a brief summary of the attack: the attacker, seeing how little the community contributed to maintaining the repo, offered help and convinced its owner to grant write permissions to both the repo and the module published on NPM (Node Package Manager). After gaining those permissions, the attacker added malicious code and published a new version on NPM, indirectly affecting a significant number of projects that relied on the event-stream library.
The details of the attack have already been covered in other posts, so we point you to one of those here. We cannot encourage you enough to check out that post to see the nitty-gritty details of how the attack was performed and to gain some valuable context.
This attack was highly targeted: it aimed to steal bitcoin wallets from a parent software platform, copay-dash, that had event-stream as a dependency. Even though the attack was targeted, the underlying technique reveals a broader problem: managing software dependencies and their security implications for our software, especially when we rely on open source libraries where responsibility is diffused across the underlying community.
With our investigation we want to dive into this larger-scale dependency issue.
The question that crossed our minds and led to this investigation is: if we select the most depended-upon libraries in NPM, how frequently do we find projects with low maintenance, where the main contributor may be burned out and thus susceptible to an attack like the one launched against event-stream?
To test our hypothesis we needed to follow these steps:
- Find the most depended-upon libraries in NPM.
- Define the characteristics that would indicate low maintenance of a codebase.
- Analyze the results, obtain insights and provide recommendations that improve the current situation.
We focused on the 1000 most depended-upon libraries on the NPM platform. Using a Python script, for each library we scraped characteristics that show the activity and usage level of the module.
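To illustrate the kind of data collection involved, here is a minimal sketch of how one such characteristic, the weekly download count, could be retrieved. It is an illustration rather than the released script: it assumes the public npm downloads endpoint `https://api.npmjs.org/downloads/point/last-week/<package>`, which returns a JSON object with a `downloads` field, it only considers unscoped package names, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public npm downloads API; not taken from the released scripts.
DOWNLOADS_API = "https://api.npmjs.org/downloads/point/last-week/"


def downloads_url(package_name):
    """Build the last-week downloads URL for an unscoped package name."""
    return DOWNLOADS_API + urllib.parse.quote(package_name, safe="")


def parse_weekly_downloads(payload):
    """Extract the download count from the decoded JSON response."""
    return int(payload["downloads"])


def fetch_weekly_downloads(package_name, timeout=10):
    """Query the API over the network and return the weekly download count."""
    url = downloads_url(package_name)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_weekly_downloads(json.load(response))
```

Calling `fetch_weekly_downloads("event-stream")` performs one HTTP request; when looping over 1000 libraries it would be sensible to add pauses and error handling, which are omitted here for brevity.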
We also needed to define a threshold for what we will refer to as a “low maintenance” codebase. To do so, we looked at the following features:
- The repository had 5 or fewer commits in the last year.
- The community had 30 or fewer contributors.
- The participation percentage was low during the last year: we compute this percentage as the commits made by contributors other than the owner of the repo divided by the total number of commits.
The above definition is quite restrictive: even the event-stream library would not be included in the low-maintenance bucket, since it had 16 commits and 34 contributors over the last year, although it is true that a large part of those commits belong to the attack itself.
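The definition above can be sketched in a few lines of Python. This is a hedged illustration, not the released code. It assumes that weekly commit counts are available as two lists, one for all contributors and one for the owner (the shape returned by GitHub's `stats/participation` endpoint), that all three conditions must hold at once, and that "low" participation means at most 10%. That last threshold is hypothetical, since the text does not state a number.

```python
def participation_percentage(all_weekly, owner_weekly):
    """Share (0-100) of commits made by contributors other than the owner.

    Returns None when there were no commits, because the ratio is undefined.
    """
    total = sum(all_weekly)
    if total == 0:
        return None
    return 100.0 * (total - sum(owner_weekly)) / total


def is_low_maintenance(commits_last_year, contributors, participation,
                       max_commits=5, max_contributors=30,
                       max_participation=10.0):
    """Apply the three criteria; max_participation is a hypothetical cutoff."""
    # A repo with no commits has no measurable participation; treat it as low.
    low_participation = (participation is None
                         or participation <= max_participation)
    return (commits_last_year <= max_commits
            and contributors <= max_contributors
            and low_participation)
```

With the figures quoted for event-stream (16 commits, 34 contributors), `is_low_maintenance` returns `False` whatever the participation value, which matches the observation that the definition is restrictive.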
We have released the code on GitHub in the npm-attack-surface-investigation repo. It includes the Python scripts needed to reproduce our analysis, in case it is valuable to someone in the community.
This investigation has been conducted by TEGRA, an R&D cybersecurity center based in Galicia (Spain). It is a joint effort by Telefónica, a leading international telecommunications company, through ElevenPaths, its global cybersecurity unit, and Gradiant, an ICT R&D center based in Galicia. TEGRA also has the support of the Xunta de Galicia.
The results that we have obtained are shocking: 250 (25%) of the 1000 analyzed libraries fall into the low-maintenance bucket according to the definition above. Those 250 modules accumulate almost 700M weekly downloads, so we are looking at libraries that are used frequently and on a worldwide scale.
Out of those 250, 129 libraries (12.9% of our analysis scope) showed no commit activity at all in the last year, accumulating more than 330M weekly downloads.
If we add to those 129 libraries with no activity (for which we cannot compute community participation, since there is none) the libraries that were maintained only by the repository owner, the number of libraries rises to 168, with a total of more than 450M weekly downloads.
This link has the results of the analysis with more information so that you can verify the results of our investigation for yourselves.
After reviewing the results, we think that our hypothesis has been confirmed, and we expect that the attack suffered by event-stream is not a one-of-a-kind event but rather a sign of a trend that will continue to hit the open source community in the coming years.
The use of third-party dependencies in software development has many advantages, but it also carries risks that software developers need to identify and manage, especially at a corporate level, to avoid being surprised by collateral vulnerabilities that their projects inherit from their dependency trees.
Even though open source software is a major trend nowadays, maintaining it is a tedious task, since its returns are neither immediate nor easy to measure in the short term. If we combine that with the fact that these projects are open, in theory, to anyone willing to contribute, we end up with a landscape where responsibility becomes blurred, making the open source community more prone to attacks like the one described in earlier sections.
Even though our analysis has only covered NPM libraries, we think that similar conclusions would likely hold for other programming languages and package managers that rely on third-party modules.
Next we will go through some essential recommendations to mitigate the risks of using third-party software, following the classic cybersecurity paradigm: prevention, detection and response.
- Since the release of version 5.x.x, NPM creates a file named package-lock.json that records the dependency tree of a project at a given moment in time. It is important that we use this file and publish it together with our project, so that other users of our software get the exact same dependency tree when they run “npm install”. That way they won’t be affected by minor releases or patches that could include malicious code if a dependency were hijacked. This allows us to control risk, provided that the dependency tree was sanitized at the moment the file was generated.
- Before we include a new dependency in our code, we need to consider whether it is really needed and, if we conclude that it is, verify that the library we will be using has a strong and active community behind it.
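To review what a lockfile actually pins, a small script can list every package and version it records. The following is a minimal sketch with hypothetical function names. It assumes the two package-lock.json layouts in common use: a nested `dependencies` map (lockfileVersion 1) and a flat `packages` map keyed by install path (lockfileVersion 2 and 3).

```python
import json


def load_lock(path="package-lock.json"):
    """Read and parse a lockfile from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def pinned_dependencies(lock):
    """Return {package name: set of pinned versions} for a parsed lockfile."""
    pinned = {}

    def record(name, meta):
        # Entries without a version (e.g. links) are skipped.
        if "version" in meta:
            pinned.setdefault(name, set()).add(meta["version"])

    if "packages" in lock:
        # Flat layout: keys look like "node_modules/a/node_modules/b".
        for install_path, meta in lock["packages"].items():
            if install_path == "":  # the root project itself
                continue
            record(install_path.split("node_modules/")[-1], meta)
    else:
        # Nested layout: walk the tree of "dependencies" maps.
        stack = [lock.get("dependencies", {})]
        while stack:
            for name, meta in stack.pop().items():
                record(name, meta)
                stack.append(meta.get("dependencies", {}))
    return pinned
```

A package that appears with more than one version in the result is installed at several places in the tree, which is worth knowing when auditing. For reproducible installs, the `npm ci` command installs strictly from the lockfile and fails if it is out of sync with package.json.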
This area has a lot of room for growth, and there are initiatives in the software world that are worth exploring and integrating into our development cycle. The first step is to list the dependencies of our software so that we can manage them; some open source projects try to help in that area by automating dependency extraction from our codebase.
We are going to focus on two examples showcased by BBVA Labs at the XII STIC conference of the CCN-CERT in Madrid this December:
- Patton: a project that uses fuzzy matching to find public vulnerabilities in our codebase or dependency tree.
- Deeptracy: a project that automates dependency extraction for multiple programming languages.
- We need to make sure that we keep our software dependencies up to date. In many cases moving to the latest version of a dependency does not require any source code change in our software, so having a backlog task to review and upgrade our dependencies is a must-have in mature software environments.
- Even though anyone who has worked in software development knows the complexity of the task, it is important to note that an open source community implies a bidirectional flow: if our software, critical or not, relies on other pieces of open source software, we must try to contribute to the community behind them and keep it alive and active.
Open source communities are not a panacea, and we must not view them from a purely consumer perspective. Participating actively in the communities that we rely on for our own software development is the most direct way to reduce maintainer burnout, manage the overall health of our software products and reduce the potential attack surface.
The TEGRA cybersecurity center started within the framework of the mixed research unit IRMAS (Information Rights Management Advanced Systems), which is co-funded by the European Union, within the framework of the Operational Program ERDF Galicia 2014-2020, to promote technological development, innovation and quality research.