Individual context-free online community health indicators fail to identify open source software sustainability

Read original: arXiv:2309.12120 - Published 5/10/2024 by Yo Yehudi, Carole Goble, Caroline Jay


Overview

This study monitored thirty-eight open-source software projects over the course of a year to examine indicators of project health and longevity.
The researchers combined participant surveys with automated analysis of online source control data to assess project performance.
The key finding is that these health indicators cannot be used to make cross-project comparisons, as the context of each project varies significantly.
However, the indicators can be useful in tracking changes within a single project's health over time.

Plain English Explanation

When software is developed, it's not uncommon for it to be abandoned or shut down at some point. This can happen for various reasons, like the original developer moving on or the project losing funding. While research on the longevity of academic open-source software is limited, there's no reason to assume it's any different from other software.

Some open-source projects are able to weather these challenges and remain active and maintained, even in the face of adversity. This study looked at a number of open-source projects over the course of a year, using both subjective surveys and automated analysis of their online code repositories to measure common performance indicators.

The key finding is that these health indicators can't really be used to compare different projects to each other, because the context and circumstances of each project are too different. However, the indicators can be quite useful for tracking changes in a single project's health over time.

Technical Explanation

This study monitored the health and longevity of open-source software projects over a one-year period. The researchers used a combination of subjective measures, like participant surveys, and objective, automated analysis of the projects' online source control repositories to assess common performance indicators.
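As a rough illustration of what an automated indicator might look like, the sketch below computes a mean issue response time, one of the indicators the paper's abstract mentions. The records, field layout, and function name are hypothetical and are not taken from the researchers' scripts.

```python
from datetime import datetime
from statistics import mean

# Hypothetical issue records: (opened_at, first_response_at) as ISO 8601 strings.
# A first_response_at of None means the issue has not been answered yet.
ISSUES = [
    ("2023-01-02T09:00:00", "2023-01-02T15:00:00"),
    ("2023-01-05T10:00:00", "2023-01-06T10:00:00"),
    ("2023-01-09T08:00:00", None),
]


def mean_response_hours(issues):
    """Mean hours from issue creation to first response.

    Unanswered issues are skipped; returns None if nothing was answered.
    """
    delays = []
    for opened, responded in issues:
        if responded is None:
            continue
        delta = datetime.fromisoformat(responded) - datetime.fromisoformat(opened)
        delays.append(delta.total_seconds() / 3600)
    return mean(delays) if delays else None


print(mean_response_hours(ISSUES))
```

Even this small example involves a judgment call (unanswered issues are ignored), which hints at why the same number can mean different things in different projects.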

A central question was whether these health indicators could serve as cross-project benchmarks for comparing different open-source initiatives. The researchers found that they could not: results were highly heterogeneous, and similar indicator values often meant very different things once each project's context was taken into account.

While the indicators can't be used for cross-project comparisons, they can still be useful in signifying changes in a single project's health over time. By tracking the indicators for a specific project, the researchers could identify trends and potential issues, as long as they didn't try to draw comparisons to unrelated projects.
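A minimal sketch of single-project tracking, assuming a hypothetical monthly commit count for one project: each month is compared only against that project's own median, never against another project. The data, the function name, and the 50% threshold are illustrative assumptions, not values from the paper.

```python
from statistics import median

# Hypothetical monthly commit counts for ONE project.
MONTHLY_COMMITS = [
    ("2023-01", 40),
    ("2023-02", 42),
    ("2023-03", 38),
    ("2023-04", 12),
    ("2023-05", 41),
]


def flag_changes(series, threshold=0.5):
    """Return the period labels whose value deviates from the project's own
    median by more than `threshold`, expressed as a fraction of that median."""
    baseline = median(value for _, value in series)
    if baseline == 0:
        return []  # no meaningful relative deviation against a zero baseline
    return [
        label
        for label, value in series
        if abs(value - baseline) / baseline > threshold
    ]


print(flag_changes(MONTHLY_COMMITS))
```

A flagged month is a prompt to look at context (a release freeze, a holiday, a maintainer leaving), not a verdict on the project's health.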

Critical Analysis

The researchers acknowledge the limitations of their approach, noting that the significant contextual differences between open-source projects make it difficult to establish universal health benchmarks. This is a reasonable caveat, as the factors that contribute to a project's longevity and maintenance can vary widely depending on the project's goals, funding sources, user base, and other unique circumstances.

One area that could have been explored further is the role of funding models and sustainability strategies in open-source project longevity. The researchers mention that grant funding cessation can contribute to project abandonment, but they don't dive deeper into the various funding approaches that open-source projects may employ and how those impact long-term viability.

Additionally, the researchers could have considered the impact of engineering formality and software risk on project health, as these factors may also play a role in determining a project's ability to withstand challenges and maintain an active community.

Overall, the study provides a valuable starting point for understanding the complexities of open-source software sustainability, but there is still much room for further exploration and research in this important area.

Conclusion

This study examined the longevity and health indicators of open-source software projects, finding that while these indicators can be useful for tracking changes within a single project, they cannot be reliably used to make cross-project comparisons. The significant variation in context and circumstances for each open-source initiative makes it difficult to establish universal benchmarks for success or failure.

The researchers acknowledge the limitations of their approach. Areas that remain open for further investigation include the role of funding models and engineering practices in project sustainability. Overall, this study highlights the need for a more nuanced understanding of the factors that contribute to the long-term viability of open-source software, an increasingly important component of the technology landscape.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Individual context-free online community health indicators fail to identify open source software sustainability

Yo Yehudi, Carole Goble, Caroline Jay

The global value of open source software is estimated to be in the billions or trillions worldwide [1], but despite this, it is often under-resourced and subject to high-impact security vulnerabilities and stability failures [2,3]. In order to investigate factors contributing to open source community longevity, we monitored thirty-eight open source projects over the period of a year, focusing primarily, but not exclusively, on open science-related online code-oriented communities. We measured performance indicators, using both subjective and qualitative measures (participant surveys), as well as using computational scripts to retrieve and analyse indicators associated with these projects' online source control codebases. None of the projects were abandoned during this period, and only one project entered a planned shutdown. Project ages spanned from under one year to over forty years old at the start of the study, and results were highly heterogeneous, showing little commonality across documentation, mean response times for issues and code contributions, and available funding/staffing resources. Whilst source code-based indicators were able to offer some insights into project activity, we observed that similar indicators across different projects often had very different meanings when context was taken into account. We conclude that the individual context-free metrics we studied were not sufficient or essential for project longevity and sustainability, and might even become detrimental if used to support high-stakes decision making. When attempting to understand an online open community's longer-term sustainability, we recommend that researchers avoid cross-project quantitative comparisons, and advise instead that they use single-project-level assessments which combine quantitative measures with contextualising qualitative data.

5/10/2024

On Software Ageing Indicators in OpenStack

Yevhen Yazvinskyi, Jasmin Bogatinovski, Jorge Cardoso, Odej Kao

Distributed systems in general and cloud systems in particular, are susceptible to failures that can lead to substantial economic and data losses, security breaches, and even potential threats to human safety. Software ageing is an example of one such vulnerability. It emerges due to routine re-usage of computational systems units which induce fatigue within the components, resulting in an increased failure rate and potential system breakdown. Due to its stochastic nature, ageing cannot be directly measured, instead ageing indicators as proxies are used. While there are dozens of studies on different ageing indicators, their comprehensive comparison in different settings remains underexplored. In this paper, we compare two ageing indicators in OpenStack as a use case. Specifically, our evaluation compares memory usage (including swap memory) and request response time, as readily available indicators. By executing multiple OpenStack deployments with varying configurations, we conduct a series of experiments and analyze the ageing indicators. Comparative analysis through statistical tests provides valuable insights into the strengths and weaknesses of the utilised ageing indicators. Finally, through an in-depth analysis of other OpenStack failures, we identify underlying failure patterns and their impact on the studied ageing indicators.

4/26/2024

Predicting Software Reliability in Softwarized Networks

Hasan Yagiz Ozkan, Madeleine Kaufmann, Wolfgang Kellerer, Carmen Mas-Machuca

Providing high quality software and evaluating the software reliability in softwarized networks are crucial for vendors and customers. These networks rely on open source code, which is prone to containing a high number of bugs. Both knowledge of the code of previous releases and the bug history of the particular project can be used to evaluate the software reliability of a new software release based on software reliability growth models (SRGM). In this work, a framework to predict the number of bugs in a new release, as well as other reliability parameters, is proposed. An exemplary application of this framework to two particular open source projects is described in detail. The difference between the prediction accuracy of the two projects is presented. Different alternatives to increase the prediction accuracy are proposed and compared in this paper.

8/1/2024

Biomedical Open Source Software: Crucial Packages and Hidden Heroes

Andrew Nesbitt, Boris Veytsman, Daniel Mietchen, Eva Maxfield Brown, James Howison, João Felipe Pimentel, Laurent Hébert-Dufresne, Stephan Druskat

Despite the importance of scientific software for research, it is often not formally recognized and rewarded. This is especially true for foundation libraries, which are used by the software packages visible to the users, being "hidden" themselves. The funders and other organizations need to understand the complex network of computer programs that the modern research relies upon. In this work we used CZ Software Mentions Dataset to map the dependencies of the software used in biomedical papers and find the packages critical to the software ecosystems. We propose the centrality metrics for the network of software dependencies, analyze three ecosystems (PyPi, CRAN, Bioconductor) and determine the packages with the highest centrality.

4/11/2024