Machine Learning-Enabled Software and System Architecture Frameworks

arXiv:2308.05239

Published 6/28/2024 by Armin Moin, Atta Badii, Stephan Gunnemann, Moharram Challenger

Abstract

Various architecture frameworks for software, systems, and enterprises have been proposed in the literature. They identified several stakeholders and defined modeling perspectives, architecture viewpoints, and views to frame and address stakeholder concerns. However, the stakeholders with data science and Machine Learning (ML) related concerns, such as data scientists and data engineers, are yet to be included in existing architecture frameworks. Only this way can we envision a holistic system architecture description of an ML-enabled system. Note that the ML component behavior and functionalities are special and should be distinguished from traditional software system behavior and functionalities. The main reason is that the actual functionality should be inferred from data instead of being specified at design time. Additionally, the structural models of ML components, such as ML model architectures, are typically specified using different notations and formalisms from what the Software Engineering (SE) community uses for software structural models. Yet, these two aspects, namely ML and non-ML, are becoming so intertwined that it necessitates an extension of software architecture frameworks and modeling practices toward supporting ML-enabled system architectures. In this paper, we address this gap through an empirical study using an online survey instrument. We surveyed 61 subject matter experts from over 25 organizations in 10 countries.

Overview

The paper addresses the lack of stakeholders and modeling perspectives related to data science and machine learning (ML) in existing software and enterprise architecture frameworks.
It highlights the unique characteristics of ML components, which differ from traditional software systems and require a different approach to architecture modeling.
The authors conducted an empirical study by surveying 61 subject matter experts from over 25 organizations in 10 countries to understand the gaps in current architecture frameworks.

Plain English Explanation

When designing complex software systems or enterprise-level architectures, various frameworks have been developed to help organize the different components and perspectives that need to be considered. These frameworks identify key stakeholders, such as business leaders, IT managers, and software developers, and define the various "views" or models that are needed to address their concerns.

However, the authors of this paper argue that these existing frameworks have not adequately accounted for the stakeholders and needs related to data science and machine learning. Machine learning systems have some unique characteristics that set them apart from traditional software systems. For example, the actual functionality of an ML component is not fully specified at design time, but rather inferred from data. Additionally, the architectural models used to describe ML components, such as neural network architectures, often use different notations and formalisms than those used for typical software components.
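
The difference between behavior written down at design time and behavior inferred from data can be illustrated with a small sketch. This is not from the paper: the temperature rule, the training samples, and the function names are all hypothetical, and the "learning" step is a deliberately simple one-dimensional threshold search rather than any particular ML method.

```python
from typing import Callable, List, Tuple


def specified_at_design_time(temperature_c: float) -> bool:
    """Traditional component: a developer fixes the rule (hypothetical) up front."""
    return temperature_c > 30.0


def fit_threshold(samples: List[Tuple[float, bool]]) -> Callable[[float], bool]:
    """ML-style component: the rule is chosen to fit labeled examples.

    Assumes at least two samples; tries the midpoints between consecutive
    observed values and keeps the one with the fewest training errors.
    """
    values = sorted(x for x, _ in samples)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold: float) -> int:
        return sum((x > threshold) != label for x, label in samples)

    best = min(candidates, key=errors)
    return lambda x: x > best


# Hypothetical labeled data: (temperature in Celsius, "too hot" label).
training_data = [
    (18.0, False), (22.0, False), (24.0, False),
    (27.0, True), (31.0, True), (35.0, True),
]

inferred = fit_threshold(training_data)

# The two components disagree on the same input, because one encodes a
# design-time decision and the other reflects whatever the data said.
print(specified_at_design_time(26.0), inferred(26.0))
```

Changing the training data changes what `inferred` does without anyone editing its code, which is the property that makes ML components hard to describe with design-time behavioral models alone.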

To better address these ML-specific needs, the authors surveyed 61 experts from over 25 organizations in 10 countries. The goal was to understand what gaps exist in current architecture frameworks and how they could be extended to better support the design and implementation of ML-enabled systems.

Technical Explanation

The study used an online survey instrument to gather input from 61 subject matter experts, including data scientists, software architects, and systems engineers, from over 25 organizations across 10 countries. The survey asked participants about their experiences and perspectives on the challenges of deploying machine learning models in complex software and enterprise systems.

The key findings from the survey include:

Existing architecture frameworks do not adequately capture the unique stakeholders, concerns, and modeling perspectives related to data science and machine learning.
ML components have special characteristics, such as data-driven functionality and specialized architectural models, that require a different approach from traditional software systems.
There is a need to extend current software and enterprise architecture practices to better support the integration of machine learning into larger system designs.

The authors propose that a more holistic, ML-aware system architecture description is needed to address these gaps. This would involve incorporating data science and ML stakeholders into the architecture modeling process, as well as developing new viewpoints and views to capture the unique aspects of ML components.
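
One way to picture this proposal is as a check over an architecture description: every stakeholder concern should be framed by at least one viewpoint. The sketch below is only an illustration under stated assumptions. The class names, the stakeholder roles, the concern labels, and the viewpoint names are hypothetical and are not taken from the paper or from any specific framework.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Stakeholder:
    role: str
    concerns: FrozenSet[str]


@dataclass(frozen=True)
class Viewpoint:
    name: str
    framed_concerns: FrozenSet[str]


def unframed_concerns(
    stakeholders: List[Stakeholder], viewpoints: List[Viewpoint]
) -> Dict[str, Set[str]]:
    """Return, per stakeholder role, the concerns that no viewpoint frames."""
    framed: Set[str] = set().union(*(v.framed_concerns for v in viewpoints))
    gaps: Dict[str, Set[str]] = {}
    for s in stakeholders:
        missing = set(s.concerns) - framed
        if missing:
            gaps[s.role] = missing
    return gaps


# Hypothetical stakeholders and concerns, for illustration only.
stakeholders = [
    Stakeholder("software developer", frozenset({"module structure", "deployment"})),
    Stakeholder("data scientist", frozenset({"training data quality", "model architecture"})),
]

# Hypothetical viewpoints of a framework without ML support.
classic_viewpoints = [
    Viewpoint("structural", frozenset({"module structure"})),
    Viewpoint("deployment", frozenset({"deployment"})),
]

# A hypothetical ML-specific viewpoint added as an extension.
ml_viewpoint = Viewpoint(
    "ml model", frozenset({"training data quality", "model architecture"})
)

print(unframed_concerns(stakeholders, classic_viewpoints))
print(unframed_concerns(stakeholders, classic_viewpoints + [ml_viewpoint]))
```

With only the classic viewpoints, the data scientist's concerns show up as gaps; adding the ML viewpoint closes them. Which viewpoints an extended framework should actually define is a question for the paper itself, not something this sketch settles.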

Critical Analysis

The study provides valuable insights into the challenges of incorporating machine learning capabilities into complex software and enterprise systems. The authors make a compelling case for the need to extend existing architecture frameworks to better support ML-enabled systems.

One potential limitation of the research is the relatively small sample size of 61 survey participants, although the authors note that these participants represented a diverse range of organizations and roles. There may be value in conducting additional studies with larger and more diverse samples to further validate the findings.

Additionally, the paper does not provide detailed recommendations or a specific framework for how to extend current architecture practices to address the ML-related gaps. More work may be needed to develop and validate such a framework.

Overall, this research highlights an important gap in the field of systems and software architecture that deserves further exploration. By incorporating the unique needs of data science and machine learning, future architecture frameworks can better support the design and deployment of ML-enabled systems in complex organizational contexts.

Conclusion

This paper identifies a significant gap in existing software and enterprise architecture frameworks: the lack of consideration for stakeholders and modeling perspectives related to data science and machine learning. The authors conducted an empirical study to better understand these gaps and the unique characteristics of ML components that require a different approach from traditional software systems.

The key takeaway is that as machine learning becomes increasingly integrated into complex software and enterprise architectures, there is a pressing need to extend current architecture practices to better support the design and deployment of ML-enabled systems. This involves incorporating data science and ML-specific stakeholders, concerns, and modeling perspectives into the architecture development process. By addressing these gaps, organizations can better leverage the power of machine learning while ensuring its seamless integration into their larger system architectures.

Related Papers

A Framework to Model ML Engineering Processes

Sergio Morales, Robert Clarisó, Jordi Cabot

The development of Machine Learning (ML) based systems is complex and requires multidisciplinary teams with diverse skill sets. This may lead to communication issues or misapplication of best practices. Process models can alleviate these challenges by standardizing task orchestration, providing a common language to facilitate communication, and nurturing a collaborative environment. Unfortunately, current process modeling languages are not suitable for describing the development of such systems. In this paper, we introduce a framework for modeling ML-based software development processes, built around a domain-specific language and derived from an analysis of scientific and gray literature. A supporting toolkit is also available.

4/30/2024

cs.SE cs.AI cs.LG

Naming the Pain in Machine Learning-Enabled Systems Engineering

Marcos Kalinowski, Daniel Mendez, Gorkem Giray, Antonio Pedro Santos Alves, Kelly Azevedo, Tatiana Escovedo, Hugo Villamizar, Helio Lopes, Teresa Baldassarre, Stefan Wagner, Stefan Biffl, Jurgen Musil, Michael Felderer, Niklas Lavesson, Tony Gorschek

Context: Machine learning (ML)-enabled systems are being increasingly adopted by companies aiming to enhance their products and operational processes. Objective: This paper aims to deliver a comprehensive overview of the current status quo of engineering ML-enabled systems and lay the foundation to steer practically relevant and problem-driven academic research. Method: We conducted an international survey to collect insights from practitioners on the current practices and problems in engineering ML-enabled systems. We received 188 complete responses from 25 countries. We conducted quantitative statistical analyses on contemporary practices using bootstrapping with confidence intervals and qualitative analyses on the reported problems using open and axial coding procedures. Results: Our survey results reinforce and extend existing empirical evidence on engineering ML-enabled systems, providing additional insights into typical ML-enabled systems project contexts, the perceived relevance and complexity of ML life cycle phases, and current practices related to problem understanding, model deployment, and model monitoring. Furthermore, the qualitative analysis provides a detailed map of the problems practitioners face within each ML life cycle phase and the problems causing overall project failure. Conclusions: The results contribute to a better understanding of the status quo and problems in practical environments. We advocate for the further adaptation and dissemination of software engineering practices to enhance the engineering of ML-enabled systems.

6/10/2024

cs.SE cs.AI

Beyond development: Challenges in deploying machine learning models for structural engineering applications

Mohsen Zaker Esteghamati, Brennan Bean, Henry V. Burton, M. Z. Naser

Machine learning (ML)-based solutions are rapidly changing the landscape of many fields, including structural engineering. Despite their promising performance, these approaches are usually only demonstrated as proof-of-concept in structural engineering, and are rarely deployed for real-world applications. This paper aims to illustrate the challenges of developing ML models suitable for deployment through two illustrative examples. Among various pitfalls, the presented discussion focuses on model overfitting and underspecification, training data representativeness, variable omission bias, and cross-validation. The results highlight the importance of implementing rigorous model validation techniques through adaptive sampling, careful physics-informed feature selection, and considerations of both model complexity and generalizability.

4/22/2024

cs.LG cs.CE stat.ML

A Systematic Literature Review on the Use of Machine Learning in Software Engineering

Nyaga Fred, I. O. Temkin

Software engineering (SE) is a dynamic field that involves multiple phases all of which are necessary to develop sustainable software systems. Machine learning (ML), a branch of artificial intelligence (AI), has drawn a lot of attention in recent years thanks to its ability to analyze massive volumes of data and extract useful patterns from data. Several studies have focused on examining, categorising, and assessing the application of ML in SE processes. We conducted a literature review on primary studies to address this gap. The study was carried out following the objective and the research questions to explore the current state of the art in applying machine learning techniques in software engineering processes. The review identifies the key areas within software engineering where ML has been applied, including software quality assurance, software maintenance, software comprehension, and software documentation. It also highlights the specific ML techniques that have been leveraged in these domains, such as supervised learning, unsupervised learning, and deep learning. Keywords: machine learning, deep learning, software engineering, natural language processing, source code

6/21/2024

cs.SE cs.LG