A Theory of Machine Learning

arXiv:2407.05520 - Published 7/9/2024 by Jinsook Kim, Jinho Kang

Overview

  • This paper critically reviews three major theories of machine learning and proposes a new one, according to which a machine learns a function when it successfully computes it.
  • The authors argue that the prevailing "epistemic" approach has fundamental limitations and present their alternative as a "systems-theoretic" view of machine learning.
  • The new theory challenges common assumptions in statistical and computational learning theory, with implications for how we understand the capabilities and limitations of machine learning systems.

Plain English Explanation

The paper critically reviews existing theories of machine learning (three major ones, according to its abstract) and then introduces a new theory. The first theory discussed, the "epistemic" approach, is the dominant view in the field: it sees machine learning as a way to gain knowledge about the world by identifying patterns in data. The second, "credal learning," builds on the epistemic approach but allows for more uncertainty in the models.
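The "credal learning" idea replaces a single probability distribution with a set of distributions; in the Credal Learning Theory entry under Related Papers, these are convex sets of probabilities (credal sets). As a minimal Python sketch, assuming a finite outcome space and a credal set described by finitely many extreme points (an illustrative simplification, not the construction used in either paper), the code below computes the lower and upper expected loss of a fixed decision over the whole set. The function names, loss values, and distributions are hypothetical.

```python
from typing import Sequence, Tuple

Distribution = Sequence[float]


def is_distribution(p: Distribution, tol: float = 1e-9) -> bool:
    """True if p is a valid probability vector over a finite outcome space."""
    return all(x >= -tol for x in p) and abs(sum(p) - 1.0) <= tol


def expected_loss(p: Distribution, losses: Sequence[float]) -> float:
    """Expectation of the per-outcome losses under distribution p."""
    return sum(pi * li for pi, li in zip(p, losses))


def lower_upper_expected_loss(
    extreme_points: Sequence[Distribution], losses: Sequence[float]
) -> Tuple[float, float]:
    """Lower and upper expected loss over the credal set.

    The credal set is taken to be the convex hull of extreme_points
    (a hypothetical, simplified representation). Expected loss is linear
    in p, so its minimum and maximum over that hull are attained at the
    extreme points, and checking them suffices.
    """
    if not extreme_points:
        raise ValueError("credal set needs at least one extreme point")
    for p in extreme_points:
        if len(p) != len(losses) or not is_distribution(p):
            raise ValueError(f"invalid distribution: {p}")
    values = [expected_loss(p, losses) for p in extreme_points]
    return min(values), max(values)


if __name__ == "__main__":
    losses = [0.0, 1.0, 2.0]  # hypothetical loss for each of three outcomes
    credal_set = [(0.6, 0.3, 0.1), (0.4, 0.4, 0.2), (0.5, 0.2, 0.3)]
    print("credal set:", lower_upper_expected_loss(credal_set, losses))
    # A single distribution is a degenerate credal set: lower == upper.
    print("precise:   ", lower_upper_expected_loss([(0.5, 0.3, 0.2)], losses))
```

A single-point credal set recovers the ordinary, precise expected loss; the gap between the lower and upper values is one way to express the extra uncertainty that credal approaches allow for.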

The paper then presents the authors' own "systems-theoretic" account of machine learning. It views machine learning not only as a way to gain knowledge, but as a dynamic system that interacts with and transforms the world. The key insight is that machine learning systems do not just passively observe and model the world; they actively shape it through their interactions.

If correct, this view has significant implications. It suggests that what machine learning can achieve is fundamentally limited by the larger systems it is embedded in. It also raises questions about the true "intelligence" of machine learning, since these systems are so deeply tied to their environment.

Overall, this paper challenges the dominant way of thinking about machine learning and offers a new perspective that has important consequences for how we understand and develop these technologies.

Technical Explanation

The paper begins by critiquing the prevailing "epistemic" approach to machine learning, which sees it as a way to gain knowledge about the world by identifying patterns in data. The authors argue that this view has fundamental limitations, as it fails to account for the dynamic, interactive nature of machine learning systems.

The authors then present their alternative "systems-theoretic" theory of machine learning. This views machine learning not just as a knowledge-gathering process, but as a dynamic system that is embedded in and interacts with the world around it. The key insight is that machine learning systems don't just passively observe and model the world - they actively shape it through their interactions.

To support this theory, the authors draw on concepts from systems theory, control theory, and cognitive science. They present a formal mathematical framework for understanding machine learning as a systems-theoretic process, and use this to derive various implications and predictions about the capabilities and limitations of these technologies.

For example, the systems-theoretic view suggests that the performance of machine learning systems is fundamentally bounded by the properties of the environment they are embedded in. It also raises questions about the true "intelligence" of these systems, since their behavior is so tightly coupled to their surroundings.

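The paper also relates its proposal to other perspectives on machine learning, such as credal learning theory and statistical theories of deep learning, and the authors argue that their framework can help unify and reconcile these views. The abstract states one concrete consequence: the new theory challenges common assumptions in statistical and computational learning theory, because it implies that learning true probabilities is equivalent neither to obtaining a correct calculation of the true probabilities nor to obtaining almost-sure convergence to them.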
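The paper's abstract (reproduced under Related Papers) contrasts learning a true probability with two other notions: obtaining a correct calculation of it, and obtaining almost-sure convergence to it. The Python sketch below does not reproduce the paper's argument; it only makes those two contrasted notions concrete, assuming a hypothetical binary event whose true probability is 1/3 (a value chosen purely for illustration). The hypothetical `exact_calculation` returns the probability exactly, while `empirical_frequency` is a relative-frequency estimate that, by the strong law of large numbers, converges to it almost surely, yet can never equal it exactly when the sample size is not divisible by 3.

```python
import random
from fractions import Fraction

# Hypothetical true probability of a binary event, chosen only for illustration.
TRUE_P = Fraction(1, 3)


def exact_calculation() -> Fraction:
    """Return the true probability exactly (a 'correct calculation')."""
    return TRUE_P


def empirical_frequency(n: int, seed: int = 0) -> Fraction:
    """Relative frequency of the event in n simulated Bernoulli(TRUE_P) trials.

    By the strong law of large numbers this converges to TRUE_P almost surely
    as n grows, but at any finite n it is a fraction k/n, so it cannot equal
    1/3 unless n is divisible by 3.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    rng = random.Random(seed)  # seeded so runs are reproducible
    p = float(TRUE_P)
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return Fraction(hits, n)


if __name__ == "__main__":
    print("exact calculation:", exact_calculation())
    for n in (10, 1_000, 100_000):
        estimate = empirical_frequency(n)
        error = abs(estimate - TRUE_P)
        print(f"n={n:>7}: estimate={float(estimate):.5f}  |error|={float(error):.6f}")
```

Whether either procedure counts as learning the probability is the question the paper's definition is meant to settle; the code takes no position on it.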

Critical Analysis

The paper presents a compelling and well-developed alternative to the dominant epistemic approach to machine learning. The systems-theoretic view offers important insights into the dynamic, interactive nature of these technologies and the ways in which they are fundamentally shaped by their environments.

That said, the paper does not fully address some important limitations and caveats of the proposed theory. For example, it is not always clear how the systems-theoretic framework would apply to specific machine learning architectures and use cases. Beyond brief case studies from natural language processing and macroeconomics, the authors offer little empirical evidence for their claims, relying mainly on theoretical arguments.

Additionally, the implications of the systems-theoretic view for the future development and deployment of machine learning systems are not fully explored. While the paper raises important questions about the "intelligence" and capabilities of these technologies, it does not offer a clear roadmap for how to navigate these issues.

Overall, this paper represents a significant contribution to the ongoing debate about the nature of machine learning and its relationship to the world. However, further research and empirical validation would be needed to fully assess the practical utility and broader applicability of the systems-theoretic approach.

Conclusion

This paper presents a novel "systems-theoretic" theory of machine learning that challenges the prevailing epistemic approach. The key insight is that machine learning systems don't just passively observe and model the world, but actively shape it through their interactions.

The implications of this view are significant, as it suggests that the capabilities of machine learning are fundamentally bounded by the properties of the environments they are embedded in. It also raises important questions about the true "intelligence" of these technologies, since their behavior is so tightly coupled to their surroundings.

While the paper does not fully address all the limitations and practical implications of the systems-theoretic approach, it offers a compelling alternative perspective that is likely to shape ongoing debates and research in the field of machine learning. As the development of these technologies continues to accelerate, a deeper understanding of their dynamic, interactive nature will be crucial for ensuring their safe and responsible deployment.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

A Theory of Machine Learning

Jinsook Kim, Jinho Kang

We critically review three major theories of machine learning and provide a new theory according to which machines learn a function when the machines successfully compute it. We show that this theory challenges common assumptions in the statistical and the computational learning theories, for it implies that learning true probabilities is equivalent neither to obtaining a correct calculation of the true probabilities nor to obtaining an almost-sure convergence to them. We also briefly discuss some case studies from natural language processing and macroeconomics from the perspective of the new theory.

7/9/2024

Can Machines Learn the True Probabilities?

Jinsook Kim

When there exists uncertainty, AI machines are designed to make decisions so as to reach the best expected outcomes. Expectations are based on true facts about the objective environment the machines interact with, and those facts can be encoded into AI models in the form of true objective probability functions. Accordingly, AI models involve probabilistic machine learning in which the probabilities should be objectively interpreted. We prove under some basic assumptions when machines can learn the true objective probabilities, if any, and when machines cannot learn them.

7/9/2024

Credal Learning Theory

Michele Caprio, Maryam Sultana, Eleni Elia, Fabio Cuzzolin

Statistical learning theory is the foundation of machine learning, providing theoretical bounds for the risk of models learnt from a (single) training set, assumed to issue from an unknown probability distribution. In actual deployment, however, the data distribution may (and often does) vary, causing domain adaptation/generalization issues. In this paper we lay the foundations for a `credal' theory of learning, using convex sets of probabilities (credal sets) to model the variability in the data-generating distribution. Such credal sets, we argue, may be inferred from a finite sample of training sets. Bounds are derived for the case of finite hypotheses spaces (both assuming realizability or not) as well as infinite model spaces, which directly generalize classical results.

5/6/2024

Information-Theoretic Foundations for Machine Learning

Hong Jun Jeon, Benjamin Van Roy

The staggering progress of machine learning in the past decade has been a sight to behold. In retrospect, it is both remarkable and unsettling that these milestones were achievable with little to no rigorous theory to guide experimentation. Despite this fact, practitioners have been able to guide their future experimentation via observations from previous large-scale empirical investigations. However, alluding to Plato's Allegory of the cave, it is likely that the observations which form the field's notion of reality are but shadows representing fragments of that reality. In this work, we propose a theoretical framework which attempts to answer what exists outside of the cave. To the theorist, we provide a framework which is mathematically rigorous and leaves open many interesting ideas for future exploration. To the practitioner, we provide a framework whose results are very intuitive, general, and which will help form principles to guide future investigations. Concretely, we provide a theoretical framework rooted in Bayesian statistics and Shannon's information theory which is general enough to unify the analysis of many phenomena in machine learning. Our framework characterizes the performance of an optimal Bayesian learner, which considers the fundamental limits of information. Throughout this work, we derive very general theoretical results and apply them to derive insights specific to settings ranging from data which is independently and identically distributed under an unknown distribution, to data which is sequential, to data which exhibits hierarchical structure amenable to meta-learning. We conclude with a section dedicated to characterizing the performance of misspecified algorithms. These results are exciting and particularly relevant as we strive to overcome increasingly difficult machine learning challenges in this endlessly complex world.

8/21/2024