Alice's Adventures in a Differentiable Wonderland -- Volume I, A Tour of the Land

2404.17625

Published 4/30/2024 by Simone Scardapane


Abstract

This book is a self-contained introduction to the design of modern (deep) neural networks. Because the term neural comes with a lot of historical baggage, I prefer the simpler term differentiable models in the text. The focus of this 250-page volume is on building efficient blocks for processing $n$D data, including convolutions, transformers, graph layers, and modern recurrent models (including linearized transformers and structured state-space models). Because the field is evolving quickly, I have tried to strike a good balance between theory and code, historical considerations and recent trends. I assume the reader has some exposure to machine learning and linear algebra, but I try to cover the preliminaries when necessary. The volume is a refined draft from a set of lecture notes for a course called Neural Networks for Data Science Applications that I teach at Sapienza. I do not cover many advanced topics (generative modeling, explainability, prompting, agents), which will be published over time on the companion website.


Overview

This book is a self-contained introduction to the design of modern (deep) neural networks, also referred to as "differentiable models" to avoid historical baggage.
The focus is on efficient building blocks for processing n-dimensional data, including convolutions, transformers, graph layers, and modern recurrent models.
The author aims to strike a balance between theory and code, historical considerations and recent trends, assuming the reader has some exposure to machine learning and linear algebra.
The book is a refined draft from lecture notes for a course on Neural Networks for Data Science Applications, and does not cover advanced topics like generative modeling, explainability, prompting, and agents, which will be published separately.

Plain English Explanation

This book is a comprehensive guide to the design of modern neural networks, which the author prefers to call "differentiable models" to avoid the historical baggage associated with the term "neural". The focus is on creating efficient building blocks for processing multi-dimensional data, such as convolutions, transformers, graph layers, and advanced recurrent models.

The author has tried to strike a balance between theory and practical implementation, as well as between historical context and the latest developments in the field. The book assumes the reader has some familiarity with machine learning and linear algebra, but covers the necessary preliminaries when needed.

This book is based on lecture notes for a course on Neural Networks for Data Science Applications, and does not delve into more advanced topics like generative modeling, explainability, prompting, and agents, which will be covered on a companion website.

Technical Explanation

The book is a comprehensive introduction to the design and implementation of modern neural networks, referred to as "differentiable models" to avoid the historical baggage associated with the term "neural". The author focuses on efficient building blocks for processing n-dimensional data, including convolutions, transformers, graph layers, and modern recurrent models (among them linearized transformers and structured state-space models).
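To make the term "differentiable model" concrete, here is a minimal sketch that is not taken from the book: a 1D convolution written in plain Python, a toy squared-output objective, and the gradient of that objective with respect to the kernel. The function names, the objective, and the sample values are all illustrative assumptions. The point is only that a building block such as a convolution is an ordinary function whose parameters can be adjusted by following a gradient.

```python
def conv1d(x, w):
    """Valid 1D cross-correlation: y[i] = sum_k w[k] * x[i + k]."""
    n_out = len(x) - len(w) + 1
    return [sum(w[k] * x[i + k] for k in range(len(w))) for i in range(n_out)]


def loss(x, w):
    """Toy scalar objective (an assumption): half the sum of squared outputs."""
    return 0.5 * sum(y * y for y in conv1d(x, w))


def grad_w(x, w):
    """Analytic gradient of the loss w.r.t. the kernel:
    dL/dw[k] = sum_i y[i] * x[i + k]."""
    y = conv1d(x, w)
    return [sum(y[i] * x[i + k] for i in range(len(y))) for k in range(len(w))]


def numeric_grad_w(x, w, eps=1e-6):
    """Central finite differences, used only to sanity-check grad_w."""
    grads = []
    for k in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[k] += eps
        w_minus[k] -= eps
        grads.append((loss(x, w_plus) - loss(x, w_minus)) / (2 * eps))
    return grads


# Hypothetical input signal and kernel, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0]
w = [0.5, -1.0]

analytic = grad_w(x, w)
numeric = numeric_grad_w(x, w)

# One gradient-descent step on the kernel with an arbitrary step size.
step = 0.01
w_new = [wk - step * gk for wk, gk in zip(w, analytic)]
```

In practice a framework with automatic differentiation would compute the gradient rather than a hand-derived formula; the hand-written version is used here only to keep the sketch dependency-free.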

The book aims to strike a balance between theory and practical implementation, as well as between historical context and the latest developments in the field. The author assumes the reader has some familiarity with machine learning and linear algebra, but covers the necessary preliminaries when needed.

The content is based on refined lecture notes from a course called "Neural Networks for Data Science Applications" taught by the author at Sapienza University. The book does not cover more advanced topics like generative modeling, explainability, prompting, and agents, which will be published over time on a companion website.

Critical Analysis

The author's decision to avoid the term "neural" in favor of "differentiable models" is an interesting approach that may help readers approach the subject with a fresh perspective, unencumbered by the historical baggage associated with the field of neural networks.

The focus on efficient building blocks for processing n-dimensional data is a practical and relevant approach, as many real-world applications involve complex, high-dimensional data. The inclusion of transformers, graph layers, and modern recurrent models means the book spans a broad range of current techniques in neural network design.

One potential limitation of the book is its scope, as the author has chosen to exclude advanced topics like generative modeling, explainability, prompting, and agents. While this decision may have been made to maintain a focused and manageable volume, it could leave some readers wanting more in-depth coverage of these important areas of research and development.

Overall, this book appears to be a well-designed and comprehensive introduction to the modern design of neural networks, with a balanced approach between theory and practice. The author's expertise and the refinement of the content from a university course suggest the book will be a valuable resource for students, researchers, and practitioners in the field of machine learning and data science.

Conclusion

This book offers a self-contained and up-to-date introduction to the design of modern neural networks, or "differentiable models" as the author prefers to call them. By focusing on the construction of efficient building blocks for processing n-dimensional data, the book provides a practical and relevant approach to neural network design, covering a range of cutting-edge techniques like convolutions, transformers, graph layers, and modern recurrent models.

While the book does not delve into more advanced topics like generative modeling, explainability, prompting, and agents, it aims to strike a balance between theory and code, as well as historical context and recent trends. The author's expertise and the refinement of the content from a university course suggest this book will be a valuable resource for students, researchers, and practitioners in the field of machine learning and data science.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

A singular Riemannian Geometry Approach to Deep Neural Networks III. Piecewise Differentiable Layers and Random Walks on $n$-dimensional Classes

Alessandro Benfenati, Alessio Marta

Neural networks are playing a crucial role in everyday life, with the most modern generative models able to achieve impressive results. Nonetheless, their functioning is still not very clear, and several strategies have been adopted to study how and why these models reach their outputs. A common approach is to consider the data in a Euclidean setting; recent years have instead witnessed a shift from this paradigm toward a more general framework, namely Riemannian geometry. Two recent works introduced a geometric framework to study neural networks making use of singular Riemannian metrics. In this paper we extend these results to convolutional, residual and recursive neural networks, studying also the case of non-differentiable activation functions, such as ReLU. We illustrate our findings with some numerical experiments on classification of images and thermodynamic problems.

4/10/2024

cs.LG


Reasoning About Action and Change

Florence Dupin de Saint-Cyr (IRIT-ADRIA, UT3), Andreas Herzig (IRIT-LILaC, CNRS), Jérôme Lang (LAMSADE, PSL, IRIT-ADRIA), Pierre Marquis (CRIL)

The purpose of this book is to provide an overview of AI research, ranging from basic work to interfaces and applications, with as much emphasis on results as on current issues. It is aimed at an audience of master students and Ph.D. students, and can be of interest as well for researchers and engineers who want to know more about AI. The book is split into three volumes.

6/28/2024

cs.AI cs.DM cs.LO cs.SC


A differentiable programming framework for spin models

Tiago de Souza Farias, Vitor Vaz Schultz, José Carlos Merino Mombach, Jonas Maziero

We introduce a novel framework for simulating spin models using differentiable programming, an approach that leverages the advancements in machine learning and computational efficiency. We focus on three distinct spin systems: the Ising model, the Potts model, and the Cellular Potts model, demonstrating the practicality and scalability of our framework in modeling these complex systems. Additionally, this framework allows for the optimization of spin models, which can adjust the parameters of a system by a defined objective function. In order to simulate these models, we adapt the Metropolis-Hastings algorithm to a differentiable programming paradigm, employing batched tensors for simulating spin lattices. This adaptation not only facilitates the integration with existing deep learning tools but also significantly enhances computational speed through parallel processing capabilities, as it can be implemented on different hardware architectures, including GPUs and TPUs.

5/24/2024

cs.LG


Neural Fluidic System Design and Control with Differentiable Simulation

Yifei Li, Yuchen Sun, Pingchuan Ma, Eftychios Sifakis, Tao Du, Bo Zhu, Wojciech Matusik

We present a novel framework to explore neural control and design of complex fluidic systems with dynamic solid boundaries. Our system features a fast differentiable Navier-Stokes solver with solid-fluid interface handling, a low-dimensional differentiable parametric geometry representation, a control-shape co-design algorithm, and gym-like simulation environments to facilitate various fluidic control design applications. Additionally, we present a benchmark of design, control, and learning tasks on high-fidelity, high-resolution dynamic fluid environments that pose challenges for existing differentiable fluid simulators. These tasks include designing the control of artificial hearts, identifying robotic end-effector shapes, and controlling a fluid gate. By seamlessly incorporating our differentiable fluid simulator into a learning framework, we demonstrate successful design, control, and learning results that surpass gradient-free solutions in these benchmark tasks.

5/27/2024

cs.AI cs.GR