Invariant Discovery of Features Across Multiple Length Scales: Applications in Microscopy and Autonomous Materials Characterization

Read original: arXiv:2408.00229 - Published 8/2/2024 by Aditya Raghavan, Utkarsh Pratiush, Mani Valleti, Richard Liu, Reece Emery, Hiroshi Funakubo, Yongtao Liu, Philip Rack, Sergei Kalinin


Overview

Physical imaging is a fundamental characterization method used across many scientific disciplines.
Images can provide crucial data about atomic structures, material microstructures, and dynamic phenomena.
Extracting and interpreting this information is challenging.
Variational Autoencoders (VAEs) can help identify underlying factors of variation in image data.
A key challenge is defining appropriate descriptors that reflect local structure.

Plain English Explanation

Physical imaging techniques, such as microscopy and spectroscopy, are essential tools used in fields ranging from condensed matter physics and chemistry to astronomy. These imaging methods capture crucial information about the world around us, from the atomic scale to the scale of the universe. The images produced can reveal details about the structure of materials, the dynamics of turbulent systems, and other complex phenomena.

However, effectively extracting and making sense of all the information contained in these images can be challenging. That's where Variational Autoencoders (VAEs), a powerful machine learning technique, come in. VAEs can help identify the underlying factors that are responsible for the variations seen in image data, allowing researchers to distill meaningful patterns from complex datasets.

One key challenge in using VAEs for this purpose is defining the right "descriptors" - the features or characteristics that the VAE should focus on to capture the relevant local structure in the images. The scale-invariant VAE (SI-VAE) approach addresses this by training the VAE to work with descriptors sampled at different length scales, allowing it to discover the scale-dependent factors of variation in the system.
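To make "descriptors sampled at different length scales" concrete, here is a minimal sketch in Python with NumPy. It assumes that descriptors are square image patches cut on a regular grid, and that each window size is block-averaged down to a common pixel size so a single network can take every scale as input. This is an illustrative assumption, not the paper's documented preprocessing, and the function names (`extract_patches`, `downsample`, `multiscale_descriptors`) are hypothetical.

```python
import numpy as np

# Illustrative multi-scale descriptor extraction (assumed scheme, not the paper's code).

def extract_patches(image, window, stride):
    """Cut square windows of side `window` from a 2-D image on a regular grid."""
    h, w = image.shape
    return np.stack([
        image[i:i + window, j:j + window]
        for i in range(0, h - window + 1, stride)
        for j in range(0, w - window + 1, stride)
    ])

def downsample(patch, size):
    """Block-average a square patch down to size x size pixels."""
    f = patch.shape[0] // size
    if f * size != patch.shape[0]:
        raise ValueError("window must be a multiple of the output size")
    return patch.reshape(size, f, size, f).mean(axis=(1, 3))

def multiscale_descriptors(image, windows, out_size, stride):
    """Map each window size to an array of shape (n_patches, out_size, out_size)."""
    return {
        win: np.stack([downsample(p, out_size) for p in extract_patches(image, win, stride)])
        for win in windows
    }

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))   # stand-in for a microscopy image
desc = multiscale_descriptors(image, windows=(8, 16, 32), out_size=8, stride=8)
for win, arr in desc.items():
    print(win, arr.shape)
```

With a 64×64 image and a stride of 8, this yields 64, 49 and 25 patches for the 8-, 16- and 32-pixel windows, all stored as 8×8 arrays. Block averaging (rather than keeping every n-th pixel) is used so that larger windows are smoothed instead of aliased when they are shrunk.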

The authors demonstrate this approach on ferroelectric domain images and extend it to movies of electron-beam-induced phenomena in graphene and to topography evolution across combinatorial libraries. The insights gained from this type of analysis can help inform decisions in automated experiments, such as those aimed at discovering structure-property relationships.

Technical Explanation

The scale-invariant VAE (SI-VAE) approach is designed to help researchers effectively extract and interpret the wealth of information contained in physical imaging data. It builds on the power of Variational Autoencoders (VAEs), which are a type of deep learning model that can identify the underlying factors responsible for the variations seen in image data.
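For readers who want the underlying objective, the sketch below writes out the standard VAE loss in NumPy: the encoder's mean and log-variance are turned into a latent sample with the reparameterization trick, and the loss is a reconstruction term plus the KL divergence from a standard normal prior. This is the textbook formulation, assumed here for illustration; the summary does not state which architecture, decoder likelihood or loss weighting the SI-VAE uses, and the helper names are hypothetical.

```python
import numpy as np

# Textbook VAE objective, assumed for illustration (not taken from the SI-VAE paper).

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I); in an autodiff framework this
    keeps the sample differentiable with respect to mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def negative_elbo(x, x_recon, mu, log_var):
    """Per-sample loss: squared error (unit-variance Gaussian decoder, up to a constant) plus KL."""
    recon = 0.5 * np.sum((x - x_recon) ** 2, axis=-1)
    return recon + kl_to_standard_normal(mu, log_var)

rng = np.random.default_rng(0)
x = rng.random((4, 16))                             # four flattened descriptors
mu, log_var = np.zeros((4, 2)), np.zeros((4, 2))    # placeholder encoder outputs
z = reparameterize(mu, log_var, rng)                # latent samples, shape (4, 2)
x_recon = 0.9 * x                                   # placeholder decoder output
print(negative_elbo(x, x_recon, mu, log_var))       # one loss value per descriptor
```

The KL term pulls each descriptor's latent code toward the prior, which is what encourages the compact, smoothly organized latent space in which factors of variation can be read off.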

The key innovation of the SI-VAE approach is the way it handles the definition and selection of appropriate descriptors - the features or characteristics that the VAE should focus on to capture the relevant local structure in the images. Rather than relying on a single set of descriptors, the SI-VAE progressively trains the VAE using descriptors sampled at different length scales. This allows the model to discover the scale-dependent factors of variation in the system.
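One way to picture the progressive schedule is a loop over window sizes in which each stage starts from the weights learned at the previous one. The sketch below assumes that reading of "progressive training" and an ordering from the smallest to the largest window; this summary confirms neither. To keep it short and dependency-free, it swaps in a linear autoencoder trained by plain gradient descent for the VAE and feeds it random placeholder descriptors, so only the warm-started, scale-by-scale loop is meant to be illustrative. All names are hypothetical.

```python
import numpy as np

# Stand-in: a linear autoencoder replaces the VAE (assumption, for brevity only).

def recon_loss(X, We, Wd):
    """Mean over samples of half the squared reconstruction error."""
    return 0.5 * np.sum((X @ We @ Wd - X) ** 2) / X.shape[0]

def train_linear_ae(X, We, Wd, lr, steps):
    """Plain gradient descent on recon_loss; updates We and Wd in place and returns them."""
    n = X.shape[0]
    for _ in range(steps):
        Z = X @ We                      # encode
        R = (Z @ Wd - X) / n            # gradient of the loss w.r.t. the reconstruction
        grad_Wd = Z.T @ R
        grad_We = X.T @ (R @ Wd.T)
        We -= lr * grad_We
        Wd -= lr * grad_Wd
    return We, Wd

def progressive_training(descriptors_by_scale, latent_dim, lr=0.01, steps=200, seed=0):
    """Train on each length scale in turn (smallest window first, an assumed order),
    warm-starting every stage from the weights of the previous stage."""
    rng = np.random.default_rng(seed)
    windows = sorted(descriptors_by_scale)
    n_features = descriptors_by_scale[windows[0]].shape[1]
    We = 0.1 * rng.standard_normal((n_features, latent_dim))
    Wd = 0.1 * rng.standard_normal((latent_dim, n_features))
    history = {}
    for w in windows:
        X = descriptors_by_scale[w]
        before = recon_loss(X, We, Wd)
        We, Wd = train_linear_ae(X, We, Wd, lr, steps)   # warm start: reuse We, Wd
        history[w] = (before, recon_loss(X, We, Wd))
    return We, Wd, history

# Placeholder data: 50 flattened 8x8 descriptors per window size (random, illustration only).
rng = np.random.default_rng(1)
descriptors = {w: rng.standard_normal((50, 64)) for w in (8, 16, 32)}
We, Wd, history = progressive_training(descriptors, latent_dim=4)
for w, (before, after) in history.items():
    print(f"window {w:2d}: loss {before:.2f} -> {after:.2f}")
```

In a real pipeline the placeholder arrays would be replaced by flattened patches taken from the image at each window size, and the linear model by a VAE. Under the assumption above, the warm start is what would let features learned at one length scale carry over to the next.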

The researchers demonstrate the approach on several case studies: ferroelectric domain images, movies of electron-beam-induced phenomena in graphene, and topography evolution across combinatorial libraries. In each case, the SI-VAE uncovered length-scale-dependent patterns that could inform further research and experimentation.

Critical Analysis

The scale-invariant VAE (SI-VAE) approach presented in this paper is a promising technique for extracting and interpreting the wealth of information contained in physical imaging data. By training the VAE model to work with descriptors sampled at different length scales, the SI-VAE can uncover scale-dependent factors of variation that might be missed by more traditional approaches.

One potential limitation of the SI-VAE is that the researcher still has to decide how descriptors are defined and which length scales to sample. While progressive training helps the model discover the most relevant factors, these initial choices could still influence the results. Further research may be needed to explore more automated or data-driven approaches to descriptor selection.

Additionally, the paper focuses primarily on the application of the SI-VAE to specific imaging modalities and case studies. While the authors claim the approach is "universal" and can be applied across a broad range of imaging methods and phenomena, more evidence may be needed to fully demonstrate the generalizability of the technique.

Overall, the scale-invariant VAE (SI-VAE) approach represents an exciting development in the field of physical imaging data analysis. By leveraging the power of Variational Autoencoders and introducing a novel approach to descriptor selection, the SI-VAE has the potential to unlock new insights and drive further discoveries across a wide range of scientific disciplines.

Conclusion

The scale-invariant VAE (SI-VAE) approach presented in this paper is a significant advancement in the field of physical imaging data analysis. By training Variational Autoencoders to work with descriptors sampled at different length scales, the SI-VAE can uncover the scale-dependent factors of variation in complex imaging datasets, providing researchers with valuable insights that could inform further experimentation and discovery.

The demonstrated applications of the SI-VAE, ranging from ferroelectric domain images to movies of electron-beam-induced phenomena in graphene and topography evolution across combinatorial libraries, showcase the versatility and potential of this technique. As the authors suggest, the SI-VAE approach could be particularly useful for exploring phenomena that exhibit scale-invariant properties, such as turbulence or transformation fronts.

In short, the SI-VAE offers a systematic way to analyze spatially resolved data, from experimental images to simulations, across multiple length scales, with the potential to support new discoveries across a wide range of scientific disciplines.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Invariant Discovery of Features Across Multiple Length Scales: Applications in Microscopy and Autonomous Materials Characterization

Aditya Raghavan, Utkarsh Pratiush, Mani Valleti, Richard Liu, Reece Emery, Hiroshi Funakubo, Yongtao Liu, Philip Rack, Sergei Kalinin

Physical imaging is a foundational characterization method in areas from condensed matter physics and chemistry to astronomy and spans length scales from atomic to universe. Images encapsulate crucial data regarding atomic bonding, materials microstructures, and dynamic phenomena such as microstructural evolution and turbulence, among other phenomena. The challenge lies in effectively extracting and interpreting this information. Variational Autoencoders (VAEs) have emerged as powerful tools for identifying underlying factors of variation in image data, providing a systematic approach to distilling meaningful patterns from complex datasets. However, a significant hurdle in their application is the definition and selection of appropriate descriptors reflecting local structure. Here we introduce the scale-invariant VAE approach (SI-VAE) based on the progressive training of the VAE with the descriptors sampled at different length scales. The SI-VAE allows the discovery of the length scale dependent factors of variation in the system. Here, we illustrate this approach using the ferroelectric domain images and generalize it to the movies of the electron-beam induced phenomena in graphene and topography evolution across combinatorial libraries. This approach can further be used to initialize the decision making in automated experiments including structure-property discovery and can be applied across a broad range of imaging methods. This approach is universal and can be applied to any spatially resolved data including both experimental imaging studies and simulations, and can be particularly useful for exploration of phenomena such as turbulence, scale-invariant transformation fronts, etc.

8/2/2024

Targeting the partition function of chemically disordered materials with a generative approach based on inverse variational autoencoders

Maciej J. Karcz, Luca Messina, Eiji Kawasaki, Emeric Bourasseau

Computing atomic-scale properties of chemically disordered materials requires an efficient exploration of their vast configuration space. Traditional approaches such as Monte Carlo or Special Quasirandom Structures either entail sampling an excessive amount of configurations or do not ensure that the configuration space has been properly covered. In this work, we propose a novel approach where generative machine learning is used to yield a representative set of configurations for accurate property evaluation and provide accurate estimations of atomic-scale properties with minimal computational cost. Our method employs a specific type of variational autoencoder with inverse roles for the encoder and decoder, enabling the application of an unsupervised active learning scheme that does not require any initial training database. The model iteratively generates configuration batches, whose properties are computed with conventional atomic-scale methods. These results are then fed back into the model to estimate the partition function, repeating the process until convergence. We illustrate our approach by computing point-defect formation energies and concentrations in (U, Pu)O2 mixed-oxide fuels. In addition, the ML model provides valuable insights into the physical factors influencing the target property. Our method is generally applicable to explore other properties, such as atomic-scale diffusion coefficients, in ideally or non-ideally disordered materials like high-entropy alloys.

9/11/2024

Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology

Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, Dominique Beaini, Maciej Sypetkowski, Chi Vicky Cheng, Kristen Morse, Maureen Makes, Ben Mabey, Berton Earnshaw

Featurizing microscopy images for use in biological research remains a significant challenge, especially for large-scale experiments spanning millions of images. This work explores the scaling properties of weakly supervised classifiers and self-supervised masked autoencoders (MAEs) when training with increasingly larger model backbones and microscopy datasets. Our results show that ViT-based MAEs outperform weakly supervised classifiers on a variety of tasks, achieving as much as an 11.5% relative improvement when recalling known biological relationships curated from public databases. Additionally, we develop a new channel-agnostic MAE architecture (CA-MAE) that allows for inputting images of different numbers and orders of channels at inference time. We demonstrate that CA-MAEs effectively generalize by inferring and evaluating on a microscopy image dataset (JUMP-CP) generated under different experimental conditions with a different channel structure than our pretraining data (RPI-93M). Our findings motivate continued research into scaling self-supervised learning on microscopy data in order to create powerful foundation models of cellular biology that have the potential to catalyze advancements in drug discovery and beyond.

4/17/2024

Data-efficient and Interpretable Inverse Materials Design using a Disentangled Variational Autoencoder

Cheng Zeng, Zulqarnain Khan, Nathan L. Post

Inverse materials design has proven successful in accelerating novel material discovery. Many inverse materials design methods use unsupervised learning where a latent space is learned to offer a compact description of materials representations. A latent space learned this way is likely to be entangled, in terms of the target property and other properties of the materials. This makes the inverse design process ambiguous. Here, we present a semi-supervised learning approach based on a disentangled variational autoencoder to learn a probabilistic relationship between features, latent variables and target properties. This approach is data efficient because it combines all labelled and unlabelled data in a coherent manner, and it uses expert-informed prior distributions to improve model robustness even with limited labelled data. It is in essence interpretable, as the learnable target property is disentangled out of the other properties of the materials, and an extra layer of interpretability can be provided by a post-hoc analysis of the classification head of the model. We demonstrate this new approach on an experimental high-entropy alloy dataset with chemical compositions as input and single-phase formation as the single target property. While single property is used in this work, the disentangled model can be extended to customize for inverse design of materials with multiple target properties.

9/12/2024