Data-efficient and Interpretable Inverse Materials Design using a Disentangled Variational Autoencoder

arXiv:2409.06740, published 9/12/2024 by Cheng Zeng, Zulqarnain Khan, Nathan L. Post

Overview

This paper presents a data-efficient and interpretable approach to inverse materials design based on a disentangled variational autoencoder (dVAE).
The model is trained in a semi-supervised way. It learns a probabilistic relationship between material features, latent variables, and a target property, with the target property disentangled from the other properties of the materials.
It combines labelled and unlabelled data, uses expert-informed prior distributions, and is demonstrated on an experimental high-entropy alloy dataset with chemical compositions as input and single-phase formation as the target property.

Plain English Explanation

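The researchers developed a machine learning model called a disentangled variational autoencoder (dVAE) to help design new materials with specific properties. The model learns from two kinds of data at once: materials whose target property has been measured (labelled data) and materials for which it has not (unlabelled data). In this study the materials are high-entropy alloys described by their chemical compositions, and the property of interest is whether an alloy forms a single phase.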

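Many inverse design methods learn a single compact description of materials, called a latent space, without any guidance about the property of interest. In such a space the target property tends to be mixed up, or entangled, with other properties of the materials, which makes it unclear how to move through the space to reach a desired property. The dVAE instead treats the target property as its own variable, separated (disentangled) from the remaining latent factors that describe the rest of the material.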

Once trained, the model can be used to propose new materials with the desired property, for example by fixing the target property and varying the other latent factors. Because it also learns from unlabelled data and uses prior distributions informed by expert knowledge, it can do this even when only a limited amount of labelled data is available.

This approach aims to make the materials design process more data-efficient and easier to understand than fully unsupervised or black-box models. Because the target property is an explicit, separate part of the model, its role is clear by construction, and a post-hoc analysis of the model's classification component adds a further layer of interpretability.

Technical Explanation

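The key idea in this paper is a semi-supervised disentangled variational autoencoder (dVAE) for inverse materials design. Many inverse design methods learn a latent space with unsupervised learning; such a space is likely to be entangled with respect to the target property and other properties of the materials, which makes inverse design ambiguous. The dVAE instead learns a probabilistic relationship between input features, latent variables, and the target property, with the target property disentangled from the other latent factors.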
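For readers who want the probabilistic structure spelled out, the semi-supervised VAE of Kingma et al. (2014), often called M2, is one concrete way to relate features x, a target property y, and the remaining latent variables z. This is an illustrative assumption about the general form, not the paper's exact formulation: here y would be single-phase formation, p(z) a standard normal, p(y) a Bernoulli prior with parameter π, and D_l and D_u the labelled and unlabelled sets.

```latex
\begin{aligned}
p_\theta(x, y, z) &= p(y)\, p(z)\, p_\theta(x \mid y, z),
  \qquad p(z) = \mathcal{N}(0, I),\; p(y) = \mathrm{Bern}(\pi) \\
-\mathcal{L}(x, y) &= \mathbb{E}_{q_\phi(z \mid x, y)}\!\left[\log p_\theta(x \mid y, z)\right]
  + \log p(y) - \mathrm{KL}\!\left(q_\phi(z \mid x, y) \,\Vert\, p(z)\right) \;\le\; \log p_\theta(x, y) \\
-\mathcal{U}(x) &= \sum_{y} q_\phi(y \mid x)\,\bigl(-\mathcal{L}(x, y)\bigr)
  + \mathcal{H}\!\left(q_\phi(y \mid x)\right) \;\le\; \log p_\theta(x) \\
\mathcal{J}^{\alpha} &= \sum_{(x, y) \in \mathcal{D}_l} \mathcal{L}(x, y)
  + \sum_{x \in \mathcal{D}_u} \mathcal{U}(x)
  + \alpha \sum_{(x, y) \in \mathcal{D}_l} \bigl[-\log q_\phi(y \mid x)\bigr]
\end{aligned}
```

Minimising J^α uses every labelled and every unlabelled example, and q_φ(y | x) plays the role of a classification head. Under this reading, an expert-informed prior could enter through terms such as p(y) or p(z); where the paper actually places its priors is not stated in this summary.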

The model pairs an encoder that maps material features (here, chemical compositions) to latent variables and a decoder that reconstructs the features with a classification head that predicts the target property. Labelled and unlabelled data are combined in a single, coherent training objective, and expert-informed prior distributions improve robustness when labelled data are limited.
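A training objective that mixes labelled and unlabelled data can be sketched in plain Python. This is a minimal, hypothetical illustration assuming an M2-style semi-supervised VAE (Kingma et al., 2014) with tiny linear layers: compositions are fraction vectors, the decoder outputs element fractions through a softmax, and `prior_y1` stands in for a prior p(y = 1). None of the names, sizes, or weights come from the paper, and a real model would use a deep-learning framework and learn the weights by gradient descent.

```python
import math
import random

# Hypothetical sizes chosen for illustration only (not from the paper).
D, K = 5, 2  # D composition elements, K latent factors besides the target y


def init_params(seed=0):
    """Random linear weights for a toy encoder, decoder and classification head."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "enc_mu": mat(K, D + 1),      # q(z | x, y): mean from [x, y]
        "enc_logvar": mat(K, D + 1),  # q(z | x, y): log-variance from [x, y]
        "dec": mat(D, K + 1),         # p(x | y, z): element logits from [z, y]
        "cls": mat(1, D)[0],          # q(y | x): logistic classification head
    }


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(v):
    top = max(v)
    e = [math.exp(t - top) for t in v]
    s = sum(e)
    return [t / s for t in e]


def q_y1(x, p):
    """q(y = 1 | x): probability of single-phase formation under the head."""
    logit = sum(a * b for a, b in zip(p["cls"], x))
    return 1.0 / (1.0 + math.exp(-logit))


def labelled_loss(x, y, p, eps, prior_y1=0.5):
    """L(x, y): negative ELBO for one labelled example with one reparameterised sample."""
    xy = x + [float(y)]
    mu, logvar = matvec(p["enc_mu"], xy), matvec(p["enc_logvar"], xy)
    z = [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, logvar, eps)]
    probs = softmax(matvec(p["dec"], z + [float(y)]))
    # Fraction-weighted categorical log-likelihood of the composition.
    recon = sum(xi * math.log(pi) for xi, pi in zip(x, probs))
    kl = 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))
    log_py = math.log(prior_y1 if y == 1 else 1.0 - prior_y1)  # log p(y)
    return -(recon + log_py) + kl


def unlabelled_loss(x, p, eps, prior_y1=0.5):
    """U(x): L(x, y) averaged over q(y | x), minus the entropy of q(y | x)."""
    q1 = q_y1(x, p)
    q = {0: 1.0 - q1, 1: q1}
    expected = sum(q[y] * labelled_loss(x, y, p, eps, prior_y1) for y in (0, 1))
    entropy = -sum(v * math.log(v) for v in q.values() if v > 0.0)
    return expected - entropy


def total_loss(labelled, unlabelled, p, rng, alpha=1.0, prior_y1=0.5):
    """J^alpha: labelled bounds + unlabelled bounds + alpha * classification log-loss."""
    total = 0.0
    for x, y in labelled:
        eps = [rng.gauss(0.0, 1.0) for _ in range(K)]
        q1 = q_y1(x, p)
        total += labelled_loss(x, y, p, eps, prior_y1)
        total -= alpha * math.log(q1 if y == 1 else 1.0 - q1)
    for x in unlabelled:
        eps = [rng.gauss(0.0, 1.0) for _ in range(K)]
        total += unlabelled_loss(x, p, eps, prior_y1)
    return total


if __name__ == "__main__":
    params = init_params()
    x = [0.2] * D  # e.g. an equiatomic five-element alloy
    print(total_loss([(x, 1)], [x], params, random.Random(1)))
```

In a model of this shape, labelled examples contribute both a reconstruction bound and a classification term, while unlabelled examples are marginalised over y; changing `prior_y1` (or placing a prior on z) is one place where expert knowledge could be injected.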

Once trained, the dVAE can be used for inverse design, for example by conditioning on the desired value of the target property and generating candidate materials consistent with it. Interpretability has two sources: the learnable target property is disentangled from the other properties of the materials by construction, and a post-hoc analysis of the classification head provides an additional layer of interpretability.
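Inverse design with a model in which the target property is a separate variable can be sketched as: fix the target to the desired value (single phase, y = 1), sample the other latent factors from the prior, decode them into compositions, and keep candidates that the classification head scores as likely single-phase. The decoder, head, element list, and weights below are toy, hypothetical stand-ins, and ranking a linear head's weights per element is only the simplest possible post-hoc analysis; the paper's own analysis method is not described in this summary.

```python
import math
import random

ELEMENTS = ["Al", "Co", "Cr", "Fe", "Ni"]  # hypothetical element set
K = 2  # number of latent factors besides the target property y


def softmax(v):
    m = max(v)
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def decode(z, y, dec_w):
    """Toy decoder p(x | y, z): linear logits over elements from [z, y]; softmax gives fractions."""
    inp = z + [float(y)]
    return softmax([sum(w * t for w, t in zip(row, inp)) for row in dec_w])


def classify(x, cls_w, cls_b=0.0):
    """Toy classification head q(y = 1 | x), a logistic function of the composition."""
    logit = sum(w * t for w, t in zip(cls_w, x)) + cls_b
    return 1.0 / (1.0 + math.exp(-logit))


def propose(n, dec_w, cls_w, rng, threshold=0.5):
    """Fix y = 1, sample z ~ N(0, I), decode, and keep candidates the head scores >= threshold."""
    kept = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(K)]
        x = decode(z, 1, dec_w)
        if classify(x, cls_w) >= threshold:
            kept.append(x)
    return kept


def rank_head_weights(cls_w):
    """Simplest post-hoc view of a linear head: elements sorted by weight toward y = 1."""
    return sorted(zip(ELEMENTS, cls_w), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    dec_w = [[rng.gauss(0.0, 1.0) for _ in range(K + 1)] for _ in ELEMENTS]
    cls_w = [rng.gauss(0.0, 1.0) for _ in ELEMENTS]
    for x in propose(5, dec_w, cls_w, rng):
        print({el: round(f, 3) for el, f in zip(ELEMENTS, x)})
    print(rank_head_weights(cls_w))
```

Filtering the decoded candidates with the head is a design choice, not a requirement; a well-trained conditional decoder could be used on its own, with the head serving only as a second check.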

The authors demonstrate the approach on an experimental high-entropy alloy dataset, using chemical compositions as input and single-phase formation as the single target property. Although only one target property is used, they note that the disentangled model can be extended to inverse design of materials with multiple target properties.
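The paper's abstract says the model takes chemical compositions of high-entropy alloys as input. A common way to turn an alloy formula into a fixed-length feature vector is to normalise element amounts into mole fractions over a fixed element list. The sketch below assumes that representation; the element list and function name are hypothetical, and the paper's featurisation may differ.

```python
import re

# Hypothetical fixed element list defining the feature order.
ELEMENT_ORDER = ["Al", "Co", "Cr", "Cu", "Fe", "Mn", "Ni", "Ti"]

# One element symbol followed by an optional (possibly fractional) amount.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")
FORMULA = re.compile(r"(?:[A-Z][a-z]?(?:\d+(?:\.\d+)?)?)+")


def composition_vector(formula, elements=ELEMENT_ORDER):
    """Parse a formula such as 'Al0.5CoCrFeNi' into mole fractions ordered by `elements`."""
    if not FORMULA.fullmatch(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    amounts = {}
    for symbol, num in TOKEN.findall(formula):
        # A missing amount means one mole of that element.
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(num) if num else 1.0)
    unknown = set(amounts) - set(elements)
    if unknown:
        raise ValueError(f"elements not in feature list: {sorted(unknown)}")
    total = sum(amounts.values())
    return [amounts.get(el, 0.0) / total for el in elements]


if __name__ == "__main__":
    print(dict(zip(ELEMENT_ORDER, composition_vector("Al0.5CoCrFeNi"))))
```

Fractions rather than raw amounts make formulas such as "AlCoCrFeNi" and "Al2Co2Cr2Fe2Ni2" map to the same input, which matches the intuition that only relative composition matters for phase formation.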

Critical Analysis

One key limitation is the amount of labelled data. The dVAE is designed to be data-efficient, combining labelled and unlabelled data and using expert-informed priors, but its performance may still be constrained by the amount and quality of the labelled measurements and by how well the chosen priors fit the material system.

Additionally, disentanglement is not guaranteed to be perfect: some information about the target property may remain entangled with the other latent variables. Further research is needed to assess how robust the disentanglement is and how residual entanglement affects inverse design performance.
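One common diagnostic for residual entanglement (not something this summary says the paper reports) is a probe: fit a simple classifier that predicts the target y from the other latent codes z alone, and compare its held-out accuracy with the majority-class rate. A score near zero suggests little linearly detectable leakage; a large positive score suggests the supposedly separate latent variables still encode the target. The function names and the synthetic data below are hypothetical.

```python
import math
import random


def fit_logistic_probe(zs, ys, lr=0.5, epochs=300):
    """Fit p(y = 1 | z) = sigmoid(w . z + b) by full-batch gradient descent on the log-loss."""
    k, n = len(zs[0]), len(zs)
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * k, 0.0
        for z, y in zip(zs, ys):
            p = 1.0 / (1.0 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))
            err = p - y
            for i in range(k):
                gw[i] += err * z[i] / n
            gb += err / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def probe_accuracy(zs, ys, w, b):
    hits = sum((sum(wi * zi for wi, zi in zip(w, z)) + b > 0) == (y == 1) for z, y in zip(zs, ys))
    return hits / len(ys)


def leakage_score(zs, ys, split=0.5):
    """Held-out probe accuracy minus the majority-class rate; near 0 means little detectable leakage."""
    cut = int(len(zs) * split)
    w, b = fit_logistic_probe(zs[:cut], ys[:cut])
    test_y = ys[cut:]
    majority = max(sum(test_y), len(test_y) - sum(test_y)) / len(test_y)
    return probe_accuracy(zs[cut:], test_y, w, b) - majority


if __name__ == "__main__":
    rng = random.Random(0)
    ys = [rng.randint(0, 1) for _ in range(400)]
    # Synthetic codes: the first coordinate of `leaky` depends on y, `clean` is pure noise.
    leaky = [[2 * y - 1 + rng.gauss(0, 0.5), rng.gauss(0, 1)] for y in ys]
    clean = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in ys]
    print("leaky z:", leakage_score(leaky, ys))
    print("clean z:", leakage_score(clean, ys))
```

A linear probe only detects linearly recoverable information; a near-zero score does not rule out nonlinear leakage, so a stronger probe may be worth trying before concluding that the codes are disentangled.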

The demonstration also covers a single material system (high-entropy alloys) and a single binary target property (single-phase formation). The authors state that the model can be extended to multiple target properties, but how well the method and its interpretability carry over to other material systems and properties remains to be shown.

Overall, this work presents a promising approach for data-efficient and interpretable inverse materials design, but more research is needed to address the limitations and extend the method to a wider range of materials design challenges.

Conclusion

This paper introduces a semi-supervised disentangled variational autoencoder (dVAE) for data-efficient and interpretable inverse materials design. By disentangling the target property from the other latent factors that describe a material, the dVAE reduces the ambiguity of inverse design in an entangled latent space and makes the role of the target property explicit.

The authors demonstrate the approach on an experimental high-entropy alloy dataset with single-phase formation as the target property, combining labelled and unlabelled data with expert-informed priors to stay robust when labelled data are limited. This work is a step towards more accessible and explainable materials design, with potential applications in fields like clean energy, electronics, and advanced manufacturing.

While the current implementation has some limitations, the general principles of disentangled latent representations and targeted inverse design hold promise for improving the materials discovery process. Further research in this direction, including extensions to multiple target properties, could lead to further advances in computational materials science.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper for details.

Related Papers

Data-efficient and Interpretable Inverse Materials Design using a Disentangled Variational Autoencoder

Cheng Zeng, Zulqarnain Khan, Nathan L. Post

Inverse materials design has proven successful in accelerating novel material discovery. Many inverse materials design methods use unsupervised learning where a latent space is learned to offer a compact description of materials representations. A latent space learned this way is likely to be entangled, in terms of the target property and other properties of the materials. This makes the inverse design process ambiguous. Here, we present a semi-supervised learning approach based on a disentangled variational autoencoder to learn a probabilistic relationship between features, latent variables and target properties. This approach is data efficient because it combines all labelled and unlabelled data in a coherent manner, and it uses expert-informed prior distributions to improve model robustness even with limited labelled data. It is in essence interpretable, as the learnable target property is disentangled out of the other properties of the materials, and an extra layer of interpretability can be provided by a post-hoc analysis of the classification head of the model. We demonstrate this new approach on an experimental high-entropy alloy dataset with chemical compositions as input and single-phase formation as the single target property. While single property is used in this work, the disentangled model can be extended to customize for inverse design of materials with multiple target properties.

9/12/2024

Targeting the partition function of chemically disordered materials with a generative approach based on inverse variational autoencoders

Maciej J. Karcz, Luca Messina, Eiji Kawasaki, Emeric Bourasseau

Computing atomic-scale properties of chemically disordered materials requires an efficient exploration of their vast configuration space. Traditional approaches such as Monte Carlo or Special Quasirandom Structures either entail sampling an excessive amount of configurations or do not ensure that the configuration space has been properly covered. In this work, we propose a novel approach where generative machine learning is used to yield a representative set of configurations for accurate property evaluation and provide accurate estimations of atomic-scale properties with minimal computational cost. Our method employs a specific type of variational autoencoder with inverse roles for the encoder and decoder, enabling the application of an unsupervised active learning scheme that does not require any initial training database. The model iteratively generates configuration batches, whose properties are computed with conventional atomic-scale methods. These results are then fed back into the model to estimate the partition function, repeating the process until convergence. We illustrate our approach by computing point-defect formation energies and concentrations in (U, Pu)O2 mixed-oxide fuels. In addition, the ML model provides valuable insights into the physical factors influencing the target property. Our method is generally applicable to explore other properties, such as atomic-scale diffusion coefficients, in ideally or non-ideally disordered materials like high-entropy alloys.

9/11/2024

Learning Disentangled Semantic Spaces of Explanations via Invertible Neural Networks

Yingji Zhang, Danilo S. Carvalho, André Freitas

Disentangled latent spaces usually have better semantic separability and geometrical properties, which leads to better interpretability and more controllable data generation. While this has been well investigated in Computer Vision, in tasks such as image disentanglement, in the NLP domain sentence disentanglement is still comparatively under-investigated. Most previous work has concentrated on disentangling task-specific generative factors, such as sentiment, within the context of style transfer. In this work, we focus on a more general form of sentence disentanglement, targeting the localised modification and control of more general sentence semantic features. To achieve this, we contribute to a novel notion of sentence semantic disentanglement and introduce a flow-based invertible neural network (INN) mechanism integrated with a transformer-based language Autoencoder (AE) in order to deliver latent spaces with better separability properties. Experimental results demonstrate that the model can conform the distributed latent space into a better semantically disentangled sentence space, leading to improved language interpretability and controlled generation when compared to the recent state-of-the-art language VAE models.

6/12/2024

Manifold Learning by Mixture Models of VAEs for Inverse Problems

Giovanni S. Alberti, Johannes Hertrich, Matteo Santacesaria, Silvia Sciutto

Representing a manifold of very high-dimensional data with generative models has been shown to be computationally efficient in practice. However, this requires that the data manifold admits a global parameterization. In order to represent manifolds of arbitrary topology, we propose to learn a mixture model of variational autoencoders. Here, every encoder-decoder pair represents one chart of a manifold. We propose a loss function for maximum likelihood estimation of the model weights and choose an architecture that provides us the analytical expression of the charts and of their inverses. Once the manifold is learned, we use it for solving inverse problems by minimizing a data fidelity term restricted to the learned manifold. To solve the arising minimization problem we propose a Riemannian gradient descent algorithm on the learned manifold. We demonstrate the performance of our method for low-dimensional toy examples as well as for deblurring and electrical impedance tomography on certain image manifolds.

8/13/2024