In recent years, research efforts such as the Materials Project and the Materials Genome Initiative have produced a range of computational tools. These tools have helped in designing new materials for applications such as aeronautics, energy, civil engineering, and electronics.
However, developing techniques for synthesizing these materials still depends on a combination of intuition, experience, and manual literature reviews.
Scientists from MIT, the University of Massachusetts at Amherst, and the University of California at Berkeley hope to close this automation gap in materials science with a new artificial-intelligence system that can analyze research papers to deduce “recipes” for synthesizing specific materials.
Computational materials scientists have made a lot of progress in the ‘what’ to make—what material to design based on desired properties, but because of that success, the bottleneck has shifted to, ‘Okay, now how do I make it?’
Elsa Olivetti, Atlantic Richfield Assistant Professor of Energy Studies in MIT’s Department of Materials Science and Engineering (DMSE).
The team aims to develop a database containing materials recipes extracted from millions of research papers. Researchers and engineers could enter the name of a target material and other criteria, such as reaction conditions, precursor materials, and fabrication processes, and retrieve suggested recipes.
As a step toward that aim, Olivetti and her team have created a machine-learning system that can analyze a research paper, identify the paragraphs that contain materials recipes, and classify the words in those paragraphs according to their roles in the recipes: names of target materials, names of pieces of equipment, numeric quantities, descriptive adjectives, operating conditions, and so on.
In a paper published in the latest issue of the journal Chemistry of Materials, the researchers also show that a machine-learning system can analyze the extracted data to infer general characteristics of classes of materials (e.g. the different temperature ranges their synthesis requires) or particular characteristics of individual materials (e.g. the different physical forms they take when their fabrication conditions vary).
Olivetti is the senior author of the paper. The other authors are Edward Kim, an MIT graduate student in DMSE; Kevin Huang, a DMSE postdoc; Adam Saunders and Andrew McCallum, computer scientists at UMass Amherst; and Gerbrand Ceder, a Chancellor’s Professor in the Department of Materials Science and Engineering at Berkeley.
Filling in the gaps
The team trained their system using a combination of supervised and unsupervised machine-learning methods. “Supervised” means that the training data fed to the system is first annotated by people; the system then tries to find correlations between the raw data and the annotations. “Unsupervised” means that the training data is not annotated, and the system instead learns to cluster data together according to structural similarities.
Because materials-recipe extraction is a new research field, Olivetti and her team did not have access to large, annotated data sets accumulated over many years by diverse research groups. They had to annotate their data themselves, about 100 papers in all.
By machine-learning standards, this is a rather small data set. To improve it, the researchers used an algorithm called Word2vec, which was developed at Google. Word2vec looks at the contexts in which words occur (i.e. the syntactic roles of the words within sentences and the other words around them) and groups together words with similar contexts. For example, if one paper included the sentence “We heated the titanium tetrachloride to 500 C,” and another included the sentence “The sodium hydroxide was heated to 500 C,” Word2vec would group “titanium tetrachloride” and “sodium hydroxide” together.
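The grouping idea can be illustrated with a much simpler stand-in than Word2vec itself. The sketch below is not Word2vec, which learns dense vectors with a neural network, and it is not the researchers' code. It only counts which words appear near each other and compares those counts with cosine similarity. The function names, the window size of 2, and the choice to join each compound name into a single token are all assumptions made for illustration.

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Count, for each token, the tokens appearing within `window` positions of it."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            counts = vectors.setdefault(word, Counter())
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# The article's two example sentences, lowercased and tokenized by hand.
# Joining each compound name into one token is an assumption of this sketch.
sentences = [
    ["we", "heated", "the", "titanium_tetrachloride", "to", "500", "c"],
    ["the", "sodium_hydroxide", "was", "heated", "to", "500", "c"],
]

vectors = context_vectors(sentences)
similar = cosine(vectors["titanium_tetrachloride"], vectors["sodium_hydroxide"])
dissimilar = cosine(vectors["sodium_hydroxide"], vectors["c"])
print(f"precursor vs precursor:  {similar:.2f}")
print(f"precursor vs unit token: {dissimilar:.2f}")
```

The two chemical names share neighbors such as "heated" and "the", so they score higher than the unrelated pair. With only two sentences the numbers mean little; a method like Word2vec needs a large corpus to produce useful groupings.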
With Word2vec, the team was able to enlarge their training set considerably, because the machine-learning system could infer that a label assigned to any given word was likely to apply to the other words clustered with it. Instead of 100 papers, the team could thus train their system on nearly 640,000 papers.
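One plausible reading of this step is a majority vote within each cluster: a word with no annotation inherits the most common hand label among its cluster mates. The sketch below illustrates that reading only; the article does not describe the researchers' actual mechanism. The function name, the cluster contents, and the label names ("material", "equipment") are hypothetical.

```python
from collections import Counter


def propagate_labels(clusters, seed_labels):
    """Give each unlabeled word the most common hand label found in its cluster."""
    expanded = dict(seed_labels)  # hand labels are kept and never overwritten
    for cluster in clusters:
        votes = Counter(seed_labels[w] for w in cluster if w in seed_labels)
        if not votes:
            continue  # no annotated word in this cluster, so nothing to propagate
        label, _ = votes.most_common(1)[0]
        for word in cluster:
            expanded.setdefault(word, label)
    return expanded


# Hypothetical clusters, as a context-based grouping step might produce them.
clusters = [
    ["titanium tetrachloride", "sodium hydroxide"],
    ["furnace", "autoclave"],
    ["stirred", "calcined"],
]
# A small set of hand annotations (illustrative, not from the paper).
seed_labels = {"titanium tetrachloride": "material", "furnace": "equipment"}

expanded = propagate_labels(clusters, seed_labels)
for word, label in sorted(expanded.items()):
    print(f"{word}: {label}")
```

Here two hand labels become four, while the third cluster stays unlabeled because none of its words was annotated. That is the sense in which a small annotated set can be stretched across a much larger body of text.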
Tip of the iceberg
To test the accuracy of the system, however, the team had to rely on the labeled data, because there was no criterion for assessing its performance on the unlabeled data. In those tests, the system identified the paragraphs containing recipes with 99% accuracy and labeled the words within those paragraphs with 86% accuracy.
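Accuracy here can be read as the fraction of items on which the system's output matches the human annotation, which is why hand-labeled data is needed to compute it. The sketch below shows that calculation on a made-up list of word labels. The data and label names are invented for illustration and do not reproduce the reported 99% or 86% figures, and the article does not say exactly how the researchers scored their system.

```python
def accuracy(predicted, gold):
    """Fraction of positions where the predicted label equals the hand label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty label set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Made-up hand labels for seven words in one recipe paragraph.
gold = ["material", "operation", "quantity", "condition",
        "equipment", "material", "operation"]
# Made-up system output that gets one word (position 3) wrong.
predicted = ["material", "operation", "quantity", "quantity",
             "equipment", "material", "operation"]

word_accuracy = accuracy(predicted, gold)
print(f"word-label accuracy: {word_accuracy:.1%}")
```

The same function would serve for the paragraph-level check, with one "recipe" or "not recipe" label per paragraph instead of one label per word.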
The researchers believe that further work can improve the accuracy of the system. In ongoing research, they are investigating a range of deep-learning methods that can make further generalizations about the structure of materials recipes, with the aim of automatically devising recipes for materials not covered in the existing literature.
Much of Olivetti’s earlier research has focused on finding more cost-effective and environmentally friendly ways to produce useful materials. She believes that a database of materials recipes could assist that project.
This is landmark work. The authors have taken on the difficult and ambitious challenge of capturing, through AI methods, strategies employed for the preparation of new materials. The work demonstrates the power of machine learning, but it would be accurate to say that the eventual judge of success or failure would require convincing practitioners that the utility of such methods can enable them to abandon their more instinctual approaches.
Ram Seshadri, the Fred and Linda R. Wudl Professor of Materials Science at the University of California at Santa Barbara.
The National Science Foundation, Office of Naval Research, the Department of Energy, and seed support through the MIT Energy Initiative supported the study. The Natural Sciences and Engineering Research Council of Canada partially supported Kim.