
Department of Physics

The Cavendish Laboratory
 
Machine learning auto-generates databases for magnetic materials discovery

Data science heralds a next-generation approach to materials discovery. Databases of materials and property information can be mined with artificial intelligence to target new materials that are tailored to a specific device application. Success in discovery is nonetheless predicated on having appropriate materials databases to mine. Realising such databases is no small feat: massive quantities of materials and property data lie in the literature, but they are dispersed across millions of documents. Curating such databases manually would require many lifetimes of work.

The Molecular Engineering group at the University of Cambridge has developed software that uses artificial intelligence to auto-generate materials databases.

Their parent software, ChemDataExtractor, is a ‘chemistry aware’ text-mining tool that identifies paired material and property data in scientific documents and automatically collates them into a materials database for a given targeted application. It builds on generic natural language processing and optical character recognition, but its real power lies in the machine learning methods and scientific dictionaries that interpret scientific text, often exploiting the characteristic phrasing that scientists tend to use when reporting results in papers.
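To give a flavour of what extracting a paired material and property value from a recurring sentence construct can look like, here is a minimal Python sketch. It is not ChemDataExtractor's code or API: the regular expression, the `Record` type, the `extract` function and the sample sentences are all hypothetical, written only for illustration, and a real tool would rely on far richer parsing, dictionaries and machine learning than a single pattern.

```python
import re
from typing import List, NamedTuple


class Record(NamedTuple):
    """One extracted material-property pair (hypothetical structure)."""
    compound: str
    prop: str
    value: float
    unit: str


# Assumed sentence construct: "<formula> has/shows/exhibits a <property> temperature of <value> K".
# The formula pattern is a crude stand-in: two or more element-like tokens with optional counts.
FORMULA = r"\b(?P<compound>(?:[A-Z][a-z]?\d*(?:\.\d+)?){2,})"
PROPERTY = r"(?P<prop>Curie|N[ée]el) temperature"
VALUE = r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>K)\b"
PATTERN = re.compile(
    FORMULA + r"\s+(?:has|shows|exhibits)\s+an?\s+" + PROPERTY + r"\s+of\s+" + VALUE
)


def extract(text: str) -> List[Record]:
    """Return every compound/temperature pair that matches the assumed construct."""
    return [
        Record(m["compound"], m["prop"], float(m["value"]), m["unit"])
        for m in PATTERN.finditer(text)
    ]


# Illustrative sentences, not quotations from any paper.
sample = (
    "BiFeO3 has a Néel temperature of 643 K. "
    "In contrast, CrO2 exhibits a Curie temperature of 390 K."
)

for record in extract(sample):
    print(record)
```

A single hand-written pattern like this misses any sentence phrased differently, which is one reason a tool of this kind needs methods that generalise across writing styles.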

Even within the chemistry or physics domain, the style of scientific writing varies considerably from one subfield to another. So while ChemDataExtractor was originally applied to data on organic materials, it has now been developed to interpret data on inorganic materials.
The results of this development have just been published in the Nature journal Scientific Data. This opens up the opportunity to create, and then mine, databases for inorganic materials applications.

Callum Court and Jacqui Cole present a new algorithmic workflow to realise this goal, which employs machine learning with probabilistic reasoning. They illustrate the use of this new method to auto-generate a large materials database for magnetic materials applications; specifically, they present a new materials database of Curie and Néel phase-transition temperatures with their cognate chemical information.

This work feeds into a much larger objective in Molecular Engineering: data-driven magnetic materials discovery. The prediction work is the PhD subject of Callum Court, who is studying within the EPSRC-funded Centre for Doctoral Training in Computational Methods for Materials Science at Cambridge. The predicted materials will be experimentally validated by collaborators at the UK Neutron and Muon Source (the ISIS Facility), which is a founding external partner in the Molecular Engineering initiative at the University of Cambridge.
