Data Extraction Tool May Lead to Discovery of New Polymers – Datanami

July 14, 2023

The amount of published materials science research is growing at an exponential rate, too fast for scientists to keep up. To help these scholars, a first-of-its-kind materials science data extraction pipeline is now available to make their research easier and faster.

Credit: Georgia Tech

The pipeline extracts material property records from published papers and loads the data into a new application called Polymer Scholar. The platform works like a browser, letting users search polymers and material properties by keyword rather than read through countless articles.

The application makes materials research more efficient, which could lead to discovery of new polymers.

"Essentially, we have created an index on materials science literature that is much more granular than ones in a typical index that a search engine would create," said Georgia Tech Ph.D. student Pranav Shetty, the lead designer of the pipeline.

"Our hope is that materials science researchers can make use of this capability in their day to day lives and workflows, and therefore, allow their work to have much more usability toward studying polymers and developing new materials."

The group's paper says the number of materials science papers published each year is growing at a compounded annual rate of 6%. That volume makes keeping up with the literature long, difficult work for scientists and calls for a computing solution.
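To put the 6% figure in perspective, the short Python sketch below works out what compound growth at that rate implies. Only the 6% rate comes from the paper as reported here; the function names and the 10-year horizon are illustrative assumptions.

```python
import math

ANNUAL_GROWTH_RATE = 0.06  # 6% compounded annually, as reported in the group's paper


def doubling_time_years(annual_rate: float) -> float:
    """Years needed for annual output to double at a constant compound rate."""
    return math.log(2) / math.log(1 + annual_rate)


def growth_factor(annual_rate: float, years: int) -> float:
    """How many times larger annual output is after the given number of years."""
    return (1 + annual_rate) ** years


if __name__ == "__main__":
    print(f"Doubling time: {doubling_time_years(ANNUAL_GROWTH_RATE):.1f} years")
    print(f"Growth over 10 years: {growth_factor(ANNUAL_GROWTH_RATE, 10):.2f}x")
```

If the rate held steady, the number of papers published per year would double roughly every 12 years.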

The group's answer is MaterialsBERT, a model they built and trained that powers the pipeline.

MaterialsBERT categorizes words in text by association with a material property record. After the model associates text with records, the data is fed to Polymer Scholar. Scientists can use Polymer Scholar to study the data, searching by either polymer name or property, such as boiling point or tensile strength.
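As a rough illustration of the idea, the Python sketch below shows how per-word labels could be grouped into a single property record. It does not call MaterialsBERT, and it is not the group's implementation: the example sentence, the label names (POLYMER, PROPERTY, VALUE) and both helper functions are hypothetical, and the labels are written by hand rather than predicted by a model.

```python
from itertools import groupby


def group_entities(tokens, labels):
    """Merge consecutive tokens that share a label into (label, text) spans."""
    spans = []
    for label, group in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        if label == "O":  # "O" marks words that belong to no entity
            continue
        spans.append((label, " ".join(token for token, _ in group)))
    return spans


def build_record(spans):
    """Collect labeled spans into one record, keeping the first span per label."""
    record = {}
    for label, text in spans:
        record.setdefault(label, text)
    return record


# Hand-labeled example sentence (hypothetical labels, not model output).
tokens = ["Polystyrene", "has", "a", "glass", "transition", "temperature",
          "of", "100", "°C", "."]
labels = ["POLYMER", "O", "O", "PROPERTY", "PROPERTY", "PROPERTY",
          "O", "VALUE", "VALUE", "O"]

record = build_record(group_entities(tokens, labels))
print(record)
```

A real pipeline would also need to handle sentences that mention several polymers or properties at once, which this sketch deliberately ignores.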

The group used 2.4 million materials science abstracts to train MaterialsBERT. In tests, the model outperformed five other models on three of five entity-recognition datasets.

According to the study, the pipeline needed only 60 hours to obtain 300,000 material property records from over 130,000 abstracts.

For comparison, materials scientists currently use a database called PoLyInfo. This system has over 492,000 material property records, manually curated over many years. Georgia Tech's pipeline can accomplish in hours what took humans years to do in PoLyInfo.
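The figures above translate into a simple throughput estimate, sketched in Python below. The inputs are the numbers reported in this article; the final extrapolation is only a back-of-the-envelope assumption that the extraction rate stays constant, and it says nothing about how automatically extracted records compare in quality with manually curated ones.

```python
# Figures reported in the article.
records_extracted = 300_000
abstracts_processed = 130_000  # reported as "over 130,000", so a lower bound
hours_taken = 60
polyinfo_records = 492_000     # reported as "over 492,000"

# Average extraction rate of the pipeline.
records_per_hour = records_extracted / hours_taken

# Upper bound on records per abstract, since the abstract count is a lower bound.
records_per_abstract = records_extracted / abstracts_processed

# Assumption: constant rate. Hours to extract a PoLyInfo-sized record count.
hours_for_polyinfo_scale = polyinfo_records / records_per_hour

print(f"{records_per_hour:.0f} records per hour")
print(f"about {records_per_abstract:.1f} records per abstract")
print(f"about {hours_for_polyinfo_scale:.0f} hours for {polyinfo_records:,} records")
```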

"Polymer Scholar and MaterialsBERT are powered by a large corpus of 2.4 million materials science articles, which took some time and effort to develop the infrastructure to support such a large collection," said Chao Zhang, an assistant professor in the School of Computational Science and Engineering (CSE). "This body of papers made all the difference training MaterialsBERT because it improved the language model's ability to identify and extract data."

Polymer research is vital because of the role polymers play in manufacturing, healthcare, electronics, and other industries. Polymers have desirable properties that make them useful for future applications. When polymer research slows, it inhibits the development of new technologies. These technologies are needed to overcome today's challenges like climate change, faltering infrastructure, and sustainable energy.

In their paper, the group analyzed data using polymer solar cells, fuel cells, and supercapacitors as keywords in Polymer Scholar. The analysis showed that scholars can use the pipeline to infer trends and phenomena in materials science literature. The examples also demonstrated the pipeline's practical applicability.

The journal npj Computational Materials published the group's paper because of its findings.

The group's work embodies Georgia Tech's commitment to interdisciplinary scholarship. Researchers from the School of CSE and the School of Materials Science and Engineering (MSE) collaborated on the pipeline.

School of CSE authors include Shetty, Zhang, and Ph.D. student Sonakshi Gupta. MSE authors include postdoctoral researchers Arunkumar Chitteth Rajan and Christopher Kuenneth, undergraduate students Lakshmi Prerana Panchumarti and Lauren Holm, and Professor Rampi Ramprasad.

The pipeline is the latest work from the group, which is committed to applying computational methods to lead innovations in materials science.

"Our long-term vision is to use the extracted data to train models that can predict material properties," Ramprasad said. "Creating a pipeline to extract this data that can seamlessly feed into predictive models will ultimately lead to an extraordinary pace of materials discovery."

Source: Georgia Tech
