Quantifying the distribution of materials data types in scientific literature across text, tables, and figures

Hasan M. Sayeed; Wade Smallwood; Sterling G. Baird; Taylor D. Sparks

doi:10.26434/chemrxiv-2023-wd5cr-v2

Materials Science

16 November 2023, Version 2

This is not the most recent version. A newer version of this content is available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Materials science is a multifaceted field in which valuable data are scattered across research papers in various formats. Efficient extraction of these data is essential for further analysis and research. This study examines how data are distributed in materials science papers and how the data types are interconnected. In this preliminary analysis, we systematically examined 10 randomly selected materials science papers to determine where four key data types (composition, processing conditions, characterization, and performance properties) reside within the text, tables, and figures. Our findings reveal patterns in how data are presented, ranging from conventional text-based descriptions to detailed tables and visually informative figures. The analysis covers diverse materials and highlights cases where data types are isolated in one source or interconnected across several. We also address the challenges and limitations encountered during annotation. This investigation underscores the importance of understanding data distribution within materials science papers, which has direct implications for data accessibility and integration in the field. These insights also pave the way for future research, particularly the development of NLP models tailored to the characteristics of materials science papers, along with other machine learning techniques for more efficient data extraction and analysis.

Supplementary materials

Supplementary Information

A comprehensive and detailed breakdown of the data type distribution within the ten analyzed materials science papers. This complements the summarized data distribution table presented in the main body of the paper, offering a more exhaustive view of how data types are distributed across text, tables, and figures in each paper.

Version History

Dec 29, 2023 Version 3

Nov 16, 2023 Version 2

Nov 14, 2023 Version 1

Version Notes

Change in citation information in Supplementary Information.

Metrics

Views: 1,057

Downloads: 682

License

The content is available under a CC BY-NC 4.0 license.

DOI

10.26434/chemrxiv-2023-wd5cr-v2

Funding

NSF, DMR 2334411

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
