University of Leicester

Transformation from human-readable documents and archives in arc welding domain to machine-interpretable data

journal contribution
posted on 2021-04-20, 14:41 authored by Z Dong, S Paul, K Tassenberg, G Melton, H Dong
The capability to extract useful information from documents and transform it into knowledge is essential to advancing technological innovation in industry. However, the overwhelming majority of scientific literature is published in unstructured, human-readable formats that are incompatible with machine analysis via contemporary artificial intelligence (AI) methods, which discover knowledge from data. Therefore, extraction approaches that transform unstructured data are fundamental to establishing state-of-the-art digital knowledge-based platforms. In this paper, we integrated multiple Python libraries and developed a method, delivered as a cohesive package, for automated data extraction and quick processing that converts unstructured documents into machine-interpretable data. The transformed data can then be combined with AI analytical methods. The output files show excellent quality of digitalised data, without major flaws in terms of context inconsistency. All scripts were written in Python, with functional modules that make the package easy to access and use. The finalised, well-structured data can be used for further knowledge discovery.

History

Citation

Computers in Industry, Volume 128, June 2021, 103439

Author affiliation

School of Engineering

Version

  • VoR (Version of Record)

Published in

Computers in Industry

Volume

128

Pagination

103439

Publisher

Elsevier BV

issn

0166-3615

Acceptance date

2021-03-03

Copyright date

2021

Available date

2021-04-20

Language

en
