posted on 2021-04-20, 14:41 authored by Z Dong, S Paul, K Tassenberg, G Melton, H Dong
The capability to extract useful information from documents and transform it into knowledge is essential for advancing technological innovation in industry. However, the overwhelming majority of scientific literature is published in unstructured, human-readable formats that are incompatible with machine analysis by contemporary artificial intelligence (AI) methods, which discover knowledge effectively from data. Approaches that extract and transform unstructured data are therefore fundamental to establishing state-of-the-art digital knowledge-based platforms. In this paper, we integrated multiple Python libraries and developed a method, delivered as a cohesive package, for automated data extraction and quick processing that converts unstructured documents into machine-interpretable data. The transformed data can then be combined with AI analytical methods. The output files have shown excellent quality of digitalised data, without major flaws in terms of context inconsistency. All scripts were written in Python, with functional modules that are easy to access and efficient in achieving these objectives. Ultimately, the finalised, well-structured data can be used for further knowledge discovery.
Citation
Computers in Industry, Volume 128, June 2021, 103439