Abstract
Coal mining accidents are a major concern worldwide, and preventing them requires effective safety measures and comprehensive analysis of past incidents. The solution proposed here is the first such attempt for Indian mines. It is motivated by the ability of Natural Language Processing (NLP) to read and analyze vast repositories of accident records in seconds. Combined with machine learning (ML), NLP algorithms can extract information from unstructured text, which eliminates manual data entry errors. They can also read poorly scanned reports, reconcile multiple versions of an event, and cluster documents by type, work that would otherwise take months to collate. Applied to accident records, these methods help capture recurring issues, contributing factors, and high-risk areas, so that proactive measures can be taken to prevent future accidents. The heart of the study is the application of two algorithms: latent Dirichlet allocation (LDA) and Rapid Automatic Keyword Extraction (RAKE). LDA is a topic modeling technique used here to cluster accidents based on their descriptions. RAKE supports root cause analysis by extracting keywords from accident descriptions and from the remedies suggested by inspection officers. Both are unsupervised techniques that do not require training on labeled datasets. AI and NLP can significantly enhance the creation of Swiss Cheese Models and Logic Sequences of Contributory Factors Diagrams by automating the extraction, classification, and analysis of data from incident reports and other relevant documents. The data analyzed in this study came from Directorate General of Mines Safety (DGMS), India, records from 2010 to 2015.
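To make the keyword extraction step concrete, the following is a minimal sketch of the general RAKE idea in plain Python. It is not the authors' implementation. The stopword list, the function names `candidate_phrases` and `rake_keywords`, and the sample sentence are all assumptions made for illustration, and no DGMS data is used. The sketch splits text into candidate phrases at stopwords and punctuation, scores each word by its degree divided by its frequency, and scores each phrase as the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; practical RAKE uses a far longer one).
STOPWORDS = {
    "a", "an", "and", "the", "of", "to", "in", "on", "at", "by", "for",
    "with", "from", "while", "due", "is", "was", "were", "had", "been", "not",
}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in STOPWORDS:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_keywords(text, top_n=5):
    """Return up to top_n (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurring words in the phrase, including the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    phrase_score = {
        " ".join(phrase): sum(word_score[w] for w in phrase)
        for phrase in set(phrases)
    }
    # Sort by descending score, then alphabetically so ties are ordered predictably.
    return sorted(phrase_score.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    # Invented sentence for illustration only; not taken from any accident record.
    sample = "Roof fall occurred in the gallery while the crew was setting props."
    for phrase, score in rake_keywords(sample):
        print(f"{score:.2f}  {phrase}")
```

Longer phrases built from words that tend to appear in long phrases score highest, which is why multi-word descriptions of a cause tend to surface ahead of isolated terms.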
Recommended Citation
S. Agarwal et al., "Application of Natural Language Processing and Machine Learning for Analyzing Mining Accident Reports and Automating the Process of Root Cause Analysis," International Journal of Coal Science and Technology, vol. 12, no. 1, article no. 91, SpringerOpen, Dec 2025.
The definitive version is available at https://doi.org/10.1007/s40789-025-00822-0
Department(s)
Engineering Management and Systems Engineering
Second Department
Electrical and Computer Engineering
Publication Status
Open Access
Keywords and Phrases
Accidents; Coal mine safety; Digitalization; Natural language processing; Topic modeling
International Standard Serial Number (ISSN)
2198-7823; 2095-8293
Document Type
Article - Journal
Document Version
Final Version
File Type
text
Language(s)
English
Rights
© 2025 The Authors, All rights reserved.
Creative Commons Licensing

This work is licensed under a Creative Commons Attribution 4.0 License.
Publication Date
01 Dec 2025
Included in
Electrical and Computer Engineering Commons, Operations Research, Systems Engineering and Industrial Engineering Commons
