Comparative Study using Inverse Ontology Cogency and Alternatives for Concept Recognition in the Annotated National Library of Medicine Database

Abstract

This paper introduces inverse ontology cogency, a biologically inspired concept recognition process and distance function that is competitive with alternative methods. It is a novel distance measure used to select the optimum mapping between ontology-specified concepts and phrases in free-form text. Automated named entity recognition, or concept recognition, is a common task in natural language processing. We also apply a multi-layer perceptron and text processing method to named entity recognition as an alternative to recurrent neural network methods, and discuss similarities between confabulation theory and existing language models. We compare against MetaMap from the National Library of Medicine (NLM), a popular tool used in medicine to map free-form text to concepts in a medical ontology. The NLM provides a database of medical literature manually annotated with concept labels, a unique and valuable source of ground truth that permits comparison with MetaMap performance. Comparisons across different feature set combinations demonstrate the effectiveness of inverse ontology cogency for entity recognition. Results indicate that using both inverse ontology cogency and corpora cogency improved concept recognition precision by 20% over the best published MetaMap results, demonstrating a new, effective approach for identifying medical concepts in text. This is the first time cogency has been explicitly invoked for reasoning with ontologies, and the first time it has been used on medical literature where high-quality ground truth is available for quality assessment.

Department(s)

Engineering Management and Systems Engineering

Second Department

Electrical and Computer Engineering

Research Center/Lab(s)

Center for High Performance Computing Research

Second Research Center/Lab

Intelligent Systems Center

Keywords and Phrases

Cogent confabulation; Concept recognition; Language model; Natural language processing

International Standard Serial Number (ISSN)

0893-6080

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2021 Elsevier, All rights reserved.

Publication Date

01 Jul 2021

PubMed ID

33684612
