Graphical Figure Classification Using Data Fusion for Integrating Text and Image Features

Abstract

This paper describes a multimodal (image + text) learning approach for automatically identifying three graphical figure types commonly found in biomedical literature: diagrams, statistical figures, and flow charts. The goal is to improve the retrieval of figures from biomedical journal articles. We describe a data fusion approach that combines information from text and image sources, which are believed to carry complementary information. Text information about each figure is extracted from its caption. In the data fusion process, a hybrid of an evolutionary algorithm (EA) and Binary Particle Swarm Optimization (BPSO) is applied to find an optimal subset of the extracted image features. The chi-square statistic and the information gain metric are used to select an optimal subset of the extracted text features. The selected text and image features are input to Multi-Layer Perceptron Neural Network (MLP-NN) classifiers, whose outputs are characterized as fuzzy sets to determine the final classification result. Evaluation on 1707 figure images from a test subset of BioMed Central® journals, drawn from the U.S. National Library of Medicine's PubMed Central® repository, yielded classification accuracy as high as 96.1%.
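The text-side feature selection step can be illustrated with a minimal Python sketch of chi-square scoring of caption words against figure classes. This is not the authors' implementation: the toy captions, the names `chi_square` and `select_terms`, and the choice to score each term by its maximum chi-square value over all classes are illustrative assumptions, and the information gain metric is not shown.

```python
def chi_square(term, target, docs):
    """Chi-square statistic for a term/class 2x2 contingency table.

    docs is a list of (set_of_caption_terms, class_label) pairs (an assumed format).
    """
    a = b = c = d = 0
    for terms, label in docs:
        has_term = term in terms
        in_class = label == target
        if has_term and in_class:
            a += 1  # term present, document in target class
        elif has_term:
            b += 1  # term present, document in another class
        elif in_class:
            c += 1  # term absent, document in target class
        else:
            d += 1  # term absent, document in another class
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of association
    return len(docs) * (a * d - c * b) ** 2 / denom


def select_terms(docs, k):
    """Keep the k terms with the highest chi-square score over any class (assumed rule)."""
    vocab = set().union(*(terms for terms, _ in docs))
    labels = {label for _, label in docs}
    scores = {t: max(chi_square(t, y, docs) for y in labels) for t in vocab}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy captions, not data from the paper.
docs = [
    ({"bar", "mean", "error"}, "statistical"),
    ({"plot", "mean", "p"}, "statistical"),
    ({"pathway", "arrow", "model"}, "diagram"),
    ({"model", "schematic"}, "diagram"),
    ({"step", "workflow", "arrow"}, "flowchart"),
    ({"workflow", "step"}, "flowchart"),
]
```

On these toy captions, `chi_square("mean", "statistical", docs)` is 6.0 (perfect association over six documents), while `"arrow"`, which appears in both a diagram and a flow chart caption, scores only 0.375 against `"diagram"`; `select_terms(docs, 2)` returns `['mean', 'model']`. The selected terms would then form the text feature vector passed, together with the selected image features, to the classifiers.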

Meeting Name

12th International Conference on Document Analysis and Recognition (ICDAR) (2013: Aug. 25-28, Washington, DC)

Department(s)

Electrical and Computer Engineering

Keywords and Phrases

Biomedical literature; Chi-square statistics; Classification accuracy; Classification results; Multi-layer perceptron neural networks; National Library of Medicine; Set intersection; Data fusion; Feature extraction; Fuzzy sets; Optimization; Text processing; Image processing; Binary Particle Swarm Optimization (BPSO); Evolutionary Algorithm (EA); Feature Selection; Fuzzy Set Intersection; Fuzzy Set Union; Multi-Layer Perceptron Neural Network (MLP-NN)

International Standard Serial Number (ISSN)

1520-5363

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2013 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.

Publication Date

01 Oct 2013
