Ensemble LUT Classification For Degraded Document Enhancement

Abstract

The rapid evolution of scanning and computing technologies has led to the creation of large collections of scanned paper documents, such as historical collections, legal depositories, medical archives, and business archives. Moreover, in many situations, such as legal litigation and security investigations, scanned collections are used to facilitate systematic exploration of the data. Scanned documents almost always suffer from some form of degradation. Severe degradation makes documents hard to read and substantially deteriorates the performance of automated document processing systems. Enhancement of degraded document images is normally performed under global degradation models, which do not perform well when the degradation is large. Instead, we propose to estimate local degradation models and use them to enhance degraded document images. Using a semi-automated enhancement system, we labeled a subset of the Frieder diaries collection.¹ This labeled subset was then used to train an ensemble classifier whose component classifiers are based on lookup tables (LUTs) combined with the approximate nearest neighbor algorithm. The resulting algorithm is highly efficient. Experimental evaluation results are provided using the Frieder diaries collection.¹ © 2008 SPIE-IS&T.
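The general idea of an ensemble of LUT classifiers with a nearest-neighbor fallback can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes binary pixel neighborhoods packed into integer keys and hypothetical per-pattern labels (for instance a local degradation class or a target pixel value). It also assumes bootstrap bagging as the ensemble scheme and uses brute-force Hamming-distance search as a stand-in for a real approximate nearest-neighbor index. None of these choices is confirmed by the abstract.

```python
from collections import Counter, defaultdict
import random


def encode_patch(patch):
    """Pack a flat sequence of 0/1 pixels into an integer LUT key (bit i = pixel i).
    Assumption: patches are binary; the abstract does not specify the feature encoding."""
    key = 0
    for i, bit in enumerate(patch):
        key |= (bit & 1) << i
    return key


class LUTClassifier:
    """Lookup table mapping a pattern key to the majority label seen in training.
    Keys absent from the table fall back to the nearest stored key by Hamming
    distance. This is an exact brute-force scan, used here only as a stand-in
    for an approximate nearest-neighbor index."""

    def __init__(self):
        self.table = {}

    def fit(self, keys, labels):
        votes = defaultdict(Counter)
        for k, y in zip(keys, labels):
            votes[k][y] += 1
        # Each table entry stores the most frequent label for that pattern.
        self.table = {k: c.most_common(1)[0][0] for k, c in votes.items()}
        return self

    def predict_one(self, key):
        if key in self.table:  # fast path: a single dictionary lookup
            return self.table[key]
        nearest = min(self.table, key=lambda k: bin(k ^ key).count("1"))
        return self.table[nearest]


class LUTEnsemble:
    """Bagged ensemble of LUT classifiers combined by majority vote.
    Assumption: bootstrap resampling is a hypothetical choice of ensemble scheme."""

    def __init__(self, n_members=5, seed=0):
        self.n_members = n_members
        self.rng = random.Random(seed)
        self.members = []

    def fit(self, keys, labels):
        n = len(keys)
        self.members = []
        for _ in range(self.n_members):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            member = LUTClassifier().fit([keys[i] for i in idx],
                                         [labels[i] for i in idx])
            self.members.append(member)
        return self

    def predict_one(self, key):
        votes = Counter(m.predict_one(key) for m in self.members)
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: two hypothetical 3-pixel patterns with made-up labels.
    keys = [encode_patch([0, 0, 0])] * 10 + [encode_patch([1, 1, 1])] * 10
    labels = ["white"] * 10 + ["black"] * 10
    ens = LUTEnsemble(n_members=3, seed=0).fit(keys, labels)
    print(ens.predict_one(encode_patch([1, 1, 0])))  # pattern not seen in training
```

In this sketch, a pattern that is already in a table costs one dictionary lookup per ensemble member. Only unseen patterns pay for the neighbor search. Replacing the linear scan with a genuine approximate nearest-neighbor index therefore matters mainly when many test patterns are missing from the tables.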

Department(s)

Electrical and Computer Engineering

Keywords and Phrases

Document degradation models; Document image analysis; Ensemble classification; Historical documents; Image enhancement

International Standard Book Number (ISBN)

978-081946987-8

International Standard Serial Number (ISSN)

0277-786X

Document Type

Article - Conference proceedings

Document Version

Final Version

File Type

text

Language(s)

English

Rights

© 2023 Society of Photo-optical Instrumentation Engineers, All rights reserved.

Publication Date

31 Mar 2008
