Abstract

Given a collection of strings, the goal of approximate string matching is to efficiently find the strings in the collection that are similar to a query string. In this paper, we focus on edit distance as the measure of similarity between two strings. Existing q-gram-based methods for this problem use inverted indexes to index the q-grams of the given string collection. These methods begin by generating the q-grams of the query string (disjoint or overlapping) and then merge the inverted lists of these q-grams. Several filtering techniques have been proposed to segment the inverted lists into relatively shorter lists, thus reducing the merging cost. We use a filtering technique, which we call "position restricted alignment", that combines the well-known length filtering and position filtering to provide more aggressive pruning. We then provide an indexing scheme that integrates inverted-list storage with the proposed filter, enabling us to auto-filter the inverted lists. We evaluate the effectiveness of the proposed approach through thorough experimentation. © 2013 ACM.
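For readers unfamiliar with the building blocks named in the abstract, the following is a minimal Python sketch of a q-gram inverted index queried with the standard length and position filters, followed by edit-distance verification. It is not the paper's "position restricted alignment" filter or its indexing scheme, whose details are not given here. The function names (`qgrams`, `build_index`, `edit_distance`, `search`) and the use of the classical q-gram count bound are assumptions made for illustration only.

```python
from collections import defaultdict


def qgrams(s, q):
    """Overlapping q-grams of s, each paired with its start position."""
    return [(s[i:i + q], i) for i in range(len(s) - q + 1)]


def build_index(strings, q):
    """Inverted index: q-gram -> list of (string id, position in that string)."""
    index = defaultdict(list)
    for sid, s in enumerate(strings):
        for gram, pos in qgrams(s, q):
            index[gram].append((sid, pos))
    return index


def edit_distance(a, b):
    """Classic dynamic-programming (Levenshtein) edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def search(query, strings, index, q, k):
    """Return the strings within edit distance k of query (illustrative sketch)."""
    counts = defaultdict(int)
    for gram, qpos in qgrams(query, q):
        for sid, pos in index.get(gram, ()):
            # Length filter: lengths of strings within distance k differ by at most k.
            if abs(len(strings[sid]) - len(query)) > k:
                continue
            # Position filter: a surviving q-gram shifts by at most k positions.
            if abs(pos - qpos) > k:
                continue
            counts[sid] += 1

    results = []
    for sid, s in enumerate(strings):
        if abs(len(s) - len(query)) > k:
            continue
        # Classical count bound: each edit destroys at most q q-grams.
        # A non-positive threshold prunes nothing, so such strings are verified.
        threshold = max(len(s), len(query)) - q + 1 - k * q
        if counts.get(sid, 0) >= threshold and edit_distance(query, s) <= k:
            results.append(s)
    return results
```

As a usage example, indexing `["kitten", "sitten", "sitting", "mitten", "kitchen", "banana"]` with `q = 2` and calling `search("kitten", strings, index, 2, 1)` returns the three strings within one edit of the query. The sketch scans whole inverted lists and discards entries one at a time; the abstract's point is that storing the lists so that the filter is applied implicitly avoids touching most of those entries at all.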

Department(s)

Computer Science

International Standard Book Number (ISBN)

978-145031599-9

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2024 Association for Computing Machinery, All rights reserved.

Publication Date

02 May 2013
