Abstract
Given a collection of strings, goal of the approximate string matching is to efficiently find the strings in the collection that are similar to a query string. In this paper, we focus on edit distance as measure to quantify the similarity between two strings. Existing q-gram based methods to address this problem use inverted indexes to index the q-grams of given string collection. These methods begin by generating the q-grams of query string (disjoint or overlapping) and then merge the inverted lists of these q-grams. Several filtering techniques have been proposed so as to segment inverted lists to relatively shorter lists thus reducing the merging cost. We use a filtering technique which we call as "position restricted alignment" that combines well known length filtering and position filtering to provide more aggressive pruning. We then provide an indexing scheme that integrates the inverted lists storage with the proposed filter thus enabling us to auto-filter the inverted lists. We evaluate the effectiveness of the proposed approach by thorough experimentation. © 2013 ACM.
Recommended Citation
M. Patil et al., "Approximate String Matching By Position Restricted Alignment," ACM International Conference Proceeding Series, pp. 384 - 391, Association for Computing Machinery, May 2013.
The definitive version is available at https://doi.org/10.1145/2457317.2457388
Department(s)
Computer Science
International Standard Book Number (ISBN)
978-145031599-9
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2024 Association for Computing Machinery, All rights reserved.
Publication Date
02 May 2013