Using Multiset Discrimination to Solve Language Processing Problems Without Hashing

Abstract

It is generally assumed that hashing is essential to solve many language processing problems efficiently; e.g. symbol table formation and maintenance, grammar manipulation, basic block optimization, and global optimization. This paper questions this assumption, and initiates development of an efficient alternative compiler methodology without hashing or sorting. The methodology rests on efficient solutions to the basic problem of detecting duplicate values in a multiset, which we call multiset discrimination. Paige and Tarjan (1987) gave an efficient solution to multiset discrimination for detecting duplicate elements occurring in a multiset of varying length strings. The technique was used to develop an improved algorithm for lexicographic sorting, whose importance stems largely from its use in solving a variety of isomorphism problems (Aho et al., 1974). The current paper and a related paper (Paige, 1994) show that full lexicographic sorting is not needed to solve these isomorphism problems, because they can be solved more efficiently using straightforward extensions to the simpler multiset discrimination technique. By reformulating language processing problems in terms of multiset discrimination, we also show how almost every subtask of compilation can be solved without hashing in worst case running time no worse (and frequently better) than the best previous expected time solution (under the assumption that one hash operation takes unit expected time). Because of their simplicity, our solutions may be of practical as well as theoretical interest.The various applications presented culminate with a new algorithm to solve iterated strength reduction folded with useless code elimination that runs in worst case asymptotic time and auxiliary space Θ(¦L¦ + ¦L'¦), where ¦L¦ and ¦L'¦ represent the lengths of the initial and optimized programs, respectively. The previous best solution due to Cocke and Kennedy (1977) takes Ω(¦L¦3¦L'¦) has operations in the worst case.

Department(s)

Mathematics and Statistics

International Standard Serial Number (ISSN)

0304-3975

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 1995 Elsevier, All rights reserved.

Publication Date

01 Jan 1995

Share

 
COinS