Abstract
Many large programs operate on collection types. Extensive libraries are available in many programming languages, such as the C++ Standard Template Library, which make programming with collections convenient. Extending programming languages to provide collection queries as first-class constructs in the language would not only allow programmers to write queries explicitly in their programs but it would also allow compilers to leverage the wealth of experience available from the database domain to optimize such queries. This paper describes an approach to reduce the run time of programs involving explicit collection queries by performing run time query optimization that is effective for single runs of a program. In addition, it also leverages a cache to store previously computed results. The proposed approach relies on histograms built from the data at run time to estimate the selectivity of joins and predicates in order to construct query plans. Information from earlier executions of the same query during run time is leveraged during the construction of the query plans, even when the data has changed between these executions. An effective cache policy is also determined for caching the results of join (sub) queries. The cache is maintained incrementally, when the underlying collections change, and use of the cache space is optimized by a cache replacement policy. Our approach has been implemented within the Java Query Language (JQL) framework using Aspect J. Our approach demonstrated that its run time query optimization in integration with caching sub query result significantly improves the run time of programs with explicit queries over equivalent programs performing collection operations by iterating over those collections. This paper evaluates our approach using synthetic as well as real world Robocode programs by comparing it to JQL as a benchmark. Experimental results show that our approach performs better than the JQL approach with respect to the program run time. © Springer Science + Business Media New York 2013.
Recommended Citation
V. K. Nerella et al., "Exploring Optimization and Caching for Efficient Collection Operations," Automated Software Engineering, vol. 21, no. 1, pp. 3 - 40, Springer, Jan 2014.
The definitive version is available at https://doi.org/10.1007/s10515-013-0119-x
Department(s)
Computer Science
Keywords and Phrases
Cache policy; Collection operations; Join caching; Joins; Query optimization; Run time; Selectivity
International Standard Serial Number (ISSN)
1573-7535; 0928-8910
Document Type
Article - Journal
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2024 Springer, All rights reserved.
Publication Date
01 Jan 2014