Abstract
Similarity searches are a critical task in data mining. As datasets grow larger, exact nearest neighbor searches quickly become unfeasible, leading to the adoption of approximate nearest neighbor (ANN) searches. ANN has been studied for text data, images, and trajectories. However, there has been little effort to develop ANN systems for polygons in spatial database systems and geographic information systems. We present PolyMinHash, a system for approximate polygon similarity search that adapts MinHashing into a novel 2D polygon-hashing scheme to generate short, similarity-preserving signatures of input polygons. Minhash is generated by counting the number of randomly sampled points needed before the sampled point lands within the polygon's interior area, yielding hash values that preserve area-based Jaccard similarity. We present the trade-off between search accuracy and runtime of our PolyMinHash system. Our hashing mechanism reduces the number of candidates to be processed in the query refinement phase by up to 98% compared to the number of candidates processed by the Brute Force (Exact Search) algorithm1.
Recommended Citation
A. Subedi et al., "PolyMinHash: Efficient Area-Based MinHashing of Polygons for Approximate Nearest Neighbor Search," 33rd ACM Sigspatial International Conference on Advances in Geographic Information Systems ACM Sigspatial 2025, pp. 553 - 556, Association for Computer Machinery, Dec 2025.
The definitive version is available at https://doi.org/10.1145/3748636.3762767
Department(s)
Computer Science
Publication Status
Free Access
Keywords and Phrases
jaccard distance; locality sensitive hashing; minhashing; nearest neighbor; polygons; similarity search
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2026 Association for Computing Machinery, All rights reserved.
Publication Date
12 Dec 2025
