Personalized Web Image Organization


Due to the problem of semantic gap, i.e. the visual content of an image may not represent its semantics well, existing efforts on web image organization usually transform this task to clustering the surrounding text. However, because the surrounding text is usually short and the words therein usually appear only once, existing text clustering algorithms can hardly use the statistical information for image representation and may achieve downgraded performance with higher computational cost caused by learning from noisy tags. This chapter presents using the Probabilistic ART with user preference architecture, as introduced in Sects. 3.5 and 3.4, for personalized web image organization. This fused algorithm is named Probabilistic Fusion ART (PF-ART), which groups images of similar semantics together and simultaneously mines the key tags/topics of individual clusters. Moreover, it performs semi-supervised learning using the user-provided taggings for images to give users direct control of the generated clusters. An agglomerative merging strategy is further used to organize the clusters into a hierarchy, which is of a multi-branch tree structure rather than a binary tree generated by traditional hierarchical clustering algorithms. The entire two-step algorithm is called Personalized Hierarchical Theme-based Clustering (PHTC), for tag-based web image organization. Two large-scale real-world web image collections, namely the NUS-WIDE and the Flickr datasets, are used to evaluate PHTC and compare it with existing algorithms in terms of clustering performance and time cost.


Electrical and Computer Engineering

Research Center/Lab(s)

Center for High Performance Computing Research

International Standard Serial Number (ISSN)


Document Type

Book - Chapter

Document Version


File Type





© 2019 Springer London, All rights reserved.

Publication Date

01 May 2019