ECO: event detection from click-through data via query clustering
"The web is an index of real-world events, and a lot of knowledge can be mined from web resources and their derivatives. Event detection is a recent research topic in web data mining, triggered by the increasing popularity of search engines. In the visitor-centric approach, the click-through data generated by web search engines is the starting resource, based on the intuition that such data is often event-driven. In this thesis, a retrospective algorithm is proposed to detect such real-world events from the click-through data. This approach differs from the existing work in that it: (i) considers the click-through data as collaborative query sessions instead of mere web logs and tries to understand user behavior; (ii) tries to integrate the semantics, structure, and content of queries and pages; (iii) aims to achieve the overall objective via query clustering. The problem of event detection is transformed into query clustering by generating clusters called hybrid cover graphs; each hybrid cover graph corresponds to a real-world event. To ensure quality, the evolutionary pattern of the co-occurrences of query-page pairs in a hybrid cover graph is enforced over a moving window period. The approach is experimentally evaluated on a commercial search engine's data collected over 3 months, with about 20 million web queries and page clicks from 650,000 users. The results outperform the most recent work in this domain in terms of number of events detected, F-measure, entropy, recall, etc."--Abstract, page iv.
Madria, Sanjay Kumar
M.S. in Computer Science
Air Force Research Laboratory (Wright-Patterson Air Force Base, Ohio)
Missouri University of Science and Technology
ix, 42 pages
© 2010 Prabhu Kumar Angajala, All rights reserved.
Thesis - Open Access
Association rule mining
Querying (Computer science)
Semantics -- Data processing
Angajala, Prabhu Kumar, "Event detection from click-through data via query clustering" (2010). Masters Theses. 4984.