Alternative Title

ECO: event detection from click-through data via query clustering

Abstract

"The web is an index of real-world events and lot of knowledge can be mined from the web resources and their derivatives. Event detection is one recent research topic triggered from the domain of web data mining with the increasing popularity of search engines. In the visitor-centric approach, the click-through data generated by the web search engines is the start up resource with the intuition: often such data is event-driven. In this thesis, a retrospective algorithm is proposed to detect such real-world events from the click-through data. This approach differs from the existing work as it: (i) considers the click-through data as collaborative query sessions instead of mere web logs and try to understand user behavior (ii) tries to integrate the semantics, structure, and content of queries and pages (iii) aims to achieve the overall objective via Query Clustering. The problem of event detection is transformed into query clustering by generating clusters - hybrid cover graphs; each hybrid cover graph corresponds to a real-world event. The evolutionary pattern for the co-occurrences of query-page pairs in a hybrid cover graph is imposed for the quality purpose over a moving window period. Also, the approach is experimentally evaluated on a commercial search engine's data collected over 3 months with about 20 million web queries and page clicks from 650000 users. The results outperform the most recent work in this domain in terms of number of events detected, F-measures, entropy, recall etc."--Abstract, page iv.

Advisor(s)

Madria, Sanjay Kumar

Committee Member(s)

Leopold, Jennifer
Erçal, Fikret

Department(s)

Computer Science

Degree Name

M.S. in Computer Science

Sponsor(s)

Air Force Research Laboratory (Wright-Patterson Air Force Base, Ohio)

Publisher

Missouri University of Science and Technology

Publication Date

Fall 2010

Pagination

ix, 42 pages

Rights

© 2010 Prabhu Kumar Angajala, All rights reserved.

Document Type

Thesis - Open Access

File Type

text

Language

English

Library of Congress Subject Headings

Association rule mining
Data mining
Querying (Computer science)
Semantics -- Data processing

Thesis Number

T 9715

Print OCLC #

724116197

Electronic OCLC #

745911546
