Abstract

Sets and bags are closely related structures and have been studied in relational databases. A bag differs from a set in that it is sensitive to the number of times an element occurs, while a set is not. In this paper, we introduce the concept of a Web bag in the context of a World Wide Web warehouse called WHOWEDA (WareHouse Of WEb DAta), which we are currently building. Informally, a Web bag is a Web table that allows multiple occurrences of identical Web types. A Web bag helps one to discover useful knowledge from a Web table, such as visible documents or Web sites (i.e., documents or sites that can be reached by many paths), luminous documents (i.e., documents with many outgoing links), and luminous paths (i.e., frequently traversed paths). We also provide a cost-benefit analysis of materializing Web bags as compared to Web tables with distinct Web tuples.
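The distinction between a set and a bag described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper or from WHOWEDA: the tuple representation (a path of document names), the sample data, and the helper `reach_counts` are all hypothetical, chosen only to show how keeping duplicates changes a simple count of how often a document is reached.

```python
from collections import Counter

# Hypothetical Web tuples: each one is a path of document names.
# This representation and the data are assumptions for illustration only.
web_tuples = [
    ("a.html", "b.html", "d.html"),
    ("a.html", "c.html", "d.html"),
    ("a.html", "b.html", "d.html"),  # identical to the first tuple
]

# Set semantics: identical tuples collapse into one.
as_set = set(web_tuples)

# Bag semantics: each tuple keeps its multiplicity.
as_bag = Counter(web_tuples)


def reach_counts(tuples):
    """Count, per document, how many of the given paths reach it.

    The first document of a path is treated as the starting point and
    is not counted as reached. Hypothetical helper, not from the paper.
    """
    counts = Counter()
    for path in tuples:
        for target in path[1:]:
            counts[target] += 1
    return counts


# With duplicates kept, "d.html" is reached by three paths;
# with duplicates removed, only by two.
bag_counts = reach_counts(web_tuples)
set_counts = reach_counts(as_set)
```

In this toy data the duplicate path raises the count for `d.html` and `b.html`, which is the kind of multiplicity information that a table restricted to distinct tuples would discard.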

Meeting Name

International Symposium on Database Engineering and Applications, 1999

Department(s)

Computer Science

Keywords and Phrases

WHOWEDA; Web Bags; Web Tables; Web Tuples; World Wide Web Warehouse; Cost-Benefit Analysis; Data Mining; Data Structures; Data Warehouses; Element Occurrence; Fan-In; Fan-Out; Frequently Traversed Paths; Identical Web Types; Information Resources; Luminous Documents; Luminous Paths; Outgoing Links; Search Engines; Useful Knowledge Discovery; Visible Web Sites; Visible Documents

Document Type

Article - Conference proceedings

Document Version

Final Version

File Type

text

Language(s)

English

Rights

© 1999 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.

Publication Date

01 Jan 1999
