Directed Graph
The Consistency and Conformance of Web Document Collection Based on Heterogeneous DAC Graph
Marek Kopel and Aleksander Zgrzywa
www.iis.pwr.wroc.pl
www.zsi.pwr.wroc.pl

Outline
• Background & Idea
• Personal Web of Trust
• User and Agent Trust
• Local Document Ranking & Filtering
• Example Scenario
• Conclusions & Future Work

Relationships in WWW
• directed graph: the most common model of a Web document collection
• documents' hyperlinking relationship (edges) → PageRank, HITS
• Tim Berners-Lee (in reference to the social aspect of Web 2.0): “I called this graph the Semantic Web, but maybe it should have been Giant Global Graph!”

Relationships in WWW (2)
There's more to a hyperlink than href:
• HTML 4.01 attributes rel and rev, e.g.: <a href="glos.html" rel="glossary">used definitions</a>
  – navigation in a document collection (start, prev, next, contents, index)
  – structure (chapter, section, subsection, appendix, glossary)
  – meta (copyright, help)
• XHTML 2.0: custom namespaces

Relationships in WWW (3)
Popular relation ontologies:
• FOAF <foaf:knows>
• XFN microformat
  – friendship (contact, acquaintance, friend)
  – family (child, parent, sibling, spouse, kin)
  – professional (co-worker, colleague)
  – physical (met)
  – geographical (co-resident, neighbor)
  – romantic (muse, crush, date, sweetheart)
• rel-tag microformat: folksonomies

Heterogeneous DAC Graph
• DAC graph
  – nodes of three types: Document, Author, Concept
  – edges between nodes model the relationships
  – most of the relationships can be acquired directly from the Web data

Consistency and Conformance
• Consistency of a Web document collection
  – inner similarity concerning subject
    • similarly tagged (Web 2.0): authors assigned the same tags or categories
    • same keywords (digital libraries)
• Conformance of a Web document collection
  – document authors' relationship
  – authors with a strong relationship → often coauthors (agree on some subjects)
  – citing and referencing
  – Web of Trust

Relationships in DAC Graph
• Document
• Author
• Concept

[Figure: fragment of a DAC graph of a Web document collection, with author nodes a1, a2, document nodes d1, d2, d3, d4, and concept nodes c1, c2]

Consistency Collection
document-concept graph
[Figure: the DAC graph fragment with nodes a1, a2, d1, d2, d3, d4, c1, c2]

Conformance Collection
document-author graph
[Figure: the DAC graph fragment with nodes a1, a2, d1, d2, d3, d4, c1, c2]

Deriving Relationships
[Figure: graph with author nodes a1, a2, a3, concept nodes c1, c2, c3, c4, c5, and document node d1]
• For all authors $a_i$ that have relationships with both concept $c_1$ and document $d_1$:
  $\mathrm{rel}(c_1, d_1) \mathrel{+}= \mathrm{rel}(c_1, a_i) \cdot \mathrm{rel}(a_i, d_1)$
• For all concepts $c_i$ that have relationships with both concept $c_1$ and document $d_1$:
  $\mathrm{rel}(c_1, d_1) \mathrel{+}= \mathrm{rel}(c_1, c_i) \cdot \mathrm{rel}(c_i, d_1)$

Consistency and Conformance
• Subgraphs are clustered using only the relationships' values
• consistency collection graph: the document nodes of the biggest output cluster → consistent subcollection
• conformance collection graph: the document nodes of the biggest output cluster → conformable subcollection

  $\mathrm{consistency} = \frac{\mathrm{card}(\mathrm{cons\_sub}(C))}{\mathrm{card}(C)}$

  $\mathrm{conformance} = \frac{\mathrm{card}(\mathrm{conf\_sub}(C))}{\mathrm{card}(C)}$

• C: Web document collection
• cons_sub(C): consistent subcollection of C
• conf_sub(C): conformable subcollection of C

Conclusions and Future Work
• Relationships are asymmetric, so undirected → directed graph
• Relationship deriving using: paths with one → n proxy nodes
• Graph clustering:
  – MCL (Markov Cluster Algorithm), currently
  – Other algorithms
  – Maximum clique technique

Q & A
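
Appendix: a worked sketch
The derivation rule from the Deriving Relationships slide and the two ratios from the clustering slide can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions: the function names, the dictionary-of-edges representation, and all edge weights below are hypothetical and do not come from the slides. The clustering step itself is not shown; the consistent subcollection is simply given as input.

```python
def derive_relationship(rel, source, target, proxies):
    """Accumulate rel(source, p) * rel(p, target) over one-hop proxy nodes p.

    `rel` maps directed (from_node, to_node) pairs to weights. Only proxies
    that have edges to BOTH endpoints contribute, as on the slide.
    """
    derived = rel.get((source, target), 0.0)  # start from any direct edge
    for p in proxies:
        if (source, p) in rel and (p, target) in rel:
            derived += rel[(source, p)] * rel[(p, target)]
    return derived


def collection_ratio(collection, subcollection):
    """card(sub(C)) / card(C), used for both consistency and conformance."""
    docs = set(collection)
    if not docs:
        raise ValueError("collection must not be empty")
    return len(set(subcollection) & docs) / len(docs)


# Hypothetical edge weights (assumed for illustration only).
rel = {
    ("c1", "a1"): 0.5, ("a1", "d1"): 0.8,   # author proxy a1
    ("c1", "a2"): 0.25, ("a2", "d1"): 0.4,  # author proxy a2
    ("c1", "c2"): 0.5, ("c2", "d1"): 0.2,   # concept proxy c2
    ("c1", "c3"): 0.9,                      # c3 has no edge to d1: ignored
}

score = derive_relationship(rel, "c1", "d1", ["a1", "a2", "c2", "c3"])
consistency = collection_ratio(["d1", "d2", "d3", "d4"], ["d1", "d2", "d4"])
print(score, consistency)
```

Edges are stored as ordered pairs because the slides treat relationships as asymmetric; a path through n proxy nodes would multiply along a longer chain in the same way.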