Feature Grammar Systems
Total Page:16
File Type:pdf, Size:1020Kb
Feature Grammar Systems Incremental Maintenance of Indexes to Digital Media Warehouses Menzo Windhouwer ... van M voor M :-) Feature Grammar Systems Incremental Maintenance of Indexes to Digital Media Warehouses ACADEMISCH PROEFSCHRIFT ter verkrijging van de graad van doctor aan de Universiteit van Amsterdam, op gezag van de Rector Magnificus prof. mr. P. F. van der Heijden ten overstaan van een door het college voor promoties ingestelde commissie in het openbaar te verdedigen in de Aula der Universiteit op donderdag 6 november 2003, te 14.00 uur door Menzo Aart Windhouwer geboren te Apeldoorn Promotiecommissie Promotor: prof. dr M. L. Kersten Overige commissieleden: prof. dr P. M. G. Apers prof. dr P. M. E. de Bra prof. dr P. van Emde Boas prof. dr P. Klint Faculteit Faculteit der Natuurwetenschappen, Wiskunde en Informatica Universiteit van Amsterdam The research reported in this thesis was carried out at CWI, the Dutch national research laboratory for mathematics and computer science, within the theme Data Mining and Knowledge Discovery, a subdivision of the research cluster Information Systems. SIKS Dissertation Series No-2003-16. The research reported in this thesis has been carried out under the auspices of SIKS, the Dutch Graduate School for Information and Knowledge Systems. The research and implementation effort reported in this thesis was carried out within the Advanced Multimedia Indexing and Searching, Digital Media Warehouses and Waterland projects. ISBN 90 6196 522 5 Cover design and photography by Dick Windhouwer (www.windhouwer.nl). Contents 1 Digital Media Warehouses 1 1.1 Multimedia Information Retrieval ................... 2 1.2 Annotations ............................... 4 1.2.1 The Semantic Gap ....................... 4 1.2.2 Annotation Extraction Algorithms ............... 5 1.2.3 Annotation Extraction Dependencies ............. 6 1.2.4 Annotation Maintenance .................... 7 1.3 The Acoi System ............................ 8 1.3.1 A Grammar-based Approach .................. 8 1.3.2 System Architecture ...................... 9 1.3.3 Case Studies .......................... 10 1.3.3.1 The WWW Multimedia Search Engine ....... 10 1.3.3.2 The Australian Open Search Engine ........ 11 1.3.3.3 Rijksmuseum Presentation Generation ....... 11 1.4 Discussion ................................ 12 2 Feature Grammar Systems 13 2.1 A Grammar Primer ........................... 15 2.1.1 A Formal Specification of Languages and Grammars ..... 15 2.1.1.1 The Chomsky Hierarchy ............... 16 2.1.1.2 Mild Context-sensitivity ............... 17 2.1.2 The Derivation Process ..................... 18 2.1.2.1 Bidirectional Grammars ............... 19 2.1.2.2 Parse Trees ...................... 19 2.1.2.3 The Delta Operation ................. 19 2.1.2.4 Parse Forests ..................... 20 2.1.2.5 Disambiguation ................... 21 2.1.3 Regulated Rewriting ...................... 21 2.1.3.1 Conditional Grammars ............... 22 2.1.3.2 Tree-controlled Grammars ............. 22 2.1.4 Grammar Systems ....................... 24 vi Contents 2.1.4.1 CD Grammar Systems ................ 24 2.1.4.2 Internal Control ................... 26 2.2 Feature Grammar Systems ....................... 27 2.2.1 Detectors ............................ 28 2.2.2 Atoms .............................. 32 2.2.3 Dependencies .......................... 33 2.2.3.1 Detector Input .................... 33 2.2.3.2 Detector Output ................... 37 2.2.4 Ambiguous Feature Grammar Systems ............ 39 2.2.5 Mildly Context-sensitive Feature Grammar Systems ..... 41 2.3 Discussion ................................ 42 3 Feature Grammar Language 45 3.1 The Basic Feature Grammar Language ................. 45 3.1.1 Production Rules ........................ 46 3.1.2 Atoms .............................. 48 3.1.3 Detectors ............................ 49 3.1.3.1 The XPath Language ................ 50 3.1.3.2 Detector Confidence ................. 52 3.1.4 The Start Symbol ........................ 52 3.2 The Extended Feature Grammar Language ............... 53 3.2.1 Production Rules ........................ 53 3.2.1.1 Additional Sequence Types ............. 53 3.2.1.2 Constants ...................... 54 3.2.2 Detectors ............................ 54 3.2.2.1 Whitebox Detectors ................. 54 3.2.2.2 Plugins ........................ 56 3.2.2.3 Classifiers ...................... 57 3.2.3 The Start Symbol ........................ 58 3.2.3.1 References ...................... 58 3.2.4 Feature Grammar Modules ................... 61 3.2.5 Change Detection ........................ 62 3.3 Discussion ................................ 63 4 Feature Detector Engine 65 4.1 A Parser Primer ............................. 66 4.1.1 More Parsing Algorithms for Context-free Grammars ..... 69 4.2 Parsing Feature Grammar Systems ................... 73 4.2.1 Exhaustive Backtracking for Feature Grammar Systems ... 75 4.2.1.1 Left-recursion .................... 78 4.2.1.2 Lookahead ...................... 80 4.2.1.3 Memoization ..................... 81 4.3 The Feature Detector Engine ...................... 82 vii 4.3.1 The Symbol Table ....................... 84 4.3.1.1 Rewriting ...................... 84 4.3.1.2 Semantic Checks .................. 85 4.3.2 The Parser ........................... 86 4.3.3 The Parse Forest ........................ 86 4.3.3.1 XML and DOM ................... 86 4.3.3.2 Labeling Parse Trees ................. 88 4.3.3.3 Memoized Parse Trees ................ 90 4.3.4 The Sentences ......................... 92 4.3.5 Detectors ............................ 93 4.3.5.1 Detector Input .................... 93 4.3.5.2 Blackbox Detectors ................. 94 4.3.5.3 Plugins ........................ 94 4.3.5.4 Classifiers ...................... 95 4.3.5.5 Start Symbols and References ............ 95 4.3.5.6 Deadlock Resolution ................ 96 4.4 Discussion ................................ 97 5 Feature Databases 99 5.1 The Monet Database Kernel ...................... 99 5.1.1 Monet and XML ........................ 100 5.1.1.1 Semistructured Data ................. 100 5.1.1.2 Monet XML and XMark .............. 101 5.1.1.3 XQuery ....................... 102 5.2 A Feature Database ........................... 102 5.2.1 A Database Schema ...................... 103 5.2.2 A parse forest XML document ................. 105 5.2.3 Inserting a Parse Forest ..................... 107 5.2.4 Replacing a (Partial) Parse Forest ............... 107 5.2.5 Query Facilities ......................... 108 5.2.6 Adding Database Management to a Database Kernel ..... 109 5.3 Discussion ................................ 110 6 Feature Detector Scheduler 111 6.1 The Dependency Graph ......................... 111 6.2 Identifying Change ........................... 114 6.2.1 External Changes ........................ 114 6.2.2 Internal Changes ........................ 115 6.2.2.1 The Start Condition ................. 115 6.2.2.2 The Detector Function ................ 115 6.2.2.3 The Stop Condition ................. 116 6.3 Managing Change ............................ 116 6.3.1 The (Re)validation Process ................... 117 viii Contents 6.3.2 Lookup Table Management .................. 117 6.4 Discussion ................................ 118 7 Case Studies 119 7.1 The Acoi Implementation ........................ 119 7.1.1 Acoi Prehistory ......................... 119 7.1.2 The Acoi Project ........................ 120 7.1.3 Acoi 1998 ............................ 120 7.1.4 Acoi 2000 ............................ 120 7.1.5 Acoi 2002 ............................ 121 7.1.6 Acoi Future ........................... 122 7.2 The WWW Multimedia Search Engine ................. 123 7.2.1 The Feature Grammars ..................... 123 7.2.2 The System Architecture .................... 124 7.2.3 Lessons Learned ........................ 125 7.3 The Australian Open Search Engine .................. 126 7.3.1 The Webspace Method ..................... 127 7.3.2 COBRA ............................. 128 7.3.3 The Australian Open DMW Demonstrator ........... 128 7.3.4 Lessons Learned ........................ 132 7.4 Rijksmuseum Presentation Generation ................. 133 7.4.1 The Style Repository ...................... 133 7.4.2 The Data Repository ...................... 135 7.4.3 The Presentation Environment ................. 136 7.4.4 A Style Feature Grammar ................... 137 7.4.5 Generating the Presentation .................. 139 7.4.6 Lessons Learned ........................ 141 7.5 Discussion ................................ 142 8 Conclusion and Future Work 143 8.1 Conclusion ............................... 143 8.2 Future Work ............................... 144 8.2.1 Feature Grammar Systems ................... 144 8.2.2 Feature Grammar Language .................. 144 8.2.3 Feature Detector Engine .................... 145 8.2.4 Feature Database ........................ 145 8.2.5 Feature Detector Scheduler ................... 145 8.2.6 Digital Media Warehouses ................... 146 8.3 Discussion ................................ 146 A The Feature Grammar Language 147 ix B Feature Grammars 151 B.1 The WWW Feature Grammar ..................... 151 B.2 The Text Feature Grammar ....................... 151 B.3 The HTML Feature Grammar ..................... 152 B.4 The Image Feature Grammar ...................... 153 B.5 The Audio Feature Grammar ...................... 153 B.6 The MIDI Feature Grammar ...................... 154 B.7 The MP3 Feature Grammar ....................... 154 B.8 The Video Feature