Safe Template Processing of XML Documents DissertaOn Zur Erlangung Des Akademischen Grades Doktoringenieur (Dr.-Ing.)

Safe Template Processing of XML Documents Dissertaon zur Erlangung des akademischen Grades Doktoringenieur (Dr.-Ing.) vorgelegt an der Technischen Universität Dresden Fakultät Informak eingereicht von Dipl.-Inform. Falk Hartmann geboren am 2. April 1973 in Freital Betreuender Hochschullehrer: Prof. Dr. rer. nat. habil. Uwe Aßmann, TU Dresden Gutachter: Prof. Dr. rer. nat. habil. Uwe Aßmann, TU Dresden Prof. Dr. Welf Löwe, Linnaeus University Tag der Verteidigung: 1. Juli 2011 Dresden, im September 2011 Contents 1. Preface 7 1.1. Overview ..................................... 8 1.2. Problems ..................................... 9 1.3. Movang Example ................................ 10 1.4. Goals ....................................... 13 1.5. Contribuons ................................... 14 1.6. Related Work ................................... 16 1.7. Typographic Convenons ............................. 16 1.8. Outline ...................................... 17 2. Introducon 19 2.1. Definions ..................................... 19 2.1.1. Templates and Related Terms ...................... 20 2.1.2. Life Cycle Phases ............................. 24 2.1.3. The Extensible Markup Language XML .................. 25 2.1.4. XML Schema Languages ......................... 27 2.2. Applicaons .................................... 30 2.2.1. Web Applicaons ............................. 30 2.2.2. Code Generaon ............................. 30 2.3. Alternaves to Using Templates ......................... 31 2.3.1. Transformaons ............................. 31 2.3.2. Aspect-Oriented Approaches ....................... 32 2.3.3. Unparsers ................................. 33 2.3.4. Comparison of Templates with Alternave Technologies ........ 34 2.4. Related Research Areas .............................. 35 2.4.1. Macro Processing ............................. 35 2.4.2. Templates as Programming Language Feature .............. 35 2.4.3. Invasive Soware Composion ...................... 36 2.4.4. Frame Processing ............................. 37 2.5. Classificaon ................................... 38 2.5.1. Target Language Awareness of Slot Markup ............... 39 2.5.2. Generality of the Slot Markup ...................... 40 2.5.3. Entanglement Index ........................... 40 3 Contents 2.5.4. Instanaon Data Access Strategy .................... 42 2.5.5. Query Language ............................. 43 2.5.6. Instanaon Technique .......................... 44 2.5.7. Reuse in Templates ............................ 44 2.5.8. Further Features ............................. 45 2.6. Conclusion ..................................... 46 3. Safe Template Processing 49 3.1. Goals ....................................... 49 3.1.1. Safe Authoring .............................. 50 3.1.2. Safe Instanaon ............................. 50 3.1.3. Separaon of Concerns .......................... 51 3.1.4. Broad Applicability ............................ 53 3.1.5. Ulizaon of Exisng Standards ..................... 53 3.2. Requirements ................................... 54 3.2.1. Preservaon of Target Language Constraints ............... 54 3.2.2. Coverage of Target Language ....................... 55 3.2.3. Computability ............................... 55 3.2.4. Expressiveness .............................. 56 3.2.5. Instanaon Data Type Safety ...................... 56 3.2.6. Independence of Query Language .................... 57 3.3. Proposal of an Architecture fulfilling the Requirements ............. 57 3.4. Conclusion ..................................... 60 4. Design of a Universal, Syntax- and Semancs-Preserving Slot Markup Language 61 4.1. General Design Decisions ............................. 62 4.2. Creaon of Character Data ............................ 65 4.2.1. xtl:text ................................ 66 4.2.2. xtl:attribute ............................ 67 4.2.3. xtl:include ............................. 69 4.3. Condional and Repeated Inclusion of Template Fragments ........... 71 4.3.1. xtl:if ................................. 71 4.3.2. xtl:for-each ............................. 73 4.4. Reuse of Template Fragments ........................... 76 4.4.1. xtl:macro ............................... 76 4.4.2. xtl:call-macro ........................... 77 4.5. Advanced Features ................................ 78 4.5.1. Accessing mulple Instanaon Data Sources using Realms ...... 78 4.5.2. Instanaon Pipelines using Bypassing ................. 80 4.6. Definion of the Instanaon Semancs using XSL-T .............. 83 4.7. Relaon to Document Validaon ......................... 84 4.8. Conclusion ..................................... 86 5. Safe Authoring of Templates 87 4 Contents 5.1. Constraint Separaon ............................... 87 5.1.1. Introductory Example ........................... 89 5.1.2. The Constraint XML Schema Language CXSD ............... 94 5.1.3. The Instanaon Data Constraint Language IDC ............. 98 5.1.4. Constraint Separaon Process ...................... 99 5.1.5. Proof of the Preservaon of the Target Language Constraints ...... 103 5.1.5.1. Completeness of the Set of Required Aributes ....... 104 5.1.5.2. Compliance to the Content Model .............. 105 5.1.6. Visitor-based Implementaon of the Constraint Separaon ....... 107 5.1.7. Paral Templazaon ........................... 116 5.2. Template Validaon ................................ 117 5.3. Conclusion ..................................... 121 6. Flexible, Efficient and Safe Template Instanaon 123 6.1. Instanaon Data Evaluaon ........................... 123 6.1.1. Design of the PHP Interface ....................... 124 6.1.2. The Identy PHP ............................. 126 6.1.3. The JXPath PHP .............................. 127 6.1.4. The SPARQL PHP ............................. 127 6.1.5. The System PHP .............................. 128 6.2. Template Instanaon .............................. 129 6.2.1. XML Access Technologies ......................... 129 6.2.2. Operaonal Model of the XTL Engine .................. 130 6.2.3. Pipeline Implementaon of the XTL Engine ............... 136 6.2.4. Memory and Runme Complexity .................... 150 6.3. Instanaon Data Validaon ........................... 150 6.3.1. The IDC PHP ................................ 151 6.3.2. Template Interface Generaon ...................... 152 6.3.2.1. Introductory Example ..................... 153 6.3.2.2. An Algorithm for the Template Interface Generaon ..... 155 6.3.2.3. Implementaon using a PHP and an API-based Generator .. 160 6.4. Conclusion ..................................... 163 7. Validaon 165 7.1. Implementaon of the Prototype ......................... 165 7.1.1. The Constraint Separaon Tool xtlsc ................. 167 7.1.2. The Template Validaon Tool cxsdvalidate ............. 167 7.1.3. The Template Instanaon Tool xtlinstantiate .......... 168 7.1.4. The Template Interface Generaon Tool xtltc ............. 169 7.2. Test Suites ..................................... 170 7.2.1. Constraint Separaon Test Suite ..................... 170 7.2.2. Template Validaon Test Suite ...................... 170 7.2.3. Template Instanaon Test Suite ..................... 171 7.2.4. Template Interface Generaon Test Suite ................ 172 5 Contents 7.2.5. Round-trip Test Suite ........................... 174 7.3. Applicaons of the Prototype ........................... 175 7.3.1. SNOW: Use of XTL in a Staged Architecture ............... 175 7.3.2. EMODE: Use of XTL for Model-to-Text Transformaons ......... 179 7.3.3. FeasiPLe: Use of XTL for Code Generaon from Ontologies ....... 179 7.4. Proof of the Preservaon of the Target Language Constraints .......... 180 7.5. Runme and Memory Usage Measurements ................... 180 7.5.1. Runme Measurement of Validaon against a CXSD Schema ...... 181 7.5.2. Runme Measurements of the Template Instanaon ......... 182 7.5.3. Memory Usage Measurements of the Template Instanaon ...... 184 7.6. Conclusion ..................................... 187 8. Summary, Conclusion, and Outlook 189 8.1. Summary ..................................... 189 8.2. Conclusion ..................................... 190 8.3. Suggested Improvements for XML Technologies ................. 191 8.4. Future Research Direcons ............................ 192 A. Referenced XML Schemata and Instances 193 A.1. XML Schema of XTL ................................ 193 A.2. Purchase Order Schema .............................. 201 A.3. Purchase Order Instance ............................. 202 B. Detailed Results of the Runme and Memory Measurements 205 List of Acronyms 211 List of Figures 216 List of Lisngs 219 List of Tables 221 Bibliography 237 Index 239 6 1 Preface But, how did the first template appear? Internaonal Encyclopedia of Systems and Cybernecs, 1997 [69] Almost two decades aer the introducon of the World Wide Web by Tim Berners-Lee in 1989, the automac generaon of Web pages from dynamic data is sll suffering the same problem as in the beginning: How can one be sure that the applicaon produces valid HTML code? There have been several approaches to this problem, among them approaches that successfully solved the problem, thereby unfortunately violang other well-established design rules, like the Separaon of Concerns principle. The consequences of this violaon can hardly be managed in large applicaons developed in a cooperaon of many parcipants assigned to mulple roles in the development process: therefore, the problem can sll be considered unsolved in its generality. The goal of this thesis has been

Safe Template Processing of XML Documents DissertaOn Zur Erlangung Des Akademischen Grades Doktoringenieur (Dr.-Ing.)

On the Integrity and Trustworthiness of Web Produced Data

XML Signature/Encryption — the Basis of Web Services Security

Sams Teach Yourself XML in 21 Days

Bibliography of Erik Wilde

Feasibility and Performance Evaluation of Canonical XML

A Layered Approach to XML Canonicalization

Exploring the XML World

Information Integration with XML

Extensible Layout in Functional Documents♦

Oracle Xml Db

XML Technologies Dissected

[ Team Lib ] .NET & XML Provides an In-Depth, Concentrated Tutorial for Intermediate to Advanced-Level Developers. Additiona