lxml 2018-03-13

Contents

I lxml

1 lxml
    Introduction
    Documentation
    Download
    Mailing list
    Bug tracker
    License
    Old Versions

2 Why lxml?
    Motto
    Aims

3 Installing lxml
    Where to get it
    Requirements
    Installation
        MS Windows
        Linux
        MacOS-X
    Building lxml from dev sources
    Using lxml with python-libxml2
    Source builds on MS Windows
    Source builds on MacOS-X

4 Benchmarks and Speed
    General notes
    How to read the timings
    Parsing and Serialising
    The ElementTree API
        Child access
        Element creation
        Merging different sources
        deepcopy
        Tree traversal
    XPath
    A longer example
    lxml.objectify
        ObjectPath
        Caching Elements
        Further optimisations

5 ElementTree compatibility of lxml.etree

6 lxml FAQ - Frequently Asked Questions
    General Questions
        Is there a tutorial?
        Where can I find more documentation about lxml?
        What standards does lxml implement?
        Who uses lxml?
        What is the difference between lxml.etree and lxml.objectify?
        How can I make my application run faster?
        What about that trailing text on serialised Elements?
        How can I find out if an Element is a comment or PI?
        How can I map an XML tree into a dict of dicts?
        Why does lxml sometimes return ’str’ values for text in Python 2?
        Why do I get XInclude or DTD lookup failures on some systems but not on others?
    Installation
        Which version of libxml2 and libxslt should I use or require?
        Where are the binary builds?
        Why do I get errors about missing UCS4 symbols when installing lxml?
        My C compiler crashes on installation
    Contributing
        Why is lxml not written in Python?
        How can I contribute?
    Bugs
        My application crashes!
        My application crashes on MacOS-X!
        I think I have found a bug in lxml. What should I do?
        How do I know a bug is really in lxml and not in libxml2?
    Threading
        Can I use threads to concurrently access the lxml API?
        Does my program run faster if I use threads?
        Would my single-threaded program run faster if I turned off threading?
        Why can’t I reuse XSLT stylesheets in other threads?
        My program crashes when run with mod_python/Pyro/Zope/Plone/...
    Parsing and Serialisation
        Why doesn’t the pretty_print option reformat my XML output?
        Why can’t lxml parse my XML from unicode strings?
        Can lxml parse from file objects opened in unicode/text mode?
        What is the difference between str(xslt(doc)) and xslt(doc).write()?
        Why can’t I just delete parents or clear the root node in iterparse()?
        How do I output null characters in XML text?
        Is lxml vulnerable to XML bombs?
        How do I use lxml safely as a web-service endpoint?
    XPath and Document Traversal
        What are the findall() and xpath() methods on Element(Tree)?
        Why doesn’t findall() support full XPath expressions?
        How can I find out which namespace prefixes are used in a document?
        How can I specify a default namespace for XPath expressions?

II Developing with lxml

7 The lxml.etree Tutorial
    The Element class
        Elements are lists
        Elements carry attributes as a dict
        Elements contain text
        Using XPath to find text
        Tree iteration
        Serialisation
    The ElementTree class
    Parsing from strings and files
        The fromstring() function
        The XML() function
        The parse() function
        Parser objects
        Incremental parsing
        Event-driven parsing
    Namespaces
    The E-factory
    ElementPath

8 APIs specific to lxml.etree
    lxml.etree
    Other Element APIs
    Trees and Documents
    Iteration
    Error handling on exceptions
    Error logging
    Serialisation
    Incremental XML generation
    CDATA
    XInclude and ElementInclude
    write_c14n on ElementTree

9 Parsing XML and HTML with lxml
    Parsers
        Parser options
        Error log
        Parsing HTML
        Doctype information
    The target parser interface
    The feed parser interface
    Incremental event parsing
        Event types
        Modifying the tree
        Selective tag events
        Comments and PIs
        Events with custom targets
    iterparse and iterwalk
        iterwalk
    Python unicode strings
        Serialising to Unicode strings

10 Validation with lxml
    Validation at parse time
    DTD
    RelaxNG
    XMLSchema
    Schematron
    (Pre-ISO-Schematron)

11 XPath and XSLT with lxml
    XPath
        The xpath() method