PDF Documentation
Total Page:16
File Type:pdf, Size:1020Kb
lxml 2019-03-26 Contents Contents 2 I lxml 13 1 lxml 14 Introduction................................................. 14 Documentation............................................... 14 Download.................................................. 15 Mailing list................................................. 16 Bug tracker................................................. 16 License................................................... 16 Old Versions................................................. 16 2 Why lxml? 18 Motto.................................................... 18 Aims..................................................... 18 3 Installing lxml 20 Where to get it................................................ 20 Requirements................................................ 20 Installation................................................. 21 MS Windows............................................. 21 Linux................................................. 21 MacOS-X............................................... 21 Building lxml from dev sources....................................... 22 Using lxml with python-libxml2...................................... 22 Source builds on MS Windows....................................... 22 Source builds on MacOS-X......................................... 22 4 Benchmarks and Speed 23 General notes................................................ 23 How to read the timings........................................... 24 Parsing and Serialising........................................... 24 The ElementTree API............................................ 27 Child access.............................................. 28 Element creation........................................... 28 Merging different sources....................................... 29 deepcopy............................................... 29 Tree traversal............................................. 30 XPath.................................................... 30 A longer example.............................................. 31 lxml.objectify................................................ 33 ObjectPath............................................... 33 2 CONTENTS CONTENTS Caching Elements........................................... 34 Further optimisations......................................... 34 5 ElementTree compatibility of lxml.etree 36 6 lxml FAQ - Frequently Asked Questions 39 General Questions.............................................. 39 Is there a tutorial?........................................... 39 Where can I find more documentation about lxml?.......................... 39 What standards does lxml implement?................................ 39 Who uses lxml?............................................ 40 What is the difference between lxml.etree and lxml.objectify?................... 41 How can I make my application run faster?............................. 41 What about that trailing text on serialised Elements?........................ 42 How can I find out if an Element is a comment or PI?........................ 42 How can I map an XML tree into a dict of dicts?........................... 42 Why does lxml sometimes return ’str’ values for text in Python 2?................. 43 Why do I get XInclude or DTD lookup failures on some systems but not on others?........ 43 How do namespaces work in lxml?.................................. 43 Installation................................................. 43 Which version of libxml2 and libxslt should I use or require?.................... 43 Where are the binary builds?..................................... 44 Why do I get errors about missing UCS4 symbols when installing lxml?.............. 44 My C compiler crashes on installation................................ 44 Contributing................................................. 44 Why is lxml not written in Python?.................................. 44 How can I contribute?......................................... 45 Bugs..................................................... 45 My application crashes!........................................ 45 My application crashes on MacOS-X!................................ 46 I think I have found a bug in lxml. What should I do?........................ 46 How do I know a bug is really in lxml and not in libxml2?..................... 47 Threading.................................................. 47 Can I use threads to concurrently access the lxml API?....................... 47 Does my program run faster if I use threads?............................. 48 Would my single-threaded program run faster if I turned off threading?............... 48 Why can’t I reuse XSLT stylesheets in other threads?........................ 48 My program crashes when run with mod_python/Pyro/Zope/Plone/................... 48 Parsing and Serialisation.......................................... 49 Why doesn’t the pretty_print option reformat my XML output?............... 49 Why can’t lxml parse my XML from unicode strings?........................ 50 Can lxml parse from file objects opened in unicode/text mode?................... 50 What is the difference between str(xslt(doc)) and xslt(doc).write() ?................ 51 Why can’t I just delete parents or clear the root node in iterparse()?................. 51 How do I output null characters in XML text?............................ 51 Is lxml vulnerable to XML bombs?.................................. 51 How do I use lxml safely as a web-service endpoint?........................ 51 XPath and Document Traversal....................................... 52 What are the findall() and xpath() methods on Element(Tree)?............... 52 Why doesn’t findall() support full XPath expressions?..................... 52 How can I find out which namespace prefixes are used in a document?............... 53 How can I specify a default namespace for XPath expressions?................... 53 3 CONTENTS CONTENTS II Developing with lxml 54 7 The lxml.etree Tutorial 55 The Element class.............................................. 56 Elements are lists........................................... 56 Elements carry attributes as a dict.................................. 58 Elements contain text......................................... 59 Using XPath to find text........................................ 60 Tree iteration............................................. 61 Serialisation.............................................. 62 The ElementTree class........................................... 64 Parsing from strings and files........................................ 64 The fromstring() function....................................... 65 The XML() function......................................... 65 The parse() function.......................................... 65 Parser objects............................................. 66 Incremental parsing.......................................... 66 Event-driven parsing......................................... 67 Namespaces................................................. 69 The E-factory................................................ 72 ElementPath................................................. 73 8 APIs specific to lxml.etree 75 lxml.etree.................................................. 75 Other Element APIs............................................. 75 Trees and Documents............................................ 76 Iteration................................................... 77 Error handling on exceptions........................................ 78 Error logging................................................ 79 Serialisation................................................. 79 Incremental XML generation........................................ 80 CDATA................................................... 82 XInclude and ElementInclude........................................ 83 write_c14n on ElementTree......................................... 83 9 Parsing XML and HTML with lxml 84 Parsers.................................................... 84 Parser options............................................. 85 Error log................................................ 86 Parsing HTML............................................ 86 Doctype information......................................... 87 The target parser interface.......................................... 88 The feed parser interface.......................................... 90 Incremental event parsing.......................................... 91 Event types.............................................. 92 Modifying the tree.......................................... 92 Selective tag events.......................................... 93 Comments and PIs.......................................... 94 Events with custom targets...................................... 94 iterparse and iterwalk............................................ 96 iterwalk................................................ 97 Python unicode strings........................................... 98 Serialising to Unicode strings..................................... 98 10 Validation with lxml 100 Validation at parse time........................................... 100 4 CONTENTS CONTENTS DTD..................................................... 101 RelaxNG.................................................. 103 XMLSchema................................................ 104 Schematron................................................. 106 (Pre-ISO-Schematron)........................................... 108 11 XPath and XSLT with lxml 110 XPath...................................................