lxmldoc-4.5.0.pdf
lxml
2020-01-29

Contents

Contents

I lxml

  1 lxml
      Introduction
      Documentation
      Download
      Mailing list
      Bug tracker
      License
      Old Versions

  2 Why lxml?
      Motto
      Aims

  3 Installing lxml
      Where to get it
      Requirements
      Installation
          MS Windows
          Linux
          MacOS-X
      Building lxml from dev sources
      Using lxml with python-libxml2
      Source builds on MS Windows
      Source builds on MacOS-X

  4 Benchmarks and Speed
      General notes
      How to read the timings
      Parsing and Serialising
      The ElementTree API
          Child access
          Element creation
          Merging different sources
          deepcopy
          Tree traversal
      XPath
      A longer example
      lxml.objectify
          ObjectPath
          Caching Elements
          Further optimisations

  5 ElementTree compatibility of lxml.etree

  6 lxml FAQ - Frequently Asked Questions
      General Questions
          Is there a tutorial?
          Where can I find more documentation about lxml?
          What standards does lxml implement?
          Who uses lxml?
          What is the difference between lxml.etree and lxml.objectify?
          How can I make my application run faster?
          What about that trailing text on serialised Elements?
          How can I find out if an Element is a comment or PI?
          How can I map an XML tree into a dict of dicts?
          Why does lxml sometimes return ’str’ values for text in Python 2?
          Why do I get XInclude or DTD lookup failures on some systems but not on others?
          How do namespaces work in lxml?
      Installation
          Which version of libxml2 and libxslt should I use or require?
          Where are the binary builds?
          Why do I get errors about missing UCS4 symbols when installing lxml?
          My C compiler crashes on installation
      Contributing
          Why is lxml not written in Python?
          How can I contribute?
      Bugs
          My application crashes!
          My application crashes on MacOS-X!
          I think I have found a bug in lxml. What should I do?
          How do I know a bug is really in lxml and not in libxml2?
      Threading
          Can I use threads to concurrently access the lxml API?
          Does my program run faster if I use threads?
          Would my single-threaded program run faster if I turned off threading?
          Why can’t I reuse XSLT stylesheets in other threads?
          My program crashes when run with mod_python/Pyro/Zope/Plone/...
      Parsing and Serialisation
          Why doesn’t the pretty_print option reformat my XML output?
          Why can’t lxml parse my XML from unicode strings?
          Can lxml parse from file objects opened in unicode/text mode?
          What is the difference between str(xslt(doc)) and xslt(doc).write()?
          Why can’t I just delete parents or clear the root node in iterparse()?
          How do I output null characters in XML text?
          Is lxml vulnerable to XML bombs?
          How do I use lxml safely as a web-service endpoint?
          How can I sort the attributes?
      XPath and Document Traversal
          What are the findall() and xpath() methods on Element(Tree)?
          Why doesn’t findall() support full XPath expressions?
          How can I find out which namespace prefixes are used in a document?
          How can I specify a default namespace for XPath expressions?

II Developing with lxml

  7 The lxml.etree Tutorial
      The Element class
          Elements are lists
          Elements carry attributes as a dict
          Elements contain text
          Using XPath to find text
          Tree iteration
          Serialisation
      The ElementTree class
      Parsing from strings and files
          The fromstring() function
          The XML() function
          The parse() function
          Parser objects
          Incremental parsing
          Event-driven parsing
      Namespaces
      The E-factory
      ElementPath

  8 APIs specific to lxml.etree
      lxml.etree
      Other Element APIs
      Trees and Documents
      Iteration
      Error handling on exceptions
      Error logging
      Serialisation
          C14N
          Pretty printing
          XML declaration
      Incremental XML generation
      CDATA
      XInclude and ElementInclude

  9 Parsing XML and HTML with lxml
      Parsers
          Parser options
          Error log
          Parsing HTML
          Doctype information
      The target parser interface
      The feed parser interface
      Incremental event parsing
          Event types
          Modifying the tree
          Selective tag events
          Comments and PIs
          Events with custom targets
      iterparse and iterwalk
          iterwalk
      Python unicode strings
          Serialising to Unicode strings

  10 Validation with lxml
      Validation at parse time
      DTD
      RelaxNG
      XMLSchema