PDF Documentation

lxml 2018-03-13

Contents

I lxml
  1 lxml
    Introduction
    Documentation
    Download
    Mailing list
    Bug tracker
    License
    Old Versions
  2 Why lxml?
    Motto
    Aims
  3 Installing lxml
    Where to get it
    Requirements
    Installation
      MS Windows
      Linux
      MacOS-X
    Building lxml from dev sources
    Using lxml with python-libxml2
    Source builds on MS Windows
    Source builds on MacOS-X
  4 Benchmarks and Speed
    General notes
    How to read the timings
    Parsing and Serialising
    The ElementTree API
      Child access
      Element creation
      Merging different sources
      deepcopy
      Tree traversal
    XPath
    A longer example
    lxml.objectify
      ObjectPath
      Caching Elements
      Further optimisations
  5 ElementTree compatibility of lxml.etree
  6 lxml FAQ - Frequently Asked Questions
    General Questions
      Is there a tutorial?
      Where can I find more documentation about lxml?
      What standards does lxml implement?
      Who uses lxml?
      What is the difference between lxml.etree and lxml.objectify?
      How can I make my application run faster?
      What about that trailing text on serialised Elements?
      How can I find out if an Element is a comment or PI?
      How can I map an XML tree into a dict of dicts?
      Why does lxml sometimes return ’str’ values for text in Python 2?
      Why do I get XInclude or DTD lookup failures on some systems but not on others?
    Installation
      Which version of libxml2 and libxslt should I use or require?
      Where are the binary builds?
      Why do I get errors about missing UCS4 symbols when installing lxml?
      My C compiler crashes on installation
    Contributing
      Why is lxml not written in Python?
      How can I contribute?
    Bugs
      My application crashes!
      My application crashes on MacOS-X!
      I think I have found a bug in lxml. What should I do?
      How do I know a bug is really in lxml and not in libxml2?
    Threading
      Can I use threads to concurrently access the lxml API?
      Does my program run faster if I use threads?
      Would my single-threaded program run faster if I turned off threading?
      Why can’t I reuse XSLT stylesheets in other threads?
      My program crashes when run with mod_python/Pyro/Zope/Plone/...
    Parsing and Serialisation
      Why doesn’t the pretty_print option reformat my XML output?
      Why can’t lxml parse my XML from unicode strings?
      Can lxml parse from file objects opened in unicode/text mode?
      What is the difference between str(xslt(doc)) and xslt(doc).write()?
      Why can’t I just delete parents or clear the root node in iterparse()?
      How do I output null characters in XML text?
      Is lxml vulnerable to XML bombs?
      How do I use lxml safely as a web-service endpoint?
    XPath and Document Traversal
      What are the findall() and xpath() methods on Element(Tree)?
      Why doesn’t findall() support full XPath expressions?
      How can I find out which namespace prefixes are used in a document?
      How can I specify a default namespace for XPath expressions?

II Developing with lxml
  7 The lxml.etree Tutorial
    The Element class
      Elements are lists
      Elements carry attributes as a dict
      Elements contain text
      Using XPath to find text
      Tree iteration
      Serialisation
    The ElementTree class
    Parsing from strings and files
      The fromstring() function
      The XML() function
      The parse() function
      Parser objects
      Incremental parsing
      Event-driven parsing
    Namespaces
    The E-factory
    ElementPath
  8 APIs specific to lxml.etree
    lxml.etree
    Other Element APIs
    Trees and Documents
    Iteration
    Error handling on exceptions
    Error logging
    Serialisation
    Incremental XML generation
    CDATA
    XInclude and ElementInclude
    write_c14n on ElementTree
  9 Parsing XML and HTML with lxml
    Parsers
      Parser options
      Error log
      Parsing HTML
      Doctype information
    The target parser interface
    The feed parser interface
    Incremental event parsing
      Event types
      Modifying the tree
      Selective tag events
      Comments and PIs
      Events with custom targets
    iterparse and iterwalk
      iterwalk
    Python unicode strings
      Serialising to Unicode strings
  10 Validation with lxml
    Validation at parse time
    DTD
    RelaxNG
    XMLSchema
    Schematron
    (Pre-ISO-Schematron)
  11 XPath and XSLT with lxml
    XPath
      The xpath() method

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    500 Page
  • File Size
    -
