:mod:`xml.dom.minidom` --- Minimal DOM implementation
=====================================================

.. module:: xml.dom.minidom
   :synopsis: Minimal Document Object Model (DOM) implementation.

.. moduleauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>

**Source code:** :source:`Lib/xml/dom/minidom.py`

--------------

:mod:`xml.dom.minidom` is a minimal implementation of the Document Object
Model interface, with an API similar to that in other languages. It is intended
to be simpler than the full DOM and also significantly smaller. Users who are
not already proficient with the DOM should consider using the
:mod:`xml.etree.ElementTree` module for their XML processing instead.


.. warning::

   The :mod:`xml.dom.minidom` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.


DOM applications typically start by parsing some XML into a DOM. With
:mod:`xml.dom.minidom`, this is done through the parse functions::

   from xml.dom.minidom import parse, parseString

   dom1 = parse('c:\\temp\\mydata.xml')  # parse an XML file by name

   datasource = open('c:\\temp\\mydata.xml')
   dom2 = parse(datasource)  # parse an open file

   dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')

The :func:`parse` function can take either a filename or an open file object.


.. function:: parse(filename_or_file, parser=None, bufsize=None)

   Return a :class:`Document` from the given input. *filename_or_file* may be
   either a file name, or a file-like object. *parser*, if given, must be a SAX2
   parser object. This function will change the document handler of the parser and
   activate namespace support; other parser configuration (like setting an entity
   resolver) must have been done in advance.
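
As a minimal sketch of the two ways of calling :func:`parse` described above,
the following parses an in-memory file-like object, first with the default
parser and then with an explicitly created SAX2 parser. The sample document
and the variable names are hypothetical and chosen only for illustration.

```python
import io
import xml.sax
from xml.dom.minidom import parse

# Hypothetical sample document, held in memory instead of on disk.
xml_bytes = b"<inventory><item>Widget</item></inventory>"

# Parse from a file-like object using the default parser.
doc_default = parse(io.BytesIO(xml_bytes))

# Parse again, this time handing parse() a SAX2 parser object.
# Any extra parser configuration would have to be done before this call.
sax_parser = xml.sax.make_parser()
doc_custom = parse(io.BytesIO(xml_bytes), parser=sax_parser)

root_default = doc_default.documentElement.tagName
root_custom = doc_custom.documentElement.tagName
```

Both calls should yield a :class:`Document` whose root element is the same;
passing a parser is only needed when it has to be configured beforehand.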

If you have XML in a string, you can use the :func:`parseString` function
instead:


.. function:: parseString(string, parser=None)

   Return a :class:`Document` that represents the *string*. This function creates an
   :class:`io.StringIO` object for the string and passes that on to :func:`parse`.

Both functions return a :class:`Document` object representing the content of the
document.

What the :func:`parse` and :func:`parseString` functions do is connect an XML
parser with a "DOM builder" that can accept parse events from any SAX parser and
convert them into a DOM tree. The names of the functions are perhaps misleading,
but are easy to grasp when learning the interfaces. The parsing of the document
will be completed before these functions return; it's simply that these
functions do not provide a parser implementation themselves.

You can also create a :class:`Document` by calling a method on a "DOM
Implementation" object. You can get this object by calling the
:func:`getDOMImplementation` function in either the :mod:`xml.dom` package or
the :mod:`xml.dom.minidom` module. Once you have a :class:`Document`, you
can add child nodes to it to populate the DOM::

   from xml.dom.minidom import getDOMImplementation

   impl = getDOMImplementation()

   newdoc = impl.createDocument(None, "some_tag", None)
   top_element = newdoc.documentElement
   text = newdoc.createTextNode('Some textual content.')
   top_element.appendChild(text)

Once you have a DOM document object, you can access the parts of your XML
document through its properties and methods. These properties are defined in
the DOM specification. The main property of the document object is the
:attr:`documentElement` property. It gives you the main element in the XML
document: the one that holds all others.
Here is an example program::

   from xml.dom.minidom import parseString

   dom3 = parseString("<myxml>Some data</myxml>")
   assert dom3.documentElement.tagName == "myxml"

When you are finished with a DOM tree, you may optionally call the
:meth:`unlink` method to encourage early cleanup of the now-unneeded
objects. :meth:`unlink` is an :mod:`xml.dom.minidom`\ -specific
extension to the DOM API that renders the node and its descendants
essentially useless. Otherwise, Python's garbage collector will
eventually take care of the objects in the tree.

.. seealso::

   `Document Object Model (DOM) Level 1 Specification <https://www.w3.org/TR/REC-DOM-Level-1/>`_
      The W3C recommendation for the DOM supported by :mod:`xml.dom.minidom`.


.. _minidom-objects:

DOM Objects
-----------

The definition of the DOM API for Python is given as part of the :mod:`xml.dom`
module documentation. This section lists the differences between the API and
:mod:`xml.dom.minidom`.


.. method:: Node.unlink()

   Break internal references within the DOM so that it will be garbage collected on
   versions of Python without cyclic GC. Even when cyclic GC is available, using
   this can make large amounts of memory available sooner, so calling this on DOM
   objects as soon as they are no longer needed is good practice. This only needs
   to be called on the :class:`Document` object, but may be called on child nodes
   to discard children of that node.

   You can avoid calling this method explicitly by using the :keyword:`with`
   statement. The following code will automatically unlink *dom* when the
   :keyword:`!with` block is exited::

      with xml.dom.minidom.parse(datasource) as dom:
          ... # Work with dom.


.. method:: Node.writexml(writer, indent="", addindent="", newl="")

   Write XML to the writer object. The writer should have a :meth:`write` method
   which matches that of the file object interface.
   The *indent* parameter is the
   indentation of the current node. The *addindent* parameter is the incremental
   indentation to use for subnodes of the current one. The *newl* parameter
   specifies the string to use to terminate lines.

   For the :class:`Document` node, an additional keyword argument *encoding* can
   be used to specify the encoding field of the XML header.


.. method:: Node.toxml(encoding=None)

   Return a string or byte string containing the XML represented by
   the DOM node.

   With an explicit *encoding* [1]_ argument, the result is a byte
   string in the specified encoding.
   With no *encoding* argument, the result is a Unicode string, and the
   XML declaration in the resulting string does not specify an
   encoding. Encoding this string in an encoding other than UTF-8 is
   likely incorrect, since UTF-8 is the default encoding of XML.

.. method:: Node.toprettyxml(indent="\\t", newl="\\n", encoding=None)

   Return a pretty-printed version of the document. *indent* specifies the
   indentation string and defaults to a tab character; *newl* specifies the string
   emitted at the end of each line and defaults to ``\n``.

   The *encoding* argument behaves like the corresponding argument of
   :meth:`toxml`.


.. _dom-example:

DOM Example
-----------

This is a fairly realistic example of a simple program. In this
particular case, we do not take much advantage of the flexibility of the DOM.

.. literalinclude:: ../includes/minidom-example.py


.. _minidom-and-dom:

minidom and the DOM standard
----------------------------

The :mod:`xml.dom.minidom` module is essentially a DOM 1.0-compatible DOM with
some DOM 2 features (primarily namespace features).

Usage of the DOM interface in Python is straightforward. The following mapping
rules apply:

* Interfaces are accessed through instance objects.
  Applications should not
  instantiate the classes themselves; they should use the creator functions
  available on the :class:`Document` object. Derived interfaces support all
  operations (and attributes) from the base interfaces, plus any new operations.

* Operations are used as methods. Since the DOM uses only :keyword:`in`
  parameters, the arguments are passed in normal order (from left to right).
  There are no optional arguments. ``void`` operations return ``None``.

* IDL attributes map to instance attributes. For compatibility with the OMG IDL
  language mapping for Python, an attribute ``foo`` can also be accessed through
  accessor methods :meth:`_get_foo` and :meth:`_set_foo`. ``readonly``
  attributes must not be changed; this is not enforced at runtime.

* The types ``short int``, ``unsigned int``, ``unsigned long long``, and
  ``boolean`` all map to Python integer objects.

* The type ``DOMString`` maps to Python strings. :mod:`xml.dom.minidom` supports
  either bytes or strings, but will normally produce strings.
  Values of type ``DOMString`` may also be ``None`` where allowed to have the IDL
  ``null`` value by the DOM specification from the W3C.

* ``const`` declarations map to variables in their respective scope (e.g.
  ``xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE``); they must not be changed.

* ``DOMException`` is currently not supported in :mod:`xml.dom.minidom`.
  Instead, :mod:`xml.dom.minidom` uses standard Python exceptions such as
  :exc:`TypeError` and :exc:`AttributeError`.

* :class:`NodeList` objects are implemented using Python's built-in list type.
  These objects provide the interface defined in the DOM specification, but with
  earlier versions of Python they do not support the official API. They are,
  however, much more "Pythonic" than the interface defined in the W3C
  recommendations.
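
A few of the mapping rules above can be illustrated with a short sketch. The
sample document and the variable names are hypothetical; the example only
assumes that :func:`parseString` and the :class:`Node` class are imported from
:mod:`xml.dom.minidom`.

```python
from xml.dom.minidom import Node, parseString

# Hypothetical sample document used only for illustration.
doc = parseString("<shelf><book>Dune</book><book>Emma</book></shelf>")
root = doc.documentElement

# IDL attributes are plain instance attributes.
root_name = root.tagName

# ``const`` declarations are variables in the scope of the Node class.
root_is_element = root.nodeType == Node.ELEMENT_NODE

# A NodeList can be used like a Python list (iteration, indexing, len())
# and also through the DOM-style ``item()`` method and ``length`` attribute.
books = root.getElementsByTagName("book")
titles = [book.firstChild.data for book in books]
same_node = books.item(0) is books[0]
count_matches = books.length == len(books)

# A DOMString-valued attribute is None where the DOM allows ``null``.
element_value = root.nodeValue
```

List-style access is usually the more convenient choice in Python code, while
``item()`` and ``length`` keep code close to the W3C interface.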

The following interfaces have no implementation in :mod:`xml.dom.minidom`:

* :class:`DOMTimeStamp`

* :class:`DocumentType`

* :class:`DOMImplementation`

* :class:`CharacterData`

* :class:`CDATASection`

* :class:`Notation`

* :class:`Entity`

* :class:`EntityReference`

* :class:`DocumentFragment`

Most of these reflect information in the XML document that is not of general
utility to most DOM users.

.. rubric:: Footnotes

.. [1] The encoding name included in the XML output should conform to
   the appropriate standards. For example, "UTF-8" is valid, but
   "UTF8" is not valid in an XML document's declaration, even though
   Python accepts it as an encoding name.
   See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.