:mod:`xml.dom.minidom` --- Minimal DOM implementation
=====================================================

.. module:: xml.dom.minidom
   :synopsis: Minimal Document Object Model (DOM) implementation.

.. moduleauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>

**Source code:** :source:`Lib/xml/dom/minidom.py`

--------------

:mod:`xml.dom.minidom` is a minimal implementation of the Document Object
Model interface, with an API similar to that in other languages. It is intended
to be simpler than the full DOM and also significantly smaller. Users who are
not already proficient with the DOM should consider using the
:mod:`xml.etree.ElementTree` module for their XML processing instead.


.. warning::

   The :mod:`xml.dom.minidom` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data, see :ref:`xml-vulnerabilities`.


DOM applications typically start by parsing some XML into a DOM. With
:mod:`xml.dom.minidom`, this is done through the parse functions::

   from xml.dom.minidom import parse, parseString

   dom1 = parse('c:\\temp\\mydata.xml')  # parse an XML file by name

   datasource = open('c:\\temp\\mydata.xml')
   dom2 = parse(datasource)  # parse an open file

   dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')

The :func:`parse` function can take either a filename or an open file object.


.. function:: parse(filename_or_file, parser=None, bufsize=None)

   Return a :class:`Document` from the given input. *filename_or_file* may be
   either a file name or a file-like object. *parser*, if given, must be a SAX2
   parser object. This function will change the document handler of the parser and
   activate namespace support; other parser configuration (like setting an entity
   resolver) must have been done in advance.
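
As a sketch of the *parser* argument, the following assumes a SAX2 parser
obtained from :func:`xml.sax.make_parser` and uses an in-memory
:class:`io.BytesIO` stream in place of a real file; the element and attribute
names are purely illustrative. Any parser configuration happens before the
call, since :func:`parse` itself only changes the document handler and turns on
namespace support::

   import io
   import xml.sax
   from xml.dom.minidom import parse

   # Create (and, if needed, configure) the SAX2 parser up front; parse()
   # installs its own document handler and enables namespace support on it.
   sax_parser = xml.sax.make_parser()

   # An in-memory byte stream stands in for an open file here.
   source = io.BytesIO(b'<catalog><book id="b1"/></catalog>')

   dom = parse(source, parser=sax_parser)
   root = dom.documentElement
   tag = root.tagName                            # 'catalog'
   book_id = root.firstChild.getAttribute("id")  # 'b1'

   # Optional early cleanup once the tree is no longer needed.
   dom.unlink()

Passing a parser object is only worthwhile when it needs such custom setup; for
ordinary input, ``parse(source)`` with no *parser* argument is sufficient.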

If you have XML in a string, you can use the :func:`parseString` function
instead:


.. function:: parseString(string, parser=None)

   Return a :class:`Document` that represents the *string*. This method creates an
   :class:`io.StringIO` object for the string and passes that on to :func:`parse`.

Both functions return a :class:`Document` object representing the content of the
document.

What the :func:`parse` and :func:`parseString` functions do is connect an XML
parser with a "DOM builder" that can accept parse events from any SAX parser and
convert them into a DOM tree. The names of the functions are perhaps misleading,
but they are easy to grasp when learning the interfaces. The parsing of the
document will be completed before these functions return; it's simply that these
functions do not provide a parser implementation themselves.

You can also create a :class:`Document` by calling a method on a "DOM
Implementation" object. You can get this object either by calling the
:func:`getDOMImplementation` function in the :mod:`xml.dom` package or the
:mod:`xml.dom.minidom` module. Once you have a :class:`Document`, you
can add child nodes to it to populate the DOM::

   from xml.dom.minidom import getDOMImplementation

   impl = getDOMImplementation()

   newdoc = impl.createDocument(None, "some_tag", None)
   top_element = newdoc.documentElement
   text = newdoc.createTextNode('Some textual content.')
   top_element.appendChild(text)

Once you have a DOM document object, you can access the parts of your XML
document through its properties and methods. These properties are defined in
the DOM specification. The main property of the document object is the
:attr:`documentElement` property. It gives you the main element in the XML
document: the one that holds all others.
Here is an example program::

   from xml.dom.minidom import parseString

   dom3 = parseString("<myxml>Some data</myxml>")
   assert dom3.documentElement.tagName == "myxml"

When you are finished with a DOM tree, you may optionally call the
:meth:`unlink` method to encourage early cleanup of the now-unneeded
objects. :meth:`unlink` is an :mod:`xml.dom.minidom`\ -specific
extension to the DOM API that renders the node and its descendants
essentially useless. Otherwise, Python's garbage collector will
eventually take care of the objects in the tree.

.. seealso::

   `Document Object Model (DOM) Level 1 Specification <https://www.w3.org/TR/REC-DOM-Level-1/>`_
      The W3C recommendation for the DOM supported by :mod:`xml.dom.minidom`.


.. _minidom-objects:

DOM Objects
-----------

The definition of the DOM API for Python is given as part of the :mod:`xml.dom`
module documentation. This section lists the differences between the API and
:mod:`xml.dom.minidom`.


.. method:: Node.unlink()

   Break internal references within the DOM so that it will be garbage collected on
   versions of Python without cyclic GC. Even when cyclic GC is available, using
   this can make large amounts of memory available sooner, so calling this on DOM
   objects as soon as they are no longer needed is good practice. This only needs
   to be called on the :class:`Document` object, but may be called on child nodes
   to discard children of that node.

   You can avoid calling this method explicitly by using the :keyword:`with`
   statement. The following code will automatically unlink *dom* when the
   :keyword:`!with` block is exited::

      import xml.dom.minidom

      with xml.dom.minidom.parse(datasource) as dom:
          ...  # Work with dom.


.. method:: Node.writexml(writer, indent="", addindent="", newl="", \
                          encoding=None, standalone=None)

   Write XML to the writer object.
   The writer receives text (:class:`str`), not bytes, as input, so it should
   have a :meth:`write` method that matches that of the text file object
   interface. The *indent* parameter is the indentation of the current node.
   The *addindent* parameter is the incremental indentation to use for subnodes
   of the current one. The *newl* parameter specifies the string used to
   terminate lines.

   For the :class:`Document` node, an additional keyword argument *encoding* can
   be used to specify the encoding field of the XML header.

   Similarly, explicitly passing the *standalone* argument causes a standalone
   document declaration to be added to the prologue of the XML document. If the
   value is ``True``, ``standalone="yes"`` is added; otherwise it is set to
   ``"no"``. If the argument is omitted, the declaration is left out of the
   document.

   .. versionchanged:: 3.8
      The :meth:`writexml` method now preserves the attribute order specified
      by the user.

.. method:: Node.toxml(encoding=None, standalone=None)

   Return a string or byte string containing the XML represented by
   the DOM node.

   With an explicit *encoding* [1]_ argument, the result is a byte
   string in the specified encoding.
   With no *encoding* argument, the result is a Unicode string, and the
   XML declaration in the resulting string does not specify an
   encoding. Encoding this string in an encoding other than UTF-8 is
   likely incorrect, since UTF-8 is the default encoding of XML.

   The *standalone* argument behaves exactly as in :meth:`writexml`.

   .. versionchanged:: 3.8
      The :meth:`toxml` method now preserves the attribute order specified
      by the user.

.. method:: Node.toprettyxml(indent="\\t", newl="\\n", encoding=None, \
                             standalone=None)

   Return a pretty-printed version of the document.
   *indent* specifies the indentation string and defaults to a tab character;
   *newl* specifies the string emitted at the end of each line and defaults to
   ``\n``.

   The *encoding* argument behaves like the corresponding argument of
   :meth:`toxml`.

   The *standalone* argument behaves exactly as in :meth:`writexml`.

   .. versionchanged:: 3.8
      The :meth:`toprettyxml` method now preserves the attribute order specified
      by the user.


.. _dom-example:

DOM Example
-----------

The following program is a fairly realistic example of simple DOM usage. In this
particular case, we do not take much advantage of the flexibility of the DOM.

.. literalinclude:: ../includes/minidom-example.py


.. _minidom-and-dom:

minidom and the DOM standard
----------------------------

The :mod:`xml.dom.minidom` module is essentially a DOM 1.0-compatible DOM with
some DOM 2 features (primarily namespace features).

Usage of the DOM interface in Python is straightforward. The following mapping
rules apply:

* Interfaces are accessed through instance objects. Applications should not
  instantiate the classes themselves; they should use the creator functions
  available on the :class:`Document` object. Derived interfaces support all
  operations (and attributes) from the base interfaces, plus any new operations.

* Operations are used as methods. Since the DOM uses only :keyword:`in`
  parameters, the arguments are passed in normal order (from left to right).
  There are no optional arguments. ``void`` operations return ``None``.

* IDL attributes map to instance attributes. For compatibility with the OMG IDL
  language mapping for Python, an attribute ``foo`` can also be accessed through
  accessor methods :meth:`_get_foo` and :meth:`_set_foo`. ``readonly``
  attributes must not be changed; this is not enforced at runtime.

* The types ``short int``, ``unsigned int``, ``unsigned long long``, and
  ``boolean`` all map to Python integer objects.

* The type ``DOMString`` maps to Python strings. :mod:`xml.dom.minidom` supports
  either bytes or strings, but will normally produce strings.
  Values of type ``DOMString`` may also be ``None`` where the W3C DOM
  specification allows them to have the IDL ``null`` value.

* ``const`` declarations map to variables in their respective scope (e.g.
  ``xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE``); they must not be changed.

* ``DOMException`` is currently not supported in :mod:`xml.dom.minidom`.
  Instead, :mod:`xml.dom.minidom` uses standard Python exceptions such as
  :exc:`TypeError` and :exc:`AttributeError`.

* :class:`NodeList` objects are implemented using Python's built-in list type.
  These objects provide the interface defined in the DOM specification, but with
  earlier versions of Python they do not support the official API. They are,
  however, much more "Pythonic" than the interface defined in the W3C
  recommendations.

The following interfaces have no implementation in :mod:`xml.dom.minidom`:

* :class:`DOMTimeStamp`

* :class:`EntityReference`

These interfaces reflect information in the XML document that is not of general
utility to most DOM users.

.. rubric:: Footnotes

.. [1] The encoding name included in the XML output should conform to
   the appropriate standards. For example, "UTF-8" is valid, but
   "UTF8" is not valid in an XML document's declaration, even though
   Python accepts it as an encoding name.
   See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.