xml module in Python - module index

XML stands for Extensible Markup Language. It is used to represent arbitrary data in plain-text format, in a way that is both human-readable and machine-readable.

Because XML data is stored as plain text, it does not depend on any particular software or hardware system. This makes XML a convenient format for exchanging data between different systems.

XML's syntax is much like that of HTML. However, unlike HTML and similar languages, XML has no predefined tags and is not tied to any particular domain of application. Its main purpose is storing and transmitting arbitrary data between systems of any kind.

Files containing XML data typically have the .xml extension.

Consider the following XML document, which we will save as messages.xml:

<?xml version="1.0" encoding="UTF-8"?>

<messages>
  <message>
    <from>John</from>
    <to>Mary</to>
    <date>2024/04/05</date>
    <time>11:00pm</time>
    <subject>Requesting a date.</subject>
    <body>I hope you are doing fine, I saw you yesterday and...</body>
  </message>
  <message>
    <from>Mary</from>
    <to>John</to>
    <date>2024/04/09</date>
    <time>12:00pm</time>
    <subject>Lorem ipsum dolor sit amet</subject>
    <body>Maxime mollitia, molestiae quas vel sint commodi repudiandae consequuntur....</body>
  </message>
</messages>

As the example shows, the tags in an XML document are not predefined; they are invented by the author to describe the data.

What the xml module does

The xml package in the standard library provides tools for working with and manipulating XML documents. It contains four subpackages, summarized in the following table:

Subpackage	Description
`xml.etree`	Provides the `ElementTree` API, which is useful for processing XML documents.
`xml.dom`	The W3C Document Object Model (DOM) API. Represents an XML document as a hierarchy of node objects.
`xml.parsers`	Wrappers for XML parsers, notably `xml.parsers.expat`, the interface to the Expat parser.
`xml.sax`	Provides support for the SAX2 (Simple API for XML) event-driven interface.
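The table only names `xml.sax`, so here is a minimal sketch of the event-driven SAX style. Instead of building a tree, the parser calls methods on a handler as it meets each opening tag, chunk of text, and closing tag. The sample data is a trimmed inline copy of the messages document, and the class name `SenderCollector` is hypothetical, chosen for illustration.

```python
import xml.sax

# A trimmed sample in the same shape as the messages document (inline for self-containment)
DATA = b"""<?xml version="1.0" encoding="UTF-8"?>
<messages>
  <message><from>John</from><to>Mary</to></message>
  <message><from>Mary</from><to>John</to></message>
</messages>"""


class SenderCollector(xml.sax.ContentHandler):
    """Hypothetical handler: collects the text of every <from> element as the document streams by."""

    def __init__(self):
        super().__init__()
        self.senders = []
        self._in_from = False
        self._buffer = []

    def startElement(self, name, attrs):
        # called once for every opening tag
        if name == 'from':
            self._in_from = True
            self._buffer = []

    def characters(self, content):
        # text may be delivered in several chunks, so buffer it
        if self._in_from:
            self._buffer.append(content)

    def endElement(self, name):
        # called once for every closing tag
        if name == 'from':
            self.senders.append(''.join(self._buffer))
            self._in_from = False


handler = SenderCollector()
xml.sax.parseString(DATA, handler)
print(handler.senders)  # ['John', 'Mary']
```

Because SAX never holds the whole document in memory, it suits very large files; the trade-off is that the handler must track its own state (here, whether it is inside a `<from>` element).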

`xml.etree` submodule

The xml.etree package contains the ElementTree module, whose ElementTree class is useful for parsing XML data. An ElementTree object represents XML data as a tree in which the hierarchy of nodes mirrors the nesting of elements in the document, so once the document is parsed, the whole of it can be navigated as a tree.
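To make the tree structure concrete, here is a minimal sketch that parses a trimmed, inline copy of the messages document with `ElementTree.fromstring()`, which returns the root element directly rather than an ElementTree. Using a string instead of a file is an assumption made so the snippet is self-contained.

```python
from xml.etree import ElementTree

# A trimmed copy of the messages document held in a string
DATA = """<messages>
  <message>
    <from>John</from>
    <to>Mary</to>
  </message>
  <message>
    <from>Mary</from>
    <to>John</to>
  </message>
</messages>"""

# fromstring() returns the root Element of the parsed tree
root = ElementTree.fromstring(DATA)
print(root.tag)   # messages
print(len(root))  # 2 -> the root has two <message> children

# nesting in the document becomes parent/child links in the tree
first = root[0]
print(first.tag)                        # message
print([child.tag for child in first])   # ['from', 'to']

# findtext() returns the text of the first matching child
print(first.findtext('from'))           # John
print([m.findtext('to') for m in root.findall('message')])  # ['Mary', 'John']
```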

The ElementTree.parse() helper function parses an XML file and returns an ElementTree object for it. Consider the following example:

parse an xml document

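from xml.etree import ElementTree

# parse() accepts a file name (or an open file object) and returns an ElementTree
tree = ElementTree.parse('messages.xml')

# iter('message') walks the tree and yields every <message> element
for node in tree.iter('message'):
    for element in node:
        # each child element has a tag name and its text content
        print(element.tag, ':', element.text)
    print()  # blank line between messages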

from : John
to : Mary
date : 2024/04/05
time : 11:00pm
subject : Requesting a date.
body : I hope you are doing fine, I saw you yesterday and...

from : Mary
to : John
date : 2024/04/09
time : 12:00pm
subject : Lorem ipsum dolor sit amet
body : Maxime mollitia, molestiae quas vel sint commodi repudiandae consequuntur....
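Printing is fine for inspection, but usually the parsed data is needed as ordinary Python objects. As a minimal sketch, the hypothetical helper `load_messages()` below turns each `<message>` element into a dictionary mapping child tags to their text. It parses an inline string with `fromstring()` so that it runs on its own; reading a file would work the same way via `ElementTree.parse(...).getroot()`.

```python
from xml.etree import ElementTree

# A trimmed copy of the messages document (inline for self-containment)
DATA = """<messages>
  <message>
    <from>John</from>
    <to>Mary</to>
    <subject>Requesting a date.</subject>
  </message>
  <message>
    <from>Mary</from>
    <to>John</to>
    <subject>Lorem ipsum dolor sit amet</subject>
  </message>
</messages>"""


def load_messages(xml_text):
    """Hypothetical helper: return one dict per <message>, mapping child tag -> text."""
    root = ElementTree.fromstring(xml_text)
    return [
        {child.tag: child.text for child in message}
        for message in root.iter('message')
    ]


messages = load_messages(DATA)
print(messages[0]['from'], '->', messages[0]['to'])  # John -> Mary
print(messages[1]['subject'])                         # Lorem ipsum dolor sit amet
```

A dictionary per message works because every child tag in this document is unique within its `<message>`; if a tag could repeat, a list of values per tag would be needed instead.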

xml.dom

The Document Object Model (DOM) is a programming interface for processing XML and similar documents such as HTML.

The xml.dom package provides the DOM interface for processing XML documents in Python.
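In the DOM, every part of a document is a node: elements, and also the text between them, including whitespace used for indentation. The minimal sketch below, which parses a small inline string with `minidom.parseString()` (an assumption made so it runs without a file), lists the child nodes of the root element together with their types.

```python
from xml.dom import minidom

# A small inline document; note the newline and indentation between the two child elements
DATA = "<message><from>John</from>\n  <to>Mary</to></message>"

# parseString() returns a Document node; documentElement is the root element
doc = minidom.parseString(DATA)
root = doc.documentElement
print(root.tagName)  # message

# childNodes contains element nodes AND the whitespace text node between them
kinds = []
for node in root.childNodes:
    if node.nodeType == node.ELEMENT_NODE:
        kinds.append(('element', node.tagName))
    elif node.nodeType == node.TEXT_NODE:
        kinds.append(('text', node.data))
print(kinds)  # [('element', 'from'), ('text', '\n  '), ('element', 'to')]
```

Because whitespace shows up as text nodes, code that extracts text from DOM elements typically checks `nodeType` before reading a node's data.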

We can use the minidom.parse() function from xml.dom.minidom, a minimal implementation of the DOM interface, to parse an XML file into a Document object.

from xml.dom import minidom

# parse the file into a DOM Document object
dom = minidom.parse('messages.xml')

# collect every <message> element in the document
messages = dom.getElementsByTagName('message')

# retrieves the concatenated text content of a node
def getNodeText(node):
    result = []
    for child in node.childNodes:
        # keep only text nodes; skip any nested elements
        if child.nodeType == child.TEXT_NODE:
            result.append(child.data)
    return ''.join(result)


for message in messages:
    print('from:', getNodeText(message.getElementsByTagName('from')[0]))
    print('to:', getNodeText(message.getElementsByTagName('to')[0]))
    print('subject:', getNodeText(message.getElementsByTagName('subject')[0]))
    print('body:', getNodeText(message.getElementsByTagName('body')[0]))
    print()  # blank line between messages

from: John
to: Mary
subject: Requesting a date.
body: I hope you are doing fine, I saw you yesterday and...

from: Mary
to: John
subject: Lorem ipsum dolor sit amet
body: Maxime mollitia, molestiae quas vel sint commodi repudiandae consequuntur....
