Python XML to JSON, XML to Dict

Published on August 3, 2022

By Shubham

Today we will learn how to convert XML to JSON and XML to a dict in Python. We can use the Python xmltodict module to read an XML file and convert it to a dict or to JSON data. The module can also stream over large XML files and convert them to dictionaries. Before stepping into the code, let’s first understand why XML conversion is necessary.

Converting XML to Dict/JSON

XML has slowly fallen out of favor, but many large systems on the web still use this format. XML is more verbose than JSON, so most developers prefer the latter in their applications. When an application needs to consume XML provided by some source, converting it to JSON by hand can be tedious. The xmltodict module in Python makes this task easy and straightforward.

Getting started with xmltodict

Before we can use the xmltodict module, we need to install it. We will mainly use pip to perform the installation.

Install xmltodict module

Here is how we can install the xmltodict module with pip, which downloads it from the Python Package Index (PyPI):

pip install xmltodict

The installation finishes quickly because xmltodict is a very lightweight module. It also does not depend on any other external package, which keeps it small and avoids version conflicts. On Debian-based systems, the module can also be installed with the apt tool:

sudo apt install python3-xmltodict

Another advantage is that this module is available as an official Debian package.

Python XML to JSON

The best place to start is the task this module was primarily built for: converting XML to JSON. Let’s look at a code snippet that shows how this can be done:

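import xmltodict
import json

my_xml = """
    <audience>
      <id what="attribute">123</id>
      <name>Shubham</name>
    </audience>
"""

# parse() converts the XML into a Python dictionary;
# json.dumps() serializes that dictionary as an indented JSON string.
print(json.dumps(xmltodict.parse(my_xml), indent=4))

Here, the parse(...) function converts the XML data into a Python dictionary, and json.dumps(...) from the json module serializes that dictionary to JSON. The indent=4 argument formats the JSON so it is easy to read. The program prints:

{
    "audience": {
        "id": {
            "@what": "attribute",
            "#text": "123"
        },
        "name": "Shubham"
    }
}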
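The JSON produced above is an ordinary Python string, so it can be kept in a variable and reused instead of only being printed. The minimal sketch below uses only the standard library: rather than calling xmltodict, it starts from a dictionary literal that mirrors what xmltodict.parse(my_xml) returns for the sample XML with default settings (the @ attribute prefix and the #text key). The variable names parsed, json_text, and restored are illustrative.

```python
import json

# Mirrors what xmltodict.parse(my_xml) returns for the sample XML
# (default settings: "@" prefix for attributes, "#text" for element text).
parsed = {
    "audience": {
        "id": {"@what": "attribute", "#text": "123"},
        "name": "Shubham",
    }
}

# json.dumps() returns a str, so the JSON can be stored in a variable.
json_text = json.dumps(parsed, indent=4)
print(json_text)

# json.loads() turns the JSON string back into a dictionary.
restored = json.loads(json_text)
print(restored["audience"]["name"])  # prints: Shubham
```

Note that pprint.pprint(...) only prints its argument and returns None, so assigning its result to a variable captures nothing; keep the return value of json.dumps(...) instead.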

Converting XML File to JSON

Keeping XML data in the code itself is often neither possible nor realistic. Usually, we keep our data in a database or in files, and we can convert an XML file to JSON just as easily. Here is a code snippet that performs the conversion on an XML file:

import xmltodict
import json

# Read the whole file and parse its contents into a dictionary
with open('person.xml') as fd:
    doc = xmltodict.parse(fd.read())

# Serialize the dictionary as indented JSON
print(json.dumps(doc, indent=4))

Here, the open(...) function gives us a file object, fd.read() returns the file’s contents, and xmltodict.parse(...) turns them into a dictionary. Finally, json.dumps(...) with indent=4 prints that dictionary as formatted JSON.
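Reading the whole file with fd.read() is fine for small files, but the introduction mentioned that xmltodict can also stream over large XML files. The minimal sketch below assumes xmltodict’s item_depth and item_callback parameters: each element at the given depth is handed to the callback as soon as it has been parsed. The sample XML, the in-memory io.BytesIO stand-in for a real file, and the handle_person callback are hypothetical and only for illustration.

```python
import io

import xmltodict

# Stand-in for a large file; in practice this could be open('big.xml', 'rb').
xml_file = io.BytesIO(b"""<people>
  <person><name>Ann</name></person>
  <person><name>Bob</name></person>
</people>""")

names = []

def handle_person(path, item):
    # path lists (tag, attributes) pairs from the root down to this element;
    # item is the dictionary built for one <person> element.
    names.append(item["name"])
    return True  # returning a falsy value would stop parsing

# Depth 1 is <people>, depth 2 is each <person>.
xmltodict.parse(xml_file, item_depth=2, item_callback=handle_person)
print(names)  # prints: ['Ann', 'Bob']
```

Passing a file object directly to parse(...) lets the parser read the input incrementally instead of loading it into one string first.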

Python XML to Dict

As the module name suggests, xmltodict converts the XML data we provide into a plain Python dictionary, so we can access the data with ordinary dictionary keys. Here is a sample program:

import xmltodict

my_xml = """
    <audience>
      <id what="attribute">123</id>
      <name>Shubham</name>
    </audience>
"""
my_dict = xmltodict.parse(my_xml)

# Tags become dictionary keys
print(my_dict['audience']['id'])
# Attributes are stored under keys prefixed with "@"
print(my_dict['audience']['id']['@what'])

The tags work as dictionary keys, and attributes are stored under keys prefixed with the @ symbol. Because the id element has an attribute, its text content (123) is stored under the special #text key, while name, which has no attributes, maps directly to the string 'Shubham'.
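Because an element with attributes becomes a dictionary while an element without attributes becomes a plain string, code that reads the parsed data often has to handle both shapes. The text_of function below is a hypothetical helper, not part of xmltodict; it assumes the default #text key and works on the dictionary that xmltodict.parse(my_xml) returns for the sample above, written out as a literal so the snippet runs on its own.

```python
# Literal copy of what xmltodict.parse(my_xml) returns for the sample XML.
my_dict = {
    "audience": {
        "id": {"@what": "attribute", "#text": "123"},
        "name": "Shubham",
    }
}

def text_of(node):
    """Return the text of a parsed element, whether or not it had attributes.

    Hypothetical helper: elements with attributes are dicts holding their
    text under "#text"; elements without attributes are plain strings.
    """
    if isinstance(node, dict):
        return node.get("#text")
    return node

# xmltodict keeps every value as a string, so convert numbers explicitly.
user_id = int(text_of(my_dict["audience"]["id"]))
name = text_of(my_dict["audience"]["name"])
print(user_id, name)  # prints: 123 Shubham
```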

Supporting Namespaces in XML

XML data often declares a set of namespaces that define the scope of the elements in the file. When converting to JSON, it is often necessary to preserve these namespaces in the JSON output as well. Let us consider this sample XML file:

<root xmlns="https://defaultns.com/"
        xmlns:a="https://a.com/">
    <audience>
        <id what="attribute">123</id>
        <name>Shubham</name>
    </audience>
</root>

Here is a sample program that keeps these namespaces in the converted data (it assumes the XML above is saved as person.xml):

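import xmltodict
import json

# process_namespaces=True writes each element's full namespace URI into its key
with open('person.xml') as fd:
    doc = xmltodict.parse(fd.read(), process_namespaces=True)

print(json.dumps(doc, indent=4))

With process_namespaces=True, each element name is prefixed with the full URI of its namespace followed by a colon, so the root element appears as the key https://defaultns.com/:root and its audience child as https://defaultns.com/:audience.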
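Full namespace URIs make the keys long. Together with process_namespaces=True, xmltodict.parse(...) also accepts a namespaces mapping that shortens a URI to a chosen prefix or drops it entirely when the mapped value is None. The minimal sketch below uses a smaller, made-up variant of the sample document and looks up individual keys rather than printing the whole structure.

```python
import xmltodict

# Simplified, hypothetical variant of the sample document.
xml = """<root xmlns="https://defaultns.com/" xmlns:a="https://a.com/">
  <name>Shubham</name>
  <a:id>123</a:id>
</root>"""

# Without a mapping, every key carries its full namespace URI.
full = xmltodict.parse(xml, process_namespaces=True)
print(full["https://defaultns.com/:root"]["https://a.com/:id"])  # prints: 123

# Drop the default namespace (None) and shorten https://a.com/ to "a".
namespaces = {"https://defaultns.com/": None, "https://a.com/": "a"}
short = xmltodict.parse(xml, process_namespaces=True, namespaces=namespaces)
print(short["root"]["a:id"])  # prints: 123
print(short["root"]["name"])  # prints: Shubham
```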

JSON to XML conversion

Although converting XML to JSON is the primary purpose of this module, xmltodict also supports the reverse operation: converting JSON-style data back into XML. We will define the data in the program itself as a Python dictionary. Here is a sample program:

import xmltodict

student = {
  "data" : {
    "name" : "Shubham",
    "marks" : {
      "math" : 92,
      "english" : 99
    },
    "id" : "s387hs3"
  }
}

print(xmltodict.unparse(student, pretty=True))

The unparse(...) function turns the dictionary into an XML document, and pretty=True adds line breaks and indentation to the output. Note that the dictionary must have exactly one key at its top level for this to work correctly. Suppose we modify our program so that it contains multiple keys at the first level of the data:

import xmltodict

student = {
    "name" : "Shubham",
    "marks" : {
        "math" : 92,
        "english" : 99
    },
    "id" : "s387hs3"
}

print(xmltodict.unparse(student, pretty=True))

In this case, we have three keys at the root level. If we try to unparse this data, xmltodict raises a ValueError. This happens because xmltodict uses the single top-level key as the root element of the XML document, and a well-formed XML document must have exactly one root element. This means that there should be only a single key at the root level of the data.
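One way around this error is to nest the existing keys under a single new root key before calling unparse(...). The minimal sketch below assumes the root name student, which is an arbitrary choice. It also parses the generated XML again to show that every value comes back as a string, because XML text carries no numeric type.

```python
import xmltodict

student = {
    "name": "Shubham",
    "marks": {"math": 92, "english": 99},
    "id": "s387hs3",
}

# Three top-level keys cannot become a single XML root element.
try:
    xmltodict.unparse(student)
except ValueError as err:
    print("unparse failed:", err)

# Wrapping the data under one (arbitrarily named) key fixes that.
xml_text = xmltodict.unparse({"student": student}, pretty=True)
print(xml_text)

# Round trip: parse the XML again; the numbers come back as strings.
round_trip = xmltodict.parse(xml_text)
print(round_trip["student"]["marks"]["math"])  # prints: 92
```

Since xml_text is an ordinary string, saving the XML is just a matter of writing it to a file opened in text mode.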

Conclusion

In this lesson, we studied a handy Python module that can parse XML and convert it to JSON and vice versa. We also learned how to convert XML to a dict using the xmltodict module. Reference: xmltodict API documentation

Comments

July 19, 2021

Thanks for your tutorial. This worked for me: pp.pprint(json.dumps(xmltodict.parse(results))) However as your tutorial ends early I don’t know what to do next if I want to store this result in a variable. I tried json_data = pp.pprint(json.dumps(xmltodict.parse(results))) but I don’t think this is right. How do I then store this result in a variable to use next in my code without storing it in a file?

- Ben

April 3, 2020

How can I save the XML document after reading it as JSON?

- Jhonatan
