PINGDOM_CHECK

#ExtractSummit2026 The world's largest web scraping conference returns. Austin Oct 7–8 · Dublin Nov 10–11.

Register now
Data Services
Pricing
Login
Try Zyte APIContact Sales
  • Unblocking and Extraction

    Zyte API

    The ultimate API for web scraping. Avoid website bans and access a headless browser or AI Parsing

    Ban Handling

    Headless Browser

    AI Extraction

    SERP

    Enterprise

    DocumentationSupport

    Hosting and Deployment

    Scrapy Cloud

    Run, monitor, and control your Scrapy spiders however you want to.

    Coding Agent Add-Ons

    Agentic Web Data

    Plugins that give coding agents the context to build production Scrapy projects. Starts with Claude Code.

  • Data Services
  • Pricing
  • Browse

    • BlogArticles, podcasts, videos
    • Case studiesCustomer outcomes
    • White papersIn-depth reports
    • EventsConferences, webinars, recordings

    Subscribe

    • NewsletterSwiftly delivered
    • Discord communityExtract Data community
  • Product and E-commerce

    From e-commerce and online marketplaces

    Data for AI

    Collect and structure web data to feed AI

    Job Posting

    From job boards and recruitment websites

    Real Estate

    From Listings portals and specialist websites

    News and Article

    From online publishers and news websites

    Search

    Search engine results page data (SERP)

    Social Media

    From social media platforms online

  • Meet Zyte

    Our story, people and values

    Contact us

    Get in touch

    Support

    Knowledge base and raise support tickets

    Terms and Policies

    Accept our terms and policies

    Open Source

    Our open source projects and contributions

    Web Data Compliance

    Guidelines and resources for compliant web data collection

    Join the team building the future of web data
    We're Hiring
    Trust Center
    Security, compliance & certifications
Login
Try Zyte APIContact Sales

Zyte Developers

Coding tools & hacks straight to your inbox

Become part of the community and receive a bi-weekly dosage of all things code.

Join us
    • Zyte Data
    • News & Articles
    • Search
    • Social Media
    • Product
    • Data for AI
    • Job Posting
    • Real Estate
    • Zyte API - Ban Handling
    • Zyte API - Headless Browser
    • Zyte API - AI Extraction
    • Web Scraping Copilot
    • Zyte API Enterprise
    • Scrapy Cloud
    • Solution Overview
    • Blog
    • Webinars
    • Case Studies
    • White Papers
    • Documentation
    • Web Scraping Maturity Self-Assesment
    • Web Data compliance
    • Meet Zyte
    • Jobs
    • Terms and Policies
    • Trust Center
    • Support
    • Contact us
    • Pricing
    • Do not sell
    • Cookie settings
    • Sign up
    • Talk to us
    • Cost estimator
BlogLearnA Practical Guide to Python XML Parsing
LearnScraping practice

A Practical Guide to Python XML Parsing

K

Karlo Jedud

·

10 min read · May 15, 2025

Summarize at:

ChatGPT

Perplexity


  • Introduction

  • Why Use XML in Modern Development?

  • Setting Up Your Python Environment for XML Parsing

  • Navigating the XML Tree Structure

  • Modifying XML Data

  • Deleting XML Elements

  • Saving Changes to XML Files

  • Advanced XML Parsing Techniques

  • Adding Metadata

  • Conclusion

Extensible Markup Language (XML) is a standard format used for structuring, storing, and transporting data. Despite the rise of JSON in modern APIs, XML remains a vital format, especially in industries like telecommunications, banking, healthcare, and e-commerce, where complex data structures and strict validation are often required.

XML is a powerful markup language that enables the representation of hierarchical data, making it perfect for scenarios where the relationships between data points need to be expressed explicitly. Whether you're dealing with configuration files, SOAP-based APIs, or large datasets used in enterprise applications, understanding how to parse and manipulate XML is essential.

In this guide, we’ll dive into the world of XML parsing using Python, providing not just basic methods but also advanced techniques like handling XML namespaces, performing XPath queries, and mapping XML data to custom Python objects. By the end, you will have a deep understanding of how to read, modify, and write XML files efficiently using Python.

Why Use XML in Modern Development?

While JSON may be the go-to format for many web APIs, XML has distinct advantages in certain use cases. Here’s why XML is still relevant and used widely today:

  1. Complex Hierarchical Data: XML allows for more complex data representations than JSON, especially when the relationships between data elements are critical.

  2. Validation: With XML Schemas (XSD), developers can enforce strong typing and validate the structure and data types in XML documents.

  3. Wide Adoption in Legacy Systems: Many enterprise systems, especially in banking, government, and telecom sectors, are built on XML-based protocols.

  4. Rich Metadata Support: XML is ideal when metadata, such as versioning, authorship, and creation dates, needs to be included along with the data.

  5. Namespacing: XML supports namespaces, allowing developers to avoid name conflicts in large and complex XML documents, especially when combining multiple sources of data.

Given these advantages, XML remains a critical format in many fields. Therefore, learning how to parse and manipulate XML data is a necessary skill for developers working with enterprise systems, data pipelines, and other data-intensive applications.

Setting Up Your Python Environment for XML Parsing

Before we get into the code, let’s make sure your Python environment is ready for XML parsing. Python’s standard library includes xml.etree.ElementTree, which is the most commonly used module for XML parsing. For more complex XML handling, you might also want to consider external libraries like lxml.

Required Libraries

1.ElementTree: Python’s built-in XML parsing library. No external installation is required, as it’s part of the Python Standard Library.
bash

python

2.lxml (optional): A more powerful XML and HTML handling library. It supports XPath and XSLT, making it ideal for complex use cases.
bash

python

Once you’ve got your environment ready, we’ll proceed to the basic tasks of loading and parsing XML data using Python.

Loading and Parsing XML Data in Python

Loading XML data is the first step to working with XML in Python. Depending on the source, XML data can come from files, strings, or even web responses (such as API calls). Python provides flexible methods to handle each case.

Loading XML from a File

The most common scenario is loading XML data from a file, especially when dealing with configuration files or structured datasets. Here’s how you can load and parse XML from a file:

python

python

In this example:

  • ET.parse() loads the XML file into an element tree.

  • getroot() retrieves the root element of the XML structure, which is the starting point for further exploration.

Loading XML from a String

Sometimes, XML data is delivered as a string—especially when working with web services or APIs. You can load XML from a string using ET.fromstring():

python

python

This method is ideal when you need to quickly parse XML content from sources like HTTP responses or dynamically generated strings.

Loading XML from a URL (Web API)

When working with web services, you often receive XML data as a response. You can use Python’s requests library to fetch the XML from a URL and then parse it:

python

python

In this case, response.content contains the XML data, which is then parsed using ET.fromstring().

Navigating the XML Tree Structure

Once the XML data is loaded, it’s time to navigate the tree structure. XML is hierarchical, meaning each element can have children, attributes, and text content.

Accessing Child Elements

You can easily access the child elements of any node in the tree:

python

python

This loop goes through the children of the root node, printing their tags (element names) and text (element content).

Accessing Nested Elements

For more deeply nested elements, you can chain multiple loops or use XPath-like syntax:

python

python

This example navigates through all grandchild elements of the root node.

Accessing Attributes

Attributes in XML are key-value pairs attached to elements. You can retrieve them using the .attrib property:

python

python

If an element has an attribute, for instance, , the .attrib property will return it as a dictionary: {'id': '1'}.

Finding Specific Elements

You can search for specific elements using find() or findall(). While find() returns the first matching element, findall() returns a list of all matching elements:

python

python

Modifying XML Data

Modifying XML data is a common task, especially when working with configurations or building dynamic datasets. Python provides convenient methods to change the content, attributes, and structure of an XML tree.

Changing Element Text

You can modify the text content of any element in the XML tree:

python

python

This snippet updates the text of all child elements to 'Updated Value'.

Modifying Attributes

Just like element text, attributes can also be modified dynamically:

python

python

This example changes the id attribute of every element to '2'.

Inserting New Elements into XML

In some cases, you may need to insert new elements into the XML tree. This can be done by appending new child elements or adding attributes to existing ones.

Adding a New Element

Here’s how you can add a new element to the root node:

python

python

Inserting Elements with Attributes

You can also add elements with attributes, making the XML data more structured:

python

python

Deleting XML Elements

If you need to remove elements from an XML tree, Python’s ElementTree module provides an easy way to do so. Keep in mind that removing a parent element will also delete all its child elements.

Removing Elements

Here’s how you can remove specific elements:

python

This code removes all elements from the root node.

Saving Changes to XML Files

After you’ve modified the XML structure, you’ll want to save the changes. Python allows you to write the updated XML back to a file, ensuring that your changes are persistent.

Saving the XML Tree to a File

Here’s how you can save the modified XML tree to a file:

python

  • encoding='utf-8': Ensures the file is saved with UTF-8 encoding, which is essential for handling special characters.

  • xml_declaration=True: Adds the XML declaration () to the beginning of the file.

Advanced XML Parsing Techniques

While basic parsing and modification techniques cover most use cases, more complex scenarios may require advanced techniques like XPath queries, handling XML namespaces, and mapping XML data to custom Python objects.

Using XPath for Complex Queries

XPath is a powerful query language for selecting nodes in an XML document based on various criteria. Here’s how to use XPath to find specific elements in an XML document:

python

This query retrieves all elements with an id attribute of "2". XPath is useful for filtering elements or navigating deeply nested structures efficiently.

Handling XML Namespaces

XML namespaces are often used to avoid naming conflicts, especially when combining XML documents from different sources. To handle XML namespaces, you need to declare them in your code:

python

In this example, the namespaces dictionary maps the ns prefix to the full namespace URL.

Mapping XML Data to Python Objects

In some cases, you may want to map XML data to custom Python objects for easier manipulation and interaction. This approach is especially useful when working with object-oriented programming in Python.

Example: Mapping XML Data to a Custom Class

Here’s an example of how to create a Python class that represents XML data:

python

In this code, we map the XML elements to attributes of a Person class, allowing for a more structured and object-oriented approach to handling XML data.

Encoding and Decoding XML

Proper encoding and decoding are crucial when working with XML files, especially when dealing with special characters or non-ASCII text.

Writing XML with Proper Encoding

To ensure special characters are encoded correctly in the XML file, use the encoding parameter when saving:

python

This ensures that characters outside the ASCII range are properly handled.

Decoding XML Data from Different Encodings

If your XML data is encoded in something other than UTF-8 (such as ISO-8859-1), you’ll need to specify the encoding when reading the file:

python

This ensures that special characters are read correctly.

Adding Metadata

Metadata provides additional information about the XML document, such as versioning, authorship, or document creation details.

Here’s an example of adding metadata:

python

This demonstrates how to add attributes to the root element that serve as metadata.

Conclusion

In this comprehensive guide, we've explored how to load, parse, modify, and save XML data in Python. From basic operations like navigating the XML tree to more advanced tasks like XPath queries, namespaces, and mapping XML data to Python objects, Python offers robust tools for handling XML effectively.

XML remains a crucial format for structured data exchange in various industries, and mastering its parsing and manipulation with Python will open up opportunities for working with APIs, enterprise systems, and data pipelines. Whether you're dealing with simple configuration files or complex XML documents with namespaces and metadata, Python’s versatility makes it an ideal language for XML parsing.

By applying the techniques outlined in this guide, you’ll be well-equipped to handle XML tasks efficiently in your projects. Keep exploring the vast landscape of XML and Python, and you'll find new ways to optimize your data workflows.

More learn articles

Keep learning

All learn articles →
What are residential proxies bannerUse case

What is a residential proxy?

Learn what residential proxies are, how they compare to datacenter proxies, and why modern web scraping needs more than IP diversity.

10 min read

Zyte Case Studies — every customer story, in one placeUse case

How much do rotating proxies cost?

Learn how much rotating proxies cost, what affects pricing, and why total web scraping costs often go beyond proxy subscriptions.

10 min read

Zyte Case Studies — every customer story, in one placeUse case

How do rotating proxies work?

Learn how rotating proxies work, when to use them for web scraping, and why IP rotation alone is not enough for reliable data access.

10 min read

G2.com

Capterra.com

Proxyway.com

EWDCI logoMost loved workplace certificateZyte rewardISO 27001 iconG2 rewardG2 rewardG2 reward

© Zyte Group Limited 2026
1pip install xml.etree.ElementTree
Copy
1pip install lxml
Copy
1import xml.etree.ElementTree as ET
2
3# Load XML data from a file
4tree = ET.parse('example.xml')
5root = tree.getroot()
6
7# Print the root element
8print(root.tag)
Copy
1xml_data = """<root>
2                <child>Value</child>
3              </root>"""
4root = ET.fromstring(xml_data)
5print(root.tag)  # Output: root
Copy
1import requests
2import xml.etree.ElementTree as ET
3
4# Fetch XML from a URL
5response = requests.get('https://example.com/api/data.xml')
6root = ET.fromstring(response.content)
7
8# Print root tag
9print(root.tag)
Copy
1# Access child elements
2for child in root:
3    print(child.tag, child.text)
Copy
1# Access nested elements
2for child in root.findall(".//grandchild"):
3    print(child.tag, child.text)
Copy
1# Access attributes
2for elem in root.iter('child'):
3    print(elem.attrib)
Copy
1# Find the first 'child' element
2first_child = root.find('child')
3print(first_child.text)
4
5# Find all 'child' elements
6all_children = root.findall('child')
7for child in all_children:
8    print(child.tag, child.text)
Copy
1# Modify the text content of a child element
2for elem in root.iter('child'):
3    elem.text = 'Updated Value'
Copy
1# Modify an attribute
2for elem in root.iter('child'):
3    elem.set('id', '2')
Copy
1# Add a new element under the root
2new_element = ET.Element('newchild')
3new_element.text = 'New Value'
4root.append(new_element)
Copy
1# Add an element with attributes
2new_element = ET.Element('newchild', attrib={'id': '3'})
3new_element.text = 'New Value'
4root.append(new_element)
Copy
1# Remove a specific element
2for elem in root.findall('child'):
3    root.remove(elem)
Copy
1# Save the modified XML tree back to a file
2tree.write('modified_example.xml', encoding='utf-8', xml_declaration=True)
Copy
1# Use XPath to find elements
2for elem in root.findall(".//child[@id='2']"):
3    print(elem.tag, elem.text)
Copy
1# Parse XML with namespaces
2namespaces = {'ns': 'http://example.com/ns'}
3for elem in root.findall('ns:child', namespaces):
4    print(elem.text)
Copy
1class Person:
2    def __init__(self, name, age):
3        self.name = name
4        self.age = age
5
6# Create an object from XML data
7person_element = root.find('person')
8person = Person(person_element.find('name').text, person_element.find('age').text)
9
10print(f"Person: {person.name}, Age: {person.age}")
Copy
1# Write XML with UTF-8 encoding
2tree.write('encoded_example.xml', encoding='utf-8', xml_declaration=True)
Copy
1# Decode XML data with a specific encoding
2with open('example.xml', encoding='iso-8859-1') as file:
3    content = file.read()
4    root = ET.fromstring(content)
Copy
1# Add metadata to the root element
2root.set('version', '1.0')
3root.set('author', 'John Doe')
Copy