Version: 8.1

Parsing XML with Java Libraries

XML Document Security

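Parsing XML from untrusted sources can be dangerous unless the parser implementation is hardened first, for example against XML External Entity (XXE) injection and entity-expansion attacks. Best practices for preventing attackers from exploiting these vulnerabilities are documented in resources such as the OWASP XML External Entity Prevention Cheat Sheet.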
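As a minimal sketch of what hardening can look like for the DOM parser used on this page, the snippet below enables secure processing and rejects any document that contains a DOCTYPE declaration, which blocks most XXE and entity-expansion attacks. The disallow-doctype-decl feature URI is specific to the Xerces-based parser bundled with the JDK, and the sample payload is made up for illustration; treat the settings as a starting point rather than a complete policy.

```python
from javax.xml import XMLConstants
from javax.xml.parsers import DocumentBuilderFactory
from java.io import ByteArrayInputStream
from org.xml.sax import SAXParseException

builderFactory = DocumentBuilderFactory.newInstance()

# Turn on the JAXP secure-processing mode (applies implementation-defined limits)
builderFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, True)

# Reject any document that declares a DOCTYPE (Xerces-specific feature URI)
builderFactory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", True)

# Do not process XInclude directives or expand entity references
builderFactory.setXIncludeAware(False)
builderFactory.setExpandEntityReferences(False)

builder = builderFactory.newDocumentBuilder()

# Hypothetical payload that declares an internal entity
suspicious = '<!DOCTYPE employee [<!ENTITY x "y">]><employee>&x;</employee>'

try:
    builder.parse(ByteArrayInputStream(suspicious.encode('utf-8')))
except SAXParseException as e:
    # The hardened builder refuses the document instead of expanding the entity
    print("Rejected: %s" % e.getMessage())
```

The same two setFeature calls are also available on SAXParserFactory, so a SAX parser can be hardened the same way.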

Note

You will need to place your script in the current working directory where the script will be running. For example, if your script affects the Designer Launcher, put your script in the directory where your Designer Launcher is installed.

What is the DOM Parser?

The Document Object Model (DOM) parser provides a powerful way to parse and manipulate XML documents. It's commonly used due to its ease of use and comprehensive functionality. The DOM parser reads the whole document into memory and exposes it as a tree of nodes, with each XML element represented as a node in that tree. For more information on interfacing with the DOM parser, refer to the Java XML DOM Parser Documentation.

Using the DOM Parser

There are several ways to import XML data using the DOM parser, depending on how it's stored. It can retrieve data from an XML file using the file path or from a string. Regardless of the method, it provides a root object representing the XML document.

Jython - Reading a File
from javax.xml.parsers import DocumentBuilderFactory
from java.io import File

# Define your XML file path
xmlFilePath = "file.xml"  # Replace with your actual XML file path

# Create a DOM document builder
builderFactory = DocumentBuilderFactory.newInstance()
builder = builderFactory.newDocumentBuilder()

# Parse the XML file
xmlFile = File(xmlFilePath)  # avoids shadowing Python's built-in file type
document = builder.parse(xmlFile)

# Access the root element
root = document.getDocumentElement()

Jython - Reading from a String
from javax.xml.parsers import DocumentBuilderFactory
from java.io import ByteArrayInputStream

# Define your XML string
xmlString = """
<employee id="1234">
    <name>John Smith</name>
    <start_date>2010-11-26</start_date>
    <department>IT</department>
    <title>Tech Support</title>
</employee>
"""  # Replace with your actual XML string

# Create a DOM document builder
builderFactory = DocumentBuilderFactory.newInstance()
builder = builderFactory.newDocumentBuilder()

# Parse the XML string
stream = ByteArrayInputStream(xmlString.encode('utf-8'))
document = builder.parse(stream)

# Access the root element
root = document.getDocumentElement()

Each tag is represented as an Element object. In the example above, the root element is the employee tag. Elements can carry attributes inside the start tag itself; here, the employee element has an id attribute with a value of 1234. Elements can also contain data between their start and end tags, such as text or child elements. All of this can be accessed using the Element object's built-in functionality.

Function	Description	Example	Output
Element.tagName	Returns the name of the element's tag.	`print root.tagName`	`employee`
Element.attributes	Returns a NamedNodeMap of the element's attributes, which can be read by index or by name.	`print root.attributes.item(0)`	`id="1234"`
Element.textContent	Returns the text of the element and all of its descendants, including the whitespace between child tags.	`print root.textContent`	`John Smith 2010-11-26 IT Tech Support`
Element.childNodes.item(index)	Allows direct reference to an element's children by index. Whitespace between tags counts as a text node, so child elements sit at the odd indexes in this example.	`print root.childNodes.item(5).tagName`	`department`
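Because the whitespace between tags is stored as text nodes, indexing children by position can be fragile. The sketch below, which reuses the employee string from the example above, reads the attribute by name and walks only the child nodes that are elements. It is an illustrative alternative, not the only way to traverse the tree.

```python
from javax.xml.parsers import DocumentBuilderFactory
from java.io import ByteArrayInputStream
from org.w3c.dom import Node

xmlString = """<employee id="1234">
    <name>John Smith</name>
    <start_date>2010-11-26</start_date>
    <department>IT</department>
    <title>Tech Support</title>
</employee>"""

builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
document = builder.parse(ByteArrayInputStream(xmlString.encode('utf-8')))
root = document.getDocumentElement()

# Look up an attribute by name instead of by position
print("id = %s" % root.getAttribute("id"))

# Walk the children, skipping the whitespace-only text nodes between tags
children = root.getChildNodes()
for i in range(children.getLength()):
    child = children.item(i)
    if child.getNodeType() == Node.ELEMENT_NODE:
        print("%s = %s" % (child.getTagName(), child.getTextContent().strip()))
```

Filtering on Node.ELEMENT_NODE keeps the loop working even if the XML is reformatted or minified.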

A Simple Employee Example

Using the functions above, let's parse through a sample XML string and extract employee data. We'll demonstrate how to access different elements and attributes and display them. Let's iterate through the XML elements and print out the following employee details:

Code Output
Employee ID: 1
Name: John Doe
Department: Engineering

Employee ID: 2
Name: Jane Smith
Department: Marketing

Code Snippet - Extracting Employee Details
from javax.xml.parsers import DocumentBuilderFactory
from java.io import ByteArrayInputStream

# Define your XML string
xmlString = """
<employees>
  <employee id="1">
      <name>John Doe</name>
      <department>Engineering</department>
  </employee>
  <employee id="2">
      <name>Jane Smith</name>
      <department>Marketing</department>
  </employee>
</employees>
"""  # Replace with your actual XML string

# Create a DOM document builder
builderFactory = DocumentBuilderFactory.newInstance()
builder = builderFactory.newDocumentBuilder()

# Parse the XML string
document = builder.parse(ByteArrayInputStream(xmlString.encode('utf-8')))

# Access the root element
root = document.getDocumentElement()

# Iterate through employees
employees = root.getElementsByTagName("employee")
for i in range(employees.getLength()):
  employee = employees.item(i)
  # Convert the id attribute to an integer
  employeeId = int(employee.getAttribute("id"))
  print "Employee ID:", employeeId
  print "Name:", employee.getElementsByTagName("name").item(0).textContent
  print "Department:", employee.getElementsByTagName("department").item(0).textContent
  print

What is the SAX Parser?

The Simple API for XML (SAX) parser, available through Java libraries, provides an event-driven approach to parse XML documents. It's widely used for its efficiency, especially when handling large XML files. SAX parses XML sequentially and triggers events as it encounters elements, attributes, and other components in the XML document. For more detailed information about the SAX parser, refer to the Java XML SAX Parser Documentation.

Using the SAX Parser

The SAX parser doesn't build a tree structure like the DOM parser. Instead, it parses the XML document sequentially and triggers events that the developer can handle. Here's an example of using the SAX parser to parse an XML string:

Jython - Reading from a String
from javax.xml.parsers import SAXParserFactory
from org.xml.sax.helpers import DefaultHandler
from java.io import ByteArrayInputStream
from java.lang import String

# Define your XML string
xmlString = """
<employees>
    <employee id="1">
        <name>John Doe</name>
        <department>Engineering</department>
    </employee>
    <employee id="2">
        <name>Jane Smith</name>
        <department>Marketing</department>
    </employee>
</employees>
"""  # Replace with your actual XML string

# Define a custom ContentHandler
class MyContentHandler(DefaultHandler):
    # Called for every opening tag
    def startElement(self, uri, localName, qName, attributes):
        print("Start Element: %s" % qName)
        for i in range(attributes.getLength()):
            print("Attribute: %s = %s" % (attributes.getQName(i), attributes.getValue(i)))

    # Called for every closing tag
    def endElement(self, uri, localName, qName):
        print("End Element: %s" % qName)

    # Called for text, including the whitespace between tags
    def characters(self, ch, start, length):
        # ch is a Java char array, so convert the requested slice to a string
        print("Character Data: %s" % String.valueOf(ch, start, length))

# Create a SAX parser
saxParserFactory = SAXParserFactory.newInstance()
saxParser = saxParserFactory.newSAXParser()

# Parse the XML string
stream = ByteArrayInputStream(xmlString.encode('utf-8'))
saxParser.parse(stream, MyContentHandler())

In the above example, we define a custom handler class, MyContentHandler, that extends DefaultHandler. This class overrides methods to handle the different events encountered during XML parsing: the start of an element, the end of an element, and character data.
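Printing events is useful for exploration, but a handler usually needs to collect data. The following sketch shows one way to build a list of employee records from the same XML string. The EmployeeHandler class and its employees, current, and text attributes are hypothetical names chosen for this example. Text is buffered because SAX may deliver an element's text in more than one characters() call.

```python
from javax.xml.parsers import SAXParserFactory
from org.xml.sax.helpers import DefaultHandler
from java.io import ByteArrayInputStream
from java.lang import String

xmlString = """<employees>
    <employee id="1">
        <name>John Doe</name>
        <department>Engineering</department>
    </employee>
    <employee id="2">
        <name>Jane Smith</name>
        <department>Marketing</department>
    </employee>
</employees>"""

class EmployeeHandler(DefaultHandler):
    def __init__(self):
        DefaultHandler.__init__(self)
        self.employees = []   # finished records
        self.current = None   # record being built
        self.text = []        # text chunks of the element being read

    def startElement(self, uri, localName, qName, attributes):
        self.text = []
        if qName == "employee":
            self.current = {"id": attributes.getValue("id")}

    def characters(self, ch, start, length):
        self.text.append(String.valueOf(ch, start, length))

    def endElement(self, uri, localName, qName):
        if qName in ("name", "department"):
            self.current[qName] = "".join(self.text).strip()
        elif qName == "employee":
            self.employees.append(self.current)

handler = EmployeeHandler()
stream = ByteArrayInputStream(xmlString.encode('utf-8'))
SAXParserFactory.newInstance().newSAXParser().parse(stream, handler)

for employee in handler.employees:
    print("%s: %s (%s)" % (employee["id"], employee["name"], employee["department"]))
```

Only one employee record is held in progress at a time, which is what keeps SAX memory use low on large files.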

What is the StAX Parser?

The Streaming API for XML (StAX) parser, available through Java libraries, offers a cursor-based approach to parse XML documents. It provides an efficient way to read and process XML sequentially without loading the entire document into memory. StAX parsers allow developers to iterate through XML elements, attributes, and other components as they are encountered in the XML stream. For more detailed information about the StAX parser, refer to the Java XML StAX Parser Documentation.

Using the StAX Parser

The StAX parser operates in a streaming fashion, allowing developers to read XML content sequentially without the need to build a complete in-memory representation of the XML document. Here's an example of how to create an XML input factory and a stream reader to parse the XML content. We iterate through the XML stream and handle different events such as starting and ending elements, as well as character data.

Jython - Reading from a String
from javax.xml.stream import XMLInputFactory, XMLStreamConstants
from java.io import ByteArrayInputStream

# Define your XML string
xmlString = """
<employees>
    <employee id="1">
        <name>John Doe</name>
        <department>Engineering</department>
    </employee>
    <employee id="2">
        <name>Jane Smith</name>
        <department>Marketing</department>
    </employee>
</employees>
"""  # Replace with your actual XML string

# Create an XML input factory
inputFactory = XMLInputFactory.newInstance()

# Create an XML stream reader
streamReader = inputFactory.createXMLStreamReader(ByteArrayInputStream(xmlString.encode('utf-8')))

# Iterate through the XML stream
while streamReader.hasNext():
    event = streamReader.next()
    if event == XMLStreamConstants.START_ELEMENT:
        print("Start Element: %s" % streamReader.getLocalName())
        # Print attributes, if any
        for i in range(streamReader.getAttributeCount()):
            print("Attribute: %s = %s" % (streamReader.getAttributeLocalName(i), streamReader.getAttributeValue(i)))
    elif event == XMLStreamConstants.END_ELEMENT:
        print("End Element: %s" % streamReader.getLocalName())
    elif event == XMLStreamConstants.CHARACTERS:
        print("Character Data: %s" % streamReader.getText())

# Release the reader's resources
streamReader.close()
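Because the calling code controls the cursor, StAX makes it easy to pull out only the values of interest. The sketch below produces the same kind of employee listing as the DOM example, using getElementText() to read the text of the name and department elements. Disabling DTD support and external entities is an optional hardening step included here as an assumption about typical requirements.

```python
from javax.xml.stream import XMLInputFactory, XMLStreamConstants
from java.io import ByteArrayInputStream
from java.lang import Boolean

xmlString = """<employees>
    <employee id="1">
        <name>John Doe</name>
        <department>Engineering</department>
    </employee>
    <employee id="2">
        <name>Jane Smith</name>
        <department>Marketing</department>
    </employee>
</employees>"""

inputFactory = XMLInputFactory.newInstance()

# Optional hardening: ignore DTDs and external entities
inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE)
inputFactory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE)

reader = inputFactory.createXMLStreamReader(ByteArrayInputStream(xmlString.encode('utf-8')))
try:
    while reader.hasNext():
        # Only start tags matter here; every other event is skipped
        if reader.next() == XMLStreamConstants.START_ELEMENT:
            tag = reader.getLocalName()
            if tag == "employee":
                # None means the attribute's namespace is not checked
                print("Employee ID: %s" % reader.getAttributeValue(None, "id"))
            elif tag == "name":
                print("Name: %s" % reader.getElementText())
            elif tag == "department":
                print("Department: %s" % reader.getElementText())
finally:
    reader.close()
```

getElementText() only works on elements that contain text and no child elements, which holds for name and department in this sample.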
