XML.jl

Read and write XML in pure Julia.

Quickstart

using XML

filename = joinpath(dirname(pathof(XML)), "..", "test", "data", "books.xml")

doc = read(filename, Node)

children(doc)
# 2-Element Vector{Node}:
#  Node Declaration <?xml version="1.0"?>
#  Node Element <catalog> (12 children)

doc[end]  # The root node
# Node Element <catalog> (12 children)

doc[end][2]  # Second child of root
# Node Element <book id="bk102"> (6 children)

`Node` Interface

Every node in the XML DOM is represented by Node, a single type parametrized on its string storage.

nodetype(node)      -> XML.NodeType (an enum)
tag(node)           -> String or Nothing
attributes(node)    -> XML.Attributes{String} or Nothing
value(node)         -> String or Nothing
children(node)      -> Vector{Node}
is_simple(node)     -> Bool (e.g. <tag>text</tag>)
simple_value(node)  -> e.g. "text" from <tag>text</tag>

`NodeType`

Each item in an XML DOM is classified by its NodeType:

NodeType	XML Representation	Constructor
`Document`	An entire document	`Document(children...)`
`DTD`	`<!DOCTYPE ...>`	`DTD(...)`
`Declaration`	`<?xml attributes... ?>`	`Declaration(; attrs...)`
`ProcessingInstruction`	`<?tag attributes... ?>`	`ProcessingInstruction(tag; attrs...)`
`Comment`	`<!-- text -->`	`Comment(text)`
`CData`	`<![CDATA[text]]>`	`CData(text)`
`Element`	`<tag attrs...> children... </tag>`	`Element(tag, children...; attrs...)`
`Text`	the `text` part of `<tag>text</tag>`	`Text(text)`

Mutation

push!(parent, child)   # Add a child
parent[2] = child      # Replace a child
node["key"] = "value"  # Add/change an attribute
node["key"]            # Get an attribute

Tree Navigation

depth(child, root)      # Depth of child relative to root
parent(child, root)     # Parent of child within root's tree
siblings(child, root)   # Siblings of child within root's tree

Writing Elements with `XML.h`

Similar to Cobweb.jl, XML.h enables you to write elements with a simpler syntax:

using XML: h

node = h.parent(
    h.child("first child content", id="id1"),
    h.child("second child content", id="id2")
)
# Node Element <parent> (2 children)

print(XML.write(node))
# <parent>
#   <child id="id1">first child content</child>
#   <child id="id2">second child content</child>
# </parent>

Reading

# From a file:
read(filename, Node)

# From a string:
parse(str, Node)

Writing

XML.write(filename::String, node)  # write to file
XML.write(io::IO, node)            # write to stream
XML.write(node)                    # return String

XML.write respects xml:space="preserve" on elements, suppressing automatic indentation.

XPath

Query nodes using a subset of XPath 1.0 via xpath(node, path):

doc = parse("""
<root>
  <a id="1"><b>hello</b></a>
  <a id="2"><b>world</b></a>
</root>
""", Node)

root = doc[end]

xpath(root, "//b")           # All <b> descendants
xpath(root, "a[@id='2']/b")  # <b> inside <a id="2">
xpath(root, "a[1]")          # First <a> child
xpath(root, "//b/text()")    # Text nodes inside all <b>s

Supported syntax

Expression	Description
`/`	Root / path separator
`tag`	Child element by name
`*`	Any child element
`//`	Descendant-or-self (recursive)
`.`	Current node
`..`	Parent node
`[n]`	Positional predicate (1-based)
`[@attr]`	Has-attribute predicate
`[@attr='v']`	Attribute-value predicate
`text()`	Text node children
`node()`	All node children
`@attr`	Attribute value (returns strings)

Streaming Tokenizer

For large files or when you need fine-grained control, XML.XMLTokenizer provides a streaming tokenizer that yields tokens without building a DOM. Token kinds live in the XML.XMLTokenizer.TokenKinds baremodule (e.g. TokenKinds.OPEN_TAG, TokenKinds.TEXT).

using XML.XMLTokenizer: tokenize

for token in tokenize("<root><child attr=\"val\">text</child></root>")
    println(token.kind, " => ", repr(String(token.raw)))
end
# OPEN_TAG => "<root"
# TAG_CLOSE => ">"
# OPEN_TAG => "<child"
# ATTR_NAME => "attr"
# ATTR_VALUE => "\"val\""
# TAG_CLOSE => ">"
# TEXT => "text"
# CLOSE_TAG => "</child"
# TAG_CLOSE => ">"
# CLOSE_TAG => "</root"
# TAG_CLOSE => ">"

`LazyNode`

For read-only access without building a full DOM tree, use LazyNode. It stores only a reference to the source string and re-tokenizes on demand, using significantly less memory:

doc = parse(xml_string, LazyNode)
doc = read("file.xml", LazyNode)

LazyNode supports the same read-only interface as Node: nodetype, tag, attributes, value, children, is_simple, simple_value, plus integer and string indexing.

For streaming and high-throughput workloads, several extra accessors avoid materializing intermediate collections:

sourcetext(n)               # zero-copy SubString view of the node's raw source bytes
eachchildnode(n)            # lazy iterator over children — no Vector allocation
children!(buf, n)           # collect children into a reusable buffer
eachattribute(n)            # lazy iterator over attribute name=>value pairs
is_simple_value(n)          # combined is_simple + simple_value (one tokenizer pass)
get(n, key, default)        # single-attribute read without building Attributes
XML.write(n)                # zero-copy: returns node's original source text
XML.write(n; normalize=true) # re-parse + pretty-print, collapses source whitespace

Memory-mapped files

For very large files, combine LazyNode with memory mapping to avoid reading the entire file into heap memory:

using XML, Mmap, StringViews

doc = open("very_large.xml") do io
    sv = StringView(Mmap.mmap(io))
    parse(sv, LazyNode)
end

AbstractTrees Integration

Loading AbstractTrees alongside XML enables tree-walking utilities (print_tree, PreOrderDFS, Leaves, etc.) on both Node and LazyNode:

using XML, AbstractTrees

doc = parse("<a><b/><c><d/></c></a>", Node)
print_tree(doc)
# Document
# └─ <a>
#    ├─ <b>
#    └─ <c>
#       └─ <d>

for n in PreOrderDFS(doc)
    nodetype(n) == Element && println(tag(n))
end

Benchmarks

Benchmark source: benchmarks.jl. Test data: books.xml (small, ~4 KB) and a generated XMark auction XML (medium, ~14 MB).

                         Parse (small) — median time (ms)

        XML.jl  ■■■■■■■ 0.0374
   XML.jl (SS)  ■■■■■■■ 0.0339
         EzXML  ■■■■ 0.0218
      LightXML  ■■■■ 0.0218
       XMLDict  ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.200


                         Parse (medium) — median time (ms)

        XML.jl  ■■■■■■■■■■■■■■ 185.0
   XML.jl (SS)  ■■■■■■■■■■■■■ 168.0
         EzXML  ■■■■■■ 81.5
      LightXML  ■■■■■■■■ 107.0
       XMLDict  ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 520.0


                      Write (small) — median time (ms)

     XML.jl  ■■■■ 0.00929
      EzXML  ■■■■ 0.0103
   LightXML  ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.101


                      Write (medium) — median time (ms)

     XML.jl  ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 48.0
      EzXML  ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 52.6
   LightXML  ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 56.1


                        Read file — median time (ms)

     XML.jl  ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 193.0
      EzXML  ■■■■■■■■■■■■■■■■■■■■■■■■■ 121.0
   LightXML  ■■■■■■■■■■■■■■■■■■■■ 95.6


                   Collect tags (small) — median time (ms)

     XML.jl  ■■■■■■ 0.000586
      EzXML  ■■■■■■■■■■■■■■■■■■■■■■ 0.00205
   LightXML  ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.00368


                  Collect tags (medium) — median time (ms)

     XML.jl  ■■■■■■■■■■■■■■■■■■ 13.1
      EzXML  ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 29.4
   LightXML  ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 23.2

versioninfo()
# Julia Version 1.12.6
# Commit 15346901f00 (2026-04-09 19:20 UTC)
# Build Info:
#   Official https://julialang.org release
# Platform Info:
#   OS: macOS (arm64-apple-darwin24.0.0)
#   CPU: 10 × Apple M1 Pro
#   WORD_SIZE: 64
#   LLVM: libLLVM-18.1.7 (ORCJIT, apple-m1)
#   GC: Built with stock GC
# Threads: 8 default, 1 interactive, 8 GC (on 8 virtual cores)
# Environment:
#   JULIA_NUM_THREADS = auto

Name		Name	Last commit message	Last commit date
Latest commit History 207 Commits
.github/workflows		.github/workflows
benchmarks		benchmarks
ext		ext
src		src
test		test
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
Project.toml		Project.toml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

XML.jl

Quickstart

`Node` Interface

`NodeType`

Mutation

Tree Navigation

Writing Elements with `XML.h`

Reading

Writing

XPath

Supported syntax

Streaming Tokenizer

`LazyNode`

Memory-mapped files

AbstractTrees Integration

Benchmarks

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

XML.jl

Quickstart

Node Interface

NodeType

Mutation

Tree Navigation

Writing Elements with XML.h

Reading

Writing

XPath

Supported syntax

Streaming Tokenizer

LazyNode

Memory-mapped files

AbstractTrees Integration

Benchmarks

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`Node` Interface

`NodeType`

Writing Elements with `XML.h`

`LazyNode`

Packages