Gentlemen, indent your XML!

By Filip Salo; published on February 06, 2007.

When pushing a lot of XML around, something like this may come in handy. This is my ~/bin/xmlindent.py:

#!/usr/bin/env python

from lxml import etree
import sys

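def indent(elem, level=0):
    """Insert whitespace into text/tail in place so the tree serializes indented."""
    i = "\n" + level*"  "
    if len(elem):
        # Start the element's content on a new, deeper-indented line.
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        for e in elem:
            indent(e, level+1)
            if not e.tail or not e.tail.strip():
                e.tail = i + "  "
        # The last child's tail precedes the parent's closing tag, so dedent it.
        if not e.tail or not e.tail.strip():
            e.tail = i
    else:
        # Leaf element: only a missing or whitespace-only tail is replaced.
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i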

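# Read from the file named on the command line, or from stdin.
if len(sys.argv) > 1:
    src = sys.argv[1]
else:
    # On Python 3, hand lxml the binary stream; Python 2 has no .buffer.
    src = getattr(sys.stdin, "buffer", sys.stdin)

tree = etree.parse(src)
indent(tree.getroot())
# write() emits encoded bytes, so target the binary stdout stream when there is one.
tree.write(getattr(sys.stdout, "buffer", sys.stdout), encoding="utf-8")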

The indent function is a variant of the one in Fredrik Lundh's effbotlib. I'm using lxml instead of cElementTree because it gives cleaner, more human-friendly output when namespaces are involved.
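
To make the namespace point concrete, here is a rough sketch that runs the same indent function through the standard library's xml.etree.ElementTree, used here as a stand-in for cElementTree. The sample document and its example.com namespace URI are made up for illustration. ElementTree does not remember the prefix a document was written with, so it invents ns0-style prefixes on output, whereas lxml keeps the original a: prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the prefix and URI are illustrative only.
SAMPLE = (
    '<a:root xmlns:a="http://example.com/a">'
    '<a:item><a:name>x</a:name></a:item>'
    '</a:root>'
)

def indent(elem, level=0):
    # Same logic as the script above: rewrite whitespace-only text/tail in place.
    i = "\n" + level * "  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        for e in elem:
            indent(e, level + 1)
            if not e.tail or not e.tail.strip():
                e.tail = i + "  "
        if not e.tail or not e.tail.strip():
            e.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i

root = ET.fromstring(SAMPLE)
indent(root)
# encoding="unicode" returns a str instead of encoded bytes.
out = ET.tostring(root, encoding="unicode")
print(out)
```

The indentation comes out as expected, but every tag is now spelled ns0:root, ns0:item and so on, which is the noise that lxml avoids.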

Oh, and there was a bug, I guess, in lxml 1.0 that made it barf on parse(sys.stdin). Upgrading to 1.1.2 fixed that, though.

(As a bonus, that made me get easy_install working properly; one of those "nah, some other time" procrastination tarpits. It's a nice tool. easy_install, I mean. Not the tarpit.)

Enjoy.