LuaExpat
XML Expat parsing for the Lua programming language

Introduction

Lua Object Model (LOM) is a representation of XML elements through Lua data types. It does not aim to be 100% complete; the goal is to keep it simple.

LuaExpat's distribution provides an implementation of LOM that takes an XML document (a string) and transforms it into a Lua table. The only function exported is lxp.lom.parse.

Characteristics

The model represents each XML element as a Lua table. A LOM table has three special characteristics:

  • a special field called tag that holds the element's name;
  • an optional field called attr that stores the element's attributes (see the Attributes section); and
  • the element's children are stored at the array-part of the table. A child could be an ordinary string or another XML element that will be represented by a Lua table following these same rules.
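The three rules above can be exercised without running the parser at all. The following is a minimal sketch that walks a hand-written LOM table and gathers its character data; collect_text is a hypothetical helper written for illustration, not a function provided by LuaExpat.

```lua
-- A hand-written table that follows the LOM rules: a tag field, an
-- attr field, and the children stored in the array part.
local doc = {
  tag = "p",
  attr = {},
  "Hello, ",
  { tag = "b", attr = {}, "LOM" },
  "!",
}

-- Hypothetical helper (not part of LuaExpat): concatenates all string
-- children below a node, visiting nested elements depth first.
local function collect_text(node, parts)
  parts = parts or {}
  for _, child in ipairs(node) do
    if type(child) == "string" then
      parts[#parts + 1] = child      -- ordinary character data
    else
      collect_text(child, parts)     -- nested element: recurse
    end
  end
  return table.concat(parts)
end

print(collect_text(doc))  --> Hello, LOM!
```

Because children are either strings or tables that obey the same rules, a single type check is enough to tell text from nested elements.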

Attributes

The special field attr is a Lua table that stores the XML element's attributes as <key>=<value> pairs. To preserve the order of the attributes (if necessary), the sequence of keys can be placed in the array part of this same table.
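As an illustration of how the two parts of attr work together, the sketch below iterates the array part to recover the attribute names in order and looks each value up by key. The table is written by hand, and ordered_attrs is a hypothetical helper, not part of LuaExpat.

```lua
-- Hand-written attr table: names in the array part, values keyed by name.
local attr = { "a1", "a2", a1 = "A1", a2 = "A2" }

-- Hypothetical helper (not part of LuaExpat): renders the attributes
-- in the order recorded in the array part.
local function ordered_attrs(attr)
  local out = {}
  for i, name in ipairs(attr) do
    out[i] = name .. '="' .. attr[name] .. '"'
  end
  return table.concat(out, " ")
end

print(ordered_attrs(attr))  --> a1="A1" a2="A2"
```

Iterating with pairs instead would also visit the numeric keys and gives no ordering guarantee, which is why the array part is used here.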

Examples

For a simple string like

    s = [[<abc a1="A1" a2="A2">inside tag `abc'</abc>]]

A call like

    tab = lxp.lom.parse(s)

Would result in a table equivalent to

tab = {
        ["attr"] = {
                [1] = "a1",
                [2] = "a2",
                ["a2"] = "A2",
                ["a1"] = "A1",
        },
        [1] = "inside tag `abc'",
        ["tag"] = "abc",
}

Now an example with an element nested inside another element:

tab = lxp.lom.parse(
[[<qwerty q1="q1" q2="q2">
    <asdf>some text</asdf>
</qwerty>]]
)

The result would be a table equivalent to

tab = {
        [1] = "\
        ",
        [2] = {
                ["attr"] = {
                },
                [1] = "some text",
                ["tag"] = "asdf",
        },
        ["attr"] = {
                [1] = "q1",
                [2] = "q2",
                ["q2"] = "q2",
                ["q1"] = "q1",
        },
        [3] = "\
",
        ["tag"] = "qwerty",
}

Note that even the new-line and tab characters between elements are stored in the table.
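If the whitespace-only strings are not wanted, they can be filtered out after parsing. The following is a minimal sketch, assuming that a child consisting solely of whitespace carries no meaning for the application; strip_blank is a hypothetical helper, not part of LuaExpat, and the input table is written by hand to mirror the example above.

```lua
-- Hypothetical helper (not part of LuaExpat): returns a copy of a LOM
-- node without the string children that contain only whitespace.
local function strip_blank(node)
  local kept = { tag = node.tag, attr = node.attr }
  for _, child in ipairs(node) do
    if type(child) == "table" then
      kept[#kept + 1] = strip_blank(child)   -- nested element: recurse
    elseif child:find("%S") then
      kept[#kept + 1] = child                -- has a non-space character
    end
  end
  return kept
end

-- Hand-written equivalent of the nested example's result.
local tab = {
  tag = "qwerty",
  attr = { "q1", "q2", q1 = "q1", q2 = "q2" },
  "\n\t",
  { tag = "asdf", attr = {}, "some text" },
  "\n",
}

local clean = strip_blank(tab)
print(#clean, clean[1].tag, clean[1][1])
```

The copy shares the attr tables with the original rather than duplicating them, which keeps the sketch short; a caller that modifies attributes afterwards would need a deeper copy.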