This page is part of a static HTML representation of TriTarget.org at https://tritarget.org

Simple DSL Via XML

Sukima14th December 2023 at 9:22pm

There I was deep in some code designed to work with a specialized data structure: a flat tree or graph. "What is that?" You might ask.

I have no idea if this is an actual term but I am calling it a flat tree to describe a data model that represents a tree but is a single array where each entry has a reference to its parent by an ID.

[
  { "id": "1", "value": "foo" },
  { "id": "2", "value": "bar", "parent": "1" },
  { "id": "3", "value": "baz", "parent": "2" },
  { "id": "4", "value": "frotz", "parent": "1" }
]

Example of a flat tree as JSON

With this diagrammed we see the relationships:

Example of a tree structure

This is a simple example but imagine the graph being much more complex. On the project I was working on I needed to generate such data in tests. The issue was that I had to work harder to map the values to reach the same tree I had in my mind.

What I really wanted was a way to describe the tree as nested nodes like it was in my mind and have the computer correctly generate the properties to reflect that tree.

At first I thought about a fluent interface design but in TypeScript I worried about the complexity growing fast. It also didn't describe nesting well without tons of callback functions to describe the scoping.

My next thought was LISP-like which uses S-Expressions to describe lists and trees. An advantage of this kind of language is how easy it is to parse it. The disadvantage is that many really find the syntax difficult to read. It also means writing a parser on myself.

My third thought was XML as it is designed specifically to describe trees. It is also relatively recognizable in the front end because HTML. And best of is that parsing XML is built-in to all browsers as a one-liner!

<node id="1" value="foo">
  <node id="2" value="bar">
    <node id="3" value="baz"/>
  </node>
  <node id="4" value="frotz"/>
</node>

Example of a tree structure in XML

Neat, now I can describe a nested structure, parse it with built-in tools, and have it generate a proper data structure from that language.

test(
  'it parses XML into a flat tree structure',
  function (assert)
  {
    let expected = [
      { id: '1', value: 'A', parent: undefined },
      { id: '2', value: 'B', parent: '1' },
      { id: '3', value: 'C', parent: '1' },
    ];
    let actual = tree`
      <node value="A">
        <node value="B"/>
        <node value="C"/>
      </node>
    `;
    assert.deepEqual(actual, expected);
  }
);

Example test code using this DSL

If you notice, The example uses tagged template strings making the syntax even more terse then calling a function. Not only that but it can handle string interpolation and multi-lines well. Being XML whitespacing is not a concern and you can have comments inline.

Getting to the implementation… we can start with the entry point. Define the tagged template string function:

interface TreeNode {
  id: string;
  value: string;
  parent?: string;
}

function tree(
  strings: TemplateStringsArray,
  ...parts: unknown[]
): TreeNode[] {
  // TODO: not implemented yet
  return [];
}

When you call a tagged template string like this it will provide the function an array of strings and a variable number of args for each interpolated part.

tree`foo${'bar'}baz${'frotz'}foobar`;

Results in…

tree(['foo', 'baz', 'foobar'], 'bar', 'frotz');

Thus we can combine this into a single string with this one-liner:

let xml = strings.reduce(
  (r, s, i) => `${r}${s}${parts[i] ?? ''}`,
  ''
);

With a full string we can pass that directly into the XML parser with this:

let parser = new DOMParser();
let doc = parser.parseFromString(xml, 'application/xml');

The second argument to parseFromString can be either application/xml or text/html. In our case we want the parsing to be strict since the data it represents is also strict in nature. If we were doing document based markup the text/html would be more inline with our needs.

function tree(
  strings: TemplateStringsArray,
  ...parts: unknown[]
): TreeNode[] {
  let xml = strings
    .reduce((r, s, i) => `${r}${s}${parts[i] ?? ''}`, '');
  let doc = new DOMParser()
    .parseFromString(xml, 'application/xml');
  // TODO: not there yet
  return [];
}

At this point we have a fully fledged document object we can query and mutate as needed.

For example we could add auto-generated IDs to every node if we wanted:

for (let node of doc.querySelectorAll('node'))
  node.setAttribute('id', String(++nextId));

Given that this is a nested structure we can use recursion to generate objects for each node.

Also note, that an XML document requires a root node or it won’t parse properly that means we will always have a top level node to use recursion on. And better yet, walking a DOM is done top down which is the order we need when it comes to linking each child node to their parent.

Another trick with JavaScript is the ability to manage iteration with recursive generator functions.

return Array.from(nodesFromElement(doc.documentElement));

The nodesFromElement generator function could look like this:

function* nodesFromElement(
  node: Element
): Generator<TreeNode> {
  yield validNodeFromElement(node);
  for (let child of node.children)
    yield* nodesFromElement(child);
}

Given an Element we convert it to a data node and yield it. The we loop over the children recursively calling this function and yielding each result.

function validNodeFromElement(node: Element): TreeNode {
  switch (node.tagName) {
    case 'node':
      return nodeFromElement(node);
    case 'parsererror':
      throw new SyntaxError(
        node.textContent ?? 'malformed XML'
      );
    default:
      throw new SyntaxError(`unknown tag ${node.tagName}`);
  }
}

In this function we check the node's tag name to make sure we only work on the Domain Specific Language we’ve sanctioned.

We also check for XML parsing errors because unlike most functions that throw exceptions, DOMParser fails quietly by generating a document with a <parsererror> tag in it. We use that as a reason to throw our own exception.

Our last step in this process is to convert a valid node into a data structure which is eventually yielded.

function nodeFromElement(node: Element): TreeNode {
  let parentId = node.parentElement?.getAttribute('id');

  return {
    id: node.getAttribute('id') ?? '',
    value: node.getAttribute('value') ?? '',
    parent: parentId ?? undefined,
  };
}

See the full DEMO

Discuss this article