This page is part of a static HTML representation of TriTarget.org at https://tritarget.org

Vim as a static syntax highlighter

Sukima, 4th November 2022 at 8:33am

I fell in love with computers at a young age. A common theme I noticed was that their ability to solve problems wasn’t constrained to one methodology. I think this story highlights that concept well.

A key takeaway from this journey was just how flexible and varied technology can be. When a tool remains open to the Unix Philosophy, other tools can still interface with it, even if the data it produces is hard to work with, and a pipeline can be established that makes automation work. This is in contrast to so many tools out there that act as if they are the only game in town and don't follow the same philosophy. For example, I cannot imagine this working with VS Code or IntelliJ.

The problem

My blog is written in TiddlyWiki, which comes with a highlight.js plugin. There were two issues with this. First, language support is lacking for more modern languages like Ember’s HBS templates. Second, it was difficult to find a suitable color scheme. This sent me to research alternatives, and there are a lot, each adding more complexity and, in the end, more bloat to the single HTML file that is this blog.

I often mused that I use Vim and its syntax highlighting every day; if only there were a way to make my blog’s code examples look like my Vim editor, it would be perfect. And then it dawned on me: what if I actually could?

Vim has a built-in script that outputs the source text and its syntax highlighting as an HTML file, complete with CSS class names. And TiddlyWiki is really just HTML and JavaScript. Why couldn't we export the HTML from Vim and import it into TiddlyWiki as a tiddler?

Vim as a script in a build pipeline

I already run my blog through a Makefile which generates diagrams with PlantUML, copies assets to the output folder, bundles JavaScript through Babel, builds static files, and even deploys to production. It wasn't too hard to imagine asking Vim to write an HTML file from another source file and then using that to build the tiddlers for the TiddlyWiki generator engine.

I wanted the source files to live in one directory, have make convert them to HTML (via Vim), and then have make post-process those files into generated output stored in the TiddlyWiki source directory.

sourcecode_src := $(shell find sourcecode -type f)
sourcecode_html := $(patsubst \
  sourcecode/%,wiki/sourcecode/%.html,$(sourcecode_src))
sourcecode_tid := $(patsubst \
  sourcecode/%,tiddlers/sourcecode/%.tid,$(sourcecode_src))

.PHONY: generated

generated: $(sourcecode_tid) \
  tiddlers/generated/sourcecode.css \
  tiddlers/generated/PGPKeyFile.tid \
  tiddlers/generated/PGPKeyInfo.tid

wiki/sourcecode/%.html: sourcecode/%
	@mkdir -p $(@D)
	SOURCE_FILE="$<" \
	  TARGET_FILE="$@" \
	  vim -N -E -s \
	  -c "source scripts/sourcecode-to-html.vim" \
	  $< \
	  >/dev/null

tiddlers/sourcecode/%.tid: wiki/sourcecode/%.html
	@mkdir -p $(@D)
	SOURCE_FILE="$(patsubst wiki/sourcecode/%.html,%,$<)" \
	  TARGET_FILE="$@" \
	  ./bin/sourcecode-html < $< > $@

tiddlers/generated/sourcecode.css: $(sourcecode_html)
	cat $^ | ./bin/sourcecode-css > $@

The cryptic incantation above defines the steps we just outlined.

When make needs to build…

  1. a tiddlers/sourcecode/foobar.txt.tid file it knows to build it from wiki/sourcecode/foobar.txt.html
  2. a wiki/sourcecode/foobar.txt.html file it knows to build it from sourcecode/foobar.txt
  3. a tiddlers/generated/sourcecode.css file it knows to build it from all the wiki/sourcecode/*.html
  4. the generated target it knows to build all the tiddlers/sourcecode/*.tid files, finding out which ones by scanning everything under sourcecode/

Apologies; I fear that in explaining it I have made it more confusing. Trust the Makefile.
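If the pattern rules are hard to follow, the path mapping they perform can be mirrored in a few lines of JavaScript. The `derivedPaths()` helper below is hypothetical and not part of the build. It applies the same `sourcecode/%` stem substitution as the `patsubst` calls and pattern rules, so you can see which files each step produces for a single source file.

```javascript
// Hypothetical helper (not part of the real build): applies the same
// "sourcecode/%" stem substitutions as the Makefile's patsubst calls.
function derivedPaths(sourcePath) {
  const prefix = 'sourcecode/';
  if (!sourcePath.startsWith(prefix)) {
    throw new Error(`expected a path under ${prefix}: ${sourcePath}`);
  }
  // The "%" stem; like make's %, it may contain slashes ("js/app.js").
  const stem = sourcePath.slice(prefix.length);
  return {
    html: `wiki/sourcecode/${stem}.html`, // output of the Vim step
    tiddler: `tiddlers/sourcecode/${stem}.tid`, // output of bin/sourcecode-html
    sourceFileEnv: stem, // SOURCE_FILE passed to bin/sourcecode-html
  };
}

console.log(derivedPaths('sourcecode/foobar.txt'));
```

The CSS target is the exception. It depends on every `.html` file at once, so it is built from the whole `$(sourcecode_html)` list rather than from a pattern rule.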


Exporting to HTML from Vim

The incantation is to have Vim open a file in Ex mode and then execute a Vim script:

vim -N -E -s -c "source [script_file]" [input_file]

Because we are running in Ex mode with -s (silent/batch mode), Vim skips its normal initializations, so we lose out on all the niceties of the usual start-up scripts. We have to rebuild that initialization ourselves to get the syntax styling we would expect. This is especially true if our file-type support comes from a plugin, which is not loaded when Vim runs this way. That is because when we process text through Ex mode we typically want it to be fast and simple.

Here is the basic script to perform the export to HTML:

syntax on
set t_Co=256
source $HOME/.vimrc
set noswapfile
set viminfofile=NONE
set background=light
edit
set tabstop=2
set expandtab
let g:html_no_progress = 1
let g:html_number_lines = 0
let g:html_expand_tabs = 1
runtime syntax/2html.vim
saveas! $TARGET_FILE
quitall

We turn on syntax highlighting, set up the terminal colors, and source the main .vimrc. We disable swap files and viminfo and set the background to light. We reload the file with :edit so all the autocommands run. Then we set the tabs to better fit the content of the blog and configure 2html.vim (no progress bar, no line numbers, tabs expanded to spaces). Finally, we run the 2html.vim script, which opens an unsaved buffer containing the syntax-highlighted text converted to HTML, and save that buffer to the $TARGET_FILE we established in the make recipe.

Convert HTML to tiddlers

Our next step is to convert the HTML files into HTML fragments that we can use as TiddlyWiki tiddlers. There are two pieces to this. First, TiddlyWiki does not embed HTML the way we might expect. Its WikiText parser can process HTML, but only through specialized rules, so just injecting raw HTML into a tiddler will cause issues. Second, TiddlyWiki does support raw HTML, but it does so by putting it inside an <iframe>. Normally this works fine, but in this case we don’t want an iframe. We want the raw HTML rendered inline like any other web page.

To accomplish this, we need to make our own TiddlyWiki parser:

/*\
title: $:/plugins/sukima/sourcecode/parser.js
type: application/javascript
module-type: parser
\*/
function SourceCodeParser(type, text, options) {
  this.tree = [{ type: 'raw', html: text }];
};
exports['text/prs.sourcecode'] = SourceCodeParser;

This registers a MIME type of text/prs.sourcecode, which is how TiddlyWiki knows to use this parser for the HTML fragments. Any tiddler with that type is rendered as its raw HTML, as is.

The second part is to convert a full HTML document into a fragment containing just its <body>. We could again use Vim’s Ex mode to do that:

0,/^<body/ delete
/^<\/body/,$ delete
1s/id='\(.*\)'/class="\1"/
call append(0, [
      \'title: ' .. $SOURCE_FILE,
      \'type: text/prs.sourcecode',
      \''
      \])
saveas! $TARGET_FILE
quitall

Honestly, that is confusing but useful. The biggest issue is that it is slow. We have to use Vim for the first conversion to HTML because Vim provides the syntax highlighting in the first place. Now that we have raw HTML, though, we can use other, faster tools. I opted for Node.js streams because TiddlyWiki is built on Node, so it is already available, and streams are super awesome.

I use my Split by line stream to turn the raw bytes into lines so each Transform stream can focus on line-by-line logic.
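The Split by line stream is not reproduced in this post. Below is a hypothetical stand-in for `splitByLines()`, a minimal sketch that assumes it receives raw bytes (as from `process.stdin`) and should emit one string per line in object mode, without the newline characters. It uses Node's `StringDecoder` so that a multi-byte UTF-8 character split across two chunks is not garbled.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical splitByLines(): raw bytes in, one string per line out.
function splitByLines() {
  const decoder = new StringDecoder('utf8');
  let remainder = ''; // text after the last newline seen so far
  return new Transform({
    readableObjectMode: true, // each pushed line is its own chunk
    transform(chunk, _, done) {
      // StringDecoder holds back a multi-byte character split across chunks
      const text = remainder + decoder.write(chunk);
      const lines = text.split(/\r?\n/);
      remainder = lines.pop(); // possibly incomplete last line
      for (const line of lines) {
        this.push(line);
      }
      done();
    },
    flush(done) {
      const last = remainder + decoder.end();
      if (last !== '') {
        this.push(last); // final line had no trailing newline
      }
      done();
    },
  });
}
```

Blank lines come through as empty strings, which is why the streams below can compare whole lines with `startsWith()`.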

filterHtmlBody()

Runs through each line ignoring lines that are not inside the <body>.

const { Transform } = require('stream');

function filterHtmlBody() {
  let isBody = false;
  return new Transform({
    objectMode: true,
    transform(line, _, done) {
      if (line.startsWith('<body')) {
        isBody = true;
      } else if (line.startsWith('</body>')) {
        isBody = false;
      } else if (isBody) {
        this.push(line);
      }
      done();
    },
  });
}

convertIdsToClassNames()

Vim’s HTML output needs a little clean-up. It tacks an ID onto the <pre id='…'> tag and uses single quotes for it, even though all the other quotes in the file are double quotes. This Transform runs a simple RegExp on that first line, while all other lines pass through untouched.

function convertIdsToClassNames() {
  const idMatcher = /id=['"]([^'"]*)['"]/g;
  let firstLine = true;
  return new Transform({
    objectMode: true,
    transform(line, _, done) {
      if (firstLine) {
        firstLine = false;
        this.push(line.replace(idMatcher, 'class="$1"'));
      } else {
        this.push(line);
      }
      done();
    }
  });
}
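Here is the substitution on its own, applied to the kind of opening tag described above. The exact tag text is an assumption; the `vimCodeElement` value matches the `.vimCodeElement` scoping used by the CSS later on. Presumably a class is also the safer choice when several code tiddlers end up on one page, since an ID is meant to be unique.

```javascript
// The same RegExp convertIdsToClassNames() uses, applied to a sample line.
// The sample assumes Vim's <pre> opening tag looks like this.
const idMatcher = /id=['"]([^'"]*)['"]/g;
const vimLine = "<pre id='vimCodeElement'>";
const converted = vimLine.replace(idMatcher, 'class="$1"');
console.log(converted); // <pre class="vimCodeElement">
```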

prependPreambleLines()

The final tiddler output needs a title: field and the custom type: text/prs.sourcecode to work. This Transform adds those lines to the start of the output.

function prependPreambleLines() {
  let sentPreamble = false;
  return new Transform({
    objectMode: true,
    transform(line, _, done) {
      if (!sentPreamble) {
        sentPreamble = true;
        this.push(
          `title: sourcecode/${process.env.SOURCE_FILE}`
        );
        this.push('type: text/prs.sourcecode');
        this.push('');
      }
      this.push(line);
      done();
    },
  });
}

joinLines()

Because we are processing the pipeline on a per-line basis, none of these chunks has an ending newline character. For the output to make sense when piped to a byte-based Writable stream (stdout), we need to convert the lines back into strings with ending newline characters.

function joinLines() {
  return new Transform({
    objectMode: true,
    transform(line, _, done) {
      this.push(`${line}\n`);
      done();
    },
  });
}

Putting it all together

And the best part is at the end: putting the pieces together just like Legos™.

process.stdin
  .pipe(splitByLines())
  .pipe(filterHtmlBody())
  .pipe(convertIdsToClassNames())
  .pipe(prependPreambleLines())
  .pipe(joinLines())
  .pipe(process.stdout);
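One design note: chained `.pipe()` calls do not forward errors from one stream to the next, so a failure mid-pipeline can go unnoticed. Node's `pipeline()` (here the promise version from `stream/promises`, Node.js 15+) rejects with the first error and destroys every stage. The sketch below swaps it in to drive the transforms from an in-memory array instead of stdin, which also makes each piece easy to test in isolation. `runLines()` is a hypothetical helper, and `filterHtmlBody()` is copied from above so the example runs on its own.

```javascript
const { Readable, Transform, Writable } = require('stream');
const { pipeline } = require('stream/promises'); // Node.js 15+

// Copied from filterHtmlBody() above so this sketch runs on its own.
function filterHtmlBody() {
  let isBody = false;
  return new Transform({
    objectMode: true,
    transform(line, _, done) {
      if (line.startsWith('<body')) {
        isBody = true;
      } else if (line.startsWith('</body>')) {
        isBody = false;
      } else if (isBody) {
        this.push(line);
      }
      done();
    },
  });
}

// Hypothetical helper: feed an array of lines through object-mode transforms
// and collect the results. pipeline() rejects with the first error from any
// stage and destroys the others.
async function runLines(lines, ...transforms) {
  const output = [];
  await pipeline(
    Readable.from(lines),
    ...transforms,
    new Writable({
      objectMode: true,
      write(line, _, done) {
        output.push(line);
        done();
      },
    }),
  );
  return output;
}

runLines(
  ['<html>', '<body>', "<pre id='vimCodeElement'>", 'code', '</pre>', '</body>', '</html>'],
  filterHtmlBody(),
).then((lines) => console.log(lines));
```

Running it logs only the three lines between `<body>` and `</body>`.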

CSS styles

The CSS was a bit more complex because each HTML output only defines CSS classes for the Vim highlight groups used in that file, and those change with each filetype's syntax. This is an issue because there isn’t a good way to find all the classes and define colors for them up front.

This means we need to process the HTML files again, extract their CSS, and combine it into one file, which also means de-duplicating the rules.

Another complication is that the CSS colors Vim produces are hex codes specific to the environment in which Vim renders. When run via the Ex-mode script, that environment is a dumb terminal with really bad colors. The good news is that each color Vim produces is distinct and there are only sixteen of them, so we can map the hex codes back to terminal colors.

In my case those matched well to the sixteen colors in the Solarized palette. And since we are compiling our own CSS, we can swap the output hex codes for CSS variables.

The best way I knew to compile all the HTML into one CSS file was, again, Node.js streams. The idea is to concatenate all the files into one stream and read it line by line. When we hit a <style> tag, we buffer the CSS rules (Vim outputs each whole rule on its own line). Then we de-duplicate the rules, prefix them to provide some CSS scoping, and add the TiddlyWiki front matter.

The Makefile uses cat and the list of HTML files to produce one long stream of data, which we pipe to another of our Node.js scripts.
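The `cat $^` step could also be done inside Node.js. The sketch below uses a hypothetical `concatFiles()` helper, which is not what the Makefile uses. It reads each file in turn and emits the bytes through a single Readable, so the result could replace `process.stdin` at the head of the CSS pipeline (for example, taking the HTML paths from `process.argv.slice(2)`).

```javascript
const fs = require('fs');
const { Readable } = require('stream');

// Hypothetical alternative to `cat $^`: read each file in turn and emit
// its bytes through one Readable, in the order the paths were given.
function concatFiles(paths) {
  async function* chunks() {
    for (const file of paths) {
      // a ReadStream is async-iterable, yielding Buffer chunks of the file
      yield* fs.createReadStream(file);
    }
  }
  return Readable.from(chunks(), { objectMode: false });
}
```

Like `cat`, it simply joins the bytes, so a file without a trailing newline would run into the first line of the next file.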

Color map

This is a hard-coded map from the hex colors Vim outputs to CSS variables we can define separately in our TiddlyWiki.

const colorTransforms = new Map([
  ['#000000', 'var(--rebase02)'],
  ['#c00000', 'var(--red)'],
  ['#008000', 'var(--green)'],
  ['#804000', 'var(--yellow)'],
  ['#0000c0', 'var(--blue)'],
  ['#c000c0', 'var(--magenta)'],
  ['#008080', 'var(--cyan)'],
  ['#c0c0c0', 'var(--rebase2)'],
  ['#808080', 'var(--rebase03)'],
  ['#ff6060', 'var(--orange)'],
  ['#00ff00', 'var(--rebase01)'],
  ['#ffff00', 'var(--rebase00)'],
  ['#8080ff', 'var(--rebase0)'],
  ['#ff40ff', 'var(--violet)'],
  ['#00ffff', 'var(--rebase1)'],
  ['#ffffff', 'var(--rebase3)'],
]);

filterCssRules()

A boolean state machine toggles the filtering on and off. When a <style> tag is seen, the Transform stream starts pushing lines further down the pipeline, and when a </style> tag arrives it stops and drops the lines. It also keeps only CSS rules whose selectors are classes, skipping any element-based CSS we don’t need.

const { Transform } = require('stream');

function filterCssRules() {
  let isStyle = false;
  return new Transform({
    objectMode: true,
    transform(line, _, done) {
      if (line.startsWith('<style>')) {
        isStyle = true;
      } else if (line.startsWith('</style>')) {
        isStyle = false;
      } else if (isStyle && line.startsWith('.')) {
        this.push(line);
      }
      done();
    },
  });
}

uniqueLines()

Here we use a Set to de-duplicate the lines, reducing them to unique CSS rules. A Set keeps insertion order, so the rules come out in the order they were first seen.

function uniqueLines() {
  let lines = new Set();
  return new Transform({
    objectMode: true,
    transform(line, _, done) {
      lines.add(line);
      done();
    },
    flush(done) {
      lines.forEach((line) => this.push(line));
      done();
    },
  });
}

prepareCssRule()

This stream processes each CSS rule, prepending the scoping class .vimCodeElement and replacing the color hex codes using the map above.

function prepareCssRule() {
  return new Transform({
    objectMode: true,
    transform(line, _, done) {
      for (const [colorMatcher, replacement] of colorTransforms) {
        // split/join replaces every occurrence; String#replace with a string
        // pattern would only replace the first one in the rule
        line = line.split(colorMatcher).join(replacement);
      }
      this.push(`.vimCodeElement ${line}`);
      done();
    },
  });
}
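Looping over all sixteen map entries for every rule works fine at this scale. An alternative is a single RegExp that finds each six-digit hex color and looks it up in the map, leaving unknown colors untouched. This is a sketch of that swapped-in approach, using a hypothetical `replaceColors()` helper with the map cut down to three entries. It assumes Vim writes the hex codes in lowercase, as the map above does, and lowercases each match before the lookup just in case.

```javascript
// Same shape as the color map above, cut down to keep the sketch short.
const colorTransforms = new Map([
  ['#000000', 'var(--rebase02)'],
  ['#c00000', 'var(--red)'],
  ['#ffffff', 'var(--rebase3)'],
]);

// One RegExp finds every six-digit hex color in the rule and looks each one
// up in the map; colors missing from the map are left as they are.
function replaceColors(rule) {
  return rule.replace(/#[0-9a-fA-F]{6}\b/g, (hex) =>
    colorTransforms.get(hex.toLowerCase()) ?? hex,
  );
}

console.log(replaceColors('.Error { color: #ffffff; background-color: #c00000; }'));
// .Error { color: var(--rebase3); background-color: var(--red); }
```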

prependPreambleLines()

Add the TiddlyWiki front matter.

function prependPreambleLines() {
  let sentPreamble = false;
  return new Transform({
    objectMode: true,
    transform(line, _, done) {
      if (!sentPreamble) {
        sentPreamble = true;
        this.push('/*\\');
        this.push('title: $:/site/sourcecode/styles.css');
        this.push('tags: $:/tags/Stylesheet');
        this.push('type: text/css');
        this.push('\\*/');
      }
      this.push(line);
      done();
    },
  });
}

Putting it all together

process.stdin
  .pipe(splitByLines())
  .pipe(filterCssRules())
  .pipe(uniqueLines())
  .pipe(prepareCssRule())
  .pipe(prependPreambleLines())
  .pipe(joinLines())
  .pipe(process.stdout);
