I recently had a need to couple compression and base64 encoding using only built-in tools (no third-party libraries). The trouble is that these two ideas don't have an easy common connection: Web APIs expose built-in compression as a stream, while base64 is exposed only as synchronous, string-based functions.
The difference stems from their different use cases: compression is designed to work with binary data, while base64 was historically designed for ASCII strings.
There is a lot to get into and a good way to do that is to walk through a working example.
I have a side project I call fiddles (source). It is a simple app that allows users to code in a text editor and see the result reflected inside an iframe.
I wanted to add support for PlantUML, which has a service that renders PlantUML source into an image. To use it, the source must be compressed and encoded in a very specific way and added as part of the URL path.
For example, this source:
@startuml
A -> B: Hello
@enduml
Is compressed and encoded: SoWkIImgAStDuN9KqBLJSB9Iy4ZDoSddSaZDIm6A0W0
Appending that to a URL: https://www.plantuml.com/plantuml/svg/SoWkIImgAStDuN9KqBLJSB9Iy4ZDoSddSaZDIm6A0W0
Results in this image:
This encoding process breaks down into five parts.
- Convert a string to a stream of bytes
- Compress the stream of bytes
- Encode the stream of bytes to base64
- Transform the base64 encoding to a PlantUML compatible encoding
- Convert the stream pipeline back into a string
Convert a string to a stream
We need a stream because the only compression support browsers have is via CompressionStream.
There are a few ways to convert a string to a stream of bytes.
My first thought was to convert a string to a Uint8Array using TextEncoder.
But I'd still have to convert the Uint8Array into a stream.
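For reference, the TextEncoder one-liner looks like this:

```typescript
// TextEncoder converts a string straight to UTF-8 bytes — one call,
// but the result is a plain Uint8Array, not a stream.
const bytes = new TextEncoder().encode('foobar');
console.log(bytes); // Uint8Array(6) [102, 111, 111, 98, 97, 114]
```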
My next idea was to use TextEncoderStream as that seems like exactly what we want.
But… I'd still have to make a custom ReadableStream to convert the string to a stream I can pipe to the TextEncoderStream.
Ok maybe ReadableStream.from() could do that.
And this would be the answer, except ReadableStream.from() is only available in Firefox (for now).
To support this we would still have to use a polyfill.
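For completeness, here is a sketch of what that boilerplate looks like; the stream contents ('foobar') are just a placeholder:

```typescript
// The rejected approach sketched out: wrap the string in a one-chunk
// ReadableStream, then pipe it through TextEncoderStream to get bytes.
// It works, but needs this boilerplate (or a ReadableStream.from() polyfill).
const byteStream = new ReadableStream<string>({
  start(controller) {
    controller.enqueue('foobar');
    controller.close();
  },
}).pipeThrough(new TextEncoderStream());
// byteStream is a ReadableStream<Uint8Array>
```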
Luckily we do have a widely supported way to convert a string to a byte stream through a Blob.
new Blob(['foobar']).stream();
// => ReadableStream<Uint8Array>
Compress a stream of bytes
With a stream of bytes we can easily pipe it through the built-in compression transform stream.
new Blob(['foobar']).stream()
  .pipeThrough(new CompressionStream('deflate-raw'));
The currently supported compression algorithms are: gzip, deflate, and deflate-raw.
In our case PlantUML expects deflate-raw, which adds no identifying header bytes.
Encode into base64
Here is the fun part.
There is not a built-in way to stream encode base64.
Instead we have to either use btoa() or round-trip through a data: URI.
A bunch of examples use the data URI method because it avoids the awkward string-to-string interface of btoa()
and doesn't fall prey to the foot-guns of UTF-8 code points causing btoa()/atob() to throw.
However, the implementation is not straightforward and it kind of obfuscates the intent. The encoding and decoding implementations also diverge significantly from each other.
There are also many examples where people implemented the base64 bit math themselves.
I found, however, that with a slight bit of math and clever use of chunked string/array processing
it was possible to use btoa()/atob() effectively without the complexity of the other solutions.
class Base64EncoderStream extends TransformStream<Uint8Array, string> {
  constructor() {
    let buffer = new Uint8Array();
    super({
      transform: (chunk, controller) => {
        const bytes = new Uint8Array([...buffer, ...chunk]);
        const split = bytes.length - (bytes.length % 3);
        const next = bytes.slice(0, split);
        buffer = bytes.slice(split);
        controller.enqueue(this.encode(next));
      },
      flush: (controller) => {
        if (!buffer.length) return;
        controller.enqueue(this.encode(buffer));
      },
    });
  }

  encode(bytes: Uint8Array): string {
    return btoa(String.fromCodePoint(...bytes));
  }
}
Base64 is an encoding system that takes in bytes and outputs a string. In this case we break the input up into chunks: when encoding, it consumes three bytes at a time and outputs a chunk of four characters; when decoding, it consumes four characters at a time and outputs three bytes.
Deep dive into Base64 encoding process

|  | Input chunk size | Output chunk size |
|---|---|---|
| Encoding | 3 bytes | 4 characters |
| Decoding | 4 characters | 3 bytes |

While encoding, to compensate for a final chunk that falls short of three bytes, padding (the = character) is added to the end of the output; when decoding, padding is ignored. Because every byte array the transform enqueues before the final flush is a multiple of three bytes, its encoding will never include a trailing = character, which allows the chunks to be concatenated together safely later on. The reverse works for decoding as long as we use strings that are a multiple of four characters.
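A quick sketch shows why the three-byte chunking is safe: encoding two aligned chunks separately and concatenating them gives the same result as encoding everything at once.

```typescript
// "foobar" as bytes: each 3-byte group encodes to an independent,
// padding-free 4-character block, so chunked output concatenates cleanly.
const input = new Uint8Array([102, 111, 111, 98, 97, 114]); // "foobar"
const whole = btoa(String.fromCodePoint(...input));
const chunked =
  btoa(String.fromCodePoint(...input.slice(0, 3))) +
  btoa(String.fromCodePoint(...input.slice(3, 6)));
console.log(whole);   // "Zm9vYmFy"
console.log(chunked); // "Zm9vYmFy" — identical
```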
Add the encoder to the pipeline:
new Blob(['foobar']).stream()
  .pipeThrough(new CompressionStream('deflate-raw'))
  .pipeThrough(new Base64EncoderStream());
Transform to PlantUML compatible encoding
PlantUML uses a different string encoding than Base64. Luckily, its encoding is also a 64 character set. Since the two encodings have the same sized character set, we can transform from Base64 to the PlantUML encoding after the fact.
| Encoding | Character set |
|---|---|
| Base64 | ABCDEFGHIJKLMNOPQRSTUVWXYZabcdef ghijklmnopqrstuvwxyz0123456789+/ |
| PlantUML | 0123456789ABCDEFGHIJKLMNOPQRSTUV WXYZabcdefghijklmnopqrstuvwxyz-_ |
The transform is a matter of mapping each Base64 character to PlantUML's equivalent character and stripping out any trailing = padding.
const base64map =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdef' +
  'ghijklmnopqrstuvwxyz0123456789+/';

const plantUmlMap =
  '0123456789ABCDEFGHIJKLMNOPQRSTUV' +
  'WXYZabcdefghijklmnopqrstuvwxyz-_';

export class PlantumlEncoderStream extends TransformStream<string, string> {
  constructor() {
    super({
      transform: (chunk, controller) => {
        const lookup = (char: string) =>
          plantUmlMap[base64map.indexOf(char)] ?? '';
        const encoded = chunk.replaceAll(/./g, lookup);
        controller.enqueue(encoded);
      },
    });
  }
}
Add the encoder to the pipeline:
new Blob(['foobar']).stream()
  .pipeThrough(new CompressionStream('deflate-raw'))
  .pipeThrough(new Base64EncoderStream())
  .pipeThrough(new PlantumlEncoderStream());
Convert the stream back into a string
Once you have a stream, it's streams all the way down.
At the end we need to collect all the chunks into a single basket.
To do that we can make a small WritableStream that collects strings
and concatenates them for consumption when the stream completes.
export class CaptureStringStream extends WritableStream<string> {
  result = '';

  constructor() {
    super({
      write: (chunk) => {
        this.result += chunk;
      },
    });
  }
}
Add the writer to the pipeline:
const captureStream = new CaptureStringStream();
await new Blob(['foobar']).stream()
// => Uint8Array(6) [102, 111, 111, 98, 97, 114]
  .pipeThrough(new CompressionStream('deflate-raw'))
// => Uint8Array(12) [
// 74, 203, 207, 79, 74,
// 44, 2, 0, 0, 0,
// 255, 255
// ]
// => Uint8Array(2) [ 3, 0 ]
  .pipeThrough(new Base64EncoderStream())
// => "SsvPT0osAgAAAP//"
// => "AwA="
  .pipeThrough(new PlantumlEncoderStream())
// => "IilFJqei0W000F__"
// => "0m0"
.pipeTo(captureStream);
console.log(captureStream.result);
// => "IilFJqei0W000F__0m0"
Final step: build the URL
The last step is to build this into a URL.
const imageSrc = new URL('https://www.plantuml.com');
imageSrc.pathname =
`/plantuml/svg/${captureStream.result}`;