Tidbits on software development, technology, and other geeky stuff

I found a use for a zero-width space

You know about zero-width spaces, right? A zero-width space is an invisible character that doesn’t seem very useful at first glance. Most of my experience with them has been negative: one usually shows up in some random file I’m parsing and turns out to be the cause of a bizarre bug. Or it’s the reason a copy/paste search yields no results even though the matches are sitting right in front of my eyes.

Well, despite my antipathy toward them, I actually found a good use for them.

I’ve recently been building out xertz, a static site generator written in TypeScript. It’s being used to build this here site. Something that had been bugging me was the indentation of the rendered HTML. I use Handlebars.js templates and include raw HTML which has been converted from Markdown. My templates look something like the following, where {{{ content_html }}} is the raw HTML bit:

<body>
  {{> header }}
  {{> sidebar }}
  <article class="post">
    {{{ content_html }}}
  </article>
  {{> footer }}
</body>

The problem is that only the first line of {{{ content_html }}} picks up the indentation; every line after a newline starts at column zero, so the actual output looks something like this:

<body>
  <header>My Site</header>
  <aside>Sidebar here</aside>
  <article class="post">
    <p>Lorem ipsum dolor sit amet, usu an justo deterruisset. Est ad discere nominati,
erroribus dissentias mei ne, appetere qualisque eloquentiam sea et.</p><img alt="An image" src="/i-found-a-use-for-zero-width-space/my-image.jpg"/><p>Lorem ipsum
dolor sit amet, usu an justo deterruisset. Est ad discere nominati</p>
<p>Lorem ipsum dolor sit amet, usu an justo deterruisset. Est ad discere nominati,
erroribus dissentias mei ne, appetere qualisque eloquentiam sea et.</p><img alt="An image" src="/i-found-a-use-for-zero-width-space/my-image.jpg"/><p>Lorem ipsum
dolor sit amet, usu an justo deterruisset. Est ad discere nominati</p>
  </article>
  <footer>My footer</footer>
</body>
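
One way to see why this happens: the value is dropped in exactly where the placeholder sits, so only its first line inherits the whitespace in front of the placeholder. Here is a minimal sketch that uses a plain string replacement as a stand-in for the template engine. The `render` function is hypothetical and is not how Handlebars is implemented; it only mimics the behaviour described above.

```typescript
// Hypothetical stand-in for a template engine: it swaps the placeholder for
// the raw value and does nothing else. This is NOT the Handlebars
// implementation; it only mimics the observable behaviour.
function render(template: string, contentHtml: string): string {
  // A replacer function is used so "$" sequences in the content stay literal.
  return template.replace("{{{ content_html }}}", () => contentHtml);
}

const template = [
  '<article class="post">',
  "  {{{ content_html }}}",
  "</article>",
].join("\n");

const contentHtml =
  "<p>First line,\nsecond line.</p>\n<p>Another paragraph.</p>";

console.log(render(template, contentHtml));
// Only "<p>First line," sits behind the two spaces from the template;
// every later line of the content starts at column zero.
```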

That’s a simple example. In practice it’s much worse. Yes, this has no effect on the actual layout and rendering of the webpage, but I’m a developer and I care about what the source looks like.

I tried to create a Handlebars helper called “indent” so I could indent each newline. I called it like this:

...
<article class="post">
  {{{ indent content_html 2 }}}
</article>
...

The helper is pretty simple. It just replaces newlines (\n) with a newline followed by a number of spaces:

function indent(input: string, width: number): string {
  // Follow every newline with `width` spaces.
  const indentation = " ".repeat(width);
  return input.replace(/\n/g, `\n${indentation}`);
}

That worked pretty well until my <pre> code blocks started looking like this:

const my_var;
    const another_var;

The HTML looked like this:

<body>
  <header>My Site</header>
  <aside>Sidebar here</aside>
  <article class="post">
    <p>Below is a code block</p><pre>const my_var;
    const another_var;</pre><p>Lorem ipsum dolor sit amet, usu an justo deterruisset. Est ad discere nominati,
    dissentias mei ne</p>
  </article>
  <footer>My footer</footer>
</body>

Uh oh, I’m getting indentation in my code blocks. Since my code blocks use <pre>, any spaces inside them are rendered as is. <pre> means “preformatted” after all.
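
The problem is easy to reproduce in a few lines. This sketch uses a naive `naiveIndent` function (written for illustration, not necessarily identical to the real helper) on a made-up HTML fragment, then pulls the text back out of the <pre> element to show what a browser would display:

```typescript
// Naive indentation: every newline is followed by `width` spaces,
// regardless of where in the HTML that newline sits.
function naiveIndent(input: string, width: number): string {
  return input.replace(/\n/g, "\n" + " ".repeat(width));
}

// Made-up fragment: a paragraph followed by a two-line code block.
const html =
  "<p>Below is a code block</p><pre>const my_var;\nconst another_var;</pre>";

const indented = naiveIndent(html, 4);

// Extract the contents of the <pre> element.
const preText = indented.slice(
  indented.indexOf("<pre>") + "<pre>".length,
  indented.indexOf("</pre>")
);

console.log(preText);
// The second line of the code now begins with four spaces that were never
// part of the original source.
```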

So then I thought I needed a hint to signal to the indent helper that it should skip indentation inside these <pre> tags.

Adding hints in the <pre> tags seemed feasible because I use Prism with Marked to convert Markdown code blocks like:

```javascript
const my_var = "Hello";
```

into <pre> blocks. It’s quite easy to modify the output of these tags because you provide a function that returns something like this:

return `<pre class="${className}"><code class="${className}">${code}</code></pre>`;

Easy to modify, yes. I thought, “Can I add some character(s) to the end of the lines inside <pre> tags that my indent helper could skip?” But since I’m using a regex to add the indentation in my indent helper, the marker has to be a single character so that I can exclude it with a negated character class (e.g. [^!]) instead of a negative lookbehind (which JavaScript only gained in ES2018, and which older engines don’t support).

OK, so I just need Prism to add a single invisible character to the end of each line inside <pre> blocks. Then my indent helper can ignore those lines. How do I do this?

Zero-width spaces, of course!

Now, my code formatting function precedes each newline in my code blocks with a zero-width space. It looks like this:

// Precede each newline with a zero-width space (U+200B) so that downstream
// formatting, such as the indent helper, knows to leave the code alone.
const codeWithNewlineHints = code.replace(/\n/g, "\u200b\n");

return `<pre class="${className}"><code class="${className}">${codeWithNewlineHints}</code></pre>`;
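
Because the hint is invisible, it is worth confirming that it is really there. Below is a small sketch that assumes a hypothetical `renderCodeBlock` wrapper around the formatting step; the function name, its signature, and the `language-javascript` class are illustrative assumptions rather than part of Prism or Marked.

```typescript
const ZERO_WIDTH_SPACE = "\u200b";

// Hypothetical wrapper around the formatting step; the name and signature
// are assumptions made for this example.
function renderCodeBlock(code: string, className: string): string {
  const codeWithNewlineHints = code.replace(/\n/g, ZERO_WIDTH_SPACE + "\n");
  return `<pre class="${className}"><code class="${className}">${codeWithNewlineHints}</code></pre>`;
}

const source = "const a = 1;\nconst b = 2;\nconst c = 3;";
const rendered = renderCodeBlock(source, "language-javascript");

// The rendered text looks unchanged on screen, so count the hints instead.
const hintCount = rendered.split(ZERO_WIDTH_SPACE).length - 1;
const newlineCount = rendered.split("\n").length - 1;
console.log(hintCount === newlineCount); // true: one hint per newline

// No newline in the output is preceded by anything other than U+200B.
console.log(/[^\u200b]\n/.test(rendered)); // false
```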

In my indent helper, I simply skip any newline that is preceded by this character:

function indent(input: string, width: number): string {
  const indentation = " ".repeat(width);
  return input.replace(/([^\u200b])\n/g, `$1\n${indentation}`);
}

See that regex there? /([^\u200b])\n/ only matches a newline if the character before it is not a zero-width space (\u200b). Because the negated class consumes that character, it is captured and written back as $1 so it isn’t lost. So, with this, indentation is only added after newlines that are not preceded by a zero-width space.
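
Putting the two pieces together, here is a minimal end-to-end sketch. The `hintNewlines` and `indent` functions follow the same idea as the snippets above but are written for this example, and the HTML fragment is made up. The character matched by the negated class is captured and re-emitted as `$1`, since the match would otherwise swallow it.

```typescript
const ZWSP = "\u200b"; // zero-width space, U+200B

// Mark every newline inside a code block so later formatting can skip it.
function hintNewlines(code: string): string {
  return code.replace(/\n/g, ZWSP + "\n");
}

// Indent after every newline that is NOT preceded by a zero-width space.
// The negated class consumes the preceding character, so it is captured
// and written back as $1.
function indent(input: string, width: number): string {
  return input.replace(/([^\u200b])\n/g, "$1\n" + " ".repeat(width));
}

const code = hintNewlines("const my_var;\nconst another_var;");
const html =
  "<p>Below is a code block</p><pre>" +
  code +
  "</pre><p>Lorem ipsum,\ndolor sit amet</p>";

const lines = indent(html, 4).split("\n");
console.log(lines);
// The line inside <pre> keeps its original column, while the continuation
// of the paragraph is indented by four spaces.
```

One limitation of this pattern, as far as I can tell: a newline at the very start of the input, or the second of two consecutive newlines, has no unconsumed character in front of it and is left unindented. On engines with ES2018 lookbehind support, `/(?<!\u200b)\n/g` would be an alternative that avoids the capture group.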

I’ve gained a newfound respect for zero-width spaces.
