Rendering HTML from Markdown in a Cloud Function

by Taylor Beseda
@tbeseda

Image: a blank notebook and pencil. Original photo by Kelly Sikkema on Unsplash.

Skip the build step and dynamically render Markdown to HTML right on the server. Yep. Respond to an HTTP request by loading a .md file, transforming the Markdown source, and returning HTML. In the process:

  • create a document table of contents,
  • parse and return frontmatter metadata,
  • dynamically add element attributes,
  • modify external anchor tags,
  • and provide syntax highlighting for fenced code blocks.

“But that has to be slow!”

Selena Gomez: "No … not really"

A custom markdown-it pipeline

Creating HTML from Markdown is fast with the right tools and an eye toward performant plugins. Arc.codes, the home of Architect, has been serving technical documentation from an AWS Lambda for years. The post you’re reading right now is served this way.

These sites use markdown-it with a fine-tuned set of plugins to achieve server-side-rendered Markdown. We found markdown-it to be a perfect balance of speed and extensibility. It has a built-in callback for syntax highlighting code blocks so that we can provide an instance of highlight.js. A few plugins were selected (and occasionally forked) to provide additional features like a table of contents, external anchor linking, and element class mapping.

Updated Markdown benchmarks

There are other options for rendering Markdown to HTML. Some are more extensible, while others are faster, but we found markdown-it to fit our needs best. Here are some stats and benchmarks, measured on a large document:

| Library               | Extensible | Operations/sec         |
|-----------------------|------------|------------------------|
| commonmark.js v0.30.0 | no         | 1,624 ±0.80% (95 runs) |
| Snarkdown v2.0.0      | no         | 956 ±0.29% (98 runs)   |
| markdown-it v13.0.1   | yes        | 763 ±0.78% (95 runs)   |
| Marked v4.0.15        | yes        | 742 ±0.21% (98 runs)   |
| Markdoc v0.1.2        | yes        | 432 ±1.29% (93 runs)   |
| Showdown v2.1.0       | yes        | 156 ±0.37% (86 runs)   |

Arcdown

After assembling the same functionality for a few projects, we abstracted the toolchain into its own package: arcdown.

arcdown gives us a simple configuration API and helpful result object to quickly serve Markdown from a cloud function while being very customizable:

import { readFileSync } from 'fs'
import render from 'arcdown'

const path = new URL('.', import.meta.url).pathname
const file = readFileSync(`${path}/example.md`, 'utf8')

const {
 frontmatter, // attributes from frontmatter
 html,        // the good stuff: HTML!
 slug,        // a URL-friendly slug
 title,       // document title from the frontmatter
 tocHtml,     // an HTML table of contents
} = await render(file)
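
To tie this back to the opening premise, here is a rough sketch of how a result object like the one above might be turned into an HTTP response. It is not taken from the Arc.codes source: `createHandler`, `loadMarkdown`, the `request.path` property, and the response shape are assumptions made for illustration, and the renderer is injected so that a stand-in can take the place of arcdown's `render`.

```javascript
// Hypothetical factory: both dependencies are injected so the sketch does not
// depend on a particular file layout or renderer.
function createHandler({ loadMarkdown, render }) {
  const headers = { 'content-type': 'text/html; charset=utf-8' }

  return async function handler(request) {
    // Assumption: the request exposes the requested path as `request.path`.
    const source = await loadMarkdown(request.path)
    if (source === undefined) {
      return { statusCode: 404, headers, body: '<h1>Not found</h1>' }
    }

    // Same result shape as in the arcdown example above.
    const { html, title, tocHtml } = await render(source)
    const body = `<!doctype html>
<html>
  <head><title>${title}</title></head>
  <body><nav>${tocHtml}</nav><main>${html}</main></body>
</html>`

    return { statusCode: 200, headers, body }
  }
}

// Stand-ins for a file lookup and for a real Markdown renderer.
const docs = new Map([['/docs/intro', '# Intro']])
const handler = createHandler({
  loadMarkdown: async (path) => docs.get(path),
  render: async (md) => ({
    html: `<h1>${md.slice(2)}</h1>`,
    title: md.slice(2),
    tocHtml: '',
  }),
})
```

In a real function, `loadMarkdown` would read the `.md` file from disk and `render` would be the function imported from arcdown.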

Configuration and extensibility

The included plugins are each configurable or can be completely disabled (i.e., maybe your site doesn’t need a table of contents). External markdown-it plugins can also be provided – like automatic emoji parsing:

import markdownItEmoji from 'markdown-it-emoji'
import render from 'arcdown'

const result = await render(file, { plugins: { markdownItEmoji } })

Heck, even the core renderer can be replaced by passing in a new render function:

const renderer = { render: (string) => `YO-${string}-LO` }
const result = await render(file, { renderer })

“Why, though?”

Mostly so we can tinker with and tune a site’s usage of arcdown as we’re building. For instance, this extensibility helped benchmark markdown-it against other Markdown parsers, compare syntax highlighting libraries (more on that below), and experiment with added plugins.

I won’t go into a ton of detail on configuration here. See the example on GitHub for a complete, kitchen-sink usage of arcdown.

Syntax highlighting and automatic language detection

Out of the box, highlight.js is a large install at 3.9 MB. The library provides grammars for nearly 200 languages and even more CSS styles for theming. Fortunately, our library can ignore 99% of this at runtime using only the core highlighter without any registered languages. arcdown automatically detects languages used in a given Markdown document and registers those specific grammars.
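
The detection step can be pictured with a small sketch. This is not arcdown's actual implementation; `findFenceLanguages` is a hypothetical helper that scans a Markdown string for the language names on opening code fences. The sample document uses `~~~` fences, which Markdown treats the same as backtick fences.

```javascript
// Hypothetical helper, not arcdown's actual code: collect the language names
// that appear on opening code fences in a Markdown string.
function findFenceLanguages(markdown) {
  // An opening fence is 3+ backticks or tildes followed by a language name.
  // Closing fences carry no name, so they never match.
  const fence = /^ {0,3}(`{3,}|~{3,})[ \t]*([\w#+-]+)/gm
  const languages = new Set()
  for (const match of markdown.matchAll(fence)) {
    languages.add(match[2].toLowerCase())
  }
  return languages
}

const doc = [
  '# Example',
  '~~~js',
  'console.log(1)',
  '~~~',
  '~~~ruby',
  'puts 1',
  '~~~',
  '~~~js',
  'console.log(2)',
  '~~~',
].join('\n')

// Contains "js" and "ruby", each listed once.
const found = findFenceLanguages(doc)
```

With a set like this in hand, only the matching grammars need to be registered with the highlighter before rendering.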

arcdown even allows for providing custom syntax definitions.

Shiki was also considered and implemented as a markdown-it highlighter, but we found it noticeably slower in a Lambda environment. That said, highlight.js is not necessarily a permanent element of arcdown; we’ll be happy to swap in a faster library with comparable capabilities.

Server-side possibilities

Because we’re rendering in real-time on the server, our application has the opportunity to dynamically augment markup with data from an external source like a database or API.

Imagine a document that displays key metrics of a GitHub repository:

const template = `
# @architect/architect

| Metrics       |               |
|---------------|---------------|
| Version       | {{ version }} |
| Open Issues   | {{ issues }}  |
| Pull Requests | {{ prs }}     |
`

It’s a quick String.replace to swap {{ tokens }} with dynamic info before generating HTML.
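
A minimal sketch of that substitution, assuming the values have already been fetched. `fillTemplate` and the sample numbers are made up for illustration:

```javascript
// Hypothetical helper: swap {{ token }} placeholders for values before rendering.
function fillTemplate(template, values) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (placeholder, key) =>
    // Leave unknown tokens untouched so missing data is easy to spot.
    Object.prototype.hasOwnProperty.call(values, key)
      ? String(values[key])
      : placeholder
  )
}

// Placeholder data; in practice this would come from an API or a database.
const metrics = { version: '1.2.3', issues: 4, prs: 2 }

const row = fillTemplate('| Version | {{ version }} |', metrics)
// row is '| Version | 1.2.3 |'
```

The filled-in string is still plain Markdown, so it can be passed to the renderer like any other document.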

Even the Markdown source could be loaded remotely, empowering writers to author rich content without HTML.

Once Markdown source has been parsed, speed can be increased further by caching results where possible. Save HTML documents or fragments to memory, a key-value store, or DynamoDB to skip the transformation steps entirely.
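
The in-memory variant can be as small as a `Map` declared outside the handler. This is a sketch under assumptions: `renderCached` is a hypothetical wrapper and `renderFn` stands in for whatever renderer is in use.

```javascript
// In-memory cache; it only lives as long as the function's container stays warm.
const cache = new Map()

// Hypothetical wrapper: `renderFn` stands in for any Markdown renderer.
function renderCached(key, markdown, renderFn) {
  if (!cache.has(key)) {
    // Store the return value as-is, so a Promise from an async renderer is
    // cached too and concurrent requests share a single render.
    cache.set(key, renderFn(markdown))
  }
  return cache.get(key)
}

// Stand-in renderer that counts how often it is called.
let calls = 0
const fakeRender = (md) => {
  calls += 1
  return `<p>${md}</p>`
}

const first = renderCached('docs/intro', 'Hello', fakeRender)
const second = renderCached('docs/intro', 'Hello', fakeRender)
// The second call is served from the cache, so fakeRender runs only once.
```

A rejected Promise would be cached as well, so a fuller version should evict entries when rendering fails and when the source document changes.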

Libraries to keep an eye on

  • MDX - “Markdown for the component era.” MDX has been working on support for the Node.js runtime, but the library is more geared towards a project with a build step.
  • Astro - Markdown is baked in and has performant SSR, even in a Lambda function! But it expands the scope of operations and complexity for our use case.
  • Markdoc - mentioned above and recently released by Stripe; a compelling syntax addition to Markdown, but slow in a short-lived process. For now.

Also of interest are WebAssembly implementations for parsing Markdown. markdown-wasm is wicked fast, but because of the nature of WASM modules, jumping between the module and JavaScript to do things like calling back to plugins and highlighters can negate performance gains. It might be worth exploring an approach that works only on the WASM module's input and output, leaving the module itself to run uninterrupted. Even if you’re essentially parsing the document more than once, it may still be faster than a pure JS implementation.

Give it a shot

Have you considered rendering Markdown on a server? Try out arcdown and file an issue if you have any feature requests or spot any opportunities for better performance. Check out arcdown on GitHub.