Sanitizing Input

Simon MacDonald
February 11, 2022

Photo by Tim Mossholder on Unsplash

By this point in the pandemic, we are all tired of sanitizing our hands, but that doesn’t mean we should let our guard down when sanitizing user input. Don’t rely solely on client-side validation, as it can easily be bypassed.

This topic recently came up on our Discord:

I did not see any existing Node.js runtime helper that would sanitize input? Has anyone needed to do that in an arc project? Which package did you use (sanitizer, validator etc.)? Did you create a custom middleware?

This got me thinking: how would you write middleware to handle this task, and which package would be the best fit for it?

Package Selection

Three popular packages for sanitizing HTML:

Name          | Weekly Downloads | GitHub Stars
dompurify     | 2,204,742        | 8.4k
sanitize-html | 1,016,184        | 2.9k
xss           | 1,987,604        | 4.4k

Popularity is not the best metric for choosing an npm package, but all three of these packages have a healthy development community and are actively maintained.

Let’s quickly evaluate the three packages using code size, the number of dependencies, and speed as our criteria.

Code Size

According to BundlePhobia, xss has the smallest bundle size. dompurify looks close behind, but to run it in a Node.js runtime you also need jsdom to supply a DOM, and jsdom is larger than the rest of the packages put together.

Name          | Bundle Size | Gzipped Bundle Size
dompurify     | 19.5 kB     | 7.5 kB
sanitize-html | 147.8 kB    | 45.9 kB
xss           | 17.3 kB     | 5.5 kB
jsdom         | 2.3 MB      | 572.5 kB

Running slow-deps will show you the size and install time of each package, including its dependencies.

npx slow-deps
Analyzing 4 dependencies...
[====================] 100% 0.0s
--------------------------------------------------
| Dependency           | Time  | Size   | # Deps |
--------------------------------------------------
| jsdom                | 2s    | 9.4 MB | 58     |
| sanitize-html        | 2s    | 916 KB | 16     |
| dompurify            | 868ms | 618 KB | 1      |
| xss                  | 742ms | 241 KB | 4      |
--------------------------------------------------

Once again, xss comes out on top with the quickest installation time. dompurify comes in second on its own but is hamstrung by the need to install jsdom alongside it.

Speed

I wrote a quick test harness to pass in some “dirty” HTML containing script tags and inline JavaScript. The test runs each package's sanitizer 10,000 times; the average execution time is reported below, along with the cost per 1 million invocations.
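The harness itself isn't included in this post, but a minimal sketch of one could look like the following. The names averageMs and stripTags are hypothetical, and stripTags is only a stand-in so the sketch runs without installing anything; it is not a safe sanitizer.

```javascript
// A minimal timing harness sketch. `stripTags` is a hypothetical stand-in
// so this runs without any npm packages; it is NOT a safe sanitizer.
const { performance } = require('perf_hooks')

// Sample "dirty" input with a script tag and an inline event handler
const dirty = '<p onclick="alert(1)">Hello</p><script>alert("xss")</script>'

function stripTags (html) {
  return html.replace(/<[^>]*>/g, '')
}

// Run `fn` on `input` `runs` times and return the mean time per call in ms
function averageMs (fn, input, runs = 10000) {
  const start = performance.now()
  for (let i = 0; i < runs; i++) {
    fn(input)
  }
  return (performance.now() - start) / runs
}

const avg = averageMs(stripTags, dirty)
console.log(`stand-in: ${avg.toFixed(6)} ms per call`)
```

To time a real package, you would pass its sanitize function in place of the stand-in, for example averageMs(require('xss'), dirty).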

Name          | Avg. Execution Time | Cost per 1M Runs
dompurify     | 11.381 ms           | $0.19
sanitize-html | 0.875773 ms         | $0.01
xss           | 0.752937 ms         | $0.01

xss is the fastest with sanitize-html close behind and dompurify a distant third.
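The post doesn't say how the cost column was derived. As an assumption, the figures are consistent with compute-only pricing of $0.0000166667 per GB-second for a function configured with 1 GB of memory, ignoring per-request charges. Under those assumptions the arithmetic can be sketched as:

```javascript
// Assumptions (not stated in the post): 1 GB of memory and a compute price
// of $0.0000166667 per GB-second; per-request charges are ignored.
const PRICE_PER_GB_SECOND = 0.0000166667
const MEMORY_GB = 1

// Estimated compute cost in USD of one million invocations of `avgMs` each
function costPerMillion (avgMs) {
  const totalSeconds = (avgMs / 1000) * 1000000
  return totalSeconds * MEMORY_GB * PRICE_PER_GB_SECOND
}

const timings = [
  ['dompurify', 11.381],
  ['sanitize-html', 0.875773],
  ['xss', 0.752937]
]

for (const [name, avgMs] of timings) {
  console.log(`${name}: $${costPerMillion(avgMs).toFixed(2)}`)
}
```

Rounded to the cent, this reproduces the $0.19, $0.01, and $0.01 shown in the table.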

It’s pretty clear that xss is our best choice, as it has the quickest execution time and the smallest code footprint, which is key for keeping cold start times down.

Sanitize Middleware

Now that we’ve picked a suitable package, it’s time to write our middleware. Create a new file, src/shared/sanitize.js, where we will write our sanitize code:

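// src/shared/sanitize.js
const xss = require('xss')

// Middleware: decode the base64-encoded request body and sanitize it
module.exports = async function sanitize (req) {
  // Decode as UTF-8 so non-ASCII characters survive
  let buff = Buffer.from(req.body, 'base64')
  let text = buff.toString('utf8')
  // Remove script tags together with their contents
  let sanitized_body = xss(text, {
    stripIgnoreTagBody: ['script']
  })
  // Returning nothing hands the modified request to the next function
  req.body = sanitized_body
}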

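The above code decodes the base64-encoded request body into a string, runs it through xss with script tags and their contents stripped, and replaces req.body with the sanitized result.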

Since our sanitize function was created in the src/shared folder, it will be available to all of our HTTP functions as long as we require it:

const sanitize = require('@architect/shared/sanitize')
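One detail of the decoding step in sanitize.js is worth checking: the encoding passed to toString matters as soon as the body contains non-ASCII text. The sketch below uses only Node's built-in Buffer and a made-up sample body to compare the two encodings.

```javascript
// Simulate a base64-encoded request body containing a non-ASCII character.
// The sample string is made up for illustration.
const original = '<b>café</b>'
const encoded = Buffer.from(original, 'utf8').toString('base64')

// Decode the same payload with two different encodings
const asAscii = Buffer.from(encoded, 'base64').toString('ascii')
const asUtf8 = Buffer.from(encoded, 'base64').toString('utf8')

console.log(asAscii === original) // false: the multi-byte "é" is mangled
console.log(asUtf8 === original) // true: the text round-trips intact
```

Decoding as 'utf8' keeps accented characters, emoji, and other multi-byte text intact before it reaches the sanitizer.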

Now in our cloud function we only need to add our sanitize function to our call to arc.http.async. For example:

exports.handler = arc.http.async(sanitize, echo)

Our sanitize function will be called first to clean up the request's body. Then our echo function will be called, which returns the sanitized request body.

Here’s a full code sample:

// src/http/post-echo/index.js
const arc = require('@architect/functions')
const sanitize = require('@architect/shared/sanitize')

// sanitize runs first; echo then receives the cleaned request
exports.handler = arc.http.async(sanitize, echo)

// Return the sanitized body to the caller as JSON
async function echo (req) {
  return {
    statusCode: 200,
    headers: {
      'cache-control': 'no-cache, no-store, must-revalidate, max-age=0, s-maxage=0',
      'content-type': 'application/json; charset=utf8'
    },
    json: { body: req.body }
  }
}
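To make the ordering concrete, here is a simplified stand-in for the chaining behavior. It is an illustration only, not the actual arc.http.async implementation, which also handles request parsing and response formatting. The chain and shout names are hypothetical.

```javascript
// Simplified stand-in for arc.http.async, for illustration only
function chain (...fns) {
  return async function handler (req) {
    for (const fn of fns) {
      const res = await fn(req)
      // A function that returns a value ends the chain with that response
      if (res) return res
    }
  }
}

// Hypothetical middleware: upper-cases the body and returns nothing,
// so the next function in the chain runs
async function shout (req) {
  req.body = String(req.body).toUpperCase()
}

// Final function: returns a response, which ends the chain
async function echo (req) {
  return { statusCode: 200, json: { body: req.body } }
}

const handler = chain(shout, echo)
handler({ body: 'hello' }).then(res => console.log(res.json.body)) // HELLO
```

Because the middleware mutates req and returns nothing, the final function sees the modified body, which is the same pattern the sanitize function relies on.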

In Conclusion

I hope this gives you a bit of insight into how we go about selecting Node.js packages for use in our HTTP functions, as well as how to write shared middleware that can be used by multiple HTTP functions in your application.