Avoid Surprise Bills from AWS

Ryan Bethel
November 2, 2021

burning money Photo by Jp Valery on Unsplash

Architect users often ask for some way to avoid large surprise bills from AWS. The budget-watch plugin sets a cost limit for your app and temporarily shuts it down when the limit is reached. It is the first step to solving this problem for our community.

Scaling is a built-in feature of dynamic apps using cloud functions. But if your app can scale to infinity, so can your bill. Many users have this concern, even though it rarely happens. Amazon has many services to help monitor billing. Setting an account-wide budget alert is a relatively easy first line of defense. But configuring more fine-grained monitoring for a single app is not so easy. It’s a little like trying to fix your car by breaking into your mechanic’s garage. The billing services are too complicated for a small team to devote their limited resources to.

The unit of deployment in Architect is an individual app. None of the AWS budget solutions easily scope to a single app. You might have one app you expect to cost $100 per month and a dozen other experimental apps that you expect to be free. You should be able to easily set these limits and have some assurance there will be no surprises.

What budget-watch does

There are many ways an app could run up your bill. Maybe you hit the top of Hacker News, or you might have an infinite loop. The simplest way to stop a runaway app is to shut down its compute resources. By setting reserved concurrency (the number of simultaneous executions allowed) to zero on all Lambdas, you can effectively stop most things that cost money in your app. You install the plugin with npm install @architect/plugin-budget-watch, and then add the four lines shown below to your app manifest.

@plugins
budget-watch

@budget
limit $40
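
To make the shape of that setting concrete, here is a minimal sketch of reading the limit out of manifest text. It is not Architect's parser or the plugin's code; `parse_budget_limit` is a hypothetical helper that only understands the `@budget` / `limit` lines shown above.

```python
import re
from typing import Optional


def parse_budget_limit(manifest_text: str) -> Optional[float]:
    """Return the dollar amount under @budget, or None if no limit is set.

    Hypothetical helper for illustration only; it handles just the
    `limit $40` form shown in the manifest above.
    """
    in_budget = False
    for raw_line in manifest_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("@"):
            # A new pragma starts; track whether we are inside @budget.
            in_budget = line == "@budget"
            continue
        if in_budget:
            match = re.fullmatch(r"limit\s+\$?(\d+(?:\.\d+)?)", line)
            if match:
                return float(match.group(1))
    return None
```

A manifest with no `@budget` section yields `None`, which corresponds to the "limit removed" case.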

If the app uses multiple plugins, budget-watch should be listed last so that it is the last one applied. When deployed, it attaches a budget alert scoped to just the resources of the app. If the cost of those resources exceeds the limit, a shutdown is triggered. To restart the app, increase or remove the limit and redeploy. This resets the Lambda concurrency, and the app resumes operation. You can see the code at github.com/architect/plugin-budget-watch for more details.
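
The shutdown step can be sketched in a few lines. This is an illustration of the idea, not the plugin's actual code (the plugin itself is a Node.js project). It assumes a boto3-style Lambda client is passed in; `shut_down_functions` and the function names in the usage note are hypothetical.

```python
from typing import Any, Iterable, List


def shut_down_functions(lambda_client: Any, function_names: Iterable[str]) -> List[str]:
    """Set reserved concurrency to zero on each named function.

    `lambda_client` is assumed to behave like boto3.client("lambda").
    With reserved concurrency at zero, invocations of the function are throttled.
    """
    throttled = []
    for name in function_names:
        lambda_client.put_function_concurrency(
            FunctionName=name,
            ReservedConcurrentExecutions=0,
        )
        throttled.append(name)
    return throttled
```

With boto3 installed and credentials configured, a call might look like `shut_down_functions(boto3.client("lambda"), ["my-app-get-index"])`. How the list of function names is discovered for a given stack is left out of this sketch.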

Implementation details (the messy parts)

We would love to see Amazon build this feature into the platform. Solving this problem from the outside exposes many rough edges in the billing and CloudFormation APIs. The three biggest challenges are:

  1. Enabling tags
  2. Slow billing updates
  3. Managing configuration drift

Enabling tags

An ideal solution would be deployed entirely through the app manifest using CloudFormation (AWS’s infrastructure-as-code service, which Architect uses underneath). All that is needed to scope a budget to an app is a single auto-generated tag (aws:cloudformation:stack-name). But before this tag can be used, someone needs to navigate deep into the AWS console to activate it. This only needs to be done once, but it breaks the ideal user experience of avoiding the console altogether. Some users may not be allowed to activate tags because of account limits set by their organization.

Billing updates

Amazon bills you by the millisecond but only lets you check three times a day. It’s like touching a stove and finding out eight hours later that you got burned. You can set all kinds of alerts on all sorts of dimensions, but you can’t do any better than 8-12 hours of granularity. Even if you set an alert for a limit you have already passed, the notification may still be delayed for half a day until the next billing update.
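
A little arithmetic shows why this delay matters. The sketch below estimates how far past the limit a runaway app could get before the next billing update. The burn rates are made-up illustrative numbers, and `worst_case_overshoot` is a hypothetical helper; the model simply assumes the limit is crossed right after an update and that spending continues at a constant rate.

```python
def worst_case_overshoot(hourly_burn_usd: float, update_interval_hours: float = 12.0) -> float:
    """Dollars spent beyond the limit before the next billing update.

    Assumes the limit is crossed immediately after an update and the app
    keeps spending at a constant hourly rate until the next one.
    """
    return hourly_burn_usd * update_interval_hours


# Illustrative numbers only: an app burning $5 per hour.
overshoot_slow = worst_case_overshoot(5.0, 12.0)  # 60.0 dollars past the limit
overshoot_fast = worst_case_overshoot(5.0, 8.0)   # 40.0 dollars past the limit
```

Under these assumptions, a $40 limit could be exceeded by as much as the limit itself before any alert fires, so the limit is best treated as a trigger point rather than a hard cap.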

Drift detection

Deterministic deploys are a core feature of Architect. You get the same infrastructure every time you deploy the same app manifest. This plugin should not break that contract. Setting a budget limit and resetting it after it has triggered are both done from the app manifest. To shut down the app, the plugin changes the concurrency of every function by calling the Lambda API directly. This causes drift (out-of-band configuration changes to infrastructure) between your app and its manifest.


This drift needs to be reversed when the app is restarted. If you redeploy the same CloudFormation template, it will not make any updates because the template has not changed. AWS has “Drift Detection” that you can enable to monitor out-of-band changes, but it requires clicking around the console to enable it. There is no way to turn drift detection on with CloudFormation, and there is no way to automatically reconcile that drift by having your template overwrite those configuration changes. Not only that, drift detection does not even monitor the relevant Lambda configurations.

The workaround for this drift reset is to use a CloudFormation Custom Resource as a reset mechanism. Custom resources are intended for provisioning infrastructure outside of AWS and connecting it to your stack through CloudFormation. They have lifecycle hooks for create, update, and delete that run custom logic. After the budget-watch limit is triggered, it can be reset by increasing the limit in the manifest or by removing the limit. This triggers an update in the custom resource that resets the concurrency back to its original settings.
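
A rough sketch of such a reset handler follows. It is not the plugin's implementation: `handle`, `build_response`, `send_response`, and the `original_concurrency` mapping are hypothetical names, and the Lambda client is assumed to behave like boto3's. The event fields and the response body (Status, PhysicalResourceId, StackId, RequestId, LogicalResourceId) follow CloudFormation's custom resource protocol.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional


def build_response(event: Dict[str, Any], status: str, reason: str = "") -> Dict[str, Any]:
    """Build the body CloudFormation expects back from a custom resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason,
        # Create events carry no physical ID yet, so fall back to a fixed one.
        "PhysicalResourceId": event.get("PhysicalResourceId", "budget-watch-reset"),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }


def send_response(url: str, body: Dict[str, Any]) -> None:
    """PUT the JSON body to the pre-signed URL supplied in the event."""
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, method="PUT", headers={"Content-Type": ""}
    )
    urllib.request.urlopen(request)


def handle(
    event: Dict[str, Any],
    lambda_client: Any,
    original_concurrency: Dict[str, Optional[int]],
    send: Callable[[str, Dict[str, Any]], None] = send_response,
) -> Dict[str, Any]:
    """On Update, restore each function's concurrency; always report back."""
    try:
        if event["RequestType"] == "Update":
            for name, limit in original_concurrency.items():
                if limit is None:
                    # No reserved concurrency originally: remove the zero setting.
                    lambda_client.delete_function_concurrency(FunctionName=name)
                else:
                    lambda_client.put_function_concurrency(
                        FunctionName=name, ReservedConcurrentExecutions=limit
                    )
        body = build_response(event, "SUCCESS")
    except Exception as error:
        body = build_response(event, "FAILED", str(error))
    send(event["ResponseURL"], body)
    return body
```

Create and Delete events do nothing here except report success. The response is always sent, even on failure, because CloudFormation otherwise waits for the custom resource until it times out.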

Other approaches considered

There are many ways to build a feature like this with AWS building blocks, but most suffer from the same limitations (e.g., billing updates only three times a day). The two most promising approaches considered were cost anomaly detection with SNS alerts and budget-triggered actions. Cost anomaly detection was not used primarily because the alerts look for deviations in billing rather than absolute limits. What is the “expected” budget for the app I just deployed for the first time?

Budget-triggered actions seemed like the most promising solution provided by AWS. You set a budget with an alert that has automated actions attached. The challenge is that the actions you can specify are permissions, policy, and instance-based. These actions end up tightly coupled to implementation details of the app and must be updated as the app changes.

Limitations

The actual causes of large surprise bills are as varied as the many services that AWS offers. This solution cannot possibly catch every one. It focuses on the biggest source and applies the broadest intervention. Architect is a very flexible framework. It is possible (especially with custom plugins) to include infrastructure that will not be shut down by budget-watch.

This plugin is still in beta. We encourage people to try it out and give feedback. If you want to add the plugin to your Architect project, you can follow the instructions in the GitHub repository (architect/plugin-budget-watch).

What about Begin users?

Begin.com has a generous free tier, so users do not have to worry about costs. For the benefit of those on the paid tier, we will soon make this feature available to all users of Begin. We hope that AWS will build this into the platform in a more usable way. Until then, we hope to relieve some of the fear of unexpected bills. If you want to build scalable web apps with Begin, sign up for a free account today.