Automated Multi-Region deployments in AWS: Lambda
Amazon's Lambda service is perhaps one of our favorites, because it lets you hand your code over to AWS and let AWS deal with the complexities of executing it. Best of all, it is cost effective: instead of servers running 24/7 and wasting resources and money, you are only charged when your code actually needs to execute. I'd even argue it's more environmentally friendly, since you only burn what you use!
This does come at some "cost." Lambda executions, when you aggregate them, can be more expensive than other forms of compute. But the benefit on the flip side is that you don't have to worry about managing servers (even ECS on EC2 requires you to monitor servers), patching, and so on. So, is that a cost you're willing to pay?
Debugging can be a bit more difficult. Getting your application running and tested the first time around can be a bit painful if something isn't working as expected. Once you get it cracked the first time, though, the deployments that follow are a breeze.
But, again, this all falls back to making life easier. Our goal is to not have to manage servers, ever. Lambda (and, to be fair, Fargate) enables this.
Costs
Your Lambda costs are computed from the milliseconds your function spends executing. You pay for:
$0.20 per million executions
$0.0000166667 for every GB-second of execution
So, if you set up a function that requires 1024MB of memory and takes 100ms to execute, 1,000,000 requests works out to 1,000,000 × 0.1 s × 1 GB = 100,000 GB-seconds, or roughly $1.67 in compute plus $0.20 in request charges: about $1.87 to handle the 1,000,000 requests.
Consider if you were using Fargate instead. To handle that same 1,000,000 requests over a month, you'd pay around $30.42. (These figures do not include NAT Gateways or other services, just straight compute.)
So, if you are now running a multi-region failover, you can have your Lambda functions in both regions, with the failover region costing you a total of $0.00 in daily executions while the primary region hums along.
SAM
The Serverless Application Model ("SAM") is a collection of tools that assist in accelerating the development and deployment of AWS FaaS. It is a great tool.
But, we don't use it. For us, it has some limitations in setting up and configuring API Gateways. We instead create the functions and API Gateways by hand, via our CloudFormation Template Artisans.
You probably should use SAM; it is a force multiplier and abstracts away most of the pain of deploying solutions.
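For context, here's a minimal sketch of what SAM abstracts away. The handler, runtime, and paths are placeholders rather than anything from our actual solution; one SAM resource stands in for the function, execution role, API Gateway, and invoke permissions you would otherwise wire up by hand:

```yaml
Transform: AWS::Serverless-2016-10-31

Resources:
  # Placeholder function; SAM expands this into the Lambda function,
  # execution role, API Gateway REST API, and invoke permissions.
  SampleFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: SampleApp::SampleApp.Function::Handler   # placeholder .NET handler
      Runtime: dotnetcore3.1
      CodeUri: ./artifacts/sample.zip
      MemorySize: 1024
      Timeout: 30
      Events:
        ProxyApi:
          Type: Api                # implicit API Gateway created by SAM
          Properties:
            Path: /{proxy+}
            Method: ANY
```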
AWS CodePipeline and Lambda
This series is centered around developing Lambda FaaS that can be deployed across multiple regions. With that, we'll touch on things we've seen and considerations to take into account.
CodePipeline and Deployment
First, CodePipeline. As we'll touch on more in the AWS CodePipeline writeup, you need to decide how you'll deploy your CloudFormation templates. CodePipeline has a slew of deployment options available to it. We use the CloudFormation deployment action, which deploys templates containing StackSets. We could have used the StackSet deployment action, but it wasn't as flexible as the CloudFormation one.
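As a rough sketch of what that looks like (stack, artifact, and role names below are placeholders, not our actual pipeline), the deploy stage of an AWS::CodePipeline::Pipeline using the CloudFormation action is along these lines:

```yaml
# Fragment of the Stages list in an AWS::CodePipeline::Pipeline resource
- Name: Deploy
  Actions:
    - Name: DeployStackSetTemplate
      ActionTypeId:
        Category: Deploy
        Owner: AWS
        Provider: CloudFormation
        Version: "1"
      InputArtifacts:
        - Name: BuildOutput                  # artifact from the build stage
      Configuration:
        ActionMode: CREATE_UPDATE
        StackName: sample-lambda-stacksets   # placeholder stack name
        TemplatePath: BuildOutput::stacksets.yaml
        Capabilities: CAPABILITY_NAMED_IAM
        RoleArn: !GetAtt CloudFormationDeployRole.Arn   # deployment role defined elsewhere
      RunOrder: 1
```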
Our general setup for a solution is to have two repositories:
Infrastructure as Code (IaC)
Compute (Code + IaC)
Our IaC is generally networking as well as storage. We'll set up VPCs, KMS, S3 buckets, NACLs, Security Groups, ACM certificates, and DynamoDB in this configuration set. The beauty is that, by deploying as a StackSet, we can automatically target and deploy to AWS Organizations OUs. This means we can deploy to new accounts without lifting a single finger after we move an account into the application OU.
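As an illustrative sketch (the OU ID, template URL, and names below are placeholders), a service-managed StackSet with auto-deployment against an Organizations OU looks roughly like this:

```yaml
Resources:
  InfrastructureStackSet:
    Type: AWS::CloudFormation::StackSet
    Properties:
      StackSetName: sample-infrastructure    # placeholder name
      PermissionModel: SERVICE_MANAGED
      # New accounts moved into the target OU get the stack automatically
      AutoDeployment:
        Enabled: true
        RetainStacksOnAccountRemoval: false
      Capabilities:
        - CAPABILITY_NAMED_IAM
      TemplateURL: https://sample-bucket.s3.amazonaws.com/infrastructure.yaml   # placeholder
      StackInstancesGroup:
        - DeploymentTargets:
            OrganizationalUnitIds:
              - ou-xxxx-xxxxxxxx             # placeholder application OU
          Regions:
            - us-gov-west-1
            - us-gov-east-1
```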
Additionally, this splits out the infrastructure logic from compute. If we make changes to the IaC, we don't have to worry about it changing the compute. But, in theory, your infrastructure itself should be pretty static.
So, if you are aiming for daily deployments, you should be updating only your compute and leaving your infrastructure alone.
Lambda Layers and Code
Depending on how you are packaging things up, you may be using zipped code, Lambda Layers, and potentially containers hosted in Elastic Container Registry (ECR). We develop "Monolith" .NET Core solutions, and with that use code in a zip archive as well as Lambda Layers.
Each of these packaging methods follows the same practice: you must think with a "multi-region" mindset. We ran into some issues early on trying to figure out what we were doing wrong: Lambda requires the S3 object holding your code to reside in the same region as the Lambda function itself. This extends to Lambda Layers as well.
When you think about it, it makes a lot of sense. If you are in both us-gov-west-1 and us-gov-east-1, your code is hosted in us-gov-east-1, and us-gov-east-1 goes down... us-gov-west-1 is going to be in a world of hurt. So, you need to replicate your code into every region in which you wish to deploy your Lambda functions.
To facilitate this, our next post, S3 Replication (Pending), touches on how to replicate code from one region to another.
This also comes into play with Lambda Layers. You will need to replicate your Layers to each region you work in.
The same applies to ECR. If you are deploying Lambda as a container, you need to replicate your images to every region you deploy in.
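The pattern that keeps this manageable, sketched below with placeholder bucket, key, and resource names, is to keep an identically named artifact bucket per region and reference it with the region pseudo parameter, so the same template resolves to local code wherever it is deployed:

```yaml
Resources:
  # Layer content must live in an S3 bucket in the function's own region
  PreJitLayer:
    Type: AWS::Lambda::LayerVersion
    Properties:
      LayerName: sample-prejit-runtime                   # placeholder layer name
      Content:
        S3Bucket: !Sub "sample-artifacts-${AWS::Region}" # region-local, replicated bucket
        S3Key: layers/prejit-runtime.zip

  SampleFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: sample-function                      # placeholder function name
      Handler: SampleApp::SampleApp.Function::Handler    # placeholder .NET handler
      Runtime: dotnetcore3.1
      MemorySize: 1024
      Timeout: 30
      Role: !GetAtt SampleFunctionRole.Arn               # execution role defined elsewhere
      Layers:
        - !Ref PreJitLayer
      Code:
        # The zip must already be replicated into this region's bucket
        S3Bucket: !Sub "sample-artifacts-${AWS::Region}"
        S3Key: code/sample-function.zip
```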
S3 Replication
S3 Replication is covered in our next blog post, but there is one big issue we ran into: if your build process completes and submits code to S3, you cannot immediately approve the deployment or let it auto-deploy. This will cause you pain.
S3 replication isn't instant. We've seen even small 30MB zip files take roughly 5 minutes to completely replicate. So, let your build finish, grab a coffee, and approve the deployment a few minutes later.
Lambda Limitations
Lambda is a trade-off between on-demand compute and "always on" compute like ECS Fargate. If you want something that is (almost) always responsive, you'll want to look at Fargate.
Lambda works on executions. Each execution environment processes one request at a time, so if you have 100 concurrent requests, Lambda will spin up 100 execution environments that "live" for roughly 5 minutes. If you stop seeing those 100 connections on a constant basis, Lambda will recycle the instances and drain the pool as they go unused.
Thus, the "initial spin up" of a Lambda function could take a few dozen milliseconds to a few seconds. With .NET Core, we were originally seeing 6 second initial load times. We did a few things to improve that:
We "Pre-Jittered" our .NET projects and created Lambda layers so speed load
We remove all static content from the deployments, reducing the zip of .NET code to virtually nothing (30MB with image assets down to 400kb with just code)
Increase the RAM within the Lambda function, we went from 512mb to between 1024mb and 2048mb.
The combination of these three drove initial load times down to 500ms.
API Gateways
We are just documenting a few notes here on API Gateways.
Avoid Binary Data
If you are planning to send files for processing, avoid sending anything binary through API Gateway. Instead, call your API to generate a presigned URL to upload or download the file, and leverage S3 events to trigger processing. There are a lot of practical reasons for this. Mainly, API Gateway has a 6MB upload limit, so anything larger than that will fail.
Additionally, API Gateway likes to Base64 encode binary data, in both directions. So if you are serving up images from API Gateway, expect Base64 data to be returned (if you don't send the right HTTP Accept headers).
Offload to S3 and CloudFront anything that is static or binary.
Host Names
You will want to specify two host names for your API Gateway in each region: the service's primary host name, plus a region-specific host name used for failover health checks. For instance, if we are deploying a service sample.fed.dev into two regions, we would set it up as follows:
us-gov-west-1: sample.fed.dev, us-gov-west-1.sample.fed.dev
us-gov-east-1: sample.fed.dev, us-gov-east-1.sample.fed.dev
This configuration gives you four DNS entries to create in Route53. The first two are the region-specific host names used for health check failovers; the second two use the same sample.fed.dev host name, each pointing at a different region's API Gateway CNAME value.
A big note we'll put in here: be careful about changing DNS names once you map API Gateway endpoints to custom domains. If you change the domain or recreate it, the us-gov-west-1.sample.fed.dev and us-gov-east-1.sample.fed.dev health check DNS entries will need to be updated to point at the new API Gateway targets.
Once those are resolving, you will create health checks in Route53 for those two entries.
After the health checks have been created, you will create primary and secondary failover DNS entries for sample.fed.dev, pointing at us-gov-west-1.sample.fed.dev and us-gov-east-1.sample.fed.dev respectively and tied to their health checks.
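To make that concrete, here is a rough sketch of the primary region's records in CloudFormation. The hosted zone, health check path, and API Gateway target domain are placeholders; the secondary region gets a mirror of the failover record with Failover: SECONDARY and its own health check:

```yaml
Resources:
  WestHealthCheck:
    Type: AWS::Route53::HealthCheck
    Properties:
      HealthCheckConfig:
        Type: HTTPS
        FullyQualifiedDomainName: us-gov-west-1.sample.fed.dev
        Port: 443
        ResourcePath: /health                 # placeholder health endpoint

  # Region-specific name pointing at this region's API Gateway domain
  WestRegionalRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: fed.dev.
      Name: us-gov-west-1.sample.fed.dev
      Type: CNAME
      TTL: "60"
      ResourceRecords:
        - d-xxxxxxxxxx.execute-api.us-gov-west-1.amazonaws.com   # placeholder API Gateway target

  # Failover record for the service host name in the primary region
  PrimaryFailoverRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: fed.dev.
      Name: sample.fed.dev
      Type: CNAME
      TTL: "60"
      SetIdentifier: primary-us-gov-west-1
      Failover: PRIMARY
      HealthCheckId: !Ref WestHealthCheck
      ResourceRecords:
        - us-gov-west-1.sample.fed.dev
```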
Next Up in the Series
Next up in this series will be:
Part 1: Intro
Part 2: Gotchas
Part 3: DynamoDB Global Tables
Part 4: AWS Lambda
Part 5: S3 Replication (Pending)
Part 6: AWS Fargate (Pending)
Part 7: AWS CodePipeline (Pending)