Building an AWS Serverless system: Provisioned concurrency, Zero-Downtime and Monitoring
In this second blog post in my series about an API-based serverless service in AWS, I discuss Provisioned Concurrency — a way to make Lambdas respond faster, without a "cold start". When we tried to configure it, it turned out that a re-deploy would yield downtime. Here I summarize what we have learnt.
How to engage Provisioned Concurrency
Depending on the runtime we use for our Lambda, its initialization time might lie in the range of 0.5 to 5.5 seconds. This warm-up time is not incurred if the Lambda was invoked recently: it can stay "warm" for about 30 to 45 minutes. Imagine that your application makes several calls sequentially and, after a period of low traffic, your Lambdas are no longer "warm": your client might need to wait for many seconds, which hurts the user experience.
If your use case requires avoiding the warm-up time for most calls, you can use Provisioned Concurrency: a number of Lambda instances are always kept warm and respond faster. The tradeoff is that Provisioned Concurrency is not free; the pricing can be seen on this AWS page.
In Terraform, using the serverless.tf Lambda module, this is configured as follows:
module "my-lambda" {
  source  = "terraform-aws-modules/lambda/aws"
   ...
  # This will generate a "published version" 
  publish               = true 
  # to disable Provisioned concurrency, use -1 
  provisioned_concurrent_executions =  4
}
Provisioned Concurrency is applied not to a Lambda in general, but to a specific "published" version of a Lambda. You cannot use the $LATEST tag to refer to a Lambda with Provisioned Concurrency. Hence, in the API Gateway that calls the Lambda, you must refer to that exact version of the Lambda, using a qualified_arn:
module "api_gateway" {
  source  = "terraform-aws-modules/apigateway-v2/aws"
  ... 
  integrations = {
    "GET /mylambdarest" = {
      integration_type        = "AWS_PROXY"
      integration_http_method = "POST"
      payload_format_version  = "2.0"
      lambda_arn              = module.my-lambda.this_lambda_function_qualified_arn
    }
    ...
  }
}
Terraform-related caveats for Provisioned Concurrency and VPC Lambdas
When you update your Lambda, Terraform will generally start shifting your Provisioned Concurrency setup to the new version. This process usually takes a few minutes, which must be taken into account because it might result in downtime under certain circumstances (see the next section).
At the time of writing (first half of 2021), we also discovered that if a Lambda belongs to a VPC (which is the case for Lambdas communicating with databases within that VPC), Provisioned Concurrency is re-created every time you invoke terraform apply, even if the Lambda was not modified. We observed this behavior while using the serverless.tf Lambda module; I cannot say whether using the "native" AWS Terraform provider resources directly would behave differently.
How to achieve zero-downtime deployments with Provisioned Concurrency
Note added in proof: the approach below describes zero-downtime deployments for a routine code update of a Lambda. There are still scenarios that can cause downtime. Please see the "When downtime is still possible" subsection below.
Imagine the situation: you configured your API Gateway and your Provisioned Concurrency Lambda and they work perfectly together, but when you invoke terraform apply, the system goes away for a few minutes while Terraform slowly prints in-progress messages and your colleagues start wondering about the production downtime. This is what happened:
- Earlier, you needed to tell Lambda and API Gateway that they are permitted to work with each other, and you most likely decided to add a policy to the Lambda that allows access from the API Gateway. This is a valid way to achieve connectivity. However, when you update the Lambda, the previous configuration is destroyed, and with it the permission for API Gateway to call the Lambda. At this point, API Gateway loses the right to invoke the Lambda and clients start to get HTTP 500 errors.
- Then a new version of the Lambda is created, together with its Provisioned Concurrency configuration and API Gateway permissions. While permissions are fast to re-create, Provisioned Concurrency is not: it takes several minutes to finalize the configuration. The API Gateway permissions are restored only after that, so API Gateway remains denied access to your new Lambda for some minutes. Once the configuration is complete, the Lambda becomes reachable again and normal operation resumes.
Regrettably, many simple tutorials online do not go beyond the basic case, which does not include Provisioned Concurrency. When a Lambda is connected to API Gateway using the $LATEST tag, one might not notice the problems described above.
This problem can be avoided as follows. There are two ways to permit API Gateway to talk to Lambdas:
- The Lambda has a policy that permits API Gateway to call it (see for example this AWS blog).
- Alternatively, the API Gateway can have its own set of "credentials" (an IAM role) that is authorized to call the necessary Lambdas. In that case, no additional permissions are needed on the Lambda side.
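For contrast, here is a minimal sketch of what the first option typically looks like in Terraform. The resource names (aws_lambda_function.my_lambda, aws_apigatewayv2_api.this) are hypothetical and not taken from our setup. The point to notice is that the permission lives on the Lambda side and, through qualifier, is bound to one published version:

```hcl
# Option 1 (sketch): resource-based permission on the Lambda side.
# Assumption: aws_lambda_function.my_lambda and aws_apigatewayv2_api.this
# are hypothetical resources defined elsewhere in the configuration.
resource "aws_lambda_permission" "allow_api_gateway" {
  statement_id  = "AllowExecutionFromAPIGateway"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.my_lambda.function_name

  # Bound to one published version: a new version needs a new permission.
  qualifier = aws_lambda_function.my_lambda.version

  principal  = "apigateway.amazonaws.com"
  source_arn = "${aws_apigatewayv2_api.this.execution_arn}/*/*"
}
```

Because this permission has to follow the Lambda version, it is part of what gets replaced on every deployment.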
It turns out that the second option works well with Provisioned Concurrency Lambdas and causes no outage, as the switchover between versions happens instantaneously. The access rights are never re-created; they remain in place at all times on the API Gateway side. To achieve this, we need to pass a credentials role to the relevant integrations:
# Assumes data "aws_caller_identity" "current" {} is declared elsewhere in the configuration.
resource "aws_iam_role" "api_gateway_credentials_call_lambda" {
  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Principal = {
          Service = "lambda.amazonaws.com"
        },
        Action = "sts:AssumeRole"
      },
      {
        Effect = "Allow",
        Principal = {
          Service = "apigateway.amazonaws.com"
        },
        Action = "sts:AssumeRole"
      }
    ]
  })
  inline_policy {
    name = "permission-apigw-lambda-invokefunction"
    policy = jsonencode({
      Version = "2012-10-17",
      Statement = [
        {
          Effect   = "Allow",
          Action   = "lambda:InvokeFunction",
          Resource = "arn:aws:lambda:*:${data.aws_caller_identity.current.account_id}:function:*"
        }
      ]
    })
  }
}
Note that the expression "arn:aws:lambda:*:${data.aws_caller_identity.current.account_id}:function:*" ends with a wildcard (the final asterisk) that means "include all Lambdas". If you have Lambdas that are not part of the API, it might be a good idea to allow API Gateway to reach only the Lambdas that it really needs to call. This can be achieved by giving the names of these Lambdas a special prefix and using it in the expression above instead of function:*.
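As an illustration of the prefix idea, here is a minimal sketch. The prefix "myapi-" and the names api_lambda_prefix and invoke_api_lambdas are hypothetical; the only assumption is that every Lambda behind the API follows this naming convention:

```hcl
locals {
  # Hypothetical naming convention: every Lambda behind the API starts with this prefix.
  api_lambda_prefix = "myapi-"
}

# Declare this data source only once per configuration.
data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "invoke_api_lambdas" {
  statement {
    effect  = "Allow"
    actions = ["lambda:InvokeFunction"]

    # The trailing wildcard also covers qualified ARNs (function:<name>:<version>).
    resources = [
      "arn:aws:lambda:*:${data.aws_caller_identity.current.account_id}:function:${local.api_lambda_prefix}*",
    ]
  }
}
```

The rendered document (data.aws_iam_policy_document.invoke_api_lambdas.json) could then be used as the policy of the role in place of the jsonencode expression with function:*.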
Again using the serverless.tf API Gateway module, we configure that role to be used by the integrations that call Lambdas, via the credentials_arn parameter:
module "api_gateway" {
  source  = "terraform-aws-modules/apigateway-v2/aws"
  ...     
  integrations = {       
    "GET /auth/mylambda" = {
      integration_type        = "AWS_PROXY"
      integration_http_method = "POST"
      payload_format_version  = "2.0"
      lambda_arn              = module.mylambda.this_lambda_function_qualified_arn
      credentials_arn         = aws_iam_role.api_gateway_credentials_call_lambda.arn
    }
  }
}
Note that lambda_arn is set to the qualified ARN, because this integration targets a Lambda with Provisioned Concurrency.
When downtime is still possible
The configuration described above works well when you need to tweak the code inside an existing Lambda. A word of warning: other types of changes can still cause downtime. We describe two such cases below, but there can be more, not counting, of course, the situation when the API Gateway and Lambdas are temporarily destroyed as a whole.
For instance, when the Lambda handler function was renamed, there was a break in service during deployment. It happened because the Lambda configuration was updated with the new handler name first, while the rest of the Lambda update took some time to complete. During that time, traffic was sent to the new handler name, but the updated source code was not there yet, and the old source code threw an error about an unrecognized identifier. Another downtime case is described in this GitHub issue.
It is therefore necessary to test whether your deployments contain any changes that will cause an outage. Please see my blog post Verification of zero-downtime deployments using GitHub Actions for my own take on the problem.
Monitoring
The system has to be monitored so that we know when it fails. Monitoring does not seem very complicated to set up, so I just mention a couple of aspects that we had to research.
Backend
We use native CloudWatch monitoring and dashboards. Conveniently, these dashboards can be made accessible via a password-protected minisite, so one can view them without logging in to the AWS Console. Remember to present on dashboards and in alerts only the information that you find most "interesting", and do not tire the viewer with noise. If your monitoring is full of noise, you will quickly get used to ignoring its messages, and important cases will drown in it.
To ensure our Lambdas are as performant as possible, we instrument the Lambda source code with AWS X-Ray subsegments and examine the X-Ray traces for bottlenecks.
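The subsegments themselves live in the Lambda source code, but tracing also has to be switched on for the function. A minimal sketch, assuming the serverless.tf Lambda module and its tracing_mode and attach_tracing_policy inputs (verify them against the module version you use); this fragment is illustrative and not taken from our configuration:

```hcl
module "mylambda" {
  source = "terraform-aws-modules/lambda/aws"
  # ... other settings elided ...

  # Assumption: trace invocations actively; "PassThrough" would instead
  # follow the sampling decision of the upstream caller.
  tracing_mode = "Active"

  # Assumption: let the module add X-Ray write permissions to the
  # function's execution role.
  attach_tracing_policy = true
}
```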
Below I list a few cases that are worth highlighting in connection with monitoring.
Adding Lambda Insights
In the Console, you can enable "Lambda Insights" for a Lambda function, but what about the IaC approach? It turns out that it is not a feature flag of the Lambda, but a special Lambda layer that has to be added, as suggested for example in this Stack Overflow suggestion. Note that you need to use the current version of the layer; besides, the layer ARNs are Region-specific. Check the ARN you need on this AWS page. With the serverless.tf approach to Lambda configuration, you enable Lambda Insights as follows:
module "mylambda" {
  source  = "terraform-aws-modules/lambda/aws"
   ...
  layers = [
    "arn:aws:lambda:eu-west-1:580247275435:layer:LambdaInsightsExtension:14"
  ]
}
Monitoring Provisioned Concurrency
As Provisioned Concurrency can incur noticeable charges, it is always good to know that you are running just the needed amount of it. This AWS document describes metrics that help visualize the situation with Provisioned Concurrency. I find ProvisionedConcurrencyUtilization to be the most useful metric. We also monitor ProvisionedConcurrencyInvocations, ProvisionedConcurrencySpilloverInvocations and ProvisionedConcurrentExecutions.
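One way to act on these metrics is an alarm on spillover, that is, on invocations that could not be served by a provisioned instance. The sketch below is illustrative only: the variable names are hypothetical, and the value format of the Resource dimension (function name plus version or alias) is an assumption that should be checked against the metric as it appears in your CloudWatch console.

```hcl
# Hypothetical inputs: the function name and the published version (or alias)
# that carries the Provisioned Concurrency configuration.
variable "function_name" {
  type = string
}

variable "function_version" {
  type = string
}

resource "aws_cloudwatch_metric_alarm" "pc_spillover" {
  alarm_name          = "${var.function_name}-provisioned-concurrency-spillover"
  alarm_description   = "Invocations were served outside Provisioned Concurrency"
  namespace           = "AWS/Lambda"
  metric_name         = "ProvisionedConcurrencySpilloverInvocations"
  statistic           = "Sum"
  period              = 300
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0
  treat_missing_data  = "notBreaching"

  dimensions = {
    FunctionName = var.function_name
    # Assumption: "<function name>:<version or alias>".
    Resource = "${var.function_name}:${var.function_version}"
  }
}
```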
Based on the insights received from these metrics, an auto-adjustment of Provisioned concurrency can be implemented to adapt the resources to real usage patterns.
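Such an auto-adjustment could, for example, be built with Application Auto Scaling target tracking on the utilization metric. This is a minimal sketch under stated assumptions, not a description of our setup: the function name, the qualifier, the capacity limits and the 70% target are all hypothetical values.

```hcl
locals {
  # Hypothetical values: the function and the published version or alias to scale.
  pc_function_name = "my-lambda"
  pc_qualifier     = "live"
}

resource "aws_appautoscaling_target" "lambda_pc" {
  service_namespace  = "lambda"
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  # Format: function:<function name>:<version or alias>; $LATEST is not allowed.
  resource_id  = "function:${local.pc_function_name}:${local.pc_qualifier}"
  min_capacity = 1
  max_capacity = 10
}

resource "aws_appautoscaling_policy" "lambda_pc" {
  name               = "${local.pc_function_name}-pc-utilization"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.lambda_pc.service_namespace
  scalable_dimension = aws_appautoscaling_target.lambda_pc.scalable_dimension
  resource_id        = aws_appautoscaling_target.lambda_pc.resource_id

  target_tracking_scaling_policy_configuration {
    # Assumption: aim for roughly 70% utilization of the provisioned instances.
    target_value = 0.7

    predefined_metric_specification {
      predefined_metric_type = "LambdaProvisionedConcurrencyUtilization"
    }
  }
}
```

If the amount of Provisioned Concurrency is also set statically in Terraform (for example via provisioned_concurrent_executions), the two mechanisms may overwrite each other, so it is probably best to let only one of them manage the value.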
Frontend
So, what if someone somewhere finds our static webpage not working because of some obscure browser or another reason?
While backend monitoring is easy to set up using native CloudWatch functionality, frontend monitoring is not a simple task to DIY. There are some attempts to feed frontend errors to CloudWatch as well, but that is usually not enough: a developer might also want auxiliary data. Besides, this problem has already been tackled in advanced ways by existing vendors.
After reviewing the available APM tools (most of which offer both frontend and backend monitoring), we found that Sentry fulfills our needs very well and at a reasonable price. It is well integrated with the majority of popular frontend frameworks and repository servers (such as GitHub) and provides rich information about errors and exceptions, alerting, and even the ability to suggest "bad" commits in source control. One remark: the ad blocker I use blocks Sentry calls by default, so to use Sentry, the service has to be added to the ad blocker's exceptions.
Askar Ibragimov, Cloud Architect and Senior Developer





