Serverless compute — AWS Lambda, Google Cloud Functions, Azure Functions — eliminated an entire category of infrastructure management: no servers to provision, no patching, no capacity planning (in the traditional sense). For many workloads, this is a genuine operational simplification. For the workloads where serverless creates new reliability challenges, those challenges are specific and learnable.
This is the reliability engineering guide for Lambda-based production systems: the failure modes, the patterns that address them, and the observability you need to understand what's happening at runtime.
The Cold Start Problem (And When It Actually Matters)
Cold starts happen when Lambda needs to initialize a new execution environment — download your function code, start the runtime, run your initialization code — before handling the first request. Cold start latency ranges from 100ms for lightweight functions to several seconds for large functions with heavy initialization (database connection pools, large dependency loading, JVM startup for Java runtimes).
Whether cold starts matter depends entirely on your workload:
They don't matter for: Async event processing (S3 events, SQS messages, Kinesis records). A few seconds of added latency for the first message in a batch is immaterial when the messages were already queued.
They matter for: Synchronous API requests via API Gateway where a user is waiting for a response. Cold starts add perceptible latency to the user experience, especially for infrequently used functions: their idle execution environments are reclaimed between requests, so a large share of their requests hit a cold start.
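Before investing in mitigation, it helps to measure how often cold starts actually occur. A minimal sketch, assuming a plain Python handler: module-level code runs once per execution environment, so a module-level flag distinguishes the first (cold) invocation from later warm ones. The field names in the log line are illustrative, not a Lambda convention.

```python
import json
import time

# Module-level code runs once per execution environment, i.e. during a cold start.
_INIT_STARTED = time.monotonic()
_is_cold_start = True


def handler(event, context):
    """Log whether this invocation paid the cold-start cost."""
    global _is_cold_start
    cold = _is_cold_start
    _is_cold_start = False  # later invocations in this environment are warm

    print(json.dumps({
        "event": "invocation_start",
        "cold_start": cold,
        "ms_since_init": round((time.monotonic() - _INIT_STARTED) * 1000),
    }))
    return {"cold_start": cold}
```

Counting log lines with `cold_start: true` against total invocations gives a cold-start rate per function, which indicates whether provisioned concurrency is worth its cost.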
Mitigation strategies:
*Provisioned concurrency* — Lambda keeps a configurable number of execution environments pre-warmed, eliminating cold starts for that concurrency level. The cost: you pay for the provisioned capacity even when it's not handling requests.

    # Set provisioned concurrency via boto3
    import boto3

    lambda_client = boto3.client('lambda')
    lambda_client.put_provisioned_concurrency_config(
        FunctionName='my-api-function',
        Qualifier='production',  # Must be a published version or alias, not $LATEST
        ProvisionedConcurrentExecutions=10  # Keep 10 environments warm
    )

*Keep initialization code lean.* The cold start includes your initialization code: everything outside the handler function. Lazy initialization (create the database connection on first request, not on cold start) reduces cold start time for environments that don't need the resource.
*Choose the right runtime.* Python and Node.js cold starts are typically 100-300ms. Java cold starts (JVM initialization) are 1-3 seconds. If your use case is latency-sensitive, runtime choice is a reliability decision.
Idempotency: Non-Negotiable for Event Processing
Lambda's invocation model has important reliability implications. SQS-triggered Lambda functions are invoked with at-least-once delivery — the same message may trigger your function multiple times if the function fails or the processing time exceeds the visibility timeout. Kinesis-triggered functions replay records on function failure. S3 event notifications can occasionally deliver duplicate events.
Your function must be idempotent: processing the same event multiple times must produce the same result as processing it once. Without idempotency, at-least-once delivery means duplicate side effects — sending the same email twice, charging a payment twice, inserting the same record twice.
The idempotency key pattern:

    import hashlib
    import json
    import time

    import boto3

    dynamodb = boto3.resource('dynamodb')
    idempotency_table = dynamodb.Table('lambda-idempotency')

    def idempotent_handler(event, context):
        # Generate a deterministic key from the event. Hashing the whole event
        # only deduplicates if every field is identical across redeliveries.
        event_key = hashlib.sha256(
            json.dumps(event, sort_keys=True).encode()
        ).hexdigest()

        # Claim the event; the conditional write fails if the key already exists
        try:
            idempotency_table.put_item(
                Item={
                    'event_key': event_key,
                    'request_id': context.aws_request_id,
                    'processed_at': int(time.time()),
                    'ttl': int(time.time()) + 86400  # Expire after 24 hours
                },
                ConditionExpression='attribute_not_exists(event_key)'
            )
        except dynamodb.meta.client.exceptions.ConditionalCheckFailedException:
            # Already claimed: return success without side effects
            print(f"Duplicate event detected: {event_key}, skipping")
            return {'statusCode': 200, 'body': 'duplicate'}

        try:
            # process_event is your business logic (not shown)
            return process_event(event)
        except Exception:
            # Release the claim so a retry is not mistaken for a duplicate
            idempotency_table.delete_item(Key={'event_key': event_key})
            raise

The DynamoDB conditional write ensures that only one execution claims each unique event, even under concurrent invocations. If processing fails, the claim is released so that the retry can run instead of being skipped as a duplicate. With TTL enabled on the table's `ttl` attribute, DynamoDB cleans up old idempotency records automatically.
AWS Lambda Powertools provides a production-ready idempotency layer that handles this pattern with less boilerplate — worth using for new functions rather than implementing from scratch.
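One detail worth checking in the key pattern is which fields go into the hash. The handler above hashes the whole event, but an SQS record carries delivery metadata (`receiptHandle`, `ApproximateReceiveCount`) that changes on each redelivery. A small sketch of the difference, using a hand-built record and the hypothetical helper names `whole_record_key` and `stable_key`:

```python
import hashlib
import json


def whole_record_key(record):
    """Hash the entire record, including delivery metadata."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


def stable_key(record):
    """Hash only the fields that identify the message itself."""
    identity = {"source": record["eventSource"], "id": record["messageId"]}
    return hashlib.sha256(json.dumps(identity, sort_keys=True).encode()).hexdigest()


first_delivery = {
    "eventSource": "aws:sqs",
    "messageId": "msg-123",
    "receiptHandle": "handle-a",
    "attributes": {"ApproximateReceiveCount": "1"},
    "body": '{"order_id": 42}',
}
# Redelivery of the same message: same ID and body, new receipt metadata.
redelivery = dict(
    first_delivery,
    receiptHandle="handle-b",
    attributes={"ApproximateReceiveCount": "2"},
)

print(whole_record_key(first_delivery) == whole_record_key(redelivery))
print(stable_key(first_delivery) == stable_key(redelivery))
```

The whole-record hash treats the redelivery as a new event, while the stable key recognizes it. A message ID only covers redeliveries of one message; if producers can send the same logical operation twice, a business identifier from the payload (such as an order ID) is the safer key.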
Dead Letter Queues: Catching What Falls Through
When a Lambda function fails repeatedly on an asynchronous invocation (from SNS, S3, or EventBridge), Lambda gives up after a configured number of retries. Without a Dead Letter Queue (DLQ), failed events are silently discarded. With a DLQ, they're sent to an SQS queue for investigation and reprocessing.
DLQs are not optional for production event processing. Silently dropped events are reliability failures you won't know about until a customer reports missing data.

    # CloudFormation / SAM template
    Resources:
      ProcessingFunction:
        Type: AWS::Serverless::Function
        Properties:
          Handler: app.handler
          # Runtime, CodeUri, and other properties omitted for brevity
          DeadLetterQueue:
            Type: SQS
            TargetArn: !GetAtt DeadLetterQueue.Arn
          EventInvokeConfig:
            MaximumRetryAttempts: 2         # Retry failed invocations twice before DLQ
            MaximumEventAgeInSeconds: 3600  # Discard events older than 1 hour

      DeadLetterQueue:
        Type: AWS::SQS::Queue
        Properties:
          QueueName: processing-function-dlq
          MessageRetentionPeriod: 1209600  # 14 days

      DLQDepthAlarm:
        Type: AWS::CloudWatch::Alarm
        Properties:
          AlarmName: ProcessingFunction-DLQ-HasMessages
          MetricName: ApproximateNumberOfMessagesVisible
          Namespace: AWS/SQS
          Dimensions:
            - Name: QueueName
              Value: !GetAtt DeadLetterQueue.QueueName
          Statistic: Maximum
          Period: 60
          Threshold: 0
          ComparisonOperator: GreaterThanThreshold
          EvaluationPeriods: 1
          TreatMissingData: notBreaching
          # Add AlarmActions (for example, an SNS topic) to notify on-call

Alert immediately when anything lands in the DLQ. A message in the DLQ means processing failed multiple times: this is a production issue, not background noise.
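Once the underlying bug is fixed, the events in the DLQ still need reprocessing. A rough sketch of a redrive step, assuming the DLQ is attached to an asynchronous invocation (in which case the message body is the original event payload). The function name `redrive_dlq` and its parameters are hypothetical; the clients are passed in so the function can be exercised with stubs.

```python
def redrive_dlq(sqs, lambda_client, dlq_url, function_name, max_messages=10):
    """Re-invoke the function for each DLQ message; return how many were redriven."""
    response = sqs.receive_message(
        QueueUrl=dlq_url,
        MaxNumberOfMessages=max_messages,  # SQS allows at most 10 per call
        WaitTimeSeconds=1,
    )
    redriven = 0
    for message in response.get("Messages", []):
        # For async-invocation DLQs, the body is the original event payload.
        lambda_client.invoke(
            FunctionName=function_name,
            InvocationType="Event",  # asynchronous, like the original delivery
            Payload=message["Body"].encode(),
        )
        # Delete only after the invoke call succeeded, so a crash cannot lose the event.
        sqs.delete_message(QueueUrl=dlq_url, ReceiptHandle=message["ReceiptHandle"])
        redriven += 1
    return redriven
```

In a real script the clients would come from `boto3.client("sqs")` and `boto3.client("lambda")`, and the call would be repeated until it returns 0. Because a crash between invoke and delete can replay an event, this step relies on the function being idempotent.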
Concurrency Limits: Blast Radius Containment
Lambda's default behavior is to scale concurrency to meet demand, up to your account-level concurrency limit. In a shared AWS account with many Lambda functions, a traffic spike to one function can consume the account's concurrency budget, causing throttling for other functions.
Reserved concurrency sets a hard limit on the concurrency available to a specific function — and it also guarantees that concurrency is available, since reserved capacity is not shared with other functions.

    import boto3

    lambda_client = boto3.client('lambda')

    # Reserve concurrency for a critical function
    lambda_client.put_function_concurrency(
        FunctionName='payment-processor',
        ReservedConcurrentExecutions=100  # Reserve 100; guaranteed and capped
    )

    # For non-critical functions, set a cap to prevent them from consuming the pool
    lambda_client.put_function_concurrency(
        FunctionName='report-generator',
        ReservedConcurrentExecutions=10  # Cap at 10; protect other functions
    )

The architecture principle: critical functions (payment processing, user authentication) should have reserved concurrency that guarantees their availability. Non-critical functions (report generation, batch exports) should have reserved concurrency that caps their consumption and protects the critical functions.
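Every reservation is carved out of the shared pool, so it is worth checking what remains for functions without one. A back-of-the-envelope sketch, assuming a regional account limit of 1,000 (a common default; yours may differ) and the rule that Lambda keeps at least 100 units unreserved. `unreserved_pool` is an illustrative helper, not an AWS API.

```python
ACCOUNT_LIMIT = 1000   # assumed regional account concurrency limit
MIN_UNRESERVED = 100   # Lambda requires at least this much to stay unreserved


def unreserved_pool(account_limit, reservations):
    """Return the concurrency left for functions without a reservation."""
    remaining = account_limit - sum(reservations.values())
    if remaining < MIN_UNRESERVED:
        raise ValueError(
            f"only {remaining} unreserved; at least {MIN_UNRESERVED} must remain"
        )
    return remaining


reservations = {"payment-processor": 100, "report-generator": 10}
print(unreserved_pool(ACCOUNT_LIMIT, reservations))  # 1000 - 110 = 890
```

With the two reservations from the example above, 890 units remain shared among all other functions in the account and region.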
Lambda Observability Beyond CloudWatch Logs
CloudWatch Logs is the default Lambda logging destination, but structured log analysis requires more than raw text. The patterns that produce useful production observability:
Structured JSON logging:

    import json
    import logging
    import time

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    def handler(event, context):
        started = time.monotonic()
        logger.info(json.dumps({
            "event": "invocation_start",
            "function_name": context.function_name,
            "request_id": context.aws_request_id,
            "remaining_time_ms": context.get_remaining_time_in_millis(),
            "source_event_type": event.get('source', 'unknown')
        }))

        # ... processing
        records_processed = len(event.get('Records', []))

        elapsed_ms = round((time.monotonic() - started) * 1000)
        logger.info(json.dumps({
            "event": "invocation_complete",
            "request_id": context.aws_request_id,
            "records_processed": records_processed,
            "duration_ms": elapsed_ms
        }))

Lambda Powertools for structured logging, tracing, and metrics:

    from aws_lambda_powertools import Logger, Tracer, Metrics
    from aws_lambda_powertools.metrics import MetricUnit

    logger = Logger()
    tracer = Tracer()
    metrics = Metrics(namespace="MyService")

    @tracer.capture_method  # Records a trace subsegment for each call
    def process_record(record):
        ...  # business logic (not shown)

    # log_event=True logs the full event; avoid it for sensitive payloads
    @logger.inject_lambda_context(log_event=True)
    @tracer.capture_lambda_handler
    @metrics.log_metrics
    def handler(event, context):
        metrics.add_metric(name="RecordsProcessed", unit=MetricUnit.Count, value=len(event['Records']))
        for record in event['Records']:
            process_record(record)

Lambda Powertools is the fastest path to production-quality observability for Lambda functions. It handles structured logging, X-Ray tracing, custom CloudWatch metrics, and idempotency in a well-maintained library.
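For intuition about what the metrics utility does, Powertools writes custom metrics as CloudWatch Embedded Metric Format (EMF) log lines, which CloudWatch extracts into metrics without an API call from the function. A hand-rolled sketch of one such line follows; `emf_metric` is a hypothetical helper, and the `service` dimension and its value are illustrative choices.

```python
import json
import time


def emf_metric(namespace, name, value, unit="Count", service="my-service"):
    """Build one EMF log line carrying a single metric with one dimension."""
    return json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [["service"]],
                "Metrics": [{"Name": name, "Unit": unit}],
            }],
        },
        "service": service,  # dimension value, referenced by name above
        name: value,         # metric value, referenced by name above
    })


# Printing the line to stdout is enough inside Lambda; it lands in CloudWatch Logs.
print(emf_metric("MyService", "RecordsProcessed", 3))
```

In practice the library is the better choice, since it also validates metrics and flushes them at the end of each invocation. The sketch only shows that the mechanism is a structured log line.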
The Lambda Anti-Patterns to Avoid
Long-running operations in synchronous Lambda. Lambda has a 15-minute maximum execution timeout. If your "Lambda function" is actually orchestrating a multi-hour workflow, you need Step Functions or a different architecture. Lambda is not a batch job runner.
Database connections opened per-invocation. If you open a new database connection on every Lambda invocation and close it at the end, you pay the connection setup cost on every request and can exhaust the database's connection limit at moderate Lambda concurrency. Use RDS Proxy (connection pooling for Lambda) or design your function to reuse connections across invocations (connections opened in initialization code persist for the lifetime of the execution environment).
Lambda for heavy computation. Lambda's CPU allocation is proportional to memory allocation. For CPU-intensive workloads (video transcoding, large-scale data transformation, ML inference), Fargate or EC2 is usually the better choice. Lambda's cost efficiency is for event-driven, IO-bound workloads.
Ignoring function memory sizing. Lambda's memory setting controls both RAM and CPU allocation. An under-memored function that's CPU-bound will run slower and often cost more in aggregate than the same function with higher memory (more CPU → faster execution → lower duration cost). Use Lambda Power Tuning to empirically find the optimal memory setting for your workload.
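The memory-sizing point can be made concrete with a little arithmetic. In this sketch the per-GB-second price is an assumed x86 list price to be checked against current pricing, and both durations are hypothetical measurements of the kind Lambda Power Tuning would produce. Per-request charges are left out because they do not depend on memory.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumed price; check current Lambda pricing


def duration_cost(memory_mb, duration_ms, invocations):
    """Duration cost in dollars: GB-seconds consumed times the unit price."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000) * invocations
    return gb_seconds * PRICE_PER_GB_SECOND


# Hypothetical CPU-bound function, one million invocations:
small = duration_cost(512, 800, 1_000_000)    # about 400,000 GB-seconds
large = duration_cost(1024, 350, 1_000_000)   # about 350,000 GB-seconds

print(f"512 MB:  ${small:.2f}")
print(f"1024 MB: ${large:.2f}")
```

Doubling memory is cheaper here because the duration falls by more than half. If the duration had only dropped to 500 ms, the larger setting would cost more, which is why the answer has to come from measurement.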
*Zak Hassan is a Staff SRE specializing in cloud-native reliability patterns, AI-powered operations, and data platform engineering. Find him at zakhassan.com or on LinkedIn.*