
Introduction
Deploying microservices at scale requires a platform that handles container orchestration, networking, monitoring, and autoscaling without burdening your team with infrastructure management. AWS offers several options, and ECS with Fargate is one of the simplest ways to run Spring Boot microservices in production. It removes the need to provision, patch, or scale EC2 instances while integrating with the rest of the AWS ecosystem. In this guide, you will learn how ECS and Fargate work together, how to containerize and deploy Spring Boot applications, configure networking and load balancing, manage secrets securely, implement autoscaling, and set up monitoring for production workloads.
What Are ECS and Fargate
Amazon ECS (Elastic Container Service) is Amazon’s fully managed container orchestrator. It schedules and runs containers, integrates with Elastic Load Balancing to distribute traffic, manages deployments, and scales your services across Availability Zones.
AWS Fargate is the serverless compute engine for ECS. It lets you run containers without provisioning or managing EC2 instances. You define CPU and memory requirements, and Fargate handles the underlying infrastructure.
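Fargate accepts only certain CPU and memory pairings, so it can help to validate a task size before it reaches a task definition. The sketch below uses a hypothetical helper, `fargate_size_ok`, and covers only the sizes up to 4 vCPU as AWS documented them at the time of writing. Larger sizes exist and are left out, so treat the table as an assumption to check against the current Fargate documentation.

```shell
# fargate_size_ok CPU_UNITS MEMORY_MIB
# Returns 0 when the pair is a valid Fargate task size, 1 otherwise.
# Hypothetical helper; covers task sizes up to 4 vCPU (4096 CPU units) only.
fargate_size_ok() {
  cpu=$1
  mem=$2
  case "$cpu" in
    256)  min=512;  max=2048 ;;
    512)  min=1024; max=4096 ;;
    1024) min=2048; max=8192 ;;
    2048) min=4096; max=16384 ;;
    4096) min=8192; max=30720 ;;
    *)    return 1 ;;
  esac
  # Memory must fall inside the range allowed for this CPU value.
  if [ "$mem" -lt "$min" ] || [ "$mem" -gt "$max" ]; then
    return 1
  fi
  # 0.25 vCPU also allows 512 MiB; every other step is a whole GiB.
  if [ "$cpu" -eq 256 ] && [ "$mem" -eq 512 ]; then
    return 0
  fi
  [ $((mem % 1024)) -eq 0 ]
}
```

For example, `fargate_size_ok 512 1024` succeeds, while `fargate_size_ok 1024 1024` fails because 1 vCPU requires at least 2 GiB.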
ECS Core Concepts
// ECS Architecture Overview
//
// ┌─────────────────────────────────────────────────────────────┐
// │                         ECS Cluster                         │
// │   ┌─────────────────────────────────────────────────────┐   │
// │   │                     ECS Service                     │   │
// │   │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │   │
// │   │  │   Task 1    │  │   Task 2    │  │   Task 3    │  │   │
// │   │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │  │   │
// │   │  │ │Container│ │  │ │Container│ │  │ │Container│ │  │   │
// │   │  │ │  (App)  │ │  │ │  (App)  │ │  │ │  (App)  │ │  │   │
// │   │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │  │   │
// │   │  └─────────────┘  └─────────────┘  └─────────────┘  │   │
// │   └─────────────────────────────────────────────────────┘   │
// └─────────────────────────────────────────────────────────────┘
//
// Cluster: Logical grouping of services and tasks
// Service: Maintains desired count of running tasks
// Task: Running instance(s) of a task definition
// Task Definition: Blueprint for containers (image, CPU, memory, ports)
// Container: Your application running in Docker
Key Benefits of ECS + Fargate
• No servers to manage – Fargate handles all infrastructure
• Automatic scaling – Scale based on metrics or schedules
• High availability – Tasks are spread across Availability Zones when the service uses subnets in more than one AZ
• Deep AWS integration – IAM, CloudWatch, ALB, Secrets Manager
• Predictable pricing – Pay per vCPU and memory per second
• Security isolation – Each task runs in its own kernel
• Fast deployments – Rolling updates with health checks
Preparing Your Spring Boot Microservice
Start by containerizing your Spring Boot application with an optimized Docker image.
Optimized Dockerfile
# Multi-stage build for smaller image
FROM eclipse-temurin:17-jdk-alpine AS builder
WORKDIR /app
# Copy gradle files first for better caching
COPY gradle gradle
COPY gradlew build.gradle settings.gradle ./
RUN ./gradlew dependencies --no-daemon
# Copy source and build
COPY src src
RUN ./gradlew bootJar --no-daemon -x test
# Runtime stage
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
# Create non-root user for security
RUN addgroup -S spring && adduser -S spring -G spring
USER spring:spring
# Copy the built jar
COPY --from=builder /app/build/libs/*.jar app.jar
# Expose port
EXPOSE 8080
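# Health check (used by local Docker runs; ECS relies on the healthCheck in the task definition)
HEALTHCHECK --interval=30s --timeout=3s --start-period=60s --retries=3 \
  CMD wget -qO- http://localhost:8080/actuator/health || exit 1
# JVM options for containers
ENV JAVA_OPTS="-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -XX:+UseG1GC"
# exec replaces the shell so the JVM receives SIGTERM and can shut down gracefully
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar app.jar"]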
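To see what `-XX:MaxRAMPercentage=75.0` means for a given task size, a little arithmetic is enough. The helpers below are hypothetical and only approximate the JVM's own calculation with integer math. The point is that whatever the heap does not take must still hold metaspace, thread stacks, and direct buffers.

```shell
# max_heap_mib TASK_MEMORY_MIB PERCENT
# Prints the approximate maximum heap (MiB) for a container memory limit
# and an integer MaxRAMPercentage value.
max_heap_mib() {
  echo $(( $1 * $2 / 100 ))
}

# non_heap_headroom_mib TASK_MEMORY_MIB PERCENT
# Prints the memory (MiB) left over for everything outside the heap.
non_heap_headroom_mib() {
  echo $(( $1 - $1 * $2 / 100 ))
}
```

With a 1024 MiB task, `max_heap_mib 1024 75` prints 768 and `non_heap_headroom_mib 1024 75` prints 256. If the service uses many threads or large direct buffers, a lower percentage or a larger task may be the safer choice.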
Spring Boot Configuration for ECS
# application.yml - Production configuration for ECS
spring:
  application:
    name: order-service
  profiles:
    active: ${SPRING_PROFILES_ACTIVE:prod}
  # Graceful shutdown for ECS deployments
  lifecycle:
    timeout-per-shutdown-phase: 30s
server:
  port: 8080
  shutdown: graceful
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus,metrics
      base-path: /actuator
  endpoint:
    health:
      show-details: always
      probes:
        enabled: true
  health:
    livenessState:
      enabled: true
    readinessState:
      enabled: true
logging:
  pattern:
    console: '{"timestamp":"%d{ISO8601}","level":"%p","service":"${spring.application.name}","trace":"%X{traceId:-}","span":"%X{spanId:-}","message":"%m"}%n'
  level:
    root: INFO
    com.example: DEBUG
Build and Push to ECR
#!/bin/bash
# deploy-to-ecr.sh
set -euo pipefail
AWS_REGION="us-east-1"
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
ECR_REPO="order-service"
IMAGE_TAG="${GIT_COMMIT:-latest}"
ECR_REGISTRY="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com"
# Create ECR repository if it doesn't exist (same region as the login below)
aws ecr describe-repositories --repository-names "$ECR_REPO" --region "$AWS_REGION" >/dev/null 2>&1 || \
  aws ecr create-repository \
    --repository-name "$ECR_REPO" \
    --region "$AWS_REGION" \
    --image-scanning-configuration scanOnPush=true \
    --encryption-configuration encryptionType=AES256
# Login to ECR
aws ecr get-login-password --region "$AWS_REGION" | \
  docker login --username AWS --password-stdin "$ECR_REGISTRY"
# Build image
docker build -t "$ECR_REPO:$IMAGE_TAG" .
# Tag for ECR
docker tag "$ECR_REPO:$IMAGE_TAG" "$ECR_REGISTRY/$ECR_REPO:$IMAGE_TAG"
# Push to ECR
docker push "$ECR_REGISTRY/$ECR_REPO:$IMAGE_TAG"
echo "Image pushed: $ECR_REGISTRY/$ECR_REPO:$IMAGE_TAG"
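The script above falls back to the `latest` tag when `GIT_COMMIT` is unset, which makes it hard to tell which build a task is running. The sketch below shows one way to guard against that and to confirm that a push landed. All three function names are hypothetical, and `image_exists` assumes the AWS CLI is installed with credentials that can read the repository.

```shell
# ecr_image_uri ACCOUNT_ID REGION REPO TAG
# Prints the fully qualified ECR image URI.
ecr_image_uri() {
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "$4"
}

# is_traceable_tag TAG
# Returns 1 for an empty tag or "latest", 0 for anything else.
is_traceable_tag() {
  case "$1" in
    ""|latest) return 1 ;;
    *)         return 0 ;;
  esac
}

# image_exists REGION REPO TAG
# Returns 0 when the tag is present in the repository.
# Calls the AWS API, so it needs network access and credentials.
image_exists() {
  aws ecr describe-images \
    --region "$1" \
    --repository-name "$2" \
    --image-ids "imageTag=$3" >/dev/null 2>&1
}
```

A pipeline could call `is_traceable_tag "$IMAGE_TAG" || exit 1` before building and `image_exists "$AWS_REGION" "$ECR_REPO" "$IMAGE_TAG"` after pushing.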
Infrastructure as Code with Terraform
Define your ECS infrastructure using Terraform for reproducibility and version control.
VPC and Networking
# main.tf - VPC and networking
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
provider "aws" {
region = var.aws_region
}
# VPC
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 5.0"
name = "${var.project_name}-vpc"
cidr = "10.0.0.0/16"
azs = ["${var.aws_region}a", "${var.aws_region}b", "${var.aws_region}c"]
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
enable_nat_gateway = true
single_nat_gateway = var.environment != "prod"
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Environment = var.environment
Project = var.project_name
}
}
# Security group for ALB
resource "aws_security_group" "alb" {
name = "${var.project_name}-alb-sg"
description = "Security group for ALB"
vpc_id = module.vpc.vpc_id
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
# Security group for ECS tasks
resource "aws_security_group" "ecs_tasks" {
name = "${var.project_name}-ecs-tasks-sg"
description = "Security group for ECS tasks"
vpc_id = module.vpc.vpc_id
ingress {
from_port = 8080
to_port = 8080
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
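Each Fargate task gets its own elastic network interface and therefore one private IP address from its subnet, so subnet size caps how many tasks can run. The helper below is a hypothetical sketch of that capacity check. It relies on the fact that AWS reserves five addresses in every subnet, and it ignores other resources that also consume addresses.

```shell
# usable_subnet_ips PREFIX_LENGTH
# Prints the number of assignable IPv4 addresses in a subnet of the given
# prefix length, after subtracting the five addresses AWS reserves.
usable_subnet_ips() {
  echo $(( (1 << (32 - $1)) - 5 ))
}
```

For the `/24` private subnets above, `usable_subnet_ips 24` prints 251, which gives roughly 750 task addresses across three subnets. That is plenty for most services, but worth rechecking if you shrink the CIDR ranges.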
ECS Cluster and Service
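# ecs.tf - ECS cluster, task definition, and service
# Account ID used to build the ECR image URI and the secret ARNs below
data "aws_caller_identity" "current" {}
# ECS Cluster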
resource "aws_ecs_cluster" "main" {
name = "${var.project_name}-cluster"
setting {
name = "containerInsights"
value = "enabled"
}
tags = {
Environment = var.environment
}
}
# CloudWatch Log Group
resource "aws_cloudwatch_log_group" "ecs" {
name = "/ecs/${var.project_name}"
retention_in_days = 30
}
# IAM Role for ECS Task Execution
resource "aws_iam_role" "ecs_task_execution" {
name = "${var.project_name}-ecs-task-execution"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ecs-tasks.amazonaws.com"
}
}]
})
}
resource "aws_iam_role_policy_attachment" "ecs_task_execution" {
role = aws_iam_role.ecs_task_execution.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
# Allow reading secrets
resource "aws_iam_role_policy" "ecs_secrets" {
name = "${var.project_name}-ecs-secrets"
role = aws_iam_role.ecs_task_execution.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Action = [
"secretsmanager:GetSecretValue",
"ssm:GetParameters"
]
Resource = [
"arn:aws:secretsmanager:${var.aws_region}:${data.aws_caller_identity.current.account_id}:secret:${var.project_name}/*",
"arn:aws:ssm:${var.aws_region}:${data.aws_caller_identity.current.account_id}:parameter/${var.project_name}/*"
]
}]
})
}
# IAM Role for ECS Tasks (application permissions)
resource "aws_iam_role" "ecs_task" {
name = "${var.project_name}-ecs-task"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ecs-tasks.amazonaws.com"
}
}]
})
}
# Task Definition
resource "aws_ecs_task_definition" "app" {
family = "${var.project_name}-app"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = var.task_cpu
memory = var.task_memory
execution_role_arn = aws_iam_role.ecs_task_execution.arn
task_role_arn = aws_iam_role.ecs_task.arn
container_definitions = jsonencode([{
name = "app"
image = "${data.aws_caller_identity.current.account_id}.dkr.ecr.${var.aws_region}.amazonaws.com/${var.project_name}:${var.image_tag}"
portMappings = [{
containerPort = 8080
protocol = "tcp"
}]
environment = [
{
name = "SPRING_PROFILES_ACTIVE"
value = var.environment
},
{
name = "SERVER_PORT"
value = "8080"
}
]
# Reference each secret by its resource ARN so the full ARN, including the random suffix, is used
secrets = [
{
name = "SPRING_DATASOURCE_URL"
valueFrom = aws_secretsmanager_secret.db_url.arn
},
{
name = "SPRING_DATASOURCE_USERNAME"
valueFrom = aws_secretsmanager_secret.db_username.arn
},
{
name = "SPRING_DATASOURCE_PASSWORD"
valueFrom = aws_secretsmanager_secret.db_password.arn
}
]
logConfiguration = {
logDriver = "awslogs"
options = {
"awslogs-group" = aws_cloudwatch_log_group.ecs.name
"awslogs-region" = var.aws_region
"awslogs-stream-prefix" = "ecs"
}
}
healthCheck = {
command = ["CMD-SHELL", "wget -qO- http://localhost:8080/actuator/health || exit 1"]
interval = 30
timeout = 5
retries = 3
startPeriod = 60
}
}])
}
# ECS Service
resource "aws_ecs_service" "app" {
name = "${var.project_name}-service"
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.app.arn
desired_count = var.desired_count
launch_type = "FARGATE"
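# Rolling update bounds are top-level arguments on aws_ecs_service
deployment_maximum_percent = 200
deployment_minimum_healthy_percent = 100
# Ignore ALB health check failures while Spring Boot starts
health_check_grace_period_seconds = 60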
deployment_circuit_breaker {
enable = true
rollback = true
}
network_configuration {
subnets = module.vpc.private_subnets
security_groups = [aws_security_group.ecs_tasks.id]
assign_public_ip = false
}
load_balancer {
target_group_arn = aws_lb_target_group.app.arn
container_name = "app"
container_port = 8080
}
depends_on = [aws_lb_listener.https]
}
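The rolling update percentages translate into concrete task counts: ECS rounds the minimum healthy percent up and the maximum percent down. The hypothetical helper below sketches that arithmetic so you can see how much spare capacity a deployment needs.

```shell
# deployment_bounds DESIRED_COUNT MIN_HEALTHY_PERCENT MAX_PERCENT
# Prints "<min_running> <max_running>" for a rolling deployment.
deployment_bounds() {
  min=$(( ($1 * $2 + 99) / 100 ))   # minimum healthy percent, rounded up
  max=$(( $1 * $3 / 100 ))          # maximum percent, rounded down
  echo "$min $max"
}
```

With three desired tasks and the 100/200 settings used in the service, `deployment_bounds 3 100 200` prints `3 6`: ECS keeps all three old tasks running and may start up to three new ones alongside them. A 50/150 configuration would print `2 4`, trading some capacity during the rollout for a smaller burst.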
Application Load Balancer
# alb.tf - Application Load Balancer
resource "aws_lb" "main" {
name = "${var.project_name}-alb"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.alb.id]
subnets = module.vpc.public_subnets
enable_deletion_protection = var.environment == "prod"
tags = {
Environment = var.environment
}
}
resource "aws_lb_target_group" "app" {
name = "${var.project_name}-tg"
port = 8080
protocol = "HTTP"
vpc_id = module.vpc.vpc_id
target_type = "ip"
health_check {
enabled = true
healthy_threshold = 2
unhealthy_threshold = 3
timeout = 5
interval = 30
path = "/actuator/health"
port = "traffic-port"
protocol = "HTTP"
matcher = "200"
}
deregistration_delay = 30
}
resource "aws_lb_listener" "http" {
load_balancer_arn = aws_lb.main.arn
port = 80
protocol = "HTTP"
default_action {
type = "redirect"
redirect {
port = "443"
protocol = "HTTPS"
status_code = "HTTP_301"
}
}
}
resource "aws_lb_listener" "https" {
load_balancer_arn = aws_lb.main.arn
port = 443
protocol = "HTTPS"
ssl_policy = "ELBSecurityPolicy-TLS13-1-2-2021-06"
certificate_arn = var.certificate_arn
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.app.arn
}
}
Autoscaling Configuration
Configure autoscaling to handle variable traffic loads efficiently.
# autoscaling.tf
resource "aws_appautoscaling_target" "ecs" {
max_capacity = var.max_capacity
min_capacity = var.min_capacity
resource_id = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.app.name}"
scalable_dimension = "ecs:service:DesiredCount"
service_namespace = "ecs"
}
# Scale based on CPU utilization
resource "aws_appautoscaling_policy" "cpu" {
name = "${var.project_name}-cpu-autoscaling"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.ecs.resource_id
scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
service_namespace = aws_appautoscaling_target.ecs.service_namespace
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "ECSServiceAverageCPUUtilization"
}
target_value = 70.0
scale_in_cooldown = 300
scale_out_cooldown = 60
}
}
# Scale based on memory utilization
resource "aws_appautoscaling_policy" "memory" {
name = "${var.project_name}-memory-autoscaling"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.ecs.resource_id
scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
service_namespace = aws_appautoscaling_target.ecs.service_namespace
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "ECSServiceAverageMemoryUtilization"
}
target_value = 80.0
scale_in_cooldown = 300
scale_out_cooldown = 60
}
}
# Scale based on ALB request count
resource "aws_appautoscaling_policy" "requests" {
name = "${var.project_name}-requests-autoscaling"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.ecs.resource_id
scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
service_namespace = aws_appautoscaling_target.ecs.service_namespace
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "ALBRequestCountPerTarget"
resource_label = "${aws_lb.main.arn_suffix}/${aws_lb_target_group.app.arn_suffix}"
}
target_value = 1000.0 # requests per target
scale_in_cooldown = 300
scale_out_cooldown = 60
}
}
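To build intuition for the request-count policy, you can estimate the steady-state task count for a given load. This is only a rule of thumb. It assumes the target value is requests per target per minute, ignores cooldowns, and ignores the CPU and memory policies, which can also trigger a scale-out. The function name is hypothetical.

```shell
# tasks_for_request_rate REQUESTS_PER_MINUTE TARGET_PER_TASK MIN_TASKS MAX_TASKS
# Prints the estimated task count, rounded up and clamped to the min/max capacity.
tasks_for_request_rate() {
  needed=$(( ($1 + $2 - 1) / $2 ))   # ceiling division
  if [ "$needed" -lt "$3" ]; then
    needed=$3
  fi
  if [ "$needed" -gt "$4" ]; then
    needed=$4
  fi
  echo "$needed"
}
```

For example, `tasks_for_request_rate 4500 1000 2 10` prints 5. A result that sits at the maximum for your expected peak suggests that `max_capacity` is too low.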
Secrets Management
Store sensitive configuration in AWS Secrets Manager and inject it into containers securely.
# secrets.tf
resource "aws_secretsmanager_secret" "db_url" {
name = "${var.project_name}/db-url"
}
resource "aws_secretsmanager_secret_version" "db_url" {
secret_id = aws_secretsmanager_secret.db_url.id
secret_string = "jdbc:postgresql://${aws_db_instance.main.endpoint}/${var.db_name}"
}
resource "aws_secretsmanager_secret" "db_username" {
name = "${var.project_name}/db-username"
}
resource "aws_secretsmanager_secret_version" "db_username" {
secret_id = aws_secretsmanager_secret.db_username.id
secret_string = var.db_username
}
resource "aws_secretsmanager_secret" "db_password" {
name = "${var.project_name}/db-password"
}
resource "aws_secretsmanager_secret_version" "db_password" {
secret_id = aws_secretsmanager_secret.db_password.id
secret_string = random_password.db.result
}
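resource "random_password" "db" {
length = 32
special = true
# If this is also the RDS master password, exclude the characters RDS rejects ('/', '@', '"', and spaces)
override_special = "!#$%&*()-_=+[]{}<>:?"
}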
CI/CD Pipeline with GitHub Actions
Automate deployments with a GitHub Actions workflow.
# .github/workflows/deploy.yml
name: Deploy to ECS
on:
  push:
    branches: [main]
  workflow_dispatch:
env:
  AWS_REGION: us-east-1
  ECR_REPOSITORY: order-service
  ECS_CLUSTER: order-service-cluster
  ECS_SERVICE: order-service-service
  # Task definition family created by Terraform: "${var.project_name}-app"
  ECS_TASK_DEFINITION: order-service-app
  CONTAINER_NAME: app
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up JDK 17
        uses: actions/setup-java@v4
        with:
          java-version: '17'
          distribution: 'temurin'
          cache: 'gradle'
      - name: Run tests
        run: ./gradlew test
      - name: Upload test results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: test-results
          path: build/reports/tests/
  deploy:
    needs: test
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2
      - name: Build, tag, and push image
        id: build-image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t "$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" .
          docker push "$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG"
          echo "image=$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" >> "$GITHUB_OUTPUT"
      - name: Download task definition
        run: |
          aws ecs describe-task-definition \
            --task-definition "${{ env.ECS_TASK_DEFINITION }}" \
            --query taskDefinition > task-definition.json
      - name: Update task definition
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: task-definition.json
          container-name: ${{ env.CONTAINER_NAME }}
          image: ${{ steps.build-image.outputs.image }}
      - name: Deploy to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: ${{ env.ECS_SERVICE }}
          cluster: ${{ env.ECS_CLUSTER }}
          wait-for-service-stability: true
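Waiting for service stability confirms that ECS considers the deployment steady, but it does not prove that the application answers requests through the load balancer. A small retry helper is one way to add a smoke test as a final workflow step. This is a sketch: `retry` is a hypothetical function, and the URL in the usage note is a placeholder.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  return 1
}
```

A smoke-test step could then run `retry 10 6 curl -fsS https://api.example.com/actuator/health`, which allows the service about a minute to answer before the job fails.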
Monitoring and Observability
Set up comprehensive monitoring for your ECS services.
# monitoring.tf - CloudWatch alarms
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
alarm_name = "${var.project_name}-cpu-high"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "CPUUtilization"
namespace = "AWS/ECS"
period = 60
statistic = "Average"
threshold = 80
alarm_description = "CPU utilization is too high"
dimensions = {
ClusterName = aws_ecs_cluster.main.name
ServiceName = aws_ecs_service.app.name
}
alarm_actions = [aws_sns_topic.alerts.arn]
ok_actions = [aws_sns_topic.alerts.arn]
}
resource "aws_cloudwatch_metric_alarm" "memory_high" {
alarm_name = "${var.project_name}-memory-high"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "MemoryUtilization"
namespace = "AWS/ECS"
period = 60
statistic = "Average"
threshold = 80
alarm_description = "Memory utilization is too high"
dimensions = {
ClusterName = aws_ecs_cluster.main.name
ServiceName = aws_ecs_service.app.name
}
alarm_actions = [aws_sns_topic.alerts.arn]
}
resource "aws_cloudwatch_metric_alarm" "alb_5xx" {
alarm_name = "${var.project_name}-5xx-errors"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "HTTPCode_Target_5XX_Count"
namespace = "AWS/ApplicationELB"
period = 60
statistic = "Sum"
threshold = 10
alarm_description = "Too many 5xx errors"
dimensions = {
LoadBalancer = aws_lb.main.arn_suffix
TargetGroup = aws_lb_target_group.app.arn_suffix
}
alarm_actions = [aws_sns_topic.alerts.arn]
}
resource "aws_sns_topic" "alerts" {
name = "${var.project_name}-alerts"
}
Common Mistakes to Avoid
Avoid these common pitfalls when deploying Spring Boot to ECS.
1. Insufficient Health Check Grace Period
// ❌ Bad: Spring Boot takes 30+ seconds to start
healthCheck = {
startPeriod = 10 // Too short!
}
// ✅ Good: Allow time for Spring Boot initialization
healthCheck = {
startPeriod = 60 // Or more for complex apps
interval = 30
retries = 3
}
2. Not Using Container-Aware JVM Settings
# ❌ Bad: JVM falls back to its default heap, about a quarter of container memory on JDK 10+
ENTRYPOINT ["java", "-jar", "app.jar"]
# ✅ Good: Heap sized explicitly for the container, with exec so the JVM receives SIGTERM
ENV JAVA_OPTS="-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0"
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar app.jar"]
3. Hardcoding Secrets in Task Definition
// ❌ Bad: Secrets in plain text
environment = [{
name = "DB_PASSWORD"
value = "supersecret123" // Exposed in console!
}]
// ✅ Good: Use Secrets Manager
secrets = [{
name = "DB_PASSWORD"
valueFrom = "arn:aws:secretsmanager:region:account:secret:db-password"
}]
4. No Graceful Shutdown
# ❌ Bad: Connections dropped during deployment
server:
  shutdown: immediate
# ✅ Good: Graceful shutdown
server:
  shutdown: graceful
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s
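Graceful shutdown only helps if ECS gives the container enough time. ECS stops a task by sending SIGTERM and then SIGKILL once the container's `stopTimeout` elapses. On Fargate that is 30 seconds unless the container definition sets it, with a maximum of 120. The 30s Spring timeout above therefore leaves no headroom under the default. The check below is a hypothetical sketch, and the 5-second default margin is an arbitrary assumption.

```shell
# shutdown_fits SPRING_TIMEOUT_SECONDS ECS_STOP_TIMEOUT_SECONDS [MARGIN_SECONDS]
# Returns 0 when the Spring shutdown timeout plus a safety margin fits
# inside the ECS stop timeout, 1 otherwise. MARGIN_SECONDS defaults to 5.
shutdown_fits() {
  margin=${3:-5}
  [ $(( $1 + margin )) -le "$2" ]
}
```

`shutdown_fits 30 30` fails, while `shutdown_fits 20 30` and `shutdown_fits 30 60` succeed. In practice that means either lowering `timeout-per-shutdown-phase` or setting `stopTimeout` in the container definition.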
Conclusion
Deploying Spring Boot microservices to AWS ECS and Fargate provides a clean, scalable, and operationally simple way to run your applications in the cloud. You package your service as an optimized Docker image, push it to ECR, define infrastructure as code with Terraform, and let AWS handle container orchestration, networking, and scaling. The combination of ECS, Fargate, and supporting services like ALB, Secrets Manager, and CloudWatch creates a production-ready platform with minimal operational burden.
The key to success is proper configuration of health checks, graceful shutdown, container-aware JVM settings, and comprehensive monitoring. With these foundations in place, your Spring Boot microservices will run reliably and scale automatically to meet demand.
To continue building your cloud skills, read Serverless Applications with AWS Lambda & API Gateway. For microservice architecture patterns, see Migrating from a Monolith to Microservices with Spring Boot. You can also explore the Amazon ECS documentation, the ECS best practices guide, and the Spring Boot deployment documentation for additional guidance.