Building a Production-Ready AWS EKS Architecture
Building a production-ready Kubernetes cluster on AWS Elastic Kubernetes Service (EKS) requires careful consideration of networking, ingress, data persistence, and content delivery. In this guide, I'll walk through a comprehensive architecture that combines the power of EKS with Istio service mesh, Application Load Balancer (ALB), CloudFront, RDS Aurora, ElastiCache Redis, and S3.
Architecture Overview
This architecture provides a robust, scalable, and highly available platform for running containerized applications on AWS. The design follows AWS best practices and includes:
- EKS Cluster - Managed Kubernetes control plane
- Istio Service Mesh - Advanced traffic management and ingress
- Application Load Balancer (ALB) - Layer 7 load balancing
- CloudFront - Global content delivery network
- RDS Aurora - Managed relational database
- ElastiCache Redis - Managed in-memory data store
- S3 - Object storage for static assets and backups
Component Breakdown
1. Amazon EKS Cluster
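Amazon EKS provides a managed Kubernetes control plane: AWS runs, patches, and scales the API servers and etcd across multiple Availability Zones, while Kubernetes version upgrades and worker node updates remain yours to initiate. For production workloads, run at least 3 worker nodes spread across multiple Availability Zones.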
Key Considerations:
- Cluster Versioning: Always use a supported Kubernetes version (check AWS EKS release calendar)
- Node Groups: Use managed node groups for simplicity or self-managed node groups for customization
- Networking: Choose between VPC CNI (default) or alternative CNI plugins based on IP requirements
- IAM Integration: Leverage IRSA (IAM Roles for Service Accounts) for secure pod-level permissions
Example EKS Cluster Configuration:
# eks-cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: production-cluster
  region: us-east-1
  version: "1.28"
vpc:
  cidr: 10.0.0.0/16
  nat:
    gateway: HighlyAvailable
managedNodeGroups:
  - name: worker-nodes
    instanceType: m5.large
    minSize: 3
    maxSize: 10
    desiredCapacity: 3
    volumeSize: 100
    iam:
      withAddonPolicies:
        autoScaler: true
        cloudWatch: true
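To make the "IP requirements" consideration concrete, here is a minimal sketch of how many pods the default VPC CNI can place on one node when prefix delegation is not enabled. It assumes the commonly documented formula and the published ENI limits for m5.large (3 ENIs, 10 IPv4 addresses each); the function and constant names are illustrative only.

```python
# Sketch: estimate pod capacity per node under the default VPC CNI
# (secondary-IP mode, no prefix delegation).
#   max_pods = enis * (ipv4_per_eni - 1) + 2


def max_pods(enis: int, ipv4_per_eni: int) -> int:
    """Each ENI keeps its primary IP for itself; the +2 accounts for
    host-network pods (aws-node, kube-proxy) that need no VPC IP."""
    if enis < 1 or ipv4_per_eni < 2:
        raise ValueError("need at least 1 ENI and 2 IPs per ENI")
    return enis * (ipv4_per_eni - 1) + 2


# Assumed limits for m5.large, taken from the EC2 instance-type tables.
M5_LARGE = {"enis": 3, "ipv4_per_eni": 10}

if __name__ == "__main__":
    per_node = max_pods(**M5_LARGE)
    print(f"m5.large max pods per node: {per_node}")
    print(f"capacity at minSize=3: {3 * per_node}, at maxSize=10: {10 * per_node}")
```

If the result is too low for your workload density, that is usually the signal to look at prefix delegation, larger instance types, or an alternative CNI.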
2. Istio Service Mesh
Istio provides advanced traffic management, security, and observability. The Istio Ingress Gateway replaces the traditional Kubernetes ingress controller and offers more sophisticated routing capabilities.
Installation Steps:
# Download Istio and put istioctl on the PATH
curl -L https://istio.io/downloadIstio | sh -
cd istio-*
export PATH="$PWD/bin:$PATH"
# Install Istio with the ingress gateway exposed as a ClusterIP Service.
# The ALB described in section 3 is the public entry point, so the gateway
# does not need its own internet-facing NLB.
istioctl install --set profile=default \
  --set values.gateways.istio-ingressgateway.type=ClusterIP
# Enable Istio sidecar injection
kubectl label namespace default istio-injection=enabled
Istio Ingress Gateway Configuration:
# istio-gateway.yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: main-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
      # No httpsRedirect here: the ALB terminates TLS and forwards to this port,
      # so redirecting again at the gateway would cause a redirect loop.
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: tls-certificate
      hosts:
        - "*.example.com"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: main-vs
spec:
  hosts:
    - "*"
  gateways:
    - main-gateway
  http:
    # Rules are evaluated in order; the first match wins.
    - match:
        - uri:
            prefix: "/api"
      route:
        - destination:
            host: api-service
            port:
              number: 8080
    # No match block: catch-all for everything else.
    - route:
        - destination:
            host: web-service
            port:
              number: 3000
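The routing behavior of the VirtualService can be sketched in a few lines of Python. This is only a model of the first-match-wins evaluation, not Istio code; the `Route` type and `pick_destination` function are hypothetical names, and it assumes the prefix is compared as a plain string prefix.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    prefix: Optional[str]  # None models a rule without a match block (catch-all)
    host: str
    port: int


# Mirrors the two http rules of main-vs, in the same order.
ROUTES: List[Route] = [
    Route(prefix="/api", host="api-service", port=8080),
    Route(prefix=None, host="web-service", port=3000),
]


def pick_destination(path: str, routes: List[Route] = ROUTES) -> str:
    """Return host:port of the first rule whose prefix matches the path."""
    for route in routes:
        if route.prefix is None or path.startswith(route.prefix):
            return f"{route.host}:{route.port}"
    raise LookupError(f"no route matches {path!r}")


if __name__ == "__main__":
    for path in ("/api/users", "/", "/assets/app.js", "/apiary"):
        print(path, "->", pick_destination(path))
```

Note that under this model `/apiary` also lands on api-service. If that is not what you want, a prefix of `/api/` (plus an exact match for `/api`) is the usual way to tighten the rule.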
3. Application Load Balancer (ALB)
The ALB sits in front of the Istio Ingress Gateway, providing SSL termination and additional load balancing capabilities. The AWS Load Balancer Controller (formerly the ALB Ingress Controller) provisions and manages ALB resources from Kubernetes Ingress objects and their annotations.
AWS Load Balancer Controller Setup:
# alb-ingress-controller.yaml
# Abbreviated: the controller also needs RBAC, CRDs, a webhook, and an
# IRSA-backed ServiceAccount, all of which the official Helm chart installs.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aws-load-balancer-controller
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app: aws-load-balancer-controller
  template:
    metadata:
      labels:
        app: aws-load-balancer-controller
    spec:
      serviceAccountName: aws-load-balancer-controller
      containers:
        - name: controller
          image: public.ecr.aws/eks/aws-load-balancer-controller:v2.7.0
          args:
            - --cluster-name=production-cluster
            - --ingress-class=alb
            - --aws-region=us-east-1
Ingress Configuration with ALB:
# ingress-alb.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: alb-ingress
  # An Ingress can only reference Services in its own namespace, and Istio
  # installs the istio-ingressgateway Service in istio-system by default.
  namespace: istio-system
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/ssl-redirect: '443'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789012:certificate/xxx
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
spec:
  ingressClassName: alb
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: istio-ingressgateway
                port:
                  number: 80
4. Amazon CloudFront
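CloudFront provides global content delivery, reducing latency and offloading traffic from your origin. Configure CloudFront to use the ALB as the origin. Three details of the configuration below deserve attention. First, it is abbreviated: the CloudFront API additionally requires fields such as CallerReference and Comment, and a Quantity count next to every Items list. Second, with an https-only origin policy the origin domain name (or the forwarded Host header) must match a name on the ALB's certificate, so in practice you point CloudFront at a hostname covered by that certificate and resolving to the ALB, rather than at the raw elb.amazonaws.com name. Third, the cache policy ID shown is the AWS managed CachingDisabled policy, which suits dynamic API traffic; static paths need a caching policy if CloudFront is to offload the origin.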
CloudFront Distribution Setup:
{
  "DistributionConfig": {
    "Origins": {
      "Items": [
        {
          "Id": "ALB-Origin",
          "DomainName": "alb-123456789.us-east-1.elb.amazonaws.com",
          "CustomOriginConfig": {
            "HTTPPort": 80,
            "HTTPSPort": 443,
            "OriginProtocolPolicy": "https-only",
            "OriginSslProtocols": {
              "Items": ["TLSv1.2"]
            }
          }
        }
      ]
    },
    "DefaultCacheBehavior": {
      "TargetOriginId": "ALB-Origin",
      "ViewerProtocolPolicy": "redirect-to-https",
      "AllowedMethods": {
        "Items": ["GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"]
      },
      "Compress": true,
      "CachePolicyId": "4135ea2d-6df8-44a3-9df3-4b5a84be39ad"
    },
    "Enabled": true,
    "PriceClass": "PriceClass_100"
  }
}
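The CloudFront API expects every list structure to carry a Quantity field equal to the length of its Items array, which the JSON above leaves out. A small helper can fill those in before the document is submitted. This is a sketch only: `fill_quantities` is a hypothetical name, and other required fields (CallerReference, Comment, CachedMethods) still have to be added by hand.

```python
import json
from typing import Any


def fill_quantities(node: Any) -> Any:
    """Recursively set Quantity = len(Items) on every dict holding an Items list.

    The structure is modified in place and also returned for convenience.
    """
    if isinstance(node, dict):
        for value in node.values():
            fill_quantities(value)
        if isinstance(node.get("Items"), list):
            node["Quantity"] = len(node["Items"])
    elif isinstance(node, list):
        for item in node:
            fill_quantities(item)
    return node


if __name__ == "__main__":
    # A trimmed-down fragment of the distribution config shown above.
    fragment = {
        "Origins": {
            "Items": [
                {
                    "Id": "ALB-Origin",
                    "CustomOriginConfig": {
                        "OriginSslProtocols": {"Items": ["TLSv1.2"]}
                    },
                }
            ]
        },
        "AllowedMethods": {
            "Items": ["GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"]
        },
    }
    print(json.dumps(fill_quantities(fragment), indent=2))
```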
Benefits:
- Global Edge Locations: Content served from nearest location
- DDoS Protection: AWS Shield Standard included
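- SSL/TLS: Free public certificates via AWS Certificate Manager (for CloudFront, the certificate must be issued in us-east-1)
- Compression: Automatic gzip and Brotli compression when Compress is enabled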
- Custom Error Pages: Better error handling
5. Amazon RDS Aurora
RDS Aurora provides a managed PostgreSQL or MySQL-compatible database with automatic failover, backup, and scaling capabilities.
Aurora Cluster Configuration:
# Terraform example
resource "aws_rds_cluster" "aurora_cluster" {
  cluster_identifier              = "production-aurora"
  engine                          = "aurora-postgresql"
  engine_version                  = "15.4"
  database_name                   = "appdb"
  # "admin" is a reserved word in RDS for PostgreSQL, so use another master username
  master_username                 = "dbadmin"
  master_password                 = var.db_password
  backup_retention_period         = 7
  preferred_backup_window         = "03:00-04:00"
  db_subnet_group_name            = aws_db_subnet_group.aurora.name
  vpc_security_group_ids          = [aws_security_group.aurora.id]
  enabled_cloudwatch_logs_exports = ["postgresql"]
  # Encryption at rest, as recommended under Security Best Practices
  storage_encrypted               = true
  deletion_protection             = true
  serverlessv2_scaling_configuration {
    max_capacity = 16
    min_capacity = 2
  }
}
resource "aws_rds_cluster_instance" "aurora_instance" {
  count              = 2
  identifier         = "production-aurora-${count.index}"
  cluster_identifier = aws_rds_cluster.aurora_cluster.id
  instance_class     = "db.serverless"
  engine             = aws_rds_cluster.aurora_cluster.engine
  engine_version     = aws_rds_cluster.aurora_cluster.engine_version
}
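To get a feel for what the 2 to 16 ACU range means, the following sketch converts it to memory and to rough monthly cost bounds. It assumes one ACU corresponds to roughly 2 GiB of memory, as AWS describes it; the per-ACU-hour price is deliberately a parameter because it varies by region and over time, and the value in the example is a placeholder, not a quoted price.

```python
from typing import Tuple

GIB_PER_ACU = 2        # approximate memory per Aurora Capacity Unit
HOURS_PER_MONTH = 730  # common billing approximation


def acu_memory_range(min_acu: float, max_acu: float) -> Tuple[float, float]:
    """Return the (min, max) memory in GiB for a Serverless v2 capacity range."""
    if not 0 < min_acu <= max_acu:
        raise ValueError("expected 0 < min_acu <= max_acu")
    return (min_acu * GIB_PER_ACU, max_acu * GIB_PER_ACU)


def monthly_cost_bounds(min_acu: float, max_acu: float,
                        price_per_acu_hour: float,
                        instances: int = 1) -> Tuple[float, float]:
    """Lower/upper compute cost per month if every instance sits at the
    minimum or the maximum capacity all month. Storage and I/O are excluded."""
    low = min_acu * price_per_acu_hour * HOURS_PER_MONTH * instances
    high = max_acu * price_per_acu_hour * HOURS_PER_MONTH * instances
    return (low, high)


if __name__ == "__main__":
    print("memory range (GiB):", acu_memory_range(2, 16))
    # 0.5 is a placeholder price; look up the current rate for your region.
    print("cost bounds for 2 instances:", monthly_cost_bounds(2, 16, 0.5, instances=2))
```

The lower bound matters as much as the upper one: with min_capacity = 2 and two instances, you pay for 4 ACUs around the clock even when the database is idle.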
Connection from Kubernetes:
# secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: aurora-credentials
type: Opaque
stringData:
  host: production-aurora.cluster-xxxxx.us-east-1.rds.amazonaws.com
  port: "5432"
  database: appdb
  username: <master-username>  # must match master_username of the Aurora cluster
  password: <password>
---
# deployment.yaml
# Excerpt: selector, pod labels, and image are omitted for brevity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
spec:
  template:
    spec:
      containers:
        - name: app
          env:
            - name: DB_HOST
              valueFrom:
                secretKeyRef:
                  name: aurora-credentials
                  key: host
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: aurora-credentials
                  key: password
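On the application side, the injected variables are typically assembled into a connection string. The sketch below does this with the standard library only. DB_HOST and DB_PASSWORD come from the Deployment above; DB_USER, DB_PORT, and DB_NAME are assumed to be injected the same way from the remaining keys of the secret, and `postgres_dsn` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def postgres_dsn(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a PostgreSQL URL from environment variables.

    Raises KeyError if a required variable is missing, so a misconfigured
    pod fails at startup instead of on the first query.
    """
    env = os.environ if env is None else env
    host = env["DB_HOST"]
    user = env["DB_USER"]          # assumed: injected from the secret's username key
    password = env["DB_PASSWORD"]
    port = env.get("DB_PORT", "5432")
    name = env.get("DB_NAME", "appdb")
    # Percent-encode credentials so characters like '@' or '/' cannot break the URL.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{name}?sslmode=require"
    )


if __name__ == "__main__":
    example = {
        "DB_HOST": "production-aurora.cluster-xxxxx.us-east-1.rds.amazonaws.com",
        "DB_USER": "dbadmin",
        "DB_PASSWORD": "p@ss/word",
    }
    print(postgres_dsn(example))
```

Setting sslmode=require keeps the database hop consistent with the encryption-in-transit goal of the rest of the architecture.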
6. Amazon ElastiCache Redis
ElastiCache Redis provides managed in-memory caching and session storage with automatic failover and backup capabilities.
Redis Cluster Setup:
resource "aws_elasticache_replication_group" "redis" {
  replication_group_id       = "production-redis"
  description                = "Redis cluster for caching and sessions"
  node_type                  = "cache.r6g.large"
  port                       = 6379
  parameter_group_name       = "default.redis7"
  engine_version             = "7.0"
  num_cache_clusters         = 2
  automatic_failover_enabled = true
  multi_az_enabled           = true
  subnet_group_name          = aws_elasticache_subnet_group.redis.name
  security_group_ids         = [aws_security_group.redis.id]
  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
  auth_token                 = var.redis_auth_token
  snapshot_retention_limit   = 5
  maintenance_window         = "mon:03:00-mon:04:00"
}
Kubernetes Connection:
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
data:
  redis.conf: |
    host: production-redis.xxxxx.cache.amazonaws.com
    port: 6379
    tls: true
---
# Excerpt: selector, pod labels, and image are omitted for brevity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
spec:
  template:
    spec:
      containers:
        - name: app
          volumeMounts:
            - name: redis-config
              mountPath: /etc/redis
          env:
            - name: REDIS_HOST
              value: production-redis.xxxxx.cache.amazonaws.com
            - name: REDIS_AUTH_TOKEN
              valueFrom:
                secretKeyRef:
                  name: redis-credentials
                  key: auth-token
      # The volumeMount above needs a matching volume backed by the ConfigMap
      volumes:
        - name: redis-config
          configMap:
            name: redis-config
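Because the replication group has transit encryption and an auth token enabled, clients must connect over TLS and authenticate. A minimal sketch of building the connection URL from the two injected variables follows; `redis_url` is a hypothetical helper, and the `rediss://` scheme (with a double "s") is the convention that common Redis clients such as redis-py use to request TLS.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def redis_url(env: Optional[Mapping[str, str]] = None,
              port: int = 6379, db: int = 0) -> str:
    """Build a TLS Redis URL from REDIS_HOST and REDIS_AUTH_TOKEN."""
    env = os.environ if env is None else env
    host = env["REDIS_HOST"]
    token = env["REDIS_AUTH_TOKEN"]
    # Empty username, token as password; encode it so special characters are safe.
    return f"rediss://:{quote(token, safe='')}@{host}:{port}/{db}"


if __name__ == "__main__":
    example = {
        "REDIS_HOST": "production-redis.xxxxx.cache.amazonaws.com",
        "REDIS_AUTH_TOKEN": "example-token",
    }
    print(redis_url(example))
```

With redis-py, for instance, the resulting string can be passed to `redis.Redis.from_url(...)`; a plain `redis://` URL against this cluster would fail because the server only accepts TLS connections.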
7. Amazon S3
S3 serves as object storage for static assets, backups, and application data. Use S3 lifecycle policies to optimize costs.
S3 Bucket Configuration:
resource "aws_s3_bucket" "app_storage" {
  bucket = "production-app-storage"
}
resource "aws_s3_bucket_versioning" "app_storage" {
  bucket = aws_s3_bucket.app_storage.id
  versioning_configuration {
    status = "Enabled"
  }
}
resource "aws_s3_bucket_server_side_encryption_configuration" "app_storage" {
  bucket = aws_s3_bucket.app_storage.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}
resource "aws_s3_bucket_lifecycle_configuration" "app_storage" {
  bucket = aws_s3_bucket.app_storage.id
  rule {
    id     = "transition-to-ia"
    status = "Enabled"
    # Empty filter: the rule applies to every object in the bucket
    filter {}
    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }
    transition {
      days          = 90
      storage_class = "GLACIER"
    }
  }
}
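The effect of the lifecycle rule can be modeled as a simple function of object age. This is a sketch for reasoning about costs, not an S3 API call: `storage_class_at` is a hypothetical helper, and real transitions are applied asynchronously by S3, so an object may change class somewhat later than the configured day.

```python
from typing import List, Tuple

# (age in days, storage class) pairs taken from the lifecycle rule above.
TRANSITIONS: List[Tuple[int, str]] = [(30, "STANDARD_IA"), (90, "GLACIER")]


def storage_class_at(age_days: int,
                     transitions: List[Tuple[int, str]] = TRANSITIONS) -> str:
    """Return the storage class an object is expected to be in at a given age."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    current = "STANDARD"
    for threshold, storage_class in sorted(transitions):
        if age_days >= threshold:
            current = storage_class
    return current


if __name__ == "__main__":
    for age in (0, 29, 30, 89, 90, 365):
        print(f"day {age:>3}: {storage_class_at(age)}")
```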
Kubernetes Access via IRSA:
# serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/app-s3-access-role
---
# Excerpt: selector, pod labels, and image are omitted for brevity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
spec:
  template:
    spec:
      serviceAccountName: app-sa
      containers:
        - name: app
          env:
            - name: AWS_REGION
              value: us-east-1
            - name: S3_BUCKET
              value: production-app-storage
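When IRSA is wired up correctly, EKS injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE into the pod, and the AWS SDKs pick them up automatically. A quick startup check like the sketch below can surface a missing annotation or service account early; `irsa_problems` is a hypothetical helper that uses only the standard library.

```python
import os
from typing import List, Mapping, Optional

# Variables that EKS injects into pods whose service account is IRSA-annotated.
REQUIRED_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def irsa_problems(env: Optional[Mapping[str, str]] = None,
                  check_file: bool = True) -> List[str]:
    """Return a list of human-readable problems; empty means IRSA looks wired up."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    token_file = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    if check_file and token_file and not os.path.isfile(token_file):
        problems.append(f"token file {token_file} does not exist")
    return problems


if __name__ == "__main__":
    issues = irsa_problems()
    if issues:
        print("IRSA is not configured for this pod:")
        for issue in issues:
            print(" -", issue)
    else:
        print("IRSA environment looks good; the AWS SDK can assume", os.environ["AWS_ROLE_ARN"])
```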
Network Architecture
The architecture uses a multi-AZ VPC layout for high availability:
Internet
│
├─ CloudFront (Global Edge Locations)
│
├─ ALB (Multi-AZ, Public Subnets)
│
├─ Istio Ingress Gateway (Private Subnets)
│
├─ EKS Worker Nodes (Private Subnets)
│ ├─ Application Pods
│ └─ Istio Sidecars
│
├─ RDS Aurora (Private Subnets, Multi-AZ)
│
├─ ElastiCache Redis (Private Subnets, Multi-AZ)
│
└─ S3 (Regional Storage)
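One possible way to carve the 10.0.0.0/16 VPC into public and private subnets across three Availability Zones is sketched below with Python's ipaddress module. The AZ names, the /19 size, and the `plan_subnets` helper are assumptions for illustration; eksctl and Terraform modules apply their own layouts.

```python
import ipaddress
from typing import Dict, List

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
AZS = ["us-east-1a", "us-east-1b", "us-east-1c"]  # assumed AZ names


def plan_subnets(vpc: ipaddress.IPv4Network, azs: List[str],
                 new_prefix: int = 19) -> Dict[str, Dict[str, ipaddress.IPv4Network]]:
    """Assign one private and one public subnet per AZ.

    Private subnets take the first len(azs) blocks, public subnets the next
    len(azs); any remaining blocks stay free for future use.
    """
    blocks = list(vpc.subnets(new_prefix=new_prefix))
    if len(blocks) < 2 * len(azs):
        raise ValueError("VPC is too small for this many AZs at this prefix length")
    return {
        az: {"private": blocks[i], "public": blocks[len(azs) + i]}
        for i, az in enumerate(azs)
    }


if __name__ == "__main__":
    for az, subnets in plan_subnets(VPC_CIDR, AZS).items():
        print(az, "private:", subnets["private"], "public:", subnets["public"])
```

With the default VPC CNI every pod consumes an address from the private subnets, so those are the ones to size generously.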
Security Best Practices
- Network Security: All compute resources in private subnets, only ALB in public subnets
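- Encryption in Transit: TLS on every hop (CloudFront → ALB → Istio → Pods). The Ingress example in this guide forwards plain HTTP to the gateway on port 80; to encrypt that hop, target the gateway's HTTPS port, and enforce mutual TLS between pods with an Istio PeerAuthentication policy in STRICT mode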
- Encryption at Rest: Enable encryption for RDS, ElastiCache, and S3
- IAM Integration: Use IRSA for pod-level permissions, avoid long-lived credentials
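- Secrets Management: Prefer AWS Secrets Manager; if you use Kubernetes Secrets, enable KMS envelope encryption on the cluster, because Secret values are only base64-encoded by default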
- Network Policies: Implement Kubernetes Network Policies for pod-to-pod communication
- VPC Endpoints: Use VPC endpoints for private AWS service access
Monitoring and Observability
- CloudWatch: Cluster and node metrics, custom application metrics
- Prometheus + Grafana: Kubernetes and Istio metrics
- AWS X-Ray: Distributed tracing for microservices
- CloudWatch Logs: Centralized log aggregation
- Kubernetes Dashboard: Cluster management UI
Cost Optimization
- Right-size EKS nodes: Use cluster autoscaler, spot instances for non-critical workloads
- Aurora Serverless v2: Scale database capacity based on demand
- CloudFront caching: Reduce origin load and data transfer costs
- S3 lifecycle policies: Move data to cheaper storage classes
- Reserved Capacity: Consider Reserved Instances or Savings Plans for predictable workloads
Conclusion
This architecture provides a solid foundation for running production workloads on AWS EKS. By combining EKS with Istio, ALB, CloudFront, RDS Aurora, Redis, and S3, you get a scalable, secure, and highly available platform that follows AWS best practices.
Key takeaways:
- EKS provides managed Kubernetes with AWS integration
- Istio adds advanced traffic management and security
- ALB + CloudFront ensure global performance and availability
- RDS Aurora provides managed, scalable databases
- ElastiCache Redis handles caching and session storage
- S3 serves as durable object storage
Start with the core components and gradually add complexity as your requirements evolve. Always monitor costs and performance to ensure you're getting optimal value from each service.