Kubernetes Deployment
Complete Kubernetes deployment guide for Elsa Workflows including Helm charts, deployment configurations, ingress setup, autoscaling, monitoring, service mesh integration, and production best practices
This comprehensive guide covers deploying Elsa Workflows to Kubernetes in production environments. Whether you're using managed Kubernetes services (EKS, AKS, GKE) or self-hosted clusters, this guide provides everything you need for a reliable, scalable deployment.
Overview
Elsa Workflows can be deployed to Kubernetes using either:
Helm Charts (Recommended) - Simplified deployment and management
Raw Kubernetes Manifests - Full control over configuration
This guide covers both approaches and includes:
Elsa Server and Studio deployments
Database integration and persistence
Ingress configuration for external access
Horizontal Pod Autoscaling (HPA)
Monitoring with Prometheus and Grafana
Service mesh integration (Istio/Linkerd)
Production best practices and troubleshooting
Prerequisites
Before deploying to Kubernetes, ensure you have:
Required Tools
kubectl v1.28+ - Kubernetes command-line tool
Helm v3.12+ - Kubernetes package manager (if using Helm charts)
Docker - For building custom images (optional)
Access to a Kubernetes cluster (v1.28+)
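Before going further, it can help to confirm your tooling and cluster meet these versions. A quick check using standard commands:
# Verify client tooling versions
kubectl version --client
helm version
docker --version
# Confirm cluster connectivity and node availability
kubectl cluster-info
kubectl get nodes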
Cluster Requirements
Minimum: 2 nodes with 4GB RAM and 2 CPU cores each
Recommended: 3+ nodes with 8GB RAM and 4 CPU cores each
Storage: Dynamic volume provisioning support (for databases)
Ingress Controller: NGINX, Traefik, or cloud provider load balancer
Knowledge Requirements
Basic Kubernetes concepts (Pods, Services, Deployments)
Understanding of Elsa architecture (see Architecture Overview)
Familiarity with database configuration
Basic YAML syntax
Architecture Overview
A typical Elsa Workflows Kubernetes deployment consists of:
┌─────────────────────────────────────────────────────────────┐
│                      Ingress Controller                      │
│                   (NGINX / Traefik / ALB)                    │
└─────────────────┬───────────────────────┬───────────────────┘
                  │                       │
        ┌─────────▼─────────┐    ┌────────▼───────┐
        │    Elsa Studio    │    │  Elsa Server   │
        │   (Deployment)    │    │  (Deployment)  │
        │   Replicas: 2+    │    │  Replicas: 3+  │
        └───────────────────┘    └────────┬───────┘
                                          │
                      ┌───────────────────┼───────────────────┐
                      │                   │                   │
             ┌────────▼────────┐ ┌────────▼───────┐ ┌────────▼───────┐
             │   PostgreSQL    │ │     Redis      │ │    RabbitMQ    │
             │  (StatefulSet)  │ │ (StatefulSet)  │ │ (StatefulSet)  │
             │     + PVC       │ │     + PVC      │ │     + PVC      │
             └─────────────────┘ └────────────────┘ └────────────────┘
Components
Elsa Server: Hosts the workflow engine and REST API
Elsa Studio: Visual workflow designer (optional, can be separate)
Database: PostgreSQL, SQL Server, or MySQL (with persistent storage)
Redis: Distributed caching and locking
RabbitMQ: Message broker for distributed cache invalidation (via MassTransit)
Ingress: External access routing
Monitoring: Prometheus metrics and Grafana dashboards
Helm Chart Deployment
Helm is the recommended approach for deploying Elsa Workflows to Kubernetes. While official Helm charts are under development, this section provides a production-ready chart configuration.
Step 1: Create Helm Chart Structure
Create a new Helm chart for Elsa:
helm create elsa-workflows
cd elsa-workflows
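helm create scaffolds a demo chart (an NGINX deployment, service, and ingress). One optional approach, shown here as a suggestion rather than a required step, is to clear the generated templates before adding Elsa-specific manifests:
# Optional: remove the scaffolded demo templates (keeps _helpers.tpl and NOTES.txt)
rm -rf templates/*.yaml templates/tests
ls templates/
Step 2: Configure Values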
Create a values.yaml file with the following configuration:
# values.yaml - Elsa Workflows Helm Chart Configuration
# Global settings
global:
imageRegistry: docker.io
imagePullPolicy: IfNotPresent
storageClass: "" # Use default storage class
# Elsa Server configuration
elsaServer:
enabled: true
name: elsa-server
image:
repository: elsaworkflows/elsa-server-v3-5
tag: latest
pullPolicy: IfNotPresent
replicaCount: 3
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "2000m"
env:
- name: ASPNETCORE_ENVIRONMENT
value: "Production"
- name: HTTP_PORTS
value: "8080"
- name: DATABASEPROVIDER
value: "PostgreSql"
- name: CONNECTIONSTRINGS__POSTGRESQL
valueFrom:
secretKeyRef:
name: elsa-secrets
key: postgresql-connection-string
- name: REDIS__CONNECTIONSTRING
valueFrom:
secretKeyRef:
name: elsa-secrets
key: redis-connection-string
- name: RABBITMQ__CONNECTIONSTRING
valueFrom:
secretKeyRef:
name: elsa-secrets
key: rabbitmq-connection-string
service:
type: ClusterIP
port: 80
targetPort: 8080
autoscaling:
enabled: true
minReplicas: 3
maxReplicas: 10
targetCPUUtilizationPercentage: 70
targetMemoryUtilizationPercentage: 80
livenessProbe:
httpGet:
path: /health/live
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /health/ready
port: 8080
initialDelaySeconds: 20
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
# Elsa Studio configuration
elsaStudio:
enabled: true
name: elsa-studio
image:
repository: elsaworkflows/elsa-studio-v3-5
tag: latest
pullPolicy: IfNotPresent
replicaCount: 2
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "1Gi"
cpu: "1000m"
env:
- name: ASPNETCORE_ENVIRONMENT
value: "Production"
- name: HTTP_PORTS
value: "8080"
- name: ELSASERVER__URL
value: "http://elsa-server/elsa/api"
service:
type: ClusterIP
port: 80
targetPort: 8080
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 5
targetCPUUtilizationPercentage: 75
# PostgreSQL configuration
postgresql:
enabled: true
auth:
username: elsa
password: "" # Set via secret
database: elsa
primary:
persistence:
enabled: true
size: 50Gi
storageClass: "" # Use default
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "4Gi"
cpu: "2000m"
initdb:
scripts:
init.sql: |
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "pg_trgm";
# Redis configuration (for distributed locking and caching)
redis:
enabled: true
architecture: standalone
auth:
enabled: true
password: "" # Set via secret
master:
persistence:
enabled: true
size: 10Gi
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "1Gi"
cpu: "1000m"
# RabbitMQ configuration (for MassTransit)
rabbitmq:
enabled: true
auth:
username: elsa
password: "" # Set via secret
persistence:
enabled: true
size: 20Gi
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
replicaCount: 3
clustering:
enabled: true
# Ingress configuration
ingress:
enabled: true
className: nginx
annotations:
cert-manager.io/cluster-issuer: "letsencrypt-prod"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/proxy-body-size: "10m"
hosts:
- host: studio.example.com
paths:
- path: /
pathType: Prefix
service: elsa-studio
- host: api.example.com
paths:
- path: /
pathType: Prefix
service: elsa-server
tls:
- secretName: elsa-tls
hosts:
- studio.example.com
- api.example.com
# Monitoring configuration
monitoring:
enabled: true
serviceMonitor:
enabled: true
interval: 30s
grafana:
enabled: true
dashboards:
enabled: true
# Security settings
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
capabilities:
drop:
- ALL
# Pod disruption budget
podDisruptionBudget:
enabled: true
minAvailable: 1
# Network policies
networkPolicy:
enabled: true
policyTypes:
- Ingress
- Egress
Step 3: Create Secrets
Create a Kubernetes secret for sensitive configuration:
export DB_PASSWORD='<your_postgres_password>'
export REDIS_PASSWORD='<your_redis_password>'
export RABBITMQ_PASSWORD='<your_rabbitmq_password>'
kubectl create secret generic elsa-secrets \
--from-literal=postgresql-connection-string="Server=elsa-postgresql;Username=elsa;Database=elsa;Port=5432;Password=${DB_PASSWORD};SSLMode=Require;MaxPoolSize=100" \
--from-literal=redis-connection-string="elsa-redis-master:6379,password=${REDIS_PASSWORD},ssl=False,abortConnect=False" \
--from-literal=rabbitmq-connection-string="amqp://elsa:${RABBITMQ_PASSWORD}@elsa-rabbitmq:5672/" \
--namespace elsa-workflows
Security Best Practice
Never commit secrets to version control. Use external secret management tools like:
Sealed Secrets
External Secrets Operator
HashiCorp Vault
Cloud provider secret managers (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager)
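As an illustration of the first option, the following sketch encrypts the same secret with Sealed Secrets. It assumes the Sealed Secrets controller is installed in the cluster and the kubeseal CLI is available locally; the output file name is arbitrary:
# Render the secret locally (never applied directly), then encrypt it;
# only the sealed output is safe to commit to version control.
kubectl create secret generic elsa-secrets \
  --from-literal=postgresql-connection-string="<connection-string>" \
  --namespace elsa-workflows \
  --dry-run=client -o yaml | \
  kubeseal --format yaml > elsa-sealed-secrets.yaml
kubectl apply -f elsa-sealed-secrets.yaml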
Step 4: Install with Helm
# Add required Helm repositories
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
# Create namespace
kubectl create namespace elsa-workflows
# Install Elsa Workflows
# Store passwords in a secure values file (e.g., secrets.yaml) and never commit it to source control.
helm install elsa-workflows ./elsa-workflows \
--namespace elsa-workflows \
--values values.yaml \
--values secrets.yaml # Contains sensitive values, never committed
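To validate the chart before (or after changing) an install, you can render it locally or simulate the release to catch template errors early:
# Render templates locally without touching the cluster
helm template elsa-workflows ./elsa-workflows \
  --values values.yaml --values secrets.yaml
# Or simulate the install server-side
helm install elsa-workflows ./elsa-workflows \
  --namespace elsa-workflows \
  --values values.yaml --values secrets.yaml \
  --dry-run --debug
Step 5: Verify Deployment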
# Check deployment status
kubectl get pods -n elsa-workflows
# View logs
kubectl logs -n elsa-workflows -l app=elsa-server --tail=50
# Check services
kubectl get svc -n elsa-workflows
# Verify ingress
kubectl get ingress -n elsa-workflows
Upgrading
To upgrade your deployment:
# Update values in values.yaml, then:
helm upgrade elsa-workflows ./elsa-workflows \
--namespace elsa-workflows \
--values values.yaml
# Check rollout status
kubectl rollout status deployment/elsa-server -n elsa-workflows
Uninstalling
helm uninstall elsa-workflows --namespace elsa-workflows
Kubernetes Manifest Deployment
For full control over your deployment, you can use raw Kubernetes manifests. This section provides production-ready YAML configurations.
Directory Structure
k8s/
├── namespace.yaml
├── secrets.yaml
├── configmaps.yaml
├── postgresql/
│ ├── statefulset.yaml
│ ├── service.yaml
│ └── pvc.yaml
├── redis/
│ ├── statefulset.yaml
│ └── service.yaml
├── rabbitmq/
│ ├── statefulset.yaml
│ └── service.yaml
├── elsa-server/
│ ├── deployment.yaml
│ ├── service.yaml
│ ├── hpa.yaml
│ └── pdb.yaml
├── elsa-studio/
│ ├── deployment.yaml
│ ├── service.yaml
│ └── hpa.yaml
└── ingress.yaml
Namespace
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: elsa-workflows
labels:
name: elsa-workflows
environment: production
ConfigMap
# configmaps.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: elsa-config
namespace: elsa-workflows
data:
ASPNETCORE_ENVIRONMENT: "Production"
HTTP_PORTS: "8080"
DATABASEPROVIDER: "PostgreSql"
# Add non-sensitive configuration here
Secrets
# secrets.yaml
# DO NOT commit this file with actual values!
# Use kubectl create secret or external secret management
apiVersion: v1
kind: Secret
metadata:
name: elsa-secrets
namespace: elsa-workflows
type: Opaque
stringData:
postgresql-connection-string: "Server=elsa-postgresql;Username=elsa;Database=elsa;Port=5432;Password=CHANGE_ME;SSLMode=Prefer;MaxPoolSize=100"
redis-connection-string: "elsa-redis:6379,password=CHANGE_ME,ssl=False,abortConnect=False"
rabbitmq-connection-string: "amqp://elsa:CHANGE_ME@elsa-rabbitmq:5672/"
Elsa Server Deployment
# elsa-server/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: elsa-server
namespace: elsa-workflows
labels:
app: elsa-server
component: api
version: v3.5
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app: elsa-server
template:
metadata:
labels:
app: elsa-server
component: api
version: v3.5
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- elsa-server
topologyKey: kubernetes.io/hostname
containers:
- name: elsa-server
image: elsaworkflows/elsa-server-v3-5:latest
imagePullPolicy: IfNotPresent
ports:
- name: http
containerPort: 8080
protocol: TCP
env:
- name: ASPNETCORE_ENVIRONMENT
value: "Production"
- name: HTTP_PORTS
value: "8080"
- name: DATABASEPROVIDER
value: "PostgreSql"
- name: CONNECTIONSTRINGS__POSTGRESQL
valueFrom:
secretKeyRef:
name: elsa-secrets
key: postgresql-connection-string
- name: REDIS__CONNECTIONSTRING
valueFrom:
secretKeyRef:
name: elsa-secrets
key: redis-connection-string
- name: RABBITMQ__CONNECTIONSTRING
valueFrom:
secretKeyRef:
name: elsa-secrets
key: rabbitmq-connection-string
# Distributed runtime configuration
- name: ELSA__RUNTIME__TYPE
value: "Distributed"
- name: ELSA__CACHING__TYPE
value: "Distributed"
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "2000m"
livenessProbe:
httpGet:
path: /health/live
port: http
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /health/ready
port: http
initialDelaySeconds: 20
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 15"]
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
readOnlyRootFilesystem: false # Set to true if application supports it
# If readOnlyRootFilesystem: true, mount volumes for writable paths:
# volumeMounts:
# - name: tmp
# mountPath: /tmp
Elsa Server Service
# elsa-server/service.yaml
apiVersion: v1
kind: Service
metadata:
name: elsa-server
namespace: elsa-workflows
labels:
app: elsa-server
spec:
type: ClusterIP
ports:
- port: 80
targetPort: http
protocol: TCP
name: http
selector:
app: elsa-server
Elsa Studio Deployment
# elsa-studio/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: elsa-studio
namespace: elsa-workflows
labels:
app: elsa-studio
component: ui
version: v3.5
spec:
replicas: 2
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app: elsa-studio
template:
metadata:
labels:
app: elsa-studio
component: ui
version: v3.5
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
containers:
- name: elsa-studio
image: elsaworkflows/elsa-studio-v3-5:latest
imagePullPolicy: IfNotPresent
ports:
- name: http
containerPort: 8080
protocol: TCP
env:
- name: ASPNETCORE_ENVIRONMENT
value: "Production"
- name: HTTP_PORTS
value: "8080"
- name: ELSASERVER__URL
value: "http://elsa-server/elsa/api"
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "1Gi"
cpu: "1000m"
livenessProbe:
httpGet:
path: /health
port: http
initialDelaySeconds: 20
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: http
initialDelaySeconds: 10
periodSeconds: 5
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
Elsa Studio Service
# elsa-studio/service.yaml
apiVersion: v1
kind: Service
metadata:
name: elsa-studio
namespace: elsa-workflows
labels:
app: elsa-studio
spec:
type: ClusterIP
ports:
- port: 80
targetPort: http
protocol: TCP
name: http
selector:
app: elsa-studio
Deploy All Manifests
# Create namespace
kubectl apply -f k8s/namespace.yaml
# Create secrets (use environment-specific values)
kubectl apply -f k8s/secrets.yaml
# Deploy infrastructure (database, cache, message broker)
kubectl apply -f k8s/postgresql/
kubectl apply -f k8s/redis/
kubectl apply -f k8s/rabbitmq/
# Wait for infrastructure to be ready
kubectl wait --for=condition=ready pod -l app=postgresql -n elsa-workflows --timeout=300s
# Deploy Elsa components
kubectl apply -f k8s/elsa-server/
kubectl apply -f k8s/elsa-studio/
# Configure ingress
kubectl apply -f k8s/ingress.yaml
Database Configuration
Proper database configuration is crucial for production Kubernetes deployments. This section covers PostgreSQL, SQL Server, and MySQL configurations.
PostgreSQL StatefulSet
# postgresql/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: elsa-postgresql
namespace: elsa-workflows
spec:
serviceName: elsa-postgresql
replicas: 1 # Use 3+ for HA with replication
selector:
matchLabels:
app: postgresql
template:
metadata:
labels:
app: postgresql
spec:
containers:
- name: postgresql
image: postgres:16-alpine
ports:
- containerPort: 5432
name: postgres
env:
- name: POSTGRES_DB
value: "elsa"
- name: POSTGRES_USER
value: "elsa"
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: elsa-secrets
key: postgres-password
- name: PGDATA
value: /var/lib/postgresql/data/pgdata
- name: POSTGRES_INITDB_ARGS
value: "--encoding=UTF8 --lc-collate=en_US.utf8 --lc-ctype=en_US.utf8"
args:
- "-c"
- "max_connections=200"
- "-c"
- "shared_buffers=256MB"
- "-c"
- "effective_cache_size=1GB"
- "-c"
- "maintenance_work_mem=64MB"
- "-c"
- "checkpoint_completion_target=0.9"
- "-c"
- "wal_buffers=16MB"
- "-c"
- "default_statistics_target=100"
volumeMounts:
- name: data
mountPath: /var/lib/postgresql/data
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "4Gi"
cpu: "2000m"
livenessProbe:
exec:
command:
- /bin/sh
- -c
- pg_isready -U elsa
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
exec:
command:
- /bin/sh
- -c
- pg_isready -U elsa
initialDelaySeconds: 5
periodSeconds: 5
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: "standard" # Use your storage class
resources:
requests:
storage: 50Gi
PostgreSQL Service
# postgresql/service.yaml
apiVersion: v1
kind: Service
metadata:
name: elsa-postgresql
namespace: elsa-workflows
spec:
type: ClusterIP
clusterIP: None # Headless service for StatefulSet
ports:
- port: 5432
targetPort: postgres
protocol: TCP
name: postgres
selector:
app: postgresql
Database Backup Configuration
Create a CronJob for regular backups:
# postgresql/backup-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: postgresql-backup
namespace: elsa-workflows
spec:
schedule: "0 2 * * *" # Daily at 2 AM
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 1
jobTemplate:
spec:
template:
spec:
containers:
- name: backup
image: postgres:16-alpine
command:
- /bin/sh
- -c
- |
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
pg_dump -h elsa-postgresql -U elsa -d elsa > /backup/elsa_backup_${TIMESTAMP}.sql
# Upload to S3 or other storage
# aws s3 cp /backup/elsa_backup_${TIMESTAMP}.sql s3://your-bucket/backups/
env:
- name: PGPASSWORD
valueFrom:
secretKeyRef:
name: elsa-secrets
key: postgres-password
volumeMounts:
- name: backup
mountPath: /backup
restartPolicy: OnFailure
volumes:
- name: backup
persistentVolumeClaim:
claimName: backup-pvc
Connection Pooling
For high-load scenarios, consider using PgBouncer:
# postgresql/pgbouncer-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: pgbouncer
namespace: elsa-workflows
spec:
replicas: 2
selector:
matchLabels:
app: pgbouncer
template:
metadata:
labels:
app: pgbouncer
spec:
containers:
- name: pgbouncer
image: edoburu/pgbouncer:latest
ports:
- containerPort: 5432
env:
- name: DATABASE_URL
value: "postgres://elsa:PASSWORD@elsa-postgresql:5432/elsa"
- name: POOL_MODE
value: "transaction"
- name: MAX_CLIENT_CONN
value: "1000"
- name: DEFAULT_POOL_SIZE
value: "25"
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "500m"
Persistent Storage
Proper storage configuration ensures data persistence across pod restarts and upgrades.
Storage Classes
Define storage classes for different performance tiers:
# storage-classes.yaml
---
# Standard storage for general use
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: standard-retain
provisioner: kubernetes.io/aws-ebs # or azure-disk, gce-pd
parameters:
type: gp3
fsType: ext4
reclaimPolicy: Retain # Prevent accidental data loss
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# High-performance storage for databases
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-ssd
provisioner: kubernetes.io/aws-ebs
parameters:
type: io2
iopsPerGB: "50"
fsType: ext4
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
Persistent Volume Claims
# pvc.yaml
---
# Database PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgresql-data-pvc
namespace: elsa-workflows
spec:
accessModes:
- ReadWriteOnce
storageClassName: fast-ssd
resources:
requests:
storage: 100Gi
---
# Backup PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: backup-pvc
namespace: elsa-workflows
spec:
accessModes:
- ReadWriteMany # For multiple backup pods
storageClassName: standard-retain
resources:
requests:
storage: 500Gi
---
# Redis PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: redis-data-pvc
namespace: elsa-workflows
spec:
accessModes:
- ReadWriteOnce
storageClassName: fast-ssd
resources:
requests:
storage: 20Gi
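Because the storage classes above set allowVolumeExpansion: true, an existing claim can be grown in place. A sketch (the target size is illustrative):
# Grow an existing claim; requires allowVolumeExpansion: true on its StorageClass
kubectl patch pvc postgresql-data-pvc -n elsa-workflows \
  -p '{"spec":{"resources":{"requests":{"storage":"150Gi"}}}}'
# Watch the resize progress
kubectl get pvc postgresql-data-pvc -n elsa-workflows -w
Volume Snapshots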
Configure VolumeSnapshotClass for backup and disaster recovery:
# volume-snapshot-class.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
name: elsa-snapshot-class
driver: ebs.csi.aws.com # or disk.csi.azure.com, pd.csi.storage.gke.io
deletionPolicy: Retain
parameters:
tagSpecification_1: "Name=elsa-workflow-snapshot"
Create snapshots:
# create-snapshot.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
name: postgresql-snapshot
namespace: elsa-workflows
spec:
volumeSnapshotClassName: elsa-snapshot-class
source:
persistentVolumeClaimName: postgresql-data-pvc
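To restore, provision a new PVC that references the snapshot as its dataSource. A minimal sketch; the restored claim name is illustrative, and the requested size must be at least the snapshot's source size:
# Restore a snapshot into a fresh PVC
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgresql-data-restored
  namespace: elsa-workflows
spec:
  storageClassName: fast-ssd
  dataSource:
    name: postgresql-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
EOF
Ingress Setup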
Ingress controllers provide external access to your Elsa Workflows deployment with SSL/TLS termination, routing, and load balancing.
NGINX Ingress Controller
Installation
# Install NGINX Ingress Controller
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--create-namespace \
--set controller.service.type=LoadBalancer
Ingress Configuration
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: elsa-ingress
namespace: elsa-workflows
annotations:
cert-manager.io/cluster-issuer: "letsencrypt-prod"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/proxy-body-size: "10m"
nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
nginx.ingress.kubernetes.io/rate-limit: "100"
nginx.ingress.kubernetes.io/limit-rps: "10"
# CORS configuration
nginx.ingress.kubernetes.io/enable-cors: "true"
nginx.ingress.kubernetes.io/cors-allow-origin: "https://studio.example.com"
nginx.ingress.kubernetes.io/cors-allow-methods: "GET, POST, PUT, DELETE, OPTIONS"
nginx.ingress.kubernetes.io/cors-allow-headers: "DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization"
spec:
ingressClassName: nginx
tls:
- hosts:
- studio.example.com
- api.example.com
secretName: elsa-tls
rules:
- host: studio.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: elsa-studio
port:
number: 80
- host: api.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: elsa-server
port:
number: 80
Traefik Ingress Controller
Installation
helm repo add traefik https://traefik.github.io/charts
helm repo update
helm install traefik traefik/traefik \
--namespace traefik \
--create-namespace \
--set ports.web.redirectTo.port=websecure
IngressRoute Configuration
# traefik-ingressroute.yaml
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
name: elsa-studio
namespace: elsa-workflows
spec:
entryPoints:
- websecure
routes:
- match: Host(`studio.example.com`)
kind: Rule
services:
- name: elsa-studio
port: 80
middlewares:
- name: security-headers
tls:
secretName: elsa-tls
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
name: elsa-server
namespace: elsa-workflows
spec:
entryPoints:
- websecure
routes:
- match: Host(`api.example.com`)
kind: Rule
services:
- name: elsa-server
port: 80
middlewares:
- name: rate-limit
- name: security-headers
tls:
secretName: elsa-tls
---
# Security headers middleware
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
name: security-headers
namespace: elsa-workflows
spec:
headers:
stsSeconds: 31536000
stsIncludeSubdomains: true
stsPreload: true
forceSTSHeader: true
contentSecurityPolicy: "default-src 'self'"
customResponseHeaders:
X-Frame-Options: "SAMEORIGIN"
X-Content-Type-Options: "nosniff"
---
# Rate limiting middleware
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
name: rate-limit
namespace: elsa-workflows
spec:
rateLimit:
average: 100
burst: 50
SSL/TLS with cert-manager
Install cert-manager
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--set installCRDs=true
ClusterIssuer Configuration
# cert-manager-issuer.yaml
---
# Let's Encrypt Staging (for testing)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-staging
spec:
acme:
server: https://acme-staging-v02.api.letsencrypt.org/directory
email: [email protected]
privateKeySecretRef:
name: letsencrypt-staging
solvers:
- http01:
ingress:
class: nginx
---
# Let's Encrypt Production
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-prod
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
email: [email protected]
privateKeySecretRef:
name: letsencrypt-prod
solvers:
- http01:
ingress:
class: nginx
Horizontal Pod Autoscaling
HPA automatically scales pods based on CPU, memory, or custom metrics.
Metrics Server Installation
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
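HPA depends on the resource metrics API, so it is worth confirming metrics-server is serving before applying autoscalers:
# Confirm the metrics-server deployment is available
kubectl get deployment metrics-server -n kube-system
# Resource metrics should now resolve for nodes and pods
kubectl top nodes
kubectl top pods -n elsa-workflows
HPA for Elsa Server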
# elsa-server/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: elsa-server-hpa
namespace: elsa-workflows
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: elsa-server
minReplicas: 3
maxReplicas: 10
metrics:
# CPU-based scaling
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
# Memory-based scaling
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
# Custom metric: requests per second
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "1000"
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 50
periodSeconds: 60
- type: Pods
value: 2
periodSeconds: 60
selectPolicy: Min
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 30
- type: Pods
value: 4
periodSeconds: 30
selectPolicy: Max
HPA for Elsa Studio
# elsa-studio/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: elsa-studio-hpa
namespace: elsa-workflows
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: elsa-studio
minReplicas: 2
maxReplicas: 5
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 75
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
Vertical Pod Autoscaling (Optional)
For automatic resource request adjustments:
# elsa-server/vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: elsa-server-vpa
namespace: elsa-workflows
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: elsa-server
updatePolicy:
updateMode: "Auto" # or "Initial", "Recreate", "Off"
resourcePolicy:
containerPolicies:
- containerName: elsa-server
minAllowed:
cpu: 500m
memory: 512Mi
maxAllowed:
cpu: 4000m
memory: 8Gi
Pod Disruption Budget
Ensure availability during voluntary disruptions:
# elsa-server/pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: elsa-server-pdb
namespace: elsa-workflows
spec:
minAvailable: 2
selector:
matchLabels:
app: elsa-server
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: elsa-studio-pdb
namespace: elsa-workflows
spec:
minAvailable: 1
selector:
matchLabels:
app: elsa-studio
Testing Autoscaling
# Watch HPA status
kubectl get hpa -n elsa-workflows --watch
# Generate load to test scaling
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh
# Inside the pod:
while true; do wget -q -O- http://elsa-server.elsa-workflows.svc.cluster.local; done
# Monitor pod scaling
kubectl get pods -n elsa-workflows --watch
Monitoring with Prometheus & Grafana
Comprehensive monitoring is essential for production Kubernetes deployments. This section covers Prometheus metrics collection and Grafana dashboards.
Install Prometheus Stack
# Add Prometheus Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install kube-prometheus-stack (includes Prometheus, Grafana, and Alertmanager)
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
--set grafana.adminPassword=admin
ServiceMonitor for Elsa Server
# monitoring/elsa-server-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: elsa-server
namespace: elsa-workflows
labels:
app: elsa-server
release: prometheus
spec:
selector:
matchLabels:
app: elsa-server
endpoints:
- port: http
path: /metrics
interval: 30s
scrapeTimeout: 10s
PrometheusRule for Alerts
# monitoring/elsa-alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: elsa-alerts
namespace: elsa-workflows
labels:
release: prometheus
spec:
groups:
- name: elsa-workflows
interval: 30s
rules:
# High CPU usage alert
- alert: ElsaServerHighCPU
expr: |
rate(container_cpu_usage_seconds_total{namespace="elsa-workflows",pod=~"elsa-server-.*"}[5m]) > 0.8
for: 5m
labels:
severity: warning
annotations:
summary: "Elsa Server high CPU usage"
description: "Pod {{ $labels.pod }} CPU usage is above 80% for 5 minutes"
# High memory usage alert
- alert: ElsaServerHighMemory
expr: |
container_memory_working_set_bytes{namespace="elsa-workflows",pod=~"elsa-server-.*"} /
container_spec_memory_limit_bytes{namespace="elsa-workflows",pod=~"elsa-server-.*"} > 0.9
for: 5m
labels:
severity: warning
annotations:
summary: "Elsa Server high memory usage"
description: "Pod {{ $labels.pod }} memory usage is above 90%"
# Pod restart alert
- alert: ElsaServerPodRestarting
expr: |
rate(kube_pod_container_status_restarts_total{namespace="elsa-workflows",pod=~"elsa-server-.*"}[15m]) > 0
for: 5m
labels:
severity: critical
annotations:
summary: "Elsa Server pod restarting"
description: "Pod {{ $labels.pod }} has restarted {{ $value }} times in the last 15 minutes"
# Low replica count
- alert: ElsaServerLowReplicas
expr: |
kube_deployment_status_replicas_available{namespace="elsa-workflows",deployment="elsa-server"} < 2
for: 5m
labels:
severity: critical
annotations:
summary: "Elsa Server low replica count"
description: "Only {{ $value }} replicas available for elsa-server deployment"
# Database connection errors
- alert: ElsaDatabaseConnectionErrors
expr: |
rate(elsa_database_connection_errors_total[5m]) > 0.1
for: 2m
labels:
severity: critical
annotations:
summary: "Elsa database connection errors"
description: "Database connection error rate is {{ $value }} per second"
# Workflow execution failures
- alert: ElsaWorkflowExecutionFailures
expr: |
rate(elsa_workflow_execution_failed_total[5m]) > 0.5
for: 5m
labels:
severity: warning
annotations:
summary: "High workflow execution failure rate"
description: "Workflow execution failure rate is {{ $value }} per second"
# High response time
- alert: ElsaServerHighResponseTime
expr: |
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{namespace="elsa-workflows"}[5m])) > 2
for: 5m
labels:
severity: warning
annotations:
summary: "Elsa Server high response time"
description: "95th percentile response time is {{ $value }} seconds"
Grafana Dashboard
Create a comprehensive Grafana dashboard for Elsa Workflows:
# monitoring/elsa-dashboard-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: elsa-grafana-dashboard
namespace: monitoring
labels:
grafana_dashboard: "1"
data:
elsa-workflows.json: |
{
"dashboard": {
"title": "Elsa Workflows",
"timezone": "browser",
"schemaVersion": 16,
"refresh": "30s",
"panels": [
{
"title": "Request Rate",
"targets": [
{
"expr": "rate(http_requests_total{namespace=\"elsa-workflows\"}[5m])",
"legendFormat": "{{pod}}"
}
],
"type": "graph"
},
{
"title": "Response Time (95th Percentile)",
"targets": [
{
"expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{namespace=\"elsa-workflows\"}[5m]))",
"legendFormat": "{{pod}}"
}
],
"type": "graph"
},
{
"title": "CPU Usage",
"targets": [
{
"expr": "rate(container_cpu_usage_seconds_total{namespace=\"elsa-workflows\"}[5m])",
"legendFormat": "{{pod}}"
}
],
"type": "graph"
},
{
"title": "Memory Usage",
"targets": [
{
"expr": "container_memory_working_set_bytes{namespace=\"elsa-workflows\"} / 1024 / 1024",
"legendFormat": "{{pod}}"
}
],
"type": "graph"
},
{
"title": "Active Workflows",
"targets": [
{
"expr": "elsa_active_workflows_total{namespace=\"elsa-workflows\"}",
"legendFormat": "{{pod}}"
}
],
"type": "stat"
},
{
"title": "Workflow Execution Rate",
"targets": [
{
"expr": "rate(elsa_workflow_executions_total{namespace=\"elsa-workflows\"}[5m])",
"legendFormat": "{{status}}"
}
],
"type": "graph"
},
{
"title": "Database Connection Pool",
"targets": [
{
"expr": "elsa_database_connections_active{namespace=\"elsa-workflows\"}",
"legendFormat": "Active"
},
{
"expr": "elsa_database_connections_idle{namespace=\"elsa-workflows\"}",
"legendFormat": "Idle"
}
],
"type": "graph"
},
{
"title": "Pod Status",
"targets": [
{
"expr": "kube_pod_status_phase{namespace=\"elsa-workflows\"}",
"legendFormat": "{{pod}} - {{phase}}"
}
],
"type": "table"
}
]
}
}
Custom Metrics in Elsa
To expose custom metrics from your Elsa Server, configure Prometheus metrics in Program.cs:
using Elsa.Extensions; // extension methods for AddElsa/UseWorkflowsApi (namespace per Elsa 3)
using Prometheus;
var builder = WebApplication.CreateBuilder(args);
// Configure Elsa
builder.Services.AddElsa(elsa =>
{
elsa.UseWorkflowManagement();
elsa.UseWorkflowRuntime();
elsa.UseWorkflowsApi();
});
var app = builder.Build();
// Enable Prometheus metrics endpoint
app.UseMetricServer(); // Exposes /metrics endpoint
app.UseHttpMetrics(); // Collect HTTP metrics
app.UseWorkflowsApi();
app.Run();
Accessing Grafana
# Port-forward Grafana service
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
# Access Grafana at http://localhost:3000
# Default credentials: admin / admin (or password set during installation)
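If the admin password was randomly generated (or forgotten), it can be read back from the secret the chart creates. The secret name follows <release>-grafana, so with the release above it is prometheus-grafana:
# Retrieve the Grafana admin password from the chart-managed secret
kubectl get secret prometheus-grafana -n monitoring \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo
Key Metrics to Monitor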
| Metric | Description | Alert Threshold |
|---|---|---|
| http_requests_total | Total HTTP requests | - |
| http_request_duration_seconds | Request latency | P95 > 2s |
| elsa_workflow_executions_total | Workflow executions | - |
| elsa_workflow_execution_failed_total | Failed workflows | Rate > 0.5/s |
| elsa_active_workflows_total | Currently active workflows | - |
| elsa_database_connections_active | Active DB connections | > 90% of pool |
| container_cpu_usage_seconds_total | CPU usage | > 80% |
| container_memory_working_set_bytes | Memory usage | > 90% of limit |
| kube_pod_container_status_restarts_total | Pod restarts | > 0 in 15min |
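To spot-check any of these metrics outside Grafana, you can query Prometheus directly over its HTTP API. A sketch; the service name assumes the kube-prometheus-stack release installed above:
# Port-forward Prometheus (default service name for release "prometheus")
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 &
# Run an instant query through the HTTP API
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(http_requests_total{namespace="elsa-workflows"}[5m]))'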
Service Mesh Integration
Service meshes provide advanced traffic management, security, and observability features. This section covers integration with Istio and Linkerd.
Istio Integration
Prerequisites
# Download and install Istio
curl -L https://istio.io/downloadIstio | sh -
cd istio-*
export PATH=$PWD/bin:$PATH
# Install Istio with demo profile
istioctl install --set profile=demo -y
# Enable sidecar injection for elsa-workflows namespace
kubectl label namespace elsa-workflows istio-injection=enabled
Gateway Configuration
# istio/gateway.yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
name: elsa-gateway
namespace: elsa-workflows
spec:
selector:
istio: ingressgateway
servers:
- port:
number: 443
name: https
protocol: HTTPS
tls:
mode: SIMPLE
credentialName: elsa-tls
hosts:
- studio.example.com
- api.example.com
- port:
number: 80
name: http
protocol: HTTP
hosts:
- studio.example.com
- api.example.com
tls:
httpsRedirect: true
VirtualService Configuration
# istio/virtualservice.yaml
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: elsa-studio
namespace: elsa-workflows
spec:
hosts:
- studio.example.com
gateways:
- elsa-gateway
http:
- match:
- uri:
prefix: /
route:
- destination:
host: elsa-studio
port:
number: 80
weight: 100
timeout: 30s
retries:
attempts: 3
perTryTimeout: 10s
retryOn: 5xx,reset,connect-failure,refused-stream
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: elsa-server
namespace: elsa-workflows
spec:
hosts:
- api.example.com
gateways:
- elsa-gateway
http:
- match:
- uri:
prefix: /
route:
- destination:
host: elsa-server
port:
number: 80
weight: 100
timeout: 60s
retries:
attempts: 3
perTryTimeout: 20s
retryOn: 5xx,reset,connect-failure,refused-stream
DestinationRule for Circuit Breaking
# istio/destinationrule.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: elsa-server
namespace: elsa-workflows
spec:
host: elsa-server
trafficPolicy:
connectionPool:
tcp:
maxConnections: 1000
http:
http1MaxPendingRequests: 1000
http2MaxRequests: 1000
maxRequestsPerConnection: 2
loadBalancer:
simple: LEAST_REQUEST
outlierDetection:
consecutiveErrors: 5
interval: 30s
baseEjectionTime: 30s
maxEjectionPercent: 50
minHealthPercent: 40
PeerAuthentication for mTLS
# istio/peerauthentication.yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: default
namespace: elsa-workflows
spec:
mtls:
mode: STRICT
AuthorizationPolicy
# istio/authorizationpolicy.yaml
---
# Allow traffic from ingress to services
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: allow-ingress
namespace: elsa-workflows
spec:
selector:
matchLabels:
app: elsa-server
action: ALLOW
rules:
- from:
- source:
namespaces: ["istio-system"]
---
# Deny all by default, then allow specific paths
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: elsa-server-authz
namespace: elsa-workflows
spec:
selector:
matchLabels:
app: elsa-server
action: ALLOW
rules:
- to:
- operation:
methods: ["GET", "POST", "PUT", "DELETE"]
paths: ["/elsa/api/*", "/health/*", "/metrics"]
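After applying the Istio resources, istioctl can lint the configuration and you can confirm sidecars were injected:
# Lint Istio configuration for common problems
istioctl analyze -n elsa-workflows
# Pods should now show 2/2 containers (app + istio-proxy)
kubectl get pods -n elsa-workflows
Linkerd Integration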
Installation
# Install Linkerd CLI
curl -sL https://run.linkerd.io/install | sh
export PATH=$PATH:$HOME/.linkerd2/bin
# Verify cluster compatibility
linkerd check --pre
# Install Linkerd control plane
linkerd install | kubectl apply -f -
# Verify installation
linkerd check
# Install Linkerd Viz for observability
linkerd viz install | kubectl apply -f -
Mesh Elsa Workflows Namespace
# Inject Linkerd sidecar into existing deployments
kubectl get deploy -n elsa-workflows -o yaml | \
linkerd inject - | \
kubectl apply -f -
# Or annotate namespace for automatic injection
kubectl annotate namespace elsa-workflows linkerd.io/inject=enabled
Traffic Split for Canary Deployments
# linkerd/trafficsplit.yaml
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
name: elsa-server-split
namespace: elsa-workflows
spec:
service: elsa-server
backends:
- service: elsa-server-stable
weight: 90
- service: elsa-server-canary
weight: 10
ServiceProfile for Advanced Metrics
# linkerd/serviceprofile.yaml
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
name: elsa-server.elsa-workflows.svc.cluster.local
namespace: elsa-workflows
spec:
routes:
- name: POST /elsa/api/workflows
condition:
method: POST
pathRegex: /elsa/api/workflows
timeout: 30s
retries:
limit: 3
timeout: 10s
- name: GET /elsa/api/workflows
condition:
method: GET
pathRegex: /elsa/api/workflows.*
timeout: 10s
- name: Health Check
condition:
pathRegex: /health.*
isRetryable: true
Rate Limiting with Linkerd
# linkerd/ratelimit.yaml
apiVersion: policy.linkerd.io/v1alpha1
kind: HTTPRoute
metadata:
name: elsa-server-ratelimit
namespace: elsa-workflows
spec:
parentRefs:
- name: elsa-server
kind: Service
rules:
- matches:
- path:
type: PathPrefix
value: /elsa/api
filters:
- type: RequestHeaderModifier
requestHeaderModifier:
add:
- name: X-RateLimit-Limit
value: "100"
Observability with Service Mesh
Istio Dashboard
# Access Kiali dashboard
istioctl dashboard kiali
# Access Jaeger for distributed tracing
istioctl dashboard jaeger
# Access Prometheus
istioctl dashboard prometheus
# Access Grafana
istioctl dashboard grafana
Linkerd Dashboard
# Access Linkerd dashboard
linkerd viz dashboard
# View traffic metrics
linkerd viz stat deploy -n elsa-workflows
# View route metrics
linkerd viz routes deploy/elsa-server -n elsa-workflows
# Tap live traffic (for debugging)
linkerd viz tap deploy/elsa-server -n elsa-workflows
Comparison: Istio vs Linkerd
| Aspect | Istio | Linkerd |
|---|---|---|
| Learning Curve | Steep | Gentle |
| Resource Usage | Higher (Envoy proxy) | Lower (Linkerd2-proxy) |
| Features | Comprehensive | Focused |
| Traffic Management | Advanced | Basic |
| Security | mTLS, AuthZ policies | mTLS, policy |
| Observability | Prometheus, Jaeger, Kiali | Prometheus, built-in viz |
| Performance | Good | Excellent |
| Best For | Complex environments | Simplicity, performance |
Service Mesh Best Practices
Start Simple: Begin without a service mesh and add it when needed
Resource Planning: Allocate extra resources for sidecar proxies (~50-100Mi RAM, 0.1 CPU per pod)
Gradual Rollout: Enable mesh incrementally, namespace by namespace
Monitor Performance: Watch for latency increases due to proxy overhead
Use mTLS: Enable mutual TLS for secure pod-to-pod communication
Circuit Breaking: Configure circuit breakers to prevent cascade failures
Observability: Leverage built-in tracing and metrics
Test Thoroughly: Test failure scenarios with chaos engineering
Distributed Configuration
For Kubernetes deployments with multiple replicas, proper distributed configuration is essential. Reference the Distributed Hosting guide for detailed configuration.
Distributed Runtime Configuration
Configure distributed workflow runtime in your Elsa Server:
// Program.cs or Startup.cs
using Elsa.Extensions;
using Elsa.DistributedLocking.Extensions;
using Medallion.Threading.Postgres;
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddElsa(elsa =>
{
// Configure distributed workflow runtime
elsa.UseWorkflowRuntime(runtime =>
{
runtime.UseDistributedRuntime();
// Configure distributed locking with PostgreSQL
runtime.DistributedLockProvider = serviceProvider =>
new PostgresDistributedSynchronizationProvider(
builder.Configuration.GetConnectionString("PostgreSql"),
options =>
{
options.KeepaliveCadence(TimeSpan.FromMinutes(5));
options.UseMultiplexing();
});
});
// Configure distributed caching with MassTransit
elsa.UseDistributedCache(distributedCaching =>
{
distributedCaching.UseMassTransit();
});
// Configure MassTransit with RabbitMQ
elsa.UseMassTransit(massTransit =>
{
massTransit.UseRabbitMq(
builder.Configuration.GetConnectionString("RabbitMq"),
rabbit =>
{
rabbit.ConfigureTransportBus = (context, bus) =>
{
bus.PrefetchCount = 50;
bus.Durable = true;
bus.AutoDelete = false;
bus.ConcurrentMessageLimit = 32;
};
});
});
// Configure Quartz.NET with PostgreSQL for distributed scheduling
elsa.UseScheduling(scheduling =>
{
scheduling.UseQuartzScheduler();
});
});
// Configure Quartz with persistent store
builder.Services.AddQuartz(quartz =>
{
quartz.UsePostgreSql(builder.Configuration.GetConnectionString("PostgreSql"));
});
var app = builder.Build();
app.Run();
Environment-Based Configuration
Use Kubernetes ConfigMaps and Secrets for environment-specific settings:
# distributed-config-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: elsa-distributed-config
namespace: elsa-workflows
data:
# Elsa Configuration
ELSA__RUNTIME__TYPE: "Distributed"
ELSA__CACHING__TYPE: "Distributed"
ELSA__LOCKING__PROVIDER: "PostgreSQL"
# MassTransit Configuration
MASSTRANSIT__TRANSPORT: "RabbitMq"
MASSTRANSIT__PREFETCHCOUNT: "50"
# Quartz Configuration
QUARTZ__CLUSTERED: "true"
QUARTZ__INSTANCENAME: "ElsaQuartzCluster"
# Performance Tuning
ASPNETCORE__KESTREL__LIMITS__MAXCONCURRENTCONNECTIONS: "1000"
ASPNETCORE__KESTREL__LIMITS__MAXREQUESTBODYSIZE: "10485760"
Apply to deployment:
# Add to elsa-server deployment
spec:
template:
spec:
containers:
- name: elsa-server
envFrom:
- configMapRef:
name: elsa-distributed-config
- secretRef:
name: elsa-secrets
Redis Configuration for Caching
Deploy Redis for distributed caching:
# redis/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: elsa-redis
namespace: elsa-workflows
spec:
serviceName: elsa-redis
replicas: 1
selector:
matchLabels:
app: redis
template:
metadata:
labels:
app: redis
spec:
containers:
- name: redis
image: redis:7-alpine
command:
- redis-server
- --appendonly
- "yes"
- --maxmemory
- "1gb"
- --maxmemory-policy
- "allkeys-lru"
ports:
- containerPort: 6379
name: redis
volumeMounts:
- name: data
mountPath: /data
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "1Gi"
cpu: "1000m"
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
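A quick sanity check from inside the cluster confirms Redis is reachable and persistence is enabled (no auth is configured in this manifest, so redis-cli connects directly):
# Verify Redis responds and AOF persistence is on
kubectl exec -it elsa-redis-0 -n elsa-workflows -- redis-cli ping
kubectl exec -it elsa-redis-0 -n elsa-workflows -- redis-cli config get appendonly
Troubleshooting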
Common issues and their solutions when deploying Elsa to Kubernetes.
Pod Issues
Pods Not Starting
Symptom: Pods stuck in Pending or ImagePullBackOff state
# Check pod status
kubectl describe pod <pod-name> -n elsa-workflows
# Common issues and solutions:
Solution 1: Insufficient Resources
# Check node resources
kubectl top nodes
# Check resource requests
kubectl describe node <node-name>
# Solution: Scale cluster or reduce resource requests
Solution 2: Image Pull Issues
# Check image pull secrets
kubectl get secrets -n elsa-workflows
# Create image pull secret if needed
kubectl create secret docker-registry regcred \
--docker-server=<registry-server> \
--docker-username=<username> \
--docker-password=<password> \
--docker-email=<email> \
-n elsa-workflows
# Add to deployment
spec:
template:
spec:
imagePullSecrets:
- name: regcred
Pods Crashing (CrashLoopBackOff)
Symptom: Pods repeatedly restarting
# View logs
kubectl logs <pod-name> -n elsa-workflows --previous
# Check events
kubectl get events -n elsa-workflows --sort-by='.lastTimestamp'
Common Causes:
Database Connection Issues
# Test database connectivity
kubectl run -it --rm debug --image=postgres:16-alpine --restart=Never -- \
psql -h elsa-postgresql -U elsa -d elsa
# Check connection string in secrets
kubectl get secret elsa-secrets -n elsa-workflows -o jsonpath='{.data.postgresql-connection-string}' | base64 -d
Missing Dependencies
# Check if Redis/RabbitMQ are running
kubectl get pods -n elsa-workflows
# Check service endpoints
kubectl get endpoints -n elsa-workflows
Configuration Errors
# Validate ConfigMaps and Secrets
kubectl get configmap elsa-config -n elsa-workflows -o yaml
kubectl get secret elsa-secrets -n elsa-workflows -o yaml
Database Issues
Migration Failures
Symptom: Elsa Server fails to start due to database migration errors
# Run migrations manually using a Job
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
name: elsa-db-migration
namespace: elsa-workflows
spec:
template:
spec:
containers:
- name: migration
image: elsaworkflows/elsa-server-v3-5:latest
command: ["/bin/sh", "-c"]
args:
- |
dotnet ef database update
env:
- name: CONNECTIONSTRINGS__POSTGRESQL
valueFrom:
secretKeyRef:
name: elsa-secrets
key: postgresql-connection-string
restartPolicy: Never
backoffLimit: 3
EOF
# Check job logs
kubectl logs job/elsa-db-migration -n elsa-workflows
Connection Pool Exhaustion
Symptom: "Timeout expired" or "Too many connections" errors
# Check current connections
kubectl exec -it elsa-postgresql-0 -n elsa-workflows -- \
psql -U elsa -d elsa -c "SELECT count(*) FROM pg_stat_activity;"
# Solution: Increase max_connections or connection pool size
# Update PostgreSQL configuration
kubectl edit statefulset elsa-postgresql -n elsa-workflows
# Or use PgBouncer (see Database Configuration section)
Network Issues
Service Not Accessible
Symptom: Cannot reach Elsa services from outside cluster
# Check service
kubectl get svc -n elsa-workflows
# Check endpoints
kubectl get endpoints elsa-server -n elsa-workflows
# Check ingress
kubectl get ingress -n elsa-workflows
kubectl describe ingress elsa-ingress -n elsa-workflows
# Test internal connectivity
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
curl http://elsa-server.elsa-workflows.svc.cluster.local/health
Solution: DNS Issues
# Test DNS resolution
kubectl run -it --rm debug --image=busybox --restart=Never -- \
nslookup elsa-server.elsa-workflows.svc.cluster.local
# Check CoreDNS
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns
Ingress Not Working
# Check ingress controller
kubectl get pods -n ingress-nginx
# Check ingress class
kubectl get ingressclass
# Verify TLS certificate
kubectl get certificate -n elsa-workflows
kubectl describe certificate elsa-tls -n elsa-workflows
# Check cert-manager logs if using Let's Encrypt
kubectl logs -n cert-manager deployment/cert-manager
Performance Issues
High Latency
# Check pod metrics
kubectl top pods -n elsa-workflows
# Check HPA status
kubectl get hpa -n elsa-workflows
# View detailed metrics
kubectl describe hpa elsa-server-hpa -n elsa-workflows
Solutions:
Increase replica count
Optimize database queries
Add caching layer
Review resource limits
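As a stopgap while investigating, replicas and limits can be adjusted imperatively; the values below are illustrative. Note that if the HPA is active it will override a manual scale, so prefer raising minReplicas there:
# Scale out immediately (HPA, if enabled, will reconcile this)
kubectl scale deployment/elsa-server --replicas=6 -n elsa-workflows
# Raise limits in place; this triggers a rolling restart
kubectl set resources deployment/elsa-server -n elsa-workflows \
  --requests=cpu=1000m,memory=1Gi --limits=cpu=3000m,memory=3Gi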
Memory Leaks
# Monitor memory usage over time
kubectl top pod <pod-name> -n elsa-workflows --containers
# Get heap dump (if .NET diagnostics enabled)
kubectl exec -it <pod-name> -n elsa-workflows -- \
dotnet-dump collect --process-id 1
Distributed Configuration Issues
Lock Acquisition Failures
Symptom: "Failed to acquire lock" errors in logs
# Check distributed lock table in database
kubectl exec -it elsa-postgresql-0 -n elsa-workflows -- \
psql -U elsa -d elsa -c "SELECT * FROM distributed_locks;"
# Clear stale locks (use with caution)
kubectl exec -it elsa-postgresql-0 -n elsa-workflows -- \
psql -U elsa -d elsa -c "DELETE FROM distributed_locks WHERE acquired_at < NOW() - INTERVAL '1 hour';"
Cache Invalidation Issues
Symptom: Stale data across pods
# Check RabbitMQ queues
kubectl exec -it elsa-rabbitmq-0 -n elsa-workflows -- rabbitmqctl list_queues
# Verify MassTransit configuration
kubectl logs <elsa-server-pod> -n elsa-workflows | grep -i masstransit
# Restart all pods to force cache refresh
kubectl rollout restart deployment/elsa-server -n elsa-workflows
Debugging Commands
# Get all resources in namespace
kubectl get all -n elsa-workflows
# Describe all pods
kubectl describe pods -n elsa-workflows
# View logs from all pods
kubectl logs -n elsa-workflows -l app=elsa-server --tail=100
# Follow logs in real-time
kubectl logs -f <pod-name> -n elsa-workflows
# Execute commands in pod
kubectl exec -it <pod-name> -n elsa-workflows -- /bin/sh
# Port-forward for local access
kubectl port-forward svc/elsa-server 8080:80 -n elsa-workflows
# Get resource usage
kubectl top pods -n elsa-workflows
kubectl top nodes
# Check cluster events
kubectl get events -n elsa-workflows --sort-by='.lastTimestamp'
# Validate YAML before applying
kubectl apply --dry-run=client -f deployment.yaml
# Explain resource fields
kubectl explain deployment.spec.template.spec.containers
Production Best Practices
Follow these best practices for reliable, secure, and performant Kubernetes deployments.
Security
1. Use Non-Root Containers
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
capabilities:
drop:
- ALL
readOnlyRootFilesystem: false # Set to true if possible
2. Network Policies
Restrict pod-to-pod communication:
# network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: elsa-server-netpol
namespace: elsa-workflows
spec:
podSelector:
matchLabels:
app: elsa-server
policyTypes:
- Ingress
- Egress
ingress:
# Allow from ingress controller
- from:
- namespaceSelector:
matchLabels:
name: ingress-nginx
ports:
- protocol: TCP
port: 8080
# Allow from Studio
- from:
- podSelector:
matchLabels:
app: elsa-studio
ports:
- protocol: TCP
port: 8080
egress:
# Allow to database
- to:
- podSelector:
matchLabels:
app: postgresql
ports:
- protocol: TCP
port: 5432
# Allow to Redis
- to:
- podSelector:
matchLabels:
app: redis
ports:
- protocol: TCP
port: 6379
# Allow to RabbitMQ
- to:
- podSelector:
matchLabels:
app: rabbitmq
ports:
- protocol: TCP
port: 5672
# Allow DNS
- to:
- namespaceSelector:
matchLabels:
name: kube-system
ports:
- protocol: UDP
port: 53
3. Secrets Management
Use external secret management:
# external-secrets-operator example
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
name: aws-secrets-manager
namespace: elsa-workflows
spec:
provider:
aws:
service: SecretsManager
region: us-east-1
auth:
jwt:
serviceAccountRef:
name: elsa-server
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: elsa-secrets
namespace: elsa-workflows
spec:
refreshInterval: 1h
secretStoreRef:
name: aws-secrets-manager
kind: SecretStore
target:
name: elsa-secrets
creationPolicy: Owner
data:
- secretKey: postgresql-connection-string
remoteRef:
key: elsa/production/database
property: connection-string
4. RBAC Configuration
# rbac.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: elsa-server
namespace: elsa-workflows
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: elsa-server-role
namespace: elsa-workflows
rules:
- apiGroups: [""]
resources: ["configmaps", "secrets"]
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: elsa-server-rolebinding
namespace: elsa-workflows
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: elsa-server-role
subjects:
- kind: ServiceAccount
name: elsa-server
namespace: elsa-workflows
High Availability
1. Multi-Zone Deployment
spec:
template:
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- elsa-server
topologyKey: topology.kubernetes.io/zone
2. Pod Disruption Budgets
Ensure minimum availability during disruptions:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: elsa-server-pdb
namespace: elsa-workflows
spec:
minAvailable: 2 # or maxUnavailable: 1
selector:
matchLabels:
app: elsa-server
3. Health Checks
Configure appropriate health checks:
livenessProbe:
httpGet:
path: /health/live
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /health/ready
port: 8080
initialDelaySeconds: 20
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
startupProbe:
httpGet:
path: /health/startup
port: 8080
initialDelaySeconds: 0
periodSeconds: 5
failureThreshold: 30 # Allow up to 150s for startup
Resource Management
1. Set Resource Requests and Limits
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "2000m"
2. Quality of Service Classes
Guaranteed: requests == limits (highest priority)
Burstable: requests < limits (medium priority)
BestEffort: no requests/limits (lowest priority)
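You can confirm which class the scheduler assigned to each pod; the pod name below is a placeholder:
# Inspect the QoS class assigned to a single pod
kubectl get pod <pod-name> -n elsa-workflows -o jsonpath='{.status.qosClass}'; echo
# Or list QoS classes for every pod in the namespace
kubectl get pods -n elsa-workflows \
  -o custom-columns='NAME:.metadata.name,QOS:.status.qosClass'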
3. Limit Ranges
# limitrange.yaml
apiVersion: v1
kind: LimitRange
metadata:
name: elsa-limits
namespace: elsa-workflows
spec:
limits:
- max:
memory: "4Gi"
cpu: "4000m"
min:
memory: "256Mi"
cpu: "250m"
default:
memory: "1Gi"
cpu: "1000m"
defaultRequest:
memory: "512Mi"
cpu: "500m"
type: Container
Backup and Disaster Recovery
1. Regular Backups
#!/bin/bash
# Backup script example
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
# Database backup
kubectl exec -n elsa-workflows elsa-postgresql-0 -- \
pg_dump -U elsa elsa | gzip > backup_${TIMESTAMP}.sql.gz
# Upload to S3
aws s3 cp backup_${TIMESTAMP}.sql.gz s3://your-bucket/backups/
# Kubernetes resource backup
kubectl get all,configmap,secret,pvc,ingress -n elsa-workflows -o yaml > \
k8s_backup_${TIMESTAMP}.yaml
2. Velero for Cluster Backups
# Install Velero
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.8.0 \
--bucket velero-backups \
--backup-location-config region=us-east-1 \
--snapshot-location-config region=us-east-1
# Create backup schedule
velero schedule create elsa-daily \
--schedule="0 2 * * *" \
--include-namespaces elsa-workflows
# List available backups
velero backup get
# Restore from a specific backup
velero restore create --from-backup <backup-name>
Monitoring and Alerting
1. Define SLIs/SLOs
| Component | SLI | Target SLO |
|---|---|---|
| Elsa Server | Request Success Rate | > 99.9% |
| Elsa Server | P95 Latency | < 500ms |
| Elsa Server | Availability | > 99.95% |
| Database | Connection Success | > 99.99% |
2. Alert on SLO Violations
# prometheus-rules.yaml
- alert: SLOViolation-SuccessRate
expr: |
(
sum(rate(http_requests_total{namespace="elsa-workflows",code=~"2.."}[5m]))
/
sum(rate(http_requests_total{namespace="elsa-workflows"}[5m]))
) < 0.999
for: 5m
labels:
severity: critical
annotations:
summary: "Success rate below SLO (99.9%)"
Cost Optimization
1. Right-Size Resources
# Use VPA recommendations
kubectl describe vpa elsa-server-vpa -n elsa-workflows
# Monitor actual usage
kubectl top pods -n elsa-workflows --containers
2. Use Spot/Preemptible Instances
# Node affinity for spot instances
spec:
template:
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
preference:
matchExpressions:
- key: node.kubernetes.io/lifecycle
operator: In
values:
- spot
3. Enable Cluster Autoscaler
# AWS example
kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
CI/CD Integration
1. GitOps with ArgoCD
# argocd-application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: elsa-workflows
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/your-org/elsa-k8s
targetRevision: main
path: k8s/overlays/production
destination:
server: https://kubernetes.default.svc
namespace: elsa-workflows
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
2. CI Pipeline Example
# .github/workflows/deploy.yaml
name: Deploy to Kubernetes
on:
push:
branches: [main]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Configure kubectl
uses: azure/k8s-set-context@v3
with:
method: kubeconfig
kubeconfig: ${{ secrets.KUBE_CONFIG }}
- name: Deploy
run: |
kubectl apply -f k8s/
kubectl rollout status deployment/elsa-server -n elsa-workflows
Next Steps
After deploying Elsa Workflows to Kubernetes:
Configure Monitoring: Set up Grafana dashboards and alerts
Test Failure Scenarios: Use chaos engineering tools like Chaos Mesh
Optimize Performance: Profile and tune based on your workload
Implement Backups: Set up automated backup and restore procedures
Security Hardening: Implement network policies, RBAC, and secret rotation
Documentation: Document your specific configuration and runbooks
Related Resources
Distributed Hosting Guide - Configure distributed runtime
Database Configuration - Database setup details
Authentication Guide - Secure your deployment
Docker Compose Guide - Local testing
Elsa Server Application Type - Server configuration
Elsa Studio Application Type - Studio configuration
Community and Support
Join the community on Discord or Slack
Version Information
This guide is written for:
Elsa Workflows: v3.5+
Kubernetes: v1.28+
Helm: v3.12+
PostgreSQL: 16+
Redis: 7+
RabbitMQ: 3.12+
Always refer to the official releases for the latest version compatibility information.
Last Updated: 2025-11-20