Kubernetes Deployment
Complete Kubernetes deployment guide for Elsa Workflows, including Helm charts, deployment configurations, ingress setup, autoscaling, monitoring, service mesh integration, and production best practices.
This guide covers deploying Elsa Workflows to Kubernetes in production, whether you run a managed Kubernetes service (EKS, AKS, GKE) or a self-hosted cluster.
Overview
Elsa Workflows can be deployed to Kubernetes using either:
Helm Charts (Recommended) - Simplified deployment and management
Raw Kubernetes Manifests - Full control over configuration
This guide covers both approaches and includes:
Elsa Server and Studio deployments
Database integration and persistence
Ingress configuration for external access
Horizontal Pod Autoscaling (HPA)
Monitoring with Prometheus and Grafana
Service mesh integration (Istio/Linkerd)
Production best practices and troubleshooting
Prerequisites
Before deploying to Kubernetes, ensure you have:
Required Tools
kubectl v1.28+ - Kubernetes command-line tool
Helm v3.12+ - Kubernetes package manager (if using Helm charts)
Docker - For building custom images (optional)
Access to a Kubernetes cluster (v1.28+)
Cluster Requirements
Minimum: 2 nodes with 4GB RAM and 2 CPU cores each
Recommended: 3+ nodes with 8GB RAM and 4 CPU cores each
Storage: Dynamic volume provisioning support (for databases)
Ingress Controller: NGINX, Traefik, or cloud provider load balancer
Knowledge Requirements
Basic Kubernetes concepts (Pods, Services, Deployments)
Understanding of Elsa architecture (see Architecture Overview)
Familiarity with database configuration
Basic YAML syntax
Architecture Overview
A typical Elsa Workflows Kubernetes deployment consists of:
Components
Elsa Server: Hosts the workflow engine and REST API
Elsa Studio: Visual workflow designer (optional; can be deployed separately from the server)
Database: PostgreSQL, SQL Server, or MySQL (with persistent storage)
Redis: Distributed caching and locking
RabbitMQ: Message broker for distributed cache invalidation (via MassTransit)
Ingress: External access routing
Monitoring: Prometheus metrics and Grafana dashboards
Helm Chart Deployment
Helm is the recommended approach for deploying Elsa Workflows to Kubernetes. Because official Helm charts are still under development, this section walks through creating your own chart configuration for production use.
Step 1: Create Helm Chart Structure
Create a new Helm chart for Elsa:
Step 2: Configure Values
Create a values.yaml file with the following configuration:
Step 3: Create Secrets
Create a Kubernetes secret for sensitive configuration:
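A minimal sketch of building such a Secret manifest in Python. The Secret name `elsa-secrets`, the namespace `elsa-workflows`, and the key `DATABASE_PASSWORD` are hypothetical placeholders, not names defined by Elsa. The sketch shows that values under `data` must be base64-encoded, and it prints JSON, which `kubectl apply -f -` accepts as well as YAML.

```python
import base64
import json


def build_secret(name: str, namespace: str, values: dict[str, str]) -> dict:
    """Return a v1 Secret manifest; values under `data` must be base64-encoded."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": encoded,
    }


if __name__ == "__main__":
    # Hypothetical names and a placeholder value; read real values from a
    # secret manager or the environment, never from source control.
    secret = build_secret(
        "elsa-secrets",
        "elsa-workflows",
        {"DATABASE_PASSWORD": "change-me"},
    )
    print(json.dumps(secret, indent=2))
```

Saved under a file name of your choice, the output can be piped straight into `kubectl apply -f -`. Base64 is an encoding, not encryption, so the generated manifest is as sensitive as the plaintext.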
Security Best Practice
Never commit secrets to version control. Use external secret management tools like:
Sealed Secrets
External Secrets Operator
HashiCorp Vault
Cloud provider secret managers (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager)
Step 4: Install with Helm
Step 5: Verify Deployment
Upgrading
To upgrade your deployment:
Uninstalling
Kubernetes Manifest Deployment
For full control over your deployment, you can use raw Kubernetes manifests. This section provides production-ready YAML configurations.
Directory Structure
Namespace
ConfigMap
Secrets
Elsa Server Deployment
Elsa Server Service
Elsa Studio Deployment
Elsa Studio Service
Deploy All Manifests
Database Configuration
Proper database configuration is crucial for production Kubernetes deployments. This section covers PostgreSQL, SQL Server, and MySQL configurations.
PostgreSQL StatefulSet
PostgreSQL Service
Database Backup Configuration
Create a CronJob for regular backups:
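One possible shape for such a CronJob, sketched as a Python function that emits the manifest as JSON. It assumes a Secret named `postgres-backup-env` that supplies the standard `PGHOST`, `PGUSER`, `PGPASSWORD`, and `PGDATABASE` variables, and a PersistentVolumeClaim named `postgres-backup`; both names, the namespace, and the schedule are hypothetical.

```python
import json


def build_backup_cronjob(namespace: str, schedule: str) -> dict:
    """Return a batch/v1 CronJob manifest that runs pg_dump on a cron schedule."""
    container = {
        "name": "pg-dump",
        "image": "postgres:16",
        # pg_dump reads PGHOST, PGUSER, PGPASSWORD and PGDATABASE from the
        # environment; the Secret name is a hypothetical placeholder.
        "envFrom": [{"secretRef": {"name": "postgres-backup-env"}}],
        "command": ["/bin/sh", "-c"],
        "args": ["pg_dump --format=custom --file=/backup/elsa-$(date +%Y%m%d).dump"],
        "volumeMounts": [{"name": "backup", "mountPath": "/backup"}],
    }
    pod_spec = {
        "restartPolicy": "OnFailure",
        "containers": [container],
        "volumes": [
            {"name": "backup", "persistentVolumeClaim": {"claimName": "postgres-backup"}}
        ],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": "postgres-backup", "namespace": namespace},
        "spec": {
            "schedule": schedule,
            # Skip a run if the previous backup has not finished yet.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {"spec": {"template": {"spec": pod_spec}}},
        },
    }


if __name__ == "__main__":
    # "0 2 * * *" runs daily at 02:00 in the controller's time zone.
    print(json.dumps(build_backup_cronjob("elsa-workflows", "0 2 * * *"), indent=2))
```

A backup written to a volume in the same cluster does not protect against cluster loss, so in practice the dump would usually also be copied to external storage.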
Connection Pooling
For high-load scenarios, consider using PgBouncer:
Persistent Storage
Proper storage configuration ensures data persistence across pod restarts and upgrades.
Storage Classes
Define storage classes for different performance tiers:
Persistent Volume Claims
Volume Snapshots
Configure VolumeSnapshotClass for backup and disaster recovery:
Create snapshots:
Ingress Setup
Ingress controllers provide external access to your Elsa Workflows deployment with SSL/TLS termination, routing, and load balancing.
NGINX Ingress Controller
Installation
Ingress Configuration
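A minimal sketch of an Ingress resource for the NGINX controller, generated from Python as JSON. The host `elsa.example.com`, the Service name `elsa-server`, port 80, and the TLS Secret name are assumptions for illustration; `nginx` is assumed to be the installed IngressClass name.

```python
import json


def build_ingress(name: str, namespace: str, host: str,
                  service_name: str, service_port: int) -> dict:
    """Return a networking.k8s.io/v1 Ingress routing one host to one Service."""
    backend = {"service": {"name": service_name, "port": {"number": service_port}}}
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "ingressClassName": "nginx",  # assumed IngressClass name
            # The TLS Secret must exist already or be issued by a tool
            # such as cert-manager.
            "tls": [{"hosts": [host], "secretName": f"{name}-tls"}],
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {"path": "/", "pathType": "Prefix", "backend": backend}
                        ]
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    ingress = build_ingress(
        "elsa-server", "elsa-workflows", "elsa.example.com", "elsa-server", 80
    )
    print(json.dumps(ingress, indent=2))
```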
Traefik Ingress Controller
Installation
IngressRoute Configuration
SSL/TLS with cert-manager
Install cert-manager
ClusterIssuer Configuration
Horizontal Pod Autoscaling
The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas based on CPU, memory, or custom metrics.
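The scaling decision follows the formula given in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), skipped when the ratio is within a tolerance (0.1 by default) of 1. The sketch below applies that formula and clamps the result to the configured bounds. It is a simplification: the real controller also handles stabilization windows, missing metrics, and pods that are not yet ready.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Approximate the HPA replica calculation for a single metric."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The HPA never scales outside minReplicas..maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 3 replicas averaging 90% CPU against a 60% target.
    print(desired_replicas(3, 90, 60, min_replicas=2, max_replicas=10))
```

With 3 replicas at 90% average CPU and a 60% target, the ratio is 1.5, so the function returns 5 replicas.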
Metrics Server Installation
HPA for Elsa Server
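A minimal sketch of an `autoscaling/v2` HorizontalPodAutoscaler targeting CPU utilization, generated from Python as JSON. The Deployment name `elsa-server`, the namespace, the replica bounds, and the 70% target are illustrative assumptions, not values prescribed by Elsa.

```python
import json


def build_hpa(deployment: str, namespace: str, min_replicas: int,
              max_replicas: int, cpu_utilization: int) -> dict:
    """Return an autoscaling/v2 HPA that scales a Deployment on average CPU."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": deployment, "namespace": namespace},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        # Utilization is a percentage of the CPU *request*,
                        # so the target pods must declare CPU requests.
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_utilization,
                        },
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_hpa("elsa-server", "elsa-workflows", 2, 10, 70), indent=2))
```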
HPA for Elsa Studio
Vertical Pod Autoscaling (Optional)
For automatic resource request adjustments:
Pod Disruption Budget
Ensure availability during voluntary disruptions:
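A minimal sketch of a `policy/v1` PodDisruptionBudget, plus the arithmetic it implies. The label `app: elsa-server` and the `minAvailable` value are assumptions for illustration.

```python
import json


def build_pdb(name: str, namespace: str, app_label: str, min_available: int) -> dict:
    """Return a policy/v1 PodDisruptionBudget selecting pods by their app label."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Number of pods that may be evicted voluntarily right now."""
    return max(0, healthy_pods - min_available)


if __name__ == "__main__":
    print(json.dumps(build_pdb("elsa-server", "elsa-workflows", "elsa-server", 2), indent=2))
    # With 3 healthy replicas and minAvailable=2, one pod can be drained at a time.
    print(allowed_disruptions(3, 2))
```

Note that `minAvailable` equal to the replica count leaves zero allowed disruptions, which blocks node drains until the budget or the replica count changes.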
Testing Autoscaling
Monitoring with Prometheus & Grafana
Comprehensive monitoring is essential for production Kubernetes deployments. This section covers Prometheus metrics collection and Grafana dashboards.
Install Prometheus Stack
ServiceMonitor for Elsa Server
PrometheusRule for Alerts
Grafana Dashboard
Create a comprehensive Grafana dashboard for Elsa Workflows:
Custom Metrics in Elsa
To expose custom metrics from your Elsa Server, configure Prometheus metrics in Program.cs:
Accessing Grafana
Key Metrics to Monitor
| Metric | Description | Alert Threshold |
| --- | --- | --- |
| http_requests_total | Total HTTP requests | - |
| http_request_duration_seconds | Request latency | P95 > 2s |
| elsa_workflow_executions_total | Workflow executions | - |
| elsa_workflow_execution_failed_total | Failed workflows | Rate > 0.5/s |
| elsa_active_workflows_total | Currently active workflows | - |
| elsa_database_connections_active | Active DB connections | > 90% of pool |
| container_cpu_usage_seconds_total | CPU usage | > 80% |
| container_memory_working_set_bytes | Memory usage | > 90% of limit |
| kube_pod_container_status_restarts_total | Pod restarts | > 0 in 15min |
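Three of the thresholds above can be expressed as Prometheus alerting rules. The sketch below generates a `PrometheusRule` resource as JSON. It assumes the Prometheus Operator CRDs are installed and that `http_request_duration_seconds` is exported as a histogram (so a `_bucket` series exists). The alert names, the `5m` hold duration, and the severity labels are hypothetical choices.

```python
import json


def alert(name: str, expr: str, summary: str,
          duration: str = "5m", severity: str = "warning") -> dict:
    """Build one alerting rule entry."""
    return {
        "alert": name,
        "expr": expr,
        "for": duration,
        "labels": {"severity": severity},
        "annotations": {"summary": summary},
    }


def build_prometheus_rule(namespace: str) -> dict:
    """Return a monitoring.coreos.com/v1 PrometheusRule with three example alerts."""
    rules = [
        alert(
            "ElsaHighRequestLatency",
            "histogram_quantile(0.95, sum by (le) "
            "(rate(http_request_duration_seconds_bucket[5m]))) > 2",
            "P95 request latency is above 2s",
        ),
        alert(
            "ElsaWorkflowFailures",
            "sum(rate(elsa_workflow_execution_failed_total[5m])) > 0.5",
            "More than 0.5 workflow failures per second",
            severity="critical",
        ),
        alert(
            "ElsaPodRestarts",
            "increase(kube_pod_container_status_restarts_total[15m]) > 0",
            "A container restarted within the last 15 minutes",
        ),
    ]
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": "elsa-alerts", "namespace": namespace},
        "spec": {"groups": [{"name": "elsa.rules", "rules": rules}]},
    }


if __name__ == "__main__":
    print(json.dumps(build_prometheus_rule("elsa-workflows"), indent=2))
```

Depending on how the Prometheus Operator is configured, the rule may need additional labels in `metadata.labels` to match the operator's rule selector before it is loaded.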
Service Mesh Integration
Service meshes provide advanced traffic management, security, and observability features. This section covers integration with Istio and Linkerd.
Istio Integration
Prerequisites
Gateway Configuration
VirtualService Configuration
DestinationRule for Circuit Breaking
PeerAuthentication for mTLS
AuthorizationPolicy
Linkerd Integration
Installation
Mesh Elsa Workflows Namespace
Traffic Split for Canary Deployments
ServiceProfile for Advanced Metrics
Rate Limiting with Linkerd
Observability with Service Mesh
Istio Dashboard
Linkerd Dashboard
Comparison: Istio vs Linkerd
| Feature | Istio | Linkerd |
| --- | --- | --- |
| Learning Curve | Steep | Gentle |
| Resource Usage | Higher (Envoy proxy) | Lower (Linkerd2-proxy) |
| Features | Comprehensive | Focused |
| Traffic Management | Advanced | Basic |
| Security | mTLS, AuthZ policies | mTLS, policy |
| Observability | Prometheus, Jaeger, Kiali | Prometheus, built-in viz |
| Performance | Good | Excellent |
| Best For | Complex environments | Simplicity, performance |
Service Mesh Best Practices
Start Simple: Begin without a service mesh and add it when needed
Resource Planning: Allocate extra resources for sidecar proxies (roughly 50-100 MiB of memory and 0.1 CPU per pod)
Gradual Rollout: Enable mesh incrementally, namespace by namespace
Monitor Performance: Watch for latency increases due to proxy overhead
Use mTLS: Enable mutual TLS for secure pod-to-pod communication
Circuit Breaking: Configure circuit breakers to prevent cascade failures
Observability: Leverage built-in tracing and metrics
Test Thoroughly: Test failure scenarios with chaos engineering
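For resource planning, the per-pod sidecar figures above scale linearly with the number of meshed pods. A small sketch of that arithmetic, using the estimates from the list (100 millicores and 50-100 MiB per proxy) as defaults; actual proxy usage depends on traffic and mesh configuration.

```python
def sidecar_overhead(pod_count: int, cpu_millicores_per_proxy: int = 100,
                     memory_mib_range: tuple[int, int] = (50, 100)) -> dict:
    """Estimate the total extra resources consumed by sidecar proxies."""
    low, high = memory_mib_range
    return {
        "cpu_millicores": pod_count * cpu_millicores_per_proxy,
        "memory_mib_low": pod_count * low,
        "memory_mib_high": pod_count * high,
    }


if __name__ == "__main__":
    # 20 meshed pods: 2000m CPU and between 1000 and 2000 MiB of memory.
    print(sidecar_overhead(20))
```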
Distributed Configuration
Kubernetes deployments that run multiple replicas need a properly configured distributed runtime. See the Distributed Hosting guide for detailed configuration.
Distributed Runtime Configuration
Configure distributed workflow runtime in your Elsa Server:
Environment-Based Configuration
Use Kubernetes ConfigMaps and Secrets for environment-specific settings:
Apply to deployment:
Redis Configuration for Caching
Deploy Redis for distributed caching:
Troubleshooting
Common issues and their solutions when deploying Elsa to Kubernetes.
Pod Issues
Pods Not Starting
Symptom: Pods stuck in Pending or ImagePullBackOff state
Solution 1: Insufficient Resources
Solution 2: Image Pull Issues
Pods Crashing (CrashLoopBackOff)
Symptom: Pods repeatedly restarting
Common Causes:
Database Connection Issues
Missing Dependencies
Configuration Errors
Database Issues
Migration Failures
Symptom: Elsa Server fails to start due to database migration errors
Connection Pool Exhaustion
Symptom: "Timeout expired" or "Too many connections" errors
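A common cause is that every replica keeps its own connection pool, so the total grows with the replica count. The sketch below shows the arithmetic. It assumes one pool per pod and a hypothetical reserve of 10 connections for administration and backups. Many stacks default to a pool size of 100 and PostgreSQL commonly defaults to `max_connections = 100`, so check the actual values in your environment.

```python
def total_connections(replicas: int, max_pool_size: int) -> int:
    """Worst-case number of connections opened by all replicas together."""
    return replicas * max_pool_size


def max_safe_pool_size(replicas: int, server_max_connections: int,
                       reserved: int = 10) -> int:
    """Largest per-replica pool size that stays under the server limit."""
    return (server_max_connections - reserved) // replicas


if __name__ == "__main__":
    # 5 replicas with a pool of 100 each can request up to 500 connections,
    # well above a server limit of 100.
    print(total_connections(5, 100))
    # Under the same limit, each replica should cap its pool at 18.
    print(max_safe_pool_size(5, 100))
```

When the safe pool size becomes too small to be useful, a pooler such as PgBouncer in front of the database is the usual alternative to raising the server limit.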
Network Issues
Service Not Accessible
Symptom: Cannot reach Elsa services from outside cluster
Solution: DNS Issues
Ingress Not Working
Performance Issues
High Latency
Solutions:
Increase replica count
Optimize database queries
Add caching layer
Review resource limits
Memory Leaks
Distributed Configuration Issues
Lock Acquisition Failures
Symptom: "Failed to acquire lock" errors in logs
Cache Invalidation Issues
Symptom: Stale data across pods
Debugging Commands
Production Best Practices
Follow these best practices for reliable, secure, and performant Kubernetes deployments.
Security
1. Use Non-Root Containers
2. Network Policies
Restrict pod-to-pod communication:
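A minimal sketch of a NetworkPolicy that only lets Elsa Server pods reach the database, generated from Python as JSON. The labels `app: postgres` and `app: elsa-server` and the namespace are assumptions. The policy only takes effect if the cluster's network plugin enforces NetworkPolicy.

```python
import json


def build_db_network_policy(namespace: str, db_label: str,
                            client_label: str, port: int) -> dict:
    """Return a NetworkPolicy allowing ingress to the DB pods from one client app."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-{client_label}-to-{db_label}", "namespace": namespace},
        "spec": {
            # Pods this policy applies to.
            "podSelector": {"matchLabels": {"app": db_label}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # Only pods with the client label in the same namespace may connect.
                    "from": [{"podSelector": {"matchLabels": {"app": client_label}}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    policy = build_db_network_policy("elsa-workflows", "postgres", "elsa-server", 5432)
    print(json.dumps(policy, indent=2))
```

Once any policy selects a pod for ingress, all other inbound traffic to that pod is denied, so backup jobs or poolers that connect to the database need their own `from` entries.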
3. Secrets Management
Use external secret management:
4. RBAC Configuration
High Availability
1. Multi-Zone Deployment
2. Pod Disruption Budgets
Ensure minimum availability during disruptions:
3. Health Checks
Configure appropriate health checks:
Resource Management
1. Set Resource Requests and Limits
2. Quality of Service Classes
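Guaranteed: every container sets CPU and memory requests equal to its limits (evicted last under node pressure)
Burstable: at least one container sets a request or limit, but the pod does not qualify as Guaranteed (evicted after BestEffort pods)
BestEffort: no container sets any requests or limits (evicted first)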
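The classification rules can be sketched as a small function over each container's `resources` block. This is a simplification for illustration: it compares quantities as literal strings, so `500m` and `0.5` are treated as different even though Kubernetes considers them equal.

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a pod from its containers' resources dicts.

    Each item looks like {"requests": {...}, "limits": {...}}; either key
    may be missing.
    """
    any_set = False
    guaranteed = True
    for resources in containers:
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        if requests or limits:
            any_set = True
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            # Kubernetes defaults an omitted request to the limit.
            request = requests.get(name, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    pod = [{"requests": {"cpu": "250m", "memory": "512Mi"},
            "limits": {"cpu": "500m", "memory": "512Mi"}}]
    # CPU request is below the limit, so this pod is Burstable.
    print(qos_class(pod))
```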
3. Limit Ranges
Backup and Disaster Recovery
1. Regular Backups
2. Velero for Cluster Backups
Monitoring and Alerting
1. Define SLIs/SLOs
| Service | SLI | SLO Target |
| --- | --- | --- |
| Elsa Server | Request Success Rate | > 99.9% |
| Elsa Server | P95 Latency | < 500ms |
| Elsa Server | Availability | > 99.95% |
| Database | Connection Success | > 99.99% |
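An availability or success-rate target translates directly into an error budget: the amount of failure the SLO tolerates over a window. A short sketch of that arithmetic, assuming a 30-day window (the window length is an assumption, not something the targets above specify).

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of full unavailability an SLO allows over the window."""
    window_minutes = window_days * 24 * 60
    return (1 - slo_percent / 100) * window_minutes


if __name__ == "__main__":
    for target in (99.9, 99.95, 99.99):
        print(f"{target}% -> {error_budget_minutes(target):.2f} min per 30 days")
```

Over 30 days, 99.9% allows about 43.2 minutes of downtime, 99.95% about 21.6 minutes, and 99.99% about 4.3 minutes, which is a useful check on whether a target is realistic for the planned replica count and maintenance procedures.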
2. Alert on SLO Violations
Cost Optimization
1. Right-Size Resources
2. Use Spot/Preemptible Instances
3. Enable Cluster Autoscaler
CI/CD Integration
1. GitOps with ArgoCD
2. CI Pipeline Example
Next Steps
After deploying Elsa Workflows to Kubernetes:
Configure Monitoring: Set up Grafana dashboards and alerts
Test Failure Scenarios: Use chaos engineering tools like Chaos Mesh
Optimize Performance: Profile and tune based on your workload
Implement Backups: Set up automated backup and restore procedures
Security Hardening: Implement network policies, RBAC, and secret rotation
Documentation: Document your specific configuration and runbooks
Related Resources
Distributed Hosting Guide - Configure distributed runtime
Database Configuration - Database setup details
Authentication Guide - Secure your deployment
Docker Compose Guide - Local testing
Elsa Server Application Type - Server configuration
Elsa Studio Application Type - Studio configuration
Community and Support
Join the community on Discord or Slack
Version Information
This guide is written for:
Elsa Workflows: v3.5+
Kubernetes: v1.28+
Helm: v3.12+
PostgreSQL: 16+
Redis: 7+
RabbitMQ: 3.12+
Always refer to the official releases for the latest version compatibility information.
Last Updated: 2025-11-20