Kubernetes Deployment

Complete Kubernetes deployment guide for Elsa Workflows including Helm charts, deployment configurations, ingress setup, autoscaling, monitoring, service mesh integration, and production best practices

This comprehensive guide covers deploying Elsa Workflows to Kubernetes in production environments. Whether you're using managed Kubernetes services (EKS, AKS, GKE) or self-hosted clusters, this guide provides everything you need for a reliable, scalable deployment.

Overview

Elsa Workflows can be deployed to Kubernetes using either:

  • Helm Charts (Recommended) - Simplified deployment and management

  • Raw Kubernetes Manifests - Full control over configuration

This guide covers both approaches and includes:

  • Elsa Server and Studio deployments

  • Database integration and persistence

  • Ingress configuration for external access

  • Horizontal Pod Autoscaling (HPA)

  • Monitoring with Prometheus and Grafana

  • Service mesh integration (Istio/Linkerd)

  • Production best practices and troubleshooting

Prerequisites

Before deploying to Kubernetes, ensure you have:

Required Tools

  • kubectl v1.28+ - Kubernetes command-line tool

  • Helm v3.12+ - Kubernetes package manager (if using Helm charts)

  • Docker - For building custom images (optional)

  • Access to a Kubernetes cluster (v1.28+)

Cluster Requirements

  • Minimum: 2 nodes with 4GB RAM and 2 CPU cores each

  • Recommended: 3+ nodes with 8GB RAM and 4 CPU cores each

  • Storage: Dynamic volume provisioning support (for databases)

  • Ingress Controller: NGINX, Traefik, or cloud provider load balancer

Knowledge Requirements

  • Basic Kubernetes concepts (Pods, Services, Deployments)

  • Understanding of Elsa architecture (see Architecture Overview)

  • Familiarity with database configuration

  • Basic YAML syntax

New to Kubernetes?

For local development and testing, consider using Minikube, k3d, or Docker Desktop Kubernetes before deploying to production clusters.

Architecture Overview

A typical Elsa Workflows Kubernetes deployment consists of:

Components

  1. Elsa Server: Hosts the workflow engine and REST API

  2. Elsa Studio: Visual workflow designer (optional, can be separate)

  3. Database: PostgreSQL, SQL Server, or MySQL (with persistent storage)

  4. Redis: Distributed caching and locking

  5. RabbitMQ: Message broker for distributed cache invalidation (via MassTransit)

  6. Ingress: External access routing

  7. Monitoring: Prometheus metrics and Grafana dashboards

Helm Chart Deployment

Helm is the recommended approach for deploying Elsa Workflows to Kubernetes. While official Helm charts are under development, this section provides a production-ready chart configuration.

Step 1: Create Helm Chart Structure

Create a new Helm chart for Elsa:
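The layout below is a minimal sketch; the chart name elsa and the template file names are assumptions you can rename freely:

```bash
# Scaffold a chart named "elsa" (illustrative name) and trim the generated boilerplate.
helm create elsa

# A reasonable target structure:
# elsa/
# ├── Chart.yaml                     # chart metadata
# ├── values.yaml                    # default configuration values
# └── templates/
#     ├── _helpers.tpl
#     ├── server-deployment.yaml
#     ├── server-service.yaml
#     ├── studio-deployment.yaml
#     ├── studio-service.yaml
#     └── ingress.yaml
```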

Step 2: Configure Values

Create a values.yaml file with the following configuration:
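Since there is no official values schema yet, the keys below (server.replicaCount, ingress.host, and so on) are assumptions that must match whatever your templates reference; the image name is likewise illustrative:

```yaml
# values.yaml -- illustrative defaults, not an official chart schema.
server:
  replicaCount: 2
  image:
    repository: elsaworkflows/elsa-server-and-studio   # assumed image; swap in your own build
    tag: "latest"
  service:
    type: ClusterIP
    port: 8080
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 1Gi

studio:
  replicaCount: 1
  service:
    type: ClusterIP
    port: 8080

ingress:
  enabled: true
  className: nginx
  host: elsa.example.com        # replace with your domain

database:
  provider: PostgreSql
  # The connection string is supplied via a Secret; see Step 3.
```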

Step 3: Create Secrets

Create a Kubernetes secret for sensitive configuration:
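A minimal sketch using kubectl; the secret name elsa-secrets and the ASP.NET Core style key are assumptions that must line up with the environment variables your deployment consumes:

```bash
# Store the database connection string outside of values.yaml.
kubectl create secret generic elsa-secrets \
  --namespace elsa-workflows \
  --from-literal=ConnectionStrings__Default='Host=postgres;Database=elsa;Username=elsa;Password=<strong-password>'
```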

Step 4: Install with Helm
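Assuming the chart lives in the local ./elsa directory created above:

```bash
# Install the chart into a dedicated namespace.
helm install elsa ./elsa \
  --namespace elsa-workflows \
  --create-namespace \
  --values values.yaml
```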

Step 5: Verify Deployment
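The deployment name elsa-server is an assumption based on the chart sketch above; adjust it to your template names:

```bash
# Confirm that workloads are healthy and inspect recent logs.
kubectl get pods,svc,ingress -n elsa-workflows
kubectl rollout status deployment/elsa-server -n elsa-workflows
kubectl logs deployment/elsa-server -n elsa-workflows --tail=50
```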

Upgrading

To upgrade your deployment:
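For example, with the release and chart names used above:

```bash
# Apply updated values or a new chart version.
helm upgrade elsa ./elsa \
  --namespace elsa-workflows \
  --values values.yaml

# Revert to a previous revision if the upgrade misbehaves.
helm rollback elsa 1 --namespace elsa-workflows
```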

Uninstalling
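To remove the release (note that PersistentVolumeClaims created by StatefulSets are not deleted automatically):

```bash
helm uninstall elsa --namespace elsa-workflows
kubectl delete namespace elsa-workflows    # optional: removes everything else in the namespace
```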

Kubernetes Manifest Deployment

For full control over your deployment, you can use raw Kubernetes manifests. This section provides production-ready YAML configurations.

Directory Structure

Namespace
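A minimal namespace manifest; elsa-workflows is the namespace name assumed throughout this guide:

```yaml
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: elsa-workflows
  labels:
    app.kubernetes.io/part-of: elsa
```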

ConfigMap

Secrets

Elsa Server Deployment
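The manifest below is an illustrative sketch: the image, the /health endpoint, and the ConfigMap/Secret names are assumptions to adapt to your environment:

```yaml
# elsa-server-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elsa-server
  namespace: elsa-workflows
  labels:
    app: elsa-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: elsa-server
  template:
    metadata:
      labels:
        app: elsa-server
    spec:
      containers:
        - name: elsa-server
          image: elsaworkflows/elsa-server-and-studio:latest   # assumed image
          ports:
            - name: http
              containerPort: 8080
          env:
            - name: ASPNETCORE_URLS
              value: "http://+:8080"
          envFrom:
            - configMapRef:
                name: elsa-config       # non-sensitive settings
            - secretRef:
                name: elsa-secrets      # connection strings and credentials
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 1Gi
          readinessProbe:
            httpGet:
              path: /health             # assumed health endpoint
              port: http
            initialDelaySeconds: 10
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 20
```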

Elsa Server Service

Elsa Studio Deployment

Elsa Studio Service

Deploy All Manifests
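Assuming the manifests are kept in a local manifests/ directory matching the structure above:

```bash
# Namespace first, then configuration, then workloads.
kubectl apply -f manifests/namespace.yaml
kubectl apply -f manifests/configmap.yaml
kubectl apply -f manifests/secrets.yaml
kubectl apply -f manifests/
kubectl get all -n elsa-workflows
```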

Database Configuration

Proper database configuration is crucial for production Kubernetes deployments. This section covers PostgreSQL, SQL Server, and MySQL configurations.

PostgreSQL StatefulSet

PostgreSQL Service

Database Backup Configuration

Create a CronJob for regular backups:
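A sketch of a nightly pg_dump job; the database host, credentials secret key, and backup PVC name are assumptions:

```yaml
# postgres-backup-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
  namespace: elsa-workflows
spec:
  schedule: "0 2 * * *"                  # every night at 02:00
  successfulJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16
              command: ["/bin/sh", "-c"]
              args:
                - pg_dump -h postgres -U elsa -d elsa -F c -f /backups/elsa-$(date +%Y%m%d).dump
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: elsa-secrets
                      key: postgres-password     # assumed key
              volumeMounts:
                - name: backups
                  mountPath: /backups
          volumes:
            - name: backups
              persistentVolumeClaim:
                claimName: postgres-backups      # assumed PVC
```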

Connection Pooling

For high-load scenarios, consider using PgBouncer:

Persistent Storage

Proper storage configuration ensures data persistence across pod restarts and upgrades.

Storage Classes

Define storage classes for different performance tiers:
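For example, on AWS with the EBS CSI driver; the provisioner and parameters differ per cloud or on-prem CSI driver:

```yaml
# storageclass-fast-ssd.yaml -- gp3-backed class for database volumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Retain                    # keep data if the PVC is deleted
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer  # bind in the pod's zone
```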

Persistent Volume Claims

Volume Snapshots

Configure VolumeSnapshotClass for backup and disaster recovery:

Create snapshots:

Ingress Setup

Ingress controllers provide external access to your Elsa Workflows deployment with SSL/TLS termination, routing, and load balancing.

NGINX Ingress Controller

Installation
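Using the official ingress-nginx chart:

```bash
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace
```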

Ingress Configuration
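A sketch of an Ingress routing a single host to the Elsa Server service; the hostname, TLS secret, and annotations are assumptions:

```yaml
# elsa-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: elsa
  namespace: elsa-workflows
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - elsa.example.com
      secretName: elsa-tls
  rules:
    - host: elsa.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: elsa-server
                port:
                  number: 8080
```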

Traefik Ingress Controller

Installation

IngressRoute Configuration

SSL/TLS with cert-manager

Install cert-manager

ClusterIssuer Configuration
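A Let's Encrypt production issuer using the HTTP-01 challenge through the NGINX ingress class; the contact email is a placeholder:

```yaml
# cluster-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com                       # replace with a monitored address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
```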

Horizontal Pod Autoscaling

HPA automatically scales pods based on CPU, memory, or custom metrics.

Metrics Server Installation
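HPA on resource metrics requires metrics-server:

```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl top nodes    # verify metrics are being collected
```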

HPA for Elsa Server
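An autoscaling/v2 HPA targeting the elsa-server Deployment from the manifests above; the utilization targets are starting points to tune:

```yaml
# elsa-server-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: elsa-server
  namespace: elsa-workflows
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: elsa-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```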

HPA for Elsa Studio

Vertical Pod Autoscaling (Optional)

For automatic resource request adjustments:

Pod Disruption Budget

Ensure availability during voluntary disruptions:
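A minimal PodDisruptionBudget keeping at least one Elsa Server replica available during node drains and upgrades:

```yaml
# elsa-server-pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: elsa-server
  namespace: elsa-workflows
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: elsa-server
```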

Testing Autoscaling

Monitoring with Prometheus & Grafana

Comprehensive monitoring is essential for production Kubernetes deployments. This section covers Prometheus metrics collection and Grafana dashboards.

Install Prometheus Stack
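The kube-prometheus-stack chart bundles the Prometheus Operator, Alertmanager, and Grafana:

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace
```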

ServiceMonitor for Elsa Server
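A sketch that assumes the Elsa Server service exposes a named http port, serves Prometheus metrics on /metrics, and carries the app: elsa-server label; the release label must match your Prometheus Operator's selector:

```yaml
# elsa-server-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: elsa-server
  namespace: elsa-workflows
  labels:
    release: monitoring          # matches the Helm release name used above
spec:
  selector:
    matchLabels:
      app: elsa-server
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
```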

PrometheusRule for Alerts

Grafana Dashboard

Create a comprehensive Grafana dashboard for Elsa Workflows:

Custom Metrics in Elsa

To expose custom metrics from your Elsa Server, configure Prometheus metrics in Program.cs:
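One way to do this is the prometheus-net.AspNetCore package; the snippet below is a hedged sketch rather than Elsa's built-in instrumentation, and the custom counter name is illustrative:

```csharp
// Program.cs (excerpt) -- requires the prometheus-net.AspNetCore NuGet package.
using Prometheus;

var builder = WebApplication.CreateBuilder(args);

// ... existing Elsa Server registrations ...

var app = builder.Build();

app.UseMetricServer();   // serves the /metrics scrape endpoint
app.UseHttpMetrics();    // records per-request HTTP metrics (duration, count, in-progress)

// A custom counter you can increment from your own activities or middleware.
var workflowExecutions = Metrics.CreateCounter(
    "elsa_workflow_executions_total",
    "Total number of workflow executions observed by this host.");

app.Run();
```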

Accessing Grafana
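With the monitoring release name used above, Grafana can be reached locally via port-forward; the service and secret names follow the <release>-grafana pattern:

```bash
kubectl port-forward svc/monitoring-grafana 3000:80 -n monitoring
# Retrieve the generated admin password:
kubectl get secret monitoring-grafana -n monitoring \
  -o jsonpath="{.data.admin-password}" | base64 -d; echo
```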

Key Metrics to Monitor

| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| http_requests_total | Total HTTP requests | - |
| http_request_duration_seconds | Request latency | P95 > 2s |
| elsa_workflow_executions_total | Workflow executions | - |
| elsa_workflow_execution_failed_total | Failed workflows | Rate > 0.5/s |
| elsa_active_workflows_total | Currently active workflows | - |
| elsa_database_connections_active | Active DB connections | > 90% of pool |
| container_cpu_usage_seconds_total | CPU usage | > 80% |
| container_memory_working_set_bytes | Memory usage | > 90% of limit |
| kube_pod_container_status_restarts_total | Pod restarts | > 0 in 15 min |

Service Mesh Integration

Service meshes provide advanced traffic management, security, and observability features. This section covers integration with Istio and Linkerd.

Istio Integration

Prerequisites

Gateway Configuration

VirtualService Configuration

DestinationRule for Circuit Breaking

PeerAuthentication for mTLS

AuthorizationPolicy

Linkerd Integration

Installation

Mesh Elsa Workflows Namespace
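Annotating the namespace triggers sidecar injection for new pods; existing workloads need a restart to pick it up:

```bash
kubectl annotate namespace elsa-workflows linkerd.io/inject=enabled
kubectl rollout restart deployment -n elsa-workflows
linkerd check --proxy -n elsa-workflows    # verify the proxies are healthy
```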

Traffic Split for Canary Deployments

ServiceProfile for Advanced Metrics

Rate Limiting with Linkerd

Observability with Service Mesh

Istio Dashboard

Linkerd Dashboard

Comparison: Istio vs Linkerd

| Feature | Istio | Linkerd |
|---------|-------|---------|
| Learning Curve | Steep | Gentle |
| Resource Usage | Higher (Envoy proxy) | Lower (Linkerd2-proxy) |
| Features | Comprehensive | Focused |
| Traffic Management | Advanced | Basic |
| Security | mTLS, AuthZ policies | mTLS, policy |
| Observability | Prometheus, Jaeger, Kiali | Prometheus, built-in viz |
| Performance | Good | Excellent |
| Best For | Complex environments | Simplicity, performance |

Service Mesh Best Practices

  1. Start Simple: Begin without a service mesh and add it when needed

  2. Resource Planning: Allocate extra resources for sidecar proxies (~50-100Mi RAM, 0.1 CPU per pod)

  3. Gradual Rollout: Enable mesh incrementally, namespace by namespace

  4. Monitor Performance: Watch for latency increases due to proxy overhead

  5. Use mTLS: Enable mutual TLS for secure pod-to-pod communication

  6. Circuit Breaking: Configure circuit breakers to prevent cascade failures

  7. Observability: Leverage built-in tracing and metrics

  8. Test Thoroughly: Test failure scenarios with chaos engineering

Distributed Configuration

For Kubernetes deployments with multiple replicas, proper distributed configuration is essential. Reference the Distributed Hosting guide for detailed configuration.

Distributed Runtime Configuration

Configure distributed workflow runtime in your Elsa Server:

Environment-Based Configuration

Use Kubernetes ConfigMaps and Secrets for environment-specific settings:

Apply to deployment:

Redis Configuration for Caching

Deploy Redis for distributed caching:
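A single-instance sketch suitable for caching and locking in smaller clusters; for production-grade availability, prefer a managed Redis or a replicated chart:

```yaml
# redis.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: elsa-workflows
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7
          ports:
            - containerPort: 6379
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: elsa-workflows
spec:
  selector:
    app: redis
  ports:
    - port: 6379
      targetPort: 6379
```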

Troubleshooting

Common issues and their solutions when deploying Elsa to Kubernetes.

Pod Issues

Pods Not Starting

Symptom: Pods stuck in Pending or ImagePullBackOff state

Solution 1: Insufficient Resources

Solution 2: Image Pull Issues

Pods Crashing (CrashLoopBackOff)

Symptom: Pods repeatedly restarting

Common Causes:

  1. Database Connection Issues

  2. Missing Dependencies

  3. Configuration Errors

Database Issues

Migration Failures

Symptom: Elsa Server fails to start due to database migration errors

Connection Pool Exhaustion

Symptom: "Timeout expired" or "Too many connections" errors

Network Issues

Service Not Accessible

Symptom: Cannot reach Elsa services from outside cluster

Solution: DNS Issues

Ingress Not Working

Performance Issues

High Latency

Solutions:

  • Increase replica count

  • Optimize database queries

  • Add caching layer

  • Review resource limits

Memory Leaks

Distributed Configuration Issues

Lock Acquisition Failures

Symptom: "Failed to acquire lock" errors in logs

Cache Invalidation Issues

Symptom: Stale data across pods

Debugging Commands
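A short reference of the commands used most often in the scenarios above:

```bash
kubectl get events -n elsa-workflows --sort-by=.lastTimestamp   # recent cluster events
kubectl describe pod <pod-name> -n elsa-workflows               # scheduling and probe details
kubectl logs <pod-name> -n elsa-workflows --previous            # logs from the last crashed container
kubectl exec -it <pod-name> -n elsa-workflows -- /bin/sh        # shell into a running pod
kubectl top pods -n elsa-workflows                              # resource usage (needs metrics-server)
kubectl get endpoints elsa-server -n elsa-workflows             # check service-to-pod wiring
```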

Production Best Practices

Follow these best practices for reliable, secure, and performant Kubernetes deployments.

Security

1. Use Non-Root Containers

2. Network Policies

Restrict pod-to-pod communication:
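An illustrative policy allowing only the ingress controller to reach Elsa Server and restricting its egress to the database, Redis, and DNS; the pod labels and namespaces are assumptions:

```yaml
# elsa-server-networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: elsa-server
  namespace: elsa-workflows
spec:
  podSelector:
    matchLabels:
      app: elsa-server
  policyTypes: ["Ingress", "Egress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - port: 5432
    - to:
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - port: 6379
    - ports:                       # DNS lookups
        - port: 53
          protocol: UDP
```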

3. Secrets Management

Use external secret management:

4. RBAC Configuration

High Availability

1. Multi-Zone Deployment

2. Pod Disruption Budgets

Ensure minimum availability during disruptions:

3. Health Checks

Configure appropriate health checks:
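Building on the probes shown in the Deployment manifest earlier, a startup probe gives slow cold starts room before liveness checks kick in; the /health path remains an assumption:

```yaml
# Container-level probe configuration (excerpt).
startupProbe:
  httpGet:
    path: /health
    port: http
  periodSeconds: 5
  failureThreshold: 30            # allows up to ~150s for startup
readinessProbe:
  httpGet:
    path: /health
    port: http
  periodSeconds: 10
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /health
    port: http
  periodSeconds: 20
  failureThreshold: 3
```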

Resource Management

1. Set Resource Requests and Limits

2. Quality of Service Classes

  • Guaranteed: requests == limits (highest priority)

  • Burstable: requests < limits (medium priority)

  • BestEffort: no requests/limits (lowest priority)

3. Limit Ranges

Backup and Disaster Recovery

1. Regular Backups

2. Velero for Cluster Backups

Monitoring and Alerting

1. Define SLIs/SLOs

| Service | SLI | SLO |
|---------|-----|-----|
| Elsa Server | Request Success Rate | > 99.9% |
| Elsa Server | P95 Latency | < 500ms |
| Elsa Server | Availability | > 99.95% |
| Database | Connection Success | > 99.99% |

2. Alert on SLO Violations

Cost Optimization

1. Right-Size Resources

2. Use Spot/Preemptible Instances

3. Enable Cluster Autoscaler

CI/CD Integration

1. GitOps with ArgoCD

2. CI Pipeline Example

Next Steps

After deploying Elsa Workflows to Kubernetes:

  1. Configure Monitoring: Set up Grafana dashboards and alerts

  2. Test Failure Scenarios: Use chaos engineering tools like Chaos Mesh

  3. Optimize Performance: Profile and tune based on your workload

  4. Implement Backups: Set up automated backup and restore procedures

  5. Security Hardening: Implement network policies, RBAC, and secret rotation

  6. Documentation: Document your specific configuration and runbooks

Community and Support

Version Information

This guide is written for:

  • Elsa Workflows: v3.5+

  • Kubernetes: v1.28+

  • Helm: v3.12+

  • PostgreSQL: 16+

  • Redis: 7+

  • RabbitMQ: 3.12+

Always refer to the official releases for the latest version compatibility information.


Last Updated: 2025-11-20
