
GlassFlow Helm Values Configuration

This comprehensive guide covers all available configuration options in the GlassFlow Helm chart’s values.yaml file. Use this reference to customize your GlassFlow deployment for production environments.

Quick Start: For basic installations, you can use the default values. For production deployments, review the sections below to optimize your configuration.

Global Settings

Global settings apply across all components of the GlassFlow deployment.

```yaml
global:
  # Global image registry - prepended to all image repositories
  imageRegistry: "ghcr.io/glassflow/"

  # Observability configuration
  observability:
    metrics:
      enabled: true    # Enable metrics collection
    logs:
      enabled: false   # Enable log export
      exporter:
        otlp: {}       # OTLP exporter configuration
    otelCollector:
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 512Mi

  # NATS global configuration
  nats:
    # NATS address for operator connection
    # Defaults to {{ .Release.Name }}-nats.{{ .Release.Namespace }}.svc.cluster.local
    address: ""
    stream:
      maxAge: 24h   # Maximum age of messages in streams
      maxBytes: 0   # Maximum size of streams (0 = unlimited)

  # Pipeline namespace configuration
  pipelines:
    namespace:
      auto: true                   # When true, operator creates per-pipeline namespaces (pipeline-<id>)
      name: "glassflow-pipelines"  # Fixed namespace to deploy all pipelines into (when auto is false)
      create: true                 # When auto is false, Helm can optionally create the namespace

  usageStats:
    enabled: true
    installationId: ""
```

Pipeline Namespaces:

  • By default, the operator creates per-pipeline namespaces (pipeline-<id>)
  • To use a fixed namespace for all pipelines, set global.pipelines.namespace.auto: false
  • When auto is false, all pipelines deploy to the namespace specified in global.pipelines.namespace.name
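To pin every pipeline to one namespace, a values override along these lines should work (all keys are from the `global` block above):

```yaml
global:
  pipelines:
    namespace:
      auto: false                  # disable per-pipeline namespaces
      name: "glassflow-pipelines"  # all pipelines deploy here
      create: true                 # let Helm create the namespace if missing
```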

Key Global Settings

| Setting | Description | Default | Production Recommendation |
| --- | --- | --- | --- |
| `imageRegistry` | Global Docker registry prefix | `ghcr.io/glassflow/` | - |
| `observability.metrics.enabled` | Enable metrics collection | `true` | Keep enabled for monitoring |
| `observability.logs.enabled` | Enable log export | `false` | Enable for production monitoring |
| `observability.logs.exporter.otlp` | Your OTLP collector endpoint | `{}` | Configure the OTLP endpoint where GlassFlow will send logs. See OTLP Exporter Configuration for detailed setup |
| `nats.stream.maxAge` | Message retention period | `24h` | Adjust based on your data retention needs |
| `nats.stream.maxBytes` | Maximum stream size | `0` (unlimited) | Set a byte limit based on expected data volume |
| `otelCollector.resources` | OTel collector sidecar resources | 100m/128Mi req, 500m/512Mi lim | Increase if log/metric volume is high |
| `pipelines.namespace.auto` | Create per-pipeline namespaces | `true` | Set to `false` to use a fixed namespace |
| `pipelines.namespace.name` | Fixed namespace for all pipelines | `glassflow-pipelines` | Used when `auto` is `false` |
| `pipelines.namespace.create` | Create namespace if it doesn't exist | `true` | Only applies when `auto` is `false` |
| `usageStats.enabled` | Send anonymous usage statistics | `true` | Set to `false` to opt out |

API Component

Configure the GlassFlow backend API service.

```yaml
api:
  # Scaling configuration
  replicas: 1
  logLevel: "INFO"

  # Container image settings
  image:
    repository: glassflow-etl-be
    tag: v2.11.2
    pullPolicy: IfNotPresent

  # Resource allocation
  resources:
    requests:
      memory: "100Mi"
      cpu: "100m"
    limits:
      memory: "200Mi"
      cpu: "250m"

  # Service configuration
  service:
    type: ClusterIP
    port: 8081
    targetPort: 8081

  # Environment variables
  env: []
```

API Configuration Options

| Setting | Description | Default | Production Recommendation |
| --- | --- | --- | --- |
| `replicas` | Number of API instances | 1 | 1 is sufficient for API operations |
| `logLevel` | Logging verbosity | `INFO` | Use `DEBUG` for troubleshooting |
| `resources.requests` | Minimum resources | 100Mi/100m | Scale based on load |
| `resources.limits` | Maximum resources | 200Mi/250m | Set appropriate limits |

UI Component

Configure the GlassFlow frontend user interface.

```yaml
ui:
  # Scaling configuration
  replicas: 1

  # Container image settings
  image:
    repository: glassflow-etl-fe
    tag: v2.11.2
    pullPolicy: IfNotPresent

  # Resource allocation
  resources:
    requests:
      memory: "512Mi"
      cpu: "100m"
    limits:
      memory: "1Gi"
      cpu: "200m"

  # Service configuration
  service:
    type: ClusterIP
    port: 8080
    targetPort: 8080

  # Environment variables (object format)
  env: {}

  # Kafka Kerberos Gateway sidecar (for connecting to Kerberos-secured Kafka clusters)
  kafkaGateway:
    enabled: true
    image:
      repository: kafka-kerberos-gateway
      tag: latest
      pullPolicy: IfNotPresent
    resources:
      requests:
        memory: "128Mi"
        cpu: "50m"
      limits:
        memory: "256Mi"
        cpu: "200m"
    port: 8082  # Internal port within the UI pod

  # Auth0 authentication configuration
  auth0:
    enabled: false
    profileRoute: "/api/auth/me"
    secret: ""
    appBaseUrl: "http://localhost:8080"
    domain: ""
    issuerBaseUrl: ""
    clientId: ""
    clientSecret: ""
```

Auth0 integration is disabled by default. Set enabled: true and configure the domain, client ID, client secret, and base URL to enable authentication.
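A minimal override to turn Auth0 on might look like the following; the domain and credential values are placeholders taken from a hypothetical Auth0 tenant, not real defaults:

```yaml
ui:
  auth0:
    enabled: true
    appBaseUrl: "https://glassflow.example.com"      # public URL of the UI
    domain: "your-tenant.eu.auth0.com"               # placeholder Auth0 tenant domain
    issuerBaseUrl: "https://your-tenant.eu.auth0.com"
    clientId: "<auth0-client-id>"
    clientSecret: "<auth0-client-secret>"
    secret: "<random-session-secret>"                # session encryption secret
```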

UI Configuration Options

| Setting | Description | Default | Production Recommendation |
| --- | --- | --- | --- |
| `replicas` | Number of UI instances | 1 | 1 is sufficient for the UI pod |
| `resources.requests` | Minimum resources | 512Mi/100m | Frontend typically needs more memory |
| `resources.limits` | Maximum resources | 1Gi/200m | Adjust based on user load |
| `env` | Environment variables (object format) | `{}` | Use object format, not array |
| `kafkaGateway.enabled` | Enable Kafka Kerberos Gateway sidecar | `true` | Enable if connecting to Kerberos-secured Kafka |
| `kafkaGateway.resources` | Gateway resource requests/limits | 128Mi/50m - 256Mi/200m | Adjust based on usage |
| `kafkaGateway.port` | Gateway internal port | 8082 | Internal port within the UI pod |

GlassFlow Operator

Configure the Kubernetes operator that manages ETL pipeline resources in the cluster. The operator chart and code live in a separate repository and are deployed as a dependency chart.

```yaml
glassflow-operator:
  controllerManager:
    replicas: 1
    manager:
      # Maximum duration a reconcile operation can run before timing out
      reconcileTimeout: 15m

      # Operator image configuration
      image:
        repository: glassflow-etl-k8s-operator
        tag: v2.1.0
        pullPolicy: IfNotPresent

      # Resource allocation
      resources:
        requests:
          cpu: 10m
          memory: 64Mi
        limits:
          cpu: 500m
          memory: 128Mi

  # Service account configuration
  serviceAccount:
    annotations: {}

  # ETL component configurations
  glassflowComponents:
    ingestor:
      image:
        repository: glassflow-etl-ingestor
        tag: v2.11.2
      logLevel: "INFO"
      resources:
        requests:
          cpu: 1000m
          memory: 256Mi
        limits:
          cpu: 1500m
          memory: 512Mi
      affinity: {}
    join:
      image:
        repository: glassflow-etl-join
        tag: v2.11.2
      logLevel: "INFO"
      resources:
        requests:
          cpu: 500m
          memory: 256Mi
        limits:
          cpu: 1000m
          memory: 1Gi
      affinity: {}
    sink:
      image:
        repository: glassflow-etl-sink
        tag: v2.11.2
      logLevel: "INFO"
      resources:
        requests:
          cpu: 1000m
          memory: 500Mi
        limits:
          cpu: 1500m
          memory: 1.5Gi
      affinity: {}
    dedup:
      image:
        repository: glassflow-etl-dedup
        tag: v2.11.2
        pullPolicy: IfNotPresent
      logLevel: "INFO"
      resources:
        requests:
          cpu: 1000m
          memory: 1Gi
        limits:
          cpu: 2000m
          memory: 2Gi
      storage:
        size: "40Gi"
        className: ""
      affinity: {}
```

Operator Configuration Options

| Component | CPU Request | Memory Request | CPU Limit | Memory Limit |
| --- | --- | --- | --- | --- |
| Controller Manager | 10m | 64Mi | 500m | 128Mi |
| Ingestor | 1000m | 256Mi | 1500m | 512Mi |
| Join | 500m | 256Mi | 1000m | 1Gi |
| Sink | 1000m | 500Mi | 1500m | 1.5Gi |
| Dedup | 1000m | 1Gi | 2000m | 2Gi |

The dedup.storage field provisions a PersistentVolumeClaim for the deduplication state store. Set storage.size based on your expected deduplication key volume and storage.className to a StorageClass name if you need a specific provisioner.
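For example, a values override like the following enlarges the state store and pins it to a specific provisioner; the `gp3` class name is only an illustration, so use a StorageClass that exists in your cluster:

```yaml
glassflow-operator:
  glassflowComponents:
    dedup:
      storage:
        size: "100Gi"     # sized for a larger deduplication key volume
        className: "gp3"  # example StorageClass name; adjust to your cluster
```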

NATS Configuration

NATS is the messaging system used for internal communication between GlassFlow components. NATS is deployed as a dependency chart from the official NATS Helm charts repository.

```yaml
nats:
  # Enable/disable NATS deployment
  enabled: true

  # NATS configuration
  config:
    # Clustering for high availability
    cluster:
      enabled: true
      port: 6222
      replicas: 3  # Must be 2+ when JetStream is enabled

    # JetStream for persistent messaging
    jetstream:
      enabled: true
      # Memory store (fast, non-persistent)
      memoryStore:
        enabled: false
        maxSize: 1Gi
      # File store (persistent, recommended for production)
      fileStore:
        enabled: true
        dir: /data
        pvc:
          enabled: true
          size: 100Gi
          storageClassName: ""

  # Container resource allocation
  container:
    merge:
      resources:
        requests:
          memory: "3Gi"
          cpu: "4000m"
        limits:
          memory: "3Gi"
          cpu: "4000m"
```

NATS Configuration Options

| Setting | Description | Default | Production Recommendation |
| --- | --- | --- | --- |
| `enabled` | Deploy NATS with GlassFlow | `true` | Use external NATS for large deployments |
| `cluster.replicas` | Number of NATS nodes | 3 | Use 3+ for production |
| `jetstream.fileStore.pvc.size` | Storage size | 100Gi | Scale based on data volume |
| `resources.requests` | Minimum resources | 3Gi/4000m | NATS is I/O and CPU intensive at high throughput |
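Pointing GlassFlow at an external NATS cluster would combine `nats.enabled` with `global.nats.address`, roughly as follows; the address scheme and port here are assumptions, so adjust them to match your NATS deployment:

```yaml
nats:
  enabled: false  # do not deploy the bundled NATS

global:
  nats:
    # Hypothetical endpoint of an externally managed NATS cluster
    address: "nats://my-nats.nats-system.svc.cluster.local:4222"
```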

NATS Prometheus Exporter

The NATS Prometheus exporter collects all NATS-related metrics. These metrics are served together with GlassFlow metrics on the /metrics endpoint. Details on accessing GlassFlow metrics can be found here.

```yaml
natsPrometheusExporter:
  image:
    repository: natsio/prometheus-nats-exporter
    tag: 0.17.3
    pullPolicy: IfNotPresent

  # Metrics to collect
  metrics:
    accstatz: true
    connz: true
    connz_detailed: true
    jsz: true
    gatewayz: true
    leafz: true
    routez: true
    subz: true
    varz: true

  service:
    type: ClusterIP
    port: 80
    targetPort: 7777
    protocol: TCP
    name: http
```

PostgreSQL Configuration

GlassFlow requires PostgreSQL for persisting pipeline definitions, connection credentials, and run history. By default, the chart deploys a single-node PostgreSQL instance. Set postgresql.enabled: false and configure global.postgres.connection_url (or global.postgres.secret) to use an external PostgreSQL instance.

```yaml
postgresql:
  enabled: true
  image:
    repository: postgres
    tag: "17-alpine"
    pullPolicy: IfNotPresent
  replicaCount: 1
  auth:
    enabled: true
    database: "glassflow"
    sslmode: "disable"
    username: "glassflow"
    password: "glassflow123"
    existingSecret:
      enabled: false
      name: ""
      keys:
        usernameKey: username
        passwordKey: password
        databaseKey: database
  service:
    type: ClusterIP
    port: 5432
  persistence:
    enabled: true
    size: 10Gi
    storageClass: ""
  resources:
    requests:
      memory: "512Mi"
      cpu: "100m"
    limits:
      memory: "2Gi"
      cpu: "1000m"
```

Change postgresql.auth.password before deploying to production. For production, prefer auth.existingSecret to avoid storing credentials in values.yaml.
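A Secret for this purpose could look like the sketch below; the Secret name is an example, and the key names match the chart's `existingSecret.keys` defaults (`username`, `password`, `database`):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: glassflow-postgres-credentials  # example name
type: Opaque
stringData:
  username: glassflow
  password: "<strong-password>"
  database: glassflow
```

Then set `postgresql.auth.existingSecret.enabled: true` and `postgresql.auth.existingSecret.name` to the Secret's name.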

PostgreSQL Configuration Options

| Setting | Description | Default | Production Recommendation |
| --- | --- | --- | --- |
| `enabled` | Deploy PostgreSQL with GlassFlow | `true` | Set to `false` to use an external instance |
| `auth.password` | Database password | `glassflow123` | Change before deploying to production |
| `auth.existingSecret.enabled` | Source credentials from a Kubernetes Secret | `false` | Enable for production to avoid plaintext credentials in values.yaml |
| `persistence.size` | PVC size for PostgreSQL data | 10Gi | Scale based on pipeline and connection volume |
| `global.postgres.connection_url` | External PostgreSQL URL | `""` | Set when `postgresql.enabled` is `false` |
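Switching to an external PostgreSQL instance would combine the two settings above, roughly as in this sketch; the hostname and the URL format (including `sslmode`) are illustrative assumptions, so match them to your database:

```yaml
postgresql:
  enabled: false  # skip the bundled single-node PostgreSQL

global:
  postgres:
    # Hypothetical endpoint and credentials -- replace with your own
    connection_url: "postgres://glassflow:<password>@db.example.com:5432/glassflow?sslmode=require"
```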

Notification Service

The notification service sends Slack and email alerts for pipeline events. Notifications are activated by setting global.notifications.enabled: true; the channel details are configured in this section.

```yaml
notificationService:
  replicas: 1
  image:
    repository: glassflow-notifier
    tag: v1.0.1
    pullPolicy: IfNotPresent
  resources:
    requests:
      memory: "100Mi"
      cpu: "100m"
    limits:
      memory: "200Mi"
      cpu: "250m"
  slack:
    enabled: "false"
    webhookUrl: ""
    defaultChannel: "#notifications"
  email:
    enabled: "false"
    smtpHost: ""
    smtpPort: 587
    smtpUsername: ""
    smtpPassword: ""
    fromAddress: ""
    toAddress: ""
```

Notification Service Configuration Options

| Setting | Description | Default | Production Recommendation |
| --- | --- | --- | --- |
| `slack.enabled` | Enable Slack notifications | `"false"` | Set to `"true"` and provide `webhookUrl` |
| `slack.webhookUrl` | Incoming webhook URL | `""` | Required when Slack is enabled |
| `slack.defaultChannel` | Default Slack channel | `#notifications` | Override per-alert in pipeline config |
| `email.enabled` | Enable email notifications | `"false"` | Set to `"true"` and configure SMTP fields |
| `email.smtpHost` | SMTP server host | `""` | Required when email is enabled |
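Enabling Slack alerts might look like the sketch below; note that `enabled` is a string in this chart, and the webhook URL and channel name are placeholders:

```yaml
notificationService:
  slack:
    enabled: "true"  # string value, per the chart defaults
    webhookUrl: "https://hooks.slack.com/services/T000/B000/XXXXXXXX"  # placeholder
    defaultChannel: "#glassflow-alerts"  # example channel
```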

Ingress Configuration

Configure external access to GlassFlow services.

```yaml
ingress:
  # Enable external access
  enabled: false

  # Ingress controller class
  ingressClassName: "nginx"  # or "traefik", "istio"

  # Ingress annotations
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"

  # Host configurations
  hosts:
    - host: "glassflow.example.com"
      paths:
        - path: "/"
          pathType: Prefix
          serviceName: "glassflow-ui"
          servicePort: 8080
        - path: "/api/v1"
          pathType: Prefix
          serviceName: "glassflow-api"
          servicePort: 8081

  # TLS configuration
  tls:
    - hosts:
        - "glassflow.example.com"
      secretName: "glassflow-tls-secret"
```

Ingress Configuration Options

By default, the Helm deployment does not expose GlassFlow to the internet. See Using Ingress for details on enabling external access.

| Setting | Description | Default | Production Recommendation |
| --- | --- | --- | --- |
| `enabled` | Enable external access | `false` | Set to `true` for production |
| `ingressClassName` | Ingress controller | `""` | Specify your controller |
| `hosts` | Domain configurations | `[]` | Configure your domains |
| `tls` | HTTPS configuration | `[]` | Enable for production |

Security Settings

Configure security contexts and service accounts.

```yaml
# Pod security context
podSecurityContext: {}
  # fsGroup: 2000
  # runAsNonRoot: true
  # runAsUser: 1000

# Container security context
securityContext: {}
  # capabilities:
  #   drop:
  #     - ALL
  # readOnlyRootFilesystem: true
  # runAsNonRoot: true
  # runAsUser: 1000

# Service account configuration
serviceAccount:
  create: true
  automount: true
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::ACCOUNT:role/ROLE"
  name: ""
```

Security Configuration Options

| Setting | Description | Default | Production Recommendation |
| --- | --- | --- | --- |
| `podSecurityContext` | Pod-level security context | `{}` | Configure `fsGroup`, `runAsNonRoot`, etc. for proper permissions |
| `securityContext` | Container-level security context | `{}` | Enable `readOnlyRootFilesystem`, `runAsNonRoot`, etc. for hardened containers |
| `serviceAccount.create` | Create service account | `true` | Use existing for production |
| `serviceAccount.automount` | Automount API credentials | `true` | Enable for service account token mounting |
| `serviceAccount.name` | Service account name | `""` | Use custom name if needed |
| `serviceAccount.annotations` | Service account annotations | `{}` | Useful for IAM roles, OIDC providers |

Credential Encryption

GlassFlow can encrypt Kafka and ClickHouse credentials before they are persisted in PostgreSQL. When enabled, the API uses AES-256-GCM to encrypt the following fields at rest:

  • Kafka: SASL password, TLS private key, Kerberos keytab
  • ClickHouse: connection password

Credentials are decrypted transparently at runtime and are never stored or logged in plaintext when encryption is active.

Enable credential encryption for all production deployments. When global.encryption.enabled is false (the default), Kafka and ClickHouse credentials are stored in plaintext in the PostgreSQL connections table. Any user or process with read access to the database can retrieve them without restriction.

How it works

The encryption key is a 32-byte (256-bit) random value stored in a Kubernetes Secret that you create and manage. You must supply the Secret before enabling encryption — the chart does not generate one automatically. This keeps the key lifecycle outside of Helm and prevents accidental key rotation during upgrades.

When global.encryption.enabled is true, global.encryption.existingSecret.name must be set. The chart will fail with a validation error if the field is empty.
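A Secret holding the key could be sketched as below; the Secret name is an example, the key name matches the chart default, and a 32-character value such as the output of `openssl rand -hex 16` would satisfy the 32-byte requirement:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: glassflow-encryption-key  # example name
type: Opaque
stringData:
  encryption-key: "<32-byte-random-string>"  # e.g. output of: openssl rand -hex 16
```

Then set `global.encryption.enabled: true` and `global.encryption.existingSecret.name: glassflow-encryption-key`.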

Configuration

```yaml
global:
  encryption:
    # Set to true to enable AES-256-GCM encryption of credentials in PostgreSQL.
    # When false (default), credentials are stored in plaintext.
    enabled: false

    # Reference an existing Kubernetes Secret that contains the encryption key.
    # Required when enabled=true -- the chart does not generate this Secret.
    existingSecret:
      # Name of the Kubernetes Secret in the same namespace as GlassFlow.
      name: ""
      # Key within the Secret whose value is the 32-byte encryption key.
      key: "encryption-key"
```

Encryption configuration options

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `global.encryption.enabled` | bool | `false` | Enable AES-256-GCM encryption of credentials stored in PostgreSQL |
| `global.encryption.existingSecret.name` | string | `""` | Name of a pre-existing Kubernetes Secret containing the encryption key. Required when `enabled` is `true` |
| `global.encryption.existingSecret.key` | string | `"encryption-key"` | Key inside the Secret whose value is the 32-byte encryption key |

Pod Configuration

Configure pod-level settings for scheduling and labeling.

```yaml
# Pod annotations (useful for monitoring, logging, etc.)
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "9090"
  prometheus.io/path: "/metrics"

# Pod labels
podLabels: {}

# Node selector for main pods (API and UI)
nodeSelector: {}

# Pod tolerations
tolerations: []

# Pod affinity rules
affinity: {}
```

Pod Configuration Options

| Setting | Description | Default | Production Recommendation |
| --- | --- | --- | --- |
| `podAnnotations` | Additional pod annotations | `{}` | Add monitoring annotations |
| `podLabels` | Additional pod labels | `{}` | Useful for service discovery |
| `nodeSelector` | Node selector for scheduling | `{}` | Use for dedicated nodes |
| `tolerations` | Pod tolerations | `[]` | For tainted nodes |
| `affinity` | Pod affinity/anti-affinity rules | `{}` | Control pod placement |

Autoscaling Configuration

Configure horizontal pod autoscaling for the API and UI components.

```yaml
autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80
```

Autoscaling Configuration Options

| Setting | Description | Default | Production Recommendation |
| --- | --- | --- | --- |
| `enabled` | Enable autoscaling | `false` | Enable for production workloads |
| `minReplicas` | Minimum number of replicas | 1 | Set based on minimum load |
| `maxReplicas` | Maximum number of replicas | 5 | Set based on peak load |
| `targetCPUUtilizationPercentage` | Target CPU utilization | 80 | Adjust based on workload |

Best Practices

Production Checklist:

  • Use 3+ NATS replicas for high availability
  • Set appropriate resource requests and limits
  • Enable ingress with TLS
  • Configure persistent storage for NATS
  • Set up monitoring and logging
  • Use node selectors for dedicated resources

Resource Sizing Guidelines

| Environment | API CPU | API Memory | UI CPU | UI Memory | NATS CPU | NATS Memory | NATS Replicas | NATS Storage | Dedup CPU | Dedup Memory |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Development | 50m | 50Mi | 50m | 256Mi | 500m | 1Gi | 1 | 10Gi | 500m | 512Mi |
| Production | 500m | 500Mi | 200m | 1Gi | 4000m | 4Gi | 3 | 100Gi | 1000m | 1Gi |
| High-Performance | 1000m | 1Gi | 500m | 2Gi | 4000m+ | 8Gi | 5 | 500Gi | 2000m | 2Gi |

Monitoring Configuration

```yaml
# Enable comprehensive monitoring
global:
  observability:
    metrics:
      enabled: true
    logs:
      enabled: true
      exporter:
        otlp:
          endpoint: "https://your-otel-collector:4317"
          tls:
            insecure: false

# Add monitoring annotations
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "9090"
  prometheus.io/path: "/metrics"
```

Troubleshooting

Common Issues

  1. NATS Connection Issues

     ```yaml
     # Ensure NATS is properly configured
     nats:
       config:
         cluster:
           replicas: 3  # Must be 2+ for JetStream
     ```
  2. Resource Constraints

     ```yaml
     # Check resource requests vs limits
     resources:
       requests:
         memory: "100Mi"  # Should be realistic
         cpu: "100m"
       limits:
         memory: "200Mi"  # Should be higher than requests
         cpu: "250m"
     ```
  3. Ingress Not Working

     ```yaml
     # Verify ingress configuration
     ingress:
       enabled: true
       ingressClassName: "nginx"  # Must match your controller
       hosts:
         - host: "your-domain.com"
     ```

Validation Commands

```shell
# Validate Helm values
helm template glassflow glassflow/glassflow-etl -f values.yaml --dry-run

# Check resource usage
kubectl top pods -n glassflow

# Verify services
kubectl get svc -n glassflow

# Check ingress
kubectl get ingress -n glassflow
```

Next Steps

After configuring your values.yaml:

  1. Install GlassFlow: `helm install glassflow glassflow/glassflow-etl -f values.yaml`
  2. Verify Installation: Check pod status and service endpoints
  3. Configure Monitoring: Set up Prometheus/Grafana dashboards
  4. Set Up Logging: Configure log aggregation
  5. Test Functionality: Create your first ETL pipeline

For more information, see the Installation Guide and Pipeline JSON Reference.
