Kubernetes Components

GlassFlow consists of the following components running as Kubernetes pods in the glassflow namespace:

Core Application Pods

1. GlassFlow API

  • Pod Name: glassflow-api-*
  • Purpose: Core ETL engine that exposes an API and orchestrates pipelines.
  • Features:
    • Serves as the interface for the UI and the Python client for pipeline management.
    • Exposes a REST API with CRUD operations for pipelines.

2. GlassFlow UI

  • Pod Name: glassflow-ui-*
  • Purpose: Web-based user interface for pipeline management.
  • Features:
    • Intuitive pipeline configuration.
    • Real-time monitoring.
    • User-friendly interface for managing data operations.
    • Responsive web interface.

3. GlassFlow Controller Manager

  • Pod Name: glassflow-controller-manager-*
  • Purpose: Kubernetes operator that manages GlassFlow custom resources.
  • Features:
    • Watches for pipeline custom resources.
    • Manages pipeline lifecycle.
    • Handles scaling and updates.
    • Integrates with Kubernetes API server.

4. GlassFlow Postgres

  • Pod Name: glassflow-postgresql-*
  • Purpose: PostgreSQL database for storing pipeline configuration.

NATS Cluster

5. NATS Server Cluster

  • Pod Names: glassflow-nats-0, glassflow-nats-1, glassflow-nats-2, glassflow-nats-3, glassflow-nats-4
  • Purpose: Distributed message broker and key-value store.
  • Features:
    • JetStream enabled for persistent messaging.
    • High-performance message delivery.
    • Clustering for high availability.
    • Automatic failover capabilities.
    • 5-node cluster for redundancy.

6. NATS Box

  • Pod Name: glassflow-nats-box-*
  • Purpose: NATS utility container for debugging and management.
  • Features:
    • NATS CLI tools.
    • Debugging capabilities.
    • Cluster monitoring utilities.
    • Administrative functions.

Monitoring and Observability

7. OpenTelemetry Collector

  • Pod Name: glassflow-otel-collector-*
  • Purpose: Collects, processes, and exports telemetry data.
  • Features:
    • Metrics collection.
    • Log aggregation.
    • Export to monitoring backends.
    • Exposes Prometheus metrics on the /metrics endpoint at port 9090.
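
Prometheus metrics endpoints serve a line-oriented text format, so the collector's /metrics output can be inspected with a few lines of Python. The snippet below is a minimal sketch rather than an official client: the metric names in the comments are hypothetical, the host in the usage note is a placeholder, and the parser ignores edge cases such as escaped characters in label values.

```python
from urllib.request import urlopen


def fetch_metrics(url, timeout=5.0):
    """Download the raw metrics text from a /metrics endpoint."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_prometheus_text(text):
    """Parse Prometheus text exposition lines into {series: value}.

    The series key keeps the label set, e.g. 'example_total{pipeline="demo"}'
    (a hypothetical metric name used only for illustration).
    """
    samples = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labels may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:]
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        # The first field is the value; an optional timestamp may follow.
        samples[series] = float(fields[0])
    return samples
```

With port 9090 reachable (for example through a port-forward), `parse_prometheus_text(fetch_metrics("http://<collector-host>:9090/metrics"))` would return the current samples, where `<collector-host>` stands for whatever address exposes the collector in your cluster.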

8. Prometheus NATS Exporter

  • Pod Name: glassflow-prometheus-nats-exporter-*
  • Purpose: Exports NATS metrics for Prometheus monitoring.
  • Features:
    • NATS server metrics.
    • JetStream statistics.
    • Connection monitoring.
    • Performance metrics.
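
The pod name patterns listed above can be used to map a running pod back to its component. The sketch below does this by prefix matching and is only an illustration: the generated suffixes in the usage note are hypothetical. Prefixes are checked longest first because glassflow-nats-box-* would otherwise also match the shorter glassflow-nats-* pattern.

```python
# Pod name prefixes taken from the component list in this document.
COMPONENT_PREFIXES = {
    "glassflow-api-": "GlassFlow API",
    "glassflow-ui-": "GlassFlow UI",
    "glassflow-controller-manager-": "GlassFlow Controller Manager",
    "glassflow-postgresql-": "GlassFlow Postgres",
    "glassflow-nats-box-": "NATS Box",
    "glassflow-nats-": "NATS Server Cluster",
    "glassflow-otel-collector-": "OpenTelemetry Collector",
    "glassflow-prometheus-nats-exporter-": "Prometheus NATS Exporter",
}


def classify_pod(pod_name):
    """Return the component a pod belongs to, or None if nothing matches."""
    # Longest prefix first, so "glassflow-nats-box-" wins over "glassflow-nats-".
    for prefix in sorted(COMPONENT_PREFIXES, key=len, reverse=True):
        if pod_name.startswith(prefix):
            return COMPONENT_PREFIXES[prefix]
    return None
```

For example, `classify_pod("glassflow-nats-2")` yields the NATS Server Cluster entry, while a name such as `glassflow-nats-box-5d9f7c-abcde` (suffix invented for illustration) resolves to NATS Box.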

Custom Resources

9. Pipeline Custom Resource Definition (CRD)

  • CRD Name: pipelines.etl.glassflow.io
  • Purpose: Defines the schema for pipeline resources in Kubernetes.
  • Features:
    • Declarative pipeline configuration.
    • Kubernetes-native pipeline management.
    • Integration with controller manager.
    • Version-controlled pipeline definitions.
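
A CRD name always has the form <plural>.<group>, so pipelines.etl.glassflow.io describes a resource with plural name pipelines in the API group etl.glassflow.io. The sketch below derives the Kubernetes API path under which such resources would be listed. The API version (v1alpha1) and the namespace in the usage note are assumptions made for illustration; this page does not state them, so check the installed CRD for the real values.

```python
def crd_collection_path(crd_name, version, namespace=None):
    """Build the Kubernetes API path for listing a custom resource.

    crd_name follows the Kubernetes convention "<plural>.<group>".
    version is NOT given in this document; the caller must supply it.
    """
    plural, _, group = crd_name.partition(".")
    if not plural or not group:
        raise ValueError(f"not a valid CRD name: {crd_name!r}")
    base = f"/apis/{group}/{version}"
    if namespace:
        # Path for namespaced resources.
        return f"{base}/namespaces/{namespace}/{plural}"
    # Path for listing across all namespaces (or cluster-scoped resources).
    return f"{base}/{plural}"
```

Calling `crd_collection_path("pipelines.etl.glassflow.io", "v1alpha1", "glassflow")` returns `/apis/etl.glassflow.io/v1alpha1/namespaces/glassflow/pipelines`, which is the kind of path the controller manager's watch would target under these assumptions.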

Per-Pipeline Resources

10. Pipeline-Specific Namespaces

  • Namespace Pattern: pipeline-{pipeline-name}-{unique-id}
  • Example: pipeline-load-pipeline-1-7c8e
  • Purpose: Isolates each pipeline’s resources for better management and security.
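
The namespace pattern can be built and taken apart with simple string handling. The following sketch assumes that the unique ID never contains a hyphen, which holds for the example above but is not stated as a rule; the function names are hypothetical helpers, not part of GlassFlow.

```python
NAMESPACE_PREFIX = "pipeline-"


def build_pipeline_namespace(pipeline_name, unique_id):
    """Compose a namespace following pipeline-{pipeline-name}-{unique-id}."""
    return f"{NAMESPACE_PREFIX}{pipeline_name}-{unique_id}"


def parse_pipeline_namespace(namespace):
    """Split a pipeline namespace into (pipeline_name, unique_id).

    Assumes the unique ID is the last hyphen-separated segment.
    """
    if not namespace.startswith(NAMESPACE_PREFIX):
        raise ValueError(f"not a pipeline namespace: {namespace!r}")
    remainder = namespace[len(NAMESPACE_PREFIX):]
    # The pipeline name itself may contain hyphens, so split from the right.
    name, separator, unique_id = remainder.rpartition("-")
    if not separator or not name or not unique_id:
        raise ValueError(f"not a pipeline namespace: {namespace!r}")
    return name, unique_id
```

Applied to the example, `parse_pipeline_namespace("pipeline-load-pipeline-1-7c8e")` gives the pipeline name `load-pipeline-1` and the unique ID `7c8e`.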

11. Ingestor StatefulSet

  • StatefulSet Name: ingestor-{partition-id}
  • Purpose: Consumes data from external sources such as Kafka.
  • Features:
    • Horizontal scaling (5 replicas in example).
    • Partition-based processing.
    • Fault tolerance with multiple replicas.

12. Transformation StatefulSet

  • StatefulSet Name: dedup-{partition-id}
  • Purpose: Transforms, filters, and deduplicates data.
  • Features:
    • Horizontal scaling (5 replicas in example).
    • If deduplication is enabled, uses BadgerDB to track seen records on disk and in memory.
    • If filtering is enabled, uses expr to filter data.
    • If stateless transformation is enabled, uses expr to transform data.

13. Sink StatefulSet

  • StatefulSet Name: sink-{partition-id}
  • Purpose: Writes processed data to ClickHouse.
  • Features:
    • Horizontal scaling (5 replicas in example).
    • Handles final data persistence.
    • Error handling and retry logic.
    • Connection pooling for efficiency.

Namespace and Resources

GlassFlow Namespace

  • Namespace: glassflow
  • Purpose: Isolates GlassFlow components from other Kubernetes workloads.
  • Features:
    • Resource isolation.
    • Network policies.
    • RBAC configuration.
    • Resource quotas.

High Availability Features

NATS Clustering

  • Cluster Size: 3 or 5 nodes
  • Benefits:
    • Fault tolerance.
    • Load distribution.
    • Automatic failover.
    • Data replication.
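
The odd cluster sizes follow from majority quorum: JetStream clustering is Raft-based, so a group of n members needs floor(n/2) + 1 of them available to make progress. The arithmetic below is a small sketch of that rule; it describes the general quorum calculation and does not inspect an actual GlassFlow deployment.

```python
def quorum(members):
    """Smallest majority of a group with the given number of members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can be lost while a majority remains."""
    return members - quorum(members)


for size in (3, 5):
    print(f"{size} nodes: quorum {quorum(size)}, "
          f"tolerates {tolerated_failures(size)} failure(s)")
```

A 3-node cluster therefore survives the loss of one node and a 5-node cluster the loss of two. An even size adds no tolerance: 4 nodes still tolerate only one failure.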

Resource Management

Resource Requests and Limits

  • Each pod has defined CPU and memory requests/limits.
  • NATS cluster uses persistent volumes for data storage.
  • Monitoring components have minimal resource requirements.

Storage

  • NATS Data: Persistent volumes for JetStream storage.
  • Logs: Persistent volumes for application logs.
  • Configuration: ConfigMaps and Secrets for configuration management.

Networking

Service Discovery

  • Internal service communication via Kubernetes DNS.
  • NATS cluster communication via headless services.
  • External access via LoadBalancer or Ingress.
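
With a headless service, each StatefulSet pod gets a stable DNS name of the form <pod>.<service>.<namespace>.svc.<cluster-domain>. The sketch below generates those names for the NATS pods. The headless service name glassflow-nats-headless is a hypothetical placeholder, and cluster.local is the Kubernetes default cluster domain, which a cluster may override.

```python
def nats_pod_hosts(replicas, service, namespace="glassflow",
                   statefulset="glassflow-nats",
                   cluster_domain="cluster.local"):
    """Return the per-pod DNS names behind a headless service.

    service is the headless Service name; this document does not name it,
    so the caller must supply the real value.
    """
    return [
        f"{statefulset}-{index}.{service}.{namespace}.svc.{cluster_domain}"
        for index in range(replicas)
    ]


# Hypothetical service name, used only for illustration.
for host in nats_pod_hosts(3, "glassflow-nats-headless"):
    print(host)
```

Stable per-pod names like these let each NATS server address its peers directly instead of going through a load-balanced virtual IP.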

Security

  • RBAC policies for component access.
  • Network policies for traffic isolation.
  • TLS encryption for NATS cluster communication.