Kubernetes Components
GlassFlow consists of the following components running as Kubernetes pods in the glassflow namespace:
Core Application Pods
1. GlassFlow API
- Pod Name: glassflow-api-*
- Purpose: Core ETL engine that provides an API and orchestrates the pipeline.
- Features:
- Interface to the UI and Python client for pipeline management.
- Provides a REST API with CRUD operations for pipelines.
2. GlassFlow UI
- Pod Name: glassflow-ui-*
- Purpose: Web-based user interface for pipeline management.
- Features:
- Intuitive pipeline configuration.
- Real-time monitoring.
- User-friendly interface for managing data operations.
- Responsive web interface.
3. GlassFlow Controller Manager
- Pod Name: glassflow-controller-manager-*
- Purpose: Kubernetes operator that manages GlassFlow custom resources.
- Features:
- Watches for pipeline custom resources.
- Manages pipeline lifecycle.
- Handles scaling and updates.
- Integrates with Kubernetes API server.
4. GlassFlow Postgres
- Pod Name: glassflow-postgresql-*
- Purpose: PostgreSQL database for storing pipeline configuration.
NATS Cluster
5. NATS Server Cluster
- Pod Names: glassflow-nats-0, glassflow-nats-1, glassflow-nats-2, glassflow-nats-3, glassflow-nats-4
- Purpose: Distributed message broker and key-value store.
- Features:
- JetStream enabled for persistent messaging.
- High-performance message delivery.
- Clustering for high availability.
- Automatic failover capabilities.
- 5-node cluster for redundancy.
6. NATS Box
- Pod Name: glassflow-nats-box-*
- Purpose: NATS utility container for debugging and management.
- Features:
- NATS CLI tools.
- Debugging capabilities.
- Cluster monitoring utilities.
- Administrative functions.
Monitoring and Observability
7. OpenTelemetry Collector
- Pod Name: glassflow-otel-collector-*
- Purpose: Collects, processes, and exports telemetry data.
- Features:
- Metrics collection.
- Log aggregation.
- Export to monitoring backends.
- Exposes Prometheus metrics on the /metrics endpoint at port 9090.
8. Prometheus NATS Exporter
- Pod Name: glassflow-prometheus-nats-exporter-*
- Purpose: Exports NATS metrics for Prometheus monitoring.
- Features:
- NATS server metrics.
- JetStream statistics.
- Connection monitoring.
- Performance metrics.
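The pod name patterns listed so far can be used to group pods by component, for example when scanning a pod listing for the glassflow namespace. The following is a minimal sketch and not part of GlassFlow: the classify_pod helper and its component labels are hypothetical, and it assumes pod names start with exactly the prefixes documented here.

```python
# Hypothetical helper: maps a pod name to its component using only the
# name prefixes documented in this section.
COMPONENT_PREFIXES = [
    # More specific prefixes come first so that "glassflow-nats-box-"
    # is not mistaken for a NATS server pod.
    ("glassflow-prometheus-nats-exporter-", "Prometheus NATS Exporter"),
    ("glassflow-controller-manager-", "GlassFlow Controller Manager"),
    ("glassflow-otel-collector-", "OpenTelemetry Collector"),
    ("glassflow-postgresql-", "GlassFlow Postgres"),
    ("glassflow-nats-box-", "NATS Box"),
    ("glassflow-nats-", "NATS Server Cluster"),
    ("glassflow-api-", "GlassFlow API"),
    ("glassflow-ui-", "GlassFlow UI"),
]


def classify_pod(pod_name):
    """Return the component label for a pod name, or None if no prefix matches."""
    for prefix, component in COMPONENT_PREFIXES:
        if pod_name.startswith(prefix):
            return component
    return None


print(classify_pod("glassflow-nats-0"))  # NATS Server Cluster
```

The order of the prefix list matters: a plain "glassflow-nats-" check placed first would also match the NATS Box and would hide it behind the server cluster label.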
Custom Resources
9. Pipeline Custom Resource Definition (CRD)
- CRD Name: pipelines.etl.glassflow.io
- Purpose: Defines the schema for pipeline resources in Kubernetes.
- Features:
- Declarative pipeline configuration.
- Kubernetes-native pipeline management.
- Integration with controller manager.
- Version-controlled pipeline definitions.
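Kubernetes CRD names follow the form plural.group, so the CRD name above splits into the resource plural "pipelines" and the API group "etl.glassflow.io". The sketch below illustrates that split and how a collection path for the Kubernetes API could be built from it. The helper names are hypothetical, the API version is not stated in this document (the value "v1alpha1" used in the comment is a placeholder only), and the path assumes the resource is namespaced.

```python
def split_crd_name(crd_name):
    """Split a CRD name of the form <plural>.<group> into (plural, group)."""
    plural, _, group = crd_name.partition(".")
    if not plural or not group:
        raise ValueError(f"not a valid CRD name: {crd_name!r}")
    return plural, group


def collection_path(crd_name, version, namespace):
    """Build the API path for listing custom resources in one namespace.

    Assumption: the resource is namespaced. The version must be read from
    the installed CRD (for example "v1alpha1" as a placeholder).
    """
    plural, group = split_crd_name(crd_name)
    return f"/apis/{group}/{version}/namespaces/{namespace}/{plural}"


plural, group = split_crd_name("pipelines.etl.glassflow.io")
print(plural, group)  # pipelines etl.glassflow.io
```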
Per-Pipeline Resources
10. Pipeline-Specific Namespaces
- Namespace Pattern: pipeline-{pipeline-name}-{unique-id}
- Example: pipeline-load-pipeline-1-7c8e
- Purpose: Isolates each pipeline’s resources for better management and security.
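To make the namespace pattern concrete, the sketch below builds and parses names of this form. Both helpers are hypothetical. Because pipeline names may themselves contain hyphens, the parser assumes the unique ID is the final hyphen-separated segment and contains no hyphens itself, which holds for the example above but is not confirmed by this document.

```python
PREFIX = "pipeline-"


def build_namespace(pipeline_name, unique_id):
    """Compose a namespace following pipeline-{pipeline-name}-{unique-id}."""
    return f"{PREFIX}{pipeline_name}-{unique_id}"


def parse_namespace(namespace):
    """Split a pipeline namespace into (pipeline_name, unique_id).

    Assumption: the unique ID is the last hyphen-separated segment.
    """
    if not namespace.startswith(PREFIX):
        raise ValueError(f"not a pipeline namespace: {namespace!r}")
    rest = namespace[len(PREFIX):]
    name, sep, unique_id = rest.rpartition("-")
    if not sep or not name or not unique_id:
        raise ValueError(f"missing name or unique ID: {namespace!r}")
    return name, unique_id


print(parse_namespace("pipeline-load-pipeline-1-7c8e"))  # ('load-pipeline-1', '7c8e')
```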
11. Ingestor StatefulSet
- StatefulSet Name: ingestor-{partition-id}
- Purpose: Consumes data from external sources (Kafka, etc.).
- Features:
- Horizontal scaling (5 replicas in example).
- Partition-based processing.
- Fault tolerance with multiple replicas.
12. Transformation StatefulSet
- StatefulSet Name: dedup-{partition-id}
- Purpose: Transforms, filters, and deduplicates data.
- Features:
- Horizontal scaling (5 replicas in example).
- If deduplication is enabled, uses BadgerDB to deduplicate on disk and in memory.
- If filtering is enabled, uses expr to filter data.
- If stateless transformation is enabled, uses expr to transform data.
13. Sink StatefulSet
- StatefulSet Name: sink-{partition-id}
- Purpose: Writes processed data to ClickHouse.
- Features:
- Horizontal scaling (5 replicas in example).
- Handles final data persistence.
- Error handling and retry logic.
- Connection pooling for efficiency.
Namespace and Resources
GlassFlow Namespace
- Namespace: glassflow
- Purpose: Isolates GlassFlow components from other Kubernetes workloads.
- Features:
- Resource isolation.
- Network policies.
- RBAC configuration.
- Resource quotas.
High Availability Features
NATS Clustering
- Cluster Size: 3 or 5 nodes
- Benefits:
- Fault tolerance.
- Load distribution.
- Automatic failover.
- Data replication.
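The fault tolerance of a cluster depends on its size. JetStream clustering is based on Raft, which needs a majority of nodes (a quorum) to stay available. The sketch below works through that arithmetic for the 3-node and 5-node sizes; the function names are illustrative only, and it assumes a simple majority quorum across all cluster nodes.

```python
def quorum(nodes):
    """Smallest number of nodes that forms a majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """Number of nodes that can fail while a majority remains."""
    return nodes - quorum(nodes)


for n in (3, 5):
    print(n, quorum(n), tolerated_failures(n))
# 3 2 1
# 5 3 2
```

Under this assumption, a 3-node cluster survives the loss of one node and a 5-node cluster survives the loss of two.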
Resource Management
Resource Requests and Limits
- Each pod has defined CPU and memory requests/limits.
- NATS cluster uses persistent volumes for data storage.
- Monitoring components have minimal resource requirements.
Storage
- NATS Data: Persistent volumes for JetStream storage.
- Logs: Persistent volumes for application logs.
- Configuration: ConfigMaps and Secrets for configuration management.
Networking
Service Discovery
- Internal service communication via Kubernetes DNS.
- NATS cluster communication via headless services.
- External access via LoadBalancer or Ingress.
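Kubernetes DNS gives each Service a name of the form service.namespace.svc.cluster-domain, and pods behind a headless Service are additionally reachable as pod.service.namespace.svc.cluster-domain. The sketch below composes such names. The Service names used in it ("glassflow-api", "glassflow-nats-headless") are hypothetical and should be checked against the Services actually deployed; "cluster.local" is the Kubernetes default cluster domain and may differ in a given cluster.

```python
def service_dns(service, namespace, cluster_domain="cluster.local"):
    """Fully qualified DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def headless_pod_dns(pod, service, namespace, cluster_domain="cluster.local"):
    """Fully qualified DNS name of one pod behind a headless Service."""
    return f"{pod}.{service_dns(service, namespace, cluster_domain)}"


# Hypothetical Service names, for illustration only.
print(service_dns("glassflow-api", "glassflow"))
print(headless_pod_dns("glassflow-nats-0", "glassflow-nats-headless", "glassflow"))
```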
Security
- RBAC policies for component access.
- Network policies for traffic isolation.
- TLS encryption for NATS cluster communication.