Architecture Overview
This page provides an overview of the current architecture of the GlassFlow ClickHouse ETL. The system orchestrates data pipelines that ingest, deduplicate, join, and sink data from Kafka into ClickHouse, using a modular, scalable, event-driven design.
Architecture Components
For a detailed description of each component, see the System Components page.
Data Flow Summary
- Pipeline Creation: The user creates a pipeline via the frontend; the request is handled by the HTTP server and validated by the Pipeline Manager.
- Ingestion: The NATS-Kafka Bridge ingests data from Kafka, optionally deduplicating events.
- Processing: If enabled, the Join Operator merges streams from two Kafka topics.
- Sink: The Sink Operator writes processed data to ClickHouse in batches.
- Monitoring: The user monitors pipeline status via the frontend.
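The pipeline-creation step can be sketched as a configuration document plus the kind of validation a pipeline manager might perform. This is a minimal illustration only: the field names (`pipeline_id`, `source`, `join`, `sink`, and so on) are hypothetical and do not reflect the actual GlassFlow API schema.

```python
# Hypothetical pipeline configuration; all field names are illustrative,
# not the real GlassFlow ClickHouse ETL schema.
pipeline_config = {
    "pipeline_id": "orders-etl",
    "source": {
        "type": "kafka",
        "topics": ["orders", "customers"],
        "deduplication": {"enabled": True, "id_field": "event_id", "window": "1h"},
    },
    "join": {
        "enabled": True,
        "left_topic": "orders",
        "right_topic": "customers",
        "key": "customer_id",
    },
    "sink": {"type": "clickhouse", "table": "orders_enriched", "max_batch_size": 1000},
}


def validate(config: dict) -> list[str]:
    """Minimal validation, mimicking checks a Pipeline Manager might run."""
    errors = []
    if not config.get("pipeline_id"):
        errors.append("pipeline_id is required")
    if config.get("source", {}).get("type") != "kafka":
        errors.append("source.type must be 'kafka'")
    if config.get("join", {}).get("enabled") and len(
        config.get("source", {}).get("topics", [])
    ) != 2:
        errors.append("join requires exactly two source topics")
    if config.get("sink", {}).get("type") != "clickhouse":
        errors.append("sink.type must be 'clickhouse'")
    return errors


errors = validate(pipeline_config)
print("valid" if not errors else errors)  # prints "valid"
```

In the real system this configuration would be submitted via the frontend to the HTTP server; rejecting an invalid pipeline up front keeps bad configurations from ever reaching the Kafka bridge or the sink.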
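The ingestion and sink steps above combine two standard techniques: key-based deduplication over a bounded window, and grouping records into fixed-size batches before writing. The sketch below shows both in plain Python under simplifying assumptions (an in-memory key cache standing in for the bridge's deduplication state, and a generic `batch` helper standing in for the Sink Operator's batching); none of these names come from the GlassFlow codebase.

```python
from collections import OrderedDict


class Deduplicator:
    """Drop events whose key was already seen, within a bounded key window."""

    def __init__(self, max_keys: int = 10_000):
        self.seen = OrderedDict()
        self.max_keys = max_keys

    def is_duplicate(self, key) -> bool:
        if key in self.seen:
            return True
        self.seen[key] = True
        if len(self.seen) > self.max_keys:
            self.seen.popitem(last=False)  # evict the oldest key
        return False


def batch(events, size):
    """Group events into lists of at most `size`, flushing the remainder."""
    buf = []
    for event in events:
        buf.append(event)
        if len(buf) >= size:
            yield buf
            buf = []
    if buf:
        yield buf


# Six events whose ids repeat: only the first occurrence of each id survives.
dedup = Deduplicator()
events = [{"event_id": i % 3, "v": i} for i in range(6)]
unique = [e for e in events if not dedup.is_duplicate(e["event_id"])]
batches = list(batch(unique, 2))
print(batches)
```

Bounding the key cache matters in practice: an unbounded seen-set grows forever on a long-lived stream, so real deduplicators expire keys by time window or cache size, which is why the configuration above carries a `window` setting.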