Usage Guide
This guide walks you through creating and managing data pipelines with GlassFlow, covering everything from initial setup to monitoring your pipeline's performance.
Prerequisites
Before creating your pipeline, ensure you have:
- GlassFlow running locally (see the Installation Guide)
- Access to your Kafka cluster
- Access to your ClickHouse database
- The following information ready:
  - Kafka connection details
  - ClickHouse connection details
  - Source topic names
  - Target table names
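For reference, the details you will need typically look like the following. This is only an illustrative sketch; every value here is a placeholder for your own setup, not a default:

```python
# Illustrative placeholders only -- substitute the values for your own setup.
kafka_connection = {
    "bootstrap_servers": "kafka-broker-1:9092",  # Kafka connection details
    "security_protocol": "PLAINTEXT",            # or SASL_SSL, depending on your cluster
    "topics": ["user_events"],                   # source topic names
}

clickhouse_connection = {
    "host": "clickhouse.internal",  # ClickHouse connection details
    "port": 8123,                   # HTTP port; the native protocol uses 9000
    "database": "analytics",
    "table": "user_events",         # target table name
    "username": "default",
    "password": "<your-password>",
}
```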
Creating a Pipeline
GlassFlow provides two ways to create a pipeline, each suited for different use cases and preferences:
1. Web UI (Recommended for Beginners)
The Web UI approach uses a visual wizard that guides you through each step of pipeline creation. This method is ideal for:
- Creating your first pipeline and understanding the components
- Visual learners who prefer a guided interface
- Quick prototyping and experimentation
The wizard walks you through:
- Configuring Kafka connections and topic selection
- Setting up transformations (deduplication, joins)
- Configuring ClickHouse connections and table mapping
- Field mapping and data type configuration
2. Python SDK (Recommended for Advanced Users)
The Python SDK approach allows for programmatic creation of pipelines. This method is ideal for:
- Developers who prefer code-based configuration
- Automated pipeline deployment and CI/CD integration
- Complex pipeline configurations that benefit from version control
- Integration with existing Python-based data workflows
The SDK provides:
- Type-safe pipeline configuration
- Integration with Python data processing libraries
- Version control and code review capabilities
- Automated testing and validation
Learn how to use the Python SDK →
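To give a feel for the code-based workflow, here is a minimal sketch of what programmatic pipeline creation can look like. The client class, method names, and configuration keys below are illustrative assumptions, not the SDK's actual interface; see the linked Python SDK guide for the real API:

```python
# Hypothetical sketch -- names and config keys are assumptions, not the real SDK API.
from glassflow import PipelineClient  # hypothetical import path

client = PipelineClient(host="http://localhost:8080")  # assumed local GlassFlow instance

# Assumed configuration shape: a Kafka source, optional transformations,
# and a ClickHouse sink, mirroring the steps in the Web UI wizard.
pipeline = client.create_pipeline(
    name="user-events-to-clickhouse",
    source={
        "type": "kafka",
        "bootstrap_servers": "kafka-broker-1:9092",
        "topic": "user_events",
    },
    transformations=[
        {"type": "deduplicate", "key": "event_id", "window": "1h"},
    ],
    sink={
        "type": "clickhouse",
        "host": "clickhouse.internal",
        "database": "analytics",
        "table": "user_events",
    },
)
print(pipeline.id)  # assumed: the returned object exposes a pipeline ID
```

Because the configuration is plain code, it can live in version control, go through code review, and be validated in CI before deployment.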
Both approaches create the same underlying pipeline configuration and can be used interchangeably based on your workflow preferences.
Verifying Data Flow
1. Check Kafka Topics
- Verify data is being produced
- Check message format
- Monitor topic health
2. Monitor ClickHouse
- Verify data arrival
- Check data quality
- Monitor table growth
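You can verify arrival and table growth with a couple of queries, for example using the clickhouse-connect driver. The host, credentials, and table name below are examples:

```python
# Spot-check the target table with clickhouse-connect (pip install clickhouse-connect).
import clickhouse_connect

client = clickhouse_connect.get_client(
    host="localhost", port=8123, username="default", password=""
)

# A growing row count is a coarse signal that data is arriving.
count = client.query("SELECT count() FROM analytics.user_events").result_rows[0][0]
print(f"rows: {count}")

# Peek at a few rows to sanity-check data quality.
for row in client.query("SELECT * FROM analytics.user_events LIMIT 5").result_rows:
    print(row)
```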
3. Monitor the Pipeline Logs
Pipeline logs are available via docker logs. To follow the logs in real time for all containers, run:

```bash
docker compose logs -f
```

To follow the logs for the backend app:

```bash
docker compose logs app -f
```

To follow the logs for the UI:

```bash
docker compose logs ui -f
```