Usage Guide
This guide walks you through creating and managing data pipelines with GlassFlow, covering everything from initial setup to monitoring your pipeline's performance.
Prerequisites
Before creating your pipeline, ensure you have:
- GlassFlow running locally (see the Installation Guide)
- Access to your Kafka cluster
- Access to your ClickHouse database
- The following information ready:
  - Kafka connection details
  - ClickHouse connection details
  - Source topic names
  - Target table names
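For reference, the details you will need typically look like the following. This is only an illustrative sketch; every value here is a placeholder for your own setup, not a default:

```python
# Illustrative placeholders only -- substitute the values for your own setup.
kafka_connection = {
    "bootstrap_servers": "kafka-broker-1:9092",  # Kafka connection details
    "security_protocol": "PLAINTEXT",            # or SASL_SSL, depending on your cluster
    "topics": ["user_events"],                   # source topic names
}

clickhouse_connection = {
    "host": "clickhouse.internal",  # ClickHouse connection details
    "port": 8123,                   # HTTP port; the native protocol uses 9000
    "database": "analytics",
    "table": "user_events",         # target table name
    "username": "default",
    "password": "<your-password>",
}
```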
Creating a Pipeline
GlassFlow provides two ways to create a pipeline, each suited for different use cases and preferences:
1. Web UI (Recommended for Beginners)
The Web UI approach uses a visual wizard that guides you through each step of pipeline creation. This method is ideal for:
- Creating your first pipeline and understanding the components
- Visual learners who prefer a guided interface
- Quick prototyping and experimentation
The wizard walks you through:
- Configuring Kafka connections and topic selection
- Setting up transformations (deduplication, joins)
- Configuring ClickHouse connections and table mapping
- Field mapping and data type configuration
2. Python SDK (Recommended for Advanced Users)
The Python SDK approach allows for programmatic creation of pipelines. This method is ideal for:
- Developers who prefer code-based configuration
- Automated pipeline deployment and CI/CD integration
- Complex pipeline configurations that benefit from version control
- Integration with existing Python-based data workflows
The SDK provides:
- Type-safe pipeline configuration
- Integration with Python data processing libraries
- Version control and code review capabilities
- Automated testing and validation
Learn how to use the Python SDK →
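To give a feel for the code-based workflow, here is a minimal sketch of what programmatic pipeline creation can look like. The client class, method names, and configuration keys below are illustrative assumptions, not the SDK's actual interface; see the linked Python SDK guide for the real API:

```python
# Hypothetical sketch -- names and config keys are assumptions, not the real SDK API.
from glassflow import PipelineClient  # hypothetical import path

client = PipelineClient(host="http://localhost:8080")  # assumed local GlassFlow instance

# Assumed configuration shape: a Kafka source, optional transformations,
# and a ClickHouse sink, mirroring the steps in the Web UI wizard.
pipeline = client.create_pipeline(
    name="user-events-to-clickhouse",
    source={
        "type": "kafka",
        "bootstrap_servers": "kafka-broker-1:9092",
        "topic": "user_events",
    },
    transformations=[
        {"type": "deduplicate", "key": "event_id", "window": "1h"},
    ],
    sink={
        "type": "clickhouse",
        "host": "clickhouse.internal",
        "database": "analytics",
        "table": "user_events",
    },
)
print(pipeline.id)  # assumed: the returned object exposes a pipeline ID
```

Because the configuration is plain code, it can live in version control, go through code review, and be validated in CI before deployment.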
Both approaches create the same underlying pipeline configuration and can be used interchangeably based on your workflow preferences.
Verifying Data Flow
1. Check Kafka Topics
- Verify data is being produced
- Check message format
- Monitor topic health
2. Monitor ClickHouse
- Verify data arrival
- Check data quality
- Monitor table growth
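You can verify arrival and table growth with a couple of queries, for example using the clickhouse-connect driver. The host, credentials, and table name below are examples:

```python
# Spot-check the target table with clickhouse-connect (pip install clickhouse-connect).
import clickhouse_connect

client = clickhouse_connect.get_client(
    host="localhost", port=8123, username="default", password=""
)

# A growing row count is a coarse signal that data is arriving.
count = client.query("SELECT count() FROM analytics.user_events").result_rows[0][0]
print(f"rows: {count}")

# Peek at a few rows to sanity-check data quality.
for row in client.query("SELECT * FROM analytics.user_events LIMIT 5").result_rows:
    print(row)
```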
3. Monitor the Pipeline Logs
Pipeline logs are available via docker logs. To follow the logs in real time for all containers, run:

```bash
docker compose logs -f
```

To follow the logs for the backend app:

```bash
docker compose logs app -f
```

To follow the logs for the UI:

```bash
docker compose logs ui -f
```