Run a demo pipeline
GlassFlow comes with a demo environment for testing its capabilities locally. This guide walks you through a local installation using Docker Compose, which spins up local Kafka, GlassFlow, and ClickHouse instances and shows how deduplication works in GlassFlow. It is well suited to development, testing, or trying out GlassFlow on your machine.
Demo Overview
The demo environment provides two ways to interact with GlassFlow:
- Through the GlassFlow UI: Connect directly to local Kafka and ClickHouse instances
- Through Python Scripts: Use our Python SDK to automate pipeline setup and event generation
Prerequisites
Before starting, ensure you have:
- Docker and Docker Compose
- Python 3.8+ (for Python demos)
- pip (Python package manager)
Setting Up the Demo Environment
Navigate to the demo directory:
cd demos
Start the local infrastructure:
docker compose up -d
This will start the following services:
- Kafka (ports 9092 - external, 9093 - internal)
- ClickHouse (ports 8123 - HTTP, 9000 - Native)
- GlassFlow ClickHouse ETL application (port 8080)
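Before moving on, it can help to confirm that the published ports accept connections. The following is a minimal sketch using only the Python standard library; it assumes the services are published on `localhost` and uses the port numbers from the list above. The helper names `is_port_open` and `check_services` are illustrative and not part of GlassFlow.

```python
import socket

# Ports taken from the service list above. The Kafka internal port (9093)
# is left out because it is meant for container-to-container traffic.
SERVICES = {
    "Kafka (external)": 9092,
    "ClickHouse (HTTP)": 8123,
    "ClickHouse (native)": 9000,
    "GlassFlow": 8080,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'up' if ok else 'not reachable'}")
```

An open port only shows that something is listening; it does not prove the service has finished starting, so a short wait and retry may still be needed.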
Option 1: Using the GlassFlow UI
Create Kafka Topics
# Create a new Kafka topic
docker compose exec kafka kafka-topics \
--topic users \
--create \
--partitions 1 \
--replication-factor 1 \
--bootstrap-server localhost:9092
Create ClickHouse Table
docker compose exec clickhouse clickhouse-client \
--user default \
--password secret \
--query "
CREATE TABLE IF NOT EXISTS users_dedup (
event_id UUID,
user_id UUID,
name String,
email String,
created_at DateTime,
tags Array(String)
) ENGINE = MergeTree
ORDER BY event_id"
Generate a test event in Kafka to help you create the pipeline in the UI
# Send a single JSON event to Kafka
echo '{"event_id": "49a6fdd6f305428881f3436eb498fc9d", "user": {"id": "8db09a6aa33a46f6bdabe4683a34ac4d", "name": "John Doe", "email": "[email protected]"}, "created_at": "2024-03-20T10:00:00Z", "tags": ["tag1", "tag222"]}' |
docker compose exec -T kafka kafka-console-producer \
--topic users \
--bootstrap-server localhost:9092
Configure Pipeline in UI
Access the GlassFlow UI at http://localhost:8080
and use these connection details to create a deduplication pipeline:
Kafka Connection
Authentication Method: No Authentication
Security Protocol: PLAINTEXT
Bootstrap Servers: kafka:9093
Kafka Topic
Topic Name: users
Consumer Group Initial Offset: latest
Schema:
{
"event_id": "49a6fdd6f305428881f3436eb498fc9d",
"user": {
"id": "8db09a6aa33a46f6bdabe4683a34ac4d",
"name": "Jane Smith",
"email": "[email protected]"
},
"created_at": "2024-03-20T10:03:00Z",
"tags": ["tag13", "tag324"]
}
Deduplication
Enabled: true
Deduplicate Key: event_id
Deduplicate Key Type: string
Time Window: 1h
ClickHouse Connection
Host: clickhouse
HTTP/S Port: 8123
Native Port: 9000
Username: default
Password: secret
Use SSL: false
ClickHouse Table
Table: users_dedup
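To make the Deduplication settings above more concrete, here is a small conceptual sketch of time-window deduplication: the first event for a key is kept, and later events with the same key inside the window are dropped. This is an illustration only, not GlassFlow's implementation. It assumes events arrive in time order and, for simplicity, uses the `created_at` field as the clock; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    """Parse timestamps in the format used by the sample events."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def deduplicate(events, key="event_id", window=timedelta(hours=1)):
    """Keep the first event per key; drop repeats seen within the window."""
    first_seen = {}  # key value -> timestamp of the event that was kept
    kept = []
    for event in events:
        ts = parse_ts(event["created_at"])
        seen_at = first_seen.get(event[key])
        if seen_at is not None and ts - seen_at < window:
            continue  # duplicate inside the window: skip it
        first_seen[event[key]] = ts
        kept.append(event)
    return kept


if __name__ == "__main__":
    # Same event IDs and timestamps as the three events sent in this guide.
    events = [
        {"event_id": "49a6fdd6f305428881f3436eb498fc9d", "created_at": "2024-03-20T10:00:00Z"},
        {"event_id": "49a6fdd6f305428881f3436eb498fc9d", "created_at": "2024-03-20T10:01:00Z"},
        {"event_id": "f0ed455046a543459d9a51502cdc756d", "created_at": "2024-03-20T10:03:00Z"},
    ]
    print(len(deduplicate(events)))  # prints 2
```

With a 1h window, the second event shares an `event_id` with the first and arrives one minute later, so only two of the three events survive.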
Send data to Kafka
# Send multiple JSON events to Kafka
echo '{"event_id": "49a6fdd6f305428881f3436eb498fc9d", "user": {"id": "8db09a6aa33a46f6bdabe4683a34ac4d", "name": "John Doe", "email": "[email protected]"}, "created_at": "2024-03-20T10:00:00Z", "tags": ["tag1", "tag222"]}
{"event_id": "49a6fdd6f305428881f3436eb498fc9d", "user": {"id": "8db09a6aa33a46f6bdabe4683a34ac4d", "name": "John Doe", "email": "[email protected]"}, "created_at": "2024-03-20T10:01:00Z", "tags": ["tag1", "tag222"]}
{"event_id": "f0ed455046a543459d9a51502cdc756d", "user": {"id": "a7f93b87e29c4978848731e204e47e97", "name": "Jane Smith", "email": "[email protected]"}, "created_at": "2024-03-20T10:03:00Z", "tags": ["tag13", "tag324"]}' |
docker compose exec -T kafka kafka-console-producer \
--topic users \
--bootstrap-server localhost:9092
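For larger tests, hand-written JSON becomes tedious. The sketch below generates events in the same shape as the samples above, including deliberate duplicates, and prints them as one JSON object per line. It uses only the standard library; the user name, email, and tag values are placeholders, and the function names are hypothetical.

```python
import json
import random
import uuid
from datetime import datetime, timedelta, timezone

START = datetime(2024, 3, 20, 10, 0, 0, tzinfo=timezone.utc)
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def make_event(event_id: str, created_at: datetime) -> dict:
    """Build one event with the same fields as the sample events."""
    return {
        "event_id": event_id,
        "user": {
            "id": uuid.uuid4().hex,
            "name": "Test User",          # placeholder value
            "email": "user@example.com",  # placeholder value
        },
        "created_at": created_at.strftime(TS_FORMAT),
        "tags": ["tag1", "tag2"],
    }


def generate_events(n_unique: int, n_duplicates: int, seed: int = 0) -> list:
    """Return n_unique events followed by n_duplicates repeats of them."""
    if n_duplicates and n_unique < 1:
        raise ValueError("need at least one unique event to duplicate")
    rng = random.Random(seed)
    originals = [
        make_event(uuid.uuid4().hex, START + timedelta(seconds=i))
        for i in range(n_unique)
    ]
    events = list(originals)
    for i in range(n_duplicates):
        # A duplicate repeats event_id and user but carries a later timestamp.
        duplicate = dict(rng.choice(originals))
        later = START + timedelta(seconds=n_unique + i)
        duplicate["created_at"] = later.strftime(TS_FORMAT)
        events.append(duplicate)
    return events


if __name__ == "__main__":
    for event in generate_events(n_unique=3, n_duplicates=2):
        print(json.dumps(event))
```

If saved under a name of your choice (for example `generate_events.py`, a hypothetical filename), the output can be piped into the same `kafka-console-producer` command shown above in place of the `echo`.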
Verify Results
After a short delay (up to the maximum delay time, 1 minute by default), you should see the deduplicated events in ClickHouse:
docker compose exec clickhouse clickhouse-client \
--user default \
--password secret \
-f prettycompact \
--query "SELECT * FROM users_dedup"
   ┌─event_id─┬─user_id─┬─name───────┬─email─────────────┬──────────created_at─┬─tags────────────────┐
1. │ 123      │ 456     │ John Doe   │ [email protected] │ 2024-03-20 10:00:00 │ ["tag1", "tag222"]  │
2. │ 124      │ 457     │ Jane Smith │ [email protected] │ 2024-03-20 10:03:00 │ ["tag13", "tag324"] │
   └──────────┴─────────┴────────────┴───────────────────┴─────────────────────┴─────────────────────┘
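The same check can be scripted from Python through the ClickHouse HTTP interface. The sketch below assumes port 8123 is reachable on `localhost` and uses the demo credentials from this guide; it sends the query as a POST body and authenticates with the `X-ClickHouse-User` and `X-ClickHouse-Key` headers. The function names and the duplicate-check query are illustrative.

```python
import urllib.request

CLICKHOUSE_URL = "http://localhost:8123/"  # HTTP port from the service list

# Lists any event_id that appears more than once; an empty result means
# no duplicates reached the table.
DUPLICATE_CHECK = (
    "SELECT event_id, count() AS n FROM users_dedup "
    "GROUP BY event_id HAVING n > 1 FORMAT JSONEachRow"
)


def build_request(query: str, user: str = "default", password: str = "secret"):
    """Build a POST request that carries the query in the request body."""
    return urllib.request.Request(
        CLICKHOUSE_URL,
        data=query.encode("utf-8"),
        headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
        method="POST",
    )


def run_query(query: str) -> str:
    """Send the query and return the raw response text."""
    with urllib.request.urlopen(build_request(query), timeout=10) as response:
        return response.read().decode("utf-8")


# Requires the demo stack to be running:
# print(run_query("SELECT count() FROM users_dedup"))
# print(run_query(DUPLICATE_CHECK) or "no duplicates found")
```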
To start creating your own pipelines with the UI, you can follow the Web UI Usage guide.
Option 2: Programmatically Using Python Demos
The Python demos automate the entire process, including:
- Creating Kafka topics
- Setting up ClickHouse tables
- Creating and configuring pipelines
- Generating and sending test events
Setting Up Python Environment
Navigate to the demo directory:
cd demos
Create and activate a virtual environment:
# Create virtual environment
python -m venv venv
# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
# .\venv\Scripts\activate
Install dependencies:
pip install -r requirements.txt
Available Demos
Deduplication
Tests GlassFlow's deduplication capabilities:
# Run with default options
python demo_deduplication.py
# Run with custom options
python demo_deduplication.py \
--num-records 100000 \
--duplication-rate 0.5
Options:
- --num-records: Number of records to generate (default: 10000)
- --duplication-rate: Rate of duplication (default: 0.1)
- --rps: Records per second (default: 1000)
- --config: Path to pipeline configuration file
- --generator-schema: Path to generator schema file
- --print-n-rows or -p: Number of rows to print from results
- --yes or -y: Skip confirmation prompts
- --cleanup or -c: Cleanup ClickHouse table before running
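To get a feel for how these options interact, the following sketch estimates what a run should produce. It assumes the duplication rate is the fraction of generated records that repeat an earlier event, which this guide does not define precisely, so treat the numbers as rough expectations. The function name `estimate_run` is hypothetical.

```python
def estimate_run(num_records: int = 10000,
                 duplication_rate: float = 0.1,
                 rps: int = 1000) -> dict:
    """Estimate duplicates, unique rows, and send time for a demo run.

    Assumption: duplication_rate is the share of records that are repeats.
    """
    duplicates = round(num_records * duplication_rate)
    return {
        "duplicates": duplicates,
        "unique": num_records - duplicates,  # rows expected in ClickHouse
        "seconds": num_records / rps,        # time spent sending events
    }


if __name__ == "__main__":
    print(estimate_run())                                         # defaults
    print(estimate_run(num_records=100000, duplication_rate=0.5))  # custom run
```

Under that assumption, the default options would leave about 9000 rows in the table after roughly 10 seconds of sending, and the custom run above about 50000 rows after roughly 100 seconds.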
Configuration Files
The demo uses configuration files in the config directory:
- Pipeline Configurations (config/glassflow/):
  - deduplication_pipeline.json: Deduplication pipeline config
  - join_pipeline.json: Join pipeline config
- Generator Schemas (config/glassgen/):
  - user_event.json: User event schema
  - order_event.json: Order event schema
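Before passing a file to --config or --generator-schema, it can be useful to see what it contains. The sketch below walks a configuration directory and lists the top-level keys of each JSON file. It makes no assumption about the actual schema of these files, and the function name `summarize_configs` is hypothetical.

```python
import json
from pathlib import Path


def summarize_configs(config_dir: str = "config") -> dict:
    """Map each JSON file under config_dir to its sorted top-level keys."""
    root = Path(config_dir)
    if not root.is_dir():
        return {}
    summary = {}
    for path in sorted(root.rglob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Non-object documents are reported by their Python type name instead.
        keys = sorted(data) if isinstance(data, dict) else [type(data).__name__]
        summary[path.relative_to(root).as_posix()] = keys
    return summary


if __name__ == "__main__":
    for name, keys in summarize_configs().items():
        print(f"{name}: {', '.join(keys)}")
```

Run from the demos directory, this prints one line per file, such as the pipeline configs under glassflow/ and the generator schemas under glassgen/.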
To start creating your own pipelines with the Python SDK, you can follow the Python SDK Usage guide.
Cleaning Up
To stop and remove all demo containers:
docker compose down
To remove all data and start fresh:
docker compose down -v