Web UI
The GlassFlow web interface provides an intuitive, visual way to create and manage data pipelines without writing code. This guide will walk you through the complete process of setting up different types of pipelines using the web interface.
Getting Started
Access the Web Interface
When using the GlassFlow CLI, the web interface is available at http://localhost:30080 after running glassflow up. For production deployments, use your configured GlassFlow URL.
Pipeline Types
The web interface supports four main pipeline types:
- Deduplicate - Remove duplicate records based on specified keys
- Join - Combine data from multiple Kafka topics
- Deduplicate and Join - Both deduplication and joining in a single pipeline
- Single Topic - Ingest, filter, transform, and deduplicate data from a single Kafka topic

Creating a Pipeline
Creating a Single Topic Pipeline
This section walks through creating a pipeline that ingests, filters, transforms, and deduplicates records from a single Kafka topic.
Setup Kafka Connection
Configure the connection to your Kafka cluster:
Connection Parameters
- Brokers: Enter your Kafka broker addresses (e.g., `localhost:9092` or `kafka:9093`)
- Protocol: Select the connection protocol
  - `PLAINTEXT` - For unsecured local development
  - `SASL_SSL` - For production with authentication
  - `SSL` - For SSL-secured connections
- Authentication: Configure if required
- Username: Your Kafka username
- Password: Your Kafka password
- Mechanism: Authentication mechanism (e.g., `SCRAM-SHA-256`)
- Root CA: SSL certificate
- Skip Authentication: Enable for unsecured connections
Kafka Connection

Select Topic
Choose the Kafka topic and define its schema:
Topic Selection
- Topic Name: Select the Kafka topic you want to process
- Consumer Group Initial Offset: Choose where to start reading
  - `earliest` - Start from the beginning of the topic
  - `latest` - Start from the most recent messages

Schema Definition
The UI automatically detects the schema of the topic.
Configure Deduplicate
This step is optional. Skip it if you do not need to deduplicate records before they reach ClickHouse.
Configure deduplication settings to remove duplicate records:
Deduplication Configuration
- Deduplication Key: Select the field to use for identifying duplicates
- Choose a field that uniquely identifies each record
- Common choices: `user_id`, `event_id`, `transaction_id`
- Time Window: Set the deduplication time window
  - `30s` - 30 seconds
  - `1m` - 1 minute
  - `1h` - 1 hour
  - `12h` - 12 hours
  - `24h` - 24 hours

Deduplication Logic
The system will:
- Track the specified ID field within the time window
- Keep only the first occurrence of each unique ID
- Discard subsequent duplicates within the time window
- Reset tracking after the time window expires
Configure Filter
This step is optional. Skip it if you do not need to filter records before they reach ClickHouse.
Toggle the filter on to define a condition that identifies records to drop. Records that match the expression are excluded from the pipeline; records that do not match are passed through.
Filter Configuration
- Filter Expression: Enter a JSON-based condition that describes the records you want to drop
- Example: `{"status": "aborted"}` drops every record where `status` equals `"aborted"`
- Conditions can reference any field present in the topic schema
- Only records that match the expression are dropped; all others continue downstream

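A filter expression of this form behaves as an equality match against record fields. A minimal sketch of that matching logic (illustrative; GlassFlow's actual expression engine may support richer conditions than plain equality):

```python
def matches(expression: dict, record: dict) -> bool:
    """True when every field in the expression equals the record's value."""
    return all(record.get(field) == value for field, value in expression.items())

def apply_filter(expression: dict, records: list[dict]) -> list[dict]:
    """Drop records that match the expression; pass everything else through."""
    return [r for r in records if not matches(expression, r)]

records = [
    {"id": 1, "status": "aborted"},
    {"id": 2, "status": "completed"},
]
apply_filter({"status": "aborted"}, records)  # keeps only record 2
```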
Configure Stateless Transformation
This step is optional. Skip it if you do not need to modify records before they are written to ClickHouse. When both filter and transformation are enabled, the transformation is applied after filtering.
Toggle the transformation on to define an expression or script that reshapes each record passing through the pipeline.
Transformation Configuration
- Transformation Expression: Write the expression or script to apply to each record
- Field renaming: Map an existing field to a new name
- Adding fields: Compute and attach new fields to the record
- Removing fields: Drop fields that are not needed downstream
- Type casting: Convert a field value to a different data type
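The four transformation kinds above can be pictured as a plain per-record function. The sketch below is illustrative (field names like `usr` and `amount` are placeholders, and the expression syntax GlassFlow accepts may differ):

```python
def transform(record: dict) -> dict:
    """Apply rename, cast, add, and remove steps to one record (sketch)."""
    out = dict(record)
    # Field renaming: map an existing field to a new name
    if "usr" in out:
        out["user_id"] = out.pop("usr")
    # Type casting: convert a field value to a different data type
    if "amount" in out:
        out["amount"] = float(out["amount"])
    # Adding fields: compute and attach a new field
    out["amount_total"] = out.get("amount", 0.0) + out.get("tax", 0.0)
    # Removing fields: drop fields not needed downstream
    out.pop("internal_debug", None)
    return out

transform({"usr": "u1", "amount": "9.5", "tax": 0.5, "internal_debug": "x"})
# -> {"user_id": "u1", "amount": 9.5, "tax": 0.5, "amount_total": 10.0}
```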

Setup ClickHouse Connection
Configure the connection to your ClickHouse database:
Connection Parameters
- Host: Enter your ClickHouse server address
- Port: Specify the ClickHouse port (default: `8123` for HTTP, `8443` for HTTPS)
- Username: Your ClickHouse username
- Password: Your ClickHouse password
- Database: Select the target database
- Secure Connection: Enable for TLS/SSL connections
- Skip Certificate Verification: Enable for self-signed certificates
ClickHouse Connection

Select Destination
Configure the destination table and field mappings:
Table Configuration
- Table: Select an existing table or create a new one
- Table Name: Enter the name for your destination table
Batch Configuration
- Max Batch Size: Maximum number of records per batch (default: 1000)
- Max Delay Time: Maximum time to wait before flushing batch (default: 1s)
Field Mapping
Map source fields to ClickHouse columns:
- Source Field: The field from your Kafka topic
- Data Type: The incoming data type extracted from previous configuration
- Destination Column: The corresponding ClickHouse column name and type loaded from the database

Configure Pipeline Resources
Set the replica counts and CPU/memory limits for each pipeline component. The defaults are sufficient for most workloads up to ~80k rps. Increase them when targeting higher throughput.
Component Resources
For each component — Ingestor, Transform, and Sink — you can configure:
- Replicas: Number of parallel instances. More replicas increase throughput for that stage.
- Ingestor replicas must not exceed the Kafka topic’s partition count
- Transform replicas cannot be changed after pipeline creation when deduplication is enabled
- CPU Request / Limit: CPU allocated to each replica (millicores)
- Memory Request / Limit: RAM allocated to each replica
NATS Stream Settings
- Max Bytes: Maximum size of the internal NATS buffer (e.g., `25GB`)
- Max Age: Maximum retention time for buffered messages (e.g., `24h`)
For guidance on replica counts and resource values for specific throughput targets, see the Scaling Guide.

Creating a Multi-Topic Pipeline
This section covers creating pipelines that combine data from multiple Kafka topics.
Setup Multiple Kafka Connections
Follow the same steps as the single-topic pipeline, but configure connections for multiple Kafka topics:
- Setup First Kafka Connection (Left Topic)
- Select First Topic and define schema
- Setup Second Kafka Connection (Right Topic)
- Select Second Topic and define schema
Configure Join Settings
Join Configuration
- Join Type: Select `temporal` for time-based joins
- Join Key: Specify the field used to match records between topics
- Time Window: Set the join time window for matching records
Join Logic
The system will:
- Match records based on the join key
- Consider records within the specified time window
- Combine matching records according to the join orientation
- Output the joined results
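The matching steps above can be sketched as follows. This is an illustrative nested-loop version (field names and the left-wins merge rule are assumptions; a streaming engine would use windowed state rather than scanning full lists):

```python
def temporal_join(left: list[dict], right: list[dict],
                  key: str, window_seconds: float) -> list[dict]:
    """Match records from two topics on `key` when their timestamps
    fall within `window_seconds` of each other (illustrative sketch)."""
    joined = []
    for l in left:
        for r in right:
            if l[key] == r[key] and abs(l["ts"] - r["ts"]) <= window_seconds:
                joined.append({**r, **l})  # combine; left fields win on conflicts
    return joined

clicks = [{"user_id": "u1", "ts": 0, "action": "click"}]
pages = [
    {"user_id": "u1", "ts": 30, "page": "home"},   # within the 60s window
    {"user_id": "u1", "ts": 500, "page": "late"},  # outside the window
]
temporal_join(clicks, pages, key="user_id", window_seconds=60)
# -> one joined record combining the click with the "home" page view
```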
Setup ClickHouse and Destination
Follow the same steps as the single-topic pipeline for the destination configuration.
Deploying the Pipeline
Review Configuration
Before deploying:
- Review all connection settings
- Verify field mappings
- Check transformation configurations
- Ensure all required fields are properly configured
Deploy
- Click the “Deploy” button
- The system will:
- Validate your configuration
- Generate the pipeline configuration
- Send the configuration to the GlassFlow API
- Start the pipeline processing
Deployment Status
Monitor your pipeline:
- Status: Shows if the pipeline is running, stopped, or in error
- Metrics: View processing statistics
Pipeline Management

Stopping Pipelines
When a pipeline is stopped, it stops ingesting new messages; once all messages in the queue have been processed, the pipeline's resources scale down to zero.
Terminating Pipelines
When a pipeline is terminated, it stops ingesting new messages and shuts down immediately (scaling resources to zero) without draining the queue.
Resuming Pipelines
When a pipeline is resumed, it will resume ingesting new messages and scale up resources to the original size.
Editing Pipelines
Pipeline configurations can be edited, but only while the pipeline is stopped (or terminated).
Deleting Pipelines
Only stopped (or terminated) pipelines can be deleted. Deleting a pipeline will remove all pipeline resources and configuration.
Pipeline States
Every pipeline moves through a defined set of states. Understanding these states helps you interpret the dashboard, respond to failures, and avoid invalid operations.
State Reference
| State | Type | Description |
|---|---|---|
| Created | Stable | The pipeline configuration has been saved but the pipeline has never been started. No Kubernetes workloads are running. |
| Running | Stable | The pipeline is actively consuming from Kafka, processing data, and writing to ClickHouse. |
| Stopping | Transitional | A stop has been requested. The pipeline is draining in-flight messages from the queue before scaling down to zero replicas. |
| Stopped | Stable | The pipeline has scaled down to zero replicas. No data is being processed. Configuration is preserved and the pipeline can be resumed or deleted. |
| Resuming | Transitional | A resume has been requested. Kubernetes workloads are scaling back up. The pipeline transitions to Running once all components are healthy. |
| Terminating | Transitional | A terminate has been requested. The pipeline is scaling down immediately, without waiting to drain the queue. |
| Failed | Stable | A system-level failure has occurred that the pipeline cannot recover from automatically (for example, a Kubernetes workload crash or an unrecoverable NATS error). |
Transitional states (Stopping, Resuming, Terminating) are temporary. The pipeline moves through them automatically; you cannot directly request a transitional state. While a pipeline is in a transitional state, most operations are blocked — the exception is that Terminating can always be requested from any non-terminal state.
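The operation rules above can be sketched as a small permission check. This is an illustrative reading of the state table, not GlassFlow source; handling of the Created and Failed states beyond terminate is omitted because the text does not specify it:

```python
TRANSITIONAL = {"Stopping", "Resuming", "Terminating"}

def can_request(state: str, operation: str) -> bool:
    """Whether a user operation is allowed in `state` (sketch of the rules above)."""
    if operation == "terminate":
        # Terminate can be requested from any non-terminal state
        return True
    if state in TRANSITIONAL:
        # All other operations are blocked while a transition is in progress
        return False
    # Allowed operations per stable state (edit/delete require a stopped pipeline)
    allowed = {
        "Running": {"stop"},
        "Stopped": {"resume", "edit", "delete"},
    }
    return operation in allowed.get(state, set())

can_request("Running", "stop")       # True
can_request("Stopping", "stop")      # False: transition in progress
can_request("Resuming", "terminate") # True: terminate is always available
```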
Best Practices
1. Connection Security
- Use secure connections (SASL_SSL/SSL) for production
- Store credentials securely
- Test connections before deploying
2. Schema Design
- Define clear, consistent field names
- Use appropriate data types
- Consider future data structure changes
3. Performance Tuning
- Adjust batch sizes based on your data volume
- Set appropriate time windows for deduplication
- Monitor pipeline performance metrics
4. Error Handling
- Review pipeline logs regularly
- Set up monitoring for pipeline failures
- Have fallback strategies for data processing
Troubleshooting
Common Issues
- Connection Failures
  - Verify network connectivity
  - Check authentication credentials
  - Ensure proper SSL certificates
- Schema Mismatches
  - Verify field names match exactly
  - Check data type compatibility
  - Review JSON structure
- Performance Issues
  - Adjust batch sizes
  - Review time window settings
  - Monitor resource usage
Getting Help
- Check the pipeline logs for detailed error messages
- Review the Pipeline JSON Reference documentation
- Consult the FAQ for common solutions
Next Steps
- Explore the Pipeline JSON Reference documentation for detailed configuration options
- Learn about monitoring and observability for your pipelines