Web UI
The GlassFlow web interface provides an intuitive, visual way to create and manage data pipelines without writing code. This guide will walk you through the complete process of setting up different types of pipelines using the web interface.
Getting Started
Access the Web Interface
The GlassFlow web interface is available at http://localhost:8080 by default. For production deployments, use your configured GlassFlow URL.
Pipeline Types
The web interface supports four main pipeline types:
- Deduplicate - Remove duplicate records based on specified keys
- Join - Combine data from multiple Kafka topics
- Deduplicate and Join - Both deduplication and joining in a single pipeline
- Ingest Only - Simple data ingestion without transformations
Creating a Deduplication Pipeline
This section walks through creating a pipeline that removes duplicate records from a Kafka topic.
Step 1: Set Up Kafka Connection
Configure the connection to your Kafka cluster:
Connection Parameters
- Brokers: Enter your Kafka broker addresses (e.g., `localhost:9092` or `kafka:9093`)
- Protocol: Select the connection protocol
  - `PLAINTEXT` - For unsecured local development
  - `SASL_SSL` - For production with authentication
  - `SSL` - For SSL-secured connections
- Authentication: Configure if your cluster requires it
  - Username: Your Kafka username
  - Password: Your Kafka password
  - Mechanism: Authentication mechanism (e.g., `SCRAM-SHA-256`)
  - Root CA: The root CA certificate used to verify the broker over SSL
- Skip Authentication: Enable for unsecured connections
*Screenshot: Local Kafka connection*
*Screenshot: Production Kafka connection*
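If you want to verify the broker settings outside the UI first, here is a minimal sketch using the `confluent-kafka` Python package. The broker address and credentials below are placeholders; substitute the values you plan to enter in the form:

```python
from confluent_kafka.admin import AdminClient

# The same values you would enter in the UI (placeholders shown here).
conf = {
    "bootstrap.servers": "localhost:9092",
    # For a production cluster, uncomment and fill in:
    # "security.protocol": "SASL_SSL",
    # "sasl.mechanism": "SCRAM-SHA-256",
    # "sasl.username": "my-user",
    # "sasl.password": "my-password",
}

admin = AdminClient(conf)
# list_topics() raises an error if the broker is unreachable.
metadata = admin.list_topics(timeout=10)
print("Connected. Topics:", sorted(metadata.topics))
```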
Step 2: Select Topic
Choose the Kafka topic and define its schema:
Topic Selection
- Topic Name: Select the Kafka topic you want to process
- Consumer Group Initial Offset: Choose where to start reading
  - `earliest` - Start from the beginning of the topic
  - `latest` - Start from the most recent messages
Schema Definition
The UI automatically detects the schema of the topic.
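Schema detection works from messages already on the topic, so make sure at least one event has been produced before this step. (The `earliest`/`latest` offsets above follow Kafka's standard offset-reset semantics.) A minimal producer sketch; the topic name `orders`, the broker address, and the event fields are placeholders:

```python
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

# An illustrative event; use your real field names.
event = {"event_id": "evt-1001", "user_id": "u-42", "amount": 19.99}
producer.produce("orders", value=json.dumps(event).encode("utf-8"))
producer.flush()  # block until the message is delivered
```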
Step 3: Define Deduplication Keys
Configure deduplication settings to remove duplicate records:
Deduplication Configuration
- Enable Deduplication: Toggle to enable/disable deduplication
- ID Field: Select the field to use for identifying duplicates
  - Choose a field that uniquely identifies each record
  - Common choices: `user_id`, `event_id`, `transaction_id`
- ID Field Type: Specify the data type of the ID field
  - Usually `string` for most use cases
- Time Window: Set the deduplication time window
  - `30s` - 30 seconds
  - `1m` - 1 minute
  - `1h` - 1 hour
  - `12h` - 12 hours
  - `24h` - 24 hours
Deduplication Logic
The system will:
- Track the specified ID field within the time window
- Keep only the first occurrence of each unique ID
- Discard subsequent duplicates within the time window
- Reset tracking after the time window expires
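In pseudocode terms, the behavior resembles the following sketch (illustrative only, not GlassFlow's actual implementation):

```python
import time

class WindowedDeduplicator:
    """Keep only the first occurrence of each ID within a time window."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.first_seen: dict[str, float] = {}  # id -> first-seen time

    def accept(self, record_id: str) -> bool:
        now = time.monotonic()
        # Drop IDs whose tracking window has expired.
        self.first_seen = {
            k: t for k, t in self.first_seen.items() if now - t < self.window
        }
        if record_id in self.first_seen:
            return False              # duplicate inside the window: discard
        self.first_seen[record_id] = now
        return True                   # first occurrence: keep

dedup = WindowedDeduplicator(window_seconds=3600)  # a 1h window
for rid in ["evt-1", "evt-2", "evt-1"]:
    print(rid, "kept" if dedup.accept(rid) else "discarded")
```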
Step 4: Set Up ClickHouse Connection
Configure the connection to your ClickHouse database:
Connection Parameters
- Host: Enter your ClickHouse server address
- Port: Specify the ClickHouse port (default: `8123` for HTTP, `8443` for HTTPS)
- Username: Your ClickHouse username
- Password: Your ClickHouse password
- Database: Select the target database
- Secure Connection: Enable for TLS/SSL connections
Connection Testing
- Click “Test Connection” to verify connectivity
- Ensure you can successfully connect to your ClickHouse instance
- Proceed once the connection is confirmed
*Screenshot: ClickHouse local connection*
*Screenshot: ClickHouse production connection*
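You can also verify the same credentials outside the UI with the `clickhouse-connect` Python package. The host, port, and credentials below are placeholders matching a default local ClickHouse:

```python
import clickhouse_connect

# The same values as in the UI form (placeholders shown here).
client = clickhouse_connect.get_client(
    host="localhost",
    port=8123,        # use 8443 together with secure=True for HTTPS
    username="default",
    password="",
    database="default",
    # secure=True,    # enable for TLS/SSL connections
)
print(client.command("SELECT version()"))  # any result means the connection works
```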
Step 5: Select Destination
Configure the destination table and field mappings:
Table Configuration
- Table: Select an existing table or create a new one
- Table Name: Enter the name for your destination table
Field Mapping
Map source fields to ClickHouse columns:
- Source Field: The field from your Kafka topic
- Column Name: The corresponding ClickHouse column name
- Column Type: Select the ClickHouse data type:
  - `String` - For text data
  - `Int8`, `Int16`, `Int32`, `Int64` - For integers
  - `Float32`, `Float64` - For floating-point numbers
  - `DateTime` - For timestamp data
  - `Boolean` - For boolean values
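If you prefer to create the destination table yourself before selecting it, here is a sketch using `clickhouse-connect`. The table name, columns, and engine are illustrative (they mirror the sample `orders` event used earlier); adjust them to your own field mapping:

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123, username="default")

# Columns mirror the sample `orders` event; adjust to your mapping.
client.command("""
CREATE TABLE IF NOT EXISTS default.orders (
    event_id String,
    user_id  String,
    amount   Float64
)
ENGINE = MergeTree
ORDER BY event_id
""")
```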
Batch Configuration
- Max Batch Size: Maximum number of records per batch (default: 1000)
- Max Delay Time: Maximum time to wait before flushing batch (default: 1s)
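The two limits work together: a batch is flushed as soon as either the size or the delay threshold is reached. A rough sketch of that policy (illustrative only, not GlassFlow's implementation):

```python
import time

MAX_BATCH_SIZE = 1000    # Max Batch Size
MAX_DELAY_SECONDS = 1.0  # Max Delay Time

batch: list[dict] = []
last_flush = time.monotonic()

def add_record(record: dict, write) -> None:
    """Buffer a record; flush when the batch is full or the delay expires."""
    global last_flush
    batch.append(record)
    full = len(batch) >= MAX_BATCH_SIZE
    stale = time.monotonic() - last_flush >= MAX_DELAY_SECONDS
    if full or stale:
        write(list(batch))        # e.g., one bulk insert into ClickHouse
        batch.clear()
        last_flush = time.monotonic()

for i in range(1000):             # the size limit triggers one flush here
    add_record({"event_id": f"evt-{i}"}, write=lambda b: print("flushed", len(b)))
```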
Creating a Join Pipeline
This section covers creating pipelines that combine data from multiple Kafka topics.
Steps 1-2: Set Up Multiple Kafka Connections
Follow the same steps as the deduplication pipeline, but configure connections for multiple Kafka topics:
- Set up the first Kafka connection (left topic)
- Select the first topic and define its schema
- Set up the second Kafka connection (right topic)
- Select the second topic and define its schema
Step 3: Configure Join Settings
Join Configuration
- Join Type: Select `temporal` for time-based joins
- Join Key: Specify the field used to match records between topics
- Time Window: Set the join time window for matching records
- Orientation: Choose join direction
  - `left` - Keep all records from the first topic
  - `right` - Keep all records from the second topic
  - `inner` - Keep only matching records
Join Logic
The system will:
- Match records based on the join key
- Consider records within the specified time window
- Combine matching records according to the join orientation
- Output the joined results
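Conceptually, a left-oriented temporal join behaves like this sketch (illustrative only; the key and field names are placeholders):

```python
import time

class TemporalJoiner:
    """Simplified left-oriented temporal join: buffer the right topic by key
    and attach matches to left records that arrive within the window."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.right_buffer: dict[str, tuple[float, dict]] = {}

    def on_right(self, key: str, record: dict) -> None:
        self.right_buffer[key] = (time.monotonic(), record)

    def on_left(self, key: str, record: dict) -> dict:
        entry = self.right_buffer.get(key)
        if entry and time.monotonic() - entry[0] < self.window:
            return {**record, **entry[1]}  # matched: combine both sides
        return record  # left orientation keeps unmatched left records

joiner = TemporalJoiner(window_seconds=300)
joiner.on_right("u-42", {"user_name": "Ada"})
print(joiner.on_left("u-42", {"user_id": "u-42", "amount": 19.99}))
```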
Steps 4-5: Set Up ClickHouse and Destination
Follow the same steps as the deduplication pipeline for the destination configuration.
Creating a Deduplicate and Join Pipeline
This pipeline type combines both deduplication and joining capabilities.
Steps 1-2: Set Up Multiple Kafka Connections
Configure connections for all topics you want to process.
Step 3: Configure Deduplication
Set up deduplication for each topic individually:
- Configure deduplication keys for the first topic
- Configure deduplication keys for the second topic
- Set appropriate time windows for each
Step 4: Configure Join Settings
Set up the join configuration as described in the join pipeline section.
Steps 5-6: Set Up ClickHouse and Destination
Configure the destination as in previous pipeline types.
Creating an Ingest Only Pipeline
This is the simplest pipeline type for basic data ingestion without transformations.
Step 1: Set Up Kafka Connection
Follow the same Kafka connection setup as other pipeline types.
Step 2: Select Topic
Choose your topic and define the schema.
Step 3: Set Up ClickHouse Connection
Configure the ClickHouse connection.
Step 4: Select Destination
Configure the destination table and field mappings.
Note: No deduplication or join configuration is needed for ingest-only pipelines.
Deploying the Pipeline
Review Configuration
Before deploying:
- Review all connection settings
- Verify field mappings
- Check transformation configurations
- Ensure all required fields are properly configured
Deploy
- Click the “Deploy” button
- The system will:
  - Validate your configuration
  - Generate the pipeline configuration
  - Send the configuration to the GlassFlow API
  - Start the pipeline processing
Deployment Status
Monitor your pipeline:
- Status: Shows whether the pipeline is running, stopped, or in an error state
- Metrics: View processing statistics
- Logs: Access pipeline logs for debugging
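Beyond the UI, a quick end-to-end smoke test for a deduplication pipeline is to produce the same event twice and confirm only one row lands in ClickHouse. The topic, table, and field names below are the placeholders from the earlier examples, and the 5-second sleep assumes a short batch delay:

```python
import json
import time

import clickhouse_connect
from confluent_kafka import Producer

# Produce the same event twice; a dedup pipeline should store it only once.
producer = Producer({"bootstrap.servers": "localhost:9092"})
event = json.dumps({"event_id": "evt-dup", "user_id": "u-1"}).encode("utf-8")
producer.produce("orders", value=event)
producer.produce("orders", value=event)
producer.flush()

time.sleep(5)  # give the pipeline time to flush its batch

client = clickhouse_connect.get_client(host="localhost", port=8123, username="default")
count = client.command("SELECT count() FROM default.orders WHERE event_id = 'evt-dup'")
print("rows stored:", count)  # expect 1 for a deduplication pipeline
```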
Pipeline Management
Deleting Pipelines
- Remove pipeline configurations
- Clean up resources
Important: Only one pipeline can be active at a time in the current version.
Best Practices
1. Connection Security
   - Use secure connections (SASL_SSL/SSL) for production
   - Store credentials securely
   - Test connections before deploying
2. Schema Design
   - Define clear, consistent field names
   - Use appropriate data types
   - Consider future data structure changes
3. Performance Tuning
   - Adjust batch sizes based on your data volume
   - Set appropriate time windows for deduplication
   - Monitor pipeline performance metrics
4. Error Handling
   - Review pipeline logs regularly
   - Set up monitoring for pipeline failures
   - Have fallback strategies for data processing
Troubleshooting
Common Issues
- Connection Failures
  - Verify network connectivity
  - Check authentication credentials
  - Ensure proper SSL certificates
- Schema Mismatches
  - Verify field names match exactly
  - Check data type compatibility
  - Review JSON structure
- Performance Issues
  - Adjust batch sizes
  - Review time window settings
  - Monitor resource usage
Getting Help
- Check the pipeline logs for detailed error messages
- Review the Pipeline Configuration documentation
- Consult the FAQ for common solutions
Next Steps
- Explore the Pipeline Configuration documentation for detailed configuration options
- Check out the demo scripts for more examples
- Learn about monitoring and observability for your pipelines