Web UI

The GlassFlow web interface provides an intuitive, visual way to create and manage data pipelines without writing code. This guide will walk you through the complete process of setting up different types of pipelines using the web interface.

Getting Started

Access the Web Interface

When using the GlassFlow CLI, the web interface is available at http://localhost:30080 after running glassflow up. For production deployments, use your configured GlassFlow URL.

Pipeline Types

The web interface supports four main pipeline types:

  1. Deduplicate - Remove duplicate records based on specified keys
  2. Join - Combine data from multiple Kafka topics
  3. Deduplicate and Join - Both deduplication and joining in a single pipeline
  4. Single Topic - Ingest, filter, transform, and deduplicate data from a single Kafka topic
Creating a Pipeline

Creating a Single Topic Pipeline

This section walks through creating a pipeline that ingests, filters, transforms, and deduplicates records from a single Kafka topic.

Setup Kafka Connection

Configure the connection to your Kafka cluster:

Connection Parameters

  • Brokers: Enter your Kafka broker addresses (e.g., localhost:9092 or kafka:9093)
  • Protocol: Select the connection protocol
    • PLAINTEXT - For unsecured local development
    • SASL_SSL - For production with authentication
    • SSL - For SSL-secured connections
  • Authentication: Configure if required
    • Username: Your Kafka username
    • Password: Your Kafka password
    • Mechanism: Authentication mechanism (e.g., SCRAM-SHA-256)
    • Root CA: SSL certificate
  • Skip Authentication: Enable for unsecured connections
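For reference, the connection fields above correspond roughly to standard librdkafka-style Kafka client settings. The values below are hypothetical placeholders, not defaults:

```python
# Hypothetical connection values; the UI form fields map roughly onto
# standard librdkafka-style client settings like these.
kafka_connection = {
    "bootstrap.servers": "kafka:9093",     # Brokers
    "security.protocol": "SASL_SSL",       # Protocol
    "sasl.mechanism": "SCRAM-SHA-256",     # Mechanism
    "sasl.username": "glassflow",          # Username (hypothetical)
    "sasl.password": "change-me",          # Password (hypothetical)
    "ssl.ca.location": "/etc/ssl/ca.pem",  # Root CA (hypothetical path)
}
```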


Select Topic

Choose the Kafka topic and define its schema:

Topic Selection

  • Topic Name: Select the Kafka topic you want to process
  • Consumer Group Initial Offset: Choose where to start reading
    • earliest - Start from the beginning of the topic
    • latest - Start from the most recent messages

Schema Definition

The UI automatically detects the schema of the topic.

Configure Deduplicate

This step is optional. Skip it if you do not need to deduplicate records before they reach ClickHouse.

Configure deduplication settings to remove duplicate records:

Deduplication Configuration

  • Deduplication Key: Select the field to use for identifying duplicates
    • Choose a field that uniquely identifies each record
    • Common choices: user_id, event_id, transaction_id
  • Time Window: Set the deduplication time window
    • 30s - 30 seconds
    • 1m - 1 minute
    • 1h - 1 hour
    • 12h - 12 hours
    • 24h - 24 hours

Deduplication Logic

The system will:

  1. Track the specified ID field within the time window
  2. Keep only the first occurrence of each unique ID
  3. Discard subsequent duplicates within the time window
  4. Reset tracking after the time window expires
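The first-wins logic above can be sketched as follows. This is a simplified illustration, not GlassFlow's actual implementation; the `Deduplicator` class is hypothetical:

```python
import time

class Deduplicator:
    """Keep the first occurrence of each key within a rolling time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.seen = {}  # key -> timestamp of first occurrence

    def is_duplicate(self, key, now=None):
        now = now if now is not None else time.time()
        first_seen = self.seen.get(key)
        if first_seen is not None and now - first_seen < self.window:
            return True  # seen within the window: discard
        self.seen[key] = now  # new key, or window expired: start tracking again
        return False

dedup = Deduplicator(window_seconds=3600)  # 1h window
events = [{"event_id": "a"}, {"event_id": "b"}, {"event_id": "a"}]
kept = [e for e in events if not dedup.is_duplicate(e["event_id"])]
# kept retains the first "a" and "b"; the second "a" is discarded
```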

Configure Filter

This step is optional. Skip it if you do not need to filter records before they reach ClickHouse.

Toggle the filter on to define a condition that identifies records to drop. Records that match the expression are excluded from the pipeline; records that do not match are passed through.

Filter Configuration

  • Filter Expression: Enter a JSON-based condition that describes the records you want to drop
    • Example: {"status": "cancelled"} drops every record where status equals "cancelled"
    • Conditions can reference any field present in the topic schema
    • Only records that match the expression are dropped; all others continue downstream
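Conceptually, the drop semantics work like this sketch, assuming a simple field-equality condition; the full expression grammar is documented in the Pipeline JSON Reference:

```python
import json

def matches_filter(record, condition):
    """True when every field in the condition equals the record's value.
    Matching records are the ones the pipeline drops."""
    return all(record.get(field) == value for field, value in condition.items())

condition = json.loads('{"status": "cancelled"}')
records = [
    {"event_id": 1, "status": "cancelled"},
    {"event_id": 2, "status": "completed"},
]
# Records that match the condition are dropped; the rest pass through.
passed = [r for r in records if not matches_filter(r, condition)]
```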

Configure Stateless Transformation

This step is optional. Skip it if you do not need to modify records before they are written to ClickHouse. When both filter and transformation are enabled, the transformation is applied after filtering.

Toggle the transformation on to define an expression or script that reshapes each record passing through the pipeline.

Transformation Configuration

  • Transformation Expression: Write the expression or script to apply to each record
    • Field renaming: Map an existing field to a new name
    • Adding fields: Compute and attach new fields to the record
    • Removing fields: Drop fields that are not needed downstream
    • Type casting: Convert a field value to a different data type
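A stateless transform covering all four operations might look like this sketch; the function, field names, and record shape are hypothetical, and the actual expression syntax is defined by GlassFlow:

```python
def transform(record):
    """Hypothetical stateless transformation: rename, add, remove, cast."""
    out = dict(record)
    out["user_id"] = out.pop("uid")                 # field renaming
    out["amount"] = float(out["amount"])            # type casting: str -> float
    out["amount_cents"] = int(out["amount"] * 100)  # adding a computed field
    out.pop("debug_info", None)                     # removing an unneeded field
    return out

row = {"uid": "u1", "amount": "12.50", "debug_info": "trace..."}
transformed = transform(row)
```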

Setup ClickHouse Connection

Configure the connection to your ClickHouse database:

Connection Parameters

  • Host: Enter your ClickHouse server address
  • Port: Specify the ClickHouse port (default: 8123 for HTTP, 8443 for HTTPS)
  • Username: Your ClickHouse username
  • Password: Your ClickHouse password
  • Database: Select the target database
  • Secure Connection: Enable for TLS/SSL connections
  • Skip Certificate Verification: Enable for self-signed certificates


Select Destination

Configure the destination table and field mappings:

Table Configuration

  • Table: Select an existing table or create a new one
  • Table Name: Enter the name for your destination table

Batch Configuration

  • Max Batch Size: Maximum number of records per batch (default: 1000)
  • Max Delay Time: Maximum time to wait before flushing batch (default: 1s)
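The two batch settings interact as in this simplified sketch: a batch flushes as soon as either limit is reached. (A real sink also flushes on a timer even when no new record arrives; this sketch only checks on add.)

```python
import time

class Batcher:
    """Flush when the batch reaches max_size or max_delay seconds elapse."""

    def __init__(self, max_size=1000, max_delay=1.0):
        self.max_size, self.max_delay = max_size, max_delay
        self.batch, self.first_at = [], None

    def add(self, record, now=None):
        now = now if now is not None else time.time()
        if not self.batch:
            self.first_at = now  # timestamp of the batch's first record
        self.batch.append(record)
        if len(self.batch) >= self.max_size or now - self.first_at >= self.max_delay:
            flushed, self.batch = self.batch, []
            return flushed  # caller writes this batch to ClickHouse
        return None  # keep accumulating
```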

Field Mapping

Map source fields to ClickHouse columns:

  • Source Field: The field from your Kafka topic
  • Data Type: The incoming data type extracted from previous configuration
  • Destination Column: The corresponding ClickHouse column name and type loaded from the database

Configure Pipeline Resources

Set the replica counts and CPU/memory limits for each pipeline component. The defaults are sufficient for most workloads up to ~80k rps. Increase them when targeting higher throughput.

Component Resources

For each component — Ingestor, Transform, and Sink — you can configure:

  • Replicas: Number of parallel instances. More replicas increase throughput for that stage.
    • Ingestor replicas must not exceed the Kafka topic’s partition count
    • Transform replicas cannot be changed after pipeline creation when deduplication is enabled
  • CPU Request / Limit: CPU allocated to each replica (millicores)
  • Memory Request / Limit: RAM allocated to each replica

NATS Stream Settings

  • Max Bytes: Maximum size of the internal NATS buffer (e.g., 25GB)
  • Max Age: Maximum retention time for buffered messages (e.g., 24h)

For guidance on replica counts and resource values for specific throughput targets, see the Scaling Guide.

Creating a Multi-Topic Pipeline

This section covers creating pipelines that combine data from multiple Kafka topics.

Setup Multiple Kafka Connections

Follow the same steps as the single-topic pipeline, but configure connections for two Kafka topics:

  1. Setup First Kafka Connection (Left Topic)
  2. Select First Topic and define schema
  3. Setup Second Kafka Connection (Right Topic)
  4. Select Second Topic and define schema

Configure Join Settings

Join Configuration

  • Join Type: Select temporal for time-based joins
  • Join Key: Specify the field used to match records between topics
  • Time Window: Set the join time window for matching records

Join Logic

The system will:

  1. Match records based on the join key
  2. Consider records within the specified time window
  3. Combine matching records according to the join orientation
  4. Output the joined results
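The steps above behave like this simplified batch sketch; real processing is streaming, and the record shapes and `ts` field are hypothetical:

```python
def temporal_join(left_events, right_events, key, window_seconds):
    """Match left and right records that share the join key and whose
    timestamps fall within the time window (simplified inner join)."""
    joined = []
    for l in left_events:
        for r in right_events:
            if l[key] == r[key] and abs(l["ts"] - r["ts"]) <= window_seconds:
                joined.append({**r, **l})  # left fields win on conflict
    return joined

orders = [{"user_id": 1, "ts": 100, "order": "A"}]
clicks = [{"user_id": 1, "ts": 95, "page": "home"},
          {"user_id": 1, "ts": 500, "page": "late"}]
joined_rows = temporal_join(orders, clicks, "user_id", window_seconds=60)
# only the click at ts=95 falls within 60s of the order at ts=100
```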

Setup ClickHouse and Destination

Follow the same steps as the single-topic pipeline for the destination configuration.

Deploying the Pipeline

Review Configuration

Before deploying:

  1. Review all connection settings
  2. Verify field mappings
  3. Check transformation configurations
  4. Ensure all required fields are properly configured

Deploy

  1. Click the “Deploy” button
  2. The system will:
    • Validate your configuration
    • Generate the pipeline configuration
    • Send the configuration to the GlassFlow API
    • Start the pipeline processing

Deployment Status

Monitor your pipeline:

  • Status: Shows if the pipeline is running, stopped, or in error
  • Metrics: View processing statistics

Pipeline Management

Stopping Pipelines

When a pipeline is stopped, it stops ingesting new messages; once all messages already in the queue have been processed, the pipeline scales its resources down to zero.

Terminating Pipelines

When a pipeline is terminated, it stops ingesting new messages and shuts down immediately, scaling resources down to zero without draining the queue.

Resuming Pipelines

When a pipeline is resumed, it scales resources back up to their original size and resumes ingesting new messages.

Editing Pipelines

All pipeline configuration can be edited, but only while the pipeline is stopped (or terminated).

Deleting Pipelines

Only stopped (or terminated) pipelines can be deleted. Deleting a pipeline will remove all pipeline resources and configuration.

Pipeline States

Every pipeline moves through a defined set of states. Understanding these states helps you interpret the dashboard, respond to failures, and avoid invalid operations.

State Reference

  • Created (Stable): The pipeline configuration has been saved but the pipeline has never been started. No Kubernetes workloads are running.
  • Running (Stable): The pipeline is actively consuming from Kafka, processing data, and writing to ClickHouse.
  • Stopping (Transitional): A stop has been requested. The pipeline is draining in-flight messages from the queue before scaling down to zero replicas.
  • Stopped (Stable): The pipeline has scaled down to zero replicas. No data is being processed. Configuration is preserved and the pipeline can be resumed or deleted.
  • Resuming (Transitional): A resume has been requested. Kubernetes workloads are scaling back up. The pipeline transitions to Running once all components are healthy.
  • Terminating (Transitional): A terminate has been requested. The pipeline is scaling down immediately, without waiting to drain the queue.
  • Failed (Stable): A system-level failure has occurred that the pipeline cannot recover from automatically (for example, a Kubernetes workload crash or an unrecoverable NATS error).

Transitional states (Stopping, Resuming, Terminating) are temporary. The pipeline moves through them automatically; you cannot directly request a transitional state. While a pipeline is in a transitional state, most operations are blocked — the exception is that Terminating can always be requested from any non-terminal state.
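These rules can be summarized as a small transition map. The operation and state names below are illustrative, derived from the state table above, not GlassFlow's internal API:

```python
# Assumed transition map derived from the state table above (illustrative).
TRANSITIONS = {
    "Created":     {"start": "Running", "terminate": "Terminating"},
    "Running":     {"stop": "Stopping", "terminate": "Terminating"},
    "Stopping":    {"terminate": "Terminating"},  # drain finishes -> Stopped
    "Stopped":     {"resume": "Resuming", "terminate": "Terminating"},
    "Resuming":    {"terminate": "Terminating"},  # scale-up finishes -> Running
    "Terminating": {},                            # finishes -> Stopped
    "Failed":      {"terminate": "Terminating"},
}

def request(state, operation):
    """Return the next state, or raise if the operation is blocked."""
    allowed = TRANSITIONS.get(state, {})
    if operation not in allowed:
        raise ValueError(f"cannot {operation} a pipeline in state {state}")
    return allowed[operation]
```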

Best Practices

1. Connection Security

  • Use secure connections (SASL_SSL/SSL) for production
  • Store credentials securely
  • Test connections before deploying

2. Schema Design

  • Define clear, consistent field names
  • Use appropriate data types
  • Consider future data structure changes

3. Performance Tuning

  • Adjust batch sizes based on your data volume
  • Set appropriate time windows for deduplication
  • Monitor pipeline performance metrics

4. Error Handling

  • Review pipeline logs regularly
  • Set up monitoring for pipeline failures
  • Have fallback strategies for data processing

Troubleshooting

Common Issues

  1. Connection Failures

    • Verify network connectivity
    • Check authentication credentials
    • Ensure proper SSL certificates
  2. Schema Mismatches

    • Verify field names match exactly
    • Check data type compatibility
    • Review JSON structure
  3. Performance Issues

    • Adjust batch sizes
    • Review time window settings
    • Monitor resource usage

Getting Help

  • Check the pipeline logs for detailed error messages
  • Review the Pipeline JSON Reference documentation
  • Consult the FAQ for common solutions
