
Load Test Results

Last Updated: May 22, 2025

Overview

This document presents the results from a load testing run of the ETL pipeline with GlassFlow. The test was conducted to evaluate the performance and reliability of the system under various configurations.

Test Data

The complete test results are available in the etl-clickhouse-loadtest repository. The results include detailed metrics for each test variant, including throughput, latency, and processing times.

Test Data Format

The tests were conducted using user event data with the following JSON structure:

{ "event_id": "$uuid4", "user_id": "$uuid4", "name": "$name", "email": "$email", "created_at": "$datetime(%Y-%m-%d %H:%M:%S)" }

Test Scope

These load tests focused specifically on the deduplication functionality of GlassFlow. The tests did not include temporal joins. Each test run:

  • Generated events with the above schema
  • Applied deduplication based on event_id (a minimal sketch of this step follows the list)
  • Measured performance metrics for the deduplication process
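The following is a minimal, in-memory sketch of windowed deduplication on event_id, included only to illustrate the step being measured. The state store, window handling, and function names are assumptions and do not reflect GlassFlow's actual implementation.

```python
# Minimal sketch of event_id-based deduplication within a time window.
# Assumptions: in-memory state, and an 8-hour window chosen to mirror the
# "Deduplication Window" parameter below. Not GlassFlow's implementation.
import time

DEDUP_WINDOW_SECONDS = 8 * 60 * 60  # 8h window

_seen: dict[str, float] = {}  # event_id -> timestamp when it was last seen


def is_duplicate(event_id: str, now: float | None = None) -> bool:
    """Return True if event_id was already seen within the window."""
    now = time.time() if now is None else now
    # Drop entries that fell out of the window to keep the state bounded.
    expired = [eid for eid, ts in _seen.items() if now - ts > DEDUP_WINDOW_SECONDS]
    for eid in expired:
        del _seen[eid]
    if event_id in _seen:
        return True
    _seen[event_id] = now
    return False
```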

Discussion

We encourage discussion and feedback on these results. Please join the conversation in our GitHub Discussions where you can:

  • Share your observations
  • Ask questions about the test methodology
  • Discuss potential optimizations
  • Compare results with your own testing

Test results

GlassFlow Pipeline Parameters

| Parameter | Value |
| --- | --- |
| Duplication Rate | 0.1 |
| Deduplication Window | 8h |
| Max Delay Time | 10s |
| Max Batch Size (GlassFlow Sink - ClickHouse) | 5000 |

Test Results

| Variant ID | #records (millions) | #Kafka Publishers (num_processes) | Source RPS in Kafka (records/s) | GlassFlow RPS (records/s) | Average Latency (ms) | Lag (sec) |
| --- | --- | --- | --- | --- | --- | --- |
| load_9fb6b2c9 | 5.0 | 2 | 8705 | 8547 | 0.117 | 10.1 |
| load_0b8b8a70 | 10.0 | 2 | 8773 | 8653 | 0.1156 | 15.04 |
| load_a7e0c0df | 15.0 | 2 | 8804 | 8748 | 0.1143 | 10.04 |
| load_bd0fdf39 | 20.0 | 2 | 8737 | 8556 | 0.1169 | 47.74 |
| load_1542aa3b | 5.0 | 4 | 17679 | 9189 | 0.1088 | 260.55 |
| load_a85a4c42 | 10.0 | 4 | 17738 | 9429 | 0.1061 | 495.97 |
| load_5efd111b | 15.0 | 4 | 17679 | 9341 | 0.1071 | 756.49 |
| load_23da167d | 20.0 | 4 | 17534 | 9377 | 0.1066 | 991.77 |
| load_883b39a0 | 5.0 | 6 | 25995 | 8869 | 0.1128 | 370.57 |
| load_b083f89f | 10.0 | 6 | 26226 | 9148 | 0.1093 | 710.97 |
| load_462558f4 | 15.0 | 6 | 26328 | 9191 | 0.1088 | 1061.44 |
| load_254adf29 | 20.0 | 6 | 26010 | 8391 | 0.1192 | 1613.62 |
| load_0c3fdefc | 5.0 | 8 | 34384 | 8895 | 0.1124 | 415.78 |
| load_3942530b | 10.0 | 8 | 33779 | 8747 | 0.1143 | 846.26 |
| load_d2c1783c | 15.0 | 8 | 34409 | 9067 | 0.1103 | 1217.37 |
| load_febf151f | 20.0 | 8 | 35135 | 9121 | 0.1096 | 1622.75 |
| load_993c0bc5 | 5.0 | 10 | 40256 | 8757 | 0.1142 | 445.76 |
| load_022e44e5 | 10.0 | 10 | 38715 | 8687 | 0.1151 | 891.8 |
| load_0adbae83 | 15.0 | 10 | 39820 | 8694 | 0.115 | 1347.66 |
| load_77d67ac7 | 20.0 | 10 | 40458 | 8401 | 0.119 | 1885.24 |
| load_af120520 | 5.0 | 12 | 37691 | 8068 | 0.124 | 485.95 |
| load_c9424931 | 10.0 | 12 | 45743 | 8610 | 0.1161 | 941.66 |
| load_ee837ca6 | 15.0 | 12 | 45539 | 8605 | 0.1162 | 1412.48 |
| load_ac40b143 | 20.0 | 12 | 49005 | 8878 | 0.1126 | 1843.61 |
| load_675d04f3 | 5.0 | 12 | 40382 | 8467 | 0.1181 | 465.66 |
| load_28956d50 | 10.0 | 12 | 55829 | 8018 | 0.1247 | 1066.62 |

Result parameters

| Parameter | Description |
| --- | --- |
| Variant ID | A unique identifier for each load test run with a specific set of parameters |
| Source RPS in Kafka | The rate at which the load testing framework sent records to Kafka (records per second) |
| GlassFlow RPS | The average processing rate that GlassFlow achieved (records per second) |
| Average Latency | Average latency in milliseconds per record incurred by GlassFlow processing |
| Lag | The time difference (in seconds) between when the last record was sent to Kafka by the load testing framework and when that record became available in ClickHouse |
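The sketch below shows one way these metrics can be derived from raw run measurements. The function and its inputs are assumptions for illustration; in particular, computing the average latency as the inverse of the processing rate is consistent with the reported numbers but is not necessarily the exact methodology used in these tests.

```python
# Minimal sketch of deriving the result parameters from raw run measurements.
# The inputs and variable names are illustrative assumptions, not the actual
# measurement code used for these load tests.
from datetime import datetime


def compute_metrics(
    records_processed: int,
    glassflow_start: datetime,
    glassflow_end: datetime,
    last_record_sent_to_kafka: datetime,
    last_record_in_clickhouse: datetime,
) -> dict:
    processing_seconds = (glassflow_end - glassflow_start).total_seconds()
    glassflow_rps = records_processed / processing_seconds
    # Average per-record latency as the inverse of the processing rate
    # (an assumption that matches the reported values).
    avg_latency_ms = 1000.0 / glassflow_rps
    lag_seconds = (
        last_record_in_clickhouse - last_record_sent_to_kafka
    ).total_seconds()
    return {
        "glassflow_rps": glassflow_rps,
        "avg_latency_ms": avg_latency_ms,
        "lag_sec": lag_seconds,
    }
```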

Performance Graph

[Figure: test results scatter plot of average latency vs RPS]
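A plot like this can be reproduced from the results table, for example with a sketch along the following lines. The CSV path and column names are assumptions and should be adjusted to the export in the etl-clickhouse-loadtest repository.

```python
# Minimal sketch for reproducing a latency-vs-RPS scatter plot from the
# results table. The CSV path and column names are assumptions.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("load_test_results.csv")  # hypothetical export of the table above

fig, ax = plt.subplots(figsize=(8, 5))
scatter = ax.scatter(
    df["source_rps"],        # Source RPS in Kafka
    df["avg_latency_ms"],    # Average Latency (ms)
    c=df["num_processes"],   # color points by number of Kafka publishers
    cmap="viridis",
)
ax.set_xlabel("Source RPS in Kafka (records/s)")
ax.set_ylabel("Average Latency (ms)")
ax.set_title("Load test results: latency vs RPS")
fig.colorbar(scatter, ax=ax, label="#Kafka Publishers")
plt.tight_layout()
plt.show()
```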

Key Findings

Stability

  • The system handled the load without any issues, even up to 55k source RPS
  • The system deduplicated the data without any issues

GlassFlow Processing RPS

  • GlassFlow processing RPS remained stable even under high load
  • The processing RPS appears to peak at around 9,000 records per second in the current setup
  • The processing RPS is limited by the resources of the machine running GlassFlow (currently running in a Docker container)

Lag

  • Lag was directly proportional to the ingestion Kafka RPS and to the amount of data sent at that RPS.
  • For a given ingestion Kafka RPS, lag increased with the amount of data sent (expected)
  • Since GlassFlow RPS maxed out at around 9,000 records per second, lag grew as the ingestion Kafka RPS increased (see the back-of-envelope model below)
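A simple back-of-envelope model is consistent with the observed lag: once ingestion outpaces processing, the lag is roughly the time GlassFlow needs to process the full dataset minus the time it took to ingest it. The sketch below checks this approximation against one variant from the results table; it is an estimate, not the measurement method used in the tests.

```python
# Back-of-envelope lag model: lag ≈ N / glassflow_rps - N / source_rps,
# i.e. the time to process all records minus the time to ingest them.
# Checked against variant load_1542aa3b from the table above; this is an
# approximation, not the actual measurement method.
def estimated_lag_seconds(total_records: int, source_rps: float, glassflow_rps: float) -> float:
    return total_records / glassflow_rps - total_records / source_rps


# load_1542aa3b: 5.0M records, source 17679 rps, GlassFlow 9189 rps, measured lag ~260.55 s
print(estimated_lag_seconds(5_000_000, 17679, 9189))  # ~261 s
```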

Resource Utilization

  • CPU utilization was efficient across all test variants
  • Memory usage remained stable during extended test runs

Next Steps

  1. Further Testing

    • Longer duration tests
    • Higher throughput scenarios
    • Tests with different event data
  2. Batch Size Optimization

    • Testing with different batch sizes to compare performance and lag
  3. Monitoring Improvements

    • Enhanced metrics collection
    • Advanced analysis of the results
  4. Hardware Setup

    • Testing on different hardware
    • Tests with a remote ClickHouse and Kafka to check performance with network delay