Load Test Results
Last Updated: May 22, 2025
Overview
This document presents the results from a load testing run of the ETL pipeline with GlassFlow. The test was conducted to evaluate the performance and reliability of the system under various configurations.
Test Data
The complete test results are available in the etl-clickhouse-loadtest repository. The results include detailed metrics for each test variant, including throughput, latency, and processing times.
Test Data Format
The tests were conducted using user event data with the following JSON structure:
{
"event_id": "$uuid4",
"user_id": "$uuid4",
"name": "$name",
"email": "$email",
"created_at": "$datetime(%Y-%m-%d %H:%M:%S)"
}
Test Scope
These load tests focused specifically on the deduplication functionality of GlassFlow. The tests did not include temporal joins. Each test run:
- Generated events with the above schema
- Applied deduplication based on event_id
- Measured performance metrics for the deduplication process
Discussion
We encourage discussion and feedback on these results. Please join the conversation in our GitHub Discussions where you can:
- Share your observations
- Ask questions about the test methodology
- Discuss potential optimizations
- Compare results with your own testing
Test results
GlassFlow Pipeline Parameters
Parameter | Value |
---|---|
Duplication Rate | 0.1 |
Deduplication Window | 8h |
Max Delay Time | 10s |
Max Batch Size (GlassFlow Sink - Clickhouse) | 5000 |
Test Results
Variant ID | #records (millions) | #Kafka Publishers (num_processes) | Source RPS in Kafka (records/s) | GlassFlow RPS (records/s) | Average Latency (ms) | Lag (sec) |
---|---|---|---|---|---|---|
load_9fb6b2c9 | 5.0 | 2 | 8705 | 8547 | 0.117 | 10.1 |
load_0b8b8a70 | 10.0 | 2 | 8773 | 8653 | 0.1156 | 15.04 |
load_a7e0c0df | 15.0 | 2 | 8804 | 8748 | 0.1143 | 10.04 |
load_bd0fdf39 | 20.0 | 2 | 8737 | 8556 | 0.1169 | 47.74 |
load_1542aa3b | 5.0 | 4 | 17679 | 9189 | 0.1088 | 260.55 |
load_a85a4c42 | 10.0 | 4 | 17738 | 9429 | 0.1061 | 495.97 |
load_5efd111b | 15.0 | 4 | 17679 | 9341 | 0.1071 | 756.49 |
load_23da167d | 20.0 | 4 | 17534 | 9377 | 0.1066 | 991.77 |
load_883b39a0 | 5.0 | 6 | 25995 | 8869 | 0.1128 | 370.57 |
load_b083f89f | 10.0 | 6 | 26226 | 9148 | 0.1093 | 710.97 |
load_462558f4 | 15.0 | 6 | 26328 | 9191 | 0.1088 | 1061.44 |
load_254adf29 | 20.0 | 6 | 26010 | 8391 | 0.1192 | 1613.62 |
load_0c3fdefc | 5.0 | 8 | 34384 | 8895 | 0.1124 | 415.78 |
load_3942530b | 10.0 | 8 | 33779 | 8747 | 0.1143 | 846.26 |
load_d2c1783c | 15.0 | 8 | 34409 | 9067 | 0.1103 | 1217.37 |
load_febf151f | 20.0 | 8 | 35135 | 9121 | 0.1096 | 1622.75 |
load_993c0bc5 | 5.0 | 10 | 40256 | 8757 | 0.1142 | 445.76 |
load_022e44e5 | 10.0 | 10 | 38715 | 8687 | 0.1151 | 891.8 |
load_0adbae83 | 15.0 | 10 | 39820 | 8694 | 0.115 | 1347.66 |
load_77d67ac7 | 20.0 | 10 | 40458 | 8401 | 0.119 | 1885.24 |
load_af120520 | 5.0 | 12 | 37691 | 8068 | 0.124 | 485.95 |
load_c9424931 | 10.0 | 12 | 45743 | 8610 | 0.1161 | 941.66 |
load_ee837ca6 | 15.0 | 12 | 45539 | 8605 | 0.1162 | 1412.48 |
load_ac40b143 | 20.0 | 12 | 49005 | 8878 | 0.1126 | 1843.61 |
load_675d04f3 | 5.0 | 12 | 40382 | 8467 | 0.1181 | 465.66 |
load_28956d50 | 10.0 | 12 | 55829 | 8018 | 0.1247 | 1066.62 |
Result parameters
Parameter | Description |
---|---|
Variant ID | A unique identifier for each load test run with a specific set of parameters |
Source RPS in Kafka | Rate of sending records to Kafka by the load testing framework (records per second) |
GlassFlow RPS | The average rate of processing that GlassFlow achieved (records per second) |
Average Latency | Average latency in milliseconds per record incurred by GlassFlow processing |
Lag | The time difference (in seconds) between when the last record was sent to Kafka by the load testing framework and when that record became available in ClickHouse |
Performance Graph
Key Findings
Stability
- The system was able to handle the load without any issues, even upto 55k RPS.
- The system was able to deduplicate the data without any issues
GlassFlow Processing RPS
- GlassFlow processing RPS was stable even under high load
- The processing RPS seems to peak at 9000 records per second in the current setup
- The processing RPS is limited by resources of the machine running GlassFlow (currently in docker container)
Lag
- Lag was directly proportional to the Ingestion Kafka RPS and the amount of data sent at the given RPS.
- For a given Ingestion Kafka RPS, lag was increasing with the amount of data sent (expected)
- Since GlassFlow RPS was maxed at around 9000 records per second, lag was increasing with the increased Ingestion Kafka RPS
Resource Utilization
- CPU utilization was efficient across all test variants
- Memory usage remained stable during extended test runs
Next Steps
-
Further Testing
- Longer duration tests
- Higher throughput scenarios
- Tests with different events data
-
Batch Size Optimization
- Testing with different batch sizes to compare performance and lag
-
Monitoring Improvements
- Enhanced metrics collection
- Advanced analysis of the results
-
Hardware setup
- Testing on different hardware
- Tests with remote clickhouse and kafka to check performance with network delay