
Load Test Setup Guide

The complete load testing code is available in the clickhouse-etl-loadtest repository.

Test Environment

Hardware Specifications

The load tests were conducted on a MacBook Pro with the following specifications:

| Specification | Details |
| --- | --- |
| Model Name | MacBook Pro |
| Model Identifier | Mac14,5 |
| Model Number | MPHG3D/A |
| Chip | Apple M2 Max |
| Total Number of Cores | 12 (8 performance and 4 efficiency) |
| Memory | 32 GB |

Software Stack

The test environment uses Docker containers for all components:

  • Kafka: Message broker for event streaming
  • ClickHouse: Database for storing and querying the processed events
  • GlassFlow: Data pipeline tool for processing and transforming events

Test Architecture

Component Overview

Test Flow

  1. Each test run:

    • Creates a new GlassFlow pipeline
    • Configures the pipeline with test parameters
    • Sends test data to Kafka
    • Monitors the data flow through the pipeline
    • Verifies results in ClickHouse
    • Reports metrics for the run
  2. Success Criteria:

    • Test is considered successful when all expected data is available in ClickHouse
    • Metrics are collected and reported for each run
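The per-run flow above can be sketched as an orchestration function. Everything here is a hypothetical stand-in for illustration only (the stub helpers, names, and return shapes are assumptions, not the tool's actual API):

```python
def create_glassflow_pipeline(variant: dict) -> dict:
    # Stand-in: a real run would create a fresh GlassFlow pipeline via its API
    return {"pipeline_id": "pipeline_1", "params": variant}

def send_test_data_to_kafka(variant: dict) -> dict:
    # Stand-in: a real run would publish variant["total_records"] events to Kafka
    return {"published": variant["total_records"]}

def count_clickhouse_rows() -> int:
    # Stand-in: a real run would poll ClickHouse until the row count stabilizes
    return 500_000

def run_load_test(variant: dict) -> dict:
    """One run: create pipeline, publish to Kafka, verify in ClickHouse, report."""
    pipeline = create_glassflow_pipeline(variant)
    stats = send_test_data_to_kafka(variant)
    # Success criterion from above: all expected data is available in ClickHouse
    expected = stats["published"] * (1 - variant.get("duplication_rate", 0))
    success = count_clickhouse_rows() >= expected
    return {"pipeline": pipeline["pipeline_id"], "result_success": success}

print(run_load_test({"num_processes": 2, "total_records": 500_000}))
```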

Performance Considerations

Kafka Ingestion Rate (Records Per Second)

  • The tool attempts to achieve the requested RPS
  • The parameter controlling the Kafka ingestion rate is num_processes
  • The actual achieved RPS may vary based on:
    • System resources
    • Network conditions
    • Pipeline configuration
  • The results table shows the actual RPS achieved for each run

Lag Measurement

  • Lag is defined as the time difference between:
    • When the last event is sent to Kafka
    • When that event becomes available in ClickHouse
  • Lag is influenced by:
    • Target RPS
    • Total number of events
    • Pipeline configuration
    • System performance
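Under this definition, lag reduces to a simple timestamp difference. A minimal sketch (the function name and the choice of Unix-second timestamps are assumptions, not the tool's actual implementation):

```python
def compute_lag_ms(last_kafka_send_ts: float, clickhouse_visible_ts: float) -> float:
    """Lag: time between sending the last event to Kafka and that event
    becoming queryable in ClickHouse. Both inputs are Unix seconds."""
    return (clickhouse_visible_ts - last_kafka_send_ts) * 1000.0

# Example: last event sent at t=100.0 s, visible in ClickHouse at t=147.738 s
print(compute_lag_ms(100.0, 147.738))
```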

GlassFlow Processing RPS (Records Per Second)

  • The results table shows the average RPS processed by GlassFlow for each run

Configuration Parameters

Test Parameters

The load test can be configured using the following parameters in load_test_params.json:

| Parameter | Required/Optional | Description | Example Range/Values | Default |
| --- | --- | --- | --- | --- |
| num_processes | Required | Number of parallel processes | 1-N (step: 1) | - |
| total_records | Required | Total number of records to generate | 500,000-5,000,000 (step: 500,000) | - |
| duplication_rate | Optional | Rate of duplicate records | 0.1 (10% duplicates) | 0.1 |
| deduplication_window | Optional | Time window for deduplication | ["1h", "4h"] | "8h" |
| max_batch_size | Optional | Max batch size for the sink | [5000] | 5000 |
| max_delay_time | Optional | Max delay time for the sink | ["10s"] | "10s" |

Example Configuration

```json
{
  "parameters": {
    "num_processes": {
      "min": 1,
      "max": 4,
      "step": 1,
      "description": "Number of parallel processes to run"
    },
    "total_records": {
      "min": 5000000,
      "max": 10000000,
      "step": 5000000,
      "description": "Total number of records to generate"
    }
  },
  "max_combinations": 1
}
```
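The min/max/step ranges expand into a grid of concrete test variants, capped by `max_combinations`. A hedged sketch of how such a grid could be enumerated (the tool's actual expansion logic may differ):

```python
import itertools
import json

def expand_parameter_grid(config: dict) -> list[dict]:
    """Expand each parameter's min/max/step range into concrete values,
    take the cross product, and cap the result at max_combinations."""
    names, value_lists = [], []
    for name, spec in config["parameters"].items():
        names.append(name)
        value_lists.append(list(range(spec["min"], spec["max"] + 1, spec["step"])))
    combos = [dict(zip(names, values)) for values in itertools.product(*value_lists)]
    return combos[: config.get("max_combinations", len(combos))]

config = json.loads("""
{
  "parameters": {
    "num_processes": {"min": 1, "max": 4, "step": 1},
    "total_records": {"min": 5000000, "max": 10000000, "step": 5000000}
  },
  "max_combinations": 1
}
""")
print(expand_parameter_grid(config))  # with max_combinations=1, only the first variant runs
```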

ClickHouse Sink Parameters

These parameters affect the performance of the ClickHouse sink component in GlassFlow:

  1. Batch Size

    • Controls how many records are processed in a single batch
    • Higher batch sizes generally provide better performance under high load
    • Should be configured based on:
      • Expected load
      • Available memory
      • Latency requirements
  2. Delay Time

    • Maximum time to wait before processing a batch
    • Affects the balance between latency and throughput
    • Should be tuned based on:
      • Real-time processing requirements
      • System resources
      • Expected load patterns

Running the Tests

Prerequisites

  1. Docker installed and running
  2. Python 3.x installed
  3. Required Python packages installed

Test Execution

```shell
# Run a load test with specific configuration
python main.py --test-id load_test_1 --config load_test_params.json

# Analyze the results
python results.py --results-file results/load_test_1.csv
```

Test Results File Format

The following metrics are collected and displayed for each test run:

| Metric | Description | Unit |
| --- | --- | --- |
| duration_sec | Total time taken for the test | seconds |
| result_num_records | Number of records processed | count |
| result_time_taken_publish_ms | Time taken to publish records to Kafka | milliseconds |
| result_time_taken_ms | Time taken to process records through the pipeline | milliseconds |
| result_kafka_ingestion_rps | Records per second sent to Kafka | records/second |
| result_avg_latency_ms | Average latency per record | milliseconds |
| result_success | Whether the test completed successfully | boolean |
| result_lag_ms | Lag between data generation and processing | milliseconds |
| result_glassflow_rps | Records per second processed by GlassFlow | records/second |
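Several of these metrics are simple derivations of one another: both RPS figures are a record count divided by an elapsed time. A sketch of that arithmetic, using the field names from the table (the formulas are assumptions inferred from the units, not taken from the tool's source):

```python
def kafka_ingestion_rps(result_num_records: int,
                        result_time_taken_publish_ms: float) -> float:
    """Records per second sent to Kafka: count over publish time in seconds."""
    return result_num_records / (result_time_taken_publish_ms / 1000.0)

def glassflow_rps(result_num_records: int,
                  result_time_taken_ms: float) -> float:
    """Records per second processed end-to-end through the pipeline."""
    return result_num_records / (result_time_taken_ms / 1000.0)

# Roughly matching the example run below: 20M records published in ~2289.22 s
print(round(kafka_ingestion_rps(20_000_000, 2_289_220)))
```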

Example Results

```json
{
  "Parameters": {
    "Variant ID": "load_bd0fdf39",
    "Max Batch Size": 5000,
    "Duplication Rate": 0.1,
    "Deduplication Window": "8h",
    "Max Delay Time": "10s"
  },
  "Results": {
    "Success": "True",
    "Number of Records": "20.0M",
    "Time to Publish": "2289.22 s",
    "Source RPS in Kafka": "8737 records/s",
    "GlassFlow RPS": "8556 records/s",
    "Time to Process": "2337.666 s",
    "Average Latency": "0.0001 s",
    "Lag": "47.738 s"
  }
}
```

Best Practices

  1. Resource Management

    • Monitor Docker container resource usage
    • Ensure sufficient memory allocation
    • Watch for container limits
  2. Test Planning

    • Start with conservative parameters
    • Gradually increase load
    • Monitor system resources
    • Document all parameter changes
  3. Data Verification

    • Verify data consistency in ClickHouse
    • Check for any data loss
    • Validate processing order when required