
GlassFlow Metrics

This guide provides comprehensive information about GlassFlow’s metrics, including available metrics, labels, and monitoring best practices.

  • Metrics are enabled by default and available at the OTEL collector endpoint
  • All backend metrics follow Prometheus format and can be scraped by Prometheus
  • Metrics include component-specific labels for detailed monitoring
  • OpenTelemetry (OTEL) collector exposes metrics via Prometheus exporter
  • UI metrics are exported via OTLP, not Prometheus scraping

Metrics Overview

GlassFlow exports comprehensive metrics in Prometheus format through an OpenTelemetry collector. The metrics are designed to provide visibility into:

  • Data Ingestion: Kafka record consumption rates and volumes
  • Data Processing: Processing duration, throughput, and byte volume metrics
  • Data Sinking: ClickHouse write operations and performance
  • Error Handling: Dead Letter Queue (DLQ) operations
  • OTLP Receiver: Incoming OTLP request rates and latency
  • HTTP Server: API request rates and latency
  • UI: Page views, interactions, and frontend API performance

Metric Naming Convention

All GlassFlow backend metrics follow a consistent naming pattern:

{namespace}_gfm_{metric_name}

Where:

  • {namespace} - Deployment namespace prefix (e.g., “glassflow” if deployed in glassflow namespace)
  • gfm - GlassFlow Metrics prefix
  • {metric_name} - Descriptive metric name

UI metrics use the prefix gfm_ui_ and are exported via OTLP.

The namespace prefix is automatically added based on your deployment configuration. If you deploy GlassFlow in a different namespace, the prefix will change accordingly.
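As a quick illustration of the convention (a sketch, not GlassFlow code), the full Prometheus-visible name can be assembled from its three parts:

```python
# Sketch: assemble the Prometheus-visible metric name from the deployment
# namespace, the fixed "gfm" prefix, and the descriptive metric name.
def full_metric_name(namespace: str, metric_name: str) -> str:
    """e.g. full_metric_name("glassflow", "kafka_records_read_total")"""
    return f"{namespace}_gfm_{metric_name}"

name = full_metric_name("glassflow", "kafka_records_read_total")
# → "glassflow_gfm_kafka_records_read_total"
```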

Core Metrics

Data Ingestion Metrics

{namespace}_gfm_kafka_records_read_total

  • Type: Counter
  • Description: Total number of records read from Kafka
  • Unit: Records
  • Components: Ingestor
  • Labels:
    • component: Component type (e.g., “ingestor”) - Added by GlassFlow
    • pipeline_id: Unique pipeline identifier - Added by GlassFlow
    • instance: Instance identifier - Added by Prometheus
    • job: Job identifier - Added by Prometheus

Example:

glassflow_gfm_kafka_records_read_total{component="ingestor",instance="ingestor-0-7f44fbbfd8-bqbw9",job="pipeline-load-pipeline-1-05b7/ingestor",pipeline_id="load-pipeline-1-05b7"} 16914

In this example, glassflow is the namespace prefix. If you deploy in a different namespace, the prefix will change accordingly.

Data Processing Metrics

{namespace}_gfm_processing_duration_seconds

  • Type: Histogram
  • Description: Processing duration in seconds
  • Unit: Seconds
  • Components: Ingestor, Sink, Transform, Filter, Dedup
  • Labels:
    • component: Component type - Added by GlassFlow
    • pipeline_id: Unique pipeline identifier - Added by GlassFlow
    • stage: (Optional) Processing stage — Added by GlassFlow. Values: dedup_filter, dedup_write, schema_mapping, total_preparation, per_message. Omitted when not applicable.
    • instance: Instance identifier - Added by Prometheus
    • job: Job identifier - Added by Prometheus
    • le: Histogram bucket boundary - Added by Prometheus

Histogram Buckets:

  • 0.001s (1ms)
  • 0.005s (5ms)
  • 0.01s (10ms)
  • 0.025s (25ms)
  • 0.05s (50ms)
  • 0.1s (100ms)
  • 0.25s (250ms)
  • 0.5s (500ms)
  • 1.0s (1s)
  • 2.5s (2.5s)
  • 5.0s (5s)
  • 10.0s (10s)

Example:

glassflow_gfm_processing_duration_seconds_bucket{component="ingestor",instance="ingestor-0-7f44fbbfd8-bqbw9",job="pipeline-load-pipeline-1-05b7/ingestor",pipeline_id="load-pipeline-1-05b7",le="5"} 16914
glassflow_gfm_processing_duration_seconds_sum{component="ingestor",instance="ingestor-0-7f44fbbfd8-bqbw9",job="pipeline-load-pipeline-1-05b7/ingestor",pipeline_id="load-pipeline-1-05b7"} 2.8343126270000267
glassflow_gfm_processing_duration_seconds_count{component="ingestor",instance="ingestor-0-7f44fbbfd8-bqbw9",job="pipeline-load-pipeline-1-05b7/ingestor",pipeline_id="load-pipeline-1-05b7"} 16914


{namespace}_gfm_processor_messages_total

  • Type: Counter
  • Description: Total number of messages processed by a processor, by status
  • Unit: Messages
  • Components: Ingestor, Sink, Transform, Filter, Dedup
  • Labels:
    • component: Component type - Added by GlassFlow
    • pipeline_id: Unique pipeline identifier - Added by GlassFlow
    • status: Outcome - Added by GlassFlow — Values: success, error, filtered, duplicate, out
    • instance: Instance identifier - Added by Prometheus
    • job: Job identifier - Added by Prometheus

There is no separate gfm_records_filtered_total metric. To track filtered records, query gfm_processor_messages_total with status="filtered".
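To make the label semantics concrete: per-status series belong to one counter family, so shares are computed across label values. A small sketch with hypothetical counts:

```python
# Illustration (hypothetical values): gfm_processor_messages_total is one
# counter family; per-status series are distinguished by the status label.
counts = {"success": 9200, "error": 12, "filtered": 640, "duplicate": 148}

# Share of records dropped by a filter = filtered count over all statuses.
# PromQL equivalent idea:
#   sum(rate(..._processor_messages_total{status="filtered"}[5m]))
#     / sum(rate(..._processor_messages_total[5m]))
filtered_share = counts["filtered"] / sum(counts.values())  # 0.064
```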

{namespace}_gfm_bytes_processed_total

  • Type: Counter
  • Description: Total bytes processed
  • Unit: Bytes
  • Components: Ingestor, Sink, Transform, Filter, Dedup
  • Labels:
    • component: Component type - Added by GlassFlow
    • pipeline_id: Unique pipeline identifier - Added by GlassFlow
    • direction: Data flow direction - Added by GlassFlow — Values: in, out
    • instance: Instance identifier - Added by Prometheus
    • job: Job identifier - Added by Prometheus

Data Sinking Metrics

{namespace}_gfm_clickhouse_records_written_total

  • Type: Counter
  • Description: Total number of records written to ClickHouse
  • Unit: Records
  • Components: Sink
  • Labels:
    • component: Component type (e.g., “sink”) - Added by GlassFlow
    • pipeline_id: Unique pipeline identifier - Added by GlassFlow
    • instance: Instance identifier - Added by Prometheus
    • job: Job identifier - Added by Prometheus

Example:

glassflow_gfm_clickhouse_records_written_total{component="sink",instance="sink-7c87594fd9-jmdw2",job="pipeline-load-pipeline-1-05b7/sink",pipeline_id="load-pipeline-1-05b7"} 80000


{namespace}_gfm_clickhouse_records_written_per_second

  • Type: Gauge
  • Description: Number of records written to ClickHouse per second
  • Unit: Records per second
  • Components: Sink
  • Labels:
    • component: Component type (e.g., “sink”) - Added by GlassFlow
    • pipeline_id: Unique pipeline identifier - Added by GlassFlow
    • instance: Instance identifier - Added by Prometheus
    • job: Job identifier - Added by Prometheus

Example:

glassflow_gfm_clickhouse_records_written_per_second{component="sink",instance="sink-7c87594fd9-jmdw2",job="pipeline-load-pipeline-1-05b7/sink",pipeline_id="load-pipeline-1-05b7"} 430120.2485745206


Error Handling Metrics

{namespace}_gfm_dlq_records_written_total

  • Type: Counter
  • Description: Total number of records written to dead letter queue
  • Unit: Records
  • Components: Ingestor, Sink
  • Labels:
    • component: Component type - Added by GlassFlow
    • pipeline_id: Unique pipeline identifier - Added by GlassFlow
    • instance: Instance identifier - Added by Prometheus
    • job: Job identifier - Added by Prometheus

This metric is defined in the code but may not appear in the sample metrics if no records have been written to the DLQ during the observation period.

HTTP Server Metrics

{namespace}_gfm_http_server_request_count

  • Type: Counter
  • Description: Total number of HTTP requests
  • Unit: Requests
  • Components: API server
  • Labels:
    • method: HTTP method - Added by GlassFlow
    • path: Route path template - Added by GlassFlow
    • status: HTTP response status code (integer) - Added by GlassFlow
    • instance: Instance identifier - Added by Prometheus
    • job: Job identifier - Added by Prometheus

HTTP server metrics do not include component or pipeline_id labels. They are scoped by method, path, and status only.

{namespace}_gfm_http_server_request_duration_seconds

  • Type: Histogram
  • Description: Duration of HTTP requests in seconds
  • Unit: Seconds
  • Components: API server
  • Labels:
    • method: HTTP method - Added by GlassFlow
    • path: Route path template - Added by GlassFlow
    • status: HTTP response status code (integer) - Added by GlassFlow
    • instance: Instance identifier - Added by Prometheus
    • job: Job identifier - Added by Prometheus
    • le: Histogram bucket boundary - Added by Prometheus

Histogram Buckets: Same as processing_duration_seconds (0.001s to 10s).

OTLP Receiver Metrics

{namespace}_gfm_receiver_request_count

  • Type: Counter
  • Description: Total number of OTLP receiver requests
  • Unit: Requests
  • Components: otlp.logs, otlp.metrics, otlp.traces
  • Labels:
    • component: Component type - Added by GlassFlow
    • pipeline_id: Unique pipeline identifier - Added by GlassFlow
    • transport: Transport protocol - Added by GlassFlow — Values: http, grpc
    • status: Request outcome - Added by GlassFlow — Values: ok, error
    • instance: Instance identifier - Added by Prometheus
    • job: Job identifier - Added by Prometheus

{namespace}_gfm_receiver_request_duration_seconds

  • Type: Histogram
  • Description: Duration of OTLP receiver requests in seconds
  • Unit: Seconds
  • Components: otlp.logs, otlp.metrics, otlp.traces
  • Labels:
    • component: Component type - Added by GlassFlow
    • pipeline_id: Unique pipeline identifier - Added by GlassFlow
    • transport: Transport protocol - Added by GlassFlow — Values: http, grpc
    • status: Request outcome - Added by GlassFlow — Values: ok, error
    • instance: Instance identifier - Added by Prometheus
    • job: Job identifier - Added by Prometheus
    • le: Histogram bucket boundary - Added by Prometheus

Histogram Buckets: Same as processing_duration_seconds (0.001s to 10s).

Back-Pressure Metrics

These metrics are emitted by the ingestor component and the NATS stream sampler. They help identify whether back-pressure is active and how long episodes last.

{namespace}_gfm_ingestor_backpressure_active

  • Type: Gauge (Int64)
  • Description: Set to 1 while the ingestor is blocked waiting for NATS to drain; 0 otherwise
  • Unit: —
  • Components: Ingestor
  • Labels:
    • pipeline_id: Unique pipeline identifier - Added by GlassFlow
    • instance: Instance identifier - Added by Prometheus
    • job: Job identifier - Added by Prometheus

{namespace}_gfm_ingestor_backpressure_events_total

  • Type: Counter
  • Description: Total number of times the ingestor entered a back-pressure episode
  • Unit: Events
  • Components: Ingestor
  • Labels:
    • pipeline_id: Unique pipeline identifier - Added by GlassFlow
    • instance: Instance identifier - Added by Prometheus
    • job: Job identifier - Added by Prometheus

{namespace}_gfm_ingestor_backpressure_duration_seconds

  • Type: Histogram
  • Description: Duration of each ingestor back-pressure episode in seconds
  • Unit: Seconds
  • Components: Ingestor
  • Labels:
    • pipeline_id: Unique pipeline identifier - Added by GlassFlow
    • instance: Instance identifier - Added by Prometheus
    • job: Job identifier - Added by Prometheus
    • le: Histogram bucket boundary - Added by Prometheus

Histogram Buckets: 0.1, 0.5, 1, 2.5, 5, 10, 30, 60, 120, 300, 600, 1800 seconds (second-to-minute scale, unlike the millisecond-scale default buckets used by request/processing histograms).

{namespace}_gfm_stream_depth

  • Type: Gauge (Int64)
  • Description: Number of messages currently stored in a JetStream stream
  • Unit: Messages
  • Components: Ingestor
  • Labels:
    • pipeline_id: Unique pipeline identifier - Added by GlassFlow
    • stream: JetStream stream name - Added by GlassFlow
    • instance: Instance identifier - Added by Prometheus
    • job: Job identifier - Added by Prometheus

{namespace}_gfm_stream_depth_ratio

  • Type: Gauge (Float64)
  • Description: Stream depth divided by max_messages; ranges from 0.0 to 1.0. Sustained values near 1.0 indicate the stream is near capacity and back-pressure is likely
  • Unit: Ratio (0.0–1.0)
  • Components: Ingestor
  • Labels:
    • pipeline_id: Unique pipeline identifier - Added by GlassFlow
    • stream: JetStream stream name - Added by GlassFlow
    • instance: Instance identifier - Added by Prometheus
    • job: Job identifier - Added by Prometheus

UI Metrics

UI metrics are exported via OTLP (not Prometheus scraping). All UI metrics use the gfm_ui_ prefix.

Interaction Metrics

gfm_ui_page_views_total

  • Type: Counter
  • Description: Total page views
  • Labels: path, component

gfm_ui_button_clicks_total

  • Type: Counter
  • Description: Total button clicks
  • Labels: button_name, component

gfm_ui_form_submissions_total

  • Type: Counter
  • Description: Total form submissions
  • Labels: form_name, success, component

Pipeline Lifecycle Metrics

gfm_ui_pipeline_created_total

  • Type: Counter
  • Description: Total pipelines created from the UI
  • Labels: pipeline_type, component

gfm_ui_pipeline_deleted_total

  • Type: Counter
  • Description: Total pipelines deleted from the UI
  • Labels: pipeline_id, component

gfm_ui_pipeline_status_changed_total

  • Type: Counter
  • Description: Total pipeline status changes from the UI
  • Labels: pipeline_id, from_status, to_status, component

UI API Metrics

gfm_ui_api_request_count

  • Type: Counter
  • Description: Total API requests made from the UI
  • Labels: method, path, status, component

gfm_ui_api_request_errors_total

  • Type: Counter
  • Description: Total API request errors from the UI (HTTP status >= 400)
  • Labels: method, path, status, component

gfm_ui_api_request_duration_seconds

  • Type: Histogram
  • Description: Duration of API requests made from the UI
  • Labels: method, path, status, component

gfm_ui_page_load_duration_seconds

  • Type: Histogram
  • Description: Page load duration
  • Labels: path, component

Component-Specific Metrics

Ingestor Component

The Ingestor component primarily exports:

  • {namespace}_gfm_kafka_records_read_total - Records consumed from Kafka
  • {namespace}_gfm_processing_duration_seconds - Processing time for ingested records
  • {namespace}_gfm_processor_messages_total - Message counts by status
  • {namespace}_gfm_bytes_processed_total - Bytes processed (in/out)
  • {namespace}_gfm_dlq_records_written_total - Records sent to DLQ on processing errors
  • {namespace}_gfm_ingestor_backpressure_active - 1 while blocked on NATS back-pressure
  • {namespace}_gfm_ingestor_backpressure_events_total - Count of back-pressure episodes
  • {namespace}_gfm_ingestor_backpressure_duration_seconds - Duration of each back-pressure episode
  • {namespace}_gfm_stream_depth - Current message count in the JetStream stream
  • {namespace}_gfm_stream_depth_ratio - Stream fill ratio (0.0–1.0)

Sink Component

The Sink component primarily exports:

  • {namespace}_gfm_clickhouse_records_written_total - Records written to ClickHouse
  • {namespace}_gfm_clickhouse_records_written_per_second - Write rate to ClickHouse
  • {namespace}_gfm_processing_duration_seconds - Processing time for sink operations (with optional stage: dedup_filter, dedup_write, schema_mapping, total_preparation, per_message)
  • {namespace}_gfm_processor_messages_total - Message counts by status
  • {namespace}_gfm_bytes_processed_total - Bytes processed (in/out)
  • {namespace}_gfm_dlq_records_written_total - Records sent to DLQ on write errors

Transform Component

The Transform component primarily exports:

  • {namespace}_gfm_processing_duration_seconds - Processing time for transform operations
  • {namespace}_gfm_processor_messages_total - Message counts by status
  • {namespace}_gfm_bytes_processed_total - Bytes processed (in/out)

Filter Component

The Filter component primarily exports:

  • {namespace}_gfm_processing_duration_seconds - Processing time for filter operations
  • {namespace}_gfm_processor_messages_total - Message counts by status (use status="filtered" to track filtered records)
  • {namespace}_gfm_bytes_processed_total - Bytes processed (in/out)

Dedup Component

The Dedup component primarily exports:

  • {namespace}_gfm_processing_duration_seconds - Processing time for dedup operations (with stage: dedup_filter, dedup_write)
  • {namespace}_gfm_processor_messages_total - Message counts by status (use status="duplicate" to track deduplicated records)
  • {namespace}_gfm_bytes_processed_total - Bytes processed (in/out)

API Server

The API server exports HTTP metrics when metrics are enabled:

  • {namespace}_gfm_http_server_request_count - HTTP request count by method, path, and status
  • {namespace}_gfm_http_server_request_duration_seconds - HTTP request duration by method, path, and status

OTLP Receiver

The OTLP receiver components (otlp.logs, otlp.metrics, otlp.traces) export:

  • {namespace}_gfm_receiver_request_count - Receiver request count by transport (http/grpc) and status (ok/error)
  • {namespace}_gfm_receiver_request_duration_seconds - Receiver request duration by transport and status
  • {namespace}_gfm_bytes_processed_total - Bytes processed (in/out)

The gfm_ingestor_backpressure_* and gfm_stream_depth* metrics are emitted by the Kafka ingestor and are not available for OTLP source pipelines.

UI

The UI exports interaction and performance metrics via OTLP:

  • gfm_ui_page_views_total - Page view counts
  • gfm_ui_button_clicks_total - Button click counts
  • gfm_ui_form_submissions_total - Form submission counts
  • gfm_ui_api_request_count - API request counts from the UI
  • gfm_ui_api_request_errors_total - API request error counts from the UI
  • gfm_ui_api_request_duration_seconds - API request duration from the UI
  • gfm_ui_page_load_duration_seconds - Page load duration
  • gfm_ui_pipeline_created_total - Pipeline creation counts
  • gfm_ui_pipeline_deleted_total - Pipeline deletion counts
  • gfm_ui_pipeline_status_changed_total - Pipeline status change counts

Metric Labels

GlassFlow metrics include labels from two sources:

Application Labels (Added by GlassFlow)

These labels are added by the GlassFlow application code:

  • component - Component type. Values: ingestor, sink, dedup, transform, filter, api, otlp.logs, otlp.metrics, otlp.traces
  • pipeline_id - Unique pipeline identifier. Example: load-pipeline-1-05b7
  • stage - Processing stage (optional, for processing_duration_seconds). Values: dedup_filter, dedup_write, schema_mapping, total_preparation, per_message
  • status - Outcome (processor messages), request result (receiver), or HTTP status code (HTTP/UI metrics). Values: success, error, filtered, duplicate, out, ok, or an HTTP code such as 200
  • direction - Data flow direction (for bytes_processed_total). Values: in, out
  • transport - Transport protocol (for receiver metrics). Values: http, grpc
  • stream - JetStream stream name (for stream depth metrics). Example: the pipeline's stream name
  • method - HTTP method (HTTP and UI API metrics). Values: GET, POST, PUT, DELETE
  • path - Route path template (HTTP and UI metrics). Examples: /api/v1/pipelines, /health

Prometheus Labels (Added by Prometheus)

These labels are automatically added by Prometheus during the scraping process:

  • instance - Instance identifier (typically the pod name). Example: ingestor-0-7f44fbbfd8-bqbw9
  • job - Job identifier (from the Prometheus config). Example: pipeline-load-pipeline-1-05b7/ingestor
  • le - Histogram bucket boundary (histogram metrics only). Examples: 0.001, 0.005, 1.0, +Inf

Label Sources:

  • Application labels (component, pipeline_id) are added by GlassFlow code and are consistent across all deployments
  • Prometheus labels (instance, job, le) are added by Prometheus during scraping and depend on your monitoring setup
  • The job label comes from your Prometheus configuration’s job_name field
  • The instance label typically contains the Kubernetes pod name or target endpoint
  • HTTP server metrics (gfm_http_server_request_count, gfm_http_server_request_duration_seconds) do not include component or pipeline_id labels

Accessing Metrics

Metrics Endpoint

The OTEL collector service exposes metrics at:

{release-name}-otel-collector.{namespace}.svc.cluster.local:9090/metrics

For example, if you installed GlassFlow with the release name glassflow-chart in the glassflow namespace:

glassflow-chart-otel-collector.glassflow.svc.cluster.local:9090/metrics
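The endpoint serves the Prometheus text exposition format, one sample per line. A minimal parser sketch (handling the simple quoted-label form used by the samples in this guide, not the full exposition grammar) shows how a line breaks down into name, labels, and value:

```python
import re

# Sketch: parse one line of Prometheus text exposition format into
# (metric name, label dict, value). Covers the quoted-label samples shown
# in this guide; it is not a full exposition-format parser.
LINE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')

def parse_sample(line: str):
    m = LINE.match(line.strip())
    name, raw_labels, value = m.group(1), m.group(2), float(m.group(3))
    return name, dict(LABEL.findall(raw_labels)), value

sample = ('glassflow_gfm_kafka_records_read_total{component="ingestor",'
          'pipeline_id="load-pipeline-1-05b7"} 16914')
name, labels, value = parse_sample(sample)
# name == "glassflow_gfm_kafka_records_read_total", value == 16914.0
```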

Prometheus Scraping

To scrape metrics with Prometheus, add the following configuration to your Prometheus config:

# GlassFlow OTEL Collector metrics
- job_name: 'glassflow-otel-collector'
  static_configs:
    - targets: ['glassflow-chart-otel-collector.glassflow.svc.cluster.local:9090']
  metrics_path: /metrics
  scrape_interval: 15s

💡 Replace glassflow-chart and glassflow with your actual release name and namespace if different.

Understanding the job Label

The job label in your metrics comes from the job_name field in your Prometheus configuration. For example:

  • If your Prometheus config has job_name: 'glassflow-otel-collector', then job="glassflow-otel-collector"
  • If you use Kubernetes service discovery, the job name might be auto-generated based on the service name
  • The job name helps Prometheus identify which scrape configuration was used to collect the metrics

Job Label Examples:

  • job="pipeline-load-pipeline-1-05b7/ingestor" - Indicates this metric came from an ingestor component
  • job="pipeline-load-pipeline-1-05b7/sink" - Indicates this metric came from a sink component
  • job="glassflow-otel-collector" - Indicates this metric came from the OTEL collector endpoint

Monitoring Best Practices

Key Metrics to Monitor

  1. Throughput Metrics:

    • rate({namespace}_gfm_kafka_records_read_total[5m]) - Kafka consumption rate
    • {namespace}_gfm_clickhouse_records_written_per_second - ClickHouse write rate
    • rate({namespace}_gfm_bytes_processed_total{direction="in"}[5m]) - Bytes ingestion rate
    • rate({namespace}_gfm_bytes_processed_total{direction="out"}[5m]) - Bytes output rate
  2. Latency Metrics:

    • histogram_quantile(0.95, rate({namespace}_gfm_processing_duration_seconds_bucket[5m])) - 95th percentile processing time
    • histogram_quantile(0.99, rate({namespace}_gfm_processing_duration_seconds_bucket[5m])) - 99th percentile processing time
    • Use the stage label on processing_duration_seconds for per-stage timing (e.g., dedup_filter, dedup_write, schema_mapping, total_preparation, per_message)
    • histogram_quantile(0.95, rate({namespace}_gfm_http_server_request_duration_seconds_bucket[5m])) - 95th percentile API latency
  3. Error Metrics:

    • rate({namespace}_gfm_dlq_records_written_total[5m]) - DLQ write rate
    • rate({namespace}_gfm_processor_messages_total{status="error"}[5m]) - Processor error rate
    • rate({namespace}_gfm_processor_messages_total{status="filtered"}[5m]) - Record filtering rate
    • rate({namespace}_gfm_processor_messages_total{status="duplicate"}[5m]) - Deduplication rate
  4. Receiver Metrics:

    • rate({namespace}_gfm_receiver_request_count{status="error"}[5m]) - OTLP receiver error rate
    • histogram_quantile(0.95, rate({namespace}_gfm_receiver_request_duration_seconds_bucket[5m])) - 95th percentile receiver latency
  5. Back-Pressure Metrics (Kafka ingestor pipelines only):

    • {namespace}_gfm_ingestor_backpressure_active - Currently in back-pressure (1) or not (0)
    • rate({namespace}_gfm_ingestor_backpressure_events_total[5m]) - Rate of new back-pressure episodes
    • histogram_quantile(0.95, rate({namespace}_gfm_ingestor_backpressure_duration_seconds_bucket[5m])) - 95th percentile episode duration
    • {namespace}_gfm_stream_depth_ratio - Stream fill ratio; alert when sustained near 1.0
    • {namespace}_gfm_stream_depth - Absolute message backlog in stream
  6. Health Metrics:

    • up - Target availability, generated by Prometheus per scrape target (not namespace-prefixed)
    • target_info - Target metadata emitted by the OTEL Prometheus exporter (not namespace-prefixed)
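The rate() expressions above all reduce to the same idea: the per-second increase of a counter over a window. A sketch with hypothetical samples makes the arithmetic explicit:

```python
# Sketch: PromQL's rate() is, conceptually, the per-second increase of a
# counter over a time window (PromQL additionally handles counter resets
# and extrapolation; this sketch assumes no reset between samples).
def per_second_rate(v0: float, t0: float, v1: float, t1: float) -> float:
    return (v1 - v0) / (t1 - t0)

# Hypothetical: kafka_records_read_total went from 16914 to 19914
# over 60 seconds of scrapes → 50 records/second consumption rate.
throughput = per_second_rate(16914, 0.0, 19914, 60.0)  # 50.0
```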

Installation and Setup

For detailed installation instructions and configuration options, see the Observability Installation Guide.
