
Apache Kafka®

Apache Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant data pipelines. It enables real-time data integration, messaging, and stream processing across systems.

Using Apache Kafka with GlassFlow

Kafka source pipelines consume messages from one or more Kafka topics, apply optional transformations, and write the results to ClickHouse. Kafka is the default source type and supports the full set of GlassFlow pipeline features, including deduplication, temporal joins, filtering, and stateless transformations.

Configuration

Each Kafka source is an entry in the sources array with type set to "kafka":

{
  "type": "kafka",
  "source_id": "events",
  "connection_params": {
    "brokers": ["kafka:9092"],
    "protocol": "PLAINTEXT",
    "mechanism": "NO_AUTH"
  },
  "topic": "events",
  "consumer_group_initial_offset": "earliest",
  "schema_fields": [
    {"name": "event_id", "type": "string"},
    {"name": "timestamp", "type": "datetime"}
  ]
}

Source Parameters

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | Must be "kafka" |
| source_id | string | Yes | Unique identifier, referenced by transforms, join, and sink mapping |
| connection_params | object | Yes | Kafka connection parameters |
| topic | string | Yes | Kafka topic name |
| consumer_group_initial_offset | string | No | Where to start reading: earliest or latest (default: latest) |
| schema_fields | array | Yes | Field definitions for this source. Each entry: {"name": "...", "type": "..."} |
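
The required and optional fields in the table above can be checked before a pipeline is submitted. The following is a minimal sketch, not part of GlassFlow: `validate_kafka_source` is a hypothetical helper name, and the checks cover only what the table states (required fields, the fixed `type` value, the allowed offset values and their documented default, and the shape of each `schema_fields` entry).

```python
# Hypothetical pre-flight check for a Kafka source entry.
# It encodes only the rules listed in the Source Parameters table.

REQUIRED_FIELDS = ("type", "source_id", "connection_params", "topic", "schema_fields")
VALID_OFFSETS = ("earliest", "latest")


def validate_kafka_source(source):
    """Return a copy of `source` with the documented default applied.

    Raises ValueError if the entry violates a rule from the table.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in source]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    if source["type"] != "kafka":
        raise ValueError('type must be "kafka"')

    # Copy so the caller's dict is left untouched.
    result = dict(source)

    # The table documents "latest" as the default starting offset.
    result.setdefault("consumer_group_initial_offset", "latest")
    if result["consumer_group_initial_offset"] not in VALID_OFFSETS:
        raise ValueError(f"consumer_group_initial_offset must be one of {VALID_OFFSETS}")

    # Each schema entry needs both a name and a type.
    for entry in result["schema_fields"]:
        if "name" not in entry or "type" not in entry:
            raise ValueError(f"schema_fields entry needs name and type: {entry}")

    return result
```

Applying the default in a copy keeps the check free of side effects, so the same dict can still be serialized exactly as written.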

Connection Parameters

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| brokers | array | Yes | Kafka broker addresses (e.g., ["kafka:9092"]) |
| protocol | string | Yes | Security protocol: PLAINTEXT, SASL_PLAINTEXT, SSL, SASL_SSL |
| mechanism | string | Conditional | Auth mechanism: NO_AUTH, PLAIN, SCRAM-SHA-256, SCRAM-SHA-512, GSSAPI |
| username | string | Conditional | Kafka username (required when auth is enabled) |
| password | string | Conditional | Kafka password (required when auth is enabled) |
| root_ca | string | No | PEM-encoded CA certificate for TLS |
| skip_tls_verification | boolean | No | Skip TLS certificate verification (default: false) |

For detailed examples of each protocol and authentication method (including Kerberos), see Connections.
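
To make the conditional fields concrete, the sketch below assembles a `connection_params` object from the values listed in the Connection Parameters table. `build_connection_params` is a hypothetical helper, not a GlassFlow API. It assumes that `username` and `password` are required for the PLAIN and SCRAM mechanisms, and it does not model Kerberos (GSSAPI) credentials, which the Connections page covers. The broker address and credentials in the usage lines are placeholders.

```python
import json

# Allowed values, copied from the Connection Parameters table.
PROTOCOLS = {"PLAINTEXT", "SASL_PLAINTEXT", "SSL", "SASL_SSL"}
MECHANISMS = {"NO_AUTH", "PLAIN", "SCRAM-SHA-256", "SCRAM-SHA-512", "GSSAPI"}

# Assumption: these mechanisms are the ones that need username/password.
PASSWORD_MECHANISMS = {"PLAIN", "SCRAM-SHA-256", "SCRAM-SHA-512"}


def build_connection_params(brokers, protocol, mechanism="NO_AUTH",
                            username=None, password=None,
                            root_ca=None, skip_tls_verification=False):
    """Hypothetical helper: build and sanity-check connection_params."""
    if not brokers:
        raise ValueError("brokers must list at least one address")
    if protocol not in PROTOCOLS:
        raise ValueError(f"unknown protocol: {protocol}")
    if mechanism not in MECHANISMS:
        raise ValueError(f"unknown mechanism: {mechanism}")

    params = {"brokers": list(brokers), "protocol": protocol, "mechanism": mechanism}

    if mechanism in PASSWORD_MECHANISMS:
        if not username or not password:
            raise ValueError(f"{mechanism} requires username and password")
        params["username"] = username
        params["password"] = password

    # Optional TLS settings are emitted only when they differ from the defaults.
    if root_ca is not None:
        params["root_ca"] = root_ca
    if skip_tls_verification:
        params["skip_tls_verification"] = True

    return params


if __name__ == "__main__":
    # Placeholder broker and credentials, for illustration only.
    params = build_connection_params(
        ["kafka:9093"], "SASL_SSL", "SCRAM-SHA-256",
        username="example-user", password="example-password",
    )
    print(json.dumps(params, indent=2))
```

Leaving `skip_tls_verification` out unless it is explicitly enabled keeps the generated config on the documented default of false.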

Features

| Feature | Supported | Details |
| --- | --- | --- |
| Deduplication | Yes | Deduplication |
| Temporal Joins | Yes | Join |
| Filter | Yes | Filter |
| Stateless Transformation | Yes | Stateless Transformation |

Kafka-compatible providers

The Kafka source works against any Kafka API-compatible broker. The following providers are validated and have provider-specific connection guides:

  • Confluent Cloud — managed Kafka with API-key SASL auth
  • Redpanda — Kafka-compatible streaming without JVM
  • AWS MSK — managed Kafka on AWS, connect via SASL/SCRAM
  • WarpStream — Kafka-compatible with object-storage backend
