
Stateless Transformations

The Stateless Transformation feature lets you reshape event payloads on the fly using expression-based mappings.
It is called stateless because each event is transformed independently, without relying on any stored state or history.

Use stateless transformations when you need to:

  • Normalize fields (e.g., parse URLs, user agents, or timestamps)
  • Derive new fields from existing ones
  • Clean up or reformat data before it lands in ClickHouse
  • Map nested JSON structures into a flat schema

How It Works

Internally, stateless transformations are powered by the expr expression engine and a set of custom helper functions. Each transformation defines:

  • expression: how to compute the new value, using the input JSON as context
  • output_name: the resulting field name in the transformed payload
  • output_type: the expected type of the result (e.g., string, int, float, bool)

Internal Process

  1. Input Parsing: The original Kafka event is parsed as a JSON object (map[string]any).
  2. Expression Evaluation:
    For each configured transformation, GlassFlow evaluates the expression against the input object.
  3. Type Conversion:
    The result of the expression is converted to the configured output_type (string, int, float64, bool, or []string).
  4. Output Assembly:
    All transformed fields are collected into a new JSON object keyed by output_name.
  5. Downstream Processing:
    The transformed JSON is then used for schema mapping and written to ClickHouse.

If no stateless transformations are configured, the input JSON is passed through unchanged.
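The five steps above can be sketched in Python as follows. This is a simplified illustration, not GlassFlow's implementation, which is built on the expr engine. Each expression is stood in for by a Python callable. The names Transformation, CONVERTERS and apply_transformations are hypothetical. The converter table only approximates the output types listed in step 3.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


def _to_bool(value: Any) -> bool:
    # Assumption: the strings "true"/"false" map to booleans; other values use truthiness.
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


# Hypothetical mapping from output_type names to converters.
# A failed conversion raises here; GlassFlow's real error handling may differ.
CONVERTERS: dict[str, Callable[[Any], Any]] = {
    "string": str,
    "int": int,
    "float64": float,
    "bool": _to_bool,
    "[]string": lambda v: [str(x) for x in v],
}


@dataclass
class Transformation:
    # A Python callable stands in for the expr expression string.
    expression: Callable[[dict], Any]
    output_name: str
    output_type: str


def apply_transformations(raw: bytes, transforms: list[Transformation]) -> dict:
    event = json.loads(raw)                        # 1. parse the Kafka message as JSON
    if not transforms:
        return event                               # nothing configured: pass through unchanged
    result: dict[str, Any] = {}
    for t in transforms:
        value = t.expression(event)                # 2. evaluate against the input object
        value = CONVERTERS[t.output_type](value)   # 3. convert to the configured output_type
        result[t.output_name] = value              # 4. key the result by output_name
    return result                                  # 5. handed on to schema mapping and the sink
```

In this sketch only the configured output_name fields appear in the result, mirroring step 4. The original input fields are not copied over automatically.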

Expression Context

Inside an expression, you can reference any field from the input JSON by its name, for example:

  • url, user_agent, status, age
  • Nested fields using dot notation, e.g. user.id, payload.metadata.referrer

The expression engine is type-aware and supports arithmetic, string operations, conditionals, and logical operators.
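Conceptually, a dotted reference such as payload.metadata.referrer walks into the parsed event one key at a time. The Python sketch below illustrates that lookup; resolve_path is a hypothetical name. It returns None as soon as a segment is missing. That is closer to expr's optional chaining (?.) than to plain . access, which may behave differently when an intermediate object is absent.

```python
from typing import Any


def resolve_path(event: dict, path: str) -> Any:
    """Follow a dotted reference such as 'payload.metadata.referrer'."""
    current: Any = event
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # missing segment: behave like optional chaining (?.)
        current = current[key]
    return current
```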

Transformation Expressions

Stateless transformations are evaluated using the expr expression language. Expr is a simple, type-aware expression engine that supports literals, variables, operators, and a rich set of built-in functions. Your transformation expressions run on this engine. You can therefore use the full expr syntax (arithmetic, comparison, logical, and string operators, optional chaining, and predicates) in addition to the GlassFlow-specific helpers described below.

Expression Operators

The following operators are available in expressions (from the expr language definition):

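| Category | Operators | Examples |
| --- | --- | --- |
| Arithmetic | `+`, `-`, `*`, `/`, `%`, `^` or `**` | `count * 2`, `3 ** 2` |
| Comparison | `==`, `!=`, `<`, `>`, `<=`, `>=` | `status == "active"`, `age >= 18` |
| Logical | `not` or `!`, `and` or `&&`, `or` or `\|\|` | `!disabled`, `a and b` |
| Conditional | `?:`, `??`, `if`/`else` | `x ? 1 : 0`, `name ?? "Anonymous"` |
| Membership | `[]`, `.`, `?.`, `in` | `payload.id`, `tags[0]`, `"x" in list` |
| String | `+`, `contains`, `startsWith`, `endsWith` | `"Hi " + name` |
| Regex | `matches` | `email matches ".*@.+\\.com"` |
| Range | `..` | `1..3` → `[1, 2, 3]` |
| Slice | `[:]` | `arr[1:4]`, `arr[-1]` |
| Pipe | `\|` | `s \| lower() \| split(",")` |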

Functions

In addition to standard expr operators, GlassFlow provides helper functions useful for web / event data pipelines.
Use them directly in expressions, e.g. upper(status), parseUserAgent(user_agent, "device").

Each function is listed below with a short description and examples showing expression → output.


parseQuery

Parses a URL query string into a map of key/value pairs. Useful for deriving fields from ?key=value&foo=bar-style strings.

| Expression | Example output |
| --- | --- |
| `parseQuery('a=1&b=2&c=hello')` | `{"a": "1", "b": "2", "c": "hello"}` |
| `parseQuery(query_string)` with `query_string = "utm_source=google"` | `{"utm_source": "google"}` |
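The following is an approximate Python equivalent of parseQuery, built on the standard library's urllib.parse.parse_qsl, which also percent-decodes keys and values. Two details are assumptions rather than documented behaviour: a leading ? is stripped, and the first value wins when a key repeats. parse_query is an illustrative name.

```python
from urllib.parse import parse_qsl


def parse_query(query: str) -> dict[str, str]:
    """Approximate parseQuery: 'a=1&b=2' -> {'a': '1', 'b': '2'}."""
    result: dict[str, str] = {}
    # keep_blank_values keeps keys such as 'flag=' with an empty-string value
    for key, value in parse_qsl(query.lstrip("?"), keep_blank_values=True):
        result.setdefault(key, value)  # assumption: first occurrence wins
    return result
```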

getQueryParam

Returns the value of a specific query parameter from a query string, or an empty string if the parameter is not present.

| Expression | Example output |
| --- | --- |
| `getQueryParam('a=1&b=2', 'b')` | `"2"` |
| `getQueryParam(url_query, 'utm_campaign')` with `url_query = "utm_campaign=winter"` | `"winter"` |
| `getQueryParam('a=1&b=2', 'missing')` | `""` |
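getQueryParam behaves like a lookup into the parsed query string with an empty-string default. The Python sketch below assumes that behaviour. It also assumes that the first value wins for repeated keys. get_query_param is an illustrative name.

```python
from urllib.parse import parse_qs


def get_query_param(query: str, name: str) -> str:
    """Approximate getQueryParam: returns '' when the parameter is absent."""
    values = parse_qs(query.lstrip("?"), keep_blank_values=True).get(name)
    return values[0] if values else ""  # assumption: first value of a repeated key
```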

urlDecode

URL-decodes an encoded string (e.g. %20 → space, %2B → +).

| Expression | Example output |
| --- | --- |
| `urlDecode('hello%20world')` | `"hello world"` |
| `urlDecode('a%3D1%26b%3D2')` | `"a=1&b=2"` |

parseISO8601

Parses common ISO-8601 timestamp strings and returns a Unix timestamp in seconds. Supports formats such as 2006-01-02T15:04:05Z and 2006-01-02 15:04:05.000000.

| Expression | Example output |
| --- | --- |
| `parseISO8601('2024-06-15T14:30:00Z')` | `1718461800` |
| `parseISO8601(timestamp_field)` with `timestamp_field = "2024-01-01T00:00:00Z"` | `1704067200` |
| `parseISO8601('')` or invalid input | `0` |
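The table implies a "try each layout, otherwise return 0" behaviour. The Python sketch below handles only the two layouts named above, and the real helper may accept more. It assumes the offset-less layout is UTC. parse_iso8601 is an illustrative name.

```python
from datetime import datetime, timezone

# The two layouts named in the docs, written as strptime patterns.
_LAYOUTS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d %H:%M:%S.%f")


def parse_iso8601(value: str) -> int:
    """Approximate parseISO8601: Unix seconds, or 0 if no layout matches."""
    for layout in _LAYOUTS:
        try:
            parsed = datetime.strptime(value, layout)
        except ValueError:
            continue
        # Assumption: timestamps without an explicit offset are read as UTC.
        return int(parsed.replace(tzinfo=timezone.utc).timestamp())
    return 0
```

Failures return 0 rather than raising. A malformed timestamp therefore silently becomes the Unix epoch (1970-01-01T00:00:00Z), which is worth guarding against downstream.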

toDate

Converts a Unix timestamp (seconds) or a time.Time value into a YYYY-MM-DD date string.

| Expression | Example output |
| --- | --- |
| `toDate(1718458200)` | `"2024-06-15"` |
| `toDate(parseISO8601('2024-06-15T14:30:00Z'))` | `"2024-06-15"` |
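The docs do not state which time zone toDate uses. The Python approximation below assumes UTC, so verify this if your events fall near midnight in other zones. to_date is an illustrative name, and treating naive datetime values as UTC is also an assumption.

```python
from datetime import datetime, timezone
from typing import Union


def to_date(value: Union[int, float, datetime]) -> str:
    """Approximate toDate: Unix seconds or a datetime -> 'YYYY-MM-DD' (UTC assumed)."""
    if isinstance(value, datetime):
        # Assumption: naive datetimes are interpreted as UTC.
        moment = value if value.tzinfo else value.replace(tzinfo=timezone.utc)
    else:
        moment = datetime.fromtimestamp(value, tz=timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%d")
```

Under this assumption toDate(0) is "1970-01-01". That is what toDate(parseISO8601(...)) yields for an empty or invalid timestamp.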

parseUserAgent

Extracts structured information from a user agent string. The second argument must be one of: "device", "browser", or "os".

Example user_agent value used below:
Mozilla/5.0 (Linux; Android 10; SM-G973F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.120 Mobile Safari/537.36

| Expression | Example output |
| --- | --- |
| `parseUserAgent(user_agent, "device")` | `"Mobile"` (other user agents can yield `"Desktop"` or `"Tablet"`) |
| `parseUserAgent(user_agent, "browser")` | `"Chrome"` |
| `parseUserAgent(user_agent, "os")` | `"Android"` |
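The sketch below is a deliberately naive token-matching heuristic in Python. It reproduces the table above but is not how parseUserAgent is implemented; a production parser relies on a maintained user-agent database. The sketch shows why check order matters. Chrome user agents also mention Safari, and Android user agents also mention Linux. parse_user_agent, the "Unknown" fallback and the ValueError for other parts are all hypothetical.

```python
def parse_user_agent(ua: str, part: str) -> str:
    """Toy heuristic; part must be 'device', 'browser' or 'os'."""
    if part == "device":
        if "iPad" in ua or "Tablet" in ua:
            return "Tablet"
        if "Mobile" in ua or "iPhone" in ua:
            return "Mobile"
        if "Android" in ua:
            return "Tablet"  # Android without "Mobile" is conventionally a tablet
        return "Desktop"
    if part == "browser":
        # Edge UAs also contain "Chrome/"; Chrome UAs also contain "Safari/".
        for token, name in (("Edg/", "Edge"), ("OPR/", "Opera"), ("Chrome/", "Chrome"),
                            ("Firefox/", "Firefox"), ("Safari/", "Safari")):
            if token in ua:
                return name
        return "Unknown"
    if part == "os":
        # Android UAs also contain "Linux"; iPhone UAs also contain "Mac OS X".
        for token, name in (("Android", "Android"), ("iPhone", "iOS"), ("iPad", "iOS"),
                            ("Windows", "Windows"), ("Mac OS X", "macOS"), ("Linux", "Linux")):
            if token in ua:
                return name
        return "Unknown"
    raise ValueError("part must be 'device', 'browser' or 'os'")
```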

containsStr

Returns whether the first string contains the second string (substring check).

| Expression | Example output |
| --- | --- |
| `containsStr('hello world', 'world')` | `true` |
| `containsStr('hello world', 'xyz')` | `false` |
| `containsStr(path, '/api/')` with `path = '/api/v1/users'` | `true` |

hasPrefix

Returns whether the first string starts with the second string.

| Expression | Example output |
| --- | --- |
| `hasPrefix('hello world', 'hello')` | `true` |
| `hasPrefix('hello world', 'world')` | `false` |
| `hasPrefix(path, '/g/')` with `path = '/g/collect'` | `true` |

hasSuffix

Returns whether the first string ends with the second string.

| Expression | Example output |
| --- | --- |
| `hasSuffix('file.json', '.json')` | `true` |
| `hasSuffix('file.json', '.txt')` | `false` |

upper

Converts a string to uppercase.

| Expression | Example output |
| --- | --- |
| `upper('hello')` | `"HELLO"` |
| `upper(status)` with `status = "pending"` | `"PENDING"` |

lower

Converts a string to lowercase.

| Expression | Example output |
| --- | --- |
| `lower('HELLO')` | `"hello"` |
| `lower(type)` with `type = "EVENT"` | `"event"` |

trim

Trims leading and trailing whitespace from a string.

| Expression | Example output |
| --- | --- |
| `trim(' hello ')` | `"hello"` |
| `trim(raw_input)` with `raw_input = "\t value \n"` | `"value"` |

split

Splits a string by a separator and returns an array of strings.

| Expression | Example output |
| --- | --- |
| `split('a,b,c', ',')` | `["a", "b", "c"]` |
| `split('one-two-three', '-')` | `["one", "two", "three"]` |
| `split(path, '/')` with `path = "/api/v1"` | `["", "api", "v1"]` |

join

Joins an array of values into a single string using the given separator. The first argument must be an array (e.g. the result of split).

| Expression | Example output |
| --- | --- |
| `join(['a', 'b', 'c'], '-')` | `"a-b-c"` |
| `join(['x', 'y', 'z'], ' ')` | `"x y z"` |

replace

Replaces all occurrences of a substring with another string.

| Expression | Example output |
| --- | --- |
| `replace('hello world', 'world', 'there')` | `"hello there"` |
| `replace(url, 'http://', 'https://')` with `url = 'http://example.com'` | `"https://example.com"` |

toInt

Converts a value to an integer. Returns 0 for empty or invalid input.

| Expression | Example output |
| --- | --- |
| `toInt('42')` | `42` |
| `toInt('3.14')` | `0` (not a valid integer) |
| `toInt('abc')` | `0` (not a valid integer) |
| `toInt(age_str)` with `age_str = "25"` | `25` |

toFloat

Converts a value to a float. Returns 0.0 for empty or invalid input.

| Expression | Example output |
| --- | --- |
| `toFloat('3.14')` | `3.14` |
| `toFloat('42')` | `42` |
| `toFloat(price_str)` with `price_str = "19.99"` | `19.99` |

toString

Converts a value to a string. Returns "" for empty input.

| Expression | Example output |
| --- | --- |
| `toString(42)` | `"42"` |
| `toString(3.14)` | `"3.14"` |
| `toString(id)` with `id = 12345` | `"12345"` |
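The conversion rules in the three tables above can be summarised by the Python approximation below. to_int, to_float and to_string are illustrative names. Behaviour beyond the documented examples is assumed: handling of None, booleans, surrounding whitespace and non-finite floats. The pipeline example later on calls toInt(toFloat(...) * 1000000), which suggests toInt accepts floats. This sketch assumes it truncates them, so toInt(toFloat('3.14')) would give 3 even though toInt('3.14') gives 0.

```python
import math
import re
from typing import Any

_WHOLE_NUMBER = re.compile(r"[+-]?[0-9]+")


def to_int(value: Any) -> int:
    """Approximate toInt: 0 for empty or invalid input; strings must be whole numbers."""
    if isinstance(value, float):
        # Assumption: floats are truncated toward zero; NaN and infinity fall back to 0.
        return int(value) if math.isfinite(value) else 0
    if isinstance(value, int):
        return int(value)  # also covers bool (True -> 1)
    if isinstance(value, str) and _WHOLE_NUMBER.fullmatch(value):
        return int(value)
    return 0


def to_float(value: Any) -> float:
    """Approximate toFloat: 0.0 for empty or invalid input."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return 0.0


def to_string(value: Any) -> str:
    """Approximate toString: '' for empty (None) input."""
    return "" if value is None else str(value)
```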

keys

Returns all keys of a JSON object (map), sorted alphabetically.

| Expression | Example output |
| --- | --- |
| `keys(parseQuery('a=1&b=2'))` | `["a", "b"]` |
| `keys(payload)` with `payload = {"x": 1, "y": 2}` | `["x", "y"]` |

waterfall

Returns the first non-empty value from a list of candidates. Accepts either multiple arguments or a single array. Useful for fallbacks (e.g. prefer utm_source, then source, then "direct").

| Expression | Example output |
| --- | --- |
| `waterfall('', 'b', 'c')` | `"b"` |
| `waterfall(utm_source, source, 'direct')` with `utm_source = ""`, `source = "newsletter"` | `"newsletter"` |
| `waterfall('', '', 'last')` | `"last"` |
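The fallback logic can be sketched in Python as below. waterfall here is an illustrative function, not the GlassFlow helper itself. In this sketch only None and "" count as empty, and an all-empty call returns "". Both choices are assumptions. Unlike expr's ?? operator, which only replaces nil, this also skips empty strings. That is why it suits optional string fields and query parameters.

```python
from typing import Any


def waterfall(*candidates: Any) -> Any:
    """Approximate waterfall: first candidate that is neither None nor ''."""
    # A single list argument is treated as the list of candidates.
    if len(candidates) == 1 and isinstance(candidates[0], (list, tuple)):
        candidates = tuple(candidates[0])
    for value in candidates:
        if value is not None and value != "":
            return value
    return ""  # assumption: all-empty input yields an empty string
```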

extractPathType

Maps a known path to a request type. Recognized paths: /g/collect → "collect", /_/set_cookie → "set_cookie"; any other path → "unknown".

| Expression | Example output |
| --- | --- |
| `extractPathType('/g/collect')` | `"collect"` |
| `extractPathType(path)` with `path = "/_/set_cookie"` | `"set_cookie"` |
| `extractPathType('/other')` | `"unknown"` |

hasKeyPrefix

Returns whether any key in the given map starts with one of the given prefixes. First argument: map (object); second: array of prefix strings.

| Expression | Example output |
| --- | --- |
| `hasKeyPrefix(payload, ['ga_', 'gclid_'])` with `payload = {"ga_session_id": "123"}` | `true` |
| `hasKeyPrefix(payload, ['utm_'])` with `payload = {"page": "/"}` | `false` |

hasAnyKey

Returns whether the given map has at least one of the given keys. First argument: map; second: array of key names.

| Expression | Example output |
| --- | --- |
| `hasAnyKey(payload, ['user_id', 'client_id'])` with `payload = {"client_id": "abc"}` | `true` |
| `hasAnyKey(payload, ['user_id', 'client_id'])` with `payload = {"page": "/"}` | `false` |
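Both map helpers can be approximated in a few lines of Python. has_key_prefix and has_any_key are illustrative names. The sketch assumes only top-level keys are inspected, since the docs do not mention nested keys. It also assumes hasAnyKey checks presence only, so a key whose value is null or "" still counts.

```python
from typing import Any, Iterable, Mapping


def has_key_prefix(obj: Mapping[str, Any], prefixes: Iterable[str]) -> bool:
    """Approximate hasKeyPrefix: does any top-level key start with any prefix?"""
    prefix_tuple = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return any(key.startswith(prefix_tuple) for key in obj)


def has_any_key(obj: Mapping[str, Any], keys: Iterable[str]) -> bool:
    """Approximate hasAnyKey: is at least one of the keys present?"""
    return any(key in obj for key in keys)
```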

Configuration

Stateless transformations are configured in the pipeline’s stateless_transformation section, as a list of transform entries under config.
They are typically surfaced through the UI as derived fields or expression-based mappings.

Configuration Structure

Examples

Here is an example of a pipeline with stateless transformation enabled (Kafka connection parameters and sink settings are elided).

Scenario: Web events with optional timestamp, path, and request_time. We derive an event date, a path-based category, and request time in microseconds.

```json
{
  "version": "v2",
  "pipeline_id": "events-with-transform",
  "name": "Web Events with Stateless Transformation",
  "source": {
    "type": "kafka",
    "connection_params": { ... },
    "topics": [
      { "name": "events", "consumer_group_initial_offset": "latest", "replicas": 1 }
    ]
  },
  "stateless_transformation": {
    "enabled": true,
    "id": "transform",
    "type": "expr_lang_transform",
    "config": {
      "transform": [
        {
          "expression": "toDate(parseISO8601(timestamp ?? ''))",
          "output_name": "event_date",
          "output_type": "string"
        },
        {
          "expression": "extractPathType(path ?? '') == 'collect' ? 'event' : 'other'",
          "output_name": "event_category",
          "output_type": "string"
        },
        {
          "expression": "toInt(toFloat(request_time ?? 0) * 1000000)",
          "output_name": "request_time_usec",
          "output_type": "uint"
        }
      ]
    }
  },
  "schema": {
    "fields": [
      { "source_id": "transform", "name": "event_date", "type": "string" },
      { "source_id": "transform", "name": "event_category", "type": "string" },
      { "source_id": "transform", "name": "request_time_usec", "type": "uint" }
    ]
  },
  "sink": { "type": "clickhouse", ... }
}
```
  • event_date: the custom helpers parseISO8601 and toDate turn an ISO-8601 timestamp into a YYYY-MM-DD string. timestamp ?? '' (expr nil coalescing) supplies an empty string when the field is missing, so parseISO8601 returns 0 instead of failing.
  • event_category: extractPathType combined with the expr ternary operator (?:) maps /g/collect to "event"; every other path, including /_/set_cookie, becomes "other".
  • request_time_usec: toFloat converts request_time (in seconds) to a float, which is multiplied by 1,000,000 to get microseconds; toInt then converts the product to an integer. request_time ?? 0 covers a missing field.
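The Python sketch below evaluates the three example expressions for two sample events. It re-implements only the documented behaviour of the helpers involved, and assumes UTC in toDate and truncation in toInt. derive_fields, coalesce and the sample events are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any


def parse_iso8601(value: str) -> int:
    # Documented behaviour: Unix seconds, or 0 for empty or invalid input.
    for layout in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d %H:%M:%S.%f"):
        try:
            parsed = datetime.strptime(value, layout)
        except ValueError:
            continue
        return int(parsed.replace(tzinfo=timezone.utc).timestamp())  # UTC assumed
    return 0


def to_date(seconds: int) -> str:
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")  # UTC assumed


def extract_path_type(path: str) -> str:
    return {"/g/collect": "collect", "/_/set_cookie": "set_cookie"}.get(path, "unknown")


def coalesce(value: Any, default: Any) -> Any:
    # expr's `??`: use the default only when the value is nil (missing or null).
    return default if value is None else value


def derive_fields(event: dict) -> dict:
    """Evaluate the three example expressions for one parsed event."""
    path_type = extract_path_type(coalesce(event.get("path"), ""))
    return {
        # toDate(parseISO8601(timestamp ?? ''))
        "event_date": to_date(parse_iso8601(coalesce(event.get("timestamp"), ""))),
        # extractPathType(path ?? '') == 'collect' ? 'event' : 'other'
        "event_category": "event" if path_type == "collect" else "other",
        # toInt(toFloat(request_time ?? 0) * 1000000); toFloat's invalid-input fallback is omitted
        "request_time_usec": int(float(coalesce(event.get("request_time"), 0)) * 1_000_000),
    }
```

An event carrying none of the optional fields still produces a row. Under the UTC assumption its missing timestamp becomes "1970-01-01", which you may want to filter or handle downstream.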

Best Practices

  • Keep expressions focused:
    Prefer several small, clear transformations over one very complex expression.
  • Validate transformations with sample data:
    Use the UI or SDK evaluation helpers (where available) to test expressions before deploying.
  • Align output_type with ClickHouse schema:
    Make sure the type you choose matches the target ClickHouse column type to avoid conversion errors.
  • Use helpers for common patterns:
    Offload parsing and normalization (timestamps, user agents, query strings) to the provided helper functions instead of re-implementing logic inside expressions.