Define a transformation function

This page explains how to create a custom transformation function in Python with GlassFlow.

Data transformation converts or maps data from one format or structure into another. GlassFlow does this with a custom Python transformation function, which supports a wide range of scenarios, including data cleaning, validation, normalization, and enrichment.

Implementing Transformations

To perform data transformations in GlassFlow, you write a Python script containing a mandatory handler function. This function is where you define your transformation logic:

def handler(data, log):
    # Your transformation logic goes here.
    return data

GlassFlow automatically invokes this function when a data pipeline runs and passes it two arguments:

  • data - the event dispatched to the pipeline, accessible within the function as a JSON-serializable Python dictionary.

  • log - a Python logging object for writing log messages. Anything you log through it is included in the pipeline logs, which can be viewed through the CLI.

The handler function processes this data and returns the transformed data as a JSON-serializable Python dictionary.
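As an illustration of the two arguments in use, here is a minimal sketch of a handler that normalizes one field and enriches the event with a timestamp. The field names "email" and "processed_at" are hypothetical and chosen only for this example; they are not part of any GlassFlow schema.

```python
from datetime import datetime, timezone


def handler(data, log):
    # Work on a shallow copy so the incoming event is left untouched.
    event = dict(data)

    # Normalization: trim whitespace and lowercase the (hypothetical) "email" field.
    email = event.get("email")
    if isinstance(email, str):
        event["email"] = email.strip().lower()
    else:
        log.warning("Event has no usable 'email' field")

    # Enrichment: record when the event was processed (hypothetical field name).
    event["processed_at"] = datetime.now(timezone.utc).isoformat()

    log.info("Transformed event with keys: %s", sorted(event))
    return event
```

The sketch returns a new dictionary rather than mutating data in place, which keeps the original event available for logging or comparison inside the function.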

Default Transformation Function

When you create a pipeline in GlassFlow without a custom transformation function, a default "echo" function is automatically created. Here's the basic structure of the default transformation function script in GlassFlow:

import json

def handler(data, log):
    log.info("Echo: " + json.dumps(data))

    return data

To customize the transformation function, you can modify the handler.py file to include your transformation logic.

You can also use other Python dependencies (packages that you import into your script) in the transformation function. To do so, add a requirements.txt file when creating the pipeline.
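Before adding handler.py to a pipeline, it can help to call the handler locally. The sketch below assumes that a standard-library logger is an adequate stand-in for the log object GlassFlow passes in; the logger name and the sample event are hypothetical.

```python
import json
import logging


def handler(data, log):
    # Same body as the default "echo" function shown above.
    log.info("Echo: " + json.dumps(data))
    return data


if __name__ == "__main__":
    # A standard-library logger stands in for the object GlassFlow passes in.
    logging.basicConfig(level=logging.INFO)
    local_log = logging.getLogger("handler-local-test")

    sample_event = {"id": 1, "name": "example"}  # hypothetical sample event
    result = handler(sample_event, local_log)
    print(result)
```

Running the file directly logs the echoed event and prints the returned dictionary, which gives a quick check that the function accepts and returns the expected shape.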

Next

In the Create a Pipeline section, you will learn how to configure a new pipeline.

© 2023 GlassFlow