Data transformation

This page outlines data transformation concepts in GlassFlow.

What is Data Transformation?

Data transformation converts data from its original format or structure into one better suited for analysis, processing, or storage. It often involves cleaning, enriching, and manipulating data using various libraries and functions.

Common Data Transformations

Stateless

  • Data Cleaning

  • Data Enrichment

  • Data Validation

  • Data Anomaly Detection

  • Data Profiling

  • Data Quality Check

  • Data Normalization

  • Data Conversion

  • Real-time API integration

  • LLM integration

  • ML-trained model integration

Stateful

  • Data Aggregation

  • Data Filtering

  • Data transformation based on history
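To make the distinction concrete, here is a minimal sketch in plain Python that does not use any GlassFlow API. The names clean_event and RunningTotal, and the event fields user_id and amount, are hypothetical and chosen only for illustration. The stateless function depends only on the event it receives, while the stateful class keeps a per-user total across events.

```python
from collections import defaultdict


def clean_event(event):
    """Stateless: the output depends only on the current event."""
    return {
        "user_id": str(event["user_id"]).strip(),
        "amount": float(event["amount"]),
    }


class RunningTotal:
    """Stateful: keeps a per-user total that persists across events."""

    def __init__(self):
        self.totals = defaultdict(float)

    def update(self, event):
        self.totals[event["user_id"]] += event["amount"]
        return {**event, "running_total": self.totals[event["user_id"]]}


# Hypothetical sample events, for illustration only.
events = [
    {"user_id": " u1 ", "amount": "10.50"},
    {"user_id": "u1", "amount": 5},
    {"user_id": "u2", "amount": "2.5"},
]

aggregator = RunningTotal()
results = [aggregator.update(clean_event(e)) for e in events]
for row in results:
    print(row)
```

Because the stateless step needs no memory of earlier events, it can be applied to events in any order, whereas the running total depends on everything seen so far.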

Transforming data in Python with GlassFlow

In GlassFlow, you create a custom transformation function in a Python script to transform data. You implement your logic for the transformation inside the handler function. See how to implement a transformation function.
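As a rough sketch of what such a script might look like: the text above only confirms that the logic lives in a function named handler, so the two-argument signature handler(data, log), the assumption that data is a dict for a single event, and the email field are all illustrative assumptions rather than the confirmed interface.

```python
import logging


def handler(data, log):
    # Assumption: `data` is a dict holding one event and `log` is a
    # logger-like object with an `info` method.
    log.info("Transforming event")
    data["email"] = data.get("email", "").strip().lower()
    data["is_valid"] = "@" in data["email"]
    return data


# Local check with a hypothetical sample event and a standard logger.
logging.basicConfig(level=logging.INFO)
sample = {"email": "  Alice@Example.COM "}
result = handler(sample, logging.getLogger("transform"))
print(result)
```

Calling the function locally with a sample event, as above, is a quick way to check the logic before deploying it.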

Deploy transformation function

To deploy and run the transformation function you defined in a Python script in GlassFlow, you create a pipeline and provide a reference to the script. GlassFlow runs the transformation function on its Serverless Execution Engine.

Python dependencies for transformation

With each import statement in your transformation function script, you are bringing in a new Python dependency. GlassFlow needs to install those dependencies to compile and run the function successfully. When you upload your transformation function through the GlassFlow interface or using the CLI command, GlassFlow automatically compiles your function with the supported libraries installed. This process verifies that your function is compatible with the serverless execution environment.
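To see which modules a script imports, and therefore which dependencies it may need, you could inspect it with Python's standard ast module. This is only an illustrative sketch and not how GlassFlow itself resolves dependencies; the helper name top_level_imports and the sample script are hypothetical.

```python
import ast


def top_level_imports(source):
    """Return the top-level module names imported by a Python source string."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as `from . import x`
            modules.add(node.module.split(".")[0])
    return modules


# Hypothetical transformation script, for illustration only.
script = '''
import json
import numpy as np
from pandas.api import types

def handler(data, log):
    return data
'''

found = top_level_imports(script)
print(sorted(found))
```

Note that an import name does not always match the name of the package that provides it (for example, the sklearn module is installed from the scikit-learn package), so the result is a starting point rather than a ready-made requirements list.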

Add external Python dependencies

GlassFlow supports including any external Python libraries in your transformation function. This allows you to easily manage and integrate additional Python packages needed for your data transformations.

Include Python dependencies using WebApp

  • Access the Editor:

    • Navigate to your existing or new GlassFlow pipeline in the WebApp.

    • Go to the "Transformer" tab and select the "requirements.txt" file.

  • Edit requirements.txt:

    • You will see an editor where you can modify the requirements.txt file.

    • Add the names of the libraries you need, one per line. For example:

      numpy
      pandas

    • Alternatively, if you need to pin or constrain versions, add version specifiers:

      numpy==1.21.1
      pandas>2.0

  • Save Changes:

    • After editing, click the "Save Transformer" button to apply your changes.

    • The WebApp will automatically install the specified libraries in your pipeline project environment.

You should not include built-in Python libraries like math or random in your requirements.txt file. These are a part of Python and aren't installed separately.
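One way to catch built-in modules before saving is to compare each entry against the interpreter's list of standard-library modules. The sketch below is a local check under stated assumptions, not a GlassFlow feature: the helper name split_requirements is hypothetical, sys.stdlib_module_names is only available on Python 3.10 and later, and the fallback set is a small illustrative subset.

```python
import re
import sys

# sys.stdlib_module_names exists on Python 3.10+. The fallback is a small,
# illustrative subset for older interpreters, not a complete list.
STDLIB = getattr(
    sys,
    "stdlib_module_names",
    frozenset({"math", "random", "json", "os", "re", "sys"}),
)


def split_requirements(text):
    """Split requirements.txt content into (keep, builtin) lists of lines."""
    keep, builtin = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The package name is everything before the first specifier or extra.
        name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]
        (builtin if name in STDLIB else keep).append(line)
    return keep, builtin


requirements = """numpy==1.21.1
pandas>2.0
math
random
"""

keep, builtin = split_requirements(requirements)
print("keep:", keep)
print("remove (built-in):", builtin)
```

A flagged entry deserves a second look rather than automatic removal, since a third-party package can occasionally share its name with a standard-library module.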

Include Python dependencies using CLI

Add Python dependencies by passing an additional parameter, such as --requirements=openai, to the GlassFlow CLI pipeline creation command.

Next

Proceed to the Pipeline Configuration page in our documentation for further details on configuring your data pipelines to utilize these transformations effectively.

© 2023 GlassFlow