Data transformation

This page outlines data transformation concepts in GlassFlow.

What is Data Transformation?

Data transformation involves converting data from its original format to a different format or structure to make it more suitable for analysis, processing, or storage. Data transformation often involves cleaning, enriching, and manipulating data using various libraries and functions.
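As a minimal illustration, the sketch below converts a raw comma-separated line into a structured record and combines cleaning, conversion, and enrichment in one step. The input layout (user_id,amount,timestamp) and the field names are hypothetical and not part of GlassFlow.

```python
from datetime import datetime, timezone


def transform(raw: str) -> dict:
    """Convert a raw 'user_id,amount,timestamp' line into a structured record."""
    user_id, amount, ts = (part.strip() for part in raw.split(","))
    value = float(amount)  # conversion: text -> number
    return {
        "user_id": user_id.lower(),  # cleaning: normalize case
        "amount": round(value, 2),  # normalization: two decimal places
        # conversion: epoch seconds -> ISO 8601 timestamp in UTC
        "timestamp": datetime.fromtimestamp(int(ts), tz=timezone.utc).isoformat(),
        "is_large": value >= 1000,  # enrichment: derived flag
    }
```

For example, transform(" U42 , 1500.5 , 0") returns {"user_id": "u42", "amount": 1500.5, "timestamp": "1970-01-01T00:00:00+00:00", "is_large": True}.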

Common Data Transformations

Stateless

Data Cleaning
Data Enrichment
Data Validation
Data Anomaly Detection
Data Profiling
Data Quality Check
Data Normalization
Data Conversion
Real-time API integration
LLM integration
Trained ML model integration
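A stateless transformation computes its output from the current event alone. The sketch below assumes events arrive as Python dicts with hypothetical email and amount_cents fields, and combines cleaning, validation, and normalization in a single function.

```python
def clean_and_validate(event: dict) -> dict:
    """Stateless: the result depends only on the event passed in."""
    # Cleaning: drop null fields and trim surrounding whitespace from strings
    cleaned = {
        key: value.strip() if isinstance(value, str) else value
        for key, value in event.items()
        if value is not None
    }
    # Validation: reject events without a plausible email address
    email = cleaned.get("email")
    if not isinstance(email, str) or "@" not in email:
        raise ValueError(f"invalid email: {email!r}")
    # Normalization: lower-case the email and convert cents to a decimal amount
    cleaned["email"] = email.lower()
    if "amount_cents" in cleaned:
        cleaned["amount"] = cleaned.pop("amount_cents") / 100
    return cleaned
```

Calling it twice with the same event always yields the same result, which is what makes it stateless.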

Stateful

Data Aggregation
Data Filtering
Data transformation based on history
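Stateful transformations carry information from earlier events into the processing of later ones. This page does not describe how GlassFlow persists state between events, so the sketch below simply keeps state in an in-memory Python object to illustrate the idea: it filters out duplicate event ids, aggregates a running total per user, and flags amounts that exceed twice the user's recent average. The class name and the id, user_id, and amount fields are hypothetical.

```python
from collections import defaultdict, deque
from typing import Optional


class StatefulTransformer:
    """Keeps state in memory across events to illustrate stateful processing."""

    def __init__(self, window: int = 3):
        self.seen_ids = set()  # filtering: ids already processed
        self.totals = defaultdict(float)  # aggregation: running total per user
        # history: the last `window` amounts seen for each user
        self.recent = defaultdict(lambda: deque(maxlen=window))

    def process(self, event: dict) -> Optional[dict]:
        # Filtering: drop duplicates of events that were already processed
        if event["id"] in self.seen_ids:
            return None
        self.seen_ids.add(event["id"])

        user, amount = event["user_id"], float(event["amount"])
        history = self.recent[user]
        # History-based transformation: compare against the average of recent amounts
        average = sum(history) / len(history) if history else None
        spike = average is not None and amount > 2 * average

        history.append(amount)
        self.totals[user] += amount
        return {**event, "user_total": self.totals[user], "spike": spike}
```

Sending the same event id twice returns None the second time, so the duplicate is dropped rather than counted again.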

Transforming data in Python with GlassFlow

In GlassFlow, you transform data by writing a custom transformation function in a Python script and implementing the transformation logic inside its handler function. See how to implement a transformation function.
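A minimal handler might look like the sketch below. The signature handler(data, log), with the event passed as a dict and log used like a standard logging.Logger, is an assumption modeled on GlassFlow's examples; check the transformation function page for the exact contract. The first_name and last_name fields are hypothetical.

```python
import json
import logging


def handler(data: dict, log: logging.Logger) -> dict:
    """Receive one event, transform it, and return the transformed event."""
    log.info("Received event: %s", json.dumps(data))
    # Example logic: enrich the event with a derived field
    data["full_name"] = f"{data.get('first_name', '')} {data.get('last_name', '')}".strip()
    return data
```

For example, handler({"first_name": "Ada", "last_name": "Lovelace"}, logging.getLogger("demo")) returns the event with "full_name": "Ada Lovelace" added.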

Deploy transformation function

To deploy and run your transformation function, create a pipeline in GlassFlow and provide a reference to the Python script that defines the function. GlassFlow then runs the function on its Serverless Execution Engine.
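Before creating a pipeline, it can help to exercise the script locally. The sketch below roughly approximates what a runner does with a script reference: it loads the file by path and calls its handler on one event. run_transformation is a hypothetical helper, not part of GlassFlow, and it assumes the handler(data, log) signature.

```python
import importlib.util
import logging
from pathlib import Path


def run_transformation(script_path: str, event: dict) -> dict:
    """Load a transformation script from disk and call its handler on one event.

    Local approximation only; GlassFlow's Serverless Execution Engine handles this in production.
    """
    path = Path(script_path)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the script's top-level code
    if not callable(getattr(module, "handler", None)):
        raise AttributeError(f"{path} does not define a handler function")
    return module.handler(event, logging.getLogger(path.stem))
```

Running it against your script with a sample event surfaces syntax errors, missing imports, or a missing handler before you deploy.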

Python dependencies for transformation

Each import statement in your transformation function script that refers to a package outside the Python standard library adds a dependency. GlassFlow needs to install those dependencies to compile and run the function successfully. When you upload your transformation function through the GlassFlow interface or with the CLI, GlassFlow automatically compiles it with the supported libraries installed. This verifies that your function is compatible with the serverless execution environment.

Add external Python dependencies

GlassFlow lets you include external Python libraries in your transformation function, so you can manage and integrate any additional packages your data transformations need.

Include Python dependencies using WebApp

Access the Editor:
- Navigate to your existing or new GlassFlow pipeline in the WebApp.
- Go to the "Transformer" tab and select the "requirements.txt" file.
Edit requirements.txt:
- You will see an editor where you can modify the requirements.txt file.
- Add the names of the libraries you need, one per line. For example:
  numpy
  pandas
- Alternatively, to pin specific versions, add version specifiers. For example:
  numpy==1.21.1
  pandas>2.0
Save Changes:
- After editing, click the "Save Transformer" button to apply your changes.
- The WebApp will automatically install the specified libraries in your pipeline project environment.

Do not include Python standard-library modules such as math or random in your requirements.txt file. They ship with Python and are not installed separately.

Include Python dependencies using CLI

Add Python dependencies by passing the additional parameter --requirements to the GlassFlow CLI pipeline creation command, for example --requirements=openai.

See the Pipeline Configuration page in our documentation for details on configuring your data pipelines to use these transformations.


Last updated 2 months ago