Data transformation
This page outlines data transformation concepts in GlassFlow.
What is Data Transformation?
Data transformation involves converting data from its original format to a different format or structure to make it more suitable for analysis, processing, or storage. Data transformation often involves cleaning, enriching, and manipulating data using various libraries and functions.
Common Data Transformations
Stateless
Data Cleaning
Data Enrichment
Data Validation
Data Anomaly Detection
Data Profiling
Data Quality Check
Data Normalization
Data Conversion
Real-time API integration
LLM integration
ML-trained model integration
Stateful
Data Aggregation
Data Filtering
Data transformation based on history
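The difference between the two groups can be illustrated with a minimal sketch in plain Python. The names normalize_event and RunningAverage are hypothetical and are not part of GlassFlow; the event shape (a dict with an email field) is also an assumption.

```python
from collections import defaultdict


def normalize_event(event: dict) -> dict:
    """Stateless: the output depends only on the event passed in."""
    cleaned = dict(event)  # copy so the caller's event is not mutated
    cleaned["email"] = event["email"].strip().lower()
    return cleaned


class RunningAverage:
    """Stateful: the output depends on all events seen so far for a key."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def update(self, key: str, value: float) -> float:
        # Accumulate history per key, then return the average so far.
        self._sum[key] += value
        self._count[key] += 1
        return self._sum[key] / self._count[key]
```

A stateless function can process events in any order and on any worker, while the stateful aggregator has to keep its totals somewhere between events.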
Transforming data in Python with GlassFlow
In GlassFlow, you create a custom transformation function in a Python script to transform data. You implement your transformation logic inside the handler function. See how to implement a transformation function.
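As a rough sketch, a handler that cleans and enriches a single event might look like the following. The signature handler(data, log), the dict-shaped event, and the name and name_length fields are assumptions made for illustration; check the transformation function page for the exact contract.

```python
import logging


def handler(data, log):
    """Transform one event and return the transformed event.

    Assumption: `data` is a dict and `log` behaves like a standard
    logging.Logger. Neither is confirmed by this page.
    """
    # Data cleaning: trim stray whitespace from the name field.
    name = str(data.get("name", "")).strip()
    # Data enrichment: add a derived field without mutating the input.
    transformed = {**data, "name": name, "name_length": len(name)}
    log.info("Transformed event: %s", transformed)
    return transformed
```

To try it locally, call handler({"name": "  Ada  "}, logging.getLogger("demo")) from a Python shell.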
Deploy transformation function
To deploy and run the transformation function you defined in a Python script in GlassFlow, you create a pipeline and provide a reference to the script. GlassFlow runs the transformation function on its Serverless Execution Engine.
Python dependencies for transformation
With each import statement in your transformation function script, you bring in a Python dependency. GlassFlow needs to install those dependencies to compile and run the function successfully. When you upload your transformation function through the GlassFlow interface or using the CLI command, GlassFlow automatically compiles your function with the supported libraries installed. This process verifies that your function is compatible with the serverless execution environment.
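To see which dependencies a script pulls in before uploading it, you can list its imports locally. The sketch below uses Python's standard ast module; the helper name top_level_imports is hypothetical, and this is not how GlassFlow itself resolves dependencies.

```python
import ast


def top_level_imports(source: str) -> set[str]:
    """Return the top-level module names imported by a Python script."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import numpy.linalg" depends on the "numpy" package.
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they refer to local modules.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules
```

Keep in mind that an import name does not always match the package name used for installation (for example, sklearn is installed as scikit-learn).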
Add external Python dependencies
GlassFlow supports including any external Python libraries in your transformation function. This allows you to easily manage and integrate additional Python packages needed for your data transformations.
Include Python dependencies using WebApp
Access the Editor:
Navigate to your existing or new GlassFlow pipeline in the WebApp.
Go to the "Transformer" tab and select the "requirements.txt" file.
Edit requirements.txt:
You will see an editor where you can modify the requirements.txt file.
Add the names of the libraries you need, one per line, for example openai.
Alternatively, if you need a specific version, add a version specifier after the library name in the standard pip form name==version.
Save Changes:
After editing, click the "Save Transformer" button to apply your changes.
The WebApp will automatically install the specified libraries in your pipeline project environment.
You should not include built-in Python libraries like math or random in your requirements.txt file. These are part of Python and aren't installed separately.
Include Python dependencies using CLI
Add Python dependencies by passing the additional parameter --requirements=openai to the GlassFlow CLI pipeline creation command.
Next
Proceed to the Pipeline Configuration page in our documentation for further details on configuring your data pipelines to utilize these transformations effectively.