Quickstart
A quickstart guide to run GlassFlow, create your first pipeline using GlassFlow WebApp, produce and consume data in real-time.
© 2023 GlassFlow
Let's create a sample data pipeline that detects PII (Personally Identifiable Information) in real time and masks it to comply with data privacy regulations.
Visit app.glassflow.dev to create your account. Click on "Sign Up" to initiate the sign-up process.
Choose a signup option using your Google or GitHub account and press continue.
After successfully creating an account, you can access the GlassFlow dashboard.
Go to app.glassflow.dev, locate the "Pipelines" section, and click "Create new Pipeline".
Enter a name for your pipeline (e.g., "PII detection"). Optionally, you can also choose a Space in which to create the pipeline. By default, the main space is selected.
Click "Next Step" to select the Data Source.
Select "SDK". The GlassFlow SDK option requires you to implement the logic for sending data from a custom data source to the GlassFlow pipeline in Python.
Click "Next Step" to set up the transformation function. You will see a built-in editor to write code for the transformer. There is also an option to choose a sample transformer from the "Template" dropdown menu.
Select the "PII Detection" function template. See how to define the transformation function.
Note that your code must implement the handler function. Without it, the transformation function will fail to run.
You can also import any Python dependencies (libraries) in the transformation function. See how to include them in the transformation with GlassFlow.
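To make the role of the handler function more concrete, here is a minimal sketch of a PII-masking transformation. It is not the code behind the "PII Detection" template. It assumes that the handler receives each event as a dictionary together with a logger-like object, and the regular expressions and the mask_pii helper are hypothetical names introduced only for this illustration.

```python
import logging
import re

# Hypothetical patterns for this sketch; a real PII detector covers many more cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return PHONE_RE.sub("<PHONE>", text)


def handler(data, log):
    """Entry point, assumed to be invoked once per event entering the pipeline.

    Assumption: `data` is a dict and `log` behaves like a logging.Logger.
    """
    masked = {}
    for key, value in data.items():
        # Only string fields are scanned; other types pass through unchanged.
        masked[key] = mask_pii(value) if isinstance(value, str) else value
    log.info("masked event with keys %s", sorted(masked))
    return masked


if __name__ == "__main__":
    # Local dry run, no GlassFlow account needed.
    logging.basicConfig(level=logging.INFO)
    sample = {"name": "Ann", "email": "ann@example.com", "age": 30}
    print(handler(sample, logging.getLogger("local-test")))
```

Running the file locally is a quick way to check the masking logic before pasting a handler into the built-in editor.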
Click "Next Step" and select the "SDK" option as the destination for the transformed data. With this option, you implement the logic for consuming data from the GlassFlow pipeline in Python.
The final step is to confirm your pipeline details. Check the pipeline overview and click "Create Pipeline".
Once the pipeline is created, copy its credentials: the Pipeline ID and the Access Token.
Important: After creating the pipeline, the transformation function is deployed and running on GlassFlow’s serverless execution engine in the cloud. You do not need to configure or maintain any infrastructure on your side.
To complete this part you'll need the following:
Python installed on your machine.
Pip installed to manage project packages.
Start by creating a dedicated project folder: a directory named glassflow-playground, where you will place all the necessary files.
Set environment variables
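Set the environment variables PIPELINE_ID and PIPELINE_ACCESS_TOKEN to your actual GlassFlow pipeline credentials, for example by running export PIPELINE_ID=your_pipeline_id and export PIPELINE_ACCESS_TOKEN=your_pipeline_access_token in your shell.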
After running these commands, the environment variables PIPELINE_ID and PIPELINE_ACCESS_TOKEN will be available to your application, allowing it to connect to your GlassFlow pipeline from the client SDK.
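On the application side, the two variables can be read with the standard library. The sketch below is one possible way to do it. The function name load_pipeline_credentials is a hypothetical helper, not part of the GlassFlow SDK, and it fails early with a readable message when a variable is missing.

```python
import os


def load_pipeline_credentials(env=None):
    """Return (pipeline_id, access_token) read from environment variables.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    required = ("PIPELINE_ID", "PIPELINE_ACCESS_TOKEN")
    # Treat unset and empty values the same way: both are unusable credentials.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["PIPELINE_ID"], env["PIPELINE_ACCESS_TOKEN"]
```

Calling load_pipeline_credentials() at the top of a script surfaces a forgotten export immediately, before any request to the pipeline is attempted.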
Install GlassFlow SDK
Install the GlassFlow Python SDK using pip (pip install glassflow).
Optional: Create a virtual environment before installing Python dependencies. Run the following command: python -m venv .venv && source .venv/bin/activate
Now you can start sending data to the pipeline. GlassFlow will automatically run your transformation function on each event entering the pipeline and make your transformed data available in milliseconds.
Create a new Python file in your project root directory called producer.py with the following code:
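A minimal sketch of the producer logic, under stated assumptions: it generates sample events containing PII and hands each one to a publish callable. The names generate_event and run_producer are hypothetical, and publish stands in for whatever call the GlassFlow SDK's pipeline client offers for sending an event. The SDK call itself is deliberately left out so the sketch runs without credentials.

```python
import random
import time

FIRST_NAMES = ["Alice", "Bob", "Carol"]  # sample values for illustration only


def generate_event(rng):
    """Build one fake event that contains an email address and a phone number."""
    name = rng.choice(FIRST_NAMES)
    return {
        "name": name,
        "email": f"{name.lower()}@example.com",
        "message": f"Hi, this is {name}, call me at +1 555-010-0199",
    }


def run_producer(publish, count=5, delay_seconds=1.0, rng=None):
    """Send `count` events through `publish`, a callable that accepts one dict."""
    if rng is None:
        rng = random.Random()
    sent = []
    for _ in range(count):
        event = generate_event(rng)
        publish(event)  # assumption: wire this to the SDK's publish call
        sent.append(event)
        time.sleep(delay_seconds)
    return sent


if __name__ == "__main__":
    # Stand-in publisher that only prints; replace it with the SDK call.
    run_producer(lambda event: print("published:", event), count=3, delay_seconds=0.1)
```

Keeping the publish step as a parameter means the event generation can be tried locally first and connected to the pipeline afterwards.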
Run the script from your terminal with the command python producer.py.
For every event that enters the pipeline, GlassFlow invokes the transformation function you defined for it. In this case, it runs the PII detection function and transforms each raw event you send to the pipeline.
Create a new Python file called consumer.py. The consumer is responsible for pulling transformed data from the pipeline. It continuously checks for new data, processes it as needed, and acts upon the transformed information.
Copy and paste the following code into the consumer.py file:
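A minimal sketch of the polling loop described above, under stated assumptions: consume is a callable that returns the next transformed event, or None when nothing is available yet. It stands in for the corresponding consume call of the GlassFlow SDK's pipeline client, which is not shown here. The name run_consumer and its parameters are hypothetical.

```python
import time


def run_consumer(consume, handle=print, max_events=None, poll_interval_seconds=1.0):
    """Poll `consume` and pass every event to `handle`.

    Runs until `max_events` events were handled, or forever if `max_events` is None.
    Returns the number of handled events.
    """
    handled = 0
    while max_events is None or handled < max_events:
        event = consume()  # assumption: wire this to the SDK's consume call
        if event is None:
            # Nothing to read yet: wait briefly instead of busy-looping.
            time.sleep(poll_interval_seconds)
            continue
        handle(event)
        handled += 1
    return handled
```

The default handle=print matches the quickstart's goal of showing transformed events in the terminal. A real consumer would pass a function that stores or forwards each event.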
Run consumer.py in a separate terminal window (python consumer.py) to see the output side by side with the producer.
You have now activated the consumer side of the pipeline. The consumer.py script retrieves the processed data from the pipeline in real time, and you will see the results of the data transformation in the terminal.
Congratulations! You've set up a real-time pipeline using GlassFlow.
In this Quickstart you've learned the following:
How to install GlassFlow and set up a new project.
How to create a data pipeline using the GlassFlow Web App.
How to implement a transformation function and process data in real time.
How to publish data into the pipeline.
How to consume data from the pipeline.
Explore the glassflow-examples repository on GitHub to try additional examples and to use the Python SDK to create and manage pipelines.