Real-time generative feedback loop automation
A practical example of creating a data streaming pipeline with GlassFlow to detect changes in Supabase and continuously update the Weaviate vector database.
Generative Feedback Loop (GFL) improves the output of Generative AI where the prompt results generated from language models like GPT are vectorized and saved back into a vector database for future use. Read more about GFL in our blog.
This tutorial will guide you through setting up a real-time GFL automation pipeline using GlassFlow, Supabase, and Weaviate. By the end of this guide, you'll be able to process Airbnb listing data in Supabase, enrich it with AI, store it in the Weaviate, and search through the enriched listings using the Weaviate Console. The example pipeline can be used to create personalized recommendation solutions and generate targeted ads based on real-time information.
Pipeline components
Airbnb sample dataset
We use a CSV sample dataset with room listings in New York City from Airbnb. This dataset includes listing attributes such as the listing name, host name, location details, room type, price, and availability. Here is a data row example:
id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 |
Data source
Supabase acts as the primary database where Airbnb listings are stored initially. Whenever a new listing is added or an existing one is updated, Supabase triggers an event to send changes directly to the GlassFlow pipeline using a Webhook data source connector.
Transformation: AI and Vectorization
GlassFlow applies AI-driven transformations in Python to the raw data captured from Supabase. In this case, the AI model from OpenAI is used to enrich the listing descriptions by generating richer and more descriptive summaries from listing attributes. The transformed descriptions are then vectorized—and converted into a format suitable for fast and efficient searches. These vectors are stored in Weaviate, enabling advanced search capabilities.
Data Sink
Weaviate is used as a vector database to store the listing summary and vectorized descriptions. The use of Weaviate ensures that the enriched descriptions are indexed and can be searched efficiently, providing users with the most relevant listings in real time.
Once we understand the use case and pipeline components, let's build a pipeline for it.
Set Up the Weaviate
Have a Weaviate account.
Login into the Weaviate console. Create a new collection called
AirbnbNYC
inside a cluster and choose a vectorizer typetext2vec-openai
and modeltext-embedding-3-small
. Keep the rest of the configuration by default.Copy the Weaviate Cluster API URL and Admin KEY.
Set Up the GlassFlow Pipeline
You will use the GlassFlow WebApp to create a data processing pipeline.
Prerequisites
To start with the pipeline setup, you need the following.
Have the Weaviate Cluster API URL and API Key you obtained from the previous section.
Step 1. Log in to GlassFlow WebApp
Navigate to the GlassFlow WebApp and log in with your credentials.
Step 2. Create a New Pipeline
Click on "Create New Pipeline" and provide a name. You can name it "generative-feedback-loop". Optionally, create a Space or use the default main space.
Step 3. Configure a Data Source
Choose "Webhook" as the Data Source type. GlassFlow will provide you with a unique webhook URL for your pipeline after the pipeline is created. You will use it to push data from Supabase.
Step 4. Define the Transformer
Copy and paste the following transformation function code into the transformer's built-in editor.
By default, the transformer function uses a free OpenAI API key provided by GlassFlow.
You can replace it with your API key too. To do so:
Have an OpenAI API account.
Create an API key.
Set the API key in the transformation code:
OPENAI_API_KEY = "YOUR_OPENAI_API_KEY"
Step 5. Choose a transformer dependency
The transformation function uses openai external library in the code, so we need to choose it from the Dependencies dropdown menu. GlassFlow includes the external library in the function deployment and runtime. Read more about Python dependencies for transformation.
Step 6. Configure a Data Sink
Select "Webhook" as a data sink to configure the pipeline to use the Weaviate Webhook URL.
Fill in the URL and headers under Connector Details:
Method: POST
URL:
https://${WEAVIATE_CLUSTER_URL}/v1/objects
Headers:
Content-Type
:application/json
Authentication
:Bearer ${WEAVIATE_API_KEY}
Step 7. Confirm the Pipeline
Click "Next", confirm the pipeline settings in the final step, and click "Create Pipeline".
Step 8. Copy the Pipeline Credentials
Once the pipeline is created, copy its Access Token and Webhook URL.
Now the GlassFlow pipeline is ready to send data automatically to Weaviate. Next, you set up the Supabase database and populate it with sample data.
Set Up the Supabase
Have a Supabase account.
Create a project and a database table called "Airbnb-nyc-2019" in Supabase.
Add a schema definition to map column names (such as the listing name, host name, location details, room type, price, etc.) in a CSV file to the Supabase table.
Create a webhook trigger on Supabase. Follow the instructions from Supabase to create the webhook and hook it to
INSERT
events on your table. Use the GlassFlow pipeline Webhook URL you copied and add the following headersX-Pipeline-Access-Token
set to a valid Access Token andContent-Type
set toapplication/json
.Copy the Supabase API URL and Key from the configuration.
Populate the Supabase database with data
To produce sample data for the Supabase database, you can run a Python script in the repo folder or insert data using directly Supabase.
Prerequisites
To complete this part you'll need the following:
Python is installed on your machine.
Download and Install Pip to manage project packages.
You have the Supabase API URL and Key obtained from the previous section.
Installation
Clone the
glassflow-examples
repository to your local machine:Navigate to the project directory:
Create a new virtual environment:
Install the required dependencies:
Create an environment configuration file
Add a .env
file in the project directory and add the following configuration variables:
Replace the placeholders (your_supabase_access_key
, and your_supabase_url
) with the appropriate values.
Unzip Airbnb New York Listings
Uncompress the sample CSV data provided in the repository:
Populate Supabase database
To test the pipeline, we can create some rows in our Supabase table by running the command:
This will insert the first 10 rows from the Airbnb dataset into the Airbnb listings table in the Supabase database, which then sends the change events automatically to the GlassFlow pipeline. After the transformation, the data will be available immediately in Weaviate.
Search for Airbnb listings
You can now search for enriched listings in the Weaviate database using the search console. Run the following vector similarity query in the Weaviate query console:
This type of vector search query is particularly useful in scenarios where you want to find items (like Airbnb listings) that match a particular set of characteristics described in natural human language. For example, a user might be looking for listings that are described as "luxurious" and "have a great view," and this query would return the most relevant results based on that description.
After running the command, you will get the relevant result:
An amazing part of the process is happening behind the scenes, the pipeline you created continuously delivers new Airbnb listings to show the latest data.
Conclusion
In this tutorial, you learned how to detect changes on the primary database, stream these changes, and continuously update the vector database for AI-powered applications.
Last updated