Real-time generative feedback loop automation
A practical example of creating a data streaming pipeline with GlassFlow to detect changes in Supabase and continuously update the Weaviate vector database.
A Generative Feedback Loop (GFL) improves the output of generative AI: prompt results generated by language models such as GPT are vectorized and saved back into a vector database for future use. Read more about GFL in our blog.
This tutorial guides you through setting up a real-time GFL automation pipeline using GlassFlow, Supabase, and Weaviate. By the end of this guide, you'll be able to process Airbnb listing data in Supabase, enrich it with AI, store it in Weaviate, and search through the enriched listings using the Weaviate Console. The same pipeline pattern can power personalized recommendation solutions and targeted ads based on real-time information.
Link to the GitHub project repository
We use a CSV sample dataset with room listings in New York City from Airbnb. This dataset includes listing attributes such as the listing name, host name, location details, room type, price, and availability. Here is a data row example:
Supabase acts as the primary database where Airbnb listings are stored initially. Whenever a new listing is added or an existing one is updated, Supabase triggers an event to send changes directly to the GlassFlow pipeline using a Webhook data source connector.
GlassFlow applies AI-driven transformations in Python to the raw data captured from Supabase. In this case, an OpenAI model enriches the listing descriptions by generating richer, more descriptive summaries from the listing attributes. The transformed descriptions are then vectorized, that is, converted into a numeric format suited to fast and efficient searches. These vectors are stored in Weaviate, enabling advanced search capabilities.
Weaviate is used as a vector database to store the listing summary and vectorized descriptions. The use of Weaviate ensures that the enriched descriptions are indexed and can be searched efficiently, providing users with the most relevant listings in real time.
Now that we understand the use case and the pipeline components, let's build the pipeline.
Have a Weaviate account.
Log in to the Weaviate console. Create a new collection called `AirbnbNYC` inside a cluster, choose the vectorizer type `text2vec-openai` and the model `text-embedding-3-small`, and keep the rest of the configuration at its defaults.
Copy the Weaviate cluster API URL and the Admin API key.
You will use the GlassFlow WebApp to create a data processing pipeline.
To start with the pipeline setup, you need the following.
Have the Weaviate Cluster API URL and API Key you obtained from the previous section.
Navigate to the GlassFlow WebApp and log in with your credentials.
Click on "Create New Pipeline" and provide a name. You can name it "generative-feedback-loop". Optionally, create a Space or use the default main space.
Choose "Webhook" as the Data Source type. GlassFlow will provide you with a unique webhook URL for your pipeline after the pipeline is created. You will use it to push data from Supabase.
Copy and paste the following transformation function code into the transformer's built-in editor.
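For orientation, here is a minimal sketch of what such a transformation can look like. The `handler(data, log)` entry point follows GlassFlow's transform convention, but the payload shape (`data["record"]`), the property names, and the prompt wording are assumptions for this tutorial, not the repository's exact code:

```python
# Sketch of a GlassFlow transform; handler signature and payload shape assumed.
OPENAI_API_KEY = "YOUR_OPENAI_API_KEY"

def build_prompt(record):
    """Turn raw listing attributes into an enrichment prompt."""
    return (
        "Write a short, appealing summary for this Airbnb listing: "
        f"{record.get('name', '')}, a {record.get('room_type', '')} "
        f"in {record.get('neighbourhood', '')} for ${record.get('price', '')}/night."
    )

def handler(data, log):
    """Enrich the listing description with OpenAI and shape it for Weaviate."""
    from openai import OpenAI  # provided via the Dependencies dropdown
    record = data["record"]    # assumed: Supabase change events carry the row here
    client = OpenAI(api_key=OPENAI_API_KEY)
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": build_prompt(record)}],
    )
    summary = completion.choices[0].message.content
    # Shape the output as a Weaviate object; the collection's vectorizer
    # embeds the description automatically on insert.
    return {
        "class": "AirbnbNYC",
        "properties": {"name": record.get("name"), "description": summary},
    }
```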
By default, the transformer function uses a free OpenAI API key provided by GlassFlow.
You can replace it with your own API key. To do so:
Have an OpenAI API account.
Create an API key.
Set the API key in the transformation code: `OPENAI_API_KEY = "YOUR_OPENAI_API_KEY"`
The transformation function uses the external `openai` library, so we need to choose it from the Dependencies dropdown menu. GlassFlow includes the external library in the function deployment and runtime. Read more about Python dependencies for transformations.
Select "Webhook" as the data sink so the pipeline posts its output to the Weaviate REST API.
Fill in the URL and headers under Connector Details:
Method: POST
URL: `https://${WEAVIATE_CLUSTER_URL}/v1/objects`
Headers:
- `Content-Type`: `application/json`
- `Authorization`: `Bearer ${WEAVIATE_API_KEY}`
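With this configuration, the sink performs the equivalent of the request sketched below. This is an illustration of Weaviate's `POST /v1/objects` call, not GlassFlow internals; the property names are the ones assumed by this tutorial's transform output:

```python
import json

WEAVIATE_CLUSTER_URL = "your-cluster.weaviate.network"  # placeholder
WEAVIATE_API_KEY = "YOUR_WEAVIATE_API_KEY"              # placeholder

def build_object_payload(name, description):
    """Body for POST /v1/objects; Weaviate vectorizes `description` on insert."""
    return {
        "class": "AirbnbNYC",
        "properties": {"name": name, "description": description},
    }

def post_object(payload):
    """Send one object to Weaviate, mirroring what the webhook sink does."""
    import requests  # pip install requests
    return requests.post(
        f"https://{WEAVIATE_CLUSTER_URL}/v1/objects",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {WEAVIATE_API_KEY}",
        },
        data=json.dumps(payload),
    )
```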
Click "Next", confirm the pipeline settings in the final step, and click "Create Pipeline".
Once the pipeline is created, copy its Access Token and Webhook URL.
Now the GlassFlow pipeline is ready to send data automatically to Weaviate. Next, you set up the Supabase database and populate it with sample data.
Have a Supabase account.
Create a project and a database table called "Airbnb-nyc-2019" in Supabase.
Add a schema definition to map column names (such as the listing name, host name, location details, room type, price, etc.) in a CSV file to the Supabase table.
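The exact schema is up to you; as a sketch, mapping a CSV row to typed column values might look like this. The column names follow the sample dataset, while the type choices are assumptions:

```python
# Casts for the numeric columns of the sample dataset; all other columns stay text.
CASTS = {
    "id": int,
    "host_id": int,
    "latitude": float,
    "longitude": float,
    "price": int,
    "minimum_nights": int,
    "number_of_reviews": int,
    "reviews_per_month": float,
    "calculated_host_listings_count": int,
    "availability_365": int,
}

def map_row(row):
    """Apply the schema's types to one CSV row before inserting it into Supabase."""
    return {
        col: (CASTS.get(col, str)(val) if val != "" else None)
        for col, val in row.items()
    }
```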
Create a webhook trigger on Supabase. Follow the instructions from Supabase to create the webhook and hook it to `INSERT` events on your table. Use the GlassFlow pipeline Webhook URL you copied, and add the following headers: `X-Pipeline-Access-Token` set to a valid Access Token, and `Content-Type` set to `application/json`.
Copy the Supabase API URL and Key from the configuration.
To produce sample data for the Supabase database, you can run a Python script from the repository folder or insert data directly in Supabase.
To complete this part you'll need the following:
Python installed on your machine.
pip installed to manage project packages.
You have the Supabase API URL and Key obtained from the previous section.
Clone the `glassflow-examples` repository to your local machine:
Navigate to the project directory:
Create a new virtual environment:
Install the required dependencies:
Create a `.env` file in the project directory with the following configuration variables:
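The variable names below are illustrative; use whichever names the example script in the repository expects:

```
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_access_key
```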
Replace the placeholders (`your_supabase_access_key` and `your_supabase_url`) with the appropriate values.
Uncompress the sample CSV data provided in the repository:
To test the pipeline, we can create some rows in our Supabase table by running the command:
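The repository's script is not reproduced here; a minimal sketch of what it might do, assuming the `supabase-py` client (`create_client` and `table(...).insert(...).execute()` are its real API, but the script's structure, table name, and environment variable names are assumptions):

```python
import csv
import io
import os

TABLE_NAME = "Airbnb-nyc-2019"  # table created earlier in this tutorial

def load_rows(csv_text, limit=10):
    """Parse the Airbnb CSV and return the first `limit` rows as dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for _, row in zip(range(limit), reader)]

def insert_rows(rows):
    """Insert rows into Supabase; each insert fires the webhook trigger."""
    from supabase import create_client  # pip install supabase
    client = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])
    for row in rows:
        client.table(TABLE_NAME).insert(row).execute()
```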
This will insert the first 10 rows from the Airbnb dataset into the Airbnb listings table in the Supabase database, which then sends the change events automatically to the GlassFlow pipeline. After the transformation, the data will be available immediately in Weaviate.
You can now search for enriched listings in the Weaviate database using the search console. Run the following vector similarity query in the Weaviate query console:
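The exact query depends on your property names; a `nearText` sketch against the `AirbnbNYC` collection (assuming the transform stored `name` and `description` properties) looks like:

```graphql
{
  Get {
    AirbnbNYC(
      nearText: { concepts: ["luxurious apartment with a great view"] }
      limit: 3
    ) {
      name
      description
    }
  }
}
```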
This type of vector search query is particularly useful in scenarios where you want to find items (like Airbnb listings) that match a particular set of characteristics described in natural human language. For example, a user might be looking for listings that are described as "luxurious" and "have a great view," and this query would return the most relevant results based on that description.
After running the command, you will get the relevant result:
The best part happens behind the scenes: the pipeline you created continuously delivers new Airbnb listings, so searches always reflect the latest data.
In this tutorial, you learned how to detect changes in a primary database, stream those changes, and continuously update a vector database for AI-powered applications.
| id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 |