Real-time generative feedback loop automation

A practical example of creating a data streaming pipeline with GlassFlow to detect changes in Supabase and continuously update the Weaviate vector database.

A Generative Feedback Loop (GFL) improves the output of generative AI: results generated from language models like GPT are vectorized and saved back into a vector database for future use. Read more about GFL in our blog.

This tutorial will guide you through setting up a real-time GFL automation pipeline using GlassFlow, Supabase, and Weaviate. By the end of this guide, you'll be able to process Airbnb listing data in Supabase, enrich it with AI, store it in Weaviate, and search through the enriched listings using the Weaviate Console. The example pipeline can be used to create personalized recommendation solutions and generate targeted ads based on real-time information.

Pipeline components

Airbnb sample dataset

We use a CSV sample dataset with room listings in New York City from Airbnb. This dataset includes listing attributes such as the listing name, host name, location details, room type, price, and availability. Here is a data row example:

id: 2539
name: Clean & quiet apt home by the park
host_id: 2787
host_name: John
neighbourhood_group: Brooklyn
neighbourhood: Kensington
latitude: 40.64749
longitude: -73.97237
room_type: Private room
price: 149
minimum_nights: 1
number_of_reviews: 9
last_review: 2018-10-19
reviews_per_month: 0.21
calculated_host_listings_count: 6
availability_365: 365

Data source

Supabase acts as the primary database where Airbnb listings are stored initially. Whenever a new listing is added or an existing one is updated, Supabase triggers an event to send changes directly to the GlassFlow pipeline using a Webhook data source connector.
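
For reference, a Supabase database webhook delivers each change as a JSON payload; for an INSERT event it looks roughly like this (field values here are illustrative):

{
  "type": "INSERT",
  "table": "Airbnb-nyc-2019",
  "schema": "public",
  "record": {
    "id": 2539,
    "name": "Clean & quiet apt home by the park",
    "price": 149,
    "neighbourhood": "Kensington"
  },
  "old_record": null
}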

Transformation: AI and Vectorization

GlassFlow applies AI-driven transformations in Python to the raw data captured from Supabase. In this case, an AI model from OpenAI is used to enrich the listing descriptions by generating richer and more descriptive summaries from the listing attributes. The transformed descriptions are then vectorized, that is, converted into a format suitable for fast and efficient similarity searches. These vectors are stored in Weaviate, enabling advanced search capabilities.

Data Sink

Weaviate is used as a vector database to store the listing summary and vectorized descriptions. The use of Weaviate ensures that the enriched descriptions are indexed and can be searched efficiently, providing users with the most relevant listings in real time.

Once we understand the use case and pipeline components, let's build a pipeline for it.

Set Up Weaviate

  1. Have a Weaviate account.

  2. Log in to the Weaviate console. Create a new collection called AirbnbNYC inside a cluster, choosing the text2vec-openai vectorizer with the text-embedding-3-small model. Keep the default values for the rest of the configuration (a programmatic alternative is sketched below).
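
If you prefer to create the collection programmatically rather than through the console, here is a minimal sketch using the Weaviate Python client (v3 API); the cluster URL and API key are placeholders, and the exact model configuration keys may vary with your Weaviate version:

import weaviate

client = weaviate.Client(
    url="https://your-cluster.weaviate.network",  # your cluster URL
    auth_client_secret=weaviate.AuthApiKey("YOUR_WEAVIATE_API_KEY"),
)

# Create the AirbnbNYC collection with an OpenAI vectorizer
client.schema.create_class({
    "class": "AirbnbNYC",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "text2vec-openai": {"model": "text-embedding-3-small"}
    },
})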

Set Up the GlassFlow Pipeline

You will use the GlassFlow WebApp to create a data processing pipeline.

Prerequisites

To start with the pipeline setup, you need a GlassFlow account.

Step 1. Log in to GlassFlow WebApp

Navigate to the GlassFlow WebApp and log in with your credentials.

Step 2. Create a New Pipeline

Click on "Create New Pipeline" and provide a name. You can name it "generative-feedback-loop". Optionally, create a Space or use the default main space.

Step 3. Configure a Data Source

Choose "Webhook" as the Data Source type. GlassFlow will provide you with a unique webhook URL for your pipeline after the pipeline is created. You will use it to push data from Supabase.

Step 4. Define the Transformer

Copy and paste the transformation function code from the example into the transformer's built-in editor.
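
The full function ships with the GlassFlow example; the sketch below illustrates its general shape. It assumes GlassFlow's handler(data, log) entry point and a Supabase webhook payload that carries the inserted row under a record key; the prompt, model choice, and output fields are illustrative:

import json

from openai import OpenAI

# GlassFlow provides a free OpenAI API key by default; replace it with your own if desired
OPENAI_API_KEY = "YOUR_OPENAI_API_KEY"
client = OpenAI(api_key=OPENAI_API_KEY)

def handler(data, log):
    # Supabase database webhooks deliver the inserted row under "record"
    listing = data.get("record", data)

    # Enrich the raw listing attributes into a descriptive, search-friendly summary
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Write a short, appealing summary for this Airbnb listing: {json.dumps(listing)}",
        }],
    )
    summary = response.choices[0].message.content

    # Return a Weaviate object; the Webhook sink POSTs it to /v1/objects,
    # where the text2vec-openai vectorizer embeds it on ingest
    return {
        "class": "AirbnbNYC",
        "properties": {
            "name": listing.get("name"),
            "summary": summary,
            "price": listing.get("price"),
            "neighbourhood": listing.get("neighbourhood"),
        },
    }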

By default, the transformer function uses a free OpenAI API key provided by GlassFlow.

You can replace it with your API key too. To do so:

  1. Have an OpenAI API account.

  2. Create an API key.

  3. Set the API key in the transformation code: OPENAI_API_KEY = "YOUR_OPENAI_API_KEY"

Step 5. Choose a transformer dependency

The transformation function uses the external openai library, so we need to choose it from the Dependencies dropdown menu. GlassFlow includes the external library in the function deployment and runtime. Read more about Python dependencies for transformations.

Step 6. Configure a Data Sink

Select "Webhook" as a data sink to configure the pipeline to use the Weaviate Webhook URL.

Fill in the URL and headers under Connector Details (a sketch of the equivalent HTTP request follows this list):

  1. Method: POST

  2. URL: https://${WEAVIATE_CLUSTER_URL}/v1/objects

  3. Headers:

    • Content-Type: application/json

    • Authorization: Bearer ${WEAVIATE_API_KEY}
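
For reference, for each transformed event the sink performs the equivalent of the following request (a sketch using Python's requests; the cluster URL, API key, and properties are placeholders):

import requests

WEAVIATE_CLUSTER_URL = "your-cluster.weaviate.network"
WEAVIATE_API_KEY = "YOUR_WEAVIATE_API_KEY"

resp = requests.post(
    f"https://{WEAVIATE_CLUSTER_URL}/v1/objects",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {WEAVIATE_API_KEY}",
    },
    # A transformed event as produced by the pipeline
    json={
        "class": "AirbnbNYC",
        "properties": {"name": "Clean & quiet apt home by the park", "price": 149},
    },
)
resp.raise_for_status()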

Step 7. Confirm the Pipeline

Click "Next", confirm the pipeline settings in the final step, and click "Create Pipeline".

Step 8. Copy the Pipeline Credentials

Once the pipeline is created, copy its Access Token and Webhook URL.
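
If you want to verify the pipeline before wiring up Supabase, you can push a hand-written event to it (a sketch using Python's requests; the URL and token are the ones you just copied, and the event body mimics a Supabase INSERT payload):

import requests

PIPELINE_WEBHOOK_URL = "YOUR_PIPELINE_WEBHOOK_URL"
PIPELINE_ACCESS_TOKEN = "YOUR_PIPELINE_ACCESS_TOKEN"

resp = requests.post(
    PIPELINE_WEBHOOK_URL,
    headers={
        "X-Pipeline-Access-Token": PIPELINE_ACCESS_TOKEN,
        "Content-Type": "application/json",
    },
    json={"record": {"name": "Test listing", "price": 100, "neighbourhood": "Kensington"}},
)
resp.raise_for_status()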

Now the GlassFlow pipeline is ready to send data automatically to Weaviate. Next, you'll set up the Supabase database and populate it with sample data.

Set Up Supabase

  1. Have a Supabase account.

  2. Create a project and a database table called "Airbnb-nyc-2019" in Supabase.

  3. Add a schema definition that maps the CSV column names (such as the listing name, host name, location details, room type, price, etc.) to columns in the Supabase table.

  4. Create a webhook trigger on Supabase. Follow the instructions from Supabase to create the webhook and hook it to INSERT events on your table. Use the GlassFlow pipeline Webhook URL you copied, and add the following headers: X-Pipeline-Access-Token set to a valid Access Token and Content-Type set to application/json.

  5. Copy the Supabase API URL and Key from the configuration.

Populate the Supabase database with data

To produce sample data for the Supabase database, you can run a Python script from the repository folder or insert data directly through the Supabase dashboard.

Prerequisites

To complete this part you'll need the following:

  • Python installed on your machine.

  • Pip installed to manage project packages.

  • The Supabase API URL and Key from the previous section.

Installation

  1. Clone the glassflow-examples repository to your local machine:

    git clone https://github.com/glassflow/glassflow-examples.git
  2. Navigate to the project directory:

    cd glassflow-examples/use-cases/generative-feedback-loop
  3. Create a new virtual environment:

    python -m venv .venv && source .venv/bin/activate
  4. Install the required dependencies:

    pip install -r requirements.txt

Create an environment configuration file

Create a .env file in the project directory and add the following configuration variables:

SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_access_key

Replace the placeholders (your_supabase_url and your_supabase_access_key) with the appropriate values.

Unzip Airbnb New York Listings

Uncompress the sample CSV data provided in the repository:

cd data
tar -xzvf airbnb_vector_search.tar.gz

Populate Supabase database

To test the pipeline, we can create some rows in our Supabase table by running the command:

python populate_supabase.py 10

This will insert the first 10 rows from the Airbnb dataset into the Airbnb listings table in the Supabase database, which then sends the change events automatically to the GlassFlow pipeline. After the transformation, the data will be available immediately in Weaviate.
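
The script itself is short; here is a minimal sketch of what such a script might do, assuming the supabase-py client and pandas (the CSV file name is an assumption; use the file you unpacked into data/):

import os
import sys

import pandas as pd
from dotenv import load_dotenv
from supabase import create_client

load_dotenv()  # reads SUPABASE_URL and SUPABASE_KEY from .env
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

n_rows = int(sys.argv[1]) if len(sys.argv) > 1 else 10

df = pd.read_csv("data/airbnb_nyc_2019.csv").head(n_rows)
df = df.where(pd.notnull(df), None)  # JSON cannot encode NaN

for row in df.to_dict(orient="records"):
    # Each insert fires the Supabase webhook, pushing the row to GlassFlow
    supabase.table("Airbnb-nyc-2019").insert(row).execute()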

Search for Airbnb listings

You can now search for enriched listings in the Weaviate database. Run the following vector similarity query in the Weaviate query console:

{
  Get {
    AirbnbNYC(
      limit: 3
      nearText: {
        concepts: ["luxury apartment with a view"]
        distance: 0.7
      }
    ) {
      name
      summary
      price
      neighbourhood
      _additional {
        distance
      }
    }
  }
}

This type of vector search query is particularly useful in scenarios where you want to find items (like Airbnb listings) that match a particular set of characteristics described in natural human language. For example, a user might be looking for listings that are described as "luxurious" and "have a great view," and this query would return the most relevant results based on that description.
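
If you'd rather query from code than the console, the same search can be issued with the Weaviate Python client (a sketch using the v3 client API; the cluster URL and keys are placeholders, and the OpenAI key header is needed because the text2vec-openai vectorizer embeds the query text at search time):

import weaviate

client = weaviate.Client(
    url="https://your-cluster.weaviate.network",
    auth_client_secret=weaviate.AuthApiKey("YOUR_WEAVIATE_API_KEY"),
    additional_headers={"X-OpenAI-Api-Key": "YOUR_OPENAI_API_KEY"},
)

result = (
    client.query.get("AirbnbNYC", ["name", "summary", "price", "neighbourhood"])
    .with_near_text({"concepts": ["luxury apartment with a view"], "distance": 0.7})
    .with_limit(3)
    .with_additional(["distance"])
    .do()
)
print(result)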

After running the query, you will see the most relevant listings.

An amazing part of the process happens behind the scenes: the pipeline you created continuously delivers new Airbnb listings, so search results always reflect the latest data.

Conclusion

In this tutorial, you learned how to detect changes in a primary database, stream those changes, and continuously update a vector database for AI-powered applications.
