Real-time classified ads enrichment
A practical example of creating a pipeline to enrich classified ads in real time using AI, GlassFlow, LangChain, and Redis.
Learn how to build a data processing pipeline that enriches classified ads in real time. We'll use GlassFlow to process the ads, enrich them with additional information, categorize them using LangChain and OpenAI, and store the enriched ads in Redis for fast, advanced search.
Pipeline components
Data source
The initial data source is a platform where users post classified ads. This could be an existing marketplace like Craigslist or eBay, or a custom-built website. GlassFlow can also continuously ingest new and updated ads from a custom website using a Webhook connector. For the sake of the demo, we generate and ingest sample ads from predefined JSON files using the Python SDK. Sample input looks like this:
Transformation
The transformation function in GlassFlow processes the ingested ads. It uses AI to categorize each ad, generate tags from the ad images, and write a content description.
Data sink
The enriched ad data is stored in Redis, a high-performance in-memory database. The enriched data in Redis is made available to the frontend of the classified ads platform, ensuring that users see the most relevant and informative ads.
Setting Up the Pipeline with GlassFlow
You will use the GlassFlow WebApp to create a data processing pipeline.
Prerequisites
To start with this setup, you need a free GlassFlow account.
Step 1. Log in to GlassFlow WebApp
Navigate to the GlassFlow WebApp and log in with your credentials.
Step 2. Create a New Pipeline
Click on "Create New Pipeline" and provide a name. You can name it "classified-ads-enrichment".
Step 3. Configure a Data Source
Select the "SDK" data source to configure the pipeline to use the Python SDK. For demo purposes, we ingest data from a JSON file with examples of classified ads using a Python script called producer.py.
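As a rough illustration of what a producer script of this kind does, the sketch below loads ads from a JSON file and hands each one to a publish callable. It is not the repository's producer.py: the function names, the file layout (a JSON array of ad objects), and the publish callable are assumptions made for illustration. The actual GlassFlow SDK call is deliberately left out, so you can wrap whatever publish method your SDK version exposes.

```python
import json
from pathlib import Path
from typing import Callable, Iterator


def load_ads(path: Path) -> Iterator[dict]:
    """Yield ad records from a JSON file holding one object or a list of objects."""
    with path.open(encoding="utf-8") as f:
        payload = json.load(f)
    # Accept a single ad object as well as a list of ads.
    records = payload if isinstance(payload, list) else [payload]
    for record in records:
        if isinstance(record, dict):  # skip anything that is not an ad object
            yield record


def publish_ads(path: Path, publish: Callable[[dict], None]) -> int:
    """Send every ad in the file through `publish` and return how many were sent."""
    sent = 0
    for ad in load_ads(path):
        publish(ad)  # hypothetical hook: wrap the SDK's publish call here
        sent += 1
    return sent
```

Passing the publish step in as a callable keeps the file handling testable without a live pipeline; for a dry run, publish_ads(Path("ads.json"), print) simply prints each ad.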
Step 4. Define the Transformer
The transformation function uses LangChain and OpenAI to analyze and categorize each ad, add more information to it, generate tags based on the ad images, and write a summary for each ad.
Copy and paste the following transformation function code into the transformer's built-in editor.
Note that your code must implement the handler function. Without it, the transformation function will fail.
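To show the shape such a function takes, here is a minimal sketch. It assumes the handler receives the event as a dict plus a logger and returns the transformed event; the two-argument handler(data, log) signature is an assumption to check against the current GlassFlow documentation. The keyword-based categorize helper and the title, description, and category field names are hypothetical stand-ins. The real function calls LangChain and OpenAI instead of matching keywords.

```python
import logging

# Hypothetical keyword table standing in for the LLM-based categorization.
CATEGORY_KEYWORDS = {
    "vehicles": ("car", "bike", "truck"),
    "electronics": ("phone", "laptop", "camera"),
    "real estate": ("apartment", "house", "room"),
}


def categorize(text: str) -> str:
    """Return the first category with a keyword among the words, else 'other'."""
    words = text.lower().split()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return category
    return "other"


def handler(data: dict, log: logging.Logger) -> dict:
    """Entry point called once per event (signature assumed, see above)."""
    text = f"{data.get('title', '')} {data.get('description', '')}"
    enriched = dict(data)  # copy so the incoming event is not mutated
    enriched["category"] = categorize(text)
    log.info("Enriched ad with category %s", enriched["category"])
    return enriched
```

Whatever the handler returns is what the pipeline passes on to the sink, so enrichment fields are added to a copy of the event rather than replacing it.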
By default, the transformer function uses a free OpenAI API key provided by GlassFlow.
You can also replace it with your own API key. To do so:
Have an OpenAI API account.
Create an API key.
Set the API key in the transformation code.
Step 5. Choose a transformer dependency
The transformation function uses the external openai library, so we need to select it from the Dependencies dropdown menu. GlassFlow includes the external library in the function deployment and runtime. Read more about Python dependencies for transformation.
Step 6. Configure a Data Sink
Select "SDK" as the data sink to configure the pipeline to use the Python SDK. The sink_connector.py Python script provides the code for sending enriched classified ads to Redis. You use Docker to run a Redis instance and the sink connector script.
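To illustrate the sink side, here is a minimal sketch of writing one enriched ad to Redis as a JSON document. It is not the repository's sink_connector.py. It assumes a redis-py client connected to a Redis Stack instance (so that the JSON commands are available), a hypothetical key scheme of ad:<id>, and an id field on each ad. The client is passed in as an argument, so the function can be exercised without a running server.

```python
from typing import Any


def ad_key(ad: dict) -> str:
    """Build the Redis key for an ad (hypothetical 'ad:<id>' naming scheme)."""
    if "id" not in ad:
        raise ValueError("ad has no 'id' field")
    return f"ad:{ad['id']}"


def store_ad(client: Any, ad: dict) -> str:
    """Write the ad as a JSON document at the root path and return its key."""
    key = ad_key(ad)
    # redis-py exposes the RedisJSON commands through client.json();
    # the "$" path addresses the document root.
    client.json().set(key, "$", ad)
    return key
```

With redis-py installed, calling store_ad(redis.Redis(host="localhost", port=6379), ad) would write to a local instance; the host and port here are assumptions for a default Docker setup.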
Step 7. Confirm the Pipeline
Confirm the pipeline settings in the final step and click "Create Pipeline".
Step 8. Copy the Pipeline Credentials
Once the pipeline is created, copy its credentials: the Pipeline ID and the Access Token.
Send and consume data from the pipeline
Prerequisites
To complete this part, you'll need the following:
Python installed on your machine.
Pip installed to manage project packages.
Installation
Clone the glassflow-examples repository to your local machine:
Navigate to the project directory:
Create a new virtual environment:
Install the required dependencies:
Create an environment configuration file
Add a .env file in the project directory and add the following configuration variables:
Replace your_pipeline_id and your_pipeline_access_token with appropriate values obtained from your GlassFlow account.
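A script can then read these values at startup. The sketch below uses only the standard library and assumes the variables are named PIPELINE_ID and PIPELINE_ACCESS_TOKEN and have already been loaded into the process environment (for example by Docker Compose or a dotenv loader). Both names are assumptions; adjust them to match your .env file.

```python
import os
from typing import Mapping, NamedTuple


class PipelineConfig(NamedTuple):
    pipeline_id: str
    access_token: str


def load_config(env: Mapping[str, str] = os.environ) -> PipelineConfig:
    """Read pipeline credentials from the environment, failing fast if any is missing."""
    # Variable names are assumptions; match them to your .env file.
    required = ("PIPELINE_ID", "PIPELINE_ACCESS_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing configuration: {', '.join(missing)}")
    return PipelineConfig(env["PIPELINE_ID"], env["PIPELINE_ACCESS_TOKEN"])
```

Failing at startup with the name of the missing variable is usually easier to debug than an authentication error raised later by the pipeline client.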
Run the Docker containers
Run the following Docker command in a terminal:
Docker Compose will spin up the sink connector service, which listens to the GlassFlow pipeline and continuously saves the documents to the Redis database. It also runs Redis Stack, which includes the Redis server and Redis Insight.
Run the producer
Run the producer.py Python script in a terminal to publish sample ads data to the GlassFlow pipeline:
Once your Docker containers and producer Python script are running, you can access the Redis Insight UI to view the enriched ads.
See the enriched data on Redis Insight
Open your web browser, navigate to https://localhost:8001, and have a look at the enriched data.
You should see the output with enriched ads.
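The stored ads can also be inspected from Python rather than the UI. The sketch below is a minimal example under stated assumptions: a redis-py client connected to Redis Stack, ads stored as JSON documents under keys matching ad:* (a hypothetical key scheme), and a category field on each enriched ad. It filters on the client side; it does not use a Redis search index.

```python
from typing import Any, Iterator


def iter_ads(client: Any, pattern: str = "ad:*") -> Iterator[dict]:
    """Yield every JSON document whose key matches the pattern."""
    # scan_iter walks the keyspace incrementally instead of blocking like KEYS.
    for key in client.scan_iter(match=pattern):
        doc = client.json().get(key)
        if isinstance(doc, dict):
            yield doc


def ads_in_category(client: Any, category: str) -> list[dict]:
    """Return ads whose 'category' field equals the given value (client-side filter)."""
    return [ad for ad in iter_ads(client) if ad.get("category") == category]
```

For a handful of demo ads a client-side filter is enough; for larger datasets an index-based query would be the more appropriate tool.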
Summary
By following these steps, you created a pipeline with GlassFlow that processes and enriches classified ads in real time. You can also experiment with updating the input ads data and searching for ads by different attributes to see how the enriched information improves the search experience. Explore other use cases to see how GlassFlow can revolutionize your data processing needs.