Real-time classified ads enrichment
A practical example of creating a pipeline to enrich classified ads in real-time using AI, GlassFlow, Langchain, and Redis.
© 2023 GlassFlow
Learn how to build a data processing pipeline to enrich classified ads in real-time. We'll use GlassFlow to process ads, enrich them with additional information, categorize them using Langchain and OpenAI, and store the enriched ads in Redis for quick and advanced search.
Link to the GitHub project repository
The initial data source is a platform where users post classified ads. This could be an existing marketplace like Craigslist or eBay, or a custom-built website. GlassFlow can also continuously ingest new and updated ads from a custom website using a Webhook connector. For the sake of the demo, we generate and ingest sample ads from predefined JSON files using the Python SDK. Sample input looks like this:
The transformation function in GlassFlow processes the ingested ads. It uses AI to analyze each ad and its images, then adds tags, a category, and a generated content description.
The enriched ad data is stored in Redis, a high-performance in-memory database. The enriched data in Redis is made available to the frontend of the classified ads platform, ensuring that users see the most relevant and informative ads.
You will use the GlassFlow WebApp to create a data processing pipeline.
To start with this setup, you need a free GlassFlow account.
Navigate to the GlassFlow WebApp and log in with your credentials.
Click on "Create New Pipeline" and provide a name. You can name it "classified-ads-enrichment".
Select the "SDK" data source to configure the pipeline to use the Python SDK. For demo purposes, we ingest data from a JSON file with examples of classified ads using a Python script called producer.py.
The transformation function uses Langchain and OpenAI to analyze, categorize, and add more information to the ad, generate tags based on ad images, and write a summary for each ad.
Copy and paste the following transformation function code into the transformer's built-in editor.
Note that your code must implement the handler function. Without it, the transformation function will fail.
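For orientation, the overall shape of such a function can be sketched as follows. This is a minimal sketch, not the pipeline's actual transformation code: it assumes the `handler(data, log)` signature with a logger-like `log` object, and `enrich_ad` with its field names (`title`, `description`, `tags`, `summary`) is a hypothetical stand-in that calls no AI model.

```python
import logging


def enrich_ad(ad: dict) -> dict:
    """Hypothetical stand-in for the AI enrichment step (no model is called)."""
    enriched = dict(ad)  # copy so the input event is not mutated
    title = ad.get("title", "")
    description = ad.get("description", "")
    text = f"{title} {description}".lower()
    # Toy tagging: keep the distinct words longer than three letters.
    words = [w.strip(".,!?") for w in text.split()]
    enriched["tags"] = sorted({w for w in words if len(w) > 3})
    # Toy summary: the first 80 characters of the description.
    enriched["summary"] = description[:80]
    return enriched


def handler(data, log):
    """Entry point (assumed signature): receives one event, returns the enriched event."""
    log.info("Enriching ad %s", data.get("id"))
    return enrich_ad(data)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    sample = {"id": 1, "title": "Red bike", "description": "Barely used road bike."}
    print(handler(sample, logging.getLogger("transform")))
```

In a real transformation, the body of `enrich_ad` would be replaced by the Langchain and OpenAI calls while `handler` keeps the same role.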
By default, the transformer function uses a free OpenAI API key provided by GlassFlow.
You can also replace it with your own API key. To do so:
Have an OpenAI API account.
Create an API key.
Set the API key in the transformation code.
The transformation function uses the external openai library, so select it from the Dependencies dropdown menu. GlassFlow includes the external library in the function deployment and runtime. Read more about Python dependencies for transformation.
Select "SDK" as the data sink to configure the pipeline to use the Python SDK. The Python script sink_connector.py contains the code for sending enriched classified ads to Redis. You use Docker to run a Redis instance and the sink connector script.
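The storage step of a sink connector can be sketched roughly as follows. This is an illustrative sketch rather than the contents of sink_connector.py: it assumes each enriched ad is a dictionary with an `id` field, stores it as a JSON string under a hypothetical `ad:<id>` key, and accepts any client exposing a `set(name, value)` method such as the one in redis-py.

```python
import json


def ad_key(ad: dict) -> str:
    """Build the storage key for an ad (hypothetical key scheme)."""
    return f"ad:{ad['id']}"


def save_ad(store, ad: dict) -> str:
    """Serialize the ad to JSON and write it with the client's set() method."""
    key = ad_key(ad)
    store.set(key, json.dumps(ad, ensure_ascii=False))
    return key


class InMemoryStore:
    """Tiny stand-in for a Redis client so the sketch runs without a server."""

    def __init__(self):
        self.data = {}

    def set(self, name, value):
        self.data[name] = value
        return True


if __name__ == "__main__":
    store = InMemoryStore()
    print(save_ad(store, {"id": 42, "title": "Red bike", "tags": ["bike"]}))
```

With redis-py installed, passing `redis.Redis(host="localhost", port=6379)` as `store` would write to a local Redis instance instead; the host and port here are assumptions about the Docker setup.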
Confirm the pipeline settings in the final step and click "Create Pipeline".
Once the pipeline is created, copy its credentials: the Pipeline ID and the Access Token.
To complete this part, you'll need the following:
Python installed on your machine.
Pip installed to manage project packages.
Clone the glassflow-examples repository to your local machine:
Navigate to the project directory:
Create a new virtual environment:
Install the required dependencies:
Add a .env file in the project directory and add the following configuration variables:
Replace your_pipeline_id and your_pipeline_access_token with the appropriate values obtained from your GlassFlow account.
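A script that relies on these settings can check them before connecting to the pipeline. The sketch below is one possible way to do that, under the assumption that the variables are named `PIPELINE_ID` and `PIPELINE_ACCESS_TOKEN`; these names and the `load_pipeline_config` helper are illustrative, not taken from the project.

```python
import os

# Assumed variable names; adjust them to match the actual .env file.
REQUIRED = ("PIPELINE_ID", "PIPELINE_ACCESS_TOKEN")
PLACEHOLDERS = {"your_pipeline_id", "your_pipeline_access_token"}


def load_pipeline_config(env=None) -> dict:
    """Return the pipeline settings, failing early on missing or placeholder values."""
    env = os.environ if env is None else env
    config = {}
    for name in REQUIRED:
        value = env.get(name, "").strip()
        if not value or value in PLACEHOLDERS:
            raise ValueError(f"{name} is missing or still set to a placeholder")
        config[name] = value
    return config


if __name__ == "__main__":
    try:
        print(sorted(load_pipeline_config()))
    except ValueError as err:
        print(f"Configuration problem: {err}")
```

If the python-dotenv package is available, calling its `load_dotenv()` function before `load_pipeline_config()` would copy the values from the .env file into the process environment.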
Run the following Docker command in a terminal:
Docker Compose will spin up the sink connector service, which listens to the GlassFlow pipeline and continuously saves the documents to the Redis database. It also runs Redis Stack, which includes the Redis server and Redis Insight.
Run the producer.py Python script in a terminal to publish sample ads data to the GlassFlow pipeline:
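Conceptually, a producer of this kind reads the sample ads from a JSON file and publishes them one by one. The sketch below illustrates that loop and is not the repository's producer.py: `load_ads` and `publish_ads` are hypothetical helpers, and `publish` is any callable that sends one ad, for example a thin wrapper around the GlassFlow Python SDK's publish call.

```python
import json
from typing import Callable, Iterable


def load_ads(path: str) -> list:
    """Read ads from a JSON file holding either one ad object or a list of ads."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data if isinstance(data, list) else [data]


def publish_ads(ads: Iterable[dict], publish: Callable[[dict], object]) -> int:
    """Send each ad through the supplied publish callable and count the ads sent."""
    count = 0
    for ad in ads:
        publish(ad)
        count += 1
    return count


if __name__ == "__main__":
    # Demo with in-memory data: collect the "published" ads in a list.
    sent = []
    total = publish_ads([{"id": 1, "title": "Red bike"}], sent.append)
    print(f"Published {total} ad(s)")
```

Passing the publish step in as a callable keeps the loop independent of any particular SDK version and makes it easy to dry-run against a list.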
Once your Docker containers and producer Python script are running, you can access the Redis Insight UI to view the enriched ads.
Open your web browser and navigate to https://localhost:8001 to have a look at the enriched data.
You should see the output with enriched ads.
By following these steps, you created a pipeline with GlassFlow that processes and enriches classified ads in real time. You can also experiment with updating the input ads data and searching for ads based on different attributes to see how the enriched information enhances the search experience. Explore other use cases to see how GlassFlow can support your data processing needs.