Real-time log data anomaly detection
A practical example of building a pipeline for real-time log data anomaly detection using AI.
This example data transformation pipeline demonstrates anomaly detection with GlassFlow and AI: it monitors server logs, detects unusual patterns or suspicious activity, and sends notifications to Slack.
Setting Up the Pipeline with GlassFlow
You will use the GlassFlow WebApp to create a data processing pipeline.
Prerequisites
To start with the pipeline creation, you need the following:
OpenAI account: You have an OpenAI API account.
Slack account: If you don't have a Slack account, sign up for a free one here and go to the Slack Get Started page.
Slack workspace: You need access to a Slack workspace where you're an admin. If you are creating a new workspace, follow this guide.
Incoming webhook: You have created an incoming webhook for your Slack workspace.
Step 1. Log in to GlassFlow WebApp
Navigate to the GlassFlow WebApp and log in with your credentials.
Step 2. Create a New Pipeline
Click on "Create New Pipeline" and provide a name. You can name it "Log Data Anomaly Detection".
Step 3. Configure a Data Source
Select "SDK" to configure the pipeline to use the Python SDK to ingest log data from a source such as PostgreSQL. For the purposes of this demo, we use a sample server log generator Python script.
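The sample log generator can be sketched as follows. The field names, event pool, and anomaly rate below are illustrative, not the exact script from the example repository:

```python
import random
from datetime import datetime, timezone

# Illustrative event pool: mostly normal requests plus a few suspicious ones.
NORMAL_EVENTS = [
    ("INFO", "GET /index.html 200"),
    ("INFO", "POST /api/login 200"),
    ("WARN", "GET /admin 403"),
]
SUSPICIOUS_EVENTS = [
    ("ERROR", "Repeated failed login attempts from 203.0.113.7"),
    ("ERROR", "SQL syntax error near 'OR 1=1' in query parameter"),
]


def generate_log_entry(anomaly_rate: float = 0.1) -> dict:
    """Return one synthetic server log record as a dictionary."""
    level, message = random.choice(
        SUSPICIOUS_EVENTS if random.random() < anomaly_rate else NORMAL_EVENTS
    )
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
    }
```

Each call returns one JSON-serializable record that the producer can publish to the pipeline.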
Step 4. Define the Transformer
An AI-powered transformation function in Python detects anomalies in the log data using a Large Language Model (LLM) such as GPT-3.5-turbo from OpenAI. Create an OpenAI API key and set it in the transformation code below, then paste the updated transformation function code into the transformer's built-in editor.
Note that implementing the handler function is mandatory in your code. Without it, the transformation will fail.
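A minimal sketch of such a transformation function is shown below. It assumes the modern openai client interface and a deliberately simple prompt; the actual code in the example repository may differ, and the OPENAI_API_KEY environment variable, the build_prompt helper, and the Slack message format are illustrative:

```python
import json
import os


def build_prompt(log_entry: dict) -> str:
    """Build the instruction sent to the LLM for one log record."""
    return (
        "You are a security analyst. Reply with exactly ANOMALY or NORMAL.\n"
        f"Log entry: {json.dumps(log_entry)}"
    )


def handler(data, log):
    """Mandatory entry point called by GlassFlow for every event.

    Returns a Slack-compatible payload when the LLM flags the entry,
    otherwise None so the event is dropped.
    """
    # Imported lazily so the module can load even without the dependency.
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # set your key
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": build_prompt(data)}],
    )
    verdict = response.choices[0].message.content.strip()
    if verdict.startswith("ANOMALY"):
        return {"text": f"Anomaly detected in server logs: {json.dumps(data)}"}
    return None
```

Returning a dictionary with a text field keeps the output directly compatible with the Slack webhook sink configured later.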
Step 5. Choose a transformer dependency
The transformation function uses the external openai library in the code, so we need to select it from the Dependencies dropdown menu. GlassFlow includes the library in the function's deployment and runtime. Read more about Python dependencies for transformations.
Step 6. Configure a Data Sink
Select "Webhook" as a data sink to configure the pipeline to use the Slack Incoming Webhook URL.
Fill in the URL and headers under Connector Details:
Method: POST
URL:
https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
Headers:
Content-Type: application/json
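Slack incoming webhooks expect a JSON body with at least a text field, so the transformer's output should have this shape (the message text is illustrative):

```json
{
    "text": "Anomaly detected in server logs: repeated failed login attempts from 203.0.113.7"
}
```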
Step 7. Confirm the Pipeline
Confirm the pipeline settings in the final step and click "Create Pipeline".
Step 8. Copy the Pipeline Credentials
Once the pipeline is created, copy its credentials: the Pipeline ID and Access Token.
Send sample log data to the pipeline
Prerequisites
To complete this part you'll need the following:
Python installed on your machine.
Pip downloaded and installed to manage project packages.
Installation
Clone the glassflow-examples repository to your local machine:
Navigate to the project directory:
Create a new virtual environment:
Install the required dependencies:
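The installation steps above look roughly like this in a terminal. The repository URL is assumed from the repository name in the tutorial, and the directory you navigate into should be adjusted to match the actual project directory in the repository:

```shell
# Clone the examples repository and enter it
git clone https://github.com/glassflow/glassflow-examples.git
cd glassflow-examples

# Create and activate a new virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install the required dependencies
pip install -r requirements.txt
```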
Create an environment configuration file
Create a .env file in the project directory and add the following configuration variables:
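The resulting .env file should look like this (the variable names are assumed; match whatever names the example scripts expect):

```
PIPELINE_ID=your_pipeline_id
PIPELINE_ACCESS_TOKEN=your_pipeline_access_token
```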
Replace your_pipeline_id and your_pipeline_access_token with the values obtained from your GlassFlow account.
Run the Log Producer
Run the producer.py Python script in a terminal to publish sample server log data to the GlassFlow pipeline:
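Assuming you are in the project directory with the virtual environment activated, that is:

```shell
python producer.py
```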
The GlassFlow pipeline automatically sends transformed events to Slack whenever suspicious or unusual activity is detected in the sample logs. You should see an output indicating that messages are being received in Slack.
Summary
Following this tutorial, you’ve set up a real-time log data anomaly detection pipeline using GlassFlow, OpenAI, and Slack. Enriched logs containing identified anomalies can also be sent to Amazon S3 or OpenSearch Service for further analysis and long-term storage. Additionally, alert notifications can be integrated with communication platforms such as Microsoft Teams or SMS services like Twilio.
This pipeline can easily be adapted to other real-time alerting use cases, such as monitoring financial transactions for fraud, detecting security breaches, tracking performance metrics, and ensuring compliance with regulatory requirements.