Real-time clickstream analytics
A practical example of creating a pipeline for analyzing clickstream data and continuously updating real-time dashboards.
Clickstream data contains the information gathered as a user navigates through a web application. Clickstream analytics involves tracking, analyzing, and reporting the web pages visited and user behavior on those pages. This data provides valuable insights into user behavior, such as how they discovered the product or service and their interactions on the website.
In this tutorial, we will build a clickstream analytics dashboard using GlassFlow. We will use the Google Analytics Data API in Python to collect clickstream data from a website and send it to a GlassFlow pipeline. Our transformation function will analyze the data to calculate additional metrics, and we will use Streamlit and Plotly to visualize the results.
There are two options for data producers:
Use the Python script fake_producer.py with the Faker library to generate mock clickstream data and push it to GlassFlow. You do not need to set up the Google Analytics 4 API in this case.
Use the Google Analytics 4 Data API integration example code in the ga_producer.py Python script to push real-time report events to GlassFlow.
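To illustrate the shape of the data a producer publishes, here is a minimal sketch of mock event generation using only the standard library. The field names mirror common GA4 dimensions and metrics but are assumptions for illustration; the repository's fake_producer.py uses the Faker library and publishes each event to GlassFlow via the Python SDK instead of printing it.

```python
import json
import random
from datetime import datetime, timezone

# Hypothetical value pools; fake_producer.py generates richer data with Faker.
DEVICES = ["desktop", "mobile", "tablet"]
PAGES = ["/home", "/pricing", "/docs", "/blog", "/signup"]
COUNTRIES = ["United States", "Germany", "India", "Brazil", "Japan"]

def make_mock_event():
    """Build one mock clickstream event as a JSON-serializable dict."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "deviceCategory": random.choice(DEVICES),
        "unifiedScreenName": random.choice(PAGES),
        "country": random.choice(COUNTRIES),
        "eventCount": random.randint(1, 20),
        "screenPageViews": random.randint(1, 10),
        "activeUsers": random.randint(1, 5),
    }

if __name__ == "__main__":
    # In the real producer, each event is published to the GlassFlow pipeline;
    # here we just print one to show its structure.
    print(json.dumps(make_mock_event(), indent=2))
```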
GlassFlow is responsible for receiving real-time analytics data from the producer using the Python SDK, applying the transformation function, and then making the transformed data available for consumption by the consumer.
The dashboard component is built using Streamlit, a powerful tool for creating interactive web applications. This component visualizes the clickstream data by creating various charts and graphs in Plotly.
You will use the GlassFlow WebApp to create a data processing pipeline.
To start with this setup, you need a free GlassFlow account.
Navigate to the GlassFlow WebApp and log in with your credentials.
Click on "Create New Pipeline" and provide a name. You can name it "Clickstream Analytics".
Select "SDK" to configure the pipeline to use the Python SDK for ingesting analytics event data from the API.
To provide meaningful insights to the user based on the received dimensions and metrics from Google Analytics, we apply some computations in the transformation function. Copy and paste the following transformation function into the transformer's built-in editor.
Note that implementing the handler function is mandatory; without it, the transformation will fail.
The sample transformation function enriches input event data with the following:
Engagement Score: Calculates an engagement score based on event count, screen page views, and active users.
Device Usage Insights: Analyzes the proportion of different device categories.
Content Popularity: Tracks the popularity of different screens/pages.
Geographic Distribution: Provides insights on user distribution based on geographic location.
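As a rough illustration of the first enrichment, here is a minimal sketch of a handler, assuming GA4-style field names (eventCount, screenPageViews, activeUsers) and hypothetical weights; the actual function in the tutorial also derives the device, content, and geographic insights listed above.

```python
def handler(data, log):
    """GlassFlow invokes this mandatory entry point for every incoming event."""
    event_count = data.get("eventCount", 0)
    page_views = data.get("screenPageViews", 0)
    active_users = data.get("activeUsers", 0)

    # Engagement score: a simple weighted combination of the activity metrics.
    # The weights here are illustrative, not the tutorial's exact formula.
    data["engagementScore"] = round(
        0.5 * event_count + 0.3 * page_views + 0.2 * active_users, 2
    )
    return data
```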
Select "SDK" to configure the pipeline to use the Python SDK for sending transformed data to the consumer. In a real-world project, you would send this data to dashboards and notification systems.
Confirm the pipeline settings in the final step and click "Create Pipeline".
Once the pipeline is created, copy its credentials such as Pipeline ID and Access Token.
You need a Google Analytics (GA) account if you use GA as the data producer.
Google Analytics 4 (or GA4) has an API that provides access to page views, traffic sources, and other data points. With this API, you can build custom dashboards, automate reporting, and integrate with other applications. We focus only on accessing and exporting data to GlassFlow using Python. You can find more comprehensive information about how to set up a Google Cloud project, enable the API, and configure authentication in the API quickstart, or follow this step-by-step guide.
Enable the Google Analytics Data API for a new project or select an existing project.
Go to https://console.cloud.google.com/apis/credentials. Click "Create credentials" and choose the "Service Account" option. Name the service user and click through the next steps.
Once more, go to https://console.cloud.google.com/apis/credentials and click on your newly created user (under "Service Accounts"). Go to "Keys", click "Add key" -> "Create new key" -> "JSON". A JSON file will be saved to your computer.
Rename this JSON file to credentials.json and put it under use-cases/clickstream-analytics-dashboard. Then set the path to this file in the environment variable GOOGLE_APPLICATION_CREDENTIALS.
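For example, on Linux or macOS you can export the variable in your shell; the path below assumes you run commands from the repository root:

```shell
# Point Google's client libraries at the service-account key file.
export GOOGLE_APPLICATION_CREDENTIALS=use-cases/clickstream-analytics-dashboard/credentials.json
```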
Add the service account to the Google Analytics property. Using a text editor or VS Code, open the credentials.json file downloaded in the previous step and find the client_email field to obtain the service account email address.
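The client_email value follows Google's standard service-account address format; for example (the account and project names here are hypothetical):

```
quickstart-service-account@your-project-id.iam.gserviceaccount.com
```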
Use this email address to add a user to the Google Analytics property you want to access via the Google Analytics Data API v1. For this tutorial, only Viewer permissions are needed.
Copy the Google Analytics property ID you found and save it as the value of GA_PROPERTY_ID in a .env file in the project directory.
The Streamlit dashboard code in the consumer.py script visualizes the output from the GlassFlow transformation, which includes additional insights such as engagement score, device usage, content popularity, and geographic distribution.
The dashboard is updated in real-time with data being continuously consumed from the GlassFlow pipeline.
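As a sketch of the kind of aggregation the dashboard performs before charting, here is a hypothetical helper that turns a batch of transformed events into the device-usage breakdown a Plotly pie chart could display. The function name and field name are assumptions for illustration, not taken from consumer.py.

```python
from collections import Counter

def device_usage(events):
    """Return {device_category: share_of_events} for a list of event dicts."""
    counts = Counter(e["deviceCategory"] for e in events)
    total = sum(counts.values())
    # Normalize raw counts into proportions suitable for a pie chart.
    return {device: count / total for device, count in counts.items()}
```

In the real dashboard, the Streamlit app would consume events from the pipeline in a loop and re-render the Plotly figures with each new batch.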
To complete this part, you'll need the following:
Python installed on your machine.
Pip installed to manage project packages.
Clone the glassflow-examples repository to your local machine:
Navigate to the project directory:
Create a new virtual environment:
Install the required dependencies:
Add a .env file in the project directory with the following configuration variables and their values:
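The .env file might look like the following. The pipeline variable names here are assumptions for illustration (match them to what the example scripts expect); each placeholder must be replaced with your own value:

```
PIPELINE_ID=your_pipeline_id
PIPELINE_ACCESS_TOKEN=your_pipeline_access_token
GA_PROPERTY_ID=your_ga_property_id
```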
Replace your_pipeline_id and your_pipeline_access_token with the appropriate values obtained from your GlassFlow account.
Run the ga_producer.py or fake_producer.py script to start publishing data:
Use the Streamlit command to run the dashboard:
You will see the output with several dashboards updating in real time:
You learned how to integrate real-time analytics data from Google Analytics into GlassFlow for further processing and visualization. Analytics data can also be stored in a database like ClickHouse for future use.