Real-time clickstream analytics

A practical example of creating a pipeline for analyzing clickstream data and continuously updating real-time dashboards.

Clickstream data contains the information gathered as a user navigates through a web application. Clickstream analytics involves tracking, analyzing, and reporting the web pages visited and user behavior on those pages. This data provides valuable insights into user behavior, such as how they discovered the product or service and their interactions on the website.

In this tutorial, we will build a clickstream analytics dashboard using GlassFlow. We will use Google Analytics Data API in Python to collect clickstream data from a website and send them to a GlassFlow pipeline. Our transformation function will analyze the data to calculate additional metrics, and we will use Streamlit and Plotly to visualize the results.

Pipeline components

Producer

There are two options for data producers: the Google Analytics Data API (the ga_producer.py script), which pulls real clickstream data from your website, or a mock-event generator (the fake_producer.py script) if you do not have a Google Analytics property.

GlassFlow

GlassFlow is responsible for receiving real-time analytics data from the producer using the Python SDK, applying the transformation function, and then making the transformed data available for consumption by the consumer.

Consumer

The dashboard component is built using Streamlit, a powerful tool for creating interactive web applications. This component visualizes the clickstream data by creating various charts and graphs in Plotly.

Setting Up the Pipeline with GlassFlow

You will use the GlassFlow WebApp to create a data processing pipeline.

Prerequisites

To start with this setup, you need a free GlassFlow account.

Sign up for a free account

Step 1. Log in to GlassFlow WebApp

Navigate to the GlassFlow WebApp and log in with your credentials.

Step 2. Create a New Pipeline

Click on "Create New Pipeline" and provide a name. You can name it "Clickstream Analytics".

Step 3. Configure a Data Source

Select "SDK" to configure the pipeline to use Python SDK for ingesting analytics event data from the API.

Step 4. Define the Transformer

To provide meaningful insights to the user based on the received dimensions and metrics from Google Analytics, we apply some computations in the transformation function. Copy and paste the following transformation function into the transformer's built-in editor.

Note that implementing the handler function is mandatory; without it, the pipeline cannot run the transformation.

The sample transformation function enriches input event data with the following:

  • Engagement Score: Calculates an engagement score based on event count, screen page views, and active users.

  • Device Usage Insights: Analyzes the proportion of different device categories.

  • Content Popularity: Tracks the popularity of different screens/pages.

  • Geographic Distribution: Provides insights on user distribution based on geographic location.
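A minimal sketch of such a handler is shown below. The field names (eventCount, screenPageViews, activeUsers, deviceCategory) follow GA4 metric and dimension naming, and the scoring weights are illustrative assumptions; the actual function in the tutorial repository may differ.

```python
def handler(data, log):
    """GlassFlow entry point: receives one event payload and a logger,
    and returns the transformed (enriched) event."""
    event_count = int(data.get("eventCount", 0))
    page_views = int(data.get("screenPageViews", 0))
    active_users = int(data.get("activeUsers", 0))

    # Weighted engagement score; the weights here are illustrative,
    # not the canonical formula from the repository.
    data["engagement_score"] = round(
        0.5 * event_count + 0.3 * page_views + 0.2 * active_users, 2
    )

    # Normalize the device category so the dashboard can aggregate
    # device-usage proportions downstream.
    data["device_category"] = data.get("deviceCategory", "unknown").lower()
    return data
```

Because the handler is a pure function of the event payload, each enrichment step is easy to test in isolation before deploying it to the pipeline.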

Step 5. Configure a Data Sink

Select "SDK" to configure the pipeline to use Python SDK for sending data to the fleet management file. In a real-world project, you send data to dashboards and notification systems.

Step 6. Confirm the Pipeline

Confirm the pipeline settings in the final step and click "Create Pipeline".

Step 7. Copy the Pipeline Credentials

Once the pipeline is created, copy its credentials such as Pipeline ID and Access Token.

Setting Up Google Analytics 4 API

Prerequisites

You need a Google Analytics (GA) account if you use GA as the data producer.

Google Analytics 4 (or GA4) has an API that provides access to page views, traffic sources, and other data points. With this API, you can build custom dashboards, automate reporting, and integrate with other applications. We focus only on accessing and exporting data to GlassFlow using Python. You can find more comprehensive information about how to set up the Google Cloud Project (GCP), enable the API, and configure authentication in the API quickstart, or follow this step-by-step guide.

  1. Enable the Google Analytics Data API for a new project or select an existing project.

  2. Go to https://console.cloud.google.com/apis/credentials. Click "Create credentials" and choose a "Service Account" option. Name the service user and click through the next steps.

  3. Once more go to https://console.cloud.google.com/apis/credentials and click on your newly created user (under Service Accounts). Go to "Keys", click "Add key" -> "Create new key" -> "JSON". A JSON file will be saved to your computer.

  4. Rename this JSON file to credentials.json and put it under use-cases/clickstream-analytics-dashboard. Then set the path to this file to the environment variable GOOGLE_APPLICATION_CREDENTIALS:

export GOOGLE_APPLICATION_CREDENTIALS=credentials.json
  5. Add a service account to the Google Analytics property. Using a text editor or VS Code, open the credentials.json file downloaded in the previous step and search for the client_email field to obtain the service account email address, which looks similar to:

Use this email address to add a user to the Google Analytics property you want to access via the Google Analytics Data API v1. For this tutorial, only Viewer permissions are needed.

  6. Copy the ID of the Google Analytics property you want to query and save it as the value of GA_PROPERTY_ID in a .env file in the project directory.

Design Streamlit dashboard

The Streamlit dashboard code in the consumer.py script visualizes the output from the GlassFlow transformation, which includes additional insights such as engagement score, device usage, content popularity, and geographic distribution.

The dashboard is updated in real-time with data being continuously consumed from the GlassFlow pipeline.
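Conceptually, each time a batch of transformed events arrives from the pipeline, the dashboard recomputes its aggregates before redrawing the Plotly charts. A framework-free sketch of that aggregation step follows; the function name and event field names are illustrative, not the exact code in consumer.py.

```python
from collections import Counter

def aggregate_events(events):
    """Summarize a batch of transformed events into the figures the
    dashboard plots: device-usage shares and per-page view counts."""
    # Proportion of events per device category (pie chart input).
    devices = Counter(e.get("device_category", "unknown") for e in events)
    total = sum(devices.values()) or 1  # avoid division by zero
    device_share = {d: n / total for d, n in devices.items()}

    # Total screen/page views per page path (bar chart input).
    page_views = Counter()
    for e in events:
        page_views[e.get("pagePath", "/")] += int(e.get("screenPageViews", 0))

    return device_share, page_views
```

Keeping the aggregation separate from the rendering makes it straightforward to feed the same summaries into different Plotly chart types.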

Send and consume data from the pipeline

Prerequisites

To complete this part you'll need the following:

  • Python installed on your machine.

  • The Pipeline ID and Access Token you copied in Step 7.

  • The credentials.json file and GA property ID from the Google Analytics setup (not needed if you generate mock events).

Installation

  1. Clone the glassflow-examples repository to your local machine:

    git clone https://github.com/glassflow/glassflow-examples.git
  2. Navigate to the project directory:

    cd use-cases/clickstream-analytics-dashboard
  3. Create a new virtual environment:

    python -m venv .venv && source .venv/bin/activate
  4. Install the required dependencies:

    pip install -r requirements.txt

Create an environment configuration file

Add a .env file in the project directory with the following configuration variables and their values:

GA_PROPERTY_ID=your_ga_property_id # Not needed if you are generating mock events.
PIPELINE_ID=your_pipeline_id
PIPELINE_ACCESS_TOKEN=your_pipeline_access_token

Replace your_pipeline_id and your_pipeline_access_token with the values obtained from your GlassFlow account.
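The scripts read these variables at startup. In practice the repository likely uses a library such as python-dotenv for this; the stdlib sketch below is an illustrative stand-in showing what that loading amounts to.

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines (ignoring blanks, comment lines,
    and inline '#' comments) into os.environ without overwriting
    variables that are already set."""
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()
            if "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

# Usage (assuming a .env file in the current directory):
# load_env()
# pipeline_id = os.environ["PIPELINE_ID"]
```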

Run data producer

Run either the ga_producer.py script (real Google Analytics data) or the fake_producer.py script (mock events) to start publishing data:

python ga_producer.py

or, for mock events:

python fake_producer.py
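While ga_producer.py pulls real GA4 report rows, fake_producer.py generates synthetic events so you can exercise the pipeline without a GA property. A rough sketch of what such a mock generator might produce; the field names mirror GA4 dimensions/metrics but are assumptions about the repository's actual schema.

```python
import random
import time

DEVICES = ["mobile", "desktop", "tablet"]
PAGES = ["/", "/pricing", "/blog", "/docs"]
COUNTRIES = ["US", "DE", "IN", "BR"]

def make_mock_event():
    """Build one synthetic clickstream event resembling a GA4 report row."""
    return {
        "deviceCategory": random.choice(DEVICES),
        "pagePath": random.choice(PAGES),
        "country": random.choice(COUNTRIES),
        "eventCount": random.randint(1, 20),
        "screenPageViews": random.randint(1, 10),
        "activeUsers": random.randint(1, 5),
        "timestamp": time.time(),
    }

# In the real script each event would then be published to the GlassFlow
# pipeline via the Python SDK (the exact publish call shape is an
# assumption here, not shown from the repository).
```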

Run the dashboard

Use Streamlit command to run the dashboard:

streamlit run consumer.py

You will see several charts in the dashboard updating in real time.

You learned how to integrate real-time analytics data from Google Analytics into GlassFlow for further processing and visualization. Analytics data can be also stored in a database like ClickHouse for future use.


© 2023 GlassFlow