
Release Notes v2.5.x

The 2.5.x series represents a major architectural upgrade to GlassFlow, introducing PostgreSQL-based metadata storage, high-performance file-based deduplication, and a completely redesigned user interface. This release series significantly improves scalability, performance, and user experience while maintaining the core functionality that makes GlassFlow powerful.

Version History

  • v2.5.1 - Patch release with bug fixes and improvements
  • v2.5.0 - Major architectural upgrade with PostgreSQL metadata storage

🆕 What’s New in v2.5.1

Bug Fixes and Improvements

  • Fixed a data persistence issue in the PostgreSQL chart
  • Fixed UI handling of updated pipeline configurations
  • Fixed migration-script issues when restarting deduplication-enabled pipelines

🚀 Migration Required

Important: Upgrading to v2.5.x requires following specific migration steps due to architectural changes.

👉 Complete Migration Guide - Step-by-step instructions for upgrading from v2.4.x

What’s New in v2.5.x

🗄️ PostgreSQL Metadata Storage

GlassFlow now uses PostgreSQL as the primary metadata store, replacing NATS KV for pipeline storage:

  • Robust metadata management - Pipeline configurations, schemas, and state are now stored in PostgreSQL for better reliability and consistency
  • Integrated PostgreSQL instance - A PostgreSQL instance is included in the Helm installation and fully configurable via Helm values
  • Automatic migration - Existing pipeline data is automatically migrated from NATS KV to PostgreSQL during upgrade
  • Enhanced data integrity - Better transaction support and data consistency for pipeline metadata
  • Scalable storage - PostgreSQL provides better scalability for large numbers of pipelines and complex metadata

This change provides a more robust foundation for pipeline management and enables future enhancements like advanced querying and reporting capabilities.

🔄 High-Performance File-Based Deduplication

The deduplication service has been completely redesigned for better performance and longer time windows:

  • Persistent file storage - Deduplication now uses BadgerDB for persistent state instead of memory-only storage
  • Extended time windows - Support for much longer deduplication windows (up to 7 days) with minimal infrastructure footprint
  • Dedicated deduplication component - New separate component optimized specifically for deduplication operations
  • Lower memory usage - Significantly reduced memory requirements while maintaining high performance

For end users: The behavior remains exactly the same - you configure deduplication the same way, but now it’s more efficient and supports longer time windows.
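As a sketch, a per-topic deduplication block with a 7-day window might look like the following. The field names here are illustrative assumptions, not the confirmed schema; check the pipeline configuration reference for the exact keys your version accepts:

```json
{
  "source": {
    "topics": [
      {
        "name": "topic1",
        "deduplication": {
          "enabled": true,
          "id_field": "event_id",
          "id_field_type": "string",
          "time_window": "168h"
        }
      }
    ]
  }
}
```

With file-based storage, a window like `168h` (7 days) no longer requires holding all seen keys in memory.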

📋 Restructured Pipeline Configuration Schema

The pipeline.json structure has been significantly reorganized for better clarity and maintainability:

Key Structural Changes

Schema Consolidation: The schema definition has been moved from being scattered across source topics and sink table mappings to a unified top-level schema section.

Before (v2.4.x):

```json
{
  "source": {
    "topics": [
      {
        "schema": {
          "fields": [
            { "name": "event_id", "type": "string" },
            { "name": "user_id", "type": "string" }
          ]
        }
      }
    ]
  },
  "sink": {
    "table_mapping": [
      {
        "source_id": "topic1",
        "field_name": "event_id",
        "column_name": "event_id",
        "column_type": "UUID"
      }
    ]
  }
}
```

After (v2.5.0):

```json
{
  "source": {
    "topics": [
      { "name": "topic1", "deduplication": {...} }
    ]
  },
  "sink": {
    "type": "clickhouse",
    "host": "...",
    "max_batch_size": 100
  },
  "schema": {
    "fields": [
      {
        "source_id": "topic1",
        "name": "event_id",
        "type": "string",
        "column_name": "event_id",
        "column_type": "UUID"
      }
    ]
  }
}
```

Benefits of New Structure

  • Centralized schema management - All field definitions and mappings in one place
  • Cleaner separation of concerns - Source, sink, and schema configurations are clearly separated
  • Simplified field mapping - Each field contains both source and destination information
  • Better schema validation - Enhanced validation with clearer error messages
  • Improved maintainability - Easier to understand and modify pipeline configurations

🎨 Redesigned User Interface

A completely new user experience has been introduced:

  • Simplified pipeline creation - More intuitive wizard-based pipeline creation process
  • Enhanced pipeline management - Better overview and management of existing pipelines
  • Improved navigation - Streamlined navigation and more logical information architecture
  • Context-aware actions - Actions and options are now more contextually relevant
  • Enhanced error handling - Better error messages and user feedback throughout the interface

The new UI makes it easier for both new and experienced users to create, manage, and monitor their data pipelines.

⚙️ Kubernetes Configuration Management

Enhanced configuration management capabilities:

  • ConfigMap-based configuration - Cluster configuration can now be updated directly via Kubernetes ConfigMaps
  • Dynamic configuration updates - Configuration changes can be applied without full redeployment
  • Better Helm integration - Improved Helm values structure for easier configuration management
  • Environment-specific configs - Better support for different environment configurations
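As a minimal sketch of the ConfigMap-based approach, a cluster setting could be updated by applying a manifest like the one below. The ConfigMap name and data keys are assumptions for illustration; check `kubectl get configmaps -n glassflow` and the chart README for the names your release actually uses:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: glassflow-etl-config   # illustrative name -- match your release
  namespace: glassflow
data:
  log-level: "info"            # illustrative key
```

Applying it with `kubectl apply -f configmap.yaml` updates the configuration without a full redeployment, per the dynamic-update support described above.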

Migration Guide

⚠️ Important: This release requires migration from v2.4.x. The migration is automated but requires following specific steps.

Prerequisites

  • GlassFlow v2.4.x currently installed
  • Helm 3.x
  • Kubernetes cluster with sufficient resources for PostgreSQL

Migration Steps

  1. Stop all running pipelines (via UI or API):

    ```shell
    # Via API (stop each pipeline individually)
    curl -X POST http://your-glassflow-api/api/v1/pipeline/{pipeline-id}/stop
    ```
  2. Update Helm repository:

    ```shell
    helm repo update
    ```
  3. Verify new chart version:

    ```shell
    helm show chart glassflow/glassflow-etl
    # Should show version: 0.4.1, appVersion: 2.5.1
    ```
  4. Upgrade to v2.5.x:

    ```shell
    helm upgrade your-release-name glassflow/glassflow-etl \
      --namespace glassflow \
      --version 0.4.1 \
      --wait --timeout 600s
    ```
  5. Verify migration completion:

    ```shell
    # Check migration job completion
    kubectl get jobs -n glassflow | grep migration
    # Check migration logs
    kubectl logs -n glassflow job/your-release-name-glassflow-etl-migration
    ```
  6. Resume pipelines (via UI or API): Resume all pipelines that you stopped in step 1. They will resume with the migrated configurations.
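If you have many pipelines, step 1 can be scripted. This sketch builds on the stop endpoint shown above; the list endpoint and the `.id` field it extracts with `jq` are assumptions, so verify them against your GlassFlow API documentation before relying on this:

```shell
# Stop every pipeline before upgrading (endpoints partly assumed -- verify).
API="http://your-glassflow-api/api/v1"

stop_url() {
  # Build the stop endpoint for one pipeline id
  echo "${API}/pipeline/$1/stop"
}

# List pipeline ids (assumed endpoint), then stop each one
for id in $(curl -s "${API}/pipeline" 2>/dev/null | jq -r '.[].id' 2>/dev/null); do
  curl -X POST "$(stop_url "$id")"
done
```

The same loop shape works for resuming in step 6 if your API exposes a matching resume endpoint.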

What Happens During Migration

The migration process automatically:

  • Creates the PostgreSQL database - Configures PostgreSQL for metadata storage
  • Runs database migrations - Sets up the required database schema
  • Migrates pipeline data - Transfers all pipeline configurations from NATS KV to PostgreSQL
  • Preserves pipeline IDs - Maintains existing pipeline identifiers for continuity
  • Migrates schemas - Transfers all pipeline schemas and configurations
  • Updates internal references - Updates all internal references to use PostgreSQL

Migration Logs Example

```text
INFO: Starting data migration from NATS KV to PostgreSQL
INFO: Found pipelines in NATS KV store count=5
INFO: Migrating pipeline pipeline_id=demo-pipeline-1 name=demo-dedup
INFO: Pipeline migrated successfully pipeline_id=demo-pipeline-1
INFO: Data migration completed migrated=5 skipped=0 errors=0
INFO: Migration job completed successfully
```

Rollback Considerations

⚠️ Important: Rolling back from v2.5.x to v2.4.x is not supported due to the architectural changes. Before upgrading, ensure you have:

  • Backed up NATS KV data (if needed for disaster recovery)
  • Exported pipeline configurations (via API or SDK)
  • Tested the migration in a non-production environment
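As a sketch of the configuration-export step, each pipeline's configuration can be saved to a local JSON file via the API. The GET endpoint shape and the example pipeline ids are assumptions; verify them against your GlassFlow API documentation:

```shell
# Export each pipeline's configuration to a local backup directory
# (endpoint shape and ids are illustrative -- verify before use).
API="http://your-glassflow-api/api/v1"
mkdir -p pipeline-backups
for id in demo-pipeline-1 demo-pipeline-2; do  # substitute your pipeline ids
  curl -s "${API}/pipeline/${id}" -o "pipeline-backups/${id}.json" 2>/dev/null || true
done
```

Keep these exports alongside the NATS KV backup so pipelines can be recreated by hand if a disaster-recovery reinstall of v2.4.x ever becomes necessary.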

Configuration Changes

New Helm Values

For a complete list of all available Helm values and configuration options, see the GlassFlow ETL Helm Chart README.

Key new values in v2.5.x:

```yaml
# PostgreSQL configuration
postgresql:
  enabled: true
  auth:
    database: "glassflow"
    username: "glassflow"
    password: "your-secure-password"
```

Performance Improvements

  • Faster pipeline operations - PostgreSQL-based storage provides faster pipeline CRUD operations
  • Improved deduplication performance - File-based deduplication is more efficient for large datasets
  • Better resource utilization - Optimized memory usage across all components
  • Enhanced throughput - Better handling of high-volume data streams

Breaking Changes

For Existing Users

  • Migration required - Automatic migration from NATS KV to PostgreSQL
  • New UI - Completely redesigned user interface (functionality remains the same)
  • Configuration changes - New Helm values structure for PostgreSQL and deduplication

For Developers

  • Schema format updates - Internal pipeline schema format has been enhanced
  • Database integration - Applications integrating with GlassFlow metadata should use the API
  • Configuration management - Use ConfigMaps for cluster configuration updates

Try It Out

To experience the new features in v2.5.x:

  1. Follow the migration guide to upgrade from v2.4.x, or install the latest version (v2.5.1) on a fresh cluster
  2. Explore the new UI - Create and manage pipelines with the redesigned interface
  3. Test longer deduplication windows - Configure deduplication for extended time periods
  4. Use ConfigMap configuration - Update cluster settings via Kubernetes ConfigMaps
  5. Monitor PostgreSQL - Use standard PostgreSQL monitoring tools for metadata storage

CLI Support

GlassFlow v2.5.x includes CLI support for local development:

```shell
# Install/upgrade CLI
brew upgrade glassflow
# Start local development environment with new architecture
glassflow up --demo
```

See CLI Installation Guide for more details.

Full Changelog

For a complete list of all changes, improvements, and bug fixes in the v2.5.x series, see our GitHub releases page.

GlassFlow v2.5.x represents a major step forward in making streaming ETL more scalable, performant, and user-friendly for enterprise production environments.
