# Use Cases

Real-world applications of clpipe for SQL pipeline development, testing, and governance.
## Three Pillars of clpipe

clpipe provides three complementary capabilities that address different aspects of SQL pipeline management:
- **SQL Debugging & Lineage**: understand your data flow, trace issues to their source, and confidently make changes with complete lineage visibility.
- **Multi-Environment Execution**: test locally with DuckDB, deploy to BigQuery in production. Same SQL, different backends, zero vendor lock-in.
- **Data Catalog & AI**: extract metadata from SQL comments, propagate PII tracking, and enable AI-powered data discovery.
## Quick Comparison
| Capability | What it solves | Key benefit |
|---|---|---|
| Lineage | "Where does this data come from?" | Debug in seconds, not hours |
| Execution | "How do I test without touching prod?" | Local testing with DuckDB |
| Metadata | "What does this column mean?" | Auto-documented, AI-ready data |
## Example Pipeline

All use cases in this section use an e-commerce pipeline example:

```text
source_orders ─────┐
source_customers ──┼──> raw_* tables ──> stg_orders_enriched ──┬──> int_daily_metrics
source_products ───┤                                           ├──> mart_customer_ltv
source_order_items─┘                                           └──> mart_product_performance
```
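The diagram above is a plain dependency DAG, so a safe execution order always exists. As a minimal illustration of that idea (not clpipe's internals), the table-level dependencies can be written out by hand and ordered with Python's standard-library `graphlib`; clpipe derives this graph by parsing the SQL itself:

```python
from graphlib import TopologicalSorter

# Table-level dependencies from the diagram: table -> tables it reads from.
# (Hand-written for illustration; clpipe infers these edges from the SQL.)
deps = {
    "raw_orders": {"source_orders"},
    "raw_customers": {"source_customers"},
    "raw_products": {"source_products"},
    "raw_order_items": {"source_order_items"},
    "stg_orders_enriched": {
        "raw_orders", "raw_customers", "raw_products", "raw_order_items"
    },
    "int_daily_metrics": {"stg_orders_enriched"},
    "mart_customer_ltv": {"stg_orders_enriched"},
    "mart_product_performance": {"stg_orders_enriched"},
}

# static_order() yields tables so every table appears after its inputs.
order = list(TopologicalSorter(deps).static_order())
```

Any valid ordering places sources first, staging in the middle, and marts last, which is exactly what an execution engine needs to run the pipeline correctly.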
Try the examples yourself:

```bash
cd clpipe

# Lineage analysis (no execution)
uv run python examples/sql_files/run_lineage.py

# Full execution with DuckDB
uv run python examples/sql_files/run_with_duckdb.py

# Metadata management
uv run python examples/sql_files/run_metadata.py
```
## Use Case Details

### SQL Debugging & Lineage

*For data engineers who need to understand and debug complex SQL pipelines.*
Key capabilities:
- Trace any column backward to its sources
- Analyze impact of schema changes before making them
- Understand query dependencies and execution order
- Debug data quality issues by following the data path
Example:

```python
# Where does this metric come from?
sources = pipeline.trace_column_backward("mart_customer_ltv", "lifetime_revenue")

# What breaks if I rename this column?
impacts = pipeline.trace_column_forward("raw_orders", "total_amount")
```
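Conceptually, backward tracing is a graph walk over column-level lineage edges. The sketch below shows the idea with hypothetical edges for this pipeline (the `parents` mapping and `trace_backward` helper are illustrative, not clpipe's actual data structures):

```python
from collections import deque

# Column-level parent edges: (table, column) -> upstream (table, column) pairs.
# Hypothetical edges for the example pipeline; clpipe builds these from the SQL.
parents = {
    ("mart_customer_ltv", "lifetime_revenue"): {("stg_orders_enriched", "order_total")},
    ("stg_orders_enriched", "order_total"): {("raw_orders", "total_amount")},
    ("raw_orders", "total_amount"): {("source_orders", "total_amount")},
}

def trace_backward(table, column):
    """BFS from a column to every upstream column it is derived from."""
    seen, queue = set(), deque([(table, column)])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

upstream = trace_backward("mart_customer_ltv", "lifetime_revenue")
```

Forward tracing (impact analysis) is the same walk over the edges reversed, which is why renaming a raw column can be checked before it breaks downstream marts.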
### Multi-Environment Execution

*For teams who want fast local testing and production-ready deployments.*
Key capabilities:
- Run the same SQL against DuckDB locally or BigQuery in production
- Generate fake data for testing
- Integrate with CI/CD pipelines
- Prevent vendor lock-in with portable SQL
Example:

```python
# Same pipeline, different executors
def duckdb_executor(sql):
    conn.execute(sql)

def bigquery_executor(sql):
    client.query(sql).result()

# Local testing
result = pipeline.run(executor=duckdb_executor, max_workers=1)

# Production deployment
result = pipeline.run(executor=bigquery_executor, max_workers=4)
```
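The pattern above works because an executor is just a callable that receives SQL: swapping backends means swapping one function. A self-contained sketch of that contract, using a recording stub in place of a real database (the `run` helper, statement list, and recorder are illustrative, not clpipe API):

```python
# Statements a pipeline might emit, in dependency order (illustrative).
statements = [
    "CREATE TABLE raw_orders AS SELECT * FROM source_orders",
    "CREATE TABLE stg_orders_enriched AS SELECT * FROM raw_orders",
]

def run(statements, executor):
    """Feed each statement to whatever backend the executor wraps."""
    for sql in statements:
        executor(sql)

# In tests, a recording executor stands in for DuckDB or BigQuery:
executed = []
def recording_executor(sql):
    executed.append(sql)

run(statements, recording_executor)
```

The same design also makes CI/CD integration straightforward: the pipeline logic never imports a database driver, so the test suite can substitute any executor it likes.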
### Data Catalog & AI

*For data teams building self-documenting, AI-ready data platforms.*
Key capabilities:
- Extract metadata from inline SQL comments
- Automatically propagate PII flags through lineage
- Query columns by owner, tag, or custom metadata
- Enable text-to-SQL with graph-grounded context
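As a rough illustration of comment-driven metadata extraction, here is a sketch that assumes a hypothetical `-- meta: key=value` comment convention (clpipe's actual comment syntax may differ; the SQL snippet and parser are assumptions for this example):

```python
import re

# Hypothetical inline-comment convention; clpipe's real syntax may differ.
sql = """
SELECT
    customer_id,          -- meta: pii=true, owner=finance
    lifetime_revenue      -- meta: tag=metric, owner=finance
FROM stg_orders_enriched
"""

def extract_metadata(sql):
    """Map each commented column to its key/value metadata."""
    meta = {}
    pattern = re.compile(r"^\s*(\w+),?\s*-- meta: (.+)$")
    for line in sql.splitlines():
        m = pattern.match(line)
        if m:
            column, pairs = m.group(1), m.group(2)
            meta[column] = dict(pair.split("=") for pair in pairs.split(", "))
    return meta

meta = extract_metadata(sql)
```

Keeping metadata next to the SQL that defines a column means documentation and transformation logic can never drift apart, which is the premise behind the catalog features below.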
Example:

```python
# PII propagates through transformations
pipeline.propagate_all_metadata()
pii_columns = list(pipeline.get_pii_columns())

# Find columns by metadata
finance_columns = list(pipeline.get_columns_by_owner("finance"))
metric_columns = list(pipeline.get_columns_by_tag("metric"))
```
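The propagation step can be pictured as a fixed-point walk over downstream lineage edges: a column derived from a PII column is itself PII. A minimal sketch with hypothetical edges for this pipeline (not clpipe's implementation):

```python
# Downstream edges: column -> columns derived from it (hypothetical).
children = {
    ("raw_customers", "email"): {("stg_orders_enriched", "email")},
    ("stg_orders_enriched", "email"): {("mart_customer_ltv", "email")},
}

# Columns flagged PII at the source, e.g. via a SQL comment.
pii = {("raw_customers", "email")}

# Push the flag downstream until no new columns are added.
frontier = list(pii)
while frontier:
    node = frontier.pop()
    for child in children.get(node, ()):
        if child not in pii:
            pii.add(child)
            frontier.append(child)
```

Because the flag follows lineage rather than naming conventions, a renamed or aggregated column still inherits its PII status, which is what makes automated governance reports trustworthy.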
## Next Steps
- Installation - Get clpipe installed
- Quick Start - Build your first pipeline
- Concepts - Understand the fundamentals