
Use Cases

Real-world applications of clpipe for SQL pipeline development, testing, and governance.


Three Pillars of clpipe

clpipe provides three complementary capabilities that address different aspects of SQL pipeline management:

  • SQL Debugging & Lineage


    Understand your data flow, trace issues to source, and confidently make changes with complete lineage visibility.

    Learn more

  • Multi-Environment Execution


    Test locally with DuckDB, deploy to BigQuery in production—same SQL, different backends, zero vendor lock-in.

    Learn more

  • Data Catalog & AI


    Extract metadata from SQL comments, propagate PII tracking, and enable AI-powered data discovery.

    Learn more


Quick Comparison

Capability   What it solves                            Key benefit
Lineage      "Where does this data come from?"         Debug in seconds, not hours
Execution    "How do I test without touching prod?"    Local testing with DuckDB
Metadata     "What does this column mean?"             Auto-documented, AI-ready data

Example Pipeline

All use cases in this section share a single e-commerce pipeline example:

source_orders ─────┐
source_customers ──┼──> raw_* tables ──> stg_orders_enriched ──┬──> int_daily_metrics
source_products ───┤                                           ├──> mart_customer_ltv
source_order_items─┘                                           └──> mart_product_performance

Try the examples yourself:

cd clpipe

# Lineage analysis (no execution)
uv run python examples/sql_files/run_lineage.py

# Full execution with DuckDB
uv run python examples/sql_files/run_with_duckdb.py

# Metadata management
uv run python examples/sql_files/run_metadata.py

Use Case Details

SQL Debugging & Lineage

For data engineers who need to understand and debug complex SQL pipelines

Key capabilities:

  • Trace any column backward to its sources
  • Analyze impact of schema changes before making them
  • Understand query dependencies and execution order
  • Debug data quality issues by following the data path

Example:

# Where does this metric come from?
sources = pipeline.trace_column_backward("mart_customer_ltv", "lifetime_revenue")

# What breaks if I rename this column?
impacts = pipeline.trace_column_forward("raw_orders", "total_amount")
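Conceptually, backward tracing walks a column-level lineage graph from a target column to the source columns it derives from. A minimal, self-contained illustration of that idea (not clpipe's implementation; the edge data and names below are toy values modeled on the pipeline diagram above):

```python
# Column-level lineage edges: target column -> list of source columns.
# A toy slice of the e-commerce pipeline; names are illustrative.
EDGES = {
    ("mart_customer_ltv", "lifetime_revenue"): [
        ("stg_orders_enriched", "order_total"),
    ],
    ("stg_orders_enriched", "order_total"): [
        ("raw_orders", "total_amount"),
    ],
}

def trace_backward(table, column):
    """Walk lineage edges from a column back to its ultimate sources."""
    parents = EDGES.get((table, column), [])
    if not parents:
        # No parents recorded: this column is itself a source.
        return [(table, column)]
    sources = []
    for parent_table, parent_column in parents:
        sources.extend(trace_backward(parent_table, parent_column))
    return sources

# lifetime_revenue ultimately comes from raw_orders.total_amount
print(trace_backward("mart_customer_ltv", "lifetime_revenue"))
```

Forward tracing (impact analysis) is the same walk in the opposite direction: follow edges from a source column to everything derived from it.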

Full guide


Multi-Environment Execution

For teams who want fast local testing and production-ready deployments

Key capabilities:

  • Run the same SQL against DuckDB locally or BigQuery in production
  • Generate fake data for testing
  • Integrate with CI/CD pipelines
  • Prevent vendor lock-in with portable SQL

Example:

# Same pipeline, different executors.
# `conn` and `client` are assumed to be an open DuckDB connection
# and a BigQuery Client created elsewhere.
def duckdb_executor(sql):
    conn.execute(sql)

def bigquery_executor(sql):
    client.query(sql).result()

# Local testing
result = pipeline.run(executor=duckdb_executor, max_workers=1)

# Production deployment
result = pipeline.run(executor=bigquery_executor, max_workers=4)
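The executor contract shown above is just a callable that accepts one SQL string. As a runnable sketch of that contract using only the standard library (`sqlite3` stands in for a local backend here, since DuckDB may not be installed; the table and values are illustrative):

```python
import sqlite3

# In-memory database standing in for a local analytics backend.
conn = sqlite3.connect(":memory:")

def sqlite_executor(sql: str) -> None:
    """Executor contract: accept a single SQL statement and run it."""
    conn.execute(sql)
    conn.commit()

# The pipeline would invoke the executor once per statement, in
# dependency order; here we simulate two such calls by hand.
sqlite_executor("CREATE TABLE raw_orders (id INTEGER, total_amount REAL)")
sqlite_executor("INSERT INTO raw_orders VALUES (1, 19.99)")

rows = conn.execute("SELECT total_amount FROM raw_orders").fetchall()
# rows == [(19.99,)]
```

Because the backend is injected as a plain callable, swapping DuckDB for BigQuery (or anything else) changes one function, not the pipeline.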

Full guide


Data Catalog & AI

For data teams building self-documenting, AI-ready data platforms

Key capabilities:

  • Extract metadata from inline SQL comments
  • Automatically propagate PII flags through lineage
  • Query columns by owner, tag, or custom metadata
  • Enable text-to-SQL with graph-grounded context

Example:

# PII propagates through transformations
pipeline.propagate_all_metadata()
pii_columns = list(pipeline.get_pii_columns())

# Find columns by metadata
finance_columns = list(pipeline.get_columns_by_owner("finance"))
metric_columns = list(pipeline.get_columns_by_tag("metric"))
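PII propagation is the forward analogue of lineage tracing: a flag set on a source column follows the derivation edges downstream. A self-contained conceptual sketch (not clpipe's internals; the edges and column names are toy values):

```python
# Forward edges: source column -> columns derived from it.
DERIVED = {
    ("raw_customers", "email"): [
        ("stg_orders_enriched", "customer_email"),
    ],
    ("stg_orders_enriched", "customer_email"): [
        ("mart_customer_ltv", "customer_email"),
    ],
}

def propagate_pii(seed_columns):
    """Mark every column reachable from a PII-flagged source as PII."""
    pii = set(seed_columns)
    stack = list(seed_columns)
    while stack:
        col = stack.pop()
        for child in DERIVED.get(col, []):
            if child not in pii:
                pii.add(child)
                stack.append(child)
    return pii

# One flagged source column marks its entire downstream chain.
print(sorted(propagate_pii([("raw_customers", "email")])))
```

This is why flagging PII once, at the source, is enough: every mart column built on top of it inherits the flag automatically.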

Full guide


Next Steps