# Use Cases

Real-world applications of clpipe for SQL pipeline development, testing, and governance.
## Three Pillars of clpipe

clpipe provides three complementary capabilities that address different aspects of SQL pipeline management:
- **SQL Debugging & Lineage**: understand your data flow, trace issues to their source, and confidently make changes with complete lineage visibility.
- **Multi-Environment Execution**: test locally with DuckDB, deploy to BigQuery in production. Same SQL, different backends, zero vendor lock-in.
- **Data Catalog & AI**: extract metadata from SQL comments, propagate PII tracking, and enable AI-powered data discovery.
## Quick Comparison
| Capability | What it solves | Key benefit |
|---|---|---|
| Lineage | "Where does this data come from?" | Debug in seconds, not hours |
| Execution | "How do I test without touching prod?" | Local testing with DuckDB |
| Metadata | "What does this column mean?" | Auto-documented, AI-ready data |
## Example Pipeline

All use cases in this section use an e-commerce pipeline example:

```text
source_orders ─────┐
source_customers ──┼──> raw_* tables ──> stg_orders_enriched ──┬──> int_daily_metrics
source_products ───┤                                           ├──> mart_customer_ltv
source_order_items─┘                                           └──> mart_product_performance
```
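The diagram above is a plain dependency DAG, so a safe execution order always exists. As a minimal illustration of that idea (not clpipe's internals), the table-level dependencies can be written out by hand and ordered with Python's standard-library `graphlib`; clpipe derives this graph by parsing the SQL itself:

```python
from graphlib import TopologicalSorter

# Table-level dependencies from the diagram: table -> tables it reads from.
# (Hand-written for illustration; clpipe infers these edges from the SQL.)
deps = {
    "raw_orders": {"source_orders"},
    "raw_customers": {"source_customers"},
    "raw_products": {"source_products"},
    "raw_order_items": {"source_order_items"},
    "stg_orders_enriched": {
        "raw_orders", "raw_customers", "raw_products", "raw_order_items"
    },
    "int_daily_metrics": {"stg_orders_enriched"},
    "mart_customer_ltv": {"stg_orders_enriched"},
    "mart_product_performance": {"stg_orders_enriched"},
}

# static_order() yields tables so every table appears after its inputs.
order = list(TopologicalSorter(deps).static_order())
```

Any valid ordering places sources first, staging in the middle, and marts last, which is exactly what an execution engine needs to run the pipeline correctly.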
Try the examples yourself:

```bash
cd clpipe

# Lineage analysis (no execution)
uv run python examples/sql_files/run_lineage.py

# Full execution with DuckDB
uv run python examples/sql_files/run_with_duckdb.py

# Metadata management
uv run python examples/sql_files/run_metadata.py
```
## Use Case Details

### SQL Debugging & Lineage

*For data engineers who need to understand and debug complex SQL pipelines.*
Key capabilities:
- Trace any column backward to its sources
- Analyze impact of schema changes before making them
- Understand query dependencies and execution order
- Debug data quality issues by following the data path
Example:

```python
# Where does this metric come from?
sources = pipeline.trace_column_backward("mart_customer_ltv", "lifetime_revenue")

# What breaks if I rename this column?
impacts = pipeline.trace_column_forward("raw_orders", "total_amount")
```
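Conceptually, backward tracing is a graph walk over column-level lineage edges. The sketch below shows the idea with hypothetical edges for this pipeline (the `parents` mapping and `trace_backward` helper are illustrative, not clpipe's actual data structures):

```python
from collections import deque

# Column-level parent edges: (table, column) -> upstream (table, column) pairs.
# Hypothetical edges for the example pipeline; clpipe builds these from the SQL.
parents = {
    ("mart_customer_ltv", "lifetime_revenue"): {("stg_orders_enriched", "order_total")},
    ("stg_orders_enriched", "order_total"): {("raw_orders", "total_amount")},
    ("raw_orders", "total_amount"): {("source_orders", "total_amount")},
}

def trace_backward(table, column):
    """BFS from a column to every upstream column it is derived from."""
    seen, queue = set(), deque([(table, column)])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

upstream = trace_backward("mart_customer_ltv", "lifetime_revenue")
```

Forward tracing (impact analysis) is the same walk over the edges reversed, which is why renaming a raw column can be checked before it breaks downstream marts.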
### Multi-Environment Execution

*For teams who want fast local testing and production-ready deployments.*
Key capabilities:
- Run the same SQL against DuckDB locally or BigQuery in production
- Generate fake data for testing
- Integrate with CI/CD pipelines
- Prevent vendor lock-in with portable SQL
Example:

```python
# Same pipeline, different executors
def duckdb_executor(sql):
    conn.execute(sql)

def bigquery_executor(sql):
    client.query(sql).result()

# Local testing
result = pipeline.run(executor=duckdb_executor, max_workers=1)

# Production deployment
result = pipeline.run(executor=bigquery_executor, max_workers=4)
```
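The pattern above works because an executor is just a callable that receives SQL: swapping backends means swapping one function. A self-contained sketch of that contract, using a recording stub in place of a real database (the `run` helper, statement list, and recorder are illustrative, not clpipe API):

```python
# Statements a pipeline might emit, in dependency order (illustrative).
statements = [
    "CREATE TABLE raw_orders AS SELECT * FROM source_orders",
    "CREATE TABLE stg_orders_enriched AS SELECT * FROM raw_orders",
]

def run(statements, executor):
    """Feed each statement to whatever backend the executor wraps."""
    for sql in statements:
        executor(sql)

# In tests, a recording executor stands in for DuckDB or BigQuery:
executed = []
def recording_executor(sql):
    executed.append(sql)

run(statements, recording_executor)
```

The same design also makes CI/CD integration straightforward: the pipeline logic never imports a database driver, so the test suite can substitute any executor it likes.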
### Data Catalog & AI

*For data teams building self-documenting, AI-ready data platforms.*
Key capabilities:
- Extract metadata from inline SQL comments
- Automatically propagate PII flags through lineage
- Query columns by owner, tag, or custom metadata
- Enable text-to-SQL with graph-grounded context
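As a rough illustration of comment-driven metadata extraction, here is a sketch that assumes a hypothetical `-- meta: key=value` comment convention (clpipe's actual comment syntax may differ; the SQL snippet and parser are assumptions for this example):

```python
import re

# Hypothetical inline-comment convention; clpipe's real syntax may differ.
sql = """
SELECT
    customer_id,          -- meta: pii=true, owner=finance
    lifetime_revenue      -- meta: tag=metric, owner=finance
FROM stg_orders_enriched
"""

def extract_metadata(sql):
    """Map each commented column to its key/value metadata."""
    meta = {}
    pattern = re.compile(r"^\s*(\w+),?\s*-- meta: (.+)$")
    for line in sql.splitlines():
        m = pattern.match(line)
        if m:
            column, pairs = m.group(1), m.group(2)
            meta[column] = dict(pair.split("=") for pair in pairs.split(", "))
    return meta

meta = extract_metadata(sql)
```

Keeping metadata next to the SQL that defines a column means documentation and transformation logic can never drift apart, which is the premise behind the catalog features below.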
Example:

```python
# PII propagates through transformations
pipeline.propagate_all_metadata()
pii_columns = list(pipeline.get_pii_columns())

# Find columns by metadata
finance_columns = list(pipeline.get_columns_by_owner("finance"))
metric_columns = list(pipeline.get_columns_by_tag("metric"))
```
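The propagation step can be pictured as a fixed-point walk over downstream lineage edges: a column derived from a PII column is itself PII. A minimal sketch with hypothetical edges for this pipeline (not clpipe's implementation):

```python
# Downstream edges: column -> columns derived from it (hypothetical).
children = {
    ("raw_customers", "email"): {("stg_orders_enriched", "email")},
    ("stg_orders_enriched", "email"): {("mart_customer_ltv", "email")},
}

# Columns flagged PII at the source, e.g. via a SQL comment.
pii = {("raw_customers", "email")}

# Push the flag downstream until no new columns are added.
frontier = list(pii)
while frontier:
    node = frontier.pop()
    for child in children.get(node, ()):
        if child not in pii:
            pii.add(child)
            frontier.append(child)
```

Because the flag follows lineage rather than naming conventions, a renamed or aggregated column still inherits its PII status, which is what makes automated governance reports trustworthy.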
## Next Steps
- Installation - Get clpipe installed
- Quick Start - Build your first pipeline
- Concepts - Understand the fundamentals