
Getting Started

Welcome to clpipe! Get up and running in minutes.


What You'll Learn

This guide will help you:

  1. Install clpipe - Set up in your environment
  2. Parse SQL - Build your first lineage graph
  3. Explore lineage - Understand table and column dependencies
  4. Execute pipelines - Run SQL or generate Airflow DAGs
  5. Use advanced features - Metadata propagation, LLM documentation, pipeline splitting

Installation

Install from GitHub repository:

pip install git+https://github.com/clpipe/clpipe.git

Takes 2 minutes. Full guide →


Quick Start

5-minute tutorial to build your first pipeline:

from clpipe import Pipeline

# Parse SQL
pipeline = Pipeline.from_sql_files("queries/", dialect="bigquery")

# Explore lineage
tables = pipeline.table_graph.tables
sources = pipeline.trace_column_backward("table", "column")

# Execute
results = pipeline.run(executor=my_executor)

Full tutorial →


Examples

Real-world use cases:

  • PII Compliance Audit - Find all sensitive data
  • Impact Analysis - Know what breaks before making changes
  • Multi-Schedule Pipelines - Different frequencies for different tables
  • LLM Documentation - Auto-generate descriptions
  • Root Cause Analysis - Trace data issues back to source

See all examples →
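To build intuition for what impact analysis involves, here is a minimal, library-free sketch (not clpipe's API; the table names and edges are invented) that walks a toy table-dependency graph forward to find everything downstream of a changed table:

```python
from collections import deque

# Toy table-dependency graph: table -> tables that read from it.
# These edges are invented for illustration only.
DOWNSTREAM = {
    "raw.orders": ["staging.orders_clean"],
    "staging.orders_clean": ["analytics.customer_metrics", "analytics.revenue"],
    "analytics.customer_metrics": ["reporting.dashboard"],
}

def impacted_tables(changed: str) -> set[str]:
    """Breadth-first walk to collect every table downstream of `changed`."""
    seen, queue = set(), deque([changed])
    while queue:
        for nxt in DOWNSTREAM.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(impacted_tables("raw.orders")))
# ['analytics.customer_metrics', 'analytics.revenue',
#  'reporting.dashboard', 'staging.orders_clean']
```

clpipe builds this graph for you by parsing your SQL; the sketch only shows the traversal idea behind "know what breaks before making changes."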


Learning Path

For New Users

  1. Install - Get clpipe set up
  2. Quick Start - Build your first pipeline
  3. Concepts: From SQL to Lineage Graph - Understand how it works

For Existing Projects

  1. Install - Add to your project
  2. Examples - Find your use case
  3. Concepts: Table Lineage & Orchestration - Learn execution patterns

For Production Deployment

  1. Quick Start: Generate Airflow DAG - Create production DAG
  2. Examples: Multi-Schedule Pipeline - Split by frequency
  3. Examples: Team-Based Split - Organize by ownership

Key Features

Built-In Column Lineage

# Trace any column to its source
sources = pipeline.trace_column_backward(
    "analytics.customer_metrics",
    "avg_order_value"
)

# Output: Complete path from raw data to final metric

No configuration required. Column lineage is built automatically when you parse SQL.
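Conceptually, backward tracing is a reverse walk over column-level edges. A minimal stand-in (not clpipe itself; the lineage edges below are invented for illustration) looks like:

```python
# Column-level edges: column -> columns it is derived from.
# Hypothetical lineage for illustration; clpipe extracts this from your SQL.
PARENTS = {
    "analytics.customer_metrics.avg_order_value": [
        "staging.orders.order_total",
        "staging.orders.customer_id",
    ],
    "staging.orders.order_total": ["raw.orders.amount"],
    "staging.orders.customer_id": ["raw.orders.customer_id"],
}

def trace_backward(column: str) -> list[str]:
    """Return every upstream column contributing to `column`, depth-first."""
    sources = []
    for parent in PARENTS.get(column, []):
        sources.append(parent)
        sources.extend(trace_backward(parent))
    return sources

print(trace_backward("analytics.customer_metrics.avg_order_value"))
# ['staging.orders.order_total', 'raw.orders.amount',
#  'staging.orders.customer_id', 'raw.orders.customer_id']
```

The real `trace_column_backward` call does this over the graph parsed from your SQL, with no edges to maintain by hand.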

Automatic Metadata Propagation

# Set PII once at source
pipeline.columns["raw.users.email"].pii = True

# Propagates through entire pipeline
pipeline.propagate_all_metadata()

# Query anywhere
pii_columns = pipeline.get_pii_columns()

Tags, ownership, and PII markers flow automatically through joins and transformations.
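The propagation idea itself is simple: a flag set on a source column flows to every column derived from it. A minimal, library-free sketch (invented edges, not clpipe's implementation):

```python
# Derivation edges: column -> columns computed from it (invented example).
CHILDREN = {
    "raw.users.email": ["staging.users.email_clean"],
    "staging.users.email_clean": ["analytics.user_profile.contact_email"],
}

def propagate_pii(source: str) -> set[str]:
    """Mark the source column and everything derived from it as PII."""
    pii = {source}
    stack = [source]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            if child not in pii:
                pii.add(child)
                stack.append(child)
    return pii

print(sorted(propagate_pii("raw.users.email")))
# ['analytics.user_profile.contact_email', 'raw.users.email',
#  'staging.users.email_clean']
```

`propagate_all_metadata()` applies this kind of walk across every tagged column at once, using the lineage graph parsed from your SQL.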

Multiple Execution Modes

# Local execution
results = pipeline.run(executor=my_executor, max_workers=4)

# Async execution
results = await pipeline.async_run(executor=my_async_executor)

# Airflow DAG
dag = pipeline.to_airflow_dag(executor=my_executor, dag_id="pipeline")

Write once, deploy anywhere.

No Vendor Lock-In

# Your lineage lives in your code
lineage_json = pipeline.to_json()

# Export to any format
lineage_df = pipeline.to_dataframe()

# Integrate with any tool

You own the graph. Not locked into any platform.


Common Questions

Do I need to change my SQL?

No. clpipe works with your existing SQL files. No annotations, no special syntax.

What databases are supported?

BigQuery, Snowflake, PostgreSQL, DuckDB, Redshift, and many more.

Can I use it with Airflow?

Yes. Generate Airflow DAGs automatically with pipeline.to_airflow_dag().

Does it work with large pipelines?

Yes. Tested on 1,000+ queries and 10,000+ columns, with parse times under 5 seconds.

Is it open source?

Yes. MIT license. View on GitHub.



Next Steps

Ready to dive in?

Or explore the fundamentals: