Installation
Prerequisites
- Python 3.9 or higher
- pip or uv
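The prerequisites above can be checked from Python itself. The following is a minimal sketch using only the standard library; the helper name check_prerequisites is hypothetical and not part of clpipe.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in the prerequisites


def check_prerequisites(version_info=sys.version_info):
    """Report whether the Python version is new enough and which installers are on PATH."""
    return {
        "python_ok": tuple(version_info[:2]) >= MIN_PYTHON,
        # shutil.which returns the executable's path, or None if it is not on PATH
        "pip": shutil.which("pip") or shutil.which("pip3"),
        "uv": shutil.which("uv"),
    }


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```

Only one of pip or uv needs to be present. A value of None for pip does not rule out that python -m pip still works, since this sketch only looks at PATH.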
Install from GitHub
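clpipe is currently available from GitHub. With pip, it can be installed directly from the repository:
pip install git+https://github.com/clpipe/clpipe.git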
Install from Source
For development or to use the latest features:
# Clone the repository
git clone https://github.com/clpipe/clpipe.git
cd clpipe
# Install in development mode
pip install -e .
# Or with LLM support
pip install -e ".[llm]"
Optional Dependencies
LLM Support
For AI-powered column descriptions and documentation, install clpipe with the llm extra (shown here for a source checkout):
pip install -e ".[llm]"
This includes:
- langchain - LLM framework
- langchain-openai - OpenAI integration
- langchain-anthropic - Anthropic/Claude integration
Usage:
from clpipe import Pipeline
from langchain_openai import ChatOpenAI
pipeline = Pipeline.from_sql_files("queries/", dialect="bigquery")
# Set LLM for documentation generation
pipeline.llm = ChatOpenAI(model="gpt-4")
# Generate descriptions for all columns
pipeline.generate_all_descriptions()
Airflow Integration
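For generating Airflow DAGs, install Airflow (published on PyPI as apache-airflow) in the same environment as clpipe:
pip install apache-airflow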
Note: Airflow is not included by default. Install it separately if you plan to use pipeline.to_airflow_dag().
Verify Installation
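One way to verify the installation is to check that the package imports and exposes the Pipeline class used in the usage example. This is a minimal sketch; the helper can_import is hypothetical, and the only clpipe name it relies on is Pipeline.

```python
import importlib


def can_import(module_name, attribute):
    """Return True if module_name imports and exposes the given attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attribute)


if __name__ == "__main__":
    if can_import("clpipe", "Pipeline"):
        print("clpipe is installed and Pipeline is importable")
    else:
        print("clpipe could not be imported; see Troubleshooting")
```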
Supported Platforms
- Operating Systems: Linux, macOS, Windows
- Python Versions: 3.9, 3.10, 3.11, 3.12
- Databases: BigQuery, Snowflake, PostgreSQL, DuckDB, Redshift, and more
Troubleshooting
Import Error: No module named 'clpipe'
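Make sure you've activated the virtual environment in which clpipe was installed. For an environment created in .venv (the directory name is an assumption; substitute your own):
source .venv/bin/activate  # Linux/macOS; on Windows: .venv\Scripts\activate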
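To confirm which environment the interpreter is actually using, the standard library exposes enough information. The sketch below assumes a venv or virtualenv environment (conda environments are not detected this way); the helper in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter runs inside a venv/virtualenv.

    Inside such an environment, sys.prefix points at the environment
    while sys.base_prefix points at the base Python installation.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("environment:", sys.prefix)
    print("virtual environment active:", in_virtualenv())
```

If the interpreter path is not the one inside your environment, the import error most likely comes from running a different Python than the one clpipe was installed into.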
LLM Dependencies Missing
If you see errors about missing langchain modules, reinstall with the llm extra (shown here for a source checkout):
pip install -e ".[llm]"
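To see which of the LLM packages are missing, you can probe for them without importing them. This is a minimal sketch; the module names are the import names of the three packages listed under LLM Support, and missing_modules is a hypothetical helper.

```python
import importlib.util

# Import names for langchain, langchain-openai and langchain-anthropic.
LLM_MODULES = ("langchain", "langchain_openai", "langchain_anthropic")


def missing_modules(names=LLM_MODULES):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All LLM dependencies are importable")
```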
Airflow Import Error
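If importing Airflow fails when you use the Airflow integration, install Airflow into the same environment as clpipe:
pip install apache-airflow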
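A common source of confusion is that Airflow's import name (airflow) differs from its PyPI distribution name (apache-airflow). The sketch below checks both without importing Airflow; airflow_status is a hypothetical helper, not a clpipe function.

```python
import importlib.util
from importlib import metadata


def airflow_status(module_name="airflow", dist_name="apache-airflow"):
    """Return (importable, version); version is None if no distribution metadata is found."""
    importable = importlib.util.find_spec(module_name) is not None
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return importable, version


if __name__ == "__main__":
    importable, version = airflow_status()
    print("airflow importable:", importable)
    print("apache-airflow version:", version)
```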
Next Steps
- Quick Start - Get up and running in 5 minutes
- Examples - See real-world use cases
- Concepts - Understand how it works