Getting Started
This guide walks you from a raw CSV file to a running pipeline in about 5 minutes.
Requirements
- Python 3.11 or newer
- uv — the package manager used by this project
Install
The filedge command is now available in the project's virtual environment:
To use it without uv run, activate the environment:
Step 1: Inspect your file
Start by pointing filedge inspect at your data file. It samples the first 1,000 rows and produces a ready-to-paste columns: block for pipeline.yaml.
The columns: block goes to stdout; a human-readable summary goes to stderr:
Columns: 4 High confidence: 3 Low confidence: 1 Ambiguous: 0
# Inferred from data.csv (1000 rows sampled)
columns:
  - source: order_id
    dest: order_id
    type: string
    required: true
  - source: amount
    dest: amount
    type: float
    required: true # ⚠ low confidence — 3 null values in sample
  - source: order_date
    dest: order_date
    type: date
    required: false
  - source: customer_name
    dest: customer_name
    type: string
    required: true
Review columns marked low confidence or ambiguous before using them in production. See the Inspect guide for details.
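To get a feel for what sample-based inference involves, here is a minimal Python sketch. It is not filedge's implementation and does no confidence scoring. It assumes ISO-formatted dates, treats empty strings as nulls, and marks a column required only when the sample contains no empty values. The names infer_type and infer_columns are hypothetical.

```python
import csv
import io
from datetime import date
from itertools import islice


def _parses(value: str, caster) -> bool:
    """Return True if caster(value) succeeds."""
    try:
        caster(value)
    except ValueError:
        return False
    return True


def infer_type(values: list[str]) -> str:
    """Try float, then ISO date, and fall back to string."""
    non_empty = [v for v in values if v.strip()]
    if non_empty and all(_parses(v, float) for v in non_empty):
        return "float"
    if non_empty and all(_parses(v, date.fromisoformat) for v in non_empty):
        return "date"
    return "string"


def infer_columns(csv_text: str, sample_rows: int = 1000) -> list[dict]:
    """Build one column entry per header from the first sample_rows rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    sample = list(islice(reader, sample_rows))
    columns = []
    for name in reader.fieldnames or []:
        values = [row.get(name) or "" for row in sample]
        columns.append({
            "source": name,
            "dest": name,
            "type": infer_type(values),
            # Assumption: required only if no sampled value is empty.
            "required": all(v.strip() for v in values),
        })
    return columns
```

Because only a sample is read, a column that looks clean in the first rows can still contain nulls or malformed values later in the file, which is why reviewing the inferred block matters.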
To write the output directly to a file:
Step 2: Complete the config
filedge inspect produces a columns: block. Wrap it in a full pipeline.yaml:
format: csv
dest_table: orders
write_mode: append
connector:
  type: sqlite
  url: sqlite:///orders.db
columns:
  - source: order_id
    dest: order_id
    type: string
    required: true
  - source: amount
    dest: amount
    type: float
    required: true
  - source: order_date
    dest: order_date
    type: date
    required: false
  - source: customer_name
    dest: customer_name
    type: string
    required: true
See the pipeline.yaml reference for every available option.
Step 3: Validate the config
Before writing any data, dry-run the file against your config:
Exit code 0 means clean; exit code 1 means failures were found:
Or with failures:
✗ row 42 amount cannot coerce 'n/a' to float
✗ row 87 amount cannot coerce '' to float (required)
2 failure(s) in 1000 rows checked
Fix the source data, or set required: false for columns that may legitimately be empty, until validation is clean. See the Validate guide for more options.
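The kind of check behind these messages can be sketched in a few lines of Python. This is an illustration under stated assumptions, not filedge's validator: row numbers are assumed to count data rows from 1 (header excluded), dates are assumed to be ISO-formatted, and check_rows is a hypothetical name.

```python
import csv
import io
from datetime import date

# Assumption: one caster per type name used in the columns: block.
CASTERS = {"string": str, "float": float, "date": date.fromisoformat}


def check_rows(csv_text: str, columns: list[dict]) -> list[str]:
    """Return one message per value that is missing or cannot be coerced."""
    failures = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for col in columns:
            raw = row.get(col["source"]) or ""
            if raw == "":
                # Empty values only fail when the column is required.
                if col["required"]:
                    failures.append(
                        f"row {row_number} {col['source']} "
                        f"cannot coerce '' to {col['type']} (required)"
                    )
                continue
            try:
                CASTERS[col["type"]](raw)
            except ValueError:
                failures.append(
                    f"row {row_number} {col['source']} "
                    f"cannot coerce {raw!r} to {col['type']}"
                )
    return failures
```

Note that in this sketch setting required to false silences only the empty-value failure; a value such as 'n/a' in a float column still has to be fixed in the source data.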
Step 4: Run the pipeline
Place your files in an incoming directory and run:
filedge run --dir ./incoming --config pipeline.yaml --audit-db-url sqlite:///filedge.db
# Committed: 1 Failed: 0 Skipped: 0 New: 1 Reclaimed: 0 Retried: 0
--audit-db-url can also be set via the FILEDGE_AUDIT_DB_URL environment variable.
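If you drive filedge from a script, you can gate the run on the validation exit code and pass the audit database through the environment. The sketch below uses only the commands and the variable shown in this guide; validate_then_run and its runner parameter are hypothetical, and how filedge run reports failure through its own exit code is not covered here, so the value is returned uninterpreted.

```python
import os
import subprocess


def validate_then_run(data_file, incoming_dir, config, audit_db_url,
                      runner=subprocess.run):
    """Run `filedge validate`; start `filedge run` only if it exits with 0."""
    validate = runner(["filedge", "validate", data_file, "--config", config])
    if validate.returncode != 0:
        # Exit code 1 means validation found failures: stop before writing data.
        return validate.returncode

    # Set the audit DB via the environment instead of --audit-db-url.
    env = {**os.environ, "FILEDGE_AUDIT_DB_URL": audit_db_url}
    run = runner(
        ["filedge", "run", "--dir", incoming_dir, "--config", config],
        env=env,
    )
    return run.returncode
```

The runner parameter exists only so the function can be exercised without filedge installed; in normal use you would leave it at its default.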
Check status any time:
filedge status --audit-db-url sqlite:///filedge.db
# PENDING: 0
# PROCESSING: 0
# COMMITTED: 1
# FAILED: 0
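For monitoring, the status counts can be turned into a dictionary. This sketch assumes the output consists of LABEL: COUNT lines exactly as shown above, which may not hold across versions; parse_status is a hypothetical helper, not part of filedge.

```python
def parse_status(output: str) -> dict[str, int]:
    """Parse 'LABEL: COUNT' lines into a dict, skipping anything else."""
    counts = {}
    for line in output.splitlines():
        # Tolerate a leading '# ' as used in the listing above.
        label, sep, value = line.lstrip("# ").partition(":")
        if sep and value.strip().isdigit():
            counts[label.strip()] = int(value)
    return counts
```

A scheduled job could then alert whenever the FAILED count in the result is greater than zero.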
Previewing rows
If validation reports a bad row, jump straight to it without opening the file in an editor:
See the Preview guide for details.
Parquet files
Filedge supports Parquet natively. Install the optional extra first:
Then use any read command as usual — the format is detected from the .parquet extension:
filedge inspect events.parquet
filedge preview events.parquet
filedge validate events.parquet --config pipeline.yaml
Next steps
- Preview guide — spot-check files and jump to specific rows
- Run guide — scheduling, retry behaviour, write modes
- Connectors — switch from SQLite to PostgreSQL or BigQuery
- Compact guide — merge small files before ingestion