pipeline.yaml reference
pipeline.yaml declares how a single ingestion pipeline behaves. One file per pipeline.
Minimal example
format: csv
dest_table: orders
connector:
type: sqlite
url: sqlite:///orders.db
columns:
- source: order_id
dest: order_id
type: string
required: true
- source: amount
dest: amount
type: float
required: true
Full example
format: csv
dest_table: orders
write_mode: append
retry_cap: 3
stale_timeout_minutes: 30
batch_size: 1000
connector:
type: postgres
url: postgresql://user:pass@host/dbname
columns:
- source: order_id
dest: order_id
type: string
required: true
- source: amount
dest: amount
type: float
required: true
- source: order_date
dest: order_date
type: date
required: false
- source: notes
dest: notes
type: string
required: false
Top-level fields
format
Required. The file format for files in the watched directory.
| Value | File types |
|---|---|
csv |
.csv |
ndjson |
.ndjson, .jsonl, .ndjson.gz |
dest_table
Required. The name of the destination table to write rows into.
write_mode
Optional. Default: append.
| Value | Behaviour | Idempotency |
|---|---|---|
append |
Rows added alongside prior records | Delete-where-hash then insert on retry |
truncate |
Table wiped then replaced with this file's rows | Inherently idempotent |
cdc |
Apply CDC Files as SCD Type 1 inserts, updates, and deletes | Re-applying the same File converges by business key |
When write_mode: cdc is used, a cdc: block is required. CDC support starts from complete Files in the Watched Directory; Filedge does not capture database logs or consume directly from queues.
write_mode: cdc
cdc:
keys: [customer_id]
operation_column: op
sequence_by: updated_at
operations:
insert: [c, insert]
update: [u, update]
delete: [d, delete]
cdc
Configures how Filedge applies a CDC File to the destination table.
| Field | Required | Meaning |
|---|---|---|
keys |
Yes | Source column names that identify the destination row |
operation_column |
Yes | Source column containing the change operation |
sequence_by |
Yes | Source column used to pick the latest change for a key within one File |
operations.insert |
Yes | Operation values treated as inserts |
operations.update |
Yes | Operation values treated as updates |
operations.delete |
Yes | Operation values treated as deletes |
keys and sequence_by must be declared in columns:. The operation column may be CDC metadata only; it does not need to be declared unless you also want to write it to the destination.
First-version CDC support is SCD Type 1 only. Inserts and updates replace the current row for the configured key. Deletes remove the current row for the key. SCD Type 2 history tables are out of scope.
retry_cap
Optional. Default: 3. Maximum number of attempts before a file enters terminal FAILED state. Set to 1 to disable automatic retry.
stale_timeout_minutes
Optional. Default: 30. How long a PROCESSING lock may be held before it's reclaimed as stale.
batch_size
Optional. Default: 1000. Number of rows per database batch during write_rows. Larger batches are more efficient but use more memory.
connector block
Declares the destination backend. See Connectors for full details on each type.
connector:
type: sqlite # sqlite | postgres | bigquery | databricks | duckdb
url: sqlite:///... # type-specific options follow
columns block
Declares the schema mapping between source file columns and destination table columns.
columns:
- source: <source_column_name> # name as it appears in the file
dest: <dest_column_name> # name in the destination table
type: <type> # see Column Types
required: true | false
source
The column name as it appears in the CSV header or NDJSON key.
dest
The column name in the destination table. May differ from source for renaming.
type
The target type for coercion. See Column Types.
required
Whether a missing or null value in this column should fail the row. When required: true, a null or missing value causes the file to fail (strict mode — the whole file is rejected, not just the row).
Column tolerance
Extra columns in the source file that are not declared in columns: are silently ignored. Only declared columns are written to the destination. This lets upstream systems add fields without breaking your pipeline.
Schema guard
On first run, the connector creates the destination table from the columns: block. On subsequent runs, if the live table schema doesn't match the config, the run fails loudly with a diff. No auto-migration — schema changes require manual action.