pipeline.yaml reference

pipeline.yaml declares how a single ingestion pipeline behaves. One file per pipeline.

Minimal example

format: csv
dest_table: orders

connector:
  type: sqlite
  url: sqlite:///orders.db

columns:
  - source: order_id
    dest: order_id
    type: string
    required: true
  - source: amount
    dest: amount
    type: float
    required: true

Full example

format: csv
dest_table: orders
write_mode: append
retry_cap: 3
stale_timeout_minutes: 30
batch_size: 1000

connector:
  type: postgres
  url: postgresql://user:pass@host/dbname

columns:
  - source: order_id
    dest: order_id
    type: string
    required: true
  - source: amount
    dest: amount
    type: float
    required: true
  - source: order_date
    dest: order_date
    type: date
    required: false
  - source: notes
    dest: notes
    type: string
    required: false

Top-level fields

`format`

Required. The format of the files in the watched directory.

Value	File types
`csv`	`.csv`
`ndjson`	`.ndjson`, `.jsonl`, `.ndjson.gz`
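
As an illustration, a pipeline that reads newline-delimited JSON might look like the sketch below. The table name, database file, and column name are hypothetical; for NDJSON, `source` refers to a JSON key rather than a CSV header.

```yaml
# Hypothetical NDJSON pipeline; all names are illustrative only.
format: ndjson
dest_table: events

connector:
  type: sqlite
  url: sqlite:///events.db

columns:
  - source: event_id      # JSON key present in each line of the file
    dest: event_id
    type: string
    required: true
```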

`dest_table`

Required. The name of the destination table to write rows into.

`write_mode`

Optional. Default: `append`.

Value	Behaviour	Idempotency
`append`	Rows added alongside prior records	Delete-where-hash then insert on retry
`truncate`	Table wiped then replaced with this file's rows	Inherently idempotent
`cdc`	Apply CDC Files as SCD Type 1 inserts, updates, and deletes	Re-applying the same File converges by business key
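
For example, a pipeline whose files each carry a full snapshot could use `truncate`, so that re-running a file simply rebuilds the table. This is a sketch only; the table name is hypothetical.

```yaml
# Hypothetical snapshot-style pipeline: each file replaces the table contents.
# connector: and columns: are omitted here for brevity.
format: csv
dest_table: product_catalog
write_mode: truncate
```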

When `write_mode: cdc` is used, a `cdc:` block is required. CDC support starts from complete Files in the Watched Directory; Filedge does not capture database logs or consume directly from queues.

write_mode: cdc

cdc:
  keys: [customer_id]
  operation_column: op
  sequence_by: updated_at
  operations:
    insert: [c, insert]
    update: [u, update]
    delete: [d, delete]

`cdc`

Configures how Filedge applies a CDC File to the destination table.

Field	Required	Meaning
`keys`	Yes	Source column names that identify the destination row
`operation_column`	Yes	Source column containing the change operation
`sequence_by`	Yes	Source column used to pick the latest change for a key within one File
`operations.insert`	Yes	Operation values treated as inserts
`operations.update`	Yes	Operation values treated as updates
`operations.delete`	Yes	Operation values treated as deletes

The columns named in `keys` and `sequence_by` must be declared in `columns:`. The operation column may be CDC metadata only; it does not need to be declared unless you also want to write it to the destination.

First-version CDC support is SCD Type 1 only. Inserts and updates replace the current row for the configured key. Deletes remove the current row for the key. SCD Type 2 history tables are out of scope.
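
Putting these rules together, a complete CDC pipeline might look like the following sketch. The destination table, connector URL, and the `email` column are hypothetical, and `date` is used for `updated_at` only because it is one of the types shown in this reference; check Column Types for the best fit.

```yaml
# Sketch of a complete CDC pipeline; names are illustrative only.
format: csv
dest_table: customers
write_mode: cdc

cdc:
  keys: [customer_id]
  operation_column: op        # not declared in columns:, so it is not written
  sequence_by: updated_at
  operations:
    insert: [c, insert]
    update: [u, update]
    delete: [d, delete]

connector:
  type: postgres
  url: postgresql://user:pass@host/dbname

columns:
  - source: customer_id       # declared because it is listed in keys
    dest: customer_id
    type: string
    required: true
  - source: updated_at        # declared because sequence_by refers to it
    dest: updated_at
    type: date                # assumption: see Column Types
    required: true
  - source: email
    dest: email
    type: string
    required: false
```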

`retry_cap`

Optional. Default: 3. Maximum number of attempts before a file enters the terminal FAILED state. Set to 1 to disable automatic retry.

`stale_timeout_minutes`

Optional. Default: 30. How long a PROCESSING lock may be held before it's reclaimed as stale.

`batch_size`

Optional. Default: 1000. Number of rows per database batch during write_rows. Larger batches are more efficient but use more memory.
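
The three tuning fields can be set together at the top level. The values below are illustrative assumptions for a pipeline that handles large files, not recommendations.

```yaml
# Illustrative tuning; values are assumptions, not recommendations.
retry_cap: 1                  # a single attempt, so no automatic retry
stale_timeout_minutes: 120    # a PROCESSING lock is reclaimed as stale after 2 hours
batch_size: 5000              # fewer, larger batches at the cost of more memory
```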

`connector` block

Declares the destination backend. See Connectors for full details on each type.

connector:
  type: sqlite          # sqlite | postgres | bigquery | databricks | duckdb
  url: sqlite:///...    # type-specific options follow

`columns` block

Declares the schema mapping between source file columns and destination table columns.

columns:
  - source: <source_column_name>   # name as it appears in the file
    dest: <dest_column_name>       # name in the destination table
    type: <type>                   # see Column Types
    required: true | false

`source`

The column name as it appears in the CSV header or NDJSON key.

`dest`

The column name in the destination table. May differ from source for renaming.
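
A rename could be sketched as follows. The `OrderId` header is a hypothetical source name.

```yaml
# Hypothetical rename: the file's OrderId column is written as order_id.
columns:
  - source: OrderId
    dest: order_id
    type: string
    required: true
```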

`type`

The target type for coercion. See Column Types.

`required`

Whether a missing or null value in this column is treated as an error. When `required: true`, a null or missing value fails the whole file, not just the affected row (strict mode).

Column tolerance

Columns in the source file that are not declared in `columns:` are silently ignored. Only declared columns are written to the destination. This lets upstream systems add fields without breaking your pipeline.
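
As a sketch, suppose an upstream system starts sending a file with an extra `coupon_code` column (a hypothetical name). The mapping below keeps working unchanged.

```yaml
# Hypothetical source header: order_id,amount,coupon_code
# Only the two declared columns are written; coupon_code is ignored.
columns:
  - source: order_id
    dest: order_id
    type: string
    required: true
  - source: amount
    dest: amount
    type: float
    required: true
```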

Schema guard

On first run, the connector creates the destination table from the `columns:` block. On subsequent runs, if the live table schema doesn't match the config, the run fails loudly with a diff. There is no auto-migration; schema changes require manual action.
