~/siphon

Incremental backups

Incremental backups capture only what changed since an earlier dump, so a chain of small incrementals can stand in for repeated full dumps. siphon records the relationship between dumps in the dump envelope (a 4 KB JSON header prepended to every dump) and reconstructs the full picture at restore time by walking the base → incremental chain.

How it works

Every dump carries an envelope (internal/dumps/envelope.go) with a type (base or incremental), a base_id/parent_id, and engine-specific resume coordinates: wal_start/wal_end for Postgres, binlog_file + binlog_start/binlog_end for MySQL/MariaDB.

Catalog.ResolveChain (internal/dumps/chain.go) walks parent_id backwards from a target dump to its base, detecting cycles and broken chains rather than looping or silently truncating. Restore then applies the resolved chain in order, base first (internal/app/restore.go). A plain, non-incremental dump resolves to a single-element chain, so the same restore path serves both.

An incremental backup is a bounded change capture: starting from the base dump's recorded end position, siphon streams the row changes that committed since then up to a fixed end position captured at backup time, and serializes each as a JSONL CanonicalChange (insert/update/delete with primary key + post-image). The incremental dump body is therefore engine-neutral change records, not raw WAL/binlog bytes. At restore time those changes are replayed via ApplyChange rather than fed to the native restore tool — base links restore natively, incremental links replay change records.

Where the first incremental resumes from

backup --incremental --base <id> reads the base dump's envelope for its end position (basePosition() in internal/app/backup.go). For this to be correct when --base points at a full dump, that full backup must record where the engine's change stream stood as of the dump. Every full backup therefore captures the engine position immediately after the dump completes (via driver.BasePositioner.CurrentPositionpg_current_wal_lsn() for Postgres, the current binlog file+offset for MySQL/MariaDB) and stamps it into the base envelope (WALEnd / BinlogFile+BinlogEnd). Without this, the first incremental off a full base would start from "now" and silently drop every change committed between the base dump and the incremental run.

Capturing the position after the dump (rather than before) never under-captures: a consistent dump reflects the DB as of its snapshot, and the post-dump position is at-or-after that snapshot, so the incremental picks up every post-base change. The only edge is a change landing right at the boundary being captured in both the base and the incremental; incremental-replay INSERTs are therefore idempotent (ON CONFLICT DO NOTHING for Postgres, INSERT IGNORE for MySQL/MariaDB), so the re-apply is harmless.

The driver-level capture (driver.IncrementalBackuper):

  • Postgres (internal/driver/postgres/incremental_change.go) captures the current pg_current_wal_lsn() as the end bound, then drives the same pgoutput logical-decoding loop as CDC with that LSN as a stop target — it returns cleanly at the first message boundary past the bound, so every change committed at or before it is captured and none after.
  • MySQL/MariaDB (internal/driver/_mysqlcommon/incremental.go) captures the current binlog file + offset via SHOW BINARY LOG STATUS (MySQL 8.4+) or SHOW MASTER STATUS (older MySQL / MariaDB) as the end bound, then decodes the fork's binlog tool output up to that offset.

The CLI surface

# Restore a dump, walking its base→incremental chain automatically.
siphon restore <dump-id> --profile <target>

# Stop applying the chain after a specific dump (point-in-chain restore).
siphon restore <dump-id> --profile <target> --up-to <intermediate-id>

# Take an incremental backup capturing changes since a base dump.
siphon backup <profile> --incremental --base <base-dump-id>

--incremental requires --base <dump-id>; --base without --incremental (or --incremental without --base) is rejected with a clear error.

Status

CapabilityStatus
Dump envelope (type/base/parent, WAL & binlog fields)✅ Works
Chain resolution (ResolveChain)✅ Works
Chain-walking restore (base → incrementals, in order)✅ Works
restore --up-to <id> (stop chain early)✅ Works
backup --incremental --base <id> (bounded change capture)✅ Works
Incremental restore (change replay via ApplyChange)✅ Works
Postgres orphan replication-slot sweep✅ Works

The full incremental path is wired end-to-end: backup --incremental reads the base envelope's end position, captures the bounded change set via the driver's IncrementalBackuper, and writes an incremental-type catalog entry whose envelope carries this capture's end position (so the next incremental resumes exactly here). Restore replays each incremental link's changes via ApplyChange. The live-server behavior is exercised in CI (integration-tagged tests against a wal_level=logical Postgres); it is compile-checked but not run locally.

Examples

Chain-walking restore (works today). If inc-2 was built on inc-1 on base-0, restoring inc-2 applies all three in order:

siphon restore inc-2 --profile prod-replica

Point-in-chain restore with --up-to (works today) — apply only up to and including inc-1, skipping inc-2:

siphon restore inc-2 --profile prod-replica --up-to inc-1

A typo'd --up-to is rejected (the dump isn't in the chain) rather than silently restoring more than asked.

Taking an incremental backup against a base dump:

siphon backup prod --incremental --base base-0
# Captures changes committed since base-0's end position and writes a new
# incremental dump linked to base-0. Restoring it later replays base-0 then the
# captured changes.

Limitations and runtime gates

These apply to the incremental backup path:

  • Postgres uses a persistent logical replication slot (siphon_logical) as the change-stream resume anchor. Per-base physical slots are swept automatically: before each incremental capture, SweepOrphanSlots drops any inactive siphon_* physical slot (an inactive siphon slot is by definition orphaned — a completed backup drops its own, an active one is in use). The persistent logical slot is excluded from the sweep so the resume position survives between runs.
  • MySQL/MariaDB require binlog_format=ROW for usable incrementals.
  • Cross-version incrementals are unsupported (CrossVersionIncremental: false): a chain must be captured and restored against the same engine major version.