Kzip — Format Spec (v1 — restic-compatible bootstrap)
Canonical format for `.kz` files produced by `kzip`, the Koder Stack compressor. A single extension covers every mode (single-file, multi-file tar, directory); dispatch is by magic bytes, not by suffix. During the bootstrap (v1), the format is byte-for-byte compatible with the restic v0.18.x repository format, which is the single source of truth. Future divergences require a major version bump, an explicit ticket, and an incompatibility note.
When this spec applies
Primary triggers
- Modifying any encoded structure in .kz files
- Adding or removing serialized metadata fields
All triggers
- Modifying any encoded structure in .kz files
- Changing the layout of pack files, snapshots, indexes, or trees in the kzip repository
- Adding or removing serialized metadata fields
- Changing the default compression algorithms or the AEAD crypto
- Defining a new magic number, frame extension, chunk type, or block format
Specification body
Kzip Format Specification — v1 (restic-compatible)
0. Stage and stability
This v1 of the format is restic-compatible during the bootstrap. Canonical reference documents:
- restic design.rst v0.18.1: repository behavior
- restic references/design.rst v0.18.1: wire-level format

The spec below summarizes the contract; in case of conflict, the restic documents above prevail (until v1 is ratified).
1. File extension
Everything kzip emits uses the .kz extension, regardless of mode. Dispatch among single-file, multi-file, and sidecar is done by magic bytes (see §11), not by suffix, in line with .zst/.gz/.xz (one extension per format, varied payload).
| Form | Mode | Notes |
|---|---|---|
| `<name>.kz` (file) | Single-file stream | Compressed stream frame analogous to `.zst`. Reserved but not emitted by the v0.1 bootstrap; layout ratified in ticket #010 and emitter shipped in ticket #024 (see §13.2). |
| `<name>.tar.kz` (file) | Packed multi-file | kzip repository bundled into a `.tar` (transport/distribution). The client unpacks it before operating. |
| `<name>.kz/` (directory) | Unpacked multi-file | kzip repository as a file tree (native layout of §2). Typical working mode. |
| `<name>.kz.rs` (sidecar) | Reed-Solomon parity sidecar | See §13.1. The `.rs` suffix is appended after the covered file's `.kz` (`backup.kz` → `backup.kz.rs`). |
History: earlier versions of the spec used `.kzip` (multi-file) and `.kzrs` (sidecar). They were unified into `.kz` / `.kz.rs` on 2026-05-12, before any production data existed; no migration path is needed.

Note: restic does not use an extension for repositories. We adopt `.kz` (as a directory or tar suffix) to signal the format in the name.
2. Repository layout
A kzip repository is a directory with the following structure. All files below are encrypted with the repository key (except config, which has an unencrypted header for detection).
```
<repo-root>/
├── config                 ← repository config (JSON, encrypted body)
├── keys/<id>              ← derived keys + Argon2 KDF parameters
├── data/<2-hex>/<pack-id> ← packs (concatenated, compressed blobs)
├── index/<index-id>       ← index maps (blob-id → pack-id+offset+length)
├── snapshots/<snap-id>    ← snapshot metadata (timestamp, paths, tree-id)
├── locks/<lock-id>        ← exclusion locks (TTL ~30 min)
└── HEAD                   ← (optional) pointer to the most recent snapshot
```
All IDs are SHA-256 hashes in lowercase hex (64 characters).
3. Crypto
3.1 Master key derivation
- KDF: Argon2id (default `time=3`, `memory=64MiB`, `parallelism=1`).
- Salt: 64 random bytes per key.
- Output: 64 bytes (32 encryption + 32 MAC).
3.2 Encryption
- Cipher: AES-256-CTR.
- IV: 16 bytes random per-blob.
- MAC: Poly1305 (32-byte key, 16-byte tag) over (IV || ciphertext).
- AEAD construction: encrypt-then-MAC.
3.3 Wire format of each encrypted blob
```
+-----------------------------+----------------+-------------------------+
| IV (16 bytes random)        | ciphertext (N) | Poly1305 tag (16 bytes) |
+-----------------------------+----------------+-------------------------+
```
Total = IV(16) + N + tag(16) = N + 32. N may be 0 (an empty plaintext is valid).
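The framing above can be checked before any cryptography runs. A minimal Go sketch, assuming restic's 16-byte Poly1305 tag (Poly1305 always emits 16 bytes); `splitBlob` is a hypothetical helper, and it deliberately does not verify the MAC or decrypt.

```go
package main

import (
	"errors"
	"fmt"
)

// Framing sizes from §3.3. A Poly1305 tag is always 16 bytes.
const (
	ivSize  = 16
	tagSize = 16
)

var errShortBlob = errors.New("encrypted blob shorter than IV + tag")

// splitBlob slices an encrypted blob into IV, ciphertext and tag without
// copying. It checks framing only: the MAC is NOT verified here.
func splitBlob(b []byte) (iv, ct, tag []byte, err error) {
	if len(b) < ivSize+tagSize {
		return nil, nil, nil, errShortBlob
	}
	iv = b[:ivSize]
	ct = b[ivSize : len(b)-tagSize]
	tag = b[len(b)-tagSize:]
	return iv, ct, tag, nil
}

func main() {
	blob := make([]byte, ivSize+5+tagSize) // 5 bytes of ciphertext
	iv, ct, tag, err := splitBlob(blob)
	fmt.Println(len(iv), len(ct), len(tag), err) // 16 5 16 <nil>
}
```

A real reader would verify the tag over the IV and ciphertext before running AES-256-CTR, keeping the encrypt-then-MAC order.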
3.4 Repository key
Each keys/<id> holds a JSON document whose `data` field contains the repository master key, encrypted under a key derived from the operator's password:
```
{
"created": "<RFC3339>",
"username": "<string>",
"hostname": "<string>",
"kdf": "argon2id",
"n": 524288, "r": 1, "p": 1, // Argon2 params
"salt": "<base64>",
"data": "<base64-encrypted-master-key>"
}
```
The operator's password derives a key-encryption key (KEK) via Argon2id; this KEK decrypts `data` to obtain the repository's real master key. Multiple keys/ entries can coexist (multi-user / rotation).
3.4.1 Recipient mode — X25519 public-key wrap (ticket #005, ratified 2026-05-13)
For automation with no operator present (CI, cron, runners without a password), a keys/<id> can be in recipient mode: the master key travels wrapped to one or more X25519 public keys instead of being unlocked with a password-derived key. The JSON omits `kdf`/`n`/`r`/`p`/`salt`/`data` and carries:
```
{
"created": "<RFC3339>",
"username": "<string>",
"hostname": "<string>",
"recipients": [
{
"type": "x25519-kzip-v1",
"recipient_fingerprint": "<base64-32-bytes-X25519-pub>",
"envelope": "<base64-of-envelope-bytes>"
}
]
}
```
Each envelope is the output of internal/pubkey.Recipient.Wrap:
```
+----------------+--------+---------------------------+
| ephemeral_pub  | nonce  | ciphertext ‖ Poly1305 tag |
| (32 bytes)     | (12 B) | (variable)                |
+----------------+--------+---------------------------+
```
Wrapping key:
```
shared = ECDH(ephemeral_priv, recipient_pub)
key    = HKDF-SHA256(shared, salt="", info="kzip-pubkey-v1")
ct/tag = ChaCha20-Poly1305.Seal(key, nonce, master_key_json, AD=ephemeral_pub)
```
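The envelope layout can be split into its three parts before any cryptography runs. A minimal, stdlib-only Go sketch; `parseEnvelope` and `envelope` are hypothetical names, not the `internal/pubkey` code. The ECDH, HKDF-SHA256, and ChaCha20-Poly1305 `Open` steps (with `EphemeralPub` as associated data) are intentionally left out. The 16-byte minimum for the sealed part is the ChaCha20-Poly1305 tag size.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	ephPubSize = 32 // X25519 public key
	nonceSize  = 12 // ChaCha20-Poly1305 nonce
	aeadTag    = 16 // Poly1305 tag appended by Seal
)

var errUnwrap = errors.New("envelope too short")

// envelope holds zero-copy views into a wrapped master key.
type envelope struct {
	EphemeralPub []byte // also used as the AEAD associated data
	Nonce        []byte
	Sealed       []byte // ciphertext ‖ tag, exactly as Seal returned it
}

func parseEnvelope(b []byte) (envelope, error) {
	if len(b) < ephPubSize+nonceSize+aeadTag {
		return envelope{}, errUnwrap
	}
	return envelope{
		EphemeralPub: b[:ephPubSize],
		Nonce:        b[ephPubSize : ephPubSize+nonceSize],
		Sealed:       b[ephPubSize+nonceSize:],
	}, nil
}

func main() {
	env, err := parseEnvelope(make([]byte, 100))
	fmt.Println(len(env.EphemeralPub), len(env.Nonce), len(env.Sealed), err) // 32 12 56 <nil>
}
```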
Properties:
- Payload determinism: `master_key_json` is the JSON marshal of the `*crypto.Key` master, bit-identical to the plaintext that password mode encrypts into `data`. Reading via `OpenKeyWithIdentity` yields the same `crypto.Key` that `OpenKey` would.
- Ephemeral key per Wrap: each `envelope` carries a fresh `ephemeral_pub`. Two envelopes for the same recipient and the same master are distinct (anti-dedup).
- AEAD covers the whole envelope: `ephemeral_pub` is the associated data, so altering any byte (`ephemeral_pub`, `nonce`, `ciphertext`, `tag`) surfaces as `ErrUnwrap`.
- Mutually exclusive modes per file: a `keys/<id>` in recipient mode has no `kdf`, and vice versa. `OpenKey` rejects recipient mode with `ErrRecipientMode`; `OpenKeyWithIdentity` rejects password mode. `SearchKey` and `SearchKeyWithIdentity` silently skip the wrong mode, so a repo can mix key modes.
- Fingerprint routing: `OpenKeyWithIdentity` only tries envelopes whose `recipient_fingerprint` matches `identity.Recipient().Bytes()`, which avoids a timing oracle over the other recipients in the same file.
File format on disk (identity/recipient files)
Operators exchange the public file (<prefix>.pub) and keep the private one (<prefix>.key, mode 0600). Text layout, one line:
```
KZIP-IDENTITY-V1-<base64-StdEncoding(priv 32B)>
KZIP-RECIPIENT-V1-<base64-StdEncoding(pub 32B)>
```
The distinctive prefix prevents a footgun (a private key in a public-key slot, or vice versa): internal/pubkey.ReadIdentityFile / ReadRecipientFile return ErrBadFormat before any decoding.
CLI surface
| Command | Function |
|---|---|
| `kzip key gen-identity --output <prefix>` | Generates `<prefix>.key` (private, 0600) + `<prefix>.pub` (public, 0644) |
| `kzip init --recipient <pub>` (1+) | Bootstraps a recipient-only repo, with no password |
| `kzip key add-recipient --pubkey <pub>` | Adds a recipient to an existing repo (the master key comes from the current open) |
| `kzip --identity <key> <cmd…>` | Global flag: opens the repo via X25519 instead of a password |
| `kzip key list` | Mode column (password/recipient) + fingerprints |
Byte-compat divergence
Recipient-mode keys/ files carry no kdf/data fields, so restic's open path fails immediately ("KDF is not scrypt"). This is a one-way divergence: kzip reads what restic writes, but restic cannot open recipient-mode keys. Listed in §14.
3.4.2 Future — age compatibility (separate ticket)
An envelope with type: "age-v1" would be orthogonal: same file, different recipient. The current ticket (#005) does not implement age compatibility, to avoid pulling in age's header/armor format; the implementation stays minimal until there is concrete demand.
4. Pack files
Pack files aggregate several blobs into a single file to amortize I/O overhead and improve compression.
4.1 Layout
```
+---------------+---------------+-----+---------------+--------------------+
| blob 1        | blob 2        | ... | blob N        | header (encrypted) |
+---------------+---------------+-----+---------------+--------------------+
                                                                           ↑
                                                               ends at EOF - 4
+----+
| H  |  header length (uint32 LE, last 4 bytes of file)
+----+
```
4.2 Pack header (after decrypt)
```go
type PackHeaderEntry struct {
	Type   byte   // 0=data, 1=tree, 2=data-compressed, 3=tree-compressed
	Length uint32 // length of ciphertext (= encrypted-and-possibly-compressed bytes)
	// If Type ∈ {2, 3} an additional uint32 LE follows here:
	// UncompressedLength uint32: length of plaintext before compression
	ID [32]byte // SHA-256 of plaintext
}
```
The whole header is a repeated sequence of PackHeaderEntry records, AEAD-encrypted as a unit.

Type byte legend (revised 2026-05-12 to match restic v0.18.x semantics): values 0–3 are exhausted by the (DataBlob/TreeBlob) × (compressed/uncompressed) cross-product. The "padding (legacy)" reading in earlier drafts of this spec was incorrect; byte 2 is `DataBlob compressed` in restic v0.18.x and is reserved with that meaning here. Values 4–255 are available for kzip extensions (see §13).
4.3 Blob types
| Type | Content | Compressed | Produced |
|---|---|---|---|
| data (0) | File chunk (raw file bytes after chunking) | no | By backup, before encryption |
| tree (1) | Serialized JSON of a directory tree | no | By backup, as each directory is uploaded |
| data-c (2) | File chunk, zstd-compressed | yes | By backup, before encryption |
| tree-c (3) | Directory-tree JSON, zstd-compressed | yes | By backup, as each directory is uploaded |
Compressed variants carry an extra 4-byte LE UncompressedLength field in the header entry (between Length and ID) so the reader can pre-allocate the decode buffer. Default compressor is zstd level 3 (see §10).
5. Content-defined chunking (CDC)
- Algorithm: Rabin fingerprint over a rolling window.
- Polynomial: random per repo (generated at `init`, stored in `config`).
- Sizes: min=512 KiB, max=8 MiB, target=1 MiB (restic defaults; may become tunable in a future RFC).
- Boundary: hash mod 2²⁰ == 0 (adjustable to hit the target).

Each chunk becomes a data blob (after dedup by hash).
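The cut decision combines the size limits with the mask test. A minimal Go sketch of that decision only: the Rabin fingerprint itself, with its random per-repo polynomial and rolling window, is not reproduced, and `isBoundary` is a hypothetical name. Note that `hash mod 2²⁰ == 0` is the same as checking that the low 20 bits of the digest are zero.

```go
package main

import "fmt"

// Size limits from §5 (restic defaults).
const (
	minSize   = 512 << 10 // 512 KiB
	maxSize   = 8 << 20   // 8 MiB
	splitMask = 1<<20 - 1 // digest mod 2^20 == 0  <=>  digest&splitMask == 0
)

// isBoundary reports whether the current chunk should end after size bytes,
// given the rolling-hash digest at that position.
func isBoundary(size int, digest uint64) bool {
	if size < minSize {
		return false // never cut below the minimum
	}
	if size >= maxSize {
		return true // force a cut at the maximum
	}
	return digest&splitMask == 0
}

func main() {
	fmt.Println(isBoundary(100, 0))       // false: below minimum
	fmt.Println(isBoundary(minSize, 0))   // true: mask matches
	fmt.Println(isBoundary(minSize, 1))   // false
	fmt.Println(isBoundary(maxSize, 123)) // true: forced
}
```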
6. Trees
A tree blob is serialized JSON:
```
{
"nodes": [
{
"name": "filename",
"type": "file" | "dir" | "symlink" | "fifo" | "socket" | "blockdev" | "chardev",
"mode": "0644",
"mtime": "<RFC3339>",
"atime": "<RFC3339>",
"ctime": "<RFC3339>",
"uid": 1000, "gid": 1000,
"user": "koder", "group": "koder",
"size": 12345,
"content": ["<blob-id>", "<blob-id>"], // for files
"subtree": "<tree-id>", // for dirs
"linktarget": "<path>", // for symlinks
"extended_attributes": [{"name":"...","value":"<base64>"}]
}
]
}
```
xattrs and ACLs are preserved via extended_attributes. Hard links are not explicitly deduplicated: the same content array implies the same content, but inode identity is not preserved.
7. Snapshots
Each snapshots/<id> contains encrypted JSON:
```
{
"time": "<RFC3339>",
"tree": "<root-tree-id>",
"paths": ["/home/user/docs"],
"hostname": "host",
"username": "user",
"uid": 1000, "gid": 1000,
"tags": ["weekly", "automated"],
"parent": "<previous-snapshot-id>", // optional
"program_version": "kzip 0.1.0-bootstrap (restic-fork)"
}
```
8. Indices
index/<id> is encrypted JSON mapping each blob-id to its location:
```
{
"supersedes": ["<old-index-id>"],
"packs": [
{
"id": "<pack-id>",
"blobs": [
{
"id": "<blob-id>",
"type": "data" | "tree",
"offset": 0,
"length": 4194304,
"uncompressed_length": 5242880 // optional, for compressed blobs
}
]
}
]
}
```
prune consolidates multiple indexes into one (replacing the old ones via supersedes).
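A reader typically flattens the decrypted index into a blob-id → location map. A minimal Go sketch using `encoding/json`, assuming the JSON has already been decrypted and is valid (no comments, a single `type` value). `buildLookup` and `location` are hypothetical names. Merging several index files and honoring `supersedes` are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type indexFile struct {
	Supersedes []string `json:"supersedes"`
	Packs      []struct {
		ID    string `json:"id"`
		Blobs []struct {
			ID                 string `json:"id"`
			Type               string `json:"type"`
			Offset             uint64 `json:"offset"`
			Length             uint64 `json:"length"`
			UncompressedLength uint64 `json:"uncompressed_length,omitempty"`
		} `json:"blobs"`
	} `json:"packs"`
}

type location struct {
	PackID         string
	Offset, Length uint64
}

// buildLookup turns one decrypted index file into a blob-id → location map.
func buildLookup(raw []byte) (map[string]location, error) {
	var idx indexFile
	if err := json.Unmarshal(raw, &idx); err != nil {
		return nil, err
	}
	m := make(map[string]location)
	for _, p := range idx.Packs {
		for _, b := range p.Blobs {
			m[b.ID] = location{PackID: p.ID, Offset: b.Offset, Length: b.Length}
		}
	}
	return m, nil
}

func main() {
	raw := []byte(`{"packs":[{"id":"p1","blobs":[
		{"id":"b1","type":"data","offset":0,"length":10},
		{"id":"b2","type":"tree","offset":10,"length":7}]}]}`)
	m, err := buildLookup(raw)
	fmt.Println(m["b2"], err) // {p1 10 7} <nil>
}
```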
9. Locks
Exclusion locks live in locks/<id>:
```
{
"time": "<RFC3339>",
"exclusive": true | false,
"hostname": "host",
"username": "user",
"pid": 12345
}
```
TTL ~30 min; abandoned locks expire. Stale locks are detected via PID liveness.
10. Compression
Blobs (data + tree) are compressed before encryption. v1 supports:
| Algorithm | Default | Notes |
|---|---|---|
| zstd level 3 | yes | default performance/ratio balance |
| zstd level 1 | opt-in | maximum speed |
| zstd level 11 | opt-in (--compression max) | maximum compression |
| none | opt-in (--compression off) | skips compression when the data is already compressed |

LZMA, BWT, and BCJ filters are not supported in v1 (planned in ticket #003; see §13.3 and §13.5 for the extensions that followed).
11. Magic numbers / detection
Because the .kz extension is unique (§1), dispatch is by content. Given an arbitrary <name>.kz, the CLI decides the mode in this order:
- Directory check: if it is a directory, it is an unpacked multi-file repository; open `<name>.kz/config`.
- Tar header: if the ustar magic field at offset 257 holds `ustar\0` (POSIX tar), it is a packed repository; extract it to `<name>.kz/` (or stream-process the tar) before operating.
- Sidecar magic: first 4 bytes = `KZRS` → Reed-Solomon sidecar (see §13.1). Canonical file suffix: `.kz.rs`.
- Stream frame magic: first 4 bytes = `KZ\x01\x00` (single-file stream; see §13.2 for the full header layout). Decode via the stream decompressor.
- Otherwise: error `KZIP-FORMAT-001` ("unrecognized kzip payload").
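The ordered checks map naturally onto a single `switch`. A minimal Go sketch, assuming the caller already knows whether the path is a directory and has read at least the first 263 bytes of a regular file; `detect` and the `mode` constants are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
)

type mode int

const (
	modeUnknown   mode = iota // KZIP-FORMAT-001
	modeDirRepo               // unpacked multi-file repository
	modeTarRepo               // packed repository (.tar.kz)
	modeSidecarRS             // Reed-Solomon sidecar (.kz.rs)
	modeStream                // single-file stream (.kz)
)

// detect applies the §11 checks in order. head should hold at least the
// first 263 bytes of the file so the ustar magic at offset 257 is visible.
func detect(isDir bool, head []byte) mode {
	switch {
	case isDir:
		return modeDirRepo
	case len(head) >= 263 && bytes.Equal(head[257:263], []byte("ustar\x00")):
		return modeTarRepo
	case bytes.HasPrefix(head, []byte("KZRS")):
		return modeSidecarRS
	case bytes.HasPrefix(head, []byte{'K', 'Z', 0x01, 0x00}):
		return modeStream
	default:
		return modeUnknown
	}
}

func main() {
	tarHead := make([]byte, 512)
	copy(tarHead[257:], "ustar\x00")
	fmt.Println(detect(false, tarHead) == modeTarRepo)                 // true
	fmt.Println(detect(false, []byte("KZ\x01\x00rest")) == modeStream) // true
}
```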
Other detection points:
- Pack files inside `<repo>/data/<2-hex>/`: no explicit magic number; detection works by trying to decrypt the header whose length is read from the last 4 bytes.
- Config: encrypted JSON with `"version": 2` (after decrypt).
- Current repository version: 2 (same as restic v0.18.x).
12. Endianness
All binary numeric fields are little-endian, except `dataSize` in the `.kz.rs` sidecar (§13.1), which is big-endian.
13. Future extension hooks
v1 reserves the following fields for future use without breaking compatibility:
- `PackHeaderEntry.Type`: values 0–3 are taken by restic blob semantics (see §4.2/§4.3). Value `4` = filter-chain descriptor (see §13.3, ratified 2026-05-12). Value `5` = pack-embedded recovery record (see §13.4, ratified 2026-05-13). Values 6–255 remain reserved for future extensions (signature blob, etc.). Bits 4–7 of a data/tree entry's Type byte carry flags (see §13.3.1).
- `Snapshot.tags` accepts arbitrary strings for Koder-specific metadata (`koder:repo=hub`, `koder:role=daily-backup`).
- Config JSON accepts unrecognized fields without error (forward-compat); future kzip versions may add `signing_key_id`, `recovery_records_enabled`, `bcj_filter_chain`, etc.
13.1 Sidecar artifacts (out-of-band, not part of repo format)
Some kzip features write sidecar files alongside repo artifacts without modifying the repo format. Sidecars are additive: a v1 reader/restic that doesn't recognize the sidecar simply ignores it.
.kz.rs — Reed-Solomon parity sidecar (kzip ticket #007 v1 sidecar mode):
Filename convention: applied as a .rs suffix on top of the covered file's .kz suffix — backup.kz → backup.kz.rs. The internal magic stays KZRS regardless of filename. (Pre-2026-05-12 builds wrote .kzrs; readers SHOULD accept both during the deprecation window per ticket #010.)
Layout (big-endian where applicable):
```
+-----+--------+--------+--------+----------+----------+----------+
| 4B | 1B | 1B | 1B | 4B BE | 32B | N×B |
| KZRS| ver=01 | dShard | pShard | dataSize | sha256 | parity |
+-----+--------+--------+--------+----------+----------+----------+
```
- Magic `KZRS` (0x4B 0x5A 0x52 0x53); version `0x01`.
- `dShard + pShard ≤ 256` (klauspost/reedsolomon constraint).
- `parity` = `pShard` shards of `ceil(dataSize / dShard)` bytes.
- Generated by `kzip recovery encode <file>`; consumed by `verify`/`repair`.
- Out-of-band: removing all `.kz.rs` files leaves the repo intact and readable by stock restic.
The pack-format-embedded variant landed via kzip ticket #009 (ratified 2026-05-13). It carries parity inside the pack header as a PackHeaderEntry.Type=5 blob with per-shard SHA-256 checksums — see §13.4 for the wire layout and §14.3 for the divergence note. The sidecar form continues to be supported in parallel for files outside the repo (e.g. raw deploy artifacts). (Pre-2026-05-12 drafts of this spec proposed Type=4 for the RS pack-embedded entry; Type=4 was reallocated to the filter-chain descriptor in §13.3 once the actual restic type space was audited.)
13.2 Single-file stream header (.kz, ratified 2026-05-12 via ticket #010; emitter shipped 2026-05-19 via ticket #024)
A single-file .kz artifact has a 48-byte fixed header, an opaque compressed payload, and a 4-byte trailer. The header is not encrypted (the payload may be); §11 detection only requires the first 4 bytes.
```
+-----+--------+--------+------+-----------+----------+---------+------+
| 4B | 1B | 1B | 2B | 8B LE | 32B | N bytes | 4B |
| KZ | ver=01 | comp | flags| uncomp_sz | sha256 | payload | tlen |
| \01 | | | LE | | (plain) | (compr.)| (LE) |
| \00 | | | | | | | |
+-----+--------+--------+------+-----------+----------+---------+------+
```
| Field | Width | Encoding | Meaning |
|---|---|---|---|
| `magic` | 4B | bytes | Exactly `0x4B 0x5A 0x01 0x00` (`KZ`, version prefix, NUL). Distinguishes it from `KZRS` (sidecar) and `ustar\0` (tar repo). |
| `ver` | 1B | u8 | Header version. `0x01` is the only value defined; readers MUST reject other values. |
| `comp` | 1B | u8 | Compressor: 0 = uncompressed; otherwise the RFC-002 CompressorID (§13.5 / `internal/compressor/`): 1=zstd, 2=lzma2, 3=brotli, 4–127 reserved first-party, 128–255 reserved downstream. Level is not persisted (codecs auto-detect). |
| `flags` | 2B | u16 LE | Bit flags. Bit 0 = `encrypted_payload` (payload is AEAD-wrapped per §3.3); bits 1–15 reserved (MUST be zero on write; readers MUST accept any value to allow forward-compat additions). |
| `uncomp_sz` | 8B | u64 LE | Plaintext length. 0 is valid (empty file). Cap = 2⁶³−1 (signed-int interop). |
| `sha256` | 32B | bytes | SHA-256 of plaintext (pre-compression, pre-encryption). |
| `payload` | N bytes | opaque | Compressed payload. If `comp == 0`, the payload is the raw plaintext (or AEAD-wrapped raw plaintext when `flags.encrypted_payload`). If `comp > 0`, the payload is the native bytes of the named codec (zstd frame, .xz stream, brotli stream). |
| `tlen` | 4B | u32 LE | Trailer: byte length of payload (sanity check against truncation; equals filesize − 52). |
Endianness rule of thumb: uncomp_sz, flags, and tlen are little-endian (matching §12 and the rest of the kzip pack-file format). The sidecar .kz.rs blob is the only big-endian field in the spec — preserved unchanged for restic-format alignment.
Constraints:
- `len(file) ≥ 52` (48-byte header + 4-byte trailer; an empty payload is allowed). Readers reject shorter files with `KZIP-FORMAT-002`.
- `tlen + 52 == len(file)` MUST hold; otherwise `KZIP-FORMAT-003` ("truncated kz stream").
- If `comp == 0`, `len(payload) == uncomp_sz` (or, with encryption, `len(payload) == uncomp_sz + 32`, the IV plus tag overhead of §3.3).
- `sha256` is computed over the plaintext, not the ciphertext or the compressed bytes. Readers verify it after decryption and decompression.
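The header offsets and the trailer check can be exercised with an uncompressed, unencrypted stream. A minimal Go sketch: `parseStream` and `buildRaw` are hypothetical names, the error strings only borrow the spec's codes, and encrypted payloads (flags bit 0) and decompression for `comp > 0` are out of scope.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	headerSize  = 48
	trailerSize = 4
)

var (
	errTooShort  = errors.New("KZIP-FORMAT-002: shorter than 52 bytes")
	errTruncated = errors.New("KZIP-FORMAT-003: truncated kz stream")
	errNotStream = errors.New("not a kz stream (bad magic or version)")
)

type streamHeader struct {
	Comp       byte
	Flags      uint16
	UncompSize uint64
	SHA256     [32]byte
}

// parseStream validates magic, version and trailer, and returns the header
// plus a view of the payload.
func parseStream(file []byte) (streamHeader, []byte, error) {
	var h streamHeader
	if len(file) < headerSize+trailerSize {
		return h, nil, errTooShort
	}
	if !bytes.Equal(file[:4], []byte{'K', 'Z', 0x01, 0x00}) || file[4] != 0x01 {
		return h, nil, errNotStream
	}
	h.Comp = file[5]
	h.Flags = binary.LittleEndian.Uint16(file[6:8])
	h.UncompSize = binary.LittleEndian.Uint64(file[8:16])
	copy(h.SHA256[:], file[16:48])
	tlen := binary.LittleEndian.Uint32(file[len(file)-trailerSize:])
	if uint64(tlen)+headerSize+trailerSize != uint64(len(file)) {
		return h, nil, errTruncated
	}
	return h, file[headerSize : len(file)-trailerSize], nil
}

// buildRaw writes an uncompressed (comp = 0), unencrypted stream.
func buildRaw(plain []byte) []byte {
	out := append([]byte{'K', 'Z', 0x01, 0x00, 0x01, 0x00}, 0, 0) // magic, ver, comp, flags
	out = binary.LittleEndian.AppendUint64(out, uint64(len(plain)))
	sum := sha256.Sum256(plain)
	out = append(out, sum[:]...)
	out = append(out, plain...)
	return binary.LittleEndian.AppendUint32(out, uint32(len(plain)))
}

func main() {
	h, payload, err := parseStream(buildRaw([]byte("hi")))
	// With comp == 0 and no encryption, the payload is the plaintext itself.
	fmt.Println(string(payload), h.UncompSize, sha256.Sum256(payload) == h.SHA256, err) // hi 2 true <nil>
}
```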
Pre-2026-05-19 drafts of this section assigned comp to a zstd-only enum (1=zstd-1, 2=zstd-3, 3=zstd-11, 16–31 reserved for non-zstd). That assignment was never emitted (the section was "reserved but not emitted" until #024) and is superseded by the RFC-002 CompressorID alignment above. The new assignment is the canonical one going forward — readers MUST follow this table.
CLI surface (kzip #024):
- `kzip compress <file> [--compressor=<name>[:level]]` → writes `<file>.kz` per the layout above.
- `kzip decompress <file>.kz` → infers the compressor from `comp`, validates `sha256`, writes `<file>`.
- `kzip compress -c <file>` / `kzip decompress -c <file>.kz` → stdout, matching gzip semantics.
- Bare `kzip <file>` (no subcommand) routes by extension: `.kz` ending → decompress; anything else → compress.
- Symlinks named `gunzip`, `kunzip`, `kzd` force decompression regardless of extension.
13.3 Filter-chain descriptor (PackHeaderEntry.Type = 4, ratified 2026-05-12 via ticket #003)
A pack file may include a filter-chain descriptor to document the pre-compression byte transforms (e.g. BCJ, delta) that were applied to its data/tree blobs before compression+encryption. The descriptor is one extra entry in the pack header, distinguishable by its Type byte from the data/tree blob entries (Types 0–3: data, tree, compressed data, compressed tree; see §4.2).
Wire format (after decrypt, little-endian):
```
+--------+------------+------------+-------------+-----+-------------+---------------+
| 0x04   | nfilters   | flen[0]    | filter[0]   | ... | flen[n-1]   | filter[n-1]   |
| (Type) | (u16 LE)   | (u16 LE)   | (flen[0] B) |     | (u16 LE)    | (flen[n-1] B) |
+--------+------------+------------+-------------+-----+-------------+---------------+
```
Each filter[i] is a UTF-8 string of the form "<name>" or "<name>:<param>", exactly matching the chain-syntax produced by Chain.String() in engines/compress/kzip/internal/filters/ (post-#008 layout; was engine/restic_vendor/internal/filters/ during bootstrap). Names are stable identifiers from the filter registry:
| Name | Status | Origin |
|---|---|---|
| `delta` | shipped (ticket #003) | universal delta encoding (`out[i] = in[i] − in[i−d]`) |
| `bcj-x86` | stub (ticket #012) | xz/7-zip x86 branch/call/jump |
| `bcj-arm`, `bcj-thumb`, `bcj-arm64` | stub (ticket #013) | xz ARM variants |
| `bcj-ppc`, `bcj-ppc-le`, `bcj-ia64`, `bcj-sparc`, `bcj-riscv`, `bcj-riscv32` | stub (ticket #014) | xz misc-arch variants |
Constraints:
- `nfilters ≤ 256`: a generous cap; no realistic chain exceeds 4.
- `flen[i] ≤ 64`: keeps each entry inside one cache line and sanity-bounds the descriptor.
- The total descriptor is bounded above by `4 + 256 × (2 + 64) = 16900` bytes.
- Filter names not present in the reader's registry MUST fail loudly (`KZIP-FORMAT-004` ("unknown filter '<name>' in pack header")). Treating unknown filters as no-ops would silently produce garbage on decode, which is never acceptable.
Backward-compat for readers without filter support:
- A v1 reader (restic v0.18.x or pre-#003 kzip) that encounters `Type=4` in the pack header MUST skip the entry; `Length` tells it how many bytes to advance. Skipping is safe iff the pack file's data/tree blobs were not transformed by any filter; producers that emit `Type=4` with non-empty chains MUST guarantee that any non-trivial filter was actually applied (the chain is the contract).
- The reverse case (a kzip-#003 reader on a v1 pack without `Type=4`) is the common case: the empty chain is implied, no transform was applied, and blobs decompress directly.
The encoding is intentionally text-rather-than-bytecode to keep the chain debuggable via xxd on the encrypted header and to align with the CLI's --filter= chain syntax.
13.3.1 Per-blob filter-skip flag (kzip ticket #017, 2026-05-15)
The pack-header entry Type byte of a data/tree blob carries an opt-in high-nibble flag field in addition to the low-nibble type discriminator. The dispatch nibble (bits 0–3) stays compatible with the values defined in §4.3 (0=DataBlob, 1=TreeBlob, 2=DataBlob compressed, 3=TreeBlob compressed). Bits 4–7 are reserved for kzip extensions; the first defined flag is:
| Bit | Mask | Name | Semantics |
|---|---|---|---|
| 7 | 0x80 | `FlagFilterSkipped` | The chain declared in this pack's Type=4 descriptor was withheld for this blob. The reader MUST decompress as usual but MUST NOT call `chain.Reverse`; the post-decompress buffer IS the original plaintext. |
| 4–6 | 0x70 | reserved | Future kzip flags. Readers MUST mask them out: `Type & 0x0F` yields the dispatch type and `Type & 0xF0` the full flag field, of which only bit 7 is defined today. |
Why this exists: content-aware filters like `bcj-x86` only pay off when the whole chunk is the right kind of data. Content-defined chunking can produce chunks that span tar headers, ELF `.text`, debug info, and unrelated files (see ticket #016: a tar of `/usr/bin` showed a +17.6% regression with `bcj-x86,delta:1`). When the per-chunk sniffer (`internal/sniff/`) classifies a chunk as not matching the declared chain, the writer (`saveAndEncrypt`) skips `chain.Apply` and sets bit 7 on the Type byte so the reader knows to skip `chain.Reverse`.
Backward-compat:
- Bit 7 is opt-in via writer choice — a kzip backup without a content-aware filter chain never sets it. Repos written by kzip 0.1.0 (pre-#017) are byte-identical to repos written by kzip with #017 on empty/non-BCJ chains.
- A pre-#017 kzip reader sees `Type=0x82` (compressed data + flag) and falls into the `default → error` branch of the entry-type switch, refusing to decode the pack. This is forward-incompatible for opt-in repos only; users who never invoke BCJ chains never hit it. If strict forward-compat is needed, gate emission of bit 7 behind a future `repo.Version` bump (path C of the #016 §13.3 design).
Encode side (writer, kzip ≥ 0.2.0):
```go
chain := r.filterChain
flags := restic.BlobFlags(0)
if chain.NeedsSniffing() && t == DataBlob {
	if !chain.Matches(sniff.Sniff(data)) {
		flags |= restic.FlagFilterSkipped // skip chain.Apply
	}
}
// pack header entry Type byte = (dispatch_type & 0x0F) | (flags & 0xF0)
```
Decode side (reader, kzip ≥ 0.2.0):
```go
encodedType := p[0]
b.Flags = restic.BlobFlags(encodedType & 0xF0)
tpe := encodedType & 0x0F // 0..3, same as pre-#017
// ... decompress ...
if !b.Flags.Has(restic.FlagFilterSkipped) {
	plaintext = chain.Reverse(plaintext)
}
```
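The nibble split that both sides rely on fits in two one-line functions. A self-contained Go sketch; `packTypeByte` and `unpackTypeByte` are hypothetical stand-ins for the `restic.BlobFlags` plumbing shown in the encode and decode fragments.

```go
package main

import "fmt"

const (
	typeMask          byte = 0x0F // dispatch type: 0..3 per §4.3
	flagMask          byte = 0xF0 // kzip flag field
	flagFilterSkipped byte = 0x80 // bit 7
)

// packTypeByte combines the dispatch nibble and the flag nibble.
func packTypeByte(dispatch, flags byte) byte {
	return dispatch&typeMask | flags&flagMask
}

// unpackTypeByte splits a stored Type byte back into its two nibbles.
func unpackTypeByte(b byte) (dispatch, flags byte) {
	return b & typeMask, b & flagMask
}

func main() {
	b := packTypeByte(2, flagFilterSkipped) // compressed data, filter skipped
	d, f := unpackTypeByte(b)
	fmt.Printf("%#x %d %v\n", b, d, f&flagFilterSkipped != 0) // 0x82 2 true
}
```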
13.4 Pack-embedded recovery record (PackHeaderEntry.Type = 5, ratified 2026-05-13 via ticket #009)
A pack file may include a Reed-Solomon recovery record that lets the data section be reconstructed in place after partial corruption. The record lives inside the encrypted pack header as a single Type=5 entry — distinct from the sidecar mode in §13.1, which keeps parity in a separate .kz.rs file.
Pack-header wire format (after decrypt, little-endian):
```
+--------+----------------+------------------------------+
| 0x05 | u32 LE length | kzre-v1 payload (length B) |
+--------+----------------+------------------------------+
```
The payload itself is the kzre-v1 blob produced by internal/krecovery.EncodeEmbedded:
```
+-------+-----+--------+--------+----------+----------+--------+----------+
| KZRE | ver | dShard | pShard | reserved | dataSize | sha256 | shard |
| (4B) | (1) | (1) | (1) | (1) | (4 LE) | (32B) | hashes |
+-------+-----+--------+--------+----------+----------+--------+----------+
+---------+
| parity |
| shards |
+---------+
```
- `KZRE` magic distinguishes it from the sidecar's `KZRS`; `IsEmbedded(blob)` is the cheap 4-byte peek.
- Per-shard SHA-256 covers every shard (data shards, then parity shards). Repair hashes each shard, marks failures as nil, and reconstructs only when `corruptCount ≤ ParityShards`; there is no brute-force suspect loop.
- Little-endian length (`dataSize`) matches §13.2 / §13.3. Sidecar mode stays big-endian for restic-format alignment.
- `pickShards(ratio)` mirrors the sidecar logic: aim for ~256 total shards, with parity = `clamp(ratio × 256, 1, 128)` and data = `256 − parity`.
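The shard-count rule, plus the ceil-based shard size that §13.1 uses for the sidecar, as a small Go sketch. Assumptions: `ratio × 256` truncates toward zero, and the embedded record reuses the sidecar's `ceil(dataSize / dShard)` shard size. `pickShards` is named after the spec's helper, but this is not the kzip code, and `shardSize` is hypothetical.

```go
package main

import "fmt"

// pickShards follows the §13.4 rule: ~256 total shards, parity clamped to
// [1, 128]. Truncating ratio*256 toward zero is an assumption of this sketch.
func pickShards(ratio float64) (data, parity int) {
	parity = int(ratio * 256)
	if parity < 1 {
		parity = 1
	}
	if parity > 128 {
		parity = 128
	}
	return 256 - parity, parity
}

// shardSize is the per-shard length for dataSize bytes: ceil(dataSize / data).
func shardSize(dataSize, data int) int {
	return (dataSize + data - 1) / data
}

func main() {
	d, p := pickShards(0.10)
	fmt.Println(d, p, shardSize(16<<20, d)) // 231 25 72629
}
```

Under these assumptions the parity section is about `parity × shardSize` bytes. That is the number to compare against `MaxRecoveryEntrySize` when choosing a ratio.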
Constraints:
- The Type=5 entry payload is capped at `MaxRecoveryEntrySize = 8 MiB`. The cap is a sanity gate against bogus length fields, not headroom for large records: under `pickShards`, a 16 MiB pack at a 50% ratio gets 128 parity shards of 128 KiB each (16 MiB of parity), which exceeds it.
- Multiple Type=5 entries in the same header are rejected (one record per pack).
- Per-shard hashes are checked before any Reed-Solomon math — corrupt shards are identified in O(N) rather than tried in turn.
Backward-compat for readers without recovery support:
- A v1 reader (restic v0.18.x or pre-#009 kzip) that encounters `Type=5` MUST skip the entry; the u32 length tells it how many bytes to advance. Skipping is safe: the data section is byte-identical with or without the recovery record (the parity blob never modifies blob bytes).
- A kzip-#009 reader on a v1 pack without `Type=5` is the common case: no record means no repair is available, and the pack reader returns whatever the underlying check produces.
CLI surface:
| Command | Function |
|---|---|
| `kzip backup --recovery-records=N%` | Emit Type=5 in every written pack (#009 R3) |
| `kzip recovery repair-pack <pack-id>` | Read Type=5, reconstruct the corrupted data section, atomically replace the pack on the backend (#009 R4) |
kzip recovery {encode,verify,repair} continue to operate on sidecar .kz.rs files — distinct flow for files outside a repo.
13.5 KZC compressor envelope (payload-prefix, ratified 2026-05-19 via ticket #018 / RFC-002)
Per-blob compressor descriptor for pluggable compressor backends (zstd / lzma2 / brotli, with reserved IDs for future first-party backends). The descriptor lives inside the compressed-then-encrypted payload (it is the first thing visible after decryption), not in the pack-header type space; the change is strictly additive and no PackHeaderEntry.Type value is reallocated.
Post-decryption layout:
```
+-----+-----+-----+-----+-----+----------+----------+
| 'K' | 'Z' | 'C' | ver | CID | reserved | payload |
| 4B | 5A | 43 | 01 | N | 00 | ... |
+-----+-----+-----+-----+-----+----------+----------+
```
Reader-side dispatch (post-decryption, pre-decompression):
- Byte 0 == `0x28` → bare zstd frame (legacy path, every pre-#018 blob). Decode with the zstd backend.
- Bytes 0..3 == `0x4B 0x5A 0x43 0x01` → KZC envelope. Parse byte 4 as the CompressorID and dispatch to the registered backend; bytes 6+ are the compressor-native payload.
- Anything else → corruption or an unknown future format; surface it as a typed error pointing the operator at `kzip version` and `kzip migrate`.
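The three dispatch rules reduce to a prefix check on the decrypted bytes. A minimal Go sketch; `classify` and `payloadKind` are hypothetical names. It relies on the fact that `0x28` is the first byte of the little-endian zstd frame magic `0xFD2FB528`, and that a KZC header can never start with `0x28`.

```go
package main

import (
	"bytes"
	"fmt"
)

type payloadKind int

const (
	kindUnknown  payloadKind = iota // corruption or unknown future format
	kindBareZstd                    // legacy: bare zstd frame
	kindKZC                         // KZC envelope
)

var kzcMagic = []byte{0x4B, 0x5A, 0x43, 0x01} // "KZC" + ver 0x01

// classify inspects a decrypted blob payload. For KZC it returns the
// CompressorID (byte 4) and the codec-native payload (bytes 6+).
func classify(p []byte) (kind payloadKind, cid byte, body []byte) {
	switch {
	case len(p) >= 1 && p[0] == 0x28:
		return kindBareZstd, 0, p
	case len(p) >= 6 && bytes.Equal(p[:4], kzcMagic):
		return kindKZC, p[4], p[6:]
	default:
		return kindUnknown, 0, nil
	}
}

func main() {
	kind, cid, body := classify([]byte{0x4B, 0x5A, 0x43, 0x01, 2, 0, 0xAA, 0xBB})
	fmt.Println(kind == kindKZC, cid, len(body)) // true 2 2
}
```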
CompressorID registry (RFC-002 §3.3):
| ID | Name |
|---|---|
| 0 | reserved (do not emit) |
| 1 | zstd (never emitted in canonical kzip — zstd blobs are bare) |
| 2 | lzma2 |
| 3 | brotli |
| 4..127 | reserved (first-party backends) |
| 128..255 | reserved (downstream forks) |
Backward-compat:
- Zstd blobs (the default) stay bare. Pre-#018 readers see byte-identical bytes — the format is fully backward-compatible for the common case.
- A pre-#018 reader encountering a KZC-enveloped blob fails at the zstd frame check (`zstd: invalid input`). The AEAD over the ciphertext catches any tampering before decompression, so a silent mis-decode is impossible.
- A post-#018 reader on a pre-#018 pack: identical to the bare-zstd legacy path. T3 in RFC-002 §8 asserts a byte-identical restore.
Mixed-backend repositories are valid; selection is per-SaveBlob, threaded via Repository.Options.Compressor. kzip check decodes every blob through the right backend automatically — operators do not need to track which packs are which.
CLI surface:
| Command | Function |
|---|---|
| `kzip backup --compressor=<name>[:level]` | Pick the backend for the new blobs in this backup |
| `kzip compressor list` | List registered backends with their IDs + level ranges |
| `$KZIP_COMPRESSOR` | Env-var equivalent of `--compressor` |
Posture vs §14 divergence policy: this is a non-breaking addition — the encrypted-payload envelope is invisible to pack-header parsers. It does not constitute a §14 divergence; no repository.version bump.
13.6 Detached-signature sidecar kzig-v1 (.kz.sig, ratified 2026-05-19 via ticket #023)
Out-of-band Ed25519 signature for any file (compressed or not). The sidecar lives next to the covered file with suffix .kz.sig — analogous to the .kz.rs parity sidecar from §13.1.
Wire format (fixed 101 bytes, little-endian where applicable):
```
+-----+--------+----------+--------------+
| 4B | 1B | 32B | 64B |
|KZIG | algo | pubkey | signature |
+-----+--------+----------+--------------+
```
| Field | Width | Encoding | Meaning |
|---|---|---|---|
| `magic` | 4B | bytes | Exactly `0x4B 0x5A 0x49 0x47` (`KZIG`). Distinguishes it from `KZRS` (RS parity sidecar) and `KZ\x01\x00` (single-file stream). |
| `algo` | 1B | u8 | Signature algorithm. `0x01` = Ed25519 is the only value defined; readers MUST reject other values. |
| `pubkey` | 32B | bytes | Ed25519 public key of the signer. Embedded so verification works without an out-of-band recipient file; the hardening posture is to pass `--pubkey` to pin a known signer. |
| `signature` | 64B | bytes | Ed25519 signature over SHA-256(covered-file-bytes). Streaming-safe: the hash is computed incrementally, so the covered file may be multi-gigabyte without buffering. |
Constraints:
- `len(file) == 101`. Readers reject shorter or longer files with `kzsig: wrong sidecar size`.
- `algo` MUST be `0x01` (Ed25519). Future algorithms add new values; writers MUST NOT emit a value the spec hasn't ratified.
- The signed message is `SHA-256(covered-file)`, not the file directly; this keeps the signing input at 32 bytes regardless of covered-file size.
Identity file formats (companion to §3.4.x recipient files):
```
Signing identity (private, 0600): KZIP-SIGNING-IDENTITY-V1-<base64-32B-seed>
Signing recipient (public, 0644): KZIP-SIGNING-RECIPIENT-V1-<base64-32B-pub>
```
Suffixes .kzkey (private) and .kzpub (public) are the canonical extensions.
CLI surface (kzip #023):
| Command | Function |
|---|---|
kzip key gen-signing-identity --output <prefix> | Mint a fresh Ed25519 keypair; writes <prefix>.kzkey + <prefix>.kzpub |
kzip sign --signing-identity <file> <path>... | Produce <path>.kz.sig for each file |
kzip verify <path>... | Validate <path>.kz.sig against the file; exit 0 only on full success |
kzip verify --pubkey <hex\|file> | Hardening: require the sidecar's embedded pubkey to match the pinned one |
kzip verify --keyring <dir> | Hardening: require the sidecar's pubkey to be one of the .kzpub files in the directory |
Posture vs §14 divergence policy: the sidecar is out-of-band — it does NOT touch the pack-format wire layout and is invisible to restic v0.18.x. This is non-breaking.
Why a separate file from .kz.rs? Both sidecars sit beside the covered file but answer different questions. .kz.rs answers "can I reconstruct corrupted bytes?"; .kz.sig answers "who created these bytes?". Keeping them in distinct files lets operators ship parity without exposing the signing key, or sign without paying the parity overhead.
14. Divergence policy
Changes that break byte-for-byte compat with restic v0.18.x:
- Require a new RFC (e.g. kzip-RFC-002-format-divergence.md) with:
  - Justification (a feature that is impossible with the current format)
  - Migration path (forward-compat where possible)
- Bump repository.version (to 3 or higher)
- Lifecycle:
  - A --migrate command to convert v2 repos → vN
  - Read support for v2 kept for ≥1 year after the bump
- Explicit snapshot note: kzip 0.X.0 introduced repo v3, see CHANGELOG
Changes that keep byte-compat (non-breaking):
- Adding new JSON fields (forward-compat by convention)
- Adding Koder-specific tags
- Adding algorithm settings restic already understands (e.g. higher zstd levels)
These need no RFC, only an update to this spec plus a CHANGELOG entry.
14.5 Fifth active divergence: Configurable blob-ID hash algorithm (kzip ticket #026/#028, 2026-05-19)
A repository's blob-ID hash algorithm is selectable at init time via kzip init --hash=<name>. The choice is persisted in Config.HashAlg; subsequent operations dispatch through that algorithm transparently. Registered algorithms (kzip #026):
| Name | Digest size | Notes |
|---|---|---|
sha256 | 32 bytes | Default. restic-compatible (SHA-256 is the only blob-ID hash restic upstream uses). |
blake3 | 32 bytes | ~2–3× faster than SHA-256 on modern x86. |
All registered algorithms produce 32-byte digests, so the [32]byte on-disk ID layout is unchanged. Variable-length digests (e.g. 8-byte xxhash) would require a separate divergence and are deferred.
Config.HashAlg = "" or "sha256" is byte-compatible with restic v0.18.x (the legacy default). Config.HashAlg = "blake3" is the breakage point. restic always hashes with SHA-256, so the IDs it computes match none of the stored blob IDs. It then fails cleanly rather than silently corrupting data.
Dispatch contract:
- Process-wide: restic.SetHashAlg(name, fn) sets a package-level dispatch target consumed by restic.Hash(data). The first repo opened in a process locks the algorithm; subsequent opens with a different algorithm return ErrHashAlgConflict.
- Per-blob: not supported. The whole repo uses one algorithm. Cross-algorithm mid-life migration is not supported in v1; the supported workflow is restore-into-staging + re-init-with-new-hash + re-backup.
14.4 Fourth active divergence: Per-blob filter-skip flag (kzip ticket #017, 2026-05-15)
The high nibble (0xF0) of the pack-header entry Type byte (§13.3.1) encodes opt-in BlobFlags. The first defined flag is FlagFilterSkipped = 0x80. A pre-#017 kzip reader or restic v0.18.x sees Type=0x80..0xFF and fails with invalid type. This is the same one-directional break as §14.1/§14.3. The flag is opt-in through writer behavior: kzip without a content-aware filter chain (no bcj-* in --filter) never sets the bit. Its packs are byte-identical to pre-#017 ones.
Skip-by-length contract: the high nibble doesn't change entry size. A generic skip-unknown-types reader (which neither restic nor kzip implements) could therefore mask Type & 0x0F to recover the dispatch nibble and proceed. Legacy readers (restic v0.18.x, pre-#017 kzip) don't do that. They reject the byte outright, following the strict-fail-better-than-silent-corrupt principle (§14.3 rationale).
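The sketch below contrasts a legacy reader with a flag-aware one when handling the Type byte. The function names are hypothetical. The base types 0 to 5 come from this section: restic's 0-3, plus Type=4 (filter chain, §14.1) and Type=5 (recovery records, §14.3). The 0x80 flag also comes from this section. Rejecting unknown flag bits is an assumption, chosen to match the strict-fail principle.

```go
package main

import "fmt"

const (
	typeMask          = 0x0F // dispatch nibble
	flagMask          = 0xF0 // BlobFlags nibble
	FlagFilterSkipped = 0x80 // first defined flag (#017)
)

// splitType separates the dispatch nibble from the flag nibble.
func splitType(b byte) (base, flags byte) {
	return b & typeMask, b & flagMask
}

// legacyCheck models restic v0.18.x: only types 0-3 exist, anything else is fatal.
func legacyCheck(b byte) error {
	switch b {
	case 0, 1, 2, 3:
		return nil
	default:
		return fmt.Errorf("invalid type %d", b)
	}
}

// flagAwareCheck models a post-#017 reader: strip known flags, then dispatch.
func flagAwareCheck(b byte) (base, flags byte, err error) {
	base, flags = splitType(b)
	if flags&^FlagFilterSkipped != 0 {
		return 0, 0, fmt.Errorf("unknown blob flags %#x", flags)
	}
	if base > 5 {
		return 0, 0, fmt.Errorf("invalid type %d", base)
	}
	return base, flags, nil
}
```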
14.3 Third active divergence: Type=5 recovery records (kzip ticket #009, 2026-05-13)
Pack files written with kzip backup --recovery-records=N% carry a Type=5 entry per §13.4. Per §13 (header type byte legend), values 5-255 are reserved for kzip extensions; restic v0.18.x doesn't recognise this type byte. The vendored restic parser returns invalid type 5 and aborts, the same posture as Type=4 (filter chain).
Skip-by-length is the contract for both: a v1 reader that knows to advance by the u32-encoded length field (Type=4 and Type=5 share this layout) can still read the regular blob entries. Stock restic doesn't do that — it returns an error on the first unknown type. The breakage is one-directional: kzip can read packs that restic wrote (no Type=5 ever); restic can NOT read kzip packs with --recovery-records set.
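The skip-by-length contract above can be sketched over a decrypted pack header. Entry sizes for types 0-3 follow restic's design document. Each entry holds a type byte, a little-endian u32 encrypted length, an extra u32 plaintext length for compressed types, and a 32-byte ID. The Type=4/5 layout used here is an assumption: a type byte, a little-endian u32 payload length, then the payload. It stands in for §13.3/§13.4, so check those sections before relying on it. `walkHeader` and `entry` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// entry is one regular blob record recovered from the header.
type entry struct {
	Type   byte
	Length uint32 // length of the encrypted blob
}

// walkHeader walks decrypted header entries. skipExtensions=false behaves like
// stock restic (abort on Type=4/5); true advances past them by their length.
func walkHeader(hdr []byte, skipExtensions bool) ([]entry, error) {
	var out []entry
	for off := 0; off < len(hdr); {
		if len(hdr)-off < 5 {
			return nil, fmt.Errorf("truncated entry at offset %d", off)
		}
		t := hdr[off]
		length := binary.LittleEndian.Uint32(hdr[off+1 : off+5])
		var size int
		switch t {
		case 0, 1: // data/tree: type + u32 length + 32-byte ID
			size = 1 + 4 + 32
		case 2, 3: // compressed data/tree: adds u32 plaintext length
			size = 1 + 4 + 4 + 32
		case 4, 5: // kzip extensions (assumed: type + u32 payload length + payload)
			if !skipExtensions {
				return nil, fmt.Errorf("invalid type %d", t)
			}
			size = 1 + 4 + int(length)
		default:
			return nil, fmt.Errorf("invalid type %d", t)
		}
		if len(hdr)-off < size {
			return nil, fmt.Errorf("truncated entry at offset %d", off)
		}
		if t <= 3 {
			out = append(out, entry{Type: t, Length: length})
		}
		off += size
	}
	return out, nil
}
```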
Hard-fail is strictly safer than silent skip here: silent skip would mean a restic operator sees the pack as parseable but loses all repair capacity without knowing.
14.2 Second active divergence: recipient-mode keys/ (kzip ticket #005, 2026-05-13)
Recipient-mode key files (§3.4.1) carry recipients instead of kdf/data. A restic v0.18.x reader accepts the file as valid JSON but fails to open it ("KDF is not scrypt"). It does NOT silently corrupt anything: the failed open is read-only and has no side effects. kzip provides OpenKeyWithIdentity to consume these files. Mixed-mode repos (some password keys, some recipient keys) work transparently in both directions via the routing-error sentinels.
This divergence is opt-in: users who don't pass --recipient to kzip init get a fully restic-byte-compat repo. Once a recipient-mode key file lands, the operator commits to kzip-aware tooling for opening — but the pack files and snapshots remain byte-compat as long as no --filter= is used (cf. §14.1).
14.1 First active divergence: Type=4 (kzip ticket #015, 2026-05-13)
The filter-chain descriptor (§13.3) became the first byte-compat-breaking divergence shipped to users. A restic v0.18.x reader hitting a pack with a Type=4 entry returns invalid type 4 and aborts. This is acceptable because:
- The break only manifests on packs actually written with --filter=…. No-filter packs (the byte-compat default) still parse cleanly under stock restic.
- The reverse-pipeline contract (decrypt → decompress → chain.Reverse) is the only way a reader can honour the data. A filter-unaware reader that skipped chain.Reverse would silently return the filtered (incorrect) bytes as plaintext. Hard failure is strictly safer than silent corruption.
- The repository.version field is unchanged (still 2). Migration is one-way: kzip-written filtered repos cannot be downgraded to restic without re-archiving from source. Plain (no --filter=) kzip repos remain interchangeable with restic v0.18.x.
Future byte-compat-breaking extensions (e.g. Type=5 RS pack-embedded, #009) follow the same pattern: gated by an opt-in CLI surface, hard-fail on legacy readers, no silent-misread escape hatch.
- Regression tests in engines/compress/kzip/tests/regression/ must include a golden-hash comparison against restic v0.18.x binaries to guarantee interop.
- Tests in engines/compress/kzip/{cmd,internal}/**/*_test.go (the test suite inherited from restic v0.18.1, top-level since #008) are preserved as-is.
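A minimal helper for the golden-hash comparison is sketched below; the names are hypothetical. restic encrypts every blob with a random IV, so repository files differ byte-for-byte from run to run. The sketch therefore assumes the golden digest covers a deterministic artifact, such as restored file contents or decrypted blob plaintext, recorded from a restic v0.18.x run.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// checkGolden compares SHA-256(got) with a hex digest recorded from restic.
// The golden value is normalized (trimmed, lowercased) so files with a
// trailing newline or uppercase hex still compare correctly.
func checkGolden(name string, got []byte, goldenHex string) error {
	sum := sha256.Sum256(got)
	gotHex := hex.EncodeToString(sum[:])
	want := strings.ToLower(strings.TrimSpace(goldenHex))
	if gotHex != want {
		return fmt.Errorf("%s: golden mismatch: got %s, want %s", name, gotHex, want)
	}
	return nil
}
```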
Annex A: kzip ↔ restic mapping
During the bootstrap, every restic term is equivalent to the corresponding kzip term. Mapping:
| Restic | Kzip | Notes |
|---|---|---|
restic init | kzip init | same behavior |
| repository | repository (.kz/ or .tar.kz) | single .kz extension, dispatch by magic bytes (§11) |
| pack file | pack file | identical layout |
| blob | blob | identical |
| snapshot | snapshot | identical |
| scrypt KDF | scrypt KDF | identical |
| AES-256-CTR + Poly1305 | AES-256-CTR + Poly1305 | identical |
| Rabin chunker | Rabin chunker | identical |
When divergence begins (ticket #003 BCJ filters, etc.), entries will be added to this annex with the divergence date.
References
- engines/compress/kzip/docs/rfcs/RFC-001-charter.md
- engines/compress/kzip/docs/upstream/restic-NOTICE.md
- https://github.com/restic/restic/blob/v0.18.1/doc/design.rst
- https://github.com/restic/restic/blob/v0.18.1/doc/references/design.rst