Always-on test recipes T1-T9
testing specs/testing/always-on-recipes.kmd
Concrete recipes for the 9 mandatory test templates in `policies/always-on.kmd § Templates de teste mandatórios`. Each recipe has setup, run commands, asserts, and known calibration. Components copy, paste, and tweak; they do not reinvent.
Spec: Always-on test recipes T1–T9
Status: draft v0.1 (2026-05-24). Recipes validated in production in at least one component will be promoted to stable. Components consult this doc before writing T-suites to avoid reinvention.
Common conventions
- Test host: heavy tests run in a VM on `s.khost1` per `policies/test-host-isolation.kmd`; recipes in this doc are no exception. Commands below assume PWD = repo root.
- Headless SDK: `policies/headless-first.kmd` R8 mandates reuse of `engines/sdk/koder_test_*`. Where it makes sense, recipes point to the SDK instead of shelling out.
- Compat window: recipes assume the default R1.1 window (2 minor + 1 major + 180 days). Components that TIGHTEN adjust their matrices proportionally.
- Language: snippets are in Go when the component is Go, in Dart for Flutter, and in Bash for orchestration. Adapt to the target's stack.
T1 — N × N-1 matrix (R1.1, R1.2, R1.3)
Goal: every client↔server version combination within the R1.1 window passes the component's smoke test, with no 4xx/5xx seen by the client.
Setup
# tests/compat/docker-compose.yml
services:
  server-N-2:
    image: ghcr.io/koder/<component>:${VERSION_N_MINUS_2}
    ports: ["18080:8080"]
  server-N-1:
    image: ghcr.io/koder/<component>:${VERSION_N_MINUS_1}
    ports: ["18081:8080"]
  server-N:
    image: ghcr.io/koder/<component>:${VERSION_N}
    ports: ["18082:8080"]
Run
# tests/compat/run-matrix.sh
set -euo pipefail
versions=(N-2 N-1 N)
# Host ports published by tests/compat/docker-compose.yml, keyed by server version.
declare -A port=([N-2]=18080 [N-1]=18081 [N]=18082)
for c in "${versions[@]}"; do
  for s in "${versions[@]}"; do
    echo "== client=$c server=$s =="
    KODER_SERVER_URL="http://localhost:${port[$s]}" \
    KODER_CLIENT_VERSION="$c" \
      go test ./tests/compat/... -run TestSmoke -tags compat
  done
done
Assert
- All 3×3 = 9 combinations exit 0.
- No 5xx in server logs.
- No "schema mismatch" or "version too old" errors in client logs.
Notes
- N-2 included: only if `window_minor_versions ≥ 2` (Stack default). Components that TIGHTEN to N-3 add a 4th row/column.
- Image source: GHCR is the example; substitute Hub registry (`hub.koder.dev/apps/<slug>:<version>`) for Koder-hosted artifacts.
- Per-bug regression: when a compat bug is fixed, add a test under `tests/compat/regression/` per `policies/regression-tests.kmd`.
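For components that drive the matrix from a Go harness rather than a shell loop, the client × server enumeration can be sketched as below. This is only an illustration: `pair`, `matrix`, and the port map are hypothetical names, and the port values simply mirror the example compose file.

```go
package main

import "fmt"

// pair is one cell of the compat matrix (hypothetical helper type).
type pair struct {
	Client    string
	Server    string
	ServerURL string
}

// matrix enumerates every client×server combination.
// ports maps a server version label to the host port it is published on;
// a missing entry is an error rather than a silently wrong URL.
func matrix(versions []string, ports map[string]int) ([]pair, error) {
	var cells []pair
	for _, c := range versions {
		for _, s := range versions {
			p, ok := ports[s]
			if !ok {
				return nil, fmt.Errorf("no port mapped for server %q", s)
			}
			cells = append(cells, pair{
				Client:    c,
				Server:    s,
				ServerURL: fmt.Sprintf("http://localhost:%d", p),
			})
		}
	}
	return cells, nil
}
```

Each returned cell can then be run as a subtest, with `ServerURL` and `Client` passed the same way the shell loop passes `KODER_SERVER_URL` and `KODER_CLIENT_VERSION`.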
T2 — Unknown-field round-trip (R2.1, R2.3)
Goal: the parser preserves an unknown new field on re-emit; an unknown enum value degrades gracefully without panicking.
Setup
// internal/wire/unknown_field_test.go

//go:build compat

package wire

import (
	"bytes"
	"encoding/json"
	"testing"
)

// Sentinel payload with a future field the current parser doesn't model.
const futurePayload = `{
  "v": 3,
  "kind": "MessageDelivered",
  "payload": { "id": 42 },
  "future_field": { "nested": "must survive round-trip" }
}`
Run
go test ./internal/wire/... -run TestUnknownFieldRoundTrip -tags compat -v
Assert
func TestUnknownFieldRoundTrip(t *testing.T) {
	// A generic map stands in for the parser here; swap in the component's
	// own wire type to exercise the real decode/encode path.
	var doc map[string]any
	if err := json.Unmarshal([]byte(futurePayload), &doc); err != nil {
		t.Fatal(err)
	}
	out, err := json.Marshal(doc)
	if err != nil {
		t.Fatal(err)
	}
	if !bytes.Contains(out, []byte(`"future_field"`)) {
		t.Fatal("future_field stripped on round-trip (R2.1 violation)")
	}
}

func TestUnknownEnumDegrades(t *testing.T) {
	// Enum value 99 is not yet known. Must NOT panic; must map to the UNSPECIFIED sentinel.
	k := ParseMessageKind(99)
	if k != MESSAGE_KIND_UNSPECIFIED {
		t.Fatal("unknown enum should degrade to UNSPECIFIED (R2.3 violation)")
	}
}
Notes
- For protobuf: rely on `proto3` default of preserving unknown fields (since protoc-gen-go 1.21). Verify with `proto.MessageReflect(m).GetUnknown()`.
- For KMD/KVG/KPKG: parser MUST tolerate unknown directives. See `specs/document-format.kmd` for the formal contract.
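The enum test calls `ParseMessageKind` without showing it. One way such a function could satisfy R2.3 is sketched below. The enum members other than `MESSAGE_KIND_UNSPECIFIED`, and all numeric values, are assumptions made for illustration and not the component's real schema.

```go
package main

// MessageKind mirrors a wire enum. Member names (apart from UNSPECIFIED)
// and numeric values are illustrative assumptions.
type MessageKind int32

const (
	MESSAGE_KIND_UNSPECIFIED MessageKind = 0
	MESSAGE_KIND_DELIVERED   MessageKind = 1
	MESSAGE_KIND_READ        MessageKind = 2
)

// ParseMessageKind maps a raw wire value to a known kind.
// Any value outside the known set degrades to UNSPECIFIED instead of
// panicking or returning an error, so newer peers cannot crash older ones.
func ParseMessageKind(raw int32) MessageKind {
	switch k := MessageKind(raw); k {
	case MESSAGE_KIND_DELIVERED, MESSAGE_KIND_READ:
		return k
	default:
		return MESSAGE_KIND_UNSPECIFIED
	}
}
```

Listing the known members explicitly in the `switch` means that adding an enum value is a deliberate code change, and every value not yet listed takes the safe default.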
T3 — Rolling upgrade simulated (R4.1, R4.2, R4.3)
Goal: 3 replicas, a rolling restart with load generated in parallel, and zero 5xx seen by the client during the window.
Setup
# tests/rollout/docker-compose.yml
services:
  lb:
    image: koder-jet:stable
    ports: ["8080:80"]
    depends_on: [app-1, app-2, app-3]
  app-1: &app
    image: ghcr.io/koder/<component>:${VERSION_OLD}
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/healthz"]
      interval: 2s
      retries: 3
  app-2:
    <<: *app
  app-3:
    <<: *app
Run
# tests/rollout/rolling-upgrade.sh
set -euo pipefail
compose=(docker compose -f tests/rollout/docker-compose.yml)
"${compose[@]}" up -d
sleep 5 # warm up
# Start load generator (sustained 50 RPS for 90s).
# hey applies -q per worker, so 5 workers × 10 QPS = 50 RPS.
hey -z 90s -q 10 -c 5 http://localhost:8080/healthz > /tmp/load.log &
LOAD_PID=$!
# Roll each replica to NEW version, one at a time
for i in 1 2 3; do
  "${compose[@]}" stop "app-$i"
  "${compose[@]}" --env-file=tests/rollout/new-version.env up -d "app-$i"
  # Wait for this replica's own healthcheck before the next one; polling
  # /healthz through the LB would be answered by the other replicas.
  cid=$("${compose[@]}" ps -q "app-$i")
  until [ "$(docker inspect -f '{{.State.Health.Status}}' "$cid")" = "healthy" ]; do
    sleep 1
  done
done
wait $LOAD_PID
Assert
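# Parse hey's report: pass if 0 non-2xx responses and 0 transport errors during the entire 90s.
errors=$(awk '/^Error distribution:/ { e = 1; next }
  e && /^[ \t]*\[[0-9]+\]/ { gsub(/[^0-9]/, "", $1); n += $1; next }
  /^[ \t]*\[[0-9]+\][ \t]+[0-9]+ responses/ { if (substr($1, 2, 1) != "2") n += $2 }
  END { print n + 0 }' /tmp/load.log)
test "$errors" = "0" || { echo "FAIL: $errors non-2xx or failed requests during rollout"; exit 1; }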
Notes
- Load tool: `hey` is the example; substitute `wrk2`, `vegeta`, or `k6` if your team standardises on those.
- R4.2: if any replica answers 200 on `/healthz` before its DB pool is warm, the LB sends traffic to a half-ready instance → 5xx for the user. Tests will detect this naturally.
- R4.3 graceful shutdown: send SIGTERM (not SIGKILL) during the swap; `docker compose stop` does this. Verify in-flight requests drain by sampling latency: `p99 < 5×p50` during the rollout window.
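The `p99 < 5×p50` drain check in the last note can be computed from raw latency samples. The sketch below assumes latencies have already been collected in milliseconds and uses the nearest-rank percentile definition. The function names are hypothetical, and a load tool's own percentile report may use a different interpolation.

```go
package main

import "sort"

// percentile returns the nearest-rank percentile p (1..100) of the samples,
// or 0 when there are no samples. The input slice is not modified.
func percentile(latenciesMs []float64, p int) float64 {
	n := len(latenciesMs)
	if n == 0 {
		return 0
	}
	sorted := append([]float64(nil), latenciesMs...)
	sort.Float64s(sorted)
	rank := (p*n + 99) / 100 // integer ceil(p*n/100)
	if rank < 1 {
		rank = 1
	}
	if rank > n {
		rank = n
	}
	return sorted[rank-1]
}

// drainsCleanly reports whether p99 < 5×p50 for the rollout window.
// An empty sample set is treated as a failure rather than a pass.
func drainsCleanly(latenciesMs []float64) bool {
	p50 := percentile(latenciesMs, 50)
	p99 := percentile(latenciesMs, 99)
	return p50 > 0 && p99 < 5*p50
}
```

With 100 samples of 10 ms the check passes; if two of them stall at 100 ms, p99 becomes 100 ms against a 50 ms limit and the check fails.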
T4 — Schema migration on a production-like snapshot (R3.1, R3.2, R3.3, R3.4)
Goal: apply the migration to a prod-like snapshot; measure locks, downtime, and failures of concurrent queries; reject if any query blocks for > 100 ms.
Setup
# tests/migrations/prepare-snapshot.sh
# Restore the most recent prod-like snapshot to a scratch DB.
pg_restore -d migration_test \
--no-owner --clean --if-exists \
/var/snapshots/${COMPONENT}-prod-like-latest.dump
Run
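# tests/migrations/run-with-load.sh
set -euo pipefail
# 1. Start sustained read+write load against the scratch DB.
#    pgbench writes its -P progress lines to stderr, so capture both streams.
pgbench -c 10 -j 2 -T 120 -P 1 migration_test > /tmp/pgbench.log 2>&1 &
PG_PID=$!
# 2. Apply the migration after 10s; ON_ERROR_STOP makes psql exit non-zero on the first failed statement
sleep 10
psql -v ON_ERROR_STOP=1 migration_test -f migrations/${VERSION}-up.sql 2>&1 \
  | tee /tmp/migration.log
# 3. Wait for the load to finish
wait $PG_PID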
Assert
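# With -P 1, pgbench prints one "progress: ... lat <ms> ms stddev ..." line per second. Look for spikes.
peak=$(awk '/^progress/ { for (i = 1; i < NF; i++) if ($i == "lat") print $(i + 1) }' /tmp/pgbench.log | sort -n | tail -1)
test -n "$peak" || { echo "FAIL: no progress lines found in /tmp/pgbench.log"; exit 1; }
echo "Peak per-second latency_avg: ${peak}ms"
awk -v lim=100 -v peak="$peak" 'BEGIN { exit (peak+0 > lim) ? 1 : 0 }' || {
  echo "FAIL: migration caused ${peak}ms peak latency (limit 100ms; R3.2 / R3.4 violation)"
  exit 1
}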
Notes
- Snapshot freshness: ≤ 24h old; older snapshots may miss recent schema/data shapes that trigger lock paths.
- Online DDL only: `CREATE INDEX CONCURRENTLY`, `ALTER ... ADD COLUMN` (no rewrite), `pg_repack` for reorders. Any plain `CREATE INDEX` or bare `ALTER NOT NULL` on a populated column fails T4 by construction.
- For kdb: substitute the same pattern with kdb's online DDL paths. See `infra/data/kdb/docs/` for the canonical kdb migration helpers.
- Per migration: T4 runs once per migration file; baseline kept in `registries/perf-baseline.md` per component.
T5 — Chaos: optional dependency dies (R5.1, R5.2, R5.3, R5.4)
Goal: kill an optional dependency → the product answers 200 on non-dependent features; the dependent feature degrades with a clear message (R5.1); the circuit breaker opens (R5.2); timeouts and jitter are applied (R5.3, R5.4).
Setup
# tests/chaos/docker-compose.yml
services:
  app:
    image: ghcr.io/koder/<component>:stable
    ports: ["8080:8080"] # host port used by the curl checks in Run
    environment:
      # Point at the toxiproxy listener (9999), not its admin API (8474).
      AI_GATEWAY_URL: http://toxiproxy:9999
  toxiproxy:
    image: ghcr.io/shopify/toxiproxy:2.5.0
    ports: ["8474:8474"]
  ai-gateway:
    image: ghcr.io/koder/ai-gateway:stable
Run
# Bring up; configure toxiproxy to route ai → ai-gateway:9000
docker compose -f tests/chaos/docker-compose.yml up -d
curl -X POST http://localhost:8474/proxies -d '{
"name": "ai",
"listen": "0.0.0.0:9999",
"upstream": "ai-gateway:9000",
"enabled": true
}'
# Baseline: feature works
curl -fsS http://localhost:8080/generate-with-ai
# Inject failure: 100% packets dropped
curl -X POST http://localhost:8474/proxies/ai/toxics -d '{
"type": "timeout",
"attributes": {"timeout": 0}
}'
# Re-test the same and an unrelated endpoint
curl -fsS http://localhost:8080/healthz # MUST still 200
status=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/generate-with-ai)
echo "Feature with failed dep returned: $status"
Assert
- `/healthz` returns 200 throughout.
- `/generate-with-ai` returns 5xx (e.g. 503 + JSON `{"error": "ai_unavailable"}`) and NEVER hangs past the timeout configured per R5.3.
- After the dependency is restored, `/generate-with-ai` recovers within one circuit-breaker cooldown window (60s default per R5.2).
- Logs show retries with jitter ≥ 25% between attempts (R5.4).
Notes
- `toxiproxy` toxics: `timeout`, `latency`, `slow_close`, `bandwidth`, `slicer`. Use `timeout: 0` for total outage; `latency: 5000` for slow-loris.
- For non-network deps (disk full, /dev/random blocked), use `chaos-mesh` or LXC-level fault injection.
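To make the R5.4 jitter requirement concrete, here is a minimal sketch of a retry delay calculation. It assumes "jitter ≥ 25%" means each delay is spread by ±25% around an exponential backoff, which is one interpretation and not taken from the policy text. The function name and parameters are hypothetical.

```go
package main

// jitteredDelayMs returns the delay before retry number attempt (0-based).
// The base delay doubles per attempt and is capped at maxMs, then scaled by
// a random factor in [1-jitter, 1+jitter). rnd must return values in [0, 1);
// in production code math/rand's rand.Float64 fits that contract.
func jitteredDelayMs(attempt int, baseMs, maxMs int64, jitter float64, rnd func() float64) int64 {
	d := baseMs
	for i := 0; i < attempt && d < maxMs; i++ {
		d *= 2
	}
	if d > maxMs {
		d = maxMs
	}
	factor := 1 - jitter + 2*jitter*rnd()
	return int64(float64(d) * factor)
}
```

Passing the random source as a function keeps the calculation deterministic in tests: with `jitter = 0.25` and a 100 ms base, a source that always returns 0 yields 75 ms and one that returns 0.5 yields exactly 100 ms.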
T6 — Resumability of upload (R6.1, R6.3, R6.4)
Goal: a 100 MiB upload interrupted at 50% resumes and produces identical final bytes.
Setup
# tests/resume/fixture.sh
dd if=/dev/urandom of=/tmp/payload.bin bs=1M count=100
sha256sum /tmp/payload.bin > /tmp/payload.sha256
Run
# Start upload in background; kill at 50%
idem_key=$(uuidgen) # reused on resume so the server can deduplicate (R6.3)
curl -fsS --upload-file /tmp/payload.bin \
  --header "Idempotency-Key: $idem_key" \
  "${UPLOAD_URL}" \
  > /tmp/upload-1.log 2>&1 &
upload_pid=$! # PID of curl itself, so the kill below really interrupts the transfer
sleep 1
# Watch byte progress; kill when bytes_sent >= 50 MiB
while :; do
  sent=$(ss -tip "( dport = :8080 )" 2>/dev/null \
    | grep -oP 'bytes_sent:\K[0-9]+' | head -1 || echo 0)
  if [ "${sent:-0}" -ge $((50*1024*1024)) ]; then
    kill -9 "$upload_pid"
    break
  fi
  sleep 0.5
done
# Resume from byte cursor
session_id=$(jq -r .session /tmp/upload-1.log)
curl -fsS -H "X-Resume-Session: $session_id" \
  --upload-file /tmp/payload.bin \
  --header "Idempotency-Key: $idem_key" \
  "${UPLOAD_URL}/resume"
Assert
# Server-side: fetch the assembled blob; sha256 must match original
curl -fsS "${UPLOAD_URL}/blob/$session_id" -o /tmp/retrieved.bin
diff <(sha256sum /tmp/retrieved.bin | awk '{print $1}') \
<(awk '{print $1}' /tmp/payload.sha256) \
|| { echo "FAIL: bytes differ after resume — R6.1 violation"; exit 1; }
Notes
- Chunk size: 8 MiB default per R6.1; smaller chunks = more bookkeeping, larger = less precise resume cursor.
- Idempotency key (R6.3): SAME `Idempotency-Key` on resume must return the same upload metadata, never duplicate.
- Long-lived session (R6.4): server keeps resume session alive ≥ 5 min by default. Sessions older than the limit return 410 Gone with `{"error": "session_expired", "retryable": false}`.
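The chunk-size trade-off in the first note can be made concrete with a small calculation. The sketch below assumes the server acknowledges whole chunks only, so a resume restarts at the last chunk boundary. That behaviour and both function names are assumptions for illustration, not part of R6.1.

```go
package main

// resumeOffset returns the byte offset to resume from: the last chunk
// boundary at or below the number of bytes the server has received.
func resumeOffset(receivedBytes, chunkSize int64) int64 {
	if chunkSize <= 0 || receivedBytes <= 0 {
		return 0
	}
	return receivedBytes / chunkSize * chunkSize
}

// remainingChunks counts the chunks still to send, including a final
// partial chunk, for a payload of totalBytes resumed at offset.
func remainingChunks(totalBytes, offset, chunkSize int64) int64 {
	if chunkSize <= 0 || offset >= totalBytes {
		return 0
	}
	rest := totalBytes - offset
	return (rest + chunkSize - 1) / chunkSize
}
```

For the 100 MiB fixture with 8 MiB chunks, an interruption at 50 MiB resumes from 48 MiB (6 complete chunks) and leaves 7 chunks to send, so 2 MiB is retransmitted. Larger chunks increase that retransmitted tail.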
T7 — Offline tolerance (mobile/desktop) (R6.2)
Goal: disable the network → the main flow still works for local features → re-enable → sync completes without loss or duplication.
Setup
// integration_test/offline_test.dart (Flutter)
import 'package:flutter_test/flutter_test.dart';
import 'package:integration_test/integration_test.dart';
import 'package:koder_test_input/koder_test_input.dart';
import 'package:koder_test_state/koder_test_state.dart';
void main() {
IntegrationTestWidgetsFlutterBinding.ensureInitialized();
testWidgets('offline create + sync round-trip', (tester) async {
// 1. Bring app up online; baseline sync clean.
await KoderTestState.attachToProcess();
expect(await KoderTestState.outboxCount(), 0);
// 2. Block network (test SDK helper).
await KoderTestInput.setNetworkEnabled(false);
// 3. Create 3 items offline.
for (var i = 0; i < 3; i++) {
await KoderTestInput.tap('fab-create');
await KoderTestInput.enterText('input-title', 'item-$i');
await KoderTestInput.tap('save');
}
// 4. Confirm UI shows them with pending sync state.
expect(await KoderTestState.outboxCount(), 3);
for (var i = 0; i < 3; i++) {
expect(await KoderTestState.itemSyncState('item-$i'), 'pending');
}
// 5. Re-enable network. Wait for outbox drain.
await KoderTestInput.setNetworkEnabled(true);
await KoderTestState.waitFor(
() async => (await KoderTestState.outboxCount()) == 0,
timeout: Duration(seconds: 30),
);
// 6. Server sees exactly 3 — no duplicates (Idempotency-Key working).
final remote = await KoderTestState.serverItems();
expect(remote.length, 3);
expect(remote.map((e) => e['title']).toSet(), {'item-0','item-1','item-2'});
});
}
Run
cd app && flutter test integration_test/offline_test.dart \
-d linux # or s.khost1 emulator per test-host-isolation.kmd
Assert
- Test passes (Dart-level expectations).
- Server-side row count matches client-side count (no duplicates from retry-after-reconnect).
Notes
- Conflict resolution: if the user edits the same item while offline and online sessions run concurrently, last-writer-wins is rarely correct. Specs in `specs/data-sync/conflict-resolution.kmd` (future) will govern; per-component overrides in `koder.toml [sync]`.
- Test SDK path: `setNetworkEnabled` is exposed by `engines/sdk/koder_test_input` per `headless-first.kmd` R8. Don't shell out to `adb` or `nmcli` directly.
T8 — Cross-surface compat coverage (R8.1, R8.4)
Goal: cross-surface combinations (mobile × desktop × web × TV × CLI) × version pairs within the R1.1 window, with minimum coverage of 80%.
Setup
# registries/variant-compat-matrix.md (per-component)
# Each row: client-surface × client-version × server-surface × server-version
# Cell value: { status: pass|fail|untested, evidence: <ci-run-url> }
Run
Automated by CI; manual rows added when an integration test passes:
# .gitea/workflows/cross-surface-compat.yml
jobs:
  matrix:
    strategy:
      matrix:
        client_surface: [mobile, desktop, web, tv, cli]
        client_version: [N-1, N]
        server_version: [N-1, N]
    steps:
      # Block scalar so the shell sees real line continuations.
      - run: |
          ./tests/cross-surface/run.sh \
            ${{ matrix.client_surface }} \
            ${{ matrix.client_version }} \
            ${{ matrix.server_version }}
Assert
- Per-component CI publishes `coverage.json` to `registries/variant-compat-matrix.md` via `koder-spec-audit always-on --report --json`.
- Release gate (CI step): fail if covered < 80% of matrix cells.
Notes
- Surface multiplication is huge; pragmatic minimum is the largest active surface × oldest surface in window per release. Full N×N matrix only for crit components (auth, sync, identity).
- Coverage matrix file format will be formalised in `specs/testing/coverage-matrix.md` v1 (currently v0).
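The 80% release gate reduces to a count over matrix cells. Since the matrix file format is not yet formalised, the sketch below works on a plain list of cell statuses and assumes a cell is "covered" when its status is `pass` or `fail` (anything except `untested`). Both that definition and the function name are assumptions.

```go
package main

// coverageGate counts covered cells and reports whether they reach
// minPercent of the matrix. A cell counts as covered when it has a
// recorded result ("pass" or "fail"); "untested" cells do not count.
// An empty matrix fails the gate.
func coverageGate(statuses []string, minPercent int) (covered int, ok bool) {
	for _, s := range statuses {
		if s == "pass" || s == "fail" {
			covered++
		}
	}
	if len(statuses) == 0 {
		return 0, false
	}
	// Integer comparison avoids floating-point rounding at the threshold.
	return covered, covered*100 >= minPercent*len(statuses)
}
```

For the example CI matrix (5 surfaces × 2 client versions × 2 server versions = 20 cells), 16 covered cells sits exactly on the 80% threshold and passes, while 15 fails.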
T9 — Regional failover (R7.1, R7.2, R7.4)
Goal: simulate loss of the primary region; measure the time until traffic is re-routed; assert ≤ 60s and zero data loss.
Setup
# Two regions: primary (us-east) and replica (eu-west)
# DNS health-check has 30s TTL (R7.2)
# Replica DB lags primary by ≤ 5s async replication
Run
# tests/failover/regional.sh
set -euo pipefail
# 1. Verify primary serving; record the last committed id while it is still up
primary_ip=$(dig +short app.koder.dev | head -1)
echo "Primary IP: $primary_ip"
curl -fsS "https://app.koder.dev/healthz"
last_id_before=$(curl -fsS "https://app.koder.dev/last-record-id-before-failover")
# 2. Inject failure: stop primary region (use deployment-specific cmd)
incus stop --project=us-east app-primary
# 3. Mark t0; poll DNS until we're routed elsewhere (give up after 120s)
t0=$(date +%s)
until [ "$(dig +short app.koder.dev | head -1)" != "$primary_ip" ]; do
  sleep 2
  age=$(( $(date +%s) - t0 ))
  if [ "$age" -gt 120 ]; then
    echo "FAIL: no failover after 120s (R7.2 violation)"; exit 1
  fi
done
t1=$(date +%s)
echo "Failover completed in $((t1-t0))s"
if [ $((t1-t0)) -gt 60 ]; then
  echo "FAIL: failover took $((t1-t0))s (limit 60s; R7.2 violation)"; exit 1
fi
# 4. Verify the replica region serves the record written before the outage
status=$(curl -s -o /dev/null -w '%{http_code}' "https://app.koder.dev/items/$last_id_before")
test "$status" = "200" || {
  echo "FAIL: data lost on failover (HTTP $status; R7.4 violation)"; exit 1
}
Assert
- Failover window ≤ 60s (default; R7.2).
- Zero data loss for committed writes ≥ replication-lag seconds before the outage.
- `/healthz` from the new primary returns 200 within 5s after the DNS flip.
Notes
- Replication lag is the data-loss budget. Async replication at ≤ 5s lag means writes done < 5s before outage may not have replicated. Per R7.4, backup snapshots (≥ hourly) bound this further.
- DNS TTL: 30s default per R7.2; tighter TTL = faster failover but more DNS query load. Calibrate per component.
- Anycast components (Koder Jet, Koder ID) skip DNS-flip and use BGP withdrawal; the same outcome (≤ 60s window, zero data loss) applies but the mechanism differs. Anycast-specific recipe pending.
Coverage gate
Per `policies/always-on.kmd § Gate de release`, a component cannot release if T1–T9 are missing or failing and the gate is not deferred via `always-on-debt.md`. The auditor `koder-spec-audit always-on --strict` reads this state and exits non-zero on missing coverage.
Per-component implementation lives in `<component>/tests/` paths:
| Test | Default location |
|---|---|
| T1 | <component>/tests/compat/ |
| T2 | <component>/internal/wire/*_test.go (or equivalent) |
| T3 | <component>/tests/rollout/ |
| T4 | <component>/tests/migrations/ |
| T5 | <component>/tests/chaos/ |
| T6 | <component>/tests/resume/ |
| T7 | <component>/integration_test/ (Flutter) or <component>/tests/offline/ |
| T8 | CI matrix + registries/variant-compat-matrix.md rows |
| T9 | <component>/tests/failover/ (single-region: opt out via debt entry) |
Status
- v0.1 (2026-05-24): initial recipes. T1–T5 and T9 have reusable examples. T7 depends on Dart SDKs (`koder_test_input`, `koder_test_state`); check the version before copying.
- Promotion to v1.0: once ≥ 3 components have shipped at least T1+T3+T5 and validated the recipes in production.
- Next slices: a recipe for IPC between Koder apps (pairs with `specs/ipc/protocol.kmd`); an anycast/BGP recipe for the alternative T9.
References
- `policies/always-on.kmd`
- `policies/headless-first.kmd`
- `policies/test-host-isolation.kmd`
- `specs/testing/coverage-matrix.md`