
Kover A/B Report & Gate Verdict

kover specs/kover/ab-report.kmd

Output contract of Koder Kover: the stable JSON that `kover ab --json` and `kover gate --json` emit for downstream consumers (CI perf-gate, the Kortex bundle, dashboards). `protocol.kmd` defines how a program is OBSERVED (input); this spec defines what Kover PRODUCES from an A/B comparison: the `Report` (median + IQR per metric, with a significance flag) and the `Verdict` (the gate result against a budget). The schema is versioned and forward-compatible; every `R*` rule is testable.


Specification body

Spec — Kover A/B Report & Gate Verdict (output contract v0.1)

This spec defines the machine-readable output of a Kover A/B run: the Report (kover ab --json) and the gate Verdict (kover gate --json). Where protocol.kmd is the input contract (how a program is observed), this is the output contract (what Kover emits). It is the surface a CI perf-gate, the Kortex bundle, and dashboards consume. Every rule R* is testable; the matching test cases T* are listed at the end.

Scope

Applies to any consumer of a Kover comparison: CI pipelines that gate on performance, the Kortex handoff (RFC-001 §8), and dashboards. The contract is mode-agnostic. A single run is just repeats=1, where every Stat has n=1 and therefore iqr=0, so one schema serves both a one-off comparison and an N-run benchmark.

R1 — The metric set is closed and ordered

R1.1 — A Report covers this closed, ordered metric set: render metrics first, then resource metrics.

| metric | Meaning | Unit |
|---|---|---|
| fcp_ms | first-contentful-paint | ms |
| dom_interactive_ms | navigation domInteractive | ms |
| load_ms | navigation loadEventEnd | ms |
| lcp_ms | largest-contentful-paint (Core Web Vital) | ms |
| cls | cumulative layout shift (Core Web Vital) | score (unitless) |
| rss_mb | resident set size, whole process tree | MiB |
| cpu_pct | CPU%, whole process tree | percent |
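A minimal producer-side sketch of the T1 conformance check, assuming the metric names exactly as listed in the R1.1 table. The helper name checkMetricOrder is hypothetical and not part of the kover CLI or its ab package.

```go
package main

import "fmt"

// r11Metrics is the closed, ordered v0.1 metric set from R1.1:
// render metrics first, then resource metrics.
var r11Metrics = []string{
	"fcp_ms", "dom_interactive_ms", "load_ms", "lcp_ms", "cls", // render
	"rss_mb", "cpu_pct", // resources
}

// checkMetricOrder returns nil iff got lists exactly the R1.1 metrics in
// R1.1 order (the T1 check); otherwise it describes the first mismatch.
func checkMetricOrder(got []string) error {
	if len(got) != len(r11Metrics) {
		return fmt.Errorf("got %d metrics, want %d", len(got), len(r11Metrics))
	}
	for i, m := range got {
		if m != r11Metrics[i] {
			return fmt.Errorf("metric %d: got %q, want %q", i, m, r11Metrics[i])
		}
	}
	return nil
}

func main() {
	report := []string{"fcp_ms", "dom_interactive_ms", "load_ms", "lcp_ms", "cls", "rss_mb", "cpu_pct"}
	fmt.Println(checkMetricOrder(report))                       // <nil>
	fmt.Println(checkMetricOrder([]string{"load_ms", "fcp_ms"})) // got 2 metrics, want 7
}
```

Because R6.1 lets later schema versions append metrics, this strict equality check only fits a v0.1 producer test; a consumer should not reject a Report that carries extra metrics.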

R1.2 — The same metric descriptor set drives both the single-run delta and the repeated median (one source of truth) — a producer MUST NOT emit a metric in one mode that it omits in the other.

R2 — Stat: median + inter-quartile spread

R2.1 — Each target's value for a metric is a Stat:

{ "median": 308.0, "p25": 303.0, "p75": 314.0, "min": 298.0, "max": 320.0, "n": 3 }

R2.2 — Quartiles use linear interpolation between closest ranks (the NumPy default, Hyndman & Fan type 7). iqr = p75 − p25 is the metric's spread. With n=1 every quantile equals the single value, so iqr=0.
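The R2.1/R2.2 arithmetic can be sketched in Go as follows. Stat here is an illustrative stand-in that copies the field names of the R2.1 example, not the canonical ab.Stat, and quantile7 and newStat are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Stat mirrors the R2.1 JSON shape (illustrative, not the canonical ab.Stat).
type Stat struct {
	Median float64 `json:"median"`
	P25    float64 `json:"p25"`
	P75    float64 `json:"p75"`
	Min    float64 `json:"min"`
	Max    float64 `json:"max"`
	N      int     `json:"n"`
}

// quantile7 returns the q-th quantile (0 <= q <= 1) of a sorted, non-empty
// slice using linear interpolation between closest ranks (type 7).
func quantile7(sorted []float64, q float64) float64 {
	h := float64(len(sorted)-1) * q
	lo := int(math.Floor(h))
	if lo >= len(sorted)-1 {
		return sorted[len(sorted)-1]
	}
	return sorted[lo] + (h-float64(lo))*(sorted[lo+1]-sorted[lo])
}

// newStat builds a Stat from every repeat (R2.3), not just the last one.
// It panics on an empty sample, which has no defined median.
func newStat(samples []float64) Stat {
	if len(samples) == 0 {
		panic("newStat: empty sample")
	}
	s := append([]float64(nil), samples...) // copy so the caller's slice is untouched
	sort.Float64s(s)
	return Stat{
		Median: quantile7(s, 0.50),
		P25:    quantile7(s, 0.25),
		P75:    quantile7(s, 0.75),
		Min:    s[0],
		Max:    s[len(s)-1],
		N:      len(s),
	}
}

// iqr is the R2.2 spread, p75 − p25.
func (s Stat) iqr() float64 { return s.P75 - s.P25 }

func main() {
	st := newStat([]float64{320, 298, 308})
	fmt.Printf("%+v iqr=%.1f\n", st, st.iqr())
}
```

For the three repeats 320, 298 and 308 this yields median 308, p25 303, p75 314 and iqr 11. A single sample collapses every quantile to that value, so its iqr is 0.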

R2.3 — A single Kover run is noise; the benchmark signal is the median plus the IQR across all repeated runs (perf-baseline.md: report median + IQR, never a lone run). A producer of a multi-run Report MUST populate each Stat from all repeats, not only the last one.

R3 — MetricStats and the significance rule

R3.1 — One metric across both targets:

{ "metric": "load_ms", "a": <Stat>, "b": <Stat>,
  "delta_median": 6.8, "significant": true }

delta_median = a.median − b.median (positive ⇒ target A is heavier/slower).

R3.2 — significant is true iff |delta_median| exceeds BOTH targets' IQRs, that is, |delta_median| > a.iqr and |delta_median| > b.iqr. This is the "real difference or run-to-run jitter?" rule. It is a serialized field, not a hint: a consumer reads it directly and MUST NOT recompute a different significance from the quartiles. With n=1 (both iqr=0), any non-zero delta is significant.
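A producer-side sketch of R3.1 and R3.2; consumers read significant as serialized and do not run this. The types and the compare helper are illustrative assumptions, not the canonical ab.MetricStats encoder.

```go
package main

import (
	"fmt"
	"math"
)

// Stat keeps only the R2.1 fields the significance rule needs
// (illustrative stand-in, not the canonical ab.Stat).
type Stat struct {
	Median float64 `json:"median"`
	P25    float64 `json:"p25"`
	P75    float64 `json:"p75"`
}

// IQR is the R2.2 spread, p75 − p25.
func (s Stat) IQR() float64 { return s.P75 - s.P25 }

// MetricStats mirrors the R3.1 shape (illustrative, not ab.MetricStats).
type MetricStats struct {
	Metric      string  `json:"metric"`
	A           Stat    `json:"a"`
	B           Stat    `json:"b"`
	DeltaMedian float64 `json:"delta_median"`
	Significant bool    `json:"significant"`
}

// compare takes a = primary (target A) and b = secondary (target B).
// delta_median = a.median − b.median, so a positive value means A is heavier.
// significant requires |delta| to be strictly greater than BOTH IQRs (R3.2);
// with n=1 both IQRs are 0, so any non-zero delta is significant.
func compare(metric string, a, b Stat) MetricStats {
	d := a.Median - b.Median
	return MetricStats{
		Metric:      metric,
		A:           a,
		B:           b,
		DeltaMedian: d,
		Significant: math.Abs(d) > a.IQR() && math.Abs(d) > b.IQR(),
	}
}

func main() {
	a := Stat{Median: 308, P25: 303, P75: 314} // IQR 11
	b := Stat{Median: 290, P25: 286, P75: 296} // IQR 10
	fmt.Printf("%+v\n", compare("load_ms", a, b))
}
```

Here the delta of 18 clears both IQRs (11 and 10), so it is significant; a delta of 5 against the same A spread would be treated as jitter.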

R4 — Report: the full A/B result

R4.1 — kover ab --json emits:

{ "url": "scenario:flow.json", "primary": "kruze", "secondary": "chrome",
  "repeats": 3, "metrics": [ <MetricStats>, … ] }

R4.2 — primary is target A and secondary is target B. This mapping is fixed, so the sign of delta_median means the same thing to every consumer. url is the page URL, or scenario:<file> when a scenario drove the run.

R5 — Verdict: the gate result over a budget

R5.1 — A budget is a set of per-metric ceilings on the A−B median delta:

{ "metrics": { "load_ms": { "max_delta": 50 }, "rss_mb": { "max_delta": 100 } } }

A metric absent from the budget is reported but never gates the build.

R5.2 — kover gate --json emits a Verdict:

{ "pass": false, "results": [
  { "metric": "load_ms", "delta": 75.0, "budget": 50.0,
    "gated": true, "significant": true, "regressed": true }, … ] }

R5.3 — A metric's regressed flag is true iff the metric is gated AND delta > max_delta AND significant. A delta that is over budget but not significant is run-to-run jitter and MUST NOT count as a regression (anti-flaky gate). Verdict.pass is false iff any metric regressed.
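A sketch of how a gate could derive a Verdict from a budget under R5.1 to R5.3, assuming delta is the Report's delta_median and significant is copied from the Report unchanged. All type and function names are hypothetical, and since the spec does not say what budget holds for an ungated metric, this sketch leaves it at 0.

```go
package main

import "fmt"

// Ceiling and Budget mirror the R5.1 budget JSON (illustrative types).
type Ceiling struct {
	MaxDelta float64 `json:"max_delta"`
}

type Budget struct {
	Metrics map[string]Ceiling `json:"metrics"`
}

// MetricDelta is one metric's delta_median and significant flag,
// copied unchanged from the Report (R3.2: never recomputed).
type MetricDelta struct {
	Metric      string
	Delta       float64
	Significant bool
}

// Result and Verdict mirror the R5.2 JSON (not the canonical gate.Verdict).
type Result struct {
	Metric      string  `json:"metric"`
	Delta       float64 `json:"delta"`
	Budget      float64 `json:"budget"`
	Gated       bool    `json:"gated"`
	Significant bool    `json:"significant"`
	Regressed   bool    `json:"regressed"`
}

type Verdict struct {
	Pass    bool     `json:"pass"`
	Results []Result `json:"results"`
}

// evaluate applies R5.1 and R5.3: only a gated metric (one named in the budget)
// can regress, and only when it is both over budget and significant.
func evaluate(b Budget, deltas []MetricDelta) Verdict {
	v := Verdict{Pass: true}
	for _, d := range deltas {
		c, gated := b.Metrics[d.Metric]
		r := Result{Metric: d.Metric, Delta: d.Delta, Gated: gated, Significant: d.Significant}
		if gated {
			r.Budget = c.MaxDelta
			r.Regressed = d.Delta > c.MaxDelta && d.Significant
		}
		if r.Regressed {
			v.Pass = false // pass is false iff any metric regressed
		}
		v.Results = append(v.Results, r)
	}
	return v
}

func main() {
	budget := Budget{Metrics: map[string]Ceiling{"load_ms": {MaxDelta: 50}, "rss_mb": {MaxDelta: 100}}}
	v := evaluate(budget, []MetricDelta{
		{Metric: "load_ms", Delta: 75, Significant: true},  // over budget, significant: regresses
		{Metric: "rss_mb", Delta: 120, Significant: false}, // over budget, jitter: does not regress
		{Metric: "fcp_ms", Delta: 400, Significant: true},  // not in budget: reported, never gates
	})
	fmt.Println("pass:", v.Pass)
	for _, r := range v.Results {
		fmt.Printf("%+v\n", r)
	}
}
```

The ceiling is one-sided: only a positive delta above max_delta (target A heavier than target B) can fail the gate, which matches the "delta > max_delta" wording of R5.3.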

R5.4 — A failed scenario assertion (a kover gate --scenario run whose replay failed; cf. assert in scenario-dsl.kmd R1.1) fails the gate distinctly from a budget regression: the producer signals it as a scenario failure, not as a regressed metric. A consumer MUST treat a non-zero gate exit without a regressed metric as a correctness failure, not a perf regression.
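On the consumer side, R5.4 reduces to a small classification. This sketch assumes exit status 0 means the gate passed and treats every non-zero status alike, since the spec does not assign specific non-zero codes; it also assumes the consumer may hold an empty Verdict when a scenario failure produced no results. Outcome and classify are hypothetical names.

```go
package main

import "fmt"

// Only the Verdict fields this check reads (illustrative types).
type Result struct {
	Metric    string `json:"metric"`
	Regressed bool   `json:"regressed"`
}

type Verdict struct {
	Pass    bool     `json:"pass"`
	Results []Result `json:"results"`
}

// Outcome is a hypothetical consumer-side classification of a gate run.
type Outcome int

const (
	Passed Outcome = iota
	PerfRegression
	CorrectnessFailure
)

func (o Outcome) String() string {
	switch o {
	case Passed:
		return "passed"
	case PerfRegression:
		return "perf regression"
	default:
		return "correctness failure"
	}
}

// classify applies R5.4: a non-zero exit with a regressed metric is a perf
// regression; a non-zero exit without one (for example a failed scenario
// assertion) is a correctness failure. Assumes exit status 0 means pass.
func classify(exitCode int, v Verdict) Outcome {
	if exitCode == 0 {
		return Passed
	}
	for _, r := range v.Results {
		if r.Regressed {
			return PerfRegression
		}
	}
	return CorrectnessFailure
}

func main() {
	regressed := Verdict{Results: []Result{{Metric: "load_ms", Regressed: true}}}
	fmt.Println(classify(1, regressed)) // perf regression
	fmt.Println(classify(1, Verdict{})) // correctness failure
}
```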

R6 — Versioning & forward-compat

R6.1 — Consumers MUST ignore unknown fields (forward-compat); an unknown field is never fatal. New metrics are additive to the R1.1 set, and a minor schema bump never removes or renames a field.
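Go's encoding/json gives R6.1 behaviour by default: json.Unmarshal skips object keys that match no struct field. The consumer-side types below are illustrative (not the canonical ab.Report), and schema_note and inp_ms are hypothetical future additions used only to exercise the rule.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Consumer-side view of a Report (illustrative, not the canonical ab.Report).
// Declare only the fields you read; json.Unmarshal skips any other keys.
type Stat struct {
	Median float64 `json:"median"`
	N      int     `json:"n"`
}

type MetricStats struct {
	Metric      string  `json:"metric"`
	A           Stat    `json:"a"`
	B           Stat    `json:"b"`
	DeltaMedian float64 `json:"delta_median"`
	Significant bool    `json:"significant"`
}

type Report struct {
	URL       string        `json:"url"`
	Primary   string        `json:"primary"`
	Secondary string        `json:"secondary"`
	Repeats   int           `json:"repeats"`
	Metrics   []MetricStats `json:"metrics"`
}

// known is the v0.1 metric set (R1.1) this consumer understands.
var known = map[string]bool{
	"fcp_ms": true, "dom_interactive_ms": true, "load_ms": true, "lcp_ms": true,
	"cls": true, "rss_mb": true, "cpu_pct": true,
}

// sample carries a hypothetical future top-level field (schema_note) and a
// hypothetical future metric (inp_ms); neither may break the consumer.
const sample = `{
  "url": "scenario:flow.json", "primary": "kruze", "secondary": "chrome",
  "repeats": 3, "schema_note": "hypothetical",
  "metrics": [
    {"metric": "load_ms", "a": {"median": 308, "p25": 306, "p75": 310, "n": 3},
     "b": {"median": 301.2, "p25": 299, "p75": 303, "n": 3},
     "delta_median": 6.8, "significant": true},
    {"metric": "inp_ms", "a": {"median": 40, "n": 3}, "b": {"median": 38, "n": 3},
     "delta_median": 2, "significant": false}
  ]
}`

func decodeReport(data []byte) (Report, error) {
	var r Report
	err := json.Unmarshal(data, &r) // unknown keys are ignored, not errors
	return r, err
}

// knownMetrics drops metrics this consumer does not recognise instead of failing.
func knownMetrics(r Report) []MetricStats {
	var out []MetricStats
	for _, m := range r.Metrics {
		if known[m.Metric] {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	r, err := decodeReport([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, m := range knownMetrics(r) {
		fmt.Printf("%s: %s minus %s = %+.1f (significant=%v)\n",
			m.Metric, r.Primary, r.Secondary, m.DeltaMedian, m.Significant)
	}
}
```

A consumer should therefore avoid json.Decoder's DisallowUnknownFields option, which turns an unknown field into a decode error.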

Test cases

| # | Check | Severity |
|---|---|---|
| T1 | A Report lists exactly the R1.1 metrics, in order, render-before-resources. | hard |
| T2 | Stat quartiles match type-7 interpolation; iqr = p75 − p25; n=1 ⇒ iqr=0. | hard |
| T3 | significant is true iff \|delta_median\| exceeds both targets' IQRs (R3.2). | |
| T4 | delta_median = a.median − b.median with primary=A, secondary=B. | hard |
| T5 | A regressed metric satisfies gated ∧ delta>max_delta ∧ significant; over-budget-not-significant does not regress (R5.3). | hard |
| T6 | Verdict.pass=false iff some metric regressed; a scenario-assertion failure is signalled distinctly (R5.4). | hard |
| T7 | A consumer ignores an unknown field/metric without erroring (R6.1). | soft |

Non-goals

  • The connector / input contract — owned by protocol.kmd.
  • Scenario step semantics — owned by scenario-dsl.kmd.
  • Capture byte storage — owned by capture.kmd.
  • The measurement method (how FCP/RSS/CPU are sampled) — implementation detail of products/dev/kover, not part of the wire contract.

Notes

The reference producer is the kover CLI (products/dev/kover): kover ab --json (Report), kover gate --json (Verdict). The Go types ab.Report, ab.MetricStats, ab.Stat, and gate.Verdict are the canonical encoders.
