Multi-tenancy contract
multi-tenancy specs/multi-tenancy/contract.kmd
Specification body
Spec — Multi-tenancy contract
Mecanismos concretos para implementar
policies/multi-tenant-by-default.kmd. Spec é normativo: todo módulo
da Koder Stack que armazena dado-de-usuário deve passar nos checks
T1–T9 abaixo.
Identity model
koder_user_id BIGINT NOT NULL -- FK to services/foundation/id.user(id)
workspace_id BIGINT -- nullable; FK to id.workspace(id)
koder_user_id é canonical PK partial em toda tabela com PII.
workspace_id amplifica scope: nullable significa "pessoal";
non-null significa "do workspace, todos os members veem por
membership".
Tabela canonical de membership (Koder ID):
CREATE TABLE workspace_member (
workspace_id BIGINT NOT NULL,
koder_user_id BIGINT NOT NULL,
role TEXT NOT NULL, -- 'owner' | 'admin' | 'member' | 'viewer'
joined_unix BIGINT NOT NULL,
PRIMARY KEY (workspace_id, koder_user_id)
);
Qualquer query cross-workspace passa por essa tabela. Não cacheia
membership client-side — é hot path do auth, fica em
services/foundation/id com cache de 60s server-side.
Scope model — 4 categories + system-wide
The koder_user_id + workspace_id identity model above is the
tenant scope — every product table that holds PII goes here.
Foundation services (auth, audit, behavior, device, telephony,
SSO …) also need isolation, but along axes that are not the
product-tenant axis. The Koder Stack therefore defines four
isolation scopes plus an explicit no-RLS bucket. Org → Workspace →
Project (RFC-017) is the product-side nesting: org_id is the
billing/identity boundary, workspace_id the governance boundary, and
project_id the delivery boundary where resources live:
| Scope | Discriminator column | GUC setting | Helper | When to use |
|---|---|---|---|---|
| Tenant-scoped | koder_user_id + optional workspace_id | app.current_tenant_id | kdb.WithTenantTx | Default for product data with PII (Dek recordings, Drive blobs, Talk messages, Kruze bookmarks, …) |
| Project-scoped | project_id (nested under workspace_id) | app.current_project_id | kdb.WithProjectTx | Product resources owned by a delivery Project inside a Workspace (RFC-017): repos, datasets, deployments, agents, files, batches. The narrowest product axis — prevents cross-project leaks within one workspace |
| Org-scoped | org_id | app.current_org_id | kdb.WithOrgTx | Foundation tables owned by an org: audit events, MFA policies, access policies, org settings |
| User-scoped | user_id (== koder_user_id at the foundation layer) | app.current_user_id | kdb.WithUserTx | Foundation tables owned by a single user: behavior events, QR sessions, sync state, telephony OTPs, user-scoped webhooks/api keys, SSO grants/sessions |
| System-wide | (none) | (none) | (none — admin auth) | Tables shared across all tenants by design: rate-limit buckets, SSO client registry, SSO revocation list. Access is governed by admin auth, not RLS |
Selection rules:
- Tenant-scoped is the default for product data. Tables that
surface in a product UI carry
koder_user_id. - Org-scoped applies when the primary owner is an org, not a user — even if individual user IDs are recorded. Audit events for an org are visible to every admin of that org regardless of who triggered the event.
- User-scoped applies when the row is per-user inside a foundation service and there is no org or product tenant in play. Behavior risk profiles, QR pairing sessions, sync state.
- Project-scoped applies to product resources owned by a
delivery Project (RFC-017) — repos, datasets, deployments,
agents, files, batches. It is the narrowest product axis, nested
under
workspace_id; a leak between two projects of the same workspace is an isolation failure. The full materialized path on such a row iskoder_user_id + org_id + workspace_id + project_id. - System-wide is the explicit exception. It must be justified in the table's migration comment (e.g. "global rate-limit counters; no per-tenant cardinality"). Drift toward this scope without justification is a spec violation.
Mixed-scope tables (e.g. device carrying both user_id and
org_id columns) either split into two migrations_rls files
or use a composite policy:
USING (org_id = current_setting('app.current_org_id', true)
OR user_id = current_setting('app.current_user_id', true))
The composite form is acceptable when both isolation axes apply to the same row simultaneously; otherwise prefer the split.
PAT scope grammar
PAT (Personal Access Token) emitido pelo Koder ID carrega scope. Sintaxe canonical (herdado do Flow RFC-003 credentials/backups):
<verb>:<resource>[:<modifier>]
verbs: read | write | admin
resources: user | workspace | repo | credentials | backups | …
modifier: optional, e.g. "self" or "<id>"
Exemplos:
read:user— ler perfil própriowrite:credentials— escrever credentials no scope que o PAT herdouread:workspace:<id>— ler dados específicos de um workspaceadmin:user— privileged self-management
PAT é scoped to a single koder_user_id (o owner). Workspace
access é resolvido via workspace_member na hora do request, não
via PAT scope. PATs não atravessam tenants.
RLS template (Postgres / kdb-next)
Toda tabela com PII tem RLS. Helper migration:
-- 1. Schema com tenant fields
CREATE TABLE my_resource (
id BIGSERIAL,
koder_user_id BIGINT NOT NULL REFERENCES koder_id.user(id),
workspace_id BIGINT REFERENCES koder_id.workspace(id),
payload JSONB NOT NULL,
created_unix BIGINT NOT NULL DEFAULT extract(epoch from now()),
PRIMARY KEY (koder_user_id, id)
);
-- 2. Index on tenant + recent
CREATE INDEX ix_my_resource_user_recent
ON my_resource (koder_user_id, created_unix DESC);
-- 3. RLS enable + policy
ALTER TABLE my_resource ENABLE ROW LEVEL SECURITY;
CREATE POLICY p_owner ON my_resource
USING (koder_user_id = current_setting('app.current_tenant_id', true)::BIGINT);
CREATE POLICY p_workspace_member ON my_resource
USING (workspace_id IS NOT NULL
AND EXISTS (
SELECT 1 FROM koder_id.workspace_member m
WHERE m.workspace_id = my_resource.workspace_id
AND m.koder_user_id = current_setting('app.current_tenant_id', true)::BIGINT
));
Connection setup (per request):
// Tenant-scoped (default for product data with PII)
err := kdb.WithTenantTx(ctx, store, auth.UserID, func(ctx context.Context, s kdb.Store) error {
return repo.List(ctx)
})
The helper opens a transaction, runs
SELECT set_config('app.current_tenant_id', $1, true), and
dispatches the callback through a *sql.Tx. Empty scope ID is
fail-closed (returns an error rather than running unscoped).
Bypass admin path (rare): an explicit RESET app.current_tenant_id
is privilege of the role koder_admin only. Audit log obrigatório
em qualquer reset.
Project-scoped variant
For product resources owned by a delivery Project nested under a
Workspace (RFC-017): repos, datasets, deployments, agents, files,
batches. A row is visible if its project_id matches the scoped
project or if the caller is a member of the project's workspace:
ALTER TABLE project_resource ENABLE ROW LEVEL SECURITY;
CREATE POLICY p_project ON project_resource
USING (project_id = current_setting('app.current_project_id', true)::BIGINT);
CREATE POLICY p_project_member ON project_resource
USING (EXISTS (
SELECT 1
FROM koder_id.project p
JOIN koder_id.workspace_member m
ON m.workspace_id = p.workspace_id
WHERE p.id = project_resource.project_id
AND m.koder_user_id = current_setting('app.current_tenant_id', true)::BIGINT
));
err := kdb.WithProjectTx(ctx, store, projectID, func(ctx context.Context, s kdb.Store) error {
return repo.List(ctx)
})
Org-scoped variant
For foundation tables owned by an org (audit events, MFA policies, access policies):
ALTER TABLE org_resource ENABLE ROW LEVEL SECURITY;
CREATE POLICY p_org ON org_resource
USING (org_id = current_setting('app.current_org_id', true)::BIGINT);
err := kdb.WithOrgTx(ctx, store, auth.OrgID, func(ctx context.Context, s kdb.Store) error {
return repo.List(ctx)
})
User-scoped variant
For foundation tables owned by a single user (behavior events, QR sessions, sync state, telephony OTPs, user-scoped webhooks):
ALTER TABLE user_resource ENABLE ROW LEVEL SECURITY;
CREATE POLICY p_user ON user_resource
USING (user_id = current_setting('app.current_user_id', true)::BIGINT);
err := kdb.WithUserTx(ctx, store, auth.UserID, func(ctx context.Context, s kdb.Store) error {
return repo.List(ctx)
})
All three helpers share the same body in
services/foundation/id/engine/pkg/kdb/rls.go::withScopeTx —
fail-closed on empty scope, ctx-stash propagated to nested calls,
test coverage in rls_scope_test.go.
KV / cache template (Redis-style)
Toda key tem prefixo de tenant:
<namespace>:<tenant-key>:<resource-key>
examples:
rate_limit:user:<uid>:5h_window → counter
session:user:<uid>:<session_id> → JSON
presence:workspace:<wid>:<uid> → boolean
Helper:
func TenantKey(uid int64, parts ...string) string {
return fmt.Sprintf("user:%d:%s", uid, strings.Join(parts, ":"))
}
Key sem prefixo → bug crítico (cross-tenant leak via cache).
S3 / object storage template
Path:
<bucket>/<koder_user_id>/<workspace_id|"personal">/<resource_id>/<file>
IAM / signed-URL: per-request, restricted to the tenant prefix.
Test contract — T1..T12
Todo módulo multi-tenant tem suite que cobre:
Tenant-scoped baseline (T1..T9 — every product table)
| ID | Test | Description |
|---|---|---|
| T1 | Auth required | GET /resource sem PAT → 401 |
| T2 | Self read | A's PAT, GET /my-resource → A's data only |
| T3 | Cross-tenant read denied | A's PAT, GET /resource/<B's id> → 404 (not 403) |
| T4 | Cross-tenant write denied | A's PAT, POST /resource setting koder_user_id=B → 400 or silent override to A |
| T5 | Workspace member read | A in workspace W, GET /resource?workspace=W → all members' data |
| T6 | Workspace non-member read | A not in W, GET /resource?workspace=W → 404 |
| T7 | RLS isolation | Direct DB query without SET LOCAL app.current_tenant_id → returns nothing (or error) |
| T8 | Index efficiency | EXPLAIN of A's read uses tenant index, not seq scan |
| T9 | Tenant deletion | When user A is deleted, all WHERE koder_user_id = A rows are removed within retention window |
Org/user-scoped extension (T10..T12 — foundation services)
Required for any table whose RLS policy is keyed by org_id or
user_id (not koder_user_id). Sibling of T3+T7 but along the
secondary axis.
| ID | Test | Description |
|---|---|---|
| T10 | Cross-org read denied | Admin of org A, GET /audit-events from org B context → 404 (not 403). Mirrors T3 for app.current_org_id. |
| T11 | Cross-user read denied (user-scoped foundation) | User A's session, GET /qr-sessions/<B's id> → 404. Mirrors T3 for app.current_user_id. |
| T12 | Scope isolation in RLS | Direct DB query without the appropriate SET LOCAL app.current_org_id / app.current_user_id → returns nothing (or error). Mirrors T7 for the secondary axis. |
System-wide tables (rate-limit buckets, SSO client registry, SSO
revocation list) intentionally have no T10..T12 row — they are
not scope-protected by RLS and instead rely on admin auth. Their
migration MUST carry a -- system-wide: <justification> comment
that audit will read.
Cada implementação ships com tests/multi-tenant/T1..T9_test.go
(produto) ou tests/multi-tenant/T10..T12_test.go (foundation
org/user-scoped). Audit: PR sem o subconjunto aplicável verde
bloqueia merge (ver policies/regression-tests.kmd
co-enforcement).
Error model
| Cenário | HTTP | gRPC | Body |
|---|---|---|---|
| Sem auth | 401 | UNAUTHENTICATED | {"error": "auth required"} |
| Token inválido | 401 | UNAUTHENTICATED | {"error": "invalid token"} |
| Recurso não-existente OU de outro tenant | 404 | NOT_FOUND | {"error": "not found"} |
| Recurso existe mas role insuficiente (workspace member sem write) | 403 | PERMISSION_DENIED | {"error": "insufficient role"} |
| Bad input | 400 | INVALID_ARGUMENT | {"error": "<details>"} |
| Server error | 500 | INTERNAL | {"error": "internal"} |
Crítico: 404, não 403, em cross-tenant cases. 403 vaza existência ("este id existe mas você não pode ler" → atacante sabe que existe).
Audit log
Toda operação mutating que toca PII grava audit row:
CREATE TABLE audit_log (
id BIGSERIAL PRIMARY KEY,
actor_user_id BIGINT NOT NULL, -- the PAT owner
target_user_id BIGINT, -- tenant being acted on (often = actor)
action TEXT NOT NULL, -- 'create' | 'update' | 'delete' | 'read_admin'
resource TEXT NOT NULL, -- 'credentials' | 'usage' | …
resource_id BIGINT,
payload JSONB,
created_unix BIGINT NOT NULL
);
CREATE INDEX ix_audit_actor ON audit_log (actor_user_id, created_unix DESC);
CREATE INDEX ix_audit_target ON audit_log (target_user_id, created_unix DESC);
Audit row é best-effort write (failure logs but doesn't abort
the action; ver flow#056b policy).
Sharding model (futuro, hyperscale)
Quando uma tabela passar de ~10M rows ou ~100K tenants ativos:
- Range-shard por
koder_user_id(TiKV PD faz isso automático em kdb-next) - Hash-shard via
hash(koder_user_id) % N(alternativa em Postgres com Citus / pg_partman) - Geo-shard por região do tenant (multi-region future, ver stack-RFC-001 §faseamento)
Não pré-otimizar. Trigger: monitoring sinalizar p99 latência > 50ms ou table size > 1TB.
Edge cases
User rename / handle change
koder_user_id é immutable — handle (@username) muda; ID
não. Toda referência cross-table usa koder_user_id (BIGINT),
nunca handle.
Workspace transfer
Workspace muda de owner: workspace.owner_id muda; workspace_id
permanece. Resources com workspace_id = X continuam acessíveis
pelos members atuais.
Account deletion (GDPR-style)
Quando user pede delete:
- Set
user.deleted_unix = NOW()(soft delete) - Cron job de retention varre tabelas e deleta rows
WHERE koder_user_id = X AND <table-specific retention> - Audit row em
audit_logregistra "user_deleted" antes da limpeza - Retention default: 30 dias (configurável per-tenant pra compliance).
Account merge
Out of scope — Koder Stack não suporta merge automático de accounts. Admin-only manual operation se necessário.
Spec audit
Aplicabilidade automática (futuro: koder-spec-audit multi-tenancy):
- Escaneia migrations: tabelas com PII columns (
email,name,password*,key*) semkoder_user_idnemuser_id/org_id(3-scope model) → flag - Escaneia routers: endpoints sem auth middleware → flag
- Escaneia código:
SELECT * FROM <pii-table>semWHERE→ flag - Escaneia env vars: shared cache keys sem prefixo → flag
- Escaneia migrations sem
ALTER TABLE … ENABLE ROW LEVEL SECURITYpara tabelas que carregam colunakoder_user_id/user_id/org_id→ flag - Escaneia migrations system-wide (
-- system-wide:comment): cada tabela sem RLS deve ter uma justification clause de uma linha, ou o flag aponta um candidato a auto-elevar para org-scoped.
Severity: error (block release) na primeira release que adopt o audit; advisory antes.
References
policies/multi-tenant-by-default.kmdrfcs/stack-RFC-001-kdb-as-unified-data-plane.kmdservices/foundation/id/engine/docs/rfcs/RFC-004-oauth2-oidc-service.md