Pular para o conteúdo

Multi-tenancy contract

multi-tenancy specs/multi-tenancy/contract.kmd

Corpo da especificação

Spec — Multi-tenancy contract

Mecanismos concretos para implementar policies/multi-tenant-by-default.kmd. Spec é normativo: todo módulo da Koder Stack que armazena dado-de-usuário deve passar nos checks T1–T9 abaixo.

Identity model

koder_user_id   BIGINT NOT NULL    -- FK to services/foundation/id.user(id)
workspace_id    BIGINT             -- nullable; FK to id.workspace(id)

koder_user_id é canonical PK partial em toda tabela com PII. workspace_id amplifica scope: nullable significa "pessoal"; non-null significa "do workspace, todos os members veem por membership".

Tabela canonical de membership (Koder ID):

CREATE TABLE workspace_member (
  workspace_id BIGINT NOT NULL,
  koder_user_id BIGINT NOT NULL,
  role TEXT NOT NULL,             -- 'owner' | 'admin' | 'member' | 'viewer'
  joined_unix BIGINT NOT NULL,
  PRIMARY KEY (workspace_id, koder_user_id)
);

Qualquer query cross-workspace passa por essa tabela. Não cacheia membership client-side — é hot path do auth, fica em services/foundation/id com cache de 60s server-side.

Scope model — 4 categories + system-wide

The koder_user_id + workspace_id identity model above is the tenant scope — every product table that holds PII goes here. Foundation services (auth, audit, behavior, device, telephony, SSO …) also need isolation, but along axes that are not the product-tenant axis. The Koder Stack therefore defines four isolation scopes plus an explicit no-RLS bucket. Org → Workspace → Project (RFC-017) is the product-side nesting: org_id is the billing/identity boundary, workspace_id the governance boundary, and project_id the delivery boundary where resources live:

ScopeDiscriminator columnGUC settingHelperWhen to use
Tenant-scopedkoder_user_id + optional workspace_idapp.current_tenant_idkdb.WithTenantTxDefault for product data with PII (Dek recordings, Drive blobs, Talk messages, Kruze bookmarks, …)
Project-scopedproject_id (nested under workspace_id)app.current_project_idkdb.WithProjectTxProduct resources owned by a delivery Project inside a Workspace (RFC-017): repos, datasets, deployments, agents, files, batches. The narrowest product axis — prevents cross-project leaks within one workspace
Org-scopedorg_idapp.current_org_idkdb.WithOrgTxFoundation tables owned by an org: audit events, MFA policies, access policies, org settings
User-scopeduser_id (== koder_user_id at the foundation layer)app.current_user_idkdb.WithUserTxFoundation tables owned by a single user: behavior events, QR sessions, sync state, telephony OTPs, user-scoped webhooks/api keys, SSO grants/sessions
System-wide(none)(none)(none — admin auth)Tables shared across all tenants by design: rate-limit buckets, SSO client registry, SSO revocation list. Access is governed by admin auth, not RLS

Selection rules:

  1. Tenant-scoped is the default for product data. Tables that surface in a product UI carry koder_user_id.
  2. Org-scoped applies when the primary owner is an org, not a user — even if individual user IDs are recorded. Audit events for an org are visible to every admin of that org regardless of who triggered the event.
  3. User-scoped applies when the row is per-user inside a foundation service and there is no org or product tenant in play. Behavior risk profiles, QR pairing sessions, sync state.
  4. Project-scoped applies to product resources owned by a delivery Project (RFC-017) — repos, datasets, deployments, agents, files, batches. It is the narrowest product axis, nested under workspace_id; a leak between two projects of the same workspace is an isolation failure. The full materialized path on such a row is koder_user_id + org_id + workspace_id + project_id.
  5. System-wide is the explicit exception. It must be justified in the table's migration comment (e.g. "global rate-limit counters; no per-tenant cardinality"). Drift toward this scope without justification is a spec violation.

Mixed-scope tables (e.g. device carrying both user_id and org_id columns) either split into two migrations_rls files or use a composite policy:

USING (org_id   = current_setting('app.current_org_id',   true)
    OR user_id = current_setting('app.current_user_id', true))

The composite form is acceptable when both isolation axes apply to the same row simultaneously; otherwise prefer the split.

PAT scope grammar

PAT (Personal Access Token) emitido pelo Koder ID carrega scope. Sintaxe canonical (herdado do Flow RFC-003 credentials/backups):

<verb>:<resource>[:<modifier>]

verbs:    read | write | admin
resources: user | workspace | repo | credentials | backups | …
modifier:  optional, e.g. "self" or "<id>"

Exemplos:

  • read:user — ler perfil próprio
  • write:credentials — escrever credentials no scope que o PAT herdou
  • read:workspace:<id> — ler dados específicos de um workspace
  • admin:user — privileged self-management

PAT é scoped to a single koder_user_id (o owner). Workspace access é resolvido via workspace_member na hora do request, não via PAT scope. PATs não atravessam tenants.

RLS template (Postgres / kdb-next)

Toda tabela com PII tem RLS. Helper migration:

-- 1. Schema com tenant fields
CREATE TABLE my_resource (
  id BIGSERIAL,
  koder_user_id BIGINT NOT NULL REFERENCES koder_id.user(id),
  workspace_id BIGINT REFERENCES koder_id.workspace(id),
  payload JSONB NOT NULL,
  created_unix BIGINT NOT NULL DEFAULT extract(epoch from now()),
  PRIMARY KEY (koder_user_id, id)
);

-- 2. Index on tenant + recent
CREATE INDEX ix_my_resource_user_recent
  ON my_resource (koder_user_id, created_unix DESC);

-- 3. RLS enable + policy
ALTER TABLE my_resource ENABLE ROW LEVEL SECURITY;

CREATE POLICY p_owner ON my_resource
  USING (koder_user_id = current_setting('app.current_tenant_id', true)::BIGINT);

CREATE POLICY p_workspace_member ON my_resource
  USING (workspace_id IS NOT NULL
         AND EXISTS (
           SELECT 1 FROM koder_id.workspace_member m
           WHERE m.workspace_id = my_resource.workspace_id
             AND m.koder_user_id = current_setting('app.current_tenant_id', true)::BIGINT
         ));

Connection setup (per request):

// Tenant-scoped (default for product data with PII)
err := kdb.WithTenantTx(ctx, store, auth.UserID, func(ctx context.Context, s kdb.Store) error {
    return repo.List(ctx)
})

The helper opens a transaction, runs SELECT set_config('app.current_tenant_id', $1, true), and dispatches the callback through a *sql.Tx. Empty scope ID is fail-closed (returns an error rather than running unscoped).

Bypass admin path (rare): an explicit RESET app.current_tenant_id is privilege of the role koder_admin only. Audit log obrigatório em qualquer reset.

Project-scoped variant

For product resources owned by a delivery Project nested under a Workspace (RFC-017): repos, datasets, deployments, agents, files, batches. A row is visible if its project_id matches the scoped project or if the caller is a member of the project's workspace:

ALTER TABLE project_resource ENABLE ROW LEVEL SECURITY;

CREATE POLICY p_project ON project_resource
  USING (project_id = current_setting('app.current_project_id', true)::BIGINT);

CREATE POLICY p_project_member ON project_resource
  USING (EXISTS (
           SELECT 1
           FROM koder_id.project p
           JOIN koder_id.workspace_member m
             ON m.workspace_id = p.workspace_id
           WHERE p.id = project_resource.project_id
             AND m.koder_user_id = current_setting('app.current_tenant_id', true)::BIGINT
         ));
err := kdb.WithProjectTx(ctx, store, projectID, func(ctx context.Context, s kdb.Store) error {
    return repo.List(ctx)
})

Org-scoped variant

For foundation tables owned by an org (audit events, MFA policies, access policies):

ALTER TABLE org_resource ENABLE ROW LEVEL SECURITY;

CREATE POLICY p_org ON org_resource
  USING (org_id = current_setting('app.current_org_id', true)::BIGINT);
err := kdb.WithOrgTx(ctx, store, auth.OrgID, func(ctx context.Context, s kdb.Store) error {
    return repo.List(ctx)
})

User-scoped variant

For foundation tables owned by a single user (behavior events, QR sessions, sync state, telephony OTPs, user-scoped webhooks):

ALTER TABLE user_resource ENABLE ROW LEVEL SECURITY;

CREATE POLICY p_user ON user_resource
  USING (user_id = current_setting('app.current_user_id', true)::BIGINT);
err := kdb.WithUserTx(ctx, store, auth.UserID, func(ctx context.Context, s kdb.Store) error {
    return repo.List(ctx)
})

All three helpers share the same body in services/foundation/id/engine/pkg/kdb/rls.go::withScopeTx — fail-closed on empty scope, ctx-stash propagated to nested calls, test coverage in rls_scope_test.go.

KV / cache template (Redis-style)

Toda key tem prefixo de tenant:

<namespace>:<tenant-key>:<resource-key>

examples:
  rate_limit:user:<uid>:5h_window     → counter
  session:user:<uid>:<session_id>     → JSON
  presence:workspace:<wid>:<uid>      → boolean

Helper:

func TenantKey(uid int64, parts ...string) string {
    return fmt.Sprintf("user:%d:%s", uid, strings.Join(parts, ":"))
}

Key sem prefixo → bug crítico (cross-tenant leak via cache).

S3 / object storage template

Path:

<bucket>/<koder_user_id>/<workspace_id|"personal">/<resource_id>/<file>

IAM / signed-URL: per-request, restricted to the tenant prefix.

Test contract — T1..T12

Todo módulo multi-tenant tem suite que cobre:

Tenant-scoped baseline (T1..T9 — every product table)

IDTestDescription
T1Auth requiredGET /resource sem PAT → 401
T2Self readA's PAT, GET /my-resource → A's data only
T3Cross-tenant read deniedA's PAT, GET /resource/<B's id>404 (not 403)
T4Cross-tenant write deniedA's PAT, POST /resource setting koder_user_id=B → 400 or silent override to A
T5Workspace member readA in workspace W, GET /resource?workspace=W → all members' data
T6Workspace non-member readA not in W, GET /resource?workspace=W → 404
T7RLS isolationDirect DB query without SET LOCAL app.current_tenant_id → returns nothing (or error)
T8Index efficiencyEXPLAIN of A's read uses tenant index, not seq scan
T9Tenant deletionWhen user A is deleted, all WHERE koder_user_id = A rows are removed within retention window

Org/user-scoped extension (T10..T12 — foundation services)

Required for any table whose RLS policy is keyed by org_id or user_id (not koder_user_id). Sibling of T3+T7 but along the secondary axis.

IDTestDescription
T10Cross-org read deniedAdmin of org A, GET /audit-events from org B context → 404 (not 403). Mirrors T3 for app.current_org_id.
T11Cross-user read denied (user-scoped foundation)User A's session, GET /qr-sessions/<B's id> → 404. Mirrors T3 for app.current_user_id.
T12Scope isolation in RLSDirect DB query without the appropriate SET LOCAL app.current_org_id / app.current_user_id → returns nothing (or error). Mirrors T7 for the secondary axis.

System-wide tables (rate-limit buckets, SSO client registry, SSO revocation list) intentionally have no T10..T12 row — they are not scope-protected by RLS and instead rely on admin auth. Their migration MUST carry a -- system-wide: <justification> comment that audit will read.

Cada implementação ships com tests/multi-tenant/T1..T9_test.go (produto) ou tests/multi-tenant/T10..T12_test.go (foundation org/user-scoped). Audit: PR sem o subconjunto aplicável verde bloqueia merge (ver policies/regression-tests.kmd co-enforcement).

Error model

CenárioHTTPgRPCBody
Sem auth401UNAUTHENTICATED{"error": "auth required"}
Token inválido401UNAUTHENTICATED{"error": "invalid token"}
Recurso não-existente OU de outro tenant404NOT_FOUND{"error": "not found"}
Recurso existe mas role insuficiente (workspace member sem write)403PERMISSION_DENIED{"error": "insufficient role"}
Bad input400INVALID_ARGUMENT{"error": "<details>"}
Server error500INTERNAL{"error": "internal"}

Crítico: 404, não 403, em cross-tenant cases. 403 vaza existência ("este id existe mas você não pode ler" → atacante sabe que existe).

Audit log

Toda operação mutating que toca PII grava audit row:

CREATE TABLE audit_log (
  id BIGSERIAL PRIMARY KEY,
  actor_user_id BIGINT NOT NULL,    -- the PAT owner
  target_user_id BIGINT,            -- tenant being acted on (often = actor)
  action TEXT NOT NULL,             -- 'create' | 'update' | 'delete' | 'read_admin'
  resource TEXT NOT NULL,           -- 'credentials' | 'usage' | …
  resource_id BIGINT,
  payload JSONB,
  created_unix BIGINT NOT NULL
);
CREATE INDEX ix_audit_actor ON audit_log (actor_user_id, created_unix DESC);
CREATE INDEX ix_audit_target ON audit_log (target_user_id, created_unix DESC);

Audit row é best-effort write (failure logs but doesn't abort the action; ver flow#056b policy).

Sharding model (futuro, hyperscale)

Quando uma tabela passar de ~10M rows ou ~100K tenants ativos:

  • Range-shard por koder_user_id (TiKV PD faz isso automático em kdb-next)
  • Hash-shard via hash(koder_user_id) % N (alternativa em Postgres com Citus / pg_partman)
  • Geo-shard por região do tenant (multi-region future, ver stack-RFC-001 §faseamento)

Não pré-otimizar. Trigger: monitoring sinalizar p99 latência > 50ms ou table size > 1TB.

Edge cases

User rename / handle change

koder_user_id é immutable — handle (@username) muda; ID não. Toda referência cross-table usa koder_user_id (BIGINT), nunca handle.

Workspace transfer

Workspace muda de owner: workspace.owner_id muda; workspace_id permanece. Resources com workspace_id = X continuam acessíveis pelos members atuais.

Account deletion (GDPR-style)

Quando user pede delete:

  1. Set user.deleted_unix = NOW() (soft delete)
  2. Cron job de retention varre tabelas e deleta rows WHERE koder_user_id = X AND <table-specific retention>
  3. Audit row em audit_log registra "user_deleted" antes da limpeza
  4. Retention default: 30 dias (configurável per-tenant pra compliance).

Account merge

Out of scope — Koder Stack não suporta merge automático de accounts. Admin-only manual operation se necessário.

Spec audit

Aplicabilidade automática (futuro: koder-spec-audit multi-tenancy):

  • Escaneia migrations: tabelas com PII columns (email, name, password*, key*) sem koder_user_id nem user_id / org_id (3-scope model) → flag
  • Escaneia routers: endpoints sem auth middleware → flag
  • Escaneia código: SELECT * FROM <pii-table> sem WHERE → flag
  • Escaneia env vars: shared cache keys sem prefixo → flag
  • Escaneia migrations sem ALTER TABLE … ENABLE ROW LEVEL SECURITY para tabelas que carregam coluna koder_user_id / user_id / org_id → flag
  • Escaneia migrations system-wide (-- system-wide: comment): cada tabela sem RLS deve ter uma justification clause de uma linha, ou o flag aponta um candidato a auto-elevar para org-scoped.

Severity: error (block release) na primeira release que adopt o audit; advisory antes.

Referências