AI voice mode UI

ai-ui specs/ai-ui/voice-mode.kmd

Fullscreen voice conversation mode UI: color-coded waveform (input vs output), push-to-talk / always-on toggle, barge-in visuals, mute, and end session. Extends voice/wake-word.kmd with conversational-mode semantics. Required for the Talk product and a future Kortex voice mode.

When this spec applies

Primary triggers

Render fullscreen voice conversation UI

All triggers

Build conversational voice mode in any AI product
Implement Talk voice surface

Spec — AI voice mode UI

Base spec: voice/wake-word.kmd covers toggles and backend. This spec covers the UX of the live conversational mode. Trigger: the mic button in multimodal-input.kmd (#116). Implementation ticket: services/ai/ai#115.

Principles

Fullscreen focus: voice mode is a dedicated surface, not inline.
Visual feedback: input and output waveforms are color-coded.
Barge-in supported: the user can speak while the AI is speaking; the UX must signal the transition.
One-tap escape: mute and end are always reachable.
Optional transcript overlay: a toggle makes live STT visible.

R1 — Anatomy

┌─────────────────────────────────────────────┐
│  [✕]                                        │  ← top: end session
│                                             │
│            [Assistant Icon]                 │
│         Generated by AI — verify            │
│                                             │
│   ╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮              │
│  ─╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰─             │  ← waveform (output = assistant talking)
│                                             │
│   "Sure, here's what I think..."            │  ← optional transcript overlay
│                                             │
│                                             │
│                                             │
│       ┌────────────────────────┐            │
│       │  Push to talk    [PTT] │            │  ← mode toggle
│       │  Always on       [AON] │            │
│       └────────────────────────┘            │
│                                             │
│                  [🎙 mute]                  │
└─────────────────────────────────────────────┘

Slots:

Slot	Function
End button (✕)	Close voice mode; return to composer or chat surface
Avatar	Assistant icon (configurable)
AI disclaimer	Subtle label per `ai-disclaimer.kmd` R1 tier 1
Waveform	Animated; color shifts per state (R2)
Transcript overlay	Optional toggle; shows live STT (input) and TTS source text (output)
Mode toggle	Push-to-talk vs Always-on
Mute button	One-tap mic mute; visual indicator of mute state

R2 — Waveform states

Color coding per themes/color-roles.kmd:

State	Waveform color	Behavior
idle	`text-muted` flat line	no input/output
listening (input)	`accent` animated bars	input audio captured
processing	`text-muted` pulsing dot	waiting for response
speaking (output)	`success` animated bars	output audio playing
barge-in transition	Crossfade accent ↔ success	brief overlap when user speaks during output

Visuals MUST honor reduced motion: replace the animation with static labels such as "Listening..." and "Speaking..." (R10 `ai.voice.state.*`).
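Here is a minimal sketch of the R2 mapping. It assumes the waveform renderer consumes a plain descriptor. The type names, the `bargeIn` pseudo-state and `waveformVisual` are hypothetical. Only the color roles and the label strings come from R2 and R10.

```typescript
// States from the R2 table; "bargeIn" models the brief crossfade transition.
type VoiceState = "idle" | "listening" | "processing" | "speaking" | "bargeIn";

// Color-role tokens from themes/color-roles.kmd, resolved to real colors by the theme.
type ColorRole = "text-muted" | "accent" | "success";

interface WaveformVisual {
  colors: ColorRole[];              // two entries only while crossfading
  shape: "flat" | "bars" | "pulse";
  animated: boolean;
  label?: string;                   // static text used under reduced motion
}

const BASE_VISUALS: Record<VoiceState, WaveformVisual> = {
  idle: { colors: ["text-muted"], shape: "flat", animated: false },
  listening: { colors: ["accent"], shape: "bars", animated: true },
  processing: { colors: ["text-muted"], shape: "pulse", animated: true },
  speaking: { colors: ["success"], shape: "bars", animated: true },
  bargeIn: { colors: ["success", "accent"], shape: "bars", animated: true }, // output to input
};

// en-US labels from R10; during barge-in the label reflects the new input.
const REDUCED_MOTION_LABELS: Partial<Record<VoiceState, string>> = {
  listening: "Listening...",
  processing: "Processing...",
  speaking: "Speaking...",
  bargeIn: "Listening...",
};

function waveformVisual(state: VoiceState, reducedMotion: boolean): WaveformVisual {
  const base = BASE_VISUALS[state];
  if (!reducedMotion || !base.animated) return base;
  // Reduced motion: keep the color, drop the animation, show a static label.
  return { ...base, animated: false, label: REDUCED_MOTION_LABELS[state] };
}
```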

R3 — Push-to-talk vs Always-on

Mode	Behavior
Push-to-talk (PTT)	Hold the button (or spacebar) to record; release to send. Default for noisy environments.
Always-on (AON)	Continuous capture gated by VAD (voice activity detection). Default for hands-free use.

The selected mode persists per user. See voice/wake-word.kmd R1 toggles (voice.enabled, talkMode).
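This is a rough sketch of the two capture modes. For AON, a simple energy (RMS) threshold with a hangover counter stands in for a real VAD. The threshold, the 25-frame hangover (about 500 ms at an assumed 20 ms frame) and all names here are hypothetical.

```typescript
type CaptureMode = "ptt" | "aon";
type CaptureEvent = "start" | "send"; // "send" = utterance complete, hand off to backend

interface VadOptions {
  threshold: number;      // RMS level treated as speech (hypothetical value)
  hangoverFrames: number; // consecutive silent frames before an utterance ends
}

// Root-mean-square level of one frame of float samples in [-1, 1].
function frameRms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}

class CaptureController {
  mode: CaptureMode;
  private readonly emit: (e: CaptureEvent) => void;
  private readonly vad: VadOptions;
  private recording = false;
  private silentFrames = 0;

  constructor(mode: CaptureMode, emit: (e: CaptureEvent) => void,
              vad: VadOptions = { threshold: 0.02, hangoverFrames: 25 }) {
    this.mode = mode;
    this.emit = emit;
    this.vad = vad;
  }

  // PTT: mic button or spacebar held down.
  press(): void {
    if (this.mode === "ptt" && !this.recording) this.begin();
  }

  // PTT: released, so the captured utterance is sent.
  release(): void {
    if (this.mode === "ptt" && this.recording) this.finish();
  }

  // AON: called for every captured audio frame; the VAD decides start and stop.
  onFrame(samples: Float32Array): void {
    if (this.mode !== "aon") return;
    if (frameRms(samples) >= this.vad.threshold) {
      this.silentFrames = 0;
      if (!this.recording) this.begin();
    } else if (this.recording && ++this.silentFrames >= this.vad.hangoverFrames) {
      this.finish();
    }
  }

  private begin(): void {
    this.recording = true;
    this.silentFrames = 0;
    this.emit("start");
  }

  private finish(): void {
    this.recording = false;
    this.emit("send");
  }
}
```

The hangover keeps short pauses inside a sentence from splitting one utterance into several sends.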

R4 — Barge-in

When the user speaks during assistant output (barge-in):

Per voice/wake-word.kmd R5, with bargeIn: true, output audio fades out and input fades in.
Visual: the waveform color crossfades accent ↔ success over ~200 ms.
Output TTS is interrupted; the new input is streamed to the backend.
Audit: the barge-in event is logged.
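Below is a sketch of the R4 sequence. It assumes a linear 200 ms fade and uses hypothetical hooks into the TTS player, the waveform renderer, the input stream and the audit log. Input starts streaming immediately so the user's first words are not lost. TTS is interrupted only once its gain reaches zero.

```typescript
const BARGE_IN_FADE_MS = 200; // "~200ms" from R4, treated as a tunable constant

interface CrossfadeFrame {
  outputGain: number; // TTS playback gain: 1 down to 0
  accentMix: number;  // waveform color mix: 0 = success (output), 1 = accent (input)
}

// Linear crossfade position `elapsedMs` after barge-in was detected.
function bargeInFrame(elapsedMs: number, durationMs: number = BARGE_IN_FADE_MS): CrossfadeFrame {
  const t = Math.min(Math.max(elapsedMs / durationMs, 0), 1);
  return { outputGain: 1 - t, accentMix: t };
}

// Hypothetical side-effect hooks; real names depend on the voice client.
interface BargeInHooks {
  setOutputGain(gain: number): void;
  setAccentMix(mix: number): void;
  interruptTts(): void;
  streamInput(): void;
  audit(entry: { type: "barge_in"; atMs: number }): void;
}

// Starts a barge-in and returns tick(nowMs), which advances the fade and
// reports true once TTS has been interrupted.
function startBargeIn(hooks: BargeInHooks, startMs: number): (nowMs: number) => boolean {
  hooks.streamInput();
  hooks.audit({ type: "barge_in", atMs: startMs });
  let done = false;
  return (nowMs: number): boolean => {
    if (done) return true;
    const frame = bargeInFrame(nowMs - startMs);
    hooks.setOutputGain(frame.outputGain);
    hooks.setAccentMix(frame.accentMix);
    if (frame.outputGain === 0) {
      hooks.interruptTts();
      done = true;
    }
    return done;
  };
}
```

The caller can drive `tick` from its animation loop (for example `requestAnimationFrame`) until it returns `true`.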

R5 — Transcript overlay

Toggle button (default OFF). When ON:

Live STT text appears as an overlay below the waveform (input state).
Live TTS source text appears (output state).
Auto-scroll; max 3 lines visible.
When voice mode ends, the transcript is persisted to conversation history per conversation-history.kmd (#115).
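A minimal overlay buffer, assuming STT and TTS deliver partial results that are later finalized. "Max 3 lines" is simplified here to the three newest entries rather than three wrapped visual lines. `TranscriptOverlay` and its methods are hypothetical.

```typescript
type Speaker = "user" | "assistant";

interface TranscriptLine {
  speaker: Speaker;
  text: string;
  final: boolean;
}

class TranscriptOverlay {
  private readonly lines: TranscriptLine[] = [];
  private readonly maxVisible: number;

  constructor(maxVisible = 3) {
    this.maxVisible = maxVisible;
  }

  // A partial result replaces the last non-final line of the same speaker;
  // anything else starts a new line.
  update(speaker: Speaker, text: string, final: boolean): void {
    const last = this.lines[this.lines.length - 1];
    if (last && !last.final && last.speaker === speaker) {
      last.text = text;
      last.final = final;
    } else {
      this.lines.push({ speaker, text, final });
    }
  }

  // Auto-scroll: only the newest lines are shown.
  visible(): TranscriptLine[] {
    return this.lines.slice(-this.maxVisible);
  }

  // Everything finalized, for persistence to conversation history.
  finalLines(): TranscriptLine[] {
    return this.lines.filter((l) => l.final);
  }
}
```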

R6 — End session

Triggered by the top-right ✕ button or a swipe-down gesture (mobile):

End TTS playback gracefully (200 ms fade-out).
Disconnect the WebSocket to services/ai/voice.
Return the user to the composer or chat history (configurable per product).
Save the final transcript to conversation history.
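A sketch of the teardown order. The ports for TTS, the services/ai/voice socket, persistence and navigation are hypothetical. The sketch saves the transcript before navigating, and navigates even if saving fails, so the history surface the user lands on already contains it. That ordering is a design choice, not something R6 mandates.

```typescript
type ExitTarget = "composer" | "chatHistory";

// Hypothetical collaborators supplied by the product surface.
interface SessionPorts {
  fadeOutTts(ms: number): Promise<void>;          // resolves once playback is silent
  closeSocket(): void;                            // WebSocket to services/ai/voice
  saveTranscript(lines: string[]): Promise<void>; // conversation history persistence
  navigate(target: ExitTarget): void;             // back to composer or chat history
}

const END_FADE_MS = 200; // R6: graceful 200 ms fade-out

async function endVoiceSession(ports: SessionPorts, transcript: string[], returnTo: ExitTarget): Promise<void> {
  await ports.fadeOutTts(END_FADE_MS); // never cut TTS off abruptly
  ports.closeSocket();                 // no audio in or out after this point
  try {
    await ports.saveTranscript(transcript);
  } finally {
    ports.navigate(returnTo);          // leave voice mode even if saving fails
  }
}
```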

R7 — Mute

One-tap toggle on the bottom-center mic button:

Mute: mic input is gated locally; the backend connection stays open.
Unmute: input resumes.
Mute state is announced via aria-live.
Visual: mic icon shown with a strikethrough while muted.
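A minimal mute gate, assuming every mic frame passes through one function before upload. It drops frames locally and never touches the socket, so the backend stays connected. On the web, `announce` could write into an element with `aria-live="polite"`. "Muted" comes from T7; the "Unmuted" text and all names are hypothetical.

```typescript
// Gates mic frames locally; the backend connection is not touched here.
class MicGate {
  private muted = false;
  private readonly send: (frame: Float32Array) => void;
  private readonly announce: (message: string) => void;

  constructor(send: (frame: Float32Array) => void, announce: (message: string) => void) {
    this.send = send;
    this.announce = announce;
  }

  get isMuted(): boolean {
    return this.muted;
  }

  // One-tap toggle; returns the new state so the UI can swap the mic icon.
  toggle(): boolean {
    this.muted = !this.muted;
    this.announce(this.muted ? "Muted" : "Unmuted");
    return this.muted;
  }

  // Called per captured frame; muted frames are dropped before upload.
  push(frame: Float32Array): void {
    if (!this.muted) this.send(frame);
  }
}
```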

R8 — Surface bindings

Surface	API
Flutter	`KoderVoiceModeSheet({onEnd, onMute, onBargeIn, transcriptToggle})` in `koder_kit/lib/src/ai/voice_mode_sheet.dart`
Web	`<koder-voice-mode-sheet>`
Compose Android	`KoderVoiceModeSheet` in `koder-design-compose` (future)
SwiftUI iOS	Same, in `koder-design-swift` (future)
CLI / TUI	Plain prompt-and-response; mic via the system; no waveform
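One possible shape for the web binding's options is a TypeScript mirror of the Flutter parameters, sketched below. Only `onEnd`, `onMute`, `onBargeIn` and `transcriptToggle` come from R8. Their callback signatures, plus `mode`, `returnTo`, the fallback mode and `resolveSheetOptions`, are hypothetical.

```typescript
type VoiceMode = "ptt" | "aon";
type ReturnTarget = "composer" | "chatHistory";

// Hypothetical mirror of KoderVoiceModeSheet({onEnd, onMute, onBargeIn, transcriptToggle}).
interface VoiceModeSheetOptions {
  onEnd?: () => void;
  onMute?: (muted: boolean) => void;
  onBargeIn?: () => void;
  transcriptToggle?: boolean; // overlay visible at mount; R5 default is OFF
  mode?: VoiceMode;           // hypothetical explicit override of the user pref
  returnTo?: ReturnTarget;    // hypothetical R6 per-product return target
}

interface ResolvedSheetOptions {
  onEnd: () => void;
  onMute: (muted: boolean) => void;
  onBargeIn: () => void;
  transcriptToggle: boolean;
  mode: VoiceMode;
  returnTo: ReturnTarget;
}

const noop = (): void => {};

// Explicit option first, then the persisted per-user mode (R3), then a product fallback.
function resolveSheetOptions(opts: VoiceModeSheetOptions, storedMode: VoiceMode | null,
                             fallbackMode: VoiceMode): ResolvedSheetOptions {
  return {
    onEnd: opts.onEnd ?? noop,
    onMute: opts.onMute ?? noop,
    onBargeIn: opts.onBargeIn ?? noop,
    transcriptToggle: opts.transcriptToggle ?? false,
    mode: opts.mode ?? storedMode ?? fallbackMode,
    returnTo: opts.returnTo ?? "composer",
  };
}
```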

R9 — Accessibility

Sheet: role="dialog" aria-modal="true" aria-label="Voice conversation".
Waveform: aria-hidden="true" (visual only); state announced via aria-live ("Listening", "Speaking").
Buttons: keyboard accessible; spacebar for PTT.
Reduced-motion: waveform replaced by labels.
Screen reader: announces transition states.
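A sketch of the R9 announcement and keyboard rules. It assumes the sheet writes announcements into an aria-live region and receives key events shaped like `KeyboardEvent` (`type`, `key`, `repeat`). A state is not re-announced until it changes. `StateAnnouncer`, `spacebarPtt` and `SHEET_ARIA` are hypothetical names.

```typescript
type A11yState = "idle" | "listening" | "processing" | "speaking";

// Dialog attributes from R9.
const SHEET_ARIA = { role: "dialog", "aria-modal": "true", "aria-label": "Voice conversation" } as const;

// en-US text from R10 ai.voice.state.*; idle is not announced.
const STATE_ANNOUNCEMENTS: Record<A11yState, string | null> = {
  idle: null,
  listening: "Listening...",
  processing: "Processing...",
  speaking: "Speaking...",
};

class StateAnnouncer {
  private last: string | null = null;
  private readonly write: (text: string) => void;

  constructor(write: (text: string) => void) {
    this.write = write;
  }

  // Announce only real changes so screen readers are not flooded.
  onState(state: A11yState): void {
    const text = STATE_ANNOUNCEMENTS[state];
    if (text !== null && text !== this.last) this.write(text);
    this.last = text;
  }
}

interface KeyInput {
  type: "keydown" | "keyup";
  key: string;
  repeat: boolean;
}

// Spacebar PTT: first keydown presses, auto-repeat is ignored, keyup releases.
// Returns true when handled so the caller can preventDefault() (no page scroll).
function spacebarPtt(e: KeyInput, press: () => void, release: () => void): boolean {
  if (e.key !== " ") return false;
  if (e.type === "keydown" && !e.repeat) press();
  if (e.type === "keyup") release();
  return true;
}
```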

R10 — i18n

Key	en-US	pt-BR
`ai.voice.mode.title`	"Voice mode"	"Modo de voz"
`ai.voice.mode.ptt`	"Push to talk"	"Pressionar para falar"
`ai.voice.mode.aon`	"Always on"	"Sempre ativo"
`ai.voice.mode.mute`	"Mute"	"Silenciar"
`ai.voice.mode.unmute`	"Unmute"	"Reativar"
`ai.voice.mode.end`	"End conversation"	"Encerrar conversa"
`ai.voice.mode.transcript_toggle`	"Show transcript"	"Mostrar transcrição"
`ai.voice.state.listening`	"Listening..."	"Ouvindo..."
`ai.voice.state.processing`	"Processing..."	"Processando..."
`ai.voice.state.speaking`	"Speaking..."	"Falando..."
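The R10 table can be held as a typed catalog that falls back to en-US for unknown locales, sketched below. The structure and `voiceText` are hypothetical; the strings are copied from the table.

```typescript
const EN_US = {
  "ai.voice.mode.title": "Voice mode",
  "ai.voice.mode.ptt": "Push to talk",
  "ai.voice.mode.aon": "Always on",
  "ai.voice.mode.mute": "Mute",
  "ai.voice.mode.unmute": "Unmute",
  "ai.voice.mode.end": "End conversation",
  "ai.voice.mode.transcript_toggle": "Show transcript",
  "ai.voice.state.listening": "Listening...",
  "ai.voice.state.processing": "Processing...",
  "ai.voice.state.speaking": "Speaking...",
} as const;

type VoiceKey = keyof typeof EN_US;

// Record<VoiceKey, string> makes the compiler flag any missing pt-BR key.
const PT_BR: Record<VoiceKey, string> = {
  "ai.voice.mode.title": "Modo de voz",
  "ai.voice.mode.ptt": "Pressionar para falar",
  "ai.voice.mode.aon": "Sempre ativo",
  "ai.voice.mode.mute": "Silenciar",
  "ai.voice.mode.unmute": "Reativar",
  "ai.voice.mode.end": "Encerrar conversa",
  "ai.voice.mode.transcript_toggle": "Mostrar transcrição",
  "ai.voice.state.listening": "Ouvindo...",
  "ai.voice.state.processing": "Processando...",
  "ai.voice.state.speaking": "Falando...",
};

const VOICE_CATALOGS: Record<string, Partial<Record<VoiceKey, string>>> = {
  "en-US": EN_US,
  "pt-BR": PT_BR,
};

// Exact locale first, then en-US.
function voiceText(key: VoiceKey, locale: string): string {
  return VOICE_CATALOGS[locale]?.[key] ?? EN_US[key];
}
```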

R11 — Per-preset

Per-preset differences are cosmetic only.

T-suite

T1 Mount: the voice mode sheet renders; the default mode (PTT or AON) follows the user preference.
T2 State transitions: idle → listening → processing → speaking → idle.
T3 Waveform colors: each state renders its R2 color.
T4 PTT mode: hold spacebar → recording; release → send.
T5 AON mode: VAD-triggered start/stop.
T6 Barge-in: speak during output → crossfade accent ↔ success; output TTS interrupted.
T7 Mute: 1-tap → mic gated; aria announces "Muted".
T8 End: tap ✕ → fade out + cleanup + return to composer; transcript saved.
T9 Transcript toggle: enable → overlay visible during conversation.
T10 Reduced-motion: waveform replaced by text labels.
T11 A11y: aria-live announces state transitions.
N1 Mic permission revoked mid-session: graceful end + prompt to re-grant.
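The next sketch is a session state machine that T2, T6 and N1 could exercise, with hypothetical event names. `ended` stands for the R6 teardown. Invalid transitions throw so that tests catch them.

```typescript
type SessionState = "idle" | "listening" | "processing" | "speaking" | "ended";

// Hypothetical events; a real client would map VAD/backend signals onto these.
type SessionEvent =
  | "speechStart" | "speechEnd" | "responseAudio" | "playbackDone"
  | "bargeIn" | "micRevoked" | "end";

const TRANSITIONS: Record<SessionState, Partial<Record<SessionEvent, SessionState>>> = {
  idle: { speechStart: "listening", micRevoked: "ended", end: "ended" },
  listening: { speechEnd: "processing", micRevoked: "ended", end: "ended" },
  processing: { responseAudio: "speaking", micRevoked: "ended", end: "ended" },
  speaking: { playbackDone: "idle", bargeIn: "listening", micRevoked: "ended", end: "ended" },
  ended: {},
};

function nextState(state: SessionState, event: SessionEvent): SessionState {
  const target = TRANSITIONS[state][event];
  if (target === undefined) throw new Error(`invalid transition: ${state} + ${event}`);
  return target;
}

// Replays events from a start state and returns every state visited.
function replay(start: SessionState, events: SessionEvent[]): SessionState[] {
  const trace: SessionState[] = [start];
  for (const e of events) trace.push(nextState(trace[trace.length - 1], e));
  return trace;
}
```

N1 maps to `micRevoked`, which ends the session from any live state. The UI would then show the re-grant prompt.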

Cross-link

Base spec: voice/wake-word.kmd (toggles + backend)
Companion: multimodal-input.kmd (#116: composer mic button entry point), conversation-history.kmd (transcript persistence), ai-disclaimer.kmd (disclaimer label R1)
Backend: services/ai/voice/, services/ai/ai/backlog/pending/115-cli-desktop-voice-input.md

References

meta/docs/stack/specs/voice/wake-word.kmd
meta/docs/stack/specs/ai-ui/multimodal-input.kmd
meta/docs/stack/specs/ai-ui/chat-message-bubble.kmd