AI voice mode UI
ai-ui specs/ai-ui/voice-mode.kmd
Fullscreen voice conversation mode UI: waveform (input/output color-coded), push-to-talk + always-on toggle, barge-in visual, mute, end session. Extends voice/wake-word.kmd with conversational mode semantics. Required for Talk product + future Kortex voice mode.
When this spec applies
Primary triggers
- Render fullscreen voice conversation UI
All triggers
- Build conversational voice mode in any AI product
- Implement Talk voice surface
Specification body
Spec — AI voice mode UI
Base spec: voice/wake-word.kmd covers toggles + backend. This spec covers the UX of the live conversational mode. Trigger: mic button in multimodal-input.kmd (#116). Impl ticket: services/ai/ai#115.
Principles
- Fullscreen focus: voice mode is a dedicated surface, not inline.
- Visual feedback: waveform is color-coded for input vs output.
- Barge-in supported: the user can speak while the assistant is speaking; the UX must signal the transition.
- One-tap escape: mute / end are always reachable.
- Optional transcription overlay: a toggle makes live STT visible.
R1 — Anatomy
┌─────────────────────────────────────────────┐
│ [✕] │ ← top: end session
│ │
│ [Assistant Icon] │
│ Generated by AI — verify │
│ │
│ ╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮ │
│ ─╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰─ │ ← waveform (output = assistant talking)
│ │
│ "Sure, here's what I think..." │ ← optional transcript overlay
│ │
│ │
│ │
│ ┌────────────────────────┐ │
│ │ Push to talk [PTT] │ │ ← mode toggle
│ │ Always on [AON] │ │
│ └────────────────────────┘ │
│ │
│ [🎙 mute] │
└─────────────────────────────────────────────┘
Slots:
| Slot | Function |
|---|---|
| End button (✕) | Close voice mode; return to composer or chat surface |
| Avatar | Assistant icon (configurable) |
| AI disclaimer | Subtle label per ai-disclaimer.kmd R1 tier 1 |
| Waveform | Animated; color shifts per state (R2) |
| Transcript overlay | Optional toggle; shows STT live (input) + TTS source (output) |
| Mode toggle | Push-to-talk vs Always-on |
| Mute button | 1-tap mute mic; visual indicator state |
R2 — Waveform states
Color coding per themes/color-roles.kmd:
| State | Waveform color | Behavior |
|---|---|---|
| idle | text-muted flat line | no input/output |
| listening (input) | accent animated bars | input audio captured |
| processing | text-muted pulsing dot | waiting for response |
| speaking (output) | success animated bars | output audio playing |
| barge-in transition | Crossfade accent ↔ success | brief overlap when user speaks during output |
The waveform MUST honor reduced-motion: replace the animation with static "Listening..." / "Speaking..." labels (strings per R10).
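The state table and the reduced-motion rule can be sketched as a single lookup. This is a minimal illustration, not the koder_kit implementation: the names `VoiceState`, `WaveformVisual`, and `waveformVisual` are hypothetical, and treating barge-in as "listening" for the reduced-motion label is an assumption the spec does not state.

```typescript
// Hypothetical types; only the color roles and states come from the R2 table.
type VoiceState = "idle" | "listening" | "processing" | "speaking" | "bargeIn";

interface WaveformVisual {
  colorRole: string; // color role name from themes/color-roles.kmd
  shape: "flat" | "bars" | "dot" | "crossfade" | "label";
  animated: boolean;
  labelKey?: string; // i18n key shown instead of animation (reduced motion)
}

const VISUALS: Record<VoiceState, WaveformVisual> = {
  idle: { colorRole: "text-muted", shape: "flat", animated: false },
  listening: { colorRole: "accent", shape: "bars", animated: true },
  processing: { colorRole: "text-muted", shape: "dot", animated: true },
  speaking: { colorRole: "success", shape: "bars", animated: true },
  bargeIn: { colorRole: "accent", shape: "crossfade", animated: true },
};

// Assumption: barge-in reuses the "listening" label when motion is reduced.
const REDUCED_MOTION_LABEL_KEY: Partial<Record<VoiceState, string>> = {
  listening: "ai.voice.state.listening",
  processing: "ai.voice.state.processing",
  speaking: "ai.voice.state.speaking",
  bargeIn: "ai.voice.state.listening",
};

function waveformVisual(state: VoiceState, reducedMotion: boolean): WaveformVisual {
  const base = VISUALS[state];
  // Non-animated states (idle) need no replacement.
  if (!reducedMotion || !base.animated) return base;
  return {
    colorRole: base.colorRole,
    shape: "label",
    animated: false,
    labelKey: REDUCED_MOTION_LABEL_KEY[state],
  };
}
```

Keeping the mapping in one table means the T3 (colors) and T10 (reduced-motion) checks can both be driven from the same data.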
R3 — Push-to-talk vs Always-on
| Mode | Behavior |
|---|---|
| Push-to-talk (PTT) | Hold button (or spacebar) to record; release to send. Default for noisy environments. |
| Always-on (AON) | Continuous capture with VAD (voice activity detection). Default for hands-free. |
Toggle persists per-user. Cross-link voice/wake-word.kmd R1 toggles (voice.enabled, talkMode).
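The two modes differ only in which events start and stop capture, which a small pure function can show. This is a sketch under assumptions: `TalkMode`, `CaptureEvent`, and `onCaptureEvent` are hypothetical names, and sending the utterance when VAD reports end of speech is assumed by analogy with "release to send".

```typescript
type TalkMode = "ptt" | "aon";
type CaptureEvent = "pressDown" | "pressUp" | "vadSpeechStart" | "vadSpeechEnd";
type CaptureAction = "startRecording" | "sendUtterance" | "none";

interface CaptureResult {
  recording: boolean;
  action: CaptureAction;
}

function onCaptureEvent(
  mode: TalkMode,
  recording: boolean,
  event: CaptureEvent,
): CaptureResult {
  // PTT listens to the button/spacebar; AON listens to the VAD.
  const startEvent: CaptureEvent = mode === "ptt" ? "pressDown" : "vadSpeechStart";
  const stopEvent: CaptureEvent = mode === "ptt" ? "pressUp" : "vadSpeechEnd";

  // The !recording guard absorbs keyboard auto-repeat while the key is held.
  if (event === startEvent && !recording) {
    return { recording: true, action: "startRecording" };
  }
  if (event === stopEvent && recording) {
    return { recording: false, action: "sendUtterance" };
  }
  // Events that belong to the other mode are ignored.
  return { recording, action: "none" };
}
```

For example, `onCaptureEvent("ptt", false, "vadSpeechStart")` leaves the state untouched, so a VAD that keeps running in the background cannot start a recording while the user is in push-to-talk.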
R4 — Barge-in
When user speaks during assistant output (barge-in):
- Per voice/wake-word.kmd R5: bargeIn: true → output audio fades out + input fades in.
- Visual: waveform color crossfades accent ↔ success over ~200ms.
- Output TTS interrupted; new input streamed to backend.
- Audit: barge-in event logged.
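The barge-in steps above can be sketched as a crossfade curve plus the list of side effects to trigger. The linear curve is an assumption (the spec only gives the ~200ms duration), and `bargeInCrossfade`, `onSpeechDuringOutput`, and the effect names are hypothetical.

```typescript
const BARGE_IN_FADE_MS = 200; // "~200ms" from R4

interface CrossfadeFrame {
  outputGain: number;  // 1 = assistant audio/color fully present
  inputWeight: number; // 1 = user input color fully present
  done: boolean;
}

// Linear crossfade, clamped so early or late timestamps stay in [0, 1].
function bargeInCrossfade(
  elapsedMs: number,
  durationMs: number = BARGE_IN_FADE_MS,
): CrossfadeFrame {
  const t = Math.min(1, Math.max(0, elapsedMs / durationMs));
  return { outputGain: 1 - t, inputWeight: t, done: t >= 1 };
}

type BargeInEffect = "fadeOutOutput" | "interruptTts" | "streamInput" | "logAuditEvent";

// Effects to run when user speech is detected while the assistant is speaking.
// With bargeIn disabled nothing happens (assumed behavior; not stated in R4).
function onSpeechDuringOutput(bargeInEnabled: boolean): BargeInEffect[] {
  return bargeInEnabled
    ? ["fadeOutOutput", "interruptTts", "streamInput", "logAuditEvent"]
    : [];
}
```

Driving both the audio gain and the waveform color from the same `CrossfadeFrame` keeps the visual transition in step with what the user hears.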
R5 — Transcript overlay
Toggle button (default OFF). When ON:
- Live STT text appears overlay below waveform (input state).
- Live TTS source text appears (output state).
- Auto-scroll; max 3 lines visible.
- After mode end: transcript persisted to conversation history per conversation-history.kmd (#115).
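The "auto-scroll; max 3 lines visible" rule amounts to wrapping the live text and keeping only the tail. A minimal sketch, assuming a character-based width (a real surface would measure rendered text); `wrapWords` and `overlayLines` are hypothetical helpers.

```typescript
// Greedy word wrap. A single word longer than maxChars gets its own line.
function wrapWords(text: string, maxChars: number): string[] {
  const lines: string[] = [];
  let current = "";
  for (const word of text.split(/\s+/).filter((w) => w.length > 0)) {
    const candidate = current === "" ? word : current + " " + word;
    if (candidate.length <= maxChars || current === "") {
      current = candidate;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

// Keep only the newest lines so the overlay appears to auto-scroll.
function overlayLines(text: string, maxChars: number, maxVisible: number = 3): string[] {
  const lines = wrapWords(text, maxChars);
  return lines.slice(Math.max(0, lines.length - maxVisible));
}
```

Calling `overlayLines` again each time the STT or TTS text grows yields the latest window, so no scroll position needs to be tracked.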
R6 — End session
Top-right ✕ button OR swipe-down gesture (mobile):
- End TTS playback gracefully (fade out 200ms).
- Disconnect WebSocket from services/ai/voice.
- Return user to composer OR chat history (configurable per product).
- Final transcript saved to conversation history.
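One way to sequence the end-session steps is sketched below with injected dependencies. The `EndSessionDeps` interface and `endVoiceSession` are hypothetical, and two choices are assumptions rather than spec: the transcript is saved before navigating away, and navigation runs in `finally` so the user still leaves the surface if a teardown step fails.

```typescript
const END_FADE_MS = 200; // fade-out duration from R6

type ReturnTarget = "composer" | "chatHistory";

// Hypothetical seams; each product would supply its own implementations.
interface EndSessionDeps {
  fadeOutTts(durationMs: number): Promise<void>;
  disconnectVoiceSocket(): Promise<void>; // WebSocket to services/ai/voice
  saveTranscript(): Promise<void>;
  navigateTo(target: ReturnTarget): void;
}

async function endVoiceSession(deps: EndSessionDeps, target: ReturnTarget): Promise<void> {
  try {
    await deps.fadeOutTts(END_FADE_MS);
    await deps.disconnectVoiceSocket();
    await deps.saveTranscript();
  } finally {
    // Always return the user, even when a step above throws.
    deps.navigateTo(target);
  }
}
```

Both the ✕ button and the swipe-down gesture can call the same function, which keeps the T8 cleanup check independent of the input method.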
R7 — Mute
1-tap toggle bottom-center mic button:
- Mute: mic input gated locally; backend still connected.
- Unmute: input resumes.
- Mute state announced via aria-live.
- Visual: mic icon strikethrough.
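"Gated locally; backend still connected" can be read as dropping audio frames before they are sent instead of closing the connection. A minimal sketch of that reading: `MicGate` is a hypothetical class, "Muted" follows T7, and the "Unmuted" announcement text is an assumption.

```typescript
class MicGate {
  private muted = false;

  get isMuted(): boolean {
    return this.muted;
  }

  // Flips the mute state and returns the text to push into the aria-live region.
  toggle(): string {
    this.muted = !this.muted;
    return this.muted ? "Muted" : "Unmuted";
  }

  // Returns the frame to forward to the backend, or null when it must be dropped.
  // The connection itself is never touched here.
  pass(frame: Float32Array): Float32Array | null {
    return this.muted ? null : frame;
  }
}
```

Because the socket stays open, unmuting resumes input on the next frame without a reconnect.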
R8 — Surface bindings
| Surface | API |
|---|---|
| Flutter | KoderVoiceModeSheet({onEnd, onMute, onBargeIn, transcriptToggle}) in koder_kit/lib/src/ai/voice_mode_sheet.dart |
| Web | <koder-voice-mode-sheet> |
| Compose Android | KoderVoiceModeSheet in koder-design-compose (future) |
| SwiftUI iOS | same in koder-design-swift (future) |
| CLI / TUI | Plain prompt-and-response; mic via system; no waveform |
R9 — Accessibility
- Sheet: role="dialog" aria-modal="true" aria-label="Voice conversation".
- Waveform: aria-hidden="true" (visual only); state announced via aria-live ("Listening", "Speaking").
- Buttons: keyboard accessible; spacebar for PTT.
- Reduced-motion: waveform replaced by labels.
- Screen reader: announces transition states.
R10 — i18n
| Key | en-US | pt-BR |
|---|---|---|
| ai.voice.mode.title | "Voice mode" | "Modo de voz" |
| ai.voice.mode.ptt | "Push to talk" | "Pressionar para falar" |
| ai.voice.mode.aon | "Always on" | "Sempre ativo" |
| ai.voice.mode.mute | "Mute" | "Silenciar" |
| ai.voice.mode.unmute | "Unmute" | "Reativar" |
| ai.voice.mode.end | "End conversation" | "Encerrar conversa" |
| ai.voice.mode.transcript_toggle | "Show transcript" | "Mostrar transcrição" |
| ai.voice.state.listening | "Listening..." | "Ouvindo..." |
| ai.voice.state.processing | "Processing..." | "Processando..." |
| ai.voice.state.speaking | "Speaking..." | "Falando..." |
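A lookup over these keys might look like the sketch below. Only the three state strings are copied from the table; the `translate` helper and its fallback order (requested locale, then en-US, then the key itself) are assumptions, not part of the spec.

```typescript
// Subset of the R10 table; the remaining keys follow the same shape.
const MESSAGES: Record<string, Record<string, string>> = {
  "en-US": {
    "ai.voice.state.listening": "Listening...",
    "ai.voice.state.processing": "Processing...",
    "ai.voice.state.speaking": "Speaking...",
  },
  "pt-BR": {
    "ai.voice.state.listening": "Ouvindo...",
    "ai.voice.state.processing": "Processando...",
    "ai.voice.state.speaking": "Falando...",
  },
};

// Assumed fallback chain: locale -> en-US -> raw key (so a gap is visible in QA).
function translate(key: string, locale: string): string {
  return MESSAGES[locale]?.[key] ?? MESSAGES["en-US"]?.[key] ?? key;
}
```

The same strings can feed both the reduced-motion labels and the aria-live announcements, so each state has a single wording per locale.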
R11 — Per-preset
Cosmetic only: presets may restyle this surface but must not change the behavior defined above.
T-suite
- T1 Mount: voice mode sheet renders; default mode PTT or AON per user pref.
- T2 State transitions: idle → listening → processing → speaking → idle.
- T3 Waveform colors: each state correct color.
- T4 PTT mode: hold spacebar → recording; release → send.
- T5 AON mode: VAD-triggered start/stop.
- T6 Barge-in: speak during output → crossfade accent ↔ success; output TTS interrupted.
- T7 Mute: 1-tap → mic gated; aria announces "Muted".
- T8 End: tap ✕ → fade out + cleanup + return to composer; transcript saved.
- T9 Transcript toggle: enable → overlay visible during conversation.
- T10 Reduced-motion: waveform replaced by text labels.
- T11 A11y: aria-live announces state transitions.
- N1 Mic permission revoked mid-session: graceful end + prompt to re-grant.
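T2 and T6 both assert on state sequences, so a transition table can serve as a shared test helper. This is a sketch: the table encodes only the T2 cycle plus the barge-in edge from R4, and any other edge (for example cancelling while listening) is deliberately left out because the spec does not define it. `NEXT` and `isValidPath` are hypothetical names.

```typescript
type VoiceState = "idle" | "listening" | "processing" | "speaking";

// Allowed next states. speaking -> listening is the barge-in edge (R4).
const NEXT: Record<VoiceState, readonly VoiceState[]> = {
  idle: ["listening"],
  listening: ["processing"],
  processing: ["speaking"],
  speaking: ["idle", "listening"],
};

// True when every consecutive pair in the observed sequence is an allowed edge.
function isValidPath(path: readonly VoiceState[]): boolean {
  for (let i = 1; i < path.length; i++) {
    const prev = path[i - 1];
    const next = path[i];
    if (prev === undefined || next === undefined || !NEXT[prev].includes(next)) {
      return false;
    }
  }
  return true;
}
```

A test can record the states the sheet passes through and assert `isValidPath(recorded)` instead of hard-coding each step.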
Cross-link
- Base spec: voice/wake-word.kmd (toggles + backend)
- Companion: multimodal-input.kmd (#116, composer mic button entry point), conversation-history.kmd (transcript persist), ai-disclaimer.kmd (disclaimer label R1)
- Backend: services/ai/voice/, services/ai/ai/backlog/pending/115-cli-desktop-voice-input.md
References
- meta/docs/stack/specs/voice/wake-word.kmd
- meta/docs/stack/specs/ai-ui/multimodal-input.kmd
- meta/docs/stack/specs/ai-ui/chat-message-bubble.kmd