
AI voice mode UI

ai-ui specs/ai-ui/voice-mode.kmd

Fullscreen voice conversation mode UI: waveform (input/output color-coded), push-to-talk + always-on toggle, barge-in visual, mute, end session. Extends voice/wake-word.kmd with conversational mode semantics. Required for Talk product + future Kortex voice mode.

Specification body

Spec — AI voice mode UI

Base spec: voice/wake-word.kmd covers toggles + backend. This spec covers the UX of the live conversational mode. Trigger: mic button in multimodal-input.kmd (#116). Impl ticket: services/ai/ai#115.

Principles

  1. Fullscreen focus — voice mode is a dedicated surface, not inline.
  2. Visual feedback — waveform input vs output color-coded.
  3. Barge-in supported — the user can speak while the AI is speaking; the UX must signal the transition.
  4. One-tap escape — mute / end always reachable.
  5. Optional transcription overlay — toggle to show live STT.

R1 — Anatomy

┌─────────────────────────────────────────────┐
│  [✕]                                        │  ← top: end session
│                                             │
│            [Assistant Icon]                 │
│         Generated by AI — verify            │
│                                             │
│   ╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮              │
│  ─╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰─             │  ← waveform (output = assistant talking)
│                                             │
│   "Sure, here's what I think..."            │  ← optional transcript overlay
│                                             │
│                                             │
│                                             │
│       ┌────────────────────────┐            │
│       │  Push to talk    [PTT] │            │  ← mode toggle
│       │  Always on       [AON] │            │
│       └────────────────────────┘            │
│                                             │
│                  [🎙 mute]                  │
└─────────────────────────────────────────────┘

Slots:

| Slot | Function |
| --- | --- |
| End button (✕) | Close voice mode; return to composer or chat surface |
| Avatar | Assistant icon (configurable) |
| AI disclaimer | Subtle label per ai-disclaimer.kmd R1 tier 1 |
| Waveform | Animated; color shifts per state (R2) |
| Transcript overlay | Optional toggle; shows STT live (input) + TTS source (output) |
| Mode toggle | Push-to-talk vs Always-on |
| Mute button | 1-tap mute mic; visual indicator state |

R2 — Waveform states

Color coding per themes/color-roles.kmd:

| State | Waveform color | Behavior |
| --- | --- | --- |
| idle | text-muted flat line | no input/output |
| listening (input) | accent animated bars | input audio captured |
| processing | text-muted pulsing dot | waiting for response |
| speaking (output) | success animated bars | output audio playing |
| barge-in transition | crossfade accent ↔ success | brief overlap when user speaks during output |

Visuals MUST honor reduced-motion: replace the animation with static state labels ("Listening...", "Speaking...").
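A minimal TypeScript sketch of how a renderer might resolve the R2 table, assuming the theme resolves role names such as `accent` to concrete colors. `VoiceState`, `WaveformStyle` and `waveformStyle()` are hypothetical names. The barge-in crossfade is a transition between listening and speaking rather than a state, so it is not modeled here.

```typescript
// Sketch only: every name here is hypothetical; colors and shapes follow the R2 table.
type VoiceState = "idle" | "listening" | "processing" | "speaking";
type ColorRole = "text-muted" | "accent" | "success";
type Shape = "flat-line" | "bars" | "pulsing-dot";

interface WaveformStyle {
  color: ColorRole; // role name from themes/color-roles.kmd, resolved by the theme
  shape: Shape;
  animated: boolean;
  // i18n key of the static label shown instead of the animation (reduced-motion).
  staticLabelKey: string | null;
}

const BASE: Record<VoiceState, { color: ColorRole; shape: Shape }> = {
  idle: { color: "text-muted", shape: "flat-line" },
  listening: { color: "accent", shape: "bars" },
  processing: { color: "text-muted", shape: "pulsing-dot" },
  speaking: { color: "success", shape: "bars" },
};

function waveformStyle(state: VoiceState, reducedMotion: boolean): WaveformStyle {
  const { color, shape } = BASE[state];
  // Idle is a flat line: nothing to animate and nothing to label.
  if (state === "idle") {
    return { color, shape, animated: false, staticLabelKey: null };
  }
  return {
    color,
    shape,
    animated: !reducedMotion,
    staticLabelKey: reducedMotion ? `ai.voice.state.${state}` : null,
  };
}
```

Reusing the `ai.voice.state.*` keys from R10 for the reduced-motion labels keeps the static text and the aria-live announcements in one catalog; that reuse is a suggestion, not something the spec requires.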

R3 — Push-to-talk vs Always-on

| Mode | Behavior |
| --- | --- |
| Push-to-talk (PTT) | Hold button (or spacebar) to record; release to send. Default for noisy environments. |
| Always-on (AON) | Continuous capture with VAD (voice activity detection). Default for hands-free. |

The toggle persists per user. Cross-link: voice/wake-word.kmd R1 toggles (voice.enabled, talkMode).
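The two capture modes can be sketched as one controller, assuming audio arrives as `Float32Array` frames in [-1, 1]. `CaptureController`, the 0.02 RMS threshold and the 700 ms silence hangover are all hypothetical; the energy-threshold VAD is a simple stand-in, not the VAD that services/ai/voice actually runs.

```typescript
// Sketch only: names and thresholds are hypothetical; the RMS-threshold VAD
// stands in for whatever VAD the backend really uses.
type CaptureMode = "ptt" | "aon";
type VadEvent = "start" | "end" | null;

// Root-mean-square level of one audio frame.
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}

class CaptureController {
  readonly mode: CaptureMode;
  private readonly threshold: number; // RMS level treated as speech (assumed value)
  private readonly hangoverMs: number; // trailing silence that ends an AON utterance (assumed)
  private recording = false;
  private silentMs = 0;

  constructor(mode: CaptureMode, threshold = 0.02, hangoverMs = 700) {
    this.mode = mode;
    this.threshold = threshold;
    this.hangoverMs = hangoverMs;
  }

  get isRecording(): boolean {
    return this.recording;
  }

  // PTT: button or spacebar pressed down.
  press(): void {
    if (this.mode === "ptt") this.recording = true;
  }

  // PTT: release ends the utterance; true means "send what was captured".
  release(): boolean {
    if (this.mode !== "ptt" || !this.recording) return false;
    this.recording = false;
    return true;
  }

  // AON: feed every captured frame; reports when the VAD starts or ends an utterance.
  onFrame(samples: Float32Array, frameMs: number): VadEvent {
    if (this.mode !== "aon") return null;
    if (rms(samples) >= this.threshold) {
      this.silentMs = 0;
      if (this.recording) return null;
      this.recording = true;
      return "start";
    }
    if (!this.recording) return null;
    this.silentMs += frameMs;
    if (this.silentMs < this.hangoverMs) return null;
    this.recording = false;
    this.silentMs = 0;
    return "end";
  }
}
```

The hangover keeps short pauses inside a sentence from ending the utterance early; its length is a tuning choice.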

R4 — Barge-in

When user speaks during assistant output (barge-in):

  • Per voice/wake-word.kmd R5: bargeIn: true → output audio fades out + input fades in.
  • Visual: waveform color crossfades accent ↔ success over ~200ms.
  • Output TTS interrupted; new input streamed to backend.
  • Audit: barge-in event logged.
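A sketch of the ~200 ms barge-in crossfade as a pure function of elapsed time, so the same curve can drive the TTS volume and the accent/success color mix. `bargeInFrame()` and `CrossfadeFrame` are hypothetical, and the equal-power (cos/sin) curve is an illustrative choice; the spec only fixes the duration.

```typescript
// Sketch only: hypothetical helper. Duration from R4; the equal-power curve is an assumption.
const BARGE_IN_MS = 200;

interface CrossfadeFrame {
  outputGain: number; // TTS playback volume, 1 → 0
  inputWeight: number; // share of the accent (input) color in the waveform, 0 → 1
  done: boolean; // true once TTS can be stopped and only input remains
}

function bargeInFrame(elapsedMs: number, durationMs = BARGE_IN_MS): CrossfadeFrame {
  // Normalized progress, clamped to [0, 1].
  const t = Math.min(Math.max(elapsedMs / durationMs, 0), 1);
  return {
    // Equal-power: outputGain² + inputWeight² = 1 at every t.
    outputGain: Math.cos((t * Math.PI) / 2),
    inputWeight: Math.sin((t * Math.PI) / 2),
    done: t >= 1,
  };
}
```

When `done` becomes true the client would stop TTS playback and stream the new input, matching the "output TTS interrupted" bullet.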

R5 — Transcript overlay

Toggle button (default OFF). When ON:

  • Live STT text appears as an overlay below the waveform (input state).
  • Live TTS source text appears (output state).
  • Auto-scroll; max 3 lines visible.
  • When voice mode ends, the transcript is persisted to conversation history per conversation-history.kmd (#115).
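A minimal model of the overlay, assuming STT/TTS deliver partial text updates followed by a final one. `TranscriptOverlay` and its methods are hypothetical. Lines are recorded whether or not the overlay is visible, since the transcript is saved at session end regardless of the toggle.

```typescript
// Sketch only: hypothetical overlay model for R5.
type Speaker = "user" | "assistant";

interface TranscriptLine {
  speaker: Speaker;
  text: string;
  final: boolean;
}

class TranscriptOverlay {
  enabled = false; // R5: overlay is OFF by default
  private readonly lines: TranscriptLine[] = [];
  private readonly maxVisible = 3; // R5: max 3 lines visible

  // A partial update replaces the current non-final line from the same speaker.
  push(speaker: Speaker, text: string, final: boolean): void {
    const last = this.lines[this.lines.length - 1];
    if (last && !last.final && last.speaker === speaker) {
      last.text = text;
      last.final = final;
    } else {
      this.lines.push({ speaker, text, final });
    }
  }

  // Auto-scroll: only the newest lines are shown, and only when the toggle is ON.
  visible(): TranscriptLine[] {
    return this.enabled ? this.lines.slice(-this.maxVisible) : [];
  }

  // Final lines only, for the conversation-history write at session end.
  finalTranscript(): TranscriptLine[] {
    return this.lines.filter((line) => line.final);
  }
}
```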

R6 — End session

Top-right ✕ button OR swipe-down gesture (mobile):

  • End TTS playback gracefully (fade out 200ms).
  • Disconnect WebSocket from services/ai/voice.
  • Return user to composer OR chat history (configurable per product).
  • Final transcript saved to conversation history.
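One way to sequence the R6 teardown, assuming each platform supplies the side effects through a small dependency object. `SessionDeps`, `endSession()` and `ReturnTarget` are hypothetical. Saving before navigating (so the destination can show the final transcript) and continuing when the fade fails are assumptions, not spec requirements.

```typescript
// Sketch only: hypothetical teardown for R6; callers supply the side effects.
type ReturnTarget = "composer" | "chat-history";

interface SessionDeps {
  fadeOutTts(ms: number): Promise<void>;
  closeSocket(): void; // WebSocket to services/ai/voice
  saveTranscript(): Promise<void>; // conversation-history write
  navigate(target: ReturnTarget): void;
}

async function endSession(deps: SessionDeps, returnTo: ReturnTarget): Promise<void> {
  try {
    await deps.fadeOutTts(200); // R6: fade TTS out over 200 ms
  } catch {
    // Playback may already be gone (e.g. audio device lost); keep tearing down.
  }
  deps.closeSocket();
  await deps.saveTranscript(); // persist before the surface disappears
  deps.navigate(returnTo); // target is configured per product
}
```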

R7 — Mute

One-tap toggle on the bottom-center mic button:

  • Mute: mic input gated locally; backend still connected.
  • Unmute: input resumes.
  • Mute state announced via aria-live.
  • Visual: mic icon strikethrough.
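A sketch of the local mute gate: frames are dropped on the device while muted, and nothing touches the backend connection. `MicGate` is hypothetical, and the "Unmuted" announcement text is an assumption (T7 only fixes "Muted"); a real build would pull both strings from the i18n catalog.

```typescript
// Sketch only: hypothetical client-side gate for R7.
class MicGate {
  private muted = false;
  private readonly send: (frame: Float32Array) => void;
  private readonly announce: (message: string) => void; // writes to an aria-live region

  constructor(send: (frame: Float32Array) => void, announce: (message: string) => void) {
    this.send = send;
    this.announce = announce;
  }

  // Drives the strikethrough mic icon.
  get isMuted(): boolean {
    return this.muted;
  }

  toggle(): void {
    this.muted = !this.muted;
    this.announce(this.muted ? "Muted" : "Unmuted");
  }

  onFrame(frame: Float32Array): void {
    // Muted frames never leave the device; the socket stays open.
    if (!this.muted) this.send(frame);
  }
}
```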

R8 — Surface bindings

| Surface | API |
| --- | --- |
| Flutter | KoderVoiceModeSheet({onEnd, onMute, onBargeIn, transcriptToggle}) in koder_kit/lib/src/ai/voice_mode_sheet.dart |
| Web | <koder-voice-mode-sheet> |
| Compose Android | KoderVoiceModeSheet in koder-design-compose (future) |
| SwiftUI iOS | Same, in koder-design-swift (future) |
| CLI / TUI | Plain prompt-and-response; mic via system; no waveform |
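The web element's attributes and events are not specified here. As a hypothetical sketch, a TypeScript options shape could mirror the Flutter parameters so the surfaces stay in parity. `VoiceModeSheetOptions`, `resolveOptions()` and the `mode` field are assumptions; falling back to PTT when no preference is stored is also an assumption.

```typescript
// Sketch only: hypothetical options mirroring the Flutter binding's parameters.
type VoiceMode = "ptt" | "aon";

interface VoiceModeSheetOptions {
  onEnd?: () => void;
  onMute?: (muted: boolean) => void;
  onBargeIn?: () => void;
  transcriptToggle?: boolean; // initial overlay state
  mode?: VoiceMode; // initial capture mode
}

const noop = (): void => {};

// Fills in defaults so the element never has to null-check its callbacks.
function resolveOptions(
  opts: VoiceModeSheetOptions,
  storedPref: VoiceMode | null, // per-user toggle from R3
): Required<VoiceModeSheetOptions> {
  return {
    onEnd: opts.onEnd ?? noop,
    onMute: opts.onMute ?? noop,
    onBargeIn: opts.onBargeIn ?? noop,
    transcriptToggle: opts.transcriptToggle ?? false, // R5: default OFF
    mode: opts.mode ?? storedPref ?? "ptt", // "ptt" fallback is an assumption
  };
}
```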

R9 — Accessibility

  • Sheet: role="dialog" aria-modal="true" aria-label="Voice conversation".
  • Waveform: aria-hidden="true" (visual only); state announced via aria-live ("Listening", "Speaking").
  • Buttons: keyboard accessible; spacebar for PTT.
  • Reduced-motion: waveform replaced by labels.
  • Screen reader: announces transition states.
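A sketch of the R9 attributes plus a live-region helper that announces only on a state change, so a screen reader does not repeat "Listening" on every audio frame. `LiveAnnouncer` and the `"Processing"` message are hypothetical; the attribute values and the "Listening"/"Speaking" messages come from R9.

```typescript
// Sketch only: attribute values from R9; LiveAnnouncer and its dedupe rule are assumptions.
type VoiceState = "idle" | "listening" | "processing" | "speaking";

const SHEET_ATTRS: Readonly<Record<string, string>> = {
  role: "dialog",
  "aria-modal": "true",
  "aria-label": "Voice conversation",
};

// The waveform is decorative; state is conveyed through the live region instead.
const WAVEFORM_ATTRS: Readonly<Record<string, string>> = { "aria-hidden": "true" };

const STATE_MESSAGES: Record<VoiceState, string | null> = {
  idle: null,
  listening: "Listening",
  processing: "Processing",
  speaking: "Speaking",
};

class LiveAnnouncer {
  private last: VoiceState | null = null;

  // Text to write into the aria-live region, or null when nothing should be said.
  update(state: VoiceState): string | null {
    if (state === this.last) return null;
    this.last = state;
    return STATE_MESSAGES[state];
  }
}
```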

R10 — i18n

| Key | en-US | pt-BR |
| --- | --- | --- |
| ai.voice.mode.title | "Voice mode" | "Modo de voz" |
| ai.voice.mode.ptt | "Push to talk" | "Pressionar para falar" |
| ai.voice.mode.aon | "Always on" | "Sempre ativo" |
| ai.voice.mode.mute | "Mute" | "Silenciar" |
| ai.voice.mode.unmute | "Unmute" | "Reativar" |
| ai.voice.mode.end | "End conversation" | "Encerrar conversa" |
| ai.voice.mode.transcript_toggle | "Show transcript" | "Mostrar transcrição" |
| ai.voice.state.listening | "Listening..." | "Ouvindo..." |
| ai.voice.state.processing | "Processing..." | "Processando..." |
| ai.voice.state.speaking | "Speaking..." | "Falando..." |
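The R10 table can be loaded as a simple catalog. In this hypothetical sketch, `translate()` falls back to en-US and then to the key itself; that fallback policy is an assumption, not part of the spec.

```typescript
// Sketch only: catalog copied from the R10 table; translate() and its fallback are assumptions.
type Locale = "en-US" | "pt-BR";

const CATALOG: Record<Locale, Record<string, string>> = {
  "en-US": {
    "ai.voice.mode.title": "Voice mode",
    "ai.voice.mode.ptt": "Push to talk",
    "ai.voice.mode.aon": "Always on",
    "ai.voice.mode.mute": "Mute",
    "ai.voice.mode.unmute": "Unmute",
    "ai.voice.mode.end": "End conversation",
    "ai.voice.mode.transcript_toggle": "Show transcript",
    "ai.voice.state.listening": "Listening...",
    "ai.voice.state.processing": "Processing...",
    "ai.voice.state.speaking": "Speaking...",
  },
  "pt-BR": {
    "ai.voice.mode.title": "Modo de voz",
    "ai.voice.mode.ptt": "Pressionar para falar",
    "ai.voice.mode.aon": "Sempre ativo",
    "ai.voice.mode.mute": "Silenciar",
    "ai.voice.mode.unmute": "Reativar",
    "ai.voice.mode.end": "Encerrar conversa",
    "ai.voice.mode.transcript_toggle": "Mostrar transcrição",
    "ai.voice.state.listening": "Ouvindo...",
    "ai.voice.state.processing": "Processando...",
    "ai.voice.state.speaking": "Falando...",
  },
};

function translate(key: string, locale: string): string {
  // Unknown locales yield undefined here and fall through to en-US.
  const table = (CATALOG as Partial<Record<string, Record<string, string>>>)[locale];
  return table?.[key] ?? CATALOG["en-US"][key] ?? key;
}
```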

R11 — Per-preset

Preset differences are cosmetic only.

T-suite

  • T1 Mount: voice mode sheet renders; default mode PTT or AON per user pref.
  • T2 State transitions: idle → listening → processing → speaking → idle.
  • T3 Waveform colors: each state correct color.
  • T4 PTT mode: hold spacebar → recording; release → send.
  • T5 AON mode: VAD-triggered start/stop.
  • T6 Barge-in: speak during output → crossfade accent ↔ success; output TTS interrupted.
  • T7 Mute: 1-tap → mic gated; aria announces "Muted".
  • T8 End: tap ✕ → fade out + cleanup + return to composer; transcript saved.
  • T9 Transcript toggle: enable → overlay visible during conversation.
  • T10 Reduced-motion: waveform replaced by text labels.
  • T11 A11y: aria-live announces state transitions.
  • N1 Mic permission revoked mid-session: graceful end + prompt to re-grant.
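T2, T6 and N1 are easier to assert against an explicit transition table. This hypothetical sketch allows only the transitions named in this spec: the T2 cycle, speaking → listening for barge-in, and an `ended` state reachable from anywhere for R6 / N1. `ALLOWED` and `isValidPath()` are illustrative names.

```typescript
// Sketch only: hypothetical transition table for driving T2, T6 and N1.
type State = "idle" | "listening" | "processing" | "speaking" | "ended";

const ALLOWED: Record<State, readonly State[]> = {
  idle: ["listening", "ended"],
  listening: ["processing", "ended"],
  processing: ["speaking", "ended"],
  speaking: ["idle", "listening", "ended"], // "listening" here is barge-in (R4, T6)
  ended: [], // terminal: ✕, swipe-down, or mic permission revoked (N1)
};

// True when every consecutive pair in the path is an allowed transition.
function isValidPath(path: readonly State[]): boolean {
  for (let i = 1; i < path.length; i++) {
    if (!ALLOWED[path[i - 1]].includes(path[i])) return false;
  }
  return true;
}
```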

References