Speak inside it and structure emerges. Ask of it and it answers, then draws. Twenty-eight typed actions: proposals, decisions, sticky notes, geometric shapes, arrows, alignments, deletions. Both voices, one canvas, in real time.
Speakers don't pause for the canvas. The canvas catches up — at a three-second cadence, in typed actions that already know how they relate to each other.
Both paths, spoken and typed, share the same typed-action interface and the same dedup, layout, and persistence machinery. A direct spoken command produces the same action shape as a typed instruction. The chat panel remembers what was already discussed; the orchestrator never forgets a decision it has already written.
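The text does not say how deduplication works. A minimal sketch under one assumption: two actions count as duplicates when they share an action type and their text matches after case and punctuation are folded away. `CardAction`, `dedupKey`, and `ActionDeduper` are hypothetical names; the point is that a single deduper instance sits behind both the voice path and the chat path.

```typescript
// Hypothetical action shape for illustration; the product's real schema is not shown in the text.
interface CardAction {
  type: string; // e.g. "create_decision" (hypothetical name)
  text: string; // card body
}

// Fold case, punctuation, and whitespace so "Ship on Friday." and "ship on friday" collide.
function dedupKey(action: CardAction): string {
  const normalized = action.text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
  return action.type + "|" + normalized;
}

// One deduper shared by the voice path and the chat path (assumption).
class ActionDeduper {
  private seen = new Set<string>();

  // Returns only actions not seen before, and remembers them for later batches.
  filter(actions: CardAction[]): CardAction[] {
    const fresh: CardAction[] = [];
    for (const a of actions) {
      const key = dedupKey(a);
      if (!this.seen.has(key)) {
        this.seen.add(key);
        fresh.push(a);
      }
    }
    return fresh;
  }
}
```

Because the key includes the action type, the same words can still appear once as a decision and once as a blocker; only exact restatements of the same kind of card are dropped.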
Five meeting cards. The five typed nouns the orchestrator emits from passive listening: Proposal, Decision, Commitment, Blocker, Question. Each carries its own visual rhythm so the canvas reads top-to-bottom as a story.
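The text says these cards are Zod-validated. To stay dependency-free, the sketch below swaps the Zod schema for a hand-written guard that does the same job: accept a well-formed card of one of the five kinds, reject everything else. The field names (`kind`, `text`, `speaker`) and the function name `parseMeetingCard` are illustrative assumptions, not the product's actual schema.

```typescript
// The five meeting-card kinds named in the text.
const CARD_KINDS = ["proposal", "decision", "commitment", "blocker", "question"] as const;
type CardKind = (typeof CARD_KINDS)[number];

// Hypothetical card payload; these field names are illustrative, not the product's schema.
interface MeetingCard {
  kind: CardKind;
  text: string;
  speaker?: string; // diarized speaker label, when known
}

// Dependency-free stand-in for a Zod schema: returns a typed card, or null if the input is malformed.
function parseMeetingCard(input: unknown): MeetingCard | null {
  if (typeof input !== "object" || input === null) return null;
  const obj = input as { [key: string]: unknown };
  const kind = obj["kind"];
  const text = obj["text"];
  const speaker = obj["speaker"];
  if (typeof kind !== "string" || CARD_KINDS.indexOf(kind as CardKind) === -1) return null;
  if (typeof text !== "string" || text.trim() === "") return null;
  if (speaker !== undefined && typeof speaker !== "string") return null;
  const card: MeetingCard = { kind: kind as CardKind, text };
  if (typeof speaker === "string") card.speaker = speaker;
  return card;
}
```

An unknown kind such as `"vote"` comes back as `null` rather than as a best-guess card, which is what keeps the set of nouns closed.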
Three free-form shapes. Direct commands, by voice or chat, also reach for tldraw's native repertoire: sticky notes for jots, geometric shapes for diagrams, text for headings. The chat panel converts intent into the closest fit.
Manipulation. Every shape can later be moved, resized, restyled, aligned, distributed, reordered, deleted, or pointed at by a freeform arrow. The canvas is editable conversationally, not just at creation.
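To make "editable conversationally" concrete, here is a standalone sketch of three manipulation verbs as a typed union applied to plain bounding boxes. tldraw ships its own alignment and distribution behavior; this code neither uses nor reproduces it. `Box`, `ManipulateAction`, the verb names, and `applyAction` are all hypothetical.

```typescript
// Hypothetical shape geometry: top-left corner plus width and height.
interface Box { id: string; x: number; y: number; w: number; h: number; }

// Three of the manipulation verbs as a discriminated union (names are hypothetical).
type ManipulateAction =
  | { type: "move"; id: string; dx: number; dy: number }
  | { type: "align_left"; ids: string[] }
  | { type: "distribute_horizontal"; ids: string[] };

// Apply one action to a copy of the shape list, leaving the input untouched.
function applyAction(shapes: Box[], action: ManipulateAction): Box[] {
  const next = shapes.map((s) => ({ ...s }));
  const byId: { [id: string]: Box } = {};
  for (const s of next) byId[s.id] = s;

  switch (action.type) {
    case "move": {
      const s = byId[action.id];
      if (s) { s.x += action.dx; s.y += action.dy; }
      break;
    }
    case "align_left": {
      // Snap every selected shape to the leftmost selected edge.
      const sel = action.ids.map((id) => byId[id]).filter((s) => s !== undefined);
      const minX = Math.min(...sel.map((s) => s.x));
      for (const s of sel) s.x = minX;
      break;
    }
    case "distribute_horizontal": {
      // Keep the outermost shapes fixed and make the gaps between neighbours equal.
      const sel = action.ids
        .map((id) => byId[id])
        .filter((s) => s !== undefined)
        .sort((a, b) => a.x - b.x);
      if (sel.length < 3) break;
      const first = sel[0];
      const last = sel[sel.length - 1];
      const totalWidth = sel.reduce((sum, s) => sum + s.w, 0);
      const gap = (last.x + last.w - first.x - totalWidth) / (sel.length - 1);
      let cursor = first.x;
      for (const s of sel) { s.x = cursor; cursor += s.w + gap; }
      break;
    }
  }
  return next;
}
```

Returning a fresh array keeps each spoken command a pure transformation, which makes it straightforward to log and replay.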
Every utterance — spoken or typed — gets normalized into one of these twenty-eight Zod-validated action variants. The model doesn't invent verbs; if it tries, a runtime normalizer maps the drift back to the canonical form.
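A minimal sketch of such a normalizer, assuming it works by folding case and separators and then consulting an alias table. Only five canonical types are shown, and every name in `CANONICAL` and `ALIASES` is hypothetical; the real twenty-eight variants are not listed in the text.

```typescript
// A few canonical action types; the real vocabulary has twenty-eight, and these names are hypothetical.
const CANONICAL = ["create_proposal", "create_decision", "create_sticky", "move_shape", "delete_shape"] as const;
type CanonicalType = (typeof CANONICAL)[number];

// Hypothetical alias table: verbs a model might drift toward, mapped back to the canonical form.
const ALIASES: { [drifted: string]: CanonicalType } = {
  propose: "create_proposal",
  add_proposal: "create_proposal",
  decide: "create_decision",
  lock_decision: "create_decision",
  add_note: "create_sticky",
  sticky_note: "create_sticky",
  reposition: "move_shape",
  remove: "delete_shape",
  erase: "delete_shape",
};

function isCanonical(t: string): t is CanonicalType {
  return (CANONICAL as readonly string[]).indexOf(t) !== -1;
}

// Map a raw verb from model output onto the closed alphabet, or return null to reject it.
function normalizeActionType(raw: string): CanonicalType | null {
  // Fold case, hyphens, and spaces: "Lock-Decision" becomes "lock_decision".
  const folded = raw.trim().toLowerCase().replace(/[\s-]+/g, "_");
  if (isCanonical(folded)) return folded;
  return Object.prototype.hasOwnProperty.call(ALIASES, folded) ? ALIASES[folded] : null;
}
```

Returning `null` for a verb with no known mapping, instead of guessing, is what keeps the alphabet closed: drift is either corrected or dropped, never passed through.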
Action. Speechmatics streams a diarized transcript. A Gemini 3 Flash orchestrator reads the last thirty seconds and emits typed UI actions: make a proposal, lock a decision, raise a blocker, pose a question.
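The copy gives two numbers: a thirty-second context window and a three-second cadence. A minimal sketch of how they could combine, assuming segments arrive in time order and that the orchestrator call is skipped when nobody has spoken since the last one. `Segment` and `TranscriptWindow` are hypothetical; Speechmatics' real-time messages have their own format, which is not modeled here.

```typescript
// Hypothetical diarized segment; not the Speechmatics wire format.
interface Segment {
  speaker: string; // diarization label, e.g. "S1"
  text: string;
  endMs: number; // segment end time, in ms since the call started
}

const WINDOW_MS = 30000; // "reads the last thirty seconds"
const CADENCE_MS = 3000; // "at a three-second cadence"

class TranscriptWindow {
  private segments: Segment[] = [];
  private lastTickMs = -Infinity;
  private lastSeenEndMs = -Infinity;

  // Assumes segments are pushed in chronological order.
  push(seg: Segment): void {
    this.segments.push(seg);
  }

  // Call as often as you like; returns prompt text only when a cadence interval has
  // elapsed and new speech has arrived since the previous orchestrator call.
  tick(nowMs: number): string | null {
    if (nowMs - this.lastTickMs < CADENCE_MS) return null;
    // Evict segments that ended more than thirty seconds ago.
    this.segments = this.segments.filter((s) => nowMs - s.endMs <= WINDOW_MS);
    const newest = this.segments.length > 0 ? this.segments[this.segments.length - 1].endMs : -Infinity;
    if (newest <= this.lastSeenEndMs) return null; // nothing new: skip the model call
    this.lastTickMs = nowMs;
    this.lastSeenEndMs = newest;
    return this.segments.map((s) => s.speaker + ": " + s.text).join("\n");
  }
}
```

Because speech keeps streaming into the buffer between ticks, speakers never wait on the model; the canvas simply catches up on the next interval.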
Relation. Cards do not float independently. Proposals resolve into decisions; commitments hang off their parent decision; blockers point at what they block. The orchestrator names the links before it draws them.
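"Names the links before it draws them" suggests checking each proposed link against the cards already on the canvas before emitting an arrow. A minimal sketch of that check follows. The relation identifiers are hypothetical, and the allowed targets for a blocker (a proposal, decision, or commitment) are an assumption, since the text only says blockers point at what they block.

```typescript
type NodeKind = "proposal" | "decision" | "commitment" | "blocker" | "question";

// Hypothetical relation names; the text describes the links but not their identifiers.
type Relation = "resolves_into" | "commits_to" | "blocks";

interface Link {
  relation: Relation;
  fromId: string;
  toId: string;
}

// Allowed endpoint kinds per relation, read off the prose. Blocker targets are an assumption.
const RULES: { [R in Relation]: { from: NodeKind[]; to: NodeKind[] } } = {
  resolves_into: { from: ["proposal"], to: ["decision"] },
  commits_to: { from: ["commitment"], to: ["decision"] },
  blocks: { from: ["blocker"], to: ["proposal", "decision", "commitment"] },
};

// Keep only links whose endpoints exist on the canvas and have the kinds the relation expects.
function validLinks(cards: { [id: string]: NodeKind }, links: Link[]): Link[] {
  return links.filter((l) => {
    const from = cards[l.fromId];
    const to = cards[l.toId];
    if (from === undefined || to === undefined) return false; // dangling reference
    const rule = RULES[l.relation];
    return rule.from.indexOf(from) !== -1 && rule.to.indexOf(to) !== -1;
  });
}
```

A reversed link (a decision "resolving into" a proposal) or a link to a card that does not exist is dropped before any arrow reaches the canvas.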
Direction. The same canvas takes direct commands, spoken or typed, across the full action vocabulary. Voice for ambient capture, chat for sharper intent. One closed alphabet, two surfaces.
Memory. Action history, tldraw store snapshots, and the chat agent's conversation all persist to Postgres. Reload the canvas and the orchestrator still remembers what was decided three hours ago.
Artifact. What remains at the end of the call is a typed, navigable document of proposals, decisions, commitments, and the arrows between them, not a wall of transcript. The canvas is the meeting minutes.