# Vojo AI bot (`@ai:vojo.chat`)

A Go **Synapse application service** in [`apps/ai-bot/`](../../apps/ai-bot/) — not a normal
bot user. Answers `@`-mentions in groups and every message in 1:1s, over the plaintext
CS-API (Vojo rooms are unencrypted by default). It is a separate server-side service
deployed next to Synapse; it ships nothing to the web client.

- **Operator / full env reference:** [`apps/ai-bot/README.md`](../../apps/ai-bot/README.md) (config tables, setup, deploy).
- **Deploy / server config:** [server-side.md](server-side.md) (the `ai-bot` service row, the `vojo_ai` Postgres role).
- **Detailed design SOT:** `docs/plans/grok_bot.md` + `docs/plans/ai_backend_build_plan.md` — **local-only, `docs/plans/` is gitignored.**

## Request flow

Synapse pushes a transaction → the bot **acks 200 instantly, then processes async per-room**
([appservice.go](../../apps/ai-bot/appservice.go)), so a slow model call never blocks other
rooms or the homeserver. `handleMessage` ([bot.go](../../apps/ai-bot/bot.go)) gates in order:
durable+in-memory dedup → encrypted-room skip → decode / edit / own-message / notice →
foreign-server leave → DM-or-mention → media react → resolve conversation (thread) →
**per-(room,thread) single-flight** → spawn `respond`. `respond` = `Reserve(estimate)` →
`generate()` → `Settle(actual)` → `sendReply`; **any failure produces an emoji react, never silence.**
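
The ack-then-process shape can be sketched as follows. This is a minimal illustration, not the code in `appservice.go`: the `event` type, `roomDispatcher`, and the 64-slot buffer are all assumptions made for the example. It shows one worker goroutine per room, so the transaction handler can enqueue and return its 200 without waiting on any model call.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a hypothetical, pared-down stand-in for a Matrix event.
type event struct {
	RoomID string
	Body   string
}

// roomDispatcher runs one worker goroutine per room. Events within a room are
// handled in order; a slow handler in one room never delays another room.
type roomDispatcher struct {
	mu     sync.Mutex
	queues map[string]chan event
	wg     sync.WaitGroup
	handle func(event)
}

func newRoomDispatcher(handle func(event)) *roomDispatcher {
	return &roomDispatcher{queues: make(map[string]chan event), handle: handle}
}

// Enqueue hands the event to its room's worker without waiting for the
// handler to run (it blocks only if that room's 64-slot buffer is full).
// A transaction handler can therefore write its 200 right after enqueueing.
func (d *roomDispatcher) Enqueue(ev event) {
	d.mu.Lock()
	q, ok := d.queues[ev.RoomID]
	if !ok {
		q = make(chan event, 64) // buffer size is an arbitrary assumption
		d.queues[ev.RoomID] = q
		d.wg.Add(1)
		go func() {
			defer d.wg.Done()
			for e := range q {
				d.handle(e)
			}
		}()
	}
	d.mu.Unlock()
	q <- ev
}

// Close drains every queue and waits for the workers. Call it only after the
// last Enqueue.
func (d *roomDispatcher) Close() {
	d.mu.Lock()
	for _, q := range d.queues {
		close(q)
	}
	d.mu.Unlock()
	d.wg.Wait()
}

func main() {
	var mu sync.Mutex
	seen := map[string][]string{}
	d := newRoomDispatcher(func(e event) {
		mu.Lock()
		defer mu.Unlock()
		seen[e.RoomID] = append(seen[e.RoomID], e.Body)
	})
	d.Enqueue(event{RoomID: "!a", Body: "first"})
	d.Enqueue(event{RoomID: "!b", Body: "hello"})
	d.Enqueue(event{RoomID: "!a", Body: "second"})
	d.Close()
	fmt.Println(seen["!a"], seen["!b"])
}
```

Per-room ordering falls out of the single worker per queue; the per-(room,thread) single-flight described above is a separate gate layered on top.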

## Conversations (threads) — ChatGPT-style multi-chat

In a 1:1 DM a top-level message **roots a new thread** (a fresh conversation) and the bot answers
inside it ([bot.go](../../apps/ai-bot/bot.go) `resolveThreadRoot`); a message already in a thread
continues it (F27). **Groups are never auto-threaded** — the gate is structural (`isDM`), not a
flag, so the threading feature can never change group behavior. Auto-threading in DMs is **always
on** (the old `THREAD_CONVERSATIONS` env flag was removed — it only created a host/backend
mismatch footgun). Context and single-flight are keyed per-`(room, thread)` so conversations
neither share history nor block each other; typing is room-level (Matrix has no per-thread typing)
via a refcount; per-thread context buffers are LRU-bounded (`maxConvBuffersPerRoom`).
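
A rough sketch of the threading rule and the per-`(room, thread)` single-flight key, under stated assumptions: the `message` struct, the `convKey` type, and the `singleFlight` helper are hypothetical, and the real `resolveThreadRoot` signature in `bot.go` may differ. The sketch also assumes a group message that is already inside a thread continues that thread.

```go
package main

import (
	"fmt"
	"sync"
)

// message is a hypothetical, pared-down view of an incoming event.
type message struct {
	EventID    string
	ThreadRoot string // empty for a top-level message
}

// resolveThreadRoot returns the thread a reply belongs in, or "" for an
// unthreaded reply. The isDM check is structural: there is no flag to flip.
func resolveThreadRoot(isDM bool, m message) string {
	switch {
	case m.ThreadRoot != "":
		return m.ThreadRoot // already in a thread: continue it
	case isDM:
		return m.EventID // a top-level DM message roots a new conversation
	default:
		return "" // groups are never auto-threaded
	}
}

// convKey identifies one conversation; context and single-flight key on it.
type convKey struct{ Room, Thread string }

// singleFlight admits at most one in-flight generation per conversation.
type singleFlight struct {
	mu   sync.Mutex
	busy map[convKey]bool
}

func newSingleFlight() *singleFlight {
	return &singleFlight{busy: map[convKey]bool{}}
}

// TryAcquire reports whether the caller may start generating for k.
func (s *singleFlight) TryAcquire(k convKey) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.busy[k] {
		return false
	}
	s.busy[k] = true
	return true
}

// Release marks k idle again; call it (deferred) when generation ends.
func (s *singleFlight) Release(k convKey) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.busy, k)
}

func main() {
	root := resolveThreadRoot(true, message{EventID: "$m1"})
	sf := newSingleFlight()
	k := convKey{Room: "!dm", Thread: root}
	fmt.Println(root, sf.TryAcquire(k), sf.TryAcquire(k))
}
```

Because the key includes the thread, two conversations in the same DM room never block each other, while a second message in the same thread is turned away until the first generation finishes.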

**Host pairing:** the cinny host shows the conversation surface for a bot only when its
`config.json` preset has `experience.type: "ai-chat"` (today only `@ai`). That surface is a
**fully isolated, native in-client chat** (`features/bots/BotConversations` + `AiChatHeader` +
`AiChatMenu`, reusing the generic `ThreadDrawer`/`RoomInput`) — it shares **no** runtime with the
bridge widget pipeline (no `BotShell`/iframe, no `show-chat` toggle; there is no `vojo-ai` widget
any more). Bridges keep `experience.type: "matrix-widget"` (iframe + show-chat fallback). Because
the backend now always threads DMs, any DM message @ai answers lands in a thread the host can open.

## Cascade (flag-gated "operator cascade", every layer default OFF)

`generate()` ([cascade.go](../../apps/ai-bot/cascade.go)) routes ([router.go](../../apps/ai-bot/router.go))
then dispatches; **any layer off or failing degrades to `grok_direct`** (never an error to the user):

- **`grok_direct`** — DEFAULT, one Grok call. **Grok is the final voice on everything substantive.**
- **`trivial_direct`** — greetings/acks → cheap Gemini (`TRIVIAL_OFFLOAD_ENABLED`).
- **`web_then_grok`** — fresh facts: a WebProvider fetches a grounded digest + citations, then **Grok synthesises the answer in voice** ([web.go](../../apps/ai-bot/web.go)).
- **`reason_then_grok`** — manual trigger ("подумай глубже", Russian for "think deeper") → Grok at a higher `reasoning_effort`.
- **`project_then_grok`** — questions about the **Vojo product itself** (`PROJECT_KB_ENABLED`): a curated KB (operator data from `PROJECT_KB_PATH`, default the bundled `prompts/vojo_kb.txt`) is injected as a system note and **Grok answers product claims strictly from it**. This is an anti-hallucination measure: Grok has no parametric Vojo knowledge, and the web doesn't either. Gated by the classifier's `about_project` signal (the context-aware judge, which resolves follow-ups like "Про этот", Russian for "About this one", to the app); a false positive is bounded by the entity-scoped note. Beats every web arm. One Grok call, so it costs ~the same as `grok_direct`. See [docs/plans/ai_project_knowledge.md](../plans/ai_project_knowledge.md).
- Router = free Layer-0 regex + optional Layer-1 Gemini classifier; a confidence floor keeps uncertain cases on the safe floor (`grok_direct`).

**Invariant:** all cascade flags OFF == today's bot — a single `grok_direct` call, byte-identical wire body. Do not enable layers in prod until the offline-eval gate (build plan §9) passes.
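
The degrade-to-floor behaviour can be illustrated with a small sketch. Everything here except the route names is an assumption: the `flags` fields, the `verdict` type, the `decide`/`dispatch` helpers, and the 0.7 floor are invented for the example and are not the API of `router.go` or `cascade.go`.

```go
package main

import (
	"errors"
	"fmt"
)

type route string

const (
	grokDirect      route = "grok_direct"
	trivialDirect   route = "trivial_direct"
	webThenGrok     route = "web_then_grok"
	reasonThenGrok  route = "reason_then_grok"
	projectThenGrok route = "project_then_grok"
)

// flags mirrors the per-layer switches; the zero value is "every layer OFF".
// Field names are illustrative, not the bot's actual config struct.
type flags struct {
	TrivialOffload, Web, Reason, ProjectKB bool
}

// verdict is a hypothetical classifier output.
type verdict struct {
	Route      route
	Confidence float64
}

// errLayerDown is a placeholder failure used by the demo below.
var errLayerDown = errors.New("layer down")

// decide keeps the classifier's route only if it clears the confidence floor
// and its layer is enabled; everything else lands on grok_direct.
func decide(v verdict, f flags, floor float64) route {
	if v.Confidence < floor {
		return grokDirect
	}
	enabled := map[route]bool{
		trivialDirect:   f.TrivialOffload,
		webThenGrok:     f.Web,
		reasonThenGrok:  f.Reason,
		projectThenGrok: f.ProjectKB,
	}
	if enabled[v.Route] {
		return v.Route
	}
	return grokDirect
}

// dispatch runs the chosen layer and degrades to direct on any error, so a
// layer failure never surfaces to the user as an error.
func dispatch(r route, layer, direct func() (string, error)) (string, route, error) {
	if r != grokDirect {
		if out, err := layer(); err == nil {
			return out, r, nil
		}
	}
	out, err := direct()
	return out, grokDirect, err
}

func main() {
	v := verdict{Route: webThenGrok, Confidence: 0.9}
	fmt.Println(decide(v, flags{}, 0.7))          // all flags OFF
	fmt.Println(decide(v, flags{Web: true}, 0.7)) // web layer enabled

	failing := func() (string, error) { return "", errLayerDown }
	direct := func() (string, error) { return "answer", nil }
	out, used, _ := dispatch(webThenGrok, failing, direct)
	fmt.Println(out, used)
}
```

With the zero-value `flags{}`, `decide` can only ever return `grok_direct`, which is the invariant stated above expressed as code.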

## Provider seam (no vendor names in business logic)

[llm.go](../../apps/ai-bot/llm.go) (`Message`/`Usage`/`LLMRequest`/`LLMResponse`/`LLMClient`) +
[httpllm.go](../../apps/ai-bot/httpllm.go) (shared OpenAI-compatible transport + retry) + thin
adapters [provider_xai.go](../../apps/ai-bot/provider_xai.go) /
[provider_gemini.go](../../apps/ai-bot/provider_gemini.go) + [pricing.go](../../apps/ai-bot/pricing.go)
(`priceFor` model→price map). `Bot.llm` is an `LLMClient`, never a concrete vendor type.
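
A minimal sketch of what such a seam can look like. The type names come from the list above, but their fields, the `Complete` method name, the `echoClient` fake adapter, and the `example-model` price entry are all assumptions made for illustration (the price is a made-up placeholder, not a real vendor rate).

```go
package main

import (
	"context"
	"fmt"
)

type Message struct{ Role, Content string }

type Usage struct{ PromptTokens, CompletionTokens int }

type LLMRequest struct {
	Model    string
	Messages []Message
}

type LLMResponse struct {
	Text  string
	Usage Usage
}

// LLMClient is the only type business logic sees. The method name Complete
// is an assumption for this sketch.
type LLMClient interface {
	Complete(ctx context.Context, req LLMRequest) (LLMResponse, error)
}

// echoClient is a fake adapter standing in for a real vendor adapter.
type echoClient struct{}

func (echoClient) Complete(_ context.Context, req LLMRequest) (LLMResponse, error) {
	if len(req.Messages) == 0 {
		return LLMResponse{}, fmt.Errorf("empty request")
	}
	last := req.Messages[len(req.Messages)-1].Content
	return LLMResponse{
		Text:  "echo: " + last,
		Usage: Usage{PromptTokens: len(last), CompletionTokens: len(last) + 6},
	}, nil
}

// price is USD per million tokens.
type price struct{ InPerM, OutPerM float64 }

// prices holds a made-up placeholder entry, not a real rate.
var prices = map[string]price{"example-model": {InPerM: 1.0, OutPerM: 2.0}}

func priceFor(model string) (price, bool) {
	p, ok := prices[model]
	return p, ok
}

// costUSD prices a usage record; ok is false for an unknown model.
func costUSD(model string, u Usage) (float64, bool) {
	p, ok := priceFor(model)
	if !ok {
		return 0, false
	}
	return float64(u.PromptTokens)/1e6*p.InPerM + float64(u.CompletionTokens)/1e6*p.OutPerM, true
}

// Bot depends on the interface, never on a concrete vendor type.
type Bot struct{ llm LLMClient }

func (b Bot) ask(prompt string) (LLMResponse, error) {
	return b.llm.Complete(context.Background(), LLMRequest{
		Model:    "example-model",
		Messages: []Message{{Role: "user", Content: prompt}},
	})
}

func main() {
	resp, err := Bot{llm: echoClient{}}.ask("hi")
	cost, _ := costUSD("example-model", resp.Usage)
	fmt.Println(resp.Text, err, cost)
}
```

Swapping vendors then means constructing `Bot` with a different adapter; nothing in `ask` changes.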

## Money, invariants & store ([store.go](../../apps/ai-bot/store.go))

- **Ceiling is TOCTOU-safe:** `Reserve` books a route's estimated max-cost into `reserved_usd`
  under a per-day **global** advisory lock; the gate counts committed + reserved spend; `Settle`
  releases the reservation and books the real per-component `CostBreakdown`. A concurrent burst
  overshoots by at most one reservation.
- **Never charge for silence:** a 2xx is billed; if the reply then fails to send, refund the
  request SLOT (not the USD) + react. A failed call releases the reservation + refunds the slot;
  a panic releases via a deferred guard.
- Caps: `DAILY_USD_CEILING` (global $), `PER_USER_DAILY_CAP` (requests/user), `PER_USER_DAILY_USD`
  (optional $/user). **At-most-once** dedup is durable (`SeenEvent`/`MarkTxn`); generation is
  per-(room,thread) single-flight.
- One overall **per-request deadline** bounds the whole cascade (no per-stage 3×60s accretion).
- **Telemetry:** one `request_log` row per engaged request (route, per-component $, latency,
  degrade reasons), written async + isolated (its failure never drops a reply), `TELEMETRY_ENABLED`
  default off, time-based retention.
- **Store:** dedicated Postgres `vojo_ai` (pgx); schema is an ordered `migrations` array in
  store.go. **Operational state only** (dedup, spend ledger, grounding cap, `request_log`,
  warned-encrypted) — **no message content** (that lives in Synapse).
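
The reserve/settle pattern from the first two bullets, sketched in memory. This is an illustration under assumptions, not the `store.go` implementation: a `sync.Mutex` stands in for the Postgres advisory lock, amounts are integer micro-USD, and the `ledger`, `respond`, and `errCallFailed` names are hypothetical. It reads "overshoots by at most one reservation" as "the gate is checked before the estimate is booked", which is one plausible interpretation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ledger tracks one day's spend in micro-USD (integers avoid float drift).
type ledger struct {
	mu        sync.Mutex // stands in for the per-day global advisory lock
	ceiling   int64
	committed int64
	reserved  int64
}

// Reserve admits the request if committed + reserved spend is still under
// the ceiling, then books the estimate. Because the gate is checked before
// booking, total exposure can exceed the ceiling by at most one reservation.
func (l *ledger) Reserve(estimate int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.committed+l.reserved >= l.ceiling {
		return false
	}
	l.reserved += estimate
	return true
}

// Settle releases the reservation and books the real cost.
func (l *ledger) Settle(estimate, actual int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
	l.committed += actual
}

// Release drops a reservation without charging (failed call or panic).
func (l *ledger) Release(estimate int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
}

// errCallFailed is a placeholder failure used by the demo below.
var errCallFailed = errors.New("model call failed")

// respond shows the Reserve -> generate -> Settle shape with a deferred guard.
func respond(l *ledger, estimate int64, generate func() (int64, error)) bool {
	if !l.Reserve(estimate) {
		return false
	}
	settled := false
	defer func() {
		if !settled {
			l.Release(estimate) // failure or panic: nothing is charged
		}
	}()
	actual, err := generate()
	if err != nil {
		return false
	}
	l.Settle(estimate, actual)
	settled = true
	return true
}

func main() {
	l := &ledger{ceiling: 100}
	fmt.Println(respond(l, 50, func() (int64, error) { return 7, nil }))
	fmt.Println(respond(l, 50, func() (int64, error) { return 0, errCallFailed }))
	fmt.Println(l.committed, l.reserved)
}
```

The request-slot refund and the per-user caps are left out; they would follow the same book-then-undo shape on a separate counter.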

## Current prod config (the cheap web path)

`WEB_PROVIDER=gemini_grounding`: Gemini 2.5 Flash-Lite does the fetch via the **native v1beta
`google_search` tool** (NOT the OpenAI-compat endpoint — grounding is silently ignored there,
F-EXT-3), then Grok-4.3 voices it. ~**$0.0013/query** (vs ~$0.022 for the old two-Grok path);
grounding is free under the daily RPD, guarded by `WEB_GROUNDING_DAILY_CAP`. `XAI_MODEL=grok-4.3`
+ `GROK_REASONING_EFFORT=none` (4.3 otherwise reasons on every reply). Full flag table in the
[README](../../apps/ai-bot/README.md).

## Trigger hygiene (what reaches the search query)

The raw event body is **cleaned once** at the top of `respond` ([bot.go](../../apps/ai-bot/bot.go),
`stripBotMention(stripReplyFallback(...))`) before it is used as the web-search query, the prompt
trigger, the buffer entry, or telemetry. Two egress hazards both rode the raw body: the bot's own
mention pill fallback (cinny writes the **full mxid** `@ai:vojo.chat` into the plain `body`), and
the rich-reply quoted parent. The mxid was the worse one — sent verbatim to gemini grounding it
made the provider treat **`vojo.chat`** as the subject entity ("was the *Vojo.chat* messenger
removed?") and confabulate a confident wrong answer; the same question without the mention (e.g. in
a DM, which has no mention) grounded correctly. Mention **detection** is unaffected — it runs
upstream on `m.mentions`/`replyParentIsBot` ([mentions.go](../../apps/ai-bot/mentions.go)), not on
body text. The human display name is deliberately **not** stripped, so "что умеет Vojo AI" ("what can Vojo AI do") survives.
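
A sketch of the two-step cleaning, assuming the common Matrix rich-reply fallback layout (quoted lines prefixed with `> `, then one blank line, then the actual message). The function bodies and the two-argument `stripBotMention` signature are guesses for illustration; the real helpers in `bot.go` may work differently.

```go
package main

import (
	"fmt"
	"strings"
)

const botMXID = "@ai:vojo.chat"

// stripReplyFallback drops the leading "> "-quoted parent and the blank line
// after it. It assumes the caller only passes bodies of reply events.
func stripReplyFallback(body string) string {
	lines := strings.Split(body, "\n")
	i := 0
	for i < len(lines) && strings.HasPrefix(lines[i], "> ") {
		i++
	}
	if i == 0 {
		return body // no quoted parent
	}
	if i < len(lines) && lines[i] == "" {
		i++ // the separator line between quote and reply
	}
	return strings.Join(lines[i:], "\n")
}

// stripBotMention removes the bot's full mxid and the punctuation a mention
// pill fallback leaves behind, then collapses whitespace. Bodies without the
// mxid are returned untouched, so the human display name survives.
func stripBotMention(body, mxid string) string {
	if !strings.Contains(body, mxid) {
		return body
	}
	out := strings.ReplaceAll(body, mxid, "")
	out = strings.TrimLeft(strings.TrimSpace(out), ":, ")
	return strings.Join(strings.Fields(out), " ")
}

// cleanTrigger is the single cleaning step applied before any egress.
func cleanTrigger(raw string) string {
	return stripBotMention(stripReplyFallback(raw), botMXID)
}

func main() {
	raw := "> <@bob:vojo.chat> old text\n\n@ai:vojo.chat: what changed?"
	fmt.Printf("%q\n", cleanTrigger(raw))
	fmt.Printf("%q\n", cleanTrigger("что умеет Vojo AI"))
}
```

Cleaning once and passing the result everywhere keeps the search query, the prompt trigger, the buffer entry, and telemetry in agreement about what the user actually asked.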

## Source attribution (the "Sources" footer)

Web answers append a compact, deduped **`Источники: [rbc.ru](…), …`** line (Russian for "Sources"), built **server-side**
after Grok's prose ([sources.go](../../apps/ai-bot/sources.go) `sourcesFooter`), never via the Grok
prompt (the synth note still says "no URLs or links" — instructing Grok to cite made it paste ugly
redirects and mis-attribute them). The label is the publisher **domain** (`web.title`); the link is
the citation's URL — for `gemini_grounding` that is the opaque `grounding-api-redirect` URL, which
the **end user clicks** to reach the real article. **Gemini Grounding terms** (verified against
`ai.google.dev/gemini-api/terms`) constrain this: the redirect must **not** be resolved
server-side (no "programmatic/automated access to Grounded Results"), and a strict reading also
requires showing the **Search-Suggestions chip** (`searchEntryPoint.renderedContent`, HTML/CSS) —
which a sanitised Matrix bubble can't render, so that part stays unmet (pre-existing gap; the bot
already shows grounded prose without it). The footer is appended to the **sent** message only, not
the buffered turn — the redirect links are ephemeral, so they must not pollute the history that
feeds later prompts. `grok_web_search` returns **real** publisher URLs (no Google display ToS), so
switching `WEB_PROVIDER` is the path to true article links — at ~17× the cost.
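
A sketch of a server-side footer builder along these lines. The `citation` struct, the dedup-by-label rule, and the `limit` parameter are assumptions for the example; the real `sourcesFooter` in `sources.go` may dedupe and cap differently. URLs are passed through untouched, never fetched or resolved.

```go
package main

import (
	"fmt"
	"strings"
)

// citation is a hypothetical shape for one grounded source.
type citation struct {
	Title string // publisher domain, e.g. "rbc.ru"
	URL   string // opaque redirect URL; passed through, never resolved
}

// sourcesFooter builds the footer line, keeping the first link per publisher
// label and at most limit links. It returns "" when there is nothing to cite.
func sourcesFooter(cites []citation, limit int) string {
	seen := map[string]bool{}
	var links []string
	for _, c := range cites {
		if c.Title == "" || c.URL == "" || seen[c.Title] {
			continue
		}
		if len(links) == limit {
			break
		}
		seen[c.Title] = true
		links = append(links, fmt.Sprintf("[%s](%s)", c.Title, c.URL))
	}
	if len(links) == 0 {
		return ""
	}
	return "\n\nИсточники: " + strings.Join(links, ", ")
}

func main() {
	prose := "Grok's answer."
	cites := []citation{
		{Title: "rbc.ru", URL: "https://example.com/r/1"},
		{Title: "rbc.ru", URL: "https://example.com/r/2"},
		{Title: "tass.ru", URL: "https://example.com/r/3"},
	}
	sent := prose + sourcesFooter(cites, 3) // what the user sees
	buffered := prose                       // what later prompts see
	fmt.Println(sent)
	fmt.Println(buffered)
}
```

Keeping `sent` and `buffered` as separate values is what stops the ephemeral redirect links from leaking into the conversation history.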

## Observability (logs + per-request trace)

`log/slog` to stderr (`LOG_LEVEL`, `LOG_FORMAT=text|json`). A context-aware handler
([logging.go](../../apps/ai-bot/logging.go)) stamps a per-request **`trace_id`** —
minted once per handled event in `handleEvent` ([trace.go](../../apps/ai-bot/trace.go))
and carried in `ctx` down to the model HTTP call — onto **every** log line, so one
`trace_id` greps the whole request trail (the userver idiom; the id is OTel-trace-id
shaped for a future exporter). Routing diagnostics (`route decided` / `generation
outcome`) are DEBUG, content-free. Full model **request/response bodies** are gated by a
**per-user allowlist** `LOG_BODIES_USERS` (empty = nobody) **and** `LOG_LEVEL=debug`,
truncated to a fixed ~4 KB cap, with URL/headers (the API key) never logged — decided once
at admission via a `verbose` flag in `ctx`, read by the dumb transport. This is the
**debug** path; `request_log` (`TELEMETRY_*`) is the separate **analytics** path — they
correlate via `trace_id`/`event_id` but are independent. Ship the JSON stderr stream to
OpenSearch/Loki with a collector (Fluent Bit/Vector); the bot never talks to a log
backend. Full flag table in the [README](../../apps/ai-bot/README.md#observability--logs--per-request-trace).
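
A minimal sketch of a context-aware `slog` handler that stamps a `trace_id` on every line. The `traceKey`, `withTrace`, `traceHandler`, and `render` names are assumptions for the example and need not match `logging.go` or `trace.go`; only the standard `log/slog` API (Go 1.21+) is relied on.

```go
package main

import (
	"bytes"
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"log/slog"
)

type traceKey struct{}

// newTraceID returns 16 random bytes as 32 hex chars (the OTel trace-id shape).
func newTraceID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err) // crypto/rand failing is not recoverable here
	}
	return hex.EncodeToString(b[:])
}

// withTrace stores the id in ctx so it travels down to the model HTTP call.
func withTrace(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceKey{}, id)
}

// traceHandler wraps any slog.Handler and adds trace_id from ctx, if present.
type traceHandler struct{ slog.Handler }

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceKey{}).(string); ok {
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the inner handler so loggers derived via
// logger.With(...) or logger.WithGroup(...) keep stamping the trace id.
func (h traceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return traceHandler{h.Handler.WithAttrs(attrs)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{h.Handler.WithGroup(name)}
}

// render logs one line under the given trace id and returns the JSON output.
func render(id, msg string) string {
	var buf bytes.Buffer
	logger := slog.New(traceHandler{slog.NewJSONHandler(&buf, nil)})
	logger.InfoContext(withTrace(context.Background(), id), msg)
	return buf.String()
}

func main() {
	fmt.Print(render(newTraceID(), "request handled"))
}
```

The stamping only works for the `*Context` logging methods (`InfoContext`, `DebugContext`, and so on), since those are the ones that pass `ctx` to the handler.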

## Building / testing

Go toolchain lives at `/home/ubuntu/.go-toolchain/go/bin` (NOT on PATH). Store-backed tests need
`AI_BOT_TEST_DATABASE_URL` (a throwaway Postgres) and **skip** without it, so `go test ./...` stays
green on a machine without one. Keep `gofmt -l`, `go vet ./...`, `go test -race ./...` clean.