# Vojo AI bot (`@ai:vojo.chat`)

A Go **Synapse application service** in [`apps/ai-bot/`](../../apps/ai-bot/), not a normal bot user. It answers `@`-mentions in groups and every message in 1:1s, over the plaintext CS-API (Vojo rooms are unencrypted by default). It is a separate server-side service deployed next to Synapse; it ships nothing to the web client.

- **Operator / full env reference:** [`apps/ai-bot/README.md`](../../apps/ai-bot/README.md) (config tables, setup, deploy).
- **Deploy / server config:** [server-side.md](server-side.md) (the `ai-bot` service row, the `vojo_ai` Postgres role).
- **Detailed design SOT:** `docs/plans/grok_bot.md` + `docs/plans/ai_backend_build_plan.md` (**local-only: `docs/plans/` is gitignored**).

## Request flow

Synapse pushes a transaction → the bot **acks 200 instantly, then processes it async per room** ([appservice.go](../../apps/ai-bot/appservice.go)), so a slow model call never blocks other rooms or the homeserver.

`handleMessage` ([bot.go](../../apps/ai-bot/bot.go)) gates in order: durable + in-memory dedup → encrypted-room skip → decode / edit / own-message / notice → foreign-server leave → DM-or-mention → media react → resolve conversation (thread) → **per-(room, thread) single-flight** → spawn `respond`.

`respond` = `Reserve(estimate)` → `generate()` → `Settle(actual)` → `sendReply`; **any failure produces an emoji react, never silence.**

## Conversations (threads) — ChatGPT-style multi-chat

In a 1:1 DM a top-level message **roots a new thread** (a fresh conversation) and the bot answers inside it ([bot.go](../../apps/ai-bot/bot.go) `resolveThreadRoot`); a message already in a thread continues it (F27). **Groups are never auto-threaded.** The gate is structural (`isDM`), not a flag, so the threading feature can never change group behavior. Auto-threading in DMs is **always on**: the old `THREAD_CONVERSATIONS` env flag was removed because it only created a host/backend mismatch footgun.
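The DM/group threading rule can be sketched as a pure function. This is a minimal illustration, not the code in `bot.go`. The names `event`, `convKey`, `conversationFor` and `threadRelation` are hypothetical. The `m.relates_to` shape follows the Matrix spec's thread relation (`rel_type: "m.thread"` plus a fallback `m.in_reply_to`). The sketch assumes every group message stays on the main timeline; this document does not say how the real code treats a mention posted inside an existing group thread.

```go
package main

// event is a hypothetical, trimmed view of an incoming m.room.message.
type event struct {
	ID         string // the event's own event_id
	RoomID     string
	ThreadRoot string // m.relates_to.event_id when rel_type is "m.thread", else ""
}

// convKey names one conversation; history and single-flight would be keyed by it.
type convKey struct {
	Room   string
	Thread string // "" = the room's main timeline
}

// conversationFor sketches the rule: in a DM a threaded message continues its
// thread and a top-level message roots a new one (its own event id becomes the
// root). Groups are never auto-threaded, so here they stay on the main timeline.
func conversationFor(ev event, isDM bool) convKey {
	switch {
	case !isDM:
		return convKey{Room: ev.RoomID}
	case ev.ThreadRoot != "":
		return convKey{Room: ev.RoomID, Thread: ev.ThreadRoot}
	default:
		return convKey{Room: ev.RoomID, Thread: ev.ID}
	}
}

// threadRelation builds the m.relates_to block a reply would carry so that it
// lands inside the conversation; nil means "post to the main timeline".
// replyTo is the latest event in the thread (the root, for the first answer).
func threadRelation(k convKey, replyTo string) map[string]any {
	if k.Thread == "" {
		return nil
	}
	return map[string]any{
		"rel_type":        "m.thread",
		"event_id":        k.Thread,
		"is_falling_back": true,
		"m.in_reply_to":   map[string]any{"event_id": replyTo},
	}
}
```

In this sketch the user's own top-level event becomes the thread root. As a result, opening a conversation needs no extra event, and two DM conversations map to different keys.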
Context and single-flight are keyed per `(room, thread)`, so conversations neither share history nor block each other. Typing is room-level (Matrix has no per-thread typing), tracked via a refcount. Per-thread context buffers are LRU-bounded (`maxConvBuffersPerRoom`).

**Host pairing:** the cinny host shows the conversation surface for a bot only when its `config.json` preset has `experience.type: "ai-chat"` (today only `@ai`). That surface is a **fully isolated, native in-client chat** (`features/bots/BotConversations` + `AiChatHeader` + `AiChatMenu`, reusing the generic `ThreadDrawer`/`RoomInput`). It shares **no** runtime with the bridge widget pipeline: no `BotShell`/iframe, no `show-chat` toggle, and there is no `vojo-ai` widget any more. Bridges keep `experience.type: "matrix-widget"` (iframe + show-chat fallback). Because the backend now always threads DMs, any DM message that @ai answers lands in a thread the host can open.

## Cascade (flag-gated "operator cascade", every layer default OFF)

`generate()` ([cascade.go](../../apps/ai-bot/cascade.go)) routes ([router.go](../../apps/ai-bot/router.go)) and then dispatches; **any layer that is off or failing degrades to `grok_direct`** (never an error to the user):

- **`grok_direct`**: DEFAULT, one Grok call. **Grok is the final voice on everything substantive.**
- **`trivial_direct`**: greetings/acks → cheap Gemini (`TRIVIAL_OFFLOAD_ENABLED`).
- **`web_then_grok`**: fresh facts. A WebProvider fetches a grounded digest + citations, then **Grok synthesises the answer in voice** ([web.go](../../apps/ai-bot/web.go)).
- **`reason_then_grok`**: manual trigger ("подумай глубже", "think deeper") → Grok at a higher `reasoning_effort`.
- **`project_then_grok`**: questions about the **Vojo product itself** (`PROJECT_KB_ENABLED`). A curated KB (operator data from `PROJECT_KB_PATH`, default the bundled `prompts/vojo_kb.txt`) is injected as a system note, and **Grok answers product claims strictly from it**. This is an anti-hallucination measure: Grok has no parametric Vojo knowledge, and neither does the web. The route is gated by the classifier's `about_project` signal. That classifier is the context-aware judge, and it resolves follow-ups like "Про этот" ("about this one") → the app. A false positive is bounded by the entity-scoped note. This route takes precedence over every web arm. It is a single Grok call, so it costs about the same as `grok_direct`. See [docs/plans/ai_project_knowledge.md](../plans/ai_project_knowledge.md).
- Router = free Layer-0 regex + optional Layer-1 Gemini classifier; a confidence floor keeps uncertain cases on the safe floor (`grok_direct`).

**Invariant:** all cascade flags OFF == today's bot, i.e. a single `grok_direct` call with a byte-identical wire body. Do not enable layers in prod until the offline-eval gate (build plan §9) passes.

## Provider seam (no vendor names in business logic)

The seam consists of the following pieces:

- [llm.go](../../apps/ai-bot/llm.go): `Message`/`Usage`/`LLMRequest`/`LLMResponse`/`LLMClient`.
- [httpllm.go](../../apps/ai-bot/httpllm.go): the shared OpenAI-compatible transport + retry.
- Thin adapters: [provider_xai.go](../../apps/ai-bot/provider_xai.go) and [provider_gemini.go](../../apps/ai-bot/provider_gemini.go).
- [pricing.go](../../apps/ai-bot/pricing.go): the `priceFor` model→price map.

`Bot.llm` is an `LLMClient`, never a concrete vendor type.

## Money, invariants & store ([store.go](../../apps/ai-bot/store.go))

- **Ceiling is TOCTOU-safe:** `Reserve` books a route's estimated max cost into `reserved_usd` under a per-day **global** advisory lock. The gate counts committed + reserved spend. `Settle` releases the reservation and books the real per-component `CostBreakdown`. A concurrent burst overshoots by at most one reservation.
- **Never charge for silence:** a 2xx model response is billed. If the reply then fails to send, the bot refunds the request SLOT (not the USD) and reacts. A failed call releases the reservation and refunds the slot. A panic releases the reservation via a deferred guard.
- Caps: `DAILY_USD_CEILING` (global $), `PER_USER_DAILY_CAP` (requests/user), `PER_USER_DAILY_USD` (optional $/user). **At-most-once** dedup is durable (`SeenEvent`/`MarkTxn`); generation is per-(room, thread) single-flight.
- One overall **per-request deadline** bounds the whole cascade (no per-stage 3×60s accretion).
- **Telemetry:** one `request_log` row per engaged request (route, per-component $, latency, degrade reasons). Rows are written async and in isolation, so a telemetry failure never drops a reply. `TELEMETRY_ENABLED` defaults to off; retention is time-based.
- **Store:** dedicated Postgres `vojo_ai` (pgx); the schema is an ordered `migrations` array in store.go. It holds **operational state only** (dedup, spend ledger, grounding cap, `request_log`, warned-encrypted) and **no message content**, which lives in Synapse.

## Current prod config (the cheap web path)

`WEB_PROVIDER=gemini_grounding`: Gemini 2.5 Flash-Lite does the fetch via the **native v1beta `google_search` tool**, then Grok-4.3 voices it. The fetch does NOT use the OpenAI-compat endpoint, which silently ignores grounding (F-EXT-3). That comes to ~**$0.0013/query**, vs ~$0.022 for the old two-Grok path. Grounding is free under the daily RPD (requests-per-day) limit and is guarded by `WEB_GROUNDING_DAILY_CAP`. The config also sets `XAI_MODEL=grok-4.3` + `GROK_REASONING_EFFORT=none`, because 4.3 otherwise reasons on every reply. The full flag table is in the [README](../../apps/ai-bot/README.md).

## Observability (logs + per-request trace)

Logging uses `log/slog` to stderr (`LOG_LEVEL`, `LOG_FORMAT=text|json`).
A context-aware handler ([logging.go](../../apps/ai-bot/logging.go)) stamps a per-request **`trace_id`** onto **every** log line. The id is minted once per handled event in `handleEvent` ([trace.go](../../apps/ai-bot/trace.go)) and carried in `ctx` down to the model HTTP call. Grepping for one `trace_id` therefore pulls up the whole request trail. This follows the userver idiom, and the id is OTel-trace-id shaped for a future exporter. Routing diagnostics (`route decided` / `generation outcome`) are DEBUG and content-free.

Full model **request/response bodies** are logged only for users on the **per-user allowlist** `LOG_BODIES_USERS` (empty = nobody) **and** only when `LOG_LEVEL=debug`. Bodies are truncated to a fixed ~4 KB cap, and the URL/headers (which carry the API key) are never logged. The decision is made once at admission and stored as a `verbose` flag in `ctx`, which the dumb transport reads.

This is the **debug** path. `request_log` (`TELEMETRY_*`) is the separate **analytics** path. The two correlate via `trace_id`/`event_id` but are independent. Ship the JSON log output to OpenSearch/Loki with a collector (Fluent Bit/Vector); the bot never talks to a log backend. The full flag table is in the [README](../../apps/ai-bot/README.md#observability--logs--per-request-trace).

## Building / testing

The Go toolchain lives at `/home/ubuntu/.go-toolchain/go/bin` (NOT on PATH). Store-backed tests need `AI_BOT_TEST_DATABASE_URL` (a throwaway Postgres) and **skip** without it, so `go test ./...` stays green on a machine without one. Keep `gofmt -l`, `go vet ./...` and `go test -race ./...` clean.