From 5d959311f22718ae961b3269fd0697dfabcd7eae Mon Sep 17 00:00:00 2001
From: heaven <vojochatdev@gmail.com>
Date: Mon, 1 Jun 2026 20:41:33 +0300
Subject: [PATCH] docs(ai): add ai-bot.md documenting the bot's Grok-voiced
 cascade backend and link it from the context bank

---
 .gitignore             |  1 +
 docs/ai/README.md      |  1 +
 docs/ai/ai-bot.md      | 76 ++++++++++++++++++++++++++++++++++++++++++
 docs/ai/server-side.md |  2 +-
 4 files changed, 79 insertions(+), 1 deletion(-)
 create mode 100644 docs/ai/ai-bot.md

diff --git a/.gitignore b/.gitignore
index 44b8a204..03deb86a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -23,5 +23,6 @@ docs/ai/*
 !docs/ai/i18n.md
 !docs/ai/overview.md
 !docs/ai/server-side.md
+!docs/ai/ai-bot.md
 
 vite.config.*.timestamp-*.mjs
diff --git a/docs/ai/README.md b/docs/ai/README.md
index 3bec6564..8894d156 100644
--- a/docs/ai/README.md
+++ b/docs/ai/README.md
@@ -22,6 +22,7 @@ Any agent (Claude Code, Cursor, Codex, Windsurf, Cline, Copilot, Aider, …) wor
 | [electron.md](electron.md) | Electron desktop wrapper, privileged `vojo://` scheme for SW, build chain, IPC security, Windows distribution |
 | [bugs.md](bugs.md) | Known bugs & regressions |
 | [server-side.md](server-side.md) | Some configs that deployd on server |
+| [ai-bot.md](ai-bot.md) | Vojo AI bot (`@ai:vojo.chat`) — server-side Grok-voiced cascade appservice: request flow, routes, provider seam, spend ledger, current cheap-web config |
 
 ## Rules for updating
 
diff --git a/docs/ai/ai-bot.md b/docs/ai/ai-bot.md
new file mode 100644
index 00000000..26c7b926
--- /dev/null
+++ b/docs/ai/ai-bot.md
@@ -0,0 +1,76 @@
+# Vojo AI bot (`@ai:vojo.chat`)
+
+A Go **Synapse application service** in [`apps/ai-bot/`](../../apps/ai-bot/) — not a normal
+bot user. Answers `@`-mentions in groups and every message in 1:1s, over the plaintext
+CS-API (Vojo rooms are unencrypted by default). It is a separate server-side service
+deployed next to Synapse; it ships nothing to the web client.
+
+- **Operator / full env reference:** [`apps/ai-bot/README.md`](../../apps/ai-bot/README.md) (config tables, setup, deploy).
+- **Deploy / server config:** [server-side.md](server-side.md) (the `ai-bot` service row, the `vojo_ai` Postgres role).
+- **Detailed design (source of truth):** `docs/plans/grok_bot.md` + `docs/plans/ai_backend_build_plan.md`. **Local-only: `docs/plans/` is gitignored.**
+
+## Request flow
+
+Synapse pushes a transaction → the bot **acks 200 instantly, then processes async per-room**
+([appservice.go](../../apps/ai-bot/appservice.go)), so a slow model call never blocks other
+rooms or the homeserver. `handleMessage` ([bot.go](../../apps/ai-bot/bot.go)) gates in order:
+durable+in-memory dedup → encrypted-room skip → decode / edit / own-message / notice →
+foreign-server leave → DM-or-mention → media react → **per-room single-flight** → spawn
+`respond`. `respond` = `Reserve(estimate)` → `generate()` → `Settle(actual)` → `sendReply`;
+**any failure produces an emoji react, never silence.**
+
+## Cascade (flag-gated "operator cascade", every layer default OFF)
+
+`generate()` ([cascade.go](../../apps/ai-bot/cascade.go)) routes ([router.go](../../apps/ai-bot/router.go))
+then dispatches; **any layer off or failing degrades to `grok_direct`** (never an error to the user):
+
+- **`grok_direct`** — DEFAULT, one Grok call. **Grok is the final voice on everything substantive.**
+- **`trivial_direct`** — greetings/acks → cheap Gemini (`TRIVIAL_OFFLOAD_ENABLED`).
+- **`web_then_grok`** — fresh facts: a WebProvider fetches a grounded digest + citations, then **Grok synthesises the answer in voice** ([web.go](../../apps/ai-bot/web.go)).
+- **`reason_then_grok`** — manual trigger ("подумай глубже") → Grok at a higher `reasoning_effort`.
+- Router = free Layer-0 regex + optional Layer-1 Gemini classifier; a confidence floor keeps uncertain cases on the safe floor (`grok_direct`).
+
+**Invariant:** all cascade flags OFF == today's bot — a single `grok_direct` call, byte-identical wire body. Do not enable layers in prod until the offline-eval gate (build plan §9) passes.
+
+## Provider seam (no vendor names in business logic)
+
+[llm.go](../../apps/ai-bot/llm.go) (`Message`/`Usage`/`LLMRequest`/`LLMResponse`/`LLMClient`) +
+[httpllm.go](../../apps/ai-bot/httpllm.go) (shared OpenAI-compatible transport + retry) + thin
+adapters [provider_xai.go](../../apps/ai-bot/provider_xai.go) /
+[provider_gemini.go](../../apps/ai-bot/provider_gemini.go) + [pricing.go](../../apps/ai-bot/pricing.go)
+(`priceFor` model→price map). `Bot.llm` is an `LLMClient`, never a concrete vendor type.
+
+## Money, invariants & store ([store.go](../../apps/ai-bot/store.go))
+
+- **Ceiling is TOCTOU-safe:** `Reserve` books a route's estimated max-cost into `reserved_usd`
+  under a per-day **global** advisory lock; the gate counts committed + reserved spend; `Settle`
+  releases the reservation and books the real per-component `CostBreakdown`. A concurrent burst
+  overshoots by at most one reservation.
+- **Never charge for silence:** a 2xx model response is billed; if the reply then fails to send,
+  refund the request SLOT (not the USD) and react. A failed call releases the reservation and
+  refunds the slot; a panic releases the reservation via a deferred guard.
+- Caps: `DAILY_USD_CEILING` (global $), `PER_USER_DAILY_CAP` (requests/user), `PER_USER_DAILY_USD`
+  (optional $/user). **At-most-once** dedup is durable (`SeenEvent`/`MarkTxn`); generation is
+  per-room single-flight.
+- One overall **per-request deadline** bounds the whole cascade, so per-stage timeouts cannot stack (no 3×60s accretion).
+- **Telemetry:** one `request_log` row per engaged request (route, per-component $, latency,
+  degrade reasons), written async + isolated (its failure never drops a reply), `TELEMETRY_ENABLED`
+  default off, time-based retention.
+- **Store:** dedicated Postgres `vojo_ai` (pgx); schema is an ordered `migrations` array in
+  store.go. **Operational state only** (dedup, spend ledger, grounding cap, `request_log`,
+  warned-encrypted) — **no message content** (that lives in Synapse).
+
+## Current prod config (the cheap web path)
+
+`WEB_PROVIDER=gemini_grounding`: Gemini 2.5 Flash-Lite does the fetch via the **native v1beta
+`google_search` tool** (NOT the OpenAI-compat endpoint — grounding is silently ignored there,
+F-EXT-3), then Grok-4.3 voices it. ~**$0.0013/query** (vs ~$0.022 for the old two-Grok path);
+grounding is free under the daily RPD, guarded by `WEB_GROUNDING_DAILY_CAP`. `XAI_MODEL=grok-4.3` and
+`GROK_REASONING_EFFORT=none` (4.3 otherwise reasons on every reply). Full flag table in the
+[README](../../apps/ai-bot/README.md).
+
+## Building / testing
+
+Go toolchain lives at `/home/ubuntu/.go-toolchain/go/bin` (NOT on PATH). Store-backed tests need
+`AI_BOT_TEST_DATABASE_URL` (a throwaway Postgres) and **skip** without it, so `go test ./...` stays
+green on a machine without one. Keep `gofmt -l`, `go vet ./...`, `go test -race ./...` clean.
diff --git a/docs/ai/server-side.md b/docs/ai/server-side.md
index 758f05b3..2b81e483 100644
--- a/docs/ai/server-side.md
+++ b/docs/ai/server-side.md
@@ -132,7 +132,7 @@ in doubt.
 | `telegram-bridge` | `dock.mau.dev/mautrix/telegram:<v26.04 bridgev2 tag>` | `./bridges/telegram:/data` |
 | `discord-bridge` | `dock.mau.dev/mautrix/discord:v0.7.5` | `./bridges/discord:/data` (legacy bridge — runtime reports `0.7.6+dev`) |
 | `whatsapp-bridge` | `dock.mau.dev/mautrix/whatsapp:v0.12.4` | `./bridges/whatsapp:/data` |
-| `ai-bot` | `ai-bot:custom` (built locally from [`apps/ai-bot/`](../../apps/ai-bot/), shipped via `docker save \| ssh docker load` — VS Code task **Deploy AI bot**) | **Vojo AI** = `@ai:vojo.chat`, an xAI-Grok-backed **application service** (NOT a normal bot user). Answers `@`-mentions in groups + everything in 1:1s; the Grok reply (markdown) is rendered to `org.matrix.custom.html` and sent as `formatted_body` (in-bot `markdown.go`, zero deps; emits only tags Cinny's sanitizer keeps, escapes all model text), falling back to the plain `body` when there's no formatting. Mounts `./ai-bot:/data` (owned **uid 65532**, distroless nonroot) holding `registration.yaml` (self-generated, `generate-registration`), `state/` (runtime dir) and `secrets/xai_api_key`. Its **operational store** (txn/event dedup, daily spend ledger, encrypted-warned set) lives in the dedicated `vojo_ai` Postgres DB via `AI_BOT_DATABASE_URL` — `depends_on: [synapse, postgres]`. Push port `:8009` (registration `url: http://ai-bot:8009`). Secrets via env/`*_FILE`; `as_token`/`hs_token` read from `registration.yaml` (no rotation). See [`apps/ai-bot/README.md`](../../apps/ai-bot/README.md). |
+| `ai-bot` | `ai-bot:custom` (built locally from [`apps/ai-bot/`](../../apps/ai-bot/), shipped via `docker save \| ssh docker load` — VS Code task **Deploy AI bot**) | **Vojo AI** = `@ai:vojo.chat`, a Grok-voiced **cascade application service** (NOT a normal bot user; architecture: [ai-bot.md](ai-bot.md)). Answers `@`-mentions in groups + everything in 1:1s; the Grok reply (markdown) is rendered to `org.matrix.custom.html` and sent as `formatted_body` (in-bot `markdown.go`, zero deps; emits only tags Cinny's sanitizer keeps, escapes all model text), falling back to the plain `body` when there's no formatting. Mounts `./ai-bot:/data` (owned **uid 65532**, distroless nonroot) holding `registration.yaml` (self-generated, `generate-registration`), `state/` (runtime dir) and `secrets/xai_api_key`. Its **operational store** (txn/event dedup, daily spend ledger, encrypted-warned set) lives in the dedicated `vojo_ai` Postgres DB via `AI_BOT_DATABASE_URL` — `depends_on: [synapse, postgres]`. Push port `:8009` (registration `url: http://ai-bot:8009`). Secrets via env/`*_FILE`; `as_token`/`hs_token` read from `registration.yaml` (no rotation). See [`apps/ai-bot/README.md`](../../apps/ai-bot/README.md). |
 
 ### Bridge service stanza (template)