From 5d959311f22718ae961b3269fd0697dfabcd7eae Mon Sep 17 00:00:00 2001
From: heaven
Date: Mon, 1 Jun 2026 20:41:33 +0300
Subject: [PATCH] docs(ai): add ai-bot.md documenting the bot's Grok-voiced cascade backend and link it from the context bank

---
 .gitignore             |  1 +
 docs/ai/README.md      |  1 +
 docs/ai/ai-bot.md      | 76 ++++++++++++++++++++++++++++++++++++++++++
 docs/ai/server-side.md |  2 +-
 4 files changed, 79 insertions(+), 1 deletion(-)
 create mode 100644 docs/ai/ai-bot.md

diff --git a/.gitignore b/.gitignore
index 44b8a204..03deb86a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -23,5 +23,6 @@ docs/ai/*
 !docs/ai/i18n.md
 !docs/ai/overview.md
 !docs/ai/server-side.md
+!docs/ai/ai-bot.md
 
 vite.config.*.timestamp-*.mjs
diff --git a/docs/ai/README.md b/docs/ai/README.md
index 3bec6564..8894d156 100644
--- a/docs/ai/README.md
+++ b/docs/ai/README.md
@@ -22,6 +22,7 @@ Any agent (Claude Code, Cursor, Codex, Windsurf, Cline, Copilot, Aider, …) wor
 | [electron.md](electron.md) | Electron desktop wrapper, privileged `vojo://` scheme for SW, build chain, IPC security, Windows distribution |
 | [bugs.md](bugs.md) | Known bugs & regressions |
 | [server-side.md](server-side.md) | Some configs that deployd on server |
+| [ai-bot.md](ai-bot.md) | Vojo AI bot (`@ai:vojo.chat`) — server-side Grok-voiced cascade appservice: request flow, routes, provider seam, spend ledger, current cheap-web config |
 
 ## Rules for updating
 
diff --git a/docs/ai/ai-bot.md b/docs/ai/ai-bot.md
new file mode 100644
index 00000000..26c7b926
--- /dev/null
+++ b/docs/ai/ai-bot.md
@@ -0,0 +1,76 @@
+# Vojo AI bot (`@ai:vojo.chat`)
+
+A Go **Synapse application service** in [`apps/ai-bot/`](../../apps/ai-bot/) — not a normal
+bot user. Answers `@`-mentions in groups and every message in 1:1s, over the plaintext
+CS-API (Vojo rooms are unencrypted by default). It is a separate server-side service
+deployed next to Synapse; it ships nothing to the web client.
+
+- **Operator / full env reference:** [`apps/ai-bot/README.md`](../../apps/ai-bot/README.md) (config tables, setup, deploy).
+- **Deploy / server config:** [server-side.md](server-side.md) (the `ai-bot` service row, the `vojo_ai` Postgres role).
+- **Detailed design SOT:** `docs/plans/grok_bot.md` + `docs/plans/ai_backend_build_plan.md` — **local-only, `docs/plans/` is gitignored.**
+
+## Request flow
+
+Synapse pushes a transaction → the bot **acks 200 instantly, then processes async per-room**
+([appservice.go](../../apps/ai-bot/appservice.go)), so a slow model call never blocks other
+rooms or the homeserver. `handleMessage` ([bot.go](../../apps/ai-bot/bot.go)) gates in order:
+durable+in-memory dedup → encrypted-room skip → decode / edit / own-message / notice →
+foreign-server leave → DM-or-mention → media react → **per-room single-flight** → spawn
+`respond`. `respond` = `Reserve(estimate)` → `generate()` → `Settle(actual)` → `sendReply`;
+**any failure produces an emoji react, never silence.**
+
+## Cascade (flag-gated "operator cascade", every layer default OFF)
+
+`generate()` ([cascade.go](../../apps/ai-bot/cascade.go)) routes ([router.go](../../apps/ai-bot/router.go))
+then dispatches; **any layer off or failing degrades to `grok_direct`** (never an error to the user):
+
+- **`grok_direct`** — DEFAULT, one Grok call. **Grok is the final voice on everything substantive.**
+- **`trivial_direct`** — greetings/acks → cheap Gemini (`TRIVIAL_OFFLOAD_ENABLED`).
+- **`web_then_grok`** — fresh facts: a WebProvider fetches a grounded digest + citations, then **Grok synthesises the answer in voice** ([web.go](../../apps/ai-bot/web.go)).
+- **`reason_then_grok`** — manual trigger ("подумай глубже", i.e. "think deeper") → Grok at a higher `reasoning_effort`.
+- Router = free Layer-0 regex + optional Layer-1 Gemini classifier; a confidence floor keeps uncertain cases on the safe floor (`grok_direct`).
+
+**Invariant:** all cascade flags OFF == today's bot — a single `grok_direct` call, byte-identical wire body. Do not enable layers in prod until the offline-eval gate (build plan §9) passes.
+
+## Provider seam (no vendor names in business logic)
+
+[llm.go](../../apps/ai-bot/llm.go) (`Message`/`Usage`/`LLMRequest`/`LLMResponse`/`LLMClient`) +
+[httpllm.go](../../apps/ai-bot/httpllm.go) (shared OpenAI-compatible transport + retry) + thin
+adapters [provider_xai.go](../../apps/ai-bot/provider_xai.go) /
+[provider_gemini.go](../../apps/ai-bot/provider_gemini.go) + [pricing.go](../../apps/ai-bot/pricing.go)
+(`priceFor` model→price map). `Bot.llm` is an `LLMClient`, never a concrete vendor type.
+
+## Money, invariants & store ([store.go](../../apps/ai-bot/store.go))
+
+- **Ceiling is TOCTOU-safe:** `Reserve` books a route's estimated max-cost into `reserved_usd`
+  under a per-day **global** advisory lock; the gate counts committed + reserved spend; `Settle`
+  releases the reservation and books the real per-component `CostBreakdown`. A concurrent burst
+  overshoots by at most one reservation.
+- **Never charge for silence:** a 2xx is billed; if the reply then fails to send, refund the
+  request SLOT (not the USD) + react. A failed call releases the reservation + refunds the slot;
+  a panic releases via a deferred guard.
+- Caps: `DAILY_USD_CEILING` (global $), `PER_USER_DAILY_CAP` (requests/user), `PER_USER_DAILY_USD`
+  (optional $/user). **At-most-once** dedup is durable (`SeenEvent`/`MarkTxn`); generation is
+  per-room single-flight.
+- One overall **per-request deadline** bounds the whole cascade (no per-stage 3×60s accretion).
+- **Telemetry:** one `request_log` row per engaged request (route, per-component $, latency,
+  degrade reasons), written async + isolated (its failure never drops a reply), `TELEMETRY_ENABLED`
+  default off, time-based retention.
+- **Store:** dedicated Postgres `vojo_ai` (pgx); schema is an ordered `migrations` array in
+  store.go. **Operational state only** (dedup, spend ledger, grounding cap, `request_log`,
+  warned-encrypted) — **no message content** (that lives in Synapse).
+
+## Current prod config (the cheap web path)
+
+`WEB_PROVIDER=gemini_grounding`: Gemini 2.5 Flash-Lite does the fetch via the **native v1beta
+`google_search` tool** (NOT the OpenAI-compat endpoint — grounding is silently ignored there,
+F-EXT-3), then Grok-4.3 voices it. ~**$0.0013/query** (vs ~$0.022 for the old two-Grok path);
+grounding is free under the daily RPD, guarded by `WEB_GROUNDING_DAILY_CAP`. `XAI_MODEL=grok-4.3` +
+`GROK_REASONING_EFFORT=none` (4.3 otherwise reasons on every reply). Full flag table in the
+[README](../../apps/ai-bot/README.md).
+
+## Building / testing
+
+Go toolchain lives at `/home/ubuntu/.go-toolchain/go/bin` (NOT on PATH). Store-backed tests need
+`AI_BOT_TEST_DATABASE_URL` (a throwaway Postgres) and **skip** without it, so `go test ./...` stays
+green on a machine without one. Keep `gofmt -l`, `go vet ./...`, `go test -race ./...` clean.
diff --git a/docs/ai/server-side.md b/docs/ai/server-side.md
index 758f05b3..2b81e483 100644
--- a/docs/ai/server-side.md
+++ b/docs/ai/server-side.md
@@ -132,7 +132,7 @@ in doubt.
 | `telegram-bridge` | `dock.mau.dev/mautrix/telegram:` | `./bridges/telegram:/data` |
 | `discord-bridge` | `dock.mau.dev/mautrix/discord:v0.7.5` | `./bridges/discord:/data` (legacy bridge — runtime reports `0.7.6+dev`) |
 | `whatsapp-bridge` | `dock.mau.dev/mautrix/whatsapp:v0.12.4` | `./bridges/whatsapp:/data` |
-| `ai-bot` | `ai-bot:custom` (built locally from [`apps/ai-bot/`](../../apps/ai-bot/), shipped via `docker save \| ssh docker load` — VS Code task **Deploy AI bot**) | **Vojo AI** = `@ai:vojo.chat`, an xAI-Grok-backed **application service** (NOT a normal bot user). Answers `@`-mentions in groups + everything in 1:1s; the Grok reply (markdown) is rendered to `org.matrix.custom.html` and sent as `formatted_body` (in-bot `markdown.go`, zero deps; emits only tags Cinny's sanitizer keeps, escapes all model text), falling back to the plain `body` when there's no formatting. Mounts `./ai-bot:/data` (owned **uid 65532**, distroless nonroot) holding `registration.yaml` (self-generated, `generate-registration`), `state/` (runtime dir) and `secrets/xai_api_key`. Its **operational store** (txn/event dedup, daily spend ledger, encrypted-warned set) lives in the dedicated `vojo_ai` Postgres DB via `AI_BOT_DATABASE_URL` — `depends_on: [synapse, postgres]`. Push port `:8009` (registration `url: http://ai-bot:8009`). Secrets via env/`*_FILE`; `as_token`/`hs_token` read from `registration.yaml` (no rotation). See [`apps/ai-bot/README.md`](../../apps/ai-bot/README.md). |
+| `ai-bot` | `ai-bot:custom` (built locally from [`apps/ai-bot/`](../../apps/ai-bot/), shipped via `docker save \| ssh docker load` — VS Code task **Deploy AI bot**) | **Vojo AI** = `@ai:vojo.chat`, a Grok-voiced **cascade application service** (NOT a normal bot user; architecture: [ai-bot.md](ai-bot.md)). Answers `@`-mentions in groups + everything in 1:1s; the Grok reply (markdown) is rendered to `org.matrix.custom.html` and sent as `formatted_body` (in-bot `markdown.go`, zero deps; emits only tags Cinny's sanitizer keeps, escapes all model text), falling back to the plain `body` when there's no formatting. Mounts `./ai-bot:/data` (owned **uid 65532**, distroless nonroot) holding `registration.yaml` (self-generated, `generate-registration`), `state/` (runtime dir) and `secrets/xai_api_key`. Its **operational store** (txn/event dedup, daily spend ledger, encrypted-warned set) lives in the dedicated `vojo_ai` Postgres DB via `AI_BOT_DATABASE_URL` — `depends_on: [synapse, postgres]`. Push port `:8009` (registration `url: http://ai-bot:8009`). Secrets via env/`*_FILE`; `as_token`/`hs_token` read from `registration.yaml` (no rotation). See [`apps/ai-bot/README.md`](../../apps/ai-bot/README.md). |
 
 ### Bridge service stanza (template)
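The spend-ceiling invariant from the "Money, invariants & store" section of ai-bot.md works as follows: reserve an estimate, gate on committed + reserved spend, then settle or release. The Go sketch below illustrates that sequence in memory. It is a sketch under stated assumptions, not the store.go implementation. The `ledger` type, `Release`, `reserveGenerateSettle` and the exact gate condition (refuse once `committed + reserved` reaches the ceiling) are all hypothetical. A `sync.Mutex` stands in for the per-day Postgres advisory lock, and per-user request slots are not modelled.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errCeiling signals that the day's global USD ceiling is already used up.
var errCeiling = errors.New("daily USD ceiling reached")

// ledger is a hypothetical in-memory stand-in for the spend ledger in
// store.go. The mutex plays the role of the per-day global advisory lock.
type ledger struct {
	mu        sync.Mutex
	ceiling   float64 // the day's USD ceiling (DAILY_USD_CEILING)
	committed float64 // spend already settled today
	reserved  float64 // outstanding estimates (reserved_usd)
}

// Reserve books an estimated max cost. The gate counts committed + reserved
// spend under the lock, so concurrent callers never act on a stale total.
// The check runs before the estimate is added, so the day can end over the
// ceiling by less than one reservation (assuming actual <= estimate).
func (l *ledger) Reserve(estimate float64) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.committed+l.reserved >= l.ceiling {
		return errCeiling
	}
	l.reserved += estimate
	return nil
}

// Settle releases the reservation and books the actual cost.
func (l *ledger) Settle(estimate, actual float64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
	l.committed += actual
}

// Release drops a reservation without charging anything.
func (l *ledger) Release(estimate float64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
}

// reserveGenerateSettle is a hypothetical, simplified Reserve -> generate ->
// Settle sequence. generate stands in for the cascade and returns the actual
// cost. On an error or a panic, the deferred guard releases the reservation;
// it does not recover, so a panic still propagates to the caller.
func reserveGenerateSettle(l *ledger, estimate float64, generate func() (float64, error)) error {
	if err := l.Reserve(estimate); err != nil {
		return err
	}
	settled := false
	defer func() {
		if !settled {
			l.Release(estimate)
		}
	}()
	actual, err := generate()
	if err != nil {
		return err
	}
	l.Settle(estimate, actual)
	settled = true
	return nil
}

func main() {
	l := &ledger{ceiling: 0.01}
	cost := func() (float64, error) { return 0.004, nil }
	for i := 1; i <= 4; i++ {
		err := reserveGenerateSettle(l, 0.005, cost)
		fmt.Printf("request %d: err=%v committed=%.3f reserved=%.3f\n", i, err, l.committed, l.reserved)
	}
}
```

In `main`, the third request is still admitted because committed spend is 0.008 against a 0.01 ceiling. The day then settles at about 0.012, an overshoot smaller than one 0.005 reservation, provided the actual cost never exceeds the estimate. The fourth request is refused before any model call would be made.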