docs(ai): add ai-bot.md documenting the bot's Grok-voiced cascade backend and link it from the context bank

heaven 2026-06-01 20:41:33 +03:00
parent ff8918dae1
commit 5d959311f2
4 changed files with 79 additions and 1 deletions

1
.gitignore vendored

@ -23,5 +23,6 @@ docs/ai/*
!docs/ai/i18n.md
!docs/ai/overview.md
!docs/ai/server-side.md
!docs/ai/ai-bot.md
vite.config.*.timestamp-*.mjs


@ -22,6 +22,7 @@ Any agent (Claude Code, Cursor, Codex, Windsurf, Cline, Copilot, Aider, …) wor
| [electron.md](electron.md) | Electron desktop wrapper, privileged `vojo://` scheme for SW, build chain, IPC security, Windows distribution |
| [bugs.md](bugs.md) | Known bugs & regressions |
| [server-side.md](server-side.md) | Configs that are deployed on the server |
| [ai-bot.md](ai-bot.md) | Vojo AI bot (`@ai:vojo.chat`) — server-side Grok-voiced cascade appservice: request flow, routes, provider seam, spend ledger, current cheap-web config |
## Rules for updating

76
docs/ai/ai-bot.md Normal file

@ -0,0 +1,76 @@
# Vojo AI bot (`@ai:vojo.chat`)
A Go **Synapse application service** in [`apps/ai-bot/`](../../apps/ai-bot/) — not a normal
bot user. Answers `@`-mentions in groups and every message in 1:1s, over the plaintext
CS-API (Vojo rooms are unencrypted by default). It is a separate server-side service
deployed next to Synapse; it ships nothing to the web client.
- **Operator / full env reference:** [`apps/ai-bot/README.md`](../../apps/ai-bot/README.md) (config tables, setup, deploy).
- **Deploy / server config:** [server-side.md](server-side.md) (the `ai-bot` service row, the `vojo_ai` Postgres role).
- **Detailed design SOT:** `docs/plans/grok_bot.md` + `docs/plans/ai_backend_build_plan.md` — **local-only, `docs/plans/` is gitignored.**
## Request flow
Synapse pushes a transaction → the bot **acks 200 instantly, then processes async per-room**
([appservice.go](../../apps/ai-bot/appservice.go)), so a slow model call never blocks other
rooms or the homeserver. `handleMessage` ([bot.go](../../apps/ai-bot/bot.go)) gates in order:
durable+in-memory dedup → encrypted-room skip → decode / edit / own-message / notice →
foreign-server leave → DM-or-mention → media react → **per-room single-flight** → spawn
`respond`. `respond` = `Reserve(estimate)` → `generate()` → `Settle(actual)` → `sendReply`;
**any failure produces an emoji react, never silence.**
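
The ack-then-process step and the per-room single-flight gate can be sketched roughly as follows. This is a minimal illustration, not the code in `appservice.go` or `bot.go`: `roomGate`, `tryAcquire`, `release` and `ackThenProcess` are hypothetical names, and the sketch assumes a busy room rejects a second generation rather than queueing it.

```go
package main

import (
	"io"
	"net/http"
	"sync"
)

// roomGate is a hypothetical per-room single-flight guard: at most one
// generation may be in flight for a given room ID.
type roomGate struct {
	mu     sync.Mutex
	active map[string]bool
}

func newRoomGate() *roomGate {
	return &roomGate{active: make(map[string]bool)}
}

// tryAcquire reports whether the caller may start work for roomID.
// Assumption: a room that is already busy is rejected, not queued.
func (g *roomGate) tryAcquire(roomID string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.active[roomID] {
		return false
	}
	g.active[roomID] = true
	return true
}

// release frees the room so the next message can trigger a generation.
func (g *roomGate) release(roomID string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.active, roomID)
}

// ackThenProcess reads the pushed transaction, answers 200 with an empty
// JSON object, and only then hands the body to a goroutine, so a slow
// model call never delays the acknowledgement.
func ackThenProcess(process func(body []byte)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "bad request body", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte("{}"))
		go process(body)
	}
}
```

In this sketch a worker would call `tryAcquire(roomID)` before spawning the reply and `defer release(roomID)` inside it; the body is read before the ack because the request body is not safe to use after the handler returns.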
## Cascade (flag-gated "operator cascade", every layer default OFF)
`generate()` ([cascade.go](../../apps/ai-bot/cascade.go)) routes ([router.go](../../apps/ai-bot/router.go))
then dispatches; **any layer off or failing degrades to `grok_direct`** (never an error to the user):
- **`grok_direct`** — DEFAULT, one Grok call. **Grok is the final voice on everything substantive.**
- **`trivial_direct`** — greetings/acks → cheap Gemini (`TRIVIAL_OFFLOAD_ENABLED`).
- **`web_then_grok`** — fresh facts: a WebProvider fetches a grounded digest + citations, then **Grok synthesises the answer in voice** ([web.go](../../apps/ai-bot/web.go)).
- **`reason_then_grok`** — manual trigger ("подумай глубже") → Grok at a higher `reasoning_effort`.
- Router = free Layer-0 regex + optional Layer-1 Gemini classifier; a confidence floor keeps uncertain cases on the safe floor (`grok_direct`).
**Invariant:** all cascade flags OFF == today's bot — a single `grok_direct` call, byte-identical wire body. Do not enable layers in prod until the offline-eval gate (build plan §9) passes.
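
The routing rule above (regex Layer 0, optional classifier Layer 1, confidence floor, everything else falling back to `grok_direct`) might look roughly like this. It is a sketch under assumptions, not the contents of `router.go`: the regexes, the `Flags` field names other than the trivial-offload flag, the `Classifier` signature and the `pickRoute` helper are all hypothetical.

```go
package main

import "regexp"

// Route names are taken from the section above.
type Route string

const (
	GrokDirect     Route = "grok_direct"
	TrivialDirect  Route = "trivial_direct"
	WebThenGrok    Route = "web_then_grok"
	ReasonThenGrok Route = "reason_then_grok"
)

// Flags mirrors the "every layer default OFF" idea. Only TrivialOffload
// maps to a flag named in the source (TRIVIAL_OFFLOAD_ENABLED); the other
// two field names are assumptions for illustration.
type Flags struct {
	TrivialOffload bool
	Web            bool
	Reason         bool
}

// Classifier is a hypothetical Layer-1 hook returning a route and a
// confidence in [0, 1].
type Classifier func(text string) (Route, float64)

// Illustrative Layer-0 patterns; the real regexes are not shown in the source.
var (
	trivialRe = regexp.MustCompile(`(?i)^\s*(hi|hello|thanks|ok)[\s!.]*$`)
	reasonRe  = regexp.MustCompile(`(?i)подумай глубже`)
)

// enabled reports whether a route's layer is switched on.
func enabled(r Route, f Flags) bool {
	switch r {
	case TrivialDirect:
		return f.TrivialOffload
	case WebThenGrok:
		return f.Web
	case ReasonThenGrok:
		return f.Reason
	}
	return r == GrokDirect
}

// pickRoute tries the free regex layer, then the optional classifier, and
// stays on grok_direct whenever a layer is off or the confidence is below
// the floor.
func pickRoute(text string, f Flags, classify Classifier, floor float64) Route {
	if f.Reason && reasonRe.MatchString(text) {
		return ReasonThenGrok
	}
	if f.TrivialOffload && trivialRe.MatchString(text) {
		return TrivialDirect
	}
	if classify != nil {
		if r, conf := classify(text); conf >= floor && enabled(r, f) {
			return r
		}
	}
	return GrokDirect
}
```

With a zero-value `Flags{}` every input resolves to `GrokDirect`, which is the property the invariant above relies on.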
## Provider seam (no vendor names in business logic)
[llm.go](../../apps/ai-bot/llm.go) (`Message`/`Usage`/`LLMRequest`/`LLMResponse`/`LLMClient`) +
[httpllm.go](../../apps/ai-bot/httpllm.go) (shared OpenAI-compatible transport + retry) + thin
adapters [provider_xai.go](../../apps/ai-bot/provider_xai.go) /
[provider_gemini.go](../../apps/ai-bot/provider_gemini.go) + [pricing.go](../../apps/ai-bot/pricing.go)
(`priceFor` model→price map). `Bot.llm` is an `LLMClient`, never a concrete vendor type.
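
To make the seam concrete, here is a hedged sketch of what such types could look like. The type names come from the list above, but every field, the `Complete` method, the `priceFor` signature, the `echoClient` stand-in and the placeholder prices are assumptions; the real definitions live in `llm.go` and `pricing.go`.

```go
package main

import (
	"context"
	"fmt"
)

// Vendor-neutral request/response shapes. All fields are illustrative.
type Message struct{ Role, Content string }

type Usage struct{ InputTokens, OutputTokens int }

type LLMRequest struct {
	Model    string
	Messages []Message
}

type LLMResponse struct {
	Text  string
	Usage Usage
}

// LLMClient is the only thing business logic sees; the method name and
// signature here are assumptions.
type LLMClient interface {
	Complete(ctx context.Context, req LLMRequest) (LLMResponse, error)
}

// modelPrice holds USD per million tokens. The numbers below are
// placeholders for the example, not real vendor prices.
type modelPrice struct{ inPerMTok, outPerMTok float64 }

var prices = map[string]modelPrice{
	"example-model": {inPerMTok: 1.0, outPerMTok: 2.0},
}

// priceFor converts token usage into USD via the model-to-price map.
func priceFor(model string, u Usage) (float64, error) {
	p, ok := prices[model]
	if !ok {
		return 0, fmt.Errorf("no price for model %q", model)
	}
	return float64(u.InputTokens)/1e6*p.inPerMTok +
		float64(u.OutputTokens)/1e6*p.outPerMTok, nil
}

// echoClient is a stand-in adapter used only to show that any type
// satisfying LLMClient can be plugged in.
type echoClient struct{}

func (echoClient) Complete(_ context.Context, req LLMRequest) (LLMResponse, error) {
	if len(req.Messages) == 0 {
		return LLMResponse{}, fmt.Errorf("empty request")
	}
	last := req.Messages[len(req.Messages)-1]
	return LLMResponse{Text: last.Content, Usage: Usage{InputTokens: 1, OutputTokens: 1}}, nil
}

// Compile-time check that the adapter satisfies the interface.
var _ LLMClient = echoClient{}

// Bot depends on the interface, never on a concrete vendor type.
type Bot struct{ llm LLMClient }
```

Swapping vendors then means constructing `Bot{llm: someOtherAdapter}`; nothing that calls `b.llm.Complete` needs to change.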
## Money, invariants & store ([store.go](../../apps/ai-bot/store.go))
- **Ceiling is TOCTOU-safe:** `Reserve` books a route's estimated max-cost into `reserved_usd`
under a per-day **global** advisory lock; the gate counts committed + reserved spend; `Settle`
releases the reservation and books the real per-component `CostBreakdown`. A concurrent burst
overshoots by at most one reservation.
- **Never charge for silence:** a 2xx is billed; if the reply then fails to send, refund the
request SLOT (not the USD) + react. A failed call releases the reservation + refunds the slot;
a panic releases via a deferred guard.
- Caps: `DAILY_USD_CEILING` (global $), `PER_USER_DAILY_CAP` (requests/user), `PER_USER_DAILY_USD`
(optional $/user). **At-most-once** dedup is durable (`SeenEvent`/`MarkTxn`); generation is
per-room single-flight.
- One overall **per-request deadline** bounds the whole cascade (no per-stage 3×60s accretion).
- **Telemetry:** one `request_log` row per engaged request (route, per-component $, latency,
degrade reasons), written async + isolated (its failure never drops a reply), `TELEMETRY_ENABLED`
default off, time-based retention.
- **Store:** dedicated Postgres `vojo_ai` (pgx); schema is an ordered `migrations` array in
store.go. **Operational state only** (dedup, spend ledger, grounding cap, `request_log`,
warned-encrypted) — **no message content** (that lives in Synapse).
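
The reserve/settle accounting described in the first bullets can be illustrated with an in-memory stand-in. This is only a sketch: the real store is Postgres with an advisory lock, whereas here a `sync.Mutex` plays that role, amounts are integer micro-USD to keep the arithmetic exact, and the `Ledger` type, its fields and the `Release` method are hypothetical. The gate compares committed plus reserved spend against the ceiling before admitting a request, which is one reading of "overshoots by at most one reservation".

```go
package main

import (
	"errors"
	"sync"
)

// ErrCeiling is returned when the daily ceiling is already reached.
var ErrCeiling = errors.New("daily USD ceiling reached")

// Ledger is a hypothetical in-memory model of the spend ledger.
// All amounts are in micro-USD.
type Ledger struct {
	mu        sync.Mutex // stands in for the per-day global advisory lock
	ceiling   int64
	committed int64
	reserved  int64
}

// Reserve books the route's estimated max cost if the ceiling is not yet
// reached. Check and booking happen under one lock, so two concurrent
// callers cannot both pass the gate on stale totals.
func (l *Ledger) Reserve(estimate int64) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.committed+l.reserved >= l.ceiling {
		return ErrCeiling
	}
	l.reserved += estimate
	return nil
}

// Settle releases the reservation and books what the call actually cost.
func (l *Ledger) Settle(estimate, actual int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
	l.committed += actual
}

// Release drops a reservation without charging, for a failed call.
func (l *Ledger) Release(estimate int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
}
```

A caller in this sketch would `Reserve`, then `defer` a guard that calls `Release` unless `Settle` has already run, which mirrors the "panic releases via a deferred guard" rule above.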
## Current prod config (the cheap web path)
`WEB_PROVIDER=gemini_grounding`: Gemini 2.5 Flash-Lite does the fetch via the **native v1beta
`google_search` tool** (NOT the OpenAI-compat endpoint — grounding is silently ignored there,
F-EXT-3), then Grok-4.3 voices it. ~**$0.0013/query** (vs ~$0.022 for the old two-Grok path);
grounding is free under the daily RPD, guarded by `WEB_GROUNDING_DAILY_CAP`. `XAI_MODEL=grok-4.3`
+ `GROK_REASONING_EFFORT=none` (4.3 otherwise reasons on every reply). Full flag table in the
[README](../../apps/ai-bot/README.md).
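
As a rough worked example of what the two per-query figures mean for a budget, the arithmetic below uses the approximate costs quoted above. The $1 budget and the `queriesPerBudget` helper are assumptions made for illustration; the real ceiling is whatever `DAILY_USD_CEILING` is set to.

```go
package main

// Approximate per-query costs quoted in this section, in USD.
const (
	cheapWebUSD   = 0.0013 // Gemini grounding fetch + Grok voice
	oldTwoGrokUSD = 0.022  // the old two-Grok path
)

// queriesPerBudget returns how many whole queries fit into budgetUSD at
// the given per-query cost. Hypothetical helper, for illustration only.
func queriesPerBudget(budgetUSD, perQueryUSD float64) int {
	return int(budgetUSD / perQueryUSD)
}
```

At these figures a $1 budget covers about 769 web queries on the cheap path versus about 45 on the old one, roughly a 17x difference.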
## Building / testing
Go toolchain lives at `/home/ubuntu/.go-toolchain/go/bin` (NOT on PATH). Store-backed tests need
`AI_BOT_TEST_DATABASE_URL` (a throwaway Postgres) and **skip** without it, so `go test ./...` stays
green on a machine without one. Keep `gofmt -l`, `go vet ./...`, `go test -race ./...` clean.
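
The skip-without-database behaviour could be implemented with a small helper along these lines. This is a sketch, not the repository's test code: only the `AI_BOT_TEST_DATABASE_URL` variable name comes from the text above, while `testDatabaseURL` and `testDatabaseURLFromEnv` are hypothetical.

```go
package main

import "os"

// testDBEnv is the variable named in the section above.
const testDBEnv = "AI_BOT_TEST_DATABASE_URL"

// testDatabaseURL returns the throwaway Postgres URL and whether one is
// configured. The lookup function is injected so the helper itself can be
// exercised without touching the process environment.
func testDatabaseURL(getenv func(string) string) (string, bool) {
	url := getenv(testDBEnv)
	return url, url != ""
}

// testDatabaseURLFromEnv is the variant a real test would call.
func testDatabaseURLFromEnv() (string, bool) {
	return testDatabaseURL(os.Getenv)
}

// A store-backed test would then start like this (shown as a comment,
// since it belongs in a _test.go file):
//
//	func TestStore(t *testing.T) {
//		url, ok := testDatabaseURLFromEnv()
//		if !ok {
//			t.Skip("AI_BOT_TEST_DATABASE_URL not set; skipping store test")
//		}
//		_ = url // connect to the throwaway database here
//	}
```

Skipping rather than failing is what keeps `go test ./...` green on machines without a database.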


@ -132,7 +132,7 @@ in doubt.
| `telegram-bridge` | `dock.mau.dev/mautrix/telegram:<v26.04 bridgev2 tag>` | `./bridges/telegram:/data` |
| `discord-bridge` | `dock.mau.dev/mautrix/discord:v0.7.5` | `./bridges/discord:/data` (legacy bridge — runtime reports `0.7.6+dev`) |
| `whatsapp-bridge` | `dock.mau.dev/mautrix/whatsapp:v0.12.4` | `./bridges/whatsapp:/data` |
| `ai-bot` | `ai-bot:custom` (built locally from [`apps/ai-bot/`](../../apps/ai-bot/), shipped via `docker save \| ssh docker load` — VS Code task **Deploy AI bot**) | **Vojo AI** = `@ai:vojo.chat`, an xAI-Grok-backed **application service** (NOT a normal bot user). Answers `@`-mentions in groups + everything in 1:1s; the Grok reply (markdown) is rendered to `org.matrix.custom.html` and sent as `formatted_body` (in-bot `markdown.go`, zero deps; emits only tags Cinny's sanitizer keeps, escapes all model text), falling back to the plain `body` when there's no formatting. Mounts `./ai-bot:/data` (owned **uid 65532**, distroless nonroot) holding `registration.yaml` (self-generated, `generate-registration`), `state/` (runtime dir) and `secrets/xai_api_key`. Its **operational store** (txn/event dedup, daily spend ledger, encrypted-warned set) lives in the dedicated `vojo_ai` Postgres DB via `AI_BOT_DATABASE_URL` → `depends_on: [synapse, postgres]`. Push port `:8009` (registration `url: http://ai-bot:8009`). Secrets via env/`*_FILE`; `as_token`/`hs_token` read from `registration.yaml` (no rotation). See [`apps/ai-bot/README.md`](../../apps/ai-bot/README.md). |
| `ai-bot` | `ai-bot:custom` (built locally from [`apps/ai-bot/`](../../apps/ai-bot/), shipped via `docker save \| ssh docker load` — VS Code task **Deploy AI bot**) | **Vojo AI** = `@ai:vojo.chat`, a Grok-voiced **cascade application service** (NOT a normal bot user; architecture: [ai-bot.md](ai-bot.md)). Answers `@`-mentions in groups + everything in 1:1s; the Grok reply (markdown) is rendered to `org.matrix.custom.html` and sent as `formatted_body` (in-bot `markdown.go`, zero deps; emits only tags Cinny's sanitizer keeps, escapes all model text), falling back to the plain `body` when there's no formatting. Mounts `./ai-bot:/data` (owned **uid 65532**, distroless nonroot) holding `registration.yaml` (self-generated, `generate-registration`), `state/` (runtime dir) and `secrets/xai_api_key`. Its **operational store** (txn/event dedup, daily spend ledger, encrypted-warned set) lives in the dedicated `vojo_ai` Postgres DB via `AI_BOT_DATABASE_URL` → `depends_on: [synapse, postgres]`. Push port `:8009` (registration `url: http://ai-bot:8009`). Secrets via env/`*_FILE`; `as_token`/`hs_token` read from `registration.yaml` (no rotation). See [`apps/ai-bot/README.md`](../../apps/ai-bot/README.md). |
### Bridge service stanza (template)