docs(ai): add ai-bot.md documenting the bot's Grok-voiced cascade backend and link it from the context bank
parent ff8918dae1
commit 5d959311f2
4 changed files with 79 additions and 1 deletion
1
.gitignore
vendored
@@ -23,5 +23,6 @@ docs/ai/*
!docs/ai/i18n.md
!docs/ai/overview.md
!docs/ai/server-side.md
!docs/ai/ai-bot.md

vite.config.*.timestamp-*.mjs

@@ -22,6 +22,7 @@ Any agent (Claude Code, Cursor, Codex, Windsurf, Cline, Copilot, Aider, …) wor
| [electron.md](electron.md) | Electron desktop wrapper, privileged `vojo://` scheme for SW, build chain, IPC security, Windows distribution |
| [bugs.md](bugs.md) | Known bugs & regressions |
| [server-side.md](server-side.md) | Some configs that are deployed on the server |
| [ai-bot.md](ai-bot.md) | Vojo AI bot (`@ai:vojo.chat`), a server-side Grok-voiced cascade appservice: request flow, routes, provider seam, spend ledger, current cheap-web config |

## Rules for updating

76
docs/ai/ai-bot.md
Normal file
@@ -0,0 +1,76 @@
# Vojo AI bot (`@ai:vojo.chat`)

A Go **Synapse application service** in [`apps/ai-bot/`](../../apps/ai-bot/), not a normal
bot user. It answers `@`-mentions in groups and every message in 1:1s, over the plaintext
CS-API (Vojo rooms are unencrypted by default). It is a separate server-side service
deployed next to Synapse; it ships nothing to the web client.

- **Operator / full env reference:** [`apps/ai-bot/README.md`](../../apps/ai-bot/README.md) (config tables, setup, deploy).
- **Deploy / server config:** [server-side.md](server-side.md) (the `ai-bot` service row, the `vojo_ai` Postgres role).
- **Detailed design SOT:** `docs/plans/grok_bot.md` + `docs/plans/ai_backend_build_plan.md` (**local-only: `docs/plans/` is gitignored**).

## Request flow

Synapse pushes a transaction → the bot **acks 200 instantly, then processes async per-room**
([appservice.go](../../apps/ai-bot/appservice.go)), so a slow model call never blocks other
rooms or the homeserver. `handleMessage` ([bot.go](../../apps/ai-bot/bot.go)) gates in order:
durable+in-memory dedup → encrypted-room skip → decode / edit / own-message / notice →
foreign-server leave → DM-or-mention → media react → **per-room single-flight** → spawn
`respond`. `respond` = `Reserve(estimate)` → `generate()` → `Settle(actual)` → `sendReply`;
**any failure produces an emoji react, never silence.**

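The ack-then-process pattern can be illustrated with a minimal sketch, assuming one buffered queue and one worker goroutine per room. The names `roomQueues`, `Enqueue`, `processAll` and the `event` shape are hypothetical and are not taken from `appservice.go`; the real service may structure this differently.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a hypothetical, stripped-down Matrix event.
type event struct {
	RoomID string
	Body   string
}

// roomQueues fans events out to one worker goroutine per room, so a slow
// handler in one room does not delay other rooms.
type roomQueues struct {
	mu     sync.Mutex
	queues map[string]chan event
	wg     sync.WaitGroup
	handle func(event)
}

func newRoomQueues(handle func(event)) *roomQueues {
	return &roomQueues{queues: make(map[string]chan event), handle: handle}
}

// Enqueue hands the event to its room's worker and returns without waiting
// for processing, so the caller can ack the transaction right away. It only
// blocks if that room's buffer (64 here, an arbitrary choice) is full.
func (q *roomQueues) Enqueue(ev event) {
	q.mu.Lock()
	ch, ok := q.queues[ev.RoomID]
	if !ok {
		ch = make(chan event, 64)
		q.queues[ev.RoomID] = ch
		q.wg.Add(1)
		go func() {
			defer q.wg.Done()
			for e := range ch { // events of one room are handled in arrival order
				q.handle(e)
			}
		}()
	}
	q.mu.Unlock()
	ch <- ev
}

// Close stops accepting events and waits for every worker to drain its queue.
func (q *roomQueues) Close() {
	q.mu.Lock()
	for _, ch := range q.queues {
		close(ch)
	}
	q.mu.Unlock()
	q.wg.Wait()
}

// processAll runs events through the queues and returns bodies grouped by room.
func processAll(events []event) map[string][]string {
	var mu sync.Mutex
	out := make(map[string][]string)
	q := newRoomQueues(func(e event) {
		mu.Lock()
		out[e.RoomID] = append(out[e.RoomID], e.Body)
		mu.Unlock()
	})
	for _, e := range events {
		q.Enqueue(e)
	}
	q.Close()
	return out
}

func main() {
	got := processAll([]event{{"!a", "1"}, {"!b", "x"}, {"!a", "2"}})
	fmt.Println(got["!a"], got["!b"])
}
```

In an HTTP handler the transaction would be acknowledged after the `Enqueue` calls return, leaving the model call to the room's worker.
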
## Cascade (flag-gated "operator cascade", every layer default OFF)

`generate()` ([cascade.go](../../apps/ai-bot/cascade.go)) routes ([router.go](../../apps/ai-bot/router.go))
then dispatches; **any layer off or failing degrades to `grok_direct`** (never an error to the user):

- **`grok_direct`**: DEFAULT, one Grok call. **Grok is the final voice on everything substantive.**
- **`trivial_direct`**: greetings/acks → cheap Gemini (`TRIVIAL_OFFLOAD_ENABLED`).
- **`web_then_grok`**: for fresh facts, a WebProvider fetches a grounded digest + citations, then **Grok synthesises the answer in voice** ([web.go](../../apps/ai-bot/web.go)).
- **`reason_then_grok`**: manual trigger ("подумай глубже", Russian for "think deeper") → Grok at a higher `reasoning_effort`.
- Router = free Layer-0 regex + optional Layer-1 Gemini classifier; a confidence floor keeps uncertain cases on the safe floor (`grok_direct`).

**Invariant:** all cascade flags OFF == today's bot: a single `grok_direct` call, byte-identical wire body. Do not enable layers in prod until the offline-eval gate (build plan §9) passes.

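The routing rule and the degrade-to-`grok_direct` behaviour can be sketched as follows. This is an illustration under stated assumptions, not the code in `cascade.go` or `router.go`: the `decision`, `flags` and `handler` types, the `chooseRoute` helper, and the 0.7 floor in `main` are all hypothetical; only the four route names come from the list above.

```go
package main

import (
	"errors"
	"fmt"
)

type route string

const (
	grokDirect     route = "grok_direct"
	trivialDirect  route = "trivial_direct"
	webThenGrok    route = "web_then_grok"
	reasonThenGrok route = "reason_then_grok"
)

// errProviderDown simulates a failing layer in the demo below.
var errProviderDown = errors.New("provider down")

// decision is a hypothetical router output: a proposed route plus a confidence.
type decision struct {
	Route      route
	Confidence float64
}

// flags is a hypothetical view of which layers the operator has enabled.
type flags map[route]bool

// chooseRoute keeps a request on grok_direct unless the proposed layer is
// enabled and the router's confidence reaches the floor.
func chooseRoute(d decision, enabled flags, floor float64) route {
	if d.Route == grokDirect || !enabled[d.Route] || d.Confidence < floor {
		return grokDirect
	}
	return d.Route
}

// handler produces a reply for one route.
type handler func(prompt string) (string, error)

// generate dispatches to the chosen route and degrades to grok_direct when
// that route is missing or fails. handlers must contain a grokDirect entry.
// An error is returned only if grok_direct itself fails; the caller would
// then react with an emoji instead of replying.
func generate(prompt string, r route, handlers map[route]handler) (string, route, error) {
	if h, ok := handlers[r]; ok && r != grokDirect {
		if reply, err := h(prompt); err == nil {
			return reply, r, nil
		}
	}
	reply, err := handlers[grokDirect](prompt)
	return reply, grokDirect, err
}

func main() {
	handlers := map[route]handler{
		grokDirect:  func(p string) (string, error) { return "grok: " + p, nil },
		webThenGrok: func(p string) (string, error) { return "", errProviderDown },
	}
	r := chooseRoute(decision{webThenGrok, 0.9}, flags{webThenGrok: true}, 0.7)
	reply, used, err := generate("latest Go release?", r, handlers)
	fmt.Println(reply, used, err)
}
```

With every flag off, `chooseRoute` always returns `grok_direct`, which mirrors the invariant stated above.
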
## Provider seam (no vendor names in business logic)

[llm.go](../../apps/ai-bot/llm.go) (`Message`/`Usage`/`LLMRequest`/`LLMResponse`/`LLMClient`) +
[httpllm.go](../../apps/ai-bot/httpllm.go) (shared OpenAI-compatible transport + retry) + thin
adapters [provider_xai.go](../../apps/ai-bot/provider_xai.go) /
[provider_gemini.go](../../apps/ai-bot/provider_gemini.go) + [pricing.go](../../apps/ai-bot/pricing.go)
(`priceFor` model→price map). `Bot.llm` is an `LLMClient`, never a concrete vendor type.

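A rough sketch of what such a seam looks like in Go is shown here. Only the type names (`Message`, `Usage`, `LLMRequest`, `LLMResponse`, `LLMClient`) and the `Bot.llm` field come from the text above; every field, the `Complete` method name, the `echoClient` adapter and the `demoReply` helper are assumptions made for illustration and may not match `llm.go`.

```go
package main

import (
	"context"
	"fmt"
)

// Field sets below are assumptions; only the type names appear in the doc.
type Message struct {
	Role    string
	Content string
}

type Usage struct {
	PromptTokens     int
	CompletionTokens int
}

type LLMRequest struct {
	Model    string
	Messages []Message
}

type LLMResponse struct {
	Text  string
	Usage Usage
}

// LLMClient is the seam: business logic depends on this interface,
// never on a vendor-specific type. The method name is hypothetical.
type LLMClient interface {
	Complete(ctx context.Context, req LLMRequest) (LLMResponse, error)
}

// echoClient is a hypothetical stand-in adapter, handy for tests.
type echoClient struct{}

func (echoClient) Complete(_ context.Context, req LLMRequest) (LLMResponse, error) {
	if len(req.Messages) == 0 {
		return LLMResponse{}, fmt.Errorf("no messages in request")
	}
	last := req.Messages[len(req.Messages)-1].Content
	return LLMResponse{Text: "echo: " + last}, nil
}

// Bot holds the interface, so adapters can be swapped without touching it.
type Bot struct {
	llm LLMClient
}

// Reply sends one user message through whatever client the Bot was given.
func (b Bot) Reply(ctx context.Context, text string) (string, error) {
	resp, err := b.llm.Complete(ctx, LLMRequest{
		Model:    "any-model", // placeholder model name
		Messages: []Message{{Role: "user", Content: text}},
	})
	if err != nil {
		return "", err
	}
	return resp.Text, nil
}

// demoReply wires the stand-in adapter into a Bot and asks for one reply.
func demoReply(text string) (string, error) {
	return Bot{llm: echoClient{}}.Reply(context.Background(), text)
}

func main() {
	out, err := demoReply("hi")
	fmt.Println(out, err)
}
```

Swapping `echoClient{}` for a real adapter changes nothing in `Bot.Reply`, which is the point of keeping vendor names out of business logic.
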
## Money, invariants & store ([store.go](../../apps/ai-bot/store.go))

- **Ceiling is TOCTOU-safe:** `Reserve` books a route's estimated max-cost into `reserved_usd`
  under a per-day **global** advisory lock; the gate counts committed + reserved spend; `Settle`
  releases the reservation and books the real per-component `CostBreakdown`. A concurrent burst
  overshoots by at most one reservation.
- **Never charge for silence:** a 2xx is billed; if the reply then fails to send, refund the
  request SLOT (not the USD) + react. A failed call releases the reservation + refunds the slot;
  a panic releases via a deferred guard.
- **Caps:** `DAILY_USD_CEILING` (global $), `PER_USER_DAILY_CAP` (requests/user), `PER_USER_DAILY_USD`
  (optional $/user). **At-most-once** dedup is durable (`SeenEvent`/`MarkTxn`); generation is
  per-room single-flight.
- One overall **per-request deadline** bounds the whole cascade (no per-stage 3×60s accretion).
- **Telemetry:** one `request_log` row per engaged request (route, per-component $, latency,
  degrade reasons), written async + isolated (its failure never drops a reply), `TELEMETRY_ENABLED`
  default off, time-based retention.
- **Store:** dedicated Postgres `vojo_ai` (pgx); schema is an ordered `migrations` array in
  store.go. **Operational state only** (dedup, spend ledger, grounding cap, `request_log`,
  warned-encrypted), with **no message content** (that lives in Synapse).

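The reserve/settle accounting can be sketched with an in-memory ledger. This is a simplified model under stated assumptions: a mutex stands in for the per-day advisory lock, amounts are integer micro-dollars, and the exact gate comparison (`committed + reserved >= ceiling`) is a guess chosen to reproduce the "overshoots by at most one reservation" property. The `ledger` type and its fields are hypothetical; the real store is Postgres-backed.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errCeiling = errors.New("daily ceiling reached")

// ledger is an in-memory stand-in for the spend ledger. All amounts are in
// micro-dollars to avoid floating-point drift.
type ledger struct {
	mu        sync.Mutex // plays the role of the per-day global advisory lock
	ceiling   int64
	committed int64
	reserved  int64
}

// Reserve admits a request while committed+reserved spend is still below the
// ceiling, then books the estimate. Checking and booking happen under one
// lock, so concurrent callers cannot all pass the gate on stale totals.
func (l *ledger) Reserve(estimate int64) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.committed+l.reserved >= l.ceiling {
		return errCeiling
	}
	l.reserved += estimate
	return nil
}

// Settle releases the reservation and books the actual cost. Passing
// actual=0 models a failed call that must not be charged.
func (l *ledger) Settle(estimate, actual int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
	l.committed += actual
}

func main() {
	l := &ledger{ceiling: 1000}
	_ = l.Reserve(600)    // admitted: nothing booked yet
	_ = l.Reserve(600)    // admitted: 600 < 1000, total reserved may overshoot once
	err := l.Reserve(600) // rejected: 1200 >= 1000
	l.Settle(600, 450)    // successful call, real cost below the estimate
	l.Settle(600, 0)      // failed call, reservation released without charge
	fmt.Println(err, l.committed, l.reserved)
}
```

In the real service the deferred guard mentioned above would call the release path on panic; here that would be a `defer l.Settle(estimate, 0)` that is cancelled once the actual cost is known.
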
## Current prod config (the cheap web path)

`WEB_PROVIDER=gemini_grounding`: Gemini 2.5 Flash-Lite does the fetch via the **native v1beta
`google_search` tool** (NOT the OpenAI-compat endpoint, where grounding is silently ignored;
F-EXT-3), then Grok-4.3 voices it. ~**$0.0013/query** (vs ~$0.022 for the old two-Grok path);
grounding is free under the daily RPD (requests-per-day) limit, guarded by `WEB_GROUNDING_DAILY_CAP`. `XAI_MODEL=grok-4.3`
+ `GROK_REASONING_EFFORT=none` (4.3 otherwise reasons on every reply). Full flag table in the
[README](../../apps/ai-bot/README.md).

## Building / testing

Go toolchain lives at `/home/ubuntu/.go-toolchain/go/bin` (NOT on PATH). Store-backed tests need
`AI_BOT_TEST_DATABASE_URL` (a throwaway Postgres) and **skip** without it, so `go test ./...` stays
green on a machine without one. Keep `gofmt -l`, `go vet ./...`, `go test -race ./...` clean.

@@ -132,7 +132,7 @@ in doubt.
| `telegram-bridge` | `dock.mau.dev/mautrix/telegram:<v26.04 bridgev2 tag>` | `./bridges/telegram:/data` |
| `discord-bridge` | `dock.mau.dev/mautrix/discord:v0.7.5` | `./bridges/discord:/data` (legacy bridge — runtime reports `0.7.6+dev`) |
| `whatsapp-bridge` | `dock.mau.dev/mautrix/whatsapp:v0.12.4` | `./bridges/whatsapp:/data` |
| `ai-bot` | `ai-bot:custom` (built locally from [`apps/ai-bot/`](../../apps/ai-bot/), shipped via `docker save \| ssh docker load` — VS Code task **Deploy AI bot**) | **Vojo AI** = `@ai:vojo.chat`, an xAI-Grok-backed **application service** (NOT a normal bot user). Answers `@`-mentions in groups + everything in 1:1s; the Grok reply (markdown) is rendered to `org.matrix.custom.html` and sent as `formatted_body` (in-bot `markdown.go`, zero deps; emits only tags Cinny's sanitizer keeps, escapes all model text), falling back to the plain `body` when there's no formatting. Mounts `./ai-bot:/data` (owned **uid 65532**, distroless nonroot) holding `registration.yaml` (self-generated, `generate-registration`), `state/` (runtime dir) and `secrets/xai_api_key`. Its **operational store** (txn/event dedup, daily spend ledger, encrypted-warned set) lives in the dedicated `vojo_ai` Postgres DB via `AI_BOT_DATABASE_URL` — `depends_on: [synapse, postgres]`. Push port `:8009` (registration `url: http://ai-bot:8009`). Secrets via env/`*_FILE`; `as_token`/`hs_token` read from `registration.yaml` (no rotation). See [`apps/ai-bot/README.md`](../../apps/ai-bot/README.md). |
| `ai-bot` | `ai-bot:custom` (built locally from [`apps/ai-bot/`](../../apps/ai-bot/), shipped via `docker save \| ssh docker load` — VS Code task **Deploy AI bot**) | **Vojo AI** = `@ai:vojo.chat`, a Grok-voiced **cascade application service** (NOT a normal bot user; architecture: [ai-bot.md](ai-bot.md)). Answers `@`-mentions in groups + everything in 1:1s; the Grok reply (markdown) is rendered to `org.matrix.custom.html` and sent as `formatted_body` (in-bot `markdown.go`, zero deps; emits only tags Cinny's sanitizer keeps, escapes all model text), falling back to the plain `body` when there's no formatting. Mounts `./ai-bot:/data` (owned **uid 65532**, distroless nonroot) holding `registration.yaml` (self-generated, `generate-registration`), `state/` (runtime dir) and `secrets/xai_api_key`. Its **operational store** (txn/event dedup, daily spend ledger, encrypted-warned set) lives in the dedicated `vojo_ai` Postgres DB via `AI_BOT_DATABASE_URL` — `depends_on: [synapse, postgres]`. Push port `:8009` (registration `url: http://ai-bot:8009`). Secrets via env/`*_FILE`; `as_token`/`hs_token` read from `registration.yaml` (no rotation). See [`apps/ai-bot/README.md`](../../apps/ai-bot/README.md). |

### Bridge service stanza (template)