# Vojo AI bot (`@ai:vojo.chat`)
A Go **Synapse application service** in [`apps/ai-bot/`](../../apps/ai-bot/) — not a normal
bot user. Answers `@`-mentions in groups and every message in 1:1s, over the plaintext
CS-API (Vojo rooms are unencrypted by default). It is a separate server-side service
deployed next to Synapse; it ships nothing to the web client.
- **Operator / full env reference:** [`apps/ai-bot/README.md`](../../apps/ai-bot/README.md) (config tables, setup, deploy).
- **Deploy / server config:** [server-side.md](server-side.md) (the `ai-bot` service row, the `vojo_ai` Postgres role).
- **Detailed design SOT:** `docs/plans/grok_bot.md` + `docs/plans/ai_backend_build_plan.md`. **Local-only: `docs/plans/` is gitignored.**
## Request flow
Synapse pushes a transaction → the bot **acks 200 instantly, then processes async per-room**
([appservice.go](../../apps/ai-bot/appservice.go)), so a slow model call never blocks other
rooms or the homeserver. `handleMessage` ([bot.go](../../apps/ai-bot/bot.go)) gates in order:
durable+in-memory dedup → encrypted-room skip → decode / edit / own-message / notice →
foreign-server leave → DM-or-mention → media react → **per-room single-flight** → spawn
`respond`. `respond` = `Reserve(estimate)` → `generate()` → `Settle(actual)` → `sendReply`;
**any failure produces an emoji react, never silence.**
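
The ack-first, per-room pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in `appservice.go` or `bot.go`: `roomFlights`, `ackThenProcess`, and the room ID are hypothetical names, and the dedup and gating steps are omitted. The only external fact assumed is that Matrix homeservers deliver appservice transactions with `PUT /_matrix/app/v1/transactions/{txnId}`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// roomFlights records which rooms have a generation in flight.
// Hypothetical helper, not the bot's actual type.
type roomFlights struct {
	mu     sync.Mutex
	active map[string]bool
}

func newRoomFlights() *roomFlights {
	return &roomFlights{active: make(map[string]bool)}
}

// tryAcquire reports whether the caller may start generating for roomID.
func (f *roomFlights) tryAcquire(roomID string) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.active[roomID] {
		return false
	}
	f.active[roomID] = true
	return true
}

// release marks roomID as idle again.
func (f *roomFlights) release(roomID string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.active, roomID)
}

// ackThenProcess starts process in a goroutine and writes 200 without
// waiting for it, so a slow process never delays the acknowledgement.
func ackThenProcess(wg *sync.WaitGroup, process func()) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		wg.Add(1)
		go func() {
			defer wg.Done()
			process()
		}()
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte("{}"))
	}
}

func main() {
	const room = "!example:vojo.chat" // made-up room ID
	flights := newRoomFlights()
	var wg sync.WaitGroup
	gate := make(chan struct{}) // stands in for a slow model call

	handler := ackThenProcess(&wg, func() {
		if !flights.tryAcquire(room) {
			return // another reply for this room is already being generated
		}
		defer flights.release(room)
		<-gate
	})

	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPut, "/_matrix/app/v1/transactions/1", nil)
	handler(rec, req)
	fmt.Println("acked with", rec.Code, "while the model call is still pending")

	close(gate)
	wg.Wait()
}
```

The `WaitGroup` exists only so the demo can exit cleanly; a long-running service would more likely tie the goroutines to a shutdown context.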
## Cascade (flag-gated "operator cascade", every layer default OFF)
`generate()` ([cascade.go](../../apps/ai-bot/cascade.go)) routes ([router.go](../../apps/ai-bot/router.go))
then dispatches; **any layer off or failing degrades to `grok_direct`** (never an error to the user):
- **`grok_direct`** — DEFAULT, one Grok call. **Grok is the final voice on everything substantive.**
- **`trivial_direct`** — greetings/acks → cheap Gemini (`TRIVIAL_OFFLOAD_ENABLED`).
- **`web_then_grok`** — fresh facts: a WebProvider fetches a grounded digest + citations, then **Grok synthesises the answer in voice** ([web.go](../../apps/ai-bot/web.go)).
- **`reason_then_grok`** — manual trigger ("подумай глубже") → Grok at a higher `reasoning_effort`.
- Router = free Layer-0 regex + optional Layer-1 Gemini classifier; a confidence floor keeps uncertain cases on the safe floor (`grok_direct`).
**Invariant:** all cascade flags OFF == today's bot — a single `grok_direct` call, byte-identical wire body. Do not enable layers in prod until the offline-eval gate (build plan §9) passes.
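
As an illustration of the degrade-to-floor rule and the router's confidence floor, here is a small sketch. The `layer`, `cascade`, and `pickRoute` names, their signatures, and the thresholds in `main` are assumptions made for this example; the real logic lives in `cascade.go` and `router.go` and may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
)

type route string

const (
	grokDirect  route = "grok_direct"
	webThenGrok route = "web_then_grok"
)

// errLayer simulates a failing optional layer.
var errLayer = errors.New("layer failed")

// layer is one optional route: its flag plus the function that runs it.
type layer struct {
	enabled bool
	run     func(prompt string) (string, error)
}

// cascade holds the optional layers and the always-available floor.
type cascade struct {
	layers map[route]layer
	floor  func(prompt string) (string, error)
}

// pickRoute keeps the classifier's candidate only when its confidence
// reaches minConfidence; anything less certain stays on grok_direct.
func pickRoute(candidate route, confidence, minConfidence float64) route {
	if confidence < minConfidence {
		return grokDirect
	}
	return candidate
}

// generate runs the routed layer and falls back to the floor when that
// layer is disabled, unknown, or fails. It also returns the route used.
func (c cascade) generate(r route, prompt string) (string, route, error) {
	if l, ok := c.layers[r]; ok && l.enabled {
		if out, err := l.run(prompt); err == nil {
			return out, r, nil
		}
	}
	out, err := c.floor(prompt)
	return out, grokDirect, err
}

func main() {
	c := cascade{
		layers: map[route]layer{
			webThenGrok: {enabled: true, run: func(string) (string, error) { return "", errLayer }},
		},
		floor: func(p string) (string, error) { return "direct answer to: " + p, nil },
	}
	out, used, err := c.generate(pickRoute(webThenGrok, 0.9, 0.7), "what changed today?")
	fmt.Println(out, used, err)
}
```

With every layer disabled, `generate` always takes the floor path, which is the shape of the all-flags-OFF invariant.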
## Provider seam (no vendor names in business logic)
[llm.go](../../apps/ai-bot/llm.go) (`Message`/`Usage`/`LLMRequest`/`LLMResponse`/`LLMClient`) +
[httpllm.go](../../apps/ai-bot/httpllm.go) (shared OpenAI-compatible transport + retry) + thin
adapters [provider_xai.go](../../apps/ai-bot/provider_xai.go) /
[provider_gemini.go](../../apps/ai-bot/provider_gemini.go) + [pricing.go](../../apps/ai-bot/pricing.go)
(`priceFor` model→price map). `Bot.llm` is an `LLMClient`, never a concrete vendor type.
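
A rough sketch of what such a seam can look like is shown below. Only the type names come from the list above. Every field, the `Complete` method name, the `echoClient` stand-in adapter, and the `Bot.Reply` helper are assumptions for illustration and are not taken from `llm.go`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Field names below are illustrative guesses, not the real definitions.
type Message struct {
	Role    string
	Content string
}

type Usage struct {
	PromptTokens     int
	CompletionTokens int
}

type LLMRequest struct {
	Model    string
	Messages []Message
}

type LLMResponse struct {
	Text  string
	Usage Usage
}

// LLMClient is the only thing business logic sees; the method name
// Complete is an assumption.
type LLMClient interface {
	Complete(ctx context.Context, req LLMRequest) (LLMResponse, error)
}

// echoClient is a stand-in adapter so the example runs without a vendor.
type echoClient struct{}

func (echoClient) Complete(ctx context.Context, req LLMRequest) (LLMResponse, error) {
	if err := ctx.Err(); err != nil {
		return LLMResponse{}, err
	}
	if len(req.Messages) == 0 {
		return LLMResponse{}, errors.New("no messages")
	}
	last := req.Messages[len(req.Messages)-1].Content
	return LLMResponse{Text: "echo: " + last, Usage: Usage{PromptTokens: 1, CompletionTokens: 1}}, nil
}

// Bot depends on the interface, never on a concrete vendor type.
type Bot struct {
	llm     LLMClient
	model   string
	timeout time.Duration
}

// Reply sends one user message under a single overall deadline.
func (b Bot) Reply(text string) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), b.timeout)
	defer cancel()
	resp, err := b.llm.Complete(ctx, LLMRequest{
		Model:    b.model,
		Messages: []Message{{Role: "user", Content: text}},
	})
	if err != nil {
		return "", err
	}
	return resp.Text, nil
}

func main() {
	b := Bot{llm: echoClient{}, model: "example-model", timeout: 5 * time.Second}
	out, err := b.Reply("hello")
	fmt.Println(out, err)
}
```

Swapping vendors then means supplying a different `LLMClient` value; `Bot` itself does not change.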
## Money, invariants & store ([store.go](../../apps/ai-bot/store.go))
- **Ceiling is TOCTOU-safe:** `Reserve` books a route's estimated max-cost into `reserved_usd`
under a per-day **global** advisory lock; the gate counts committed + reserved spend; `Settle`
releases the reservation and books the real per-component `CostBreakdown`. A concurrent burst
overshoots by at most one reservation.
- **Never charge for silence:** a 2xx is billed; if the reply then fails to send, refund the
request SLOT (not the USD) + react. A failed call releases the reservation + refunds the slot;
a panic releases via a deferred guard.
- Caps: `DAILY_USD_CEILING` (global $), `PER_USER_DAILY_CAP` (requests/user), `PER_USER_DAILY_USD`
(optional $/user). Dedup is durable and **at-most-once** (`SeenEvent`/`MarkTxn`); generation is
per-room single-flight.
- One overall **per-request deadline** bounds the whole cascade, so per-stage timeouts cannot stack (no 3×60s accretion).
- **Telemetry:** one `request_log` row per engaged request (route, per-component $, latency,
degrade reasons), written async + isolated (its failure never drops a reply), `TELEMETRY_ENABLED`
default off, time-based retention.
- **Store:** dedicated Postgres `vojo_ai` (pgx); schema is an ordered `migrations` array in
store.go. **Operational state only** (dedup, spend ledger, grounding cap, `request_log`,
warned-encrypted) — **no message content** (that lives in Synapse).
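
The reserve-then-settle accounting can be illustrated with an in-memory sketch. It is not the code in `store.go`: a mutex stands in for the per-day Postgres advisory lock, amounts are hypothetical integer micro-dollars, and the `ledger`, `Release`, and `spend` names are invented here. The admission rule (admit while committed + reserved is below the ceiling) is an assumption chosen to reproduce the "overshoots by at most one reservation" property.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// microUSD avoids float rounding in the ledger (an assumption of this sketch).
type microUSD int64

var errCeiling = errors.New("daily USD ceiling reached")

// ledger tracks committed and reserved spend for one day.
type ledger struct {
	mu        sync.Mutex
	ceiling   microUSD
	committed microUSD
	reserved  microUSD
}

// Reserve admits a request only while committed+reserved is below the
// ceiling, then books the estimate. Checking and booking under one lock
// is what closes the check-then-act race.
func (l *ledger) Reserve(estimate microUSD) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.committed+l.reserved >= l.ceiling {
		return errCeiling
	}
	l.reserved += estimate
	return nil
}

// Settle swaps the reservation for the real cost.
func (l *ledger) Settle(estimate, actual microUSD) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
	l.committed += actual
}

// Release drops a reservation without charging anything.
func (l *ledger) Release(estimate microUSD) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
}

// spend wraps a model call; the deferred guard releases the reservation
// if the call fails or panics.
func (l *ledger) spend(estimate microUSD, call func() (microUSD, error)) error {
	if err := l.Reserve(estimate); err != nil {
		return err
	}
	settled := false
	defer func() {
		if !settled {
			l.Release(estimate)
		}
	}()
	actual, err := call()
	if err != nil {
		return err
	}
	l.Settle(estimate, actual)
	settled = true
	return nil
}

func main() {
	l := &ledger{ceiling: 1_000_000} // $1.00 per day, made-up value
	err := l.spend(22_000, func() (microUSD, error) { return 1_300, nil })
	fmt.Println(err, l.committed, l.reserved)
}
```

Because the gate counts reservations as well as committed spend, a burst of concurrent requests cannot each pass the check against the same stale total.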
## Current prod config (the cheap web path)
`WEB_PROVIDER=gemini_grounding`: Gemini 2.5 Flash-Lite does the fetch via the **native v1beta
`google_search` tool** (NOT the OpenAI-compat endpoint — grounding is silently ignored there,
F-EXT-3), then Grok-4.3 voices it. ~**$0.0013/query** (vs ~$0.022 for the old two-Grok path);
grounding is free under the daily RPD, guarded by `WEB_GROUNDING_DAILY_CAP`. `XAI_MODEL=grok-4.3`
+ `GROK_REASONING_EFFORT=none` (4.3 otherwise reasons on every reply). Full flag table in the
[README](../../apps/ai-bot/README.md).
## Building / testing
Go toolchain lives at `/home/ubuntu/.go-toolchain/go/bin` (NOT on PATH). Store-backed tests need
`AI_BOT_TEST_DATABASE_URL` (a throwaway Postgres) and **skip** without it, so `go test ./...` stays
green on a machine without one. Keep `gofmt -l`, `go vet ./...`, `go test -race ./...` clean.
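
The skip-without-a-database behaviour could be implemented with a helper along these lines. This is a sketch under assumptions: `requireTestDB`, `skipper`, and `recordingT` are hypothetical names, and the repository's actual test helper may look different. In a real test you would pass `t` (a `*testing.T`) and `os.Getenv`; the small `skipper` interface exists only so the example runs outside `go test`.

```go
package main

import (
	"fmt"
	"os"
)

const envTestDB = "AI_BOT_TEST_DATABASE_URL"

// skipper is the subset of *testing.T that the helper needs.
type skipper interface {
	Helper()
	Skip(args ...any)
}

// requireTestDB returns the throwaway database URL, or skips the test
// when the variable is unset. With a real *testing.T, Skip stops the
// test immediately, so the empty URL is never used.
func requireTestDB(t skipper, getenv func(string) string) string {
	t.Helper()
	url := getenv(envTestDB)
	if url == "" {
		t.Skip(envTestDB + " not set; skipping store-backed test")
	}
	return url
}

// recordingT is a fake used only by this demo.
type recordingT struct{ skipped bool }

func (r *recordingT) Helper()          {}
func (r *recordingT) Skip(args ...any) { r.skipped = true }

func main() {
	rec := &recordingT{}
	url := requireTestDB(rec, os.Getenv)
	fmt.Printf("url=%q skipped=%v\n", url, rec.skipped)
}
```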