# Vojo AI bot (`@ai:vojo.chat`)

A Go **Synapse application service** in [`apps/ai-bot/`](../../apps/ai-bot/), not a normal
bot user. It answers `@`-mentions in groups and every message in 1:1s, over the plaintext
CS-API (Vojo rooms are unencrypted by default). It is a separate server-side service
deployed next to Synapse; it ships nothing to the web client.

- **Operator / full env reference:** [`apps/ai-bot/README.md`](../../apps/ai-bot/README.md) (config tables, setup, deploy).
- **Deploy / server config:** [server-side.md](server-side.md) (the `ai-bot` service row, the `vojo_ai` Postgres role).
- **Detailed design SOT:** `docs/plans/grok_bot.md` + `docs/plans/ai_backend_build_plan.md` (**local-only, `docs/plans/` is gitignored**).

## Request flow

Synapse pushes a transaction → the bot **acks 200 instantly, then processes async per-room**
([appservice.go](../../apps/ai-bot/appservice.go)), so a slow model call never blocks other
rooms or the homeserver. `handleMessage` ([bot.go](../../apps/ai-bot/bot.go)) gates in order:
durable+in-memory dedup → encrypted-room skip → decode / edit / own-message / notice →
foreign-server leave → DM-or-mention → media react → resolve conversation (thread) →
**per-(room,thread) single-flight** → spawn `respond`. `respond` = `Reserve(estimate)` →
`generate()` → `Settle(actual)` → `sendReply`; **any failure produces an emoji react, never silence.**

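
The per-(room,thread) single-flight gate in the flow above could be sketched roughly as follows. This is a minimal illustration, not the code in `bot.go`: `convKey`, `flightGate`, `TryAcquire` and `Release` are hypothetical names, and what the bot does with a message that loses the race (drop, queue, react) is not specified here, so the sketch only reports whether the slot was won.

```go
package main

import (
	"fmt"
	"sync"
)

// convKey identifies one conversation: a room plus an optional thread root.
// (Hypothetical type; the real key shape may differ.)
type convKey struct {
	RoomID     string
	ThreadRoot string // empty for un-threaded messages
}

// flightGate admits at most one in-flight generation per conversation.
type flightGate struct {
	mu     sync.Mutex
	active map[convKey]struct{}
}

func newFlightGate() *flightGate {
	return &flightGate{active: make(map[convKey]struct{})}
}

// TryAcquire reports whether the caller won the slot for k.
func (g *flightGate) TryAcquire(k convKey) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if _, busy := g.active[k]; busy {
		return false
	}
	g.active[k] = struct{}{}
	return true
}

// Release frees the slot so the next message in k can be handled.
func (g *flightGate) Release(k convKey) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.active, k)
}

func main() {
	g := newFlightGate()
	a := convKey{RoomID: "!room:vojo.chat", ThreadRoot: "$thread1"}
	b := convKey{RoomID: "!room:vojo.chat", ThreadRoot: "$thread2"}

	fmt.Println(g.TryAcquire(a)) // first message in thread1 wins the slot
	fmt.Println(g.TryAcquire(a)) // a second one is rejected while it runs
	fmt.Println(g.TryAcquire(b)) // another thread in the same room is independent
	g.Release(a)
	fmt.Println(g.TryAcquire(a)) // free again after release
}
```

Keying on the pair rather than the room alone is what lets two conversations in the same DM generate concurrently.
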
## Conversations (threads) — ChatGPT-style multi-chat

In a 1:1 DM a top-level message **roots a new thread** (a fresh conversation) and the bot answers
inside it ([bot.go](../../apps/ai-bot/bot.go) `resolveThreadRoot`); a message already in a thread
continues it (F27). **Groups are never auto-threaded**: the gate is structural (`isDM`), not a
flag, so the threading feature can never change group behavior. Auto-threading in DMs is **always
on** (the old `THREAD_CONVERSATIONS` env flag was removed because it only created a host/backend
mismatch footgun). Context and single-flight are keyed per-`(room, thread)` so conversations
neither share history nor block each other; typing is room-level (Matrix has no per-thread typing)
and is tracked via a refcount; per-thread context buffers are LRU-bounded (`maxConvBuffersPerRoom`).

**Host pairing:** the Cinny host shows the conversation surface for a bot only when its
`config.json` preset has `experience.type: "ai-chat"` (today only `@ai`). That surface is a
**fully isolated, native in-client chat** (`features/bots/BotConversations` + `AiChatHeader` +
`AiChatMenu`, reusing the generic `ThreadDrawer`/`RoomInput`). It shares **no** runtime with the
bridge widget pipeline (no `BotShell`/iframe, no `show-chat` toggle; there is no longer a `vojo-ai`
widget). Bridges keep `experience.type: "matrix-widget"` (iframe + show-chat fallback). Because
the backend now always threads DMs, any DM message that `@ai` answers lands in a thread the host can open.

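
The DM threading rule described in this section can be sketched as a small pure function. This is an illustration under stated assumptions, not the body of `resolveThreadRoot` in `bot.go`: the `event` struct and its fields are hypothetical, and the sketch assumes a message already inside a thread continues that thread whether or not the room is a DM.

```go
package main

import "fmt"

// event is a hypothetical, trimmed-down view of an incoming message.
type event struct {
	ID         string // Matrix event ID of this message
	ThreadRoot string // root event ID if the message is already in a thread, else ""
}

// resolveThreadRoot returns the thread the reply belongs to.
// An empty result means "reply un-threaded".
func resolveThreadRoot(ev event, isDM bool) string {
	if ev.ThreadRoot != "" {
		return ev.ThreadRoot // already in a thread: continue it
	}
	if isDM {
		return ev.ID // a top-level DM message roots a new conversation
	}
	return "" // groups are never auto-threaded
}

func main() {
	fmt.Println(resolveThreadRoot(event{ID: "$a"}, true))                    // $a
	fmt.Println(resolveThreadRoot(event{ID: "$b", ThreadRoot: "$a"}, true)) // $a
	fmt.Println(resolveThreadRoot(event{ID: "$c"}, false) == "")            // true
}
```

Because the group branch depends only on `isDM`, there is no flag whose value could make a group room start threading.
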
## Cascade (flag-gated "operator cascade", every layer default OFF)

`generate()` ([cascade.go](../../apps/ai-bot/cascade.go)) routes ([router.go](../../apps/ai-bot/router.go))
then dispatches; **any layer that is off or failing degrades to `grok_direct`** (never an error to the user):

- **`grok_direct`**: DEFAULT, one Grok call. **Grok is the final voice on everything substantive.**
- **`trivial_direct`**: greetings/acks → cheap Gemini (`TRIVIAL_OFFLOAD_ENABLED`).
- **`web_then_grok`**: fresh facts. A WebProvider fetches a grounded digest + citations, then **Grok synthesises the answer in voice** ([web.go](../../apps/ai-bot/web.go)).
- **`reason_then_grok`**: manual trigger ("подумай глубже", i.e. "think deeper") → Grok at a higher `reasoning_effort`.
- **`project_then_grok`**: questions about the **Vojo product itself** (`PROJECT_KB_ENABLED`). A curated KB (operator data from `PROJECT_KB_PATH`, defaulting to the bundled `prompts/vojo_kb.txt`) is injected as a system note and **Grok answers product claims strictly from it** (anti-hallucination: Grok has no parametric Vojo knowledge, and the web doesn't either). Gated by the classifier's `about_project` signal (the context-aware judge, which resolves follow-ups like "Про этот", i.e. "about this one", → the app); a false positive is bounded by the entity-scoped note. Beats every web arm. One Grok call, so it costs roughly the same as `grok_direct`. See [docs/plans/ai_project_knowledge.md](../plans/ai_project_knowledge.md).
- Router = free Layer-0 regex + optional Layer-1 Gemini classifier; a confidence floor keeps uncertain cases on the safe floor (`grok_direct`).

**Invariant:** all cascade flags OFF == today's bot: a single `grok_direct` call, byte-identical wire body. Do not enable layers in prod until the offline-eval gate (build plan §9) passes.

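
The two safety properties of the cascade (a confidence floor in the router, and degrade-to-`grok_direct` on any disabled or failing layer) can be sketched as below. This is a hedged illustration only: `decision`, `layer`, `pickRoute`, the `enabled` map and the `0.7` floor are hypothetical, and the real `generate()` in `cascade.go` has a different signature.

```go
package main

import (
	"errors"
	"fmt"
)

type route string

const (
	grokDirect  route = "grok_direct"
	webThenGrok route = "web_then_grok"
)

// decision is what a hypothetical classifier hands to the router.
type decision struct {
	Route      route
	Confidence float64
}

// layer is one cascade arm: it takes the prompt and returns a reply.
type layer func(prompt string) (string, error)

// pickRoute keeps uncertain or disabled routes on the safe floor.
func pickRoute(d decision, enabled map[route]bool, floor float64) route {
	if d.Confidence < floor || !enabled[d.Route] {
		return grokDirect
	}
	return d.Route
}

// generate runs the chosen layer and degrades to grok_direct if that layer
// is missing or fails. It returns the reply and the route actually used.
func generate(r route, prompt string, layers map[route]layer) (string, route, error) {
	if r != grokDirect {
		if run, ok := layers[r]; ok {
			if out, err := run(prompt); err == nil {
				return out, r, nil
			}
		}
	}
	run, ok := layers[grokDirect]
	if !ok {
		return "", grokDirect, errors.New("no grok_direct layer registered")
	}
	out, err := run(prompt)
	return out, grokDirect, err
}

// demoLayers wires a working grok_direct and a web arm that always fails.
func demoLayers() map[route]layer {
	return map[route]layer{
		grokDirect:  func(p string) (string, error) { return "grok: " + p, nil },
		webThenGrok: func(p string) (string, error) { return "", errors.New("web provider down") },
	}
}

func main() {
	enabled := map[route]bool{webThenGrok: true}
	r := pickRoute(decision{Route: webThenGrok, Confidence: 0.9}, enabled, 0.7)
	out, used, err := generate(r, "hi", demoLayers())
	fmt.Println(r, used, out, err) // routed to web, served by grok_direct
}
```

With every entry in `enabled` false, `pickRoute` can only return `grok_direct`, which mirrors the "all flags OFF == today's bot" invariant.
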
## Provider seam (no vendor names in business logic)

[llm.go](../../apps/ai-bot/llm.go) (`Message`/`Usage`/`LLMRequest`/`LLMResponse`/`LLMClient`) +
[httpllm.go](../../apps/ai-bot/httpllm.go) (shared OpenAI-compatible transport + retry) + thin
adapters [provider_xai.go](../../apps/ai-bot/provider_xai.go) /
[provider_gemini.go](../../apps/ai-bot/provider_gemini.go) + [pricing.go](../../apps/ai-bot/pricing.go)
(`priceFor` model→price map). `Bot.llm` is an `LLMClient`, never a concrete vendor type.

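
The shape of such a seam might look like the sketch below. Only the type names (`Message`, `Usage`, `LLMRequest`, `LLMResponse`, `LLMClient`, `Bot.llm`) come from this document; every field, the `Complete` method name, `echoClient`, `demoReply` and the model string are assumptions made for illustration, not the contents of `llm.go`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Field layouts below are assumed, not copied from llm.go.
type Message struct {
	Role    string
	Content string
}

type Usage struct {
	PromptTokens     int
	CompletionTokens int
}

type LLMRequest struct {
	Model    string
	Messages []Message
}

type LLMResponse struct {
	Text  string
	Usage Usage
}

// LLMClient is the only thing business logic sees; the method name is hypothetical.
type LLMClient interface {
	Complete(ctx context.Context, req LLMRequest) (LLMResponse, error)
}

// echoClient is a stand-in adapter used here instead of a real vendor client.
type echoClient struct{}

func (echoClient) Complete(_ context.Context, req LLMRequest) (LLMResponse, error) {
	if len(req.Messages) == 0 {
		return LLMResponse{}, errors.New("empty request")
	}
	last := req.Messages[len(req.Messages)-1]
	return LLMResponse{Text: "echo: " + last.Content}, nil
}

// Bot depends on the interface, never on a concrete vendor type.
type Bot struct{ llm LLMClient }

func (b *Bot) reply(ctx context.Context, text string) (string, error) {
	resp, err := b.llm.Complete(ctx, LLMRequest{
		Model:    "example-model", // placeholder, not a real model id
		Messages: []Message{{Role: "user", Content: text}},
	})
	if err != nil {
		return "", err
	}
	return resp.Text, nil
}

// demoReply wires the stand-in adapter into a Bot and asks one question.
func demoReply(text string) (string, error) {
	b := &Bot{llm: echoClient{}}
	return b.reply(context.Background(), text)
}

func main() {
	out, err := demoReply("ping")
	fmt.Println(out, err)
}
```

Swapping vendors then means constructing `Bot` with a different adapter; nothing in `reply` changes.
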
## Money, invariants & store ([store.go](../../apps/ai-bot/store.go))

- **Ceiling is TOCTOU-safe:** `Reserve` books a route's estimated max-cost into `reserved_usd`
  under a per-day **global** advisory lock; the gate counts committed + reserved spend; `Settle`
  releases the reservation and books the real per-component `CostBreakdown`. A concurrent burst
  overshoots by at most one reservation.
- **Never charge for silence:** a 2xx is billed; if the reply then fails to send, refund the
  request SLOT (not the USD) + react. A failed call releases the reservation + refunds the slot;
  a panic releases via a deferred guard.
- Caps: `DAILY_USD_CEILING` (global $), `PER_USER_DAILY_CAP` (requests/user), `PER_USER_DAILY_USD`
  (optional $/user). **At-most-once** dedup is durable (`SeenEvent`/`MarkTxn`); generation is
  per-(room,thread) single-flight.
- One overall **per-request deadline** bounds the whole cascade (no per-stage 3×60s accretion).
- **Telemetry:** one `request_log` row per engaged request (route, per-component $, latency,
  degrade reasons), written async + isolated (its failure never drops a reply), `TELEMETRY_ENABLED`
  default off, time-based retention.
- **Store:** dedicated Postgres `vojo_ai` (pgx); schema is an ordered `migrations` array in
  store.go. **Operational state only** (dedup, spend ledger, grounding cap, `request_log`,
  warned-encrypted), with **no message content** (that lives in Synapse).

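
The reserve/settle pattern behind the ceiling can be sketched in memory as follows. This is an assumption-laden illustration, not `store.go`: a `sync.Mutex` stands in for the per-day Postgres advisory lock, amounts are integer micro-USD to avoid float drift, and the gate is assumed to admit a request whenever committed + reserved spend is still below the ceiling, which is what allows the "at most one reservation" overshoot.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errCeiling = errors.New("daily ceiling reached")

// ledger is an in-memory stand-in for the spend ledger.
// All amounts are micro-USD.
type ledger struct {
	mu        sync.Mutex // plays the role of the per-day advisory lock
	ceiling   int64
	committed int64
	reserved  int64
}

// Reserve books the estimate if the gate is still open.
// Check and booking happen under one lock, so two callers cannot both
// pass on the same stale total.
func (l *ledger) Reserve(estimate int64) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.committed+l.reserved >= l.ceiling {
		return errCeiling
	}
	l.reserved += estimate
	return nil
}

// Settle releases the reservation and books what the call really cost.
// A failed call settles with actual == 0.
func (l *ledger) Settle(estimate, actual int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
	l.committed += actual
}

func main() {
	l := &ledger{ceiling: 100}
	fmt.Println(l.Reserve(60)) // <nil>: 0 booked so far
	fmt.Println(l.Reserve(60)) // <nil>: 60 < 100, so this one may overshoot
	fmt.Println(l.Reserve(60)) // daily ceiling reached: 120 >= 100
	l.Settle(60, 10)           // first call was cheaper than estimated
	l.Settle(60, 0)            // second call failed: nothing committed
	fmt.Println(l.Reserve(60)) // <nil>: only 10 committed now
}
```

Counting reservations in the gate is the point: without `reserved`, a burst of concurrent requests would all see the same committed total and pass together.
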
## Current prod config (the cheap web path)

`WEB_PROVIDER=gemini_grounding`: Gemini 2.5 Flash-Lite does the fetch via the **native v1beta
`google_search` tool** (NOT the OpenAI-compat endpoint, where grounding is silently ignored,
F-EXT-3), then Grok-4.3 voices it. ~**$0.0013/query** (vs ~$0.022 for the old two-Grok path);
grounding is free under the daily RPD (requests-per-day) limit, guarded by `WEB_GROUNDING_DAILY_CAP`.
`XAI_MODEL=grok-4.3` + `GROK_REASONING_EFFORT=none` (4.3 otherwise reasons on every reply).
Full flag table in the [README](../../apps/ai-bot/README.md).

## Observability (logs + per-request trace)

`log/slog` to stderr (`LOG_LEVEL`, `LOG_FORMAT=text|json`). A context-aware handler
([logging.go](../../apps/ai-bot/logging.go)) stamps a per-request **`trace_id`** onto **every**
log line. The id is minted once per handled event in `handleEvent`
([trace.go](../../apps/ai-bot/trace.go)) and carried in `ctx` down to the model HTTP call, so one
`trace_id` greps the whole request trail (the userver idiom; the id is OTel-trace-id
shaped for a future exporter). Routing diagnostics (`route decided` / `generation outcome`)
are DEBUG, content-free. Full model **request/response bodies** are gated by a
**per-user allowlist** `LOG_BODIES_USERS` (empty = nobody) **and** `LOG_LEVEL=debug`,
truncated to a fixed ~4 KB cap, with URL/headers (the API key) never logged. The decision is made
once at admission via a `verbose` flag in `ctx`, read by the dumb transport. This is the
**debug** path; `request_log` (`TELEMETRY_*`) is the separate **analytics** path. The two
correlate via `trace_id`/`event_id` but are independent. Ship the JSON log output to
OpenSearch/Loki with a collector (Fluent Bit/Vector); the bot never talks to a log
backend. Full flag table in the [README](../../apps/ai-bot/README.md#observability--logs--per-request-trace).

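
A context-aware `slog` handler of the kind described in this section can be built by wrapping any `slog.Handler`. The sketch below is one way to do it with the standard library, not the code in `logging.go`: `traceKey`, `withTraceID`, `traceHandler` and `logOne` are hypothetical names, and the trace id in `main` is just an example value.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log/slog"
)

// traceKey is the private context key under which the trace id travels.
type traceKey struct{}

func withTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceKey{}, id)
}

// traceHandler wraps another handler and adds trace_id from ctx to each record.
type traceHandler struct{ slog.Handler }

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceKey{}).(string); ok && id != "" {
		r = r.Clone() // avoid mutating state shared with other copies of the record
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the inner handler so derived loggers keep stamping.
func (h traceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return traceHandler{Handler: h.Handler.WithAttrs(attrs)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{Handler: h.Handler.WithGroup(name)}
}

// logOne emits a single JSON line with traceID in ctx and decodes it back,
// so the stamped field can be inspected.
func logOne(traceID, msg string) (map[string]any, error) {
	var buf bytes.Buffer
	logger := slog.New(traceHandler{Handler: slog.NewJSONHandler(&buf, nil)})
	logger.InfoContext(withTraceID(context.Background(), traceID), msg)

	var line map[string]any
	err := json.Unmarshal(buf.Bytes(), &line)
	return line, err
}

func main() {
	line, err := logOne("4bf92f3577b34da6a3ce929d0e0e4736", "route decided")
	fmt.Println(line["trace_id"], line["msg"], err)
}
```

Callers have to log through the `...Context` methods (`InfoContext`, `DebugContext`, and so on) for the handler to see the request's `ctx`.
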
## Building / testing

Go toolchain lives at `/home/ubuntu/.go-toolchain/go/bin` (NOT on PATH). Store-backed tests need
`AI_BOT_TEST_DATABASE_URL` (a throwaway Postgres) and **skip** without it, so `go test ./...` stays
green on a machine without one. Keep `gofmt -l`, `go vet ./...`, `go test -race ./...` clean.

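
The skip-without-a-database behaviour could be implemented with a small helper like the one below. This is a sketch under assumptions, not the project's test code: `storeTestDSN`, `skipper` and `fakeT` are hypothetical, and the environment lookup is injected so the example runs outside `go test`. In a real test one would pass `t` (a `*testing.T` satisfies `skipper`) and `os.Getenv`.

```go
package main

import "fmt"

const testDBEnv = "AI_BOT_TEST_DATABASE_URL"

// skipper is the subset of *testing.T the helper needs.
type skipper interface {
	Helper()
	Skipf(format string, args ...any)
}

// storeTestDSN returns the throwaway database URL, or skips the test if unset.
// With a real *testing.T, Skipf stops the test and the return is never reached.
func storeTestDSN(t skipper, getenv func(string) string) string {
	t.Helper()
	dsn := getenv(testDBEnv)
	if dsn == "" {
		t.Skipf("%s not set; skipping store-backed test", testDBEnv)
	}
	return dsn
}

// fakeT records the skip message so the helper can be demonstrated here.
type fakeT struct{ skipped string }

func (f *fakeT) Helper() {}

func (f *fakeT) Skipf(format string, args ...any) {
	f.skipped = fmt.Sprintf(format, args...)
}

func main() {
	unset := func(string) string { return "" }
	f := &fakeT{}
	storeTestDSN(f, unset)
	fmt.Println(f.skipped)
}
```
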