# Vojo AI bot (`@ai:vojo.chat`)
A Go **Synapse application service** in [`apps/ai-bot/`](../../apps/ai-bot/) — not a normal
bot user. Answers `@`-mentions in groups and every message in 1:1s, over the plaintext
CS-API (Vojo rooms are unencrypted by default). It is a separate server-side service
deployed next to Synapse; it ships nothing to the web client.
- **Operator / full env reference:** [`apps/ai-bot/README.md`](../../apps/ai-bot/README.md) (config tables, setup, deploy).
- **Deploy / server config:** [server-side.md](server-side.md) (the `ai-bot` service row, the `vojo_ai` Postgres role).
- **Detailed design (source of truth):** `docs/plans/grok_bot.md` + `docs/plans/ai_backend_build_plan.md` (**local-only: `docs/plans/` is gitignored**).
## Request flow
Synapse pushes a transaction → the bot **acks 200 instantly, then processes async per-room**
([appservice.go](../../apps/ai-bot/appservice.go)), so a slow model call never blocks other
rooms or the homeserver. `handleMessage` ([bot.go](../../apps/ai-bot/bot.go)) gates in order:
durable+in-memory dedup → encrypted-room skip → decode / edit / own-message / notice →
foreign-server leave → DM-or-mention → media react → resolve conversation (thread) →
**per-(room,thread) single-flight** → spawn `respond`. `respond` = `Reserve(estimate)` →
`generate()` → `Settle(actual)` → `sendReply`; **any failure produces an emoji react, never silence.**
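
A minimal sketch of the `respond` failure contract, assuming hypothetical stand-ins (`fakeLedger`, the `generate`/`send`/`react` funcs, the `failReact` emoji) for the real store, cascade and Matrix calls; the estimate and deadline values are illustrative. It shows one `context.WithTimeout` for the whole request, the Reserve → generate → Settle → send order, and that every failure path, a panic included, ends in a reaction with the reservation given back. This sketch recovers the panic so it can react; whether the real deferred guard recovers or only releases is not stated here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// failReact is a placeholder; which emoji the real bot reacts with is not specified here.
const failReact = "⚠️"

// fakeLedger is an in-memory stand-in for the store's reservation bookkeeping.
type fakeLedger struct {
	reserved, spent float64
	slotsRefunded   int
}

func (l *fakeLedger) Reserve(estimate float64) error {
	l.reserved += estimate // the real store can refuse here when the ceiling is hit
	return nil
}

func (l *fakeLedger) Release(estimate float64) { l.reserved -= estimate }

// Settle drops the reservation and books what the call really cost.
func (l *fakeLedger) Settle(estimate, actual float64) {
	l.reserved -= estimate
	l.spent += actual
}

func (l *fakeLedger) RefundSlot() { l.slotsRefunded++ }

func respond(parent context.Context, l *fakeLedger, estimate float64, deadline time.Duration,
	generate func(context.Context) (string, float64, error),
	send func(context.Context, string) error, react func(string)) (outcome string) {
	// One deadline bounds the whole cascade, not one per stage.
	ctx, cancel := context.WithTimeout(parent, deadline)
	defer cancel()
	if err := l.Reserve(estimate); err != nil {
		react(failReact)
		return "over ceiling"
	}
	settled := false
	// Deferred guard: a panic still releases the reservation and reacts.
	defer func() {
		if r := recover(); r != nil {
			if !settled {
				l.Release(estimate)
			}
			l.RefundSlot()
			react(failReact)
			outcome = "panic"
		}
	}()
	reply, cost, err := generate(ctx)
	if err != nil {
		l.Release(estimate) // failed call: nothing billed
		l.RefundSlot()
		react(failReact)
		return "generate failed"
	}
	l.Settle(estimate, cost) // a 2xx is billed
	settled = true
	if err := send(ctx, reply); err != nil {
		l.RefundSlot() // refund the request slot, not the USD
		react(failReact)
		return "send failed"
	}
	return "sent"
}

// runScenario drives respond with canned outcomes: "ok", "model-error", "send-error", "panic".
func runScenario(kind string) (string, *fakeLedger, int) {
	l, reactions := &fakeLedger{}, 0
	generate := func(context.Context) (string, float64, error) {
		switch kind {
		case "model-error":
			return "", 0, errors.New("upstream 503")
		case "panic":
			panic("bug in generate")
		}
		return "hello", 0.002, nil
	}
	send := func(context.Context, string) error {
		if kind == "send-error" {
			return errors.New("matrix send failed")
		}
		return nil
	}
	out := respond(context.Background(), l, 0.01, 30*time.Second, generate, send,
		func(string) { reactions++ })
	return out, l, reactions
}

func main() {
	for _, kind := range []string{"ok", "model-error", "send-error", "panic"} {
		out, l, n := runScenario(kind)
		fmt.Printf("%-11s -> %-15s reserved=%.3f spent=%.3f slotsRefunded=%d reactions=%d\n",
			kind, out, l.reserved, l.spent, l.slotsRefunded, n)
	}
}
```

The `settled` flag stops the guard from releasing a reservation that `Settle` has already consumed.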
## Conversations (threads) — ChatGPT-style multi-chat
In a 1:1 DM a top-level message **roots a new thread** (a fresh conversation) and the bot answers
inside it ([bot.go](../../apps/ai-bot/bot.go) `resolveThreadRoot`); a message already in a thread
continues it (F27). **Groups are never auto-threaded** — the gate is structural (`isDM`), not a
flag, so the threading feature can never change group behavior. Auto-threading in DMs is **always
on** (the old `THREAD_CONVERSATIONS` env flag was removed — it only created a host/backend
mismatch footgun). Context and single-flight are keyed per-`(room, thread)` so conversations
neither share history nor block each other. Typing is room-level (Matrix has no per-thread typing
indicator) and reference-counted, so it stays on while any thread in the room is still generating;
per-thread context buffers are LRU-bounded (`maxConvBuffersPerRoom`).
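
A sketch of the threading rule and the typing refcount, assuming a trimmed, hypothetical `event` type and a hypothetical `set` hook for the Matrix typing call. `resolveThreadRoot` here is an illustrative reimplementation, not the code in bot.go; keeping an explicit thread relation in groups is an assumption (the rule above only says groups are never *auto*-threaded). `convKey` is the per-`(room, thread)` key, and `typingRefs` keeps one room-level indicator alive across overlapping conversations.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// event is a hypothetical, trimmed view of an m.room.message.
type event struct {
	RoomID, EventID string
	ThreadRoot      string // from an m.thread relation; "" for a top-level message
}

// convKey is the per-(room, thread) key for context buffers and single-flight.
type convKey struct{ Room, Thread string }

// resolveThreadRoot: a threaded message continues its thread; a top-level DM
// message roots a new one (its own event id); a top-level group message stays
// on the main timeline. The branch is on isDM, not on a flag.
func resolveThreadRoot(ev event, isDM bool) string {
	if ev.ThreadRoot != "" {
		return ev.ThreadRoot
	}
	if isDM {
		return ev.EventID
	}
	return ""
}

// typingRefs keeps room-level typing on while any conversation in the room is
// generating. set runs under the lock so on/off toggles stay ordered.
type typingRefs struct {
	mu    sync.Mutex
	count map[string]int
	set   func(room string, on bool)
}

func (t *typingRefs) start(room string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.count[room]++
	if t.count[room] == 1 {
		t.set(room, true) // first active conversation turns typing on
	}
}

func (t *typingRefs) stop(room string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.count[room]--
	if t.count[room] == 0 {
		delete(t.count, room)
		t.set(room, false) // last one finished: typing off
	}
}

// demoTyping overlaps two threads in one DM and records the typing toggles.
func demoTyping() string {
	var calls []string
	t := &typingRefs{count: map[string]int{}, set: func(room string, on bool) {
		calls = append(calls, fmt.Sprintf("%s:%v", room, on))
	}}
	t.start("!dm") // thread A starts
	t.start("!dm") // thread B starts: already typing, no extra call
	t.stop("!dm")  // A done, B still running
	t.stop("!dm")  // B done
	return strings.Join(calls, ",")
}

func main() {
	top := event{RoomID: "!dm", EventID: "$q1"}
	reply := event{RoomID: "!dm", EventID: "$q2", ThreadRoot: "$q1"}
	fmt.Println(convKey{top.RoomID, resolveThreadRoot(top, true)})
	fmt.Println(convKey{reply.RoomID, resolveThreadRoot(reply, true)})
	fmt.Println(demoTyping())
}
```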

**Host pairing:** the cinny host shows the conversation surface for a bot only when its
`config.json` preset has `experience.type: "ai-chat"` (today only `@ai`). That surface is a
**fully isolated, native in-client chat** (`features/bots/BotConversations` + `AiChatHeader` +
`AiChatMenu`, reusing the generic `ThreadDrawer`/`RoomInput`) — it shares **no** runtime with the
bridge widget pipeline (no `BotShell`/iframe, no `show-chat` toggle; there is no `vojo-ai` widget
anymore). Bridges keep `experience.type: "matrix-widget"` (iframe + show-chat fallback). Because
the backend now always threads DMs, every DM reply from @ai lands in a thread the host can open.
## Cascade (flag-gated "operator cascade", every layer default OFF)
`generate()` ([cascade.go](../../apps/ai-bot/cascade.go)) routes ([router.go](../../apps/ai-bot/router.go))
then dispatches; **any layer off or failing degrades to `grok_direct`** (never an error to the user):
- **`grok_direct`** — DEFAULT, one Grok call. **Grok is the final voice on everything substantive.**
- **`trivial_direct`** — greetings/acks → cheap Gemini (`TRIVIAL_OFFLOAD_ENABLED`).
- **`web_then_grok`** — fresh facts: a WebProvider fetches a grounded digest + citations, then **Grok synthesises the answer in voice** ([web.go](../../apps/ai-bot/web.go)).
- **`reason_then_grok`** — manual trigger ("подумай глубже", "think deeper") → Grok at a higher `reasoning_effort`.
- **`project_then_grok`** — questions about the **Vojo product itself** (`PROJECT_KB_ENABLED`): a curated KB (operator data from `PROJECT_KB_PATH`, default the bundled `prompts/vojo_kb.txt`) is injected as a system note and **Grok answers product claims strictly from it** (anti-hallucination: Grok has no parametric Vojo knowledge, and the web doesn't either). Gated by the classifier's `about_project` signal, a context-aware judgment that resolves follow-ups like "Про этот" ("about this one") to the app; a false positive is bounded because the note is scoped to the Vojo entity. It takes precedence over every web route. One Grok call, so it costs about the same as `grok_direct`. See [docs/plans/ai_project_knowledge.md](../plans/ai_project_knowledge.md).
- Router = free Layer-0 regex + optional Layer-1 Gemini classifier; classifier verdicts below a confidence threshold fall back to the safe default (`grok_direct`).

**Invariant:** all cascade flags OFF == today's bot: a single `grok_direct` call with a byte-identical wire body. Do not enable layers in prod until the offline-eval gate (build plan §9) passes.
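
The routing rules above, sketched as a pure function under stated assumptions: the `flags` field names, the `verdict` shape, the Layer-0 regexes and the 0.7 `confidenceFloor` are hypothetical, not the values in router.go. It encodes the invariants that matter: all flags off yields `grok_direct`, a low-confidence or failed classifier stays on `grok_direct`, `project_then_grok` wins over `web_then_grok`, and `dispatch` degrades any failing layer to one plain Grok call.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

type route string

const (
	grokDirect      route = "grok_direct"
	trivialDirect   route = "trivial_direct"
	webThenGrok     route = "web_then_grok"
	reasonThenGrok  route = "reason_then_grok"
	projectThenGrok route = "project_then_grok"
)

// flags: one switch per layer; the zero value means everything OFF. Names are hypothetical.
type flags struct {
	Trivial, Web, Reason, ProjectKB, Classifier bool
}

// verdict is a hypothetical shape for the Layer-1 classifier's answer.
type verdict struct {
	NeedsWeb, AboutProject bool
	Confidence             float64
}

// confidenceFloor is illustrative, not the configured threshold.
const confidenceFloor = 0.7

var errLayerDown = errors.New("layer unavailable")

// Layer 0: free regex checks (illustrative patterns only).
var (
	deeperRe   = regexp.MustCompile(`(?i)подумай глубже`)
	greetingRe = regexp.MustCompile(`(?i)^\s*(привет|спасибо|hi|hello|thanks|ok)\s*[!.]*\s*$`)
)

// decide picks a route; anything disabled, uncertain or failing lands on grok_direct.
func decide(text string, f flags, classify func(string) (verdict, error)) route {
	if f.Reason && deeperRe.MatchString(text) {
		return reasonThenGrok
	}
	if f.Trivial && greetingRe.MatchString(text) {
		return trivialDirect
	}
	if !f.Classifier || classify == nil {
		return grokDirect
	}
	v, err := classify(text)
	if err != nil || v.Confidence < confidenceFloor {
		return grokDirect
	}
	switch {
	case f.ProjectKB && v.AboutProject: // product questions beat every web arm
		return projectThenGrok
	case f.Web && v.NeedsWeb:
		return webThenGrok
	}
	return grokDirect
}

// dispatch runs the chosen layer and degrades to one plain Grok call if it fails.
func dispatch(r route, run func(route) (string, error)) (string, route, error) {
	if r != grokDirect {
		if out, err := run(r); err == nil {
			return out, r, nil
		}
	}
	out, err := run(grokDirect)
	return out, grokDirect, err
}

func main() {
	all := flags{Trivial: true, Web: true, Reason: true, ProjectKB: true, Classifier: true}
	confident := func(string) (verdict, error) {
		return verdict{NeedsWeb: true, AboutProject: true, Confidence: 0.9}, nil
	}
	fmt.Println(decide("подумай глубже", flags{}, nil))
	fmt.Println(decide("подумай глубже", all, nil))
	fmt.Println(decide("что нового в Vojo?", all, confident))
}
```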
## Provider seam (no vendor names in business logic)
[llm.go](../../apps/ai-bot/llm.go) (`Message`/`Usage`/`LLMRequest`/`LLMResponse`/`LLMClient`) +
[httpllm.go](../../apps/ai-bot/httpllm.go) (shared OpenAI-compatible transport + retry) + thin
adapters [provider_xai.go](../../apps/ai-bot/provider_xai.go) /
[provider_gemini.go](../../apps/ai-bot/provider_gemini.go) + [pricing.go](../../apps/ai-bot/pricing.go)
(`priceFor` model→price map). `Bot.llm` is an `LLMClient`, never a concrete vendor type.
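
A sketch of the seam, assuming field sets and a `Complete` method that are not taken from llm.go (only the type names and `priceFor` are). The fake `echoClient` plays the role of a thin vendor adapter; the price table holds placeholders, not real rates.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Only the type names come from llm.go; every field and method here is assumed.
type Message struct{ Role, Content string }

type Usage struct{ PromptTokens, CompletionTokens int }

type LLMRequest struct {
	Model    string
	Messages []Message
}

type LLMResponse struct {
	Text  string
	Usage Usage
}

// LLMClient is the seam: business logic depends on this, never on a vendor type.
type LLMClient interface {
	Complete(ctx context.Context, req LLMRequest) (LLMResponse, error)
}

// price is USD per million tokens. The table holds placeholders, not real rates.
type price struct{ InPerM, OutPerM float64 }

var prices = map[string]price{"placeholder-model": {InPerM: 1.00, OutPerM: 2.00}}

// priceFor mirrors the documented model->price lookup; the signature is assumed.
func priceFor(model string) (price, bool) {
	p, ok := prices[model]
	return p, ok
}

func costUSD(model string, u Usage) (float64, error) {
	p, ok := priceFor(model)
	if !ok {
		return 0, fmt.Errorf("no price for model %q", model)
	}
	return float64(u.PromptTokens)*p.InPerM/1e6 + float64(u.CompletionTokens)*p.OutPerM/1e6, nil
}

// echoClient stands in for a thin adapter such as provider_xai.go.
type echoClient struct{}

func (echoClient) Complete(_ context.Context, req LLMRequest) (LLMResponse, error) {
	if len(req.Messages) == 0 {
		return LLMResponse{}, errors.New("empty request")
	}
	last := req.Messages[len(req.Messages)-1].Content
	return LLMResponse{Text: "echo: " + last, Usage: Usage{PromptTokens: 1000, CompletionTokens: 500}}, nil
}

// Bot holds the interface, so swapping vendors never touches this code.
type Bot struct {
	llm   LLMClient
	model string
}

func (b *Bot) ask(ctx context.Context, text string) (string, float64, error) {
	resp, err := b.llm.Complete(ctx, LLMRequest{Model: b.model, Messages: []Message{{Role: "user", Content: text}}})
	if err != nil {
		return "", 0, err
	}
	cost, err := costUSD(b.model, resp.Usage)
	return resp.Text, cost, err
}

// demoAsk wires in the fake adapter; the real Bot would get an xAI or Gemini one.
func demoAsk(model string) (string, float64, error) {
	b := &Bot{llm: echoClient{}, model: model}
	return b.ask(context.Background(), "hi")
}

func main() {
	fmt.Println(demoAsk("placeholder-model"))
	fmt.Println(demoAsk("unknown-model"))
}
```

Returning an error for an unpriced model, rather than a silent $0, keeps the spend ceiling honest; whether the real `priceFor` behaves this way is an assumption.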
## Money, invariants & store ([store.go](../../apps/ai-bot/store.go))
- **Ceiling is TOCTOU-safe:** `Reserve` books a route's estimated max-cost into `reserved_usd`
under a per-day **global** advisory lock; the gate counts committed + reserved spend; `Settle`
releases the reservation and books the real per-component `CostBreakdown`. A concurrent burst
overshoots by at most one reservation.
- **Never charge for silence:** a 2xx is billed; if the reply then fails to send, refund the
request SLOT (not the USD) + react. A failed call releases the reservation + refunds the slot;
a panic releases via a deferred guard.
- Caps: `DAILY_USD_CEILING` (global $), `PER_USER_DAILY_CAP` (requests/user), `PER_USER_DAILY_USD`
(optional $/user). **At-most-once** dedup is durable (`SeenEvent`/`MarkTxn`); generation is
per-(room,thread) single-flight.
- One overall **per-request deadline** bounds the whole cascade (rather than per-stage timeouts that could stack up to 3×60s).
- **Telemetry:** one `request_log` row per engaged request (route, per-component $, latency,
  degrade reasons), written async and isolated (its failure never drops a reply), gated by
  `TELEMETRY_ENABLED` (default off), with time-based retention.
- **Store:** dedicated Postgres `vojo_ai` (pgx); schema is an ordered `migrations` array in
store.go. **Operational state only** (dedup, spend ledger, grounding cap, `request_log`,
warned-encrypted markers for encrypted rooms already warned); **no message content** (that lives in Synapse).
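
An in-memory sketch of the TOCTOU-safe ceiling, assuming a `sync.Mutex` as a stand-in for the per-day global Postgres advisory lock and plain fields in place of the ledger columns (`dayLedger` and `burst` are hypothetical). The gate admits while committed + reserved spend is still under the ceiling, then books the estimate under the same lock; that is one way to get the documented bound, where a burst overshoots by less than one reservation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errCeiling = errors.New("daily USD ceiling reached")

// dayLedger sketches one day's spend. The mutex stands in for the per-day
// global advisory lock; reservedUSD mirrors the reserved_usd column.
type dayLedger struct {
	mu          sync.Mutex
	ceilingUSD  float64
	committed   float64 // settled, real spend
	reservedUSD float64 // in-flight estimated max-costs
}

// Reserve checks committed+reserved and books the estimate under one lock.
func (l *dayLedger) Reserve(estimate float64) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.committed+l.reservedUSD >= l.ceilingUSD {
		return errCeiling
	}
	l.reservedUSD += estimate
	return nil
}

// Release drops a reservation without booking spend (failed call, panic).
func (l *dayLedger) Release(estimate float64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reservedUSD -= estimate
}

// Settle swaps the reservation for the real cost of a billed (2xx) call.
func (l *dayLedger) Settle(estimate, actual float64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reservedUSD -= estimate
	l.committed += actual
}

// burst fires n concurrent Reserve calls and counts how many were admitted.
func burst(n int, ceiling, estimate float64) (int, *dayLedger) {
	l := &dayLedger{ceilingUSD: ceiling}
	var wg sync.WaitGroup
	var mu sync.Mutex
	admitted := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if l.Reserve(estimate) == nil {
				mu.Lock()
				admitted++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return admitted, l
}

func main() {
	admitted, l := burst(50, 1.00, 0.30)
	fmt.Printf("admitted=%d reserved=$%.2f ceiling=$%.2f\n", admitted, l.reservedUSD, l.ceilingUSD)
	l.Settle(0.30, 0.05) // one call finished and really cost $0.05
	fmt.Printf("after settle: committed=$%.2f reserved=$%.2f\n", l.committed, l.reservedUSD)
}
```

With a $1.00 ceiling and $0.30 estimates, 50 concurrent requests admit exactly four: the fourth sees ~$0.90 booked, passes the gate, and lifts the total to ~$1.20.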
## Current prod config (the cheap web path)
`WEB_PROVIDER=gemini_grounding`: Gemini 2.5 Flash-Lite does the fetch via the **native v1beta
`google_search` tool** (NOT the OpenAI-compat endpoint — grounding is silently ignored there,
F-EXT-3), then Grok-4.3 voices it. ~**$0.0013/query** (vs ~$0.022 for the old two-Grok path);
grounding is free within the requests-per-day (RPD) quota and guarded by `WEB_GROUNDING_DAILY_CAP`. `XAI_MODEL=grok-4.3`
+ `GROK_REASONING_EFFORT=none` (4.3 otherwise reasons on every reply). Full flag table in the
[README](../../apps/ai-bot/README.md).
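
A sketch of the native grounding call that builds (but does not send) a v1beta `generateContent` request with the `google_search` tool. The endpoint shape, the `x-goog-api-key` header and the `"tools": [{"google_search": {}}]` body follow Google's public Gemini API docs; the model ID string and the helper names are assumptions, and the bot's real client (with ctx, retries and response parsing) is not reproduced here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

const geminiBase = "https://generativelanguage.googleapis.com/v1beta/models/"

type part struct {
	Text string `json:"text"`
}

type content struct {
	Role  string `json:"role"`
	Parts []part `json:"parts"`
}

// tool enables Google Search grounding; it only works on the native endpoint.
type tool struct {
	GoogleSearch struct{} `json:"google_search"`
}

type generateContentRequest struct {
	Contents []content `json:"contents"`
	Tools    []tool    `json:"tools"`
}

func groundingBody(query string) ([]byte, error) {
	return json.Marshal(generateContentRequest{
		Contents: []content{{Role: "user", Parts: []part{{Text: query}}}},
		Tools:    []tool{{}},
	})
}

func buildGroundingRequest(apiKey, model, query string) (*http.Request, error) {
	body, err := groundingBody(query)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, geminiBase+model+":generateContent", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("x-goog-api-key", apiKey) // header, not ?key=, so it never shows up in a logged URL
	return req, nil
}

func main() {
	req, err := buildGroundingRequest("YOUR_API_KEY", "gemini-2.5-flash-lite", "что нового у Vojo?")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
	// Sending is omitted: http.DefaultClient.Do(req.WithContext(ctx)) with the request ctx.
}
```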
## Trigger hygiene (what reaches the search query)
The raw event body is **cleaned once** at the top of `respond` ([bot.go](../../apps/ai-bot/bot.go),
`stripBotMention(stripReplyFallback(...))`) before it is used as the web-search query, the prompt
trigger, the buffer entry, or telemetry. Two egress hazards both rode the raw body: the bot's own
mention pill fallback (cinny writes the **full mxid** `@ai:vojo.chat` into the plain `body`), and
the rich-reply quoted parent. The mxid was the worse one — sent verbatim to gemini grounding it
made the provider treat **`vojo.chat`** as the subject entity ("was the *Vojo.chat* messenger
removed?") and confabulate a confident wrong answer; the same question without the mention (e.g. in
a DM, which has no mention) grounded correctly. Mention **detection** is unaffected — it runs
upstream on `m.mentions`/`replyParentIsBot` ([mentions.go](../../apps/ai-bot/mentions.go)), not on
body text. The human display name is deliberately **not** stripped, so "что умеет Vojo AI" ("what can Vojo AI do") survives.
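
A sketch of the two cleaners, assuming the classic Matrix reply-fallback shape (leading `> ` quote lines plus a blank separator) and that cinny's pill fallback puts the bare mxid into `body`. The function names match bot.go, but these bodies are illustrative reimplementations; `cleanTrigger` is a hypothetical wrapper for the composition used at the top of `respond`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const botMXID = "@ai:vojo.chat"

var multiSpace = regexp.MustCompile(`[ \t]{2,}`)

// stripReplyFallback drops a leading rich-reply quote: "> " lines plus the blank separator.
func stripReplyFallback(body string) string {
	lines := strings.Split(body, "\n")
	i := 0
	for i < len(lines) && (strings.HasPrefix(lines[i], "> ") || lines[i] == ">") {
		i++
	}
	if i == 0 {
		return body
	}
	if i < len(lines) && lines[i] == "" {
		i++
	}
	return strings.Join(lines[i:], "\n")
}

// stripBotMention removes the pill fallback's full mxid; display names are left alone.
func stripBotMention(s string) string {
	if !strings.Contains(s, botMXID) {
		return s
	}
	s = strings.ReplaceAll(s, botMXID, "")
	s = strings.TrimLeft(s, " \t:,") // separator left by "@ai:vojo.chat: question"
	s = multiSpace.ReplaceAllString(s, " ")
	return strings.TrimSpace(s)
}

// cleanTrigger is the once-per-request composition (wrapper name is hypothetical).
func cleanTrigger(body string) string { return stripBotMention(stripReplyFallback(body)) }

func main() {
	raw := "> <@bob:vojo.chat> старое сообщение\n\n@ai:vojo.chat: удалили ли мессенджер?"
	fmt.Printf("%q\n", cleanTrigger(raw))
}
```

Stripping only the mxid leaves references like "Vojo AI" intact, and mention detection never sees this text because it reads `m.mentions`.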
## Source attribution (the "Sources" footer)
Web answers append a compact, deduped **`Источники: [rbc.ru](…), …`** ("Sources: …") line built **server-side**
after Grok's prose ([sources.go](../../apps/ai-bot/sources.go) `sourcesFooter`), never via the Grok
prompt (the synth note still says "no URLs or links" — instructing Grok to cite made it paste ugly
redirects and mis-attribute them). The label is the publisher **domain** (`web.title`); the link is
the citation's URL — for `gemini_grounding` that is the opaque `grounding-api-redirect` URL, which
the **end user clicks** to reach the real article. **Gemini Grounding terms** (verified against
`ai.google.dev/gemini-api/terms`) constrain this: the redirect must **not** be resolved
server-side (no "programmatic/automated access to Grounded Results"), and a strict reading also
requires showing the **Search-Suggestions chip** (`searchEntryPoint.renderedContent`, HTML/CSS) —
which a sanitised Matrix bubble can't render, so that part stays unmet (pre-existing gap; the bot
already shows grounded prose without it). The footer is appended to the **sent** message only, not
the buffered turn — the redirect links are ephemeral, so they must not pollute the history that
feeds later prompts. `grok_web_search` returns **real** publisher URLs (no Google display ToS), so
switching `WEB_PROVIDER` to it is the path to true article links, at ~17× the cost.
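
A sketch of the footer, assuming the Gemini response's `groundingMetadata.groundingChunks[].web.{uri,title}` fields as input and an illustrative `maxSources` of 3. `sourcesFooter` here is a hypothetical reimplementation of the one in sources.go. Redirect URIs pass through untouched (never resolved server-side), and `compose` returns the footer-bearing text to send separately from the link-free text to buffer.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// webSource is one citation: Title is the publisher domain, URI the redirect link.
type webSource struct{ URI, Title string }

// groundingMetadata mirrors only the fields of Gemini's groundingMetadata used here.
type groundingMetadata struct {
	GroundingChunks []struct {
		Web struct {
			URI   string `json:"uri"`
			Title string `json:"title"`
		} `json:"web"`
	} `json:"groundingChunks"`
}

func parseSources(raw []byte) ([]webSource, error) {
	var gm groundingMetadata
	if err := json.Unmarshal(raw, &gm); err != nil {
		return nil, err
	}
	out := make([]webSource, 0, len(gm.GroundingChunks))
	for _, c := range gm.GroundingChunks {
		out = append(out, webSource{URI: c.Web.URI, Title: c.Web.Title})
	}
	return out, nil
}

const maxSources = 3 // illustrative cap

// sourcesFooter builds the deduped line; URIs are used as-is, the reader's click resolves them.
func sourcesFooter(srcs []webSource) string {
	seen := map[string]bool{}
	var links []string
	for _, s := range srcs {
		if s.URI == "" || s.Title == "" || seen[s.Title] {
			continue
		}
		seen[s.Title] = true
		links = append(links, fmt.Sprintf("[%s](%s)", s.Title, s.URI))
		if len(links) == maxSources {
			break
		}
	}
	if len(links) == 0 {
		return ""
	}
	return "Источники: " + strings.Join(links, ", ")
}

// compose: sent carries the footer; buffered does not, so ephemeral links stay out of later prompts.
func compose(prose string, srcs []webSource) (sent, buffered string) {
	sent, buffered = prose, prose
	if f := sourcesFooter(srcs); f != "" {
		sent += "\n\n" + f
	}
	return sent, buffered
}

func main() {
	raw := []byte(`{"groundingChunks":[
		{"web":{"uri":"https://example.invalid/r/1","title":"rbc.ru"}},
		{"web":{"uri":"https://example.invalid/r/2","title":"rbc.ru"}},
		{"web":{"uri":"https://example.invalid/r/3","title":"tass.ru"}}]}`)
	srcs, err := parseSources(raw)
	if err != nil {
		panic(err)
	}
	sent, _ := compose("Короткий ответ.", srcs)
	fmt.Println(sent)
}
```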
## Observability (logs + per-request trace)
`log/slog` to stderr (`LOG_LEVEL`, `LOG_FORMAT=text|json`). A context-aware handler
([logging.go](../../apps/ai-bot/logging.go)) stamps a per-request **`trace_id`** —
minted once per handled event in `handleEvent` ([trace.go](../../apps/ai-bot/trace.go))
and carried in `ctx` down to the model HTTP call — onto **every** log line, so one
`trace_id` greps the whole request trail (the userver idiom; the id is OTel-trace-id
shaped for a future exporter). Routing diagnostics (`route decided` / `generation
outcome`) are DEBUG, content-free. Full model **request/response bodies** are gated by a
**per-user allowlist** `LOG_BODIES_USERS` (empty = nobody) **and** `LOG_LEVEL=debug`,
truncated to a fixed ~4 KB cap, with URL/headers (the API key) never logged — decided once
at admission via a `verbose` flag in `ctx`, read by the dumb transport. This is the
**debug** path; `request_log` (`TELEMETRY_*`) is the separate **analytics** path — they
correlate via `trace_id`/`event_id` but are independent. Ship JSON stdout to
OpenSearch/Loki with a collector (Fluent Bit/Vector); the bot never talks to a log
backend. Full flag table in the [README](../../apps/ai-bot/README.md#observability--logs--per-request-trace).
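
A sketch of the context-aware handler using only the standard `log/slog` API (Go 1.21+): `traceHandler` wraps any `slog.Handler` and stamps `trace_id` from `ctx` onto every record. The context key, `newTraceID` (16 random bytes as 32 hex chars, the OTel trace-id shape) and the `demoLog`/`traceOf` helpers are illustrative, not the code in logging.go or trace.go.

```go
package main

import (
	"bytes"
	"context"
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"log/slog"
	"os"
)

type traceKey struct{}

// newTraceID mints 16 random bytes as 32 hex chars, the shape of an OTel trace id.
func newTraceID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err) // crypto/rand failing leaves nothing sensible to do
	}
	return hex.EncodeToString(b[:])
}

// withTrace stores the id in ctx; everything downstream logs with that ctx.
func withTrace(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceKey{}, id)
}

// traceHandler wraps any slog.Handler and adds trace_id from ctx to every record.
type traceHandler struct{ slog.Handler }

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceKey{}).(string); ok {
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup rewrap, so derived loggers keep stamping the id.
func (h traceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return traceHandler{h.Handler.WithAttrs(attrs)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{h.Handler.WithGroup(name)}
}

// demoLog writes one DEBUG line as JSON and returns it.
func demoLog(id string) string {
	var buf bytes.Buffer
	logger := slog.New(traceHandler{slog.NewJSONHandler(&buf, &slog.HandlerOptions{Level: slog.LevelDebug})})
	logger.DebugContext(withTrace(context.Background(), id), "route decided", "route", "grok_direct")
	return buf.String()
}

// traceOf pulls trace_id back out of a JSON log line ("" if absent).
func traceOf(line string) string {
	var m map[string]any
	if err := json.Unmarshal([]byte(line), &m); err != nil {
		return ""
	}
	id, _ := m["trace_id"].(string)
	return id
}

func main() {
	logger := slog.New(traceHandler{slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelDebug})})
	ctx := withTrace(context.Background(), newTraceID()) // minted once per handled event
	logger.InfoContext(ctx, "event admitted", "room", "!dm")
	logger.DebugContext(ctx, "route decided", "route", "grok_direct")
}
```

One caveat of this sketch: under a `WithGroup` logger, `trace_id` lands inside that group rather than at the top level.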
## Building / testing
Go toolchain lives at `/home/ubuntu/.go-toolchain/go/bin` (NOT on PATH). Store-backed tests need
`AI_BOT_TEST_DATABASE_URL` (a throwaway Postgres) and **skip** without it, so `go test ./...` stays
green on a machine without one. Keep `gofmt -l`, `go vet ./...`, `go test -race ./...` clean.