vojo/docs/ai/ai-bot.md


Vojo AI bot (@ai:vojo.chat)

A Go Synapse application service in apps/ai-bot/ — not a normal bot user. Answers @-mentions in groups and every message in 1:1s, over the plaintext CS-API (Vojo rooms are unencrypted by default). It is a separate server-side service deployed next to Synapse; it ships nothing to the web client.

  • Operator / full env reference: apps/ai-bot/README.md (config tables, setup, deploy).
  • Deploy / server config: server-side.md (the ai-bot service row, the vojo_ai Postgres role).
  • Detailed design SOT: docs/plans/grok_bot.md + docs/plans/ai_backend_build_plan.md (local-only: docs/plans/ is gitignored).

Request flow

Synapse pushes a transaction → the bot acks 200 instantly, then processes async per-room (appservice.go), so a slow model call never blocks other rooms or the homeserver. handleMessage (bot.go) gates in order: durable+in-memory dedup → encrypted-room skip → decode / edit / own-message / notice → foreign-server leave → DM-or-mention → media react → resolve conversation (thread) → per-(room,thread) single-flight → spawn respond. respond = Reserve(estimate) → generate() → Settle(actual) → sendReply; any failure produces an emoji react, never silence.
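The "ack first, then process per room" step can be sketched as a small dispatcher with one worker goroutine per room. This is an illustration only, not the code in appservice.go: roomDispatcher, enqueue, wait and the buffer size of 64 are hypothetical names and values.

```go
package main

import (
	"fmt"
	"sync"
)

// roomDispatcher runs jobs for one room in arrival order while different
// rooms proceed concurrently. All names here are hypothetical.
type roomDispatcher struct {
	mu     sync.Mutex
	queues map[string]chan func()
	wg     sync.WaitGroup
}

func newRoomDispatcher() *roomDispatcher {
	return &roomDispatcher{queues: make(map[string]chan func())}
}

// enqueue hands job to the room's worker goroutine, starting it on first use.
// A transaction handler could return 200 right after enqueueing its events.
func (d *roomDispatcher) enqueue(roomID string, job func()) {
	d.mu.Lock()
	q, ok := d.queues[roomID]
	if !ok {
		q = make(chan func(), 64) // assumed size; a full queue would block the caller
		d.queues[roomID] = q
		go func() {
			for j := range q {
				j()
				d.wg.Done()
			}
		}()
	}
	d.wg.Add(1)
	d.mu.Unlock()
	q <- job
}

// wait blocks until every enqueued job has finished (used by the demo only).
func (d *roomDispatcher) wait() { d.wg.Wait() }

func main() {
	d := newRoomDispatcher()
	var a, b []int // each slice is touched by exactly one room worker
	for i := 0; i < 3; i++ {
		i := i
		d.enqueue("!a:vojo.chat", func() { a = append(a, i) })
		d.enqueue("!b:vojo.chat", func() { b = append(b, i*10) })
	}
	d.wait()
	fmt.Println(a, b)
}
```

A slow job in one room delays only that room's queue; the sketch never stops its workers, which a long-running service would have to handle.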

Conversations (threads) — ChatGPT-style multi-chat

In a 1:1 DM a top-level message roots a new thread (a fresh conversation) and the bot answers inside it (bot.go resolveThreadRoot); a message already in a thread continues it (F27). Groups are never auto-threaded — the gate is structural (isDM), not a flag, so the threading feature can never change group behavior. Auto-threading in DMs is always on (the old THREAD_CONVERSATIONS env flag was removed — it only created a host/backend mismatch footgun). Context and single-flight are keyed per-(room, thread) so conversations neither share history nor block each other; typing is room-level (Matrix has no per-thread typing) via a refcount; per-thread context buffers are LRU-bounded (maxConvBuffersPerRoom).
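A minimal sketch of the two keying rules above, assuming an in-memory tracker: single-flight is keyed by (room, thread), while the typing indicator is a per-room refcount. The type and method names (convKey, flights, begin, end) are hypothetical, and what the real bot does with a message that arrives while its thread is busy is not specified here.

```go
package main

import (
	"fmt"
	"sync"
)

// convKey identifies one conversation: a thread inside a room.
type convKey struct{ room, thread string }

// flights tracks in-progress generations and the room-level typing refcount.
type flights struct {
	mu     sync.Mutex
	busy   map[convKey]bool
	typing map[string]int
}

func newFlights() *flights {
	return &flights{busy: map[convKey]bool{}, typing: map[string]int{}}
}

// begin claims the conversation. started is false if it is already in flight;
// typingOn is true only when this is the first active generation in the room.
func (f *flights) begin(k convKey) (started, typingOn bool) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.busy[k] {
		return false, false
	}
	f.busy[k] = true
	f.typing[k.room]++
	return true, f.typing[k.room] == 1
}

// end releases the conversation. typingOff is true only when no other
// generation is still running in the same room.
func (f *flights) end(k convKey) (typingOff bool) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if !f.busy[k] {
		return false
	}
	delete(f.busy, k)
	f.typing[k.room]--
	if f.typing[k.room] == 0 {
		delete(f.typing, k.room)
		return true
	}
	return false
}

func main() {
	f := newFlights()
	t1 := convKey{"!dm:vojo.chat", "$thread1"}
	t2 := convKey{"!dm:vojo.chat", "$thread2"}
	fmt.Println(f.begin(t1)) // first generation in the room
	fmt.Println(f.begin(t1)) // same thread again: rejected
	fmt.Println(f.begin(t2)) // other thread: allowed, typing already on
	fmt.Println(f.end(t1), f.end(t2))
}
```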

Host pairing: the cinny host shows the conversation surface for a bot only when its config.json preset has experience.type: "ai-chat" (today only @ai). That surface is a fully isolated, native in-client chat (features/bots/BotConversations + AiChatHeader + AiChatMenu, reusing the generic ThreadDrawer/RoomInput) — it shares no runtime with the bridge widget pipeline (no BotShell/iframe, no show-chat toggle; there is no vojo-ai widget any more). Bridges keep experience.type: "matrix-widget" (iframe + show-chat fallback). Because the backend now always threads DMs, any DM message @ai answers lands in a thread the host can open.

Cascade (flag-gated "operator cascade", every layer default OFF)

generate() (cascade.go) routes (router.go) then dispatches; any layer off or failing degrades to grok_direct (never an error to the user):

  • grok_direct — DEFAULT, one Grok call. Grok is the final voice on everything substantive.
  • trivial_direct — greetings/acks → cheap Gemini (TRIVIAL_OFFLOAD_ENABLED).
  • web_then_grok — fresh facts: a WebProvider fetches a grounded digest + citations, then Grok synthesises the answer in voice (web.go).
  • reason_then_grok — manual trigger ("подумай глубже", Russian for "think deeper") → Grok at a higher reasoning_effort.
  • project_then_grok — questions about the Vojo product itself (PROJECT_KB_ENABLED): a curated KB (operator data from PROJECT_KB_PATH, default the bundled prompts/vojo_kb.txt) is injected as a system note and Grok answers product claims strictly from it (anti-hallucination: Grok has no parametric Vojo knowledge, and the web doesn't either). Gated by the classifier's about_project signal (the context-aware judge, which resolves follow-ups like "Про этот", Russian for "about this one", to the app); a false positive is bounded by the entity-scoped note. Beats every web arm. One Grok call, so it costs ~the same as grok_direct. See docs/plans/ai_project_knowledge.md.
  • Router = free Layer-0 regex + optional Layer-1 Gemini classifier; a confidence floor keeps uncertain cases on the safe floor (grok_direct).

Invariant: all cascade flags OFF == today's bot — a single grok_direct call, byte-identical wire body. Do not enable layers in prod until the offline-eval gate (build plan §9) passes.
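The degrade-to-floor behaviour can be sketched as two small functions: one that applies the confidence floor and the flags, and one that falls back to grok_direct when an arm fails. The types and signatures (Verdict, arm, decide, generate) are assumptions for illustration and are not taken from cascade.go or router.go.

```go
package main

import (
	"errors"
	"fmt"
)

type Route string

const (
	GrokDirect    Route = "grok_direct"
	TrivialDirect Route = "trivial_direct"
	WebThenGrok   Route = "web_then_grok"
)

// Verdict is a hypothetical router output: a proposed route plus confidence.
type Verdict struct {
	Route      Route
	Confidence float64
}

var errArmFailed = errors.New("arm failed")

// decide keeps uncertain or disabled routes on the safe floor.
func decide(v Verdict, floor float64, enabled map[Route]bool) Route {
	if v.Confidence < floor || !enabled[v.Route] {
		return GrokDirect
	}
	return v.Route
}

// arm is one cascade layer; arms[GrokDirect] must always be present.
type arm func(prompt string) (string, error)

// generate tries the chosen arm and degrades to grok_direct on failure.
// Only a failure of grok_direct itself is returned to the caller.
func generate(route Route, arms map[Route]arm, prompt string) (string, Route, error) {
	if run, ok := arms[route]; ok && route != GrokDirect {
		if out, err := run(prompt); err == nil {
			return out, route, nil
		}
	}
	out, err := arms[GrokDirect](prompt)
	return out, GrokDirect, err
}

func main() {
	arms := map[Route]arm{
		GrokDirect:  func(p string) (string, error) { return "grok: " + p, nil },
		WebThenGrok: func(p string) (string, error) { return "", errArmFailed },
	}
	enabled := map[Route]bool{WebThenGrok: true}
	route := decide(Verdict{WebThenGrok, 0.9}, 0.7, enabled)
	out, used, err := generate(route, arms, "latest news?")
	fmt.Println(out, used, err)
}
```

With an empty enabled map every verdict resolves to grok_direct, which mirrors the all-flags-off invariant.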

Provider seam (no vendor names in business logic)

llm.go (Message/Usage/LLMRequest/LLMResponse/LLMClient) + httpllm.go (shared OpenAI-compatible transport + retry) + thin adapters provider_xai.go / provider_gemini.go + pricing.go (priceFor model→price map). Bot.llm is an LLMClient, never a concrete vendor type.
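A rough sketch of what such a seam can look like. Only the type names Message, Usage, LLMRequest, LLMResponse, LLMClient, priceFor and the Bot.llm field come from the text above; the fields, the Complete method, the priceFor signature, the 30-second timeout, the "example-model" entry and its prices are all assumptions.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type Message struct{ Role, Content string }

type Usage struct{ PromptTokens, CompletionTokens int }

type LLMRequest struct {
	Model    string
	Messages []Message
}

type LLMResponse struct {
	Text  string
	Usage Usage
}

// LLMClient is the only thing business logic sees; vendor adapters implement it.
type LLMClient interface {
	Complete(ctx context.Context, req LLMRequest) (LLMResponse, error)
}

// price is USD per million tokens. The numbers below are made up.
type price struct{ in, out float64 }

var prices = map[string]price{"example-model": {in: 1.0, out: 2.0}}

// priceFor returns the USD cost of a call, or false for an unknown model.
func priceFor(model string, u Usage) (float64, bool) {
	p, ok := prices[model]
	if !ok {
		return 0, false
	}
	return float64(u.PromptTokens)/1e6*p.in + float64(u.CompletionTokens)/1e6*p.out, true
}

// Bot depends on the interface, never on a concrete vendor type.
type Bot struct {
	llm   LLMClient
	model string
}

func (b *Bot) answer(text string) (string, float64, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	req := LLMRequest{Model: b.model, Messages: []Message{{Role: "user", Content: text}}}
	resp, err := b.llm.Complete(ctx, req)
	if err != nil {
		return "", 0, err
	}
	cost, _ := priceFor(b.model, resp.Usage)
	return resp.Text, cost, nil
}

// echoClient is a stand-in adapter for tests and the demo.
type echoClient struct{}

func (echoClient) Complete(_ context.Context, req LLMRequest) (LLMResponse, error) {
	last := req.Messages[len(req.Messages)-1].Content
	return LLMResponse{Text: "echo: " + last, Usage: Usage{1000, 500}}, nil
}

func main() {
	b := &Bot{llm: echoClient{}, model: "example-model"}
	fmt.Println(b.answer("hi"))
}
```

Swapping vendors then means writing another adapter that satisfies LLMClient and adding its models to the price map.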

Money, invariants & store (store.go)

  • Ceiling is TOCTOU-safe: Reserve books a route's estimated max-cost into reserved_usd under a per-day global advisory lock; the gate counts committed + reserved spend; Settle releases the reservation and books the real per-component CostBreakdown. A concurrent burst overshoots by at most one reservation.
  • Never charge for silence: a 2xx is billed; if the reply then fails to send, refund the request SLOT (not the USD) + react. A failed call releases the reservation + refunds the slot; a panic releases via a deferred guard.
  • Caps: DAILY_USD_CEILING (global $), PER_USER_DAILY_CAP (requests/user), PER_USER_DAILY_USD (optional $/user). At-most-once dedup is durable (SeenEvent/MarkTxn); generation is per-(room,thread) single-flight.
  • One overall per-request deadline bounds the whole cascade (no per-stage 3×60s accretion).
  • Telemetry: one request_log row per engaged request (route, per-component $, latency, degrade reasons), written async + isolated (its failure never drops a reply), TELEMETRY_ENABLED default off, time-based retention.
  • Store: dedicated Postgres vojo_ai (pgx); schema is an ordered migrations array in store.go. Operational state only (dedup, spend ledger, grounding cap, request_log, warned-encrypted) — no message content (that lives in Synapse).
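The reservation arithmetic behind the ceiling can be illustrated with an in-memory ledger, where a mutex stands in for the per-day advisory lock. This is a sketch under assumptions: the real store is Postgres, the method signatures and the micro-USD unit are invented for the example, and the admission rule shown (admit while committed + reserved is below the ceiling) is one reading that yields the "at most one reservation" overshoot.

```go
package main

import (
	"fmt"
	"sync"
)

// ledger tracks one day's spend in micro-USD to avoid float drift.
type ledger struct {
	mu        sync.Mutex // stands in for the per-day global advisory lock
	ceiling   int64
	committed int64 // real cost of settled requests
	reserved  int64 // estimated max-cost of requests still in flight
}

// Reserve admits a request only while committed + reserved spend is below
// the ceiling, then books the estimate. Check and booking happen under the
// same lock, so concurrent callers cannot both pass on stale totals.
func (l *ledger) Reserve(est int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.committed+l.reserved >= l.ceiling {
		return false
	}
	l.reserved += est
	return true
}

// Settle swaps the reservation for the real cost after a successful call.
func (l *ledger) Settle(est, actual int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= est
	l.committed += actual
}

// Release drops the reservation after a failed call; nothing is charged.
func (l *ledger) Release(est int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= est
}

func main() {
	l := &ledger{ceiling: 100}
	fmt.Println(l.Reserve(60)) // admitted: 0 < 100
	fmt.Println(l.Reserve(60)) // admitted: 60 < 100, total may overshoot
	fmt.Println(l.Reserve(60)) // refused: 120 >= 100
	l.Settle(60, 10)           // first call cost far less than estimated
	l.Release(60)              // second call failed
	fmt.Println(l.committed, l.reserved)
}
```

In a panic-safe caller, Release would sit in a deferred guard that runs unless Settle has already happened.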

Current prod config (the cheap web path)

  • WEB_PROVIDER=gemini_grounding: Gemini 2.5 Flash-Lite does the fetch via the native v1beta google_search tool (NOT the OpenAI-compat endpoint, which silently ignores grounding; F-EXT-3), then Grok-4.3 voices it. ~$0.0013/query (vs ~$0.022 for the old two-Grok path); grounding is free within the daily requests-per-day (RPD) quota, guarded by WEB_GROUNDING_DAILY_CAP.
  • XAI_MODEL=grok-4.3
  • GROK_REASONING_EFFORT=none (4.3 otherwise reasons on every reply).

Full flag table in the README.

Trigger hygiene (what reaches the search query)

The raw event body is cleaned once at the top of respond (bot.go, stripBotMention(stripReplyFallback(...))) before it is used as the web-search query, the prompt trigger, the buffer entry, or telemetry. Two egress hazards both rode the raw body: the bot's own mention pill fallback (cinny writes the full mxid @ai:vojo.chat into the plain body), and the rich-reply quoted parent. The mxid was the worse one: sent verbatim to Gemini grounding, it made the provider treat vojo.chat as the subject entity ("was the Vojo.chat messenger removed?") and confabulate a confident wrong answer; the same question without the mention (e.g. in a DM, which has no mention) grounded correctly. Mention detection is unaffected: it runs upstream on m.mentions/replyParentIsBot (mentions.go), not on body text. The human display name is deliberately not stripped, so "что умеет Vojo AI" (Russian for "what can Vojo AI do") survives.
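A minimal sketch of the cleaning step, assuming the usual Matrix plain-text reply fallback (leading lines that start with ">" followed by one blank line). The function names match those mentioned above, but the bodies are guesses, and cleanTrigger plus the punctuation and whitespace tidying are additions for the example.

```go
package main

import (
	"fmt"
	"strings"
)

const botMXID = "@ai:vojo.chat"

// stripReplyFallback drops the quoted parent that rich replies prepend to
// the plain body: leading ">" lines, then the single blank separator line.
func stripReplyFallback(body string) string {
	if !strings.HasPrefix(body, "> ") {
		return body
	}
	lines := strings.Split(body, "\n")
	i := 0
	for i < len(lines) && strings.HasPrefix(lines[i], ">") {
		i++
	}
	if i < len(lines) && lines[i] == "" {
		i++
	}
	return strings.Join(lines[i:], "\n")
}

// stripBotMention removes the bot's full mxid and tidies what is left.
// Note that it also collapses newlines into single spaces. The display
// name ("Vojo AI") is intentionally left alone.
func stripBotMention(body string) string {
	body = strings.ReplaceAll(body, botMXID, "")
	body = strings.Join(strings.Fields(body), " ")
	return strings.TrimSpace(strings.TrimLeft(body, ":, "))
}

// cleanTrigger applies both steps in the order the text describes.
func cleanTrigger(body string) string {
	return stripBotMention(stripReplyFallback(body))
}

func main() {
	raw := "> <@bob:vojo.chat> earlier message\n\n@ai:vojo.chat: was the app removed?"
	fmt.Printf("%q\n", cleanTrigger(raw))
	fmt.Printf("%q\n", cleanTrigger("what can Vojo AI do"))
}
```

Stripping the quote first matters: the quoted parent can itself contain mxids, and it should not reach the search query at all.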

Web answers append a compact, deduped sources line, Источники: [rbc.ru](…), … (Russian for "Sources"), built server-side after Grok's prose (sources.go sourcesFooter), never via the Grok prompt (the synth note still says "no URLs or links"; instructing Grok to cite made it paste ugly redirects and mis-attribute them). The label is the publisher domain (web.title); the link is the citation's URL. For gemini_grounding that is the opaque grounding-api-redirect URL, which the end user clicks to reach the real article. Gemini Grounding terms (verified against ai.google.dev/gemini-api/terms) constrain this: the redirect must not be resolved server-side (no "programmatic/automated access to Grounded Results"), and a strict reading also requires showing the Search-Suggestions chip (searchEntryPoint.renderedContent, HTML/CSS), which a sanitised Matrix bubble can't render, so that part stays unmet (pre-existing gap; the bot already shows grounded prose without it). The footer is appended to the sent message only, not the buffered turn: the redirect links are ephemeral, so they must not pollute the history that feeds later prompts. grok_web_search returns real publisher URLs (no Google display ToS), so switching WEB_PROVIDER is the path to true article links, at ~17× the cost.
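One way the footer could be assembled, as a sketch. The Citation type, deduplication by label, and the example.test URLs are assumptions, and the real sourcesFooter may differ. The demo also shows the footer going into the sent text only.

```go
package main

import (
	"fmt"
	"strings"
)

// Citation is a hypothetical shape: Title is the publisher domain used as
// the link label, URL is passed through untouched (never resolved here).
type Citation struct{ Title, URL string }

// sourcesFooter renders a deduped markdown link list, or "" if there is
// nothing to cite. Deduplication by label is an assumption of this sketch.
func sourcesFooter(cs []Citation) string {
	seen := make(map[string]bool)
	var parts []string
	for _, c := range cs {
		if c.Title == "" || c.URL == "" || seen[c.Title] {
			continue
		}
		seen[c.Title] = true
		parts = append(parts, fmt.Sprintf("[%s](%s)", c.Title, c.URL))
	}
	if len(parts) == 0 {
		return ""
	}
	return "Источники: " + strings.Join(parts, ", ")
}

func main() {
	prose := "Grok's answer text."
	footer := sourcesFooter([]Citation{
		{"rbc.ru", "https://example.test/redirect/1"},
		{"rbc.ru", "https://example.test/redirect/2"},
		{"tass.ru", "https://example.test/redirect/3"},
	})

	sent := prose
	if footer != "" {
		sent += "\n\n" + footer
	}
	buffered := prose // history keeps the prose only, without ephemeral links

	fmt.Println(sent)
	fmt.Println(buffered)
}
```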

Observability (logs + per-request trace)

log/slog to stderr (LOG_LEVEL, LOG_FORMAT=text|json). A context-aware handler (logging.go) stamps a per-request trace_id onto every log line; the id is minted once per handled event in handleEvent (trace.go) and carried in ctx down to the model HTTP call, so one trace_id greps the whole request trail (the userver idiom; the id is OTel-trace-id shaped for a future exporter). Routing diagnostics (route decided / generation outcome) are DEBUG, content-free. Full model request/response bodies are gated by a per-user allowlist LOG_BODIES_USERS (empty = nobody) and LOG_LEVEL=debug, truncated to a fixed ~4 KB cap, with URL/headers (the API key) never logged; this is decided once at admission via a verbose flag in ctx, read by the dumb transport. This is the debug path; request_log (TELEMETRY_*) is the separate analytics path. The two correlate via trace_id/event_id but are independent. Ship the JSON log stream to OpenSearch/Loki with a collector (Fluent Bit/Vector); the bot never talks to a log backend. Full flag table in the README.
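A context-aware slog handler of the kind described can be sketched as a thin wrapper around any slog.Handler. The names (traceKey, withTraceID, traceHandler, newLogger, render) are hypothetical, and dropping the time attribute is only there to keep the demo output short; the id is 16 random bytes in hex, matching the OTel trace-id shape.

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"io"
	"log/slog"
	"strings"
)

type traceKey struct{}

// newTraceID returns 32 lowercase hex characters (16 random bytes).
func newTraceID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b[:])
}

func withTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceKey{}, id)
}

// traceHandler adds trace_id from ctx to every record it passes through.
type traceHandler struct{ slog.Handler }

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceKey{}).(string); ok {
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap so derived loggers keep the stamping.
func (h traceHandler) WithAttrs(a []slog.Attr) slog.Handler {
	return traceHandler{h.Handler.WithAttrs(a)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{h.Handler.WithGroup(name)}
}

func newLogger(w io.Writer) *slog.Logger {
	opts := &slog.HandlerOptions{
		// Drop the timestamp so the demo output is stable.
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if len(groups) == 0 && a.Key == slog.TimeKey {
				return slog.Attr{}
			}
			return a
		},
	}
	return slog.New(traceHandler{slog.NewTextHandler(w, opts)})
}

// render logs one line under the given trace id and returns the output.
func render(traceID, msg string) string {
	var sb strings.Builder
	ctx := withTraceID(context.Background(), traceID)
	newLogger(&sb).InfoContext(ctx, msg)
	return sb.String()
}

func main() {
	fmt.Print(render(newTraceID(), "route decided"))
}
```

The stamping only works for the context-taking log calls (InfoContext, DebugContext and so on), so the ctx has to be threaded through to every call site.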

Building / testing

Go toolchain lives at /home/ubuntu/.go-toolchain/go/bin (NOT on PATH). Store-backed tests need AI_BOT_TEST_DATABASE_URL (a throwaway Postgres) and skip without it, so go test ./... stays green on a machine without one. Keep gofmt -l, go vet ./..., go test -race ./... clean.