

Vojo AI bot (@ai:vojo.chat)

A Go Synapse application service in apps/ai-bot/ — not a normal bot user. Answers @-mentions in groups and every message in 1:1s, over the plaintext CS-API (Vojo rooms are unencrypted by default). It is a separate server-side service deployed next to Synapse; it ships nothing to the web client.

  • Operator / full env reference: apps/ai-bot/README.md (config tables, setup, deploy).
  • Deploy / server config: server-side.md (the ai-bot service row, the vojo_ai Postgres role).
  • Detailed design SOT: docs/plans/grok_bot.md + docs/plans/ai_backend_build_plan.md (local-only; docs/plans/ is gitignored).

Request flow

Synapse pushes a transaction → the bot acks 200 instantly, then processes async per-room (appservice.go), so a slow model call never blocks other rooms or the homeserver. handleMessage (bot.go) gates in order: durable+in-memory dedup → encrypted-room skip → decode / edit / own-message / notice → foreign-server leave → DM-or-mention → media react → resolve conversation (thread) → per-(room,thread) single-flight → spawn respond. respond = Reserve(estimate) → generate() → Settle(actual) → sendReply; any failure produces an emoji react, never silence.
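The ack-then-process step above can be sketched as follows. This is a minimal illustration, not the code in appservice.go: the `dispatcher` type, its method names, the queue size of 64, and the trimmed `event` shape are all assumptions made for the example. It only shows the idea that the HTTP handler returns 200 as soon as events are queued, while one worker per room does the slow work.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// event is a trimmed-down Matrix event: only the fields this sketch needs.
type event struct {
	RoomID  string `json:"room_id"`
	Content struct {
		Body string `json:"body"`
	} `json:"content"`
}

// transaction mirrors the {"events": [...]} body of an appservice push.
type transaction struct {
	Events []event `json:"events"`
}

// dispatcher (hypothetical) owns one queue and one worker goroutine per room,
// so a slow handler in one room cannot delay another room or the HTTP ack.
type dispatcher struct {
	mu     sync.Mutex
	queues map[string]chan event
	wg     sync.WaitGroup
	handle func(event)
}

func newDispatcher(handle func(event)) *dispatcher {
	return &dispatcher{queues: make(map[string]chan event), handle: handle}
}

// enqueue hands ev to its room's worker, starting the worker on first use.
func (d *dispatcher) enqueue(ev event) {
	d.mu.Lock()
	q, ok := d.queues[ev.RoomID]
	if !ok {
		q = make(chan event, 64) // assumed size; a full queue would block the ack
		d.queues[ev.RoomID] = q
		d.wg.Add(1)
		go func() {
			defer d.wg.Done()
			for e := range q {
				d.handle(e)
			}
		}()
	}
	d.mu.Unlock()
	q <- ev
}

// closeAndWait drains every queue. Demo-only shutdown: enqueue must not be
// called afterwards.
func (d *dispatcher) closeAndWait() {
	d.mu.Lock()
	for _, q := range d.queues {
		close(q)
	}
	d.mu.Unlock()
	d.wg.Wait()
}

// ServeHTTP queues the transaction's events and acks with 200 right away.
func (d *dispatcher) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	var txn transaction
	if err := json.NewDecoder(r.Body).Decode(&txn); err != nil {
		http.Error(w, "bad transaction", http.StatusBadRequest)
		return
	}
	for _, ev := range txn.Events {
		d.enqueue(ev)
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	fmt.Fprint(w, "{}")
}

func main() {
	d := newDispatcher(func(ev event) {
		fmt.Printf("handled %q in %s\n", ev.Content.Body, ev.RoomID)
	})
	body := `{"events":[{"room_id":"!a:vojo.chat","content":{"body":"hi"}},` +
		`{"room_id":"!b:vojo.chat","content":{"body":"yo"}}]}`
	req := httptest.NewRequest(http.MethodPut, "/_matrix/app/v1/transactions/1", strings.NewReader(body))
	rec := httptest.NewRecorder()
	d.ServeHTTP(rec, req)
	fmt.Println("ack status:", rec.Code)
	d.closeAndWait()
}
```

Events for the same room are handled in arrival order because a single goroutine drains each queue; events for different rooms run concurrently.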

Conversations (threads) — ChatGPT-style multi-chat

In a 1:1 DM a top-level message roots a new thread (a fresh conversation) and the bot answers inside it (bot.go resolveThreadRoot); a message already in a thread continues it (F27). Groups are never auto-threaded — the gate is structural (isDM), not a flag, so the threading feature can never change group behavior. Auto-threading in DMs is always on (the old THREAD_CONVERSATIONS env flag was removed — it only created a host/backend mismatch footgun). Context and single-flight are keyed per-(room, thread) so conversations neither share history nor block each other; typing is room-level (Matrix has no per-thread typing) via a refcount; per-thread context buffers are LRU-bounded (maxConvBuffersPerRoom).
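A minimal sketch of the thread-resolution rule described above. Only the name resolveThreadRoot comes from bot.go; the signature, the `incoming` and `convKey` types, and the choice to keep replying inside an existing thread in a group are assumptions made for illustration.

```go
package main

import "fmt"

// incoming is a hypothetical, trimmed view of a received message.
type incoming struct {
	RoomID     string
	EventID    string
	ThreadRoot string // non-empty when the message already belongs to a thread
}

// convKey is the (room, thread) pair used to key context and single-flight.
type convKey struct {
	RoomID     string
	ThreadRoot string
}

// resolveThreadRoot returns the thread the reply belongs to, or "" for the
// main timeline. The isDM check is structural: groups never start a thread.
func resolveThreadRoot(msg incoming, isDM bool) string {
	if msg.ThreadRoot != "" {
		return msg.ThreadRoot // continue the existing conversation
	}
	if isDM {
		return msg.EventID // a top-level DM message roots a new conversation
	}
	return "" // group, top-level: never auto-threaded
}

// keyFor builds the per-conversation key for a message.
func keyFor(msg incoming, isDM bool) convKey {
	return convKey{RoomID: msg.RoomID, ThreadRoot: resolveThreadRoot(msg, isDM)}
}

func main() {
	first := incoming{RoomID: "!dm:vojo.chat", EventID: "$e1"}
	followUp := incoming{RoomID: "!dm:vojo.chat", EventID: "$e2", ThreadRoot: "$e1"}
	other := incoming{RoomID: "!dm:vojo.chat", EventID: "$e3"}
	group := incoming{RoomID: "!grp:vojo.chat", EventID: "$e4"}

	fmt.Println(keyFor(first, true) == keyFor(followUp, true)) // same conversation
	fmt.Println(keyFor(first, true) == keyFor(other, true))    // separate conversations
	fmt.Println(resolveThreadRoot(group, false) == "")         // group stays unthreaded
}
```

Because `convKey` is a comparable struct, it can be used directly as a map key for per-conversation context buffers and single-flight state.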

Host pairing: the cinny host shows the conversation surface for a bot only when its config.json preset has experience.type: "ai-chat" (today only @ai). That surface is a fully isolated, native in-client chat (features/bots/BotConversations + AiChatHeader + AiChatMenu, reusing the generic ThreadDrawer/RoomInput) — it shares no runtime with the bridge widget pipeline (no BotShell/iframe, no show-chat toggle; there is no vojo-ai widget any more). Bridges keep experience.type: "matrix-widget" (iframe + show-chat fallback). Because the backend now always threads DMs, any DM message @ai answers lands in a thread the host can open.

Cascade (flag-gated "operator cascade", every layer default OFF)

generate() (cascade.go) routes (router.go) then dispatches; any layer off or failing degrades to grok_direct (never an error to the user):

  • grok_direct — DEFAULT, one Grok call. Grok is the final voice on everything substantive.
  • trivial_direct — greetings/acks → cheap Gemini (TRIVIAL_OFFLOAD_ENABLED).
  • web_then_grok — fresh facts: a WebProvider fetches a grounded digest + citations, then Grok synthesises the answer in voice (web.go).
  • reason_then_grok — manual trigger ("подумай глубже") → Grok at a higher reasoning_effort.
  • Router = free Layer-0 regex + optional Layer-1 Gemini classifier; a confidence floor keeps uncertain cases on the safe floor (grok_direct).
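The routing rules in the list above can be sketched roughly as below. This is an illustration under stated assumptions, not the logic in router.go: the regex patterns, the `Flags` struct, the `Classifier` signature, and the 0.7 floor are hypothetical; only the route names and the fall-back-to-grok_direct behaviour come from the text.

```go
package main

import (
	"fmt"
	"regexp"
)

type Route string

const (
	GrokDirect     Route = "grok_direct"
	TrivialDirect  Route = "trivial_direct"
	WebThenGrok    Route = "web_then_grok"
	ReasonThenGrok Route = "reason_then_grok"
)

// Flags is a hypothetical view of the per-layer feature flags (all default off).
type Flags struct {
	Trivial, Web, Reason bool
}

// Hypothetical Layer-0 patterns; the real ones are not shown in this document.
var (
	reasonRe  = regexp.MustCompile(`(?i)подумай глубже`)
	trivialRe = regexp.MustCompile(`(?i)^\s*(hi|hello|thanks|ok|привет|спасибо)[\s!.]*$`)
)

// Classifier is an assumed shape for the optional Layer-1 model call.
// A nil Classifier means Layer-1 is off.
type Classifier func(text string) (Route, float64)

// enabled reports whether the layer behind a route is switched on.
func enabled(r Route, f Flags) bool {
	switch r {
	case TrivialDirect:
		return f.Trivial
	case WebThenGrok:
		return f.Web
	case ReasonThenGrok:
		return f.Reason
	}
	return true // grok_direct is always available
}

// pickRoute tries the free regex layer first, then the classifier, and falls
// back to grok_direct when confidence is below the floor or the layer is off.
func pickRoute(text string, f Flags, classify Classifier, floor float64) Route {
	candidate := GrokDirect
	switch {
	case reasonRe.MatchString(text):
		candidate = ReasonThenGrok
	case trivialRe.MatchString(text):
		candidate = TrivialDirect
	case classify != nil:
		if r, conf := classify(text); conf >= floor {
			candidate = r
		}
	}
	if !enabled(candidate, f) {
		return GrokDirect
	}
	return candidate
}

func main() {
	all := Flags{Trivial: true, Web: true, Reason: true}
	unsure := func(string) (Route, float64) { return WebThenGrok, 0.4 }

	fmt.Println(pickRoute("hi!", Flags{}, nil, 0.7))
	fmt.Println(pickRoute("hi!", all, nil, 0.7))
	fmt.Println(pickRoute("what changed in the latest release?", all, unsure, 0.7))
}
```

With every flag off, `pickRoute` can only return `grok_direct`, which is the routing half of the invariant stated next.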

Invariant: all cascade flags OFF == today's bot — a single grok_direct call, byte-identical wire body. Do not enable layers in prod until the offline-eval gate (build plan §9) passes.

Provider seam (no vendor names in business logic)

llm.go (Message/Usage/LLMRequest/LLMResponse/LLMClient) + httpllm.go (shared OpenAI-compatible transport + retry) + thin adapters provider_xai.go / provider_gemini.go + pricing.go (priceFor model→price map). Bot.llm is an LLMClient, never a concrete vendor type.
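To make the seam concrete, here is a rough sketch of how such an interface might look. The type names (Message, Usage, LLMRequest, LLMResponse, LLMClient, priceFor) come from the text above; every field, the `Complete` method, `stubClient`, `costUSD`, `newBot`, the 30-second timeout, and the "example-model" price entry are assumptions for illustration and carry no real pricing.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

type Message struct{ Role, Content string }

type Usage struct{ PromptTokens, CompletionTokens int }

type LLMRequest struct {
	Model    string
	Messages []Message
}

type LLMResponse struct {
	Text  string
	Usage Usage
}

// LLMClient is the only thing business logic sees; adapters implement it.
type LLMClient interface {
	Complete(ctx context.Context, req LLMRequest) (LLMResponse, error)
}

// price is USD per million tokens (assumed unit).
type price struct{ InPerM, OutPerM float64 }

// Placeholder table: "example-model" and its numbers are not real prices.
var prices = map[string]price{
	"example-model": {InPerM: 2, OutPerM: 10},
}

func priceFor(model string) (price, bool) {
	p, ok := prices[model]
	return p, ok
}

// costUSD converts token usage into dollars using the price table.
func costUSD(model string, u Usage) (float64, error) {
	p, ok := priceFor(model)
	if !ok {
		return 0, errors.New("no price for model " + model)
	}
	return float64(u.PromptTokens)/1e6*p.InPerM + float64(u.CompletionTokens)/1e6*p.OutPerM, nil
}

// stubClient is a fake adapter standing in for a vendor-specific one.
type stubClient struct{ reply string }

func (c stubClient) Complete(ctx context.Context, req LLMRequest) (LLMResponse, error) {
	if err := ctx.Err(); err != nil {
		return LLMResponse{}, err
	}
	return LLMResponse{Text: c.reply, Usage: Usage{PromptTokens: 1000, CompletionTokens: 500}}, nil
}

// Bot depends on the interface only, never on a concrete vendor type.
type Bot struct {
	llm     LLMClient
	model   string
	timeout time.Duration
}

func newBot(llm LLMClient) *Bot {
	return &Bot{llm: llm, model: "example-model", timeout: 30 * time.Second}
}

// Answer makes one call under a single deadline and prices the result.
func (b *Bot) Answer(prompt string) (string, float64, error) {
	ctx, cancel := context.WithTimeout(context.Background(), b.timeout)
	defer cancel()
	resp, err := b.llm.Complete(ctx, LLMRequest{
		Model:    b.model,
		Messages: []Message{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return "", 0, err
	}
	cost, err := costUSD(b.model, resp.Usage)
	return resp.Text, cost, err
}

func main() {
	text, cost, err := newBot(stubClient{reply: "pong"}).Answer("ping")
	fmt.Println(text, cost, err)
}
```

Swapping vendors in this shape means passing a different `LLMClient` to `newBot`; nothing in `Bot` changes.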

Money, invariants & store (store.go)

  • Ceiling is TOCTOU-safe: Reserve books a route's estimated max-cost into reserved_usd under a per-day global advisory lock; the gate counts committed + reserved spend; Settle releases the reservation and books the real per-component CostBreakdown. A concurrent burst overshoots by at most one reservation.
  • Never charge for silence: a 2xx is billed; if the reply then fails to send, refund the request SLOT (not the USD) + react. A failed call releases the reservation + refunds the slot; a panic releases via a deferred guard.
  • Caps: DAILY_USD_CEILING (global $), PER_USER_DAILY_CAP (requests/user), PER_USER_DAILY_USD (optional $/user). At-most-once dedup is durable (SeenEvent/MarkTxn); generation is per-(room,thread) single-flight.
  • One overall per-request deadline bounds the whole cascade (no per-stage 3×60s accretion).
  • Telemetry: one request_log row per engaged request (route, per-component $, latency, degrade reasons), written async + isolated (its failure never drops a reply), TELEMETRY_ENABLED default off, time-based retention.
  • Store: dedicated Postgres vojo_ai (pgx); schema is an ordered migrations array in store.go. Operational state only (dedup, spend ledger, grounding cap, request_log, warned-encrypted) — no message content (that lives in Synapse).
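The Reserve/Settle accounting described in the list above can be illustrated with an in-memory stand-in. This is a sketch under assumptions, not store.go: a mutex replaces the per-day Postgres advisory lock, amounts are integer micro-USD to avoid float drift, and the `Ledger`, `Release`, `Totals`, and `respond` shapes are hypothetical. The gate here admits a request while committed + reserved is still below the ceiling, which is one reading of "overshoots by at most one reservation".

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrCeiling = errors.New("daily USD ceiling reached")

// Ledger tracks spend in micro-USD. The mutex stands in for the advisory lock.
type Ledger struct {
	mu        sync.Mutex
	ceiling   int64
	committed int64
	reserved  int64
}

// Reserve books an estimate if committed+reserved spend is below the ceiling.
func (l *Ledger) Reserve(estimate int64) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.committed+l.reserved >= l.ceiling {
		return ErrCeiling
	}
	l.reserved += estimate
	return nil
}

// Settle swaps the reservation for the actual cost.
func (l *Ledger) Settle(estimate, actual int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
	l.committed += actual
}

// Release drops a reservation without charging anything.
func (l *Ledger) Release(estimate int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
}

// Totals returns the current committed and reserved amounts.
func (l *Ledger) Totals() (committed, reserved int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.committed, l.reserved
}

// respond reserves, generates, and settles. The deferred guard releases the
// reservation when generate fails or panics, so nothing stays booked.
func respond(l *Ledger, estimate int64, generate func() (int64, error)) error {
	if err := l.Reserve(estimate); err != nil {
		return err
	}
	settled := false
	defer func() {
		if !settled {
			l.Release(estimate)
		}
	}()
	actual, err := generate()
	if err != nil {
		return err
	}
	l.Settle(estimate, actual)
	settled = true
	return nil
}

func main() {
	l := &Ledger{ceiling: 1000}

	err := respond(l, 600, func() (int64, error) { return 250, nil })
	c, r := l.Totals()
	fmt.Println("success:", err, c, r)

	err = respond(l, 600, func() (int64, error) { return 0, errors.New("upstream failed") })
	c, r = l.Totals()
	fmt.Println("failure:", err, c, r)
}
```

In the real store the same check-and-book step would have to run inside one transaction under the advisory lock; the mutex only models that mutual exclusion within a single process.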

Current prod config (the cheap web path)

WEB_PROVIDER=gemini_grounding: Gemini 2.5 Flash-Lite does the fetch via the native v1beta google_search tool (NOT the OpenAI-compat endpoint; grounding is silently ignored there, F-EXT-3), then Grok-4.3 voices it. ~$0.0013/query (vs ~$0.022 for the old two-Grok path); grounding is free under the daily requests-per-day (RPD) quota, guarded by WEB_GROUNDING_DAILY_CAP. XAI_MODEL=grok-4.3 + GROK_REASONING_EFFORT=none (4.3 otherwise reasons on every reply). Full flag table in the README.

Building / testing

Go toolchain lives at /home/ubuntu/.go-toolchain/go/bin (NOT on PATH). Store-backed tests need AI_BOT_TEST_DATABASE_URL (a throwaway Postgres) and skip without it, so go test ./... stays green on a machine without one. Keep gofmt -l, go vet ./..., go test -race ./... clean.