Vojo AI bot (@ai:vojo.chat)

A Go Synapse application service in apps/ai-bot/ — not a normal bot user. Answers @-mentions in groups and every message in 1:1s, over the plaintext CS-API (Vojo rooms are unencrypted by default). It is a separate server-side service deployed next to Synapse; it ships nothing to the web client.

  • Operator / full env reference: apps/ai-bot/README.md (config tables, setup, deploy).
  • Deploy / server config: server-side.md (the ai-bot service row, the vojo_ai Postgres role).
  • Detailed design SOT: docs/plans/grok_bot.md + docs/plans/ai_backend_build_plan.md (local-only; docs/plans/ is gitignored).

Request flow

Synapse pushes a transaction → the bot acks 200 instantly, then processes async per-room (appservice.go), so a slow model call never blocks other rooms or the homeserver. handleMessage (bot.go) gates in order: durable+in-memory dedup → encrypted-room skip → decode / edit / own-message / notice → foreign-server leave → DM-or-mention → media react → per-room single-flight → spawn respond. respond = Reserve(estimate) → generate() → Settle(actual) → sendReply; any failure produces an emoji react, never silence.
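The "ack first, then process per room" idea can be sketched as a small per-room queue. This is a minimal illustration, not the code in appservice.go: roomQueues, enqueue, wait and the buffer size of 64 are all hypothetical names and values. A transaction handler would call enqueue for each event and then write the 200 without waiting for the tasks.

```go
package main

import (
	"fmt"
	"sync"
)

// roomQueues is a hypothetical helper: tasks for one room run in order on that
// room's own goroutine, so a slow task only delays its own room.
type roomQueues struct {
	mu     sync.Mutex
	queues map[string]chan func()
	wg     sync.WaitGroup
}

func newRoomQueues() *roomQueues {
	return &roomQueues{queues: make(map[string]chan func())}
}

// enqueue hands a task to the room's worker and returns without running it.
// It only blocks if that room's buffer (64 here, an arbitrary choice) is full.
func (q *roomQueues) enqueue(roomID string, task func()) {
	q.mu.Lock()
	ch, ok := q.queues[roomID]
	if !ok {
		ch = make(chan func(), 64)
		q.queues[roomID] = ch
		go func() {
			for t := range ch {
				t()
				q.wg.Done()
			}
		}()
	}
	q.wg.Add(1)
	q.mu.Unlock()
	ch <- task
}

// wait blocks until every queued task has finished (handy in tests or on shutdown).
func (q *roomQueues) wait() { q.wg.Wait() }

func main() {
	q := newRoomQueues()
	var order []string // only touched by the "!a" worker, then read after wait()
	for _, ev := range []string{"$1", "$2", "$3"} {
		ev := ev
		q.enqueue("!a:vojo.chat", func() { order = append(order, ev) })
	}
	q.enqueue("!b:vojo.chat", func() {}) // independent of room !a
	q.wait()
	fmt.Println(order) // prints [$1 $2 $3]
}
```

The sketch never closes idle room channels; a real service would likely reap them or use a different per-room scheduling scheme.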

Cascade (flag-gated "operator cascade", every layer default OFF)

generate() (cascade.go) routes (router.go) then dispatches; any layer off or failing degrades to grok_direct (never an error to the user):

  • grok_direct — DEFAULT, one Grok call. Grok is the final voice on everything substantive.
  • trivial_direct — greetings/acks → cheap Gemini (TRIVIAL_OFFLOAD_ENABLED).
  • web_then_grok — fresh facts: a WebProvider fetches a grounded digest + citations, then Grok synthesises the answer in voice (web.go).
  • reason_then_grok — manual trigger ("подумай глубже") → Grok at a higher reasoning_effort.
  • Router = free Layer-0 regex + optional Layer-1 Gemini classifier; a confidence floor keeps uncertain cases on the safe floor (grok_direct).

Invariant: all cascade flags OFF == today's bot — a single grok_direct call, byte-identical wire body. Do not enable layers in prod until the offline-eval gate (build plan §9) passes.
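The "flags off or low confidence means grok_direct" rule can be illustrated with a small routing function. This is a sketch under assumptions, not the logic in router.go: Flags, Guess, pickRoute and the 0.8 floor are hypothetical, and of the flags only TRIVIAL_OFFLOAD_ENABLED is named in this document.

```go
package main

import "fmt"

type Route string

const (
	GrokDirect     Route = "grok_direct"
	TrivialDirect  Route = "trivial_direct"
	WebThenGrok    Route = "web_then_grok"
	ReasonThenGrok Route = "reason_then_grok"
)

// Flags models the per-layer switches. Field names are hypothetical; the zero
// value means "every layer OFF".
type Flags struct {
	TrivialOffload bool
	Web            bool
	Reason         bool
}

// Guess is what a classifier might return: a proposed route and its confidence.
type Guess struct {
	Route      Route
	Confidence float64
}

// pickRoute falls back to GrokDirect whenever the guess is below the floor or
// the proposed layer is disabled.
func pickRoute(g Guess, f Flags, floor float64) Route {
	if g.Confidence < floor {
		return GrokDirect
	}
	switch g.Route {
	case TrivialDirect:
		if f.TrivialOffload {
			return TrivialDirect
		}
	case WebThenGrok:
		if f.Web {
			return WebThenGrok
		}
	case ReasonThenGrok:
		if f.Reason {
			return ReasonThenGrok
		}
	}
	return GrokDirect
}

func main() {
	web := Guess{Route: WebThenGrok, Confidence: 0.99}
	unsure := Guess{Route: WebThenGrok, Confidence: 0.50}
	fmt.Println(pickRoute(web, Flags{}, 0.8))             // grok_direct (layer off)
	fmt.Println(pickRoute(web, Flags{Web: true}, 0.8))    // web_then_grok
	fmt.Println(pickRoute(unsure, Flags{Web: true}, 0.8)) // grok_direct (below floor)
}
```

With the zero-value Flags, every input maps to grok_direct, which is the shape of the invariant stated above.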

Provider seam (no vendor names in business logic)

llm.go (Message/Usage/LLMRequest/LLMResponse/LLMClient) + httpllm.go (shared OpenAI-compatible transport + retry) + thin adapters provider_xai.go / provider_gemini.go + pricing.go (priceFor model→price map). Bot.llm is an LLMClient, never a concrete vendor type.
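To make the seam concrete, here is a rough sketch of what such an interface could look like. The type names come from the list above, but every field, the Complete method, echoClient and the answer helper are assumptions for illustration; the real definitions live in llm.go and may differ.

```go
package main

import (
	"context"
	"fmt"
)

// Field names below are guesses; only the type names appear in this document.
type Message struct {
	Role    string
	Content string
}

type Usage struct {
	PromptTokens     int
	CompletionTokens int
}

type LLMRequest struct {
	Model    string
	Messages []Message
}

type LLMResponse struct {
	Text  string
	Usage Usage
}

// LLMClient is the only thing business logic sees. Complete is a hypothetical
// method name.
type LLMClient interface {
	Complete(ctx context.Context, req LLMRequest) (LLMResponse, error)
}

// echoClient is a fake vendor adapter, standing in for a real provider.
type echoClient struct{}

func (echoClient) Complete(_ context.Context, req LLMRequest) (LLMResponse, error) {
	if len(req.Messages) == 0 {
		return LLMResponse{}, fmt.Errorf("no messages")
	}
	last := req.Messages[len(req.Messages)-1]
	return LLMResponse{Text: "echo: " + last.Content}, nil
}

// Bot depends on the interface, never on a concrete vendor type.
type Bot struct{ llm LLMClient }

func (b *Bot) answer(ctx context.Context, prompt string) (string, error) {
	resp, err := b.llm.Complete(ctx, LLMRequest{
		Model:    "any-model",
		Messages: []Message{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return "", err
	}
	return resp.Text, nil
}

// demo wires the fake adapter into the bot.
func demo() (string, error) {
	bot := &Bot{llm: echoClient{}}
	return bot.answer(context.Background(), "hi")
}

func main() {
	text, err := demo()
	fmt.Println(text, err) // prints: echo: hi <nil>
}
```

Swapping vendors then means passing a different LLMClient to Bot; tests can use a fake like echoClient without network access.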

Money, invariants & store (store.go)

  • Ceiling is TOCTOU-safe: Reserve books a route's estimated max-cost into reserved_usd under a per-day global advisory lock; the gate counts committed + reserved spend; Settle releases the reservation and books the real per-component CostBreakdown. A concurrent burst overshoots by at most one reservation.
  • Never charge for silence: a 2xx is billed; if the reply then fails to send, refund the request SLOT (not the USD) + react. A failed call releases the reservation + refunds the slot; a panic releases via a deferred guard.
  • Caps: DAILY_USD_CEILING (global $), PER_USER_DAILY_CAP (requests/user), PER_USER_DAILY_USD (optional $/user). At-most-once dedup is durable (SeenEvent/MarkTxn); generation is per-room single-flight.
  • One overall per-request deadline bounds the whole cascade (no per-stage 3×60s accretion).
  • Telemetry: one request_log row per engaged request (route, per-component $, latency, degrade reasons), written async + isolated (its failure never drops a reply), TELEMETRY_ENABLED default off, time-based retention.
  • Store: dedicated Postgres vojo_ai (pgx); schema is an ordered migrations array in store.go. Operational state only (dedup, spend ledger, grounding cap, request_log, warned-encrypted) — no message content (that lives in Synapse).
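The reserve-then-settle accounting from the first two bullets can be sketched in memory. This is an illustration only: the real ledger is in Postgres behind an advisory lock, whereas here a sync.Mutex plays that role, amounts are integer micro-USD, and the ledger type, the Release method and the exact gate comparison are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// ledger is a hypothetical in-memory stand-in for the spend ledger.
// Amounts are micro-USD so the arithmetic stays exact.
type ledger struct {
	mu        sync.Mutex // stands in for the per-day advisory lock
	ceiling   int64
	committed int64
	reserved  int64
}

// Reserve admits a request only while committed+reserved is below the ceiling,
// then books the estimate. Check and booking happen under one lock, so two
// concurrent callers cannot both pass on the same stale total.
func (l *ledger) Reserve(estimate int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.committed+l.reserved >= l.ceiling {
		return false
	}
	l.reserved += estimate
	return true
}

// Settle releases the reservation and books the real cost.
func (l *ledger) Settle(estimate, actual int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
	l.committed += actual
}

// Release drops a reservation without booking spend (failed call or panic guard).
func (l *ledger) Release(estimate int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
}

func main() {
	l := &ledger{ceiling: 100}
	fmt.Println(l.Reserve(60)) // true: nothing booked yet
	fmt.Println(l.Reserve(60)) // true: 60 < 100, the one allowed overshoot
	fmt.Println(l.Reserve(60)) // false: 120 >= 100
	l.Settle(60, 10)           // first call cost less than estimated
	l.Release(60)              // second call failed
	fmt.Println(l.Reserve(60)) // true: only 10 committed now
}
```

Because the gate counts reservations as well as committed spend, the worst case in this sketch is one reservation admitted just under the ceiling, which matches the "at most one reservation" overshoot described above.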

Current prod config (the cheap web path)

  • WEB_PROVIDER=gemini_grounding: Gemini 2.5 Flash-Lite does the fetch via the native v1beta google_search tool (NOT the OpenAI-compat endpoint, where grounding is silently ignored; F-EXT-3), then Grok-4.3 voices it. ~$0.0013/query (vs ~$0.022 for the old two-Grok path); grounding is free under the daily RPD, guarded by WEB_GROUNDING_DAILY_CAP.
  • XAI_MODEL=grok-4.3
  • GROK_REASONING_EFFORT=none (4.3 otherwise reasons on every reply). Full flag table in the README.

Building / testing

Go toolchain lives at /home/ubuntu/.go-toolchain/go/bin (NOT on PATH). Store-backed tests need AI_BOT_TEST_DATABASE_URL (a throwaway Postgres) and skip without it, so go test ./... stays green on a machine without one. Keep gofmt -l, go vet ./..., go test -race ./... clean.