Vojo AI bot (`@ai:vojo.chat`)

A Go Synapse application service in apps/ai-bot/ — not a normal bot user. Answers @-mentions in groups and every message in 1:1s, over the plaintext CS-API (Vojo rooms are unencrypted by default). It is a separate server-side service deployed next to Synapse; it ships nothing to the web client.

Operator / full env reference: apps/ai-bot/README.md (config tables, setup, deploy).
Deploy / server config: server-side.md (the ai-bot service row, the vojo_ai Postgres role).
Detailed design source of truth (SOT): docs/plans/grok_bot.md + docs/plans/ai_backend_build_plan.md (local-only; docs/plans/ is gitignored).

Request flow

Synapse pushes a transaction → the bot acks 200 instantly, then processes it asynchronously per room (appservice.go), so a slow model call never blocks other rooms or the homeserver.

handleMessage (bot.go) applies its gates in this order: durable + in-memory dedup → encrypted-room skip → decode / edit / own-message / notice → foreign-server leave → DM-or-mention → media react → resolve conversation (thread) → per-(room, thread) single-flight → spawn respond.

respond = Reserve(estimate) → generate() → Settle(actual) → sendReply. Any failure produces an emoji react, never silence.
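The ack-then-process-per-room step can be sketched with one worker goroutine per room. This is a minimal sketch, not the appservice.go code. `Event`, `roomQueues`, `Enqueue` and the 64-event buffer are hypothetical names and sizes. The transaction handler only enqueues, so it can return 200 at once. Events in one room keep their arrival order, while different rooms run in parallel.

```go
package main

import "sync"

// Event is a hypothetical, trimmed-down Matrix event: just enough to route it.
type Event struct {
	RoomID  string
	EventID string
}

// roomQueues runs one worker goroutine per room: events within a room are
// handled in order, while different rooms proceed in parallel.
type roomQueues struct {
	mu     sync.Mutex
	queues map[string]chan Event
	handle func(Event)
	wg     sync.WaitGroup
}

func newRoomQueues(handle func(Event)) *roomQueues {
	return &roomQueues{queues: make(map[string]chan Event), handle: handle}
}

// Enqueue hands the event to its room's worker and returns, so the transaction
// handler can ack with 200 without waiting for any model call. It only blocks
// if that room already has 64 events waiting.
func (q *roomQueues) Enqueue(ev Event) {
	q.mu.Lock()
	ch, ok := q.queues[ev.RoomID]
	if !ok {
		ch = make(chan Event, 64)
		q.queues[ev.RoomID] = ch
		q.wg.Add(1)
		go q.worker(ch)
	}
	q.mu.Unlock()
	ch <- ev
}

// worker processes one room's events strictly in arrival order.
func (q *roomQueues) worker(ch chan Event) {
	defer q.wg.Done()
	for ev := range ch {
		q.handle(ev)
	}
}

// Close drains every queue and waits for the workers. Call it only after the
// last Enqueue has returned.
func (q *roomQueues) Close() {
	q.mu.Lock()
	for _, ch := range q.queues {
		close(ch)
	}
	q.mu.Unlock()
	q.wg.Wait()
}
```

A real dispatcher would also retire idle room workers. This sketch keeps them for the life of the process.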

Conversations (threads) — ChatGPT-style multi-chat

In a 1:1 DM, a top-level message roots a new thread (a fresh conversation) and the bot answers inside it (bot.go resolveThreadRoot); a message already in a thread continues it (F27). Groups are never auto-threaded: the gate is structural (isDM), not a flag, so the threading feature can never change group behavior. Auto-threading in DMs is always on (the old THREAD_CONVERSATIONS env flag was removed because it only created a host/backend mismatch footgun). Context and single-flight are keyed per (room, thread), so conversations neither share history nor block each other. Typing is room-level (Matrix has no per-thread typing), so it is driven by a refcount. Per-thread context buffers are LRU-bounded (maxConvBuffersPerRoom).
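The keying and typing rules above can be sketched as follows. `convKey`, `conversationFor` and `convState` are hypothetical names, not the bot.go API. The sketch assumes that a group message already sitting in a thread keeps that thread; the text only says groups are never auto-threaded. Typing is toggled on the 0→1 and 1→0 transitions of a per-room count of running generations, because Matrix only has room-level typing.

```go
package main

import "sync"

// convKey identifies one conversation: a room plus its thread root
// ("" = the room's unthreaded timeline). Hypothetical name.
type convKey struct{ Room, Thread string }

// conversationFor mirrors the threading rule: continue an existing thread,
// root a new one for a top-level DM message, never auto-thread a group.
// Assumption: a group message that is already in a thread keeps that thread.
func conversationFor(room string, isDM bool, eventID, inThread string) convKey {
	switch {
	case inThread != "":
		return convKey{Room: room, Thread: inThread}
	case isDM:
		return convKey{Room: room, Thread: eventID} // the message becomes the thread root
	default:
		return convKey{Room: room}
	}
}

type convState struct {
	mu       sync.Mutex
	inFlight map[convKey]bool // single-flight: at most one generation per conversation
	typing   map[string]int   // room -> active generations (typing is per room)
}

func newConvState() *convState {
	return &convState{inFlight: map[convKey]bool{}, typing: map[string]int{}}
}

// Begin claims the conversation. ok=false means a generation is already running
// there; startTyping=true means this is the room's first active one (0 -> 1).
func (s *convState) Begin(k convKey) (ok, startTyping bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.inFlight[k] {
		return false, false
	}
	s.inFlight[k] = true
	s.typing[k.Room]++
	return true, s.typing[k.Room] == 1
}

// End releases the conversation; stopTyping=true when the room's last active
// generation finished (1 -> 0).
func (s *convState) End(k convKey) (stopTyping bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.inFlight[k] {
		return false
	}
	delete(s.inFlight, k)
	s.typing[k.Room]--
	if s.typing[k.Room] == 0 {
		delete(s.typing, k.Room)
		return true
	}
	return false
}
```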

Host pairing: the cinny host shows the conversation surface for a bot only when its config.json preset has experience.type: "ai-chat" (today only @ai). That surface is a fully isolated, native in-client chat (features/bots/BotConversations + AiChatHeader + AiChatMenu, reusing the generic ThreadDrawer/RoomInput) — it shares no runtime with the bridge widget pipeline (no BotShell/iframe, no show-chat toggle; there is no vojo-ai widget any more). Bridges keep experience.type: "matrix-widget" (iframe + show-chat fallback). Because the backend now always threads DMs, any DM message @ai answers lands in a thread the host can open.

Cascade (flag-gated "operator cascade", every layer default OFF)

generate() (cascade.go) routes (router.go) then dispatches; any layer off or failing degrades to grok_direct (never an error to the user):

grok_direct — DEFAULT, one Grok call. Grok is the final voice on everything substantive.
trivial_direct — greetings/acks → cheap Gemini (TRIVIAL_OFFLOAD_ENABLED).
web_then_grok — fresh facts: a WebProvider fetches a grounded digest + citations, then Grok synthesises the answer in voice (web.go).
reason_then_grok — manual trigger ("подумай глубже", i.e. "think deeper") → Grok at a higher reasoning_effort.
project_then_grok — questions about the Vojo product itself (PROJECT_KB_ENABLED): a curated KB (operator data from PROJECT_KB_PATH, by default the bundled prompts/vojo_kb.txt) is injected as a system note, and Grok answers product claims strictly from it. This is an anti-hallucination measure: Grok has no parametric Vojo knowledge, and the web has none either. The arm is gated by the classifier's about_project signal (the context-aware judge, which resolves follow-ups like "Про этот" ("about this one") → the app); a false positive is bounded by the entity-scoped note. It takes precedence over every web arm. It is a single Grok call, so it costs about the same as grok_direct. See docs/plans/ai_project_knowledge.md.
Router = free Layer-0 regex + optional Layer-1 Gemini classifier; a confidence floor keeps uncertain cases on the safe default (grok_direct).
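The two-layer router can be sketched as follows, under stated assumptions. The regex patterns, the 0.7 floor and the `Verdict` shape are hypothetical. So are the `Web`/`Reason`/`Classifier` flags; only TRIVIAL_OFFLOAD_ENABLED and PROJECT_KB_ENABLED come from this document. The precedence between the manual trigger, the greeting regex and the classifier is also assumed. Three behaviors are taken from the text:

- With every flag off, it always returns grok_direct.
- A failed or low-confidence classification falls back to grok_direct.
- project_then_grok wins over the web arm.

```go
package main

import (
	"regexp"
	"strings"
)

// Route names match the cascade arms described above.
type Route string

const (
	GrokDirect      Route = "grok_direct"
	TrivialDirect   Route = "trivial_direct"
	WebThenGrok     Route = "web_then_grok"
	ReasonThenGrok  Route = "reason_then_grok"
	ProjectThenGrok Route = "project_then_grok"
)

// Flags: every layer defaults to off (the zero value).
type Flags struct {
	TrivialOffload bool // TRIVIAL_OFFLOAD_ENABLED
	ProjectKB      bool // PROJECT_KB_ENABLED
	Web            bool // hypothetical stand-in for the web-provider switch
	Reason         bool // hypothetical
	Classifier     bool // hypothetical: enables the Layer-1 Gemini classifier
}

// Verdict is a hypothetical shape for the Layer-1 classifier's answer.
type Verdict struct {
	AboutProject, NeedsWeb, Trivial bool
	Confidence                      float64
}

// Layer 0: free regexes. These patterns are illustrative only.
var (
	deeperRe   = regexp.MustCompile(`(?i)подумай глубже`)
	greetingRe = regexp.MustCompile(`(?i)^(привет|спасибо|ок|hi|hello|thanks|ok)[!. ]*$`)
)

const confidenceFloor = 0.7 // hypothetical value

// decideRoute returns grok_direct whenever a layer is off, the classifier
// fails (ok=false) or its confidence is under the floor.
func decideRoute(text string, f Flags, classify func(string) (Verdict, bool)) Route {
	t := strings.TrimSpace(text)
	if f.Reason && deeperRe.MatchString(t) {
		return ReasonThenGrok
	}
	if f.TrivialOffload && greetingRe.MatchString(t) {
		return TrivialDirect
	}
	if !f.Classifier || classify == nil {
		return GrokDirect
	}
	v, ok := classify(t)
	if !ok || v.Confidence < confidenceFloor {
		return GrokDirect // uncertain or failed: stay on the safe default
	}
	switch {
	case f.ProjectKB && v.AboutProject:
		return ProjectThenGrok // product questions beat every web arm
	case f.Web && v.NeedsWeb:
		return WebThenGrok
	case f.TrivialOffload && v.Trivial:
		return TrivialDirect
	}
	return GrokDirect
}
```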

Invariant: all cascade flags OFF == today's bot — a single grok_direct call, byte-identical wire body. Do not enable layers in prod until the offline-eval gate (build plan §9) passes.

Provider seam (no vendor names in business logic)

llm.go (Message/Usage/LLMRequest/LLMResponse/LLMClient) + httpllm.go (shared OpenAI-compatible transport + retry) + thin adapters provider_xai.go / provider_gemini.go + pricing.go (priceFor model→price map). Bot.llm is an LLMClient, never a concrete vendor type.

Money, invariants & store (store.go)

Ceiling is TOCTOU-safe: Reserve books a route's estimated max-cost into reserved_usd under a per-day global advisory lock; the gate counts committed + reserved spend; Settle releases the reservation and books the real per-component CostBreakdown. A concurrent burst overshoots by at most one reservation.
Never charge for silence: a 2xx is billed; if the reply then fails to send, refund the request SLOT (not the USD) + react. A failed call releases the reservation + refunds the slot; a panic releases via a deferred guard.
Caps: DAILY_USD_CEILING (global $), PER_USER_DAILY_CAP (requests/user), PER_USER_DAILY_USD (optional $/user). At-most-once dedup is durable (SeenEvent/MarkTxn); generation is per-(room, thread) single-flight.
One overall per-request deadline bounds the whole cascade, so per-stage timeouts cannot stack up (no 3×60s accretion).
Telemetry: one request_log row per engaged request (route, per-component $, latency, degrade reasons), written async + isolated (its failure never drops a reply), TELEMETRY_ENABLED default off, time-based retention.
Store: dedicated Postgres vojo_ai (pgx); schema is an ordered migrations array in store.go. Operational state only (dedup, spend ledger, grounding cap, request_log, warned-encrypted) — no message content (that lives in Synapse).

Current prod config (the cheap web path)

WEB_PROVIDER=gemini_grounding: Gemini 2.5 Flash-Lite does the fetch via the native v1beta google_search tool (not the OpenAI-compatible endpoint, where grounding is silently ignored, F-EXT-3), then Grok-4.3 voices it. ~$0.0013/query vs ~$0.022 for the old two-Grok path (roughly 17× cheaper); grounding is free within the daily requests-per-day (RPD) quota, guarded by WEB_GROUNDING_DAILY_CAP.

XAI_MODEL=grok-4.3

GROK_REASONING_EFFORT=none (otherwise 4.3 reasons on every reply). Full flag table in the README.

Observability (logs + per-request trace)

log/slog to stderr (LOG_LEVEL, LOG_FORMAT=text|json). A context-aware handler (logging.go) stamps a per-request trace_id onto every log line. The id is minted once per handled event in handleEvent (trace.go) and carried in ctx down to the model HTTP call, so one trace_id greps the whole request trail (the userver idiom; the id is shaped like an OTel trace id for a future exporter). Routing diagnostics (route decided / generation outcome) are logged at DEBUG and are content-free.

Full model request/response bodies are logged only with LOG_LEVEL=debug and only for users on the LOG_BODIES_USERS allowlist (empty = nobody). They are truncated to a fixed ~4 KB cap, and the URL and headers (which carry the API key) are never logged. The decision is made once at admission and passed down as a verbose flag in ctx, which the dumb transport just reads.

This is the debug path; request_log (TELEMETRY_*) is the separate analytics path. The two correlate via trace_id/event_id but are independent. Ship the JSON log output to OpenSearch/Loki with a collector (Fluent Bit/Vector); the bot never talks to a log backend. Full flag table in the README.

Building / testing

Go toolchain lives at /home/ubuntu/.go-toolchain/go/bin (NOT on PATH). Store-backed tests need AI_BOT_TEST_DATABASE_URL (a throwaway Postgres) and skip without it, so go test ./... stays green on a machine without one. Keep gofmt -l, go vet ./..., go test -race ./... clean.
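The skip-without-a-database convention usually looks like this in Go. `testDSN` and `lookupTestDSN` are hypothetical helper names. The sketch assumes the bot is a single `main` package; in the repo such a helper would sit in a `_test.go` file. With the toolchain above, run the suite as e.g. `PATH=/home/ubuntu/.go-toolchain/go/bin:$PATH go test -race ./...`.

```go
package main

import (
	"os"
	"testing"
)

const testDBEnv = "AI_BOT_TEST_DATABASE_URL"

// lookupTestDSN reports the throwaway database URL, if one is configured.
func lookupTestDSN(getenv func(string) string) (string, bool) {
	dsn := getenv(testDBEnv)
	return dsn, dsn != ""
}

// testDSN returns the DSN or skips the calling test, so `go test ./...` stays
// green on machines without Postgres. Call it first in every store-backed test.
func testDSN(tb testing.TB) string {
	tb.Helper()
	dsn, ok := lookupTestDSN(os.Getenv)
	if !ok {
		tb.Skip(testDBEnv + " not set; skipping store-backed test")
	}
	return dsn
}
```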
