Vojo AI bot (@ai:vojo.chat)
A Go Synapse application service in apps/ai-bot/ — not a normal
bot user. Answers @-mentions in groups and every message in 1:1s, over the plaintext
CS-API (Vojo rooms are unencrypted by default). It is a separate server-side service
deployed next to Synapse; it ships nothing to the web client.
- Operator / full env reference: apps/ai-bot/README.md (config tables, setup, deploy).
- Deploy / server config: server-side.md (the ai-bot service row, the vojo_ai Postgres role).
- Detailed design SOT: docs/plans/grok_bot.md + docs/plans/ai_backend_build_plan.md (local-only; docs/plans/ is gitignored).
Request flow
Synapse pushes a transaction → the bot acks 200 instantly, then processes async per-room
(appservice.go), so a slow model call never blocks other
rooms or the homeserver. handleMessage (bot.go) gates in order:
durable+in-memory dedup → encrypted-room skip → decode / edit / own-message / notice →
foreign-server leave → DM-or-mention → media react → resolve conversation (thread) →
per-(room,thread) single-flight → spawn respond. respond = Reserve(estimate) →
generate() → Settle(actual) → sendReply; any failure produces an emoji react, never silence.
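The "ack instantly, then process async per-room" step can be sketched as one worker goroutine per room. This is a minimal illustration, not the code in appservice.go: `Event`, `Dispatcher`, `processAll` and the buffer size of 64 are hypothetical names and values chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical, trimmed-down stand-in for a Matrix event.
type Event struct {
	RoomID string
	ID     string
}

// Dispatcher runs one worker goroutine per room. Names and the queue size
// are illustrative assumptions, not the bot's real types.
type Dispatcher struct {
	mu     sync.Mutex
	queues map[string]chan Event
	wg     sync.WaitGroup
	handle func(Event)
}

func NewDispatcher(handle func(Event)) *Dispatcher {
	return &Dispatcher{queues: make(map[string]chan Event), handle: handle}
}

// Enqueue hands the event to its room's worker and returns, so the caller
// can ack the transaction without waiting for the model call.
// It blocks only if that room's buffer (64 here) is full.
func (d *Dispatcher) Enqueue(ev Event) {
	d.mu.Lock()
	q, ok := d.queues[ev.RoomID]
	if !ok {
		q = make(chan Event, 64)
		d.queues[ev.RoomID] = q
		d.wg.Add(1)
		go func() {
			defer d.wg.Done()
			for e := range q {
				d.handle(e) // events of one room are processed in order
			}
		}()
	}
	d.mu.Unlock()
	q <- ev
}

// Close stops accepting events and waits for every room worker to drain.
func (d *Dispatcher) Close() {
	d.mu.Lock()
	for _, q := range d.queues {
		close(q)
	}
	d.mu.Unlock()
	d.wg.Wait()
}

// processAll pushes events through a Dispatcher and returns the event IDs
// each room saw, in processing order.
func processAll(events []Event) map[string][]string {
	var mu sync.Mutex
	seen := make(map[string][]string)
	d := NewDispatcher(func(e Event) {
		mu.Lock()
		seen[e.RoomID] = append(seen[e.RoomID], e.ID)
		mu.Unlock()
	})
	for _, e := range events {
		d.Enqueue(e)
	}
	d.Close()
	return seen
}

func main() {
	got := processAll([]Event{{"!a", "$1"}, {"!b", "$2"}, {"!a", "$3"}})
	fmt.Println(got["!a"], got["!b"])
}
```

In a real handler the HTTP response would be written right after the `Enqueue` calls; a full buffer would need an explicit policy (drop, spill, or block), which this sketch leaves as blocking.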
Conversations (threads) — ChatGPT-style multi-chat
In a 1:1 DM a top-level message roots a new thread (a fresh conversation) and the bot answers
inside it (bot.go resolveThreadRoot); a message already in a thread
continues it (F27). Groups are never auto-threaded — the gate is structural (isDM), not a
flag, so the threading feature can never change group behavior. Auto-threading in DMs is always
on (the old THREAD_CONVERSATIONS env flag was removed — it only created a host/backend
mismatch footgun). Context and single-flight are keyed per-(room, thread) so conversations
neither share history nor block each other; typing is room-level (Matrix has no per-thread typing)
via a refcount; per-thread context buffers are LRU-bounded (maxConvBuffersPerRoom).
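The threading rule above reduces to a small pure function. The sketch below is an assumption-laden illustration rather than the body of resolveThreadRoot in bot.go: the `Msg` and `convKey` types, the `keyFor` helper, and the choice that a group message already inside a thread keeps that thread are all hypothetical.

```go
package main

import "fmt"

// Msg is a hypothetical minimal view of an incoming message event.
type Msg struct {
	EventID    string
	ThreadRoot string // target of the m.thread relation; empty for a top-level message
}

// convKey identifies one conversation. Context buffers and single-flight
// would both be keyed on it, so threads neither share history nor block
// each other.
type convKey struct {
	RoomID string
	Thread string // empty means the unthreaded room timeline
}

// resolveThreadRoot returns the thread the reply belongs to.
func resolveThreadRoot(isDM bool, m Msg) string {
	if m.ThreadRoot != "" {
		return m.ThreadRoot // already in a thread: continue it
	}
	if isDM {
		return m.EventID // top-level DM message roots a new conversation
	}
	return "" // structural gate: groups are never auto-threaded
}

func keyFor(roomID string, isDM bool, m Msg) convKey {
	return convKey{RoomID: roomID, Thread: resolveThreadRoot(isDM, m)}
}

func main() {
	fmt.Println(keyFor("!dm", true, Msg{EventID: "$a"}))
	fmt.Println(keyFor("!dm", true, Msg{EventID: "$b", ThreadRoot: "$a"}))
	fmt.Println(keyFor("!group", false, Msg{EventID: "$c"}))
}
```

Because the decision depends only on `isDM` and the event's own relation, there is no flag that could accidentally switch group rooms into threaded mode.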
Host pairing: the cinny host shows the conversation surface for a bot only when its
config.json preset has experience.type: "ai-chat" (today only @ai). That surface is a
fully isolated, native in-client chat (features/bots/BotConversations + AiChatHeader +
AiChatMenu, reusing the generic ThreadDrawer/RoomInput) — it shares no runtime with the
bridge widget pipeline (no BotShell/iframe, no show-chat toggle; there is no vojo-ai widget
any more). Bridges keep experience.type: "matrix-widget" (iframe + show-chat fallback). Because
the backend now always threads DMs, any DM message @ai answers lands in a thread the host can open.
Cascade (flag-gated "operator cascade", every layer default OFF)
generate() (cascade.go) routes (router.go)
then dispatches; any layer off or failing degrades to grok_direct (never an error to the user):
- grok_direct: DEFAULT, one Grok call. Grok is the final voice on everything substantive.
- trivial_direct: greetings/acks → cheap Gemini (TRIVIAL_OFFLOAD_ENABLED).
- web_then_grok: fresh facts. A WebProvider fetches a grounded digest + citations, then Grok synthesises the answer in voice (web.go).
- reason_then_grok: manual trigger ("подумай глубже", i.e. "think deeper") → Grok at a higher reasoning_effort.
- project_then_grok: questions about the Vojo product itself (PROJECT_KB_ENABLED). A curated KB (operator data from PROJECT_KB_PATH, default the bundled prompts/vojo_kb.txt) is injected as a system note and Grok answers product claims strictly from it (anti-hallucination: Grok has no parametric Vojo knowledge, and the web doesn't either). The same note carries a per-turn tone override so product answers come in a plain, matter-of-fact product register; the base persona's dry irony and "bring-your-own-take" warmth are dropped for this route (register only; the entity-scoped sourcing license and the language rule are untouched). Gated by the classifier's about_project signal (the context-aware judge; it resolves follow-ups like "Про этот", i.e. "about this one", → the app); a false positive is bounded by the entity-scoped note. Beats every web arm. One Grok call, so it costs ~the same as grok_direct. See docs/plans/ai_project_knowledge.md.
- Router = free Layer-0 regex + optional Layer-1 Gemini classifier; a confidence floor keeps uncertain cases on the safe floor (grok_direct).
Invariant: all cascade flags OFF == today's bot — a single grok_direct call, byte-identical wire body. Do not enable layers in prod until the offline-eval gate (build plan §9) passes.
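The routing rules and the all-flags-off invariant can be illustrated with a small decision function. This is a sketch under stated assumptions, not router.go: the `Flags` fields, the `Verdict` type, the greeting regex and the floor value are hypothetical, and only three of the routes are modelled.

```go
package main

import (
	"fmt"
	"regexp"
)

type Route string

const (
	GrokDirect    Route = "grok_direct"
	TrivialDirect Route = "trivial_direct"
	WebThenGrok   Route = "web_then_grok"
)

// Flags holds hypothetical per-layer switches; the zero value is "all off".
type Flags struct {
	TrivialOffload bool
	Web            bool
	Classifier     bool
}

// Verdict is a hypothetical Layer-1 classifier result.
type Verdict struct {
	Route      Route
	Confidence float64
}

// greetingRe is an illustrative Layer-0 pattern, not the bot's real regex.
var greetingRe = regexp.MustCompile(`(?i)^\s*(hi|hello|thanks|ok)[\s!.]*$`)

// enabled reports whether a route's layer is switched on.
func enabled(r Route, f Flags) bool {
	switch r {
	case GrokDirect:
		return true
	case TrivialDirect:
		return f.TrivialOffload
	case WebThenGrok:
		return f.Web
	}
	return false
}

// decide returns grok_direct unless an enabled layer claims the message
// with enough confidence. Classifier errors also degrade to grok_direct.
func decide(text string, f Flags, floor float64, classify func(string) (Verdict, error)) Route {
	if f.TrivialOffload && greetingRe.MatchString(text) {
		return TrivialDirect // Layer 0: free regex
	}
	if f.Classifier && classify != nil {
		v, err := classify(text) // Layer 1: optional model call
		if err == nil && v.Confidence >= floor && enabled(v.Route, f) {
			return v.Route
		}
	}
	return GrokDirect // safe floor
}

func main() {
	unsure := func(string) (Verdict, error) { return Verdict{WebThenGrok, 0.4}, nil }
	fmt.Println(decide("hello", Flags{}, 0.7, nil))
	fmt.Println(decide("who won today?", Flags{Web: true, Classifier: true}, 0.7, unsure))
}
```

With a zero-valued `Flags`, every path falls through to `GrokDirect`, which is the property the invariant above asks for.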
Provider seam (no vendor names in business logic)
llm.go (Message/Usage/LLMRequest/LLMResponse/LLMClient) +
httpllm.go (shared OpenAI-compatible transport + retry) + thin
adapters provider_xai.go /
provider_gemini.go + pricing.go
(priceFor model→price map). Bot.llm is an LLMClient, never a concrete vendor type.
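A rough picture of the seam, assuming field and method names that the source does not spell out: the type names and `priceFor` come from the text above, while `Complete`, the struct fields, `fakeClient`, `costUSD`, `demoAnswer` and the price figures are placeholders invented for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Field names below are assumptions; only the type names appear in the doc.
type Message struct {
	Role    string
	Content string
}

type Usage struct {
	PromptTokens     int
	CompletionTokens int
}

type LLMRequest struct {
	Model    string
	Messages []Message
}

type LLMResponse struct {
	Text  string
	Usage Usage
}

// LLMClient is the only thing business logic sees. The method name
// Complete is hypothetical.
type LLMClient interface {
	Complete(ctx context.Context, req LLMRequest) (LLMResponse, error)
}

// Bot depends on the interface, never on a concrete vendor adapter.
type Bot struct{ llm LLMClient }

func (b *Bot) answer(ctx context.Context, model, prompt string) (LLMResponse, error) {
	return b.llm.Complete(ctx, LLMRequest{
		Model:    model,
		Messages: []Message{{Role: "user", Content: prompt}},
	})
}

// fakeClient is a test double standing in for a vendor adapter.
type fakeClient struct{}

func (fakeClient) Complete(_ context.Context, req LLMRequest) (LLMResponse, error) {
	if len(req.Messages) == 0 {
		return LLMResponse{}, errors.New("no messages")
	}
	last := req.Messages[len(req.Messages)-1]
	return LLMResponse{Text: "echo: " + last.Content, Usage: Usage{PromptTokens: 3, CompletionTokens: 2}}, nil
}

// Price is USD per million tokens. The figures in prices are placeholders,
// not real vendor pricing.
type Price struct{ InPerMTok, OutPerMTok float64 }

var prices = map[string]Price{"example-model": {InPerMTok: 1.0, OutPerMTok: 2.0}}

func priceFor(model string) (Price, bool) {
	p, ok := prices[model]
	return p, ok
}

func costUSD(p Price, u Usage) float64 {
	return float64(u.PromptTokens)/1e6*p.InPerMTok + float64(u.CompletionTokens)/1e6*p.OutPerMTok
}

// demoAnswer wires the fake adapter into a Bot and asks one question.
func demoAnswer(prompt string) (string, error) {
	b := &Bot{llm: fakeClient{}}
	resp, err := b.answer(context.Background(), "example-model", prompt)
	return resp.Text, err
}

func main() {
	text, err := demoAnswer("hi")
	fmt.Println(text, err)
}
```

Swapping vendors then means writing another type that satisfies `LLMClient` and adding a row to the price map; nothing in `Bot` changes.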
Money, invariants & store (store.go)
- Ceiling is TOCTOU-safe: Reserve books a route's estimated max-cost into reserved_usd under a per-day global advisory lock; the gate counts committed + reserved spend; Settle releases the reservation and books the real per-component CostBreakdown. A concurrent burst overshoots by at most one reservation.
- Never charge for silence: a 2xx is billed; if the reply then fails to send, refund the request SLOT (not the USD) + react. A failed call releases the reservation + refunds the slot; a panic releases via a deferred guard.
- Caps: DAILY_USD_CEILING (global $), PER_USER_DAILY_CAP (requests/user), PER_USER_DAILY_USD (optional $/user). At-most-once dedup is durable (SeenEvent/MarkTxn); generation is per-(room,thread) single-flight.
- One overall per-request deadline bounds the whole cascade (no per-stage 3×60s accretion).
- Telemetry: one request_log row per engaged request (route, per-component $, latency, degrade reasons), written async + isolated (its failure never drops a reply), TELEMETRY_ENABLED default off, time-based retention.
- Store: dedicated Postgres vojo_ai (pgx); schema is an ordered migrations array in store.go. Operational state only (dedup, spend ledger, grounding cap, request_log, warned-encrypted), with no message content (that lives in Synapse).
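The reserve/settle accounting can be modelled in memory to show why a burst overshoots by at most one reservation. This is a sketch only: a mutex stands in for the per-day Postgres advisory lock, amounts are integer micro-USD to avoid float drift, and `Ledger`, `Release`, `Spend` and this `respond` signature are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrCeiling = errors.New("daily USD ceiling reached")

// Ledger tracks one day's spend in micro-USD.
type Ledger struct {
	mu        sync.Mutex // stands in for the per-day advisory lock
	ceiling   int64
	committed int64
	reserved  int64
}

func NewLedger(ceilingMicroUSD int64) *Ledger { return &Ledger{ceiling: ceilingMicroUSD} }

// Reserve admits a request only while committed + reserved is below the
// ceiling, then books the estimate. Check and booking happen under one
// lock, so the total can exceed the ceiling by at most one estimate.
func (l *Ledger) Reserve(estimate int64) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.committed+l.reserved >= l.ceiling {
		return ErrCeiling
	}
	l.reserved += estimate
	return nil
}

// Settle releases the reservation and books the real cost.
func (l *Ledger) Settle(estimate, actual int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
	l.committed += actual
}

// Release drops a reservation without charging anything.
func (l *Ledger) Release(estimate int64) { l.Settle(estimate, 0) }

// Spend returns (committed, reserved).
func (l *Ledger) Spend() (int64, int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.committed, l.reserved
}

// respond wraps a model call: Reserve, generate, Settle. The deferred
// guard releases the reservation on error and on panic.
func respond(l *Ledger, estimate int64, generate func() (int64, error)) error {
	if err := l.Reserve(estimate); err != nil {
		return err
	}
	settled := false
	defer func() {
		if !settled {
			l.Release(estimate)
		}
	}()
	actual, err := generate()
	if err != nil {
		return err
	}
	l.Settle(estimate, actual)
	settled = true
	return nil
}

func main() {
	l := NewLedger(1_000_000) // a 1 USD ceiling, for illustration
	err := respond(l, 500_000, func() (int64, error) { return 120_000, nil })
	committed, reserved := l.Spend()
	fmt.Println(err, committed, reserved)
}
```

Request-slot refunds and per-user caps are left out; they would follow the same reserve-then-settle shape with a counter instead of a dollar amount.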
Current prod config (the cheap web path)
WEB_PROVIDER=gemini_grounding: Gemini 2.5 Flash-Lite does the fetch via the native v1beta
google_search tool (NOT the OpenAI-compat endpoint — grounding is silently ignored there,
F-EXT-3), then Grok-4.3 voices it. ~$0.0013/query (vs ~$0.022 for the old two-Grok path);
grounding is free under the daily RPD, guarded by WEB_GROUNDING_DAILY_CAP. XAI_MODEL=grok-4.3 and
GROK_REASONING_EFFORT=none (4.3 otherwise reasons on every reply). Full flag table in the README.
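For orientation, a grounded request against the native endpoint has roughly the following shape. The sketch only builds the URL and JSON body, following the public Gemini REST API as commonly documented; it is not the bot's request builder. The struct names, `groundedRequest` and the model ID string passed in `main` are assumptions, and authentication is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal subset of the generateContent request body.
type part struct {
	Text string `json:"text"`
}

type content struct {
	Parts []part `json:"parts"`
}

// tool enables Google Search grounding; the empty object is the whole config.
type tool struct {
	GoogleSearch struct{} `json:"google_search"`
}

type generateContentRequest struct {
	Contents []content `json:"contents"`
	Tools    []tool    `json:"tools"`
}

// groundedRequest returns the native v1beta URL and JSON body for one
// grounded query. No network call is made here.
func groundedRequest(model, query string) (string, string, error) {
	url := "https://generativelanguage.googleapis.com/v1beta/models/" + model + ":generateContent"
	body, err := json.Marshal(generateContentRequest{
		Contents: []content{{Parts: []part{{Text: query}}}},
		Tools:    []tool{{}},
	})
	if err != nil {
		return "", "", err
	}
	return url, string(body), nil
}

func main() {
	// The model ID is an assumption for illustration.
	url, body, err := groundedRequest("gemini-2.5-flash-lite", "latest Go release")
	fmt.Println(url)
	fmt.Println(body, err)
}
```

The `tools` entry is what the OpenAI-compatible endpoint has no place for, which is consistent with grounding being ignored there.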
Trigger hygiene (what reaches the search query)
The raw event body is cleaned once at the top of respond (bot.go,
stripBotMention(stripReplyFallback(...))) before it is used as the web-search query, the prompt
trigger, the buffer entry, or telemetry. Two egress hazards both rode the raw body: the bot's own
mention pill fallback (cinny writes the full mxid @ai:vojo.chat into the plain body), and
the rich-reply quoted parent. The mxid was the worse one — sent verbatim to gemini grounding it
made the provider treat vojo.chat as the subject entity ("was the Vojo.chat messenger
removed?") and confabulate a confident wrong answer; the same question without the mention (e.g. in
a DM, which has no mention) grounded correctly. Mention detection is unaffected — it runs
upstream on m.mentions/replyParentIsBot (mentions.go), not on
body text. The human display name is deliberately not stripped, so "что умеет Vojo AI" ("what can Vojo AI do") survives.
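The two-step cleaning can be sketched as follows. These are simplified stand-ins that share names with the bot.go helpers but are not their actual implementations: the reply fallback is assumed to be the leading block of `>`-prefixed lines plus one blank line, the mxid is passed in as a parameter, and whitespace is collapsed to single spaces (which would also flatten multi-line messages).

```go
package main

import (
	"fmt"
	"strings"
)

// stripReplyFallback drops the quoted parent that rich replies prepend to
// the plain body: leading lines starting with ">" and the blank line after.
func stripReplyFallback(body string) string {
	if !strings.HasPrefix(body, "> ") {
		return body
	}
	lines := strings.Split(body, "\n")
	i := 0
	for i < len(lines) && strings.HasPrefix(lines[i], ">") {
		i++
	}
	if i < len(lines) && lines[i] == "" {
		i++
	}
	return strings.Join(lines[i:], "\n")
}

// stripBotMention removes the bot's full mxid and collapses the whitespace
// it leaves behind. The display name is intentionally left alone.
func stripBotMention(body, botMXID string) string {
	out := strings.ReplaceAll(body, botMXID, "")
	return strings.Join(strings.Fields(out), " ")
}

// clean applies both steps in the order described above.
func clean(body, botMXID string) string {
	return stripBotMention(stripReplyFallback(body), botMXID)
}

func main() {
	raw := "> <@bob:vojo.chat> old question\n\n@ai:vojo.chat was the service removed?"
	fmt.Printf("%q\n", clean(raw, "@ai:vojo.chat"))
	fmt.Printf("%q\n", clean("what can Vojo AI do", "@ai:vojo.chat"))
}
```

Running the cleaner once, before the body fans out to the search query, prompt, buffer and telemetry, keeps all four consumers in agreement about what the user actually asked.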
Source attribution (the "Sources" footer)
Web answers append a compact, deduped Источники: [rbc.ru](…), … line (the label is Russian for "Sources"), built server-side
after Grok's prose (sources.go sourcesFooter), never via the Grok
prompt (the synth note still says "no URLs or links" — instructing Grok to cite made it paste ugly
redirects and mis-attribute them). The label is the publisher domain (web.title); the link is
the citation's URL — for gemini_grounding that is the opaque grounding-api-redirect URL, which
the end user clicks to reach the real article. Gemini Grounding terms (verified against
ai.google.dev/gemini-api/terms) constrain this: the redirect must not be resolved
server-side (no "programmatic/automated access to Grounded Results"), and a strict reading also
requires showing the Search-Suggestions chip (searchEntryPoint.renderedContent, HTML/CSS) —
which a sanitised Matrix bubble can't render, so that part stays unmet (pre-existing gap; the bot
already shows grounded prose without it). The footer is appended to the sent message only, not
the buffered turn — the redirect links are ephemeral, so they must not pollute the history that
feeds later prompts. grok_web_search returns real publisher URLs (no Google display ToS), so
switching WEB_PROVIDER is the path to true article links — at ~17× the cost.
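A footer builder along these lines would produce the line described above. It is a sketch, not sources.go: the `Citation` type, deduplication by publisher label, the `label` parameter and the `max` cap are assumptions. The URL is passed through untouched, so a grounding redirect is never resolved server-side.

```go
package main

import (
	"fmt"
	"strings"
)

// Citation is a hypothetical view of one grounded source: Title is the
// publisher domain, URL is whatever link the provider returned.
type Citation struct {
	Title string
	URL   string
}

// sourcesFooter renders "label: [domain](url), ..." with one link per
// publisher, keeping first-seen order. max <= 0 means no limit.
func sourcesFooter(label string, cites []Citation, max int) string {
	seen := make(map[string]bool)
	var links []string
	for _, c := range cites {
		if c.Title == "" || c.URL == "" || seen[c.Title] {
			continue
		}
		seen[c.Title] = true
		links = append(links, fmt.Sprintf("[%s](%s)", c.Title, c.URL))
		if len(links) == max {
			break
		}
	}
	if len(links) == 0 {
		return "" // nothing to attribute: append no footer at all
	}
	return label + ": " + strings.Join(links, ", ")
}

func main() {
	cites := []Citation{
		{"rbc.ru", "https://example.invalid/r1"},
		{"rbc.ru", "https://example.invalid/r2"},
		{"bbc.com", "https://example.invalid/r3"},
	}
	reply := "Grok's prose goes here."
	if f := sourcesFooter("Sources", cites, 5); f != "" {
		reply += "\n\n" + f // appended to the sent message only
	}
	fmt.Println(reply)
}
```

The history buffer would store the prose without the footer, matching the rule that ephemeral redirect links stay out of later prompts.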
Observability (logs + per-request trace)
log/slog to stderr (LOG_LEVEL, LOG_FORMAT=text|json). A context-aware handler
(logging.go) stamps a per-request trace_id —
minted once per handled event in handleEvent (trace.go)
and carried in ctx down to the model HTTP call — onto every log line, so one
trace_id greps the whole request trail (the userver idiom; the id is OTel-trace-id
shaped for a future exporter). Routing diagnostics (route decided / generation outcome) are DEBUG, content-free. Full model request/response bodies are gated by a
per-user allowlist LOG_BODIES_USERS (empty = nobody) and LOG_LEVEL=debug,
truncated to a fixed ~4 KB cap, with URL/headers (the API key) never logged — decided once
at admission via a verbose flag in ctx, read by the dumb transport. This is the
debug path; request_log (TELEMETRY_*) is the separate analytics path — they
correlate via trace_id/event_id but are independent. Ship the JSON log stream to
OpenSearch/Loki with a collector (Fluent Bit/Vector); the bot never talks to a log
backend. Full flag table in the README.
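The context-aware handler idea can be sketched with the standard log/slog package (Go 1.21+). This is an illustration, not logging.go or trace.go: `traceKey`, `withTraceID`, `traceHandler`, `newTraceID` and `logLine` are hypothetical names, and the demo writes to a buffer with the timestamp dropped so its output is stable.

```go
package main

import (
	"bytes"
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"log/slog"
	"strings"
)

type traceKey struct{}

// newTraceID mints 16 random bytes as 32 hex chars, the OTel trace-id shape.
func newTraceID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err) // no entropy source: nothing sensible to do in a sketch
	}
	return hex.EncodeToString(b[:])
}

func withTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceKey{}, id)
}

// traceHandler wraps any slog.Handler and stamps trace_id from ctx.
type traceHandler struct{ slog.Handler }

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceKey{}).(string); ok && id != "" {
		r = r.Clone() // avoid sharing attribute storage with other copies
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap so derived loggers keep the stamping.
func (h traceHandler) WithAttrs(as []slog.Attr) slog.Handler {
	return traceHandler{h.Handler.WithAttrs(as)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{h.Handler.WithGroup(name)}
}

// logLine logs one message under the given trace id and returns the JSON line.
func logLine(traceID, msg string) string {
	var buf bytes.Buffer
	opts := &slog.HandlerOptions{ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
		if len(groups) == 0 && a.Key == slog.TimeKey {
			return slog.Attr{} // drop the timestamp for the demo only
		}
		return a
	}}
	logger := slog.New(traceHandler{slog.NewJSONHandler(&buf, opts)})
	logger.InfoContext(withTraceID(context.Background(), traceID), msg)
	return strings.TrimSpace(buf.String())
}

func main() {
	fmt.Println(logLine(newTraceID(), "route decided"))
}
```

Call sites only need to use the `*Context` logging methods and pass the request's ctx; the id never has to be threaded through as an explicit argument.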
Building / testing
Go toolchain lives at /home/ubuntu/.go-toolchain/go/bin (NOT on PATH). Store-backed tests need
AI_BOT_TEST_DATABASE_URL (a throwaway Postgres) and skip without it, so go test ./... stays
green on a machine without one. Keep gofmt -l, go vet ./..., go test -race ./... clean.