Vojo AI bot (@ai:vojo.chat)
A Go Synapse application service in apps/ai-bot/ — not a normal
bot user. Answers @-mentions in groups and every message in 1:1s, over the plaintext
CS-API (Vojo rooms are unencrypted by default). It is a separate server-side service
deployed next to Synapse; it ships nothing to the web client.
- Operator / full env reference: apps/ai-bot/README.md (config tables, setup, deploy).
- Deploy / server config: server-side.md (the ai-bot service row, the vojo_ai Postgres role).
- Detailed design SOT: docs/plans/grok_bot.md + docs/plans/ai_backend_build_plan.md (local-only; docs/plans/ is gitignored).
Request flow
Synapse pushes a transaction → the bot acks 200 instantly, then processes async per-room
(appservice.go), so a slow model call never blocks other
rooms or the homeserver. handleMessage (bot.go) gates in order:
durable+in-memory dedup → encrypted-room skip → decode / edit / own-message / notice →
foreign-server leave → DM-or-mention → media react → resolve conversation (thread) →
per-(room,thread) single-flight → spawn respond. respond = Reserve(estimate) →
generate() → Settle(actual) → sendReply; any failure produces an emoji react, never silence.
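The "ack first, then process per room" shape can be sketched as below. This is a minimal illustration, not the code in appservice.go: the `Event`, `Dispatcher`, and `handledOrder` names, the one-worker-per-room design, and the buffer size are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical, minimal stand-in for a Matrix room event.
type Event struct {
	RoomID string
	Body   string
}

// Dispatcher runs one worker per room: events in a room are handled in
// order, while different rooms proceed independently.
type Dispatcher struct {
	mu     sync.Mutex
	queues map[string]chan Event
	wg     sync.WaitGroup
	handle func(Event)
}

func NewDispatcher(handle func(Event)) *Dispatcher {
	return &Dispatcher{queues: make(map[string]chan Event), handle: handle}
}

// Enqueue hands the event to its room's worker and returns without waiting
// for the handler (it only blocks if that room's buffer is full).
func (d *Dispatcher) Enqueue(ev Event) {
	d.mu.Lock()
	q, ok := d.queues[ev.RoomID]
	if !ok {
		q = make(chan Event, 64) // buffer size is an arbitrary choice
		d.queues[ev.RoomID] = q
		d.wg.Add(1)
		go func() {
			defer d.wg.Done()
			for e := range q {
				d.handle(e)
			}
		}()
	}
	d.mu.Unlock()
	q <- ev
}

// Close stops accepting work and waits for every room worker to drain.
func (d *Dispatcher) Close() {
	d.mu.Lock()
	for _, q := range d.queues {
		close(q)
	}
	d.mu.Unlock()
	d.wg.Wait()
}

// handledOrder pushes events through a Dispatcher and returns, per room,
// the order in which the handler saw them.
func handledOrder(events []Event) map[string][]string {
	var mu sync.Mutex
	order := make(map[string][]string)
	d := NewDispatcher(func(e Event) {
		mu.Lock()
		order[e.RoomID] = append(order[e.RoomID], e.Body)
		mu.Unlock()
	})
	for _, e := range events {
		d.Enqueue(e)
	}
	d.Close()
	return order
}

func main() {
	// In the real flow the HTTP handler would already have replied 200
	// to Synapse before any of this work runs.
	order := handledOrder([]Event{
		{RoomID: "!a", Body: "a1"},
		{RoomID: "!b", Body: "b1"},
		{RoomID: "!a", Body: "a2"},
	})
	fmt.Println(order["!a"], order["!b"])
}
```

With this shape a slow handler in one room delays only that room's queue; the transaction ack never waits on a model call.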
Conversations (threads) — ChatGPT-style multi-chat
In a 1:1 DM a top-level message roots a new thread (a fresh conversation) and the bot answers
inside it (bot.go resolveThreadRoot); a message already in a thread
continues it (F27). Groups are never auto-threaded — the gate is structural (isDM), not a
flag, so the threading feature can never change group behavior. Auto-threading in DMs is always
on (the old THREAD_CONVERSATIONS env flag was removed — it only created a host/backend
mismatch footgun). Context and single-flight are keyed per-(room, thread) so conversations
neither share history nor block each other; typing is room-level (Matrix has no per-thread typing)
via a refcount; per-thread context buffers are LRU-bounded (maxConvBuffersPerRoom).
Host pairing: the cinny host shows the conversation surface for a bot only when its
config.json preset has experience.type: "ai-chat" (today only @ai). That surface is a
fully isolated, native in-client chat (features/bots/BotConversations + AiChatHeader +
AiChatMenu, reusing the generic ThreadDrawer/RoomInput) — it shares no runtime with the
bridge widget pipeline (no BotShell/iframe, no show-chat toggle; there is no vojo-ai widget
any more). Bridges keep experience.type: "matrix-widget" (iframe + show-chat fallback). Because
the backend now always threads DMs, any DM message @ai answers lands in a thread the host can open.
Cascade (flag-gated "operator cascade", every layer default OFF)
generate() (cascade.go) routes (router.go)
then dispatches; any layer off or failing degrades to grok_direct (never an error to the user):
- grok_direct: DEFAULT, one Grok call. Grok is the final voice on everything substantive.
- trivial_direct: greetings/acks → cheap Gemini (TRIVIAL_OFFLOAD_ENABLED).
- web_then_grok: fresh facts. A WebProvider fetches a grounded digest + citations, then Grok synthesises the answer in voice (web.go).
- reason_then_grok: manual trigger ("подумай глубже", i.e. "think deeper") → Grok at a higher reasoning_effort.
- project_then_grok: questions about the Vojo product itself (PROJECT_KB_ENABLED). A curated KB (operator data from PROJECT_KB_PATH, default the bundled prompts/vojo_kb.txt) is injected as a system note and Grok answers product claims strictly from it (anti-hallucination: Grok has no parametric Vojo knowledge, and the web doesn't either). Gated by the classifier's about_project signal (the context-aware judge; it resolves follow-ups like "Про этот", i.e. "about this one", → the app); a false positive is bounded by the entity-scoped note. Beats every web arm. One Grok call, so it costs ~the same as grok_direct. See docs/plans/ai_project_knowledge.md.
- Router = free Layer-0 regex + optional Layer-1 Gemini classifier; a confidence floor keeps uncertain cases on the safe floor (grok_direct).
Invariant: all cascade flags OFF == today's bot — a single grok_direct call, byte-identical wire body. Do not enable layers in prod until the offline-eval gate (build plan §9) passes.
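The routing rule (regex first, optional classifier, confidence floor, and "anything off or failing falls back to grok_direct") might look roughly like this. It is a sketch under assumptions: the `Config` fields, the `Classifier` function type, the greeting regex, and the 0.7 floor are invented for the example and are not taken from router.go.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

type Route string

const (
	GrokDirect    Route = "grok_direct"
	TrivialDirect Route = "trivial_direct"
	WebThenGrok   Route = "web_then_grok"
)

// Classification is a hypothetical Layer-1 result.
type Classification struct {
	Route      Route
	Confidence float64
}

// Classifier stands in for the optional Layer-1 model call.
type Classifier func(msg string) (Classification, error)

// Config holds per-layer switches; a missing entry means the layer is OFF.
type Config struct {
	Enabled       map[Route]bool
	UseClassifier bool
	Floor         float64 // classifier results below this are ignored
}

// greetingRe is an illustrative Layer-0 pattern, not the real one.
var greetingRe = regexp.MustCompile(`(?i)^\s*(hi|hello|thanks|ok)[\s!.]*$`)

// decide picks a route and degrades to GrokDirect whenever the chosen
// layer is disabled, the classifier fails, or its confidence is too low.
func decide(msg string, cfg Config, classify Classifier) Route {
	candidate := GrokDirect
	switch {
	case greetingRe.MatchString(msg):
		candidate = TrivialDirect
	case cfg.UseClassifier && classify != nil:
		if c, err := classify(msg); err == nil && c.Confidence >= cfg.Floor {
			candidate = c.Route
		}
	}
	if !cfg.Enabled[candidate] {
		return GrokDirect
	}
	return candidate
}

// fixed returns a classifier that always answers the same way (for demos).
func fixed(r Route, conf float64) Classifier {
	return func(string) (Classification, error) {
		return Classification{Route: r, Confidence: conf}, nil
	}
}

// failing simulates a classifier outage.
func failing(string) (Classification, error) {
	return Classification{}, errors.New("classifier unavailable")
}

func main() {
	allOff := Config{}
	on := Config{
		Enabled:       map[Route]bool{TrivialDirect: true, WebThenGrok: true},
		UseClassifier: true,
		Floor:         0.7,
	}
	q := "what is the weather today"
	fmt.Println(decide("hello", allOff, nil))
	fmt.Println(decide("hello", on, nil))
	fmt.Println(decide(q, on, fixed(WebThenGrok, 0.9)))
	fmt.Println(decide(q, on, fixed(WebThenGrok, 0.4)))
	fmt.Println(decide(q, on, failing))
}
```

With every flag off, `decide` can only return `grok_direct`, which is the invariant the section states.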
Provider seam (no vendor names in business logic)
llm.go (Message/Usage/LLMRequest/LLMResponse/LLMClient) +
httpllm.go (shared OpenAI-compatible transport + retry) + thin
adapters provider_xai.go /
provider_gemini.go + pricing.go
(priceFor model→price map). Bot.llm is an LLMClient, never a concrete vendor type.
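As an illustration of the seam, the sketch below shows business logic that depends only on an interface and a price lookup. The type names follow the list above, but every field, the `Complete` method signature, the `echoClient` adapter, and the placeholder model and prices are assumptions made for the example, not the contents of llm.go or pricing.go.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Field layouts below are guesses for illustration only.
type Message struct{ Role, Content string }
type Usage struct{ PromptTokens, CompletionTokens int }
type LLMRequest struct {
	Model    string
	Messages []Message
}
type LLMResponse struct {
	Text  string
	Usage Usage
}

// LLMClient is the only thing business logic sees.
type LLMClient interface {
	Complete(ctx context.Context, req LLMRequest) (LLMResponse, error)
}

// price is USD per million tokens; the entry below is a placeholder.
type price struct{ InPerMTok, OutPerMTok float64 }

var prices = map[string]price{
	"example-model": {InPerMTok: 1.0, OutPerMTok: 2.0},
}

func priceFor(model string) (price, bool) {
	p, ok := prices[model]
	return p, ok
}

// costUSD converts token usage into dollars using the price map.
func costUSD(model string, u Usage) (float64, bool) {
	p, ok := priceFor(model)
	if !ok {
		return 0, false
	}
	in := float64(u.PromptTokens) / 1e6 * p.InPerMTok
	out := float64(u.CompletionTokens) / 1e6 * p.OutPerMTok
	return in + out, true
}

// echoClient is a fake adapter standing in for a vendor implementation.
type echoClient struct{}

func (echoClient) Complete(_ context.Context, req LLMRequest) (LLMResponse, error) {
	if len(req.Messages) == 0 {
		return LLMResponse{}, errors.New("no messages")
	}
	last := req.Messages[len(req.Messages)-1]
	return LLMResponse{
		Text:  "echo: " + last.Content,
		Usage: Usage{PromptTokens: 1_000_000, CompletionTokens: 500_000},
	}, nil
}

// Bot holds the interface, never a concrete vendor type.
type Bot struct{ llm LLMClient }

func (b *Bot) Answer(ctx context.Context, model, text string) (string, float64, error) {
	resp, err := b.llm.Complete(ctx, LLMRequest{
		Model:    model,
		Messages: []Message{{Role: "user", Content: text}},
	})
	if err != nil {
		return "", 0, err
	}
	cost, ok := costUSD(model, resp.Usage)
	if !ok {
		return "", 0, fmt.Errorf("no price for model %q", model)
	}
	return resp.Text, cost, nil
}

// answerOnce wires a Bot to the fake adapter for a single call.
func answerOnce(text string) (string, float64, error) {
	b := &Bot{llm: echoClient{}}
	return b.Answer(context.Background(), "example-model", text)
}

func main() {
	reply, cost, err := answerOnce("ping")
	fmt.Println(reply, cost, err)
}
```

Swapping vendors then means providing another type that satisfies `LLMClient` plus a price entry; `Bot.Answer` does not change.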
Money, invariants & store (store.go)
- Ceiling is TOCTOU-safe: Reserve books a route's estimated max-cost into reserved_usd under a per-day global advisory lock; the gate counts committed + reserved spend; Settle releases the reservation and books the real per-component CostBreakdown. A concurrent burst overshoots by at most one reservation.
- Never charge for silence: a 2xx is billed; if the reply then fails to send, refund the request SLOT (not the USD) + react. A failed call releases the reservation + refunds the slot; a panic releases via a deferred guard.
- Caps: DAILY_USD_CEILING (global $), PER_USER_DAILY_CAP (requests/user), PER_USER_DAILY_USD (optional $/user). At-most-once dedup is durable (SeenEvent/MarkTxn); generation is per-(room,thread) single-flight.
- One overall per-request deadline bounds the whole cascade (no per-stage 3×60s accretion).
- Telemetry: one request_log row per engaged request (route, per-component $, latency, degrade reasons), written async + isolated (its failure never drops a reply), TELEMETRY_ENABLED default off, time-based retention.
- Store: dedicated Postgres vojo_ai (pgx); schema is an ordered migrations array in store.go. Operational state only (dedup, spend ledger, grounding cap, request_log, warned-encrypted), no message content (that lives in Synapse).
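The Reserve/Settle accounting can be illustrated with an in-memory ledger. This is only a sketch: a mutex stands in for the Postgres advisory lock, amounts are integer micro-dollars to keep the arithmetic exact, and the admit rule (admit while committed + reserved is still below the ceiling) is one reading of "overshoots by at most one reservation", not a confirmed detail of store.go.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrCeiling = errors.New("daily ceiling reached")

// Ledger tracks one day's spend in micro-USD (1_000_000 = $1).
type Ledger struct {
	mu        sync.Mutex // stands in for the per-day advisory lock
	ceiling   int64
	committed int64 // settled, real spend
	reserved  int64 // estimates for calls still in flight
}

func NewLedger(ceiling int64) *Ledger { return &Ledger{ceiling: ceiling} }

// Reserve checks the gate and books the estimate in one critical section,
// so two concurrent callers cannot both pass on a stale total.
func (l *Ledger) Reserve(estimate int64) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.committed+l.reserved >= l.ceiling {
		return ErrCeiling
	}
	l.reserved += estimate
	return nil
}

// Settle swaps the reservation for the actual cost of a successful call.
func (l *Ledger) Settle(estimate, actual int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
	l.committed += actual
}

// Release drops a reservation when the call failed and nothing is billed.
func (l *Ledger) Release(estimate int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
}

// Counted returns what the gate sees: committed plus reserved spend.
func (l *Ledger) Counted() int64 {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.committed + l.reserved
}

func main() {
	l := NewLedger(1000)
	fmt.Println(l.Reserve(600)) // admitted
	fmt.Println(l.Reserve(600)) // admitted: total was still below the ceiling
	fmt.Println(l.Reserve(1))   // rejected: reservations already cover the ceiling
	l.Settle(600, 100)          // first call cost far less than estimated
	l.Release(600)              // second call failed, nothing billed
	fmt.Println(l.Counted())
}
```

Because in-flight estimates count against the gate, a burst of requests is stopped before the real bills arrive rather than after.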
Current prod config (the cheap web path)
WEB_PROVIDER=gemini_grounding: Gemini 2.5 Flash-Lite does the fetch via the native v1beta
google_search tool (NOT the OpenAI-compat endpoint — grounding is silently ignored there,
F-EXT-3), then Grok-4.3 voices it. ~$0.0013/query (vs ~$0.022 for the old two-Grok path);
grounding is free under the daily requests-per-day (RPD) quota, guarded by WEB_GROUNDING_DAILY_CAP. XAI_MODEL=grok-4.3 with
GROK_REASONING_EFFORT=none (4.3 otherwise reasons on every reply). Full flag table in the README.
Observability (logs + per-request trace)
log/slog to stderr (LOG_LEVEL, LOG_FORMAT=text|json). A context-aware handler
(logging.go) stamps a per-request trace_id —
minted once per handled event in handleEvent (trace.go)
and carried in ctx down to the model HTTP call — onto every log line, so one
trace_id greps the whole request trail (the userver idiom; the id is OTel-trace-id
shaped for a future exporter). Routing diagnostics (route decided / generation outcome) are DEBUG, content-free. Full model request/response bodies are gated by a
per-user allowlist LOG_BODIES_USERS (empty = nobody) and LOG_LEVEL=debug,
truncated to a fixed ~4 KB cap, with URL/headers (the API key) never logged — decided once
at admission via a verbose flag in ctx, read by the dumb transport. This is the
debug path; request_log (TELEMETRY_*) is the separate analytics path — they
correlate via trace_id/event_id but are independent. Ship the JSON log stream to
OpenSearch/Loki with a collector (Fluent Bit/Vector); the bot never talks to a log
backend. Full flag table in the README.
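A context-aware slog handler of the kind described above can be sketched in a few lines. The names here (`traceHandler`, `withTrace`, `newTraceID`, `loggedTraceID`) are hypothetical and this is not the code in logging.go or trace.go; it only shows how a wrapper handler can read an id from ctx and stamp it on each record.

```go
package main

import (
	"bytes"
	"context"
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"log/slog"
)

type traceKey struct{}

// withTrace stores the trace id in ctx so every callee can log with it.
func withTrace(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceKey{}, id)
}

// newTraceID returns 16 random bytes as 32 hex chars (OTel trace-id shaped).
func newTraceID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b[:])
}

// traceHandler wraps another handler and adds trace_id from ctx, if present.
type traceHandler struct{ slog.Handler }

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceKey{}).(string); ok {
		r = r.Clone() // avoid sharing attribute storage with the caller's record
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap so derived loggers keep the stamping.
func (h traceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return traceHandler{Handler: h.Handler.WithAttrs(attrs)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{Handler: h.Handler.WithGroup(name)}
}

// loggedTraceID logs one JSON line with the given id in ctx and returns the
// trace_id field found in that line ("" when absent).
func loggedTraceID(id string) string {
	var buf bytes.Buffer
	logger := slog.New(traceHandler{Handler: slog.NewJSONHandler(&buf, nil)})
	ctx := context.Background()
	if id != "" {
		ctx = withTrace(ctx, id)
	}
	logger.InfoContext(ctx, "route decided", "route", "grok_direct")
	var line map[string]any
	if err := json.Unmarshal(buf.Bytes(), &line); err != nil {
		return ""
	}
	got, _ := line["trace_id"].(string)
	return got
}

func main() {
	id := newTraceID()
	fmt.Println(len(id), loggedTraceID(id) == id)
}
```

Call sites must use the context-taking methods (`InfoContext`, `DebugContext`, and so on) for the id to reach the handler.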
Building / testing
Go toolchain lives at /home/ubuntu/.go-toolchain/go/bin (NOT on PATH). Store-backed tests need
AI_BOT_TEST_DATABASE_URL (a throwaway Postgres) and skip without it, so go test ./... stays
green on a machine without one. Keep gofmt -l, go vet ./..., go test -race ./... clean.