ai-bot
A plaintext Matrix bot user (@ai:vojo.chat, display name Vojo AI) that
answers with xAI Grok completions in its rooms: it replies to @-mentions in group
rooms and to every message in a 1:1. It runs as a Synapse application service: Synapse pushes
event transactions to the bot's HTTP endpoint; the bot speaks the Matrix CS-API
back over plain HTTP (no Olm/Megolm — Vojo rooms are unencrypted by default) and
calls the xAI OpenAI-compatible Chat Completions API.
Authentication is the appservice as_token/hs_token (from the registration) —
non-expiring, so there is no token rotation and no stored password.
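The transaction-push side of this can be sketched as follows. This is a minimal illustration, not the code in appservice.go: the names tokenOK, txnSeen and transactionHandler are hypothetical, the dedup set is in memory (the real one lives in Postgres), and the demo token is a placeholder. It shows the hs_token check (Bearer header or legacy ?access_token=), a 403 on a bad token, and acknowledging a retried transaction without dispatching it twice.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// tokenOK reports whether the request carries the expected hs_token, either as
// "Authorization: Bearer <token>" or as the legacy ?access_token= query value.
func tokenOK(authHeader, queryToken, hsToken string) bool {
	got := queryToken
	if strings.HasPrefix(authHeader, "Bearer ") {
		got = strings.TrimPrefix(authHeader, "Bearer ")
	}
	// An empty configured token must never authenticate anything.
	return hsToken != "" && subtle.ConstantTimeCompare([]byte(got), []byte(hsToken)) == 1
}

// txnSeen remembers transaction ids (hypothetical in-memory stand-in for the store).
type txnSeen struct {
	mu   sync.Mutex
	seen map[string]bool
}

func newTxnSeen() *txnSeen { return &txnSeen{seen: make(map[string]bool)} }

// firstTime returns true only the first time a transaction id is observed.
func (s *txnSeen) firstTime(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.seen[id] {
		return false
	}
	s.seen[id] = true
	return true
}

// transactionHandler authenticates the push, dispatches new transactions once,
// and answers 200 {} for both new and already-seen transaction ids.
func transactionHandler(hsToken string, seen *txnSeen, dispatch func(txnID string)) http.Handler {
	const prefix = "/_matrix/app/v1/transactions/"
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		if !tokenOK(r.Header.Get("Authorization"), r.URL.Query().Get("access_token"), hsToken) {
			w.WriteHeader(http.StatusForbidden)
			fmt.Fprint(w, `{"errcode":"M_FORBIDDEN"}`)
			return
		}
		if txnID := strings.TrimPrefix(r.URL.Path, prefix); seen.firstTime(txnID) {
			dispatch(txnID)
		}
		fmt.Fprint(w, `{}`)
	})
}

func main() {
	h := transactionHandler("example-hs-token", newTxnSeen(), func(id string) {
		fmt.Println("dispatch", id)
	})
	// Same transaction pushed twice with a good token, then once with a bad one.
	for _, tok := range []string{"example-hs-token", "example-hs-token", "wrong"} {
		req := httptest.NewRequest(http.MethodPut,
			"/_matrix/app/v1/transactions/txn1", strings.NewReader(`{"events":[]}`))
		req.Header.Set("Authorization", "Bearer "+tok)
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, req)
		fmt.Println(rec.Code)
	}
}
```

The comparison uses crypto/subtle so the check does not leak how many leading characters of the token matched.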
It is a separate server-side service, deployed next to Synapse. It lives in
this repo (alongside apps/widget-*) but ships nothing to the web client.
Branding: user-facing name is Vojo AI with a generic icon. "Grok" appears only as the factual attribution ("powered by Grok, xAI") and as the real model id — never as the product name or logo (xAI Brand Guidelines).
Design source of truth: docs/plans/grok_bot.md. Privacy/152-ФЗ pre-launch
gating lives there (§6) and is not closed by this code.
Layout
apps/ai-bot/
├── main.go # entrypoint, lifecycle, `check-config` subcommand
├── config.go # env parsing + validation + redacted summary
├── bot.go # event handling, classification, limiter wiring
├── appservice.go # HTTP transaction-push server (hs_token auth, txn idempotency)
├── matrix.go # CS-API client as the appservice user (as_token + ?user_id=)
├── registration.go # generate + read registration.yaml (tokens, mautrix idiom)
├── events.go # Matrix event types + decoders
├── mentions.go # m.mentions + pill/reply fallbacks (F29/F30)
├── context.go # provider-neutral message-window assembly (trigger + bot replies)
├── llm.go # provider-neutral types + LLMClient interface (no vendor names)
├── httpllm.go # shared OpenAI-compatible chat/completions transport + retry (F6)
├── provider_xai.go # thin xAI/Grok adapter over the shared transport
├── provider_gemini.go # Gemini adapter: OpenAI-compat client + native v1beta grounding
├── pricing.go # per-model price table (priceFor) + CostBreakdown
├── router.go # cascade router: Layer-0 heuristic + optional Layer-1 Gemini classifier
├── cascade.go # generate(): route dispatch with degrade-to-grok_direct
├── web.go # WebProvider: grok_web_search (Live Search) | gemini_grounding + cap guard
├── telemetry.go # request_log analytics row + async emit + retention trim
├── store.go # Postgres (vojo_ai): spend ledger (+reservation/components), dedup, request_log, grounding cap
├── messages.go # language-free emoji status reactions
├── markdown.go # markdown → org.matrix.custom.html for the reply's formatted_body
├── util.go # bounded dedup set + small hash
├── prompts/system_ru.txt
├── Dockerfile # CGO-free static build → distroless, EXPOSE 8009
└── .env.example
Configuration
All via environment (see .env.example). Required: HOMESERVER_URL, BOT_MXID,
AS_TOKEN, HS_TOKEN, XAI_API_KEY, ALLOWED_SERVERS, AI_BOT_DATABASE_URL.
AS_ADDR (default :8009) is the transaction-push listen address — it must match
the url port in the registration. The model is env-configurable (XAI_MODEL,
default grok-4.20-0309-non-reasoning).
grok-4.3 is the newer unified model (same price, 1M context): one model with a
reasoning_effort dial. If you switch XAI_MODEL=grok-4.3, set
GROK_REASONING_EFFORT=none to keep the default voice fast/cheap — otherwise the API
defaults to low and reasons on every reply. GROK_REASONING_EFFORT (accepted:
none|low|medium|high, default empty = not sent) is applied to the normal Grok voice
(grok_direct + web synthesis); leave it empty for grok-4.20-non-reasoning, which
rejects the param. The reason_then_grok route always uses high regardless.
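One way to get the "empty = not sent" behaviour is a JSON field with omitempty, as in the sketch below. The chatRequest struct is trimmed to two fields for illustration (a real request also carries messages), and validEffort and buildRequest are hypothetical names, not the functions in this repo.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatRequest is a trimmed request body; omitempty drops reasoning_effort
// entirely when the configured value is empty.
type chatRequest struct {
	Model           string `json:"model"`
	ReasoningEffort string `json:"reasoning_effort,omitempty"`
}

// validEffort accepts the documented values plus empty (meaning "do not send").
func validEffort(v string) bool {
	switch v {
	case "", "none", "low", "medium", "high":
		return true
	}
	return false
}

// buildRequest rejects unknown effort values before anything is sent.
func buildRequest(model, effort string) ([]byte, error) {
	if !validEffort(effort) {
		return nil, fmt.Errorf("GROK_REASONING_EFFORT: unsupported value %q", effort)
	}
	return json.Marshal(chatRequest{Model: model, ReasoningEffort: effort})
}

func main() {
	cases := [][2]string{
		{"grok-4.20-0309-non-reasoning", ""}, // param omitted
		{"grok-4.3", "none"},                 // param sent explicitly
	}
	for _, c := range cases {
		body, err := buildRequest(c[0], c[1])
		fmt.Println(string(body), err)
	}
}
```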
Database
The bot keeps its operational state — appservice transaction + event dedup, the
daily spend ledger, and the encrypted-room warned set — in a dedicated Postgres
database vojo_ai on the shared server, mirroring the per-service bridge databases
(each bridge owns its own role + DB). It stores no message content: the room
timeline is canonical in Synapse, and the bot's xAI context window is the in-memory
buffer in bot.go. The schema is created/migrated on startup (a schema_version
table + idempotent CREATE TABLE IF NOT EXISTS), so a fresh vojo_ai needs no
manual DDL — just the role + database:
-- once, as the Postgres superuser (e.g. `docker exec vojo-postgres-1 psql -U synapse -d postgres`):
CREATE ROLE vojo_ai LOGIN PASSWORD '<32-char secret>'; -- least privilege; NOT a superuser
CREATE DATABASE vojo_ai OWNER vojo_ai;
Point the bot at it with AI_BOT_DATABASE_URL (libpq/pgx DSN). Inside the docker
network the host is the postgres service; sslmode=disable matches Synapse and
the bridges on the internal network:
AI_BOT_DATABASE_URL=postgres://vojo_ai:<secret>@postgres:5432/vojo_ai?sslmode=disable
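The startup migration described above (a schema_version marker plus idempotent CREATE TABLE IF NOT EXISTS) might look roughly like this. It is only a sketch: the table and column names other than schema_version and reserved_usd are invented for illustration, and the exec callback stands in for a real database handle.

```go
package main

import "fmt"

// migration is one versioned, idempotent DDL step.
type migration struct {
	version int
	ddl     string
}

// migrations is a hypothetical list; the real schema lives in store.go.
var migrations = []migration{
	{1, `CREATE TABLE IF NOT EXISTS schema_version (version INT NOT NULL)`},
	{2, `CREATE TABLE IF NOT EXISTS seen_txn (txn_id TEXT PRIMARY KEY)`},
	{3, `CREATE TABLE IF NOT EXISTS daily_spend (day DATE PRIMARY KEY,
	       spent_usd NUMERIC NOT NULL DEFAULT 0, reserved_usd NUMERIC NOT NULL DEFAULT 0)`},
}

// pending returns the steps newer than the version already applied.
func pending(current int) []migration {
	var out []migration
	for _, m := range migrations {
		if m.version > current {
			out = append(out, m)
		}
	}
	return out
}

// migrate applies pending steps in order and returns the version reached.
// On failure it stops and reports the last version that was applied.
func migrate(current int, exec func(ddl string) error) (int, error) {
	for _, m := range pending(current) {
		if err := exec(m.ddl); err != nil {
			return current, fmt.Errorf("migration %d: %w", m.version, err)
		}
		current = m.version
	}
	return current, nil
}

func main() {
	// A fake exec that only prints; a real one would run the DDL in Postgres.
	v, err := migrate(0, func(ddl string) error {
		fmt.Println("exec:", ddl[:40])
		return nil
	})
	fmt.Println(v, err)
}
```

Because every statement is idempotent, running the same steps against a database that already has the tables is harmless.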
The hard USD ceiling is computed from the API-returned token usage multiplied by the
per-model price table (XAI_PRICE_*_PER_M, GEMINI_PRICE_*_PER_M), so a price
change only needs those constants updated and cannot silently blow the cap. The
ceiling is enforced with an optimistic reservation (reserved_usd): a request's
estimated max-cost is booked at admission and settled to the real cost afterward, so
a burst of concurrent requests can't slip past DAILY_USD_CEILING. Without the
reservation it could, because the actual spend is only recorded after each call returns.
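A minimal in-memory sketch of the pricing and reserve/settle idea follows. The real ledger is a Postgres table; the ledger type, its methods and the example prices here are hypothetical, and amounts are kept in integer micro-USD to avoid floating-point drift.

```go
package main

import (
	"fmt"
	"sync"
)

// costMicroUSD prices one call from API-reported token usage.
// Prices are micro-USD per million tokens; the result is micro-USD.
func costMicroUSD(inTok, outTok, inPricePerM, outPricePerM int64) int64 {
	return (inTok*inPricePerM + outTok*outPricePerM) / 1_000_000
}

// ledger tracks settled and reserved spend against a daily ceiling.
type ledger struct {
	mu       sync.Mutex
	ceiling  int64
	spent    int64
	reserved int64
}

// reserve books the estimated max cost at admission, or refuses the request
// if spent + reserved + estimate would exceed the ceiling.
func (l *ledger) reserve(estMax int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.spent+l.reserved+estMax > l.ceiling {
		return false
	}
	l.reserved += estMax
	return true
}

// settle releases the reservation and records the actual cost.
func (l *ledger) settle(estMax, actual int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estMax
	l.spent += actual
}

func main() {
	l := &ledger{ceiling: 1_000_000} // $1.00 per day, illustrative
	fmt.Println(l.reserve(600_000))  // admitted
	fmt.Println(l.reserve(600_000))  // refused: would exceed the ceiling
	// Example prices only: $0.20 per M input tokens, $0.50 per M output tokens.
	actual := costMicroUSD(2_000, 500, 200_000, 500_000)
	l.settle(600_000, actual)
	fmt.Println(actual, l.reserve(600_000)) // admitted again after settling
}
```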
Operator accounting (Phase 1, on by default)
- REQUEST_BUDGET_SECONDS (default 180): overall per-request deadline shared by all model calls, so a slow/retried call (or a cascade) can't accrete minutes.
- GROK_PROMPT_CACHE (default false): Grok caches prompt prefixes automatically; this toggle only adds the x-grok-conv-id routing header (a per-room id) to raise the cache hit rate. There is no prompt_cache body param (verified on docs.x.ai).
- TELEMETRY_ENABLED (default false): write a request_log analytics row per engaged request (route, per-component $, latency, degrade/ceiling reasons). The write is async and isolated; its failure never drops a reply. TELEMETRY_STORE_TEXT (default false) additionally keeps the query text (for offline eval); TELEMETRY_RETENTION_DAYS (default 30) time-trims old rows. Turn telemetry on to MEASURE the base before enabling any cascade layer.
Cascade (Phase 2-4) — behind flags, default OFF (every layer off == today's bot)
All optional; with every flag unset the bot makes exactly today's single grok_direct call. Any layer
that is off or failing degrades to grok_direct (never silence). Do not enable in prod until the
offline-eval gate passes (misroute < 2-3% AND measured saving > the second provider's cost; see
docs/plans/ai_backend_build_plan.md §9).
| Env | Default | Meaning |
|---|---|---|
| ROUTER_ENABLED | false | Layer-0 heuristic router (else everything → grok_direct) |
| ROUTER_CLASSIFIER_ENABLED | false | Layer-1 Gemini classifier on uncertain cases (requires ROUTER_ENABLED + Gemini key) |
| TRIVIAL_OFFLOAD_ENABLED | false | answer trivial messages with Gemini (requires Gemini key) |
| WEB_ENABLED | false | web_then_grok route (Gemini/Grok fetches fresh facts, Grok stays the voice) |
| WEB_PROVIDER | grok_web_search | grok_web_search (xAI Agent Tools web_search on the Responses API, $5/1k calls, no Gemini key) or gemini_grounding (cheapest: Gemini does the fetch via native v1beta google_search, Grok voices it; ~$0.0013/query, validated on gemini-2.5-flash-lite; the F-EXT-3 "Gemini-3 only" caveat is the OpenAI-compat endpoint, native v1beta works on 2.5). Requires GEMINI_API_KEY. |
| WEB_GROUNDING_DAILY_CAP | 450 | durable per-day cap for gemini_grounding before degrading (keep < the 500/day free grounding RPD; guards the per-1k overage) |
| REASONING_ENABLED | false | manual "think harder" route on REASONING_TRIGGER |
| REASONING_TRIGGER | подумай глубже | trigger phrase |
| REASONING_MODEL | grok-4.3 | a reasoning-capable model (the default grok-4.20-non-reasoning rejects reasoning_effort) |
| REASONING_EFFORT | high | the reasoning_effort the "think harder" route sends (none\|low\|medium\|high) |
| GEMINI_API_KEY / _FILE | - | required only when a Gemini-using layer is on (fail-fast at startup otherwise) |
| GEMINI_MODEL | gemini-2.5-flash-lite | cheap model for trivial/classifier |
| GEMINI_BASE_URL | …/v1beta/openai | OpenAI-compat endpoint (native grounding endpoint derived from it) |
One-time setup (appservice registration)
Like the mautrix bridges (e.g. telegram), the bot generates its own
registration (random as_token/hs_token) and reads its tokens back from that
same file — the single source of truth shared with Synapse, no hand-copying.
- Generate it (writes REGISTRATION_PATH, default /data/registration.yaml):
  docker compose run --rm ai-bot generate-registration
- Bind-mount that same file into the Synapse container (e.g. as /data/ai-registration.yaml) and add it to homeserver.yaml:
  app_service_config_files:
    - /data/ai-registration.yaml
- Restart Synapse (it caches AS configs at startup). Synapse auto-creates @ai:vojo.chat from sender_localpart; no register_new_matrix_user needed.
The bot reads REGISTRATION_PATH for its tokens (no env AS_TOKEN/HS_TOKEN
needed) and sets its own display name (BOT_DISPLAY_NAME, default "Vojo AI") on
startup. The bot writes/reads /data, so that dir must be owned by the image's
runtime uid (distroless nonroot = 65532): sudo chown -R 65532:65532 ~/vojo/ai-bot.
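For orientation, generating a registration amounts to drawing two random tokens and writing the standard appservice fields. The sketch below is an assumption-laden illustration, not registration.go: newToken and registrationYAML are hypothetical helpers, the id value and the exclusive user namespace are guesses, and the YAML is built with a format string instead of a YAML library.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"regexp"
)

// newToken returns 32 random bytes as a 64-character hex string.
func newToken() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// registrationYAML renders the core Matrix appservice registration fields.
// The namespace claims only the bot's own user id (an assumption here).
func registrationYAML(id, url, localpart, botMXID, asToken, hsToken string) string {
	return fmt.Sprintf(`id: %s
url: %s
as_token: %s
hs_token: %s
sender_localpart: %s
namespaces:
  users:
    - exclusive: true
      regex: '%s'
`, id, url, asToken, hsToken, localpart, regexp.QuoteMeta(botMXID))
}

func main() {
	asToken, err := newToken()
	if err != nil {
		panic(err)
	}
	hsToken, err := newToken()
	if err != nil {
		panic(err)
	}
	fmt.Print(registrationYAML("ai-bot", "http://ai-bot:8009", "ai", "@ai:vojo.chat", asToken, hsToken))
}
```

Both Synapse and the bot then read the same file, which is what removes the need to copy tokens by hand.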
Run
go run . check-config # local config smoke test (no homeserver contact)
go run . # real run (needs env + a reachable homeserver)
Image & secrets model
The image is config-less (a .dockerignore keeps .env, state/ and VCS
out of the build context; the Dockerfile copies only the binary + prompts/).
Build locally and ship like the mautrix bridges (VS Code task Deploy AI bot =
docker build -t ai-bot:custom → docker save | ssh docker load), then run on
the server with config + secrets supplied at runtime.
Config and secrets are separated: non-secret config in ai-bot.env
(env_file); the appservice tokens live in the generated registration.yaml
(read via REGISTRATION_PATH); the only remaining standalone secret is the xAI
key (XAI_API_KEY_FILE).
Compose stanza (add to ~/vojo/docker-compose.yml; the service key ai-bot
must match the registration url host http://ai-bot:8009):
ai-bot:
  image: ai-bot:custom
  container_name: vojo-ai-bot
  restart: unless-stopped
  depends_on: [synapse, postgres] # needs both up before it starts
  env_file: ./ai-bot/ai-bot.env # config incl. AI_BOT_DATABASE_URL (chmod 600, it embeds the DB password)
  environment:
    REGISTRATION_PATH: /data/registration.yaml # tokens (generated; shared with Synapse)
    STATE_DIR: /data/state # runtime dir (the operational store is now in Postgres)
    XAI_API_KEY_FILE: /data/secrets/xai_api_key # the one standalone secret
  volumes:
    - ./ai-bot:/data # owned by uid 65532 (see setup)
Also bind-mount the same registration into Synapse and restart it:
synapse:
  volumes:
    - ./ai-bot/registration.yaml:/data/ai-registration.yaml:ro
HOMESERVER_URL must use the Synapse service name (http://synapse:8008),
not localhost. Synapse and the bot must share a docker network (same compose
project does this) so Synapse can push to http://ai-bot:8009.
Verification status
Compile-level + unit-tested locally:
- ✅ go vet clean, gofmt clean, static CGO-free build.
- ✅ go test: appservice transaction handling (hs_token auth → 403 on bad token, txnId idempotency / no re-dispatch, legacy ?access_token=, user query 200/404); mention detection (m.mentions, empty-{} F29, no-body-fallback F30, pill, reply-to-bot); DM classification (invited+joined==2, F3: 2 joined + 1 invited is not a 1:1); group-vs-DM context minimisation (groups never leak third-party content); USD pricing; markdown → HTML rendering (escaping, safe-URL allowlist, false-positive guards, oversize/adversarial fallbacks).
- ✅ check-config reads env + loads the system prompt.
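Two of the behaviours listed above are small enough to sketch directly. The code below is a simplified illustration with hypothetical names (mentionsBot, isOneToOne): it covers only the m.mentions path and the member-count rule, and leaves out the pill and reply-to-bot fallbacks that mentions.go also handles.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mentions mirrors the m.mentions object of a message event's content.
type mentions struct {
	UserIDs []string `json:"user_ids"`
}

// messageContent decodes only the fields this sketch needs.
type messageContent struct {
	Body     string    `json:"body"`
	Mentions *mentions `json:"m.mentions"`
}

// mentionsBot reports whether m.mentions explicitly lists the bot.
// An empty m.mentions ({}) is an explicit "nobody", and the plain-text
// body is never scanned for the bot's name.
func mentionsBot(raw []byte, botMXID string) (bool, error) {
	var c messageContent
	if err := json.Unmarshal(raw, &c); err != nil {
		return false, err
	}
	if c.Mentions == nil {
		return false, nil
	}
	for _, id := range c.Mentions.UserIDs {
		if id == botMXID {
			return true, nil
		}
	}
	return false, nil
}

// isOneToOne treats a room as a 1:1 only when joined plus invited members
// total exactly two, so 2 joined + 1 invited is a group.
func isOneToOne(joined, invited int) bool {
	return joined+invited == 2
}

func main() {
	const bot = "@ai:vojo.chat"
	explicit := []byte(`{"body":"hi","m.mentions":{"user_ids":["@ai:vojo.chat"]}}`)
	bodyOnly := []byte(`{"body":"@ai:vojo.chat hi","m.mentions":{}}`)
	for _, raw := range [][]byte{explicit, bodyOnly} {
		ok, err := mentionsBot(raw, bot)
		fmt.Println(ok, err)
	}
	fmt.Println(isOneToOne(1, 1), isOneToOne(2, 1))
}
```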
The store-backed tests (appservice transaction handling + the dedup/limiter/warned
store in store_test.go, including the concurrent per-user-cap guarantee and
restart-durability) need a throwaway Postgres via AI_BOT_TEST_DATABASE_URL; they
skip when it is unset, so go test ./... stays green without one. To run them:
docker run -d --name pg -e POSTGRES_PASSWORD=p -p 5432:5432 postgres:16
# … create role+db vojo_ai, then:
AI_BOT_TEST_DATABASE_URL=postgres://vojo_ai:…@localhost:5432/vojo_ai?sslmode=disable go test ./...
Deferred to a live homeserver + xAI key + a loaded registration (runtime ✔):
- Synapse pushes transactions → bot replies (authenticated as @ai:vojo.chat in logs);
- invite from :vojo.chat → join, foreign-server invite → leave (F11);
- @-mention / 1:1 message → m.notice reply with reply (and thread, F27) relation, carrying a formatted_body (org.matrix.custom.html) when the answer has markdown;
- encrypted room → exactly one notice, not repeated after restart (F5);
- per-user cap → silent drop; global USD ceiling → one notice/room/day;
- a retried transaction (lost 200) is processed at most once (txn dedup).