# ai-bot

A plaintext Matrix bot user (`@ai:vojo.chat`, display name **Vojo AI**) that
answers with xAI Grok completions in its rooms: `@`-mentions in group rooms and every
message in a 1:1. It runs as a **Synapse application service**: Synapse pushes
event transactions to the bot's HTTP endpoint; the bot speaks the Matrix CS-API
back over plain HTTP (no Olm/Megolm, since Vojo rooms are unencrypted by default) and
calls the xAI OpenAI-compatible Chat Completions API.

Authentication is the appservice `as_token`/`hs_token` (from the registration).
These tokens do not expire, so there is **no token rotation and no stored password**.

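
The push-and-authenticate flow above can be sketched in Go. This is a minimal illustration under stated assumptions, not the bot's actual `appservice.go`: the names `presentedToken`, `tokenOK`, `seenTxns` and `transactionHandler` are hypothetical, and the in-memory map stands in for the durable dedup store. The endpoint path and the `Authorization: Bearer` header follow the Matrix Application Service API.

```go
package main

import (
	"crypto/subtle"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

const txnPrefix = "/_matrix/app/v1/transactions/"

// presentedToken returns the token the homeserver sent: the Authorization
// header first, then the legacy ?access_token= query parameter.
func presentedToken(authHeader, queryToken string) string {
	if t, ok := strings.CutPrefix(authHeader, "Bearer "); ok {
		return t
	}
	return queryToken
}

// tokenOK compares in constant time so timing does not leak the hs_token.
func tokenOK(got, want string) bool {
	return want != "" && subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

// seenTxns is an in-memory stand-in (assumption) for the durable txn dedup table.
type seenTxns struct {
	mu   sync.Mutex
	seen map[string]bool
}

func newSeenTxns() *seenTxns { return &seenTxns{seen: make(map[string]bool)} }

// firstTime records txnID and reports whether it was new.
func (s *seenTxns) firstTime(txnID string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.seen[txnID] {
		return false
	}
	s.seen[txnID] = true
	return true
}

// transactionHandler rejects a bad hs_token with 403 and dispatches the events
// of each transaction id at most once; a retry is acknowledged with 200 only.
func transactionHandler(hsToken string, txns *seenTxns, dispatch func(json.RawMessage)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		got := presentedToken(r.Header.Get("Authorization"), r.URL.Query().Get("access_token"))
		if !tokenOK(got, hsToken) {
			w.WriteHeader(http.StatusForbidden)
			fmt.Fprint(w, `{"errcode":"M_FORBIDDEN"}`)
			return
		}
		var body struct {
			Events []json.RawMessage `json:"events"`
		}
		if err := json.NewDecoder(r.Body).Decode(&body); err != nil {
			w.WriteHeader(http.StatusBadRequest)
			fmt.Fprint(w, `{"errcode":"M_BAD_JSON"}`)
			return
		}
		if txns.firstTime(strings.TrimPrefix(r.URL.Path, txnPrefix)) {
			for _, ev := range body.Events {
				dispatch(ev)
			}
		}
		fmt.Fprint(w, `{}`)
	}
}

func main() {
	dispatched := 0
	h := transactionHandler("hs-secret", newSeenTxns(), func(json.RawMessage) { dispatched++ })
	for i := 0; i < 2; i++ { // the second PUT simulates Synapse retrying after a lost 200
		req := httptest.NewRequest(http.MethodPut, txnPrefix+"txn1", strings.NewReader(`{"events":[{}]}`))
		req.Header.Set("Authorization", "Bearer hs-secret")
		rec := httptest.NewRecorder()
		h(rec, req)
		fmt.Println("status:", rec.Code, "dispatched so far:", dispatched)
	}
}
```

The demo drives the handler through `httptest` rather than opening a port, so it terminates on its own. Recording the transaction id before dispatch is a simplification; a durable store would record it in the same step that commits the work.
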
It is a **separate server-side service**, deployed next to Synapse. It lives in
this repo (alongside `apps/widget-*`) but ships nothing to the web client.

> Branding: user-facing name is **Vojo AI** with a generic icon. "Grok" appears
> only as the factual attribution ("powered by Grok, xAI") and as the real model
> id, never as the product name or logo (xAI Brand Guidelines).

Design source of truth: `docs/plans/grok_bot.md`. Privacy/152-ФЗ pre-launch
gating lives there (§6) and is **not** closed by this code.

## Layout

```
apps/ai-bot/
├── main.go             # entrypoint, lifecycle, `check-config` subcommand
├── config.go           # env parsing + validation + redacted summary
├── bot.go              # event handling, classification, limiter wiring
├── appservice.go       # HTTP transaction-push server (hs_token auth, txn idempotency)
├── matrix.go           # CS-API client as the appservice user (as_token + ?user_id=)
├── registration.go     # generate + read registration.yaml (tokens, mautrix idiom)
├── events.go           # Matrix event types + decoders
├── mentions.go         # m.mentions + pill/reply fallbacks (F29/F30)
├── context.go          # provider-neutral message-window assembly (trigger + bot replies)
├── llm.go              # provider-neutral types + LLMClient interface (no vendor names)
├── httpllm.go          # shared OpenAI-compatible chat/completions transport + retry (F6)
├── provider_xai.go     # thin xAI/Grok adapter over the shared transport
├── provider_gemini.go  # Gemini adapter: OpenAI-compat client + native v1beta grounding
├── pricing.go          # per-model price table (priceFor) + CostBreakdown
├── router.go           # cascade router: Layer-0 heuristic + optional Layer-1 Gemini classifier
├── cascade.go          # generate(): route dispatch with degrade-to-grok_direct
├── web.go              # WebProvider: grok_web_search (Live Search) | gemini_grounding + cap guard
├── telemetry.go        # request_log analytics row + async emit + retention trim
├── store.go            # Postgres (vojo_ai): spend ledger (+reservation/components), dedup, request_log, grounding cap
├── messages.go         # language-free emoji status reactions
├── markdown.go         # markdown → org.matrix.custom.html for the reply's formatted_body
├── util.go             # bounded dedup set + small hash
├── prompts/system_prompt.txt
├── Dockerfile          # CGO-free static build → distroless, EXPOSE 8009
└── .env.example
```

## Configuration

All via environment (see `.env.example`). Required: `HOMESERVER_URL`, `BOT_MXID`,
`AS_TOKEN`, `HS_TOKEN`, `XAI_API_KEY`, `ALLOWED_SERVERS`, `AI_BOT_DATABASE_URL`.
(`AS_TOKEN`/`HS_TOKEN` are not needed in the environment when the bot reads them
from the registration file at `REGISTRATION_PATH`; see One-time setup.)
`AS_ADDR` (default `:8009`) is the transaction-push listen address; it must match
the `url` port in the registration. The model is env-configurable (`XAI_MODEL`,
default `grok-4.20-0309-non-reasoning`).

`grok-4.3` is the newer unified model (same price, 1M context): one model with a
`reasoning_effort` dial. If you switch to `XAI_MODEL=grok-4.3`, set
`GROK_REASONING_EFFORT=none` to keep the default voice fast and cheap; otherwise the API
defaults to `low` and reasons on **every** reply. `GROK_REASONING_EFFORT` (accepted:
`none|low|medium|high`, default empty = not sent) is applied to the normal Grok voice
(grok_direct + web synthesis); leave it **empty** for `grok-4.20-non-reasoning`, which
rejects the param. The reason_then_grok route does not use this setting; it sends its
own `REASONING_EFFORT` (default `high`).

### Database

The bot keeps its **operational state** (appservice transaction + event dedup, the
daily spend ledger, and the encrypted-room warned set) in a dedicated Postgres
database `vojo_ai` on the shared server, mirroring the per-service bridge databases
(each bridge owns its own role + DB). It stores **no message content**: the room
timeline is canonical in Synapse, and the bot's xAI context window is the in-memory
buffer in `bot.go`. The schema is created/migrated on startup (a `schema_version`
table + idempotent `CREATE TABLE IF NOT EXISTS`), so a fresh `vojo_ai` needs no
manual DDL, just the role + database:

```sql
-- once, as the Postgres superuser (e.g. `docker exec vojo-postgres-1 psql -U synapse -d postgres`):
CREATE ROLE vojo_ai LOGIN PASSWORD '<32-char secret>';  -- least privilege; NOT a superuser
CREATE DATABASE vojo_ai OWNER vojo_ai;
```

Point the bot at it with `AI_BOT_DATABASE_URL` (libpq/pgx DSN). Inside the docker
network the host is the `postgres` service; `sslmode=disable` matches Synapse and
the bridges on the internal network:

```
AI_BOT_DATABASE_URL=postgres://vojo_ai:<secret>@postgres:5432/vojo_ai?sslmode=disable
```

The hard USD ceiling is priced from the **API-returned token usage** times the
per-model price table (`XAI_PRICE_*_PER_M`, `GEMINI_PRICE_*_PER_M`), so a price
change only needs those constants updated and can't silently blow the cap. The
ceiling is enforced with an optimistic **reservation** (`reserved_usd`): a request's
estimated max-cost is booked at admission and settled to the real cost afterward, so
a burst of concurrent requests can't slip past `DAILY_USD_CEILING`. Without the
reservation it could, because the real cost is only recorded after each call completes.

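
The pricing and reservation logic described above can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the `price`, `ledger`, `reserve` and `settle` names are hypothetical, amounts are kept as integer micro-USD to avoid float rounding, and the prices in `main` are made up. The real ledger lives in Postgres, not in a mutex-guarded struct.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// price is micro-USD per one million tokens (hypothetical values in main).
type price struct{ inPerM, outPerM int64 }

// cost prices API-reported token usage against the price table, in micro-USD.
func cost(p price, inTok, outTok int64) int64 {
	return (inTok*p.inPerM + outTok*p.outPerM) / 1_000_000
}

var errCeiling = errors.New("daily USD ceiling reached")

// ledger tracks settled spend plus outstanding reservations, in micro-USD.
type ledger struct {
	mu       sync.Mutex
	ceiling  int64
	spent    int64
	reserved int64
}

// reserve books the estimated max cost at admission, or refuses the request
// if spent + reserved + estimate would exceed the ceiling.
func (l *ledger) reserve(est int64) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.spent+l.reserved+est > l.ceiling {
		return errCeiling
	}
	l.reserved += est
	return nil
}

// settle releases the reservation and records the real cost.
func (l *ledger) settle(est, actual int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= est
	l.spent += actual
}

func main() {
	p := price{inPerM: 2_000_000, outPerM: 10_000_000} // $2 and $10 per 1M tokens (made up)
	est := cost(p, 8000, 2000)                         // worst-case estimate for one request
	l := &ledger{ceiling: 100_000}                     // $0.10 daily ceiling

	// A burst of five concurrent requests: only as many as fit are admitted.
	var wg sync.WaitGroup
	var admitted atomic.Int64
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if l.reserve(est) == nil {
				admitted.Add(1)
			}
		}()
	}
	wg.Wait()
	fmt.Println("estimate (micro-USD):", est, "admitted:", admitted.Load())

	// One admitted call finishes cheaper than estimated; the difference is freed.
	l.settle(est, cost(p, 3000, 500))
	fmt.Println("spent:", l.spent, "still reserved:", l.reserved)
}
```

Because the check and the booking happen under one lock, the number of admitted requests does not depend on goroutine scheduling. A database-backed version would need the same atomicity, for example a single conditional `UPDATE`.
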
### Operator accounting (Phase 1, on by default)

- `REQUEST_BUDGET_SECONDS` (default 180): overall per-request deadline shared by all
  model calls, so a slow/retried call (or a cascade) can't accumulate minutes.
- `GROK_PROMPT_CACHE` (default false): Grok caches prompt prefixes automatically; this
  toggle only adds the `x-grok-conv-id` routing header (a per-room id) to raise the
  cache hit rate. There is no `prompt_cache` body param (verified on docs.x.ai).
- `TELEMETRY_ENABLED` (default false): write a `request_log` analytics row per engaged
  request (route, per-component $, latency, degrade/ceiling reasons). The write is async
  and isolated, so its failure never drops a reply. `TELEMETRY_STORE_TEXT` (default false)
  additionally keeps the query text (for offline eval); `TELEMETRY_RETENTION_DAYS`
  (default 30) time-trims old rows. Turn telemetry on to measure the baseline before
  enabling any cascade layer.

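
A shared per-request deadline like `REQUEST_BUDGET_SECONDS` is commonly built on one `context.WithTimeout` that every model call inherits. The sketch below shows that idea with millisecond-scale stand-ins; `callModel` and `runCascade` are hypothetical names, and the bot's real wiring may differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// callModel stands in for one model HTTP call: it takes d to finish, or
// aborts early with the context error once the shared deadline passes.
func callModel(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// runCascade runs the stages in order under ONE deadline shared by all of
// them, and reports how many stages finished before the budget ran out.
func runCascade(budget time.Duration, stages ...time.Duration) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()
	done := 0
	for _, d := range stages {
		if err := callModel(ctx, d); err != nil {
			return done, err
		}
		done++
	}
	return done, nil
}

func main() {
	// Two short stages fit in the budget; the third would overrun it.
	done, err := runCascade(100*time.Millisecond,
		30*time.Millisecond, 30*time.Millisecond, 200*time.Millisecond)
	fmt.Println("stages finished:", done,
		"deadline exceeded:", errors.Is(err, context.DeadlineExceeded))
}
```

With a per-call timeout instead, three calls could each take the full timeout; a single context bounds the sum.
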
### Observability: logs & per-request trace

The bot logs with the Go stdlib `log/slog` to **stderr**; `LOG_LEVEL`
(`debug|info|warn|error`, default `info`) and `LOG_FORMAT` (`text|json`, default `text`)
control it. Set `LOG_FORMAT=json` in prod so a collector (Fluent Bit / Vector / Filebeat)
can tail the container's log output and ship the lines to OpenSearch / Loki. The bot itself
never talks to a log backend (12-factor: it just writes structured lines).

- **Trace id (always on, no content).** Every handled event gets a fresh `trace_id` (a
  random 16-byte / 32-hex value, the W3C/OpenTelemetry trace-id shape, so the `trace_id`
  field maps straight onto an OTel trace id later; full distributed tracing would still
  need a span id + `traceparent` propagation). It is minted once at the **per-event
  handler** and stamped into the request `context`, then attached to **every** log line
  for that request, through the per-room goroutine and down to the HTTP call to the model,
  so you can grep one `trace_id` to get the whole trail. (The appservice transaction-push
  logs sit above the per-event handler and carry no `trace_id`; they correlate by their
  Synapse txn id instead, since one transaction fans out to many events.) The Matrix
  `event_id` is logged on the entry/skip lines too, and is also the `request_log.ID`, so
  logs ↔ telemetry correlate.
- **Routing / selection (DEBUG, no flag, metadata only).** At `LOG_LEVEL=debug` the
  router's verdict (`route decided`: route, source, confidence, needs_web) and the final
  outcome (`generation outcome`: route actually run, fallback, degrade reason, per-stage
  ms, $) are logged. No message content, so it is safe to leave on while debugging routing.
- **Model request/response bodies (gated per-user, DEBUG).** `LOG_BODIES_USERS` is a
  comma-separated **allowlist of sender mxids** whose full model request/response bodies
  are logged (`llm exchange`). Empty (default) = **nobody**: message content never enters
  the logs. It is a **double gate**: a sender must be on the allowlist AND `LOG_LEVEL=debug`
  must be set. Bodies are truncated to a fixed ~4 KB cap. Only the
  request/response **bodies** are logged, never the URL or any header, so the API key
  cannot leak on either transport. Use it to debug your own traffic, e.g.
  `LOG_BODIES_USERS=@heaven:vojo.chat`. **Note:** once on, these lines contain cleartext
  message content + the model's reply + the sender mxid (personal data), so if you ship
  them to OpenSearch/Loki, apply retention and access control at that sink accordingly.

`TELEMETRY_*` (above) is the separate **analytics** path (a `request_log` row per request);
the logs in this section are the **debug** path. They share the `trace_id`/`event_id` correlation
keys but are independent: telemetry can be off while debug logging is on, and vice versa.

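
The trace-id mechanism can be sketched with the standard library alone. This is an illustrative sketch, not the bot's code: `newTraceID`, `withTrace`, `traceID` and `requestLogger` are hypothetical helper names, and the event id in `main` is a placeholder.

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"log/slog"
	"os"
)

// traceKey is an unexported context key type, so no other package can collide with it.
type traceKey struct{}

// newTraceID returns 16 random bytes as 32 lowercase hex characters,
// the same shape as a W3C/OpenTelemetry trace id.
func newTraceID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err) // no usable entropy source; nothing sensible to log with
	}
	return hex.EncodeToString(b[:])
}

// withTrace mints a trace id once and stamps it into the request context.
func withTrace(ctx context.Context) context.Context {
	return context.WithValue(ctx, traceKey{}, newTraceID())
}

// traceID reads the id back; it is empty for contexts that never passed withTrace.
func traceID(ctx context.Context) string {
	id, _ := ctx.Value(traceKey{}).(string)
	return id
}

// requestLogger returns a logger that adds trace_id to every line it writes.
func requestLogger(ctx context.Context, base *slog.Logger) *slog.Logger {
	return base.With("trace_id", traceID(ctx))
}

func main() {
	base := slog.New(slog.NewJSONHandler(os.Stderr, nil)) // like LOG_FORMAT=json
	ctx := withTrace(context.Background())                // once, at the per-event handler
	log := requestLogger(ctx, base)
	log.Info("event received", "event_id", "$placeholder")
	log.Info("model call finished", "route", "grok_direct")
}
```

Both lines carry the same `trace_id`, which is what makes a single grep return the whole trail for one request.
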
### Cascade (Phase 2-4): behind flags, **default OFF** (every layer off == today's bot)

All optional; an unset env is exactly today's single grok_direct call. Any layer off or
failing **degrades to grok_direct** (never silence). Do **not** enable in prod until the
offline-eval gate passes (misroute < 2-3% AND measured saving > the second provider's cost; see
`docs/plans/ai_backend_build_plan.md` §9).

| Env | Default | Meaning |
|---|---|---|
| `ROUTER_ENABLED` | false | Layer-0 heuristic router (else everything → grok_direct) |
| `ROUTER_CLASSIFIER_ENABLED` | false | Layer-1 Gemini classifier. Runs on **every** message when on (not just uncertain ones): it confirms a trivial verdict by agreement and, with `WEB_PARANOID`, raises checkable-fact lookups to web. Budget ~$0.00004/msg, reserved unconditionally. Requires `ROUTER_ENABLED` + Gemini key. |
| `TRIVIAL_OFFLOAD_ENABLED` | false | answer trivial messages with Gemini (requires Gemini key) |
| `WEB_ENABLED` | false | web_then_grok route (Gemini/Grok fetches fresh facts, **Grok stays the voice**) |
| `WEB_PROVIDER` | `grok_web_search` | `grok_web_search` (xAI Agent Tools `web_search` on the Responses API, $5/1k calls, no Gemini key) or `gemini_grounding` (**cheapest**: Gemini does the fetch via native v1beta `google_search`, Grok voices it; ~$0.0013/query, validated on `gemini-2.5-flash-lite`; the F-EXT-3 "Gemini-3 only" caveat applies to the OpenAI-compat endpoint, native v1beta works on 2.5). `gemini_grounding` requires `GEMINI_API_KEY`. |
| `WEB_PARANOID` | false | **the single switch that activates epistemic grounding.** Beyond freshness words, it unlocks the classifier-driven web arms (needs_web≥0.55, obscure entity, time-sensitive, lookup-hint), i.e. it routes checkable-fact lookups (a film's cast, a date) to grounding instead of letting Grok answer from memory and hallucinate. With it off, web routing is freshness-only (= today), so turning on the classifier alone is web-routing-neutral. **Requires `WEB_PROVIDER=gemini_grounding`** (refuses to boot on `grok_web_search`, which has no daily cap). |
| `WEB_GROUNDING_DAILY_CAP` | 450 | durable per-day cap for `gemini_grounding` before degrading. Google gives **1,500 grounded requests/day free** (shared Flash/Flash-Lite, both free & paid tiers; verified ai.google.dev/pricing); keep the cap **under 1,500** so grounding stays free (token-only). Must be > 0 for `gemini_grounding`: a non-positive cap would silently disable grounding, so the bot refuses to boot. |
| `GEMINI_GROUNDING_PER_PROMPT_USD` | 0.035 | the per-grounded-prompt FEE booked into the ledger so the `DAILY_USD_CEILING` accounts for it. The fee is **$35/1k = $0.035** but ONLY applies **above** the 1,500/day free allowance. So while `WEB_GROUNDING_DAILY_CAP ≤ 1,500` (e.g. the 450 default) grounding never hits the fee → **set `0`** (the bot then books only token cost, which is correct). Set `0.035` only if you raise the cap above 1,500/day, so the ceiling throttles before silently overrunning on requests #1501+. |
| `REASONING_ENABLED` | false | manual "think harder" route on `REASONING_TRIGGER` |
| `REASONING_TRIGGER` | `подумай глубже` | trigger phrase (Russian for "think deeper") |
| `REASONING_MODEL` | `grok-4.3` | a **reasoning-capable** model (the default `grok-4.20-non-reasoning` rejects `reasoning_effort`) |
| `REASONING_EFFORT` | `high` | the reasoning_effort the "think harder" route sends (`none\|low\|medium\|high`) |
| `GEMINI_API_KEY` / `_FILE` | (none) | required only when a Gemini-using layer is on (fail-fast at startup otherwise) |
| `GEMINI_MODEL` | `gemini-2.5-flash-lite` | cheap model for trivial/classifier |
| `GEMINI_BASE_URL` | `…/v1beta/openai` | OpenAI-compat endpoint (native grounding endpoint derived from it) |

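
The "degrade to grok_direct, never silence" rule can be sketched as a small dispatcher. This is a hedged illustration only: the `route`, `handler` and `outcome` types, the `generate` signature and the error values are assumptions and do not reflect the real `cascade.go`.

```go
package main

import (
	"errors"
	"fmt"
)

type route string

const (
	grokDirect  route = "grok_direct"
	webThenGrok route = "web_then_grok"
)

// handler produces a reply for one route (hypothetical signature).
type handler func(prompt string) (string, error)

// outcome records what actually ran and why a fallback happened, if any.
type outcome struct {
	Reply         string
	Ran           route
	DegradeReason string
}

var (
	errWebDown  = errors.New("web provider unavailable")
	errNoDirect = errors.New("grok_direct handler missing")
)

// generate tries the wanted route; if that route is disabled (not registered)
// or fails, it falls back to grok_direct and keeps the reason for the logs.
func generate(want route, handlers map[route]handler, prompt string) (outcome, error) {
	reason := ""
	if want != grokDirect {
		if h, ok := handlers[want]; !ok {
			reason = "route disabled"
		} else if reply, err := h(prompt); err != nil {
			reason = err.Error()
		} else {
			return outcome{Reply: reply, Ran: want}, nil
		}
	}
	direct, ok := handlers[grokDirect]
	if !ok {
		return outcome{}, errNoDirect
	}
	reply, err := direct(prompt)
	if err != nil {
		return outcome{}, err
	}
	return outcome{Reply: reply, Ran: grokDirect, DegradeReason: reason}, nil
}

func main() {
	handlers := map[route]handler{
		grokDirect:  func(p string) (string, error) { return "direct answer to: " + p, nil },
		webThenGrok: func(p string) (string, error) { return "", errWebDown },
	}
	out, err := generate(webThenGrok, handlers, "who won yesterday?")
	fmt.Println(out.Ran, "|", out.DegradeReason, "|", err)
}
```

Keeping the degrade reason in the returned value, rather than only logging it at the failure site, lets one place emit both the debug line and the telemetry row.
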
## One-time setup (appservice registration)

Like the mautrix bridges (e.g. telegram), the bot **generates its own
registration** (random `as_token`/`hs_token`) and reads its tokens back from that
same file: the single source of truth shared with Synapse, no hand-copying.

1. Generate it (writes `REGISTRATION_PATH`, default `/data/registration.yaml`):
   ```bash
   docker compose run --rm ai-bot generate-registration
   ```
2. Bind-mount that same file into the Synapse container (e.g. as
   `/data/ai-registration.yaml`) and add it to `homeserver.yaml`:
   ```yaml
   app_service_config_files:
     - /data/ai-registration.yaml
   ```
3. **Restart Synapse** (it caches AS configs at startup). Synapse auto-creates
   `@ai:vojo.chat` from `sender_localpart`, so no `register_new_matrix_user` is needed.

The bot reads `REGISTRATION_PATH` for its tokens (no env `AS_TOKEN`/`HS_TOKEN`
needed) and sets its own display name (`BOT_DISPLAY_NAME`, default "Vojo AI") on
startup. The bot writes/reads `/data`, so that dir must be owned by the image's
runtime uid (distroless nonroot = **65532**): `sudo chown -R 65532:65532 ~/vojo/ai-bot`.

## Run

```bash
go run . check-config   # local config smoke test (no homeserver contact)
go run .                # real run (needs env + a reachable homeserver)
```

### Image & secrets model

The image is **config-less** (a `.dockerignore` keeps `.env`, `state/` and VCS
out of the build context; the Dockerfile copies only the binary + `prompts/`).
Build locally and ship like the mautrix bridges (VS Code task **Deploy AI bot** =
`docker build -t ai-bot:custom` → `docker save | ssh docker load`), then run on
the server with config + secrets supplied at runtime.

Config and secrets are **separated**: non-secret config in `ai-bot.env`
(`env_file`); the appservice tokens live in the generated `registration.yaml`
(read via `REGISTRATION_PATH`); the only remaining standalone secret is the xAI
key (`XAI_API_KEY_FILE`).

Compose stanza (add to `~/vojo/docker-compose.yml`; the **service key `ai-bot`**
must match the registration `url` host `http://ai-bot:8009`):

```yaml
ai-bot:
  image: ai-bot:custom
  container_name: vojo-ai-bot
  restart: unless-stopped
  depends_on: [synapse, postgres]  # needs both up before it starts
  env_file: ./ai-bot/ai-bot.env    # config incl. AI_BOT_DATABASE_URL (chmod 600: embeds the DB password)
  environment:
    REGISTRATION_PATH: /data/registration.yaml   # tokens (generated; shared with Synapse)
    STATE_DIR: /data/state                       # runtime dir (the operational store is now in Postgres)
    XAI_API_KEY_FILE: /data/secrets/xai_api_key  # the one standalone secret
  volumes:
    - ./ai-bot:/data               # owned by uid 65532 (see setup)
```

Also bind-mount the same registration into Synapse and restart it:

```yaml
synapse:
  volumes:
    - ./ai-bot/registration.yaml:/data/ai-registration.yaml:ro
```

`HOMESERVER_URL` must use the Synapse **service name** (`http://synapse:8008`),
not `localhost`. Synapse and the bot must share a docker network (same compose
project does this) so Synapse can push to `http://ai-bot:8009`.

## Verification status

Compile-level + unit-tested locally:

- ✅ `go vet` clean, `gofmt` clean, static CGO-free build.
- ✅ `go test`: appservice transaction handling (hs_token auth → 403 on bad
  token, txnId idempotency / no re-dispatch, legacy `?access_token=`, user query
  200/404); mention detection (m.mentions, empty-`{}` F29, no-body-fallback F30,
  pill, reply-to-bot); DM classification (invited+joined==2, F3: 2 joined + 1
  invited is **not** a 1:1); group-vs-DM context minimisation (groups never leak
  third-party content); USD pricing; **markdown → HTML rendering** (escaping,
  safe-URL allowlist, false-positive guards, oversize/adversarial fallbacks).
- ✅ `check-config` reads env + loads the system prompt.

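
The DM classification rule listed above (invited + joined == 2) combines with the engagement rule from the introduction (every message in a 1:1, only `@`-mentions in groups). A minimal sketch follows; `isDirect` and `shouldReply` are hypothetical names, and the member counts are assumed to come from room state.

```go
package main

import "fmt"

// isDirect treats a room as a 1:1 only when joined and invited members
// together number exactly two.
func isDirect(joined, invited int) bool {
	return joined+invited == 2
}

// shouldReply engages on every message in a 1:1 and only on mentions elsewhere.
func shouldReply(joined, invited int, mentioned bool) bool {
	return isDirect(joined, invited) || mentioned
}

func main() {
	fmt.Println(isDirect(2, 0))          // two joined members: a 1:1
	fmt.Println(isDirect(1, 1))          // one joined, one still invited: a 1:1
	fmt.Println(isDirect(2, 1))          // the F3 case: a pending third invite, not a 1:1
	fmt.Println(shouldReply(5, 0, true)) // group room, bot mentioned
}
```

Counting the pending invite matters for the F3 case: treating "2 joined" alone as a 1:1 would make the bot answer every message in a room that is about to become a group.
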
The store-backed tests (appservice transaction handling + the dedup/limiter/warned
store in `store_test.go`, including the concurrent per-user-cap guarantee and
restart-durability) need a throwaway Postgres via `AI_BOT_TEST_DATABASE_URL`; they
**skip** when it is unset, so `go test ./...` stays green without one. To run them:

```bash
docker run -d --name pg -e POSTGRES_PASSWORD=p -p 5432:5432 postgres:16
# … create role+db vojo_ai, then:
AI_BOT_TEST_DATABASE_URL=postgres://vojo_ai:…@localhost:5432/vojo_ai?sslmode=disable go test ./...
```

Deferred to a live homeserver + xAI key + a loaded registration (runtime ✔):

- Synapse pushes transactions → bot replies (`authenticated as @ai:vojo.chat` in logs);
- invite from `:vojo.chat` → join, foreign-server invite → leave (F11);
- `@`-mention / 1:1 message → `m.notice` reply with reply (and thread, F27) relation,
  carrying a `formatted_body` (org.matrix.custom.html) when the answer has markdown;
- encrypted room → exactly one notice, **not** repeated after restart (F5);
- per-user cap → silent drop; global USD ceiling → one notice/room/day;
- a retried transaction (lost 200) is processed at most once (txn dedup).