# ai-bot

A plaintext Matrix bot user (`@ai:vojo.chat`, display name **Vojo AI**) that
answers with xAI Grok completions in its rooms: `@`-mentions in group rooms and
every message in a 1:1. It runs as a **Synapse application service**: Synapse pushes
event transactions to the bot's HTTP endpoint, and the bot speaks the Matrix CS-API
back over plain HTTP (no Olm/Megolm, since Vojo rooms are unencrypted by default) and
calls the xAI OpenAI-compatible Chat Completions API.

Authentication uses the appservice `as_token`/`hs_token` from the registration. Both
are non-expiring, so there is **no token rotation and no stored password**.

It is a **separate server-side service**, deployed next to Synapse. It lives in
this repo (alongside `apps/widget-*`) but ships nothing to the web client.

> Branding: the user-facing name is **Vojo AI** with a generic icon. "Grok" appears
> only as the factual attribution ("powered by Grok, xAI") and as the real model
> id, never as the product name or logo (xAI Brand Guidelines).

Design source of truth: `docs/plans/grok_bot.md`. The privacy/152-ФЗ pre-launch
gating lives there (§6) and is **not** closed by this code.
## Layout
```
apps/ai-bot/
├── main.go # entrypoint, lifecycle, `check-config` subcommand
├── config.go # env parsing + validation + redacted summary
├── bot.go # event handling, classification, limiter wiring
├── appservice.go # HTTP transaction-push server (hs_token auth, txn idempotency)
├── matrix.go # CS-API client as the appservice user (as_token + ?user_id=)
├── registration.go # generate + read registration.yaml (tokens, mautrix idiom)
├── events.go # Matrix event types + decoders
├── mentions.go # m.mentions + pill/reply fallbacks (F29/F30)
├── context.go # provider-neutral message-window assembly (trigger + bot replies)
├── llm.go # provider-neutral types + LLMClient interface (no vendor names)
├── httpllm.go # shared OpenAI-compatible chat/completions transport + retry (F6)
├── provider_xai.go # thin xAI/Grok adapter over the shared transport
├── provider_gemini.go # Gemini adapter: OpenAI-compat client + native v1beta grounding
├── pricing.go # per-model price table (priceFor) + CostBreakdown
├── router.go # cascade router: Layer-0 heuristic + optional Layer-1 Gemini classifier
├── cascade.go # generate(): route dispatch with degrade-to-grok_direct
├── web.go # WebProvider: grok_web_search (Live Search) | gemini_grounding + cap guard
├── telemetry.go # request_log analytics row + async emit + retention trim
├── store.go # Postgres (vojo_ai): spend ledger (+reservation/components), dedup, request_log, grounding cap
├── messages.go # language-free emoji status reactions
├── markdown.go # markdown → org.matrix.custom.html for the reply's formatted_body
├── util.go # bounded dedup set + small hash
├── prompts/system_prompt.txt
├── Dockerfile # CGO-free static build → distroless, EXPOSE 8009
└── .env.example
```
## Configuration
All via environment (see `.env.example`). Required: `HOMESERVER_URL`, `BOT_MXID`,
`XAI_API_KEY` (or `XAI_API_KEY_FILE`), `ALLOWED_SERVERS`, `AI_BOT_DATABASE_URL`, and
the appservice tokens `AS_TOKEN`/`HS_TOKEN` (not needed in the environment when the bot
reads them from the registration at `REGISTRATION_PATH`; see One-time setup).
`AS_ADDR` (default `:8009`) is the transaction-push listen address; it must match
the `url` port in the registration. The model is env-configurable (`XAI_MODEL`,
default `grok-4.20-0309-non-reasoning`).

`grok-4.3` is the newer unified model (same price as the default, 1M context): one
model with a `reasoning_effort` dial. If you switch to `XAI_MODEL=grok-4.3`, set
`GROK_REASONING_EFFORT=none` to keep the default voice fast and cheap; otherwise the
API defaults to `low` and reasons on **every** reply. `GROK_REASONING_EFFORT` (accepted:
`none|low|medium|high`, default empty = not sent) applies to the normal Grok voice
(grok_direct + web synthesis); leave it **empty** for `grok-4.20-non-reasoning`, which
rejects the parameter. The reason_then_grok route ignores this setting and sends
`REASONING_EFFORT` instead (default `high`).
### Database
The bot keeps its **operational state** (appservice transaction + event dedup, the
daily spend ledger, the encrypted-room warned set and, when enabled, the `request_log`
telemetry and grounding-cap counters) in a dedicated Postgres database `vojo_ai` on the
shared server, mirroring the per-service bridge databases (each bridge owns its own
role + DB). By default it stores **no message content** (`TELEMETRY_STORE_TEXT` is the
opt-in exception): the room timeline is canonical in Synapse, and the bot's xAI context
window is the in-memory buffer in `bot.go`. The schema is created/migrated on startup (a
`schema_version` table + idempotent `CREATE TABLE IF NOT EXISTS`), so a fresh `vojo_ai`
needs no manual DDL, only the role + database:
```sql
-- once, as the Postgres superuser (e.g. `docker exec vojo-postgres-1 psql -U synapse -d postgres`):
CREATE ROLE vojo_ai LOGIN PASSWORD '<32-char secret>'; -- least privilege; NOT a superuser
CREATE DATABASE vojo_ai OWNER vojo_ai;
```
Point the bot at it with `AI_BOT_DATABASE_URL` (libpq/pgx DSN). Inside the docker
network the host is the `postgres` service; `sslmode=disable` matches Synapse and
the bridges on the internal network:
```
AI_BOT_DATABASE_URL=postgres://vojo_ai:<secret>@postgres:5432/vojo_ai?sslmode=disable
```
The hard USD ceiling is priced from the **API-returned token usage** times the
per-model price table (`XAI_PRICE_*_PER_M`, `GEMINI_PRICE_*_PER_M`), so a price
change only needs those constants updated — it can't silently blow the cap. The
ceiling is enforced with an optimistic **reservation** (`reserved_usd`): a request's
estimated max-cost is booked at admission and settled to the real cost afterward, so
a burst of concurrent requests can't slip past `DAILY_USD_CEILING` (without the
reservation it could, because the real cost is only known after each call returns).
### Operator accounting (Phase 1, on by default)
- `REQUEST_BUDGET_SECONDS` (default 180) — overall per-request deadline shared by all
model calls, so a slow/retried call (or a cascade) can't accrete minutes.
- `GROK_PROMPT_CACHE` (default false) — Grok caches prompt prefixes automatically; this
toggle only adds the `x-grok-conv-id` routing header (a per-room id) to raise the
cache hit rate. There is no `prompt_cache` body param (verified on docs.x.ai).
- `TELEMETRY_ENABLED` (default false) — write a `request_log` analytics row per engaged
request (route, per-component $, latency, degrade/ceiling reasons). The write is async
and isolated — its failure never drops a reply. `TELEMETRY_STORE_TEXT` (default false)
additionally keeps the query text (for offline eval); `TELEMETRY_RETENTION_DAYS`
(default 30) time-trims old rows. Turn telemetry on to **measure** the baseline before
enabling any cascade layer.
### Observability — logs & per-request trace
The bot logs with the Go stdlib `log/slog` to **stderr**; `LOG_LEVEL`
(`debug|info|warn|error`, default `info`) and `LOG_FORMAT` (`text|json`, default `text`)
control it. Set `LOG_FORMAT=json` in prod so a collector (Fluent Bit / Vector / Filebeat)
can tail the container's log stream (Docker captures stderr alongside stdout) and ship the
lines to OpenSearch / Loki. The bot itself never talks to a log backend (12-factor: it
just writes structured lines).
- **Trace id (always on, no content).** Every handled event gets a fresh `trace_id` (a
random 16-byte / 32-hex value — the W3C/OpenTelemetry trace-id shape, so the `trace_id`
field maps straight onto an OTel trace id later; full distributed tracing would still
need a span id + `traceparent` propagation). It is minted once at the **per-event
handler** and stamped into the request `context`, then attached to **every** log line
for that request — through the per-room goroutine and down to the HTTP call to the model
— so you can grep one `trace_id` to get the whole trail. (The appservice transaction-push
logs sit above the per-event handler and carry no `trace_id`; they correlate by their
Synapse txn id instead, since one transaction fans out to many events.) The Matrix
`event_id` is logged on the entry/skip lines too, and is the `request_log.ID`, so
logs ↔ telemetry correlate.
- **Routing / selection (DEBUG, no flag — metadata only).** At `LOG_LEVEL=debug` the
router's verdict (`route decided`: route, source, confidence, needs_web) and the final
outcome (`generation outcome`: route actually run, fallback, degrade reason, per-stage
ms, $) are logged. No message content — safe to leave on while debugging routing.
- **Model request/response bodies (gated per-user, DEBUG).** `LOG_BODIES_USERS` is a
comma-separated **allowlist of sender mxids** whose full model request/response bodies
are logged (`llm exchange`). Empty (default) = **nobody** — message content never enters
the logs. It is a **double gate**: a sender must be on the allowlist AND `LOG_LEVEL=debug`
must be set. Bodies are truncated to a fixed ~4 KB cap. Only the
request/response **bodies** are logged — never the URL or any header — so the API key
cannot leak on either transport. Use it to debug your own traffic, e.g.
`LOG_BODIES_USERS=@heaven:vojo.chat`. **Note:** once on, these lines contain cleartext
message content + the model's reply + the sender mxid (personal data) — so if you ship
them to OpenSearch/Loki, apply retention and access control at that sink accordingly.

`TELEMETRY_*` (under Operator accounting, above) is the separate **analytics** path (a
`request_log` row per request); the logs above are the **debug** path. They share the
`trace_id`/`event_id` correlation keys but are independent: telemetry can be off while
debug logging is on, and vice versa.
### Cascade (Phase 2-4) — behind flags, **default OFF** (every layer off == today's bot)
All optional; with every cascade env unset the bot makes exactly today's single
grok_direct call. Any layer that is off or failing **degrades to grok_direct** (never
silence). Do **not** enable in prod until the offline-eval gate passes (misroute < 2-3%
AND measured saving > the second provider's cost; see
`docs/plans/ai_backend_build_plan.md` §9).

| Env | Default | Meaning |
|---|---|---|
| `ROUTER_ENABLED` | false | Layer-0 heuristic router (else everything → grok_direct) |
| `ROUTER_CLASSIFIER_ENABLED` | false | Layer-1 Gemini classifier. When on it runs on **every** message (not just uncertain ones): it confirms the heuristic's trivial verdicts by agreement and, with `WEB_PARANOID`, raises checkable-fact lookups to web. Budget ~$0.00004/msg, reserved unconditionally. Requires `ROUTER_ENABLED` + Gemini key. |
| `TRIVIAL_OFFLOAD_ENABLED` | false | answer trivial messages with Gemini (requires Gemini key) |
| `WEB_ENABLED` | false | web_then_grok route (Gemini/Grok fetches fresh facts, **Grok stays the voice**) |
| `WEB_PROVIDER` | `grok_web_search` | `grok_web_search` (xAI Agent Tools `web_search` on the Responses API, $5/1k calls, no Gemini key) or `gemini_grounding` (**cheapest**: Gemini does the fetch via native v1beta `google_search` and Grok voices it; ~$0.0013/query, validated on `gemini-2.5-flash-lite`. The F-EXT-3 "Gemini-3 only" caveat applies to the OpenAI-compat endpoint; native v1beta works on 2.5). `gemini_grounding` requires `GEMINI_API_KEY`. |
| `WEB_PARANOID` | false | **the single switch that activates epistemic grounding.** Beyond freshness words, it unlocks the classifier-driven web arms (needs_web≥0.55, obscure entity, time-sensitive, lookup-hint) — i.e. it routes checkable-fact lookups (a film's cast, a date) to grounding instead of letting Grok answer from memory and hallucinate. With it off, web routing is freshness-only (= today), so turning on the classifier alone is web-routing-neutral. **Requires `WEB_PROVIDER=gemini_grounding`** (refuses to boot on `grok_web_search`, which has no daily cap). |
| `WEB_GROUNDING_DAILY_CAP` | 450 | durable per-day cap for `gemini_grounding` before degrading. Google gives **1,500 grounded requests/day free** (shared Flash/Flash-Lite, both free & paid tiers; verified ai.google.dev/pricing); keep the cap **under 1,500** so grounding stays free (token-only). Must be > 0 for `gemini_grounding` (a non-positive cap silently disables grounding → refuses to boot). |
| `GEMINI_GROUNDING_PER_PROMPT_USD` | 0.035 | the per-grounded-prompt FEE booked into the ledger so the `DAILY_USD_CEILING` accounts for it. The fee is **$35/1k = $0.035** but ONLY applies **above** the 1,500/day free allowance. So while `WEB_GROUNDING_DAILY_CAP ≤ 1,500` (e.g. the 450 default) grounding never hits the fee → **set `0`** (the bot then books only token cost, which is correct). Set `0.035` only if you raise the cap above 1,500/day, so the ceiling throttles before silently overrunning on requests #1501+. |
| `PROJECT_KB_ENABLED` | false | **project_then_grok route**: answers questions about the **Vojo product itself** (features/how-to/limits/privacy) from a curated KB instead of Grok's empty memory (Grok doesn't know Vojo) or the web (Google doesn't either). Gated by the classifier's `about_project` signal. The classifier is the context-aware judge: it sees the conversation, so it resolves follow-ups like "Про этот" ("about this one") to the app, which a bare-message regex can't; a false positive is cheap (the entity-scoped note keeps Grok answering the real question). The KB is injected as a system note with an **entity-scoped** anti-hallucination instruction (Vojo claims from the KB only; "I don't have that" when absent; general parts answered normally). Takes precedence over every web arm. **Requires `ROUTER_CLASSIFIER_ENABLED`** (+ transitively `ROUTER_ENABLED` + Gemini key). One Grok call (no extra model call), so `reserveEstimate` is unchanged; the KB adds ≤~2,500 input tokens on top of the capped prompt (a bounded slight under-reservation; `Settle` books the actual cost). |
| `PROJECT_KB_PATH` | `prompts/vojo_kb.txt` | path to the curated KB text file (operator data, **not** code), loaded once at startup like `SYSTEM_PROMPT_PATH` (no hot-reload — edit + restart). **Defaults to the KB baked into the image**, so enabling the route needs only `PROJECT_KB_ENABLED=true`. An empty/missing file or a KB over ~2,500 tokens **refuses to boot** (fail-closed). Format: terse bullets, one fact per line, keep negations explicit. |
| `REASONING_ENABLED` | false | manual "think harder" route on `REASONING_TRIGGER` |
| `REASONING_TRIGGER` | `подумай глубже` | trigger phrase (Russian for "think deeper") |
| `REASONING_MODEL` | `grok-4.3` | a **reasoning-capable** model (the default `grok-4.20-non-reasoning` rejects `reasoning_effort`) |
| `REASONING_EFFORT` | `high` | the `reasoning_effort` the "think harder" route sends (`none\|low\|medium\|high`) |
| `GEMINI_API_KEY` / `_FILE` | — | required only when a Gemini-using layer is on (fail-fast at startup otherwise) |
| `GEMINI_MODEL` | `gemini-2.5-flash-lite` | cheap model for trivial/classifier |
| `GEMINI_BASE_URL` | `…/v1beta/openai` | OpenAI-compat endpoint (native grounding endpoint derived from it) |
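
The "refuses to boot" rules in the table above amount to a fail-closed check at startup. A sketch that collects every violated dependency with `errors.Join` (Go 1.20+); `cascadeConfig` and its field names are hypothetical, and only dependencies stated in the table are encoded.

```go
package main

import (
	"errors"
	"fmt"
)

// cascadeConfig mirrors a subset of the cascade env table (hypothetical struct).
type cascadeConfig struct {
	RouterEnabled     bool   // ROUTER_ENABLED
	ClassifierEnabled bool   // ROUTER_CLASSIFIER_ENABLED
	TrivialOffload    bool   // TRIVIAL_OFFLOAD_ENABLED
	WebEnabled        bool   // WEB_ENABLED
	WebProvider       string // WEB_PROVIDER
	WebParanoid       bool   // WEB_PARANOID
	GroundingCap      int    // WEB_GROUNDING_DAILY_CAP
	ProjectKB         bool   // PROJECT_KB_ENABLED
	GeminiKey         string // GEMINI_API_KEY / _FILE
}

// validate fails closed at boot, collecting every violated dependency.
func (c cascadeConfig) validate() error {
	if c.WebProvider == "" {
		c.WebProvider = "grok_web_search" // documented default
	}
	var errs []error
	check := func(bad bool, msg string) {
		if bad {
			errs = append(errs, errors.New(msg))
		}
	}
	grounding := c.WebProvider == "gemini_grounding"
	check(c.WebProvider != "grok_web_search" && !grounding, "WEB_PROVIDER: want grok_web_search|gemini_grounding")
	check(c.ClassifierEnabled && !c.RouterEnabled, "ROUTER_CLASSIFIER_ENABLED requires ROUTER_ENABLED")
	check((c.ClassifierEnabled || c.TrivialOffload || (c.WebEnabled && grounding)) && c.GeminiKey == "",
		"an enabled Gemini layer requires GEMINI_API_KEY")
	check(c.WebParanoid && !grounding, "WEB_PARANOID requires WEB_PROVIDER=gemini_grounding")
	check(grounding && c.GroundingCap <= 0, "WEB_GROUNDING_DAILY_CAP must be > 0 for gemini_grounding")
	check(c.ProjectKB && !c.ClassifierEnabled, "PROJECT_KB_ENABLED requires ROUTER_CLASSIFIER_ENABLED")
	return errors.Join(errs...)
}

func main() {
	fmt.Println(cascadeConfig{}.validate())                  // <nil>: every layer off
	fmt.Println(cascadeConfig{WebParanoid: true}.validate()) // refuses to boot
}
```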
## One-time setup (appservice registration)
Like the mautrix bridges (e.g. telegram), the bot **generates its own
registration** (random `as_token`/`hs_token`) and reads its tokens back from that
same file — the single source of truth shared with Synapse, no hand-copying.
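
Generating and reading back the registration can be sketched as below. The field names (`id`, `url`, `as_token`, `hs_token`, `sender_localpart`, `namespaces`) come from the Matrix Application Service registration format; the `id` value, the 64-hex-char token length and the empty namespaces are assumptions, and `readField` is a naive stand-in for a real YAML parser.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// randomToken returns 32 random bytes as 64 hex chars (the length is an assumption).
func randomToken() string {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// registrationYAML renders a minimal appservice registration; id and namespaces are hypothetical.
func registrationYAML(id, url, localpart, asToken, hsToken string) string {
	return fmt.Sprintf(`id: %s
url: %s
as_token: %s
hs_token: %s
sender_localpart: %s
namespaces:
  users: []
  aliases: []
  rooms: []
`, id, url, asToken, hsToken, localpart)
}

// readField is a naive "key: value" lookup standing in for a real YAML parser.
func readField(doc, key string) string {
	for _, line := range strings.Split(doc, "\n") {
		if v, ok := strings.CutPrefix(line, key+": "); ok {
			return strings.TrimSpace(v)
		}
	}
	return ""
}

func main() {
	asTok, hsTok := randomToken(), randomToken()
	path := filepath.Join(os.TempDir(), "registration.yaml")
	doc := registrationYAML("vojo-ai", "http://ai-bot:8009", "ai", asTok, hsTok)
	if err := os.WriteFile(path, []byte(doc), 0o600); err != nil { // tokens are secrets
		panic(err)
	}
	data, err := os.ReadFile(path) // the bot and Synapse both read this one file
	if err != nil {
		panic(err)
	}
	fmt.Println(readField(string(data), "as_token") == asTok) // true
}
```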
1. Generate it (writes `REGISTRATION_PATH`, default `/data/registration.yaml`):

   ```bash
   docker compose run --rm ai-bot generate-registration
   ```

2. Bind-mount that same file into the Synapse container (e.g. as
   `/data/ai-registration.yaml`) and add it to `homeserver.yaml`:

   ```yaml
   app_service_config_files:
     - /data/ai-registration.yaml
   ```

3. **Restart Synapse** (it caches AS configs at startup). Synapse auto-creates
   `@ai:vojo.chat` from `sender_localpart`, so no `register_new_matrix_user` is needed.

The bot reads `REGISTRATION_PATH` for its tokens (no env `AS_TOKEN`/`HS_TOKEN`
needed) and sets its own display name (`BOT_DISPLAY_NAME`, default "Vojo AI") on
startup. The bot writes/reads `/data` (step 1 already writes there), so that dir must
be owned by the image's runtime uid (distroless nonroot = **65532**) before step 1:
`sudo chown -R 65532:65532 ~/vojo/ai-bot`.
## Run
```bash
go run . check-config # local config smoke test (no homeserver contact)
go run . # real run (needs env + a reachable homeserver)
```
### Image & secrets model
The image is **config-less** (a `.dockerignore` keeps `.env`, `state/` and VCS
out of the build context; the Dockerfile copies only the binary + `prompts/`).
Build locally and ship like the mautrix bridges (VS Code task **Deploy AI bot** =
`docker build -t ai-bot:custom` → `docker save | ssh docker load`), then run on
the server with config + secrets supplied at runtime.

Config and secrets are **separated**: non-secret config in `ai-bot.env`
(`env_file`); the appservice tokens live in the generated `registration.yaml`
(read via `REGISTRATION_PATH`); the only remaining standalone secret is the xAI
key (`XAI_API_KEY_FILE`).

Compose stanza (add under `services:` in `~/vojo/docker-compose.yml`; the **service
key `ai-bot`** must match the registration `url` host `http://ai-bot:8009`):
```yaml
ai-bot:
  image: ai-bot:custom
  container_name: vojo-ai-bot
  restart: unless-stopped
  depends_on: [synapse, postgres] # start order only; the short form does not wait for readiness
  env_file: ./ai-bot/ai-bot.env # config incl. AI_BOT_DATABASE_URL (chmod 600: embeds the DB password)
  environment:
    REGISTRATION_PATH: /data/registration.yaml # tokens (generated; shared with Synapse)
    STATE_DIR: /data/state # runtime dir (the operational store is now in Postgres)
    XAI_API_KEY_FILE: /data/secrets/xai_api_key # the one standalone secret
  volumes:
    - ./ai-bot:/data # owned by uid 65532 (see setup)
```
Also bind-mount the same registration into Synapse and restart it:
```yaml
synapse:
  volumes:
    - ./ai-bot/registration.yaml:/data/ai-registration.yaml:ro
```
`HOMESERVER_URL` must use the Synapse **service name** (`http://synapse:8008`),
not `localhost`. Synapse and the bot must share a docker network (same compose
project does this) so Synapse can push to `http://ai-bot:8009`.
## Verification status
Compile-level + unit-tested locally:
- ✅ `go vet` clean, `gofmt` clean, static CGO-free build.
- ✅ `go test` — appservice transaction handling (hs_token auth → 403 on bad
token, txnId idempotency / no re-dispatch, legacy `?access_token=`, user query
200/404); mention detection (m.mentions, empty-`{}` F29, no-body-fallback F30,
pill, reply-to-bot); DM classification (invited+joined==2, F3: 2 joined + 1
invited is **not** a 1:1); group-vs-DM context minimisation (groups never leak
third-party content); USD pricing; **markdown → HTML rendering** (escaping,
safe-URL allowlist, false-positive guards, oversize/adversarial fallbacks).
- ✅ `check-config` reads env + loads the system prompt.
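
The DM rule and the group-vs-DM minimisation can be sketched as two pure functions, which is what makes them easy to unit-test. `isOneToOne` follows the stated rule (joined + invited == 2); `contextWindow`, `msg` and the `limit` parameter are hypothetical, and keeping only the bot's own replies plus the trigger in groups is an assumption based on the Layout note for `context.go`.

```go
package main

import "fmt"

// isOneToOne: a room is a 1:1 when joined + invited members total exactly two,
// so 2 joined + 1 invited (F3) is a group.
func isOneToOne(joined, invited int) bool {
	return joined+invited == 2
}

type msg struct{ Sender, Body string }

// contextWindow keeps the recent history in a 1:1 but, in a group, only the bot's
// own replies, so third-party messages never reach the model. The trigger is last.
func contextWindow(history []msg, trigger msg, bot string, oneToOne bool, limit int) []msg {
	var out []msg
	for _, m := range history {
		if oneToOne || m.Sender == bot {
			out = append(out, m)
		}
	}
	if len(out) > limit {
		out = out[len(out)-limit:]
	}
	return append(out, trigger)
}

func main() {
	hist := []msg{{"@alice:vojo.chat", "hi"}, {"@ai:vojo.chat", "hello!"}, {"@carol:vojo.chat", "secret"}}
	trigger := msg{"@alice:vojo.chat", "@ai summarise"}
	fmt.Println(isOneToOne(2, 1), len(contextWindow(hist, trigger, "@ai:vojo.chat", false, 20)))
	// false 2: carol's message is not sent to the model
}
```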

The store-backed tests (appservice transaction handling + the dedup/limiter/warned
store in `store_test.go`, including the concurrent per-user-cap guarantee and
restart-durability) need a throwaway Postgres via `AI_BOT_TEST_DATABASE_URL`; they
**skip** when it is unset, so `go test ./...` stays green without one. To run them:
```bash
docker run -d --name pg -e POSTGRES_PASSWORD=p -p 5432:5432 postgres:16
# … create role+db vojo_ai, then:
AI_BOT_TEST_DATABASE_URL=postgres://vojo_ai:…@localhost:5432/vojo_ai?sslmode=disable go test ./...
```
Deferred to a live homeserver + xAI key + a loaded registration (runtime ✔):
- Synapse pushes transactions → bot replies (`authenticated as @ai:vojo.chat` in logs);
- invite from `:vojo.chat` → join, foreign-server invite → leave (F11);
- `@`-mention / 1:1 message → an `m.notice` with a reply relation (and a thread
  relation, F27), carrying a `formatted_body` (org.matrix.custom.html) when the answer has markdown;
- encrypted room → exactly one notice, **not** repeated after restart (F5);
- per-user cap → silent drop; global USD ceiling → one notice/room/day;
- a retried transaction (lost 200) is processed at most once (txn dedup).