# ai-bot
A plaintext Matrix bot user (`@ai:vojo.chat`, display name **Vojo AI**) that
replies with xAI Grok completions in its rooms: to `@`-mentions in group rooms and
to every message in a 1:1. It runs as a **Synapse application service**: Synapse
pushes event transactions to the bot's HTTP endpoint, and the bot speaks the Matrix
CS-API back over plain HTTP (no Olm/Megolm, since Vojo rooms are unencrypted by
default) and calls the xAI OpenAI-compatible Chat Completions API.

Authentication uses the appservice `as_token`/`hs_token` pair from the
registration. Both are non-expiring, so there is **no token rotation and no stored
password**.

It is a **separate server-side service**, deployed next to Synapse. It lives in
this repo (alongside `apps/widget-*`) but ships nothing to the web client.
> Branding: user-facing name is **Vojo AI** with a generic icon. "Grok" appears
> only as the factual attribution ("powered by Grok, xAI") and as the real model
> id — never as the product name or logo (xAI Brand Guidelines).

Design source of truth: `docs/plans/grok_bot.md`. Privacy/152-ФЗ pre-launch
gating lives there (§6) and is **not** closed by this code.
## Layout
```
apps/ai-bot/
├── main.go # entrypoint, lifecycle, `check-config` subcommand
├── config.go # env parsing + validation + redacted summary
├── bot.go # event handling, classification, limiter wiring
├── appservice.go # HTTP transaction-push server (hs_token auth, txn idempotency)
├── matrix.go # CS-API client as the appservice user (as_token + ?user_id=)
├── registration.go # generate + read registration.yaml (tokens, mautrix idiom)
├── events.go # Matrix event types + decoders
├── mentions.go # m.mentions + pill/reply fallbacks (F29/F30)
├── context.go # provider-neutral message-window assembly (trigger + bot replies)
├── llm.go # provider-neutral types + LLMClient interface (no vendor names)
├── httpllm.go # shared OpenAI-compatible chat/completions transport + retry (F6)
├── provider_xai.go # thin xAI/Grok adapter over the shared transport
├── provider_gemini.go # Gemini adapter: OpenAI-compat client + native v1beta grounding
├── pricing.go # per-model price table (priceFor) + CostBreakdown
├── router.go # cascade router: Layer-0 heuristic + optional Layer-1 Gemini classifier
├── cascade.go # generate(): route dispatch with degrade-to-grok_direct
├── web.go # WebProvider: grok_web_search (Live Search) | gemini_grounding + cap guard
├── telemetry.go # request_log analytics row + async emit + retention trim
├── store.go # Postgres (vojo_ai): spend ledger (+reservation/components), dedup, request_log, grounding cap
├── messages.go # language-free emoji status reactions
├── markdown.go # markdown → org.matrix.custom.html for the reply's formatted_body
├── util.go # bounded dedup set + small hash
├── prompts/system_ru.txt
├── Dockerfile # CGO-free static build → distroless, EXPOSE 8009
└── .env.example
```
## Configuration
All via environment (see `.env.example`). Required: `HOMESERVER_URL`, `BOT_MXID`,
`AS_TOKEN`/`HS_TOKEN` (not needed when they are read from the registration at
`REGISTRATION_PATH`, see the setup below), `XAI_API_KEY` (or `XAI_API_KEY_FILE`),
`ALLOWED_SERVERS`, `AI_BOT_DATABASE_URL`.
`AS_ADDR` (default `:8009`) is the transaction-push listen address; it must match
the `url` port in the registration. The model is env-configurable (`XAI_MODEL`,
default `grok-4.20-0309-non-reasoning`).

`grok-4.3` is the newer unified model (same price, 1M context): a single model with a
`reasoning_effort` dial. If you switch to `XAI_MODEL=grok-4.3`, set
`GROK_REASONING_EFFORT=none` to keep the default voice fast and cheap; otherwise the
API defaults to `low` and reasons on **every** reply. `GROK_REASONING_EFFORT`
(accepted: `none|low|medium|high`, default empty = not sent) applies to the normal
Grok voice (grok_direct + web synthesis); leave it **empty** for
`grok-4.20-non-reasoning`, which rejects the param. The reason_then_grok route ignores
it and always sends `REASONING_EFFORT` (default `high`, see the cascade table).
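
The "empty = not sent" rule maps naturally onto `omitempty` in Go's `encoding/json`. The sketch below, with hypothetical `chatRequest`/`parseEffort`/`buildBody` names and a trimmed request body, validates the accepted values and shows that an empty effort drops the field entirely while `none` is sent explicitly.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatRequest is a trimmed, hypothetical Chat Completions body.
// omitempty drops reasoning_effort when it is "", which is what a
// non-reasoning model that rejects the param needs.
type chatRequest struct {
	Model           string    `json:"model"`
	Messages        []message `json:"messages"`
	ReasoningEffort string    `json:"reasoning_effort,omitempty"`
}

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// parseEffort validates GROK_REASONING_EFFORT; "" means "do not send".
func parseEffort(v string) (string, error) {
	switch v {
	case "", "none", "low", "medium", "high":
		return v, nil
	}
	return "", fmt.Errorf("GROK_REASONING_EFFORT=%q: want none|low|medium|high or empty", v)
}

// buildBody serialises a single-turn request with the given effort.
func buildBody(model, effort, prompt string) ([]byte, error) {
	return json.Marshal(chatRequest{
		Model:           model,
		Messages:        []message{{Role: "user", Content: prompt}},
		ReasoningEffort: effort,
	})
}
```
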
### Database
The bot keeps its **operational state** (appservice transaction + event dedup, the
daily spend ledger and the encrypted-room warned set) in a dedicated Postgres
database `vojo_ai` on the shared server, mirroring the per-service bridge databases
(each bridge owns its own role + DB). It stores **no message content** (the one
exception is the opt-in `TELEMETRY_STORE_TEXT`, below): the room timeline is canonical
in Synapse, and the bot's xAI context window is the in-memory buffer in `bot.go`. The
schema is created/migrated on startup (a `schema_version` table + idempotent
`CREATE TABLE IF NOT EXISTS`), so a fresh `vojo_ai` needs no manual DDL, just the
role + database:
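
The startup migration pattern (a `schema_version` row plus idempotent DDL) can be sketched as a list of steps indexed by version. The table and column names below are illustrative assumptions (only `reserved_usd` appears in this README), not the bot's real schema.

```go
package main

import "fmt"

// migrations[i] upgrades the schema from version i to i+1. Every statement is
// idempotent (IF NOT EXISTS), so re-running a step after a crash is harmless.
// Hypothetical tables, for illustration only.
var migrations = [][]string{
	{ // v0 -> v1
		`CREATE TABLE IF NOT EXISTS schema_version (version INT NOT NULL)`,
		`CREATE TABLE IF NOT EXISTS txn_dedup (txn_id TEXT PRIMARY KEY, seen_at TIMESTAMPTZ NOT NULL DEFAULT now())`,
	},
	{ // v1 -> v2
		`CREATE TABLE IF NOT EXISTS spend_ledger (day DATE PRIMARY KEY, spent_usd NUMERIC NOT NULL DEFAULT 0, reserved_usd NUMERIC NOT NULL DEFAULT 0)`,
	},
}

// pending returns the statements still to run for a database at version
// current, plus the version it will be at afterwards.
func pending(current int) ([]string, int, error) {
	if current < 0 || current > len(migrations) {
		return nil, current, fmt.Errorf("unknown schema version %d (binary knows up to %d)", current, len(migrations))
	}
	var stmts []string
	for _, step := range migrations[current:] {
		stmts = append(stmts, step...)
	}
	return stmts, len(migrations), nil
}
```

A caller would read `schema_version` (treating a missing table as version 0), execute the pending statements in one transaction and store the returned version; a binary that finds a newer version than it knows refuses to start rather than guess.
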
```sql
-- once, as the Postgres superuser (e.g. `docker exec vojo-postgres-1 psql -U synapse -d postgres`):
CREATE ROLE vojo_ai LOGIN PASSWORD '<32-char secret>'; -- least privilege; NOT a superuser
CREATE DATABASE vojo_ai OWNER vojo_ai;
```
Point the bot at it with `AI_BOT_DATABASE_URL` (libpq/pgx DSN). Inside the docker
network the host is the `postgres` service; `sslmode=disable` matches Synapse and
the bridges on the internal network:
```
AI_BOT_DATABASE_URL=postgres://vojo_ai:<secret>@postgres:5432/vojo_ai?sslmode=disable
```
The hard USD ceiling is priced from the **API-returned token usage** times the
per-model price table (`XAI_PRICE_*_PER_M`, `GEMINI_PRICE_*_PER_M`), so a price
change only needs those values updated and can't silently blow the cap. The
ceiling is enforced with an optimistic **reservation** (`reserved_usd`): a request's
estimated maximum cost is booked at admission and settled to the real cost afterward,
so a burst of concurrent requests can't slip past `DAILY_USD_CEILING`. Without the
reservation it could, because the real USD cost only lands after each call returns.
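
The pricing and reservation arithmetic can be sketched in memory. This is an illustration under assumptions: the bot keeps the ledger in Postgres, the price figures in the example are invented, and `price`, `ledger`, `reserve` and `settle` are hypothetical names. It shows why booking the estimate at admission stops a burst: every in-flight request already counts against the ceiling.

```go
package main

import (
	"errors"
	"sync"
)

// price is USD per 1M tokens (the *_PER_M values).
type price struct{ InPerM, OutPerM float64 }

// costUSD prices API-reported usage: tokens / 1e6 * price-per-million.
func costUSD(p price, inTok, outTok int) float64 {
	return float64(inTok)/1e6*p.InPerM + float64(outTok)/1e6*p.OutPerM
}

var errCeiling = errors.New("daily USD ceiling reached")

// ledger is an in-memory stand-in for the daily spend row: spent is settled
// cost, reserved is booked-but-unsettled estimates (reserved_usd).
type ledger struct {
	mu              sync.Mutex
	ceiling         float64
	spent, reserved float64
}

// reserve books a request's estimated max cost, or refuses it if settled +
// in-flight + this estimate would exceed the ceiling.
func (l *ledger) reserve(est float64) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.spent+l.reserved+est > l.ceiling {
		return errCeiling
	}
	l.reserved += est
	return nil
}

// settle swaps the reservation for the real cost once usage is known.
func (l *ledger) settle(est, actual float64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= est
	l.spent += actual
}
```
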
### Operator accounting (Phase 1, on by default)
- `REQUEST_BUDGET_SECONDS` (default 180) — overall per-request deadline shared by all
model calls, so a slow/retried call (or a cascade) can't accrete minutes.
- `GROK_PROMPT_CACHE` (default false) — Grok caches prompt prefixes automatically; this
toggle only adds the `x-grok-conv-id` routing header (a per-room id) to raise the
cache hit rate. There is no `prompt_cache` body param (verified on docs.x.ai).
- `TELEMETRY_ENABLED` (default false) — write a `request_log` analytics row per engaged
request (route, per-component $, latency, degrade/ceiling reasons). The write is async
and isolated — its failure never drops a reply. `TELEMETRY_STORE_TEXT` (default false)
additionally keeps the query text (for offline eval); `TELEMETRY_RETENTION_DAYS`
(default 30) time-trims old rows. Turn telemetry on to **measure** the baseline
before enabling any cascade layer.
### Cascade (Phase 2-4) — behind flags, **default OFF** (every layer off == today's bot)
All optional; with these env vars unset the bot makes exactly today's single
grok_direct call. Any layer that is off or failing **degrades to grok_direct** (never
silence). Do **not** enable in prod until the offline-eval gate passes (misroute rate
< 2-3% AND measured saving > the second provider's cost; see
`docs/plans/ai_backend_build_plan.md` §9).

| Env | Default | Meaning |
|---|---|---|
| `ROUTER_ENABLED` | false | Layer-0 heuristic router (else everything → grok_direct) |
| `ROUTER_CLASSIFIER_ENABLED` | false | Layer-1 Gemini classifier on uncertain cases (requires `ROUTER_ENABLED` + Gemini key) |
| `TRIVIAL_OFFLOAD_ENABLED` | false | answer trivial messages with Gemini (requires Gemini key) |
| `WEB_ENABLED` | false | web_then_grok route (Gemini/Grok fetches fresh facts, **Grok stays the voice**) |
| `WEB_PROVIDER` | `grok_web_search` | `grok_web_search` (xAI Agent Tools `web_search` on the Responses API, $5/1k calls, no Gemini key) or `gemini_grounding` (**cheapest**: Gemini does the fetch via native v1beta `google_search` and Grok voices it, ~$0.0013/query, validated on `gemini-2.5-flash-lite`; requires `GEMINI_API_KEY`). The F-EXT-3 "Gemini-3 only" caveat applies to the OpenAI-compat endpoint; native v1beta works on 2.5. |
| `WEB_GROUNDING_DAILY_CAP` | 450 | durable per-day cap for `gemini_grounding` before degrading (keep < the 500/day free grounding RPD; guards the per-1k overage) |
| `REASONING_ENABLED` | false | manual "think harder" route on `REASONING_TRIGGER` |
| `REASONING_TRIGGER` | `подумай глубже` | trigger phrase |
| `REASONING_MODEL` | `grok-4.3` | a **reasoning-capable** model (the default `grok-4.20-non-reasoning` rejects `reasoning_effort`) |
| `REASONING_EFFORT` | `high` | the `reasoning_effort` the "think harder" route sends (`none\|low\|medium\|high`) |
| `GEMINI_API_KEY` / `_FILE` | | required only when a Gemini-using layer is on (fail-fast at startup otherwise) |
| `GEMINI_MODEL` | `gemini-2.5-flash-lite` | cheap model for trivial/classifier |
| `GEMINI_BASE_URL` | `…/v1beta/openai` | OpenAI-compat endpoint (native grounding endpoint derived from it) |
## One-time setup (appservice registration)
Like the mautrix bridges (e.g. telegram), the bot **generates its own
registration** (random `as_token`/`hs_token`) and reads its tokens back from that
same file: the single source of truth shared with Synapse, so there is no hand-copying.
1. Generate it (writes `REGISTRATION_PATH`, default `/data/registration.yaml`):
   ```bash
   docker compose run --rm ai-bot generate-registration
   ```
2. Bind-mount that same file into the Synapse container (e.g. as
   `/data/ai-registration.yaml`) and add it to `homeserver.yaml`:
   ```yaml
   app_service_config_files:
     - /data/ai-registration.yaml
   ```
3. **Restart Synapse** (it caches AS configs at startup). Synapse auto-creates
   `@ai:vojo.chat` from `sender_localpart`, so no `register_new_matrix_user` is needed.


The bot reads `REGISTRATION_PATH` for its tokens (no `AS_TOKEN`/`HS_TOKEN` env
needed) and sets its own display name (`BOT_DISPLAY_NAME`, default "Vojo AI") on
startup. It reads and writes `/data`, which the compose stanza below bind-mounts from
`~/vojo/ai-bot`, so that host directory must be owned by the image's runtime uid
(distroless nonroot = **65532**): `sudo chown -R 65532:65532 ~/vojo/ai-bot`.
## Run
```bash
go run . check-config # local config smoke test (no homeserver contact)
go run . # real run (needs env + a reachable homeserver)
```
### Image & secrets model
The image is **config-less** (a `.dockerignore` keeps `.env`, `state/` and VCS
out of the build context; the Dockerfile copies only the binary + `prompts/`).
Build locally and ship like the mautrix bridges (VS Code task **Deploy AI bot** =
`docker build -t ai-bot:custom` → `docker save | ssh docker load`), then run on
the server with config + secrets supplied at runtime.
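
Config and secrets are **separated**: config in `ai-bot.env` (`env_file`; keep it
chmod 600, since `AI_BOT_DATABASE_URL` embeds the DB password); the appservice tokens
live in the generated `registration.yaml` (read via `REGISTRATION_PATH`); the
standalone secrets are the xAI key (`XAI_API_KEY_FILE`) and, only when a Gemini-using
cascade layer is enabled, the Gemini key (`GEMINI_API_KEY` / `_FILE`).
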
Compose stanza (add to `~/vojo/docker-compose.yml`; the **service key `ai-bot`**
must match the registration `url` host `http://ai-bot:8009`):
```yaml
ai-bot:
  image: ai-bot:custom
  container_name: vojo-ai-bot
  restart: unless-stopped
  depends_on: [synapse, postgres] # needs both up before it starts
  env_file: ./ai-bot/ai-bot.env # config incl. AI_BOT_DATABASE_URL (chmod 600: embeds the DB password)
  environment:
    REGISTRATION_PATH: /data/registration.yaml # tokens (generated; shared with Synapse)
    STATE_DIR: /data/state # runtime dir (the operational store is now in Postgres)
    XAI_API_KEY_FILE: /data/secrets/xai_api_key # the one standalone secret
  volumes:
    - ./ai-bot:/data # owned by uid 65532 (see setup)
```
Also bind-mount the same registration into Synapse and restart it:
```yaml
synapse:
  volumes:
    - ./ai-bot/registration.yaml:/data/ai-registration.yaml:ro
```
`HOMESERVER_URL` must use the Synapse **service name** (`http://synapse:8008`),
not `localhost`. Synapse and the bot must share a docker network (same compose
project does this) so Synapse can push to `http://ai-bot:8009`.
## Verification status
Compile-level + unit-tested locally:
- ✅ `go vet` clean, `gofmt` clean, static CGO-free build.
- ✅ `go test` — appservice transaction handling (hs_token auth → 403 on bad
token, txnId idempotency / no re-dispatch, legacy `?access_token=`, user query
200/404); mention detection (m.mentions, empty-`{}` F29, no-body-fallback F30,
pill, reply-to-bot); DM classification (invited+joined==2, F3: 2 joined + 1
invited is **not** a 1:1); group-vs-DM context minimisation (groups never leak
third-party content); USD pricing; **markdown → HTML rendering** (escaping,
safe-URL allowlist, false-positive guards, oversize/adversarial fallbacks).
- ✅ `check-config` reads env + loads the system prompt.

The store-backed tests (appservice transaction handling + the dedup/limiter/warned
store in `store_test.go`, including the concurrent per-user-cap guarantee and
restart-durability) need a throwaway Postgres via `AI_BOT_TEST_DATABASE_URL`; they
**skip** when it is unset, so `go test ./...` stays green without one. To run them:
```bash
docker run -d --name pg -e POSTGRES_PASSWORD=p -p 5432:5432 postgres:16
# … create role+db vojo_ai, then:
AI_BOT_TEST_DATABASE_URL=postgres://vojo_ai:…@localhost:5432/vojo_ai?sslmode=disable go test ./...
```
Deferred to a live homeserver + xAI key + a loaded registration (runtime ✔):
- Synapse pushes transactions → bot replies (`authenticated as @ai:vojo.chat` in logs);
- invite from `:vojo.chat` → join, foreign-server invite → leave (F11);
- `@`-mention / 1:1 message → `m.notice` reply with a reply relation (plus a thread
relation, F27), carrying a `formatted_body` (org.matrix.custom.html) when the answer
has markdown;
- encrypted room → exactly one notice, **not** repeated after restart (F5);
- per-user cap → silent drop; global USD ceiling → one notice/room/day;
- a retried transaction (lost 200) is processed at most once (txn dedup).