feat(ai-bot): add a flag-gated, provider-neutral answer cascade with a reserved-USD ledger and request telemetry
parent f7f6984d18
commit ff8918dae1
24 changed files with 2970 additions and 313 deletions
|
|
@ -32,12 +32,20 @@ apps/ai-bot/
|
||||||
├── registration.go # generate + read registration.yaml (tokens, mautrix idiom)
|
├── registration.go # generate + read registration.yaml (tokens, mautrix idiom)
|
||||||
├── events.go # Matrix event types + decoders
|
├── events.go # Matrix event types + decoders
|
||||||
├── mentions.go # m.mentions + pill/reply fallbacks (F29/F30)
|
├── mentions.go # m.mentions + pill/reply fallbacks (F29/F30)
|
||||||
├── context.go # xAI message-window assembly (trigger + bot replies)
|
├── context.go # provider-neutral message-window assembly (trigger + bot replies)
|
||||||
├── xai.go # chat/completions client + retry (F6)
|
├── llm.go # provider-neutral types + LLMClient interface (no vendor names)
|
||||||
├── store.go # Postgres (vojo_ai): spend ledger, txn/event dedup, encrypted-warned set
|
├── httpllm.go # shared OpenAI-compatible chat/completions transport + retry (F6)
|
||||||
├── messages.go # bot-authored RU notices
|
├── provider_xai.go # thin xAI/Grok adapter over the shared transport
|
||||||
|
├── provider_gemini.go # Gemini adapter: OpenAI-compat client + native v1beta grounding
|
||||||
|
├── pricing.go # per-model price table (priceFor) + CostBreakdown
|
||||||
|
├── router.go # cascade router: Layer-0 heuristic + optional Layer-1 Gemini classifier
|
||||||
|
├── cascade.go # generate(): route dispatch with degrade-to-grok_direct
|
||||||
|
├── web.go # WebProvider: grok_web_search (Live Search) | gemini_grounding + cap guard
|
||||||
|
├── telemetry.go # request_log analytics row + async emit + retention trim
|
||||||
|
├── store.go # Postgres (vojo_ai): spend ledger (+reservation/components), dedup, request_log, grounding cap
|
||||||
|
├── messages.go # language-free emoji status reactions
|
||||||
├── markdown.go # markdown → org.matrix.custom.html for the reply's formatted_body
|
├── markdown.go # markdown → org.matrix.custom.html for the reply's formatted_body
|
||||||
├── util.go # bounded dedup set
|
├── util.go # bounded dedup set + small hash
|
||||||
├── prompts/system_ru.txt
|
├── prompts/system_ru.txt
|
||||||
├── Dockerfile # CGO-free static build → distroless, EXPOSE 8009
|
├── Dockerfile # CGO-free static build → distroless, EXPOSE 8009
|
||||||
└── .env.example
|
└── .env.example
|
||||||
|
|
@ -49,8 +57,15 @@ All via environment (see `.env.example`). Required: `HOMESERVER_URL`, `BOT_MXID`
|
||||||
`AS_TOKEN`, `HS_TOKEN`, `XAI_API_KEY`, `ALLOWED_SERVERS`, `AI_BOT_DATABASE_URL`.
|
`AS_TOKEN`, `HS_TOKEN`, `XAI_API_KEY`, `ALLOWED_SERVERS`, `AI_BOT_DATABASE_URL`.
|
||||||
`AS_ADDR` (default `:8009`) is the transaction-push listen address — it must match
|
`AS_ADDR` (default `:8009`) is the transaction-push listen address — it must match
|
||||||
the `url` port in the registration. The model is env-configurable (`XAI_MODEL`,
|
the `url` port in the registration. The model is env-configurable (`XAI_MODEL`,
|
||||||
default `grok-4.20-0309-non-reasoning`; `grok-4.3` is an alternative — **re-verify
|
default `grok-4.20-0309-non-reasoning`).
|
||||||
the id + price on docs.x.ai before deploy**).
|
|
||||||
|
`grok-4.3` is the newer unified model (same price, 1M context): one model with a
|
||||||
|
`reasoning_effort` dial. If you switch `XAI_MODEL=grok-4.3`, set
|
||||||
|
`GROK_REASONING_EFFORT=none` to keep the default voice fast/cheap — otherwise the API
|
||||||
|
defaults to `low` and reasons on **every** reply. `GROK_REASONING_EFFORT` (accepted:
|
||||||
|
`none|low|medium|high`, default empty = not sent) is applied to the normal Grok voice
|
||||||
|
(grok_direct + web synthesis); leave it **empty** for `grok-4.20-non-reasoning`, which
|
||||||
|
rejects the param. The reason_then_grok route always uses `high` regardless.
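
A minimal sketch of the "empty = not sent" behaviour described above, assuming the request body is built with `encoding/json`. The `chatRequest` type and `buildBody` helper are hypothetical and only illustrate how an empty `GROK_REASONING_EFFORT` can be left out of the payload entirely; the real client may build its body differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatRequest is a hypothetical, trimmed-down request body. The omitempty tag
// drops reasoning_effort from the JSON when the configured value is empty.
type chatRequest struct {
	Model           string `json:"model"`
	ReasoningEffort string `json:"reasoning_effort,omitempty"`
}

// buildBody serialises the request; effort == "" means "do not send the param".
func buildBody(model, effort string) string {
	b, err := json.Marshal(chatRequest{Model: model, ReasoningEffort: effort})
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Unified model with the dial explicitly turned off.
	fmt.Println(buildBody("grok-4.3", "none"))
	// Non-reasoning model: the param is omitted, so the API has nothing to reject.
	fmt.Println(buildBody("grok-4.20-0309-non-reasoning", ""))
}
```

With this shape, an unset env never reaches the wire, which is why the non-reasoning default can stay on an empty `GROK_REASONING_EFFORT`.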
|
||||||
|
|
||||||
### Database
|
### Database
|
||||||
|
|
||||||
|
|
@ -78,8 +93,49 @@ AI_BOT_DATABASE_URL=postgres://vojo_ai:<secret>@postgres:5432/vojo_ai?sslmode=di
|
||||||
```
|
```
|
||||||
|
|
||||||
The hard USD ceiling is priced from the **API-returned token usage** times the
|
The hard USD ceiling is priced from the **API-returned token usage** times the
|
||||||
configured `XAI_PRICE_*_PER_M` fallbacks, so a price change only needs those
|
per-model price table (`XAI_PRICE_*_PER_M`, `GEMINI_PRICE_*_PER_M`), so a price
|
||||||
constants updated — it can't silently blow the cap.
|
change only needs those constants updated — it can't silently blow the cap. The
|
||||||
|
ceiling is enforced with an optimistic **reservation** (`reserved_usd`): a request's
|
||||||
|
estimated max-cost is booked at admission and settled to the real cost afterward, so
|
||||||
|
a burst of concurrent requests can't slip past `DAILY_USD_CEILING` (it would
|
||||||
|
otherwise, since the USD only lands after each call).
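
A minimal sketch of the reserve-then-settle idea, assuming an in-memory ledger guarded by a mutex. The `Ledger` type and its fields are hypothetical stand-ins for the Postgres-backed store; only the `reserved_usd` concept and the daily ceiling come from the text above.

```go
package main

import (
	"fmt"
	"sync"
)

// Ledger is a hypothetical in-memory stand-in for the daily spend row.
type Ledger struct {
	mu       sync.Mutex
	ceiling  float64 // daily USD ceiling
	spent    float64 // settled, real cost
	reserved float64 // estimates booked for in-flight requests
}

// Reserve admits a request only if settled spend plus every in-flight
// estimate plus this estimate still fits under the ceiling.
func (l *Ledger) Reserve(estimate float64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.spent+l.reserved+estimate > l.ceiling {
		return false
	}
	l.reserved += estimate
	return true
}

// Settle releases the estimate and books the actual cost in one step.
func (l *Ledger) Settle(estimate, actual float64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
	if l.reserved < 0 {
		l.reserved = 0
	}
	l.spent += actual
}

func main() {
	l := &Ledger{ceiling: 1.00}
	fmt.Println(l.Reserve(0.40)) // admitted
	fmt.Println(l.Reserve(0.40)) // admitted
	fmt.Println(l.Reserve(0.40)) // denied: in-flight estimates already hold 0.80
	l.Settle(0.40, 0.05)         // first call cost far less than its estimate
	fmt.Println(l.Reserve(0.40)) // admitted again after the release
}
```

Without the `reserved` term, all three concurrent requests would be admitted, because none of them has booked any USD yet at admission time.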
|
||||||
|
|
||||||
|
### Operator accounting (Phase 1, on by default)
|
||||||
|
|
||||||
|
- `REQUEST_BUDGET_SECONDS` (default 180) — overall per-request deadline shared by all
|
||||||
|
model calls, so a slow/retried call (or a cascade) can't accrete minutes.
|
||||||
|
- `GROK_PROMPT_CACHE` (default false) — Grok caches prompt prefixes automatically; this
|
||||||
|
toggle only adds the `x-grok-conv-id` routing header (a per-room id) to raise the
|
||||||
|
cache hit rate. There is no `prompt_cache` body param (verified on docs.x.ai).
|
||||||
|
- `TELEMETRY_ENABLED` (default false) — write a `request_log` analytics row per engaged
|
||||||
|
request (route, per-component $, latency, degrade/ceiling reasons). The write is async
|
||||||
|
and isolated — its failure never drops a reply. `TELEMETRY_STORE_TEXT` (default false)
|
||||||
|
additionally keeps the query text (for offline eval); `TELEMETRY_RETENTION_DAYS`
|
||||||
|
(default 30) time-trims old rows. Turn telemetry on to MEASURE the baseline before enabling
|
||||||
|
any cascade layer.
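
A minimal sketch of the shared per-request deadline behind `REQUEST_BUDGET_SECONDS`, assuming each model call is simulated by a sleep. The `runStages` helper and its millisecond arguments are hypothetical; the point is that every stage waits on the same `context.WithTimeout` rather than getting a fresh timeout of its own.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runStages runs the stages in order under ONE shared deadline and reports how
// many finished. Durations are in milliseconds to keep the example short.
func runStages(budgetMS int, stagesMS ...int) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(budgetMS)*time.Millisecond)
	defer cancel()

	done := 0
	for _, ms := range stagesMS {
		select {
		case <-time.After(time.Duration(ms) * time.Millisecond):
			done++ // the simulated call returned in time
		case <-ctx.Done():
			return done, ctx.Err() // the shared budget ran out mid-stage
		}
	}
	return done, nil
}

func main() {
	// The second stage alone would exceed the whole budget, so it is cut off.
	n, err := runStages(50, 5, 500)
	fmt.Println(n, err)
}
```

Because the budget is shared, a slow first stage leaves less time for the stages after it, which is what keeps a multi-stage route from adding up several full timeouts.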
|
||||||
|
|
||||||
|
### Cascade (Phase 2-4) — behind flags, **default OFF** (every layer off == today's bot)
|
||||||
|
|
||||||
|
All optional; an unset env is exactly today's single grok_direct call. Any layer off or
|
||||||
|
failing **degrades to grok_direct** (never silence). Do **not** enable in prod until the
|
||||||
|
offline-eval gate passes (misroute < 2-3% AND measured saving > the second provider's cost; see
|
||||||
|
`docs/plans/ai_backend_build_plan.md` §9).
|
||||||
|
|
||||||
|
| Env | Default | Meaning |
|
||||||
|
|---|---|---|
|
||||||
|
| `ROUTER_ENABLED` | false | Layer-0 heuristic router (else everything → grok_direct) |
|
||||||
|
| `ROUTER_CLASSIFIER_ENABLED` | false | Layer-1 Gemini classifier on uncertain cases (requires `ROUTER_ENABLED` + Gemini key) |
|
||||||
|
| `TRIVIAL_OFFLOAD_ENABLED` | false | answer trivial messages with Gemini (requires Gemini key) |
|
||||||
|
| `WEB_ENABLED` | false | web_then_grok route (Gemini/Grok fetches fresh facts, **Grok stays the voice**) |
|
||||||
|
| `WEB_PROVIDER` | `grok_web_search` | `grok_web_search` (xAI Agent Tools `web_search` on the Responses API, $5/1k calls, no Gemini key) or `gemini_grounding` (**cheapest**: Gemini does the fetch via native v1beta `google_search`, Grok voices it; ~$0.0013/query, validated on `gemini-2.5-flash-lite`; the F-EXT-3 "Gemini-3 only" caveat applies only to the OpenAI-compat endpoint, native v1beta works on 2.5; requires `GEMINI_API_KEY`) |
|
||||||
|
| `WEB_GROUNDING_DAILY_CAP` | 450 | durable per-day cap for `gemini_grounding` before degrading (keep < the 500/day free grounding RPD; guards the per-1k overage) |
|
||||||
|
| `REASONING_ENABLED` | false | manual "think harder" route on `REASONING_TRIGGER` |
|
||||||
|
| `REASONING_TRIGGER` | `подумай глубже` | trigger phrase |
|
||||||
|
| `REASONING_MODEL` | `grok-4.3` | a **reasoning-capable** model (the default `grok-4.20-non-reasoning` rejects `reasoning_effort`) |
|
||||||
|
| `REASONING_EFFORT` | `high` | the reasoning_effort the "think harder" route sends (`none|low|medium|high`) |
|
||||||
|
| `GEMINI_API_KEY` / `_FILE` | — | required only when a Gemini-using layer is on (fail-fast at startup otherwise) |
|
||||||
|
| `GEMINI_MODEL` | `gemini-2.5-flash-lite` | cheap model for trivial/classifier |
|
||||||
|
| `GEMINI_BASE_URL` | `…/v1beta/openai` | OpenAI-compat endpoint (native grounding endpoint derived from it) |
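
A minimal sketch of the fail-fast rule implied by the table, assuming a trimmed-down `Config`. The field names and the body of `needsGemini` here are guesses for illustration (the real method is not shown in this README), and the sketch assumes the classifier only counts when the router is also on.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical subset of the bot's configuration.
type Config struct {
	RouterEnabled     bool
	ClassifierEnabled bool
	TrivialOffload    bool
	WebEnabled        bool
	WebProvider       string
	GeminiAPIKey      string
}

// needsGemini reports whether any enabled layer would call Gemini.
func (c Config) needsGemini() bool {
	return (c.RouterEnabled && c.ClassifierEnabled) ||
		c.TrivialOffload ||
		(c.WebEnabled && c.WebProvider == "gemini_grounding")
}

// validate fails at startup instead of at the first Gemini call.
func (c Config) validate() error {
	if c.needsGemini() && c.GeminiAPIKey == "" {
		return errors.New("GEMINI_API_KEY is required by an enabled layer")
	}
	return nil
}

func main() {
	fmt.Println(Config{}.validate())                                                 // every flag off: no key needed
	fmt.Println(Config{WebEnabled: true, WebProvider: "grok_web_search"}.validate()) // Grok-only web: no key needed
	fmt.Println(Config{TrivialOffload: true}.validate())                             // Gemini layer without a key: error
}
```

Checking this once at startup keeps a missing key from surfacing later as a per-request degrade.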
|
||||||
|
|
||||||
## One-time setup (appservice registration)
|
## One-time setup (appservice registration)
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -32,7 +32,7 @@ func openTestStore(t *testing.T) *Store {
|
||||||
}
|
}
|
||||||
ctx, cancel := opContext()
|
ctx, cancel := opContext()
|
||||||
defer cancel()
|
defer cancel()
|
||||||
if _, err := st.pool.Exec(ctx, `TRUNCATE processed_txn, processed_event, spend, warned_encrypted`); err != nil {
|
if _, err := st.pool.Exec(ctx, `TRUNCATE processed_txn, processed_event, spend, warned_encrypted, request_log, grounding_count`); err != nil {
|
||||||
st.Close()
|
st.Close()
|
||||||
t.Fatalf("truncate test tables: %v", err)
|
t.Fatalf("truncate test tables: %v", err)
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -5,6 +5,7 @@ import (
|
||||||
"fmt"
|
"fmt"
|
||||||
"log/slog"
|
"log/slog"
|
||||||
"sync"
|
"sync"
|
||||||
|
"sync/atomic"
|
||||||
"time"
|
"time"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
@ -30,9 +31,22 @@ type Bot struct {
|
||||||
cfg *Config
|
cfg *Config
|
||||||
log *slog.Logger
|
log *slog.Logger
|
||||||
mx *MatrixClient
|
mx *MatrixClient
|
||||||
xai *XAIClient
|
llm LLMClient
|
||||||
st *Store
|
st *Store
|
||||||
|
|
||||||
|
// gemini is the cheap chat backend for the trivial route and the Layer-1 classifier
|
||||||
|
// (an LLMClient so tests can fake it); nil unless a layer that uses it is enabled.
|
||||||
|
// web is the web-freshness provider, built only when WEB_ENABLED. Both nil → the
|
||||||
|
// cascade can only ever produce grok_direct.
|
||||||
|
gemini LLMClient
|
||||||
|
web WebProvider
|
||||||
|
|
||||||
|
// promptVersion is a short stable hash of the system prompt, logged with each
|
||||||
|
// request so prompt changes are visible in the analytics (A/B + regressions).
|
||||||
|
promptVersion string
|
||||||
|
// telemetryWrites paces the retention trim (every telemetryTrimEvery writes).
|
||||||
|
telemetryWrites atomic.Uint64
|
||||||
|
|
||||||
// mu guards the in-memory maps/sets below. Each transaction is acked to Synapse
|
// mu guards the in-memory maps/sets below. Each transaction is acked to Synapse
|
||||||
// immediately (appservice.go) and its events are processed in transaction order,
|
// immediately (appservice.go) and its events are processed in transaction order,
|
||||||
// but the slow xAI generation runs in a per-room goroutine and the lazy probes run
|
// but the slow xAI generation runs in a per-room goroutine and the lazy probes run
|
||||||
|
|
@ -49,7 +63,7 @@ type Bot struct {
|
||||||
|
|
||||||
func NewBot(ctx context.Context, cfg *Config, logger *slog.Logger) (*Bot, error) {
|
func NewBot(ctx context.Context, cfg *Config, logger *slog.Logger) (*Bot, error) {
|
||||||
mx := NewMatrixClient(cfg.HomeserverURL, cfg.ASToken, cfg.BotMXID)
|
mx := NewMatrixClient(cfg.HomeserverURL, cfg.ASToken, cfg.BotMXID)
|
||||||
xai := NewXAIClient(cfg.XAIBaseURL, cfg.XAIAPIKey, logger)
|
llm := NewXAIClient(cfg.XAIBaseURL, cfg.XAIAPIKey, logger)
|
||||||
|
|
||||||
st, err := OpenStore(cfg.DatabaseURL)
|
st, err := OpenStore(cfg.DatabaseURL)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
|
|
@ -57,16 +71,35 @@ func NewBot(ctx context.Context, cfg *Config, logger *slog.Logger) (*Bot, error)
|
||||||
}
|
}
|
||||||
|
|
||||||
b := &Bot{
|
b := &Bot{
|
||||||
cfg: cfg,
|
cfg: cfg,
|
||||||
log: logger,
|
log: logger,
|
||||||
mx: mx,
|
mx: mx,
|
||||||
xai: xai,
|
llm: llm,
|
||||||
st: st,
|
st: st,
|
||||||
seen: newLRUSet(5000),
|
promptVersion: fmt.Sprintf("%08x", hashString(cfg.SystemPrompt)),
|
||||||
botSent: newLRUSet(5000),
|
seen: newLRUSet(5000),
|
||||||
meta: make(map[string]*roomMeta),
|
botSent: newLRUSet(5000),
|
||||||
buf: make(map[string][]bufferedMsg),
|
meta: make(map[string]*roomMeta),
|
||||||
inflight: make(map[string]bool),
|
buf: make(map[string][]bufferedMsg),
|
||||||
|
inflight: make(map[string]bool),
|
||||||
|
}
|
||||||
|
|
||||||
|
// Build the cascade backends only for enabled layers (config already fail-fast
|
||||||
|
// validated that the keys exist). With every cascade flag off these stay nil and
|
||||||
|
// generate() can only produce grok_direct — today's bot. The grounding web provider
|
||||||
|
// needs the concrete client (for the native generateContent call), so keep a typed
|
||||||
|
// handle alongside the LLMClient face.
|
||||||
|
var gc *geminiClient
|
||||||
|
if cfg.needsGemini() {
|
||||||
|
gc = NewGeminiClient(cfg.GeminiBaseURL, cfg.GeminiAPIKey, cfg.GeminiModel, logger)
|
||||||
|
b.gemini = gc
|
||||||
|
}
|
||||||
|
if cfg.WebEnabled {
|
||||||
|
if cfg.WebProvider == webProviderGeminiGrounding {
|
||||||
|
b.web = &geminiGrounding{gem: gc, st: st, cfg: cfg}
|
||||||
|
} else {
|
||||||
|
b.web = newGrokWebSearch(cfg, logger)
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Confirm the as_token + user_id resolves to BOT_MXID before serving.
|
// Confirm the as_token + user_id resolves to BOT_MXID before serving.
|
||||||
|
|
@ -223,7 +256,11 @@ func (b *Bot) handleMessage(ctx context.Context, ev *Event) {
|
||||||
// bot can't read it. The probe runs without the lock.
|
// bot can't read it. The probe runs without the lock.
|
||||||
if b.ensureEncryption(ctx, roomID) {
|
if b.ensureEncryption(ctx, roomID) {
|
||||||
b.log.Debug("skip: encrypted room", "room", roomID)
|
b.log.Debug("skip: encrypted room", "room", roomID)
|
||||||
b.reactEncryptedOnce(ctx, roomID, ev.EventID)
|
// Log the skip only when we actually react (once per room), so an encrypted room
|
||||||
|
// the bot can't read doesn't flood request_log with one row per message.
|
||||||
|
if b.reactEncryptedOnce(ctx, roomID, ev.EventID) {
|
||||||
|
b.recordSkip(ev, degradeEncrypted)
|
||||||
|
}
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -255,6 +292,7 @@ func (b *Bot) handleMessage(ctx context.Context, ev *Event) {
|
||||||
// "leak" the bot into) a federated room with non-consenting third parties.
|
// "leak" the bot into) a federated room with non-consenting third parties.
|
||||||
if foreign {
|
if foreign {
|
||||||
b.leaveForeign(ctx, roomID)
|
b.leaveForeign(ctx, roomID)
|
||||||
|
b.recordSkip(ev, degradeForeign)
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -282,6 +320,7 @@ func (b *Bot) handleMessage(ctx context.Context, ev *Event) {
|
||||||
if isMedia {
|
if isMedia {
|
||||||
b.log.Debug("skip: non-text msgtype (reacted)", "room", roomID, "sender", ev.Sender, "msgtype", mc.MsgType)
|
b.log.Debug("skip: non-text msgtype (reacted)", "room", roomID, "sender", ev.Sender, "msgtype", mc.MsgType)
|
||||||
b.react(ctx, roomID, ev.EventID, reactMedia)
|
b.react(ctx, roomID, ev.EventID, reactMedia)
|
||||||
|
b.recordSkip(ev, degradeMedia)
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -312,20 +351,46 @@ func (b *Bot) handleMessage(ctx context.Context, ev *Event) {
|
||||||
const unlimitedCap = 1 << 30
|
const unlimitedCap = 1 << 30
|
||||||
|
|
||||||
func (b *Bot) respond(ctx context.Context, roomID string, isDM bool, ev *Event, mc *MessageContent, history []bufferedMsg) {
|
func (b *Bot) respond(ctx context.Context, roomID string, isDM bool, ev *Event, mc *MessageContent, history []bufferedMsg) {
|
||||||
|
started := time.Now()
|
||||||
|
// One telemetry row per request, populated as the flow decides its outcome and
|
||||||
|
// emitted once via defer — so every exit (deny, error, empty, paid silence, success)
|
||||||
|
// is recorded without scattering writes (F-FUNC-5). It starts as route=none/ok=false;
|
||||||
|
// proceeding to the model sets the route, success sets ok=true.
|
||||||
|
rl := RequestLog{
|
||||||
|
ID: ev.EventID, RoomID: roomID, Sender: ev.Sender,
|
||||||
|
Route: routeNone, RouterSource: "default",
|
||||||
|
PromptVersion: b.promptVersion,
|
||||||
|
QueryText: mc.Body,
|
||||||
|
Models: map[string]string{"final": b.cfg.XAIModel},
|
||||||
|
}
|
||||||
|
defer func() {
|
||||||
|
rl.LatencyMS = int(time.Since(started).Milliseconds())
|
||||||
|
b.recordTelemetry(rl)
|
||||||
|
}()
|
||||||
|
|
||||||
perUserCap := b.cfg.PerUserDailyCap
|
perUserCap := b.cfg.PerUserDailyCap
|
||||||
|
perUserUSD := b.cfg.PerUserDailyUSD
|
||||||
if b.cfg.UnlimitedUsers[ev.Sender] {
|
if b.cfg.UnlimitedUsers[ev.Sender] {
|
||||||
perUserCap = unlimitedCap
|
perUserCap = unlimitedCap
|
||||||
|
perUserUSD = 0 // exempt from both per-user gates; the global ceiling still applies
|
||||||
}
|
}
|
||||||
switch res, err := b.st.Reserve(ev.Sender, perUserCap, b.cfg.DailyUSDCeiling); {
|
// Reserve the route's estimated max-cost (not $0) so the global ceiling counts
|
||||||
|
// this in-flight call BEFORE it returns — the TOCTOU fix (§8.1). The envelope covers
|
||||||
|
// the most expensive ENABLED route, so whichever the router picks is admitted within
|
||||||
|
// the reservation; with the cascade off it is exactly grok_direct's estimate.
|
||||||
|
estimate := b.reserveEstimate()
|
||||||
|
switch res, err := b.st.Reserve(ev.Sender, perUserCap, perUserUSD, b.cfg.DailyUSDCeiling, estimate); {
|
||||||
case err != nil:
|
case err != nil:
|
||||||
// A limiter failure is on our side — don't leave the user wondering.
|
// A limiter failure is on our side — don't leave the user wondering.
|
||||||
b.log.Error("limiter reserve failed", "sender", ev.Sender, "err", err)
|
b.log.Error("limiter reserve failed", "sender", ev.Sender, "err", err)
|
||||||
|
rl.Degraded, rl.Err = degradeReserveErr, err.Error()
|
||||||
b.react(ctx, roomID, ev.EventID, reactError)
|
b.react(ctx, roomID, ev.EventID, reactError)
|
||||||
return
|
return
|
||||||
case res == reserveDeniedUser:
|
case res == reserveDeniedUser:
|
||||||
// Per-user cap (anti-abuse, F24): stop answering, but always signal the limit —
|
// Per-user cap (anti-abuse, F24): stop answering, but always signal the limit —
|
||||||
// no message addressed to the bot is left without feedback.
|
// no message addressed to the bot is left without feedback.
|
||||||
b.log.Info("per-user daily cap reached; reacting", "sender", ev.Sender)
|
b.log.Info("per-user daily cap reached; reacting", "sender", ev.Sender)
|
||||||
|
rl.PerUserCapHit = true
|
||||||
b.react(ctx, roomID, ev.EventID, reactRateLimit)
|
b.react(ctx, roomID, ev.EventID, reactRateLimit)
|
||||||
return
|
return
|
||||||
case res == reserveDeniedGlobal:
|
case res == reserveDeniedGlobal:
|
||||||
|
|
@ -333,23 +398,80 @@ func (b *Bot) respond(ctx context.Context, roomID string, isDM bool, ev *Event,
|
||||||
// once-per-day text notice), so signal every affected message rather than
|
// once-per-day text notice), so signal every affected message rather than
|
||||||
// going silent after the first.
|
// going silent after the first.
|
||||||
b.log.Warn("global daily USD ceiling reached", "room", roomID, "sender", ev.Sender)
|
b.log.Warn("global daily USD ceiling reached", "room", roomID, "sender", ev.Sender)
|
||||||
|
rl.CeilingHit = true
|
||||||
b.react(ctx, roomID, ev.EventID, reactRateLimit)
|
b.react(ctx, roomID, ev.EventID, reactRateLimit)
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Past admission, a reservation + request slot are held. Guarantee they're freed on
|
||||||
|
// ANY exit that didn't settle — including a panic in generate() (recovered by safego)
|
||||||
|
// — so a leaked reservation can never permanently drift the global ceiling down
|
||||||
|
// (§8.1c). The normal paths set settled=true at Settle, so this defer then no-ops; it
|
||||||
|
// fires only on the panic/unexpected-return path, where it also reacts so the failure
|
||||||
|
// isn't silent.
|
||||||
|
settled := false
|
||||||
|
defer func() {
|
||||||
|
if settled {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
rl.Degraded, rl.Err = "panic", "generation panicked or returned without settling"
|
||||||
|
if rerr := b.st.ReleaseReservation(ev.Sender, estimate); rerr != nil {
|
||||||
|
b.log.Error("release reservation (unsettled) failed", "sender", ev.Sender, "err", rerr)
|
||||||
|
}
|
||||||
|
if rerr := b.st.RefundRequest(ev.Sender); rerr != nil {
|
||||||
|
b.log.Error("refund (unsettled) failed", "sender", ev.Sender, "err", rerr)
|
||||||
|
}
|
||||||
|
b.react(ctx, roomID, ev.EventID, reactError)
|
||||||
|
}()
|
||||||
|
|
||||||
// Show "Vojo AI печатает…" for the whole generation. The keepalive refreshes the
|
// Show "Vojo AI печатает…" for the whole generation. The keepalive refreshes the
|
||||||
// typing notification every 20s (the server expires it after 30s) so the indicator
|
// typing notification every 20s (the server expires it after 30s) so the indicator
|
||||||
// never lapses on a slow/retried answer, and the deferred stop clears it on exit.
|
// never lapses on a slow/retried answer, and the deferred stop clears it on exit.
|
||||||
stopTyping := b.startTypingKeepalive(ctx, roomID)
|
stopTyping := b.startTypingKeepalive(ctx, roomID)
|
||||||
defer stopTyping()
|
defer stopTyping()
|
||||||
|
|
||||||
msgs := buildContext(b.cfg.SystemPrompt, history, isDM, mc.Body, b.cfg.MaxCtxEvent, 8000)
|
// Overall per-request deadline (§8.2.2): every model call in the cascade shares this
|
||||||
resp, err := b.xai.Complete(ctx, b.cfg.XAIModel, msgs, b.cfg.MaxOutTok, b.cfg.XAITemp)
|
// single budget (genCtx), so a multi-stage route can't accrete minutes the way
|
||||||
|
// per-stage 3×60s retries would. react/send/store ops use the live room ctx, NOT
|
||||||
|
// genCtx, so a budget timeout still surfaces as a ⚠️ react, never silence.
|
||||||
|
genCtx, cancel := context.WithTimeout(ctx, b.cfg.RequestBudget)
|
||||||
|
defer cancel()
|
||||||
|
|
||||||
|
msgs := buildContext(b.cfg.SystemPrompt, history, isDM, mc.Body, b.cfg.MaxCtxEvent, maxPromptTokens)
|
||||||
|
res, err := b.generate(genCtx, mc.Body, msgs, b.convID(roomID))
|
||||||
|
|
||||||
|
// Record what the routing + generation actually did, whatever the outcome.
|
||||||
|
rl.Route = res.route
|
||||||
|
rl.RouterSource = res.decision.Source
|
||||||
|
rl.RouterConfidence = res.decision.Confidence
|
||||||
|
rl.FallbackFired = res.fallback
|
||||||
|
rl.Escalated = res.route == routeReason
|
||||||
|
rl.Cost = res.cost
|
||||||
|
if res.stageMS != nil {
|
||||||
|
rl.StageMS = res.stageMS
|
||||||
|
}
|
||||||
|
if res.finalModel != "" {
|
||||||
|
rl.Models["final"] = res.finalModel
|
||||||
|
}
|
||||||
|
if res.decision.Source == "classifier" {
|
||||||
|
rl.Models["router"] = b.cfg.GeminiModel
|
||||||
|
}
|
||||||
|
if res.degraded != "" {
|
||||||
|
rl.Degraded = res.degraded
|
||||||
|
}
|
||||||
|
|
||||||
if err != nil {
|
if err != nil {
|
||||||
// at-most-once already retried transient failures inside Complete; refund the
|
// Terminal: even grok_direct failed. Settle whatever the cascade ACTUALLY spent
|
||||||
// reserved request so an xAI outage doesn't burn the user's daily cap, and
|
// (e.g. a paid web fetch before the failure) and release the rest of the
|
||||||
// signal the failure (react → no anti-loop, no language).
|
// reservation in one step, then refund the request slot so an outage doesn't burn
|
||||||
b.log.Error("xai completion failed", "sender", ev.Sender, "err", err)
|
// the cap, and react (never silent). Settle with an all-zero cost is just a
|
||||||
|
// release, so a pure grok_direct failure books nothing — exactly as before.
|
||||||
|
b.log.Error("generation failed", "sender", ev.Sender, "route", res.route, "err", err)
|
||||||
|
rl.Err = err.Error()
|
||||||
|
if serr := b.st.Settle(ev.Sender, estimate, res.cost); serr != nil {
|
||||||
|
b.log.Error("settle (failed request) failed", "sender", ev.Sender, "err", serr)
|
||||||
|
}
|
||||||
|
settled = true
|
||||||
if rerr := b.st.RefundRequest(ev.Sender); rerr != nil {
|
if rerr := b.st.RefundRequest(ev.Sender); rerr != nil {
|
||||||
b.log.Error("refund failed", "sender", ev.Sender, "err", rerr)
|
b.log.Error("refund failed", "sender", ev.Sender, "err", rerr)
|
||||||
}
|
}
|
||||||
|
|
@ -357,39 +479,86 @@ func (b *Bot) respond(ctx context.Context, roomID string, isDM bool, ev *Event,
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
|
||||||
// A 2xx from xAI is billed even if the text came back empty — always book the real
|
// Success from some route. Settle: release the reservation and book the real
|
||||||
// cost so both caps see it (an empty 200 must not bypass the per-user cap and the
|
// per-component cost, so both caps see grounding/tool fees too — not just tokens.
|
||||||
// global ceiling).
|
if err := b.st.Settle(ev.Sender, estimate, res.cost); err != nil {
|
||||||
usd := computeUSD(resp.Usage, b.cfg)
|
b.log.Error("settle spend failed", "sender", ev.Sender, "err", err)
|
||||||
if err := b.st.Reconcile(ev.Sender, usd); err != nil {
|
|
||||||
b.log.Error("reconcile spend failed", "sender", ev.Sender, "err", err)
|
|
||||||
}
|
}
|
||||||
|
settled = true
|
||||||
|
rl.PromptTokens, rl.CachedTokens, rl.CompletionTokens =
|
||||||
|
res.usage.PromptTokens, res.usage.CachedTokens, res.usage.CompletionTokens
|
||||||
|
rl.CacheHit = res.usage.CachedTokens > 0
|
||||||
|
rl.ProviderRequestID = res.providerID
|
||||||
|
|
||||||
text := resp.Text()
|
text := res.text
|
||||||
if text == "" {
|
if text == "" {
|
||||||
// Billed but no usable text (content filter / length cap / empty choices). Never
|
// Billed but no usable text (content filter / length cap / empty choices). Never
|
||||||
// leave a billed request without feedback — react "couldn't answer".
|
// leave a billed request without feedback — react "couldn't answer". The slot
|
||||||
b.log.Warn("xai returned empty completion (billed, reacting)", "sender", ev.Sender, "usd", usd)
|
// stays consumed (the 2xx was real); no refund, or an empty reply could be forced
|
||||||
|
// to dodge the cap.
|
||||||
|
b.log.Warn("empty completion (billed, reacting)", "sender", ev.Sender, "usd", res.cost.Total())
|
||||||
|
rl.Degraded = degradeEmpty
|
||||||
b.react(ctx, roomID, ev.EventID, reactError)
|
b.react(ctx, roomID, ev.EventID, reactError)
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
b.log.Info("answered", "room", roomID, "sender", ev.Sender, "dm", isDM,
|
b.log.Info("answered", "room", roomID, "sender", ev.Sender, "dm", isDM, "route", res.route,
|
||||||
"usd", usd, "prompt_tokens", resp.Usage.PromptTokens, "completion_tokens", resp.Usage.CompletionTokens)
|
"usd", res.cost.Total(), "prompt_tokens", res.usage.PromptTokens, "completion_tokens", res.usage.CompletionTokens)
|
||||||
b.sendReply(ctx, roomID, ev, mc, text)
|
if err := b.sendReply(ctx, roomID, ev, mc, text); err != nil {
|
||||||
|
// Paid silence (§8.1): the spend is real (USD is kept — refunding it would
|
||||||
|
// under-count the ceiling), but the reply never landed. Refund the request SLOT
|
||||||
|
// so the user can retry, and react ⚠️ so the failure isn't silent.
|
||||||
|
b.log.Error("send reply failed after billing; refunding slot + reacting", "sender", ev.Sender, "err", err)
|
||||||
|
rl.Degraded, rl.Err = degradeSendFailed, err.Error()
|
||||||
|
if rerr := b.st.RefundRequest(ev.Sender); rerr != nil {
|
||||||
|
b.log.Error("refund failed", "sender", ev.Sender, "err", rerr)
|
||||||
|
}
|
||||||
|
b.react(ctx, roomID, ev.EventID, reactError)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
rl.OK = true
|
||||||
}
|
}
|
||||||
|
|
||||||
// computeUSD prices the call from the API-returned token usage (authoritative
|
// maxPromptTokens bounds the assembled prompt (history is trimmed to fit) and feeds
|
||||||
// counts) and the configured per-1M prices — so the hard ceiling tracks real
|
// the reservation estimate, so the two never disagree about a request's size.
|
||||||
// usage even if the model/price changes (only the constants need updating).
|
const maxPromptTokens = 8000
|
||||||
func computeUSD(u xaiUsage, cfg *Config) float64 {
|
|
||||||
cached := u.PromptTokensDetails.CachedTokens
|
// estimateUSD is the conservative max-cost reserved for a route before the call, so
|
||||||
nonCached := u.PromptTokens - cached
|
// the global ceiling can count an in-flight request (§8.1). It prices a full prompt
|
||||||
|
// (maxPromptTokens) plus the max output at the model's non-cached rates — an upper-ish
|
||||||
|
// bound, since real calls send fewer tokens and get the cheaper cached rate. Settle
|
||||||
|
// later books the authoritative actual cost regardless, so a slightly-off estimate
|
||||||
|
// only nudges admission, never the final accounting.
|
||||||
|
func (b *Bot) estimateUSD(model string) float64 {
|
||||||
|
p := b.cfg.priceFor(model)
|
||||||
|
return float64(maxPromptTokens)/1e6*p.InputPerM + float64(b.cfg.MaxOutTok)/1e6*p.OutputPerM
|
||||||
|
}
|
||||||
|
|
||||||
|
// convID returns the prompt-cache routing hint sent as x-grok-conv-id, or "" when
|
||||||
|
// GROK_PROMPT_CACHE is off. Grok caches prompt prefixes automatically; the header
|
||||||
|
// only pins a conversation to the same backend to raise the hit rate (docs.x.ai), so
|
||||||
|
// a stable per-room id is the right unit — every turn in a room shares the system
|
||||||
|
// prompt and history prefix. It carries no PII (the room id is opaque) and is hashed
|
||||||
|
// to keep it compact and non-identifying.
|
||||||
|
func (b *Bot) convID(roomID string) string {
|
||||||
|
if !b.cfg.GrokPromptCache {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
return fmt.Sprintf("vojo-%08x", hashString(roomID))
|
||||||
|
}
|
||||||
|
|
||||||
|
// computeUSD prices a call from the API-returned token usage (authoritative
|
||||||
|
// counts) and the per-model price table — so the hard ceiling tracks real usage
|
||||||
|
// even if the model/price changes (only the price table needs updating), and a
|
||||||
|
// call books at the price of the model that actually served it.
|
||||||
|
func computeUSD(model string, u Usage, cfg *Config) float64 {
|
||||||
|
p := cfg.priceFor(model)
|
||||||
|
nonCached := u.PromptTokens - u.CachedTokens
|
||||||
if nonCached < 0 {
|
if nonCached < 0 {
|
||||||
nonCached = 0
|
nonCached = 0
|
||||||
}
|
}
|
||||||
return float64(nonCached)/1e6*cfg.PriceInputPerM +
|
return float64(nonCached)/1e6*p.InputPerM +
|
||||||
float64(cached)/1e6*cfg.PriceCachedPerM +
|
float64(u.CachedTokens)/1e6*p.CachedPerM +
|
||||||
float64(u.CompletionTokens)/1e6*cfg.PriceOutputPerM
|
float64(u.CompletionTokens)/1e6*p.OutputPerM
|
||||||
}
|
}
|
||||||
|
|
||||||
// react adds an emoji m.reaction to the triggering event — the bot's language-free
|
// react adds an emoji m.reaction to the triggering event — the bot's language-free
|
||||||
|
|
@@ -415,27 +584,31 @@ func (b *Bot) react(ctx context.Context, roomID, eventID, emoji string) {
 // default, so this is a near-dead safety path; the reaction is far less intrusive
 // than the old text notice, but the once-gate keeps it from annotating every message
 // in the rare encrypted room.
-func (b *Bot) reactEncryptedOnce(ctx context.Context, roomID, eventID string) {
+// reactEncryptedOnce returns whether it reacted this call (true only the first time
+// for a room), so the caller can log the skip exactly once too.
+func (b *Bot) reactEncryptedOnce(ctx context.Context, roomID, eventID string) bool {
 	warned, err := b.st.HasWarnedEncrypted(roomID)
 	if err != nil {
 		b.log.Error("warned-flag read failed", "room", roomID, "err", err)
-		return
+		return false
 	}
 	if warned {
-		return
+		return false
 	}
 	b.react(ctx, roomID, eventID, reactEncrypted)
 	if err := b.st.SetWarnedEncrypted(roomID); err != nil {
 		b.log.Error("persist warned-flag failed", "room", roomID, "err", err)
 	}
+	return true
 }
 
 // sendReply sends the model's actual answer and records the completed exchange in the
-// conversation buffer so the next turn has context.
-func (b *Bot) sendReply(ctx context.Context, roomID string, trigger *Event, triggerMC *MessageContent, body string) {
-	id := b.sendMessage(ctx, roomID, trigger, triggerMC, body)
-	if id == "" {
-		return
+// conversation buffer so the next turn has context. It RETURNS the send error so the
+// caller can handle paid silence (§8.1): a billed answer that failed to deliver must
+// refund the slot and react, not vanish.
+func (b *Bot) sendReply(ctx context.Context, roomID string, trigger *Event, triggerMC *MessageContent, body string) error {
+	if err := b.sendMessage(ctx, roomID, trigger, triggerMC, body); err != nil {
+		return err
 	}
 	// Record the user trigger AND the assistant answer together, only AFTER the answer
 	// was sent, so a failed or empty generation never leaves a dangling user turn (a
@@ -443,20 +616,21 @@ func (b *Bot) sendReply(ctx context.Context, roomID string, trigger *Event, trig
 	// Single-flight guarantees no other turn for this room interleaves between the two.
 	b.appendBuf(roomID, bufferedMsg{sender: trigger.Sender, body: triggerMC.Body, isBot: false})
 	b.appendBuf(roomID, bufferedMsg{sender: b.cfg.BotMXID, body: body, isBot: true})
+	return nil
 }
 
-// sendMessage builds and sends an m.notice reply, tracks our own event id, and returns
-// the new event id ("" on failure).
-func (b *Bot) sendMessage(ctx context.Context, roomID string, trigger *Event, triggerMC *MessageContent, body string) string {
+// sendMessage builds and sends an m.notice reply and tracks our own event id. Returns
+// the send error (nil on success) so the caller can detect a failed delivery.
+func (b *Bot) sendMessage(ctx context.Context, roomID string, trigger *Event, triggerMC *MessageContent, body string) error {
 	content := buildNoticeContent(trigger.EventID, trigger.Sender, triggerMC.RelatesTo, body)
 	id, err := b.mx.SendEvent(ctx, roomID, "m.room.message", content)
 	if err != nil {
 		b.log.Error("send failed", "room", roomID, "err", err)
-		return ""
+		return err
 	}
 	// Track our own reply so a future reply-to-it is recognised as addressing us.
 	b.botSent.Add(id)
-	return id
+	return nil
 }
 
 // startTypingKeepalive starts the typing indicator and keeps it alive for the whole
@@ -71,17 +71,22 @@ func TestStripReplyFallback(t *testing.T) {
 }
 
 func TestComputeUSD(t *testing.T) {
-	cfg := &Config{PriceInputPerM: 1.25, PriceCachedPerM: 0.20, PriceOutputPerM: 2.50}
-	var u xaiUsage
-	u.PromptTokens = 1_000_000
-	u.PromptTokensDetails.CachedTokens = 400_000
-	u.CompletionTokens = 1_000_000
+	const model = "grok-test"
+	cfg := &Config{XAIModel: model, Prices: map[string]ModelPrice{
+		model: {InputPerM: 1.25, CachedPerM: 0.20, OutputPerM: 2.50},
+	}}
+	u := Usage{PromptTokens: 1_000_000, CachedTokens: 400_000, CompletionTokens: 1_000_000}
 	// nonCached 600k*1.25 + cached 400k*0.20 + out 1M*2.50 = 0.75 + 0.08 + 2.50
-	got := computeUSD(u, cfg)
+	got := computeUSD(model, u, cfg)
 	want := 0.75 + 0.08 + 2.50
 	if diff := got - want; diff > 1e-9 || diff < -1e-9 {
 		t.Fatalf("computeUSD = %v, want %v", got, want)
 	}
+	// An unknown model falls back to the default model's price (never $0, which would
+	// blind the ceiling).
+	if got := computeUSD("unknown-model", u, cfg); got != want {
+		t.Fatalf("unknown-model fallback = %v, want default %v", got, want)
+	}
 }
 
 func TestBuildContextGroupDropsThirdParties(t *testing.T) {
||||||
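Review note: the test above relies on `priceFor` falling back to the default model's price, but the lookup itself is not part of this excerpt. The sketch below shows one way that fallback could work. The type layouts and the body of `priceFor` are assumptions reconstructed from the field names in the diff; only `computeUSD` follows the commit's code.

```go
package main

import "fmt"

// ModelPrice, Usage and Config reuse the field names seen in the diff;
// their exact layout here is assumed for illustration.
type ModelPrice struct{ InputPerM, CachedPerM, OutputPerM float64 }

type Usage struct{ PromptTokens, CachedTokens, CompletionTokens int }

type Config struct {
	XAIModel string
	Prices   map[string]ModelPrice
}

// priceFor is a hypothetical implementation: an unknown model is priced
// like the default model, so a call is never booked at $0.
func (c *Config) priceFor(model string) ModelPrice {
	if p, ok := c.Prices[model]; ok {
		return p
	}
	return c.Prices[c.XAIModel]
}

// computeUSD mirrors the function in the diff: cached prompt tokens are
// billed at the cached rate, the rest of the prompt at the input rate.
func computeUSD(model string, u Usage, cfg *Config) float64 {
	p := cfg.priceFor(model)
	nonCached := u.PromptTokens - u.CachedTokens
	if nonCached < 0 {
		nonCached = 0
	}
	return float64(nonCached)/1e6*p.InputPerM +
		float64(u.CachedTokens)/1e6*p.CachedPerM +
		float64(u.CompletionTokens)/1e6*p.OutputPerM
}

func main() {
	cfg := &Config{XAIModel: "grok-test", Prices: map[string]ModelPrice{
		"grok-test": {InputPerM: 1.25, CachedPerM: 0.20, OutputPerM: 2.50},
	}}
	u := Usage{PromptTokens: 1_000_000, CachedTokens: 400_000, CompletionTokens: 1_000_000}
	fmt.Printf("%.2f\n", computeUSD("grok-test", u, cfg))
	fmt.Printf("%.2f\n", computeUSD("unknown-model", u, cfg))
}
```

Both calls should report the same amount, because the unknown model resolves to the default entry in the price table.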
279
apps/ai-bot/cascade.go
Normal file
@@ -0,0 +1,279 @@
+package main
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"strings"
+	"time"
+)
+
+// cascade.go is the generation half of the bot: given an admitted request, it routes
+// (router.go), runs the chosen route's provider(s), and ALWAYS degrades to grok_direct
+// on any layer being off or failing (§8.2). It returns a genResult the business logic
+// (respond) settles, sends, and logs — keeping ledger/never-silent/telemetry in one
+// place and the routing here. With every cascade flag off, classify returns grok_direct
+// and this collapses to exactly today's single Grok call.
+
+// genResult is everything respond needs from a generation: the answer, the model's
+// usage (for token billing), the FULL cost breakdown (router + web + final), and the
+// routing metadata for telemetry. cost accumulates across stages, so a partial cascade
+// (a paid web fetch that then degraded) still books what it actually spent.
+type genResult struct {
+	text       string
+	usage      Usage
+	cost       CostBreakdown
+	finalModel string
+	providerID string
+	decision   RouterDecision
+	route      string // the route actually taken (may differ from decision on degrade)
+	fallback   bool   // true if we degraded off the decided route
+	degraded   string // degrade reason for request_log
+	stageMS    map[string]int
+}
+
+func msSince(t time.Time) int { return int(time.Since(t).Milliseconds()) }
+
+// reserveEstimate is the admission envelope: the most expensive ENABLED route's cost,
+// so whichever route the router picks is covered by the reservation (the ceiling can't
+// be slipped by routing to a pricier path after admission). With every cascade flag
+// off it equals grok_direct's estimate — byte-for-byte today's reservation. Slightly
+// generous is fine: Settle books the authoritative actual afterward.
+func (b *Bot) reserveEstimate() float64 {
+	est := b.estimateUSD(b.cfg.XAIModel) // grok_direct / trivial(cheaper)/synthesis base
+	if b.cfg.WebEnabled {
+		// web_then_grok = a web fetch fee + the Grok synthesis already counted above.
+		if b.cfg.WebProvider == webProviderGrokWebSearch {
+			// fetch can search several times and pull large context; reserve generously.
+			est += float64(maxWebSearchCalls)*grokWebSearchPerCall + b.estimateUSD(b.cfg.XAIModel)
+		} else {
+			est += b.estimateUSD(b.cfg.GeminiModel)
+		}
+	}
+	if b.cfg.ReasoningEnabled {
+		// Higher reasoning effort can burn more output tokens; reserve double.
+		est = max(est, 2*b.estimateUSD(b.cfg.ReasoningModel))
+	}
+	return est
+}
||||||
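Review note: the envelope logic in `reserveEstimate` is easier to check with fixed numbers. The sketch below restates it as a pure function. `routeCosts`, `reserveEnvelope` and the dollar figures are hypothetical stand-ins, not names or prices from the commit.

```go
package main

import "fmt"

// routeCosts holds hypothetical per-request USD estimates.
type routeCosts struct {
	grokDirect float64 // one ordinary Grok call (also the synthesis call)
	webExtra   float64 // everything the web stage adds on top of synthesis
	reasoning  float64 // one call to the reasoning model
}

// reserveEnvelope returns the cost of the most expensive enabled route, so
// the reservation covers whichever route is picked after admission.
func reserveEnvelope(c routeCosts, webOn, reasoningOn bool) float64 {
	est := c.grokDirect
	if webOn {
		est += c.webExtra
	}
	if reasoningOn {
		// Reasoning is reserved at double its estimate; keep the larger envelope.
		if r := 2 * c.reasoning; r > est {
			est = r
		}
	}
	return est
}

func main() {
	c := routeCosts{grokDirect: 0.25, webExtra: 0.5, reasoning: 0.5}
	fmt.Println(reserveEnvelope(c, false, false)) // all flags off: plain Grok estimate
	fmt.Println(reserveEnvelope(c, true, false))  // web on: Grok plus web stage
	fmt.Println(reserveEnvelope(c, true, true))   // reasoning dominates when doubled
}
```

With every flag off the envelope equals the plain Grok estimate, which is the parity property the comment on `reserveEstimate` describes.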
+
+// generate routes and produces an answer, degrading to grok_direct on any failure.
+// It returns a terminal error ONLY if even grok_direct fails; every other route falls
+// through to grok_direct rather than erroring.
+func (b *Bot) generate(ctx context.Context, body string, msgs []Message, convID string) (genResult, error) {
+	res := genResult{stageMS: map[string]int{}, finalModel: b.cfg.XAIModel}
+
+	t0 := time.Now()
+	res.decision = b.classify(ctx, body, &res.cost) // accumulates cost.Router if Layer-1 runs
+	res.stageMS["router"] = msSince(t0)
+	res.route = res.decision.Route
+
+	finalMsgs := msgs
+	switch res.decision.Route {
+	case routeTrivial:
+		if b.cfg.TrivialOffloadEnabled && b.gemini != nil {
+			if err := b.genTrivial(ctx, msgs, &res); err == nil {
+				return res, nil
+			} else {
+				b.log.Warn("trivial offload failed; degrading to grok_direct", "err", err)
+				b.degradeTo(&res, degradeTrivial)
+			}
+		}
+	case routeWebThenGrok:
+		if b.cfg.WebEnabled && b.web != nil {
+			if err := b.genWebThenGrok(ctx, body, msgs, convID, &res); err == nil {
+				return res, nil
+			} else {
+				b.log.Warn("web route failed; degrading to grok_direct", "err", err, "reason", res.degraded)
+				b.degradeTo(&res, degradeWeb)
+				// The question wanted fresh facts but we have none — answer from training
+				// knowledge WITH an honest staleness caveat, not stale-as-current (§8.2.1).
+				finalMsgs = hedgeMessages(msgs)
+			}
+		}
+	case routeReason:
+		if b.cfg.ReasoningEnabled {
+			if err := b.genReason(ctx, msgs, convID, &res); err == nil {
+				return res, nil
+			} else {
+				b.log.Warn("reasoning route failed; degrading to grok_direct", "err", err)
+				b.degradeTo(&res, degradeReasoning)
+			}
+		}
+	}
+
+	// grok_direct — the default route AND the universal fallback. The only path that
+	// can return a terminal error (even Grok failed). It preserves any cost already
+	// spent (router classifier, a partial web fetch) in res.cost.
+	if err := b.genGrokDirect(ctx, finalMsgs, convID, &res); err != nil {
+		return res, err
+	}
+	return res, nil
+}
+
+// degradeTo marks res as a fallback to grok_direct, keeping the first/most-specific
+// degrade reason (e.g. a web provider's grounding_cap set inside genWebThenGrok).
+func (b *Bot) degradeTo(res *genResult, reason string) {
+	res.fallback = true
+	if res.degraded == "" {
+		res.degraded = reason
+	}
+}
||||||
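Review note: the dispatch in `generate` boils down to "try the preferred route, record the first degrade reason, then fall through to the direct route". The sketch below isolates that shape. `outcome`, `attempt`, `answer` and the `_failed` suffix are illustrative names, not identifiers from the commit.

```go
package main

import (
	"errors"
	"fmt"
)

// outcome is a cut-down, hypothetical stand-in for genResult.
type outcome struct {
	text     string
	route    string
	fallback bool
	degraded string
}

// attempt is one provider call that yields text or an error.
type attempt func() (string, error)

// markDegraded keeps the first reason recorded, like degradeTo in the diff.
func (o *outcome) markDegraded(reason string) {
	o.fallback = true
	if o.degraded == "" {
		o.degraded = reason
	}
}

// answer tries the preferred route and falls through to the direct route on
// failure. Only a failure of the direct route is a terminal error.
func answer(route string, preferred, direct attempt) (outcome, error) {
	var o outcome
	if preferred != nil {
		if text, err := preferred(); err == nil {
			o.text, o.route = text, route
			return o, nil
		}
		o.markDegraded(route + "_failed")
	}
	text, err := direct()
	if err != nil {
		return o, err
	}
	o.text, o.route = text, "grok_direct"
	return o, nil
}

func failing() (string, error)    { return "", errors.New("provider down") }
func succeeding() (string, error) { return "direct answer", nil }

func main() {
	o, err := answer("trivial", failing, succeeding)
	fmt.Println(o.route, o.fallback, o.degraded, err)
}
```

The caller only sees an error when the last resort fails, which matches the "never silent" goal stated in the file comment.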
+
+// genGrokDirect is today's path: one Grok call. Also the fallback for every other
+// route. On success it fills res (route, final model, text, usage, provider id) and
+// adds the token cost.
+func (b *Bot) genGrokDirect(ctx context.Context, msgs []Message, convID string, res *genResult) error {
+	t := time.Now()
+	resp, err := b.llm.Complete(ctx, LLMRequest{
+		Model:           b.cfg.XAIModel,
+		Messages:        msgs,
+		MaxTokens:       b.cfg.MaxOutTok,
+		Temperature:     b.cfg.XAITemp,
+		ConvID:          convID,
+		ReasoningEffort: b.cfg.GrokReasoningEffort, // "" → not sent; "none" keeps grok-4.3 fast
+	})
+	res.stageMS["final"] = msSince(t)
+	if err != nil {
+		return err
+	}
+	res.route, res.finalModel = routeGrokDirect, b.cfg.XAIModel
+	res.text, res.usage, res.providerID = resp.Text, resp.Usage, resp.ProviderRequestID
+	res.cost.Token += computeUSD(b.cfg.XAIModel, resp.Usage, b.cfg)
+	return nil
+}
+
+// genTrivial answers a trivial message with the cheap Gemini model. An empty reply is
+// treated as a failure so the caller degrades to Grok rather than sending nothing.
+func (b *Bot) genTrivial(ctx context.Context, msgs []Message, res *genResult) error {
+	t := time.Now()
+	resp, err := b.gemini.Complete(ctx, LLMRequest{
+		Model:       b.cfg.GeminiModel,
+		Messages:    msgs,
+		MaxTokens:   b.cfg.MaxOutTok,
+		Temperature: b.cfg.XAITemp,
+	})
+	res.stageMS["final"] = msSince(t)
+	if err != nil {
+		return err
+	}
+	if strings.TrimSpace(resp.Text) == "" {
+		return fmt.Errorf("trivial: empty Gemini reply")
+	}
+	res.route, res.finalModel = routeTrivial, b.cfg.GeminiModel
+	res.text, res.usage, res.providerID = resp.Text, resp.Usage, resp.ProviderRequestID
+	res.cost.Token += computeUSD(b.cfg.GeminiModel, resp.Usage, b.cfg)
+	return nil
+}
+
+// genReason answers with Grok at a higher reasoning effort. Uses the configured
+// reasoning-capable model (the default grok-4.20-non-reasoning would reject the param).
+func (b *Bot) genReason(ctx context.Context, msgs []Message, convID string, res *genResult) error {
+	t := time.Now()
+	resp, err := b.llm.Complete(ctx, LLMRequest{
+		Model:           b.cfg.ReasoningModel,
+		Messages:        msgs,
+		MaxTokens:       b.cfg.MaxOutTok,
+		Temperature:     b.cfg.XAITemp,
+		ReasoningEffort: b.cfg.ReasoningEffort, // "think harder" level (default high)
+		ConvID:          convID,
+	})
+	res.stageMS["final"] = msSince(t)
+	if err != nil {
+		return err
+	}
+	if strings.TrimSpace(resp.Text) == "" {
+		return fmt.Errorf("reason: empty reply")
+	}
+	res.route, res.finalModel = routeReason, b.cfg.ReasoningModel
+	res.text, res.usage, res.providerID = resp.Text, resp.Usage, resp.ProviderRequestID
+	res.cost.Token += computeUSD(b.cfg.ReasoningModel, resp.Usage, b.cfg)
+	return nil
+}
+
+// webStageTimeout bounds the web/grounding fetch independently of the overall budget
+// (§8.2.2): a slow search must not eat the whole request before synthesis.
+const webStageTimeout = 15 * time.Second
+
+// genWebThenGrok fetches fresh facts via the web provider, then has Grok synthesise the
+// answer in voice from that digest. The web fetch's cost+tokens are booked into res
+// EVEN ON FAILURE — the call was billed — so a synth failure or empty fetch still
+// accounts for the spend before the caller degrades to grok_direct (the partial cascade
+// case, §8.1). The daily cap and per-stage deadline are applied here, uniformly for both
+// providers.
+func (b *Bot) genWebThenGrok(ctx context.Context, body string, msgs []Message, convID string, res *genResult) error {
+	// Per-stage web/grounding deadline, independent of the overall budget.
+	wctx, cancelW := context.WithTimeout(ctx, webStageTimeout)
+	tw := time.Now()
+	wc, ferr := b.web.Fetch(wctx, body)
+	cancelW()
+	res.stageMS["web"] = msSince(tw)
+	// Book the fetch's fee + tokens whether or not it produced a usable digest — the call
+	// was billed (the daily cap, if any, is enforced inside the provider).
+	res.cost.Grounding += wc.Cost.Grounding
+	res.cost.WebTool += wc.Cost.WebTool
+	webUsage := wc.Usage
+	if ferr != nil {
+		if errors.Is(ferr, errGroundingCapped) {
+			res.degraded = degradeGroundCap
+		}
+		return ferr // web fee already booked; caller degrades to grok_direct (with hedge)
+	}
+
+	tf := time.Now()
+	resp, err := b.llm.Complete(ctx, LLMRequest{
+		Model:           b.cfg.XAIModel,
+		Messages:        webSynthMessages(msgs, wc),
+		MaxTokens:       b.cfg.MaxOutTok,
+		Temperature:     b.cfg.XAITemp,
+		ConvID:          convID,
+		ReasoningEffort: b.cfg.GrokReasoningEffort, // same voice, same effort as grok_direct
+	})
+	res.stageMS["final"] = msSince(tf)
+	if err != nil {
+		return err
+	}
+	if strings.TrimSpace(resp.Text) == "" {
+		return fmt.Errorf("web synth: empty reply")
+	}
+	res.route, res.finalModel = routeWebThenGrok, b.cfg.XAIModel
+	res.text, res.providerID = resp.Text, resp.ProviderRequestID
+	// Report BOTH calls' tokens so the analytics token totals match the two-call route.
+	res.usage = Usage{
+		PromptTokens:     resp.Usage.PromptTokens + webUsage.PromptTokens,
+		CachedTokens:     resp.Usage.CachedTokens + webUsage.CachedTokens,
+		CompletionTokens: resp.Usage.CompletionTokens + webUsage.CompletionTokens,
+	}
+	res.cost.Token += computeUSD(b.cfg.XAIModel, resp.Usage, b.cfg)
+	return nil
+}
||||||
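Review note: two ideas in `genWebThenGrok` are worth seeing in isolation: the web stage gets its own deadline via `context.WithTimeout`, and the fee is booked before the error is inspected. The sketch below is a minimal illustration under assumptions. `slowFetch`, `webStage`, `runDemo` and the 0.1 fee are hypothetical, and the provider is simulated with a timer.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fetchResult is a hypothetical stand-in for the provider's reply.
type fetchResult struct {
	digest string
	cost   float64
}

// slowFetch simulates a provider that has already charged its fee and then
// either finishes its work or is cut off by the context.
func slowFetch(ctx context.Context, work time.Duration) (fetchResult, error) {
	r := fetchResult{cost: 0.1}
	select {
	case <-time.After(work):
		r.digest = "fresh facts"
		return r, nil
	case <-ctx.Done():
		return r, ctx.Err()
	}
}

// webStage bounds the fetch with its own deadline and books the fee into
// *spent before looking at the error, so a failed stage is still accounted.
func webStage(parent context.Context, limit, work time.Duration, spent *float64) (string, error) {
	ctx, cancel := context.WithTimeout(parent, limit)
	defer cancel()
	r, err := slowFetch(ctx, work)
	*spent += r.cost
	if err != nil {
		return "", err
	}
	return r.digest, nil
}

// runDemo wraps webStage with millisecond arguments for easy experimenting.
func runDemo(limitMS, workMS int) (digest string, spent float64, timedOut bool) {
	limit := time.Duration(limitMS) * time.Millisecond
	work := time.Duration(workMS) * time.Millisecond
	digest, err := webStage(context.Background(), limit, work, &spent)
	return digest, spent, errors.Is(err, context.DeadlineExceeded)
}

func main() {
	digest, spent, timedOut := runDemo(10, 1000)
	fmt.Printf("digest=%q spent=%v timedOut=%v\n", digest, spent, timedOut)
}
```

The commit calls `cancelW()` right after the fetch instead of deferring it, which releases the timer before the synthesis call starts; `defer` is enough here because the sketch's stage function ends with the fetch.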
+
+// webSynthMessages inserts the fresh web digest (and its sources) as a system note just
+// after the system prompt, so Grok answers in voice using current facts.
+func webSynthMessages(base []Message, wc WebContext) []Message {
+	facts := "Свежие данные из веба (используй их в ответе и сошлись на источники):\n" + wc.Digest
+	if len(wc.Citations) > 0 {
+		facts += "\nИсточники: " + strings.Join(wc.Citations, ", ")
+	}
+	return insertSystemNote(base, facts)
+}
+
+// hedgeMessages adds an honest staleness caveat for a web→grok_direct degrade: the user
+// wanted fresh facts but we couldn't fetch them, so the model must flag that its answer
+// is from training knowledge and may be out of date.
+func hedgeMessages(base []Message) []Message {
+	return insertSystemNote(base, "Нет доступа к свежим источникам прямо сейчас — отвечай по знаниям на момент обучения и честно предупреди, что данные могут быть устаревшими.")
+}
+
+// insertSystemNote inserts an extra system message right after the system prompt
+// (base[0] from buildContext), preserving the rest of the window.
+func insertSystemNote(base []Message, content string) []Message {
+	note := Message{Role: "system", Content: content}
+	if len(base) == 0 {
+		return []Message{note}
+	}
+	out := make([]Message, 0, len(base)+1)
+	out = append(out, base[0], note)
+	out = append(out, base[1:]...)
+	return out
+}
||||||
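Review note: `insertSystemNote` builds a fresh slice instead of appending in place, so the caller's message window is left untouched. The sketch below exercises that with a tiny window. The `Message` type is assumed to have just the two fields used in the diff; the function body is copied from the commit.

```go
package main

import "fmt"

// Message is assumed to carry only the two fields used in the diff.
type Message struct {
	Role    string
	Content string
}

// insertSystemNote is copied from the commit: the note lands right after
// base[0] (the system prompt) and the rest of the window keeps its order.
func insertSystemNote(base []Message, content string) []Message {
	note := Message{Role: "system", Content: content}
	if len(base) == 0 {
		return []Message{note}
	}
	out := make([]Message, 0, len(base)+1)
	out = append(out, base[0], note)
	out = append(out, base[1:]...)
	return out
}

func main() {
	base := []Message{
		{Role: "system", Content: "SYS"},
		{Role: "user", Content: "what is new today?"},
	}
	out := insertSystemNote(base, "fresh facts go here")
	for i, m := range out {
		fmt.Printf("%d %s: %s\n", i, m.Role, m.Content)
	}
	fmt.Println("base still has", len(base), "messages")
}
```

Allocating `out` up front means the append can never write into spare capacity of `base`, which matters because the same window is reused if the route degrades.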
275
apps/ai-bot/cascade_test.go
Normal file
@@ -0,0 +1,275 @@
+package main
+
+import (
+	"context"
+	"errors"
+	"io"
+	"log/slog"
+	"testing"
+)
+
+func discardLog() *slog.Logger { return slog.New(slog.NewTextHandler(io.Discard, nil)) }
+
+// fakeLLM is a scriptable LLMClient for dispatch/degrade tests.
+type fakeLLM struct {
+	text    string
+	usage   Usage
+	err     error
+	calls   int
+	lastReq LLMRequest
+}
+
+func (f *fakeLLM) Complete(_ context.Context, req LLMRequest) (*LLMResponse, error) {
+	f.calls++
+	f.lastReq = req
+	if f.err != nil {
+		return nil, f.err
+	}
+	return &LLMResponse{Text: f.text, Usage: f.usage, ProviderRequestID: "fake"}, nil
+}
+
+type fakeWeb struct {
+	wc    WebContext
+	err   error
+	calls int
+}
+
+func (f *fakeWeb) Fetch(_ context.Context, _ string) (WebContext, error) {
+	f.calls++
+	if f.err != nil {
+		return WebContext{}, f.err
+	}
+	return f.wc, nil
+}
+
+// cascadeCfg is a config with the model/price table set and EVERY cascade flag off.
+// Tests flip individual flags on a copy.
+func cascadeCfg() Config {
+	return Config{
+		XAIModel: "grok-x", GeminiModel: "gemini-x", ReasoningModel: "grok-reason",
+		MaxOutTok: 100, XAITemp: 0.5,
+		ReasoningTrigger: "подумай глубже",
+		ReasoningEffort:  "high",
+		WebProvider:      webProviderGrokWebSearch,
+		Prices: map[string]ModelPrice{
+			"grok-x":   {InputPerM: 1, CachedPerM: 0.2, OutputPerM: 2},
+			"gemini-x": {InputPerM: 0.1, CachedPerM: 0.1, OutputPerM: 0.4},
+		},
+	}
+}
+
+func msgs(body string) []Message {
+	return []Message{{Role: "system", Content: "SYS"}, {Role: "user", Content: body}}
+}
+
+// TestGenerateAllFlagsOffIsGrokDirect is the cascade-off parity invariant: even a
+// "trivial"-looking message goes to Grok, and Gemini is never touched, when the router
+// is off.
+func TestGenerateAllFlagsOffIsGrokDirect(t *testing.T) {
+	grok := &fakeLLM{text: "grok answer"}
+	gem := &fakeLLM{text: "should not run"}
+	cfg := cascadeCfg()
+	b := &Bot{cfg: &cfg, llm: grok, gemini: gem, log: discardLog()}
+
+	res, err := b.generate(context.Background(), "привет", msgs("привет"), "")
+	if err != nil {
+		t.Fatalf("generate: %v", err)
+	}
+	if res.route != routeGrokDirect || res.text != "grok answer" {
+		t.Fatalf("res = (%q,%q), want grok_direct/\"grok answer\"", res.route, res.text)
+	}
+	if res.decision.Source != "default" {
+		t.Fatalf("router source = %q, want default (router off)", res.decision.Source)
+	}
+	if grok.calls != 1 || gem.calls != 0 {
+		t.Fatalf("calls grok=%d gem=%d, want 1/0", grok.calls, gem.calls)
+	}
+}
+
+func TestGenerateTrivialOffload(t *testing.T) {
+	grok := &fakeLLM{text: "grok"}
+	gem := &fakeLLM{text: "gemini trivial"}
+	cfg := cascadeCfg()
+	cfg.RouterEnabled, cfg.TrivialOffloadEnabled = true, true
+	b := &Bot{cfg: &cfg, llm: grok, gemini: gem, log: discardLog()}
+
+	res, err := b.generate(context.Background(), "привет", msgs("привет"), "")
+	if err != nil {
+		t.Fatalf("generate: %v", err)
+	}
+	if res.route != routeTrivial || res.text != "gemini trivial" || res.finalModel != "gemini-x" {
+		t.Fatalf("res = (%q,%q,%q), want trivial/gemini", res.route, res.text, res.finalModel)
+	}
+	if gem.calls != 1 || grok.calls != 0 {
+		t.Fatalf("calls grok=%d gem=%d, want 0/1 (Gemini answered)", grok.calls, gem.calls)
+	}
+}
+
+// TestGenerateTrivialDegradesToGrok: Gemini failing on the trivial route must fall back
+// to Grok, never go silent.
+func TestGenerateTrivialDegradesToGrok(t *testing.T) {
+	grok := &fakeLLM{text: "grok fallback"}
+	gem := &fakeLLM{err: errors.New("gemini down")}
+	cfg := cascadeCfg()
+	cfg.RouterEnabled, cfg.TrivialOffloadEnabled = true, true
+	b := &Bot{cfg: &cfg, llm: grok, gemini: gem, log: discardLog()}
+
+	res, err := b.generate(context.Background(), "привет", msgs("привет"), "")
+	if err != nil {
+		t.Fatalf("generate: %v", err)
+	}
+	if res.route != routeGrokDirect || res.text != "grok fallback" {
+		t.Fatalf("res = (%q,%q), want grok_direct fallback", res.route, res.text)
+	}
+	if !res.fallback || res.degraded != degradeTrivial {
+		t.Fatalf("fallback=%v degraded=%q, want true/trivial_failed", res.fallback, res.degraded)
+	}
+	if gem.calls != 1 || grok.calls != 1 {
+		t.Fatalf("calls grok=%d gem=%d, want 1/1", grok.calls, gem.calls)
+	}
+}
+
+func TestGenerateWebThenGrok(t *testing.T) {
+	grok := &fakeLLM{text: "synthesised", usage: Usage{PromptTokens: 100, CompletionTokens: 50}}
+	web := &fakeWeb{wc: WebContext{Digest: "fresh facts", Citations: []string{"http://src"}, Cost: CostBreakdown{WebTool: 0.1}}}
+	cfg := cascadeCfg()
+	cfg.RouterEnabled, cfg.WebEnabled = true, true
+	b := &Bot{cfg: &cfg, llm: grok, web: web, log: discardLog()}
+
+	res, err := b.generate(context.Background(), "какие новости сегодня", msgs("какие новости сегодня"), "")
+	if err != nil {
+		t.Fatalf("generate: %v", err)
+	}
+	if res.route != routeWebThenGrok || res.text != "synthesised" {
+		t.Fatalf("res = (%q,%q), want web_then_grok/synthesised", res.route, res.text)
+	}
+	if res.cost.WebTool != 0.1 || res.cost.Token <= 0 {
+		t.Fatalf("cost = %+v, want WebTool 0.1 + Token>0", res.cost)
+	}
+	if web.calls != 1 || grok.calls != 1 {
+		t.Fatalf("calls web=%d grok=%d, want 1/1", web.calls, grok.calls)
+	}
+}
+
+// TestGenerateWebDegradesToGrok: a web fetch failure (provider down or cap hit) degrades
+// to grok_direct and books no web cost.
+func TestGenerateWebDegradesToGrok(t *testing.T) {
+	grok := &fakeLLM{text: "grok fallback"}
+	web := &fakeWeb{err: errGroundingCapped}
+	cfg := cascadeCfg()
+	cfg.RouterEnabled, cfg.WebEnabled = true, true
+	b := &Bot{cfg: &cfg, llm: grok, web: web, log: discardLog()}
+
+	res, err := b.generate(context.Background(), "новости сегодня", msgs("новости сегодня"), "")
+	if err != nil {
+		t.Fatalf("generate: %v", err)
+	}
+	if res.route != routeGrokDirect || res.text != "grok fallback" || !res.fallback {
+		t.Fatalf("res = (%q,%q,fallback=%v), want grok_direct fallback", res.route, res.text, res.fallback)
+	}
+	if res.degraded != degradeGroundCap {
+		t.Fatalf("degraded = %q, want grounding_cap (the specific reason)", res.degraded)
+	}
+	if res.cost.WebTool != 0 || res.cost.Grounding != 0 {
+		t.Fatalf("web cost = %+v, want 0 (fetch failed before billing)", res.cost)
+	}
+}
+
+// TestGenerateReasoningForced: the manual trigger routes to the reasoning model with
+// reasoning_effort, independent of ROUTER_ENABLED.
+func TestGenerateReasoningForced(t *testing.T) {
+	grok := &fakeLLM{text: "deep answer"}
+	cfg := cascadeCfg()
+	cfg.ReasoningEnabled = true // ROUTER_ENABLED deliberately left off
+	b := &Bot{cfg: &cfg, llm: grok, log: discardLog()}
+
+	res, err := b.generate(context.Background(), "подумай глубже про сознание", msgs("подумай глубже про сознание"), "")
+	if err != nil {
+		t.Fatalf("generate: %v", err)
+	}
+	if res.route != routeReason || res.decision.Source != "forced" {
+		t.Fatalf("res route=%q source=%q, want reason/forced", res.route, res.decision.Source)
+	}
+	if grok.lastReq.ReasoningEffort != "high" || grok.lastReq.Model != "grok-reason" {
+		t.Fatalf("reasoning req = (effort %q, model %q), want high/grok-reason", grok.lastReq.ReasoningEffort, grok.lastReq.Model)
+	}
+}
+
+// TestClassifierConfidenceFloor: a Layer-1 classifier label that escalates off the safe
+// floor (trivial/web) must clear the confidence floor, else the request stays on
+// grok_direct — the false-trivial voice-leak guard (§8.6).
+func TestClassifierConfidenceFloor(t *testing.T) {
+	cfg := cascadeCfg()
+	cfg.RouterEnabled, cfg.RouterClassifierEnabled = true, true
+	gem := &fakeLLM{}
+	b := &Bot{cfg: &cfg, gemini: gem, log: discardLog()}
+	var cost CostBreakdown
+	const substantive = "напиши подробное эссе про историю римской империи" // Layer-0 → grok_direct
+
+	gem.text = `{"route":"trivial","confidence":0.2}` // low-confidence escalation
+	if d := b.classify(context.Background(), substantive, &cost); d.Route != routeGrokDirect {
+		t.Fatalf("low-confidence trivial must stay grok_direct (safe floor), got %q", d.Route)
+	}
+	gem.text = `{"route":"trivial","confidence":0.95}` // confident escalation is honoured
+	if d := b.classify(context.Background(), substantive, &cost); d.Route != routeTrivial {
+		t.Fatalf("high-confidence trivial should route trivial, got %q", d.Route)
+	}
+	// A classifier error degrades to the Layer-0 verdict (grok_direct), never silence.
+	gem.text, gem.err = "", errors.New("gemini down")
+	if d := b.classify(context.Background(), substantive, &cost); d.Route != routeGrokDirect {
+		t.Fatalf("classifier failure must fall back to heuristic grok_direct, got %q", d.Route)
+	}
+}
||||||
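Review note: router.go is outside this excerpt, so the confidence floor exercised by the test above is only visible through its behaviour. The sketch below shows one plausible shape for it: parse the classifier's JSON verdict, keep the heuristic route on a parse failure, and refuse low-confidence escalations away from `grok_direct`. The `verdict` type, `applyFloor`, and the 0.7 threshold are assumptions for illustration, not the commit's values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	routeGrokDirect = "grok_direct"
	// confidenceFloor is a hypothetical threshold; the real value lives in router.go.
	confidenceFloor = 0.7
)

// verdict matches the JSON shape used by the fake classifier in the test.
type verdict struct {
	Route      string  `json:"route"`
	Confidence float64 `json:"confidence"`
}

// applyFloor turns a raw classifier reply into a route. An unparsable or
// empty reply keeps the heuristic route; a label that leaves grok_direct is
// honoured only when its confidence clears the floor.
func applyFloor(raw, heuristic string) string {
	var v verdict
	if err := json.Unmarshal([]byte(raw), &v); err != nil || v.Route == "" {
		return heuristic
	}
	if v.Route != routeGrokDirect && v.Confidence < confidenceFloor {
		return routeGrokDirect
	}
	return v.Route
}

func main() {
	fmt.Println(applyFloor(`{"route":"trivial","confidence":0.2}`, routeGrokDirect))
	fmt.Println(applyFloor(`{"route":"trivial","confidence":0.95}`, routeGrokDirect))
	fmt.Println(applyFloor(``, routeGrokDirect))
}
```

Treating `grok_direct` as the safe default means a wrong or unsure classifier can only cost a little money, never the bot's voice.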
|
|
||||||
|
// TestGrokReasoningEffort: GROK_REASONING_EFFORT is sent on grok_direct (so grok-4.3 can
|
||||||
|
// be kept fast with "none"), empty means not sent (compat with grok-4.20-non-reasoning),
|
||||||
|
// and the reason route always overrides to "high" regardless.
|
||||||
|
func TestGrokReasoningEffort(t *testing.T) {
|
||||||
|
// Configured effort reaches grok_direct.
|
||||||
|
grok := &fakeLLM{text: "ok"}
|
||||||
|
cfg := cascadeCfg()
|
||||||
|
cfg.GrokReasoningEffort = "none"
|
||||||
|
b := &Bot{cfg: &cfg, llm: grok, log: discardLog()}
|
||||||
|
if _, err := b.generate(context.Background(), "hello", msgs("hello"), ""); err != nil {
|
||||||
|
t.Fatal(err)
|
||||||
|
}
|
||||||
|
if grok.lastReq.ReasoningEffort != "none" {
|
||||||
|
t.Fatalf("grok_direct effort = %q, want none", grok.lastReq.ReasoningEffort)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Empty default → not sent (so grok-4.20-non-reasoning keeps working).
|
||||||
|
grokDef := &fakeLLM{text: "ok"}
|
||||||
|
cfgDef := cascadeCfg() // GrokReasoningEffort == ""
|
||||||
|
bDef := &Bot{cfg: &cfgDef, llm: grokDef, log: discardLog()}
|
||||||
|
if _, err := bDef.generate(context.Background(), "hello", msgs("hello"), ""); err != nil {
|
||||||
|
t.Fatal(err)
|
||||||
|
}
|
||||||
|
if grokDef.lastReq.ReasoningEffort != "" {
|
||||||
|
t.Fatalf("default effort = %q, want empty (not sent)", grokDef.lastReq.ReasoningEffort)
|
||||||
|
}
|
||||||
|
|
||||||
|
// The reason route ignores GROK_REASONING_EFFORT and always uses "high".
|
||||||
|
grokR := &fakeLLM{text: "deep"}
|
||||||
|
cfgR := cascadeCfg()
|
||||||
|
cfgR.GrokReasoningEffort = "none"
|
||||||
|
cfgR.ReasoningEnabled = true
|
||||||
|
bR := &Bot{cfg: &cfgR, llm: grokR, log: discardLog()}
|
||||||
|
if _, err := bR.generate(context.Background(), "подумай глубже про X", msgs("подумай глубже про X"), ""); err != nil {
|
||||||
|
t.Fatal(err)
|
||||||
|
}
|
||||||
|
if grokR.lastReq.ReasoningEffort != "high" {
|
||||||
|
t.Fatalf("reason route effort = %q, want high (overrides GROK_REASONING_EFFORT)", grokR.lastReq.ReasoningEffort)
|
||||||
|
}
|
||||||
|
}
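The three cases asserted by TestGrokReasoningEffort can be read as one small selection rule. The following is a minimal sketch of that rule, not the dispatch code from cascade.go: the function name `effortFor` and the two route constants are hypothetical names chosen for illustration, with string values taken from the route names used in the comments of this commit.

```go
package main

// Hypothetical route labels for illustration; the repo's own constants
// (routeGrokDirect and friends) are defined outside this diff excerpt.
const (
	routeDirect = "grok_direct"
	routeReason = "reason_then_grok"
)

// effortFor returns the reasoning_effort value to put on a request.
// voiceEffort stands in for GROK_REASONING_EFFORT ("" means: omit the field).
// reasonEffort stands in for REASONING_EFFORT (default "high").
func effortFor(route, voiceEffort, reasonEffort string) string {
	if route == routeReason {
		// The reason route ignores the voice setting entirely.
		return reasonEffort
	}
	// May be "", in which case an omitempty JSON tag drops the field.
	return voiceEffort
}
```

With `voiceEffort` set to "none", a direct call sends "none" while a reason-route call still sends the reasoning effort, which is the override the last test case checks.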
+
+// TestGenerateTerminalErrorPropagates: if even grok_direct fails, generate returns the
+// error (respond turns it into refund + react), not a silent empty success.
+func TestGenerateTerminalErrorPropagates(t *testing.T) {
+	grok := &fakeLLM{err: errors.New("xai down")}
+	cfg := cascadeCfg()
+	b := &Bot{cfg: &cfg, llm: grok, log: discardLog()}
+
+	if _, err := b.generate(context.Background(), "hello", msgs("hello"), ""); err == nil {
+		t.Fatal("want terminal error when grok_direct fails, got nil")
+	}
+}
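The behaviour these tests pin down (degrade to grok_direct when a layer fails, but propagate the error when grok_direct itself fails) can be sketched in isolation. Everything below is hypothetical: `answerFn`, `generateWithFallback`, `fixed` and `failing` are illustration names, and treating an empty answer as a failure is an assumption of this sketch rather than something the diff states.

```go
package main

import "errors"

// answerFn is a hypothetical stand-in for one route's model call.
type answerFn func() (string, error)

// fixed and failing build trivial answerFns for demonstration.
func fixed(text string) answerFn {
	return func() (string, error) { return text, nil }
}

func failing(msg string) answerFn {
	return func() (string, error) { return "", errors.New(msg) }
}

// generateWithFallback tries the chosen route first. If that route is absent
// or fails, it degrades to the direct route. If the direct route also fails,
// the error is returned so the caller can refund and react instead of
// posting an empty reply.
func generateWithFallback(chosen, direct answerFn) (string, error) {
	if chosen != nil {
		if text, err := chosen(); err == nil && text != "" {
			return text, nil
		}
	}
	text, err := direct()
	if err != nil {
		return "", err
	}
	if text == "" {
		// Assumption: an empty answer is treated as a failure.
		return "", errors.New("direct route returned an empty answer")
	}
	return text, nil
}
```

A call such as `generateWithFallback(failing("gemini down"), fixed("hi"))` yields the direct answer, while `generateWithFallback(nil, failing("xai down"))` surfaces the error.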
@@ -5,6 +5,7 @@ import (
 	"os"
 	"strconv"
 	"strings"
+	"time"
 )
 
 // Config is the fully-resolved runtime configuration, parsed once from the
@@ -36,23 +37,106 @@ type Config struct {
 	MaxOutTok   int
 	MaxCtxEvent int
 
+	// GrokReasoningEffort is the reasoning_effort sent on the normal Grok voice calls
+	// (grok_direct + web synthesis). Empty = don't send it (the default — required for
+	// grok-4.20-non-reasoning, which rejects the param). On a unified model like
+	// grok-4.3 the API otherwise defaults to "low" (it thinks on every reply); set this
+	// to "none" to keep the default voice fast/cheap. The reason_then_grok route ignores
+	// this and always uses "high". Accepted: "" | none | low | medium | high.
+	GrokReasoningEffort string
+
 	// Allowlist of homeservers whose users may pull the bot into a room. Gates
 	// the *inviter* (F11). Comma-separated env, stored as a set.
 	AllowedServers map[string]bool
 
 	DailyUSDCeiling float64
 	PerUserDailyCap int
+	// PerUserDailyUSD is an optional per-user daily $ quota (0 = off) on top of the
+	// request count cap, so one user on expensive routes can't drain the shared global
+	// ceiling and deny everyone else. Checked against the user's own committed+reserved
+	// spend in Reserve.
+	PerUserDailyUSD float64
 
 	// mxids exempt from PER_USER_DAILY_CAP (e.g. the owner/admins testing). Still
 	// subject to the global DAILY_USD_CEILING, so the wallet stays protected.
 	UnlimitedUsers map[string]bool
 
-	// USD-per-1M-token prices applied to the API-returned token usage so the
-	// hard ceiling tracks real usage even if the model/price changes.
+	// USD-per-1M-token prices for the default (final-voice) model, applied to the
+	// API-returned token usage so the hard ceiling tracks real usage even if the
+	// model/price changes. Kept as the back-compat XAI_PRICE_* source; folded into
+	// Prices below.
 	PriceInputPerM  float64
 	PriceCachedPerM float64
 	PriceOutputPerM float64
 
+	// Prices is the per-model price table (LiteLLM pattern) read by priceFor(model),
+	// so a call books at the price of the model that actually served it. Built in
+	// LoadConfig; the default model's entry comes from the XAI_PRICE_* envs, and a
+	// second model (Gemini) adds its own entry when that layer lands.
+	Prices map[string]ModelPrice
+
+	// RequestBudget bounds one whole request (all model calls share it), so a slow or
+	// retried call — or a multi-stage cascade — can't accrete minutes. The default
+	// matches the previous effective ceiling for a single grok_direct call.
+	RequestBudget time.Duration
+
+	// GrokPromptCache, when true, sends the x-grok-conv-id routing header to raise the
+	// prompt-cache hit rate (Grok caches automatically; the header only pins routing).
+	GrokPromptCache bool
+
+	// TelemetryEnabled writes the request_log analytics row for every request. Default
+	// off so the cascade-off path adds no extra write; turned on to measure the base.
+	// Its write is isolated — a failure logs a WARN, never drops the answer.
+	TelemetryEnabled bool
+	// TelemetryStoreText additionally stores the query text in request_log (for offline
+	// eval). Default off — only metadata is kept.
+	TelemetryStoreText bool
+	// TelemetryRetention trims request_log rows older than this (time-based, since the
+	// analytics are a time series). 0 disables trimming.
+	TelemetryRetention time.Duration
+
+	// --- Cascade (Phase 2-4). EVERY flag defaults OFF, so an unset environment is
+	// exactly today's bot: one grok_direct call. Any layer off or failing degrades to
+	// grok_direct (§8.2). None of these is enabled in prod until the offline-eval gate
+	// (§9) passes. ---
+
+	// RouterEnabled turns on the Layer-0 heuristic router; off → everything is
+	// grok_direct. RouterClassifierEnabled additionally consults the Gemini Layer-1
+	// classifier on cases the heuristic left as grok_direct.
+	RouterEnabled           bool
+	RouterClassifierEnabled bool
+	// TrivialOffloadEnabled lets the trivial route answer with Gemini; off → trivial
+	// still goes to Grok.
+	TrivialOffloadEnabled bool
+	// WebEnabled turns on the web_then_grok route. WebProvider selects the source:
+	// grok_web_search (default, works on chat/completions via Live Search) or
+	// gemini_grounding (Gemini-3 native only — see F-EXT-3).
+	WebEnabled  bool
+	WebProvider string
+	// WebGroundingDailyCap caps grounded prompts/day (durable counter) before falling
+	// back, guarding the $/1k grounding overage. WebGroundingTier records the Gemini
+	// plan the cap reflects.
+	WebGroundingDailyCap int
+	WebGroundingTier     string
+	// Reasoning route: a manual "think harder" trigger. ReasoningModel must be a
+	// reasoning-capable model (the default grok-4.20-non-reasoning is NOT — see the
+	// docs.x.ai finding); set REASONING_MODEL to e.g. grok-4.3 to use it.
+	ReasoningEnabled bool
+	ReasoningTrigger string
+	ReasoningModel   string
+	// ReasoningEffort is the reasoning_effort the reason_then_grok route sends on the
+	// manual "think harder" trigger. Default "high". Accepted: none|low|medium|high.
+	ReasoningEffort string
+	// CanaryPercent routes a fraction of traffic through the new path for A/B before a
+	// full enable. 0 = off (scaffold; not yet consulted by the dispatch).
+	CanaryPercent int
+
+	// Gemini backend (the cheap/router/grounding model). Required only when a layer
+	// that uses it is enabled (validated below).
+	GeminiBaseURL string
+	GeminiAPIKey  string
+	GeminiModel   string
+
 	SystemPromptPath string
 	SystemPrompt     string
 	StateDir         string
@@ -111,6 +195,23 @@ func getenvFloat(key string, def float64) (float64, error) {
 	return f, nil
 }
 
+// getenvBool parses a boolean flag. Accepts the usual 1/0/true/false/yes/no/on/off
+// (case-insensitive); empty → default. Every cascade flag defaults false, so an unset
+// or blank env keeps today's behaviour.
+func getenvBool(key string, def bool) (bool, error) {
+	raw := strings.TrimSpace(getenv(key, ""))
+	if raw == "" {
+		return def, nil
+	}
+	switch strings.ToLower(raw) {
+	case "1", "true", "yes", "on":
+		return true, nil
+	case "0", "false", "no", "off":
+		return false, nil
+	}
+	return false, fmt.Errorf("%s must be a boolean (true/false), got %q", key, raw)
+}
+
 func parseServerSet(raw string) map[string]bool {
 	set := make(map[string]bool)
 	for _, s := range strings.Split(raw, ",") {
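To see how the accept-list in getenvBool behaves on concrete inputs, here is a standalone sketch. It takes the raw value as an argument so that it does not depend on the repo's `getenv` helper, whose body is not part of this hunk; the name `parseFlag` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseFlag applies the same accept-list as getenvBool to an already-read
// value. A blank value yields the default; anything outside the list is an
// error rather than a silent false.
func parseFlag(key, raw string, def bool) (bool, error) {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return def, nil
	}
	switch strings.ToLower(raw) {
	case "1", "true", "yes", "on":
		return true, nil
	case "0", "false", "no", "off":
		return false, nil
	}
	return false, fmt.Errorf("%s must be a boolean (true/false), got %q", key, raw)
}
```

Returning an error for an unrecognised value (for example "maybe") is what lets LoadConfig report a mistyped flag at startup instead of quietly leaving the layer off.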
@@ -139,6 +240,16 @@ func LoadConfig() (*Config, error) {
 		DatabaseURL:    getenv("AI_BOT_DATABASE_URL", ""),
 		AllowedServers: parseServerSet(getenv("ALLOWED_SERVERS", "")),
 		UnlimitedUsers: parseServerSet(getenv("UNLIMITED_USERS", "")),
+
+		// Cascade string-valued config (flags/ints/secrets parsed below).
+		GrokReasoningEffort: strings.ToLower(strings.TrimSpace(getenv("GROK_REASONING_EFFORT", ""))),
+		WebProvider:         getenv("WEB_PROVIDER", webProviderGrokWebSearch),
+		WebGroundingTier:    getenv("WEB_GROUNDING_TIER", "free"),
+		ReasoningTrigger:    getenv("REASONING_TRIGGER", "подумай глубже"),
+		ReasoningModel:      getenv("REASONING_MODEL", "grok-4.3"),
+		ReasoningEffort:     strings.ToLower(strings.TrimSpace(getenv("REASONING_EFFORT", "high"))),
+		GeminiBaseURL:       strings.TrimRight(getenv("GEMINI_BASE_URL", "https://generativelanguage.googleapis.com/v1beta/openai"), "/"),
+		GeminiModel:         getenv("GEMINI_MODEL", "gemini-2.5-flash-lite"),
 	}
 
 	var problems []string
@@ -152,6 +263,7 @@ func LoadConfig() (*Config, error) {
 		{"AS_TOKEN", &cfg.ASToken},
 		{"HS_TOKEN", &cfg.HSToken},
 		{"XAI_API_KEY", &cfg.XAIAPIKey},
+		{"GEMINI_API_KEY", &cfg.GeminiAPIKey}, // optional; required only if a Gemini layer is on
 	} {
 		v, err := getSecret(s.key)
 		if err != nil {
@@ -207,6 +319,9 @@ func LoadConfig() (*Config, error) {
 	if cfg.PerUserDailyCap, err = getenvInt("PER_USER_DAILY_CAP", 30); err != nil {
 		problems = append(problems, err.Error())
 	}
+	if cfg.PerUserDailyUSD, err = getenvFloat("PER_USER_DAILY_USD", 0); err != nil {
+		problems = append(problems, err.Error())
+	}
 	if cfg.PriceInputPerM, err = getenvFloat("XAI_PRICE_INPUT_PER_M", 1.25); err != nil {
 		problems = append(problems, err.Error())
 	}
@@ -216,6 +331,110 @@ func LoadConfig() (*Config, error) {
 	if cfg.PriceOutputPerM, err = getenvFloat("XAI_PRICE_OUTPUT_PER_M", 2.50); err != nil {
 		problems = append(problems, err.Error())
 	}
+	// Per-model price table. The default (final-voice) model is priced from the
+	// XAI_PRICE_* envs; additional models register their own entry as their layer
+	// lands. priceFor falls back to this default model for an unknown model.
+	cfg.Prices = map[string]ModelPrice{
+		cfg.XAIModel: {
+			InputPerM:  cfg.PriceInputPerM,
+			CachedPerM: cfg.PriceCachedPerM,
+			OutputPerM: cfg.PriceOutputPerM,
+		},
+	}
+
+	var budgetSec, retentionDays int
+	if budgetSec, err = getenvInt("REQUEST_BUDGET_SECONDS", 180); err != nil {
+		problems = append(problems, err.Error())
+	}
+	cfg.RequestBudget = time.Duration(budgetSec) * time.Second
+	if cfg.GrokPromptCache, err = getenvBool("GROK_PROMPT_CACHE", false); err != nil {
+		problems = append(problems, err.Error())
+	}
+	if cfg.TelemetryEnabled, err = getenvBool("TELEMETRY_ENABLED", false); err != nil {
+		problems = append(problems, err.Error())
+	}
+	if cfg.TelemetryStoreText, err = getenvBool("TELEMETRY_STORE_TEXT", false); err != nil {
+		problems = append(problems, err.Error())
+	}
+	if retentionDays, err = getenvInt("TELEMETRY_RETENTION_DAYS", 30); err != nil {
+		problems = append(problems, err.Error())
+	}
+	cfg.TelemetryRetention = time.Duration(retentionDays) * 24 * time.Hour
+
+	// Cascade flags — every one defaults false, so an unset env is today's bot.
+	for _, f := range []struct {
+		key  string
+		dest *bool
+	}{
+		{"ROUTER_ENABLED", &cfg.RouterEnabled},
+		{"ROUTER_CLASSIFIER_ENABLED", &cfg.RouterClassifierEnabled},
+		{"TRIVIAL_OFFLOAD_ENABLED", &cfg.TrivialOffloadEnabled},
+		{"WEB_ENABLED", &cfg.WebEnabled},
+		{"REASONING_ENABLED", &cfg.ReasoningEnabled},
+	} {
+		if *f.dest, err = getenvBool(f.key, false); err != nil {
+			problems = append(problems, err.Error())
+		}
+	}
+	if cfg.WebGroundingDailyCap, err = getenvInt("WEB_GROUNDING_DAILY_CAP", 450); err != nil {
+		problems = append(problems, err.Error())
+	}
+	if cfg.CanaryPercent, err = getenvInt("CANARY_PERCENT", 0); err != nil {
+		problems = append(problems, err.Error())
+	}
+
+	// Gemini pricing → the per-model table (defaults: gemini-2.5-flash-lite $0.10/$0.40
+	// per 1M; cached priced as input, a conservative over-count). Lets the ceiling and
+	// request_log price Gemini calls at Gemini rates.
+	var gIn, gOut float64
+	if gIn, err = getenvFloat("GEMINI_PRICE_INPUT_PER_M", 0.10); err != nil {
+		problems = append(problems, err.Error())
+	}
+	if gOut, err = getenvFloat("GEMINI_PRICE_OUTPUT_PER_M", 0.40); err != nil {
+		problems = append(problems, err.Error())
+	}
+	cfg.Prices[cfg.GeminiModel] = ModelPrice{InputPerM: gIn, CachedPerM: gIn, OutputPerM: gOut}
+	// Reasoning model price (defaults to the final-voice grok rates — grok-4.3 ≈ 4.20),
+	// so the reasoning route reserves/bills at its own price instead of falling back.
+	var rIn, rOut float64
+	if rIn, err = getenvFloat("REASONING_PRICE_INPUT_PER_M", cfg.PriceInputPerM); err != nil {
+		problems = append(problems, err.Error())
+	}
+	if rOut, err = getenvFloat("REASONING_PRICE_OUTPUT_PER_M", cfg.PriceOutputPerM); err != nil {
+		problems = append(problems, err.Error())
+	}
+	cfg.Prices[cfg.ReasoningModel] = ModelPrice{InputPerM: rIn, CachedPerM: cfg.PriceCachedPerM, OutputPerM: rOut}
+
+	// Fail-fast on broken cascade wiring (§5/F-FUNC-9), at EVERY start (not just
+	// check-config): a layer that needs Gemini but has no key would silently never
+	// fire. Better to refuse to start than to quietly run degraded.
+	needsGemini := cfg.TrivialOffloadEnabled || cfg.RouterClassifierEnabled ||
+		(cfg.WebEnabled && cfg.WebProvider == webProviderGeminiGrounding)
+	if needsGemini && cfg.GeminiAPIKey == "" {
+		problems = append(problems, "GEMINI_API_KEY is required when TRIVIAL_OFFLOAD_ENABLED, ROUTER_CLASSIFIER_ENABLED, or WEB_ENABLED with gemini_grounding is set")
+	}
+	if cfg.RouterClassifierEnabled && !cfg.RouterEnabled {
+		problems = append(problems, "ROUTER_CLASSIFIER_ENABLED requires ROUTER_ENABLED")
+	}
+	if cfg.WebEnabled && cfg.WebProvider != webProviderGrokWebSearch && cfg.WebProvider != webProviderGeminiGrounding {
+		problems = append(problems, fmt.Sprintf("WEB_PROVIDER must be %q or %q, got %q",
+			webProviderGrokWebSearch, webProviderGeminiGrounding, cfg.WebProvider))
+	}
+	if cfg.ReasoningEnabled && cfg.ReasoningModel == "" {
+		problems = append(problems, "REASONING_MODEL is required when REASONING_ENABLED is set")
+	}
+	switch cfg.GrokReasoningEffort {
+	case "", "none", "low", "medium", "high":
+	default:
+		problems = append(problems, fmt.Sprintf(
+			"GROK_REASONING_EFFORT must be one of none/low/medium/high (or empty), got %q", cfg.GrokReasoningEffort))
+	}
+	switch cfg.ReasoningEffort {
+	case "none", "low", "medium", "high":
+	default:
+		problems = append(problems, fmt.Sprintf(
+			"REASONING_EFFORT must be one of none/low/medium/high, got %q", cfg.ReasoningEffort))
+	}
 
 	if len(problems) > 0 {
 		return nil, fmt.Errorf("invalid configuration:\n - %s", strings.Join(problems, "\n - "))
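The per-model table built in this hunk is meant to be read through priceFor, which falls back to the default model for an unknown one. pricing.go is not part of this excerpt, so the following is only a sketch under stated assumptions: the three-argument `priceFor` signature and the `costUSD` helper are hypothetical, and billing cached tokens as a subset of prompt tokens at the cached rate is an assumption about the usage accounting, not something the diff confirms.

```go
package main

// ModelPrice mirrors the three USD-per-1M-token rates used in the hunk above.
type ModelPrice struct {
	InputPerM  float64
	CachedPerM float64
	OutputPerM float64
}

// priceFor returns the entry for model, or the default model's entry when
// the model is not in the table. (Hypothetical signature for illustration.)
func priceFor(prices map[string]ModelPrice, defaultModel, model string) ModelPrice {
	if p, ok := prices[model]; ok {
		return p
	}
	return prices[defaultModel]
}

// costUSD prices one call from its token usage.
// Assumption: cachedTok is a subset of promptTok and is billed at the cached
// rate instead of the input rate.
func costUSD(p ModelPrice, promptTok, cachedTok, completionTok int) float64 {
	fresh := promptTok - cachedTok
	if fresh < 0 {
		fresh = 0
	}
	return (float64(fresh)*p.InputPerM +
		float64(cachedTok)*p.CachedPerM +
		float64(completionTok)*p.OutputPerM) / 1e6
}
```

Keying the table by model name means a call served by the Gemini or reasoning model is booked at that model's own entry, and only a model missing from the table is charged at the default rates.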
@@ -223,6 +442,14 @@ func LoadConfig() (*Config, error) {
 	return cfg, nil
 }
 
+// needsGemini reports whether any enabled layer requires the Gemini backend — the
+// cheap trivial route, the Layer-1 classifier, or Gemini-native web grounding. Drives
+// both the fail-fast key check and whether the client is built at all.
+func (c *Config) needsGemini() bool {
+	return c.TrivialOffloadEnabled || c.RouterClassifierEnabled ||
+		(c.WebEnabled && c.WebProvider == webProviderGeminiGrounding)
+}
+
 // Summary returns a human-readable, SECRET-REDACTED dump for the startup log.
 func (c *Config) Summary() string {
 	servers := make([]string, 0, len(c.AllowedServers))
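The fail-fast checks in LoadConfig collect every problem before returning, so one failed start reports the whole list. A reduced sketch of that pattern follows. The `flags` struct is a hypothetical subset of Config, and the literal "gemini_grounding" is assumed to be the value of webProviderGeminiGrounding, whose definition is outside this excerpt.

```go
package main

import (
	"fmt"
	"strings"
)

// flags is a hypothetical subset of Config, just enough to show the checks.
type flags struct {
	Router         bool
	Classifier     bool
	TrivialOffload bool
	Web            bool
	WebProvider    string
	GeminiKey      string
}

// needsGemini mirrors the predicate above on the reduced struct.
func (f flags) needsGemini() bool {
	return f.TrivialOffload || f.Classifier ||
		(f.Web && f.WebProvider == "gemini_grounding") // assumed constant value
}

// problems returns every wiring problem instead of stopping at the first.
func problems(f flags) []string {
	var out []string
	if f.needsGemini() && f.GeminiKey == "" {
		out = append(out, "GEMINI_API_KEY is required by an enabled Gemini layer")
	}
	if f.Classifier && !f.Router {
		out = append(out, "ROUTER_CLASSIFIER_ENABLED requires ROUTER_ENABLED")
	}
	return out
}

// validate folds the list into one error, in the same shape LoadConfig uses.
func validate(f flags) error {
	if p := problems(f); len(p) > 0 {
		return fmt.Errorf("invalid configuration:\n - %s", strings.Join(p, "\n - "))
	}
	return nil
}
```

Enabling only the classifier, for instance, reports both the missing key and the missing router flag in a single error.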
@@ -255,6 +482,12 @@ func (c *Config) Summary() string {
 		" HS_TOKEN = " + redact(c.HSToken),
 		" XAI_BASE_URL = " + c.XAIBaseURL,
 		" XAI_MODEL = " + c.XAIModel,
+		" GROK_REASONING_EFFORT = " + func() string {
+			if c.GrokReasoningEffort == "" {
+				return "(unset — not sent; provider default)"
+			}
+			return c.GrokReasoningEffort
+		}(),
 		" XAI_API_KEY = " + redact(c.XAIAPIKey),
 		fmt.Sprintf(" XAI_TEMPERATURE = %g", c.XAITemp),
 		fmt.Sprintf(" MAX_OUTPUT_TOKENS = %d", c.MaxOutTok),
@@ -268,5 +501,14 @@ func (c *Config) Summary() string {
 		" SYSTEM_PROMPT_PATH = " + c.SystemPromptPath,
 		" STATE_DIR = " + c.StateDir,
 		" AI_BOT_DATABASE_URL= " + redact(c.DatabaseURL),
+		fmt.Sprintf(" REQUEST_BUDGET = %s", c.RequestBudget),
+		fmt.Sprintf(" GROK_PROMPT_CACHE = %t", c.GrokPromptCache),
+		fmt.Sprintf(" TELEMETRY_ENABLED = %t (store_text=%t, retention=%s)",
+			c.TelemetryEnabled, c.TelemetryStoreText, c.TelemetryRetention),
+		fmt.Sprintf(" CASCADE: router=%t classifier=%t trivial=%t web=%t(%s, cap=%d) reason=%t(%s)",
+			c.RouterEnabled, c.RouterClassifierEnabled, c.TrivialOffloadEnabled,
+			c.WebEnabled, c.WebProvider, c.WebGroundingDailyCap, c.ReasoningEnabled, c.ReasoningEffort),
+		" GEMINI_MODEL = " + c.GeminiModel,
+		" GEMINI_API_KEY = " + redact(c.GeminiAPIKey),
 	}, "\n")
 }
98
apps/ai-bot/config_test.go
Normal file
@@ -0,0 +1,98 @@
+package main
+
+import (
+	"strings"
+	"testing"
+)
+
+// setBaseEnv sets the minimal valid environment (all cascade flags off) so each test
+// can toggle one combination and assert the fail-fast validation (F-FUNC-9).
+func setBaseEnv(t *testing.T) {
+	t.Helper()
+	t.Setenv("HOMESERVER_URL", "http://hs")
+	t.Setenv("BOT_MXID", "@ai:vojo.chat")
+	t.Setenv("AS_TOKEN", "as")
+	t.Setenv("HS_TOKEN", "hs")
+	t.Setenv("XAI_API_KEY", "xai")
+	t.Setenv("AI_BOT_DATABASE_URL", "postgres://x")
+	t.Setenv("ALLOWED_SERVERS", "vojo.chat")
+	// Force a clean baseline so the host environment can't leak in.
+	for _, k := range []string{
+		"GEMINI_API_KEY", "GEMINI_API_KEY_FILE", "ROUTER_ENABLED", "ROUTER_CLASSIFIER_ENABLED",
+		"TRIVIAL_OFFLOAD_ENABLED", "WEB_ENABLED", "REASONING_ENABLED", "WEB_PROVIDER", "REASONING_MODEL",
+	} {
+		t.Setenv(k, "")
+	}
+}
+
+func TestConfigBaseValid(t *testing.T) {
+	setBaseEnv(t)
+	if _, err := LoadConfig(); err != nil {
+		t.Fatalf("base config should be valid: %v", err)
+	}
+}
+
+func TestConfigAllCascadeFlagsDefaultOff(t *testing.T) {
+	setBaseEnv(t)
+	cfg, err := LoadConfig()
+	if err != nil {
+		t.Fatalf("%v", err)
+	}
+	if cfg.RouterEnabled || cfg.RouterClassifierEnabled || cfg.TrivialOffloadEnabled ||
+		cfg.WebEnabled || cfg.ReasoningEnabled || cfg.TelemetryEnabled || cfg.GrokPromptCache {
+		t.Fatal("every cascade/telemetry flag must default off (cascade-off == today)")
+	}
+	if cfg.WebProvider != webProviderGrokWebSearch {
+		t.Fatalf("default WEB_PROVIDER = %q, want grok_web_search", cfg.WebProvider)
+	}
+}
+
+func TestConfigTrivialNeedsGeminiKey(t *testing.T) {
+	setBaseEnv(t)
+	t.Setenv("TRIVIAL_OFFLOAD_ENABLED", "true")
+	if _, err := LoadConfig(); err == nil || !strings.Contains(err.Error(), "GEMINI_API_KEY") {
+		t.Fatalf("want GEMINI_API_KEY error, got %v", err)
+	}
+	t.Setenv("GEMINI_API_KEY", "gk")
+	if _, err := LoadConfig(); err != nil {
+		t.Fatalf("with key it should be valid: %v", err)
+	}
+}
+
+func TestConfigClassifierNeedsRouter(t *testing.T) {
+	setBaseEnv(t)
+	t.Setenv("GEMINI_API_KEY", "gk")
+	t.Setenv("ROUTER_CLASSIFIER_ENABLED", "true") // without ROUTER_ENABLED
+	if _, err := LoadConfig(); err == nil || !strings.Contains(err.Error(), "ROUTER_ENABLED") {
+		t.Fatalf("want ROUTER_ENABLED error, got %v", err)
+	}
+}
+
+func TestConfigBadWebProvider(t *testing.T) {
+	setBaseEnv(t)
+	t.Setenv("WEB_ENABLED", "true")
+	t.Setenv("WEB_PROVIDER", "bing")
+	if _, err := LoadConfig(); err == nil || !strings.Contains(err.Error(), "WEB_PROVIDER") {
+		t.Fatalf("want WEB_PROVIDER error, got %v", err)
+	}
+}
+
+// The default web provider (grok_web_search) uses the existing xAI key, so WEB_ENABLED
+// alone must NOT demand a Gemini key.
+func TestConfigWebGrokNeedsNoGeminiKey(t *testing.T) {
+	setBaseEnv(t)
+	t.Setenv("WEB_ENABLED", "true")
+	if _, err := LoadConfig(); err != nil {
+		t.Fatalf("web+grok_web_search should not need a Gemini key: %v", err)
+	}
+}
+
+// gemini_grounding DOES need a Gemini key.
+func TestConfigWebGeminiGroundingNeedsKey(t *testing.T) {
+	setBaseEnv(t)
+	t.Setenv("WEB_ENABLED", "true")
+	t.Setenv("WEB_PROVIDER", webProviderGeminiGrounding)
+	if _, err := LoadConfig(); err == nil || !strings.Contains(err.Error(), "GEMINI_API_KEY") {
+		t.Fatalf("want GEMINI_API_KEY error, got %v", err)
+	}
+}
@@ -9,20 +9,20 @@ type bufferedMsg struct {
 	isBot bool
 }
 
-// buildContext assembles the xAI message list under the owner's minimisation
-// rule ("trigger + bot replies only", §6/F8):
+// buildContext assembles the provider-neutral message list under the owner's
+// minimisation rule ("trigger + bot replies only", §6/F8):
 //
 //   - GROUP rooms: send ONLY the bot's own prior replies (assistant turns) plus
 //     the single triggering message (user turn). Other participants' messages and
-//     display names never reach xAI — the third-party-consent mitigation.
+//     display names never reach the model — the third-party-consent mitigation.
 //   - 1:1 rooms: there are no third parties, so the peer's recent turns are
 //     included too for coherence. Still no display names (pseudo "user").
 //
 // `history` is the recent room window EXCLUDING the trigger; `triggerBody` is the
 // message that addressed the bot. Bodies are stripped of reply-fallback quotes so
 // quoted third-party text doesn't leak.
-func buildContext(system string, history []bufferedMsg, isDM bool, triggerBody string, maxEvents, maxTokens int) []xaiMessage {
-	msgs := []xaiMessage{{Role: "system", Content: system}}
+func buildContext(system string, history []bufferedMsg, isDM bool, triggerBody string, maxEvents, maxTokens int) []Message {
+	msgs := []Message{{Role: "system", Content: system}}
 
 	// Keep at most the last maxEvents history items.
 	if len(history) > maxEvents {
@@ -34,16 +34,16 @@ func buildContext(system string, history []bufferedMsg, isDM bool, triggerBody s
 			continue
 		}
 		if h.isBot {
-			msgs = append(msgs, xaiMessage{Role: "assistant", Content: body})
+			msgs = append(msgs, Message{Role: "assistant", Content: body})
 			continue
 		}
 		if isDM {
-			msgs = append(msgs, xaiMessage{Role: "user", Content: body})
+			msgs = append(msgs, Message{Role: "user", Content: body})
 		}
 		// group + non-bot history → dropped (privacy minimisation)
 	}
 
-	msgs = append(msgs, xaiMessage{Role: "user", Content: stripReplyFallback(triggerBody)})
+	msgs = append(msgs, Message{Role: "user", Content: stripReplyFallback(triggerBody)})
 	return truncateToTokens(msgs, maxTokens)
 }
 
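The minimisation rule that buildContext documents can be exercised on its own. The sketch below is a simplified stand-in, not the repo function: `turn` is a hypothetical reduction of bufferedMsg, and reply-fallback stripping, the maxEvents window and token truncation are deliberately left out.

```go
package main

// Message is the provider-neutral chat turn, as in the diff above.
type Message struct {
	Role    string
	Content string
}

// turn is a simplified, hypothetical stand-in for bufferedMsg.
type turn struct {
	body  string
	isBot bool
}

// minimise always keeps the bot's own replies, keeps peer turns only in 1:1
// rooms, and appends the trigger as the final user turn.
func minimise(system string, history []turn, isDM bool, trigger string) []Message {
	msgs := []Message{{Role: "system", Content: system}}
	for _, h := range history {
		switch {
		case h.isBot:
			msgs = append(msgs, Message{Role: "assistant", Content: h.body})
		case isDM:
			msgs = append(msgs, Message{Role: "user", Content: h.body})
		}
		// Group room + non-bot turn: dropped, so third-party text never
		// reaches the model.
	}
	return append(msgs, Message{Role: "user", Content: trigger})
}
```

For a history of one third-party message and one bot reply, the group variant produces three messages (system, assistant, trigger) and the 1:1 variant produces four.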
@@ -57,7 +57,7 @@ func estimateTokens(s string) int {
 // truncateToTokens drops the oldest non-system, non-final messages until the
 // estimate fits maxTokens. The system prompt (index 0) and the final user
 // trigger are always preserved.
-func truncateToTokens(msgs []xaiMessage, maxTokens int) []xaiMessage {
+func truncateToTokens(msgs []Message, maxTokens int) []Message {
 	total := 0
 	for _, m := range msgs {
 		total += estimateTokens(m.Content)
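Only the first lines of truncateToTokens are visible in this hunk, so the following is a sketch of the behaviour its comment describes, not the repo implementation. The estimator `estimate` (roughly four bytes per token) is a hypothetical placeholder, since the body of estimateTokens is not shown.

```go
package main

// Message is the provider-neutral chat turn, as in the diff above.
type Message struct {
	Role    string
	Content string
}

// estimate is a hypothetical token estimator: about four bytes per token,
// rounded up. The repo's estimateTokens may use a different heuristic.
func estimate(s string) int { return (len(s) + 3) / 4 }

// truncate drops the oldest middle messages until the estimate fits
// maxTokens. msgs[0] (system prompt) and the last message (trigger) are
// always kept, even if the two alone exceed the budget.
func truncate(msgs []Message, maxTokens int) []Message {
	total := 0
	for _, m := range msgs {
		total += estimate(m.Content)
	}
	for total > maxTokens && len(msgs) > 2 {
		total -= estimate(msgs[1].Content)
		// The capped slice msgs[:1:1] forces append to allocate, so the
		// caller's backing array is left untouched.
		msgs = append(msgs[:1:1], msgs[2:]...)
	}
	return msgs
}
```

Dropping from index 1 removes the oldest history first, which keeps the most recent turns closest to the trigger.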
200
apps/ai-bot/httpllm.go
Normal file
@@ -0,0 +1,200 @@
+package main
+
+import (
+	"bytes"
+	"context"
+	"encoding/json"
+	"fmt"
+	"io"
+	"log/slog"
+	"math/rand"
+	"net/http"
+	"time"
+)
+
+// httpllm.go is the shared OpenAI-compatible Chat Completions transport: one
+// HTTP+retry implementation reused by every provider adapter. Grok and Gemini both
+// expose this wire format, so the retry/backoff classification (429/5xx/network =
+// retryable, other 4xx = terminal) lives once here, parameterised by base/key/
+// headers, instead of being copied per provider.
+
+// openAIClient performs OpenAI-compatible /chat/completions calls with retry.
+type openAIClient struct {
+	name    string // provider label for logs/errors ("xai", "gemini")
+	base    string
+	key     string
+	http    *http.Client
+	maxTry  int
+	headers map[string]string // extra static headers (provider-specific), may be nil
+	log     *slog.Logger
+}
+
+func newOpenAIClient(name, base, key string, headers map[string]string, logger *slog.Logger) *openAIClient {
+	return &openAIClient{
+		name:    name,
+		base:    base,
+		key:     key,
+		http:    &http.Client{},
+		maxTry:  3,
+		headers: headers,
+		log:     logger,
+	}
+}
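The file comment above states the retry classification in one line: 429, 5xx and network errors are retryable, every other 4xx is terminal. The retry loop itself lies outside this excerpt, so the following is only a sketch of that classification. The helper names `retryable` and `backoff` are hypothetical, and the doubling delay without jitter is an assumption (the file imports math/rand, so the real transport may well add jitter).

```go
package main

import (
	"net/http"
	"time"
)

// retryable reports whether an HTTP status is worth another attempt:
// 429 and 5xx are treated as transient, every other status is terminal.
// Network errors (no status at all) would also be retried by the caller;
// they are outside this helper.
func retryable(status int) bool {
	return status == http.StatusTooManyRequests || status >= 500
}

// backoff returns the delay before the given zero-based retry attempt,
// doubling each time. Assumption: plain exponential growth, no jitter.
func backoff(attempt int, base time.Duration) time.Duration {
	return base << uint(attempt)
}
```

With maxTry set to 3 as in newOpenAIClient, a caller using this scheme would wait `base` and then `2*base` between its three attempts.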
+
+// --- OpenAI-compatible wire types -------------------------------------------------
+
+type openAIMessage struct {
+	Role    string `json:"role"`
+	Content string `json:"content"`
+}
+
+// openAITool is the wire shape of a model tool (e.g. web search). Only serialized
+// when the request carries tools, so a plain completion's body is unchanged.
+type openAITool struct {
+	Type string `json:"type"`
+}
+
+type openAIRequest struct {
+	Model       string          `json:"model"`
+	Messages    []openAIMessage `json:"messages"`
+	MaxTokens   int             `json:"max_tokens"`
+	Temperature float64         `json:"temperature"`
+	Stream      bool            `json:"stream"`
+	// Optional; omitempty keeps the grok_direct body byte-identical to before.
+	Tools           []openAITool `json:"tools,omitempty"`
+	ReasoningEffort string       `json:"reasoning_effort,omitempty"`
+	// SearchParameters drives xAI Live Search on chat/completions (the web route's
+	// grok_web_search provider). nil for every non-web call, so it serializes away.
+	SearchParameters any `json:"search_parameters,omitempty"`
+}
|
||||||
|
|
||||||
|
type openAIUsage struct {
|
||||||
|
PromptTokens int `json:"prompt_tokens"`
|
||||||
|
CompletionTokens int `json:"completion_tokens"`
|
||||||
|
PromptTokensDetails struct {
|
||||||
|
CachedTokens int `json:"cached_tokens"`
|
||||||
|
} `json:"prompt_tokens_details"`
|
||||||
|
}
|
||||||
|
|
||||||
|
type openAIResponse struct {
|
||||||
|
ID string `json:"id"`
|
||||||
|
Choices []struct {
|
||||||
|
Message struct {
|
||||||
|
Content string `json:"content"`
|
||||||
|
} `json:"message"`
|
||||||
|
FinishReason string `json:"finish_reason"`
|
||||||
|
} `json:"choices"`
|
||||||
|
Usage openAIUsage `json:"usage"`
|
||||||
|
// Citations is the source list xAI Live Search returns by default (absent on a
|
||||||
|
// non-web call → nil).
|
||||||
|
Citations []string `json:"citations"`
|
||||||
|
}
|
||||||
|
|
||||||
|
func (r *openAIResponse) Text() string {
|
||||||
|
if len(r.Choices) == 0 {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
return r.Choices[0].Message.Content
|
||||||
|
}
|
||||||
|
|
||||||
|
// complete calls Chat Completions with retry on transient failures (429 / 5xx /
|
||||||
|
// network timeout, exponential backoff + jitter). Non-retryable 4xx fail
|
||||||
|
// immediately. On exhaustion the caller refunds the reserved request and notifies
|
||||||
|
// the user, so a transient failure is never silently swallowed (F6). reqHeaders are
|
||||||
|
// per-request headers (e.g. x-grok-conv-id) merged on top of the static ones; nil is
|
||||||
|
// fine.
|
||||||
|
func (c *openAIClient) complete(ctx context.Context, reqBody openAIRequest, reqHeaders map[string]string) (*openAIResponse, error) {
|
||||||
|
payload, err := json.Marshal(reqBody)
|
||||||
|
if err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
|
||||||
|
var lastErr error
|
||||||
|
for attempt := 0; attempt < c.maxTry; attempt++ {
|
||||||
|
if attempt > 0 {
|
||||||
|
// 0.5s, 1s, 2s … capped at 8s, plus up to 250ms jitter.
|
||||||
|
backoff := time.Duration(500<<uint(attempt-1)) * time.Millisecond
|
||||||
|
if backoff > 8*time.Second {
|
||||||
|
backoff = 8 * time.Second
|
||||||
|
}
|
||||||
|
backoff += time.Duration(rand.Intn(250)) * time.Millisecond
|
||||||
|
select {
|
||||||
|
case <-ctx.Done():
|
||||||
|
return nil, ctx.Err()
|
||||||
|
case <-time.After(backoff):
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
resp, retryable, err := c.attempt(ctx, payload, reqHeaders)
|
||||||
|
if err == nil {
|
||||||
|
return resp, nil
|
||||||
|
}
|
||||||
|
lastErr = err
|
||||||
|
if ctx.Err() != nil {
|
||||||
|
return nil, ctx.Err()
|
||||||
|
}
|
||||||
|
if !retryable {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
if c.log != nil {
|
||||||
|
c.log.Warn(c.name+" attempt failed, will retry", "attempt", attempt+1, "max", c.maxTry, "err", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return nil, fmt.Errorf("%s: exhausted %d attempts: %w", c.name, c.maxTry, lastErr)
|
||||||
|
}
|
||||||
|
|
||||||
|
// attempt performs one HTTP call. It returns retryable=true for 429/5xx and
|
||||||
|
// network errors, false for other non-2xx (terminal 4xx). The per-attempt deadline
|
||||||
|
// bounds a single hung connection; the overall per-request deadline (set by the
|
||||||
|
// caller via ctx) bounds the whole retry loop so a cascade can't accrete minutes.
|
||||||
|
func (c *openAIClient) attempt(ctx context.Context, payload []byte, reqHeaders map[string]string) (*openAIResponse, bool, error) {
|
||||||
|
attemptCtx, cancel := context.WithTimeout(ctx, 60*time.Second)
|
||||||
|
defer cancel()
|
||||||
|
|
||||||
|
req, err := http.NewRequestWithContext(attemptCtx, http.MethodPost, c.base+"/chat/completions", bytes.NewReader(payload))
|
||||||
|
if err != nil {
|
||||||
|
return nil, false, err
|
||||||
|
}
|
||||||
|
req.Header.Set("Content-Type", "application/json")
|
||||||
|
req.Header.Set("Authorization", "Bearer "+c.key)
|
||||||
|
for k, v := range c.headers {
|
||||||
|
req.Header.Set(k, v)
|
||||||
|
}
|
||||||
|
for k, v := range reqHeaders {
|
||||||
|
req.Header.Set(k, v)
|
||||||
|
}
|
||||||
|
|
||||||
|
resp, err := c.http.Do(req)
|
||||||
|
if err != nil {
|
||||||
|
// Network error / timeout — retryable (unless the parent ctx is done).
|
||||||
|
return nil, ctx.Err() == nil, err
|
||||||
|
}
|
||||||
|
defer resp.Body.Close()
|
||||||
|
data, _ := io.ReadAll(resp.Body)
|
||||||
|
|
||||||
|
if resp.StatusCode == http.StatusTooManyRequests || resp.StatusCode >= 500 {
|
||||||
|
return nil, true, fmt.Errorf("%s http %d: %s", c.name, resp.StatusCode, snippet(data))
|
||||||
|
}
|
||||||
|
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
|
||||||
|
return nil, false, fmt.Errorf("%s http %d: %s", c.name, resp.StatusCode, snippet(data))
|
||||||
|
}
|
||||||
|
|
||||||
|
var out openAIResponse
|
||||||
|
if err := json.Unmarshal(data, &out); err != nil {
|
||||||
|
return nil, false, fmt.Errorf("%s decode: %w", c.name, err)
|
||||||
|
}
|
||||||
|
// A 2xx is a billed call even when the model returns empty content (content
|
||||||
|
// filter, finish_reason=length with no text, or no choices). Return it as a
|
||||||
|
// success so the caller books the real cost via the ledger instead of refunding
|
||||||
|
// the slot and losing the spend — which would let empty replies bypass BOTH the
|
||||||
|
// per-user cap and the global ceiling. The caller just won't send an empty body.
|
||||||
|
return &out, false, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func snippet(b []byte) string {
|
||||||
|
const max = 300
|
||||||
|
if len(b) > max {
|
||||||
|
return string(b[:max]) + "…"
|
||||||
|
}
|
||||||
|
return string(b)
|
||||||
|
}
|
||||||
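The retry policy in httpllm.go can be read as two small pure functions: a delay schedule and a status classifier. The sketch below restates them in isolation, assuming the same constants as the diff (500ms base, 8s cap) and leaving out the random jitter so the output is deterministic. `backoffFor` and `retryableStatus` are illustrative names, not functions from the commit.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// backoffFor mirrors the delay schedule in complete(): no wait before the
// first try, then 0.5s doubling per retry, capped at 8s. Jitter is omitted.
func backoffFor(attempt int) time.Duration {
	if attempt <= 0 {
		return 0
	}
	if attempt > 5 {
		// 500ms<<4 already equals the 8s cap, so later attempts stay there.
		return 8 * time.Second
	}
	return time.Duration(500<<uint(attempt-1)) * time.Millisecond
}

// retryableStatus mirrors the classification in attempt(): 429 and 5xx are
// transient; any other status is either a success or a terminal error.
func retryableStatus(code int) bool {
	return code == http.StatusTooManyRequests || code >= 500
}

func main() {
	for attempt := 0; attempt < 7; attempt++ {
		fmt.Printf("attempt %d waits %v\n", attempt, backoffFor(attempt))
	}
	for _, code := range []int{200, 400, 429, 503} {
		fmt.Printf("status %d retryable=%v\n", code, retryableStatus(code))
	}
}
```

With `maxTry` set to 3, as in `newOpenAIClient`, only the 0.5s and 1s steps are ever reached; the cap matters only if the attempt count is raised.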
65
apps/ai-bot/llm.go
Normal file
|
|
@@ -0,0 +1,65 @@
|
||||||
|
package main
|
||||||
|
|
||||||
|
import "context"
|
||||||
|
|
||||||
|
// llm.go is the provider-neutral seam between the bot's business logic and the
|
||||||
|
// concrete model backends. Nothing here names a vendor: the bot composes its
|
||||||
|
// context, prices usage, and books spend against these types, and a thin adapter
|
||||||
|
// (provider_xai.go, provider_gemini.go) maps them to/from each backend's wire
|
||||||
|
// format. This is what lets a second model (Gemini) slot in behind a flag without
|
||||||
|
// the business logic learning a new shape.
|
||||||
|
|
||||||
|
// Message is one provider-neutral chat turn.
|
||||||
|
type Message struct {
|
||||||
|
Role string // "system" | "user" | "assistant"
|
||||||
|
Content string
|
||||||
|
}
|
||||||
|
|
||||||
|
// Usage is the provider-neutral token accounting returned with a completion. It
|
||||||
|
// drives billing (computeUSD) — the counts are the API's own, authoritative even
|
||||||
|
// if our price constants drift.
|
||||||
|
type Usage struct {
|
||||||
|
PromptTokens int
|
||||||
|
CachedTokens int // subset of PromptTokens served from the provider's prompt cache
|
||||||
|
CompletionTokens int
|
||||||
|
}
|
||||||
|
|
||||||
|
// Tool is a provider-neutral tool the model may invoke (e.g. web search). Empty
|
||||||
|
// today; the web-freshness layer (Phase 3) populates it. Carried here so the
|
||||||
|
// request type is stable across phases.
|
||||||
|
type Tool struct {
|
||||||
|
// Type names the tool, e.g. "web_search". Adapters translate it to each
|
||||||
|
// backend's tool wire shape.
|
||||||
|
Type string
|
||||||
|
}
|
||||||
|
|
||||||
|
// LLMRequest is a provider-neutral completion request. New optional fields (Tools,
|
||||||
|
// ReasoningEffort) serialize away when empty, so a plain grok_direct call produces
|
||||||
|
// exactly the same wire body it did before this seam existed.
|
||||||
|
type LLMRequest struct {
|
||||||
|
Model string
|
||||||
|
Messages []Message
|
||||||
|
MaxTokens int
|
||||||
|
Temperature float64
|
||||||
|
Tools []Tool // optional; populated by the web layer
|
||||||
|
ReasoningEffort string // optional; "" = default, e.g. "low"|"high" for the reasoning route
|
||||||
|
// ConvID is an optional prompt-cache routing hint. Adapters that support it (xAI's
|
||||||
|
// x-grok-conv-id) pin a conversation to one backend to raise cache hit rate; "" =
|
||||||
|
// don't send it. It is a header, not part of the request body, so it never changes
|
||||||
|
// the wire body and an unset value is a no-op.
|
||||||
|
ConvID string
|
||||||
|
}
|
||||||
|
|
||||||
|
// LLMResponse is a provider-neutral completion result.
|
||||||
|
type LLMResponse struct {
|
||||||
|
Text string
|
||||||
|
Usage Usage
|
||||||
|
ProviderRequestID string // the backend's response id, logged for support/debug
|
||||||
|
}
|
||||||
|
|
||||||
|
// LLMClient is any chat-completion backend (Grok, Gemini, …). Implementations are
|
||||||
|
// thin adapters over a wire protocol; the bot depends only on this interface, so
|
||||||
|
// Bot.llm can be swapped or routed without touching business logic.
|
||||||
|
type LLMClient interface {
|
||||||
|
Complete(ctx context.Context, req LLMRequest) (*LLMResponse, error)
|
||||||
|
}
|
||||||
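Because the bot depends only on `LLMClient`, a backend can be replaced by a fake without touching the calling code. The sketch below illustrates that with trimmed local copies of the neutral types; `echoClient`, `answer` and `demo` are hypothetical helpers written for this example and do not appear in the commit.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Local copies of the neutral types, trimmed to the fields this sketch uses.
type Message struct {
	Role    string
	Content string
}

type LLMRequest struct {
	Model    string
	Messages []Message
}

type LLMResponse struct {
	Text string
}

type LLMClient interface {
	Complete(ctx context.Context, req LLMRequest) (*LLMResponse, error)
}

// echoClient is a stand-in backend: it answers with the last user turn,
// prefixed by the model name, and never touches the network.
type echoClient struct{}

func (echoClient) Complete(ctx context.Context, req LLMRequest) (*LLMResponse, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	for i := len(req.Messages) - 1; i >= 0; i-- {
		if req.Messages[i].Role == "user" {
			return &LLMResponse{Text: req.Model + ": " + req.Messages[i].Content}, nil
		}
	}
	return nil, errors.New("echo: no user message")
}

// answer depends only on the interface, as the bot's business logic does.
func answer(ctx context.Context, c LLMClient, question string) (string, error) {
	resp, err := c.Complete(ctx, LLMRequest{
		Model: "fake-model",
		Messages: []Message{
			{Role: "system", Content: "be brief"},
			{Role: "user", Content: question},
		},
	})
	if err != nil {
		return "", err
	}
	return resp.Text, nil
}

// demo wires the fake client into answer with a background context.
func demo(question string) (string, error) {
	return answer(context.Background(), echoClient{}, question)
}

func main() {
	text, err := demo("ping")
	fmt.Println(text, err)
}
```

The same shape is what makes the cascade testable: a route can be exercised against a fake that returns canned text and usage.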
|
|
@@ -11,7 +11,6 @@ import (
|
||||||
"fmt"
|
"fmt"
|
||||||
"os"
|
"os"
|
||||||
"os/signal"
|
"os/signal"
|
||||||
"path/filepath"
|
|
||||||
"syscall"
|
"syscall"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
@@ -91,8 +90,3 @@ func main() {
|
||||||
}
|
}
|
||||||
logger.Info("shut down cleanly")
|
logger.Info("shut down cleanly")
|
||||||
}
|
}
|
||||||
|
|
||||||
// statePath joins a filename under the configured state directory.
|
|
||||||
func (c *Config) statePath(name string) string {
|
|
||||||
return filepath.Join(c.StateDir, name)
|
|
||||||
}
|
|
||||||
|
|
|
||||||
42
apps/ai-bot/pricing.go
Normal file
|
|
@@ -0,0 +1,42 @@
|
||||||
|
package main
|
||||||
|
|
||||||
|
// pricing.go centralises model pricing as a per-model table (the LiteLLM pattern)
|
||||||
|
// instead of three hardcoded Grok fields. The spend ledger prices each call by the
|
||||||
|
// model it actually used, so when a second model (Gemini) starts answering some
|
||||||
|
// routes, its cost books correctly against the same global ceiling.
|
||||||
|
|
||||||
|
// ModelPrice is the per-1M-token USD price for one model, applied to the API's
|
||||||
|
// returned usage so the wallet ceiling tracks real cost even as prices change.
|
||||||
|
type ModelPrice struct {
|
||||||
|
InputPerM float64 // non-cached prompt tokens
|
||||||
|
CachedPerM float64 // prompt tokens served from the provider cache (cheaper)
|
||||||
|
OutputPerM float64 // completion tokens
|
||||||
|
}
|
||||||
|
|
||||||
|
// CostBreakdown is the per-component USD cost of answering one request. A plain
|
||||||
|
// grok_direct call has only Token; a cascade adds Router (the cheap classifier),
|
||||||
|
// Grounding (Gemini Google-search) and/or WebTool (Grok web search) on top. Settle
|
||||||
|
// books each column separately so the ledger and request_log can attribute spend,
|
||||||
|
// and so a half-finished cascade can book only what it actually spent (§8.1).
|
||||||
|
type CostBreakdown struct {
|
||||||
|
Token float64
|
||||||
|
Grounding float64
|
||||||
|
WebTool float64
|
||||||
|
Router float64
|
||||||
|
}
|
||||||
|
|
||||||
|
// Total is the grand total across all components (the number the wallet ceiling and
|
||||||
|
// request_log.total_usd care about). Computed, never stored, so it can't drift.
|
||||||
|
func (c CostBreakdown) Total() float64 {
|
||||||
|
return c.Token + c.Grounding + c.WebTool + c.Router
|
||||||
|
}
|
||||||
|
|
||||||
|
// priceFor returns the configured price for a model. An unknown model falls back to
|
||||||
|
// the default (final-voice) model's price rather than $0 — a $0 price would silently
|
||||||
|
// blind the global ceiling to that call, the one failure mode we never want.
|
||||||
|
func (c *Config) priceFor(model string) ModelPrice {
|
||||||
|
if p, ok := c.Prices[model]; ok {
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
return c.Prices[c.XAIModel]
|
||||||
|
}
|
||||||
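pricing.go defines the price table, but the function that turns usage into dollars (`computeUSD`, called from router.go) is not part of this excerpt. The sketch below shows one plausible reading of the `ModelPrice` field comments: cached prompt tokens at the cached rate, the remaining prompt tokens at the input rate, completion tokens at the output rate. `sketchUSD`, `priceTable`, `demoTable`, the model names and the prices are all assumptions made for illustration.

```go
package main

import "fmt"

type ModelPrice struct {
	InputPerM, CachedPerM, OutputPerM float64 // USD per 1M tokens
}

type Usage struct {
	PromptTokens, CachedTokens, CompletionTokens int
}

// priceTable stands in for Config.Prices plus the default model name.
type priceTable struct {
	prices       map[string]ModelPrice
	defaultModel string
}

// priceFor falls back to the default model's price for an unknown model,
// so an unrecognised name is never priced at $0.
func (t priceTable) priceFor(model string) ModelPrice {
	if p, ok := t.prices[model]; ok {
		return p
	}
	return t.prices[t.defaultModel]
}

// sketchUSD prices one call from the API's reported token counts.
func sketchUSD(p ModelPrice, u Usage) float64 {
	cached := u.CachedTokens
	if cached > u.PromptTokens {
		cached = u.PromptTokens // defensive: cached is a subset of the prompt
	}
	fresh := u.PromptTokens - cached
	return (float64(fresh)*p.InputPerM +
		float64(cached)*p.CachedPerM +
		float64(u.CompletionTokens)*p.OutputPerM) / 1e6
}

// demoTable uses made-up model names and prices.
func demoTable() priceTable {
	return priceTable{
		prices: map[string]ModelPrice{
			"model-a": {InputPerM: 2, CachedPerM: 0.5, OutputPerM: 10},
			"model-b": {InputPerM: 0.1, CachedPerM: 0.025, OutputPerM: 0.4},
		},
		defaultModel: "model-a",
	}
}

func main() {
	table := demoTable()
	u := Usage{PromptTokens: 1000, CachedTokens: 400, CompletionTokens: 200}
	fmt.Printf("known model:   $%.6f\n", sketchUSD(table.priceFor("model-a"), u))
	fmt.Printf("unknown model: $%.6f\n", sketchUSD(table.priceFor("mystery"), u))
}
```

Note that the fallback only protects the ceiling if the default model itself has an entry in the table; a missing default entry would still yield a zero price.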
189
apps/ai-bot/provider_gemini.go
Normal file
|
|
@@ -0,0 +1,189 @@
|
||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"bytes"
|
||||||
|
"context"
|
||||||
|
"encoding/json"
|
||||||
|
"fmt"
|
||||||
|
"io"
|
||||||
|
"log/slog"
|
||||||
|
"net/http"
|
||||||
|
"net/url"
|
||||||
|
"strings"
|
||||||
|
"time"
|
||||||
|
)
|
||||||
|
|
||||||
|
// provider_gemini.go is the Gemini backend. Two faces:
|
||||||
|
//
|
||||||
|
// - geminiClient: a thin LLMClient over the OpenAI-compatible endpoint, used for the
|
||||||
|
// cheap trivial route and the Layer-1 router classifier. Same wire format as Grok,
|
||||||
|
// so it reuses the shared transport (httpllm.go).
|
||||||
|
// - groundedSearch: a SEPARATE call against the NATIVE v1beta generateContent endpoint
|
||||||
|
// with the google_search tool. Grounding does NOT work on the OpenAI-compat layer
|
||||||
|
// (it is silently ignored there, and only on Gemini 3+) — verified against Google's
|
||||||
|
// docs (F-EXT-3) — so the web layer that wants Gemini grounding must use this native
|
||||||
|
// path and VERIFY citations came back, else degrade.
|
||||||
|
type geminiClient struct {
|
||||||
|
http *openAIClient
|
||||||
|
nativeBase string // …/v1beta — derived from the OpenAI-compat base by dropping /openai
|
||||||
|
key string
|
||||||
|
model string
|
||||||
|
httpc *http.Client
|
||||||
|
log *slog.Logger
|
||||||
|
}
|
||||||
|
|
||||||
|
// NewGeminiClient builds the Gemini backend. base is the OpenAI-compatible endpoint
|
||||||
|
// (…/v1beta/openai); the native grounding endpoint is derived from it. Returns the
|
||||||
|
// concrete type (not just LLMClient) because the web layer needs groundedSearch too.
|
||||||
|
func NewGeminiClient(base, key, model string, logger *slog.Logger) *geminiClient {
|
||||||
|
return &geminiClient{
|
||||||
|
http: newOpenAIClient("gemini", base, key, nil, logger),
|
||||||
|
nativeBase: strings.TrimSuffix(base, "/openai"),
|
||||||
|
key: key,
|
||||||
|
model: model,
|
||||||
|
httpc: &http.Client{},
|
||||||
|
log: logger,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Complete answers via the OpenAI-compatible endpoint (trivial route + classifier).
|
||||||
|
func (c *geminiClient) Complete(ctx context.Context, req LLMRequest) (*LLMResponse, error) {
|
||||||
|
msgs := make([]openAIMessage, len(req.Messages))
|
||||||
|
for i, m := range req.Messages {
|
||||||
|
msgs[i] = openAIMessage{Role: m.Role, Content: m.Content}
|
||||||
|
}
|
||||||
|
resp, err := c.http.complete(ctx, openAIRequest{
|
||||||
|
Model: req.Model,
|
||||||
|
Messages: msgs,
|
||||||
|
MaxTokens: req.MaxTokens,
|
||||||
|
Temperature: req.Temperature,
|
||||||
|
Stream: false,
|
||||||
|
}, nil)
|
||||||
|
if err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
return &LLMResponse{
|
||||||
|
Text: resp.Text(),
|
||||||
|
Usage: Usage{
|
||||||
|
PromptTokens: resp.Usage.PromptTokens,
|
||||||
|
CachedTokens: resp.Usage.PromptTokensDetails.CachedTokens,
|
||||||
|
CompletionTokens: resp.Usage.CompletionTokens,
|
||||||
|
},
|
||||||
|
ProviderRequestID: resp.ID,
|
||||||
|
}, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// --- native v1beta grounded search (google_search tool) ---------------------------
|
||||||
|
|
||||||
|
type geminiGroundResult struct {
|
||||||
|
Digest string
|
||||||
|
Citations []string
|
||||||
|
Usage Usage
|
||||||
|
}
|
||||||
|
|
||||||
|
// native generateContent wire types (only the fields we read/write).
|
||||||
|
type geminiNativeRequest struct {
|
||||||
|
Contents []geminiContent `json:"contents"`
|
||||||
|
Tools []geminiTool `json:"tools"`
|
||||||
|
}
|
||||||
|
type geminiContent struct {
|
||||||
|
Role string `json:"role,omitempty"`
|
||||||
|
Parts []geminiPart `json:"parts"`
|
||||||
|
}
|
||||||
|
type geminiPart struct {
|
||||||
|
Text string `json:"text"`
|
||||||
|
}
|
||||||
|
type geminiTool struct {
|
||||||
|
// google_search is the current grounding tool (Gemini 3 / current models). The
|
||||||
|
// empty object enables it.
|
||||||
|
GoogleSearch struct{} `json:"google_search"`
|
||||||
|
}
|
||||||
|
type geminiNativeResponse struct {
|
||||||
|
Candidates []struct {
|
||||||
|
Content struct {
|
||||||
|
Parts []geminiPart `json:"parts"`
|
||||||
|
} `json:"content"`
|
||||||
|
GroundingMetadata struct {
|
||||||
|
GroundingChunks []struct {
|
||||||
|
Web struct {
|
||||||
|
URI string `json:"uri"`
|
||||||
|
Title string `json:"title"`
|
||||||
|
} `json:"web"`
|
||||||
|
} `json:"groundingChunks"`
|
||||||
|
} `json:"groundingMetadata"`
|
||||||
|
} `json:"candidates"`
|
||||||
|
UsageMetadata struct {
|
||||||
|
PromptTokenCount int `json:"promptTokenCount"`
|
||||||
|
CandidatesTokenCount int `json:"candidatesTokenCount"`
|
||||||
|
CachedContentTokenCount int `json:"cachedContentTokenCount"`
|
||||||
|
} `json:"usageMetadata"`
|
||||||
|
}
|
||||||
|
|
||||||
|
// groundedSearch runs one grounded generateContent against the native endpoint and
|
||||||
|
// returns the model's grounded answer plus the source URLs. It REQUIRES citations:
|
||||||
|
// if groundingMetadata has no chunks the request was not actually grounded (the
|
||||||
|
// silent-ignore failure mode, F-EXT-3), so it errors and the caller degrades rather
|
||||||
|
// than passing off ungrounded — possibly stale — text as fresh.
|
||||||
|
func (c *geminiClient) groundedSearch(ctx context.Context, query string) (geminiGroundResult, error) {
|
||||||
|
body, err := json.Marshal(geminiNativeRequest{
|
||||||
|
Contents: []geminiContent{{Role: "user", Parts: []geminiPart{{Text: query}}}},
|
||||||
|
Tools: []geminiTool{{}},
|
||||||
|
})
|
||||||
|
if err != nil {
|
||||||
|
return geminiGroundResult{}, err
|
||||||
|
}
|
||||||
|
|
||||||
|
// Send the API key in the x-goog-api-key header rather than the query string: a
// transport error from http.Client embeds the request URL, so a key in the URL
// could leak into logs through the returned error.
endpoint := fmt.Sprintf("%s/models/%s:generateContent",
c.nativeBase, url.PathEscape(c.model))

reqCtx, cancel := context.WithTimeout(ctx, 15*time.Second) // web/grounding budget (§8.2.2)
defer cancel()
req, err := http.NewRequestWithContext(reqCtx, http.MethodPost, endpoint, bytes.NewReader(body))
if err != nil {
return geminiGroundResult{}, err
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set("x-goog-api-key", c.key)
|
||||||
|
|
||||||
|
resp, err := c.httpc.Do(req)
|
||||||
|
if err != nil {
|
||||||
|
return geminiGroundResult{}, err
|
||||||
|
}
|
||||||
|
defer resp.Body.Close()
|
||||||
|
data, _ := io.ReadAll(resp.Body)
|
||||||
|
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
|
||||||
|
return geminiGroundResult{}, fmt.Errorf("gemini grounding http %d: %s", resp.StatusCode, snippet(data))
|
||||||
|
}
|
||||||
|
|
||||||
|
var out geminiNativeResponse
|
||||||
|
if err := json.Unmarshal(data, &out); err != nil {
|
||||||
|
return geminiGroundResult{}, fmt.Errorf("gemini grounding decode: %w", err)
|
||||||
|
}
|
||||||
|
if len(out.Candidates) == 0 {
|
||||||
|
return geminiGroundResult{}, fmt.Errorf("gemini grounding: no candidates")
|
||||||
|
}
|
||||||
|
|
||||||
|
var sb strings.Builder
|
||||||
|
for _, p := range out.Candidates[0].Content.Parts {
|
||||||
|
sb.WriteString(p.Text)
|
||||||
|
}
|
||||||
|
var citations []string
|
||||||
|
for _, ch := range out.Candidates[0].GroundingMetadata.GroundingChunks {
|
||||||
|
if ch.Web.URI != "" {
|
||||||
|
citations = append(citations, ch.Web.URI)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// The verify-gate: no citations ⇒ not actually grounded ⇒ degrade.
|
||||||
|
if len(citations) == 0 {
|
||||||
|
return geminiGroundResult{}, fmt.Errorf("gemini grounding: no citations (ungrounded — degrade)")
|
||||||
|
}
|
||||||
|
return geminiGroundResult{
|
||||||
|
Digest: strings.TrimSpace(sb.String()),
|
||||||
|
Citations: citations,
|
||||||
|
Usage: Usage{
|
||||||
|
PromptTokens: out.UsageMetadata.PromptTokenCount,
|
||||||
|
CachedTokens: out.UsageMetadata.CachedContentTokenCount,
|
||||||
|
CompletionTokens: out.UsageMetadata.CandidatesTokenCount,
|
||||||
|
},
|
||||||
|
}, nil
|
||||||
|
}
|
||||||
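The verify-gate in `groundedSearch` can be exercised without a network call. The sketch below decodes two hand-written sample bodies shaped like the `geminiNativeResponse` fields used above and applies the same rule: no non-empty `web.uri` means the answer is treated as ungrounded. `citationsOf`, `errUngrounded` and both sample payloads are made up for this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// groundedResponse keeps only the fields needed to read citations.
type groundedResponse struct {
	Candidates []struct {
		GroundingMetadata struct {
			GroundingChunks []struct {
				Web struct {
					URI string `json:"uri"`
				} `json:"web"`
			} `json:"groundingChunks"`
		} `json:"groundingMetadata"`
	} `json:"candidates"`
}

var errUngrounded = errors.New("no citations: treat the answer as ungrounded")

// citationsOf returns the source URLs of the first candidate, or an error
// when the reply carries none (the silent-ignore failure mode).
func citationsOf(raw []byte) ([]string, error) {
	var out groundedResponse
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	if len(out.Candidates) == 0 {
		return nil, errors.New("no candidates")
	}
	var cites []string
	for _, ch := range out.Candidates[0].GroundingMetadata.GroundingChunks {
		if ch.Web.URI != "" {
			cites = append(cites, ch.Web.URI)
		}
	}
	if len(cites) == 0 {
		return nil, errUngrounded
	}
	return cites, nil
}

// Hand-written sample bodies; real responses carry many more fields.
const groundedSample = `{"candidates":[{"content":{"parts":[{"text":"answer"}]},
  "groundingMetadata":{"groundingChunks":[
    {"web":{"uri":"https://example.com/a","title":"A"}},
    {"web":{"uri":""}}]}}]}`

const ungroundedSample = `{"candidates":[{"content":{"parts":[{"text":"answer"}]}}]}`

func main() {
	for _, sample := range []string{groundedSample, ungroundedSample} {
		cites, err := citationsOf([]byte(sample))
		fmt.Println(cites, err)
	}
}
```

Checking for citations rather than for the mere presence of `groundingMetadata` is the stricter test: an empty metadata object passes a presence check but still fails here.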
62
apps/ai-bot/provider_xai.go
Normal file
|
|
@@ -0,0 +1,62 @@
|
||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"log/slog"
|
||||||
|
)
|
||||||
|
|
||||||
|
// provider_xai.go is the thin adapter for xAI's Grok backend. xAI speaks the
|
||||||
|
// OpenAI Chat Completions wire format, so this is a shell over the shared
|
||||||
|
// openAIClient transport (httpllm.go): it only maps the neutral LLMRequest/
|
||||||
|
// LLMResponse to/from the wire types. Any xAI-specific request shaping would live
|
||||||
|
// here; today that is only the optional x-grok-conv-id cache-routing header.
|
||||||
|
type xaiClient struct {
|
||||||
|
http *openAIClient
|
||||||
|
}
|
||||||
|
|
||||||
|
// NewXAIClient builds the Grok backend. Returns the neutral LLMClient so the bot
|
||||||
|
// holds no vendor type.
|
||||||
|
func NewXAIClient(base, key string, logger *slog.Logger) LLMClient {
|
||||||
|
return &xaiClient{http: newOpenAIClient("xai", base, key, nil, logger)}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (c *xaiClient) Complete(ctx context.Context, req LLMRequest) (*LLMResponse, error) {
|
||||||
|
msgs := make([]openAIMessage, len(req.Messages))
|
||||||
|
for i, m := range req.Messages {
|
||||||
|
msgs[i] = openAIMessage{Role: m.Role, Content: m.Content}
|
||||||
|
}
|
||||||
|
var tools []openAITool
|
||||||
|
for _, t := range req.Tools {
|
||||||
|
tools = append(tools, openAITool{Type: t.Type})
|
||||||
|
}
|
||||||
|
|
||||||
|
// x-grok-conv-id pins this conversation to one backend to raise the prompt-cache
|
||||||
|
// hit rate (caching itself is automatic on xAI). Only sent when set, so the
|
||||||
|
// default path adds no header.
|
||||||
|
var headers map[string]string
|
||||||
|
if req.ConvID != "" {
|
||||||
|
headers = map[string]string{"x-grok-conv-id": req.ConvID}
|
||||||
|
}
|
||||||
|
|
||||||
|
resp, err := c.http.complete(ctx, openAIRequest{
|
||||||
|
Model: req.Model,
|
||||||
|
Messages: msgs,
|
||||||
|
MaxTokens: req.MaxTokens,
|
||||||
|
Temperature: req.Temperature,
|
||||||
|
Stream: false,
|
||||||
|
Tools: tools,
|
||||||
|
ReasoningEffort: req.ReasoningEffort,
|
||||||
|
}, headers)
|
||||||
|
if err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
return &LLMResponse{
|
||||||
|
Text: resp.Text(),
|
||||||
|
Usage: Usage{
|
||||||
|
PromptTokens: resp.Usage.PromptTokens,
|
||||||
|
CachedTokens: resp.Usage.PromptTokensDetails.CachedTokens,
|
||||||
|
CompletionTokens: resp.Usage.CompletionTokens,
|
||||||
|
},
|
||||||
|
ProviderRequestID: resp.ID,
|
||||||
|
}, nil
|
||||||
|
}
|
||||||
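The claim that a plain call keeps the same wire body rests on `omitempty`: the adapter leaves `tools` nil and `ReasoningEffort` empty unless the request sets them, so `encoding/json` drops both keys. The sketch below checks that with local copies of the wire structs; the `wire*` type names and `encode` are chosen for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type wireMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type wireTool struct {
	Type string `json:"type"`
}

// wireRequest copies the JSON tags of openAIRequest for the fields shown.
type wireRequest struct {
	Model           string        `json:"model"`
	Messages        []wireMessage `json:"messages"`
	MaxTokens       int           `json:"max_tokens"`
	Temperature     float64       `json:"temperature"`
	Stream          bool          `json:"stream"`
	Tools           []wireTool    `json:"tools,omitempty"`
	ReasoningEffort string        `json:"reasoning_effort,omitempty"`
}

// encode returns the JSON body, or "" if marshalling fails.
func encode(req wireRequest) string {
	b, err := json.Marshal(req)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	plain := wireRequest{
		Model:     "m",
		Messages:  []wireMessage{{Role: "user", Content: "hi"}},
		MaxTokens: 10,
	}
	fmt.Println(encode(plain)) // no "tools" or "reasoning_effort" keys

	withExtras := plain
	withExtras.Tools = []wireTool{{Type: "web_search"}}
	withExtras.ReasoningEffort = "high"
	fmt.Println(encode(withExtras))
}
```

Fields without `omitempty`, such as `temperature` and `stream`, are always emitted, including their zero values.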
180
apps/ai-bot/router.go
Normal file
|
|
@@ -0,0 +1,180 @@
|
||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"encoding/json"
|
||||||
|
"regexp"
|
||||||
|
"strings"
|
||||||
|
)
|
||||||
|
|
||||||
|
// router.go classifies a message into a route. It runs INSIDE respond() — after the
|
||||||
|
// mention/media/foreign/single-flight gates (F-FUNC-7) — so a paid Layer-1 classifier
|
||||||
|
// is never spent on a message today's bot drops for free.
|
||||||
|
//
|
||||||
|
// Two layers, both conservative (doubt → grok_direct, the safe floor that keeps
|
||||||
|
// substantive questions on Grok, §8.6):
|
||||||
|
// - Layer-0: free regex heuristics (RU+EN). Always runs when ROUTER_ENABLED.
|
||||||
|
// - Layer-1: a cheap Gemini JSON classifier, consulted ONLY on Layer-0 grok_direct
|
||||||
|
// when ROUTER_CLASSIFIER_ENABLED. Any failure falls back to the Layer-0 verdict.
|
||||||
|
|
||||||
|
// RouterDecision is the route plus the signals behind it (logged for threshold
|
||||||
|
// calibration). Only Route/Source/Confidence/NeedsWeb drive behaviour today; the rest
|
||||||
|
// are recorded for the offline router-replay eval (§9).
|
||||||
|
type RouterDecision struct {
|
||||||
|
Route string
|
||||||
|
Source string // heuristic | classifier | default | forced | degraded
|
||||||
|
Confidence float64
|
||||||
|
NeedsWeb bool
|
||||||
|
Freshness string
|
||||||
|
ReasoningLevel string
|
||||||
|
Domain string
|
||||||
|
Difficulty string
|
||||||
|
}
|
||||||
|
|
||||||
|
// Heuristic patterns. Kept deliberately tight: a false "trivial" leaks a real question
|
||||||
|
// to the cheap model, so trivial fires only on short, unmistakable greetings/acks or
|
||||||
|
// bare arithmetic. Freshness words route to web (a false web-route only costs a fetch
|
||||||
|
// and degrades cleanly — never a wrong answer).
|
||||||
|
var (
|
||||||
|
greetingRe = regexp.MustCompile(`^(привет(ик)?|здравствуй(те)?|хай|прив|ку|добрый\s+(день|вечер|утро)|спасибо|спс|благодарю|пока|ок(ей)?|угу|ага|hello|hi|hey|yo|thanks|thank\s+you|thx|ty|bye|goodbye|ok|okay|cool|nice)[\s!.,)]*$`)
|
||||||
|
arithmeticRe = regexp.MustCompile(`^[\s(]*\d+(\s*[-+*/×÷]\s*\d+)+[\s)=?]*$`)
|
||||||
|
freshnessRe = regexp.MustCompile(`(новост|сегодня|сейчас|последн|курс\s|погод|котировк|расписани|прогноз|breaking|today|right now|latest|current(ly)?|news|weather|stock price|exchange rate|score)`)
|
||||||
|
)
|
||||||
|
|
||||||
|
// routeLayer0 is the free heuristic. Confidence is a rough self-estimate used only for
|
||||||
|
// logging/threshold tuning, not control flow.
|
||||||
|
func routeLayer0(body string) RouterDecision {
|
||||||
|
s := strings.ToLower(strings.TrimSpace(body))
|
||||||
|
if s == "" {
|
||||||
|
return RouterDecision{Route: routeGrokDirect, Source: "heuristic", Confidence: 0.5}
|
||||||
|
}
|
||||||
|
if freshnessRe.MatchString(s) {
|
||||||
|
return RouterDecision{Route: routeWebThenGrok, Source: "heuristic", Confidence: 0.7, NeedsWeb: true, Freshness: "recent"}
|
||||||
|
}
|
||||||
|
if isTrivial(s) {
|
||||||
|
return RouterDecision{Route: routeTrivial, Source: "heuristic", Confidence: 0.85, Difficulty: "trivial"}
|
||||||
|
}
|
||||||
|
return RouterDecision{Route: routeGrokDirect, Source: "heuristic", Confidence: 0.6}
|
||||||
|
}
|
||||||
|
|
||||||
|
// isTrivial: a short greeting/ack or a bare arithmetic expression, with no sign of a
|
||||||
|
// real question. Length-bounded so "thanks, now explain quantum tunnelling" is NOT
|
||||||
|
// trivial.
|
||||||
|
func isTrivial(s string) bool {
|
||||||
|
if arithmeticRe.MatchString(s) {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
if len(strings.Fields(s)) <= 4 && greetingRe.MatchString(s) {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
|
||||||
|
// classify produces the final RouterDecision for a request. The manual reasoning
|
||||||
|
// trigger is honoured independently of the heuristic router (it's a deliberate user
|
||||||
|
// signal). Layer-1's cost, when it runs, is accumulated into cost.Router.
|
||||||
|
func (b *Bot) classify(ctx context.Context, body string, cost *CostBreakdown) RouterDecision {
|
||||||
|
if b.cfg.ReasoningEnabled && containsTrigger(body, b.cfg.ReasoningTrigger) {
|
||||||
|
return RouterDecision{Route: routeReason, Source: "forced", Confidence: 1, ReasoningLevel: "high"}
|
||||||
|
}
|
||||||
|
if !b.cfg.RouterEnabled {
|
||||||
|
return RouterDecision{Route: routeGrokDirect, Source: "default"}
|
||||||
|
}
|
||||||
|
d := routeLayer0(body)
|
||||||
|
// Layer-1 only refines the uncertain grok_direct verdict, and only if enabled and
|
||||||
|
// the Gemini client exists. Anything else stands on the heuristic.
|
||||||
|
if d.Route != routeGrokDirect || !b.cfg.RouterClassifierEnabled || b.gemini == nil {
|
||||||
|
return d
|
||||||
|
}
|
||||||
|
refined, err := b.routeLayer1(ctx, body, cost)
|
||||||
|
if err != nil {
|
||||||
|
b.log.Warn("layer-1 classifier failed; using heuristic", "err", err)
|
||||||
|
return d // degrade to the heuristic verdict
|
||||||
|
}
|
||||||
|
return refined
|
||||||
|
}
|
||||||
|
|
||||||
|
// classifierConfidenceFloor is the bar a Layer-1 escalation OFF the safe floor
|
||||||
|
// (trivial/web/reason) must clear. Below it, the verdict is treated as doubt and the
|
||||||
|
// request stays on grok_direct — the owner's "substantive stays on Grok" rule (§8.6).
|
||||||
|
// A low-confidence "trivial" is exactly the false-trivial voice leak we must not take.
|
||||||
|
const classifierConfidenceFloor = 0.8
|
||||||
|
|
||||||
|
// classifierPrompt asks Gemini for a strict JSON verdict. Kept terse to bound tokens.
|
||||||
|
const classifierPrompt = `You are a router. Classify the user message into exactly one route and reply with ONLY a JSON object, no prose.
|
||||||
|
Routes: "trivial" (greeting/ack/tiny arithmetic), "web" (needs fresh/current facts: news, prices, weather, "today"), "normal" (everything else).
|
||||||
|
Schema: {"route":"trivial|web|normal","confidence":0.0-1.0,"needs_web":true|false}
|
||||||
|
Message: `
|
||||||
|
|
||||||
|
// routeLayer1 runs the Gemini classifier and parses its JSON. An unparseable
|
||||||
|
// answer is an error so classify() degrades to the heuristic — the cheap model never
|
||||||
|
// gets to silently mis-route by returning garbage.
|
||||||
|
func (b *Bot) routeLayer1(ctx context.Context, body string, cost *CostBreakdown) (RouterDecision, error) {
|
||||||
|
resp, err := b.gemini.Complete(ctx, LLMRequest{
|
||||||
|
Model: b.cfg.GeminiModel,
|
||||||
|
Messages: []Message{{Role: "user", Content: classifierPrompt + body}},
|
||||||
|
MaxTokens: 60,
|
||||||
|
Temperature: 0,
|
||||||
|
})
|
||||||
|
if err != nil {
|
||||||
|
return RouterDecision{}, err
|
||||||
|
}
|
||||||
|
cost.Router += computeUSD(b.cfg.GeminiModel, resp.Usage, b.cfg)
|
||||||
|
|
||||||
|
var parsed struct {
|
||||||
|
Route string `json:"route"`
|
||||||
|
Confidence float64 `json:"confidence"`
|
||||||
|
NeedsWeb bool `json:"needs_web"`
|
||||||
|
}
|
||||||
|
if err := json.Unmarshal([]byte(extractJSON(resp.Text)), &parsed); err != nil {
|
||||||
|
return RouterDecision{}, err
|
||||||
|
}
|
||||||
|
route := normalizeRoute(parsed.Route)
|
||||||
|
// Safe floor: a low-confidence escalation off grok_direct is doubt — keep it on
|
||||||
|
// Grok rather than leak a possibly-substantive question to the cheap model.
|
||||||
|
if route != routeGrokDirect && parsed.Confidence < classifierConfidenceFloor {
|
||||||
|
return RouterDecision{Route: routeGrokDirect, Source: "classifier", Confidence: parsed.Confidence}, nil
|
||||||
|
}
|
||||||
|
return RouterDecision{
|
||||||
|
Route: route,
|
||||||
|
Source: "classifier",
|
||||||
|
Confidence: parsed.Confidence,
|
||||||
|
NeedsWeb: parsed.NeedsWeb || route == routeWebThenGrok,
|
||||||
|
}, nil
|
||||||
|
}

// normalizeRoute maps a classifier label to a route constant, defaulting unknown
// labels to grok_direct: the safe floor, so a confused classifier never escalates.
func normalizeRoute(label string) string {
	switch strings.ToLower(strings.TrimSpace(label)) {
	case "trivial", "trivial_direct":
		return routeTrivial
	case "web", "web_then_grok":
		return routeWebThenGrok
	case "reason", "reason_then_grok":
		return routeReason
	default:
		return routeGrokDirect
	}
}

// extractJSON returns the span from the first '{' to the last '}' of a model reply,
// tolerating prose or code fences around a single JSON object. Returns "" if there is
// no such span (→ a parse error → degrade).
func extractJSON(s string) string {
	i := strings.IndexByte(s, '{')
	j := strings.LastIndexByte(s, '}')
	if i < 0 || j < i {
		return ""
	}
	return s[i : j+1]
}
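
A short sketch of how the extraction and the parse combine may help here. It copies extractJSON as shown in the diff; the decision type and parseDecision are hypothetical names introduced for the example. It illustrates that a reply with surrounding prose still parses, while a reply with no object, or with two objects, yields an error and so would trigger the degrade path.

```go
package main

import (
	"encoding/json"
	"strings"
)

// extractJSON mirrors the helper in the diff: first '{' through last '}'.
func extractJSON(s string) string {
	i := strings.IndexByte(s, '{')
	j := strings.LastIndexByte(s, '}')
	if i < 0 || j < i {
		return ""
	}
	return s[i : j+1]
}

// decision is a hypothetical stand-in for the classifier's reply shape.
type decision struct {
	Route      string  `json:"route"`
	Confidence float64 `json:"confidence"`
	NeedsWeb   bool    `json:"needs_web"`
}

// parseDecision returns an error for anything that is not exactly one JSON
// object; the caller can treat that error as "fall back to the heuristic".
func parseDecision(reply string) (decision, error) {
	var d decision
	err := json.Unmarshal([]byte(extractJSON(reply)), &d)
	return d, err
}
```

Because the span runs to the last '}', two objects in one reply produce invalid JSON rather than a silently chosen first object, which errs on the side of degrading.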

// containsTrigger reports whether body contains the manual trigger phrase
// (case-insensitive, whitespace-trimmed). Empty trigger never matches.
func containsTrigger(body, trigger string) bool {
	trigger = strings.TrimSpace(strings.ToLower(trigger))
	if trigger == "" {
		return false
	}
	return strings.Contains(strings.ToLower(body), trigger)
}
81
apps/ai-bot/router_test.go
Normal file
@ -0,0 +1,81 @@
package main

import "testing"

// TestRouteLayer0 is the heuristic golden set (RU+EN). The critical property is the
// safe floor: anything substantive must land on grok_direct, and a long message that
// merely starts with a greeting must NOT be trivial (no leaking real questions to the
// cheap model, §8.6).
func TestRouteLayer0(t *testing.T) {
	cases := []struct {
		body string
		want string
	}{
		// trivial: short greetings/acks and bare arithmetic
		{"привет", routeTrivial},
		{"Привет!", routeTrivial},
		{"спасибо", routeTrivial},
		{"спс", routeTrivial},
		{"ок", routeTrivial},
		{"hi", routeTrivial},
		{"hello", routeTrivial},
		{"thanks", routeTrivial},
		{"thank you", routeTrivial},
		{"ok", routeTrivial},
		{"2+2", routeTrivial},
		{"100 * 50", routeTrivial},
		{"12 / 4 - 1", routeTrivial},
		// web: freshness signals
		{"какие новости сегодня?", routeWebThenGrok},
		{"что сейчас происходит в мире", routeWebThenGrok},
		{"курс доллара сегодня", routeWebThenGrok},
		{"what's the weather today", routeWebThenGrok},
		{"latest news on AI", routeWebThenGrok},
		{"current bitcoin price", routeWebThenGrok},
		// grok_direct: substantive (the safe floor)
		{"посоветуй фильм на вечер", routeGrokDirect},
		{"explain how TCP works", routeGrokDirect},
		{"расскажи историю римской империи", routeGrokDirect},
		{"спасибо, а теперь подробно объясни квантовую запутанность", routeGrokDirect}, // starts w/ ack but long
		{"hi, can you help me debug this Go program?", routeGrokDirect}, // starts w/ hi but a real ask
		{"напиши функцию сортировки на python", routeGrokDirect},
	}
	for _, c := range cases {
		if got := routeLayer0(c.body).Route; got != c.want {
			t.Errorf("routeLayer0(%q) = %q, want %q", c.body, got, c.want)
		}
	}
}

func TestNormalizeRoute(t *testing.T) {
	cases := map[string]string{
		"trivial": routeTrivial, "web": routeWebThenGrok, "reason": routeReason,
		"normal": routeGrokDirect, "garbage": routeGrokDirect, "": routeGrokDirect,
	}
	for in, want := range cases {
		if got := normalizeRoute(in); got != want {
			t.Errorf("normalizeRoute(%q) = %q, want %q", in, got, want)
		}
	}
}

func TestExtractJSON(t *testing.T) {
	if got := extractJSON("prefix {\"route\":\"web\"} suffix"); got != `{"route":"web"}` {
		t.Errorf("extractJSON = %q", got)
	}
	if got := extractJSON("no json here"); got != "" {
		t.Errorf("extractJSON(no json) = %q, want empty", got)
	}
}

func TestContainsTrigger(t *testing.T) {
	if !containsTrigger("ну подумай глубже про это", "подумай глубже") {
		t.Error("should match trigger phrase mid-sentence")
	}
	if containsTrigger("just a normal question", "подумай глубже") {
		t.Error("must not match when phrase absent")
	}
	if containsTrigger("anything", "") {
		t.Error("empty trigger must never match")
	}
}

@ -2,6 +2,7 @@ package main
|
||||||
|
|
||||||
import (
|
import (
|
||||||
"context"
|
"context"
|
||||||
|
"encoding/json"
|
||||||
"errors"
|
"errors"
|
||||||
"fmt"
|
"fmt"
|
||||||
"time"
|
"time"
|
||||||
|
|
@ -107,6 +108,64 @@ var migrations = []string{
|
||||||
PRIMARY KEY (date, mxid)
|
PRIMARY KEY (date, mxid)
|
||||||
);
|
);
|
||||||
CREATE TABLE IF NOT EXISTS warned_encrypted (room_id TEXT PRIMARY KEY);`,
|
CREATE TABLE IF NOT EXISTS warned_encrypted (room_id TEXT PRIMARY KEY);`,

	// v2: component cost columns + the optimistic reservation column. `reserved_usd`
	// holds the estimated max-cost of in-flight calls so the global ceiling counts
	// committed + reserved spend at admission time (the TOCTOU fix, §8.1): without it
	// a burst of concurrent calls all read the same low committed SUM and slip past
	// the ceiling, because the USD only lands at settle, AFTER the call. The component
	// columns let the ceiling see grounding/tool fees too (not just tokens), and feed
	// the per-component analytics. ADD COLUMN IF NOT EXISTS is idempotent.
	`ALTER TABLE spend ADD COLUMN IF NOT EXISTS reserved_usd  DOUBLE PRECISION NOT NULL DEFAULT 0;
	 ALTER TABLE spend ADD COLUMN IF NOT EXISTS router_usd    DOUBLE PRECISION NOT NULL DEFAULT 0;
	 ALTER TABLE spend ADD COLUMN IF NOT EXISTS grounding_usd DOUBLE PRECISION NOT NULL DEFAULT 0;
	 ALTER TABLE spend ADD COLUMN IF NOT EXISTS webtool_usd   DOUBLE PRECISION NOT NULL DEFAULT 0;`,

	// v3: request_log, one row per engaged request, for offline analysis of the route
	// mix, per-component $/day, latency, escalation/degrade rates (§6.2). Operational,
	// not message content: query_text is written ONLY when TELEMETRY_STORE_TEXT is on.
	// Indexed by ts for the time-based retention trim and time-series queries.
	`CREATE TABLE IF NOT EXISTS request_log (
		id                  TEXT PRIMARY KEY,
		ts                  TIMESTAMPTZ NOT NULL DEFAULT now(),
		room_id             TEXT,
		sender              TEXT,
		route               TEXT,
		router_source       TEXT,
		router_confidence   REAL,
		models              JSONB,
		prompt_tokens       INT,
		cached_tokens       INT,
		completion_tokens   INT,
		token_usd           DOUBLE PRECISION,
		grounding_usd       DOUBLE PRECISION,
		router_usd          DOUBLE PRECISION,
		webtool_usd         DOUBLE PRECISION,
		total_usd           DOUBLE PRECISION,
		latency_ms          INT,
		stage_ms            JSONB,
		escalated           BOOL DEFAULT false,
		fallback_fired      BOOL DEFAULT false,
		cache_hit           BOOL DEFAULT false,
		ceiling_hit         BOOL DEFAULT false,
		per_user_cap_hit    BOOL DEFAULT false,
		prompt_version      TEXT,
		provider_request_id TEXT,
		degraded            TEXT DEFAULT '',
		err                 TEXT DEFAULT '',
		ok                  BOOL,
		query_text          TEXT
	 );
	 CREATE INDEX IF NOT EXISTS request_log_ts_idx ON request_log (ts);`,

	// v4: per-day grounded-prompt counter for the web grounding cap guard (§8.2.3). One
	// row per UTC day; the cap check + increment is one atomic statement (same TOCTOU
	// discipline as the spend gate), so a burst can't blow past the $/1k grounding
	// overage. Day-keyed, so it self-resets and needs no separate trim.
	`CREATE TABLE IF NOT EXISTS grounding_count (
		date TEXT PRIMARY KEY,
		n    INTEGER NOT NULL DEFAULT 0
	 );`,
||||||
}
|
}
|
||||||
|
|
||||||
// migrate runs all pending migrations on a single connection under a session
|
// migrate runs all pending migrations on a single connection under a session
|
||||||
|
|
@ -207,13 +266,19 @@ func (s *Store) SeenEvent(eventID string) (bool, error) {
|
||||||
return true, err
|
return true, err
|
||||||
}
|
}
|
||||||
|
|
||||||
// SpentTodayUSD sums all spend for the current UTC day. SUM over no rows is NULL,
|
// committedUSDExpr sums every COMMITTED cost component of a spend row — tokens plus
|
||||||
// which scans into a nil *float64 → treated as 0.
|
// the grounding/web/router fees a cascade can incur — so the wallet ceiling is never
|
||||||
|
// blind to non-token spend. It deliberately excludes reserved_usd (that is in-flight,
|
||||||
|
// not yet spent); the admission gate adds reserved separately.
|
||||||
|
const committedUSDExpr = `usd + router_usd + grounding_usd + webtool_usd`
|
||||||
|
|
||||||
|
// SpentTodayUSD sums all COMMITTED spend for the current UTC day. SUM over no rows is
|
||||||
|
// NULL, which scans into a nil *float64 → treated as 0.
|
||||||
func (s *Store) SpentTodayUSD() (float64, error) {
|
func (s *Store) SpentTodayUSD() (float64, error) {
|
||||||
ctx, cancel := opContext()
|
ctx, cancel := opContext()
|
||||||
defer cancel()
|
defer cancel()
|
||||||
var v *float64
|
var v *float64
|
||||||
if err := s.pool.QueryRow(ctx, `SELECT SUM(usd) FROM spend WHERE date = $1`, todayUTC()).Scan(&v); err != nil {
|
if err := s.pool.QueryRow(ctx, `SELECT SUM(`+committedUSDExpr+`) FROM spend WHERE date = $1`, todayUTC()).Scan(&v); err != nil {
|
||||||
return 0, err
|
return 0, err
|
||||||
}
|
}
|
||||||
if v == nil {
|
if v == nil {
|
||||||
|
|
@ -222,19 +287,30 @@ func (s *Store) SpentTodayUSD() (float64, error) {
|
||||||
return *v, nil
|
return *v, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
// Reserve runs the two independent gates in one transaction, BEFORE the xAI call
|
// reserveDayLockKey namespaces the per-day admission lock so it can't collide with
|
||||||
// (F4): the global USD ceiling protects the wallet; the per-user request cap is
|
// the migration lock or any other advisory lock.
|
||||||
// anti-abuse. It increments the per-user request count on success; the USD is
|
const reserveDayLockKey = "ai-bot:reserve:"
|
||||||
// reconciled after the response. Order: global first (cheapest to deny), then
|
|
||||||
// per-user.
|
// Reserve runs the two admission gates in one transaction, BEFORE the call (F4): the
|
||||||
|
// global USD ceiling protects the wallet; the per-user request cap is anti-abuse. On
|
||||||
|
// success it both increments the per-user request count AND books `estimate` (the
|
||||||
|
// route's max-cost) into reserved_usd, so the global gate counts committed + reserved
|
||||||
|
// spend. The actual USD is settled after the response (Settle), at which point the
|
||||||
|
// reservation is released and the real cost booked. Order: global first (cheapest to
|
||||||
|
// deny), then per-user.
|
||||||
//
|
//
|
||||||
// A transaction-scoped advisory lock on (date, mxid) serializes concurrent
|
// The check-and-reserve is serialized GLOBALLY for the day by a transaction-scoped
|
||||||
// reservations for the SAME user+day, so the per-user check-then-increment stays
|
// advisory lock keyed on the date (not on date|mxid as the bare port did). This is
|
||||||
// atomic. The former SQLite store got this for free (one connection serialized all
|
// the TOCTOU fix (§8.1): the ceiling reads SUM(committed)+SUM(reserved) and then adds
|
||||||
// callers); the pgx pool is concurrent, and the same user messaging from two rooms
|
// its own reservation atomically, so a burst of DIFFERENT users can overshoot the
|
||||||
// at once would otherwise be able to slip past the per-user cap. Different users
|
// ceiling by at most ONE max-reservation rather than slipping through unbounded — the
|
||||||
// never contend.
|
// per-(date,mxid) lock only serialized one user with himself and left the cross-user
|
||||||
func (s *Store) Reserve(mxid string, perUserCap int, dailyUSDCeiling float64) (reserveResult, error) {
|
// ceiling unprotected. The former SQLite store serialized ALL callers on its single
|
||||||
|
// connection anyway, so this restores that exact admission semantics, durably; the
|
||||||
|
// bot is low-volume with per-room single-flight, so a per-day admission lock costs
|
||||||
|
// nothing observable. Settle/Release run lock-free (they only release/convert spend,
|
||||||
|
// never admit).
|
||||||
|
func (s *Store) Reserve(mxid string, perUserCap int, perUserUSD, dailyUSDCeiling, estimate float64) (reserveResult, error) {
|
||||||
ctx, cancel := opContext()
|
ctx, cancel := opContext()
|
||||||
defer cancel()
|
defer cancel()
|
||||||
day := todayUTC()
|
day := todayUTC()
|
||||||
|
|
@ -245,41 +321,50 @@ func (s *Store) Reserve(mxid string, perUserCap int, dailyUSDCeiling float64) (r
|
||||||
}
|
}
|
||||||
defer tx.Rollback(ctx)
|
defer tx.Rollback(ctx)
|
||||||
|
|
||||||
// Key on date|mxid. The separator only needs to avoid cross-key ambiguity; a
|
if _, err := tx.Exec(ctx, `SELECT pg_advisory_xact_lock(hashtextextended($1, 0))`, reserveDayLockKey+day); err != nil {
|
||||||
// hash collision would merely over-serialize two unrelated users, never corrupt a
|
|
||||||
// count. (NUL is rejected by Postgres text, so use a printable separator.)
|
|
||||||
if _, err := tx.Exec(ctx, `SELECT pg_advisory_xact_lock(hashtextextended($1, 0))`, day+"|"+mxid); err != nil {
|
|
||||||
return reserveOK, err
|
return reserveOK, err
|
||||||
}
|
}
|
||||||
|
|
||||||
// SUM over zero rows is NULL → nil pointer → treat as 0.0, exactly as the SQLite
|
// committed + reserved. SUM over zero rows is NULL → nil pointer → treat as 0.0,
|
||||||
// store's sql.NullFloat64 did (and as SpentTodayUSD does). This keeps the gate 1:1
|
// exactly as the SQLite store's sql.NullFloat64 did. This keeps the gate 1:1 even
|
||||||
// even at the degenerate dailyUSDCeiling == 0 (deny everything), where 0 >= 0.
|
// at the degenerate dailyUSDCeiling == 0 (deny everything), where 0 >= 0.
|
||||||
var global *float64
|
var inFlight *float64
|
||||||
if err := tx.QueryRow(ctx, `SELECT SUM(usd) FROM spend WHERE date = $1`, day).Scan(&global); err != nil {
|
if err := tx.QueryRow(ctx,
|
||||||
|
`SELECT SUM(`+committedUSDExpr+` + reserved_usd) FROM spend WHERE date = $1`, day).Scan(&inFlight); err != nil {
|
||||||
return reserveOK, err
|
return reserveOK, err
|
||||||
}
|
}
|
||||||
spentToday := 0.0
|
spentToday := 0.0
|
||||||
if global != nil {
|
if inFlight != nil {
|
||||||
spentToday = *global
|
spentToday = *inFlight
|
||||||
}
|
}
|
||||||
if spentToday >= dailyUSDCeiling {
|
if spentToday >= dailyUSDCeiling {
|
||||||
return reserveDeniedGlobal, nil
|
return reserveDeniedGlobal, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Per-user row: read requests AND the user's own committed+reserved $ in one go, so
|
||||||
|
// both per-user gates are checked under the same lock. ErrNoRows → first request of
|
||||||
|
// the day for this user → all zero.
|
||||||
var requests int
|
var requests int
|
||||||
err = tx.QueryRow(ctx, `SELECT requests FROM spend WHERE date = $1 AND mxid = $2`, day, mxid).Scan(&requests)
|
var userUSD float64
|
||||||
|
err = tx.QueryRow(ctx,
|
||||||
|
`SELECT requests, `+committedUSDExpr+` + reserved_usd FROM spend WHERE date = $1 AND mxid = $2`,
|
||||||
|
day, mxid).Scan(&requests, &userUSD)
|
||||||
if err != nil && !errors.Is(err, pgx.ErrNoRows) {
|
if err != nil && !errors.Is(err, pgx.ErrNoRows) {
|
||||||
return reserveOK, err
|
return reserveOK, err
|
||||||
}
|
}
|
||||||
if requests >= perUserCap {
|
if requests >= perUserCap {
|
||||||
return reserveDeniedUser, nil
|
return reserveDeniedUser, nil
|
||||||
}
|
}
|
||||||
|
// Optional per-user $ quota (0 = off): keep one user from draining the shared ceiling.
|
||||||
|
if perUserUSD > 0 && userUSD >= perUserUSD {
|
||||||
|
return reserveDeniedUser, nil
|
||||||
|
}
|
||||||
|
|
||||||
if _, err := tx.Exec(ctx,
|
if _, err := tx.Exec(ctx,
|
||||||
`INSERT INTO spend (date, mxid, requests, usd) VALUES ($1, $2, 1, 0)
|
`INSERT INTO spend (date, mxid, requests, reserved_usd) VALUES ($1, $2, 1, $3)
|
||||||
ON CONFLICT (date, mxid) DO UPDATE SET requests = spend.requests + 1`,
|
ON CONFLICT (date, mxid) DO UPDATE SET requests = spend.requests + 1,
|
||||||
day, mxid); err != nil {
|
reserved_usd = spend.reserved_usd + excluded.reserved_usd`,
|
||||||
|
day, mxid, estimate); err != nil {
|
||||||
return reserveOK, err
|
return reserveOK, err
|
||||||
}
|
}
|
||||||
if err := tx.Commit(ctx); err != nil {
|
if err := tx.Commit(ctx); err != nil {
|
||||||
|
|
@ -288,10 +373,11 @@ func (s *Store) Reserve(mxid string, perUserCap int, dailyUSDCeiling float64) (r
|
||||||
return reserveOK, nil
|
return reserveOK, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
// RefundRequest gives back a reserved request slot when the call ultimately
|
// RefundRequest gives back a reserved request SLOT when the call ultimately failed
|
||||||
// failed (e.g. an xAI outage), so a transient failure doesn't burn the user's
|
// (an outage) or the reply couldn't be delivered (paid silence, §8.1), so a transient
|
||||||
// daily cap. Never drops below zero. A single UPDATE is atomic, so concurrent
|
// failure doesn't burn the user's daily cap. It does NOT touch USD: a 2xx is really
|
||||||
// refunds settle correctly without extra locking.
|
// billed even if we then fail to deliver. Never drops below zero. A single UPDATE is
|
||||||
|
// atomic, so concurrent refunds settle correctly without extra locking.
|
||||||
func (s *Store) RefundRequest(mxid string) error {
|
func (s *Store) RefundRequest(mxid string) error {
|
||||||
ctx, cancel := opContext()
|
ctx, cancel := opContext()
|
||||||
defer cancel()
|
defer cancel()
|
||||||
|
|
@ -301,19 +387,128 @@ func (s *Store) RefundRequest(mxid string) error {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
|
|
||||||
// Reconcile books the actual USD cost of a completed call against the user's
|
// ReleaseReservation frees a reservation whose request produced no billable spend,
|
||||||
// daily row (and thus the global total). The accumulating upsert is atomic and
|
// restoring the global headroom without booking anything. The normal failure paths
|
||||||
// commutative, so concurrent reconciles for the same user sum correctly.
|
// settle via Settle (which also releases), so this is the safety valve for an
|
||||||
func (s *Store) Reconcile(mxid string, usd float64) error {
|
// UNSETTLED exit — a panic in generation, recovered by safego — where it runs with
|
||||||
|
// RefundRequest in respond's deferred guard so a leaked reservation can't drift the
|
||||||
|
// ceiling. GREATEST(0, …) guards against a double-release driving reserved_usd
|
||||||
|
// negative. Lock-free: it only lowers the in-flight reserved total, never admits.
|
||||||
|
func (s *Store) ReleaseReservation(mxid string, estimate float64) error {
|
||||||
ctx, cancel := opContext()
|
ctx, cancel := opContext()
|
||||||
defer cancel()
|
defer cancel()
|
||||||
_, err := s.pool.Exec(ctx,
|
_, err := s.pool.Exec(ctx,
|
||||||
`INSERT INTO spend (date, mxid, requests, usd) VALUES ($1, $2, 0, $3)
|
`UPDATE spend SET reserved_usd = GREATEST(0, reserved_usd - $3) WHERE date = $1 AND mxid = $2`,
|
||||||
ON CONFLICT (date, mxid) DO UPDATE SET usd = spend.usd + excluded.usd`,
|
todayUTC(), mxid, estimate)
|
||||||
todayUTC(), mxid, usd)
|
|
||||||
return err
|
return err
|
||||||
}
|
}
|

// Settle releases a call's reservation and books its ACTUAL cost in one atomic step
// (replacing the old additive Reconcile): reserved_usd drops by the reservation while
// the real per-component cost is added to the committed columns. This is non-additive
// on the reservation (settle, not accumulate), the semantics the ceiling needs. It is
// also the partial-cascade-refund primitive (§8.1): a web_then_grok call that paid
// grounding but failed at the final model passes a CostBreakdown carrying only the
// grounding it actually spent, releases the rest of the reservation, and refunds the
// request slot separately. GREATEST(0, …) keeps reserved_usd from underflowing.
// Atomic and commutative per row, so concurrent settles for one user sum correctly.
func (s *Store) Settle(mxid string, estimate float64, cost CostBreakdown) error {
	ctx, cancel := opContext()
	defer cancel()
	_, err := s.pool.Exec(ctx,
		`INSERT INTO spend (date, mxid, requests, usd, router_usd, grounding_usd, webtool_usd, reserved_usd)
		 VALUES ($1, $2, 0, $3, $4, $5, $6, 0)
		 ON CONFLICT (date, mxid) DO UPDATE SET
		   usd = spend.usd + excluded.usd,
		   router_usd = spend.router_usd + excluded.router_usd,
		   grounding_usd = spend.grounding_usd + excluded.grounding_usd,
		   webtool_usd = spend.webtool_usd + excluded.webtool_usd,
		   reserved_usd = GREATEST(0, spend.reserved_usd - $7)`,
		todayUTC(), mxid, cost.Token, cost.Router, cost.Grounding, cost.WebTool, estimate)
	return err
}
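
To make the reserve/settle arithmetic concrete, here is a minimal in-memory sketch. It is not the Postgres implementation: the ledger type and its methods are hypothetical, a mutex stands in for the per-day advisory lock, and a single pair of totals stands in for the per-user spend rows. It only shows why counting committed plus reserved spend at admission bounds the overshoot, and how settling swaps an estimate for the actual cost.

```go
package main

import "sync"

// ledger is a hypothetical in-memory stand-in for one day's spend rows.
type ledger struct {
	mu        sync.Mutex
	committed float64 // settled, actually billed spend
	reserved  float64 // estimated max-cost of in-flight calls
}

// reserve admits a call only while committed+reserved is below the ceiling,
// then books the estimate. Check and booking happen under one lock, which is
// the role the per-day advisory lock plays in Reserve.
func (l *ledger) reserve(ceiling, estimate float64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.committed+l.reserved >= ceiling {
		return false
	}
	l.reserved += estimate
	return true
}

// settle releases the reservation and books the actual cost in one step.
func (l *ledger) settle(estimate, actual float64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
	if l.reserved < 0 {
		l.reserved = 0 // same role as GREATEST(0, ...) in the SQL
	}
	l.committed += actual
}
```

With a ceiling of 10 and an estimate of 1 per call, the sketch admits ten reservations and denies the rest; settling one call at 0.25 frees headroom for one more.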

// InsertRequestLog writes one analytics row. id is the event id (PRIMARY KEY), so a
// re-logged event is a no-op (ON CONFLICT DO NOTHING); each event takes exactly one
// terminal path, so this never overwrites a real outcome. The write is isolated: the
// caller runs it off the answer path and only logs a failure, never drops the reply.
func (s *Store) InsertRequestLog(rl RequestLog) error {
	ctx, cancel := opContext()
	defer cancel()

	models, err := json.Marshal(rl.Models)
	if err != nil {
		return err
	}
	stages, err := json.Marshal(rl.StageMS)
	if err != nil {
		return err
	}
	// query_text is NULL unless text capture is on (the struct carries "" otherwise),
	// so the analytics table never holds message content by default.
	var queryText any
	if rl.QueryText != "" {
		queryText = rl.QueryText
	}

	_, err = s.pool.Exec(ctx, `
		INSERT INTO request_log (
			id, room_id, sender, route, router_source, router_confidence, models,
			prompt_tokens, cached_tokens, completion_tokens,
			token_usd, grounding_usd, router_usd, webtool_usd, total_usd,
			latency_ms, stage_ms, escalated, fallback_fired, cache_hit, ceiling_hit,
			per_user_cap_hit, prompt_version, provider_request_id, degraded, err, ok, query_text
		) VALUES (
			$1, $2, $3, $4, $5, $6, $7,
			$8, $9, $10,
			$11, $12, $13, $14, $15,
			$16, $17, $18, $19, $20, $21,
			$22, $23, $24, $25, $26, $27, $28
		) ON CONFLICT (id) DO NOTHING`,
		rl.ID, rl.RoomID, rl.Sender, rl.Route, rl.RouterSource, rl.RouterConfidence, models,
		rl.PromptTokens, rl.CachedTokens, rl.CompletionTokens,
		rl.Cost.Token, rl.Cost.Grounding, rl.Cost.Router, rl.Cost.WebTool, rl.Cost.Total(),
		rl.LatencyMS, stages, rl.Escalated, rl.FallbackFired, rl.CacheHit, rl.CeilingHit,
		rl.PerUserCapHit, rl.PromptVersion, rl.ProviderRequestID, rl.Degraded, rl.Err, rl.OK, queryText)
	return err
}

// TrimRequestLog deletes analytics rows older than the cutoff (time-based, since the
// data is a time series, unlike the count-bounded dedup tables). A no-op for a zero
// cutoff. Cheap given the ts index.
func (s *Store) TrimRequestLog(olderThan time.Time) error {
	ctx, cancel := opContext()
	defer cancel()
	_, err := s.pool.Exec(ctx, `DELETE FROM request_log WHERE ts < $1`, olderThan)
	return err
}

// IncrGroundingIfUnder atomically admits one grounded prompt for today if the day's
// count is below cap, returning whether it was admitted. The check-and-increment is a
// single statement, so concurrent grounding calls can't race past the cap and into the
// per-1k overage (§8.2.3). A non-positive cap denies everything (grounding effectively
// off). The counter is day-keyed and self-resets at UTC midnight.
func (s *Store) IncrGroundingIfUnder(cap int) (bool, error) {
	if cap <= 0 {
		return false, nil
	}
	ctx, cancel := opContext()
	defer cancel()
	var n int
	err := s.pool.QueryRow(ctx, `
		INSERT INTO grounding_count (date, n) VALUES ($1, 1)
		ON CONFLICT (date) DO UPDATE SET n = grounding_count.n + 1
		WHERE grounding_count.n < $2
		RETURNING n`, todayUTC(), cap).Scan(&n)
	if errors.Is(err, pgx.ErrNoRows) {
		return false, nil // at/over cap: the conflict update was filtered out
	}
	if err != nil {
		return false, err
	}
	return true, nil
}
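
The property this function relies on, that the check and the increment are one indivisible step, can be illustrated without a database. The sketch below is an in-memory analogue only: groundingCounter, incrIfUnder and admitConcurrently are hypothetical names, and a mutex plays the part that the single conditional upsert plays in Postgres.

```go
package main

import "sync"

// groundingCounter is a hypothetical in-memory analogue of the grounding_count table.
type groundingCounter struct {
	mu sync.Mutex
	n  map[string]int // day key -> admitted grounded prompts
}

// incrIfUnder checks the day's count and increments it under one lock, so two
// callers can never both observe "below the limit" for the last free slot.
func (g *groundingCounter) incrIfUnder(day string, limit int) bool {
	if limit <= 0 {
		return false // non-positive limit: grounding effectively off
	}
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.n == nil {
		g.n = make(map[string]int)
	}
	if g.n[day] >= limit {
		return false
	}
	g.n[day]++
	return true
}

// admitConcurrently fires the given number of attempts in parallel and
// returns how many were admitted.
func admitConcurrently(g *groundingCounter, day string, limit, attempts int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	admitted := 0
	for i := 0; i < attempts; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if g.incrIfUnder(day, limit) {
				mu.Lock()
				admitted++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return admitted
}
```

Fifty parallel attempts against a limit of five admit exactly five; a new day key starts again from zero, which mirrors the day-keyed self-reset.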
|
||||||
|
|
||||||
// HasWarnedEncrypted / SetWarnedEncrypted persist the one-shot "reacted 🔒 to this
|
// HasWarnedEncrypted / SetWarnedEncrypted persist the one-shot "reacted 🔒 to this
|
||||||
// room because I can't read encryption" flag so a restart doesn't re-react on every
|
// room because I can't read encryption" flag so a restart doesn't re-react on every
|
||||||
// message (F5). The bot never reacts to its own events: m.reaction is not an
|
// message (F5). The bot never reacts to its own events: m.reaction is not an
|
||||||
|
|
|
||||||
|
|
@ -1,9 +1,11 @@
|
||||||
package main
|
package main
|
||||||
|
|
||||||
import (
|
import (
|
||||||
|
"fmt"
|
||||||
"sync"
|
"sync"
|
||||||
"sync/atomic"
|
"sync/atomic"
|
||||||
"testing"
|
"testing"
|
||||||
|
"time"
|
||||||
)
|
)
|
||||||
|
|
||||||
// These tests exercise the Postgres-backed store directly. They run only when
|
// These tests exercise the Postgres-backed store directly. They run only when
|
||||||
|
|
@ -84,23 +86,23 @@ func TestStoreLimiterPerUserCap(t *testing.T) {
|
||||||
const cap, ceiling = 2, 100.0
|
const cap, ceiling = 2, 100.0
|
||||||
|
|
||||||
for i := 0; i < cap; i++ {
|
for i := 0; i < cap; i++ {
|
||||||
if res, err := st.Reserve(user, cap, ceiling); err != nil || res != reserveOK {
|
if res, err := st.Reserve(user, cap, 0, ceiling, 0); err != nil || res != reserveOK {
|
||||||
t.Fatalf("reserve %d: got (%v,%v), want reserveOK", i, res, err)
|
t.Fatalf("reserve %d: got (%v,%v), want reserveOK", i, res, err)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
// The (cap+1)th request is denied per-user.
|
// The (cap+1)th request is denied per-user.
|
||||||
if res, err := st.Reserve(user, cap, ceiling); err != nil || res != reserveDeniedUser {
|
if res, err := st.Reserve(user, cap, 0, ceiling, 0); err != nil || res != reserveDeniedUser {
|
||||||
t.Fatalf("over-cap reserve: got (%v,%v), want reserveDeniedUser", res, err)
|
t.Fatalf("over-cap reserve: got (%v,%v), want reserveDeniedUser", res, err)
|
||||||
}
|
}
|
||||||
// A different user is unaffected.
|
// A different user is unaffected.
|
||||||
if res, err := st.Reserve("@v:vojo.chat", cap, ceiling); err != nil || res != reserveOK {
|
if res, err := st.Reserve("@v:vojo.chat", cap, 0, ceiling, 0); err != nil || res != reserveOK {
|
||||||
t.Fatalf("other user reserve: got (%v,%v), want reserveOK", res, err)
|
t.Fatalf("other user reserve: got (%v,%v), want reserveOK", res, err)
|
||||||
}
|
}
|
||||||
// Refund returns a slot, so the first user can reserve once more.
|
// Refund returns a slot, so the first user can reserve once more.
|
||||||
if err := st.RefundRequest(user); err != nil {
|
if err := st.RefundRequest(user); err != nil {
|
||||||
t.Fatalf("refund: %v", err)
|
t.Fatalf("refund: %v", err)
|
||||||
}
|
}
|
||||||
if res, err := st.Reserve(user, cap, ceiling); err != nil || res != reserveOK {
|
if res, err := st.Reserve(user, cap, 0, ceiling, 0); err != nil || res != reserveOK {
|
||||||
t.Fatalf("post-refund reserve: got (%v,%v), want reserveOK", res, err)
|
t.Fatalf("post-refund reserve: got (%v,%v), want reserveOK", res, err)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
@ -110,7 +112,7 @@ func TestStoreLimiterPerUserCap(t *testing.T) {
|
||||||
func TestStoreLimiterZeroCap(t *testing.T) {
|
func TestStoreLimiterZeroCap(t *testing.T) {
|
||||||
st := openTestStore(t)
|
st := openTestStore(t)
|
||||||
defer st.Close()
|
defer st.Close()
|
||||||
if res, err := st.Reserve("@u:vojo.chat", 0, 100.0); err != nil || res != reserveDeniedUser {
|
if res, err := st.Reserve("@u:vojo.chat", 0, 0, 100.0, 0); err != nil || res != reserveDeniedUser {
|
||||||
t.Fatalf("zero-cap reserve: got (%v,%v), want reserveDeniedUser", res, err)
|
t.Fatalf("zero-cap reserve: got (%v,%v), want reserveDeniedUser", res, err)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
@ -121,7 +123,7 @@ func TestStoreLimiterZeroCap(t *testing.T) {
|
||||||
func TestStoreLimiterZeroCeiling(t *testing.T) {
|
func TestStoreLimiterZeroCeiling(t *testing.T) {
|
||||||
st := openTestStore(t)
|
st := openTestStore(t)
|
||||||
defer st.Close()
|
defer st.Close()
|
||||||
if res, err := st.Reserve("@u:vojo.chat", 1_000_000, 0); err != nil || res != reserveDeniedGlobal {
|
if res, err := st.Reserve("@u:vojo.chat", 1_000_000, 0, 0, 0); err != nil || res != reserveDeniedGlobal {
|
||||||
t.Fatalf("zero-ceiling reserve on empty store: got (%v,%v), want reserveDeniedGlobal", res, err)
|
t.Fatalf("zero-ceiling reserve on empty store: got (%v,%v), want reserveDeniedGlobal", res, err)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
@ -131,18 +133,18 @@ func TestStoreLimiterGlobalCeiling(t *testing.T) {
|
||||||
defer st.Close()
|
defer st.Close()
|
||||||
|
|
||||||
const ceiling = 1.0
|
const ceiling = 1.0
|
||||||
// Book spend up to the ceiling (Reconcile is what feeds the global gate).
|
// Book spend up to the ceiling (Settle is what feeds the global gate).
|
||||||
if err := st.Reconcile("@a:vojo.chat", 0.6); err != nil {
|
if err := st.Settle("@a:vojo.chat", 0, CostBreakdown{Token: 0.6}); err != nil {
|
||||||
t.Fatalf("reconcile a: %v", err)
|
t.Fatalf("settle a: %v", err)
|
||||||
}
|
}
|
||||||
if err := st.Reconcile("@b:vojo.chat", 0.5); err != nil {
|
if err := st.Settle("@b:vojo.chat", 0, CostBreakdown{Token: 0.5}); err != nil {
|
||||||
t.Fatalf("reconcile b: %v", err)
|
t.Fatalf("settle b: %v", err)
|
||||||
}
|
}
|
||||||
if spent, err := st.SpentTodayUSD(); err != nil || spent < 1.1 {
|
if spent, err := st.SpentTodayUSD(); err != nil || spent < 1.1 {
|
||||||
t.Fatalf("spent today: got (%v,%v), want >= 1.1", spent, err)
|
t.Fatalf("spent today: got (%v,%v), want >= 1.1", spent, err)
|
||||||
}
|
}
|
||||||
// Now any reservation is denied globally, regardless of the per-user cap.
|
// Now any reservation is denied globally, regardless of the per-user cap.
|
||||||
if res, err := st.Reserve("@c:vojo.chat", 1_000_000, ceiling); err != nil || res != reserveDeniedGlobal {
|
if res, err := st.Reserve("@c:vojo.chat", 1_000_000, 0, ceiling, 0); err != nil || res != reserveDeniedGlobal {
|
||||||
t.Fatalf("over-ceiling reserve: got (%v,%v), want reserveDeniedGlobal", res, err)
|
t.Fatalf("over-ceiling reserve: got (%v,%v), want reserveDeniedGlobal", res, err)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
@ -165,7 +167,7 @@ func TestStoreReserveConcurrentRespectsCap(t *testing.T) {
|
||||||
wg.Add(1)
|
wg.Add(1)
|
||||||
go func() {
|
go func() {
|
||||||
defer wg.Done()
|
defer wg.Done()
|
||||||
res, err := st.Reserve(user, cap, 1e9)
|
res, err := st.Reserve(user, cap, 0, 1e9, 0)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
t.Errorf("reserve: %v", err)
|
t.Errorf("reserve: %v", err)
|
||||||
return
|
return
|
||||||
|
|
@ -181,6 +183,266 @@ func TestStoreReserveConcurrentRespectsCap(t *testing.T) {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// TestStoreReserveConcurrentCeilingBounded is the §8.1 TOCTOU regression. Many
|
||||||
|
// DIFFERENT users reserving at once against a low ceiling must not overshoot it by
|
||||||
|
// more than ONE max-reservation. The bare pgx port's per-(date,mxid) lock left the
|
||||||
|
// cross-user ceiling unprotected: every user read the same committed SUM(usd)=0 (the
|
||||||
|
// USD only lands at settle, after the call) and slipped through, so all N were
|
||||||
|
// admitted. The per-day admission lock + reserved_usd here bound the overshoot.
|
||||||
|
// Run under -race.
|
||||||
|
func TestStoreReserveConcurrentCeilingBounded(t *testing.T) {
|
||||||
|
st := openTestStore(t)
|
||||||
|
defer st.Close()
|
||||||
|
|
||||||
|
const estimate = 1.0 // each in-flight call reserves $1
|
||||||
|
const ceiling = 10.0 // so the gate should admit ~10, not 100
|
||||||
|
const perUserCap = 1_000_000 // keep the per-user cap out of the way
|
||||||
|
const goroutines = 100
|
||||||
|
|
||||||
|
var ok int64
|
||||||
|
var wg sync.WaitGroup
|
||||||
|
for i := 0; i < goroutines; i++ {
|
||||||
|
wg.Add(1)
|
||||||
|
go func(n int) {
|
||||||
|
defer wg.Done()
|
||||||
|
user := fmt.Sprintf("@u%d:vojo.chat", n) // a DIFFERENT user each time
|
||||||
|
res, err := st.Reserve(user, perUserCap, 0, ceiling, estimate)
|
||||||
|
if err != nil {
|
||||||
|
t.Errorf("reserve: %v", err)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
if res == reserveOK {
|
||||||
|
atomic.AddInt64(&ok, 1)
|
||||||
|
}
|
||||||
|
}(i)
|
||||||
|
}
|
||||||
|
wg.Wait()
|
||||||
|
|
||||||
|
// committed+reserved < ceiling admits; the last admit can push reserved to just
|
||||||
|
// under ceiling+estimate, so admitted ≤ ceiling/estimate + 1. The pre-fix code
|
||||||
|
// admitted all 100.
|
||||||
|
maxAdmit := int64(ceiling/estimate) + 1
|
||||||
|
if ok < 1 || ok > maxAdmit {
|
||||||
|
t.Fatalf("admitted %d different users, want in [1, %d] (ceiling + one max-reserve)", ok, maxAdmit)
|
||||||
|
}
|
||||||
|
// Nothing was settled, so committed spend is still 0 — the cap came purely from
|
||||||
|
// reservations, which is the whole point (the USD isn't known until after the call).
|
||||||
|
if spent, err := st.SpentTodayUSD(); err != nil || spent != 0 {
|
||||||
|
t.Fatalf("committed spend = (%v,%v), want 0 (only reservations held)", spent, err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestStoreSettleReleasesReservation verifies that Settle frees the reservation it
|
||||||
|
// books actual cost for, restoring global headroom — proven through the admission
|
||||||
|
// gate so it doesn't depend on reading the private column.
|
||||||
|
func TestStoreSettleReleasesReservation(t *testing.T) {
|
||||||
|
st := openTestStore(t)
|
||||||
|
defer st.Close()
|
||||||
|
const est = 5.0
|
||||||
|
const ceiling = 10.0
|
||||||
|
|
||||||
|
// Two reservations fill the ceiling (reserved 5 + 5 = 10); the third is denied.
|
||||||
|
if res, _ := st.Reserve("@a:vojo.chat", 1_000_000, 0, ceiling, est); res != reserveOK {
|
||||||
|
t.Fatalf("reserve a: %v", res)
|
||||||
|
}
|
||||||
|
if res, _ := st.Reserve("@b:vojo.chat", 1_000_000, 0, ceiling, est); res != reserveOK {
|
||||||
|
t.Fatalf("reserve b: %v", res)
|
||||||
|
}
|
||||||
|
if res, _ := st.Reserve("@c:vojo.chat", 1_000_000, 0, ceiling, est); res != reserveDeniedGlobal {
|
||||||
|
t.Fatalf("reserve c over full ceiling: got %v, want denied", res)
|
||||||
|
}
|
||||||
|
// Settle a with a small actual cost: reserved 10→5, committed 0→0.01. Headroom
|
||||||
|
// returns, so a new reservation is admitted again.
|
||||||
|
if err := st.Settle("@a:vojo.chat", est, CostBreakdown{Token: 0.01}); err != nil {
|
||||||
|
t.Fatalf("settle a: %v", err)
|
||||||
|
}
|
||||||
|
if res, _ := st.Reserve("@d:vojo.chat", 1_000_000, 0, ceiling, est); res != reserveOK {
|
||||||
|
t.Fatalf("reserve d after settle freed headroom: got %v, want reserveOK", res)
|
||||||
|
}
|
||||||
|
if spent, _ := st.SpentTodayUSD(); spent < 0.009 || spent > 0.011 {
|
||||||
|
t.Fatalf("committed after one settle = %v, want ~0.01", spent)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestStoreReleaseReservation verifies the call-failed path: a released reservation
|
||||||
|
// frees headroom and books no USD, and an over-release clamps reserved_usd to 0
|
||||||
|
// rather than going negative (a negative reservation would manufacture phantom
|
||||||
|
// headroom past the ceiling).
|
||||||
|
func TestStoreReleaseReservation(t *testing.T) {
|
||||||
|
st := openTestStore(t)
|
||||||
|
defer st.Close()
|
||||||
|
const est = 5.0
|
||||||
|
const ceiling = 10.0
|
||||||
|
|
||||||
|
// Reserve a, then over-release it by far more than it held.
|
||||||
|
if res, _ := st.Reserve("@a:vojo.chat", 1_000_000, 0, ceiling, est); res != reserveOK {
|
||||||
|
t.Fatalf("reserve a: %v", res)
|
||||||
|
}
|
||||||
|
if err := st.ReleaseReservation("@a:vojo.chat", 100); err != nil {
|
||||||
|
t.Fatalf("over-release: %v", err)
|
||||||
|
}
|
||||||
|
// a's reserved must now be 0 (not -95): exactly two more $5 reservations fit the
|
||||||
|
// $10 ceiling, and the third is denied. Were reserved negative, far more would slip
|
||||||
|
// through — so the deny at the third request proves both the headroom was freed and
|
||||||
|
// the clamp held.
|
||||||
|
if res, _ := st.Reserve("@b:vojo.chat", 1_000_000, 0, ceiling, est); res != reserveOK {
|
||||||
|
t.Fatalf("reserve b: %v", res)
|
||||||
|
}
|
||||||
|
if res, _ := st.Reserve("@c:vojo.chat", 1_000_000, 0, ceiling, est); res != reserveOK {
|
||||||
|
t.Fatalf("reserve c: %v", res)
|
||||||
|
}
|
||||||
|
if res, _ := st.Reserve("@d:vojo.chat", 1_000_000, 0, ceiling, est); res != reserveDeniedGlobal {
|
||||||
|
t.Fatalf("reserve d: got %v, want denied (reserved must have clamped to 0, not gone negative)", res)
|
||||||
|
}
|
||||||
|
// Nothing was ever settled, so committed spend stays 0 — release books no USD.
|
||||||
|
if spent, _ := st.SpentTodayUSD(); spent != 0 {
|
||||||
|
t.Fatalf("committed after release = %v, want 0 (a failed call bills nothing)", spent)
|
||||||
|
}
|
||||||
|
}
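The reservation tests above (ceiling bound, settle, release with clamp) can be mirrored by a small in-memory model. This is only a sketch of the admission rule the test comments describe (`committed+reserved < ceiling` admits). The `ledger` type and `admitBurst` helper are hypothetical names, and a mutex stands in for the store's per-day admission lock, so it says nothing about the real SQL.

```go
package main

import (
	"fmt"
	"sync"
)

// ledger is a hypothetical in-memory stand-in for one day's row:
// committed is settled USD, reserved is USD held by in-flight calls.
type ledger struct {
	mu        sync.Mutex
	committed float64
	reserved  float64
}

// reserve admits while committed+reserved is under the ceiling, then holds
// the estimate. Check and hold happen under one lock, which closes the
// check-then-act gap between concurrent callers.
func (l *ledger) reserve(ceiling, estimate float64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.committed+l.reserved >= ceiling {
		return false
	}
	l.reserved += estimate
	return true
}

// settle frees the reservation (clamped at zero) and books the actual cost.
// Passing actual = 0 models the call-failed release path.
func (l *ledger) settle(estimate, actual float64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.reserved -= estimate
	if l.reserved < 0 {
		l.reserved = 0 // an over-release must not create phantom headroom
	}
	l.committed += actual
}

// admitBurst fires n concurrent reservations and counts how many are admitted.
func admitBurst(n int, ceiling, estimate float64) int {
	l := &ledger{}
	var wg sync.WaitGroup
	var mu sync.Mutex
	admitted := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if l.reserve(ceiling, estimate) {
				mu.Lock()
				admitted++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return admitted
}

func main() {
	fmt.Println(admitBurst(100, 10, 1)) // prints 10
}
```

With a $1 estimate and a $10 ceiling the model admits exactly ten callers; with estimates that do not divide the ceiling evenly, the last admit can overshoot by at most one estimate, which is the bound the concurrent test asserts.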
|
||||||
|
|
||||||
|
// TestStoreRequestLog covers the analytics row: total_usd is the component sum,
|
||||||
|
// query_text is NULL unless captured, re-inserting one id is a no-op, and the
|
||||||
|
// time-based trim removes old rows.
|
||||||
|
func TestStoreRequestLog(t *testing.T) {
|
||||||
|
st := openTestStore(t)
|
||||||
|
defer st.Close()
|
||||||
|
|
||||||
|
noText := RequestLog{
|
||||||
|
ID: "$ev-rl-1", RoomID: "!r:vojo.chat", Sender: "@u:vojo.chat",
|
||||||
|
Route: routeGrokDirect, RouterSource: "default",
|
||||||
|
Models: map[string]string{"final": "grok-x"},
|
||||||
|
Cost: CostBreakdown{Token: 0.01, Grounding: 0.02},
|
||||||
|
LatencyMS: 1234, StageMS: map[string]int{"final": 1200},
|
||||||
|
ProviderRequestID: "prov-1", OK: true, // QueryText empty → NULL
|
||||||
|
}
|
||||||
|
if err := st.InsertRequestLog(noText); err != nil {
|
||||||
|
t.Fatalf("insert: %v", err)
|
||||||
|
}
|
||||||
|
// Re-inserting the same id is a no-op (ON CONFLICT DO NOTHING), not an error.
|
||||||
|
if err := st.InsertRequestLog(noText); err != nil {
|
||||||
|
t.Fatalf("re-insert: %v", err)
|
||||||
|
}
|
||||||
|
withText := RequestLog{ID: "$ev-rl-2", Route: routeTrivial, OK: false, QueryText: "hello"}
|
||||||
|
if err := st.InsertRequestLog(withText); err != nil {
|
||||||
|
t.Fatalf("insert-with-text: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
ctx, cancel := opContext()
|
||||||
|
defer cancel()
|
||||||
|
var route string
|
||||||
|
var total float64
|
||||||
|
var ok bool
|
||||||
|
var qt *string
|
||||||
|
if err := st.pool.QueryRow(ctx,
|
||||||
|
`SELECT route, total_usd, ok, query_text FROM request_log WHERE id = $1`, noText.ID).
|
||||||
|
Scan(&route, &total, &ok, &qt); err != nil {
|
||||||
|
t.Fatalf("read row1: %v", err)
|
||||||
|
}
|
||||||
|
if route != routeGrokDirect || !ok {
|
||||||
|
t.Fatalf("row1 = (%q, ok=%v), want (grok_direct, true)", route, ok)
|
||||||
|
}
|
||||||
|
if d := total - 0.03; d > 1e-9 || d < -1e-9 {
|
||||||
|
t.Fatalf("row1 total_usd = %v, want 0.03 (token+grounding)", total)
|
||||||
|
}
|
||||||
|
if qt != nil {
|
||||||
|
t.Fatalf("row1 query_text = %q, want NULL when text capture off", *qt)
|
||||||
|
}
|
||||||
|
if err := st.pool.QueryRow(ctx, `SELECT query_text FROM request_log WHERE id = $1`, withText.ID).Scan(&qt); err != nil {
|
||||||
|
t.Fatalf("read row2: %v", err)
|
||||||
|
}
|
||||||
|
if qt == nil || *qt != "hello" {
|
||||||
|
t.Fatalf("row2 query_text = %v, want \"hello\"", qt)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Trim with a cutoff one hour in the future → both rows (ts < cutoff) are gone.
|
||||||
|
if err := st.TrimRequestLog(time.Now().Add(time.Hour)); err != nil {
|
||||||
|
t.Fatalf("trim: %v", err)
|
||||||
|
}
|
||||||
|
var count int
|
||||||
|
if err := st.pool.QueryRow(ctx, `SELECT count(*) FROM request_log`).Scan(&count); err != nil {
|
||||||
|
t.Fatalf("count: %v", err)
|
||||||
|
}
|
||||||
|
if count != 0 {
|
||||||
|
t.Fatalf("after trim count = %d, want 0", count)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestStorePerUserUSDCap covers the optional per-user $ quota: a user is denied once
|
||||||
|
// their own committed+reserved spend reaches the cap, other users are unaffected, and a
|
||||||
|
// zero cap disables the check.
|
||||||
|
func TestStorePerUserUSDCap(t *testing.T) {
|
||||||
|
st := openTestStore(t)
|
||||||
|
defer st.Close()
|
||||||
|
const user = "@u:vojo.chat"
|
||||||
|
const perUserUSD = 1.0
|
||||||
|
|
||||||
|
if err := st.Settle(user, 0, CostBreakdown{Token: 0.9}); err != nil {
|
||||||
|
t.Fatalf("settle: %v", err)
|
||||||
|
}
|
||||||
|
// $0.9 < $1.0 cap → admitted.
|
||||||
|
if res, err := st.Reserve(user, 1_000_000, perUserUSD, 1e9, 0); err != nil || res != reserveOK {
|
||||||
|
t.Fatalf("under per-user USD: (%v,%v), want reserveOK", res, err)
|
||||||
|
}
|
||||||
|
// Push the user over the cap.
|
||||||
|
if err := st.Settle(user, 0, CostBreakdown{Token: 0.5}); err != nil { // now $1.4
|
||||||
|
t.Fatalf("settle: %v", err)
|
||||||
|
}
|
||||||
|
if res, err := st.Reserve(user, 1_000_000, perUserUSD, 1e9, 0); err != nil || res != reserveDeniedUser {
|
||||||
|
t.Fatalf("over per-user USD: (%v,%v), want reserveDeniedUser", res, err)
|
||||||
|
}
|
||||||
|
// A different user is unaffected by the first user's spend.
|
||||||
|
if res, _ := st.Reserve("@v:vojo.chat", 1_000_000, perUserUSD, 1e9, 0); res != reserveOK {
|
||||||
|
t.Fatal("other user must be unaffected by the first user's per-user USD")
|
||||||
|
}
|
||||||
|
// perUserUSD == 0 disables the check entirely (the big spender is admitted again).
|
||||||
|
if res, _ := st.Reserve(user, 1_000_000, 0, 1e9, 0); res != reserveOK {
|
||||||
|
t.Fatal("perUserUSD=0 must disable the per-user $ cap")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestStoreGroundingCap covers the durable grounding cap guard: it admits up to the
|
||||||
|
// cap, then denies; a non-positive cap denies everything.
|
||||||
|
func TestStoreGroundingCap(t *testing.T) {
|
||||||
|
st := openTestStore(t)
|
||||||
|
defer st.Close()
|
||||||
|
const cap = 3
|
||||||
|
for i := 0; i < cap; i++ {
|
||||||
|
if ok, err := st.IncrGroundingIfUnder(cap); err != nil || !ok {
|
||||||
|
t.Fatalf("grounding %d: (%v,%v), want admitted", i, ok, err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if ok, err := st.IncrGroundingIfUnder(cap); err != nil || ok {
|
||||||
|
t.Fatalf("over-cap grounding: (%v,%v), want denied", ok, err)
|
||||||
|
}
|
||||||
|
if ok, _ := st.IncrGroundingIfUnder(0); ok {
|
||||||
|
t.Fatal("cap 0 must deny everything (grounding off)")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestStoreGroundingCapConcurrent: the atomic check-increment must admit EXACTLY cap
|
||||||
|
// under a concurrent burst, so a spike can't blow past the $/1k overage. Run under -race.
|
||||||
|
func TestStoreGroundingCapConcurrent(t *testing.T) {
|
||||||
|
st := openTestStore(t)
|
||||||
|
defer st.Close()
|
||||||
|
const cap = 10
|
||||||
|
const goroutines = 50
|
||||||
|
var ok int64
|
||||||
|
var wg sync.WaitGroup
|
||||||
|
for i := 0; i < goroutines; i++ {
|
||||||
|
wg.Add(1)
|
||||||
|
go func() {
|
||||||
|
defer wg.Done()
|
||||||
|
if a, err := st.IncrGroundingIfUnder(cap); err == nil && a {
|
||||||
|
atomic.AddInt64(&ok, 1)
|
||||||
|
}
|
||||||
|
}()
|
||||||
|
}
|
||||||
|
wg.Wait()
|
||||||
|
if ok != cap {
|
||||||
|
t.Fatalf("concurrent grounding admitted %d, want exactly %d", ok, cap)
|
||||||
|
}
|
||||||
|
}
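The "admit exactly cap" property tested above depends on the check and the increment being one atomic step. A minimal in-memory analogue, assuming a compare-and-swap loop rather than whatever statement the store actually issues, might look like this (`incrIfUnder` and `burst` are hypothetical names):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// incrIfUnder increments counter and reports true only while the counter is
// below limit. A non-positive limit denies everything.
func incrIfUnder(counter *atomic.Int64, limit int64) bool {
	for {
		cur := counter.Load()
		if cur >= limit {
			return false
		}
		// Only the goroutine that still sees cur gets to take the slot.
		if counter.CompareAndSwap(cur, cur+1) {
			return true
		}
	}
}

// burst runs n concurrent admissions against limit and counts the admits.
func burst(n int, limit int64) int64 {
	var counter, admitted atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if incrIfUnder(&counter, limit) {
				admitted.Add(1)
			}
		}()
	}
	wg.Wait()
	return admitted.Load()
}

func main() {
	fmt.Println(burst(50, 10)) // prints 10
	fmt.Println(burst(50, 0))  // prints 0
}
```

A separate read followed by a separate write would let several goroutines observe the same pre-cap value and all increment, which is the spike the concurrent test guards against.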
|
||||||
|
|
||||||
func TestStoreWarnedEncrypted(t *testing.T) {
|
func TestStoreWarnedEncrypted(t *testing.T) {
|
||||||
st := openTestStore(t)
|
st := openTestStore(t)
|
||||||
const room = "!enc:vojo.chat"
|
const room = "!enc:vojo.chat"
|
||||||
|
|
|
||||||
124
apps/ai-bot/telemetry.go
Normal file
|
|
@ -0,0 +1,124 @@
|
||||||
|
package main
|
||||||
|
|
||||||
|
import "time"
|
||||||
|
|
||||||
|
// telemetry.go is the request_log analytics path: it captures route, cost, latency
|
||||||
|
// and outcome for each engaged request so the real $/day and route mix can be
|
||||||
|
// MEASURED (the build plan's whole "is the cascade worth it" question) instead of
|
||||||
|
// modelled. It is strictly off the answer path — gated by TELEMETRY_ENABLED, written
|
||||||
|
// in a recovered goroutine, and a write failure only logs a WARN. A request never
|
||||||
|
// fails to be answered because telemetry couldn't be recorded.
|
||||||
|
|
||||||
|
// Route names (also the request_log.route values). grok_direct is today's path; the
|
||||||
|
// rest land behind flags in later phases. "none" means no model ran (a skip or a
|
||||||
|
// limiter denial).
|
||||||
|
const (
|
||||||
|
routeNone = "none"
|
||||||
|
routeGrokDirect = "grok_direct"
|
||||||
|
routeTrivial = "trivial_direct"
|
||||||
|
routeWebThenGrok = "web_then_grok"
|
||||||
|
routeReason = "reason_then_grok"
|
||||||
|
)
|
||||||
|
|
||||||
|
// Degrade/skip reason strings (request_log.degraded). Stable tokens so the analytics
|
||||||
|
// can GROUP BY them.
|
||||||
|
const (
|
||||||
|
degradeEncrypted = "encrypted_room"
|
||||||
|
degradeMedia = "media"
|
||||||
|
degradeForeign = "foreign_room"
|
||||||
|
degradeEmpty = "empty_completion"
|
||||||
|
degradeSendFailed = "send_failed"
|
||||||
|
degradeReserveErr = "reserve_error"
|
||||||
|
degradeRouter = "router_failed"
|
||||||
|
degradeWeb = "web_failed"
|
||||||
|
degradeTrivial = "trivial_failed"
|
||||||
|
degradeGroundCap = "grounding_cap"
|
||||||
|
degradeReasoning = "reasoning_failed"
|
||||||
|
)
|
||||||
|
|
||||||
|
// telemetryTrimEvery bounds how often the retention trim runs — once per N writes,
|
||||||
|
// off the hot path, so the analytics table stays time-bounded without a separate
|
||||||
|
// lifecycle or a DELETE on every insert.
|
||||||
|
const telemetryTrimEvery = 200
|
||||||
|
|
||||||
|
// RequestLog is one analytics row (the request_log columns). Zero values are the
|
||||||
|
// "didn't apply" case — a grok_direct request leaves the cascade fields zero.
|
||||||
|
type RequestLog struct {
|
||||||
|
ID string
|
||||||
|
RoomID string
|
||||||
|
Sender string
|
||||||
|
Route string
|
||||||
|
RouterSource string // heuristic|classifier|default|forced|degraded
|
||||||
|
RouterConfidence float64
|
||||||
|
Models map[string]string // {"router":"…","final":"…"}
|
||||||
|
|
||||||
|
PromptTokens int
|
||||||
|
CachedTokens int
|
||||||
|
CompletionTokens int
|
||||||
|
Cost CostBreakdown
|
||||||
|
|
||||||
|
LatencyMS int
|
||||||
|
StageMS map[string]int // {"router":12,"web":1400,"final":2100}
|
||||||
|
|
||||||
|
Escalated bool
|
||||||
|
FallbackFired bool
|
||||||
|
CacheHit bool
|
||||||
|
CeilingHit bool
|
||||||
|
PerUserCapHit bool
|
||||||
|
PromptVersion string
|
||||||
|
ProviderRequestID string
|
||||||
|
Degraded string
|
||||||
|
Err string
|
||||||
|
OK bool
|
||||||
|
QueryText string // stored only when TELEMETRY_STORE_TEXT; stripped otherwise
|
||||||
|
}
|
||||||
|
|
||||||
|
// recordTelemetry persists a row off the answer path. No-op unless TELEMETRY_ENABLED.
|
||||||
|
// The query text is stripped unless TELEMETRY_STORE_TEXT, so message content never
|
||||||
|
// lands in the analytics table by default. Runs in a recovered goroutine and only
|
||||||
|
// logs failures, so it can never drop or delay the reply.
|
||||||
|
func (b *Bot) recordTelemetry(rl RequestLog) {
|
||||||
|
if !b.cfg.TelemetryEnabled {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
if !b.cfg.TelemetryStoreText {
|
||||||
|
rl.QueryText = ""
|
||||||
|
}
|
||||||
|
b.safego("telemetry", func() {
|
||||||
|
if err := b.st.InsertRequestLog(rl); err != nil {
|
||||||
|
b.log.Warn("request_log insert failed (non-fatal)", "id", rl.ID, "err", err)
|
||||||
|
}
|
||||||
|
b.maybeTrimTelemetry()
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
// recordSkip logs a request the bot was addressed by but couldn't fully serve before
|
||||||
|
// any model ran (encrypted/media/foreign). These are low-frequency, so a direct row
|
||||||
|
// (route=none + reason) keeps the "why no answer" visible without flooding the table
|
||||||
|
// with the common not-addressed drops, which are not logged (pre-claim best-effort).
|
||||||
|
func (b *Bot) recordSkip(ev *Event, reason string) {
|
||||||
|
b.recordTelemetry(RequestLog{
|
||||||
|
ID: ev.EventID,
|
||||||
|
RoomID: ev.RoomID,
|
||||||
|
Sender: ev.Sender,
|
||||||
|
Route: routeNone,
|
||||||
|
RouterSource: "default",
|
||||||
|
PromptVersion: b.promptVersion,
|
||||||
|
Degraded: reason,
|
||||||
|
OK: false,
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
// maybeTrimTelemetry runs the time-based retention trim once per telemetryTrimEvery
|
||||||
|
// writes. Best-effort and off the hot path (called from the telemetry goroutine).
|
||||||
|
func (b *Bot) maybeTrimTelemetry() {
|
||||||
|
if b.cfg.TelemetryRetention <= 0 {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
if b.telemetryWrites.Add(1)%telemetryTrimEvery != 0 {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
if err := b.st.TrimRequestLog(time.Now().Add(-b.cfg.TelemetryRetention)); err != nil {
|
||||||
|
b.log.Warn("request_log trim failed (non-fatal)", "err", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
71
apps/ai-bot/telemetry_test.go
Normal file
|
|
@ -0,0 +1,71 @@
|
||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"io"
|
||||||
|
"log/slog"
|
||||||
|
"testing"
|
||||||
|
"time"
|
||||||
|
)
|
||||||
|
|
||||||
|
// newTestBot builds a Bot with just the fields the telemetry path needs — no network,
|
||||||
|
// so it sidesteps NewBot's identity check.
|
||||||
|
func newTestBot(st *Store, cfg *Config) *Bot {
|
||||||
|
return &Bot{cfg: cfg, st: st, log: slog.New(slog.NewTextHandler(io.Discard, nil)), promptVersion: "testv"}
|
||||||
|
}
|
||||||
|
|
||||||
|
func requestLogCount(t *testing.T, st *Store) int {
|
||||||
|
t.Helper()
|
||||||
|
ctx, cancel := opContext()
|
||||||
|
defer cancel()
|
||||||
|
var n int
|
||||||
|
if err := st.pool.QueryRow(ctx, `SELECT count(*) FROM request_log`).Scan(&n); err != nil {
|
||||||
|
t.Fatalf("count: %v", err)
|
||||||
|
}
|
||||||
|
return n
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestRecordSkipWritesRow proves the early-return telemetry path actually records a
|
||||||
|
// row (route=none + the skip reason) when TELEMETRY_ENABLED is on. The write is async,
|
||||||
|
// so poll briefly.
|
||||||
|
func TestRecordSkipWritesRow(t *testing.T) {
|
||||||
|
st := openTestStore(t)
|
||||||
|
defer st.Close()
|
||||||
|
b := newTestBot(st, &Config{TelemetryEnabled: true})
|
||||||
|
|
||||||
|
ev := &Event{EventID: "$skip-1", RoomID: "!r:vojo.chat", Sender: "@u:vojo.chat"}
|
||||||
|
b.recordSkip(ev, degradeMedia)
|
||||||
|
|
||||||
|
deadline := time.Now().Add(2 * time.Second)
|
||||||
|
for requestLogCount(t, st) == 0 && time.Now().Before(deadline) {
|
||||||
|
time.Sleep(20 * time.Millisecond)
|
||||||
|
}
|
||||||
|
if n := requestLogCount(t, st); n != 1 {
|
||||||
|
t.Fatalf("telemetry rows = %d, want 1", n)
|
||||||
|
}
|
||||||
|
ctx, cancel := opContext()
|
||||||
|
defer cancel()
|
||||||
|
var route, degraded string
|
||||||
|
if err := st.pool.QueryRow(ctx,
|
||||||
|
`SELECT route, degraded FROM request_log WHERE id = $1`, ev.EventID).Scan(&route, &degraded); err != nil {
|
||||||
|
t.Fatalf("read: %v", err)
|
||||||
|
}
|
||||||
|
if route != routeNone || degraded != degradeMedia {
|
||||||
|
t.Fatalf("row = (%q,%q), want (none, media)", route, degraded)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestTelemetryDisabledWritesNothing proves the default (TELEMETRY_ENABLED off) adds
|
||||||
|
// no write path — strict "cascade-off == today".
|
||||||
|
func TestTelemetryDisabledWritesNothing(t *testing.T) {
|
||||||
|
st := openTestStore(t)
|
||||||
|
defer st.Close()
|
||||||
|
b := newTestBot(st, &Config{TelemetryEnabled: false})
|
||||||
|
|
||||||
|
b.recordSkip(&Event{EventID: "$skip-2", RoomID: "!r:vojo.chat", Sender: "@u:vojo.chat"}, degradeMedia)
|
||||||
|
|
||||||
|
// Give any (incorrect) async write time to land, then assert nothing was written.
|
||||||
|
time.Sleep(200 * time.Millisecond)
|
||||||
|
if n := requestLogCount(t, st); n != 0 {
|
||||||
|
t.Fatalf("telemetry rows = %d, want 0 (TELEMETRY_ENABLED off)", n)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
@ -1,6 +1,17 @@
|
||||||
package main
|
package main
|
||||||
|
|
||||||
import "sync"
|
import (
|
||||||
|
"hash/fnv"
|
||||||
|
"sync"
|
||||||
|
)
|
||||||
|
|
||||||
|
// hashString is a cheap, stable 32-bit hash (FNV-1a). Used for opaque, non-identifying
|
||||||
|
// derived ids (e.g. the prompt-cache conv id) — not for security.
|
||||||
|
func hashString(s string) uint32 {
|
||||||
|
h := fnv.New32a()
|
||||||
|
_, _ = h.Write([]byte(s))
|
||||||
|
return h.Sum32()
|
||||||
|
}
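A short usage sketch for hashString follows. The `convID` helper and its `conv-%08x` format are assumptions made for illustration; the diff only says the hash feeds an opaque prompt-cache conv id, not how that id is formatted.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashString is the FNV-1a 32-bit hash from the diff above.
func hashString(s string) uint32 {
	h := fnv.New32a()
	_, _ = h.Write([]byte(s))
	return h.Sum32()
}

// convID is a hypothetical helper: it derives a stable, opaque id from a
// room id without exposing the room id itself. Not a security boundary.
func convID(roomID string) string {
	return fmt.Sprintf("conv-%08x", hashString(roomID))
}

func main() {
	// The same input always maps to the same id.
	fmt.Println(convID("!r:vojo.chat") == convID("!r:vojo.chat")) // prints true
	// Standard FNV-1a 32-bit test vector for "a".
	fmt.Printf("%08x\n", hashString("a")) // prints e40c292c
}
```

A 32-bit hash will collide across a large enough set of inputs, so this suits cache keys and grouping rather than anything that needs uniqueness.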
|
||||||
|
|
||||||
// lruSet is a bounded insertion-ordered string set used for event-id dedup and
|
// lruSet is a bounded insertion-ordered string set used for event-id dedup and
|
||||||
// tracking our own sent event ids. Oldest entries evict once cap is reached.
|
// tracking our own sent event ids. Oldest entries evict once cap is reached.
|
||||||
|
|
|
||||||
223
apps/ai-bot/web.go
Normal file
|
|
@ -0,0 +1,223 @@
|
||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"bytes"
|
||||||
|
"context"
|
||||||
|
"encoding/json"
|
||||||
|
"errors"
|
||||||
|
"fmt"
|
||||||
|
"io"
|
||||||
|
"log/slog"
|
||||||
|
"net/http"
|
||||||
|
)
|
||||||
|
|
||||||
|
// web.go is the pluggable web-freshness layer (Phase 3). A WebProvider fetches a
|
||||||
|
// grounded factual digest + source URLs for a query; the cascade then has Grok
|
||||||
|
// synthesise the final answer in voice from that digest. Two providers, chosen by
|
||||||
|
// WEB_PROVIDER:
|
||||||
|
//
|
||||||
|
// - grok_web_search (DEFAULT): the xAI Agent Tools `web_search` tool on the Responses
|
||||||
|
// API (/v1/responses). NB the older chat/completions Live Search `search_parameters`
|
||||||
|
// mechanism was RETIRED by xAI (now 410 Gone), and the web_search tool is not on
|
||||||
|
// chat/completions — hence the Responses endpoint. Billed $5/1k tool calls + tokens.
|
||||||
|
// - gemini_grounding: Gemini native v1beta google_search. Cheaper, but Gemini-3 only
|
||||||
|
// and silently ungrounds otherwise (F-EXT-3) — so it runs behind a citations
|
||||||
|
// verify-gate and degrades if it fails.
|
||||||
|
//
|
||||||
|
// The web call is bounded by a per-stage timeout (and gemini_grounding additionally by a
|
||||||
|
// durable daily cap), and either provider failing degrades the request to grok_direct
|
||||||
|
// with a staleness hedge (never silence, never stale-as-fresh).
|
||||||
|
//
|
||||||
|
// The grok_web_search Responses-API request/response shape was VALIDATED live against
|
||||||
|
// /v1/responses (2026-06-01): output[].type=="message" → content[].output_text + inline
|
||||||
|
// url_citation annotations; usage carries input/output tokens, cached subset, and the
|
||||||
|
// web_search_calls count (one request can search several times — each billed). The
|
||||||
|
// computed cost matched the API's own cost_in_usd_ticks to 4 dp. A parse miss still
|
||||||
|
// degrades safely (empty digest → grok_direct).
|
||||||
|
const (
|
||||||
|
webProviderGrokWebSearch = "grok_web_search"
|
||||||
|
webProviderGeminiGrounding = "gemini_grounding"
|
||||||
|
|
||||||
|
// grokWebSearchPerCall is xAI's Agent Tools fee: $5 per 1,000 web_search tool calls.
|
||||||
|
grokWebSearchPerCall = 5.0 / 1000.0
|
||||||
|
|
||||||
|
// maxWebSearchCalls bounds the per-call fee in the reservation envelope (one Responses
|
||||||
|
// request can search several times; the actual count is billed exactly at settle).
|
||||||
|
maxWebSearchCalls = 4
|
||||||
|
)
|
||||||
|
|
||||||
|
// errGroundingCapped signals the daily web/grounded-prompt cap was hit, so the caller
|
||||||
|
// degrades (with a hedge) rather than paying past the cap.
|
||||||
|
var errGroundingCapped = errors.New("web grounding daily cap reached")
|
||||||
|
|
||||||
|
// WebContext is the result of a web fetch: a factual digest to feed the final model,
|
||||||
|
// the sources behind it, the fetch's own token usage, and the cost the fetch incurred
|
||||||
|
// (kept separate from the final synthesis tokens so each books to its own ledger
|
||||||
|
// column). Cost is populated even when Digest is empty/failed, because the call was
|
||||||
|
// still billed — the caller books it before degrading (§8.1 partial cascade).
|
||||||
|
type WebContext struct {
|
||||||
|
Digest string
|
||||||
|
Citations []string
|
||||||
|
Usage Usage
|
||||||
|
Cost CostBreakdown
|
||||||
|
}
|
||||||
|
|
||||||
|
// WebProvider fetches grounded facts for a query. Stateless. It returns its cost in the
|
||||||
|
// WebContext even on error (the call was billed), and an error when the digest is
|
||||||
|
// unusable so the caller can degrade.
|
||||||
|
type WebProvider interface {
|
||||||
|
Fetch(ctx context.Context, query string) (WebContext, error)
|
||||||
|
}
|
||||||
|
|
||||||
|
// --- grok_web_search (default): xAI Agent Tools web_search on the Responses API -------
|
||||||
|
|
||||||
|
type grokWebSearch struct {
|
||||||
|
base string
|
||||||
|
key string
|
||||||
|
model string
|
||||||
|
cfg *Config
|
||||||
|
httpc *http.Client
|
||||||
|
logger *slog.Logger
|
||||||
|
}
|
||||||
|
|
||||||
|
func newGrokWebSearch(cfg *Config, logger *slog.Logger) *grokWebSearch {
|
||||||
|
return &grokWebSearch{
|
||||||
|
base: cfg.XAIBaseURL, key: cfg.XAIAPIKey, model: cfg.XAIModel,
|
||||||
|
cfg: cfg, httpc: &http.Client{}, logger: logger,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
type grokResponsesRequest struct {
|
||||||
|
Model string `json:"model"`
|
||||||
|
Input string `json:"input"`
|
||||||
|
Tools []openAITool `json:"tools"`
|
||||||
|
// Keep the fetch fast/cheap when the operator runs a unified model with effort
|
||||||
|
// "none"; empty → not sent (provider default). Validated against /v1/responses.
|
||||||
|
ReasoningEffort string `json:"reasoning_effort,omitempty"`
|
||||||
|
}
|
||||||
|
|
||||||
|
// grokResponsesResponse maps the xAI Responses API shape (verified live 2026-06-01):
|
||||||
|
// output[] carries reasoning/web_search_call/message items; the message item's content
|
||||||
|
// has output_text (with inline url_citation annotations); usage reports tokens, the
|
||||||
|
// cached subset, and the count of server-side web_search calls (a single request can
|
||||||
|
// make several, each billed).
|
||||||
|
type grokResponsesResponse struct {
|
||||||
|
Output []struct {
|
||||||
|
Type string `json:"type"`
|
||||||
|
Content []struct {
|
||||||
|
Type string `json:"type"`
|
||||||
|
Text string `json:"text"`
|
||||||
|
Annotations []struct {
|
||||||
|
Type string `json:"type"`
|
||||||
|
URL string `json:"url"`
|
||||||
|
} `json:"annotations"`
|
||||||
|
} `json:"content"`
|
||||||
|
} `json:"output"`
|
||||||
|
Usage struct {
|
||||||
|
InputTokens int `json:"input_tokens"`
|
||||||
|
OutputTokens int `json:"output_tokens"`
|
||||||
|
InputTokensDetails struct {
|
||||||
|
CachedTokens int `json:"cached_tokens"`
|
||||||
|
} `json:"input_tokens_details"`
|
||||||
|
ServerSideToolUsageDetails struct {
|
||||||
|
WebSearchCalls int `json:"web_search_calls"`
|
||||||
|
} `json:"server_side_tool_usage_details"`
|
||||||
|
} `json:"usage"`
|
||||||
|
}
|
||||||
|
|
||||||
|
func (p *grokWebSearch) Fetch(ctx context.Context, query string) (WebContext, error) {
|
||||||
|
body, err := json.Marshal(grokResponsesRequest{
|
||||||
|
Model: p.model, Input: query, Tools: []openAITool{{Type: "web_search"}},
|
||||||
|
ReasoningEffort: p.cfg.GrokReasoningEffort,
|
||||||
|
})
|
||||||
|
if err != nil {
|
||||||
|
return WebContext{}, err
|
||||||
|
}
|
||||||
|
req, err := http.NewRequestWithContext(ctx, http.MethodPost, p.base+"/responses", bytes.NewReader(body))
|
||||||
|
if err != nil {
|
||||||
|
return WebContext{}, err
|
||||||
|
}
|
||||||
|
req.Header.Set("Content-Type", "application/json")
|
||||||
|
req.Header.Set("Authorization", "Bearer "+p.key)
|
||||||
|
|
||||||
|
resp, err := p.httpc.Do(req)
|
||||||
|
if err != nil {
|
||||||
|
return WebContext{}, err
|
||||||
|
}
|
||||||
|
defer resp.Body.Close()
|
||||||
|
data, _ := io.ReadAll(resp.Body)
|
||||||
|
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
|
||||||
|
return WebContext{}, fmt.Errorf("grok web search http %d: %s", resp.StatusCode, snippet(data))
|
||||||
|
}
|
||||||
|
var out grokResponsesResponse
|
||||||
|
if err := json.Unmarshal(data, &out); err != nil {
|
||||||
|
return WebContext{}, fmt.Errorf("grok web search decode: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
var digest string
|
||||||
|
var citations []string
|
||||||
|
for _, item := range out.Output {
|
||||||
|
if item.Type != "message" {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
for _, c := range item.Content {
|
||||||
|
if c.Type == "output_text" {
|
||||||
|
digest += c.Text
|
||||||
|
}
|
||||||
|
for _, a := range c.Annotations {
|
||||||
|
if a.Type == "url_citation" && a.URL != "" {
|
||||||
|
citations = append(citations, a.URL)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
usage := Usage{
|
||||||
|
PromptTokens: out.Usage.InputTokens,
|
||||||
|
CachedTokens: out.Usage.InputTokensDetails.CachedTokens,
|
||||||
|
CompletionTokens: out.Usage.OutputTokens,
|
||||||
|
}
|
||||||
|
	// Cost = the call's tokens + the $5/1k fee times the ACTUAL number of web_search
	// calls the request made (one request can search several times). Booked even when the
	// digest is empty (the 2xx was billed), so the caller accounts for it before degrading.
	// Cross-checked live against the API's own cost_in_usd_ticks — matched to 4 dp.
	wc := WebContext{
		Digest:    digest,
		Citations: citations,
		Usage:     usage,
		Cost: CostBreakdown{
			WebTool: computeUSD(p.model, usage, p.cfg) +
				float64(out.Usage.ServerSideToolUsageDetails.WebSearchCalls)*grokWebSearchPerCall,
		},
	}
	if digest == "" {
		return wc, fmt.Errorf("grok web search: empty result")
	}
	return wc, nil
}
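
The cost comment in the function above reduces to a small formula: the token cost of the call plus a flat per-call fee multiplied by the number of searches the request actually reported. A minimal sketch, assuming the $5/1k fee quoted in that comment; `webToolCost`, `perCallUSD`, and the token-cost input are hypothetical stand-ins for illustration and are not the commit's `computeUSD` or `grokWebSearchPerCall`:

```go
package main

import "fmt"

// perCallUSD is the $5 per 1,000 web_search calls fee quoted in the comment
// above, expressed per single call (assumed value for this sketch).
const perCallUSD = 5.0 / 1000

// webToolCost is a hypothetical helper: tokenUSD is whatever the token pricing
// yields for the call, calls is the number of searches the request reported.
func webToolCost(tokenUSD float64, calls int) float64 {
	return tokenUSD + float64(calls)*perCallUSD
}

func main() {
	// One request that searched three times, with an assumed $0.0012 of tokens.
	fmt.Printf("%.4f\n", webToolCost(0.0012, 3))
}
```

Because the fee is multiplied by the reported call count rather than charged once per request, a single request that searches several times costs proportionally more, which is why the original books the cost even when the digest comes back empty.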

// --- gemini_grounding (Gemini-3 native only) --------------------------------------

type geminiGrounding struct {
	gem *geminiClient
	st  *Store
	cfg *Config
}

func (p *geminiGrounding) Fetch(ctx context.Context, query string) (WebContext, error) {
	// Durable, atomic daily cap FIRST: a grounded prompt is billed whether or not it
	// grounds, and the per-prompt overage ($35/1k on 2.5) is the cost this guard exists
	// to bound. Admit against the cap before spending. (grok_web_search needs no such
	// cap — its $5/1k per-call fee is fully reserved per request and bounded by the
	// per-user request cap + global ceiling.)
	if ok, err := p.st.IncrGroundingIfUnder(p.cfg.WebGroundingDailyCap); err != nil {
		return WebContext{}, err
	} else if !ok {
		return WebContext{}, errGroundingCapped
	}
	res, err := p.gem.groundedSearch(ctx, query) // errors (incl. no-citations) → caller degrades
	cost := CostBreakdown{Grounding: computeUSD(p.cfg.GeminiModel, res.Usage, p.cfg)}
	if err != nil {
		return WebContext{Cost: cost, Usage: res.Usage}, err
	}
	return WebContext{Digest: res.Digest, Citations: res.Citations, Usage: res.Usage, Cost: cost}, nil
}
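
The "admit against the cap before spending" rule above depends on the check and the increment being one atomic step. The commit's `IncrGroundingIfUnder` lives in the Postgres-backed store and its body is not shown in this hunk, so the following is only an in-memory illustration of the same idea; `dailyCap` and `incrIfUnder` are hypothetical names:

```go
package main

import (
	"fmt"
	"sync"
)

// dailyCap is a hypothetical in-memory stand-in for a durable per-day counter.
type dailyCap struct {
	mu    sync.Mutex
	day   string
	count int
}

// incrIfUnder admits one call for the given day if the count is still below
// limit. The check and the increment happen under one lock, so two concurrent
// callers cannot both take the last slot.
func (c *dailyCap) incrIfUnder(day string, limit int) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.day != day { // new day: reset the counter
		c.day, c.count = day, 0
	}
	if c.count >= limit {
		return false
	}
	c.count++
	return true
}

func main() {
	var c dailyCap
	for i := 0; i < 3; i++ {
		fmt.Println(c.incrIfUnder("2025-01-01", 2))
	}
}
```

Unlike this sketch, a durable store keeps the count across restarts, which matters when the thing being bounded is real spend.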
@ -1,171 +0,0 @@
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
	"math/rand"
	"net/http"
	"time"
)

// XAIClient talks the OpenAI-compatible Chat Completions endpoint at
// {base}/chat/completions with a Bearer key.
type XAIClient struct {
	base   string
	key    string
	http   *http.Client
	maxTry int
	log    *slog.Logger
}

func NewXAIClient(base, key string, logger *slog.Logger) *XAIClient {
	return &XAIClient{
		base:   base,
		key:    key,
		http:   &http.Client{},
		maxTry: 3,
		log:    logger,
	}
}

type xaiMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type xaiRequest struct {
	Model       string       `json:"model"`
	Messages    []xaiMessage `json:"messages"`
	MaxTokens   int          `json:"max_tokens"`
	Temperature float64      `json:"temperature"`
	Stream      bool         `json:"stream"`
}

type xaiUsage struct {
	PromptTokens        int `json:"prompt_tokens"`
	CompletionTokens    int `json:"completion_tokens"`
	PromptTokensDetails struct {
		CachedTokens int `json:"cached_tokens"`
	} `json:"prompt_tokens_details"`
}

type xaiResponse struct {
	Choices []struct {
		Message struct {
			Content string `json:"content"`
		} `json:"message"`
		FinishReason string `json:"finish_reason"`
	} `json:"choices"`
	Usage xaiUsage `json:"usage"`
}

func (r *xaiResponse) Text() string {
	if len(r.Choices) == 0 {
		return ""
	}
	return r.Choices[0].Message.Content
}

// Complete calls Chat Completions with retry on transient failures (429 / 5xx /
// network timeout, exponential backoff + jitter). Non-retryable 4xx fail
// immediately. On exhaustion the caller refunds the reserved request and notifies
// the user, so a transient failure is never silently swallowed (F6).
func (x *XAIClient) Complete(ctx context.Context, model string, msgs []xaiMessage, maxTokens int, temp float64) (*xaiResponse, error) {
	reqBody := xaiRequest{
		Model:       model,
		Messages:    msgs,
		MaxTokens:   maxTokens,
		Temperature: temp,
		Stream:      false,
	}
	payload, err := json.Marshal(reqBody)
	if err != nil {
		return nil, err
	}

	var lastErr error
	for attempt := 0; attempt < x.maxTry; attempt++ {
		if attempt > 0 {
			// 0.5s, 1s, 2s … capped at 8s, plus up to 250ms jitter.
			backoff := time.Duration(500<<uint(attempt-1)) * time.Millisecond
			if backoff > 8*time.Second {
				backoff = 8 * time.Second
			}
			backoff += time.Duration(rand.Intn(250)) * time.Millisecond
			select {
			case <-ctx.Done():
				return nil, ctx.Err()
			case <-time.After(backoff):
			}
		}

		resp, retryable, err := x.attempt(ctx, payload)
		if err == nil {
			return resp, nil
		}
		lastErr = err
		if ctx.Err() != nil {
			return nil, ctx.Err()
		}
		if !retryable {
			return nil, err
		}
		if x.log != nil {
			x.log.Warn("xai attempt failed, will retry", "attempt", attempt+1, "max", x.maxTry, "err", err)
		}
	}
	return nil, fmt.Errorf("xai: exhausted %d attempts: %w", x.maxTry, lastErr)
}

// attempt performs one HTTP call. It returns retryable=true for 429/5xx and
// network errors, false for other non-2xx (terminal 4xx).
func (x *XAIClient) attempt(ctx context.Context, payload []byte) (*xaiResponse, bool, error) {
	// Per-attempt deadline so a hung connection doesn't block the whole loop.
	attemptCtx, cancel := context.WithTimeout(ctx, 60*time.Second)
	defer cancel()

	req, err := http.NewRequestWithContext(attemptCtx, http.MethodPost, x.base+"/chat/completions", bytes.NewReader(payload))
	if err != nil {
		return nil, false, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+x.key)

	resp, err := x.http.Do(req)
	if err != nil {
		// Network error / timeout — retryable (unless the parent ctx is done).
		return nil, ctx.Err() == nil, err
	}
	defer resp.Body.Close()
	data, _ := io.ReadAll(resp.Body)

	if resp.StatusCode == http.StatusTooManyRequests || resp.StatusCode >= 500 {
		return nil, true, fmt.Errorf("xai http %d: %s", resp.StatusCode, snippet(data))
	}
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return nil, false, fmt.Errorf("xai http %d: %s", resp.StatusCode, snippet(data))
	}

	var out xaiResponse
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, false, fmt.Errorf("xai decode: %w", err)
	}
	// A 2xx is a billed call even when the model returns empty content (content
	// filter, finish_reason=length with no text, or no choices). Return it as a
	// success so the caller books the real cost via Reconcile instead of refunding
	// the slot and losing the spend — which would let empty replies bypass BOTH the
	// per-user cap and the global ceiling. The caller just won't send an empty body.
	return &out, false, nil
}

func snippet(b []byte) string {
	const max = 300
	if len(b) > max {
		return string(b[:max]) + "…"
	}
	return string(b)
}
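
`snippet` truncates by byte count, so an error body containing multi-byte UTF-8 text could be cut in the middle of a character. A hedged sketch of a rune-aware variant, for illustration only; `snippetUTF8` and its `max` parameter are hypothetical and not part of the commit:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// snippetUTF8 truncates b to at most max bytes, backing up to the nearest
// rune boundary so the cut never lands inside a multi-byte character.
func snippetUTF8(b []byte, max int) string {
	if len(b) <= max {
		return string(b)
	}
	cut := max
	// b[cut] is the first byte that would be dropped; if it is a UTF-8
	// continuation byte, the cut would split a rune, so step back.
	for cut > 0 && !utf8.RuneStart(b[cut]) {
		cut--
	}
	return string(b[:cut]) + "…"
}

func main() {
	fmt.Println(snippetUTF8([]byte("héllo"), 2))
}
```

For log snippets the byte-based version is usually adequate; the variant only matters when the truncated text is displayed somewhere that rejects invalid UTF-8.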