feat(ai-bot): route Vojo product questions to a curated KB via the classifier's about_project signal so Grok answers from facts, not hallucination

heaven 2026-06-04 00:45:58 +03:00
parent ad730b1538
commit 2b07b110dd
15 changed files with 593 additions and 22 deletions


@ -169,6 +169,8 @@ offline-eval gate (misroute < 2-3% AND measured saving > the second provider's c
| `WEB_PARANOID` | false | **the single switch that activates epistemic grounding.** Beyond freshness words, it unlocks the classifier-driven web arms (needs_web≥0.55, obscure entity, time-sensitive, lookup-hint) — i.e. it routes checkable-fact lookups (a film's cast, a date) to grounding instead of letting Grok answer from memory and hallucinate. With it off, web routing is freshness-only (= today), so turning on the classifier alone is web-routing-neutral. **Requires `WEB_PROVIDER=gemini_grounding`** (refuses to boot on `grok_web_search`, which has no daily cap). |
| `WEB_GROUNDING_DAILY_CAP` | 450 | durable per-day cap for `gemini_grounding` before degrading. Google gives **1,500 grounded requests/day free** (shared Flash/Flash-Lite, both free & paid tiers; verified ai.google.dev/pricing); keep the cap **under 1,500** so grounding stays free (token-only). Must be > 0 for `gemini_grounding` (a non-positive cap silently disables grounding → refuses to boot). |
| `GEMINI_GROUNDING_PER_PROMPT_USD` | 0.035 | the per-grounded-prompt FEE booked into the ledger so the `DAILY_USD_CEILING` accounts for it. The fee is **$35/1k = $0.035** but ONLY applies **above** the 1,500/day free allowance. So while `WEB_GROUNDING_DAILY_CAP ≤ 1,500` (e.g. the 450 default) grounding never hits the fee → **set `0`** (the bot then books only token cost, which is correct). Set `0.035` only if you raise the cap above 1,500/day, so the ceiling throttles before silently overrunning on requests #1501+. |
| `PROJECT_KB_ENABLED` | false | **project_then_grok route** — answers questions about the **Vojo product itself** (features/how-to/limits/privacy) from a curated KB instead of Grok's empty memory (Grok doesn't know Vojo) or the web (Google doesn't either). Gated by the classifier's `about_project` signal — the classifier is the context-aware judge (it sees the conversation, so it resolves follow-ups like "Про этот" → the app that a bare-message regex can't), and a false positive is cheap (the entity-scoped note keeps Grok answering the real question). The KB is injected as a system note with an **entity-scoped** anti-hallucination instruction (Vojo claims from the KB only; "I don't have that" when absent; general parts answered normally). Beats every web arm. **Requires `ROUTER_CLASSIFIER_ENABLED`** (+ transitively `ROUTER_ENABLED` + Gemini key). One Grok call (no extra model call) → `reserveEstimate` unchanged; the KB adds ≤~2,500 input tokens on top of the capped prompt (a bounded slight under-reservation; `Settle` books the actual). |
| `PROJECT_KB_PATH` | `prompts/vojo_kb.txt` | path to the curated KB text file (operator data, **not** code), loaded once at startup like `SYSTEM_PROMPT_PATH` (no hot-reload — edit + restart). **Defaults to the KB baked into the image**, so enabling the route needs only `PROJECT_KB_ENABLED=true`. An empty/missing file or a KB over ~2,500 tokens **refuses to boot** (fail-closed). Format: terse bullets, one fact per line, keep negations explicit. |
| `REASONING_ENABLED` | false | manual "think harder" route on `REASONING_TRIGGER` |
| `REASONING_TRIGGER` | `подумай глубже` | trigger phrase |
| `REASONING_MODEL` | `grok-4.3` | a **reasoning-capable** model (the default `grok-4.20-non-reasoning` rejects `reasoning_effort`) |


@ -84,13 +84,22 @@ func NewBot(ctx context.Context, cfg *Config, logger *slog.Logger) (*Bot, error)
return nil, err
}
// prompt_version is a stable hash logged with each request so prompt changes show in
// telemetry. Fold the project KB in WHEN PRESENT so a KB revision is visible too; with the
// route off (KB ""), promptForVersion is exactly cfg.SystemPrompt, so the hash is unchanged
// from before this feature and flags-off telemetry is byte-identical.
promptForVersion := cfg.SystemPrompt
if cfg.ProjectKB != "" {
promptForVersion += "\x00" + cfg.ProjectKB
}
b := &Bot{
cfg: cfg,
log: logger,
mx: mx,
llm: llm,
st: st,
promptVersion: fmt.Sprintf("%08x", hashString(cfg.SystemPrompt)),
promptVersion: fmt.Sprintf("%08x", hashString(promptForVersion)),
seen: newLRUSet(5000),
botSent: newLRUSet(5000),
meta: make(map[string]*roomMeta),
@ -495,6 +504,7 @@ func (b *Bot) respond(ctx context.Context, roomID, threadRoot string, isDM bool,
rl.TimeSensitive = res.decision.TimeSensitive
rl.Verifiable = res.decision.Verifiable
rl.TrivialScore = res.decision.TrivialScore
rl.AboutProject = res.decision.AboutProject
rl.WebDecidedBy = res.decision.WebDecidedBy
rl.RewriteUsed = res.rewriteUsed
rl.WebGrounded = res.webGrounded
@ -574,6 +584,15 @@ func (b *Bot) respond(ctx context.Context, roomID, threadRoot string, isDM bool,
// the reservation estimate, so the two never disagree about a request's size.
const maxPromptTokens = 8000
// maxProjectKBTokens caps the curated project KB at startup. The KB is injected via
// insertSystemNote AFTER buildContext truncates history, so an oversized KB could push the
// real prompt past maxPromptTokens (the reservation estimate). It is a COARSE sanity backstop,
// not a precise budget: estimateTokens is runes/4 (ASCII-biased) and a Cyrillic KB tokenizes
// denser, so a KB near this cap can be more real tokens than estimated — fine, because a real
// curated single-product FAQ is ~250-800 tokens (far under the cap) and the prompt stays well
// within model limits. Enforced in main.go (fail-fast at startup).
const maxProjectKBTokens = 2500
// estimateUSD is the conservative max-cost reserved for a route before the call, so
// the global ceiling can count an in-flight request (§8.1). It prices a full prompt
// (maxPromptTokens) plus the max output at the model's non-cached rates — an upper-ish


@ -139,6 +139,22 @@ func (b *Bot) generate(ctx context.Context, body string, msgs []Message, convID
b.degradeTo(&res, degradeReasoning)
}
}
case routeProject:
// Combine emits this route on the classifier's about_project signal regardless of the flag; the flag
// gates EXECUTION here (mirroring WebEnabled). With it off, the case is a no-op and
// we fall through to grok_direct — byte-identical to today, no KB injected.
if b.cfg.ProjectKBEnabled {
if err := b.genProjectThenGrok(ctx, msgs, convID, &res); err == nil {
return res, nil
} else {
b.log.WarnContext(ctx, "project route failed; degrading to grok_direct", "err", err)
b.degradeTo(&res, degradeProject)
// The KB couldn't be voiced, so a plain grok_direct retry would answer about
// Vojo from empty memory (the hallucination this route exists to stop). Inject
// an abstain hedge so even the degrade stays honest about product specifics.
finalMsgs = projectAbstainMessages(msgs)
}
}
}
// grok_direct — the default route AND the universal fallback. The only path that
@ -230,6 +246,37 @@ func (b *Bot) genReason(ctx context.Context, msgs []Message, convID string, res
return nil
}
// genProjectThenGrok answers a question about the Vojo product by injecting the curated KB
// as a system note (the same insertSystemNote mechanism the web route uses, but the
// "digest" is the operator-authored static cfg.ProjectKB, not a web fetch) and having Grok
// voice the answer strictly from it. ONE Grok call at XAIModel — no extra model call — so
// reserveEstimate is unchanged (§7). The KB adds ≤maxProjectKBTokens of input on top of the
// already-capped prompt, a bounded slight under-reservation (like the web route's digest);
// Settle books the authoritative actual, so committed accounting stays honest. An empty reply
// is a failure so the caller degrades (with the abstain hedge) rather than sending nothing.
func (b *Bot) genProjectThenGrok(ctx context.Context, msgs []Message, convID string, res *genResult) error {
t := time.Now()
resp, err := b.llm.Complete(ctx, LLMRequest{
Model: b.cfg.XAIModel,
Messages: projectKBMessages(msgs, b.cfg.ProjectKB),
MaxTokens: b.cfg.MaxOutTok,
Temperature: b.cfg.XAITemp,
ConvID: convID,
ReasoningEffort: b.cfg.GrokReasoningEffort, // same voice/effort as grok_direct
})
res.stageMS["final"] = msSince(t)
if err != nil {
return err
}
if strings.TrimSpace(resp.Text) == "" {
return fmt.Errorf("project: empty reply")
}
res.route, res.finalModel = routeProject, b.cfg.XAIModel
res.text, res.usage, res.providerID = resp.Text, resp.Usage, resp.ProviderRequestID
res.cost.Token += computeUSD(b.cfg.XAIModel, resp.Usage, b.cfg)
return nil
}
// webStageTimeout bounds the web/grounding fetch independently of the overall budget
// (§8.2.2): a slow search must not eat the whole request before synthesis.
const webStageTimeout = 15 * time.Second
@ -351,6 +398,30 @@ func factualAbstainMessages(base []Message) []Message {
return insertSystemNote(base, "Couldn't verify the facts via the web. If the answer depends on specific names, dates, years, numbers, or a cast, honestly say you're not sure of the exact details and may be wrong; do NOT pass a guess off as fact.")
}
// projectKBMessages injects the curated Vojo product KB as a system note (index 1, like the
// web digest) for the project_then_grok route. The anti-hallucination instruction is
// ENTITY-SCOPED (§6.2): Vojo claims must come ONLY from the FACTS, but the general
// (non-Vojo) part of a mixed question may still be answered from Grok's own knowledge — so
// "answer only from the KB" never lobotomises Grok on the general half or launders its
// guesses as KB-sanctioned. The <FACTS> delimiters + "prefer wording from FACTS" are the
// validated tagged-context / copy-from-context levers; the explicit abstain clause ("say you
// don't have it") is the highest-leverage line against invented features. Like the web note
// it lifts the base prompt's "no internet/no files" honesty rule for THIS turn only.
func projectKBMessages(base []Message, kb string) []Message {
note := "Authoritative facts about the Vojo app (this chat application), provided for this turn:\n\n<FACTS>\n" +
kb +
"\n</FACTS>\n\nFor any claim about what Vojo is, does, supports, or how it works, use ONLY the FACTS above — these are your single source of truth about Vojo and you have no other knowledge of it. These facts are provided to you for this turn, so do NOT say you lack access to files, documents, or information about Vojo. If a Vojo detail isn't in FACTS, say you don't have that information rather than guessing, and never invent Vojo features, settings, prices, limits, or policies, and don't generalise by analogy with other apps. You MAY answer any general (non-Vojo) part of the question from your own knowledge as usual. Prefer wording from FACTS. Do not mention this note or that facts were provided."
return insertSystemNote(base, note)
}
// projectAbstainMessages is the degrade hedge for a project-route failure (the KB couldn't
// be voiced): a plain grok_direct retry would answer about Vojo from empty memory, so
// instruct Grok to abstain on Vojo specifics rather than ship an invented feature — the same
// honest-degrade discipline as factualAbstainMessages, scoped to product claims.
func projectAbstainMessages(base []Message) []Message {
return insertSystemNote(base, "Couldn't load the Vojo product info. If the user asked about Vojo's specific features, settings, prices, or limits, honestly say you don't have that information rather than guessing; don't invent Vojo features.")
}
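Both helpers above lean on `insertSystemNote`, whose body is not part of this diff. A minimal sketch of what such a helper might look like is given below, assuming it places the note at index 1 (right after the base system prompt, as the `projectKBMessages` comment states) and returns a fresh slice. The fallback to index 0 when there is no leading system message is an assumption, and the `Message` type is reduced to two fields for illustration.

```go
package main

import "fmt"

// Message is a reduced stand-in for the bot's chat message type.
type Message struct {
	Role    string
	Content string
}

// insertSystemNote is a hypothetical reconstruction: it copies base and
// inserts note as a system message after the leading system prompt
// (index 1), or at index 0 if base has no leading system message.
func insertSystemNote(base []Message, note string) []Message {
	at := 0
	if len(base) > 0 && base[0].Role == "system" {
		at = 1
	}
	out := make([]Message, 0, len(base)+1)
	out = append(out, base[:at]...)
	out = append(out, Message{Role: "system", Content: note})
	out = append(out, base[at:]...)
	return out
}

func main() {
	base := []Message{
		{Role: "system", Content: "SYS"},
		{Role: "user", Content: "что умеет vojo"},
	}
	out := insertSystemNote(base, "<FACTS>\n- encrypted DMs\n</FACTS>")
	for i, m := range out {
		fmt.Printf("%d %s %q\n", i, m.Role, m.Content)
	}
}
```

Returning a copy keeps the caller's `msgs` untouched, which is what lets the degrade path in `generate` build a second, differently hedged prompt from the same base.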
// factualMiss reports whether a web degrade should use the abstain hedge (a static
// checkable-fact question) rather than the staleness hedge (a recency question). A
// recency signal (freshnessRe or the classifier's time_sensitive) always means


@ -526,6 +526,252 @@ func TestWebSynthMessagesNoRawURLs(t *testing.T) {
}
}
// failFirstLLM errors on its first Complete call and succeeds after — for the project
// degrade test, where the KB-injecting Grok call fails but the grok_direct fallback works.
type failFirstLLM struct {
failErr error
okText string
calls int
lastReq LLMRequest
}
func (f *failFirstLLM) Complete(_ context.Context, req LLMRequest) (*LLMResponse, error) {
f.calls++
f.lastReq = req
if f.calls == 1 {
return nil, f.failErr
}
return &LLMResponse{Text: f.okText, ProviderRequestID: "fake"}, nil
}
// TestGenerateProjectFlagOffByteIdentical: even with a KB loaded and a product question,
// all cascade flags off → grok_direct, Gemini untouched, the KB never reaches the prompt.
func TestGenerateProjectFlagOffByteIdentical(t *testing.T) {
grok := &fakeLLM{text: "grok answer"}
gem := &fakeLLM{text: "should not run"}
cfg := cascadeCfg()
cfg.ProjectKB = "VOJO FACTS" // present but the flag is off
b := &Bot{cfg: &cfg, llm: grok, gemini: gem, log: discardLog()}
res, err := b.generate(context.Background(), "что такое vojo", msgs("что такое vojo"), "", true)
if err != nil {
t.Fatalf("generate: %v", err)
}
if res.route != routeGrokDirect {
t.Fatalf("route=%q, want grok_direct (all flags off)", res.route)
}
if gem.calls != 0 {
t.Fatalf("gemini called %d, want 0 (router off)", gem.calls)
}
if hedgeContains(grok.lastReq.Messages, "VOJO FACTS") {
t.Fatalf("KB leaked into the grok prompt with flags off: %+v", grok.lastReq.Messages)
}
}
// TestGenerateProjectFlagOffFallsThrough is the canary-clean property: with the classifier
// on but PROJECT_KB_ENABLED off, Combine still DECIDES project (so about_project is recorded
// for "would-have-fired" measurement) but EXECUTION falls through to grok_direct — the KB is
// never injected and the answer is byte-identical to today's grok_direct.
func TestGenerateProjectFlagOffFallsThrough(t *testing.T) {
const verdict = `{"about_project":true,"confidence":0.9}`
grok := &fakeLLM{text: "grok answer"}
gem := &fakeLLM{text: verdict}
cfg := cascadeCfg()
cfg.RouterEnabled, cfg.RouterClassifierEnabled = true, true // PROJECT_KB_ENABLED deliberately OFF
cfg.ProjectKB = "VOJO FACTS"
b := &Bot{cfg: &cfg, llm: grok, gemini: gem, log: discardLog()}
res, err := b.generate(context.Background(), "что умеет vojo", msgs("что умеет vojo"), "", true)
if err != nil {
t.Fatalf("generate: %v", err)
}
if res.decision.Route != routeProject {
t.Fatalf("decision.Route=%q, want project_then_grok (the would-have-fired signal)", res.decision.Route)
}
if !res.decision.AboutProject {
t.Fatalf("about_project must be recorded for telemetry even with the flag off")
}
if res.route != routeGrokDirect {
t.Fatalf("route=%q, want grok_direct (flag off → fall through)", res.route)
}
if hedgeContains(grok.lastReq.Messages, "VOJO FACTS") {
t.Fatalf("KB injected despite the flag being off: %+v", grok.lastReq.Messages)
}
if grok.calls != 1 || gem.calls != 1 {
t.Fatalf("calls grok=%d gem=%d, want 1/1 (classifier + grok_direct)", grok.calls, gem.calls)
}
}
// TestGenerateProjectThenGrok: with the gate on, an about_project verdict routes to
// project_then_grok, injects the KB as a system note, and Grok voices it — one Grok call,
// Gemini only as the classifier.
func TestGenerateProjectThenGrok(t *testing.T) {
const verdict = `{"about_project":true,"needs_web":false,"confidence":0.9}`
grok := &fakeLLM{text: "voiced from KB", usage: Usage{PromptTokens: 20, CompletionTokens: 8}}
gem := &fakeLLM{text: verdict}
cfg := cascadeCfg()
cfg.RouterEnabled, cfg.RouterClassifierEnabled, cfg.ProjectKBEnabled = true, true, true
cfg.ProjectKB = "VOJO FACTS: encrypted DMs, voice calls; no group calls yet."
b := &Bot{cfg: &cfg, llm: grok, gemini: gem, log: discardLog()}
res, err := b.generate(context.Background(), "что умеет vojo", msgs("что умеет vojo"), "", true)
if err != nil {
t.Fatalf("generate: %v", err)
}
if res.route != routeProject || res.text != "voiced from KB" || res.finalModel != "grok-x" {
t.Fatalf("res=(%q,%q,%q), want project_then_grok/voiced/grok-x", res.route, res.text, res.finalModel)
}
if !hedgeContains(grok.lastReq.Messages, "VOJO FACTS: encrypted DMs") {
t.Fatalf("KB not injected into the grok prompt: %+v", grok.lastReq.Messages)
}
if grok.calls != 1 || gem.calls != 1 {
t.Fatalf("calls grok=%d gem=%d, want 1/1 (classifier + one project synth)", grok.calls, gem.calls)
}
}
// TestGenerateAboutProjectFalseNoKB: when the classifier says about_project=false, the KB is
// NOT injected even with the flag on — the route trusts the classifier in both directions.
func TestGenerateAboutProjectFalseNoKB(t *testing.T) {
const verdict = `{"about_project":false,"confidence":0.9}`
grok := &fakeLLM{text: "grok answer"}
gem := &fakeLLM{text: verdict}
cfg := cascadeCfg()
cfg.RouterEnabled, cfg.RouterClassifierEnabled, cfg.ProjectKBEnabled = true, true, true
cfg.ProjectKB = "VOJO FACTS"
b := &Bot{cfg: &cfg, llm: grok, gemini: gem, log: discardLog()}
res, err := b.generate(context.Background(), "расскажи про телеграм", msgs("расскажи про телеграм"), "", true)
if err != nil {
t.Fatalf("generate: %v", err)
}
if res.decision.Route == routeProject || res.route != routeGrokDirect {
t.Fatalf("about_project=false routed to project: decision=%q route=%q, want grok_direct", res.decision.Route, res.route)
}
if hedgeContains(grok.lastReq.Messages, "VOJO FACTS") {
t.Fatalf("KB injected when the classifier said not-about-project: %+v", grok.lastReq.Messages)
}
if grok.calls != 1 {
t.Fatalf("grok calls=%d, want 1 (no project synth attempt)", grok.calls)
}
}
// TestGenerateProjectContextFollowup: the headline live case — a context-resolved follow-up
// ("Про этот", no literal "vojo") that the classifier flags about_project=true routes to the
// KB. This is what the old regex-hint gate wrongly blocked.
func TestGenerateProjectContextFollowup(t *testing.T) {
const verdict = `{"about_project":true,"confidence":1.0}`
grok := &fakeLLM{text: "voiced from KB"}
gem := &fakeLLM{text: verdict}
cfg := cascadeCfg()
cfg.RouterEnabled, cfg.RouterClassifierEnabled, cfg.ProjectKBEnabled = true, true, true
cfg.ProjectKB = "VOJO FACTS: messaging, calls, channels."
b := &Bot{cfg: &cfg, llm: grok, gemini: gem, log: discardLog()}
res, err := b.generate(context.Background(), "Про этот", []Message{
{Role: "system", Content: "SYS"},
{Role: "user", Content: "знаешь что-нибудь про мессенджер?"},
{Role: "assistant", Content: "Знаю. Про какой именно?"},
{Role: "user", Content: "Про этот"},
}, "", true)
if err != nil {
t.Fatalf("generate: %v", err)
}
if res.route != routeProject {
t.Fatalf("context follow-up = %q, want project_then_grok (no literal 'vojo' needed)", res.route)
}
if !hedgeContains(grok.lastReq.Messages, "VOJO FACTS: messaging") {
t.Fatalf("KB not injected on the context-resolved follow-up: %+v", grok.lastReq.Messages)
}
}
// TestGenerateProjectDegradesToGrok: the KB-injecting Grok call fails → degrade to
// grok_direct with the project-abstain hedge (never silent, never a Vojo guess from empty
// memory).
func TestGenerateProjectDegradesToGrok(t *testing.T) {
const verdict = `{"about_project":true,"confidence":0.9}`
grok := &failFirstLLM{failErr: errors.New("grok boom on KB turn"), okText: "honest fallback"}
gem := &fakeLLM{text: verdict}
cfg := cascadeCfg()
cfg.RouterEnabled, cfg.RouterClassifierEnabled, cfg.ProjectKBEnabled = true, true, true
cfg.ProjectKB = "VOJO FACTS"
b := &Bot{cfg: &cfg, llm: grok, gemini: gem, log: discardLog()}
res, err := b.generate(context.Background(), "что умеет vojo", msgs("что умеет vojo"), "", true)
if err != nil {
t.Fatalf("generate: %v", err)
}
if res.route != routeGrokDirect || res.text != "honest fallback" || !res.fallback {
t.Fatalf("res=(%q,%q,fallback=%v), want grok_direct/honest fallback/true", res.route, res.text, res.fallback)
}
if res.degraded != degradeProject {
t.Fatalf("degraded=%q, want %q", res.degraded, degradeProject)
}
if !hedgeContains(grok.lastReq.Messages, "Couldn't load the Vojo product info") {
t.Fatalf("project degrade should inject the abstain hedge; messages=%+v", grok.lastReq.Messages)
}
if grok.calls != 2 {
t.Fatalf("grok calls=%d, want 2 (failed KB attempt + grok_direct fallback)", grok.calls)
}
}
// TestProjectKBMessagesScoped guards the anti-hallucination note: the KB is injected
// delimited, Vojo claims are restricted to the FACTS, the general part is explicitly
// licensed (entity-scoped, NOT "answer only from KB"), and the abstain clause is present.
func TestProjectKBMessagesScoped(t *testing.T) {
out := projectKBMessages(msgs("что умеет vojo"), "VOJO FACT: chats and calls")
var note string
for _, m := range out {
if m.Role == "system" && strings.Contains(m.Content, "FACTS") {
note = m.Content
}
}
if note == "" {
t.Fatal("project KB note missing")
}
if !strings.Contains(note, "VOJO FACT: chats and calls") {
t.Fatalf("KB not injected: %q", note)
}
if !strings.Contains(note, "<FACTS>") {
t.Fatalf("note must delimit the KB with <FACTS> tags (tagged-context grounding): %q", note)
}
// The load-bearing hard-scoping clause: Vojo claims restricted to the FACTS. Without this
// assertion the clause could be silently softened (mutation-proven) and the route would
// stop grounding — re-opening the hallucination hole.
if !strings.Contains(note, "use ONLY the FACTS") {
t.Fatalf("note must restrict Vojo claims to the FACTS (entity-scoping): %q", note)
}
// Lifts the base prompt's "no file/document access" honesty rule for this turn (like the
// web note lifts "no internet access") — else a fast Grok can hedge "I can't access Vojo
// docs" despite the injected FACTS. The doc comment claims this lift; assert the wire does it.
if !strings.Contains(note, "do NOT say you lack access") {
t.Fatalf("note must lift the no-file-access rule so Grok treats the FACTS as available: %q", note)
}
if !strings.Contains(note, "general") {
t.Fatalf("note must license the general (non-Vojo) part — entity-scoped: %q", note)
}
if !strings.Contains(note, "don't have that information") {
t.Fatalf("note must carry the explicit abstain clause: %q", note)
}
}
// TestReserveEstimateProjectNoBump: enabling PROJECT_KB_ENABLED must NOT raise the
// reservation — the project route is one Grok call on a prompt already capped at
// maxPromptTokens, ≤ the grok_direct base already counted.
func TestReserveEstimateProjectNoBump(t *testing.T) {
base := cascadeCfg()
base.RouterEnabled, base.RouterClassifierEnabled = true, true
bBase := &Bot{cfg: &base, log: discardLog()}
proj := base
proj.ProjectKBEnabled = true
proj.ProjectKB = "facts"
bProj := &Bot{cfg: &proj, log: discardLog()}
if !approxEq(bBase.reserveEstimate(), bProj.reserveEstimate()) {
t.Fatalf("PROJECT_KB_ENABLED changed reserveEstimate: %v vs %v", bBase.reserveEstimate(), bProj.reserveEstimate())
}
}
func hedgeContains(ms []Message, sub string) bool {
for _, m := range ms {
if strings.Contains(m.Content, sub) {


@ -223,5 +223,69 @@
},
"expected_route": "grok_direct",
"factual": false
},
{
"name": "project: what can Vojo do (name hint + about_project)",
"message": "что умеет vojo",
"verdict": {
"needs_web": false,
"verifiable": false,
"entity_obscure": false,
"time_sensitive": false,
"trivial": false,
"about_project": true,
"search_query": "",
"confidence": 0.9
},
"expected_route": "project_then_grok",
"factual": false
},
{
"name": "project: app how-to (intent hint + about_project)",
"message": "как в этом приложении включить шифрование",
"verdict": {
"needs_web": false,
"verifiable": false,
"entity_obscure": false,
"time_sensitive": false,
"trivial": false,
"about_project": true,
"search_query": "",
"confidence": 0.85
},
"expected_route": "project_then_grok",
"factual": false
},
{
"name": "venting about the app, classifier says not-about-project (about_project=false → grok)",
"message": "vojo упал опять?",
"verdict": {
"needs_web": false,
"verifiable": false,
"entity_obscure": false,
"time_sensitive": false,
"trivial": false,
"about_project": false,
"search_query": "",
"confidence": 0.4
},
"expected_route": "grok_direct",
"factual": false
},
{
"name": "project: context follow-up, no literal name (classifier resolves it)",
"message": "Про этот",
"verdict": {
"needs_web": false,
"verifiable": false,
"entity_obscure": false,
"time_sensitive": false,
"trivial": false,
"about_project": true,
"search_query": "",
"confidence": 1.0
},
"expected_route": "project_then_grok",
"factual": false
}
]


@ -154,6 +154,16 @@ type Config struct {
SystemPrompt string
StateDir string
// Project-knowledge route (project_then_grok). ProjectKB is the curated Vojo product
// knowledge base injected behind the about_project gate so Grok answers product questions
// from facts instead of empty parametric memory. It is OPERATOR DATA loaded once at
// startup from ProjectKBPath (like SystemPrompt — no hot-reload), never Go constants. Off
// (default) → the route is unreachable and the bot is byte-identical to today. Requires
// ROUTER_CLASSIFIER_ENABLED (the about_project gate is a classifier signal).
ProjectKBEnabled bool
ProjectKBPath string
ProjectKB string
// DatabaseURL is the libpq/pgx DSN of the bot's dedicated Postgres database
// (`vojo_ai`), e.g. postgres://vojo_ai:***@postgres:5432/vojo_ai?sslmode=disable.
// It holds only operational state (txn/event dedup, the daily spend ledger, the
@ -256,6 +266,9 @@ func LoadConfig() (*Config, error) {
XAIBaseURL: strings.TrimRight(getenv("XAI_BASE_URL", "https://api.x.ai/v1"), "/"),
XAIModel: getenv("XAI_MODEL", "grok-4.20-0309-non-reasoning"),
SystemPromptPath: getenv("SYSTEM_PROMPT_PATH", "prompts/system_prompt.txt"),
// Defaults to the KB that ships in the image (Dockerfile bakes prompts/), like
// SYSTEM_PROMPT_PATH — so enabling the route needs ONLY PROJECT_KB_ENABLED=true.
ProjectKBPath: getenv("PROJECT_KB_PATH", "prompts/vojo_kb.txt"),
StateDir: strings.TrimRight(getenv("STATE_DIR", "/state"), "/"),
DatabaseURL: getenv("AI_BOT_DATABASE_URL", ""),
AllowedServers: parseServerSet(getenv("ALLOWED_SERVERS", "")),
@ -393,6 +406,7 @@ func LoadConfig() (*Config, error) {
{"WEB_ENABLED", &cfg.WebEnabled},
{"WEB_PARANOID", &cfg.WebParanoid},
{"REASONING_ENABLED", &cfg.ReasoningEnabled},
{"PROJECT_KB_ENABLED", &cfg.ProjectKBEnabled},
} {
if *f.dest, err = getenvBool(f.key, false); err != nil {
problems = append(problems, err.Error())
@ -461,6 +475,13 @@ func LoadConfig() (*Config, error) {
if cfg.ReasoningEnabled && cfg.ReasoningModel == "" {
problems = append(problems, "REASONING_MODEL is required when REASONING_ENABLED is set")
}
// Project-KB route: the about_project gate is a classifier signal, so the classifier (and
// transitively the router + Gemini key) must be on, else the route can never fire.
// PROJECT_KB_PATH always has a value (defaults to the bundled KB); main.go does the file
// read + non-empty + size check (file I/O lives there, like SYSTEM_PROMPT_PATH).
if cfg.ProjectKBEnabled && !cfg.RouterClassifierEnabled {
problems = append(problems, "PROJECT_KB_ENABLED requires ROUTER_CLASSIFIER_ENABLED (the about_project gate is a classifier signal)")
}
switch cfg.GrokReasoningEffort {
case "", "none", "low", "medium", "high":
default:
@ -558,6 +579,12 @@ func (c *Config) Summary() string {
c.RouterEnabled, c.RouterClassifierEnabled, c.TrivialOffloadEnabled,
c.WebEnabled, c.WebProvider, c.WebParanoid, c.WebGroundingDailyCap,
c.GeminiGroundingPerPrompt, c.ReasoningEnabled, c.ReasoningEffort),
fmt.Sprintf(" PROJECT_KB = enabled=%t path=%s", c.ProjectKBEnabled, func() string {
if c.ProjectKBPath == "" {
return "(unset)"
}
return c.ProjectKBPath
}()),
" GEMINI_MODEL = " + c.GeminiModel,
" GEMINI_API_KEY = " + redact(c.GeminiAPIKey),
}, "\n")


@ -21,6 +21,7 @@ func setBaseEnv(t *testing.T) {
"GEMINI_API_KEY", "GEMINI_API_KEY_FILE", "ROUTER_ENABLED", "ROUTER_CLASSIFIER_ENABLED",
"TRIVIAL_OFFLOAD_ENABLED", "WEB_ENABLED", "REASONING_ENABLED", "WEB_PROVIDER", "REASONING_MODEL",
"WEB_PARANOID", "WEB_GROUNDING_DAILY_CAP", "GEMINI_GROUNDING_PER_PROMPT_USD",
"PROJECT_KB_ENABLED", "PROJECT_KB_PATH",
} {
t.Setenv(k, "")
}
@ -69,6 +70,35 @@ func TestConfigClassifierNeedsRouter(t *testing.T) {
}
}
// TestConfigProjectKBDefaultsPath: PROJECT_KB_PATH defaults to the bundled KB, so enabling
// the route needs only PROJECT_KB_ENABLED=true (the classifier already on). LoadConfig does
// not read the file — main.go does the fail-closed read/empty/size check at startup.
func TestConfigProjectKBDefaultsPath(t *testing.T) {
setBaseEnv(t)
t.Setenv("GEMINI_API_KEY", "gk")
t.Setenv("ROUTER_ENABLED", "true")
t.Setenv("ROUTER_CLASSIFIER_ENABLED", "true")
t.Setenv("PROJECT_KB_ENABLED", "true") // no explicit PROJECT_KB_PATH → bundled default
cfg, err := LoadConfig()
if err != nil {
t.Fatalf("enabling with the default KB path should be valid: %v", err)
}
if cfg.ProjectKBPath != "prompts/vojo_kb.txt" {
t.Fatalf("PROJECT_KB_PATH default = %q, want prompts/vojo_kb.txt", cfg.ProjectKBPath)
}
}
// TestConfigProjectKBNeedsClassifier: PROJECT_KB_ENABLED requires ROUTER_CLASSIFIER_ENABLED
// (the about_project gate is a classifier signal; without it the route could never fire).
func TestConfigProjectKBNeedsClassifier(t *testing.T) {
setBaseEnv(t)
t.Setenv("PROJECT_KB_ENABLED", "true")
t.Setenv("PROJECT_KB_PATH", "/tmp/vojo_kb.txt") // classifier deliberately off
if _, err := LoadConfig(); err == nil || !strings.Contains(err.Error(), "ROUTER_CLASSIFIER_ENABLED") {
t.Fatalf("PROJECT_KB_ENABLED without the classifier should fail; got %v", err)
}
}
func TestConfigBadWebProvider(t *testing.T) {
setBaseEnv(t)
t.Setenv("WEB_ENABLED", "true")


@ -24,6 +24,10 @@ const (
RouteGrokDirect = "grok_direct"
RouteWeb = "web_then_grok"
RouteReason = "reason_then_grok"
// RouteProject answers a question about the Vojo product itself from a curated KB
// injected into the Grok prompt (about_project gate). Like RouteWeb it grounds Grok,
// but the "digest" is an operator-authored static KB, not a web fetch.
RouteProject = "project_then_grok"
)
// Confidence floors the combine uses. These are the values the offline eval (§11)
@ -73,6 +77,13 @@ type Verdict struct {
Trivial bool `json:"trivial"`
SearchQuery string `json:"search_query"`
Confidence float64 `json:"confidence"`
// AboutProject is true when the user is asking about the Vojo product itself (its
// features/how-to/limits/privacy/pricing). It routes to the project KB on its own — the
// classifier is the context-aware judge (it sees the conversation, so it resolves
// follow-ups like "Про этот" → the app) and a false positive is bounded by the
// entity-scoped KB note. (An earlier design also required a Layer-0 lexical hint, but live
// traffic showed that blocked correct context-resolved follow-ups — see Combine.)
AboutProject bool `json:"about_project"`
}
// Layer0 is the free-regex pre-classification result. Route is the verdict when the
@ -109,6 +120,17 @@ var (
lookupIntentRe_EN = regexp.MustCompile(`(?i)(^|[\s"'(])(who\s+(is|are|was|were|starred|played|directed|wrote|founded|invented|created)\s|in\s+(what|which)\s+(year|film|movie|show|series|book|game)\b|when\s+(did|was|were|does|is)\b.*\b(release|released|come\s+out|came\s+out|born|die|died|found|founded|launch|launched|air|aired)\b|what\s+year\b|how\s+many\s+(seasons|episodes|films|movies|books))`)
)
// NOTE: the project route used to require a Layer-0 lexical hint (literal "vojo" / an
// app-how-to phrase) AND the classifier's about_project. Live traffic showed that gate was
// too strict: the classifier correctly flagged context-resolved follow-ups ("Про этот",
// "Хочу репортнуть багу. Как?") as about_project=true, but the regex — which only sees the
// bare message and cannot resolve a pronoun to "the Vojo app" — missed them, so the KB never
// fired and Grok hallucinated (a dismissive "ничего особенного", an invented GitHub support
// channel). The classifier is the context-aware layer and is the right judge here, and a
// false positive is cheap (the entity-scoped KB note keeps Grok answering the real question).
// So the route now trusts about_project alone; the regex hint was removed (it saved no money —
// about_project is one field in the classifier JSON that runs on every message regardless).
// ClassifyLayer0 runs the free heuristic over a message body. The result drives routing
// only when the classifier is off; when it is on, WebForce/Trivial/LookupHint feed
// Combine. Empty body → grok_direct (the safe floor).
@ -150,6 +172,15 @@ type Combined struct {
// Combine resolves the Layer-0 decision + the classifier Verdict into the final route.
// It is the router's brain and it never blindly trusts the model:
//
// - the PROJECT arm (the classifier's AboutProject) wins above everything, including the
// hard freshness arm — the curated KB is the authoritative source for product facts and
// the web is the worst (it would re-introduce product hallucination). It trusts the
// classifier: about_project is a context-aware judgement (it sees the conversation, so it
// resolves follow-ups like "Про этот" → the app) that a bare-message regex cannot make. A
// false positive is cheap — the entity-scoped KB note keeps Grok answering the real
// question. Combine stays flag-agnostic: it EMITS RouteProject on AboutProject; the cascade
// gates EXECUTION on PROJECT_KB_ENABLED (mirroring how WebEnabled gates the web route), so
// with the flag off a RouteProject decision cleanly falls through to grok_direct.
// - freshnessRe (WebForce) is a HARD web signal, always honoured (it survives the
// classifier being down).
// - Every OTHER web arm (the classifier's needs_web≥floor AND verifiable,
@ -175,6 +206,8 @@ func Combine(l0 Layer0, v Verdict, paranoid bool) Combined {
// CombineWithFloors is Combine with explicit thresholds (the offline-eval sweep entry).
func CombineWithFloors(l0 Layer0, v Verdict, paranoid bool, f Floors) Combined {
switch {
case v.AboutProject:
return Combined{Route: RouteProject, WebDecidedBy: WebByNone}
case l0.WebForce:
return Combined{Route: RouteWeb, WebDecidedBy: WebByFreshness}
case paranoid && v.NeedsWeb && v.Verifiable && v.Confidence >= f.WebNeedsWeb:


@ -228,3 +228,36 @@ func TestWebDecidedByAttribution(t *testing.T) {
}
}
}
// TestProjectGateOnAboutProject: the project route trusts the classifier — it fires when
// AboutProject is set and not otherwise. There is no Layer-0 hint requirement (live traffic
// showed it blocked correct context-resolved follow-ups). Independent of WEB_PARANOID.
func TestProjectGateOnAboutProject(t *testing.T) {
l0 := Layer0{Route: RouteGrokDirect}
for _, paranoid := range []bool{true, false} {
if got := Combine(l0, Verdict{AboutProject: true}, paranoid).Route; got != RouteProject {
t.Errorf("AboutProject=true (paranoid=%v) = %q, want project_then_grok", paranoid, got)
}
if got := Combine(l0, Verdict{AboutProject: false}, paranoid).Route; got == RouteProject {
t.Errorf("AboutProject=false (paranoid=%v) routed to project; must not", paranoid)
}
}
}
// TestProjectBeatsWebArms: the project arm is case #0 — it out-prioritizes even the hard
// freshness (WebForce) arm and the classifier web arms, because the curated KB, not the
// web, is the authoritative source for product facts ("какие новости у Vojo" trips
// freshness yet is a product question).
func TestProjectBeatsWebArms(t *testing.T) {
l0 := Layer0{Route: RouteWeb, WebForce: true} // freshness hit
v := Verdict{AboutProject: true, NeedsWeb: true, Verifiable: true, TimeSensitive: true, Confidence: 0.9}
for _, paranoid := range []bool{true, false} {
got := Combine(l0, v, paranoid)
if got.Route != RouteProject {
t.Errorf("project must beat web arms (paranoid=%v) = %q, want project_then_grok", paranoid, got.Route)
}
if got.WebDecidedBy != WebByNone {
t.Errorf("project route web_decided_by = %q, want none", got.WebDecidedBy)
}
}
}


@ -11,6 +11,7 @@ import (
"fmt"
"os"
"os/signal"
"strings"
"syscall"
)
@ -55,12 +56,38 @@ func main() {
}
cfg.SystemPrompt = string(promptBytes)
// Load the curated project KB the same way (fail fast at startup, not on the first
// product question) when the route is enabled. LoadConfig already required the path; here
// we read it and reject an empty file (fail-closed — an empty KB would ground nothing).
// A KB much larger than the prompt budget is also refused so it can't blow maxPromptTokens
// (insertSystemNote adds it AFTER history truncation). Off → ProjectKB stays "".
if cfg.ProjectKBEnabled {
kbBytes, err := os.ReadFile(cfg.ProjectKBPath)
if err != nil {
logger.Error("cannot read project KB", "path", cfg.ProjectKBPath, "err", err)
os.Exit(1)
}
cfg.ProjectKB = string(kbBytes)
if strings.TrimSpace(cfg.ProjectKB) == "" {
logger.Error("PROJECT_KB_PATH is empty", "path", cfg.ProjectKBPath)
os.Exit(1)
}
if t := estimateTokens(cfg.ProjectKB); t > maxProjectKBTokens {
logger.Error("project KB is too large for the prompt budget",
"path", cfg.ProjectKBPath, "est_tokens", t, "max", maxProjectKBTokens)
os.Exit(1)
}
}
// `ai-bot check-config` validates env + prompt + state dir and exits 0.
// Used by the A1 acceptance check ("container starts, reads env") and as a
// cheap operator smoke test without touching the homeserver.
if len(os.Args) > 1 && os.Args[1] == "check-config" {
fmt.Println(cfg.Summary())
fmt.Printf(" SYSTEM_PROMPT = loaded (%d bytes)\n", len(cfg.SystemPrompt))
if cfg.ProjectKBEnabled {
fmt.Printf(" PROJECT_KB = loaded (%d bytes, ~%d tokens)\n", len(cfg.ProjectKB), estimateTokens(cfg.ProjectKB))
}
fmt.Println("config OK")
return
}


@ -41,6 +41,7 @@ type RouterDecision struct {
TimeSensitive bool
Verifiable bool
TrivialScore bool // the classifier's raw "trivial" verdict
AboutProject bool // classifier "asking about the Vojo product" — routes to the KB (trusted)
LookupHint bool // Layer-0 soft hint (never sets the route on its own, §5)
WebDecidedBy string // which arm chose web — routedecide.WebBy* (request_log)
}
@ -64,10 +65,11 @@ Decide:
- "entity_obscure": true if the salient entity is plausibly long-tail / not a household name (a minor film, a non-famous person, a niche product); these are where memory fails hardest.
- "time_sensitive": true if the answer can change over time (news, prices, weather, standings, "current"/"latest"/"now").
- "trivial": true ONLY for a bare greeting, acknowledgement, or tiny arithmetic with no real question.
- "about_project": true ONLY if the user is asking about THIS chat app itself, called Vojo: its concrete features, how to do something inside the app (calls, encryption, settings, rooms, channels), its limits, privacy, or pricing. Examples: "что ты умеешь", "what can this app do", "как включить шифрование здесь", "does Vojo support video calls". FALSE for any general-knowledge question that merely mentions a product or place name (including one coincidentally called Vojo that is not this app), and FALSE for a generic "what can an AI assistant do". When unsure, prefer FALSE.
- "search_query": a SELF-CONTAINED web search query for this message, written in the LANGUAGE of the user's latest message (an English message gets an English query; a Russian one gets a Russian query) so the results match the user's language and region instead of defaulting to one country. Resolve follow-ups from context (a bare "2024 года" after discussing a film becomes "<film name> 2024 фильм актёрский состав"). For broad/region-neutral requests (e.g. "interesting news") keep it general and international, don't narrow it to a single country. Empty string ONLY if both needs_web and verifiable are false.
- "confidence": 0.0-1.0, your honest certainty in needs_web.
Schema: {"needs_web":bool,"verifiable":bool,"entity_obscure":bool,"time_sensitive":bool,"trivial":bool,"search_query":"<query or empty>","confidence":0.0-1.0}
Schema: {"needs_web":bool,"verifiable":bool,"entity_obscure":bool,"time_sensitive":bool,"trivial":bool,"about_project":bool,"search_query":"<query or empty>","confidence":0.0-1.0}
Conversation:
`
@ -137,7 +139,7 @@ func (b *Bot) routeLayer1(ctx context.Context, rcx string, l0 rd.Layer0, cost *C
resp, err := b.gemini.Complete(ctx, LLMRequest{
Model: b.cfg.GeminiModel,
Messages: []Message{{Role: "user", Content: classifierPrompt + rcx}},
MaxTokens: 80, // was 60; the schema grew
MaxTokens: 110, // was 80; the schema grew (about_project added) — must not truncate
Temperature: 0,
})
if err != nil {
@ -162,6 +164,7 @@ func (b *Bot) routeLayer1(ctx context.Context, rcx string, l0 rd.Layer0, cost *C
EntityObscure: v.EntityObscure,
TimeSensitive: v.TimeSensitive,
TrivialScore: v.Trivial,
AboutProject: v.AboutProject,
SearchQuery: v.SearchQuery,
LookupHint: l0.LookupHint,
Freshness: l0.Freshness,
@ -173,7 +176,8 @@ func (b *Bot) routeLayer1(ctx context.Context, rcx string, l0 rd.Layer0, cost *C
"route", d.Route, "web_decided_by", d.WebDecidedBy, "needs_web", d.NeedsWeb,
"verifiable", d.Verifiable, "entity_obscure", d.EntityObscure,
"time_sensitive", d.TimeSensitive, "trivial", d.TrivialScore,
"confidence", d.Confidence, "lookup_hint", d.LookupHint, "paranoid", b.cfg.WebParanoid)
"about_project", d.AboutProject, "confidence", d.Confidence,
"lookup_hint", d.LookupHint, "paranoid", b.cfg.WebParanoid)
return d, nil
}
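
A minimal sketch of why the added `about_project` field is safe to roll out and why the `MaxTokens` bump matters, assuming the classifier reply is decoded with `encoding/json`. The `verdict` type and the `parseVerdict` helper are hypothetical stand-ins, not the bot's actual code. A reply that omits the field decodes to `false` (the route does not fire), while a reply cut off mid-object fails to decode at all.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// verdict mirrors only the fields this sketch needs; the real Verdict has more.
type verdict struct {
	NeedsWeb     bool    `json:"needs_web"`
	AboutProject bool    `json:"about_project"`
	Confidence   float64 `json:"confidence"`
}

// parseVerdict is a hypothetical helper: it decodes a classifier reply and
// returns an error for truncated or malformed JSON.
func parseVerdict(raw string) (verdict, error) {
	var v verdict
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		// Return the zero value so a partial decode never leaks out.
		return verdict{}, fmt.Errorf("classifier reply: %w", err)
	}
	return v, nil
}

func main() {
	// A complete reply carrying the new field.
	full, _ := parseVerdict(`{"needs_web":false,"about_project":true,"confidence":0.9}`)
	fmt.Println("full reply, about_project:", full.AboutProject)

	// A reply without the field: about_project falls back to false.
	legacy, _ := parseVerdict(`{"needs_web":true,"confidence":0.7}`)
	fmt.Println("field absent, about_project:", legacy.AboutProject)

	// A reply truncated by a too-small token budget does not decode.
	_, err := parseVerdict(`{"needs_web":false,"about_pro`)
	fmt.Println("truncated reply rejected:", err != nil)
}
```

How the caller handles the decode error is outside this sketch.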


@ -187,6 +187,13 @@ var migrations = []string{
ALTER TABLE request_log ADD COLUMN IF NOT EXISTS citation_count INT DEFAULT 0;
ALTER TABLE request_log ADD COLUMN IF NOT EXISTS search_query TEXT;
ALTER TABLE request_log ADD COLUMN IF NOT EXISTS answer_text TEXT;`,
// v6 (project-knowledge route): the classifier's about_project signal, so the offline eval
// can measure project-route hit/miss and "would have fired" rate (about_project=true while
// route=grok_direct when PROJECT_KB_ENABLED is off — the canary-clean measurement). The
// route itself needs NO column: request_log.route is TEXT and takes 'project_then_grok'
// like any other route. Append-only (never edit an earlier migration).
`ALTER TABLE request_log ADD COLUMN IF NOT EXISTS about_project BOOL DEFAULT false;`,
}
// migrate runs all pending migrations on a single connection under a session
@ -490,7 +497,8 @@ func (s *Store) InsertRequestLog(rl RequestLog) error {
latency_ms, stage_ms, escalated, fallback_fired, cache_hit, ceiling_hit,
per_user_cap_hit, prompt_version, provider_request_id, degraded, err, ok, query_text,
needs_web, entity_obscure, time_sensitive, verifiable, trivial_score, web_decided_by,
grounding_fee_usd, rewrite_used, web_grounded, citation_count, search_query, answer_text
grounding_fee_usd, rewrite_used, web_grounded, citation_count, search_query, answer_text,
about_project
) VALUES (
$1, $2, $3, $4, $5, $6, $7,
$8, $9, $10,
@ -498,7 +506,8 @@ func (s *Store) InsertRequestLog(rl RequestLog) error {
$16, $17, $18, $19, $20, $21,
$22, $23, $24, $25, $26, $27, $28,
$29, $30, $31, $32, $33, $34,
$35, $36, $37, $38, $39, $40
$35, $36, $37, $38, $39, $40,
$41
) ON CONFLICT (id) DO NOTHING`,
rl.ID, rl.RoomID, rl.Sender, rl.Route, rl.RouterSource, rl.RouterConfidence, models,
rl.PromptTokens, rl.CachedTokens, rl.CompletionTokens,
@ -506,7 +515,8 @@ func (s *Store) InsertRequestLog(rl RequestLog) error {
rl.LatencyMS, stages, rl.Escalated, rl.FallbackFired, rl.CacheHit, rl.CeilingHit,
rl.PerUserCapHit, rl.PromptVersion, rl.ProviderRequestID, rl.Degraded, rl.Err, rl.OK, nullIfEmpty(rl.QueryText),
rl.NeedsWeb, rl.EntityObscure, rl.TimeSensitive, rl.Verifiable, rl.TrivialScore, rl.WebDecidedBy,
rl.Cost.GroundingFee, rl.RewriteUsed, rl.WebGrounded, rl.CitationCount, nullIfEmpty(rl.SearchQuery), nullIfEmpty(rl.AnswerText))
rl.Cost.GroundingFee, rl.RewriteUsed, rl.WebGrounded, rl.CitationCount, nullIfEmpty(rl.SearchQuery), nullIfEmpty(rl.AnswerText),
rl.AboutProject)
return err
}


@ -510,6 +510,7 @@ func TestStoreRequestLogClassifierColumns(t *testing.T) {
NeedsWeb: true,
EntityObscure: true,
Verifiable: true,
AboutProject: true,
WebDecidedBy: "entity_obscure",
RewriteUsed: true,
WebGrounded: true,
@ -524,21 +525,21 @@ func TestStoreRequestLogClassifierColumns(t *testing.T) {
ctx, cancel := opContext()
defer cancel()
var (
needsWeb, entityObscure, webGrounded, rewriteUsed bool
needsWeb, entityObscure, webGrounded, rewriteUsed, aboutProject bool
webDecidedBy string
fee, total float64
cites int
sq, ans *string
)
if err := st.pool.QueryRow(ctx, `SELECT needs_web, entity_obscure, web_decided_by, grounding_fee_usd,
rewrite_used, web_grounded, citation_count, search_query, answer_text, total_usd
rewrite_used, web_grounded, citation_count, search_query, answer_text, total_usd, about_project
FROM request_log WHERE id=$1`, rl.ID).Scan(&needsWeb, &entityObscure, &webDecidedBy, &fee,
&rewriteUsed, &webGrounded, &cites, &sq, &ans, &total); err != nil {
&rewriteUsed, &webGrounded, &cites, &sq, &ans, &total, &aboutProject); err != nil {
t.Fatalf("read: %v", err)
}
if !needsWeb || !entityObscure || webDecidedBy != "entity_obscure" || !rewriteUsed || !webGrounded || cites != 3 {
t.Fatalf("signal columns wrong: needsWeb=%v obscure=%v decidedBy=%q rewrite=%v grounded=%v cites=%d",
needsWeb, entityObscure, webDecidedBy, rewriteUsed, webGrounded, cites)
if !needsWeb || !entityObscure || webDecidedBy != "entity_obscure" || !rewriteUsed || !webGrounded || cites != 3 || !aboutProject {
t.Fatalf("signal columns wrong: needsWeb=%v obscure=%v decidedBy=%q rewrite=%v grounded=%v cites=%d about=%v",
needsWeb, entityObscure, webDecidedBy, rewriteUsed, webGrounded, cites, aboutProject)
}
if d := fee - 0.035; d > 1e-9 || d < -1e-9 {
t.Fatalf("grounding_fee_usd = %v, want 0.035", fee)


@ -23,6 +23,7 @@ const (
routeTrivial = rd.RouteTrivial
routeWebThenGrok = rd.RouteWeb
routeReason = rd.RouteReason
routeProject = rd.RouteProject
)
// Degrade/skip reason strings (request_log.degraded). Stable tokens so the analytics
@ -39,6 +40,7 @@ const (
degradeTrivial = "trivial_failed"
degradeGroundCap = "grounding_cap"
degradeReasoning = "reasoning_failed"
degradeProject = "project_failed"
)
// telemetryTrimEvery bounds how often the retention trim runs — once per N writes,
@ -88,6 +90,7 @@ type RequestLog struct {
TimeSensitive bool
Verifiable bool
TrivialScore bool
AboutProject bool
WebDecidedBy string
RewriteUsed bool
WebGrounded bool


@ -47,6 +47,7 @@ then dispatches; **any layer off or failing degrades to `grok_direct`** (never a
- **`trivial_direct`** — greetings/acks → cheap Gemini (`TRIVIAL_OFFLOAD_ENABLED`).
- **`web_then_grok`** — fresh facts: a WebProvider fetches a grounded digest + citations, then **Grok synthesises the answer in voice** ([web.go](../../apps/ai-bot/web.go)).
- **`reason_then_grok`** — manual trigger ("подумай глубже") → Grok at a higher `reasoning_effort`.
- **`project_then_grok`** — questions about the **Vojo product itself** (`PROJECT_KB_ENABLED`): a curated KB (operator data from `PROJECT_KB_PATH`, default the bundled `prompts/vojo_kb.txt`) is injected as a system note and **Grok answers product claims strictly from it** (anti-hallucination — Grok has no parametric Vojo knowledge, and the web doesn't either). Gated by the classifier's `about_project` signal (the context-aware judge — it resolves follow-ups like "Про этот" → the app); a false positive is bounded by the entity-scoped note. Beats every web arm. One Grok call, so it costs ~the same as `grok_direct`. See [docs/plans/ai_project_knowledge.md](../plans/ai_project_knowledge.md).
- Router = free Layer-0 regex + optional Layer-1 Gemini classifier; a confidence floor keeps uncertain cases on the safe floor (`grok_direct`).
**Invariant:** all cascade flags OFF == today's bot — a single `grok_direct` call, byte-identical wire body. Do not enable layers in prod until the offline-eval gate (build plan §9) passes.
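
As an illustration of the invariant and the route list above, here is a simplified sketch of the "decide, then gate" split: the decision step emits `project_then_grok` whenever the classifier sets `about_project`, and a separate execution gate drops it to `grok_direct` when the feature flag is off. The `decide` and `gate` functions are hypothetical and cover only the project and freshness arms; the real router has more arms and inputs.

```go
package main

import "fmt"

const (
	routeGrokDirect = "grok_direct"
	routeWeb        = "web_then_grok"
	routeProject    = "project_then_grok"
)

// decide is flag-agnostic: the project arm is checked first, so it wins even
// when the hard freshness signal (webForce) is also set.
func decide(aboutProject, webForce bool) string {
	switch {
	case aboutProject:
		return routeProject
	case webForce:
		return routeWeb
	default:
		return routeGrokDirect
	}
}

// gate is a hypothetical execution gate: a route whose feature flag is off
// falls through to the safe floor, grok_direct.
func gate(route string, projectKBEnabled, webEnabled bool) string {
	switch {
	case route == routeProject && !projectKBEnabled:
		return routeGrokDirect
	case route == routeWeb && !webEnabled:
		return routeGrokDirect
	}
	return route
}

func main() {
	// A product question that also trips freshness ("какие новости у Vojo").
	decided := decide(true, true)
	fmt.Println("decided:", decided)
	fmt.Println("flag on: ", gate(decided, true, true))
	fmt.Println("flag off:", gate(decided, false, true))
}
```

Keeping the decision step free of flags means the logged route records what the router wanted even while the feature is disabled.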