feat(ai-bot): thread a per-request trace id through logs and add per-user gated model request/response logging
This commit is contained in:
parent
6256048ddd
commit
218d463be9
14 changed files with 389 additions and 61 deletions
|
|
@ -114,6 +114,44 @@ otherwise, since the USD only lands after each call).
|
||||||
(default 30) time-trims old rows. Turn telemetry on to MEASURE the base before enabling
|
(default 30) time-trims old rows. Turn telemetry on to MEASURE the base before enabling
|
||||||
any cascade layer.
|
any cascade layer.
|
||||||
|
|
||||||
|
### Observability — logs & per-request trace
|
||||||
|
|
||||||
|
The bot logs with the Go stdlib `log/slog` to **stderr**; `LOG_LEVEL`
|
||||||
|
(`debug|info|warn|error`, default `info`) and `LOG_FORMAT` (`text|json`, default `text`)
|
||||||
|
control it. Set `LOG_FORMAT=json` in prod so a collector (Fluent Bit / Vector / Filebeat)
|
||||||
|
can tail the container's stdout and ship the lines to OpenSearch / Loki — the bot itself
|
||||||
|
never talks to a log backend (12-factor: it just writes structured lines).
|
||||||
|
|
||||||
|
- **Trace id (always on, no content).** Every handled event gets a fresh `trace_id` (a
|
||||||
|
random 16-byte / 32-hex value — the W3C/OpenTelemetry trace-id shape, so the `trace_id`
|
||||||
|
field maps straight onto an OTel trace id later; full distributed tracing would still
|
||||||
|
need a span id + `traceparent` propagation). It is minted once at the **per-event
|
||||||
|
handler** and stamped into the request `context`, then attached to **every** log line
|
||||||
|
for that request — through the per-room goroutine and down to the HTTP call to the model
|
||||||
|
— so you can grep one `trace_id` to get the whole trail. (The appservice transaction-push
|
||||||
|
logs sit above the per-event handler and carry no `trace_id`; they correlate by their
|
||||||
|
Synapse txn id instead, since one transaction fans out to many events.) The Matrix
|
||||||
|
`event_id` is logged on the entry/skip lines too, and is the `request_log.ID`, so
|
||||||
|
logs ↔ telemetry correlate.
|
||||||
|
- **Routing / selection (DEBUG, no flag — metadata only).** At `LOG_LEVEL=debug` the
|
||||||
|
router's verdict (`route decided`: route, source, confidence, needs_web) and the final
|
||||||
|
outcome (`generation outcome`: route actually run, fallback, degrade reason, per-stage
|
||||||
|
ms, $) are logged. No message content — safe to leave on while debugging routing.
|
||||||
|
- **Model request/response bodies (gated per-user, DEBUG).** `LOG_BODIES_USERS` is a
|
||||||
|
comma-separated **allowlist of sender mxids** whose full model request/response bodies
|
||||||
|
are logged (`llm exchange`). Empty (default) = **nobody** — message content never enters
|
||||||
|
the logs. It is a **double gate**: a sender must be on the allowlist AND `LOG_LEVEL=debug`
|
||||||
|
must be set. Bodies are truncated to a fixed ~4 KB cap. Only the
|
||||||
|
request/response **bodies** are logged — never the URL or any header — so the API key
|
||||||
|
cannot leak on either transport. Use it to debug your own traffic, e.g.
|
||||||
|
`LOG_BODIES_USERS=@heaven:vojo.chat`. **Note:** once on, these lines contain cleartext
|
||||||
|
message content + the model's reply + the sender mxid (personal data) — so if you ship
|
||||||
|
them to OpenSearch/Loki, apply retention and access control at that sink accordingly.
|
||||||
|
|
||||||
|
`TELEMETRY_*` (below) is the separate **analytics** path (a `request_log` row per request);
|
||||||
|
the logs above are the **debug** path. They share the `trace_id`/`event_id` correlation
|
||||||
|
keys but are independent — telemetry can be off while debug logging is on, and vice versa.
|
||||||
|
|
||||||
### Cascade (Phase 2-4) — behind flags, **default OFF** (every layer off == today's bot)
|
### Cascade (Phase 2-4) — behind flags, **default OFF** (every layer off == today's bot)
|
||||||
|
|
||||||
All optional; an unset env is exactly today's single grok_direct call. Any layer off or
|
All optional; an unset env is exactly today's single grok_direct call. Any layer off or
|
||||||
|
|
|
||||||
|
|
@ -170,11 +170,11 @@ func (b *Bot) handleTransaction(ctx context.Context, events []Event) {
|
||||||
// detached from the HTTP handler, so an unrecovered panic there would crash the
|
// detached from the HTTP handler, so an unrecovered panic there would crash the
|
||||||
// whole process and silence the bot for EVERY room — recover + log instead, so one
|
// whole process and silence the bot for EVERY room — recover + log instead, so one
|
||||||
// malformed event can never take the bot down.
|
// malformed event can never take the bot down.
|
||||||
func (b *Bot) safego(what string, fn func()) {
|
func (b *Bot) safego(ctx context.Context, what string, fn func()) {
|
||||||
go func() {
|
go func() {
|
||||||
defer func() {
|
defer func() {
|
||||||
if r := recover(); r != nil {
|
if r := recover(); r != nil {
|
||||||
b.log.Error("recovered panic in handler goroutine", "what", what, "panic", r)
|
b.log.ErrorContext(ctx, "recovered panic in handler goroutine", "what", what, "panic", r)
|
||||||
}
|
}
|
||||||
}()
|
}()
|
||||||
fn()
|
fn()
|
||||||
|
|
@ -185,14 +185,18 @@ func (b *Bot) handleEvent(ctx context.Context, ev *Event) {
|
||||||
if ev.EventID == "" || ev.RoomID == "" {
|
if ev.EventID == "" || ev.RoomID == "" {
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
if !b.markSeen(ev.EventID) {
|
// Mint a fresh trace id for this event and stamp it (plus sender + the per-user
|
||||||
|
// body-logging decision) onto ctx. Every log line below — through the per-room
|
||||||
|
// goroutine and down to the model call — carries this trace_id automatically.
|
||||||
|
ctx = withRequestTrace(ctx, newTraceID(), ev.Sender, b.cfg.LogBodiesUsers[ev.Sender])
|
||||||
|
if !b.markSeen(ctx, ev.EventID) {
|
||||||
return // already handled (in-memory or durable dedup)
|
return // already handled (in-memory or durable dedup)
|
||||||
}
|
}
|
||||||
b.log.Debug("event", "type", ev.Type, "room", ev.RoomID, "sender", ev.Sender, "id", ev.EventID)
|
b.log.DebugContext(ctx, "event", "type", ev.Type, "room", ev.RoomID, "sender", ev.Sender, "id", ev.EventID)
|
||||||
switch ev.Type {
|
switch ev.Type {
|
||||||
case "m.room.member":
|
case "m.room.member":
|
||||||
if ev.StateKey != nil && *ev.StateKey == b.cfg.BotMXID {
|
if ev.StateKey != nil && *ev.StateKey == b.cfg.BotMXID {
|
||||||
b.safego("self-membership", func() { b.handleSelfMembership(ctx, ev) })
|
b.safego(ctx, "self-membership", func() { b.handleSelfMembership(ctx, ev) })
|
||||||
} else {
|
} else {
|
||||||
// A third party's membership changed: counts + foreign flag are now stale.
|
// A third party's membership changed: counts + foreign flag are now stale.
|
||||||
// Re-probe on the next message so a 1:1 that gains a member drops out of DM
|
// Re-probe on the next message so a 1:1 that gains a member drops out of DM
|
||||||
|
|
@ -215,13 +219,13 @@ func (b *Bot) handleEvent(ctx context.Context, ev *Event) {
|
||||||
// is an atomic INSERT … ON CONFLICT DO NOTHING, so two racing goroutines for the same
|
// is an atomic INSERT … ON CONFLICT DO NOTHING, so two racing goroutines for the same
|
||||||
// event can never both proceed. On a durable-store error we fall through (the in-memory set still
|
// event can never both proceed. On a durable-store error we fall through (the in-memory set still
|
||||||
// guards this session).
|
// guards this session).
|
||||||
func (b *Bot) markSeen(eventID string) bool {
|
func (b *Bot) markSeen(ctx context.Context, eventID string) bool {
|
||||||
if !b.seen.Add(eventID) {
|
if !b.seen.Add(eventID) {
|
||||||
return false
|
return false
|
||||||
}
|
}
|
||||||
isNew, err := b.st.SeenEvent(eventID)
|
isNew, err := b.st.SeenEvent(eventID)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
b.log.Error("durable dedup check failed", "id", eventID, "err", err)
|
b.log.ErrorContext(ctx, "durable dedup check failed", "id", eventID, "err", err)
|
||||||
return true
|
return true
|
||||||
}
|
}
|
||||||
return isNew
|
return isNew
|
||||||
|
|
@ -234,15 +238,15 @@ func (b *Bot) handleSelfMembership(ctx context.Context, ev *Event) {
|
||||||
switch ev.membershipOf() {
|
switch ev.membershipOf() {
|
||||||
case "invite":
|
case "invite":
|
||||||
if !b.cfg.AllowedServers[serverOf(ev.Sender)] {
|
if !b.cfg.AllowedServers[serverOf(ev.Sender)] {
|
||||||
b.log.Warn("rejecting invite (server not allowed)", "room", ev.RoomID, "sender", ev.Sender)
|
b.log.WarnContext(ctx, "rejecting invite (server not allowed)", "room", ev.RoomID, "sender", ev.Sender)
|
||||||
if err := b.mx.LeaveRoom(ctx, ev.RoomID); err != nil {
|
if err := b.mx.LeaveRoom(ctx, ev.RoomID); err != nil {
|
||||||
b.log.Error("leave (reject) failed", "room", ev.RoomID, "err", err)
|
b.log.ErrorContext(ctx, "leave (reject) failed", "room", ev.RoomID, "err", err)
|
||||||
}
|
}
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
b.log.Info("accepting invite", "room", ev.RoomID, "sender", ev.Sender)
|
b.log.InfoContext(ctx, "accepting invite", "room", ev.RoomID, "sender", ev.Sender)
|
||||||
if err := b.mx.JoinRoom(ctx, ev.RoomID); err != nil {
|
if err := b.mx.JoinRoom(ctx, ev.RoomID); err != nil {
|
||||||
b.log.Error("join failed", "room", ev.RoomID, "err", err)
|
b.log.ErrorContext(ctx, "join failed", "room", ev.RoomID, "err", err)
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
// Fully-on-allowed-servers gate: a vojo.chat inviter can still pull the bot
|
// Fully-on-allowed-servers gate: a vojo.chat inviter can still pull the bot
|
||||||
|
|
@ -258,9 +262,9 @@ func (b *Bot) handleSelfMembership(ctx context.Context, ev *Event) {
|
||||||
// leaveForeign leaves a room that contains a member outside ALLOWED_SERVERS, so
|
// leaveForeign leaves a room that contains a member outside ALLOWED_SERVERS, so
|
||||||
// the bot only ever operates in rooms hosted entirely on allowed homeservers.
|
// the bot only ever operates in rooms hosted entirely on allowed homeservers.
|
||||||
func (b *Bot) leaveForeign(ctx context.Context, roomID string) {
|
func (b *Bot) leaveForeign(ctx context.Context, roomID string) {
|
||||||
b.log.Warn("leaving room — a member is outside ALLOWED_SERVERS", "room", roomID)
|
b.log.WarnContext(ctx, "leaving room — a member is outside ALLOWED_SERVERS", "room", roomID)
|
||||||
if err := b.mx.LeaveRoom(ctx, roomID); err != nil {
|
if err := b.mx.LeaveRoom(ctx, roomID); err != nil {
|
||||||
b.log.Error("leave (foreign) failed", "room", roomID, "err", err)
|
b.log.ErrorContext(ctx, "leave (foreign) failed", "room", roomID, "err", err)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -270,11 +274,11 @@ func (b *Bot) handleMessage(ctx context.Context, ev *Event) {
|
||||||
// A9/F15: re-check encryption; if (or once) encrypted, react once and skip — the
|
// A9/F15: re-check encryption; if (or once) encrypted, react once and skip — the
|
||||||
// bot can't read it. The probe runs without the lock.
|
// bot can't read it. The probe runs without the lock.
|
||||||
if b.ensureEncryption(ctx, roomID) {
|
if b.ensureEncryption(ctx, roomID) {
|
||||||
b.log.Debug("skip: encrypted room", "room", roomID)
|
b.log.DebugContext(ctx, "skip: encrypted room", "room", roomID)
|
||||||
// Log the skip only when we actually react (once per room), so an encrypted room
|
// Log the skip only when we actually react (once per room), so an encrypted room
|
||||||
// the bot can't read doesn't flood request_log with one row per message.
|
// the bot can't read doesn't flood request_log with one row per message.
|
||||||
if b.reactEncryptedOnce(ctx, roomID, ev.EventID) {
|
if b.reactEncryptedOnce(ctx, roomID, ev.EventID) {
|
||||||
b.recordSkip(ev, degradeEncrypted)
|
b.recordSkip(ctx, ev, degradeEncrypted)
|
||||||
}
|
}
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
|
@ -283,7 +287,7 @@ func (b *Bot) handleMessage(ctx context.Context, ev *Event) {
|
||||||
if !ok {
|
if !ok {
|
||||||
// A message addressed to the bot that we can't decode shouldn't vanish without
|
// A message addressed to the bot that we can't decode shouldn't vanish without
|
||||||
// a trace (no silent drops): log at WARN so it's visible at the default level.
|
// a trace (no silent drops): log at WARN so it's visible at the default level.
|
||||||
b.log.Warn("skip: undecodable message content", "room", roomID, "sender", ev.Sender, "id", ev.EventID)
|
b.log.WarnContext(ctx, "skip: undecodable message content", "room", roomID, "sender", ev.Sender, "id", ev.EventID)
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
// Edits re-carry m.mentions; never re-trigger or replay them (F16).
|
// Edits re-carry m.mentions; never re-trigger or replay them (F16).
|
||||||
|
|
@ -307,7 +311,7 @@ func (b *Bot) handleMessage(ctx context.Context, ev *Event) {
|
||||||
// "leak" the bot into) a federated room with non-consenting third parties.
|
// "leak" the bot into) a federated room with non-consenting third parties.
|
||||||
if foreign {
|
if foreign {
|
||||||
b.leaveForeign(ctx, roomID)
|
b.leaveForeign(ctx, roomID)
|
||||||
b.recordSkip(ev, degradeForeign)
|
b.recordSkip(ctx, ev, degradeForeign)
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -322,10 +326,10 @@ func (b *Bot) handleMessage(ctx context.Context, ev *Event) {
|
||||||
// group (answer only on mention). Log at WARN — not a silent Debug drop —
|
// group (answer only on mention). Log at WARN — not a silent Debug drop —
|
||||||
// so it's visible; we don't react because reacting in a group the bot isn't
|
// so it's visible; we don't react because reacting in a group the bot isn't
|
||||||
// addressed in would be wrong. Re-probed on the next message.
|
// addressed in would be wrong. Re-probed on the next message.
|
||||||
b.log.Warn("skip: room unclassified (member probe failed), message not an explicit mention",
|
b.log.WarnContext(ctx, "skip: room unclassified (member probe failed), message not an explicit mention",
|
||||||
"room", roomID, "sender", ev.Sender)
|
"room", roomID, "sender", ev.Sender)
|
||||||
} else {
|
} else {
|
||||||
b.log.Debug("skip: not addressed (group without mention)", "room", roomID, "sender", ev.Sender,
|
b.log.DebugContext(ctx, "skip: not addressed (group without mention)", "room", roomID, "sender", ev.Sender,
|
||||||
"dm", isDM, "mentioned", mentioned)
|
"dm", isDM, "mentioned", mentioned)
|
||||||
}
|
}
|
||||||
return
|
return
|
||||||
|
|
@ -333,9 +337,9 @@ func (b *Bot) handleMessage(ctx context.Context, ev *Event) {
|
||||||
|
|
||||||
// Addressed but not text: react "text only" (no silent drop).
|
// Addressed but not text: react "text only" (no silent drop).
|
||||||
if isMedia {
|
if isMedia {
|
||||||
b.log.Debug("skip: non-text msgtype (reacted)", "room", roomID, "sender", ev.Sender, "msgtype", mc.MsgType)
|
b.log.DebugContext(ctx, "skip: non-text msgtype (reacted)", "room", roomID, "sender", ev.Sender, "msgtype", mc.MsgType)
|
||||||
b.react(ctx, roomID, ev.EventID, reactMedia)
|
b.react(ctx, roomID, ev.EventID, reactMedia)
|
||||||
b.recordSkip(ev, degradeMedia)
|
b.recordSkip(ctx, ev, degradeMedia)
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -354,7 +358,7 @@ func (b *Bot) handleMessage(ctx context.Context, ev *Event) {
|
||||||
// another thread in the same room, and two messages in the SAME conversation still see
|
// another thread in the same room, and two messages in the SAME conversation still see
|
||||||
// exactly one winner. The claim is taken synchronously in transaction order.
|
// exactly one winner. The claim is taken synchronously in transaction order.
|
||||||
if !b.tryClaim(roomID, threadRoot) {
|
if !b.tryClaim(roomID, threadRoot) {
|
||||||
b.log.Debug("drop: conversation busy generating", "room", roomID, "thread", threadRoot, "sender", ev.Sender)
|
b.log.DebugContext(ctx, "drop: conversation busy generating", "room", roomID, "thread", threadRoot, "sender", ev.Sender)
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -364,7 +368,7 @@ func (b *Bot) handleMessage(ctx context.Context, ev *Event) {
|
||||||
// trigger+answer to the same per-thread buffer on success (see sendReply) and releases
|
// trigger+answer to the same per-thread buffer on success (see sendReply) and releases
|
||||||
// the claim.
|
// the claim.
|
||||||
history := b.snapshotBuf(roomID, threadRoot)
|
history := b.snapshotBuf(roomID, threadRoot)
|
||||||
b.safego("respond", func() {
|
b.safego(ctx, "respond", func() {
|
||||||
defer b.release(roomID, threadRoot)
|
defer b.release(roomID, threadRoot)
|
||||||
b.respond(ctx, roomID, threadRoot, isDM, ev, mc, history)
|
b.respond(ctx, roomID, threadRoot, isDM, ev, mc, history)
|
||||||
})
|
})
|
||||||
|
|
@ -389,7 +393,7 @@ func (b *Bot) respond(ctx context.Context, roomID, threadRoot string, isDM bool,
|
||||||
}
|
}
|
||||||
defer func() {
|
defer func() {
|
||||||
rl.LatencyMS = int(time.Since(started).Milliseconds())
|
rl.LatencyMS = int(time.Since(started).Milliseconds())
|
||||||
b.recordTelemetry(rl)
|
b.recordTelemetry(ctx, rl)
|
||||||
}()
|
}()
|
||||||
|
|
||||||
perUserCap := b.cfg.PerUserDailyCap
|
perUserCap := b.cfg.PerUserDailyCap
|
||||||
|
|
@ -406,14 +410,14 @@ func (b *Bot) respond(ctx context.Context, roomID, threadRoot string, isDM bool,
|
||||||
switch res, err := b.st.Reserve(ev.Sender, perUserCap, perUserUSD, b.cfg.DailyUSDCeiling, estimate); {
|
switch res, err := b.st.Reserve(ev.Sender, perUserCap, perUserUSD, b.cfg.DailyUSDCeiling, estimate); {
|
||||||
case err != nil:
|
case err != nil:
|
||||||
// A limiter failure is on our side — don't leave the user wondering.
|
// A limiter failure is on our side — don't leave the user wondering.
|
||||||
b.log.Error("limiter reserve failed", "sender", ev.Sender, "err", err)
|
b.log.ErrorContext(ctx, "limiter reserve failed", "sender", ev.Sender, "err", err)
|
||||||
rl.Degraded, rl.Err = degradeReserveErr, err.Error()
|
rl.Degraded, rl.Err = degradeReserveErr, err.Error()
|
||||||
b.react(ctx, roomID, ev.EventID, reactError)
|
b.react(ctx, roomID, ev.EventID, reactError)
|
||||||
return
|
return
|
||||||
case res == reserveDeniedUser:
|
case res == reserveDeniedUser:
|
||||||
// Per-user cap (anti-abuse, F24): stop answering, but always signal the limit —
|
// Per-user cap (anti-abuse, F24): stop answering, but always signal the limit —
|
||||||
// no message addressed to the bot is left without feedback.
|
// no message addressed to the bot is left without feedback.
|
||||||
b.log.Info("per-user daily cap reached; reacting", "sender", ev.Sender)
|
b.log.InfoContext(ctx, "per-user daily cap reached; reacting", "sender", ev.Sender)
|
||||||
rl.PerUserCapHit = true
|
rl.PerUserCapHit = true
|
||||||
b.react(ctx, roomID, ev.EventID, reactRateLimit)
|
b.react(ctx, roomID, ev.EventID, reactRateLimit)
|
||||||
return
|
return
|
||||||
|
|
@ -421,7 +425,7 @@ func (b *Bot) respond(ctx context.Context, roomID, threadRoot string, isDM bool,
|
||||||
// Global USD ceiling. A reaction is cheap and non-intrusive (unlike the old
|
// Global USD ceiling. A reaction is cheap and non-intrusive (unlike the old
|
||||||
// once-per-day text notice), so signal every affected message rather than
|
// once-per-day text notice), so signal every affected message rather than
|
||||||
// going silent after the first.
|
// going silent after the first.
|
||||||
b.log.Warn("global daily USD ceiling reached", "room", roomID, "sender", ev.Sender)
|
b.log.WarnContext(ctx, "global daily USD ceiling reached", "room", roomID, "sender", ev.Sender)
|
||||||
rl.CeilingHit = true
|
rl.CeilingHit = true
|
||||||
b.react(ctx, roomID, ev.EventID, reactRateLimit)
|
b.react(ctx, roomID, ev.EventID, reactRateLimit)
|
||||||
return
|
return
|
||||||
|
|
@ -440,10 +444,10 @@ func (b *Bot) respond(ctx context.Context, roomID, threadRoot string, isDM bool,
|
||||||
}
|
}
|
||||||
rl.Degraded, rl.Err = "panic", "generation panicked or returned without settling"
|
rl.Degraded, rl.Err = "panic", "generation panicked or returned without settling"
|
||||||
if rerr := b.st.ReleaseReservation(ev.Sender, estimate); rerr != nil {
|
if rerr := b.st.ReleaseReservation(ev.Sender, estimate); rerr != nil {
|
||||||
b.log.Error("release reservation (unsettled) failed", "sender", ev.Sender, "err", rerr)
|
b.log.ErrorContext(ctx, "release reservation (unsettled) failed", "sender", ev.Sender, "err", rerr)
|
||||||
}
|
}
|
||||||
if rerr := b.st.RefundRequest(ev.Sender); rerr != nil {
|
if rerr := b.st.RefundRequest(ev.Sender); rerr != nil {
|
||||||
b.log.Error("refund (unsettled) failed", "sender", ev.Sender, "err", rerr)
|
b.log.ErrorContext(ctx, "refund (unsettled) failed", "sender", ev.Sender, "err", rerr)
|
||||||
}
|
}
|
||||||
b.react(ctx, roomID, ev.EventID, reactError)
|
b.react(ctx, roomID, ev.EventID, reactError)
|
||||||
}()
|
}()
|
||||||
|
|
@ -484,20 +488,28 @@ func (b *Bot) respond(ctx context.Context, roomID, threadRoot string, isDM bool,
|
||||||
rl.Degraded = res.degraded
|
rl.Degraded = res.degraded
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// The full routing/generation picture for one request, in one line: which route ran,
|
||||||
|
// whether it was a fallback, the degrade reason (if any), the per-stage timings and
|
||||||
|
// the spend. At DEBUG so prod stays quiet; turn LOG_LEVEL=debug on to trace routing.
|
||||||
|
b.log.DebugContext(ctx, "generation outcome",
|
||||||
|
"route", res.route, "router_source", res.decision.Source,
|
||||||
|
"router_confidence", res.decision.Confidence, "fallback", res.fallback,
|
||||||
|
"degraded", res.degraded, "stage_ms", res.stageMS, "usd", res.cost.Total())
|
||||||
|
|
||||||
if err != nil {
|
if err != nil {
|
||||||
// Terminal: even grok_direct failed. Settle whatever the cascade ACTUALLY spent
|
// Terminal: even grok_direct failed. Settle whatever the cascade ACTUALLY spent
|
||||||
// (e.g. a paid web fetch before the failure) and release the rest of the
|
// (e.g. a paid web fetch before the failure) and release the rest of the
|
||||||
// reservation in one step, then refund the request slot so an outage doesn't burn
|
// reservation in one step, then refund the request slot so an outage doesn't burn
|
||||||
// the cap, and react (never silent). Settle with an all-zero cost is just a
|
// the cap, and react (never silent). Settle with an all-zero cost is just a
|
||||||
// release, so a pure grok_direct failure books nothing — exactly as before.
|
// release, so a pure grok_direct failure books nothing — exactly as before.
|
||||||
b.log.Error("generation failed", "sender", ev.Sender, "route", res.route, "err", err)
|
b.log.ErrorContext(ctx, "generation failed", "sender", ev.Sender, "route", res.route, "err", err)
|
||||||
rl.Err = err.Error()
|
rl.Err = err.Error()
|
||||||
if serr := b.st.Settle(ev.Sender, estimate, res.cost); serr != nil {
|
if serr := b.st.Settle(ev.Sender, estimate, res.cost); serr != nil {
|
||||||
b.log.Error("settle (failed request) failed", "sender", ev.Sender, "err", serr)
|
b.log.ErrorContext(ctx, "settle (failed request) failed", "sender", ev.Sender, "err", serr)
|
||||||
}
|
}
|
||||||
settled = true
|
settled = true
|
||||||
if rerr := b.st.RefundRequest(ev.Sender); rerr != nil {
|
if rerr := b.st.RefundRequest(ev.Sender); rerr != nil {
|
||||||
b.log.Error("refund failed", "sender", ev.Sender, "err", rerr)
|
b.log.ErrorContext(ctx, "refund failed", "sender", ev.Sender, "err", rerr)
|
||||||
}
|
}
|
||||||
b.react(ctx, roomID, ev.EventID, reactError)
|
b.react(ctx, roomID, ev.EventID, reactError)
|
||||||
return
|
return
|
||||||
|
|
@ -506,7 +518,7 @@ func (b *Bot) respond(ctx context.Context, roomID, threadRoot string, isDM bool,
|
||||||
// Success from some route. Settle: release the reservation and book the real
|
// Success from some route. Settle: release the reservation and book the real
|
||||||
// per-component cost, so both caps see grounding/tool fees too — not just tokens.
|
// per-component cost, so both caps see grounding/tool fees too — not just tokens.
|
||||||
if err := b.st.Settle(ev.Sender, estimate, res.cost); err != nil {
|
if err := b.st.Settle(ev.Sender, estimate, res.cost); err != nil {
|
||||||
b.log.Error("settle spend failed", "sender", ev.Sender, "err", err)
|
b.log.ErrorContext(ctx, "settle spend failed", "sender", ev.Sender, "err", err)
|
||||||
}
|
}
|
||||||
settled = true
|
settled = true
|
||||||
rl.PromptTokens, rl.CachedTokens, rl.CompletionTokens =
|
rl.PromptTokens, rl.CachedTokens, rl.CompletionTokens =
|
||||||
|
|
@ -520,21 +532,21 @@ func (b *Bot) respond(ctx context.Context, roomID, threadRoot string, isDM bool,
|
||||||
// leave a billed request without feedback — react "couldn't answer". The slot
|
// leave a billed request without feedback — react "couldn't answer". The slot
|
||||||
// stays consumed (the 2xx was real); no refund, or an empty reply could be forced
|
// stays consumed (the 2xx was real); no refund, or an empty reply could be forced
|
||||||
// to dodge the cap.
|
// to dodge the cap.
|
||||||
b.log.Warn("empty completion (billed, reacting)", "sender", ev.Sender, "usd", res.cost.Total())
|
b.log.WarnContext(ctx, "empty completion (billed, reacting)", "sender", ev.Sender, "usd", res.cost.Total())
|
||||||
rl.Degraded = degradeEmpty
|
rl.Degraded = degradeEmpty
|
||||||
b.react(ctx, roomID, ev.EventID, reactError)
|
b.react(ctx, roomID, ev.EventID, reactError)
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
b.log.Info("answered", "room", roomID, "sender", ev.Sender, "dm", isDM, "route", res.route,
|
b.log.InfoContext(ctx, "answered", "room", roomID, "sender", ev.Sender, "dm", isDM, "route", res.route,
|
||||||
"usd", res.cost.Total(), "prompt_tokens", res.usage.PromptTokens, "completion_tokens", res.usage.CompletionTokens)
|
"usd", res.cost.Total(), "prompt_tokens", res.usage.PromptTokens, "completion_tokens", res.usage.CompletionTokens)
|
||||||
if err := b.sendReply(ctx, roomID, threadRoot, ev, mc, text); err != nil {
|
if err := b.sendReply(ctx, roomID, threadRoot, ev, mc, text); err != nil {
|
||||||
// Paid silence (§8.1): the spend is real (USD is kept — refunding it would
|
// Paid silence (§8.1): the spend is real (USD is kept — refunding it would
|
||||||
// under-count the ceiling), but the reply never landed. Refund the request SLOT
|
// under-count the ceiling), but the reply never landed. Refund the request SLOT
|
||||||
// so the user can retry, and react ⚠️ so the failure isn't silent.
|
// so the user can retry, and react ⚠️ so the failure isn't silent.
|
||||||
b.log.Error("send reply failed after billing; refunding slot + reacting", "sender", ev.Sender, "err", err)
|
b.log.ErrorContext(ctx, "send reply failed after billing; refunding slot + reacting", "sender", ev.Sender, "err", err)
|
||||||
rl.Degraded, rl.Err = degradeSendFailed, err.Error()
|
rl.Degraded, rl.Err = degradeSendFailed, err.Error()
|
||||||
if rerr := b.st.RefundRequest(ev.Sender); rerr != nil {
|
if rerr := b.st.RefundRequest(ev.Sender); rerr != nil {
|
||||||
b.log.Error("refund failed", "sender", ev.Sender, "err", rerr)
|
b.log.ErrorContext(ctx, "refund failed", "sender", ev.Sender, "err", rerr)
|
||||||
}
|
}
|
||||||
b.react(ctx, roomID, ev.EventID, reactError)
|
b.react(ctx, roomID, ev.EventID, reactError)
|
||||||
return
|
return
|
||||||
|
|
@ -605,7 +617,7 @@ func (b *Bot) react(ctx context.Context, roomID, eventID, emoji string) {
|
||||||
},
|
},
|
||||||
}
|
}
|
||||||
if _, err := b.mx.SendEvent(ctx, roomID, "m.reaction", content); err != nil {
|
if _, err := b.mx.SendEvent(ctx, roomID, "m.reaction", content); err != nil {
|
||||||
b.log.Error("react failed", "room", roomID, "emoji", emoji, "err", err)
|
b.log.ErrorContext(ctx, "react failed", "room", roomID, "emoji", emoji, "err", err)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -619,7 +631,7 @@ func (b *Bot) react(ctx context.Context, roomID, eventID, emoji string) {
|
||||||
func (b *Bot) reactEncryptedOnce(ctx context.Context, roomID, eventID string) bool {
|
func (b *Bot) reactEncryptedOnce(ctx context.Context, roomID, eventID string) bool {
|
||||||
warned, err := b.st.HasWarnedEncrypted(roomID)
|
warned, err := b.st.HasWarnedEncrypted(roomID)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
b.log.Error("warned-flag read failed", "room", roomID, "err", err)
|
b.log.ErrorContext(ctx, "warned-flag read failed", "room", roomID, "err", err)
|
||||||
return false
|
return false
|
||||||
}
|
}
|
||||||
if warned {
|
if warned {
|
||||||
|
|
@ -627,7 +639,7 @@ func (b *Bot) reactEncryptedOnce(ctx context.Context, roomID, eventID string) bo
|
||||||
}
|
}
|
||||||
b.react(ctx, roomID, eventID, reactEncrypted)
|
b.react(ctx, roomID, eventID, reactEncrypted)
|
||||||
if err := b.st.SetWarnedEncrypted(roomID); err != nil {
|
if err := b.st.SetWarnedEncrypted(roomID); err != nil {
|
||||||
b.log.Error("persist warned-flag failed", "room", roomID, "err", err)
|
b.log.ErrorContext(ctx, "persist warned-flag failed", "room", roomID, "err", err)
|
||||||
}
|
}
|
||||||
return true
|
return true
|
||||||
}
|
}
|
||||||
|
|
@ -656,7 +668,7 @@ func (b *Bot) sendMessage(ctx context.Context, roomID, threadRoot string, trigge
|
||||||
content := buildNoticeContent(trigger.EventID, trigger.Sender, threadRoot, body)
|
content := buildNoticeContent(trigger.EventID, trigger.Sender, threadRoot, body)
|
||||||
id, err := b.mx.SendEvent(ctx, roomID, "m.room.message", content)
|
id, err := b.mx.SendEvent(ctx, roomID, "m.room.message", content)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
b.log.Error("send failed", "room", roomID, "err", err)
|
b.log.ErrorContext(ctx, "send failed", "room", roomID, "err", err)
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
// Track our own reply so a future reply-to-it is recognised as addressing us.
|
// Track our own reply so a future reply-to-it is recognised as addressing us.
|
||||||
|
|
@ -732,7 +744,7 @@ func (b *Bot) typingRelease(roomID string) (last bool) {
|
||||||
// non-fatal). The 30s server-side timeout is refreshed by startTypingKeepalive.
|
// non-fatal). The 30s server-side timeout is refreshed by startTypingKeepalive.
|
||||||
func (b *Bot) setTyping(ctx context.Context, roomID string, typing bool) {
|
func (b *Bot) setTyping(ctx context.Context, roomID string, typing bool) {
|
||||||
if err := b.mx.SendTyping(ctx, roomID, typing, 30000); err != nil {
|
if err := b.mx.SendTyping(ctx, roomID, typing, 30000); err != nil {
|
||||||
b.log.Debug("set typing failed", "room", roomID, "typing", typing, "err", err)
|
b.log.DebugContext(ctx, "set typing failed", "room", roomID, "typing", typing, "err", err)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -871,7 +883,7 @@ func (b *Bot) ensureEncryption(ctx context.Context, roomID string) bool {
|
||||||
|
|
||||||
enc, err := b.mx.RoomEncrypted(ctx, roomID)
|
enc, err := b.mx.RoomEncrypted(ctx, roomID)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
b.log.Warn("encryption probe failed", "room", roomID, "err", err)
|
b.log.WarnContext(ctx, "encryption probe failed", "room", roomID, "err", err)
|
||||||
return false // leave unknown; re-probed on the next message
|
return false // leave unknown; re-probed on the next message
|
||||||
}
|
}
|
||||||
// Re-fetch under the lock instead of writing to the pointer captured before the
|
// Re-fetch under the lock instead of writing to the pointer captured before the
|
||||||
|
|
@ -897,7 +909,7 @@ func (b *Bot) ensureCounts(ctx context.Context, roomID string) (countsKnown, isD
|
||||||
if !known {
|
if !known {
|
||||||
joined, invited, servers, err := b.mx.RoomMembership(ctx, roomID)
|
joined, invited, servers, err := b.mx.RoomMembership(ctx, roomID)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
b.log.Warn("member probe failed", "room", roomID, "err", err)
|
b.log.WarnContext(ctx, "member probe failed", "room", roomID, "err", err)
|
||||||
return false, false, false
|
return false, false, false
|
||||||
}
|
}
|
||||||
isForeign := false
|
isForeign := false
|
||||||
|
|
|
||||||
|
|
@ -68,6 +68,14 @@ func (b *Bot) generate(ctx context.Context, body string, msgs []Message, convID
|
||||||
res.stageMS["router"] = msSince(t0)
|
res.stageMS["router"] = msSince(t0)
|
||||||
res.route = res.decision.Route
|
res.route = res.decision.Route
|
||||||
|
|
||||||
|
// The router's pre-dispatch verdict (what it chose, why, how sure). On a degrade the
|
||||||
|
// route that actually runs differs from this — respond logs that final outcome — so
|
||||||
|
// the two lines together show "router wanted X, we ran Y". DEBUG: routing diagnostics.
|
||||||
|
b.log.DebugContext(ctx, "route decided",
|
||||||
|
"route", res.decision.Route, "source", res.decision.Source,
|
||||||
|
"confidence", res.decision.Confidence, "needs_web", res.decision.NeedsWeb,
|
||||||
|
"reasoning_level", res.decision.ReasoningLevel)
|
||||||
|
|
||||||
finalMsgs := msgs
|
finalMsgs := msgs
|
||||||
switch res.decision.Route {
|
switch res.decision.Route {
|
||||||
case routeTrivial:
|
case routeTrivial:
|
||||||
|
|
@ -75,7 +83,7 @@ func (b *Bot) generate(ctx context.Context, body string, msgs []Message, convID
|
||||||
if err := b.genTrivial(ctx, msgs, &res); err == nil {
|
if err := b.genTrivial(ctx, msgs, &res); err == nil {
|
||||||
return res, nil
|
return res, nil
|
||||||
} else {
|
} else {
|
||||||
b.log.Warn("trivial offload failed; degrading to grok_direct", "err", err)
|
b.log.WarnContext(ctx, "trivial offload failed; degrading to grok_direct", "err", err)
|
||||||
b.degradeTo(&res, degradeTrivial)
|
b.degradeTo(&res, degradeTrivial)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
@ -84,7 +92,7 @@ func (b *Bot) generate(ctx context.Context, body string, msgs []Message, convID
|
||||||
if err := b.genWebThenGrok(ctx, body, msgs, convID, &res); err == nil {
|
if err := b.genWebThenGrok(ctx, body, msgs, convID, &res); err == nil {
|
||||||
return res, nil
|
return res, nil
|
||||||
} else {
|
} else {
|
||||||
b.log.Warn("web route failed; degrading to grok_direct", "err", err, "reason", res.degraded)
|
b.log.WarnContext(ctx, "web route failed; degrading to grok_direct", "err", err, "reason", res.degraded)
|
||||||
b.degradeTo(&res, degradeWeb)
|
b.degradeTo(&res, degradeWeb)
|
||||||
// The question wanted fresh facts but we have none — answer from training
|
// The question wanted fresh facts but we have none — answer from training
|
||||||
// knowledge WITH an honest staleness caveat, not stale-as-current (§8.2.1).
|
// knowledge WITH an honest staleness caveat, not stale-as-current (§8.2.1).
|
||||||
|
|
@ -96,7 +104,7 @@ func (b *Bot) generate(ctx context.Context, body string, msgs []Message, convID
|
||||||
if err := b.genReason(ctx, msgs, convID, &res); err == nil {
|
if err := b.genReason(ctx, msgs, convID, &res); err == nil {
|
||||||
return res, nil
|
return res, nil
|
||||||
} else {
|
} else {
|
||||||
b.log.Warn("reasoning route failed; degrading to grok_direct", "err", err)
|
b.log.WarnContext(ctx, "reasoning route failed; degrading to grok_direct", "err", err)
|
||||||
b.degradeTo(&res, degradeReasoning)
|
b.degradeTo(&res, degradeReasoning)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -146,6 +146,13 @@ type Config struct {
|
||||||
// It holds only operational state (txn/event dedup, the daily spend ledger, the
|
// It holds only operational state (txn/event dedup, the daily spend ledger, the
|
||||||
// encrypted-warned set) — never message content. Required.
|
// encrypted-warned set) — never message content. Required.
|
||||||
DatabaseURL string
|
DatabaseURL string
|
||||||
|
|
||||||
|
// LogBodiesUsers is the allowlist of sender mxids whose model request/response
|
||||||
|
// BODIES are logged in full (truncated, at DEBUG) for debugging — everyone else gets
|
||||||
|
// routing + metadata logs only. Empty (default) = nobody, so message content never
|
||||||
|
// enters the logs unless an operator opts a specific user in AND runs at
|
||||||
|
// LOG_LEVEL=debug. Parsed from LOG_BODIES_USERS (comma-separated mxids).
|
||||||
|
LogBodiesUsers map[string]bool
|
||||||
}
|
}
|
||||||
|
|
||||||
func getenv(key, def string) string {
|
func getenv(key, def string) string {
|
||||||
|
|
@ -240,6 +247,7 @@ func LoadConfig() (*Config, error) {
|
||||||
DatabaseURL: getenv("AI_BOT_DATABASE_URL", ""),
|
DatabaseURL: getenv("AI_BOT_DATABASE_URL", ""),
|
||||||
AllowedServers: parseServerSet(getenv("ALLOWED_SERVERS", "")),
|
AllowedServers: parseServerSet(getenv("ALLOWED_SERVERS", "")),
|
||||||
UnlimitedUsers: parseServerSet(getenv("UNLIMITED_USERS", "")),
|
UnlimitedUsers: parseServerSet(getenv("UNLIMITED_USERS", "")),
|
||||||
|
LogBodiesUsers: parseServerSet(getenv("LOG_BODIES_USERS", "")),
|
||||||
|
|
||||||
// Cascade string-valued config (flags/ints/secrets parsed below).
|
// Cascade string-valued config (flags/ints/secrets parsed below).
|
||||||
GrokReasoningEffort: strings.ToLower(strings.TrimSpace(getenv("GROK_REASONING_EFFORT", ""))),
|
GrokReasoningEffort: strings.ToLower(strings.TrimSpace(getenv("GROK_REASONING_EFFORT", ""))),
|
||||||
|
|
@ -460,6 +468,10 @@ func (c *Config) Summary() string {
|
||||||
for u := range c.UnlimitedUsers {
|
for u := range c.UnlimitedUsers {
|
||||||
unlimited = append(unlimited, u)
|
unlimited = append(unlimited, u)
|
||||||
}
|
}
|
||||||
|
bodyUsers := make([]string, 0, len(c.LogBodiesUsers))
|
||||||
|
for u := range c.LogBodiesUsers {
|
||||||
|
bodyUsers = append(bodyUsers, u)
|
||||||
|
}
|
||||||
redact := func(s string) string {
|
redact := func(s string) string {
|
||||||
if s == "" {
|
if s == "" {
|
||||||
return "(unset)"
|
return "(unset)"
|
||||||
|
|
@ -505,6 +517,13 @@ func (c *Config) Summary() string {
|
||||||
fmt.Sprintf(" GROK_PROMPT_CACHE = %t", c.GrokPromptCache),
|
fmt.Sprintf(" GROK_PROMPT_CACHE = %t", c.GrokPromptCache),
|
||||||
fmt.Sprintf(" TELEMETRY_ENABLED = %t (store_text=%t, retention=%s)",
|
fmt.Sprintf(" TELEMETRY_ENABLED = %t (store_text=%t, retention=%s)",
|
||||||
c.TelemetryEnabled, c.TelemetryStoreText, c.TelemetryRetention),
|
c.TelemetryEnabled, c.TelemetryStoreText, c.TelemetryRetention),
|
||||||
|
fmt.Sprintf(" LOG_BODIES_USERS = %s (needs LOG_LEVEL=debug)",
|
||||||
|
func() string {
|
||||||
|
if len(bodyUsers) == 0 {
|
||||||
|
return "(none — bodies never logged)"
|
||||||
|
}
|
||||||
|
return strings.Join(bodyUsers, ",")
|
||||||
|
}()),
|
||||||
fmt.Sprintf(" CASCADE: router=%t classifier=%t trivial=%t web=%t(%s, cap=%d) reason=%t(%s)",
|
fmt.Sprintf(" CASCADE: router=%t classifier=%t trivial=%t web=%t(%s, cap=%d) reason=%t(%s)",
|
||||||
c.RouterEnabled, c.RouterClassifierEnabled, c.TrivialOffloadEnabled,
|
c.RouterEnabled, c.RouterClassifierEnabled, c.TrivialOffloadEnabled,
|
||||||
c.WebEnabled, c.WebProvider, c.WebGroundingDailyCap, c.ReasoningEnabled, c.ReasoningEffort),
|
c.WebEnabled, c.WebProvider, c.WebGroundingDailyCap, c.ReasoningEnabled, c.ReasoningEffort),
|
||||||
|
|
|
||||||
|
|
@ -137,7 +137,7 @@ func (c *openAIClient) complete(ctx context.Context, reqBody openAIRequest, reqH
|
||||||
return nil, err
|
return nil, err
|
||||||
}
|
}
|
||||||
if c.log != nil {
|
if c.log != nil {
|
||||||
c.log.Warn(c.name+" attempt failed, will retry", "attempt", attempt+1, "max", c.maxTry, "err", err)
|
c.log.WarnContext(ctx, c.name+" attempt failed, will retry", "attempt", attempt+1, "max", c.maxTry, "err", err)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
return nil, fmt.Errorf("%s: exhausted %d attempts: %w", c.name, c.maxTry, lastErr)
|
return nil, fmt.Errorf("%s: exhausted %d attempts: %w", c.name, c.maxTry, lastErr)
|
||||||
|
|
@ -172,6 +172,10 @@ func (c *openAIClient) attempt(ctx context.Context, payload []byte, reqHeaders m
|
||||||
defer resp.Body.Close()
|
defer resp.Body.Close()
|
||||||
data, _ := io.ReadAll(resp.Body)
|
data, _ := io.ReadAll(resp.Body)
|
||||||
|
|
||||||
|
// Gated, per-user, DEBUG: the full request/response bodies for opted-in senders.
|
||||||
|
// payload never contains the API key (that's the Authorization header, not logged).
|
||||||
|
logLLMExchange(ctx, c.log, c.name, payload, resp.StatusCode, data)
|
||||||
|
|
||||||
if resp.StatusCode == http.StatusTooManyRequests || resp.StatusCode >= 500 {
|
if resp.StatusCode == http.StatusTooManyRequests || resp.StatusCode >= 500 {
|
||||||
return nil, true, fmt.Errorf("%s http %d: %s", c.name, resp.StatusCode, snippet(data))
|
return nil, true, fmt.Errorf("%s http %d: %s", c.name, resp.StatusCode, snippet(data))
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -1,6 +1,7 @@
|
||||||
package main
|
package main
|
||||||
|
|
||||||
import (
|
import (
|
||||||
|
"context"
|
||||||
"log/slog"
|
"log/slog"
|
||||||
"os"
|
"os"
|
||||||
"strings"
|
"strings"
|
||||||
|
|
@ -27,7 +28,77 @@ func newLogger() *slog.Logger {
|
||||||
} else {
|
} else {
|
||||||
h = slog.NewTextHandler(os.Stderr, opts)
|
h = slog.NewTextHandler(os.Stderr, opts)
|
||||||
}
|
}
|
||||||
return slog.New(h)
|
// Wrap so every record logged with a *Context method carries the request trace_id.
|
||||||
|
return slog.New(contextHandler{h})
|
||||||
|
}
|
||||||
|
|
||||||
|
// contextHandler wraps a slog.Handler so a record logged with one of the *Context methods
|
||||||
|
// (InfoContext/DebugContext/WarnContext/ErrorContext) automatically gets the request's
|
||||||
|
// trace_id attached — the userver-style trail, set once at the top of a request and
|
||||||
|
// carried through to the model call. Records logged without a traced context (startup,
|
||||||
|
// the appservice transaction handler) simply carry no trace_id. The field is named
|
||||||
|
// trace_id (the OTel convention) so it maps cleanly into OpenSearch and a future OTel
|
||||||
|
// exporter.
|
||||||
|
type contextHandler struct{ slog.Handler }
|
||||||
|
|
||||||
|
func (h contextHandler) Handle(ctx context.Context, r slog.Record) error {
|
||||||
|
if id := traceFromContext(ctx); id != "" {
|
||||||
|
r.AddAttrs(slog.String("trace_id", id))
|
||||||
|
}
|
||||||
|
return h.Handler.Handle(ctx, r)
|
||||||
|
}
|
||||||
|
|
||||||
|
// WithAttrs/WithGroup must re-wrap, or the embedded handler's versions would return a
|
||||||
|
// bare handler and silently drop the trace_id injection for any child logger.
|
||||||
|
func (h contextHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
|
||||||
|
return contextHandler{h.Handler.WithAttrs(attrs)}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (h contextHandler) WithGroup(name string) slog.Handler {
|
||||||
|
return contextHandler{h.Handler.WithGroup(name)}
|
||||||
|
}
|
||||||
|
|
||||||
|
// logLLMExchange logs the raw request/response bodies of one model call at DEBUG, but
|
||||||
|
// ONLY for senders on the LOG_BODIES_USERS allowlist (the per-request `verbose` flag
|
||||||
|
// stamped into ctx at admission). Everyone else gets routing + metadata logs only, so
|
||||||
|
// message content never enters the logs unless an operator opts a specific user in AND
|
||||||
|
// runs at LOG_LEVEL=debug (DebugContext is suppressed otherwise). Only the request and
|
||||||
|
// response BODIES are logged — never the URL or any header — so the API key cannot leak
|
||||||
|
// whether it rides in the Authorization header (xAI / Gemini-compat) or the URL query
|
||||||
|
// string (Gemini native grounding). Bodies are truncated to llmBodyLogMax bytes, and the
|
||||||
|
// auto-injected trace_id ties the exchange to the rest of the request's lines.
|
||||||
|
func logLLMExchange(ctx context.Context, log *slog.Logger, provider string, reqBody []byte, status int, respBody []byte) {
|
||||||
|
if log == nil {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
ri, ok := reqInfoFromContext(ctx)
|
||||||
|
if !ok || !ri.verbose {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
log.DebugContext(ctx, "llm exchange",
|
||||||
|
"provider", provider,
|
||||||
|
"sender", ri.sender,
|
||||||
|
"status", status,
|
||||||
|
"request", truncateForLog(reqBody, llmBodyLogMax),
|
||||||
|
"response", truncateForLog(respBody, llmBodyLogMax),
|
||||||
|
)
|
||||||
|
}
|
||||||
|
|
||||||
|
// llmBodyLogMax caps each logged model body. A fixed constant, not an env knob — the cap
|
||||||
|
// only bounds log volume on an opt-in debug path and never needs per-deploy tuning. ~4 KB
|
||||||
|
// keeps a typical prompt/answer readable while staying bounded.
|
||||||
|
const llmBodyLogMax = 4096
|
||||||
|
|
||||||
|
// truncateForLog renders a body for a log line, capped at maxBytes (≤0 → a 2000-byte
|
||||||
|
// default) with an ellipsis marker so a truncated payload is visibly truncated.
|
||||||
|
func truncateForLog(b []byte, maxBytes int) string {
|
||||||
|
if maxBytes <= 0 {
|
||||||
|
maxBytes = 2000
|
||||||
|
}
|
||||||
|
if len(b) > maxBytes {
|
||||||
|
return string(b[:maxBytes]) + "…(truncated)"
|
||||||
|
}
|
||||||
|
return string(b)
|
||||||
}
|
}
|
||||||
|
|
||||||
func parseLogLevel(s string) slog.Level {
|
func parseLogLevel(s string) slog.Level {
|
||||||
|
|
|
||||||
87
apps/ai-bot/logging_test.go
Normal file
87
apps/ai-bot/logging_test.go
Normal file
|
|
@ -0,0 +1,87 @@
|
||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"bytes"
|
||||||
|
"context"
|
||||||
|
"log/slog"
|
||||||
|
"strings"
|
||||||
|
"testing"
|
||||||
|
)
|
||||||
|
|
||||||
|
// debugLogger builds a JSON slog.Logger over buf wrapped in the contextHandler, at the
|
||||||
|
// given level — the same wrapping newLogger applies, so these tests exercise the real
|
||||||
|
// trace-injection path.
|
||||||
|
func debugLogger(buf *bytes.Buffer, level slog.Level) *slog.Logger {
|
||||||
|
h := slog.NewJSONHandler(buf, &slog.HandlerOptions{Level: level})
|
||||||
|
return slog.New(contextHandler{h})
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestContextHandlerInjectsTraceID proves a record logged with a *Context method carries
|
||||||
|
// the request's trace_id, and that a record with no traced context carries none.
|
||||||
|
func TestContextHandlerInjectsTraceID(t *testing.T) {
|
||||||
|
var buf bytes.Buffer
|
||||||
|
log := debugLogger(&buf, slog.LevelInfo)
|
||||||
|
|
||||||
|
ctx := withRequestTrace(context.Background(), "trace-abc", "@u:vojo.chat", false)
|
||||||
|
log.InfoContext(ctx, "with trace")
|
||||||
|
if !strings.Contains(buf.String(), `"trace_id":"trace-abc"`) {
|
||||||
|
t.Fatalf("traced line missing trace_id: %s", buf.String())
|
||||||
|
}
|
||||||
|
|
||||||
|
buf.Reset()
|
||||||
|
log.InfoContext(context.Background(), "no trace")
|
||||||
|
if strings.Contains(buf.String(), "trace_id") {
|
||||||
|
t.Fatalf("untraced line should carry no trace_id: %s", buf.String())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestLogLLMExchangeGate proves bodies are logged ONLY for a verbose (allowlisted)
|
||||||
|
// sender, never otherwise — the per-user privacy gate.
|
||||||
|
func TestLogLLMExchangeGate(t *testing.T) {
|
||||||
|
req := []byte(`{"model":"grok","messages":[{"role":"user","content":"secret question"}]}`)
|
||||||
|
resp := []byte(`{"choices":[{"message":{"content":"secret answer"}}]}`)
|
||||||
|
|
||||||
|
// Not on the allowlist → nothing logged, even at debug level.
|
||||||
|
var off bytes.Buffer
|
||||||
|
logOff := debugLogger(&off, slog.LevelDebug)
|
||||||
|
ctxOff := withRequestTrace(context.Background(), "t1", "@stranger:vojo.chat", false)
|
||||||
|
logLLMExchange(ctxOff, logOff, "xai", req, 200, resp)
|
||||||
|
if off.Len() != 0 {
|
||||||
|
t.Fatalf("non-allowlisted sender must not log bodies, got: %s", off.String())
|
||||||
|
}
|
||||||
|
|
||||||
|
// On the allowlist → request + response bodies appear, tagged with the trace_id.
|
||||||
|
var on bytes.Buffer
|
||||||
|
logOn := debugLogger(&on, slog.LevelDebug)
|
||||||
|
ctxOn := withRequestTrace(context.Background(), "t2", "@heaven:vojo.chat", true)
|
||||||
|
logLLMExchange(ctxOn, logOn, "xai", req, 200, resp)
|
||||||
|
out := on.String()
|
||||||
|
for _, want := range []string{"llm exchange", "secret question", "secret answer", `"trace_id":"t2"`, `"sender":"@heaven:vojo.chat"`} {
|
||||||
|
if !strings.Contains(out, want) {
|
||||||
|
t.Fatalf("allowlisted body log missing %q: %s", want, out)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestLogLLMExchangeRequiresDebug proves that even an allowlisted sender logs nothing at
|
||||||
|
// info level — bodies need LOG_LEVEL=debug, the second half of the gate.
|
||||||
|
func TestLogLLMExchangeRequiresDebug(t *testing.T) {
|
||||||
|
var buf bytes.Buffer
|
||||||
|
log := debugLogger(&buf, slog.LevelInfo)
|
||||||
|
ctx := withRequestTrace(context.Background(), "t3", "@heaven:vojo.chat", true)
|
||||||
|
logLLMExchange(ctx, log, "xai", []byte("req"), 200, []byte("resp"))
|
||||||
|
if buf.Len() != 0 {
|
||||||
|
t.Fatalf("bodies must stay suppressed at info level, got: %s", buf.String())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestTruncateForLog bounds body size and marks truncation.
|
||||||
|
func TestTruncateForLog(t *testing.T) {
|
||||||
|
if got := truncateForLog([]byte("short"), 100); got != "short" {
|
||||||
|
t.Fatalf("under-cap should pass through, got %q", got)
|
||||||
|
}
|
||||||
|
got := truncateForLog([]byte("0123456789"), 4)
|
||||||
|
if got != "0123…(truncated)" {
|
||||||
|
t.Fatalf("over-cap truncation = %q", got)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
@ -151,6 +151,7 @@ func (c *geminiClient) groundedSearch(ctx context.Context, query string) (gemini
|
||||||
}
|
}
|
||||||
defer resp.Body.Close()
|
defer resp.Body.Close()
|
||||||
data, _ := io.ReadAll(resp.Body)
|
data, _ := io.ReadAll(resp.Body)
|
||||||
|
logLLMExchange(ctx, c.log, "gemini_grounding", body, resp.StatusCode, data)
|
||||||
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
|
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
|
||||||
return geminiGroundResult{}, fmt.Errorf("gemini grounding http %d: %s", resp.StatusCode, snippet(data))
|
return geminiGroundResult{}, fmt.Errorf("gemini grounding http %d: %s", resp.StatusCode, snippet(data))
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -88,7 +88,7 @@ func (b *Bot) classify(ctx context.Context, body string, cost *CostBreakdown) Ro
|
||||||
}
|
}
|
||||||
refined, err := b.routeLayer1(ctx, body, cost)
|
refined, err := b.routeLayer1(ctx, body, cost)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
b.log.Warn("layer-1 classifier failed; using heuristic", "err", err)
|
b.log.WarnContext(ctx, "layer-1 classifier failed; using heuristic", "err", err)
|
||||||
return d // degrade to the heuristic verdict
|
return d // degrade to the heuristic verdict
|
||||||
}
|
}
|
||||||
return refined
|
return refined
|
||||||
|
|
|
||||||
|
|
@ -1,6 +1,9 @@
|
||||||
package main
|
package main
|
||||||
|
|
||||||
import "time"
|
import (
|
||||||
|
"context"
|
||||||
|
"time"
|
||||||
|
)
|
||||||
|
|
||||||
// telemetry.go is the request_log analytics path: it captures route, cost, latency
|
// telemetry.go is the request_log analytics path: it captures route, cost, latency
|
||||||
// and outcome for each engaged request so the real $/day and route mix can be
|
// and outcome for each engaged request so the real $/day and route mix can be
|
||||||
|
|
@ -77,18 +80,18 @@ type RequestLog struct {
|
||||||
// The query text is stripped unless TELEMETRY_STORE_TEXT, so message content never
|
// The query text is stripped unless TELEMETRY_STORE_TEXT, so message content never
|
||||||
// lands in the analytics table by default. Runs in a recovered goroutine and only
|
// lands in the analytics table by default. Runs in a recovered goroutine and only
|
||||||
// logs failures, so it can never drop or delay the reply.
|
// logs failures, so it can never drop or delay the reply.
|
||||||
func (b *Bot) recordTelemetry(rl RequestLog) {
|
func (b *Bot) recordTelemetry(ctx context.Context, rl RequestLog) {
|
||||||
if !b.cfg.TelemetryEnabled {
|
if !b.cfg.TelemetryEnabled {
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
if !b.cfg.TelemetryStoreText {
|
if !b.cfg.TelemetryStoreText {
|
||||||
rl.QueryText = ""
|
rl.QueryText = ""
|
||||||
}
|
}
|
||||||
b.safego("telemetry", func() {
|
b.safego(ctx, "telemetry", func() {
|
||||||
if err := b.st.InsertRequestLog(rl); err != nil {
|
if err := b.st.InsertRequestLog(rl); err != nil {
|
||||||
b.log.Warn("request_log insert failed (non-fatal)", "id", rl.ID, "err", err)
|
b.log.WarnContext(ctx, "request_log insert failed (non-fatal)", "id", rl.ID, "err", err)
|
||||||
}
|
}
|
||||||
b.maybeTrimTelemetry()
|
b.maybeTrimTelemetry(ctx)
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -96,8 +99,8 @@ func (b *Bot) recordTelemetry(rl RequestLog) {
|
||||||
// any model ran (encrypted/media/foreign). These are low-frequency, so a direct row
|
// any model ran (encrypted/media/foreign). These are low-frequency, so a direct row
|
||||||
// (route=none + reason) keeps the "why no answer" visible without flooding the table
|
// (route=none + reason) keeps the "why no answer" visible without flooding the table
|
||||||
// with the common not-addressed drops, which are not logged (pre-claim best-effort).
|
// with the common not-addressed drops, which are not logged (pre-claim best-effort).
|
||||||
func (b *Bot) recordSkip(ev *Event, reason string) {
|
func (b *Bot) recordSkip(ctx context.Context, ev *Event, reason string) {
|
||||||
b.recordTelemetry(RequestLog{
|
b.recordTelemetry(ctx, RequestLog{
|
||||||
ID: ev.EventID,
|
ID: ev.EventID,
|
||||||
RoomID: ev.RoomID,
|
RoomID: ev.RoomID,
|
||||||
Sender: ev.Sender,
|
Sender: ev.Sender,
|
||||||
|
|
@ -111,7 +114,7 @@ func (b *Bot) recordSkip(ev *Event, reason string) {
|
||||||
|
|
||||||
// maybeTrimTelemetry runs the time-based retention trim once per telemetryTrimEvery
|
// maybeTrimTelemetry runs the time-based retention trim once per telemetryTrimEvery
|
||||||
// writes. Best-effort and off the hot path (called from the telemetry goroutine).
|
// writes. Best-effort and off the hot path (called from the telemetry goroutine).
|
||||||
func (b *Bot) maybeTrimTelemetry() {
|
func (b *Bot) maybeTrimTelemetry(ctx context.Context) {
|
||||||
if b.cfg.TelemetryRetention <= 0 {
|
if b.cfg.TelemetryRetention <= 0 {
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
|
@ -119,6 +122,6 @@ func (b *Bot) maybeTrimTelemetry() {
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
if err := b.st.TrimRequestLog(time.Now().Add(-b.cfg.TelemetryRetention)); err != nil {
|
if err := b.st.TrimRequestLog(time.Now().Add(-b.cfg.TelemetryRetention)); err != nil {
|
||||||
b.log.Warn("request_log trim failed (non-fatal)", "err", err)
|
b.log.WarnContext(ctx, "request_log trim failed (non-fatal)", "err", err)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -1,6 +1,7 @@
|
||||||
package main
|
package main
|
||||||
|
|
||||||
import (
|
import (
|
||||||
|
"context"
|
||||||
"io"
|
"io"
|
||||||
"log/slog"
|
"log/slog"
|
||||||
"testing"
|
"testing"
|
||||||
|
|
@ -33,7 +34,7 @@ func TestRecordSkipWritesRow(t *testing.T) {
|
||||||
b := newTestBot(st, &Config{TelemetryEnabled: true})
|
b := newTestBot(st, &Config{TelemetryEnabled: true})
|
||||||
|
|
||||||
ev := &Event{EventID: "$skip-1", RoomID: "!r:vojo.chat", Sender: "@u:vojo.chat"}
|
ev := &Event{EventID: "$skip-1", RoomID: "!r:vojo.chat", Sender: "@u:vojo.chat"}
|
||||||
b.recordSkip(ev, degradeMedia)
|
b.recordSkip(context.Background(), ev, degradeMedia)
|
||||||
|
|
||||||
deadline := time.Now().Add(2 * time.Second)
|
deadline := time.Now().Add(2 * time.Second)
|
||||||
for requestLogCount(t, st) == 0 && time.Now().Before(deadline) {
|
for requestLogCount(t, st) == 0 && time.Now().Before(deadline) {
|
||||||
|
|
@ -61,7 +62,7 @@ func TestTelemetryDisabledWritesNothing(t *testing.T) {
|
||||||
defer st.Close()
|
defer st.Close()
|
||||||
b := newTestBot(st, &Config{TelemetryEnabled: false})
|
b := newTestBot(st, &Config{TelemetryEnabled: false})
|
||||||
|
|
||||||
b.recordSkip(&Event{EventID: "$skip-2", RoomID: "!r:vojo.chat", Sender: "@u:vojo.chat"}, degradeMedia)
|
b.recordSkip(context.Background(), &Event{EventID: "$skip-2", RoomID: "!r:vojo.chat", Sender: "@u:vojo.chat"}, degradeMedia)
|
||||||
|
|
||||||
// Give any (incorrect) async write time to land, then assert nothing was written.
|
// Give any (incorrect) async write time to land, then assert nothing was written.
|
||||||
time.Sleep(200 * time.Millisecond)
|
time.Sleep(200 * time.Millisecond)
|
||||||
|
|
|
||||||
66
apps/ai-bot/trace.go
Normal file
66
apps/ai-bot/trace.go
Normal file
|
|
@ -0,0 +1,66 @@
|
||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"crypto/rand"
|
||||||
|
"encoding/hex"
|
||||||
|
)
|
||||||
|
|
||||||
|
// trace.go threads a per-request correlation id (and the small request facts the logger
|
||||||
|
// and the body-logging gate need) through context — the userver / OpenTelemetry idiom:
|
||||||
|
// mint once at the top of a request, and every log line below it (down to the HTTP call
|
||||||
|
// to the model) carries the same trace_id without passing a logger by hand. ctx is
|
||||||
|
// already plumbed through the whole request path (handleEvent → respond → generate →
|
||||||
|
// LLMClient.Complete → the transport), so a value placed here surfaces everywhere.
|
||||||
|
//
|
||||||
|
// The id is 16 random bytes rendered as 32 hex chars — the W3C Trace-Context / OTel
|
||||||
|
// trace-id shape — so the trace_id field maps straight onto an OpenTelemetry trace id if
|
||||||
|
// an exporter is added later (no log/field rename). Today this is a correlation key, not
|
||||||
|
// a full SpanContext: real distributed tracing would still add a span_id and traceparent
|
||||||
|
// propagation across services.
|
||||||
|
|
||||||
|
type ctxKey int
|
||||||
|
|
||||||
|
const reqInfoKey ctxKey = iota
|
||||||
|
|
||||||
|
// reqInfo is the per-request data carried in context: the trace id stamped on every log
|
||||||
|
// line, the sender (so the body-log lines stay filterable by user), and verbose —
|
||||||
|
// whether this sender is on the LOG_BODIES_USERS allowlist. verbose is decided once, at
|
||||||
|
// admission, so the deep transport never re-checks the allowlist; it just reads the flag.
|
||||||
|
type reqInfo struct {
|
||||||
|
traceID string
|
||||||
|
sender string
|
||||||
|
verbose bool
|
||||||
|
}
|
||||||
|
|
||||||
|
// withRequestTrace stamps the request's trace id + sender + body-logging decision onto
|
||||||
|
// ctx. Call it once per handled event; the value flows down through the per-room
|
||||||
|
// goroutine, the per-request deadline ctx (WithTimeout preserves values), and into the
|
||||||
|
// model transport.
|
||||||
|
func withRequestTrace(ctx context.Context, traceID, sender string, verbose bool) context.Context {
|
||||||
|
return context.WithValue(ctx, reqInfoKey, reqInfo{traceID: traceID, sender: sender, verbose: verbose})
|
||||||
|
}
|
||||||
|
|
||||||
|
func reqInfoFromContext(ctx context.Context) (reqInfo, bool) {
|
||||||
|
ri, ok := ctx.Value(reqInfoKey).(reqInfo)
|
||||||
|
return ri, ok
|
||||||
|
}
|
||||||
|
|
||||||
|
// traceFromContext returns the request trace id, or "" when ctx carries none (startup,
|
||||||
|
// the appservice transaction handler) — the slog handler then simply omits trace_id.
|
||||||
|
func traceFromContext(ctx context.Context) string {
|
||||||
|
if ri, ok := reqInfoFromContext(ctx); ok {
|
||||||
|
return ri.traceID
|
||||||
|
}
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
|
||||||
|
// newTraceID mints a random 16-byte id as 32 hex chars (the OTel trace-id shape).
|
||||||
|
// crypto/rand.Read never returns an error and always fills the buffer (Go 1.24+: on an
|
||||||
|
// entropy failure it crashes the process rather than returning a short read), so ignoring
|
||||||
|
// the error is safe — the id is always fully random.
|
||||||
|
func newTraceID() string {
|
||||||
|
var b [16]byte
|
||||||
|
_, _ = rand.Read(b[:])
|
||||||
|
return hex.EncodeToString(b[:])
|
||||||
|
}
|
||||||
|
|
@ -146,6 +146,7 @@ func (p *grokWebSearch) Fetch(ctx context.Context, query string) (WebContext, er
|
||||||
}
|
}
|
||||||
defer resp.Body.Close()
|
defer resp.Body.Close()
|
||||||
data, _ := io.ReadAll(resp.Body)
|
data, _ := io.ReadAll(resp.Body)
|
||||||
|
logLLMExchange(ctx, p.logger, "grok_web_search", body, resp.StatusCode, data)
|
||||||
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
|
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
|
||||||
return WebContext{}, fmt.Errorf("grok web search http %d: %s", resp.StatusCode, snippet(data))
|
return WebContext{}, fmt.Errorf("grok web search http %d: %s", resp.StatusCode, snippet(data))
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -88,6 +88,23 @@ grounding is free under the daily RPD, guarded by `WEB_GROUNDING_DAILY_CAP`. `XA
|
||||||
+ `GROK_REASONING_EFFORT=none` (4.3 otherwise reasons on every reply). Full flag table in the
|
+ `GROK_REASONING_EFFORT=none` (4.3 otherwise reasons on every reply). Full flag table in the
|
||||||
[README](../../apps/ai-bot/README.md).
|
[README](../../apps/ai-bot/README.md).
|
||||||
|
|
||||||
|
## Observability (logs + per-request trace)
|
||||||
|
|
||||||
|
`log/slog` to stderr (`LOG_LEVEL`, `LOG_FORMAT=text|json`). A context-aware handler
|
||||||
|
([logging.go](../../apps/ai-bot/logging.go)) stamps a per-request **`trace_id`** —
|
||||||
|
minted once per handled event in `handleEvent` ([trace.go](../../apps/ai-bot/trace.go))
|
||||||
|
and carried in `ctx` down to the model HTTP call — onto **every** log line, so one
|
||||||
|
`trace_id` greps the whole request trail (the userver idiom; the id is OTel-trace-id
|
||||||
|
shaped for a future exporter). Routing diagnostics (`route decided` / `generation
|
||||||
|
outcome`) are DEBUG, content-free. Full model **request/response bodies** are gated by a
|
||||||
|
**per-user allowlist** `LOG_BODIES_USERS` (empty = nobody) **and** `LOG_LEVEL=debug`,
|
||||||
|
truncated to a fixed ~4 KB cap, with URL/headers (the API key) never logged — decided once
|
||||||
|
at admission via a `verbose` flag in `ctx`, read by the dumb transport. This is the
|
||||||
|
**debug** path; `request_log` (`TELEMETRY_*`) is the separate **analytics** path — they
|
||||||
|
correlate via `trace_id`/`event_id` but are independent. Ship JSON stdout to
|
||||||
|
OpenSearch/Loki with a collector (Fluent Bit/Vector); the bot never talks to a log
|
||||||
|
backend. Full flag table in the [README](../../apps/ai-bot/README.md#observability--logs--per-request-trace).
|
||||||
|
|
||||||
## Building / testing
|
## Building / testing
|
||||||
|
|
||||||
Go toolchain lives at `/home/ubuntu/.go-toolchain/go/bin` (NOT on PATH). Store-backed tests need
|
Go toolchain lives at `/home/ubuntu/.go-toolchain/go/bin` (NOT on PATH). Store-backed tests need
|
||||||
|
|
|
||||||
Loading…
Add table
Reference in a new issue