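zl-q 7b8865e256 feat: add Agent step events and image attachments

- Add support for the stepStarted/stepFinished event types
- Implement image attachment upload and preview in the frontend
- Improve tool result storage and event handling in the backend
- Extend the related unit and integration tests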

2026-03-12 09:29:57 +08:00


Agent Multimodal Smoke Runbook

Goal: Pin down the real smoke-test criteria and the input baseline for the agent's three main paths (runs/events/history).

1. Coverage

POST /api/v1/agent/runs - accepts multimodal messages (text + image)
GET /api/v1/agent/runs/{thread_id}/events - SSE event stream; event names follow the AG-UI standard (RUN_STARTED, STEP_STARTED, TOOL_CALL_*, RUN_FINISHED/RUN_ERROR)
GET /api/v1/agent/runs/{thread_id}/history - returns STATE_SNAPSHOT, including attachments metadata
sessions/messages are persisted completely: message_count, tokens, cost, latency, title, metadata
tool result storage: large payloads are written to storage, and metadata records storage_bucket/storage_path
storage bucket source: must come from the environment variable SOCIAL_STORAGE__BUCKET
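
The event-order expectation for the events endpoint can be checked mechanically. The following is a minimal sketch, not the project's own test code: it assumes the event name is carried in the SSE `event:` field (if the server instead puts it in a `type` key of the JSON `data` payload, the parser needs adapting), and the helper names are hypothetical.

```python
TERMINAL_EVENTS = {"RUN_FINISHED", "RUN_ERROR"}


def parse_sse_event_names(raw: str) -> list[str]:
    """Collect the value of every `event:` field in an SSE response body."""
    names = []
    for line in raw.splitlines():
        if line.startswith("event:"):
            names.append(line[len("event:"):].strip())
    return names


def check_run_sequence(names: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the order looks sane."""
    if not names:
        return ["no events received"]
    problems = []
    if names[0] != "RUN_STARTED":
        problems.append(f"first event is {names[0]!r}, expected 'RUN_STARTED'")
    if names[-1] not in TERMINAL_EVENTS:
        problems.append(f"last event is {names[-1]!r}, expected a terminal event")
    # Exactly one terminal event should appear in a single run.
    terminal_count = sum(1 for n in names if n in TERMINAL_EVENTS)
    if terminal_count > 1:
        problems.append(f"{terminal_count} terminal events in one run")
    return problems
```

Recording the output of `parse_sse_event_names` directly in the result template gives the "events sequence" entry without manual transcription.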

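2. Fixed test input

Image fixture: backend/tests/fixtures/images/calendar_text_cn.png
Multimodal message:
- Text: "识别图片中的日历内容并调用 calendar.write 创建日程" (in English: "Recognize the calendar content in the image and call calendar.write to create an event")
- Image: {"type":"binary","data":"<base64>","mimeType":"image/png"}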
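
A minimal sketch of how the two message parts might be assembled from the fixture. The binary part mirrors the shape given above; the shape of the text part (`{"type": "text", "text": ...}`) and the function name are assumptions, since the runbook does not spell them out.

```python
import base64


def build_multimodal_content(text: str, image_bytes: bytes,
                             mime_type: str = "image/png") -> list[dict]:
    """Build the text + image parts of one multimodal user message."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"type": "text", "text": text},  # assumed shape of the text part
        {"type": "binary", "data": encoded, "mimeType": mime_type},
    ]
```

In a test this would typically be fed with `Path("backend/tests/fixtures/images/calendar_text_cn.png").read_bytes()` so that every run sends the same bytes.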

3. Account and credentials

Smoke account: dagronl@126.com / 123456
Injected through environment variables: AGENT_LIVE_EMAIL, AGENT_LIVE_PASSWORD

4. Command

AGENT_LIVE_INTEGRATION=1 \
AGENT_LIVE_EMAIL="dagronl@126.com" \
AGENT_LIVE_PASSWORD="123456" \
uv run pytest tests/integration/v1/agent/test_sse_flow_live.py::test_agent_runs_events_history_live_with_image_input -q -s
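
How the live test reads these variables is not shown in this runbook. One plausible gating helper, sketched under that caveat (the function name and the skip-on-`None` convention are hypothetical):

```python
import os
from typing import Mapping, Optional


def load_live_credentials(
    env: Mapping[str, str] = os.environ,
) -> Optional[tuple[str, str]]:
    """Return (email, password) when live mode is on, or None to skip."""
    if env.get("AGENT_LIVE_INTEGRATION") != "1":
        return None  # live integration not requested; the caller should skip
    email = env.get("AGENT_LIVE_EMAIL")
    password = env.get("AGENT_LIVE_PASSWORD")
    if not email or not password:
        # Fail loudly instead of silently falling back to baked-in values.
        raise RuntimeError("AGENT_LIVE_EMAIL and AGENT_LIVE_PASSWORD must be set")
    return email, password
```

Keeping the credentials out of the helper's defaults means nothing sensitive has to live in a tracked file.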

5. Result record template

thread_id / run_id
runs: status code and response
events: event sequence
history: whether it contains attachments[].bucket/path/mimeType
sessions fields: message_count / total_tokens / total_cost / status / title
messages fields: role / content / metadata / tokens / cost / latency
tool_result: whether it was written to storage
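
The attachments entry of the template can be filled in by a small check rather than by eye. This sketch assumes the attachments have already been extracted from the history response as a list of dicts; where exactly they sit inside STATE_SNAPSHOT is not specified here, and the helper name is hypothetical.

```python
REQUIRED_ATTACHMENT_KEYS = ("bucket", "path", "mimeType")


def missing_attachment_fields(attachments: list[dict]) -> list[str]:
    """List every required attachment field that is absent or empty."""
    missing = []
    for index, attachment in enumerate(attachments):
        for key in REQUIRED_ATTACHMENT_KEYS:
            if not attachment.get(key):
                missing.append(f"attachments[{index}].{key}")
    return missing
```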

6. Security notes

Never write passwords or tokens into git-tracked files.

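7. Fixed issues

Issue	Fix
Fallback on bucket write failure	Now raises an error directly; falling back to a hard-coded bucket is forbidden
user.resolve tool	Added; resolves user_id by email/name
calendar.write invite parameter	Added pass-through of the invite parameter
Missing inbox_repository	Fixed the calendar runtime dependency
Runtime model name concatenation	Fixed the invalid model name
Multimodal pass-through	The runtime passes binary.data through instead of filtering it to `<omitted>`
sessions.title generation	Generated automatically when the first user message is persisted
assistant latency persistence	Written to the `messages.latency_ms` column
Persisting intent/execution stage messages	Added `text.*` and `tool.result` events
DIRECT_RESPONSE early return	Returns directly after intent classification, without entering the report stage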
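
The first row (no fallback to a hard-coded bucket) amounts to a fail-fast lookup of SOCIAL_STORAGE__BUCKET. A minimal sketch of that behaviour; the exception class and function name are hypothetical and not taken from the codebase.

```python
import os
from typing import Mapping


class StorageBucketNotConfigured(RuntimeError):
    """Raised when SOCIAL_STORAGE__BUCKET is missing or empty."""


def resolve_storage_bucket(env: Mapping[str, str] = os.environ) -> str:
    """Return the configured bucket, or raise; never substitute a default."""
    bucket = env.get("SOCIAL_STORAGE__BUCKET", "").strip()
    if not bucket:
        raise StorageBucketNotConfigured(
            "SOCIAL_STORAGE__BUCKET is not set; refusing to fall back"
        )
    return bucket
```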

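8. Open issues (added by the user)

Persisting tokens/cost for the intent/execution stages - currently only the report stage is persisted
Multi-turn session memory test - verify that the session reads historical context from the database
Tool call tests - calendar read/write/delete/share + user lookup + time awareness
Session failure investigation - find the cause of the latest failure and fix it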

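9. Progress and conclusions for this round (2026-03-12)

9.1 Feedback loop status

Persisting tokens/cost for the intent/execution stages: resolved.
Multi-turn session memory (today + yesterday context): resolved.
Tool call smoke test (read/write/delete/share + user lookup + time awareness): partially resolved.
Root cause and fix for the latest failed session: resolved.
Feedback synced to the document: done (this section).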

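9.2 Key fixes

Stage telemetry filled in (intent/execution):
- estimate tokens when usage is missing;
- estimate cost from project pricing through LiteLLMService.calculate_cost;
- backfill response_metadata.inputTokens/outputTokens/cost and persist them.
Session memory context injection:
- before execution, the runtime reads the user/assistant messages of the same session from the last two days (today + yesterday);
- the intent prompt gains a [Conversation Context] section, so it no longer sees only the latest user input.
Tool call stability fixes:
- tool names now consistently use underscores (calendar_read/calendar_write/user_resolve), which fixes the OpenAI/LiteLLM tool name regex error;
- the intent prompt is injected with the merged intent + execution tool schema, which avoids the misjudgment "no write tool available".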
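
The stage telemetry fallback described above can be sketched as follows. This is an illustration only: the four-characters-per-token estimate is a crude placeholder, and `price_fn` stands in for `LiteLLMService.calculate_cost`, whose actual signature is not shown in this runbook.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Placeholder estimate: about one token per four characters."""
    return max(1, len(text) // 4) if text else 0


def backfill_usage(metadata: dict, prompt: str, completion: str,
                   price_fn: Callable[[int, int], float]) -> dict:
    """Fill inputTokens/outputTokens/cost only where the provider gave none."""
    filled = dict(metadata)  # leave the caller's dict untouched
    if filled.get("inputTokens") is None:
        filled["inputTokens"] = estimate_tokens(prompt)
    if filled.get("outputTokens") is None:
        filled["outputTokens"] = estimate_tokens(completion)
    if filled.get("cost") is None:
        # price_fn is a hypothetical stand-in for the project's pricing call.
        filled["cost"] = price_fn(filled["inputTokens"], filled["outputTokens"])
    return filled
```

Values reported by the provider always win; estimates are used only to avoid persisting empty telemetry.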

9.3 Live evidence

A) tokens/cost persistence (thread=`cb1681c2-c223-4ced-bcfd-76f7252ba2d8`)

intent: input_tokens=1541, output_tokens=37, cost=0.000382
execution: input_tokens=2161, output_tokens=376, cost=0.005450
report: input_tokens=3266, output_tokens=318, cost=0.007256
session aggregate: total_tokens=13518, total_cost=0.019473

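B) Multi-turn session memory (thread=`9c456736-d5e5-48a4-b9db-55f507baf573`)

run mem-1: "请记住口令是蓝鲸42，只回复已记住。" (Remember that the passphrase is 蓝鲸42; reply only with "已记住".)
run mem-2: "只回复我刚才让你记住的口令，不要解释。" (Reply only with the passphrase I just asked you to remember; do not explain.)
assistant reply: 蓝鲸42 (memory hit).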
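
The memory hit relies on the two-day context window from 9.2. A minimal sketch of that selection step, assuming each stored message is a dict with `role` and a `created_at` datetime (a hypothetical shape; the real repository query is not shown here):

```python
from datetime import date, datetime, timedelta

CONTEXT_ROLES = {"user", "assistant"}


def recent_context(messages: list[dict], today: date) -> list[dict]:
    """Keep user/assistant messages from today and yesterday, oldest first."""
    earliest = today - timedelta(days=1)
    kept = [
        m for m in messages
        if m["role"] in CONTEXT_ROLES
        and earliest <= m["created_at"].date() <= today
    ]
    return sorted(kept, key=lambda m: m["created_at"])
```

The result would then be rendered under the [Conversation Context] heading of the intent prompt.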

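C) Tool calls + time awareness (thread=`cb1681c2-c223-4ced-bcfd-76f7252ba2d8`, run=`run-tool-1`)

The event sequence includes the execution stage and multiple TOOL_CALL_RESULT events
Tool call results: calendar_write, calendar_read (multiple times)
The assistant reply includes time-aware information (Beijing time: date, weekday, time of day)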
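
For reference, the kind of Beijing-time context string that makes such a reply possible could be produced like this. It is a sketch under the assumption that the runtime injects the current time into the prompt; the function name and output format are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Beijing time is a fixed UTC+08:00 offset with no daylight saving.
BEIJING = timezone(timedelta(hours=8))
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def beijing_time_context(now: datetime) -> str:
    """Format an aware datetime as date, weekday and time in Beijing time."""
    local = now.astimezone(BEIJING)
    return f"{local:%Y-%m-%d} {WEEKDAYS[local.weekday()]} {local:%H:%M} (UTC+08:00)"
```

Passing `datetime.now(timezone.utc)` keeps the result independent of the server's local timezone.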

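9.4 Root cause of the latest failed session

Failed sample: d6bc4dbd-8361-4a39-bf09-12b3392e0e70
Root cause: a tool name containing a dot (e.g. calendar.write) fails validation:
- Invalid 'tools[0].function.name' ... expected pattern ^[a-zA-Z0-9_-]+$
After the fix: the same execution path now reliably enters execution and produces TOOL_CALL_RESULT.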
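
The failure can be reproduced, and guarded against, with the pattern quoted in the error message. A minimal sketch; the helper name is hypothetical and the dot-to-underscore mapping simply follows the renaming described in 9.2.

```python
import re

# Pattern copied from the validation error above.
TOOL_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def to_provider_tool_name(name: str) -> str:
    """Map a dotted tool name to an underscore form the provider accepts."""
    candidate = name.replace(".", "_")
    if not TOOL_NAME_PATTERN.fullmatch(candidate):
        raise ValueError(f"tool name {name!r} cannot be made provider-safe")
    return candidate
```

Running this check when tools are registered would surface an invalid name at startup rather than in the middle of a live run.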

9.5 Items still open

Complete live evidence for the combined user_resolve + calendar share + delete path is still missing (this round's run was interrupted: Tool execution aborted).
