# Agent Multimodal Smoke Runbook
|
**Goal:** Codify the live smoke-test criteria and the input baseline for the agent's three main flows (runs/events/history).
|
## 1. Scope
|
1. `POST /api/v1/agent/runs`: accepts multimodal messages (text + image).
2. `GET /api/v1/agent/runs/{thread_id}/events`: SSE event stream whose event names follow the AG-UI standard (`RUN_STARTED`, `STEP_STARTED`, `TOOL_CALL_*`, `RUN_FINISHED`/`RUN_ERROR`).
3. `GET /api/v1/agent/runs/{thread_id}/history`: returns a `STATE_SNAPSHOT` that includes `attachments` metadata.
4. `sessions`/`messages` are fully persisted: message_count, tokens, cost, latency, title, metadata.
5. Tool result storage: large payloads are written to storage, and the metadata records `storage_bucket`/`storage_path`.
6. Storage bucket source: must come from the environment variable `SOCIAL_STORAGE__BUCKET`.
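Item 2 can be checked mechanically once the SSE body has been captured. The sketch below rests on one assumption: it expects AG-UI style frames, in which each `data:` line carries a JSON object whose `type` field names the event. If a frame has an explicit `event:` field, the sketch uses that instead. The helper names `parse_sse_event_types` and `check_event_sequence` are hypothetical and do not come from the test suite. The same goes for the rule that every `TOOL_CALL_START` needs a matching `TOOL_CALL_END`.

```python
import json

TERMINAL_EVENTS = {"RUN_FINISHED", "RUN_ERROR"}


def parse_sse_event_types(raw: str) -> list[str]:
    """Extract event names from a raw SSE body (frames are separated by blank lines)."""
    types: list[str] = []
    for frame in raw.replace("\r\n", "\n").split("\n\n"):
        event_name = None
        data_lines: list[str] = []
        for line in frame.split("\n"):
            if line.startswith("event:"):
                event_name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if event_name is None and data_lines:
            # Assumption: AG-UI style frames carry the event name in the JSON "type" field.
            payload = json.loads("\n".join(data_lines))
            if isinstance(payload, dict):
                event_name = payload.get("type")
        if event_name:
            types.append(event_name)
    return types


def check_event_sequence(types: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the sequence looks valid."""
    problems: list[str] = []
    if not types or types[0] != "RUN_STARTED":
        problems.append("first event is not RUN_STARTED")
    if not types or types[-1] not in TERMINAL_EVENTS:
        problems.append("last event is not RUN_FINISHED/RUN_ERROR")
    if "STEP_STARTED" not in types:
        problems.append("no STEP_STARTED event")
    # Hypothetical rule: each tool call start must be closed by an end event.
    if types.count("TOOL_CALL_START") != types.count("TOOL_CALL_END"):
        problems.append("unbalanced TOOL_CALL_START/TOOL_CALL_END")
    return problems
```

Each rule produces its own message instead of stopping at the first failure. That way a single smoke run reports everything that is wrong with the sequence at once.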
|
## 2. Fixed Test Inputs
|
- Image fixture: `backend/tests/fixtures/images/calendar_text_cn.png`
- Multimodal message:
  - Text: `"识别图片中的日历内容并调用 calendar.write 创建日程"` ("Recognize the calendar content in the image and call calendar.write to create an event")
  - Image: `{"type":"binary","data":"<base64>","mimeType":"image/png"}`
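The image part carries the fixture's PNG bytes encoded as base64. The sketch below assembles the fixed message from those bytes. Only the binary part's shape comes from this runbook. The `{"type": "text", "text": ...}` shape of the text part is an assumption, as are the helper names `build_multimodal_content` and `load_fixture_content`.

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
PROMPT = "识别图片中的日历内容并调用 calendar.write 创建日程"


def build_multimodal_content(text: str, image_bytes: bytes) -> list[dict]:
    """Build the text + image parts of the smoke-test message."""
    # Guard against a wrong or corrupted fixture before encoding it.
    if not image_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("fixture is not a PNG file")
    return [
        # Assumed shape for the text part; the runbook does not specify it.
        {"type": "text", "text": text},
        # Binary part, matching the shape listed under the fixed test inputs.
        {
            "type": "binary",
            "data": base64.b64encode(image_bytes).decode("ascii"),
            "mimeType": "image/png",
        },
    ]


def load_fixture_content(
    path: str = "backend/tests/fixtures/images/calendar_text_cn.png",
) -> list[dict]:
    """Read the fixture from disk (path relative to the repository root)."""
    return build_multimodal_content(PROMPT, Path(path).read_bytes())
```

The PNG signature check makes a swapped or truncated fixture fail before the request is sent. Without it, the failure would show up as an opaque model error.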
|
## 3. Account and Credentials
|
- Smoke-test account: `dagronl@126.com` / `123456`
- Injected via environment variables: `AGENT_LIVE_EMAIL`, `AGENT_LIVE_PASSWORD`
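The test can read these variables without hard-coding anything. The sketch below assumes the live test is gated on `AGENT_LIVE_INTEGRATION=1`, as in the section 4 command, and treats a missing email or password as a configuration error. `LiveCredentials` and `load_live_credentials` are hypothetical names.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class LiveCredentials:
    email: str
    # repr=False keeps the password out of logs and assertion messages.
    password: str = field(repr=False)


def load_live_credentials(env: Mapping[str, str] = os.environ) -> Optional[LiveCredentials]:
    """Return credentials when the live smoke test is enabled, otherwise None."""
    # Assumption: the live test is gated on AGENT_LIVE_INTEGRATION=1 (see section 4).
    if env.get("AGENT_LIVE_INTEGRATION") != "1":
        return None  # disabled: the caller should skip the live test
    email = env.get("AGENT_LIVE_EMAIL")
    password = env.get("AGENT_LIVE_PASSWORD")
    if not email or not password:
        raise RuntimeError("AGENT_LIVE_EMAIL and AGENT_LIVE_PASSWORD must both be set")
    return LiveCredentials(email=email, password=password)
```

The function returns `None` when the gate is off, so a test can skip cleanly. It still fails loudly when the gate is on but a credential is missing.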
|
## 4. Run Command
|
```bash
# Assumed to run from backend/, since the test path below is relative to it.
AGENT_LIVE_INTEGRATION=1 \
AGENT_LIVE_EMAIL="dagronl@126.com" \
AGENT_LIVE_PASSWORD="123456" \
uv run pytest tests/integration/v1/agent/test_sse_flow_live.py::test_agent_runs_events_history_live_with_image_input -q -s
```
|
## 5. Result Recording Template
|
- `thread_id` / `run_id`
- `runs` status code and response body
- `events` event sequence
- `history`: whether it contains `attachments[].bucket/path/mimeType`
- `sessions` fields: message_count / total_tokens / total_cost / status / title
- `messages` fields: role / content / metadata / tokens / cost / latency
- `tool_result`: whether it was written to storage
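Several template entries are presence checks on records fetched after the run. The sketch below reuses the field names from the template, but the real column names may differ (for example, latency is stored as `messages.latency_ms`). The helpers `missing_fields` and `audit_history_attachments` are hypothetical.

```python
from typing import Any, Iterable, Mapping

SESSION_FIELDS = ("message_count", "total_tokens", "total_cost", "status", "title")
MESSAGE_FIELDS = ("role", "content", "metadata", "tokens", "cost", "latency")
ATTACHMENT_FIELDS = ("bucket", "path", "mimeType")


def missing_fields(record: Mapping[str, Any], required: Iterable[str]) -> list[str]:
    """Names of required fields that are absent or None (0 and "" count as present)."""
    return [name for name in required if record.get(name) is None]


def audit_history_attachments(
    attachments: Iterable[Mapping[str, Any]],
) -> dict[int, list[str]]:
    """Map attachment index to its missing bucket/path/mimeType keys; {} means OK."""
    report: dict[int, list[str]] = {}
    for index, attachment in enumerate(attachments):
        missing = missing_fields(attachment, ATTACHMENT_FIELDS)
        if missing:
            report[index] = missing
    return report
```

Only `None` and absent keys count as missing. A zero token count or zero cost is a legitimate recorded value, not a persistence failure.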
|
## 6. Security Notes
|
- Never write passwords or tokens into files tracked by git.
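Recorded results (section 5) are the most likely place for a secret to leak into a tracked file. The sketch below scrubs output before it is saved: it replaces known secret values and masks bearer tokens. The `Authorization: Bearer` header format is an assumption about how this API authenticates. `redact` and `redact_bearer_tokens` are hypothetical helpers.

```python
import re
from typing import Iterable

# Assumption: tokens appear as "Bearer <token>" in captured headers or logs.
BEARER_PATTERN = re.compile(r"(Bearer\s+)[A-Za-z0-9._~+/=-]+")


def redact(text: str, secrets: Iterable[str]) -> str:
    """Replace every non-empty secret value in text with a fixed placeholder."""
    # Longest first, so a secret that contains another is masked as a whole.
    for secret in sorted({s for s in secrets if s}, key=len, reverse=True):
        text = text.replace(secret, "***")
    return text


def redact_bearer_tokens(text: str) -> str:
    """Mask the token part of any "Bearer <token>" occurrence."""
    return BEARER_PATTERN.sub(r"\1***", text)
```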
|
## 7. Fixed Issues
|
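| Issue | Fix |
|------|----------|
| Fallback when a bucket write fails | Now raises an error directly; falling back to a hard-coded bucket is forbidden |
| `user.resolve` tool | Added; resolves `user_id` by email or name |
| `calendar.write` invite parameter | The invite parameter is now passed through |
| Missing `inbox_repository` | Fixed the calendar runtime dependency |
| Runtime model-name concatenation | Fixed the invalid model name |
| Multimodal pass-through | The runtime passes `binary.data` through instead of filtering it to `<omitted>` |
| `sessions.title` generation | Generated automatically when the first user message is persisted |
| Assistant latency persistence | Written to the `messages.latency_ms` column |
| Persisting intent/execution-stage messages | Added `text.*` and `tool.result` events |
| `DIRECT_RESPONSE` early return | Returns right after the intent decision, without entering the report stage |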
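The bucket fix and scope items 5 and 6 combine into one rule: the bucket name comes only from `SOCIAL_STORAGE__BUCKET`, and a missing value is an error rather than a silent fallback. The sketch below illustrates that rule. `StorageConfigError`, `resolve_storage_bucket` and `plan_tool_result_storage` are hypothetical names, and so is the 4096-byte inline threshold, since the runbook only says "large payload".

```python
import json
from typing import Any, Mapping

BUCKET_ENV_VAR = "SOCIAL_STORAGE__BUCKET"


class StorageConfigError(RuntimeError):
    """Raised when no storage bucket is configured."""


def resolve_storage_bucket(env: Mapping[str, str]) -> str:
    """Read the bucket from the environment; never fall back to a hard-coded name."""
    bucket = env.get(BUCKET_ENV_VAR, "").strip()
    if not bucket:
        raise StorageConfigError(f"{BUCKET_ENV_VAR} is not set")
    return bucket


def plan_tool_result_storage(
    result: Any, bucket: str, path: str, max_inline_bytes: int = 4096
) -> dict[str, Any]:
    """Keep small results inline; describe large ones by their storage location."""
    size = len(json.dumps(result, ensure_ascii=False).encode("utf-8"))
    if size <= max_inline_bytes:
        return {"inline": result}
    # The caller would upload the payload to bucket/path; only metadata is returned here.
    return {"metadata": {"storage_bucket": bucket, "storage_path": path}}
```

If the bucket is missing, an exception surfaces during the smoke run. A silent fallback would instead write tool results to a bucket nobody checks.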
|
## 8. Pending Fixes (Added by the User)
|
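1. **Persist tokens/cost for the intent/execution stages**: currently only the report stage is persisted.
2. **Multi-turn session memory test**: verify that a session reads its historical context from the database.
3. **Tool-call tests**: calendar read/write/delete/share, user lookup, and time awareness.
4. **Session failure investigation**: find the cause of the latest failure and fix it.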
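Pending fix 1 implies that session totals should cover every stage, not just the report stage. The sketch below shows that accounting under the assumption that each stage reports its own token count and cost. The stage names, `StageUsage` and `session_totals` are all hypothetical, and `Decimal` is used so that cost sums stay exact.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable


@dataclass(frozen=True)
class StageUsage:
    stage: str  # hypothetical stage labels, e.g. "intent", "execution", "report"
    tokens: int
    cost: Decimal


def session_totals(usages: Iterable[StageUsage]) -> tuple[int, Decimal]:
    """Sum tokens and cost over every stage, not only the report stage."""
    total_tokens = 0
    total_cost = Decimal("0")
    for usage in usages:
        total_tokens += usage.tokens
        total_cost += usage.cost
    return total_tokens, total_cost
```

After the fix, a smoke run could compare `sessions.total_tokens`/`total_cost` against this sum of the per-stage records.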