# Agent Multimodal Smoke Runbook

**Goal:** Pin down the live smoke-test criteria and the input baseline for the agent's three main paths (runs/events/history).

## 1. Scope

1. `POST /api/v1/agent/runs` - accepts a multimodal message (text + image)
2. `GET /api/v1/agent/runs/{thread_id}/events` - SSE event stream; event names follow the AG-UI standard (`RUN_STARTED`, `STEP_STARTED`, `TOOL_CALL_*`, `RUN_FINISHED`/`RUN_ERROR`)
3. `GET /api/v1/agent/runs/{thread_id}/history` - returns a `STATE_SNAPSHOT` that includes `attachments` metadata
4. `sessions`/`messages` are fully persisted: message_count, tokens, cost, latency, title, metadata
5. Tool result storage: large payloads are written to storage, and the metadata records `storage_bucket`/`storage_path`
6. Storage bucket source: must come from the environment variable `SOCIAL_STORAGE__BUCKET`
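Item 6, together with the "no fallback bucket" fix listed in Section 7, amounts to a fail-fast rule. A minimal shell sketch of that rule as a pre-flight check before a smoke run; `resolve_storage_bucket` is a hypothetical helper, not the backend's own code:

```bash
# Sketch (hypothetical helper): resolve the tool-result storage bucket strictly
# from SOCIAL_STORAGE__BUCKET. There is deliberately no default bucket:
# an unset or empty variable is an error.
resolve_storage_bucket() {
  if [ -z "${SOCIAL_STORAGE__BUCKET:-}" ]; then
    echo "SOCIAL_STORAGE__BUCKET is not set; refusing to fall back to a hard-coded bucket" >&2
    return 1
  fi
  printf '%s\n' "$SOCIAL_STORAGE__BUCKET"
}

# Example pre-flight check before running the smoke test:
#   resolve_storage_bucket >/dev/null || exit 1
```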

## 2. Fixed Test Inputs

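- Image fixture: `backend/tests/fixtures/images/calendar_text_cn.png`
- Multimodal message:
  - Text: `"识别图片中的日历内容并调用 calendar.write 创建日程"` ("Recognize the calendar content in the image and call calendar.write to create an event")
  - Image: `{"type":"binary","data":"<base64>","mimeType":"image/png"}`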
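A sketch of producing the `<base64>` value of the image part from the fixture. Only the binary part's shape comes from this runbook; how the text part and the outer request body are wrapped is not specified here, and `build_image_part` is a hypothetical helper that assumes a `base64` command on the PATH:

```bash
# Sketch (hypothetical helper): encode an image file as the binary content part
# {"type":"binary","data":"<base64>","mimeType":"<mime>"} used by the smoke test.
build_image_part() {
  local file="$1" mime="${2:-image/png}" data
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  # Some base64 implementations wrap lines; strip newlines so the value is a single
  # JSON string (base64 output never contains quotes or backslashes).
  data=$(base64 < "$file" | tr -d '\n')
  printf '{"type":"binary","data":"%s","mimeType":"%s"}\n' "$data" "$mime"
}

# Usage (from the repository root):
#   build_image_part backend/tests/fixtures/images/calendar_text_cn.png
```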

## 3. Account and Credentials

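- Smoke account: `dagronl@126.com`; the password is held outside the repository (see Section 6).
- Credentials are injected through the environment variables `AGENT_LIVE_EMAIL` and `AGENT_LIVE_PASSWORD`.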

## 4. Run Command

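```bash
# Run from backend/ (the test path below is relative to it).
# Export AGENT_LIVE_PASSWORD in your shell first; never write it into tracked files.
AGENT_LIVE_INTEGRATION=1 \
AGENT_LIVE_EMAIL="${AGENT_LIVE_EMAIL:-dagronl@126.com}" \
AGENT_LIVE_PASSWORD="${AGENT_LIVE_PASSWORD:?export AGENT_LIVE_PASSWORD first}" \
uv run pytest tests/integration/v1/agent/test_sse_flow_live.py::test_agent_runs_events_history_live_with_image_input -q -s
```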

## 5. Result Record Template

- `thread_id` / `run_id`
- `runs`: status code and response body
- `events`: event sequence
- `history`: whether it contains `attachments[].bucket/path/mimeType`
- `sessions` fields: message_count / total_tokens / total_cost / status / title
- `messages` fields: role / content / metadata / tokens / cost / latency
- `tool_result`: whether the payload was written to storage
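For the `events` entry, a sketch that reduces a captured stream to its event names and checks the envelope listed in Section 1 (first `RUN_STARTED`, last `RUN_FINISHED` or `RUN_ERROR`). It assumes each SSE `data:` line carries one JSON event whose `type` field is the AG-UI event name; both function names are hypothetical:

```bash
# Sketch: reduce a captured SSE stream to AG-UI event names and check the run envelope.
# Assumption: every `data:` line is one JSON event whose "type" field is the event name.

# Print the first "type":"UPPER_CASE" value found on each data: line.
extract_event_types() {
  awk '/^data:/ {
    if (match($0, /"type" *: *"[A-Z_]+"/)) {
      s = substr($0, RSTART, RLENGTH)
      sub(/^"type" *: *"/, "", s)   # drop the key and the opening quote
      sub(/"$/, "", s)              # drop the closing quote
      print s
    }
  }'
}

# Read event names (one per line) and require RUN_STARTED ... RUN_FINISHED|RUN_ERROR.
check_run_envelope() {
  local names first last
  names=$(cat)
  first=$(printf '%s\n' "$names" | head -n 1)
  last=$(printf '%s\n' "$names" | tail -n 1)
  if [ "$first" != "RUN_STARTED" ]; then
    echo "first event is '$first', expected RUN_STARTED" >&2
    return 1
  fi
  case "$last" in
    RUN_FINISHED | RUN_ERROR) return 0 ;;
    *) echo "last event is '$last', expected RUN_FINISHED or RUN_ERROR" >&2; return 1 ;;
  esac
}
```

Given a stream saved to a file (for example `events.sse`), `extract_event_types < events.sse | check_run_envelope` exits non-zero when the envelope is wrong, and `extract_event_types < events.sse` alone yields the sequence to paste into the record.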

## 6. Security Notes

- Never write passwords or tokens into git-tracked files.

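## 7. Fixed Issues

| Issue | Fix |
|------|----------|
| Fallback when the bucket write fails | Now raises an error directly; falling back to a hard-coded bucket is forbidden |
| `user.resolve` tool | Added; resolves `user_id` by email or name |
| `calendar.write` invite parameters | Added pass-through of the invite parameters |
| Missing `inbox_repository` | Fixed the calendar runtime dependency |
| Runtime model-name concatenation | Fixed invalid model names |
| Multimodal pass-through | The runtime passes `binary.data` through instead of replacing it with `<omitted>` |
| `sessions.title` generation | Generated automatically when the first user message is persisted |
| Assistant latency persistence | Written to the `messages.latency_ms` column |
| Persisting intent/execution-stage messages | Added `text.*` and `tool.result` events |
| DIRECT_RESPONSE early return | Returns right after the intent decision, without entering the report stage |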

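## 8. Pending Fixes (User-Reported)

1. **Persist tokens/cost for the intent/execution stages** - currently only the report stage is persisted
2. **Multi-turn session memory test** - verify that the session reads its history context from the database
3. **Tool-call tests** - calendar read/write/delete/share + user lookup + time awareness
4. **Session failure investigation** - find the cause of the latest failure and fix it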

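## 9. Progress and Conclusions for This Round (2026-03-12)

### 9.1 Feedback Loop Status

1. **Persisting intent/execution-stage tokens/cost**: resolved.
2. **Multi-turn session memory (today + yesterday context)**: resolved.
3. **Tool-call smoke test (read/write/delete/share + user lookup + time awareness)**: partially resolved (see 9.5).
4. **Root cause and fix for the latest failed session**: resolved.
5. **Syncing the feedback into this document**: done (this section).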

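### 9.2 Key Fixes

1. **Stage telemetry completed** (intent/execution):
   - when usage is missing from the response, token counts are estimated;
   - cost is estimated from project pricing via `LiteLLMService.calculate_cost`;
   - `response_metadata.inputTokens/outputTokens/cost` is backfilled and persisted.

2. **Session memory context injection**:
   - before execution, the runtime reads the user/assistant messages of the same session from the last two days (today + yesterday);
   - the intent prompt gains a `[Conversation Context]` block, so it no longer looks only at the latest user input.

3. **Tool-call stability fixes**:
   - tool names use underscores throughout (`calendar_read`/`calendar_write`/`user_resolve`), fixing the OpenAI/LiteLLM tool-name pattern error;
   - the intent prompt receives the merged intent + execution tool schemas, so it no longer wrongly concludes that "no write tool is available".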
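Item 2 leaves the exact window boundary implicit. A sketch of one reading, assuming "today + yesterday" means calendar days in Beijing time (the timezone the time-aware reply in 9.3 C uses), so the lower bound is 00:00 yesterday at UTC+8. It needs GNU `date`, and `context_window_start` is a hypothetical helper, not the runtime's implementation:

```bash
# Sketch: lower bound of the "today + yesterday" memory window.
# Assumptions: the window is calendar days in Asia/Shanghai (UTC+8, no DST),
# and GNU date is available for its -d relative-date syntax.
context_window_start() {
  # Optional argument: "today" as YYYY-MM-DD (defaults to the current Beijing date).
  local today="${1:-$(TZ=Asia/Shanghai date +%F)}"
  # 00:00 yesterday, Beijing time, as an ISO-8601 timestamp.
  TZ=Asia/Shanghai date -d "$today 1 day ago" '+%FT00:00:00+08:00'
}

# Example: context_window_start 2026-03-12   # prints 2026-03-11T00:00:00+08:00
```

Under this reading, the context query keeps the session's user/assistant messages whose timestamp is at or after the printed value.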

### 9.3 Live Evidence

#### A) Tokens/cost persistence (thread=`cb1681c2-c223-4ced-bcfd-76f7252ba2d8`)

- intent: `input_tokens=1541`, `output_tokens=37`, `cost=0.000382`
- execution: `input_tokens=2161`, `output_tokens=376`, `cost=0.005450`
- report: `input_tokens=3266`, `output_tokens=318`, `cost=0.007256`
- session aggregate: `total_tokens=13518`, `total_cost=0.019473`
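Evidence A can be cross-checked with a little arithmetic. The three stage rows add up to 1578 + 2537 + 3584 = 7699 tokens and 0.000382 + 0.005450 + 0.007256 = 0.013088 in cost, both below the session aggregate. Presumably (not verified here) the session totals also count other runs on the same thread, such as `run-tool-1` in evidence C. A minimal awk-based sketch of the per-stage sum; `sum_stage_telemetry` is a hypothetical helper:

```bash
# Sketch: total the per-stage telemetry of one run.
# Input lines: <stage> <input_tokens> <output_tokens> <cost>
sum_stage_telemetry() {
  awk '{ tokens += $2 + $3; cost += $4 }
       END { printf "tokens=%d cost=%.6f\n", tokens, cost }'
}

# The three stage rows from evidence A:
#   printf '%s\n' 'intent 1541 37 0.000382' \
#     'execution 2161 376 0.005450' 'report 3266 318 0.007256' | sum_stage_telemetry
#   # prints tokens=7699 cost=0.013088
```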

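#### B) Multi-turn session memory (thread=`9c456736-d5e5-48a4-b9db-55f507baf573`)

- run `mem-1`: `请记住口令是蓝鲸42，只回复已记住。` ("Remember that the passphrase is 蓝鲸42; reply only 'remembered'.")
- run `mem-2`: `只回复我刚才让你记住的口令，不要解释。` ("Reply only with the passphrase I just asked you to remember; no explanation.")
- Assistant reply: `蓝鲸42` (memory hit).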

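#### C) Tool calls + time awareness (thread=`cb1681c2-c223-4ced-bcfd-76f7252ba2d8`, run=`run-tool-1`)

- The event sequence includes the execution stage and multiple `TOOL_CALL_RESULT` events.
- Tool call results: `calendar_write`, `calendar_read` (several times).
- The assistant reply includes time-aware information (Beijing date, weekday, and time of day).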

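### 9.4 Root Cause of the Latest Failed Session

- Failed sample: `d6bc4dbd-8361-4a39-bf09-12b3392e0e70`
- Root cause: tool names containing dots (e.g. `calendar.write`) failed provider validation:
  - `Invalid 'tools[0].function.name' ... expected pattern ^[a-zA-Z0-9_-]+$`
- After the fix: the same execution path now reliably enters the execution stage and produces `TOOL_CALL_RESULT`.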
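The renaming in 9.2 and the error above can be illustrated with a small shell sketch: map dots to underscores and check names against the pattern quoted in the error message. Both function names are hypothetical; the runtime itself now registers underscore names directly rather than converting them at call time:

```bash
# Sketch: tool-name hygiene against the provider pattern ^[a-zA-Z0-9_-]+$
# (pattern copied from the error above; the helpers are hypothetical).

# Replace every dot with an underscore, e.g. calendar.write -> calendar_write.
normalize_tool_name() {
  printf '%s\n' "${1//./_}"
}

# Succeeds only if the name matches the provider pattern.
is_valid_tool_name() {
  [[ "$1" =~ ^[a-zA-Z0-9_-]+$ ]]
}

# Report every invalid name among the arguments; non-zero if any is invalid.
check_tool_names() {
  local name rc=0
  for name in "$@"; do
    if ! is_valid_tool_name "$name"; then
      echo "invalid tool name: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_tool_names calendar_read calendar_write user_resolve` passes, while `check_tool_names calendar.write` reports the same kind of name the failed session sent.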

### 9.5 Open Items

- Complete live evidence for the combined `user_resolve` + calendar **share + delete** path is still missing (this round's run was interrupted with `Tool execution aborted`).