refactor: 简化 AgentScope 运行时模块与事件处理

- 移除冗余的 user_token 参数传递 - 重构 tool.result 事件使用 ToolAgentOutput 模型 - 重构 text.end 事件使用 WorkerAgentOutput 模型 - 简化 store 模块的 tool result 处理逻辑 - 更新 router/service 适配新事件结构 - 清理废弃的测试文件与设计文档 - 新增 AgentRuns 多模态存储设计文档
2026-03-13 17:27:18 +08:00
parent 3273d63b23
commit 1c02503d1d
29 changed files with 1259 additions and 2725 deletions
@@ -0,0 +1,261 @@
+# Agent Runs Multimodal Refactor Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** 让 runs/resume 使用真实多模态图片输入，并将 worker/tool 按新结构化 metadata 规范落库。
+
+**Architecture:** 保持现有 event pipeline，不引入旁路写库。请求入口完成 URL 安全边界校验；runtime 将 `binary` 转模型可识别 `image_url` block；event store 统一校验 `WorkerAgentOutput` / `ToolAgentOutput` 并完成 `content` 映射。
+
+**Tech Stack:** FastAPI, Pydantic v2, SQLAlchemy AsyncSession, AgentScope, LiteLLM, Redis Stream
+
+---
+
+### Task 1: Runs 输入安全边界
+
+**Files:**
+- Modify: `backend/src/core/agentscope/schemas/agui_input.py`
+- Modify: `backend/src/v1/agent/router.py`
+- Modify: `backend/src/v1/agent/service.py`
+- Test: `backend/tests/unit/v1/agent/test_agent_router.py`
+
+**Step 1: Write the failing test**
+
+```python
+def test_runs_rejects_non_project_signed_url(...) -> None:
+    payload = build_run_payload_with_binary_url("https://evil.example.com/storage/v1/object/sign/..." )
+    resp = client.post("/api/v1/agent/runs", json=payload, headers=auth_headers)
+    assert resp.status_code == 422
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `pytest backend/tests/unit/v1/agent/test_agent_router.py::test_runs_rejects_non_project_signed_url -v`
+Expected: FAIL（当前不会拦截该 URL）
+
+**Step 3: Write minimal implementation**
+
+```python
+def validate_binary_signed_url_scope(*, url: str, user_id: UUID, thread_id: UUID) -> tuple[str, str]:
+    bucket, path = supabase_service.parse_signed_url(url)
+    # check host, bucket, path prefix agent-inputs/{user_id}/{thread_id}/uploads/
+    return bucket, path
+```
+
+在 `runs/resume` 请求入口调用校验；若请求含 binary 且当前模型不支持视觉，抛 `HTTPException(status_code=422, ...)`。
+
+**Step 4: Run test to verify it passes**
+
+Run: `pytest backend/tests/unit/v1/agent/test_agent_router.py::test_runs_rejects_non_project_signed_url -v`
+Expected: PASS
+
+**Step 5: Commit**
+
+```bash
+git add backend/src/core/agentscope/schemas/agui_input.py backend/src/v1/agent/router.py backend/src/v1/agent/service.py backend/tests/unit/v1/agent/test_agent_router.py
+git commit -m "fix: enforce signed image url scope on runs"
+```
+
+### Task 2: Runtime 多模态直传（移除文本化图片）
+
+**Files:**
+- Modify: `backend/src/core/agentscope/runtime/orchestrator.py`
+- Modify: `backend/src/core/agentscope/prompts/agent_prompt.py`
+- Test: `backend/tests/unit/core/agentscope/runtime/test_orchestrator.py`
+
+**Step 1: Write the failing test**
+
+```python
+async def test_orchestrator_passes_image_url_block_to_runner() -> None:
+    command = build_run_input_with_binary("https://project.supabase.co/storage/v1/object/sign/...")
+    await orchestrator.run(..., command=command, ...)
+    assert fake_runner.user_input[1]["type"] == "image_url"
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `pytest backend/tests/unit/core/agentscope/runtime/test_orchestrator.py::test_orchestrator_passes_image_url_block_to_runner -v`
+Expected: FAIL（当前路径仍可能文本化）
+
+**Step 3: Write minimal implementation**
+
+```python
+def _to_model_multimodal_blocks(content_blocks: list[dict[str, Any]]) -> list[dict[str, Any]]:
+    # text -> {type:"text", text:...}
+    # binary -> {type:"image_url", image_url:{url:...}}
+```
+
+将 runner 输入改为上述多模态块；禁止把图片块拼进普通字符串。
+
+**Step 4: Run test to verify it passes**
+
+Run: `pytest backend/tests/unit/core/agentscope/runtime/test_orchestrator.py::test_orchestrator_passes_image_url_block_to_runner -v`
+Expected: PASS
+
+**Step 5: Commit**
+
+```bash
+git add backend/src/core/agentscope/runtime/orchestrator.py backend/src/core/agentscope/prompts/agent_prompt.py backend/tests/unit/core/agentscope/runtime/test_orchestrator.py
+git commit -m "feat: pass image blocks as multimodal payload to model"
+```
+
+### Task 3: Worker 结构化落库（content=answer）
+
+**Files:**
+- Modify: `backend/src/core/agentscope/events/store.py`
+- Modify: `backend/src/core/agentscope/runtime/orchestrator.py`
+- Test: `backend/tests/unit/core/agentscope/events/test_store.py`
+
+**Step 1: Write the failing test**
+
+```python
+async def test_text_message_end_persists_worker_output_and_answer_content() -> None:
+    event = build_text_end_event(worker_agent_output={"answer": "ok", ...})
+    await store.persist(event)
+    assert saved.content == "ok"
+    assert saved.metadata_json["worker_agent_output"]["answer"] == "ok"
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `pytest backend/tests/unit/core/agentscope/events/test_store.py::test_text_message_end_persists_worker_output_and_answer_content -v`
+Expected: FAIL
+
+**Step 3: Write minimal implementation**
+
+```python
+worker = WorkerAgentOutput.model_validate(event.get("workerAgentOutput") or {})
+content = worker.answer
+metadata["worker_agent_output"] = worker.model_dump(mode="json")
+```
+
+orchestrator 在 `text.end` 事件 data 写入 `workerAgentOutput`。
+
+**Step 4: Run test to verify it passes**
+
+Run: `pytest backend/tests/unit/core/agentscope/events/test_store.py::test_text_message_end_persists_worker_output_and_answer_content -v`
+Expected: PASS
+
+**Step 5: Commit**
+
+```bash
+git add backend/src/core/agentscope/events/store.py backend/src/core/agentscope/runtime/orchestrator.py backend/tests/unit/core/agentscope/events/test_store.py
+git commit -m "refactor: persist worker output schema with answer as message content"
+```
+
+### Task 4: Tool 结构化落库（content=result_summary）并删除旧摘要逻辑
+
+**Files:**
+- Modify: `backend/src/core/agentscope/events/store.py`
+- Modify: `backend/src/core/agentscope/runtime/orchestrator.py`
+- Delete: `backend/src/core/agentscope/events/tool_result_summary.py`
+- Test: `backend/tests/unit/core/agentscope/events/test_store.py`
+
+**Step 1: Write the failing test**
+
+```python
+async def test_tool_result_persists_tool_output_and_summary_content() -> None:
+    event = build_tool_result_event(tool_agent_output={"result_summary": "done", ...})
+    await store.persist(event)
+    assert saved.content == "done"
+    assert saved.metadata_json["tool_agent_output"]["result_summary"] == "done"
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `pytest backend/tests/unit/core/agentscope/events/test_store.py::test_tool_result_persists_tool_output_and_summary_content -v`
+Expected: FAIL
+
+**Step 3: Write minimal implementation**
+
+```python
+tool = ToolAgentOutput.model_validate(event.get("toolAgentOutput") or {})
+content = tool.result_summary
+metadata["tool_agent_output"] = tool.model_dump(mode="json")
+```
+
+移除 `build_tool_content_summary` 相关 import/调用。
+
+**Step 4: Run test to verify it passes**
+
+Run: `pytest backend/tests/unit/core/agentscope/events/test_store.py::test_tool_result_persists_tool_output_and_summary_content -v`
+Expected: PASS
+
+**Step 5: Commit**
+
+```bash
+git add backend/src/core/agentscope/events/store.py backend/src/core/agentscope/runtime/orchestrator.py backend/tests/unit/core/agentscope/events/test_store.py backend/src/core/agentscope/events/tool_result_summary.py
+git commit -m "refactor: persist tool output schema and remove legacy summary builder"
+```
+
+### Task 5: Worker output 模型别名收敛（可选第二阶段）
+
+**Files:**
+- Modify: `backend/src/schemas/agent/runtime_models.py`
+- Modify: `backend/src/schemas/messages/chat_message.py`
+- Test: `backend/tests/unit/schemas/agent/test_runtime_models.py`
+
+**Step 1: Write the failing test**
+
+```python
+def test_worker_output_lite_disallows_ui_hints() -> None:
+    with pytest.raises(ValidationError):
+        WorkerAgentOutputLite.model_validate({... , "ui_hints": {...}})
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `pytest backend/tests/unit/schemas/agent/test_runtime_models.py::test_worker_output_lite_disallows_ui_hints -v`
+Expected: 根据现状决定（若已 fail 则作为守护测试）
+
+**Step 3: Write minimal implementation**
+
+```python
+WorkerAgentOutput = WorkerAgentOutputLite | WorkerAgentOutputRich
+```
+
+如不想扩大变更，可保留现状并仅补充注释说明由 `resolve_worker_output_model` 决定运行时约束。
+
+**Step 4: Run test to verify it passes**
+
+Run: `pytest backend/tests/unit/schemas/agent/test_runtime_models.py -v`
+Expected: PASS
+
+**Step 5: Commit**
+
+```bash
+git add backend/src/schemas/agent/runtime_models.py backend/src/schemas/messages/chat_message.py backend/tests/unit/schemas/agent/test_runtime_models.py
+git commit -m "refactor: clarify worker output model contract for lite and rich modes"
+```
+
+### Task 6: 端到端回归与文档同步
+
+**Files:**
+- Modify: `docs/protocols/agent-chat-messages.md`
+- Modify: `docs/runtime/runtime-route.md`
+
+**Step 1: Run targeted backend tests**
+
+Run: `pytest backend/tests/unit/v1/agent/test_agent_router.py backend/tests/unit/core/agentscope/runtime/test_orchestrator.py backend/tests/unit/core/agentscope/events/test_store.py -v`
+Expected: PASS
+
+**Step 2: Run lint/type checks**
+
+Run: `cd backend && ruff check src tests && mypy src`
+Expected: PASS
+
+**Step 3: Update docs for new contracts**
+
+- 明确 `runs` 的 URL 安全边界与 422 错误码。
+- 明确 `worker_agent_output`/`tool_agent_output` 的落库契约及 `content` 映射规则。
+
+**Step 4: Final verification**
+
+Run: `pytest backend/tests -q`
+Expected: PASS
+
+**Step 5: Commit**
+
+```bash
+git add docs/protocols/agent-chat-messages.md docs/runtime/runtime-route.md
+git commit -m "docs: align runs multimodal and structured persistence contracts"
+```
@@ -0,0 +1,87 @@
+# Agent Runs Multimodal 与落库重构设计
+
+**目标**：让 `POST /agent/runs` 支持真实多模态直传到模型（非文本化），并将 worker/tool 结果按新 metadata 协议结构化落库。
+
+**范围**：后端 `runs/resume` 请求校验、runtime 输入转换、事件落库、history 回放一致性。
+
+---
+
+## 1. 背景与问题
+
+- 当前 `binary` 内容在运行链路中被当作普通 JSON 文本拼接进入 prompt，模型拿不到原生图像输入。
+- tool 落库仍依赖旧摘要逻辑 `build_tool_content_summary`，与最新 `ToolAgentOutput` 元数据规范不一致。
+- worker 落库当前只落文本内容，未确保 `WorkerAgentOutput` 结构化对象与 `content=answer` 的一致关系。
+
+---
+
+## 2. 设计原则
+
+- 协议单一信源：严格遵循 `docs/protocols/agent-chat-messages.md`，只接受 `binary` 形态，不兼容旧形态。
+- 最小安全边界：仅允许本项目 Supabase 私有桶签名 URL，拒绝任意外部 URL。
+- 事件驱动持久化：以 event store 作为唯一落库入口，避免双轨逻辑。
+- 数据可回放：history 始终可按 metadata 重新签名并回填 user 附件。
+
+---
+
+## 3. 目标数据流
+
+1. `runs` 入参校验通过后，user message 入库（附件仅存 bucket/path/mime）。
+2. runtime 执行时，将 `binary` 转为模型多模态 `image_url` content block 直传。
+3. orchestrator 产出结构化事件：
+   - worker 主响应通过 `TEXT_MESSAGE_*` 事件发送，`TEXT_MESSAGE_END` 携带 `workerAgentOutput`。
+   - tool 执行结果通过 `TOOL_CALL_RESULT` 事件发送，携带 `toolAgentOutput`。
+4. event store 统一校验并落库：
+   - worker：`content = answer`，metadata 写 `worker_agent_output`。
+   - tool：`content = result_summary`，metadata 写 `tool_agent_output`。
+5. history 读取 user metadata 重新签名 URL，返回 `binary` block 给前端。
+
+---
+
+## 4. 安全与错误策略
+
+### 4.1 URL 安全边界
+
+- `binary.url` 必须满足：
+  - host 为当前 Supabase 项目域名。
+  - path 为 `/storage/v1/object/sign/{bucket}/{path}`。
+  - `{bucket}` 等于 `config.storage.bucket`。
+  - `{path}` 前缀匹配 `agent-inputs/{user_id}/{thread_id}/uploads/`。
+
+### 4.2 运行失败
+
+- 保持 AG-UI 生命周期完整：`RUN_STARTED` 后只能 `RUN_FINISHED` 或 `RUN_ERROR` 结束。
+- 运行错误时不落半结构化消息，避免脏元数据。
+
+---
+
+## 5. 落库契约
+
+### 5.1 Worker
+
+- 入库角色：`assistant`
+- `messages.content = worker_agent_output.answer`
+- `messages.metadata.worker_agent_output = WorkerAgentOutput`（完整、schema 校验后）
+
+### 5.2 Tool
+
+- 入库角色：`tool`
+- `messages.content = tool_agent_output.result_summary`
+- `messages.metadata.tool_agent_output = ToolAgentOutput`（完整、schema 校验后）
+- 删除旧摘要逻辑：`build_tool_content_summary`
+
+---
+
+## 6. 兼容性策略
+
+- 不兼容旧输入块形态（如 `image_url` 作为 runs 输入）。
+- 历史接口输出协议保持不变，前端无需修改消费协议。
+- 原有 user 附件回放路径保留，只强化入站 URL 校验。
+
+---
+
+## 7. 验收标准
+
+- runs 包含合法 `binary` 时，模型收到多模态消息（非文本化 JSON）。
+- 非本项目签名 URL 返回 `422`。
+- worker/tool 落库满足 `content` 与结构化 metadata 一一对应。
+- history 仍能正确回放 user 附件（临时签名 URL）。
@@ -0,0 +1,239 @@
+# Agent Runs Events and History Route Protocol
+
+> **NOTE**: This document is the single source of truth for agent runs event streaming and history snapshot routes.
+
+## Overview
+
+Defines the transport format for:
+
+- `POST /api/v1/agent/runs`
+- `GET /api/v1/agent/runs/{thread_id}/events`
+- `GET /api/v1/agent/history`
+- `GET /api/v1/agent/attachments/signed-url`
+
+## Version
+
+- **Current**: `1.0`
+- **Status**: Draft (pending full backend/frontend alignment)
+
+---
+
+## Route Semantics
+
+### `GET /api/v1/agent/history`
+
+- Unified history endpoint.
+- Query params:
+  - `threadId` (optional): target thread id.
+  - `before` (optional, `YYYY-MM-DD`): paginate by day.
+- Behavior:
+  - With `threadId`: returns that thread's day snapshot.
+  - Without `threadId`: returns latest available thread snapshot for current user.
+
+### `GET /api/v1/agent/attachments/signed-url`
+
+- Generate temporary signed URL for attachment rendering.
+- Query params:
+  - `bucket` (required)
+  - `path` (required)
+- Scope rule:
+  - `bucket` must match current storage bucket.
+  - `path` must be within current user prefix `agent-inputs/{user_id}/`.
+
+---
+
+## SSE Envelope (`/events`)
+
+`GET /api/v1/agent/runs/{thread_id}/events` uses `text/event-stream`.
+
+Each SSE frame format:
+
+```text
+id: <stream-id>
+event: <EVENT_TYPE>
+data: <JSON payload>
+
+```
+
+---
+
+## Event Type Set
+
+- `RUN_STARTED`
+- `STEP_STARTED`
+- `STEP_FINISHED`
+- `TEXT_MESSAGE_START`
+- `TEXT_MESSAGE_CONTENT`
+- `TEXT_MESSAGE_END`
+- `TOOL_CALL_RESULT`
+- `RUN_FINISHED`
+- `RUN_ERROR`
+
+---
+
+## Common Event Fields
+
+```typescript
+interface EventBase {
+  type: string;
+  threadId: string;
+  runId?: string;
+}
+```
+
+---
+
+## Event Payload Schemas
+
+### Run Lifecycle
+
+```typescript
+interface RunStartedEvent extends EventBase {
+  type: "RUN_STARTED";
+  runId: string;
+}
+
+interface RunFinishedEvent extends EventBase {
+  type: "RUN_FINISHED";
+  runId: string;
+}
+
+interface RunErrorEvent extends EventBase {
+  type: "RUN_ERROR";
+  runId: string;
+  message: string;
+}
+```
+
+### Step Lifecycle
+
+```typescript
+interface StepStartedEvent extends EventBase {
+  type: "STEP_STARTED";
+  runId: string;
+  stepName: string;
+}
+
+interface StepFinishedEvent extends EventBase {
+  type: "STEP_FINISHED";
+  runId: string;
+  stepName: string;
+}
+```
+
+### Text Streaming
+
+```typescript
+interface TextMessageStartEvent extends EventBase {
+  type: "TEXT_MESSAGE_START";
+  runId: string;
+  messageId: string;
+  role: "assistant" | "system" | "user" | "tool";
+  stage?: string;
+}
+
+interface TextMessageContentEvent extends EventBase {
+  type: "TEXT_MESSAGE_CONTENT";
+  runId: string;
+  messageId: string;
+  delta: string; // incremental text chunk
+}
+
+interface TextMessageEndEvent extends EventBase {
+  type: "TEXT_MESSAGE_END";
+  runId: string;
+  messageId: string;
+  workerAgentOutput: WorkerAgentOutput;
+  // stage/model are intentionally excluded from this event
+}
+```
+
+### Tool Result
+
+```typescript
+interface ToolCallResultEvent extends EventBase {
+  type: "TOOL_CALL_RESULT";
+  messageId: string;
+  toolCallId: string;
+  toolAgentOutput: ToolAgentOutput; // required
+}
+```
+
+### Worker/Tool Payloads
+
+```typescript
+interface WorkerAgentOutput {
+  status: "success" | "partial_success" | "failed";
+  answer: string;
+  key_points?: string[];
+  result_type?: string;
+  suggested_actions?: string[];
+  error?: {
+    code: string;
+    message: string;
+    retryable?: boolean;
+    details?: Record<string, unknown>;
+  };
+  ui_hints?: Record<string, unknown>;
+}
+
+interface ToolAgentOutput {
+  tool_name: string;
+  tool_call_id: string;
+  tool_call_args?: Record<string, unknown>;
+  status: "success" | "partial" | "failure";
+  result_summary: string;
+  ui_hints?: Record<string, unknown>;
+  error?: {
+    code: string;
+    message: string;
+    retryable?: boolean;
+    details?: Record<string, unknown>;
+  };
+}
+```
+
+---
+
+## History Response Schema
+
+`GET /api/v1/agent/history` returns `STATE_SNAPSHOT` payload.
+
+```typescript
+interface AgentHistoryResponse {
+  type: "STATE_SNAPSHOT";
+  threadId?: string;
+  snapshot: {
+    scope: "history_day";
+    threadId: string | null;
+    day: string | null; // YYYY-MM-DD
+    hasMore: boolean;
+    messages: SnapshotMessage[];
+  };
+}
+
+interface SnapshotMessage {
+  id: string;
+  seq: number;
+  role: "user" | "assistant" | "system" | "tool";
+  content: string;
+  metadata?: Record<string, unknown>;
+  timestamp: string; // ISO-8601
+}
+
+interface AttachmentSignedUrlResponse {
+  bucket: string;
+  path: string;
+  url: string;
+}
+```
+
+---
+
+## Compatibility Notes
+
+- For `TOOL_CALL_RESULT`, clients should treat `toolAgentOutput` as canonical payload.
+- `TEXT_MESSAGE_CONTENT.delta` is defined as incremental text chunk. Implementations should emit multiple chunks for real streaming UX.
+- `TEXT_MESSAGE_END` must not include `stage` or `model` in this protocol version.
+- History snapshot `messages[]` strictly follows `backend/src/schemas/messages/chat_message.py` `AgentChatMessage` schema.
+- Attachment URL rendering is decoupled from history; client should call `/api/v1/agent/attachments/signed-url` using metadata fields.