social-app/docs/plans/2026-03-13-agent-runs-multimodal-implementation.md
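qzl 1c02503d1d refactor: simplify the AgentScope runtime module and event handling
- Remove redundant user_token parameter passing
- Refactor the tool.result event to use the ToolAgentOutput model
- Refactor the text.end event to use the WorkerAgentOutput model
- Simplify tool result handling in the store module
- Update router/service to match the new event structure
- Clean up obsolete test files and design documents
- Add the AgentRuns multimodal storage design document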
2026-03-13 17:27:18 +08:00

# Agent Runs Multimodal Refactor Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
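**Goal:** Make runs/resume use real multimodal image input, and persist worker/tool output according to the new structured metadata specification.
**Architecture:** Keep the existing event pipeline and introduce no side-channel database writes. The request entry point validates the URL security boundary; the runtime converts `binary` blocks into `image_url` blocks the model can recognize; the event store uniformly validates `WorkerAgentOutput` / `ToolAgentOutput` and performs the `content` mapping.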
**Tech Stack:** FastAPI, Pydantic v2, SQLAlchemy AsyncSession, AgentScope, LiteLLM, Redis Stream
---
### Task 1: Runs input security boundary
**Files:**
- Modify: `backend/src/core/agentscope/schemas/agui_input.py`
- Modify: `backend/src/v1/agent/router.py`
- Modify: `backend/src/v1/agent/service.py`
- Test: `backend/tests/unit/v1/agent/test_agent_router.py`
**Step 1: Write the failing test**
```python
def test_runs_rejects_non_project_signed_url(...) -> None:
    payload = build_run_payload_with_binary_url("https://evil.example.com/storage/v1/object/sign/...")
    resp = client.post("/api/v1/agent/runs", json=payload, headers=auth_headers)
    assert resp.status_code == 422
```
**Step 2: Run test to verify it fails**
Run: `pytest backend/tests/unit/v1/agent/test_agent_router.py::test_runs_rejects_non_project_signed_url -v`
Expected: FAIL (the URL is not currently blocked)
**Step 3: Write minimal implementation**
```python
def validate_binary_signed_url_scope(*, url: str, user_id: UUID, thread_id: UUID) -> tuple[str, str]:
    bucket, path = supabase_service.parse_signed_url(url)
    # check host, bucket, path prefix agent-inputs/{user_id}/{thread_id}/uploads/
    return bucket, path
```
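The `runs/resume` request entry point calls this validation; if the request contains binary content and the current model does not support vision, raise `HTTPException(status_code=422, ...)`.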
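The scope check described in the comment above could look roughly like the sketch below. It is only an illustration under stated assumptions: it parses the URL with the standard library instead of `supabase_service.parse_signed_url` (whose behavior is not shown in this plan), and `ALLOWED_HOST`, `ALLOWED_BUCKET`, and the function name `check_signed_url_scope` are hypothetical placeholders. It also assumes the `agent-inputs/...` prefix is an object path inside the bucket.

```python
from urllib.parse import unquote, urlparse
from uuid import UUID

# Hypothetical values; the real host and bucket would come from project config.
ALLOWED_HOST = "project.supabase.co"
ALLOWED_BUCKET = "agent-uploads"
SIGN_PREFIX = "/storage/v1/object/sign/"


def check_signed_url_scope(url: str, user_id: UUID, thread_id: UUID) -> tuple[str, str]:
    """Return (bucket, object_path) if the URL is in scope, else raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname != ALLOWED_HOST:
        raise ValueError("signed URL host is not the project storage host")
    if not parsed.path.startswith(SIGN_PREFIX):
        raise ValueError("URL is not a signed object URL")
    # Decode percent-escapes first so an encoded ".." cannot slip past the checks.
    remainder = unquote(parsed.path[len(SIGN_PREFIX):])
    bucket, _, object_path = remainder.partition("/")
    if bucket != ALLOWED_BUCKET:
        raise ValueError("unexpected bucket")
    if ".." in object_path.split("/"):
        raise ValueError("path traversal is not allowed")
    expected_prefix = f"agent-inputs/{user_id}/{thread_id}/uploads/"
    if not object_path.startswith(expected_prefix):
        raise ValueError("object path is outside the caller's upload prefix")
    return bucket, object_path
```

At the request entry point, a `ValueError` from a check like this would be translated into the 422 response that the failing test expects.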
**Step 4: Run test to verify it passes**
Run: `pytest backend/tests/unit/v1/agent/test_agent_router.py::test_runs_rejects_non_project_signed_url -v`
Expected: PASS
**Step 5: Commit**
```bash
git add backend/src/core/agentscope/schemas/agui_input.py backend/src/v1/agent/router.py backend/src/v1/agent/service.py backend/tests/unit/v1/agent/test_agent_router.py
git commit -m "fix: enforce signed image url scope on runs"
```
### Task 2: Runtime multimodal passthrough (remove image-to-text conversion)
**Files:**
- Modify: `backend/src/core/agentscope/runtime/orchestrator.py`
- Modify: `backend/src/core/agentscope/prompts/agent_prompt.py`
- Test: `backend/tests/unit/core/agentscope/runtime/test_orchestrator.py`
**Step 1: Write the failing test**
```python
async def test_orchestrator_passes_image_url_block_to_runner() -> None:
    command = build_run_input_with_binary("https://project.supabase.co/storage/v1/object/sign/...")
    await orchestrator.run(..., command=command, ...)
    assert fake_runner.user_input[1]["type"] == "image_url"
```
**Step 2: Run test to verify it fails**
Run: `pytest backend/tests/unit/core/agentscope/runtime/test_orchestrator.py::test_orchestrator_passes_image_url_block_to_runner -v`
Expected: FAIL (the current path may still convert images to text)
**Step 3: Write minimal implementation**
```python
def _to_model_multimodal_blocks(content_blocks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # text   -> {"type": "text", "text": ...}
    # binary -> {"type": "image_url", "image_url": {"url": ...}}
    ...
```
Change the runner input to the multimodal blocks above; never concatenate image blocks into a plain string.
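One way to fill in the conversion is sketched below. It assumes each incoming `binary` block carries its signed URL under a `url` key and each `text` block carries a `text` key; the plan does not spell out the request block schema, so those key names are assumptions, as is the choice to reject unknown block types.

```python
from typing import Any


def to_model_multimodal_blocks(content_blocks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert request content blocks into image_url/text parts for the model."""
    converted: list[dict[str, Any]] = []
    for block in content_blocks:
        kind = block.get("type")
        if kind == "text":
            converted.append({"type": "text", "text": block["text"]})
        elif kind == "binary":
            # Assumption: the signed URL is stored under "url" on the block.
            converted.append({"type": "image_url", "image_url": {"url": block["url"]}})
        else:
            # Failing loudly avoids silently dropping or stringifying a block.
            raise ValueError(f"unsupported content block type: {kind!r}")
    return converted
```

Because the result stays a list of typed parts, the image is never flattened into the prompt string, which is the behavior the test in Step 1 asserts on.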
**Step 4: Run test to verify it passes**
Run: `pytest backend/tests/unit/core/agentscope/runtime/test_orchestrator.py::test_orchestrator_passes_image_url_block_to_runner -v`
Expected: PASS
**Step 5: Commit**
```bash
git add backend/src/core/agentscope/runtime/orchestrator.py backend/src/core/agentscope/prompts/agent_prompt.py backend/tests/unit/core/agentscope/runtime/test_orchestrator.py
git commit -m "feat: pass image blocks as multimodal payload to model"
```
### Task 3: Structured worker persistence (content=answer)
**Files:**
- Modify: `backend/src/core/agentscope/events/store.py`
- Modify: `backend/src/core/agentscope/runtime/orchestrator.py`
- Test: `backend/tests/unit/core/agentscope/events/test_store.py`
**Step 1: Write the failing test**
```python
async def test_text_message_end_persists_worker_output_and_answer_content() -> None:
    event = build_text_end_event(worker_agent_output={"answer": "ok", ...})
    await store.persist(event)
    assert saved.content == "ok"
    assert saved.metadata_json["worker_agent_output"]["answer"] == "ok"
```
**Step 2: Run test to verify it fails**
Run: `pytest backend/tests/unit/core/agentscope/events/test_store.py::test_text_message_end_persists_worker_output_and_answer_content -v`
Expected: FAIL
**Step 3: Write minimal implementation**
```python
worker = WorkerAgentOutput.model_validate(event.get("workerAgentOutput") or {})
content = worker.answer
metadata["worker_agent_output"] = worker.model_dump(mode="json")
```
The orchestrator writes `workerAgentOutput` into the data of the `text.end` event.
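To make the validate-then-map step concrete without depending on the project's Pydantic models, here is a standard-library stand-in. `WorkerOutputSketch` and `map_text_end` are hypothetical names, and only the `answer` field is taken from this plan; treating `answer` as a required string is an assumption about `WorkerAgentOutput`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class WorkerOutputSketch:
    """Stand-in for WorkerAgentOutput; only `answer` is taken from the plan."""

    answer: str
    extras: dict[str, Any] = field(default_factory=dict)  # hypothetical catch-all

    @classmethod
    def validate(cls, raw: Any) -> "WorkerOutputSketch":
        if not isinstance(raw, dict) or not isinstance(raw.get("answer"), str):
            raise ValueError("workerAgentOutput needs a string 'answer'")
        extras = {key: value for key, value in raw.items() if key != "answer"}
        return cls(answer=raw["answer"], extras=extras)


def map_text_end(event: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return (content, metadata) for a text.end event."""
    # A missing payload falls back to {} and then fails validation,
    # so an event without structured output is never stored with empty content.
    worker = WorkerOutputSketch.validate(event.get("workerAgentOutput") or {})
    metadata = {"worker_agent_output": {"answer": worker.answer, **worker.extras}}
    return worker.answer, metadata
```

The `or {}` fallback mirrors the snippet in Step 3: if the real model also requires `answer`, a `text.end` event without `workerAgentOutput` would raise rather than persist a blank message.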
**Step 4: Run test to verify it passes**
Run: `pytest backend/tests/unit/core/agentscope/events/test_store.py::test_text_message_end_persists_worker_output_and_answer_content -v`
Expected: PASS
**Step 5: Commit**
```bash
git add backend/src/core/agentscope/events/store.py backend/src/core/agentscope/runtime/orchestrator.py backend/tests/unit/core/agentscope/events/test_store.py
git commit -m "refactor: persist worker output schema with answer as message content"
```
### Task 4: Structured tool persistence (content=result_summary) and removal of the legacy summary logic
**Files:**
- Modify: `backend/src/core/agentscope/events/store.py`
- Modify: `backend/src/core/agentscope/runtime/orchestrator.py`
- Delete: `backend/src/core/agentscope/events/tool_result_summary.py`
- Test: `backend/tests/unit/core/agentscope/events/test_store.py`
**Step 1: Write the failing test**
```python
async def test_tool_result_persists_tool_output_and_summary_content() -> None:
    event = build_tool_result_event(tool_agent_output={"result_summary": "done", ...})
    await store.persist(event)
    assert saved.content == "done"
    assert saved.metadata_json["tool_agent_output"]["result_summary"] == "done"
```
**Step 2: Run test to verify it fails**
Run: `pytest backend/tests/unit/core/agentscope/events/test_store.py::test_tool_result_persists_tool_output_and_summary_content -v`
Expected: FAIL
**Step 3: Write minimal implementation**
```python
tool = ToolAgentOutput.model_validate(event.get("toolAgentOutput") or {})
content = tool.result_summary
metadata["tool_agent_output"] = tool.model_dump(mode="json")
```
Remove the imports and calls related to `build_tool_content_summary`.
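With the legacy summary builder gone, the worker and tool paths differ only in three names: the event payload key, the field that becomes `content`, and the metadata key. The table-driven sketch below illustrates that symmetry. It is a simplification that works on plain dicts rather than the Pydantic models, and `OutputSpec`, `SPECS`, and `map_event` are hypothetical names; the event type strings and keys are the ones used elsewhere in this plan.

```python
from typing import Any, NamedTuple


class OutputSpec(NamedTuple):
    payload_key: str    # key on the incoming event
    content_field: str  # payload field stored as message content
    metadata_key: str   # key under which the payload is stored in metadata


SPECS: dict[str, OutputSpec] = {
    "text.end": OutputSpec("workerAgentOutput", "answer", "worker_agent_output"),
    "tool.result": OutputSpec("toolAgentOutput", "result_summary", "tool_agent_output"),
}


def map_event(event_type: str, event: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return (content, metadata) for a structured-output event."""
    spec = SPECS[event_type]
    payload = event.get(spec.payload_key) or {}
    content = payload.get(spec.content_field)
    if not isinstance(content, str):
        raise ValueError(f"{spec.payload_key}.{spec.content_field} must be a string")
    return content, {spec.metadata_key: dict(payload)}
```

Keeping the mapping in one table makes it harder for the two event types to drift apart, which fits the architecture note that the event store validates both outputs uniformly.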
**Step 4: Run test to verify it passes**
Run: `pytest backend/tests/unit/core/agentscope/events/test_store.py::test_tool_result_persists_tool_output_and_summary_content -v`
Expected: PASS
**Step 5: Commit**
```bash
git add backend/src/core/agentscope/events/store.py backend/src/core/agentscope/runtime/orchestrator.py backend/tests/unit/core/agentscope/events/test_store.py backend/src/core/agentscope/events/tool_result_summary.py
git commit -m "refactor: persist tool output schema and remove legacy summary builder"
```
### Task 5: Consolidate the worker output model alias (optional second phase)
**Files:**
- Modify: `backend/src/schemas/agent/runtime_models.py`
- Modify: `backend/src/schemas/messages/chat_message.py`
- Test: `backend/tests/unit/schemas/agent/test_runtime_models.py`
**Step 1: Write the failing test**
```python
def test_worker_output_lite_disallows_ui_hints() -> None:
    with pytest.raises(ValidationError):
        WorkerAgentOutputLite.model_validate({... , "ui_hints": {...}})
```
**Step 2: Run test to verify it fails**
Run: `pytest backend/tests/unit/schemas/agent/test_runtime_models.py::test_worker_output_lite_disallows_ui_hints -v`
Expected: depends on the current state (if it already fails, keep it as a guard test)
**Step 3: Write minimal implementation**
```python
WorkerAgentOutput = WorkerAgentOutputLite | WorkerAgentOutputRich
```
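To avoid widening the change, you may keep the current state and only add a comment explaining that `resolve_worker_output_model` determines the runtime constraints.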
**Step 4: Run test to verify it passes**
Run: `pytest backend/tests/unit/schemas/agent/test_runtime_models.py -v`
Expected: PASS
**Step 5: Commit**
```bash
git add backend/src/schemas/agent/runtime_models.py backend/src/schemas/messages/chat_message.py backend/tests/unit/schemas/agent/test_runtime_models.py
git commit -m "refactor: clarify worker output model contract for lite and rich modes"
```
### Task 6: End-to-end regression and documentation sync
**Files:**
- Modify: `docs/protocols/agent-chat-messages.md`
- Modify: `docs/runtime/runtime-route.md`
**Step 1: Run targeted backend tests**
Run: `pytest backend/tests/unit/v1/agent/test_agent_router.py backend/tests/unit/core/agentscope/runtime/test_orchestrator.py backend/tests/unit/core/agentscope/events/test_store.py -v`
Expected: PASS
**Step 2: Run lint/type checks**
Run: `cd backend && ruff check src tests && mypy src`
Expected: PASS
**Step 3: Update docs for new contracts**
- Specify the URL security boundary for `runs` and the 422 error code.
- Specify the persistence contract for `worker_agent_output`/`tool_agent_output` and the `content` mapping rules.
**Step 4: Final verification**
Run: `pytest backend/tests -q`
Expected: PASS
**Step 5: Commit**
```bash
git add docs/protocols/agent-chat-messages.md docs/runtime/runtime-route.md
git commit -m "docs: align runs multimodal and structured persistence contracts"
```