docs(agent): add Task2/Task3 architecture and implementation artifacts

2026-03-08 16:03:02 +08:00
parent 8a23018b6d
commit 5ada60e834
3 changed files with 977 additions and 0 deletions
@@ -0,0 +1,323 @@
+# Agent Module Review Report - Tool Architecture
+
+**Date**: 2026-03-08
+**Scope**: `backend/src/core/agent`
+**Status**: Pending evaluation
+
+---
+
+## 🟡 MEDIUM - Tool architecture issues
+
+### 1. CrewAI tool module is unused; tools are hardcoded
+
+**Files**:
+- `application/run_service.py:406` - `_execute_backend_tool()`
+- `infrastructure/crewai/runtime.py` - three-stage flow
+
+**Problem**:
+
+The agent currently uses only CrewAI's **agent/task configuration templates** (YAML) and **does not use CrewAI's tool system**:
+
+```
+In use:
+├── agents.yaml (agent role definitions)
+└── tasks.yaml (task definitions)
+
+Not in use:
+├── @tool decorator
+├── BaseTool class
+└── Tools registry
+```
+
+**Current implementation**:
+```python
+# run_service.py:406
+async def _execute_backend_tool(self, *, tool_name, tool_args, ...):
+    if tool_name != "create_calendar_event":  # hardcoded check
+        raise ValueError(f"unsupported backend tool: {tool_name}")
+    # the tool is executed manually...
+```
+
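+**Impact**:
+1. Every new tool requires a code change in `_execute_backend_tool()`
+2. CrewAI's tool selection and result handling cannot be used
+3. Integration with CrewAI is shallow, so the framework's strengths go unused
+4. Tool descriptions and other prompt information cannot be injected into the agent automatically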
+
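One way to remove the hardcoded name check from issue 1 is a small name-to-handler registry. The sketch below is an illustration only: `TOOL_REGISTRY`, `register_tool`, and `execute_backend_tool` are hypothetical names, the tool body is a stand-in, and it deliberately does not use CrewAI's own tool classes, whose wiring would need to be checked against the CrewAI documentation.

```python
import asyncio
from typing import Any, Awaitable, Callable

ToolHandler = Callable[..., Awaitable[dict[str, Any]]]

# Hypothetical registry: maps a tool name to its async handler.
TOOL_REGISTRY: dict[str, ToolHandler] = {}


def register_tool(name: str) -> Callable[[ToolHandler], ToolHandler]:
    """Register an async handler under `name` (illustrative helper)."""
    def decorator(handler: ToolHandler) -> ToolHandler:
        TOOL_REGISTRY[name] = handler
        return handler
    return decorator


@register_tool("create_calendar_event")
async def create_calendar_event(*, title: str) -> dict[str, Any]:
    # Stand-in body; the real tool would create the schedule item.
    return {"status": "ok", "title": title}


async def execute_backend_tool(tool_name: str, tool_args: dict[str, Any]) -> dict[str, Any]:
    """Dispatch by lookup instead of comparing against one hardcoded name."""
    handler = TOOL_REGISTRY.get(tool_name)
    if handler is None:
        raise ValueError(f"unsupported backend tool: {tool_name}")
    return await handler(**tool_args)
```

With this shape, adding a tool means adding one decorated function; the dispatcher itself does not change.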
+---
+
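+## 🟡 MEDIUM - Tool result storage issues
+
+### 2. Storing tool results in object storage is not enabled
+
+**Files**:
+- `application/session_state_persistence.py:52` - `persist_tool_result_payload()`
+- `models/agent_chat_message.py` - messages table
+
+**Problem**:
+
+`persist_tool_result_payload()` is defined and can upload tool results to object storage (MinIO/Supabase Storage), but **it is never called**.
+
+Current implementation:
+- Tool results are stored directly in the database field `messages.content`
+- `metadata_json` defines fields such as `storage_bucket` and `storage_path`, but they are always `None`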
+
+```python
+# message_metadata.py:17-27
+class MessageMetadataToolResult(BaseModel):
+    storage_bucket: str | None = None  # currently unused
+    storage_path: str | None = None    # currently unused
+    payload_sha256: str | None = None  # currently unused
+```
+
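+**Impact**:
+1. Tool results (especially large payloads such as UI components) live in the database and add to DB load
+2. The storage interface is defined but never used, leaving dead code
+3. The CDN acceleration and bandwidth benefits of object storage are unavailable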
+
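A minimal sketch of how the three metadata fields from issue 2 could be populated, assuming large payloads are offloaded and small ones stay inline. `store_tool_result`, the `upload` callable, and the `INLINE_LIMIT_BYTES` threshold are hypothetical; the actual signature of `persist_tool_result_payload()` is not shown in this report, so it is not called here.

```python
import hashlib
import json
from typing import Any, Callable

# Assumed size limit above which a payload is offloaded (illustrative value).
INLINE_LIMIT_BYTES = 4096


def store_tool_result(
    payload: dict[str, Any],
    upload: Callable[[str, str, bytes], None],
    *,
    bucket: str,
    path: str,
) -> dict[str, Any]:
    """Return message metadata for a tool result, offloading large payloads."""
    raw = json.dumps(payload, ensure_ascii=False, sort_keys=True).encode("utf-8")
    if len(raw) <= INLINE_LIMIT_BYTES:
        # Small payloads stay inline; the storage fields remain None.
        return {"content": raw.decode("utf-8"), "storage_bucket": None,
                "storage_path": None, "payload_sha256": None}
    # Large payloads go to object storage; the row keeps only a pointer and a hash.
    upload(bucket, path, raw)
    return {
        "content": None,
        "storage_bucket": bucket,
        "storage_path": path,
        "payload_sha256": hashlib.sha256(raw).hexdigest(),
    }
```

The hash lets a reader verify the downloaded object against the database row.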
+---
+
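+## 🟡 MEDIUM - Tool output format issues
+
+### 3. Tool output is not a UI Schema, so the frontend cannot render it directly
+
+**Files**:
+- `application/run_service.py:456-479` - `_execute_backend_tool()`
+
+**Problem**:
+
+The `create_calendar_event` tool currently returns only a **bare status payload**, not a UI Schema that the frontend can render: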
+
+```python
+# run_service.py:456-479
+event_id = str(schedule_item.id)
+ui_card = {
+    "type": "calendar_card.v1",
+    "version": "v1",
+    "data": {...},
+    "actions": [...]
+}
+# ui_card is built but never returned as the tool result
+return {"status": "ok", "event_id": event_id}  # only a minimal structure is returned
+```
+
+**Current output**:
+```json
+{
+  "status": "ok",
+  "event_id": "xxx"
+}
+```
+
+**Expected output** (UI Schema):
+```json
+{
+  "type": "calendar_card.v1",
+  "version": "v1",
+  "data": {
+    "id": "xxx",
+    "title": "Meeting",
+    "startAt": "2026-03-08T15:00:00Z",
+    ...
+  },
+  "actions": [
+    {"type": "link", "label": "View details", "target": "/calendar/events/xxx"}
+  ]
+}
+```
+
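+**Impact**:
+1. The frontend cannot render rich UI components directly
+2. The frontend has to interpret the raw result and build the UI by hand, which adds frontend work
+3. The `ui_schema` capability of the AG-UI protocol goes unused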
+
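A sketch of returning the card itself as the tool result for issue 3, using only the fields that appear in the expected output. `build_calendar_tool_result` is a hypothetical helper; any fields elided with `...` in the report are left out rather than guessed.

```python
from typing import Any


def build_calendar_tool_result(event_id: str, title: str, start_at: str) -> dict[str, Any]:
    """Build a tool result whose body is the calendar_card.v1 schema itself."""
    return {
        "type": "calendar_card.v1",
        "version": "v1",
        "data": {"id": event_id, "title": title, "startAt": start_at},
        "actions": [
            {
                "type": "link",
                "label": "View details",
                "target": f"/calendar/events/{event_id}",
            },
        ],
    }
```

Returning this dict in place of `{"status": "ok", "event_id": event_id}` would give the frontend a payload it can map to a component by its `type` field.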
+---
+
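+## 🟡 MEDIUM - Stage configuration issues
+
+### 4. Three-stage flow parameters are hardcoded; stages cannot have different policies
+
+**Files**:
+- `infrastructure/crewai/runtime.py:190-277` - `CrewAIRuntime.execute()`
+
+**Problem**:
+
+The three-stage flow (intent → execution → organization) is hardcoded in `run_agent_task()`, so per-stage parameters, such as the tools each stage may use, cannot be configured: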
+
+```python
+# runtime.py:203-277
+# intent stage
+intent_text, intent_usage = _run_stage(
+    litellm_model=litellm_model,
+    api_key=...,
+    llm_config=self._llm_config,  # same configuration
+    stage="intent",
+    ...
+)
+
+# execution stage (if any)
+execution_text, execution_usage = _run_stage(
+    litellm_model=litellm_model,
+    api_key=...,
+    llm_config=self._llm_config,  # same configuration
+    stage="execution",
+    ...
+)
+
+# organization stage
+organization_text, organization_usage = _run_stage(
+    litellm_model=litellm_model,
+    api_key=...,
+    llm_config=self._llm_config,  # same configuration
+    stage="organization",
+    ...
+)
+```
+
+**Current limitations**:
+1. The intent stage cannot be given a read-only LLM (one that is not allowed to call tools)
+
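+**Impact**:
+1. LLM behavior cannot be controlled per stage
+2. The intent-recognition stage may trigger tool calls by mistake
+3. Unnecessary LLM calls add cost
+4. The architecture is less flexible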
+
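For issue 4, per-stage policy could be expressed as a small config table keyed by stage name. This is a sketch under assumptions: `StageConfig`, `STAGE_CONFIGS`, and `tools_for_stage` are hypothetical, and the tool assignments shown are examples rather than the project's intended policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Per-stage settings (hypothetical shape)."""
    allowed_tools: tuple[str, ...] = ()


# Example policy: only the execution stage may call tools.
STAGE_CONFIGS: dict[str, StageConfig] = {
    "intent": StageConfig(),
    "execution": StageConfig(allowed_tools=("create_calendar_event",)),
    "organization": StageConfig(),
}


def tools_for_stage(stage: str, requested: list[str]) -> list[str]:
    """Filter requested tool names down to those the stage permits."""
    allowed = STAGE_CONFIGS[stage].allowed_tools
    return [name for name in requested if name in allowed]
```

Each stage call could then look up its own entry instead of sharing a single configuration object.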
+---
+
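+## 🔴 HIGH - Broken agent loop
+
+### 5. Agent loop does not continue after tool approval
+
+**Files**:
+- `application/resume_service.py:121-158`
+
+**Problem**:
+
+After the frontend approves a tool call, the backend returns the tool result but **does not resume the agent loop**; it marks the session as COMPLETED and stops.
+
+Current flow: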
+```python
+# resume_service.py:121-127
+snapshot = self._state_persistence.build_completed_snapshot()
+await session_repository.update_runtime_state(
+    chat_session=chat_session,
+    status=AgentChatSessionStatus.COMPLETED,  # marked complete immediately
+    state_snapshot=snapshot,
+    ...
+)
+```
+
+Missing flow:
+```
+1. Receive the tool result
+2. Store the tool result as a message in the context
+3. Call the LLM again (with the tool result)
+4. Generate the final reply
+5. Mark as COMPLETED
+```
+
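+**Impact**:
+1. After the user approves a tool, the agent never produces a follow-up reply
+2. The whole agent loop breaks at the tool-approval step
+3. The user experience is incomplete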
+
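The five missing steps in issue 5 can be sketched as a single function. Everything here is an assumption for illustration: `resume_after_tool_approval` and the injected `call_llm` callable are hypothetical, and the `role: "tool"` message shape follows the common OpenAI-style chat format rather than anything confirmed by this codebase.

```python
from typing import Any, Callable

Message = dict[str, Any]


def resume_after_tool_approval(
    history: list[Message],
    tool_call_id: str,
    tool_result: str,
    call_llm: Callable[[list[Message]], str],
) -> tuple[list[Message], str]:
    """Append the tool result, call the LLM again, and return messages plus status."""
    # Steps 1-2: receive the tool result and store it as a message in the context.
    messages = [*history, {"role": "tool", "tool_call_id": tool_call_id, "content": tool_result}]
    # Steps 3-4: call the LLM again with the tool result to get the final reply.
    reply = call_llm(messages)
    messages.append({"role": "assistant", "content": reply})
    # Step 5: only now is the session considered complete.
    return messages, "COMPLETED"
```

The key difference from the current flow is that COMPLETED is set after the second LLM call, not before it.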
+---
+
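+## 🔴 HIGH - Conversation history and user context architecture error
+
+### 6. Conversation history is maintained by the frontend, violating the backend architecture design
+
+**Files**:
+- `application/run_service.py:89-124`
+- `domain/agui_input.py`
+
+**Problem**:
+
+In the current architecture, **conversation history is maintained and passed entirely by the frontend**:
+
+```
+Frontend → GET /runs/{thread_id}/history → backend returns the historical messages
+Frontend → POST /runs/{thread_id}/run → frontend puts the history into run_input.messages and sends it to the backend
+Backend → reads only the latest user_input from run_input; does not read history from the database
+```
+
+Code evidence (`run_service.py:89-124`):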
+```python
+async def run(self, *, run_input: RunAgentInput):
+    user_input = extract_latest_user_text(run_input)  # takes only the latest user message
+
+    runtime_result = await asyncio.to_thread(
+        runtime.execute,
+        user_input=user_input,  # passes only the latest input
+        system_prompt=system_prompt,
+    )
+```
+
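+**Impact**:
+1. **High security risk**: the frontend can tamper with conversation history and forge context
+2. **Architecture violation**: user context and conversation history should both be maintained by the backend
+3. **Data inconsistency**: the frontend may drop or mishandle historical messages
+4. **No multi-device sync**: different frontend devices may see different histories
+5. **Token waste**: every request carries the full history, inflating the request body
+6. The original plan document clearly specifies that the backend caches conversation history in Redis and falls back to reading from the database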
+
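The cache-then-database strategy named in item 6 can be sketched as a read-through lookup. `load_history` and its three injected callables are hypothetical stand-ins for a Redis client and a repository; no real client API is assumed.

```python
from typing import Callable, Optional

Message = dict[str, str]


def load_history(
    thread_id: str,
    cache_get: Callable[[str], Optional[list[Message]]],
    cache_set: Callable[[str, list[Message]], None],
    db_load: Callable[[str], list[Message]],
) -> list[Message]:
    """Read history from the cache, falling back to the database on a miss."""
    cached = cache_get(thread_id)
    if cached is not None:
        return cached
    # Cache miss: the database is the source of truth; repopulate the cache.
    messages = db_load(thread_id)
    cache_set(thread_id, messages)
    return messages
```

With history loaded server-side by `thread_id`, the request would only need to carry the new user message.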
+---
+
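+## 🟡 MEDIUM - Multimodal input support issues
+
+### 7. Images and other multimodal inputs are not supported
+
+**Files**:
+- `domain/agui_input.py:64-86` - `extract_latest_user_text()`
+- `infrastructure/crewai/runtime.py:121-136` - `_run_stage()`
+- `infrastructure/litellm/client.py`
+
+**Problem**:
+
+The current architecture **supports plain-text input only**; images and other multimodal content are discarded:
+
+Code evidence (`agui_input.py:64-86`):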
+```python
+def extract_latest_user_text(run_input: RunAgentInput) -> str:
+    if isinstance(content, list):
+        for item in content:
+            if getattr(item, "type", None) != "text":
+                continue  # ❌ skips non-text types (images are dropped)
+```
+
+Code evidence (`runtime.py:125`):
+```python
+messages.append({"role": "user", "content": user_content})  # only a str is passed
+```
+
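+**Impact**:
+1. Users cannot send images for multimodal interaction
+2. The multimodal capabilities of the LLM go unused
+3. Scenarios such as "upload an image for the AI to analyze" cannot be implemented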
+
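A sketch of an extractor for issue 7 that keeps image parts instead of skipping them. It assumes content items are plain dicts in the OpenAI-style `text` / `image_url` part format; the real `RunAgentInput` item types may differ, and `extract_user_parts` and `SUPPORTED_PART_TYPES` are hypothetical names.

```python
from typing import Any, Union

# Assumed set of part types the downstream LLM call can accept.
SUPPORTED_PART_TYPES = {"text", "image_url"}


def extract_user_parts(content: Union[str, list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Normalize user content into a list of parts, keeping supported non-text items."""
    if isinstance(content, str):
        # Plain strings become a single text part.
        return [{"type": "text", "text": content}]
    # Copy each supported part; unsupported types are still dropped.
    return [dict(part) for part in content if part.get("type") in SUPPORTED_PART_TYPES]
```

The resulting list would replace the plain `str` passed as `content` in the user message, provided the model and client in use accept list-form content.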
+---
+
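+## 🟡 MEDIUM - Missing speech recognition (ASR)
+
+### 8. Routes for the fun-asr-realtime speech recognition API are not implemented
+
+**Files**:
+- None (feature missing)
+
+**Problem**:
+
+The backend **does not implement speech recognition** and cannot process audio data sent by the frontend:
+
+Current state:
+- `dashscope` is used only for the LLM (qwen3.5-flash, etc.)
+- There is no code related to fun-asr, ASR, audio, or transcribe
+- The v1 routes contain no speech/audio APIs
+
+**Impact**:
+1. Users cannot send voice messages
+2. Real-time voice conversation scenarios cannot be implemented
+3. The frontend has to perform ASR itself, which increases its burden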
+
+---