# Agent Module Review Report - Tool Architecture

**Date**: 2026-03-08
**Scope**: `backend/src/core/agent`
**Status**: Pending evaluation

---

## 🟡 MEDIUM - Tool Architecture Issues

### 1. CrewAI tool module is unused; tools are hardcoded

**Files**:
- `application/run_service.py:406` - `_execute_backend_tool()`
- `infrastructure/crewai/runtime.py` - three-stage flow

**Problem**:

The agent currently uses only CrewAI's **agent/task configuration templates** (YAML) and **does not use CrewAI's tool system**:

```
In use:
├── agents.yaml  (agent role definitions)
└── tasks.yaml   (task definitions)

Not in use:
├── @tool decorator
├── BaseTool class
└── Tools registry
```

**Current implementation**:
```python
# run_service.py:406
async def _execute_backend_tool(self, *, tool_name, tool_args, ...):
    if tool_name != "create_calendar_event":  # hardcoded check
        raise ValueError(f"unsupported backend tool: {tool_name}")
    # the tool is then executed manually...
```

**Impact**:
1. Every new tool requires editing `_execute_backend_tool()`
2. CrewAI's tool selection and result handling cannot be used
3. Integration with CrewAI is shallow, so the framework's strengths go unused
4. Tool descriptions and other prompt information cannot be injected into the agent automatically

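
To illustrate impacts 1 and 4, here is a minimal sketch of name-based dispatch through a registry. It is plain Python rather than the CrewAI API (CrewAI's `@tool` decorator or `BaseTool` class would take the registry's place), and `register_tool`, `execute_backend_tool`, and `tool_prompt_lines` are hypothetical names, not functions from the codebase.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Tuple

ToolHandler = Callable[..., Awaitable[Dict[str, Any]]]

# Hypothetical registry: tool name -> (description, async handler).
_TOOL_REGISTRY: Dict[str, Tuple[str, ToolHandler]] = {}


def register_tool(name: str, description: str) -> Callable[[ToolHandler], ToolHandler]:
    """Register a handler so adding a tool does not touch the dispatcher."""
    def decorator(handler: ToolHandler) -> ToolHandler:
        _TOOL_REGISTRY[name] = (description, handler)
        return handler
    return decorator


@register_tool("create_calendar_event", "Create a calendar event from a title and start time.")
async def create_calendar_event(*, title: str, start_at: str) -> Dict[str, Any]:
    # Placeholder body; the real tool would persist a schedule item.
    return {"status": "ok", "title": title, "start_at": start_at}


async def execute_backend_tool(tool_name: str, tool_args: Dict[str, Any]) -> Dict[str, Any]:
    """Look the tool up by name instead of comparing against one hardcoded string."""
    entry = _TOOL_REGISTRY.get(tool_name)
    if entry is None:
        raise ValueError(f"unsupported backend tool: {tool_name}")
    _description, handler = entry
    return await handler(**tool_args)


def tool_prompt_lines() -> List[str]:
    """Render the registered descriptions for injection into an agent prompt."""
    return [f"- {name}: {desc}" for name, (desc, _handler) in sorted(_TOOL_REGISTRY.items())]


if __name__ == "__main__":
    args = {"title": "Sync", "start_at": "2026-03-08T15:00:00Z"}
    print(asyncio.run(execute_backend_tool("create_calendar_event", args)))
    print("\n".join(tool_prompt_lines()))
```

With this shape, a new tool is one decorated function, and the same registry supplies the descriptions the agent prompt needs.
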
---

## 🟡 MEDIUM - Tool Result Storage Issues

### 2. Storing tool results in object storage is not enabled

**Files**:
- `application/session_state_persistence.py:52` - `persist_tool_result_payload()`
- `models/agent_chat_message.py` - messages table

**Problem**:

`persist_tool_result_payload()` is defined and can upload tool results to object storage (MinIO/Supabase Storage), but **the function is never called**.

Current implementation:
- Tool results are stored directly in the database column `messages.content`
- `metadata_json` defines fields such as `storage_bucket` and `storage_path`, but they are always `None`

```python
# message_metadata.py:17-27
class MessageMetadataToolResult(BaseModel):
    storage_bucket: str | None = None  # currently unused
    storage_path: str | None = None  # currently unused
    payload_sha256: str | None = None  # currently unused
```

**Impact**:
1. Tool results (especially large payloads such as UI components) are kept in the database, adding load to it
2. The storage interface is defined but never used, leaving dead code
3. The CDN acceleration and bandwidth benefits of object storage are unavailable

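
One way the unused metadata fields could be populated is sketched below: small payloads stay inline, large ones go to object storage and are referenced by bucket, path, and SHA-256. Everything here is an assumption for illustration: the size threshold, the `tool-results/...` path layout, and the `InMemoryObjectStore` stand-in are not taken from the codebase, and the signature of the real `persist_tool_result_payload()` is not shown in this report.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

INLINE_LIMIT_BYTES = 1024  # assumed threshold, not a value from the codebase


@dataclass
class StoredToolResult:
    content: Optional[str]  # inline content for messages.content, if small enough
    storage_bucket: Optional[str] = None
    storage_path: Optional[str] = None
    payload_sha256: Optional[str] = None


class InMemoryObjectStore:
    """Stand-in for MinIO/Supabase Storage so the sketch runs without a server."""

    def __init__(self) -> None:
        self.objects: Dict[Tuple[str, str], bytes] = {}

    def put(self, bucket: str, path: str, data: bytes) -> None:
        self.objects[(bucket, path)] = data


def persist_tool_result(
    store: InMemoryObjectStore, *, bucket: str, message_id: str, payload: Dict[str, Any]
) -> StoredToolResult:
    """Keep small payloads inline; offload large ones and record where they went."""
    data = json.dumps(payload, sort_keys=True, ensure_ascii=False).encode("utf-8")
    if len(data) <= INLINE_LIMIT_BYTES:
        return StoredToolResult(content=data.decode("utf-8"))
    digest = hashlib.sha256(data).hexdigest()
    path = f"tool-results/{message_id}/{digest}.json"  # hypothetical path layout
    store.put(bucket, path, data)
    return StoredToolResult(
        content=None, storage_bucket=bucket, storage_path=path, payload_sha256=digest
    )
```

Recording the digest alongside the path lets a reader verify the downloaded object against the metadata row.
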
---

## 🟡 MEDIUM - Tool Output Format Issues

### 3. Tool output is not a UI Schema, so the frontend cannot render it directly

**Files**:
- `application/run_service.py:456-479` - `_execute_backend_tool()`

**Problem**:

The `create_calendar_event` tool currently returns only a **bare status object**, not a UI Schema the frontend can render:

```python
# run_service.py:456-479
event_id = str(schedule_item.id)
ui_card = {
    "type": "calendar_card.v1",
    "version": "v1",
    "data": {...},
    "actions": [...],
}
# ui_card is built but never returned as the tool result
return {"status": "ok", "event_id": event_id}  # only a minimal structure is returned
```

**Current output**:
```json
{
  "status": "ok",
  "event_id": "xxx"
}
```

**Expected output** (UI Schema):
```json
{
  "type": "calendar_card.v1",
  "version": "v1",
  "data": {
    "id": "xxx",
    "title": "会议",
    "startAt": "2026-03-08T15:00:00Z",
    ...
  },
  "actions": [
    {"type": "link", "label": "查看详情", "target": "/calendar/events/xxx"}
  ]
}
```

**Impact**:
1. The frontend cannot render rich UI components directly
2. The frontend has to interpret the raw result and build the UI by hand, which adds frontend work
3. The `ui_schema` capability of the AG-UI protocol goes unused

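
A minimal sketch of returning the card with the tool result instead of discarding it. The card fields mirror the expected output above; the helper names and the choice to nest the card under a `ui_schema` key are assumptions, since the report does not specify how the result envelope should carry it.

```python
from typing import Any, Dict


def build_calendar_card(*, event_id: str, title: str, start_at: str) -> Dict[str, Any]:
    """Build a calendar_card.v1 payload shaped like the expected output."""
    return {
        "type": "calendar_card.v1",
        "version": "v1",
        "data": {"id": event_id, "title": title, "startAt": start_at},
        "actions": [
            {"type": "link", "label": "View details", "target": f"/calendar/events/{event_id}"}
        ],
    }


def tool_result_with_ui(*, event_id: str, title: str, start_at: str) -> Dict[str, Any]:
    """Return status and the renderable card together (the envelope shape is assumed)."""
    return {
        "status": "ok",
        "event_id": event_id,
        "ui_schema": build_calendar_card(event_id=event_id, title=title, start_at=start_at),
    }
```

Keeping `status` and `event_id` in the envelope means existing consumers of the minimal result would keep working while the frontend picks up the card.
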
---

## 🟡 MEDIUM - Stage Configuration Issues

### 4. Three-stage flow parameters are hardcoded; stages cannot be given different strategies

**Files**:
- `infrastructure/crewai/runtime.py:190-277` - `CrewAIRuntime.execute()`

**Problem**:

The three-stage flow (intent → execution → organization) is hardcoded in `run_agent_task()`, so per-stage parameters, such as the tools each stage is allowed to use, cannot be configured:

```python
# runtime.py:203-277
# intent stage
intent_text, intent_usage = _run_stage(
    litellm_model=litellm_model,
    api_key=...,
    llm_config=self._llm_config,  # same config for every stage
    stage="intent",
    ...
)

# execution stage (if any)
execution_text, execution_usage = _run_stage(
    litellm_model=litellm_model,
    api_key=...,
    llm_config=self._llm_config,  # same config for every stage
    stage="execution",
    ...
)

# organization stage
organization_text, organization_usage = _run_stage(
    litellm_model=litellm_model,
    api_key=...,
    llm_config=self._llm_config,  # same config for every stage
    stage="organization",
    ...
)
```

**Current limitations**:
1. The intent stage cannot be given a read-only LLM (one that is not allowed to call tools)

**Impact**:
1. LLM behavior cannot be controlled precisely per stage
2. The intent-recognition stage may trigger tool calls by mistake
3. Unnecessary LLM call cost
4. Reduced architectural flexibility

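
Per-stage configuration could look roughly like the sketch below, where each stage declares which tools it may see and the intent stage declares none. `StageConfig`, `STAGES`, and `run_pipeline` are hypothetical names; the real `_run_stage()` signature is only partially shown above, so a caller-supplied `run_stage` callable stands in for it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class StageConfig:
    """Hypothetical per-stage settings; only the tool whitelist is modelled here."""
    name: str
    allowed_tools: FrozenSet[str] = frozenset()


# The intent and organization stages get no tools, so they cannot trigger tool calls.
STAGES = (
    StageConfig("intent"),
    StageConfig("execution", allowed_tools=frozenset({"create_calendar_event"})),
    StageConfig("organization"),
)


def tools_for_stage(stage: StageConfig, available: List[str]) -> List[str]:
    """Filter the available tools down to what this stage is allowed to use."""
    return [tool for tool in available if tool in stage.allowed_tools]


def run_pipeline(
    run_stage: Callable[[StageConfig, List[str]], object], available_tools: List[str]
) -> Dict[str, object]:
    """Run every stage in order, passing each one only its own tool subset."""
    return {
        stage.name: run_stage(stage, tools_for_stage(stage, available_tools))
        for stage in STAGES
    }
```

The same dataclass could later carry other per-stage settings (model, temperature, token limits) without changing the loop.
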
---

## 🔴 HIGH - Broken Agent Loop

### 5. Agent loop does not continue after tool approval

**Files**:
- `application/resume_service.py:121-158`

**Problem**:

After the frontend approves a tool call, the backend returns the tool result but **does not continue the agent loop**; it marks the session as COMPLETED and stops.

Current flow:
```python
# resume_service.py:121-127
snapshot = self._state_persistence.build_completed_snapshot()
await session_repository.update_runtime_state(
    chat_session=chat_session,
    status=AgentChatSessionStatus.COMPLETED,  # completed immediately
    state_snapshot=snapshot,
    ...
)
```

Missing flow:
```
1. Receive the tool result
2. Store the tool result in the context as a message
3. Call the LLM again (with the tool result)
4. Generate the final reply
5. Mark the session as COMPLETED
```

**Impact**:
1. After the user approves a tool, the agent produces no follow-up reply
2. The whole agent loop breaks after tool approval
3. The user experience is incomplete

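
The five missing steps can be sketched as a single resume function. This is an illustration only: `resume_after_tool`, `call_llm`, and `mark_completed` are hypothetical names, and the `role`/`tool_call_id` message shape follows the common OpenAI-style chat format rather than anything confirmed in `resume_service.py`.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def resume_after_tool(
    messages: List[Message],
    *,
    tool_call_id: str,
    tool_result: str,
    call_llm: Callable[[List[Message]], str],
    mark_completed: Callable[[], None],
) -> str:
    """Continue the loop after approval instead of completing immediately."""
    # Steps 1-2: receive the tool result and append it to the context.
    messages.append({"role": "tool", "tool_call_id": tool_call_id, "content": tool_result})
    # Steps 3-4: call the LLM again with the tool result and record its reply.
    reply = call_llm(messages)
    messages.append({"role": "assistant", "content": reply})
    # Step 5: only now mark the session as completed.
    mark_completed()
    return reply
```

A full loop would also check whether the second LLM call requests another tool and iterate; the sketch covers only the single-tool case described above.
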
---

## 🔴 HIGH - Incorrect Architecture for Conversation History and User Context

### 6. Conversation history is maintained by the frontend, violating the backend architecture design

**Files**:
- `application/run_service.py:89-124`
- `domain/agui_input.py`

**Problem**:

In the current architecture, **conversation history is maintained and passed entirely by the frontend**:

```
Frontend → GET /runs/{thread_id}/history → backend returns the historical messages
Frontend → POST /runs/{thread_id}/run → frontend puts the history into run_input.messages and sends it to the backend
Backend → reads only the latest user_input from run_input; does not read history from the database
```

Code evidence (`run_service.py:89-124`):
```python
async def run(self, *, run_input: RunAgentInput):
    user_input = extract_latest_user_text(run_input)  # only the latest user message

    runtime_result = await asyncio.to_thread(
        runtime.execute,
        user_input=user_input,  # only the latest input is passed
        system_prompt=system_prompt,
    )
```

**Impact**:
1. **High security risk**: the frontend can tamper with conversation history and forge context
2. **Architecture violation**: both user context and conversation history should be maintained by the backend
3. **Data inconsistency**: the frontend may drop or mishandle historical messages
4. **No multi-device sync**: different frontend devices may see different histories
5. **Request overhead**: every request carries the full history, inflating the request body
6. The original plan document clearly specifies that the backend caches conversation history in Redis, with a fallback to reading from the database

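
The cache-with-fallback strategy named in impact 6 can be sketched as follows. A plain dict stands in for Redis so the example runs without a server, and `HistoryStore`, `load_from_db`, and `build_llm_messages` are hypothetical names rather than parts of the codebase.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


class HistoryStore:
    """Server-side history: read from the cache, fall back to the database on a miss."""

    def __init__(self, load_from_db: Callable[[str], List[Message]]) -> None:
        self._cache: Dict[str, List[Message]] = {}  # stand-in for Redis
        self._load_from_db = load_from_db

    def get_history(self, thread_id: str) -> List[Message]:
        cached = self._cache.get(thread_id)
        if cached is None:  # cache miss: load from the database and warm the cache
            cached = list(self._load_from_db(thread_id))
            self._cache[thread_id] = cached
        return list(cached)  # return a copy so callers cannot mutate the cache

    def append(self, thread_id: str, message: Message) -> None:
        # Only the cache is updated here; a real version would also write to the database.
        self.get_history(thread_id)
        self._cache[thread_id].append(message)


def build_llm_messages(store: HistoryStore, thread_id: str, latest_user_text: str) -> List[Message]:
    """Combine trusted server-side history with the new user text only."""
    return store.get_history(thread_id) + [{"role": "user", "content": latest_user_text}]
```

Because the request contributes only the latest user text, a client cannot forge earlier turns, and every device reads the same history.
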
---

## 🟡 MEDIUM - Multimodal Input Support Issues

### 7. Images and other multimodal inputs are not supported

**Files**:
- `domain/agui_input.py:64-86` - `extract_latest_user_text()`
- `infrastructure/crewai/runtime.py:121-136` - `_run_stage()`
- `infrastructure/litellm/client.py`

**Problem**:

The current architecture **supports plain-text input only**; images and other multimodal content are discarded.

Code evidence (`agui_input.py:64-86`):
```python
def extract_latest_user_text(run_input: RunAgentInput) -> str:
    if isinstance(content, list):
        for item in content:
            if getattr(item, "type", None) != "text":
                continue  # ❌ non-text types are skipped (images are dropped)
```

Code evidence (`runtime.py:125`):
```python
messages.append({"role": "user", "content": user_content})  # only a str is passed
```

**Impact**:
1. Users cannot send images for multimodal interaction
2. The multimodal capabilities of the LLM go unused
3. Scenarios such as "upload an image and let the AI analyze it" cannot be implemented

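
A rough sketch of preserving image parts instead of skipping them. The output uses the OpenAI-style content-part format (`text` and `image_url` parts). The input item shape (`type`, `text`, `url` attributes and the `"image"` type name) is an assumption, because the report does not show the actual AG-UI content model, and `to_llm_content` is a hypothetical helper.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Union


def to_llm_content(content: Union[str, List[Any]]) -> Union[str, List[Dict[str, Any]]]:
    """Convert user content to chat content parts, keeping images alongside text."""
    if isinstance(content, str):
        return content
    parts: List[Dict[str, Any]] = []
    for item in content:
        item_type = getattr(item, "type", None)
        if item_type == "text":
            parts.append({"type": "text", "text": item.text})
        elif item_type == "image":  # assumed type name for image items
            parts.append({"type": "image_url", "image_url": {"url": item.url}})
        # Any other type is still skipped in this sketch.
    return parts


if __name__ == "__main__":
    items = [
        SimpleNamespace(type="text", text="What is in this picture?"),
        SimpleNamespace(type="image", url="https://example.com/cat.png"),
    ]
    print(to_llm_content(items))
```

The message built in `_run_stage()` would then carry this list as `content` instead of a bare string, provided the selected model accepts image parts.
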
---

## 🟡 MEDIUM - Missing Speech Recognition (ASR)

### 8. Routes for the fun-asr-realtime speech recognition API are not implemented

**Files**:
- None (feature missing)

**Problem**:

The backend **does not implement speech recognition** and cannot process audio data sent by the frontend.

Current state:
- `dashscope` is used only for the LLM (qwen3.5-flash, etc.)
- There is no code related to fun-asr, ASR, audio, or transcribe
- The v1 routes contain no speech/audio APIs

**Impact**:
1. Users cannot send voice messages
2. Real-time voice conversation scenarios cannot be implemented
3. The frontend has to perform ASR itself, which increases its burden

---