# Single CLI + Progressive Skill Disclosure Redesign PRD

## 1. Goal

This task redesigns the current agent tool protocol around the following confirmed product constraints:

1. The runtime should continue to expose exactly one business tool to the worker agent: `project_cli`.
2. The worker should learn how to use the tool through progressive skill disclosure instead of receiving a large global tool surface up front.
3. The current `command + subcommand + args` transport should be replaced with a business-action protocol that matches real product objects and user intents.
4. The redesign must remain grounded in the current repository's actual schedule domain:
   - `schedule_items`
   - `schedule_subscriptions`
5. The redesign must reduce wasted retries and token consumption without reintroducing the old multi-tool schema explosion.

This PRD does not propose a broad agent-platform rewrite. It is a focused redesign of how the single CLI tool, skills, router output, and worker execution contract should work together.

## 2. Confirmed Repository Facts

### 2.1 Router is not ReAct

The router is a direct structured generation stage, not a ReAct loop.

Confirmed in:

- `backend/src/core/agentscope/runtime/runner.py:310`
- `backend/src/core/agentscope/runtime/runner.py:325`

`_run_router_stage()` uses `finalize_json_response(...)` and returns one `RouterAgentOutput` payload.

Implication:

- Router cost control depends on prompt/schema size and retries in `finalize_json_response`, not `max_iters`.
- Tool-choice ambiguity is a worker problem, not a router ReAct problem.

### 2.2 Worker is the only ReAct loop

The worker uses `JsonReActAgent`, which subclasses AgentScope `ReActAgent`.

Confirmed in:

- `backend/src/core/agentscope/runtime/json_react_agent.py`
- `backend/src/core/agentscope/runtime/runner.py:495`

The current code does not pass an explicit `max_iters`, so the worker inherits AgentScope's default.

Confirmed externally in the local environment by inspecting the installed `ReActAgent.__init__` signature:

- `max_iters=10`

Implication:

- The worker currently has too much room to repeat invalid tool calls before failing.
- This task will explicitly set worker `max_iters=7`.

### 2.3 Worker does not consume context_messages

The worker receives only the router contract message and not the original `context_messages` list.

Confirmed in:

- `backend/src/core/agentscope/runtime/runner.py:265`
- `backend/src/core/agentscope/runtime/runner.py:285`
- `backend/src/core/agentscope/runtime/runner.py:461`

Implication:

- `worker.config.context_messages` is currently semantically misleading.
- Router history context remains important.
- Worker runtime context should come from router output, system prompt, tool results, and optional memory, not duplicated chat history configuration.

### 2.4 Latest failure was caused by protocol mismatch, not missing data

Latest messages read from Supabase showed the following failure pattern:

- Worker repeatedly called `project_cli`
- Payload shape: `command=calendar`, `subcommand=read`, `args={"event_id": "..."}`
- Backend returned `INVALID_ARGUMENT: start_at and end_at are required`
- The same invalid call repeated until the worker exhausted the default ReAct limit

This proves:

1. The worker knew the event identifier.
2. The current CLI protocol did not expose a clear "get one event by id" action.
3. The current naming (`read`) encouraged the worker to map both range listing and single-event detail lookup onto one ambiguous command.

### 2.5 The current calendar domain is already split into two real business objects

Database evidence:

- `public.schedule_items`
- `public.schedule_subscriptions`

Current schema highlights:

`schedule_items`

- `id`
- `owner_id`
- `title`
- `description`
- `start_at`
- `end_at`
- `timezone`
- `metadata`
- `recurrence_rule`
- `source_type`
- `status`

`schedule_subscriptions`

- `item_id`
- `subscriber_id`
- `permission`
- `notify_level`
- `status`

Current backend routes and services already reflect this split:

- list events by range
- get event by id
- create event
- update event
- delete event
- share/invite event
- accept subscription
- reject subscription

Confirmed in:

- `backend/src/v1/schedule_items/router.py`
- `backend/src/v1/schedule_items/service.py`

### 2.6 Frontend already distinguishes list vs detail vs invite flows

Confirmed in:

- `apps/lib/features/calendar/data/apis/calendar_api.dart`
- `apps/lib/features/calendar/data/repositories/calendar_repository.dart`

The frontend already calls:

- `GET /schedule-items?start_at&end_at`
- `GET /schedule-items/{id}`
- `POST /schedule-items/{id}/share`
- `POST /schedule-items/{id}/accept`
- `POST /schedule-items/{id}/reject`

Implication:

- The product itself already separates these business operations.
- The ambiguity exists in the agent CLI input contract, not in the underlying app/domain design.

## 3. Problem Statement

The current one-tool design has the right high-level direction but the wrong action protocol.

### 3.1 What was correct in the previous refactor

The following direction remains valid and should be preserved:

1. One AgentScope tool entry (`project_cli`) is preferable to many domain tools for token control.
2. AgentScope skills should be the mechanism for teaching the model when and how to use the tool.
3. Tool outputs should remain structured and machine-oriented.
4. AG-UI/UI-schema compilation should remain backend-owned.
5. The worker should not receive all tool knowledge eagerly.

### 3.2 What is no longer acceptable

The following parts of the previous CLI protocol should be replaced:

1. `command + subcommand + args` as the model-facing protocol.
2. Ambiguous action names such as `read` that cover more than one business intent.
3. Loose `args: dict[str, Any]` semantics that encourage field guessing.
4. Legacy alias drift such as `start_time/end_time`, `event_timezone`, and other migration leftovers.
5. Runtime dependence on long prose skill files instead of short execution-oriented action cards.

### 3.3 Why the old CLI shape fails even though the single-tool strategy is good

The current single-tool protocol is too generic for a small model.

The worker must infer, from weak labels like `read`, all of the following at once:

1. Which business object is involved.
2. Whether the user wants a list or one detail record.
3. Which fields are mandatory for that specific subcommand.
4. Which field names are canonical.

This moves too much burden from the runtime protocol into model guesswork.

The result is not just correctness risk. It also increases token cost because the worker burns iterations learning through failure.

## 4. Design Principles

### 4.1 Keep exactly one tool

The worker should continue to see one executable tool:

- `project_cli`

Reason:

- avoids multi-tool selection overhead
- avoids injecting many tool schemas into every model call
- preserves a stable tool surface for worker prompting

### 4.2 Move model-facing semantics from CLI history to business actions

The model-facing protocol should describe business intent directly, not technical command-tree history.

Replace:

```json
{
  "command": "calendar",
  "subcommand": "read",
  "args": {}
}
```

With:

```json
{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "event",
    "event_id": "<uuid>"
  }
}
```

This preserves one tool while making the business contract explicit.

### 4.3 Use progressive disclosure for skill knowledge, not for raw global schema exposure

The worker should not receive all method definitions by default.

Instead:

1. Read a short skill index first.
2. Read the relevant method card only when necessary.
3. Call `project_cli` with the chosen `module/method/input` payload.

This keeps the token budget focused on the current business scenario.

### 4.4 Server-side validation stays strict even if the tool schema stays thin

To avoid a large tool schema, `project_cli` may expose only a thin outer schema:

- `module`
- `method`
- `input`

Strict validation then happens server-side by dispatching `module + method` to the corresponding Pydantic model.

For calendar reads, the input must use strongly typed domain values at the schema boundary:

- day reads: `date`
- range reads: timezone-aware `datetime`
- single-event reads: `UUID`

The transport remains JSON, but the backend contract must validate these as typed values immediately instead of accepting arbitrary strings and reparsing them later.

This preserves strictness without forcing the entire action matrix into the model context.

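
The typed boundary described above can be illustrated with a small stdlib-only sketch. It is not the backend's Pydantic code: the class names `DayRead`, `RangeRead`, `EventRead` and the function `parse_calendar_read` are hypothetical, and the sketch assumes every field shown in this PRD's read examples is required and arrives as a JSON string.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Any, Dict, Set, Union
from uuid import UUID


@dataclass(frozen=True)
class DayRead:
    date: date
    timezone: str


@dataclass(frozen=True)
class RangeRead:
    start_at: datetime
    end_at: datetime


@dataclass(frozen=True)
class EventRead:
    event_id: UUID


ReadInput = Union[DayRead, RangeRead, EventRead]


def _require_keys(raw: Dict[str, Any], expected: Set[str]) -> None:
    """Reject both missing and unexpected keys (similar in spirit to extra=forbid)."""
    if set(raw) != expected:
        raise ValueError(f"expected exactly {sorted(expected)}, got {sorted(raw)}")


def _aware(value: str) -> datetime:
    """Parse an ISO 8601 string and insist on an explicit UTC offset."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None or parsed.utcoffset() is None:
        raise ValueError("datetime must be timezone-aware")
    return parsed


def parse_calendar_read(raw: Dict[str, Any]) -> ReadInput:
    """Turn the JSON `input` of calendar/read into typed values, keyed on `mode`."""
    mode = raw.get("mode")
    if mode == "day":
        _require_keys(raw, {"mode", "date", "timezone"})
        return DayRead(date.fromisoformat(raw["date"]), raw["timezone"])
    if mode == "range":
        _require_keys(raw, {"mode", "start_at", "end_at"})
        return RangeRead(_aware(raw["start_at"]), _aware(raw["end_at"]))
    if mode == "event":
        _require_keys(raw, {"mode", "event_id"})
        return EventRead(UUID(raw["event_id"]))
    raise ValueError(f"unknown read mode: {mode!r}")
```

With this shape, a payload such as `{"mode": "event", "start_at": "..."}` fails at the boundary with a key mismatch instead of being reinterpreted as a range query.
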
### 4.5 No broad backward-compatibility layer

This redesign should not preserve old field aliases or broad coercion behavior.

Specifically, the phased implementation should remove or reject:

- `args` as JSON string
- `start_time/end_time`
- `event_timezone`
- implicit action overloading under `read` (a `read` call without an explicit `mode`)

The system should fail clearly and structurally instead of guessing.

## 5. Target Architecture

## 5.1 Runtime responsibilities

### Router

The router remains a direct structured output stage.

It should continue to decide:

- the objective
- whether tool evidence is required

It should be extended to optionally provide stronger execution hints:

- `selected_skill`
- `intended_action`
- `known_entities`
- `known_time_range`
- `missing_fields`

These fields are not there to make the router execute tools. They are there to reduce worker exploration cost.

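
One possible shape for these optional hints is sketched below. Only the five field names come from this PRD; the class name `RouterExecutionHints`, the field types, and the `ready_for_tool_call` helper are assumptions made for illustration, not the actual `RouterAgentOutput` schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RouterExecutionHints:
    """Optional router hints; every field may be absent. Types are assumed."""

    selected_skill: Optional[str] = None
    intended_action: Optional[str] = None
    known_entities: Dict[str, str] = field(default_factory=dict)
    known_time_range: Optional[Tuple[str, str]] = None
    missing_fields: List[str] = field(default_factory=list)

    def ready_for_tool_call(self) -> bool:
        """Hypothetical helper: True when a skill and action are named and nothing is missing."""
        has_target = bool(self.selected_skill and self.intended_action)
        return has_target and not self.missing_fields
```

A worker prompt builder could use a check like `ready_for_tool_call()` to decide whether to skip the skill index and go straight to one action card.
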
### Worker

The worker remains the only ReAct stage.

Worker changes in this redesign:

1. Explicitly set `max_iters=7`.
2. Keep `temperature` unchanged.
3. Stop pretending the worker consumes `context_messages` configuration.
4. Prefer router execution hints before reading additional skill files.
5. Read the smallest relevant skill file possible before tool use.

### Tool

The worker still sees only one business tool plus the skill reader:

- `project_cli`
- `view_skill_file`

`project_cli` is the execution boundary.
`view_skill_file` is the progressive-disclosure knowledge boundary.

## 5.2 New `project_cli` model-facing input contract

The canonical model-facing payload is:

```json
{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "event",
    "event_id": "<uuid>"
  }
}
```

Field meanings:

- `module`: enabled business module namespace (calendar, contacts, memory)
- `method`: concrete business operation inside the module
- `input`: strict method-specific payload

This is still one tool call. The worker is not choosing among many tools.

## 5.3 Calendar method protocol

The calendar module exposes the following methods registered in the CLI router:

| Module   | Method        | Handler                             | Input Shape                    |
|----------|---------------|-------------------------------------|--------------------------------|
| calendar | read          | `handle_calendar_list_day`          | discriminated by `mode`        |
| calendar | create        | `handle_calendar_create_event`      | title, start_at, timezone, ... |
| calendar | update        | `handle_calendar_update_event`      | event_id + patch               |
| calendar | delete        | `handle_calendar_delete_event`      | event_id                       |
| calendar | share         | `handle_calendar_invite_subscriber` | event_id, invitee, permissions |
| calendar | accept_invite | `handle_calendar_accept_invite`     | event_id                       |
| calendar | reject_invite | `handle_calendar_reject_invite`     | event_id                       |

The `read` method uses a discriminated union on the `mode` field to dispatch to list_day, list_range, or get_event internally.

Because `mode` is explicit, the worker no longer has to guess whether the single `read` label means listing events or fetching one event by ID.

## 5.4 Canonical calendar method shapes

### `read` with mode=day (list one day)

```json
{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "day",
    "date": "2026-04-23",
    "timezone": "Asia/Shanghai"
  }
}
```

### `read` with mode=range (list time range)

```json
{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "range",
    "start_at": "2026-04-23T00:00:00+08:00",
    "end_at": "2026-04-24T00:00:00+08:00"
  }
}
```

### `read` with mode=event (get by ID)

```json
{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "event",
    "event_id": "<uuid>"
  }
}
```

### `create`

```json
{
  "module": "calendar",
  "method": "create",
  "input": {
    "title": "Project sync",
    "start_at": "2026-04-23T16:00:00+08:00",
    "end_at": "2026-04-23T17:00:00+08:00",
    "timezone": "Asia/Shanghai",
    "description": "optional",
    "metadata": {
      "location": "optional",
      "reminder_minutes": 30,
      "color": "blue",
      "notes": "optional"
    }
  }
}
```

### `update`

```json
{
  "module": "calendar",
  "method": "update",
  "input": {
    "event_id": "<uuid>",
    "patch": {
      "title": "Updated title",
      "start_at": "2026-04-23T18:00:00+08:00",
      "timezone": "Asia/Shanghai",
      "status": "archived"
    }
  }
}
```

### `delete`

```json
{
  "module": "calendar",
  "method": "delete",
  "input": {
    "event_id": "<uuid>"
  }
}
```

### `share`

```json
{
  "module": "calendar",
  "method": "share",
  "input": {
    "event_id": "<uuid>",
    "invitee": {
      "phone": "+8613812345678"
    },
    "permissions": {
      "view": true,
      "edit": false,
      "invite": false
    }
  }
}
```

### `accept_invite`

```json
{
  "module": "calendar",
  "method": "accept_invite",
  "input": {
    "event_id": "<uuid>"
  }
}
```

### `reject_invite`

```json
{
  "module": "calendar",
  "method": "reject_invite",
  "input": {
    "event_id": "<uuid>"
  }
}
```

## 5.5 Skill packaging for progressive disclosure

The calendar skill should no longer be one long explanatory page that the worker must read in full.

Recommended structure:

```text
calendar/
  SKILL.md                # very short index / navigation card
  actions/
    list_day.md
    list_range.md
    get_event.md
    create_event.md
    update_event.md
    delete_event.md
    invite_subscriber.md
    accept_invite.md
    reject_invite.md
```

### `SKILL.md` responsibilities

- describe when the calendar skill is relevant
- list all actions in one screen
- say which action to use for a known `event_id`
- say which action to use for date/range queries
- point to action files for exact payloads

### Action file responsibilities

Each action file should contain only:

1. when to use the action
2. required fields
3. optional fields
4. one canonical example
5. forbidden field names and common mistakes

This makes `view_skill_file` a real progressive-disclosure mechanism instead of a markdown dump.

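
To make the "read one card only" idea concrete, the sketch below maps a chosen method (and, for `read`, a mode) to a single action file from the recommended structure. The function `action_card_path` and both lookup tables are hypothetical. The pairing of `share` with `invite_subscriber.md` is an assumption inferred from the handler table in §5.3.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed mapping from read modes to the action cards listed in the tree above.
_READ_MODE_CARDS = {"day": "list_day", "range": "list_range", "event": "get_event"}

# Assumed mapping from non-read methods to action cards.
_METHOD_CARDS = {
    "create": "create_event",
    "update": "update_event",
    "delete": "delete_event",
    "share": "invite_subscriber",
    "accept_invite": "accept_invite",
    "reject_invite": "reject_invite",
}


def action_card_path(method: str, mode: Optional[str] = None) -> PurePosixPath:
    """Return the one action card worth reading; raise KeyError for unknown input."""
    if method == "read":
        card = _READ_MODE_CARDS[mode]
    else:
        card = _METHOD_CARDS[method]
    return PurePosixPath("calendar") / "actions" / f"{card}.md"
```

For the failure case in §2.4, the worker would resolve `action_card_path("read", "event")` and read only `calendar/actions/get_event.md` before calling the tool.
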
## 5.6 Error contract for self-correction

The redesigned CLI returns structured validation feedback with field-level detail.

Canonical error example:

```json
{
  "ok": false,
  "module": "calendar",
  "method": "read",
  "error": {
    "code": "INVALID_ACTION_INPUT",
    "message": "input does not match method schema",
    "retryable": false,
    "details": {
      "missing_fields": ["start_at", "end_at"],
      "invalid_fields": [],
      "alias_corrections": {
        "start_time": "start_at"
      }
    }
  }
}
```

This is intentionally more corrective than the current generic `INVALID_ARGUMENT` payload.

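
As a rough sketch, an envelope like the canonical example could be assembled as follows. The function `validation_error` and the `LEGACY_ALIASES` table are hypothetical. The `end_time` entry is an assumption by analogy with the `start_time` hint shown above, and value-level failures (`invalid_fields`) are not modeled.

```python
from typing import Any, Dict, Iterable, List

# Legacy names mapped to canonical ones; used only for hints, never for coercion.
LEGACY_ALIASES = {"start_time": "start_at", "end_time": "end_at"}


def validation_error(
    module: str,
    method: str,
    payload: Dict[str, Any],
    required: Iterable[str],
) -> Dict[str, Any]:
    """Build a correction-oriented error envelope for a rejected `input` payload."""
    missing: List[str] = [name for name in required if name not in payload]
    corrections = {old: new for old, new in LEGACY_ALIASES.items() if old in payload}
    return {
        "ok": False,
        "module": module,
        "method": method,
        "error": {
            "code": "INVALID_ACTION_INPUT",
            "message": "input does not match method schema",
            "retryable": False,
            "details": {
                "missing_fields": missing,
                "invalid_fields": [],  # value-level failures are out of scope here
                "alias_corrections": corrections,
            },
        },
    }
```

The alias table only produces a hint for the worker. The payload is still rejected, which keeps this consistent with the no-coercion rule in §4.5.
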
## 6. Token and Cost Control Strategy

### 6.1 Preserve single-tool economy

The main token-saving choice is to preserve one executable business tool.

This avoids:

- multiple tool schemas in each worker call
- model confusion over which tool to pick first
- large repeated tool descriptions in every turn

### 6.2 Replace global knowledge with scoped reading

The worker should read:

1. router execution hints first
2. skill index second
3. one action card if needed

This is cheaper than injecting the entire action matrix into every prompt.

### 6.3 Stop spending iterations on protocol discovery

The redesign reduces cost not by suppressing useful reasoning, but by removing the need for repeated failed exploration.

The worker should no longer need multiple failed attempts to discover:

- whether `event_id` belongs to `read`
- whether `start_time` is valid
- whether `event_timezone` is accepted

### 6.4 Concrete worker settings for this redesign

- set worker `max_iters=7`
- keep worker `temperature` unchanged
- remove/ignore worker `context_messages` configuration in runtime semantics

### 6.5 Explicit non-goals in this task

This task does not include:

- changing router into a ReAct stage
- lowering worker temperature
- adding duplicate-failure circuit breakers yet
- exposing many separate AgentScope tools again

## 7. Migration Plan

### Phase 0: Planning and protocol design

1. Write this PRD and implementation checklist.
2. Update protocol docs before runtime code changes.
3. Record rejected alternatives and reasoning.

### Phase 1: Backend runtime contract

1. Extend router output schema with optional execution hints.
2. Explicitly set worker `max_iters=7`.
3. Remove semantic reliance on worker `context_messages`.
4. Redesign `project_cli` request payload as `module/method/input`.

### Phase 2: Calendar action dispatch

1. Replace current calendar command/subcommand routing with `module/method` dispatch.
2. Implement strict method-specific Pydantic models.
3. Remove legacy alias handling and generic dict coercion.
4. Return structured correction-oriented validation errors.

### Phase 3: Skill refactor

1. Rewrite `calendar/SKILL.md` as a short index card.
2. Add per-action action-card files.
3. Update skill instructions so the worker reads only what is needed.

### Phase 4: Cross-layer alignment

1. Update relevant protocol docs.
2. Keep frontend consumption stable where possible.
3. Ensure tool result and AG-UI event semantics remain compatible.

### Phase 5: Verification

1. Reproduce the previous failure case and confirm it routes to `read` with `mode: "event"` (the internal `get_event` path).
2. Verify create-event flow uses canonical names only.
3. Verify range/day queries still work.
4. Verify invite/accept/reject flows map to current schedule subscription behavior.

## 8. Rejected Alternatives

### 8.1 Rejected: split back into many tools

Reason:

- reintroduces tool-schema bloat
- worsens tool-choice ambiguity
- increases token overhead on every worker step

### 8.2 Rejected: keep `command/subcommand/args` and fix only the skill text

Reason:

- the ambiguity is structural, not editorial
- `read` still overloads distinct business operations
- loose dict input still encourages field guessing

### 8.3 Rejected: put the full action schema into the tool prompt directly

Reason:

- defeats progressive disclosure
- grows the worker prompt on every turn
- hurts cost and small-model reliability

## 9. Success Criteria

This redesign is successful only if all of the following are true:

1. The worker still sees one executable business tool.
2. The worker chooses calendar actions through business semantics, not command-tree guesswork.
3. The previous repeated-failure case becomes a direct `read` call with `mode: "event"` when `event_id` is known.
4. The worker no longer relies on undocumented field aliases.
5. The runtime protocol is strictly validated server-side.
6. Skill reading is incremental and action-scoped.
7. Worker iteration cost is bounded by `max_iters=7`.
8. Backend, protocol docs, and frontend assumptions remain aligned.