- 所有 calendar action .md: skill/action 替换为 module/method + mode 字段 - handler_memory: 新增 Pydantic extra=forbid 模型替代手工 dict 校验 - memory/SKILL.md: 补充 UserMemoryContent/WorkProfileContent 全字段文档 - 移除 handler_calendar 死代码 _batch_status 和 runner 旧别名 AgentScopeReActRunner - PRD §5.2-5.6 和 sse-events 协议对齐实际 module/method 实现
19 KiB
Single CLI + Progressive Skill Disclosure Redesign PRD
1. Goal
This task redesigns the current agent tool protocol around one confirmed product constraint:
- The runtime should continue to expose exactly one business tool to the worker agent:
project_cli. - The worker should learn how to use the tool through progressive skill disclosure instead of receiving a large global tool surface up front.
- The current
command + subcommand + argstransport should be replaced with a business-action protocol that matches real product objects and user intents. - The redesign must remain grounded in the current repository's actual schedule domain:
schedule_itemsschedule_subscriptions
- The redesign must reduce wasted retries and token consumption without reintroducing the old multi-tool schema explosion.
This PRD does not propose a broad agent-platform rewrite. It is a focused redesign of how the single CLI tool, skills, router output, and worker execution contract should work together.
2. Confirmed Repository Facts
2.1 Router is not ReAct
The router is a direct structured generation stage, not a ReAct loop.
Confirmed in:
backend/src/core/agentscope/runtime/runner.py:310backend/src/core/agentscope/runtime/runner.py:325
_run_router_stage() uses finalize_json_response(...) and returns one RouterAgentOutput payload.
Implication:
- Router cost control depends on prompt/schema size and retries in
finalize_json_response, notmax_iters. - Tool-choice ambiguity is a worker problem, not a router ReAct problem.
2.2 Worker is the only ReAct loop
The worker uses JsonReActAgent, which subclasses AgentScope ReActAgent.
Confirmed in:
backend/src/core/agentscope/runtime/json_react_agent.pybackend/src/core/agentscope/runtime/runner.py:495
The current code does not pass an explicit max_iters, so the worker inherits AgentScope's default.
Confirmed externally in the local environment by inspecting the installed ReActAgent.__init__ signature:
max_iters=10
Implication:
- The worker currently has too much room to repeat invalid tool calls before failing.
- This task will explicitly set worker
max_iters=7.
2.3 Worker does not consume context_messages
The worker receives only the router contract message and not the original context_messages list.
Confirmed in:
backend/src/core/agentscope/runtime/runner.py:265backend/src/core/agentscope/runtime/runner.py:285backend/src/core/agentscope/runtime/runner.py:461
Implication:
worker.config.context_messagesis currently semantically misleading.- Router history context remains important.
- Worker runtime context should come from router output, system prompt, tool results, and optional memory, not duplicated chat history configuration.
2.4 Latest failure was caused by protocol mismatch, not missing data
Latest messages read from Supabase showed the following failure pattern:
- Worker repeatedly called
project_cli - Payload shape:
command=calendar,subcommand=read,args={"event_id": "..."} - Backend returned
INVALID_ARGUMENT: start_at and end_at are required - The same invalid call repeated until the worker exhausted the default ReAct limit
This proves:
- The worker knew the event identifier.
- The current CLI protocol did not expose a clear "get one event by id" action.
- The current naming (
read) encouraged the worker to map both range listing and single-event detail lookup onto one ambiguous command.
2.5 The current calendar domain is already split into two real business objects
Database evidence:
public.schedule_itemspublic.schedule_subscriptions
Current schema highlights:
schedule_items
idowner_idtitledescriptionstart_atend_attimezonemetadatarecurrence_rulesource_typestatus
schedule_subscriptions
item_idsubscriber_idpermissionnotify_levelstatus
Current backend routes and services already reflect this split:
- list events by range
- get event by id
- create event
- update event
- delete event
- share/invite event
- accept subscription
- reject subscription
Confirmed in:
backend/src/v1/schedule_items/router.pybackend/src/v1/schedule_items/service.py
2.6 Frontend already distinguishes list vs detail vs invite flows
Confirmed in:
apps/lib/features/calendar/data/apis/calendar_api.dartapps/lib/features/calendar/data/repositories/calendar_repository.dart
The frontend already calls:
GET /schedule-items?start_at&end_atGET /schedule-items/{id}POST /schedule-items/{id}/sharePOST /schedule-items/{id}/acceptPOST /schedule-items/{id}/reject
Implication:
- The product itself already separates these business operations.
- The ambiguity exists in the agent CLI input contract, not in the underlying app/domain design.
3. Problem Statement
The current one-tool design has the right high-level direction but the wrong action protocol.
3.1 What was correct in the previous refactor
The following direction remains valid and should be preserved:
- One AgentScope tool entry (
project_cli) is preferable to many domain tools for token control. - AgentScope skills should be the mechanism for teaching the model when and how to use the tool.
- Tool outputs should remain structured and machine-oriented.
- AG-UI/UI-schema compilation should remain backend-owned.
- The worker should not receive all tool knowledge eagerly.
3.2 What is no longer acceptable
The following parts of the previous CLI protocol should be replaced:
command + subcommand + argsas the model-facing protocol.- Ambiguous action names such as
readthat cover more than one business intent. - Loose
args: dict[str, Any]semantics that encourage field guessing. - Legacy alias drift such as
start_time/end_time,event_timezone, and other migration leftovers. - Runtime dependence on long prose skill files instead of short execution-oriented action cards.
3.3 Why the old CLI shape fails even though the single-tool strategy is good
The current single-tool protocol is too generic for a small model.
The worker must infer, from weak labels like read, all of the following at once:
- Which business object is involved.
- Whether the user wants a list or one detail record.
- Which fields are mandatory for that specific subcommand.
- Which field names are canonical.
This moves too much burden from the runtime protocol into model guesswork.
The result is not just correctness risk. It also increases token cost because the worker burns iterations learning through failure.
4. Design Principles
4.1 Keep exactly one tool
The worker should continue to see one executable tool:
project_cli
Reason:
- avoids multi-tool selection overhead
- avoids injecting many tool schemas into every model call
- preserves a stable tool surface for worker prompting
4.2 Move model-facing semantics from CLI history to business actions
The model-facing protocol should describe business intent directly, not technical command-tree history.
Replace:
{
"command": "calendar",
"subcommand": "read",
"args": {}
}
With:
{
"module": "calendar",
"method": "read",
"input": {
"mode": "event",
"event_id": "<uuid>"
}
}
This preserves one tool while making the business contract explicit.
4.3 Use progressive disclosure for skill knowledge, not for raw global schema exposure
The worker should not receive all method definitions by default.
Instead:
- Read a short skill index first.
- Read the relevant method card only when necessary.
- Call
project_cliwith the chosenmodule/method/inputpayload.
This keeps the token budget focused on the current business scenario.
4.4 Server-side validation stays strict even if the tool schema stays thin
To avoid a large tool schema, project_cli may expose only a thin outer schema:
modulemethodinput
Strict validation then happens server-side by dispatching module + method to the corresponding Pydantic model.
For calendar reads, the input must use strong typed domain values at the schema boundary:
- day reads:
date - range reads: timezone-aware
datetime - single-event reads:
UUID
The transport remains JSON, but the backend contract must validate these as typed values immediately instead of accepting arbitrary strings and reparsing them later.
This preserves strictness without forcing the entire action matrix into the model context.
4.5 No broad backward-compatibility layer
This redesign should not preserve old field aliases or broad coercion behavior.
Specifically, phase implementation should remove or reject:
argsas JSON stringstart_time/end_timeevent_timezone- action overloading under
read
The system should fail clearly and structurally instead of guessing.
5. Target Architecture
5.1 Runtime responsibilities
Router
The router remains a direct structured output stage.
It should continue to decide:
- the objective
- whether tool evidence is required
It should be extended to optionally provide stronger execution hints:
selected_skillintended_actionknown_entitiesknown_time_rangemissing_fields
These fields are not there to make router execute tools. They are there to reduce worker exploration cost.
Worker
The worker remains the only ReAct stage.
Worker changes in this redesign:
- Explicitly set
max_iters=7. - Keep
temperatureunchanged. - Stop pretending worker consumes
context_messagesconfiguration. - Prefer router execution hints before reading additional skill files.
- Read the smallest relevant skill file possible before tool use.
Tool
The worker still sees only:
project_cliview_skill_file
project_cli is the execution boundary.
view_skill_file is the progressive-disclosure knowledge boundary.
5.2 New project_cli model-facing input contract
The canonical model-facing payload is:
{
"module": "calendar",
"method": "read",
"input": {
"mode": "event",
"event_id": "<uuid>"
}
}
Field meanings:
module: enabled business module namespace (calendar, contacts, memory)method: concrete business operation inside the moduleinput: strict method-specific payload
This is still one tool call. The worker is not choosing among many tools.
5.3 Calendar method protocol
The calendar module exposes the following methods registered in the CLI router:
| Module | Method | Handler | Input Shape |
|---|---|---|---|
| calendar | read | handle_calendar_list_day |
discriminated by mode |
| calendar | create | handle_calendar_create_event |
title, start_at, timezone, ... |
| calendar | update | handle_calendar_update_event |
event_id + patch |
| calendar | delete | handle_calendar_delete_event |
event_id |
| calendar | share | handle_calendar_invite_subscriber |
event_id, invitee, permissions |
| calendar | accept_invite | handle_calendar_accept_invite |
event_id |
| calendar | reject_invite | handle_calendar_reject_invite |
event_id |
The read method uses a discriminated union with mode field to dispatch to list_day, list_range, or get_event internally.
This avoids overloading one label like read for two distinct business tasks.
5.4 Canonical calendar method shapes
read with mode=day (list one day)
{
"module": "calendar",
"method": "read",
"input": {
"mode": "day",
"date": "2026-04-23",
"timezone": "Asia/Shanghai"
}
}
read with mode=range (list time range)
{
"module": "calendar",
"method": "read",
"input": {
"mode": "range",
"start_at": "2026-04-23T00:00:00+08:00",
"end_at": "2026-04-24T00:00:00+08:00"
}
}
read with mode=event (get by ID)
{
"module": "calendar",
"method": "read",
"input": {
"mode": "event",
"event_id": "<uuid>"
}
}
create
{
"module": "calendar",
"method": "create",
"input": {
"title": "Project sync",
"start_at": "2026-04-23T16:00:00+08:00",
"end_at": "2026-04-23T17:00:00+08:00",
"timezone": "Asia/Shanghai",
"description": "optional",
"metadata": {
"location": "optional",
"reminder_minutes": 30,
"color": "blue",
"notes": "optional"
}
}
}
update
{
"module": "calendar",
"method": "update",
"input": {
"event_id": "<uuid>",
"patch": {
"title": "Updated title",
"start_at": "2026-04-23T18:00:00+08:00",
"timezone": "Asia/Shanghai",
"status": "archived"
}
}
}
delete
{
"module": "calendar",
"method": "delete",
"input": {
"event_id": "<uuid>"
}
}
share
{
"module": "calendar",
"method": "share",
"input": {
"event_id": "<uuid>",
"invitee": {
"phone": "+8613812345678"
},
"permissions": {
"view": true,
"edit": false,
"invite": false
}
}
}
accept_invite
{
"module": "calendar",
"method": "accept_invite",
"input": {
"event_id": "<uuid>"
}
}
reject_invite
{
"module": "calendar",
"method": "reject_invite",
"input": {
"event_id": "<uuid>"
}
}
5.5 Skill packaging for progressive disclosure
The calendar skill should no longer be one long explanatory page that the worker must read in full.
Recommended structure:
calendar/
SKILL.md # very short index / navigation card
actions/
list_day.md
list_range.md
get_event.md
create_event.md
update_event.md
delete_event.md
invite_subscriber.md
accept_invite.md
reject_invite.md
SKILL.md responsibilities
- describe when calendar skill is relevant
- list all actions in one screen
- say which action to use for known
event_id - say which action to use for date/range queries
- point to action files for exact payloads
Action file responsibilities
Each action file should contain only:
- when to use the action
- required fields
- optional fields
- one canonical example
- forbidden field names and common mistakes
This makes view_skill_file a real progressive-disclosure mechanism instead of a markdown dump.
5.6 Error contract for self-correction
The redesigned CLI returns structured validation feedback with field-level detail.
Canonical error example:
{
"ok": false,
"module": "calendar",
"method": "read",
"error": {
"code": "INVALID_ACTION_INPUT",
"message": "input does not match method schema",
"retryable": false,
"details": {
"missing_fields": ["start_at", "end_at"],
"invalid_fields": [],
"alias_corrections": {
"start_time": "start_at"
}
}
}
}
This is intentionally more corrective than the current generic INVALID_ARGUMENT payload.
6. Token and Cost Control Strategy
6.1 Preserve single-tool economy
The main token-saving choice is to preserve one executable business tool.
This avoids:
- multiple tool schemas in each worker call
- model confusion over which tool to pick first
- large repeated tool descriptions in every turn
6.2 Replace global knowledge with scoped reading
The worker should read:
- router execution hints first
- skill index second
- one action card if needed
This is cheaper than injecting the entire action matrix into every prompt.
6.3 Stop spending iterations on protocol discovery
The redesign reduces cost not by suppressing useful reasoning, but by removing the need for repeated failed exploration.
The worker should no longer need multiple failed attempts to discover:
- whether
event_idbelongs toread - whether
start_timeis valid - whether
event_timezoneis accepted
6.4 Concrete worker settings for this redesign
- set worker
max_iters=7 - keep worker
temperatureunchanged - remove/ignore worker
context_messagesconfiguration in runtime semantics
6.5 Explicit non-goals in this task
This task does not include:
- changing router into a ReAct stage
- lowering worker temperature
- adding duplicate-failure circuit breakers yet
- exposing many separate AgentScope tools again
7. Migration Plan
Phase 0: Planning and protocol design
- Write this PRD and implementation checklist.
- Update protocol docs before runtime code changes.
- Record rejected alternatives and reasoning.
Phase 1: Backend runtime contract
- Extend router output schema with optional execution hints.
- Explicitly set worker
max_iters=7. - Remove semantic reliance on worker
context_messages. - Redesign
project_clirequest payload asskill/action/input.
Phase 2: Calendar action dispatch
- Replace current calendar command/subcommand routing with action dispatch.
- Implement strict action-specific Pydantic models.
- Remove legacy alias handling and generic dict coercion.
- Return structured correction-oriented validation errors.
Phase 3: Skill refactor
- Rewrite
calendar/SKILL.mdas a short index card. - Add per-action action-card files.
- Update skill instructions so worker reads only what is needed.
Phase 4: Cross-layer alignment
- Update relevant protocol docs.
- Keep frontend consumption stable where possible.
- Ensure tool result and AG-UI event semantics remain compatible.
Phase 5: Verification
- Reproduce the previous failure case and confirm it routes to
get_event. - Verify create-event flow uses canonical names only.
- Verify range/day queries still work.
- Verify invite/accept/reject flows map to current schedule subscription behavior.
8. Rejected Alternatives
8.1 Rejected: split back into many tools
Reason:
- reintroduces tool-schema bloat
- worsens tool-choice ambiguity
- increases token overhead on every worker step
8.2 Rejected: keep command/subcommand/args and fix only the skill text
Reason:
- the ambiguity is structural, not editorial
readstill overloads distinct business operations- loose dict input still encourages field guessing
8.3 Rejected: put the full action schema into the tool prompt directly
Reason:
- defeats progressive disclosure
- grows the worker prompt on every turn
- hurts cost and small-model reliability
9. Success Criteria
This redesign is successful only if all of the following are true:
- The worker still sees one executable business tool.
- The worker chooses calendar actions through business semantics, not command-tree guesswork.
- The previous repeated-failure case becomes a direct
get_eventcall whenevent_idis known. - The worker no longer relies on undocumented field aliases.
- The runtime protocol is strictly validated server-side.
- Skill reading is incremental and action-scoped.
- Worker iteration cost is bounded by
max_iters=7. - Backend, protocol docs, and frontend assumptions remain aligned.