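qzl d2d292a99e fix(agent): fix the skill action card calling convention, add strongly typed memory validation, and clean up dead code

- All calendar action .md files: replace skill/action with module/method + the mode field
- handler_memory: add Pydantic extra=forbid models to replace hand-written dict validation
- memory/SKILL.md: document all fields of UserMemoryContent/WorkProfileContent
- Remove the dead code _batch_status from handler_calendar and the old runner alias AgentScopeReActRunner
- Align PRD §5.2-5.6 and the sse-events protocol with the actual module/method implementation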

2026-04-24 14:10:57 +08:00


Single CLI + Progressive Skill Disclosure Redesign PRD

1. Goal

This task redesigns the current agent tool protocol around one confirmed product constraint and the requirements that follow from it:

The runtime should continue to expose exactly one business tool to the worker agent: project_cli.
The worker should learn how to use the tool through progressive skill disclosure instead of receiving a large global tool surface up front.
The current command + subcommand + args transport should be replaced with a business-action protocol that matches real product objects and user intents.
The redesign must remain grounded in the current repository's actual schedule domain:
- schedule_items
- schedule_subscriptions
The redesign must reduce wasted retries and token consumption without reintroducing the old multi-tool schema explosion.

This PRD does not propose a broad agent-platform rewrite. It is a focused redesign of how the single CLI tool, skills, router output, and worker execution contract should work together.

2. Confirmed Repository Facts

2.1 Router is not ReAct

The router is a direct structured generation stage, not a ReAct loop.

Confirmed in:

backend/src/core/agentscope/runtime/runner.py:310
backend/src/core/agentscope/runtime/runner.py:325

_run_router_stage() uses finalize_json_response(...) and returns one RouterAgentOutput payload.

Implication:

Router cost control depends on prompt/schema size and retries in finalize_json_response, not max_iters.
Tool-choice ambiguity is a worker problem, not a router ReAct problem.

2.2 Worker is the only ReAct loop

The worker uses JsonReActAgent, which subclasses AgentScope ReActAgent.

Confirmed in:

backend/src/core/agentscope/runtime/json_react_agent.py
backend/src/core/agentscope/runtime/runner.py:495

The current code does not pass an explicit max_iters, so the worker inherits AgentScope's default.

Confirmed externally in the local environment by inspecting the installed ReActAgent.__init__ signature:

max_iters=10

Implication:

The worker currently has too much room to repeat invalid tool calls before failing.
This task will explicitly set worker max_iters=7.

2.3 Worker does not consume context_messages

The worker receives only the router contract message and not the original context_messages list.

Confirmed in:

backend/src/core/agentscope/runtime/runner.py:265
backend/src/core/agentscope/runtime/runner.py:285
backend/src/core/agentscope/runtime/runner.py:461

Implication:

worker.config.context_messages is currently semantically misleading.
Router history context remains important.
Worker runtime context should come from router output, system prompt, tool results, and optional memory, not duplicated chat history configuration.

2.4 Latest failure was caused by protocol mismatch, not missing data

Latest messages read from Supabase showed the following failure pattern:

Worker repeatedly called project_cli
Payload shape: command=calendar, subcommand=read, args={"event_id": "..."}
Backend returned INVALID_ARGUMENT: start_at and end_at are required
The same invalid call repeated until the worker exhausted the default ReAct limit

This proves:

The worker knew the event identifier.
The current CLI protocol did not expose a clear "get one event by id" action.
The current naming (read) encouraged the worker to map both range listing and single-event detail lookup onto one ambiguous command.

2.5 The current calendar domain is already split into two real business objects

Database evidence:

public.schedule_items
public.schedule_subscriptions

Current schema highlights:

schedule_items

id
owner_id
title
description
start_at
end_at
timezone
metadata
recurrence_rule
source_type
status

schedule_subscriptions

item_id
subscriber_id
permission
notify_level
status

Current backend routes and services already reflect this split:

list events by range
get event by id
create event
update event
delete event
share/invite event
accept subscription
reject subscription

Confirmed in:

backend/src/v1/schedule_items/router.py
backend/src/v1/schedule_items/service.py

2.6 Frontend already distinguishes list vs detail vs invite flows

Confirmed in:

apps/lib/features/calendar/data/apis/calendar_api.dart
apps/lib/features/calendar/data/repositories/calendar_repository.dart

The frontend already calls:

GET /schedule-items?start_at&end_at
GET /schedule-items/{id}
POST /schedule-items/{id}/share
POST /schedule-items/{id}/accept
POST /schedule-items/{id}/reject

Implication:

The product itself already separates these business operations.
The ambiguity exists in the agent CLI input contract, not in the underlying app/domain design.

3. Problem Statement

The current one-tool design has the right high-level direction but the wrong action protocol.

3.1 What was correct in the previous refactor

The following direction remains valid and should be preserved:

One AgentScope tool entry (project_cli) is preferable to many domain tools for token control.
AgentScope skills should be the mechanism for teaching the model when and how to use the tool.
Tool outputs should remain structured and machine-oriented.
AG-UI/UI-schema compilation should remain backend-owned.
The worker should not receive all tool knowledge eagerly.

3.2 What is no longer acceptable

The following parts of the previous CLI protocol should be replaced:

command + subcommand + args as the model-facing protocol.
Ambiguous action names such as read that cover more than one business intent.
Loose args: dict[str, Any] semantics that encourage field guessing.
Legacy alias drift such as start_time/end_time, event_timezone, and other migration leftovers.
Runtime dependence on long prose skill files instead of short execution-oriented action cards.

3.3 Why the old CLI shape fails even though the single-tool strategy is good

The current single-tool protocol is too generic for a small model.

The worker must infer, from weak labels like read, all of the following at once:

Which business object is involved.
Whether the user wants a list or one detail record.
Which fields are mandatory for that specific subcommand.
Which field names are canonical.

This moves too much burden from the runtime protocol into model guesswork.

The result is not just a correctness risk. It also increases token cost, because the worker burns iterations learning the protocol through failure.

4. Design Principles

4.1 Keep exactly one tool

The worker should continue to see one executable tool:

project_cli

Reason:

avoids multi-tool selection overhead
avoids injecting many tool schemas into every model call
preserves a stable tool surface for worker prompting

4.2 Move model-facing semantics from CLI history to business actions

The model-facing protocol should describe business intent directly, not technical command-tree history.

Replace:

{
  "command": "calendar",
  "subcommand": "read",
  "args": {}
}

With:

{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "event",
    "event_id": "<uuid>"
  }
}

This preserves one tool while making the business contract explicit.

4.3 Use progressive disclosure for skill knowledge, not for raw global schema exposure

The worker should not receive all method definitions by default.

Instead:

Read a short skill index first.
Read the relevant method card only when necessary.
Call project_cli with the chosen module/method/input payload.

This keeps the token budget focused on the current business scenario.

4.4 Server-side validation stays strict even if the tool schema stays thin

To avoid a large tool schema, project_cli may expose only a thin outer schema:

module
method
input

Strict validation then happens server-side by dispatching module + method to the corresponding Pydantic model.

For calendar reads, the input must use strong typed domain values at the schema boundary:

day reads: date
range reads: timezone-aware datetime
single-event reads: UUID

The transport remains JSON, but the backend contract must validate these as typed values immediately instead of accepting arbitrary strings and reparsing them later.

This preserves strictness without forcing the entire action matrix into the model context.
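The typed boundary described above can be sketched without any third-party dependency. This is a minimal illustration only: the PRD says the backend dispatches to Pydantic models, whereas the sketch below uses stdlib dataclasses to stay self-contained. The names ReadDay, ReadRange, ReadEvent, and parse_read_input are hypothetical, and treating every field shown in the canonical examples as required is an assumption.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Any, Union
from uuid import UUID


@dataclass(frozen=True)
class ReadDay:
    date: dt.date
    timezone: str


@dataclass(frozen=True)
class ReadRange:
    start_at: dt.datetime
    end_at: dt.datetime


@dataclass(frozen=True)
class ReadEvent:
    event_id: UUID


ReadInput = Union[ReadDay, ReadRange, ReadEvent]

# Exact field set per mode (assumption: no optional fields, no aliases, no extras).
_FIELDS = {
    "day": {"mode", "date", "timezone"},
    "range": {"mode", "start_at", "end_at"},
    "event": {"mode", "event_id"},
}


def _aware(value: str) -> dt.datetime:
    """Parse an ISO 8601 datetime and insist on an explicit UTC offset."""
    parsed = dt.datetime.fromisoformat(value)
    if parsed.utcoffset() is None:
        raise ValueError("datetime must be timezone-aware")
    return parsed


def parse_read_input(raw: dict[str, Any]) -> ReadInput:
    """Turn the JSON input of calendar.read into typed values, or raise ValueError."""
    mode = raw.get("mode")
    if not isinstance(mode, str) or mode not in _FIELDS:
        raise ValueError(f"unknown mode: {mode!r}")
    if set(raw) != _FIELDS[mode]:
        raise ValueError(f"mode={mode} expects exactly {sorted(_FIELDS[mode])}")
    if mode == "day":
        return ReadDay(dt.date.fromisoformat(raw["date"]), raw["timezone"])
    if mode == "range":
        return ReadRange(_aware(raw["start_at"]), _aware(raw["end_at"]))
    return ReadEvent(UUID(raw["event_id"]))
```

With this shape, a range read that uses naive datetimes or a legacy key such as start_time fails at the boundary instead of being reparsed later.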

4.5 No broad backward-compatibility layer

This redesign should not preserve old field aliases or broad coercion behavior.

Specifically, the phased implementation should remove or reject:

args as JSON string
start_time/end_time
event_timezone
implicit action overloading under read (a read call without an explicit mode)

The system should fail clearly and structurally instead of guessing.

5. Target Architecture

5.1 Runtime responsibilities

Router

The router remains a direct structured output stage.

It should continue to decide:

the objective
whether tool evidence is required

It should be extended to optionally provide stronger execution hints:

selected_skill
intended_action
known_entities
known_time_range
missing_fields

These fields are not there to make the router execute tools. They are there to reduce worker exploration cost.
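As a rough illustration, the optional hints could be carried in a small structure like the one below. Only the five field names come from this PRD. The class name RouterExecutionHints, the field types, and the helper ready_for_direct_call are assumptions made for the sketch, not the actual RouterAgentOutput schema.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RouterExecutionHints:
    """Optional hints attached to the router output (types are assumed)."""

    selected_skill: Optional[str] = None
    intended_action: Optional[str] = None
    known_entities: dict[str, Any] = field(default_factory=dict)
    known_time_range: Optional[tuple[str, str]] = None
    missing_fields: list[str] = field(default_factory=list)

    def ready_for_direct_call(self) -> bool:
        """True when the worker can plausibly skip exploration and call the tool."""
        has_target = bool(self.selected_skill and self.intended_action)
        return has_target and not self.missing_fields


hints = RouterExecutionHints(
    selected_skill="calendar",
    intended_action="get_event",
    known_entities={"event_id": "<uuid>"},
)
```

All fields default to empty, so a router that provides no hints leaves the worker on its normal path of reading the skill index first.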

Worker

The worker remains the only ReAct stage.

Worker changes in this redesign:

Explicitly set max_iters=7.
Keep temperature unchanged.
Stop pretending the worker consumes the context_messages configuration.
Prefer router execution hints before reading additional skill files.
Read the smallest relevant skill file possible before tool use.

Tool

The worker still sees only:

project_cli
view_skill_file

project_cli is the execution boundary. view_skill_file is the progressive-disclosure knowledge boundary.

5.2 New `project_cli` model-facing input contract

The canonical model-facing payload is:

{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "event",
    "event_id": "<uuid>"
  }
}

Field meanings:

module: enabled business module namespace (calendar, contacts, memory)
method: concrete business operation inside the module
input: strict method-specific payload

This is still one tool call. The worker is not choosing among many tools.
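A minimal sketch of the thin outer check follows, assuming the envelope is validated before any method-specific model runs. CliRequest, ENABLED_MODULES, and parse_envelope are hypothetical names. The module list is taken from the field meanings above.

```python
from dataclasses import dataclass
from typing import Any

# Module namespaces named in this PRD.
ENABLED_MODULES = frozenset({"calendar", "contacts", "memory"})


@dataclass(frozen=True)
class CliRequest:
    module: str
    method: str
    input: dict[str, Any]


def parse_envelope(payload: dict[str, Any]) -> CliRequest:
    """Validate only the outer shape; method-specific checks happen afterwards."""
    expected = {"module", "method", "input"}
    if set(payload) != expected:
        # Legacy keys such as command/subcommand/args are rejected here.
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(payload)}")
    module, method, body = payload["module"], payload["method"], payload["input"]
    if not isinstance(module, str) or module not in ENABLED_MODULES:
        raise ValueError(f"module not enabled: {module!r}")
    if not isinstance(method, str) or not method:
        raise ValueError("method must be a non-empty string")
    if not isinstance(body, dict):
        # A JSON-encoded string is rejected rather than re-parsed.
        raise ValueError("input must be an object")
    return CliRequest(module, method, body)
```

The outer schema stays small enough to sit in every worker call, while the per-method strictness lives behind it.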

5.3 Calendar method protocol

The calendar module exposes the following methods registered in the CLI router:

| Module | Method | Handler | Input Shape |
| --- | --- | --- | --- |
| calendar | read | `handle_calendar_list_day` | discriminated by `mode` |
| calendar | create | `handle_calendar_create_event` | title, start_at, timezone, ... |
| calendar | update | `handle_calendar_update_event` | event_id + patch |
| calendar | delete | `handle_calendar_delete_event` | event_id |
| calendar | share | `handle_calendar_invite_subscriber` | event_id, invitee, permissions |
| calendar | accept_invite | `handle_calendar_accept_invite` | event_id |
| calendar | reject_invite | `handle_calendar_reject_invite` | event_id |

The read method takes a discriminated union keyed on the mode field and dispatches internally to list_day, list_range, or get_event.

Making mode explicit keeps the read label from silently covering distinct business tasks: each mode carries its own required fields.
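The registration described above can be pictured as a lookup table plus one extra step for read. In this sketch the handler names are copied from the method table and kept as strings, since the real functions are not part of this document. HANDLERS, READ_MODE_TO_ACTION, and resolve are hypothetical names.

```python
from typing import Any, Optional

# (module, method) -> handler name, as listed in the method table.
HANDLERS: dict[tuple[str, str], str] = {
    ("calendar", "read"): "handle_calendar_list_day",
    ("calendar", "create"): "handle_calendar_create_event",
    ("calendar", "update"): "handle_calendar_update_event",
    ("calendar", "delete"): "handle_calendar_delete_event",
    ("calendar", "share"): "handle_calendar_invite_subscriber",
    ("calendar", "accept_invite"): "handle_calendar_accept_invite",
    ("calendar", "reject_invite"): "handle_calendar_reject_invite",
}

# mode -> internal action used by the read handler.
READ_MODE_TO_ACTION = {"day": "list_day", "range": "list_range", "event": "get_event"}


def resolve(module: str, method: str, payload: dict[str, Any]) -> tuple[str, Optional[str]]:
    """Return (handler name, internal read action or None)."""
    handler = HANDLERS.get((module, method))
    if handler is None:
        raise KeyError(f"unknown method: {module}.{method}")
    if (module, method) != ("calendar", "read"):
        return handler, None
    mode = payload.get("mode")
    action = READ_MODE_TO_ACTION.get(mode) if isinstance(mode, str) else None
    if action is None:
        raise ValueError(f"read requires mode in {sorted(READ_MODE_TO_ACTION)}")
    return handler, action
```

A read call without a mode fails immediately here, which is the case that previously produced repeated INVALID_ARGUMENT retries.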

5.4 Canonical calendar method shapes

`read` with mode=day (list one day)

{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "day",
    "date": "2026-04-23",
    "timezone": "Asia/Shanghai"
  }
}

`read` with mode=range (list time range)

{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "range",
    "start_at": "2026-04-23T00:00:00+08:00",
    "end_at": "2026-04-24T00:00:00+08:00"
  }
}

`read` with mode=event (get by ID)

{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "event",
    "event_id": "<uuid>"
  }
}

`create`

{
  "module": "calendar",
  "method": "create",
  "input": {
    "title": "Project sync",
    "start_at": "2026-04-23T16:00:00+08:00",
    "end_at": "2026-04-23T17:00:00+08:00",
    "timezone": "Asia/Shanghai",
    "description": "optional",
    "metadata": {
      "location": "optional",
      "reminder_minutes": 30,
      "color": "blue",
      "notes": "optional"
    }
  }
}

`update`

{
  "module": "calendar",
  "method": "update",
  "input": {
    "event_id": "<uuid>",
    "patch": {
      "title": "Updated title",
      "start_at": "2026-04-23T18:00:00+08:00",
      "timezone": "Asia/Shanghai",
      "status": "archived"
    }
  }
}

`delete`

{
  "module": "calendar",
  "method": "delete",
  "input": {
    "event_id": "<uuid>"
  }
}

`share`

{
  "module": "calendar",
  "method": "share",
  "input": {
    "event_id": "<uuid>",
    "invitee": {
      "phone": "+8613812345678"
    },
    "permissions": {
      "view": true,
      "edit": false,
      "invite": false
    }
  }
}

`accept_invite`

{
  "module": "calendar",
  "method": "accept_invite",
  "input": {
    "event_id": "<uuid>"
  }
}

`reject_invite`

{
  "module": "calendar",
  "method": "reject_invite",
  "input": {
    "event_id": "<uuid>"
  }
}

5.5 Skill packaging for progressive disclosure

The calendar skill should no longer be one long explanatory page that the worker must read in full.

Recommended structure:

calendar/
  SKILL.md               # very short index / navigation card
  actions/
    list_day.md
    list_range.md
    get_event.md
    create_event.md
    update_event.md
    delete_event.md
    invite_subscriber.md
    accept_invite.md
    reject_invite.md

`SKILL.md` responsibilities

describe when calendar skill is relevant
list all actions in one screen
say which action to use for known event_id
say which action to use for date/range queries
point to action files for exact payloads

Action file responsibilities

Each action file should contain only:

when to use the action
required fields
optional fields
one canonical example
forbidden field names and common mistakes

This makes view_skill_file a real progressive-disclosure mechanism instead of a markdown dump.
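One way to read the layout above is as a lookup from a router hint to the smallest file worth opening. The sketch below assumes a simple policy that this PRD does not spell out: if the hinted action matches a known action card, read only that card, otherwise fall back to the index. ACTION_CARDS and skill_files_to_read are hypothetical names. The file paths follow the recommended structure.

```python
from typing import Optional

# Action card names from the recommended calendar/actions/ directory.
ACTION_CARDS = frozenset({
    "list_day",
    "list_range",
    "get_event",
    "create_event",
    "update_event",
    "delete_event",
    "invite_subscriber",
    "accept_invite",
    "reject_invite",
})


def skill_files_to_read(intended_action: Optional[str]) -> list[str]:
    """Return the smallest set of calendar skill files for the current step."""
    if intended_action in ACTION_CARDS:
        # The router hint already names the action: skip the index.
        return [f"calendar/actions/{intended_action}.md"]
    # No usable hint: start from the short index card.
    return ["calendar/SKILL.md"]
```

For example, a hint of get_event yields only calendar/actions/get_event.md, while a missing or unknown hint yields calendar/SKILL.md.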

5.6 Error contract for self-correction

The redesigned CLI returns structured validation feedback with field-level detail.

Canonical error example:

{
  "ok": false,
  "module": "calendar",
  "method": "read",
  "error": {
    "code": "INVALID_ACTION_INPUT",
    "message": "input does not match method schema",
    "retryable": false,
    "details": {
      "missing_fields": ["start_at", "end_at"],
      "invalid_fields": [],
      "alias_corrections": {
        "start_time": "start_at"
      }
    }
  }
}

This is intentionally more corrective than the current generic INVALID_ARGUMENT payload.
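The canonical error above could be assembled by a helper along these lines. This is a sketch under assumptions: build_input_error and LEGACY_ALIASES are hypothetical names, and the mapping of event_timezone to timezone is inferred from the canonical field names rather than stated in this PRD.

```python
from typing import Any, Sequence

# Legacy alias -> canonical field. The event_timezone entry is an assumption.
LEGACY_ALIASES = {
    "start_time": "start_at",
    "end_time": "end_at",
    "event_timezone": "timezone",
}


def build_input_error(
    module: str,
    method: str,
    payload: dict[str, Any],
    required: Sequence[str],
    invalid_fields: Sequence[str] = (),
) -> dict[str, Any]:
    """Build a correction-oriented error body for a rejected input payload."""
    # Report aliases as hints only; the payload itself is never rewritten.
    alias_corrections = {
        alias: canonical
        for alias, canonical in LEGACY_ALIASES.items()
        if alias in payload
    }
    missing = [name for name in required if name not in payload]
    return {
        "ok": False,
        "module": module,
        "method": method,
        "error": {
            "code": "INVALID_ACTION_INPUT",
            "message": "input does not match method schema",
            "retryable": False,
            "details": {
                "missing_fields": missing,
                "invalid_fields": list(invalid_fields),
                "alias_corrections": alias_corrections,
            },
        },
    }
```

Calling it with a payload that contains only start_time and with required fields start_at and end_at reproduces the details shown in the canonical example.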

6. Token and Cost Control Strategy

6.1 Preserve single-tool economy

The main token-saving choice is to preserve one executable business tool.

This avoids:

multiple tool schemas in each worker call
model confusion over which tool to pick first
large repeated tool descriptions in every turn

6.2 Replace global knowledge with scoped reading

The worker should read:

router execution hints first
skill index second
one action card if needed

This is cheaper than injecting the entire action matrix into every prompt.

6.3 Stop spending iterations on protocol discovery

The redesign reduces cost not by suppressing useful reasoning, but by removing the need for repeated failed exploration.

The worker should no longer need multiple failed attempts to discover:

whether event_id belongs to read
whether start_time is valid
whether event_timezone is accepted

6.4 Concrete worker settings for this redesign

set worker max_iters=7
keep worker temperature unchanged
remove/ignore worker context_messages configuration in runtime semantics

6.5 Explicit non-goals in this task

This task does not include:

changing router into a ReAct stage
lowering worker temperature
adding duplicate-failure circuit breakers yet
exposing many separate AgentScope tools again

7. Migration Plan

Phase 0: Planning and protocol design

Write this PRD and implementation checklist.
Update protocol docs before runtime code changes.
Record rejected alternatives and reasoning.

Phase 1: Backend runtime contract

Extend router output schema with optional execution hints.
Explicitly set worker max_iters=7.
Remove semantic reliance on worker context_messages.
Redesign the project_cli request payload as module/method/input.

Phase 2: Calendar action dispatch

Replace the current calendar command/subcommand routing with module/method dispatch.
Implement strict action-specific Pydantic models.
Remove legacy alias handling and generic dict coercion.
Return structured correction-oriented validation errors.

Phase 3: Skill refactor

Rewrite calendar/SKILL.md as a short index card.
Add per-action action-card files.
Update skill instructions so worker reads only what is needed.

Phase 4: Cross-layer alignment

Update relevant protocol docs.
Keep frontend consumption stable where possible.
Ensure tool result and AG-UI event semantics remain compatible.

Phase 5: Verification

Reproduce the previous failure case and confirm it resolves to read with mode=event (get_event internally).
Verify create-event flow uses canonical names only.
Verify range/day queries still work.
Verify invite/accept/reject flows map to current schedule subscription behavior.

8. Rejected Alternatives

8.1 Rejected: split back into many tools

Reason:

reintroduces tool-schema bloat
worsens tool-choice ambiguity
increases token overhead on every worker step

8.2 Rejected: keep `command/subcommand/args` and fix only the skill text

Reason:

the ambiguity is structural, not editorial
read still overloads distinct business operations
loose dict input still encourages field guessing

8.3 Rejected: put the full action schema into the tool prompt directly

Reason:

defeats progressive disclosure
grows the worker prompt on every turn
hurts cost and small-model reliability

9. Success Criteria

This redesign is successful only if all of the following are true:

The worker still sees one executable business tool.
The worker chooses calendar actions through business semantics, not command-tree guesswork.
The previous repeated-failure case becomes a direct read call with mode=event (get_event internally) when event_id is known.
The worker no longer relies on undocumented field aliases.
The runtime protocol is strictly validated server-side.
Skill reading is incremental and action-scoped.
Worker iteration cost is bounded by max_iters=7.
Backend, protocol docs, and frontend assumptions remain aligned.
