social-app/.trellis/tasks/04-23-redesign-single-cli-skill-disclosure/prd.md
qzl d2d292a99e fix(agent): fix skill action card calling conventions, strongly typed memory validation, and dead-code cleanup
- All calendar action .md files: replace skill/action with module/method + mode field
- handler_memory: add Pydantic extra=forbid models to replace hand-written dict validation
- memory/SKILL.md: document all fields of UserMemoryContent/WorkProfileContent
- Remove dead code _batch_status from handler_calendar and the old runner alias AgentScopeReActRunner
- Align PRD §5.2-5.6 and the sse-events protocol with the actual module/method implementation
2026-04-24 14:10:57 +08:00


Single CLI + Progressive Skill Disclosure Redesign PRD

1. Goal

This task redesigns the current agent tool protocol around one confirmed product constraint:

  1. The runtime should continue to expose exactly one business tool to the worker agent: project_cli.
  2. The worker should learn how to use the tool through progressive skill disclosure instead of receiving a large global tool surface up front.
  3. The current command + subcommand + args transport should be replaced with a business-action protocol that matches real product objects and user intents.
  4. The redesign must remain grounded in the current repository's actual schedule domain:
    • schedule_items
    • schedule_subscriptions
  5. The redesign must reduce wasted retries and token consumption without reintroducing the old multi-tool schema explosion.

This PRD does not propose a broad agent-platform rewrite. It is a focused redesign of how the single CLI tool, skills, router output, and worker execution contract should work together.

2. Confirmed Repository Facts

2.1 Router is not ReAct

The router is a direct structured generation stage, not a ReAct loop.

Confirmed in:

  • backend/src/core/agentscope/runtime/runner.py:310
  • backend/src/core/agentscope/runtime/runner.py:325

_run_router_stage() uses finalize_json_response(...) and returns one RouterAgentOutput payload.

Implication:

  • Router cost control depends on prompt/schema size and retries in finalize_json_response, not max_iters.
  • Tool-choice ambiguity is a worker problem, not a router ReAct problem.

2.2 Worker is the only ReAct loop

The worker uses JsonReActAgent, which subclasses AgentScope ReActAgent.

Confirmed in:

  • backend/src/core/agentscope/runtime/json_react_agent.py
  • backend/src/core/agentscope/runtime/runner.py:495

The current code does not pass an explicit max_iters, so the worker inherits AgentScope's default.

Confirmed externally in the local environment by inspecting the installed ReActAgent.__init__ signature:

  • max_iters=10

Implication:

  • The worker currently has too much room to repeat invalid tool calls before failing.
  • This task will explicitly set worker max_iters=7.
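The following is a minimal sketch that makes the budget concrete. It assumes a simplified loop that makes one tool call per iteration and stops at the first success. `run_bounded_loop` and `always_invalid` are hypothetical illustrations, not AgentScope's ReActAgent internals. The only point is that a call which can never succeed costs exactly max_iters attempts.

```python
from typing import Callable, Dict, Optional, Tuple


def run_bounded_loop(
    call_tool: Callable[[int], Dict], max_iters: int = 7
) -> Tuple[Optional[Dict], int]:
    """Call the tool until it reports ok=True or the iteration budget runs out.

    Returns the successful result (or None) and the number of calls made.
    """
    calls = 0
    for step in range(max_iters):
        calls += 1
        result = call_tool(step)
        if result.get("ok"):
            return result, calls
    return None, calls  # budget exhausted without a successful call


def always_invalid(_step: int) -> Dict:
    """Mimics the failure in 2.4: the same malformed call is rejected every time."""
    return {"ok": False, "error": {"code": "INVALID_ARGUMENT"}}


print(run_bounded_loop(always_invalid, max_iters=10))  # (None, 10): inherited default
print(run_bounded_loop(always_invalid, max_iters=7))   # (None, 7): explicit setting
```

Lowering the budget does not fix a protocol mismatch. It only caps how much a mismatch can cost, which is why the rest of this PRD targets the payload contract itself.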

2.3 Worker does not consume context_messages

The worker receives only the router contract message and not the original context_messages list.

Confirmed in:

  • backend/src/core/agentscope/runtime/runner.py:265
  • backend/src/core/agentscope/runtime/runner.py:285
  • backend/src/core/agentscope/runtime/runner.py:461

Implication:

  • worker.config.context_messages is currently semantically misleading.
  • Router history context remains important.
  • Worker runtime context should come from router output, system prompt, tool results, and optional memory, not duplicated chat history configuration.

2.4 Latest failure was caused by protocol mismatch, not missing data

Latest messages read from Supabase showed the following failure pattern:

  • Worker repeatedly called project_cli
  • Payload shape: command=calendar, subcommand=read, args={"event_id": "..."}
  • Backend returned INVALID_ARGUMENT: start_at and end_at are required
  • The same invalid call repeated until the worker exhausted the default ReAct limit

This proves:

  1. The worker knew the event identifier.
  2. The current CLI protocol did not expose a clear "get one event by id" action.
  3. The current naming (read) encouraged the worker to map both range listing and single-event detail lookup onto one ambiguous command.

2.5 The current calendar domain is already split into two real business objects

Database evidence:

  • public.schedule_items
  • public.schedule_subscriptions

Current schema highlights:

schedule_items

  • id
  • owner_id
  • title
  • description
  • start_at
  • end_at
  • timezone
  • metadata
  • recurrence_rule
  • source_type
  • status

schedule_subscriptions

  • item_id
  • subscriber_id
  • permission
  • notify_level
  • status

Current backend routes and services already reflect this split:

  • list events by range
  • get event by id
  • create event
  • update event
  • delete event
  • share/invite event
  • accept subscription
  • reject subscription

Confirmed in:

  • backend/src/v1/schedule_items/router.py
  • backend/src/v1/schedule_items/service.py

2.6 Frontend already distinguishes list vs detail vs invite flows

Confirmed in:

  • apps/lib/features/calendar/data/apis/calendar_api.dart
  • apps/lib/features/calendar/data/repositories/calendar_repository.dart

The frontend already calls:

  • GET /schedule-items?start_at&end_at
  • GET /schedule-items/{id}
  • POST /schedule-items/{id}/share
  • POST /schedule-items/{id}/accept
  • POST /schedule-items/{id}/reject

Implication:

  • The product itself already separates these business operations.
  • The ambiguity exists in the agent CLI input contract, not in the underlying app/domain design.

3. Problem Statement

The current one-tool design has the right high-level direction but the wrong action protocol.

3.1 What was correct in the previous refactor

The following direction remains valid and should be preserved:

  1. One AgentScope tool entry (project_cli) is preferable to many domain tools for token control.
  2. AgentScope skills should be the mechanism for teaching the model when and how to use the tool.
  3. Tool outputs should remain structured and machine-oriented.
  4. AG-UI/UI-schema compilation should remain backend-owned.
  5. The worker should not receive all tool knowledge eagerly.

3.2 What is no longer acceptable

The following parts of the previous CLI protocol should be replaced:

  1. command + subcommand + args as the model-facing protocol.
  2. Ambiguous action names such as read that cover more than one business intent.
  3. Loose args: dict[str, Any] semantics that encourage field guessing.
  4. Legacy alias drift such as start_time/end_time, event_timezone, and other migration leftovers.
  5. Runtime dependence on long prose skill files instead of short execution-oriented action cards.

3.3 Why the old CLI shape fails even though the single-tool strategy is good

The current single-tool protocol is too generic for a small model.

The worker must infer, from weak labels like read, all of the following at once:

  1. Which business object is involved.
  2. Whether the user wants a list or one detail record.
  3. Which fields are mandatory for that specific subcommand.
  4. Which field names are canonical.

This moves too much burden from the runtime protocol into model guesswork.

The result is not just correctness risk. It also increases token cost because the worker burns iterations learning through failure.

4. Design Principles

4.1 Keep exactly one tool

The worker should continue to see one executable tool:

  • project_cli

Reason:

  • avoids multi-tool selection overhead
  • avoids injecting many tool schemas into every model call
  • preserves a stable tool surface for worker prompting

4.2 Move model-facing semantics from CLI history to business actions

The model-facing protocol should describe business intent directly, not technical command-tree history.

Replace:

{
  "command": "calendar",
  "subcommand": "read",
  "args": {}
}

With:

{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "event",
    "event_id": "<uuid>"
  }
}

This preserves one tool while making the business contract explicit.

4.3 Use progressive disclosure for skill knowledge, not for raw global schema exposure

The worker should not receive all method definitions by default.

Instead:

  1. Read a short skill index first.
  2. Read the relevant method card only when necessary.
  3. Call project_cli with the chosen module/method/input payload.

This keeps the token budget focused on the current business scenario.
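The reading order above can be sketched as a small planning function. `plan_skill_reads` is a hypothetical helper: in practice the worker follows this order through its prompt and view_skill_file calls, not through code like this. The sketch assumes skills live under `<skill>/SKILL.md` and `<skill>/actions/<card>`, matching the calendar layout proposed in 5.5.

```python
from typing import List, Optional


def plan_skill_reads(
    skill: str, action_card: Optional[str] = None, payload_known: bool = False
) -> List[str]:
    """Return the skill files to open, in order, before calling project_cli.

    skill: skill directory, e.g. "calendar" (possibly taken from a router hint).
    action_card: card file name such as "get_event.md", once the action is identified.
    payload_known: True when the exact payload is already known, so no card is needed.
    """
    reads = [f"{skill}/SKILL.md"]  # 1. short index first
    if action_card is not None and not payload_known:
        reads.append(f"{skill}/actions/{action_card}")  # 2. one card, only when necessary
    return reads  # 3. then call project_cli with module/method/input
```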

4.4 Server-side validation stays strict even if the tool schema stays thin

To avoid a large tool schema, project_cli may expose only a thin outer schema:

  • module
  • method
  • input

Strict validation then happens server-side by dispatching module + method to the corresponding Pydantic model.
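Below is a minimal stdlib sketch of that dispatch. It assumes a registry keyed by (module, method). `register`, `dispatch`, `HANDLERS`, and the `UNKNOWN_METHOD` error code are hypothetical. The backend would attach a Pydantic model to each entry; the hand-written key check in `delete_event` only stands in for that model.

```python
from typing import Any, Callable, Dict, Tuple

Payload = Dict[str, Any]
Handler = Callable[[Payload], Payload]

# Hypothetical registry: (module, method) -> handler that validates its own input.
HANDLERS: Dict[Tuple[str, str], Handler] = {}


def register(module: str, method: str) -> Callable[[Handler], Handler]:
    def decorator(fn: Handler) -> Handler:
        HANDLERS[(module, method)] = fn
        return fn
    return decorator


@register("calendar", "delete")
def delete_event(payload: Payload) -> Payload:
    # Stand-in for a strict method-specific model: exactly one field, event_id.
    if set(payload) != {"event_id"}:
        raise ValueError("calendar.delete expects exactly: event_id")
    return {"deleted": payload["event_id"]}


def dispatch(module: str, method: str, payload: Payload) -> Payload:
    envelope = {"module": module, "method": method}
    handler = HANDLERS.get((module, method))
    if handler is None:
        return {"ok": False, **envelope,
                "error": {"code": "UNKNOWN_METHOD", "message": f"{module}.{method} is not registered"}}
    try:
        return {"ok": True, **envelope, "result": handler(payload)}
    except ValueError as exc:
        return {"ok": False, **envelope,
                "error": {"code": "INVALID_ACTION_INPUT", "message": str(exc)}}
```

The tool schema the model sees stays at three keys. Each registry entry can still be as strict as its business operation requires.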

For calendar reads, the input must use strongly typed domain values at the schema boundary:

  • day reads: date
  • range reads: timezone-aware datetime
  • single-event reads: UUID

The transport remains JSON, but the backend contract must validate these as typed values immediately instead of accepting arbitrary strings and reparsing them later.

This preserves strictness without forcing the entire action matrix into the model context.
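As an illustration, the three typed values listed above map onto stdlib parsers as shown below. The helper names are hypothetical. In the backend these constraints would live on the Pydantic field types rather than in free functions. The key property is that a naive datetime such as `2026-04-23T00:00:00` is rejected at the boundary instead of being silently interpreted in a server-side default zone.

```python
import uuid
from datetime import date, datetime


def parse_day(value: str) -> date:
    """Day reads: a calendar date such as "2026-04-23"."""
    return date.fromisoformat(value)


def parse_aware_datetime(value: str) -> datetime:
    """Range reads: an ISO 8601 datetime that must carry a UTC offset."""
    parsed = datetime.fromisoformat(value)
    if parsed.utcoffset() is None:
        raise ValueError(f"datetime must include a UTC offset: {value!r}")
    return parsed


def parse_event_id(value: str) -> uuid.UUID:
    """Single-event reads: a UUID; malformed strings raise ValueError."""
    return uuid.UUID(value)
```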

4.5 No broad backward-compatibility layer

This redesign should not preserve old field aliases or broad coercion behavior.

Specifically, the implementation phases should remove or reject:

  • args as JSON string
  • start_time/end_time
  • event_timezone
  • overloading read without an explicit mode discriminator

The system should fail clearly and structurally instead of guessing.
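Here is a sketch of what "fail clearly instead of guessing" could mean for each item in the list above. It assumes calendar input arrives as the parsed value of `input`. `legacy_violations` is a hypothetical helper. Treating read overloading as "a read call that omits mode" is an interpretation based on the mode-discriminated read method in 5.3. Nothing is renamed or re-parsed: every legacy pattern produces an error.

```python
from typing import Any, List

LEGACY_FIELDS = ("start_time", "end_time", "event_timezone")


def legacy_violations(method: str, payload: Any) -> List[str]:
    """Return one message per legacy pattern found; an empty list means none."""
    if isinstance(payload, str):
        # input sent as a JSON-encoded string: reject it, never json.loads() it
        return ["input must be a JSON object, not a JSON-encoded string"]
    if not isinstance(payload, dict):
        return ["input must be a JSON object"]
    problems = [f"legacy field {name!r} is not accepted" for name in LEGACY_FIELDS if name in payload]
    if method == "read" and "mode" not in payload:
        problems.append("read requires an explicit mode: day, range, or event")
    return problems
```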

5. Target Architecture

5.1 Runtime responsibilities

Router

The router remains a direct structured output stage.

It should continue to decide:

  • the objective
  • whether tool evidence is required

It should be extended to optionally provide stronger execution hints:

  • selected_skill
  • intended_action
  • known_entities
  • known_time_range
  • missing_fields

These fields do not let the router execute tools; they exist only to reduce worker exploration cost.
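One possible shape for these optional hints is sketched below as a stdlib dataclass. The field names come from the list above. The types, the `RouterExecutionHints` class name, the `read:event` value format, and the `ready_to_call` helper are assumptions for illustration. In the runtime, these fields would presumably extend the Pydantic RouterAgentOutput schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RouterExecutionHints:
    """Hypothetical optional hints attached to the router output; types are assumed."""
    selected_skill: Optional[str] = None                 # e.g. "calendar"
    intended_action: Optional[str] = None                # e.g. "read:event" (assumed format)
    known_entities: Dict[str, str] = field(default_factory=dict)  # e.g. {"event_id": "<uuid>"}
    known_time_range: Optional[Dict[str, str]] = None    # e.g. {"start_at": ..., "end_at": ...}
    missing_fields: List[str] = field(default_factory=list)

    def ready_to_call(self) -> bool:
        """True when the hints identify the action and report no missing fields."""
        return bool(self.selected_skill and self.intended_action) and not self.missing_fields
```

In the failure case from 2.4, a router that already knows the event identifier could put it in known_entities and leave missing_fields empty. The worker would then have no reason to attempt a range read.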

Worker

The worker remains the only ReAct stage.

Worker changes in this redesign:

  1. Explicitly set max_iters=7.
  2. Keep temperature unchanged.
  3. Stop implying that the worker consumes the context_messages configuration.
  4. Prefer router execution hints before reading additional skill files.
  5. Read the smallest relevant skill file possible before tool use.

Tool

The worker still sees only:

  • project_cli
  • view_skill_file

project_cli is the execution boundary. view_skill_file is the progressive-disclosure knowledge boundary.

5.2 New project_cli model-facing input contract

The canonical model-facing payload is:

{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "event",
    "event_id": "<uuid>"
  }
}

Field meanings:

  • module: enabled business module namespace (calendar, contacts, memory)
  • method: concrete business operation inside the module
  • input: strict method-specific payload

This is still one tool call. The worker is not choosing among many tools.
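The sketch below shows a minimal thin outer schema. It assumes the outer layer checks only the three envelope keys and the enabled module set, and leaves `input` to the method-specific model. `CliRequest`, `parse_envelope`, and `ENABLED_MODULES` are hypothetical names. Because extra top-level keys are rejected, the old command/subcommand/args shape fails immediately instead of being half-interpreted.

```python
from dataclasses import dataclass
from typing import Any, Dict

ENABLED_MODULES = {"calendar", "contacts", "memory"}
ENVELOPE_KEYS = {"module", "method", "input"}


@dataclass(frozen=True)
class CliRequest:
    module: str
    method: str
    input: Dict[str, Any]  # validated later by the method-specific model


def parse_envelope(raw: Dict[str, Any]) -> CliRequest:
    extra = set(raw) - ENVELOPE_KEYS
    if extra:
        raise ValueError(f"unexpected top-level keys: {sorted(extra)}")
    missing = ENVELOPE_KEYS - set(raw)
    if missing:
        raise ValueError(f"missing top-level keys: {sorted(missing)}")
    if raw["module"] not in ENABLED_MODULES:
        raise ValueError(f"unknown module: {raw['module']!r}")
    if not isinstance(raw["method"], str) or not isinstance(raw["input"], dict):
        raise ValueError("method must be a string and input must be an object")
    return CliRequest(raw["module"], raw["method"], raw["input"])
```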

5.3 Calendar method protocol

The calendar module exposes the following methods registered in the CLI router:

| Module | Method | Handler | Input shape |
|---|---|---|---|
| calendar | read | handle_calendar_list_day | discriminated by mode |
| calendar | create | handle_calendar_create_event | title, start_at, timezone, ... |
| calendar | update | handle_calendar_update_event | event_id + patch |
| calendar | delete | handle_calendar_delete_event | event_id |
| calendar | share | handle_calendar_invite_subscriber | event_id, invitee, permissions |
| calendar | accept_invite | handle_calendar_accept_invite | event_id |
| calendar | reject_invite | handle_calendar_reject_invite | event_id |

The read method uses a discriminated union with mode field to dispatch to list_day, list_range, or get_event internally.

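Because mode is required and each mode has its own required fields, read no longer overloads one label across distinct business tasks such as range listing and single-event lookup.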
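The mode-discriminated read input can be sketched with stdlib dataclasses as follows. The backend is described as using Pydantic for this. The class names (`ListDay`, `ListRange`, `GetEvent`), `READ_FIELDS`, and `parse_read_input` are hypothetical; the names mirror the internal list_day, list_range, and get_event targets. Each mode accepts only its own fields, so `{"mode": "event", "event_id": ...}` never triggers the start_at/end_at requirement behind the 2.4 failure.

```python
import datetime as dt
import uuid
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass(frozen=True)
class ListDay:
    date: dt.date
    timezone: str


@dataclass(frozen=True)
class ListRange:
    start_at: dt.datetime
    end_at: dt.datetime


@dataclass(frozen=True)
class GetEvent:
    event_id: uuid.UUID


ReadInput = Union[ListDay, ListRange, GetEvent]

# mode -> fields that mode requires; no mode accepts another mode's fields
READ_FIELDS = {
    "day": ("date", "timezone"),
    "range": ("start_at", "end_at"),
    "event": ("event_id",),
}


def _aware(value: str) -> dt.datetime:
    parsed = dt.datetime.fromisoformat(value)
    if parsed.utcoffset() is None:
        raise ValueError(f"{value!r} has no UTC offset")
    return parsed


def parse_read_input(payload: Dict[str, Any]) -> ReadInput:
    mode = payload.get("mode")
    if mode not in READ_FIELDS:
        raise ValueError(f"mode must be one of {sorted(READ_FIELDS)}, got {mode!r}")
    fields = READ_FIELDS[mode]
    missing = [name for name in fields if name not in payload]
    extra = sorted(set(payload) - set(fields) - {"mode"})
    if missing or extra:
        raise ValueError(f"mode={mode}: missing {missing}, unexpected {extra}")
    if mode == "day":
        return ListDay(dt.date.fromisoformat(payload["date"]), payload["timezone"])
    if mode == "range":
        return ListRange(_aware(payload["start_at"]), _aware(payload["end_at"]))
    return GetEvent(uuid.UUID(payload["event_id"]))
```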

5.4 Canonical calendar method shapes

read with mode=day (list one day)

{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "day",
    "date": "2026-04-23",
    "timezone": "Asia/Shanghai"
  }
}

read with mode=range (list time range)

{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "range",
    "start_at": "2026-04-23T00:00:00+08:00",
    "end_at": "2026-04-24T00:00:00+08:00"
  }
}

read with mode=event (get by ID)

{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "event",
    "event_id": "<uuid>"
  }
}

create

{
  "module": "calendar",
  "method": "create",
  "input": {
    "title": "Project sync",
    "start_at": "2026-04-23T16:00:00+08:00",
    "end_at": "2026-04-23T17:00:00+08:00",
    "timezone": "Asia/Shanghai",
    "description": "optional",
    "metadata": {
      "location": "optional",
      "reminder_minutes": 30,
      "color": "blue",
      "notes": "optional"
    }
  }
}

update

{
  "module": "calendar",
  "method": "update",
  "input": {
    "event_id": "<uuid>",
    "patch": {
      "title": "Updated title",
      "start_at": "2026-04-23T18:00:00+08:00",
      "timezone": "Asia/Shanghai",
      "status": "archived"
    }
  }
}

delete

{
  "module": "calendar",
  "method": "delete",
  "input": {
    "event_id": "<uuid>"
  }
}

share

{
  "module": "calendar",
  "method": "share",
  "input": {
    "event_id": "<uuid>",
    "invitee": {
      "phone": "+8613812345678"
    },
    "permissions": {
      "view": true,
      "edit": false,
      "invite": false
    }
  }
}

accept_invite

{
  "module": "calendar",
  "method": "accept_invite",
  "input": {
    "event_id": "<uuid>"
  }
}

reject_invite

{
  "module": "calendar",
  "method": "reject_invite",
  "input": {
    "event_id": "<uuid>"
  }
}

5.5 Skill packaging for progressive disclosure

The calendar skill should no longer be one long explanatory page that the worker must read in full.

Recommended structure:

calendar/
  SKILL.md               # very short index / navigation card
  actions/
    list_day.md
    list_range.md
    get_event.md
    create_event.md
    update_event.md
    delete_event.md
    invite_subscriber.md
    accept_invite.md
    reject_invite.md
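Under this layout, the mapping from a calendar method (and, for read, its mode) to the card the worker should open is small enough to state exactly. The sketch below is hypothetical: `ACTION_CARDS` and `action_card_path` are illustrative names. Pairing share with invite_subscriber.md follows the handler name handle_calendar_invite_subscriber in 5.3.

```python
from typing import Optional

# (method, mode) -> action card under calendar/actions/
ACTION_CARDS = {
    ("read", "day"): "list_day.md",
    ("read", "range"): "list_range.md",
    ("read", "event"): "get_event.md",
    ("create", None): "create_event.md",
    ("update", None): "update_event.md",
    ("delete", None): "delete_event.md",
    ("share", None): "invite_subscriber.md",
    ("accept_invite", None): "accept_invite.md",
    ("reject_invite", None): "reject_invite.md",
}


def action_card_path(method: str, mode: Optional[str] = None) -> str:
    card = ACTION_CARDS.get((method, mode))
    if card is None:
        raise KeyError(f"no action card for method={method!r}, mode={mode!r}")
    return f"calendar/actions/{card}"
```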

SKILL.md responsibilities

  • describe when the calendar skill is relevant
  • list all actions on one screen
  • say which action to use when an event_id is already known
  • say which action to use for date or range queries
  • point to action files for exact payloads

Action file responsibilities

Each action file should contain only:

  1. when to use the action
  2. required fields
  3. optional fields
  4. one canonical example
  5. forbidden field names and common mistakes

This makes view_skill_file a real progressive-disclosure mechanism instead of a markdown dump.

5.6 Error contract for self-correction

The redesigned CLI returns structured validation feedback with field-level detail.

Canonical error example:

{
  "ok": false,
  "module": "calendar",
  "method": "read",
  "error": {
    "code": "INVALID_ACTION_INPUT",
    "message": "input does not match method schema",
    "retryable": false,
    "details": {
      "missing_fields": ["start_at", "end_at"],
      "invalid_fields": [],
      "alias_corrections": {
        "start_time": "start_at"
      }
    }
  }
}

This is intentionally more corrective than the current generic INVALID_ARGUMENT payload.
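The sketch below shows how such an error payload could be assembled from a rejected input. It assumes each method declares its required and optional field names. `invalid_input_error` and `ALIASES` are hypothetical. Only the start_time to start_at correction appears in the example above; the other two alias entries are assumed by analogy with the legacy names listed in 4.5. Known legacy spellings are reported under alias_corrections rather than invalid_fields, which matches the example's empty invalid_fields list.

```python
from typing import Any, Dict, Iterable

# Legacy spelling -> canonical name, used only to build correction hints.
ALIASES = {"start_time": "start_at", "end_time": "end_at", "event_timezone": "timezone"}


def invalid_input_error(
    module: str,
    method: str,
    payload: Dict[str, Any],
    required: Iterable[str],
    optional: Iterable[str] = (),
) -> Dict[str, Any]:
    required = list(required)
    allowed = set(required) | set(optional)
    missing = [name for name in required if name not in payload]
    unknown = sorted(name for name in payload if name not in allowed)
    return {
        "ok": False,
        "module": module,
        "method": method,
        "error": {
            "code": "INVALID_ACTION_INPUT",
            "message": "input does not match method schema",
            "retryable": False,
            "details": {
                "missing_fields": missing,
                # unknown names with a known canonical spelling become corrections instead
                "invalid_fields": [n for n in unknown if n not in ALIASES],
                "alias_corrections": {n: ALIASES[n] for n in unknown if n in ALIASES},
            },
        },
    }
```

Calling it with method read, the payload `{"mode": "range", "start_time": "2026-04-23T00:00:00+08:00"}`, and required fields mode, start_at, and end_at reproduces the details block of the canonical example.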

6. Token and Cost Control Strategy

6.1 Preserve single-tool economy

The main token-saving choice is to preserve one executable business tool.

This avoids:

  • multiple tool schemas in each worker call
  • model confusion over which tool to pick first
  • large repeated tool descriptions in every turn

6.2 Replace global knowledge with scoped reading

The worker should read:

  1. router execution hints first
  2. skill index second
  3. one action card if needed

This is cheaper than injecting the entire action matrix into every prompt.

6.3 Stop spending iterations on protocol discovery

The redesign reduces cost not by suppressing useful reasoning, but by removing the need for repeated failed exploration.

The worker should no longer need multiple failed attempts to discover:

  • whether event_id belongs to read
  • whether start_time is valid
  • whether event_timezone is accepted

6.4 Concrete worker settings for this redesign

  • set worker max_iters=7
  • keep worker temperature unchanged
  • remove/ignore worker context_messages configuration in runtime semantics

6.5 Explicit non-goals in this task

This task does not include:

  • changing router into a ReAct stage
  • lowering worker temperature
  • adding duplicate-failure circuit breakers yet
  • exposing many separate AgentScope tools again

7. Migration Plan

Phase 0: Planning and protocol design

  1. Write this PRD and implementation checklist.
  2. Update protocol docs before runtime code changes.
  3. Record rejected alternatives and reasoning.

Phase 1: Backend runtime contract

  1. Extend router output schema with optional execution hints.
  2. Explicitly set worker max_iters=7.
  3. Remove semantic reliance on worker context_messages.
  4. Redesign the project_cli request payload as module/method/input.

Phase 2: Calendar action dispatch

  1. Replace the current calendar command/subcommand routing with module/method dispatch.
  2. Implement strict method-specific Pydantic models, discriminated by mode for read.
  3. Remove legacy alias handling and generic dict coercion.
  4. Return structured correction-oriented validation errors.

Phase 3: Skill refactor

  1. Rewrite calendar/SKILL.md as a short index card.
  2. Add per-action action-card files.
  3. Update skill instructions so worker reads only what is needed.

Phase 4: Cross-layer alignment

  1. Update relevant protocol docs.
  2. Keep frontend consumption stable where possible.
  3. Ensure tool result and AG-UI event semantics remain compatible.

Phase 5: Verification

  1. Reproduce the previous failure case and confirm it becomes a read call with mode=event, dispatched internally to get_event.
  2. Verify create-event flow uses canonical names only.
  3. Verify range/day queries still work.
  4. Verify invite/accept/reject flows map to current schedule subscription behavior.

8. Rejected Alternatives

8.1 Rejected: split back into many tools

Reason:

  • reintroduces tool-schema bloat
  • worsens tool-choice ambiguity
  • increases token overhead on every worker step

8.2 Rejected: keep command/subcommand/args and fix only the skill text

Reason:

  • the ambiguity is structural, not editorial
  • read still overloads distinct business operations
  • loose dict input still encourages field guessing

8.3 Rejected: put the full action schema into the tool prompt directly

Reason:

  • defeats progressive disclosure
  • grows the worker prompt on every turn
  • hurts cost and small-model reliability

9. Success Criteria

This redesign is successful only if all of the following are true:

  1. The worker still sees one executable business tool.
  2. The worker chooses calendar actions through business semantics, not command-tree guesswork.
  3. The previous repeated-failure case becomes a direct read call with mode=event (get_event) when event_id is known.
  4. The worker no longer relies on undocumented field aliases.
  5. The runtime protocol is strictly validated server-side.
  6. Skill reading is incremental and action-scoped.
  7. Worker iteration cost is bounded by max_iters=7.
  8. Backend, protocol docs, and frontend assumptions remain aligned.