social-app/.trellis/tasks/04-23-redesign-single-cli-skill-disclosure/prd.md
qzl d060962a5f feat(agent): redesign project_cli with module/method/input protocol
- Replace command/subcommand/args with module/method/input envelope
- Calendar handler uses discriminated union (mode) for read operations
- Strict Pydantic models with extra='forbid' for all calendar methods
- Worker max_iters=7, router prompt simplified (removed project_cli_defaults)
- Skill index cards + per-action files for progressive disclosure
- Frontend/AG-UI aligned to module/method dispatch
- Protocol docs updated to module/method/input contract

WIP: action cards need envelope fix, 2 tests need update, memory
handler needs Pydantic models.
2026-04-24 13:24:13 +08:00


Single CLI + Progressive Skill Disclosure Redesign PRD

1. Goal

This task redesigns the current agent tool protocol around the following confirmed product constraints:

  1. The runtime should continue to expose exactly one business tool to the worker agent: project_cli.
  2. The worker should learn how to use the tool through progressive skill disclosure instead of receiving a large global tool surface up front.
  3. The current command + subcommand + args transport should be replaced with a business-action protocol that matches real product objects and user intents.
  4. The redesign must remain grounded in the current repository's actual schedule domain:
    • schedule_items
    • schedule_subscriptions
  5. The redesign must reduce wasted retries and token consumption without reintroducing the old multi-tool schema explosion.

This PRD does not propose a broad agent-platform rewrite. It is a focused redesign of how the single CLI tool, skills, router output, and worker execution contract should work together.

2. Confirmed Repository Facts

2.1 Router is not ReAct

The router is a direct structured generation stage, not a ReAct loop.

Confirmed in:

  • backend/src/core/agentscope/runtime/runner.py:310
  • backend/src/core/agentscope/runtime/runner.py:325

_run_router_stage() uses finalize_json_response(...) and returns one RouterAgentOutput payload.

Implication:

  • Router cost control depends on prompt/schema size and retries in finalize_json_response, not max_iters.
  • Tool-choice ambiguity is a worker problem, not a router ReAct problem.

2.2 Worker is the only ReAct loop

The worker uses JsonReActAgent, which subclasses AgentScope ReActAgent.

Confirmed in:

  • backend/src/core/agentscope/runtime/json_react_agent.py
  • backend/src/core/agentscope/runtime/runner.py:495

The current code does not pass an explicit max_iters, so the worker inherits AgentScope's default.

Confirmed in the local environment by inspecting the installed ReActAgent.__init__ signature:

  • max_iters=10

Implication:

  • The worker currently has too much room to repeat invalid tool calls before failing.
  • This task will explicitly set worker max_iters=7.
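
The inherited default can be checked without reading library source. The following is a minimal sketch using the standard inspect module. StandInReActAgent is a hypothetical placeholder so the snippet runs without AgentScope installed; in the real environment the installed ReActAgent class would be passed instead.

```python
import inspect


def default_of(cls: type, param: str):
    """Return the declared default of an __init__ parameter, or None if it has none."""
    parameter = inspect.signature(cls.__init__).parameters[param]
    if parameter.default is inspect.Parameter.empty:
        return None
    return parameter.default


class StandInReActAgent:
    """Hypothetical stand-in that mirrors only the parameter discussed above."""

    def __init__(self, name: str, max_iters: int = 10) -> None:
        self.name = name
        self.max_iters = max_iters


# The value a subclass inherits when it does not pass max_iters explicitly.
inherited_default = default_of(StandInReActAgent, "max_iters")
```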

2.3 Worker does not consume context_messages

The worker receives only the router contract message and not the original context_messages list.

Confirmed in:

  • backend/src/core/agentscope/runtime/runner.py:265
  • backend/src/core/agentscope/runtime/runner.py:285
  • backend/src/core/agentscope/runtime/runner.py:461

Implication:

  • worker.config.context_messages is currently semantically misleading.
  • Router history context remains important.
  • Worker runtime context should come from router output, system prompt, tool results, and optional memory, not duplicated chat history configuration.

2.4 Latest failure was caused by protocol mismatch, not missing data

The latest messages read from Supabase showed the following failure pattern:

  • Worker repeatedly called project_cli
  • Payload shape: command=calendar, subcommand=read, args={"event_id": "..."}
  • Backend returned INVALID_ARGUMENT: start_at and end_at are required
  • The same invalid call repeated until the worker exhausted the default ReAct limit

This proves:

  1. The worker knew the event identifier.
  2. The current CLI protocol did not expose a clear "get one event by id" action.
  3. The current naming (read) encouraged the worker to map both range listing and single-event detail lookup onto one ambiguous command.
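
The failure pattern can be reproduced in miniature. This is a sketch under assumptions: legacy_calendar_read and run_worker are hypothetical stand-ins for the old handler and the ReAct loop, and the "success" status value is assumed. It only illustrates that a worker repeating the same payload spends its whole iteration budget.

```python
def legacy_calendar_read(args: dict) -> dict:
    """Stand-in for the old calendar/read handler, which only accepts a range."""
    if "start_at" not in args or "end_at" not in args:
        return {
            "status": "failure",
            "error": {
                "code": "INVALID_ARGUMENT",
                "message": "start_at and end_at are required",
            },
        }
    return {"status": "success"}


def run_worker(max_iters: int, next_args) -> tuple:
    """Call the tool until it succeeds or the iteration budget is exhausted."""
    attempts = 0
    for _ in range(max_iters):
        attempts += 1
        if legacy_calendar_read(next_args())["status"] == "success":
            return attempts, True
    return attempts, False


# The worker knows the event id but has no action that accepts it,
# so every iteration repeats the same invalid payload.
default_budget = run_worker(10, lambda: {"event_id": "<uuid>"})
reduced_budget = run_worker(7, lambda: {"event_id": "<uuid>"})
```

Lowering the budget caps the waste, but only a protocol that accepts event_id removes it.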

2.5 The current calendar domain is already split into two real business objects

Database evidence:

  • public.schedule_items
  • public.schedule_subscriptions

Current schema highlights:

schedule_items

  • id
  • owner_id
  • title
  • description
  • start_at
  • end_at
  • timezone
  • metadata
  • recurrence_rule
  • source_type
  • status

schedule_subscriptions

  • item_id
  • subscriber_id
  • permission
  • notify_level
  • status

Current backend routes and services already reflect this split:

  • list events by range
  • get event by id
  • create event
  • update event
  • delete event
  • share/invite event
  • accept subscription
  • reject subscription

Confirmed in:

  • backend/src/v1/schedule_items/router.py
  • backend/src/v1/schedule_items/service.py

2.6 Frontend already distinguishes list vs detail vs invite flows

Confirmed in:

  • apps/lib/features/calendar/data/apis/calendar_api.dart
  • apps/lib/features/calendar/data/repositories/calendar_repository.dart

The frontend already calls:

  • GET /schedule-items?start_at&end_at
  • GET /schedule-items/{id}
  • POST /schedule-items/{id}/share
  • POST /schedule-items/{id}/accept
  • POST /schedule-items/{id}/reject

Implication:

  • The product itself already separates these business operations.
  • The ambiguity exists in the agent CLI input contract, not in the underlying app/domain design.

3. Problem Statement

The current one-tool design has the right high-level direction but the wrong action protocol.

3.1 What was correct in the previous refactor

The following direction remains valid and should be preserved:

  1. One AgentScope tool entry (project_cli) is preferable to many domain tools for token control.
  2. AgentScope skills should be the mechanism for teaching the model when and how to use the tool.
  3. Tool outputs should remain structured and machine-oriented.
  4. AG-UI/UI-schema compilation should remain backend-owned.
  5. The worker should not receive all tool knowledge eagerly.

3.2 What is no longer acceptable

The following parts of the previous CLI protocol should be replaced:

  1. command + subcommand + args as the model-facing protocol.
  2. Ambiguous action names such as read that cover more than one business intent.
  3. Loose args: dict[str, Any] semantics that encourage field guessing.
  4. Legacy alias drift such as start_time/end_time, event_timezone, and other migration leftovers.
  5. Runtime dependence on long prose skill files instead of short execution-oriented action cards.

3.3 Why the old CLI shape fails even though the single-tool strategy is good

The current single-tool protocol is too generic for a small model.

The worker must infer, from weak labels like read, all of the following at once:

  1. Which business object is involved.
  2. Whether the user wants a list or one detail record.
  3. Which fields are mandatory for that specific subcommand.
  4. Which field names are canonical.

This moves too much burden from the runtime protocol into model guesswork.

The result is not just correctness risk. It also increases token cost because the worker burns iterations learning through failure.

4. Design Principles

4.1 Keep exactly one tool

The worker should continue to see one executable tool:

  • project_cli

Reason:

  • avoids multi-tool selection overhead
  • avoids injecting many tool schemas into every model call
  • preserves a stable tool surface for worker prompting

4.2 Move model-facing semantics from CLI history to business actions

The model-facing protocol should describe business intent directly, not technical command-tree history.

Replace:

{
  "command": "calendar",
  "subcommand": "read",
  "args": {}
}

With:

{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "event",
    "event_id": "<uuid>"
  }
}

This preserves one tool while making the business contract explicit.

4.3 Use progressive disclosure for skill knowledge, not for raw global schema exposure

The worker should not receive all method definitions by default.

Instead:

  1. Read a short skill index first.
  2. Read the relevant method card only when necessary.
  3. Call project_cli with the chosen module/method/input payload.

This keeps the token budget focused on the current business scenario.

4.4 Server-side validation stays strict even if the tool schema stays thin

To avoid a large tool schema, project_cli may expose only a thin outer schema:

  • module
  • method
  • input

Strict validation then happens server-side by dispatching module + method to the corresponding Pydantic model.

For calendar reads, the input must use strong typed domain values at the schema boundary:

  • day reads: date
  • range reads: timezone-aware datetime
  • single-event reads: UUID

The transport remains JSON, but the backend contract must validate these as typed values immediately instead of accepting arbitrary strings and reparsing them later.

This preserves strictness without forcing the entire action matrix into the model context.
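
A minimal sketch of typed validation at the schema boundary follows. It uses only the standard library so it runs anywhere; the document's actual handlers are described as strict Pydantic models, which this does not reproduce. The mode value "event" comes from section 4.2, while "day" and "range" and all class and function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, datetime
from uuid import UUID


@dataclass(frozen=True)
class DayRead:
    day: date
    timezone: str


@dataclass(frozen=True)
class RangeRead:
    start_at: datetime
    end_at: datetime


@dataclass(frozen=True)
class EventRead:
    event_id: UUID


def _text(raw: dict, key: str) -> str:
    value = raw.get(key)
    if not isinstance(value, str):
        raise ValueError(f"{key} is required and must be a string")
    return value


def _aware(raw: dict, key: str) -> datetime:
    parsed = datetime.fromisoformat(_text(raw, key))
    if parsed.utcoffset() is None:
        raise ValueError(f"{key} must carry a UTC offset")
    return parsed


def _only(raw: dict, allowed: set) -> None:
    # Mirrors the intent of extra='forbid': unknown keys are rejected, not ignored.
    extra = sorted(set(raw) - allowed)
    if extra:
        raise ValueError(f"unexpected fields: {extra}")


def parse_calendar_read(raw: dict):
    """Turn a JSON-decoded read input into a typed value, or raise ValueError."""
    mode = raw.get("mode")
    if mode == "day":
        _only(raw, {"mode", "date", "timezone"})
        return DayRead(date.fromisoformat(_text(raw, "date")), _text(raw, "timezone"))
    if mode == "range":
        _only(raw, {"mode", "start_at", "end_at"})
        return RangeRead(_aware(raw, "start_at"), _aware(raw, "end_at"))
    if mode == "event":
        _only(raw, {"mode", "event_id"})
        return EventRead(UUID(_text(raw, "event_id")))
    raise ValueError(f"unknown mode: {mode!r}")
```

Because parsing happens once at the boundary, downstream code receives date, datetime, and UUID objects and never reparses strings.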

4.5 No broad backward-compatibility layer

This redesign should not preserve old field aliases or broad coercion behavior.

Specifically, the phased implementation should remove or reject:

  • args as JSON string
  • start_time/end_time
  • event_timezone
  • action overloading under read

The system should fail clearly and structurally instead of guessing.
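
As an illustration of failing clearly instead of guessing, the sketch below rejects the legacy shapes listed above. The function name and message wording are hypothetical, and the mapping from each legacy alias to a canonical name is an assumption inferred from the canonical fields used elsewhere in this document.

```python
# Assumed mapping: legacy alias -> canonical field name.
LEGACY_FIELDS = {
    "start_time": "start_at",
    "end_time": "end_at",
    "event_timezone": "timezone",
}


def reject_legacy_input(value) -> list:
    """Return a list of problems; an empty list means no legacy shape was found."""
    if isinstance(value, str):
        # The old transport allowed args as a JSON string; it is no longer decoded.
        return ["input must be a JSON object, not a JSON-encoded string"]
    if not isinstance(value, dict):
        return ["input must be a JSON object"]
    return [
        f"'{old}' is not accepted; use '{new}'"
        for old, new in LEGACY_FIELDS.items()
        if old in value
    ]
```

The check names the canonical field in the message but never rewrites the payload, so the model learns the correct name instead of relying on coercion.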

5. Target Architecture

5.1 Runtime responsibilities

Router

The router remains a direct structured output stage.

It should continue to decide:

  • the objective
  • whether tool evidence is required

It should be extended to optionally provide stronger execution hints:

  • selected_skill
  • intended_action
  • known_entities
  • known_time_range
  • missing_fields

These fields do not make the router execute tools. They exist to reduce the worker's exploration cost.
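
One possible shape for the optional hints is sketched below. The five field names come from the list above; their types, the class name, and the ready_to_call helper are assumptions and not the existing RouterAgentOutput schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RouterExecutionHints:
    """Optional hints attached to router output; every field may be absent."""

    selected_skill: Optional[str] = None
    intended_action: Optional[str] = None
    known_entities: dict = field(default_factory=dict)
    known_time_range: Optional[dict] = None
    missing_fields: list = field(default_factory=list)

    def ready_to_call(self) -> bool:
        """True when the worker can attempt a tool call without further reading."""
        has_target = bool(self.selected_skill and self.intended_action)
        return has_target and not self.missing_fields


hints = RouterExecutionHints(
    selected_skill="calendar",
    intended_action="get_event",
    known_entities={"event_id": "<uuid>"},
)
```

With hints like these, the worker can go straight to the matching action card, or skip reading entirely when nothing is missing.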

Worker

The worker remains the only ReAct stage.

Worker changes in this redesign:

  1. Explicitly set max_iters=7.
  2. Keep temperature unchanged.
  3. Stop treating context_messages as worker configuration, since the worker does not consume it.
  4. Prefer router execution hints before reading additional skill files.
  5. Read the smallest relevant skill file possible before tool use.

Tool

The worker still sees only one business tool plus the skill reader:

  • project_cli
  • view_skill_file

project_cli is the execution boundary. view_skill_file is the progressive-disclosure knowledge boundary.

5.2 New project_cli model-facing input contract

The new canonical model-facing payload is:

{
  "skill": "calendar",
  "action": "get_event",
  "input": {
    "event_id": "<uuid>"
  }
}

Field meanings:

  • skill: enabled business skill namespace
  • action: concrete business operation inside the skill
  • input: strict action-specific payload

This is still one tool call. The worker is not choosing among many tools.

5.3 Calendar action protocol

The calendar skill should be redesigned around real business actions derived from schedule_items and schedule_subscriptions.

Event actions

  1. list_day
  2. list_range
  3. get_event
  4. create_event
  5. update_event
  6. delete_event

Subscription actions

  1. invite_subscriber
  2. accept_invite
  3. reject_invite

Why this action set

This set directly maps to current product behavior:

  • user asks what is scheduled today -> list_day
  • user asks what is scheduled this week -> list_range
  • user asks for a known event's details -> get_event
  • user creates or edits a schedule item -> create_event / update_event
  • user removes a schedule item -> delete_event
  • user invites another person -> invite_subscriber
  • invite recipient responds -> accept_invite / reject_invite

This avoids overloading one label like read for two distinct business tasks.
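
The action set above can be served by one entry point with a lookup table. This is a sketch under assumptions: the handlers are stubs, the "success" status value and the UNKNOWN_ACTION code are hypothetical, and the envelope follows the skill/action/input shape of section 5.2.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[dict], dict]

CALENDAR_ACTIONS = (
    "list_day", "list_range", "get_event",
    "create_event", "update_event", "delete_event",
    "invite_subscriber", "accept_invite", "reject_invite",
)


def _stub(action: str) -> Handler:
    """Build a placeholder handler that echoes its input."""
    def handler(payload: dict) -> dict:
        return {"status": "success", "action": action, "input": payload}
    return handler


REGISTRY: Dict[Tuple[str, str], Handler] = {
    ("calendar", action): _stub(action) for action in CALENDAR_ACTIONS
}


def project_cli(request: dict) -> dict:
    """Single tool entry point: dispatch on (skill, action), never on free text."""
    skill, action = request.get("skill"), request.get("action")
    handler = REGISTRY.get((skill, action))
    if handler is None:
        return {
            "status": "failure",
            "error": {
                "code": "UNKNOWN_ACTION",
                "skill": skill,
                "action": action,
                "available_actions": sorted(a for s, a in REGISTRY if s == skill),
            },
        }
    return handler(request.get("input") or {})
```

An overloaded label such as read has no entry in the table, so it fails immediately with the list of valid actions.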

5.4 Canonical calendar action shapes

list_day

{
  "skill": "calendar",
  "action": "list_day",
  "input": {
    "date": "2026-04-23",
    "timezone": "Asia/Shanghai"
  }
}

list_range

{
  "skill": "calendar",
  "action": "list_range",
  "input": {
    "start_at": "2026-04-23T00:00:00+08:00",
    "end_at": "2026-04-24T00:00:00+08:00"
  }
}

get_event

{
  "skill": "calendar",
  "action": "get_event",
  "input": {
    "event_id": "<uuid>"
  }
}

create_event

{
  "skill": "calendar",
  "action": "create_event",
  "input": {
    "title": "Project sync",
    "start_at": "2026-04-23T16:00:00+08:00",
    "end_at": "2026-04-23T17:00:00+08:00",
    "timezone": "Asia/Shanghai",
    "description": "optional",
    "metadata": {
      "location": "optional",
      "reminder_minutes": 30,
      "color": "blue",
      "notes": "optional"
    }
  }
}

update_event

{
  "skill": "calendar",
  "action": "update_event",
  "input": {
    "event_id": "<uuid>",
    "patch": {
      "title": "Updated title",
      "start_at": "2026-04-23T18:00:00+08:00",
      "timezone": "Asia/Shanghai",
      "status": "archived"
    }
  }
}

delete_event

{
  "skill": "calendar",
  "action": "delete_event",
  "input": {
    "event_id": "<uuid>"
  }
}

invite_subscriber

{
  "skill": "calendar",
  "action": "invite_subscriber",
  "input": {
    "event_id": "<uuid>",
    "invitee": {
      "phone": "+8613812345678"
    },
    "permissions": {
      "view": true,
      "edit": false,
      "invite": false
    }
  }
}

accept_invite

{
  "skill": "calendar",
  "action": "accept_invite",
  "input": {
    "event_id": "<uuid>"
  }
}

reject_invite

{
  "skill": "calendar",
  "action": "reject_invite",
  "input": {
    "event_id": "<uuid>"
  }
}
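
The list_day and list_range shapes are related: a day query can be expanded into a half-open range server-side. The sketch below assumes that expansion, which the document does not specify, and uses a fixed +08:00 offset as a stand-in for the Asia/Shanghai zone so that it runs without a timezone database.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo


def day_bounds(day: date, tz: tzinfo) -> tuple:
    """Return [start, end) for one local calendar day as timezone-aware datetimes."""
    start = datetime.combine(day, time.min, tzinfo=tz)
    end = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)
    return start, end


# Fixed-offset stand-in; a real handler would resolve the IANA name instead.
shanghai = timezone(timedelta(hours=8))
start_at, end_at = day_bounds(date(2026, 4, 23), shanghai)
```

Here start_at.isoformat() and end_at.isoformat() give the same start_at and end_at strings as the list_range example above. In a real handler, zoneinfo.ZoneInfo("Asia/Shanghai") from the standard library could be passed as tz so that zones with daylight saving are handled by wall-clock date.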

5.5 Skill packaging for progressive disclosure

The calendar skill should no longer be one long explanatory page that the worker must read in full.

Recommended structure:

calendar/
  SKILL.md               # very short index / navigation card
  actions/
    list_day.md
    list_range.md
    get_event.md
    create_event.md
    update_event.md
    delete_event.md
    invite_subscriber.md
    accept_invite.md
    reject_invite.md

SKILL.md responsibilities

  • describe when calendar skill is relevant
  • list all actions in one screen
  • say which action to use for known event_id
  • say which action to use for date/range queries
  • point to action files for exact payloads

Action file responsibilities

Each action file should contain only:

  1. when to use the action
  2. required fields
  3. optional fields
  4. one canonical example
  5. forbidden field names and common mistakes

This makes view_skill_file a real progressive-disclosure mechanism instead of a markdown dump.
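
The layout above implies a simple reading plan: index first, then at most one action card. The sketch below derives those relative paths. The function names and the identifier pattern are assumptions; the pattern keeps a model-supplied action name from escaping the skill directory.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Assumed naming rule: lowercase identifiers with digits and underscores.
_NAME = re.compile(r"[a-z][a-z0-9_]*")


def action_card_path(skill: str, action: str) -> str:
    """Return the relative path of one action card, rejecting unsafe names."""
    for part in (skill, action):
        if not _NAME.fullmatch(part):
            raise ValueError(f"invalid skill or action name: {part!r}")
    return str(PurePosixPath(skill) / "actions" / f"{action}.md")


def reading_plan(skill: str, action: Optional[str] = None) -> list:
    """Files to read, smallest first: the index card, then one action card."""
    if not _NAME.fullmatch(skill):
        raise ValueError(f"invalid skill name: {skill!r}")
    plan = [str(PurePosixPath(skill) / "SKILL.md")]
    if action is not None:
        plan.append(action_card_path(skill, action))
    return plan
```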

5.6 Error contract for self-correction

The redesigned CLI should return structured action-level validation feedback.

Canonical error example:

{
  "status": "failure",
  "error": {
    "code": "INVALID_ACTION_INPUT",
    "message": "action list_range requires start_at and end_at",
    "skill": "calendar",
    "action": "list_range",
    "missing_fields": ["start_at", "end_at"],
    "unexpected_fields": ["event_id"],
    "suggested_alternative_actions": ["get_event"]
  }
}

This is intentionally more corrective than the current generic INVALID_ARGUMENT payload.
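
A sketch of how such a payload could be assembled from a declared field list follows. The function name, its parameters, and the alternatives mapping (unexpected field name to the action that accepts it) are hypothetical; the output keys match the canonical example above.

```python
from typing import Optional, Sequence


def check_action_input(
    skill: str,
    action: str,
    payload: dict,
    required: Sequence[str],
    optional: Sequence[str] = (),
    alternatives: Optional[dict] = None,
) -> Optional[dict]:
    """Return None when the payload is acceptable, else a structured error."""
    alternatives = alternatives or {}
    allowed = set(required) | set(optional)
    # Preserve declaration order so the message reads naturally.
    missing = [name for name in required if name not in payload]
    unexpected = [name for name in payload if name not in allowed]
    if not missing and not unexpected:
        return None
    if missing:
        message = f"action {action} requires {' and '.join(missing)}"
    else:
        message = f"action {action} does not accept {', '.join(unexpected)}"
    return {
        "status": "failure",
        "error": {
            "code": "INVALID_ACTION_INPUT",
            "message": message,
            "skill": skill,
            "action": action,
            "missing_fields": missing,
            "unexpected_fields": unexpected,
            "suggested_alternative_actions": sorted(
                {alternatives[name] for name in unexpected if name in alternatives}
            ),
        },
    }


error = check_action_input(
    "calendar", "list_range", {"event_id": "<uuid>"},
    required=("start_at", "end_at"),
    alternatives={"event_id": "get_event"},
)
```

For the previously failing call, the worker receives both the reason and the next action to try in a single response.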

6. Token and Cost Control Strategy

6.1 Preserve single-tool economy

The main token-saving choice is to preserve one executable business tool.

This avoids:

  • multiple tool schemas in each worker call
  • model confusion over which tool to pick first
  • large repeated tool descriptions in every turn

6.2 Replace global knowledge with scoped reading

The worker should read:

  1. router execution hints first
  2. skill index second
  3. one action card if needed

This is cheaper than injecting the entire action matrix into every prompt.

6.3 Stop spending iterations on protocol discovery

The redesign reduces cost not by suppressing useful reasoning, but by removing the need for repeated failed exploration.

The worker should no longer need multiple failed attempts to discover:

  • whether event_id belongs to read
  • whether start_time is valid
  • whether event_timezone is accepted

6.4 Concrete worker settings for this redesign

  • set worker max_iters=7
  • keep worker temperature unchanged
  • remove or ignore the worker context_messages configuration at runtime

6.5 Explicit non-goals in this task

This task does not include:

  • changing router into a ReAct stage
  • lowering worker temperature
  • adding duplicate-failure circuit breakers yet
  • exposing many separate AgentScope tools again

7. Migration Plan

Phase 0: Planning and protocol design

  1. Write this PRD and implementation checklist.
  2. Update protocol docs before runtime code changes.
  3. Record rejected alternatives and reasoning.

Phase 1: Backend runtime contract

  1. Extend router output schema with optional execution hints.
  2. Explicitly set worker max_iters=7.
  3. Remove semantic reliance on worker context_messages.
  4. Redesign project_cli request payload as skill/action/input.

Phase 2: Calendar action dispatch

  1. Replace current calendar command/subcommand routing with action dispatch.
  2. Implement strict action-specific Pydantic models.
  3. Remove legacy alias handling and generic dict coercion.
  4. Return structured correction-oriented validation errors.

Phase 3: Skill refactor

  1. Rewrite calendar/SKILL.md as a short index card.
  2. Add per-action action-card files.
  3. Update skill instructions so worker reads only what is needed.

Phase 4: Cross-layer alignment

  1. Update relevant protocol docs.
  2. Keep frontend consumption stable where possible.
  3. Ensure tool result and AG-UI event semantics remain compatible.

Phase 5: Verification

  1. Reproduce the previous failure case and confirm it routes to get_event.
  2. Verify create-event flow uses canonical names only.
  3. Verify range/day queries still work.
  4. Verify invite/accept/reject flows map to current schedule subscription behavior.

8. Rejected Alternatives

8.1 Rejected: split back into many tools

Reason:

  • reintroduces tool-schema bloat
  • worsens tool-choice ambiguity
  • increases token overhead on every worker step

8.2 Rejected: keep command/subcommand/args and fix only the skill text

Reason:

  • the ambiguity is structural, not editorial
  • read still overloads distinct business operations
  • loose dict input still encourages field guessing

8.3 Rejected: put the full action schema into the tool prompt directly

Reason:

  • defeats progressive disclosure
  • grows the worker prompt on every turn
  • hurts cost and small-model reliability

9. Success Criteria

This redesign is successful only if all of the following are true:

  1. The worker still sees one executable business tool.
  2. The worker chooses calendar actions through business semantics, not command-tree guesswork.
  3. The previous repeated-failure case becomes a direct get_event call when event_id is known.
  4. The worker no longer relies on undocumented field aliases.
  5. The runtime protocol is strictly validated server-side.
  6. Skill reading is incremental and action-scoped.
  7. Worker iteration cost is bounded by max_iters=7.
  8. Backend, protocol docs, and frontend assumptions remain aligned.