social-app/.trellis/tasks/04-23-redesign-single-cli-skill-disclosure/prd.md

# Single CLI + Progressive Skill Disclosure Redesign PRD

## 1. Goal

This task redesigns the current agent tool protocol around the following confirmed product constraints:

1. The runtime should continue to expose exactly one business tool to the worker agent: `project_cli`.
2. The worker should learn how to use the tool through progressive skill disclosure instead of receiving a large global tool surface up front.
3. The current `command + subcommand + args` transport should be replaced with a business-action protocol that matches real product objects and user intents.
4. The redesign must remain grounded in the current repository's actual schedule domain:
   - `schedule_items`
   - `schedule_subscriptions`
5. The redesign must reduce wasted retries and token consumption without reintroducing the old multi-tool schema explosion.

This PRD does not propose a broad agent-platform rewrite. It is a focused redesign of how the single CLI tool, skills, router output, and worker execution contract should work together.

## 2. Confirmed Repository Facts

### 2.1 Router is not ReAct

The router is a direct structured generation stage, not a ReAct loop.

Confirmed in:

- `backend/src/core/agentscope/runtime/runner.py:310`
- `backend/src/core/agentscope/runtime/runner.py:325`

`_run_router_stage()` uses `finalize_json_response(...)` and returns one `RouterAgentOutput` payload.

Implication:

- Router cost control depends on prompt/schema size and retries in `finalize_json_response`, not `max_iters`.
- Tool-choice ambiguity is a worker problem, not a router ReAct problem.

### 2.2 Worker is the only ReAct loop

The worker uses `JsonReActAgent`, which subclasses AgentScope `ReActAgent`.

Confirmed in:

- `backend/src/core/agentscope/runtime/json_react_agent.py`
- `backend/src/core/agentscope/runtime/runner.py:495`

The current code does not pass an explicit `max_iters`, so the worker inherits AgentScope's default.

Confirmed externally in the local environment by inspecting the installed `ReActAgent.__init__` signature:

- `max_iters=10`

Implication:

- The worker currently has too much room to repeat invalid tool calls before failing.
- This task will explicitly set worker `max_iters=7`.

### 2.3 Worker does not consume context_messages

The worker receives only the router contract message and not the original `context_messages` list.

Confirmed in:

- `backend/src/core/agentscope/runtime/runner.py:265`
- `backend/src/core/agentscope/runtime/runner.py:285`
- `backend/src/core/agentscope/runtime/runner.py:461`

Implication:

- `worker.config.context_messages` is currently semantically misleading.
- Router history context remains important.
- Worker runtime context should come from router output, system prompt, tool results, and optional memory, not duplicated chat history configuration.

### 2.4 Latest failure was caused by protocol mismatch, not missing data

Latest messages read from Supabase showed the following failure pattern:

- Worker repeatedly called `project_cli`
- Payload shape: `command=calendar`, `subcommand=read`, `args={"event_id": "..."}`
- Backend returned `INVALID_ARGUMENT: start_at and end_at are required`
- The same invalid call repeated until the worker exhausted the default ReAct limit

This proves:

1. The worker knew the event identifier.
2. The current CLI protocol did not expose a clear "get one event by id" action.
3. The current naming (`read`) encouraged the worker to map both range listing and single-event detail lookup onto one ambiguous command.
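
The failure loop can be reproduced in miniature. The sketch below is a toy model, not the repository's code: `legacy_calendar_read` and `run_worker` are hypothetical names, and the only behaviors taken from the observed messages are the error string and the fact that the same call repeated until the iteration limit was reached.

```python
from __future__ import annotations

from typing import Any


def legacy_calendar_read(args: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for the old `calendar read` handler.

    It only understands range listing, so a lookup by `event_id`
    fails with the error seen in the stored messages.
    """
    if "start_at" not in args or "end_at" not in args:
        return {
            "ok": False,
            "error": "INVALID_ARGUMENT: start_at and end_at are required",
        }
    return {"ok": True, "items": []}


def run_worker(max_iters: int) -> int:
    """Toy loop: repeat the same call until success or the limit.

    Returns the number of failed tool calls.
    """
    failed_calls = 0
    args = {"event_id": "00000000-0000-0000-0000-000000000000"}
    for _ in range(max_iters):
        result = legacy_calendar_read(args)
        if result["ok"]:
            break
        # The error names range fields only, so it offers nothing that
        # helps a by-id lookup; the toy worker retries unchanged.
        failed_calls += 1
    return failed_calls
```

In this model every iteration is wasted, so the iteration limit is the only thing bounding the cost of the ambiguity.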

### 2.5 The current calendar domain is already split into two real business objects

Database evidence:

- `public.schedule_items`
- `public.schedule_subscriptions`

Current schema highlights:

`schedule_items`

- `id`
- `owner_id`
- `title`
- `description`
- `start_at`
- `end_at`
- `timezone`
- `metadata`
- `recurrence_rule`
- `source_type`
- `status`

`schedule_subscriptions`

- `item_id`
- `subscriber_id`
- `permission`
- `notify_level`
- `status`

Current backend routes and services already reflect this split:

- list events by range
- get event by id
- create event
- update event
- delete event
- share/invite event
- accept subscription
- reject subscription

Confirmed in:

- `backend/src/v1/schedule_items/router.py`
- `backend/src/v1/schedule_items/service.py`

### 2.6 Frontend already distinguishes list vs detail vs invite flows

Confirmed in:

- `apps/lib/features/calendar/data/apis/calendar_api.dart`
- `apps/lib/features/calendar/data/repositories/calendar_repository.dart`

The frontend already calls:

- `GET /schedule-items?start_at&end_at`
- `GET /schedule-items/{id}`
- `POST /schedule-items/{id}/share`
- `POST /schedule-items/{id}/accept`
- `POST /schedule-items/{id}/reject`

Implication:

- The product itself already separates these business operations.
- The ambiguity exists in the agent CLI input contract, not in the underlying app/domain design.

## 3. Problem Statement

The current one-tool design has the right high-level direction but the wrong action protocol.

### 3.1 What was correct in the previous refactor

The following direction remains valid and should be preserved:

1. One AgentScope tool entry (`project_cli`) is preferable to many domain tools for token control.
2. AgentScope skills should be the mechanism for teaching the model when and how to use the tool.
3. Tool outputs should remain structured and machine-oriented.
4. AG-UI/UI-schema compilation should remain backend-owned.
5. The worker should not receive all tool knowledge eagerly.

### 3.2 What is no longer acceptable

The following parts of the previous CLI protocol should be replaced:

1. `command + subcommand + args` as the model-facing protocol.
2. Ambiguous action names such as `read` that cover more than one business intent.
3. Loose `args: dict[str, Any]` semantics that encourage field guessing.
4. Legacy alias drift such as `start_time/end_time`, `event_timezone`, and other migration leftovers.
5. Runtime dependence on long prose skill files instead of short execution-oriented action cards.

### 3.3 Why the old CLI shape fails even though the single-tool strategy is good

The current single-tool protocol is too generic for a small model.

The worker must infer, from weak labels like `read`, all of the following at once:

1. Which business object is involved.
2. Whether the user wants a list or one detail record.
3. Which fields are mandatory for that specific subcommand.
4. Which field names are canonical.

This moves too much burden from the runtime protocol into model guesswork.

The result is not just correctness risk. It also increases token cost because the worker burns iterations learning through failure.

## 4. Design Principles

### 4.1 Keep exactly one tool

The worker should continue to see one executable tool:

- `project_cli`

Reason:

- avoids multi-tool selection overhead
- avoids injecting many tool schemas into every model call
- preserves a stable tool surface for worker prompting

### 4.2 Move model-facing semantics from CLI history to business actions

The model-facing protocol should describe business intent directly, not technical command-tree history.

Replace:

```json
{
  "command": "calendar",
  "subcommand": "read",
  "args": {}
}
```

With:

```json
{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "event",
    "event_id": "<uuid>"
  }
}
```

This preserves one tool while making the business contract explicit.

### 4.3 Use progressive disclosure for skill knowledge, not for raw global schema exposure

The worker should not receive all method definitions by default.

Instead:

1. Read a short skill index first.
2. Read the relevant method card only when necessary.
3. Call `project_cli` with the chosen `module/method/input` payload.

This keeps the token budget focused on the current business scenario.

### 4.4 Server-side validation stays strict even if the tool schema stays thin

To avoid a large tool schema, `project_cli` may expose only a thin outer schema:

- `module`
- `method`
- `input`

Strict validation then happens server-side by dispatching `module + method` to the corresponding Pydantic model.

For calendar reads, the input must use strong typed domain values at the schema boundary:

- day reads: `date`
- range reads: timezone-aware `datetime`
- single-event reads: `UUID`

The transport remains JSON, but the backend contract must validate these as typed values immediately instead of accepting arbitrary strings and reparsing them later.

This preserves strictness without forcing the entire action matrix into the model context.
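
A minimal sketch of this dispatch, under stated assumptions: it uses stdlib dataclasses as a dependency-free stand-in for the Pydantic models described above, and `ReadDay`, `ReadRange`, `ReadEvent`, `parse_calendar_read`, and `validate_call` are hypothetical names. Only the `read` method is covered, and failures surface as plain `ValueError` rather than a structured error payload.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, datetime
from typing import Any, Union
from uuid import UUID


@dataclass(frozen=True)
class ReadDay:
    day: date          # parsed from the `date` input field
    timezone: str


@dataclass(frozen=True)
class ReadRange:
    start_at: datetime  # always timezone-aware
    end_at: datetime


@dataclass(frozen=True)
class ReadEvent:
    event_id: UUID


def _require_exact(payload: dict[str, Any], fields: tuple[str, ...]) -> None:
    """Reject both missing and unexpected fields for the chosen mode."""
    expected = {"mode", *fields}
    if set(payload) != expected:
        raise ValueError(f"expected fields {sorted(expected)}, got {sorted(payload)}")


def _aware(value: str, name: str) -> datetime:
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError(f"{name} must be timezone-aware")
    return parsed


def parse_calendar_read(payload: dict[str, Any]) -> Union[ReadDay, ReadRange, ReadEvent]:
    """Dispatch on `mode` and convert strings to typed values immediately."""
    mode = payload.get("mode")
    if mode == "day":
        _require_exact(payload, ("date", "timezone"))
        return ReadDay(day=date.fromisoformat(payload["date"]), timezone=payload["timezone"])
    if mode == "range":
        _require_exact(payload, ("start_at", "end_at"))
        return ReadRange(_aware(payload["start_at"], "start_at"), _aware(payload["end_at"], "end_at"))
    if mode == "event":
        _require_exact(payload, ("event_id",))
        return ReadEvent(event_id=UUID(payload["event_id"]))
    raise ValueError(f"unknown read mode: {mode!r}")


# (module, method) -> parser for that method's input; only `read` is sketched.
INPUT_PARSERS = {("calendar", "read"): parse_calendar_read}


def validate_call(call: dict[str, Any]) -> Union[ReadDay, ReadRange, ReadEvent]:
    """Thin outer schema: look up the strict parser by module + method."""
    key = (call["module"], call["method"])
    if key not in INPUT_PARSERS:
        raise ValueError(f"unsupported method: {key}")
    return INPUT_PARSERS[key](call["input"])
```

The model only ever sees the three outer keys; the per-mode field rules live in the parser table and never enter the tool schema.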

### 4.5 No broad backward-compatibility layer

This redesign should not preserve old field aliases or broad coercion behavior.

Specifically, the phased implementation should remove or reject:

- `args` as JSON string
- `start_time/end_time`
- `event_timezone`
- implicit action overloading under `read` (a `read` call without an explicit `mode`)

The system should fail clearly and structurally instead of guessing.
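
One way to make that rejection explicit is a pre-validation check that names each retired shape. This is a sketch only: `legacy_violations` and the two constant sets are hypothetical, and treating a `read` call without `mode` as a violation is an assumed interpretation of "action overloading".

```python
from __future__ import annotations

from typing import Any

# Retired transport keys and retired field aliases listed in this PRD.
LEGACY_TOP_LEVEL = {"command", "subcommand", "args"}
LEGACY_INPUT_FIELDS = {"start_time", "end_time", "event_timezone"}


def legacy_violations(call: dict[str, Any]) -> list[str]:
    """Return the reasons why `call` still uses the retired protocol."""
    problems: list[str] = []
    for key in sorted(LEGACY_TOP_LEVEL & set(call)):
        problems.append(f"legacy top-level field: {key}")

    payload = call.get("input")
    if isinstance(payload, str):
        # No coercion: a JSON string is rejected, never reparsed.
        problems.append("input must be a JSON object, not a JSON string")
    elif isinstance(payload, dict):
        for key in sorted(LEGACY_INPUT_FIELDS & set(payload)):
            problems.append(f"legacy input field: {key}")
        # Assumption: `read` without `mode` counts as overloading.
        if call.get("method") == "read" and "mode" not in payload:
            problems.append("read requires an explicit mode")
    return problems
```

An empty list means the call may proceed to method-specific validation; anything else should be returned to the worker as a structured error rather than silently repaired.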

## 5. Target Architecture

### 5.1 Runtime responsibilities

#### Router

The router remains a direct structured output stage.

It should continue to decide:

- the objective
- whether tool evidence is required

It should be extended to optionally provide stronger execution hints:

- `selected_skill`
- `intended_action`
- `known_entities`
- `known_time_range`
- `missing_fields`

These fields do not make the router execute tools. They exist to reduce the worker's exploration cost.

#### Worker

The worker remains the only ReAct stage.

Worker changes in this redesign:

1. Explicitly set `max_iters=7`.
2. Keep `temperature` unchanged.
3. Stop treating `context_messages` as worker configuration, because the worker never consumes it.
4. Prefer router execution hints before reading additional skill files.
5. Read the smallest relevant skill file possible before tool use.

#### Tool

The worker still sees exactly one business tool, plus the skill reader:

- `project_cli`
- `view_skill_file`

`project_cli` is the execution boundary.
`view_skill_file` is the progressive-disclosure knowledge boundary.
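
The router execution hints and the "smallest relevant skill file" rule can be sketched together. Everything below is an assumption for illustration: the PRD fixes only the hint field names, so the field types, the `ExecutionHints` class, and `first_skill_file` are hypothetical, and the file paths follow the layout proposed in section 5.5.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExecutionHints:
    """Optional router hints; field types are assumed, names come from the PRD."""

    selected_skill: Optional[str] = None
    intended_action: Optional[str] = None
    known_entities: dict[str, str] = field(default_factory=dict)
    known_time_range: Optional[tuple[str, str]] = None
    missing_fields: list[str] = field(default_factory=list)


def first_skill_file(hints: ExecutionHints) -> Optional[str]:
    """Pick the smallest skill file the worker should read first.

    - skill and action known: go straight to that action card
    - only skill known: read the short index card
    - nothing known: no hint-driven choice is possible
    """
    if hints.selected_skill and hints.intended_action:
        return f"{hints.selected_skill}/actions/{hints.intended_action}.md"
    if hints.selected_skill:
        return f"{hints.selected_skill}/SKILL.md"
    return None
```

For example, hints carrying `selected_skill="calendar"` and `intended_action="get_event"` would send the worker directly to `calendar/actions/get_event.md`, skipping the index entirely.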

### 5.2 New `project_cli` model-facing input contract

The canonical model-facing payload is:

```json
{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "event",
    "event_id": "<uuid>"
  }
}
```

Field meanings:

- `module`: enabled business module namespace (calendar, contacts, memory)
- `method`: concrete business operation inside the module
- `input`: strict method-specific payload

This is still one tool call. The worker is not choosing among many tools.

### 5.3 Calendar method protocol

The calendar module exposes the following methods registered in the CLI router:

| Module   | Method         | Handler                          | Input Shape |
|----------|----------------|----------------------------------|-------------|
| calendar | read           | `handle_calendar_list_day`       | discriminated by `mode` |
| calendar | create         | `handle_calendar_create_event`   | title, start_at, timezone, ... |
| calendar | update         | `handle_calendar_update_event`   | event_id + patch |
| calendar | delete         | `handle_calendar_delete_event`   | event_id |
| calendar | share          | `handle_calendar_invite_subscriber` | event_id, invitee, permissions |
| calendar | accept_invite  | `handle_calendar_accept_invite`  | event_id |
| calendar | reject_invite  | `handle_calendar_reject_invite`  | event_id |

The `read` method takes a discriminated union keyed on the `mode` field and dispatches internally to `list_day`, `list_range`, or `get_event`.

Because each mode has its own required fields, `read` no longer silently covers distinct business tasks under one ambiguous label.

### 5.4 Canonical calendar method shapes

#### `read` with mode=day (list one day)

```json
{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "day",
    "date": "2026-04-23",
    "timezone": "Asia/Shanghai"
  }
}
```

#### `read` with mode=range (list time range)

```json
{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "range",
    "start_at": "2026-04-23T00:00:00+08:00",
    "end_at": "2026-04-24T00:00:00+08:00"
  }
}
```

#### `read` with mode=event (get by ID)

```json
{
  "module": "calendar",
  "method": "read",
  "input": {
    "mode": "event",
    "event_id": "<uuid>"
  }
}
```

#### `create`

```json
{
  "module": "calendar",
  "method": "create",
  "input": {
    "title": "Project sync",
    "start_at": "2026-04-23T16:00:00+08:00",
    "end_at": "2026-04-23T17:00:00+08:00",
    "timezone": "Asia/Shanghai",
    "description": "optional",
    "metadata": {
      "location": "optional",
      "reminder_minutes": 30,
      "color": "blue",
      "notes": "optional"
    }
  }
}
```

#### `update`

```json
{
  "module": "calendar",
  "method": "update",
  "input": {
    "event_id": "<uuid>",
    "patch": {
      "title": "Updated title",
      "start_at": "2026-04-23T18:00:00+08:00",
      "timezone": "Asia/Shanghai",
      "status": "archived"
    }
  }
}
```

#### `delete`

```json
{
  "module": "calendar",
  "method": "delete",
  "input": {
    "event_id": "<uuid>"
  }
}
```

#### `share`

```json
{
  "module": "calendar",
  "method": "share",
  "input": {
    "event_id": "<uuid>",
    "invitee": {
      "phone": "+8613812345678"
    },
    "permissions": {
      "view": true,
      "edit": false,
      "invite": false
    }
  }
}
```

#### `accept_invite`

```json
{
  "module": "calendar",
  "method": "accept_invite",
  "input": {
    "event_id": "<uuid>"
  }
}
```

#### `reject_invite`

```json
{
  "module": "calendar",
  "method": "reject_invite",
  "input": {
    "event_id": "<uuid>"
  }
}
```

### 5.5 Skill packaging for progressive disclosure

The calendar skill should no longer be one long explanatory page that the worker must read in full.

Recommended structure:

```text
calendar/
  SKILL.md               # very short index / navigation card
  actions/
    list_day.md
    list_range.md
    get_event.md
    create_event.md
    update_event.md
    delete_event.md
    invite_subscriber.md
    accept_invite.md
    reject_invite.md
```

#### `SKILL.md` responsibilities

- describe when the calendar skill is relevant
- list all actions on one screen
- say which action to use when `event_id` is known
- say which action to use for date/range queries
- point to action files for exact payloads

#### Action file responsibilities

Each action file should contain only:

1. when to use the action
2. required fields
3. optional fields
4. one canonical example
5. forbidden field names and common mistakes

This makes `view_skill_file` a real progressive-disclosure mechanism instead of a markdown dump.
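
The five-part card structure above could be enforced with a small lint. This is a sketch under assumptions: the heading names in `REQUIRED_SECTIONS`, the helper `missing_sections`, and the sample card text (including its "common mistakes" line) are all illustrative, not an agreed card format.

```python
from __future__ import annotations

# Assumed heading names, one per responsibility listed above.
REQUIRED_SECTIONS = (
    "When to use",
    "Required fields",
    "Optional fields",
    "Example",
    "Forbidden fields and common mistakes",
)


def missing_sections(card_text: str) -> list[str]:
    """Return the required sections that an action card does not contain."""
    headings = {
        line.lstrip("#").strip()
        for line in card_text.splitlines()
        if line.startswith("#")
    }
    return [name for name in REQUIRED_SECTIONS if name not in headings]


# Illustrative card content for actions/get_event.md.
GET_EVENT_CARD = """\
# get_event
## When to use
The event_id is already known.
## Required fields
mode, event_id
## Optional fields
none
## Example
{"module": "calendar", "method": "read", "input": {"mode": "event", "event_id": "<uuid>"}}
## Forbidden fields and common mistakes
Do not send start_at or end_at with mode=event.
"""
```

Running such a check in CI would keep cards from drifting back into long explanatory prose as new actions are added.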

### 5.6 Error contract for self-correction

The redesigned CLI returns structured validation feedback with field-level detail.

Canonical error example:

```json
{
  "ok": false,
  "module": "calendar",
  "method": "read",
  "error": {
    "code": "INVALID_ACTION_INPUT",
    "message": "input does not match method schema",
    "retryable": false,
    "details": {
      "missing_fields": ["start_at", "end_at"],
      "invalid_fields": [],
      "alias_corrections": {
        "start_time": "start_at"
      }
    }
  }
}
```

This is intentionally more corrective than the current generic `INVALID_ARGUMENT` payload.
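
A minimal sketch of how such a payload might be assembled, assuming the caller already knows the required and allowed fields for the method. `build_input_error` and `ALIASES` are hypothetical; the `start_time` mapping comes from the example above, while the `end_time` mapping is an assumed counterpart.

```python
from __future__ import annotations

from typing import Any

# Retired alias -> canonical field name.
ALIASES = {"start_time": "start_at", "end_time": "end_at"}


def build_input_error(
    module: str,
    method: str,
    payload: dict[str, Any],
    required: tuple[str, ...],
    allowed: tuple[str, ...],
) -> dict[str, Any]:
    """Build the structured validation error for one rejected call."""
    provided = set(payload)
    missing = [name for name in required if name not in provided]
    alias_corrections = {
        name: ALIASES[name] for name in sorted(provided) if name in ALIASES
    }
    # Unknown fields that are not recognized aliases are reported separately.
    invalid = sorted(
        name for name in provided if name not in allowed and name not in ALIASES
    )
    return {
        "ok": False,
        "module": module,
        "method": method,
        "error": {
            "code": "INVALID_ACTION_INPUT",
            "message": "input does not match method schema",
            "retryable": False,
            "details": {
                "missing_fields": missing,
                "invalid_fields": invalid,
                "alias_corrections": alias_corrections,
            },
        },
    }
```

Note that the alias is reported, not applied: the call is still rejected, which keeps the error corrective without reintroducing coercion.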

## 6. Token and Cost Control Strategy

### 6.1 Preserve single-tool economy

The main token-saving choice is to preserve one executable business tool.

This avoids:

- multiple tool schemas in each worker call
- model confusion over which tool to pick first
- large repeated tool descriptions in every turn

### 6.2 Replace global knowledge with scoped reading

The worker should read:

1. router execution hints first
2. skill index second
3. one action card if needed

This is cheaper than injecting the entire action matrix into every prompt.

### 6.3 Stop spending iterations on protocol discovery

The redesign reduces cost not by suppressing useful reasoning, but by removing the need for repeated failed exploration.

The worker should no longer need multiple failed attempts to discover:

- whether `event_id` belongs to `read`
- whether `start_time` is valid
- whether `event_timezone` is accepted

### 6.4 Concrete worker settings for this redesign

- set worker `max_iters=7`
- keep worker `temperature` unchanged
- remove the worker `context_messages` configuration, or ignore it at runtime

### 6.5 Explicit non-goals in this task

This task does not include:

- changing router into a ReAct stage
- lowering worker temperature
- adding duplicate-failure circuit breakers yet
- exposing many separate AgentScope tools again

## 7. Migration Plan

### Phase 0: Planning and protocol design

1. Write this PRD and implementation checklist.
2. Update protocol docs before runtime code changes.
3. Record rejected alternatives and reasoning.

### Phase 1: Backend runtime contract

1. Extend router output schema with optional execution hints.
2. Explicitly set worker `max_iters=7`.
3. Remove semantic reliance on worker `context_messages`.
4. Redesign `project_cli` request payload as `module/method/input`.

### Phase 2: Calendar action dispatch

1. Replace current calendar command/subcommand routing with action dispatch.
2. Implement strict action-specific Pydantic models.
3. Remove legacy alias handling and generic dict coercion.
4. Return structured correction-oriented validation errors.

### Phase 3: Skill refactor

1. Rewrite `calendar/SKILL.md` as a short index card.
2. Add per-action action-card files.
3. Update skill instructions so worker reads only what is needed.

### Phase 4: Cross-layer alignment

1. Update relevant protocol docs.
2. Keep frontend consumption stable where possible.
3. Ensure tool result and AG-UI event semantics remain compatible.

### Phase 5: Verification

1. Reproduce the previous failure case and confirm it routes to `get_event` through `read` with `mode=event`.
2. Verify create-event flow uses canonical names only.
3. Verify range/day queries still work.
4. Verify invite/accept/reject flows map to current schedule subscription behavior.

## 8. Rejected Alternatives

### 8.1 Rejected: split back into many tools

Reason:

- reintroduces tool-schema bloat
- worsens tool-choice ambiguity
- increases token overhead on every worker step

### 8.2 Rejected: keep `command/subcommand/args` and fix only the skill text

Reason:

- the ambiguity is structural, not editorial
- `read` still overloads distinct business operations
- loose dict input still encourages field guessing

### 8.3 Rejected: put the full action schema into the tool prompt directly

Reason:

- defeats progressive disclosure
- grows the worker prompt on every turn
- hurts cost and small-model reliability

## 9. Success Criteria

This redesign is successful only if all of the following are true:

1. The worker still sees one executable business tool.
2. The worker chooses calendar actions through business semantics, not command-tree guesswork.
3. The previous repeated-failure case becomes a direct `read` call with `mode=event` (dispatched to `get_event`) when `event_id` is known.
4. The worker no longer relies on undocumented field aliases.
5. The runtime protocol is strictly validated server-side.
6. Skill reading is incremental and action-scoped.
7. Worker iteration cost is bounded by `max_iters=7`.
8. Backend, protocol docs, and frontend assumptions remain aligned.