feat(agent): redesign project_cli with module/method/input protocol

- Replace command/subcommand/args with module/method/input envelope
- Calendar handler uses discriminated union (mode) for read operations
- Strict Pydantic models with extra='forbid' for all calendar methods
- Worker max_iters=7, router prompt simplified (removed project_cli_defaults)
- Skill index cards + per-action files for progressive disclosure
- Frontend/AG-UI aligned to module/method dispatch
- Protocol docs updated to module/method/input contract

WIP: action cards need envelope fix, 2 tests need update, memory handler needs Pydantic models.
2026-04-24 13:24:13 +08:00
parent ab526af2c4
commit d060962a5f
62 changed files with 4802 additions and 805 deletions
@@ -0,0 +1,3 @@
+{"file": ".opencode/commands/trellis/finish-work.md", "reason": "Finish work checklist"}
+{"file": ".opencode/commands/trellis/check-backend.md", "reason": "Backend check spec"}
+{"file": ".opencode/commands/trellis/check-frontend.md", "reason": "Frontend check spec"}
@@ -0,0 +1,2 @@
+{"file": ".opencode/commands/trellis/check-backend.md", "reason": "Backend check spec"}
+{"file": ".opencode/commands/trellis/check-frontend.md", "reason": "Frontend check spec"}
@@ -0,0 +1,173 @@
+# Decision Log: Single CLI + Progressive Skill Disclosure Redesign
+
+## Accepted Decisions
+
+### D1. Keep one executable tool
+
+Accepted.
+
+Reason:
+
+- lower tool-selection complexity
+- lower repeated schema exposure cost
+- consistent with the original valid direction of the CLI refactor
+
+### D2. Replace `command/subcommand/args` with `module/method/input`
+
+Accepted.
+
+Reason:
+
+- keeps one tool while removing ambiguous CLI-history semantics
+- aligns the worker-facing protocol with business intent
+- reduces repeated failure from action guessing
+- keeps skill files decoupled from the executable tool contract
+
+### D2.1 Remove `skill` from `project_cli`
+
+Accepted.
+
+Reason:
+
+- skill files are guidance, not transport
+- project_cli should execute by business module and method only
+- error messages and validation should remain tool-native and not point back into skill docs
+
+### D2.2 Use strong typed calendar read inputs
+
+Accepted.
+
+Reason:
+
+- `date: str` plus manual parsing is glue code and too easy to misuse
+- Pydantic `date`, timezone-aware `datetime`, and `UUID` give stricter and clearer validation
+- `calendar.read` can cover day/range/event reads with one module-scoped method while still keeping input modes explicit
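
A minimal sketch of the kind of boundary parsing D2.2 argues for, written with the standard library only so it runs without dependencies; the real handlers use Pydantic field types instead. The helper names `parse_day`, `parse_aware_datetime`, and `parse_event_id` are hypothetical and do not exist in the repository.

```python
from datetime import date, datetime
from uuid import UUID


def parse_day(value: str) -> date:
    """Parse a calendar day such as '2026-04-23'; raises ValueError otherwise."""
    return date.fromisoformat(value)


def parse_aware_datetime(value: str) -> datetime:
    """Parse an ISO 8601 datetime and insist on an explicit UTC offset."""
    parsed = datetime.fromisoformat(value)
    if parsed.utcoffset() is None:
        # A naive datetime is ambiguous across time zones, so reject it early.
        raise ValueError(f"datetime must include a UTC offset: {value!r}")
    return parsed


def parse_event_id(value: str) -> UUID:
    """Parse an event identifier; raises ValueError for non-UUID strings."""
    return UUID(value)
```

Rejecting the naive form at the boundary means later code never has to guess which zone `2026-04-23T00:00:00` was meant in.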
+
+### D3. Keep progressive disclosure through skill files
+
+Accepted.
+
+Reason:
+
+- allows the model to load only current-scenario knowledge
+- avoids injecting every action definition into every model call
+- fits AgentScope skill usage better than giant tool schemas
+
+### D4. Split skill knowledge into index + action cards
+
+Accepted.
+
+Reason:
+
+- real progressive disclosure needs smaller files, not only a long `SKILL.md`
+- action-scoped files are easier for the worker to read and apply correctly
+
+### D5. Set worker `max_iters=7`
+
+Accepted.
+
+Reason:
+
+- the current default of 10 leaves too much room for repeated invalid action discovery
+- 7 preserves room for complex tasks without keeping the current waste level
+
+### D6. Keep worker temperature unchanged
+
+Accepted.
+
+Reason:
+
+- explicit user requirement
+- this task focuses on protocol clarity and token efficiency, not generation-style tuning
+
+### D7. Remove semantic reliance on worker `context_messages`
+
+Accepted.
+
+Reason:
+
+- current runtime does not feed those messages into worker execution
+- keeping the config active on worker is misleading and complicates reasoning about cost
+
+## Rejected Decisions
+
+### R1. Re-split into many domain tools
+
+Rejected.
+
+Reason:
+
+- increases tool schema size
+- increases selection ambiguity
+- pushes the design back toward the old high-token path
+
+### R2. Keep old CLI shape and only improve skill writing
+
+Rejected.
+
+Reason:
+
+- failure came from structural action ambiguity, not only poor wording
+- `read` remains overloaded even with better prose
+
+### R3. Keep broad legacy input compatibility
+
+Rejected.
+
+Reason:
+
+- old aliases teach the model that guessing is acceptable
+- compatibility paths increase parser complexity and maintenance burden
+- the repository is still early enough to tighten the protocol cleanly
+
+### R4. Add duplicate-failure circuit breaker now
+
+Rejected for this task.
+
+Reason:
+
+- user explicitly wants to keep some exploration room while refining skills
+- this redesign should first fix the protocol itself
+
+## Open Questions To Resolve During Implementation
+
+1. Should router output add all optional execution hint fields in one step or phase them in gradually?
+2. Should `worker.config.context_messages` be removed from schema entirely or retained as ignored/deprecated for one migration cycle?
+3. Should `calendar` action files be separate files under `actions/` or a single file with stable sections and `ranges` reads?
+4. Should action validation errors include `suggested_alternative_actions` for every validation failure or only for selected known-confusion cases?
+5. Should `archive` become an explicit calendar action now, or remain represented via `update_event.patch.status = archived` until there is a dedicated route and UI contract?
+6. ~~Frontend/live integration assertions still need migration from `skill/action` to `module/method`~~ Resolved 2026-04-24: assertions migrated and integration tests passing.
+
+## Session 2026-04-24: Integration Test Debugging
+
+### D8. Tool schema `input` must be required, not optional
+
+Accepted.
+
+Root cause of `project_cli` repeatedly receiving `input: {}`:
+- `input: dict[str, Any] | None = None` generated a tool schema with `input` as optional, nullable, and `additionalProperties: true`
+- Small models (qwen3.5-flash) interpret this as "input can be anything, including empty object"
+- The tool schema has higher priority than skill file content in the model's attention
+- Fix: changed to `input: dict[str, Any]` (required, no default, no nullable)
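
The signature-level difference behind D8 can be sketched as follows. This is only an illustration: how the tool schema is actually generated from the function signature is not shown in this log, and the two stub functions plus `required_parameters` are hypothetical names.

```python
import inspect
from typing import Any, Optional


def call_with_optional_input(
    module: str, method: str, input: Optional[dict[str, Any]] = None
) -> None:
    """Old shape: `input` has a default, so a schema generator may mark it optional."""


def call_with_required_input(module: str, method: str, input: dict[str, Any]) -> None:
    """New shape: no default, so `input` must be supplied by the caller."""


def required_parameters(func) -> list[str]:
    """Return the parameter names that have no default value."""
    return [
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.default is inspect.Parameter.empty
    ]
```

With these stubs, `required_parameters(call_with_optional_input)` yields `["module", "method"]`, while the required variant also lists `"input"`.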
+
+### D9. Router must not resolve time or suggest tool args
+
+Accepted.
+
+Previous router prompt included instructions for:
+- Including `project_cli_defaults` in `context_summary`
+- Standardizing time values to project_cli input format
+- Resolving relative dates via `system_time_local`
+
+This violated the router/worker responsibility split:
+- Router: intent extraction + context summary + requires_tool evidence
+- Worker: tool selection + time resolution via skill + ENV section + tool execution
+
+Fix: removed all `project_cli_defaults` and time-resolution instructions from router prompt.
+Time resolution is now the sole responsibility of worker + skill file, using `system_time_local` from ENV section as the single time source.
+
+### D10. Skill files should reference ENV section variable names explicitly
+
+Accepted.
+
+Instead of abstract instructions like "resolve dates using system_time_local", skill files should directly reference `system_time_local` and `timezone_effective` from `USER_CONTEXT_JSON` in the ENV section, with concrete examples showing how to extract values.
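
As a hedged illustration of what such a concrete example could compute, the sketch below derives "today" and its day range from `USER_CONTEXT_JSON`. It assumes `system_time_local` is an ISO 8601 string with a UTC offset and `timezone_effective` is an IANA zone name; neither format is confirmed in this log, and `day_window` is a hypothetical helper.

```python
import json
from datetime import datetime, timedelta


def day_window(user_context_json: str) -> dict[str, str]:
    """Derive today's date and its [start, end) range from the ENV context."""
    context = json.loads(user_context_json)
    now = datetime.fromisoformat(context["system_time_local"])
    if now.utcoffset() is None:
        raise ValueError("system_time_local must carry a UTC offset")
    # Truncate to local midnight; the fixed offset is kept as-is, so this
    # sketch does not account for DST transitions inside the window.
    start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1)
    return {
        "date": start.date().isoformat(),
        "start_at": start.isoformat(),
        "end_at": end.isoformat(),
        "timezone": context["timezone_effective"],
    }
```

For a context with `system_time_local` set to `2026-04-24T13:24:13+08:00`, this returns `date` `2026-04-24`, `start_at` `2026-04-24T00:00:00+08:00`, and `end_at` `2026-04-25T00:00:00+08:00`.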
@@ -0,0 +1,6 @@
+{"file": ".trellis/workflow.md", "reason": "Project workflow and conventions"}
+{"file": ".trellis/spec/backend/index.md", "reason": "Backend development guide"}
+{"file": ".trellis/spec/frontend/index.md", "reason": "Frontend development guide"}
+{"file": ".trellis/spec/guides/cross-layer-thinking-guide.md", "reason": "Cross-layer contract checklist for protocol/backend/frontend alignment"}
+{"file": ".trellis/tasks/archive/2026-04/04-20-refactor-tool-cli-skill-ui-schema/prd.md", "reason": "Previous refactor decisions to preserve or intentionally replace"}
+{"file": ".trellis/tasks/04-23-redesign-single-cli-skill-disclosure/prd.md", "reason": "Current task source of truth"}
@@ -0,0 +1,241 @@
+# Single CLI + Progressive Skill Disclosure Implementation Checklist
+
+## Purpose
+
+This checklist turns the PRD into an execution plan. Complete items in order. Do not mark an item complete until the code, docs, and verification for that item are actually done.
+
+## Required Reading
+
+- [x] Read `backend/AGENTS.md`
+- [x] Read `apps/AGENTS.md`
+- [x] Read `.trellis/workflow.md`
+- [x] Read `.trellis/spec/backend/index.md`
+- [x] Read `.trellis/spec/guides/cross-layer-thinking-guide.md`
+- [x] Read the archived task docs in `.trellis/tasks/archive/2026-04/04-20-refactor-tool-cli-skill-ui-schema/`
+
+## Locked Decisions For This Task
+
+- [x] Router remains a direct structured stage, not ReAct
+- [x] Worker remains the only ReAct stage
+- [x] Worker `max_iters` target is 7
+- [x] Worker `temperature` stays unchanged
+- [x] Single executable tool entry remains `project_cli`
+- [x] `command/subcommand/args` model-facing input will be replaced
+- [x] The new model-facing input is `module/method/input`
+- [x] No broad backward-compatibility aliases will be kept
+- [x] Worker duplicate-failure circuit breaker is explicitly out of scope for this task
+
+## Phase 0: Task and Protocol Planning
+
+### 0.1 Task docs
+
+- [x] Create new trellis task directory
+- [x] Update `task.json` with real scope, summary, and related files
+- [x] Write `prd.md`
+- [x] Write `implementation-checklist.md`
+- [x] Write `decision-log.md`
+
+### 0.2 Design checkpoints captured
+
+- [x] Record why multi-tool exposure is rejected
+- [x] Record why old `command/subcommand/args` is rejected
+- [x] Record why single-tool progressive disclosure is preserved
+- [x] Record current Supabase failure evidence and what it implies
+
+## Phase 1: Protocol Docs First
+
+### 1.1 Update tool protocol docs
+
+- [x] Update `docs/protocols/agent/tool-protocol.md`
+- [x] Replace model-facing `command/subcommand/args` examples with `module/method/input`
+- [x] Document thin outer tool schema + strict server-side action validation
+- [x] Document structured validation errors with correction hints
+
+### 1.2 Update agent protocol docs
+
+- [x] Update `docs/protocols/agent/sse-events.md` if tool call arg examples change
+- [x] Update `docs/protocols/agent/api-endpoints.md` if history/examples mention old CLI arg shapes
+- [x] Update any agent protocol doc that currently assumes `calendar read/create/update/...`
+
+### 1.3 Cross-layer contract review
+
+- [x] Confirm backend examples, protocol docs, and frontend assumptions remain mutually consistent
+- [x] Confirm no doc still teaches the model old alias names like `start_time/end_time`
+
+## Phase 2: Router and Worker Runtime Contract
+
+### 2.1 Router output schema
+
+- [x] Keep existing `objective/context_summary/requires_tool_evidence` intact
+- [x] Reject heavier router output expansion in favor of a lighter contract and stronger `context_summary`
+- [x] Add/update tests for the retained lightweight router contract
+
+### 2.2 Router prompting
+
+- [x] Update `backend/src/core/agentscope/prompts/agent_prompt.py`
+- [x] Teach router to make `context_summary` execution-useful when IDs, dates, ranges, or prior tool outcomes matter
+- [x] Standardize all time values in `context_summary` to downstream project_cli input formats
+- [x] Avoid turning router into an executor
+
+### 2.3 Worker runtime settings
+
+- [x] Update `backend/src/core/agentscope/runtime/runner.py` to pass `max_iters=7` into `JsonReActAgent`
+- [x] Confirm worker `temperature` remains unchanged
+- [x] Remove worker runtime dependence on `context_messages` semantics in prompt/runtime guidance
+- [x] Keep schema unchanged for now, but stop exposing worker `context_messages` in worker prompt semantics
+
+### 2.4 Phase 2 verification
+
+- [x] Run targeted router/worker schema, prompt, and runner unit tests
+- [x] Confirm worker prompt no longer advertises `context_messages.mode/count`
+- [x] Confirm worker input still contains only the router contract message
+- [x] Confirm worker agent construction passes `max_iters=7`
+
+## Phase 3: Single CLI Input Protocol Redesign
+
+### 3.1 Replace model-facing request envelope
+
+- [x] Update `backend/src/core/agentscope/tools/internal/project_cli.py`
+- [x] Update `backend/src/core/agentscope/tools/cli/adapter.py`
+- [x] Replace `command/subcommand/args` with `module/method/input`
+- [x] Remove `args` string parsing compatibility
+- [x] Keep tool result persistence and AG-UI flow intact
+
+### 3.2 Action dispatch layer
+
+- [x] Add explicit dispatch by `module + method`
+- [x] Add strict per-method Pydantic request models with `extra="forbid"` for calendar methods
+- [x] Ensure unknown `module` and unknown `method` return structured errors
+- [x] Ensure method validators surface structured error details for invalid/missing fields
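
The dispatch items above can be sketched roughly as below. This is not the code in `adapter.py`: the registry layout, the `_calendar_read` handler, and the `UNKNOWN_MODULE` / `UNKNOWN_METHOD` error codes are assumptions made for illustration.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], dict[str, Any]]


def _calendar_read(payload: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical handler; a real one would validate and call the service layer.
    return {"status": "success", "data": {"echo": payload}}


# Hypothetical registry: module name -> method name -> handler.
REGISTRY: dict[str, dict[str, Handler]] = {"calendar": {"read": _calendar_read}}


def _failure(code: str, message: str, **details: Any) -> dict[str, Any]:
    """Build a structured error the worker can correct itself from."""
    return {"status": "failure", "error": {"code": code, "message": message, **details}}


def dispatch(module: str, method: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Route one tool call by module + method, never by guessing."""
    methods = REGISTRY.get(module)
    if methods is None:
        return _failure(
            "UNKNOWN_MODULE", f"unknown module {module!r}", known_modules=sorted(REGISTRY)
        )
    handler = methods.get(method)
    if handler is None:
        return _failure(
            "UNKNOWN_METHOD",
            f"unknown method {method!r} for module {module!r}",
            known_methods=sorted(methods),
        )
    return handler(payload)
```

Listing the known modules or methods in the error keeps the correction hint tool-native, in line with D2.1.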
+
+### 3.3 Remove legacy input aliases
+
+- [x] Reject `start_time/end_time`
+- [x] Reject `event_timezone`
+- [x] Reject using `event_id` with list-style actions
+- [x] Confirm error messages are corrective rather than merely generic
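
One possible shape for a corrective alias rejection, as a sketch only. The rejected names come from the checklist; the canonical replacements are assumptions inferred from the field names used elsewhere in these task docs, and `reject_legacy_aliases` is a hypothetical helper.

```python
from typing import Any

# Rejected legacy names mapped to the field the caller should use instead.
# The right-hand side values are assumptions, not confirmed by the checklist.
LEGACY_ALIASES = {
    "start_time": "start_at",
    "end_time": "end_at",
    "event_timezone": "timezone",
}


def reject_legacy_aliases(payload: dict[str, Any]) -> list[dict[str, str]]:
    """Return one corrective hint per legacy field found; empty means clean."""
    return [
        {"field": name, "use_instead": LEGACY_ALIASES[name]}
        for name in payload
        if name in LEGACY_ALIASES
    ]
```

A non-empty result would be attached to the structured validation error rather than silently rewriting the payload, since silent coercion is what R3 rejects.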
+
+## Phase 4: Calendar Business Action Protocol
+
+### 4.1 Event actions
+
+- [x] Implement `list_day`
+- [x] Implement `list_range`
+- [x] Implement `get_event`
+- [x] Implement `create_event`
+- [x] Implement `update_event`
+- [x] Implement `delete_event`
+
+### 4.2 Subscription actions
+
+- [x] Implement `invite_subscriber`
+- [x] Implement `accept_invite`
+- [x] Implement `reject_invite`
+
+### 4.3 Handler mapping
+
+- [x] Map actions onto existing `v1.schedule_items.service` operations where possible
+- [x] Keep repository -> service layering intact
+- [x] Keep `owner_id` derived from auth context, never from tool input
+- [x] Preserve existing permission and subscription semantics
+
+### 4.4 Test coverage
+
+- [x] Add targeted unit coverage for calendar action validation paths and dispatch shape changes
+- [x] Add unit tests for dispatch selection and validation errors
+- [x] Add regression tests for the known `event_id` detail flow
+- [x] Add regression tests for canonical create/update field names
+
+### 4.5 Phase 3 partial verification
+
+- [x] Run targeted CLI router, calendar handler, and tool postprocessor unit tests
+- [x] Confirm tool postprocessor resolves UI by `module/method`
+- [x] Update integration/live test expectations to the new tool_call_args/result shape
+- [x] Confirm integration/live flows execute successfully with the new runtime shape (calendar read verified 2026-04-24)
+
+## Phase 5: Skill Refactor For Progressive Disclosure
+
+### 5.1 Calendar skill packaging
+
+- [x] Rewrite `backend/src/core/agentscope/tools/skills/calendar/SKILL.md` as a short index card
+- [x] Add `actions/` files for each calendar action
+- [x] Keep action files short, canonical, and example-driven
+
+### 5.2 Skill composition rules
+
+- [x] Document when calendar should compose with contacts for phone lookup
+- [x] Document when worker should read only the index
+- [x] Document when worker should read one action file before calling the tool
+- [x] Remove long prose that does not help execution
+
+### 5.3 View skill flow
+
+- [x] Ensure `view_skill_file` can read the new per-action file layout
+- [x] Verify enabled-skill restrictions still work with nested action files
+- [x] Add tests for reading short skill index + action files
+
+## Phase 6: Frontend and AG-UI Alignment
+
+### 6.1 Frontend assumptions audit
+
+- [x] Audit `apps/lib/core/chat/ag_ui_event.dart` for tool arg assumptions
+- [x] Audit `apps/lib/features/chat/presentation/bloc/chat_bloc.dart` for `project_cli` payload assumptions
+- [x] Audit calendar refresh logic that currently looks for `command/subcommand`
+
+### 6.2 Compatibility decision
+
+- [x] Decide whether frontend should switch refresh logic to `module/method`
+- [x] Update frontend parsing if required
+- [x] Keep `ui_schema` rendering path unchanged unless protocol docs require otherwise
+
+### 6.3 Cross-layer verification checklist
+
+- [x] Confirm backend tool payload examples match frontend parser expectations
+- [x] Confirm history/SSE still preserve tool result display behavior
+- [ ] Confirm calendar detail navigation behavior still matches event identity semantics
+
+## Phase 7: Verification
+
+### 7.1 Supabase-backed regression scenarios
+
+- [ ] Reproduce the previous "known event_id detail lookup" scenario and verify `get_event` is used
+- [ ] Reproduce create-event scenario and verify canonical field names only
+- [ ] Reproduce same-day and range listing scenarios
+
+### 7.2 Runtime cost controls
+
+- [ ] Verify worker max iterations are capped at 7
+- [ ] Verify worker does not spend extra turns reading unnecessary skill files
+- [ ] Review whether the redesign keeps common runs under the target token/cost budget
+
+### 7.3 Code quality
+
+- [x] Run targeted backend tests
+- [x] Run targeted frontend tests if parser logic changes
+- [x] Run relevant integration or live tests when feasible
+- [x] Record what was verified and what remains unverified
+
+## Completion Criteria
+
+- [ ] `project_cli` remains the only executable business tool
+- [ ] Worker-facing CLI protocol uses `module/method/input`
+- [ ] Calendar actions map to actual product objects and routes
+- [ ] Skills are index-first and action-scoped
+- [ ] Worker `max_iters=7` is wired
+- [ ] Worker `context_messages` ambiguity is removed
+- [ ] Docs, backend, and frontend expectations are aligned
+
+## Current Status Note
+
+- [x] Backend protocol and unit/regression tests are updated to `module/method/input`
+- [x] Calendar read inputs now use strong typed `date` / timezone-aware `datetime` / `UUID` validation
+- [x] Integration/live tests have been rerun after the `module/method` redesign
+- [x] `project_cli` tool schema: `input` changed from optional to required (root cause of empty input bug)
+- [x] Router prompt cleaned: removed `project_cli_defaults` and time-resolution duties
+- [x] Worker contract prompt cleaned: removed `project_cli_defaults` reference
+- [x] Calendar SKILL.md rewritten with concrete examples referencing `USER_CONTEXT_JSON` variables
+- [x] Integration test assertions migrated from `skill/action` to `module/method`
+- [x] 5/6 integration tests passing (calendar read, calendar create, contacts read, memory update, tool flow read)
+- [ ] `test_tool_ui_schema_in_history` failing: history API returns tool messages without `metadata.tool_agent_output` (pre-existing issue, not related to prompt changes)
+- [ ] Action card filenames under `calendar/actions/` still use old names (`list_day.md`, `get_event.md`) instead of method-based names matching `module/method` contract
+- [ ] Per-method review needed: verify each `project_cli` method (create, update, delete, share, accept_invite, reject_invite) works end-to-end with current prompts
@@ -0,0 +1,705 @@
+# Single CLI + Progressive Skill Disclosure Redesign PRD
+
+## 1. Goal
+
+This task redesigns the current agent tool protocol around one confirmed product constraint:
+
+1. The runtime should continue to expose exactly one business tool to the worker agent: `project_cli`.
+2. The worker should learn how to use the tool through progressive skill disclosure instead of receiving a large global tool surface up front.
+3. The current `command + subcommand + args` transport should be replaced with a business-action protocol that matches real product objects and user intents.
+4. The redesign must remain grounded in the current repository's actual schedule domain:
+   - `schedule_items`
+   - `schedule_subscriptions`
+5. The redesign must reduce wasted retries and token consumption without reintroducing the old multi-tool schema explosion.
+
+This PRD does not propose a broad agent-platform rewrite. It is a focused redesign of how the single CLI tool, skills, router output, and worker execution contract should work together.
+
+## 2. Confirmed Repository Facts
+
+### 2.1 Router is not ReAct
+
+The router is a direct structured generation stage, not a ReAct loop.
+
+Confirmed in:
+
+- `backend/src/core/agentscope/runtime/runner.py:310`
+- `backend/src/core/agentscope/runtime/runner.py:325`
+
+`_run_router_stage()` uses `finalize_json_response(...)` and returns one `RouterAgentOutput` payload.
+
+Implication:
+
+- Router cost control depends on prompt/schema size and retries in `finalize_json_response`, not `max_iters`.
+- Tool-choice ambiguity is a worker problem, not a router ReAct problem.
+
+### 2.2 Worker is the only ReAct loop
+
+The worker uses `JsonReActAgent`, which subclasses AgentScope `ReActAgent`.
+
+Confirmed in:
+
+- `backend/src/core/agentscope/runtime/json_react_agent.py`
+- `backend/src/core/agentscope/runtime/runner.py:495`
+
+The current code does not pass an explicit `max_iters`, so the worker inherits AgentScope's default.
+
+Confirmed in the local environment by inspecting the installed `ReActAgent.__init__` signature:
+
+- `max_iters=10`
+
+Implication:
+
+- The worker currently has too much room to repeat invalid tool calls before failing.
+- This task will explicitly set worker `max_iters=7`.
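
To make the cost of the iteration cap concrete, here is a toy bounded loop. It is not AgentScope's `ReActAgent`; `run_bounded_loop` and `always_invalid` are hypothetical names that only model a worker repeating the same rejected call.

```python
from typing import Callable


def run_bounded_loop(step: Callable[[int], bool], max_iters: int) -> tuple[bool, int]:
    """Call `step` until it reports success or the iteration cap is reached.

    Returns (succeeded, iterations_used).
    """
    for iteration in range(1, max_iters + 1):
        if step(iteration):
            return True, iteration
    return False, max_iters


def always_invalid(_iteration: int) -> bool:
    # Models a worker that repeats the same rejected tool call every turn.
    return False
```

Under this model a stuck worker burns all 10 turns at the default cap and 7 turns at the new one, so the cap bounds the waste but does not remove its cause.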
+
+### 2.3 Worker does not consume context_messages
+
+The worker receives only the router contract message and not the original `context_messages` list.
+
+Confirmed in:
+
+- `backend/src/core/agentscope/runtime/runner.py:265`
+- `backend/src/core/agentscope/runtime/runner.py:285`
+- `backend/src/core/agentscope/runtime/runner.py:461`
+
+Implication:
+
+- `worker.config.context_messages` is currently semantically misleading.
+- Router history context remains important.
+- Worker runtime context should come from router output, system prompt, tool results, and optional memory, not duplicated chat history configuration.
+
+### 2.4 Latest failure was caused by protocol mismatch, not missing data
+
+Latest messages read from Supabase showed the following failure pattern:
+
+- Worker repeatedly called `project_cli`
+- Payload shape: `command=calendar`, `subcommand=read`, `args={"event_id": "..."}`
+- Backend returned `INVALID_ARGUMENT: start_at and end_at are required`
+- The same invalid call repeated until the worker exhausted the default ReAct limit
+
+This proves:
+
+1. The worker knew the event identifier.
+2. The current CLI protocol did not expose a clear "get one event by id" action.
+3. The current naming (`read`) encouraged the worker to map both range listing and single-event detail lookup onto one ambiguous command.
+
+### 2.5 The current calendar domain is already split into two real business objects
+
+Database evidence:
+
+- `public.schedule_items`
+- `public.schedule_subscriptions`
+
+Current schema highlights:
+
+`schedule_items`
+
+- `id`
+- `owner_id`
+- `title`
+- `description`
+- `start_at`
+- `end_at`
+- `timezone`
+- `metadata`
+- `recurrence_rule`
+- `source_type`
+- `status`
+
+`schedule_subscriptions`
+
+- `item_id`
+- `subscriber_id`
+- `permission`
+- `notify_level`
+- `status`
+
+Current backend routes and services already reflect this split:
+
+- list events by range
+- get event by id
+- create event
+- update event
+- delete event
+- share/invite event
+- accept subscription
+- reject subscription
+
+Confirmed in:
+
+- `backend/src/v1/schedule_items/router.py`
+- `backend/src/v1/schedule_items/service.py`
+
+### 2.6 Frontend already distinguishes list vs detail vs invite flows
+
+Confirmed in:
+
+- `apps/lib/features/calendar/data/apis/calendar_api.dart`
+- `apps/lib/features/calendar/data/repositories/calendar_repository.dart`
+
+The frontend already calls:
+
+- `GET /schedule-items?start_at&end_at`
+- `GET /schedule-items/{id}`
+- `POST /schedule-items/{id}/share`
+- `POST /schedule-items/{id}/accept`
+- `POST /schedule-items/{id}/reject`
+
+Implication:
+
+- The product itself already separates these business operations.
+- The ambiguity exists in the agent CLI input contract, not in the underlying app/domain design.
+
+## 3. Problem Statement
+
+The current one-tool design has the right high-level direction but the wrong action protocol.
+
+### 3.1 What was correct in the previous refactor
+
+The following direction remains valid and should be preserved:
+
+1. One AgentScope tool entry (`project_cli`) is preferable to many domain tools for token control.
+2. AgentScope skills should be the mechanism for teaching the model when and how to use the tool.
+3. Tool outputs should remain structured and machine-oriented.
+4. AG-UI/UI-schema compilation should remain backend-owned.
+5. The worker should not receive all tool knowledge eagerly.
+
+### 3.2 What is no longer acceptable
+
+The following parts of the previous CLI protocol should be replaced:
+
+1. `command + subcommand + args` as the model-facing protocol.
+2. Ambiguous action names such as `read` that cover more than one business intent.
+3. Loose `args: dict[str, Any]` semantics that encourage field guessing.
+4. Legacy alias drift such as `start_time/end_time`, `event_timezone`, and other migration leftovers.
+5. Runtime dependence on long prose skill files instead of short execution-oriented action cards.
+
+### 3.3 Why the old CLI shape fails even though the single-tool strategy is good
+
+The current single-tool protocol is too generic for a small model.
+
+The worker must infer, from weak labels like `read`, all of the following at once:
+
+1. Which business object is involved.
+2. Whether the user wants a list or one detail record.
+3. Which fields are mandatory for that specific subcommand.
+4. Which field names are canonical.
+
+This moves too much burden from the runtime protocol into model guesswork.
+
+The result is not just a correctness risk. It also increases token cost because the worker burns iterations learning through failure.
+
+## 4. Design Principles
+
+### 4.1 Keep exactly one tool
+
+The worker should continue to see one executable tool:
+
+- `project_cli`
+
+Reason:
+
+- avoids multi-tool selection overhead
+- avoids injecting many tool schemas into every model call
+- preserves a stable tool surface for worker prompting
+
+### 4.2 Move model-facing semantics from CLI history to business actions
+
+The model-facing protocol should describe business intent directly, not technical command-tree history.
+
+Replace:
+
+```json
+{
+  "command": "calendar",
+  "subcommand": "read",
+  "args": {}
+}
+```
+
+With:
+
+```json
+{
+  "module": "calendar",
+  "method": "read",
+  "input": {
+    "mode": "event",
+    "event_id": "<uuid>"
+  }
+}
+```
+
+This preserves one tool while making the business contract explicit.
+
+### 4.3 Use progressive disclosure for skill knowledge, not for raw global schema exposure
+
+The worker should not receive all method definitions by default.
+
+Instead:
+
+1. Read a short skill index first.
+2. Read the relevant method card only when necessary.
+3. Call `project_cli` with the chosen `module/method/input` payload.
+
+This keeps the token budget focused on the current business scenario.
+
+### 4.4 Server-side validation stays strict even if the tool schema stays thin
+
+To avoid a large tool schema, `project_cli` may expose only a thin outer schema:
+
+- `module`
+- `method`
+- `input`
+
+Strict validation then happens server-side by dispatching `module + method` to the corresponding Pydantic model.
+
+For calendar reads, the input must use strong typed domain values at the schema boundary:
+
+- day reads: `date`
+- range reads: timezone-aware `datetime`
+- single-event reads: `UUID`
+
+The transport remains JSON, but the backend contract must validate these as typed values immediately instead of accepting arbitrary strings and reparsing them later.
+
+This preserves strictness without forcing the entire action matrix into the model context.
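
A dependency-free sketch of the mode-discriminated shape check for `calendar.read`. Only mode `event` with `event_id` appears verbatim in this PRD; the `day` and `range` mode names, their keys, and the `check_read_shape` helper are assumptions, and the real handler uses Pydantic models with `extra="forbid"` rather than this hand-written check.

```python
from typing import Any

# Keys each mode must carry, no more and no less (mirrors extra="forbid").
# The "day" and "range" entries are illustrative assumptions.
MODE_FIELDS: dict[str, frozenset[str]] = {
    "day": frozenset({"date"}),
    "range": frozenset({"start_at", "end_at"}),
    "event": frozenset({"event_id"}),
}


def check_read_shape(payload: dict[str, Any]) -> str:
    """Return the validated mode, or raise ValueError naming the offending keys."""
    mode = payload.get("mode")
    if not isinstance(mode, str) or mode not in MODE_FIELDS:
        raise ValueError(f"mode must be one of {sorted(MODE_FIELDS)}")
    expected = MODE_FIELDS[mode]
    given = set(payload) - {"mode"}
    if given != expected:
        missing = sorted(expected - given)
        unexpected = sorted(given - expected)
        raise ValueError(f"mode {mode!r}: missing={missing} unexpected={unexpected}")
    return mode
```

Because the discriminator picks the field set first, an `event_id` sent with a range read is reported as unexpected instead of producing the earlier `start_at and end_at are required` loop.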
+
+### 4.5 No broad backward-compatibility layer
+
+This redesign should not preserve old field aliases or broad coercion behavior.
+
+Specifically, the implementation phases should remove or reject:
+
+- `args` as JSON string
+- `start_time/end_time`
+- `event_timezone`
+- action overloading under `read`
+
+The system should fail clearly and structurally instead of guessing.
+
+## 5. Target Architecture
+
+## 5.1 Runtime responsibilities
+
+### Router
+
+The router remains a direct structured output stage.
+
+It should continue to decide:
+
+- the objective
+- whether tool evidence is required
+
+It should be extended to optionally provide stronger execution hints:
+
+- `selected_skill`
+- `intended_action`
+- `known_entities`
+- `known_time_range`
+- `missing_fields`
+
+These fields are not there to make router execute tools. They are there to reduce worker exploration cost.
+
+### Worker
+
+The worker remains the only ReAct stage.
+
+Worker changes in this redesign:
+
+1. Explicitly set `max_iters=7`.
+2. Keep `temperature` unchanged.
+3. Stop implying that the worker consumes the `context_messages` configuration.
+4. Prefer router execution hints before reading additional skill files.
+5. Read the smallest relevant skill file possible before tool use.
+
+### Tool
+
+The worker still sees only:
+
+- `project_cli`
+- `view_skill_file`
+
+`project_cli` is the execution boundary.
+`view_skill_file` is the progressive-disclosure knowledge boundary.
+
+## 5.2 New `project_cli` model-facing input contract
+
+The new canonical model-facing payload is:
+
+```json
+{
+  "skill": "calendar",
+  "action": "get_event",
+  "input": {
+    "event_id": "<uuid>"
+  }
+}
+```
+
+Field meanings:
+
+- `skill`: enabled business skill namespace
+- `action`: concrete business operation inside the skill
+- `input`: strict action-specific payload
+
+This is still one tool call. The worker is not choosing among many tools.
+
+## 5.3 Calendar action protocol
+
+The calendar skill should be redesigned around real business actions derived from `schedule_items` and `schedule_subscriptions`.
+
+### Event actions
+
+1. `list_day`
+2. `list_range`
+3. `get_event`
+4. `create_event`
+5. `update_event`
+6. `delete_event`
+
+### Subscription actions
+
+1. `invite_subscriber`
+2. `accept_invite`
+3. `reject_invite`
+
+### Why this action set
+
+This set directly maps to current product behavior:
+
+- user asks what is scheduled today -> `list_day`
+- user asks what is scheduled this week -> `list_range`
+- user asks for a known event's details -> `get_event`
+- user creates or edits a schedule item -> `create_event` / `update_event`
+- user removes a schedule item -> `delete_event`
+- user invites another person -> `invite_subscriber`
+- invite recipient responds -> `accept_invite` / `reject_invite`
+
+This avoids overloading one label like `read` for two distinct business tasks.
+
+## 5.4 Canonical calendar action shapes
+
+### `list_day`
+
+```json
+{
+  "skill": "calendar",
+  "action": "list_day",
+  "input": {
+    "date": "2026-04-23",
+    "timezone": "Asia/Shanghai"
+  }
+}
+```
+
+### `list_range`
+
+```json
+{
+  "skill": "calendar",
+  "action": "list_range",
+  "input": {
+    "start_at": "2026-04-23T00:00:00+08:00",
+    "end_at": "2026-04-24T00:00:00+08:00"
+  }
+}
+```
+
+### `get_event`
+
+```json
+{
+  "skill": "calendar",
+  "action": "get_event",
+  "input": {
+    "event_id": "<uuid>"
+  }
+}
+```
+
+### `create_event`
+
+```json
+{
+  "skill": "calendar",
+  "action": "create_event",
+  "input": {
+    "title": "Project sync",
+    "start_at": "2026-04-23T16:00:00+08:00",
+    "end_at": "2026-04-23T17:00:00+08:00",
+    "timezone": "Asia/Shanghai",
+    "description": "optional",
+    "metadata": {
+      "location": "optional",
+      "reminder_minutes": 30,
+      "color": "blue",
+      "notes": "optional"
+    }
+  }
+}
+```
+
+### `update_event`
+
+```json
+{
+  "skill": "calendar",
+  "action": "update_event",
+  "input": {
+    "event_id": "<uuid>",
+    "patch": {
+      "title": "Updated title",
+      "start_at": "2026-04-23T18:00:00+08:00",
+      "timezone": "Asia/Shanghai",
+      "status": "archived"
+    }
+  }
+}
+```
+
+### `delete_event`
+
+```json
+{
+  "skill": "calendar",
+  "action": "delete_event",
+  "input": {
+    "event_id": "<uuid>"
+  }
+}
+```
+
+### `invite_subscriber`
+
+```json
+{
+  "skill": "calendar",
+  "action": "invite_subscriber",
+  "input": {
+    "event_id": "<uuid>",
+    "invitee": {
+      "phone": "+8613812345678"
+    },
+    "permissions": {
+      "view": true,
+      "edit": false,
+      "invite": false
+    }
+  }
+}
+```
+
+### `accept_invite`
+
+```json
+{
+  "skill": "calendar",
+  "action": "accept_invite",
+  "input": {
+    "event_id": "<uuid>"
+  }
+}
+```
+
+### `reject_invite`
+
+```json
+{
+  "skill": "calendar",
+  "action": "reject_invite",
+  "input": {
+    "event_id": "<uuid>"
+  }
+}
+```
+
+## 5.5 Skill packaging for progressive disclosure
+
+The calendar skill should no longer be one long explanatory page that the worker must read in full.
+
+Recommended structure:
+
+```text
+calendar/
+  SKILL.md               # very short index / navigation card
+  actions/
+    list_day.md
+    list_range.md
+    get_event.md
+    create_event.md
+    update_event.md
+    delete_event.md
+    invite_subscriber.md
+    accept_invite.md
+    reject_invite.md
+```
+
+### `SKILL.md` responsibilities
+
+- describe when the calendar skill is relevant
+- list all actions on one screen
+- say which action to use for a known `event_id`
+- say which action to use for date/range queries
+- point to action files for exact payloads
+
+### Action file responsibilities
+
+Each action file should contain only:
+
+1. when to use the action
+2. required fields
+3. optional fields
+4. one canonical example
+5. forbidden field names and common mistakes
+
+This makes `view_skill_file` a real progressive-disclosure mechanism instead of a markdown dump.
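+
+As a rough illustration of the five-part layout above, an action card such as `actions/get_event.md` might look like the sketch below. The wording, the "Forbidden" entries, and the routing advice are assumptions made for illustration; only the `event_id` field and the envelope shape come from section 5.4.
+
+```text
+# get_event
+
+When to use:
+  The event_id is already known and the user wants that event's details.
+
+Required fields:
+  event_id
+
+Optional fields:
+  none
+
+Example:
+  {"skill": "calendar", "action": "get_event", "input": {"event_id": "<uuid>"}}
+
+Forbidden / common mistakes:
+  Do not send date, start_at, or end_at here; use list_day or list_range for date queries.
+```
+
+A card of this size fits in a single `view_skill_file` read, which is the point of splitting the skill by action.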
+
+## 5.6 Error contract for self-correction
+
+The redesigned CLI should return structured action-level validation feedback.
+
+Canonical error example:
+
+```json
+{
+  "status": "failure",
+  "error": {
+    "code": "INVALID_ACTION_INPUT",
+    "message": "action list_range requires start_at and end_at",
+    "skill": "calendar",
+    "action": "list_range",
+    "missing_fields": ["start_at", "end_at"],
+    "unexpected_fields": ["event_id"],
+    "suggested_alternative_actions": ["get_event"]
+  }
+}
+```
+
+This is intentionally more corrective than the current generic `INVALID_ARGUMENT` payload.
+
+## 6. Token and Cost Control Strategy
+
+### 6.1 Preserve single-tool economy
+
+The main token-saving choice is to preserve one executable business tool.
+
+This avoids:
+
+- multiple tool schemas in each worker call
+- model confusion over which tool to pick first
+- large repeated tool descriptions in every turn
+
+### 6.2 Replace global knowledge with scoped reading
+
+The worker should read:
+
+1. router execution hints first
+2. skill index second
+3. one action card if needed
+
+This is cheaper than injecting the entire action matrix into every prompt.
+
+### 6.3 Stop spending iterations on protocol discovery
+
+The redesign reduces cost not by suppressing useful reasoning, but by removing the need for repeated failed exploration.
+
+The worker should no longer need multiple failed attempts to discover:
+
+- whether `event_id` belongs to `read`
+- whether `start_time` is valid
+- whether `event_timezone` is accepted
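+
+With the error contract from section 5.6, each of these guesses could be resolved in a single round trip. A hypothetical response to a `create_event` call that sent `start_time` and `event_timezone` might look like the following. The message text and the field lists are illustrative assumptions (including the assumption that `start_at` is required), not output captured from the runtime.
+
+```json
+{
+  "status": "failure",
+  "error": {
+    "code": "INVALID_ACTION_INPUT",
+    "message": "action create_event requires start_at; start_time and event_timezone are not accepted",
+    "skill": "calendar",
+    "action": "create_event",
+    "missing_fields": ["start_at"],
+    "unexpected_fields": ["start_time", "event_timezone"],
+    "suggested_alternative_actions": []
+  }
+}
+```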
+
+### 6.4 Concrete worker settings for this redesign
+
+- set worker `max_iters=7`
+- keep worker `temperature` unchanged
+- remove or ignore the worker `context_messages` configuration at runtime
+
+### 6.5 Explicit non-goals in this task
+
+This task does not include:
+
+- changing router into a ReAct stage
+- lowering worker temperature
+- adding duplicate-failure circuit breakers (deferred to a later task)
+- exposing many separate AgentScope tools again
+
+## 7. Migration Plan
+
+### Phase 0: Planning and protocol design
+
+1. Write this PRD and implementation checklist.
+2. Update protocol docs before runtime code changes.
+3. Record rejected alternatives and reasoning.
+
+### Phase 1: Backend runtime contract
+
+1. Extend router output schema with optional execution hints.
+2. Explicitly set worker `max_iters=7`.
+3. Remove semantic reliance on worker `context_messages`.
+4. Redesign `project_cli` request payload as `skill/action/input`.
+
+### Phase 2: Calendar action dispatch
+
+1. Replace current calendar command/subcommand routing with action dispatch.
+2. Implement strict action-specific Pydantic models.
+3. Remove legacy alias handling and generic dict coercion.
+4. Return structured correction-oriented validation errors.
+
+### Phase 3: Skill refactor
+
+1. Rewrite `calendar/SKILL.md` as a short index card.
+2. Add one action-card file per action.
+3. Update skill instructions so the worker reads only what is needed.
+
+### Phase 4: Cross-layer alignment
+
+1. Update relevant protocol docs.
+2. Keep frontend consumption stable where possible.
+3. Ensure tool result and AG-UI event semantics remain compatible.
+
+### Phase 5: Verification
+
+1. Reproduce the previous failure case and confirm it routes to `get_event`.
+2. Verify create-event flow uses canonical names only.
+3. Verify range/day queries still work.
+4. Verify invite/accept/reject flows map to current schedule subscription behavior.
+
+## 8. Rejected Alternatives
+
+### 8.1 Rejected: split back into many tools
+
+Reason:
+
+- reintroduces tool-schema bloat
+- worsens tool-choice ambiguity
+- increases token overhead on every worker step
+
+### 8.2 Rejected: keep `command/subcommand/args` and fix only the skill text
+
+Reason:
+
+- the ambiguity is structural, not editorial
+- `read` still overloads distinct business operations
+- loose dict input still encourages field guessing
+
+### 8.3 Rejected: put the full action schema into the tool prompt directly
+
+Reason:
+
+- defeats progressive disclosure
+- grows the worker prompt on every turn
+- hurts cost and small-model reliability
+
+## 9. Success Criteria
+
+This redesign is successful only if all of the following are true:
+
+1. The worker still sees one executable business tool.
+2. The worker chooses calendar actions through business semantics, not command-tree guesswork.
+3. The previous repeated-failure case becomes a direct `get_event` call when `event_id` is known.
+4. The worker no longer relies on undocumented field aliases.
+5. The runtime protocol is strictly validated server-side.
+6. Skill reading is incremental and action-scoped.
+7. Worker iteration cost is bounded by `max_iters=7`.
+8. Backend, protocol docs, and frontend assumptions remain aligned.
@@ -0,0 +1,85 @@
+{
+  "id": "redesign-single-cli-skill-disclosure",
+  "name": "redesign-single-cli-skill-disclosure",
+  "title": "Redesign single CLI + progressive skill disclosure protocol",
+  "description": "Redesign the current single CLI tool into a business-action protocol driven by progressive skill disclosure. Preserve one AgentScope tool, replace legacy command/subcommand/args guessing with strict module/method/input dispatch, align router-worker contracts with actual runtime behavior, and reduce token waste without reintroducing multi-tool schema bloat.",
+  "status": "in_progress",
+  "dev_type": "fullstack",
+  "scope": "cross-domain",
+  "priority": "P1",
+  "creator": "qzl",
+  "assignee": "qzl",
+  "createdAt": "2026-04-23",
+  "completedAt": null,
+  "branch": null,
+  "base_branch": "dev",
+  "worktree_path": null,
+  "current_phase": 6,
+  "next_action": [
+    {
+      "phase": 1,
+      "action": "implement"
+    },
+    {
+      "phase": 2,
+      "action": "check"
+    },
+    {
+      "phase": 6,
+      "action": "finish"
+    },
+    {
+      "phase": 4,
+      "action": "create-pr"
+    }
+  ],
+  "commit": null,
+  "pr_url": null,
+  "subtasks": [
+    {
+      "name": "Write PRD for single CLI + progressive skill disclosure redesign",
+      "status": "completed"
+    },
+    {
+      "name": "Define calendar business action protocol from schedule_items and schedule_subscriptions",
+      "status": "completed"
+    },
+    {
+      "name": "Define router and worker contract changes for lower-token execution",
+      "status": "completed"
+    },
+    {
+      "name": "Define skill packaging for index-first progressive disclosure",
+      "status": "completed"
+    },
+    {
+      "name": "Define backend dispatch and validation migration plan",
+      "status": "completed"
+    },
+    {
+      "name": "Define protocol/frontend alignment and verification plan",
+      "status": "completed"
+    }
+  ],
+  "children": [],
+  "parent": null,
+  "relatedFiles": [
+    "backend/src/core/agentscope/runtime/runner.py",
+    "backend/src/core/agentscope/runtime/json_react_agent.py",
+    "backend/src/core/agentscope/tools/internal/project_cli.py",
+    "backend/src/core/agentscope/tools/internal/view_skill_file.py",
+    "backend/src/core/agentscope/tools/cli/adapter.py",
+    "backend/src/core/agentscope/tools/skills/calendar/SKILL.md",
+    "backend/src/v1/schedule_items/router.py",
+    "backend/src/v1/schedule_items/service.py",
+    "backend/src/v1/schedule_items/schemas.py",
+    "apps/lib/features/calendar/data/apis/calendar_api.dart",
+    "apps/lib/features/calendar/data/repositories/calendar_repository.dart",
+    "docs/protocols/agent/sse-events.md",
+    "docs/protocols/agent/tool-protocol.md"
+  ],
+  "notes": "This task now supersedes both the older command/subcommand/args direction and the intermediate skill/action/input direction for the CLI input protocol while keeping the validated parts of the prior refactor: one tool entry, AgentScope skills, structured tool outputs, and backend-owned AG-UI compilation. Phase 1 protocol docs are updated to module/method/input. Phase 2 runtime contract is updated with worker max_iters=7 and a lighter router contract that now requires time values in context_summary to be standardized to downstream project_cli input formats, including project_cli_defaults when deterministically known. Phase 3 and the calendar-focused business-method redesign are now in place: project_cli uses module/method/input, runtime-side skill gating was removed from project_cli, the CLI router dispatches by module+method, calendar reads were collapsed into calendar.read with strong typed `date`/timezone-aware `datetime`/`UUID` input variants, calendar mutations use module-scoped methods, contacts/memory align to the same envelope, tool postprocessing resolves ui_hints from module/method, and skill docs now teach module/method usage instead of leaking transport concerns into the tool contract. Backend unit/regression coverage is green for the updated AgentScope/tool stack. Integration/live tests have not yet been rerun after the module/method and strong-typing redesign, so end-to-end verification remains incomplete.",
+  "meta": {
+    "feature_summary": "single project_cli redesign + progressive skill disclosure + business action protocol + lower token/runtime ambiguity"
+  }
+}