.trellis/tasks/04-23-redesign-single-cli-skill-disclosure/implementation-checklist.md

# Single CLI + Progressive Skill Disclosure Implementation Checklist

## Purpose

This checklist turns the PRD into an execution plan. Complete items in order. Do not mark an item complete until the code, docs, and verification for that item are actually done.

## Required Reading

- [x] Read `backend/AGENTS.md`
- [x] Read `apps/AGENTS.md`
- [x] Read `.trellis/workflow.md`
- [x] Read `.trellis/spec/backend/index.md`
- [x] Read `.trellis/spec/guides/cross-layer-thinking-guide.md`
- [x] Read the archived task docs in `.trellis/tasks/archive/2026-04/04-20-refactor-tool-cli-skill-ui-schema/`

## Locked Decisions For This Task

- [x] Router remains a direct structured stage, not ReAct
- [x] Worker remains the only ReAct stage
- [x] Worker `max_iters` target is 7
- [x] Worker `temperature` stays unchanged
- [x] Single executable tool entry remains `project_cli`
- [x] `command/subcommand/args` model-facing input will be replaced
- [x] The new model-facing input is `module/method/input`
- [x] No broad backward-compatibility aliases will be kept
- [x] Worker duplicate-failure circuit breaker is explicitly out of scope for this task

## Phase 0: Task and Protocol Planning

### 0.1 Task docs

- [x] Create new trellis task directory
- [x] Update `task.json` with real scope, summary, and related files
- [x] Write `prd.md`
- [x] Write `implementation-checklist.md`
- [x] Write `decision-log.md`

### 0.2 Design checkpoints captured

- [x] Record why multi-tool exposure is rejected
- [x] Record why old `command/subcommand/args` is rejected
- [x] Record why single-tool progressive disclosure is preserved
- [x] Record current Supabase failure evidence and what it implies

## Phase 1: Protocol Docs First

### 1.1 Update tool protocol docs

- [x] Update `docs/protocols/agent/tool-protocol.md`
- [x] Replace model-facing `command/subcommand/args` examples with `module/method/input`
- [x] Document thin outer tool schema + strict server-side action validation
- [x] Document structured validation errors with correction hints

### 1.2 Update agent protocol docs

- [x] Update `docs/protocols/agent/sse-events.md` if tool call arg examples change
- [x] Update `docs/protocols/agent/api-endpoints.md` if history/examples mention old CLI arg shapes
- [x] Update any agent protocol doc that currently assumes `calendar read/create/update/...`

### 1.3 Cross-layer contract review

- [x] Confirm backend examples, protocol docs, and frontend assumptions remain mutually consistent
- [x] Confirm no doc still teaches the model old alias names like `start_time/end_time`

## Phase 2: Router and Worker Runtime Contract

### 2.1 Router output schema

- [x] Keep existing `objective/context_summary/requires_tool_evidence` intact
- [x] Reject heavier router output expansion in favor of a lighter contract and stronger `context_summary`
- [x] Add/update tests for the retained lightweight router contract

### 2.2 Router prompting

- [x] Update `backend/src/core/agentscope/prompts/agent_prompt.py`
- [x] Teach router to make `context_summary` execution-useful when IDs, dates, ranges, or prior tool outcomes matter
- [x] Standardize all time values in `context_summary` to downstream `project_cli` input formats
- [x] Avoid turning router into an executor

### 2.3 Worker runtime settings

- [x] Update `backend/src/core/agentscope/runtime/runner.py` to pass `max_iters=7` into `JsonReActAgent`
- [x] Confirm worker `temperature` remains unchanged
- [x] Remove worker runtime dependence on `context_messages` semantics in prompt/runtime guidance
- [x] Keep schema unchanged for now, but stop exposing worker `context_messages` in worker prompt semantics

### 2.4 Phase 2 verification

- [x] Run targeted router/worker schema, prompt, and runner unit tests
- [x] Confirm worker prompt no longer advertises `context_messages.mode/count`
- [x] Confirm worker input still contains only the router contract message
- [x] Confirm worker agent construction passes `max_iters=7`

## Phase 3: Single CLI Input Protocol Redesign

### 3.1 Replace model-facing request envelope

- [x] Update `backend/src/core/agentscope/tools/internal/project_cli.py`
- [x] Update `backend/src/core/agentscope/tools/cli/adapter.py`
- [x] Replace `command/subcommand/args` with `module/method/input`
- [x] Remove `args` string parsing compatibility
- [x] Keep tool result persistence and AG-UI flow intact
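
The envelope change in 3.1 can be sketched roughly as follows. This is a minimal illustration, not the actual adapter code: `parse_envelope`, `Envelope`, and the error codes are hypothetical names, and the `event_id` field in the usage note is only an example input.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# Keys from the old model-facing shape that are no longer accepted.
LEGACY_KEYS = {"command", "subcommand", "args"}
REQUIRED_KEYS = ("module", "method", "input")


@dataclass(frozen=True)
class Envelope:
    module: str
    method: str
    input: dict[str, Any]


def parse_envelope(payload: dict[str, Any]) -> Envelope | dict[str, Any]:
    """Return an Envelope, or a structured error dict the model can act on."""
    legacy = sorted(k for k in payload if k in LEGACY_KEYS)
    if legacy:
        return {"ok": False, "code": "legacy_envelope", "fields": legacy,
                "hint": "Use module/method/input instead of command/subcommand/args."}
    missing = [k for k in REQUIRED_KEYS if k not in payload]
    if missing:
        return {"ok": False, "code": "missing_field", "fields": missing,
                "hint": "module, method, and input are all required."}
    if not isinstance(payload["input"], dict):
        return {"ok": False, "code": "invalid_input", "fields": ["input"],
                "hint": "input must be a JSON object, not an args string."}
    return Envelope(payload["module"], payload["method"], payload["input"])
```

For example, `parse_envelope({"module": "calendar", "method": "get_event", "input": {"event_id": "..."}})` yields an `Envelope`, while a payload that still carries `command` or `args` comes back as a `legacy_envelope` error.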

### 3.2 Action dispatch layer

- [x] Add explicit dispatch by `module + method`
- [x] Add strict per-method Pydantic request models with `extra="forbid"` for calendar methods
- [x] Ensure unknown `module` and unknown `method` return structured errors
- [x] Ensure method validators surface structured error details for invalid/missing fields
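
One possible shape for the dispatch layer in 3.2 is sketched below. To stay dependency-free it uses stdlib dataclasses as a stand-in for the Pydantic request models with `extra="forbid"`; the registry, the `error` helper, and the error codes are assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any, Callable


@dataclass(frozen=True)
class GetEventInput:
    # Stand-in for a strict per-method request model.
    event_id: str


def get_event(req: GetEventInput) -> dict[str, Any]:
    # Placeholder handler; a real one would call the service layer.
    return {"ok": True, "event_id": req.event_id}


# (module, method) -> (request model, handler)
REGISTRY: dict[tuple[str, str], tuple[type, Callable[[Any], dict[str, Any]]]] = {
    ("calendar", "get_event"): (GetEventInput, get_event),
}


def error(code: str, **details: Any) -> dict[str, Any]:
    return {"ok": False, "error": {"code": code, **details}}


def dispatch(module: str, method: str, payload: dict[str, Any]) -> dict[str, Any]:
    modules = {mo for mo, _ in REGISTRY}
    if module not in modules:
        return error("unknown_module", module=module, known=sorted(modules))
    entry = REGISTRY.get((module, method))
    if entry is None:
        known = sorted(me for mo, me in REGISTRY if mo == module)
        return error("unknown_method", method=method, known=known)
    model, handler = entry
    allowed = {f.name for f in fields(model)}
    extra = sorted(set(payload) - allowed)    # mimics extra="forbid"
    missing = sorted(allowed - set(payload))  # every field is required here
    if extra or missing:
        return error("invalid_input", extra=extra, missing=missing)
    return handler(model(**payload))
```

Listing the known modules or methods in the error gives the worker something concrete to correct against on the next iteration.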

### 3.3 Remove legacy input aliases

- [x] Reject `start_time/end_time`
- [x] Reject `event_timezone`
- [x] Reject using `event_id` with list-style actions
- [x] Confirm error messages include corrective hints rather than only a generic failure
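
A rough sketch of how the rejections in 3.3 could carry corrective hints. The rejected field names come from this checklist; the helper name, the hint wording, and the set of list-style methods are assumptions.

```python
from __future__ import annotations

from typing import Any

# Legacy aliases named in this checklist; hint text is illustrative only.
LEGACY_FIELD_HINTS = {
    "start_time": "Removed alias. Use the canonical field from the action file.",
    "end_time": "Removed alias. Use the canonical field from the action file.",
    "event_timezone": "Removed alias. Use the canonical field from the action file.",
}

# Assumed to be the list-style calendar actions.
LIST_METHODS = {"list_day", "list_range"}


def legacy_field_errors(method: str, payload: dict[str, Any]) -> list[dict[str, str]]:
    """Collect one corrective error entry per rejected field."""
    errors = [
        {"field": name, "hint": hint}
        for name, hint in LEGACY_FIELD_HINTS.items()
        if name in payload
    ]
    if method in LIST_METHODS and "event_id" in payload:
        errors.append({
            "field": "event_id",
            "hint": "List actions do not take event_id. Call get_event for a single event.",
        })
    return errors
```

An empty list means the payload contains none of the rejected fields; a non-empty list can be returned to the worker as the structured error details.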

## Phase 4: Calendar Business Action Protocol

### 4.1 Event actions

- [x] Implement `list_day`
- [x] Implement `list_range`
- [x] Implement `get_event`
- [x] Implement `create_event`
- [x] Implement `update_event`
- [x] Implement `delete_event`

### 4.2 Subscription actions

- [x] Implement `invite_subscriber`
- [x] Implement `accept_invite`
- [x] Implement `reject_invite`

### 4.3 Handler mapping

- [x] Map actions onto existing `v1.schedule_items.service` operations where possible
- [x] Keep repository -> service layering intact
- [x] Keep `owner_id` derived from auth context, never from tool input
- [x] Preserve existing permission and subscription semantics
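
The `owner_id` rule in 4.3 might look like the sketch below. `AuthContext`, `FakeScheduleService`, and `handle_get_event` are hypothetical stand-ins; the real `v1.schedule_items.service` interface is not shown in this checklist.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class AuthContext:
    user_id: str


class FakeScheduleService:
    """Stand-in for the real service layer; it only echoes its arguments."""

    def get_item(self, owner_id: str, event_id: str) -> dict[str, Any]:
        return {"owner_id": owner_id, "event_id": event_id}


def handle_get_event(
    auth: AuthContext, service: FakeScheduleService, payload: dict[str, Any]
) -> dict[str, Any]:
    # The tool input must never carry the owner; it always comes from auth.
    if "owner_id" in payload:
        return {"ok": False, "error": {
            "code": "forbidden_field",
            "field": "owner_id",
            "hint": "owner_id is derived from the authenticated session.",
        }}
    item = service.get_item(owner_id=auth.user_id, event_id=payload["event_id"])
    return {"ok": True, "item": item}
```

Keeping the handler this thin leaves permission and subscription checks in the service layer, where they already live.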

### 4.4 Test coverage

- [x] Add targeted unit coverage for calendar action validation paths and dispatch shape changes
- [x] Add unit tests for dispatch selection and validation errors
- [x] Add regression tests for the known `event_id` detail flow
- [x] Add regression tests for canonical create/update field names

### 4.5 Phase 3 and Phase 4 partial verification

- [x] Run targeted CLI router, calendar handler, and tool postprocessor unit tests
- [x] Confirm tool postprocessor resolves UI by `module/method`
- [x] Update integration/live test expectations to the new `tool_call_args`/result shape
- [x] Confirm integration/live flows execute successfully with the new runtime shape (calendar read verified 2026-04-24)

## Phase 5: Skill Refactor For Progressive Disclosure

### 5.1 Calendar skill packaging

- [x] Rewrite `backend/src/core/agentscope/tools/skills/calendar/SKILL.md` as a short index card
- [x] Add `actions/` files for each calendar action
- [x] Keep action files short, canonical, and example-driven

### 5.2 Skill composition rules

- [x] Document when calendar should compose with contacts for phone lookup
- [x] Document when worker should read only the index
- [x] Document when worker should read one action file before calling the tool
- [x] Remove long prose that does not help execution

### 5.3 View skill flow

- [x] Ensure `view_skill_file` can read the new per-action file layout
- [x] Verify enabled-skill restrictions still work with nested action files
- [x] Add tests for reading short skill index + action files
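
As an illustration of the checks in 5.3, the sketch below resolves a skill file under a nested `actions/` layout while enforcing the enabled-skill restriction. `resolve_skill_file` and its parameters are hypothetical; the actual `view_skill_file` implementation may differ.

```python
from __future__ import annotations

from pathlib import Path


def resolve_skill_file(
    skills_root: Path, enabled: set[str], skill: str, rel_path: str = "SKILL.md"
) -> Path:
    """Resolve a file inside an enabled skill, refusing paths that escape it."""
    if skill not in enabled:
        raise PermissionError(f"skill {skill!r} is not enabled")
    skill_dir = (skills_root / skill).resolve()
    target = (skill_dir / rel_path).resolve()
    # Blocks "../other_skill/SKILL.md" and absolute paths alike.
    if not target.is_relative_to(skill_dir):
        raise PermissionError("path escapes the skill directory")
    if not target.is_file():
        raise FileNotFoundError(rel_path)
    return target
```

With this shape, reading the index is `resolve_skill_file(root, enabled, "calendar")` and reading one action card is `resolve_skill_file(root, enabled, "calendar", "actions/get_event.md")`.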

## Phase 6: Frontend and AG-UI Alignment

### 6.1 Frontend assumptions audit

- [x] Audit `apps/lib/core/chat/ag_ui_event.dart` for tool arg assumptions
- [x] Audit `apps/lib/features/chat/presentation/bloc/chat_bloc.dart` for `project_cli` payload assumptions
- [x] Audit calendar refresh logic that currently looks for `command/subcommand`

### 6.2 Compatibility decision

- [x] Decide whether frontend should switch refresh logic to `module/method`
- [x] Update frontend parsing if required
- [x] Keep `ui_schema` rendering path unchanged unless protocol docs require otherwise

### 6.3 Cross-layer verification checklist

- [x] Confirm backend tool payload examples match frontend parser expectations
- [x] Confirm history/SSE still preserve tool result display behavior
- [ ] Confirm calendar detail navigation behavior still matches event identity semantics

## Phase 7: Verification

### 7.1 Supabase-backed regression scenarios

- [ ] Reproduce the previous "known event_id detail lookup" scenario and verify `get_event` is used
- [ ] Reproduce create-event scenario and verify canonical field names only
- [ ] Reproduce same-day and range listing scenarios

### 7.2 Runtime cost controls

- [ ] Verify worker max iterations are capped at 7
- [ ] Verify worker does not spend extra turns reading unnecessary skill files
- [ ] Review whether the redesign keeps common runs under the target token/cost budget

### 7.3 Code quality

- [x] Run targeted backend tests
- [x] Run targeted frontend tests if parser logic changes
- [x] Run relevant integration or live tests when feasible
- [x] Record what was verified and what remains unverified

## Completion Criteria

- [ ] `project_cli` remains the only executable business tool
- [ ] Worker-facing CLI protocol uses `module/method/input`
- [ ] Calendar actions map to actual product objects and routes
- [ ] Skills are index-first and action-scoped
- [ ] Worker `max_iters=7` is wired
- [ ] Worker `context_messages` ambiguity is removed
- [ ] Docs, backend, and frontend expectations are aligned

## Current Status Note

- [x] Backend protocol and unit/regression tests are updated to `module/method/input`
- [x] Calendar read inputs now use strongly typed `date` / timezone-aware `datetime` / `UUID` validation
- [x] Integration/live tests have been rerun after the `module/method` redesign
- [x] `project_cli` tool schema: `input` changed from optional to required (root cause of empty input bug)
- [x] Router prompt cleaned: removed `project_cli_defaults` and time-resolution duties
- [x] Worker contract prompt cleaned: removed `project_cli_defaults` reference
- [x] Calendar SKILL.md rewritten with concrete examples referencing `USER_CONTEXT_JSON` variables
- [x] Integration test assertions migrated from `skill/action` to `module/method`
- [x] 5/6 integration tests passing (calendar read, calendar create, contacts read, memory update, tool flow read)
- [ ] `test_tool_ui_schema_in_history` failing: history API returns tool messages without `metadata.tool_agent_output` (pre-existing issue, not related to prompt changes)
- [ ] Action card filenames under `calendar/actions/` still use old names (`list_day.md`, `get_event.md`) instead of method-based names matching `module/method` contract
- [ ] Per-method review needed: verify each `project_cli` method (create, update, delete, share, accept_invite, reject_invite) works end-to-end with current prompts
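
The typed calendar read inputs mentioned in this status note (`date`, timezone-aware `datetime`, `UUID`) can be approximated with the stdlib alone. The helper names below are hypothetical, and the project's own validation uses Pydantic models rather than these functions.

```python
from datetime import date, datetime
from uuid import UUID


def parse_day(value: str) -> date:
    """Accept only an ISO calendar date such as 2026-04-24."""
    return date.fromisoformat(value)


def parse_aware_datetime(value: str) -> datetime:
    """Accept only ISO datetimes that carry an explicit UTC offset."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None or parsed.utcoffset() is None:
        raise ValueError("datetime must include a UTC offset, e.g. +08:00")
    return parsed


def parse_event_id(value: str) -> UUID:
    """Accept only well-formed UUID strings."""
    return UUID(value)
```

Each helper raises `ValueError` on malformed input, which a caller can translate into a structured validation error.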

feat(agent): redesign project_cli with module/method/input protocol 2026-04-24 13:24:13 +08:00
