social-app/.trellis/tasks/04-23-redesign-single-cli-skill-disclosure/implementation-checklist.md
qzl d060962a5f feat(agent): redesign project_cli with module/method/input protocol
- Replace command/subcommand/args with module/method/input envelope
- Calendar handler uses discriminated union (mode) for read operations
- Strict Pydantic models with extra='forbid' for all calendar methods
- Worker max_iters=7, router prompt simplified (removed project_cli_defaults)
- Skill index cards + per-action files for progressive disclosure
- Frontend/AG-UI aligned to module/method dispatch
- Protocol docs updated to module/method/input contract

WIP: action cards need envelope fix, 2 tests need update, memory
handler needs Pydantic models.
2026-04-24 13:24:13 +08:00

Single CLI + Progressive Skill Disclosure Implementation Checklist

Purpose

This checklist turns the PRD into an execution plan. Complete items in order. Do not mark an item complete until the code, docs, and verification for that item are actually done.

Required Reading

  • Read backend/AGENTS.md
  • Read apps/AGENTS.md
  • Read .trellis/workflow.md
  • Read .trellis/spec/backend/index.md
  • Read .trellis/spec/guides/cross-layer-thinking-guide.md
  • Read the archived task docs in .trellis/tasks/archive/2026-04/04-20-refactor-tool-cli-skill-ui-schema/

Locked Decisions For This Task

  • Router remains a direct structured stage, not ReAct
  • Worker remains the only ReAct stage
  • Worker max_iters target is 7
  • Worker temperature stays unchanged
  • Single executable tool entry remains project_cli
  • command/subcommand/args model-facing input will be replaced
  • The new model-facing input is module/method/input
  • No broad backward-compatibility aliases will be kept
  • Worker duplicate-failure circuit breaker is explicitly out of scope for this task

Phase 0: Task and Protocol Planning

0.1 Task docs

  • Create new trellis task directory
  • Update task.json with real scope, summary, and related files
  • Write prd.md
  • Write implementation-checklist.md
  • Write decision-log.md

0.2 Design checkpoints captured

  • Record why multi-tool exposure is rejected
  • Record why old command/subcommand/args is rejected
  • Record why single-tool progressive disclosure is preserved
  • Record current Supabase failure evidence and what it implies

Phase 1: Protocol Docs First

1.1 Update tool protocol docs

  • Update docs/protocols/agent/tool-protocol.md
  • Replace model-facing command/subcommand/args examples with module/method/input
  • Document thin outer tool schema + strict server-side action validation
  • Document structured validation errors with correction hints

1.2 Update agent protocol docs

  • Update docs/protocols/agent/sse-events.md if tool call arg examples change
  • Update docs/protocols/agent/api-endpoints.md if history/examples mention old CLI arg shapes
  • Update any agent protocol doc that currently assumes calendar read/create/update/...

1.3 Cross-layer contract review

  • Confirm backend examples, protocol docs, and frontend assumptions remain mutually consistent
  • Confirm no doc still teaches the model old alias names like start_time/end_time

Phase 2: Router and Worker Runtime Contract

2.1 Router output schema

  • Keep existing objective/context_summary/requires_tool_evidence intact
  • Reject heavier router output expansion in favor of a lighter contract and stronger context_summary
  • Add/update tests for the retained lightweight router contract

2.2 Router prompting

  • Update backend/src/core/agentscope/prompts/agent_prompt.py
  • Teach router to make context_summary execution-useful when IDs, dates, ranges, or prior tool outcomes matter
  • Standardize all time values in context_summary to downstream project_cli input formats
  • Avoid turning router into an executor

2.3 Worker runtime settings

  • Update backend/src/core/agentscope/runtime/runner.py to pass max_iters=7 into JsonReActAgent
  • Confirm worker temperature remains unchanged
  • Remove worker runtime dependence on context_messages semantics in prompt/runtime guidance
  • Keep schema unchanged for now, but stop exposing worker context_messages in worker prompt semantics

2.4 Phase 2 verification

  • Run targeted router/worker schema, prompt, and runner unit tests
  • Confirm worker prompt no longer advertises context_messages.mode/count
  • Confirm worker input still contains only the router contract message
  • Confirm worker agent construction passes max_iters=7
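The iteration cap checked above can be pictured with a small stand-alone loop. This is only a sketch of what a `max_iters=7` bound means in practice; `run_capped` and `step` are hypothetical names, and nothing here reflects how `JsonReActAgent` is actually implemented.

```python
from typing import Callable, Optional

MAX_ITERS = 7  # mirrors the locked decision for the worker


def run_capped(
    step: Callable[[int], Optional[str]],
    max_iters: int = MAX_ITERS,
) -> tuple[Optional[str], int]:
    """Call `step` until it returns an answer or the cap is reached.

    `step` stands in for one reason/act turn (hypothetical); it returns
    a final answer string, or None when another turn is needed.
    Returns (answer, turns_used); answer is None if the cap was hit.
    """
    for turn in range(1, max_iters + 1):
        answer = step(turn)
        if answer is not None:
            return answer, turn
    return None, max_iters


# A worker that never converges stops after exactly 7 turns.
calls: list[int] = []
result, used = run_capped(lambda turn: calls.append(turn))
```

A unit test for the real runner would more likely assert on the constructor argument than on loop behavior; the loop is shown only to make the cost bound concrete.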

Phase 3: Single CLI Input Protocol Redesign

3.1 Replace model-facing request envelope

  • Update backend/src/core/agentscope/tools/internal/project_cli.py
  • Update backend/src/core/agentscope/tools/cli/adapter.py
  • Replace command/subcommand/args with module/method/input
  • Remove args string parsing compatibility
  • Keep tool result persistence and AG-UI flow intact
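A minimal sketch of the envelope change in 3.1, assuming the tool receives its arguments as a plain dict. `CliEnvelope` and `parse_envelope` are hypothetical names, not taken from `project_cli.py` or `adapter.py`, and the real adapter may report errors differently.

```python
from dataclasses import dataclass
from typing import Any

LEGACY_KEYS = {"command", "subcommand", "args"}
REQUIRED_KEYS = ("module", "method", "input")


@dataclass(frozen=True)
class CliEnvelope:
    module: str
    method: str
    input: dict[str, Any]


def parse_envelope(raw: dict[str, Any]) -> CliEnvelope:
    """Accept only the module/method/input shape; no legacy fallback."""
    legacy = sorted(LEGACY_KEYS & raw.keys())
    if legacy:
        raise ValueError(
            f"legacy keys {legacy} are no longer accepted; use module/method/input"
        )
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    if not isinstance(raw["module"], str) or not isinstance(raw["method"], str):
        raise ValueError("module and method must be strings")
    if not isinstance(raw["input"], dict):
        # The old protocol passed a CLI-style args string; that is rejected here.
        raise ValueError("input must be an object, not an args string")
    return CliEnvelope(raw["module"], raw["method"], raw["input"])


envelope = parse_envelope(
    {"module": "calendar", "method": "get_event", "input": {"event_id": "abc"}}
)
```

Treating a missing `input` as an error rather than defaulting it to `{}` matches the status note that `input` became required in the tool schema.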

3.2 Action dispatch layer

  • Add explicit dispatch by module + method
  • Add strict per-method Pydantic request models with extra="forbid" for calendar methods
  • Ensure unknown module and unknown method return structured errors
  • Ensure method validators surface structured error details for invalid/missing fields
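The dispatch layer in 3.2 could look roughly like the following, assuming Pydantic v2. The registry, the error codes, and the `day` field name are illustrative assumptions; only `extra="forbid"`, the `calendar` module, and the `list_day` / `get_event` method names come from this checklist.

```python
import datetime
from typing import Any
from uuid import UUID

from pydantic import BaseModel, ConfigDict, ValidationError


class StrictInput(BaseModel):
    # extra="forbid" turns unknown fields into validation errors
    # instead of silently dropping them.
    model_config = ConfigDict(extra="forbid")


class ListDayInput(StrictInput):
    day: datetime.date  # hypothetical field name


class GetEventInput(StrictInput):
    event_id: UUID


# Hypothetical registry keyed by (module, method).
REGISTRY: dict[tuple[str, str], type[StrictInput]] = {
    ("calendar", "list_day"): ListDayInput,
    ("calendar", "get_event"): GetEventInput,
}


def _error(code: str, **extra: Any) -> dict[str, Any]:
    return {"ok": False, "error": {"code": code, **extra}}


def dispatch(module: str, method: str, payload: dict[str, Any]) -> dict[str, Any]:
    known_modules = sorted({m for m, _ in REGISTRY})
    if module not in known_modules:
        return _error("unknown_module", module=module, known_modules=known_modules)
    model = REGISTRY.get((module, method))
    if model is None:
        known_methods = sorted(name for m, name in REGISTRY if m == module)
        return _error("unknown_method", method=method, known_methods=known_methods)
    try:
        parsed = model.model_validate(payload)
    except ValidationError as exc:
        # Flatten Pydantic's error list into per-field details.
        details = [
            {
                "field": ".".join(str(part) for part in err["loc"]),
                "type": err["type"],
                "message": err["msg"],
            }
            for err in exc.errors()
        ]
        return _error("invalid_input", details=details)
    # A real handler would call the service here; this sketch echoes the input.
    return {"ok": True, "module": module, "method": method,
            "input": parsed.model_dump(mode="json")}
```

Returning errors as data rather than raising lets the worker read which field failed and retry with a corrected call.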

3.3 Remove legacy input aliases

  • Reject start_time/end_time
  • Reject event_timezone
  • Reject using event_id with list-style actions
  • Confirm error messages are corrective (they say what to send instead), not merely generic
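One way to make the rejections in 3.3 corrective is to attach a hint to each legacy field. The sketch below is an assumption about shape only: the function name, error codes, and hint wording are hypothetical, and the hints deliberately point at the action file rather than naming canonical fields, since this checklist does not list them.

```python
from typing import Any

# Methods treated as "list-style" here; taken from the 4.1 action names.
LIST_METHODS = {"list_day", "list_range"}

# Hypothetical hint text; the canonical names live in the per-action skill files.
LEGACY_ALIAS_HINTS = {
    "start_time": "legacy alias; use the canonical start field from the action file",
    "end_time": "legacy alias; use the canonical end field from the action file",
    "event_timezone": "legacy alias; use the canonical timezone field from the action file",
}


def legacy_input_errors(method: str, payload: dict[str, Any]) -> list[dict[str, str]]:
    """Return one corrective error per legacy field found in the payload."""
    errors: list[dict[str, str]] = []
    for field, hint in LEGACY_ALIAS_HINTS.items():
        if field in payload:
            errors.append({"field": field, "code": "legacy_alias", "hint": hint})
    if method in LIST_METHODS and "event_id" in payload:
        errors.append({
            "field": "event_id",
            "code": "wrong_method",
            "hint": f"event_id identifies one event; call get_event instead of {method}",
        })
    return errors


problems = legacy_input_errors(
    "list_day", {"event_id": "abc", "start_time": "2026-04-24T09:00:00+08:00"}
)
```

With `extra="forbid"` on the request models these fields would already fail validation; a pre-check like this only adds the corrective hint on top.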

Phase 4: Calendar Business Action Protocol

4.1 Event actions

  • Implement list_day
  • Implement list_range
  • Implement get_event
  • Implement create_event
  • Implement update_event
  • Implement delete_event

4.2 Subscription actions

  • Implement invite_subscriber
  • Implement accept_invite
  • Implement reject_invite

4.3 Handler mapping

  • Map actions onto existing v1.schedule_items.service operations where possible
  • Keep repository -> service layering intact
  • Keep owner_id derived from auth context, never from tool input
  • Preserve existing permission and subscription semantics
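The `owner_id` rule in 4.3 can be sketched as a small boundary function between tool input and the service call. `AuthContext`, `user_id`, and `build_service_args` are hypothetical names; the actual auth object and service signatures in this codebase may differ.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class AuthContext:
    user_id: str  # hypothetical: the authenticated caller's ID


def build_service_args(auth: AuthContext, tool_input: dict[str, Any]) -> dict[str, Any]:
    """Merge validated tool input with an owner_id taken from auth only."""
    if "owner_id" in tool_input:
        # Fail loudly rather than overwrite, so a model that tries to set
        # the owner gets a clear signal.
        raise ValueError(
            "owner_id must not appear in tool input; it is derived from the auth context"
        )
    return {**tool_input, "owner_id": auth.user_id}


args = build_service_args(AuthContext(user_id="user-1"), {"title": "Standup"})
```

Rejecting the field outright, instead of silently replacing it, keeps the behavior consistent with the strict `extra="forbid"` models.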

4.4 Test coverage

  • Add targeted unit coverage for calendar action validation paths and dispatch shape changes
  • Add unit tests for dispatch selection and validation errors
  • Add regression tests for the known event_id detail flow
  • Add regression tests for canonical create/update field names

4.5 Phase 3 and Phase 4 partial verification

  • Run targeted CLI router, calendar handler, and tool postprocessor unit tests
  • Confirm tool postprocessor resolves UI by module/method
  • Update integration/live test expectations to the new tool_call_args/result shape
  • Confirm integration/live flows execute successfully with the new runtime shape (calendar read verified 2026-04-24)

Phase 5: Skill Refactor For Progressive Disclosure

5.1 Calendar skill packaging

  • Rewrite backend/src/core/agentscope/tools/skills/calendar/SKILL.md as a short index card
  • Add actions/ files for each calendar action
  • Keep action files short, canonical, and example-driven

5.2 Skill composition rules

  • Document when calendar should compose with contacts for phone lookup
  • Document when worker should read only the index
  • Document when worker should read one action file before calling the tool
  • Remove long prose that does not help execution

5.3 View skill flow

  • Ensure view_skill_file can read the new per-action file layout
  • Verify enabled-skill restrictions still work with nested action files
  • Add tests for reading short skill index + action files
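The checks in 5.3 amount to resolving a nested path while keeping it inside an enabled skill's directory. The following is a sketch under that assumption; `resolve_skill_file` is a hypothetical helper, not the `view_skill_file` implementation, and it requires Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path, PurePosixPath


def resolve_skill_file(skills_root: Path, enabled_skills: set[str], relative_path: str) -> Path:
    """Resolve e.g. 'calendar/actions/get_event.md' under skills_root.

    Raises PermissionError if the skill is not enabled or the path
    escapes that skill's directory, FileNotFoundError if it is missing.
    """
    parts = PurePosixPath(relative_path).parts
    if not parts or parts[0] not in enabled_skills:
        raise PermissionError(f"skill not enabled for path: {relative_path}")
    root = skills_root.resolve()
    skill_dir = root / parts[0]
    candidate = (root / relative_path).resolve()
    # resolve() collapses '..', so traversal into another skill is caught here.
    if not candidate.is_relative_to(skill_dir):
        raise PermissionError(f"path escapes skill directory: {relative_path}")
    if not candidate.is_file():
        raise FileNotFoundError(relative_path)
    return candidate
```

Checking the first path segment against the enabled set means nested `actions/` files inherit the restriction of their parent skill without a separate allow-list.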

Phase 6: Frontend and AG-UI Alignment

6.1 Frontend assumptions audit

  • Audit apps/lib/core/chat/ag_ui_event.dart for tool arg assumptions
  • Audit apps/lib/features/chat/presentation/bloc/chat_bloc.dart for project_cli payload assumptions
  • Audit calendar refresh logic that currently looks for command/subcommand

6.2 Compatibility decision

  • Decide whether frontend should switch refresh logic to module/method
  • Update frontend parsing if required
  • Keep ui_schema rendering path unchanged unless protocol docs require otherwise

6.3 Cross-layer verification checklist

  • Confirm backend tool payload examples match frontend parser expectations
  • Confirm history/SSE still preserve tool result display behavior
  • Confirm calendar detail navigation behavior still matches event identity semantics

Phase 7: Verification

7.1 Supabase-backed regression scenarios

  • Reproduce the previous "known event_id detail lookup" scenario and verify get_event is used
  • Reproduce create-event scenario and verify canonical field names only
  • Reproduce same-day and range listing scenarios

7.2 Runtime cost controls

  • Verify worker max iterations are capped at 7
  • Verify worker does not spend extra turns reading unnecessary skill files
  • Review whether the redesign keeps common runs under the target token/cost budget

7.3 Code quality

  • Run targeted backend tests
  • Run targeted frontend tests if parser logic changes
  • Run relevant integration or live tests when feasible
  • Record what was verified and what remains unverified

Completion Criteria

  • project_cli remains the only executable business tool
  • Worker-facing CLI protocol uses module/method/input
  • Calendar actions map to actual product objects and routes
  • Skills are index-first and action-scoped
  • Worker max_iters=7 is wired
  • Worker context_messages ambiguity is removed
  • Docs, backend, and frontend expectations are aligned

Current Status Note

  • Backend protocol and unit/regression tests are updated to module/method/input
  • Calendar read inputs now use strongly typed validation: date, timezone-aware datetime, and UUID
  • Integration/live tests have been rerun after the module/method redesign
  • project_cli tool schema: input changed from optional to required (the optional input was the root cause of the empty-input bug)
  • Router prompt cleaned: removed project_cli_defaults and time-resolution duties
  • Worker contract prompt cleaned: removed project_cli_defaults reference
  • Calendar SKILL.md rewritten with concrete examples referencing USER_CONTEXT_JSON variables
  • Integration test assertions migrated from skill/action to module/method
  • 5/6 integration tests passing (calendar read, calendar create, contacts read, memory update, tool flow read)
  • test_tool_ui_schema_in_history failing: history API returns tool messages without metadata.tool_agent_output (pre-existing issue, not related to prompt changes)
  • Action card filenames under calendar/actions/ still use old names (list_day.md, get_event.md) instead of method-based names matching module/method contract
  • Per-method review needed: verify each project_cli method (create, update, delete, share, accept_invite, reject_invite) works end-to-end with current prompts