d060962a5f
- Replace command/subcommand/args with module/method/input envelope
- Calendar handler uses discriminated union (mode) for read operations
- Strict Pydantic models with extra='forbid' for all calendar methods
- Worker max_iters=7, router prompt simplified (removed project_cli_defaults)
- Skill index cards + per-action files for progressive disclosure
- Frontend/AG-UI aligned to module/method dispatch
- Protocol docs updated to module/method/input contract

WIP: action cards need envelope fix, 2 tests need update, memory handler needs Pydantic models.
Single CLI + Progressive Skill Disclosure Implementation Checklist
Purpose
This checklist turns the PRD into an execution plan. Complete items in order. Do not mark an item complete until the code, docs, and verification for that item are actually done.
Required Reading
- Read `backend/AGENTS.md`
- Read `apps/AGENTS.md`
- Read `.trellis/workflow.md`
- Read `.trellis/spec/backend/index.md`
- Read `.trellis/spec/guides/cross-layer-thinking-guide.md`
- Read the archived task docs in `.trellis/tasks/archive/2026-04/04-20-refactor-tool-cli-skill-ui-schema/`
Locked Decisions For This Task
- Router remains a direct structured stage, not ReAct
- Worker remains the only ReAct stage
- Worker `max_iters` target is 7
- Worker `temperature` stays unchanged
- Single executable tool entry remains `project_cli`
- `command/subcommand/args` model-facing input will be replaced
- The new model-facing input is `module/method/input`
- No broad backward-compatibility aliases will be kept
- Worker duplicate-failure circuit breaker is explicitly out of scope for this task
Phase 0: Task and Protocol Planning
0.1 Task docs
- Create new trellis task directory
- Update `task.json` with real scope, summary, and related files
- Write `prd.md`
- Write `implementation-checklist.md`
- Write `decision-log.md`
0.2 Design checkpoints captured
- Record why multi-tool exposure is rejected
- Record why old `command/subcommand/args` is rejected
- Record why single-tool progressive disclosure is preserved
- Record current Supabase failure evidence and what it implies
Phase 1: Protocol Docs First
1.1 Update tool protocol docs
- Update `docs/protocols/agent/tool-protocol.md`
- Replace model-facing `command/subcommand/args` examples with `module/method/input`
- Document thin outer tool schema + strict server-side action validation
- Document structured validation errors with correction hints
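A minimal sketch of what "thin outer tool schema" could look like in the protocol docs, assuming a JSON-Schema-style description. The constant `PROJECT_CLI_TOOL_SCHEMA`, the function `check_envelope`, and the error keys (`error`, `fields`, `hint`) are hypothetical names for illustration, not the project's actual code. Only the envelope is checked here; per-method `input` validation stays on the server.

```python
from typing import Any

# Hypothetical outer schema: the model only sees the envelope shape.
PROJECT_CLI_TOOL_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "module": {"type": "string"},
        "method": {"type": "string"},
        "input": {"type": "object"},  # contents are validated per method, server-side
    },
    "required": ["module", "method", "input"],
    "additionalProperties": False,
}

LEGACY_KEYS = ("command", "subcommand", "args")


def check_envelope(payload: dict[str, Any]) -> dict[str, Any]:
    """Check the outer envelope only and return a structured result."""
    legacy = [key for key in LEGACY_KEYS if key in payload]
    if legacy:
        return {
            "ok": False,
            "error": "legacy_envelope",
            "fields": legacy,
            "hint": "Send module/method/input instead of command/subcommand/args.",
        }
    missing = [key for key in PROJECT_CLI_TOOL_SCHEMA["required"] if key not in payload]
    if missing:
        return {
            "ok": False,
            "error": "missing_fields",
            "fields": missing,
            "hint": "Provide module, method, and input.",
        }
    unknown = sorted(set(payload) - set(PROJECT_CLI_TOOL_SCHEMA["properties"]))
    if unknown:
        return {
            "ok": False,
            "error": "unknown_fields",
            "fields": unknown,
            "hint": "Only module, method, and input are accepted at the top level.",
        }
    if not isinstance(payload["input"], dict):
        return {
            "ok": False,
            "error": "invalid_type",
            "fields": ["input"],
            "hint": "input must be a JSON object.",
        }
    return {"ok": True}
```

Keeping `input` in `required` mirrors the status note at the end of this checklist, where an optional `input` is named as the root cause of the empty-input bug.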
1.2 Update agent protocol docs
- Update `docs/protocols/agent/sse-events.md` if tool call arg examples change
- Update `docs/protocols/agent/api-endpoints.md` if history/examples mention old CLI arg shapes
- Update any agent protocol doc that currently assumes `calendar read/create/update/...`
1.3 Cross-layer contract review
- Confirm backend examples, protocol docs, and frontend assumptions remain mutually consistent
- Confirm no doc still teaches the model old alias names like `start_time/end_time`
Phase 2: Router and Worker Runtime Contract
2.1 Router output schema
- Keep existing `objective/context_summary/requires_tool_evidence` intact
- Reject heavier router output expansion in favor of a lighter contract and stronger `context_summary`
- Add/update tests for the retained lightweight router contract
2.2 Router prompting
- Update `backend/src/core/agentscope/prompts/agent_prompt.py`
- Teach router to make `context_summary` execution-useful when IDs, dates, ranges, or prior tool outcomes matter
- Standardize all time values in `context_summary` to downstream project_cli input formats
- Avoid turning router into an executor
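To illustrate what "standardized time values" might mean in practice, here is a small sketch that renders dates and instants as ISO 8601 strings. That ISO 8601 is the downstream `project_cli` format is an assumption of this sketch; the helper names `format_day` and `format_instant` are hypothetical.

```python
from datetime import date, datetime


def format_day(value: date) -> str:
    """Render a calendar day as YYYY-MM-DD."""
    if isinstance(value, datetime):
        # datetime is a subclass of date; keep only the day part.
        value = value.date()
    return value.isoformat()


def format_instant(value: datetime) -> str:
    """Render a timezone-aware instant with an explicit UTC offset."""
    if value.tzinfo is None or value.utcoffset() is None:
        raise ValueError("naive datetime: attach a timezone before summarising")
    return value.isoformat(timespec="seconds")
```

Refusing naive datetimes keeps the summary unambiguous, so the worker does not have to guess a timezone before calling the tool.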
2.3 Worker runtime settings
- Update `backend/src/core/agentscope/runtime/runner.py` to pass `max_iters=7` into `JsonReActAgent`
- Confirm worker `temperature` remains unchanged
- Remove worker runtime dependence on `context_messages` semantics in prompt/runtime guidance
- Keep schema unchanged for now, but stop exposing worker `context_messages` in worker prompt semantics
2.4 Phase 2 verification
- Run targeted router/worker schema, prompt, and runner unit tests
- Confirm worker prompt no longer advertises `context_messages.mode/count`
- Confirm worker input still contains only the router contract message
- Confirm worker agent construction passes `max_iters=7`
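For readers unfamiliar with what an iteration cap buys, the following generic loop shows the behaviour being verified: the worker stops after at most seven reason/act steps even if no answer is produced. This is not how `JsonReActAgent` is implemented; `run_bounded` and `step` are hypothetical stand-ins.

```python
from typing import Callable, Optional

MAX_ITERS = 7  # the cap named in this checklist


def run_bounded(
    step: Callable[[int], Optional[str]],
    max_iters: int = MAX_ITERS,
) -> tuple[Optional[str], int]:
    """Call step(i) until it returns an answer or the cap is reached.

    Returns the answer (or None) and the number of iterations used.
    """
    for i in range(1, max_iters + 1):
        answer = step(i)
        if answer is not None:
            return answer, i
    return None, max_iters
```

A unit test for the real runner can follow the same idea: feed a step that never finishes and assert that exactly seven iterations were consumed.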
Phase 3: Single CLI Input Protocol Redesign
3.1 Replace model-facing request envelope
- Update `backend/src/core/agentscope/tools/internal/project_cli.py`
- Update `backend/src/core/agentscope/tools/cli/adapter.py`
- Replace `command/subcommand/args` with `module/method/input`
- Remove `args` string parsing compatibility
- Keep tool result persistence and AG-UI flow intact
3.2 Action dispatch layer
- Add explicit dispatch by `module + method`
- Add strict per-method Pydantic request models with `extra="forbid"` for calendar methods
- Ensure unknown `module` and unknown `method` return structured errors
- Ensure method validators surface structured error details for invalid/missing fields
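The dispatch rules above can be sketched with a nested registry keyed by module and then method. Everything here is illustrative: `REGISTRY`, `dispatch`, the stub handlers, and the error keys are hypothetical, and the method names are borrowed from Phase 4 of this checklist, so the shipped names may differ. Per-method Pydantic validation is left out to keep the sketch dependency-free.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], dict[str, Any]]


def _list_day(data: dict[str, Any]) -> dict[str, Any]:
    # Stub handler: a real one would validate `data` and call the service layer.
    return {"ok": True, "method": "list_day", "input": data}


def _get_event(data: dict[str, Any]) -> dict[str, Any]:
    return {"ok": True, "method": "get_event", "input": data}


REGISTRY: dict[str, dict[str, Handler]] = {
    "calendar": {"list_day": _list_day, "get_event": _get_event},
}


def dispatch(module: str, method: str, data: dict[str, Any]) -> dict[str, Any]:
    """Select a handler by module + method, or return a structured error."""
    methods = REGISTRY.get(module)
    if methods is None:
        return {
            "ok": False,
            "error": "unknown_module",
            "module": module,
            "known_modules": sorted(REGISTRY),
        }
    handler = methods.get(method)
    if handler is None:
        return {
            "ok": False,
            "error": "unknown_method",
            "module": module,
            "method": method,
            "known_methods": sorted(methods),
        }
    return handler(data)
```

Listing the known modules or methods in the error gives the model something concrete to correct against on its next turn.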
3.3 Remove legacy input aliases
- Reject `start_time/end_time`
- Reject `event_timezone`
- Reject using `event_id` with list-style actions
- Confirm error messages are corrective, not merely generic
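As a sketch of "corrective rather than generic" errors, the helper below flags the rejected fields and attaches a hint per problem. The function name, error codes, and hint wording are hypothetical. The canonical replacement field names are deliberately not spelled out because this checklist does not state them, and the list-style method names are taken from Phase 4.

```python
from typing import Any

REJECTED_FIELDS = ("start_time", "end_time", "event_timezone")
LIST_STYLE_METHODS = frozenset({"list_day", "list_range"})


def legacy_field_errors(method: str, data: dict[str, Any]) -> list[dict[str, str]]:
    """Return one structured, corrective error per legacy-input problem."""
    errors: list[dict[str, str]] = []
    for name in REJECTED_FIELDS:
        if name in data:
            errors.append({
                "field": name,
                "code": "legacy_alias",
                "hint": (
                    f"'{name}' is no longer accepted; "
                    f"use the canonical field from the {method} action file."
                ),
            })
    if method in LIST_STYLE_METHODS and "event_id" in data:
        errors.append({
            "field": "event_id",
            "code": "wrong_method",
            "hint": "List methods do not take event_id; call get_event for a single event.",
        })
    return errors
```

An empty list means no legacy input was detected; normal per-method validation would still run afterwards.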
Phase 4: Calendar Business Action Protocol
4.1 Event actions
- Implement `list_day`
- Implement `list_range`
- Implement `get_event`
- Implement `create_event`
- Implement `update_event`
- Implement `delete_event`
4.2 Subscription actions
- Implement `invite_subscriber`
- Implement `accept_invite`
- Implement `reject_invite`
4.3 Handler mapping
- Map actions onto existing `v1.schedule_items.service` operations where possible
- Keep repository -> service layering intact
- Keep `owner_id` derived from auth context, never from tool input
- Preserve existing permission and subscription semantics
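The `owner_id` rule can be enforced at the handler boundary. In this sketch, `AuthContext` and `build_service_call` are hypothetical names, and the shape of the service arguments is assumed; the point is only that the identity comes from the authenticated context and a model-supplied `owner_id` is refused.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class AuthContext:
    # Hypothetical auth context; the real one likely carries more than a user id.
    user_id: str


def build_service_call(auth: AuthContext, data: dict[str, Any]) -> dict[str, Any]:
    """Merge validated tool input with an owner_id taken from auth only."""
    if "owner_id" in data:
        # Never trust an identity supplied by the model.
        raise ValueError("owner_id must not be supplied in tool input")
    return {**data, "owner_id": auth.user_id}
```

With strict `extra="forbid"` request models, an `owner_id` in tool input would already be rejected during validation; the explicit check here just makes the rule visible.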
4.4 Test coverage
- Add targeted unit coverage for calendar action validation paths and dispatch shape changes
- Add unit tests for dispatch selection and validation errors
- Add regression tests for the known `event_id` detail flow
- Add regression tests for canonical create/update field names
4.5 Phase 3 partial verification
- Run targeted CLI router, calendar handler, and tool postprocessor unit tests
- Confirm tool postprocessor resolves UI by `module/method`
- Update integration/live test expectations to the new tool_call_args/result shape
- Confirm integration/live flows execute successfully with the new runtime shape (calendar read verified 2026-04-24)
Phase 5: Skill Refactor For Progressive Disclosure
5.1 Calendar skill packaging
- Rewrite `backend/src/core/agentscope/tools/skills/calendar/SKILL.md` as a short index card
- Add `actions/` files for each calendar action
- Keep action files short, canonical, and example-driven
5.2 Skill composition rules
- Document when calendar should compose with contacts for phone lookup
- Document when worker should read only the index
- Document when worker should read one action file before calling the tool
- Remove long prose that does not help execution
5.3 View skill flow
- Ensure `view_skill_file` can read the new per-action file layout
- Verify enabled-skill restrictions still work with nested action files
- Add tests for reading short skill index + action files
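A minimal sketch of the path handling that nested action files require, assuming skills live in one directory per skill under a common root. `resolve_skill_file` and its parameters are hypothetical and do not describe the real `view_skill_file` signature. The sketch checks the enabled-skill restriction first and then refuses any path that would leave the skill's own directory.

```python
from pathlib import Path, PurePosixPath


def resolve_skill_file(
    root: Path,
    enabled_skills: set[str],
    skill: str,
    relative: str = "SKILL.md",
) -> Path:
    """Resolve a skill file path, allowing nested files such as actions/<name>.md."""
    if skill not in enabled_skills:
        raise PermissionError(f"skill not enabled: {skill}")
    rel = PurePosixPath(relative)
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError("relative path must stay inside the skill directory")
    base = (root / skill).resolve()
    target = base.joinpath(*rel.parts).resolve()
    # Second guard after resolution, in case a symlink points outside the skill.
    if target != base and base not in target.parents:
        raise ValueError("path escapes the skill directory")
    return target
```

Tests for this section could cover three cases: the index card, one nested action file, and a traversal attempt into another skill's directory.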
Phase 6: Frontend and AG-UI Alignment
6.1 Frontend assumptions audit
- Audit `apps/lib/core/chat/ag_ui_event.dart` for tool arg assumptions
- Audit `apps/lib/features/chat/presentation/bloc/chat_bloc.dart` for `project_cli` payload assumptions
- Audit calendar refresh logic that currently looks for `command/subcommand`
6.2 Compatibility decision
- Decide whether frontend should switch refresh logic to `module/method`
- Update frontend parsing if required
- Keep `ui_schema` rendering path unchanged unless protocol docs require otherwise
6.3 Cross-layer verification checklist
- Confirm backend tool payload examples match frontend parser expectations
- Confirm history/SSE still preserve tool result display behavior
- Confirm calendar detail navigation behavior still matches event identity semantics
Phase 7: Verification
7.1 Supabase-backed regression scenarios
- Reproduce the previous "known event_id detail lookup" scenario and verify `get_event` is used
- Reproduce create-event scenario and verify canonical field names only
- Reproduce same-day and range listing scenarios
7.2 Runtime cost controls
- Verify worker max iterations are capped at 7
- Verify worker does not spend extra turns reading unnecessary skill files
- Review whether the redesign keeps common runs under the target token/cost budget
7.3 Code quality
- Run targeted backend tests
- Run targeted frontend tests if parser logic changes
- Run relevant integration or live tests when feasible
- Record what was verified and what remains unverified
Completion Criteria
- `project_cli` remains the only executable business tool
- Worker-facing CLI protocol uses `module/method/input`
- Calendar actions map to actual product objects and routes
- Skills are index-first and action-scoped
- Worker `max_iters=7` is wired
- Worker `context_messages` ambiguity is removed
- Docs, backend, and frontend expectations are aligned
Current Status Note
- Backend protocol and unit/regression tests are updated to `module/method/input`
- Calendar read inputs now use strongly typed `date` / timezone-aware `datetime` / `UUID` validation
- Integration/live tests have been rerun after the `module/method` redesign
- `project_cli` tool schema: `input` changed from optional to required (root cause of empty input bug)
- Router prompt cleaned: removed `project_cli_defaults` and time-resolution duties
- Worker contract prompt cleaned: removed `project_cli_defaults` reference
- Calendar SKILL.md rewritten with concrete examples referencing `USER_CONTEXT_JSON` variables
- Integration test assertions migrated from `skill/action` to `module/method`
- 5/6 integration tests passing (calendar read, calendar create, contacts read, memory update, tool flow read)
- `test_tool_ui_schema_in_history` failing: history API returns tool messages without `metadata.tool_agent_output` (pre-existing issue, not related to prompt changes)
- Action card filenames under `calendar/actions/` still use old names (`list_day.md`, `get_event.md`) instead of method-based names matching the `module/method` contract
- Per-method review needed: verify each `project_cli` method (create, update, delete, share, accept_invite, reject_invite) works end-to-end with current prompts
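The typed validation mentioned for calendar read inputs can be approximated with the standard library alone. This is a sketch and not the project's Pydantic models: the parser names are hypothetical, and ISO 8601 strings are assumed as the wire format.

```python
from datetime import date, datetime
from uuid import UUID


def parse_day(value: str) -> date:
    """Parse YYYY-MM-DD; raises ValueError on anything else."""
    return date.fromisoformat(value)


def parse_instant(value: str) -> datetime:
    """Parse an ISO 8601 datetime and require an explicit UTC offset."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None or parsed.utcoffset() is None:
        raise ValueError("datetime must be timezone-aware")
    return parsed


def parse_event_id(value: str) -> UUID:
    """Parse an event id as a UUID; raises ValueError if malformed."""
    return UUID(value)
```

Each parser raises `ValueError` on bad input, which a handler could translate into the structured, corrective errors described in Phase 3.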