
qzl d060962a5f feat(agent): redesign project_cli with module/method/input protocol

- Replace command/subcommand/args with module/method/input envelope
- Calendar handler uses discriminated union (mode) for read operations
- Strict Pydantic models with extra='forbid' for all calendar methods
- Worker max_iters=7, router prompt simplified (removed project_cli_defaults)
- Skill index cards + per-action files for progressive disclosure
- Frontend/AG-UI aligned to module/method dispatch
- Protocol docs updated to module/method/input contract

WIP: action cards need envelope fix, 2 tests need update, memory
handler needs Pydantic models.

2026-04-24 13:24:13 +08:00

10 KiB


Single CLI + Progressive Skill Disclosure Implementation Checklist

Purpose

This checklist turns the PRD into an execution plan. Complete items in order. Do not mark an item complete until the code, docs, and verification for that item are actually done.

Required Reading

Read backend/AGENTS.md
Read apps/AGENTS.md
Read .trellis/workflow.md
Read .trellis/spec/backend/index.md
Read .trellis/spec/guides/cross-layer-thinking-guide.md
Read the archived task docs in .trellis/tasks/archive/2026-04/04-20-refactor-tool-cli-skill-ui-schema/

Locked Decisions For This Task

Router remains a direct structured stage, not ReAct
Worker remains the only ReAct stage
Worker max_iters target is 7
Worker temperature stays unchanged
Single executable tool entry remains project_cli
command/subcommand/args model-facing input will be replaced
The new model-facing input is module/method/input
No broad backward-compatibility aliases will be kept
Worker duplicate-failure circuit breaker is explicitly out of scope for this task

Phase 0: Task and Protocol Planning

0.1 Task docs

Create new trellis task directory
Update task.json with real scope, summary, and related files
Write prd.md
Write implementation-checklist.md
Write decision-log.md

0.2 Design checkpoints captured

Record why multi-tool exposure is rejected
Record why old command/subcommand/args is rejected
Record why single-tool progressive disclosure is preserved
Record current Supabase failure evidence and what it implies

Phase 1: Protocol Docs First

1.1 Update tool protocol docs

Update docs/protocols/agent/tool-protocol.md
Replace model-facing command/subcommand/args examples with module/method/input
Document thin outer tool schema + strict server-side action validation
Document structured validation errors with correction hints

1.2 Update agent protocol docs

Update docs/protocols/agent/sse-events.md if tool call arg examples change
Update docs/protocols/agent/api-endpoints.md if history/examples mention old CLI arg shapes
Update any agent protocol doc that currently assumes calendar read/create/update/...

1.3 Cross-layer contract review

Confirm backend examples, protocol docs, and frontend assumptions remain mutually consistent
Confirm no doc still teaches the model old alias names like start_time/end_time

Phase 2: Router and Worker Runtime Contract

2.1 Router output schema

Keep existing objective/context_summary/requires_tool_evidence intact
Reject heavier router output expansion in favor of a lighter contract and stronger context_summary
Add/update tests for the retained lightweight router contract

2.2 Router prompting

Update backend/src/core/agentscope/prompts/agent_prompt.py
Teach router to make context_summary execution-useful when IDs, dates, ranges, or prior tool outcomes matter
Standardize all time values in context_summary to downstream project_cli input formats
Avoid turning router into an executor

2.3 Worker runtime settings

Update backend/src/core/agentscope/runtime/runner.py to pass max_iters=7 into JsonReActAgent
Confirm worker temperature remains unchanged
Remove worker runtime dependence on context_messages semantics in prompt/runtime guidance
Keep schema unchanged for now, but stop exposing worker context_messages in worker prompt semantics

2.4 Phase 2 verification

Run targeted router/worker schema, prompt, and runner unit tests
Confirm worker prompt no longer advertises context_messages.mode/count
Confirm worker input still contains only the router contract message
Confirm worker agent construction passes max_iters=7

Phase 3: Single CLI Input Protocol Redesign

3.1 Replace model-facing request envelope

Update backend/src/core/agentscope/tools/internal/project_cli.py
Update backend/src/core/agentscope/tools/cli/adapter.py
Replace command/subcommand/args with module/method/input
Remove args string parsing compatibility
Keep tool result persistence and AG-UI flow intact
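The envelope change in 3.1 can be sketched as follows. This is a minimal stdlib-only illustration, not the actual project_cli.py or adapter.py code: the names Envelope, EnvelopeError, and parse_envelope are hypothetical, and the real implementation may use Pydantic instead of a dataclass.

```python
from dataclasses import dataclass
from typing import Any

# Keys of the old model-facing protocol that are no longer accepted.
LEGACY_KEYS = {"command", "subcommand", "args"}
REQUIRED_KEYS = ("module", "method", "input")


class EnvelopeError(ValueError):
    """Hypothetical error type for a malformed outer envelope."""


@dataclass(frozen=True)
class Envelope:
    module: str
    method: str
    input: dict[str, Any]


def parse_envelope(raw: dict[str, Any]) -> Envelope:
    """Validate the thin outer envelope; per-method checks happen later."""
    legacy = sorted(LEGACY_KEYS & raw.keys())
    if legacy:
        raise EnvelopeError(
            f"legacy keys {legacy} are not accepted; use module/method/input"
        )
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise EnvelopeError(f"missing required keys: {missing}")
    unknown = sorted(raw.keys() - set(REQUIRED_KEYS))
    if unknown:
        raise EnvelopeError(f"unexpected keys: {unknown}")
    for key in ("module", "method"):
        if not isinstance(raw[key], str) or not raw[key]:
            raise EnvelopeError(f"{key} must be a non-empty string")
    # No string parsing: input must already be a structured object.
    if not isinstance(raw["input"], dict):
        raise EnvelopeError("input must be an object, not a string")
    return Envelope(raw["module"], raw["method"], raw["input"])
```

Keeping the outer check this thin leaves field-level rules to the per-method models, which matches the "thin outer tool schema + strict server-side action validation" split described in Phase 1.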

3.2 Action dispatch layer

Add explicit dispatch by module + method
Add strict per-method Pydantic request models with extra="forbid" for calendar methods
Ensure unknown module and unknown method return structured errors
Ensure method validators surface structured error details for invalid/missing fields
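One possible shape for the dispatch layer in 3.2 is sketched below. It uses only the standard library as a stand-in for the Pydantic models with extra="forbid"; the REGISTRY table, the error codes, and the result dictionary layout are assumptions made for illustration, and the sketch treats every listed field as required.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], dict[str, Any]]


def _get_event(payload: dict[str, Any]) -> dict[str, Any]:
    # Placeholder handler; a real one would call the service layer.
    return {"ok": True, "data": {"event_id": payload["event_id"]}}


# (module, method) -> (allowed and required fields, handler). Hypothetical table.
REGISTRY: dict[tuple[str, str], tuple[frozenset[str], Handler]] = {
    ("calendar", "get_event"): (frozenset({"event_id"}), _get_event),
}


def error(code: str, message: str, **details: Any) -> dict[str, Any]:
    """Build a structured error the worker can act on."""
    return {"ok": False, "error": {"code": code, "message": message, "details": details}}


def dispatch(module: str, method: str, payload: dict[str, Any]) -> dict[str, Any]:
    modules = sorted({mod for mod, _ in REGISTRY})
    if module not in modules:
        return error("unknown_module", f"unknown module '{module}'", known_modules=modules)
    entry = REGISTRY.get((module, method))
    if entry is None:
        methods = sorted(meth for mod, meth in REGISTRY if mod == module)
        return error("unknown_method", f"unknown method '{method}'", known_methods=methods)
    allowed, handler = entry
    extra = sorted(payload.keys() - allowed)      # mimics extra="forbid"
    missing = sorted(allowed - payload.keys())
    if extra or missing:
        return error(
            "invalid_input",
            "input does not match the method schema",
            unexpected_fields=extra,
            missing_fields=missing,
        )
    return handler(payload)
```

Returning errors as data rather than raising keeps the failure visible in the tool result, so the worker can correct its next call instead of retrying blindly.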

3.3 Remove legacy input aliases

Reject start_time/end_time
Reject event_timezone
Reject using event_id with list-style actions
Confirm error messages are corrective (they name the offending field and how to fix it) rather than only generic
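A corrective rejection of the legacy aliases might look like the sketch below. The checklist does not name the canonical replacement fields, so start_at, end_at, and timezone are hypothetical placeholders, as are the function name and the error dictionary layout.

```python
from typing import Any

# Legacy alias -> canonical field. The canonical names are HYPOTHETICAL;
# the real ones live in the per-method request models.
HYPOTHETICAL_RENAMES = {
    "start_time": "start_at",
    "end_time": "end_at",
    "event_timezone": "timezone",
}

LIST_STYLE_METHODS = {"list_day", "list_range"}


def legacy_field_errors(method: str, payload: dict[str, Any]) -> list[dict[str, str]]:
    """Return one corrective error per rejected legacy field."""
    errors: list[dict[str, str]] = []
    for field in sorted(payload):
        canonical = HYPOTHETICAL_RENAMES.get(field)
        if canonical is not None:
            errors.append(
                {
                    "field": field,
                    "problem": "legacy alias is rejected",
                    "hint": f"use '{canonical}' instead",
                }
            )
    if method in LIST_STYLE_METHODS and "event_id" in payload:
        errors.append(
            {
                "field": "event_id",
                "problem": f"'{method}' lists events and does not accept event_id",
                "hint": "call get_event with event_id to fetch a single event",
            }
        )
    return errors
```

Each entry names the offending field and the next step, which is what separates a corrective message from a generic "validation failed".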

Phase 4: Calendar Business Action Protocol

4.1 Event actions

Implement list_day
Implement list_range
Implement get_event
Implement create_event
Implement update_event
Implement delete_event

4.2 Subscription actions

Implement invite_subscriber
Implement accept_invite
Implement reject_invite

4.3 Handler mapping

Map actions onto existing v1.schedule_items.service operations where possible
Keep repository -> service layering intact
Keep owner_id derived from auth context, never from tool input
Preserve existing permission and subscription semantics
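The owner_id rule in 4.3 can be illustrated with a small sketch. AuthContext and build_service_call are hypothetical names, and the real handler would pass the result to the existing v1.schedule_items.service operations rather than return a dictionary.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class AuthContext:
    """Hypothetical stand-in for the authenticated request context."""

    user_id: str


def build_service_call(auth: AuthContext, payload: dict[str, Any]) -> dict[str, Any]:
    """Merge validated tool input with the owner taken from auth only."""
    if "owner_id" in payload:
        # The model must never choose whose data is touched.
        raise PermissionError("owner_id must not be supplied in tool input")
    return {**payload, "owner_id": auth.user_id}
```

Rejecting the field outright, instead of silently overwriting it, makes an attempted override visible in logs and in the tool result.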

4.4 Test coverage

Add targeted unit coverage for calendar action validation paths and dispatch shape changes
Add unit tests for dispatch selection and validation errors
Add regression tests for the known event_id detail flow
Add regression tests for canonical create/update field names

4.5 Phase 3 partial verification

Run targeted CLI router, calendar handler, and tool postprocessor unit tests
Confirm tool postprocessor resolves UI by module/method
Update integration/live test expectations to the new tool_call_args/result shape
Confirm integration/live flows execute successfully with the new runtime shape (calendar read verified 2026-04-24)

Phase 5: Skill Refactor For Progressive Disclosure

5.1 Calendar skill packaging

Rewrite backend/src/core/agentscope/tools/skills/calendar/SKILL.md as a short index card
Add actions/ files for each calendar action
Keep action files short, canonical, and example-driven

5.2 Skill composition rules

Document when calendar should compose with contacts for phone lookup
Document when worker should read only the index
Document when worker should read one action file before calling the tool
Remove long prose that does not help execution

5.3 View skill flow

Ensure view_skill_file can read the new per-action file layout
Verify enabled-skill restrictions still work with nested action files
Add tests for reading short skill index + action files
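The nested-layout check in 5.3 could be approached as in the sketch below. It assumes a layout of <skill>/SKILL.md plus <skill>/actions/<name>.md, which is inferred from this checklist; resolve_skill_file is a hypothetical helper and not the actual view_skill_file implementation.

```python
from pathlib import PurePosixPath


def resolve_skill_file(relative_path: str, enabled_skills: set[str]) -> PurePosixPath:
    """Validate a skill-relative path against the enabled skills and layout."""
    path = PurePosixPath(relative_path)
    # Block absolute paths and directory traversal before anything else.
    if path.is_absolute() or ".." in path.parts:
        raise ValueError("path must stay inside the skills directory")
    if not path.parts or path.parts[0] not in enabled_skills:
        raise PermissionError("skill is not enabled for this agent")
    is_index = len(path.parts) == 2 and path.name == "SKILL.md"
    is_action = (
        len(path.parts) == 3
        and path.parts[1] == "actions"
        and path.suffix == ".md"
    )
    if not (is_index or is_action):
        raise ValueError("only SKILL.md and actions/*.md can be read")
    return path
```

Checking the first path component against the enabled set means a nested action file inherits the restriction of its parent skill without any extra bookkeeping.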

Phase 6: Frontend and AG-UI Alignment

6.1 Frontend assumptions audit

Audit apps/lib/core/chat/ag_ui_event.dart for tool arg assumptions
Audit apps/lib/features/chat/presentation/bloc/chat_bloc.dart for project_cli payload assumptions
Audit calendar refresh logic that currently looks for command/subcommand

6.2 Compatibility decision

Decide whether frontend should switch refresh logic to module/method
Update frontend parsing if required
Keep ui_schema rendering path unchanged unless protocol docs require otherwise

6.3 Cross-layer verification checklist

Confirm backend tool payload examples match frontend parser expectations
Confirm history/SSE still preserve tool result display behavior
Confirm calendar detail navigation behavior still matches event identity semantics

Phase 7: Verification

7.1 Supabase-backed regression scenarios

Reproduce the previous "known event_id detail lookup" scenario and verify get_event is used
Reproduce create-event scenario and verify canonical field names only
Reproduce same-day and range listing scenarios

7.2 Runtime cost controls

Verify worker max iterations are capped at 7
Verify worker does not spend extra turns reading unnecessary skill files
Review whether the redesign keeps common runs under the target token/cost budget

7.3 Code quality

Run targeted backend tests
Run targeted frontend tests if parser logic changes
Run relevant integration or live tests when feasible
Record what was verified and what remains unverified

Completion Criteria

project_cli remains the only executable business tool
Worker-facing CLI protocol uses module/method/input
Calendar actions map to actual product objects and routes
Skills are index-first and action-scoped
Worker max_iters=7 is wired
Worker context_messages ambiguity is removed
Docs, backend, and frontend expectations are aligned

Current Status Note

Backend protocol and unit/regression tests are updated to module/method/input
Calendar read inputs now use strongly typed validation: date, timezone-aware datetime, and UUID
Integration/live tests have been rerun after the module/method redesign
project_cli tool schema: input changed from optional to required (the optional input was the root cause of the empty-input bug)
Router prompt cleaned: removed project_cli_defaults and time-resolution duties
Worker contract prompt cleaned: removed project_cli_defaults reference
Calendar SKILL.md rewritten with concrete examples referencing USER_CONTEXT_JSON variables
Integration test assertions migrated from skill/action to module/method
5/6 integration tests passing (calendar read, calendar create, contacts read, memory update, tool flow read)
test_tool_ui_schema_in_history failing: history API returns tool messages without metadata.tool_agent_output (pre-existing issue, not related to prompt changes)
Action card filenames under calendar/actions/ still use old names (list_day.md, get_event.md) instead of method-based names that match the module/method contract
Per-method review needed: verify each project_cli method (create, update, delete, share, accept_invite, reject_invite) works end-to-end with current prompts
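The typed validation of calendar read inputs mentioned in this status note can be approximated with the standard library, as sketched below. The function names are hypothetical, and the actual handlers use Pydantic models, so this only illustrates the three checks (date, timezone-aware datetime, UUID).

```python
from datetime import date, datetime
from uuid import UUID


def parse_day(value: str) -> date:
    """Accept only an ISO calendar date such as 2026-04-24."""
    return date.fromisoformat(value)


def parse_aware_datetime(value: str) -> datetime:
    """Accept only ISO datetimes that carry an explicit UTC offset."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None or parsed.utcoffset() is None:
        raise ValueError(
            "datetime must include a UTC offset, e.g. 2026-04-24T13:24:13+08:00"
        )
    return parsed


def parse_event_id(value: str) -> UUID:
    """Reject anything that is not a well-formed UUID string."""
    return UUID(value)
```

All three helpers raise ValueError on bad input, so a caller can translate failures into one structured validation error per field.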
