# AgentScope Skill + CLI Tool Refactor Implementation Checklist

## Purpose

This file is the execution checklist for implementing the PRD in:

- `.trellis/tasks/04-20-refactor-tool-cli-skill-ui-schema/prd.md`

Use this document as the working guide during implementation. Do not mark an item complete until the code, docs, and verification for that item are actually done.

## Required Standards Read Before Backend Changes

- [x] Read `backend/AGENTS.md`
- [x] Read `.trellis/spec/backend/index.md`
- [x] Read `.trellis/spec/backend/database-guidelines.md`
- [x] Read `.trellis/spec/backend/error-handling.md`
- [x] Read `.trellis/spec/backend/logging-guidelines.md`
- [x] Read `.trellis/spec/backend/quality-guidelines.md`
- [x] Confirm `.trellis/spec/backend/type-safety.md` does not exist; use the current backend schema/type rules from `backend/AGENTS.md` and the repository code as the effective type-safety baseline

## Non-Negotiable Constraints

- [x] Protocol docs are updated before implementation changes that alter contracts
- [ ] New backend runtime code reads configuration only through `core.config.settings`
- [ ] New backend runtime code uses project logging, never `print()`
- [ ] New backend errors follow RFC 7807 with a stable `code`
- [ ] Any new or changed error codes are updated in `docs/protocols/common/http-error-codes.md`
- [ ] Repository/service layering remains intact
- [ ] `owner_id` is never treated as a credential
- [ ] No new error swallowing is introduced
- [x] `ToolAgentOutput.result` remains the canonical machine-oriented tool result field

## Execution Order

- [ ] Phase 0 completed before any runtime contract change is implemented
- [x] Phase 1 completed before replacing tool execution with CLI-backed wrappers
- [ ] Phase 2 completed before auth credential transport is wired into queue/runtime
- [ ] Phase 3 completed before frontend contract alignment begins
- [ ] Phase 4 completed before cleanup is considered done
- [ ] Phase 5 verification completed before the task is marked finished

## Phase 0: Protocol Docs First

### 0.1 Define the tool protocol source of truth

- [x] Add `docs/protocols/agent/tool-protocol.md`
- [x] Document that CLI execution produces structured `result` as the source payload
- [x] Document that `ToolResponse` only carries the text projection of `result`
- [x] Document that runtime tool post-processing reconstructs the full `ToolAgentOutput`
- [x] Document that tool post-processing is responsible for `status`, `error`, and `ui_hints`
- [x] Document that `message.content` is the full JSON text projection of `ToolAgentOutput.result`
- [x] Document that `ToolAgentOutput` is used for SSE, persistence, history recovery, and context rebuild
- [x] Document the CLI input channel split: `argv` primary, `stdin` secondary, environment variables for controlled auth injection
- [x] Document the stdout JSON shape and non-zero exit semantics
- [x] Document that shell execution is not exposed; the router is whitelist-only

### 0.2 Remove `ui_schema` from active protocol

- [x] Update `docs/protocols/ui/data-flow.md`
- [x] Replace worker-driven UI source descriptions with tool-driven `ui_hints`
- [x] Explicitly document that worker output no longer includes `ui_hints`
- [x] Explicitly document that history tool UI recovery reads `metadata.tool_agent_output.ui_hints` and compiles it to `ui_schema`
- [x] Update `docs/protocols/ui/ui-schema.md`
- [x] Clarify that `ui_hints` is the descriptive UI representation (source) and `ui_schema` is the rendered format (wire format)
- [x] Clarify that the frontend renderer continues to consume `ui_schema`
- [x] Document that the `ui_hints → ui_schema` compilation path remains unchanged; only the source of `ui_hints` changes

### 0.3 Update SSE and HTTP contracts

- [x] Update `docs/protocols/agent/sse-events.md`
- [x] Remove worker `ui_hints` from `TEXT_MESSAGE_END`
- [x] Define the `TOOL_CALL_RESULT` payload with `ui_schema` (compiled from `ui_hints`)
- [x] Document that `ui_hints → ui_schema` compilation happens in the backend codec
- [x] Provide examples where `result` is object-shaped instead of string-shaped
- [x] Update `docs/protocols/agent/api-endpoints.md`
- [x] Define the `/history` response contract for tool UI replay from `metadata.tool_agent_output.ui_hints` compiled to `ui_schema`
- [x] Remove any statement that `/history` UI is assistant-only if tool UI replay is now supported
- [x] Update `docs/protocols/agent/run-agent-input.md`
- [x] Clarify that the frontend does not submit the auth token as a tool arg
- [x] Clarify that backend-controlled tool registration remains backend-owned

### 0.4 Update auth and automation protocol docs

- [x] Update `docs/protocols/models/auth.md`
- [x] Define controlled credential purpose, TTL, scope, and audit expectations
- [x] Define the relationship between the normal bearer issuer and the automation credential issuer
- [x] Update `docs/protocols/models/automation-jobs.md`
- [x] State that `owner_id` is only an identity reference, not a credential
- [x] Document the automation credential issuance path before queue/runtime execution
- [x] Update `docs/protocols/common/http-error-codes.md` if new codes are introduced for CLI/runtime/credential failures

### 0.5 Phase 0 verification

- [x] Confirm protocol docs no longer describe worker `ui_hints` as the UI source
- [x] Confirm protocol docs explicitly document the `ui_hints → ui_schema` compilation path
- [x] Confirm docs explicitly define the ToolResponse vs ToolAgentOutput responsibility split
- [x] Confirm docs explicitly define the `/history` tool UI replay path (from `ui_hints` compiled to `ui_schema`)
- [x] Confirm docs explicitly define controlled credential transport and TTL

## Phase 1: Backend Contract Models And Persistence Path

### 1.1 Refactor runtime schemas

- [x] Update `backend/src/schemas/agent/runtime_models.py`
- [x] Remove `WorkerAgentOutputRich.ui_hints`
- [x] Remove `AgentOutput` inheritance that depends on the worker UI payload
- [x] Make `resolve_worker_output_model()` return the non-UI worker output model path
- [x] Change `ToolAgentOutput.result` from `str` to a JSON-native structured payload type
- [x] Add `ui_hints` to `ToolAgentOutput`
- [x] Keep `ToolAgentOutput` strict with `extra="forbid"`
- [x] Review any validator changes required to keep `result` deterministic and JSON-native

### 1.2 Update chat message metadata schema consumers

- [x] Review `backend/src/schemas/domain/chat_message.py`
- [x] Ensure `tool_agent_output` accepts the updated structured `ToolAgentOutput`
- [x] Confirm metadata serialization remains compatible with persistence and context cache usage

### 1.3 Separate ToolResponse from ToolAgentOutput

- [x] Update `backend/src/core/agentscope/tools/utils/tool_response_builder.py`
- [x] Stop serializing the full `ToolAgentOutput` directly into `ToolResponse.content`
- [x] Make `build_tool_response()` emit only the text projection of `result`
- [x] Decide on and implement the helper that projects structured `result` to stable JSON text
- [x] Update the error response builder to follow the same split cleanly

### 1.4 Add tool post-processing path

- [x] Introduce a runtime tool post-processing module in the backend tool/runtime layer
- [x] Define the post-processor input contract from the raw tool execution result
- [x] Define the post-processor output as the full `ToolAgentOutput`
- [x] Ensure the post-processor is the only place generating `ui_hints` for tools
- [x] Ensure worker code no longer generates tool UI fields

### 1.5 Update parsing and stage emission

- [x] Update `backend/src/core/agentscope/utils/parsing.py`
- [x] Stop assuming text blocks contain the full serialized `ToolAgentOutput`
- [x] Add helpers to parse the text projection back into a structured result where required
- [x] Update `backend/src/core/agentscope/runtime/stage_emitter.py`
- [x] Remove worker `ui_hints` emission from final text events
- [x] Emit `TOOL_CALL_RESULT` based on the full post-processed `ToolAgentOutput`
- [x] Ensure the emitted tool payload carries structured `result` and `ui_hints`

### 1.6 Update AG-UI codec and event storage

- [x] Update `backend/src/core/agentscope/events/agui_codec.py`
- [x] Remove worker `ui_hints -> ui_schema` compilation path
- [x] Remove `ui_schema`-specific output shaping
- [x] Ensure tool events pass through tool-derived `ui_hints`
- [x] Update `backend/src/core/agentscope/events/store.py`
- [x] Persist tool message `content` as the JSON text projection of `result`
- [x] Persist the full post-processed `ToolAgentOutput` in metadata
- [x] Ensure worker metadata no longer expects `ui_hints`

### 1.7 Unify cold/hot runtime paths

- [x] Update `backend/src/core/agentscope/runtime/tasks.py`
- [x] Replace `_serialize_tool_agent_output()` assumptions that rely on the old `ToolAgentOutput` shape
- [x] Ensure context rebuild uses the same content projection rule as hot-path execution
- [x] Stop rebuilding tool context from legacy string-only result assumptions
- [x] Review `backend/src/core/agentscope/caches/context_messages_cache.py`
- [x] Define whether old cache payloads are backward-read compatible or intentionally invalidated
- [x] Ensure the runtime cold path and hot path see the same tool message shape

### 1.8 Update `/history` backend shaping

- [x] Update `backend/src/v1/agent/utils.py`
- [x] Remove worker `ui_hints` compilation logic
- [x] Stop returning `ui_schema`
- [x] Add tool UI replay logic from `metadata.tool_agent_output.ui_hints`
- [x] Keep user attachment handling intact
- [x] Update `backend/src/v1/agent/schemas.py`
- [x] Remove the `UiSchemaRenderer` dependency from `HistoryMessage`
- [x] Redefine the history response shape to carry the tool UI replay payload
- [x] Update role constraints if tool-derived history items need explicit representation
- [x] Review `backend/src/v1/agent/repository.py` for any history query assumptions that prevent tool UI replay

### 1.9 Phase 1 verification

- [x] Unit tests cover `ToolAgentOutput.result` as a structured payload
- [x] Unit tests confirm the worker output schema no longer includes `ui_hints`
- [x] Unit tests confirm ToolResponse no longer embeds the full ToolAgentOutput
- [x] Unit tests confirm the event store persists full ToolAgentOutput metadata and projected content separately
- [x] Unit tests confirm `/history` shaping no longer emits `ui_schema`
- [x] Unit tests confirm tool UI replay uses `metadata.tool_agent_output.ui_hints`

## Phase 2: CLI-Backed Tools And Skill Registration

### 2.1 Replace direct Python tool registration

- [x] Update `backend/src/core/agentscope/tools/tool_config.py`
- [x] Replace function-name-centric mapping with CLI capability/wrapper-centric mapping
- [x] Unify config and runtime skill selection on `enabled_skills`
- [x] Keep approval config support aligned with the new tool names
- [x] Update `backend/src/core/agentscope/tools/toolkit.py`
- [x] Remove direct imports of `custom/calendar.py`, `custom/memory.py`, and `custom/user_lookup.py`
- [x] Register CLI-backed wrappers instead of Python business functions
- [x] Preserve `enabled_skills` filtering behavior

### 2.2 Add CLI adapter, router, and entrypoint

- [x] Add a CLI adapter module in `backend/src/core/agentscope/tools/`
- [x] Adapter must invoke only the project CLI entrypoint
- [x] Adapter must pass args via `argv` primarily and `stdin` secondarily where required
- [x] Adapter must inject the auth credential only via controlled environment variables
- [x] Adapter must parse stdout JSON and map failures to structured errors
- [x] Add a command router module in `backend/src/core/agentscope/tools/`
- [x] Router must be whitelist-only
- [x] Router must map commands to Python handlers
- [x] Router must not expose generic shell execution
- [x] Add a Python console entrypoint module in `backend/src/core/agentscope/tools/`
- [x] Update `pyproject.toml` with the console script entry

### 2.3 Migrate tool implementations to CLI handlers

- [x] Replace old `backend/src/core/agentscope/tools/custom/*.py` direct runtime tools with CLI handler implementations
- [x] Remove old direct AgentScope tool-function implementations from the final runtime wiring
- [x] Ensure new handlers only call allowed internal services/repositories
- [x] Ensure handler boundaries follow schema -> repository -> service layering
- [x] Ensure handlers raise typed errors instead of transport exceptions where applicable

### 2.4 Register AgentScope skills

- [x] Populate `backend/src/core/agentscope/tools/custom` with skill assets using the AgentScope-native layout
- [x] Add required `SKILL.md` files
- [x] Ensure skill content explains when to use each tool and how to compose them
- [x] Register skills through the AgentScope-native registration path in toolkit/runtime setup
- [x] Ensure skill assets are included in runtime/deployment packaging

### 2.5 Update runner and middleware linkages

- [x] Update `backend/src/core/agentscope/runtime/runner.py`
- [x] Build the toolkit from CLI-backed wrappers instead of Python functions
- [x] Keep `enabled_skills` and stage-based selection behavior intact
- [x] Update `backend/src/core/agentscope/tools/tool_middleware.py`
- [x] Ensure middleware name resolution still works with the new tool registration path
- [x] Update `backend/src/core/agentscope/prompts/agent_prompt.py`
- [x] Remove any prompt assumptions that still act as pseudo-skill behavior
- [x] Keep the prompt aligned with skill-driven disclosure instead of duplicating the full tool contract

### 2.6 Phase 2 verification

- [x] Unit tests cover the CLI adapter success path
- [x] Unit tests cover the CLI adapter malformed stdout path
- [x] Unit tests cover the CLI adapter non-zero exit path
- [x] Unit tests confirm the toolkit only registers enabled CLI-backed tools
- [x] Unit tests confirm middleware still recognizes the active tool names
- [x] Smoke test confirms AgentScope skill registration succeeds from project skill assets

## Phase 3: Controlled Credential And Queue Transport

### 3.1 Define backend auth runtime objects

- [x] Review `backend/src/core/auth/models.py`
- [x] Add any missing auth runtime model needed for controlled credential transport
- [x] Keep `CurrentUser` as the identity model if still appropriate, but do not overload it as a credential carrier without an explicit design

### 3.2 Add controlled credential issuance path

- [x] Add a credential issuer service under `backend/src/core/auth/` or another appropriate auth module
- [x] Keep the issuer in the same trust boundary as the current bearer token issuing system
- [x] Ensure the issued credential is short-lived according to the PRD target
- [x] Ensure the issuer encodes only the minimal scope required for tool execution
- [x] Ensure logs do not expose raw credentials

### 3.3 Wire chat enqueue path

- [x] Update `backend/src/v1/agent/service.py`
- [x] Stop enqueueing only `owner_id` for runtime auth purposes
- [x] Enqueue the controlled credential or resolvable credential handle required by the worker runtime
- [x] Ensure the queue payload does not expose the raw token in model-visible fields
- [x] Keep session ownership checks intact

### 3.4 Wire automation dispatch path

- [x] Update `backend/src/core/automation/scheduler.py`
- [x] Stop creating runtime auth solely as `CurrentUser(id=owner_id)`
- [x] Issue or obtain an automation controlled credential before enqueueing the run
- [x] Ensure `owner_id` remains only a lookup/reference input
- [x] Ensure the automation runtime uses the same CLI auth injection mechanism as the chat runtime

### 3.5 Update task runtime injection

- [x] Update `backend/src/core/agentscope/runtime/tasks.py`
- [x] Read the controlled credential from the queued command payload
- [x] Inject the controlled credential into CLI runtime environment variables
- [x] Remove any path that implicitly depends on `owner_id` as an execution credential
- [x] Keep user-context loading behavior explicit and separate from auth credential handling

### 3.6 Add settings and error mapping

- [x] Update `backend/src/core/config/settings.py` for any new CLI/credential configuration
- [x] Keep new config values typed and centralized
- [x] Update error handling paths to use stable problem codes for credential/CLI failures
- [x] Update `docs/protocols/common/http-error-codes.md` if these codes are new

### 3.7 Phase 3 verification

- [x] Unit tests confirm chat enqueue includes the required controlled credential transport data
- [x] Unit tests confirm automation dispatch no longer relies on `owner_id` as a credential
- [x] Unit tests confirm the task runtime injects the controlled credential only via env vars
- [x] Unit tests confirm credential issuance TTL and scope constraints
- [x] Logs and error payloads do not expose raw credentials

## Phase 4: Frontend Contract Alignment

### 4.1 Update event parsing

- [x] Update `apps/lib/core/chat/ag_ui_event.dart`
- [x] Remove active wire parsing paths that depend on `ui_schema`
- [x] Parse tool event `ui_hints` directly from the updated payload contract
- [x] Parse structured `result` instead of relying on string-only assumptions

### 4.2 Update history parsing and cache

- [x] Update `apps/lib/core/chat/chat_history_repository.dart`
- [x] Align the cached history format with the new backend history response shape
- [x] Ensure history replay can rebuild tool UI items from backend-provided tool metadata/UI payload

### 4.3 Update chat service and item models

- [x] Update `apps/lib/core/chat/ag_ui_service.dart`
- [x] Ensure SSE handling matches the new tool event contract
- [x] Update `apps/lib/core/chat/chat_list_item.dart`
- [x] Remove item model assumptions that a rendered UI payload must be named `uiSchema`

### 4.4 Update rendering path

- [x] Update `apps/lib/features/chat/presentation/bloc/chat_bloc_events.dart`
- [x] Ensure tool results become visible UI items through direct tool payloads
- [x] Update `apps/lib/features/home/presentation/widgets/home_chat_item_renderer.dart`
- [x] Continue reusing the existing renderer component if it still fits the new input shape
- [x] Update `apps/lib/shared/widgets/ui_schema/ui_schema_renderer.dart` only as needed to accept the new direct tool UI input contract

### 4.5 Phase 4 verification

- [x] Frontend tests confirm SSE tool event parsing without `ui_schema`
- [x] Frontend tests confirm history replay rebuilds tool UI correctly
- [x] Frontend tests confirm refresh/reload still shows prior tool UI consistently

## Phase 5: Cleanup, Regression Tests, And Final Validation

### 5.1 Backend test updates

- [x] Update `backend/tests/unit/core/agentscope/events/test_store.py`
- [x] Update `backend/tests/unit/core/agentscope/events/test_agui_codec.py`
- [x] Update `backend/tests/unit/core/agentscope/runtime/test_stage_emitter.py`
- [x] Update `backend/tests/unit/core/agentscope/runtime/test_tasks.py`
- [x] Update `backend/tests/unit/v1/agent/test_utils.py`
- [x] Update `backend/tests/unit/schemas/agent/test_runtime_models.py`
- [x] Add tests for the CLI adapter, command router, and tool post-processing
- [x] Add tests for controlled credential issuance and queue transport

### 5.2 Frontend test updates

- [x] Update `apps/test/core/chat/ag_ui_event_test.dart`
- [x] Update `apps/test/features/chat/presentation/bloc/chat_bloc_test.dart`
- [x] Add tests for the history repository if needed by the new replay contract

### 5.3 Remove obsolete code paths

- [x] Remove worker `ui_hints` usage from runtime/event/history code paths
- [x] Remove active `ui_schema` contract usage from backend response shaping (N/A: `ui_schema` is still used as the wire format)
- [x] Remove old direct `custom/*.py` tool runtime wiring
- [x] Remove any parsing logic that assumes `ToolResponse` carries full ToolAgentOutput JSON
- [x] Remove dead compatibility helpers only after the replacement path is verified

### 5.4 Run verification commands

- [x] Run relevant backend unit tests with `uv run pytest ...`
- [x] Run relevant frontend tests
- [x] Run backend lint checks required for touched files
- [x] Run backend type checks required for touched files
- [x] If skill registration/package wiring changed, run a focused smoke check of the CLI-backed tool path

### 5.5 Final acceptance audit against PRD

- [x] The `ui_hints → ui_schema` compilation path is preserved (only the `ui_hints` source changes, from worker to tool)
- [x] `WorkerAgentOutput` no longer has `ui_hints`
- [x] `/history` tool UI replay compiles `metadata.tool_agent_output.ui_hints` to `ui_schema`
- [x] `ToolResponse` carries only the projected result text
- [x] The tool post-processor generates the full `ToolAgentOutput`
- [x] `ToolAgentOutput.result` is structured and machine-oriented
- [x] `message.content` is the full JSON text projection of `result`
- [x] The CLI uses a whitelist router and has no shell execution path
- [x] Chat and automation both use controlled credential injection, not `owner_id` as a credential
- [x] AgentScope skills are registered from project skill assets
- [x] Hot path and cold path tool context are unified
- [x] The frontend receives `ui_schema` from `TOOL_CALL_RESULT` and history
- [x] Relevant docs, tests, lint, and type checks are updated

## Suggested First Implementation Slice

- [ ] Complete Phase 0 only
- [ ] Do not start the backend runtime refactor until the Phase 0 contract text is committed and reviewed

## Progress Log

- [x] Phase 0 complete
- [x] Phase 1 complete
- [x] Phase 2 complete
- [x] Phase 3 complete
- [x] Phase 4 complete
- [x] Phase 5 complete
- [x] 2026-04-23: finished frontend cleanup for legacy tool-call interim events/cards; the tool UI render path is now `TOOL_CALL_RESULT` + history replay only
- [x] 2026-04-23: documented the decision that `messages.content` remains `text` (the structured payload stays in metadata)
- [x] 2026-04-23: removed CLI alias compatibility and switched to canonical subcommands (`calendar.create/read/update/delete/share`, `contacts.read`, `memory.update`)
- [x] 2026-04-23: expanded the protocol and post-processor policy so canonical CRUD commands emit `ui_hints` (`calendar.create/read/update/delete`, `contacts.read`, `memory.update`)
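
For reference during Phase 1.3 and 1.4, the split between `ToolResponse` and `ToolAgentOutput` can be sketched as follows. This is a minimal, hypothetical sketch. It assumes that a "stable" JSON projection means sorted keys and fixed separators. It uses a stand-in `ToolAgentOutputSketch` dataclass and a `postprocess_tool_result()` helper in place of the real pydantic `ToolAgentOutput` and post-processor. The `ui_hints` payload shape is a placeholder, not the protocol's schema.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any

# Hypothetical policy mirroring the progress log: canonical CRUD commands emit
# ui_hints, while calendar.share does not.
UI_HINT_COMMANDS = frozenset(
    {
        "calendar.create",
        "calendar.read",
        "calendar.update",
        "calendar.delete",
        "contacts.read",
        "memory.update",
    }
)


@dataclass(frozen=True)
class ToolAgentOutputSketch:
    """Stand-in for the real pydantic ToolAgentOutput (which forbids extra fields)."""

    result: Any  # JSON-native structured payload
    status: str  # "success" or "error" (assumed values)
    error: dict[str, str] | None = None
    ui_hints: dict[str, Any] | None = None


def project_result_text(result: Any) -> str:
    """Deterministic JSON text projection of `result` (what ToolResponse carries)."""
    return json.dumps(
        result,
        sort_keys=True,  # same value -> same text on hot and cold paths
        separators=(",", ":"),  # no incidental whitespace
        ensure_ascii=False,  # keep non-ASCII text readable for the model
        allow_nan=False,  # NaN/Infinity are not valid JSON, so reject them
    )


def postprocess_tool_result(
    command: str, result: Any = None, error_code: str | None = None
) -> ToolAgentOutputSketch:
    """Rebuild the full output; this is the only place that decides tool ui_hints."""
    if error_code is not None:
        return ToolAgentOutputSketch(result=None, status="error", error={"code": error_code})
    # Placeholder hint shape; the real ui_hints schema lives in the UI protocol docs.
    hints = {"command": command} if command in UI_HINT_COMMANDS else None
    return ToolAgentOutputSketch(result=result, status="success", ui_hints=hints)


if __name__ == "__main__":
    out = postprocess_tool_result(
        "calendar.read", {"events": [{"title": "Standup", "id": 1}], "count": 1}
    )
    print(project_result_text(out.result))  # {"count":1,"events":[{"id":1,"title":"Standup"}]}
```

Sorting keys keeps `message.content` byte-identical whether it is produced on the hot path or rebuilt from metadata on the cold path, which is the property Phase 1.7 relies on.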
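
Phase 2.2 sets these adapter rules: argv first, stdin second, the credential only via the environment, stdout as JSON, and a non-zero exit treated as failure. They can be sketched with the standard `subprocess` module. The environment variable name, the error codes, and `run_cli_tool()` itself are hypothetical. The real names belong in `core.config.settings` and `docs/protocols/common/http-error-codes.md`.

```python
from __future__ import annotations

import json
import os
import subprocess
from typing import Any, Sequence

# Hypothetical variable name; the real one should come from core.config.settings.
CREDENTIAL_ENV_VAR = "AGENT_TOOL_CREDENTIAL"


class CliToolError(Exception):
    """Structured failure; `code` is meant to map to a stable RFC 7807 problem code."""

    def __init__(self, code: str, detail: str) -> None:
        super().__init__(f"{code}: {detail}")
        self.code = code
        self.detail = detail


def run_cli_tool(
    entrypoint: Sequence[str],
    command: str,
    args: Sequence[str] = (),
    *,
    credential: str,
    stdin_payload: Any = None,
    timeout_s: float = 30.0,
) -> Any:
    """Run one whitelisted CLI command and return its parsed stdout JSON."""
    # argv is the primary input channel; passing a list means no shell is involved.
    argv = [*entrypoint, command, *args]
    # Assumption: handlers need the backend's own settings, so the parent env is
    # inherited; the credential is added here and never placed in argv.
    env = {**os.environ, CREDENTIAL_ENV_VAR: credential}
    # stdin is the secondary channel, used only when a structured payload is given.
    stdin_text = None if stdin_payload is None else json.dumps(stdin_payload)
    try:
        proc = subprocess.run(
            argv,
            input=stdin_text,
            # Without a payload, close stdin so the child can never block reading it.
            stdin=subprocess.DEVNULL if stdin_text is None else None,
            capture_output=True,
            text=True,
            env=env,
            timeout=timeout_s,
            check=False,
        )
    except subprocess.TimeoutExpired as exc:
        raise CliToolError("cli_timeout", f"{command} timed out") from exc
    if proc.returncode != 0:
        # Do not echo env or stderr verbatim: either could contain the credential.
        raise CliToolError("cli_nonzero_exit", f"{command} exited with code {proc.returncode}")
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        raise CliToolError("cli_malformed_stdout", f"{command} did not write JSON to stdout") from exc
```

You can pass `[sys.executable, "-c", script]` as `entrypoint` to unit-test the success, malformed-stdout, and non-zero-exit paths listed in Phase 2.6 without installing the real console script.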
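
The whitelist router from Phase 2.2 reduces to a fixed mapping from canonical subcommand to handler, with no fallback. In this sketch the handlers are hypothetical stubs, and the exit code `2` and the stderr error shape are assumptions. `main()` is the function that a `[project.scripts]` entry in `pyproject.toml` would point at.

```python
from __future__ import annotations

import json
import sys
from typing import Any, Callable, Mapping, Sequence

Handler = Callable[[Sequence[str]], Any]


class UnknownCommandError(Exception):
    """Raised for any command outside the whitelist."""


def _stub_handler(name: str) -> Handler:
    # Placeholder: a real handler would parse args and call an allowed service.
    def handler(args: Sequence[str]) -> dict[str, Any]:
        return {"command": name, "args": list(args)}

    return handler


# Whitelist of canonical subcommands (names taken from the progress log).
COMMANDS: Mapping[str, Handler] = {
    name: _stub_handler(name)
    for name in (
        "calendar.create",
        "calendar.read",
        "calendar.update",
        "calendar.delete",
        "calendar.share",
        "contacts.read",
        "memory.update",
    )
}


def dispatch(command: str, args: Sequence[str]) -> Any:
    handler = COMMANDS.get(command)
    if handler is None:
        # No fallback to a shell or to dynamic imports: unknown means rejected.
        raise UnknownCommandError(command)
    return handler(args)


def main(argv: Sequence[str] | None = None) -> int:
    """Console entrypoint: JSON result on stdout; a non-zero exit code signals failure."""
    argv = list(sys.argv[1:] if argv is None else argv)
    if not argv:
        print(json.dumps({"code": "cli_missing_command"}), file=sys.stderr)
        return 2
    command, *rest = argv
    try:
        result = dispatch(command, rest)
    except UnknownCommandError:
        print(json.dumps({"code": "cli_unknown_command"}), file=sys.stderr)
        return 2
    print(json.dumps(result))
    return 0
```

Because the mapping is the only dispatch path, adding a tool means adding a reviewed entry. Tests in the style of Phase 2.6 can assert on `COMMANDS.keys()` directly.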
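
For Phase 3.2, a short-lived, scoped credential can be illustrated with a stdlib HMAC token. This is a hypothetical format, chosen only to keep the example self-contained. The checklist requires the issuer to stay in the same trust boundary as the existing bearer token issuer, so a real implementation would most likely reuse that issuer's signing key and token format. The TTL and scope strings are placeholders.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time
from dataclasses import dataclass
from typing import Sequence


class CredentialError(Exception):
    """Carries a stable code; the message never includes the token itself."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


@dataclass(frozen=True)
class ToolCredentialClaims:
    sub: str  # user id: an identity reference, not a credential on its own
    scope: tuple[str, ...]
    exp: int  # expiry, unix seconds


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, body: str) -> str:
    return _b64(hmac.new(secret, body.encode("utf-8"), hashlib.sha256).digest())


def issue_tool_credential(
    secret: bytes, user_id: str, scope: Sequence[str], ttl_s: int, now: float | None = None
) -> str:
    issued_at = int(time.time() if now is None else now)
    claims = {"sub": user_id, "scope": sorted(set(scope)), "exp": issued_at + ttl_s}
    body = _b64(json.dumps(claims, sort_keys=True, separators=(",", ":")).encode("utf-8"))
    return f"{body}.{_sign(secret, body)}"


def verify_tool_credential(
    secret: bytes, token: str, required_scope: str, now: float | None = None
) -> ToolCredentialClaims:
    body, _, sig = token.partition(".")
    if not body or not sig:
        raise CredentialError("credential_malformed")
    # Constant-time comparison on bytes avoids leaking how much of the signature matched.
    if not hmac.compare_digest(sig.encode("utf-8"), _sign(secret, body).encode("ascii")):
        raise CredentialError("credential_invalid")
    claims = json.loads(_unb64(body))
    if int(time.time() if now is None else now) >= claims["exp"]:
        raise CredentialError("credential_expired")
    if required_scope not in claims["scope"]:
        raise CredentialError("credential_scope_denied")
    return ToolCredentialClaims(claims["sub"], tuple(claims["scope"]), claims["exp"])
```

In this design, the worker places the issued token in the CLI environment. The CLI side then calls `verify_tool_credential()` with the scope its command needs. `sub` is used only as a lookup key, matching the rule that `owner_id` is never a credential.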