# AgentScope Skill + CLI Tool Refactor Implementation Checklist

## Purpose

This file is the execution checklist for implementing the PRD in:

- `.trellis/tasks/04-20-refactor-tool-cli-skill-ui-schema/prd.md`

Use this document as the working guide during implementation.
Do not mark an item complete until the code, docs, and verification for that item are actually done.

## Required Standards Read Before Backend Changes

- [x] Read `backend/AGENTS.md`
- [x] Read `.trellis/spec/backend/index.md`
- [x] Read `.trellis/spec/backend/database-guidelines.md`
- [x] Read `.trellis/spec/backend/error-handling.md`
- [x] Read `.trellis/spec/backend/logging-guidelines.md`
- [x] Read `.trellis/spec/backend/quality-guidelines.md`
- [x] Confirm `.trellis/spec/backend/type-safety.md` does not exist; use the schema/type rules in `backend/AGENTS.md` and the existing repository code as the effective type-safety baseline

## Non-Negotiable Constraints

- [x] Protocol docs are updated before implementation changes that alter contracts
- [ ] New backend runtime code reads configuration only through `core.config.settings`
- [ ] New backend runtime code uses project logging, never `print()`
- [ ] New backend errors follow RFC 7807 with stable `code`
- [ ] Any new or changed error codes are updated in `docs/protocols/common/http-error-codes.md`
- [ ] Repository/service layering remains intact
- [ ] `owner_id` is never treated as a credential
- [ ] No new error swallowing is introduced
- [x] `ToolAgentOutput.result` remains the canonical machine-oriented tool result field

## Execution Order

- [ ] Phase 0 completed before any runtime contract change is implemented
- [x] Phase 1 completed before replacing tool execution with CLI-backed wrappers
- [ ] Phase 2 completed before auth credential transport is wired into queue/runtime
- [ ] Phase 3 completed before frontend contract alignment begins
- [ ] Phase 4 completed before cleanup is considered done
- [ ] Phase 5 verification completed before task is marked finished

## Phase 0: Protocol Docs First

### 0.1 Define the tool protocol source of truth

- [x] Add `docs/protocols/agent/tool-protocol.md`
- [x] Document that CLI execution produces structured `result` as the source payload
- [x] Document that `ToolResponse` only carries the text projection of `result`
- [x] Document that runtime tool post-processing reconstructs full `ToolAgentOutput`
- [x] Document that tool post-processing is responsible for `status`, `error`, and `ui_hints`
- [x] Document that `message.content` is the full JSON text projection of `ToolAgentOutput.result`
- [x] Document that `ToolAgentOutput` is used for SSE, persistence, history recovery, and context rebuild
- [x] Document CLI input channel split: `argv` primary, `stdin` secondary, environment variables for controlled auth injection
- [x] Document stdout JSON shape and non-zero exit semantics
- [x] Document that shell execution is not exposed; router is whitelist-only
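
The responsibility split documented above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `project_result` is a hypothetical helper name, the sorted-key compact encoding is just one assumed way to keep the text projection stable, and the `status` value and sample payload are placeholders.

```python
import json
from typing import Any


def project_result(result: Any) -> str:
    """Project a JSON-native `result` into deterministic JSON text."""
    # Sorted keys and fixed separators keep the projection byte-stable, so the same
    # result always yields the same message content.
    return json.dumps(result, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


# Structured payload as a CLI handler might produce it (illustrative data only).
result = {"title": "Standup", "event_id": "evt_1"}

# ToolResponse carries only the text projection of `result` ...
tool_response_text = project_result(result)

# ... while the full ToolAgentOutput (status, error, ui_hints) is reconstructed by
# runtime post-processing and used for SSE, persistence, and history recovery.
tool_agent_output = {"status": "success", "result": result, "error": None, "ui_hints": None}
```

Because the projection is plain JSON, `json.loads(tool_response_text)` recovers the structured `result` wherever a consumer needs it back.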

### 0.2 Move the UI source from worker output to tool `ui_hints` (`ui_schema` stays the wire format)

- [x] Update `docs/protocols/ui/data-flow.md`
- [x] Replace worker-driven UI source descriptions with tool-driven `ui_hints`
- [x] Explicitly document that worker output no longer includes `ui_hints`
- [x] Explicitly document that history tool UI recovery reads `metadata.tool_agent_output.ui_hints` and compiles to `ui_schema`
- [x] Update `docs/protocols/ui/ui-schema.md`
- [x] Clarify that `ui_hints` is the descriptive UI representation (source), `ui_schema` is the rendered format (wire format)
- [x] Clarify that frontend renderer continues to consume `ui_schema`
- [x] Document that `ui_hints → ui_schema` compilation path remains unchanged, only `ui_hints` source changes

### 0.3 Update SSE and HTTP contracts

- [x] Update `docs/protocols/agent/sse-events.md`
- [x] Remove worker `ui_hints` from `TEXT_MESSAGE_END`
- [x] Define `TOOL_CALL_RESULT` payload with `ui_schema` (compiled from `ui_hints`)
- [x] Document that `ui_hints → ui_schema` compilation happens in backend codec
- [x] Provide examples where `result` is object-shaped instead of string-shaped
- [x] Update `docs/protocols/agent/api-endpoints.md`
- [x] Define `/history` response contract for tool UI replay from `metadata.tool_agent_output.ui_hints` compiled to `ui_schema`
- [x] Remove any statement that `/history` carries UI payloads only for assistant messages if tool UI replay is now supported
- [x] Update `docs/protocols/agent/run-agent-input.md`
- [x] Clarify that frontend does not submit auth token as tool arg
- [x] Clarify that backend-controlled tool registration remains backend-owned

### 0.4 Update auth and automation protocol docs

- [x] Update `docs/protocols/models/auth.md`
- [x] Define controlled credential purpose, TTL, scope, and audit expectations
- [x] Define relationship between normal bearer issuer and automation credential issuer
- [x] Update `docs/protocols/models/automation-jobs.md`
- [x] State that `owner_id` is only an identity reference, not a credential
- [x] Document automation credential issuance path before queue/runtime execution
- [x] Update `docs/protocols/common/http-error-codes.md` if new codes are introduced for CLI/runtime/credential failures

### 0.5 Phase 0 verification

- [x] Confirm protocol docs no longer describe worker `ui_hints` as UI source
- [x] Confirm protocol docs explicitly document `ui_hints → ui_schema` compilation path
- [x] Confirm docs explicitly define ToolResponse vs ToolAgentOutput responsibility split
- [x] Confirm docs explicitly define `/history` tool UI replay path (from `ui_hints` compiled to `ui_schema`)
- [x] Confirm docs explicitly define controlled credential transport and TTL

## Phase 1: Backend Contract Models And Persistence Path

### 1.1 Refactor runtime schemas

- [x] Update `backend/src/schemas/agent/runtime_models.py`
- [x] Remove `WorkerAgentOutputRich.ui_hints`
- [x] Remove `AgentOutput` inheritance that depends on worker UI payload
- [x] Make `resolve_worker_output_model()` return the non-UI worker output model path
- [x] Change `ToolAgentOutput.result` from `str` to JSON-native structured payload type
- [x] Add `ui_hints` to `ToolAgentOutput`
- [x] Keep `ToolAgentOutput` strict with `extra="forbid"`
- [x] Review any validator changes required to keep result deterministic and JSON-native

### 1.2 Update chat message metadata schema consumers

- [x] Review `backend/src/schemas/domain/chat_message.py`
- [x] Ensure `tool_agent_output` accepts the updated structured `ToolAgentOutput`
- [x] Confirm metadata serialization remains compatible with persistence and context cache usage

### 1.3 Separate ToolResponse from ToolAgentOutput

- [x] Update `backend/src/core/agentscope/tools/utils/tool_response_builder.py`
- [x] Stop serializing full `ToolAgentOutput` directly into `ToolResponse.content`
- [x] Make `build_tool_response()` emit only the text projection of `result`
- [x] Decide and implement the helper that projects structured `result` to stable JSON text
- [x] Update error response builder to follow the same split cleanly

### 1.4 Add tool post-processing path

- [x] Introduce a runtime tool post-processing module in backend tool/runtime layer
- [x] Define the post-processor input contract from raw tool execution result
- [x] Define the post-processor output as full `ToolAgentOutput`
- [x] Ensure post-processor is the only place generating `ui_hints` for tools
- [x] Ensure worker code does not generate tool UI fields anymore
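
A post-processor along these lines could look like the sketch below. It assumes the `ui_hints` policy recorded in the progress log (canonical CRUD commands emit hints, `calendar.share` does not). The function name, the `status` values, the plain-string `error`, and the `ui_hints` shape are all placeholders, since the real contract lives in `docs/protocols/agent/tool-protocol.md`.

```python
from typing import Any, Dict, Optional

# Commands whose results receive `ui_hints`, following the policy in the progress log.
UI_HINT_COMMANDS = frozenset({
    "calendar.create", "calendar.read", "calendar.update", "calendar.delete",
    "contacts.read", "memory.update",
})


def postprocess(command: str, result: Any = None, error: Optional[str] = None) -> Dict[str, Any]:
    """Rebuild a full ToolAgentOutput-shaped dict from a raw tool execution result."""
    if error is not None:
        # Failed executions carry no result and no UI hints.
        return {"status": "error", "result": None, "error": error, "ui_hints": None}
    ui_hints = None
    if command in UI_HINT_COMMANDS:
        # Placeholder hint shape; the real one is defined by the UI protocol docs.
        ui_hints = {"source": command, "data": result}
    return {"status": "success", "result": result, "error": None, "ui_hints": ui_hints}
```

Routing every tool result through one function like this is what keeps `ui_hints` generation in a single place.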

### 1.5 Update parsing and stage emission

- [x] Update `backend/src/core/agentscope/utils/parsing.py`
- [x] Stop assuming text blocks contain full serialized `ToolAgentOutput`
- [x] Add helpers to parse the text projection back into structured result where required
- [x] Update `backend/src/core/agentscope/runtime/stage_emitter.py`
- [x] Remove worker `ui_hints` emission from final text events
- [x] Emit `TOOL_CALL_RESULT` based on full post-processed `ToolAgentOutput`
- [x] Ensure emitted tool payload carries structured `result` and `ui_hints`

### 1.6 Update AG-UI codec and event storage

- [x] Update `backend/src/core/agentscope/events/agui_codec.py`
- [x] Remove worker `ui_hints → ui_schema` compilation path
- [x] Remove `ui_schema`-specific output shaping
- [x] Ensure tool events pass through tool-derived `ui_hints`
- [x] Update `backend/src/core/agentscope/events/store.py`
- [x] Persist tool message `content` as the JSON text projection of `result`
- [x] Persist full post-processed `ToolAgentOutput` in metadata
- [x] Ensure worker metadata no longer expects `ui_hints`

### 1.7 Unify cold/hot runtime paths

- [x] Update `backend/src/core/agentscope/runtime/tasks.py`
- [x] Replace `_serialize_tool_agent_output()` assumptions that rely on old `ToolAgentOutput` shape
- [x] Ensure context rebuild uses the same content projection rule as hot-path execution
- [x] Stop rebuilding tool context from legacy string-only result assumptions
- [x] Review `backend/src/core/agentscope/caches/context_messages_cache.py`
- [x] Define whether old cache payloads are backward-read compatible or intentionally invalidated
- [x] Ensure runtime cold path and hot path see the same tool message shape

### 1.8 Update `/history` backend shaping

- [x] Update `backend/src/v1/agent/utils.py`
- [x] Remove worker `ui_hints` compilation logic
- [x] Stop returning `ui_schema`
- [x] Add tool UI replay logic from `metadata.tool_agent_output.ui_hints`
- [x] Keep user attachment handling intact
- [x] Update `backend/src/v1/agent/schemas.py`
- [x] Remove `UiSchemaRenderer` dependency from `HistoryMessage`
- [x] Redefine history response shape to carry tool UI replay payload
- [x] Update role constraints if tool-derived history items need explicit representation
- [x] Review `backend/src/v1/agent/repository.py` for any history query assumptions that prevent tool UI replay

### 1.9 Phase 1 verification

- [x] Unit tests cover `ToolAgentOutput.result` as structured payload
- [x] Unit tests confirm worker output schema no longer includes `ui_hints`
- [x] Unit tests confirm ToolResponse no longer embeds full ToolAgentOutput
- [x] Unit tests confirm event store persists full ToolAgentOutput metadata and projected content separately
- [x] Unit tests confirm `/history` shaping no longer emits `ui_schema`
- [x] Unit tests confirm tool UI replay uses `metadata.tool_agent_output.ui_hints`

## Phase 2: CLI-Backed Tools And Skill Registration

### 2.1 Replace direct Python tool registration

- [x] Update `backend/src/core/agentscope/tools/tool_config.py`
- [x] Replace function-name-centric mapping with CLI capability/wrapper-centric mapping
- [x] Unify config and runtime skill selection on `enabled_skills`
- [x] Keep approval config support aligned with the new tool names
- [x] Update `backend/src/core/agentscope/tools/toolkit.py`
- [x] Remove direct imports of `custom/calendar.py`, `custom/memory.py`, `custom/user_lookup.py`
- [x] Register CLI-backed wrappers instead of Python business functions
- [x] Preserve `enabled_skills` filtering behavior

### 2.2 Add CLI adapter, router, and entrypoint

- [x] Add a CLI adapter module in `backend/src/core/agentscope/tools/`
- [x] Adapter must invoke only the project CLI entrypoint
- [x] Adapter must pass args via `argv` primarily and `stdin` secondarily where required
- [x] Adapter must inject auth credential only via controlled environment variables
- [x] Adapter must parse stdout JSON and map failures to structured errors
- [x] Add a command router module in `backend/src/core/agentscope/tools/`
- [x] Router must be whitelist-only
- [x] Router must map commands to Python handlers
- [x] Router must not expose generic shell execution
- [x] Add a Python console entrypoint module in `backend/src/core/agentscope/tools/`
- [x] Update `pyproject.toml` with the console script entry
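
The adapter and router requirements above might be sketched like this. Everything here is illustrative: `route`, `run_cli`, the `calendar_read` placeholder handler, the error dict shape, and the demo entrypoint (a one-line Python script standing in for the real console script) are assumptions, not the project's modules.

```python
import json
import subprocess
import sys
from typing import Any, Callable, Dict, List

Handler = Callable[[List[str]], Dict[str, Any]]


def calendar_read(argv: List[str]) -> Dict[str, Any]:
    """Placeholder handler; a real one would call internal services/repositories."""
    return {"events": [], "argv": argv}


# Whitelist: only commands listed here can run. There is no generic shell fallback.
ROUTES: Dict[str, Handler] = {"calendar.read": calendar_read}


def route(command: str, argv: List[str]) -> Dict[str, Any]:
    """Dispatch a whitelisted command to its Python handler."""
    handler = ROUTES.get(command)
    if handler is None:
        raise ValueError(f"command not allowed: {command}")
    return handler(argv)


def run_cli(entrypoint: List[str], command: str, argv: List[str], stdin_text: str = "") -> Dict[str, Any]:
    """Invoke the CLI entrypoint and map its outcome to a structured dict."""
    proc = subprocess.run(
        [*entrypoint, command, *argv],  # argument list, never shell=True
        input=stdin_text,               # secondary input channel
        capture_output=True,
        text=True,
        timeout=30,
        check=False,
    )
    if proc.returncode != 0:
        return {"ok": False, "error": {"kind": "non_zero_exit", "exit_code": proc.returncode}}
    try:
        return {"ok": True, "result": json.loads(proc.stdout)}
    except json.JSONDecodeError:
        return {"ok": False, "error": {"kind": "malformed_stdout"}}


# Stand-in entrypoint for the demo: echoes the received command as JSON on stdout.
DEMO_ENTRYPOINT = [sys.executable, "-c", "import json, sys; print(json.dumps({'command': sys.argv[1]}))"]
outcome = run_cli(DEMO_ENTRYPOINT, "calendar.read", ["--date", "2026-04-23"])
```

Passing an argument list instead of a shell string means tool arguments are never interpreted by a shell, which fits the whitelist-only rule.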

### 2.3 Migrate tool implementations to CLI handlers

- [x] Replace old `backend/src/core/agentscope/tools/custom/*.py` direct runtime tools with CLI handler implementations
- [x] Remove old direct AgentScope tool-function implementations from final runtime wiring
- [x] Ensure new handlers only call allowed internal services/repositories
- [x] Ensure handler boundaries follow schema -> repository -> service layering
- [x] Ensure handlers raise typed errors instead of transport exceptions where applicable

### 2.4 Register AgentScope skills

- [x] Populate `backend/src/core/agentscope/tools/custom` with skill assets using AgentScope-native layout
- [x] Add required `SKILL.md` files
- [x] Ensure skill content explains when to use each tool and how to compose them
- [x] Register skills through AgentScope-native registration path in toolkit/runtime setup
- [x] Ensure skill assets are included in runtime/deployment packaging

### 2.5 Update runner and middleware linkages

- [x] Update `backend/src/core/agentscope/runtime/runner.py`
- [x] Build toolkit from CLI-backed wrappers instead of Python functions
- [x] Keep `enabled_skills` and stage-based selection behavior intact
- [x] Update `backend/src/core/agentscope/tools/tool_middleware.py`
- [x] Ensure middleware name resolution still works with the new tool registration path
- [x] Update `backend/src/core/agentscope/prompts/agent_prompt.py`
- [x] Remove any prompt assumptions that still act as pseudo-skill behavior
- [x] Keep prompt aligned with skill-driven disclosure instead of duplicating the full tool contract

### 2.6 Phase 2 verification

- [x] Unit tests cover CLI adapter success path
- [x] Unit tests cover CLI adapter malformed stdout path
- [x] Unit tests cover CLI adapter non-zero exit path
- [x] Unit tests confirm toolkit only registers enabled CLI-backed tools
- [x] Unit tests confirm middleware still recognizes the active tool names
- [x] Smoke test confirms AgentScope skill registration succeeds from project skill assets

## Phase 3: Controlled Credential And Queue Transport

### 3.1 Define backend auth runtime objects

- [x] Review `backend/src/core/auth/models.py`
- [x] Add any missing auth runtime model needed for controlled credential transport
- [x] Keep `CurrentUser` as the identity model if still appropriate, but do not overload it as a credential carrier without an explicit design

### 3.2 Add controlled credential issuance path

- [x] Add a credential issuer service under `backend/src/core/auth/` or another appropriate auth module
- [x] Keep issuer in the same trust boundary as current bearer token issuing system
- [x] Ensure issued credential is short-lived according to PRD target
- [x] Ensure issuer encodes only the minimal scope required for tool execution
- [x] Ensure logs do not expose raw credentials
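
To make the TTL, scope, and log-redaction points concrete, here is a minimal sketch of a short-lived, scope-limited credential. It assumes an HMAC-signed token purely for illustration. The real issuer is expected to stay in the same trust boundary as the existing bearer token system and may use a different format, and `issue_credential`, `verify_credential`, and `redact` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, List, Optional


def issue_credential(secret: bytes, subject: str, scope: List[str], ttl_seconds: int,
                     now: Optional[float] = None) -> str:
    """Issue a signed token carrying only subject, minimal scope, and expiry."""
    issued_at = int(time.time() if now is None else now)
    claims = {"sub": subject, "scope": sorted(scope), "exp": issued_at + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode()).decode()
    signature = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{signature}"


def verify_credential(secret: bytes, token: str, required_scope: str,
                      now: Optional[float] = None) -> Optional[Dict[str, Any]]:
    """Return the claims if the token is authentic, unexpired, and in scope; else None."""
    body, sep, signature = token.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    claims = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    if current >= claims["exp"] or required_scope not in claims["scope"]:
        return None
    return claims


def redact(token: str) -> str:
    """Log-safe representation: never write the raw credential to logs."""
    return f"<credential sha256={hashlib.sha256(token.encode()).hexdigest()[:8]}>"
```

Logging a short hash instead of the token still lets operators correlate log lines for one credential without exposing it.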

### 3.3 Wire chat enqueue path

- [x] Update `backend/src/v1/agent/service.py`
- [x] Stop enqueueing only `owner_id` for runtime auth purposes
- [x] Enqueue the controlled credential or resolvable credential handle required by worker runtime
- [x] Ensure queue payload does not expose raw token in model-visible fields
- [x] Keep session ownership checks intact

### 3.4 Wire automation dispatch path

- [x] Update `backend/src/core/automation/scheduler.py`
- [x] Stop creating runtime auth solely as `CurrentUser(id=owner_id)`
- [x] Issue or obtain automation controlled credential before enqueueing run
- [x] Ensure `owner_id` remains only a lookup/reference input
- [x] Ensure automation runtime uses the same CLI auth injection mechanism as chat runtime

### 3.5 Update task runtime injection

- [x] Update `backend/src/core/agentscope/runtime/tasks.py`
- [x] Read controlled credential from queued command payload
- [x] Inject controlled credential into CLI runtime environment variables
- [x] Remove any path that implicitly depends on `owner_id` as execution credential
- [x] Keep user-context loading behavior explicit and separate from auth credential handling
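
Environment-only injection could be sketched as below. The variable name `TOOL_CLI_CREDENTIAL` and the helpers `build_cli_env` and `run_tool` are assumptions for illustration; the actual name would come from `core.config.settings`, and the demo entrypoint is a one-line stand-in for the project CLI.

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping, Optional

CREDENTIAL_ENV_VAR = "TOOL_CLI_CREDENTIAL"  # hypothetical variable name


def build_cli_env(credential: str, base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the base environment and add the controlled credential to the copy only."""
    env = dict(os.environ if base_env is None else base_env)
    env[CREDENTIAL_ENV_VAR] = credential
    return env


def run_tool(entrypoint: List[str], argv: List[str], credential: str) -> "subprocess.CompletedProcess[str]":
    """Run the CLI with the credential in its environment, never in argv or stdin."""
    return subprocess.run(
        [*entrypoint, *argv],
        env=build_cli_env(credential),
        capture_output=True,
        text=True,
        timeout=30,
        check=False,
    )


# Demo child reports whether the variable is present without echoing its value.
DEMO_ENTRYPOINT = [
    sys.executable, "-c",
    "import os; print('present' if os.environ.get('TOOL_CLI_CREDENTIAL') else 'missing')",
]
proc = run_tool(DEMO_ENTRYPOINT, ["calendar.read"], "example-credential")
```

Since the credential never appears in `argv`, it stays out of model-visible tool arguments, and the parent process environment is left unchanged.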

### 3.6 Add settings and error mapping

- [x] Update `backend/src/core/config/settings.py` for any new CLI/credential configuration
- [x] Keep new config values typed and centralized
- [x] Update error handling paths to use stable problem codes for credential/CLI failures
- [x] Update `docs/protocols/common/http-error-codes.md` if these codes are new

### 3.7 Phase 3 verification

- [x] Unit tests confirm chat enqueue includes required controlled credential transport data
- [x] Unit tests confirm automation dispatch no longer relies on `owner_id` as credential
- [x] Unit tests confirm task runtime injects controlled credential only via env vars
- [x] Unit tests confirm credential issuance TTL and scope constraints
- [x] Logs and error payloads do not expose raw credentials

## Phase 4: Frontend Contract Alignment

### 4.1 Update event parsing

- [x] Update `apps/lib/core/chat/ag_ui_event.dart`
- [x] Remove active wire parsing paths that depend on `ui_schema`
- [x] Parse tool event `ui_hints` directly from updated payload contract
- [x] Parse structured `result` instead of string-only assumptions

### 4.2 Update history parsing and cache

- [x] Update `apps/lib/core/chat/chat_history_repository.dart`
- [x] Align cached history format with the new backend history response shape
- [x] Ensure history replay can rebuild tool UI items from backend-provided tool metadata/UI payload

### 4.3 Update chat service and item models

- [x] Update `apps/lib/core/chat/ag_ui_service.dart`
- [x] Ensure SSE handling matches the new tool event contract
- [x] Update `apps/lib/core/chat/chat_list_item.dart`
- [x] Remove item model assumptions that a rendered UI payload must be named `uiSchema`

### 4.4 Update rendering path

- [x] Update `apps/lib/features/chat/presentation/bloc/chat_bloc_events.dart`
- [x] Ensure tool results become visible UI items through direct tool payloads
- [x] Update `apps/lib/features/home/presentation/widgets/home_chat_item_renderer.dart`
- [x] Continue reusing the existing renderer component if it still fits the new input shape
- [x] Update `apps/lib/shared/widgets/ui_schema/ui_schema_renderer.dart` only as needed to accept the new direct tool UI input contract

### 4.5 Phase 4 verification

- [x] Frontend tests confirm SSE tool event parsing without `ui_schema`
- [x] Frontend tests confirm history replay rebuilds tool UI correctly
- [x] Frontend tests confirm refresh/reload still shows prior tool UI consistently

## Phase 5: Cleanup, Regression Tests, And Final Validation

### 5.1 Backend test updates

- [x] Update `backend/tests/unit/core/agentscope/events/test_store.py`
- [x] Update `backend/tests/unit/core/agentscope/events/test_agui_codec.py`
- [x] Update `backend/tests/unit/core/agentscope/runtime/test_stage_emitter.py`
- [x] Update `backend/tests/unit/core/agentscope/runtime/test_tasks.py`
- [x] Update `backend/tests/unit/v1/agent/test_utils.py`
- [x] Update `backend/tests/unit/schemas/agent/test_runtime_models.py`
- [x] Add tests for CLI adapter, command router, and tool post-processing
- [x] Add tests for controlled credential issuance and queue transport

### 5.2 Frontend test updates

- [x] Update `apps/test/core/chat/ag_ui_event_test.dart`
- [x] Update `apps/test/features/chat/presentation/bloc/chat_bloc_test.dart`
- [x] Add tests for history repository if needed by the new replay contract

### 5.3 Remove obsolete code paths

- [x] Remove worker `ui_hints` usage from runtime/event/history code paths
- [x] Remove active `ui_schema` contract usage from backend response shaping (N/A: `ui_schema` is still used as the wire format)
- [x] Remove old direct `custom/*.py` tool runtime wiring
- [x] Remove any parsing logic that assumes `ToolResponse` carries full ToolAgentOutput JSON
- [x] Remove dead compatibility helpers only after replacement path is verified

### 5.4 Run verification commands

- [x] Run relevant backend unit tests with `uv run pytest ...`
- [x] Run relevant frontend tests
- [x] Run backend lint checks required for touched files
- [x] Run backend type checks required for touched files
- [x] If skill registration/package wiring changed, run a focused smoke check of the CLI-backed tool path

### 5.5 Final acceptance audit against PRD

- [x] `ui_hints → ui_schema` compilation path is preserved (only `ui_hints` source changes from worker to tool)
- [x] `WorkerAgentOutput` no longer has `ui_hints`
- [x] `/history` tool UI replay compiles `metadata.tool_agent_output.ui_hints` to `ui_schema`
- [x] `ToolResponse` carries only projected result text
- [x] Tool post-processor generates full `ToolAgentOutput`
- [x] `ToolAgentOutput.result` is structured and machine-oriented
- [x] `message.content` is the full JSON text projection of `result`
- [x] CLI uses whitelist router and no shell execution path
- [x] Chat and automation both use controlled credential injection, not `owner_id` as credential
- [x] AgentScope skills are registered from project skill assets
- [x] Hot path and cold path tool context are unified
- [x] Frontend receives `ui_schema` from `TOOL_CALL_RESULT` and history
- [x] Relevant docs, tests, lint, and type checks are updated

## Suggested First Implementation Slice

- [ ] Complete Phase 0 only
- [ ] Do not start backend runtime refactor until Phase 0 contract text is committed and reviewed

## Progress Log

- [x] Phase 0 complete
- [x] Phase 1 complete
- [x] Phase 2 complete
- [x] Phase 3 complete
- [x] Phase 4 complete
- [x] Phase 5 complete
- [x] 2026-04-23: finished frontend cleanup for legacy tool-call interim events/cards; tool UI render path is now `TOOL_CALL_RESULT` + history replay only
- [x] 2026-04-23: documented `messages.content` decision to remain `text` (structured payload stays in metadata)
- [x] 2026-04-23: removed CLI alias compatibility and switched to canonical subcommands (`calendar.create/read/update/delete/share`, `contacts.read`, `memory.update`)
- [x] 2026-04-23: expanded protocol and postprocessor policy so canonical CRUD commands emit `ui_hints` (`calendar.create/read/update/delete`, `contacts.read`, `memory.update`)