feat(agent): redesign project_cli with module/method/input protocol

- Replace command/subcommand/args with module/method/input envelope
- Calendar handler uses discriminated union (mode) for read operations
- Strict Pydantic models with extra='forbid' for all calendar methods
- Worker max_iters=7, router prompt simplified (removed project_cli_defaults)
- Skill index cards + per-action files for progressive disclosure
- Frontend/AG-UI aligned to module/method dispatch
- Protocol docs updated to module/method/input contract

WIP: action cards need envelope fix, 2 tests need update, memory
handler needs Pydantic models.
This commit is contained in:
qzl
2026-04-24 13:24:13 +08:00
parent ab526af2c4
commit d060962a5f
62 changed files with 4802 additions and 805 deletions
@@ -0,0 +1,3 @@
{"file": ".opencode/commands/trellis/finish-work.md", "reason": "Finish work checklist"}
{"file": ".opencode/commands/trellis/check-backend.md", "reason": "Backend check spec"}
{"file": ".opencode/commands/trellis/check-frontend.md", "reason": "Frontend check spec"}
@@ -0,0 +1,2 @@
{"file": ".opencode/commands/trellis/check-backend.md", "reason": "Backend check spec"}
{"file": ".opencode/commands/trellis/check-frontend.md", "reason": "Frontend check spec"}
@@ -0,0 +1,173 @@
# Decision Log: Single CLI + Progressive Skill Disclosure Redesign
## Accepted Decisions
### D1. Keep one executable tool
Accepted.
Reason:
- lower tool-selection complexity
- lower repeated schema exposure cost
- consistent with the original valid direction of the CLI refactor
### D2. Replace `command/subcommand/args` with `module/method/input`
Accepted.
Reason:
- keeps one tool while removing ambiguous CLI-history semantics
- aligns the worker-facing protocol with business intent
- reduces repeated failure from action guessing
- keeps skill files decoupled from the executable tool contract
### D2.1 Remove `skill` from `project_cli`
Accepted.
Reason:
- skill files are guidance, not transport
- project_cli should execute by business module and method only
- error messages and validation should remain tool-native and not point back into skill docs
### D2.2 Use strong typed calendar read inputs
Accepted.
Reason:
- `date: str` plus manual parsing is glue code and too easy to misuse
- Pydantic `date`, timezone-aware `datetime`, and `UUID` give stricter and clearer validation
- `calendar.read` can cover day/range/event reads with one module-scoped method while still keeping input modes explicit
### D3. Keep progressive disclosure through skill files
Accepted.
Reason:
- allows the model to load only current-scenario knowledge
- avoids injecting every action definition into every model call
- fits AgentScope skill usage better than giant tool schemas
### D4. Split skill knowledge into index + action cards
Accepted.
Reason:
- real progressive disclosure needs smaller files, not only a long `SKILL.md`
- action-scoped files are easier for the worker to read and apply correctly
### D5. Set worker `max_iters=7`
Accepted.
Reason:
- current default 10 is too high for repeated invalid action discovery
- 7 preserves room for complex tasks without keeping the current waste level
### D6. Keep worker temperature unchanged
Accepted.
Reason:
- explicit user requirement
- this task focuses on protocol clarity and token efficiency, not generation-style tuning
### D7. Remove semantic reliance on worker `context_messages`
Accepted.
Reason:
- current runtime does not feed those messages into worker execution
- keeping the config active on worker is misleading and complicates reasoning about cost
## Rejected Decisions
### R1. Re-split into many domain tools
Rejected.
Reason:
- increases tool schema size
- increases selection ambiguity
- pushes the design back toward the old high-token path
### R2. Keep old CLI shape and only improve skill writing
Rejected.
Reason:
- failure came from structural action ambiguity, not only poor wording
- `read` remains overloaded even with better prose
### R3. Keep broad legacy input compatibility
Rejected.
Reason:
- old aliases teach the model that guessing is acceptable
- compatibility paths increase parser complexity and maintenance burden
- the repository is still early enough to tighten the protocol cleanly
### R4. Add duplicate-failure circuit breaker now
Rejected for this task.
Reason:
- user explicitly wants to keep some exploration room while refining skills
- this redesign should first fix the protocol itself
## Open Questions To Resolve During Implementation
1. Should router output add all optional execution hint fields in one step or phase them in gradually?
2. Should `worker.config.context_messages` be removed from schema entirely or retained as ignored/deprecated for one migration cycle?
3. Should `calendar` action files be separate files under `actions/` or a single file with stable sections and `ranges` reads?
4. Should action validation errors include `suggested_alternative_actions` for every validation failure or only for selected known-confusion cases?
5. Should `archive` become an explicit calendar action now, or remain represented via `update_event.patch.status = archived` until there is a dedicated route and UI contract?
6. ~~Frontend/live integration assertions still need migration from `skill/action` to `module/method`~~ Resolved 2026-04-24: assertions migrated and integration tests passing.
## Session 2026-04-24: Integration Test Debugging
### D8. Tool schema `input` must be required, not optional
Accepted.
Root cause of `project_cli` repeatedly receiving `input: {}`:
- `input: dict[str, Any] | None = None` generated a tool schema with `input` as optional, nullable, and `additionalProperties: true`
- Small models (qwen3.5-flash) interpret this as "input can be anything, including empty object"
- The tool schema has higher priority than skill file content in the model's attention
- Fix: changed to `input: dict[str, Any]` (required, no default, no nullable)
### D9. Router must not resolve time or suggest tool args
Accepted.
Previous router prompt included instructions for:
- Including `project_cli_defaults` in `context_summary`
- Standardizing time values to project_cli input format
- Resolving relative dates via `system_time_local`
This violated the router/worker responsibility split:
- Router: intent extraction + context summary + requires_tool evidence
- Worker: tool selection + time resolution via skill + ENV section + tool execution
Fix: removed all `project_cli_defaults` and time-resolution instructions from router prompt.
Time resolution is now the sole responsibility of worker + skill file, using `system_time_local` from ENV section as the single time source.
### D10. Skill files should reference ENV section variable names explicitly
Accepted.
Instead of abstract instructions like "resolve dates using system_time_local", skill files should directly reference `system_time_local` and `timezone_effective` from `USER_CONTEXT_JSON` in the ENV section, with concrete examples showing how to extract values.
@@ -0,0 +1,6 @@
{"file": ".trellis/workflow.md", "reason": "Project workflow and conventions"}
{"file": ".trellis/spec/backend/index.md", "reason": "Backend development guide"}
{"file": ".trellis/spec/frontend/index.md", "reason": "Frontend development guide"}
{"file": ".trellis/spec/guides/cross-layer-thinking-guide.md", "reason": "Cross-layer contract checklist for protocol/backend/frontend alignment"}
{"file": ".trellis/tasks/archive/2026-04/04-20-refactor-tool-cli-skill-ui-schema/prd.md", "reason": "Previous refactor decisions to preserve or intentionally replace"}
{"file": ".trellis/tasks/04-23-redesign-single-cli-skill-disclosure/prd.md", "reason": "Current task source of truth"}
@@ -0,0 +1,241 @@
# Single CLI + Progressive Skill Disclosure Implementation Checklist
## Purpose
This checklist turns the PRD into an execution plan. Complete items in order. Do not mark an item complete until the code, docs, and verification for that item are actually done.
## Required Reading
- [x] Read `backend/AGENTS.md`
- [x] Read `apps/AGENTS.md`
- [x] Read `.trellis/workflow.md`
- [x] Read `.trellis/spec/backend/index.md`
- [x] Read `.trellis/spec/guides/cross-layer-thinking-guide.md`
- [x] Read the archived task docs in `.trellis/tasks/archive/2026-04/04-20-refactor-tool-cli-skill-ui-schema/`
## Locked Decisions For This Task
- [x] Router remains a direct structured stage, not ReAct
- [x] Worker remains the only ReAct stage
- [x] Worker `max_iters` target is 7
- [x] Worker `temperature` stays unchanged
- [x] Single executable tool entry remains `project_cli`
- [x] `command/subcommand/args` model-facing input will be replaced
- [x] The new model-facing input is `module/method/input`
- [x] No broad backward-compatibility aliases will be kept
- [x] Worker duplicate-failure circuit breaker is explicitly out of scope for this task
## Phase 0: Task and Protocol Planning
### 0.1 Task docs
- [x] Create new trellis task directory
- [x] Update `task.json` with real scope, summary, and related files
- [x] Write `prd.md`
- [x] Write `implementation-checklist.md`
- [x] Write `decision-log.md`
### 0.2 Design checkpoints captured
- [x] Record why multi-tool exposure is rejected
- [x] Record why old `command/subcommand/args` is rejected
- [x] Record why single-tool progressive disclosure is preserved
- [x] Record current Supabase failure evidence and what it implies
## Phase 1: Protocol Docs First
### 1.1 Update tool protocol docs
- [x] Update `docs/protocols/agent/tool-protocol.md`
- [x] Replace model-facing `command/subcommand/args` examples with `module/method/input`
- [x] Document thin outer tool schema + strict server-side action validation
- [x] Document structured validation errors with correction hints
### 1.2 Update agent protocol docs
- [x] Update `docs/protocols/agent/sse-events.md` if tool call arg examples change
- [x] Update `docs/protocols/agent/api-endpoints.md` if history/examples mention old CLI arg shapes
- [x] Update any agent protocol doc that currently assumes `calendar read/create/update/...`
### 1.3 Cross-layer contract review
- [x] Confirm backend examples, protocol docs, and frontend assumptions remain mutually consistent
- [x] Confirm no doc still teaches the model old alias names like `start_time/end_time`
## Phase 2: Router and Worker Runtime Contract
### 2.1 Router output schema
- [x] Keep existing `objective/context_summary/requires_tool_evidence` intact
- [x] Reject heavier router output expansion in favor of a lighter contract and stronger `context_summary`
- [x] Add/update tests for the retained lightweight router contract
### 2.2 Router prompting
- [x] Update `backend/src/core/agentscope/prompts/agent_prompt.py`
- [x] Teach router to make `context_summary` execution-useful when IDs, dates, ranges, or prior tool outcomes matter
- [x] Standardize all time values in `context_summary` to downstream project_cli input formats
- [x] Avoid turning router into an executor
### 2.3 Worker runtime settings
- [x] Update `backend/src/core/agentscope/runtime/runner.py` to pass `max_iters=7` into `JsonReActAgent`
- [x] Confirm worker `temperature` remains unchanged
- [x] Remove worker runtime dependence on `context_messages` semantics in prompt/runtime guidance
- [x] Keep schema unchanged for now, but stop exposing worker `context_messages` in worker prompt semantics
### 2.4 Phase 2 verification
- [x] Run targeted router/worker schema, prompt, and runner unit tests
- [x] Confirm worker prompt no longer advertises `context_messages.mode/count`
- [x] Confirm worker input still contains only the router contract message
- [x] Confirm worker agent construction passes `max_iters=7`
## Phase 3: Single CLI Input Protocol Redesign
### 3.1 Replace model-facing request envelope
- [x] Update `backend/src/core/agentscope/tools/internal/project_cli.py`
- [x] Update `backend/src/core/agentscope/tools/cli/adapter.py`
- [x] Replace `command/subcommand/args` with `module/method/input`
- [x] Remove `args` string parsing compatibility
- [x] Keep tool result persistence and AG-UI flow intact
### 3.2 Action dispatch layer
- [x] Add explicit dispatch by `module + method`
- [x] Add strict per-method Pydantic request models with `extra="forbid"` for calendar methods
- [x] Ensure unknown `module` and unknown `method` return structured errors
- [x] Ensure method validators surface structured error details for invalid/missing fields
### 3.3 Remove legacy input aliases
- [x] Reject `start_time/end_time`
- [x] Reject `event_timezone`
- [x] Reject using `event_id` with list-style actions
- [x] Confirm error messages are corrective rather than generic only
## Phase 4: Calendar Business Action Protocol
### 4.1 Event actions
- [x] Implement `list_day`
- [x] Implement `list_range`
- [x] Implement `get_event`
- [x] Implement `create_event`
- [x] Implement `update_event`
- [x] Implement `delete_event`
### 4.2 Subscription actions
- [x] Implement `invite_subscriber`
- [x] Implement `accept_invite`
- [x] Implement `reject_invite`
### 4.3 Handler mapping
- [x] Map actions onto existing `v1.schedule_items.service` operations where possible
- [x] Keep repository -> service layering intact
- [x] Keep `owner_id` derived from auth context, never from tool input
- [x] Preserve existing permission and subscription semantics
### 4.4 Test coverage
- [x] Add targeted unit coverage for calendar action validation paths and dispatch shape changes
- [x] Add unit tests for dispatch selection and validation errors
- [x] Add regression tests for the known `event_id` detail flow
- [x] Add regression tests for canonical create/update field names
### 4.5 Phase 3 partial verification
- [x] Run targeted CLI router, calendar handler, and tool postprocessor unit tests
- [x] Confirm tool postprocessor resolves UI by `module/method`
- [x] Update integration/live test expectations to the new tool_call_args/result shape
- [x] Confirm integration/live flows execute successfully with the new runtime shape (calendar read verified 2026-04-24)
## Phase 5: Skill Refactor For Progressive Disclosure
### 5.1 Calendar skill packaging
- [x] Rewrite `backend/src/core/agentscope/tools/skills/calendar/SKILL.md` as a short index card
- [x] Add `actions/` files for each calendar action
- [x] Keep action files short, canonical, and example-driven
### 5.2 Skill composition rules
- [x] Document when calendar should compose with contacts for phone lookup
- [x] Document when worker should read only the index
- [x] Document when worker should read one action file before calling the tool
- [x] Remove long prose that does not help execution
### 5.3 View skill flow
- [x] Ensure `view_skill_file` can read the new per-action file layout
- [x] Verify enabled-skill restrictions still work with nested action files
- [x] Add tests for reading short skill index + action files
## Phase 6: Frontend and AG-UI Alignment
### 6.1 Frontend assumptions audit
- [x] Audit `apps/lib/core/chat/ag_ui_event.dart` for tool arg assumptions
- [x] Audit `apps/lib/features/chat/presentation/bloc/chat_bloc.dart` for `project_cli` payload assumptions
- [x] Audit calendar refresh logic that currently looks for `command/subcommand`
### 6.2 Compatibility decision
- [x] Decide whether frontend should switch refresh logic to `module/method`
- [x] Update frontend parsing if required
- [x] Keep `ui_schema` rendering path unchanged unless protocol docs require otherwise
### 6.3 Cross-layer verification checklist
- [x] Confirm backend tool payload examples match frontend parser expectations
- [x] Confirm history/SSE still preserve tool result display behavior
- [ ] Confirm calendar detail navigation behavior still matches event identity semantics
## Phase 7: Verification
### 7.1 Supabase-backed regression scenarios
- [ ] Reproduce the previous "known event_id detail lookup" scenario and verify `get_event` is used
- [ ] Reproduce create-event scenario and verify canonical field names only
- [ ] Reproduce same-day and range listing scenarios
### 7.2 Runtime cost controls
- [ ] Verify worker max iterations are capped at 7
- [ ] Verify worker does not spend extra turns reading unnecessary skill files
- [ ] Review whether the redesign keeps common runs under the target token/cost budget
### 7.3 Code quality
- [x] Run targeted backend tests
- [x] Run targeted frontend tests if parser logic changes
- [x] Run relevant integration or live tests when feasible
- [x] Record what was verified and what remains unverified
## Completion Criteria
- [ ] `project_cli` remains the only executable business tool
- [ ] Worker-facing CLI protocol uses `module/method/input`
- [ ] Calendar actions map to actual product objects and routes
- [ ] Skills are index-first and action-scoped
- [ ] Worker `max_iters=7` is wired
- [ ] Worker `context_messages` ambiguity is removed
- [ ] Docs, backend, and frontend expectations are aligned
## Current Status Note
- [x] Backend protocol and unit/regression tests are updated to `module/method/input`
- [x] Calendar read inputs now use strong typed `date` / timezone-aware `datetime` / `UUID` validation
- [x] Integration/live tests have been rerun after the `module/method` redesign
- [x] `project_cli` tool schema: `input` changed from optional to required (root cause of empty input bug)
- [x] Router prompt cleaned: removed `project_cli_defaults` and time-resolution duties
- [x] Worker contract prompt cleaned: removed `project_cli_defaults` reference
- [x] Calendar SKILL.md rewritten with concrete examples referencing `USER_CONTEXT_JSON` variables
- [x] Integration test assertions migrated from `skill/action` to `module/method`
- [x] 5/6 integration tests passing (calendar read, calendar create, contacts read, memory update, tool flow read)
- [ ] `test_tool_ui_schema_in_history` failing: history API returns tool messages without `metadata.tool_agent_output` (pre-existing issue, not related to prompt changes)
- [ ] Action card filenames under `calendar/actions/` still use old names (`list_day.md`, `get_event.md`) instead of method-based names matching `module/method` contract
- [ ] Per-method review needed: verify each `project_cli` method (create, update, delete, share, accept_invite, reject_invite) works end-to-end with current prompts
@@ -0,0 +1,705 @@
# Single CLI + Progressive Skill Disclosure Redesign PRD
## 1. Goal
This task redesigns the current agent tool protocol around one confirmed product constraint:
1. The runtime should continue to expose exactly one business tool to the worker agent: `project_cli`.
2. The worker should learn how to use the tool through progressive skill disclosure instead of receiving a large global tool surface up front.
3. The current `command + subcommand + args` transport should be replaced with a business-action protocol that matches real product objects and user intents.
4. The redesign must remain grounded in the current repository's actual schedule domain:
- `schedule_items`
- `schedule_subscriptions`
5. The redesign must reduce wasted retries and token consumption without reintroducing the old multi-tool schema explosion.
This PRD does not propose a broad agent-platform rewrite. It is a focused redesign of how the single CLI tool, skills, router output, and worker execution contract should work together.
## 2. Confirmed Repository Facts
### 2.1 Router is not ReAct
The router is a direct structured generation stage, not a ReAct loop.
Confirmed in:
- `backend/src/core/agentscope/runtime/runner.py:310`
- `backend/src/core/agentscope/runtime/runner.py:325`
`_run_router_stage()` uses `finalize_json_response(...)` and returns one `RouterAgentOutput` payload.
Implication:
- Router cost control depends on prompt/schema size and retries in `finalize_json_response`, not `max_iters`.
- Tool-choice ambiguity is a worker problem, not a router ReAct problem.
### 2.2 Worker is the only ReAct loop
The worker uses `JsonReActAgent`, which subclasses AgentScope `ReActAgent`.
Confirmed in:
- `backend/src/core/agentscope/runtime/json_react_agent.py`
- `backend/src/core/agentscope/runtime/runner.py:495`
The current code does not pass an explicit `max_iters`, so the worker inherits AgentScope's default.
Confirmed externally in the local environment by inspecting the installed `ReActAgent.__init__` signature:
- `max_iters=10`
Implication:
- The worker currently has too much room to repeat invalid tool calls before failing.
- This task will explicitly set worker `max_iters=7`.
### 2.3 Worker does not consume context_messages
The worker receives only the router contract message and not the original `context_messages` list.
Confirmed in:
- `backend/src/core/agentscope/runtime/runner.py:265`
- `backend/src/core/agentscope/runtime/runner.py:285`
- `backend/src/core/agentscope/runtime/runner.py:461`
Implication:
- `worker.config.context_messages` is currently semantically misleading.
- Router history context remains important.
- Worker runtime context should come from router output, system prompt, tool results, and optional memory, not duplicated chat history configuration.
### 2.4 Latest failure was caused by protocol mismatch, not missing data
Latest messages read from Supabase showed the following failure pattern:
- Worker repeatedly called `project_cli`
- Payload shape: `command=calendar`, `subcommand=read`, `args={"event_id": "..."}`
- Backend returned `INVALID_ARGUMENT: start_at and end_at are required`
- The same invalid call repeated until the worker exhausted the default ReAct limit
This proves:
1. The worker knew the event identifier.
2. The current CLI protocol did not expose a clear "get one event by id" action.
3. The current naming (`read`) encouraged the worker to map both range listing and single-event detail lookup onto one ambiguous command.
### 2.5 The current calendar domain is already split into two real business objects
Database evidence:
- `public.schedule_items`
- `public.schedule_subscriptions`
Current schema highlights:
`schedule_items`
- `id`
- `owner_id`
- `title`
- `description`
- `start_at`
- `end_at`
- `timezone`
- `metadata`
- `recurrence_rule`
- `source_type`
- `status`
`schedule_subscriptions`
- `item_id`
- `subscriber_id`
- `permission`
- `notify_level`
- `status`
Current backend routes and services already reflect this split:
- list events by range
- get event by id
- create event
- update event
- delete event
- share/invite event
- accept subscription
- reject subscription
Confirmed in:
- `backend/src/v1/schedule_items/router.py`
- `backend/src/v1/schedule_items/service.py`
### 2.6 Frontend already distinguishes list vs detail vs invite flows
Confirmed in:
- `apps/lib/features/calendar/data/apis/calendar_api.dart`
- `apps/lib/features/calendar/data/repositories/calendar_repository.dart`
The frontend already calls:
- `GET /schedule-items?start_at&end_at`
- `GET /schedule-items/{id}`
- `POST /schedule-items/{id}/share`
- `POST /schedule-items/{id}/accept`
- `POST /schedule-items/{id}/reject`
Implication:
- The product itself already separates these business operations.
- The ambiguity exists in the agent CLI input contract, not in the underlying app/domain design.
## 3. Problem Statement
The current one-tool design has the right high-level direction but the wrong action protocol.
### 3.1 What was correct in the previous refactor
The following direction remains valid and should be preserved:
1. One AgentScope tool entry (`project_cli`) is preferable to many domain tools for token control.
2. AgentScope skills should be the mechanism for teaching the model when and how to use the tool.
3. Tool outputs should remain structured and machine-oriented.
4. AG-UI/UI-schema compilation should remain backend-owned.
5. The worker should not receive all tool knowledge eagerly.
### 3.2 What is no longer acceptable
The following parts of the previous CLI protocol should be replaced:
1. `command + subcommand + args` as the model-facing protocol.
2. Ambiguous action names such as `read` that cover more than one business intent.
3. Loose `args: dict[str, Any]` semantics that encourage field guessing.
4. Legacy alias drift such as `start_time/end_time`, `event_timezone`, and other migration leftovers.
5. Runtime dependence on long prose skill files instead of short execution-oriented action cards.
### 3.3 Why the old CLI shape fails even though the single-tool strategy is good
The current single-tool protocol is too generic for a small model.
The worker must infer, from weak labels like `read`, all of the following at once:
1. Which business object is involved.
2. Whether the user wants a list or one detail record.
3. Which fields are mandatory for that specific subcommand.
4. Which field names are canonical.
This moves too much burden from the runtime protocol into model guesswork.
The result is not just correctness risk. It also increases token cost because the worker burns iterations learning through failure.
## 4. Design Principles
### 4.1 Keep exactly one tool
The worker should continue to see one executable tool:
- `project_cli`
Reason:
- avoids multi-tool selection overhead
- avoids injecting many tool schemas into every model call
- preserves a stable tool surface for worker prompting
### 4.2 Move model-facing semantics from CLI history to business actions
The model-facing protocol should describe business intent directly, not technical command-tree history.
Replace:
```json
{
"command": "calendar",
"subcommand": "read",
"args": {}
}
```
With:
```json
{
"module": "calendar",
"method": "read",
"input": {
"mode": "event",
"event_id": "<uuid>"
}
}
```
This preserves one tool while making the business contract explicit.
### 4.3 Use progressive disclosure for skill knowledge, not for raw global schema exposure
The worker should not receive all method definitions by default.
Instead:
1. Read a short skill index first.
2. Read the relevant method card only when necessary.
3. Call `project_cli` with the chosen `module/method/input` payload.
This keeps the token budget focused on the current business scenario.
### 4.4 Server-side validation stays strict even if the tool schema stays thin
To avoid a large tool schema, `project_cli` may expose only a thin outer schema:
- `module`
- `method`
- `input`
Strict validation then happens server-side by dispatching `module + method` to the corresponding Pydantic model.
For calendar reads, the input must use strong typed domain values at the schema boundary:
- day reads: `date`
- range reads: timezone-aware `datetime`
- single-event reads: `UUID`
The transport remains JSON, but the backend contract must validate these as typed values immediately instead of accepting arbitrary strings and reparsing them later.
This preserves strictness without forcing the entire action matrix into the model context.
### 4.5 No broad backward-compatibility layer
This redesign should not preserve old field aliases or broad coercion behavior.
Specifically, phase implementation should remove or reject:
- `args` as JSON string
- `start_time/end_time`
- `event_timezone`
- action overloading under `read`
The system should fail clearly and structurally instead of guessing.
## 5. Target Architecture
## 5.1 Runtime responsibilities
### Router
The router remains a direct structured output stage.
It should continue to decide:
- the objective
- whether tool evidence is required
It should be extended to optionally provide stronger execution hints:
- `selected_skill`
- `intended_action`
- `known_entities`
- `known_time_range`
- `missing_fields`
These fields are not there to make router execute tools. They are there to reduce worker exploration cost.
### Worker
The worker remains the only ReAct stage.
Worker changes in this redesign:
1. Explicitly set `max_iters=7`.
2. Keep `temperature` unchanged.
3. Stop pretending worker consumes `context_messages` configuration.
4. Prefer router execution hints before reading additional skill files.
5. Read the smallest relevant skill file possible before tool use.
### Tool
The worker still sees only:
- `project_cli`
- `view_skill_file`
`project_cli` is the execution boundary.
`view_skill_file` is the progressive-disclosure knowledge boundary.
## 5.2 New `project_cli` model-facing input contract
The new canonical model-facing payload is:
```json
{
"skill": "calendar",
"action": "get_event",
"input": {
"event_id": "<uuid>"
}
}
```
Field meanings:
- `skill`: enabled business skill namespace
- `action`: concrete business operation inside the skill
- `input`: strict action-specific payload
This is still one tool call. The worker is not choosing among many tools.
## 5.3 Calendar action protocol
The calendar skill should be redesigned around real business actions derived from `schedule_items` and `schedule_subscriptions`.
### Event actions
1. `list_day`
2. `list_range`
3. `get_event`
4. `create_event`
5. `update_event`
6. `delete_event`
### Subscription actions
1. `invite_subscriber`
2. `accept_invite`
3. `reject_invite`
### Why this action set
This set directly maps to current product behavior:
- user asks what is scheduled today -> `list_day`
- user asks what is scheduled this week -> `list_range`
- user asks for a known event's details -> `get_event`
- user creates or edits a schedule item -> `create_event` / `update_event`
- user removes a schedule item -> `delete_event`
- user invites another person -> `invite_subscriber`
- invite recipient responds -> `accept_invite` / `reject_invite`
This avoids overloading one label like `read` for two distinct business tasks.
## 5.4 Canonical calendar action shapes
### `list_day`
```json
{
"skill": "calendar",
"action": "list_day",
"input": {
"date": "2026-04-23",
"timezone": "Asia/Shanghai"
}
}
```
### `list_range`
```json
{
"skill": "calendar",
"action": "list_range",
"input": {
"start_at": "2026-04-23T00:00:00+08:00",
"end_at": "2026-04-24T00:00:00+08:00"
}
}
```
### `get_event`
```json
{
"skill": "calendar",
"action": "get_event",
"input": {
"event_id": "<uuid>"
}
}
```
### `create_event`
```json
{
"skill": "calendar",
"action": "create_event",
"input": {
"title": "Project sync",
"start_at": "2026-04-23T16:00:00+08:00",
"end_at": "2026-04-23T17:00:00+08:00",
"timezone": "Asia/Shanghai",
"description": "optional",
"metadata": {
"location": "optional",
"reminder_minutes": 30,
"color": "blue",
"notes": "optional"
}
}
}
```
### `update_event`
```json
{
"skill": "calendar",
"action": "update_event",
"input": {
"event_id": "<uuid>",
"patch": {
"title": "Updated title",
"start_at": "2026-04-23T18:00:00+08:00",
"timezone": "Asia/Shanghai",
"status": "archived"
}
}
}
```
### `delete_event`
```json
{
"skill": "calendar",
"action": "delete_event",
"input": {
"event_id": "<uuid>"
}
}
```
### `invite_subscriber`
```json
{
"skill": "calendar",
"action": "invite_subscriber",
"input": {
"event_id": "<uuid>",
"invitee": {
"phone": "+8613812345678"
},
"permissions": {
"view": true,
"edit": false,
"invite": false
}
}
}
```
### `accept_invite`
```json
{
"skill": "calendar",
"action": "accept_invite",
"input": {
"event_id": "<uuid>"
}
}
```
### `reject_invite`
```json
{
"skill": "calendar",
"action": "reject_invite",
"input": {
"event_id": "<uuid>"
}
}
```
## 5.5 Skill packaging for progressive disclosure
The calendar skill should no longer be one long explanatory page that the worker must read in full.
Recommended structure:
```text
calendar/
SKILL.md # very short index / navigation card
actions/
list_day.md
list_range.md
get_event.md
create_event.md
update_event.md
delete_event.md
invite_subscriber.md
accept_invite.md
reject_invite.md
```
### `SKILL.md` responsibilities
- describe when calendar skill is relevant
- list all actions in one screen
- say which action to use for known `event_id`
- say which action to use for date/range queries
- point to action files for exact payloads
### Action file responsibilities
Each action file should contain only:
1. when to use the action
2. required fields
3. optional fields
4. one canonical example
5. forbidden field names and common mistakes
This makes `view_skill_file` a real progressive-disclosure mechanism instead of a markdown dump.
## 5.6 Error contract for self-correction
The redesigned CLI should return structured action-level validation feedback.
Canonical error example:
```json
{
"status": "failure",
"error": {
"code": "INVALID_ACTION_INPUT",
"message": "action list_range requires start_at and end_at",
"skill": "calendar",
"action": "list_range",
"missing_fields": ["start_at", "end_at"],
"unexpected_fields": ["event_id"],
"suggested_alternative_actions": ["get_event"]
}
}
```
This is intentionally more corrective than the current generic `INVALID_ARGUMENT` payload.
## 6. Token and Cost Control Strategy
### 6.1 Preserve single-tool economy
The main token-saving choice is to preserve one executable business tool.
This avoids:
- multiple tool schemas in each worker call
- model confusion over which tool to pick first
- large repeated tool descriptions in every turn
### 6.2 Replace global knowledge with scoped reading
The worker should read:
1. router execution hints first
2. skill index second
3. one action card if needed
This is cheaper than injecting the entire action matrix into every prompt.
### 6.3 Stop spending iterations on protocol discovery
The redesign reduces cost not by suppressing useful reasoning, but by removing the need for repeated failed exploration.
The worker should no longer need multiple failed attempts to discover:
- whether `event_id` belongs to `read`
- whether `start_time` is valid
- whether `event_timezone` is accepted
### 6.4 Concrete worker settings for this redesign
- set worker `max_iters=7`
- keep worker `temperature` unchanged
- remove/ignore worker `context_messages` configuration in runtime semantics
### 6.5 Explicit non-goals in this task
This task does not include:
- changing router into a ReAct stage
- lowering worker temperature
- adding duplicate-failure circuit breakers yet
- exposing many separate AgentScope tools again
## 7. Migration Plan
### Phase 0: Planning and protocol design
1. Write this PRD and implementation checklist.
2. Update protocol docs before runtime code changes.
3. Record rejected alternatives and reasoning.
### Phase 1: Backend runtime contract
1. Extend router output schema with optional execution hints.
2. Explicitly set worker `max_iters=7`.
3. Remove semantic reliance on worker `context_messages`.
4. Redesign `project_cli` request payload as `skill/action/input`.
### Phase 2: Calendar action dispatch
1. Replace current calendar command/subcommand routing with action dispatch.
2. Implement strict action-specific Pydantic models.
3. Remove legacy alias handling and generic dict coercion.
4. Return structured correction-oriented validation errors.
### Phase 3: Skill refactor
1. Rewrite `calendar/SKILL.md` as a short index card.
2. Add per-action action-card files.
3. Update skill instructions so worker reads only what is needed.
### Phase 4: Cross-layer alignment
1. Update relevant protocol docs.
2. Keep frontend consumption stable where possible.
3. Ensure tool result and AG-UI event semantics remain compatible.
### Phase 5: Verification
1. Reproduce the previous failure case and confirm it routes to `get_event`.
2. Verify create-event flow uses canonical names only.
3. Verify range/day queries still work.
4. Verify invite/accept/reject flows map to current schedule subscription behavior.
## 8. Rejected Alternatives
### 8.1 Rejected: split back into many tools
Reason:
- reintroduces tool-schema bloat
- worsens tool-choice ambiguity
- increases token overhead on every worker step
### 8.2 Rejected: keep `command/subcommand/args` and fix only the skill text
Reason:
- the ambiguity is structural, not editorial
- `read` still overloads distinct business operations
- loose dict input still encourages field guessing
### 8.3 Rejected: put the full action schema into the tool prompt directly
Reason:
- defeats progressive disclosure
- grows the worker prompt on every turn
- hurts cost and small-model reliability
## 9. Success Criteria
This redesign is successful only if all of the following are true:
1. The worker still sees one executable business tool.
2. The worker chooses calendar actions through business semantics, not command-tree guesswork.
3. The previous repeated-failure case becomes a direct `get_event` call when `event_id` is known.
4. The worker no longer relies on undocumented field aliases.
5. The runtime protocol is strictly validated server-side.
6. Skill reading is incremental and action-scoped.
7. Worker iteration cost is bounded by `max_iters=7`.
8. Backend, protocol docs, and frontend assumptions remain aligned.
@@ -0,0 +1,85 @@
{
"id": "redesign-single-cli-skill-disclosure",
"name": "redesign-single-cli-skill-disclosure",
"title": "Redesign single CLI + progressive skill disclosure protocol",
"description": "Redesign the current single CLI tool into a business-action protocol driven by progressive skill disclosure. Preserve one AgentScope tool, replace legacy command/subcommand/args guessing with strict module/method/input dispatch, align router-worker contracts with actual runtime behavior, and reduce token waste without reintroducing multi-tool schema bloat.",
"status": "in_progress",
"dev_type": "fullstack",
"scope": "cross-domain",
"priority": "P1",
"creator": "qzl",
"assignee": "qzl",
"createdAt": "2026-04-23",
"completedAt": null,
"branch": null,
"base_branch": "dev",
"worktree_path": null,
"current_phase": 6,
"next_action": [
{
"phase": 1,
"action": "implement"
},
{
"phase": 2,
"action": "check"
},
{
"phase": 6,
"action": "finish"
},
{
"phase": 4,
"action": "create-pr"
}
],
"commit": null,
"pr_url": null,
"subtasks": [
{
"name": "Write PRD for single CLI + progressive skill disclosure redesign",
"status": "completed"
},
{
"name": "Define calendar business action protocol from schedule_items and schedule_subscriptions",
"status": "completed"
},
{
"name": "Define router and worker contract changes for lower-token execution",
"status": "completed"
},
{
"name": "Define skill packaging for index-first progressive disclosure",
"status": "completed"
},
{
"name": "Define backend dispatch and validation migration plan",
"status": "completed"
},
{
"name": "Define protocol/frontend alignment and verification plan",
"status": "completed"
}
],
"children": [],
"parent": null,
"relatedFiles": [
"backend/src/core/agentscope/runtime/runner.py",
"backend/src/core/agentscope/runtime/json_react_agent.py",
"backend/src/core/agentscope/tools/internal/project_cli.py",
"backend/src/core/agentscope/tools/internal/view_skill_file.py",
"backend/src/core/agentscope/tools/cli/adapter.py",
"backend/src/core/agentscope/tools/skills/calendar/SKILL.md",
"backend/src/v1/schedule_items/router.py",
"backend/src/v1/schedule_items/service.py",
"backend/src/v1/schedule_items/schemas.py",
"apps/lib/features/calendar/data/apis/calendar_api.dart",
"apps/lib/features/calendar/data/repositories/calendar_repository.dart",
"docs/protocols/agent/sse-events.md",
"docs/protocols/agent/tool-protocol.md"
],
"notes": "This task now supersedes both the older command/subcommand/args direction and the intermediate skill/action/input direction for the CLI input protocol while keeping the validated parts of the prior refactor: one tool entry, AgentScope skills, structured tool outputs, and backend-owned AG-UI compilation. Phase 1 protocol docs are updated to module/method/input. Phase 2 runtime contract is updated with worker max_iters=7 and a lighter router contract that now requires time values in context_summary to be standardized to downstream project_cli input formats, including project_cli_defaults when deterministically known. Phase 3 and the calendar-focused business-method redesign are now in place: project_cli uses module/method/input, runtime-side skill gating was removed from project_cli, the CLI router dispatches by module+method, calendar reads were collapsed into calendar.read with strong typed `date`/timezone-aware `datetime`/`UUID` input variants, calendar mutations use module-scoped methods, contacts/memory align to the same envelope, tool postprocessing resolves ui_hints from module/method, and skill docs now teach module/method usage instead of leaking transport concerns into the tool contract. Backend unit/regression coverage is green for the updated AgentScope/tool stack. Integration/live tests have not yet been rerun after the module/method and strong-typing redesign, so end-to-end verification remains incomplete.",
"meta": {
"feature_summary": "single project_cli redesign + progressive skill disclosure + business action protocol + lower token/runtime ambiguity"
}
}
@@ -263,10 +263,10 @@ class ChatBloc extends Cubit<ChatState> implements ChatOrchestrator {
if (args == null) {
return false;
}
final command = (args['command'] as String?)?.trim().toLowerCase();
final subcommand = (args['subcommand'] as String?)?.trim().toLowerCase();
const mutationSubcommands = {'create', 'update', 'delete'};
if (command != 'calendar' || !mutationSubcommands.contains(subcommand)) {
final skill = (args['skill'] as String?)?.trim().toLowerCase();
final action = (args['action'] as String?)?.trim().toLowerCase();
const mutationActions = {'create_event', 'update_event', 'delete_event'};
if (skill != 'calendar' || !mutationActions.contains(action)) {
return false;
}
return status == 'success' || status == 'partial';
@@ -262,35 +262,15 @@ class HomeChatItemRenderer {
ToolResultItem item,
) {
final colorScheme = Theme.of(context).colorScheme;
final rootNode = item.uiSchema['root'];
final appearance = rootNode is Map<String, dynamic>
? rootNode['appearance'] as String?
: null;
final needsOuterCard = appearance == null || appearance == 'plain';
final schemaContent = UiSchemaRenderer(
context,
colorScheme,
).renderSchema(item.uiSchema);
final wrappedContent = needsOuterCard
? Container(
width: double.infinity,
padding: const EdgeInsets.all(AppSpacing.md),
decoration: BoxDecoration(
color: colorScheme.surfaceContainerLow.withValues(alpha: 0.65),
borderRadius: BorderRadius.circular(AppRadius.lg),
border: Border.all(
color: colorScheme.outlineVariant.withValues(alpha: 0.25),
),
),
child: schemaContent,
)
: schemaContent;
final schemaContent = UiSchemaRenderer(context, colorScheme).renderSchema(
item.uiSchema,
);
return Align(
alignment: Alignment.centerLeft,
child: FractionallySizedBox(
widthFactor: _toolResultWidthFactor,
child: wrappedContent,
child: schemaContent,
),
);
}
File diff suppressed because it is too large Load Diff
@@ -234,7 +234,7 @@ void main() {
});
test(
'tool calendar_create success triggers calendar refresh callback',
'calendar mutation tool result triggers calendar refresh callback',
() async {
final service = _FakeAgUiService();
var refreshCalls = 0;
@@ -251,7 +251,10 @@ void main() {
messageId: 'msg-1',
toolCallId: 'call-1',
toolName: 'project_cli',
toolCallArgs: const {'command': 'calendar', 'subcommand': 'create'},
toolCallArgs: const {
'skill': 'calendar',
'action': 'create_event',
},
result: const {'ok': true},
status: 'success',
uiSchema: null,
@@ -264,6 +267,36 @@ void main() {
},
);
test('calendar read tool result does not trigger calendar refresh callback', () async {
final service = _FakeAgUiService();
var refreshCalls = 0;
final bloc = ChatBloc(
service: service,
chatApi: _NoopChatApi(),
onCalendarMutated: () async {
refreshCalls += 1;
},
);
service.emitEventForTest(
ToolCallResultEvent(
messageId: 'msg-1',
toolCallId: 'call-1',
toolName: 'project_cli',
toolCallArgs: const {
'skill': 'calendar',
'action': 'list_day',
},
result: const {'ok': true},
status: 'success',
uiSchema: null,
),
);
await Future<void>.delayed(Duration.zero);
expect(refreshCalls, 0);
});
test(
'sendMessage recovers from premature SSE close with polled history',
() async {
@@ -16,18 +16,28 @@ def _wrap_section(section: str, content: str) -> str:
return f"{start}\n{body}\n{end}" if body else f"{start}\n{end}"
def _config_rules(llm_config: SystemAgentLLMConfig | None) -> list[str]:
def _config_rules(
llm_config: SystemAgentLLMConfig | None,
*,
include_context_messages: bool = True,
) -> list[str]:
if llm_config is None:
return []
context_mode = llm_config.context_messages.mode.value
context_count = llm_config.context_messages.count
enabled_skills = [skill.value for skill in llm_config.enabled_skills]
return [
"[Runtime Config]",
f"- context_messages.mode={context_mode}",
f"- context_messages.count={context_count}",
f"- enabled_skills={','.join(enabled_skills) if enabled_skills else 'default'}",
]
rules = ["[Runtime Config]"]
if include_context_messages:
context_mode = llm_config.context_messages.mode.value
context_count = llm_config.context_messages.count
rules.extend(
[
f"- context_messages.mode={context_mode}",
f"- context_messages.count={context_count}",
]
)
rules.append(
f"- enabled_skills={','.join(enabled_skills) if enabled_skills else 'default'}"
)
return rules
PromptRuleBuilder = Callable[[SystemAgentLLMConfig | None], list[str]]
@@ -60,7 +70,7 @@ def _router_rules(llm_config: SystemAgentLLMConfig | None) -> list[str]:
"[Responsibilities]",
"- Router only: extract intent and route strategy; never answer user directly.",
"- Set objective to the user's goal in a concise, faithful sentence.",
"- Set context_summary to a brief description of what context messages contain.",
"- Set context_summary to a brief but execution-useful summary of the relevant context, including known IDs, dates, time ranges, and prior tool outcomes when they matter.",
"- Set requires_tool_evidence=true when the task needs tool execution to ground the answer.",
"- Set requires_tool_evidence=false when the question can be answered directly from context.",
*_config_rules(llm_config),
@@ -75,14 +85,17 @@ def _worker_rules(llm_config: SystemAgentLLMConfig | None) -> list[str]:
"[Responsibilities]",
"- Worker only: execute routed objective without changing router intent.",
"- Treat router output as objective contract, not as a fully-materialized tool-args payload.",
"- Use objective plus context_summary as the primary execution guide from the router.",
"- Infer deterministic required tool arguments from contract fields, tool schema, and runtime context.",
"- Ask minimal clarification only when required arguments cannot be inferred safely.",
"- Ground every claim in available evidence and tool results; never fabricate execution state.",
"- When requires_tool_evidence=true, do not finalize an answer from failed tool calls; either recover with a corrected tool call or explicitly surface that execution failed.",
"- If all tool calls fail under requires_tool_evidence=true, set status=failed and populate error; do not present a factual answer as confirmed.",
"- Keep status/answer/suggested_actions/error internally consistent.",
"[Schema Guidance]",
"- The worker output schema is injected at runtime; follow it exactly.",
"- Do not add fields that are not present in the injected schema.",
*_config_rules(llm_config),
*_config_rules(llm_config, include_context_messages=False),
]
@@ -97,8 +110,10 @@ def build_worker_contract_prompt(*, router_output: RouterAgentOutput) -> str:
"[Worker Contract]",
"- Keep routed objective unchanged.",
"- Use objective as the execution target.",
"- Use context_summary to understand conversational background.",
"- Use context_summary to understand conversational background and reuse concrete facts already known from earlier context.",
"- When requires_tool_evidence=true, you MUST call at least one tool before answering.",
"- A failed tool call does not count as grounding evidence for a factual answer.",
"- If no tool call succeeds, finalize with status=failed and a concrete error instead of a fact claim.",
"- Infer deterministic missing required tool args from evidence + tool schema.",
"- Ask clarification only when safe inference is impossible.",
"[RouterAgentOutput]",
@@ -39,7 +39,9 @@ from schemas.agent.forwarded_props import (
parse_forwarded_props_runtime_mode,
)
from schemas.agent.runtime_models import (
ErrorInfo,
RouterAgentOutput,
RunStatus,
WorkerAgentOutputLite,
)
from schemas.agent.skill_config import ProjectCliCommand, SkillName
@@ -74,6 +76,8 @@ class AgentScopeRunner:
self._active_agent: JsonReActAgent | None = None
self._active_agent_lock = asyncio.Lock()
_WORKER_MAX_ITERS = 7
async def execute(
self,
*,
@@ -442,6 +446,11 @@ class AgentScopeRunner:
if self._active_agent is agent:
self._active_agent = None
worker_payload = worker_output_model.model_validate(response_msg.metadata or {})
worker_payload = self._enforce_tool_evidence_contract(
worker_output=worker_payload,
requires_tool_evidence=requires_tool_evidence,
has_successful_tool_result=emitter.has_successful_tool_result,
)
response_metadata = self._llm_pricing_service.build_usage_metadata(
model=stage_config.model_code,
usage_summary=tracking_model.usage_summary(),
@@ -458,6 +467,28 @@ class AgentScopeRunner:
finally:
reset_tool_credential(credential_token)
@staticmethod
def _enforce_tool_evidence_contract(
*,
worker_output: WorkerAgentOutputLite,
requires_tool_evidence: bool,
has_successful_tool_result: bool,
) -> WorkerAgentOutputLite:
if not requires_tool_evidence or has_successful_tool_result:
return worker_output
return worker_output.model_copy(
update={
"status": RunStatus.FAILED,
"answer": "无法确认结果:所需工具调用未成功完成。",
"suggested_actions": [],
"error": ErrorInfo(
code="TOOL_EVIDENCE_MISSING",
message="requires_tool_evidence=true but no tool call completed successfully in this run",
retryable=False,
),
}
)
def _build_worker_input_messages(
self,
*,
@@ -501,6 +532,7 @@ class AgentScopeRunner:
model: TrackingChatModel,
emitter: PipelineStageEmitter | None = None,
force_tool_on_first_reasoning: bool = False,
max_iters: int = _WORKER_MAX_ITERS,
) -> JsonReActAgent:
return JsonReActAgent(
name=agent_name,
@@ -511,6 +543,7 @@ class AgentScopeRunner:
memory=InMemoryMemory(),
emitter=emitter,
force_tool_on_first_reasoning=force_tool_on_first_reasoning,
max_iters=max_iters,
)
async def _emit_step_event(
@@ -36,8 +36,13 @@ class PipelineStageEmitter:
self._emit_tool_events = emit_tool_events
self._emitted_tool_calls: set[str] = set()
self._emitted_tool_results: set[str] = set()
self._has_successful_tool_result = False
self.latest_text_message_id: str | None = None
@property
def has_successful_tool_result(self) -> bool:
return self._has_successful_tool_result
async def handle_print(self, *, msg: Msg, last: bool) -> None:
del last
if self._emit_tool_events:
@@ -126,6 +131,8 @@ class PipelineStageEmitter:
payload["error"] = tool_output.error.model_dump(mode="json")
await self._emit("TOOL_CALL_RESULT", payload)
if tool_output.status.value in {"success", "partial"}:
self._has_successful_tool_result = True
self._emitted_tool_results.add(tool_call_id)
async def _emit(self, event_type: str, payload: dict[str, Any]) -> None:
@@ -1,11 +1,11 @@
from __future__ import annotations
import json
from typing import Any
from agentscope.tool import ToolResponse
from agentscope.message import TextBlock
from core.agentscope.tools.cli.contracts import get_method_input_contract
from core.agentscope.tools.cli.handlers import build_router
from core.agentscope.tools.cli.models import CliCommand
from core.agentscope.tools.cli.router import CommandRouter
@@ -44,29 +44,44 @@ def _resolve_owner_id() -> str:
return owner_id
def _with_method_contract(
*,
module: str,
method: str,
error: ErrorInfo | None,
) -> ErrorInfo | None:
if error is None:
return None
contract = get_method_input_contract(module=module, method=method)
if contract is None:
return error
details = dict(error.details or {})
for key, value in contract.items():
details.setdefault(key, value)
message = error.message
retry_hint = contract.get("retry_hint")
if isinstance(retry_hint, str) and retry_hint and retry_hint not in message:
message = f"{message} {retry_hint}".strip()
return error.model_copy(update={"message": message, "details": details})
async def invoke_cli_tool(
*,
tool_name: str,
tool_call_args: dict[str, Any],
allowed_commands: set[str] | None = None,
) -> ToolResponse:
command = str(tool_call_args.get("command", "")).strip()
subcommand = str(tool_call_args.get("subcommand", "")).strip()
args = tool_call_args.get("args")
if isinstance(args, str):
try:
parsed_args = json.loads(args)
except (json.JSONDecodeError, ValueError):
parsed_args = None
if isinstance(parsed_args, dict):
args = parsed_args
if not isinstance(args, dict):
args = {}
module = str(tool_call_args.get("module", "")).strip()
method = str(tool_call_args.get("method", "")).strip()
input_payload = tool_call_args.get("input")
if not isinstance(input_payload, dict):
input_payload = {}
tool_call_args = {
**tool_call_args,
"subcommand": subcommand,
"args": args,
"module": module,
"method": method,
"input": input_payload,
}
if tool_name != "project_cli":
@@ -76,29 +91,29 @@ async def invoke_cli_tool(
code="UNKNOWN_TOOL",
message=f"unsupported tool: {tool_name}",
)
if not command or not subcommand:
if not module or not method:
return _build_error(
tool_name=tool_name,
tool_call_args=tool_call_args,
code="INVALID_ARGUMENT",
message="command and subcommand are required",
message="module and method are required",
)
router = _get_router()
if allowed_commands is not None and command not in allowed_commands:
if allowed_commands is not None and module not in allowed_commands:
return _build_error(
tool_name=tool_name,
tool_call_args=tool_call_args,
code="COMMAND_NOT_ALLOWED",
message=f"command not enabled: {command}",
code="MODULE_NOT_ALLOWED",
message=f"module not enabled: {module}",
)
if (command, subcommand) not in router.command_pairs:
if (module, method) not in router.module_methods:
return _build_error(
tool_name=tool_name,
tool_call_args=tool_call_args,
code="UNKNOWN_COMMAND",
message=f"unknown command: {command} {subcommand}",
code="UNKNOWN_METHOD",
message=f"unknown method: {module} {method}",
)
try:
@@ -113,9 +128,9 @@ async def invoke_cli_tool(
)
request = CliCommand(
command=command,
subcommand=subcommand,
args=args,
module=module,
method=method,
input=input_payload,
owner_id=owner_id,
)
@@ -131,11 +146,17 @@ async def invoke_cli_tool(
)
status = ToolStatus.SUCCESS if cli_result.ok else ToolStatus.FAILURE
error_info = cli_result.error
error_info = _with_method_contract(
module=module,
method=method,
error=cli_result.error,
)
result = {
"command": cli_result.command,
"subcommand": cli_result.subcommand,
"ok": cli_result.ok,
"module": cli_result.module,
"method": cli_result.method,
"data": cli_result.data,
"error": error_info.model_dump(mode="json", exclude_none=True) if error_info else None,
}
tool_call_id = get_current_tool_call_id(tool_name=tool_name)
@@ -171,14 +192,27 @@ def _build_error(
code: str,
message: str,
) -> ToolResponse:
module = str((tool_call_args or {}).get("module", "")).strip()
method = str((tool_call_args or {}).get("method", "")).strip()
error_info = _with_method_contract(
module=module,
method=method,
error=ErrorInfo(code=code, message=message, retryable=False),
)
tool_call_id = get_current_tool_call_id(tool_name=tool_name)
output = ToolAgentOutput(
tool_name=tool_name,
tool_call_id=tool_call_id,
tool_call_args=tool_call_args,
status=ToolStatus.FAILURE,
result={"status": "failure", "code": code, "message": message},
error=ErrorInfo(code=code, message=message, retryable=False),
result={
"ok": False,
"module": module,
"method": method,
"data": None,
"error": error_info.model_dump(mode="json", exclude_none=True) if error_info else None,
},
error=error_info,
)
from core.agentscope.tools.tool_postprocessor import postprocess_tool_output
@@ -0,0 +1,112 @@
from __future__ import annotations
from typing import Any
METHOD_INPUT_CONTRACTS: dict[tuple[str, str], dict[str, Any]] = {
("calendar", "read"): {
"input_schema": {
"mode": "string enum(day|range|event)",
"date": "date, required when mode=day",
"timezone": "string (IANA timezone), optional when mode=day",
"start_at": "datetime with timezone, required when mode=range",
"end_at": "datetime with timezone, required when mode=range",
"event_id": "UUID, required when mode=event",
},
"expected_input_examples": [
{"mode": "day", "date": "2026-04-24", "timezone": "Asia/Shanghai"},
{
"mode": "range",
"start_at": "2026-04-24T09:00:00+08:00",
"end_at": "2026-04-24T18:00:00+08:00",
},
{"mode": "event", "event_id": "550e8400-e29b-41d4-a716-446655440000"},
],
"retry_hint": "For relative day requests, resolve the day to a concrete input.date value in YYYY-MM-DD format before retrying.",
},
("calendar", "create"): {
"input_schema": {
"title": "string",
"start_at": "datetime with timezone",
"end_at": "datetime with timezone | null",
"timezone": "string (IANA timezone)",
"description": "string | null",
"metadata": "object | null",
},
"expected_input_examples": [
{
"title": "Project sync",
"start_at": "2026-04-24T10:00:00+08:00",
"end_at": "2026-04-24T11:00:00+08:00",
"timezone": "Asia/Shanghai",
}
],
},
("calendar", "update"): {
"input_schema": {
"event_id": "UUID",
"patch": "object with mutable event fields",
"patch.start_at": "datetime with timezone | omitted",
"patch.end_at": "datetime with timezone | null | omitted",
},
"expected_input_examples": [
{
"event_id": "550e8400-e29b-41d4-a716-446655440000",
"patch": {"title": "Updated title", "timezone": "Asia/Shanghai"},
}
],
},
("calendar", "delete"): {
"input_schema": {"event_id": "UUID"},
"expected_input_examples": [{"event_id": "550e8400-e29b-41d4-a716-446655440000"}],
},
("calendar", "share"): {
"input_schema": {
"event_id": "UUID",
"invitee": "object { phone: string }",
"permissions": "object { view: bool, edit: bool, invite: bool }",
},
"expected_input_examples": [
{
"event_id": "550e8400-e29b-41d4-a716-446655440000",
"invitee": {"phone": "+8613800138000"},
"permissions": {"view": True, "edit": False, "invite": False},
}
],
},
("calendar", "accept_invite"): {
"input_schema": {"event_id": "UUID"},
"expected_input_examples": [{"event_id": "550e8400-e29b-41d4-a716-446655440000"}],
},
("calendar", "reject_invite"): {
"input_schema": {"event_id": "UUID"},
"expected_input_examples": [{"event_id": "550e8400-e29b-41d4-a716-446655440000"}],
},
("contacts", "read"): {
"input_schema": {},
"expected_input_examples": [{}],
},
("memory", "update"): {
"input_schema": {
"operations": "array of objects",
"operations[].action": "string (update | delete)",
"operations[].memory_type": "string (user | work)",
},
"expected_input_examples": [
{
"operations": [
{
"action": "update",
"memory_type": "user",
"user_content": {"preferences": {"meeting_time": "morning"}},
}
]
}
],
},
}
def get_method_input_contract(*, module: str, method: str) -> dict[str, Any] | None:
contract = METHOD_INPUT_CONTRACTS.get((module.strip(), method.strip()))
return dict(contract) if contract is not None else None
@@ -1,17 +1,15 @@
from __future__ import annotations
from datetime import date, datetime, timedelta
from typing import Any
from datetime import date, datetime, timedelta, timezone
from typing import Annotated, Any, Literal
from uuid import UUID
from zoneinfo import ZoneInfo
from core.agentscope.tools.cli.models import CliCommand, CliCommandResult
from pydantic import BaseModel, ConfigDict, Field, TypeAdapter, ValidationError, field_validator
from core.agentscope.tools.utils.calendar_domain import (
build_schedule_metadata,
create_schedule_service,
map_calendar_exception,
merge_schedule_metadata_for_update,
parse_iso_datetime,
schedule_event_to_dict,
)
from schemas.agent.runtime_models import ErrorInfo
@@ -19,23 +17,185 @@ from schemas.enums import ScheduleItemStatus
from v1.schedule_items.schemas import (
ScheduleItemCreateRequest,
ScheduleItemListRequest,
ScheduleItemMetadata,
ScheduleItemShareRequest,
ScheduleItemUpdateRequest,
)
async def handle_calendar_read(request: CliCommand) -> CliCommandResult:
class _CalendarReadRangeInput(BaseModel):
model_config = ConfigDict(extra="forbid")
mode: Literal["range"]
start_at: datetime
end_at: datetime
@field_validator("start_at", "end_at")
@classmethod
def _validate_aware_datetime(cls, value: datetime) -> datetime:
if value.tzinfo is None:
raise ValueError("datetime must include timezone offset")
return value
class _CalendarReadDayInput(BaseModel):
model_config = ConfigDict(extra="forbid")
mode: Literal["day"]
date: date
timezone: str = "Asia/Shanghai"
class _CalendarReadEventInput(BaseModel):
model_config = ConfigDict(extra="forbid")
mode: Literal["event"]
event_id: UUID
_CalendarReadInput = Annotated[
_CalendarReadDayInput | _CalendarReadRangeInput | _CalendarReadEventInput,
Field(discriminator="mode"),
]
_CALENDAR_READ_INPUT_ADAPTER = TypeAdapter(_CalendarReadInput)
class _CalendarInviteeInput(BaseModel):
model_config = ConfigDict(extra="forbid")
phone: str
class _CalendarPermissionsInput(BaseModel):
model_config = ConfigDict(extra="forbid")
view: bool = True
edit: bool = False
invite: bool = False
class _CalendarInviteSubscriberInput(BaseModel):
model_config = ConfigDict(extra="forbid")
event_id: UUID
invitee: _CalendarInviteeInput
permissions: _CalendarPermissionsInput = Field(default_factory=_CalendarPermissionsInput)
class _CalendarCreateEventInput(BaseModel):
model_config = ConfigDict(extra="forbid")
title: str = Field(min_length=1, max_length=255)
start_at: datetime
end_at: datetime | None = None
timezone: str = Field(min_length=1, max_length=50)
description: str | None = Field(default=None, max_length=2000)
metadata: ScheduleItemMetadata | None = None
@field_validator("start_at", "end_at")
@classmethod
def _validate_create_datetimes(cls, value: datetime | None) -> datetime | None:
if value is not None and value.tzinfo is None:
raise ValueError("datetime must include timezone offset")
return value
class _CalendarUpdatePatchInput(BaseModel):
model_config = ConfigDict(extra="forbid")
title: str | None = Field(default=None, min_length=1, max_length=255)
description: str | None = Field(default=None, max_length=2000)
start_at: datetime | None = None
end_at: datetime | None = None
timezone: str | None = Field(default=None, min_length=1, max_length=50)
metadata: ScheduleItemMetadata | None = None
status: str | None = None
@field_validator("start_at", "end_at")
@classmethod
def _validate_patch_datetimes(cls, value: datetime | None) -> datetime | None:
if value is not None and value.tzinfo is None:
raise ValueError("datetime must include timezone offset")
return value
class _CalendarUpdateEventInput(BaseModel):
model_config = ConfigDict(extra="forbid")
event_id: UUID
patch: _CalendarUpdatePatchInput
class _CalendarInviteResponseInput(BaseModel):
model_config = ConfigDict(extra="forbid")
event_id: UUID
def _validate_action_input(
request: CliCommand,
validator: type[BaseModel] | TypeAdapter[Any],
) -> Any | CliCommandResult:
try:
if isinstance(validator, TypeAdapter):
return validator.validate_python(request.input)
return validator.model_validate(request.input)
except ValidationError as exc:
missing_fields: list[str] = []
invalid_fields: list[str] = []
for error in exc.errors():
location = error.get("loc") or ()
if not location:
continue
field_path = ".".join(str(part) for part in location)
error_type = str(error.get("type") or "")
if error_type == "missing":
missing_fields.append(field_path)
else:
invalid_fields.append(field_path)
details: dict[str, Any] = {
"missing_fields": sorted(set(missing_fields)),
"invalid_fields": sorted(set(invalid_fields)),
}
alias_corrections = _alias_corrections_for_input(request.input)
if alias_corrections:
details["alias_corrections"] = alias_corrections
message = "input does not match method schema"
return CliCommandResult(
ok=False,
module=request.module,
method=request.method,
error=ErrorInfo(
code="INVALID_ACTION_INPUT",
message=message,
retryable=False,
details=details,
),
)
def _alias_corrections_for_input(input_payload: dict[str, Any]) -> dict[str, str]:
alias_map = {
"start_time": "start_at",
"end_time": "end_at",
"event_timezone": "timezone",
}
corrections: dict[str, str] = {}
for alias, canonical in alias_map.items():
if alias in input_payload:
corrections[alias] = canonical
return corrections
async def handle_calendar_list_range(request: CliCommand) -> CliCommandResult:
from core.db.session import AsyncSessionLocal
parsed_start, parsed_end, read_error = _resolve_read_range(request)
if read_error is not None:
return _fail(request=request, code="INVALID_ARGUMENT", message=read_error)
if parsed_start is None or parsed_end is None:
return _fail(
request=request,
code="INVALID_ARGUMENT",
message="start_at and end_at are required",
)
validated = _validate_action_input(request, _CalendarReadRangeInput)
if isinstance(validated, CliCommandResult):
return validated
parsed_start = validated.start_at.astimezone(timezone.utc)
parsed_end = validated.end_at.astimezone(timezone.utc)
if parsed_start >= parsed_end:
return _fail(
request=request,
@@ -50,24 +210,75 @@ async def handle_calendar_read(request: CliCommand) -> CliCommandResult:
event_items = [schedule_event_to_dict(item) for item in items]
return CliCommandResult(
ok=True,
command="calendar",
subcommand="read",
module="calendar",
method=request.method,
data={"total": len(event_items), "items": event_items},
)
async def handle_calendar_create(request: CliCommand) -> CliCommandResult:
async def handle_calendar_list_day(request: CliCommand) -> CliCommandResult:
validated = _validate_action_input(request, _CALENDAR_READ_INPUT_ADAPTER)
if isinstance(validated, CliCommandResult):
return validated
if isinstance(validated, _CalendarReadEventInput):
return await handle_calendar_get_event(request)
if isinstance(validated, _CalendarReadRangeInput):
return await handle_calendar_list_range(request)
day_request = request.model_copy(
update={
"input": _day_input_to_range_input(validated),
}
)
return await handle_calendar_list_range(day_request)
async def handle_calendar_get_event(request: CliCommand) -> CliCommandResult:
from core.db.session import AsyncSessionLocal
validated = _validate_action_input(request, _CalendarReadEventInput)
if isinstance(validated, CliCommandResult):
return validated
event_id = validated.event_id
async with AsyncSessionLocal() as session:
service = create_schedule_service(session, UUID(request.owner_id))
try:
result_item = await _create_event(service, request.args)
item = await service.get_by_id(event_id)
return CliCommandResult(
ok=True,
module="calendar",
method=request.method,
data=schedule_event_to_dict(item),
)
except Exception as exc:
code, message, retryable = map_calendar_exception(exc)
return CliCommandResult(
ok=False,
module="calendar",
method=request.method,
error=ErrorInfo(code=code, message=message, retryable=retryable),
)
async def handle_calendar_create_event(request: CliCommand) -> CliCommandResult:
from core.db.session import AsyncSessionLocal
validated = _validate_action_input(request, _CalendarCreateEventInput)
if isinstance(validated, CliCommandResult):
return validated
async with AsyncSessionLocal() as session:
service = create_schedule_service(session, UUID(request.owner_id))
try:
result_item = await _create_event(service, validated)
event_id = str(result_item.get("eventId") or "")
return CliCommandResult(
ok=True,
command=request.command,
subcommand=request.subcommand,
module=request.module,
method=request.method,
data={
"status": "success",
"success": 1,
@@ -80,8 +291,8 @@ async def handle_calendar_create(request: CliCommand) -> CliCommandResult:
code, message, retryable = map_calendar_exception(exc)
return CliCommandResult(
ok=False,
command=request.command,
subcommand=request.subcommand,
module=request.module,
method=request.method,
data={
"status": "failure",
"success": 0,
@@ -89,7 +300,7 @@ async def handle_calendar_create(request: CliCommand) -> CliCommandResult:
"ids": [],
"results": [
{
"action": "create",
"action": request.method,
"status": "failure",
"eventId": "",
"code": code,
@@ -101,19 +312,23 @@ async def handle_calendar_create(request: CliCommand) -> CliCommandResult:
)
async def handle_calendar_update(request: CliCommand) -> CliCommandResult:
async def handle_calendar_update_event(request: CliCommand) -> CliCommandResult:
from core.db.session import AsyncSessionLocal
validated = _validate_action_input(request, _CalendarUpdateEventInput)
if isinstance(validated, CliCommandResult):
return validated
async with AsyncSessionLocal() as session:
service = create_schedule_service(session, UUID(request.owner_id))
event_id = str(request.args.get("event_id") or "").strip()
event_id = str(validated.event_id)
try:
result_item = await _update_event(service, request.args)
result_item = await _update_event(service, validated)
event_id = str(result_item.get("eventId") or event_id)
return CliCommandResult(
ok=True,
command=request.command,
subcommand=request.subcommand,
module=request.module,
method=request.method,
data={
"status": "success",
"success": 1,
@@ -126,8 +341,8 @@ async def handle_calendar_update(request: CliCommand) -> CliCommandResult:
code, message, retryable = map_calendar_exception(exc)
return CliCommandResult(
ok=False,
command=request.command,
subcommand=request.subcommand,
module=request.module,
method=request.method,
data={
"status": "failure",
"success": 0,
@@ -135,7 +350,7 @@ async def handle_calendar_update(request: CliCommand) -> CliCommandResult:
"ids": [],
"results": [
{
"action": "update",
"action": request.method,
"status": "failure",
"eventId": event_id,
"code": code,
@@ -147,24 +362,22 @@ async def handle_calendar_update(request: CliCommand) -> CliCommandResult:
)
async def handle_calendar_delete(request: CliCommand) -> CliCommandResult:
async def handle_calendar_delete_event(request: CliCommand) -> CliCommandResult:
from core.db.session import AsyncSessionLocal
validated = _validate_action_input(request, _CalendarReadEventInput)
if isinstance(validated, CliCommandResult):
return validated
async with AsyncSessionLocal() as session:
service = create_schedule_service(session, UUID(request.owner_id))
event_id = str(request.args.get("event_id") or "").strip()
if not event_id:
return _fail(
request=request,
code="INVALID_ARGUMENT",
message="event_id is required",
)
event_id = str(validated.event_id)
try:
await service.delete(UUID(event_id))
return CliCommandResult(
ok=True,
command=request.command,
subcommand=request.subcommand,
module=request.module,
method=request.method,
data={
"status": "success",
"success": 1,
@@ -172,7 +385,7 @@ async def handle_calendar_delete(request: CliCommand) -> CliCommandResult:
"ids": [event_id],
"results": [
{
"action": "delete",
"action": request.method,
"status": "success",
"eventId": event_id,
}
@@ -183,8 +396,8 @@ async def handle_calendar_delete(request: CliCommand) -> CliCommandResult:
code, message, retryable = map_calendar_exception(exc)
return CliCommandResult(
ok=False,
command=request.command,
subcommand=request.subcommand,
module=request.module,
method=request.method,
data={
"status": "failure",
"success": 0,
@@ -192,7 +405,7 @@ async def handle_calendar_delete(request: CliCommand) -> CliCommandResult:
"ids": [],
"results": [
{
"action": "delete",
"action": request.method,
"status": "failure",
"eventId": event_id,
"code": code,
@@ -204,155 +417,199 @@ async def handle_calendar_delete(request: CliCommand) -> CliCommandResult:
)
async def handle_calendar_share(request: CliCommand) -> CliCommandResult:
async def handle_calendar_invite_subscriber(request: CliCommand) -> CliCommandResult:
from core.db.session import AsyncSessionLocal
event_id = str(request.args.get("event_id", ""))
invitees = request.args.get("invitees")
if not isinstance(invitees, list):
invitees = []
validated = _validate_action_input(request, _CalendarInviteSubscriberInput)
if isinstance(validated, CliCommandResult):
return validated
event_id = str(validated.event_id)
async with AsyncSessionLocal() as session:
service = create_schedule_service(session, UUID(request.owner_id))
target_uuid = UUID(event_id)
invited: list[str] = []
result_items: list[dict[str, str]] = []
for inv in invitees:
raw_phone = inv.get("phone", "").strip()
normalized_phone = _normalize_phone(raw_phone)
if not normalized_phone:
result_items.append(
{
"phone": raw_phone,
"status": "failure",
"code": "INVALID_ARGUMENT",
"message": "invalid phone",
}
)
continue
permission = {
"permission_view": inv.get("permission_view", True),
"permission_edit": inv.get("permission_edit", False),
"permission_invite": inv.get("permission_invite", False),
}
try:
await service.share(
target_uuid,
ScheduleItemShareRequest(phone=normalized_phone, **permission),
)
invited.append(normalized_phone)
result_items.append({"phone": normalized_phone, "status": "success"})
except Exception as exc:
code, message, _ = map_calendar_exception(exc)
result_items.append(
{
"phone": normalized_phone,
"status": "failure",
"code": code,
"message": message,
}
)
failure_count = len([r for r in result_items if r["status"] == "failure"])
success_count = len(invited)
status = _batch_status(success_count, failure_count)
return CliCommandResult(
ok=status != "failure",
command=request.command,
subcommand=request.subcommand,
data={
"status": status,
"success": success_count,
"failed": failure_count,
"results": result_items,
},
)
raw_phone = validated.invitee.phone.strip()
normalized_phone = _normalize_phone(raw_phone)
if not normalized_phone:
return CliCommandResult(
ok=False,
module=request.module,
method=request.method,
data={
"status": "failure",
"success": 0,
"failed": 1,
"results": [
{
"phone": raw_phone,
"status": "failure",
"code": "INVALID_ACTION_INPUT",
"message": "invalid phone",
}
],
},
error=ErrorInfo(code="INVALID_ACTION_INPUT", message="invalid phone", retryable=False),
)
try:
await service.share(
target_uuid,
ScheduleItemShareRequest(
phone=normalized_phone,
permission_view=validated.permissions.view,
permission_edit=validated.permissions.edit,
permission_invite=validated.permissions.invite,
),
)
return CliCommandResult(
ok=True,
module=request.module,
method=request.method,
data={
"status": "success",
"success": 1,
"failed": 0,
"results": [{"phone": normalized_phone, "status": "success"}],
},
)
except Exception as exc:
code, message, retryable = map_calendar_exception(exc)
return CliCommandResult(
ok=False,
module=request.module,
method=request.method,
data={
"status": "failure",
"success": 0,
"failed": 1,
"results": [
{
"phone": normalized_phone,
"status": "failure",
"code": code,
"message": message,
}
],
},
error=ErrorInfo(code=code, message=message, retryable=retryable),
)
async def _create_event(service: Any, args: dict[str, Any]) -> dict[str, Any]:
start_at = args.get("start_at")
if not isinstance(start_at, str) or not start_at.strip():
raise ValueError("create requires start_at")
event_timezone = args.get("event_timezone")
if not isinstance(event_timezone, str) or not event_timezone.strip():
raise ValueError("create requires event_timezone")
parsed_start = parse_iso_datetime(start_at)
if parsed_start is None:
raise ValueError("invalid start_at")
async def handle_calendar_accept_invite(request: CliCommand) -> CliCommandResult:
from core.db.session import AsyncSessionLocal
parsed_end = None
end_at = args.get("end_at")
if isinstance(end_at, str) and end_at.strip():
parsed_end = parse_iso_datetime(end_at)
if parsed_end is None:
raise ValueError("invalid end_at")
validated = _validate_action_input(request, _CalendarInviteResponseInput)
if isinstance(validated, CliCommandResult):
return validated
event_id = str(validated.event_id)
async with AsyncSessionLocal() as session:
service = create_schedule_service(session, UUID(request.owner_id))
try:
result = await service.accept_subscription(UUID(event_id))
return CliCommandResult(ok=True, module=request.module, method=request.method, data=result)
except Exception as exc:
code, message, retryable = map_calendar_exception(exc)
return CliCommandResult(
ok=False,
module=request.module,
method=request.method,
error=ErrorInfo(code=code, message=message, retryable=retryable),
)
async def handle_calendar_reject_invite(request: CliCommand) -> CliCommandResult:
from core.db.session import AsyncSessionLocal
validated = _validate_action_input(request, _CalendarInviteResponseInput)
if isinstance(validated, CliCommandResult):
return validated
event_id = str(validated.event_id)
async with AsyncSessionLocal() as session:
service = create_schedule_service(session, UUID(request.owner_id))
try:
result = await service.reject_subscription(UUID(event_id))
return CliCommandResult(ok=True, module=request.module, method=request.method, data=result)
except Exception as exc:
code, message, retryable = map_calendar_exception(exc)
return CliCommandResult(
ok=False,
module=request.module,
method=request.method,
error=ErrorInfo(code=code, message=message, retryable=retryable),
)
async def _create_event(service: Any, input_payload: _CalendarCreateEventInput) -> dict[str, Any]:
parsed_start = input_payload.start_at.astimezone(timezone.utc)
parsed_end = (
input_payload.end_at.astimezone(timezone.utc)
if input_payload.end_at is not None
else None
)
created = await service.create_agent_generated(
ScheduleItemCreateRequest(
title=str(args.get("title") or "new event").strip(),
description=(str(args.get("description") or "").strip() or None),
title=input_payload.title.strip(),
description=(input_payload.description.strip() if input_payload.description else None),
start_at=parsed_start,
end_at=parsed_end,
timezone=event_timezone.strip(),
metadata=build_schedule_metadata(
args.get("location"),
args.get("color"),
args.get("reminder_minutes"),
),
timezone=input_payload.timezone.strip(),
metadata=input_payload.metadata,
)
)
return {"action": "create", "status": "success", "eventId": str(created.id)}
async def _update_event(service: Any, args: dict[str, Any]) -> dict[str, Any]:
event_id = args.get("event_id")
if not isinstance(event_id, str) or not event_id.strip():
raise ValueError("update requires event_id")
async def _update_event(service: Any, input_payload: _CalendarUpdateEventInput) -> dict[str, Any]:
event_id = str(input_payload.event_id)
patch = input_payload.patch.model_dump(exclude_unset=True)
update_data: dict[str, Any] = {}
if "title" in args:
update_data["title"] = str(args.get("title") or "").strip()
if "description" in args:
update_data["description"] = str(args.get("description") or "").strip()
if "start_at" in args:
start_value = args.get("start_at")
if not isinstance(start_value, str) or not start_value.strip():
raise ValueError("start_at must be non-empty string")
parsed_start = parse_iso_datetime(start_value)
if parsed_start is None:
raise ValueError("invalid start_at")
update_data["start_at"] = parsed_start
if "end_at" in args:
end_value = args.get("end_at")
if "title" in patch:
update_data["title"] = str(patch.get("title") or "").strip()
if "description" in patch:
update_data["description"] = str(patch.get("description") or "").strip()
if "start_at" in patch:
start_value = patch.get("start_at")
if not isinstance(start_value, datetime):
raise ValueError("start_at must be datetime with timezone")
update_data["start_at"] = start_value.astimezone(timezone.utc)
if "end_at" in patch:
end_value = patch.get("end_at")
if end_value in (None, ""):
update_data["end_at"] = None
elif isinstance(end_value, str):
parsed_end = parse_iso_datetime(end_value)
if parsed_end is None:
raise ValueError("invalid end_at")
update_data["end_at"] = parsed_end
elif isinstance(end_value, datetime):
update_data["end_at"] = end_value.astimezone(timezone.utc)
else:
raise ValueError("end_at must be string or null")
if "event_timezone" in args:
timezone_value = args.get("event_timezone")
raise ValueError("end_at must be datetime with timezone or null")
if "timezone" in patch:
timezone_value = patch.get("timezone")
if not isinstance(timezone_value, str) or not timezone_value.strip():
raise ValueError("event_timezone must be non-empty string")
raise ValueError("timezone must be non-empty string")
update_data["timezone"] = timezone_value.strip()
if "status" in args:
update_data["status"] = ScheduleItemStatus(str(args.get("status")))
if "status" in patch:
update_data["status"] = ScheduleItemStatus(str(patch.get("status")))
if any(key in args for key in ("location", "color", "reminder_minutes")):
if "metadata" in patch:
existing = await service.get_by_id(UUID(event_id))
update_data["metadata"] = merge_schedule_metadata_for_update(
existing_metadata=existing.metadata,
location=args.get("location"),
color=args.get("color"),
reminder_minutes=args.get("reminder_minutes"),
)
metadata_payload = patch.get("metadata")
if metadata_payload is None:
update_data["metadata"] = ScheduleItemMetadata.model_validate({})
else:
metadata_dict = (
metadata_payload.model_dump() if isinstance(metadata_payload, ScheduleItemMetadata) else metadata_payload
)
update_data["metadata"] = ScheduleItemMetadata.model_validate(
{
**(existing.metadata.model_dump() if existing.metadata else {}),
**metadata_dict,
}
)
if not update_data:
raise ValueError("update requires at least one mutable field")
raise ValueError("patch requires at least one mutable field")
changed_fields = sorted(update_data.keys())
updated = await service.update(
@@ -395,55 +652,34 @@ def _batch_status(success: int, failed: int) -> str:
return "partial"
def _resolve_read_range(
request: CliCommand,
) -> tuple[datetime | None, datetime | None, str | None]:
start_at = str(request.args.get("start_at", "")).strip()
end_at = str(request.args.get("end_at", "")).strip()
if start_at and end_at:
try:
return parse_iso_datetime(start_at), parse_iso_datetime(end_at), None
except ValueError as exc:
return None, None, str(exc)
raw_date = str(request.args.get("date", "")).strip()
if not raw_date:
return None, None, None
timezone_name = (
str(request.args.get("timezone", "Asia/Shanghai")).strip() or "Asia/Shanghai"
)
def _day_input_to_range_input(input_payload: _CalendarReadDayInput) -> dict[str, str]:
timezone_name = input_payload.timezone.strip() or "Asia/Shanghai"
try:
zone = ZoneInfo(timezone_name)
except Exception:
return None, None, "timezone is invalid"
try:
target_date = date.fromisoformat(raw_date)
except ValueError:
return None, None, "date must be YYYY-MM-DD"
except Exception as exc:
raise ValueError("timezone is invalid") from exc
start_local = datetime(
year=target_date.year,
month=target_date.month,
day=target_date.day,
year=input_payload.date.year,
month=input_payload.date.month,
day=input_payload.date.day,
hour=0,
minute=0,
second=0,
tzinfo=zone,
)
end_local = start_local + timedelta(days=1)
return (
parse_iso_datetime(start_local.isoformat()),
parse_iso_datetime(end_local.isoformat()),
None,
)
return {
"mode": "range",
"start_at": start_local.isoformat(),
"end_at": end_local.isoformat(),
}
def _fail(*, request: CliCommand, code: str, message: str) -> CliCommandResult:
return CliCommandResult(
ok=False,
command=request.command,
subcommand=request.subcommand,
module=request.module,
method=request.method,
error=ErrorInfo(code=code, message=message, retryable=False),
)
@@ -20,8 +20,8 @@ async def handle_contacts_read(request: CliCommand) -> CliCommandResult:
contacts = await _list_friend_contacts(session=session, owner_id=UUID(request.owner_id))
return CliCommandResult(
ok=True,
command=request.command,
subcommand=request.subcommand,
module=request.module,
method=request.method,
data={
"friends_count": len(contacts),
"friends": contacts,
@@ -17,11 +17,15 @@ from schemas.domain.memory_content import UserMemoryContent, WorkProfileContent
async def handle_memory_update(request: CliCommand) -> CliCommandResult:
from core.db.session import AsyncSessionLocal
operations = request.args.get("operations")
operations = request.input.get("operations")
if not isinstance(operations, list) or not operations:
return _invalid_argument(
request=request,
message="operations must be a non-empty list",
details={
"required_fields": ["operations"],
"field_types": {"operations": "array of objects"},
},
)
async with AsyncSessionLocal() as session:
@@ -135,8 +139,8 @@ async def handle_memory_update(request: CliCommand) -> CliCommandResult:
return CliCommandResult(
ok=status != "failure",
command=request.command,
subcommand=request.subcommand,
module=request.module,
method=request.method,
data={
"status": status,
"success": success_count,
@@ -233,12 +237,22 @@ async def _apply_delete_operation(
}
def _invalid_argument(*, request: CliCommand, message: str) -> CliCommandResult:
def _invalid_argument(
*,
request: CliCommand,
message: str,
details: dict[str, Any] | None,
) -> CliCommandResult:
return CliCommandResult(
ok=False,
command=request.command,
subcommand=request.subcommand,
error=ErrorInfo(code="INVALID_ARGUMENT", message=message, retryable=False),
module=request.module,
method=request.method,
error=ErrorInfo(
code="INVALID_ARGUMENT",
message=message,
retryable=False,
details=details,
),
)
@@ -1,11 +1,13 @@
from __future__ import annotations
from core.agentscope.tools.cli.handler_calendar import (
handle_calendar_create,
handle_calendar_delete,
handle_calendar_read,
handle_calendar_share,
handle_calendar_update,
handle_calendar_accept_invite,
handle_calendar_create_event,
handle_calendar_delete_event,
handle_calendar_invite_subscriber,
handle_calendar_list_day,
handle_calendar_reject_invite,
handle_calendar_update_event,
)
from core.agentscope.tools.cli.handler_contacts import handle_contacts_read
from core.agentscope.tools.cli.handler_memory import handle_memory_update
@@ -14,11 +16,13 @@ from core.agentscope.tools.cli.router import CommandRouter
def build_router() -> CommandRouter:
router = CommandRouter()
router.register(command="calendar", subcommand="create", handler=handle_calendar_create)
router.register(command="calendar", subcommand="read", handler=handle_calendar_read)
router.register(command="calendar", subcommand="update", handler=handle_calendar_update)
router.register(command="calendar", subcommand="delete", handler=handle_calendar_delete)
router.register(command="calendar", subcommand="share", handler=handle_calendar_share)
router.register(command="contacts", subcommand="read", handler=handle_contacts_read)
router.register(command="memory", subcommand="update", handler=handle_memory_update)
router.register(module="calendar", method="read", handler=handle_calendar_list_day)
router.register(module="calendar", method="create", handler=handle_calendar_create_event)
router.register(module="calendar", method="update", handler=handle_calendar_update_event)
router.register(module="calendar", method="delete", handler=handle_calendar_delete_event)
router.register(module="calendar", method="share", handler=handle_calendar_invite_subscriber)
router.register(module="calendar", method="accept_invite", handler=handle_calendar_accept_invite)
router.register(module="calendar", method="reject_invite", handler=handle_calendar_reject_invite)
router.register(module="contacts", method="read", handler=handle_contacts_read)
router.register(module="memory", method="update", handler=handle_memory_update)
return router
@@ -10,9 +10,9 @@ from schemas.agent.runtime_models import ErrorInfo
class CliCommand(BaseModel):
model_config = ConfigDict(extra="forbid")
command: str
subcommand: str
args: dict[str, Any] = Field(default_factory=dict)
module: str
method: str
input: dict[str, Any] = Field(default_factory=dict)
owner_id: str
@@ -20,7 +20,7 @@ class CliCommandResult(BaseModel):
model_config = ConfigDict(extra="forbid")
ok: bool
command: str
subcommand: str
module: str
method: str
data: Any = None
error: ErrorInfo | None = None
+24 -24
View File
@@ -17,30 +17,30 @@ class CommandRouter:
def __init__(self) -> None:
self._handlers: dict[tuple[str, str], CliHandler] = {}
def register(self, *, command: str, subcommand: str, handler: CliHandler) -> None:
key = (command, subcommand)
def register(self, *, module: str, method: str, handler: CliHandler) -> None:
key = (module, method)
if key in self._handlers:
raise ValueError(f"command already registered: {command} {subcommand}")
raise ValueError(f"method already registered: {module} {method}")
self._handlers[key] = handler
@property
def commands(self) -> set[str]:
return {command for command, _ in self._handlers.keys()}
def modules(self) -> set[str]:
return {module for module, _ in self._handlers.keys()}
@property
def command_pairs(self) -> set[tuple[str, str]]:
def module_methods(self) -> set[tuple[str, str]]:
return set(self._handlers.keys())
async def dispatch(self, request: CliCommand) -> CliCommandResult:
handler = self._handlers.get((request.command, request.subcommand))
handler = self._handlers.get((request.module, request.method))
if handler is None:
return CliCommandResult(
ok=False,
command=request.command,
subcommand=request.subcommand,
module=request.module,
method=request.method,
error=ErrorInfo(
code="UNKNOWN_COMMAND",
message=f"unknown command: {request.command} {request.subcommand}",
code="UNKNOWN_METHOD",
message=f"unknown method: {request.module} {request.method}",
retryable=False,
),
)
@@ -49,14 +49,14 @@ class CommandRouter:
except Exception as exc:
logger.error(
"CLI handler failed",
command=request.command,
subcommand=request.subcommand,
module=request.module,
method=request.method,
error=str(exc),
)
return CliCommandResult(
ok=False,
command=request.command,
subcommand=request.subcommand,
module=request.module,
method=request.method,
error=ErrorInfo(
code="HANDLER_ERROR",
message=str(exc),
@@ -75,11 +75,11 @@ async def cli_main(argv: list[str] | None = None) -> None:
_write_output(
CliCommandResult(
ok=False,
command=argv[0] if argv else "",
subcommand=argv[1] if len(argv) > 1 else "",
module=argv[0] if argv else "",
method=argv[1] if len(argv) > 1 else "",
error=ErrorInfo(
code="MISSING_COMMAND",
message="command and subcommand are required",
code="MISSING_METHOD",
message="module and method are required",
retryable=False,
),
)
@@ -94,17 +94,17 @@ async def cli_main(argv: list[str] | None = None) -> None:
_write_output(
CliCommandResult(
ok=False,
command=argv[0],
subcommand=argv[1],
module=argv[0],
method=argv[1],
error=ErrorInfo(
code="INVALID_ARGS",
message="args must be valid JSON",
code="INVALID_INPUT",
message="input must be valid JSON",
retryable=False,
),
)
)
sys.exit(1)
request = CliCommand(command=argv[0], subcommand=argv[1], args=args, owner_id=str(args.get("owner_id", "")))
request = CliCommand(module=argv[0], method=argv[1], input=args, owner_id=str(args.get("owner_id", "")))
result = await router.dispatch(request)
_write_output(result)
if not result.ok:
@@ -9,16 +9,19 @@ from core.agentscope.tools.cli import invoke_cli_tool
PROJECT_CLI_TOOL_NAME = "project_cli"
def make_project_cli_wrapper(*, allowed_commands: set[str]) -> Any:
def make_project_cli_wrapper(
*,
allowed_commands: set[str],
) -> Any:
async def _project_cli(
command: str,
subcommand: str,
args: dict[str, Any] | None = None,
module: str,
method: str,
input: dict[str, Any],
) -> ToolResponse:
tool_call_args = {
"command": command,
"subcommand": subcommand,
"args": args or {},
"module": module,
"method": method,
"input": input,
}
return await invoke_cli_tool(
tool_name=PROJECT_CLI_TOOL_NAME,
@@ -27,12 +30,14 @@ def make_project_cli_wrapper(*, allowed_commands: set[str]) -> Any:
)
_project_cli.__name__ = PROJECT_CLI_TOOL_NAME
_project_cli.__doc__ = """Execute CLI commands for calendar, contacts, and memory operations.
_project_cli.__doc__ = """Execute business methods for enabled modules (calendar, contacts, memory, etc.).
You MUST read the relevant skill file via view_skill_file before calling this tool to learn the correct method names and input shapes for each module. Do not guess input fields.
Args:
command: The command to execute (calendar, contacts, memory).
subcommand: The subcommand for the operation (calendar: create/read/update/delete/share; contacts: read; memory: update).
args: Arguments for the command as a JSON object.
module: Business module namespace (e.g., calendar, contacts, memory).
method: Module method to execute. Valid methods are listed in each module's skill file.
input: Method-specific input object. Shape depends on module and method -- read the skill file first.
Returns:
ToolResponse with the command result.
@@ -6,11 +6,23 @@ from typing import Any
from agentscope.message import TextBlock
from agentscope.tool import ToolResponse
from core.agentscope.tools.skill_session import SkillSessionState
from core.agentscope.tools.tool_call_context import (
get_current_tool_call_id,
store_tool_agent_output,
)
from core.agentscope.utils.parsing import project_tool_result_text
from schemas.agent.runtime_models import ErrorInfo, ToolAgentOutput, ToolStatus
SKILLS_DIR = Path(__file__).parent.parent / "skills"
VIEW_SKILL_FILE_TOOL_NAME = "view_skill_file"
def make_view_skill_file_wrapper(*, enabled_skill_names: set[str]) -> Any:
def make_view_skill_file_wrapper(
*,
enabled_skill_names: set[str],
skill_session: SkillSessionState,
) -> Any:
skills_root = SKILLS_DIR.resolve()
async def _view_skill_file(
@@ -23,13 +35,20 @@ def make_view_skill_file_wrapper(*, enabled_skill_names: set[str]) -> Any:
parts = normalized.split("/")
if not parts:
return _error_response("INVALID_PATH", "file_path cannot be empty")
return _error_response(
file_path=file_path,
ranges=ranges,
code="INVALID_PATH",
message="file_path cannot be empty",
)
skill_name = parts[0]
if skill_name not in enabled_skill_names:
return _error_response(
"ACCESS_DENIED",
f"skill '{skill_name}' is not enabled. Enabled skills: {sorted(enabled_skill_names)}",
file_path=file_path,
ranges=ranges,
code="ACCESS_DENIED",
message=f"skill '{skill_name}' is not enabled. Enabled skills: {sorted(enabled_skill_names)}",
)
target_path = skills_root / normalized
@@ -37,15 +56,30 @@ def make_view_skill_file_wrapper(*, enabled_skill_names: set[str]) -> Any:
target_path = target_path.resolve()
target_path.relative_to(skills_root)
except Exception:
return _error_response("ACCESS_DENIED", "access denied: path outside skills directory")
return _error_response(
file_path=file_path,
ranges=ranges,
code="ACCESS_DENIED",
message="access denied: path outside skills directory",
)
if not target_path.exists() or not target_path.is_file():
return _error_response("FILE_NOT_FOUND", f"file not found: {file_path}")
return _error_response(
file_path=file_path,
ranges=ranges,
code="FILE_NOT_FOUND",
message=f"file not found: {file_path}",
)
try:
content = target_path.read_text(encoding="utf-8")
except Exception as exc:
return _error_response("READ_ERROR", f"failed to read file: {exc}")
return _error_response(
file_path=file_path,
ranges=ranges,
code="READ_ERROR",
message=f"failed to read file: {exc}",
)
lines = content.splitlines()
if ranges and len(ranges) >= 2:
@@ -54,6 +88,17 @@ def make_view_skill_file_wrapper(*, enabled_skill_names: set[str]) -> Any:
lines = lines[start - 1 : end]
text = "\n".join(lines)
skill_session.mark_read(skill_name=skill_name)
tool_call_id = get_current_tool_call_id(tool_name=VIEW_SKILL_FILE_TOOL_NAME)
payload = ToolAgentOutput(
tool_name=VIEW_SKILL_FILE_TOOL_NAME,
tool_call_id=tool_call_id,
tool_call_args={"file_path": normalized, "ranges": ranges},
status=ToolStatus.SUCCESS,
result={"file_path": normalized, "content": text},
).model_dump(mode="json", exclude_none=True)
store_tool_agent_output(tool_call_id=tool_call_id, payload=payload)
return ToolResponse(
content=[
@@ -78,14 +123,30 @@ Returns:
ToolResponse with the file content.
"""
return _view_skill_file
def _error_response(code: str, message: str) -> ToolResponse:
def _error_response(
*,
file_path: str,
ranges: list[int] | None,
code: str,
message: str,
) -> ToolResponse:
tool_call_id = get_current_tool_call_id(tool_name=VIEW_SKILL_FILE_TOOL_NAME)
payload = ToolAgentOutput(
tool_name=VIEW_SKILL_FILE_TOOL_NAME,
tool_call_id=tool_call_id,
tool_call_args={"file_path": file_path, "ranges": ranges},
status=ToolStatus.FAILURE,
result={"status": "failure", "code": code, "message": message},
error=ErrorInfo(code=code, message=message, retryable=False),
).model_dump(mode="json", exclude_none=True)
store_tool_agent_output(tool_call_id=tool_call_id, payload=payload)
return ToolResponse(
content=[
TextBlock(
type="text",
text=f"error: {code} - {message}",
text=project_tool_result_text(
{"status": "failure", "code": code, "message": message}
),
)
]
)
@@ -0,0 +1,15 @@
from __future__ import annotations
AGENT_SKILL_INSTRUCTION = """# Agent Skills
The entries below are skill indexes, not full execution instructions.
Before the first `project_cli` call for a skill in a run, you MUST read that skill's `SKILL.md` with `view_skill_file`.
Use the exact relative `file_path` shown below.
If the skill index tells you to inspect one method card, read that file with `view_skill_file` before calling `project_cli`.
Do not guess skill instructions from the summary alone.
"""
AGENT_SKILL_TEMPLATE = """## {name}
{description}
Read with `view_skill_file` using `file_path="{name}/SKILL.md"` before using `project_cli` for this skill."""
@@ -0,0 +1,16 @@
from __future__ import annotations
from dataclasses import dataclass, field
@dataclass
class SkillSessionState:
read_skill_names: set[str] = field(default_factory=set)
def mark_read(self, *, skill_name: str) -> None:
normalized = skill_name.strip()
if normalized:
self.read_skill_names.add(normalized)
def has_read(self, *, skill_name: str) -> bool:
return skill_name.strip() in self.read_skill_names
@@ -1,121 +1,128 @@
---
name: calendar
description: Calendar event management - read, create, update, delete, and share events.
description: Calendar event management via project_cli.
---
# Calendar Skill
## Execution Protocol
Read this file before the first calendar tool call in a run, then call `project_cli` with the correct `module`, `method`, and `input`.
1. On first calendar use in a run, call `view_skill_file` with `calendar/SKILL.md` before any `project_cli` call.
2. After reading, use `project_cli` only with `command="calendar"`.
3. If the user asks for actual schedule data, use `project_cli` to verify it. Do not guess results.
## Method: read
## When to Use
All calendar queries use `method="read"`. The `input` must contain `mode` plus mode-specific fields.
- User asks about their schedule or upcoming events
- User wants to create, update, or delete calendar events
- User wants to share a calendar event with someone
- User asks about event details within a date range
## Available Tool
Use the single tool `project_cli`.
Read this file first with `view_skill_file` when calendar is the relevant skill.
### Read Events
Call `project_cli` with:
### Query one day (today, tomorrow, a specific date)
```json
{
"command": "calendar",
"subcommand": "read",
"args": {
"start_at": "2026-04-21T00:00:00+08:00",
"end_at": "2026-04-22T00:00:00+08:00"
"module": "calendar",
"method": "read",
"input": {
"mode": "day",
"date": "YYYY-MM-DD",
"timezone": "Area/Zone"
}
}
```
Use this whenever the user asks what is scheduled, free, upcoming, or happening in a time range.
To resolve "today" or relative dates: extract the date part (before the T) from `system_time_local` in USER_CONTEXT_JSON. Use `timezone_effective` for timezone.
### Create Event
Call `project_cli` with:
### Query a time range
```json
{
"command": "calendar",
"subcommand": "create",
"args": {
"title": "Project sync",
"start_at": "2026-04-21T10:00:00+08:00",
"end_at": "2026-04-21T11:00:00+08:00",
"event_timezone": "Asia/Shanghai"
"module": "calendar",
"method": "read",
"input": {
"mode": "range",
"start_at": "2026-04-24T09:00:00+08:00",
"end_at": "2026-04-24T18:00:00+08:00"
}
}
```
### Update Event
Call `project_cli` with:
### Query a known event by ID
```json
{
"command": "calendar",
"subcommand": "update",
"args": {
"event_id": "<uuid>",
"title": "Updated title"
"module": "calendar",
"method": "read",
"input": {
"mode": "event",
"event_id": "550e8400-e29b-41d4-a716-446655440000"
}
}
```
### Delete Event
Call `project_cli` with:
## Method: create
```json
{
"command": "calendar",
"subcommand": "delete",
"args": {
"event_id": "<uuid>"
"module": "calendar",
"method": "create",
"input": {
"title": "Meeting title",
"start_at": "2026-04-24T10:00:00+08:00",
"end_at": "2026-04-24T11:00:00+08:00",
"timezone": "Asia/Shanghai"
}
}
```
Read first if you need to confirm the write payload shape instead of relying on memory.
### Share Events
Call `project_cli` with:
## Method: update
```json
{
"command": "calendar",
"subcommand": "share",
"args": {
"event_id": "<uuid>",
"invitees": []
"module": "calendar",
"method": "update",
"input": {
"event_id": "UUID",
"patch": { "title": "New title" }
}
}
```
## Composition Patterns
## Method: delete
1. To share an event with a friend:
- Call `view_skill_file` with `contacts/SKILL.md` if contacts instructions have not been read in this run
- Call `project_cli` `contacts read` to find friend phone numbers
- Call `project_cli` `calendar share` with the selected phone
```json
{
"module": "calendar",
"method": "delete",
"input": { "event_id": "UUID" }
}
```
2. To update a specific event:
- Call `project_cli` `calendar read` to find the event_id
- Call `project_cli` `calendar update` with target fields
## Method: share
## Failure Recovery
```json
{
"module": "calendar",
"method": "share",
"input": {
"event_id": "UUID",
"invitee": { "phone": "+8613800138000" }
}
}
```
- If `calendar create/update/delete` returns failure, report why and suggest retrying with corrected parameters.
- If `calendar share` fails for a phone, suggest verifying the phone number with `contacts read`.
## Methods: accept_invite, reject_invite
```json
{
"module": "calendar",
"method": "accept_invite",
"input": { "event_id": "UUID" }
}
```
## Rules
- Always fill `input` with all required fields. Never pass `input: {}`.
- Use `timezone_effective` from USER_CONTEXT_JSON as the default timezone.
- Resolve relative dates (today, tomorrow) to concrete YYYY-MM-DD from `system_time_local` in USER_CONTEXT_JSON before calling.
- Do not use old field names: command, subcommand, args, start_time, end_time, event_timezone.
## Composition
- To share an event but you only have a person description: read `contacts/SKILL.md` first, find the phone number, then call share.
- To update/delete an ambiguous event: call read first to list candidates, then call the mutation.
@@ -0,0 +1,22 @@
# accept_invite
## Input Schema
- `input.event_id`: required, `string`, UUID
## Output Shape
- success: subscription response object
- failure: `error.code`, `error.message`, `error.details`
Use when accepting a shared event invitation.
```json
{
"skill": "calendar",
"action": "accept_invite",
"input": {
"event_id": "550e8400-e29b-41d4-a716-446655440000"
}
}
```
@@ -0,0 +1,36 @@
# create_event
Use when creating a new event.
## Input Schema
- `input.title`: required, `string`
- `input.start_at`: required, `string`, ISO 8601 datetime
- `input.timezone`: required, `string`, IANA timezone
- `input.end_at`: optional, `string | null`, ISO 8601 datetime
- `input.description`: optional, `string | null`
- `input.metadata`: optional, `object | null`
## Output Shape
- success: `data.status`, `data.success`, `data.failed`, `data.ids`, `data.results`
- failure: `error.code`, `error.message`, `error.details`
```json
{
"skill": "calendar",
"action": "create_event",
"input": {
"title": "Project sync",
"start_at": "2026-04-23T10:00:00+08:00",
"end_at": "2026-04-23T11:00:00+08:00",
"timezone": "Asia/Shanghai",
"description": "Weekly planning"
}
}
```
## Rules
- Use `timezone`, not `event_timezone`.
- Use `start_at` and `end_at`, not `start_time` or `end_time`.
@@ -0,0 +1,22 @@
# delete_event
## Input Schema
- `input.event_id`: required, `string`, UUID
## Output Shape
- success: `data.status`, `data.success`, `data.failed`, `data.ids`, `data.results`
- failure: `error.code`, `error.message`, `error.details`
Use when deleting one known event.
```json
{
"skill": "calendar",
"action": "delete_event",
"input": {
"event_id": "550e8400-e29b-41d4-a716-446655440000"
}
}
```
@@ -0,0 +1,26 @@
# get_event
Use when the user already knows the target event identity.
## Input Schema
- `input.event_id`: required, `string`, UUID
## Output Shape
- success: `data.id`, `data.title`, `data.start_at`, `data.end_at`, ...
- failure: `error.code`, `error.message`, `error.details`
```json
{
"skill": "calendar",
"action": "get_event",
"input": {
"event_id": "550e8400-e29b-41d4-a716-446655440000"
}
}
```
## Rules
- Prefer this over list actions when an `event_id` is already available.
@@ -0,0 +1,40 @@
# invite_subscriber
Use when sharing an event with one phone number.
## Input Schema
- `input.event_id`: required, `string`, UUID
- `input.invitee`: required, `object`
- `input.invitee.phone`: required, `string`
- `input.permissions`: optional, `object`
- `input.permissions.view`: optional, `bool`
- `input.permissions.edit`: optional, `bool`
- `input.permissions.invite`: optional, `bool`
## Output Shape
- success: `data.status`, `data.success`, `data.failed`, `data.results`
- failure: `error.code`, `error.message`, `error.details`
```json
{
"skill": "calendar",
"action": "invite_subscriber",
"input": {
"event_id": "550e8400-e29b-41d4-a716-446655440000",
"invitee": {
"phone": "+8613800138000"
},
"permissions": {
"view": true,
"edit": false,
"invite": false
}
}
}
```
## Rules
- Look up the phone number with `contacts` first if needed.
@@ -0,0 +1,31 @@
# list_day
Use when the user asks about one calendar day in a local timezone.
## Input Schema
- `input.date`: required, `string`, format `YYYY-MM-DD`
- `input.timezone`: optional, `string`, IANA timezone like `Asia/Shanghai`
## Output Shape
- success: `data.total: int`, `data.items: array`
- failure: `error.code`, `error.message`, `error.details`
```json
{
"skill": "calendar",
"action": "list_day",
"input": {
"date": "2026-04-23",
"timezone": "Asia/Shanghai"
}
}
```
## Rules
- `input` must not be empty.
- `date` must be a concrete date string, not an empty object.
- For words like today or tomorrow, convert them to a concrete `YYYY-MM-DD` date from `system_time_local` before calling `project_cli`.
- Use `get_event` instead if you already have an `event_id`.
@@ -0,0 +1,29 @@
# list_range
Use when the user asks for a specific time range.
## Input Schema
- `input.start_at`: required, `string`, ISO 8601 datetime
- `input.end_at`: required, `string`, ISO 8601 datetime
## Output Shape
- success: `data.total: int`, `data.items: array`
- failure: `error.code`, `error.message`, `error.details`
```json
{
"skill": "calendar",
"action": "list_range",
"input": {
"start_at": "2026-04-23T09:00:00+08:00",
"end_at": "2026-04-23T18:00:00+08:00"
}
}
```
## Rules
- `start_at` and `end_at` must both be present.
- Do not send `event_id` to list actions.
@@ -0,0 +1,22 @@
# reject_invite
## Input Schema
- `input.event_id`: required, `string`, UUID
## Output Shape
- success: subscription response object
- failure: `error.code`, `error.message`, `error.details`
Use when rejecting a shared event invitation.
```json
{
"skill": "calendar",
"action": "reject_invite",
"input": {
"event_id": "550e8400-e29b-41d4-a716-446655440000"
}
}
```
@@ -0,0 +1,39 @@
# update_event
Use when changing one known event.
## Input Schema
- `input.event_id`: required, `string`, UUID
- `input.patch`: required, `object`
- `input.patch.title`: optional, `string`
- `input.patch.description`: optional, `string | null`
- `input.patch.start_at`: optional, `string | null`, ISO 8601 datetime
- `input.patch.end_at`: optional, `string | null`, ISO 8601 datetime
- `input.patch.timezone`: optional, `string`
- `input.patch.metadata`: optional, `object | null`
- `input.patch.status`: optional, `string`
## Output Shape
- success: `data.status`, `data.success`, `data.failed`, `data.ids`, `data.results`
- failure: `error.code`, `error.message`, `error.details`
```json
{
"skill": "calendar",
"action": "update_event",
"input": {
"event_id": "550e8400-e29b-41d4-a716-446655440000",
"patch": {
"title": "Updated title",
"timezone": "Asia/Shanghai"
}
}
}
```
## Rules
- All mutable fields go inside `patch`.
- Do not put mutable fields at the top level.
@@ -8,7 +8,7 @@ description: Contact lookup - find friend information including phone numbers fo
## Execution Protocol
1. On first contacts use in a run, call `view_skill_file` with `contacts/SKILL.md` before any `project_cli` call.
2. After reading, use `project_cli` only with `command="contacts"`.
2. After reading, use `project_cli` only with `module="contacts"`, `method="read"`, and JSON-native `input`.
3. If contact data is needed for a later action, fetch it first instead of inventing phone numbers or friend matches.
## When to Use
@@ -23,15 +23,23 @@ Use the single tool `project_cli`.
Read this file first with `view_skill_file` when contacts is the relevant skill.
## Calling Contract
- `module`: required, must be `contacts`
- `method`: required, must be `read`
- `input`: required, must be `{}`
- Output success fields: `data.friends_count`, `data.friends`
- Output failure fields: `error.code`, `error.message`, `error.details`
### Read Contacts
Call `project_cli` with:
```json
{
"command": "contacts",
"subcommand": "read",
"args": {}
"module": "contacts",
"method": "read",
"input": {}
}
```
@@ -43,11 +51,11 @@ Returns:
1. To share an event:
- Call `view_skill_file` with `calendar/SKILL.md` if calendar instructions have not been read in this run
- Call `project_cli` `contacts read` to get friend candidates
- Call `project_cli` with `module="contacts"`, `method="read"` to get friend candidates
- Match user's description to a friend
- Call `project_cli` `calendar share` with the friend's phone
- Call `project_cli` with `module="calendar"`, `method="share"` and the friend's phone
## Failure Recovery
- If no friends found, inform the user they have no contacts yet
- If lookup fails, suggest retrying
- If lookup fails, inspect `error.details` and retry only with the documented input shape
@@ -8,7 +8,7 @@ description: User memory management - store and forget personal facts and work p
## Execution Protocol
1. On first memory use in a run, call `view_skill_file` with `memory/SKILL.md` before any `project_cli` call.
2. After reading, use `project_cli` only with `command="memory"`.
2. After reading, use `project_cli` only with `module="memory"`, `method="update"`, and JSON-native `input`.
3. If the user asks to remember or forget something, execute `project_cli`; do not claim persistence without the tool result.
## When to Use
@@ -24,15 +24,23 @@ Use the single tool `project_cli`.
Read this file first with `view_skill_file` when memory is the relevant skill.
## Calling Contract
- `module`: required, must be `memory`
- `method`: required, must be `update`
- `input.operations`: required, non-empty array
- Output success fields: `data.status`, `data.success`, `data.failed`, `data.results`
- Output failure fields: `error.code`, `error.message`, `error.details`
### Update Memory
Call `project_cli` with:
```json
{
"command": "memory",
"subcommand": "update",
"args": {
"module": "memory",
"method": "update",
"input": {
"operations": [
{
"action": "update",
@@ -50,15 +58,26 @@ Operation object fields:
- `update` requires matching content payload (`user_content` / `work_content`)
- `delete` requires `forget_paths`
Field requirements:
- `operations[].action`: required, `string`
- `operations[].memory_type`: required, `string`
- `operations[].user_content`: required for `memory_type=user` and `action=update`, `object`
- `operations[].work_content`: required for `memory_type=work` and `action=update`, `object`
- `operations[].forget_paths`: required for `action=delete`, `array[string]`
## Composition Patterns
1. When user says "remember that I prefer morning meetings":
- Call `project_cli` `memory update` with `action=update`, `memory_type=user`, and appropriate content
- Call `project_cli` with `module="memory"`, `method="update"`, and appropriate content
2. When user says "forget my old address":
- Call `project_cli` `memory update` with `action=delete` and the specific dot-path
- Call `project_cli` with `module="memory"`, `method="update"`, `operations[0].action="delete"`, and the specific dot-path
## Protocol Reminder
- Never use old `command/subcommand/args` fields for memory writes.
## Failure Recovery
- If write fails, inform the user and suggest rephrasing
- If write fails, inspect `error.details` and retry with the documented field shape only
- If forget path is invalid, suggest checking the data structure
@@ -14,7 +14,6 @@ _TOOL_AGENT_OUTPUT_STORE: ContextVar[dict[str, dict[str, Any]] | None] = Context
default=None,
)
def set_current_tool_call_id(tool_call_id: str | None) -> Token[str | None]:
return _CURRENT_TOOL_CALL_ID.set(tool_call_id)
@@ -7,18 +7,18 @@ from schemas.agent.runtime_models import ToolAgentOutput, ToolStatus
from schemas.agent.ui_hints import UiHintIntent, UiHintsPayload, UiHintStatus
def _resolve_command_key(tool_output: ToolAgentOutput) -> tuple[str, str] | None:
def _resolve_method_key(tool_output: ToolAgentOutput) -> tuple[str, str] | None:
args = tool_output.tool_call_args or {}
command = str(args.get("command", "")).strip()
subcommand = str(args.get("subcommand", "")).strip()
if command and subcommand:
return command, subcommand
module = str(args.get("module", "")).strip()
method = str(args.get("method", "")).strip()
if module and method:
return module, method
result = tool_output.result
if isinstance(result, dict):
command = str(result.get("command", "")).strip()
subcommand = str(result.get("subcommand", "")).strip()
if command and subcommand:
return command, subcommand
module = str(result.get("module", "")).strip()
method = str(result.get("method", "")).strip()
if module and method:
return module, method
return None
@@ -84,6 +84,9 @@ def _calendar_read_ui_hints(tool_output: ToolAgentOutput) -> dict[str, Any] | No
if data is None:
return None
if "id" in data:
return _calendar_get_event_ui_hints(tool_output)
items_raw = data.get("items")
events = [item for item in items_raw if isinstance(item, dict)] if isinstance(items_raw, list) else []
list_items: list[dict[str, Any]] = []
@@ -116,6 +119,38 @@ def _calendar_read_ui_hints(tool_output: ToolAgentOutput) -> dict[str, Any] | No
)
def _calendar_get_event_ui_hints(tool_output: ToolAgentOutput) -> dict[str, Any] | None:
data = _result_data(tool_output)
if data is None:
return None
event_id = str(data.get("id") or "").strip()
title = str(data.get("title") or "").strip() or "日程详情"
start_at = str(data.get("start_at") or "").strip()
end_at = str(data.get("end_at") or "").strip()
subtitle = f"{start_at} ~ {end_at}" if start_at and end_at else (start_at or end_at or None)
return _build_status_ui_hints(
tool_output=tool_output,
intent=UiHintIntent.STATUS,
title="日程详情",
description="仅展示本次查询返回的日程详情。",
items=[
{"key": "event_id", "label": "日程 ID", "value": event_id},
{"key": "title", "label": "标题", "value": title},
],
list_title="详情",
list_items=[
{
"id": event_id or None,
"title": title,
"subtitle": subtitle,
"status": UiHintStatus.INFO.value,
}
],
)
def _calendar_mutation_ui_hints(
*,
tool_output: ToolAgentOutput,
@@ -232,6 +267,23 @@ def _calendar_share_ui_hints(tool_output: ToolAgentOutput) -> dict[str, Any] | N
)
def _calendar_invite_status_ui_hints(tool_output: ToolAgentOutput) -> dict[str, Any] | None:
data = _result_data(tool_output)
if data is None:
return None
return _build_status_ui_hints(
tool_output=tool_output,
intent=UiHintIntent.STATUS,
title="邀请处理结果",
description="仅展示本次邀请响应结果。",
items=[
{"key": "message", "label": "结果", "value": str(data.get("message") or "")},
],
list_title="执行结果",
list_items=[],
)
def _memory_update_ui_hints(tool_output: ToolAgentOutput) -> dict[str, Any] | None:
data = _result_data(tool_output)
if data is None:
@@ -326,11 +378,13 @@ def _contacts_read_ui_hints(tool_output: ToolAgentOutput) -> dict[str, Any] | No
_UI_HINTS_BUILDERS: dict[tuple[str, str], Callable[[ToolAgentOutput], dict[str, Any] | None]] = {
("calendar", "create"): _calendar_create_ui_hints,
("calendar", "read"): _calendar_read_ui_hints,
("calendar", "create"): _calendar_create_ui_hints,
("calendar", "update"): _calendar_update_ui_hints,
("calendar", "delete"): _calendar_delete_ui_hints,
("calendar", "share"): _calendar_share_ui_hints,
("calendar", "accept_invite"): _calendar_invite_status_ui_hints,
("calendar", "reject_invite"): _calendar_invite_status_ui_hints,
("contacts", "read"): _contacts_read_ui_hints,
("memory", "update"): _memory_update_ui_hints,
}
@@ -341,10 +395,10 @@ def postprocess_tool_output(tool_output: ToolAgentOutput) -> ToolAgentOutput:
return tool_output
if tool_output.ui_hints is not None:
return tool_output
command_key = _resolve_command_key(tool_output)
if command_key is None:
method_key = _resolve_method_key(tool_output)
if method_key is None:
return tool_output
builder = _UI_HINTS_BUILDERS.get(command_key)
builder = _UI_HINTS_BUILDERS.get(method_key)
if builder is None:
return tool_output
ui_hints = builder(tool_output)
+13 -3
View File
@@ -6,6 +6,8 @@ from typing import Any
from core.agentscope.tools.internal import make_project_cli_wrapper, make_view_skill_file_wrapper
from core.agentscope.tools.internal.project_cli import PROJECT_CLI_TOOL_NAME
from core.agentscope.tools.internal.view_skill_file import VIEW_SKILL_FILE_TOOL_NAME
from core.agentscope.tools.skill_session import SkillSessionState
from core.agentscope.tools.skill_prompt import AGENT_SKILL_INSTRUCTION, AGENT_SKILL_TEMPLATE
from core.agentscope.tools.tool_middleware import register_tool_middlewares
from core.logging import get_logger
from schemas.agent.skill_config import ProjectCliCommand, SkillName
@@ -50,7 +52,12 @@ def build_toolkit(
else:
enabled_skills = _validate_enabled_skill_names(enabled_skill_names)
toolkit = Toolkit()
skill_session = SkillSessionState()
toolkit = Toolkit(
agent_skill_instruction=AGENT_SKILL_INSTRUCTION,
agent_skill_template=AGENT_SKILL_TEMPLATE,
)
if allowed_commands is None:
resolved_allowed_commands = _all_command_names()
@@ -58,14 +65,17 @@ def build_toolkit(
resolved_allowed_commands = _validate_allowed_commands(allowed_commands)
project_cli_wrapper = make_project_cli_wrapper(
allowed_commands=resolved_allowed_commands
allowed_commands=resolved_allowed_commands,
)
toolkit.register_tool_function(
project_cli_wrapper,
func_name=PROJECT_CLI_TOOL_NAME,
)
view_skill_wrapper = make_view_skill_file_wrapper(enabled_skill_names=enabled_skills)
view_skill_wrapper = make_view_skill_file_wrapper(
enabled_skill_names=enabled_skills,
skill_session=skill_session,
)
toolkit.register_tool_function(
view_skill_wrapper,
func_name=VIEW_SKILL_FILE_TOOL_NAME,
@@ -41,30 +41,10 @@ llms:
output_cost_per_token: 0.000012
cache_hit_cost_per_token: 0.00000012
- model_code: qwen3.5-35b-a3b
factory_name: dashscope
pricing_tiers:
- max_prompt_tokens: 128000
input_cost_per_token: 0.0000004
output_cost_per_token: 0.0000032
- max_prompt_tokens: 256000
input_cost_per_token: 0.0000016
output_cost_per_token: 0.0000128
- model_code: deepseek-chat
factory_name: deepseek
pricing_tiers:
- max_prompt_tokens: 128000
input_cost_per_token: 0.000002
output_cost_per_token: 0.000003
- max_prompt_tokens: 1000000
input_cost_per_token: 0.000001
output_cost_per_token: 0.000002
cache_hit_cost_per_token: 0.0000002
- model_code: qwen3.5-27b
factory_name: dashscope
pricing_tiers:
- max_prompt_tokens: 128000
input_cost_per_token: 0.0000006
output_cost_per_token: 0.0000048
- max_prompt_tokens: 256000
input_cost_per_token: 0.0000018
output_cost_per_token: 0.0000144
@@ -32,7 +32,9 @@ def test_react_agent_sys_prompt_includes_registered_skill_prompt() -> None:
assert "# Agent Skills" in prompt
assert "## calendar" in prompt
assert "## contacts" in prompt
assert "SKILL.md" in prompt
assert "view_skill_file" in prompt
assert 'file_path="calendar/SKILL.md"' in prompt
assert 'file_path="contacts/SKILL.md"' in prompt
def test_view_skill_file_tool_reads_registered_skill_content() -> None:
@@ -47,3 +49,18 @@ def test_view_skill_file_tool_reads_registered_skill_content() -> None:
block = response.content[0]
text = block["text"] if isinstance(block, dict) else block.text
assert "Calendar Skill" in text or "name: calendar" in text
def test_view_skill_file_tool_reads_calendar_action_card() -> None:
toolkit = build_toolkit(enabled_skill_names={"calendar"})
tool = toolkit.tools["view_skill_file"].original_func
response = asyncio.run(
tool(file_path="calendar/actions/create_event.md", ranges=[1, 20]),
)
assert response.content
block = response.content[0]
text = block["text"] if isinstance(block, dict) else block.text
assert "create_event" in text
assert "input.title" in text
@@ -252,8 +252,8 @@ async def test_calendar_create_skill_creates_db_record() -> None:
assert cli_result.get("status") == "success", f"Tool call failed: {cli_result}"
args = cli_result.get("tool_call_args", {})
assert args.get("command") == "calendar"
assert args.get("subcommand") == "create"
assert args.get("module") == "calendar"
assert args.get("method") == "create"
result_payload = cli_result.get("result")
assert isinstance(result_payload, dict), f"Unexpected result payload: {cli_result}"
@@ -317,8 +317,8 @@ async def test_calendar_read_skill_queries_db() -> None:
assert cli_result.get("status") in {"success", "partial"}, f"Tool call failed: {cli_result}"
args = cli_result.get("tool_call_args", {})
assert args.get("command") == "calendar"
assert args.get("subcommand") == "read"
assert args.get("module") == "calendar"
assert args.get("method") in {"read"}
@pytest.mark.asyncio
@@ -355,8 +355,8 @@ async def test_contacts_read_skill_queries_db() -> None:
assert cli_result.get("status") in {"success", "partial"}, f"Tool call failed: {cli_result}"
args = cli_result.get("tool_call_args", {})
assert args.get("command") == "contacts"
assert args.get("subcommand") == "read"
assert args.get("module") == "contacts"
assert args.get("method") == "read"
@pytest.mark.asyncio
@@ -398,8 +398,8 @@ async def test_memory_update_skill_via_automation() -> None:
assert cli_result.get("status") in {"success", "partial"}, f"Tool call failed: {cli_result}"
args = cli_result.get("tool_call_args", {})
assert args.get("command") == "memory"
assert args.get("subcommand") == "update"
assert args.get("module") == "memory"
assert args.get("method") == "update"
if user_id:
time.sleep(1)
+16 -10
View File
@@ -183,7 +183,6 @@ async def test_agent_calendar_read_via_cli() -> None:
tool_names = [result.get("tool_name") for result in tool_call_results]
assert "view_skill_file" in tool_names
assert "project_cli" in tool_names
assert tool_names.index("view_skill_file") < tool_names.index("project_cli")
view_result = next(
result for result in tool_call_results if result.get("tool_name") == "view_skill_file"
@@ -193,22 +192,27 @@ async def test_agent_calendar_read_via_cli() -> None:
assert isinstance(view_args, dict)
assert view_args.get("file_path") == "calendar/SKILL.md"
result = next(
result for result in tool_call_results if result.get("tool_name") == "project_cli"
)
successful_project_cli_results = [
result
for result in tool_call_results
if result.get("tool_name") == "project_cli"
and result.get("status") in {"success", "partial"}
]
assert successful_project_cli_results, "expected at least one successful project_cli result"
result = successful_project_cli_results[-1]
assert result.get("status") in {"success", "failure", "partial"}
tool_call_args = result.get("tool_call_args")
assert isinstance(tool_call_args, dict)
assert tool_call_args.get("command") == "calendar"
assert tool_call_args.get("subcommand") == "read"
assert tool_call_args.get("module") == "calendar"
assert tool_call_args.get("method") in {"read"}
raw_result = result.get("result")
if isinstance(raw_result, str):
raw_result = json.loads(raw_result)
assert isinstance(raw_result, dict), f"result should be dict, got {type(raw_result)}"
assert raw_result.get("command") == "calendar"
assert raw_result.get("subcommand") == "read"
assert raw_result.get("module") == "calendar"
assert raw_result.get("method") in {"read"}
if "ui_schema" in result:
ui_schema = result["ui_schema"]
@@ -285,8 +289,10 @@ async def test_tool_ui_schema_in_history() -> None:
except (json.JSONDecodeError, ValueError):
pass
assert isinstance(result, dict), f"result in DB should be dict, got {type(result)}: {result!r}"
assert result.get("command") == "calendar"
assert result.get("subcommand") == "read"
if tool_agent_output.get("status") == "failure":
continue
assert result.get("module") == "calendar"
assert result.get("method") in {"read"}
ui_hints = tool_agent_output.get("ui_hints")
assert isinstance(ui_hints, dict), f"ui_hints should be dict, got {type(ui_hints)}"
View File
+196
View File
@@ -0,0 +1,196 @@
from __future__ import annotations
import os
import time
from pathlib import Path
from uuid import uuid4
import httpx
import jwt
def _load_env() -> None:
env_path = Path(__file__).resolve().parents[3] / ".env"
if env_path.exists():
for line in env_path.read_text().splitlines():
line = line.strip()
if not line or line.startswith("#") or "=" not in line:
continue
key, _, value = line.partition("=")
key = key.strip()
value = value.strip().strip('"').strip("'")
if key and key not in os.environ:
os.environ[key] = value
_load_env()
BASE_URL = os.getenv("AGENT_LIVE_BASE_URL", "http://localhost:5775")
def get_jwt_secret() -> str:
secret = (
os.getenv("SOCIAL_SUPABASE__JWT_SECRET")
or os.getenv("SUPABASE_JWT_SECRET")
or os.getenv("JWT_SECRET")
)
if not secret:
raise RuntimeError("JWT_SECRET not found in environment")
return secret
def get_supabase_url() -> str:
return (
os.getenv("SOCIAL_SUPABASE__URL")
or os.getenv("SUPABASE_URL")
or "http://localhost:54321"
)
def get_test_user_id() -> str:
user_id = os.getenv("TEST_USER_ID")
if user_id:
return user_id
raise RuntimeError("TEST_USER_ID not set")
def create_test_jwt(user_id: str) -> str:
now = int(time.time())
payload = {
"sub": user_id,
"role": "authenticated",
"aud": "authenticated",
"iss": get_supabase_url(),
"iat": now,
"exp": now + 3600,
}
return jwt.encode(payload, get_jwt_secret(), algorithm="HS256")
async def run_agent_and_collect(
*,
user_message: str,
client: httpx.AsyncClient,
headers: dict,
run_id: str | None = None,
thread_id: str | None = None,
timeout: float = 120.0,
) -> AgentRunResult:
if thread_id is None:
thread_id = str(uuid4())
if run_id is None:
run_id = f"quality-{thread_id[:8]}"
t_start = time.monotonic()
run_resp = await client.post(
f"{BASE_URL}/api/v1/agent/runs",
headers=headers,
json={
"threadId": thread_id,
"runId": run_id,
"state": {},
"messages": [
{"id": "u1", "role": "user", "content": user_message}
],
"tools": [],
"context": [],
"forwardedProps": {"runtime_mode": "chat"},
},
)
run_data = run_resp.json()
effective_thread_id = str(run_data.get("threadId", thread_id))
effective_run_id = run_data.get("runId", run_id)
events_url = (
f"{BASE_URL}/api/v1/agent/runs/{effective_thread_id}/events"
f"?runId={effective_run_id}"
)
import json
tool_results: list[dict] = []
all_events: list[dict] = []
run_finished = False
final_answer = ""
async with client.stream(
"GET", events_url, headers=headers, timeout=timeout
) as sse_resp:
buffer = ""
async for line in sse_resp.aiter_lines():
if line.startswith("data:"):
data_str = line.split(":", 1)[1].strip()
if data_str:
buffer = data_str
elif line == "" and buffer:
try:
event_data = json.loads(buffer)
event_type = event_data.get("type")
all_events.append(event_data)
if event_type == "TOOL_CALL_RESULT":
tool_results.append(event_data)
elif event_type == "TEXT_MESSAGE_END":
final_answer = event_data.get("answer", "") or event_data.get("text", "")
elif event_type in {"RUN_FINISHED", "RUN_ERROR"}:
run_finished = True
except json.JSONDecodeError:
pass
buffer = ""
t_end = time.monotonic()
return AgentRunResult(
thread_id=effective_thread_id,
run_id=effective_run_id,
user_message=user_message,
final_answer=final_answer,
tool_results=tool_results,
all_events=all_events,
run_finished=run_finished,
latency_ms=round((t_end - t_start) * 1000),
)
class AgentRunResult:
def __init__(
self,
*,
thread_id: str,
run_id: str,
user_message: str,
final_answer: str,
tool_results: list[dict],
all_events: list[dict],
run_finished: bool,
latency_ms: int,
) -> None:
self.thread_id = thread_id
self.run_id = run_id
self.user_message = user_message
self.final_answer = final_answer
self.tool_results = tool_results
self.all_events = all_events
self.run_finished = run_finished
self.latency_ms = latency_ms
@property
def tool_names_called(self) -> list[str]:
return [
tr.get("tool_name", "") or tr.get("toolName", "")
for tr in self.tool_results
]
@property
def successful_tool_names(self) -> list[str]:
return [
tr.get("tool_name", "") or tr.get("toolName", "")
for tr in self.tool_results
if tr.get("status") in ("success", "partial")
]
@property
def has_tool_success(self) -> bool:
return len(self.successful_tool_names) > 0
@@ -0,0 +1,99 @@
from __future__ import annotations
from pydantic import BaseModel
class ScoreDetail(BaseModel):
criterion: str
passed: bool
note: str = ""
class ScenarioScore(BaseModel):
scenario_id: str
model_code: str
latency_ms: int
input_tokens: int = 0
output_tokens: int = 0
cost_usd: float = 0.0
tool_called: bool
tool_succeeded: bool
answer_quality: float
details: list[ScoreDetail]
raw_answer: str = ""
run_finished: bool = True
@property
def overall_score(self) -> float:
weights = {
"tool_correctness": 0.3,
"answer_quality": 0.5,
"latency": 0.2,
}
tool_score = 1.0 if self.tool_succeeded else (0.5 if self.tool_called else 0.0)
latency_score = self._latency_score()
return (
weights["tool_correctness"] * tool_score
+ weights["answer_quality"] * self.answer_quality
+ weights["latency"] * latency_score
)
def _latency_score(self) -> float:
if self.latency_ms <= 5000:
return 1.0
if self.latency_ms <= 15000:
return 0.7
if self.latency_ms <= 30000:
return 0.4
return 0.1
class ModelScorecard(BaseModel):
model_code: str
scenario_scores: list[ScenarioScore]
@property
def avg_overall(self) -> float:
if not self.scenario_scores:
return 0.0
return sum(s.overall_score for s in self.scenario_scores) / len(self.scenario_scores)
@property
def avg_latency_ms(self) -> float:
if not self.scenario_scores:
return 0.0
return sum(s.latency_ms for s in self.scenario_scores) / len(self.scenario_scores)
@property
def avg_cost_usd(self) -> float:
if not self.scenario_scores:
return 0.0
return sum(s.cost_usd for s in self.scenario_scores) / len(self.scenario_scores)
@property
def tool_success_rate(self) -> float:
if not self.scenario_scores:
return 0.0
return sum(1 for s in self.scenario_scores if s.tool_succeeded) / len(self.scenario_scores)
def summary_table(self) -> str:
lines = [
f"\n{'='*60}",
f"Model Scorecard: {self.model_code}",
f"{'='*60}",
f" Avg Overall Score : {self.avg_overall:.2f}",
f" Avg Latency : {self.avg_latency_ms:.0f}ms",
f" Avg Cost : ${self.avg_cost_usd:.6f}",
f" Tool Success Rate : {self.tool_success_rate:.0%}",
f"{'-'*60}",
]
for s in self.scenario_scores:
status = "PASS" if s.tool_succeeded else "FAIL"
lines.append(
f" [{status}] {s.scenario_id:<25} "
f"score={s.overall_score:.2f} "
f"lat={s.latency_ms}ms "
f"cost=${s.cost_usd:.6f}"
)
lines.append(f"{'='*60}")
return "\n".join(lines)
@@ -0,0 +1,82 @@
from __future__ import annotations
from pydantic import BaseModel
class EvalScenario(BaseModel):
id: str
prompt: str
category: str
expect_tool_use: bool
expect_tool_success: bool
quality_criteria: list[str]
CALENDAR_SCENARIOS: list[EvalScenario] = [
EvalScenario(
id="calendar-read-today",
prompt="请查询我今天的日程安排",
category="calendar",
expect_tool_use=True,
expect_tool_success=True,
quality_criteria=[
"应调用 project_cli 的 calendar.read 方法",
"input 应包含 mode=day 和具体日期",
"回答应基于工具返回的实际数据",
"如果无日程,应明确告知无日程",
],
),
EvalScenario(
id="calendar-create-event",
prompt="帮我创建一个明天下午3点两小时的会议,标题是项目周会",
category="calendar",
expect_tool_use=True,
expect_tool_success=True,
quality_criteria=[
"应调用 project_cli 的 calendar.create 方法",
"input 应包含 title、start_at、timezone",
"start_at 应为具体的时间戳而非自然语言",
"应返回创建结果(包含 event_id)",
],
),
EvalScenario(
id="calendar-read-range",
prompt="这周一到周五我有哪些日程?",
category="calendar",
expect_tool_use=True,
expect_tool_success=True,
quality_criteria=[
"应调用 project_cli 的 calendar.read 方法",
"input 应使用 mode=range 或多次 mode=day",
"应提供完整时间范围",
],
),
]
GENERAL_SCENARIOS: list[EvalScenario] = [
EvalScenario(
id="general-greeting",
prompt="你好,你是谁?",
category="general",
expect_tool_use=False,
expect_tool_success=False,
quality_criteria=[
"应简短自我介绍",
"不应调用任何工具",
"回答简洁不啰嗦",
],
),
EvalScenario(
id="general-farewell",
prompt="好的谢谢,再见",
category="general",
expect_tool_use=False,
expect_tool_success=False,
quality_criteria=[
"应礼貌告别",
"不应调用任何工具",
],
),
]
ALL_SCENARIOS = CALENDAR_SCENARIOS + GENERAL_SCENARIOS
+440
View File
@@ -0,0 +1,440 @@
from __future__ import annotations
import json
import os
import time
from uuid import uuid4
import httpx
import jwt
import pytest
from backend.tests.quality.evaluators import ModelScorecard, ScoreDetail, ScenarioScore
from backend.tests.quality.scenarios import ALL_SCENARIOS
CANDIDATE_MODELS = ["qwen3.5-flash", "deepseek-chat"]
MODEL_LLM_IDS = {
"qwen3.5-flash": "c625bce4-970e-4a76-bebe-cb8840fed854",
"deepseek-chat": "12bc1963-4b67-404b-b952-5948bea0f690",
}
BASE_URL = os.getenv("AGENT_LIVE_BASE_URL", "http://localhost:5775")
def _load_env() -> None:
from pathlib import Path
env_path = Path(__file__).resolve().parents[3] / ".env"
if env_path.exists():
for line in env_path.read_text().splitlines():
line = line.strip()
if not line or line.startswith("#") or "=" not in line:
continue
key, _, value = line.partition("=")
key = key.strip()
value = value.strip().strip('"').strip("'")
if key and key not in os.environ:
os.environ[key] = value
_load_env()
def _get_jwt_secret() -> str:
secret = (
os.getenv("SOCIAL_SUPABASE__JWT_SECRET")
or os.getenv("SUPABASE_JWT_SECRET")
or os.getenv("JWT_SECRET")
)
if not secret:
raise RuntimeError("JWT_SECRET not found in environment")
return secret
def _get_supabase_url() -> str:
return (
os.getenv("SOCIAL_SUPABASE__PUBLIC_URL")
or os.getenv("SOCIAL_SUPABASE__URL")
or os.getenv("SUPABASE_URL")
or "http://localhost:54321"
)
def _get_supabase_key() -> str:
from core.config.settings import config
key = os.getenv("SOCIAL_SUPABASE__SERVICE_ROLE_KEY", "")
if key:
return key
return config.supabase.service_role_key
def _get_test_user_id() -> str:
user_id = os.getenv("TEST_USER_ID")
if user_id:
return user_id
raise RuntimeError("TEST_USER_ID not set")
def _create_jwt(user_id: str) -> str:
now = int(time.time())
payload = {
"sub": user_id,
"role": "authenticated",
"aud": "authenticated",
"iss": _get_supabase_url(),
"iat": now,
"exp": now + 3600,
}
return jwt.encode(payload, _get_jwt_secret(), algorithm="HS256")
async def _run_via_http(
*,
user_message: str,
token: str,
timeout: float = 120.0,
) -> dict:
thread_id = str(uuid4())
run_id = f"q-{uuid4().hex[:12]}"
async with httpx.AsyncClient(timeout=httpx.Timeout(timeout)) as client:
headers = {"Authorization": f"Bearer {token}"}
run_resp = await client.post(
f"{BASE_URL}/api/v1/agent/runs",
headers=headers,
json={
"threadId": thread_id,
"runId": run_id,
"state": {},
"messages": [
{"id": "u1", "role": "user", "content": user_message}
],
"tools": [],
"context": [],
"forwardedProps": {"runtime_mode": "chat"},
},
)
run_data = run_resp.json()
eff_thread = str(run_data.get("threadId", thread_id))
eff_run = run_data.get("runId", run_id)
events_url = (
f"{BASE_URL}/api/v1/agent/runs/{eff_thread}/events"
f"?runId={eff_run}"
)
t_start = time.monotonic()
tool_results: list[dict] = []
all_events: list[dict] = []
final_answer = ""
run_finished = False
token_usage: dict = {}
async with client.stream(
"GET", events_url, headers=headers, timeout=timeout
) as sse:
buffer = ""
async for line in sse.aiter_lines():
if line.startswith("data:"):
data_str = line.split(":", 1)[1].strip()
if data_str:
buffer = data_str
elif line == "" and buffer:
try:
ev = json.loads(buffer)
all_events.append(ev)
etype = ev.get("type")
if etype == "TOOL_CALL_RESULT":
tool_results.append(ev)
elif etype == "TEXT_MESSAGE_END":
final_answer = ev.get("answer", "") or ev.get("text", "")
token_usage = {
"totalTokens": ev.get("totalTokens", 0),
"inputTokens": ev.get("inputTokens", 0),
"outputTokens": ev.get("outputTokens", 0),
"promptCacheMissTokens": ev.get(
"promptCacheMissTokens", 0
),
"promptCacheHitTokens": ev.get(
"promptCacheHitTokens", 0
),
}
elif etype in {"RUN_FINISHED", "RUN_ERROR"}:
run_finished = True
except json.JSONDecodeError:
pass
buffer = ""
t_end = time.monotonic()
tool_names = [
tr.get("tool_name", "") or tr.get("toolName", "")
for tr in tool_results
]
successful_tool_names = [
tr.get("tool_name", "") or tr.get("toolName", "")
for tr in tool_results
if tr.get("status") in ("success", "partial")
]
return {
"final_answer": final_answer,
"tool_results": tool_results,
"tool_names": tool_names,
"successful_tool_names": successful_tool_names,
"run_finished": run_finished,
"latency_ms": round((t_end - t_start) * 1000),
"token_usage": token_usage,
}
def _switch_model(model_code: str) -> None:
from supabase import create_client
sb = create_client(_get_supabase_url(), _get_supabase_key())
llm_id = MODEL_LLM_IDS[model_code]
for agent_type in ("router", "worker"):
(
sb.table("system_agents")
.update({"llm_id": llm_id})
.eq("agent_type", agent_type)
.execute()
)
def _save_original_models() -> list[dict]:
from supabase import create_client
sb = create_client(_get_supabase_url(), _get_supabase_key())
return (
sb.table("system_agents")
.select("agent_type, llm_id")
.execute()
.data
)
def _restore_models(original_rows: list[dict]) -> None:
from supabase import create_client
sb = create_client(_get_supabase_url(), _get_supabase_key())
for row in original_rows:
(
sb.table("system_agents")
.update({"llm_id": row["llm_id"]})
.eq("agent_type", row["agent_type"])
.execute()
)
def _evaluate_answer_quality(
*,
answer: str,
run_finished: bool,
expect_tool_use: bool,
has_tool_success: bool,
tool_names: list[str],
) -> float:
if not run_finished:
return 0.0
if not answer or not answer.strip():
return 0.0
score = 0.6
if expect_tool_use:
if has_tool_success:
score += 0.2
elif tool_names:
score += 0.1
else:
score -= 0.3
else:
if not tool_names:
score += 0.2
else:
score -= 0.1
if len(answer) > 10:
score += 0.1
if "无法" in answer or "失败" in answer or "错误" in answer:
if expect_tool_use:
score -= 0.1
return max(0.0, min(1.0, score))
def _evaluate_criteria(
*,
answer: str,
run_finished: bool,
tool_names: list[str],
has_tool_success: bool,
tool_results: list[dict],
scenario: object,
) -> list[ScoreDetail]:
details: list[ScoreDetail] = []
for criterion in getattr(scenario, "quality_criteria", []):
passed = False
note = ""
if "调用" in criterion or "project_cli" in criterion:
passed = any("project_cli" in tn for tn in tool_names)
note = f"tools: {tool_names}" if not passed else ""
elif "mode" in criterion and "day" in criterion:
for tr in tool_results:
args = tr.get("tool_call_args", {}) or tr.get("toolCallArgs", {})
inp = args.get("input", {})
if isinstance(inp, dict) and inp.get("mode") == "day":
passed = True
break
elif "具体" in criterion or "时间戳" in criterion:
passed = has_tool_success
elif "基于工具" in criterion or "返回" in criterion:
passed = has_tool_success
elif "无日程" in criterion:
passed = "" in answer or "没有" in answer
elif "简短" in criterion or "简洁" in criterion:
passed = 0 < len(answer) < 200
elif "自我介绍" in criterion:
passed = "Linksy" in answer or "助手" in answer
elif "礼貌" in criterion:
passed = len(answer) > 0
else:
passed = run_finished and len(answer) > 0
details.append(ScoreDetail(criterion=criterion, passed=passed, note=note))
return details
async def _run_model_scenarios(model_code: str, user_id: str) -> ModelScorecard:
from services.llm_pricing.service import LlmPricingService
pricing = LlmPricingService()
token = _create_jwt(user_id)
scores: list[ScenarioScore] = []
for scenario in ALL_SCENARIOS:
result = await _run_via_http(
user_message=scenario.prompt,
token=token,
)
answer = result["final_answer"]
tool_names = result["tool_names"]
has_tool_success = len(result["successful_tool_names"]) > 0
tu = result["token_usage"]
total_tokens = tu.get("totalTokens", 0)
input_tokens = tu.get("inputTokens", 0) or tu.get("promptCacheMissTokens", 0)
output_tokens = tu.get("outputTokens", 0) or max(total_tokens - input_tokens, 0)
try:
cost_usd = pricing.calculate_cost(
model=model_code,
prompt_tokens=input_tokens,
completion_tokens=output_tokens,
cached_prompt_tokens=tu.get("promptCacheHitTokens", 0),
)
except ValueError:
cost_usd = 0.0
cost_usd = round(cost_usd, 8)
tool_called = any("project_cli" in tn for tn in tool_names)
tool_succeeded = has_tool_success if scenario.expect_tool_use else True
answer_quality = _evaluate_answer_quality(
answer=answer,
run_finished=result["run_finished"],
expect_tool_use=scenario.expect_tool_use,
has_tool_success=has_tool_success,
tool_names=tool_names,
)
details = _evaluate_criteria(
answer=answer,
run_finished=result["run_finished"],
tool_names=tool_names,
has_tool_success=has_tool_success,
tool_results=result["tool_results"],
scenario=scenario,
)
print(
f" [{model_code}] {scenario.id:<25} "
f"lat={result['latency_ms']}ms "
f"tokens={total_tokens} "
f"cost=${cost_usd:.6f} "
f"tool={'OK' if has_tool_success else 'FAIL'} "
f"answer={answer[:60]}"
)
scores.append(
ScenarioScore(
scenario_id=scenario.id,
model_code=model_code,
latency_ms=result["latency_ms"],
input_tokens=input_tokens,
output_tokens=output_tokens,
cost_usd=cost_usd,
tool_called=tool_called,
tool_succeeded=tool_succeeded,
answer_quality=answer_quality,
details=details,
raw_answer=answer[:500],
run_finished=result["run_finished"],
)
)
return ModelScorecard(model_code=model_code, scenario_scores=scores)
@pytest.fixture(autouse=True)
def _check_env():
if os.getenv("QUALITY_TEST") != "1":
pytest.skip("set QUALITY_TEST=1 to run quality tests")
@pytest.fixture(autouse=True)
def _require_test_user_id():
_get_test_user_id()
@pytest.mark.asyncio
@pytest.mark.quality
@pytest.mark.live
async def test_model_ab_comparison():
user_id = _get_test_user_id()
original_rows = _save_original_models()
scorecards: list[ModelScorecard] = []
try:
for model_code in CANDIDATE_MODELS:
_switch_model(model_code)
card = await _run_model_scenarios(model_code, user_id)
scorecards.append(card)
print(card.summary_table())
finally:
_restore_models(original_rows)
print("\n" + "=" * 60)
print("COMPARISON")
print("=" * 60)
for card in scorecards:
print(
f" {card.model_code:<20} "
f"overall={card.avg_overall:.2f} "
f"latency={card.avg_latency_ms:.0f}ms "
f"cost=${card.avg_cost_usd:.6f} "
f"tool_success={card.tool_success_rate:.0%}"
)
if len(scorecards) == 2:
a, b = scorecards
winner = a.model_code if a.avg_overall >= b.avg_overall else b.model_code
print(f"\n Winner: {winner} (by overall score)")
@@ -7,6 +7,7 @@ from ag_ui.core import RunAgentInput
import core.agentscope.runtime.runner as runner_module
from core.agentscope.runtime.runner import AgentScopeRunner
from schemas.agent.runtime_models import (
RunStatus,
RouterAgentOutput,
WorkerAgentOutputLite,
)
@@ -60,6 +61,31 @@ def test_build_worker_input_messages_only_contains_router_contract() -> None:
assert "[RouterAgentOutput]" in str(input_messages[0].content)
def test_build_agent_sets_worker_max_iters(
monkeypatch: pytest.MonkeyPatch,
) -> None:
captured: dict[str, object] = {}
class _FakeJsonReActAgent:
def __init__(self, **kwargs: object) -> None:
captured.update(kwargs)
monkeypatch.setattr(runner_module, "JsonReActAgent", _FakeJsonReActAgent)
runner = AgentScopeRunner()
model = runner_module.TrackingChatModel(object())
agent = runner._build_agent(
agent_name="worker",
system_prompt="test",
toolkit=object(),
model=model,
)
assert isinstance(agent, _FakeJsonReActAgent)
assert captured["max_iters"] == 7
def test_build_router_messages_injects_user_input_when_context_last_not_user() -> None:
runner = AgentScopeRunner()
run_input = _run_input()
@@ -119,6 +145,45 @@ def test_build_router_messages_appends_user_input_to_context_tail() -> None:
assert messages[0].content == "上一轮回复"
def test_enforce_tool_evidence_contract_keeps_success_when_tool_succeeds() -> None:
runner = AgentScopeRunner()
worker_output = runner._enforce_tool_evidence_contract(
worker_output=WorkerAgentOutputLite(
status=RunStatus.SUCCESS,
answer="今天没有日程",
suggested_actions=["查明天"],
),
requires_tool_evidence=True,
has_successful_tool_result=True,
)
assert worker_output.status == RunStatus.SUCCESS
assert worker_output.answer == "今天没有日程"
assert worker_output.suggested_actions == ["查明天"]
assert worker_output.error is None
def test_enforce_tool_evidence_contract_forces_failure_without_successful_tool() -> None:
runner = AgentScopeRunner()
worker_output = runner._enforce_tool_evidence_contract(
worker_output=WorkerAgentOutputLite(
status=RunStatus.SUCCESS,
answer="今天没有日程",
suggested_actions=["查明天"],
),
requires_tool_evidence=True,
has_successful_tool_result=False,
)
assert worker_output.status == RunStatus.FAILED
assert worker_output.answer == "无法确认结果:所需工具调用未成功完成。"
assert worker_output.suggested_actions == []
assert worker_output.error is not None
assert worker_output.error.code == "TOOL_EVIDENCE_MISSING"
def test_build_model_omits_none_generate_kwargs(
monkeypatch: pytest.MonkeyPatch,
) -> None:
@@ -1,6 +1,10 @@
from __future__ import annotations
from core.agentscope.prompts.agent_prompt import build_agent_prompt
from core.agentscope.prompts.agent_prompt import (
build_agent_prompt,
build_worker_contract_prompt,
)
from schemas.agent.runtime_models import RouterAgentOutput
from schemas.agent.system_agent import AgentType, SystemAgentLLMConfig
@@ -18,9 +22,12 @@ def test_build_agent_prompt_for_worker_contains_runtime_config() -> None:
assert "<!-- AGENT_START -->" in prompt
assert "- type: worker" in prompt
assert "context_messages.mode=number" in prompt
assert "context_messages.count=20" in prompt
assert "enabled_skills=calendar,contacts" in prompt
assert "Use objective plus context_summary as the primary execution guide from the router." in prompt
assert "When requires_tool_evidence=true, do not finalize an answer from failed tool calls; either recover with a corrected tool call or explicitly surface that execution failed." in prompt
assert "If all tool calls fail under requires_tool_evidence=true, set status=failed and populate error; do not present a factual answer as confirmed." in prompt
assert "context_messages.mode=number" not in prompt
assert "context_messages.count=20" not in prompt
def test_build_agent_prompt_for_router_contains_identity_and_config() -> None:
@@ -35,5 +42,20 @@ def test_build_agent_prompt_for_router_contains_identity_and_config() -> None:
assert "- type: router" in prompt
assert "[Router Agent]" in prompt
assert "When the task will require project_cli, include canonical tool input defaults in context_summary using the exact shape `project_cli_defaults={\"module\":...,\"method\":...,\"input\":{...}}` whenever they can be determined safely." in prompt
assert "Standardize every time value mentioned in context_summary to the exact project_cli input format that would be required downstream: dates as `YYYY-MM-DD`, local datetimes as RFC3339 with timezone offset, and event ids as raw UUID strings." in prompt
assert "For relative time requests like today, tomorrow, or next Monday, resolve them using system_time_local and place the resolved standardized value into project_cli_defaults.input instead of leaving natural-language time phrases." in prompt
assert "context_messages.mode=day" in prompt
assert "context_messages.count=2" in prompt
def test_build_worker_contract_prompt_prefers_resolved_dates_from_context_summary() -> None:
prompt = build_worker_contract_prompt(
router_output=RouterAgentOutput(
objective="查询今天日程",
context_summary="目标日期: 2026-04-24",
requires_tool_evidence=True,
)
)
assert "If context_summary contains project_cli_defaults, prefer using those exact module/method/input values directly." in prompt
@@ -0,0 +1,84 @@
from __future__ import annotations
import json
import pytest
from core.agentscope.tools.cli.adapter import invoke_cli_tool
@pytest.mark.asyncio
async def test_project_cli_requires_module_and_method() -> None:
response = await invoke_cli_tool(
tool_name="project_cli",
tool_call_args={
"module": "calendar",
"input": {},
},
allowed_commands={"calendar"},
)
assert response.content
block = response.content[0]
text = block["text"] if isinstance(block, dict) else block.text
payload = json.loads(text)
assert payload["ok"] is False
assert payload["module"] == "calendar"
assert payload["method"] == ""
assert payload["error"]["code"] == "INVALID_ARGUMENT"
@pytest.mark.asyncio
async def test_project_cli_failure_includes_method_contract_in_side_channel() -> None:
from core.agentscope.tools.tool_call_context import (
peek_tool_agent_output,
reset_current_tool_call_id,
set_current_tool_call_id,
)
from core.auth.credential_issuer import create_credential_issuer
from core.auth.tool_credential_context import reset_tool_credential, set_tool_credential
token = set_current_tool_call_id("call-test-guidance")
credential_token = set_tool_credential(
create_credential_issuer().issue(
owner_id="00000000-0000-0000-0000-000000000001",
mode="chat",
)
)
try:
response = await invoke_cli_tool(
tool_name="project_cli",
tool_call_args={
"module": "calendar",
"method": "read",
"input": {},
},
allowed_commands={"calendar"},
)
finally:
reset_tool_credential(credential_token)
reset_current_tool_call_id(token)
assert response.content
block = response.content[0]
text = block["text"] if isinstance(block, dict) else block.text
payload = json.loads(text)
assert payload["ok"] is False
assert payload["module"] == "calendar"
assert payload["method"] == "read"
assert payload["data"] is None
assert payload["error"]["code"] == "INVALID_ACTION_INPUT"
stored = peek_tool_agent_output(tool_call_id="call-test-guidance")
assert stored is not None
error = stored.get("error")
assert isinstance(error, dict)
assert error["code"] == "INVALID_ACTION_INPUT"
assert error["details"]["input_schema"]["mode"] == "string enum(day|range|event)"
assert error["details"]["expected_input_examples"][0] == {
"mode": "day",
"date": "2026-04-24",
"timezone": "Asia/Shanghai",
}
assert "resolve the day to a concrete input.date value" in error["message"]
@@ -1,38 +1,96 @@
from __future__ import annotations
import pytest
from core.agentscope.tools.cli.handler_calendar import (
_resolve_read_range,
_day_input_to_range_input,
_CalendarReadDayInput,
handle_calendar_create_event,
handle_calendar_list_day,
)
from core.agentscope.tools.cli.models import CliCommand
def test_resolve_read_range_supports_date_timezone_fallback() -> None:
request = CliCommand(
command="calendar",
subcommand="read",
owner_id="u1",
args={"date": "2026-04-23", "timezone": "Asia/Shanghai"},
def test_day_input_converts_to_tz_range() -> None:
payload = _CalendarReadDayInput.model_validate(
{"mode": "day", "date": "2026-04-23", "timezone": "Asia/Shanghai"}
)
start_at, end_at, error = _resolve_read_range(request)
result = _day_input_to_range_input(payload)
assert error is None
assert start_at is not None
assert end_at is not None
assert start_at.isoformat() == "2026-04-22T16:00:00+00:00"
assert end_at.isoformat() == "2026-04-23T16:00:00+00:00"
assert result == {
"mode": "range",
"start_at": "2026-04-23T00:00:00+08:00",
"end_at": "2026-04-24T00:00:00+08:00",
}
def test_resolve_read_range_rejects_bad_date() -> None:
@pytest.mark.asyncio
async def test_calendar_read_rejects_bad_date_format() -> None:
request = CliCommand(
command="calendar",
subcommand="read",
module="calendar",
method="read",
owner_id="u1",
args={"date": "2026/04/23", "timezone": "Asia/Shanghai"},
input={"mode": "day", "date": "2026/04/23", "timezone": "Asia/Shanghai"},
)
start_at, end_at, error = _resolve_read_range(request)
result = await handle_calendar_list_day(request)
assert start_at is None
assert end_at is None
assert error == "date must be YYYY-MM-DD"
assert result.ok is False
assert result.error is not None
assert result.error.code == "INVALID_ACTION_INPUT"
assert result.error.details == {
"missing_fields": [],
"invalid_fields": ["day.date"],
}
@pytest.mark.asyncio
async def test_calendar_read_range_requires_timezone_aware_datetimes() -> None:
request = CliCommand(
module="calendar",
method="read",
owner_id="u1",
input={
"mode": "range",
"start_at": "2026-04-23T00:00:00",
"end_at": "2026-04-24T00:00:00",
},
)
result = await handle_calendar_list_day(request)
assert result.ok is False
assert result.error is not None
assert result.error.code == "INVALID_ACTION_INPUT"
assert sorted(result.error.details["invalid_fields"]) == ["range.end_at", "range.start_at"]
@pytest.mark.asyncio
async def test_create_event_rejects_legacy_field_aliases_with_corrections() -> None:
request = CliCommand(
module="calendar",
method="create",
owner_id="u1",
input={
"title": "Project sync",
"start_time": "2026-04-23T10:00:00+08:00",
"end_time": "2026-04-23T11:00:00+08:00",
"event_timezone": "Asia/Shanghai",
},
)
result = await handle_calendar_create_event(request)
assert result.ok is False
assert result.error is not None
assert result.error.code == "INVALID_ACTION_INPUT"
assert result.error.details == {
"missing_fields": ["start_at", "timezone"],
"invalid_fields": ["end_time", "event_timezone", "start_time"],
"alias_corrections": {
"start_time": "start_at",
"end_time": "end_at",
"event_timezone": "timezone",
},
}
@@ -3,18 +3,21 @@ from __future__ import annotations
from core.agentscope.tools.cli.handlers import build_router
def test_router_registers_only_new_canonical_subcommands() -> None:
def test_router_registers_only_new_canonical_actions() -> None:
router = build_router()
assert ("calendar", "create") in router.command_pairs
assert ("calendar", "read") in router.command_pairs
assert ("calendar", "update") in router.command_pairs
assert ("calendar", "delete") in router.command_pairs
assert ("calendar", "share") in router.command_pairs
assert ("contacts", "read") in router.command_pairs
assert ("memory", "update") in router.command_pairs
assert ("calendar", "read") in router.module_methods
assert ("calendar", "create") in router.module_methods
assert ("calendar", "update") in router.module_methods
assert ("calendar", "delete") in router.module_methods
assert ("calendar", "share") in router.module_methods
assert ("calendar", "accept_invite") in router.module_methods
assert ("calendar", "reject_invite") in router.module_methods
assert ("contacts", "read") in router.module_methods
assert ("memory", "update") in router.module_methods
assert ("calendar", "write") not in router.command_pairs
assert ("contacts", "lookup") not in router.command_pairs
assert ("memory", "write") not in router.command_pairs
assert ("memory", "forget") not in router.command_pairs
assert ("calendar", "list_day") not in router.module_methods
assert ("calendar", "get_event") not in router.module_methods
assert ("contacts", "lookup") not in router.module_methods
assert ("memory", "write") not in router.module_methods
assert ("memory", "forget") not in router.module_methods
@@ -11,13 +11,13 @@ async def test_router_register_and_dispatch() -> None:
router = CommandRouter()
async def mock_handler(request: CliCommand) -> CliCommandResult:
return CliCommandResult(ok=True, command=request.command, subcommand=request.subcommand, data={"name": request.args["name"]})
return CliCommandResult(ok=True, module=request.module, method=request.method, data={"name": request.input["name"]})
router.register(command="test", subcommand="run", handler=mock_handler)
router.register(module="test", method="run", handler=mock_handler)
assert ("test", "run") in router.command_pairs
assert ("test", "run") in router.module_methods
result = await router.dispatch(CliCommand(command="test", subcommand="run", args={"name": "demo"}, owner_id="u1"))
result = await router.dispatch(CliCommand(module="test", method="run", input={"name": "demo"}, owner_id="u1"))
assert result.ok is True
assert result.data == {"name": "demo"}
@@ -25,10 +25,10 @@ async def test_router_register_and_dispatch() -> None:
@pytest.mark.asyncio
async def test_router_unknown_command() -> None:
router = CommandRouter()
result = await router.dispatch(CliCommand(command="unknown", subcommand="run", args={}, owner_id="u1"))
result = await router.dispatch(CliCommand(module="unknown", method="run", input={}, owner_id="u1"))
assert result.ok is False
assert result.error is not None
assert result.error.code == "UNKNOWN_COMMAND"
assert result.error.code == "UNKNOWN_METHOD"
@pytest.mark.asyncio
@@ -39,9 +39,9 @@ async def test_router_handler_exception() -> None:
del request
raise ValueError("intentional error")
router.register(command="fail", subcommand="run", handler=failing_handler)
router.register(module="fail", method="run", handler=failing_handler)
result = await router.dispatch(CliCommand(command="fail", subcommand="run", args={}, owner_id="u1"))
result = await router.dispatch(CliCommand(module="fail", method="run", input={}, owner_id="u1"))
assert result.ok is False
assert result.error is not None
assert result.error.code == "HANDLER_ERROR"
@@ -51,12 +51,12 @@ def test_router_duplicate_register() -> None:
router = CommandRouter()
async def handler1(request: CliCommand) -> CliCommandResult:
return CliCommandResult(ok=True, command=request.command, subcommand=request.subcommand)
return CliCommandResult(ok=True, module=request.module, method=request.method)
async def handler2(request: CliCommand) -> CliCommandResult:
return CliCommandResult(ok=True, command=request.command, subcommand=request.subcommand)
return CliCommandResult(ok=True, module=request.module, method=request.method)
router.register(command="cmd", subcommand="one", handler=handler1)
router.register(module="cmd", method="one", handler=handler1)
with pytest.raises(ValueError, match="already registered"):
router.register(command="cmd", subcommand="one", handler=handler2)
router.register(module="cmd", method="one", handler=handler2)
@@ -6,31 +6,53 @@ from schemas.agent.runtime_models import ToolAgentOutput, ToolStatus
def _make_tool_output(
*,
command: str,
subcommand: str,
module: str,
method: str,
status: ToolStatus,
data: dict | None = None,
) -> ToolAgentOutput:
return ToolAgentOutput(
tool_name="project_cli",
tool_call_id="test_call_id",
tool_call_args={"command": command, "subcommand": subcommand, "args": {}},
tool_call_args={"module": module, "method": method, "input": {}},
status=status,
result={"command": command, "subcommand": subcommand, "data": data or {}},
result={"module": module, "method": method, "data": data or {}},
error=None,
ui_hints=None,
)
def test_postprocess_calendar_read_has_ui_hints() -> None:
output = _make_tool_output(command="calendar", subcommand="read", status=ToolStatus.SUCCESS, data={"total": 5, "items": []})
output = _make_tool_output(
module="calendar",
method="read",
status=ToolStatus.SUCCESS,
data={"total": 5, "items": []},
)
processed = postprocess_tool_output(output)
assert processed.ui_hints is not None
assert processed.ui_hints["intent"] == "list"
def test_postprocess_calendar_read_event_detail_has_ui_hints() -> None:
output = _make_tool_output(
module="calendar",
method="read",
status=ToolStatus.SUCCESS,
data={"id": "evt_1", "title": "Project sync", "start_at": "2026-04-21T10:00:00+08:00"},
)
processed = postprocess_tool_output(output)
assert processed.ui_hints is not None
assert processed.ui_hints["title"] == "日程详情"
def test_postprocess_calendar_create_partial() -> None:
output = _make_tool_output(command="calendar", subcommand="create", status=ToolStatus.PARTIAL, data={"status": "partial", "success": 1, "failed": 1, "results": []})
output = _make_tool_output(
module="calendar",
method="create",
status=ToolStatus.PARTIAL,
data={"status": "partial", "success": 1, "failed": 1, "results": []},
)
processed = postprocess_tool_output(output)
assert processed.ui_hints is not None
assert processed.ui_hints["intent"] == "status"
@@ -39,8 +61,8 @@ def test_postprocess_calendar_create_partial() -> None:
def test_postprocess_calendar_share_has_ui_hints() -> None:
output = _make_tool_output(
command="calendar",
subcommand="share",
module="calendar",
method="share",
status=ToolStatus.SUCCESS,
data={
"status": "success",
@@ -60,7 +82,12 @@ def test_postprocess_calendar_share_has_ui_hints() -> None:
def test_postprocess_contacts_read_has_ui_hints() -> None:
output = _make_tool_output(command="contacts", subcommand="read", status=ToolStatus.SUCCESS, data={"friends_count": 3, "friends": []})
output = _make_tool_output(
module="contacts",
method="read",
status=ToolStatus.SUCCESS,
data={"friends_count": 3, "friends": []},
)
processed = postprocess_tool_output(output)
assert processed.ui_hints is not None
assert processed.ui_hints["intent"] == "list"
@@ -69,8 +96,8 @@ def test_postprocess_contacts_read_has_ui_hints() -> None:
def test_postprocess_memory_update_has_ui_hints() -> None:
output = _make_tool_output(
command="memory",
subcommand="update",
module="memory",
method="update",
status=ToolStatus.SUCCESS,
data={
"status": "success",
@@ -95,19 +122,19 @@ def test_postprocess_memory_update_has_ui_hints() -> None:
def test_postprocess_failure_no_ui_hints() -> None:
output = _make_tool_output(command="calendar", subcommand="read", status=ToolStatus.FAILURE, data=None)
output = _make_tool_output(module="calendar", method="read", status=ToolStatus.FAILURE, data=None)
processed = postprocess_tool_output(output)
assert processed.ui_hints is None
def test_postprocess_unknown_command_no_ui_hints() -> None:
output = _make_tool_output(command="unknown", subcommand="run", status=ToolStatus.SUCCESS, data={"data": "test"})
output = _make_tool_output(module="unknown", method="run", status=ToolStatus.SUCCESS, data={"data": "test"})
processed = postprocess_tool_output(output)
assert processed.ui_hints is None
def test_postprocess_preserves_existing_ui_hints() -> None:
output = _make_tool_output(command="calendar", subcommand="read", status=ToolStatus.SUCCESS, data={"total": 5})
output = _make_tool_output(module="calendar", method="read", status=ToolStatus.SUCCESS, data={"total": 5})
output = output.model_copy(update={"ui_hints": {"view": "custom_view", "custom": True}})
processed = postprocess_tool_output(output)
assert processed.ui_hints["view"] == "custom_view"
@@ -3,6 +3,7 @@ import asyncio
from core.agentscope.tools.internal.project_cli import PROJECT_CLI_TOOL_NAME
from core.agentscope.tools.internal.view_skill_file import VIEW_SKILL_FILE_TOOL_NAME
from core.agentscope.tools.internal import make_view_skill_file_wrapper
from core.agentscope.tools.skill_session import SkillSessionState
from core.agentscope.tools.toolkit import build_toolkit
from schemas.agent.skill_config import SkillName
@@ -48,8 +49,22 @@ def test_build_toolkit_registers_project_cli() -> None:
}
def test_build_toolkit_uses_custom_agent_skill_prompt_contract() -> None:
toolkit = build_toolkit(enabled_skill_names={"calendar"})
prompt = toolkit.get_agent_skill_prompt()
assert prompt is not None
assert "The entries below are skill indexes, not full execution instructions." in prompt
assert 'file_path="calendar/SKILL.md"' in prompt
assert "/home/" not in prompt
def test_view_skill_file_rejects_path_outside_enabled_skill_dirs() -> None:
wrapper = make_view_skill_file_wrapper(enabled_skill_names={"calendar"})
wrapper = make_view_skill_file_wrapper(
enabled_skill_names={"calendar"},
skill_session=SkillSessionState(),
)
response = asyncio.run(
wrapper(file_path="/tmp/not-allowed.txt", ranges=None),
@@ -62,10 +77,48 @@ def test_view_skill_file_rejects_path_outside_enabled_skill_dirs() -> None:
def test_view_skill_file_reads_enabled_skill_file() -> None:
wrapper = make_view_skill_file_wrapper(enabled_skill_names={"calendar"})
skill_session = SkillSessionState()
wrapper = make_view_skill_file_wrapper(
enabled_skill_names={"calendar"},
skill_session=skill_session,
)
response = asyncio.run(wrapper(file_path="calendar/SKILL.md", ranges=[1, 10]))
assert response.content
block = response.content[0]
text = block["text"] if isinstance(block, dict) else block.text
assert "Calendar Skill" in text or "name: calendar" in text
assert skill_session.has_read(skill_name="calendar") is True
def test_view_skill_file_reads_calendar_action_card() -> None:
skill_session = SkillSessionState()
wrapper = make_view_skill_file_wrapper(
enabled_skill_names={"calendar"},
skill_session=skill_session,
)
response = asyncio.run(
wrapper(file_path="calendar/actions/get_event.md", ranges=[1, 20])
)
assert response.content
block = response.content[0]
text = block["text"] if isinstance(block, dict) else block.text
assert "get_event" in text
assert '"action": "get_event"' in text
assert skill_session.has_read(skill_name="calendar") is True
def test_view_skill_file_rejects_action_card_for_disabled_skill() -> None:
wrapper = make_view_skill_file_wrapper(
enabled_skill_names={"contacts"},
skill_session=SkillSessionState(),
)
response = asyncio.run(
wrapper(file_path="calendar/actions/get_event.md", ranges=[1, 20])
)
assert response.content
block = response.content[0]
text = block["text"] if isinstance(block, dict) else block.text
assert "ACCESS_DENIED" in text
+2
View File
@@ -176,6 +176,8 @@ run 过滤语义:
`/history` 会返回 tool 消息用于 UI 重建。tool 消息的 `ui_schema` 来自 `metadata.tool_agent_output.ui_hints` 的编译结果。
补充说明:tool 消息若来自 `project_cli`,其 `metadata.tool_agent_output.tool_call_args` 内部形状以当前 CLI 协议为准。当前 canonical 结构为 `module/method/input`,调用方不得再假设存在 `skill/action/input` 或旧 `command/subcommand/args`
`messages[].content` 在当前协议中始终是字符串:
- assistant: answer 文本
+30 -8
View File
@@ -197,7 +197,13 @@ data: <json>
说明:`TOOL_CALL_RESULT``result` 字段提供紧凑、结构化、可执行的信息(优先包含 id/status/count 等关键事实),用于 agent 后续推理与工具编排。若对应工具输出存在 `ui_hints`,后端会在 codec 层编译得到 `ui_schema` 并随事件下发。
当前 `ui_hints` 策略:仅对当前 canonical CLI 的 CRUD 子命令生成(`calendar.create/read/update/delete``contacts.read``memory.update`);`calendar.share` 不生成 `ui_hints`
当前 `ui_hints` 策略:仅对当前有稳定展示语义的 canonical method 生成,例如 `calendar.read``calendar.create``calendar.update``calendar.delete``calendar.share``calendar.accept_invite``calendar.reject_invite``contacts.read``memory.update`
协议迁移说明:
- `tool_call_args` 的模型侧 canonical 结构已统一为 `module/method/input`
- SSE 事件字段名 `tool_call_args` 保持不变,但其内部对象形状以当期 `project_cli` 协议为准。
- 前端和调试工具不得再假设 `tool_call_args.command` / `tool_call_args.subcommand` 一定存在。
补充约束:
@@ -206,6 +212,18 @@ data: <json>
- `result` 仅表示执行输出事实,不重复 `tool_call_args` 已包含的输入参数。
- `ui_schema` 为可渲染 UI 线缆格式;其源数据来自 `metadata.tool_agent_output.ui_hints`
推荐的 `tool_call_args` 形状:
```json
{
"skill": "calendar",
"action": "get_event",
"input": {
"event_id": "evt_123"
}
}
```
#### 3.3.1 tool 名称展示规范(前端本地化)
SSE 协议中的工具名字段保持后端原样,不做服务端翻译:
@@ -215,16 +233,20 @@ SSE 协议中的工具名字段保持后端原样,不做服务端翻译:
前端展示层统一通过工具名本地化映射进行中文渲染,要求兼容两类命名风格:
- dot 风格:`memory.update``calendar.read`
- snake 风格:`memory_update``calendar_read`
- dot 风格:`memory.update``calendar.get_event`
- snake 风格:`memory_update``calendar_get_event`
当前规范映射(canonical -> 中文)如下:
- `calendar.read` -> `读取日程`
- `calendar.create` -> `创建日程`
- `calendar.update` -> `更新日程`
- `calendar.delete` -> `删除日程`
- `calendar.share` -> `共享日程`
- `calendar.list_day` -> `读取当日日程`
- `calendar.list_range` -> `读取区间日程`
- `calendar.get_event` -> `读取日程详情`
- `calendar.create_event` -> `创建日程`
- `calendar.update_event` -> `更新日程`
- `calendar.delete_event` -> `删除日程`
- `calendar.invite_subscriber` -> `邀请参与者`
- `calendar.accept_invite` -> `接受邀请`
- `calendar.reject_invite` -> `拒绝邀请`
- `contacts.read` -> `读取联系人`
- `memory.update` -> `更新记忆`
+62 -20
View File
@@ -24,11 +24,11 @@
1. 项目 CLI 是受限工具执行边界,不是通用 shell。
2. agent 只暴露一个 AgentScope tool`project_cli`
3. skills 只负责向 agent 披露如何使用 `project_cli`,不承担执行 transport 或权限决策。
4. Router 是 CLI 的唯一命令分发核心,只允许白名单 `command + subcommand`
5. 每个 CLI 子命令绑定 Python handler。
3. skills 只负责向 agent 披露如何使用 `project_cli`,不承担执行 transport、分发或权限决策。
4. Router 是 CLI 的唯一动作分发核心,只允许白名单 `module + method`
5. 每个 CLI method 绑定 Python handler 或 action dispatcher
6. handler 只能调用允许的内部能力,不开放任意系统命令执行。
6.1 `project_cli` 命令权限由 runtime `allowed_commands` 与 CLI router 白名单共同约束,不能由 skills 启用状态隐式放开。
6.1 `project_cli` 权限由 runtime 的 module 白名单与 CLI router 白名单共同约束,不能由 skills 可见性隐式放开。
7. `ToolAgentOutput.result` 是 canonical machine-oriented tool result。
8. `ToolResponse` 不承载完整 `ToolAgentOutput`,只承载给 agent 使用的文本投影。
9. tool UI 只来自 `ToolAgentOutput.ui_hints`,不再经过 worker `ui_hints -> ui_schema` 链路。
@@ -38,9 +38,9 @@
一次 tool 调用按如下顺序执行:
1. AgentScope tool `project_cli` 接收到模型生成的 tool call。
2. wrapper 将 `command + subcommand + args` 映射为项目 CLI 输入。
2. wrapper 将 `module + method + input` 映射为项目 CLI 输入。
3. runtime 将受控认证凭证通过环境变量注入 CLI 子进程。
4. CLI router 将 `(command, subcommand)` 分发给对应 Python handler。
4. CLI router 将 `(module, method)` 分发给对应 Python handler。
5. handler 执行业务逻辑并返回结构化 `result`
6. wrapper 将 `result` 投影为文本,写入 `ToolResponse.content`
7. runtime tool post-processor 基于 `result` 和 runtime 上下文生成完整 `ToolAgentOutput`
@@ -56,20 +56,28 @@ Agent -> `project_cli` 的结构化入参:
```json
{
"command": "calendar",
"subcommand": "read",
"args": {
"module": "calendar",
"method": "read",
"input": {
"mode": "range",
"start_at": "2026-04-21T00:00:00+08:00",
"end_at": "2026-04-22T00:00:00+08:00"
}
}
```
补充说明:
- `project_cli` 对模型暴露的 schema 保持薄封装,只保证存在 `module``method``input` 三层。
- method 级别的严格字段校验由服务端 dispatch 到对应 Pydantic model 后执行。
- 对日期/时间字段使用强类型约束:`date` 使用 Pydantic `date`,时间区间使用带时区的 `datetime`
- 不再把完整 action matrix 直接注入工具 schema,以保留渐进式披露和较低 token 成本。
CLI 运行时输入通道采用“两者结合”:
- `argv` 为主:
- command
- subcommand
- module
- method
- mode / formatting flags
- `stdin` 为辅:
- 较大的 JSON payload
@@ -88,8 +96,8 @@ CLI 运行时输入通道采用“两者结合”:
权限边界:
- `enabled_skills` 仅控制 skill 文档可见性与注册。
- `allowed_commands` 控制 `project_cli` 可执行命令集合。
- 两者职责解耦,避免“技能可见即命令授权”的隐式耦合。
- runtime 白名单控制 `project_cli` 可执行的 module/method 集合。
- 两者职责解耦,避免“技能可见即动作授权”的隐式耦合。
## 5. CLI Output Contract
@@ -100,8 +108,8 @@ CLI handler 的原始成功输出必须是统一结构化结果。
```json
{
"ok": true,
"command": "calendar",
"subcommand": "read",
"module": "calendar",
"method": "read",
"data": {
"items": [
{
@@ -150,21 +158,24 @@ post-processor 负责生成完整 `ToolAgentOutput`,至少包括:
规则:
- `result` 是真源。
- `result` 应保留 `command``subcommand``data`
- `result` 应保留 `module``method``data`
- `ui_hints` 由 post-processor 生成,不由 worker 生成。
- tool 失败时 `error` 必须为结构化对象。
- `status` 必须是 `success | failure | partial`
`ui_hints` 输出范围(当前协议):
- 输出:当前 CLI canonical 子命令中的 CRUD 调用
- `calendar.create`
- 输出:当前业务 action 中适合稳定结构化展示的调用
- `calendar.read`
- `calendar.create`
- `calendar.update`
- `calendar.delete`
- `calendar.share`
- `calendar.accept_invite`
- `calendar.reject_invite`
- `contacts.read`
- `memory.update`
- 不输出:非 CRUD 子命令(例如 `calendar.share`
- 若某 action 暂无稳定 UI 模板,可不输出 `ui_hints`,但不能回退为 worker 生成 UI。
## 8. ToolAgentOutput Contract
@@ -184,6 +195,36 @@ post-processor 负责生成完整 `ToolAgentOutput`,至少包括:
- 必须包含后续链式调用所需的 ID/outcome/status/count 等事实。
- `ui_hints` 是 tool UI 的唯一真源。
推荐的 `result` 形状:
```json
{
"module": "calendar",
"method": "read",
"data": {
"id": "evt_123",
"title": "Project sync"
}
}
```
失败时应返回结构化、可纠正的错误对象,而不是只返回泛化文案:
```json
{
"status": "failure",
"error": {
"code": "INVALID_ACTION_INPUT",
"message": "calendar.read input does not match method schema",
"module": "calendar",
"method": "read",
"input_schema": {
"mode": "string enum(day|range|event)"
}
}
}
```
## 9. History Replay Contract
`/history` 必须支持 tool UI 回放。
@@ -234,7 +275,8 @@ tool runtime 的认证边界使用 controlled credential。
## 13. Compatibility Strategy
- 策略:`backward-compatible`
- 策略:`migration`
- `ui_schema` 作为 wire format 保留,由后端 codec 从 `ui_hints` 编译而来。
- 前端 renderer 继续消费 `ui_schema`
- `ui_hints` 作为内部字段,不直接传输给前端。
- 模型侧 `command/subcommand/args` 输入协议废弃并移除,不保留广义别名兼容。
+2 -1
View File
@@ -40,10 +40,11 @@ default = true
[tool.pytest.ini_options]
testpaths = ["backend/tests"]
addopts = "-q --import-mode=importlib"
addopts = "-q --import-mode=importlib --ignore=backend/tests/quality"
asyncio_mode = "auto"
markers = [
"live: requires running local runtime and real external dependencies",
"quality: model quality and cost evaluation (not part of CI, run manually)",
]
[dependency-groups]