qzl/social-app

zl-q 2980213a5b fix(agent): stabilize live e2e tool execution and loop isolation

2026-03-08 22:41:59 +08:00

Runtime Refactor and Prompt Centralization Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Refactor the CrewAI runtime into reusable modules, centralize all prompt text under core/agent/prompt, and diagnose flaky front-tool interrupt behavior without adding hardcoded runtime heuristics.

Architecture: Keep runtime.py as a thin facade and move parsing, tool handling, prompt composition, and stage execution into cohesive modules. Prompt strings (including stage contracts and injected tool-context instructions) are generated exclusively by prompt-module functions. Behavior stays equivalent by default; the only addition is diagnostic observability for analyzing the flaky live scenario.

Tech Stack: Python 3.12, FastAPI backend, CrewAI, Pydantic v2, pytest, ruff, basedpyright.

Task 1: Add prompt module and centralize all runtime prompt text

Files:

Create: backend/src/core/agent/prompt/__init__.py
Create: backend/src/core/agent/prompt/runtime_stage_prompts.py
Modify: backend/src/core/agent/infrastructure/crewai/runtime.py
Test: backend/tests/unit/core/agent/test_crewai_runtime.py

Step 1: Write failing test

Add a unit test asserting that the runtime uses prompt-builder output (not inline literals) for the stage description, output contract, and tool context.
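A minimal sketch of what such a test could look like. FakeRuntime and its describe_stage method are hypothetical stand-ins, since the plan does not show the real runtime's constructor or internals; only the test name comes from the plan. The idea is to feed the runtime a builder that returns a sentinel string and assert that the sentinel comes back unchanged, which an inline literal could never satisfy.

```python
from typing import Callable


class FakeRuntime:
    """Hypothetical stand-in for the real runtime (not the project's class)."""

    def __init__(self, build_description: Callable[[str], str]) -> None:
        self._build_description = build_description

    def describe_stage(self, stage: str) -> str:
        # Delegate to the prompt builder instead of composing a literal here.
        return self._build_description(stage)


def test_runtime_uses_prompt_module_for_stage_descriptions() -> None:
    sentinel = "<<prompt-module-output>>"
    calls: list[str] = []

    def fake_builder(stage: str) -> str:
        calls.append(stage)
        return sentinel

    runtime = FakeRuntime(build_description=fake_builder)

    # The sentinel can only appear if the runtime called the builder.
    assert runtime.describe_stage("intent") == sentinel
    assert calls == ["intent"]
```

Against the real code the builder would more likely be swapped in with pytest's monkeypatch fixture on the prompt module; constructor injection is used here only to keep the sketch self-contained.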

Step 2: Run test to verify it fails

Run: uv run pytest backend/tests/unit/core/agent/test_crewai_runtime.py::test_runtime_uses_prompt_module_for_stage_descriptions -q
Expected: FAIL because runtime still composes inline strings.

Step 3: Implement prompt module

Add prompt functions:
- build_stage_output_contract(stage: str) -> str
- build_stage_task_description(...) -> str
- build_intent_multimodal_prompt(...) -> str
Use a mainstream prompt structure: role, objective, context, constraints, output format.
Keep rules behavior-oriented rather than hardcoded, and avoid keyword-triggered branching rules.
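The structure above could be sketched as follows. Only the signature build_stage_output_contract(stage: str) -> str comes from the plan; render_prompt, the section names as dictionary keys, and the contract wording are assumptions made for illustration. The other two builders are left out because the plan elides their parameters.

```python
# Fixed section order keeps every generated prompt structurally uniform.
_SECTION_ORDER = ("role", "objective", "context", "constraints", "output_format")

# Hypothetical contract text; the real wording is not given in the plan.
_STAGE_CONTRACTS = {
    "intent": "Return a JSON object describing the user's intent.",
    "execution": "Return a JSON object with the answer and any tool calls.",
}


def render_prompt(sections: dict[str, str]) -> str:
    """Render non-empty sections in the fixed order (hypothetical helper)."""
    parts: list[str] = []
    for name in _SECTION_ORDER:
        body = sections.get(name, "").strip()
        if body:
            title = name.replace("_", " ").title()
            parts.append(f"## {title}\n{body}")
    return "\n\n".join(parts)


def build_stage_output_contract(stage: str) -> str:
    """Return the output-format section for a stage; reject unknown stages."""
    try:
        contract = _STAGE_CONTRACTS[stage]
    except KeyError:
        raise ValueError(f"unknown stage: {stage!r}") from None
    return render_prompt({"output_format": contract})
```

Looking up the contract by stage name, rather than branching on keywords in the user input, is one way to keep the rules data-driven.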

Step 4: Wire runtime to prompt functions

Replace inline prompt strings in runtime with prompt-module function calls.
Ensure no prompt literals remain in runtime except minimal wiring labels.

Step 5: Run tests

Run: uv run pytest backend/tests/unit/core/agent/test_crewai_runtime.py -q
Expected: PASS.

Task 2: Split runtime into reusable modules and keep facade stable

Files:

Create: backend/src/core/agent/infrastructure/crewai/runtime_models.py
Create: backend/src/core/agent/infrastructure/crewai/runtime_parsers.py
Create: backend/src/core/agent/infrastructure/crewai/runtime_tools.py
Create: backend/src/core/agent/infrastructure/crewai/runtime_stage_runner.py
Modify: backend/src/core/agent/infrastructure/crewai/runtime.py
Modify: backend/src/core/agent/infrastructure/crewai/__init__.py (if needed)
Test: backend/tests/unit/core/agent/test_crewai_runtime.py

Step 1: Write failing test

Add or adjust a unit test that imports the CrewAIRuntime facade and verifies that the existing contract (execute, map_events, is_registered_backend_tool) still works after the split.
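One possible shape for the contract check, as a sketch. The CrewAIRuntime class below is a stand-in with placeholder method signatures, because the plan names the three public methods but not their parameters; missing_facade_methods is a hypothetical helper.

```python
# The three public names listed in the plan.
REQUIRED_FACADE_METHODS = ("execute", "map_events", "is_registered_backend_tool")


class CrewAIRuntime:
    """Stand-in facade; parameter lists are assumptions, not the real API."""

    def execute(self, request): ...

    def map_events(self, result): ...

    def is_registered_backend_tool(self, name): ...


def missing_facade_methods(cls: type) -> list[str]:
    """Return required public methods that the class does not expose."""
    return [
        name
        for name in REQUIRED_FACADE_METHODS
        if not callable(getattr(cls, name, None))
    ]


def test_runtime_facade_contract_stable_after_refactor() -> None:
    assert missing_facade_methods(CrewAIRuntime) == []
```

A presence check like this only guards the surface of the facade; the real test would also need to call each method to confirm behavior is unchanged.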

Step 2: Run test to verify it fails

Run: uv run pytest backend/tests/unit/core/agent/test_crewai_runtime.py::test_runtime_facade_contract_stable_after_refactor -q
Expected: FAIL until the split modules are wired into the facade.

Step 3: Extract models/parsers/tools/stage-runner

Move Pydantic result models to runtime_models.py.
Move parse/normalize helpers to runtime_parsers.py.
Move tool normalization, routing tool class, pending-front-tool extraction to runtime_tools.py.
Move _run_stage_with_crewai and usage extraction to runtime_stage_runner.py.

Step 4: Keep runtime facade thin

runtime.py retains orchestration flow and public API only.
Import and compose extracted modules; no behavior change intended.
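A rough sketch of a thin facade that only orchestrates. Everything beyond the names CrewAIRuntime, execute, and is_registered_backend_tool is assumed: the constructor arguments, the StageResult fields, and the two-stage loop over "intent" and "execution" are illustrative, and a dataclass stands in for the Pydantic models to avoid a third-party import.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class StageResult:
    """Stand-in for a result model that would live in runtime_models.py."""

    stage: str
    prompt: str
    raw_output: str


# Assumed call shapes for the extracted modules.
PromptBuilder = Callable[[str, str], str]        # (stage, user_input) -> prompt
StageRunner = Callable[[str, str], StageResult]  # (stage, prompt) -> result


class CrewAIRuntime:
    """Facade: composes collaborators and owns no prompt text or parsing."""

    def __init__(
        self,
        build_prompt: PromptBuilder,
        run_stage: StageRunner,
        backend_tools: Iterable[str],
    ) -> None:
        self._build_prompt = build_prompt
        self._run_stage = run_stage
        self._backend_tools = frozenset(backend_tools)

    def execute(self, user_input: str) -> list[StageResult]:
        results: list[StageResult] = []
        for stage in ("intent", "execution"):  # assumed stage order
            prompt = self._build_prompt(stage, user_input)
            results.append(self._run_stage(stage, prompt))
        return results

    def is_registered_backend_tool(self, name: str) -> bool:
        return name in self._backend_tools
```

Passing the collaborators in keeps the facade testable with plain fakes, so a unit test can exercise the orchestration flow without starting CrewAI.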

Step 5: Run tests

Run: uv run pytest backend/tests/unit/core/agent/test_crewai_runtime.py -q
Expected: PASS.

Task 3: Diagnose front-tool interrupt instability with explicit observability

Files:

Modify: backend/src/core/agent/infrastructure/crewai/runtime.py
Modify: backend/src/core/agent/infrastructure/crewai/runtime_stage_runner.py
Modify: backend/tests/e2e/test_agent_live_flow.py
Modify: docs/bugs/2026-03-08-backend-tool-no-events.md

Step 1: Add failing/diagnostic assertion in live test path

Extend the test to capture and print structured diagnostics when pending_tool_call_id is None:
- raw and structured output of the intent and execution stages
- tool payload injected into prompts
- captured tool calls list

Step 2: Run targeted live test for evidence

Run: AGENT_LIVE_E2E=1 uv run pytest backend/tests/e2e/test_agent_live_flow.py::test_agent_live_front_tool_interrupt_resume_continue -v -rs
Expected: the test is still flaky or failing, but now emits actionable diagnostics.

Step 3: Analyze evidence and apply non-hardcoded fix

If the cause is input ambiguity: refine the input prompt text in the test fixture.
If the cause is tool-description injection: fix the injection logic in the prompt builder.
Do not add keyword heuristics to runtime branching.

Step 4: Re-run live targeted test

Same command as Step 2.
Expected: improved stability or clearly documented unresolved root cause.

Step 5: Update bug doc

Add root-cause findings and next actions under Bug 3 section.

Task 4: Full verification and hygiene

Files:

Modify (if needed): backend/tests/unit/core/agent/test_run_resume_service.py

Step 1: Run impacted unit suites

uv run pytest backend/tests/unit/core/agent/test_crewai_runtime.py -q
uv run pytest backend/tests/unit/core/agent/test_run_resume_service.py -q

Step 2: Run lint/type checks

uv run ruff check backend/src/core/agent/prompt backend/src/core/agent/infrastructure/crewai backend/tests/unit/core/agent/test_crewai_runtime.py backend/tests/e2e/test_agent_live_flow.py
uv run basedpyright backend/src/core/agent/prompt backend/src/core/agent/infrastructure/crewai backend/tests/unit/core/agent/test_crewai_runtime.py

Step 3: Optional live regression pack (if env ready)

AGENT_LIVE_E2E=1 uv run pytest backend/tests/e2e/test_agent_live_flow.py -m live -v -rs

Step 4: Report residual risk

If the live test is still flaky, report the exact failure mode and the captured diagnostics (no workaround heuristics).
