130 lines
6.2 KiB
Markdown
130 lines
6.2 KiB
Markdown
# Runtime Refactor and Prompt Centralization Implementation Plan
|
|
|
|
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
|
|
|
|
**Goal:** Refactor CrewAI runtime into reusable modules, centralize all prompt text under `core/agent/prompt`, and diagnose flaky front-tool interrupt behavior without adding hardcoded runtime heuristics.
|
|
|
|
**Architecture:** Keep `runtime.py` as a thin facade and move parsing/tool/prompt composition/stage execution into cohesive modules. Prompt strings (including stage contracts and injected tool-context instructions) are generated exclusively by prompt-module functions. Keep behavior equivalent by default; only add diagnostic observability for flaky live scenario analysis.
|
|
|
|
**Tech Stack:** Python 3.12, FastAPI backend, CrewAI, Pydantic v2, pytest, ruff, basedpyright.
|
|
|
|
---
|
|
|
|
### Task 1: Add prompt module and centralize all runtime prompt text
|
|
|
|
**Files:**
|
|
- Create: `backend/src/core/agent/prompt/__init__.py`
|
|
- Create: `backend/src/core/agent/prompt/runtime_stage_prompts.py`
|
|
- Modify: `backend/src/core/agent/infrastructure/crewai/runtime.py`
|
|
- Test: `backend/tests/unit/core/agent/test_crewai_runtime.py`
|
|
|
|
**Step 1: Write failing test**
|
|
- Add unit test asserting runtime uses prompt builder output (not inline literals) for stage description/contract/tool context.
|
|
|
|
**Step 2: Run test to verify it fails**
|
|
- Run: `uv run pytest backend/tests/unit/core/agent/test_crewai_runtime.py::test_runtime_uses_prompt_module_for_stage_descriptions -q`
|
|
- Expected: FAIL because runtime still composes inline strings.
|
|
|
|
**Step 3: Implement prompt module**
|
|
- Add prompt functions:
|
|
- `build_stage_output_contract(stage: str) -> str`
|
|
- `build_stage_task_description(...) -> str`
|
|
- `build_intent_multimodal_prompt(...) -> str`
|
|
- Use mainstream prompt structure: role/objective/context/constraints/output-format.
|
|
- Keep rules non-hardcoded and behavior-oriented, avoid keyword-triggered branching rules.
|
|
|
|
**Step 4: Wire runtime to prompt functions**
|
|
- Replace inline prompt strings in runtime with prompt-module function calls.
|
|
- Ensure no prompt literals remain in runtime except minimal wiring labels.
|
|
|
|
**Step 5: Run tests**
|
|
- Run: `uv run pytest backend/tests/unit/core/agent/test_crewai_runtime.py -q`
|
|
- Expected: PASS.
|
|
|
|
---
|
|
|
|
### Task 2: Split runtime into reusable modules and keep facade stable
|
|
|
|
**Files:**
|
|
- Create: `backend/src/core/agent/infrastructure/crewai/runtime_models.py`
|
|
- Create: `backend/src/core/agent/infrastructure/crewai/runtime_parsers.py`
|
|
- Create: `backend/src/core/agent/infrastructure/crewai/runtime_tools.py`
|
|
- Create: `backend/src/core/agent/infrastructure/crewai/runtime_stage_runner.py`
|
|
- Modify: `backend/src/core/agent/infrastructure/crewai/runtime.py`
|
|
- Modify: `backend/src/core/agent/infrastructure/crewai/__init__.py` (if needed)
|
|
- Test: `backend/tests/unit/core/agent/test_crewai_runtime.py`
|
|
|
|
**Step 1: Write failing test**
|
|
- Add/adjust unit test that imports `CrewAIRuntime` facade and verifies existing contract (`execute`, `map_events`, `is_registered_backend_tool`) still works after split.
|
|
|
|
**Step 2: Run test to verify it fails**
|
|
- Run: `uv run pytest backend/tests/unit/core/agent/test_crewai_runtime.py::test_runtime_facade_contract_stable_after_refactor -q`
|
|
- Expected: FAIL before module split wiring.
|
|
|
|
**Step 3: Extract models/parsers/tools/stage-runner**
|
|
- Move Pydantic result models to `runtime_models.py`.
|
|
- Move parse/normalize helpers to `runtime_parsers.py`.
|
|
- Move tool normalization, routing tool class, pending-front-tool extraction to `runtime_tools.py`.
|
|
- Move `_run_stage_with_crewai` + usage extraction to `runtime_stage_runner.py`.
|
|
|
|
**Step 4: Keep runtime facade thin**
|
|
- `runtime.py` retains orchestration flow and public API only.
|
|
- Import and compose extracted modules; no behavior change intended.
|
|
|
|
**Step 5: Run tests**
|
|
- Run: `uv run pytest backend/tests/unit/core/agent/test_crewai_runtime.py -q`
|
|
- Expected: PASS.
|
|
|
|
---
|
|
|
|
### Task 3: Diagnose front-tool interrupt instability with explicit observability
|
|
|
|
**Files:**
|
|
- Modify: `backend/src/core/agent/infrastructure/crewai/runtime.py`
|
|
- Modify: `backend/src/core/agent/infrastructure/crewai/runtime_stage_runner.py`
|
|
- Modify: `backend/tests/e2e/test_agent_live_flow.py`
|
|
- Modify: `docs/bugs/2026-03-08-backend-tool-no-events.md`
|
|
|
|
**Step 1: Add failing/diagnostic assertion in live test path**
|
|
- Extend test to capture and print structured diagnostics when `pending_tool_call_id` is `None`:
|
|
- intent/execution raw+structured output
|
|
- tool payload injected into prompts
|
|
- captured tool calls list
|
|
|
|
**Step 2: Run targeted live test for evidence**
|
|
- Run: `AGENT_LIVE_E2E=1 uv run pytest backend/tests/e2e/test_agent_live_flow.py::test_agent_live_front_tool_interrupt_resume_continue -v -rs`
|
|
- Expected: still flaky/fail, but with actionable diagnostics.
|
|
|
|
**Step 3: Analyze evidence and apply non-hardcoded fix**
|
|
- If input ambiguity: refine test input prompt text under test fixture.
|
|
- If tool-description injection issue: fix prompt-builder injection logic.
|
|
- Do not add keyword heuristics in runtime branching.
|
|
|
|
**Step 4: Re-run live targeted test**
|
|
- Same command as Step 2.
|
|
- Expected: improved stability or clearly documented unresolved root cause.
|
|
|
|
**Step 5: Update bug doc**
|
|
- Add root-cause findings and next actions under Bug 3 section.
|
|
|
|
---
|
|
|
|
### Task 4: Full verification and hygiene
|
|
|
|
**Files:**
|
|
- Modify (if needed): `backend/tests/unit/core/agent/test_run_resume_service.py`
|
|
|
|
**Step 1: Run impacted unit suites**
|
|
- `uv run pytest backend/tests/unit/core/agent/test_crewai_runtime.py -q`
|
|
- `uv run pytest backend/tests/unit/core/agent/test_run_resume_service.py -q`
|
|
|
|
**Step 2: Run lint/type checks**
|
|
- `uv run ruff check backend/src/core/agent/prompt backend/src/core/agent/infrastructure/crewai backend/tests/unit/core/agent/test_crewai_runtime.py backend/tests/e2e/test_agent_live_flow.py`
|
|
- `uv run basedpyright backend/src/core/agent/prompt backend/src/core/agent/infrastructure/crewai backend/tests/unit/core/agent/test_crewai_runtime.py`
|
|
|
|
**Step 3: Optional live regression pack (if env ready)**
|
|
- `AGENT_LIVE_E2E=1 uv run pytest backend/tests/e2e/test_agent_live_flow.py -m live -v -rs`
|
|
|
|
**Step 4: Report residual risk**
|
|
- If live still flaky, report exact failure mode and captured diagnostics (no workaround heuristics).
|