# Celery To Taskiq One-Shot Migration Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
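**Goal:** In this early-stage project, remove Celery in one pass and replace the async task infrastructure with Taskiq, keeping agent runtime behavior unchanged.
**Architecture:** Reuse the existing `AgentService -> QueueClientLike` abstraction and replace only the infrastructure-layer implementation (task declarations, enqueue calls, worker startup, configuration, and dependencies). Keep Redis as the broker/result store and event-stream channel, and avoid changing the semantics of the business service layer.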
**Tech Stack:** FastAPI, Taskiq, taskiq-redis, Redis, pytest, uv
---
### Task 1: Switch dependencies and configuration (RED first, then GREEN)
**Files:**
- Modify: `pyproject.toml`
- Modify: `backend/src/core/config/settings.py`
- Test: `backend/tests/unit/core/config/test_taskiq_settings.py` (new)
**Step 1: Write the failing test**
```python
from core.config.settings import Settings


def test_taskiq_uses_redis_url_by_default() -> None:
    settings = Settings()
    assert settings.taskiq_broker_url.startswith("redis://")


def test_taskiq_queue_default_value() -> None:
    settings = Settings()
    assert settings.taskiq.default_queue == "default"
```
**Step 2: Run test to verify it fails**
Run: `uv run pytest backend/tests/unit/core/config/test_taskiq_settings.py -v`
Expected: FAIL (the `taskiq_broker_url` / `taskiq` fields do not exist)
**Step 3: Write minimal implementation**
```python
from pydantic import BaseModel, computed_field
from pydantic_settings import BaseSettings


class TaskiqSettings(BaseModel):
    broker_url: str | None = None
    result_backend_url: str | None = None
    default_queue: str = "default"


class Settings(BaseSettings):
    # Existing fields (including `redis`) stay as they are.
    taskiq: TaskiqSettings = TaskiqSettings()

    @computed_field
    @property
    def taskiq_broker_url(self) -> str:
        # Fall back to the shared Redis URL when no override is set.
        return self.taskiq.broker_url or self.redis.url

    @computed_field
    @property
    def taskiq_result_backend_url(self) -> str:
        return self.taskiq.result_backend_url or self.redis.url
```
Matching changes in `pyproject.toml`:
- Remove `celery>=...`
- Add `taskiq>=...`
- Add `taskiq-redis>=...`
**Step 4: Run test to verify it passes**
Run: `uv run pytest backend/tests/unit/core/config/test_taskiq_settings.py -v`
Expected: PASS
**Step 5: Commit**
```bash
git add pyproject.toml backend/src/core/config/settings.py backend/tests/unit/core/config/test_taskiq_settings.py
git commit -m "refactor(queue): replace celery config with taskiq settings"
```
### Task 2: Create the Taskiq broker and worker entry point
**Files:**
- Create: `backend/src/core/taskiq/app.py`
- Create: `backend/tests/unit/core/taskiq/test_app.py`
- Delete: `backend/src/core/celery/app.py`
**Step 1: Write the failing test**
```python
from core.taskiq.app import broker


def test_taskiq_broker_is_configured() -> None:
    assert broker is not None
```
**Step 2: Run test to verify it fails**
Run: `uv run pytest backend/tests/unit/core/taskiq/test_app.py -v`
Expected: FAIL (the module does not exist)
**Step 3: Write minimal implementation**
```python
from taskiq_redis import ListQueueBroker, RedisAsyncResultBackend

from core.config.settings import config

broker = ListQueueBroker(url=config.taskiq_broker_url).with_result_backend(
    RedisAsyncResultBackend(redis_url=config.taskiq_result_backend_url)
)
```
Note: if the installed `taskiq-redis` version uses different API names, follow that version's official API and write an equivalent implementation.
**Step 4: Run test to verify it passes**
Run: `uv run pytest backend/tests/unit/core/taskiq/test_app.py -v`
Expected: PASS
**Step 5: Commit**
```bash
git add backend/src/core/taskiq/app.py backend/tests/unit/core/taskiq/test_app.py backend/src/core/celery/app.py
git commit -m "feat(queue): add taskiq broker app and remove celery app"
```
### Task 3: Migrate the task definition (Celery task -> Taskiq task)
**Files:**
- Modify: `backend/src/core/agent/infrastructure/queue/tasks.py`
- Test: `backend/tests/unit/core/agent/infrastructure/queue/test_tasks.py` (new)
**Step 1: Write the failing test**
```python
import pytest

from core.agent.infrastructure.queue.tasks import run_agent_task


# Assumes the project's pytest setup already runs async tests (for example via an asyncio plugin).
async def test_run_agent_task_invalid_command_raises() -> None:
    with pytest.raises(ValueError, match="invalid command type"):
        await run_agent_task(
            {"command": "unknown", "session_id": "00000000-0000-0000-0000-000000000001"}
        )
```
**Step 2: Run test to verify it fails**
Run: `uv run pytest backend/tests/unit/core/agent/infrastructure/queue/test_tasks.py -v`
Expected: FAIL (the test file does not exist or the import fails)
**Step 3: Write minimal implementation**
```python
from typing import Any

from core.taskiq.app import broker


@broker.task(task_name="tasks.agent.run_command")
async def run_command_task(command: dict[str, Any]) -> dict[str, object]:
    return await run_agent_task(command)
```
Also remove:
- `from core.celery.app import celery_app`
- `@celery_app.task(...)`
**Step 4: Run test to verify it passes**
Run: `uv run pytest backend/tests/unit/core/agent/infrastructure/queue/test_tasks.py -v`
Expected: PASS
**Step 5: Commit**
```bash
git add backend/src/core/agent/infrastructure/queue/tasks.py backend/tests/unit/core/agent/infrastructure/queue/test_tasks.py
git commit -m "refactor(agent): migrate run command task to taskiq"
```
### Task 4: Migrate the API enqueue client (.delay -> .kiq)
**Files:**
- Modify: `backend/src/v1/agent/dependencies.py`
- Test: `backend/tests/unit/v1/agent/test_dependencies_queue.py` (new)
**Step 1: Write the failing test**
```python
class _FakeTask:
    async def kiq(self, payload: dict[str, object]):
        class _Result:
            task_id = "task-123"

        return _Result()


async def test_enqueue_returns_task_id(monkeypatch):
    from v1.agent.dependencies import CeleryQueueClient

    client = CeleryQueueClient()  # should be renamed to TaskiqQueueClient after the migration
    monkeypatch.setattr("v1.agent.dependencies.run_command_task", _FakeTask())
    task_id = await client.enqueue(command={"command": "run"}, dedup_key=None)
    assert task_id == "task-123"
```
**Step 2: Run test to verify it fails**
Run: `uv run pytest backend/tests/unit/v1/agent/test_dependencies_queue.py -v`
Expected: FAIL (type/method mismatch)
**Step 3: Write minimal implementation**
```python
class TaskiqQueueClient:
    async def enqueue(self, *, command: dict[str, object], dedup_key: str | None) -> str:
        payload = dict(command)
        if dedup_key:
            payload["dedup_key"] = dedup_key
        result = await run_command_task.kiq(payload)
        task_id = str(result.task_id)
        return task_id
```
Also swap the DI wiring:
```python
queue=TaskiqQueueClient()
```
**Step 4: Run test to verify it passes**
Run: `uv run pytest backend/tests/unit/v1/agent/test_dependencies_queue.py -v`
Expected: PASS
**Step 5: Commit**
```bash
git add backend/src/v1/agent/dependencies.py backend/tests/unit/v1/agent/test_dependencies_queue.py
git commit -m "refactor(api): switch agent enqueue client from celery to taskiq"
```
### Task 5: Clean up ops scripts and logging tests (remove Celery in one pass)
**Files:**
- Modify: `infra/scripts/app.sh`
- Delete: `backend/tests/unit/test_celery_logging.py`
- Modify/Create: `backend/tests/unit/core/logging/test_taskiq_logging.py` (if taskiq logging hook implemented)
- Modify: `backend/src/core/logging/__init__.py` (remove the Celery logging export)
**Step 1: Write the failing test**
```python
from pathlib import Path


def test_worker_command_uses_taskiq() -> None:
    content = Path("infra/scripts/app.sh").read_text()
    assert "uv run taskiq worker" in content
    assert "uv run celery" not in content
```
**Step 2: Run test to verify it fails**
Run: `uv run pytest backend/tests/unit/core/logging/test_taskiq_logging.py -v`
Expected: FAIL (the script still contains celery)
**Step 3: Write minimal implementation**
Replace the worker command in `infra/scripts/app.sh` with a Taskiq worker, for example:
```bash
uv run taskiq worker core.taskiq.app:broker core.agent.infrastructure.queue.tasks
```
Remove every Celery process-cleanup pattern and match Taskiq workers instead:
```bash
pgrep -f "taskiq.*worker"
pkill -f "taskiq.*worker"
```
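
The `taskiq.*worker` pattern is matched against the full command line by `pgrep -f` and `pkill -f`. As a rough illustration of what it would and would not catch, the sketch below applies the same expression with Python's `re` module. The command lines are examples, and the Celery one is a hypothetical legacy invocation rather than the project's actual command.

```python
import re

WORKER_PATTERN = re.compile(r"taskiq.*worker")

sample_command_lines = [
    "uv run taskiq worker core.taskiq.app:broker core.agent.infrastructure.queue.tasks",
    "uv run celery -A core.celery.app worker",  # hypothetical legacy command line
    "uv run pytest backend/tests/unit -q",
]

# Keep only the command lines the cleanup pattern would select.
matched = [line for line in sample_command_lines if WORKER_PATTERN.search(line)]
```

The pattern is deliberately loose, so any process whose command line contains `taskiq` followed later by `worker` would also match; tightening it may be worthwhile if unrelated processes can fit that shape.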
**Step 4: Run test to verify it passes**
Run: `uv run pytest backend/tests/unit/core/logging/test_taskiq_logging.py -v`
Expected: PASS
**Step 5: Commit**
```bash
git add infra/scripts/app.sh backend/src/core/logging/__init__.py backend/tests/unit/core/logging/test_taskiq_logging.py backend/tests/unit/test_celery_logging.py
git commit -m "chore(infra): replace celery worker scripts and remove celery-specific tests"
```
### Task 6: Full reference cleanup and regression verification
**Files:**
- Modify: `docs/runtime/runtime-runbook.md`
- Modify: other runtime docs that reference Celery (update each one according to the `rg` results)
**Step 1: Write the failing test**
```python
# Use a command assertion instead of a code test
# rg -n "celery" backend/src infra/scripts docs/runtime pyproject.toml
```
**Step 2: Run check to verify it fails**
Run: `rg -n "celery" backend/src infra/scripts docs/runtime pyproject.toml`
Expected: old references are still present
**Step 3: Write minimal implementation**
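- Remove or replace the remaining Celery code, docs, and configuration.
- Celery mentions in historical change records (such as archived bugs) may stay, but the runtime path must have zero references.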
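
If the zero-reference rule should be enforced from a test rather than by running `rg` by hand, a scan along these lines is one option. This is a sketch under assumptions: `find_references` is a hypothetical helper, it does a plain case-sensitive substring search, and unlike `rg` it does not honor `.gitignore`.

```python
from pathlib import Path


def find_references(roots: list[Path], needle: str = "celery") -> list[tuple[Path, int, str]]:
    """Return (file, line number, line) for each case-sensitive match under the roots."""
    hits: list[tuple[Path, int, str]] = []
    for root in roots:
        files = [root] if root.is_file() else sorted(p for p in root.rglob("*") if p.is_file())
        for path in files:
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            for number, line in enumerate(text.splitlines(), start=1):
                if needle in line:
                    hits.append((path, number, line.strip()))
    return hits
```

A test could call it with the runtime paths from the `rg` command (`backend/src`, `infra/scripts`, `pyproject.toml`) and assert that the result is empty.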
**Step 4: Run verification suite**
Run:
- `uv run pytest backend/tests/unit -q`
- `uv run pytest backend/tests/integration -q`
- `uv run pytest backend/tests/e2e -q` (if the environment cannot run it, record the reason)
- `uv run ruff check backend/src backend/tests`
- `uv run basedpyright`
- `rg -n "celery" backend/src infra/scripts pyproject.toml`
Expected:
- Tests and static checks pass
- No Celery references on the runtime path
**Step 5: Commit**
```bash
git add docs/runtime/runtime-runbook.md pyproject.toml backend/src infra/scripts backend/tests
git commit -m "refactor(queue): complete one-shot migration from celery to taskiq"
```
### Task 7: L1 Review Gates and delivery confirmation
**Files:**
- No code changes required by default
**Step 1: Run required L1 gate (`refactor-cleaner`)**
Run: use `refactor-cleaner` to review the migrated code for redundant code, dead references, and naming consistency.
Expected: no blocking issues.
**Step 2: Optional `code-reviewer` (recommended for infra switch)**
Run: use `code-reviewer`, focusing on task loss, duplicate consumption, and idempotency-lock logic.
Expected: no CRITICAL/HIGH issues.
**Step 3: Final evidence report**
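The output must include:
- The list of commands executed
- PASS/FAIL for each command
- For anything that could not be run (such as e2e without a suitable environment), the reason and the manual verification steps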
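
The report items above could be collected by a small runner instead of by hand. The following is a sketch only: `CheckResult`, `run_checks`, and `format_report` are hypothetical names, and PASS simply means the command exited with status 0.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    command: list[str]
    passed: bool
    note: str = ""


def run_checks(commands: list[list[str]]) -> list[CheckResult]:
    """Run each command and record whether it exited with status 0."""
    results: list[CheckResult] = []
    for command in commands:
        try:
            completed = subprocess.run(command, capture_output=True, text=True)
        except FileNotFoundError as exc:
            # The tool is not installed; record the reason instead of crashing.
            results.append(CheckResult(command, False, f"could not execute: {exc}"))
            continue
        results.append(CheckResult(command, completed.returncode == 0))
    return results


def format_report(results: list[CheckResult]) -> str:
    lines = []
    for result in results:
        status = "PASS" if result.passed else "FAIL"
        suffix = f" ({result.note})" if result.note else ""
        lines.append(f"{status}  {' '.join(result.command)}{suffix}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(run_checks([[sys.executable, "--version"]])))
```

One caveat: `rg` exits with a non-zero status when it finds no matches, so the Celery reference check needs its result interpreted the other way round before it is reported.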
**Step 4: Commit review notes (optional)**
```bash
git add docs/plans/2026-03-06-taskiq-migration.md
git commit -m "docs(plan): taskiq one-shot migration execution checklist"
```