# Worker Token/Latency Optimization TODO

Date: 2026-03-17
Owner: backend runtime
Status: pending

## Background

- Router cost/latency is acceptable.
- Worker stage (deepseek-chat) has significantly higher input tokens and latency.
- Current optimization work is deferred due to prioritization.

## Observations (from `public.messages`)

- Worker avg input tokens are much higher than router (about 12k+ vs 3k).
- Worker avg latency is much higher than router (about 41s vs 4s).
- Worker cost dominates total cost.

## Root Cause Hypothesis

- Worker ReAct path repeatedly includes full tool schemas per model call.
- `calendar_write` tool schema is large and contributes major prompt overhead.
- Finalize JSON step performs an additional model call after ReAct.

## Deferred Optimization Items

1. Tool schema slimming for calendar write path.
   - Split `calendar_write` into focused tools (`calendar_create`, `calendar_update`, `calendar_delete`).
   - Reduce redundant/verbose field descriptions where possible.
2. Dynamic tool set exposure by routed intent.
   - Only expose tools needed for current task.
3. Evaluate finalize overhead.
   - Verify whether finalize call can be reduced or replaced in specific flows.
4. Add before/after benchmark script.
   - Compare worker `input_tokens`, `latency_ms`, and `cost` for the same scripted multi-turn scenario.

## Acceptance Metrics (target)

- Reduce worker input tokens by >= 30% in multi-turn calendar CRUD scenario.
- Reduce worker p95 latency by >= 25%.
- Keep functional behavior unchanged for agent runs.
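
A minimal sketch of how the before/after benchmark could check the token and latency targets above. The names `WorkerSample`, `p95`, `reduction`, and `compare` are hypothetical, not existing code. It assumes each worker model call yields one record with `input_tokens`, `latency_ms`, and `cost`, that the token target is measured on the mean, and that p95 uses the nearest-rank method. How the records are loaded from `public.messages` is left out.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class WorkerSample:
    """One worker-stage model call (hypothetical record shape)."""
    input_tokens: int
    latency_ms: float
    cost: float


def p95(values):
    """Nearest-rank 95th percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def reduction(before, after):
    """Fractional drop from before to after (0.30 means 30% lower)."""
    return (before - after) / before


def compare(before, after, token_target=0.30, latency_target=0.25):
    """Compare two runs of the same scripted scenario against the targets."""
    token_cut = reduction(
        mean(s.input_tokens for s in before),
        mean(s.input_tokens for s in after),
    )
    latency_cut = reduction(
        p95([s.latency_ms for s in before]),
        p95([s.latency_ms for s in after]),
    )
    cost_cut = reduction(
        sum(s.cost for s in before),
        sum(s.cost for s in after),
    )
    return {
        "token_reduction": token_cut,
        "p95_latency_reduction": latency_cut,
        "cost_reduction": cost_cut,
        "tokens_ok": token_cut >= token_target,
        "latency_ok": latency_cut >= latency_target,
    }


if __name__ == "__main__":
    # Made-up numbers for illustration only, not measured data.
    baseline = [WorkerSample(12000, 40000.0, 0.02), WorkerSample(12000, 42000.0, 0.02)]
    candidate = [WorkerSample(8000, 30000.0, 0.01), WorkerSample(8000, 31000.0, 0.01)]
    for name, value in compare(baseline, candidate).items():
        print(f"{name}: {value}")
```

The third target (unchanged functional behavior) is not covered here. It would need a separate check, for example comparing the final calendar state of both runs.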