7.8 KiB
7.8 KiB
Logging Guidelines
How logging is done in this project.
Overview
This project uses structlog for structured, JSON-formatted logging:
- Library:
structlogwith standard library integration - Interface:
get_logger(name)fromcore.logging - Format: JSON logs in production, readable logs in development
- Sensitive fields: Automatically redacted (passwords, tokens, etc.)
Log Levels
Use appropriate log levels based on event importance:
| Level | When to Use | Examples | Noise Level |
|---|---|---|---|
| ERROR | All exceptions and failures | Database connection failed, unhandled exception | Required, never skip |
| WARNING | Degraded behavior, retry, fallback | Cache miss, using fallback data, retry attempt | Minimal, only when action taken |
| INFO | Key business events | User login, message sent, run started | Minimal, only milestones |
| DEBUG | Detailed flow tracing (dev only) | Function entry/exit, variable values | High, avoid in release |
Structured Logging
Logger Setup
from core.logging import get_logger
logger = get_logger("v1.agent.service")
Module naming convention:
- Feature modules:
v1.<feature>.<component>(e.g.,v1.agent.service) - Core modules:
core.<module>.<submodule>(e.g.,core.logging.middleware)
Log Format
JSON structure:
{
"event": "user_logged_in",
"user_id": "123e4567-e89b-12d3-a456-426614174000",
"timestamp": "2026-04-10T12:34:56.789Z",
"level": "info",
"logger": "v1.agent.service",
"process": 12345,
"thread": 140123456789024
}
Logging Methods
# Simple event
logger.info("user_logged_in", user_id=user.id)
# With extra context
logger.info(
"run_enqueued",
run_id=run_id,
thread_id=thread_id,
user_id=current_user.id,
)
# Error with exception info
try:
await repository.create_session(...)
except IntegrityError as exc:
logger.error(
"session_creation_failed",
error=str(exc),
exc_info=True, # Include stack trace
user_id=user_id,
)
raise
What to Log
Always Log
- Exceptions: Every catch block that handles an exception must log it
- Business milestones: Key domain events (login, run started, message sent)
- Failed operations: Any operation that fails or requires retry
- Auth events: Login/logout, token refresh, auth failures
- External calls: LLM API calls, database queries (at DEBUG level)
Log with Context
# Include relevant identifiers
logger.info(
"run_completed",
run_id=run_id,
thread_id=thread_id,
duration_seconds=duration,
)
# Include error details
logger.error(
"database_error",
operation="create_session",
error=str(exc),
exc_info=True,
)
Log Levels by Event Type
INFO - Business Milestones:
logger.info("user_logged_in", user_id=user.id)
logger.info("run_started", run_id=run_id, thread_id=thread_id)
logger.info("run_completed", run_id=run_id, duration_seconds=duration)
logger.info("message_sent", message_id=msg_id, conversation_id=conv_id)
WARNING - Degraded Behavior:
logger.warning("cache_miss", key=cache_key, fallback="reload_from_db")
logger.warning("retry_attempt", operation="llm_call", attempt=2, max_attempts=3)
logger.warning("fallback_data_used", reason="external_api_timeout")
ERROR - Failures:
logger.error("database_connection_failed", error=str(exc), exc_info=True)
logger.error("validation_failed", field="email", reason="invalid_format")
logger.error("auth_failed", user_id=user_id, reason="invalid_token")
try:
await service.do_operation()
except Exception as exc:
logger.error("operation_failed", error=str(exc), exc_info=True)
raise
What NOT to Log
Forbidden: Sensitive Data
Never log:
- Passwords (even hashed)
- API keys (
ERYAO__*environment variables) - JWT tokens
- Access tokens / refresh tokens
- Session IDs (except for tracing)
- PII (personally identifiable information) - names, emails, phone numbers
- Credit card numbers
- Social security numbers
- Secret keys
- Authorization headers
- Cookies
Forbidden: Debug in Production
# WRONG: Debug logs in release builds
import os
if os.getenv("DEBUG"):
logger.debug("variable_value", var=some_data) # Never do this
Right way: Use runtime log level configuration:
# Config handles log level based on environment
# runtime:
# log_level: "INFO" # production
# log_level: "DEBUG" # development
logger.debug("variable_value", var=some_data) # Will be filtered in production
Forbidden: Iteration Logs
# WRONG: Log every iteration
for item in items:
logger.debug("processing_item", item_id=item.id) # Too noisy
process(item)
Right way: Log only failures:
for item in items:
try:
process(item)
except Exception as exc:
logger.error("item_processing_failed", item_id=item.id, error=str(exc))
Configuration
Settings
# core.config.settings.RuntimeSettings
class RuntimeSettings(BaseModel):
log_level: str = "INFO" # DEBUG, INFO, WARNING, ERROR
log_json: bool = True # JSON format (production) vs readable (dev)
log_rotation: Literal["time", "size", "none"] = "time"
log_rotation_when: str = "midnight" # Daily rotation
log_rotation_interval: int = 1
log_rotation_backup_count: int = 14 # Keep 14 days
log_file_name: str = "" # Auto-generated: "{service_name}.log"
log_sensitive_fields: list[str] = [
"password", "secret", "token", "api_key",
"authorization", "cookie", "client_ip", "user_id",
]
Log Files
Location: logs/{service_name}.log and logs/errors/{service_name}.error.log
Example:
logs/app.log- All logslogs/errors/app.error.log- Errors only
Common Mistakes
❌ Using print() instead of logging
# WRONG: Never use print in runtime code
print(f"User {user_id} logged in")
Right: Use logger.info:
logger.info("user_logged_in", user_id=user_id)
❌ Logging sensitive data
# WRONG: Logging password
logger.info("user_login", username=username, password=password)
Right: Exclude sensitive fields:
logger.info("user_login", username=username) # Password excluded
❌ Skipping error logging
# WRONG: Exception is suppressed
try:
await service.do_something()
except Exception:
pass # No logging, no re-raise
Right: Log and re-raise:
try:
await service.do_something()
except Exception as exc:
logger.error("operation_failed", error=str(exc), exc_info=True)
raise
❌ Info flooding
# WRONG: Log every database query
results = await repository.get_all()
logger.info("query_completed", count=len(results)) # Too noisy
Right: Log only significant operations:
results = await repository.get_all()
# Only log if something notable happens
if len(results) > MAX_THRESHOLD:
logger.warning("high_result_count", count=len(results))
Examples from Codebase
Service Layer Logging
# v1/agent/service.py
logger = get_logger(__name__)
async def enqueue_run(self, ...) -> TaskAccepted:
logger.info(
"run_enqueued",
run_id=run_id,
thread_id=thread_id,
user_id=current_user.id,
)
# ...
Router Layer Logging
# v1/agent/router.py
logger = get_logger("v1.agent.router")
async def _acquire_sse_slot(*, user_id: str) -> bool:
try:
redis = await get_or_init_redis_client()
# ...
except Exception as exc: # noqa: BLE001
logger.warning(
"sse_slot_acquire_failed",
user_id=user_id,
reason=str(exc),
)
return True # Graceful degradation