# Logging Guidelines > How logging is done in this project. --- ## Overview This project uses **structlog** for structured, JSON-formatted logging: - **Library**: `structlog` with standard library integration - **Interface**: `get_logger(name)` from `core.logging` - **Format**: JSON logs in production, readable logs in development - **Sensitive fields**: Automatically redacted (passwords, tokens, etc.) --- ## Log Levels Use appropriate log levels based on event importance: | Level | When to Use | Examples | Noise Level | |-------|-------------|----------|-------------| | **ERROR** | All exceptions and failures | Database connection failed, unhandled exception | Required, never skip | | **WARNING** | Degraded behavior, retry, fallback | Cache miss, using fallback data, retry attempt | Minimal, only when action taken | | **INFO** | Key business events | User login, message sent, run started | Minimal, only milestones | | **DEBUG** | Detailed flow tracing (dev only) | Function entry/exit, variable values | High, avoid in release | --- ## Structured Logging ### Logger Setup ```python from core.logging import get_logger logger = get_logger("v1.agent.service") ``` **Module naming convention:** - Feature modules: `v1..` (e.g., `v1.agent.service`) - Core modules: `core..` (e.g., `core.logging.middleware`) ### Log Format **JSON structure:** ```json { "event": "user_logged_in", "user_id": "123e4567-e89b-12d3-a456-426614174000", "timestamp": "2026-04-10T12:34:56.789Z", "level": "info", "logger": "v1.agent.service", "process": 12345, "thread": 140123456789024 } ``` ### Logging Methods ```python # Simple event logger.info("user_logged_in", user_id=user.id) # With extra context logger.info( "run_enqueued", run_id=run_id, thread_id=thread_id, user_id=current_user.id, ) # Error with exception info try: await repository.create_session(...) except IntegrityError as exc: logger.error( "session_creation_failed", error=str(exc), exc_info=True, # Include stack trace user_id=user_id, ) raise ``` --- ## What to Log ### Always Log 1. **Exceptions**: Every catch block that handles an exception must log it 2. **Business milestones**: Key domain events (login, run started, message sent) 3. **Failed operations**: Any operation that fails or requires retry 4. **Auth events**: Login/logout, token refresh, auth failures 5. **External calls**: LLM API calls, database queries (at DEBUG level) ### Log with Context ```python # Include relevant identifiers logger.info( "run_completed", run_id=run_id, thread_id=thread_id, duration_seconds=duration, ) # Include error details logger.error( "database_error", operation="create_session", error=str(exc), exc_info=True, ) ``` ### Log Levels by Event Type **INFO - Business Milestones:** ```python logger.info("user_logged_in", user_id=user.id) logger.info("run_started", run_id=run_id, thread_id=thread_id) logger.info("run_completed", run_id=run_id, duration_seconds=duration) logger.info("message_sent", message_id=msg_id, conversation_id=conv_id) ``` **WARNING - Degraded Behavior:** ```python logger.warning("cache_miss", key=cache_key, fallback="reload_from_db") logger.warning("retry_attempt", operation="llm_call", attempt=2, max_attempts=3) logger.warning("fallback_data_used", reason="external_api_timeout") ``` **ERROR - Failures:** ```python logger.error("database_connection_failed", error=str(exc), exc_info=True) logger.error("validation_failed", field="email", reason="invalid_format") logger.error("auth_failed", user_id=user_id, reason="invalid_token") try: await service.do_operation() except Exception as exc: logger.error("operation_failed", error=str(exc), exc_info=True) raise ``` --- ## What NOT to Log ### Forbidden: Sensitive Data **Never log:** - Passwords (even hashed) - API keys (`ERYAO__*` environment variables) - JWT tokens - Access tokens / refresh tokens - Session IDs (except for tracing) - PII (personally identifiable information) - names, emails, phone numbers - Credit card numbers - Social security numbers - Secret keys - Authorization headers - Cookies ### Forbidden: Debug in Production ```python # WRONG: Debug logs in release builds import os if os.getenv("DEBUG"): logger.debug("variable_value", var=some_data) # Never do this ``` **Right way:** Use runtime log level configuration: ```python # Config handles log level based on environment # runtime: # log_level: "INFO" # production # log_level: "DEBUG" # development logger.debug("variable_value", var=some_data) # Will be filtered in production ``` ### Forbidden: Iteration Logs ```python # WRONG: Log every iteration for item in items: logger.debug("processing_item", item_id=item.id) # Too noisy process(item) ``` **Right way:** Log only failures: ```python for item in items: try: process(item) except Exception as exc: logger.error("item_processing_failed", item_id=item.id, error=str(exc)) ``` --- ## Configuration ### Settings ```python # core.config.settings.RuntimeSettings class RuntimeSettings(BaseModel): log_level: str = "INFO" # DEBUG, INFO, WARNING, ERROR log_json: bool = True # JSON format (production) vs readable (dev) log_rotation: Literal["time", "size", "none"] = "time" log_rotation_when: str = "midnight" # Daily rotation log_rotation_interval: int = 1 log_rotation_backup_count: int = 14 # Keep 14 days log_file_name: str = "" # Auto-generated: "{service_name}.log" log_sensitive_fields: list[str] = [ "password", "secret", "token", "api_key", "authorization", "cookie", "client_ip", "user_id", ] ``` ### Log Files **Location:** `logs/{service_name}.log` and `logs/errors/{service_name}.error.log` **Example:** - `logs/app.log` - All logs - `logs/errors/app.error.log` - Errors only --- ## Common Mistakes ### ❌ Using `print()` instead of logging ```python # WRONG: Never use print in runtime code print(f"User {user_id} logged in") ``` **Right:** Use `logger.info`: ```python logger.info("user_logged_in", user_id=user_id) ``` ### ❌ Logging sensitive data ```python # WRONG: Logging password logger.info("user_login", username=username, password=password) ``` **Right:** Exclude sensitive fields: ```python logger.info("user_login", username=username) # Password excluded ``` ### ❌ Skipping error logging ```python # WRONG: Exception is suppressed try: await service.do_something() except Exception: pass # No logging, no re-raise ``` **Right:** Log and re-raise: ```python try: await service.do_something() except Exception as exc: logger.error("operation_failed", error=str(exc), exc_info=True) raise ``` ### ❌ Info flooding ```python # WRONG: Log every database query results = await repository.get_all() logger.info("query_completed", count=len(results)) # Too noisy ``` **Right:** Log only significant operations: ```python results = await repository.get_all() # Only log if something notable happens if len(results) > MAX_THRESHOLD: logger.warning("high_result_count", count=len(results)) ``` --- ## Examples from Codebase ### Service Layer Logging ```python # v1/agent/service.py logger = get_logger(__name__) async def enqueue_run(self, ...) -> TaskAccepted: logger.info( "run_enqueued", run_id=run_id, thread_id=thread_id, user_id=current_user.id, ) # ... ``` ### Router Layer Logging ```python # v1/agent/router.py logger = get_logger("v1.agent.router") async def _acquire_sse_slot(*, user_id: str) -> bool: try: redis = await get_or_init_redis_client() # ... except Exception as exc: # noqa: BLE001 logger.warning( "sse_slot_acquire_failed", user_id=user_id, reason=str(exc), ) return True # Graceful degradation ```