340 lines
7.8 KiB
Markdown
340 lines
7.8 KiB
Markdown
# Logging Guidelines
|
|
|
|
> How logging is done in this project.
|
|
|
|
---
|
|
|
|
## Overview
|
|
|
|
This project uses **structlog** for structured, JSON-formatted logging:
|
|
|
|
- **Library**: `structlog` with standard library integration
|
|
- **Interface**: `get_logger(name)` from `core.logging`
|
|
- **Format**: JSON logs in production, readable logs in development
|
|
- **Sensitive fields**: Automatically redacted (passwords, tokens, etc.)
|
|
|
|
---
|
|
|
|
## Log Levels
|
|
|
|
Use appropriate log levels based on event importance:
|
|
|
|
| Level | When to Use | Examples | Noise Level |
|
|
|-------|-------------|----------|-------------|
|
|
| **ERROR** | All exceptions and failures | Database connection failed, unhandled exception | Required, never skip |
|
|
| **WARNING** | Degraded behavior, retry, fallback | Cache miss, using fallback data, retry attempt | Minimal, only when action taken |
|
|
| **INFO** | Key business events | User login, message sent, run started | Minimal, only milestones |
|
|
| **DEBUG** | Detailed flow tracing (dev only) | Function entry/exit, variable values | High, avoid in release |
|
|
|
|
---
|
|
|
|
## Structured Logging
|
|
|
|
### Logger Setup
|
|
|
|
```python
|
|
from core.logging import get_logger
|
|
|
|
logger = get_logger("v1.agent.service")
|
|
```
|
|
|
|
**Module naming convention:**
|
|
- Feature modules: `v1.<feature>.<component>` (e.g., `v1.agent.service`)
|
|
- Core modules: `core.<module>.<submodule>` (e.g., `core.logging.middleware`)
|
|
|
|
### Log Format
|
|
|
|
**JSON structure:**
|
|
|
|
```json
|
|
{
|
|
"event": "user_logged_in",
|
|
"user_id": "123e4567-e89b-12d3-a456-426614174000",
|
|
"timestamp": "2026-04-10T12:34:56.789Z",
|
|
"level": "info",
|
|
"logger": "v1.agent.service",
|
|
"process": 12345,
|
|
"thread": 140123456789024
|
|
}
|
|
```
|
|
|
|
### Logging Methods
|
|
|
|
```python
|
|
# Simple event
|
|
logger.info("user_logged_in", user_id=user.id)
|
|
|
|
# With extra context
|
|
logger.info(
|
|
"run_enqueued",
|
|
run_id=run_id,
|
|
thread_id=thread_id,
|
|
user_id=current_user.id,
|
|
)
|
|
|
|
# Error with exception info
|
|
try:
|
|
await repository.create_session(...)
|
|
except IntegrityError as exc:
|
|
logger.error(
|
|
"session_creation_failed",
|
|
error=str(exc),
|
|
exc_info=True, # Include stack trace
|
|
user_id=user_id,
|
|
)
|
|
raise
|
|
```
|
|
|
|
---
|
|
|
|
## What to Log
|
|
|
|
### Always Log
|
|
|
|
1. **Exceptions**: Every catch block that handles an exception must log it
|
|
2. **Business milestones**: Key domain events (login, run started, message sent)
|
|
3. **Failed operations**: Any operation that fails or requires retry
|
|
4. **Auth events**: Login/logout, token refresh, auth failures
|
|
5. **External calls**: LLM API calls, database queries (at DEBUG level)
|
|
|
|
### Log with Context
|
|
|
|
```python
|
|
# Include relevant identifiers
|
|
logger.info(
|
|
"run_completed",
|
|
run_id=run_id,
|
|
thread_id=thread_id,
|
|
duration_seconds=duration,
|
|
)
|
|
|
|
# Include error details
|
|
logger.error(
|
|
"database_error",
|
|
operation="create_session",
|
|
error=str(exc),
|
|
exc_info=True,
|
|
)
|
|
```
|
|
|
|
### Log Levels by Event Type
|
|
|
|
**INFO - Business Milestones:**
|
|
```python
|
|
logger.info("user_logged_in", user_id=user.id)
|
|
logger.info("run_started", run_id=run_id, thread_id=thread_id)
|
|
logger.info("run_completed", run_id=run_id, duration_seconds=duration)
|
|
logger.info("message_sent", message_id=msg_id, conversation_id=conv_id)
|
|
```
|
|
|
|
**WARNING - Degraded Behavior:**
|
|
```python
|
|
logger.warning("cache_miss", key=cache_key, fallback="reload_from_db")
|
|
logger.warning("retry_attempt", operation="llm_call", attempt=2, max_attempts=3)
|
|
logger.warning("fallback_data_used", reason="external_api_timeout")
|
|
```
|
|
|
|
**ERROR - Failures:**
|
|
```python
|
|
logger.error("database_connection_failed", error=str(exc), exc_info=True)
|
|
logger.error("validation_failed", field="email", reason="invalid_format")
|
|
logger.error("auth_failed", user_id=user_id, reason="invalid_token")
|
|
try:
|
|
await service.do_operation()
|
|
except Exception as exc:
|
|
logger.error("operation_failed", error=str(exc), exc_info=True)
|
|
raise
|
|
```
|
|
|
|
---
|
|
|
|
## What NOT to Log
|
|
|
|
### Forbidden: Sensitive Data
|
|
|
|
**Never log:**
|
|
- Passwords (even hashed)
|
|
- API keys (`ERYAO__*` environment variables)
|
|
- JWT tokens
|
|
- Access tokens / refresh tokens
|
|
- Session IDs (except for tracing)
|
|
- PII (personally identifiable information) - names, emails, phone numbers
|
|
- Credit card numbers
|
|
- Social security numbers
|
|
- Secret keys
|
|
- Authorization headers
|
|
- Cookies
|
|
|
|
### Forbidden: Debug in Production
|
|
|
|
```python
|
|
# WRONG: Debug logs in release builds
|
|
import os
|
|
if os.getenv("DEBUG"):
|
|
logger.debug("variable_value", var=some_data) # Never do this
|
|
```
|
|
|
|
**Right way:** Use runtime log level configuration:
|
|
|
|
```python
|
|
# Config handles log level based on environment
|
|
# runtime:
|
|
# log_level: "INFO" # production
|
|
# log_level: "DEBUG" # development
|
|
logger.debug("variable_value", var=some_data) # Will be filtered in production
|
|
```
|
|
|
|
### Forbidden: Iteration Logs
|
|
|
|
```python
|
|
# WRONG: Log every iteration
|
|
for item in items:
|
|
logger.debug("processing_item", item_id=item.id) # Too noisy
|
|
process(item)
|
|
```
|
|
|
|
**Right way:** Log only failures:
|
|
|
|
```python
|
|
for item in items:
|
|
try:
|
|
process(item)
|
|
except Exception as exc:
|
|
logger.error("item_processing_failed", item_id=item.id, error=str(exc))
|
|
```
|
|
|
|
---
|
|
|
|
## Configuration
|
|
|
|
### Settings
|
|
|
|
```python
|
|
# core.config.settings.RuntimeSettings
|
|
class RuntimeSettings(BaseModel):
|
|
log_level: str = "INFO" # DEBUG, INFO, WARNING, ERROR
|
|
log_json: bool = True # JSON format (production) vs readable (dev)
|
|
log_rotation: Literal["time", "size", "none"] = "time"
|
|
log_rotation_when: str = "midnight" # Daily rotation
|
|
log_rotation_interval: int = 1
|
|
log_rotation_backup_count: int = 14 # Keep 14 days
|
|
log_file_name: str = "" # Auto-generated: "{service_name}.log"
|
|
log_sensitive_fields: list[str] = [
|
|
"password", "secret", "token", "api_key",
|
|
"authorization", "cookie", "client_ip", "user_id",
|
|
]
|
|
```
|
|
|
|
### Log Files
|
|
|
|
**Location:** `logs/{service_name}.log` and `logs/errors/{service_name}.error.log`
|
|
|
|
**Example:**
|
|
- `logs/app.log` - All logs
|
|
- `logs/errors/app.error.log` - Errors only
|
|
|
|
---
|
|
|
|
## Common Mistakes
|
|
|
|
### ❌ Using `print()` instead of logging
|
|
|
|
```python
|
|
# WRONG: Never use print in runtime code
|
|
print(f"User {user_id} logged in")
|
|
```
|
|
|
|
**Right:** Use `logger.info`:
|
|
|
|
```python
|
|
logger.info("user_logged_in", user_id=user_id)
|
|
```
|
|
|
|
### ❌ Logging sensitive data
|
|
|
|
```python
|
|
# WRONG: Logging password
|
|
logger.info("user_login", username=username, password=password)
|
|
```
|
|
|
|
**Right:** Exclude sensitive fields:
|
|
|
|
```python
|
|
logger.info("user_login", username=username) # Password excluded
|
|
```
|
|
|
|
### ❌ Skipping error logging
|
|
|
|
```python
|
|
# WRONG: Exception is suppressed
|
|
try:
|
|
await service.do_something()
|
|
except Exception:
|
|
pass # No logging, no re-raise
|
|
```
|
|
|
|
**Right:** Log and re-raise:
|
|
|
|
```python
|
|
try:
|
|
await service.do_something()
|
|
except Exception as exc:
|
|
logger.error("operation_failed", error=str(exc), exc_info=True)
|
|
raise
|
|
```
|
|
|
|
### ❌ Info flooding
|
|
|
|
```python
|
|
# WRONG: Log every database query
|
|
results = await repository.get_all()
|
|
logger.info("query_completed", count=len(results)) # Too noisy
|
|
```
|
|
|
|
**Right:** Log only significant operations:
|
|
|
|
```python
|
|
results = await repository.get_all()
|
|
# Only log if something notable happens
|
|
if len(results) > MAX_THRESHOLD:
|
|
logger.warning("high_result_count", count=len(results))
|
|
```
|
|
|
|
---
|
|
|
|
## Examples from Codebase
|
|
|
|
### Service Layer Logging
|
|
|
|
```python
|
|
# v1/agent/service.py
|
|
logger = get_logger(__name__)
|
|
|
|
async def enqueue_run(self, ...) -> TaskAccepted:
|
|
logger.info(
|
|
"run_enqueued",
|
|
run_id=run_id,
|
|
thread_id=thread_id,
|
|
user_id=current_user.id,
|
|
)
|
|
# ...
|
|
```
|
|
|
|
### Router Layer Logging
|
|
|
|
```python
|
|
# v1/agent/router.py
|
|
logger = get_logger("v1.agent.router")
|
|
|
|
async def _acquire_sse_slot(*, user_id: str) -> bool:
|
|
try:
|
|
redis = await get_or_init_redis_client()
|
|
# ...
|
|
except Exception as exc: # noqa: BLE001
|
|
logger.warning(
|
|
"sse_slot_acquire_failed",
|
|
user_id=user_id,
|
|
reason=str(exc),
|
|
)
|
|
return True # Graceful degradation
|
|
``` |