Production hardening: monitoring, logging, error handling, security #25

Closed

opened 2026-02-09 21:06:26 +00:00 by mik-tf · 1 comment

mik-tf commented

2026-02-09 21:06:26 +00:00

Owner

Overview

Harden the marketplace for production deployment with comprehensive monitoring, structured logging, proper error handling, rate limiting, and security best practices.

Areas

Monitoring & Observability

Prometheus metrics endpoint (/metrics)
Health check endpoint (/health) with dependency checks
Request latency histograms
Error rate tracking
hero_osis connection pool monitoring
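
A minimal sketch of the latency histogram item, assuming nothing about the marketplace's web framework (the issue does not name one). `LatencyHistogram` and the metric name are hypothetical, and only the standard library is used. It renders cumulative buckets in the Prometheus text exposition format, which is what a `/metrics` handler would return. A real deployment would more likely use an existing metrics crate than hand-rolled code.

```rust
use std::fmt::Write;

/// Cumulative latency histogram in the Prometheus style (hypothetical type).
pub struct LatencyHistogram {
    /// Bucket upper bounds in seconds, sorted ascending.
    bounds: Vec<f64>,
    /// counts[i] = observations <= bounds[i]; the last slot is the +Inf bucket.
    counts: Vec<u64>,
    /// Sum of all observed values, in seconds.
    sum: f64,
}

impl LatencyHistogram {
    pub fn new(mut bounds: Vec<f64>) -> Self {
        bounds.sort_by(|a, b| a.total_cmp(b));
        let counts = vec![0; bounds.len() + 1];
        Self { bounds, counts, sum: 0.0 }
    }

    /// Record one request duration in seconds.
    pub fn observe(&mut self, seconds: f64) {
        // Buckets are cumulative: increment every bucket whose bound covers the value.
        for (i, bound) in self.bounds.iter().enumerate() {
            if seconds <= *bound {
                self.counts[i] += 1;
            }
        }
        // The +Inf bucket counts every observation.
        *self.counts.last_mut().expect("the +Inf bucket always exists") += 1;
        self.sum += seconds;
    }

    /// Render the histogram in the Prometheus text exposition format.
    pub fn render(&self, name: &str) -> String {
        let mut out = String::new();
        let _ = writeln!(out, "# TYPE {} histogram", name);
        for (bound, count) in self.bounds.iter().zip(&self.counts) {
            let _ = writeln!(out, "{}_bucket{{le=\"{}\"}} {}", name, bound, count);
        }
        let total = self.counts[self.bounds.len()];
        let _ = writeln!(out, "{}_bucket{{le=\"+Inf\"}} {}", name, total);
        let _ = writeln!(out, "{}_sum {}", name, self.sum);
        let _ = writeln!(out, "{}_count {}", name, total);
        out
    }
}
```

Usage would be one `observe` call per request (for example from middleware) and one `render` call per scrape. Bucket bounds such as `[0.1, 0.5, 1.0]` are placeholders to be tuned against real traffic.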

Structured Logging

Replace println! / log::info! with structured JSON logging
Request tracing with correlation IDs
Log levels configurable per module
Log aggregation-friendly format
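
To illustrate what "structured JSON logging" with correlation IDs could look like on the wire, here is a std-only sketch. The field names (`ts`, `level`, `target`, `correlation_id`, `message`) and the function names are assumptions, not an agreed schema. In practice a logging crate with a JSON formatter would probably replace this code; the sketch only shows the one-object-per-line shape that log aggregators expect.

```rust
use std::fmt::Write;

/// Escape a string so it can be embedded inside a JSON string literal.
fn escape_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => {
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out
}

/// Build one JSON log line (hypothetical schema). `target` is the module
/// path, `correlation_id` ties together all lines of a single request.
pub fn log_line(
    ts_unix_ms: u64,
    level: &str,
    target: &str,
    correlation_id: &str,
    message: &str,
) -> String {
    format!(
        "{{\"ts\":{},\"level\":\"{}\",\"target\":\"{}\",\"correlation_id\":\"{}\",\"message\":\"{}\"}}",
        ts_unix_ms,
        escape_json(level),
        escape_json(target),
        escape_json(correlation_id),
        escape_json(message),
    )
}
```

Each call yields a single line with no embedded newlines, so a line-based shipper can forward it unchanged. The correlation ID would typically be generated (or read from an incoming header) once per request and passed down to every log call.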

Error Handling

Consistent error response format (JSON API errors)
Proper HTTP status codes for all error conditions
User-friendly error pages for template-rendered routes
Error reporting/alerting integration
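
One possible shape for the consistent error format, as a sketch only. The `ApiError` variants, the `code` strings, and the JSON envelope are all assumptions; the issue does not fix a schema, and "JSON API errors" could equally mean the JSON:API specification. The point is that every error maps to exactly one status code and one body layout.

```rust
/// Hypothetical application error type; variants are illustrative.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ApiError {
    Unauthorized,
    NotFound,
    /// `field` is a compile-time constant, so it needs no JSON escaping.
    Validation { field: &'static str },
    RateLimited { retry_after_secs: u64 },
    Internal,
}

impl ApiError {
    /// HTTP status code for this error.
    pub fn status(&self) -> u16 {
        match self {
            ApiError::Unauthorized => 401,
            ApiError::NotFound => 404,
            ApiError::Validation { .. } => 422,
            ApiError::RateLimited { .. } => 429,
            ApiError::Internal => 500,
        }
    }

    /// Stable machine-readable code for clients.
    pub fn code(&self) -> &'static str {
        match self {
            ApiError::Unauthorized => "unauthorized",
            ApiError::NotFound => "not_found",
            ApiError::Validation { .. } => "validation_failed",
            ApiError::RateLimited { .. } => "rate_limited",
            ApiError::Internal => "internal_error",
        }
    }

    /// Human-readable message; internal errors deliberately carry no detail.
    pub fn message(&self) -> String {
        match self {
            ApiError::Unauthorized => "authentication required".to_string(),
            ApiError::NotFound => "resource not found".to_string(),
            ApiError::Validation { field } => format!("invalid value for field {}", field),
            ApiError::RateLimited { retry_after_secs } => {
                format!("too many requests, retry in {} seconds", retry_after_secs)
            }
            ApiError::Internal => "internal server error".to_string(),
        }
    }

    /// Serialize to the (assumed) common error envelope.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"error\":{{\"status\":{},\"code\":\"{}\",\"message\":\"{}\"}}}}",
            self.status(),
            self.code(),
            self.message()
        )
    }
}
```

Template-rendered routes could reuse `status()` and `message()` to pick an error page, so JSON and HTML responses stay in agreement.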

Security

Rate limiting on authentication endpoints
CSRF protection on state-changing operations
Input validation on all user-facing endpoints
Content Security Policy headers
Dependency audit (cargo audit)
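
For the rate limiting item, a fixed-window limiter keyed by client (for example IP address or username) is probably the simplest starting point. The sketch below is std-only; `RateLimiter` and its limits are hypothetical, and the current time is passed in explicitly so the logic can be tested without sleeping. A token bucket or sliding window would smooth out bursts at window boundaries, at the cost of more state.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Fixed-window rate limiter (hypothetical type), one window per key.
pub struct RateLimiter {
    max_attempts: u32,
    window: Duration,
    /// key -> (start of the current window, attempts made in that window)
    attempts: HashMap<String, (Instant, u32)>,
}

impl RateLimiter {
    pub fn new(max_attempts: u32, window: Duration) -> Self {
        Self { max_attempts, window, attempts: HashMap::new() }
    }

    /// Returns true if the attempt is allowed, false if the key is over its limit.
    pub fn check(&mut self, key: &str, now: Instant) -> bool {
        let entry = self.attempts.entry(key.to_string()).or_insert((now, 0));
        // Start a fresh window once the previous one has fully elapsed.
        if now.saturating_duration_since(entry.0) >= self.window {
            *entry = (now, 0);
        }
        if entry.1 < self.max_attempts {
            entry.1 += 1;
            true
        } else {
            false
        }
    }
}
```

A login handler would call `check` before verifying credentials and answer with HTTP 429 when it returns false. This sketch never evicts old keys and is not shared across processes, so a production version would need periodic cleanup and, with multiple instances, a shared store.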

Performance

Connection pooling for hero_osis
Response compression (gzip/brotli)
Static asset caching headers
Database query optimization (if PostgreSQL is used)
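
As an illustration of the static asset caching item, the sketch below picks a `Cache-Control` value from the request path. The policy itself is an assumption: it presumes that fingerprinted assets carry a hex hash of at least 8 characters before the extension (such as `app.3f9a1c2b.js`), that other static files may be cached for an hour, and that everything else must be revalidated. The function names and the extension list are hypothetical.

```rust
/// True if the file name looks like `name.<hex hash>.ext` (assumed convention).
fn is_fingerprinted(file_name: &str) -> bool {
    let parts: Vec<&str> = file_name.split('.').collect();
    if parts.len() < 3 {
        return false;
    }
    let hash = parts[parts.len() - 2];
    hash.len() >= 8 && hash.chars().all(|c| c.is_ascii_hexdigit())
}

/// Choose a Cache-Control header value for a request path.
pub fn cache_control_for(path: &str) -> &'static str {
    // Last path segment; `rsplit` always yields at least one item.
    let file_name = path.rsplit('/').next().unwrap_or(path);
    let ext = file_name.rsplit_once('.').map(|(_, e)| e).unwrap_or("");
    match ext {
        "css" | "js" | "png" | "jpg" | "svg" | "woff2" | "ico" => {
            if is_fingerprinted(file_name) {
                // Content-addressed: the URL changes whenever the file changes.
                "public, max-age=31536000, immutable"
            } else {
                "public, max-age=3600"
            }
        }
        // Dynamic pages: cacheable only with revalidation.
        _ => "no-cache",
    }
}
```

Long-lived caching is only safe for fingerprinted files; without a hash in the name, a one-year `max-age` would pin users to stale assets after a deploy.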

Acceptance Criteria

Health endpoint reports all dependency statuses
Prometheus metrics available at /metrics
All errors return consistent JSON format
Rate limiting active on login endpoint
cargo audit passes with no critical vulnerabilities
Structured JSON logging in production mode
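
The first criterion could be verified against a response shape like the one sketched here. The dependency names, the `up`/`down` vocabulary, and the choice of 503 for a degraded service are assumptions; the issue only requires that all dependency statuses are reported. How each check is actually performed (for example a ping to hero_osis) is out of scope for this sketch.

```rust
/// Status of a single dependency or of the service as a whole.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Up,
    Down,
}

impl Status {
    fn as_str(self) -> &'static str {
        match self {
            Status::Up => "up",
            Status::Down => "down",
        }
    }
}

/// Result of one dependency check. Names are compile-time constants,
/// so they are embedded in JSON without escaping.
pub struct DependencyCheck {
    pub name: &'static str,
    pub status: Status,
}

/// The service is up only if every dependency is up.
pub fn overall(checks: &[DependencyCheck]) -> Status {
    if checks.iter().all(|c| c.status == Status::Up) {
        Status::Up
    } else {
        Status::Down
    }
}

/// HTTP status for the /health response: 200 when healthy, 503 otherwise.
pub fn http_status(checks: &[DependencyCheck]) -> u16 {
    match overall(checks) {
        Status::Up => 200,
        Status::Down => 503,
    }
}

/// JSON body listing the overall status and every dependency status.
pub fn render_json(checks: &[DependencyCheck]) -> String {
    let deps: Vec<String> = checks
        .iter()
        .map(|c| format!("\"{}\":\"{}\"", c.name, c.status.as_str()))
        .collect();
    format!(
        "{{\"status\":\"{}\",\"dependencies\":{{{}}}}}",
        overall(checks).as_str(),
        deps.join(",")
    )
}
```

Returning a non-200 status when any dependency is down lets load balancers and orchestrators act on the result without parsing the body.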

Dependencies

Requires Phase 1 to be complete, so that this work starts from a clean architecture baseline
