
📝 Postmortems Timeline

All recorded incidents, outages, and lessons learned are documented here, ordered by date (newest last).
Each entry links to its detailed postmortem report.


🗓️ 2025

September

  • 2025‑09‑03: Data Center Outage
    A critical data center incident that caused widespread downtime and impacted all production environments.

  • 2025‑09‑04: Slow Query Performance
    Database query inefficiencies led to degraded API latency and service slowness.

  • 2025‑09‑13: SSO Integration Problem
    Authentication layer failed due to expired tokens and a configuration mismatch in the SSO provider.

  • 2025‑09‑20: 500 Error on Vehicle Addition
    API malfunction during vehicle registration flow triggered 500 responses and blocked user submissions.

  • 2025‑09‑24: Nginx Upstream DNS Resolve Failed
    Upstream name resolution failures at the Nginx layer caused intermittent 502 Bad Gateway errors.


October

  • 2025‑10‑05: Upstream Worker Errors
    Service instability caused by insufficient worker pool capacity; resolved after scaling up workers.

  • 2025‑10‑05: Fava 429 Problem on Edge IP
    External rate‑limiting (HTTP 429) by Fava on IP 81.12.28.40 caused a partial outage across dependent APIs.

  • 2025‑10‑07: DNS Resolve / Nameserver Problem
    Nameservers were not properly configured after the DNS provider migration, leading to global DNS downtime.

  • 2025‑10‑09: Fava Storage Problem
    Storage subsystem failure at Fava caused a complete API outage; fixed after infrastructure restoration.