The challenge
Care providers using the platform needed alerts the moment patient telemetry crossed threshold. The legacy ingestion path used a polling model that pushed alert latency into the 10-30 second range during peak hours — far slower than the clinical SLAs the platform sold against.
Method
- Replaced the polling ingestion with an event-driven pipeline (RabbitMQ + worker fleet).
- Instrumented the entire path with Grafana, with synthetic alerts catching regressions before they hit production traffic.
- Validated sub-second p95 alert latency under realistic load with shadow traffic.
- Migrated existing customers in stages with zero downtime.
Outcome
- Alert latency: <1 second end-to-end (was 10-30s)
- Throughput: +40% over the previous architecture
- Zero downtime during the cutover
- Audit logs for every alert path — meeting clinical compliance requirements
Stack
Python · FastAPI · PostgreSQL · RabbitMQ · Redis · AWS · Grafana · Sentry