The challenge
The client, a Series B martech/SEO platform, had a hot endpoint that aggregated tens of thousands of records per request. As customer accounts grew, p95 latency drifted from ~300ms to ~800ms, and power users started seeing dashboard timeouts.
The team had already added caching and pagination. The remaining bottleneck was the query path itself, and the schema couldn't be changed without breaking downstream integrations.
Method
- Profiled the hottest endpoint with realistic production payloads. Three queries were responsible for 80% of the latency.
- Rewrote the read path to use composite indexes and a read replica, with Redis caching on the highest-cardinality lookups.
- Ran shadow traffic in production for two weeks before promoting the new path.
- Cut infra spend by ~30% by removing redundant in-memory caches and right-sizing replica instances.
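To make the caching and shadow-traffic steps concrete, here is a minimal sketch and not the client's actual code. It assumes a cache-aside read in front of the replica query and a shadow wrapper that always serves the existing path while comparing the new one on the side. Every name in it (DictCache, cached_read, serve_with_shadow) is hypothetical, and the dict-backed cache stands in for Redis so the example runs without a server.

```python
import logging
from typing import Callable, Dict, List, Optional

logger = logging.getLogger("shadow")

Result = Dict[str, int]


class DictCache:
    """In-memory stand-in for Redis (assumption: same get/set shape)."""

    def __init__(self) -> None:
        self._data: Dict[str, Result] = {}

    def get(self, key: str) -> Optional[Result]:
        return self._data.get(key)

    def set(self, key: str, value: Result) -> None:
        self._data[key] = value


def cached_read(cache: DictCache, key: str, query: Callable[[str], Result]) -> Result:
    """Cache-aside: try the cache, fall back to the query, then populate."""
    hit = cache.get(key)
    if hit is not None:
        return hit
    value = query(key)
    cache.set(key, value)
    return value


def serve_with_shadow(
    key: str,
    old_path: Callable[[str], Result],
    new_path: Callable[[str], Result],
    mismatches: List[str],
) -> Result:
    """Serve from the old path; run the new path on the side and record diffs."""
    primary = old_path(key)
    try:
        if new_path(key) != primary:
            mismatches.append(key)
    except Exception:
        # A failure in the shadow path must never affect the real response.
        logger.exception("shadow path failed for %s", key)
        mismatches.append(key)
    return primary
```

In a setup like this, the new path would only be promoted once the mismatch list stays empty over a representative window. A real deployment would more likely run the shadow call asynchronously so it adds no latency to the served request.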
Outcome
- p95 latency: ~800ms → <300ms on the hot path
- Throughput: 100K+ daily operations sustained without horizontal scaling
- Cost: ~30% reduction in monthly cloud spend
- Zero breaking changes to public API contracts
Stack
Python · FastAPI · PostgreSQL · Redis · AWS ECS · Terraform · Grafana