TL;DR
A 502 Bad Gateway error means a server acting as a gateway or proxy received an invalid response from the upstream server it was forwarding the request to. The gateway itself is reachable, but whatever was supposed to handle the request behind it didn’t return a usable HTTP response. The user sees the gateway’s error page; the actual problem is one layer deeper.
If you’re a visitor: waiting 30 seconds and refreshing, switching networks, or trying a different DNS resolver clears the visible symptom in roughly 60% of cases. The other 40% are real upstream outages that no client-side fix will resolve; those need the site operator to act.
How a 502 happens — the request path
Most modern websites don’t serve HTTP responses from a single server. The path looks something like this:
```
Browser → Cloudflare/Akamai (CDN) → load balancer → app server → database
              └──── this layer is "the gateway" in HTTP-502 terms
```
Any time the gateway tries to forward a request and the upstream layer fails to send back a valid HTTP response (no response at all, partial response, garbage bytes, TCP reset), the gateway has to tell the client something. The HTTP standard reserves status code 502 for exactly this situation: “I’m a gateway, I tried to ask upstream, and I got a malformed answer back.”
Crucially, 502 is not the same as 504. A 504 Gateway Timeout means the upstream never answered within the gateway’s time limit. A 502 means the gateway got something unusable back: a refused or reset connection, or a response it couldn’t parse as valid HTTP. They look similar from the user’s seat, but the underlying problems are different.
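If you want to see which of the two you’re actually getting, a quick header check from the command line is usually enough. A minimal sketch, assuming `curl` is installed; `example.com` stands in for the failing site:

```bash
# Fetch only the response status and headers (-I sends a HEAD request).
curl -sSI https://example.com/

# A 502 generated at the edge usually carries the proxy's own Server header
# (e.g. "server: cloudflare" or "Server: awselb/2.0") rather than the
# application's, which confirms the error page came from the gateway itself.
```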
The seven most common causes
1. Application server crashed or restarted
The most common cause by far. The app server (Node, Python, Java, PHP-FPM) crashes, gets killed by OOM, or is restarting after a deploy. Health checks haven’t yet pulled it from the load balancer pool, so the gateway tries to forward to it and gets a TCP reset.
Symptom pattern: brief 502s clustered in time, then they stop on their own when health checks catch up.
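If you suspect this cause, the first check is on the app host itself. A minimal sketch for a systemd-managed service; `myapp` is a placeholder unit name:

```bash
# Did the service die, and if so, was it the kernel's OOM killer?
systemctl status myapp --no-pager                        # current state and last exit code
journalctl -u myapp --since "15 min ago"                 # the app's own logs around the incident
journalctl -k --since "15 min ago" | grep -i "killed process"   # kernel OOM-killer events
```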
2. Connection limits exhausted
Too many concurrent requests arrive at once, the upstream’s max-connections limit is hit, and new connections are refused. The gateway sees an immediate connection failure and emits a 502.
Symptom pattern: 502s correlate with traffic spikes (Black Friday, viral content, DDoS).
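A quick way to see whether you’re near a connection ceiling is to count live connections from the gateway to the backend. A sketch assuming `ss` is available and that the backend listens on port 3000 (a placeholder):

```bash
# Count established connections from this gateway host to the backend port
# (tail skips the header row), then compare against the backend's configured
# limit: worker count, pool size, or max-connections setting.
ss -tn state established '( dport = :3000 )' | tail -n +2 | wc -l
```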
3. Backend timeout > gateway timeout
The gateway has a `proxy_read_timeout` (commonly 30 or 60 seconds). The backend takes longer than that to start streaming. The gateway gives up and returns an error to the client; whether that error is a 502 or a 504 depends on whether the backend had sent any response bytes at all before the timeout.
Symptom pattern: persistent on slow operations (large file generation, complex DB queries, slow third-party calls).
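You can often confirm this cause from the outside just by timing the failure. A sketch with placeholder URLs; the second command assumes an nginx gateway:

```bash
# If the error consistently lands at a suspiciously round time (30s, 60s),
# the gateway's proxy timeout is almost certainly what is firing.
curl -s -o /dev/null -w 'status=%{http_code} total=%{time_total}s\n' \
  https://example.com/slow-report

# On an nginx gateway, confirm what the configured timeouts actually are.
nginx -T 2>/dev/null | grep -E 'proxy_(connect|send|read)_timeout'
```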
4. Buggy app server returning malformed responses
The app sends back something that isn’t valid HTTP: a malformed header, a crash mid-response, an invalid `Content-Length`. The gateway can’t parse it and turns the failure into a 502.
Symptom pattern: affects specific endpoints, not the whole site. Often follows a recent deploy.
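The quickest confirmation is to bypass the gateway and look at the raw backend response for the affected endpoint. A sketch with placeholder host, port, and path:

```bash
# Talk to the app server directly, skipping the proxy, and keep the verbose
# transcript: look for a Content-Length that doesn't match the body, a
# connection closed mid-response, or headers that aren't valid HTTP.
curl -v http://10.0.0.11:3000/affected-endpoint -o /tmp/body.out 2> /tmp/transcript.log
less /tmp/transcript.log
```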
5. DNS failure between gateway and backend
The gateway resolves the backend hostname dynamically (e.g., a Kubernetes service or AWS ALB DNS name). DNS lookup fails. Gateway can’t even connect to send the request.
Symptom pattern: lasts as long as the DNS issue does, often correlates with infrastructure-level incidents.
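From the gateway host, the check is a plain lookup against the backend name. A sketch with a placeholder hostname:

```bash
# Does the backend name resolve from here, and what does it resolve to?
dig +noall +answer backend.internal.example

# Run it several times during the incident window; SERVFAIL, empty answers,
# or flapping records point at this cause.
```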
6. SSL/TLS handshake failure to backend
Gateway is configured to use HTTPS to the backend. Backend’s certificate is expired, mismatched, or the protocol version is wrong. TLS handshake fails before any HTTP request is even sent.
Symptom pattern: persistent until the certificate is fixed or the gateway config is updated. Often surfaces immediately after a backend cert renewal.
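The handshake can be tested directly from the gateway host. A sketch assuming `openssl` is installed; `backend` is a placeholder hostname:

```bash
# Attempt the TLS handshake the gateway would make, then print the
# certificate's validity window and subject.
openssl s_client -connect backend:443 -servername backend </dev/null 2>/dev/null \
  | openssl x509 -noout -dates -subject

# An expired notAfter date, or a subject that doesn't match the hostname the
# gateway is configured to use, is enough to produce a 502 on this hop.
```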
7. Provider-level cascading failures
A major CDN or cloud provider has a regional outage: a Cloudflare, AWS ALB, Fastly, or similar service-level incident. The gateway is part of the failing infrastructure, and 502s are returned for every site behind it.
Symptom pattern: simultaneous 502s across thousands of unrelated sites. This is what we surface as a “cascade signal” in our outage detection methodology.
How to diagnose 502 as a visitor
If you’re hitting 502 on a website you don’t operate, run through this in order:
- Refresh once or twice. A surprising fraction are transient and resolve in seconds.
- Check our status page for the site (e.g., `/check/{domain}`) to see if other users are reporting issues.
- Try a different network: switch from Wi-Fi to mobile, or vice versa. If the new network works, the issue was your previous ISP’s routing to the site’s CDN.
- Try a different DNS resolver. Cloudflare’s `1.1.1.1` or Google’s `8.8.8.8` route through different infrastructure than your ISP’s default DNS, and occasionally one path is broken while the other works (a quick way to compare resolvers is sketched after this list).
- Try the site from another device or location (e.g., a phone on cellular). If everything works there but not from your computer, the problem is local to your machine: a misbehaving browser extension, corrupted DNS cache, or proxy software is the most likely culprit.
- Wait 5 minutes and retry. The vast majority of real 502 incidents resolve within 5 minutes. If it persists past that, the site has a real upstream problem that no client action can fix.
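If you’re comfortable in a terminal, the resolver and "is it still happening" checks above collapse into a few commands. A sketch assuming `dig` and `curl`, with `example.com` standing in for the failing site:

```bash
# 1. Compare your default resolver's answer with a public resolver's.
dig +short example.com
dig +short example.com @1.1.1.1

# 2. Check whether the 502 is still there and how quickly it comes back.
curl -s -o /dev/null -w 'status=%{http_code} time=%{time_total}s\n' https://example.com/

# 3. If the two resolvers disagree, pin one of the returned addresses and retry
#    (203.0.113.10 is a placeholder for an IP from step 1).
curl -s -o /dev/null -w 'status=%{http_code}\n' \
  --resolve example.com:443:203.0.113.10 https://example.com/
```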
How to diagnose 502 as an operator
You’re seeing 502s in your logs or monitoring. The investigation order is:
- Check the gateway’s error log. Nginx, Apache, HAProxy, and Cloudflare all log the upstream connection attempt with the exact reason for the 502. Common log strings: `upstream prematurely closed connection`, `connect() failed (111: Connection refused)`, `upstream sent invalid HTTP/1.1 header`.
- Check upstream health. Is the app server process running? Is it responding to a direct request (bypassing the gateway)? Is the connection limit saturated?
- Check error rates per endpoint. If 502s are localized to one URL pattern, the bug is in that endpoint’s code. If they’re spread evenly, it’s an infra issue.
- Compare timing. When did 502s start? Did they start exactly when a deploy went out? Right at a traffic peak? Right at a midnight cron? The timing usually points at the cause.
- Check upstream DNS. Use `dig` or `nslookup` from the gateway to verify the backend hostname resolves consistently.
- Check certificate validity if the gateway-to-backend hop uses TLS. `openssl s_client -connect backend:443 -servername backend` will surface most cert issues immediately. A combined triage sketch follows this list.
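Pulled together, the first pass of that checklist fits in a handful of commands on the gateway host. A sketch assuming an nginx-style gateway with the default combined access-log format; paths, hostnames, and ports are placeholders:

```bash
# 1. Why does the gateway itself say it returned 502?
grep -E 'upstream prematurely closed|Connection refused|sent invalid' \
  /var/log/nginx/error.log | tail -n 20

# 2. Does the backend answer when the gateway is bypassed entirely?
curl -sv http://10.0.0.11:3000/healthz -o /dev/null

# 3. Are the 502s concentrated on a few endpoints or spread across the site?
#    ($9 is the status code and $7 the request path in the combined log format.)
awk '$9 == 502 { print $7 }' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head

# 4. Does the backend hostname resolve consistently from this host?
dig +short backend.internal.example
```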
How a website operator prevents 502s
The most effective defenses, in order of cost-to-implement vs. impact:
- Set up active health checks on the gateway so dead backends are pulled from rotation before the gateway routes more requests to them.
- Tune timeouts. Make sure `proxy_read_timeout` on the gateway is comfortably longer than the slowest legitimate backend operation, but short enough to fail fast on stuck backends.
- Run multiple backend instances behind the gateway so one crash doesn’t cause user-visible 502s.
- Use connection pooling with a sensible max so traffic spikes don’t exhaust the limit.
- Test rolling deploys in staging — verify health checks correctly drain old instances before they’re killed.
- Monitor 502 rate as a first-class SLI. A baseline 502 rate of more than 0.1% deserves an alert; a quick way to compute the rate straight from the access log is sketched below.
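For that last item you don’t need a metrics pipeline to get a first number. A back-of-the-envelope sketch against an nginx combined-format access log (the path is a placeholder):

```bash
# Compute the 502 rate over everything currently in the access log; field 9
# is the status code in the default combined log format.
awk '$9 == 502 { errs++ } { total++ } END { printf "502 rate: %.3f%% (%d of %d requests)\n", 100 * errs / total, errs, total }' \
  /var/log/nginx/access.log
```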
Related concepts
- HTTP 503 Service Unavailable — sibling error code: the server reports that it is temporarily unable to handle the request itself (overload, maintenance), rather than that an upstream sent back an invalid response.
- DNS resolution — root cause of many 502s when the gateway can’t resolve its backend.
- Time to first byte (TTFB) — the timing metric that often reveals which layer is causing 502s.
- SSL/TLS errors — when 502 is caused by a certificate issue between gateway and backend.
When to check our outage feed
If multiple unrelated websites hit 502 simultaneously, the most likely cause is a cascading provider incident — typically Cloudflare, AWS, Akamai, or Fastly. Our live outages feed tracks these patterns across our 2,700+ monitored sites in real time. When you see a multi-site 502 spike, that’s almost always not your problem to fix; it’s a major-provider incident in progress and the only useful action is to wait for it to clear.
For longer-term reliability data, browse our monthly outage reports — they break down which platforms had the most incidents over each calendar month, including 502-pattern outages.