TL;DR
DNS resolution is the process of translating a human-readable domain name (like youtube.com) into the IP address (like 142.250.74.110) that your browser actually connects to. It’s a chain of lookups across multiple servers, and any link in that chain can break — which is why DNS issues are responsible for a surprisingly large share of “the website is down” complaints, even when the website itself is perfectly healthy.
If you can ping the site by IP address but not by name, the problem is DNS, not the site.
The full DNS request flow
When you type youtube.com into your browser, here’s what happens, in order:
1. Your browser's own DNS cache → "Have I looked this up recently?"
2. Your OS DNS cache (system-wide) → Same question, broader scope
3. Your router's DNS cache → Same, network-wide
4. Your ISP's recursive resolver → "Please find youtube.com for me"
├─ Asks the root nameservers → "Where do I find .com names?"
├─ Asks the .com TLD nameservers → "Where do I find youtube.com?"
└─ Asks YouTube's authoritative nameserver → "What IP is youtube.com?"
5. Answer comes back up the chain → IP address returned to your computer
6. Your browser opens TCP to that IP → Now it can finally make the HTTP request
Each layer caches the answer for a TTL (time-to-live) period — usually 5 minutes to 24 hours — so you don’t repeat the whole dance for every page load.
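The layered caching described above can be sketched in a few lines of Python. This is an illustration only: the names TtlCache and resolve are made up for this example, the upstream function stands in for the recursive resolver, and the address is a documentation placeholder. Real caches also pass along the remaining TTL rather than the full one, which this sketch ignores.

```python
import time


class TtlCache:
    """A tiny DNS-style cache: each entry expires `ttl` seconds after it is stored."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (ip, expiry time)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: forget it so the next layer is asked
            return None
        return ip

    def put(self, name, ip, ttl):
        self._entries[name] = (ip, self._clock() + ttl)


def resolve(name, layers, upstream):
    """Check each cache layer in order (browser, OS, router, ...).

    On a miss in every layer, call upstream(name), which must return
    (ip, ttl_seconds), and store the answer in every layer.
    """
    for layer in layers:
        ip = layer.get(name)
        if ip is not None:
            return ip
    ip, ttl = upstream(name)
    for layer in layers:
        layer.put(name, ip, ttl)
    return ip
```

With two layers and a 300-second TTL, the second lookup for the same name never reaches upstream; once 300 seconds have passed, the next lookup does.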
Two kinds of DNS server
Understanding DNS failures requires understanding the two roles:
Recursive resolvers
These do the lookups on your behalf. Your ISP provides one by default, but you can switch to public ones:
- Cloudflare: 1.1.1.1 and 1.0.0.1 (typically among the fastest, privacy-focused)
- Google: 8.8.8.8 and 8.8.4.4 (the most popular alternative)
- Quad9: 9.9.9.9 (adds malware blocking)
- OpenDNS: 208.67.222.222 (supports optional family-safe filtering)
When people say “switch your DNS server”, they mean switch your recursive resolver.
Authoritative nameservers
These are the source of truth for a particular domain. The DNS records for youtube.com are hosted on Google’s authoritative nameservers. The records for wikipedia.org are on Wikimedia’s. Recursive resolvers ask authoritative servers for the actual records.
If a domain’s authoritative nameservers go down, no recursive resolver in the world can answer queries for that domain (after their cached records expire). This is one of the most catastrophic failure modes in DNS.
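To make the recursive resolver's role concrete, the following Python sketch builds the kind of query a client sends to a resolver and reads the status code out of the reply. It uses only the standard library and is not a full DNS client: it ignores truncation, retries, response-ID checks, and answer parsing. The function names (build_query, response_code, ask) are made up for this example.

```python
import socket
import struct


def build_query(name, query_id=0x1234, qtype=1):
    """Encode a DNS query for `name` (qtype 1 = A record) with recursion desired."""
    # Header: ID, flags (0x0100 sets the "recursion desired" bit),
    # then QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE, then QCLASS 1 (IN, the Internet class).
    return header + qname + struct.pack("!HH", qtype, 1)


def response_code(response):
    """Return the RCODE from a DNS response header (0 = NOERROR, 3 = NXDOMAIN)."""
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def ask(resolver_ip, name, timeout=3.0):
    """Send one UDP query to `resolver_ip` on port 53 and return the response code."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (resolver_ip, 53))
        response, _ = sock.recvfrom(512)
    return response_code(response)
```

Calling ask("1.1.1.1", "youtube.com") and then ask("8.8.8.8", "youtube.com") is a quick way to compare two recursive resolvers. Both calls need outbound access to UDP port 53.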
The ten most common DNS failures
1. Recursive resolver outage
Your ISP’s DNS resolver is down or overloaded. You see “DNS_PROBE_FINISHED_NO_INTERNET” or similar in your browser. Fix: switch to 1.1.1.1 or 8.8.8.8. Often resolves immediately.
2. Authoritative nameserver outage
The domain’s own DNS provider has an issue. Affects everyone trying to look up that domain after caches expire. Fix: none from the user side. The site operator has to bring their nameservers back or fail over.
3. Stale DNS records (TTL hell)
A record was changed but the old one is still cached at recursive resolvers worldwide. Some users see the new IP, others see the old. Fix: wait for the old record’s TTL to run out (anywhere from a few minutes to 24 hours, depending on what the operator set). For operators: lower TTLs ahead of planned changes.
4. Domain registration expired
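The registration at the registrar (e.g., Namecheap, GoDaddy) lapsed, usually because auto-renew was off or failed. The domain no longer resolves to the operator’s nameservers, and many registrars point it at a parking page instead. Browsers show “server not found” or a registrar placeholder. Fix: the operator has to renew. Until then, nothing works for anyone.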
5. DNSSEC validation failure
The domain uses DNSSEC (cryptographic signing of DNS records) and the signatures became invalid. Validating resolvers (1.1.1.1, 8.8.8.8) refuse to return the records. Non-validating resolvers may still work. Fix: operator must fix DNSSEC keys; users can temporarily switch to a non-validating resolver.
6. Blocked at the recursive resolver level
ISPs in some countries block specific domains via DNS. The recursive resolver returns NXDOMAIN or a “blocked” page. Fix: use a public resolver outside your ISP’s control (1.1.1.1) or a VPN.
7. Slow DNS = slow page loads
Your DNS resolver is reachable but slow. The first request to every domain that isn’t already cached takes 200-2000ms longer than it should, and a typical page pulls resources from several domains. Fix: switch to a faster resolver.
8. Stale or corrupted local DNS cache
Your machine has a stale or corrupted DNS cache. Common after a long uptime, sleep/wake cycles, or VPN connection changes. Fix: flush the DNS cache. macOS: sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder. Windows: ipconfig /flushdns. Linux (with systemd-resolved): sudo resolvectl flush-caches.
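If you flush caches often, the three commands above can be wrapped in a small helper that picks the right one for the current platform. This is a sketch, not a tested tool: flush_commands and flush_dns_cache are made-up names, the Linux entry assumes systemd-resolved is in use, and the sudo commands will prompt for a password.

```python
import platform
import subprocess

# Commands taken from the list above, keyed by what platform.system() returns.
FLUSH_COMMANDS = {
    "Darwin": [
        ["sudo", "dscacheutil", "-flushcache"],
        ["sudo", "killall", "-HUP", "mDNSResponder"],
    ],
    "Windows": [["ipconfig", "/flushdns"]],
    "Linux": [["sudo", "resolvectl", "flush-caches"]],  # assumes systemd-resolved
}


def flush_commands(system=None):
    """Return the list of commands for `system` (defaults to the current OS)."""
    return FLUSH_COMMANDS.get(system or platform.system(), [])


def flush_dns_cache():
    """Run each flush command in order, stopping at the first failure."""
    for command in flush_commands():
        subprocess.run(command, check=True)
```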
9. Hosts file override
A line in your computer’s hosts file maps the domain to an old or wrong IP. Common during local development that wasn’t cleaned up. Fix: edit /etc/hosts (Linux/Mac) or C:\Windows\System32\drivers\etc\hosts (Windows) and remove the override.
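A quick way to spot such an override is to scan the hosts file for the domain in question. The sketch below assumes the usual hosts format (an IP followed by one or more names, with # starting a comment); hosts_overrides is a made-up name for this example.

```python
def hosts_overrides(hosts_text, domain):
    """Return the IPs that `hosts_text` maps `domain` to (empty list = no override)."""
    matches = []
    wanted = domain.lower()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        ip, *names = line.split()
        if wanted in (name.lower() for name in names):
            matches.append(ip)
    return matches
```

To use it, read the file for your platform (for example open("/etc/hosts").read() on Linux or macOS) and pass the text in along with the domain you are debugging. A non-empty result means the hosts file is answering before DNS is ever consulted.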
10. Asymmetric DNS configuration
The site has different IP addresses for different geographic regions (geo-DNS), and one region’s records are misconfigured or pointed at a dead server. Fix: operator has to fix the regional record; users can temporarily use a VPN to a working region.
How to diagnose DNS issues as a visitor
If you suspect DNS is broken:
- Try the IP directly. Use dig, nslookup, or an online DNS lookup tool to find the site’s IP, then paste the IP into your browser. If the site loads via IP but not via name, DNS is the issue. A failure here is not conclusive, though: sites behind a CDN or on shared hosting often reject requests made to a bare IP.
- Try a different resolver. Change your DNS server (system settings) to 1.1.1.1 and reload. If it works, your ISP’s DNS was the problem.
- Flush local cache. See the command list in cause #8 above.
- Try from a different network. If the site works from your phone on cellular but not from your laptop on Wi-Fi, the issue is in your home network’s DNS path.
- Check from a third party. Use whatsmydns.net or similar to see what the rest of the world sees for that domain. If global lookups work but yours doesn’t, it’s local. If global lookups fail too, the domain itself has a problem.
How to diagnose DNS issues as an operator
- Check your registrar’s expiration date. Renew if needed. Set up auto-renew.
- Check authoritative nameservers. Run dig @your-nameserver yourdomain.com from outside your network. It should return a valid answer.
- Check propagation. whatsmydns.net shows how a record looks from dozens of locations. Mismatches mean propagation is in progress.
- Check DNSSEC chain. dnsviz.net visualizes the cryptographic signatures and flags any broken links.
- Check TTLs. A 24-hour TTL means changes can take up to 24 hours to reach every resolver. Lower it to 5 minutes at least one full old-TTL period before a planned change, then raise it back afterwards.
- Audit hosts file entries on your own machines that might be overriding production records during testing.
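The TTL advice in the checklist above comes down to simple date arithmetic, sketched here in Python. The function name ttl_change_plan and the dictionary keys are made up for this example, and the result assumes resolvers honor TTLs, which not all of them do.

```python
from datetime import datetime, timedelta


def ttl_change_plan(change_at, current_ttl, temporary_ttl=timedelta(minutes=5)):
    """Work out the deadlines around a planned DNS record change.

    change_at:      when the record will be switched to the new value
    current_ttl:    the TTL the record has today (e.g. 24 hours)
    temporary_ttl:  the short TTL used during the change window
    """
    return {
        # A resolver that cached the record just before the TTL was lowered
        # keeps it for the full current TTL, so lower it at least that early.
        "lower_ttl_by": change_at - current_ttl,
        "change_record_at": change_at,
        # After the switch, well-behaved caches drop the old value within
        # one temporary TTL; raising the TTL again is safe from then on.
        "old_value_expired_by": change_at + temporary_ttl,
    }
```

For a change planned at noon on a record with a 24-hour TTL, the TTL has to be lowered by noon the previous day; lowering it an hour before the change leaves most caches holding the old 24-hour entry.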
Why DNS is responsible for so many “outages”
The classic incident: a deploy goes out, the new version is healthy, monitoring is green, but users are reporting the site is broken. The cause turns out to be a DNS record that pointed at the old infrastructure for far longer than expected because of a 24-hour TTL. The site is technically up, but for many users, browsers can’t even find it.
A second classic: the authoritative nameservers are at the same provider as the website. The provider has an outage. Now the website and its DNS are both unreachable simultaneously, so users see “server not found” rather than “page error”. Multi-provider DNS hosting (separate authoritative nameserver providers from your hosting) prevents this single point of failure.
A third classic: BGP route hijacking. An ISP makes a configuration mistake and starts routing traffic destined for 8.8.8.8 to its own servers. Legitimate Google DNS responses become unreachable for that ISP’s customers. This is rare but real and has happened multiple times in the past decade.
How DNS shows up in our monitoring
When our automated HTTPS probe runs, the very first step is a DNS resolution from our monitoring servers to the target domain. If DNS resolution fails — either because the recursive resolver returns NXDOMAIN, or because no authoritative nameserver responds — we mark that check as dns_error rather than down. This distinction matters because DNS errors and HTTP errors usually have different root causes, and our outage detection methodology records them separately.
If you’re seeing an “is X down” page with a status of dns_error on our site, the problem is most likely the operator’s DNS configuration, not their website. It is often a 5-30 minute fix once they notice.
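The split between dns_error and down can be illustrated with a short Python sketch. This is not our production probe: it only resolves the name and attempts a TCP connection, it does not speak HTTPS, and the function name check and its resolve parameter are assumptions made for this example.

```python
import socket


def check(host, port=443, timeout=5.0, resolve=socket.getaddrinfo):
    """Classify a target as "dns_error", "down", or "up".

    `resolve` defaults to the system resolver and can be swapped out in tests.
    "up" here only means a TCP connection succeeded.
    """
    try:
        addresses = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name could not be resolved, so the server was never contacted.
        return "dns_error"
    for family, socktype, proto, _canonname, sockaddr in addresses:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            return "up"
        except OSError:
            continue  # try the next address, if any
    return "down"
```

Keeping the resolution step in its own try block is what makes the distinction possible: a failure there never gets confused with a refused or timed-out connection.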
Related concepts
- HTTP 502 Bad Gateway — DNS failures between gateway and backend often cause 502s.
- HTTP 503 Service Unavailable — distinct from DNS failures, though they look similar from the user’s seat.
- Time to first byte (TTFB) — DNS lookup time is included in TTFB, so slow DNS makes pages feel slow.
- SSL/TLS errors — DNS and SSL issues frequently coincide during certificate renewals and infrastructure changes.
For a real-time view of sites currently experiencing DNS issues, see our live outages feed. For longer-term reliability data, browse our monthly outage reports.