Our Methodology
Is It Down Checker uses a dual-signal approach to detect website outages: automated server-side HTTP checks combined with crowd-sourced user reports. This hybrid methodology lets us detect both full-scale outages and the partial or regional issues that server checks alone might miss.
How We Check Websites
Our monitoring infrastructure continuously checks websites using HTTPS requests from our servers. For each check, we measure:
- HTTP Status Code — Whether the server responds with a success (2xx/3xx) or error (4xx/5xx) code
- Response Time (Latency) — Time-to-first-byte in milliseconds, a direct measure of server responsiveness
- DNS Resolution — Whether the domain resolves correctly, catching DNS-level failures
- Connection Timeouts — Sites that fail to respond within the timeout window are flagged
Popular sites are checked every 60 seconds. All monitored sites are checked at least once per hour, with results stored for 24-hour historical analysis.
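The single-check flow above can be sketched in Python. This is an illustrative simplification, not our production code: the function name, timeout, and result fields are assumptions, and it uses only the standard library.

```python
import socket
import time
import urllib.request
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse

def check_site(url, timeout=10):
    """Run one HTTP check: DNS resolution, status code, time-to-first-byte.
    Field names and the 10s timeout are illustrative, not our exact values."""
    result = {"status_code": None, "latency_ms": None, "dns_ok": True, "up": False}
    host = urlparse(url).hostname
    try:
        socket.getaddrinfo(host, None)          # DNS-level failure check
    except socket.gaierror:
        result["dns_ok"] = False
        return result
    start = time.monotonic()
    try:
        resp = urllib.request.urlopen(url, timeout=timeout)
        resp.read(1)                            # force time-to-first-byte
        result["status_code"] = resp.status
        result["up"] = 200 <= resp.status < 400  # 2xx/3xx counts as success
    except HTTPError as e:
        result["status_code"] = e.code           # 4xx/5xx error response
    except (URLError, socket.timeout, OSError):
        pass                                     # timeout or connection failure
    result["latency_ms"] = round((time.monotonic() - start) * 1000)
    return result
```

A DNS failure short-circuits the check, since there is no server to time.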
Crowd-Sourced Report Spike Detection
Server-side checks tell us if a site is reachable from our servers — but a site can be "up" for us while down for thousands of users due to CDN failures, regional routing issues, or application-level errors that don't affect the HTTP response code.
To catch these situations, we track user-submitted "It's down for me" reports and compare the current report volume against a rolling historical baseline. When reports spike significantly above normal levels, we elevate the site's status — even if our HTTP check says it's up.
The baseline is calculated as the average number of reports per hour over a trailing window. We then compute a spike ratio: the current reporting rate divided by the baseline rate. A minimum threshold of reports is required before any spike is triggered, preventing false alarms from low-traffic sites.
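The baseline and spike-ratio computation described above amounts to a few lines. This sketch assumes the trailing window is supplied as a list of per-hour report counts; the minimum-report floor of 3 matches the thresholds described later, but the function name and signature are illustrative.

```python
def spike_ratio(reports_last_hour, hourly_counts, min_reports=3):
    """Current reporting rate divided by the trailing-window average.
    Returns 0.0 when the report floor isn't met, so quiet sites
    never trigger a spike. Names and defaults are illustrative."""
    if reports_last_hour < min_reports:
        return 0.0                               # below floor: never a spike
    baseline = sum(hourly_counts) / max(len(hourly_counts), 1)
    if baseline == 0:
        return float(reports_last_hour)          # any reports vs. empty history
    return reports_last_hour / baseline
```

For example, 12 reports in the last hour against a trailing average of 3 per hour yields a ratio of 4.0.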
3-Tier Status System
We combine the HTTP check result with the crowd report spike ratio to produce one of three statuses:
No Problems
The site responds successfully to our HTTP check and user reports are at or below the historical baseline. No evidence of an outage.
Possible Problems
The site responds successfully to our HTTP check, but user reports are elevated: at least 3x the normal baseline, with a minimum of 3 reports in the last 30 minutes. This may indicate a partial or regional outage.
Problems Detected
Either our HTTP check indicates the site is down (error response, timeout, or DNS failure), or user reports have spiked to 5x or more above the baseline. This indicates a significant, widespread outage.
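The decision logic for the three tiers can be written as a short function. The 3x and 5x multipliers and the 3-report floor come directly from the descriptions above; the function name and argument names are illustrative.

```python
def classify(http_up, ratio, reports_30min, min_reports=3):
    """Combine the HTTP check result with the crowd spike ratio
    to produce one of the three statuses described above."""
    if not http_up or ratio >= 5:
        return "Problems Detected"       # hard failure or 5x+ report spike
    if ratio >= 3 and reports_30min >= min_reports:
        return "Possible Problems"       # site looks up, but reports elevated
    return "No Problems"
```

Note that a 5x spike elevates the status to "Problems Detected" even when the HTTP check succeeds, which is how CDN or regional failures surface.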
Report Chart & Baseline Visualization
On each site's status page, the report chart shows the volume of user-submitted reports over the last 24 hours. The dashed red line marks the historical average reporting rate. When the blue report bars rise above that line, user complaints are elevated; the further above the line, the more likely an outage is in progress.
Response time data is overlaid on the same chart (shown as a green line), allowing you to correlate user reports with actual server performance changes.
Outage Event Tracking
When a site's status transitions from "No Problems" to either "Possible Problems" or "Problems Detected," we record an outage event with a start timestamp. When the status returns to normal, we resolve the event and record its duration. This gives us accurate 24-hour, 7-day, and 30-day uptime statistics.
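The transition logic above is a small state machine: open an event on the first degraded status, close it when the status returns to normal. This sketch uses plain Unix timestamps in seconds; the class and field names are assumptions for illustration.

```python
from typing import Optional

class OutageTracker:
    """Record outage events on status transitions (simplified sketch)."""

    def __init__(self):
        self.open_event: Optional[dict] = None   # currently unresolved outage
        self.history: list = []                  # resolved outage events

    def update(self, status: str, now: float) -> None:
        degraded = status != "No Problems"
        if degraded and self.open_event is None:
            # transition into an outage: record the start timestamp
            self.open_event = {"start": now, "status": status}
        elif not degraded and self.open_event is not None:
            # transition back to normal: resolve and record duration
            ev = self.open_event
            ev["end"] = now
            ev["duration_min"] = (now - ev["start"]) / 60
            self.history.append(ev)
            self.open_event = None
```

Summing `duration_min` over events that fall inside a 24-hour, 7-day, or 30-day window yields the outage minutes used for the uptime statistics.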
Uptime Percentage Calculation
Uptime percentages are calculated by subtracting total recorded outage minutes from the total minutes in the time window. For example, if a site had 15 minutes of outage in the last 24 hours (1,440 minutes), its 24-hour uptime would be:
(1440 − 15) ÷ 1440 × 100 = 98.96%
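The worked example above is a one-liner; the rounding to two decimal places is an assumption for display purposes.

```python
def uptime_pct(outage_minutes, window_minutes):
    """Uptime percentage over a time window, per the formula above."""
    return round((window_minutes - outage_minutes) / window_minutes * 100, 2)
```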
Limitations & Transparency
- Single check region — Our HTTP checks currently run from a single data center region. We may miss region-specific outages that don't affect our checking location.
- Crowd report accuracy — User reports are subjective. A user might report a site as "down" due to their own network issues. Our minimum report threshold and spike ratio math help filter noise, but false positives can still occur.
- Application-level issues — We check if a site responds to an HTTP request. A site could return a 200 status code while displaying an error page to users. Crowd reports help catch these cases.
- Privacy — IP addresses used for geo-location of reports are hashed with a daily rotating salt. Raw IPs are never stored persistently.
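The daily-rotating salted hash mentioned in the privacy point can be sketched as follows. The secret value, hash function choice (SHA-256), and function name are illustrative assumptions; only the pattern of salting with a server secret plus the current UTC date comes from the description above.

```python
import hashlib
from datetime import datetime, timezone

def hash_ip(ip: str, salt_secret: bytes = b"illustrative-server-secret") -> str:
    """Hash an IP with a salt that rotates daily, so the same IP maps to
    the same token within a day but raw IPs are never stored persistently.
    The secret and hash choice here are assumptions, not our exact scheme."""
    day = datetime.now(timezone.utc).date().isoformat().encode()
    return hashlib.sha256(salt_secret + day + ip.encode()).hexdigest()
```

Because the salt changes each day, hashes from different days cannot be joined to track a user long-term.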
Have questions about our methodology?
We're committed to transparency. If you have questions or suggestions about how we detect outages, feel free to reach out via the report form on any site's status page.
Contact Us