Probe & Scanner Detection

How the edge_404_probe detector surfaces reconnaissance activity (WordPress kits, secrets hunters, tenant-targeted scans) that every volume-based detector misses.

3 min read. Last updated 26 April 2026

Most detectors fire on traffic volume — "this ASN sent 12× its usual requests", "this path bypassed cache at 3× the rate". That works for scraping and credential stuffing, but it's blind to the activity that happens before those attacks: reconnaissance.

A scanner looking for soft targets fires single requests at dozens of well-known probe paths — /wp-login.php, /.env, /backup.zip, /phpmyadmin, /.git/config, /wp-content/upgrade/alfacgiapi. Each individual probe is tiny (often under 50 requests/week across your whole tenant). No volume-based detector would ever fire. Yet the structural signature — many distinct probe paths, across several probe families, from one ASN in one hour — is unambiguously hostile.

That's what edge_404_probe catches.

How it works

A separate AE rollup dataset (edge_probe_paths) captures 4xx and 5xx requests only, with per-request classification into one of seven families:

wordpress — anything matching wp-login, wp-admin, wp-content, wp-includes, wlwmanifest.xml, xmlrpc.php, or similar.
env_secrets — .env variants (.env.production, /app/.env, /backend/.env, /laravel/.env, etc.).
git_repo — /.git/config, /.git/HEAD.
alfa_webshell — ALFA Team webshell fingerprints (alfacgiapi, ALFA_DATA). These scanners specifically look for sites already compromised by someone else so they can piggyback.
sql_dump — dump.sql, backup.zip, database.sql, wp-config.php.bak, etc.
admin_panel — phpmyadmin, adminer.php, /administrator, phpinfo.php, /vendor/phpunit/phpunit.
tenant_targeted — a probe path that contains your brand's name as a substring (e.g. /charlestyrwhitt.sql). This is the most important signal: a generic kit fires the same URLs at every site, but a tenant-targeted probe means somebody already knows you.
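
The family list above can be sketched as a small first-match classifier. This is only an illustration: the regular expressions are built from the example paths named in the list, and the function name classify_probe, the brand parameter, and the rule ordering (tenant_targeted first, alfa_webshell ahead of wordpress) are assumptions, not the detector's actual rules.

```python
import re
from typing import Optional

# Hypothetical patterns derived only from the examples listed above.
# Rules are checked in order and the first match wins (ordering is an assumption).
FAMILY_PATTERNS = [
    ("alfa_webshell", re.compile(r"alfacgiapi|alfa_data")),
    ("env_secrets", re.compile(r"/\.env(\.[\w-]+)?$")),
    ("git_repo", re.compile(r"/\.git/(config|head)$")),
    ("sql_dump", re.compile(r"(dump\.sql|backup\.zip|database\.sql|wp-config\.php\.bak)$")),
    ("admin_panel", re.compile(
        r"phpmyadmin|adminer\.php|/administrator|phpinfo\.php|/vendor/phpunit/phpunit")),
    ("wordpress", re.compile(
        r"wp-login|wp-admin|wp-content|wp-includes|wlwmanifest\.xml|xmlrpc\.php")),
]


def classify_probe(path: str, brand: str) -> Optional[str]:
    """Return the probe family for a request path, or None if no family matches."""
    lowered = path.lower()
    # Brand-name substring check comes first: it is the strongest signal.
    if brand and brand.lower() in lowered:
        return "tenant_targeted"
    for family, pattern in FAMILY_PATTERNS:
        if pattern.search(lowered):
            return family
    return None
```

With these assumed rules, classify_probe("/wp-content/upgrade/alfacgiapi", "charlestyrwhitt") lands in alfa_webshell rather than wordpress, because the more specific family is checked first, and an ordinary missing page such as /us/old-campaign matches nothing.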

Because the dataset only receives 4xx/5xx records, its volume stays tiny even on busy tenants. Known-probe fingerprints are priority-boosted in the rollup so they always survive the top-24 cut against legitimate 4xx traffic (missing images, expired deep links).
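
One way to picture the priority boost is a sort that ranks every classified probe ahead of unclassified 4xx paths before the cut is applied. The sketch below is a guess at the idea, not the rollup's real mechanism; the row shape and the function name top_n_with_probe_boost are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical row shape: (path, family_or_None, request_count).
Row = Tuple[str, Optional[str], int]


def top_n_with_probe_boost(rows: List[Row], n: int = 24) -> List[Row]:
    """Keep n rows, ranking classified probes ahead of unclassified paths.

    Within each group, rows are ordered by request count, highest first.
    """
    ranked = sorted(rows, key=lambda r: (r[1] is not None, r[2]), reverse=True)
    return ranked[:n]
```

In this sketch a probe path with a single request still outranks a missing image with thousands of requests. Probes are only guaranteed to survive while there are no more than n of them in the window.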

When it trips

The detector evaluates each (ASN, country) tuple over a rolling 60-minute window. It fires on any of three independent signals:

20+ distinct probe paths from one ASN → warning. Classic scanner fingerprint.
3+ distinct probe families from one ASN → critical regardless of request count. Cross-category scanning demonstrates more capable tooling.
Any tenant_targeted probe at count ≥ 1 → critical immediately. A scanner using your brand name is far more specific than a generic kit.
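
The three signals can be sketched as a single evaluation over one window's probe rows. The thresholds come from the rules above; the function name evaluate_window, the row shape, and the min_distinct_families parameter name are assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple


def evaluate_window(
    probes: Iterable[Tuple[str, str, int]],
    min_distinct_paths: int = 20,
    min_distinct_families: int = 3,
) -> Optional[str]:
    """Evaluate one (ASN, country) tuple over one 60-minute window.

    probes holds (path, family, request_count) rows for classified probes only.
    Returns "critical", "warning", or None when nothing trips.
    """
    paths = set()
    families = set()
    for path, family, count in probes:
        if count < 1:
            continue  # ignore empty rows
        paths.add(path)
        families.add(family)

    # A single tenant-targeted probe is critical on its own.
    if "tenant_targeted" in families:
        return "critical"
    # Cross-family scanning is critical regardless of request count.
    if len(families) >= min_distinct_families:
        return "critical"
    # Many distinct probe paths within one or two families is a warning.
    if len(paths) >= min_distinct_paths:
        return "warning"
    return None
```

The critical checks run first so that a window meeting both the path threshold and a critical condition reports the higher severity.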

The alert's context includes a breakdown of families, up to three example probe paths per family, and the probe request count alongside the total 4xx request count, so you can see at a glance what the scanner was looking for.

When it does NOT trip

A real user hitting a single missing asset (/us/old-campaign) — one path, no family match, no trip.
Your own health-check or synthetic-monitor hitting a deprecated URL repeatedly — even at high volume, no family match.
A single noisy 4xx source that matches only one family: the min_distinct_paths threshold guards against firing on one-off 4xx anomalies.

What to do when it fires

The standard mitigation is a Cloudflare WAF rate-limit on the ASN (not a block). Scanner ASNs are often shared with legitimate traffic — AWS, DigitalOcean, Hetzner, OVH — and blocking the whole range will hit real users. A rate-limit at, say, 10 requests/minute is lethal to a scanner without affecting humans.
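
As a rough illustration, a rate-limiting rule scoped to one ASN might be built like this. The field names (ip.src.asnum, the ratelimit object and its keys) reflect our understanding of Cloudflare's Rulesets API for rate limiting rules and should be checked against the current Cloudflare documentation before use. The helper name asn_rate_limit_rule is hypothetical, and AS64496 is a documentation-reserved ASN standing in for the one named in your alert.

```python
def asn_rate_limit_rule(asn: int, requests_per_minute: int = 10) -> dict:
    """Build a rate-limiting rule body that matches traffic from one ASN.

    Field names are assumptions about Cloudflare's rule schema; verify them
    against the current API documentation before deploying.
    """
    return {
        "description": f"Rate-limit probe traffic from AS{asn}",
        # Match only requests whose source IP belongs to this ASN.
        "expression": f"(ip.src.asnum eq {asn})",
        # With a ratelimit object attached, the action applies only to
        # clients that exceed the threshold, not to all matching traffic.
        "action": "block",
        "ratelimit": {
            # Count requests per source IP within each data center.
            "characteristics": ["cf.colo.id", "ip.src"],
            "period": 60,
            "requests_per_period": requests_per_minute,
            "mitigation_timeout": 60,
        },
    }


rule = asn_rate_limit_rule(64496)
```

Counting per source IP is one reading of "rate-limit on the ASN": a scanner firing dozens of probes from one address is throttled, while other customers of the same hosting provider stay well under the limit.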

If you see tenant_targeted hits:

Treat as a targeted-recon event — note the ASN and the exact path patterns probed.
Confirm that those filenames do not exist on your origin. They shouldn't, but it is worth checking.
Rotate any credentials that have ever been stored in an .env or similar configuration file on this origin.
Consider enabling Cloudflare Access on admin-style paths.

Probes are cheap to ignore if your origin is hardened. The alert's biggest value is longitudinal — who keeps probing you, and when do they escalate to something else?