How-to
Probe & Scanner Detection
How the edge_404_probe detector surfaces reconnaissance activity (WordPress kits, secrets hunters, tenant-targeted scans) that every volume-based detector misses.
Most detectors fire on traffic volume — "this ASN sent 12× its usual requests", "this path bypassed cache at 3× the rate". That works for scraping and credential stuffing, but it's blind to the activity that happens before those attacks: reconnaissance.
A scanner looking for soft targets fires single requests at dozens of
well-known probe paths — /wp-login.php, /.env, /backup.zip,
/phpmyadmin, /.git/config, /wp-content/upgrade/alfacgiapi.
Each individual probe is tiny (often under 50 requests/week across your
whole tenant). No volume-based detector would ever fire. Yet the
structural signature — many distinct probe paths, across several
probe families, from one ASN in one hour — is unambiguously hostile.
That's what edge_404_probe catches.
How it works
A separate AE rollup dataset (edge_probe_paths) captures 4xx and 5xx
requests only, with per-request classification into one of seven
families:
- wordpress: anything matching wp-login, wp-admin, wp-content, wp-includes, wlwmanifest.xml, xmlrpc.php, or similar.
- env_secrets: .env variants (.env.production, /app/.env, /backend/.env, /laravel/.env, etc.).
- git_repo: /.git/config, /.git/HEAD.
- alfa_webshell: ALFA Team webshell fingerprints (alfacgiapi, ALFA_DATA). These scanners specifically look for sites already compromised by someone else so they can piggyback.
- sql_dump: dump.sql, backup.zip, database.sql, wp-config.php.bak, etc.
- admin_panel: phpmyadmin, adminer.php, /administrator, phpinfo.php, /vendor/phpunit/phpunit.
- tenant_targeted: a probe path that contains your brand's name as a substring (e.g. /charlestyrwhitt.sql). This is the most important signal. A generic kit fires the same URLs at every site, but a tenant-targeted probe means somebody already knows you.
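The family list above can be read as an ordered set of pattern checks. The following is a minimal sketch of that idea, not the detector's actual rule set: the function name classify_probe, the FAMILY_PATTERNS table, and the brand_tokens parameter are all hypothetical, and the patterns cover only the example fingerprints named above. It assumes tenant_targeted is checked first and alfa_webshell before wordpress, so that a path like /wp-content/upgrade/alfacgiapi lands in the more specific family.

```python
import re
from typing import Optional, Sequence

# Hypothetical fingerprints taken from the examples in the list above.
# Order matters: the first matching family wins.
FAMILY_PATTERNS = [
    ("alfa_webshell", re.compile(r"alfacgiapi|alfa_data")),
    ("wordpress", re.compile(
        r"wp-login|wp-admin|wp-content|wp-includes|wlwmanifest\.xml|xmlrpc\.php")),
    ("env_secrets", re.compile(r"(^|/)\.env(\.[\w.-]+)?$")),
    ("git_repo", re.compile(r"/\.git/(config|head)$")),
    ("sql_dump", re.compile(
        r"dump\.sql|backup\.zip|database\.sql|wp-config\.php\.bak")),
    ("admin_panel", re.compile(
        r"phpmyadmin|adminer\.php|/administrator|phpinfo\.php|/vendor/phpunit/phpunit")),
]


def classify_probe(path: str, brand_tokens: Sequence[str] = ()) -> Optional[str]:
    """Return the probe family for a request path, or None if nothing matches."""
    lowered = path.lower()
    # A brand-name substring is treated as the strongest signal, so check it first.
    if any(token and token.lower() in lowered for token in brand_tokens):
        return "tenant_targeted"
    for family, pattern in FAMILY_PATTERNS:
        if pattern.search(lowered):
            return family
    return None
```

For example, classify_probe("/charlestyrwhitt.sql", ["charlestyrwhitt"]) returns "tenant_targeted", while classify_probe("/us/old-campaign") returns None and so would not count as a probe.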
Because the dataset only receives 4xx/5xx records, its volume stays tiny even on busy tenants. Known-probe fingerprints are priority-boosted in the rollup so they always survive the top-24 cut against legitimate 4xx traffic (missing images, expired deep links).
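One way to picture the priority boost is as a sort key that ranks classified probe rows ahead of unclassified 4xx rows before the cut is taken. The sketch below assumes exactly that; the PathRow type and top_paths function are hypothetical, and the real rollup may implement the boost differently.

```python
from typing import List, NamedTuple, Optional

TOP_N = 24  # size of the cut mentioned above


class PathRow(NamedTuple):
    path: str
    count: int
    family: Optional[str]  # None for ordinary 4xx traffic with no fingerprint match


def top_paths(rows: List[PathRow], limit: int = TOP_N) -> List[PathRow]:
    """Keep at most `limit` rows, with known-probe rows ranked first.

    Within each group (probe, non-probe) rows are ordered by request count,
    so a one-request probe still outranks a thousand-request missing image.
    """
    ranked = sorted(rows, key=lambda r: (r.family is not None, r.count), reverse=True)
    return ranked[:limit]
```

Under this assumption a probe path seen once survives the cut even when dozens of busier legitimate 4xx paths compete for the same slots.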
When it trips
The detector evaluates each (ASN, country) tuple over a rolling 60-minute window. It fires on any of three independent signals:
- 20+ distinct probe paths from one ASN → warning. Classic scanner fingerprint.
- 3+ distinct probe families from one ASN → critical regardless of request count. Cross-category scanning demonstrates more capable tooling.
- Any tenant_targeted probe at count ≥ 1 → critical immediately. A scanner using your brand name is far more specific than a generic kit.
The alert's context includes a breakdown of families, example probe paths per family (up to three per family), and the full probe and total-4xx request counts, so you can see at a glance what the scanner was looking for.
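The three signals and the alert context can be sketched as a single pass over one window's rows. This is an illustrative reading of the rules above, not the detector's code: evaluate_window, the row shape (path, family, count), and the result keys are hypothetical, and it assumes that only family-matched paths count as "probe paths" while every row counts toward the total-4xx figure.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple

MIN_DISTINCT_PATHS = 20     # warning threshold from the list above
MIN_DISTINCT_FAMILIES = 3   # critical threshold from the list above
MAX_EXAMPLES_PER_FAMILY = 3

Row = Tuple[str, Optional[str], int]  # (path, family or None, request count)


def evaluate_window(rows: Iterable[Row]) -> Optional[dict]:
    """Evaluate one (ASN, country) tuple's 60-minute window; None means no alert."""
    paths_by_family = defaultdict(set)
    probe_requests = 0
    total_requests = 0
    for path, family, count in rows:
        total_requests += count
        if family is None:
            continue  # ordinary 4xx noise never counts as a probe
        paths_by_family[family].add(path)
        probe_requests += count

    distinct_paths = sum(len(paths) for paths in paths_by_family.values())
    severity = None
    if distinct_paths >= MIN_DISTINCT_PATHS:
        severity = "warning"
    # Either of the two critical signals overrides a warning.
    if len(paths_by_family) >= MIN_DISTINCT_FAMILIES or "tenant_targeted" in paths_by_family:
        severity = "critical"
    if severity is None:
        return None

    return {
        "severity": severity,
        "families": {fam: len(paths) for fam, paths in paths_by_family.items()},
        "examples": {fam: sorted(paths)[:MAX_EXAMPLES_PER_FAMILY]
                     for fam, paths in paths_by_family.items()},
        "probe_requests": probe_requests,
        "total_4xx_requests": total_requests,
    }
```

Three single-request probes across env_secrets, git_repo, and admin_panel come back as critical here, while 19 distinct wordpress paths on their own return None.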
When it does NOT trip
- A real user hitting a single missing asset (/us/old-campaign): one path, no family match, no trip.
- Your own health-check or synthetic-monitor hitting a deprecated URL repeatedly: even at high volume, no family match.
- A single 4xx noise source on one family only: min_distinct_paths guards against overfiring on one-off 4xx anomalies.
What to do when it fires
The standard mitigation is a Cloudflare WAF rate-limit on the ASN (not a block). Scanner ASNs are often shared with legitimate traffic — AWS, DigitalOcean, Hetzner, OVH — and blocking the whole range will hit real users. A rate-limit at, say, 10 requests/minute is lethal to a scanner without affecting humans.
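As a rough illustration, the rule body for such a rate-limit might be assembled like this. The sketch only builds a JSON payload and does not call any API. The field names follow my understanding of Cloudflare's rate limiting rules in the Rulesets API and should be checked against the current Cloudflare documentation before use. The helper name asn_rate_limit_rule is hypothetical, AS64496 is a documentation-reserved ASN used as a placeholder, and counting per source IP within the ASN is an assumption about how you want the limit applied.

```python
import json


def asn_rate_limit_rule(asn: int, requests_per_minute: int = 10) -> dict:
    """Build a rate limiting rule body that matches all traffic from one ASN."""
    return {
        "description": f"Rate-limit scanner traffic from AS{asn}",
        # Assumed field name; older configurations use ip.geoip.asnum instead.
        "expression": f"(ip.src.asnum eq {asn})",
        # In a rate limiting rule this action applies only to clients that
        # exceed the rate, so traffic under the limit still passes.
        "action": "block",
        "ratelimit": {
            # Assumption: count requests per source IP (per data center).
            "characteristics": ["cf.colo.id", "ip.src"],
            "period": 60,                      # seconds
            "requests_per_period": requests_per_minute,
            "mitigation_timeout": 60,          # seconds a client stays limited
        },
    }


print(json.dumps(asn_rate_limit_rule(64496), indent=2))
```

The payload would then be added to the zone's rate limiting ruleset through the dashboard, API, or Terraform, whichever you already use to manage WAF rules.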
If you see tenant_targeted hits:
- Treat as a targeted-recon event — note the ASN and the exact path patterns probed.
- Audit that those filenames do not, in fact, exist on your origin. (Shouldn't, but worth confirming.)
- Rotate any credentials that have ever existed in a .env or similar configuration file on this origin.
- Consider enabling Cloudflare Access on admin-style paths.
Probes are cheap to ignore if your origin is hardened. The alert's biggest value is longitudinal — who keeps probing you, and when do they escalate to something else?