Hidden in Plain Sight

Why are we doing this?

Anyone who has hosted anything on the internet knows that hackers, bots and scripts are constantly scanning for content (and often ignore robots.txt). This page describes techniques for making it harder for bots, scripts, and even hackers to find applications in a domain.

What do the logs say?

IP addresses are findable (obviously!)
Domain names are (usually) findable:

  • shodan.io - stores a history of your past searchable domains
  • censys.io - scans IP addresses, certs and domains
  • crt.sh - Let's Encrypt certs are recorded in public Certificate Transparency logs, so every subdomain named in a cert is searchable

EVERYONE hits the base IP
EVERYONE checks common ports
EVERYONE tries common paths
EVERYONE tries known vulns (especially php and especially wordpress)
EVERYONE tries shell code (weird - when was the last time this worked?)
Many hit the domain

Some follow links (looking at you gptbot)

But no one seems to run JavaScript to find links (except nicecrawler.com)

What should we do?

  • Put a web server serving static html at the root
  • Harden the web server (and definitely no dynamic php or cgi)
  • Do NOT put apps in top level domains
  • Do NOT put apps in subdomains (subdomain names end up in certs, which are searchable)
  • Use a top level domain cert for all apps
  • Do NOT use common / obvious paths for apps
  • Use a robots.txt (some bots do respect it)
  • Use obfuscated JavaScript to create links on the main landing page
  • Watch your access logs

Use JavaScript Obfuscator to obscure links, so that only bots that actually render the page will see them. The JavaScript should also use an "onclick" event rather than an anchor href, which hides the link target from anything that only scans for href attributes.

Something like:

document.write('<div style="display:inline" onclick="window.location.href=\'/app\'">my app</div> - ');
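A variation on the same idea, as a minimal sketch only: the path is rebuilt from character codes so the literal string "/app" never appears in the page source, and the link is a click handler rather than an href. The names decodePath, makeClickableLink and APP_PATH_CODES are hypothetical, and "/app" is just the example path from above. A real obfuscator would go further than this.

```javascript
// Rebuild a path from character codes so the plain string is not in the source.
function decodePath(codes) {
  return String.fromCharCode(...codes);
}

// Turn an element into a clickable "link" with no href attribute.
// `navigate` is passed in so the function can be exercised outside a browser.
function makeClickableLink(element, label, codes, navigate) {
  element.textContent = label;
  element.onclick = () => navigate(decodePath(codes));
  return element;
}

// "/app" as character codes: "/" = 47, "a" = 97, "p" = 112, "p" = 112
const APP_PATH_CODES = [47, 97, 112, 112];

// Browser usage (assumes the script runs after <body> exists).
if (typeof document !== "undefined" && document.body) {
  const span = document.createElement("span");
  makeClickableLink(span, "my app", APP_PATH_CODES, (path) => {
    window.location.href = path;
  });
  document.body.appendChild(span);
}
```

Run the result through the obfuscator before publishing, since the character codes on their own are easy to reverse.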

Set robots.txt in the site root to disallow everything:

User-agent: *
Disallow: /

Raw Data

Filtered, unique paths uncovered as at 09/12/2025 are here and they were collected using this magic:

cat access.log* |  sed 's/^.*] \"//' | sed 's/ HTTP\/...\"//' | sed 's/ [0-9]* \".*$//' | sed 's/\"//' | sort -u
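For anyone who prefers not to chain sed, the same extraction can be roughly sketched in JavaScript. This assumes the standard nginx "combined" log format. REQUEST_RE and uniqueRequests are hypothetical names. Unlike the pipeline above, this version silently skips lines whose request is not a well-formed "METHOD path HTTP/x" string.

```javascript
// Matches the request part of a "combined" format log line:
//   ... [timestamp] "METHOD /path HTTP/1.1" status bytes ...
const REQUEST_RE = /\] "([A-Z]+) (\S+) HTTP\/\d(?:\.\d)?"/;

// Return the sorted, de-duplicated "METHOD /path" strings found in the lines.
function uniqueRequests(lines) {
  const seen = new Set();
  for (const line of lines) {
    const match = REQUEST_RE.exec(line);
    if (match) {
      seen.add(`${match[1]} ${match[2]}`);
    }
  }
  return [...seen].sort();
}
```

Feed it the lines of one or more log files, for example the result of reading a file as text and splitting on newlines.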

Further Notes

Requests that start with binary data, e.g. \x16\x03, are attempts to connect to an HTTP port using an encrypted protocol (the client is sending a TLS handshake), i.e. https://<your domain>:80/

<IP Address> - - [14/Dec/2025:09:24:07 +1300] "\x16\x03\x01\x07\x17\x01\x00\x07\x13\x03\x03\x07\x1D\xD3\x18\xE6\xD4\xD2\xEF[\x19F\xD5\x80l\x12F\x13\xFE\xDC\xA6\xF5)\xAB\xC9\x07:(\xE8\xB9\xCB\x0F\x1A \x1Cf\xF2]?\x80f\x00\xFB\x99\xB3\x7F\x09z\x01\xB8BP?\xDA\xF3\x995\x9C\xBDr\xA5!\xED\x8F[\xC3\x00 zz\x13\x01\x13\x02\x13\x03\xC0+\xC0/\xC0,\xC00\xCC\xA9\xCC\xA8\xC0\x13\xC0\x14\x00\x9C\x00\x9D\x00/\x005\x01\x00\x06\xAAZZ\x00\x00\x00" 400 166 "-" "-"
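Entries like the one above can be picked out of a log automatically. The sketch below relies on two facts: nginx writes non-printable request bytes as \xHH escapes, and a TLS handshake record starts with content type 0x16 followed by major version 0x03. The function names are hypothetical and the log format is assumed to be nginx "combined".

```javascript
// The escaped text nginx logs for the first two bytes of a TLS handshake record.
const TLS_HANDSHAKE_PREFIX = "\\x16\\x03";

// Extract the quoted request field that follows the [timestamp].
function requestField(line) {
  const match = /\] "(.*?)" \d{3} /.exec(line);
  return match ? match[1] : null;
}

// True when the logged request looks like a TLS handshake sent to a plain HTTP port.
function isTlsOnPlainPort(line) {
  const request = requestField(line);
  return request !== null && request.startsWith(TLS_HANDSHAKE_PREFIX);
}
```

Filtering these out before reviewing the log leaves only the requests that carry a readable method and path.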

I confirmed that these techniques work by monitoring and alerting on application access using the Telegram Alert Bot to watch nginx logs.
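The matching half of that kind of monitoring might look something like the sketch below. It is an illustration only and is not how the Telegram Alert Bot itself works. hitsApp and the "/app" prefix are hypothetical, the log format is assumed to be nginx "combined", and delivery of the alert is left out.

```javascript
// True when a log line is a request for one of the given app path prefixes.
// A prefix matches the exact path, anything below it, or the path with a query string.
function hitsApp(line, prefixes) {
  const match = /\] "[A-Z]+ (\S+) HTTP\/[\d.]+" \d{3} /.exec(line);
  if (!match) {
    return false; // malformed or binary request, not an app hit
  }
  const path = match[1];
  return prefixes.some(
    (p) => path === p || path.startsWith(p + "/") || path.startsWith(p + "?")
  );
}
```

If the only lines that ever match come from your own browser, the apps are not being found.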