How can I block crawlers sent by services such as / ?

Posted September 30, 2017 4k views
Nginx, PHP, Security, Firewall, Ubuntu 16.04

Hi all

I would like to block crawling tools such as and . These crawlers seem to be pretty smart and keep indexing media placements (ads). As a ‘service’, they let people view the indexed ads and the web pages those ads link to. The result is that people steal ideas, concepts and even complete landing page code.

There are several services that claim to ‘cloak’ pages and redirect these bots to dummy pages. They charge a very high fee (200 dollars and up per month) and refuse to say anything about their techniques or how they work. Not a single thing.

I was wondering if anyone has any idea how I could identify those spy tools and redirect them to a dummy web page, while letting regular users through to the correct page. Are there any applications, techniques or other ways to accomplish this on my droplets? Any tips would be much appreciated!


3 answers

If you are using Apache, you can use your .htaccess file to do so once you have identified the crawler.

This may help:

For NGINX this may help:

Thanks for the answer, Jason. I had already looked into this technique. I believe these crawlers change IP and User Agent very often, making it a cat-and-mouse game to catch them each and every time. I’m still looking for a ‘catch most’ solution that limits the amount of work I have to put in.
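
For anyone following along, here is a minimal sketch of the user-agent matching idea discussed above, written in Python only to show the decision logic (the same check could live in an .htaccess rule, an NGINX config or a PHP front controller). Everything named here is an assumption for illustration: the patterns in SUSPECT_UA_PATTERNS are placeholders rather than the real user agents of any spy tool, and /landing and /dummy are hypothetical paths.

```python
import re

# Hypothetical denylist. These substrings are placeholders, not the
# actual user agents of any particular spy tool.
SUSPECT_UA_PATTERNS = [
    r"curl",
    r"python-requests",
    r"headlesschrome",
    r"examplespybot",
]
SUSPECT_UA_RE = re.compile("|".join(SUSPECT_UA_PATTERNS), re.IGNORECASE)


def looks_like_crawler(user_agent):
    """Return True if the User-Agent is missing/blank or matches the denylist."""
    if not user_agent or not user_agent.strip():
        return True
    return SUSPECT_UA_RE.search(user_agent) is not None


def choose_page(user_agent, real_page="/landing", dummy_page="/dummy"):
    """Pick the path to serve: the dummy page for suspected crawlers."""
    return dummy_page if looks_like_crawler(user_agent) else real_page
```

Note that any client sending an ordinary browser User-Agent string gets the real page, so this only catches tools that do not bother to disguise themselves, and the denylist has to be maintained by hand.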

Haha, user agent. They spoof browsers with the User-Agent header. If you want them blocked, you can start by blocking any of the 50k residential IP proxies they’re using. This means signing up to all the residential IP services ($500/mo minimum) and harvesting their IPs without them knowing you’re sniffing. Google literally has a whole department dedicated to scanning them. Another option is a service like Udger, which tries (tries) to keep track. Adplexity isn’t stupid. What’s funny is the number of trackers like Voluum, Binom, ThriveTracker, etc. that say they detect bots. They don’t. Sign up for Kintura and use their Google reCAPTCHA v3 integration and you’ll see just how many bots are making it to your site. Welcome to the wild west, sir.
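
If you do end up with a list of proxy ranges, the lookup itself is the easy part. Below is a rough Python sketch of an IP blocklist check using the standard ipaddress module. The ranges in BLOCKED_NETWORKS are reserved documentation blocks used as placeholders, not real proxy ranges, and treating an unparseable address as blocked is just one possible choice.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real proxy IPs.
# In practice this list would come from whatever source you trust.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_blocked(client_ip):
    """Return True if client_ip falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Assumption: an address we cannot parse is treated as suspicious.
        return True
    # Membership is simply False when an IPv4 address is tested against
    # an IPv6 network (and vice versa), so mixed lists are safe.
    return any(addr in net for net in BLOCKED_NETWORKS)
```

The hard part remains what the answer describes: getting the ranges and keeping them current, since a linear scan like this is only as good as the list behind it.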