How can I block crawlers sent by services such as / ?

Hi all

I would like to block crawling tools such as and . These crawlers seem to be pretty smart and keep indexing media placements (ads). As a ‘service’ they let people browse the indexed ads and view the web pages they link to. The result is that people steal ideas, concepts and even complete landing page code.

There are several services which claim to ‘cloak’ pages and redirect these bots to dummy pages. They charge a very high fee ($200 and up per month) and refuse to say anything about their techniques or how they work. Not a single thing.

I was wondering if anyone has an idea of how I could identify those spy tools and redirect them to a dummy web page, while letting regular users through to the real page. Are there any applications, techniques or other ways I can accomplish this on my droplets? Any tips would be much appreciated!

Thanks, Lex


Haha, user agent. They spoof browsers with the user-agent header.

If you want them blocked you can start by blocking any of the 50k residential IP proxies they’re using. This means signing up to all the residential IP services ($500/mo minimum) and harvesting their IPs without them knowing you’re sniffing. Google literally has a whole department dedicated to scanning them. Another option is a service like Udger which tries (tries) to keep track. Adplexity isn’t stupid.

What’s funny is the number of trackers like Voluum, Binom, ThriveTracker, etc. that say they detect bots. They don’t. Sign up for Kintura and use their Google reCAPTCHA v3 integration and you’ll see just how many bots are making it to your site. Welcome to the wild west, sir.
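For what it’s worth, the IP-blocking idea can be sketched at the application level as a plain WSGI app. This is only a sketch under big assumptions: the blocklist IPs below are hypothetical placeholders, and a real list would need constant refreshing since these crawlers rotate residential addresses.

```python
# Hypothetical blocklist -- replace with harvested crawler IPs and
# refresh regularly, since these services rotate addresses constantly.
BLOCKED_IPS = {"203.0.113.10", "198.51.100.7"}

def app(environ, start_response):
    # Honour X-Forwarded-For when the app sits behind a proxy or
    # load balancer; fall back to the direct peer address.
    forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
    ip = forwarded.split(",")[0].strip() or environ.get("REMOTE_ADDR", "")
    if ip in BLOCKED_IPS:
        # Send suspected crawlers to a dummy page instead of the real one.
        start_response("302 Found", [("Location", "/dummy")])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"Real landing page"]
```

The same check could live in Flask/Django middleware or, more efficiently, at the web server or firewall layer before the request ever reaches the app.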

Thanks for the answer, Jason … I had already looked into this technique. I believe these crawlers change IP and user agent very often, making it a cat-and-mouse game to catch them every time … I’m still looking for a ‘catch most’ solution that limits the amount of work I have to put in …

If you are using Apache, you can use your .htaccess file to do so after identifying the crawler.

This may help:
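As a rough sketch, once you have identified a crawler’s user-agent string, a mod_rewrite rule in .htaccess can send it to a dummy page. The user-agent substrings below are hypothetical placeholders, not real crawler names:

```apache
# Hypothetical user-agent substrings -- replace with strings you
# actually observe in your access logs.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (spycrawler|adspider) [NC]
# Avoid a redirect loop on the dummy page itself.
RewriteCond %{REQUEST_URI} !^/dummy\.html$
RewriteRule ^.*$ /dummy.html [L]
```

Keep in mind this only catches crawlers that identify themselves in the user-agent header; many spoof ordinary browser user agents, so this is a first filter rather than a complete solution.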

For NGINX this may help:
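A rough nginx equivalent, again assuming you have already identified user-agent substrings (the names here are placeholders), placed inside the `server` block:

```nginx
# Hypothetical user-agent substrings -- adjust to what your logs show.
if ($http_user_agent ~* "(spycrawler|adspider)") {
    return 302 /dummy.html;
}
```

`return` is one of the few directives that is safe inside an `if` block in nginx; for larger lists a `map` on `$http_user_agent` is the more idiomatic approach.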