By Lex Gabrees
Hi all
I would like to block crawling tools such as adplexity.com and whatrunswhere.com. These crawlers seem to be pretty smart and keep indexing media placements (ads). As a ‘service’, they let people view the indexed ads and the web pages those ads link to. The result is that people steal ideas, concepts and even complete landing page code.
There are several services that claim to ‘cloak’ pages and redirect these bots to dummy pages. They charge a very high fee (200 dollars and up per month) and refuse to say anything about their techniques or how they work. Not a single thing.
I was wondering if anyone has any idea how I could identify those spy tools and redirect them to a dummy web page, while letting regular users through to the correct page. Are there any applications, techniques or other ways to accomplish this on my droplets? Any tips would be much appreciated!
Thanks, Lex
Thanks for the answer, Jason. I had already looked into this technique. I believe these crawlers change IP address and User-Agent very often, making it a cat-and-mouse game to catch them each and every time. I’m still looking for a ‘catch most’ solution that limits the amount of work I have to put in.
If you are using Apache, you can use your .htaccess file to do so after identifying the crawler.
This may help: http://sitebeginner.com/apache/blockcrawlerbots/
For NGINX this may help: https://www.knthost.com/nginx/blocking-bots-nginx
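If you would rather do the check in application code than in the web server config, the same User-Agent idea can be sketched in Python. This is only a rough illustration: the pattern list and the `dummy.html` / `landing.html` names are made up for the example, and it only catches clients that identify themselves honestly. As mentioned elsewhere in this thread, the spy tools tend to spoof browser User-Agents, so treat this as a first filter at best.

```python
import re

# Hypothetical patterns, for illustration only. Real spy tools usually
# send a normal browser User-Agent and will not match any of these.
BLOCKED_UA_PATTERNS = [r"examplebot", r"python-requests", r"curl/"]
_BLOCKED_UA_RE = re.compile("|".join(BLOCKED_UA_PATTERNS), re.IGNORECASE)


def is_blocked_user_agent(user_agent):
    """Return True if the User-Agent is missing or matches a blocked pattern."""
    if not user_agent:
        # An empty or absent header is unusual for a real browser.
        return True
    return _BLOCKED_UA_RE.search(user_agent) is not None


def choose_page(user_agent):
    """Pick which page to serve (file names are placeholders)."""
    return "dummy.html" if is_blocked_user_agent(user_agent) else "landing.html"
```

For example, `choose_page("curl/8.0.1")` selects the dummy page, while a typical browser string such as `"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"` falls through to the real landing page.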
Haha user agent. They spoof browsers with the user-agent header. If you want them blocked you can start by blocking any of the 50k residential IP proxies they’re using. This means signing up to all the residential IP services ($500/mo minimum) and harvesting their IPs without them knowing you’re sniffing. Google literally has a whole department dedicated to scanning them. Another option is a service like Udger which tries (tries) to keep track. Adplexity isn’t stupid. What’s funny is the number of trackers like Voluum, Binom, ThriveTracker, etc that say they detect bots. They don’t. Sign up for Kintura and use their Google reCAPTCHA v3 integration and you’ll see just how many bots are making it to your site. Welcome to the wild west, sir.
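If you do get hold of a list of proxy IP ranges (harvested yourself or from a tracking service), checking a visitor against it is the easy part. Here is a minimal Python sketch using the standard `ipaddress` module. The CIDR ranges below are reserved documentation ranges used as placeholders, not real proxy networks, and how you obtain and refresh the real list is not covered here.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), NOT real proxy networks.
# In practice this list would come from whatever IP data source you use.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.0/24")
]


def is_blocked_ip(addr):
    """Return True if addr falls inside any blocked network.

    Unparseable addresses are treated as blocked, which is an assumption
    made for this sketch; you may prefer to let them through.
    """
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return True
    # Membership is False when the IP version differs from the network's.
    return any(ip in net for net in BLOCKED_NETWORKS)
```

With the placeholder list, `is_blocked_ip("203.0.113.7")` is true and `is_blocked_ip("192.0.2.1")` is false. For tens of thousands of ranges a linear scan like this gets slow, so a sorted or tree-based lookup would be the next step.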