How to block website from crawling my server?

Posted on June 9, 2019

I have Ubuntu 18.04 on my VPS with ufw enabled. I tried to block the IP address from the netstat log, but it is still accessing my site, as shown below:

  root@Waseely:~# netstat -c | grep 3000
  tcp6  0  0  142.93.23.138:3000  crawl-66-249-64-3:57123  ESTABLISHED
  tcp6  0  0  142.93.23.138:3000  crawl-66-249-64-3:38822  ESTABLISHED
  tcp6  0  0  142.93.23.138:3000  crawl-66-249-64-3:36797  ESTABLISHED
  tcp6  0  0  142.93.23.138:3000  crawl-66-249-64-3:57123  ESTABLISHED
  tcp6  0  0  142.93.23.138:3000  crawl-66-249-64-3:38822  ESTABLISHED
  tcp6  0  0  142.93.23.138:3000  crawl-66-249-64-3:36797  ESTABLISHED
  tcp6  0  0  142.93.23.138:3000  crawl-66-249-64-3:57123  ESTABLISHED
  tcp6  0  0  142.93.23.138:3000  crawl-66-249-64-3:38822  ESTABLISHED
  tcp6  0  0  142.93.23.138:3000  crawl-66-249-64-3:57123  ESTABLISHED
  tcp6  0  0  142.93.23.138:3000  crawl-66-249-64-3:38822  TIME_WAIT
  tcp6  0  0  142.93.23.138:3000  crawl-66-249-64-3:46276  ESTABLISHED

I am trying to block the IP address of "crawl-66-249-64-3", but no luck.

Any idea?




This looks like Googlebot. Have you tried disallowing Google via your robots.txt file?

I tried that, and I also tried to block the IP itself using sudo ufw deny from 66.249.64.1/24.
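One thing worth checking here: ufw evaluates rules in order, so a deny rule appended after an existing allow rule (for example, one opening port 3000) will never match. A minimal sketch, assuming the 66.249.64.0/24 range covers the crawler's addresses and that your deny rule currently sits below an allow:

```shell
# Insert the deny rule at position 1 so it is evaluated before any allow rules
# (assumption: 66.249.64.0/24 is the range you want to drop).
sudo ufw insert 1 deny from 66.249.64.0/24 to any

# Verify the ordering; the deny should now appear first.
sudo ufw status numbered
```

Note too that firewall rules typically do not cut connections that are already established, so the existing sessions in your netstat output may persist until they close on their own.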

Heya,

In your robots.txt file, you can use the Disallow directive to tell specific user-agents not to crawl your site. One important detail: robots.txt matches on the crawler's User-agent string, not its hostname. "crawl-66-249-64-3" is the reverse-DNS name of a Googlebot address, and Googlebot identifies itself with the user-agent "Googlebot".

For example, to block all Googlebot crawling, you can add the following lines:

  User-agent: Googlebot
  Disallow: /

This instructs Googlebot not to crawl any pages on your site.

Since robots.txt is a plain file served from your web root, changes take effect as soon as the file is saved; crawlers simply fetch it on their next visit. You only need to restart your web server if you also changed server configuration. Use the appropriate command for your web server software.

For Apache, you can use:

  sudo systemctl restart apache2

For Nginx:

  sudo systemctl restart nginx

To test whether the robots.txt file is correctly blocking the bot, watch your access logs to see whether the bot's requests decrease or stop. Keep in mind that crawlers typically cache robots.txt for a while, so the change may take up to a day to be picked up.
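A quick way to watch for crawler hits is to count Googlebot requests in your access log. A sketch, assuming an Nginx-style log (the real path, e.g. /var/log/nginx/access.log, is an assumption; adjust for Apache or a custom setup). The demo writes a sample log line to a temp file so the commands are self-contained:

```shell
# Write one sample access-log line (a stand-in for your real log file):
printf '66.249.64.3 - - [09/Jun/2019:10:00:00 +0000] "GET / HTTP/1.1" 200 612 "-" "Googlebot/2.1"\n' > /tmp/access.log

# Count requests whose user-agent mentions Googlebot:
grep -c "Googlebot" /tmp/access.log
```

Running the same grep against your real log file, before and after the robots.txt change, shows whether the crawl rate is dropping.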

Remember that well-behaved bots usually respect the rules in the robots.txt file, but malicious bots might not. It’s an effective way to communicate your preferences to web crawlers, but it doesn’t guarantee that all bots will obey. If you continue to experience issues, you might consider implementing additional security measures or IP blocking as mentioned earlier.
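If a crawler ignores robots.txt, you can also refuse it at the web-server layer by matching its User-agent string. A minimal Nginx sketch (the server block, domain, and the "googlebot" pattern are assumptions; match whatever user-agent string actually appears in your logs):

```nginx
server {
    listen 80;
    server_name example.com;  # placeholder domain

    # Return 403 Forbidden to any client whose User-agent contains "googlebot"
    # (case-insensitive match via ~*):
    if ($http_user_agent ~* "googlebot") {
        return 403;
    }
}
```

Unlike robots.txt, this is enforced by the server itself, so it works even against bots that do not honor crawling rules, though anything spoofing a normal browser user-agent will still get through.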

Hope that this helps!
