By gebelo
I have a droplet running Ubuntu that supports a small Rails app. A few weeks ago, my handful of users reported that they couldn't get on, and I discovered that a bot was indexing every page on my site, hitting several pages per second and using up all of my CPU capacity. This now seems to happen about once per week. I am not a highly skilled webmaster; somebody set up the droplet for me, and I just know how to write the Rails code and work with the database. Is there anything I can do on the server to, say, limit how many pages one user can hit at a time, or anything else that would prevent this from happening?
Accepted Answer
Hi @gebelo,
What you can do is create a robots.txt file in your website’s root directory, disallowing all bots except Google from crawling your website.
The contents of the robots.txt file can be:
```
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
```
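Since this is a Rails app, the web-facing root is the app’s `public/` directory, so that is where the file should live. A minimal sketch, assuming a placeholder app path of `/var/www/myapp` (substitute your own deployment path):

```bash
# Create robots.txt in the Rails public/ directory,
# so it is served at https://yoursite/robots.txt.
# /var/www/myapp is a placeholder; use your app's actual root.
cat > /var/www/myapp/public/robots.txt <<'EOF'
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
EOF
```

One caveat: robots.txt is only honored by well-behaved crawlers, so a bot that ignores it will keep hitting the server regardless.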
Hello @gebelo
If the bot’s crawling is what is causing the CPU spikes, then follow KFS’s suggestion above. That should do the trick for you.
I would also recommend monitoring the droplet’s resource usage with `top` or `netstat`. You can find more information here.
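As a rough sketch of what that monitoring can look like, the following commands watch CPU usage and count current connections per remote IP, which makes a single bot address easy to spot (this assumes `net-tools` is installed; on newer Ubuntu releases `ss -ntu` gives equivalent output):

```bash
# Watch overall CPU usage, sorted by the hungriest processes.
top -o %CPU

# Count current TCP/UDP connections per remote IP, highest first.
# Skips the two netstat header lines, strips the port, then tallies.
netstat -ntu | awk 'NR>2 {print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head
```

If one IP shows up with a disproportionate number of connections during a spike, that is usually the bot in question.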
Regards