Report this

What is the reason for this report?

block search engine indexing on all sub-domain

Posted on February 25, 2019

Hi,

I host many demo site on a single droplet like this, for examples

site1.demosite.com site2.demosite.com . . . site10.demosite.com

they all use demosite.com as the domain

How can I block search engine from indexing all my website with in that domain? Is it possible to do in DigitalOcean’s control panel or I have to do it on Nginx?

I know it possible to manually block search engine for each one of them. However, sometimes I forgot to manually block it and need to make sure that my demo site don’t affect the SEO of the real one.

Thank you



This textbox defaults to using Markdown to format your answer.

You can type !ref in this text area to quickly search our full set of tutorials, documentation & marketplace offerings and insert the link!

These answers are provided by our Community. If you find them useful, show some love by clicking the heart. If you run into issues leave a comment, or add your own answer to help others.

I found this in a forum a few days ago as it was also a concern for me.

Step One The first section is an example where you can map a list of the bots (whoever collected this is a power ranger) in nginx.conf’s http block, making it reusable in case you’re working with multiple server files.

http {
    map $http_user_agent $limit_bots {
        default 0;
        ~*(adbeat_bot|ahrefsbot|alexibot|appengine|aqua_products|archive.org_bot|archive|asterias|attackbot|b2w|backdoorbot|becomebot|blackwidow|blekkobot) 1;
        ~*(blowfish|botalot|builtbottough|bullseye|bunnyslippers|ccbot|cheesebot|cherrypicker|chinaclaw|chroot|clshttp|collector) 1;
        ~*(control|copernic|copyrightcheck|copyscape|cosmos|craftbot|crescent|curl|custo|demon) 1;
        ~*(disco|dittospyder|dotbot|download|downloader|dumbot|ecatch|eirgrabber|email|emailcollector) 1;
        ~*(emailsiphon|emailwolf|enterprise_search|erocrawler|eventmachine|exabot|express|extractor|extractorpro|eyenetie) 1;
        ~*(fairad|flaming|flashget|foobot|foto|gaisbot|getright|getty|getweb!|gigabot) 1;
        ~*(github|go!zilla|go-ahead-got-it|go-http-client|grabnet|grafula|grub|hari|harvest|hatena|antenna|hloader) 1;
        ~*(hmview|htmlparser|httplib|httrack|humanlinks|ia_archiver|indy|infonavirobot|interget|intraformant) 1;
        ~*(iron33|jamesbot|jennybot|jetbot|jetcar|joc|jorgee|kenjin|keyword|larbin|leechftp) 1;
        ~*(lexibot|library|libweb|libwww|linkextractorpro|linkpadbot|linkscan|linkwalker|lnspiderguy|looksmart) 1;
        ~*(lwp-trivial|mass|mata|midown|miixpc|mister|mj12bot|moget|msiecrawler|naver) 1;
        ~*(navroad|nearsite|nerdybot|netants|netmechanic|netspider|netzip|nicerspro|ninja|nutch) 1;
        ~*(octopus|offline|openbot|openfind|openlink|pagegrabber|papa|pavuk|pcbrowser|perl) 1;
        ~*(perman|picscout|propowerbot|prowebwalker|psbot|pycurl|pyq|pyth|python) 1;
        ~*(python-urllib|queryn|quester|radiation|realdownload|reget|retriever|rma|rogerbot|scan|screaming|frog|seo) 1;
        ~*(scooter|searchengineworld|searchpreview|semrush|semrushbot|semrushbot-sa|seokicks-robot|sitesnagger|smartdownload|sootle) 1;
        ~*(spankbot|spanner|spbot|spider|stanford|stripper|sucker|superbot|superhttp|surfbot|surveybot) 1;
        ~*(suzuran|szukacz|takeout|teleport|telesoft|thenomad|tocrawl|tool|true_robot|turingos) 1;
        ~*(twengabot|typhoeus|url_spider_pro|urldispatcher|urllib|urly|vampire|vci|voideye|warning) 1;
        ~*(webauto|webbandit|webcollector|webcopier|webcopy|webcraw|webenhancer|webfetch|webgo|webleacher) 1;
        ~*(webmasterworld|webmasterworldforumbot|webpictures|webreaper|websauger|webspider|webster|webstripper|webvac|webviewer) 1;
        ~*(webwhacker|webzip|webzip|wesee|wget|widow|woobot|www-collector-e|wwwoffle|xenu) 1;
    }
}

Step Two Then in the second section, you can of course configure the location yourself, matching the agent whether they match whatever’s in the list or if they don’t return an agent at all (those are the ones that get you at first) and viola.

server {
    location / {
        #blocks blank user_agents
        if ($http_user_agent = "") { return  301 $scheme://www.google.com/; }

        if ($limit_bots = 1) {return  301 $scheme://www.google.com/;}
    }
}

Remember that robots.txt should be a secondary, aside from the usual no-index meta you can use in your HTML. Some can say this is OD, but as long as it works for you, it works.

The developer cloud

Scale up as you grow — whether you're running one virtual machine or ten thousand.

Get started for free

Sign up and get $200 in credit for your first 60 days with DigitalOcean.*

*This promotional offer applies to new accounts only.