Does Google Obey Your Robots.txt File Exclusions?

March 10, 2015

Robots.txt files are commonplace on the Web and standard for most CMS platforms. Even if you don’t know what a robots.txt file is, it is likely that your site has one. As a brief summary, a robots.txt file is a plain-text, or .txt file, that lives in the root directory of your website. This file allows webmasters to give instructions to web robots on how to crawl their site. A site’s robots.txt file can be found at domain.tld/robots.txt. In theory, your robots.txt file will be the first file crawled and web robots will then use these directives on how to crawl, or not to crawl, the site. Malware bots etc. are going to ignore your robots.txt file completely and crawl your site. In fact, your web logs will reveal that most bots ignore your robots.txt specifications, but the question is does Google obey your robots.txt file exclusions?

Does Google Obey Your Robots.txt File Exclusions?

I want to keep this brief, so all I’m checking is if Google obeys your robots.txt exclusions. Common exclusions and their meanings:

Directive to instruct all robots or web crawlers:

User-agent: *

Directive to instruct just Google bot:

User-agent: Googlebot

Disallow all pages:
Disallow: /

Disallow Certain Pages/Directories

(common disallow for WordPress sites):
Disallow: /wp-admin/

Disallow all bots from entire site:

User-agent: *

Disallow: /

Disallow all bots from WP admin directory:

Disallow: /wp-admin/

With that covered, I have disallow all admin pages enabled for my site. So, does Google obey my robots.txt file exclusions?

Well, yes and no. I don’t see my excluded directories in the primary results, but I do see my root admin page in the “omitted” results:

However, results differ if you have your root domain excluded via your robots.txt file. These results are visible in search results and not hidden behind the “omitted” results:

Note with the second result that this is not a search specifying the domain, but rather the brand and the robots.txt exclusion is being ignored. Though, the page description and actual title is not being displayed in search results. The cache is also not available for either of these results. However, it is clear that Google will still return your site with a robots.txt exclusion, though, it will certainly cripple your SEO.

*This post is not about Bing, but in all examples I’ve found Bing does honor the robots.txt exclusions and does not return these results.

Google Does Not Obey Your Robots.txt File Exclusions – How Can You Stop Your Pages From Returning in Searches?

You will need to add the below no index, and no follow meta tag in the <head> tags on your site:

You can also allow following and just disallow indexing:

There you go, Google does not obey your robots.txt exclusions. However, you can still keep your site private in a number of ways.

Does Google Obey Your Robots.txt File Exclusions?

Does Google Obey Your Robots.txt File Exclusions?

Google Does Not Obey Your Robots.txt File Exclusions – How Can You Stop Your Pages From Returning in Searches?

You may also like...

Leave a Reply Cancel reply

Track Any Website Click with Google Tag Manager

Google Tag Manager Trick for Variable Identification

Block Google Analytics Spam Referrals Hostname Include Filter

Track Email and Phone Number Clicks in Google Analytics Via Google Tag Manager

Share Public Link to File on Synology NAS

How To Customize Your Wordpress 404 Error Page

How to Track External Clicks in Google Tag Manager

Track Mailto Clicks as Events in Google Tag Manager

Start Here

Recent Posts

Does Google Obey Your Robots.txt File Exclusions?

Does Google Obey Your Robots.txt File Exclusions?

Google Does Not Obey Your Robots.txt File Exclusions – How Can You Stop Your Pages From Returning in Searches?

You may also like...

Change WordPress Password in Database

How to Play N64 Games on a Mac

Why Does Google Hide Organic Keyword Referral Data?

Leave a Reply Cancel reply

Track Any Website Click with Google Tag Manager

Google Tag Manager Trick for Variable Identification

Block Google Analytics Spam Referrals Hostname Include Filter

Track Email and Phone Number Clicks in Google Analytics Via Google Tag Manager

Share Public Link to File on Synology NAS

How To Customize Your Wordpress 404 Error Page

How to Track External Clicks in Google Tag Manager

Track Mailto Clicks as Events in Google Tag Manager

Start Here

Recent Posts