Introduction to Robots.txt

The robots exclusion protocol (REP), or robots.txt, is a text file that webmasters create to instruct robots (typically search engine robots) how to crawl and index pages on their website.

Cheat Sheet

Block all web crawlers from all content

User-agent: * 
Disallow: /

Block a specific web crawler from a specific folder

User-agent: Googlebot 
Disallow: /no-google/
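
A rule like this can be checked before deployment. The following is a minimal sketch using Python's standard-library urllib.robotparser; the host www.example.com and the page paths are assumptions made for illustration and are not part of the original rules.

```python
from urllib.robotparser import RobotFileParser

# The rules from the cheat-sheet entry above.
rules = """\
User-agent: Googlebot
Disallow: /no-google/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical URLs, used only to exercise the rules.
in_folder = parser.can_fetch("Googlebot", "https://www.example.com/no-google/page.html")
elsewhere = parser.can_fetch("Googlebot", "https://www.example.com/public/page.html")
other_bot = parser.can_fetch("bingbot", "https://www.example.com/no-google/page.html")

print(in_folder, elsewhere, other_bot)  # False True True
```

Only Googlebot is kept out of the folder; a crawler with no matching group and no wildcard group is treated as unrestricted.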

Block a specific web crawler from a specific web page

User-agent: Googlebot 
Disallow: /no-google/blocked-page.html

Allow a specific web crawler to visit a specific web page

User-agent: *
Disallow: /no-bots/block-all-bots-except-rogerbot-page.html
User-agent: rogerbot
Allow: /no-bots/block-all-bots-except-rogerbot-page.html
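
The idea here is that a group naming a specific crawler takes precedence over the wildcard group for that crawler. A minimal sketch of that behavior follows, assuming the page is disallowed under a `User-agent: *` group and allowed under a `rogerbot` group; the host www.example.com and the agent name SomeOtherBot are hypothetical.

```python
from urllib.robotparser import RobotFileParser

PAGE = "/no-bots/block-all-bots-except-rogerbot-page.html"

# Assumed layout: a wildcard group that blocks the page,
# followed by a rogerbot group that allows it.
rules = f"""\
User-agent: *
Disallow: {PAGE}

User-agent: rogerbot
Allow: {PAGE}
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

url = "https://www.example.com" + PAGE  # hypothetical host
rogerbot_ok = parser.can_fetch("rogerbot", url)
others_ok = parser.can_fetch("SomeOtherBot", url)

print(rogerbot_ok, others_ok)  # True False
```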

Sitemap Parameter

User-agent: * 

Block Googlebot from indexing a folder, except for one file in that folder

User-agent: Googlebot 
Disallow: /folder1/ 
Allow: /folder1/myfile.html
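
This exception works only if the crawler lets the more specific rule win. The sketch below illustrates that longest-match precedence, under which the longest matching path prefix decides and a tie goes to Allow. The helper is_allowed is hypothetical and written by hand, because Python's standard-library parser evaluates rules in file order rather than by specificity. It handles plain prefixes only, not wildcards.

```python
def is_allowed(rules, path):
    """Decide whether `path` may be crawled under one user-agent group.

    `rules` is a list of (directive, prefix) pairs. The longest matching
    prefix wins; on a tie, Allow wins. With no match, crawling is allowed.
    """
    best_len = -1
    allowed = True
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            is_allow = directive.lower() == "allow"
            longer = len(prefix) > best_len
            tie_for_allow = len(prefix) == best_len and is_allow
            if longer or tie_for_allow:
                best_len = len(prefix)
                allowed = is_allow
    return allowed


# The Googlebot group from the entry above.
googlebot_rules = [
    ("Disallow", "/folder1/"),
    ("Allow", "/folder1/myfile.html"),
]

print(is_allowed(googlebot_rules, "/folder1/myfile.html"))  # True
print(is_allowed(googlebot_rules, "/folder1/other.html"))   # False
print(is_allowed(googlebot_rules, "/folder2/page.html"))    # True
```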

Crawl Delay: some search engines let you throttle how quickly their crawlers request pages.

Microsoft documents this directive for Bing. For example:

User-agent: bingbot
Crawl-delay: 10

where the 10 is in seconds.
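
A polite crawler would read this value and pause between requests. As a minimal sketch, Python's standard-library urllib.robotparser exposes the value through crawl_delay(); the second agent name is included only to show what happens when no group matches.

```python
from urllib.robotparser import RobotFileParser

# The rules from the entry above.
rules = """\
User-agent: bingbot
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

bing_delay = parser.crawl_delay("bingbot")     # delay in seconds
other_delay = parser.crawl_delay("Googlebot")  # no matching group

print(bing_delay, other_delay)  # 10 None
```

A crawler built on this could call time.sleep(bing_delay) between fetches whenever the value is not None.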

