Robots.txt File Generator


The generator lets you set the default access for all robots, an optional crawl delay, an optional sitemap URL, and additional per-robot rules for specific files or directories, then assembles the finished robots.txt file for you.
     
Specific Search Robots:
  Google          googlebot
  MSN Search      msnbot
  Yahoo           yahoo-slurp
  Ask/Teoma       teoma
  Cuil            twiceler
  GigaBlast       gigabot
  Scrub The Web   scrubby
  DMOZ Checker    robozilla
  Nutch           nutch
  Alexa/Wayback   ia_archiver
  Baidu           baiduspider
  Naver           naverbot, yeti
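Any of the user-agent tokens above can go in a User-agent line. As an illustrative sketch (not a recommendation), a file that blocks only Baidu's crawler while leaving every other robot unrestricted would look like:

```text
User-agent: baiduspider
Disallow: /

User-agent: *
Disallow:
```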
   
Specific Special Bots:
  Google Image    googlebot-image
  Google Mobile   googlebot-mobile
  Yahoo MM        yahoo-mmcrawler
  MSN PicSearch   psbot
  SingingFish     asterias
  Yahoo Blogs     yahoo-blogs/v3.9
Restricted Directories: each path is relative to the site root and must end with a trailing "/".

Now copy and paste this text into a plain text file named "robots.txt" (don't forget the "s" on the end of "robots") and put it in your root directory. As with any other file on your server, make sure its permissions allow visitors (such as search engine crawlers) to read it.
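For reference, a small finished file might look like the following sketch; the crawl delay, the /cgi-bin/ path, and the sitemap URL are illustrative values, not recommendations:

```text
User-agent: *
Crawl-delay: 10
Disallow: /cgi-bin/
Sitemap: http://www.example.com/sitemap.xml
```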


Introduction to Robots.txt

The robots exclusion protocol (REP), or robots.txt, is a text file webmasters create to instruct robots (typically search engine crawlers) how to crawl and index pages on their website.

Cheat Sheet

Block all web crawlers from all content

User-agent: * 
Disallow: /
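You can sanity-check a rule like this with Python's standard-library robots.txt parser, urllib.robotparser; the "mybot" agent name and the example URL here are just placeholders:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.modified()  # mark the data as loaded; can_fetch() assumes nothing is allowed otherwise
# Parse the "block everything" rules directly from a list of lines.
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# Every path is disallowed for every crawler.
print(rp.can_fetch("mybot", "http://www.example.com/page.html"))  # False
```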

Block a specific web crawler from a specific folder

User-agent: Googlebot 
Disallow: /no-google/

Block a specific web crawler from a specific web page

User-agent: Googlebot 
Disallow: /no-google/blocked-page.html

Allow a specific web crawler to visit a specific web page

User-agent: *
Disallow: /no-bots/block-all-bots-except-rogerbot-page.html

User-agent: rogerbot
Allow: /no-bots/block-all-bots-except-rogerbot-page.html

Sitemap Parameter

User-agent: * 
Disallow: 
Sitemap: http://www.example.com/non-standard-location/sitemap.xml

Block Googlebot from a folder, while allowing it to crawl one file in that folder

User-agent: Googlebot 
Disallow: /folder1/ 
Allow: /folder1/myfile.html

Crawl Delay

Some search engines let you set crawl priorities by slowing how often their bots request pages. Microsoft's documentation for Bing describes the Crawl-delay directive:

User-agent: bingbot
Crawl-delay: 10

where 10 is the delay in seconds between successive requests.
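Python's urllib.robotparser can read the directive back, which is a quick way to verify that the file parses as intended (the agent name here matches the bingbot record above):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.modified()  # mark the data as loaded so crawl_delay() returns a value
rp.parse([
    "User-agent: bingbot",
    "Crawl-delay: 10",
])

print(rp.crawl_delay("bingbot"))  # 10
```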
