#####################################################
# Robots and Spiders
#
#####################################################
# What is a robots.txt File?
#####################################################
# A robots.txt is a file placed on your server to tell search engine spiders not to crawl or index certain sections or pages of your site. You can use it to prevent indexing entirely, to keep certain areas of your site from being indexed, or to issue individual indexing instructions to specific search engines.
#
# There are a number of situations where you may wish to exclude spiders from some or all of your site.
#
# 1. You are still building the site, or certain pages, and do not want the unfinished work to appear in search engines.
#
# 2. You have information that, while not sensitive enough to bother password protecting, is of no interest to anyone but its intended audience, and you would prefer it did not appear in search engines.
#
# 3. Most sites have some directories that do not need to be crawled - for example, do you really need to have your cgi-bin indexed? Or a directory that simply contains thank-you or error pages?
#
# 4. If you are using doorway pages (similar pages, each optimized for an individual search engine), you may wish to ensure that individual robots do not have access to all of them. This helps you avoid being penalized for spamming a search engine with a series of overly similar pages.
#
# 5. You would like to exclude some bots or spiders altogether, for example those from search engines you do not want to appear in, or those whose chief purpose is collecting email addresses.
#
#
# Syntax:
# User-Agent: [Spider or Bot name]
# Disallow: [Directory or File Name]
#
# Example:
# User-Agent: Googlebot
# Disallow: /private/privatefile.htm
#
#####################################################
User-agent: *
Disallow: /cgi-bin/
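
#####################################################
# Further examples (illustrative sketches only, left commented out so they do not change the live rules above; "BadBot" is a hypothetical bot name used purely for illustration)
#####################################################
# To exclude a single bot from the whole site (situation 5 above):
#
# User-agent: BadBot
# Disallow: /
#
# To keep every bot out while the site is still being built (situation 1 above):
#
# User-agent: *
# Disallow: /
#
# Note: compliance is voluntary. Well-behaved crawlers follow these rules, but bots that harvest email addresses may simply ignore them.
#####################################################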