ROBOTS.TXT


Robots, crawlers and spiders are all names for the same thing: programs that search engines send out to read web pages. These crawlers add our website's pages to their search engine databases. Google's robot is called Googlebot, Yahoo's robot is called Yahoo Slurp, and MSN's robot is called MSNBot.


robots.txt is a plain text file that we usually create when setting up a website. The file name is case sensitive, so keep all the letters lower case: robots.txt.


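We give instructions to search engines through the robots.txt file. It tells search engine crawlers whether to include or exclude a particular file or directory. We use it to keep pages, such as ones containing sensitive information, from showing up in search engine results. Note, however, that robots.txt is only a request that well-behaved crawlers follow: the file itself is publicly readable and does not stop anyone from opening the listed pages directly, so it should not be the only protection for truly sensitive data.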


We place the robots.txt file in the root directory of the website, that is, beside the website's index file, so that it can be reached at an address such as http://www.example.com/robots.txt.
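The location matters because a crawler does not search the whole site for robots.txt; it builds one fixed URL from the scheme and host of the page it wants to fetch. The Python sketch below shows that derivation with the standard urllib.parse module. The function name robots_url is hypothetical and www.example.com is a placeholder domain.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Hypothetical helper: return the robots.txt URL a crawler would check for page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host (including any port); the path is always /robots.txt
    # at the root, so a robots.txt placed in a sub-folder would never be found.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://www.example.com/howto/sample.html?x=1"))
# https://www.example.com/robots.txt
```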





How to create a robots.txt file?


Example: a simple robots.txt file

User-agent: *
Disallow: /examples_folder/example.html

User-agent: * means that the instructions that follow apply to all search engine crawlers.
Disallow: /examples_folder/example.html instructs the crawlers not to crawl the file /examples_folder/example.html.
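One way to check how a crawler reads these two lines is Python's standard urllib.robotparser module, which implements the basic robots.txt rules. This sketch parses the example straight from a string instead of downloading it; www.example.com is a placeholder host and "Googlebot" is simply the crawler name we test with.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt example above, held in a string for testing.
ROBOTS_TXT = """\
User-agent: *
Disallow: /examples_folder/example.html
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())  # parse lines instead of fetching a URL

# www.example.com is a placeholder domain.
for path in ["/examples_folder/example.html", "/examples_folder/other.html", "/index.html"]:
    allowed = robots.can_fetch("Googlebot", "http://www.example.com" + path)
    print(path, allowed)

# Prints:
# /examples_folder/example.html False
# /examples_folder/other.html True
# /index.html True
```

Because the only group is User-agent: *, the answer is the same for any other crawler name, such as Slurp.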

If there are several files that you don't want crawled, give a separate Disallow: line for each file.


Example:

User-agent: *
Disallow: /examples_folder/example.html
Disallow: /examples_folder/example22.html
Disallow: /howto/sample.html
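To confirm that each Disallow: line adds its own rule, the sketch below feeds the multi-file example to urllib.robotparser and lists which paths are blocked. The helper blocked_paths, the sample paths and the host www.example.com are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# The multi-file example above: one Disallow line per file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /examples_folder/example.html
Disallow: /examples_folder/example22.html
Disallow: /howto/sample.html
"""

def blocked_paths(robots_txt, paths, agent="Googlebot"):
    """Hypothetical helper: return the paths that robots_txt tells `agent` not to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, "http://www.example.com" + p)]

paths = [
    "/examples_folder/example.html",
    "/examples_folder/example22.html",
    "/howto/sample.html",
    "/howto/other.html",
    "/index.html",
]
print(blocked_paths(ROBOTS_TXT, paths))
# ['/examples_folder/example.html', '/examples_folder/example22.html', '/howto/sample.html']
```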


If you don't want search engines to crawl any of the files in a folder, disallow the folder itself, for example Disallow: /examples_folder. None of the files inside this folder will then be crawled.
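A Disallow: value is matched as a prefix of the URL path, so blocking a folder also blocks everything inside it. The sketch below uses urllib.robotparser, which does plain prefix matching, to compare the rule with and without a trailing slash. The helper is_allowed, the folder /examples_folder_old and the host www.example.com are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, path, agent="Googlebot"):
    """Hypothetical helper: True if robots_txt lets `agent` crawl `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "http://www.example.com" + path)

NO_SLASH = "User-agent: *\nDisallow: /examples_folder\n"
WITH_SLASH = "User-agent: *\nDisallow: /examples_folder/\n"

for path in [
    "/examples_folder/example.html",
    "/examples_folder/sub/page.html",
    "/examples_folder_old/page.html",  # a different folder that shares the prefix
    "/index.html",
]:
    print(path, is_allowed(NO_SLASH, path), is_allowed(WITH_SLASH, path))
```

Both forms block every file in /examples_folder, but because of prefix matching the form without the trailing slash also blocks /examples_folder_old/page.html; adding the slash limits the rule to the folder itself.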



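NOTE: There is another way to give these instructions: a robots <meta> tag in the head section of each page you want to control, for example <meta name="robots" content="noindex, nofollow">. Unlike robots.txt, this works page by page.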
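To illustrate how a crawler might read this tag, the sketch below uses Python's standard html.parser module to collect the directives from a page's robots <meta> tag. The class name RobotsMetaParser and the sample page are hypothetical, and for simplicity the sketch scans the whole document rather than checking that the tag really sits in the head section.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Hypothetical parser: collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lower-cases tag and attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(",") if d.strip())

page = """<html><head>
<title>Private page</title>
<meta name="robots" content="noindex, nofollow">
</head><body>...</body></html>"""

meta_parser = RobotsMetaParser()
meta_parser.feed(page)
print(meta_parser.directives)  # ['noindex', 'nofollow']
```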