A robots.txt file is just what the name implies: a plain text file with the extension .txt. It doesn't have to be very large; a few lines are often enough:
User-agent: googlebot
User-agent: yahoobot
User-agent: microsoftbot
Disallow: /stuff/

User-agent: *
Disallow: /notobeindexed/
Disallow: /example.html

Sitemap: http://www.primitivecode.com/sitemap.xml
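The robots.txt file tells search engine crawlers which parts of your site they should not crawl, and so what should stay out of the index. Well-behaved bots follow it, but it is a request, not access control.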
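The example contains two records. The first applies to googlebot, yahoobot and microsoftbot and tells these bots not to look into the folder "stuff" (the bot names are illustrative; check each search engine's documentation for its exact user-agent name). The second applies to all other bots (*) and tells them not to look into the folder "notobeindexed" or at the file "example.html".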
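To see how a crawler might read these two records, here is a minimal sketch using Python's standard urllib.robotparser module. The bot name "examplebot" is made up for illustration, site_maps() needs Python 3.8 or newer, and other parsers may treat edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this article, one directive per line.
ROBOTS_TXT = """\
User-agent: googlebot
User-agent: yahoobot
User-agent: microsoftbot
Disallow: /stuff/

User-agent: *
Disallow: /notobeindexed/
Disallow: /example.html

Sitemap: http://www.primitivecode.com/sitemap.xml
"""

BASE = "http://www.primitivecode.com"

# Parse the rules from a string instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "examplebot" is a hypothetical bot that no record names explicitly.
for agent, path in [
    ("googlebot", "/stuff/page.html"),
    ("googlebot", "/notobeindexed/"),
    ("examplebot", "/stuff/page.html"),
    ("examplebot", "/example.html"),
]:
    print(agent, path, parser.can_fetch(agent, BASE + path))

# Sitemap lines are collected separately from the records.
print(parser.site_maps())
```

In this sketch googlebot is kept out of /stuff/ but not out of /notobeindexed/, because a bot that finds a record naming it uses only that record. The hypothetical examplebot falls back to the * record.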
Create a file named robots.txt containing records like these and put it into your main directory (root), that is, the directory where index.html or index.php is located. A robot, be it googlebot or some other bot, looks for the file there and nowhere else. You can also specify the location of a sitemap in robots.txt, as the Sitemap line in the example does.
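Because the file always sits in the root, its address can be derived from any page URL on the same host. A small sketch of that idea, where the helper name robots_url is made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.primitivecode.com/stuff/page.html?x=1"))
# prints http://www.primitivecode.com/robots.txt
```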
If you want to address a specific bot, do it as shown in the first record. Records have to be separated by an empty line. A robots.txt file is also very useful when setting up a new site: while you are still working on the site, you can add the following record so that unfinished pages and broken URLs don't get indexed:
User-agent: *
Disallow: /
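But don't forget to remove this total disallow once the site goes live. The original robots.txt convention only defined Disallow, so you could not explicitly tell a bot to crawl a certain directory. The major search engines also understand an Allow line, which is now part of the standard (RFC 9309), so you can open up a subfolder of an otherwise disallowed directory.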
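One way to avoid forgetting the total disallow is a small check before going live. The following is only a sketch: the function name blocks_everything and the bot name "examplebot" are made up, and it tests just the root URL with Python's standard urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser


def blocks_everything(robots_txt, agent="examplebot"):
    """Return True if the given robots.txt text keeps agent away from "/"."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # If even the root may not be fetched, the whole site is closed to this bot.
    return not parser.can_fetch(agent, "/")


print(blocks_everything("User-agent: *\nDisallow: /\n"))
print(blocks_everything("User-agent: *\nDisallow: /stuff/\n"))
```

The first call reports True for the total disallow; the second reports False, because only the folder "stuff" is closed.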