Domains use robots.txt, also known as the robots exclusion protocol, to provide instructions about their site to web robots (also known as web wanderers, crawlers, or spiders). A web robot is a program that systematically browses the web or runs scripts against it for many different purposes, including web indexing. Search engines such as Google, Bing, and Yahoo use web robots to build their databases and index web content.
robots.txt is a plain text file, typically placed in the highest-level directory of your site, the root of your domain. It is a publicly available file and can be viewed by anyone. When a web robot looks for the robots.txt file for any URL, it automatically strips the path components and requests the file from the root of the domain. The robots.txt file tells the web robot which sections of the website may be indexed and which are disallowed for scanning or processing.
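The two behaviors above, stripping the path to find robots.txt at the domain root and checking allow/disallow rules, can be sketched with Python's standard-library `urllib`. The `example.com` domain and the rules shown are hypothetical, chosen only to illustrate the mechanics:

```python
from urllib import parse, robotparser

# A robot deriving the robots.txt location: the path components of the
# page URL are stripped, leaving only the scheme and domain root.
page = "https://example.com/some/deep/path/page.html"
parts = parse.urlsplit(page)
robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"
print(robots_url)  # https://example.com/robots.txt

# A sample (made-up) robots.txt: disallow crawling of /private/ for all
# robots, allow everything else.
sample = """\
User-agent: *
Disallow: /private/
Allow: /
"""

# Parse the rules and ask whether given URLs may be fetched.
rp = robotparser.RobotFileParser()
rp.parse(sample.splitlines())

print(rp.can_fetch("*", "https://example.com/private/data.html"))  # False
print(rp.can_fetch("*", "https://example.com/public/page.html"))   # True
```

In practice a crawler would call `rp.set_url(robots_url)` and `rp.read()` to fetch the live file over the network; the inline `parse()` call here keeps the sketch self-contained.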
Not all web robots honor robots.txt; some ignore its instructions entirely. Examples include spammers, who use web robots to harvest e-mail addresses, and malware robots, which scan the web for security vulnerabilities.