A robots.txt file is used to stop web spiders and other web robots from accessing specific parts of a web site. The Robots Exclusion Standard, or robots.txt protocol, is a very popular way to hide part of a web site from search engines. You can block a particular folder or file, either for all robots or only for a particular search engine's robot.
To use this protocol, create a file named robots.txt and upload it to the top-level (root) folder of your web site. It is a plain text file, so you can create it with Windows Notepad or any other text editor. In it, you first specify the user agent (the robot's name) and then the directories and files you want to hide from it. For example, this will hide all files and folders from all robots:
User-agent: *
Disallow: /
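To see how a compliant robot reads this file, here is a minimal sketch using Python's standard urllib.robotparser module. It is only an illustration; real search engine robots use their own parsers. The sketch also shows that robots look for the file only at the top level of a host. The example.com URLs are placeholders.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# The "hide everything from every robot" example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def robots_url(page_url):
    """Return where a robot looks for robots.txt: always the top level of the host."""
    return urljoin(page_url, "/robots.txt")


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() accepts the file's lines

print(robots_url("http://www.example.com/docs/page.html"))
for robot in ("Googlebot", "Slurp", "msnbot"):
    # "Disallow: /" is a prefix of every path, so every robot is refused.
    print(robot, parser.can_fetch(robot, "http://www.example.com/index.html"))
```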
To prevent robots from indexing the cgi-bin and images directories, use this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
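Disallow values are matched as path prefixes. A quick check with Python's urllib.robotparser shows which URLs this file blocks. This is again only a sketch, and AnyBot is a made-up robot name. Note that /images without the trailing slash is not covered by Disallow: /images/.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /images/",
])

SITE = "http://www.example.com"
for path in ("/cgi-bin/search.cgi", "/images/logo.gif", "/images", "/about.html"):
    # A URL is blocked when its path starts with one of the Disallow values.
    allowed = parser.can_fetch("AnyBot", SITE + path)
    print(f"{path:22} {'allowed' if allowed else 'blocked'}")
```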
If you want to stop a specific robot, use that robot's name instead of the * wildcard. To stop Google's robot (Googlebot) from accessing myfolder, use this:
User-agent: Googlebot
Disallow: /myfolder/
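A section that names a single robot only affects that robot. In this Python sketch (urllib.robotparser, placeholder URLs), Googlebot is kept out of /myfolder/ while a robot with any other name, here Slurp, is not. Python's parser also compares robot names case-insensitively.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: Googlebot",
    "Disallow: /myfolder/",
])

url = "http://www.example.com/myfolder/page.html"
print("Googlebot:", parser.can_fetch("Googlebot", url))  # matched by the Googlebot section
print("googlebot:", parser.can_fetch("googlebot", url))  # names compare case-insensitively here
print("Slurp:", parser.can_fetch("Slurp", url))          # no section applies, so access is allowed
```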
You can also specify file names instead of folders:
User-agent:*
Disallow: /search.php
Disallow: /download.html
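Because matching is by prefix, a Disallow line for a file also covers that file with a query string appended. This Python sketch checks the two file rules. The URLs are placeholders and AnyBot is a made-up robot name.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent:*",           # the space after the colon is optional
    "Disallow: /search.php",
    "Disallow: /download.html",
])

SITE = "http://www.example.com"
for path in ("/search.php", "/search.php?q=robots", "/download.html", "/downloads/list.html"):
    # /downloads/list.html does not start with /download.html, so it stays allowed.
    print(path, parser.can_fetch("AnyBot", SITE + path))
```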
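Be very careful if you have directives both for all robots and for a specific robot. For example, if a robots.txt file has a general section (User-agent: *) and a section for Googlebot, Googlebot follows only its own section and ignores the general directives. Any general rule that should also apply to Googlebot must therefore be repeated in the Googlebot section.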
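Python's urllib.robotparser behaves the same way in this case: once a section names the robot, the * section is ignored for that robot. The sketch below uses hypothetical /private/ and /drafts/ folders. It shows how Googlebot gets into /private/ unless the general rule is repeated in Googlebot's own section.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, robot, url):
    """Parse a robots.txt string and ask whether robot may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(robot, url)


RISKY = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

# Fixed version: the general rule is repeated in Googlebot's own section.
FIXED = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /private/
Disallow: /drafts/
"""

url = "http://www.example.com/private/report.html"
print("Slurp, risky file:    ", allowed(RISKY, "Slurp", url))      # falls back to *, blocked
print("Googlebot, risky file:", allowed(RISKY, "Googlebot", url))  # own section only, allowed
print("Googlebot, fixed file:", allowed(FIXED, "Googlebot", url))  # blocked
```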
To prevent Yahoo's search robot from accessing a section, use:
User-agent: Slurp
Disallow: /search-engines/
For MSN Search, the robot's name is msnbot.
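Slurp and msnbot can also share one section, because the original robots.txt standard lets a record start with several User-agent lines. Here is a Python sketch with placeholder URLs:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: Slurp",    # Yahoo's robot
    "User-agent: msnbot",   # MSN Search's robot
    "Disallow: /search-engines/",
])

url = "http://www.example.com/search-engines/index.html"
for robot in ("Slurp", "msnbot", "Googlebot"):
    # Slurp and msnbot match the shared section; Googlebot matches nothing and is allowed.
    print(robot, parser.can_fetch(robot, url))
```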
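A web robot normally visits a site, collects its content and stores it in its internal database, usually called the search index. The robots.txt file is how you keep unwanted or private files out of that index. Keep in mind that it only asks robots to stay away: well-behaved robots obey it, but the file itself is publicly readable and does not protect secret files from people or from badly behaved robots.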
Google's URL removal tool also relies on robots.txt. After you block unwanted URLs in your robots.txt file, you can use the tool to request that Google remove them from its index.
The robots.txt protocol also has other directives, such as Crawl-delay, Visit-time and Request-rate, but they are not supported by all robots. For example, Googlebot does not obey the Crawl-delay directive.
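Support really does vary between parsers. Python's urllib.robotparser (3.6 or later), for instance, reads Crawl-delay and Request-rate but has no notion of Visit-time and silently skips it. The sketch below computes how long a polite robot might wait between requests. The helper name seconds_between_requests is hypothetical, and combining the two values with max() is an assumption of this example, not part of any standard.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Crawl-delay: 10",           # wait 10 seconds between requests
    "Request-rate: 1/5",         # at most 1 request every 5 seconds
    "Visit-time: 0600-0845",     # not understood by this parser, ignored
    "Disallow: /cgi-bin/",
])


def seconds_between_requests(parser, robot):
    """Hypothetical helper: the stricter of Crawl-delay and Request-rate (0 if neither is set)."""
    delay = parser.crawl_delay(robot) or 0
    rate = parser.request_rate(robot)  # a RequestRate(requests, seconds) tuple or None
    rate_interval = rate.seconds / rate.requests if rate and rate.requests else 0
    return max(delay, rate_interval)


print(parser.crawl_delay("AnyBot"), parser.request_rate("AnyBot"))
print(seconds_between_requests(parser, "AnyBot"))
```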
It is also possible to keep a single page out of the index with a meta tag. Put this tag in the <head> section of each page you want to hide from search engines:
<meta name="robots" content="noindex">
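On the robot's side, honoring the tag means reading each page's meta tags before indexing it. This is a rough sketch with Python's standard html.parser module. It only looks at the standard robots meta tag, <meta name="robots" content="noindex">. The class name RobotsMetaParser, the function is_noindex and the sample pages are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_noindex(html):
    """True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head><body>Hi</body></html>'
print(is_noindex(page))
```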
Tuesday, 11 September 2007
Robots.txt File
Posted by ike at 15:16
Labels: robot.txt, search engine