How to create and configure a robots.txt file || Disallow URLs


Website owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.

It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt and finds:


User-agent: *
Disallow: /

The “User-agent: *” line means this section applies to all robots. The “Disallow: /” line tells the robot that it should not visit any page on the site.
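A polite crawler runs this check before every request it makes. As a rough sketch (not part of the original article), Python’s standard urllib.robotparser module can parse those two lines and answer whether a page may be fetched; “ExampleBot” below is just a made-up crawler name, and the URLs are the example.com placeholders from above.

from urllib.robotparser import RobotFileParser

# The two example rules above, parsed from a string so no network
# request is needed; a real crawler would instead call
# rp.set_url("http://www.example.com/robots.txt") followed by rp.read().
rules = """
User-agent: *
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# "Disallow: /" blocks every path for every robot, so both checks print False.
print(rp.can_fetch("ExampleBot", "http://www.example.com/welcome.html"))
print(rp.can_fetch("ExampleBot", "http://www.example.com/"))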

For more details, see the official Robots Exclusion Protocol documentation.

Here is how it works in practice, as shown by the following examples (a short script after the list shows how these rules are interpreted):

  • To exclude all robots from the entire server

User-agent: *
Disallow: /

 

  • To allow all robots complete access
User-agent: *
Disallow:

(or just create an empty “/robots.txt” file, or don’t use one at all)

 

  • To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

 

  • To exclude a single robot
User-agent: BadBot
Disallow: /

  •  To allow a single robot

 

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

  •  To exclude all files except one
    This is a bit awkward, as the original protocol has no “Allow” field (though most major crawlers now support one). The easy, portable way is to put all files to be disallowed into a separate directory, say “stuff”, and leave the one file in the level above this directory:

 

User-agent: *
Disallow: /~joe/stuff/

  •  Alternatively, you can explicitly list every disallowed page:

 

User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
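
As a quick way to sanity-check rule sets like these before publishing them, the short sketch below (again using Python’s standard urllib.robotparser module) feeds a few of the examples above into the parser and queries some sample URLs; the test URLs and the index.html file name are made up for the demonstration.

from urllib.robotparser import RobotFileParser

def allowed(rules, agent, url):
    # Parse a robots.txt rule set supplied as a string and report
    # whether the named user agent may fetch the given URL.
    rp = RobotFileParser()
    rp.parse(rules.splitlines())
    return rp.can_fetch(agent, url)

# Excluding all robots from part of the server (third example above).
partial = """
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""
print(allowed(partial, "ExampleBot", "http://www.example.com/cgi-bin/report"))  # False
print(allowed(partial, "ExampleBot", "http://www.example.com/index.html"))      # True

# Excluding a single robot (fourth example above).
single = """
User-agent: BadBot
Disallow: /
"""
print(allowed(single, "BadBot", "http://www.example.com/index.html"))    # False
print(allowed(single, "OtherBot", "http://www.example.com/index.html"))  # True

# Many modern crawlers (and urllib.robotparser) also understand an "Allow"
# line, which handles the "all files except one" case more directly;
# /~joe/index.html is a made-up name for the one allowed file.
with_allow = """
User-agent: *
Allow: /~joe/index.html
Disallow: /~joe/
"""
print(allowed(with_allow, "ExampleBot", "http://www.example.com/~joe/index.html"))  # True
print(allowed(with_allow, "ExampleBot", "http://www.example.com/~joe/junk.html"))   # False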

 

