A robots.txt file gives web robots instructions about a site, using the Robots Exclusion Protocol.
How it works:
When a robot visits a website, it first checks for a robots.txt file in the root directory.
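For example, before crawling http://www.example.com/welcome.html, a robot first fetches http://www.example.com/robots.txt and follows the rules it finds there (example.com is just a placeholder domain).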
Syntax:
User-agent:
Disallow:
Allow:
User-agent: names the search engine robot/crawler/spider the rules that follow apply to.
Disallow: lists the files and directories the robot should not crawl.
Allow: lists files and directories the robot may crawl, but not all robots support this directive (see the example below).
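As a minimal sketch (the /photos/ directory is just a placeholder), robots that honour Allow, such as Googlebot, apply the most specific matching rule, so Allow can re-open one path inside an otherwise disallowed directory:
User-agent: *
Disallow: /photos/
Allow: /photos/public/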
Two important things to be mindful of:
1. Robots, especially malware robots, can ignore your /robots.txt file.
2. The file is publicly available; anyone can see which sections of your website you don't want robots to visit.
Don't try to use /robots.txt to hide information.
To exclude all robots from the entire site (this hides the website from search engines):
User-agent: *
Disallow: /
To allow all robots complete access:
User-agent: *
Disallow:
(or just create an empty "/robots.txt" file, or don't use one at all!)
To exclude all robots from certain directories on the server:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
To exclude a single robot:
User-agent: MalwareBot
Disallow: /
To allow a single robot:
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
You can explicitly disallow some pages:
User-agent: *
Disallow: /~andrew/a.html
Disallow: /~andrew/b.html
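Putting these pieces together, a single /robots.txt can hold several groups separated by blank lines; a sketch combining the examples above (directory and page names are placeholders) might look like this:
User-agent: MalwareBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~andrew/a.html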