EW Resource

Creating robots.txt files in EW

Robots.txt files can be used to mark both directories and files that you do not want search engine spiders to access.

The format is straightforward:

User-agent: *
Disallow: /bin/
Disallow: /images/

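The * against User-agent means the rules that follow apply to ALL search engine crawlers. They are asked to stay out of /bin/ and /images/, and the rest of the site remains open to them.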
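To see how a crawler that honours the file reads these rules, Python's standard-library urllib.robotparser module can parse the example above straight from a string. This is only a sketch: real search engines use their own parsers, and Gaaglebot and www.mysite.com are just the placeholder names used in this article.

```python
from urllib.robotparser import RobotFileParser

# The example file above; the '*' group applies to every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bin/
Disallow: /images/
"""


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from memory, no network access
    return parser.can_fetch(agent, "http://www.mysite.com" + path)


if __name__ == "__main__":
    for path in ("/index.html", "/bin/setup.exe", "/images/logo.png"):
        print(path, allowed(ROBOTS_TXT, "Gaaglebot", path))
```

Run as a script, it reports True for /index.html and False for the two disallowed paths, whatever crawler name is passed in.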

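User-agent: Gaaglebot on its own would make the rules that follow apply ONLY to the Gaagle crawler. To PREVENT the Gaagle crawler from crawling the site at all, give it a Disallow rule for the root of the site:

User-agent: Gaaglebot
Disallow: /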
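The same Python parser can illustrate a group that names a single crawler and disallows the root path "/". Again this is a sketch using the article's placeholder crawler name; crawlers not named in any group, and with no * group present, are left unrestricted.

```python
from urllib.robotparser import RobotFileParser

# A group that names one crawler and disallows the root path "/".
ROBOTS_TXT = """\
User-agent: Gaaglebot
Disallow: /
"""


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "http://www.mysite.com" + path)


if __name__ == "__main__":
    for agent in ("Gaaglebot", "SomeOtherBot"):
        print(agent, allowed(ROBOTS_TXT, agent, "/index.html"))
```

Note that Python's parser matches the crawler name loosely (as a case-insensitive substring of the name it is given), which a real search engine may not do in exactly the same way.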

An extension to the protocol known as 'Sitemaps Autodiscovery' tells every search engine that recognises the extension where the site's sitemap is. It is not yet universally recognised, but it is worth including for the search engines that can use it.

Sitemap: http://www.mysite.com/sitemap.xml

So our complete robots.txt file might look like this (note that the file name should be in lower case: robots.txt, NOT Robots.txt):

User-agent: *
Disallow: /bin/
Disallow: /images/
Disallow: /testpages/
Sitemap: http://www.mysite.com/sitemap.xml
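As a quick sanity check of the complete file, the sketch below parses it with Python's urllib.robotparser, confirms that /testpages/ is disallowed and reads back the Sitemap line. It assumes Python 3.8 or later, where RobotFileParser.site_maps() is available; the page name draft.html is hypothetical.

```python
from typing import List, Optional
from urllib.robotparser import RobotFileParser

# The complete example file from above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bin/
Disallow: /images/
Disallow: /testpages/
Sitemap: http://www.mysite.com/sitemap.xml
"""


def load(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt content held in a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def sitemaps(robots_txt: str) -> Optional[List[str]]:
    """Sitemap URLs declared in the file, or None if there are none."""
    return load(robots_txt).site_maps()


if __name__ == "__main__":
    rp = load(ROBOTS_TXT)
    print(rp.can_fetch("Gaaglebot", "http://www.mysite.com/testpages/draft.html"))
    print(sitemaps(ROBOTS_TXT))
```

The Sitemap line sits outside the User-agent rules, so the parser collects it separately from the Disallow entries.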

To create this file in EW, use 'File - New - Page', select 'General' and then select 'Text File' from the options given.

However, when we create and save this file in EW, it is saved in the current default character encoding and, whilst it may look correct in EW, it will not function correctly on the site. To verify this, use one of the online robots.txt syntax checkers such as http://tool.motoricerca.info/robots-checker.phtml
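One likely cause, and this is an assumption rather than something the checker will necessarily report, is an invisible byte-order mark (BOM) written before 'User-agent'. The sketch below checks a file's first bytes for a BOM and uses Python's urllib.robotparser to show how that particular parser reacts: the first line no longer reads as 'User-agent', so the Disallow rule is ignored. How a given search engine treats a BOM may differ.

```python
import codecs
import sys
from pathlib import Path
from typing import Optional
from urllib.robotparser import RobotFileParser

# Byte-order marks an editor might write at the very start of a text file.
BOMS = (
    (codecs.BOM_UTF8, "UTF-8 BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE BOM"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE BOM"),
)


def detect_bom(data: bytes) -> Optional[str]:
    """Return a description of the BOM at the start of `data`, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def bin_allowed(text: str) -> bool:
    """Can a crawler fetch a page under /bin/, according to Python's parser?"""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser.can_fetch("Gaaglebot", "http://www.mysite.com/bin/setup.exe")


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /bin/\n"
    print(bin_allowed(rules))             # False: the rule is obeyed
    print(bin_allowed("\ufeff" + rules))  # True: the BOM hides 'User-agent'
    # Check real files named on the command line, e.g. robots.txt
    for name in sys.argv[1:]:
        print(name, detect_bom(Path(name).read_bytes()) or "no byte-order mark")
```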

For EW V1

To get around this, close the file in EW and open it in Notepad. Use 'Save As' in Notepad and select ANSI encoding, then publish or FTP the file using EW as normal. Once the file has been saved with the correct encoding, it can be edited and saved in EW as normal.

For EW V2/V3/V4

Right-click the page and select 'Encoding'. Use the 'Save the current file as' drop-down to select 'US/Western European (Windows)', then click the 'Save As' button. When asked whether you want to replace the current file, confirm that you do.
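If neither Notepad nor the EW Encoding dialog is to hand, the same conversion can be scripted. This is a minimal sketch, assuming that 'ANSI' and 'US/Western European (Windows)' both mean the Windows-1252 code page (as they do on Western European Windows installations) and that the file was saved as UTF-8 or UTF-16, with or without a BOM. The function names are illustrative.

```python
import codecs
import sys
from pathlib import Path


def to_windows_1252(raw: bytes) -> bytes:
    """Re-encode UTF-8/UTF-16 text as Windows-1252 ('ANSI'), dropping any BOM.

    Raises UnicodeEncodeError if the text holds characters Windows-1252 lacks.
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")     # the utf-16 codec consumes its BOM
    else:
        text = raw.decode("utf-8-sig")  # drops a leading UTF-8 BOM if present
    # Windows-1252 has no BOM; plain ASCII rules encode to the same bytes.
    return text.encode("cp1252")


def resave(path: Path) -> None:
    """Overwrite the file in place with its Windows-1252 encoding."""
    path.write_bytes(to_windows_1252(path.read_bytes()))


if __name__ == "__main__":
    # Usage: python resave.py robots.txt
    for name in sys.argv[1:]:
        resave(Path(name))
```

Line endings are left exactly as they were, since the text is decoded and re-encoded without any newline translation.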

For further information on robots.txt files, see The Web Robot Pages.