Search Engine Spiders Lost Without Guidance - Post This Sign!
The robots.txt file implements the robots exclusion standard, a voluntary convention that well-behaved web crawlers/robots consult to learn which files and directories you want them to stay OUT of on your site. Not all crawlers/bots follow the exclusion standard, and some will continue crawling your site anyway. I like to call them "Bad Bots" or trespassers. We block them by IP exclusion, which is another story entirely.
This is a very simple overview of robots.txt basics for webmasters. For a complete and thorough lesson, visit http://www.robotstxt.org/
A fairly standard robots.txt file is shown directly below. The file must sit at the root of the domain, because that is the only place crawlers look for it, not in some secondary directory.
Below is the proper format for a robots.txt file ----->
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /group/

User-agent: msnbot
Crawl-delay: 10

User-agent: Teoma
Crawl-delay: 10

User-agent: Slurp
Crawl-delay: 10

User-agent: aipbot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: psbot
Disallow: /
--------> End of robots.txt file
Save this tiny file as a plain text document, ALWAYS under the name "robots.txt", in the root of your domain.
A quick review of the robots.txt file above follows. The "User-agent: msnbot" record is for MSN's crawler, Slurp is Yahoo's and Teoma is AskJeeves'. The others listed are "Bad" bots that crawl very fast and to nobody's benefit but their own, so we ask them to stay out entirely. The asterisk * is a wildcard meaning that "All" crawlers/spiders/bots without a record of their own should stay out of the files or directories listed under it.
Bots given the instruction "Disallow: /" should stay out entirely. Those given "Crawl-delay: 10" crawled our site too quickly, bogging it down and overusing server resources. Google crawls more slowly than the others and doesn't need that instruction, so it is not specifically listed in the robots.txt file above. Crawl-delay is an extension to the original standard that not every crawler honors, and in our experience it is only needed on very large sites with hundreds or thousands of pages. The wildcard asterisk * applies to every crawler, bot and spider that is not named in a record of its own, which includes Googlebot. A crawler that does have its own record (msnbot, Teoma and Slurp above) follows only that record, so repeat the Disallow lines there if they should apply to it too.
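If you want to check how a set of rules will be read before you upload the file, Python's standard urllib.robotparser module can parse it locally. The sketch below is only an illustration: it uses a shortened excerpt of our file, the helper name load_rules and the sample paths are our own inventions, and real crawlers may interpret edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the robots.txt shown in this article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /group/

User-agent: msnbot
Crawl-delay: 10

User-agent: psbot
Disallow: /
"""


def load_rules(text):
    """Parse robots.txt text (hypothetical helper, no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Googlebot has no record of its own, so the * record applies to it.
print(rules.can_fetch("Googlebot", "/images/logo.gif"))      # False
print(rules.can_fetch("Googlebot", "/articles/index.html"))  # True

# psbot is told to stay out entirely.
print(rules.can_fetch("psbot", "/articles/index.html"))      # False

# msnbot has its own record, which carries only a crawl delay.
print(rules.crawl_delay("msnbot"))                           # 10
print(rules.can_fetch("msnbot", "/images/logo.gif"))         # True
```

The last line is the one to watch: because msnbot has a record of its own, this parser does not apply the Disallow lines from the * record to it.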
The crawlers we gave the "Crawl-delay: 10" instruction to were requesting as many as 7 pages every second, so we asked them to slow down. The number is the delay in seconds between requests, and you can change it to suit your server capacity and their crawling rate. Ten seconds between page requests is far more leisurely and stops them from asking for more pages than your server can dish up.
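To put those two rates side by side, here is a bit of back-of-the-envelope arithmetic in Python. The function names are ours, and the figures assume a single crawler that keeps up the same pace around the clock, which real crawlers rarely do.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def pages_per_day_at_rate(pages_per_second):
    """Pages requested in a day by a crawler sustaining this rate."""
    return pages_per_second * SECONDS_PER_DAY


def max_pages_per_day(crawl_delay_seconds):
    """Upper bound on pages per day when a crawler waits this long between requests."""
    return SECONDS_PER_DAY // crawl_delay_seconds


print(pages_per_day_at_rate(7))  # 604800 pages a day at 7 pages per second
print(max_pages_per_day(10))     # 8640 pages a day with Crawl-delay: 10
```

Under these assumptions, a ten second delay cuts the worst case load from that one crawler by a factor of 70.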
(You can discover how fast robots and spiders are crawling by looking at your raw server logs, which show pages requested by precise times to within a hundredth of a second. The logs are available from your web host, or ask your web or IT person. If you have server access, they can often be found in the root directory, and you can usually download compressed server log files by calendar day right off your server. You'll need a utility that can expand compressed files to open and read those plain text raw server log files.)
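Reading raw logs by eye gets tedious, so here is a rough Python sketch that counts the busiest second for each user agent. It assumes your logs are in the common Apache "combined" format with one-second timestamps, which may not match what your host produces. The function name and the file name in the usage comment are hypothetical.

```python
import re
from collections import Counter

# Assumed layout (Apache combined log format), for example:
# 1.2.3.4 - - [17/Aug/2005:10:15:32 -0700] "GET /a.html HTTP/1.1" 200 512 "-" "msnbot/1.0"
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def peak_requests_per_second(lines):
    """Return {user_agent: most requests seen within any single second}."""
    per_second = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match:  # lines in another format are skipped silently
            per_second[(match.group("agent"), match.group("time"))] += 1

    peaks = {}
    for (agent, _second), count in per_second.items():
        peaks[agent] = max(peaks.get(agent, 0), count)
    return peaks


# Usage with a compressed daily log (hypothetical file name):
#
#   import gzip
#   with gzip.open("access.log.gz", "rt", errors="replace") as log:
#       for agent, peak in sorted(peak_requests_per_second(log).items()):
#           print(peak, agent)
```

Any agent whose peak sits well above what your server handles comfortably is a candidate for a Crawl-delay line, or for a "Disallow: /" if it brings you no benefit.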
To see the contents of any site's robots.txt file, just type /robots.txt after the domain name. If the site has that file up, you will see it displayed as a text file in your web browser. Click on the link below to see that file for Amazon.com:
http://www.Amazon.com/robots.txt
You can see the contents of any website's robots.txt file that way.
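The same rule is easy to express in code. This small Python sketch, with a function name of our own choosing, turns the address of any page into the address where that site's robots.txt would live, if it has one.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Build the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace everything after them.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.Amazon.com/some/page?x=1"))
# http://www.Amazon.com/robots.txt
```

Passing the result to urllib.request.urlopen would download the file, assuming the site serves one at that address.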
The robots.txt shown above is what we currently use at Publish101 Web Content Distributor, just launched in May of 2005. We did an extensive case study and published a series of articles on crawler behavior and indexing delays known as the Google Sandbox. That Google Sandbox Case Study is highly instructive on many levels for webmasters everywhere about the importance of this often ignored little text file.
One thing we didn't expect to glean from our research into indexing delays was how much robots.txt files matter to quick and efficient crawling by the spiders from the major search engines. We were also surprised by the number of heavy crawls from bots that do no earthly good to the site owner, yet crawl most sites extensively, straining servers to the breaking point with requests coming as fast as 7 pages per second.
We discovered in our launch of the new site that Google and Yahoo will crawl the site whether or not you use a robots.txt file, but MSN seems to REQUIRE it before they will begin crawling at all. All of the search engine robots seem to request the file on a regular basis to verify that it hasn't changed.
Then when you DO change it, they will stop crawling for brief periods and repeatedly ask for that robots.txt file during that time without crawling any additional pages. (Perhaps they had a list of pages to visit that included the directory or files you have instructed them to stay out of and must now adjust their crawling schedule to eliminate those files from their list.)
Most webmasters instruct the bots to stay out of "image" directories and the "cgi-bin" directory as well as any directories containing private or proprietary files intended only for users of an intranet or password protected sections of your site. Clearly, you should direct the bots to stay out of any private areas that you don't want indexed by the search engines.
The importance of robots.txt is rarely discussed by average webmasters. I've even had webmasters at some of my client businesses ask me what it is and how to implement it when I tell them how important it is, both for keeping private areas out of the search indexes and for efficient crawling by the search engines. This should be standard knowledge for webmasters at substantial companies, and it illustrates how little attention is paid to robots.txt.
The search engine spiders really do want your guidance and this tiny text file is the best way to provide crawlers and bots a clear signpost to warn off trespassers and protect private property - and to warmly welcome invited guests, such as the big three search engines while asking them nicely to stay out of private areas.
Copyright © August 17, 2005 by Mike Banks Valentine
Google Sandbox Case Study: http://publish101.com/Sandbox2 Mike Banks Valentine operates http://Publish101.com, Free Web Content Distribution for Article Marketers, and provides content aggregation, press release optimization and custom web content for Search Engine Positioning: http://www.seoptimism.com/SEO_Contact.htm