60 Day Sandbox for Google & AskJeeves; MSN Indexes Quickest, Yahoo Next


Search engine listing delays, which have come to be called the Google Sandbox effect, are real in practice at each of the four top-tier search engines in one form or another. MSN, it seems, has the shortest indexing delay at 30 days. This article is the second in a series following the spiders through a brand new web site, beginning on May 11, 2005, when the site first went live under a newly purchased domain name.

First Case Study Article

Previously we looked at the first 35 days and detailed the crawling behavior of Googlebot, Teoma, MSNbot and Slurp as they traversed the pages of this new site. We discovered that each robot spider displays distinctly different behavior in crawling frequency, and similarly differing indexing patterns.

For reference, about 15 to 20 new pages are added to the site daily, each of which is linked from the home page for a day. The site structure is non-traditional, with no categories. Instead, linking is tied to author pages listing their articles, plus a "related articles" index that links to relevant pages containing similar content.

So let's review where we are with each spider crawling and look at pages crawled and compare pages indexed by engine.

The AskJeeves spider, Teoma, has crawled most of the pages on the site, yet has indexed none of them 60 days later at this writing. This appears to be a site aging delay modeled on Google's Sandbox behavior. The Teoma spider from Ask.com has crawled more pages on this site than any other engine over the 60-day period, yet it now appears to have tired of crawling: it has not returned since July 13, its first break in 60 days.

In the first two days, Googlebot gobbled up 250 pages and then didn't return until 60 days later. It has not indexed even a single page in the 60 days since that initial crawl. But Googlebot is showing a renewed interest in crawling the site since the first article in this crawling case study was published on several high-traffic sites. Now Googlebot is looking at a few pages each day: so far no more than about 20 pages, at a decidedly lackluster pace, a true "crawl" that will keep it occupied for years if it continues that slowly.

MSNbot crawled timidly for the first 45 days, looking over 30 to 50 pages daily, but only after it found a robots.txt file. We had neglected to post one to the site for the first week, then bobbled the ball when we changed the site structure and failed to implement robots.txt in the new subdomains until day 25, and THEN MSNbot didn't return until day 30. If little else has been discovered about initial crawls and indexing, we have seen that MSNbot relies heavily on the robots.txt file, and that proper implementation of that file will speed crawling.

MSNbot is now crawling with enthusiasm, at anywhere between 200 and 800 pages daily. As a matter of fact, we had to add a "crawl-delay" directive to the robots.txt file after MSNbot began hitting 6 pages per second last week. The MSN index now shows 4905 pages 60 days into this experiment. Cached pages change weekly. MSNbot has apparently found that it likes how we changed the page structure to include a new feature which links to questions from several other article pages.
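For readers who have not used the directive, here is a minimal Python sketch of what a crawl-delay rule looks like and how a well-behaved crawler would read it. The 10-second value is an assumption chosen for illustration (this article does not record the setting used on the site), and the helper name crawl_delay_for is hypothetical. Python's standard urllib.robotparser module is used only to confirm that the file parses as intended.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The 10-second delay is an illustrative value,
# not the setting used on the site in this case study.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay in seconds that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    # msnbot gets the explicit delay; other crawlers fall through to the
    # catch-all record, which sets no delay.
    print("msnbot:", crawl_delay_for(ROBOTS_TXT, "msnbot"))
    print("Googlebot:", crawl_delay_for(ROBOTS_TXT, "Googlebot"))
```

Crawl-delay is a non-standard extension to robots.txt, so an engine that does not support it will simply ignore the line. A delay of N seconds asks a crawler to fetch at most one page every N seconds, which is a long way below 6 pages per second.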

Slurp gets strangely inactive, then alternately hyperactive, for periods of time. The Yahoo crawler will look at 40 pages one day and 4000 the next, then simply look at the home page for a few days, then jump back in for 3000 pages the next day and go back to only reviewing robots.txt for two days. Consistency is not a curse suffered by Slurp. Yahoo now shows 6 pages in its index; one is an error page and another is an "index of" directory page, as we have not posted a home page to several subdomains. But Slurp has easily crawled 15,000 pages to date.

Lessons learned in the first 60 days on a new site follow:

1) Google crawls 250 pages on first discovery of links to the site. Then it doesn't return until it finds more links, and even then crawls slowly. Google has failed to index the new domain for 60 days.

2) Yahoo looks for error pages and, once it finds bad links, will crawl them ceaselessly until you tell it to stop. Then it won't crawl at all for weeks, before crawling heavily one day and lightly the next in random fashion.

3) MSNbot requires a robots.txt file and, once it decides it likes your site, may crawl too fast, requiring "crawl-delay" instructions in that robots.txt file. Implement robots.txt immediately.

4) Bad bots can strain resources and hit too many pages too quickly until you tell them to stay out. We banned 3 bots outright after they slammed our servers for a day or two. We noted that "aipbot" crawled first, then "BecomeBot" came along, and then "Pbot" from Picsearch.com crawled heavily, looking for image files we don't have. Bad bots, stay out. It is best to implement robots.txt exclusions for all but the top engines if their crawlers strain your server resources. We considered excluding the Chinese search engine Baidu.com when it began crawling heavily early on. We don't expect much traffic from China, but why exclude one billion people? Especially since Google is rumored to be considering a possible purchase of Baidu.com as an entry to the Chinese market.
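The exclusions in lesson 4 can be sketched as follows. This is a minimal Python example, not the file actually deployed on the site: it builds a robots.txt with one "Disallow: /" record per banned bot plus an allow-all default, then checks the result with the standard urllib.robotparser module. The bot names are copied from the list above; the exact user-agent token each crawler sends should be confirmed in your own server logs. The function names build_robots_txt and is_allowed are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Bot names as given in this article; verify the real user-agent
# tokens against your own logs before relying on them.
BANNED_BOTS = ["aipbot", "BecomeBot", "Pbot"]


def build_robots_txt(banned):
    """One disallow-everything record per banned bot, then an allow-all default."""
    records = [f"User-agent: {name}\nDisallow: /\n" for name in banned]
    records.append("User-agent: *\nDisallow:\n")
    return "\n".join(records)


def is_allowed(robots_txt, user_agent, url="/index.html"):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots_txt = build_robots_txt(BANNED_BOTS)
    print(robots_txt)
    for agent in BANNED_BOTS + ["Googlebot", "msnbot", "Slurp", "Teoma"]:
        print(agent, "allowed" if is_allowed(robots_txt, agent) else "blocked")
```

Keep in mind that robots.txt is only a request. A crawler that ignores it has to be blocked at the server instead.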

The bottom line is that we've discovered all engines seem to delay indexing of new domain names for at least thirty days. Google so far has delayed indexing THIS new domain for 60 days since first crawling it. AskJeeves has crawled thousands of pages while indexing none of them. MSN indexes faster than all other engines but requires a robots.txt file. Yahoo's Slurp has crawled on again, off again for 60 days, but has indexed only six of the 15,000 or more pages crawled to date.

We seem to have settled that there is a clear indexing delay, but whether this site specifically is "Sandboxed" and whether delays apply universally is less clear. Many webmasters claim that they have been indexed fully within 30 days of first posting a new domain. We'd love to see others track spiders through new sites following launch to document their results publicly so that indexing and crawling behavior are proven.
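For anyone who wants to track spiders on their own new site, a daily tally like the ones reported here can be pulled from the server access log. The Python sketch below is one possible approach and not the method used for this case study. It assumes the Apache "combined" log format, the sample log lines are made up for illustration (they are not data from this site), and the function name tally_crawls is hypothetical.

```python
import re
from collections import Counter

# Substrings to look for in the User-Agent field (spider names from this article).
SPIDERS = ["Googlebot", "msnbot", "Slurp", "Teoma"]

# Assumes Apache "combined" log format; adjust the pattern for other servers.
LINE_RE = re.compile(
    r'\[(?P<day>[^:]+):[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [11/May/2005:09:15:02 -0700] "GET / HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.10 - - [11/May/2005:09:15:09 -0700] "GET /articles/1.html HTTP/1.0" 200 7301 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.20 - - [12/May/2005:14:40:51 -0700] "GET /robots.txt HTTP/1.0" 404 209 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.30 - - [12/May/2005:16:02:33 -0700] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]


def tally_crawls(lines):
    """Count requests per (day, spider) pair, skipping non-spider traffic."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # line is not in the expected format
        agent = match.group("agent").lower()
        for spider in SPIDERS:
            if spider.lower() in agent:
                counts[(match.group("day"), spider)] += 1
                break
    return counts


if __name__ == "__main__":
    for (day, spider), hits in sorted(tally_crawls(SAMPLE_LOG).items()):
        print(day, spider, hits)
```

Matching on the user-agent string alone counts anything that claims to be a given spider, so the totals are best treated as approximate.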

© Copyright July 18, 2005 Mike Banks Valentine

Mike Banks Valentine is a search engine optimization specialist who operates WebSite101 eCommerce Tutorial and will continue to report on this case study chronicling search engine indexing of the Publish101 Article Resource.


