The Problem Of Demoting Spam On The Internet: Yahoo!'s TrustRank Approach
TrustRank is an attempt to counter the web spamming activities that threaten to deceive search engines' ranking algorithms. It propagates trust among web pages in the same manner that PageRank propagates authority. However, tests show that combining trust and distrust values demotes spam sites more effectively than using trust values alone.
The Assumption
A link between two pages carries an implied conveyance of trust from the source page to the target page. Linking to a page is a vote of confidence from the source that the target provides content of value to the user. The approach rests on the ideal that good sites point only to similarly good sites and will not knowingly refer people to spam sites. These good sites hold people's trust, which is then propagated through the link structure of the web.
TrustRank uses a set of highly trusted seed sites to help demote web spam. The approach assigns a non-zero initial trust score to these seed sites and an initial value of zero to all other sites. A biased PageRank algorithm then propagates these initial trust scores along outgoing links, so that after convergence good sites are expected to get a decent trust score while spam sites are likely to get lower ones.
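The propagation described above can be sketched in a few lines of Python. This is a minimal illustration rather than Yahoo!'s implementation: the function name trust_rank, the damping factor of 0.85, the fixed iteration count, and the toy link graph are all assumptions made for the example. Each page splits its trust equally among the pages it links to, and a page with several parents simply sums what it receives.

```python
def trust_rank(links, seeds, damping=0.85, iterations=50):
    """Propagate trust from seed pages with a seed-biased PageRank.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not values from the article.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    # Seeds share the initial trust equally; every other page starts at zero.
    seed_score = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_score)
    for _ in range(iterations):
        # The "random jump" part of PageRank is biased to land only on seeds.
        nxt = {p: (1.0 - damping) * seed_score[p] for p in pages}
        for source, targets in links.items():
            if not targets:
                continue
            # A parent's trust is split equally among its children ...
            share = damping * trust[source] / len(targets)
            for target in targets:
                # ... and a child sums the shares from all of its parents.
                nxt[target] += share
        trust = nxt
    return trust


# Hypothetical toy web: three good pages and two spam pages that
# only link to each other.
links = {
    "directory": ["news", "university"],
    "news": ["university"],
    "university": ["directory"],
    "spam-farm": ["spam-shop"],
    "spam-shop": ["spam-farm"],
}
scores = trust_rank(links, seeds={"directory"})
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page:12s} {score:.4f}")
```

In this toy graph no trusted page links to the spam pages, so they never receive any trust. As soon as one good page links to spam, some trust leaks through, which is the weakness the following paragraphs discuss.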
The possibility of a page pointing to a spam page increases as its number of links increases. It has therefore been proposed that the trust score of a parent page be split equally among its child pages. This raises the question of how to score a child page whose multiple parents pass it different amounts of trust. TrustRank resolves this by simple summation, which has not been very effective in curtailing spam sites' efforts to raise their ranking.
The conveyance of distrust emerged as a natural extension of the conveyance of trust between links. Distrust indicates a lack of confidence in a source page because it links to an untrustworthy page. Thus, when a page is found to link to a known spam page, its trust judgment can no longer be considered valid.
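One way to picture distrust is as the same propagation run backwards: distrust starts at known spam pages and flows to the pages that link to them. The sketch below is an assumption-laden illustration, not a published algorithm. The helper names propagate and reverse, the rule of subtracting distrust from trust, the weight beta, and the toy graph are all hypothetical.

```python
def propagate(links, seeds, damping=0.85, iterations=50):
    """Seed-biased PageRank: scores start at the seeds and flow along links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    seed_score = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    score = dict(seed_score)
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * seed_score[p] for p in pages}
        for source, targets in links.items():
            for target in targets:
                nxt[target] += damping * score[source] / len(targets)
        score = nxt
    return score


def reverse(links):
    """Flip every link so that scores flow from a target back to its sources."""
    flipped = {}
    for source, targets in links.items():
        flipped.setdefault(source, [])
        for target in targets:
            flipped.setdefault(target, []).append(source)
    return flipped


# Hypothetical toy web: "blog" is a good page that links to the spam page "pills".
links = {
    "portal": ["blog", "shop"],
    "blog": ["portal", "pills"],
    "shop": ["portal"],
    "pills": ["blog"],
}
trust = propagate(links, seeds={"portal"})
distrust = propagate(reverse(links), seeds={"pills"})

beta = 1.0  # assumed weight of distrust relative to trust
combined = {page: trust[page] - beta * distrust[page] for page in trust}
for page in sorted(combined, key=combined.get, reverse=True):
    print(f"{page:8s} trust={trust[page]:.3f} "
          f"distrust={distrust[page]:.3f} combined={combined[page]:.3f}")
```

With trust alone, "blog" outranks "shop" in this toy graph. Once distrust is subtracted, "blog" drops below "shop" because it links to a known spam page, and "pills" ends up last.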
TrustRank, as originally conceived, proposed that trust should diminish with distance from the seed set of trusted pages. However, the limited number of seed pages means that propagation cannot reach the whole web. A better-performing algorithm is needed to produce trust judgments for a larger fraction of web pages.
The seed set may not sufficiently represent the different topics of the web. TrustRank tends to show a bias towards larger communities, which can be remedied by using topical information to divide the seed set and calculate trust scores separately for each topic. Pages listed in well-maintained topic directories can help resolve the coverage issue. Seed filtering may be done to remove low-quality pages, or even spam pages, that may have inadvertently been included in the pool of seed pages.
Much work is being done on methods that don't rely heavily on human judgment to identify spam-free pages. As it is, searchers are hard pressed to locate pages that serve their needs rather than pages built only to rank highly in search engines. Sites that provide no value to users are too numerous to be ignored.
Semantic Cloaking on the Web
Semantics is the study of meaning in language; it compares words with other words or symbols and determines the relevance and relationships between them. Semantic cloaking is the practice of supplying different versions of a web page to search engines and to browsers. The content provider's purpose is to hide the real content of the page from search engines. The difference in meaning between the versions is intended to deceive search engines' ranking algorithms. Cloaking is one type of search engine spamming technique that makes it possible for non-relevant pages to occupy top rankings in searches.
People use search engines to find the most relevant responses to their queries. Users typically view just one page of results, so sites are hard put to compete for the top rankings, particularly for popular queries. For a commercial website, increased traffic translates into more profit.
Reputable content providers work hard to produce high-quality web pages to earn the high ranking they want. Unfortunately, not all content providers hold the same view. Some try to reach a high ranking by manipulating the web page features that search engines use as the basis for their ranking algorithms.
Ranking algorithms assume that page content is real, meaning that the content seen by search engines is identical to that seen by actual users with browsers. Cloaking breaks this assumption by supplying different versions to each, causing a great deal of confusion and disappointment for users.
Cloaking falls under the page-hiding category of search engine spamming techniques, though some cloaking behavior is considered acceptable. Cloaking is of two types: syntactic and semantic. Syntactic cloaking includes all situations in which different content is sent to a crawler and to a real user. Semantic cloaking is an offshoot of syntactic cloaking that employs differences in meaning between pages to deceive the ranking algorithms of search engines.
Syntactic cloaking may be acceptable in cases such as web servers placing session identifiers within URLs for copies sent to browsers but not for copies sent to crawlers. Web servers use these identifiers to tell their users apart, whereas search engines may interpret them as a change in the page. The cloaking behavior that needs to be penalized is semantic cloaking.
There are various proposals on ways to counter the problem. One proposal suggests comparing copies of a page from both the browser's perspective and the crawler's perspective. It may be necessary to get two or more copies from each side to be able to detect cloaking. Another suggests a two-step process that would require fewer resources. The first step applies a heuristic filter to eliminate web pages that cannot demonstrate cloaking. All the pages that have not been eliminated go through the second step for inspection: features are extracted from about four copies, and a classifier determines whether semantic cloaking is being done. However, the reality remains that no ideal solution has been found to effectively curb semantic cloaking. It is a technique that should not be practiced by anyone who wants to maintain good business ethics. The practice continues to undermine search engines' attempts to provide users with the information they actually need.
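A first-step filter of the kind described above might look like the following sketch. It is only an illustration under stated assumptions: the function names terms and cloaking_candidate, the crude tag stripping, the threshold of 3 terms, and the sample pages are all hypothetical, and the second step (feature extraction and a trained classifier) is not shown. Two copies are fetched from each perspective so that content which changes on every request, such as a visitor counter, is not mistaken for cloaking.

```python
import re


def terms(html):
    """Return the set of lower-cased words in a page copy (tags stripped crudely)."""
    text = re.sub(r"<[^>]*>", " ", html)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def cloaking_candidate(crawler_copies, browser_copies, threshold=3):
    """Heuristic filter: keep a page for closer inspection only if one side
    consistently sees terms that the other side never sees.

    threshold is an assumed cut-off, not a value from the article.
    """
    crawler_sets = [terms(copy) for copy in crawler_copies]
    browser_sets = [terms(copy) for copy in browser_copies]
    # Terms present in every crawler copy but in no browser copy, and vice versa.
    only_crawler = set.intersection(*crawler_sets) - set.union(*browser_sets)
    only_browser = set.intersection(*browser_sets) - set.union(*crawler_sets)
    return len(only_crawler) + len(only_browser) >= threshold


# An honest page: only a per-request visitor number differs between copies.
honest_crawler = ["<p>Cheap flights to Rome. Visitor 1041</p>",
                  "<p>Cheap flights to Rome. Visitor 1042</p>"]
honest_browser = ["<p>Cheap flights to Rome. Visitor 2001</p>",
                  "<p>Cheap flights to Rome. Visitor 2002</p>"]

# A cloaked page: the crawler is shown different subject matter than users.
cloaked_crawler = ["<h1>Free university lecture notes</h1>",
                   "<h1>Free university lecture notes</h1>"]
cloaked_browser = ["<h1>Buy discount pills online</h1>",
                   "<h1>Buy discount pills online today</h1>"]

print(cloaking_candidate(honest_crawler, honest_browser))
print(cloaking_candidate(cloaked_crawler, cloaked_browser))
```

Pages that pass this filter are only candidates. Deciding whether the difference is semantic, rather than an acceptable syntactic one, is left to the second step.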