Review: Web Spam Taxonomy

February 22, 2008
Authors: Zoltan Gyongyi, Hector Garcia-Molina.
Year: 2005
Published in:
Link: http://airweb.cse.lehigh.edu/2005/gyongyi.pdf
Importance: Very High

Abstract

Web spamming refers to actions intended to mislead search engines into ranking some pages higher than they deserve. Recently, the amount of web spam has increased dramatically, leading to a degradation of search results. This paper presents a comprehensive taxonomy of current spamming techniques, which we believe can help in developing appropriate countermeasures.

My review

In this paper, the authors present a comprehensive collection of the spamming techniques used on the web. They describe each category of spamming very clearly, with examples that kept me reading the paper without stopping. They begin with the impact of spam: 1. it degrades the quality of search engine results; 2. it increases the cost of each search query.
One drawback of this paper is that the authors only consider spam as defined below:
all types of actions intended to boost ranking (either relevance, or importance, or both), without improving the true value of a page, are considered spamming.
However, I personally think there are other kinds of web spam that this definition does not classify as spam, such as the spam pages users reach when they mistype a URL. For example, if you enter gmal.com instead of gmail.com, you will see a spam page. Interestingly, gmal.com is an unregistered domain name, so it cannot be indexed by search engines. The paper offers no classification for this kind of spam page, even though such pages exist in practice.

The authors describe each category of spamming and the algorithms that each category targets.
Two kinds of techniques are associated with web spam:
Boost technique: achieves high relevance and/or importance for a page, and so influences search engine ranking
Hide technique: hides the boost technique from human eyes
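
To make the boost idea concrete, here is a small sketch of how term spamming can raise a page's relevance under a TF-IDF-style score. It is only an illustration under my own assumptions: the function name tfidf_score, the smoothed IDF formula, and the toy pages are all hypothetical, and this is not the paper's formula or the ranking function of any real search engine.

```python
import math
from collections import Counter


def tfidf_score(query, doc, corpus):
    """Sum of TF-IDF weights of the query terms in doc.

    Simplified, assumed variant: TF is the term's relative frequency
    in the page, IDF is a smoothed log ratio over the corpus.
    """
    doc_terms = doc.lower().split()
    counts = Counter(doc_terms)
    n_docs = len(corpus)
    score = 0.0
    for term in query.lower().split():
        # Term frequency: share of the page taken up by this term.
        tf = counts[term] / len(doc_terms) if doc_terms else 0.0
        # Document frequency: number of corpus pages containing the term.
        df = sum(1 for d in corpus if term in d.lower().split())
        # Smoothed inverse document frequency (assumed form).
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        score += tf * idf
    return score


# Toy pages, invented for this example.
honest = "cheap flights to paris and hotel deals for travellers"
# Term spamming: the same page with the query terms repeated many times.
spammed = honest + " cheap flights" * 10
others = ["weather forecast for paris", "history of aviation", "hotel reviews"]

query = "cheap flights"
honest_score = tfidf_score(query, honest, [honest] + others)
spammed_score = tfidf_score(query, spammed, [spammed] + others)

print(f"honest page:  {honest_score:.3f}")
print(f"spammed page: {spammed_score:.3f}")
```

The spammed page scores higher for the query even though it offers the reader nothing new, which is exactly the kind of boost without added value that the definition above calls spamming. A hide technique would then try to keep the repeated terms out of a human visitor's sight.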

Important Terms

Search Engine Optimizer – SEO

Boost technique

Hide technique

Term spamming

TFIDF

HITS

Cloaking
