Review: Improving Cloaking Detection Using Search Query Popularity and Monetizability

May 16, 2008
Authors: Kumar Chellapilla, David Maxwell Chickering
Year: 2006
Published in: 2 nd International Workshop on Adversarial Information Retrieval on the Web (AIRWeb)
Importance: High


Cloaking is a search engine spamming technique used by some
Web sites to deliver one page to a search engine for indexing
while serving an entirely different page to users browsing the site.
In this paper, we show that the degree of cloaking among search
results depends on query properties such as popularity and
monetizability. We propose estimating query popularity and
monetizability by analyzing search engine query logs and online
advertising click-through logs, respectively. We also present a
new measure for detecting cloaked URLs that uses a normalized
term frequency ratio between multiple downloaded copies of Web
pages. Experiments are conducted using 10,000 search queries
and 3 million associated search result URLs. Experimental results
indicate that while only 73.1% of the cloaked popular search
URLs are spam, over 98.5% of the cloaked monetizable search
URLs are spam. Further, on average, the search results for top 2%
most cloaked queries are 10x more likely to be cloaking than
those for the bottom 98% of the queries.

My Review

the view presented in this paper is that authors using new method to detect cloacking. From authors’ point of view clocking is a kind of hidding technique which use by spam pages to view different page to the search engine crawlers than real user. In this case spam page designers can trick crawler by some tricky methods to get higher positions in search engine ranking. Authors claims that monetizability is a goal of web spam. They used MSN serach engine as their primary data set.

An outcome of their work is 98% accurate for monetizable cloacking spam and 75% for popular query spam.