Abstract
The rapid growth of the Internet as an environment for information exchange and the lack of enforceable standards regarding the information it contains have led to numerous information quality problems. A major issue is the inability of Search Engine technology to wade through the vast expanse of questionable content and return “quality” results to a user’s query. This paper attempts to address some of the issues involved in determining what quality is, as it pertains to information retrieval on the Internet. The IQIP model is presented as an approach to managing the choice and implementation of quality related algorithms of an Internet crawling Search Engine.
My Review
In this paper, the authors discuss the problem of information quality on the WWW from the perspective of search engines. They clearly define the problem and describe current solutions. Their proposed model (IQIP) consists of four parts:
- Identify: user, environment and task
- Quantify: prioritise information quality (IQ) dimensions
- Implement: build the chosen IQ dimensions into the Web crawler
- Perfect: improve crawler through feedback
Their proposed model can be used to fight spam on the WWW. As my supervisor (Dr. Potdar) suggested, we can make use of this model in anti-spam methods. A simple example:
- Identify: here we study spammers, their behavior, their subjects, …
- Environment: includes the study of splogs, sforums, spam pages, …
- Task: spam detection based on spam characteristics already described in the literature.
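To make the mapping a bit more concrete, here is a minimal Python sketch of how the four IQIP steps could be wired together for spam detection. It is only my own illustration, not code from the paper: the dimension names (`link_density`, `keyword_stuffing`, `duplicate_content`), the weights, the threshold and the update rule are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Identify step: who is studied, where, and for what task (values are illustrative)."""
    user: str
    environment: list
    task: str


def quantify(weights):
    """Quantify step: rank the chosen dimensions from highest to lowest weight."""
    return sorted(weights, key=weights.get, reverse=True)


def implement(features, weights):
    """Implement step: combine per-dimension feature values into one weighted score."""
    return sum(weights[name] * features.get(name, 0.0) for name in weights)


def perfect(weights, features, score, is_spam, threshold=0.5, step=0.1):
    """Perfect step: on a wrong decision, nudge the weights toward the feedback label."""
    predicted_spam = score >= threshold
    if predicted_spam == is_spam:
        return dict(weights)  # decision was right, keep the weights unchanged
    direction = 1.0 if is_spam else -1.0
    return {name: w + direction * step * features.get(name, 0.0)
            for name, w in weights.items()}


# Hypothetical run: every value below is made up for illustration.
context = Context(user="spammers",
                  environment=["splogs", "sforums", "spam pages"],
                  task="spam detection")
weights = {"link_density": 0.6, "keyword_stuffing": 0.3, "duplicate_content": 0.1}
page = {"link_density": 0.5, "keyword_stuffing": 0.2, "duplicate_content": 0.0}

priorities = quantify(weights)
score = implement(page, weights)
updated = perfect(weights, page, score, is_spam=True)
```

In this run the page scores below the assumed threshold although the feedback says it is spam, so the Perfect step raises the weights of the dimensions that were present on the page. A real crawler would need dimensions and an update rule taken from the spam literature rather than these placeholders.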
More of the review coming soon…
April 9, 2008 at 11:37 am
Hi Pedram
This may be useful for your research
========================================================
Toy corpus of spam in blog comments
A small collection of 50 blog pages, with 1024 comments; manual classifications of these comments as spam or non-spam (67% are spam). For questions, contact Gilad Mishne.
Note: by downloading the corpus you agree to the disclaimer.
If you publish results obtained using this resource, please cite this paper:
Blocking Blog Spam with Language Model Disagreement, G. Mishne, D. Carmel, and R. Lempel. In: AIRWeb ’05 – First International Workshop on Adversarial Information Retrieval on the Web, at the 14th International World Wide Web Conference (WWW2005), 2005.
========================================================
Link
http://ilps.science.uva.nl/Resources/blogspam/
April 9, 2008 at 11:51 am
The BlogVox Opinion Retrieval System
http://ebiquity.umbc.edu/_file_directory_/papers/343.pdf