Review: Developing a Framework for Assessing Information Quality on the World Wide Web

Authors: Shirlee-ann Knight and Janice Burn
Year: 2005
Published in: Informing Science Journal Volume 8
Link: http://inform.nu/Articles/Vol8/v8p159-172Knig.pdf
Importance: Medium 

Abstract

The rapid growth of the Internet as an environment for information exchange and the lack of enforceable standards regarding the information it contains has lead to numerous information quality problems. A major issue is the inability of Search Engine technology to wade through the vast expanse of questionable content and return “quality” results to a user’s query. This paper attempts to address some of the issues involved in determining what quality is, as it pertains to information retrieval on the Internet. The IQIP model is presented as an approach to managing the choice and implementation of quality related algorithms of an Internet crawling Search Engine.

My Review

In this paper the authors discuss the problem of information quality on the WWW from the perspective of search engines. They clearly define the problem and survey current solutions. Their proposed model (IQIP) consists of four parts:

  • Identify: user, environment and task
  • Quantify: prioritise the information quality (IQ) dimensions
  • Implement: build the chosen IQ dimensions into the Web crawler
  • Perfect: improve the crawler through feedback
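
The four steps above can be read as a small feedback loop. The following is only a minimal sketch of that reading, not the authors' implementation: the `Context` fields, the dimension names ("accuracy", "timeliness"), the weights, and all function names are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Identify: who is searching, in which environment, for which task."""
    user: str
    environment: str
    task: str


def quantify(context, weights_by_task):
    """Quantify: look up the IQ-dimension priorities for the task and
    normalise them so that they sum to 1."""
    weights = weights_by_task[context.task]
    total = sum(weights.values())
    return {dim: w / total for dim, w in weights.items()}


def score_page(page_scores, weights):
    """Implement: weighted sum of the per-dimension scores (0..1) that a
    crawler is assumed to have computed for a page."""
    return sum(w * page_scores.get(dim, 0.0) for dim, w in weights.items())


def perfect(weights, dimension, delta):
    """Perfect: nudge one weight in response to feedback, then renormalise."""
    updated = dict(weights)
    updated[dimension] = max(0.0, updated[dimension] + delta)
    total = sum(updated.values())
    if total == 0:
        return updated
    return {dim: w / total for dim, w in updated.items()}


# Hypothetical usage: a research task that values accuracy over timeliness.
ctx = Context(user="student", environment="web", task="research")
weights = quantify(ctx, {"research": {"accuracy": 3, "timeliness": 1}})
page_quality = score_page({"accuracy": 1.0, "timeliness": 0.0}, weights)
adjusted = perfect(weights, "timeliness", 0.25)
```

Keeping the weights normalised means the page scores stay comparable before and after a feedback step, which is one simple way to make the "Perfect" stage measurable.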

Their proposed model can also be used to fight spam on the WWW. As my supervisor (Dr. Potdar) suggested, we can make use of this model in anti-spam methods. A simple example:

  • Identify: here we study spammers, their subjects, behavior, …
  • Environment: covers the study of splogs, sforums, spam pages, …
  • Task: spam detection based on spam characteristics already described in the literature.
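
As a rough illustration of the "Task" step, a detector could compute a few characteristics per comment and compare them with thresholds. This is a sketch under stated assumptions: the keyword list, the two features (link count and keyword ratio), and the thresholds are hypothetical placeholders, not characteristics taken from the paper or from any particular study.

```python
import re

# Hypothetical placeholder list; a real one would come from the literature.
SPAM_KEYWORDS = {"casino", "viagra", "loan"}


def spam_features(text):
    """Compute two simple, illustrative characteristics of a comment."""
    words = re.findall(r"[a-z']+", text.lower())
    links = len(re.findall(r"https?://", text))
    keyword_hits = sum(1 for word in words if word in SPAM_KEYWORDS)
    keyword_ratio = keyword_hits / len(words) if words else 0.0
    return {"links": links, "keyword_ratio": keyword_ratio}


def looks_like_spam(text, max_links=2, max_keyword_ratio=0.2):
    """Flag a comment when either characteristic exceeds its (assumed) threshold."""
    features = spam_features(text)
    return (features["links"] > max_links
            or features["keyword_ratio"] > max_keyword_ratio)


ham = "Great post, thanks for the review"
spam = "cheap casino loan http://a.example http://b.example http://c.example"
```

Calling `looks_like_spam(ham)` and `looks_like_spam(spam)` shows the rule in action; in practice the thresholds would need tuning against labelled data.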

More of this review coming soon…

2 Responses to Review: Developing a Framework for Assessing Information Quality on the World Wide Web

  1. Vidy says:

    Hi Pedram
    This may be useful for your research

    ========================================================
    Toy corpus of spam in blog comments
    A small collection of 50 blog pages, with 1024 comments; manual classifications of these comments as spam or non-spam (67% are spam). For questions, contact Gilad Mishne.

    Note: by downloading the corpus you agree to the disclaimer.

    If you publish results obtained using this resource, please cite this paper:

    Blocking Blog Spam with Language Model Disagreement, G. Mishne, D. Carmel, and R. Lempel. In: AIRWeb ’05 – First International Workshop on Adversarial Information Retrieval on the Web, at the 14th International World Wide Web Conference (WWW2005), 2005.
    ========================================================
    Link
    http://ilps.science.uva.nl/Resources/blogspam/
