Review: Opinion Spam and Analysis

May 5, 2008
Authors: Nitin Jindal and Bing Liu
Year: 2008
Published in: Proceedings of the international conference on Web search and web data mining
Link: http://portal.acm.org/citation.cfm?id=1341560
Importance: High

Abstract

Evaluative texts on the Web have become a valuable source of opinions on products, services, events, individuals, etc. Recently, many researchers have studied such opinion sources as product reviews, forum posts, and blogs. However, existing research has been focused on classification and summarization of opinions using natural language processing and data mining techniques. An important issue that has been neglected so far is opinion spam or trustworthiness of online opinions. In this paper, we study this issue in the context of product reviews, which are opinion rich and are widely used by consumers and product manufacturers. In the past two years, several startup companies also appeared which aggregate opinions from product reviews. It is thus high time to study spam in reviews. To the best of our knowledge, there is still no published study on this topic, although Web spam and email spam have been investigated extensively. We will see that opinion spam is quite different from Web spam and email spam, and thus requires different detection techniques. Based on the analysis of 5.8 million reviews and 2.14 million reviewers from amazon.com, we show that opinion spam in reviews is widespread. This paper analyzes such spam activities and presents some novel techniques to detect them.

My Review

As the authors claim, review spam is a new research area and differs from other types of web spam, as discussed in my earlier post, Review: Review Spam Detection, on a paper by the same authors. They classify review spam into three types:

  • Type 1: Deliberately misleading reviews. Hard to detect.
  • Type 2: Reviews of the brand only, not the product.
  • Type 3: Non-reviews such as ads, questions and answers, …

Their test environment is Amazon.com, with 5.8 million reviews.

As discussed in the previous post, Type 1 reviews are hard to detect. The authors propose a new way to study this problem. They first identify which reviews are harmful, where harmful means reviews that differ from the other reviews on a product page.
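To make the idea of a review that differs from the others on a product page more concrete, here is a minimal sketch. It assumes the difference is measured on star ratings, as the gap between one rating and the mean of the remaining ratings for the same product. The function names and the 2.0-star threshold are my own illustrative choices, not values taken from the paper.

```python
def rating_deviations(ratings):
    """Signed gap between each rating and the mean of the other ratings."""
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings to compare")
    total = sum(ratings)
    # Leave the current rating out of the mean it is compared against.
    return [r - (total - r) / (n - 1) for r in ratings]


def flag_deviating(ratings, threshold=2.0):
    """Indices of ratings that are at least `threshold` stars from the rest.

    `threshold` is a hypothetical cut-off chosen for illustration only.
    """
    deviations = rating_deviations(ratings)
    return [i for i, d in enumerate(deviations) if abs(d) >= threshold]


# One product page with five star ratings; the 1-star review stands out.
page_ratings = [5, 5, 4, 1, 5]
print(rating_deviations(page_ratings))
print(flag_deviating(page_ratings))
```

A deviating review is only a candidate for closer inspection: an honest reviewer can also disagree with everyone else.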

In their model, the authors propose 36 features covering reviews, reviewers, and products. These features are first used to detect Type 2 and Type 3 reviews; duplicate reviews then serve as the training set for detecting Type 1 reviews.

Evaluated by AUC (area under the ROC curve), their results are 98% for Type 2 and Type 3 spam and 78% for Type 1.
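For readers unfamiliar with AUC, the sketch below shows one standard way to compute it: the fraction of (spam, non-spam) pairs in which the classifier gives the spam review the higher score, with ties counted as half. This is not the authors' evaluation code, and the labels and scores are made-up toy values.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for spam, 0 for non-spam.
    scores: classifier scores, higher meaning "more likely spam".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # spam review ranked above the non-spam one
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy data: 8 of the 9 spam/non-spam pairs are ranked correctly.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, which helps put the 98% and 78% figures in perspective.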

Suggestions

  1. They try to separate good and bad products based on product ratings. This assumption may not be suitable, since some of the ratings are themselves spam. Although they mention this drawback, I would recommend using a data mining method to discover which products are good and which are bad.
  2. They do not discuss the cost/benefit of their model.
  3. Other types of features that are not accessible from the user-facing front page might improve the results.


Review: Review Spam Detection

April 29, 2008
Authors: Nitin Jindal and Bing Liu
Year: 2007
Published in: The International World Wide Web Conference Committee
Link: http://www2007.org/posters/poster930.pdf
Importance: High

Abstract

It is now a common practice for e-commerce Web sites to enable
their customers to write reviews of products that they have
purchased. Such reviews provide valuable sources of information
on these products. They are used by potential customers to find
opinions of existing users before deciding to purchase a product.
They are also used by product manufacturers to identify problems
of their products and to find competitive intelligence information
about their competitors. Unfortunately, this importance of reviews
also gives good incentive for spam, which contains false positive
or malicious negative opinions. In this paper, we make an attempt
to study review spam and spam detection. To the best of our
knowledge, there is still no reported study on this problem.

My Review

This paper, only two pages long, discusses review spam, which is new to the world of web spam. According to the authors, review spam differs from web and email spam, so new detection methods are needed. Review spam is also hard to detect, even manually.

Why is it hard to detect?

  1. Similarity to real reviews.
  2. Not enough metadata for analysis.

In this paper the authors mainly try to detect duplicate reviews, using a model based on the shingle method. Other types of review spam, as the authors say, are hard to detect, and the results they report for them are limited.

Suggestions

Personally, I think that since review spam is still hard to detect even manually, we should improve spam prevention methods such as CAPTCHA in order to block review spam (Sreview). For the time being, I have no ideas for detecting review spam after it has been posted.

Good reference

Web Data Mining, a book by Bing Liu


Review: Review Spam Detection

March 5, 2008
Authors: Nitin Jindal and Bing Liu
Year: 2007
Published in: The International World Wide Web Conference Committee (IW3C2)
Link: http://www.www2007.org/htmlposters/poster930/
Importance: Very High

Abstract

First of all, Dr. Vidy Potdar wrote a great review of this paper on his blog; I only realized that after writing my own. So here is my review:

It is now a common practice for e-commerce Web sites to enable their customers to write reviews of products that they have purchased. Such reviews provide valuable sources of information on these products. They are used by potential customers to find opinions of existing users before deciding to purchase a product. They are also used by product manufacturers to identify problems of their products and to find competitive intelligence information about their competitors. Unfortunately, this importance of reviews also gives good incentive for spam, which contains false positive or malicious negative opinions. In this paper, we make an attempt to study review spam and spam detection. To the best of our knowledge, there is still no reported study on this problem.

My Review

The authors discuss a new kind of web spam called “Review Spam”. They believe that this kind of spam differs from earlier kinds of spam, and so do the methods for detecting it.

There are two types of Review Spam:

  1. Duplicate spam – reviews that are duplicated across different products.
  2. Spam classification – other review spam that is not duplicated and misleads real users.

They use the shingle method to detect duplicate spam, but detecting the other type is a challenge. So the authors use duplicate spam as positive training examples and train their proposed machine learning model to learn the characteristics of type 2 spam.
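As a rough illustration of the shingle method for duplicate detection, the sketch below breaks each review into overlapping word n-grams (shingles) and compares the resulting sets with Jaccard similarity. The function names, the 2-word shingle size, and the 0.9 threshold are assumptions made for this example, and the code is not the authors' implementation.

```python
def shingles(text, k=2):
    """Set of overlapping k-word shingles from a lowercased review."""
    words = text.lower().split()
    if len(words) < k:
        # Very short text: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(reviews, k=2, threshold=0.9):
    """Index pairs (i, j), i < j, whose shingle sets are at least `threshold` similar."""
    sets = [shingles(r, k) for r in reviews]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard(sets[i], sets[j]) >= threshold:
                pairs.append((i, j))
    return pairs


reviews = [
    "great phone with a great battery",
    "Great phone with a great battery",
    "terrible camera and poor screen",
]
print(find_near_duplicates(reviews))
```

The pairwise loop is quadratic in the number of reviews, so a collection of millions of reviews would need something more scalable, for example hashing the shingles first.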

As the authors indicate in their paper, it is hard to detect spam from the content of a review alone. So we need other kinds of information (meta-information) about the person who writes a review, and about reviewers in general, to detect spam more accurately.

All in all, this kind of spam, Review Spam, needs more robust and accurate detection methods, and therefore more investigation.

Useful References

  • Jindal, N., & Liu, B. Review Analysis. Tech. Report, 2007.
  • Jindal, N., & Liu, B. Identifying comparative sentences in text documents. SIGIR’2006.
  • Li, K., & Zhong, Z. Fast statistical spam filter by approximate classifications. SIGMETRICS 2006, 2006.
  • Popescu, A-M., & Etzioni, O. Extracting Product Features and Opinions from Reviews. EMNLP’2005.