Review: Is Britney Spears Spam?

Authors: Aaron Zinman, Judith Donath 
Year: 2007
Published in: Proceedings of the Fourth Conference on Email and Anti-Spam
Link: http://smg.media.mit.edu/papers/Zinman/britneyspears.pdf
Importance: High

Abstract

We seek to redefine spam and the role of the spam filter in the context of Social Networking Services (SNS). SNS, such as MySpace and Facebook, are increasing in popularity. They enable and encourage users to communicate with previously unknown network members on an unprecedented scale. The problem we address with our work is that users of these sites risk being overwhelmed with unsolicited communications not just from e-mail spammers, but also from a large pool of well intending, yet subjectively uninteresting people. Those who wish to remain open to meeting new people must spend a large amount of time estimating deception and utility in unknown contacts. Our goal is to assist the user in making these determinations. This requires identifying clear cases of undesirable spam and helping them to assess the more ambiguous ones. Our approach is to present an analysis of the salient features of the sender’s profile and network that contains otherwise hard to perceive cues about their likely intentions. As with traditional spam analysis, much of our work focuses on detecting deception: finding profiles that mimic ordinary users but which are actually commercial and usually undesirable entities. We address this within the larger context of making more legible the key cues presented by any unknown contact. We have developed a research prototype that categorizes senders into broader categories than spam/not spam using features unique to SNS. We discuss our initial experiment, and its results and implications.

My Review

The authors propose a method that helps users of social networking websites identify spam users. They use the name Britney Spears as an example of a spam user.

Interestingly, they describe how the spam problem and spammer behavior on social networking websites differ from other web spam categories:

  1. A friend request from a spam user on a social networking website carries no content, so many content-based detection algorithms cannot be applied here.
  2. On social networking websites, simple category-based filtering cannot detect spam users, since many spam profiles are deceptive. Moreover, how can we define one category as spam and another as not?

They try to categorize users of social networking websites based on two main categories:

  1. Sociability
  2. Promotion

Combining these two categories gives four groups:

  1. Low sociability and low promotion: New user, low-effort spammer
  2. Low sociability and high promotion: spammer, Britney Spears is here 😉
  3. High sociability and low promotion: Many active users
  4. High sociability and high promotion: Spammer, local band (real users)
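The four combinations above can be sketched as a small lookup. This is only an illustration: the function name `quadrant`, the assumption that both scores lie in [0, 1], and the 0.5 cut-off are mine and do not come from the paper.

```python
def quadrant(sociability: float, promotion: float, threshold: float = 0.5) -> str:
    """Map two scores to one of the four groups listed in the review.

    Assumption: both scores are in [0, 1] and `threshold` splits low from high.
    """
    high_soc = sociability >= threshold
    high_pro = promotion >= threshold
    if not high_soc and not high_pro:
        return "new user or low-effort spammer"
    if not high_soc and high_pro:
        return "spammer"
    if high_soc and not high_pro:
        return "active user"
    return "spammer or real promoter (e.g. local band)"


if __name__ == "__main__":
    # A profile that promotes heavily but barely socializes.
    print(quadrant(sociability=0.1, promotion=0.9))
```

Note that the high/high group stays ambiguous on purpose: according to the review, it contains both spammers and real users such as local bands.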

To score users in these two categories, they used two groups of features:

  1. Profile-based features
  2. Network-based features

Profile-based features include:

  • number of friends
  • number of YouTube movies
  • number of details
  • number of comments
  • number of thanks
  • number of surveys
  • number of ‘I’
  • number of ‘you’
  • missing picture
  • mp3 player present
  • static url to profile available
  • has a school section
  • has blurbs
  • the page is personalized through CSS
  • has a networking section
  • has a company section
  • has blog entries
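To make a few of these profile-based features concrete, here is a minimal sketch. The profile is assumed to be a plain dictionary with hypothetical keys (`friends`, `blurbs`, `picture_url`, `blog_entries`); the paper works on real MySpace pages, and nothing here reflects how the authors actually extracted their features.

```python
import re


def profile_features(profile: dict) -> dict:
    """Compute a handful of the profile-based features from a profile dict.

    Assumption: `profile` uses the hypothetical keys shown below; real
    profile pages would need HTML parsing first.
    """
    text = profile.get("blurbs", "")
    # Split on non-letters, so "I'm" contributes one "I" and "you're" one "you".
    words = re.findall(r"[A-Za-z]+", text)
    return {
        "num_friends": len(profile.get("friends", [])),
        "num_I": sum(1 for w in words if w == "I"),
        "num_you": sum(1 for w in words if w.lower() == "you"),
        "missing_picture": profile.get("picture_url") is None,
        "has_blurbs": bool(text.strip()),
        "has_blog_entries": bool(profile.get("blog_entries")),
    }


if __name__ == "__main__":
    sample = {
        "friends": ["tom", "ann"],
        "blurbs": "I think you and I should talk. You're welcome!",
        "picture_url": None,
        "blog_entries": [],
    }
    print(profile_features(sample))
```

Counting 'I' against 'you' is a simple proxy for whether the profile text talks about its owner or addresses the reader, which is presumably why both counts appear in the list.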

Network-based features include:

  • percent of our comments that are from our top n
  • percent of our top n comments that are from us
  • percent of our comments’ images that are unique
  • percent of our comments’ hrefs that are unique
  • percent of our comments to our top n that have unique hrefs
  • percent of our comments to our top n that have unique images
  • average number of posters that use the same images in our comments to our top n
  • average number of posters that use the same images in our comments
  • average number of posters that use the same hrefs in our comments
  • average number of posters that use the same hrefs in our comments to our top n
  • total number of comments from anyone to our top n
  • total number of images in comments
  • total number of hrefs in comments
  • total number of images in our comments to our top n
  • total number of hrefs in our comments to our top n
  • percent of our comments that have images
  • percent of our comments that have hrefs
  • percent of our comments in our top n that have hrefs
  • percent of our comments in our top n that have images
  • number of independent images in our comments
  • number of independent hrefs in our comments
  • number of independent images in our comments to our top n
  • number of independent hrefs in our comments to our top n
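Some of the href-related features above can be sketched as follows. This is my own reading, not the authors' code: comments are assumed to be raw HTML strings, hrefs are pulled out with a simple regular expression, "independent" is taken to mean distinct, and "percent unique" is taken to mean distinct hrefs divided by total hrefs.

```python
import re

# Hypothetical, deliberately simple pattern; a real crawler would use an HTML parser.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def href_features(comments: list) -> dict:
    """Compute a few href-based comment features for one user.

    Assumption: `comments` is a list of raw HTML strings written by the user.
    """
    hrefs_per_comment = [HREF_RE.findall(c) for c in comments]
    all_hrefs = [h for hrefs in hrefs_per_comment for h in hrefs]
    distinct = set(all_hrefs)
    with_href = sum(1 for hrefs in hrefs_per_comment if hrefs)
    return {
        "total_hrefs": len(all_hrefs),
        "independent_hrefs": len(distinct),
        "pct_comments_with_hrefs": 100.0 * with_href / len(comments) if comments else 0.0,
        "pct_hrefs_unique": 100.0 * len(distinct) / len(all_hrefs) if all_hrefs else 0.0,
    }


if __name__ == "__main__":
    sample = [
        '<a href="http://a.example">buy now</a>',
        "hi, nice page",
        '<a href="http://a.example">buy</a> <a href="http://b.example">more</a>',
        "hello",
    ]
    print(href_features(sample))
```

A user who pastes the same link into many comments ends up with a low share of unique hrefs, which is the kind of hard-to-perceive cue the network-based features are meant to surface. The image features could be handled the same way with a pattern for `src` attributes.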

Although they did not provide a practical implementation of their suggested detection method, their work is great and unique in the web spam field.
