Review: A Learning Approach to Spam Detection based on Social Networks

Authors: Ho-Yu Lam, Dit-Yan Yeung
Year: 2007
Published in: CEAS 2007
Importance: High


The massive increase of spam is posing a very serious threat to email which has become an important means of communication. Not only does it annoy users, but it also consumes much of the bandwidth of the Internet. Most spam filters in existence are based on the content of email one way or the other. While these anti-spam tools have proven very useful, they do not prevent the bandwidth from being wasted and spammers are learning to bypass them via clever manipulation of the spam content. A very different approach to spam detection is based on the behavior of email senders. In this paper, we propose a learning approach to spam sender detection based on features extracted from social networks constructed from email exchange logs. Legitimacy scores are assigned to senders based on their likelihood of being a legitimate sender. Moreover, we also explore various spam filtering and resisting possibilities.

My Review

The term “social network” as used in this paper refers to a network built from email transaction logs. The logs of an SMTP server, which record the sender address, IP address, sender email client, and so on, are parsed offline to construct the email social network. I mention this because the term is used here with a different meaning from its usual one.

The authors first describe two types of email spam:

  • Unsolicited commercial email (UCE) – emails sent without the recipient’s prior consent.
  • Unsolicited bulk email (UBE) – emails which distribute viruses and spyware.

Email spam detection is based on two approaches:

  1. Spam text detection
  2. Whitelist and blacklist

Their proposed method falls under the whitelist and blacklist approach: they provide a learning method for building better blacklists and whitelists.

The detection method is based on 7 features. Each feature is counted, normalized and weighted, and then compared with valid feature data, by which I mean data from senders that have already been classified as spam or non-spam. By comparing the similarity between these features, a sender can be classified as spam or non-spam.
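To make the idea concrete, here is a minimal sketch in Python of similarity-based scoring against already-labeled senders. It is not the authors' exact algorithm: the min-max normalization, the weighted Euclidean distance, the similarity formula, the choice of k, and the names `normalize`, `weighted_distance` and `legitimacy_score` are all my own assumptions for illustration.

```python
import math


def normalize(rows):
    """Min-max scale each feature column to [0, 1] (an assumed normalization)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def legitimacy_score(query, labeled, weights, k=3):
    """Score a sender in [0, 1]; higher means more similar to legitimate senders.

    labeled: list of (feature_vector, label), label 1 = legitimate, 0 = spam.
    The score is the similarity-weighted share of legitimate senders among
    the k labeled senders closest to the query (hypothetical scoring rule).
    """
    nearest = sorted(
        labeled, key=lambda item: weighted_distance(query, item[0], weights)
    )[:k]
    sims = [1.0 / (1.0 + weighted_distance(query, vec, weights))
            for vec, _ in nearest]
    return sum(s * label for s, (_, label) in zip(sims, nearest)) / sum(sims)


# Toy data with two already-normalized features per sender (made up).
labeled = [([1.0, 1.0], 1), ([0.9, 1.0], 1), ([0.0, 0.0], 0), ([0.1, 0.0], 0)]
weights = [1.0, 1.0]
print(legitimacy_score([0.95, 0.9], labeled, weights))  # close to the legitimate senders
print(legitimacy_score([0.05, 0.1], labeled, weights))  # close to the spam senders
```

A sender whose score falls below some threshold would go on the blacklist and one above it on the whitelist; where to put that threshold is a policy decision this sketch does not address.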

One drawback of this method is that websites which legitimately send mass email to their users (such as mycareer, ebay, …) may be classified as spam senders. Some additional policy is therefore needed for these legitimate senders.

Important Terms

  • In/out-count
  • In/out-degree
  • Communication Reciprocity
  • Communication Interaction Average
  • Clustering Coefficient
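As a rough illustration of how such terms could be computed from a transaction log, here is a small Python sketch. The definitions used are my assumptions rather than quotations from the paper: counts are numbers of emails, degrees are numbers of distinct correspondents, reciprocity is the share of a sender's recipients who also wrote back, and the clustering coefficient is taken over the undirected neighbourhood. Communication Interaction Average is left out because I do not define it in this review. The function names and the toy log are hypothetical.

```python
from collections import defaultdict


def build_network(log):
    """Build {sender: {recipient: email_count}} from (sender, recipient) pairs."""
    sent = defaultdict(lambda: defaultdict(int))
    for sender, recipient in log:
        sent[sender][recipient] += 1
    return {s: dict(r) for s, r in sent.items()}


def features(sent, node):
    """Compute per-sender features under the assumed definitions above."""
    outgoing = sent.get(node, {})
    out_nbrs = set(outgoing)
    in_nbrs = {s for s, rcpts in sent.items() if node in rcpts}

    def linked(a, b):
        # Treat an email in either direction as an undirected link.
        return b in sent.get(a, {}) or a in sent.get(b, {})

    nbrs = sorted((out_nbrs | in_nbrs) - {node})
    pairs = [(a, b) for i, a in enumerate(nbrs) for b in nbrs[i + 1:]]
    return {
        "in_count": sum(sent[s][node] for s in in_nbrs),
        "out_count": sum(outgoing.values()),
        "in_degree": len(in_nbrs),
        "out_degree": len(out_nbrs),
        "reciprocity": len(out_nbrs & in_nbrs) / len(out_nbrs) if out_nbrs else 0.0,
        "clustering": sum(linked(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0,
    }


# Made-up log: one (sender, recipient) pair per delivered email.
log = [
    ("alice", "bob"), ("alice", "bob"), ("bob", "alice"),
    ("alice", "carol"), ("carol", "bob"),
    ("bulk", "alice"), ("bulk", "dave"), ("bulk", "erin"),
]
network = build_network(log)
print(features(network, "alice"))
print(features(network, "bulk"))
```

In this toy log the sender named "bulk" gets no replies and its recipients do not write to each other, so its reciprocity and clustering coefficient both come out as zero, while "alice" scores higher on both.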

Cite this article as
Critical review on “A Learning Approach to Spam Detection based on Social Networks” by P.Hayati. 8th Mar 2008. Available online:

