Spam Analysis
June 25th, 2008 · by David Bradley
Anyone who says they have never had a problem with email spam is either my Dad, who has never touched a computer in his life (bless him), or someone with staff to read their emails. Spam is ubiquitous in the online world; it is everywhere, it is omnipresent.
If you’re using Google Mail you may not see much; the spam filters on that system are very good (at least in my experience). Moreover, if you then download your Gmail via POP3 into a desktop email client with Bayesian statistical filtering, you may see even less. Forward it to your Linux-based server and employ SpamAssassin and you may well see spam only very rarely. However, just take a look at your space-draining spam folders and you will realize that, although you may not see much spam, it’s still a problem.
Computer scientists in France think they may have come up with a new answer to the search for the perfect spam filter. Writing in the International Journal of Web and Grid Services recently (2008, vol 4), they describe how they can filter spam very effectively using a process known as Kolmogorov complexity analysis. This approach works not by analyzing the headers or the body of an incoming email, but by classifying it based on how well it can be compressed (akin to WinZip or StuffIt compression) and then comparing this compression ratio to that of previously whitelisted or blacklisted emails.
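To make the idea of "how well it can be compressed" concrete, here is a minimal Python sketch of a compression ratio. It assumes zlib as the compressor; the article does not say which compressor the researchers used, and the function name `compression_ratio` and the sample strings are hypothetical.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Return compressed size divided by original size (lower = more compressible)."""
    raw = text.encode("utf-8")
    if not raw:
        # Nothing to compress; treat an empty message as incompressible.
        return 1.0
    return len(zlib.compress(raw, 9)) / len(raw)


if __name__ == "__main__":
    # Hypothetical sample messages, purely for illustration.
    repetitive = "Buy cheap watches now! " * 40
    ordinary = "Hi Dad, I will be home on Friday evening, see you then."
    print(f"repetitive: {compression_ratio(repetitive):.2f}")
    print(f"ordinary:   {compression_ratio(ordinary):.2f}")
```

Highly repetitive text compresses to a small fraction of its size, whereas short, varied text barely shrinks at all. A single ratio is a crude signal on its own, which is presumably why the researchers go on to define a distance between messages.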
Andrei Nikolaevich Kolmogorov (1903-1987) was a Soviet mathematician, considered one of the pre-eminent mathematicians of the twentieth century. He made major advances in probability theory, topology, intuitionistic logic, turbulence, classical mechanics and computational complexity. It is within Kolmogorov’s work on logic that Gilles Richard and Andrei Doncescu of the University of Toulouse hope to find a solution to spam filtering, as they explain:
The main idea is to give a formal meaning to the notion of ‘information content’ and to provide a measure of this content. Using such a quantitative approach, it becomes possible to define a distance, which is a major tool for classification purposes.
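The article does not give the researchers' formula. A widely used distance of this kind is the normalized compression distance, which replaces the uncomputable Kolmogorov complexity with the output size $C(\cdot)$ of a real compressor. Assuming that form, with $xy$ denoting the concatenation of messages $x$ and $y$:

$$
\mathrm{NCD}(x, y) = \frac{C(xy) - \min\{C(x),\, C(y)\}}{\max\{C(x),\, C(y)\}}
$$

Values near 0 mean the two messages share most of their information, because compressing them together costs little more than compressing one alone. Values near 1 mean they have almost nothing in common.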
The researchers validated their approach in two steps:
First, they applied the classical compression distance to a mix of spam and legitimate emails to determine whether the two kinds can be properly clustered without any supervision. This step shows whether spam emails have an underlying structure that might be exploited in filtering.
In the second step, they implemented a simple machine-learning system, a so-called k-nearest neighbors algorithm, which classifies emails according to how closely they resemble others in the queue. The approach requires no deep analysis of the header or body of the incoming email, as is necessary with SpamAssassin-type systems and Bayesian filtering. Instead, it works by simply measuring how the compressibility of an incoming email differs from that of known legitimate and spam emails.
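The two steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' implementation: zlib stands in for the compressor, the distance is the normalized compression distance, and the names `ncd`, `classify` and `TRAINING`, the value of `k`, and the tiny labelled corpus are all hypothetical.

```python
import zlib
from collections import Counter


def csize(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (a stand-in for Kolmogorov complexity)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = csize(x), csize(y)
    return (csize(x + y) - min(cx, cy)) / max(cx, cy)


def classify(message: str, labelled: list[tuple[str, str]], k: int = 3) -> str:
    """Label a message by majority vote among its k nearest labelled neighbours."""
    msg = message.encode("utf-8")
    nearest = sorted(labelled, key=lambda item: ncd(msg, item[0].encode("utf-8")))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled corpus; a real filter would use many whitelisted and blacklisted emails.
TRAINING = [
    ("Cheap replica watches, limited offer, click here to buy now and save big today!", "spam"),
    ("You have won a prize! Click here now to claim your free replica watch offer today!", "spam"),
    ("Limited offer: buy cheap pills online now, click here and save big on every order!", "spam"),
    ("Hi team, the meeting is moved to Thursday at 10am; please bring the quarterly report.", "ham"),
    ("Thanks for the draft. I have added comments on section two; can we discuss on Monday?", "ham"),
    ("Dad says dinner is at seven on Sunday. Can you bring the photos from the holiday?", "ham"),
]

if __name__ == "__main__":
    incoming = "Click here now: cheap replica watches, buy today and save big on this limited offer!"
    print(classify(incoming, TRAINING, k=3))
```

With messages this short, zlib's fixed overhead makes the distances noisy, so a toy corpus like this one only shows the mechanics. Note that the classifier never parses headers or words; it only compares compressed sizes.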
Using this approach alone, the researchers were able to filter spam with 85% accuracy. However, its real strength will lie in turning to a more powerful classification technique (Support Vector Machines, for instance) and in coupling it to another anti-spam technique, such as Bayesian analysis, Richard told me.
3 responses so far ↓
andrew // Jun 25, 2008 at 4:51 pm
I read an article about a new technology called ReceiverNet from Abaca. ReceiverNet characterizes each protected user by the percentage of spam they receive and then uses those reputations to rate the incoming message flow. I changed my spam filtering system to Abaca’s Email Protection Gateway and it blocked replica-watch spam, subpoena phishing emails and many more. I found that Abaca’s ReceiverNet service is 99% effective at blocking spam, and they guarantee their results. For more information, see http://abaca.com/.
David Bradley // Jun 25, 2008 at 6:18 pm
Sounds like an interesting approach that saves on all this mathematical analysis. Has anyone else got a good system in place that works as well as Abaca?
Phil Whelan // Jun 26, 2008 at 8:14 pm
Abaca’s approach sounds interesting. 99% is quite amazing! I’m going to check it out.
David, yes, we have an approach that uses even less mathematical analysis, based on the idea that spammers are impatient. We slow down connections from unknown senders, and in doing so have found that most zombie machines sending the spam disconnect within a few seconds.