Spam Analysis
June 25th, 2008 · by David Bradley
Anyone who says they have never had a problem with email spam is either my Dad, who has never touched a computer in his life (bless him), or someone with staff to read their emails. Spam is ubiquitous in the online world; it is everywhere, and it is omnipresent.
If you’re using Google Mail you may not see much; the spam filters on that system are very good (at least in my experience). Moreover, if you then download your Gmail via POP3 into a desktop email client with Bayesian statistical filtering, you may see even less. Forward it to your Linux-based server and employ SpamAssassin and you may well see only the very rare spam email. However, just take a look at your space-draining spam folders and you will realize that, although you may not see much spam, it’s still a problem.
Computer scientists in France think they may have come up with a new answer to finding the perfect spam filter. Writing in the International Journal of Web and Grid Services recently (2008, vol 4), they describe how they can filter spam very effectively using a process known as Kolmogorov complexity analysis. This approach works not by analyzing the headers or the body of an incoming email, but by classifying it based on how well it can be compressed (akin to WinZip or StuffIt compression) and then comparing this compression ratio to that of previously whitelisted or blacklisted emails.
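To make the compression idea concrete, here is a minimal Python sketch. Kolmogorov complexity itself cannot be computed, so the sketch uses the size of zlib-compressed data as a rough stand-in. The function names and the sample message are made up for illustration, and zlib is my assumption; the researchers may well have used a different compressor.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (a rough proxy for complexity)."""
    return len(zlib.compress(data, 9))


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; lower means more compressible."""
    if not data:
        raise ValueError("cannot compute a ratio for empty data")
    return compressed_size(data) / len(data)


# Hypothetical sample: a highly repetitive message compresses very well...
repetitive = b"Buy cheap watches now! " * 50
# ...whereas random bytes of the same length barely compress at all.
random_like = os.urandom(len(repetitive))

print("repetitive :", compression_ratio(repetitive))
print("random-like:", compression_ratio(random_like))
```

A ratio on its own says little about whether a message is spam; it only becomes useful once it is compared against messages whose status is already known.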
Andrei Nikolaevich Kolmogorov (1903-1987) was a Soviet mathematician, considered one of the most eminent of the twentieth century. He made major advances in probability theory, topology, intuitionistic logic, turbulence, classical mechanics and computational complexity. It is within Kolmogorov’s work on logic that Gilles Richard and Andrei Doncescu of the University of Toulouse hope to find a solution to spam filtering, as they explain:
The main idea is to give a formal meaning to the notion of ‘information content’ and to provide a measure of this content. Using such a quantitative approach, it becomes possible to define a distance, which is a major tool for classification purposes.
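One well-known way to turn compression into a distance is the normalized compression distance (NCD), which compares the compressed size of two messages joined together with their individual compressed sizes. The sketch below assumes zlib as the compressor and uses invented sample messages; I have not confirmed that this is exactly the distance the Toulouse team used.

```python
import zlib


def c(data: bytes) -> int:
    """Compressed size in bytes, standing in for information content."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical messages for illustration only.
spam_like = b"Cheap replica watches, buy now, limited offer! " * 20
ham_like = b"Meeting moved to Thursday at ten; agenda attached. " * 20

print("spam vs itself:", ncd(spam_like, spam_like))
print("spam vs ham   :", ncd(spam_like, ham_like))
```

If two messages share a lot of structure, compressing them together costs little more than compressing one of them, so the distance is small.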
The researchers have validated their approach by proceeding in two steps:
First, they used the classical compression distance over a mix of spam and legitimate emails to determine whether they can be properly clustered without any supervision. This step could then show whether there is an underlying structure to spam emails that might be exploited in filtering.
In the second step, they implemented a simple machine-learning system, a so-called k-nearest neighbors algorithm, which then classifies emails according to how closely they resemble others in the queue. The approach requires no deep analysis of the header or body of the incoming email, as is necessary with SpamAssassin-type systems and Bayesian filtering. Instead, it works by simply measuring how differently an incoming email compresses relative to known legitimate and spam emails.
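The second step can be sketched in a few lines of Python: a k-nearest neighbors vote in which closeness is measured by a compression distance. Everything here is an assumption for illustration, including the zlib compressor, the `knn_classify` name, the value of k and the tiny sample corpus; it is not the researchers' code.

```python
import zlib
from collections import Counter
from typing import List, Tuple


def c(data: bytes) -> int:
    """Compressed size in bytes."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_classify(message: bytes, labelled: List[Tuple[bytes, str]], k: int = 3) -> str:
    """Label a message by majority vote among its k nearest labelled messages."""
    nearest = sorted(labelled, key=lambda pair: ncd(message, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, hand-written corpus; a real filter would need far more examples.
SAMPLE_CORPUS = [
    (b"Cheap replica watches for sale, buy now and save 80 percent, "
     b"limited time offer, click here to order today", "spam"),
    (b"You have won a lottery prize of one million dollars, send your "
     b"bank details to claim your winnings immediately", "spam"),
    (b"Hi team, the project meeting has moved to Thursday at ten, "
     b"please review the attached agenda before we meet", "ham"),
    (b"Thanks for the draft chapter, I have added my comments in the "
     b"margins and will send the figures tomorrow morning", "ham"),
]

incoming = (b"Cheap replica watches for sale, buy now and save 90 percent, "
            b"limited time offer, click here to order now")
print(knn_classify(incoming, SAMPLE_CORPUS, k=1))
```

With very short messages a general-purpose compressor gives noisy distances, so a sketch like this is only meaningful with a reasonably large set of known emails and an odd k to avoid tied votes.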
The researchers were able to filter spam with 85% accuracy using this approach alone. However, its real strength will lie in turning to a more powerful classification technique (Support Vector Machines, for instance) and in coupling it to another anti-spam technique, such as Bayesian analysis, Richard told me.
3 responses so far ↓
andrew // Jun 25, 2008 at 4:51 pm
I read an article about a new technology called ReceiverNet from Abaca. ReceiverNet technology characterizes each protected user based on the percentage of spam they receive and then uses those reputations to rate the incoming message flow. I changed my spam filtering system to Abaca’s Email Protection Gateway and it blocked replica-watch spam, subpoena phishing mails and many more. I found that Abaca’s ReceiverNet service has 99% efficiency in blocking spam mails and they guarantee their results. For more information, log on to http://abaca.com/.
David Bradley // Jun 25, 2008 at 6:18 pm
Sounds like an interesting approach that saves on all this mathematical analysis. Has anyone else got a good system in place that works as well as Abaca?
Phil Whelan // Jun 26, 2008 at 8:14 pm
The Abaca approach sounds interesting. 99% is quite amazing! I’m going to check it out.
David, yes, we have an approach that uses even less mathematical analysis, based on the idea that spammers are impatient. We slow down connections from unknown senders, and in doing so have found that most zombie machines sending the spam disconnect within a few seconds.
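As a rough illustration of the slow-down idea described above, here is a minimal Python sketch that delays the server greeting for senders it does not recognize and gives up if the client has already hung up. The allowlist, the delay length, the host name and all function names are hypothetical; this is not how MailChannels implements it, and it is not a working mail server.

```python
import asyncio

# Hypothetical allowlist of known senders (addresses from documentation ranges).
KNOWN_SENDERS = {"192.0.2.10"}
# Hypothetical delay, in seconds, applied to unknown senders before greeting them.
DELAY_SECONDS = 10.0


def greeting_delay(peer_ip: str, known=KNOWN_SENDERS, delay: float = DELAY_SECONDS) -> float:
    """Known senders are greeted at once; everyone else has to wait."""
    return 0.0 if peer_ip in known else delay


async def handle_client(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    peer_ip = writer.get_extra_info("peername")[0]
    await asyncio.sleep(greeting_delay(peer_ip))
    if reader.at_eof():
        # The client closed the connection while waiting: an impatient sender.
        writer.close()
        return
    # A patient client gets the greeting; a real server would continue from here.
    writer.write(b"220 mail.example.test ESMTP\r\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main(host: str = "127.0.0.1", port: int = 2525) -> None:
    """Listen on a hypothetical local port and apply the delay to each connection."""
    server = await asyncio.start_server(handle_client, host, port)
    async with server:
        await server.serve_forever()
```

Calling `asyncio.run(main())` would start the listener; the sketch deliberately does not do so on its own.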