Significant Figures
Helping you out with blogging, browsing and technical tips

Self-Organizing Maps to Keep Your Email Under Control

August 29, 2007 · by David Bradley

Organizing email

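Sig Figs exclusive research news: Most of us would like to be able to sort, organize and generally filter the vast quantities of email we receive as automatically as possible. Most email programs have tools built in that allow some degree of filtering, but none of them is perfect. When you consider that a typical enterprise email system might handle tens of thousands, if not hundreds of thousands, of emails each week, that is an awful lot of email organization that has to be done to prevent information overload.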

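Now, Helmut Berger and Michael Dittenbach, two senior researchers in the iSpaces research group at the E-Commerce Competence Center (EC3) in Vienna, Austria, working with Associate Professor Dieter Merkl of the Vienna University of Technology, have reviewed the various technical solutions to data preparation for email categorization.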

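Text classification can be used to identify document types, of course, and so allow email to be filtered into appropriate folders or prioritized for specific expert recipients. It can also support authorship attribution and authentication based on the sender. It could allow standardized responses to a survey or questionnaire to be collated and analyzed, for example, along with various other kinds of incoming messages, and it can help filter out messages that have been sent to multiple newsgroups.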

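The researchers studied various supervised and unsupervised machine learning techniques that might perform the task and that can be implemented as straightforward algorithms across email systems. These included support vector machines, decision tree learners, instance-based classifiers, naïve Bayes classifiers and self-organizing maps. They used either a word-based or a character "n-gram" representation of the email documents to evaluate how each of these approaches performed.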
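To make the two document representations concrete, here is a minimal Python sketch of word-based tokens versus character n-grams. The function names, the choice of n = 3 and the sample phrases are illustrative assumptions, not details taken from the paper.

```python
import re
from collections import Counter


def word_tokens(text):
    """Word-based representation: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def char_ngrams(text, n=3):
    """Character n-gram representation: every run of n consecutive characters."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def bag(features):
    """Count how often each feature occurs (a simple bag-of-features vector)."""
    return Counter(features)


# A misspelling replaces the whole word token but only some of the n-grams.
correct = bag(char_ngrams("meeting tomorrow"))
misspelt = bag(char_ngrams("meeting tommorow"))
shared_ngrams = sum((correct & misspelt).values())

shared_words = set(word_tokens("meeting tomorrow")) & set(word_tokens("meeting tommorow"))
```

Here the two phrases share only the word "meeting", while most of their character trigrams still overlap, which is the intuition behind using n-grams on noisy text.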

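The "n-gram" approach should help any categorization system cope with the noisy nature of email, in which misspellings, idiosyncrasies and abbreviations are common and text does not always translate correctly from one format to another. Anyone who has seen dozens and dozens of strings like "=A30" encoded between every word and at the start and end of every line of a forwarded email, or "=20" and HTML arriving from MS Outlook into an otherwise compliant email program, will know what kind of headache noise can be.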
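Strings such as "=20" are quoted-printable escapes: an equals sign followed by two hexadecimal digits standing for one byte. As a rough sketch of how that kind of noise can be cleaned up before classification, Python's standard `quopri` module can decode them. The helper name, the sample message and the Latin-1 default charset are assumptions for illustration; a real message declares its charset in its Content-Type header.

```python
import quopri


def decode_quoted_printable(raw_bytes, charset="latin-1"):
    """Decode quoted-printable bytes into readable text.

    The default charset is an assumption made for this sketch.
    """
    return quopri.decodestring(raw_bytes).decode(charset)


# "=20" is an encoded space and "=A3" is the pound sign in Latin-1.
noisy = b"Please=20confirm=20the=20fee=20of=20=A350"
clean = decode_quoted_printable(noisy)
print(clean)
```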

The key to success, they found, lies in specific analysis of the email header information as part of the text representation. After all, the researchers say, besides the body content of an email, the header contains invaluable information that might be exploited in classifying incoming messages.

Surprisingly, they found that the choice between the word-based document representation and the character n-gram analysis affected organization to a much lesser degree. Perhaps categorizing based on real word analysis counters the presence of noise just as effectively as the character approach. Their main conclusion is that support vector machines (SVMs), rather than the commonly used Bayesian and other approaches, are apparently the most successful at organizing email. Unattended self-organizing maps lagged only a little behind the SVM approach, surprisingly perhaps, given that no user input or training is needed.
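The idea of treating header information as part of the text representation can be sketched with Python's standard `email` package. This is only an illustration under stated assumptions: the choice of header fields, the `field:token` naming scheme and the sample message are hypothetical and are not the feature set used in the study.

```python
from email import message_from_string

HEADER_FIELDS = ("From", "To", "Subject")  # illustrative choice of fields


def message_features(raw_message):
    """Turn a raw email into tokens tagged by where they came from."""
    msg = message_from_string(raw_message)
    features = []
    for field in HEADER_FIELDS:
        value = msg.get(field, "")
        features += [f"{field.lower()}:{token}" for token in value.lower().split()]
    body = msg.get_payload()
    if isinstance(body, str):  # this sketch skips multipart messages
        features += [f"body:{token}" for token in body.lower().split()]
    return features


RAW = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: Quarterly report\n"
    "\n"
    "The report is attached.\n"
)
features = message_features(RAW)
```

Tagging each token with its origin lets a classifier weigh "report" in the subject line differently from "report" in the body, and the resulting token list can be fed to whichever learner is being tested.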

That said, all six approaches tested showed at least 90% accuracy. However, with tens of thousands of emails, having up to 10% of them misclassified as something they are not (spam, for example) could cause almost as big a headache as the information overload the filtering aims to tackle.
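A quick back-of-the-envelope calculation shows the scale of the problem. The weekly volume of 20,000 messages below is a hypothetical figure chosen for illustration, not a number from the study.

```python
def expected_misclassified(total_emails, accuracy_percent):
    """Number of emails expected to be misclassified at a given accuracy."""
    return total_emails * (100 - accuracy_percent) // 100


weekly_volume = 20_000  # hypothetical weekly email volume
print(expected_misclassified(weekly_volume, 90))  # prints 2000
```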

The team reports details of their study in the International Journal of Intelligent Information and Database Systems, 2007, 1, 91-121.

3 responses so far ↓

  • Kannan.M.S. // Aug 30, 2007 at 9:47 am

    A good article for self-disciplined personnel and who believe in systems they and others build.
    Structured approach pays off after all.

  • DNA Networks // Sep 3, 2007 at 7:46 pm

    I’m not sure what kind of accuracy I get with Google mail, but it has to be very high! It is rare that I find something in Spam that isn’t spam.

    Google mail for your domain works great.

  • David Bradley // Sep 3, 2007 at 10:13 pm

    I probably see about 100-200 spams a day in my main google account and roughly 1-2 of those messages are false positives. Other people’s mileage varies…
