I've been using CRM114 as a spam filter for a while now, and I'm quite happy with it. Due to bug #529720 though (incompatible upstream file format changes) I decided to start my setup from scratch with a recent CRM114 version from unstable. Here's a short HOWTO; I hope it's useful for others.
First you need to install crm114 and set up a few files in your $HOME directory.
$ sudo apt-get install crm114
$ mkdir ~/.crm114
$ cd ~/.crm114
$ cp /usr/share/doc/crm114/examples/mailfilter.cf.gz .
$ gunzip mailfilter.cf.gz
$ cp /usr/share/crm114/mailtrainer.crm .
$ touch rewrites.mfp priolist.mfp
Edit ~/.crm114/mailfilter.cf and set the following variables (some are optional, but that's what I currently use):
:spw: /mypassword/
:add_verbose_stats: /no/
:add_extra_stuff: /no/
:rewrites_enabled: /no/
:spam_flag_subject_string: //
:unsure_flag_subject_string: //
:log_to_allmail.txt: /no/
The :log_to_allmail.txt: /no/ option should probably stay at "yes" for the first few days until you have tested your setup and everything works OK. The ~/.crm114/allmail.txt file will contain all your mails, in case something goes wrong.
Now set up empty spam and nonspam files like this:
$ cssutil -b -r spam.css
$ cssutil -b -r nonspam.css
Test the setup by invoking mailreaver.crm as follows, typing some test text and then pressing CTRL+d:
$ /usr/share/crm114/mailreaver.crm -u ~/.crm114
test
[CTRL-d]
** ACCEPT: CRM114 PASS osb unique microgroom Matcher **
CLASSIFY fails; success probability: 0.5000 pR: 0.0000
Best match to file #0 (nonspam.css) prob: 0.5000 pR: 0.0000
Total features in input file: 8
#0 (nonspam.css): features: 1, hits: 0, prob: 5.00e-01, pR: 0.00
#1 (spam.css): features: 1, hits: 0, prob: 5.00e-01, pR: 0.00
X-CRM114-Version: 200904023-BlameSteveJobs ( TRE 0.7.6 (BSD) ) MF-35EB8B9A [pR: 0.0000]
X-CRM114-CacheID: sfid-20090920_151224_574131_D290E589
X-CRM114-Status: UNSURE (0.0000) This message is 'unsure'; please train it!
The output should look similar to the above. If there are errors instead, you should check your settings in ~/.crm114/mailfilter.cf.
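If you'd rather check the result in a script than by eye, a small helper can pull the status word out of the X-CRM114-Status header. This is only a sketch: the function name crm_status is made up here and is not part of crm114, and it assumes nothing beyond the header format shown in the sample output.

```shell
# crm_status: read a mail (or mailreaver output) on stdin and print the
# first word after "X-CRM114-Status:", e.g. SPAM or UNSURE.
# Hypothetical helper for illustration, not shipped with crm114.
crm_status() {
    sed -n 's/^X-CRM114-Status:[[:space:]]*\([A-Za-z]*\).*/\1/p' | head -n 1
}

# Example with a canned header line instead of a real mailreaver run:
printf 'X-CRM114-Status: UNSURE (0.0000) This message is unsure\n' | crm_status
```

With a real setup you would pipe the output of mailreaver.crm into crm_status instead of the canned line.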
Now you have to set up a procmail rule for crm114:
:0fw: crm114.lock
| /usr/share/crm114/mailreaver.crm -u /home/uwe/.crm114

:0:
* ^X-CRM114-Status: SPAM.*
IN.spam-crm114
Finally, in .muttrc I have the following configs so I can press SHIFT+x to mark a mail as spam, and SHIFT+h to mark it as non-spam (ham).
macro index X '| formail -I X-CRM114-Status -I X-CRM114-Action -I X-CRM114-Version | /usr/share/crm114/mailreaver.crm -u /home/uwe/.crm114/ --spam'
macro index H '| formail -I X-CRM114-Status -I X-CRM114-Action -I X-CRM114-Version | /usr/share/crm114/mailreaver.crm -u /home/uwe/.crm114/ --good'
macro pager X '| formail -I X-CRM114-Status -I X-CRM114-Action -I X-CRM114-Version | /usr/share/crm114/mailreaver.crm -u /home/uwe/.crm114/ --spam'
macro pager H '| formail -I X-CRM114-Status -I X-CRM114-Action -I X-CRM114-Version | /usr/share/crm114/mailreaver.crm -u /home/uwe/.crm114/ --good'
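The four macros differ only in the mutt menu (index or pager) and the training option, and all of them hard-code my home directory. If you want to adapt them to your own path, a small generator script might be handy. This is a sketch, not something crm114 or mutt ships: the names CRM_DIR, REAVER, STRIP and gen_macros are invented for this example.

```shell
# Print the four mutt macros for a given crm114 directory.
# CRM_DIR is an assumption of this sketch; it defaults to ~/.crm114.
CRM_DIR="${CRM_DIR:-$HOME/.crm114}"
REAVER=/usr/share/crm114/mailreaver.crm
# Strip old CRM114 headers before training, as in the macros above.
STRIP='formail -I X-CRM114-Status -I X-CRM114-Action -I X-CRM114-Version'

gen_macros() {
    for menu in index pager; do
        # X trains the mail as spam, H as non-spam (ham).
        printf "macro %s X '| %s | %s -u %s/ --spam'\n" "$menu" "$STRIP" "$REAVER" "$CRM_DIR"
        printf "macro %s H '| %s | %s -u %s/ --good'\n" "$menu" "$STRIP" "$REAVER" "$CRM_DIR"
    done
}

gen_macros
```

You could paste the output into .muttrc, or redirect it to a separate file and include that from .muttrc.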
Important: crm114 is most effective if you start with empty CSS files (as shown above) and only train it by marking mails as spam/ham when it gets them wrong. The process will take a few hours or maybe a day (depending on how many mails per day you get), then the misclassification rate gets very low...
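The "only train when it gets them wrong" rule can be written down as a tiny decision function. This is just an illustration of the policy, not part of crm114: the name train_flag is invented, and it assumes the status words in the X-CRM114-Status header are SPAM, GOOD and UNSURE (only SPAM and UNSURE appear in this post; GOOD is my assumption for correctly passed mail).

```shell
# train_flag STATUS TRUTH
#   STATUS: what the filter said (assumed: SPAM, GOOD or UNSURE)
#   TRUTH:  what the mail really is (spam or good)
# Prints the mailreaver training option if the filter was wrong or unsure,
# and prints nothing if it was right (train-on-error).
# Hypothetical helper for illustration only.
train_flag() {
    status=$1
    truth=$2
    case "$truth:$status" in
        spam:SPAM|good:GOOD) ;;                 # filter was right: no training
        spam:*) printf '%s\n' '--spam' ;;       # missed or unsure spam
        good:*) printf '%s\n' '--good' ;;       # false positive or unsure ham
        *) return 1 ;;                          # unknown TRUTH value
    esac
}

train_flag UNSURE spam   # prints --spam
```

In practice the mutt macros do this job by hand: you only press X or H on mails that ended up in the wrong place or were marked unsure.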
Update 2009-09-23: Changed --spam/--nonspam to the correct options for mailreaver/mailtrainer, --spam/--good.
Whoa, those spammers are getting really desperate now, aren't they?
Today in my inbox:
apy8 0lyk b8xvdtfa glb13 0uurqjl 5xju3p0jb 1uk9z yhak o3vl tytjx4ui m6frp 64zx9238iq 128lkxk2wh fzqpv mkqj g1tn wgd293sv s3mnhaq y1vvng731dsy f39iqddc65f 5fgnwcx t9ba4 wg8j1 ucq8 uviyoz6 4k2g4 fo wz0i q7pn hqblemz pu9t 1dwr mocp nlihfws mm3w0 j4zb 0fzh o6nljyq 0luy to8a ljd0 5bi8 zpfh 93ab tbpr hztc foza p7sf 5vw7t a4nce 2fjr oxto 2t0r 37v3 mxvfq0 x6qtw1j6me ye51 b7pt pwtx gg5l mtfr h7390 0voxg btc8 t7vj3n twn72qv80 92sj8 8qhuc 4xoq 9m3u r3i0 4dgf 2k8l o6u8 eegabt 70vrl5ukj 6bpp u336 9p5tqyo ixkj 7mkcss82ko2 6dgtj tdei eayi tnjgi ujh0x073p63 jbxotva alrs ubvdw9kele9rs ed7bi vbjz 0tlb b1svn 15xh90ojyj56u zzfla7m 3o1jnrrc kvlxt74rl46l1 yy5mng2kl7dj 8bmq 793jb qzqkjf00glzsf e6doi hfcqgi2t w8bd vydk elqfyxtdk7g upqf ippbf ca5l cgrm npnrd dzsgo4jz q9zo co4g 6kabvxc sqpy 5ds54 qhpb krpw
A recent debian-curiosa thread made my day:
# Subject: looking for someone?
# From: "Mitch"
Hi there locvely,
aThis kind aof opportucnity comes ones in a life. I don't want
to miss it. Do you? I am coming to your place in few days
and I thoughc may be we can meet each other. If cyou don't mind
I can send you my pcicturea. I am a girl.
You can bcorrespond with me using my email firstname.lastname@example.org
# From: 'Mash
Sorry I prefer a women who isn't so keen on placing random letters
in her words. Apparently they are rubbish in bed.
I mean what the hell is a "pcicturea," something from the
Anne-summers Jurassic collection?
# From: Shawn McMahon
I prefer women who aren't named "Mitch".
The registered wiki user "Drunkers" (yes, the spammer scripts not only spam anonymously, they also create real accounts lately!) spammed several pages in the wiki, adding tons of spam links, hidden with fancy CSS and other tricks. Nothing unusual so far.
Another registered user ("Mootlif3") reverted the spammer's changes with comments like "reverted (spam)", "unrelated links removed", "deleted spam links", and "damn spammers". Or so I thought.
The real "wtf" moment emerged when I checked what "Mootlif3" really had done. He didn't really revert the changes of the spammer. He only removed a few of the links from the page, leaving most of them still in there! So it would look like a nice (human) wiki user had helped out with cleaning spam, but in reality he only created a false sense of "security" for people who really want to clean the spam...
Damn spammers, indeed.
I've been a happy Privoxy user for quite some time now. I can really recommend it to anybody who wants to get rid of all the nasty stuff floating around on the web these days. From the Privoxy homepage:
Privoxy is a web proxy with advanced filtering capabilities for protecting privacy, modifying web page content, managing cookies, controlling access, and removing ads, banners, pop-ups and other obnoxious Internet junk.
The most useful feature for me is that it automatically removes almost all of those ugly flash-based ad banners.
My todo list: