I've been using CRM114 as a spam filter for a while now, and I'm quite happy with it. Because of bug #529720 (incompatible upstream file format changes), though, I decided to redo my setup from scratch with a recent CRM114 version from unstable. Here's a short HOWTO; I hope it's useful for others.
First you need to install crm114 and set up a few files in your $HOME directory.
$ sudo apt-get install crm114
$ mkdir ~/.crm114
$ cd ~/.crm114
$ cp /usr/share/doc/crm114/examples/mailfilter.cf.gz .
$ gunzip mailfilter.cf.gz
$ cp /usr/share/crm114/mailtrainer.crm .
$ touch rewrites.mfp priolist.mfp
Edit ~/.crm114/mailfilter.cf and set the following variables (some are optional, but that's what I currently use):
:spw: /mypassword/
:add_verbose_stats: /no/
:add_extra_stuff: /no/
:rewrites_enabled: /no/
:spam_flag_subject_string: //
:unsure_flag_subject_string: //
:log_to_allmail.txt: /no/

The :log_to_allmail.txt: option should probably stay at /yes/ for the first few days, until you have tested your setup and everything works OK. While it is enabled, the ~/.crm114/allmail.txt file will contain all your mails, in case something goes wrong.
Now set up empty spam and nonspam files like this:
$ cssutil -b -r spam.css
$ cssutil -b -r nonspam.css
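Before testing, it can help to confirm that all the files created so far are actually in place. The following is only a small sketch: missing_crm114_files is a hypothetical helper of my own (it is not part of CRM114), and the file list is simply the set of names used in this HOWTO.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with CRM114: print the name of every
# file from this HOWTO that is missing in the given directory.
# The directory defaults to ~/.crm114.
missing_crm114_files() {
    dir=${1:-"$HOME/.crm114"}
    for f in mailfilter.cf mailtrainer.crm rewrites.mfp priolist.mfp \
             spam.css nonspam.css; do
        # Print only the names that do not exist yet.
        [ -e "$dir/$f" ] || printf '%s\n' "$f"
    done
}
```

Calling missing_crm114_files without arguments checks ~/.crm114; no output means every file is present.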
Test the setup by invoking mailreaver.crm as follows, typing some test text and then pressing CTRL+d:
$ /usr/share/crm114/mailreaver.crm -u ~/.crm114
test
[CTRL-d]
** ACCEPT: CRM114 PASS osb unique microgroom Matcher **
CLASSIFY fails; success probability: 0.5000 pR: 0.0000
Best match to file #0 (nonspam.css) prob: 0.5000 pR: 0.0000
Total features in input file: 8
#0 (nonspam.css): features: 1, hits: 0, prob: 5.00e-01, pR: 0.00
#1 (spam.css): features: 1, hits: 0, prob: 5.00e-01, pR: 0.00
X-CRM114-Version: 200904023-BlameSteveJobs ( TRE 0.7.6 (BSD) ) MF-35EB8B9A [pR: 0.0000]
X-CRM114-CacheID: sfid-20090920_151224_574131_D290E589
X-CRM114-Status: UNSURE (0.0000) This message is 'unsure'; please train it!
The output should look similar to the above. If there are errors instead, you should check your settings in ~/.crm114/mailfilter.cf.
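If you want to check the result from a script rather than by eye, the verdict can be pulled out of the X-CRM114-Status line. This is a minimal sketch using plain sed; crm114_status is a hypothetical name of mine, and the only thing it assumes is the header format shown in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper: read mailreaver output (or a filtered mail) on stdin
# and print the first word after "X-CRM114-Status:", e.g. UNSURE or SPAM.
# Prints nothing if no such line is present.
crm114_status() {
    sed -n 's/^X-CRM114-Status: *\([A-Za-z]*\).*/\1/p' | head -n 1
}
```

For example, printf 'test\n' | /usr/share/crm114/mailreaver.crm -u ~/.crm114 | crm114_status should print the same verdict you see in the full output.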
Now you have to set up a procmail rule for crm114:

:0fw: crm114.lock
| /usr/share/crm114/mailreaver.crm -u /home/uwe/.crm114

:0:
* ^X-CRM114-Status: SPAM.*
IN.spam-crm114
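To see which mails the second recipe would file into IN.spam-crm114, you can approximate its condition in the shell. This is only a rough sketch: matches_spam_recipe is a hypothetical function of mine, and it imitates procmail's default behaviour (match against the header only, ignoring case) with sed and grep rather than running procmail itself.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the mail on stdin would match the
# "^X-CRM114-Status: SPAM" condition of the recipe above.
matches_spam_recipe() {
    # Keep only the header (everything up to the first empty line), then
    # apply the recipe's pattern; -i mirrors procmail's case-insensitive
    # matching.
    sed '/^$/q' | grep -i '^X-CRM114-Status: SPAM' >/dev/null
}
```

A quick check on a saved mail would be matches_spam_recipe < some-mail && echo "would go to IN.spam-crm114", where some-mail is a placeholder for a message that has already passed through mailreaver.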
Finally, I have the following macros in my .muttrc so I can press SHIFT+x to mark a mail as spam, and SHIFT+h to mark it as non-spam (ham):

macro index X '| formail -I X-CRM114-Status -I X-CRM114-Action -I X-CRM114-Version | /usr/share/crm114/mailreaver.crm -u /home/uwe/.crm114/ --spam'
macro index H '| formail -I X-CRM114-Status -I X-CRM114-Action -I X-CRM114-Version | /usr/share/crm114/mailreaver.crm -u /home/uwe/.crm114/ --good'
macro pager X '| formail -I X-CRM114-Status -I X-CRM114-Action -I X-CRM114-Version | /usr/share/crm114/mailreaver.crm -u /home/uwe/.crm114/ --spam'
macro pager H '| formail -I X-CRM114-Status -I X-CRM114-Action -I X-CRM114-Version | /usr/share/crm114/mailreaver.crm -u /home/uwe/.crm114/ --good'
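All four macros run the same pipeline and differ only in the final --spam or --good option, so the pipeline could also live in one place. The following is a sketch under assumptions, not something CRM114 ships: crm114_train, MAILREAVER and CRM114_DIR are names I made up, while the formail and mailreaver.crm options are exactly the ones used in the macros.

```shell
#!/bin/sh
# Hypothetical wrapper around the training pipeline from the macros above.
# MAILREAVER and CRM114_DIR are illustrative variables, overridable from
# the environment.
MAILREAVER=${MAILREAVER:-/usr/share/crm114/mailreaver.crm}
CRM114_DIR=${CRM114_DIR:-$HOME/.crm114}

crm114_train() {
    # Accept only the two training modes that mailreaver understands here.
    case "$1" in
        spam|good) ;;
        *) echo "usage: crm114_train spam|good < message" >&2; return 2 ;;
    esac
    # Strip the old CRM114 headers, then train on the mail from stdin.
    formail -I X-CRM114-Status -I X-CRM114-Action -I X-CRM114-Version |
        "$MAILREAVER" -u "$CRM114_DIR" "--$1"
}
```

Put into a small script that ends with crm114_train "$1", each macro would only need to pipe the message to that script with spam or good as its argument.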
Important: crm114 is most effective if you start with empty CSS files (as shown above) and only train it by marking mails as spam/ham when it gets them wrong. This will take a few hours or maybe a day (depending on how many mails you get per day); after that, the misclassification rate gets very low.
Update 2009-09-23: Changed --spam/--nonspam to the correct options for mailreaver/mailtrainer, --spam/--good.