To get started, after installing SpamBayes, just run
-
sb_filter.py -n
This creates your word database. After doing this, you'll want to set up filtering and training.
1. Procmail
Procmail is a standard mail filtering program used on many Unix systems. If you read mail from a Unix shell, chances are you have procmail available. Check with your system administrator for more information.
The following recipes will filter every message, and put the spam and unsure messages into their own folders:
:0 fw
| sb_filter.py

:0:
* ^X-Spambayes-Classification: spam
spam

:0:
* ^X-Spambayes-Classification: unsure
unsure
Ham will fall through these three recipes and be delivered normally. If you don't have any recipes after these, it will go to your inbox.
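To make the routing concrete, here is a small Python sketch of the decision the two delivery recipes make once sb_filter.py has added its header. It is an illustration only, not part of SpamBayes or procmail: the function name `choose_folder` and the "inbox" label are made up for this example, and it assumes the header values are "spam" and "unsure" (the values the Mutt color rules in this document match on).

```python
from email import message_from_string

HEADER = "X-Spambayes-Classification"


def choose_folder(raw_message: str) -> str:
    """Return the folder a filtered message would land in.

    Mirrors the recipes: "spam" and "unsure" get their own folders,
    anything else (ham, or no header at all) falls through to the inbox.
    """
    msg = message_from_string(raw_message)
    # The header may carry extra text after the label, so compare prefixes,
    # as the procmail pattern "^X-Spambayes-Classification: spam" does.
    value = (msg.get(HEADER) or "").strip().lower()
    for label in ("spam", "unsure"):
        if value.startswith(label):
            return label
    return "inbox"


# Hypothetical sample message, already tagged by the filter.
example = "X-Spambayes-Classification: unsure\nSubject: hello\n\nbody\n"
print(choose_folder(example))
```

Procmail itself does this matching with regular expressions against the header block; the sketch only shows why ham needs no recipe of its own.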
2. Mutt
Mutt is a popular text-based mail client written in C, featuring a colorized display and some neat customization options. You will probably want to use this in conjunction with the procmail setup listed above.
Adding the lines shown after this list to your muttrc will:
-
Bind "S" (that's shift-s) to "refile as spam". This runs the message through SpamBayes with the "this is spam" option, sends the re-filtered message through procmail, then deletes the original.
-
Bind "H" (shift-h) to "refile as ham". This does the same thing as "S", except it tells SpamBayes that the message is ham.
-
Color spam red
-
Color unsures green
The idea is that if you train on the misfiled messages and the unsures (green), you can build up a good training database in a short amount of time. It works for me, at least.
macro index S "|sb_filter.py -s -f | procmail\nd"
macro pager S "|sb_filter.py -s -f | procmail\nd"
macro index H "|sb_filter.py -g -f | procmail\nd"
macro pager H "|sb_filter.py -g -f | procmail\nd"
color index red default "~h '^X-Spambayes-Classification: spam'"
color index green default "~h '^X-Spambayes-Classification: unsure'"
Note: I experienced a significant delay when executing these keyboard macros, so I run the pipeline in the background with an ampersand (&), which is much faster. For example:
macro index S "|sb_filter.py -f -s | procmail \&\n<delete-message>"
You should check that the variable "pipe_decode" is set to "no" ("set pipe_decode=no" in your ~/.muttrc). When it is set to "yes", Mutt pipes the displayed version of the message, with the non-displayed headers stripped out. The version in Debian unstable sets it to "yes" by default, for instance.
3. Gnus
Gnus is a news and mail client shipped with Emacs. It's probably the most powerful mail client on the planet.
The contrib/spambayes.el file in the SpamBayes distribution contains code and instructions for integrating with Gnus. It allows you to filter mail from Gnus (no need for procmail) and can train on a message then refile it based on its training. This is very handy when doing fancy- or group-splitting.
In addition, you may want to set up scoring rules so that when you review your spam, you only have to look at things that the system was not certain about. I use the following score file in my spam group, which marks as read anything with a SpamBayes score of 0.97-1.00:
(("head" ("X-Spambayes-Classification: spam; \\(1\\|0\\.9[789]\\)" -1000 nil r)))
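As a rough check of which scores that rule catches, the sketch below translates the Emacs regexp into Python's `re` syntax and tries it on a few header lines. The header format "spam; <score>" is inferred from the rule itself, and the sample scores and the function name `marked_as_read` are invented for illustration.

```python
import re

# Emacs "\\(1\\|0\\.9[789]\\)" becomes "(1|0\.9[789])" in Python syntax.
RULE = re.compile(r"X-Spambayes-Classification: spam; (1|0\.9[789])")


def marked_as_read(header_line: str) -> bool:
    """True if the score rule would match this header line."""
    return RULE.search(header_line) is not None


# Try a handful of made-up scores on either side of the 0.97 cutoff.
for score in ("1.00", "0.99", "0.97", "0.96", "0.90"):
    line = f"X-Spambayes-Classification: spam; {score}"
    print(score, marked_as_read(line))
```

Scores of 0.97 and above match, so only the less certain spam stays unread for review.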
4. Sylpheed-claws (POP3)
Sylpheed-claws is a fast, lightweight e-mail client built on top of Sylpheed, which is the stable release of the same program. It is available for Linux/Unix/Win32, and in fact it already has another spam filter plugin: SpamAssassin. I am using it because it is very fast, its source code is simple to change (perhaps one might add a SpamBayes plugin to it someday!), and, yes, it is simple to use.
There are two ways to fetch your mail when using SpamBayes together with Sylpheed-claws:
-
Use the POP3 proxy, which works great. The drawback is that while downloading your e-mail you have to wait a short time as SpamBayes analyses each message.
-
Or use procmail (perhaps together with fetchmail if you don't have an MDA installed), which makes checking your mail a lot faster because messages are delivered in the background.
Whichever way you choose, you will still have to train on your messages, and this can be done in several ways:
-
The simplest way: train through the web interface (probably something like http://localhost:8880).
-
By setting up a mail address on your localhost to which you can send your ham and spam. This requires you to set up Sylpheed to use the SMTP port on your computer.
-
By an action script which you can add in sylpheed. Although I'm too lazy to add a description here today, it should be possible =)
-
You can also have a script which trains on your mail once every 24 hours or so. On Linux/Unix you can do this by using "cron" (see "man cron" for more details).
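A minimal sketch of such a periodic training script is shown below. It walks an mbox file and pipes each message to sb_filter.py with the -g (ham) or -s (spam) option used in the Mutt macros of this document, assuming sb_filter.py reads the message from standard input as it does there. The mbox paths, the function name `train_mbox`, and the injectable `run` parameter are assumptions made for this example; adapt them to wherever you collect your corrected mail.

```python
import mailbox
import subprocess


def train_mbox(path, flag, run=subprocess.run):
    """Pipe every message in the mbox at `path` to `sb_filter.py <flag>`.

    `flag` is "-g" to train as ham or "-s" to train as spam.
    Returns the number of messages that were handed to the trainer.
    """
    count = 0
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            # The message goes in on stdin; any output is discarded.
            run(["sb_filter.py", flag], input=message.as_bytes(),
                stdout=subprocess.DEVNULL, check=True)
            count += 1
    finally:
        box.close()
    return count


# Hypothetical usage, e.g. from a script that cron runs once a day:
#   train_mbox("/home/me/Mail/train-ham", "-g")
#   train_mbox("/home/me/Mail/train-spam", "-s")
```

The `run` parameter exists only so the function can be tried out without touching the real word database; in normal use it defaults to `subprocess.run`.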
Next, add a filter in Sylpheed-claws to sort out your spam. First, create a folder where your spam will go by choosing "File" -> "Folder" -> "Create new folder". Then do the following:
-
Configuration -> Filter
-
In "condition" press "Define..."
-
"Match-type" = "Header"
-
"Header-name" = "X-Spambayes-Classification"
-
"Value" = "spam"
-
Press "Add". If you'd also like to filter mail classified as "unsure", continue with the next step; otherwise skip to step 12 (the first Press "OK" below).
-
"Boolean op" = "or"
-
"Match-type" = "Header"
-
"Header-name" = "X-Spambayes-Classification"
-
"Value" = "unsure"
-
Press "Add"
-
Press "OK"
-
In "Action", press "Define..."
-
"Action" = "Move"
-
In "Destination" press "Select..." and select the folder where your spam will go.
-
Press "Add"
-
Press "OK"
-
Press "Add"
-
Press "OK"
It is also simple to set up Sylpheed so that it gives spam messages a certain color.
Good luck!