-
Collect a large corpus of verified ham and put it in a folder named Z corpus (ham).
-
Collect a large corpus of verified spam and put it in a folder named Z corpus (spam).
-
Create two other folders named Z training set (ham) and Z training set (spam).
-
Move a small number of messages from Z corpus (ham) to Z training set (ham).
-
Move an equal number of messages from Z corpus (spam) to Z training set (spam).
-
Using the SpamBayes Manager, train on the messages in Z training set (ham) and Z training set (spam), making sure to select "Rebuild entire database" and to deselect "Score messages after training".
-
From the SpamBayes menu tab, select "Filter messages ..." and select all four of the above folders. In the "Filter action" section, select "Score messages, but don't perform filter action". In the "Restrict filter to" section, make sure everything is deselected. Hit the "Start Filtering" button.
-
Add a Spam field to each of the four folders and click on the Spam field header in each one to sort by spam score.
-
Move one or more of the highest-scoring ham messages (the ham that currently looks most like spam to the filter) from Z corpus (ham) to Z training set (ham).
-
Move an equal number of the lowest-scoring spam messages (the spam that currently looks most like ham to the filter) from Z corpus (spam) to Z training set (spam).
-
Using the SpamBayes Manager, train on the messages in Z training set (ham) and Z training set (spam), making sure to deselect both "Rebuild entire database" and "Score messages after training". This trains only on the newly added messages; messages that have already been trained are not trained again.
-
From the SpamBayes menu tab, select "Filter messages ..." and hit the "Start Filtering" button.
-
Go back to step 9, where the highest-scoring ham is moved into the training set, and repeat this loop until you are satisfied with the filter's performance.
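The loop above can be sketched outside Outlook to make the logic explicit. This is only an illustration under stated assumptions: `ToyScorer` and `train_on_mistakes` are hypothetical names, the scorer is a tiny naive-Bayes-style stand-in that does not reproduce SpamBayes's own scoring, and messages are plain strings rather than Outlook items. What it does show is the shape of the procedure: seed a small balanced training set, score the remaining corpus, then repeatedly move the worst-scored ham and spam into the training set in equal numbers and train incrementally.

```python
import math
from collections import Counter


class ToyScorer:
    """Hypothetical stand-in for a spam classifier (not the SpamBayes API)."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}  # word -> message count
        self.n = {True: 0, False: 0}                       # messages seen per class

    def learn(self, text, is_spam):
        # Incremental training: only this one message is added to the counts.
        self.counts[is_spam].update(set(text.lower().split()))
        self.n[is_spam] += 1

    def score(self, text):
        # Sum smoothed per-word log-odds, then squash to a 0..1 "spam score".
        log_odds = 0.0
        for word in set(text.lower().split()):
            p_spam = (self.counts[True][word] + 1) / (self.n[True] + 2)
            p_ham = (self.counts[False][word] + 1) / (self.n[False] + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp()
        return 1.0 / (1.0 + math.exp(-log_odds))


def train_on_mistakes(ham_corpus, spam_corpus, seed=1, batch=1, rounds=5):
    """Seed a small balanced training set, then grow it with the worst-scored messages."""
    ham_corpus, spam_corpus = list(ham_corpus), list(spam_corpus)
    scorer = ToyScorer()

    # Steps 4-6: move an equal, small number of messages and train from scratch.
    ham_train = [ham_corpus.pop(0) for _ in range(seed)]
    spam_train = [spam_corpus.pop(0) for _ in range(seed)]
    for msg in ham_train:
        scorer.learn(msg, False)
    for msg in spam_train:
        scorer.learn(msg, True)

    for _ in range(rounds):
        if not ham_corpus or not spam_corpus:
            break
        # Steps 7-8 and 12: rescore what is left in the corpus and sort by score.
        ham_corpus.sort(key=scorer.score, reverse=True)  # most spam-like ham first
        spam_corpus.sort(key=scorer.score)               # most ham-like spam first
        # Steps 9-10: move equal numbers of the worst-scored ham and spam.
        k = min(batch, len(ham_corpus), len(spam_corpus))
        new_ham = [ham_corpus.pop(0) for _ in range(k)]
        new_spam = [spam_corpus.pop(0) for _ in range(k)]
        # Step 11: train only on the newly moved messages.
        for msg in new_ham:
            scorer.learn(msg, False)
        for msg in new_spam:
            scorer.learn(msg, True)
        ham_train += new_ham
        spam_train += new_spam

    return scorer, ham_train, spam_train
```

In this sketch the fixed `rounds` argument stands in for "until you are satisfied with the performance"; in practice you would stop when the scores of the messages remaining in the two corpus folders separate cleanly.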