SkipsRecursiveTrainingSetSelectionForOutlook



Here is an adaptation of SkipsRecursiveTrainingSetSelectionAlgorithm for the Outlook Plug-In. Skip's original algorithm used errors and unsures as the corpus to train from, but the procedure works equally well for a message corpus selected by any of the training tactics in TrainingIdeas. The number of messages you move in steps 4 and 9 (and match in steps 5 and 10) depends on whether your database size philosophy is minimalist, regular or enlarged. [Rumor has it that Skip has recently left the enlarged camp and become a minimalist.]
  1. Collect a large corpus of verified ham and put it in a folder named Z corpus (ham).

  2. Collect a large corpus of verified spam and put it in a folder named Z corpus (spam).

  3. Create two other folders named Z training set (ham) and Z training set (spam).

  4. Move a small number of messages from Z corpus (ham) to Z training set (ham).

  5. Move an equal number of messages from Z corpus (spam) to Z training set (spam).

  6. Using the SpamBayes Manager, train on the messages in Z training set (ham) and Z training set (spam), making sure to select "Rebuild entire database" and deselect "Score messages after training".

  7. From the SpamBayes menu tab, select "Filter messages ..." and select all four of the above folders. In the "Filter action" section, select "Score messages, but don't perform filter action". In the "Restrict filter to" section, make sure everything is deselected. Hit the "Start Filtering" button.

  8. Add a Spam field to each of the four folders and click on the Spam field header in each one to sort by spam score.

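  9. Move one or more of the highest-scoring ham (the ham that the current database finds most spam-like) from Z corpus (ham) to Z training set (ham).

  10. Move an equal number of the lowest-scoring spam (the spam that the current database finds most ham-like) from Z corpus (spam) to Z training set (spam).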

  11. Using the SpamBayes Manager, train on the messages in Z training set (ham) and Z training set (spam), making sure to deselect both "Rebuild entire database" and "Score messages after training". With the rebuild option off, only the newly added messages are trained; messages that were already trained are not trained again.

  12. From the SpamBayes menu tab, select "Filter messages ..." and hit the "Start Filtering" button.

  13. Go to step 9 and repeat this loop until you are satisfied with the performance.
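
The loop in steps 4 through 13 can also be sketched in code, which may help make the selection rule concrete. The Python below is only an illustration under stated assumptions: ToyScorer is a hypothetical stand-in for the real SpamBayes classifier (it is not the SpamBayes API and its scores will differ), messages are plain strings, and a fixed number of rounds replaces the "until you are satisfied" judgment in step 13.

```python
from collections import Counter


class ToyScorer:
    """Hypothetical stand-in for a spam classifier; NOT the SpamBayes API."""

    def __init__(self):
        self.ham_tokens = Counter()
        self.spam_tokens = Counter()
        self.n_ham = 0
        self.n_spam = 0

    def train(self, text, is_spam):
        # Count each distinct token once per message.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_tokens.update(tokens)
            self.n_spam += 1
        else:
            self.ham_tokens.update(tokens)
            self.n_ham += 1

    def score(self, text):
        # Average of smoothed per-token spam probabilities, in the range 0..1.
        tokens = set(text.lower().split())
        if not tokens:
            return 0.5
        total = 0.0
        for tok in tokens:
            p_spam = (self.spam_tokens[tok] + 0.5) / (self.n_spam + 1)
            p_ham = (self.ham_tokens[tok] + 0.5) / (self.n_ham + 1)
            total += p_spam / (p_spam + p_ham)
        return total / len(tokens)


def select_training_set(ham_corpus, spam_corpus, seed=1, step=1, rounds=5):
    """Sketch of steps 4-13; seed, step and rounds are illustrative parameters."""
    ham_corpus, spam_corpus = list(ham_corpus), list(spam_corpus)

    # Steps 4-5: move a small, equal number of messages into the training sets.
    ham_train, spam_train = ham_corpus[:seed], spam_corpus[:seed]
    del ham_corpus[:seed], spam_corpus[:seed]

    # Step 6: build the database from scratch on the initial training sets.
    scorer = ToyScorer()
    for msg in ham_train:
        scorer.train(msg, is_spam=False)
    for msg in spam_train:
        scorer.train(msg, is_spam=True)

    for _ in range(rounds):
        n = min(step, len(ham_corpus), len(spam_corpus))
        if n == 0:
            break
        # Steps 7-8 and 12: score what is left in the corpus and sort by score.
        ham_corpus.sort(key=scorer.score, reverse=True)  # most spam-like ham first
        spam_corpus.sort(key=scorer.score)               # most ham-like spam first
        # Steps 9-10: move the worst-scored messages, in equal numbers.
        new_ham, new_spam = ham_corpus[:n], spam_corpus[:n]
        del ham_corpus[:n], spam_corpus[:n]
        # Step 11: train incrementally on the newly added messages only.
        for msg in new_ham:
            scorer.train(msg, is_spam=False)
        for msg in new_spam:
            scorer.train(msg, is_spam=True)
        ham_train += new_ham
        spam_train += new_spam

    return ham_train, spam_train, scorer
```

Calling select_training_set(ham_messages, spam_messages, seed=1, step=1, rounds=2) on two lists of message strings returns the two training sets and the trained scorer. Keeping the ham and spam training sets the same size at every round mirrors steps 5 and 10.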

SethGoodman