SpamAssassin Auto-Learning with Site-Wide Bayes and User Feedback

The goal of this howto is to make your SpamAssassin Bayes database effective system-wide and to let your users feed mis-tagged spam back to the server, where a script automatically runs sa-learn on it. To use this method you need the following:

My qmail setup with Spamassassin
RipMIME from ports. You can install the FreeBSD port: /usr/ports/mail/ripmime
An email account on your server (e.g. thisisspam@yourdomain.com) for the users to send the spam to
The learnspam script included in this package
Users must send the spam emails as ATTACHMENTS to your thisisspam email address

STEP 1 - The System Account:

The system-wide Bayes database and spamassassin need to operate as the same user. Normally that would be spamd, as set in /etc/sysconfig/spamassassin (or similar). But the autolearn script must be able to read and write both the mail directories on the server and the Bayes database. spamd cannot read or write the mail directories, so you must run the script as either root (from cron.daily) or vpopmail. However, vpopmail does not have read/write permission on the Bayes database if spamd is running spamassassin. For those who do not wish to risk running the script as root, simply change the user spamassassin runs as to qscand by changing the -u and -H options in /etc/sysconfig/spamassassin from spamd to qscand. Then, when you restart spamassassin, ps aux should show spamd running as qscand, which is able to read and write the Bayes directory.
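
For illustration, the change might look like the line below. This is a sketch that assumes your options file sets a SPAMDOPTIONS variable (as Red Hat-style packages do) and that qscand's home directory is /home/qscand; both are assumptions, so check your own file and account. The flags used are spamd's daemonize (-d), create-prefs (-c), username (-u) and helper-home-dir (-H) options.

```shell
# /etc/sysconfig/spamassassin (or your system's equivalent)
# Run spamd as qscand instead of spamd.
# /home/qscand is an assumed home directory; substitute the real one.
SPAMDOPTIONS="-d -c -u qscand -H /home/qscand"
```

After restarting spamassassin, ps aux | grep spamd is a quick way to confirm which user the daemon runs as.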

Once you decide which account will run spamassassin and the autolearn script, choose where in that account's home directory to put the database. The default is /home/(account name)/.spamassassin
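
As a rough sketch of preparing that directory: the helper below only prints the default location for an account, and the commented commands show how it might be used as root. The helper name bayes_dir_for is hypothetical, and it assumes home directories live under /home and that the account is the qscand user from Step 1.

```shell
#!/bin/sh
# Hypothetical helper (not part of the learnspam package): print the
# default Bayes directory for an account, assuming homes are under /home.
bayes_dir_for() {
    printf '/home/%s/.spamassassin\n' "$1"
}

# Example use, run as root ("qscand" is the account assumed from Step 1):
#   dir=$(bayes_dir_for qscand)
#   mkdir -p "$dir"
#   chown qscand "$dir"
```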

STEP 2 - Setting up Bayes and Autolearning in Spamassassin:

Edit /usr/local/etc/mail/spamassassin/local.cf and insert or modify the following lines:

bayes_path /path/to/your/bayes/directory/bayes (the directory you chose in Step 1, followed by the database filename prefix, e.g. bayes)
use_bayes 1
bayes_auto_learn 1

Save the file and restart spamassassin. Run sa-learn --sync to resync the database, then run sa-learn --dump magic; you should see nham and nspam at 0.

Bayes will not be used to score mail until the database holds at least 200 ham and 200 spam messages. If you have good emails in your users' cur directories, do the following:

# find /home/vpopmail/domains -type d -name cur -exec sa-learn --nosync --ham {} \;

Then run sa-learn --sync and sa-learn --dump magic to see that they are there. Otherwise gather some legit email from your users or other sources into a directory on the server and run sa-learn --nosync --ham on them, then --sync again.

Find some spam to force-feed the database: drop it into a folder and run

# sa-learn --nosync --spam /path/to/spam/*

Then run sa-learn --sync and sa-learn --dump magic again to make sure the database is growing. You should see numbers climbing steadily as spamassassin automatically learns spam and ham as mail flows through the server.
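
To check the nspam and nham counters against the 200-message figure without reading the dump by eye, something like the following could be used. It is a minimal sketch: the function names magic_count and bayes_ready are hypothetical, and it assumes each counter line of sa-learn --dump magic carries its value in the third whitespace-separated column and its name as the last field.

```shell
#!/bin/sh
# Print the value of one counter (e.g. nspam or nham) from
# `sa-learn --dump magic` output read on stdin; prints 0 if absent.
# Assumes: value in column 3, counter name as the last field.
magic_count() {
    awk -v name="$1" '$NF == name { print $3; found = 1 }
                      END { if (!found) print 0 }'
}

# bayes_ready DUMP_TEXT [MIN]
# Succeed only when both nspam and nham have reached MIN (default 200).
bayes_ready() {
    dump=$1; min=${2:-200}
    nspam=$(printf '%s\n' "$dump" | magic_count nspam)
    nham=$(printf '%s\n' "$dump" | magic_count nham)
    [ "$nspam" -ge "$min" ] && [ "$nham" -ge "$min" ]
}

# Example use on a live system:
#   bayes_ready "$(sa-learn --dump magic)" && echo "Bayes has enough mail"
```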

STEP 3 - Setting up the Feedback Autolearn Script

After setting up your spam account and installing RipMIME, edit the learnspam script variables per your preferences and system.

The system account the script runs as must have /usr/local/bin in its $PATH to find RipMIME. If you chose to run the script as root (from cron.daily), you will need to insert this line in the script: PATH="$PATH":/usr/local/bin

Remember, however, that running anything as root has risks; do so at your own risk.

Forward some spam email as attachments to the thisisspam account and run the script to test it. Make sure that the logfile shows that the emails were RipMIME'd and that they were learned by sa-learn. If sa-learn has seen them before it will not learn them again unless it forgets them first, so do not be surprised if you see more examined than learned. Once the script is tested, enter the cron job for it and watch your logs for activity.
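
The learnspam script itself ships in the package and is not reproduced here. Purely to illustrate the flow described above (unpack each reported message with RipMIME, then hand the result to sa-learn), a sketch might look like this. The function name, the RIPMIME and SA_LEARN variables, and the directory arguments are all hypothetical, and the real script may pass different options to either tool.

```shell
#!/bin/sh
# Sketch only: NOT the learnspam script from the package.
# RIPMIME and SA_LEARN may be overridden; defaults assume both are in PATH.
RIPMIME=${RIPMIME:-ripmime}
SA_LEARN=${SA_LEARN:-sa-learn}

# learn_reported_spam INBOX_DIR WORK_DIR
# For each message file in INBOX_DIR: unpack it with ripMIME into its own
# subdirectory of WORK_DIR, then feed that subdirectory to sa-learn as
# spam. The Bayes journal is synced once at the end.
learn_reported_spam() {
    inbox=$1; work=$2
    for msg in "$inbox"/*; do
        [ -f "$msg" ] || continue            # skip if nothing matched
        out="$work/$(basename "$msg")"
        mkdir -p "$out" || return 1
        "$RIPMIME" -i "$msg" -d "$out" || continue
        "$SA_LEARN" --nosync --spam "$out"
    done
    "$SA_LEARN" --sync
}

# Example use (paths are placeholders, not taken from the package):
#   learn_reported_spam /path/to/thisisspam/Maildir/new /tmp/learnspam
```

Keeping the tool names in variables makes it easy to point the sketch at a full path such as /usr/local/bin/ripmime when the account's $PATH does not include it.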

Maintenance - LogRotate does a fine job of rotating the logs on the system. A recommended entry for the salearn.log is:

# AutoLearn Spam Log
# This should rotate the log every week
# and keep one month's worth of logs archived
/var/log/salearn.log {
weekly
rotate 4
nocompress
}

You can download the following related files:

http://freebsdrocks.net/files/tarballs/salearn.tgz

References:

http://freebsdrocks.net:8080/freebsdrocks.net/qmail-utilites/forwarding-emails-as-attachments-in-ms-outlook
