Blue Flower

Monday, 06 July 2015 00:35

SpamAssassin Auto-Learning with Site-Wide Bayes and User Feedback

Written by
Rate this item
(0 votes)

The object of this howto is getting your SpamAssassin 3.0.x Bayes Database
effective system-wide and allow your users to feed mis-tagged spam back to the
server where a script automatically runs sa-learn on it.  In order to use this
method you need the following:

A properly working email server with Spamassassin 3.0.x
RipMIME from
You can install the FreeBSD port: /usr/ports/mail/ripmime
An email account on your server (i.e. This email address is being protected from spambots. You need JavaScript enabled to view it. )
for the users to send the spam to
The learnspam script included in this package 
Users must send the spam emails as ATTACHMENTS to your thisisspam email address

This howto is based on a qmail server setup according to
Other servers will be similar but you must adjust directories and accounts accordingly.

Please note: If you are running the freebsdrocks setup, you do not need to change the spamd service to run as a different user. It is already running as user qscand.

STEP 1 - The System Account:

The System-wide Bayes Database and spamassassin need to operate as the same user.
Normally that would be spamd as set in /etc/sysconfig/spamassin (or similar)
But the Autolearn script must be able to R/W the mail directories on the server
and the Bayes Database.  Spamd cannot R/W mail directories so you must run the
script as either root (cron.daily) or vpopmail.  However, Vpopmail does not have
R/W permissions to the Bayes Database if spamd is running spamassassin. 
For those who do not wish to risk running the script as root, simply change
the spamd user to qscand by setting the -u and -h options in
/etc/sysconfig/spamassin from spamd to qscand.
Then when you restart spamassassin, ps aux should show spamd running as
qscnad who is able to R/W the Bayes Directory. 

Once you decide which account
will run spamassassin and the autolearn script, choose where in that account's
home directory to put the database, the default is
/home/(account name)/.spamassassin

STEP 2 - Setting up Bayes and Autolearning in Spamassassin:

Edit /usr/local/etc/mail/spamassassin/ and insert the following lines:

bayes_path /path/to/your/bayes/directory ( as you chose in Step 1)
use_bayes 1
bayes_auto_learn 1

Save the file and restart spamassassin
Run sa-learn --sync to resync the database
Run sa-learn --dump magic and you should see nham and nspam at 0

You need 200 ham and 200 spam in your database for Bayes to autolearn.
If you have good emails in your users' /cur directories do the following:

# find /home/vpopmail/domains -type d -name cur -exec sa-learn --nosync --ham {}/* ;

Then run sa-learn --sync and sa-learn --dump magic to see that they are there.
Otherwise gather some legit email from your users or other sources into a
directory on the server and run sa-learn --nosync --ham on them, then --sync again.

Find some spam to force feed the database - drop it into a folder and run
#sa-learn --nosync --spam /path/to/spam/*
Then run sa-learn --sync and sa-learn --dump magic again
to make sure the database is growing.  You should see numbers climbing steadily
as spamassassin automatically learns spam and ham as mail flows through the server.

STEP 3 - Setting up the Feedback Autolearn Script

After setting up your spam account and installing RipMIME,
Edit the learnspam script variables per your preferences and system.

The system account the script runs as must have /usr/local/bin in their $PATH to find
RipMIME. If you chose to run the script as root (from cron.daily) you will need to
insert this line in the script:   PATH="$PATH":/usr/local/bin
Remember, however, that running anything as root has risks - do so at your own risk.
Forward some spam email to the thisisspam account and run the script to test it. 
Make sure that the logfile shows that the emails were RipMIME'd and that they were
learned by sa-learn.  If sa-learn has seen them before it will not learn them again
unless it forgets them first, so do not be suprised it you see more examined
than learned. Once the script is tested, enter the cron job for it and watch your
logs for activity.

Maintenance - LogRotate does a fine job of rotating the logs on the system.  A recommended
entry for the salearn.log is:

# AutoLearn Spam Log
# This should rotate the log every week
# and keep one month's worth of logs archived
/var/log/salearn.log {
rotate 4

You can download the following related files: and salearn.log


Read 3610 times Last modified on Wednesday, 08 July 2015 01:39

Leave a comment

Make sure you enter all the required information, indicated by an asterisk (*). HTML code is not allowed.