SpamAssassin: Dealing with unrecognized spam

Saturday, May 3rd, 2008

Everyone hates spam, and one of the main ways that people are fighting it is through the use of SpamAssassin. I’ve been using it for a while now and have Sieve checking for the spam headers it adds and moving the flagged messages to my Junk folder.

The Problem

Dealing with spam that went unrecognized has been more of a manual process. Every once in a while, I’d have to separate all of my useful mail from the spam and run “sa-learn” on the leftovers. This isn’t horrible, because I tend to shell into my server fairly frequently, but I really prefer to have menial tasks like this automated.

A Solution

First of all, I created a folder in my mailbox called “Unrecognized Spam”. The name isn’t important, really. It just needs to be a place to file away all of those messages that SpamAssassin didn’t catch on the way in.

Once that was done, I wrote a very simple little script, which I dropped in /etc/cron.daily/:

#!/bin/bash
 
SPAM_DIR="/home/bshacklett/Maildir/Unrecognized Spam/cur"
 
# Bail out if the folder is missing, so rm never runs in the wrong directory
cd "$SPAM_DIR" || exit 1
sa-learn --spam .;
rm *

Nasty, I know, but it did the job. All I had to do when I got spam that went unnoticed by SpamAssassin was drag it into my “Unrecognized Spam” folder and it would be learned and gone within 24 hours. Of course, I was also getting mail from the cron daemon complaining whenever there weren’t any emails to learn from or delete.

Improvements

This morning I had a little spare time, so I decided to improve on the script a bit:

#!/bin/bash
 
# Constants
SPAM_PATH="Maildir/.Unrecognized Spam/cur";
 
# Find all of the directories directly under /home/ (one array entry per line)
mapfile -t homeDirectories < <(find /home/ -maxdepth 1 -mindepth 1 -type d)
 
# Loop through the found directories and check for spam
for homeDirectory in "${homeDirectories[@]}"
do
    fullSpamPath="${homeDirectory}/${SPAM_PATH}";
 
    #Check if the spam directory exists under this home directory
    if [ -d  "${fullSpamPath}" ]; then
 
        # Check if there is mail under the spam directory
        if [ "$( ls -A "${fullSpamPath}" )" ]; then
            sa-learn --spam "$fullSpamPath";
            rm "${fullSpamPath}/"*;
        fi
    fi
 
done

Now I know I’m not a great shell scripter, but this is working pretty well. It basically scans all of the home directories and looks for the “Unrecognized Spam” directory under each one. If it finds it, it will test to make sure that there are emails in the folder, then learn them and remove them.
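If you want to try the scan-learn-delete loop without pointing it at real mail, here is a rough sketch of the same idea wrapped in a function. The function name, the SA_LEARN variable and the base-directory argument are all my own inventions for illustration, not part of the script above. It also only deletes messages when the learn command exits successfully, which assumes sa-learn signals failure with a non-zero exit status.

```shell
#!/bin/bash

# Hypothetical settings: both can be overridden from the environment.
# The default folder matches the script above.
SPAM_PATH="${SPAM_PATH:-Maildir/.Unrecognized Spam/cur}"
SA_LEARN="${SA_LEARN:-sa-learn}"

# Hypothetical helper: scan every directory directly under $1 for the
# spam folder, learn from it, then empty it.
learn_unrecognized_spam() {
    local baseDirectory="$1"
    local homeDirectory fullSpamPath

    # The trailing slash in the glob restricts matches to directories
    for homeDirectory in "${baseDirectory%/}"/*/; do
        fullSpamPath="${homeDirectory}${SPAM_PATH}"

        # Skip users who have no such folder
        [ -d "${fullSpamPath}" ] || continue

        # Skip empty folders so cron has nothing to complain about
        [ -n "$(ls -A "${fullSpamPath}")" ] || continue

        # Only delete the messages if learning succeeded
        if "${SA_LEARN}" --spam "${fullSpamPath}"; then
            rm -f -- "${fullSpamPath}"/*
        fi
    done
}
```

From the cron script you would call it as learn_unrecognized_spam /home. For a dry run, point it at a scratch directory and set SA_LEARN to something harmless like echo.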

Caveats

  • This isn’t going to scale all that well, since it walks every home directory one after another. I’m guessing it would be fine for 200 users or fewer, as it runs at night, but it would need some tweaking for anything more.
  • As it is, this requires that your mail be stored in the Maildir format. I know that sa-learn can work with mbox stores, but I’m not sure how you’d target it effectively.
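On the mbox point, sa-learn does have a --mbox option for reading mbox files, so one untested guess at an equivalent would be to learn from the folder's file and then truncate it. The function name and the SA_LEARN variable below are hypothetical, and this sketch does no locking at all, so it assumes nothing else is writing to the file while it runs.

```shell
#!/bin/bash

# Hypothetical setting: override to swap in another command for testing
SA_LEARN="${SA_LEARN:-sa-learn}"

# Hypothetical helper: learn every message in the mbox file given as $1,
# then empty the file.
learn_spam_mbox() {
    local mboxFile="$1"

    # -s is true only when the file exists and is not empty
    [ -s "${mboxFile}" ] || return 0

    # --mbox tells sa-learn to treat the target as an mbox file
    if "${SA_LEARN}" --spam --mbox "${mboxFile}"; then
        # Truncate rather than delete, so the folder itself survives
        : > "${mboxFile}"
    fi
}
```

Where the mbox file for a given folder actually lives depends on your mail server, so the path you pass in is something you would have to work out for your own setup.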