zachary.com

personal pages

False Positive Spam Identification with Python

My email system typically sees hundreds of spam messages per day. SpamAssassin catches most of these, and Apple Mail generally gets the rest. However, I do worry about false positives, even though these days they seem to be very rare. I've set my SpamAssassin threshold relatively low, and procmail moves all spam with scores from 20 down to my threshold into a junk folder. To make sure I don't miss any false positives, I run the Python script below to scan the junk folder for potentially interesting emails.

What's nice about this script is how quick it is to use: I just type junk at the command prompt and see a list of all of the messages with scores at or below 10 (by default), nicely lined up for a very quick scan. If something looks like it might not be junk, I can type junk -p 5, for example, to have a quick look at message number five. If it's not junk, I'll use either the command line or Apple Mail to move the message out of the junk folder. The output ignores any human-readable From name (which is usually forged anyway) and just prints the raw email address. Here's an example:

$ junk
Mail message with spam scores below 10

Msg SL From                              Subject
--- -- --------------------------------- -------
  3  8                PCVZJ@djkjkkkd.com  ****Cheap Cialis****
  0  9            zjhsiyfsx@cashette.com what will your kids do when you die
  1  9        miachung_rz@worldonline.de =?ISO-8859-1?B?WW91ciBwYXltZW50cyBjYW4gY
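
The raw-address column above comes from splitting the From header into its display name and its address and keeping only the address. Here's a minimal sketch of that step using the standard library's email.utils.parseaddr; the helper name raw_address and the "Your Bank" display name are made up for illustration.

```python
from email.utils import parseaddr

def raw_address(from_header):
    """Return only the address part of a From header, dropping the display name."""
    _display_name, address = parseaddr(from_header)
    return address

# Hypothetical headers: one with a (forgeable) display name, one without.
print(raw_address('"Your Bank" <zjhsiyfsx@cashette.com>'))
print(raw_address('PCVZJ@djkjkkkd.com'))
```

Both calls print just the address, so a friendly-looking forged name never shows up in the listing.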

The script is set up to use Maildirs, but it should be easy to use other mailbox formats as well. Perhaps someday I can add logic to see if the Received headers make sense with respect to the From address, but for now this works well enough.
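
As a rough illustration of the "other mailbox formats" point: Python's mailbox module gives Maildir and mbox folders the same iteration interface, so only the line that opens the folder would need to change. This is a sketch, not part of the script; the function name open_junk_mailbox and the fmt parameter are my own invention.

```python
import mailbox

def open_junk_mailbox(path, fmt="maildir"):
    """Open a junk folder as Maildir or mbox; both iterate over messages the same way."""
    if fmt == "maildir":
        # create=False: fail loudly instead of creating an empty folder
        return mailbox.Maildir(path, create=False)
    if fmt == "mbox":
        return mailbox.mbox(path, create=False)
    raise ValueError("unsupported mailbox format: %r" % fmt)
```

Whichever format is opened, a loop such as "for msg in open_junk_mailbox(path, 'mbox')" yields message objects with the usual header lookups, so the rest of the scan would stay the same.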

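#!/usr/bin/env python3

import getopt
import mailbox
import sys
from email.utils import parseaddr

#
# settings you will need to change
#
maildir = '/home/username/.maildir/.junkmail'
spamtag = '***SPAM***'
defaultSpamLevel = 10

###

spamtaglen = len(spamtag)

def buildList(tlevel):
    """Collect (number, level, address, subject, message) for every
    message whose spam level is at or below tlevel."""
    tlist = []
    i = 0
    # create=False: don't silently create the folder if the path is wrong
    jf = mailbox.Maildir(maildir, create=False)
    for msg in jf:
        # X-Spam-Level holds one character per point; no header means 0
        l = 0
        if 'x-spam-level' in msg:
            l = len(str(msg['x-spam-level']))
        if l <= tlevel:
            # keep only the raw address, not the display name
            (tmp, f) = parseaddr(str(msg.get('from', '')))
            s = str(msg.get('subject', ''))
            if s.startswith(spamtag):
                s = s[spamtaglen+1:]
            tlist.append((i, l, f, s, msg))
            i += 1
    return tlist

# NOTE: printList() is called from main() but is missing from the listing
# as published. This version is reconstructed from the sample output above;
# the sort by score and the 40-character subject limit are assumptions.
def printList(tlist, tlevel):
    print("Mail message with spam scores below", tlevel)
    print()
    print("%-3s %-2s %-33s %s" % ("Msg", "SL", "From", "Subject"))
    print("%s %s %s %s" % ("-" * 3, "-" * 2, "-" * 33, "-" * 7))
    for (i, l, f, s, msg) in sorted(tlist, key=lambda t: t[1]):
        print("%3d %2d %33s %s" % (i, l, f, s[:40]))

def printMessage(tlist, i):
    (i, l, f, s, msg) = tlist[i]
    print(msg)

def usage():
    print("junk [-s | --spam LEVEL] [-p | --print MSGNUM]")

def main():
    try:
        # the trailing '=' marks long options that take an argument
        opts, args = getopt.getopt(sys.argv[1:],
                                   "s:p:h",
                                   ["spam=", "print=", "help"])
    except getopt.GetoptError:
        usage()
        sys.exit(2)
    printMsg = None
    tlevel = defaultSpamLevel
    for o, v in opts:
        if o in ("-h", "--help"):
            usage()
            sys.exit()
        if o in ("-s", "--spam"):
            print("option: spam level=", v)
            tlevel = int(v)
        if o in ("-p", "--print"):
            printMsg = int(v)

    l = buildList(tlevel)
    if printMsg is not None:
        # use the parsed option value, not a fixed sys.argv position
        printMessage(l, printMsg)
    else:
        printList(l, tlevel)

if __name__ == "__main__":
    main()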
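
For the Received-header idea mentioned earlier, here's one possible sketch, not something the script does today. It assumes a very simple rule: the message looks plausible if the domain of the From address matches the host named in at least one "Received: from ..." hop. The function name from_domain_in_received is hypothetical, and since Received lines added by other people's servers can be forged too, and plenty of legitimate mail is relayed through third parties, the result is a hint at best.

```python
import email
import re
from email.utils import parseaddr

def from_domain_in_received(msg):
    """Return True if the From domain matches the host of any 'Received: from' hop."""
    _name, address = parseaddr(str(msg.get("from", "")))
    if "@" not in address:
        return False
    domain = address.rpartition("@")[2].lower()
    for hop in msg.get_all("received", []):
        # Each hop usually starts with "from <host> ..."; the host is self-reported.
        m = re.match(r"\s*from\s+(\S+)", str(hop), re.IGNORECASE)
        if m:
            host = m.group(1).lower().rstrip(".")
            if host == domain or host.endswith("." + domain):
                return True
    return False

# Made-up message: the relay host and the From domain do not match.
raw = ("Received: from dsl-host.example.org (unknown [192.0.2.7])\n"
       "\tby mx.example.net; Mon, 23 May 2005 10:00:00 -0700\n"
       "From: zjhsiyfsx@cashette.com\n"
       "Subject: test\n\nbody\n")
print(from_domain_in_received(email.message_from_string(raw)))
```

A result like this could be shown as an extra column in the listing rather than used to filter, so that nothing is hidden on the strength of a weak heuristic.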

Categories: technology python

This page last modified Tuesday 24 May, 2005
All content Copyright 2003-2005, David Z Creemer