Filtering With Procmail

Journal started Jun 8, 2002

I made a nifty program. It's a procmail filter, and procmail is cool. I set my computer up to use the fetchmail program to collect mail from my POP3 mail server and deliver that mail to my internal mail delivery server, sendmail, using the SMTP protocol.

Fetchmail --- http://www.tuxedo.org/~esr/fetchmail/ http://www.tuxedo.org/~esr/fetchmail/fetchmail-FAQ.html

POP3 Protocol --- http://www.ietf.org/rfc/rfc1939.txt

SMTP Protocol --- http://www.ietf.org/rfc/rfc0821.txt

RFCs are cool because they tell you what commands your Internet programs really send. I telnetted to my POP3 server and typed the following lines:

USER starling
PASS yeah_right_like_I'm_gonna_tell_you
LIST
RETR 1

I was greeted with a list of the messages I had on the server, and then the complete text of the first one flew by! All those commands are documented in the POP3 RFC.
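For anyone who would rather script that session than type it, here is a rough Python sketch using the standard library's poplib module. It is only an illustration of the same four commands, not anything fetchmail does internally; the function name fetch_first_message is made up for this example, and the host name in the comment is a placeholder.

```python
import poplib


def fetch_first_message(conn, user, password):
    """Issue USER, PASS, LIST and RETR 1 on an already-open POP3 connection.

    `conn` is expected to behave like a poplib.POP3 object.
    Returns (sizes, text): a dict of message number -> size in octets,
    and the raw bytes of message 1 (empty if the mailbox is empty).
    """
    conn.user(user)        # sends: USER <name>
    conn.pass_(password)   # sends: PASS <secret>

    # LIST answers with one "number size" pair per message.
    _resp, listing, _octets = conn.list()
    sizes = {}
    for entry in listing:
        number, size = entry.split()
        sizes[int(number)] = int(size)

    if not sizes:
        return sizes, b""

    # RETR 1 returns the message as a list of lines without line endings.
    _resp, lines, _octets = conn.retr(1)
    return sizes, b"\r\n".join(lines)


# Example use (placeholder host, needs a real server and account):
#   conn = poplib.POP3("pop.example.com")
#   sizes, text = fetch_first_message(conn, "starling", "not-telling")
#   conn.quit()
```

Passing the connection in, rather than opening it inside the function, keeps the sketch easy to try against a fake object before pointing it at a real mailbox.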

SO.... after sendmail gets my mail, then the fun begins. In my .forward file in my home directory I put "|/etc/smrsh/procmail" (I'd made a link to procmail in the smrsh directory as root earlier)

Now, when sendmail sees that as its forward, it pipes each mail message, one at a time, to that program. Procmail examines the message, compares it to the rules I have set, and either moves the mail itself or tells sendmail how to deal with it.

Actually, my procmail setup always moves the mail itself; sometimes it just moves it to the mail spool, where sendmail would have put it anyway.

Here's my procmail RC:
---
# Directory for storing procmail configuration and log files
# You can name this environment variable anything you like
# or, if you prefer, don't set it (but then don't refer to it!)
PMDIR=$HOME/Procmail

# Put ## before LOGFILE if you want no logging (not recommended)
LOGFILE=$PMDIR/log

# To insert a blank line between each message's log entry,
# uncomment next two lines (this is helpful for debugging)
LOG="
"

# Set to yes when debugging
VERBOSE=no

# Remove ## when debugging; set to no if you want minimal logging
##LOGABSTRACT=all

# Replace $HOME/Msgs with your mailbox directory
# Mutt and elm use $HOME/Mail
# Pine uses $HOME/mail
# Netscape Messenger uses $HOME/nsmail
# Some NNTP clients, such as slrn & nn, use $HOME/News
# Mailboxes in maildir format are often put in $HOME/Maildir
# NOTE: Upon reading the next line, Procmail does a chdir to $MAILDIR
# and relative paths are relative to $MAILDIR
MAILDIR=$HOME/Mail # Make sure this directory exists!
SPOOLDIR=/var/spool/mail/starling

INCLUDERC=$PMDIR/rc.spam

# Messages that fall through all your procmail recipes are delivered
# to your default INBOX (to find out yours, see step 2 above)
---

And correspondingly, rc.spam:

---
SPAMFILT=$HOME/code/spam/spamd
VERBOSE=yeah
SHELL=/bin/sh

# Get the X-IP headers

:0fhW
| $SPAMFILT

# KILL the spam

:0:
* ^X-IP: .*(\.pcnet.ro|\[12\.229\.143\.180\]|\.eu\.uu\.net)
$MAILDIR/mail/spam
---

But wait, you experienced procmail users might say, what is that strange program called $SPAMFILT? It's a really bad name, actually. What my program "spamd" does is act as a filter: it parses the headers of the message, extracts the IP address from each Received: line, and adds a header called "X-IP:" that lists the IP address and its primary DNS name.

That's what 'fhW' means: 'f' tells procmail to treat the action line (the one starting with '|') as a filter, 'h' feeds it only the headers, and 'W' waits for the program to finish without logging a complaint if it fails. I have yet to see an example that explains something as simple as how to write such a filter, but I managed to work it out through trial and error.

It's simple really. The headers are fed into standard input, and you output the modified headers via standard output. If you want to filter the body as well, change 'fhW' to 'fhbW'; to filter only the body, use 'fbW'.
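To make the stdin/stdout contract concrete, here is a minimal Python sketch of a header filter. It is not the spamd program, just the bare skeleton: read the header block, add one line, write everything back. The header name X-Filtered and the function names are invented for the example, and I am assuming the header block may or may not arrive with its trailing blank line, so the sketch copes with both.

```python
import sys


def add_header(header_block, new_line):
    """Return header_block with new_line added after the last real header.

    If the block ends with the blank line that separates headers from
    body, the new header goes in front of that blank line.
    """
    lines = header_block.splitlines()

    # The real headers stop at the first empty line, if there is one.
    end = len(lines)
    for i, line in enumerate(lines):
        if line == "":
            end = i
            break

    return "\n".join(lines[:end] + [new_line] + lines[end:]) + "\n"


def main():
    # The filter receives the headers on standard input and must
    # write the (modified) headers to standard output.
    sys.stdout.write(add_header(sys.stdin.read(), "X-Filtered: yes"))
```

To use it as a real filter you would call main() at the bottom of the script, make the script executable, and point the '|' action line at it.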

I wrote a C program using the POSIX regex package to locate anything matching "^Received:.*\\[(([0-9]{1,3}\\.){3}[0-9]{1,3})\\]". The first submatch is the IP address inside the [] brackets on the Received: line. Then it does a DNS lookup on that IP address using, erm... gethostbyaddr, and writes out the DNS name and the IP thusly:

X-IP: no.body.some.isp.com [33.44.555.66]

All the original headers, including that Received: line, are also sent to stdout unchanged.

The end result is that I can search on the IP and DNS name of any computer the mail has passed through on its path through the Internet. The next rule in my 'rc.spam' file moves any message whose X-IP header contains '.pcnet.ro', '[12.229.143.180]', or '.eu.uu.net' to the spam folder. I'm pretty good about spam, but these servers have been consistently and unflaggingly sending me harassing emails.
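The real program is in C and isn't listed here, so what follows is only a Python sketch of the same idea: match the bracketed IP on each Received: line, do a reverse lookup, and build the X-IP: lines. The function names are hypothetical, the "unknown" fallback for failed lookups is my assumption, and the sketch only looks at the first physical line of each Received: header (folded continuation lines are ignored).

```python
import re
import socket

# Same pattern as the C regex, written as a Python raw string.
# Group 1 is the dotted-quad inside the [] brackets.
RECEIVED_IP = re.compile(r"^Received:.*\[((?:[0-9]{1,3}\.){3}[0-9]{1,3})\]")


def reverse_dns(ip):
    """Return the primary DNS name for ip, or "unknown" if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # Covers failed lookups and malformed addresses alike.
        return "unknown"


def x_ip_lines(header_block, lookup=reverse_dns):
    """Build one "X-IP: name [ip]" line per Received: line with a bracketed IP."""
    out = []
    for line in header_block.splitlines():
        match = RECEIVED_IP.match(line)
        if match:
            ip = match.group(1)
            out.append("X-IP: %s [%s]" % (lookup(ip), ip))
    return out
```

The lookup function is a parameter so the matching can be tried out with a canned answer instead of waiting on a name server.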

Pacbell must be giving pcnet.ro lists of email addresses, either because of a mistaken listing service, sheer corporate greed, or malicious hacking. I've gotten no responses from any of the administrators for these ISPs, no action taken, and the spam keeps coming. I told Pacbell to block them, and slow down the terrible traffic a bit, but you guessed it. No response so far.

I'm not going to say Pacbell could block an entire ISP from all of their users, but it would be nice to have a shell account and a .procmailrc on their computer. Then I could actually refuse to allow the mail to be delivered. Regardless, my solution works quickly to tag the people I have identified as sending spam, and it relies on neither 'From:' lines nor the bogus DNS names that most spammers put in their Received: lines. I've seen this before:

Received: from aol.com ([213.154.159.96]) by ....

where 213.154.159.96 resolves to 'isdn57.bb0.pcnet.ro'. Quit picking on aol.com, spammers! Even if it deserves it. :-)

It does take time and a talk with my DNS name server to resolve an IP address, but this all happens in the background. Because I'm running fetchmail, the mail gets processed, some sent to the spam folder, some sent to my local mail spool. Then when I check 'gnus', it downloads the already filtered mail from the local mail spool. I don't have to wait for my POP3 server, or my name server! Life is good.

If I were to increase the robustness of this spam filter, I would probably set up my rc.spam to filter from an external list of known spammers, a personal 'blackhole list' if you will. That's it. My C program took a while, but it was one day's worth of work, and now I'll know exactly when to loudly complain to a spammer's IP because #1, their IP is already resolved in the headers, and #2, their message is already banished to the spam folder.
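As a rough idea of what that personal blackhole list might look like, here is a small Python sketch. None of this exists yet: the file format (one host suffix or bracketed IP per line, '#' for comments) and both function names are assumptions made up for the example.

```python
def parse_blackhole(text):
    """Parse blackhole-list text into a list of lowercase entries.

    Blank lines are skipped and anything after '#' is a comment.
    """
    entries = []
    for raw in text.splitlines():
        entry = raw.split("#", 1)[0].strip()
        if entry:
            entries.append(entry.lower())
    return entries


def is_blackholed(x_ip_value, entries):
    """True if any list entry appears anywhere in the X-IP header value."""
    value = x_ip_value.lower()
    return any(entry in value for entry in entries)
```

With a list holding '.pcnet.ro', '[12.229.143.180]' and '.eu.uu.net', this plain substring test would catch roughly what the hand-written regex in rc.spam catches, and adding a new spammer would mean editing a text file rather than a recipe.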

I'd list the code of my program, but I'd like to keep track of the people who know about it. If you want to know it, feel free to ask. I'll just make that quick feedback cgi app I've been meaning to, and you can let me know with that! Hold on a sec.

[2 days later...]

Oog. @.@ Well it was harder making the feedback form & processor than it was making the spam filter. But I did it. It needed to be done anyway. It's amazing how hard it is to keep all those characters escaped and unescaped properly!

Fill out the form I made and make sure to put a working email so I can respond to you. Warning, I'll let people know here if I receive too much email. *laughs*
