[From_old_site] Greycasting: a distributed heavy-duty greylisting implementation, or: Nothing beats /bin/rm for pruning a database

At work, I maintain a mail filter serving a 5-digit number of users. Naturally, this attracts a lot of spam. To reduce the impact, greylisting is used.
A few simplifications have been made: only the domain of the email sender is recorded, to cope with systems that change the envelope sender for each retransmission. Likewise, only the /24 IP address prefix is recorded (in order to be friendly to GMail/HotMail users).

Historically, there have been huge activity peaks with bursts of up to 20 million emails a day (1-minute samples). That is over 200 mails/sec and was not healthy for the performance of the cluster – due to overload, legitimate mails were tempfailed, leading to long delays in email delivery.

To avoid any single point of failure, it is desirable to avoid a central greylisting database, so a distributed technique is needed. Rather than using an off-the-shelf product like MySQL's cluster database, a custom scheme named greycasting is used. The filter nodes run small daemons – greycastd – which receive greylist updates from all cluster nodes, including themselves. Each daemon updates a local database which is consulted by the mail filter daemons (mimedefang) running on the node. When the filter decides to greylist an incoming email, it records the info in a log file which is sent to all greycast daemons every few seconds.

The current implementation uses UDP broadcasts. This is dirty but simple and working. Multicasting would be a natural alternative, but I've seen strange problems with IGMP snooping messing with the distribution of multicast packets, so let's stick to the low-tech solution. Naturally, packets may get lost, or greycast daemons may be temporarily down or freshly installed. This may lead to mails getting greylisted an extra time, which is considered an acceptable price for keeping the design simple.
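
For concreteness, here is a minimal sketch of the two halves of the scheme in Python – the port, broadcast address and key format are made-up placeholders, not the production values:

```python
import socket

GREYCAST_PORT = 4711             # hypothetical port
BROADCAST_ADDR = "192.0.2.255"   # hypothetical subnet broadcast address

def greylist_key(sender, client_ip):
    """Reduce envelope sender + client IP to the recorded key:
    only the sender's domain and the /24 prefix of the IP."""
    domain = sender.rsplit("@", 1)[-1].lower()
    prefix = ".".join(client_ip.split(".")[:3])   # 10.1.2.3 -> 10.1.2
    return f"{domain}/{prefix}"

def broadcast_updates(keys):
    """Filter node side: ship a batch of fresh greylist keys to all nodes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.sendto("\n".join(keys).encode(), (BROADCAST_ADDR, GREYCAST_PORT))

def receive_loop(apply_update):
    """greycastd side: apply every update heard on the wire, including
    our own. A lost packet just means an extra greylisting round."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", GREYCAST_PORT))
    while True:
        payload, _addr = sock.recvfrom(65535)
        for key in payload.decode().splitlines():
            apply_update(key)
```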

This leaves the question of which local database to use. Two alternatives, MySQL and SQLite, have been tested. The latter is a very nice, lightweight standalone database with impressive performance figures; it uses one file system file per database. The problem with both is that most of the records inserted have to be deleted again in order not to accumulate zillions of entries over time. That may be a couple of hundred records per second and may coincide with a similar need for inserting new records. Moreover, SQLite locks the database when it deletes records. The consequence was that the greycast daemon saturated the disk and starved the defang slaves.

To solve these problems, 3 SQLite databases are now used:
  • junior0
  • junior1
  • tenure

The junior databases are used to store greylisting records which will most likely have to be deleted. The tenure database is used to store records for emails which have actually been retransmitted. Mails caught as spam/virus are not entered into the tenure database.

The mail filter has to consult all 3 databases to make sure that an email has not been seen before. This is not a performance problem in practice. When greycastd inserts a greylisting record, it is done in junior0/junior1 depending on whether the day of the year is odd/even. When the rare event happens that an MTA retransmits, the greylist record is added to the tenure database; there's no reason to waste time removing it from the junior database. Every day at 23:00, the junior database not written to today is removed (using /bin/rm) and an empty one is created. As the subtitle of this document hints, this is an unbeatably fast way to clean up a database…
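
The core of the scheme can be sketched like this (Python with the standard sqlite3 module; paths and schema are illustrative assumptions, not the production code):

```python
import datetime
import os
import sqlite3

DB_DIR = "/var/lib/greycast"   # hypothetical location of the 3 databases
SCHEMA = "CREATE TABLE IF NOT EXISTS grey (key TEXT PRIMARY KEY)"

def junior_path(day=None):
    """junior0 on odd days of the year, junior1 on even days."""
    day = day or datetime.date.today()
    name = "junior0" if day.timetuple().tm_yday % 2 else "junior1"
    return os.path.join(DB_DIR, name)

def seen_before(key):
    """The filter consults all 3 databases before letting a mail pass."""
    for name in ("junior0", "junior1", "tenure"):
        con = sqlite3.connect(os.path.join(DB_DIR, name))
        con.execute(SCHEMA)
        hit = con.execute("SELECT 1 FROM grey WHERE key = ?", (key,)).fetchone()
        con.close()
        if hit:
            return True
    return False

def record(key, retransmission):
    """First sighting goes to today's junior db; a retransmission is
    promoted to tenure (and left to rot in the junior db)."""
    path = os.path.join(DB_DIR, "tenure") if retransmission else junior_path()
    con = sqlite3.connect(path)
    con.execute(SCHEMA)
    con.execute("INSERT OR IGNORE INTO grey VALUES (?)", (key,))
    con.commit()
    con.close()

def nightly_rotation():
    """At 23:00: delete the junior db not written to today and recreate
    it empty. This is the /bin/rm of the subtitle."""
    today = junior_path()
    stale = os.path.join(
        DB_DIR, "junior1" if today.endswith("junior0") else "junior0")
    if os.path.exists(stale):
        os.remove(stale)
    con = sqlite3.connect(stale)   # connecting recreates the empty file
    con.execute(SCHEMA)
    con.close()
```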

The 3-database implementation is somewhat inspired by LISP garbage collection techniques like stop-and-copy GC (copy the small percentage of objects that are alive and do away with the rest) and generation-scavenging techniques (use separate heaps for short- and long-lived objects).

The limitation of greycasting is that all nodes perform the same writes, meaning that buying more nodes won't help you if you're saturating your disks with writes. Read activity, on the other hand, will scale nicely with the number of nodes.

Naturally, the tenure database will grow over time. When it gets too big, it can be tossed away, scrapping the greylist state. If this is deemed unacceptable, it can be pruned the traditional way.
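
Traditional pruning would need a timestamp to select victims by age – the last_seen column below is an assumption about the schema, not something the sketch above carries:

```python
import sqlite3
import time

def prune_tenure(path, max_age_days=180):
    """Delete tenure records older than max_age_days (assumes the table
    has a last_seen column holding Unix timestamps)."""
    cutoff = time.time() - max_age_days * 86400
    con = sqlite3.connect(path)
    con.execute("DELETE FROM grey WHERE last_seen < ?", (cutoff,))
    con.commit()
    con.execute("VACUUM")   # give the freed pages back to the file system
    con.close()
```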

Greycasting has been in service for over two years. It works nicely and can sustain very rough treatment by the spammers/virus writers.

Update 2007-12-22: a month or two ago, we again encountered problems with heavy spammer activity saturating the disks on the cluster nodes. The fix chosen was to move the junior databases to RAM disks. This effectively removes all the disk read activity (a copy of the tenure database will reside in the OS buffer cache) and most of the write activity. As we almost never reboot the cluster nodes, we haven't yet bothered with saving/restoring the databases across reboots. With this fix in place, we've handled greylisting activity exceeding 500 mails/sec for extended periods with no problems other than somewhat elevated firewall CPU load. Handling this with ordinary disks would require enterprise-grade hardware.

Update 2008-09-08: during the past week, we've seen greylisting activity caused by malware runs scaling up to 40-50 million mails per day and lasting for up to an hour. Log file inspection reveals that about 0.5 percent gets through GL and is subsequently caught by the virus filters.

Update 2008-09-09 <shudder>Recent articles in the Danish press indicate that the largest Danish ISP handles only 6-7 times more junk mail per month than our filter</shudder>. Like most mail systems outside the academic world, they do not employ greylisting; I suppose they have to adopt a kiwish approach.

Update 2008-09-10 We may look into doing a little more intelligent greylisting by using some sort of reputation system, meaning that we won't greylist mails from servers that are not major spam sources and do proper retransmissions anyway. One idea would be to use an Abaca-like approach (spam is mostly sent to users getting a high percentage of spam – yes, those guys seem to be right); another is to get reputation data from our conventional spam filters.

Update 2008-10-27 I have a suspicion that RAM disk based greylisting is a sort of cloud busting. The reason is that the spammers seem to adapt the amount of spam sent to the capacity of the mail server they're targeting. After all, hitting a low-capacity mail server with hundreds of mails per second would be counter-productive. So I don't think we would get millions of spam mails if we disabled greylisting.

After adopting the ideas of the previous update, we're now mostly greylisting mails from sources sensitive to GL and passing mails from legitimate sources w/o delay.

Update 2009-05-19 Using RAM disks may be an excellent way of speeding up SQL tables, but why use them in the first place when all we really need is a shared associative array local to the node? So recently, I've replaced the two junior databases with a stand-alone instance of the excellent freeware product Memcached. Memcached is sort of a distributed RAM-based hash table, used by heavy-traffic sites like Slashdot (/.). It automatically reuses entries when they become too old. I've allocated a meager 512 megs for Memcached to use. If greylisting activity should explode, entries will simply be expired in less than a day rather than filling up the RAM.
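
The junior lookups then reduce to get/set with a TTL. A sketch using the pymemcache client library (one possible client among several; names are illustrative):

```python
from pymemcache.client.base import Client

mc = Client(("127.0.0.1", 11211))   # the node-local memcached instance

JUNIOR_TTL = 86400   # a day; older first sightings may be greylisted again

def junior_seen(key):
    """Has this key been greylisted within the last day?"""
    return mc.get(key) is not None

def junior_record(key):
    """Record a first sighting. Memcached expires or evicts the entry
    by itself, so no nightly /bin/rm is needed for the junior data."""
    mc.set(key, b"1", expire=JUNIOR_TTL)
```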
