I may have an affinity for technologies with semi-cryptic names. I spent a lot of time working with CRM114 for spam-fighting purposes without any result worth the investment.

Another technology with a nice name, Nilsimsa codes, has caught my attention. I remembered the name from the happy days when the Razor spam filter, for a while, removed all incoming spam.

A Nilsimsa code boils an arbitrary-length document down to 256 bits, like a hash would. But unlike a cryptographic hash, small textual differences only make relatively small changes to the code. This makes it suitable for building a local Razor-like database to which users can submit spam. For an academic paper I stumbled upon recently which outlines similar ideas, refer to this link.

For a description of the filter I've implemented, refer to the local user FAQ.

You can do interesting things with Nilsimsa codes:
- Scrubbing mail from user inboxes, i.e. moving already delivered mail out of users' inboxes and into the global spam repository. This may be somewhat controversial.
- Given a large set of Nilsimsa codes, you can do clustering and take action when a cluster is discovered. This works quite nicely.
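
To make the clustering idea a bit more concrete, here is a minimal sketch in Python. It is not the code of the filter described above. It assumes each Nilsimsa code is already available as a 64-character hex string (256 bits) and compares codes by counting differing bits. The function names, the greedy grouping strategy, and the `max_distance` and `min_size` values are all made up for illustration and would need tuning against real mail.

```python
def hamming_distance(code_a: str, code_b: str) -> int:
    """Count the bits that differ between two hex-encoded 256-bit codes."""
    return bin(int(code_a, 16) ^ int(code_b, 16)).count("1")


def cluster_codes(codes, max_distance=30):
    """Greedy clustering: put each code into the first cluster whose
    first member is within max_distance bits, else start a new cluster.

    max_distance=30 is an arbitrary illustrative threshold.
    """
    clusters = []
    for code in codes:
        for cluster in clusters:
            if hamming_distance(code, cluster[0]) <= max_distance:
                cluster.append(code)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([code])
    return clusters


def suspicious_clusters(codes, max_distance=30, min_size=5):
    """Return only the clusters big enough to be worth acting on."""
    return [c for c in cluster_codes(codes, max_distance) if len(c) >= min_size]
```

Comparing every new code against the first member of every cluster is fine for a small local database. With a really large set of codes you would probably want something smarter than this linear scan.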