[From_old_site] Using NilSimsa codes for spam filtering

I may have an affinity for technologies with semi-cryptic names. I spent a lot of time working with CRM114 for spam fighting without any result worth the investment.

Another technology with a nice name, NilSimsa codes, has caught my attention. I remembered the name from the happy days when the Razor spam filter, for a while, removed all incoming spam.

A NilSimsa code boils an arbitrary-length document down to 256 bits, like a hash would. But unlike a cryptographic hash, small textual differences make only relatively small changes to the code. This makes it suitable for building a local Razor-like database to which users can submit spam.

For an academic paper I stumbled upon recently which outlines similar ideas, refer to this link. For a description of the filter I've implemented, refer to the local user FAQ.

You can do interesting things with NilSimsa codes:

  • Scrubbing mail from user inboxes, i.e. moving already-delivered mail out of users' inboxen and into the global spam repository. This may be somewhat controversial.
  • Given a large set of NilSimsa codes, you can do clustering and take action when a cluster is discovered. This works quite nicely.
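
To make the clustering idea a bit more concrete, here is a minimal sketch in Python. It is not the filter described in the FAQ. It assumes each code is stored as a 64-character hex string (256 bits), measures similarity as the number of differing bits, and uses a simple greedy grouping. The names `hamming_distance`, `cluster` and `flip_bits`, the threshold of 24 bits and the minimum cluster size of 3 are all made up for illustration, and the sample codes are synthetic bit patterns rather than codes computed from real mail.

```python
from typing import List


def hamming_distance(a: str, b: str) -> int:
    """Count the differing bits between two 256-bit codes given as hex strings."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def cluster(codes: List[str], max_distance: int) -> List[List[str]]:
    """Greedy single-pass grouping (an illustrative choice, not the only one).

    A code joins the first cluster whose first member is at most
    max_distance bits away; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for code in codes:
        for members in clusters:
            if hamming_distance(members[0], code) <= max_distance:
                members.append(code)
                break
        else:
            clusters.append([code])
    return clusters


def flip_bits(code: str, positions: List[int]) -> str:
    """Helper for building synthetic test data: flip the given bit positions."""
    value = int(code, 16)
    for position in positions:
        value ^= 1 << position
    return f"{value:064x}"


# Synthetic codes: two unrelated "messages" plus slightly altered variants.
base_a = "0f" * 32
base_b = "f0" * 32  # differs from base_a in every bit
codes = [
    base_a,
    flip_bits(base_a, [0, 9, 200]),
    base_b,
    flip_bits(base_b, [5]),
    flip_bits(base_a, [17, 18]),
]

clusters = cluster(codes, max_distance=24)  # threshold is an assumption

# "Take action" only on clusters that have grown large enough.
suspicious = [members for members in clusters if len(members) >= 3]
print(len(clusters), "clusters,", len(suspicious), "large enough to act on")
```

Both the distance threshold and the minimum cluster size would have to be tuned against real mail: too loose and unrelated messages end up grouped together, too tight and trivially altered spam slips through.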

