mbox splitter
Submitted by msameer on Mon, 23/07/2007 - 12:56am

I can't run sa-learn on my spam folder of 15,000+ messages; it crashes because of some hardware problems.
I thought that splitting the mailbox into smaller files would let me feed it to sa-learn.
I'm not sure whether something similar already exists, but I wrote my own anyway ;-)

The only problem is that it consumes a lot of CPU and RAM. It was killed or crashed multiple times, but it worked and allowed me to feed my spam mailbox to SpamAssassin!

Here it is in case someone needs it: split_mailbox.py. It needs Python 2.5.
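
The script itself is not reproduced here, so the following is not its code. It is only a minimal sketch of how an mbox file could be split without holding it in memory, assuming a modern Python 3 rather than 2.5 and assuming every message begins with a line starting with "From " (the usual mbox convention, where body lines that look like that are escaped as ">From "). The names split_mbox, per_file and the part-NNNN.mbox output files are hypothetical.

```python
import os


def split_mbox(src_path, out_dir, per_file=1000):
    """Split an mbox file into smaller mbox files.

    Each output file holds at most per_file messages. The input is read
    one line at a time, so memory use does not grow with mailbox size.
    Returns the list of output paths in the order they were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    count = 0  # messages written to the current output file

    with open(src_path, "rb") as src:
        for line in src:
            # Assumption: a line starting with "From " opens a new message.
            if line.startswith(b"From "):
                if out is None or count >= per_file:
                    if out is not None:
                        out.close()
                    name = os.path.join(out_dir, "part-%04d.mbox" % len(written))
                    out = open(name, "wb")
                    written.append(name)
                    count = 0
                count += 1
            # Anything before the first "From " line is skipped.
            if out is not None:
                out.write(line)

    if out is not None:
        out.close()
    return written
```

Each resulting file could then be passed to sa-learn on its own, for example with `sa-learn --spam --mbox part-0000.mbox`.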


Submitted by Amr Gharbeia (not verified) on Mon, 23/07/2007 - 1:23am.

Would Maildir help? I need to learn SA too.

Submitted by msameer on Mon, 23/07/2007 - 1:33am.

Help with what?

Submitted by Josh Triplett (not verified) on Mon, 23/07/2007 - 4:39am.

A similar tool exists in Git: git-mailsplit.

Submitted by Delian Krustev (not verified) on Wed, 15/08/2007 - 4:01pm.

Maildir is the message storage format introduced by qmail. It looks like this:

Maildir
|-cur
|-new
|-tmp

Here cur, new and tmp are directories. Each message is stored in a separate file under one of these directories (new for unread messages, cur for read messages, tmp for temporary files during delivery).

You could look for the mbox2maildir utility (www.qmail.org should help).
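
As a rough illustration of the layout described above, here is a sketch that copies an mbox into a Maildir using Python's standard mailbox module. It is not the mbox2maildir utility, and it assumes Python 3; the function name mbox_to_maildir is hypothetical. Since the source messages are not Maildir messages, the module places them under new.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir.

    Returns the number of messages copied.
    """
    # create=False: fail instead of silently creating an empty mbox.
    src = mailbox.mbox(mbox_path, create=False)
    # create=True: make the cur, new and tmp directories if they are missing.
    dest = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        # Iteration yields one parsed message at a time.
        for message in src:
            dest.add(message)  # written to its own file in the Maildir
            copied += 1
    finally:
        dest.close()
        src.close()
    return copied
```

With one file per message, a tool can then work through the folder in small batches instead of one huge file.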

Submitted by Christoph (not verified) on Wed, 12/09/2007 - 12:08pm.

The memory consumption of this method is so high that larger mailboxes cannot be processed at all. I tried to split an mbox file with 125,000 messages on a machine with 1024 MB of RAM, but I always got a MemoryError.

Can anyone give a better solution to this?

Submitted by msameer on Wed, 12/09/2007 - 2:38pm.

Didn't mbox2maildir work for you?
