Date:      Fri, 25 May 2001 09:48:54 +1000
From:      Tony Landells <ahl@austclear.com.au>
To:        Dan Nelson <dnelson@emsphone.com>
Cc:        Odhiambo Washington <wash@wananchi.com>, FBSD-Q <freebsd-questions@FreeBSD.ORG>
Subject:   Re: Large mail file (3GB) 
Message-ID:  <200105242348.JAA05172@tungsten.austclear.com.au>
In-Reply-To: Message from Dan Nelson <dnelson@emsphone.com>  of "Thu, 24 May 2001 14:01:22 EST." <20010524140121.A28899@dan.emsphone.com> 


dnelson@emsphone.com said:
> In the last episode (May 24), Odhiambo Washington said:
>> I have a file that is 3GB in mbox format. I need to split it into 3
>> parts then access it using elm, or even mutt. Is there a utility in
>> FreeBSD that can be used to truncate a file into some predetermined
>> parts?
> Do you mean a single message, or multiple messages in a single large
> mailbox?  Just load mutt up on the 3gb mailbox.  As long as you have
> less than 65000 messages mutt will be able to read it fine.

> If you want to split it up, just tag the first 1/3 of the messages,
> save them to "mbox1", tag the next third, save them as "mbox2", etc.

I think the whole point is that he doesn't have a means to tag them
and save them because he can't find a program to load them.

As I understand it, what he wants to do is find a utility that will
split the file without loading the whole thing.

>> Sometimes in Winblows I use a small utility called MEGAFLI, but there
>> are others also. I cannot access this file using mutt because doing
>> that does bring my system down to its knees... a Pentium III 500MHz
>> with 128MB RAM...

> Hmm.  Mutt shouldn't have any problems, unless you really have a huge
> number of messages.  All mutt keeps in memory is a few headers per
> message.  For example, my archive of the FreeBSD-Questions for this
> year has 24K messages and is 86MB in size, and mutt requires 23MB of
> ram to load it up in unsorted mode. 

So what you're saying is that mutt wants about 25% of the mailbox size
in RAM?  Well, let's see, he has a 3 GB mailbox, so that would be about
750 MB of RAM required.  He has 128 MB of physical RAM, say about double
that in swap for a total of 384 MB of "memory"...  Looks like he's still
way short.
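
To make the arithmetic explicit, here is a rough sketch in shell.  The
function name is made up for illustration, and the big assumption is
that mutt's memory use scales linearly with mailbox size, which is only
a guess (it really depends on the number of messages, not their size):

```shell
# estimate_ram_mb MAILBOX_MB SAMPLE_RAM_MB SAMPLE_MAILBOX_MB
# Hypothetical helper: scale the RAM seen for a sample mailbox up to a
# bigger mailbox, assuming (as a guess) that usage grows linearly.
estimate_ram_mb() {
    echo $(( $1 * $2 / $3 ))
}

# Sample figures from the thread: 86 MB of mail needed 23 MB of RAM.
needed=$(estimate_ram_mb 3072 23 86)

# 128 MB of physical RAM plus an assumed 256 MB of swap.
available=$(( 128 + 256 ))

echo "estimated need: ${needed} MB, available: ${available} MB"
if [ "$needed" -gt "$available" ]; then
    echo "way short"
fi
```

Using the exact 23/86 ratio instead of a round 25% gives a slightly
higher figure, but the conclusion is the same.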

The problem with trying to split the file is that you need something
that understands mbox format, otherwise you'll get a message that's half
in one file and half in the other, which will be a mess.

He could try "split" with a pattern, since all the messages start with
a line like:

	From user Fri May 11 16:25:23 2001

but that will still run afoul of, for example, messages that have
other messages MIME-encapsulated.  And, of course, he'll then wind up
with one file per message and need to glue them back together (though
that's much easier).
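
As a rough sketch of the pattern idea, the following uses awk rather
than split(1), so that it can start a new output file every so many
messages instead of one file per message.  The function name, the
chunk size and the file names are all made up for illustration, and it
has exactly the weakness described above: any line that begins with
"From " is treated as the start of a message, whether it really is one
or not.

```shell
# split_mbox FILE PER_CHUNK PREFIX
# Hypothetical helper: stream FILE once and start a new output file
# (PREFIX001, PREFIX002, ...) every PER_CHUNK messages.  A "message"
# here is simply anything starting at a line that begins with "From ".
split_mbox() {
    awk -v per="$2" -v prefix="$3" '
        /^From / {
            # Time to roll over to the next chunk file?
            if (count % per == 0) {
                if (out != "") close(out)
                out = sprintf("%s%03d", prefix, ++chunk)
            }
            count++
        }
        # Anything before the first "From " line is dropped.
        out != "" { print > out }
    ' "$1"
}
```

Something like `split_mbox bigmbox 20000 part.` would then leave
part.001, part.002 and so on in the current directory.  Since awk only
holds one line at a time, memory shouldn't be a problem, but you do
need disk space for a second copy of the mail.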

Another option would be to install nmh from the ports collection, which
is a command-line mail package.  The program to move messages from the
maildrop to the "inbox" is "inc", and it's basically splitting the
maildrop file into individual files (similar to my "split" suggestion, but
it has a complete understanding of mail messages and should therefore
do a much better job).  In theory it should be able to do this without
much RAM, assuming you have the space on disk to have (temporarily)
two copies of the mail.

Once you have the mail in your nmh inbox, you can get an overview with
"scan" (which will show you the headers), select subsets of the messages
with "pick", repack them into mbox format files with "packf", ...

For example, if I've been away from my mail for a while and I know there's
a lot there (which takes about two days), I can do the following to have
a quick cleanup of freebsd-questions:

	# Incorporate my maildrop into my inbox
	$ inc
	Incorporating new mail into inbox...

	  1+ 05/24 Heather Hanneman   ASX TECHNICAL ANNOUCEMENT 19/01<<--Mark=_2001524
	[ complete listing deleted ]
	# Pick everything with a "To:" or "cc:" line with freebsd-questions,
	# and put it in a named set (a sequence) called "questions"
	$ pick -to freebsd-questions -o -cc freebsd-questions -seq questions
	1076 hits
	# Now do a summary listing of that list through less.
	# If there's anything interesting, I can do a shell escape and
	# "show 45" to look at message 45, for example.
	$ scan questions | less
	# Okay, I've done what I needed with some of the messages.
	# The rest can go to bit-bucket heaven.
	$ rmm questions
	# Now put the rest back into my maildrop so I can load them in
	# something else.
	$ packf -mbox -file /var/mail/ahl
	# And finally, clean up the directory that NMH created.
	$ rm -rf ~/Mail

Note that if I wanted, I could also have created another sequence of
only the stuff I was interested in and just packed that.  The sequence
is really just a convenience--you can refer to things by message number
as well.
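
For the original problem of cutting one big mailbox into three, the
message-number approach might look something like the sketch below.
The helper only prints the packf commands rather than running them, so
they can be checked first.  The function name and the mbox1, mbox2, ...
file names are made up, and it assumes the messages in the folder are
numbered consecutively from 1 (which should be the case right after
incorporating into an empty folder):

```shell
# print_packf_cmds TOTAL PARTS
# Hypothetical helper: print (not run) one packf command per part,
# dividing messages 1..TOTAL into PARTS roughly equal ranges.
print_packf_cmds() {
    total=$1 parts=$2 i=1 first=1
    while [ "$i" -le "$parts" ]; do
        # Last message number of part i (integer division).
        last=$(( total * i / parts ))
        echo "packf -mbox -file mbox$i $first-$last"
        first=$(( last + 1 ))
        i=$(( i + 1 ))
    done
}

# Using the 1076 messages from the example above:
print_packf_cmds 1076 3
```

If there are fewer messages than parts, some of the ranges come out
empty, so it's worth looking over the output before running any of it.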

It's kind of overkill for this occasion, but it's the only "ready made"
thing I can think of that has the tools...

Tony
-- 
Tony Landells					<ahl@austclear.com.au>
Senior Network Engineer				Ph:  +61 3 9677 9319
Australian Clearing Services Pty Ltd		Fax: +61 3 9677 9355
Level 4, Rialto North Tower
525 Collins Street
Melbourne VIC 3000
Australia






