Date:      Wed, 10 Mar 2004 12:07:23 -0500
From:      Louis LeBlanc <freebsd@keyslapper.org>
To:        freebsd-questions@freebsd.org
Subject:   Re: formail recipe
Message-ID:  <20040310170723.GA90043@keyslapper.org>
In-Reply-To: <20040310162744.GA2081@asu.edu>
References:  <20040310162744.GA2081@asu.edu>

I know what you mean.  Mine's over 6700, and that's just since 1/1/04.
I have no doubt whatsoever there are a good number of people here that
have that beat several times over in the same period of time.

What I do to trim mine down is just take the oldest messages out.
Naturally, this can be tricky since the Date: header is often bogus,
but it's a place to start.  Come the end of the quarter, I'll be
blocking off this archive folder and starting a new one.  At that
time, I'll be rebuilding my SA bayes db to make sure I have a
'correct' base.  The next quarter's worth (which I'd like to delude
myself into believing will be smaller) will be fed in on a regular
basis to keep the bayes db on track.
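
For what it's worth, here is a rough sketch of the "drop the oldest"
step in Python rather than formail.  It is not what I actually run; the
function names and the path in the usage note are made up for
illustration, and it assumes that any message with a missing or
unparseable Date: header should be treated as the oldest (and so be
dropped first).  A forged but well-formed date will still fool it.

```python
import mailbox
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumption: messages with a bogus Date: header sort as the very oldest.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def header_date(value):
    """Parse a Date: header, falling back to EPOCH when it is bogus."""
    if not value:
        return EPOCH
    try:
        parsed = parsedate_to_datetime(str(value))
    except (TypeError, ValueError):
        # Different Python versions raise different errors on bad dates.
        return EPOCH
    if parsed.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def oldest_keys(headers, keep):
    """Given {key: Date header}, return the keys beyond the newest `keep`."""
    ordered = sorted(headers, key=lambda k: header_date(headers[k]))
    drop = max(len(ordered) - keep, 0)
    return ordered[:drop]


def trim_mbox(path, keep):
    """Remove the oldest messages from an mbox file, keeping `keep`."""
    box = mailbox.mbox(path)
    box.lock()
    try:
        headers = {key: msg["Date"] for key, msg in box.iteritems()}
        for key in oldest_keys(headers, keep):
            box.remove(key)
        box.flush()
    finally:
        box.unlock()
        box.close()
```

Something like trim_mbox("/path/to/spam-archive", 5000) would keep the
5000 newest messages.  Try it on a copy of the folder first.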

The reason I suggest removing the oldest messages is that spammers
seem to evolve their methods, and the bayes db will be most accurate
with a more complete picture of current practices, unaffected by
methods that are no longer in use.  Over the last month,
I've seen their evolving methods start sneaking in under the SA radar,
and have slowly but surely dropped my threshold down to 1.0 rather
than the default 5.0.  So far, no FNs, and the FPs have gone away (for
now).

There will be plenty of arguments against at least some of what I've
said here, but the great thing about all this is you get to decide
which approach you have more confidence in.  This is the approach I
have more confidence in - though I'm open to any suggestions for
tweaking it.

Good luck.

Lou

On 03/10/04 09:27 AM, David Bear sat at the `puter and typed:
> Hope I'm not imposing too much on this group.. but since this group
> has a collection of the best, brightest, and generous..
> 
> I wonder if someone might have a formail recipe that would randomly
> select N messages from a mailbox of M messages?  I have a spam corpus
> that's well over 10000 and need to trim it down.
> 
> 
> -- 
> David Bear
> phone: 	480-965-8257
> fax: 	480-965-9189
> College of Public Programs/ASU
> Wilson Hall 232
> Tempe, AZ 85287-0803
>  "Beware the IP portfolio, everyone will be suspect of trespassing"
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"
> 
> 

-- 
Louis LeBlanc               leblanc@keyslapper.org
Fully Funded Hobbyist, KeySlapper Extrordinaire :)
http://www.keyslapper.org                     Ô¿Ô¬

An age is called Dark not because the light fails to shine, but because
people refuse to see it.
    -- James Michener, "Space"


