Date:      Wed, 21 Feb 2001 10:37:03 -0800
From:      Peter Wemm <peter@netplex.com.au>
To:        Gordon Tetlow <gordont@bluemtn.net>
Cc:        scanner@jurai.net, Dan Phoenix <dphoenix@bravenet.com>, freebsd-hackers@FreeBSD.ORG
Subject:   Re: qmail IO--qmail vs postfix competition 
Message-ID:  <200102211837.f1LIb3f26667@mobile.wemm.org>
In-Reply-To: <Pine.BSF.4.31.0102201845430.18356-100000@sdmail0.sd.bmarts.com> 

Gordon Tetlow wrote:
> On Tue, 20 Feb 2001 scanner@jurai.net wrote:
> 
> > 	Aha. That explains it. You use HW RAID. I wondered why you were
> > only doing 4 million mails for *30* boxes. Dan is doing 500K on a
> > completely idle box (CPU/RAM/I/O-wise) with vinum, Postfix, and RAID-0.
> > Have you seen Brad Knowles' papers on vinum vs. HW RAID? They're, erm,
> > enlightening to say the least :) I'd be happy to dig up the URL if you are
> > interested. I personally will be using vinum from now on. The performance
> > is very impressive.
> 
> Well, as I said, these boxes are rather bored. I don't think the load
> reaches above 0.05. Most of the time is spent trying to negotiate mail
> delivery with destination hosts. I don't think the mailers are I/O
> bound, but to tell you the truth I haven't really looked. Once
> the mailers are set up we treat them as black boxes. They just work.
> 
> Also, the 500K number, is that per day? The 4 million was in 4 hours, not
> a day.

Another bored box:
mx1.freebsd.org$ grep 'status=sent' /var/log/mail | wc -l
  331877

It has been 8 hours since the last log rollover.  Unfortunately, it spends most
of its time waiting for something to do and dealing with broken mail servers.
It delivers most of its mail within a few seconds.  We see it peak at delivering
several hundred envelopes per second shortly after it gets a large mailing
list to digest.  Here's a quick histogram of what those 8 hours look like:

mx1.freebsd.org$ sh hist.sh
zero       1292    1292 0.36577 0.36577
one        4983    6275 1.41071 1.77648
two        7680   13955 2.17424 3.95072
three      10741  24696 3.04082 6.99154
five       30853  55549 8.73461 15.7261
seven      37626  93175 10.6521 26.3782
ten        48169 141344 13.6368 40.0151
fifteen    66877 208221 18.9332 58.9482
twenty     44244 252465 12.5257 71.4739
thirty     48059 300524 13.6057 85.0796
fourtyfive 23626 324150 6.68862 91.7682
sixty       6902 331052 1.95398 93.7222
ninety      7082 338134 2.00494 95.7271
twomin      2336 340470 0.66133 96.3884
threemin    1521 341991 0.43060 96.819
rest       11236 353227 3.18096 100
total 353227

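First field: the delay bucket (upper bound, in seconds; twomin and threemin
are 2 and 3 minutes, rest is everything longer).  Second: number of deliveries
in that interval.  Third: running total of deliveries.  Fourth: percentage of
the total that this interval represents.  Last: accumulated percentage.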
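hist.sh itself isn't included in this post. Purely as a hypothetical sketch, and not the script that produced the output above, a similar five-column table can be built by pulling the delay= value out of Postfix's status=sent log lines and binning it. Several things here are assumptions: the log-line format, counting only status=sent lines, and treating each label as an inclusive upper bound in seconds. delay_hist is an illustrative name.

```sh
# Hypothetical delivery-delay histogram in the style of hist.sh (the real
# script is not shown). Assumes Postfix log lines like
#   ... postfix/smtp[pid]: QUEUEID: to=<rcpt>, relay=..., delay=N, status=sent (...)
# and treats each bucket label as an inclusive upper bound in seconds.
delay_hist() {
    awk '
    BEGIN {
        # Bucket labels and their upper bounds (seconds); "rest" catches the remainder.
        n = split("zero one two three five seven ten fifteen twenty thirty fourtyfive sixty ninety twomin threemin", label, " ")
        split("0 1 2 3 5 7 10 15 20 30 45 60 90 120 180", bound, " ")
        label[n + 1] = "rest"
    }
    /status=sent/ && match($0, /delay=[0-9.]+/) {
        d = substr($0, RSTART + 6, RLENGTH - 6) + 0   # numeric value after "delay="
        for (i = 1; i <= n; i++)
            if (d <= bound[i] + 0) break              # first bucket covering d
        count[i]++                                    # i == n + 1 means "rest"
        total++
    }
    END {
        if (total == 0) { print "total 0"; exit }
        cum = 0
        for (i = 1; i <= n + 1; i++) {
            cum += count[i]
            # label, count, running count, percent, running percent
            printf "%-10s %6d %7d %.5g %.6g\n", label[i], count[i], cum,
                100 * count[i] / total, 100 * cum / total
        }
        print "total " total
    }'
}
```

Usage, e.g.: delay_hist < /var/log/mail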

This is a 24-hour run for yesterday (1am -> 1am):
> sh hist.sh
zero        3186    3186 0.29641 0.29641
one        13724   16910 1.27684 1.57325
two        19948   36858 1.8559  3.42915
three      29557   66415 2.74989 6.17904
five       87973  154388 8.18473 14.3638
seven     104690  259078 9.74003 24.1038
ten       144142  403220 13.4105 37.5143
fifteen   208335  611555 19.3828 56.8971
twenty    134030  745585 12.4697 69.3669
thirty    148163  893748 13.7846 83.1515
fourtyfive 74129  967877 6.89673 90.0482
sixty      34204 1002081 3.18223 93.2305
ninety     28955 1031036 2.69388 95.9243
twomin      7146 1038182 0.66484 96.5892
threemin    4297 1042479 0.39977 96.989
rest       32364 1074843 3.01104 100
total 1074843

Some random samples of mail servers in the 5 to 20 second range show that most
of this delay is due to remote sendmail response time, the ident lookup, etc.
I'm pretty pleased to see that 83% of mail is delivered in less than 30
seconds and that 90% is out within 45 seconds.  The 'zero' count is there
because a couple of other well-connected Postfix servers nearby have
a handful of subscribers :-)

The machine is only non-trivially busy for a small percentage of its time;
it could easily deliver 10 or 20 times that much mail before it was
really under load.  That is easily 10 to 20 million deliveries per day for
one box.
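One rough way to check how idle the box is, and what its peak per-second delivery rate looks like, is to count status=sent lines per syslog timestamp. This is a sketch that assumes traditional syslog timestamps ("Mon DD HH:MM:SS" in fields 1 to 3). sent_rate is a hypothetical name. Per-second counts reflect when deliveries were logged, not CPU load.

```sh
# Hypothetical helper: per-second delivery counts from a Postfix syslog file.
# Assumes each line starts "Mon DD HH:MM:SS host ...", so fields 1-3
# identify the second in which a delivery was logged.
sent_rate() {
    awk '
    /status=sent/ {
        sec = $1 " " $2 " " $3      # e.g. "Feb 21 02:42:14"
        per[sec]++                  # deliveries logged in that second
        total++
    }
    END {
        peak = 0; active = 0
        for (s in per) {
            active++                # seconds with at least one delivery
            if (per[s] > peak) { peak = per[s]; peaksec = s }
        }
        printf "deliveries %d\n", total
        printf "active_seconds %d\n", active
        printf "peak %d at %s\n", peak, peaksec
    }'
}
```

Comparing active_seconds with the span of the log (about 28,800 seconds for 8 hours) gives a crude idea of how much of the time the machine is delivering anything at all.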

This is an 800 MHz Pentium III with one IDE disk.  We're in the process of
switching it to SCSI because of IDE drive problems.  The Postfix spool will
probably be mirrored for safety.  Incidentally, the spool is mostly write-only,
as the entire spool fits in the memory cache.

mx1.freebsd.org$ mailq
-Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
....
F40BC6E323E     2021 Wed Feb 21 02:42:14  owner-cvs-all@FreeBSD.ORG
              (connect to mx1.mainstreet.net[207.5.0.50]: Operation timed out)
                                         john@mj.com
              (connect to foobar.nisse.dk[24.232.51.205]: Operation timed out)
                                         r@nisse.dk
                  (connect to osfmail.isc.rit.edu[129.21.2.241]: read timeout)
                                         maf8113@osfmail.isc.rit.edu
               (connect to mx.mainstreet.net[207.5.0.45]: Operation timed out)
                                         alexm@securify.com
....
           (connect to mailhub.state.me.us[141.114.122.227]: No route to host)
                                         darren@bmv.state.me.us
                     (connect to mail.is-one.net[210.75.223.43]: read timeout)
                                         col@is-one.net
(conversation with mbox.iyard.org[140.117.11.95] timed out while sending RCPT TO)
                                         kimkara@iyard.org
(conversation with relay.orsk.ru[193.233.163.2] timed out while sending RCPT TO)
                                         dm@orsk.ru

-- 104395 Kbytes in 3639 Requests.
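With thousands of queued requests, it can help to tally the deferral reasons rather than read through them. The following is a minimal sketch that assumes the reason lines look like the parenthesized ones in the mailq output above. deferral_reasons is a hypothetical helper. The pattern that strips the host[address] prefix only handles the two shapes shown ("connect to ..." and "conversation with ...").

```sh
# Hypothetical helper: count deferral reasons in mailq output, most common first.
# Assumes reason lines like
#   (connect to host[addr]: Operation timed out)
#   (conversation with host[addr] timed out while sending RCPT TO)
deferral_reasons() {
    awk '
    /^[ \t]*\(/ {
        line = $0
        sub(/^[ \t]*\(/, "", line)    # drop leading whitespace and "("
        sub(/\)[ \t]*$/, "", line)    # drop trailing ")"
        sub(/^.*[]]:? */, "", line)   # drop "... host[addr]" plus optional ": "
        n[line]++
    }
    END { for (r in n) printf "%d %s\n", n[r], r }' | sort -rn
}
```

Usage, e.g.: mailq | deferral_reasons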

The queue (104MB on disk) fits comfortably in memory right now.  Postfix
itself makes very light demands on memory.

Some other Postfix tuning details:
- parallel outbound SMTP sender processes: 500
- various qmgr parameters changed to keep the queue state in memory (i.e., to
  cope with something like 100,000 recipients and/or envelopes)
- We use bulk_mailer to inject mail on hub.freebsd.org from majordomo
  and avoid the -outgoing aliases.  bulk_mailer was hacked not to split
  envelopes unless they reached 100,000 recipients, and not to sort the
  addresses.
- hub uses mx1 as a mail exploder, leaving hub with the mailing list
  management, archiving and searching roles and mx1 solely with delivery.
  We have seen hub pump something like 2000 separate messages to mx1 in
  3 seconds flat.
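The post doesn't say which parameters were changed. The settings below are only a guess: they are standard Postfix main.cf parameters that govern these limits, and the values just mirror the numbers quoted above. The 500 smtp processes could equally be set in the maxproc column of the smtp line in master.cf, instead of through default_process_limit, which applies to every service.

```sh
# Hypothetical main.cf tuning approximating the list above (parameter
# choice is an assumption; the post does not name them).
postconf -e 'default_process_limit = 500'            # default maxproc for master.cf services left at "-", smtp included
postconf -e 'qmgr_message_active_limit = 100000'     # max messages in the active queue (kept in qmgr memory)
postconf -e 'qmgr_message_recipient_limit = 100000'  # max recipients the queue manager holds in memory
postfix reload                                       # pick up the new settings
```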

The only real problems we've had have been DNS-related ones and disk media
errors on the cursed IBM DTLA drives.

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5

