From owner-freebsd-chat  Fri Feb 14 10:44: 8 2003
Delivered-To: freebsd-chat@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id C2BDA37B407
	for <freebsd-chat@freebsd.org>; Fri, 14 Feb 2003 10:43:58 -0800 (PST)
Received: from wolfbert.skynet.be (wolfbert.skynet.be [195.238.3.13])
	by mx1.FreeBSD.org (Postfix) with ESMTP id A101743F93
	for <freebsd-chat@freebsd.org>; Fri, 14 Feb 2003 10:43:56 -0800 (PST)
	(envelope-from brad.knowles@skynet.be)
Received: from riker.skynet.be (riker.skynet.be [195.238.3.89])
	by wolfbert.skynet.be (8.12.7/8.12.7/Skynet-OUT-FALLBACK-2.22) with ESMTP id h1EExLZp022199
	for <freebsd-chat@freebsd.org>; Fri, 14 Feb 2003 15:59:21 +0100 (MET)
	(envelope-from <brad.knowles@skynet.be>)
Received: from [10.0.1.2] (ip-26.shub-internet.org [194.78.144.26] (may be forged))
	by riker.skynet.be (8.12.7/8.12.7/Skynet-OUT-2.21) with ESMTP id h1EEwn2o002496;
	Fri, 14 Feb 2003 15:58:51 +0100 (MET)
	(envelope-from <brad.knowles@skynet.be>)
Mime-Version: 1.0
X-Sender: bs663385@pop.skynet.be
Message-Id: <a05200f14ba72aae77b18@[10.0.1.2]>
In-Reply-To: <3E4CB9A5.645EC9C@mindspring.com>
References: <20030211032932.GA1253@papagena.rockefeller.edu>					
 <a05200f2bba6e8fc03a0f@[10.0.1.2]>					
 <3E498175.295FC389@mindspring.com>				
 <a05200f37ba6f50bfc705@[10.0.1.2]>				
 <3E49C2BC.F164F19A@mindspring.com>			
 <a05200f43ba6fe1a9f4d8@[10.0.1.2]>			
 <3E4A81A3.A8626F3D@mindspring.com>		
 <a05200f4cba70710ad3f1@[10.0.1.2]>		
 <3E4B11BA.A060AEFD@mindspring.com>	
 <a05200f5bba7128081b43@[10.0.1.2]>	
 <3E4BC32A.713AB0C4@mindspring.com>
 <a05200f07ba71ee8ee0b6@[10.0.1.2]>
 <3E4CB9A5.645EC9C@mindspring.com>
Date: Fri, 14 Feb 2003 15:58:09 +0100
To: Terry Lambert <tlambert2@mindspring.com>
From: Brad Knowles <brad.knowles@skynet.be>
Subject: Re: Email push and pull (was Re: matthew dillon)
Cc: Brad Knowles <brad.knowles@skynet.be>,
	Rahul Siddharthan <rsidd@online.fr>, freebsd-chat@freebsd.org
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
Sender: owner-freebsd-chat@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-chat.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-chat>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-chat>
X-Loop: FreeBSD.org

At 1:40 AM -0800 2003/02/14, Terry Lambert wrote:

>  If you are using an NFS server (which you are), then it's based on
>  your ability to saturate your network device.

	You're still limited by disk devices that may be used temporarily 
on the local server, as well as the disk devices on the other end of 
that network connection.  Putting them on the network does not 
magically solve the problem that disk I/O is still many orders of 
magnitude slower than any other thing we ever do on computer systems.

>  Disagree.  These locking issues are an artifact of the system
>  design (FS, application, or both).

	And you have magically solved all these problems in what way?

>  Simple answer: Don't use a metadata intensive storage mechanism.

	So, use what -- a pure memory-based file system for hundreds of 
gigabytes or even multiple terabytes of storage?  Even that will 
still have synchronous meta-data update issues with regards to the 
in-memory directory structure, even if those operations do take place 
much faster.

>  In other words, the message takes up your disk space, no matter
>  what.

	In other words, I can protect the entire system from being taken 
down by a concerted DOS attack on a single user.  They're going to 
have to work harder than that if they want to take down my entire 
system.

>>          SIS increases SPOFs, reduces reliability, increases complexity,
>>  increases the probability of hot-spots and other forms of contention,
>>  and all for very little possible benefit.
>
>  The only one of these I agree with is that it increases complexity.

	In what way does SIS *not* increase SPOFs, reduce reliability, 
increase the probability of hot-spots and other forms of contention, 
and in what way does it magically solve all the storage problems of 
the system?

>  This discussion *started* because there was a set of list floods,
>  and someone made a stupid remark about an important researcher
>  indicating he was cancelling his subscription to the -hackers
>  mailing list over it, and I pointed out to the person belittling
>  the important researcher that such flooding has consequences that
>  depend on the mail transport technology over and above "just having
>  to delete a bunch of identical email".

	Okay, so let's say that you've got this magical SIS which solves 
all storage problems, and you let your users have unlimited disk 
space.  All it takes is someone applying trivial changes to the 
messages so that they are not all actually identical, and you're back 
to storing at least one copy of each.
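	To make the point concrete, here is a minimal Python sketch of a 
hypothetical single-instance store that keys each stored message on a 
SHA-256 hash of its full text.  The class and method names are 
illustrative assumptions, not any real product's design.  Identical 
flood copies collapse to one blob.  Varying a single header per copy 
forces a separate blob for each one.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical store that keeps one copy of each distinct message."""

    def __init__(self):
        self.blobs = {}      # SHA-256 hex digest -> raw message bytes
        self.mailboxes = {}  # user -> list of digests (pointers into blobs)

    def deliver(self, user: str, message: bytes) -> str:
        digest = hashlib.sha256(message).hexdigest()
        # Store the bytes only the first time this exact message is seen.
        self.blobs.setdefault(digest, message)
        self.mailboxes.setdefault(user, []).append(digest)
        return digest


store = SingleInstanceStore()
flood = b"Subject: hello\r\n\r\nBuy now!\r\n"
for user in ("alice", "bob", "carol"):
    store.deliver(user, flood)
print(len(store.blobs))  # 1: identical copies share a single blob

# Varying one header per copy (here, the Message-ID) defeats deduplication.
for i, user in enumerate(("alice", "bob", "carol")):
    store.deliver(user, b"Message-ID: <%d@example.invalid>\r\n" % i + flood)
print(len(store.blobs))  # 4: the shared blob plus three unique ones
```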

	Such transformations are typically found in message headers 
(message-ids are supposed to be unique, and combinations of date/time 
stamps and process ids will probably be unique, especially when taken 
over the entire message and the multiple hops it might have 
traversed).

	Such transformations are becoming much more typical with spam, 
where the recipient's name is part of the message body.


	So, you're right back where you started, and yet you've paid such 
a very high price.

>  As far as "dealing with DOS", in for a penny, in for a pound: if
>  you are willing to burn CPU cycles, then implement Sieve or some
>  other technology to permit server-side filtering.

	We're doing that, too.  However, server-side filtering can only 
do so much.  Yes, it can eliminate duplicates that have the same 
message-id (although there is some risk that you'll eliminate unique 
messages that have colliding ids), and there is the possibility to 
program it so that it can actually inspect the content and eliminate 
additional messages that have the same message body fingerprint as 
previously seen.
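	Here is a rough sketch of that kind of server-side duplicate 
filter.  It makes two assumptions for illustration: messages arrive 
as raw bytes, and a whitespace-normalised SHA-256 of the body is an 
adequate "fingerprint".  It is not Sieve, and all of the names are 
hypothetical.  It has both weaknesses described here.  Colliding 
Message-IDs would wrongly suppress a unique message, and a 
personalised body slips straight through.

```python
import hashlib
from email import message_from_bytes


class DuplicateFilter:
    """Hypothetical per-mailbox filter that flags repeats of a message."""

    def __init__(self):
        self.seen_ids = set()     # Message-ID header values already delivered
        self.seen_bodies = set()  # body fingerprints already delivered

    @staticmethod
    def body_fingerprint(raw: bytes) -> str:
        # The body is everything after the blank line that ends the headers.
        _, sep, body = raw.partition(b"\r\n\r\n")
        if not sep:
            _, _, body = raw.partition(b"\n\n")
        # Collapse whitespace so trivial re-wrapping does not change the hash.
        return hashlib.sha256(b" ".join(body.split())).hexdigest()

    def is_duplicate(self, raw: bytes) -> bool:
        msg_id = message_from_bytes(raw).get("Message-ID")
        fp = self.body_fingerprint(raw)
        dup = (msg_id is not None and msg_id in self.seen_ids) or fp in self.seen_bodies
        if msg_id is not None:
            self.seen_ids.add(msg_id)
        self.seen_bodies.add(fp)
        return dup


f = DuplicateFilter()
a = b"Message-ID: <1@example.invalid>\r\n\r\nSame body\r\n"
b = b"Message-ID: <2@example.invalid>\r\n\r\nSame   body\r\n"          # new id, same body
c = b"Message-ID: <3@example.invalid>\r\n\r\nDear Bob, same body\r\n"  # personalised
print(f.is_duplicate(a), f.is_duplicate(b), f.is_duplicate(c))  # False True False
```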

	But even that can only go so far.  See above.

>  We also know that, for most DOS cases on maildrops, the user
>  simply loses, and that's that.

	True enough.  But I don't have to throw out all of my users 
simply because just one of them was the target of a DOS.

>  Let's quit talking about the free services.

	Yes, please.

>  So let's limit ourselves to the realm of LWCYM - "Lunches Which
>  Cost You Money".

	Sounds good.

>  The replication model is actually a pretty profound issue.  Prior
>  to replication, if you connect to one of the replicas, the message
>  can be seen as "in transit".  Post deletion on an original prior to
>  the replication, and the deletion can be seen as "in transit".  The
>  worst case failure modes are that a message has increased apparent
>  delivery latency, or the message "comes back" after it's deleted.

	Yes, at another level, the particular replication model chosen 
will be important.  However, at this level what we really care about 
is the fact that the message/mailbox is replicated, and we don't 
really care how.
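	The following toy Python model shows the two failure modes Terry 
describes.  It uses hypothetical names, one primary and one 
asynchronously updated replica, and no real replication protocol.  
A new delivery is not yet visible on the replica.  A deletion made on 
the primary has not yet reached the replica.

```python
class Replica:
    def __init__(self):
        self.messages = set()


class AsyncReplicatedMailbox:
    """Hypothetical mailbox with one primary and one lagging replica."""

    def __init__(self):
        self.primary = Replica()
        self.replica = Replica()
        self.pending = []  # operations not yet shipped to the replica

    def deliver(self, msg_id):
        self.primary.messages.add(msg_id)
        self.pending.append(("add", msg_id))

    def delete(self, msg_id):
        self.primary.messages.discard(msg_id)
        self.pending.append(("del", msg_id))

    def replicate(self):
        # Apply queued operations to the replica in their original order.
        for op, msg_id in self.pending:
            if op == "add":
                self.replica.messages.add(msg_id)
            else:
                self.replica.messages.discard(msg_id)
        self.pending.clear()


box = AsyncReplicatedMailbox()
box.deliver("m1")
print("m1" in box.replica.messages)  # False: looks "in transit" from the replica
box.replicate()
box.delete("m1")
print("m1" in box.replica.messages)  # True: deleted on the primary, still on the replica
box.replicate()
print("m1" in box.replica.messages)  # False: the replicas have converged
```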

>>          That's what I was calling the "recipient system".  It is the
>>  system where the message was received.
>
>  This is not useful to talk about in terms of a POP3 maildrop.

	Sure it is.  I've got limited disk space that I can afford to 
give each user, in accordance with the amount of money that they are 
paying for their service (or is being paid on their behalf).  But 
their local disk storage is limited only by their own budget (or the 
budget of their group), and is not an expense that I have to account 
for.

	So, when defining "recipient system", it makes perfect sense that 
this would be the point at which the mail is accumulated into some 
sort of a mailbox or queue and held on their behalf, regardless of 
whether that mailbox/queue is downloaded/retrieved with UUCP, POP3, 
IMAP4, or some other protocol.

>  To all intents and purposes, messages in a POP3 maildrop are
>  "in transit on a point to point mail transport".  That's really
>  the whole point of acknowledging a "pull" technology exists, in
>  the first place.

	Yes, there is another component to the system, which comprises 
the system of the end user, their bandwidth to the server that holds 
their mail, etc....  But this is not the "recipient system".  This is 
the "end-user system".  It's an important system in the overall 
scheme of things, but is different from the one we're talking about 
-- they manage their own end-user system, but I manage the recipient 
system(s).

>  The majority of that latency is an artifact of the FS technology,
>  not an artifact of the disk technology, except as it impacts the
>  ability of the FS technology to be implemented without stall
>  barriers (e.g. IDE write data transfers not permitting disconnect
>  ruin your whole day).

	Again, I'd like to know where you get this magic filesystem 
technology that solves all disk I/O performance issues and makes them 
as fast as a RAM disk, while also being 100% perfectly safe.

>  Unless I can use someone else's stored copy of the message to
>  recover my corrupted stored copy of the message, that's not
>  replication, it's duplication.

	Correct.  But with only ~1.3 recipients per message (on average), 
there isn't much duplication to be had anyway.  The whole replication 
issue is a different matter.
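	The arithmetic behind that is short.  Suppose each message has on 
average r recipients on the same store, and each copy would otherwise 
be stored separately.  Single-instance storage can then at best store 
1 copy instead of r, which saves 1 - 1/r of the space.  This sketch 
ignores per-recipient headers, so it overstates the real saving.

```python
def max_sis_saving(avg_recipients: float) -> float:
    """Upper bound on the fraction of space single-instance storage can save.

    Assumes every recipient of a message is on the same store and that the
    copies are byte-identical, which per-recipient headers usually prevent.
    """
    if avg_recipients < 1:
        raise ValueError("a message has at least one recipient")
    return 1 - 1 / avg_recipients


print(f"{max_sis_saving(1.3):.1%}")  # 23.1%
```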

>  The reason I brought up SIS again is that you seemed more than
>  willing to let a message sit in the main mail queue, but almost
>  panicked at the idea of throwing it into the user mailbox instead.

	No, I don't panic "...at the idea of throwing it into the user 
mailbox...".  I have defined queueing & buffering mechanisms that 
function system-wide, which help me resist problems with even 
large-scale DOS attacks, and help ensure that all the rest of my 
customers continue to receive service even if a single user has an 
overflowing mailbox.

	But it's easier to solve this problem at the system-wide level 
where I can allocate relatively large buffers, as opposed to 
inflicting it on the end user and letting them try to deal with it 
across their slow dial-up line (or whatever).

>  Nope; I want to do it to get you to agree to turn off quotas,
>  if your business model is not based on the idea that it's OK
>  to drop email into /dev/null for customers who don't pay you
>  more money.

	Bait not taken.  The customer is paying me to implement quotas. 
This is a basic requirement.

	Moreover, even if it weren't a basic requirement, I'd go back to 
the customer and make sure that they understood that they're placing 
the entire mail system, and all of its thousands of users, at risk if there is 
a single mail loop or a large DOS attack on a single user, where I 
have better tools to constrain these issues at a system-wide level.

	If they still said that they didn't want quotas, then I'd let 
someone else build the system for them -- I wouldn't want my name on 
it.


	I don't drop the stuff in /dev/null.  I just put some limits on 
things so that I've got brakes that will automatically kick in and 
start slowing the train down if there is an excessive overspeed 
problem for an excessive period of time.
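	Here is one way to build those brakes, sketched in Python under 
stated assumptions.  An over-quota delivery gets an SMTP-style 
temporary failure (452 4.2.2), so the message stays in the 
system-wide queue and is retried.  Only after a queue lifetime expires 
is it returned to the sender (552 5.2.2).  The five-day lifetime, the 
class names and the status texts are illustrative choices.  They do 
not describe the actual system.

```python
import time
from dataclasses import dataclass, field

# Hypothetical queue lifetime: keep retrying for five days, then bounce.
MAX_QUEUE_LIFETIME = 5 * 24 * 3600


@dataclass
class Mailbox:
    quota_bytes: int
    used_bytes: int = 0


@dataclass
class QueuedMessage:
    size: int
    first_attempt: float = field(default_factory=time.time)


def try_deliver(box: Mailbox, msg: QueuedMessage, now: float) -> str:
    """Return an SMTP-style status string; nothing is silently discarded."""
    if box.used_bytes + msg.size <= box.quota_bytes:
        box.used_bytes += msg.size
        return "250 2.0.0 delivered"
    if now - msg.first_attempt < MAX_QUEUE_LIFETIME:
        # Temporary failure: the message stays in the system-wide queue
        # and will be retried, instead of piling up in the user's mailbox.
        return "452 4.2.2 mailbox full, will retry"
    # The brakes have been on for too long: return it to the sender.
    return "552 5.2.2 mailbox full, returning to sender"


box = Mailbox(quota_bytes=5_000_000)  # e.g. a 5 MB quota
print(try_deliver(box, QueuedMessage(4_000_000, 0.0), now=0.0))  # 250 2.0.0 delivered
late = QueuedMessage(2_000_000, 0.0)
print(try_deliver(box, late, now=3600.0))                     # 452 4.2.2 mailbox full, will retry
print(try_deliver(box, late, now=MAX_QUEUE_LIFETIME + 1.0))   # 552 5.2.2 mailbox full, returning to sender
```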

>  FS design issue.  And metadata updates in FreeBSD (with soft
>  updates) or SVR4.2 or Solaris (with delayed ordered writes) are
>  *NOT* synchronous, they are merely ordered.

	Well, we're not talking about FreeBSD.  I wish we were.  However, 
I can assure you that UFS+Logging definitely has synchronous 
meta-data update issues -- making them ordered or putting them into a 
commit log and doing them in larger chunks does not eliminate them.


	Fortunately, in this case I have architected the system so that 
we shouldn't run into those problems very often.

	However, there's nothing I can do about synchronous meta-data 
issues with the network & filesystem implementation of the NFS 
server, and any related problems with the NFS client.

>  You limited my options to Open Source, however.

	Because there is no additional money to spend, open source is 
really the only practical choice.  However, neither UW-IMAP nor Cyrus 
will work on NFS, thus leaving us with either the complete Courier 
package, or just the Courier-IMAP component.

>  Maildir is a kludge around NFS locking.  Nothing more, and nothing
>  less.

	Yup.  And I'm convinced that it introduces more problems than it 
solves.  But I still don't have much choice.
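	For reference, the core of Maildir delivery works roughly like 
this.  The message is written under a unique name in tmp/ and flushed 
to disk.  It is then moved into new/, which is where readers look.  
No lock is taken at any point, and that is why Maildir gets used over 
NFS.  This Python sketch uses rename().  The original qmail 
description links the file into new/ and then unlinks the tmp/ copy.  
The unique-name format here is a simplified, hypothetical variant of 
the usual time.pid.host convention.

```python
import itertools
import os
import socket
import tempfile
import time

_delivery_seq = itertools.count()  # per-process counter keeps names unique


def maildir_deliver(maildir: str, message: bytes) -> str:
    """Deliver one message Maildir-style, without taking any lock."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # Time, pid, counter and host make the name unique, so no two
    # deliveries (even from different hosts) ever write the same file.
    unique = f"{time.time_ns()}.{os.getpid()}_{next(_delivery_seq)}.{socket.gethostname()}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    with open(tmp_path, "xb") as f:  # "x": fail rather than overwrite
        f.write(message)
        f.flush()
        os.fsync(f.fileno())  # the data is on disk before it becomes visible
    # Readers only scan new/ and cur/, so the message appears all at once.
    os.rename(tmp_path, new_path)
    return new_path


demo_dir = tempfile.mkdtemp()  # throwaway Maildir for the demonstration
path = maildir_deliver(demo_dir, b"Subject: test\r\n\r\nhello\r\n")
print(os.path.relpath(path, demo_dir))  # a name under new/
```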

>  MS Exchange does, and so does Lotus Notes.  I know they suck, but
>  they are examples.

	They're not IMAP servers.  They are proprietary LAN e-mail 
systems that may happen to have an interface to this alien IMAP 
protocol.

>>          Nope.  mmap on NFS doesn't work.
>
>  Who's using mmap?!?

	Cyrus.  All those databases it keeps to track the status of the 
various messages, etc., use mmap to access the information inside 
the database files.  Or are you not familiar with the method of 
operation of tools like Berkeley DB?

>  This is interesting to know; from the documentation available,
>  they imply they scale, and a single instance of one seems to
>  match their claims for a single instance.  I guess it's always
>  worse than the marketing literature, when you deploy it.  8-(.

	Actually, the Netscape/iPlanet e-mail server is just a re-badged 
SIMS, which is itself a partial port of PMDF from Vax/VMS to Unix, 
which was a port of the original MMDF from Unix to Vax/VMS.


	While I have a lot of respect for PMDF and the work that Innosoft 
did, we know from practical experience that SIMS can't scale beyond 
~60,000 POP3 users with 5MB mailbox quotas, if you're using a Sun 
Enterprise 5500.  At that point, if you want to add any new users, 
you must first delete some old ones.

	Belgacom Skynet bought a small ISP co-op in southern Belgium that 
was using SIMS as their mail system, and one of the reasons they were 
selling themselves to us was the fact that their mail system couldn't 
scale.  We moved their users over to a system on a Sun E420R with an 
external Comparex D1400/Hitachi Data Systems DF400 RAID array which 
was already serving several hundred thousand users, and we didn't 
even notice.


	SIMS and Netscape/iPlanet mail server are dead-end products. 
Scott McNealy was very unpleasantly surprised when the Sun Europe 
guys sprung SIMS on him, and it is definitely going the way of the 
dodo.  Note that Sun is a major investor in Sendmail, Inc. and they 
have on their payroll one of the key members of the Sendmail 
Consortium.

>  40 seconds to transfer on a Gigabit ethernet... assuming you can get
>  it off the disks.  8-).  Do you really expect them all simultaneously?

	Not a one of these machines has Gigabit Ethernet.  They all have 
100Base-TX FastEthernet, and the front-end machines may also have a 
second 100Base-TX FastEthernet interface (if I can scrounge a couple 
of NICs).

	The big problem is that most of the users will also have 
100Base-TX FastEthernet.  It won't take too many of them trying to 
access the server at once to completely swamp it.
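	The back-of-the-envelope numbers are easy to check.  These figures 
assume idealised line rates and ignore protocol overhead and the 
disks.  A single 100 Mbit/s client pulling at full speed fills one 
100Base-TX server interface, and two such clients fill a pair of 
them.  A transfer that takes 40 seconds at 1000 Mbit/s takes ten times 
as long at 100 Mbit/s.  The helper names below are hypothetical.

```python
import math


def clients_to_saturate(server_mbps: float, client_mbps: float) -> int:
    """Clients, each pulling at full line rate, needed to fill the server's links."""
    return math.ceil(server_mbps / client_mbps)


def transfer_seconds(seconds_at_ref: float, ref_mbps: float, link_mbps: float) -> float:
    """Scale a transfer time measured at one line rate to another (no overhead)."""
    return seconds_at_ref * ref_mbps / link_mbps


print(clients_to_saturate(100, 100))    # 1: one 100Base-TX interface
print(clients_to_saturate(200, 100))    # 2: with the second interface
print(transfer_seconds(40, 1000, 100))  # 400.0: the "40 seconds on Gigabit" case
```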

>  You don't need to assert a lock over NFS, if the only machine doing
>  the reading is the one doing the writing, and it asserts the lock
>  locally (this was more talking about the Cyrus cache files, not
>  maildir).

	This assumes that there is only one machine ever writing to a 
particular mailbox.  This is not a valid assumption.

-- 
Brad Knowles, <brad.knowles@skynet.be>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
     -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)
