Date:      Thu, 13 Feb 2003 08:09:14 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Brad Knowles <brad.knowles@skynet.be>
Cc:        Rahul Siddharthan <rsidd@online.fr>, freebsd-chat@freebsd.org
Subject:   Re: Email push and pull (was Re: matthew dillon)
Message-ID:  <3E4BC32A.713AB0C4@mindspring.com>
References:  <20030211032932.GA1253@papagena.rockefeller.edu> <a05200f2bba6e8fc03a0f@[10.0.1.2]> <3E498175.295FC389@mindspring.com> <a05200f37ba6f50bfc705@[10.0.1.2]> <3E49C2BC.F164F19A@mindspring.com> <a05200f43ba6fe1a9f4d8@[10.0.1.2]> <3E4A81A3.A8626F3D@mindspring.com> <a05200f4cba70710ad3f1@[10.0.1.2]> <3E4B11BA.A060AEFD@mindspring.com> <a05200f5bba7128081b43@[10.0.1.2]>

Brad Knowles wrote:
> >>          Under what circumstances are you not interested in I/O throughput?!?
> 
>         Again, you're talking about the MTA.  For this discussion, I
> couldn't give a flying flip about the MTA.  I care about the message
> store and mailbox access methods.  I know how to solve MTA problems.
> Solving message store and mailbox access methods tends to be more
> difficult, especially if they're dependent on an underlying technology
> that you can't touch or change.

OK, then why do you keep talking about I/O throughput?  Do you
mean *network I/O*?  Why the hell would you care about disk I/O
on a properly designed message store, when the bottleneck is
going to first be network I/O, followed closely by bus bandwidth?


> >  The issue is not real limits, it is administrative limits, and, if
> >  you care about being DOS'ed, it's about aggregate limits not
> >  resulting in overcommit.
> 
>         Quotas and making sure you have enough disk space are
> well-understood problems with well understood solutions.

The consequences of quotas are (apparently) not well understood.


> >  You are looking at the problem from the wrong end.  A quota is good
> >  for you, but it sucks for your user, who loses legitimate traffic,
> >  if illegitimate traffic pushed them over their quota.
> 
>         There's no way around this issue.  If you don't set quotas then
> the entire system can be trivially taken down by a DOS attack, and
> this affects thousands, hundreds of thousands, or millions of other
> users.  If you do set quotas, the entire system can still be taken
> down, but it takes a more concerted effort aimed at more than just
> one user.

So what's the difference between not enforcing a quota, and ending
up with the email sitting on your disks in a user maildrop, or
enforcing a quota, and ending up with the email sitting on your
disks in an MTA queue?

Quotas are actually a strong argument for single image storage.
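
For what it's worth, here's roughly what I mean, as a sketch (the
paths and names below are made up for illustration, not lifted from
any shipping product).  The message body lands on disk exactly once,
named by a digest of its contents, and each recipient's "copy" is
just a hard link; the link count is the reference count, so a spam
blast to 10,000 users costs you one body plus 10,000 directory
entries, not 10,000 copies charged against 10,000 quotas:

    /*
     * Hypothetical single-instance delivery; assumes the spool file,
     * the shared store, and the user maildirs all live on the same
     * filesystem, since hard links can't cross filesystems.
     */
    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    sis_deliver(const char *spoolfile, const char *digest_hex,
                const char *user_maildir, const char *msgname)
    {
            char store[1024], dest[1024];

            /* One shared copy, named by its content digest. */
            snprintf(store, sizeof(store), "/var/sis/%s", digest_hex);
            if (link(spoolfile, store) == -1 && errno != EEXIST)
                    return (-1);

            /* The recipient's "copy" is another name for the same inode. */
            snprintf(dest, sizeof(dest), "%s/new/%s", user_maildir, msgname);
            if (link(store, dest) == -1)
                    return (-1);

            return (0);
    }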


>         You have to have quotas.  There simply is no other viable
> alternative.  The key is setting them high enough that 95-99% of
> your users never hit them, and the remainder that do would probably
> have hit *any* quota that you set, and therefore they need to be
> dealt with in a different manner.

Obviously, unless setting the quota low on purpose is your revenue
model (HotMail, Yahoo Mail).


>         For dealing with DOS attacks that take a single user over their
> quota, that's a different issue that has to be addressed in a
> different manner.

How?  It's going to sit on your disks no matter what; the only
choice you really have is *which* disk it's going to sit on.


> >  What this comes down to is the level of service you are offering
> >  your customer.  Your definition of "adequate" and their definition
> >  of "adequate" are likely not the same.
> 
>         If 95-99% of all users never even notice that there is a quota,
> then I've solved the part of the problem that is feasible to solve.
> The remainder cannot possibly be solved with any quota at any level,
> and these users need to be dealt with separately.

Again, how?


> >  However, we now see that it's being used as a lever to attempt to
> >  extract revenue from a broken business model ("buy more disk space
> >  for only $9.95/month!").
> 
>         Another valid use, in this case allowing you to have an actual
> sustainable business model.
> 
>         Or would you prefer for everyone to offer all their services for
> "free" only to go bankrupt six months later, and forcing you to go
> somewhere else for your next fix of "free" service?  That way lies
> madness.

No.  You are misunderstanding.  Their business model is:

1)	Attract people who are unwilling to pay for service

2)	Try to sell things to the people who will not pay for
	things in the first place

3)	Profit!!!

It's a losing proposition, entirely.  It's like the "whitebox" sellers
in Computer Shopper, whose businesses all go under in ~3 months when
they run out of capital from trying to "undercut the market to establish
a customer base, then raise prices to cash in".

I call this "The Chinese Restaurant Model": they expect to attract
people who have no brand/vendor loyalty, and then they expect them
to stay, out of brand/vendor loyalty.


> >  The user convenience being sold here lies in the ability for the
> >  user to request what is, in effect, a larger queue size, in
> >  exchange for money.
> >
> >  If this queue size were not an issue, then we would not be having
> >  this discussion: it would not have value to users, and, not having
> >  any value, it would not have a market cost associated with its
> >  reduction.
> 
>         You have to pay for storage somehow.

I understand.  I'm saying that the business model is fundamentally
flawed, because it depends on something to get users, and then it
depends on the logical NOT of that same something, in order to keep
them.


>         If you store it all on the sender's system, then you run into
> SPOFs, overload when a billion people all check their e-mail and read
> a copy of the same message, backups, etc....

You mean like storing content on HTTP servers?


>         If you use a flood-fill mechanism, then everyone pays to store
> everyone's messages all the time, and then you run into problems of
> not enough shared storage space so old messages get tossed away very
> quickly and then they just re-post them again.  Look at what's
> happening to USENET today.

Flood fill will only work as part of an individual infrastructure,
not as part of a shared infrastructure, if what you are trying to
sell is to be any different from what everyone else is giving away
for free.  You can't have a general "the Internet is a big disk"
mentality.  At best, you can have peering arrangements, and then
only between peers within half an order of magnitude in size.


>         If you store them on the recipient system, you have what exists
> today for e-mail.  Of the three, this is the only one that has proved
> sustainable (so far) and sufficiently reliable.

This argument is flawed.  Messages are not stored on recipient
systems, they are stored on the systems of the ISP that the
recipient subscribes to.  Users, with the exception of some bearded
weirdos (Hi, guys!), do not run their own mail servers.  That's
where quotas become an issue.


> >  Whether the expires is enforced by default, self, or administratively
> >  is irrelevant to the mere fact that there is a limited lifetime in
> >  the distributed persistent queueing system that is Usenet.
> 
>         Yeah, at 650GB/day for a full feed, it's called not having enough
> disk space for an entire day's full feed.  At ~2GB/day for text only,
> it's called not having enough disk space for a week's traffic.  And
> you still lose messages that never somehow managed to flood over to
> your system.  For USENET, this doesn't really matter.  But for
> personal e-mail that needs some reasonable guarantees, this just
> doesn't fly.

Yet those same guarantees are specifically disclaimed by HotMail
and other "free" providers, even though there is no technological
difference between a POP3 maildrop hosted at EarthLink and accessed
via a mail client, and a POP3/IMAP4 maildrop hosted at HotMail and
accessed via a mail client.

*This* is what you are supposedly paying for, but a quota is in
place in both cases.


> >  This is a transport issue -- or, more properly, a queue management
> >  and data replication issue.  It would be very easy to envision a
> >  system that could handle this, with "merely" enough spindles to
> >  hold 650GB/day.
> 
>         Two IDE 320GB disks are not going to cut it.  They cannot
> possibly get the data in and out fast enough.

Who the hell uses IDE on servers?!?  Get real!  You can't detach an
IDE drive during the data transfer on a write, so tagged command
queueing only works for *reading* data.  For a server that does writes,
you use *SCSI* (or something else, but *not* IDE).


> >                   An OS with a log structured or journalling FS,
> >  or even soft updates, which exported a transaction dependency
> >  interface to user space, could handle this, no problem.
> 
>         Bullshit.  You have to have sufficient underlying I/O capacity to
> move a given amount of data in a given amount of time, regardless of
> what magic you try to work at a higher level.

I think I see the misunderstanding here.  You think IDE disks are
server parts.  8-).


> >  Surely, you aren't saying an Oracle Database would need a significant
> >  number of spindles in order to replicate another Oracle Database,
> >  when all bottlenecks between the two machines, down to the disks,
> >  are directly managed by a unified software set, written by Oracle?
> 
>         Yup.  Indeed, this is *precisely* what is needed.  Just try doing
> this on a single 320GB hard drive.  Or even a pair of 320GB hard
> drives.

IDE again.

>         We need enough drives with enough I/O capacity to handle the
> transaction rates.  We worry about disk space secondarily, because we
> know that we can always buy the next size up.

Use SCSI, or divide the load between a number of IDE spindles
equal to the tagged command queue depth for a single SCSI drive
(hmmm... should I buy five SCSI drives, or should I buy 500 IDE
drives?).


> >  I'm not positive that it matters, one way or the other, in the
> >  long run, if things are implemented correctly.  However, it is
> >  aesthetically pleasing, on many levels.
> 
>         Aesthetically pleasing or not, it is not practical.  SIS causes
> way too many problems and only solves issues that we don't really
> care about.

It gets rid of the quota problem.

Heck, you could even store your indices on a SCSI drive, and then
store your SIS on an IDE drive, if you wanted.


> >  Why the heck are you locking at a mailbox granularity, instead
> >  of a message granularity, for either of these operations?
> 
>         For IMAP, you need to lock at message granularity.  But your
> ability to do that will be dependent on your mailbox format.
> Choosing a mailbox directory format has a whole host of associated
> problems, as well understood and explained by Mark Crispin at
> <http://www.washington.edu/imap/documentation/formats.txt.html>.

Mark's wrong.  His assumptions are incorrect, and based on the
idea that metadata updates are not synchronous in all systems.
He's worrying about a problem that only exists on some platforms,
and he has to do that, because his software *may* have to run on
those platforms.

If you want me to get into criticizing his code, I can; at one
point, I converted the UW IMAP server to C++, with a pure virtual
base class for the driver interfaces, and then implemented each
driver as an implementation class.  There are tons of places where
you would get runtime errors that doing this converts into
compile-time errors (e.g. potential NULL pointer dereferences turn
into compilation errors about missing implementations for member
functions in the pure virtual base class).

At best, UW IMAP is an academic project.

Cyrus is much closer to commercial usability, but it has its own
set of problems, too.  Most of them, though, are solvable by
adding depth to the mail directory, so that you can separate out
the metadata, and remove the "." separator restriction.


>         Either way, locking is a very important issue that has to be
> solved, one way or the other.

No, it's a very important issue that has to be designed around,
rather than implemented.

FreeBSD has this same problem: global resources with more than
one accessor automatically require the addition of locking.
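
By "designed around" I mean something like the maildir trick, where
each message is its own file, written under a unique name and then
rename()d into place; rename() is atomic within a filesystem, so a
reader either sees the whole message or doesn't see it at all, and
nobody ever takes a mailbox lock or a message lock.  A rough sketch
(the names are mine, not from any particular server):

    /*
     * Lockless, message-granularity delivery: write to tmp/ under a
     * unique name, then rename() into new/.  The rename is the only
     * "commit" step, and it is atomic on a local filesystem.
     */
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int
    deliver_lockless(const char *maildir, const char *data, size_t len)
    {
            char base[64], tmppath[1024], newpath[1024];
            FILE *fp;
            int ok;

            /* Unique name; time + pid is good enough on a single host. */
            snprintf(base, sizeof(base), "%ld.%d",
                (long)time(NULL), (int)getpid());
            snprintf(tmppath, sizeof(tmppath), "%s/tmp/%s", maildir, base);
            snprintf(newpath, sizeof(newpath), "%s/new/%s", maildir, base);

            if ((fp = fopen(tmppath, "w")) == NULL)
                    return (-1);
            ok = (fwrite(data, 1, len, fp) == len);
            if (fclose(fp) != 0)
                    ok = 0;

            /* Publish atomically, or clean up and report failure. */
            if (!ok || rename(tmppath, newpath) == -1) {
                    unlink(tmppath);
                    return (-1);
            }
            return (0);
    }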


>         I can tell you that the guys at Compuserve appeared to be
> blissfully unaware of many scaling issues when they had one million
> customers and AOL had five million.  I don't understand why, but
> somewhere between those two numbers, a change in scale had become a
> change in kind.

Amen.


> >  I have read it.  The modifications he proposes are small ones,
> >  which deal with impedance issues.  They are low hanging fruit,
> >  available to a system administrator, not an in depth modification
> >  by a software engineer.
> 
>         The point is that these low-hanging fruit were enough to get Nick
> to a point where he could serve multiple millions of customers using
> this technology, and he didn't need to go any further.

Yes, and no.  It's very easy to paint a rosy picture in a technical
paper, particularly when you are in a position to need to obtain
funding.  8-).  It's something else entirely to deal with support
and scalability issues, to the point where you "just throw hardware"
at the problem.  Nick's solution seems to require a lot of manual
load distribution, or a lot of proactive capacity planning, both of
which are damaging, in terms of not locking up cash flow.  8-(.


[ ... Nick's Magic Mail ... ]

>         I can't adopt his solution.  He did POP3, I'm doing IMAP.
> 
>         The mailbox formats have to change, because we have to assume
> multiple simultaneous processes accessing it (unlike POP3).  He did
> just fine with mailbox locking (or methods to work around that
> problem).  I need message locking (or methods to work around that
> problem).  There are a whole series of other domino-effect changes
> that end up making the end solution totally different.
> 
>         Simply put, there just aren't that many medium-scale IMAP
> implementations in the world, period.  Even after my LISA 2000 paper,
> there still haven't been *any* truly large-scale IMAP
> implementations, despite things like
> <http://www-1.ibm.com/servers/esdd/articles/sendmail/>,
> <http://www.networkcomputing.com/1117/1117f1.html?ls=NCJS_1117bt>,
> <http://store.sendmail.com/pdfs/whitepapers/wp_samscapacity.pdf>,
> <http://store.sendmail.com/pdfs/whitepapers/wp_samscapacity.zseries.pdf>,
> and <http://www.dell.com/downloads/global/topics/linux/sendmail.doc>.
> 
>         Certainly, so far as I can tell, none of them have used NFS as
> the underlying mailbox storage method.

You are unlikely to ever find someone using NFS in this capacity,
except as a back end for a single server message store.  What you
appear to be asking for is a way to store all the mail on a big
NetApp Filer, and then have a bunch of front end machines accessing
the same mailboxes (inbound SMTP servers, and outbound and inbound
IMAP4 accessors).

I submit that you've got a lot of work ahead of you.  I've personally
got code that can do it, but I have six months into it, and I value it
at over $3M.


[ ... level of depth of understanding ... ]

>         Give me such a box and wait until I've gotten this project out of
> the way, and I'll be glad to do this sort of thing.  I'm setting up
> my own consulting business, and a large part of the work I want to do
> is in relation to research on scaling issues.  This would be right up
> my alley.

The point was that, without making changes requiring an in depth
understanding of the code of the components involved, which Nick's
solution doesn't really demonstrate, you're never going to get more
than "marginally better" numbers.

[ ... ]

>         Cyrus doesn't work on NFS.  Most of the commercial products I've
> been able to find are based on Cyrus or Cyrus-like technology and
> don't support NFS, either.  The ones I've been able to find that
> would (theoretically) support NFS are based on Courier-IMAP, and run
> on Linux on PCs.

It works on NFS.  You just have to run the delivery agent on the
same machine that's running the access agent, and not try to mix
multiple hosts accessing the same data.

I understand you want a distributed, replicated message store, or
at least the appearance of one, but in order to get that, well,
you have to "write a distributed, replicated message store".


>         One of the other can't-change criteria for this system is that it
> has to run on SPARC/Solaris, so for example Bynari Insight Server is
> not an option.

The part of Netscape that Sun bought used to provide an IMAP4
server (based on heavily modified UW IMAP code).  Is there a
reason you can't use that?  I guess the answer must be "I have
been directed to use Open Source".  8-).


> >  How many maildrops does this need to support?  I will tell you if
> >  your project will fail.  8-(.
> 
>         ~1800 LAN e-mail clients initially, quickly growing to
> ~3000-4000, and possible growth to ~6,000-10,000.

[ ... lot of stats ... ]

This should be no problem.  You should be able to handle this
with a single machine, IMO, without worrying about locking at
all.  10,000 client machines is nothing.  At worst, you should
separate the inbound and outbound SMTP servers, so you can treat
the inbound one as a bastion host and keep the outbound one
entirely inside, and the inbound server should use a transport
protocol for internal delivery to the machine running the IMAP4
server, which makes locking go away.  You can also limit the
number of bastion-to-internal-server connections, which will make
things queue up at the bastion if you get a large activity burst,
and let them drain out to the internal server over time.  At most,
you are well under 40,000 simultaneous TCP connections to the
IMAP4 server host, even if you are using Outlook, people have
two mailboxes open each, and are monitoring incoming mail in
several folders.
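
The queue-at-the-bastion part is nothing fancy; a cartoon version
(mine, not any particular MTA's) is just a cap on the number of
simultaneous connections the bastion's queue runner will open to
the internal box, with anything over the cap simply left in the
bastion's queue for the next run:

    /*
     * Sketch of capping bastion-to-internal connections.  When the
     * slots are full, the message stays queued at the bastion and
     * drains to the internal server later.
     */
    #include <stdio.h>

    #define MAX_INTERNAL_CONNS 32   /* tune to what the internal box absorbs */

    static int internal_conns;      /* live connections to the internal host */

    /* Called by the bastion's queue runner for each queued message. */
    int
    try_relay_inward(const char *queue_id)
    {
            if (internal_conns >= MAX_INTERNAL_CONNS)
                    return (0);     /* no slot; leave it queued */

            internal_conns++;
            printf("relaying %s inward (%d/%d slots)\n",
                queue_id, internal_conns, MAX_INTERNAL_CONNS);
            /* ... SMTP handoff to the internal IMAP4/delivery host ... */
            internal_conns--;
            return (1);
    }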

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message



