Date:      Tue, 05 Dec 2000 16:34:30 -0500 (EST)
From:      Bosko Milekic <bmilekic@technokratis.com>
To:        Terry Lambert <tlambert@primenet.com>
Cc:        arch@FreeBSD.ORG
Subject:   Re: zero copy code review
Message-ID:  <Pine.BSF.4.21.0012051623180.9538-100000@jehovah.technokratis.com>
In-Reply-To: <200012042352.QAA12392@usr02.primenet.com>


	I don't understand what you're complaining about already. Just set
  kern.ipc.mbuf_wait to 0 and you'll have the behavior you're looking for.
  As for the kernel, people must always keep checking whether their mbuf
  pointer is NULL following any type of allocation and deal with it
  appropriately (return ENOBUFS or drop the packet) until a real global
  preventive measure is put into place (think vm_state a la PHK, or
  something similar).
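A minimal sketch of the caller-side check described above, written as user-space C so it can run outside a kernel. The budget, `struct mbuf_sketch`, `sketch_mget()`, `sketch_mfree()` and `send_packet()` are hypothetical stand-ins, not FreeBSD's m_get() or struct mbuf. The only point is that a NULL return becomes ENOBUFS (or a dropped packet) rather than a dereference.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf; NOT FreeBSD's struct mbuf. */
struct mbuf_sketch {
    size_t len;          /* bytes of payload in use */
    char   data[256];    /* payload storage */
};

/* Hypothetical fixed budget, playing the role of a small mb_map. */
static int pool_free = 2;

/* Allocate a buffer, or return NULL when the budget is exhausted. */
static struct mbuf_sketch *sketch_mget(void)
{
    struct mbuf_sketch *m;

    if (pool_free == 0)
        return NULL;
    m = calloc(1, sizeof(*m));
    if (m != NULL)
        pool_free--;
    return m;
}

/* Return a buffer to the budget. */
static void sketch_mfree(struct mbuf_sketch *m)
{
    if (m == NULL)
        return;
    free(m);
    pool_free++;
}

/*
 * The caller pattern: check every allocation and turn failure into
 * ENOBUFS (a driver would drop the packet instead).
 */
static int send_packet(size_t len, struct mbuf_sketch **out)
{
    struct mbuf_sketch *m = sketch_mget();

    if (m == NULL)
        return ENOBUFS;      /* no buffer: fail cleanly, *out untouched */
    m->len = len;
    *out = m;
    return 0;
}
```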
  	Changing the behavior of M_WAIT to not return NULL ever is out of the
  question. I don't need to explain myself again. If you want to test this
  theory, try lowering NMBCLUSTERS (so it's easy to exhaust mb_map),
  heavily load your network (from the outside) and tune your
  kern.ipc.mbuf_wait accordingly, so that `netstat -m' shows a "requests
  for memory delayed" > "requests for memory denied." This should give you
  a roughly optimal wait time for heavy network load. This is your "normal
  wait time." Now load your system with some local DoS (allocate very large
  socket buffers in a tight loop, for example) and watch your system
  effectively deadlock, and then see how glad you are that your process
  isn't hanging in the kernel indefinitely and how ^C eventually does its
  job and kills the process. Then watch your system recover. Now set
  mbuf_wait to 0 and run the same test. Have fun running fsck after the
  cold boot.
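The knob being tuned above can be modelled roughly as a bounded sleep on a condition variable. The sketch assumes the semantics described in this thread. A positive wait times out and reports failure, and 0 waits until a buffer is freed. The model measures the wait in milliseconds; that unit is an assumption, and so are the names `struct pool`, `pool_get_wait()` and `pool_put()`. This is not the actual M_WAIT implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical buffer pool guarded by a mutex and condition variable. */
struct pool {
    pthread_mutex_t lock;
    pthread_cond_t  freed;   /* signalled whenever a buffer is returned */
    int             avail;   /* buffers currently free */
};

static void pool_init(struct pool *p, int n)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->freed, NULL);
    p->avail = n;
}

/*
 * Take one buffer. wait_ms > 0: sleep at most that long, then give up.
 * wait_ms == 0: sleep until a buffer is freed, however long that takes.
 * Returns 1 if a buffer was taken, 0 if the bounded wait expired.
 */
static int pool_get_wait(struct pool *p, long wait_ms)
{
    struct timespec deadline;
    int got = 0;

    /* Absolute deadline on CLOCK_REALTIME, the default clock used by
       pthread_cond_timedwait(). */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += wait_ms / 1000;
    deadline.tv_nsec += (wait_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&p->lock);
    while (p->avail == 0) {
        int rc = (wait_ms == 0)
            ? pthread_cond_wait(&p->freed, &p->lock)
            : pthread_cond_timedwait(&p->freed, &p->lock, &deadline);
        if (rc == ETIMEDOUT)
            break;              /* bounded wait expired: report failure */
    }
    if (p->avail > 0) {
        p->avail--;
        got = 1;
    }
    pthread_mutex_unlock(&p->lock);
    return got;
}

/* Return a buffer and wake one sleeper. */
static void pool_put(struct pool *p)
{
    pthread_mutex_lock(&p->lock);
    p->avail++;
    pthread_cond_signal(&p->freed);
    pthread_mutex_unlock(&p->lock);
}
```

In this model, a wait of 0 corresponds to the indefinite sleep Terry argues for. The caller stays blocked until some other thread calls pool_put().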
	Just about the only thing that may be considered is changing the
  name of M_WAIT to something more appropriate, if it means so much to the
  majority of people (honestly, I would find even doing this a waste of
  time, but if lots of folks think it's worth educating kernel developers
  by changing the name of a flag, then we might as well). 

On Mon, 4 Dec 2000, Terry Lambert wrote:

> > > [ ... local DOS ... ]
> > > 
> > > I really don't buy a probability defense.  If a probability defense
> > > were acceptable, then not checking for a NULL return, and eating
> > > the panic that results is also acceptable.
> > 
> > 	It's not a "probability defense." It's not a "defense." It's just a
> >   "don't act the worst way possible when we have an attack." And you
> >   haven't said at all why waiting indefinitely is better than not,
> >   especially in the problematic situation I brought up.
> 
> The situation you quote is one where the allocation fails,
> instead of WAITing until it can complete successfully, and
> this results in the kernel function failing and state being
> undone back to the point where the user space call that was
> the originator of the request fails back to user space.  Then
> the user space code has to handle the failure.
> 
> I maintain that the most reasonable and logical thing for the
> user space program to do, on seeing this failure (ENOBUFS?),
> is to retry the operation.
> 
> So it calls down again, and fails again, and you have
> substituted a busy loop which crosses protection domains
> twice, for a kernel sleep.  This is the best case.
> 
> The worst case is that the local DOS obtains yet more
> resources when the state is backed out, and the busy
> loop path in the kernel becomes shorter, due to an
> earlier failure for lack of resources.
> 
> In neither case does failing the allocation instead of
> sleeping do _anything at all_ to address the root cause
> of the problem, nor does the failure result in the problem
> going away or being lessened.
> 
> So I really don't see what is being accomplished by failing
> the allocation, rather than sleeping, except to use up
> _extra_ resources, during a time of resource starvation, to
> enforce the mbuf_wait interval.
> 
> 
> > > The problem with this theory is that "have the [non-offending]
> > > process return from the kernel and deal with the temporary failure"
> > > presumes that there is a correct way to work around the failure in
> > > user space.
> > 
> > 	No, it doesn't. But it's better for the process to sleep in user
> >   space than to be INDEFINITELY stuck in the kernel. And, in the case of an
> >   attack, it _will_ be indefinitely stuck.
> 
> Why the heck would the process sleep in user space?!?  It has
> work to do, it knows the call to make to do the work, and it
> will make the call repeatedly, until it's context switched,
> or until the call succeeds.  This is just like a write loop
> on a large buffer, subtracting out the write() return value
> and advancing the buffer pointer, until everything has been
> written.  You might argue that a "correctly" written user
> space program would use a select loop, but I'm betting that
> the descriptor will show as writeable, even if there aren't
> any mbufs available to accept the write; there's no way to
> make the write select accurate, without pre-reserving memory
> to accept the write.
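Terry's write loop can be sketched as follows. `write_all()` is a hypothetical helper. It takes a write(2)-compatible function pointer so that a failing writer can be substituted for testing. It deliberately retries ENOBUFS without sleeping, which is the busy-loop pattern he predicts. Whether an application should retry this way is exactly what is in dispute in this thread.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Same signature as write(2), so write itself can be passed in. */
typedef ssize_t (*writer_fn)(int fd, const void *buf, size_t len);

/*
 * Write all of buf, advancing past partial writes and retrying on
 * ENOBUFS, EINTR or EAGAIN. Returns 0 on success, or -1 with errno set
 * for any other error. *retries counts failed calls, i.e. extra trips
 * across the user/kernel boundary.
 */
static int write_all(writer_fn wr, int fd, const char *buf, size_t len,
                     unsigned *retries)
{
    while (len > 0) {
        ssize_t n = wr(fd, buf, len);

        if (n < 0) {
            if (errno == ENOBUFS || errno == EINTR || errno == EAGAIN) {
                (*retries)++;
                continue;       /* busy retry: no sleep in between */
            }
            return -1;
        }
        buf += n;               /* skip what was accepted */
        len -= (size_t)n;
    }
    return 0;
}
```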
> 
> Personally, I would prefer, under DOS conditions, that my
> program be stuck in kernel space, so that it at least has a
> small chance of getting work done slowly during a DOS, than
> stuck in user space.  You can be sure that the DOS process
> is not going to be nearly as polite in hanging around in user
> space until kernel resources are freed up.
> 
> 
> > > I would maintain that the failure would be persistent, since this
> > > does nothing to silence the DOS attack, and there is nothing that
> > > a user space program can do, except to retry, and get all the way
> > > down the code path to the same place that it was before.
> > 
> > 	Right. It's not a preventive measure. But, it's much better to have
> >   it act in this manner than wait indefinitely "in the case of."
> 
> I strongly disagree.  That's "``in the case of'' being able to
> get work done, despite the DOS".  Hung in user space is the
> same as hung in the kernel: your process is not doing useful
> work.
> 
> Making it easier for the DOS to get yet more resources during
> a period of resource starvation, and preventing other programs
> from competing with the DOS for resources freed by timeout or
> other mechanism, which takes them back from the DOS, seems like
> a big mistake to me.  I would much rather have a system that I
> can normally talk to in a few seconds be capable of being talked
> to over a period of 10 minutes, than one I can't talk to at all;
> wouldn't you?
> 
> 
> > > It seems to me that this is just a case of how big you want to
> > > make your retry loop, not one of whether or not there will be a
> > > retry loop.
> > 
> > 	The retry loop is _useless_. You drop the mutex and lose priority in
> >   the wait queue when you return from m_get(). Calling again makes your
> >   chances of getting an mbuf in a shortage even less probable. If you want
> >   that behavior, just tweak your kern.ipc.mbuf_wait.
> 
> This is actually the opposite of the effect you would want.  A
> well behaved process denied a scarce resource should be first in
> line for that resource.  Saying "I can't give you one because
> there's this pig of a process, but I'll tell you what I'll do:
> why don't you just piss off until the next millennium?" is no way
> to encourage well behaved processes... 8-).
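A toy model of the queueing argument in the two paragraphs above. It is a FIFO of sleeping waiters, where a waiter that times out is removed and, on its next attempt, re-enters at the tail. The structure and names (`struct waitq`, `wq_push()`, `wq_pop()`, `wq_remove()`) are hypothetical and say nothing about how the real allocator orders its sleepers. They only illustrate why giving up and calling again costs a process its place in line.

```c
#include <stdbool.h>
#include <stddef.h>

#define QCAP 8

/* Toy FIFO of waiting process ids, modelling an allocator's sleep queue. */
struct waitq {
    int    ids[QCAP];
    size_t head, count;
};

/* A process starts sleeping: it joins at the tail. */
static bool wq_push(struct waitq *q, int id)
{
    if (q->count == QCAP)
        return false;
    q->ids[(q->head + q->count) % QCAP] = id;
    q->count++;
    return true;
}

/* A buffer is freed: the waiter at the head gets it. */
static bool wq_pop(struct waitq *q, int *id)
{
    if (q->count == 0)
        return false;
    *id = q->ids[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return true;
}

/* A waiter timed out and returned to its caller: drop it from the queue. */
static bool wq_remove(struct waitq *q, int id)
{
    for (size_t i = 0; i < q->count; i++) {
        if (q->ids[(q->head + i) % QCAP] == id) {
            /* shift every later entry forward by one slot */
            for (size_t j = i; j + 1 < q->count; j++)
                q->ids[(q->head + j) % QCAP] =
                    q->ids[(q->head + j + 1) % QCAP];
            q->count--;
            return true;
        }
    }
    return false;
}
```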
> 
> 
> > [...]
> > > I would argue that this level of congestion should be proactively
> > > prohibited from occurring in the first place; the most likely way
> > > to do this correctly is to start "dropping" the oldest datagrams,
> > > NOT returning "NULL" to allocations made on behalf of telnetd or
> > > sshd from the local interface.
> > 
> > 	This is really a great block of theory. I only wish that people with
> >   such a passion to argue the methods would work in actually implementing
> >   them.
> 
> The code which implements "source quench" could be abused to
> provide this functionality at the queue bottom, where things
> are packing up in the ICMP echo datagram case (as one example).
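The "drop the oldest datagrams" policy quoted above can be sketched as a fixed-size ring that evicts its head on overflow instead of failing the enqueue. `struct dgram_queue` and its helpers are hypothetical, and integers stand in for datagrams. Where such a queue would sit in the real network stack is not specified in this thread.

```c
#include <stddef.h>

#define DQ_CAP 4

/* Hypothetical bounded datagram queue that evicts the oldest entry. */
struct dgram_queue {
    int           seq[DQ_CAP];  /* sequence numbers stand in for payloads */
    size_t        head, count;
    unsigned long dropped;      /* how many old datagrams were discarded */
};

/* Enqueue never fails: on overflow the oldest datagram is discarded. */
static void dq_enqueue(struct dgram_queue *q, int seq)
{
    if (q->count == DQ_CAP) {
        q->head = (q->head + 1) % DQ_CAP;   /* drop the oldest */
        q->count--;
        q->dropped++;
    }
    q->seq[(q->head + q->count) % DQ_CAP] = seq;
    q->count++;
}

/* Dequeue the oldest remaining datagram; returns 0 if the queue is empty. */
static int dq_dequeue(struct dgram_queue *q, int *seq)
{
    if (q->count == 0)
        return 0;
    *seq = q->seq[q->head];
    q->head = (q->head + 1) % DQ_CAP;
    q->count--;
    return 1;
}
```

With a capacity of four, enqueueing datagrams 1 through 6 discards 1 and 2 and leaves 3, 4, 5 and 6 queued in order.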
> 
> 
> > 	It's not. It never was. It never will be. It's just better than
> >   waiting indefinitely. It still provides you with the ability to wait
> >   indefinitely, though, if you are incapable of understanding why it's
> >   better not to.
> 
> Explain it to me: why is it better to not wait?  When I see the
> error return from the low memory condition, am I supposed to shut
> myself down, disabling apache, for example?  Is _everyone_
> supposed to do the same thing, until there is nothing but the DOS
> process running on the system?
> 
> What does me failing buy _me_?
> 
> How is this different than me waiting on _any_ contended resource,
> instead of timing out, like an advisory lock on a file?
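Terry's advisory-lock comparison can be made concrete with flock(2), which offers both behaviours. A blocking LOCK_EX sleeps until the holder releases the lock. LOCK_EX | LOCK_NB fails immediately with EWOULDBLOCK. `take_lock()` is a hypothetical wrapper written only for this illustration.

```c
#include <errno.h>
#include <sys/file.h>

/*
 * Try to take an exclusive advisory lock on fd.
 * blocking != 0: sleep until the lock is free (an unbounded wait).
 * blocking == 0: fail at once with EWOULDBLOCK if another holder has it.
 * Returns 0 on success, or the errno value on failure.
 */
static int take_lock(int fd, int blocking)
{
    int op = LOCK_EX | (blocking ? 0 : LOCK_NB);

    while (flock(fd, op) == -1) {
        if (errno == EINTR)
            continue;           /* interrupted by a signal: try again */
        return errno;
    }
    return 0;
}
```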
> 
> 
> > > As a general bone of contention, if the thing _doesn't_ wait, it
> > > shouldn't be called M_WAIT, it should be called M_TRY_HARDER or
> > > something that indicates that the default behaviour has been
> > > altered, but in fact the routine will not be waiting around until
> > > it is successful, like all of the other _WAIT flags imply.
> > 
> > 	It _does_ wait, and I disagree. By that logic, why not rename all the
> >   _WAITs with _WAIT_INDEF? If you're curious about what M_WAIT does, you
> >   can either read the code (hey, it is free!) or read the mbuf(9) man page
> >   (now available in -CURRENT).
> 
> It waits until it doesn't, you mean.  8-p.
> 
> 
> 					Terry Lambert
> 					terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.

  Regards,
  Bosko Milekic
  bmilekic@technokratis.com



