From owner-freebsd-arch Thu Jan 23 19:13:36 2003
Date: Thu, 23 Jan 2003 19:11:50 -0800
From: Terry Lambert <tlambert2@mindspring.com>
To: Bosko Milekic
Cc: Doug Rabson, John Baldwin, arch@FreeBSD.org, Andrew Gallatin
Subject: Re: M_ flags summary.
Message-ID: <3E30AEF6.FD18CF37@mindspring.com>

Bosko Milekic wrote:
> On Thu, Jan 23, 2003 at 06:07:33PM -0800, Terry Lambert wrote:
> > This is preferable for *most* cases.  For cases where a failure of
> > an operation to complete immediately results in the operation being
> > queued, which requires an allocation, then you are doing a
> > preallocation for the failure code path.  Doing a preallocation
> > that way is incredibly expensive.  If, on the other hand, you are
> > doing the allocation on the assumption of success, then it's
> > "free".  The real question is whether the allocation is in the
> > common or the uncommon code path.
>
> In that case you shouldn't be holding the lock protecting the queue
> before actually detecting the failure.  Once you detect the failure,
> then you allocate your resource, _then_ you grab the queue lock,
> _then_ you queue the operation.  This works unless you left out some
> of the detail from your example.  The point is that I'm sure a
> reasonable solution exists for each scenario, unless the design is
> wrong to begin with... but I'm willing to accept that my intuition
> has misled me.

Sorry, I thought the problem was obvious: by doing this, you invert
the lock order that you would normally use for a free, if you are
locking one or more other things at the time.

The most common case for this inversion would be "fork", or any other
place that has to punch the scheduler.  But it's also in pretty much
every place you would do a kevent, as well as in the network stacks,
where copies or pullups happen.

Basically, you can't delay taking the lock, but you can accelerate the
allocation (i.e. doing it early means an extra free in the failure
case -- if it means one in the success case, you are pessimizing the
heck out of something you shouldn't be).  Otherwise, you would have to
tolerate "LOCK MALLOC; LOCK XXX" on allocations and "LOCK XXX; LOCK
MALLOC" on frees.
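To make the two nestings concrete, here is a rough userland sketch,
with pthread mutexes and libc malloc() standing in for the kernel
primitives; the queue, the "op" structure, and both function names are
invented for illustration, not taken from any real code.  The first
version allocates before taking the queue lock, so the allocator's
internal locking never nests inside the caller's lock; the second
allocates while holding the queue lock, which is the "LOCK XXX; LOCK
MALLOC" nesting:

#include <pthread.h>
#include <stdlib.h>

struct op {
	struct op *next;
	int	    what;
};

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static struct op *queue_head;

/*
 * Allocate before taking the queue lock: the allocator runs with no
 * caller lock held, and a failed allocation needs no unwinding under
 * the lock.  The cost is a wasted malloc/free whenever the operation
 * turns out not to need queueing.
 */
static int
enqueue_preallocated(int what)
{
	struct op *o;

	o = malloc(sizeof(*o));
	if (o == NULL)
		return (-1);
	o->what = what;

	pthread_mutex_lock(&queue_lock);
	o->next = queue_head;
	queue_head = o;
	pthread_mutex_unlock(&queue_lock);
	return (0);
}

/*
 * Allocate under the queue lock: the allocator's internal locking now
 * nests inside queue_lock, the reverse of the alloc-first ordering,
 * and a failed allocation has to back out while the lock is held.
 */
static int
enqueue_inverted(int what)
{
	struct op *o;

	pthread_mutex_lock(&queue_lock);
	o = malloc(sizeof(*o));
	if (o == NULL) {
		pthread_mutex_unlock(&queue_lock);
		return (-1);
	}
	o->what = what;
	o->next = queue_head;
	queue_head = o;
	pthread_mutex_unlock(&queue_lock);
	return (0);
}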
> > The easy way to mitigate the issue here is to maintain an object
> > free list, and use that, instead of the allocator.  Of course, if
> > you do that, you can often avoid holding a mutex altogether.  And
> > if the code tolerates a failure to allocate reasonably well, you
> > can signal a "need to refill free list", and not hold a mutex over
> > an allocation at all.
>
> Although clever, this is somewhat bogus behavior w.r.t. the
> allocator.  Remember that the allocator already keeps a cache, but if
> you instead start maintaining your own (lock-free) cache, yes, maybe
> you're improving local performance but, overall, you're doing what
> the allocator should be doing anyway and, in some cases, this hampers
> the allocator's ability to manage the resources it is responsible
> for.  But I'm sure you know this because, yes, you are technically
> correct.

IMO, the allocator should really do this on your behalf, under the
covers.

One of the things that UnixWare (SVR4.2) did internally was to
preallocate a pool of buffers for use by network drivers, to avoid
having to do allocations at interrupt time, and then refill the pool,
as necessary, in the top end of the drivers.  So while clever, I can't
claim that it's original.  8-).
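In outline, that sort of preallocated pool looks something like the
sketch below -- again plain C with pthreads rather than real driver
code, and pool_get(), pool_refill(), and the sizes are all invented
for illustration.  The consumer side never calls the allocator and
never blocks; it only flags that the pool is low, and the refill runs
later in a context where waiting for memory is acceptable:

#include <pthread.h>
#include <stdlib.h>

#define POOL_TARGET	64	/* buffers we try to keep cached */

struct buf {
	struct buf *next;
	char	    data[2048];
};

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static struct buf *pool_head;
static int	pool_count;
static int	pool_needs_refill;

/*
 * Consumer side ("interrupt time" in the driver analogy): take a
 * buffer from the cache, never call the allocator, never block.  If
 * the pool runs low, just note that a refill is wanted.
 */
static struct buf *
pool_get(void)
{
	struct buf *b;

	pthread_mutex_lock(&pool_lock);
	b = pool_head;
	if (b != NULL) {
		pool_head = b->next;
		pool_count--;
	}
	if (pool_count < POOL_TARGET / 2)
		pool_needs_refill = 1;
	pthread_mutex_unlock(&pool_lock);
	return (b);		/* NULL means the caller must cope */
}

/*
 * Producer side ("top end" of the driver): top the pool back up,
 * allocating with no pool lock held.
 */
static void
pool_refill(void)
{
	struct buf *b;

	for (;;) {
		pthread_mutex_lock(&pool_lock);
		if (pool_count >= POOL_TARGET) {
			pool_needs_refill = 0;
			pthread_mutex_unlock(&pool_lock);
			return;
		}
		pthread_mutex_unlock(&pool_lock);

		b = malloc(sizeof(*b));
		if (b == NULL)
			return;		/* try again later */

		pthread_mutex_lock(&pool_lock);
		b->next = pool_head;
		pool_head = b;
		pool_count++;
		pthread_mutex_unlock(&pool_lock);
	}
}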
> In any case, it's good that we're discussing general solution
> possibilities for these sorts of problems, but I think that we agree
> that they are rather special exception situations that, given good
> thought and MP-oriented design, can be avoided.  And that's what I
> think the allocator API should encourage: good design.  By specifying
> the wait case as the default behavior, the allocator API is
> effectively encouraging all non-ISR code to be prepared to wait, for
> whatever amount of time (the actual amount of time is irrelevant in
> making my point).

This contradicts both Jeffrey's and Warner's points, though, and I
think their points were valid.

The problem that Jeffrey wants addressed is the problem of magic
numbers; I almost made this point myself, when we were talking about
prototypes in scope for the math library functions, which were defines
instead of prototypes in the x86 case, in order to use inline
functions.  The issue there was that the place the defines and/or
prototypes belonged was actually in the machine-dependent files.  This
is because the functions took manifest constants as parameters, which
may very well be enums or something else -- and the values of the bits
could be different from platform to platform.  Magic numbers really
suck, even if they are "0".

The problem Warner wants addressed is that in order to provide certain
classes of scheduling service to applications, and in particular to
provide POSIX-conformant scheduling for parts of the POSIX
specifications, you have to be able to do deadlining.  What this boils
down to is that you have to be able to guarantee that particular
operations will either succeed or fail in a bounded amount of time
(e.g. 2ms, or whatever the bound happens to be).  For that to work,
you prefer that something fail rather than sleep.

Short of adding a parameter that gets passed down through all the
functions in the chain to the target function which might sleep,
telling it whether or not it's OK to do so, you really can't make the
type of guarantees necessary to conform to the standards.  I
understand that malloc() has a parameter for this now -- or it
defaults to that parameter being there -- but basically this means
that what *should* be the common case, bounded completion time
regardless of success or failure, ends up being unbounded by default.

So to fix this problem, a programmer would have to go out of their way
to add additional "nowait" parameters to all functions up the call
graph, until they got to the one they cared about.  This, to me, means
that there's a lot of unnecessary slogging that's going to have to
happen to get to the point where this is all heading anyway,
eventually, and along with that, a lot of additional future
opportunities for error.

--

At this point, I would almost suggest eliminating both M_NOWAIT and
M_WAIT (and M_TRYWAIT and M_WAITOK, or whatever the heck it is this
week), and splitting the functionality, so that there are two
different function calls for malloc.

This is similar to the suggestion on the table here -- however, I
would *NOT* pass a mutex that could be released and reacquired over
the wait to the allocation function; I would, instead, have an
allocation function which blocks indefinitely until it gets memory, or
the heat death of the universe, whichever comes first (the whole
"TRYWAIT" thing was an incredible mistake, IMO).

At least this way, you can look at it as clearly laying out a way of
getting rid of blocking allocation requests *some time in the future*,
and then call the entry point for blocking allocations "deprecated"
from the start, so that people will at least try to avoid using it in
new code.

I guess I should say that, on general principles, barriers should be
front-loaded, if possible.  What I mean by this is that if you make it
hard to do the initial work on a massive change, and easier to do the
later work, you are *MUCH* better off than if you make the work easy
up front, and then have to write "And Then A Miracle Happens..." in
your project plan, right at the end.  8-).

Put another way: people will only do work they are passionate about,
and passion wanes, so you have to put the hard stuff up front, or lots
of things will be started, but nothing will ever be completed.

-- Terry
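A minimal sketch of what the split entry points described above might
look like, under stated assumptions: kmem_alloc_bounded() and
kmem_alloc_blocking() are hypothetical names, not existing FreeBSD
interfaces, and libc malloc() plus a retry loop stands in for a real
kernel implementation:

#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical split of the allocator into two entry points, instead
 * of an M_NOWAIT/M_WAITOK flag argument.  Illustration only.
 */

/* Bounded-time entry point: returns NULL rather than sleeping. */
void *
kmem_alloc_bounded(size_t size)
{

	return (malloc(size));
}

/*
 * Blocking entry point, "deprecated" from day one: does not return
 * until memory shows up, or the heat death of the universe, whichever
 * comes first.
 */
void *
kmem_alloc_blocking(size_t size)
{
	void *p;

	while ((p = malloc(size)) == NULL)
		sleep(1);	/* a kernel version would sleep on the
				 * VM system instead of polling */
	return (p);
}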