From: Matt Dillon
Date: Tue, 17 Apr 2001 19:06:08 -0700 (PDT)
Message-Id: <200104180206.f3I268716829@earth.backplane.com>
To: Alfred Perlstein
Cc: Greg Lehey, "Justin T. Gibbs", Doug Barton, "current @ freebsd . org"
Subject: Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

: Mutex creation can be expensive as it seems like each interrupt
: needs to register what sort of mutex it's interested in, when a
: mutex is created the list must be scanned and each interrupt
: updated. The list is based in the interrupt structure.

The cost is, what, four or five instructions in a loop for which the
vast majority will only have to iterate once. All the operations are
read-only unless you get a hit. Very cheap.

It would be nice if the list could be fixed to one or two items... same
number of instructions, but no loop, fewer memory accesses, and cheaper
to execute.

The only interrupts we care about in regards to the efficiency of this
design are: network interrupts and I/O interrupts, yes? Network
interrupts can get away with one or two mutexes (memory and queue, or
perhaps even just memory).
I/O interrupts are a stickier issue but up until softupdates the only
thing biodone() did was release a lock already held, so it wouldn't be
an issue. I think softupdates relegates nearly all of its work to a
software interrupt or process so softupdates would not represent a
problem either. I'd have to review it. I made one change to the VM
system in 4.x which was to free swap indirectly from biodone which I
would have to rip out, but that would pretty much be it.

: Interrupts do not know "exactly" which mutexes they will need, they
: know about a subset of the mutexes they may need, this scheme causes
: several problems:
: 1) interrupts are again fan-in, meaning if you block an interrupt
:    class on one cpu you block them on all cpus

They don't have to be. If you have four NICs each one can be its own
interrupt, each with its own mutex. Thus all four can be taken in
parallel. I was under the impression that BSDI had achieved that in
their scheme. If you have one NIC then obviously you can't take
multiple interrupts for that one NIC on different cpus. No great loss,
you generally don't want to do that anyway.

: 2) when we may have things like per-socket mutexes we are blocking
:    interrupts that may not need twiddling by the interrupt handler,
:    yet we need to block the interrupt anyway because it _may_ want
:    the same mutex that we have.

Network interrupts do not mess around with sockets. The packets are
passed to a software interrupt level which is certainly a more
heavyweight entity. It can be argued very easily that the TCP stack
should operate as a thread -- actually, one thread for each cpu, so if
you have a lot of TCP activity you can activate several threads and
process TCP operations in parallel. (IRIX did this to good effect).

Nobody should ever do complex processing in an interrupt, period. If
you need to do complex processing, you do it in a software interrupt
(in -stable), or a thread in the new design.
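
[Editor's sketch] The split described above, where the interrupt handler only queues the packet and a software interrupt or thread does the heavy work later, might look roughly like the following. This is a minimal userland illustration in C11, not FreeBSD kernel code: `struct pkt`, `struct pktq`, `pktq_init()`, `nic_intr()` and `net_worker_drain()` are all hypothetical names, and the queue is a simple single-producer, single-consumer ring chosen for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define PKTQ_SIZE 8u            /* ring capacity; must be a power of two */

struct pkt { int len; };        /* hypothetical packet descriptor */

struct pktq {                   /* hypothetical handler-to-worker queue */
    struct pkt *slot[PKTQ_SIZE];
    atomic_uint head;           /* next slot the handler writes */
    atomic_uint tail;           /* next slot the worker reads */
};

static inline void pktq_init(struct pktq *q)
{
    atomic_init(&q->head, 0u);
    atomic_init(&q->tail, 0u);
}

/*
 * "Interrupt" side: queue the packet and return immediately.
 * No protocol processing happens here. Returns 1 on success,
 * 0 if the ring is full (the caller would drop the packet).
 */
static inline int nic_intr(struct pktq *q, struct pkt *p)
{
    assert(p != NULL);
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head - tail == PKTQ_SIZE)
        return 0;
    q->slot[head % PKTQ_SIZE] = p;
    /* Publish the slot contents before advancing head. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return 1;
}

/*
 * Worker side (software interrupt or per-cpu thread): drain the queue
 * and do the expensive work. Summing lengths stands in for TCP/IP
 * processing. Returns the number of bytes handled.
 */
static inline int net_worker_drain(struct pktq *q)
{
    int bytes = 0;
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);

    while (tail != head) {
        bytes += q->slot[tail % PKTQ_SIZE]->len;
        tail++;
    }
    /* Hand the consumed slots back to the producer. */
    atomic_store_explicit(&q->tail, tail, memory_order_release);
    return bytes;
}
```

With one such queue per NIC (or per cpu), the handler never needs a socket-level lock; only the worker that drains the queue does.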

: Windriver has a full time developer working on the existing
: implementation, as far as I know we can only count on you for
: weekends and spare time.

Doesn't affect the discussion, really. It's nice that people are
dedicated to the project. I wish someone were in charge of it, like
Linus is in charge of Linux. When my time frees up (A year from now?
Less? More? I don't know)... when my time frees up I am going to start
working from whatever platform is the most stable. If 5.x isn't stable
by that time it's probably hopeless and I'll have to start work from
the 4.x base. If 5.x is stable then I'll be able to start from 5.x.

I know that sounds harsh, but it's a realistic view. I truly do not
believe that SMPifying things needs to be this difficult, if only
people would focus on the things that matter and stop trying to throw
the kitchen sink into -current (especially without adequate testing).
That's my beef with current.

I find it ironic that I was shot down for not following the BSDI mutex
model in the name of compatibility when I did that first push, but when
other people started messing with the system compatibility went flying
right out the window. Very ironic.

:neither scheme is buying us much other than overhead without
:significant parts of the kernel being converted over to a mutexed
:system.
:
:Your proposal is valuable and might be something that we switch
:to, however for the time being it's far more important to work on
:locking down subsystems than working on the locking subsystem.
:
:In fact if you proposed a new macro wrapper for mtx_* that would
:make it easier at a later date to implement _your_ version of the
:locking subsystem I would back it just to get you interested in
:participating in locking down the other subsystems.
:
:-Alfred

I wasn't really proposing a new macro wrapper, it was just pseudo code.
If I were doing mutexes from scratch I would scrap all the fancy
features and just have spin mutexes, period.
One argument (pointer to the mutex), simplified operation, optional
debugging, done. Complexity breeds bugs. Bugs prevent adoption, lack of
adoption results in death. Death not good.

-Matt
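
[Editor's sketch] The stripped-down spin mutex described above (one argument, simple operation, optional debugging) could be sketched along these lines. This is a userland illustration using C11 atomics under stated assumptions, not the FreeBSD mtx_* API and not the author's code: `struct spin_mtx`, `spin_mtx_init()`, `spin_mtx_lock()`, `spin_mtx_unlock()` and the `SPIN_MTX_DEBUG` switch are hypothetical names. A real kernel spin mutex would also have to deal with interrupts on the local cpu, which is omitted here.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical type; the only state is a single flag. */
struct spin_mtx {
    atomic_flag locked;         /* set while the mutex is held */
};

static inline void spin_mtx_init(struct spin_mtx *m)
{
    atomic_flag_clear(&m->locked);
}

/* One argument: a pointer to the mutex. Spin until it is ours. */
static inline void spin_mtx_lock(struct spin_mtx *m)
{
    while (atomic_flag_test_and_set_explicit(&m->locked,
                                             memory_order_acquire))
        ;                       /* busy-wait */
}

static inline void spin_mtx_unlock(struct spin_mtx *m)
{
#ifdef SPIN_MTX_DEBUG
    /*
     * Optional debugging: test_and_set returns the previous state,
     * so this fails if the mutex was not actually held.
     */
    assert(atomic_flag_test_and_set(&m->locked));
#endif
    atomic_flag_clear_explicit(&m->locked, memory_order_release);
}
```

A caller would bracket a short critical section with `spin_mtx_lock(&m)` and `spin_mtx_unlock(&m)`; building with `-DSPIN_MTX_DEBUG` turns on the unlock check without changing the interface.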