From owner-freebsd-arch@FreeBSD.ORG Sun May 16 07:17:03 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 393BB16A4CE; Sun, 16 May 2004 07:17:03 -0700 (PDT) Received: from comp.chem.msu.su (comp.chem.msu.su [158.250.32.97]) by mx1.FreeBSD.org (Postfix) with ESMTP id 96E1843D53; Sun, 16 May 2004 07:17:01 -0700 (PDT) (envelope-from yar@comp.chem.msu.su) Received: from comp.chem.msu.su (localhost [127.0.0.1]) by comp.chem.msu.su (8.12.9p2/8.12.9) with ESMTP id i4GEGx3F040353; Sun, 16 May 2004 18:16:59 +0400 (MSD) (envelope-from yar@comp.chem.msu.su) Received: (from yar@localhost) by comp.chem.msu.su (8.12.9p2/8.12.9/Submit) id i4GEGwHc040352; Sun, 16 May 2004 18:16:59 +0400 (MSD) (envelope-from yar) Date: Sun, 16 May 2004 18:16:58 +0400 From: Yar Tikhiy To: arch@freebsd.org, net@freebsd.org Message-ID: <20040516141658.GA39893@comp.chem.msu.su> References: <20040508034514.GA937@grosbein.pp.ru> <20040508132354.GB44214@comp.chem.msu.su> <20040515182157.GB89625@comp.chem.msu.su> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20040515182157.GB89625@comp.chem.msu.su> User-Agent: Mutt/1.5.6i cc: Eugene Grosbein Subject: TIME_WAIT sockets from other users (was Re: bin/65928: [PATCH] stock ftpd uses superuser credentials for active mode sockets) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 May 2004 14:17:03 -0000 Note for the impatient: This message does not discuss the well-known issue of reusing local addresses through setting SO_REUSEADDR. This message is on reusing local addresses occupied by sockets belonging to other users. 
On Sat, May 15, 2004 at 10:21:57PM +0400, Yar Tikhiy wrote:
>
> Attached below is a patch addressing the issue of the inability to
> reuse a local IP:port couple occupied by an established TCP connection
> from another user, but by no listeners. Could anybody with fair
> understanding of our TCP/IP stack review it please? Thanks.
>
> --
> Yar
>
> Index: in_pcb.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/netinet/in_pcb.c,v
> retrieving revision 1.146
> diff -u -p -r1.146 in_pcb.c
> --- in_pcb.c    23 Apr 2004 23:29:49 -0000      1.146
> +++ in_pcb.c    15 May 2004 17:37:18 -0000
> @@ -340,6 +340,8 @@ in_pcbbind_setup(inp, nam, laddrp, lport
>                         return (EADDRINUSE);
>                 } else
>                 if (t &&
> +                   (so->so_type != SOCK_STREAM ||
> +                    ntohl(t->inp_faddr.s_addr) == INADDR_ANY) &&
>                     (ntohl(sin->sin_addr.s_addr) != INADDR_ANY ||
>                      ntohl(t->inp_laddr.s_addr) != INADDR_ANY ||
>                      (t->inp_socket->so_options &

One more detail to note: Currently, if another user's socket is in the TIME_WAIT state, it still counts as occupying the local IP:port couple. I cannot see the point of such behaviour. The restriction on bind() exists to prevent unprivileged port stealing, but how can one steal a connection in the TIME_WAIT state?

For FreeBSD-4 the above patch would take care of this case along with established connections, but in CURRENT TIME_WAIT connections are a special case since they no longer use full-blown state. Therefore, for CURRENT the above patch mutates into the one below.

Do I have a point?

--
Yar

Index: in_pcb.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.146
diff -u -p -r1.146 in_pcb.c
--- in_pcb.c    23 Apr 2004 23:29:49 -0000      1.146
+++ in_pcb.c    16 May 2004 13:33:33 -0000
@@ -332,14 +332,10 @@ in_pcbbind_setup(inp, nam, laddrp, lport
                 * XXX
                 * This entire block sorely needs a rewrite.
                 */
-               if (t && (t->inp_vflag & INP_TIMEWAIT)) {
-                       if ((ntohl(sin->sin_addr.s_addr) != INADDR_ANY ||
-                           ntohl(t->inp_laddr.s_addr) != INADDR_ANY ||
-                           (intotw(t)->tw_so_options & SO_REUSEPORT) == 0) &&
-                           (so->so_cred->cr_uid != intotw(t)->tw_cred->cr_uid))
-                               return (EADDRINUSE);
-               } else
                if (t &&
+                   ((t->inp_vflag & INP_TIMEWAIT) == 0) &&
+                   (so->so_type != SOCK_STREAM ||
+                    ntohl(t->inp_faddr.s_addr) == INADDR_ANY) &&
                    (ntohl(sin->sin_addr.s_addr) != INADDR_ANY ||
                     ntohl(t->inp_laddr.s_addr) != INADDR_ANY ||
                     (t->inp_socket->so_options &

From owner-freebsd-arch@FreeBSD.ORG Mon May 17 16:28:32 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1379316A4CE; Mon, 17 May 2004 16:28:32 -0700 (PDT) Received: from comp.chem.msu.su (comp.chem.msu.su [158.250.32.97]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0FF9B43D39; Mon, 17 May 2004 16:28:31 -0700 (PDT) (envelope-from yar@comp.chem.msu.su) Received: from comp.chem.msu.su (localhost [127.0.0.1]) by comp.chem.msu.su (8.12.9p2/8.12.9) with ESMTP id i4HNSS3F035506; Tue, 18 May 2004 03:28:28 +0400 (MSD) (envelope-from yar@comp.chem.msu.su) Received: (from yar@localhost) by comp.chem.msu.su (8.12.9p2/8.12.9/Submit) id i4HNSRDB035501; Tue, 18 May 2004 03:28:27 +0400 (MSD) (envelope-from yar) Date: Tue, 18 May 2004 03:28:27 +0400 From: Yar Tikhiy To: Cyrille Lefevre Message-ID: <20040517232827.GD27584@comp.chem.msu.su> References: <20040515092114.GB67531@comp.chem.msu.su> <042601c43a6b$cd1cb9a0$7890a8c0@dyndns.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <042601c43a6b$cd1cb9a0$7890a8c0@dyndns.org> User-Agent: Mutt/1.5.6i cc: arch@freebsd.org cc: hackers@freebsd.org Subject: Re: Interoperation of flock(2), fcntl(2), and lockf(3) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 May 2004 23:28:32 -0000 On Sat, May 15, 2004 at 01:00:13PM +0200, Cyrille Lefevre wrote: > "Yar Tikhiy" wrote: > [snip] > > Considering all the above, I'd like to add the following paragraph > > to the flock(2), lockf(3), and fcntl(2) man pages (replacing the > > sentence quoted from lockf(3)): > > > > The flock(2), fcntl(2), and lockf(3) locks are compatible. > > Processes using different locking interfaces can cooperate > > over the same file safely. However, only one of such > > interfaces should be used within a process. If a file is > > s/a process/the same process/ ? Agreed, thanks! BTW, since no objections were raised and Kirk encouraged me to make the change (thank you Kirk!), I just did so. -- Yar From owner-freebsd-arch@FreeBSD.ORG Thu May 20 13:31:19 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 83D5B16A4CE for ; Thu, 20 May 2004 13:31:19 -0700 (PDT) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id E020843D49 for ; Thu, 20 May 2004 13:31:18 -0700 (PDT) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (localhost [127.0.0.1]) by fledge.watson.org (8.12.11/8.12.11) with ESMTP id i4KKUQDe094733 for ; Thu, 20 May 2004 16:30:27 -0400 (EDT) (envelope-from robert@fledge.watson.org) Received: from localhost (robert@localhost)i4KKUQ2G094730 for ; Thu, 20 May 2004 16:30:26 -0400 (EDT) (envelope-from robert@fledge.watson.org) Date: Thu, 20 May 2004 16:30:26 -0400 (EDT) From: Robert Watson X-Sender: robert@fledge.watson.org To: arch@FreeBSD.org Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Subject: Network Stack Locking X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to 
FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 May 2004 20:31:19 -0000 1.5 line summary: This is an e-mail about the on-going network stack locking and contains largely technical stuff. Executive summary: The high level view, for those less willing to wade through a greater level of detail, is that we have a substantial work in progress with a lot of our bases covered, and that we're looking for broader exposure for the work. We've been merging smaller parts of the work (supporting infrastructure, fine-grained locking for specific leaf dependencies), and are starting to think about larger scale merging over the next month or two. There are some known serious issues in the current work, but we've also identified some areas that need attention outside of the stack in order to make serious progress on merging. There are also some important tasks that require owners moving forward, and a solicitation for those areas. I don't attempt to capture everything, in particular things like locking strategies in this e-mail. You will find patch URLs and perforce references. Body: As many of you are aware, I've become the latest inheritor of the omnibus "Network Stack Locking" task of SMPng. This work has a pretty long history that I won't attempt to go into here, other than to observe that: - This is a product of the adoption of the SMPng approach a few years ago by the FreeBSD Project for the FreeBSD 5.x line. This approach attempts to address a lack of kernel parallelism and preemption, as well as generally formalizing synchronization, adopting architectural properties such as interrupt threads and a more general use of threads in the kernel, etc. 
- The vast majority of work that will be discussed in this e-mail is the product of significant contributions of others, including: Jonathan Lemon, Jennifer Yang, Jeffrey Hsu, and Sam Leffler, and a large number of other contributors (many of whom are named in recent status reports, but some of whom I've inevitably accidentally omitted and would be happy to be reminded of via private e-mail!). The goal of this e-mail is to provide a bit of high level information about what is going on to increase awareness, solicit involvement in a variety of areas, and throw around words like "merge schedule". Warning: this is a work in progress, and you will find rough parts. This is being worked on actively, but by bringing this up during the process, we can improve the work. If you see things that scare you, that's a reasonable response. Now into the details: Those following the last few status reports will know that recent work has focused in the following areas: - Introducing and refining data based locking for the top levels of the network stack (sockets, socket buffers, et al). - Refining and testing locking for lower pieces of the stack that already have locking. - Locking for UNIX domain sockets, FIFOs, etc. - Iterating through pseudo-interfaces and network interfaces to identify and correct locking problems. - Allow Giant to be conditionally acquired across the entire stack using a Giant Toggle Switch. - Address interactions with tightly coupled support infrastructure for the stack, including the MAC Framework, kqueue, sigio, select() general signaling primitives, et al. - Investigating and in many cases locking of less popular/less widely used stack components that were previously unaddressed, such as IPv6, netatalk, netipx, et al. - Some local changes used to monitor and assert locks at a finer granularity than in the main tree. 
Specifically, sampling of callouts and timeouts to measure what we're grabbing Giant for, and in certain branches, the addition of a great many assertions.

This work is occurring in a number of Perforce branches. The primary branch that is actively worked on is "rwatson_netperf", which may be found at the following path:

    //depot/users/rwatson/netperf/...

Additional work is taking place to explore socket locking issues in:

    //depot/users/rwatson/net2/...

A number of other developers have branches off of these branches to explore locking for particular subsystems. There are also some larger unintegrated patch sets for data-based NFS locking, fixing the user space build, etc. You can find a non-Perforce version at:

    http://www.watson.org/~robert/freebsd/netperf/

This includes a basic change log and incrementally generated patches, work sets, etc. Perforce is the preferred way to get to the work as it provides easier access to my working notes, the ability to maintain local changes, get the most recent version, etc. I try to drop patches fairly regularly -- several times a week against HEAD, but due to travel to BSDCan, I'm about two weeks behind. I hope to make substantial headway this weekend in updating the patch set and integrating a number of recent socket locking changes from various work branches.

This work is currently a work in progress, and has a number of known issues, including some lock order reversal problems, known deficiencies in socket locking coverage of socket variables, etc. However, it has been reviewed and worked on by an increasingly broad population of FreeBSD developers, so I wanted to move to a more general patch posting process and attempt to identify additional "hired hands" for areas that require additional work.
Here are current known tasks and current owners:

Task                           Developer
----                           ---------
Sockets                        Robert Watson
Synthetic network interfaces   Robert Watson
Netinet6                       George Neville-Neil
Netatalk                       Robert Watson
Netipx                         Robert Watson
Interface Locking              Max Laier, Luigi Rizzo,
                               Maurycy Pawlowski-Wieronski,
                               Brooks Davis
Routing Cleanup                Luigi Rizzo
KQueue (subsystem lock)        Brian Feldman
KQueue (data locking)          John-Mark Gurney
NFS Server (subsystem lock)    Robert Watson
NFS Server (data locking)      Rick Macklem
SPPP                           Roman Kurakin
Userspace build                Roman Kurakin
VFS/fifofs interactions        Don Lewis
Performance measurement        Pawel Jakub Dawidek

And of course, I can't neglect to mention the on-going work of Kris Kennaway to test out these changes on high-load systems :-).

Some noted absences in the above, and areas where I'd like to see additional people helping out are:

- Reviewing Netgraph modules for correct interactions with locking in the remainder of the system. I've started pushing some locking into ng_ksocket.c and ng_socket.c, and some of the basic infrastructure that needed it, but each module will need to be reviewed for correct locking.

- ATM -- Harti? :-)

- Network device drivers -- some have locking, some have correct locking, some have potential interactions with other pieces of the system (such as the USB stack). Note that for a driver to work correctly with a Giant-free system, it must be safe to invoke ifp->if_start() without holding Giant, and if_start() must be aware that it cannot acquire Giant without generating a lock order issue. It's OK for if_input() to be called with Giant, although undesirable generally. Some drivers also have locking that is commented out by default due to use of recursive locks, but I'm not sure this is necessarily a sufficient problem not to just turn on the locking.

- Complete coverage of synthetic/pseudo-interfaces. In particular, careful addressing of if_gif and other "cross-layer" and protocol aware pieces.
- mbuma -- Bosko's work looks good to me, we need to make sure all the pieces work with each other. Getting down to one large memory allocator would be great. I'm interested in exploring uniprocessor optimizations here -- I notice that a lot of the locks getting acquired in profiling are for memory allocation. Exploring using critical sections, per-cpu variables/caching, and pinning both seem like reasonable approaches to reduce synchronization costs here. Note that there are some serious issues with the current locking changes: - Socket locking is deficient in a number of ways -- primarily that there are several important socket fields that are currently insufficiently or inconsistently synchronized. I'm in the throes of correcting this, but that requires a line-by-line review of all use of sockets, which will take me at least another week or two to complete. I'm also addressing some races between listen sockets and the sockets hung off of them during the new connection setup and accept process. Currently there is no defined lock order between multiple sockets, and if possible I'd like to keep it that way. - Based on the BSD/OS strategy, there are two mutexes on a socket: each socket buffer has a mutex (send, receive), and then the basic socket fields are locked using SOCK_LOCK(), which actually uses the receive socket buffer mutex. This reduces the locking overhead while helping to address ordering issues in the upward and downward paths. However, there are also some issues of locking correctness and redundancy, and I'm looking into these as part of an overall review of the strategy. It's worth noting that the BSD/OS snapshot we have has substantially incomplete and non-functional socket locking, so unlike some other pieces of the network stack, it was not possible to use the strategy whole-cloth. In the long term, the socket locking model may require substantial revision. 
- Per some recent discussions on -CURRENT, I've been exploring mitigating locking costs through coalescing activities on multiple packets. I.e., effectively passing in queues of packet chains across API boundaries, as well as creating local work queues. It's a bit early to commit to this approach because the performance numbers have not confirmed the benefit, but it's important to keep that possible approach in mind across all other locking work, as it trades off work queue latency with synchronization cost. My earlier experimentation occurred at the end of 2003, so I hope to revisit this now that more of the locking is in place to offer us advantages in preemption and parallelism.

- They enable net.isr.enable by default, which provides inbound packet parallelism through running to completion in the ithread. This has other down sides, and while we should provide the option, I think we should continue to support forcing use of the netisr. One of the problems with the netisr approach is how to accomplish inbound processing parallelism without sacrificing the currently strong ordering properties, which could cause bad TCP behavior, etc. We should seriously consider at least some aspects of Jeffrey Hsu's work on DragonFly to explore providing for multiple netisr's bound to CPUs, then directing traffic based on protocol aware hashing that permits us to maintain sufficient ordering to meet higher level protocol requirements while avoiding the cost of maintaining full ordering. This isn't something we have to do immediately, but exploiting parallelism requires both effective synchronization and effective balancing of load.

  In the short term, I'm less interested in the avoidance of synchronization of data adopted in the DragonFly approach, since I'd like to see that approach validated on a larger chunk of the stack (i.e., across the more incestuous pieces of the network stack), and also to see performance numbers that confirm the claims.
The approach we're currently taking is tried and true across a broad array of systems (almost every commercial UNIX vendor, for example), and offers many benefits (such as a very strong assertion model). However, as aspects of the DFBSD approach are validated (or not, as the case may be), we should consider adopting things as they make sense. The approaches offer quite a bit of promise, but are also very experimental and will require a lot of validation, needless to say. I've done a little bit of work to start applying the load distribution approach on FreeBSD, but need to work more on the netisr infrastructure before I'll be able to evaluate its effectiveness there. - There are still some serious issues in the timely processing and scheduling of device driver interrupts, and these affect performance in a number of ways. They also change the degree of effective coalescing of interrupts, making it harder to evaluate strategies to lower costs. These issues aren't limited to the network stack work, but I wanted to make sure it was on the list of concerns. Improving our scheduling and handling of interrupts will be critical to realizing the performance benefits SMPng has offered. - There are issues relating to upcalls from the socket layer: while many consumers of sockets simply sleep for wakeups on socket pointers, so_upcall() permits the network stack to "upcall" into other components of the system. I believe this was introduced initially for the NFS server to allow initial processing of RPCs to occur in the netisr rather than waiting on a context switch to the NFS server threads. However, it's now also used for accept sockets, and I'm aware of outstanding changes that modify the NFS client to use it as well. We need to establish what locks will be held over the upcall, if any, and what expectations are in place for implementers of upcall functions. At the very least, they have to be MPSAFE, but there are also potential lock order issues. 
- Locking for KQueue is critical to success. Without locking down the event infrastructure, we can't remove Giant from the many interesting pieces of the network stack. KQueue is an example of a high level of incestuousness between levels, and will require careful handling. Brian's approach adopts a "single subsystem" for KQueue and as such offers a low hanging fruit approach, but comes at a number of costs, not least is parallelism loss and functional loss. John-Mark's approach appears to offer a more granular locking approach offering higher parallelism, but at the cost of complexity. I've not yet had the opportunity to review either in any detail, but I know Brian has integrated a work branch in Perforce that combines both the locking in rwatson_netperf, and perform testing. There's obviously more work to go on here, and it is required to get to "Giant-free operation". For more complete changes and history, I would refer you to the last few FreeBSD Status Reports on network stack locking. I would also encourage you to contact me if you would like to claim some section of the stack for work so I can coordinate activities. These patch sets have been pounded heavily in a wide variety of environments, but there are several known issues so I would recommend using them cautiously. In terms of merging: I've been gradually merging a lot of the infrastructure pieces as I went along. The next big chunks to consider merging are: - Socket locking. This needs to wait until I'm more happy with the strategy. - UNIX domain socket locking. This is probably an early candidate, but because of potential interactions with socket locking changes, I've been deferring the merge. - NFS server locking. I had planned to merge the current subsystem lock quickly, but then Rick turned up with fine-grained data based locking of the NFS server, and NFSv4 server code when I asked him for review of the subsystem lock, so I've been holding off. 
- Additional general infrastructure, such as more pseudo-interface locking, fifofs stuff, etc.

I'll continue on the gradual incremental merge path as I have been for the past few months. It's obviously desirable to get things merged as soon as they are ready, even with Giant remaining over the stack, so that we can get broad exercising of the locking assertions in INVARIANTS and WITNESS. As such, over the next month I anticipate an increasing number of merges, and increasing usability of "debug.mpsafenet" in the main tree. Turning off Giant will likely lead to problems for some time to come, but the sooner we get exposure, the better life will be. We've done a lot of heavy testing of common code paths, but working out the edge cases will take some time. We're prepared to live in a world with a dual-mode stack for some period, but that has to be an interim measure.

So I guess the upshot is "Stuff is going on, be aware, volunteer to help!".

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research

From owner-freebsd-arch@FreeBSD.ORG Thu May 20 13:56:39 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4D69616A4CF for ; Thu, 20 May 2004 13:56:39 -0700 (PDT) Received: from rwcrmhc13.comcast.net (rwcrmhc13.comcast.net [204.127.198.39]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1943043D2D for ; Thu, 20 May 2004 13:56:39 -0700 (PDT) (envelope-from julian@elischer.org) Received: from interjet.elischer.org ([24.7.73.28]) by comcast.net (rwcrmhc13) with ESMTP id <2004052020563801500qckjie>; Thu, 20 May 2004 20:56:38 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id NAA74737 for ; Thu, 20 May 2004 13:56:38 -0700 (PDT) Date: Thu, 20 May 2004 13:56:36 -0700 (PDT) From: Julian Elischer To: arch@freebsd.org Message-ID:
MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Subject: atomic reference counting primatives. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 May 2004 20:56:39 -0000

This has been raised before, but I've come across uses for it again and again, so I'm raising it again.

JHB once posted some atomic reference counting primitives. (Do you still have them, John?) Alfred once said he had some somewhere too, and others have commented on this before, but we still don't seem to have any.

Every object is reference counted with its own code, and sometimes it's done poorly. Some people indicated that there are cases where a generic refcounter cannot be used, and used this as a reason to not have one at all.

So, here are some possibilities.. my first "write it down without too much thinking" effort..

typedef {mumble} refcnt_t

refcnt_add(refcnt_t *)
    Increments the reference count.. no magic except to be atomic.

int refcnt_drop(refcnt_t *, struct mutex *)
    Decrements the refcount. If it goes to 0 it returns 0 and locks the mutex (if the mutex is supplied)..

refcnt_init(refcnt_t *)
    would simply set the counter to 0 if refcnt_t is defined as a simple type, but could do more if a more complex refcount is used (say for debugging)

Debugging versions of the above might store all sorts of stuff in the refcount.. (e.g. pid, __LINE__ __FUNCTION__ etc.) vm->vm_exitingcnt)

If these were in place, it would be a first step in tightening up some of the reference counting we see in the kernel, and there are several places I've seen over the last few years where locks are used purely to allow reference counts to be manipulated.

thoughts....? better ideas?
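
For illustration only, the interface proposed above could be sketched in userland C roughly as follows. This is not code from the thread or from the FreeBSD tree. It assumes C11 <stdatomic.h> and a pthread mutex as stand-ins for the kernel's atomic operations and mutexes, takes the "{mumble}" type to be a plain atomic unsigned counter, and returns 1 from refcnt_drop() when references remain (the proposal only specifies the 0 case).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: the "{mumble}" type is a plain atomic counter. */
typedef atomic_uint refcnt_t;

/* Set the counter to 0, as the proposal describes for a simple type. */
static void
refcnt_init(refcnt_t *rc)
{
	atomic_init(rc, 0);
}

/* Atomically take one more reference; no other magic. */
static void
refcnt_add(refcnt_t *rc)
{
	atomic_fetch_add(rc, 1);
}

/*
 * Atomically release one reference.  Returns 0 if this was the last
 * reference, in which case the mutex (if one is supplied) is locked
 * before returning.  Returns 1 (an assumed convention) otherwise.
 */
static int
refcnt_drop(refcnt_t *rc, pthread_mutex_t *mtx)
{
	unsigned int old = atomic_fetch_sub(rc, 1);

	assert(old > 0);	/* dropping an unreferenced object is a caller bug */
	if (old == 1) {
		if (mtx != NULL)
			pthread_mutex_lock(mtx);
		return (0);
	}
	return (1);
}
```

A caller would typically tear the object down when refcnt_drop() returns 0, with the mutex already held, and unlock it afterwards. Note that this sketch takes the mutex only after the count has reached zero, so it does not by itself stop another thread from finding the object and taking a new reference in between; a real kernel version would have to address that.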
From owner-freebsd-arch@FreeBSD.ORG Thu May 20 14:32:56 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6A59816A4CE for ; Thu, 20 May 2004 14:32:56 -0700 (PDT) Received: from mailtoaster1.pipeline.ch (mailtoaster1.pipeline.ch [62.48.0.70]) by mx1.FreeBSD.org (Postfix) with ESMTP id A5AEF43D48 for ; Thu, 20 May 2004 14:32:55 -0700 (PDT) (envelope-from andre@freebsd.org) Received: (qmail 47139 invoked from network); 20 May 2004 21:32:54 -0000 Received: from unknown (HELO freebsd.org) ([62.48.0.53]) (envelope-sender ) by mailtoaster1.pipeline.ch (qmail-ldap-1.03) with SMTP for ; 20 May 2004 21:32:54 -0000 Message-ID: <40AD2405.DC13B45C@freebsd.org> Date: Thu, 20 May 2004 23:32:53 +0200 From: Andre Oppermann X-Mailer: Mozilla 4.8 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: Robert Watson References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit cc: arch@FreeBSD.org Subject: Re: Network Stack Locking X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 May 2004 21:32:56 -0000

Robert Watson wrote:
...
> Note that there are some serious issues with the current locking changes:
...
>

I vote for the approach of getting in as much as possible from the moment it is known to work *correctly* (not necessarily perfectly optimal/optimized). Having something correct is an ideal base to start optimizing from. There I'm ready to jump in and go ahead to make things better by re-arranging or re-writing them.

One of my main dislikes of the current 'net' and 'netinet' code is its obfuscation and really overloaded functions.
Even though I'm very fluent in the IPv4 network code, it still hurts my eyes and brain when looking through certain files... So I've started to clean up large parts of it. The very first thing is to get ipfw out of ip_input/ip_output, for which I have early patches (see last status report). In that patch are two more things. One is to make ip_reass() a real function taking a fragmented packet instead of being a half-way stub only capable of being called from ip_input. The second thing is to move all IP options related functions (which are quite many/large and seldom used) to their own .c/.h file. With that alone both ip_input/ip_output shrink by approx. 1/3 in size and get way more readable and understandable.

Well, the only thing I really want to say is that correctly working code is always a great base to optimize from. I think this is one of the big lessons I've learned through my relatively young kernel programming career and from the VM work of John Dyson (for the younger among us, he and David Greenman did the original implementation of the unified VM we have. John lost himself in micro-optimizations where he somewhat lost the ability to see the forest because of all the trees in the way. In the end he had to give it up).

Progress happens incrementally. Put in Green's kqueue locking, have that working correctly and make it perfect in a second step.
-- Andre From owner-freebsd-arch@FreeBSD.ORG Thu May 20 18:03:27 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6E59C16A4CE; Thu, 20 May 2004 18:03:27 -0700 (PDT) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1B27443D46; Thu, 20 May 2004 18:03:27 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) i4L13Q7Z068013; Thu, 20 May 2004 18:03:26 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.9p2/8.12.9/Submit) id i4L13QWT068012; Thu, 20 May 2004 18:03:26 -0700 (PDT) (envelope-from dillon) Date: Thu, 20 May 2004 18:03:26 -0700 (PDT) From: Matthew Dillon Message-Id: <200405210103.i4L13QWT068012@apollo.backplane.com> To: Robert Watson References: cc: arch@freebsd.org Subject: Re: Network Stack Locking X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 01:03:27 -0000 It's my guess that we will be able to remove the BGL from large portions of the DFly network stack sometime late June or early July, after USENIX, at which point it will be possible to test SMP aspects of the localized cpu distribution method. Right now the network stack is still under the BGL (as is most of the system, our approach to MP is first to isolate and localize the conflicting subsystems, then to release the BGL for that subsystem's thread(s)). 
It should be noted that the biggest advantages of the distributed approach are (1) The ability to operate on individual PCBs without having to do any token/mutex/other locking at all, (2) Cpu locality of reference in regards to cache mastership of the PCBs and related data, and (3) avoidance of data cache pollution across cpus (more cpus == better utilization of individual L1/L2 caches and far greater scalability). The biggest disadvantage is the mandatory thread switch (but this is mitigated as load increases since each thread can work on several PCBs without further switches, and because our thread scheduler is extremely light weight under SMP conditions). Message passing overhead is very low since most operations already require some sort of roll-up structure to be passed (e.g. an mbuf in the case of the network).

We are running the full bore threaded, distributed network stack even on UP systems now (meaning: message passing and thread switching still occurs even though there is only one target thread for a particular protocol). We have done fairly significant testing on GigE LANs and have not noticed any degradation in network performance so we are certain we are on the right track.

I do not expect cpu balancing to be all that big an issue, actually, especially due to the typically short lived connection life that occurs in these scenarios. But mutex avoidance is *REALLY* *HUGE* if you are processing a lot of TCP connections in parallel due to the small quantums of work involved.

In any case, if you are seriously considering any sort of distributed methodology you should also consider formalizing a message passing API for FreeBSD. Even if you don't like our LWKT messaging API, I think you would love the DFly IPI messaging subsystem and it would be very easy to port as a first step. We use it so much now in DFly that I don't think I could live without it. e.g.
for clock distribution, interrupt distribution, thread/cpu isolation, wakeup(), MP-safe messaging at higher levels (and hence packet routing), free()-return-to-originating-cpu (mutexless slab allocator), SMP MMU synchronization (the basic VM/pte-race issue with userland brought up by Alan Cox), basic scheduler operations, signal(), and the list goes on and on. In DFly, IPI messaging and message processing is required to be MP safe (it always occurs outside the BGL, like a cpu-localized fast interrupt), but a critical section still protects against reception processing so code that uses it can be made very clean.

-Matt

:- They enable net.isr.enable by default, which provides inbound packet
:...
: consider at least some aspects of Jeffrey Hsu's work on DragonFly
: to explore providing for multiple netisr's bound to CPUs, then directing
: traffic based on protocol aware hashing that permits us to maintain
: sufficient ordering to meet higher level protocol requirements while
: avoiding the cost of maintaining full ordering. This isn't something we
: have to do immediately, but exploiting parallelism requires both
: effective synchronization and effective balancing of load.
:
: In the short term, I'm less interested in the avoidance of
: synchronization of data adopted in the DragonFly approach, since I'd
: like to see that approach validated on a larger chunk of the stack
: (i.e., across the more incestuous pieces of the network stack), and also
:...
: benefits (such as a very strong assertion model). However, as aspects
: of the DFBSD approach are validated (or not, as the case may be), we
: should consider adopting things as they make sense. The approaches
: offer quite a bit of promise, but are also very experimental and will
: require a lot of validation, needless to say.
I've done a little bit of : work to start applying the load distribution approach on FreeBSD, but : need to work more on the netisr infrastructure before I'll be able to : evaluate its effectiveness there. From owner-freebsd-arch@FreeBSD.ORG Thu May 20 19:54:39 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9B13516A4CE for ; Thu, 20 May 2004 19:54:39 -0700 (PDT) Received: from harmony.village.org (rover.village.org [168.103.84.182]) by mx1.FreeBSD.org (Postfix) with ESMTP id 42C5343D31 for ; Thu, 20 May 2004 19:54:39 -0700 (PDT) (envelope-from imp@bsdimp.com) Received: from localhost (warner@rover2.village.org [10.0.0.1]) by harmony.village.org (8.12.11/8.12.11) with ESMTP id i4L2s2GA038430; Thu, 20 May 2004 20:54:02 -0600 (MDT) (envelope-from imp@bsdimp.com) Date: Thu, 20 May 2004 20:54:03 -0600 (MDT) Message-Id: <20040520.205403.08940889.imp@bsdimp.com> To: julian@elischer.org From: "M. Warner Losh" In-Reply-To: References: X-Mailer: Mew version 3.3 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit cc: arch@freebsd.org Subject: Re: atomic reference counting primatives. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 02:54:39 -0000 In message: Julian Elischer writes: : This has been raised before but I've come across uses for it again and : again so I'm raising it again. : JHB once posted some atomic referenc counting primatives. (Do you still : have them John?) : Alfred once said he had soem somewhere too, and other s have commentted : on this before, but we still don't seem to have any. : : every object is reference counted with its own code and : sometimes it's done poorly. 
:
: Some people indicated that there are cases where a generic refcounter
: can not be used and used this as a reason to not have one at all.
:
: So, here are some possibilities..
: my first "write it down without too much thinking" effort..
:
: typedef {mumble} refcnt_t
:
: refcnt_add(refcnt_t *)
: Increments the reference count.. no magic except to be atomic.
:
:
: int refcnt_drop(refcnt_t *, struct mutex *)
: Decrements the refcount. If it goes to 0 it returns 0 and locks the
: mutex (if the mutex is supplied)..

What prevents refcnt_add() from happening after ref count drops to 0? Wouldn't that be a race? E.g., if we have two threads:

    Thread A                        Thread B

    objp = lookup();
[1]                                 refcnt_drop(&objp->ref, &objp->mtx);
[2] refcnt_add(&objp->ref);
    BANG!

If [1] happens before [2], then bad things happen at BANG! If [2] happens before [1], then the mutex won't be locked at BANG and things are good. Thread A believes it has a valid reference to objp after the refcnt_add and has no way of knowing otherwise.

Is there a safe way to use the API that you are proposing?
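
One possible way to close the window in the interleaving above, sketched under assumptions the thread itself does not state: the lookup side takes its reference with a "get unless zero" operation, so a count that has already dropped to 0 is never revived. The name ref_tryget() is hypothetical, and C11 atomics stand in for the kernel's atomic operations. The sketch also assumes the object's memory is still valid when ref_tryget() runs, for example because the lookup holds the lock on the list it found the object in.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of the proposed refcnt_* API):
 * take a reference only if the count is currently nonzero.
 * Returns true if a reference was taken, false if the object
 * is already on its way to being destroyed.
 */
static bool
ref_tryget(atomic_uint *cnt)
{
	unsigned int old = atomic_load_explicit(cnt, memory_order_relaxed);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak_explicit(cnt, &old, old + 1,
		    memory_order_acquire, memory_order_relaxed))
			return (true);
	}
	return (false);
}
```

With this shape, Thread A's step [2] would fail instead of handing back a reference to an object whose count Thread B has already taken to 0.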
Warner From owner-freebsd-arch@FreeBSD.ORG Thu May 20 20:45:42 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EDC0F16A4CE for ; Thu, 20 May 2004 20:45:42 -0700 (PDT) Received: from mailout2.pacific.net.au (mailout2.pacific.net.au [61.8.0.85]) by mx1.FreeBSD.org (Postfix) with ESMTP id 63F4343D2D for ; Thu, 20 May 2004 20:45:42 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au [61.8.0.87])i4L3jW5v012968; Fri, 21 May 2004 13:45:32 +1000 Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246]) i4L3jULS005620; Fri, 21 May 2004 13:45:31 +1000 Date: Fri, 21 May 2004 13:45:32 +1000 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: Julian Elischer In-Reply-To: Message-ID: <20040521133502.Y4135@gamplex.bde.org> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@FreeBSD.org Subject: Re: atomic reference counting primatives. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 03:45:43 -0000 On Thu, 20 May 2004, Julian Elischer wrote: > This has been raised before but I've come across uses for it again and > again so I'm raising it again. > JHB once posted some atomic referenc counting primatives. (Do you still > have them John?) > Alfred once said he had soem somewhere too, and other s have commentted > on this before, but we still don't seem to have any. > > every object is reference counted with its own code and > sometimes it's done poorly. > > Some peiople indicated that there are cases where a generic refcounter > can not be used and usd this as a reason to not have one at all. 
Now we know that a generic reference counter would be even better for pessimizing FreeBSD than was first thought, since on P4s locked instructions are very expensive. See the thread about bridging. A pessimization by a factor of 2 or so has been achieved using little more than normal locking, since there are lots of lock/unlock pairs per packet and each lock and unlock takes hundreds (?) of cycles for the bus lock part and very little else. General atomic counters of any sort would take about half as long as a lock/unlock pair (since they only need 1 lock, but would always need it even if running in a locked region). The pessimizations from them could be broken using algorithms that don't need fine-grained locking.

Bruce

From owner-freebsd-arch@FreeBSD.ORG Thu May 20 21:10:21 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DC1D316A4CE for ; Thu, 20 May 2004 21:10:21 -0700 (PDT) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.183]) by mx1.FreeBSD.org (Postfix) with ESMTP id 52D1F43D1F for ; Thu, 20 May 2004 21:10:21 -0700 (PDT) (envelope-from max@love2party.net) Received: from [212.227.126.162] (helo=mrelayng.kundenserver.de) by moutng.kundenserver.de with esmtp (Exim 3.35 #1) id 1BR1Kx-0004ag-00 for freebsd-arch@freebsd.org; Fri, 21 May 2004 06:08:47 +0200 Received: from [216.58.85.218] (helo=[10.0.0.49]) by mrelayng.kundenserver.de with asmtp (TLSv1:RC4-MD5:128) (Exim 3.35 #1) id 1BR1Kx-0004Or-00 for freebsd-arch@freebsd.org; Fri, 21 May 2004 06:08:47 +0200 From: Max Laier To: freebsd-arch@freebsd.org Date: Fri, 21 May 2004 06:10:24 +0200 User-Agent: KMail/1.6.2 References: <20040521133502.Y4135@gamplex.bde.org> In-Reply-To: <20040521133502.Y4135@gamplex.bde.org> MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id:
<200405210610.27298.max@love2party.net> X-Provags-ID: kundenserver.de abuse@kundenserver.de auth:e28873fbe4dbe612ce62ab869898ff08 Subject: Re: atomic reference counting primatives. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 04:10:22 -0000 On Friday 21 May 2004 05:45, Bruce Evans wrote: > On Thu, 20 May 2004, Julian Elischer wrote: > > This has been raised before but I've come across uses for it again and > > again so I'm raising it again. > > JHB once posted some atomic referenc counting primatives. (Do you still > > have them John?) > > Alfred once said he had soem somewhere too, and other s have commentted > > on this before, but we still don't seem to have any. > > > > every object is reference counted with its own code and > > sometimes it's done poorly. > > > > Some peiople indicated that there are cases where a generic refcounter > > can not be used and usd this as a reason to not have one at all. > > Now we know that a generic reference counter would be even better for > pessimizing FreeBSD than was first thought, since on P4's locked > instructions are very expensive. See the thread about bridging. A > pessimization by a factor of 2 or so has been achieved using little > more than normal locking, since there are lots of lock/unlock pairs > per packet and each lock and unlock takes hundreds (?) of cycles for > the bus lock part and very little else. General atomic counters of > any sort would take about half as lock as a lock/unlock pair (since > they only need 1 lock, but would always needed it even if running in > a locked region). The pessimizations from them could be broken using > algorithms that don't need fine-grained locking. I find atomic counters still very attractive for a simple sx lock. 
The current implementation uses (as far as I know) a normal mutex to protect the busy count, so you have four lock/unlock operations that need bus interaction; when we move to updating the busy count with atomic ops, we have only two and could start to actually use sx locks. The BANG from Warner's reply could be avoided by decrementing to ($magicval) rather than 0 when exclusive mode is requested. But I am not entirely sure if I got the point ... but that's always the/my problem when it comes to understanding locks.

--
Best regards,                        | mlaier@freebsd.org
Max Laier                            | ICQ #67774661
http://pf4freebsd.love2party.net/    | mlaier@EFnet

From owner-freebsd-arch@FreeBSD.ORG Fri May 21 06:59:07 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A498E16A4CE for ; Fri, 21 May 2004 06:59:07 -0700 (PDT) Received: from mail3.speakeasy.net (mail3.speakeasy.net [216.254.0.203]) by mx1.FreeBSD.org (Postfix) with ESMTP id 84A7043D31 for ; Fri, 21 May 2004 06:59:07 -0700 (PDT) (envelope-from jhb@FreeBSD.org) Received: (qmail 6235 invoked from network); 21 May 2004 13:58:56 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 21 May 2004 13:58:56 -0000 Received: from 10.50.40.205 (gw1.twc.weather.com [216.133.140.1]) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i4LDwoRK076727; Fri, 21 May 2004 09:58:51 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: freebsd-arch@FreeBSD.org Date: Fri, 21 May 2004 09:59:24 -0400 User-Agent: KMail/1.6 References: <20040520.205403.08940889.imp@bsdimp.com> In-Reply-To: <20040520.205403.08940889.imp@bsdimp.com> MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200405210959.25368.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63
(2004-01-11) on server.baldwin.cx cc: arch@FreeBSD.org cc: julian@elischer.org Subject: Re: atomic reference counting primatives. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 13:59:07 -0000

On Thursday 20 May 2004 10:54 pm, M. Warner Losh wrote:
> In message:
>
>
> Julian Elischer writes:
> : This has been raised before but I've come across uses for it again and
> : again so I'm raising it again.
> : JHB once posted some atomic reference counting primitives. (Do you still
> : have them John?)
> : Alfred once said he had some somewhere too, and others have commented
> : on this before, but we still don't seem to have any.
> :
> : every object is reference counted with its own code and
> : sometimes it's done poorly.
> :
> : Some people indicated that there are cases where a generic refcounter
> : can not be used and used this as a reason to not have one at all.
> :
> : So, here are some possibilities..
> : my first "write it down without too much thinking" effort..
> :
> : typedef {mumble} refcnt_t
> :
> : refcnt_add(refcnt_t *)
> : Increments the reference count.. no magic except to be atomic.
> :
> :
> : int refcnt_drop(refcnt_t *, struct mutex *)
> : Decrements the refcount. If it goes to 0 it returns 0 and locks the
> : mutex (if the mutex is supplied)..
>
> What prevents refcnt_add() from happening after ref count drops to 0?
> Wouldn't that be a race? E.g., if we have two threads:
>
>
>     Thread A                        Thread B
>
>     objp = lookup();
> [1]                                 refcnt_drop(&objp->ref, &objp->mtx);
> [2] refcnt_add(&objp->ref);
>     BANG!
>
> If [1] happens before [2], then bad things happen at BANG! If [2]
> happens before [1], then the mutex won't be locked at BANG and things
> are good. Thread A believes it has a valid reference to objp after the
> refcnt_add and has no way of knowing otherwise.
> > Is there a safe way to use the API into what you are proposing? This situation can't happen if you are properly using reference counting. For the reference count to be at 1 in thread B, it has to have the only reference meaning that the object has already been removed from any lists, etc. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org
From owner-freebsd-arch@FreeBSD.ORG Fri May 21 07:01:46 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DDBBB16A4D0 for ; Fri, 21 May 2004 07:01:46 -0700 (PDT) Received: from mail6.speakeasy.net (mail6.speakeasy.net [216.254.0.206]) by mx1.FreeBSD.org (Postfix) with ESMTP id B8D2F43D31 for ; Fri, 21 May 2004 07:01:46 -0700 (PDT) (envelope-from jhb@FreeBSD.org) Received: (qmail 2713 invoked from network); 21 May 2004 14:01:35 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 21 May 2004 14:01:35 -0000 Received: from 10.50.40.205 (gw1.twc.weather.com [216.133.140.1]) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i4LE1Rsp076772; Fri, 21 May 2004 10:01:27 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: freebsd-arch@FreeBSD.org Date: Fri, 21 May 2004 10:02:02 -0400 User-Agent: KMail/1.6 References: In-Reply-To: MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200405211002.02386.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx cc: arch@FreeBSD.org cc: Julian Elischer Subject: Re: atomic reference counting primatives.
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 14:01:47 -0000

On Thursday 20 May 2004 04:56 pm, Julian Elischer wrote:
> This has been raised before but I've come across uses for it again and
> again so I'm raising it again.
> JHB once posted some atomic reference counting primitives. (Do you still
> have them John?)
> Alfred once said he had some somewhere too, and others have commented
> on this before, but we still don't seem to have any.

I still have them. Part of the problem is that there are lots of different reference counts that work in different ways, and if you try to come up with a single all-singing, all-dancing ref count implementation it will be too complicated to provide any benefit. What I do think might be useful is a simple refcount() API for objects that are immutable when the refcount > 1, like ucred, and are updated via COW. These types of objects have a mutex that just protects a refcount and nothing else. Using a single refcount op for those objects will cut the number of atomic ops in half.
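
A minimal userland sketch of what such a simple refcount API might look like, assuming C11 atomics in place of the kernel's atomic operations. The names refcnt_init(), refcnt_acquire() and refcnt_release() are illustrative only and are not the primitives referred to above. Each operation is a single atomic instruction, where a mutex-protected counter needs a lock and an unlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative type; the thread only says "typedef {mumble} refcnt_t". */
typedef struct {
	atomic_uint count;
} refcnt_t;

static inline void
refcnt_init(refcnt_t *r, unsigned int n)
{
	atomic_init(&r->count, n);
}

/* The caller must already hold a reference, so the count is never 0 here. */
static inline void
refcnt_acquire(refcnt_t *r)
{
	unsigned int old;

	old = atomic_fetch_add_explicit(&r->count, 1, memory_order_relaxed);
	assert(old > 0);
	(void)old;	/* silence unused warning when NDEBUG is defined */
}

/*
 * Drop one reference.  Returns true if this was the last reference,
 * in which case the caller owns the object exclusively and may free it.
 */
static inline bool
refcnt_release(refcnt_t *r)
{
	unsigned int old;

	old = atomic_fetch_sub_explicit(&r->count, 1, memory_order_acq_rel);
	assert(old > 0);
	return (old == 1);
}
```

A ucred-style user would call refcnt_acquire() when sharing the object and free it only when refcnt_release() returns true; a writer that sees a count greater than 1 would copy the object first.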
-- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org
From owner-freebsd-arch@FreeBSD.ORG Fri May 21 09:20:30 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4BB0316A4CE; Fri, 21 May 2004 09:20:30 -0700 (PDT) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id F083943D31; Fri, 21 May 2004 09:20:29 -0700 (PDT) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (localhost [127.0.0.1]) by fledge.watson.org (8.12.11/8.12.11) with ESMTP id i4LGJr3K012129; Fri, 21 May 2004 12:19:54 -0400 (EDT) (envelope-from robert@fledge.watson.org) Received: from localhost (robert@localhost)i4LGJr3H012126; Fri, 21 May 2004 12:19:53 -0400 (EDT) (envelope-from robert@fledge.watson.org) Date: Fri, 21 May 2004 12:19:53 -0400 (EDT) From: Robert Watson X-Sender: robert@fledge.watson.org To: Andre Oppermann In-Reply-To: <40AD2405.DC13B45C@freebsd.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: Network Stack Locking
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 16:20:30 -0000 On Thu, 20 May 2004, Andre Oppermann wrote: > Robert Watson wrote: > ... > > Note that there are some serious issues with the current locking changes: > > I vote for the approach to get in as much as possible from the moment on > it is known to work *correctly* (not neccessarily perfectly optimal/ > optimized). Having something correct is an ideal base to start for > optimizing. There I'm ready to jump in and go ahead to make things > better by re-arraning or re-writing them. One of my main dislikings of > the current 'net' and 'netinet' code is it's obfuscation and really > overloaded functions. Even though I'm very fluent in the IPv4 network > code it is still hurting my eye and brain when looking through certain > files... So I've started to clean up large parts of it. The very first > thing is to get ipfw out of ip_input/ip_output which I have early > patches (see last status report). In that patch are two more things. > One is to make ip_reass() a real function taking a fragemented packet > instead of being a half-way stub only capable of being called from > ip_input. The second thing is to move all ip options related functions > (which are quite many/large and seldomly used) to their own .c/.h file. > With that alone both ip_input/ip_output shrink by approx. 1/3 in size > and get way more readable and understandable. I agree generally with all of the improvements you have proposed -- cleaning up the ip_input() and ip_output() paths is imperative. Likewise attempting to reduce the incestuousness of the stack and its various components, normalize utility functions such as reassembly, etc. > Well, the only thing I really want to say is that correctly working code > is always a great base to optimize from. 
I think this is one of the big
> lessons I've learned through my relatively young kernel programming
> career and from the VM work of John Dyson (for the younger among us, he
> and David Greenman did the original implementation of the unified VM we
> have. John lost himself in micro-optimizations where he somewhat lost
> the ability to see the forest because of all the trees in the way. In
> the end he had to give it up).

Agreed. My goal in picking up the pieces from various people working on this has been to get to the "decent first pass" so that we can finally understand how all the pieces come together. There should be a number of fairly easy optimizations we can look into once we're able to measure accurately the impact of changes in the locking strategy. The trick is getting that decent first pass -- we're close, but not quite there.

The good news is that the dual-mode model allows us to merge locking on components without that locking necessarily being 100% complete. I anticipate a non-trivial window in which whether you can run Giant-free depends on whether you're using more obscure stack components, for example. I hope it is not long enough that we have to improve the mechanism for the dual-mode (i.e., have the kernel select running with Giant based on the code compiled in, etc). Right now it's a simple loader tunable.

Anyhow, my hope is to have a substantial amount of time to work on cleaning up this weekend so that I can update the patch sets. I also need to integrate in changes from rik to make userspace compile with the modified kernel, etc.
Robert N M Watson FreeBSD Core Team, TrustedBSD Projects robert@fledge.watson.org Senior Research Scientist, McAfee Research From owner-freebsd-arch@FreeBSD.ORG Fri May 21 10:24:41 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D57F216A4CE for ; Fri, 21 May 2004 10:24:41 -0700 (PDT) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6272B43D2D for ; Fri, 21 May 2004 10:24:41 -0700 (PDT) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (localhost [127.0.0.1]) by fledge.watson.org (8.12.11/8.12.11) with ESMTP id i4LHNqgH024685; Fri, 21 May 2004 13:23:52 -0400 (EDT) (envelope-from robert@fledge.watson.org) Received: from localhost (robert@localhost)i4LHNqQr024682; Fri, 21 May 2004 13:23:52 -0400 (EDT) (envelope-from robert@fledge.watson.org) Date: Fri, 21 May 2004 13:23:51 -0400 (EDT) From: Robert Watson X-Sender: robert@fledge.watson.org To: Matthew Dillon In-Reply-To: <200405210103.i4L13QWT068012@apollo.backplane.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: Network Stack Locking X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 17:24:41 -0000 On Thu, 20 May 2004, Matthew Dillon wrote: > It should be noted that the biggest advantages of the distributed > approach are (1) The ability to operate on individual PCBs without > having to do any token/mutex/other locking at all, (2) Cpu locality > of reference in regards to cache mastership of the PCBs and related data, > and (3) avoidance of data cache pollution across cpus (more cpus == > better utilization of individual L1/L2 caches and far greater > 
scalability). The biggest disadvantage is the mandatory thread switch
> (but this is mitigated as load increases since each thread can work on
> several PCBs without further switches, and because our thread scheduler
> is extremely lightweight under SMP conditions). Message passing
> overhead is very low since most operations already require some sort of
> roll-up structure to be passed (e.g. an mbuf in the case of the network).

My primary concern with this approach (and the reason I'm taking somewhat of a "wait and see what happens" attitude) is the level of inter-component incestuousness (referred to elsewhere in this thread). At particular layers in the stack -- the PCBs are probably the best example -- I see the opportunity for this sort of per-CPU unsynchronized access offering a very clean and uncomplicated approach. However, I'm concerned that along many of the total end-to-end paths, there are a moderate number of pieces that will require traditional synchronization or extensive re-writing: the route table, host cache, a variety of "processing" packages such as netgraph, IPSEC, et al. None of that suggests that the per-cpu synchronization-free access in a thread shouldn't be applied, but I'd like to see it demonstrated to be a useful technique in a broader sense. One of the key implied benefits of the approach is that it allows you to avoid significant rewriting costs for existing code, which is appealing, but less appealing if it doesn't fall out in the general case.

The other concern I have is whether the message queues get deep or not: many of the benefits of message queues come when the queues allow coalescing of context switches to process multiple packets. If you're paying a context switch per packet passing through the stack each time you cross a boundary, there's a non-trivial operational cost to that.
So what I'd like to see are the numbers that suggest, on a pretty functional sample stack, that you get at least an interesting level of queuing and therefore effective coalescing of synchronization. I've started looking at similar issues in the type-specific mbuf queues in the FreeBSD kernel -- additional context switches are expensive and best avoided even if you use explicit synchronization primitives such as mutexes. > In any case, if you are seriously considering any sort of distributed > methodology you should also consider formalizing a message passing > API for FreeBSD. Even if you don't like our LWKT messaging API, I > think you would love the DFly IPI messaging subsystem and it would be > very easy to port as a first step. We use it so much now in DFly > that I don't think I could live without it. e.g. for clock distribution, > interrupt distribution, thread/cpu isolation, wakeup(), MP-safe messaging > at higher levels (and hence packet routing), free()-return-to- > originating-cpu (mutexless slab allocator), SMP MMU synchronization > (the basic VM/pte-race issue with userland brought up by Alan Cox), > basic scheduler operations, signal(), and the list goes on and on. > In DFly, IPI messaging and message processing is required to be MP > safe (it always occurs outside the BGL, like a cpu-localized fast > interrupt), but a critical section still protects against reception > processing so code that uses it can be made very clean. As someone who's worked with Darwin and other Mach-derived operating systems, I see the clear appeal of message passing systems, as I think we've discussed in other forums. They offer substantially interesting benefits from a security perspective also as they offer cleaner separation between components, especially userspace and the kernel. However, based on past experience with such systems, I'm also very cautious about the notion.
The increased level of separation between components can also make it harder to understand the interactions between components in a debugging sense: for example, if your stack trace in the TCP code only goes up to the queue receive primitive, the debugger can't simply tell you what code originated the mbuf. In the past, I've explored binding stack traces to messages in message passing systems when operating in debugging mode so that the debugger walks up to the message queue, and can then follow the stack trace from the message to understand more about the calling context. I've also used this on FreeBSD in userspace -- we have local modifications to allow the kernel to attach stack traces of the sending process to messages passed over UNIX domain sockets so that the receiving code can grab the stack trace as ancillary data. The trick, though, is to make sure you're not just substituting message queue operations and context switches for mutexes, because those both have a moderate cost. Many of the benefits come in reducing explicit synchronization and then amortizing the context switch cost over multiple instances, which helps with the cache and many other things. So something I'd very much like to see out of the dfbsd prototype code is a set of measurements on queue depth at the hand-off points between layers, and statistics on #queue operations, synchronization points, etc, amortized over multiple deliveries.
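The kind of measurement asked for above could be gathered with counters along these lines. This is only a sketch under stated assumptions: it is a single-threaded model with no locking, and struct msgq, QCAP, msgq_put, msgq_drain and msgq_avg_batch are hypothetical names for illustration, not part of the DragonFly or FreeBSD code. One call to msgq_drain stands in for one consumer wakeup (one context switch), so the average batch size is the number of deliveries that each switch is amortized over.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 64 /* hypothetical ring capacity; a power of two */

/* Single-threaded model of one hand-off queue between two layers. */
struct msgq {
    int           slots[QCAP];
    size_t        head;       /* index of next message to dequeue */
    size_t        tail;       /* index of next free slot */
    size_t        max_depth;  /* deepest the queue has been */
    unsigned long delivered;  /* messages handed to the consumer */
    unsigned long wakeups;    /* consumer runs that found work */
};

static size_t
msgq_depth(const struct msgq *q)
{
    return (q->tail - q->head);
}

/* Enqueue one message; returns 0 on success, -1 if the ring is full. */
static int
msgq_put(struct msgq *q, int msg)
{
    assert(q != NULL);
    if (msgq_depth(q) == QCAP)
        return (-1);
    q->slots[q->tail % QCAP] = msg;
    q->tail++;
    if (msgq_depth(q) > q->max_depth)
        q->max_depth = msgq_depth(q);
    return (0);
}

/*
 * Model one consumer wakeup: process everything that is pending.
 * Returns the batch size, i.e. how many deliveries shared this wakeup.
 */
static size_t
msgq_drain(struct msgq *q, void (*handle)(int))
{
    size_t n = 0;

    assert(q != NULL);
    while (q->head != q->tail) {
        int msg = q->slots[q->head % QCAP];

        q->head++;
        if (handle != NULL)
            handle(msg);
        n++;
    }
    if (n > 0) {
        q->wakeups++;
        q->delivered += n;
    }
    return (n);
}

/* Average deliveries per wakeup; 1.0 would mean no coalescing at all. */
static double
msgq_avg_batch(const struct msgq *q)
{
    if (q->wakeups == 0)
        return (0.0);
    return ((double)q->delivered / (double)q->wakeups);
}
```

An average that stays close to 1.0 under load would suggest a context switch is being paid per packet, which is the case the message above is worried about.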
Robert N M Watson FreeBSD Core Team, TrustedBSD Projects robert@fledge.watson.org Senior Research Scientist, McAfee Research From owner-freebsd-arch@FreeBSD.ORG Fri May 21 12:20:00 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id ED51F16A4CE; Fri, 21 May 2004 12:19:59 -0700 (PDT) Received: from sccrmhc12.comcast.net (sccrmhc12.comcast.net [204.127.202.56]) by mx1.FreeBSD.org (Postfix) with ESMTP id 953A743D39; Fri, 21 May 2004 12:19:58 -0700 (PDT) (envelope-from julian@elischer.org) Received: from interjet.elischer.org ([24.7.73.28]) by comcast.net (sccrmhc12) with ESMTP id <2004052119194001200oj2q5e>; Fri, 21 May 2004 19:19:41 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id MAA87919; Fri, 21 May 2004 12:19:39 -0700 (PDT) Date: Fri, 21 May 2004 12:19:37 -0700 (PDT) From: Julian Elischer To: John Baldwin In-Reply-To: <200405210959.25368.jhb@FreeBSD.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@FreeBSD.org cc: freebsd-arch@FreeBSD.org Subject: Re: atomic reference counting primatives. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 19:20:00 -0000 On Fri, 21 May 2004, John Baldwin wrote: > On Thursday 20 May 2004 10:54 pm, M. Warner Losh wrote: > > In message: > > > > > > Julian Elischer writes: > > : This has been raised before but I've come across uses for it again and > > : again so I'm raising it again. > > : JHB once posted some atomic reference counting primitives. (Do you still > > : have them John?) > > : Alfred once said he had some somewhere too, and others have commented > > : on this before, but we still don't seem to have any.
> > : > > : every object is reference counted with its own code and > > : sometimes it's done poorly. > > : > > : Some people indicated that there are cases where a generic refcounter > > : can not be used and used this as a reason to not have one at all. > > : > > : So, here are some possibilities.. > > : my first "write it down without too much thinking" effort.. > > : > > : typedef {mumble} refcnt_t > > : > > : refcnt_add(refcnt_t *) > > : Increments the reference count.. no magic except to be atomic. > > : > > : > > : int refcnt_drop(refcnt_t *, struct mutex *) > > : Decrements the refcount. If it goes to 0 it returns 0 and locks the > > : mutex (if the mutex is supplied).. > > > > What prevents refcnt_add() from happening after ref count drops to 0? > > Wouldn't that be a race? Eg, if we have two threads:
> >
> > Thread A                        Thread B
> >
> > objp = lookup();
> > [1]                             refcnt_drop(&objp->ref, &objp->mtx);
> > [2] refcnt_add(&objp->ref);
> >     BANG!
> >
> > If [1] happens before [2], then bad things happen at BANG! If [2] > > happens before [1], then the mutex won't be locked at BANG and things > > are good. Thread A believes it has a valid reference to objp after the > > refcnt_add and no way of knowing otherwise. > > > > Is there a safe way to use the API into what you are proposing? > > This situation can't happen if you are properly using reference counting. For > the reference count to be at 1 in thread B, it has to have the only reference > meaning that the object has already been removed from any lists, etc. Exactly.. B needs to have got his copy of the reference from somewhere, and that reference should have been counted somewhere as should B's copy of it. So, the reference count should be at least 2 before B drops his reference.. and possibly 3..
I would even go on record as saying that I have seen and liked a refcount API which was (from memory, something like):

    void *refcnt_add(offsetof(struct obj, refcnt), void **object_p)

which takes a pointer to the object pointer you are copying, atomically increments the reference count, and returns the contents of the pointer. If the contents of the pointer are NULL, then it returns NULL and doesn't increment anything.. The reference decrement atomically reduced the reference count and zapped the pointer, and returned a copy of the pointer if the reference count had gone to 0 (or NULL if not). So usage was:

    struct xx *globalpointer; /* has its own owner somewhere */

    mypointer = refcnt_add(offsetof(struct xx, refcnt), &globalpointer);
    if (mypointer == NULL) {
        printf("didn't find an object\n");
        return (-1);
    }
    manipulate(mypointer);
    if ((tmppointer = refcnt_drop(&mypointer, &globalpointer))) {
        free(tmppointer);
    }

someone else who owns the globalpointer reference might in the meanwhile do:

    if ((tmppointer = refcnt_drop(globalpointer->refcnt, &globalpointer))) {
        free(tmppointer);
    }

and you were guaranteed to get a predictable result.
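For comparison, the simpler counter from the quoted proposal could be sketched as below. This is a minimal sketch, assuming C11 <stdatomic.h> instead of the kernel's own atomic primitives; refcnt_init, struct obj, obj_alloc and obj_release are hypothetical names for illustration, and the optional mutex argument of the proposed refcnt_drop() is left out. It relies on the rule stated in this thread: a caller may only take a new reference while it already holds a counted one, so the count can never be seen going from 0 back to 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct {
    atomic_uint count;
} refcnt_t;

/* A new object starts with one reference, owned by its creator. */
static void
refcnt_init(refcnt_t *r)
{
    atomic_init(&r->count, 1);
}

/* Take another reference; the caller must already hold one. */
static void
refcnt_add(refcnt_t *r)
{
    unsigned old;

    old = atomic_fetch_add_explicit(&r->count, 1, memory_order_relaxed);
    assert(old > 0); /* 0 here would be the race from the quoted diagram */
    (void)old;
}

/* Drop one reference; returns the remaining count, 0 if it was the last. */
static unsigned
refcnt_drop(refcnt_t *r)
{
    unsigned old;

    old = atomic_fetch_sub_explicit(&r->count, 1, memory_order_acq_rel);
    assert(old > 0);
    return (old - 1);
}

/* Hypothetical reference-counted object. */
struct obj {
    refcnt_t ref;
    int      payload;
};

static struct obj *
obj_alloc(int payload)
{
    struct obj *o = malloc(sizeof(*o));

    if (o == NULL)
        return (NULL);
    refcnt_init(&o->ref);
    o->payload = payload;
    return (o);
}

/* Release one reference; returns 1 if the object was freed, else 0. */
static int
obj_release(struct obj *o)
{
    if (refcnt_drop(&o->ref) == 0) {
        free(o);
        return (1);
    }
    return (0);
}
```

With this shape the object is freed by whichever holder drops the last reference, and every holder that wants to hand out a copy calls refcnt_add() before doing so.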
From owner-freebsd-arch@FreeBSD.ORG Fri May 21 12:36:39 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CFE4816A4CE for ; Fri, 21 May 2004 12:36:39 -0700 (PDT) Received: from sccrmhc11.comcast.net (sccrmhc11.comcast.net [204.127.202.55]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7F08143D41 for ; Fri, 21 May 2004 12:36:39 -0700 (PDT) (envelope-from cristjc@comcast.net) Received: from blossom.cjclark.org (c-24-6-187-112.client.comcast.net[24.6.187.112]) by comcast.net (sccrmhc11) with ESMTP id <2004052119360701100qhchde>; Fri, 21 May 2004 19:36:07 +0000 Received: from blossom.cjclark.org (localhost.
[127.0.0.1]) by blossom.cjclark.org (8.12.9p2/8.12.8) with ESMTP id i4LJa68B008351 for ; Fri, 21 May 2004 12:36:06 -0700 (PDT) (envelope-from cristjc@comcast.net) Received: (from cjc@localhost) by blossom.cjclark.org (8.12.9p2/8.12.9/Submit) id i4LJa5ni008350 for freebsd-arch@freebsd.org; Fri, 21 May 2004 12:36:05 -0700 (PDT) (envelope-from cristjc@comcast.net) X-Authentication-Warning: blossom.cjclark.org: cjc set sender to cristjc@comcast.net using -f Date: Fri, 21 May 2004 12:36:05 -0700 From: "Crist J. Clark" To: freebsd-arch@freebsd.org Message-ID: <20040521193605.GA8246@blossom.cjclark.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-URL: http://people.freebsd.org/~cjc/ Subject: Move /usr/sup to /var/db/sup? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: "Crist J. Clark" List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 19:36:39 -0000 Just a minor thing, but I would think[0] most people would agree that /var/db/sup is a much more logical place for the CVSup "base" directory than /usr/sup. Yes, it doesn't take up much space on /usr, but for those who don't want to write to /usr[1] too much or mount /usr read- only, it's an irritant. Of course, there is one big reason not to change it, because it would be a change. Personally, I don't think it will be disruptive to make changes to the example files in /usr/share/examples/cvsup. People who already have /usr/sup populated are using their own localized versions of these files, so the change won't affect them (not that losing the "sup" directory is that big of a deal). A person starting with a copy of one of the examples is probably starting a fresh CVSup and will be creating a new sup dir anyway. 
Anyone have objections to going through the example supfiles with,

--- cvs-supfile 4 May 2004 20:03:50 -0000 1.42
+++ cvs-supfile 21 May 2004 19:30:23 -0000
@@ -53 +53 @@
-*default base=/usr
+*default base=/var/db

[0] But with any seemingly insignificant change like this, there is an excellent chance some people out there do not agree and will be quite vocal about it. [1] Or / if /usr doesn't have its own file system. The same arguments apply. -- Crist J. Clark | cjclark@alum.mit.edu | cjclark@jhu.edu http://people.freebsd.org/~cjc/ | cjc@freebsd.org From owner-freebsd-arch@FreeBSD.ORG Fri May 21 12:42:58 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7284B16A4CE; Fri, 21 May 2004 12:42:58 -0700 (PDT) Received: from odin.ac.hmc.edu (Odin.AC.HMC.Edu [134.173.32.75]) by mx1.FreeBSD.org (Postfix) with ESMTP id 570B443D1D; Fri, 21 May 2004 12:42:58 -0700 (PDT) (envelope-from brdavis@odin.ac.hmc.edu) Received: from odin.ac.hmc.edu (IDENT:brdavis@localhost.localdomain [127.0.0.1]) by odin.ac.hmc.edu (8.12.10/8.12.10) with ESMTP id i4LJgWs0023268; Fri, 21 May 2004 12:42:32 -0700 Received: (from brdavis@localhost) by odin.ac.hmc.edu (8.12.10/8.12.3/Submit) id i4LJgWMb023267; Fri, 21 May 2004 12:42:32 -0700 Date: Fri, 21 May 2004 12:42:32 -0700 From: Brooks Davis To: "Crist J. Clark" Message-ID: <20040521194231.GA22816@Odin.AC.HMC.Edu> References: <20040521193605.GA8246@blossom.cjclark.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="liOOAslEiF7prFVr" Content-Disposition: inline In-Reply-To: <20040521193605.GA8246@blossom.cjclark.org> User-Agent: Mutt/1.5.4i cc: freebsd-arch@freebsd.org Subject: Re: Move /usr/sup to /var/db/sup?
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 19:42:58 -0000 --liOOAslEiF7prFVr Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, May 21, 2004 at 12:36:05PM -0700, Crist J. Clark wrote: > Just a minor thing, but I would think[0] most people would agree that > /var/db/sup is a much more logical place for the CVSup "base" directory > than /usr/sup. Yes, it doesn't take up much space on /usr, but for > those who don't want to write to /usr[1] too much or mount /usr read- > only, it's an irritant. > > Of course, there is one big reason not to change it, because it would > be a change. > > Personally, I don't think it will be disruptive to make changes to the > example files in /usr/share/examples/cvsup. People who already have > /usr/sup populated are using their own localized versions of these > files, so the change won't affect them (not that losing the "sup" > directory is that big of a deal). A person starting with a copy of > one of the examples is probably starting a fresh CVSup and will be > creating a new sup dir anyway. This seems reasonable. If you're going to do it, I suggest adding /var/db/sup to the appropriate mtree file so it always exists. That way people who blindly copy their supfiles to /usr/sup will still get something that works and their finger memory won't be broken. -- Brooks -- Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4 --liOOAslEiF7prFVr Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQFArlunXY6L6fI4GtQRAi9IAJ9Kas1YehLIg+jiBKxoIKS0K7/9XQCgjlgC Li3+tNYoMlYHK/a9sS42vBQ= =1I6l -----END PGP SIGNATURE----- --liOOAslEiF7prFVr-- From owner-freebsd-arch@FreeBSD.ORG Fri May 21 14:32:46 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4180516A4CE; Fri, 21 May 2004 14:32:46 -0700 (PDT) Received: from mail023.syd.optusnet.com.au (mail023.syd.optusnet.com.au [211.29.132.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id E09D343D2F; Fri, 21 May 2004 14:32:44 -0700 (PDT) (envelope-from PeterJeremy@optushome.com.au) Received: from cirb503493.alcatel.com.au (c211-30-75-229.belrs2.nsw.optusnet.com.au [211.30.75.229]) i4LLW9j31023; Sat, 22 May 2004 07:32:09 +1000 Received: from cirb503493.alcatel.com.au (localhost.alcatel.com.au [127.0.0.1])i4LLW9cj091833; Sat, 22 May 2004 07:32:09 +1000 (EST) (envelope-from pjeremy@cirb503493.alcatel.com.au) Received: (from pjeremy@localhost)i4LLW8Za091832; Sat, 22 May 2004 07:32:08 +1000 (EST) (envelope-from pjeremy) Date: Sat, 22 May 2004 07:32:08 +1000 From: Peter Jeremy To: Andre Oppermann Message-ID: <20040521213208.GA87546@cirb503493.alcatel.com.au> References: <40AD2405.DC13B45C@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <40AD2405.DC13B45C@freebsd.org> User-Agent: Mutt/1.4.2i cc: arch@freebsd.org cc: Robert Watson Subject: Re: Network Stack Locking X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 21:32:46 -0000 On Thu, 
2004-May-20 23:32:53 +0200, Andre Oppermann wrote: >Robert Watson wrote: >... >> Note that there are some serious issues with the current locking changes: >... >> > >I vote for the approach to get in as much as possible from the moment >on it is known to work *correctly* (not necessarily perfectly optimal/ >optimized). Having something correct is an ideal base to start for >optimizing. There I'm ready to jump in and go ahead to make things >better by re-arranging or re-writing them. Keep in mind that the best improvements in performance are achieved by using a better algorithm - macro-optimisation rather than micro- optimisation. We currently have a network stack that works correctly and should be careful about committing WIP code that may be heading in the wrong direction. >Progress happens incrementally. Put in Green's kqueue locking, have >that working correctly and make it perfect in a second step. I don't believe this is the correct approach at this time. Brian's code removes functionality that people have stated that they _do_ use. In theory, John-Mark's approach offers better performance without the loss of functionality. Before implementing Brian's code, the Project needs to decide which direction it should move in.
-- Peter Jeremy From owner-freebsd-arch@FreeBSD.ORG Fri May 21 14:38:59 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 30E0116A4CE; Fri, 21 May 2004 14:38:59 -0700 (PDT) Received: from smtp2.server.rpi.edu (smtp2.server.rpi.edu [128.113.2.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 93F4C43D39; Fri, 21 May 2004 14:38:58 -0700 (PDT) (envelope-from drosih@rpi.edu) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by smtp2.server.rpi.edu (8.12.8/8.12.8) with ESMTP id i4LLcvIX031059; Fri, 21 May 2004 17:38:57 -0400 Mime-Version: 1.0 X-Sender: drosih@mail.rpi.edu Message-Id: In-Reply-To: <20040521193605.GA8246@blossom.cjclark.org> References: <20040521193605.GA8246@blossom.cjclark.org> Date: Fri, 21 May 2004 17:38:56 -0400 To: "Crist J. Clark" , freebsd-arch@freebsd.org From: Garance A Drosihn Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-Scanned-By: CanIt (www . canit . ca) Subject: Re: Move /usr/sup to /var/db/sup? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 21:38:59 -0000 At 12:36 PM -0700 5/21/04, Crist J. Clark wrote: >Just a minor thing, but I would think[0] most people would >agree that /var/db/sup is a much more logical place for the >CVSup "base" directory than /usr/sup. Yes, it doesn't take >up much space on /usr, but for those who don't want to write >to /usr[1] too much or mount /usr read-only, it's an irritant. > >Of course, there is one big reason not to change it, because >it would be a change. I have all my own sup-files anyway, so I do not have any strong opinion on this. 
But there is one minor advantage that I have noticed in having the "base=" directory in the same partition as the "prefix=" directory. If the partition matching "prefix=" is not mounted, and if the "base=" file is on that partition, then any attempt to run the cvsup will immediately fail. However, if the "prefix=" partition is not mounted, and the "base=" directory *is* available (because it is on a different partition), then the cvsup will go right ahead and download everything into the wrong partition. Depending on how your machine is set up, this can be rather disastrous... (it was for me, at least!) I have no idea if that is why someone went with /usr/sup in the example supfiles, though. I do not object to making the change to use /var/db/sup, but then I don't use those example files so my vote wouldn't mean much anyway... :-) -- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu From owner-freebsd-arch@FreeBSD.ORG Fri May 21 14:39:48 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5F04016A4CE for ; Fri, 21 May 2004 14:39:48 -0700 (PDT) Received: from sccrmhc13.comcast.net (sccrmhc13.comcast.net [204.127.202.64]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0AE8743D45 for ; Fri, 21 May 2004 14:39:48 -0700 (PDT) (envelope-from cristjc@comcast.net) Received: from blossom.cjclark.org (c-24-6-187-112.client.comcast.net[24.6.187.112]) by comcast.net (sccrmhc13) with ESMTP id <2004052121394601600966s6e>; Fri, 21 May 2004 21:39:47 +0000 Received: from blossom.cjclark.org (localhost. 
[127.0.0.1]) by blossom.cjclark.org (8.12.9p2/8.12.8) with ESMTP id i4LLdj8B008718; Fri, 21 May 2004 14:39:45 -0700 (PDT) (envelope-from cristjc@comcast.net) Received: (from cjc@localhost) by blossom.cjclark.org (8.12.9p2/8.12.9/Submit) id i4LLdilN008717; Fri, 21 May 2004 14:39:44 -0700 (PDT) (envelope-from cristjc@comcast.net) X-Authentication-Warning: blossom.cjclark.org: cjc set sender to cristjc@comcast.net using -f Date: Fri, 21 May 2004 14:39:44 -0700 From: "Crist J. Clark" To: Brooks Davis Message-ID: <20040521213944.GB8246@blossom.cjclark.org> References: <20040521193605.GA8246@blossom.cjclark.org> <20040521194231.GA22816@Odin.AC.HMC.Edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20040521194231.GA22816@Odin.AC.HMC.Edu> User-Agent: Mutt/1.4.2.1i X-URL: http://people.freebsd.org/~cjc/ cc: freebsd-arch@freebsd.org Subject: Re: Move /usr/sup to /var/db/sup? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: cjclark@alum.mit.edu List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 21:39:48 -0000 On Fri, May 21, 2004 at 12:42:32PM -0700, Brooks Davis wrote: > On Fri, May 21, 2004 at 12:36:05PM -0700, Crist J. Clark wrote: > > Just a minor thing, but I would think[0] most people would agree that > > /var/db/sup is a much more logical place for the CVSup "base" directory > > than /usr/sup. Yes, it doesn't take up much space on /usr, but for > > those who don't want to write to /usr[1] too much or mount /usr read- > > only, it's an irritant. > > > > Of course, there is one big reason not to change it, because it would > > be a change. > > > > Personally, I don't think it will be disruptive to make changes to the > > example files in /usr/share/examples/cvsup. 
People who already have > > /usr/sup populated are using their own localized versions of these > > files, so the change won't affect them (not that losing the "sup" > > directory is that big of a deal). A person starting with a copy of > > one of the examples is probably starting a fresh CVSup and will be > > creating a new sup dir anyway. > > This seems reasonable. If you're going to do it, I suggest adding > /var/db/sup to the appropriate mtree file so it always exists. That > way people who blindly copy their supfiles to /usr/sup will still get > something that works and their finger memory won't be broken. Hmmm... /usr/sup is not in BSD.mtree.usr. I believe cvsup(1) creates it when it does not exist. Are you saying we should add it to BSD.mtree.var even though we don't create /usr/sup? Or are you saying to create /var/db/sup and make a symlink in /usr to it? -- Crist J. Clark | cjclark@alum.mit.edu | cjclark@jhu.edu http://people.freebsd.org/~cjc/ | cjc@freebsd.org From owner-freebsd-arch@FreeBSD.ORG Fri May 21 14:47:36 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3EB0116A4CE for ; Fri, 21 May 2004 14:47:36 -0700 (PDT) Received: from odin.ac.hmc.edu (Odin.AC.HMC.Edu [134.173.32.75]) by mx1.FreeBSD.org (Postfix) with ESMTP id 23DC343D3F for ; Fri, 21 May 2004 14:47:36 -0700 (PDT) (envelope-from brdavis@odin.ac.hmc.edu) Received: from odin.ac.hmc.edu (IDENT:brdavis@localhost.localdomain [127.0.0.1]) by odin.ac.hmc.edu (8.12.10/8.12.10) with ESMTP id i4LLlYs0002618; Fri, 21 May 2004 14:47:34 -0700 Received: (from brdavis@localhost) by odin.ac.hmc.edu (8.12.10/8.12.3/Submit) id i4LLlYQx002616; Fri, 21 May 2004 14:47:34 -0700 Date: Fri, 21 May 2004 14:47:34 -0700 From: Brooks Davis To: cjclark@alum.mit.edu Message-ID: <20040521214733.GA1549@Odin.AC.HMC.Edu> References: <20040521193605.GA8246@blossom.cjclark.org>
<20040521194231.GA22816@Odin.AC.HMC.Edu> <20040521213944.GB8246@blossom.cjclark.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="u3/rZRmxL6MmkK24" Content-Disposition: inline In-Reply-To: <20040521213944.GB8246@blossom.cjclark.org> User-Agent: Mutt/1.5.4i cc: freebsd-arch@freebsd.org Subject: Re: Move /usr/sup to /var/db/sup? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 21:47:36 -0000 --u3/rZRmxL6MmkK24 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, May 21, 2004 at 02:39:44PM -0700, Crist J. Clark wrote: > On Fri, May 21, 2004 at 12:42:32PM -0700, Brooks Davis wrote: > > On Fri, May 21, 2004 at 12:36:05PM -0700, Crist J. Clark wrote: > > > Just a minor thing, but I would think[0] most people would agree that > > > /var/db/sup is a much more logical place for the CVSup "base" directory > > > than /usr/sup. Yes, it doesn't take up much space on /usr, but for > > > those who don't want to write to /usr[1] too much or mount /usr read- > > > only, it's an irritant. > > > > > > Of course, there is one big reason not to change it, because it would > > > be a change. > > > > > > Personally, I don't think it will be disruptive to make changes to the > > > example files in /usr/share/examples/cvsup. People who already have > > > /usr/sup populated are using their own localized versions of these > > > files, so the change won't affect them (not that losing the "sup" > > > directory is that big of a deal). A person starting with a copy of > > > one of the examples is probably starting a fresh CVSup and will be > > > creating a new sup dir anyway. > > > > This seems reasonable.
If you're going to do it, I suggest adding > > /var/db/sup to the appropriate mtree file so it always exists. That > > way people who blindly copy their supfiles to /usr/sup will still get > > something that works and their finger memory won't be broken. > > Hmmm... /usr/sup is not in BSD.mtree.usr. I believe cvsup(1) creates > it when it does not exist. Are you saying we should add it to > BSD.mtree.var even though we don't create /usr/sup? Or are you > saying to create /var/db/sup and make a symlink in /usr to it? Hmm, for some reason I've always copied the example supfiles to /usr/sup before editing them. For some reason I'd assumed this was something I read in the documentation in the distant past, but I can't find any evidence of such a recommendation so I may have just made that convention up. Given that, I don't think it's necessary to create /var/db/sup, but it might be a nice idea anyway. Another directory in /var/db certainly wouldn't hurt anything. -- Brooks -- Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4 --u3/rZRmxL6MmkK24 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD4DBQFArnj1XY6L6fI4GtQRAq3FAJjixUcHH79yxhJh6yApO6oDR+0jAKCiCJMu Cs3rrQoZ4UQPnmS8AxRIHA== =adDF -----END PGP SIGNATURE----- --u3/rZRmxL6MmkK24-- From owner-freebsd-arch@FreeBSD.ORG Fri May 21 14:57:41 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4662716A4CF for ; Fri, 21 May 2004 14:57:41 -0700 (PDT) Received: from mail.soaustin.net (mail.soaustin.net [207.200.4.66]) by mx1.FreeBSD.org (Postfix) with ESMTP id 20B6143D1D for ; Fri, 21 May 2004 14:57:41 -0700 (PDT) (envelope-from linimon@lonesome.com) Received: by mail.soaustin.net (Postfix, from userid 502) id 2F9BD148C1; Fri, 21 May 2004 16:57:24 -0500 (CDT) Date: Fri, 21 May 2004 16:57:24 -0500 (CDT) From: Mark Linimon X-X-Sender: linimon@pancho To: Brooks Davis In-Reply-To: <20040521214733.GA1549@Odin.AC.HMC.Edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: cjclark@alum.mit.edu cc: freebsd-arch@freebsd.org Subject: Re: Move /usr/sup to /var/db/sup? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 21:57:41 -0000 Could we also consider the approach of leaving the supfiles in /usr/sup (since they are small and only rarely change) and having the files that change in /var/db/sup, or does the directory need to be the same? 
mcl From owner-freebsd-arch@FreeBSD.ORG Fri May 21 15:02:49 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 44ECD16A4CE for ; Fri, 21 May 2004 15:02:49 -0700 (PDT) Received: from odin.ac.hmc.edu (Odin.AC.HMC.Edu [134.173.32.75]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1FD0D43D41 for ; Fri, 21 May 2004 15:02:49 -0700 (PDT) (envelope-from brdavis@odin.ac.hmc.edu) Received: from odin.ac.hmc.edu (IDENT:brdavis@localhost.localdomain [127.0.0.1]) by odin.ac.hmc.edu (8.12.10/8.12.10) with ESMTP id i4LM2ls0003911; Fri, 21 May 2004 15:02:47 -0700 Received: (from brdavis@localhost) by odin.ac.hmc.edu (8.12.10/8.12.3/Submit) id i4LM2lE5003909; Fri, 21 May 2004 15:02:47 -0700 Date: Fri, 21 May 2004 15:02:47 -0700 From: Brooks Davis To: Mark Linimon Message-ID: <20040521220247.GA3366@Odin.AC.HMC.Edu> References: <20040521214733.GA1549@Odin.AC.HMC.Edu> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="qDbXVdCdHGoSgWSk" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.4i cc: cjclark@alum.mit.edu cc: freebsd-arch@freebsd.org Subject: Re: Move /usr/sup to /var/db/sup? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 22:02:49 -0000 --qDbXVdCdHGoSgWSk Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, May 21, 2004 at 04:57:24PM -0500, Mark Linimon wrote: > Could we also consider the approach of leaving the supfiles in > /usr/sup (since they are small and only rarely change) and having > the files that change in /var/db/sup, or does the directory need > to be the same? 
I don't think it matters to cvsup where the supfiles live. I'll almost certainly keep putting them in /usr/sup because that's where my fingers think they should be. Pre-creating /var/db/sup would make this easier for me. -- Brooks -- Any statement of the form "X is the one, true Y" is FALSE. PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4 --qDbXVdCdHGoSgWSk Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQFArnyGXY6L6fI4GtQRAmDMAJ43tWh641cq4l1cp/N6dN90Wb5/swCeMLzq ZscgJDSfLrpvg5AvOIpdKB8= =MgoT -----END PGP SIGNATURE----- --qDbXVdCdHGoSgWSk-- From owner-freebsd-arch@FreeBSD.ORG Fri May 21 15:16:06 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3CD0616A4CE for ; Fri, 21 May 2004 15:16:06 -0700 (PDT) Received: from sccrmhc13.comcast.net (sccrmhc13.comcast.net [204.127.202.64]) by mx1.FreeBSD.org (Postfix) with ESMTP id B10A843D1D for ; Fri, 21 May 2004 15:16:05 -0700 (PDT) (envelope-from cristjc@comcast.net) Received: from blossom.cjclark.org (c-24-6-187-112.client.comcast.net[24.6.187.112]) by comcast.net (sccrmhc13) with ESMTP id <20040521221556016009chqfe>; Fri, 21 May 2004 22:15:56 +0000 Received: from blossom.cjclark.org (localhost. [127.0.0.1]) by blossom.cjclark.org (8.12.9p2/8.12.8) with ESMTP id i4LMFt8B008893; Fri, 21 May 2004 15:15:55 -0700 (PDT) (envelope-from cristjc@comcast.net) Received: (from cjc@localhost) by blossom.cjclark.org (8.12.9p2/8.12.9/Submit) id i4LMFslK008892; Fri, 21 May 2004 15:15:54 -0700 (PDT) (envelope-from cristjc@comcast.net) X-Authentication-Warning: blossom.cjclark.org: cjc set sender to cristjc@comcast.net using -f Date: Fri, 21 May 2004 15:15:54 -0700 From: "Crist J. 
Clark" To: Mark Linimon Message-ID: <20040521221554.GA8734@blossom.cjclark.org> References: <20040521214733.GA1549@Odin.AC.HMC.Edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i X-URL: http://people.freebsd.org/~cjc/ cc: freebsd-arch@freebsd.org Subject: Re: Move /usr/sup to /var/db/sup? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: cjclark@alum.mit.edu List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 22:16:06 -0000 On Fri, May 21, 2004 at 04:57:24PM -0500, Mark Linimon wrote: > Could we also consider the approach of leaving the supfiles in > /usr/sup (since they are small and only rarely change) and having > the files that change in /var/db/sup, or does the directory need > to be the same? You can put the supfiles wherever you want. There is no standard place for a supfile. Since the "base" is specified in the supfile, there is a chicken-and-egg problem of placing the supfile in the base directory and expecting CVSup to find it. In addition, it would probably make more sense to put supfiles in the base directory (which is /usr in the examples) rather than in the sup directory (/usr/sup). I suspect most would consider having supfiles in /usr quite an affront. I didn't want to have to discuss the implications of the fact that the hardcoded base default in cvsup(1) is /usr/local/etc/cvsup, but that is probably the most logical place to put supfiles (logical as in "the place someone else might actually find them"). -- Crist J. 
Clark | cjclark@alum.mit.edu | cjclark@jhu.edu http://people.freebsd.org/~cjc/ | cjc@freebsd.org From owner-freebsd-arch@FreeBSD.ORG Fri May 21 15:40:54 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AF77E16A4CF for ; Fri, 21 May 2004 15:40:54 -0700 (PDT) Received: from mail.soaustin.net (mail.soaustin.net [207.200.4.66]) by mx1.FreeBSD.org (Postfix) with ESMTP id 95B4643D3F for ; Fri, 21 May 2004 15:40:54 -0700 (PDT) (envelope-from linimon@lonesome.com) Received: by mail.soaustin.net (Postfix, from userid 502) id 9998F148B2; Fri, 21 May 2004 17:40:14 -0500 (CDT) Date: Fri, 21 May 2004 17:40:14 -0500 (CDT) From: Mark Linimon X-X-Sender: linimon@pancho To: cjclark@alum.mit.edu In-Reply-To: <20040521221554.GA8734@blossom.cjclark.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: Mark Linimon cc: freebsd-arch@freebsd.org Subject: Re: Move /usr/sup to /var/db/sup? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 May 2004 22:40:54 -0000 Um, by "supfiles" I was meaning *-supfile, in case that wasn't clear? 
mcl From owner-freebsd-arch@FreeBSD.ORG Fri May 21 17:45:16 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8A5DF16A4CE; Fri, 21 May 2004 17:45:16 -0700 (PDT) Received: from smtp2.server.rpi.edu (smtp2.server.rpi.edu [128.113.2.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 352ED43D39; Fri, 21 May 2004 17:45:16 -0700 (PDT) (envelope-from drosih@rpi.edu) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by smtp2.server.rpi.edu (8.12.8/8.12.8) with ESMTP id i4M0igIX005961; Fri, 21 May 2004 20:44:43 -0400 Mime-Version: 1.0 X-Sender: drosih@mail.rpi.edu Message-Id: In-Reply-To: References: Date: Fri, 21 May 2004 20:44:41 -0400 To: Julian Elischer , arch@freebsd.org From: Garance A Drosihn Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-Scanned-By: CanIt (www . canit . ca) cc: mtm@freebsd.org Subject: Re: atomic reference counting primatives. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 May 2004 00:45:16 -0000 At 1:56 PM -0700 5/20/04, Julian Elischer wrote: >This has been raised before but I have come across uses for >it again and again so I'm raising it again. JHB once posted >some atomic reference counting primitives. (Do you still have >them John?) Alfred once said he had some somewhere too, and >others have commented on this before, but we still don't seem >to have any. Btw, does this thread have anything to do with the present buildworld-breakage for sparc64? 
I notice the compile-time errors are something like: /usr/src/lib/libthr/thread/thr_cancel.c: In function `testcancel': /usr/src/lib/libthr/thread/thr_cancel.c:123: warning: passing arg 1 of `atomic_cmpset_int' from incompatible pointer type My guess is that this is related to Mike's change to "Make libthr async-signal-safe without costly signal masking. [...etc...]". This breakage underlines one reason that it would be mighty convenient to have some "official" set of primitives. It is one thing if a developer has to roll-their-own solution for i386, but somewhat more challenging if that solution has to work across a half-dozen different hardware platforms. This also suggests that it would be nice if the primitives could be written so that if the wrong type-of-parameters are given, the compiles will fail on *all* platforms. -- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu From owner-freebsd-arch@FreeBSD.ORG Fri May 21 18:53:28 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D4AC416A4CE; Fri, 21 May 2004 18:53:28 -0700 (PDT) Received: from rwcrmhc12.comcast.net (rwcrmhc12.comcast.net [216.148.227.85]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7E3B943D39; Fri, 21 May 2004 18:53:28 -0700 (PDT) (envelope-from julian@elischer.org) Received: from interjet.elischer.org ([24.7.73.28]) by comcast.net (rwcrmhc12) with ESMTP id <2004052201531401400g304le>; Sat, 22 May 2004 01:53:14 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id SAA91906; Fri, 21 May 2004 18:53:13 -0700 (PDT) Date: Fri, 21 May 2004 18:53:11 -0700 (PDT) From: Julian Elischer To: Garance A Drosihn In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org cc: 
mtm@freebsd.org Subject: Re: atomic reference counting primatives. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 May 2004 01:53:28 -0000 On Fri, 21 May 2004, Garance A Drosihn wrote: > At 1:56 PM -0700 5/20/04, Julian Elischer wrote: > >This has been raised before but I have come across uses for > >it again and again so I'm raising it again. JHB once posted > >some atomic reference counting primitives. (Do you still have > >them John?) Alfred once said he had some somewhere too, and > >others have commented on this before, but we still don't seem > >to have any. > > Btw, does this thread have anything to do with the present > buildworld-breakage for sparc64? Not specifically, but for the reasons you outline below, it's an example of the kind of reason one might have for doing it. > I notice the compile-time > errors are something like: > > /usr/src/lib/libthr/thread/thr_cancel.c: In function `testcancel': > /usr/src/lib/libthr/thread/thr_cancel.c:123: warning: passing > arg 1 of `atomic_cmpset_int' from incompatible pointer type > > My guess is that this is related to Mike's change to "Make libthr > async-signal-safe without costly signal masking. [...etc...]". > > This breakage underlines one reason that it would be mighty > convenient to have some "official" set of primitives. It is > one thing if a developer has to roll-their-own solution for > i386, but somewhat more challenging if that solution has to > work across a half-dozen different hardware platforms. > > This also suggests that it would be nice if the primitives > could be written so that if the wrong type-of-parameters are > given, the compiles will fail on *all* platforms. 
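[Editorial sketch, not part of the original message.] The request quoted above, primitives that reject a wrong argument type on every platform, could be approached by wrapping the counter in its own struct type. A mismatched pointer is then an incompatible-pointer diagnostic regardless of how wide int or long happen to be on a given architecture. The names `struct ref`, `ref_init`, `ref_hold` and `ref_drop` are hypothetical and are not an existing FreeBSD interface. The sketch also assumes C11 `<stdatomic.h>`, which postdates this thread; in-tree code of the time would presumably have been built on atomic(9) operations instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical reference counter. Wrapping the counter in a struct
 * means callers cannot pass a bare int * or long * by accident.
 */
struct ref {
    atomic_uint count;
};

/* Set the initial number of references (normally 1 for the creator). */
static inline void ref_init(struct ref *r, unsigned int initial)
{
    atomic_init(&r->count, initial);
}

/* Take one more reference; the caller must already hold one. */
static inline void ref_hold(struct ref *r)
{
    atomic_fetch_add_explicit(&r->count, 1u, memory_order_relaxed);
}

/*
 * Drop one reference. Returns true if this call released the last
 * reference, in which case the caller is responsible for freeing
 * the object that embeds the counter.
 */
static inline bool ref_drop(struct ref *r)
{
    unsigned int old =
        atomic_fetch_sub_explicit(&r->count, 1u, memory_order_acq_rel);

    assert(old > 0);    /* catch underflow in debug builds */
    return old == 1;
}
```

A typical caller would embed `struct ref` in its own object, call `ref_hold()` when handing out a pointer, and free the object only when `ref_drop()` returns true. Whether a mismatched pointer is a warning or a hard error still depends on compiler flags such as -Werror, so this only makes the diagnostic uniform across platforms; it does not by itself force a build failure.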
> > -- > Garance Alistair Drosehn = gad@gilead.netel.rpi.edu > Senior Systems Programmer or gad@freebsd.org > Rensselaer Polytechnic Institute or drosih@rpi.edu > From owner-freebsd-arch@FreeBSD.ORG Sat May 22 02:16:33 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7BAA816A4CE; Sat, 22 May 2004 02:16:33 -0700 (PDT) Received: from mx.nsu.ru (mx.nsu.ru [212.192.164.5]) by mx1.FreeBSD.org (Postfix) with ESMTP id 146D543D2D; Sat, 22 May 2004 02:16:33 -0700 (PDT) (envelope-from danfe@regency.nsu.ru) Received: from regency.nsu.ru ([193.124.210.26]) by mx.nsu.ru with esmtp (Exim 4.30) id 1BRSlr-0004De-Ce; Sat, 22 May 2004 16:26:23 +0700 Received: from regency.nsu.ru (localhost [127.0.0.1]) by regency.nsu.ru (8.12.10/8.12.10) with ESMTP id i4M9IMAT053216; Sat, 22 May 2004 16:18:22 +0700 (NOVST) (envelope-from danfe@regency.nsu.ru) Received: (from danfe@localhost) by regency.nsu.ru (8.12.10/8.12.10/Submit) id i4M9IMeu053182; Sat, 22 May 2004 16:18:22 +0700 (NOVST) (envelope-from danfe) Date: Sat, 22 May 2004 16:18:22 +0700 From: Alexey Dokuchaev To: "Crist J. Clark" Message-ID: <20040522091822.GA50435@regency.nsu.ru> References: <20040521193605.GA8246@blossom.cjclark.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20040521193605.GA8246@blossom.cjclark.org> User-Agent: Mutt/1.4.2.1i cc: freebsd-arch@freebsd.org Subject: Re: Move /usr/sup to /var/db/sup? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 May 2004 09:16:33 -0000 On Fri, May 21, 2004 at 12:36:05PM -0700, Crist J. 
Clark wrote: > Just a minor thing, but I would think[0] most people would agree that > /var/db/sup is a much more logical place for the CVSup "base" directory > than /usr/sup. Yes, it doesn't take up much space on /usr, but for > those who don't want to write to /usr[1] too much or mount /usr read- > only, it's an irritant. > > Of course, there is one big reason not to change it, because it would > be a change. FWIW, a compatibility symlink can hang in there for a while (until 6.0 maybe). ./danfe From owner-freebsd-arch@FreeBSD.ORG Sat May 22 11:18:06 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6004316A4CE for ; Sat, 22 May 2004 11:18:06 -0700 (PDT) Received: from blake.polstra.com (blake.polstra.com [64.81.189.66]) by mx1.FreeBSD.org (Postfix) with ESMTP id 01B6C43D1F for ; Sat, 22 May 2004 11:18:06 -0700 (PDT) (envelope-from jdp@polstra.com) Received: from t30w.polstra.com (dsl081-189-078.sea1.dsl.speakeasy.net [64.81.189.78]) by blake.polstra.com (8.12.11/8.12.11) with ESMTP id i4MIHQog088118 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Sat, 22 May 2004 11:17:26 -0700 (PDT) (envelope-from jdp@mail.polstra.com) Received: from t30w.polstra.com (localhost [127.0.0.1]) by t30w.polstra.com (8.12.11/8.12.11) with ESMTP id i4MIHPl5000281; Sat, 22 May 2004 11:17:25 -0700 (PDT) (envelope-from jdp@t30w.polstra.com) Received: (from jdp@localhost) by t30w.polstra.com (8.12.11/8.12.11/Submit) id i4MIHOaN000280; Sat, 22 May 2004 11:17:24 -0700 (PDT) (envelope-from jdp) Message-ID: X-Mailer: XFMail 1.5.5 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: Date: Sat, 22 May 2004 11:17:24 -0700 (PDT) From: John Polstra To: Garance A Drosihn X-Bogosity: No, tests=bogofilter, spamicity=0.088176, version=0.14.5 cc: freebsd-arch@freebsd.org Subject: Re: 
Move /usr/sup to /var/db/sup? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 May 2004 18:18:06 -0000 On 21-May-2004 Garance A Drosihn wrote: > I have no idea if that is why someone went with /usr/sup in > the example supfiles, though. I do. :-) That was the location in the original supfiles used by the old "sup" utility that CVSup replaced. When I first released CVSup I made it so that people could use their old supfiles unmodified -- simply because I wanted it to be as easy as possible to switch to CVSup so a lot of people would try it out. I agree that /usr/sup is a lousy place for the metadata, and that something under /var/db would be a lot more sensible. John From owner-freebsd-arch@FreeBSD.ORG Sat May 22 23:59:30 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from green.homeunix.org (freefall.freebsd.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id A42FD16A4CE; Sat, 22 May 2004 23:59:30 -0700 (PDT) Received: from green.homeunix.org (green@localhost [127.0.0.1]) by green.homeunix.org (8.12.11/8.12.11) with ESMTP id i4N6xUEc059178; Sun, 23 May 2004 02:59:30 -0400 (EDT) (envelope-from green@green.homeunix.org) Received: (from green@localhost) by green.homeunix.org (8.12.11/8.12.11/Submit) id i4N6xTIH059177; Sun, 23 May 2004 02:59:29 -0400 (EDT) (envelope-from green) Date: Sun, 23 May 2004 02:59:28 -0400 From: Brian Feldman To: Peter Jeremy Message-ID: <20040523065928.GD51125@green.homeunix.org> References: <40AD2405.DC13B45C@freebsd.org> <20040521213208.GA87546@cirb503493.alcatel.com.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20040521213208.GA87546@cirb503493.alcatel.com.au> User-Agent: Mutt/1.5.6i cc: Robert Watson cc: arch@freebsd.org cc: Andre Oppermann Subject: Re: Network 
Stack Locking X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 23 May 2004 06:59:31 -0000 On Sat, May 22, 2004 at 07:32:08AM +1000, Peter Jeremy wrote: > On Thu, 2004-May-20 23:32:53 +0200, Andre Oppermann wrote: > >Robert Watson wrote: > >... > >> Note that there are some serious issues with the current locking changes: > >Progress happens incrementally. Put in Green's kqueue locking, have > >that working correctly and make it perfect in a second step. > > I don't believe this is the correct approach at this time. Brian's > code removes functionality that people have stated that they _do_ use. > In theory, John-Mark's approach offers better performance without the > loss of functionality. Before implementing Brian's code, the Project > needs to decide which direction it should move in. *shrug* I added recursive kqueues because some people indicated that they actually had reason to use it. I still haven't added the NOTE_TRACK functionality because there is no known project in the entire world that uses it, so it has no chance of breaking anything at all for me by not having it. Anyway, I still want to see any alternative kqueue locking implementations. I haven't even seen a complete enough description of what the proposed change is supposed to look like to know whether it actually solves all of the issues that kqueue has now. If someone posts all the details and not just bits and pieces.... I don't know why I am the only person to have taken a shot at a complete implementation when the subsystem is so completely MP-broken already. -- Brian Fundakowski Feldman \'[ FreeBSD ]''''''''''\ <> green@FreeBSD.org \ The Power to Serve! \ Opinions expressed are my own. \,,,,,,,,,,,,,,,,,,,,,,\