From owner-freebsd-arch@FreeBSD.ORG  Sun May 16 07:17:03 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 393BB16A4CE; Sun, 16 May 2004 07:17:03 -0700 (PDT)
Received: from comp.chem.msu.su (comp.chem.msu.su [158.250.32.97])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 96E1843D53; Sun, 16 May 2004 07:17:01 -0700 (PDT)
	(envelope-from yar@comp.chem.msu.su)
Received: from comp.chem.msu.su (localhost [127.0.0.1])
	by comp.chem.msu.su (8.12.9p2/8.12.9) with ESMTP id i4GEGx3F040353;
	Sun, 16 May 2004 18:16:59 +0400 (MSD)
	(envelope-from yar@comp.chem.msu.su)
Received: (from yar@localhost)
	by comp.chem.msu.su (8.12.9p2/8.12.9/Submit) id i4GEGwHc040352;
	Sun, 16 May 2004 18:16:59 +0400 (MSD)
	(envelope-from yar)
Date: Sun, 16 May 2004 18:16:58 +0400
From: Yar Tikhiy <yar@comp.chem.msu.su>
To: arch@freebsd.org, net@freebsd.org
Message-ID: <20040516141658.GA39893@comp.chem.msu.su>
References: <20040508034514.GA937@grosbein.pp.ru>
	<Pine.BSF.4.53.0405080636010.66978@e0-0.zab2.int.zabbadoz.net>
	<20040508132354.GB44214@comp.chem.msu.su>
	<20040515182157.GB89625@comp.chem.msu.su>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20040515182157.GB89625@comp.chem.msu.su>
User-Agent: Mutt/1.5.6i
cc: Eugene Grosbein <eugen@grosbein.pp.ru>
Subject: TIME_WAIT sockets from other users (was Re: bin/65928: [PATCH]
	stock ftpd uses superuser credentials for active mode sockets)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 May 2004 14:17:03 -0000

Note for the impatient:  This message does not discuss the well-known
issue of reusing local addresses through setting SO_REUSEADDR.  This
message is on reusing local addresses occupied by sockets belonging
to other users.

On Sat, May 15, 2004 at 10:21:57PM +0400, Yar Tikhiy wrote:
> 
> Attached below is a patch addressing the issue of the inability to
> reuse a local IP:port couple occupied by an established TCP connection
> from another user, but by no listeners.  Could anybody with fair
> understanding of our TCP/IP stack review it please?  Thanks.
> 
> -- 
> Yar
> 
> Index: in_pcb.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/netinet/in_pcb.c,v
> retrieving revision 1.146
> diff -u -p -r1.146 in_pcb.c
> --- in_pcb.c	23 Apr 2004 23:29:49 -0000	1.146
> +++ in_pcb.c	15 May 2004 17:37:18 -0000
> @@ -340,6 +340,8 @@ in_pcbbind_setup(inp, nam, laddrp, lport
>  						return (EADDRINUSE);
>  				} else
>  				if (t &&
> +				    (so->so_type != SOCK_STREAM ||
> +				     ntohl(t->inp_faddr.s_addr) == INADDR_ANY) &&
>  				    (ntohl(sin->sin_addr.s_addr) != INADDR_ANY ||
>  				     ntohl(t->inp_laddr.s_addr) != INADDR_ANY ||
>  				     (t->inp_socket->so_options &

One more detail to note:

Currently, if another user's socket is in the TIME_WAIT state, it
still counts as occupying the local IP:port pair.  I cannot see
the point of such behaviour.  The purpose of restricting bind() is
to prevent unprivileged port theft, but how can one steal a
connection in the TIME_WAIT state?

For FreeBSD-4 the above patch would take care of this case along
with established connections, but in CURRENT TIME_WAIT connections
are a special case since they no longer use full-blown state.
Therefore, for CURRENT the above patch turns into the one below.
Do I have a point?
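
The rule proposed for CURRENT can be modelled in user space as a single
predicate.  This is only a sketch under stated assumptions: struct
model_pcb and bind_conflicts() are hypothetical names, not kernel code,
and the SO_REUSEPORT and uid clauses are cut off in the patch hunk, so
they are assumed here to mirror the TIME_WAIT branch that the patch
removes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_INADDR_ANY 0u	/* same value as INADDR_ANY */

/* Hypothetical stand-in for the struct inpcb fields the check reads. */
struct model_pcb {
	bool		timewait;	/* INP_TIMEWAIT set in inp_vflag */
	uint32_t	laddr;		/* local address, host byte order */
	uint32_t	faddr;		/* foreign address; ANY if unconnected */
	bool		reuseport;	/* SO_REUSEPORT set (assumed clause) */
	unsigned	uid;		/* owner's uid (assumed clause) */
};

/*
 * Returns true when bind() should fail with EADDRINUSE because of the
 * existing PCB t (NULL if the lookup found nothing).
 */
static bool
bind_conflicts(const struct model_pcb *t, bool new_is_stream,
    uint32_t new_laddr, unsigned new_uid)
{
	return (t != NULL &&
	    !t->timewait &&			/* TIME_WAIT never blocks */
	    (!new_is_stream ||			/* non-TCP: rule unchanged */
	     t->faddr == MODEL_INADDR_ANY) &&	/* TCP: unconnected PCBs only */
	    (new_laddr != MODEL_INADDR_ANY ||
	     t->laddr != MODEL_INADDR_ANY ||
	     !t->reuseport) &&
	    new_uid != t->uid);
}
```

With this model, another user's listener still blocks the bind, while
another user's established or TIME_WAIT TCP connection does not.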

-- 
Yar

Index: in_pcb.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.146
diff -u -p -r1.146 in_pcb.c
--- in_pcb.c	23 Apr 2004 23:29:49 -0000	1.146
+++ in_pcb.c	16 May 2004 13:33:33 -0000
@@ -332,14 +332,10 @@ in_pcbbind_setup(inp, nam, laddrp, lport
 	 * XXX
 	 * This entire block sorely needs a rewrite.
 	 */
-				if (t && (t->inp_vflag & INP_TIMEWAIT)) {
-					if ((ntohl(sin->sin_addr.s_addr) != INADDR_ANY ||
-					    ntohl(t->inp_laddr.s_addr) != INADDR_ANY ||
-					    (intotw(t)->tw_so_options & SO_REUSEPORT) == 0) &&
-					    (so->so_cred->cr_uid != intotw(t)->tw_cred->cr_uid))
-						return (EADDRINUSE);
-				} else
 				if (t &&
+				    ((t->inp_vflag & INP_TIMEWAIT) == 0) &&
+				    (so->so_type != SOCK_STREAM ||
+				     ntohl(t->inp_faddr.s_addr) == INADDR_ANY) &&
 				    (ntohl(sin->sin_addr.s_addr) != INADDR_ANY ||
 				     ntohl(t->inp_laddr.s_addr) != INADDR_ANY ||
 				     (t->inp_socket->so_options &


From owner-freebsd-arch@FreeBSD.ORG  Mon May 17 16:28:32 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 1379316A4CE; Mon, 17 May 2004 16:28:32 -0700 (PDT)
Received: from comp.chem.msu.su (comp.chem.msu.su [158.250.32.97])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 0FF9B43D39; Mon, 17 May 2004 16:28:31 -0700 (PDT)
	(envelope-from yar@comp.chem.msu.su)
Received: from comp.chem.msu.su (localhost [127.0.0.1])
	by comp.chem.msu.su (8.12.9p2/8.12.9) with ESMTP id i4HNSS3F035506;
	Tue, 18 May 2004 03:28:28 +0400 (MSD)
	(envelope-from yar@comp.chem.msu.su)
Received: (from yar@localhost)
	by comp.chem.msu.su (8.12.9p2/8.12.9/Submit) id i4HNSRDB035501;
	Tue, 18 May 2004 03:28:27 +0400 (MSD)
	(envelope-from yar)
Date: Tue, 18 May 2004 03:28:27 +0400
From: Yar Tikhiy <yar@comp.chem.msu.su>
To: Cyrille Lefevre <clefevre-lists@9online.fr>
Message-ID: <20040517232827.GD27584@comp.chem.msu.su>
References: <20040515092114.GB67531@comp.chem.msu.su>
	<042601c43a6b$cd1cb9a0$7890a8c0@dyndns.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <042601c43a6b$cd1cb9a0$7890a8c0@dyndns.org>
User-Agent: Mutt/1.5.6i
cc: arch@freebsd.org
cc: hackers@freebsd.org
Subject: Re: Interoperation of flock(2), fcntl(2), and lockf(3)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 May 2004 23:28:32 -0000

On Sat, May 15, 2004 at 01:00:13PM +0200, Cyrille Lefevre wrote:
> "Yar Tikhiy" <yar@comp.chem.msu.su> wrote:
> [snip]
> > Considering all the above, I'd like to add the following paragraph
> > to the flock(2), lockf(3), and fcntl(2) man pages (replacing the
> > sentence quoted from lockf(3)):
> > 
> > The flock(2), fcntl(2), and lockf(3) locks are compatible.
> > Processes using different locking interfaces can cooperate
> > over the same file safely.  However, only one of such
> > interfaces should be used within a process.  If a file is
> 
> s/a process/the same process/ ?

Agreed, thanks!

BTW, since no objections were raised and Kirk encouraged me to make
the change (thank you Kirk!), I just did so.

-- 
Yar

From owner-freebsd-arch@FreeBSD.ORG  Thu May 20 13:31:19 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 83D5B16A4CE
	for <arch@FreeBSD.org>; Thu, 20 May 2004 13:31:19 -0700 (PDT)
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP id E020843D49
	for <arch@FreeBSD.org>; Thu, 20 May 2004 13:31:18 -0700 (PDT)
	(envelope-from robert@fledge.watson.org)
Received: from fledge.watson.org (localhost [127.0.0.1])
	by fledge.watson.org (8.12.11/8.12.11) with ESMTP id i4KKUQDe094733
	for <arch@FreeBSD.org>; Thu, 20 May 2004 16:30:27 -0400 (EDT)
	(envelope-from robert@fledge.watson.org)
Received: from localhost (robert@localhost)i4KKUQ2G094730
	for <arch@FreeBSD.org>; Thu, 20 May 2004 16:30:26 -0400 (EDT)
	(envelope-from robert@fledge.watson.org)
Date: Thu, 20 May 2004 16:30:26 -0400 (EDT)
From: Robert Watson <rwatson@FreeBSD.org>
X-Sender: robert@fledge.watson.org
To: arch@FreeBSD.org
Message-ID: <Pine.NEB.3.96L.1040520162957.90528H-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Subject: Network Stack Locking
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 20 May 2004 20:31:19 -0000


1.5 line summary: 

  This is an e-mail about the ongoing network stack locking work, and it
  contains largely technical material.

Executive summary:

  The high level view, for those less willing to wade through a greater
  level of detail, is that we have a substantial work in progress with a
  lot of our bases covered, and that we're looking for broader exposure
  for the work.  We've been merging smaller parts of the work (supporting
  infrastructure, fine-grained locking for specific leaf dependencies),
  and are starting to think about larger scale merging over the next
  month or two.  There are some known serious issues in the current work,
  but we've also identified some areas that need attention outside of the
  stack in order to make serious progress on merging.  There are also some
  important tasks that require owners moving forward, and a solicitation
  for those areas.  I don't attempt to capture everything in this e-mail;
  in particular, things like locking strategies are left out.  You will
  find patch URLs and Perforce references.

Body:

As many of you are aware, I've become the latest inheritor of the omnibus
"Network Stack Locking" task of SMPng.  This work has a pretty long
history that I won't attempt to go into here, other than to observe that:

- This is a product of the adoption of the SMPng approach a few years ago
  by the FreeBSD Project for the FreeBSD 5.x line.  This approach
  attempts to address a lack of kernel parallelism and preemption, as well
  as generally formalizing synchronization, adopting architectural
  properties such as interrupt threads and a more general use of threads
  in the kernel, etc. 

- The vast majority of work that will be discussed in this e-mail is the
  product of significant contributions of others, including: Jonathan
  Lemon, Jennifer Yang, Jeffrey Hsu, and Sam Leffler, and a large number
  of other contributors (many of whom are named in recent status
  reports, but some of whom I've inevitably accidentally omitted and would
  be happy to be reminded of via private e-mail!). 

The goal of this e-mail is to provide a bit of high level information
about what is going on to increase awareness, solicit involvement in a
variety of areas, and throw around words like "merge schedule".  Warning: 
this is a work in progress, and you will find rough parts.  This is being
worked on actively, but by bringing this up during the process, we can
improve the work.  If you see things that scare you, that's a reasonable
response.

Now into the details:

Those following the last few status reports will know that recent work has
focused in the following areas: 

- Introducing and refining data based locking for the top levels of the
  network stack (sockets, socket buffers, et al).

- Refining and testing locking for lower pieces of the stack that already
  have locking.

- Locking for UNIX domain sockets, FIFOs, etc.

- Iterating through pseudo-interfaces and network interfaces to identify
  and correct locking problems.

- Allowing Giant to be conditionally acquired across the entire stack
  using a Giant Toggle Switch.

- Addressing interactions with tightly coupled support infrastructure for
  the stack, including the MAC Framework, kqueue, sigio, select(), general
  signaling primitives, et al.

- Investigating, and in many cases locking, less popular/less widely used
  stack components that were previously unaddressed, such as IPv6,
  netatalk, netipx, et al.

- Some local changes used to monitor and assert locks at a finer
  granularity than in the main tree.  Specifically, sampling of callouts
  and timeouts to measure what we're grabbing Giant for, and in certain
  branches, the addition of a great many assertions.

This work is occurring in a number of Perforce branches.  The primary
branch that is actively worked on is "rwatson_netperf", which may be found
at the following path:

  //depot/users/rwatson/netperf/...

Additional work is taking place to explore socket locking issues in:

  //depot/users/rwatson/net2/...

A number of other developers have branches off of these branches to
explore locking for particular subsystems.  There are also some larger
unintegrated patch sets for data-based NFS locking, fixing the user space
build, etc.  You can find a non-Perforce version at: 

  http://www.watson.org/~robert/freebsd/netperf/

This includes a basic change log and incrementally generated patches, work
sets, etc.  Perforce is the preferred way to get to the work as it
provides easier access to my working notes, the ability to maintain local
changes, get the most recent version, etc.  I try to drop patches fairly
regularly -- several times a week against HEAD, but due to travel to
BSDCan, I'm about two weeks behind.  I hope to make substantial headway
this weekend in updating the patch set and integrating a number of recent
socket locking changes from various work branches.

This work is currently a work in progress, and has a number of known
issues, including some lock order reversal problems, known deficiencies
in socket locking coverage of socket variables, etc.  However, it is being
reviewed and worked on by an increasingly broad population of
FreeBSD developers, so I wanted to move to a more general patch posting
process and attempt to identify additional "hired hands" for areas that
require additional work.  Here are current known tasks and current owners: 

Task					Developer
----					---------
Sockets					Robert Watson
Synthetic network interfaces		Robert Watson
Netinet6				George Neville-Neil
Netatalk				Robert Watson
Netipx					Robert Watson
Interface Locking			Max Laier, Luigi Rizzo,
					Maurycy Pawlowski-Wieronski,
					Brooks Davis
Routing Cleanup				Luigi Rizzo
KQueue (subsystem lock)			Brian Feldman
KQueue (data locking)			John-Mark Gurney	
NFS Server (subsystem lock)		Robert Watson
NFS Server (data locking)		Rick Macklem
SPPP					Roman Kurakin
Userspace build				Roman Kurakin
VFS/fifofs interactions			Don Lewis
Performance measurement			Pawel Jakub Dawidek

And of course, I can't neglect to mention the on-going work of Kris
Kennaway to test out these changes on high-load systems :-).

Some noted absences in the above, and areas where I'd like to see
additional people helping out are:

- Reviewing Netgraph modules for correct interactions with locking in the
  remainder of the system.  I've started pushing some locking into
  ng_ksocket.c and ng_socket.c, and some of the basic infrastructure that
  needed it, but each module will need to be reviewed for correct locking. 

- ATM -- Harti? :-)

- Network device drivers -- some have locking, some have correct locking,
  some have potential interactions with other pieces of the system (such
  as the USB stack).  Note that for a driver to work correctly with a
  Giant-free system, it must be safe to invoke ifp->if_start() without
  holding Giant, and for if_start() to be aware that it cannot
  acquire Giant without generating a lock order issue.  It's OK for
  if_input() to be called with Giant, although undesirable generally.
  Some drivers also have locking that is commented out by default due to
  use of recursive locks, but I'm not sure this is necessarily a sufficient
  problem to justify not just turning on the locking. 

- Complete coverage of synthetic/pseudo-interfaces.  In particular,
  careful addressing of if_gif and other "cross-layer" and protocol aware
  pieces.

- mbuma -- Bosko's work looks good to me, we need to make sure all the
  pieces work with each other.  Getting down to one large memory allocator
  would be great.  I'm interested in exploring uniprocessor optimizations
  here -- I notice that a lot of the locks getting acquired in profiling
  are for memory allocation.  Critical sections, per-cpu
  variables/caching, and pinning all seem like reasonable approaches to
  explore for reducing synchronization costs here. 

Note that there are some serious issues with the current locking changes:

- Socket locking is deficient in a number of ways -- primarily that there
  are several important socket fields that are currently insufficiently or
  inconsistently synchronized.  I'm in the throes of correcting this, but
  that requires a line-by-line review of all use of sockets, which will
  take me at least another week or two to complete.  I'm also addressing
  some races between listen sockets and the sockets hung off of them
  during the new connection setup and accept process.  Currently there is
  no defined lock order between multiple sockets, and if possible I'd like
  to keep it that way. 

- Based on the BSD/OS strategy, there are two mutexes on a socket: each
  socket buffer has a mutex (send, receive), and then the basic socket
  fields are locked using SOCK_LOCK(), which actually uses the receive
  socket buffer mutex.  This reduces the locking overhead while helping to
  address ordering issues in the upward and downward paths.  However,
  there are also some issues of locking correctness and redundancy, and
  I'm looking into these as part of an overall review of the strategy.
  It's worth noting that the BSD/OS snapshot we have has substantially
  incomplete and non-functional socket locking, so unlike some other
  pieces of the network stack, it was not possible to use the strategy
  whole-cloth.  In the long term, the socket locking model may require
  substantial revision.

- Per some recent discussions on -CURRENT, I've been exploring mitigating
  locking costs through coalescing activities on multiple packets.  I.e.,
  effectively passing in queues of packet chains across API boundaries, as
  well as creating local work queues.  It's a bit early to commit to this
  approach because the performance numbers have not confirmed the benefit,
  but it's important to keep that possible approach in mind across all
  other locking work, as it trades off work queue latency with
  synchronization cost.  My earlier experimentation occurred at the end of
  2003, so I hope to revisit this now that more of the locking is in place
  to offer us advantages in preemption and parallelism.

- The patches enable net.isr.enable by default, which provides inbound
  packet parallelism through running to completion in the ithread.  This has
  other down sides, and while we should provide the option, I think we
  should continue to support forcing use of the netisr.  One of the
  problems with the netisr approach is how to accomplish inbound
  processing parallelism without sacrificing the currently strong ordering
  properties, which could cause bad TCP behavior, etc.  We should seriously
  consider at least some aspects of Jeffrey Hsu's work on DragonFly
  to explore providing for multiple netisr's bound to CPUs, then directing
  traffic based on protocol aware hashing that permits us to maintain
  sufficient ordering to meet higher level protocol requirements while
  avoiding the cost of maintaining full ordering.  This isn't something we
  have to do immediately, but exploiting parallelism requires both
  effective synchronization and effective balancing of load.

  In the short term, I'm less interested in the avoidance of
  synchronization of data adopted in the DragonFly approach, since I'd
  like to see that approach validated on a larger chunk of the stack
  (i.e., across the more incestuous pieces of the network stack), and also
  to see performance numbers that confirm the claims.  The approach we're
  currently taking is tried and true across a broad array of systems
  (almost every commercial UNIX vendor, for example), and offers many
  benefits (such as a very strong assertion model).  However, as aspects
  of the DFBSD approach are validated (or not, as the case may be), we
  should consider adopting things as they make sense.  The approaches
  offer quite a bit of promise, but are also very experimental and will
  require a lot of validation, needless to say.  I've done a little bit of
  work to start applying the load distribution approach on FreeBSD, but
  need to work more on the netisr infrastructure before I'll be able to
  evaluate its effectiveness there.

- There are still some serious issues in the timely processing and
  scheduling of device driver interrupts, and these affect performance in
  a number of ways.  They also change the degree of effective coalescing
  of interrupts, making it harder to evaluate strategies to lower costs.
  These issues aren't limited to the network stack work, but I wanted to
  make sure it was on the list of concerns.  Improving our scheduling and
  handling of interrupts will be critical to realizing the performance
  benefits SMPng has offered.

- There are issues relating to upcalls from the socket layer: while many
  consumers of sockets simply sleep for wakeups on socket pointers,
  so_upcall() permits the network stack to "upcall" into other components
  of the system.  I believe this was introduced initially for the NFS
  server to allow initial processing of RPCs to occur in the netisr rather
  than waiting on a context switch to the NFS server threads.  However,
  it's now also used for accept sockets, and I'm aware of outstanding
  changes that modify the NFS client to use it as well.  We need to
  establish what locks will be held over the upcall, if any, and what
  expectations are in place for implementers of upcall functions.  At the
  very least, they have to be MPSAFE, but there are also potential lock
  order issues.

- Locking for KQueue is critical to success.  Without locking down the
  event infrastructure, we can't remove Giant from the many interesting
  pieces of the network stack.  KQueue is an example of a high level of
  incestuousness between levels, and will require careful handling.
  Brian's approach adopts a "single subsystem" lock for KQueue and as such
  offers a low hanging fruit approach, but comes at a number of costs, not
  least of which are lost parallelism and lost functionality.  John-Mark's
  approach appears to offer more granular locking and higher
  parallelism, but at the cost of complexity.  I've not yet had the
  opportunity to review either in any detail, but I know Brian has
  integrated a work branch in Perforce that combines his KQueue locking
  with the locking in rwatson_netperf, and has performed testing.  There's
  obviously more work to do here, and it is required to get to "Giant-free
  operation". 
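
The two-mutex socket layout described in the list above (one mutex per
socket buffer, with SOCK_LOCK() reusing the receive buffer mutex) can be
sketched in user space as follows.  This is an illustration only: the
model_* names are hypothetical, pthread mutexes stand in for kernel
mutexes, and the field set is a small assumed subset of a real socket.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of a socket buffer carrying its own mutex. */
struct model_sockbuf {
	pthread_mutex_t	sb_mtx;
	unsigned	sb_cc;		/* bytes queued in this buffer */
};

/* Hypothetical model of a socket: two buffers, no third mutex. */
struct model_socket {
	int			so_state;	/* a basic socket field */
	struct model_sockbuf	so_rcv;
	struct model_sockbuf	so_snd;
};

/* The socket lock is an alias for the receive buffer mutex. */
#define MODEL_SOCK_MTX(so)	(&(so)->so_rcv.sb_mtx)
#define MODEL_SOCK_LOCK(so)	pthread_mutex_lock(MODEL_SOCK_MTX(so))
#define MODEL_SOCK_UNLOCK(so)	pthread_mutex_unlock(MODEL_SOCK_MTX(so))

static void
model_socket_init(struct model_socket *so)
{
	so->so_state = 0;
	so->so_rcv.sb_cc = 0;
	so->so_snd.sb_cc = 0;
	pthread_mutex_init(&so->so_rcv.sb_mtx, NULL);
	pthread_mutex_init(&so->so_snd.sb_mtx, NULL);
}

/*
 * Update a socket field and the receive buffer under one acquisition.
 * With a separate socket mutex this path would need two.
 */
static void
model_receive(struct model_socket *so, unsigned len, int newstate)
{
	MODEL_SOCK_LOCK(so);
	so->so_state = newstate;
	so->so_rcv.sb_cc += len;
	MODEL_SOCK_UNLOCK(so);
}
```

The saving is one lock operation on the receive path.  The cost is that
code touching only basic socket fields contends with receive buffer
activity, which is one way to read the "correctness and redundancy"
concern raised above.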

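The idea of multiple netisrs bound to CPUs, with traffic directed by
protocol-aware hashing, can also be sketched briefly.  The example below
is a guess at the general shape, not the DragonFly or FreeBSD code: the
model_* names are hypothetical, the flow key is assumed to be the
TCP/UDP 4-tuple, and the FNV-1a style hash is an arbitrary choice made
for illustration.

```c
#include <stdint.h>

/* Hypothetical flow key: the TCP/UDP 4-tuple in host byte order. */
struct model_flow {
	uint32_t	src_addr;
	uint32_t	dst_addr;
	uint16_t	src_port;
	uint16_t	dst_port;
};

/* Mix one 32-bit word into an FNV-1a style hash, byte by byte. */
static uint32_t
model_mix(uint32_t hash, uint32_t word)
{
	for (int i = 0; i < 4; i++) {
		hash ^= (word >> (8 * i)) & 0xffu;
		hash *= 16777619u;
	}
	return (hash);
}

/*
 * Pick the netisr queue (one per CPU) for a packet; ncpus must be
 * nonzero.  Every packet of a given flow yields the same index, so
 * per-flow ordering is kept while different flows may run in parallel.
 */
static unsigned
model_netisr_select(const struct model_flow *f, unsigned ncpus)
{
	uint32_t hash = 2166136261u;

	hash = model_mix(hash, f->src_addr);
	hash = model_mix(hash, f->dst_addr);
	hash = model_mix(hash, ((uint32_t)f->src_port << 16) | f->dst_port);
	return (hash % ncpus);
}
```

Ordering is then guaranteed only within a flow, which is the trade
described above: enough ordering for higher level protocols without the
cost of full ordering across all inbound traffic.
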
For more complete changes and history, I would refer you to the last few
FreeBSD Status Reports on network stack locking.  I would also encourage
you to contact me if you would like to claim some section of the stack for
work so I can coordinate activities.  These patch sets have been pounded
heavily in a wide variety of environments, but there are several known
issues so I would recommend using them cautiously.

In terms of merging: I've been gradually merging a lot of the
infrastructure pieces as I went along.  The next big chunks to consider
merging are:

- Socket locking.  This needs to wait until I'm more happy with the
  strategy.

- UNIX domain socket locking.  This is probably an early candidate, but
  because of potential interactions with socket locking changes, I've been
  deferring the merge. 

- NFS server locking.  I had planned to merge the current subsystem lock
  quickly, but then Rick turned up with fine-grained data based locking of
  the NFS server, and NFSv4 server code when I asked him for review of the
  subsystem lock, so I've been holding off.

- Additional general infrastructure, such as more pseudo-interface
  locking, fifofs stuff, etc.  I'll continue on the gradual incremental
  merge path as I have been for the past few months.

It's obviously desirable to get things merged as soon as they are ready,
even with Giant remaining over the stack, so that we can get broad
exercising of the locking assertions in INVARIANTS and WITNESS.  As such,
over the next month I anticipate an increasing number of merges, and
increasing usability of "debug.mpsafenet" in the main tree.  Turning off
Giant will likely lead to problems for some time to come, but the sooner
we get exposure, the better life will be.  We've done a lot of heavy
testing of common code paths, but working out the edge cases will take
some time.  We're prepared to live in a world with a dual-mode stack for
some period, but that has to be an interim measure. 
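
A dual-mode stack of this kind can be pictured as a conditional Giant
acquisition at the stack entry points.  The user-space sketch below is
an assumption about the general shape, not the code in the branch: the
model_* names are hypothetical, pthread mutexes stand in for kernel
mutexes, and the flag that models "debug.mpsafenet" is assumed to be
fixed before any thread enters the stack.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t model_giant = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t model_stack_mtx = PTHREAD_MUTEX_INITIALIZER;
static bool model_mpsafenet = false;	/* models the debug.mpsafenet knob */
static int model_packets;		/* protected by model_stack_mtx */

/* Take Giant only while the stack is still running in Giant mode. */
static void
model_net_lock_giant(void)
{
	if (!model_mpsafenet)
		pthread_mutex_lock(&model_giant);
}

static void
model_net_unlock_giant(void)
{
	if (!model_mpsafenet)
		pthread_mutex_unlock(&model_giant);
}

/*
 * A stack entry point.  The fine-grained lock is always taken, so its
 * locking discipline is exercised even while Giant still covers the
 * stack.
 */
static void
model_stack_entry(void)
{
	model_net_lock_giant();
	pthread_mutex_lock(&model_stack_mtx);
	model_packets++;
	pthread_mutex_unlock(&model_stack_mtx);
	model_net_unlock_giant();
}
```

Because the fine-grained locks are taken in both modes, merged locking
gets exercised under Giant first, and the toggle only decides whether
Giant is held on top of it.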

So I guess the upshot is "Stuff is going on, be aware, volunteer to
help!".

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research


From owner-freebsd-arch@FreeBSD.ORG  Thu May 20 13:56:39 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4D69616A4CF
	for <arch@freebsd.org>; Thu, 20 May 2004 13:56:39 -0700 (PDT)
Received: from rwcrmhc13.comcast.net (rwcrmhc13.comcast.net [204.127.198.39])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 1943043D2D
	for <arch@freebsd.org>; Thu, 20 May 2004 13:56:39 -0700 (PDT)
	(envelope-from julian@elischer.org)
Received: from interjet.elischer.org ([24.7.73.28])
          by comcast.net (rwcrmhc13) with ESMTP
          id <2004052020563801500qckjie>; Thu, 20 May 2004 20:56:38 +0000
Received: from localhost (localhost.elischer.org [127.0.0.1])
	by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id NAA74737
	for <arch@freebsd.org>; Thu, 20 May 2004 13:56:38 -0700 (PDT)
Date: Thu, 20 May 2004 13:56:36 -0700 (PDT)
From: Julian Elischer <julian@elischer.org>
To: arch@freebsd.org
Message-ID: <Pine.BSF.4.21.0405201340590.72391-100000@InterJet.elischer.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Subject: atomic reference counting primatives.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 20 May 2004 20:56:39 -0000


This has been raised before, but I've come across uses for it again and
again, so I'm raising it again. 
JHB once posted some atomic reference counting primitives.  (Do you still
have them, John?) 
Alfred once said he had some somewhere too, and others have commented
on this before, but we still don't seem to have any.

Every object is reference counted with its own code, and
sometimes it's done poorly.

Some people indicated that there are cases where a generic refcounter
cannot be used, and used this as a reason to not have one at all.

So, here are some possibilities.
My first "write it down without too much thinking" effort:

typedef {mumble} refcnt_t;

void	refcnt_add(refcnt_t *)
  Increments the reference count.  No magic except to be atomic.


int	refcnt_drop(refcnt_t *, struct mutex *)
  Decrements the refcount.  If it goes to 0, it returns 0 and locks the
  mutex (if the mutex is supplied).


void	refcnt_init(refcnt_t *)
  Would simply set the counter to 0 if refcnt_t is defined as a simple
  type, but could do more if a more complex refcount is used (say, for
  debugging).


Debugging versions of the above might store all sorts of stuff in the
refcount (e.g. pid, __LINE__, __FUNCTION__, etc.).
vm->vm_exitingcnt)

If these were in place, it would be a first step in
tightening up some of the reference counting we see in the kernel.
There are several places I've seen over the last few years where
locks are used purely to allow reference counts to be manipulated.
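
One way the proposed interface could look in portable C is sketched
below.  It is only an illustration under assumptions: C11 <stdatomic.h>
operations stand in for the kernel's atomic primitives, pthread_mutex_t
stands in for the kernel mutex type, and atomic_uint is just one
possible choice for {mumble}.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef atomic_uint refcnt_t;	/* one possible choice for {mumble} */

/* Start the counter at 0, as in the proposal. */
static void
refcnt_init(refcnt_t *rc)
{
	atomic_init(rc, 0);
}

/* Take a reference: nothing but an atomic increment. */
static void
refcnt_add(refcnt_t *rc)
{
	atomic_fetch_add_explicit(rc, 1, memory_order_relaxed);
}

/*
 * Drop a reference.  Returns 0 when the count reaches 0, after locking
 * mtx if one was supplied; returns nonzero otherwise.
 */
static int
refcnt_drop(refcnt_t *rc, pthread_mutex_t *mtx)
{
	/* fetch_sub returns the value held before the decrement. */
	if (atomic_fetch_sub_explicit(rc, 1, memory_order_acq_rel) != 1)
		return (1);
	if (mtx != NULL)
		pthread_mutex_lock(mtx);
	return (0);
}
```

A caller that gets 0 back owns the object's teardown and holds the
mutex.  Note that this sketch does not stop another thread from taking
a new reference between the decrement to 0 and the mutex acquisition;
objects that can still be found through a shared list would need
additional care, which may be one of the cases where a generic counter
is said not to fit.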

Thoughts?
Better ideas?


From owner-freebsd-arch@FreeBSD.ORG  Thu May 20 14:32:56 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6A59816A4CE
	for <arch@FreeBSD.org>; Thu, 20 May 2004 14:32:56 -0700 (PDT)
Received: from mailtoaster1.pipeline.ch (mailtoaster1.pipeline.ch
	[62.48.0.70])	by mx1.FreeBSD.org (Postfix) with ESMTP id A5AEF43D48
	for <arch@FreeBSD.org>; Thu, 20 May 2004 14:32:55 -0700 (PDT)
	(envelope-from andre@freebsd.org)
Received: (qmail 47139 invoked from network); 20 May 2004 21:32:54 -0000
Received: from unknown (HELO freebsd.org) ([62.48.0.53])
          (envelope-sender <andre@freebsd.org>)
          by mailtoaster1.pipeline.ch (qmail-ldap-1.03) with SMTP
          for <rwatson@FreeBSD.org>; 20 May 2004 21:32:54 -0000
Message-ID: <40AD2405.DC13B45C@freebsd.org>
Date: Thu, 20 May 2004 23:32:53 +0200
From: Andre Oppermann <andre@freebsd.org>
X-Mailer: Mozilla 4.8 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Robert Watson <rwatson@FreeBSD.org>
References: <Pine.NEB.3.96L.1040520162957.90528H-100000@fledge.watson.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: arch@FreeBSD.org
Subject: Re: Network Stack Locking
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 20 May 2004 21:32:56 -0000

Robert Watson wrote:
...
> Note that there are some serious issues with the current locking changes:
...
> 

I vote for the approach of getting in as much as possible from the
moment it is known to work *correctly* (not necessarily perfectly
optimal/optimized).  Having something correct is an ideal base from
which to start optimizing.  There I'm ready to jump in and go ahead to
make things better by re-arranging or re-writing them.  One of my main
dislikes about the current 'net' and 'netinet' code is its obfuscation
and really overloaded functions.  Even though I'm very fluent in the
IPv4 network code, it still hurts my eyes and brain when looking through
certain files...  So I've started to clean up large parts of it.  The
very first thing is to get ipfw out of ip_input/ip_output, for which I
have early patches (see last status report).  In that patch are two more
things.  One is to make ip_reass() a real function taking a fragmented
packet instead of being a half-way stub only capable of being called
from ip_input.  The second thing is to move all ip options related
functions (which are quite numerous and large, and seldom used) to their
own .c/.h file.  With that alone both ip_input/ip_output shrink by
approx. 1/3 in size and get way more readable and understandable.

Well, the only thing I really want to say is that correctly working
code is always a great base to optimize from.  I think this is one
of the big lessons I've learned through my relatively young kernel
programming career and from the VM work of John Dyson (for the younger
among us, he and David Greenman did the original implementation of the
unified VM we have.  John lost himself in micro-optimizations, where
he somewhat lost the ability to see the forest for all the
trees in the way.  In the end he had to give it up).

Progress happens incrementally.  Put in Green's kqueue locking, have
that working correctly and make it perfect in a second step.

-- 
Andre

From owner-freebsd-arch@FreeBSD.ORG  Thu May 20 18:03:27 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 6E59C16A4CE; Thu, 20 May 2004 18:03:27 -0700 (PDT)
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 1B27443D46; Thu, 20 May 2004 18:03:27 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (localhost [127.0.0.1])
	i4L13Q7Z068013;	Thu, 20 May 2004 18:03:26 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.12.9p2/8.12.9/Submit) id i4L13QWT068012;
	Thu, 20 May 2004 18:03:26 -0700 (PDT)
	(envelope-from dillon)
Date: Thu, 20 May 2004 18:03:26 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200405210103.i4L13QWT068012@apollo.backplane.com>
To: Robert Watson <rwatson@freebsd.org>
References: <Pine.NEB.3.96L.1040520162957.90528H-100000@fledge.watson.org>
cc: arch@freebsd.org
Subject: Re: Network Stack Locking
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 01:03:27 -0000

    It's my guess that we will be able to remove the BGL from large
    portions of the DFly network stack sometime late June or early July,
    after USENIX, at which point it will be possible to test SMP aspects of 
    the localized cpu distribution method.  Right now the network stack is
    still under the BGL (as is most of the system, our approach to MP is
    first to isolate and localize the conflicting subsystems, then to release
    the BGL for that subsystem's thread(s)).

    It should be noted that the biggest advantages of the distributed
    approach are (1) The ability to operate on individual PCBs without
    having to do any token/mutex/other locking at all, (2) Cpu locality
    of reference in regards to cache mastership of the PCBs and related data,
    and (3) avoidance of data cache pollution across cpus (more cpus ==
    better utilization of individual L1/L2 caches and far greater
    scalability).  The biggest disadvantage is the mandatory thread switch
    (but this is mitigated as load increases since each thread can work on
    several PCBs without further switches, and because our thread scheduler
    is extremely lightweight under SMP conditions).  Message passing
    overhead is very low since most operations already require some sort of
    roll-up structure to be passed (e.g. an mbuf in the case of the network).
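
    One way to picture the cpu distribution is a hash from the connection
    4-tuple to the index of the per-cpu protocol thread that owns the PCB.
    The following is only a sketch under stated assumptions: struct flow,
    flow_to_cpu() and the mixing constant are hypothetical names and values
    chosen for illustration, not the hash DFly actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical connection 4-tuple; field names are illustrative. */
struct flow {
	uint32_t src_addr, dst_addr;
	uint16_t src_port, dst_port;
};

/*
 * Map a flow to the index of the protocol thread (one per cpu) that owns
 * its PCB.  Every packet of a given connection hashes to the same index,
 * so the owning thread can touch the PCB without taking a lock.
 */
static unsigned
flow_to_cpu(const struct flow *f, unsigned ncpus)
{
	uint32_t h;

	assert(ncpus > 0);
	h = f->src_addr ^ f->dst_addr;
	h ^= ((uint32_t)f->src_port << 16) | f->dst_port;
	h ^= h >> 16;		/* fold the high bits into the low bits */
	h *= 0x45d9f3bU;	/* arbitrary odd multiplier to mix bits */
	h ^= h >> 16;
	return (h % ncpus);
}
```

    Because the mapping depends only on the 4-tuple, ordering is preserved
    within a connection while different connections can spread across cpus.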

    We are running the full bore threaded, distributed network stack even
    on UP systems now (meaning: message passing and thread switching still
    occur even though there is only one target thread for a particular
    protocol).  We have done fairly significant testing on GigE LANs and
    have not noticed any degradation in network performance, so we are
    certain we are on the right track.

    I do not expect cpu balancing to be all that big an issue, actually,
    especially due to the typically short lived connection life that occurs
    in these scenarios.  But mutex avoidance is *REALLY* *HUGE* if you are
    processing a lot of TCP connections in parallel due to the small quantums
    of work involved.

    In any case, if you are seriously considering any sort of distributed
    methodology you should also consider formalizing a message passing
    API for FreeBSD.  Even if you don't like our LWKT messaging API, I
    think you would love the DFly IPI messaging subsystem and it would be
    very easy to port as a first step.  We use it so much now in DFly
    that I don't think I could live without it, e.g. for clock distribution,
    interrupt distribution, thread/cpu isolation, wakeup(), MP-safe messaging
    at higher levels (and hence packet routing), free()-return-to-
    originating-cpu (mutexless slab allocator), SMP MMU synchronization
    (the basic VM/pte-race issue with userland brought up by Alan Cox),
    basic scheduler operations, signal(), and the list goes on and on.
    In DFly, IPI messaging and message processing are required to be MP
    safe (they always occur outside the BGL, like a cpu-localized fast
    interrupt), but a critical section still protects against reception
    processing so code that uses it can be made very clean.

						-Matt

:- They enable net.isr.enable by default, which provides inbound packet
:...
:  consider at least some aspects of Jeffrey Hsu's work on DragonFly
:  to explore providing for multiple netisr's bound to CPUs, then directing
:  traffic based on protocol aware hashing that permits us to maintain
:  sufficient ordering to meet higher level protocol requirements while
:  avoiding the cost of maintaining full ordering.  This isn't something we
:  have to do immediately, but exploiting parallelism requires both
:  effective synchronization and effective balancing of load.
:
:  In the short term, I'm less interested in the avoidance of
:  synchronization of data adopted in the DragonFly approach, since I'd
:  like to see that approach validated on a larger chunk of the stack
:  (i.e., across the more incestuous pieces of the network stack), and also
:...
:  benefits (such as a very strong assertion model).  However, as aspects
:  of the DFBSD approach are validated (or not, as the case may be), we
:  should consider adopting things as they make sense.  The approaches
:  offer quite a bit of promise, but are also very experimental and will
:  require a lot of validation, needless to say.  I've done a little bit of
:  work to start applying the load distribution approach on FreeBSD, but
:  need to work more on the netisr infrastructure before I'll be able to
:  evaluate its effectiveness there.

From owner-freebsd-arch@FreeBSD.ORG  Thu May 20 19:54:39 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9B13516A4CE
	for <arch@freebsd.org>; Thu, 20 May 2004 19:54:39 -0700 (PDT)
Received: from harmony.village.org (rover.village.org [168.103.84.182])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 42C5343D31
	for <arch@freebsd.org>; Thu, 20 May 2004 19:54:39 -0700 (PDT)
	(envelope-from imp@bsdimp.com)
Received: from localhost (warner@rover2.village.org [10.0.0.1])
	by harmony.village.org (8.12.11/8.12.11) with ESMTP id i4L2s2GA038430;
	Thu, 20 May 2004 20:54:02 -0600 (MDT)
	(envelope-from imp@bsdimp.com)
Date: Thu, 20 May 2004 20:54:03 -0600 (MDT)
Message-Id: <20040520.205403.08940889.imp@bsdimp.com>
To: julian@elischer.org
From: "M. Warner Losh" <imp@bsdimp.com>
In-Reply-To: <Pine.BSF.4.21.0405201340590.72391-100000@InterJet.elischer.org>
References: <Pine.BSF.4.21.0405201340590.72391-100000@InterJet.elischer.org>
X-Mailer: Mew version 3.3 on Emacs 21.3 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: arch@freebsd.org
Subject: Re: atomic reference counting primatives.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 02:54:39 -0000

In message: <Pine.BSF.4.21.0405201340590.72391-100000@InterJet.elischer.org>
            Julian Elischer <julian@elischer.org> writes:
: This has been raised before but I've come across uses for it again and
: again so I'm raising it again.
: JHB once posted some atomic reference counting primitives. (Do you still
: have them John?)
: Alfred once said he had some somewhere too, and others have commented
: on this before, but we still don't seem to have any.
: 
: Every object is reference counted with its own code and
: sometimes it's done poorly.
: 
: Some people indicated that there are cases where a generic refcounter
: cannot be used and used this as a reason to not have one at all.
: 
: So, here are some possibilities..
: my first "write it down without too much thinking" effort..
: 
: typedef {mumble} refcnt_t
: 
: refcnt_add(refcnt_t *)
:   Increments the reference count.. no magic except to be atomic.
: 
: 
: int	refcnt_drop(refcnt_t *, struct mutex *)
:  Decrements the refcount. If it goes to 0 it returns 0 and locks the
: mutex  (if the mutex is supplied)..

What prevents refcnt_add() from happening after the ref count drops to 0?
Wouldn't that be a race?  E.g., if we have two threads:


	Thread A			Thread B

	objp = lookup();
[1]					refcnt_drop(&objp->ref, &objp->mtx);
[2]	refcnt_add(&objp->ref);
					BANG!

If [1] happens before [2], then bad things happen at BANG!  If [2]
happens before [1], then the mutex won't be locked at BANG and things
are good.  Thread A believes it has a valid reference to objp after the
refcnt_add and has no way of knowing otherwise.

Is there a safe way to use the API you are proposing?
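
For concreteness, the API quoted above might look roughly like this.
It is only a userland sketch: C11 atomics and a pthread mutex stand in
for the kernel's atomic ops and struct mutex, and the choice of
atomic_uint for the {mumble} type is an assumption, not something the
proposal specifies.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed representation of the proposal's {mumble} type. */
typedef atomic_uint refcnt_t;

/* Take an additional reference.  The caller must already hold one. */
static void
refcnt_add(refcnt_t *rc)
{
	unsigned old = atomic_fetch_add(rc, 1);

	/* Incrementing from 0 is exactly the BANG case in the diagram. */
	assert(old > 0);
	(void)old;
}

/*
 * Drop a reference.  Returns 0 when the count reaches zero, with *mtx
 * locked if mtx is non-NULL; otherwise returns the remaining count
 * without touching the mutex.
 */
static int
refcnt_drop(refcnt_t *rc, pthread_mutex_t *mtx)
{
	unsigned left = atomic_fetch_sub(rc, 1) - 1;

	if (left == 0 && mtx != NULL)
		pthread_mutex_lock(mtx);
	return ((int)left);
}
```

Nothing in these two functions stops thread A from calling refcnt_add()
after thread B's refcnt_drop() has returned 0; the assertion only makes
that misuse visible in a debug build.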

Warner

From owner-freebsd-arch@FreeBSD.ORG  Thu May 20 20:45:42 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id EDC0F16A4CE
	for <arch@FreeBSD.org>; Thu, 20 May 2004 20:45:42 -0700 (PDT)
Received: from mailout2.pacific.net.au (mailout2.pacific.net.au [61.8.0.85])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 63F4343D2D
	for <arch@FreeBSD.org>; Thu, 20 May 2004 20:45:42 -0700 (PDT)
	(envelope-from bde@zeta.org.au)
Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au
	[61.8.0.87])i4L3jW5v012968;	Fri, 21 May 2004 13:45:32 +1000
Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246])
	i4L3jULS005620;	Fri, 21 May 2004 13:45:31 +1000
Date: Fri, 21 May 2004 13:45:32 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Julian Elischer <julian@elischer.org>
In-Reply-To: <Pine.BSF.4.21.0405201340590.72391-100000@InterJet.elischer.org>
Message-ID: <20040521133502.Y4135@gamplex.bde.org>
References: <Pine.BSF.4.21.0405201340590.72391-100000@InterJet.elischer.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: arch@FreeBSD.org
Subject: Re: atomic reference counting primatives.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 03:45:43 -0000

On Thu, 20 May 2004, Julian Elischer wrote:

> This has been raised before but I've come across uses for it again and
> again so I'm raising it again.
> JHB once posted some atomic reference counting primitives. (Do you still
> have them John?)
> Alfred once said he had some somewhere too, and others have commented
> on this before, but we still don't seem to have any.
>
> Every object is reference counted with its own code and
> sometimes it's done poorly.
>
> Some people indicated that there are cases where a generic refcounter
> cannot be used and used this as a reason to not have one at all.

Now we know that a generic reference counter would be even better for
pessimizing FreeBSD than was first thought, since on P4s locked
instructions are very expensive.  See the thread about bridging.  A
pessimization by a factor of 2 or so has been achieved using little
more than normal locking, since there are lots of lock/unlock pairs
per packet and each lock and unlock takes hundreds (?) of cycles for
the bus lock part and very little else.  General atomic counters of
any sort would take about half as long as a lock/unlock pair (since
they only need 1 lock, but would always need it even if running in
a locked region).  The pessimizations from them could be avoided by using
algorithms that don't need fine-grained locking.

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Thu May 20 21:10:21 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id DC1D316A4CE
	for <freebsd-arch@freebsd.org>; Thu, 20 May 2004 21:10:21 -0700 (PDT)
Received: from moutng.kundenserver.de (moutng.kundenserver.de
	[212.227.126.183])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 52D1F43D1F
	for <freebsd-arch@freebsd.org>; Thu, 20 May 2004 21:10:21 -0700 (PDT)
	(envelope-from max@love2party.net)
Received: from [212.227.126.162] (helo=mrelayng.kundenserver.de)
	by moutng.kundenserver.de with esmtp (Exim 3.35 #1)
	id 1BR1Kx-0004ag-00
	for freebsd-arch@freebsd.org; Fri, 21 May 2004 06:08:47 +0200
Received: from [216.58.85.218] (helo=[10.0.0.49])
	by mrelayng.kundenserver.de with asmtp (TLSv1:RC4-MD5:128)
	(Exim 3.35 #1)
	id 1BR1Kx-0004Or-00
	for freebsd-arch@freebsd.org; Fri, 21 May 2004 06:08:47 +0200
From: Max Laier <max@love2party.net>
To: freebsd-arch@freebsd.org
Date: Fri, 21 May 2004 06:10:24 +0200
User-Agent: KMail/1.6.2
References: <Pine.BSF.4.21.0405201340590.72391-100000@InterJet.elischer.org>
	<20040521133502.Y4135@gamplex.bde.org>
In-Reply-To: <20040521133502.Y4135@gamplex.bde.org>
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <200405210610.27298.max@love2party.net>
X-Provags-ID: kundenserver.de abuse@kundenserver.de
	auth:e28873fbe4dbe612ce62ab869898ff08
Subject: Re: atomic reference counting primatives.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 04:10:22 -0000

On Friday 21 May 2004 05:45, Bruce Evans wrote:
> On Thu, 20 May 2004, Julian Elischer wrote:
> > This has been raised before but I've come across uses for it again and
> > again so I'm raising it again.
> > JHB once posted some atomic reference counting primitives. (Do you still
> > have them John?)
> > Alfred once said he had some somewhere too, and others have commented
> > on this before, but we still don't seem to have any.
> >
> > Every object is reference counted with its own code and
> > sometimes it's done poorly.
> >
> > Some people indicated that there are cases where a generic refcounter
> > cannot be used and used this as a reason to not have one at all.
>
> Now we know that a generic reference counter would be even better for
> pessimizing FreeBSD than was first thought, since on P4s locked
> instructions are very expensive.  See the thread about bridging.  A
> pessimization by a factor of 2 or so has been achieved using little
> more than normal locking, since there are lots of lock/unlock pairs
> per packet and each lock and unlock takes hundreds (?) of cycles for
> the bus lock part and very little else.  General atomic counters of
> any sort would take about half as long as a lock/unlock pair (since
> they only need 1 lock, but would always need it even if running in
> a locked region).  The pessimizations from them could be avoided by using
> algorithms that don't need fine-grained locking.

I still find atomic counters very attractive for a simple sx lock. The current 
implementation uses (as far as I know) a normal mutex to protect the busy 
count, so you have four lock/unlock operations that need bus interaction. 
If we move to updating the busy count with atomic ops we have only two and 
could start to actually use sx locks.

The BANG from Warner's reply could be avoided by decrementing to ($magicval) 
rather than 0 when exclusive mode is requested. But I am not entirely sure 
I got the point ... that's always the/my problem when it comes to 
understanding locks.

-- 
Best regards,				| mlaier@freebsd.org
Max Laier				| ICQ #67774661
http://pf4freebsd.love2party.net/	| mlaier@EFnet

From owner-freebsd-arch@FreeBSD.ORG  Fri May 21 06:59:07 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A498E16A4CE
	for <arch@FreeBSD.org>; Fri, 21 May 2004 06:59:07 -0700 (PDT)
Received: from mail3.speakeasy.net (mail3.speakeasy.net [216.254.0.203])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 84A7043D31
	for <arch@FreeBSD.org>; Fri, 21 May 2004 06:59:07 -0700 (PDT)
	(envelope-from jhb@FreeBSD.org)
Received: (qmail 6235 invoked from network); 21 May 2004 13:58:56 -0000
Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx)
	([216.27.160.63])          (envelope-sender <jhb@FreeBSD.org>)
	encrypted SMTP
	for <arch@FreeBSD.org>; 21 May 2004 13:58:56 -0000
Received: from 10.50.40.205 (gw1.twc.weather.com [216.133.140.1])
	by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i4LDwoRK076727;
	Fri, 21 May 2004 09:58:51 -0400 (EDT)
	(envelope-from jhb@FreeBSD.org)
From: John Baldwin <jhb@FreeBSD.org>
To: freebsd-arch@FreeBSD.org
Date: Fri, 21 May 2004 09:59:24 -0400
User-Agent: KMail/1.6
References: <Pine.BSF.4.21.0405201340590.72391-100000@InterJet.elischer.org>
	<20040520.205403.08940889.imp@bsdimp.com>
In-Reply-To: <20040520.205403.08940889.imp@bsdimp.com>
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <200405210959.25368.jhb@FreeBSD.org>
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx
cc: arch@FreeBSD.org
cc: julian@elischer.org
Subject: Re: atomic reference counting primatives.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 13:59:07 -0000

On Thursday 20 May 2004 10:54 pm, M. Warner Losh wrote:
> In message:
> <Pine.BSF.4.21.0405201340590.72391-100000@InterJet.elischer.org>
>
>             Julian Elischer <julian@elischer.org> writes:
> : This has been raised before but I've come across uses for it again and
> : again so I'm raising it again.
> : JHB once posted some atomic reference counting primitives. (Do you still
> : have them John?)
> : Alfred once said he had some somewhere too, and others have commented
> : on this before, but we still don't seem to have any.
> :
> : Every object is reference counted with its own code and
> : sometimes it's done poorly.
> :
> : Some people indicated that there are cases where a generic refcounter
> : cannot be used and used this as a reason to not have one at all.
> :
> : So, here are some possibilities..
> : my first "write it down without too much thinking" effort..
> :
> : typedef {mumble} refcnt_t
> :
> : refcnt_add(refcnt_t *)
> :   Increments the reference count.. no magic except to be atomic.
> :
> :
> : int	refcnt_drop(refcnt_t *, struct mutex *)
> :  Decrements the refcount. If it goes to 0 it returns 0 and locks the
> : mutex  (if the mutex is supplied)..
>
> What prevents refcnt_add() from happening after the ref count drops to 0?
> Wouldn't that be a race?  E.g., if we have two threads:
>
>
> 	Thread A			Thread B
>
> 	objp = lookup();
> [1]					refcnt_drop(&objp->ref, &objp->mtx);
> [2]	refcnt_add(&objp->ref);
> 					BANG!
>
> If [1] happens before [2], then bad things happen at BANG!  If [2]
> happens before [1], then the mutex won't be locked at BANG and things
> are good.  Thread A believes it has a valid reference to objp after the
> refcnt_add and has no way of knowing otherwise.
>
> Is there a safe way to use the API you are proposing?

This situation can't happen if you are properly using reference counting.  For 
the reference count to be at 1 in thread B, thread B has to hold the only 
reference, meaning that the object has already been removed from any lists, 
etc., so thread A's lookup() could no longer have found it.
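
A sketch of that discipline, assuming a small lookup table guarded by a
single mutex.  All names here (struct obj, table_mtx, obj_lookup() and
friends) are hypothetical, and pthreads and C11 atomics stand in for the
kernel primitives:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct obj {
	atomic_uint	ref;	/* the table's pointer counts as one reference */
	int		key;
};

#define	NSLOTS	8
static struct obj *table[NSLOTS];
static pthread_mutex_t table_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Publish an object; the table takes over the caller's initial reference. */
static void
obj_insert(struct obj *o)
{
	pthread_mutex_lock(&table_mtx);
	table[(unsigned)o->key % NSLOTS] = o;
	pthread_mutex_unlock(&table_mtx);
}

/* Find an object and take a reference while the table lock pins it. */
static struct obj *
obj_lookup(int key)
{
	struct obj *o;

	pthread_mutex_lock(&table_mtx);
	o = table[(unsigned)key % NSLOTS];
	if (o != NULL && o->key == key)
		atomic_fetch_add(&o->ref, 1);
	else
		o = NULL;
	pthread_mutex_unlock(&table_mtx);
	return (o);
}

/* Drop a reference; returns 1 if this was the last one. */
static int
obj_release(struct obj *o)
{
	return (atomic_fetch_sub(&o->ref, 1) == 1);
}

/* Unpublish first, then drop the table's reference; returns 1 if last. */
static int
obj_remove(struct obj *o)
{
	unsigned i = (unsigned)o->key % NSLOTS;

	pthread_mutex_lock(&table_mtx);
	assert(table[i] == o);
	table[i] = NULL;
	pthread_mutex_unlock(&table_mtx);
	return (obj_release(o));
}
```

While the object is reachable through the table its count is at least 1,
and lookups only increment under the table lock, so the count can only
reach 0 after the object has been unpublished and no new lookup can see it.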

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG  Fri May 21 07:01:46 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id DDBBB16A4D0
	for <freebsd-arch@FreeBSD.org>; Fri, 21 May 2004 07:01:46 -0700 (PDT)
Received: from mail6.speakeasy.net (mail6.speakeasy.net [216.254.0.206])
	by mx1.FreeBSD.org (Postfix) with ESMTP id B8D2F43D31
	for <freebsd-arch@FreeBSD.org>; Fri, 21 May 2004 07:01:46 -0700 (PDT)
	(envelope-from jhb@FreeBSD.org)
Received: (qmail 2713 invoked from network); 21 May 2004 14:01:35 -0000
Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx)
	([216.27.160.63])          (envelope-sender <jhb@FreeBSD.org>)
	encrypted SMTP
	for <arch@FreeBSD.org>; 21 May 2004 14:01:35 -0000
Received: from 10.50.40.205 (gw1.twc.weather.com [216.133.140.1])
	by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i4LE1Rsp076772;
	Fri, 21 May 2004 10:01:27 -0400 (EDT)
	(envelope-from jhb@FreeBSD.org)
From: John Baldwin <jhb@FreeBSD.org>
To: freebsd-arch@FreeBSD.org
Date: Fri, 21 May 2004 10:02:02 -0400
User-Agent: KMail/1.6
References: <Pine.BSF.4.21.0405201340590.72391-100000@InterJet.elischer.org>
In-Reply-To: <Pine.BSF.4.21.0405201340590.72391-100000@InterJet.elischer.org>
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <200405211002.02386.jhb@FreeBSD.org>
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx
cc: arch@FreeBSD.org
cc: Julian Elischer <julian@elischer.org>
Subject: Re: atomic reference counting primatives.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 14:01:47 -0000

On Thursday 20 May 2004 04:56 pm, Julian Elischer wrote:
> This has been raised before but I've come across uses for it again and
> again so I'm raising it again.
> JHB once posted some atomic reference counting primitives. (Do you still
> have them John?)
> Alfred once said he had some somewhere too, and others have commented
> on this before, but we still don't seem to have any.

I still have them.  Part of the problem is that there are lots of different 
reference counts that work in different ways, and if you try to come up with 
a single all-singing, all-dancing ref count implementation it will be too 
complicated to provide any benefit.  What I do think might be useful is 
a simple refcount() API for objects that are immutable when 
the refcount > 1, like ucred, and are updated via COW.  These types of objects 
have a mutex that just protects a refcount and nothing else.  Using a single 
refcount op for those objects will cut the number of atomic ops in half.
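
As a rough illustration of that simple case, here is a userland sketch 
using C11 atomics.  struct cred and the cred_*() functions are 
hypothetical stand-ins for a ucred-like object, not the kernel's actual 
code.  Each hold or release is one atomic op instead of a mutex 
lock/unlock pair around a plain counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an immutable-when-shared object like ucred. */
struct cred {
	atomic_uint	ref;
	unsigned	uid;
};

static struct cred *
cred_alloc(unsigned uid)
{
	struct cred *c = malloc(sizeof(*c));

	if (c == NULL)
		abort();
	atomic_init(&c->ref, 1);
	c->uid = uid;
	return (c);
}

/* Take another reference; the caller must already hold one. */
static struct cred *
cred_hold(struct cred *c)
{
	atomic_fetch_add(&c->ref, 1);
	return (c);
}

/* Drop a reference and free the object when the last one goes away. */
static void
cred_free(struct cred *c)
{
	if (atomic_fetch_sub(&c->ref, 1) == 1)
		free(c);
}

/*
 * Copy-on-write update: a sole owner may modify in place; otherwise make
 * a private copy and give up the reference on the shared one.
 */
static struct cred *
cred_setuid(struct cred *c, unsigned uid)
{
	struct cred *n;

	if (atomic_load(&c->ref) == 1) {
		c->uid = uid;
		return (c);
	}
	n = cred_alloc(uid);
	cred_free(c);
	return (n);
}
```

The in-place path is safe because a count of 1 means the caller holds the 
only reference, so no other thread can be reading the object or taking a 
new reference to it.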

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG  Fri May 21 09:20:30 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4BB0316A4CE; Fri, 21 May 2004 09:20:30 -0700 (PDT)
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id F083943D31; Fri, 21 May 2004 09:20:29 -0700 (PDT)
	(envelope-from robert@fledge.watson.org)
Received: from fledge.watson.org (localhost [127.0.0.1])
	by fledge.watson.org (8.12.11/8.12.11) with ESMTP id i4LGJr3K012129;
	Fri, 21 May 2004 12:19:54 -0400 (EDT)
	(envelope-from robert@fledge.watson.org)
Received: from localhost (robert@localhost)i4LGJr3H012126;
	Fri, 21 May 2004 12:19:53 -0400 (EDT)
	(envelope-from robert@fledge.watson.org)
Date: Fri, 21 May 2004 12:19:53 -0400 (EDT)
From: Robert Watson <rwatson@freebsd.org>
X-Sender: robert@fledge.watson.org
To: Andre Oppermann <andre@freebsd.org>
In-Reply-To: <40AD2405.DC13B45C@freebsd.org>
Message-ID: <Pine.NEB.3.96L.1040521121549.4759B-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: arch@freebsd.org
Subject: Re: Network Stack Locking
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 16:20:30 -0000


On Thu, 20 May 2004, Andre Oppermann wrote:

> Robert Watson wrote:
> ...
> > Note that there are some serious issues with the current locking changes:
> 
> I vote for the approach to get in as much as possible from the moment on
> it is known to work *correctly* (not neccessarily perfectly optimal/
> optimized).  Having something correct is an ideal base to start for
> optimizing.  There I'm ready to jump in and go ahead to make things
> better by re-arraning or re-writing them.  One of my main dislikings of
> the current 'net' and 'netinet' code is it's obfuscation and really
> overloaded functions.  Even though I'm very fluent in the IPv4 network
> code it is still hurting my eye and brain when looking through certain
> files...  So I've started to clean up large parts of it.  The very first
> thing is to get ipfw out of ip_input/ip_output which I have early
> patches (see last status report).  In that patch are two more things. 
> One is to make ip_reass() a real function taking a fragemented packet
> instead of being a half-way stub only capable of being called from
> ip_input.  The second thing is to move all ip options related functions
> (which are quite many/large and seldomly used) to their own .c/.h file. 
> With that alone both ip_input/ip_output shrink by approx. 1/3 in size
> and get way more readable and understandable. 

I agree generally with all of the improvements you have proposed --
cleaning up the ip_input() and ip_output() paths is imperative.  Likewise
attempting to reduce the incestuousness of the stack and its various
components, normalize utility functions such as reassembly, etc.

> Well, the only thing I really want to say is that correctly working code
> is always a great base to optimize from.  I think this is one of the big
> lessions I've learned through my relatively young kernel programming
> career and from the VM work of John Dyson (for the younger among us, he
> and David Greenman did the orginal implementation of the unified VM we
> have.  John lost himself in micro-optimizations where he somewhat lost
> the ability to see the forest because of all the trees in the way.  In
> the end he had to give it up).

Agreed.  My goal in picking up the pieces from various people working on
this has been to get to the "decent first pass" so that we can finally
understand how all the pieces come together.  There should be a number of
fairly easy optimizations we can look into once we're able to measure
accurately the impact of changes in the locking strategy.  The trick is
getting that decent first pass -- we're close, but not quite there.  The
good news is that the dual-mode model allows us to merge locking on
components without that locking necessarily being 100% complete.  I
anticipate a non-trivial window in which whether you can run Giant-free
depends on whether you're using more obscure stack components, for
example.  I hope it is not long enough that we have to improve the
mechanism for the dual-mode (i.e., have the kernel select running with
Giant based on the code compiled in, etc).  Right now it's a simple loader
tunable.  Anyhow, my hope is to have a substantial amount of time to work
on cleaning up this weekend so that I can update the patch sets.  I also
need to integrate in changes from rik to make userspace compile with the
modified kernel, etc.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research

From owner-freebsd-arch@FreeBSD.ORG  Fri May 21 10:24:41 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id D57F216A4CE
	for <arch@freebsd.org>; Fri, 21 May 2004 10:24:41 -0700 (PDT)
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 6272B43D2D
	for <arch@freebsd.org>; Fri, 21 May 2004 10:24:41 -0700 (PDT)
	(envelope-from robert@fledge.watson.org)
Received: from fledge.watson.org (localhost [127.0.0.1])
	by fledge.watson.org (8.12.11/8.12.11) with ESMTP id i4LHNqgH024685;
	Fri, 21 May 2004 13:23:52 -0400 (EDT)
	(envelope-from robert@fledge.watson.org)
Received: from localhost (robert@localhost)i4LHNqQr024682;
	Fri, 21 May 2004 13:23:52 -0400 (EDT)
	(envelope-from robert@fledge.watson.org)
Date: Fri, 21 May 2004 13:23:51 -0400 (EDT)
From: Robert Watson <rwatson@freebsd.org>
X-Sender: robert@fledge.watson.org
To: Matthew Dillon <dillon@apollo.backplane.com>
In-Reply-To: <200405210103.i4L13QWT068012@apollo.backplane.com>
Message-ID: <Pine.NEB.3.96L.1040521122004.4759C-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: arch@freebsd.org
Subject: Re: Network Stack Locking
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 17:24:41 -0000


On Thu, 20 May 2004, Matthew Dillon wrote:

>     It should be noted that the biggest advantages of the distributed
>     approach are (1) The ability to operate on individual PCBs without
>     having to do any token/mutex/other locking at all, (2) Cpu locality
>     of reference in regards to cache mastership of the PCBs and related data,
>     and (3) avoidance of data cache pollution across cpus (more cpus == 
>     better utilization of individual L1/L2 caches and far greater
>     scaleability).  The biggest disadvantage is the mandatory thread switch
>     (but this is mitigated as load increases since each thread can work on
>     several PCBs without further switches, and because our thread scheduler
>     is extremely light weight under SMP conditions).  Messaging passing
>     overhead is very low since most operations already require some sort of
>     roll-up structure to be passed (e.g. an mbuf in the case of the network).

My primary concern with this approach (and the reason I'm taking somewhat
of a "wait and see what happens" attitude) is the level of inter-component
incestuousness (referred to elsewhere in this thread).  At particular
layers in the stack -- the PCBs are probably the best example -- I see the
opportunity for this sort of per-CPU unsynchronized access offering a very
clean and uncomplicated approach.

However, I'm concerned that along many of the total end-to-end paths,
there are a moderate number of pieces that will require traditional
synchronization or extensive re-writing: the route table, host cache, a
variety of "processing" packages such as netgraph, IPSEC, et al.  None of
that suggests that the per-cpu synchronization-free access in a thread
shouldn't be applied, but I'd like to see it demonstrated to be a useful
technique in a more broad sense.  One of the key implied benefits of the
approach is that it allows you to avoid significant rewriting costs for
existing code, which is appealing, but less appealing if it doesn't fall
out in the general case. 

The other concern I have is whether the message queues get deep or not: 
many of the benefits of message queues come when the queues allow
coalescing of context switches to process multiple packets.  If you're
paying a context switch per packet passing through the stack each time you
cross a boundary, there's a non-trivial operational cost to that.  So what
I'd like to see are the numbers that suggest, on a pretty functional
sample stack, that you get at least an interesting level of queuing and
therefore effective coalescing of synchronization.  I've started looking
at similar issues in the type-specific mbuf queues in the FreeBSD kernel
-- additional context switches are expensive and best avoided even if you
use explicit synchronization primitives such as mutexes.

>     In anycase, if you are seriously considering any sort of distributed
>     methodology you should also consider formalizing a messaging passing
>     API for FreeBSD.  Even if you don't like our LWKT messaging API, I
>     think you would love the DFly IPI messaging subsystem and it would be
>     very easy to port as a first step.  We use it so much now in DFly
>     that I don't think I could live without it.  e.g. for clock distribution,
>     interrupt distribution, thread/cpu isolation, wakeup(), MP-safe messaging
>     at higher levels (and hence packet routing), free()-return-to-
>     originating-cpu (mutexless slab allocator), SMP MMU synchronization
>     (the basic VM/pte-race issue with userland brought up by Alan Cox),
>     basic scheduler operations, signal(), and the list goes on and on.
>     In DFly, IPI messaging and message processing is required to be MP
>     safe (it always occurs outside the BGL, like a cpu-localized fast
>     interrupt), but a critical section still protects against reception
>     processing so code that uses it can be made very clean.

As someone who's worked with Darwin and other Mach-derived operating
systems, I see the clear appeal of message passing systems, as I think
we've discussed in other forums.  They offer substantially interesting
benefits from a security perspective also as they offer more clean
separation between components, especially userspace and the kernel. 
However, based on past experience with such systems, I'm also very
cautious about the notion.  The increased level of separation between
components can also make it harder to understand the interactions between
components in a debugging sense: for example, if your stack trace in the
TCP code only goes up to the queue receive primitive, the debugger can't
simply tell you what code originated the mbuf.

In the past, I've explored binding stack traces to messages in message
passing systems when operating in debugging mode so that the debugger
walks up to the message queue, and can then follow the stack trace from
the message to understand more about the calling context.  I've also used
this on FreeBSD in userspace -- we have local modifications to allow the
kernel to attach stack traces of the sending process to messages passed
over UNIX domain sockets so that the receiving code can grab the stack
trace as ancillary data.

The trick, though, is to make sure you're not just substituting message
queue operations and context switches for mutexes, because those both have
a moderate cost.  Many of the benefits come in reducing explicit
synchronization and then amortizing the context switch cost over multiple
instances, which helps with the cache and many other things.  So something
I'd very much like to see out of the dfbsd prototype code is a set of
measurements on queue depth at the hand-off points between layers, and
statistics on #queue operations, synchronization points, etc, amortized
over multiple deliveries. 
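
As a rough illustration of the kind of measurement meant here, the
following is a minimal single-threaded sketch of a hand-off queue that
counts drain passes and items so the average batch size (items per
wakeup) and the maximum depth can be read off afterwards.  The names
(struct pktq, pktq_enqueue, pktq_drain) and the fixed capacity are
assumptions made for the example; it is not code from either kernel and
it deliberately leaves out all synchronization:

```c
#include <assert.h>
#include <stddef.h>

#define	QCAP	64		/* power of two, so index wrap stays consistent */

struct pktq {
	void	*slot[QCAP];
	size_t	 head;		/* next entry to dequeue */
	size_t	 tail;		/* next free entry */
	unsigned long wakeups;	/* drain passes that found work */
	unsigned long items;	/* total entries processed */
	size_t	 max_depth;	/* deepest the queue has been */
};

static size_t
pktq_depth(const struct pktq *q)
{
	return (q->tail - q->head);
}

/* Returns 0 on success, -1 if the queue is full. */
static int
pktq_enqueue(struct pktq *q, void *pkt)
{
	if (pktq_depth(q) == QCAP)
		return (-1);
	q->slot[q->tail++ % QCAP] = pkt;
	if (pktq_depth(q) > q->max_depth)
		q->max_depth = pktq_depth(q);
	return (0);
}

/*
 * Process everything currently queued in one pass, standing in for a
 * single context switch into the consumer.  Returns the batch size.
 * handler may be NULL if only the statistics are of interest.
 */
static size_t
pktq_drain(struct pktq *q, void (*handler)(void *))
{
	size_t n = 0;

	while (q->head != q->tail) {
		void *pkt = q->slot[q->head++ % QCAP];

		if (handler != NULL)
			handler(pkt);
		n++;
	}
	if (n > 0) {
		q->wakeups++;
		q->items += n;
	}
	return (n);
}
```

If items / wakeups stays close to 1 under load, each packet is paying
for its own switch and the amortization argument does not hold.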

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research


From owner-freebsd-arch@FreeBSD.ORG  Fri May 21 12:20:00 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id ED51F16A4CE; Fri, 21 May 2004 12:19:59 -0700 (PDT)
Received: from sccrmhc12.comcast.net (sccrmhc12.comcast.net [204.127.202.56])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 953A743D39; Fri, 21 May 2004 12:19:58 -0700 (PDT)
	(envelope-from julian@elischer.org)
Received: from interjet.elischer.org ([24.7.73.28])
          by comcast.net (sccrmhc12) with ESMTP
          id <2004052119194001200oj2q5e>; Fri, 21 May 2004 19:19:41 +0000
Received: from localhost (localhost.elischer.org [127.0.0.1])
	by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id MAA87919;
	Fri, 21 May 2004 12:19:39 -0700 (PDT)
Date: Fri, 21 May 2004 12:19:37 -0700 (PDT)
From: Julian Elischer <julian@elischer.org>
To: John Baldwin <jhb@FreeBSD.org>
In-Reply-To: <200405210959.25368.jhb@FreeBSD.org>
Message-ID: <Pine.BSF.4.21.0405211153350.72391-100000@InterJet.elischer.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: arch@FreeBSD.org
cc: freebsd-arch@FreeBSD.org
Subject: Re: atomic reference counting primatives.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 19:20:00 -0000


On Fri, 21 May 2004, John Baldwin wrote:

> On Thursday 20 May 2004 10:54 pm, M. Warner Losh wrote:
> > In message:
> > <Pine.BSF.4.21.0405201340590.72391-100000@InterJet.elischer.org>
> >
> >             Julian Elischer <julian@elischer.org> writes:
> > : This has been raised before but I've come across uses for it again and
> > : again so I'm raising it again.
> > : JHB once posted some atomic referenc counting primatives. (Do you still
> > : have them John?)
> > : Alfred once said he had soem somewhere too, and other s have commentted
> > : on this before, but we still don't seem to have any.
> > :
> > : every object is reference counted with its own code and
> > : sometimes it's done poorly.
> > :
> > : Some peiople indicated that there are cases where a generic refcounter
> > : can not be used and usd this as  a reason to not have one at all.
> > :
> > : So, here are some possibilities..
> > : my first "write it down without too much thinking" effort..
> > :
> > : typedef {mumble} refcnt_t
> > :
> > : refcnt_add(refcnt_t *)
> > :   Increments the reference count.. no magic except to be atomic.
> > :
> > :
> > : int	refcnt_drop(refcnt *, struct mutex *)
> > :  Decrements the refcount. If it goes to 0 it returns 0 and locks the
> > : mutex  (if the mutex is supplied)..
> >
> > What prevents refcnt_add() from happening after ref count drops to 0?
> > Wouldn't that be a race?  Eg, if we have two threads:
> >
> >
> > 	Thread A			Thread B
> >
> > 	objp = lookup();
> > [1]					refcnt_drop(&objp->ref, &objp->mtx);
> > [2]	refcnt_add(&obj->ref);
> > 					BANG!
> >
> > If [1] happens before [2], then bad things happen at BANG!  If [2]
> > happens before [1], then the mutex won't be locked at BANG and things
> > is good.  Thread A believes it has a valid reference to objp after the
> > refcnt_add and no way of knowing otherwise.
> >
> > Is there a safe way to use the API into what you are proposing?
> 
> This situation can't happen if you are properly using reference counting.  For 
> the reference count to be at 1 in thread B, it has to have the only reference 
> meaning that the object has already been removed from any lists, etc.

Exactly.. B needs to have got his copy of the reference from somewhere,
and that reference should have been counted somewhere, as should B's copy
of it.
So, the reference count should be at least 2 before B drops his
reference.. and possibly 3..


I would even go on record as saying that I have seen and liked 
a refcount API which was (from memory something like):

void * refcnt_add(offsetof(struct obj, refcnt), void ** object_p)

which takes a pointer to the object pointer you are copying,
atomically increments the object's reference count, and returns the
contents of the pointer.
If the contents of the pointer are NULL, then it returns NULL
and doesn't increment anything..

The reference decrement atomically reduced the reference count and
zapped the pointer, and returned a copy of the pointer if
the reference count had gone to 0 (or NULL if not).

So usage was:
struct xx *globalpointer;   /* has its own owner somewhere */

	mypointer = refcnt_add(offsetof(struct xx, refcnt), &globalpointer);
	if (mypointer == NULL) {
		printf("didn't find an object\n");
		return (-1);
	}
	manipulate(mypointer);
	if ((tmppointer = refcnt_drop(&mypointer, &globalpointer))) {
		free(tmppointer);
	}


someone else who owns the globalpointer reference might in
the meanwhile do:

if ((tmppointer = refcnt_drop(globalpointer->refcnt, &globalpointer))) {
	free(tmppointer);
}

and you were guaranteed to get a predictable result.
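
A minimal sketch of that style of API, written from the description
above rather than from the original code.  The names (struct obj,
ref_copy, ref_drop) are hypothetical, and a single C11 atomic_flag spin
lock stands in for whatever atomic mechanism the remembered
implementation used, so this shows the semantics only, not the
performance:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct obj {
	unsigned refcnt;	/* protected by slot_lock */
	int	 payload;
};

/* Assumption: one global spin lock covers every pointer slot and count. */
static atomic_flag slot_lock = ATOMIC_FLAG_INIT;

static void
slot_lock_enter(void)
{
	while (atomic_flag_test_and_set_explicit(&slot_lock,
	    memory_order_acquire))
		;	/* spin */
}

static void
slot_lock_exit(void)
{
	atomic_flag_clear_explicit(&slot_lock, memory_order_release);
}

/*
 * Copy the reference stored in *slot.  Returns the object with its
 * count bumped, or NULL (and counts nothing) if the slot is empty.
 */
static struct obj *
ref_copy(struct obj **slot)
{
	struct obj *o;

	slot_lock_enter();
	o = *slot;
	if (o != NULL)
		o->refcnt++;
	slot_lock_exit();
	return (o);
}

/*
 * Give up the reference stored in *slot and zap the slot.  Returns the
 * object if that was the last reference (caller frees it), else NULL.
 */
static struct obj *
ref_drop(struct obj **slot)
{
	struct obj *o, *last = NULL;

	slot_lock_enter();
	o = *slot;
	*slot = NULL;
	if (o != NULL && --o->refcnt == 0)
		last = o;
	slot_lock_exit();
	return (last);
}
```

Because the pointer is read and the count changed under the same lock, a
copier either sees the slot still populated (and the count is at least 1
on account of the slot's own reference) or sees NULL; it never bumps a
count that has already reached 0.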


From owner-freebsd-arch@FreeBSD.ORG  Fri May 21 12:36:39 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id CFE4816A4CE
	for <freebsd-arch@freebsd.org>; Fri, 21 May 2004 12:36:39 -0700 (PDT)
Received: from sccrmhc11.comcast.net (sccrmhc11.comcast.net [204.127.202.55])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 7F08143D41
	for <freebsd-arch@freebsd.org>; Fri, 21 May 2004 12:36:39 -0700 (PDT)
	(envelope-from cristjc@comcast.net)
Received: from blossom.cjclark.org
	(c-24-6-187-112.client.comcast.net[24.6.187.112])
	by comcast.net (sccrmhc11) with ESMTP
	id <2004052119360701100qhchde>; Fri, 21 May 2004 19:36:07 +0000
Received: from blossom.cjclark.org (localhost. [127.0.0.1])
	by blossom.cjclark.org (8.12.9p2/8.12.8) with ESMTP id i4LJa68B008351
	for <freebsd-arch@freebsd.org>; Fri, 21 May 2004 12:36:06 -0700 (PDT)
	(envelope-from cristjc@comcast.net)
Received: (from cjc@localhost)
	by blossom.cjclark.org (8.12.9p2/8.12.9/Submit) id i4LJa5ni008350
	for freebsd-arch@freebsd.org; Fri, 21 May 2004 12:36:05 -0700 (PDT)
	(envelope-from cristjc@comcast.net)
X-Authentication-Warning: blossom.cjclark.org: cjc set sender to
	cristjc@comcast.net using -f
Date: Fri, 21 May 2004 12:36:05 -0700
From: "Crist J. Clark" <cristjc@comcast.net>
To: freebsd-arch@freebsd.org
Message-ID: <20040521193605.GA8246@blossom.cjclark.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.2.1i
X-URL: http://people.freebsd.org/~cjc/
Subject: Move /usr/sup to /var/db/sup?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: "Crist J. Clark" <cjc@freebsd.org>
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 19:36:39 -0000

Just a minor thing, but I would think[0] most people would agree that
/var/db/sup is a much more logical place for the CVSup "base" directory
than /usr/sup. Yes, it doesn't take up much space on /usr, but for
those who don't want to write to /usr[1] too much or mount /usr read-
only, it's an irritant.

Of course, there is one big reason not to change it, because it would
be a change.

Personally, I don't think it will be disruptive to make changes to the
example files in /usr/share/examples/cvsup. People who already have
/usr/sup populated are using their own localized versions of these
files, so the change won't affect them (not that losing the "sup"
directory is that big of a deal). A person starting with a copy of
one of the examples is probably starting a fresh CVSup and will be
creating a new sup dir anyway.

Does anyone object to going through the example supfiles with a change like,

--- cvs-supfile 4 May 2004 20:03:50 -0000       1.42
+++ cvs-supfile 21 May 2004 19:30:23 -0000
@@ -53 +53 @@
-*default base=/usr
+*default base=/var/db

[0] But with any seemingly insignificant change like this, there is
an excellent chance some people out there do not agree and will
be quite vocal about it.

[1] Or / if /usr doesn't have its own file system. The same arguments
apply.
-- 
Crist J. Clark                     |     cjclark@alum.mit.edu
                                   |     cjclark@jhu.edu
http://people.freebsd.org/~cjc/    |     cjc@freebsd.org

From owner-freebsd-arch@FreeBSD.ORG  Fri May 21 12:42:58 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 7284B16A4CE; Fri, 21 May 2004 12:42:58 -0700 (PDT)
Received: from odin.ac.hmc.edu (Odin.AC.HMC.Edu [134.173.32.75])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 570B443D1D; Fri, 21 May 2004 12:42:58 -0700 (PDT)
	(envelope-from brdavis@odin.ac.hmc.edu)
Received: from odin.ac.hmc.edu (IDENT:brdavis@localhost.localdomain
	[127.0.0.1])
	by odin.ac.hmc.edu (8.12.10/8.12.10) with ESMTP id i4LJgWs0023268;
	Fri, 21 May 2004 12:42:32 -0700
Received: (from brdavis@localhost)
	by odin.ac.hmc.edu (8.12.10/8.12.3/Submit) id i4LJgWMb023267;
	Fri, 21 May 2004 12:42:32 -0700
Date: Fri, 21 May 2004 12:42:32 -0700
From: Brooks Davis <brooks@one-eyed-alien.net>
To: "Crist J. Clark" <cjc@freebsd.org>
Message-ID: <20040521194231.GA22816@Odin.AC.HMC.Edu>
References: <20040521193605.GA8246@blossom.cjclark.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="liOOAslEiF7prFVr"
Content-Disposition: inline
In-Reply-To: <20040521193605.GA8246@blossom.cjclark.org>
User-Agent: Mutt/1.5.4i
cc: freebsd-arch@freebsd.org
Subject: Re: Move /usr/sup to /var/db/sup?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 19:42:58 -0000


--liOOAslEiF7prFVr
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, May 21, 2004 at 12:36:05PM -0700, Crist J. Clark wrote:
> Just a minor thing, but I would think[0] most people would agree that
> /var/db/sup is a much more logical place for the CVSup "base" directory
> than /usr/sup. Yes, it doesn't take up much space on /usr, but for
> those who don't want to write to /usr[1] too much or mount /usr read-
> only, it's an irritant.
>
> Of course, there is one big reason not to change it, because it would
> be a change.
>
> Personally, I don't think it will be disruptive to make changes to the
> example files in /usr/share/examples/cvsup. People who already have
> /usr/sup populated are using their own localized versions of these
> files, so the change won't affect them (not that losing the "sup"
> directory is that big of a deal). A person starting with a copy of
> one of the examples is probably starting a fresh CVSup and will be
> creating a new sup dir anyway.

This seems reasonable.  If you're going to do it, I suggest adding
/var/db/sup to the appropriate mtree file so it always exists.  That
way people who blindly copy their supfiles to /usr/sup will still get
something that works and their finger memory won't be broken.

-- Brooks

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4

--liOOAslEiF7prFVr
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFArlunXY6L6fI4GtQRAi9IAJ9Kas1YehLIg+jiBKxoIKS0K7/9XQCgjlgC
Li3+tNYoMlYHK/a9sS42vBQ=
=1I6l
-----END PGP SIGNATURE-----

--liOOAslEiF7prFVr--

From owner-freebsd-arch@FreeBSD.ORG  Fri May 21 14:32:46 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4180516A4CE; Fri, 21 May 2004 14:32:46 -0700 (PDT)
Received: from mail023.syd.optusnet.com.au (mail023.syd.optusnet.com.au
	[211.29.132.101])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id E09D343D2F; Fri, 21 May 2004 14:32:44 -0700 (PDT)
	(envelope-from PeterJeremy@optushome.com.au)
Received: from cirb503493.alcatel.com.au
	(c211-30-75-229.belrs2.nsw.optusnet.com.au [211.30.75.229])
	i4LLW9j31023;	Sat, 22 May 2004 07:32:09 +1000
Received: from cirb503493.alcatel.com.au (localhost.alcatel.com.au
	[127.0.0.1])i4LLW9cj091833;	Sat, 22 May 2004 07:32:09 +1000 (EST)
	(envelope-from pjeremy@cirb503493.alcatel.com.au)
Received: (from pjeremy@localhost)i4LLW8Za091832;
	Sat, 22 May 2004 07:32:08 +1000 (EST)	(envelope-from pjeremy)
Date: Sat, 22 May 2004 07:32:08 +1000
From: Peter Jeremy <PeterJeremy@optushome.com.au>
To: Andre Oppermann <andre@freebsd.org>
Message-ID: <20040521213208.GA87546@cirb503493.alcatel.com.au>
References: <Pine.NEB.3.96L.1040520162957.90528H-100000@fledge.watson.org>
	<40AD2405.DC13B45C@freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <40AD2405.DC13B45C@freebsd.org>
User-Agent: Mutt/1.4.2i
cc: arch@freebsd.org
cc: Robert Watson <rwatson@freebsd.org>
Subject: Re: Network Stack Locking
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 21:32:46 -0000

On Thu, 2004-May-20 23:32:53 +0200, Andre Oppermann wrote:
>Robert Watson wrote:
>...
>> Note that there are some serious issues with the current locking changes:
>...
>> 
>
>I vote for the approach to get in as much as possible from the moment
>on it is known to work *correctly* (not necessarily perfectly optimal/
>optimized).  Having something correct is an ideal base to start for
>optimizing.  There I'm ready to jump in and go ahead to make things
>better by re-arranging or re-writing them.

Keep in mind that the best improvements in performance are achieved by
using a better algorithm - macro-optimisation rather than micro-
optimisation.  We currently have a network stack that works correctly
and should be careful about committing WIP code that may be heading in
the wrong direction.

>Progress happens incrementally.  Put in Green's kqueue locking, have
>that working correctly and make it perfect in a second step.

I don't believe this is the correct approach at this time.  Brian's
code removes functionality that people have stated that they _do_ use.
In theory, John-Mark's approach offers better performance without the
loss of functionality.  Before implementing Brian's code, the Project
needs to decide which direction it should move in.
-- 
Peter Jeremy

From owner-freebsd-arch@FreeBSD.ORG  Fri May 21 14:38:59 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 30E0116A4CE; Fri, 21 May 2004 14:38:59 -0700 (PDT)
Received: from smtp2.server.rpi.edu (smtp2.server.rpi.edu [128.113.2.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 93F4C43D39; Fri, 21 May 2004 14:38:58 -0700 (PDT)
	(envelope-from drosih@rpi.edu)
Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47])
	by smtp2.server.rpi.edu (8.12.8/8.12.8) with ESMTP id i4LLcvIX031059;
	Fri, 21 May 2004 17:38:57 -0400
Mime-Version: 1.0
X-Sender: drosih@mail.rpi.edu
Message-Id: <p06020403bcd422915d2e@[128.113.24.47]>
In-Reply-To: <20040521193605.GA8246@blossom.cjclark.org>
References: <20040521193605.GA8246@blossom.cjclark.org>
Date: Fri, 21 May 2004 17:38:56 -0400
To: "Crist J. Clark" <cjc@freebsd.org>, freebsd-arch@freebsd.org
From: Garance A Drosihn <drosih@rpi.edu>
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
X-Scanned-By: CanIt (www . canit . ca)
Subject: Re: Move /usr/sup to /var/db/sup?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 21:38:59 -0000

At 12:36 PM -0700 5/21/04, Crist J. Clark wrote:
>Just a minor thing, but I would think[0] most people would
>agree that /var/db/sup is a much more logical place for the
>CVSup "base" directory than /usr/sup. Yes, it doesn't take
>up much space on /usr, but for those who don't want to write
>to /usr[1] too much or mount /usr read-only, it's an irritant.
>
>Of course, there is one big reason not to change it, because
>it would be a change.

I have all my own sup-files anyway, so I do not have any
strong opinion on this.  But there is one minor advantage
that I have noticed in having the "base=" directory in the
same partition as the "prefix=" directory.  If the partition
matching "prefix=" is not mounted, and if the "base=" directory
is on that partition, then any attempt to run cvsup will
immediately fail.

However, if the "prefix=" partition is not mounted, and the
"base=" directory *is* available (because it is on a different
partition), then cvsup will go right ahead and download
everything onto the wrong partition.  Depending on how your
machine is set up, this can be rather disastrous...  (it was
for me, at least!)
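
One way to guard against that failure mode is to check, before starting
cvsup, that the "prefix=" directory really sits on its own filesystem.
The Python below is only a sketch under a stated assumption: that the
prefix is meant to be a separate partition from some parent directory
(here "/").  The function names are invented for this example, and the
device-number comparison can be fooled by setups where two mounts share
a device.

```python
import os


def on_same_filesystem(path_a: str, path_b: str) -> bool:
    """True if both paths exist and report the same device number."""
    try:
        return os.stat(path_a).st_dev == os.stat(path_b).st_dev
    except FileNotFoundError:
        return False


def prefix_looks_mounted(prefix: str, parent: str = "/") -> bool:
    """Guess whether `prefix` is on its own mounted filesystem.

    Assumption: `prefix` is supposed to be a different partition from
    `parent`.  If both share a device, the partition is probably not
    mounted and an update would land on the wrong filesystem.
    """
    return os.path.isdir(prefix) and not on_same_filesystem(prefix, parent)


if __name__ == "__main__":
    prefix = "/usr"  # example value; use whatever your supfile's prefix= says
    if prefix_looks_mounted(prefix):
        print("prefix is on its own filesystem; OK to run cvsup")
    else:
        print("prefix does not look mounted; refusing to run cvsup")
```

A wrapper script could run this check first and only then invoke cvsup,
which restores the "fail immediately" behaviour even when base and prefix
live on different partitions.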

I have no idea if that is why someone went with /usr/sup in
the example supfiles, though.  I do not object to making the
change to use /var/db/sup, but then I don't use those example
files so my vote wouldn't mean much anyway...  :-)

-- 
Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu

From owner-freebsd-arch@FreeBSD.ORG  Fri May 21 14:39:48 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5F04016A4CE
	for <freebsd-arch@freebsd.org>; Fri, 21 May 2004 14:39:48 -0700 (PDT)
Received: from sccrmhc13.comcast.net (sccrmhc13.comcast.net [204.127.202.64])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 0AE8743D45
	for <freebsd-arch@freebsd.org>; Fri, 21 May 2004 14:39:48 -0700 (PDT)
	(envelope-from cristjc@comcast.net)
Received: from blossom.cjclark.org
	(c-24-6-187-112.client.comcast.net[24.6.187.112])
	by comcast.net (sccrmhc13) with ESMTP
	id <2004052121394601600966s6e>; Fri, 21 May 2004 21:39:47 +0000
Received: from blossom.cjclark.org (localhost. [127.0.0.1])
	by blossom.cjclark.org (8.12.9p2/8.12.8) with ESMTP id i4LLdj8B008718;
	Fri, 21 May 2004 14:39:45 -0700 (PDT)
	(envelope-from cristjc@comcast.net)
Received: (from cjc@localhost)
	by blossom.cjclark.org (8.12.9p2/8.12.9/Submit) id i4LLdilN008717;
	Fri, 21 May 2004 14:39:44 -0700 (PDT)
	(envelope-from cristjc@comcast.net)
X-Authentication-Warning: blossom.cjclark.org: cjc set sender to
	cristjc@comcast.net using -f
Date: Fri, 21 May 2004 14:39:44 -0700
From: "Crist J. Clark" <cristjc@comcast.net>
To: Brooks Davis <brooks@one-eyed-alien.net>
Message-ID: <20040521213944.GB8246@blossom.cjclark.org>
References: <20040521193605.GA8246@blossom.cjclark.org>
	<20040521194231.GA22816@Odin.AC.HMC.Edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20040521194231.GA22816@Odin.AC.HMC.Edu>
User-Agent: Mutt/1.4.2.1i
X-URL: http://people.freebsd.org/~cjc/
cc: freebsd-arch@freebsd.org
Subject: Re: Move /usr/sup to /var/db/sup?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: cjclark@alum.mit.edu
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 21:39:48 -0000

On Fri, May 21, 2004 at 12:42:32PM -0700, Brooks Davis wrote:
> On Fri, May 21, 2004 at 12:36:05PM -0700, Crist J. Clark wrote:
> > Just a minor thing, but I would think[0] most people would agree that
> > /var/db/sup is a much more logical place for the CVSup "base" directory
> > than /usr/sup. Yes, it doesn't take up much space on /usr, but for
> > those who don't want to write to /usr[1] too much or mount /usr read-
> > only, it's an irritant.
> > 
> > Of course, there is one big reason not to change it, because it would
> > be a change.
> > 
> > Personally, I don't think it will be disruptive to make changes to the
> > example files in /usr/share/examples/cvsup. People who already have
> > /usr/sup populated are using their own localized versions of these
> > files, so the change won't affect them (not that losing the "sup"
> > directory is that big of a deal). A person starting with a copy of
> > one of the examples is probably starting a fresh CVSup and will be
> > creating a new sup dir anyway.
> 
> This seems reasonable.  If you're going to do it, I suggest adding
> /var/db/sup to the appropriate mtree file so it always exists.  That
> way people who blindly copy their supfiles to /usr/sup will still get
> something that works and their finger memory won't be broken.

Hmmm... /usr/sup is not in BSD.mtree.usr. I believe cvsup(1) creates
it when it does not exist. Are you saying we should add /var/db/sup
to BSD.mtree.var even though we don't create /usr/sup? Or are you
saying to create /var/db/sup and make a symlink in /usr to it?
-- 
Crist J. Clark                     |     cjclark@alum.mit.edu
                                   |     cjclark@jhu.edu
http://people.freebsd.org/~cjc/    |     cjc@freebsd.org

From owner-freebsd-arch@FreeBSD.ORG  Fri May 21 14:47:36 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3EB0116A4CE
	for <freebsd-arch@freebsd.org>; Fri, 21 May 2004 14:47:36 -0700 (PDT)
Received: from odin.ac.hmc.edu (Odin.AC.HMC.Edu [134.173.32.75])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 23DC343D3F
	for <freebsd-arch@freebsd.org>; Fri, 21 May 2004 14:47:36 -0700 (PDT)
	(envelope-from brdavis@odin.ac.hmc.edu)
Received: from odin.ac.hmc.edu (IDENT:brdavis@localhost.localdomain
	[127.0.0.1])
	by odin.ac.hmc.edu (8.12.10/8.12.10) with ESMTP id i4LLlYs0002618;
	Fri, 21 May 2004 14:47:34 -0700
Received: (from brdavis@localhost)
	by odin.ac.hmc.edu (8.12.10/8.12.3/Submit) id i4LLlYQx002616;
	Fri, 21 May 2004 14:47:34 -0700
Date: Fri, 21 May 2004 14:47:34 -0700
From: Brooks Davis <brooks@one-eyed-alien.net>
To: cjclark@alum.mit.edu
Message-ID: <20040521214733.GA1549@Odin.AC.HMC.Edu>
References: <20040521193605.GA8246@blossom.cjclark.org>
	<20040521194231.GA22816@Odin.AC.HMC.Edu>
	<20040521213944.GB8246@blossom.cjclark.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="u3/rZRmxL6MmkK24"
Content-Disposition: inline
In-Reply-To: <20040521213944.GB8246@blossom.cjclark.org>
User-Agent: Mutt/1.5.4i
cc: freebsd-arch@freebsd.org
Subject: Re: Move /usr/sup to /var/db/sup?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 21:47:36 -0000


--u3/rZRmxL6MmkK24
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, May 21, 2004 at 02:39:44PM -0700, Crist J. Clark wrote:
> On Fri, May 21, 2004 at 12:42:32PM -0700, Brooks Davis wrote:
> > On Fri, May 21, 2004 at 12:36:05PM -0700, Crist J. Clark wrote:
> > > Just a minor thing, but I would think[0] most people would agree that
> > > > /var/db/sup is a much more logical place for the CVSup "base" directory
> > > than /usr/sup. Yes, it doesn't take up much space on /usr, but for
> > > those who don't want to write to /usr[1] too much or mount /usr read-
> > > only, it's an irritant.
> > >=20
> > > Of course, there is one big reason not to change it, because it would
> > > be a change.
> > >=20
> > > Personally, I don't think it will be disruptive to make changes to the
> > > example files in /usr/share/examples/cvsup. People who already have
> > > /usr/sup populated are using their own localized versions of these
> > > files, so the change won't affect them (not that losing the "sup"
> > > directory is that big of a deal). A person starting with a copy of
> > > one of the examples is probably starting a fresh CVSup and will be
> > > creating a new sup dir anyway.
> >=20
> > This seems reasionable.  If you're going to do it, I suggest adding
> > /var/db/sup to the appropriate mtree file so it always exists.  That
> > way people who blindly copy their supfiles to /usr/sup will still get
> > something that works and their finger memory won't be broken.
>
> Hmmm... /usr/sup is not in BSD.mtree.usr. I believe cvsup(1) creates
> it when it does not exist. Are you saying we should add it to
> BSD.mtree.var even though we don't create /usr/sup? Or are you
> saying to create /var/db/sup and make a symlink in /usr to it?

Hmm, for some reason I've always copied the example supfiles to /usr/sup
before editing them.  I'd assumed this was something I read in the
documentation in the distant past, but I can't find any evidence of
such a recommendation, so I may have just made that convention up.
Given that, I don't think it's necessary to create /var/db/sup, but it
might be a nice idea anyway.  Another directory in /var/db certainly
wouldn't hurt anything.

-- Brooks

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4

--u3/rZRmxL6MmkK24
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD4DBQFArnj1XY6L6fI4GtQRAq3FAJjixUcHH79yxhJh6yApO6oDR+0jAKCiCJMu
Cs3rrQoZ4UQPnmS8AxRIHA==
=adDF
-----END PGP SIGNATURE-----

--u3/rZRmxL6MmkK24--

From owner-freebsd-arch@FreeBSD.ORG  Fri May 21 14:57:41 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4662716A4CF
	for <freebsd-arch@freebsd.org>; Fri, 21 May 2004 14:57:41 -0700 (PDT)
Received: from mail.soaustin.net (mail.soaustin.net [207.200.4.66])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 20B6143D1D
	for <freebsd-arch@freebsd.org>; Fri, 21 May 2004 14:57:41 -0700 (PDT)
	(envelope-from linimon@lonesome.com)
Received: by mail.soaustin.net (Postfix, from userid 502)
	id 2F9BD148C1; Fri, 21 May 2004 16:57:24 -0500 (CDT)
Date: Fri, 21 May 2004 16:57:24 -0500 (CDT)
From: Mark Linimon <linimon@lonesome.com>
X-X-Sender: linimon@pancho
To: Brooks Davis <brooks@one-eyed-alien.net>
In-Reply-To: <20040521214733.GA1549@Odin.AC.HMC.Edu>
Message-ID: <Pine.LNX.4.44.0405211655470.24913-100000@pancho>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: cjclark@alum.mit.edu
cc: freebsd-arch@freebsd.org
Subject: Re: Move /usr/sup to /var/db/sup?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 21:57:41 -0000

Could we also consider the approach of leaving the supfiles in
/usr/sup (since they are small and only rarely change) and having
the files that change in /var/db/sup, or does the directory need
to be the same?

mcl

From owner-freebsd-arch@FreeBSD.ORG  Fri May 21 15:02:49 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 44ECD16A4CE
	for <freebsd-arch@freebsd.org>; Fri, 21 May 2004 15:02:49 -0700 (PDT)
Received: from odin.ac.hmc.edu (Odin.AC.HMC.Edu [134.173.32.75])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 1FD0D43D41
	for <freebsd-arch@freebsd.org>; Fri, 21 May 2004 15:02:49 -0700 (PDT)
	(envelope-from brdavis@odin.ac.hmc.edu)
Received: from odin.ac.hmc.edu (IDENT:brdavis@localhost.localdomain
	[127.0.0.1])
	by odin.ac.hmc.edu (8.12.10/8.12.10) with ESMTP id i4LM2ls0003911;
	Fri, 21 May 2004 15:02:47 -0700
Received: (from brdavis@localhost)
	by odin.ac.hmc.edu (8.12.10/8.12.3/Submit) id i4LM2lE5003909;
	Fri, 21 May 2004 15:02:47 -0700
Date: Fri, 21 May 2004 15:02:47 -0700
From: Brooks Davis <brooks@one-eyed-alien.net>
To: Mark Linimon <linimon@lonesome.com>
Message-ID: <20040521220247.GA3366@Odin.AC.HMC.Edu>
References: <20040521214733.GA1549@Odin.AC.HMC.Edu>
	<Pine.LNX.4.44.0405211655470.24913-100000@pancho>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="qDbXVdCdHGoSgWSk"
Content-Disposition: inline
In-Reply-To: <Pine.LNX.4.44.0405211655470.24913-100000@pancho>
User-Agent: Mutt/1.5.4i
cc: cjclark@alum.mit.edu
cc: freebsd-arch@freebsd.org
Subject: Re: Move /usr/sup to /var/db/sup?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 22:02:49 -0000


--qDbXVdCdHGoSgWSk
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, May 21, 2004 at 04:57:24PM -0500, Mark Linimon wrote:
> Could we also consider the approach of leaving the supfiles in
> /usr/sup (since they are small and only rarely change) and having
> the files that change in /var/db/sup, or does the directory need
> to be the same?

I don't think it matters to cvsup where the supfiles live.  I'll almost
certainly keep putting them in /usr/sup because that's where my fingers
think they should be.  Pre-creating /var/db/sup would make this easier
for me.

-- Brooks

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4

--qDbXVdCdHGoSgWSk
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFArnyGXY6L6fI4GtQRAmDMAJ43tWh641cq4l1cp/N6dN90Wb5/swCeMLzq
ZscgJDSfLrpvg5AvOIpdKB8=
=MgoT
-----END PGP SIGNATURE-----

--qDbXVdCdHGoSgWSk--

From owner-freebsd-arch@FreeBSD.ORG  Fri May 21 15:16:06 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3CD0616A4CE
	for <freebsd-arch@freebsd.org>; Fri, 21 May 2004 15:16:06 -0700 (PDT)
Received: from sccrmhc13.comcast.net (sccrmhc13.comcast.net [204.127.202.64])
	by mx1.FreeBSD.org (Postfix) with ESMTP id B10A843D1D
	for <freebsd-arch@freebsd.org>; Fri, 21 May 2004 15:16:05 -0700 (PDT)
	(envelope-from cristjc@comcast.net)
Received: from blossom.cjclark.org
	(c-24-6-187-112.client.comcast.net[24.6.187.112])
	by comcast.net (sccrmhc13) with ESMTP
	id <20040521221556016009chqfe>; Fri, 21 May 2004 22:15:56 +0000
Received: from blossom.cjclark.org (localhost. [127.0.0.1])
	by blossom.cjclark.org (8.12.9p2/8.12.8) with ESMTP id i4LMFt8B008893;
	Fri, 21 May 2004 15:15:55 -0700 (PDT)
	(envelope-from cristjc@comcast.net)
Received: (from cjc@localhost)
	by blossom.cjclark.org (8.12.9p2/8.12.9/Submit) id i4LMFslK008892;
	Fri, 21 May 2004 15:15:54 -0700 (PDT)
	(envelope-from cristjc@comcast.net)
X-Authentication-Warning: blossom.cjclark.org: cjc set sender to
	cristjc@comcast.net using -f
Date: Fri, 21 May 2004 15:15:54 -0700
From: "Crist J. Clark" <cristjc@comcast.net>
To: Mark Linimon <linimon@lonesome.com>
Message-ID: <20040521221554.GA8734@blossom.cjclark.org>
References: <20040521214733.GA1549@Odin.AC.HMC.Edu>
	<Pine.LNX.4.44.0405211655470.24913-100000@pancho>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Pine.LNX.4.44.0405211655470.24913-100000@pancho>
User-Agent: Mutt/1.4.2.1i
X-URL: http://people.freebsd.org/~cjc/
cc: freebsd-arch@freebsd.org
Subject: Re: Move /usr/sup to /var/db/sup?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: cjclark@alum.mit.edu
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 22:16:06 -0000

On Fri, May 21, 2004 at 04:57:24PM -0500, Mark Linimon wrote:
> Could we also consider the approach of leaving the supfiles in
> /usr/sup (since they are small and only rarely change) and having
> the files that change in /var/db/sup, or does the directory need
> to be the same?

You can put the supfiles wherever you want. There is no standard
place for a supfile. Since the "base" is specified in the supfile,
there is a chicken-and-egg problem of placing the supfile in the
base directory and expecting CVSup to find it. In addition, it 
would probably make more sense to put supfiles in the base directory
(which is /usr in the examples) rather than in the sup directory
(/usr/sup). I suspect most would consider having supfiles in /usr
quite an affront.

I didn't want to have to discuss the implications of the fact that
the hardcoded base default in cvsup(1) is /usr/local/etc/cvsup,
but that is probably the most logical place to put supfiles (logical
as in "the place someone else might actually find them").
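
To make the chicken-and-egg point concrete, here is a small Python sketch
of how the base (and hence the sup directory) falls out of the supfile
contents.  It is an illustration only: it reads "*default" lines, lets the
last base= win, and falls back to the default mentioned above.  It ignores
per-collection settings and command-line overrides, and the function names
are invented for this example.

```python
import posixpath

# Default base quoted in this message; treated here as an assumption.
DEFAULT_BASE = "/usr/local/etc/cvsup"


def supfile_base(text: str, default: str = DEFAULT_BASE) -> str:
    """Return the last base= value found on a *default line, else `default`."""
    base = default
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "*default":
            continue  # skips blank lines, comments, and collection lines
        for field in fields[1:]:
            if field.startswith("base="):
                base = field[len("base="):]
    return base


def sup_directory(text: str) -> str:
    """Directory holding the bookkeeping files: <base>/sup."""
    return posixpath.join(supfile_base(text), "sup")


if __name__ == "__main__":
    print(sup_directory("*default base=/usr\n"))
    print(sup_directory("*default base=/var/db\n"))
    print(sup_directory("# no base given\n"))
```

The supfile has to be read before the base is known, so the base directory
cannot also be the place the tool looks for the supfile.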
-- 
Crist J. Clark                     |     cjclark@alum.mit.edu
                                   |     cjclark@jhu.edu
http://people.freebsd.org/~cjc/    |     cjc@freebsd.org

From owner-freebsd-arch@FreeBSD.ORG  Fri May 21 15:40:54 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id AF77E16A4CF
	for <freebsd-arch@freebsd.org>; Fri, 21 May 2004 15:40:54 -0700 (PDT)
Received: from mail.soaustin.net (mail.soaustin.net [207.200.4.66])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 95B4643D3F
	for <freebsd-arch@freebsd.org>; Fri, 21 May 2004 15:40:54 -0700 (PDT)
	(envelope-from linimon@lonesome.com)
Received: by mail.soaustin.net (Postfix, from userid 502)
	id 9998F148B2; Fri, 21 May 2004 17:40:14 -0500 (CDT)
Date: Fri, 21 May 2004 17:40:14 -0500 (CDT)
From: Mark Linimon <linimon@lonesome.com>
X-X-Sender: linimon@pancho
To: cjclark@alum.mit.edu
In-Reply-To: <20040521221554.GA8734@blossom.cjclark.org>
Message-ID: <Pine.LNX.4.44.0405211739510.25889-100000@pancho>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: Mark Linimon <linimon@lonesome.com>
cc: freebsd-arch@freebsd.org
Subject: Re: Move /usr/sup to /var/db/sup?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2004 22:40:54 -0000

Um, by "supfiles" I meant the *-supfile files, in case that wasn't clear.

mcl

From owner-freebsd-arch@FreeBSD.ORG  Fri May 21 17:45:16 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 8A5DF16A4CE; Fri, 21 May 2004 17:45:16 -0700 (PDT)
Received: from smtp2.server.rpi.edu (smtp2.server.rpi.edu [128.113.2.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 352ED43D39; Fri, 21 May 2004 17:45:16 -0700 (PDT)
	(envelope-from drosih@rpi.edu)
Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47])
	by smtp2.server.rpi.edu (8.12.8/8.12.8) with ESMTP id i4M0igIX005961;
	Fri, 21 May 2004 20:44:43 -0400
Mime-Version: 1.0
X-Sender: drosih@mail.rpi.edu
Message-Id: <p06020404bcd44faaef2f@[128.113.24.47]>
In-Reply-To: 
 <Pine.BSF.4.21.0405201340590.72391-100000@InterJet.elischer.org>
References: 
 <Pine.BSF.4.21.0405201340590.72391-100000@InterJet.elischer.org>
Date: Fri, 21 May 2004 20:44:41 -0400
To: Julian Elischer <julian@elischer.org>, arch@freebsd.org
From: Garance A Drosihn <drosih@rpi.edu>
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
X-Scanned-By: CanIt (www . canit . ca)
cc: mtm@freebsd.org
Subject: Re: atomic reference counting primatives.
X-List-Received-Date: Sat, 22 May 2004 00:45:16 -0000

At 1:56 PM -0700 5/20/04, Julian Elischer wrote:
>This has been raised before but I have come across uses for
>it again and again so I'm raising it again.   JHB once posted
>some atomic reference counting primitives. (Do you still have
>them John?)   Alfred once said he had some somewhere too, and
>others have commented on this before, but we still don't seem
>to have any.

Btw, does this thread have anything to do with the present
buildworld breakage for sparc64?  I notice the compile-time
errors are something like:

/usr/src/lib/libthr/thread/thr_cancel.c: In function `testcancel':
/usr/src/lib/libthr/thread/thr_cancel.c:123: warning: passing
      arg 1 of `atomic_cmpset_int' from incompatible pointer type

My guess is that this is related to Mike's change to "Make libthr 
async-signal-safe without costly signal masking. [...etc...]".

This breakage underlines one reason that it would be mighty
convenient to have some "official" set of primitives.  It is
one thing if a developer has to roll-their-own solution for
i386, but somewhat more challenging if that solution has to
work across a half-dozen different hardware platforms.

This also suggests that it would be nice if the primitives
could be written so that if the wrong type of parameter is
given, the compile will fail on *all* platforms.
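
One way to get that property, as a rough sketch only: hide the
counter inside its own struct type, so that a caller who passes a
plain int * or long * gets the same incompatible-pointer diagnostic
on every platform, whatever the native integer sizes are.  Whether
that diagnostic is a warning or a hard error still depends on the
compiler and flags in use (-Werror, for instance).  This assumes
C11 <stdatomic.h> rather than the existing machine-dependent atomic
operations, and the names struct refcnt, refcnt_init(),
refcnt_acquire() and refcnt_release() are made up for illustration;
they are not anything that was posted in this thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical wrapper type.  The counter is only reachable through
 * struct refcnt, so the functions below never accept a bare integer
 * pointer, regardless of how wide int or long is on the platform.
 */
struct refcnt {
	atomic_uint count;
};

/* Set the initial number of references (usually 1 for the creator). */
static inline void
refcnt_init(struct refcnt *r, unsigned int n)
{
	atomic_init(&r->count, n);
}

/* Take an additional reference; the caller must already hold one. */
static inline void
refcnt_acquire(struct refcnt *r)
{
	atomic_fetch_add_explicit(&r->count, 1, memory_order_relaxed);
}

/*
 * Drop a reference.  Returns true if this call dropped the last one,
 * in which case the caller is responsible for freeing the object.
 */
static inline bool
refcnt_release(struct refcnt *r)
{
	/* fetch_sub returns the value held before the subtraction. */
	return atomic_fetch_sub_explicit(&r->count, 1,
	    memory_order_acq_rel) == 1;
}
```

A caller would embed a struct refcnt in its object, call
refcnt_init(&obj->ref, 1) at creation, and free the object only
when refcnt_release(&obj->ref) returns true.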

-- 
Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu

From owner-freebsd-arch@FreeBSD.ORG  Fri May 21 18:53:28 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id D4AC416A4CE; Fri, 21 May 2004 18:53:28 -0700 (PDT)
Received: from rwcrmhc12.comcast.net (rwcrmhc12.comcast.net [216.148.227.85])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 7E3B943D39; Fri, 21 May 2004 18:53:28 -0700 (PDT)
	(envelope-from julian@elischer.org)
Received: from interjet.elischer.org ([24.7.73.28])
          by comcast.net (rwcrmhc12) with ESMTP
          id <2004052201531401400g304le>; Sat, 22 May 2004 01:53:14 +0000
Received: from localhost (localhost.elischer.org [127.0.0.1])
	by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id SAA91906;
	Fri, 21 May 2004 18:53:13 -0700 (PDT)
Date: Fri, 21 May 2004 18:53:11 -0700 (PDT)
From: Julian Elischer <julian@elischer.org>
To: Garance A Drosihn <drosih@rpi.edu>
In-Reply-To: <p06020404bcd44faaef2f@[128.113.24.47]>
Message-ID: <Pine.BSF.4.21.0405211852070.72391-100000@InterJet.elischer.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: arch@freebsd.org
cc: mtm@freebsd.org
Subject: Re: atomic reference counting primatives.
X-List-Received-Date: Sat, 22 May 2004 01:53:28 -0000


On Fri, 21 May 2004, Garance A Drosihn wrote:

> At 1:56 PM -0700 5/20/04, Julian Elischer wrote:
> >This has been raised before but I have come across uses for
> >it again and again so I'm raising it again.   JHB once posted
> >some atomic reference counting primitives. (Do you still have
> >them John?)   Alfred once said he had some somewhere too, and
> >others have commented on this before, but we still don't seem
> >to have any.
> 
> Btw, does this thread have anything to do with the present
> buildworld breakage for sparc64?

Not specifically, but for the reasons you outline below,
it's an example of why one might want such primitives.


>  I notice the compile-time
> errors are something like:
> 
> /usr/src/lib/libthr/thread/thr_cancel.c: In function `testcancel':
> /usr/src/lib/libthr/thread/thr_cancel.c:123: warning: passing
>       arg 1 of `atomic_cmpset_int' from incompatible pointer type
> 
> My guess is that this is related to Mike's change to "Make libthr 
> async-signal-safe without costly signal masking. [...etc...]".
> 
> This breakage underlines one reason that it would be mighty
> convenient to have some "official" set of primitives.  It is
> one thing if a developer has to roll-their-own solution for
> i386, but somewhat more challenging if that solution has to
> work across a half-dozen different hardware platforms.
> 
> This also suggests that it would be nice if the primitives
> could be written so that if the wrong type of parameter is
> given, the compile will fail on *all* platforms.
> 
> -- 
> Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
> Senior Systems Programmer           or  gad@freebsd.org
> Rensselaer Polytechnic Institute    or  drosih@rpi.edu
> 

From owner-freebsd-arch@FreeBSD.ORG  Sat May 22 02:16:33 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 7BAA816A4CE; Sat, 22 May 2004 02:16:33 -0700 (PDT)
Received: from mx.nsu.ru (mx.nsu.ru [212.192.164.5])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 146D543D2D; Sat, 22 May 2004 02:16:33 -0700 (PDT)
	(envelope-from danfe@regency.nsu.ru)
Received: from regency.nsu.ru ([193.124.210.26])
	by mx.nsu.ru with esmtp (Exim 4.30)
	id 1BRSlr-0004De-Ce; Sat, 22 May 2004 16:26:23 +0700
Received: from regency.nsu.ru (localhost [127.0.0.1])
	by regency.nsu.ru (8.12.10/8.12.10) with ESMTP id i4M9IMAT053216;
	Sat, 22 May 2004 16:18:22 +0700 (NOVST)
	(envelope-from danfe@regency.nsu.ru)
Received: (from danfe@localhost)
	by regency.nsu.ru (8.12.10/8.12.10/Submit) id i4M9IMeu053182;
	Sat, 22 May 2004 16:18:22 +0700 (NOVST)
	(envelope-from danfe)
Date: Sat, 22 May 2004 16:18:22 +0700
From: Alexey Dokuchaev <danfe@nsu.ru>
To: "Crist J. Clark" <cjc@freebsd.org>
Message-ID: <20040522091822.GA50435@regency.nsu.ru>
References: <20040521193605.GA8246@blossom.cjclark.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20040521193605.GA8246@blossom.cjclark.org>
User-Agent: Mutt/1.4.2.1i
cc: freebsd-arch@freebsd.org
Subject: Re: Move /usr/sup to /var/db/sup?
X-List-Received-Date: Sat, 22 May 2004 09:16:33 -0000

On Fri, May 21, 2004 at 12:36:05PM -0700, Crist J. Clark wrote:
> Just a minor thing, but I would think[0] most people would agree that
> /var/db/sup is a much more logical place for the CVSup "base" directory
> than /usr/sup. Yes, it doesn't take up much space on /usr, but for
> those who don't want to write to /usr[1] too much or mount /usr read-
> only, it's an irritant.
> 
> Of course, there is one big reason not to change it, because it would
> be a change.

FWIW, a compatibility symlink could stay in place for a while (until 6.0,
maybe).

./danfe

From owner-freebsd-arch@FreeBSD.ORG  Sat May 22 11:18:06 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6004316A4CE
	for <freebsd-arch@freebsd.org>; Sat, 22 May 2004 11:18:06 -0700 (PDT)
Received: from blake.polstra.com (blake.polstra.com [64.81.189.66])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 01B6C43D1F
	for <freebsd-arch@freebsd.org>; Sat, 22 May 2004 11:18:06 -0700 (PDT)
	(envelope-from jdp@polstra.com)
Received: from t30w.polstra.com (dsl081-189-078.sea1.dsl.speakeasy.net
	[64.81.189.78])
	by blake.polstra.com (8.12.11/8.12.11) with ESMTP id i4MIHQog088118
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
	Sat, 22 May 2004 11:17:26 -0700 (PDT)
	(envelope-from jdp@mail.polstra.com)
Received: from t30w.polstra.com (localhost [127.0.0.1])
	by t30w.polstra.com (8.12.11/8.12.11) with ESMTP id i4MIHPl5000281;
	Sat, 22 May 2004 11:17:25 -0700 (PDT)
	(envelope-from jdp@t30w.polstra.com)
Received: (from jdp@localhost)
	by t30w.polstra.com (8.12.11/8.12.11/Submit) id i4MIHOaN000280;
	Sat, 22 May 2004 11:17:24 -0700 (PDT)
	(envelope-from jdp)
Message-ID: <XFMail.20040522111724.jdp@polstra.com>
X-Mailer: XFMail 1.5.5 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <p06020403bcd422915d2e@[128.113.24.47]>
Date: Sat, 22 May 2004 11:17:24 -0700 (PDT)
From: John Polstra <jdp@polstra.com>
To: Garance A Drosihn <drosih@rpi.edu>
X-Bogosity: No, tests=bogofilter, spamicity=0.088176, version=0.14.5
cc: freebsd-arch@freebsd.org
Subject: Re: Move /usr/sup to /var/db/sup?
X-List-Received-Date: Sat, 22 May 2004 18:18:06 -0000

On 21-May-2004 Garance A Drosihn wrote:
> I have no idea if that is why someone went with /usr/sup in
> the example supfiles, though.

I do. :-)  That was the location in the original supfiles used by the
old "sup" utility that CVSup replaced.  When I first released CVSup
I made it so that people could use their old supfiles unmodified --
simply because I wanted it to be as easy as possible to switch to
CVSup so a lot of people would try it out.

I agree that /usr/sup is a lousy place for the metadata, and that
something under /var/db would be a lot more sensible.

John

From owner-freebsd-arch@FreeBSD.ORG  Sat May 22 23:59:30 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from green.homeunix.org (freefall.freebsd.org [216.136.204.21])
	by hub.freebsd.org (Postfix) with ESMTP
	id A42FD16A4CE; Sat, 22 May 2004 23:59:30 -0700 (PDT)
Received: from green.homeunix.org (green@localhost [127.0.0.1])
	by green.homeunix.org (8.12.11/8.12.11) with ESMTP id i4N6xUEc059178;
	Sun, 23 May 2004 02:59:30 -0400 (EDT)
	(envelope-from green@green.homeunix.org)
Received: (from green@localhost)
	by green.homeunix.org (8.12.11/8.12.11/Submit) id i4N6xTIH059177;
	Sun, 23 May 2004 02:59:29 -0400 (EDT)
	(envelope-from green)
Date: Sun, 23 May 2004 02:59:28 -0400
From: Brian Feldman <green@freebsd.org>
To: Peter Jeremy <PeterJeremy@optushome.com.au>
Message-ID: <20040523065928.GD51125@green.homeunix.org>
References: <Pine.NEB.3.96L.1040520162957.90528H-100000@fledge.watson.org>
	<40AD2405.DC13B45C@freebsd.org>
	<20040521213208.GA87546@cirb503493.alcatel.com.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20040521213208.GA87546@cirb503493.alcatel.com.au>
User-Agent: Mutt/1.5.6i
cc: Robert Watson <rwatson@freebsd.org>
cc: arch@freebsd.org
cc: Andre Oppermann <andre@freebsd.org>
Subject: Re: Network Stack Locking
X-List-Received-Date: Sun, 23 May 2004 06:59:31 -0000

On Sat, May 22, 2004 at 07:32:08AM +1000, Peter Jeremy wrote:
> On Thu, 2004-May-20 23:32:53 +0200, Andre Oppermann wrote:
> >Robert Watson wrote:
> >...
> >> Note that there are some serious issues with the current locking changes:
> >Progress happens incrementally.  Put in Green's kqueue locking, have
> >that working correctly and make it perfect in a second step.
> 
> I don't believe this is the correct approach at this time.  Brian's
> code removes functionality that people have stated that they _do_ use.
> In theory, John-Mark's approach offers better performance without the
> loss of functionality.  Before implementing Brian's code, the Project
> needs to decide which direction it should move in.

*shrug*  I added recursive kqueues because some people indicated that
they actually had reason to use them.  I still haven't added the
NOTE_TRACK functionality because there is no known project in the
entire world that uses it, so leaving it out has no chance of
breaking anything.

Anyway, I still want to see any alternative kqueue locking
implementations. I haven't even seen a complete enough description of
what the proposed change is supposed to look like to know whether it
actually solves all of the issues that kqueue has now. It would help if
someone posted all the details and not just bits and pieces. I don't
know why I am the only person to have taken a shot at a complete
implementation when the subsystem is so completely MP-broken already.

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                               \  The Power to Serve! \
 Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\