From owner-freebsd-arch Sun Feb 4 16:14:31 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
        by hub.freebsd.org (Postfix) with ESMTP id F101537B401
        for ; Sun, 4 Feb 2001 16:14:14 -0800 (PST)
Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3])
        by fledge.watson.org (8.11.1/8.11.1) with SMTP id f150EEh75545
        for ; Sun, 4 Feb 2001 19:14:14 -0500 (EST)
        (envelope-from robert@fledge.watson.org)
Date: Sun, 4 Feb 2001 19:14:14 -0500 (EST)
From: Robert Watson
X-Sender: robert@fledge.watson.org
To: freebsd-arch@FreeBSD.org
Subject: Tests for NULL p_ucred under p_cred -- are they needed?
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I've noticed that at various points in the kernel code, there are tests to
check that the ucred structure in a proc is non-NULL before using it.
Under what circumstances do we believe it is possible for the ucred
pointer to be NULL?  It seems that, in normal usage, it should always
be defined--the only points where it might be NULL would be during process
creation and process exit.  Are these windows long enough for it to be a
concern?  Are appropriate process locks held, under SMPng, such that it's
never possible to grab a ucred structure for a process while it is NULL?

It seems that there are other components of the code that assume that if
(p) is non-NULL, then a ucred must be defined for the process, which seems
like a consistent assumption assuming appropriate protections are in
place.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Mon Feb 5 2:12: 0 2001
Message-ID: <3A7E767B.6AADB3B5@zoo.co.uk>
Date: Mon, 05 Feb 2001 09:46:35 +0000
From: Nathan Gould
To: Robert Watson
Cc: freebsd-arch@FreeBSD.ORG
Subject: Re: Tests for NULL p_ucred under p_cred -- are they needed?

Robert Watson wrote:

> I've noticed that at various points in the kernel code, there are tests to
> check that the ucred structure in a proc is non-NULL before using it.
> Under what circumstances do we believe it is possible for the ucred
> pointer to be NULL?  It seems that, in normal usage, it should always
> be defined--the only points where it might be NULL would be during process
> creation and process exit.  Are these windows long enough for it to be a
> concern?  Are appropriate process locks held, under SMPng, such that it's
> never possible to grab a ucred structure for a process while it is NULL?
>
> It seems that there are other components of the code that assume that if
> (p) is non-NULL, then a ucred must be defined for the process, which seems
> like a consistent assumption assuming appropriate protections are in
> place.
>
> Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
> robert@fledge.watson.org      NAI Labs, Safeport Network Services

Surely, if for no other reason, we should be checking for abnormalities
such as NULL for security reasons, i.e. security breaches tend to be
based on non-conformance to publicised identified usage.

Just a thought...

Nathan Gould
ngould@zoo.co.uk

From owner-freebsd-arch Mon Feb 5 7:45:16 2001
Date: Mon, 5 Feb 2001 10:44:32 -0500 (EST)
From: Robert Watson
To: Nathan Gould
Cc: freebsd-arch@FreeBSD.ORG
Subject: Re: Tests for NULL p_ucred under p_cred -- are they needed?
In-Reply-To: <3A7E767B.6AADB3B5@zoo.co.uk>

On Mon, 5 Feb 2001, Nathan Gould wrote:

> Robert Watson wrote:
>
> > I've noticed that at various points in the kernel code, there are tests to
> > check that the ucred structure in a proc is non-NULL before using it.
> > Under what circumstances do we believe it is possible for the ucred
> > pointer to be NULL?  It seems that, in normal usage, it should always
> > be defined--the only points where it might be NULL would be during process
> > creation and process exit.  Are these windows long enough for it to be a
> > concern?  Are appropriate process locks held, under SMPng, such that it's
> > never possible to grab a ucred structure for a process while it is NULL?
> >
> > It seems that there are other components of the code that assume that if
> > (p) is non-NULL, then a ucred must be defined for the process, which seems
> > like a consistent assumption assuming appropriate protections are in
> > place.
>
> Surely, if for no other reason, we should be checking for abnormalities
> such as NULL for security reasons, i.e. security breaches tend to be
> based on non-conformance to publicised identified usage.

Well, in the event that the credential was NULL, a number of chunks of
code currently present would simply panic; my question was about whether
or not those chunks of code are incorrect, or whether we can trim out all
the conditionals that test (p_cred) (et al); here are a few samples where
there is a conditional:

kern_proc.c:392

    fill_kinfo_proc(p, kp)
            struct proc *p;
            struct kinfo_proc *kp;
    {
    ...
            if (p->p_cred) {
                    kp->ki_uid = p->p_cred->pc_ucred->cr_uid;
                    kp->ki_ruid = p->p_cred->p_ruid;
                    kp->ki_svuid = p->p_cred->p_svuid;
    ...

kern_proc.c:600, 606

    static int
    sysctl_kern_proc(SYSCTL_HANDLER_ARGS)
    {
    ...
                    case KERN_PROC_UID:
                            if (p->p_ucred == NULL ||
                                p->p_ucred->cr_uid != (uid_t)name[0])
                                    continue;
                            break;
    ...
                    case KERN_PROC_RUID:
                            if (p->p_ucred == NULL ||
                                p->p_cred->p_ruid != (uid_t)name[0])
                                    continue;
                            break;
    }

It appears to me that a struct proc should always have a defined p_cred,
although there does appear to be a small window in fork1() where it has
been added to the global process list and the struct proc is not yet fully
initialized.
However, the p_cred pointer in that case is the parent's value, and all
processes appear to inherit their credential from proc0, which has one
hard-coded in init_main.c.  kern_exit.c appears to hold the process lock
while releasing both the ucred and cred structures; it's possible there is
a window there also, because the process isn't removed from some of its
inter-process relationships (pgrp, zombproc, p_sibling) until after the
credential has been freed and the process lock has been released.

However, there is a fair amount of code that seems to assume the
credential is always defined; largely, that appears to be the case for
code that acts on behalf of the process.  Maybe the key here is that a
process's credentials must always be defined between the end of fork1()
and the beginning of exit(), meaning that when a process itself requests a
service, the credential will be defined and can be relied on, but during
process creation/teardown the credential may be NULL, and therefore code
acting on the process cannot assume that the credential exists.  Note that
procfs chooses to ignore processes without credentials:

procfs_vnops.c: 407

    static int
    procfs_getattr(ap)
            struct vop_getattr_args /* {
                    struct vnode *a_vp;
                    struct vattr *a_vap;
                    struct ucred *a_cred;
                    struct proc *a_p;
            } */ *ap;
    {
    ...
            default:
                    procp = PFIND(pfs->pfs_pid);
                    if (procp == 0 || procp->p_cred == NULL ||
                        procp->p_ucred == NULL)
                            return (ENOENT);

Similarly, the code snippets above came from sysctl() code where a process
is retrieving information on other processes.  An exception to this would
be in Poul-Henning's p_trespass() from RELENG_4 and early RELENG_5, where
p_trespass() is invoked on processes that may receive signals, but without
a credential==NULL check that I can find (this is from
RELENG_4_2_0_RELEASE):

kern_prot.c: 966

    int
    p_trespass(struct proc *p1, struct proc *p2)
    {
    ...
            if (p1->p_cred->p_ruid == p2->p_cred->p_ruid)
                    return (0);

As invoked from kern_sig.c:

kern_sig.c: 100, 876

    #define CANSIGNAL(p, q, sig) \
            (!p_trespass(p, q) || \
            ((sig) == SIGCONT && (q)->p_session == (p)->p_session))
    ...
    int
    kill(cp, uap)
            register struct proc *cp;
            register struct kill_args *uap;
    {
    ...
                    /* kill single process */
                    if ((p = pfind(uap->pid)) == NULL)
                            return (ESRCH);
                    if (!CANSIGNAL(cp, p, uap->signum))
                            return (EPERM);

In any case, there seems to be some inconsistency.  It would seem that
either (a) it is an invariant that p_cred is non-NULL for all processes
reachable via the various process lists (except unused processes), or
(b) it's an invariant that p_cred is non-NULL between the end of fork1()
and the beginning of exit(), and that p_cred is therefore always defined
if you're acting on behalf of the process, but not necessarily if you're
acting on the process.  Clearly, (a) would make life easier, and mean we
could remove a fair number of checks.  However, it may be that (b) is the
case, in which case the signal code might require fixing, or the
invariants it depends on at least require documenting.

This is also relevant as I overhaul the process access control routines,
because I need to know if it's possible to have processes without
credentials, and if so, what it means.

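To make the difference between the two candidate invariants concrete,
here is a minimal user-space sketch of what a p_trespass()-style check
might look like if invariant (b) holds.  It is not the kernel code: the
sk_ structures are simplified stand-ins invented for the illustration,
and returning ESRCH for a target without credentials is an assumption
(procfs, as quoted above, returns ENOENT in the comparable case).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct ucred, pcred and proc. */
struct sk_ucred { unsigned cr_uid; };
struct sk_pcred { struct sk_ucred *pc_ucred; unsigned p_ruid; };
struct sk_proc  { struct sk_pcred *p_cred; };

/* Under invariant (b), code acting ON a process must tolerate missing
 * credentials (the process may be mid-fork or mid-exit). */
int
sk_proc_has_cred(const struct sk_proc *p)
{
	return (p != NULL && p->p_cred != NULL && p->p_cred->pc_ucred != NULL);
}

/*
 * Returns 0 if p1 may act on p2, ESRCH if p2 currently has no
 * credential, and EPERM otherwise.  p1 is the requesting process, so
 * under either invariant its credential is expected to exist.
 */
int
sk_trespass(const struct sk_proc *p1, const struct sk_proc *p2)
{
	assert(sk_proc_has_cred(p1));
	if (!sk_proc_has_cred(p2))
		return (ESRCH);
	if (p1->p_cred->p_ruid == p2->p_cred->p_ruid)
		return (0);
	return (EPERM);
}
```

Under invariant (a) the sk_proc_has_cred(p2) test would be dead code and
could be replaced by an assertion, which is the simplification the
message above is asking about.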
Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services

From owner-freebsd-arch Mon Feb 5 9:26:49 2001
To: Matt Dillon
Cc: Matthew Jacob, "Justin T. Gibbs", Mike Smith, Dag-Erling Smorgrav,
    Dan Nelson, Seigo Tanimura, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
References: <200102040026.f140QuD12547@earth.backplane.com>
From: Randell Jesup
Date: 05 Feb 2001 12:30:50 -0500
In-Reply-To: Matt Dillon's message of "Sat, 3 Feb 2001 16:26:56 -0800 (PST)"

Matt Dillon writes:
> This is a reasonable criticism, but putting aside the issue of bloating
> kernel stack usage from huge struct buf structures there is also the
> issue of whether any static limit is 'reasonable'.

        Good point.

> The device driver API supports arbitrary raw read and raw write
> sizes, but nearly all the device drivers convert read() and write()
> calls to physio() calls, and those then convert the parameters
> to struct buf / VOP_STRATEGY() calls.
>
> There are only two solutions that I can see:
>
> (1) have the SCSI tape device code not convert raw reads and writes
> to VOP_STRATEGY calls and instead manage the KVA for the I/O via some
> other mechanism.

        This seems rather painful and makes support for large IO's very
driver-dependent and confusing.

> (2) Modify the 'struct buf' b_pages[] array to instead be a pointer
> to an array.  Include the original static array under another name
> for compatibility purposes and have the init code default to
> assigning b_pages to the original embedded static array.
>
> Then the physio code could be adjusted to dynamically MALLOC the
> necessary pages array if the static one in the supplied buffer is
> insufficient.

        So, how reasonable is this?  It seems like a pretty good solution,
but I'm far from up-to-speed on the internals here.

--
Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94)
rjesup@wgate.com

From owner-freebsd-arch Mon Feb 5 9:31:19 2001
Date: Mon, 5 Feb 2001 09:30:18 -0800 (PST)
From: Matt Dillon
Message-Id: <200102051730.f15HUIU21219@earth.backplane.com>
To: Randell Jesup
Cc: Matthew Jacob, "Justin T. Gibbs", Mike Smith, Dag-Erling Smorgrav,
    Dan Nelson, Seigo Tanimura, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
References: <200102040026.f140QuD12547@earth.backplane.com>

:> (1) have the SCSI tape device code not convert raw reads and writes
:> to VOP_STRATEGY calls and instead manage the KVA for the I/O via some
:> other mechanism.
:
: This seems rather painful and makes support for large IO's very
:driver-dependent and confusing.
:...
:>
:> Then the physio code could be adjusted to dynamically MALLOC the
:> necessary pages array if the static one in the supplied buffer is
:> insufficient.
:
: So, how reasonable is this?  It seems like a pretty good solution,
:but I'm far from up-to-speed on the internals here.
:
:--
:Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94)
:rjesup@wgate.com

    I think what's reasonable is to wait until someone (Poul, maybe) puts
    a better I/O buffering subsystem in place.  Anything we do right now
    will be a bad hack.

    The funny thing about all of this is that we go to great pains to
    make things contiguous in KVM, but the bus dma code has to then break
    things up into page-by-page DMAs anyway.  I'd much rather just hand the
    I/O subsystem a list of vm_page_t's without bothering to map them into
    KVM.

                                        -Matt

From owner-freebsd-arch Mon Feb 5 9:36: 5 2001
To: Cy Schubert - ITSD Open Systems Group
Cc: Matt Dillon, Mike Smith, Dag-Erling Smorgrav, Dan Nelson,
    Seigo Tanimura, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
References: <200102031946.f13JkBA08356@cwsys.cwsent.com>
From: Randell Jesup
Date: 05 Feb 2001 12:39:59 -0500
In-Reply-To: Cy Schubert - ITSD Open Systems Group's message of "Sat, 03 Feb 2001 11:45:44 -0800"

Cy Schubert - ITSD Open Systems Group writes:
>> And, finally, while large I/O's may seem to be a good idea, they can
>> actually interfere with the time-share mechanisms that smooth system
>> operation.  If you queue a 1 MByte I/O to a disk device, that disk
>> device is locked up doing that one I/O for a long time (in cpu-time
>> terms).  Having a large number of bytes queued for I/O on one device
>> can interfere with the performance of another device.  In short,
>> your performance is not going to get better and could very well get
>> worse.
>
>I remember an IBM MVS course that made this point abundantly
>clear.  The short of it was that if your system was primarily used as a
>batch system, e.g. response time didn't matter but throughput did, use
>large block sizes.  If on the other hand your primary workload was time
>sharing or transaction processing applications, smaller block sizes
>would improve response times but reduce throughput.  Large block sizes
>tend to monopolise I/O channels.

        Ok.  However, a given machine may be used for either heavy batch
server-style use (say email, DB), or for more interactive work (including
things like serving real-time requests like web pages).  Also, usages can
vary over time and load: when there are a bunch of processes accessing
the disk with smallish IO's and/or paging (on that device), we don't want a
large IO tying it up for a while, whereas when only one or a few processes
are accessing the channel we probably don't mind running larger requests.

        So, the point (as Matt mentioned) is whether any static limit is
appropriate.  Or should it be dynamic, or at least adjustable?  When is a
smaller limit better?  When do we want a larger limit?  Also, devices should
be able to specify higher (or lower) limits, as for SCSI tape drives.

        Personally, I think a dynamic system is preferable, but obviously
more complex.  In any case I think it should be adjustable statically.

--
Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94)
rjesup@wgate.com

From owner-freebsd-arch Mon Feb 5 9:52:53 2001
Date: Mon, 5 Feb 2001 09:52:20 -0800 (PST)
From: Matthew Jacob
Reply-To: mjacob@feral.com
To: Matt Dillon
Cc: Randell Jesup, "Justin T. Gibbs", Mike Smith, Dag-Erling Smorgrav,
    Dan Nelson, Seigo Tanimura, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
In-Reply-To: <200102051730.f15HUIU21219@earth.backplane.com>

> The funny thing about all of this is that we go to great pains to
> make things contiguous in KVM, but the bus dma code has to then break
> things up into page-by-page DMAs anyway.  I'd much rather just hand the
> I/O subsystem a list of vm_page_t's without bothering to map them into
> KVM.

See solaris && SunOS for this one.

Also, the busdma code doesn't 'have' to break things up.  If the underlying
physical pages are contiguous then there's no need to have multiple
entries.  You should note, btw, that not all architectures require or can use
scatter-gather (sparc, for instance, which has an iommu).

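The point about physically contiguous pages can be illustrated with a
small stand-alone sketch.  It is not the bus_dma code: the function
name, the sk_segment type and the 4096-byte page size are assumptions
made for the example.  It walks a list of physical page addresses and
merges neighbours that are adjacent in physical memory, so a run of
contiguous pages ends up as a single scatter/gather entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096u	/* assumed page size for the sketch */

struct sk_segment {
	uint64_t addr;	/* physical start address */
	uint64_t len;	/* length in bytes */
};

/*
 * Build a scatter/gather list from npages physical page addresses,
 * merging pages that are physically adjacent.  Returns the number of
 * segments written, or 0 if more than maxsegs would be needed (or if
 * npages is 0).
 */
size_t
sk_coalesce_pages(const uint64_t *pages, size_t npages,
    struct sk_segment *segs, size_t maxsegs)
{
	size_t nsegs = 0;

	assert(pages != NULL && segs != NULL);
	for (size_t i = 0; i < npages; i++) {
		/* Extend the previous segment if this page follows it. */
		if (nsegs > 0 &&
		    segs[nsegs - 1].addr + segs[nsegs - 1].len == pages[i]) {
			segs[nsegs - 1].len += SK_PAGE_SIZE;
			continue;
		}
		if (nsegs == maxsegs)
			return (0);
		segs[nsegs].addr = pages[i];
		segs[nsegs].len = SK_PAGE_SIZE;
		nsegs++;
	}
	return (nsegs);
}
```

With six pages of which the first three and the next two are adjacent,
the list collapses to three entries; a fully contiguous buffer needs
only one.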

From owner-freebsd-arch Mon Feb 5 12: 8:19 2001
Message-Id: <200102052006.f15K6bO49659@aslan.scsiguy.com>
To: Randell Jesup
Cc: Matt Dillon, Matthew Jacob, Mike Smith, Dag-Erling Smorgrav,
    Dan Nelson, Seigo Tanimura, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
In-Reply-To: Your message of "05 Feb 2001 12:30:50 EST."
Date: Mon, 05 Feb 2001 13:06:37 -0700
From: "Justin T. Gibbs"

>> (2) Modify the 'struct buf' b_pages[] array to instead be a pointer
>> to an array.  Include the original static array under another name
>> for compatibility purposes and have the init code default to
>> assigning b_pages to the original embedded static array.
>>
>> Then the physio code could be adjusted to dynamically MALLOC the
>> necessary pages array if the static one in the supplied buffer is
>> insufficient.
>
> So, how reasonable is this?  It seems like a pretty good solution,
>but I'm far from up-to-speed on the internals here.

I'd rather allow bufs (or bios) to be chained and let the block devices
decide how to break them up.  This simplifies the clustering code too
as you avoid all of the VM operations to combine bufs into a single cluster
buf.

--
Justin

From owner-freebsd-arch Mon Feb 5 12:47:44 2001
Date: Mon, 5 Feb 2001 12:47:07 -0800
From: Alfred Perlstein
To: "Justin T. Gibbs"
Cc: Randell Jesup, Matt Dillon, Matthew Jacob, Mike Smith,
    Dag-Erling Smorgrav, Dan Nelson, Seigo Tanimura, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
Message-ID: <20010205124707.Y26076@fw.wintelcom.net>
References: <200102052006.f15K6bO49659@aslan.scsiguy.com>
In-Reply-To: <200102052006.f15K6bO49659@aslan.scsiguy.com>; from gibbs@scsiguy.com on Mon, Feb 05, 2001 at 01:06:37PM -0700

* Justin T. Gibbs [010205 12:08] wrote:
> >> (2) Modify the 'struct buf' b_pages[] array to instead be a pointer
> >> to an array.  Include the original static array under another name
> >> for compatibility purposes and have the init code default to
> >> assigning b_pages to the original embedded static array.
> >>
> >> Then the physio code could be adjusted to dynamically MALLOC the
> >> necessary pages array if the static one in the supplied buffer is
> >> insufficient.
> >
> > So, how reasonable is this?  It seems like a pretty good solution,
> >but I'm far from up-to-speed on the internals here.
>
> I'd rather allow bufs (or bios) to be chained and let the block devices
> decide how to break them up.
> This simplifies the clustering code too
> as you avoid all of the VM operations to combine bufs into a single cluster
> buf.

One of the suggestions that Poul-Henning made was to have the device
somehow specify an optimal clustering strategy, being able to specify
bounds and sizes.

For instance an NFS commit request could be megabytes in size, while
an NFS write may not want any clustering at all.  A RAID request might
want to ask for a megabyte of data, but have it in a range on the
device level.

Currently (I think) we only cluster based on logical file offsets;
it would be interesting to allow drivers to do callbacks into the
FS to ask for blocks physically adjacent to the blocks being written.

This is because a 64k block of any file may actually be spread out
across any position; even though UFS tries to reduce fragmentation,
the worst case is that we do the vm ops to cluster non-physically
contiguous blocks.

I think the simplest way to do this would be to rip out the current
clustering code and provide helper routines for the devices to get
adjacent blocks, either logically via VOP or physically via some
VFS mechanism.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

From owner-freebsd-arch Mon Feb 5 13: 2:30 2001
To: Alfred Perlstein
Cc: "Justin T. Gibbs", Randell Jesup, Matt Dillon, Matthew Jacob,
    Mike Smith, Dag-Erling Smorgrav, Dan Nelson, Seigo Tanimura,
    arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
In-Reply-To: Your message of "Mon, 05 Feb 2001 12:47:07 PST." <20010205124707.Y26076@fw.wintelcom.net>
Date: Mon, 05 Feb 2001 22:01:41 +0100
Message-ID: <28618.981406901@critter>
From: Poul-Henning Kamp

In message <20010205124707.Y26076@fw.wintelcom.net>, Alfred Perlstein writes:

>One of the suggestions that Poul-Henning made was to have the device
>somehow specify an optimal clustering strategy, being able to specify
>bounds and sizes.
>
>[...]
>
>Currently (i think) we only cluster based on logical file offsets,
>it would be interesting to allow drivers to do callbacks into the
>FS to ask for blocks physically adjacent to the blocks being written.

I've been playing with various ideas in this area, and to be frank,
totally failed to come up with a breakthrough.

Given methods like striping and RAID-5, it becomes nontrivial to
find a specification language for the driver to say "it would be
quick to write the following blocks also" and it would be even
slower to determine if this was indeed feasible.

"feasible" covers not only "do we have it in RAM", but also "is it
already scheduled for writing", "is it dirty" and not the least
"would softupdates take a fit if we wrote it".

The best I have been able to do so far is if the device-driver can
specify the following quantities:

        (M) maximum request size
        (R) preferred request size
        (B) preferred request sector boundary

The clustering code would then try to increase the request to:

        N * R sectors starting at X
        where X mod B == 0 and N * R <= M

Having found a cluster opportunity, the cluster code will issue the
read/write request specifying:

        (E) First possible sector in request
        (S) First mandatory sector in request
        (L) Last mandatory sector in request
        (F) Last possible sector in request
        (B) Sector address of (S) on media.

The driver has to process the data from [S ... L], and can optionally
process [E...S[ and ]L...F] if that seems convenient.

If somebody is looking for a good project, benchmarking the performance
of our current clustering and playing around with various changes would
not be the worst way to spend some winter evenings.

Playing with FFS/UFS options (block/fragment etc) at the same time
may be worthwhile.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

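One way to read the (M)/(R)/(B) proposal is sketched below in user-space
C.  This is only an interpretation, not code from the thread: the
function and structure names are invented, all quantities are taken to
be in sectors, and falling back to E == S and F == L when no aligned
window fits within M is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Driver-supplied quantities, all in sectors (hypothetical names). */
struct sk_cluster_params {
	uint64_t max;	/* (M) maximum request size */
	uint64_t pref;	/* (R) preferred request size */
	uint64_t bound;	/* (B) preferred request sector boundary */
};

/* The window handed to the driver: e <= s <= l <= f. */
struct sk_window {
	uint64_t e, s, l, f;
};

/*
 * Given the mandatory range [s, l], pick X = s rounded down to a
 * multiple of B and the smallest N such that N * R sectors starting at
 * X cover l.  Returns 1 if such a window fits within M, 0 if the
 * request is left unexpanded, and -1 on invalid input.
 */
int
sk_cluster_window(const struct sk_cluster_params *dp, uint64_t s,
    uint64_t l, struct sk_window *w)
{
	uint64_t x, need, n;

	if (dp->pref == 0 || dp->bound == 0 || l < s)
		return (-1);
	w->s = s;
	w->l = l;
	x = s - (s % dp->bound);		/* X mod B == 0 */
	need = l - x + 1;			/* sectors from X through L */
	n = (need + dp->pref - 1) / dp->pref;	/* smallest N with N*R >= need */
	if (n * dp->pref > dp->max) {
		/* N * R <= M cannot be met: optional ranges stay empty. */
		w->e = s;
		w->f = l;
		return (0);
	}
	w->e = x;
	w->f = x + n * dp->pref - 1;
	return (1);
}
```

For example, with M = 128, R = 16 and B = 16, a mandatory range of
sectors 37 through 50 yields X = 32 and N = 2, so the driver would be
offered sectors 32 through 63 and remains free to touch only 37 through
50.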

From owner-freebsd-arch Mon Feb 5 13:10:16 2001
Date: Mon, 5 Feb 2001 13:09:54 -0800 (PST)
From: Matthew Jacob
Reply-To: mjacob@feral.com
To: arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
In-Reply-To: <28618.981406901@critter>

All of this is nice and fine, but the take home notion here is that there's
more than a "maximum" or a "preferred" size.  There's also a "required request
size".  And this isn't a constant value you can stash in a dev_t, or you'll
have to have drivers change it as required.

It seems to me that the physio should just be beefed up to take an argument to
a 'parameterization' function, and that flags could be used that say "we don't
even need this mapped anywhere, just make sure that the pages referred to are
resident".

All of the other stuff is really more of a tight interaction with VM for
optimizing.

-matt

From owner-freebsd-arch Mon Feb 5 13:15:13 2001
Message-Id: <200102052115.f15LFoe01152@mass.dis.org>
To: mjacob@feral.com
Cc: arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
In-reply-to: Your message of "Mon, 05 Feb 2001 13:09:54 PST."
Date: Mon, 05 Feb 2001 13:15:50 -0800
From: Mike Smith

> It seems to me that the physio should just be beefed up to take an argument to
> a 'parameterization' function, and that flags could be used that say "we don't
> even need this mapped any where- just make sure that the pages referred to are
> resident".

This is more or less what Matt was talking about; the mapping of buffer
pages into linear KVM should be optional based on a driver attribute (or,
perhaps preferably, only performed at the driver's request).

I'm sure that someone will eventually get around to doing something about
this...

--
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr.
Fritz Todt]
           V I C T O R Y   N O T   V E N G E A N C E

From owner-freebsd-arch Mon Feb 5 13:17:27 2001
To: mjacob@feral.com
Cc: arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
In-Reply-To: Your message of "Mon, 05 Feb 2001 13:09:54 PST."
Date: Mon, 05 Feb 2001 22:17:15 +0100
Message-ID: <28840.981407835@critter>
From: Poul-Henning Kamp

In message , Matthew Jacob writes:
>
>All of this is nice and fine, but the take home notion here is that there's
>more than a "maximum" or a "preferred" size. There's also a "required request
>size". And this isn't a constant value you can stash in a dev_t- or you'll
>have to have drivers change it as required.
>
>It seems to me that the physio should just be beefed up to take an argument to
>a 'parameterization' function, and that flags could be used that say "we don't
>even need this mapped any where- just make sure that the pages referred to are
>resident".

This is a different issue.

Yes, I want us to be able to handle unmapped pages with struct bio,
but that is an entirely separate (and simpler) issue than how clustering
is done.

To make struct bio handle unmapped memory, all you have to do is this:

1. Add a driver flag which means "I can do unmapped struct bio":
   D_UNMAPPEDBIO.

2. Add code to specfs::specstrategy():

        if (!(devsw(dev_t)->d_flags & D_UNMAPPEDBIO)) {
                if (bio_is_unmapped(bio))
                        map_bio(bio);
        }

3. Add the fields you need to struct bio.

4. Write a driver which DTRT.

5. Make upper kernel and filesystems use the new facility.

By all means attack this if you have the foo it takes.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-arch Mon Feb 5 13:24:29 2001
Date: Mon, 5 Feb 2001 13:21:52 -0800
From: Alfred Perlstein
To: Poul-Henning Kamp
Cc: "Justin T. Gibbs", Randell Jesup, Matt Dillon, Matthew Jacob,
    Mike Smith, Dag-Erling Smorgrav, Dan Nelson, Seigo Tanimura,
    arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
Message-ID: <20010205132152.E26076@fw.wintelcom.net>
References: <20010205124707.Y26076@fw.wintelcom.net> <28618.981406901@critter>
In-Reply-To: <28618.981406901@critter>; from phk@critter.freebsd.dk on Mon, Feb 05, 2001 at 10:01:41PM +0100

* Poul-Henning Kamp [010205 13:01] wrote:
> In message <20010205124707.Y26076@fw.wintelcom.net>, Alfred Perlstein writes:
>
> >One of the suggestions that Poul-Henning made was to have the device
> >somehow specify an optimal clustering strategy, being able to specify
> >bounds and sizes.
> >
> >[...]
> > > >Currently (i think) we only cluster based on logical file offsets, > >it would be interesting to allow drivers to do callbacks into the > >FS to ask for blocks physically adjacent to the blocks being written. > > I've been playing with various ideas in this area, and to be frank, > totally failed to come up with a breakthrough. > > Give methods like striping and RAID-5, it becomes nontrivial to > find a specification language for the driver to say "it would be > quick to write the following blocks also" and it would be even > slower to determine if this was indeed feasible. You're right, it's non-trivial, however the difference between memory and disk speed is also non-trivial, almost every reasonable algorithm should be considered to reduce/optimize disk traffic. A simple call into the VFS should be able to accomplish, afaik when a VFS has a disk/physical backing it also hashes/sorts bufs based on physicall backing location. Although I may be remebering stuff from 4.3BSD or 4.4BSD instead of the current code... In fact if it is stored and hashed in the bufs you really don't need a callback into the VFS, you just need a generic function to call that gathers physically contig blocks that are dirty, unlocked and actually contiguous. > "feasible" covers not only "do we have it in RAM", but also "is it > already scheduled for writing", "is it dirty" and not the least > "would softupdates take a fit if we wrote it". This is why callbacks into the VFS are probably a good idea along with a generic function that accomplishes what we currently do, except without the vm-remapping into the pbuf. 
(use a linked chain of bufs instead) > The best I have been able to do so far is if the device-driver > can specify the following quantities: > > (M) maximum request size > (R) preferred request size > (B) preferred request sector boundary > > The clustering code would then try to increase request to: > > N * R sectors starting X > where X mod B == 0 > and N * R <= M > > Having found a cluster opportunity, the cluster code will > issue the read/write request specifying: > > (E) First possible sector in request > (S) First mandatory sector in request > (L) Last mandatory sector in request > (F) Last possible sector in request > (B) Sector address of (S) on media. > > The driver has to process the data from [S ... L], > and can optionally process [E...S[ and ]L...F] if > that seems convenient. Well, there are some assertions and questions I have about this: 1) a device should not refuse to write a block unless there's an error, meaning if 'S' can't be satisfied, it should at least write the single block out. I think S & L pretty much have to be equal to each other, otherwise we can have tricky issues to deal with where S through L never become clusterable (they are locked for long periods, or just clean). 2) the device should be able to allow a certain amount of fragmentation. Currently (afaik) the clustering code does not tolerate gaps, clean bufs and locked bufs within the request. This ought to be changed: there's no reason why a request really needs to be completely contiguous, as the really painful part of disk I/O is the seek. Being able to cluster data with gaps on the same track/cyl is much more important than not having any breaks in it at all. 3) with #2, it would be important to specify a tolerance for such 'holes' in the cluster operation in case the device does have a penalty for gaps.
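As an illustration of the (M), (R), (B) scheme quoted above, here is a minimal sketch in C. It is not code from the thread or from the FreeBSD tree: the names cluster_hint, cluster_window and cluster_expand are hypothetical, and picking the smallest N that covers the mandatory range is an assumption, since the proposal does not say how N is chosen. The sketch only computes the candidate window [E, F]; whether the extra sectors are dirty, unlocked and safe to write is a separate feasibility check that it does not attempt.

```c
#include <assert.h>
#include <stdint.h>

/* Driver hints, named after the letters in the quoted proposal (hypothetical). */
struct cluster_hint {
	uint64_t max_sectors;   /* (M) maximum request size */
	uint64_t pref_sectors;  /* (R) preferred request size */
	uint64_t boundary;      /* (B) preferred request sector boundary */
};

/* Candidate window, sectors [first, last] inclusive: (E) and (F). */
struct cluster_window {
	uint64_t first;
	uint64_t last;
};

/*
 * Grow the mandatory range [s, l] to N * R sectors starting at X, where
 * X mod B == 0 and N * R <= M.  If no such window covers [s, l], return
 * the mandatory range unchanged, i.e. do not cluster.
 */
static struct cluster_window
cluster_expand(const struct cluster_hint *h, uint64_t s, uint64_t l)
{
	struct cluster_window w = { s, l };  /* fallback: mandatory range only */
	uint64_t x, need, n;

	assert(h->pref_sectors > 0 && h->boundary > 0 && s <= l);

	x = s - (s % h->boundary);           /* align start down: X mod B == 0 */
	need = l - x + 1;                    /* sectors from X through L */
	n = (need + h->pref_sectors - 1) / h->pref_sectors;  /* smallest N covering L */
	if (n * h->pref_sectors <= h->max_sectors) {
		w.first = x;
		w.last = x + n * h->pref_sectors - 1;
	}
	return (w);
}
```

With M = 128, R = 16, B = 16 and a mandatory range of sectors 35 through 40, the window becomes 32 through 47. With M = 16, R = 16, B = 64 and a mandatory range of 100 through 110, no aligned window fits within M, so the range is left as it is.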
> If somebody is looking for a good project, benchmarking > the performance of our current clustering and playing > around with various changes would not be the worst > way to spend some winter evenings. Playing with FFS/UFS > options (block/fragment etc) at the same time may be > worth while. Actually, I'm not looking for a project, I'm looking for time. :) -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 13:34: 6 2001 Delivered-To: freebsd-arch@freebsd.org Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147]) by hub.freebsd.org (Postfix) with ESMTP id 4982537B401; Mon, 5 Feb 2001 13:33:48 -0800 (PST) Received: from critter (localhost [127.0.0.1]) by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id f15LXaB28964; Mon, 5 Feb 2001 22:33:36 +0100 (CET) (envelope-from phk@critter.freebsd.dk) To: Alfred Perlstein Cc: "Justin T. Gibbs" , Randell Jesup , Matt Dillon , Matthew Jacob , Mike Smith , Dag-Erling Smorgrav , Dan Nelson , Seigo Tanimura , arch@FreeBSD.ORG Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) In-Reply-To: Your message of "Mon, 05 Feb 2001 13:21:52 PST." <20010205132152.E26076@fw.wintelcom.net> Date: Mon, 05 Feb 2001 22:33:36 +0100 Message-ID: <28962.981408816@critter> From: Poul-Henning Kamp Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG >You're right, it's non-trivial, however the difference between >memory and disk speed is also non-trivial, almost every reasonable >algorithm should be considered to reduce/optimize disk traffic. > >A simple call into the VFS should be able to accomplish, afaik when >a VFS has a disk/physical backing it also hashes/sorts bufs based >on physicall backing location. 
Although I may be remebering stuff >from 4.3BSD or 4.4BSD instead of the current code... It's not "a simple call". By the time you can make the call, you have passed through the target FS, through specfs and the disklabel/slice code, possibly through a layer like vinum and ccd (which may have their own ideas about clustering) and only then do you arrive at a place where you know the actual sector address of the request. We can quickly dismiss the ccd/vinum case by saying that they have to cater for the needs of the lower devices, and they specify the clustering policy "like any other disk". But you still have to contend with the diskslice/label code, and specfs, so even if you do an "upcall" and find more stuff you can read/write, you need to pass this bit of the request down through the specfs (for softupdates rollback/forward) and diskslice/label code (because you want boundary checking). And having tried that, I can say with 100% conviction: that is not an sane option, and if you do it anyway you will certainly not gain any performance by the time you have resolved all the locking issues. Giving some kind of abstract hint from the driver/device and making the clustering optional for the driver is the only path which does not lead straight down to layering insanity. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 13:54:17 2001 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 8F99237B401; Mon, 5 Feb 2001 13:53:56 -0800 (PST) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id f15LpnV12266; Mon, 5 Feb 2001 13:51:49 -0800 (PST) Date: Mon, 5 Feb 2001 13:51:49 -0800 From: Alfred Perlstein To: Poul-Henning Kamp Cc: "Justin T. Gibbs" , Randell Jesup , Matt Dillon , Matthew Jacob , Mike Smith , Dag-Erling Smorgrav , Dan Nelson , Seigo Tanimura , arch@FreeBSD.ORG Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) Message-ID: <20010205135149.G26076@fw.wintelcom.net> References: <20010205132152.E26076@fw.wintelcom.net> <28962.981408816@critter> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <28962.981408816@critter>; from phk@critter.freebsd.dk on Mon, Feb 05, 2001 at 10:33:36PM +0100 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Poul-Henning Kamp [010205 13:33] wrote: > > >You're right, it's non-trivial, however the difference between > >memory and disk speed is also non-trivial, almost every reasonable > >algorithm should be considered to reduce/optimize disk traffic. > > > >A simple call into the VFS should be able to accomplish, afaik when > >a VFS has a disk/physical backing it also hashes/sorts bufs based > >on physicall backing location. Although I may be remebering stuff > >from 4.3BSD or 4.4BSD instead of the current code... > > It's not "a simple call". 
> > By the time you can make the call, you have passed through the > target FS, through specfs and the disklabel/slice code, possibly > through a layer like vinum and ccd (which may have their own ideas > about clustering) and only then do you arrive at a place where you > know the actual sector address of the request. > > We can quickly dismiss the ccd/vinum case by saying that they > have to cater for the needs of the lower devices, and they > specify the clustering policy "like any other disk". > > But you still have to contend with the diskslice/label code, and > specfs, so even if you do an "upcall" and find more stuff you can > read/write, you need to pass this bit of the request down through > the specfs (for softupdates rollback/forward) and diskslice/label > code (because you want boundary checking). > > And having tried that, I can say with 100% conviction: that is not > an sane option, and if you do it anyway you will certainly not > gain any performance by the time you have resolved all the locking > issues. Well, my impression was that all locking operation (except mutexes) should be resolved by doing try_lockfoo() and if try_lock fails then don't cluster that object/buf/vnode (as the current code does). You are right though, I guess we don't need callbacks into the VFS, this can be resolved with just the buffer system via flags and locks. > Giving some kind of abstract hint from the driver/device and making > the clustering optional for the driver is the only path which does > not lead straight down to layering insanity. 
I'm not sure I understand what you mean; my vision of the current code is: Kernel IO request (triggered via FS/bufdaemon/etc) -> [1 buf] -> cluster_foo -> [1-N bufs, in a pbuf] -> device -> write. What I'd like to see (considering we don't need to really involve VFS) is: Kernel IO request (triggered via FS/bufdaemon/etc) -> [1 buf] -> device -> cluster routine (A) -> back to device -> [1-N bufs, linked list, no pbuf] -> write. This way the device can call into any number of generic clustering routines if it wants to support them. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 14: 2:23 2001 Delivered-To: freebsd-arch@freebsd.org Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147]) by hub.freebsd.org (Postfix) with ESMTP id 6B54937B401; Mon, 5 Feb 2001 14:02:03 -0800 (PST) Received: from critter (localhost [127.0.0.1]) by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id f15M1gB29136; Mon, 5 Feb 2001 23:01:42 +0100 (CET) (envelope-from phk@critter.freebsd.dk) To: Alfred Perlstein Cc: "Justin T. Gibbs" , Randell Jesup , Matt Dillon , Matthew Jacob , Mike Smith , Dag-Erling Smorgrav , Dan Nelson , Seigo Tanimura , arch@FreeBSD.ORG Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) In-Reply-To: Your message of "Mon, 05 Feb 2001 13:51:49 PST." <20010205135149.G26076@fw.wintelcom.net> Date: Mon, 05 Feb 2001 23:01:42 +0100 Message-ID: <29134.981410502@critter> From: Poul-Henning Kamp Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG >> Giving some kind of abstract hint from the driver/device and making >> the clustering optional for the driver is the only path which does >> not lead straight down to layering insanity.
> >I'm not sure I understand what you mean, my vision of the current >code is: As others have pointed out, if the requirement that pages be mapped contiguously for an struct bio request is relaxed, many more clustering opportunities are expected and some mapping/unmapping operations can be avoided. Some argue that it is "some ... many ..." rather than the other way around. Either way it should be a gain. I think it makes sense to try to grab that piece of fruit first, since it has obvious benefits whereas most of the rest of the suggestions are in the "pure speculation" range and not testable without unmapped pages in struct bio. One way or another, benchmarking will be needed and just what is a good workload to benchmark on ? Is make world representative ? If not, we should establish a reproducible benchmark some other way. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 14:27:36 2001 Delivered-To: freebsd-arch@freebsd.org Received: from aslan.scsiguy.com (aslan.scsiguy.com [63.229.232.106]) by hub.freebsd.org (Postfix) with ESMTP id F097237B4EC; Mon, 5 Feb 2001 14:27:15 -0800 (PST) Received: from scsiguy.com (localhost [127.0.0.1]) by aslan.scsiguy.com (8.11.0/8.9.3) with ESMTP id f15MO2O51248; Mon, 5 Feb 2001 15:24:14 -0700 (MST) (envelope-from gibbs@scsiguy.com) Message-Id: <200102052224.f15MO2O51248@aslan.scsiguy.com> To: Poul-Henning Kamp Cc: Alfred Perlstein , Randell Jesup , Matt Dillon , Matthew Jacob , Mike Smith , Dag-Erling Smorgrav , Dan Nelson , Seigo Tanimura , arch@FreeBSD.ORG Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) In-Reply-To: Your message of "Mon, 05 Feb 2001 22:33:36 +0100." 
<28962.981408816@critter> Date: Mon, 05 Feb 2001 15:24:02 -0700 From: "Justin T. Gibbs" Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > >It's not "a simple call". > It doesn't have to be a simple call if it only occurs once on mount and whenever a component makes an async upcall telling the system that its state has changed (array is degraded, or perhaps commonly accessed data has migrated to a different striping or RAID layout). -- Justin To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 14:37:14 2001 Delivered-To: freebsd-arch@freebsd.org Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147]) by hub.freebsd.org (Postfix) with ESMTP id 385E737B684; Mon, 5 Feb 2001 14:36:56 -0800 (PST) Received: from critter (localhost [127.0.0.1]) by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id f15MaqB29301; Mon, 5 Feb 2001 23:36:52 +0100 (CET) (envelope-from phk@critter.freebsd.dk) To: "Justin T. Gibbs" Cc: Alfred Perlstein , Randell Jesup , Matt Dillon , Matthew Jacob , Mike Smith , Dag-Erling Smorgrav , Dan Nelson , Seigo Tanimura , arch@FreeBSD.ORG Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) In-Reply-To: Your message of "Mon, 05 Feb 2001 15:24:02 MST." <200102052224.f15MO2O51248@aslan.scsiguy.com> Date: Mon, 05 Feb 2001 23:36:52 +0100 Message-ID: <29299.981412612@critter> From: Poul-Henning Kamp Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In message <200102052224.f15MO2O51248@aslan.scsiguy.com>, "Justin T. Gibbs" writes: >> >>It's not "a simple call". >> > >It doesn't have to be a simple call if it only occurs once on mount >and whenever a component makes an async upcall telling the system that >its state has changed (array is degraded, or perhaps commonly accessed >data has migrated to a different striping or RAID layout). 
I think we are talking about too many different things at the same time here. The upcall that I (and I believe Alfred) were discussing would happen once per I/O. The one you are talking about is obviously the one to formulate an abstract clustering preference for a device? -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 14:39: 9 2001 Delivered-To: freebsd-arch@freebsd.org Received: from grendel.bsdi.com (grendel.twistedbit.com [199.79.183.5]) by hub.freebsd.org (Postfix) with ESMTP id 19FD837B684; Mon, 5 Feb 2001 14:38:50 -0800 (PST) Received: from grendel.bsdi.com (cp@localhost.bsdi.com [127.0.0.1]) by grendel.bsdi.com (8.11.1/8.9.3) with ESMTP id f15MYfW96817; Mon, 5 Feb 2001 15:34:41 -0700 (MST) (envelope-from cp@grendel.bsdi.com) Message-Id: <200102052234.f15MYfW96817@grendel.bsdi.com> To: "Justin T. Gibbs" Cc: Poul-Henning Kamp , Alfred Perlstein , Randell Jesup , Matt Dillon , Matthew Jacob , Mike Smith , Dag-Erling Smorgrav , Dan Nelson , Seigo Tanimura , arch@FreeBSD.ORG Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) In-reply-to: Your message of "Mon, 05 Feb 2001 15:24:02 MST." <200102052224.f15MO2O51248@aslan.scsiguy.com> From: Chuck Paterson Date: Mon, 05 Feb 2001 15:34:41 -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In the discussions I noticed someone mentioned some of the issues with architectures like Sparc. I haven't noticed anyone discuss the need to deal with the limited DVMA space. You really need to have some reservation policy on the buffers before you send them down to a driver, or at least have the driver do a call to get a reservation commitment before actually doing the map-ins.
If not, you could have problems like two drivers trying to map their I/O buffers, both having them half mapped and unable to get the resources to finish the mapping. Chuck To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 14:51:20 2001 Delivered-To: freebsd-arch@freebsd.org Received: from feral.com (feral.com [192.67.166.1]) by hub.freebsd.org (Postfix) with ESMTP id 9DADF37B67D; Mon, 5 Feb 2001 14:50:59 -0800 (PST) Received: from zeppo.feral.com (IDENT:mjacob@zeppo [192.67.166.71]) by feral.com (8.9.3/8.9.3) with ESMTP id OAA05261; Mon, 5 Feb 2001 14:50:07 -0800 Date: Mon, 5 Feb 2001 14:50:04 -0800 (PST) From: Matthew Jacob Reply-To: mjacob@feral.com To: Chuck Paterson Cc: "Justin T. Gibbs" , Poul-Henning Kamp , Alfred Perlstein , Randell Jesup , Matt Dillon , Mike Smith , Dag-Erling Smorgrav , Dan Nelson , Seigo Tanimura , arch@FreeBSD.ORG Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) In-Reply-To: <200102052234.f15MYfW96817@grendel.bsdi.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Mon, 5 Feb 2001, Chuck Paterson wrote: > In the discussions I noticed someone mentioned some > of the issues with architectures like Sparc. I haven't noticed > anyone discuss the need to deal with the limited DVMA space. You > really need to have some reservation policy on the buffers before > you send them down to a driver, or at least have the > driver do a call to get a reservation commitment before > actually doing the map-ins. If not, you could have problems > like two drivers trying to map their I/O buffers, both having them > half mapped and unable to get the resources to finish the mapping.
True enough- but this is true for a single process that needs to map more than any specific limited resource- so it isn't just two processes getting deadlocked. That's specifically why a 'mapping window' approach was added to the Solaris DDI DMA model- this allowed one to do a dma transfer for darn near all of physical memory as long as you had a device that could shift the mapping window as needed during the transfer (yes, I actually did test it- it was *wierd* doing 28MB 'single' dma transfers on a Sparc2). From a more or less practical point of view, the newer Ultra machines have a programmable iommu that allows you to pretty much map up to a gig of memory. Then it becomes a very very interesting dance using full, uh, 36 bit I think, physical address and some undefined stuff about I/O coherencey in that case. I'll assert that FreeBSD, should it do a sparc port, shouldn't have the slightest interest in anything less than this class of machines. -matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 15:25:21 2001 Delivered-To: freebsd-arch@freebsd.org Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by hub.freebsd.org (Postfix) with ESMTP id 62F2B37B698; Mon, 5 Feb 2001 15:25:03 -0800 (PST) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id KAA03439; Tue, 6 Feb 2001 10:24:59 +1100 Date: Tue, 6 Feb 2001 10:24:39 +1100 (EST) From: Bruce Evans X-Sender: bde@besplex.bde.org To: Robert Watson Cc: Nathan Gould , freebsd-arch@FreeBSD.ORG Subject: Re: Tests for NULL p_ucred under p_cred -- are they needed? In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Mon, 5 Feb 2001, Robert Watson wrote: > In any case, there seems to be some inconsistency. 
It would seem that > either (a) it is an invariant that p_cred is non-NULL for all reachable > processes via various process lists (except unused processes), or (b) it's an > invariant that p_cred is non-NULL between the end of fork1() and the > beginning of exit(), and that p_cred is therefore always defined if you're > acting on behalf of the process, but not necessarily if you're acting on > the process. > > Clearly, (a) would make life easier, and mean we could remove a fair > number of checks. However, it may be that (b) is the case, in which case > the signal code might require fixing, or the invariants it depends on at > least require documenting. This is also relevant as I overhaul the process > access control routines, because I need to know if it's possible to have > processes without credentials, and if so, what it means. p_cred is actually non-NULL until the middle of wait1(), so we are at least close to case (a), and processes "always" have credentials -- even zombies have them. Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 15:46: 0 2001 Delivered-To: freebsd-arch@freebsd.org Received: from molly.straylight.com (molly.straylight.com [209.68.199.242]) by hub.freebsd.org (Postfix) with ESMTP id 1D63337B6A2 for ; Mon, 5 Feb 2001 15:45:43 -0800 (PST) Received: from dickie (case.straylight.com [209.68.199.244]) by molly.straylight.com (8.11.0/8.10.0) with SMTP id f15NjbX18424 for ; Mon, 5 Feb 2001 15:45:37 -0800 From: "Jonathan Graehl" To: Subject: nonblocking sockets and EINTR Date: Mon, 5 Feb 2001 15:46:20 -0800 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk
X-Loop: FreeBSD.ORG If a TCP or UDP socket is set nonblocking, do I ever have to worry about getting my system calls for those sockets interrupted? It is my understanding that you should only have to check for EINTR for "slow" system calls (that can take an indefinite amount of time), which should mean I'm home free, since the operation either completes immediately, or I get EWOULDBLOCK. For now, since I am not sure I can count on this behavior, I block all nonfatal signals. I would like to be able to use signals to communicate to my daemon (with the caveat that I may get an EINTR for my kevent call, but not for any of my socket operations). Is there any standard behavior I can count on for nonblocking sockets w.r.t. EINTR? Thanks ... -- Jonathan Graehl email: jonathan@graehl.org web: http://jonathan.graehl.org/ phone: 858-642-7562 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 15:49: 0 2001 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 07F8537B6A2 for ; Mon, 5 Feb 2001 15:48:44 -0800 (PST) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id f15NmhR16126; Mon, 5 Feb 2001 15:48:43 -0800 (PST) Date: Mon, 5 Feb 2001 15:48:43 -0800 From: Alfred Perlstein To: Jonathan Graehl Cc: freebsd-arch@FreeBSD.ORG Subject: Re: nonblocking sockets and EINTR Message-ID: <20010205154842.J26076@fw.wintelcom.net> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from jonathan@graehl.org on Mon, Feb 05, 2001 at 03:46:20PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Jonathan Graehl [010205 15:46] wrote: > If a TCP or UDP socket is set nonblocking, do I ever have to worry about getting > my system calls for those sockets interrupted? 
It is my understanding that you > should only have to check for EINTR for "slow" system calls (that can take an > indefinite amount of time), which should mean I'm home free, since the operation > either completes immediately, or I get EWOULDBLOCK. > > For now, since I am not sure I can count on this behavior, I block all nonfatal > signals. I would like to be able to use signals to communicate to my daemon > (with the caveat that I may get an EINTR for my kevent call, but not for any of > my socket operations). > > Is there any standard behavior I can count on for nonblocking sockets w.r.t. > EINTR? You can specify that syscalls will or won't be automatically restarted via the sigaction() API. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 16:10:18 2001 Delivered-To: freebsd-arch@freebsd.org Received: from mass.dis.org (mass.dis.org [216.240.45.41]) by hub.freebsd.org (Postfix) with ESMTP id 93F0137B503 for ; Mon, 5 Feb 2001 16:09:59 -0800 (PST) Received: from mass.dis.org (localhost [127.0.0.1]) by mass.dis.org (8.11.1/8.11.1) with ESMTP id f160BBe01822; Mon, 5 Feb 2001 16:11:12 -0800 (PST) (envelope-from msmith@mass.dis.org) Message-Id: <200102060011.f160BBe01822@mass.dis.org> X-Mailer: exmh version 2.1.1 10/15/1999 To: Chuck Paterson Cc: arch@FreeBSD.ORG Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) In-reply-to: Your message of "Mon, 05 Feb 2001 15:34:41 MST." <200102052234.f15MYfW96817@grendel.bsdi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Mon, 05 Feb 2001 16:11:11 -0800 From: Mike Smith Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > In the discussions I noticed someone mentioned some > of the issues with architectures like Sparc. 
I haven't noticed > anyone discuss the need to deal with the limited DVMA space. You > really need to have some reservation policy on the buffer before > you send them down to a driver, or at least have the > driver do a call to get a reservatioin commitment before > actually doing the map ins. If not you could have problems > like two drivers trying to map there io buffer, both having them > half mapped and unable to get the resouces to finish the mapping. This should be handled by having bus_dmamap_load and/or bus_dmamap_sync return success values, rather than void like they do now. -- ... every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. Fritz Todt] V I C T O R Y N O T V E N G E A N C E To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 17:16:38 2001 Delivered-To: freebsd-arch@freebsd.org Received: from molly.straylight.com (molly.straylight.com [209.68.199.242]) by hub.freebsd.org (Postfix) with ESMTP id 6E04F37B6A2; Mon, 5 Feb 2001 17:16:20 -0800 (PST) Received: from dickie (case.straylight.com [209.68.199.244]) by molly.straylight.com (8.11.0/8.10.0) with SMTP id f161GDX19005; Mon, 5 Feb 2001 17:16:13 -0800 From: "Jonathan Graehl" To: "Alfred Perlstein" Cc: , "Jonathan Lemon" Subject: RE: nonblocking sockets and EINTR (kevent does not observe SA_RESTART?) 
Date: Mon, 5 Feb 2001 17:16:56 -0800 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200 In-Reply-To: <20010205154842.J26076@fw.wintelcom.net> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > You can specify that syscalls will or won't be automatically > restarted via the sigaction() API. > > -- > -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] Thank you for reminding me of this (and making me feel like my question could have been better directed at -questions, if it is so trivially answered ;) I am using sigaction with SA_RESTART, and I still get EINTR from my kevent call (no matter, this is easily dealt with, due to the straightforward kevent semantics). I assume that SA_RESTART then only applies to the traditional syscalls (read/write,send/recv), and that this may be an oversight in the kqueue implementation, at least meriting a warning in the man page (I also assume that it is not possible to get EINTR for a datagram read/write, since there is no message handle used in sendto/recvfrom) To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 17:34:14 2001 Delivered-To: freebsd-arch@freebsd.org Received: from prism.flugsvamp.com (cb58709-a.mdsn1.wi.home.com [24.17.241.9]) by hub.freebsd.org (Postfix) with ESMTP id DB5FA37B491; Mon, 5 Feb 2001 17:33:55 -0800 (PST) Received: (from jlemon@localhost) by prism.flugsvamp.com (8.11.0/8.11.0) id f161Z7Y95228; Mon, 5 Feb 2001 19:35:07 -0600 (CST) (envelope-from jlemon) Date: Mon, 5 Feb 2001 19:35:07 -0600 From: Jonathan Lemon To: Jonathan Graehl Cc: Alfred Perlstein , freebsd-arch@freebsd.org, Jonathan Lemon Subject: Re: nonblocking sockets and EINTR 
(kevent does not observe SA_RESTART?) Message-ID: <20010205193507.J650@prism.flugsvamp.com> References: <20010205154842.J26076@fw.wintelcom.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre2i In-Reply-To: Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Mon, Feb 05, 2001 at 05:16:56PM -0800, Jonathan Graehl wrote: > > You can specify that syscalls will or won't be automatically > > restarted via the sigaction() API. > > > > -- > > -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] > > Thank you for reminding me of this (and making me feel like my question could > have been better directed at -questions, if it is so trivially answered ;) > > I am using sigaction with SA_RESTART, and I still get EINTR from my kevent call > (no matter, this is easily dealt with, due to the straightforward kevent > semantics). I assume that SA_RESTART then only applies to the traditional > syscalls (read/write,send/recv), and that this may be an oversight in the kqueue > implementation, at least meriting a warning in the man page The difficulty in restarting the kevent call is that it would have to re-apply the changelist, which is probably not what you want. The only case where it is possible to perform a restart is with an empty changelist. I didn't put this optimization in, as I think it would be better if the interface was consistent in all cases. 
-- Jonathan To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 17:50:20 2001 Delivered-To: freebsd-arch@freebsd.org Received: from molly.straylight.com (molly.straylight.com [209.68.199.242]) by hub.freebsd.org (Postfix) with ESMTP id C898437B503 for ; Mon, 5 Feb 2001 17:50:00 -0800 (PST) Received: from dickie (case.straylight.com [209.68.199.244]) by molly.straylight.com (8.11.0/8.10.0) with SMTP id f161nrX19195; Mon, 5 Feb 2001 17:49:53 -0800 From: "Jonathan Graehl" To: "Jonathan Lemon" Cc: Subject: RE: nonblocking sockets and EINTR (kevent does not observe SA_RESTART?) Date: Mon, 5 Feb 2001 17:50:37 -0800 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200 In-Reply-To: <20010205193507.J650@prism.flugsvamp.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG I assume, then, that you guarantee that the changelist is applied (and errors relating to the changes are placed in the received-events-buffer, if possible) before the call becomes interruptible? (and if there were an error that doesn't fit in the buffer, the return would be immediate with the error code); that is, only after the process goes to sleep waiting in kqueue, is there the possibility of an EINTR return? Or, is there the possibility of the changelist only being partially executed when the result is EINTR? I concur that the EINTR semantics are simple and consistent, but perhaps a warning, to the effect that SA_RESTART does not prevent the EINTR outcome, is in order (this may be the case for quite a few other syscalls as well, I have no idea ... 
but it would be nice to see it documented) > The difficulty in restarting the kevent call is that it would have > to re-apply the changelist, which is probably not what you want. The > only case where it is possible to perform a restart is with an empty > changelist. I didn't put this optimization in, as I think it would be > better if the interface was consistent in all cases. > -- > Jonathan > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 18:48:55 2001 Delivered-To: freebsd-arch@freebsd.org Received: from smtp10.phx.gblx.net (smtp10.phx.gblx.net [206.165.6.140]) by hub.freebsd.org (Postfix) with ESMTP id 80A8437B503; Mon, 5 Feb 2001 18:48:35 -0800 (PST) Received: (from daemon@localhost) by smtp10.phx.gblx.net (8.9.3/8.9.3) id TAA35152; Mon, 5 Feb 2001 19:47:59 -0700 Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp10.phx.gblx.net, id smtpd4XrvEa; Mon Feb 5 19:47:50 2001 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id TAA08217; Mon, 5 Feb 2001 19:48:20 -0700 (MST) From: Terry Lambert Message-Id: <200102060248.TAA08217@usr08.primenet.com> Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) To: phk@critter.freebsd.dk (Poul-Henning Kamp) Date: Tue, 6 Feb 2001 02:48:19 +0000 (GMT) Cc: gibbs@scsiguy.com (Justin T. 
Gibbs), bright@wintelcom.net (Alfred Perlstein), rjesup@wgate.com (Randell Jesup), dillon@earth.backplane.com (Matt Dillon), mjacob@feral.com (Matthew Jacob), msmith@FreeBSD.ORG (Mike Smith), des@ofug.org (Dag-Erling Smorgrav), dnelson@emsphone.com (Dan Nelson), tanimura@r.dl.itc.u-tokyo.ac.jp (Seigo Tanimura), arch@FreeBSD.ORG In-Reply-To: <29299.981412612@critter> from "Poul-Henning Kamp" at Feb 05, 2001 11:36:52 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

> >It doesn't have to be a simple call if it only occurs once on mount
> >and whenever a component makes an async upcall telling the system that
> >its state has changed (array is degraded, or perhaps commonly accessed
> >data has migrated to a different striping or RAID layout).
>
> I think we are talking too many different things at the same time here.

Way too many irons in the fire here...

> The upcall I (and I believe Alfred) were discussing were happening
> once per I/O.

I don't think an upcall is really useful. Given a stack of things,
possibly including Vinum and friends, it would be really difficult
to get the event propagation semantics right, in any case. It only
gets worse, with vnode devices and FS stacks.

> The one you are talking about is obviously the one to formulate an
> abstract clustering preference for a device ?

I still think it might be worthwhile to readdress the seek
minimization code, by reading mode page 2 on SCSI drives, and using
the knowledge of the real seek boundaries.

Your point about whiling away Winter nights is well taken.

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present or previous employers.
From owner-freebsd-arch Mon Feb 5 21:46:37 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 20E5E37B401; Mon, 5 Feb 2001 21:46:20 -0800 (PST) Received: (from dillon@localhost) by earth.backplane.com (8.11.1/8.9.3) id f165kHq58398; Mon, 5 Feb 2001 21:46:17 -0800 (PST) (envelope-from dillon) Date: Mon, 5 Feb 2001 21:46:17 -0800 (PST) From: Matt Dillon Message-Id: <200102060546.f165kHq58398@earth.backplane.com> To: Terry Lambert Cc: phk@critter.freebsd.dk (Poul-Henning Kamp), gibbs@scsiguy.com (Justin T. Gibbs), bright@wintelcom.net (Alfred Perlstein), rjesup@wgate.com (Randell Jesup), mjacob@feral.com (Matthew Jacob), msmith@FreeBSD.ORG (Mike Smith), des@ofug.org (Dag-Erling Smorgrav), dnelson@emsphone.com (Dan Nelson), tanimura@r.dl.itc.u-tokyo.ac.jp (Seigo Tanimura), arch@FreeBSD.ORG Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) References: <200102060248.TAA08217@usr08.primenet.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

At risk of throwing yet another iron into coals....

The problem here is to try to give a 'hint' to the high level VFS/BIO
and VM systems. The hint doesn't have to be correct, it just has to be
close 'most of the time'. What this means is that we don't have to
create massive infrastructure to get it exactly right.

Something as simple as an alignment size covers a wide range of
topologies, including all standard RAID topologies. We don't have to
propagate information about actual seek boundaries, or reassigned
sectors, for example. We certainly do not have to propagate the
information on-the-fly... we can get 95% of the way there at mount
time, and that's good enough.
We can also simply assume a reasonable rule for intermediate topologies such as CCD, VN, or a filesystem... we allow the intermediate layers to modify the parameters on their way up, and we assume they will do so prudently. And we can assume for the most part that contiguous blocks translate to contiguous blocks 'most of the time', even when reading and writing a file. (And I will note here that the clustering code is already aware of the most common case -- a logically contiguous file that is not necessarily physically contiguous, and the system does the right thing). I think the idea Poul originally articulated -- having simple information like recommended I/O size, recommended cluster size, and/or maximum I/O size, is the correct solution. Getting fancy might buy us a percent or two... it isn't worth the effort. -Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 22: 8:23 2001 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 6B39237B401; Mon, 5 Feb 2001 22:08:06 -0800 (PST) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id f16684g29040; Mon, 5 Feb 2001 22:08:04 -0800 (PST) Date: Mon, 5 Feb 2001 22:08:04 -0800 From: Alfred Perlstein To: Jonathan Lemon Cc: Jonathan Graehl , freebsd-arch@freebsd.org, Jonathan Lemon Subject: Re: nonblocking sockets and EINTR (kevent does not observe SA_RESTART?) 
Message-ID: <20010205220804.M26076@fw.wintelcom.net> References: <20010205154842.J26076@fw.wintelcom.net> <20010205193507.J650@prism.flugsvamp.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010205193507.J650@prism.flugsvamp.com>; from jlemon@flugsvamp.com on Mon, Feb 05, 2001 at 07:35:07PM -0600 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Jonathan Lemon [010205 17:33] wrote: > On Mon, Feb 05, 2001 at 05:16:56PM -0800, Jonathan Graehl wrote: > > > You can specify that syscalls will or won't be automatically > > > restarted via the sigaction() API. > > > > > > -- > > > -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] > > > > Thank you for reminding me of this (and making me feel like my question could > > have been better directed at -questions, if it is so trivially answered ;) > > > > I am using sigaction with SA_RESTART, and I still get EINTR from my kevent call > > (no matter, this is easily dealt with, due to the straightforward kevent > > semantics). I assume that SA_RESTART then only applies to the traditional > > syscalls (read/write,send/recv), and that this may be an oversight in the kqueue > > implementation, at least meriting a warning in the man page > > The difficulty in restarting the kevent call is that it would have > to re-apply the changelist, which is probably not what you want. The > only case where it is possible to perform a restart is with an empty > changelist. I didn't put this optimization in, as I think it would be > better if the interface was consistent in all cases. I'm pretty sure select() and poll() do not respect SA_RESTART either, so it's probably best that kevent doesn't as well. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." 
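If poll() can indeed fail with EINTR regardless of SA_RESTART, as suggested above, a caller can wrap it in a small retry loop. The following is only a minimal sketch in C: sk_poll_retry is a hypothetical helper name, not an existing API, and it assumes that repeating the call is harmless for the caller. Note that each retry starts the full timeout again.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper (not part of any library): call poll() again
 * whenever it fails with EINTR.  Caveat: every retry waits for the
 * full timeout_ms again, so the total wait can exceed timeout_ms.
 */
int
sk_poll_retry(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    int n;

    do {
        n = poll(fds, nfds, timeout_ms);
    } while (n == -1 && errno == EINTR);
    return (n);
}
```

A caller that needs an exact deadline would have to recompute the remaining time before each retry; that is left out here to keep the loop short.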
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 22:49:28 2001 Delivered-To: freebsd-arch@freebsd.org Received: from smtp10.phx.gblx.net (smtp10.phx.gblx.net [206.165.6.140]) by hub.freebsd.org (Postfix) with ESMTP id 421A337B503; Mon, 5 Feb 2001 22:49:11 -0800 (PST) Received: (from daemon@localhost) by smtp10.phx.gblx.net (8.9.3/8.9.3) id XAA25964; Mon, 5 Feb 2001 23:48:36 -0700 Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp10.phx.gblx.net, id smtpddhzwEa; Mon Feb 5 23:48:32 2001 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id XAA12348; Mon, 5 Feb 2001 23:49:04 -0700 (MST) From: Terry Lambert Message-Id: <200102060649.XAA12348@usr08.primenet.com> Subject: Re: nonblocking sockets and EINTR (kevent does not observe SA_RESTART?) To: bright@wintelcom.net (Alfred Perlstein) Date: Tue, 6 Feb 2001 06:49:02 +0000 (GMT) Cc: jlemon@flugsvamp.com (Jonathan Lemon), jonathan@graehl.org (Jonathan Graehl), freebsd-arch@FreeBSD.ORG, jlemon@FreeBSD.ORG (Jonathan Lemon) In-Reply-To: <20010205220804.M26076@fw.wintelcom.net> from "Alfred Perlstein" at Feb 05, 2001 10:08:04 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > I'm pretty sure select() and poll() do not respect SA_RESTART > either, so it's probably best that kevent doesn't as well. Historically, select() has respected SA_RESTART; all system calls respected it; it was the default behaviour for 4.2, and until the introduction of siginterrupt(), which was obtained from DEC Ultrix. The standard way that was used prior to that of causing a signal handler to actually interrupt a call was to longjmp() out of the signal handler, with a setjmp() wrapper around the call being aborted. 
It was only after the introduction of POSIX signals, which have made life hell for wrapping system calls safely, that the default changed to the POSIX (SVR4) behaviour. Actually, I don't really see any problem with select() being restarted, since it's trivial to set the bitmap. If the call is interrupted, the bitmap should be unmodified (ready to call select() and go); if the bitmap was changed, then the bits which are set are valid, so returning them isn't a problem: the call has completed, but triggered the trampoline. The poll() call might be more of a problem, particularly if we are relying on SIGPOLL to signal pollable events pending being reaped via a subsequent poll() call. Otherwise, the poll() interface is better for restarting than select() is. Although I doubt we will return to the default-restart of 4.2 and 4.3 (even though it would make a threads library a trivial thing to write, with all of the signal masking and unmasking calls being dropped from the overhead), I think that it would not be impossible to make SA_RESTART work like POSIX says it should (at least one of the approaches that has been suggested will work, I think); it's probably worth the effort to think about how to fix kevent(). Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 5 22:58:21 2001 Delivered-To: freebsd-arch@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id 1F26437B401; Mon, 5 Feb 2001 22:58:04 -0800 (PST) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id XAA25045; Mon, 5 Feb 2001 23:53:17 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp05.primenet.com, id smtpdAAAQuai2W; Mon Feb 5 23:53:09 2001 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id XAA12458; Mon, 5 Feb 2001 23:57:49 -0700 (MST) From: Terry Lambert Message-Id: <200102060657.XAA12458@usr08.primenet.com> Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) To: dillon@earth.backplane.com (Matt Dillon) Date: Tue, 6 Feb 2001 06:57:48 +0000 (GMT) Cc: tlambert@primenet.com (Terry Lambert), phk@critter.freebsd.dk (Poul-Henning Kamp), gibbs@scsiguy.com (Justin T. Gibbs), bright@wintelcom.net (Alfred Perlstein), rjesup@wgate.com (Randell Jesup), mjacob@feral.com (Matthew Jacob), msmith@FreeBSD.ORG (Mike Smith), des@ofug.org (Dag-Erling Smorgrav), dnelson@emsphone.com (Dan Nelson), tanimura@r.dl.itc.u-tokyo.ac.jp (Seigo Tanimura), arch@FreeBSD.ORG In-Reply-To: <200102060546.f165kHq58398@earth.backplane.com> from "Matt Dillon" at Feb 05, 2001 09:46:17 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > I think the idea Poul originally articulated -- having simple information > like recommended I/O size, recommended cluster size, and/or maximum I/O > size, is the correct solution. Getting fancy might buy us a percent > or two... it isn't worth the effort. 
I thought Poul had discarded that idea as unworkable, after having tried to make it work; I got the impression that he still liked the idea, but that he didn't have a way to make it practical (Poul, please correct me if I am misinterpreting your last post). I can't see hints being much more useful than the seek optimization code, which was disabled as a pessimization for most ZBR drives, where the track boundaries were unknown back in the early fictional geometry days (predating SCSI II, where it could be fixed again). I would think that you would want your optimization to work at least 51% of the time for it to be worthwhile, or at least "mostly harmless", and I really have doubts that "hints" would be able to do that. You really don't want to end up with something that makes a microbenchmark run fast, at the expense of real system loads, like some of the stuff that happened in the buffer cache code, historically. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 6 0: 4:13 2001 Delivered-To: freebsd-arch@freebsd.org Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147]) by hub.freebsd.org (Postfix) with ESMTP id 46C4037B401; Tue, 6 Feb 2001 00:03:55 -0800 (PST) Received: from critter (localhost [127.0.0.1]) by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id f1683pB31838; Tue, 6 Feb 2001 09:03:51 +0100 (CET) (envelope-from phk@critter.freebsd.dk) To: Terry Lambert Cc: dillon@earth.backplane.com (Matt Dillon), gibbs@scsiguy.com (Justin T. 
Gibbs), bright@wintelcom.net (Alfred Perlstein), rjesup@wgate.com (Randell Jesup), mjacob@feral.com (Matthew Jacob), msmith@FreeBSD.ORG (Mike Smith), des@ofug.org (Dag-Erling Smorgrav), dnelson@emsphone.com (Dan Nelson), tanimura@r.dl.itc.u-tokyo.ac.jp (Seigo Tanimura), arch@FreeBSD.ORG Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) In-Reply-To: Your message of "Tue, 06 Feb 2001 06:57:48 GMT." <200102060657.XAA12458@usr08.primenet.com> Date: Tue, 06 Feb 2001 09:03:51 +0100 Message-ID: <31836.981446631@critter> From: Poul-Henning Kamp Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

In message <200102060657.XAA12458@usr08.primenet.com>, Terry Lambert writes:
>> I think the idea Poul originally articulated -- having simple information
>> like recommended I/O size, recommended cluster size, and/or maximum I/O
>> size, is the correct solution. Getting fancy might buy us a percent
>> or two... it isn't worth the effort.
>
>I thought Poul had discarded that idea as unworkable, after having
>tried to make it work; I got the impression that he still liked
>the idea, but that he didn't have a way to make it practical (Poul,
>please correct me if I am misinterpreting your last post).

No, that is perfectly possible and basically only requires the addition
of a preferred modulus to the current data in dev_t / struct disk.

Optimal individual clustering is unworkable.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 6 1: 6:59 2001 Delivered-To: freebsd-arch@freebsd.org Received: from njord.bart.nl (njord.bart.nl [194.158.170.15]) by hub.freebsd.org (Postfix) with ESMTP id DCD7737B699; Tue, 6 Feb 2001 01:06:41 -0800 (PST) Received: from daemon.chronias.ninth-circle.org (root@cable.ninth-circle.org [195.38.232.6]) by njord.bart.nl (8.10.1/8.10.1) with ESMTP id f1696d650593; Tue, 6 Feb 2001 10:06:39 +0100 (CET) Received: (from asmodai@localhost) by daemon.chronias.ninth-circle.org (8.11.1/8.11.0) id f1696YN91138; Tue, 6 Feb 2001 10:06:34 +0100 (CET) (envelope-from asmodai) Date: Tue, 6 Feb 2001 10:06:34 +0100 From: Jeroen Ruigrok/Asmodai To: Nik Clayton Cc: arch@freebsd.org Subject: Re: [andrew@ugh.net.au: docs/23745: man page for vcount(9)] Message-ID: <20010206100634.K442@daemon.ninth-circle.org> References: <20010202030540.B21835@canyon.nothing-going-on.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2i In-Reply-To: <20010202030540.B21835@canyon.nothing-going-on.org>; from nik@freebsd.org on Fri, Feb 02, 2001 at 03:05:43AM +0000 Organisation: Ninth-Circle Enterprises Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG -On [20010202 04:30], Nik Clayton (nik@freebsd.org) wrote: >Anyone up for a review? Cheers. Done, and committed. -- Jeroen Ruigrok vd Werven/Asmodai asmodai@[wxs.nl|bart.nl|freebsd.org] Documentation nutter/C-rated Coder BSD: Technical excellence at its best D78D D0AD 244D 1D12 C9CA 7152 035C 1138 546A B867 Let us eat and drink; for tomorrow we shall die... 
From owner-freebsd-arch Tue Feb 6 3: 0:29 2001 Delivered-To: freebsd-arch@freebsd.org Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94]) by hub.freebsd.org (Postfix) with ESMTP id 5817737B503; Tue, 6 Feb 2001 03:00:08 -0800 (PST) Received: by relay.butya.kz (Postfix, from userid 1000) id BAE6628E66; Tue, 6 Feb 2001 17:00:03 +0600 (ALMT) Received: from localhost (localhost [127.0.0.1]) by relay.butya.kz (Postfix) with ESMTP id ABBCA28E46; Tue, 6 Feb 2001 17:00:03 +0600 (ALMT) Date: Tue, 6 Feb 2001 17:00:03 +0600 (ALMT) From: Boris Popov To: freebsd-arch@freebsd.org Cc: freebsd-fs@freebsd.org Subject: vnode interlock API Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

Hello,

A few months ago the simple locks used for the vnode interlock were
replaced by mutexes. This causes additional pain for externally
maintained filesystems and lowers the portability of code between
-stable and -current.

So I suggest introducing two macro definitions which will hide the
implementation details of interlocks:

#define VI_LOCK(vp)     mtx_enter(&(vp)->v_interlock, MTX_DEF)
#define VI_UNLOCK(vp)   mtx_exit(&(vp)->v_interlock, MTX_DEF)

For RELENG_4 they will look like this:

#define VI_LOCK(vp)     simple_lock(&(vp)->v_interlock)
#define VI_UNLOCK(vp)   simple_unlock(&(vp)->v_interlock)

Any comments, suggestions?
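To illustrate why hiding the lock type behind VI_LOCK/VI_UNLOCK helps out-of-tree code, here is a rough userland analogue in C. It is only a sketch under loud assumptions: struct sk_vnode, sk_vnode_init, sk_vref and the SK_USE_PTHREAD switch are all hypothetical, and a C11 atomic_flag spin lock and a pthread mutex merely stand in for the kernel's simple locks and mutexes. The point is that sk_vref() compiles unchanged against either backend.

```c
#include <stdatomic.h>
#ifdef SK_USE_PTHREAD
#include <pthread.h>
#endif

/* Hypothetical stand-in for struct vnode; only the fields we need. */
struct sk_vnode {
#ifdef SK_USE_PTHREAD
    pthread_mutex_t v_interlock;    /* "mutex" flavour */
#else
    atomic_flag     v_interlock;    /* "simple lock" flavour */
#endif
    int             v_usecount;
};

/* The two macros are the only place that knows the lock type. */
#ifdef SK_USE_PTHREAD
#define VI_LOCK(vp)     pthread_mutex_lock(&(vp)->v_interlock)
#define VI_UNLOCK(vp)   pthread_mutex_unlock(&(vp)->v_interlock)
#else
#define VI_LOCK(vp)     do {                                        \
        while (atomic_flag_test_and_set(&(vp)->v_interlock))        \
            ;   /* spin until the flag was previously clear */      \
    } while (0)
#define VI_UNLOCK(vp)   atomic_flag_clear(&(vp)->v_interlock)
#endif

void
sk_vnode_init(struct sk_vnode *vp)
{
#ifdef SK_USE_PTHREAD
    pthread_mutex_init(&vp->v_interlock, NULL);
#else
    atomic_flag_clear(&vp->v_interlock);
#endif
    vp->v_usecount = 0;
}

/* Consumer code: identical for both lock implementations. */
int
sk_vref(struct sk_vnode *vp)
{
    int n;

    VI_LOCK(vp);
    n = ++vp->v_usecount;
    VI_UNLOCK(vp);
    return (n);
}
```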
-- Boris Popov http://www.butya.kz/~bp/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 6 3: 3:47 2001 Delivered-To: freebsd-arch@freebsd.org Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147]) by hub.freebsd.org (Postfix) with ESMTP id CEB2037B401; Tue, 6 Feb 2001 03:03:24 -0800 (PST) Received: from critter (localhost [127.0.0.1]) by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id f16B33B33409; Tue, 6 Feb 2001 12:03:03 +0100 (CET) (envelope-from phk@critter.freebsd.dk) To: Boris Popov Cc: freebsd-arch@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG Subject: Re: vnode interlock API In-Reply-To: Your message of "Tue, 06 Feb 2001 17:00:03 +0600." Date: Tue, 06 Feb 2001 12:03:03 +0100 Message-ID: <33407.981457383@critter> From: Poul-Henning Kamp Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Sounds like something which should have been done long time ago... In message , Boris Popov writes: > Hello, > > Few months ago simple locks used for vnode interlock were replaced >by mutexes. It causes additional pain for externally maintained >filesystems and lowers portability of the code between -stable and >-current. > > So, I suggest to introduce two macro definitions which will hide >implementation details for interlocks: > >#define VI_LOCK(vp) mtx_enter(&(vp)->v_interlock, MTX_DEF) >#define VI_UNLOCK(vp) mtx_exit(&(vp)->v_interlock, MTX_DEF) > > for RELENG_4 they will look like this: > >#define VI_LOCK(vp) simple_lock(&(vp)->v_interlock) >#define VI_UNLOCK(vp) simple_unlock(&(vp)->v_interlock) > > Any comments, suggestions ? 
> >-- >Boris Popov >http://www.butya.kz/~bp/ > > > >To Unsubscribe: send mail to majordomo@FreeBSD.org >with "unsubscribe freebsd-arch" in the body of the message > -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 6 3:51:33 2001 Delivered-To: freebsd-arch@freebsd.org Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94]) by hub.freebsd.org (Postfix) with ESMTP id 2180B37B401; Tue, 6 Feb 2001 03:51:05 -0800 (PST) Received: by relay.butya.kz (Postfix, from userid 1000) id E602728E45; Tue, 6 Feb 2001 17:50:52 +0600 (ALMT) Received: from localhost (localhost [127.0.0.1]) by relay.butya.kz (Postfix) with ESMTP id 7408928DEE; Tue, 6 Feb 2001 17:50:52 +0600 (ALMT) Date: Tue, 6 Feb 2001 17:50:52 +0600 (ALMT) From: Boris Popov To: freebsd-arch@freebsd.org Cc: freebsd-net@freebsd.org Subject: CFR: Sequential mbuf read/write extensions Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG [Please trim CC list as necessary] Hello, Before starting import process for smbfs, I would like to introduce new API which greatly simplifies process of packaging data into mbufs and fetching it back (in fact, similar API already presented in the tree, but it is private to the netncp code and it will be really nice to share it). Basically, it requires additional structure (working context) and related functions: struct mbdata { struct mbuf * mb_top; struct mbuf * mb_cur; u_char * mb_pos; int mb_count; }; Where mb_top points at the first mbuf in the chain and mb_cur to the current mbuf. 
Here is a slightly truncated API to illustrate how it works:

int mb_init(struct mbdata *mbp);
int mb_initm(struct mbdata *mbp, struct mbuf *m);
int mb_done(struct mbdata *mbp);
int mb_put_byte(struct mbdata *mbp, u_int8_t x);
int mb_put_wordbe(struct mbdata *mbp, u_int16_t x);
int mb_put_wordle(struct mbdata *mbp, u_int16_t x);
int mb_put_dwordbe(struct mbdata *mbp, u_int32_t x);
int mb_get_byte(struct mbdata *mbp, u_int8_t *x);
int mb_get_word(struct mbdata *mbp, u_int16_t *x);
int mb_get_wordle(struct mbdata *mbp, u_int16_t *x);
int mb_get_wordbe(struct mbdata *mbp, u_int16_t *x);

The mb_put* functions append new data to an mbuf chain. These
functions take care of the necessary mbuf allocations and additional
data conversions. For example, mb_put_wordbe will store a 16 bit
integer in the network format while mb_put_wordle will convert it to
the little endian format if necessary.

The mb_get* functions fetch data from mbuf chains, with appropriate
handling of mbuf borders and data conversions.

Here are simple examples (error checks are omitted):

Send:

    error = mb_init(mbp);
    if (error)
        return error;
    mb_put_mem(mbp, SMB_SIGNATURE, SMB_SIGLEN, MB_MSYSTEM);
    mb_put_byte(mbp, cmd);
    mb_put_dwordle(mbp, 1234);
    mb_put_byte(mbp, vcp->vc_hflags);
    mb_fixhdr(mbp);
    my_great_send_function(mbp->mb_top);
    mb_done(mbp);

Receive:

    mb_initm(mbp, just_received_mbuf_chain);
    mb_get_byte(mbp, &rqp->sr_rpflags);
    mb_get_wordle(mbp, &rqp->sr_rpflags2);
    mb_get_dword(mbp, &tdw);
    mb_get_dword(mbp, &tdw);
    mb_get_dword(mbp, &tdw);
    mb_get_wordle(mbp, &rqp->sr_rptid);
    mb_get_wordle(mbp, &rqp->sr_rppid);
    mb_get_wordle(mbp, &rqp->sr_rpuid);
    mb_get_wordle(mbp, &rqp->sr_rpmid);

Since there are currently not many consumers of this code, I suggest
defining an option LIBMBUF in the kernel configuration file and adding
a KLD libmbuf (with interface libmbuf), so the kernel footprint will
not be significantly affected.
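For readers without the kernel sources at hand, the cursor idea behind mb_put*/mb_get* can be tried out in userland. The sketch below is not the proposed implementation: it assumes one flat byte buffer instead of an mbuf chain, so nothing is allocated and there are no mbuf borders to cross, and every sk_ name is hypothetical. It only shows how a position pointer plus explicit big-endian and little-endian helpers give sequential, byte-order-safe access.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat-buffer stand-in for struct mbdata. */
struct sk_cursor {
    uint8_t *buf;   /* start of storage */
    size_t   cap;   /* bytes available for writing or reading */
    size_t   pos;   /* next byte to write or read */
};

void
sk_init(struct sk_cursor *c, uint8_t *buf, size_t cap)
{
    c->buf = buf;
    c->cap = cap;
    c->pos = 0;
}

/* Returns 1 if n more bytes fit (pos never exceeds cap). */
static int
sk_room(const struct sk_cursor *c, size_t n)
{
    return (c->cap - c->pos >= n);
}

/* All put/get helpers return 0 on success, -1 if out of space/data. */
int
sk_put_byte(struct sk_cursor *c, uint8_t x)
{
    if (!sk_room(c, 1))
        return (-1);
    c->buf[c->pos++] = x;
    return (0);
}

int
sk_put_wordbe(struct sk_cursor *c, uint16_t x)
{
    if (!sk_room(c, 2))
        return (-1);
    c->buf[c->pos++] = (uint8_t)(x >> 8);      /* high byte first */
    c->buf[c->pos++] = (uint8_t)(x & 0xff);
    return (0);
}

int
sk_put_wordle(struct sk_cursor *c, uint16_t x)
{
    if (!sk_room(c, 2))
        return (-1);
    c->buf[c->pos++] = (uint8_t)(x & 0xff);    /* low byte first */
    c->buf[c->pos++] = (uint8_t)(x >> 8);
    return (0);
}

int
sk_get_byte(struct sk_cursor *c, uint8_t *x)
{
    if (!sk_room(c, 1))
        return (-1);
    *x = c->buf[c->pos++];
    return (0);
}

int
sk_get_wordbe(struct sk_cursor *c, uint16_t *x)
{
    if (!sk_room(c, 2))
        return (-1);
    *x = (uint16_t)((c->buf[c->pos] << 8) | c->buf[c->pos + 1]);
    c->pos += 2;
    return (0);
}

int
sk_get_wordle(struct sk_cursor *c, uint16_t *x)
{
    if (!sk_room(c, 2))
        return (-1);
    *x = (uint16_t)(c->buf[c->pos] | (c->buf[c->pos + 1] << 8));
    c->pos += 2;
    return (0);
}
```

Because the helpers assemble values byte by byte, the result does not depend on the byte order of the host, which is the property the put/get pairs above are meant to provide.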
The names of source and header files are questionable too and I would appreciate good suggestions (currently they are subr_mbuf.c and subr_mbuf.h). Well, and finally here you will find full source code of proposed API: http://www.butya.kz/~bp/mbuf/ Any comments and suggestions are greatly appreciated. -- Boris Popov http://www.butya.kz/~bp/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 6 7:31:11 2001 Delivered-To: freebsd-arch@freebsd.org Received: from prism.flugsvamp.com (cb58709-a.mdsn1.wi.home.com [24.17.241.9]) by hub.freebsd.org (Postfix) with ESMTP id C36D437B401 for ; Tue, 6 Feb 2001 07:30:54 -0800 (PST) Received: (from jlemon@localhost) by prism.flugsvamp.com (8.11.0/8.11.0) id f16FW1Y20391; Tue, 6 Feb 2001 09:32:01 -0600 (CST) (envelope-from jlemon) Date: Tue, 6 Feb 2001 09:32:01 -0600 From: Jonathan Lemon To: Jonathan Graehl Cc: Jonathan Lemon , freebsd-arch@FreeBSD.ORG Subject: Re: nonblocking sockets and EINTR (kevent does not observe SA_RESTART?) Message-ID: <20010206093201.K650@prism.flugsvamp.com> References: <20010205193507.J650@prism.flugsvamp.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre2i In-Reply-To: Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Mon, Feb 05, 2001 at 05:50:37PM -0800, Jonathan Graehl wrote: > I assume, then, that you guarantee that the changelist is applied > (and errors relating to the changes are placed in the > received-events-buffer, if possible) before the call becomes > interruptible? (and if there were an error that doesn't fit in the > buffer, the return would be immediate with the error code); that is, > only after the process goes to sleep waiting in kqueue, is there the > possibility of an EINTR return? Correct. Technically, an EINTR is returned when a signal interrupts the process after it goes to sleep (that is, after it calls tsleep). 
So if (as an example) you call kevent() with a zero valued timespec, you'll never get EINTR, since there's no possibility of it sleeping. -- Jonathan To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 6 8:32:25 2001 Delivered-To: freebsd-arch@freebsd.org Received: from mailout02.sul.t-online.com (mailout02.sul.t-online.com [194.25.134.17]) by hub.freebsd.org (Postfix) with ESMTP id 8948D37B4EC; Tue, 6 Feb 2001 08:32:03 -0800 (PST) Received: from fwd07.sul.t-online.com by mailout02.sul.t-online.com with smtp id 14QB27-00052q-00; Tue, 06 Feb 2001 17:31:59 +0100 Received: from frolic.no-support.loc (520094253176-0001@[217.80.111.106]) by fmrl07.sul.t-online.com with esmtp id 14QB1l-2Kk35mC; Tue, 6 Feb 2001 17:31:37 +0100 Received: (from bjoern@localhost) by frolic.no-support.loc (8.11.1/8.9.3) id f16GLp600648; Tue, 6 Feb 2001 17:21:51 +0100 (CET) (envelope-from bjoern) From: Bjoern Fischer Date: Tue, 6 Feb 2001 17:21:50 +0100 To: Boris Popov Cc: freebsd-arch@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG Subject: Re: vnode interlock API Message-ID: <20010206172150.A528@frolic.no-support.loc> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from bp@butya.kz on Tue, Feb 06, 2001 at 05:00:03PM +0600 X-Sender: 520094253176-0001@t-dialin.net Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Hello, > Few months ago simple locks used for vnode interlock were replaced > by mutexes. It causes additional pain for externally maintained > filesystems and lowers portability of the code between -stable and > -current. 
> > So, I suggest to introduce two macro definitions which will hide > implementation details for interlocks: > > #define VI_LOCK(vp) mtx_enter(&(vp)->v_interlock, MTX_DEF) > #define VI_UNLOCK(vp) mtx_exit(&(vp)->v_interlock, MTX_DEF) BTW, does this mean that -current vnode locking works sufficiently enough to support stacked file systems a la Eric Zadok's FiST software? Bjoern -- -----BEGIN GEEK CODE BLOCK----- GCS d--(+) s++: a- C+++(-) UB++++OSI++++$ P+++(-) L---(++) !E W- N+ o>+ K- !w !O !M !V PS++ PE- PGP++ t+++ !5 X++ tv- b+++ D++ G e+ h-- y+ ------END GEEK CODE BLOCK------ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 6 10:59:21 2001 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 2968337B401; Tue, 6 Feb 2001 10:59:01 -0800 (PST) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id f16Iwkd17957; Tue, 6 Feb 2001 10:58:46 -0800 (PST) Date: Tue, 6 Feb 2001 10:58:46 -0800 From: Alfred Perlstein To: Boris Popov Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: CFR: Sequential mbuf read/write extensions Message-ID: <20010206105846.Q26076@fw.wintelcom.net> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from bp@butya.kz on Tue, Feb 06, 2001 at 05:50:52PM +0600 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Boris Popov [010206 03:51] wrote: > [Please trim CC list as necessary] > > Hello, > > Before starting import process for smbfs, I would like to > introduce new API which greatly simplifies process of packaging data into > mbufs and fetching it back (in fact, similar API already presented in the > tree, but it is private to the netncp code and it will be really nice to > share it). 
[snip] Looks really cool, I can't get to http://www.butya.kz/~bp/mbuf/, but from the examples it looks very useful. I was wondering if you planned or already had an API for reading/writing from/into host/network byte order? Not that it's needed, but would be nice to have. Also any chance we'll get manpages that describe these functions/macros? On other idea is to give each op a 'count' parameter, your examples seem to show various functions being called several times in a row, maybe they would help optimize certain codepaths? Not that any of these suggestions are really required, I just wanted to give you some feedback. :) -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 6 11:51:55 2001 Delivered-To: freebsd-arch@freebsd.org Received: from meow.osd.bsdi.com (meow.osd.bsdi.com [204.216.28.88]) by hub.freebsd.org (Postfix) with ESMTP id 192A337B401; Tue, 6 Feb 2001 11:51:34 -0800 (PST) Received: from laptop.baldwin.cx (john@jhb-laptop.osd.bsdi.com [204.216.28.241]) by meow.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id f16Jo9345186; Tue, 6 Feb 2001 11:50:09 -0800 (PST) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.4.0 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: Date: Tue, 06 Feb 2001 11:51:11 -0800 (PST) From: John Baldwin To: Boris Popov Subject: RE: vnode interlock API Cc: freebsd-fs@FreeBSD.org, freebsd-arch@FreeBSD.org Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On 06-Feb-01 Boris Popov wrote: > Hello, > > Few months ago simple locks used for vnode interlock were replaced > by mutexes. 
It causes additional pain for externally maintained > filesystems and lowers portability of the code between -stable and > -current. Sounds good. -- John Baldwin -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 6 18:18:20 2001 Delivered-To: freebsd-arch@freebsd.org Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94]) by hub.freebsd.org (Postfix) with ESMTP id E3C2437B401; Tue, 6 Feb 2001 18:17:59 -0800 (PST) Received: by relay.butya.kz (Postfix, from userid 1000) id 40E6B29059; Wed, 7 Feb 2001 08:17:54 +0600 (ALMT) Received: from localhost (localhost [127.0.0.1]) by relay.butya.kz (Postfix) with ESMTP id 312C628698; Wed, 7 Feb 2001 08:17:54 +0600 (ALMT) Date: Wed, 7 Feb 2001 08:17:53 +0600 (ALMT) From: Boris Popov To: Alfred Perlstein Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: CFR: Sequential mbuf read/write extensions In-Reply-To: <20010206105846.Q26076@fw.wintelcom.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Tue, 6 Feb 2001, Alfred Perlstein wrote: > Looks really cool, I can't get to http://www.butya.kz/~bp/mbuf/, > but from the examples it looks very useful. Sorry, server was brought down and I wasn't notified :(. It should be ok now. > I was wondering if you planned or already had an API for reading/writing > from/into host/network byte order? Not that it's needed, but would > be nice to have. Also any chance we'll get manpages that describe > these functions/macros? Yes, the header file contains macros which supports not only host to network (big-endian) byte order conversion, but also to the little-endian byte order. 
And of course, there will be a manpage(s) if this is going to become a part of kernel API. > On other idea is to give each op a 'count' parameter, your examples > seem to show various functions being called several times in a row, > maybe they would help optimize certain codepaths? Yes, there is a mb_{get|put}_mem() functions which allow reading/writing of big memory regions (including user space). So, if protocol is well designed and layout of the packet can be described as structure, it is possible to fill it in the normal memory and copy the mbuf chain in single operation. > > Not that any of these suggestions are really required, I just wanted > to give you some feedback. :) Thanks :) -- Boris Popov http://www.butya.kz/~bp/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 6 19:42:51 2001 Delivered-To: freebsd-arch@freebsd.org Received: from VL-MS-MR002.sc1.videotron.ca (relais.videotron.ca [24.201.245.36]) by hub.freebsd.org (Postfix) with ESMTP id 30A4E37B491; Tue, 6 Feb 2001 19:42:27 -0800 (PST) Received: from jehovah ([24.201.144.31]) by VL-MS-MR002.sc1.videotron.ca (Netscape Messaging Server 4.15) with SMTP id G8DBJZ05.88O; Tue, 6 Feb 2001 22:40:47 -0500 Message-ID: <003001c090b8$0b067a50$1f90c918@jehovah> From: "Bosko Milekic" To: "Boris Popov" , Cc: References: Subject: Re: Sequential mbuf read/write extensions Date: Tue, 6 Feb 2001 22:42:49 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2919.6700 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Boris Popov wrote: [...] 
> Since currently there isn't many consumers of this code I can > suggest to define an option LIBMBUF in the kernel configuration file and > add KLD libmbuf (with interface libmbuf), so kernel footprint will not be I am in favor of such an option on the condition that it is temporary. In other words, only until we decide "we have converted enough code to use this code so we should remove the option now." The reason is that otherwise, we will be faced with numerous "#ifdef LIBMBUF ... #else ... #endif" code. I assume this is what you meant, anyway, so I have no objections. :-) The API looks great by the way, and I will try to give a more detailed review in the next few days. :-) For now: #define M_TRYWAIT M_WAIT is not right. (M_WAIT is no longer to be used in the mbuf code.) The successful return values are 0. I don't have a problem with this, specifically, but I would assume that this: if (!mb_init(mbp)) ... would be more "logical" (I use the term loosely) if it meant: "if initialization fails" (now it means "if initialization is successful"). > significantly affected. The names of source and header files are > questionable too and I would appreciate good suggestions (currently they > are subr_mbuf.c and subr_mbuf.h). Hmmm. Maybe subr_mblib.c and libmb.h ? I don't want to turn this into a bikeshed ( :-) ), so I suggest that you decide. Personally, I would prefer that it be something other than "subr_mbuf.c" simply because it may be a little misleading in some cases. > Well, and finally here you will find full source code of proposed > API: http://www.butya.kz/~bp/mbuf/ > > Any comments and suggestions are greatly appreciated. > > -- > Boris Popov > http://www.butya.kz/~bp/ Boris, this is really a great interface and nice looking, clean code. Thank you! Regards, Bosko.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 0:26:39 2001 Delivered-To: freebsd-arch@freebsd.org Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by hub.freebsd.org (Postfix) with ESMTP id D02E037B491 for ; Wed, 7 Feb 2001 00:26:19 -0800 (PST) Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.11.1/8.11.1) with SMTP id f178QHh11778 for ; Wed, 7 Feb 2001 03:26:18 -0500 (EST) (envelope-from robert@fledge.watson.org) Date: Wed, 7 Feb 2001 03:26:17 -0500 (EST) From: Robert Watson X-Sender: robert@fledge.watson.org To: arch@FreeBSD.org Subject: Moving struct proc's p_prison to ucred as cr_prison Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG I'm planning on committing a close approximation to the following in the near future: http://www.watson.org/~robert/jail-to-ucred.diff The p_prison pointer in the process structure ties a process to its jail(8) prison structure. This patch moves that pointer from the process structure to the credential structure, as well as cleaning up a few other bits and pieces associated with jail and process access control. Here are some more details for those interested in reviewing the changes (which will be committed in components, and are currently waiting on an xucred fix so that mountd doesn't panic the system when its concept of ucred doesn't match the kernel version).
- proc->p_prison moved to ucred->cr_prison
- abstract out jail reference counting using prison_hold() and prison_free()
- make jail inheritance be a function of credential inheritance
- make jail garbage collection be a function of credential garbage collection
- modify various jail (prison_*) functions to accept ucred instead of proc
- introduce jailed(ucred) call to check if a ucred is in jail rather than direct (p->p_prison!=NULL) checks all over the place
- remove const qualifier from various calls, including suser, p_can, cap_check, to reflect mutex use in the near future
- remove unnecessary prison check in bpf device code (we use namespacing to protect devices, where possible)
- move various jail function prototypes to jail.h
- convert PRISON_CHECK from a macro to a function
- comment a number of situations where it's now possible to test jail presence with respect to a passed credential rather than the current process (usually in the socket code). No semantic changes here just yet, but there may be in the future. Comments won't be committed, but are there to guide the reader in understanding the diffs.

Generally, the benefits of this change include:

- increasingly modularized jail, making the idea of a kld-loadable jail or customized jail() more conceivable -- hide jail implementation from many consumers of jail (not all yet, especially in pty code and Linux ABI)
- move towards a model where access control decisions can be made without reference to the process, just the credential (won't be entirely possible as some access decisions are based on p_session and related concepts for signalling, but it helps).
- move towards a model where pre-bound sockets could be passed into a jail via UDS allowing the jail access to some outside system resources (much the same way as cached socket credentials allow non-root processes to use sockets bound while holding privilege).
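The first few items in the list of changes (the prison pointer living in the credential, prison_hold()/prison_free() reference counting, and a jailed() predicate) can be sketched roughly as below. This is only an illustration in user-space C under stated assumptions: struct prison_sketch, struct ucred_sketch, cred_dup() and cred_free() are hypothetical stand-ins and are not the kernel definitions or the contents of the patch; only the names prison_hold(), prison_free() and jailed() come from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the jail's prison structure. */
struct prison_sketch {
	int pr_ref;			/* credentials referencing this prison */
};

/* Hypothetical, simplified stand-in for a credential. */
struct ucred_sketch {
	int cr_ref;			/* references to this credential */
	struct prison_sketch *cr_prison;	/* NULL when not jailed */
};

/* Take one more reference on a prison. */
void
prison_hold(struct prison_sketch *pr)
{
	pr->pr_ref++;
}

/* Drop a reference; the prison is released with its last reference. */
void
prison_free(struct prison_sketch *pr)
{
	assert(pr->pr_ref > 0);
	if (--pr->pr_ref == 0)
		free(pr);
}

/* Jail membership is a property of the credential, not the process. */
int
jailed(const struct ucred_sketch *cr)
{
	return (cr->cr_prison != NULL);
}

/* Copying a credential inherits its jail (assumed helper, not from the patch). */
struct ucred_sketch *
cred_dup(const struct ucred_sketch *old)
{
	struct ucred_sketch *cr = malloc(sizeof(*cr));

	if (cr == NULL)
		return (NULL);
	cr->cr_ref = 1;
	cr->cr_prison = old->cr_prison;
	if (jailed(cr))
		prison_hold(cr->cr_prison);
	return (cr);
}

/* Freeing the last credential reference also drops its prison reference. */
void
cred_free(struct ucred_sketch *cr)
{
	assert(cr->cr_ref > 0);
	if (--cr->cr_ref == 0) {
		if (jailed(cr))
			prison_free(cr->cr_prison);
		free(cr);
	}
}
```

With this shape, code that only has a credential in hand (for example a cached socket credential) can ask jailed(cred) without needing a process pointer, which is the point of the third benefit listed above.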
As I mention above, right now applying this change without rebuilding mountd can result in a system panic, due to differing interpretations of the ucred structure between kernel and userland. Brian Feldman apparently has patches to fix this by making the userland/kernel ABI/API use xucred; in the meantime, if you decide to test this, disable NFS serving, or remember to rebuild userland. Working through these changes prompted my earlier question about NULL credential references. It may be that the race windows in fork1() and exit1() (possibly wait1()) require additional checks in these patches. BTW, while working with the ucred code, I noticed that while uidinfo appears to have moved to ucred, there are still uidinfo references in struct proc. I haven't followed up on why this might be the case as yet. Robert N M Watson FreeBSD Core Team, TrustedBSD Project robert@fledge.watson.org NAI Labs, Safeport Network Services To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 0:34:31 2001 Delivered-To: freebsd-arch@freebsd.org Received: from mailhub.fokus.gmd.de (mailhub.fokus.gmd.de [193.174.154.14]) by hub.freebsd.org (Postfix) with ESMTP id 5EA3737B401; Wed, 7 Feb 2001 00:34:12 -0800 (PST) Received: from beagle (beagle [193.175.132.100]) by mailhub.fokus.gmd.de (8.8.8/8.8.8) with ESMTP id JAA24496; Wed, 7 Feb 2001 09:33:15 +0100 (MET) Date: Wed, 7 Feb 2001 09:33:15 +0100 (CET) From: Harti Brandt To: Boris Popov Cc: , Subject: Re: CFR: Sequential mbuf read/write extensions In-Reply-To: <20010206105846.Q26076@fw.wintelcom.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Looks nice, just what I needed two weeks ago and partly had to implement myself :-) But, I would recommend sticking with the usual naming of size-dependent things, by appending a numeric suffix.
Something like:

	int mb_get8(struct mbdata *mbp, u_int8_t *x);
	int mb_get16(struct mbdata *mbp, u_int16_t *x);
	int mb_get16le(struct mbdata *mbp, u_int16_t *x);
	int mb_get16be(struct mbdata *mbp, u_int16_t *x);
	int mb_get32(struct mbdata *mbp, u_int32_t *x);
	...

Using 'word' and 'doubleword' is rather confusing (when speaking of words I would think of 32 bit nowadays). harti -- harti brandt, http://www.fokus.gmd.de/research/cc/cats/employees/hartmut.brandt/private brandt@fokus.gmd.de, harti@begemot.org, lhbrandt@mail.ru To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 1:29:26 2001 Delivered-To: freebsd-arch@freebsd.org Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94]) by hub.freebsd.org (Postfix) with ESMTP id 7863E37B684; Wed, 7 Feb 2001 01:28:59 -0800 (PST) Received: by relay.butya.kz (Postfix, from userid 1000) id E5B812868D; Wed, 7 Feb 2001 15:28:49 +0600 (ALMT) Received: from localhost (localhost [127.0.0.1]) by relay.butya.kz (Postfix) with ESMTP id D66D82868A; Wed, 7 Feb 2001 15:28:49 +0600 (ALMT) Date: Wed, 7 Feb 2001 15:28:49 +0600 (ALMT) From: Boris Popov To: Bosko Milekic Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: Sequential mbuf read/write extensions In-Reply-To: <003001c090b8$0b067a50$1f90c918@jehovah> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Tue, 6 Feb 2001, Bosko Milekic wrote: > > Since currently there isn't many consumers of this code I can > > suggest to define an option LIBMBUF in the kernel configuration file > and > > add KLD libmbuf (with interface libmbuf), so kernel footprint will > not be > > I am in favor of such an option on the condition that it is > temporary. In other words, only until we decide "we have converted > enough code to use this code so we should remove the option now."
The > reason is that otherwise, we will be faced with numerous "#ifdef > LIBMBUF ... #else ... #endif" code. I assume this is what you meant, Not exactly so. 'option LIBMBUF' will just connect the source file to the kernel makefile. There is no need for any #ifdef's in the code. > #define M_TRYWAIT M_WAIT is not right. > (M_WAIT is no longer to be used in the mbuf code.) You omitted the surrounding "#ifndef M_TRYWAIT" which makes this code portable to RELENG_4 (mind you, this code is taken from smbfs). Of course, this should be stripped before import. > The successful return values are 0. I don't have a problem with this, > specifically, but I would assume that this: > if (!mb_init(mbp)) ... would be more "logical" (I use the term > loosely) if it meant: "if initialization fails" (now it means "if > initialization is successful"). I generally don't like such syntax if the function or variable name does not clearly specify which value it should have/return on success. Nearly all functions in this file return zero or an error code, so the correct syntax of the above will be:

	error = mb_init(mbp);
	if (!error)
or
	if (error)
		return error;
or
	if (mb_init(mbp) != 0)
		return ESOMETHINGEVIL;

> > significantly affected. The names of source and header files are > > questionable too and I would appreciate good suggestions (currently > they > > are subr_mbuf.c and subr_mbuf.h). > > Hmmm. Maybe subr_mblib.c and libmb.h ? I don't want to turn this > into a bikeshed ( :-) ), so I suggest that you decide. Personally, I > would prefer that it be something other than "subr_mbuf.c" simply > because it may be a little misleading in some cases. Good point. > Boris, this is really a great interface and nice looking, clean code.
I'm sure, this code can be significantly improved by mbuf gurus :) -- Boris Popov http://www.butya.kz/~bp/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 1:35:38 2001 Delivered-To: freebsd-arch@freebsd.org Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94]) by hub.freebsd.org (Postfix) with ESMTP id BE07737B699; Wed, 7 Feb 2001 01:35:18 -0800 (PST) Received: by relay.butya.kz (Postfix, from userid 1000) id CCF6A28648; Wed, 7 Feb 2001 15:35:16 +0600 (ALMT) Received: from localhost (localhost [127.0.0.1]) by relay.butya.kz (Postfix) with ESMTP id C394528647; Wed, 7 Feb 2001 15:35:16 +0600 (ALMT) Date: Wed, 7 Feb 2001 15:35:16 +0600 (ALMT) From: Boris Popov To: Harti Brandt Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: CFR: Sequential mbuf read/write extensions In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, 7 Feb 2001, Harti Brandt wrote: > But, I would recommend to stick with the ususal naming of size dependend > things, by appending a numeric suffix. Something like: > > int mb_get8(struct mbdata *mbp, u_int8_t *x); > int mb_get16(struct mbdata *mbp, u_int16_t *x); > int mb_get16le(struct mbdata *mbp, u_int16_t *x); > int mb_get16be(struct mbdata *mbp, u_int16_t *x); > int mb_get32(struct mbdata *mbp, u_int32_t *x); > ... > > Using 'word' and 'doubleword' is rather confusing (when speeking of words > I would think of 32 bit nowadays). Well, it depends. For me 'word', 'dword' and 'qword' are clear from the good old 8bit days :) If numbers in the function names looks good I can live with it. Opinions ? 
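To make the naming question concrete, here is a minimal sketch of what size-suffixed getters with explicit byte order might look like. It is an assumption-laden illustration, not the proposed API: it reads from a flat byte buffer (struct bytecursor, hypothetical) rather than an mbuf chain, and the bc_get* names are invented for this example, loosely following the mb_get16le style suggested above. It keeps the thread's convention of returning 0 on success and an error code otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read cursor over a flat buffer (stands in for an mbuf chain). */
struct bytecursor {
	const uint8_t *p;	/* next unread byte */
	size_t left;		/* bytes remaining */
};

/* Consume n bytes; returns 0 on success, EINVAL if fewer than n remain. */
static int
bc_take(struct bytecursor *bc, size_t n, const uint8_t **out)
{
	assert(bc != NULL && out != NULL);
	if (bc->left < n)
		return (EINVAL);
	*out = bc->p;
	bc->p += n;
	bc->left -= n;
	return (0);
}

int
bc_get8(struct bytecursor *bc, uint8_t *x)
{
	const uint8_t *b;
	int error = bc_take(bc, 1, &b);

	if (error)
		return (error);
	*x = b[0];
	return (0);
}

/* 16-bit little-endian: least significant byte comes first on the wire. */
int
bc_get16le(struct bytecursor *bc, uint16_t *x)
{
	const uint8_t *b;
	int error = bc_take(bc, 2, &b);

	if (error)
		return (error);
	*x = (uint16_t)(b[0] | (b[1] << 8));
	return (0);
}

/* 16-bit big-endian (network order): most significant byte comes first. */
int
bc_get16be(struct bytecursor *bc, uint16_t *x)
{
	const uint8_t *b;
	int error = bc_take(bc, 2, &b);

	if (error)
		return (error);
	*x = (uint16_t)((b[0] << 8) | b[1]);
	return (0);
}

/* 32-bit big-endian. */
int
bc_get32be(struct bytecursor *bc, uint32_t *x)
{
	const uint8_t *b;
	int error = bc_take(bc, 4, &b);

	if (error)
		return (error);
	*x = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	    ((uint32_t)b[2] << 8) | (uint32_t)b[3];
	return (0);
}
```

Decoding byte by byte, instead of casting the buffer pointer to a wider type, sidesteps alignment and host-endianness concerns, and the width and order are readable from the function name without having to agree on what a "word" is.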
-- Boris Popov http://www.butya.kz/~bp/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 1:57:44 2001 Delivered-To: freebsd-arch@freebsd.org Received: from mailhub.fokus.gmd.de (mailhub.fokus.gmd.de [193.174.154.14]) by hub.freebsd.org (Postfix) with ESMTP id AED2037B69E; Wed, 7 Feb 2001 01:57:17 -0800 (PST) Received: from beagle (beagle [193.175.132.100]) by mailhub.fokus.gmd.de (8.8.8/8.8.8) with ESMTP id KAA01306; Wed, 7 Feb 2001 10:44:55 +0100 (MET) Date: Wed, 7 Feb 2001 10:44:55 +0100 (CET) From: Harti Brandt To: Boris Popov Cc: , Subject: Re: CFR: Sequential mbuf read/write extensions In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, 7 Feb 2001, Boris Popov wrote: BP>> Using 'word' and 'doubleword' is rather confusing (when speeking of words BP>> I would think of 32 bit nowadays). BP> BP> Well, it depends. For me 'word', 'dword' and 'qword' are clear BP>from the good old 8bit days :) BP> BP> If numbers in the function names looks good I can live with it. Well, I just looked back to the bus_space stuff and discovered, that they use suffixes of _[1234] to count the number of bytes the functions operate on. Perhaps this is a better variant? Anyway, I think, numbers are much clearer, than words in this case (As an example, what does ntohl operate on if longs are 64 bit??). As a side note: Someone told me that Mickeysoft is trying to persuade the C standardisation people to drop the requirement that longs should not be shorter than int's. This is, he said, because of their braindamage with DWORD in -zillions of header files... 
If I look how they continue to cripple C, this may also slip through :-( harti -- harti brandt, http://www.fokus.gmd.de/research/cc/cats/employees/hartmut.brandt/private brandt@fokus.gmd.de, harti@begemot.org, lhbrandt@mail.ru To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 10:29:37 2001 Delivered-To: freebsd-arch@freebsd.org Received: from green.dyndns.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 0394A37B401 for ; Wed, 7 Feb 2001 10:29:07 -0800 (PST) Received: from localhost (6lbzax@localhost [127.0.0.1]) by green.dyndns.org (8.11.1/8.11.1) with ESMTP id f17ISLr17637 for ; Wed, 7 Feb 2001 13:28:33 -0500 (EST) (envelope-from green@FreeBSD.org) Message-Id: <200102071828.f17ISLr17637@green.dyndns.org> X-Mailer: exmh version 2.3.1 01/18/2001 with nmh-1.0.4 To: arch@FreeBSD.org Subject: xucred introduction From: "Brian F. Feldman" Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Wed, 07 Feb 2001 13:28:21 -0500 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG I'd like to commit this further clean-up of the kernel API in which struct ucred's use outside of the kernel is to be a last resort, and everything which would use ucred will use xucred. This mainly affects mount(2), and changes the size of those structures. However, xucred won't have to be changing size all the time, so this will be the last time mountd or i(den|ne)td would panic the kernel or return an error (respectively) for changes to ucred. Mike Smith would prefer it that for userland, ucred and xucred would be more something like the in-kernel kucred and external ucred, but I believe this will introduce absolutely nothing but headaches for code due to conditionalized structure definition upon _KERNEL being defined. 
Therefore, I've kept ucred as the in-kernel structure for kvm-using apps, xucred for everything else, with unfortunately the limitation that ucred.h must still be treated as a kernel header and dependencies noted accordingly by the programmer. I've verified this works on at least -CURRENT from the past week. I'd like to commit it soon to lessen any pain from more ucred changes, like rwatson's. The only question is whether or not to add some spare fields to xucred now in case we /do/ want to expand it in the future, and also whether it's appropriate to make some of the field type changes (for example, sockaddr length type -> u_char, since that _IS_ what is defined by the sockaddr interface). Discussion please :) Index: sbin/mountd/mountd.c =================================================================== RCS file: /usr2/ncvs/src/sbin/mountd/mountd.c,v retrieving revision 1.39 diff -u -r1.39 mountd.c --- sbin/mountd/mountd.c 1999/12/03 20:23:53 1.39 +++ sbin/mountd/mountd.c 2001/01/23 00:24:24 @@ -161,9 +161,9 @@ void del_mlist __P((char *, char *)); struct dirlist *dirp_search __P((struct dirlist *, char *)); int do_mount __P((struct exportlist *, struct grouplist *, int, - struct ucred *, char *, int, struct statfs *)); + struct xucred *, char *, int, struct statfs *)); int do_opt __P((char **, char **, struct exportlist *, struct grouplist *, - int *, int *, struct ucred *)); + int *, int *, struct xucred *)); struct exportlist *ex_search __P((fsid_t *)); struct exportlist *get_exp __P((void)); void free_dir __P((struct dirlist *)); @@ -184,7 +184,7 @@ void mntsrv __P((struct svc_req *, SVCXPRT *)); void nextfield __P((char **, char **)); void out_of_mem __P((void)); -void parsecred __P((char *, struct ucred *)); +void parsecred __P((char *, struct xucred *)); int put_exlist __P((struct dirlist *, XDR *, struct dirlist *, int *)); int scan_tree __P((struct dirlist *, u_int32_t)); static void usage __P((void)); @@ -202,8 +202,7 @@ struct mountlist *mlhead; struct 
grouplist *grphead; char exname[MAXPATHLEN]; -struct ucred def_anon = { - 1, +struct xucred def_anon = { (uid_t) -2, 1, { (gid_t) -2 } @@ -732,7 +731,7 @@ struct dirlist *dirhead; struct statfs fsb, *fsp; struct hostent *hpe; - struct ucred anon; + struct xucred anon; char *cp, *endcp, *dirp, *hst, *usr, *dom, savedc; int len, has_host, exflags, got_nondir, dirplen, num, i, netgrp; @@ -1332,7 +1331,7 @@ struct grouplist *grp; int *has_hostp; int *exflagsp; - struct ucred *cr; + struct xucred *cr; { char *cpoptarg, *cpoptend; char *cp, *endcp, *cpopt, savedc, savedc2; @@ -1591,7 +1590,7 @@ struct exportlist *ep; struct grouplist *grp; int exflags; - struct ucred *anoncrp; + struct xucred *anoncrp; char *dirp; int dirplen; struct statfs *fsb; @@ -1842,7 +1841,7 @@ void parsecred(namelist, cr) char *namelist; - struct ucred *cr; + struct xucred *cr; { char *name; int cnt; @@ -1854,7 +1853,6 @@ /* * Set up the unprivileged user. */ - cr->cr_ref = 1; cr->cr_uid = -2; cr->cr_groups[0] = -2; cr->cr_ngroups = 1; Index: sys/kern/vfs_subr.c =================================================================== RCS file: /usr2/ncvs/src/sys/kern/vfs_subr.c,v retrieving revision 1.301 diff -u -r1.301 vfs_subr.c --- sys/kern/vfs_subr.c 2001/01/31 04:54:23 1.301 +++ sys/kern/vfs_subr.c 2001/02/01 04:14:22 @@ -2319,7 +2319,11 @@ return (EPERM); np = &nep->ne_defexported; np->netc_exflags = argp->ex_flags; - np->netc_anon = argp->ex_anon; + bzero(&np->netc_anon, sizeof(np->netc_anon)); + np->netc_anon.cr_uid = argp->ex_anon.cr_uid; + np->netc_anon.cr_ngroups = argp->ex_anon.cr_ngroups; + bcopy(argp->ex_anon.cr_groups, np->netc_anon.cr_groups, + sizeof(np->netc_anon.cr_groups)); np->netc_anon.cr_ref = 1; mp->mnt_flag |= MNT_DEFEXPORTED; return (0); @@ -2363,7 +2367,11 @@ goto out; } np->netc_exflags = argp->ex_flags; - np->netc_anon = argp->ex_anon; + bzero(&np->netc_anon, sizeof(np->netc_anon)); + np->netc_anon.cr_uid = argp->ex_anon.cr_uid; + np->netc_anon.cr_ngroups = 
argp->ex_anon.cr_ngroups; + bcopy(argp->ex_anon.cr_groups, np->netc_anon.cr_groups, + sizeof(np->netc_anon.cr_groups)); np->netc_anon.cr_ref = 1; return (0); out: Index: sys/netinet/tcp_subr.c =================================================================== RCS file: /usr2/ncvs/src/sys/netinet/tcp_subr.c,v retrieving revision 1.86 diff -u -r1.86 tcp_subr.c --- sys/netinet/tcp_subr.c 2000/12/24 10:57:21 1.86 +++ sys/netinet/tcp_subr.c 2001/01/23 00:13:00 @@ -893,6 +893,7 @@ static int tcp_getcred(SYSCTL_HANDLER_ARGS) { + struct xucred xuc; struct sockaddr_in addrs[2]; struct inpcb *inp; int error, s; @@ -910,19 +911,25 @@ error = ENOENT; goto out; } - error = SYSCTL_OUT(req, inp->inp_socket->so_cred, sizeof(struct ucred)); + + xuc.cr_uid = inp->inp_socket->so_cred->cr_uid; + xuc.cr_ngroups = inp->inp_socket->so_cred->cr_ngroups; + bcopy(inp->inp_socket->so_cred->cr_groups, xuc.cr_groups, + sizeof(xuc.cr_groups)); + error = SYSCTL_OUT(req, &xuc, sizeof(struct xucred)); out: splx(s); return (error); } SYSCTL_PROC(_net_inet_tcp, OID_AUTO, getcred, CTLTYPE_OPAQUE|CTLFLAG_RW, - 0, 0, tcp_getcred, "S,ucred", "Get the ucred of a TCP connection"); + 0, 0, tcp_getcred, "S,xucred", "Get the xucred of a TCP connection"); #ifdef INET6 static int tcp6_getcred(SYSCTL_HANDLER_ARGS) { + struct xucred xuc; struct sockaddr_in6 addrs[2]; struct inpcb *inp; int error, s, mapped = 0; @@ -956,8 +963,12 @@ error = ENOENT; goto out; } - error = SYSCTL_OUT(req, inp->inp_socket->so_cred, - sizeof(struct ucred)); + + xuc.cr_uid = inp->inp_socket->so_cred->cr_uid; + xuc.cr_ngroups = inp->inp_socket->so_cred->cr_ngroups; + bcopy(inp->inp_socket->so_cred->cr_groups, xuc.cr_groups, + sizeof(xuc.cr_groups)); + error = SYSCTL_OUT(req, &xuc, sizeof(struct xucred)); out: splx(s); return (error); @@ -965,7 +976,7 @@ SYSCTL_PROC(_net_inet6_tcp6, OID_AUTO, getcred, CTLTYPE_OPAQUE|CTLFLAG_RW, 0, 0, - tcp6_getcred, "S,ucred", "Get the ucred of a TCP6 connection"); + tcp6_getcred, "S,xucred", "Get the 
xucred of a TCP6 connection"); #endif Index: sys/netinet/udp_usrreq.c =================================================================== RCS file: /usr2/ncvs/src/sys/netinet/udp_usrreq.c,v retrieving revision 1.80 diff -u -r1.80 udp_usrreq.c --- sys/netinet/udp_usrreq.c 2000/12/24 10:57:21 1.80 +++ sys/netinet/udp_usrreq.c 2001/01/23 00:13:50 @@ -606,6 +606,7 @@ static int udp_getcred(SYSCTL_HANDLER_ARGS) { + struct xucred xuc; struct sockaddr_in addrs[2]; struct inpcb *inp; int error, s; @@ -623,14 +624,19 @@ error = ENOENT; goto out; } - error = SYSCTL_OUT(req, inp->inp_socket->so_cred, sizeof(struct ucred)); + + xuc.cr_uid = inp->inp_socket->so_cred->cr_uid; + xuc.cr_ngroups = inp->inp_socket->so_cred->cr_ngroups; + bcopy(inp->inp_socket->so_cred->cr_groups, xuc.cr_groups, + sizeof(xuc.cr_groups)); + error = SYSCTL_OUT(req, &xuc, sizeof(struct xucred)); out: splx(s); return (error); } SYSCTL_PROC(_net_inet_udp, OID_AUTO, getcred, CTLTYPE_OPAQUE|CTLFLAG_RW, - 0, 0, udp_getcred, "S,ucred", "Get the ucred of a UDP connection"); + 0, 0, udp_getcred, "S,xucred", "Get the xucred of a UDP connection"); static int udp_output(inp, m, addr, control, p) Index: sys/netinet6/udp6_usrreq.c =================================================================== RCS file: /usr2/ncvs/src/sys/netinet6/udp6_usrreq.c,v retrieving revision 1.13 diff -u -r1.13 udp6_usrreq.c --- sys/netinet6/udp6_usrreq.c 2000/10/23 07:11:01 1.13 +++ sys/netinet6/udp6_usrreq.c 2001/01/23 00:15:16 @@ -474,6 +474,7 @@ static int udp6_getcred(SYSCTL_HANDLER_ARGS) { + struct xucred xuc; struct sockaddr_in6 addrs[2]; struct inpcb *inp; int error, s; @@ -484,7 +485,7 @@ if (req->newlen != sizeof(addrs)) return (EINVAL); - if (req->oldlen != sizeof(struct ucred)) + if (req->oldlen != sizeof(struct xucred)) return (EINVAL); error = SYSCTL_IN(req, addrs, sizeof(addrs)); if (error) @@ -498,9 +499,12 @@ error = ENOENT; goto out; } - error = SYSCTL_OUT(req, inp->inp_socket->so_cred, - sizeof(struct ucred)); + 
xuc.cr_uid = inp->inp_socket->so_cred->cr_uid; + xuc.cr_ngroups = inp->inp_socket->so_cred->cr_ngroups; + bcopy(inp->inp_socket->so_cred->cr_groups, xuc.cr_groups, + sizeof(xuc.cr_groups)); + error = SYSCTL_OUT(req, &xuc, sizeof(struct xucred)); out: splx(s); return (error); @@ -508,7 +512,7 @@ SYSCTL_PROC(_net_inet6_udp6, OID_AUTO, getcred, CTLTYPE_OPAQUE|CTLFLAG_RW, 0, 0, - udp6_getcred, "S,ucred", "Get the ucred of a UDP6 connection"); + udp6_getcred, "S,xucred", "Get the xucred of a UDP6 connection"); static int udp6_abort(struct socket *so) Index: sys/nfs/nfs.h =================================================================== RCS file: /usr2/ncvs/src/sys/nfs/nfs.h,v retrieving revision 1.56 diff -u -r1.56 nfs.h --- sys/nfs/nfs.h 2000/10/24 10:13:36 1.56 +++ sys/nfs/nfs.h 2001/01/23 00:28:27 @@ -197,7 +197,7 @@ struct nfsd *nsd_nfsd; /* Pointer to in kernel nfsd struct */ uid_t nsd_uid; /* Effective uid mapped to cred */ u_int32_t nsd_haddr; /* Ip address of client */ - struct ucred nsd_cr; /* Cred. uid maps to */ + struct xucred nsd_cr; /* Cred. 
uid maps to */ int nsd_authlen; /* Length of auth string (ret) */ u_char *nsd_authstr; /* Auth string (ret) */ int nsd_verflen; /* and the verfier */ Index: sys/nfs/nfs_syscalls.c =================================================================== RCS file: /usr2/ncvs/src/sys/nfs/nfs_syscalls.c,v retrieving revision 1.64 diff -u -r1.64 nfs_syscalls.c --- sys/nfs/nfs_syscalls.c 2000/12/21 21:44:24 1.64 +++ sys/nfs/nfs_syscalls.c 2001/01/23 00:48:56 @@ -244,7 +244,7 @@ slp->ns_numuids++; nuidp = (struct nfsuid *) malloc(sizeof (struct nfsuid), M_NFSUID, - M_WAITOK); + M_WAITOK | M_ZERO); } else nuidp = (struct nfsuid *)0; if ((slp->ns_flag & SLP_VALID) == 0) { @@ -260,7 +260,12 @@ FREE(nuidp->nu_nam, M_SONAME); } nuidp->nu_flag = 0; - nuidp->nu_cr = nsd->nsd_cr; + nuidp->nu_cr.cr_uid = nsd->nsd_cr.cr_uid; + nuidp->nu_cr.cr_ngroups = + nsd->nsd_cr.cr_ngroups; + bcopy(nsd->nsd_cr.cr_groups, + nuidp->nu_cr.cr_groups, + sizeof(nuidp->nu_cr.cr_groups)); if (nuidp->nu_cr.cr_ngroups > NGROUPS) nuidp->nu_cr.cr_ngroups = NGROUPS; nuidp->nu_cr.cr_ref = 1; Index: sys/sys/mount.h =================================================================== RCS file: /usr2/ncvs/src/sys/sys/mount.h,v retrieving revision 1.99 diff -u -r1.99 mount.h --- sys/sys/mount.h 2000/12/04 09:21:05 1.99 +++ sys/sys/mount.h 2001/01/23 00:32:10 @@ -245,11 +245,11 @@ struct export_args { int ex_flags; /* export related flags */ uid_t ex_root; /* mapping for root uid */ - struct ucred ex_anon; /* mapping for anonymous user */ + struct xucred ex_anon; /* mapping for anonymous user */ struct sockaddr *ex_addr; /* net address to which exported */ - int ex_addrlen; /* and the net address length */ + u_char ex_addrlen; /* and the net address length */ struct sockaddr *ex_mask; /* mask of valid bits in saddr */ - int ex_masklen; /* and the smask length */ + u_char ex_masklen; /* and the smask length */ char *ex_indexfile; /* index file for WebNFS URLs */ }; Index: sys/sys/ucred.h 
=================================================================== RCS file: /usr2/ncvs/src/sys/sys/ucred.h,v retrieving revision 1.19 diff -u -r1.19 ucred.h --- sys/sys/ucred.h 2000/11/30 19:09:47 1.19 +++ sys/sys/ucred.h 2001/01/28 22:53:01 @@ -53,9 +53,18 @@ struct uidinfo *cr_uidinfo; /* per uid resource consumption */ struct mtx cr_mtx; /* protect refcount */ }; -#define cr_gid cr_groups[0] #define NOCRED ((struct ucred *)0) /* no credential available */ #define FSCRED ((struct ucred *)-1) /* filesystem credential */ + +/* + * This is the external representation of struct ucred which "won't change". + */ +struct xucred { + uid_t cr_uid; /* effective user id */ + short cr_ngroups; /* number of groups */ + gid_t cr_groups[NGROUPS]; /* groups */ +}; +#define cr_gid cr_groups[0] #ifdef _KERNEL Index: usr.sbin/inetd/builtins.c =================================================================== RCS file: /usr2/ncvs/src/usr.sbin/inetd/builtins.c,v retrieving revision 1.29 diff -u -r1.29 builtins.c --- usr.sbin/inetd/builtins.c 2000/12/05 13:56:01 1.29 +++ usr.sbin/inetd/builtins.c 2001/01/22 23:54:26 @@ -338,7 +338,7 @@ struct sockaddr_in6 sin6[2]; #endif struct sockaddr_storage ss[2]; - struct ucred uc; + struct xucred uc; struct timeval tv = { 10, 0 -- Brian Fundakowski Feldman \ FreeBSD: The Power to Serve! / green@FreeBSD.org `------------------------------' To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 10:33:19 2001 Delivered-To: freebsd-arch@freebsd.org Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147]) by hub.freebsd.org (Postfix) with ESMTP id DBF3B37B491; Wed, 7 Feb 2001 10:32:59 -0800 (PST) Received: from critter (localhost [127.0.0.1]) by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id f17IX5H02713; Wed, 7 Feb 2001 19:33:05 +0100 (CET) (envelope-from phk@critter.freebsd.dk) To: "Brian F. 
Feldman" Cc: arch@FreeBSD.ORG Subject: Re: xucred introduction In-Reply-To: Your message of "Wed, 07 Feb 2001 13:28:21 EST." <200102071828.f17ISLr17637@green.dyndns.org> Date: Wed, 07 Feb 2001 19:33:05 +0100 Message-ID: <2711.981570785@critter> From: Poul-Henning Kamp Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

In message <200102071828.f17ISLr17637@green.dyndns.org>, "Brian F. Feldman" writes:

>The only question is whether or not to add some spare fields to
>xucred now in case we /do/ want to expand it in the future, and also whether
>it's appropriate to make some of the field type changes (for example,
>sockaddr length type -> u_char, since that _IS_ what is defined by the
>sockaddr interface).

Have you already put a version number in it? Otherwise please do so. That is the best way to ensure that we don't get too many problems in the future.

I think in general all structures shared between the kernel and userland should be equipped with a version number as the first element.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
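A minimal sketch of the "version number as the first element" idea, assuming a hypothetical variant of the xucred structure from the patch. The names struct xucred_v, XUCRED_VERSION, XU_NGROUPS, xucred_v_fill() and xucred_v_check() are illustrative assumptions, not part of the posted diff, and plain unsigned int stands in for uid_t and gid_t so the fragment builds outside the kernel tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define XU_NGROUPS      16  /* hypothetical stand-in for NGROUPS */
#define XUCRED_VERSION  0   /* hypothetical layout version */

/* Hypothetical versioned variant of the xucred in the patch. */
struct xucred_v {
    unsigned int cr_version;            /* layout version, always first */
    unsigned int cr_uid;                /* effective user id */
    short        cr_ngroups;            /* number of groups */
    unsigned int cr_groups[XU_NGROUPS]; /* groups */
};

/* Producer side: zero the structure, stamp the version, copy the fields. */
static void
xucred_v_fill(struct xucred_v *xu, unsigned int uid,
    const unsigned int *groups, short ngroups)
{
    assert(ngroups >= 0);
    if (ngroups > XU_NGROUPS)
        ngroups = XU_NGROUPS;
    memset(xu, 0, sizeof(*xu));
    xu->cr_version = XUCRED_VERSION;
    xu->cr_uid = uid;
    xu->cr_ngroups = ngroups;
    if (ngroups > 0)
        memcpy(xu->cr_groups, groups, (size_t)ngroups * sizeof(groups[0]));
}

/*
 * Consumer side: look at the leading version field before trusting the
 * rest of the buffer. Returns 0 if the layout is the one this code was
 * built against, -1 otherwise.
 */
static int
xucred_v_check(const void *buf, size_t len)
{
    unsigned int version;

    if (len < sizeof(version))
        return (-1);
    memcpy(&version, buf, sizeof(version));
    if (version != XUCRED_VERSION)
        return (-1);
    if (len != sizeof(struct xucred_v))
        return (-1);
    return (0);
}
```

Because the version sits at offset zero, a consumer can read it without knowing anything else about the layout, which is what lets an old binary reject a newer structure instead of misparsing it.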
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 10:37:31 2001 Delivered-To: freebsd-arch@freebsd.org Received: from syncopation-03.iinet.net.au (syncopation-03.iinet.net.au [203.59.24.49]) by hub.freebsd.org (Postfix) with SMTP id 3E4F537B65D for ; Wed, 7 Feb 2001 10:37:13 -0800 (PST) Received: (qmail 20518 invoked by uid 666); 7 Feb 2001 18:44:43 -0000 Received: from reggae-22-100.nv.iinet.net.au (HELO elischer.org) (203.59.87.100) by mail.m.iinet.net.au with SMTP; 7 Feb 2001 18:44:43 -0000 Message-ID: <3A8195D4.8CFECC9@elischer.org> Date: Wed, 07 Feb 2001 10:37:08 -0800 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: "Brian F. Feldman" Cc: arch@FreeBSD.org Subject: Re: xucred introduction References: <200102071828.f17ISLr17637@green.dyndns.org> Content-Type: text/plain; charset=iso-8859-15 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG "Brian F. Feldman" wrote: > > I'd like to commit this further clean-up of the kernel API in which struct [...] technically it seems ok.. it's a political decision as to whether it should be done... 
-- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000-2001 ---> X_.---._/ v To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 10:44:48 2001 Delivered-To: freebsd-arch@freebsd.org Received: from syncopation-03.iinet.net.au (syncopation-03.iinet.net.au [203.59.24.49]) by hub.freebsd.org (Postfix) with SMTP id 0549C37B491 for ; Wed, 7 Feb 2001 10:44:30 -0800 (PST) Received: (qmail 20921 invoked by uid 666); 7 Feb 2001 18:52:00 -0000 Received: from reggae-22-100.nv.iinet.net.au (HELO elischer.org) (203.59.87.100) by mail.m.iinet.net.au with SMTP; 7 Feb 2001 18:52:00 -0000 Message-ID: <3A819788.DA35F78C@elischer.org> Date: Wed, 07 Feb 2001 10:44:24 -0800 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: Poul-Henning Kamp Cc: "Brian F. Feldman" , arch@FreeBSD.ORG Subject: Re: xucred introduction References: <2711.981570785@critter> Content-Type: text/plain; charset=iso-8859-15 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Poul-Henning Kamp wrote: > > In message <200102071828.f17ISLr17637@green.dyndns.org>, "Brian F. Feldman" wri > tes: > > >The only question is whether or not to add some spare fields to > >xucred now in case we /do/ want to expand it in the future, and also whether > >it's appropriate to make some of the field type changes (for example, > >sockaddr length type -> u_char, since that _IS_ what is defined by the > >sockaddr interface). > > Have you already put a version number in it ? Otherwise please > do so. That is the best way to ensure that we don't get too many > problems in the future. > > I think in general all structures shared between the kernel and userland > should be equipped with a version number as the first element. 
this brings up whether we should have 'rules' for kernel structures in general.. for example "Always start with a version number followed by a magic number followed by the reference count and the lock" or something like that. I know some systems DO impose such rules and seem to get advantages from it. (you can add debug code to check the magic numbers really easily for example). Not a REALLY serious suggestion but something to consider. what would YOU like to see as a standard part of kernel structures? reference count? magic number? generation count? lock (pointer?) version number? It leads to a general discussion about kernel architecture eventually :-) > > -- > Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 > phk@FreeBSD.ORG | TCP/IP since RFC 956 > FreeBSD committer | BSD since 4.3-tahoe > Never attribute to malice what can adequately be explained by incompetence. > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-arch" in the body of the message -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000-2001 ---> X_.---._/ v To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 10:50:18 2001 Delivered-To: freebsd-arch@freebsd.org Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147]) by hub.freebsd.org (Postfix) with ESMTP id E716F37B401; Wed, 7 Feb 2001 10:50:00 -0800 (PST) Received: from critter (localhost [127.0.0.1]) by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id f17Io4H02865; Wed, 7 Feb 2001 19:50:04 +0100 (CET) (envelope-from phk@critter.freebsd.dk) To: Julian Elischer Cc: "Brian F. Feldman" , arch@FreeBSD.ORG Subject: Re: xucred introduction In-Reply-To: Your message of "Wed, 07 Feb 2001 10:44:24 PST." 
<3A819788.DA35F78C@elischer.org> Date: Wed, 07 Feb 2001 19:50:04 +0100 Message-ID: <2863.981571804@critter> From: Poul-Henning Kamp Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG >Not a REALLY serious suggestion but something to consider. >what would YOU like to see as a standard part of kernel structures? All I want is a layout version number as the first element in the structure if it is in any way sanctioned for use from userland. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 11:24:38 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 4EC4837B503; Wed, 7 Feb 2001 11:24:22 -0800 (PST) Received: (from dillon@localhost) by earth.backplane.com (8.11.1/8.9.3) id f17JNlX91394; Wed, 7 Feb 2001 11:23:47 -0800 (PST) (envelope-from dillon) Date: Wed, 7 Feb 2001 11:23:47 -0800 (PST) From: Matt Dillon Message-Id: <200102071923.f17JNlX91394@earth.backplane.com> To: Julian Elischer Cc: Poul-Henning Kamp , "Brian F. Feldman" , arch@FreeBSD.ORG Subject: Re: xucred introduction References: <2711.981570785@critter> <3A819788.DA35F78C@elischer.org> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG :this brings up whether we should have 'rules' for kernel structures in general.. I'd have to say no. It's too easy for this sort of thing to get completely out of control. I agree with Poul re: having a version number at the head of any structure exported to userland. Maybe a size as well (or some out of band way to get the structure size). 
-Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 11:59:53 2001 Delivered-To: freebsd-arch@freebsd.org Received: from flood.ping.uio.no (flood.ping.uio.no [129.240.78.31]) by hub.freebsd.org (Postfix) with ESMTP id 1A31237B503; Wed, 7 Feb 2001 11:59:35 -0800 (PST) Received: (from des@localhost) by flood.ping.uio.no (8.9.3/8.9.3) id UAA63934; Wed, 7 Feb 2001 20:57:59 +0100 (CET) (envelope-from des@ofug.org) X-URL: http://www.ofug.org/~des/ X-Disclaimer: The views expressed in this message do not necessarily coincide with those of any organisation or company with which I am or have been affiliated. To: Poul-Henning Kamp Cc: Julian Elischer , "Brian F. Feldman" , arch@FreeBSD.ORG Subject: Re: xucred introduction References: <2863.981571804@critter> From: Dag-Erling Smorgrav Date: 07 Feb 2001 20:57:59 +0100 In-Reply-To: Poul-Henning Kamp's message of "Wed, 07 Feb 2001 19:50:04 +0100" Message-ID: Lines: 15 User-Agent: Gnus/5.0802 (Gnus v5.8.2) Emacs/20.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Poul-Henning Kamp writes: > All I want is a layout version number as the first element in > the structure if it is in any way sanctioned for use from userland. Some structures (specifically, those that are to be stored in zones) *must* start with two pointers to their own type. This is arguably a design flaw in the zone allocator. One possible fix is to add an extra argument to zinit(), zinitna() and zbootinit() to specify the offset of these pointers within the structure; another is to have the zone allocator prepend those pointers itself, so they don't need to be in the structure at all. 
DES -- Dag-Erling Smorgrav - des@ofug.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 12:36:21 2001 Delivered-To: freebsd-arch@freebsd.org Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [18.24.4.193]) by hub.freebsd.org (Postfix) with ESMTP id 2D7AF37B4EC for ; Wed, 7 Feb 2001 12:36:03 -0800 (PST) Received: (from wollman@localhost) by khavrinen.lcs.mit.edu (8.9.3/8.9.3) id PAA46466; Wed, 7 Feb 2001 15:35:57 -0500 (EST) (envelope-from wollman) Date: Wed, 7 Feb 2001 15:35:57 -0500 (EST) From: Garrett Wollman Message-Id: <200102072035.PAA46466@khavrinen.lcs.mit.edu> To: des@ofug.org Cc: arch@freebsd.org Subject: Re: xucred introduction X-Newsgroups: mit.lcs.mail.freebsd-arch In-Reply-To: References: Organization: MIT Laboratory for Computer Science Cc: Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In article you write: >Some structures (specifically, those that are to be stored in zones) >*must* start with two pointers to their own type. No, they don't. See, e.g., struct inpcb. The restriction that you get from the zone allocator is that the beginning of the zone is overlaid with two such pointers *while the object is free*, so you cannot depend on type-stability for values which would be stored there. In the TCP stack, the only thing we really care about being type-stable is the generation count, which was intentionally placed at the end of the structure. -GAWollman -- Garrett A. 
Wollman | O Siem / We are all family / O Siem / We're all the same wollman@lcs.mit.edu | O Siem / The fires of freedom Opinions not those of| Dance in the burning flame MIT, LCS, CRS, or NSA| - Susan Aglukark and Chad Irschick To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 13:26:42 2001 Delivered-To: freebsd-arch@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id 2A49137B65D; Wed, 7 Feb 2001 13:26:18 -0800 (PST) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id OAA27535; Wed, 7 Feb 2001 14:23:20 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp03.primenet.com, id smtpdAAA7zaWQ1; Wed Feb 7 14:23:10 2001 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id OAA24284; Wed, 7 Feb 2001 14:26:00 -0700 (MST) From: Terry Lambert Message-Id: <200102072126.OAA24284@usr08.primenet.com> Subject: Re: vnode interlock API To: bp@butya.kz (Boris Popov) Date: Wed, 7 Feb 2001 21:26:00 +0000 (GMT) Cc: freebsd-arch@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG In-Reply-To: from "Boris Popov" at Feb 06, 2001 05:00:03 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > So, I suggest to introduce two macro definitions which will hide > implementation details for interlocks: > > #define VI_LOCK(vp) mtx_enter(&(vp)->v_interlock, MTX_DEF) > #define VI_UNLOCK(vp) mtx_exit(&(vp)->v_interlock, MTX_DEF) > > for RELENG_4 they will look like this: > > #define VI_LOCK(vp) simple_lock(&(vp)->v_interlock) > #define VI_UNLOCK(vp) simple_unlock(&(vp)->v_interlock) > > Any comments, suggestions ? 1) Macros are good; interfaces are better. 
I've consistently recommended that the NFS cookie interface be rewritten to not require cookies, even though the FreeBSD/NetBSD/OpenBSD differences _could_ be masked with macros. The issue is one of binary vs. source compatibility.

2) If you are going to wrap vnode handling, it would probably be a good idea to wrap it using the same approach that another OS uses, instead of being gratuitously different in naming. I would suggest using the Solaris names, but I will admit that doing that depends heavily on the semantics being the same (I think they would be). Worst case, pick an OS with the same semantics; if there are none, this may be an opportunity to learn from other OSs _why_ they don't have the same semantics.

3) It seems to me that the additional parameter of MTX_DEF is gratuitous, and tries to stretch mutex semantics further than they should be stretched. I personally would have no problem with the conversion of simple_{un}lock() into the equivalent mtx_*() calls. Even if the MTX_DEF cannot be murdered without a large public outcry, using this as the default semantic for the simple_*() equivalents isn't really a bad idea, in my book, and could be done with inline wrappers. Best case, one could apply the WITNESS code to debugging 4.x problems, with some work.

4) You need to wrap the calls with "{ ... }"; this is because it may be useful in the future to institute turnstile or single wakeup semantics, and writing the macro as a single statement now, rather than as a statement block, would mean a potentially large amount of work to cope with that change later, whereas you already seem to plan to touch all those spots now. Again, the Solaris SMP vnode lock management macros are, I think, a good example (or at least they were, six years ago, when Solaris faced the same problem).
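Point 4 can be sketched as follows, with one substitution stated plainly: instead of bare "{ ... }", the sketch uses the common C idiom do { ... } while (0), which gives the same statement block but still parses correctly when the macro is followed by a semicolon in front of an else. The types struct fake_mtx and struct fake_vnode and the helpers fake_lock() and fake_unlock() are hypothetical stand-ins so the fragment builds in userland; they are not the kernel's mutex or vnode API.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel mutex; not the real struct mtx. */
struct fake_mtx {
    int locked;
};

/* Hypothetical stand-in for struct vnode, reduced to its interlock. */
struct fake_vnode {
    struct fake_mtx v_interlock;
};

static void
fake_lock(struct fake_mtx *m)
{
    assert(!m->locked);     /* no recursion in this toy lock */
    m->locked = 1;
}

static void
fake_unlock(struct fake_mtx *m)
{
    assert(m->locked);
    m->locked = 0;
}

/*
 * Statement-block form: more statements (wakeup handling, debugging
 * hooks) can be added inside the block later without touching callers.
 */
#define VI_LOCK(vp)     do { fake_lock(&(vp)->v_interlock); } while (0)
#define VI_UNLOCK(vp)   do { fake_unlock(&(vp)->v_interlock); } while (0)

/* Returns 1 if the interlock was taken and released, 0 if skipped. */
static int
touch_vnode(struct fake_vnode *vp, int want)
{
    if (want)
        VI_LOCK(vp);        /* a bare "{ ... };" here would break the else */
    else
        return (0);
    VI_UNLOCK(vp);
    return (1);
}
```

The if/else in touch_vnode() is the case that motivates the idiom: with a plain brace block, the semicolon after the macro call would terminate the if statement and leave the else dangling.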
I have other comments, but these are the four most important ones, IMO, and I've been making a conscious effort to not clutter arguments by giving more detail than people seem to want to hear before they overflow and tune out. 8-). Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 13:38:28 2001 Delivered-To: freebsd-arch@freebsd.org Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132]) by hub.freebsd.org (Postfix) with ESMTP id AEE6237B6A6; Wed, 7 Feb 2001 13:38:09 -0800 (PST) Received: (from daemon@localhost) by smtp02.primenet.com (8.9.3/8.9.3) id OAA05271; Wed, 7 Feb 2001 14:32:13 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp02.primenet.com, id smtpdAAAOCaGMh; Wed Feb 7 14:29:51 2001 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id OAA24445; Wed, 7 Feb 2001 14:35:45 -0700 (MST) From: Terry Lambert Message-Id: <200102072135.OAA24445@usr08.primenet.com> Subject: Re: CFR: Sequential mbuf read/write extensions To: bp@butya.kz (Boris Popov) Date: Wed, 7 Feb 2001 21:35:44 +0000 (GMT) Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG In-Reply-To: from "Boris Popov" at Feb 06, 2001 05:50:52 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > Before starting import process for smbfs, I would like to > introduce new API which greatly simplifies process of packaging data into > mbufs and fetching it back (in fact, similar API already presented in the > tree, but it is private to the netncp code and it will be really nice to > share it). [ ... 
] Please include the ability to determine the length of the current contents (as a macro?) so that buffers can be padded, as necessary, since some hardware and some protocols require this. Also consider protecting the structure with a mutex, at least in kernel space (this would make the macro harder to write, which is why I put it into a parenthetical, question-marked statement). Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 14:33:19 2001 Delivered-To: freebsd-arch@freebsd.org Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (Postfix) with ESMTP id 2F26737B6BA; Wed, 7 Feb 2001 14:33:02 -0800 (PST) Received: (from daemon@localhost) by smtp04.primenet.com (8.9.3/8.9.3) id PAA01063; Wed, 7 Feb 2001 15:27:40 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp04.primenet.com, id smtpdAAAiway1b; Wed Feb 7 15:27:24 2001 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id PAA26462; Wed, 7 Feb 2001 15:32:28 -0700 (MST) From: Terry Lambert Message-Id: <200102072232.PAA26462@usr08.primenet.com> Subject: Re: xucred introduction To: dillon@earth.backplane.com (Matt Dillon) Date: Wed, 7 Feb 2001 22:32:28 +0000 (GMT) Cc: julian@elischer.org (Julian Elischer), phk@critter.freebsd.dk (Poul-Henning Kamp), green@FreeBSD.ORG (Brian F. Feldman), arch@FreeBSD.ORG In-Reply-To: <200102071923.f17JNlX91394@earth.backplane.com> from "Matt Dillon" at Feb 07, 2001 11:23:47 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > :this brings up whether we should have 'rules' for kernel structures in general.. > > I'd have to say no. 
It's too easy for this sort of thing to get > completely out of control. Pretty soon you end up with things like "The VAX Calling Standard", which leads to nasty things like clustering, transparent process migration, automatic load balancing, software fault tolerance, automatic failover, and all those things we'd rather not think matter to anyone unless they are running a server OS... PS: My vote is to put the mutex first, not export it to user space, and then put the version number. I'd keep the version number even if it weren't a user-space/kernel-space interface, since you never know when it will be useful to deal with passing a structure between a new kernel and an old driver/module, or vice versa... PPS: User-to-kernel writes that change contents should hold the mutex in the kernel, in the API. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 15: 4: 7 2001 Delivered-To: freebsd-arch@freebsd.org Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222]) by hub.freebsd.org (Postfix) with ESMTP id A8A4237B6C4; Wed, 7 Feb 2001 15:03:46 -0800 (PST) Received: from foo.osd.bsdi.com (root@foo.osd.bsdi.com [204.216.28.137]) by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id f17N3cx18927; Wed, 7 Feb 2001 15:03:38 -0800 (PST) (envelope-from jhb@foo.osd.bsdi.com) Received: (from jhb@localhost) by foo.osd.bsdi.com (8.11.1/8.11.1) id f17N3FU14366; Wed, 7 Feb 2001 15:03:15 -0800 (PST) (envelope-from jhb) Message-ID: X-Mailer: XFMail 1.4.0 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <2711.981570785@critter> Date: Wed, 07 Feb 2001 15:03:15 -0800 (PST) Organization: BSD, Inc. 
From: John Baldwin To: Poul-Henning Kamp Subject: Re: xucred introduction Cc: arch@FreeBSD.ORG, "Brian F. Feldman" Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On 07-Feb-01 Poul-Henning Kamp wrote: > In message <200102071828.f17ISLr17637@green.dyndns.org>, "Brian F. Feldman" > wri > tes: > >>The only question is whether or not to add some spare fields to >>xucred now in case we /do/ want to expand it in the future, and also whether >>it's appropriate to make some of the field type changes (for example, >>sockaddr length type -> u_char, since that _IS_ what is defined by the >>sockaddr interface). > > Have you already put a version number in it ? Otherwise please > do so. That is the best way to ensure that we don't get too many > problems in the future. > > I think in general all structures shared between the kernel and userland > should be equipped with a version number as the first element. As a sidebar, for anyone looking for something to do: kinfo_proc needs a version number as well. If you change the size of something in the middle of the structure, things like ps(1) and top(1) won't notice a problem but will just misparse the structure. :( -- John Baldwin -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" 
- http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 15:23:28 2001 Delivered-To: freebsd-arch@freebsd.org Received: from VL-MS-MR002.sc1.videotron.ca (relais.videotron.ca [24.201.245.36]) by hub.freebsd.org (Postfix) with ESMTP id 71B2A37B6C3; Wed, 7 Feb 2001 15:23:04 -0800 (PST) Received: from jehovah ([24.201.144.31]) by VL-MS-MR002.sc1.videotron.ca (Netscape Messaging Server 4.15) with SMTP id G8EUAA03.L5I; Wed, 7 Feb 2001 18:22:58 -0500 Message-ID: <002e01c0915d$326a7ec0$1f90c918@jehovah> From: "Bosko Milekic" To: "Terry Lambert" , "Boris Popov" Cc: , References: <200102072126.OAA24284@usr08.primenet.com> Subject: Re: vnode interlock API Date: Wed, 7 Feb 2001 18:25:02 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2919.6700 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Terry Lambert wrote: [...] > 3) It seems to mee that the additional parameter of MTX_DEF is > gratuitous, and tries to stretch mutex semantics further > than they should be stretched. I personally would have no > problem with the conversion of simple_{un}lock() into the > equivalent mtx_*() calls. Even if the MTX_DEF can not be > murdered without a large public outcry, using this as the Actually, it has been murdered: http://people.freebsd.org/~bmilekic/code/mutex_cleanup-7.1.diff Presently under testing. > the default demantic for the simple_*() equivalents isn't > really a bad idea, in my book, and could be done with > inline wrappers. Best case, one could apply the WITNESS > code to debugging 4.x problems, with some work. [...] 
> > Terry Lambert > terry@lambert.org > --- > Any opinions in this posting are my own and not those of my present > or previous employers. Regards, Bosko. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 16:35:56 2001 Delivered-To: freebsd-arch@freebsd.org Received: from midas.ifour.com.br (unknown [200.238.229.70]) by hub.freebsd.org (Postfix) with SMTP id E76F637B4EC for ; Wed, 7 Feb 2001 16:35:36 -0800 (PST) Received: (qmail 20944 invoked from network); 7 Feb 2001 21:30:53 -0000 Received: from unknown (HELO ifour.com.br) (192.168.1.11) by 192.168.1.10 with SMTP; 7 Feb 2001 21:30:53 -0000 Message-ID: <3A81CD00.9C5461FB@ifour.com.br> Date: Wed, 07 Feb 2001 22:32:32 +0000 From: Gustavo Vieira Goncalves Coelho Rios X-Mailer: Mozilla 4.76 [en] (X11; U; FreeBSD 4.2-STABLE i386) X-Accept-Language: en MIME-Version: 1.0 To: freebsd-arch@freebsd.org Subject: own boot floppies set Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG May some one give me some help where i can find documentation on building my own boot floppy disk for freebsd ? Thanks in advance! To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 7 17:47:35 2001 Delivered-To: freebsd-arch@freebsd.org Received: from molly.straylight.com (molly.straylight.com [209.68.199.242]) by hub.freebsd.org (Postfix) with ESMTP id D0B2A37B491; Wed, 7 Feb 2001 17:47:15 -0800 (PST) Received: from dickie (case.straylight.com [209.68.199.244]) by molly.straylight.com (8.11.0/8.10.0) with SMTP id f181l5X04054; Wed, 7 Feb 2001 17:47:09 -0800 From: "Jonathan Graehl" To: Cc: "Jonathan Lemon" Subject: empirical results of waiting for nonblocking connect with kqueue/EVFILT_WRITE (EV_EOF is not set for timed out connections, bug?) 
Date: Wed, 7 Feb 2001 17:47:54 -0800 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200 Importance: Normal Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

cases for the kevent returned from EVFILT_WRITE for a socket whose connect returned error EWOULDBLOCK:

connection failed, refused: flags=0x8001 (= EV_EOF | EV_ADD); data=0x4000

connection failed, timed out (+ any icmp response, host unreachable, host admin prohibited, etc): flags=0x1 (= EV_ADD); data=0x4000

connection successful: flags=0x1 (= EV_ADD); data=0x43e0 (= socket buffer bytes available to write)

if you want to see the particular error code (host or net unreachable, or just plain timed out) for a timed out connection, you can use getsockopt(SO_ERROR...). also getpeername can determine if the socket is connected (is there a more direct socket call to do so?)

question: clearly, the event for a pending connection has these reproducible (aside from changing the socket send buffer size), undocumented values in the flags/data fields. what can be counted on (and documented) in the future?

for now, i would use the test e.data != 0x4000, and make sure i don't set my socket send buffer small enough for any confusion to arise.
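A small sketch of the flag arithmetic behind the cases above, assuming only what the report states. KQ_EV_ADD and KQ_EV_EOF are hypothetical names that mirror the values of EV_ADD and EV_EOF in FreeBSD's <sys/event.h>; they are redefined here so the fragment builds on systems without kqueue. classify_connect_event() and its enum are illustrative and not part of any kernel or libc interface.

```c
#include <assert.h>

/* Hypothetical local copies of the EV_ADD / EV_EOF bit values. */
#define KQ_EV_ADD   0x0001u
#define KQ_EV_EOF   0x8000u

enum connect_hint {
    CONNECT_FAILED,         /* EV_EOF set: the refused case above */
    CONNECT_CHECK_SO_ERROR  /* EV_EOF clear: success or timeout */
};

/*
 * Interpret the flags of an EVFILT_WRITE event for a pending connect.
 * Per the observations above, a timed-out connection does not carry
 * EV_EOF, so the absence of EV_EOF does not prove success; the caller
 * still has to ask getsockopt(SO_ERROR) in that case.
 */
static enum connect_hint
classify_connect_event(unsigned int flags)
{
    /* The two flags share no bits, so they combine with OR, not AND. */
    assert((KQ_EV_EOF & KQ_EV_ADD) == 0);

    if (flags & KQ_EV_EOF)
        return (CONNECT_FAILED);
    return (CONNECT_CHECK_SO_ERROR);
}
```

This only restates the reported behaviour in code; it does not settle which of those values can be relied on in the future.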
i would think that EV_EOF should be set for timed out connections as well as refused ones, and this should be the documented criterion.

my suggestion would be to create a flag EV_SOERR, and change filt_soread and filt_sowrite (in sys/kern/uipc_socket.c) from:

    if (so->so_error)       /* temporary udp error */
        return (1);

to:

    if (so->so_error) {
        kn->kn_flags |= EV_SOERR;
        kn->kn_data = so->so_error;
        return (1);
    }

or, to maintain compatibility (if it is necessary to return with no indication for udp errors?), to:

    if (so->so_error) {
        if ((so->so_proto->pr_flags & PR_CONNREQUIRED))
            kn->kn_flags |= EV_EOF;
        return (1);
    }

(EV_EOF and/or EV_SOERR would be fine in either case, as long as there is some indication, although it would be nice to not have to getsockopt(SO_ERROR,...))

larger context:

    static int
    filt_sowrite(struct knote *kn, long hint)
    {
        struct socket *so = (struct socket *)kn->kn_fp->f_data;

        kn->kn_data = sbspace(&so->so_snd);
        if (so->so_state & SS_CANTSENDMORE) {
            kn->kn_flags |= EV_EOF;
            return (1);
        }
        if (so->so_error)       /* temporary udp error */
            return (1);
        if (((so->so_state & SS_ISCONNECTED) == 0) &&
            (so->so_proto->pr_flags & PR_CONNREQUIRED))
            return (0);
        return (kn->kn_data >= so->so_snd.sb_lowat);
    }

disclaimer: i only vaguely understand what's going on ;)

--
Jonathan Graehl
email: jonathan@graehl.org
web: http://jonathan.graehl.org/
phone: 858-642-7562

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Thu Feb 8 4:34:18 2001 Delivered-To: freebsd-arch@freebsd.org Received: from salmon.maths.tcd.ie (salmon.maths.tcd.ie [134.226.81.11]) by hub.freebsd.org (Postfix) with SMTP id DAA3737B65D; Thu, 8 Feb 2001 04:33:51 -0800 (PST) Received: from walton.maths.tcd.ie by salmon.maths.tcd.ie with SMTP id ; 8 Feb 2001 12:33:50 +0000 (GMT) To: Boris Popov Cc: freebsd-arch@freebsd.org, freebsd-net@freebsd.org, iedowse@maths.tcd.ie Subject: Re: CFR: Sequential mbuf read/write extensions In-Reply-To: 
Your message of "Tue, 06 Feb 2001 17:50:52 +0600." Date: Thu, 08 Feb 2001 12:33:50 +0000 From: Ian Dowse Message-ID: <200102081233.aa16167@salmon.maths.tcd.ie> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

In message , Boris Popov writes:
> Before starting import process for smbfs, I would like to
>introduce new API which greatly simplifies process of packaging data into
>mbufs and fetching it back (in fact, similar API already presented in the
>tree, but it is private to the netncp code and it will be really nice to
>share it).

Hi Boris,

These mbuf chain manipulation primitives look great! I was playing around with some similar code myself a while ago, so I'll just mention a few of the general issues that may be worth thinking about. I don't have any strong opinions about what approaches are best, so please don't take anything I say too seriously unless you agree with it :-)

It may be beneficial to use separate structs for the build and breakdown operations. The two cases have slightly different requirements: the mb_count field is only useful when building, and mb_pos is only strictly necessary when breaking down mbuf chains. The main advantage of using separate structs is better compiler type checking, especially in the arguments to functions that need to break down one chain and build another.

The i386 architecture is not fussy about alignment of multi-byte types in memory operations. However other architectures are not so forgiving. Some NIC drivers have to do magic to ensure that IP packets are 4-byte aligned, but this will not help if you are using a protocol that does not guarantee 4-byte alignment of 32- or 64-bit quantities within the IP packet. Doing a

	mb_get_dword(...);
	mb_get_byte(...);
	mb_get_dword(...);

will cause an alignment exception on the alpha, for example.

Someone suggested using numeric names to indicate the size of the types rather than 'byte', 'word' etc.
I'd agree with this too; the text names are not intuitive unless you have used dos/windows for too long :-) Maybe use names such as mb_get_uint32, so that it is obvious what C type should be passed as an argument. I wonder if 'mbdata' is the best name for the struct? I think I had used something like 'mchain', but if separate build/breakdown structs are used, maybe mbuild/mbreakdown or mbchain/mdchain? (the NFS code uses the words 'build' and 'dissect' to refer to the two operations). The main idea would just be to try and have the name indicate what information is held by the struct. Another useful 'put' function would be something that adds a number of bytes of 'empty' space to the end of the chain, and sets up a uio/iovec pointing to this space. e.g to read from a file to an mbuf chain you could use: error = mb_put_muio(mbp, &uio, size, &iovp); ... error = VOP_READ(vp, &uio, flag, cred); ... FREE(iovp, M_TEMP); For cases where there is a small (< MLEN) but relatively complex data structure to be extracted from a chain, it may be useful to have a function which just rearranges the mbufs to ensure that a number of bytes become contiguous. It can make an in-mbuf pointer to that space available. In most cases this will avoid having to copy the data. I wonder if these routines are the correct place for the endian conversions? It certainly simplifies the code that must build and parse requests, but requires duplication of each mb_get/mb_put operation. I understand that there isn't currently code in the tree for dealing with odd protocols that use little-endian format for data transmitted on the network (smb is one of these?). Sometimes it is useful to have idempotent init() and free() functions. For example, consider a function which builds a request and sends it, but which must handle errors both before and after the mbuf chain is sent off to the protocol. 
If mb_init simply NULL'd out the mb_top pointer, then the code could look like this:

	mb_init(&mb);
	if (mb_add_xxx(...) != 0)
		goto out;
	...->pru_sosend(..., mb.mb_top, ...);
	mb_init(&mb);
	...
	if (error)
		goto out;
out:
	mb_free(&mb);
	return (error);

The pru_sosend() function takes over ownership of the mbuf chain, so there is a need to just blank out the mbdata structure without freeing the chain, and without performing any allocations. An init function which cannot fail also simplifies the code. See callout_init() in kern_timeout.c for similar code.

The mb_put_pstring function maybe belongs in the protocol-specific code rather than here, since there are just too many different ways of encoding strings. Different protocols are likely to encode strings in different ways, with respect to length field type and padding/alignment.

Some of these mb_ functions return EBADRPC when not enough bytes of data are found in the mbuf chain. It might be better to choose a more generic return code, since these routines are not specific to RPC.
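The idempotent init()/free() idea from this message can be tried out in userland. The following is only a sketch under stated assumptions: struct chain_builder and the cb_* functions are hypothetical stand-ins for mbdata and mb_init()/mb_free(), and malloc() stands in for mbuf allocation. cb_detach() models handing the chain to a consumer such as pru_sosend() that takes ownership.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for an mbuf: just a singly linked node. */
struct chain_node {
	struct chain_node *next;
};

/* Stand-in for the mbdata struct; top == NULL means "owns nothing". */
struct chain_builder {
	struct chain_node *top;
};

/* Cannot fail and performs no allocation, so it is safe to call again. */
void
cb_init(struct chain_builder *cb)
{
	cb->top = NULL;
}

/* Prepend one node; returns 0 on success, -1 if allocation fails. */
int
cb_add(struct chain_builder *cb)
{
	struct chain_node *n = malloc(sizeof(*n));

	if (n == NULL)
		return (-1);
	n->next = cb->top;
	cb->top = n;
	return (0);
}

/* Free whatever is still owned; a second call is a harmless no-op. */
void
cb_free(struct chain_builder *cb)
{
	struct chain_node *n;

	while ((n = cb->top) != NULL) {
		cb->top = n->next;
		free(n);
	}
}

/* Give the chain away to a new owner and leave the builder empty. */
struct chain_node *
cb_detach(struct chain_builder *cb)
{
	struct chain_node *top = cb->top;

	cb_init(cb);
	return (top);
}
```

With this shape, a single "out:" label can call cb_free() unconditionally, whether the failure happened before or after the chain was handed off.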
Ian To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 8 11:14:42 2001 Delivered-To: freebsd-arch@freebsd.org Received: from VL-MS-MR002.sc1.videotron.ca (relais.videotron.ca [24.201.245.36]) by hub.freebsd.org (Postfix) with ESMTP id 79DF937B684; Thu, 8 Feb 2001 11:14:16 -0800 (PST) Received: from jehovah ([24.201.144.31]) by VL-MS-MR002.sc1.videotron.ca (Netscape Messaging Server 4.15) with SMTP id G8GDFD04.BFD; Thu, 8 Feb 2001 14:14:01 -0500 Message-ID: <013201c09203$971be9c0$1f90c918@jehovah> From: "Bosko Milekic" To: "Boris Popov" Cc: , References: Subject: Re: Sequential mbuf read/write extensions Date: Thu, 8 Feb 2001 14:16:07 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2919.6700 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Boris Popov wrote: [...] > Not exactly so. 'option LIBMBUF' will just connect the source file > to kernel makefile. There is no need for any #ifdef's in the code. Right. But I assume LIBMBUF will absolutely be needed if code that uses the routines is compiled. What I just meant to say was: "when the code using these routines grows to be significant enough, then we can just remove the option." > > #define M_TRYWAIT M_WAIT is not right. > > (M_WAIT is no longer to be used in the mbuf code.) > > You omitted the surrounding "#ifndef M_TRYWAIT" which makes this > code portable to RELENG_4 (mind you, this code taken from smbfs). Of > course, this should be stripped before import. I did, you're right. I guess I saw the "ifndef" wrong... I read this with only -CURRENT in mind and was afraid that the mbuf code flags would start mixing in with the malloc code flags -- something I tried to fight off in the past while. 
> > The successful return values are 0, I don't have a problem with this,
> > specifically, but I would assume that this:
> > if (!mb_init(mbp)) ... would be more "logical" (I use the term
> > loosely) if it meant: "if initialization fails" (now it means "if
> > initialization is successful").
>
> I'm generally don't like such syntax if function or variable name
> do not clearly specify which value it should have/return on success.
> Nearly all functions in this file return zero or error code, so the
> correct syntax of the above will be:
>
>	error = mb_init(mbp);
>	if (!error)
>
> or
>
>	if (error)
>		return error;
>
> or
>
>	if (mb_init(mbp) != 0)
>		return ESOMETHINGEVIL;

OK.

> > > significantly affected. The names of source and header files are
> > > questionable too and I would appreciate good suggestions (currently they
> > > are subr_mbuf.c and subr_mbuf.h).
> >
> > Hmmm. Maybe subr_mblib.c and libmb.h ? I don't want to turn this
> > into a bikeshed ( :-) ), so I suggest that you decide. Personally, I
> > would prefer that it be something other than "subr_mbuf.c" simply
> > because it may be a little misleading in some cases.
>
> Good point.
>
> > Boris, this is really a great interface and nice looking, clean code.
>
> I'm sure, this code can be significantly improved by mbuf gurus :)
>
> --
> Boris Popov
> http://www.butya.kz/~bp/

Ok, I have a few things to add (although I'm sure you'll be more into reading Ian Dowse's comments) :-)

in mb_append_record(), you walk all the "record" mbufs to get to the last "record." How good would be the tradeoff? i.e. keeping a pointer to the last pkt in the mbdata structure's mbuf chain? We would grow the structure by a pointer, and we may have to maintain the last record pointer; but isn't the only place where we would have to "maintain it" in mb_append_record() anyway?

in mb_init(), the m->m_pkthdr.rcvif = NULL; can be omitted, as MGETHDR() will do that. The m->m_len = 0 should stay for now.
m_getm() looks like it should belong in uipc_mbuf.c -- it looks quite a bit like the "all or nothing" allocation routine I have sitting here. The difference is that mine doesn't take size as an argument, but rather the actual count of mbufs and all it does is allocate `count' mbufs and attach a cluster to each one of them. If it can't allocate a cluster or an mbuf at any point, it frees everything and returns. Now that I think about it, I'd much rather have `size' passed in instead, even though some callers may not know the `size' (think drivers that pre-allocate mbufs + clusters, they typically know the `count'), it turns out that it is cheaper to compute the count from the size than the optimal size from the count, in the mbuf case. If you don't mind, I would strongly recommend moving m_getm() to uipc_mbuf.c. Code that doesn't know the `size' but knows the `count' (like some driver code) can do:

	m = m_get(M_TRYWAIT, MT_DATA);
	if (m == NULL) {
		/*
		 * we can't even allocate one mbuf, we're really low,
		 * so don't even bother calling m_getm(). The other
		 * option would be to have m_getm() not require us to
		 * pre-allocate an mbuf at all and do all the work, but
		 * then that may interfere with code like yours which
		 * needs to pass in an existing mbuf that has already
		 * been allocated.
		 */
		/* nothing to free here; fail right here */
	} else {
		size = count * (MLEN + MCLBYTES);
		if (m_getm(m, size) == NULL) {
			/*
			 * everything has been properly freed for us,
			 * we don't have to worry about leaking mbufs.
			 */
			/* fail right here. */
		}
	}

For this to work, though, m_getm() needs to be modified to free all of `top' chain if it can't get either a cluster or an mbuf. I don't know if this was intentional, but it seems to me that there is a subtle problem in m_getm() as it is now:

	if (len > MINCLSIZE) {
		MCLGET(m, M_TRYWAIT);
		if ((m->m_flags & M_EXT) == 0) {
			m_freem(m);	<------ frees only one mbuf
			return NULL;
		}
	}

I think what may happen here is that you will leak your `top' chain if you fail to allocate a cluster.
Assuming that the leak does exist and that it is fixed, we have a pretty good mechanism for doing 'all or nothing' allocations. :-) Later, Bosko. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 8 18:31:26 2001 Delivered-To: freebsd-arch@freebsd.org Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94]) by hub.freebsd.org (Postfix) with ESMTP id E3CA837B401; Thu, 8 Feb 2001 18:30:57 -0800 (PST) Received: by relay.butya.kz (Postfix, from userid 1000) id A23A328E1C; Fri, 9 Feb 2001 08:30:48 +0600 (ALMT) Received: from localhost (localhost [127.0.0.1]) by relay.butya.kz (Postfix) with ESMTP id 8565D2863E; Fri, 9 Feb 2001 08:30:48 +0600 (ALMT) Date: Fri, 9 Feb 2001 08:30:48 +0600 (ALMT) From: Boris Popov To: Ian Dowse Cc: freebsd-arch@freebsd.org, freebsd-net@freebsd.org Subject: Re: CFR: Sequential mbuf read/write extensions In-Reply-To: <200102081233.aa16167@salmon.maths.tcd.ie> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, 8 Feb 2001, Ian Dowse wrote: > It may be beneficial to use separate structs for the build and > breakdown operations. The two cases have slightly different > requirements: the mb_count field is only useful when building, and > mb_pos is only strictly necessary when breaking down mbuf chains. > The main advantage of using separate structs is better compiler > type checking, especially in the arguments to functions that need > to break down one chain and build another. Yes, I've been thinking about it, because once I've managed to mix build and breakdown buffers :). The only (not essential) disadvantage is that it will require two init/done functions. > The i386 architecture is not fussy about alignment of multi-byte > types in memory operations. However other architectures are not > so forgiving. 
Some NIC drivers have to do magic to ensure that IP > packets are 4-byte aligned, but this will not help if you are using > a protocol that does not guarantee 4-byte alignment of 32- or 64-bit > quantities within the IP packet. Doing a > > mb_get_dword(...); > mb_get_byte(...); > mb_get_dword(...); > > will cause an alignment exception on the alpha, for example. No, in the current implementation mb_get* functions will work properly. But mb_put* will fail. This can be avoided by implementing alignment-safe set* macros (which can be written in two variants - first form is for aligned objects and second for bad aligned ones). > Someone suggested using numeric names to indicate the size of the > types rather than 'byte', 'word' etc. I'd agree with this too; the > text names are not intuitive unless you have used dos/windows for > too long :-) Maybe use names such as mb_get_uint32, so that it is > obvious what C type should be passed as an argument. Ok, I'd like type/numeric notation. It is definitely better than just mb_get32. > I wonder if 'mbdata' is the best name for the struct? I think I > had used something like 'mchain', but if separate build/breakdown > structs are used, maybe mbuild/mbreakdown or mbchain/mdchain? (the > NFS code uses the words 'build' and 'dissect' to refer to the two > operations). The main idea would just be to try and have the name > indicate what information is held by the struct. Good point and good names too. > Another useful 'put' function would be something that adds a number > of bytes of 'empty' space to the end of the chain, and sets up a > uio/iovec pointing to this space. e.g to read from a file to an > mbuf chain you could use: > > error = mb_put_muio(mbp, &uio, size, &iovp); > ... > error = VOP_READ(vp, &uio, flag, cred); > ... > FREE(iovp, M_TEMP); This can be added later when the code will be written. 
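As an illustration of the alignment-safe set*/get* idea discussed above, here is a minimal sketch that stores and loads a little-endian 32-bit value one byte at a time, so it works at any offset and on any host byte order. The names sketch_put_uint32le() and sketch_get_uint32le() are hypothetical (they follow the mb_get_uint32 naming suggestion) and operate on a plain byte buffer rather than an mbuf chain.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Store v at p in little-endian order using only byte stores, so p may
 * be misaligned without faulting on strict-alignment machines.
 */
void
sketch_put_uint32le(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v & 0xff);
	p[1] = (unsigned char)((v >> 8) & 0xff);
	p[2] = (unsigned char)((v >> 16) & 0xff);
	p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Load a little-endian 32-bit value from a possibly misaligned p. */
uint32_t
sketch_get_uint32le(const unsigned char *p)
{
	return ((uint32_t)p[0] |
	    ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) |
	    ((uint32_t)p[3] << 24));
}
```

The byte-wise form costs a few extra instructions compared with a single aligned store, which is presumably why two variants (aligned and misaligned) were proposed.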
> For cases where there is a small (< MLEN) but relatively complex > data structure to be extracted from a chain, it may be useful to > have a function which just rearranges the mbufs to ensure that a > number of bytes become contiguous. It can make an in-mbuf pointer > to that space available. In most cases this will avoid having to > copy the data. Hmm, this can cause weird things if one have two or more such structures in the mbuf chain. Eg, at first point mbufs will be rearranged to place first structure properly but will misplace second structure. But in general case - yes, this is useful. > I wonder if these routines are the correct place for the endian > conversions? It certainly simplifies the code that must build and > parse requests, but requires duplication of each mb_get/mb_put > operation. I understand that there isn't currently code in the tree > for dealing with odd protocols that use little-endian format for > data transmitted on the network (smb is one of these?). sys/netncp is another example of the code which deals with little-endian formatted protocol (and mb* code was derived from sys/netncp/ncp_rq.c) I think it is good idea to provide functions for an in-place conversions because it makes code much more readable and reduces the size of generated code. Few additional functions is a good price for that. > Sometimes it is useful to have idempotent init() and free() functions. > For example, consider a function which builds a request and sends > it, but which must handle errors both before and after the mbuf > chain is sent off to the protocol. If mb_init simply NULL'd out > the mb_top pointer, then the code could look like this: [skip] > The pru_sosend() function takes over ownership of the mbuf chain, > so there is a need to just blank out the mbdata structure without > freeing the chain, and without performing any allocations. An init > function which cannot fail also simplifies the code. See callout_init() > in kern_timeout.c for similar code. 
Hmm, since so_send() can fail and some erros can be recovered by another call to so_send(), I'm just called m_copym() to duplicate the mbuf chain and give it to so_send(). > The mb_put_pstring function maybe belongs in the protocol-specific > code rather than here, since there are just too many different ways > of encoding strings. Different protocols are likely to encode > strings in different ways, with respect to length field type and > padding/alignment. The name 'pstring' associated with 'pascal' type string which is known as 'byte of length followed by data'. If this function doesn't suits to be general then it can be omitted (only netncp/nwfs code uses it). > Some of these mb_ functions return EBADRPC when not enough bytes > of data are found in the mbuf chain. It might be better to choose > a more generic return code, since these routines are not specific > to RPC. EBADRPC returned by all mb_get* functions to indicate that the format of reply is unexpected. > Ian Thanks for great review :) -- Boris Popov http://www.butya.kz/~bp/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 8 18:48:50 2001 Delivered-To: freebsd-arch@freebsd.org Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94]) by hub.freebsd.org (Postfix) with ESMTP id B44AF37B4EC; Thu, 8 Feb 2001 18:48:29 -0800 (PST) Received: by relay.butya.kz (Postfix, from userid 1000) id 0B5A828867; Fri, 9 Feb 2001 08:48:26 +0600 (ALMT) Received: from localhost (localhost [127.0.0.1]) by relay.butya.kz (Postfix) with ESMTP id ED14D28695; Fri, 9 Feb 2001 08:48:26 +0600 (ALMT) Date: Fri, 9 Feb 2001 08:48:26 +0600 (ALMT) From: Boris Popov To: Bosko Milekic Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: Sequential mbuf read/write extensions In-Reply-To: <013201c09203$971be9c0$1f90c918@jehovah> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: 
owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

On Thu, 8 Feb 2001, Bosko Milekic wrote:

> in mb_init(), the m->m_pkthdr.rcvif = NULL; can be omitted, as
> MGETHDR() will do that. The m->m_len = 0 should stay for now.

Ok.

> drivers that pre-allocate mbufs + clusters, they typically know the
> `count'), it turns out that it is cheaper to compute the count from
> the size than the optimal size from the count, in the mbuf case. If
> you don't mind, I would strongly recommend moving m_getm() to
> uipc_mbuf.c. Code that doesn't know the `size' but knows the `count'

Agreed, that's why this function has a prefix 'm_' :)

[code sample skipped]

> For this to work, though, m_getm() needs to be modified to free all of
> `top' chain if it can't get either a cluster or an mbuf. I don't know
> if this was intentional, but it seems to me that there is a subtle
> problem in m_getm() as it is now:
>
>	if (len > MINCLSIZE) {
>		MCLGET(m, M_TRYWAIT);
>		if ((m->m_flags & M_EXT) == 0) {
>			m_freem(m); <------ frees only one mbuf
			^^^^^^^^^^ cluster is not in the chain yet, so it has to be freed.
>			return NULL;
>		}
>	}
>
> I think what may happen here is that you will leak your `top' chain if
> you fail to allocate a cluster.

The original semantic was not to free the entire chain, because m_getm() does not reallocate the original (top) mbuf(s) (which may contain data) and only adds new mbufs/clusters if possible. So, calls like m_get(mb->mb_top) will not leave a wild pointer. There is also a simple way to deal with such behavior:

	mtop = m_get(...);
	if (mtop == NULL)
		fail;
	if (m_getm(mtop) == NULL) {
		m_freem(mtop);
		fail;
	}

Probably m_getm() should return an error code rather than a pointer to an mbuf to avoid confusion.
-- Boris Popov http://www.butya.kz/~bp/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 8 19: 7:44 2001 Delivered-To: freebsd-arch@freebsd.org Received: from VL-MS-MR001.sc1.videotron.ca (relais.videotron.ca [24.201.245.36]) by hub.freebsd.org (Postfix) with ESMTP id 1AC5837B401; Thu, 8 Feb 2001 19:07:22 -0800 (PST) Received: from jehovah ([24.201.144.31]) by VL-MS-MR001.sc1.videotron.ca (Netscape Messaging Server 4.15) with SMTP id G8GZC902.6Z2; Thu, 8 Feb 2001 22:07:21 -0500 Message-ID: <001301c09245$b7400a00$1f90c918@jehovah> From: "Bosko Milekic" To: "Boris Popov" Cc: , References: Subject: Re: Sequential mbuf read/write extensions Date: Thu, 8 Feb 2001 22:09:28 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2919.6700 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Boris Popov wrote: [...] > > For this to work, though, m_getm() needs to be modified to free all of > > `top' chain if it can't get either a cluster or an mbuf. I don't know > > if this was intentional, but it seems to me that there is a subtle > > problem in m_getm() as it is now: > > > > if (len > MINCLSIZE) { > > MCLGET(m, M_TRYWAIT); > > if ((m->m_flags & M_EXT) == 0) { > > m_freem(m); <------ frees only one mbuf > ^^^^^^^^^^ cluster is not in the chain yet, so it have to be > freed. m_free() may be more appropriate than m_freem() then, but see below. > > return NULL; > > } > > } > > > > I think what may happen here is that you will leak your `top' chain if > > you fail to allocate a cluster. > > The original semantic was not to free an entire chain because > m_getm() do not reallocates original (top) mbuf(s) (which may contain > data) and only adds new mbufs/clusters if possible. 
> So, the calls like m_get(mb->mb_top) will not left the wild pointer.
> There is also simple way to deal with such behavior:
>
>	mtop = m_get(...);
>	if (mtop == NULL)
>		fail;
>	if (m_getm(mtop) == NULL) {
>		m_freem(mtop);
>		fail;
>	}
>
> Probably m_getm() should return error code rather than pointer to
> mbuf to avoid confusion.

I understand this part, but what I think you missed in my comment is that m_getm() should probably free what it already allocated before finally failing. It may not need to free `top' because of the wild pointer, as you say. But think of this: m_getm() is called with a larger `size' - it decides that given the `size' it will need to allocate a total of exactly 6 mbufs and 6 clusters, one for each mbuf. It loops and allocates, successfully, 5 of those mbufs and 5 clusters. So `top' chain has now grown and includes those mbufs. Then what happens in the last iteration is that it allocates the 6th mbuf OK (it has not yet placed it on the chain) and fails to allocate a cluster, so it frees just that one mbuf (and not the mbufs it allocated in prior iterations and attached to `top' chain) and returns NULL. Your code that calls m_getm() then just fails, leaving `top' with what it could allocate.

Note that in my mail I said "assuming this is a leak," thus recognizing the possibility that you did this intentionally. :-) Right now, I'll assume that this _was_ intentional, as that is what I understand from the above. But in any case, if we do move this to uipc_mbuf.c, we need to do one of the following:

(a) make m_getm() free what it allocated in previous loop iterations before it failed (as described above) or

(b) leave m_getm() the way it is BUT write an additional function that will simply wrap the call to m_getm() and flush properly for it if it fails (EXACTLY like your code snippet above).
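Option (a) can be sketched in userland as follows. This is not m_getm() itself: struct buf, chain_extend(), buf_alloc() and the sketch_fail_after failure-injection knob are all hypothetical, and calloc() stands in for mbuf/cluster allocation. New buffers are collected on a private list and attached to the caller's chain only once every allocation has succeeded, so a failure never leaves a partially grown chain behind, and the function reports an error code rather than a pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for an mbuf. */
struct buf {
	struct buf *next;
};

/*
 * Hypothetical test knob: number of allocations that still succeed
 * before buf_alloc() starts failing; a negative value disables it.
 */
int sketch_fail_after = -1;

static struct buf *
buf_alloc(void)
{
	if (sketch_fail_after == 0)
		return (NULL);
	if (sketch_fail_after > 0)
		sketch_fail_after--;
	return (calloc(1, sizeof(struct buf)));
}

static void
buf_free_chain(struct buf *b)
{
	struct buf *n;

	while (b != NULL) {
		n = b->next;
		free(b);
		b = n;
	}
}

size_t
chain_length(const struct buf *top)
{
	size_t n = 0;

	for (; top != NULL; top = top->next)
		n++;
	return (n);
}

/*
 * Append `count' buffers to the chain headed by `top'.  Returns 0 on
 * success.  On failure returns ENOBUFS, frees only the buffers this
 * call allocated, and leaves the caller's chain exactly as it was.
 */
int
chain_extend(struct buf *top, size_t count)
{
	struct buf *tail, *added = NULL, *last = NULL, *b;
	size_t i;

	assert(top != NULL);
	for (tail = top; tail->next != NULL; tail = tail->next)
		continue;
	for (i = 0; i < count; i++) {
		b = buf_alloc();
		if (b == NULL) {
			buf_free_chain(added);	/* roll back our additions */
			return (ENOBUFS);
		}
		if (last == NULL)
			added = b;
		else
			last->next = b;
		last = b;
	}
	tail->next = added;	/* attach everything in one step */
	return (0);
}
```

Since the caller's original buffers are never freed here, the caller keeps a valid `top' pointer in both outcomes, which addresses the wild-pointer concern as well.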
I'll gladly settle for either, but if we do go with (b), then the m_freem() should be changed to an m_free(), as it reflects the fact that we are only freeing the one mbuf and we should document this behavior, certainly. If you want, I'll roll up a diff in a few days (once I get what is presently dragging in my "commit this" queue out) and commit it. If you prefer to do this yourself, then feel free. :-) > -- > Boris Popov > http://www.butya.kz/~bp/ Regards, Bosko. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 8 19:21:25 2001 Delivered-To: freebsd-arch@freebsd.org Received: from grendel.bsdi.com (unknown [199.79.183.5]) by hub.freebsd.org (Postfix) with ESMTP id 95F2F37B401 for ; Thu, 8 Feb 2001 19:21:07 -0800 (PST) Received: from grendel.bsdi.com (cp@localhost.bsdi.com [127.0.0.1]) by grendel.bsdi.com (8.11.1/8.9.3) with ESMTP id f193L6k00368 for ; Thu, 8 Feb 2001 20:21:06 -0700 (MST) (envelope-from cp@grendel.bsdi.com) Message-Id: <200102090321.f193L6k00368@grendel.bsdi.com> To: freebsd-arch@FreeBSD.ORG Subject: usb, clists, spltty, splbio From: Chuck Paterson Date: Thu, 08 Feb 2001 20:21:06 -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG I have been mucking with making moused talk to a usb joystick. This all turned out pretty straight forward, all user land code in moused talking to the hid device. The problem is that the kernel crashes randomly, more often as the system get more loaded. A couple of times I got a panic in the clist code, but it really didn't show anything direct. Oh yah, this is with stable, not current. Reading through the code I found what looks like a problem. The hid, and other usb code use clists. The various usb code is protected by splusb which is a defined as splbio. The function b_to_q() and all the other clist code use spltty. 
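The mismatch described above can be pictured with plain bitmasks. This is only a toy model, not the i386 spl implementation: the IRQ bit assignments, struct intr_masks, irq_blocked() and splclist_mask() are all hypothetical, chosen to show that raising a tty-only mask leaves a USB (bio) interrupt free to run in the middle of clist code, while a mask that is the union of both blocks it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical IRQ bit assignments, purely illustrative. */
#define SKETCH_IRQ_TTY	(1u << 4)
#define SKETCH_IRQ_USB	(1u << 11)

/* Toy copy of the two interrupt masks involved. */
struct intr_masks {
	uint32_t tty_imask;	/* blocked while at "spltty" */
	uint32_t bio_imask;	/* blocked while at "splbio"/"splusb" */
};

/* Nonzero if interrupt bit `irq' is held off while `mask' is raised. */
int
irq_blocked(uint32_t mask, uint32_t irq)
{
	return ((mask & irq) != 0);
}

/* Mask a hypothetical splclist() would raise: union of tty and bio. */
uint32_t
splclist_mask(const struct intr_masks *m)
{
	return (m->tty_imask | m->bio_imask);
}
```

In this model, code protected only by tty_imask can be interrupted by the USB handler, which may then touch the same clist structures.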
I changed the definition of spltty from GENSPL(spltty, |=, tty_imask, 14) to GENSPL(spltty, |=, tty_imask | bio_imask, 14) and the crashes appear to have gone away. I say appear, it has run longer now than it has before, but it hasn't been up much more than twice as long yet. I am not quite sure the best way to deal with this. The only idea I have thought of that I like at all is to create a splclist() which is the or of tty and bio and put that into the code that mucks with clists, perhaps just the allocation/free routines. Comments Chuck To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 8 21: 0:20 2001 Delivered-To: freebsd-arch@freebsd.org Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94]) by hub.freebsd.org (Postfix) with ESMTP id 91CB337B401; Thu, 8 Feb 2001 20:59:56 -0800 (PST) Received: by relay.butya.kz (Postfix, from userid 1000) id C8FF428695; Fri, 9 Feb 2001 10:59:43 +0600 (ALMT) Received: from localhost (localhost [127.0.0.1]) by relay.butya.kz (Postfix) with ESMTP id BE4992863E; Fri, 9 Feb 2001 10:59:43 +0600 (ALMT) Date: Fri, 9 Feb 2001 10:59:43 +0600 (ALMT) From: Boris Popov To: Bosko Milekic Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: Sequential mbuf read/write extensions In-Reply-To: <001301c09245$b7400a00$1f90c918@jehovah> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, 8 Feb 2001, Bosko Milekic wrote: > any case, if we do move this to uipc_mbuf.c, we need to do one of the > following: > > (a) make m_getm() free what it allocated in previous loop iterations > before it failed (as described above) or > > (b) leave m_getm() the way it is BUT write an additional function that > will simply wrap the call to m_getm() and flush properly for it if it > fails (EXACTLY like your code snippet above). 
Ok, I think the (a) is a right way. There is no point to hold partially allocated mbuf chain. And function should return error code, not a pointer. > I'll gladly settle for either, but if we do go with (b), then the > m_freem() should be changed to an m_free(), as it reflects the fact > that we are only freeing the one mbuf and we should document this > behavior, certainly. If you want, I'll roll up a diff in a few days > (once I get what is presently dragging in my "commit this" queue out) > and commit it. If you prefer to do this yourself, then feel free. :-) Yes, I would appreciate your help on it. -- Boris Popov http://www.butya.kz/~bp/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 8 21:15: 3 2001 Delivered-To: freebsd-arch@freebsd.org Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222]) by hub.freebsd.org (Postfix) with ESMTP id 3B23237B491 for ; Thu, 8 Feb 2001 21:14:45 -0800 (PST) Received: from foo.osd.bsdi.com (root@foo.osd.bsdi.com [204.216.28.137]) by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id f195EXx65579; Thu, 8 Feb 2001 21:14:33 -0800 (PST) (envelope-from jhb@foo.osd.bsdi.com) Received: (from jhb@localhost) by foo.osd.bsdi.com (8.11.1/8.11.1) id f195E9937701; Thu, 8 Feb 2001 21:14:09 -0800 (PST) (envelope-from jhb) Message-ID: X-Mailer: XFMail 1.4.0 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <200102090321.f193L6k00368@grendel.bsdi.com> Date: Thu, 08 Feb 2001 21:14:08 -0800 (PST) Organization: BSD, Inc. From: John Baldwin To: Chuck Paterson Subject: RE: usb, clists, spltty, splbio Cc: freebsd-arch@FreeBSD.ORG Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On 09-Feb-01 Chuck Paterson wrote: > > I have been mucking with making moused talk to a usb joystick. 
> This all turned out pretty straight forward, all user land code in
> moused talking to the hid device. The problem is that the kernel
> crashes randomly, more often as the system get more loaded. A couple
> of times I got a panic in the clist code, but it really didn't
> show anything direct. Oh yah, this is with stable, not current.
>
> Reading through the code I found what looks like a problem.
> The hid, and other usb code use clists. The various usb code is
> protected by splusb which is a defined as splbio. The function
> b_to_q() and all the other clist code use spltty.
>
> I changed the definition of spltty from
>
>	GENSPL(spltty, |=, tty_imask, 14)
>
> to
>
>	GENSPL(spltty, |=, tty_imask | bio_imask, 14)
>
> and the crashes appear to have gone away. I say appear, it has run
> longer now than it has before, but it hasn't been up much more than
> twice as long yet.
>
> I am not quite sure the best way to deal with this. The only
> idea I have thought of that I like at all is to create a splclist()
> which is the or of tty and bio and put that into the code that
> mucks with clists, perhaps just the allocation/free routines.

We have a similar problem with the slip and ppp devices, which have run code under both spltty and splnet. The trick we use there is to actually change the imasks by doing something along the lines of:

	net_imask |= tty_imask;
	tty_imask = net_imask;

So there is at least prior precedent for doing this sort of thing.

> Comments
> Chuck

--
John Baldwin -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!"
- http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Feb 9 11:12: 2 2001 Delivered-To: freebsd-arch@freebsd.org Received: from grendel.bsdi.com (grendel.twistedbit.com [199.79.183.5]) by hub.freebsd.org (Postfix) with ESMTP id 0BB8037B69F; Fri, 9 Feb 2001 11:11:42 -0800 (PST) Received: from grendel.bsdi.com (cp@localhost.bsdi.com [127.0.0.1]) by grendel.bsdi.com (8.11.1/8.9.3) with ESMTP id f19JBfk06298; Fri, 9 Feb 2001 12:11:41 -0700 (MST) (envelope-from cp@grendel.bsdi.com) Message-Id: <200102091911.f19JBfk06298@grendel.bsdi.com> To: John Baldwin Cc: freebsd-arch@FreeBSD.ORG Subject: Re: usb, clists, spltty, splbio In-reply-to: Your message of "Thu, 08 Feb 2001 21:14:08 PST." From: Chuck Paterson Date: Fri, 09 Feb 2001 12:11:41 -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG The following code segment is from the top the hid open routine. I'll start running this code this afternoon. This is one of those cases where checking it into current does zero good. 
Chuck

Index: uhid.c
===================================================================
RCS file: /cp/cvs.freebsd/src/sys/dev/usb/uhid.c,v
retrieving revision 1.27.2.4
diff -u -r1.27.2.4 uhid.c
--- uhid.c	2000/10/31 22:31:29	1.27.2.4
+++ uhid.c	2001/02/09 19:06:36
@@ -375,6 +375,18 @@
 {
 	struct uhid_softc *sc;
 	usbd_status err;
+#if defined(__FreeBSD__) && defined(__i386__)
+	static int hid_opened;
+
+	if (hid_opened == 0) {
+		int s;
+		s = splhigh();
+		tty_imask |= bio_imask;
+		update_intr_masks();
+		splx(s);
+		hid_opened = 1;
+	}
+#endif
 
 	USB_GET_SC_OPEN(uhid, UHIDUNIT(dev), sc);
 

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Fri Feb 9 17:38: 5 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from Awfulhak.org (awfulhak.demon.co.uk [194.222.196.252]) by hub.freebsd.org (Postfix) with ESMTP id 50EEC37B6A4; Fri, 9 Feb 2001 17:37:35 -0800 (PST)
Received: from hak.lan.Awfulhak.org (root@hak.lan.Awfulhak.org [172.16.0.12]) by Awfulhak.org (8.11.2/8.11.2) with ESMTP id f1A1bWR08160; Sat, 10 Feb 2001 01:37:32 GMT (envelope-from brian@lan.Awfulhak.org)
Received: from hak.lan.Awfulhak.org (brian@localhost [127.0.0.1]) by hak.lan.Awfulhak.org (8.11.2/8.11.1) with ESMTP id f19HaJN01324; Fri, 9 Feb 2001 17:36:19 GMT (envelope-from brian@hak.lan.Awfulhak.org)
Message-Id: <200102091736.f19HaJN01324@hak.lan.Awfulhak.org>
X-Mailer: exmh version 2.3.1 01/18/2001 with nmh-1.0.4
To: John Baldwin
Cc: Chuck Paterson, freebsd-arch@FreeBSD.ORG, brian@Awfulhak.org
Subject: Re: usb, clists, spltty, splbio
In-Reply-To: Message from John Baldwin of "Thu, 08 Feb 2001 21:14:08 PST."
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 09 Feb 2001 17:36:19 +0000
From: Brian Somers
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

>
> On 09-Feb-01 Chuck Paterson wrote:
> >
> > I have been mucking with making moused talk to a usb joystick.
> > This all turned out pretty straight forward, all user land code in
> > moused talking to the hid device. The problem is that the kernel
> > crashes randomly, more often as the system get more loaded. A couple
> > of times I got a panic in the clist code, but it really didn't
> > show anything direct. Oh yah, this is with stable, not current.
> >
> > Reading through the code I found what looks like a problem.
> > The hid, and other usb code use clists. The various usb code is
> > protected by splusb which is a defined as splbio. The function
> > b_to_q() and all the other clist code use spltty.
> >
> > I changed the definition of spltty from
> >
> > GENSPL(spltty, |=, tty_imask, 14)
> >
> > to
> >
> > GENSPL(spltty, |=, tty_imask | bio_imask, 14)
> >
> > and the crashes appear to have gone away. I say appear, it has run
> > longer now than it has before, but it hasn't been up much more than
> > twice as long yet.
> >
> > I am not quite sure the best way to deal with this. The only
> > idea I have thought of that I like at all is to create a splclist()
> > which is the or of tty and bio and put that into the code that
> > mucks with clists, perhaps just the allocation/free routines.
>
> We have a similar problem with the slip and ppp devices, which have
> run code under both spltty and splnet. The trick we use there is
> to actually change the imasks by doing something along the lines of:
>
> net_imask |= tty_imask;
> tty_imask = net_imask;
>
> So there is at least prior precedent for doing this sort of thing.

Hmm. I would think that Chuck's idea has the advantage that it
doesn't adversely affect existing splnet/spltty code. Despite this
only mattering for a finite amount of time, I don't think the
precedent is good here :-/

> > Comments
> > Chuck
>
> --
>
> John Baldwin -- http://www.FreeBSD.org/~jhb/
> PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc
> "Power Users Use the Power to Serve!"
- http://www.FreeBSD.org/

--
Brian

Don't _EVER_ lose your sense of humour!

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message