From owner-freebsd-arch  Sun Feb  4 16:14:31 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by hub.freebsd.org (Postfix) with ESMTP id F101537B401
	for <freebsd-arch@FreeBSD.org>; Sun,  4 Feb 2001 16:14:14 -0800 (PST)
Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3])
	by fledge.watson.org (8.11.1/8.11.1) with SMTP id f150EEh75545
	for <freebsd-arch@FreeBSD.org>; Sun, 4 Feb 2001 19:14:14 -0500 (EST)
	(envelope-from robert@fledge.watson.org)
Date: Sun, 4 Feb 2001 19:14:14 -0500 (EST)
From: Robert Watson <rwatson@FreeBSD.org>
X-Sender: robert@fledge.watson.org
To: freebsd-arch@FreeBSD.org
Subject: Tests for NULL p_ucred under p_cred -- are they needed?
Message-ID: <Pine.NEB.3.96L.1010204190927.74962D-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


I've noticed that at various points in the kernel code, there are tests to
check that the ucred structure in a proc is non-NULL before using it. 
Under what circumstances do we believe it is possible for the ucred
pointer to be non-NULL?  It seems that, in normal usage, it should always
be defined--the only points where it might be NULL would be during process
creation and process exit.  Are these windows long enough for it to be a
concern?  Are appropriate process locks held, under SMPng, such that it's
never possible to grab a ucred structure for a process while it is NULL?

It seems that there are other components of the code that assume that if
(p) is non-NULL, then a ucred must be defined for the process, which seems
like a consistent assumption assuming appropriate protections are in
place.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5  2:12: 0 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from lndsmtp01.ico.com (unknown [212.57.217.43])
	by hub.freebsd.org (Postfix) with ESMTP
	id 23E0737B401; Mon,  5 Feb 2001 02:11:41 -0800 (PST)
Received: from lndgate01.ico.com (unverified) by lndsmtp01.ico.com
 (Content Technologies SMTPRS 4.1.5) with ESMTP id <Td439d92b518a69b796@lndsmtp01.ico.com>;
 Mon, 5 Feb 2001 09:48:55 +0000
Received: from zoo.co.uk (212.57.223.232 [212.57.223.232]) by lndgate01.ico.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21)
	id CRG16CS5; Mon, 5 Feb 2001 09:53:40 -0000
Message-ID: <3A7E767B.6AADB3B5@zoo.co.uk>
Date: Mon, 05 Feb 2001 09:46:35 +0000
From: Nathan Gould <ngould@zoo.co.uk>
X-Mailer: Mozilla 4.75 [en] (X11; U; OpenBSD 2.8 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: Robert Watson <rwatson@FreeBSD.ORG>
Cc: freebsd-arch@FreeBSD.ORG
Subject: Re: Tests for NULL p_ucred under p_cred -- are they needed?
References: <Pine.NEB.3.96L.1010204190927.74962D-100000@fledge.watson.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Robert Watson wrote:

> I've noticed that at various points in the kernel code, there are tests to
> check that the ucred structure in a proc is non-NULL before using it.
> Under what circumstances do we believe it is possible for the ucred
> pointer to be non-NULL?  It seems that, in normal usage, it should always
> be defined--the only points where it might be NULL would be during process
> creation and process exit.  Are these windows long enough for it to be a
> concern?  Are appropriate process locks held, under SMPng, such that it's
> never possible to grab a ucred structure for a process while it is NULL?
>
> It seems that there are other components of the code that assume that if
> (p) is non-NULL, then a ucred must be defined for the process, which seems
> like a consistent assumption assuming appropriate protections are in
> place.
>
> Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
> robert@fledge.watson.org      NAI Labs, Safeport Network Services
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-arch" in the body of the message

Surely, if for no other reason, we should be checking for abnormalities such
as non-Null for security reasons i.e. security breaches tend to be based on
non-corformance to publicised identified usage.

Just a thought...

Nathan Gould
ngould@zoo.co.uk


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5  7:45:16 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by hub.freebsd.org (Postfix) with ESMTP id 52AC737B491
	for <freebsd-arch@FreeBSD.ORG>; Mon,  5 Feb 2001 07:44:55 -0800 (PST)
Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3])
	by fledge.watson.org (8.11.1/8.11.1) with SMTP id f15FiWh83452;
	Mon, 5 Feb 2001 10:44:32 -0500 (EST)
	(envelope-from robert@fledge.watson.org)
Date: Mon, 5 Feb 2001 10:44:32 -0500 (EST)
From: Robert Watson <rwatson@FreeBSD.ORG>
X-Sender: robert@fledge.watson.org
To: Nathan Gould <ngould@zoo.co.uk>
Cc: freebsd-arch@FreeBSD.ORG
Subject: Re: Tests for NULL p_ucred under p_cred -- are they needed?
In-Reply-To: <3A7E767B.6AADB3B5@zoo.co.uk>
Message-ID: <Pine.NEB.3.96L.1010205102219.74962L-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Mon, 5 Feb 2001, Nathan Gould wrote:

> Robert Watson wrote:
> 
> > I've noticed that at various points in the kernel code, there are tests to
> > check that the ucred structure in a proc is non-NULL before using it.
> > Under what circumstances do we believe it is possible for the ucred
> > pointer to be non-NULL?  It seems that, in normal usage, it should always
> > be defined--the only points where it might be NULL would be during process
> > creation and process exit.  Are these windows long enough for it to be a
> > concern?  Are appropriate process locks held, under SMPng, such that it's
> > never possible to grab a ucred structure for a process while it is NULL?
> >
> > It seems that there are other components of the code that assume that if
> > (p) is non-NULL, then a ucred must be defined for the process, which seems
> > like a consistent assumption assuming appropriate protections are in
> > place.
> 
> Surely, if for no other reason, we should be checking for abnormalities
> such as non-Null for security reasons i.e. security breaches tend to be
> based on non-corformance to publicised identified usage. 

Well, in the event that the credential was NULL, a number of chunks of
code currently present would simply panic; my question was about whether
or not those chunks of code are incorrect, or whether we can trim out all
the conditionals that test (p_cred) (et al); here are a few samples where
there is a conditional:

kern_proc.c:392
fill_kinfo_proc(p, kp)
        struct proc *p;
        struct kinfo_proc *kp;
{
...
        if (p->p_cred) {
                kp->ki_uid = p->p_cred->pc_ucred->cr_uid;
                kp->ki_ruid = p->p_cred->p_ruid;
                kp->ki_svuid = p->p_cred->p_svuid;
...

kern_proc.c:600, 606
static int
sysctl_kern_proc(SYSCTL_HANDLER_ARGS)
{
...
                        case KERN_PROC_UID:
                                if (p->p_ucred == NULL ||
                                    p->p_ucred->cr_uid != (uid_t)name[0])
                                        continue;
                                break;
...
                        case KERN_PROC_RUID:
                                if (p->p_ucred == NULL ||
                                    p->p_cred->p_ruid != (uid_t)name[0])
                                        continue;
                                break;
                        }

It appears to me that a struct proc should always have a defined p_cred,
although there does appear to be a small window in fork1() where it has
been added to the global process list and the struct proc is not yet fully
initialized.  However, the p_cred pointer in that case is the parent's
value; and all processes appear to inherit their credential from proc0
which has one hard-coded in init_main.c.  kern_exit.c appears to hold the
process lock while releasing both the ucred and cred structures; it's
possible there is a window there also because the process isn't removed
from some of it's inter-process relationships (pgrp, zombproc, p_sibling)
until after the credential has been freed, and the process lock has been
released.

However, there is a fair amount of code that seems to assume the
credential is always defined; largely, that appears to be the case for
code that acts on behalf of the process: maybe the key here is that a
process's credentials must always be defined between the end of fork1()
and the beginning of exit(), meaning that when a process itself requests a
service, it will be defined and can be relied on, but during process
creation/teardown, the credential may be NULL and therefore code acting on
the process cannot assume that the credential exists.  Not that procfs
chooses to ignore processes without credentials:

procfs_vnops.c: 407
static int
procfs_getattr(ap)
        struct vop_getattr_args /* {
                struct vnode *a_vp;
                struct vattr *a_vap;
                struct ucred *a_cred;
                struct proc *a_p;
        } */ *ap;
{
...
default:
                procp = PFIND(pfs->pfs_pid);
                if (procp == 0 || procp->p_cred == NULL ||
                    procp->p_ucred == NULL)
                        return (ENOENT);


The code snippets above came from sysctl() code where a process is
retrieving information on other processes, similarly.  An exception to
this would be in Poul-Henning's p_trespass() from RELENG_4 and early
RELENG_5, where p_trespass() is invoked on processes that may receive
signals, but without a credential==NULL check that I can find (this is
from RELENG_4_2_0_RELEASE):

kern_prot.c: 966
int
p_trespass(struct proc *p1, struct proc *p2)
{
...
        if (p1->p_cred->p_ruid == p2->p_cred->p_ruid)
                return (0);


As invoked from kern_sig.c

kern_sig.c: 100, 876
#define CANSIGNAL(p, q, sig) \
        (!p_trespass(p, q) || \
        ((sig) == SIGCONT && (q)->p_session == (p)->p_session))
...
int
kill(cp, uap)
        register struct proc *cp;
        register struct kill_args *uap;
{
...
                /* kill single process */
                if ((p = pfind(uap->pid)) == NULL)
                        return (ESRCH);
                if (!CANSIGNAL(cp, p, uap->signum))
                        return (EPERM);

In any case, there seems to be some inconsistency.  It would seem that
either (a) it is an invariant that p_cred is non-NULL for all reachable
processes via various process lists (except unused processes), (b) it's an
invariant that p_cred is non-NULL between the end of fork1() and the
beginning of exit(), and that p_cred is therefore always defined if you're
acting on behalf of the process, but not necessarily if you're acting on
the process.

Clearly, (1) would make life easier, and mean we could remove a fair
number of checks.  However, it may be that (b) is the case, in which case
the signal code might require fixing, or the invariants it depends on at
least require documenting.  This relevant also as I overhaul the process
access control routines, because I need to know if it's possible to have
processes without credentials, and if so, what it means.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5  9:26:49 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.wgate.com (mail.wgate.com [38.219.83.4])
	by hub.freebsd.org (Postfix) with ESMTP
	id AC9BA37B503; Mon,  5 Feb 2001 09:26:29 -0800 (PST)
Received: from jesup.eng.tvol.net ([10.32.2.26]) by mail.wgate.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21)
	id 1LP8YWWG; Mon, 5 Feb 2001 12:26:29 -0500
Reply-To: Randell Jesup <rjesup@wgate.com>
To: Matt Dillon <dillon@earth.backplane.com>
Cc: Matthew Jacob <mjacob@feral.com>,
	"Justin T. Gibbs" <gibbs@scsiguy.com>,
	Mike Smith <msmith@FreeBSD.ORG>, Dag-Erling Smorgrav <des@ofug.org>,
	Dan Nelson <dnelson@emsphone.com>,
	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
References: <Pine.LNX.4.21.0102031323080.27128-100000@zeppo.feral.com>
	<200102040026.f140QuD12547@earth.backplane.com>
From: Randell Jesup <rjesup@wgate.com>
Date: 05 Feb 2001 12:30:50 -0500
In-Reply-To: Matt Dillon's message of "Sat, 3 Feb 2001 16:26:56 -0800 (PST)"
Message-ID: <ybuelxdnik5.fsf@jesup.eng.tvol.net.jesup.eng.tvol.net>
User-Agent: Gnus/5.0807 (Gnus v5.8.7) Emacs/20.7
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Matt Dillon <dillon@earth.backplane.com> writes:
>    This is a reasonable criticism, but putting aside the issue of bloating
>    kernel stack useage from huge struct buf structures there is also the 
>    issue of whether any static limit is 'reasonable'.

        Good point.

>    The device driver API supports arbitrary raw read and raw write
>    sizes, but nearly all the device drivers convert read() and write()
>    calls to physio() calls, and those then convert the parameters 
>    to struct buf / VOP_STRATEGY() calls.
>
>    There are only two solutions that I can see:
>
>    (1) have the SCSI tape device code not convert raw reads and writes
>	to VOP_STRATEGY calls and instead manage the KVA for the I/O via some
>	other mechanism.

        This seems rather painful and makes support for large IO's very
driver-dependant and confusing.

>    (2) Modify the 'struct buf' b_pages[] array to instead be a pointer
>	to an array.  Include the original static array under another name
>	for compatibility purposes and have the init code default to 
>	assigning b_pages to the original embedded static array.
>
>	Then the physio code could be adjusted to dynamically MALLOC the
>	necessary pages array if the static one in the supplied buffer is
>	insufficient.

        So, how reasonable is this?  It seems like a pretty good solution,
but I'm far from up-to-speed on the internals here.

-- 
Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94)
rjesup@wgate.com


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5  9:31:19 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67])
	by hub.freebsd.org (Postfix) with ESMTP
	id 559B037B503; Mon,  5 Feb 2001 09:31:02 -0800 (PST)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.1/8.9.3) id f15HUIU21219;
	Mon, 5 Feb 2001 09:30:18 -0800 (PST)
	(envelope-from dillon)
Date: Mon, 5 Feb 2001 09:30:18 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
Message-Id: <200102051730.f15HUIU21219@earth.backplane.com>
To: Randell Jesup <rjesup@wgate.com>
Cc: Matthew Jacob <mjacob@feral.com>,
	"Justin T. Gibbs" <gibbs@scsiguy.com>,
	Mike Smith <msmith@FreeBSD.ORG>, Dag-Erling Smorgrav <des@ofug.org>,
	Dan Nelson <dnelson@emsphone.com>,
	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
References: <Pine.LNX.4.21.0102031323080.27128-100000@zeppo.feral.com>
	<200102040026.f140QuD12547@earth.backplane.com> <ybuelxdnik5.fsf@jesup.eng.tvol.net.jesup.eng.tvol.net>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


:>    (1) have the SCSI tape device code not convert raw reads and writes
:>	to VOP_STRATEGY calls and instead manage the KVA for the I/O via some
:>	other mechanism.
:
:        This seems rather painful and makes support for large IO's very
:driver-dependant and confusing.
:...
:>
:>	Then the physio code could be adjusted to dynamically MALLOC the
:>	necessary pages array if the static one in the supplied buffer is
:>	insufficient.
:
:        So, how reasonable is this?  It seems like a pretty good solution,
:but I'm far from up-to-speed on the internals here.
:
:-- 
:Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94)
:rjesup@wgate.com

    I think what's reasonable is to wait until someone - Poul maybe, 
    puts a better I/O buffering subsytem in place.  Anything we do right
    now will be a bad hack.

    The funny thing about all of this is that we go to great pains to
    make things contiguous in KVM, but the bus dma code has to then break
    things up into page-by-page DMAs anyway.  I'd much rather just hand the
    I/O subsystem a list of vm_page_t's without bothering to map them into
    KVM.

						-Matt


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5  9:36: 5 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.wgate.com (mail.wgate.com [38.219.83.4])
	by hub.freebsd.org (Postfix) with ESMTP
	id 6872D37B65D; Mon,  5 Feb 2001 09:35:45 -0800 (PST)
Received: from jesup.eng.tvol.net ([10.32.2.26]) by mail.wgate.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21)
	id 1LP8YXAJ; Mon, 5 Feb 2001 12:35:38 -0500
Reply-To: Randell Jesup <rjesup@wgate.com>
To: Cy Schubert - ITSD Open Systems Group <Cy.Schubert@uumail.gov.bc.ca>
Cc: Matt Dillon <dillon@earth.backplane.com>,
	Mike Smith <msmith@FreeBSD.ORG>, Dag-Erling Smorgrav <des@ofug.org>,
	Dan Nelson <dnelson@emsphone.com>,
	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up  {MAX,DFL}*SIZ in i386)
References: <200102031946.f13JkBA08356@cwsys.cwsent.com>
From: Randell Jesup <rjesup@wgate.com>
Date: 05 Feb 2001 12:39:59 -0500
In-Reply-To: Cy Schubert - ITSD Open Systems Group's message of "Sat, 03 Feb 2001 11:45:44 -0800"
Message-ID: <ybuae81ni4w.fsf@jesup.eng.tvol.net.jesup.eng.tvol.net>
User-Agent: Gnus/5.0807 (Gnus v5.8.7) Emacs/20.7
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Cy Schubert - ITSD Open Systems Group <Cy.Schubert@uumail.gov.bc.ca> writes:
>>     And, finally, while large I/O's may seem to be a good idea, they can
>>     actually interfere with the time-share mechanisms that smooth system
>>     operation.  If you queue a 1 MByte I/O to a disk device, that disk
>>     device is locked up doing that one I/O for a long time (in cpu-time
>>     terms).  Having a large number of bytes queued for I/O on one device
>>     can interfere with the performance of another device.  In short,
>>     your performance is not going to get better and could very well get
>>     worse.
>
>I remember an IBM MVS course course that made this point abundantly 
>clear.  The short of it was that if your system was primarily used as a 
>batch system, e.g. response time didn't matter but throughput did, use 
>large block sizes.  If on the other hand your primary workload was time 
>sharing or transaction processing applications, smaller block sizes 
>would improve response times but reduce throughput.  Large block sizes 
>tend to monopolise I/O channels.

        Ok.  However, a given machine may be used for either heavy batch
server-style use (say email, DB), or for more interactive work (including
things like serving real-time requests like web pages).  Also, usages can
vary over time and load - when there are a bunch of processes accessing the
disk with smallish IO's and/or paging (on that device), we don't want a
large IO tying it up for a while; while when there are few or one process
accessing the channel we probably don't mind running larger requests.

        So, the point (as Matt mentioned) is whether any static limit is
appropriate?  Or should it be dynamic or at least adjustable?  When is a
smaller limit better?  When do we want a larger limit?  Also, devices
should be able specify higher (or lower) limits, like for SCSI tape
drives.

        Personally, I think a dynamic system is preferable, but obviously
more complex.  In any case I think it should be adjustable statically.

-- 
Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94)
rjesup@wgate.com


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5  9:52:53 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from feral.com (feral.com [192.67.166.1])
	by hub.freebsd.org (Postfix) with ESMTP
	id 97F5537B65D; Mon,  5 Feb 2001 09:52:35 -0800 (PST)
Received: from zeppo.feral.com (IDENT:mjacob@zeppo [192.67.166.71])
	by feral.com (8.9.3/8.9.3) with ESMTP id JAA04006;
	Mon, 5 Feb 2001 09:52:22 -0800
Date: Mon, 5 Feb 2001 09:52:20 -0800 (PST)
From: Matthew Jacob <mjacob@feral.com>
Reply-To: mjacob@feral.com
To: Matt Dillon <dillon@earth.backplane.com>
Cc: Randell Jesup <rjesup@wgate.com>,
	"Justin T. Gibbs" <gibbs@scsiguy.com>,
	Mike Smith <msmith@FreeBSD.ORG>, Dag-Erling Smorgrav <des@ofug.org>,
	Dan Nelson <dnelson@emsphone.com>,
	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in
 i386)
In-Reply-To: <200102051730.f15HUIU21219@earth.backplane.com>
Message-ID: <Pine.LNX.4.21.0102050949420.2887-100000@zeppo.feral.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


>     The funny thing about all of this is that we go to great pains to
>     make things contiguous in KVM, but the bus dma code has to then break
>     things up into page-by-page DMAs anyway.  I'd much rather just hand the
>     I/O subsystem a list of vm_page_t's without bothering to map them into
>     KVM.

See solaris && SunOS for this one. Also, the busdma code doesn't 'have' to
break things up. If the underlying physical pages are contiguous then there's
no need to have multiple entries.

You should note, btw, that not all archictures require or can use
scatter-gather (sparc, for instance, which has an iommu).


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 12: 8:19 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from aslan.scsiguy.com (mail.scsiguy.com [63.229.232.106])
	by hub.freebsd.org (Postfix) with ESMTP
	id 1ACCE37B491; Mon,  5 Feb 2001 12:08:01 -0800 (PST)
Received: from scsiguy.com (localhost [127.0.0.1])
	by aslan.scsiguy.com (8.11.0/8.9.3) with ESMTP id f15K6bO49659;
	Mon, 5 Feb 2001 13:06:54 -0700 (MST)
	(envelope-from gibbs@scsiguy.com)
Message-Id: <200102052006.f15K6bO49659@aslan.scsiguy.com>
To: Randell Jesup <rjesup@wgate.com>
Cc: Matt Dillon <dillon@earth.backplane.com>,
	Matthew Jacob <mjacob@feral.com>, Mike Smith <msmith@FreeBSD.ORG>,
	Dag-Erling Smorgrav <des@ofug.org>,
	Dan Nelson <dnelson@emsphone.com>,
	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) 
In-Reply-To: Your message of "05 Feb 2001 12:30:50 EST."
             <ybuelxdnik5.fsf@jesup.eng.tvol.net.jesup.eng.tvol.net> 
Date: Mon, 05 Feb 2001 13:06:37 -0700
From: "Justin T. Gibbs" <gibbs@scsiguy.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

>>    (2) Modify the 'struct buf' b_pages[] array to instead be a pointer
>>	to an array.  Include the original static array under another name
>>	for compatibility purposes and have the init code default to 
>>	assigning b_pages to the original embedded static array.
>>
>>	Then the physio code could be adjusted to dynamically MALLOC the
>>	necessary pages array if the static one in the supplied buffer is
>>	insufficient.
>
>        So, how reasonable is this?  It seems like a pretty good solution,
>but I'm far from up-to-speed on the internals here.

I'd rather allow bufs (or bios) to be chained and let the block devices
decide how to break them up.  This simplifies the clustering code too
as you avoid all of the VM operations to combine bufs into a single cluster
buf.

--
Justin


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 12:47:44 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP
	id 8327237B491; Mon,  5 Feb 2001 12:47:25 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f15Kl7S09686;
	Mon, 5 Feb 2001 12:47:07 -0800 (PST)
Date: Mon, 5 Feb 2001 12:47:07 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
Cc: Randell Jesup <rjesup@wgate.com>,
	Matt Dillon <dillon@earth.backplane.com>,
	Matthew Jacob <mjacob@feral.com>, Mike Smith <msmith@FreeBSD.ORG>,
	Dag-Erling Smorgrav <des@ofug.org>,
	Dan Nelson <dnelson@emsphone.com>,
	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
Message-ID: <20010205124707.Y26076@fw.wintelcom.net>
References: <ybuelxdnik5.fsf@jesup.eng.tvol.net.jesup.eng.tvol.net> <200102052006.f15K6bO49659@aslan.scsiguy.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200102052006.f15K6bO49659@aslan.scsiguy.com>; from gibbs@scsiguy.com on Mon, Feb 05, 2001 at 01:06:37PM -0700
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Justin T. Gibbs <gibbs@scsiguy.com> [010205 12:08] wrote:
> >>    (2) Modify the 'struct buf' b_pages[] array to instead be a pointer
> >>	to an array.  Include the original static array under another name
> >>	for compatibility purposes and have the init code default to 
> >>	assigning b_pages to the original embedded static array.
> >>
> >>	Then the physio code could be adjusted to dynamically MALLOC the
> >>	necessary pages array if the static one in the supplied buffer is
> >>	insufficient.
> >
> >        So, how reasonable is this?  It seems like a pretty good solution,
> >but I'm far from up-to-speed on the internals here.
> 
> I'd rather allow bufs (or bios) to be chained and let the block devices
> decide how to break them up.  This simplifies the clustering code too
> as you avoid all of the VM operations to combine bufs into a single cluster
> buf.

One of the suggestions that Poul-Henning made was to have the device
somehow specify an optimal clustering strategy, being able to specify
bounds and sizes.

For instance an NFS commit request could be megabytes in size,
while a NFS write may not want any clustering at all.

A RAID request might want to ask for a megabyte of data, but have
it in a range on the device level.

Currently (i think) we only cluster based on logical file offsets,
it would be interesting to allow drivers to do callbacks into the
FS to ask for blocks physically adjacent to the blocks being written.

This is because a 64k block of any file may actually be spread out
across any position, even though UFS tries to reduce fragmentation,
the worse case is that we do the vm ops to cluster non-physically
contiguous blocks.

I think the simplest way to do this would be to rip out the current
clustering code and provide helper routines for the devices to get
adjacent blocks, either logically via VOP or physically via some VFS
mechanism.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 13: 2:30 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147])
	by hub.freebsd.org (Postfix) with ESMTP
	id 909E337B491; Mon,  5 Feb 2001 13:02:09 -0800 (PST)
Received: from critter (localhost [127.0.0.1])
	by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id f15L1fB28620;
	Mon, 5 Feb 2001 22:01:41 +0100 (CET)
	(envelope-from phk@critter.freebsd.dk)
To: Alfred Perlstein <bright@wintelcom.net>
Cc: "Justin T. Gibbs" <gibbs@scsiguy.com>,
	Randell Jesup <rjesup@wgate.com>,
	Matt Dillon <dillon@earth.backplane.com>,
	Matthew Jacob <mjacob@feral.com>, Mike Smith <msmith@FreeBSD.ORG>,
	Dag-Erling Smorgrav <des@ofug.org>,
	Dan Nelson <dnelson@emsphone.com>,
	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) 
In-Reply-To: Your message of "Mon, 05 Feb 2001 12:47:07 PST."
             <20010205124707.Y26076@fw.wintelcom.net> 
Date: Mon, 05 Feb 2001 22:01:41 +0100
Message-ID: <28618.981406901@critter>
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In message <20010205124707.Y26076@fw.wintelcom.net>, Alfred Perlstein writes:

>One of the suggestions that Poul-Henning made was to have the device
>somehow specify an optimal clustering strategy, being able to specify
>bounds and sizes.
>
>[...]
>
>Currently (i think) we only cluster based on logical file offsets,
>it would be interesting to allow drivers to do callbacks into the
>FS to ask for blocks physically adjacent to the blocks being written.

I've been playing with various ideas in this area, and to be frank,
totally failed to come up with a breakthrough.

Give methods like striping and RAID-5, it becomes nontrivial to
find a specification language for the driver to say "it would be
quick to write the following blocks also" and it would be even
slower to determine if this was indeed feasible.

"feasible" covers not only "do we have it in RAM", but also "is it
already scheduled for writing", "is it dirty" and not the least
"would softupdates take a fit if we wrote it".

The best I have been able to do so far is if the device-driver
can specify the following quantities:

	(M) maxmimum request size
	(R) preferred request size
	(B) preferred request sector boundary 

The clustering code would then try to increase request to:

	N * R sectors starting X
	where X mod B == 0
	and N * R <= M

Having found a cluster opportunity, the cluster code will
issue the read/write request specifying:

	(E) First possible sector in request
	(S) First mandatory sector in request
	(L) Last mandatory sector in request
	(F) Lase possible sector in request
	(B) Sector address of (S) on media.

The driver has to process the data from [S ... L],
and can optionally process [E...S[ and ]L...F] if
that seems convenient.

If somebody is looking for a good project, benchmarking
the performance of our current clustering and playing
around with various changes would not be the worst 
way to spend some winter evenings.  Playing with FFS/UFS
options (block/fragment etc) at the same time may be
worth while.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 13:10:16 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from feral.com (feral.com [192.67.166.1])
	by hub.freebsd.org (Postfix) with ESMTP id 2879737B4EC
	for <arch@FreeBSD.ORG>; Mon,  5 Feb 2001 13:09:59 -0800 (PST)
Received: from zeppo.feral.com (IDENT:mjacob@zeppo [192.67.166.71])
	by feral.com (8.9.3/8.9.3) with ESMTP id NAA04841
	for <arch@FreeBSD.ORG>; Mon, 5 Feb 2001 13:10:01 -0800
Date: Mon, 5 Feb 2001 13:09:54 -0800 (PST)
From: Matthew Jacob <mjacob@feral.com>
Reply-To: mjacob@feral.com
To: arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in
 i386) 
In-Reply-To: <28618.981406901@critter>
Message-ID: <Pine.LNX.4.21.0102051304200.2887-100000@zeppo.feral.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


All of this is nice and fine, but the take home notion here is that there's
more  than a "maximum" or a "preferred" size. There's also a "required request
size". And this isn't a constant value you can stash in a dev_t- or you'll
have to have drivers change it as required.

It seems to me that the physio should just be beefed up to take an argument to
a 'parameterization' function, and that flags could be used that say "we don't
even need this mapped any where- just make sure that the pages referred to are
resident".

All of the other stuff is really more of a tight interaction with VM for
optimizing.

-matt


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 13:15:13 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from mass.dis.org (mass.dis.org [216.240.45.41])
	by hub.freebsd.org (Postfix) with ESMTP id ED5D537B401
	for <arch@FreeBSD.ORG>; Mon,  5 Feb 2001 13:14:55 -0800 (PST)
Received: from mass.dis.org (localhost [127.0.0.1])
	by mass.dis.org (8.11.1/8.11.1) with ESMTP id f15LFoe01152;
	Mon, 5 Feb 2001 13:15:58 -0800 (PST)
	(envelope-from msmith@mass.dis.org)
Message-Id: <200102052115.f15LFoe01152@mass.dis.org>
X-Mailer: exmh version 2.1.1 10/15/1999
To: mjacob@feral.com
Cc: arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) 
In-reply-to: Your message of "Mon, 05 Feb 2001 13:09:54 PST."
             <Pine.LNX.4.21.0102051304200.2887-100000@zeppo.feral.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Mon, 05 Feb 2001 13:15:50 -0800
From: Mike Smith <msmith@freebsd.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> It seems to me that the physio should just be beefed up to take an argument to
> a 'parameterization' function, and that flags could be used that say "we don't
> even need this mapped any where- just make sure that the pages referred to are
> resident".

This is more or less what Matt was talking about; the mapping of buffer 
pages into linear KVM should be optional based on a driver attribute (or, 
perhaps preferably, only performed at the driver's request).

I'm sure that someone will eventually get around to doing something about 
this...

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
           V I C T O R Y   N O T   V E N G E A N C E


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 13:17:27 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147])
	by hub.freebsd.org (Postfix) with ESMTP id 76E6D37B4EC
	for <arch@FreeBSD.ORG>; Mon,  5 Feb 2001 13:17:09 -0800 (PST)
Received: from critter (localhost [127.0.0.1])
	by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id f15LHFB28842;
	Mon, 5 Feb 2001 22:17:15 +0100 (CET)
	(envelope-from phk@critter.freebsd.dk)
To: mjacob@feral.com
Cc: arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) 
In-Reply-To: Your message of "Mon, 05 Feb 2001 13:09:54 PST."
             <Pine.LNX.4.21.0102051304200.2887-100000@zeppo.feral.com> 
Date: Mon, 05 Feb 2001 22:17:15 +0100
Message-ID: <28840.981407835@critter>
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In message <Pine.LNX.4.21.0102051304200.2887-100000@zeppo.feral.com>, Matthew Jacob writes:
>
>All of this is nice and fine, but the take home notion here is that there's
>more  than a "maximum" or a "preferred" size. There's also a "required request
>size". And this isn't a constant value you can stash in a dev_t- or you'll
>have to have drivers change it as required.
>
>It seems to me that the physio should just be beefed up to take an argument to
>a 'parameterization' function, and that flags could be used that say "we don't
>even need this mapped any where- just make sure that the pages referred to are
>resident".

This is a different issue.  Yes, I want us to be able to handle
unmapped pages with struct bio, but that is an entirely separate
(and simpler) issue than how clustering is done.

To make struct bio handle unmapped memory, all you have to
do is this:

	1.  Add a driver flag which means "I can do unmapped
	    struct bio": D_UNMAPPEDBIO.

	2.  Add code to specfs::specstrategy():

		if (!(devsw(dev_t)->d_flags & D_UNMAPPEDBIO)) {
			if (bio_is_unmapped(bio))
				map_bio(bio);
		}

	3.  Add the fields you need to struct bio.

	4.  Write a driver which DTRT.

	5.  Make upper kernel and filesystems use the new facility.

By all means attack this if you have the foo it takes.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 13:24:29 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP
	id B246C37B401; Mon,  5 Feb 2001 13:24:07 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f15LLr011092;
	Mon, 5 Feb 2001 13:21:53 -0800 (PST)
Date: Mon, 5 Feb 2001 13:21:52 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: "Justin T. Gibbs" <gibbs@scsiguy.com>,
	Randell Jesup <rjesup@wgate.com>,
	Matt Dillon <dillon@earth.backplane.com>,
	Matthew Jacob <mjacob@feral.com>, Mike Smith <msmith@FreeBSD.ORG>,
	Dag-Erling Smorgrav <des@ofug.org>,
	Dan Nelson <dnelson@emsphone.com>,
	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
Message-ID: <20010205132152.E26076@fw.wintelcom.net>
References: <20010205124707.Y26076@fw.wintelcom.net> <28618.981406901@critter>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <28618.981406901@critter>; from phk@critter.freebsd.dk on Mon, Feb 05, 2001 at 10:01:41PM +0100
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Poul-Henning Kamp <phk@critter.freebsd.dk> [010205 13:01] wrote:
> In message <20010205124707.Y26076@fw.wintelcom.net>, Alfred Perlstein writes:
> 
> >One of the suggestions that Poul-Henning made was to have the device
> >somehow specify an optimal clustering strategy, being able to specify
> >bounds and sizes.
> >
> >[...]
> >
> >Currently (i think) we only cluster based on logical file offsets,
> >it would be interesting to allow drivers to do callbacks into the
> >FS to ask for blocks physically adjacent to the blocks being written.
> 
> I've been playing with various ideas in this area, and to be frank,
> totally failed to come up with a breakthrough.
> 
> Give methods like striping and RAID-5, it becomes nontrivial to
> find a specification language for the driver to say "it would be
> quick to write the following blocks also" and it would be even
> slower to determine if this was indeed feasible.

You're right, it's non-trivial, however the difference between
memory and disk speed is also non-trivial, almost every reasonable
algorithm should be considered to reduce/optimize disk traffic.

A simple call into the VFS should be able to accomplish, afaik when
a VFS has a disk/physical backing it also hashes/sorts bufs based
on physicall backing location.  Although I may be remebering stuff
from 4.3BSD or 4.4BSD instead of the current code...

In fact if it is stored and hashed in the bufs you really don't need
a callback into the VFS, you just need a generic function to call
that gathers physically contig blocks that are dirty, unlocked and
actually contiguous.

> "feasible" covers not only "do we have it in RAM", but also "is it
> already scheduled for writing", "is it dirty" and not the least
> "would softupdates take a fit if we wrote it".

This is why callbacks into the VFS are probably a good idea along
with a generic function that accomplishes what we currently do,
except without the vm-remapping into the pbuf.  (use a linked
chain of bufs instead)

> The best I have been able to do so far is if the device-driver
> can specify the following quantities:
> 
> 	(M) maxmimum request size
> 	(R) preferred request size
> 	(B) preferred request sector boundary 
> 
> The clustering code would then try to increase request to:
> 
> 	N * R sectors starting X
> 	where X mod B == 0
> 	and N * R <= M
> 
> Having found a cluster opportunity, the cluster code will
> issue the read/write request specifying:
> 
> 	(E) First possible sector in request
> 	(S) First mandatory sector in request
> 	(L) Last mandatory sector in request
> 	(F) Lase possible sector in request
> 	(B) Sector address of (S) on media.
>
> The driver has to process the data from [S ... L],
> and can optionally process [E...S[ and ]L...F] if
> that seems convenient.


Well, there's some assertions and questions I have about this:

1) a device should not refuse to write a block unless there's an
   error, meaning if 'S' can't be satisfied, it should at least
   write the single block out.
   I think S & L pretty much have to be equal to each other otherwise
   we can have tricky issues to deal with there S through L never
   become clusterable (they are locked for long periods, or just
   clean)

2) the device should be able to allow a certain amount of
   fragmentation, currently (afaik) the clustering code does
   not tolerate gaps, clean bufs and locked bufs within the
   request, this ought to be changed, there's no reason why
   a request really needs to be completely contiguous as the
   really painful part of disk io, is the seek, being able
   to cluster data with gaps on the same track/cyl is much
   more important than not having any breaks in it at all.

3) with #2, it would be important to specify a tolerance for such
   'holes' in the cluster operation in case the device does have
   a penalty for gaps.

> If somebody is looking for a good project, benchmarking
> the performance of our current clustering and playing
> around with various changes would not be the worst 
> way to spend some winter evenings.  Playing with FFS/UFS
> options (block/fragment etc) at the same time may be
> worth while.

Actually, I'm not looking for a project, I'm looking for time. :)

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 13:34: 6 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4982537B401; Mon,  5 Feb 2001 13:33:48 -0800 (PST)
Received: from critter (localhost [127.0.0.1])
	by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id f15LXaB28964;
	Mon, 5 Feb 2001 22:33:36 +0100 (CET)
	(envelope-from phk@critter.freebsd.dk)
To: Alfred Perlstein <bright@wintelcom.net>
Cc: "Justin T. Gibbs" <gibbs@scsiguy.com>,
	Randell Jesup <rjesup@wgate.com>,
	Matt Dillon <dillon@earth.backplane.com>,
	Matthew Jacob <mjacob@feral.com>, Mike Smith <msmith@FreeBSD.ORG>,
	Dag-Erling Smorgrav <des@ofug.org>,
	Dan Nelson <dnelson@emsphone.com>,
	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) 
In-Reply-To: Your message of "Mon, 05 Feb 2001 13:21:52 PST."
             <20010205132152.E26076@fw.wintelcom.net> 
Date: Mon, 05 Feb 2001 22:33:36 +0100
Message-ID: <28962.981408816@critter>
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


>You're right, it's non-trivial, however the difference between
>memory and disk speed is also non-trivial, almost every reasonable
>algorithm should be considered to reduce/optimize disk traffic.
>
>A simple call into the VFS should be able to accomplish, afaik when
>a VFS has a disk/physical backing it also hashes/sorts bufs based
>on physicall backing location.  Although I may be remebering stuff
>from 4.3BSD or 4.4BSD instead of the current code...

It's not "a simple call".

By the time you can make the call, you have passed through the
target FS, through specfs and the disklabel/slice code, possibly
through a layer like vinum and ccd (which may have their own ideas
about clustering) and only then do you arrive at a place where you
know the actual sector address of the request.

We can quickly dismiss the ccd/vinum case by saying that they
have to cater for the needs of the lower devices, and they
specify the clustering policy "like any other disk".

But you still have to contend with the diskslice/label code, and
specfs, so even if you do an "upcall" and find more stuff you can
read/write, you need to pass this bit of the request down through
the specfs (for softupdates rollback/forward) and diskslice/label
code (because you want boundary checking).

And having tried that, I can say with 100% conviction: that is not
an sane option, and if you do it anyway you will certainly not
gain any performance by the time you have resolved all the locking
issues.

Giving some kind of abstract hint from the driver/device and making
the clustering optional for the driver is the only path which does
not lead straight down to layering insanity.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 13:54:17 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP
	id 8F99237B401; Mon,  5 Feb 2001 13:53:56 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f15LpnV12266;
	Mon, 5 Feb 2001 13:51:49 -0800 (PST)
Date: Mon, 5 Feb 2001 13:51:49 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: "Justin T. Gibbs" <gibbs@scsiguy.com>,
	Randell Jesup <rjesup@wgate.com>,
	Matt Dillon <dillon@earth.backplane.com>,
	Matthew Jacob <mjacob@feral.com>, Mike Smith <msmith@FreeBSD.ORG>,
	Dag-Erling Smorgrav <des@ofug.org>,
	Dan Nelson <dnelson@emsphone.com>,
	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
Message-ID: <20010205135149.G26076@fw.wintelcom.net>
References: <20010205132152.E26076@fw.wintelcom.net> <28962.981408816@critter>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <28962.981408816@critter>; from phk@critter.freebsd.dk on Mon, Feb 05, 2001 at 10:33:36PM +0100
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Poul-Henning Kamp <phk@critter.freebsd.dk> [010205 13:33] wrote:
> 
> >You're right, it's non-trivial, however the difference between
> >memory and disk speed is also non-trivial, almost every reasonable
> >algorithm should be considered to reduce/optimize disk traffic.
> >
> >A simple call into the VFS should be able to accomplish, afaik when
> >a VFS has a disk/physical backing it also hashes/sorts bufs based
> >on physicall backing location.  Although I may be remebering stuff
> >from 4.3BSD or 4.4BSD instead of the current code...
> 
> It's not "a simple call".
> 
> By the time you can make the call, you have passed through the
> target FS, through specfs and the disklabel/slice code, possibly
> through a layer like vinum and ccd (which may have their own ideas
> about clustering) and only then do you arrive at a place where you
> know the actual sector address of the request.
> 
> We can quickly dismiss the ccd/vinum case by saying that they
> have to cater for the needs of the lower devices, and they
> specify the clustering policy "like any other disk".
> 
> But you still have to contend with the diskslice/label code, and
> specfs, so even if you do an "upcall" and find more stuff you can
> read/write, you need to pass this bit of the request down through
> the specfs (for softupdates rollback/forward) and diskslice/label
> code (because you want boundary checking).
> 
> And having tried that, I can say with 100% conviction: that is not
> an sane option, and if you do it anyway you will certainly not
> gain any performance by the time you have resolved all the locking
> issues.

Well, my impression was that all locking operation (except mutexes)
should be resolved by doing try_lockfoo() and if try_lock fails then
don't cluster that object/buf/vnode (as the current code does).

You are right though, I guess we don't need callbacks into the VFS,
this can be resolved with just the buffer system via flags and locks.

> Giving some kind of abstract hint from the driver/device and making
> the clustering optional for the driver is the only path which does
> not lead straight down to layering insanity.

I'm not sure I understand what you mean, my vision of the current
code is:

  Kernel IO request triggered via FS/bufdeamon/etc
      | 1 buf
  cluster_foo
      | 1-N bufs (in a pbuf)
    device
      |
     write

What I'd like to see (considering we don't need to really involve
VFS) is:

  Kernel IO request triggered via FS/bufdeamon/etc
      | 1 buf
     device  ---------> cluster routine (A)
      |                        /
     device  <----------------/
      |          1-N bufs (linked list, no pbuf)
     write

This way the device can call into any number of generic clustering
routines if it wants to support them.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 14: 2:23 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147])
	by hub.freebsd.org (Postfix) with ESMTP
	id 6B54937B401; Mon,  5 Feb 2001 14:02:03 -0800 (PST)
Received: from critter (localhost [127.0.0.1])
	by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id f15M1gB29136;
	Mon, 5 Feb 2001 23:01:42 +0100 (CET)
	(envelope-from phk@critter.freebsd.dk)
To: Alfred Perlstein <bright@wintelcom.net>
Cc: "Justin T. Gibbs" <gibbs@scsiguy.com>,
	Randell Jesup <rjesup@wgate.com>,
	Matt Dillon <dillon@earth.backplane.com>,
	Matthew Jacob <mjacob@feral.com>, Mike Smith <msmith@FreeBSD.ORG>,
	Dag-Erling Smorgrav <des@ofug.org>,
	Dan Nelson <dnelson@emsphone.com>,
	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) 
In-Reply-To: Your message of "Mon, 05 Feb 2001 13:51:49 PST."
             <20010205135149.G26076@fw.wintelcom.net> 
Date: Mon, 05 Feb 2001 23:01:42 +0100
Message-ID: <29134.981410502@critter>
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


>> Giving some kind of abstract hint from the driver/device and making
>> the clustering optional for the driver is the only path which does
>> not lead straight down to layering insanity.
>
>I'm not sure I understand what you mean, my vision of the current
>code is:

As others have pointed out, if the requirement that pages be mapped
contiguously for an struct bio request is relaxed, many more clustering
opportunities are expected and some mapping/unmapping operations can
be avoided.

Some argue that it is "some ... many ..." rather than the other
way around.

Either way it should be a gain.

I think it makes sense to try to grab that piece of fruit first,
since it has obvious benefits whereas most of the rest of the
suggestions are in the "pure speculation" range and not testable
without unmapped pages in struct bio.

One way or another, benchmarking will be needed and just what is
a good workload to benchmark on ?  Is make world representative ?
If not, we should establish a reproducible benchmark some other
way.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 14:27:36 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from aslan.scsiguy.com (aslan.scsiguy.com [63.229.232.106])
	by hub.freebsd.org (Postfix) with ESMTP
	id F097237B4EC; Mon,  5 Feb 2001 14:27:15 -0800 (PST)
Received: from scsiguy.com (localhost [127.0.0.1])
	by aslan.scsiguy.com (8.11.0/8.9.3) with ESMTP id f15MO2O51248;
	Mon, 5 Feb 2001 15:24:14 -0700 (MST)
	(envelope-from gibbs@scsiguy.com)
Message-Id: <200102052224.f15MO2O51248@aslan.scsiguy.com>
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: Alfred Perlstein <bright@wintelcom.net>,
	Randell Jesup <rjesup@wgate.com>,
	Matt Dillon <dillon@earth.backplane.com>,
	Matthew Jacob <mjacob@feral.com>, Mike Smith <msmith@FreeBSD.ORG>,
	Dag-Erling Smorgrav <des@ofug.org>,
	Dan Nelson <dnelson@emsphone.com>,
	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) 
In-Reply-To: Your message of "Mon, 05 Feb 2001 22:33:36 +0100."
             <28962.981408816@critter> 
Date: Mon, 05 Feb 2001 15:24:02 -0700
From: "Justin T. Gibbs" <gibbs@scsiguy.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

>
>It's not "a simple call".
>

It doesn't have to be a simple call if it only occurs once on mount
and whenever a component makes an async upcall telling the system that
its state has changed (array is degraded, or perhaps commonly accessed
data has migrated to a different striping or RAID layout).

--
Justin


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 14:37:14 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147])
	by hub.freebsd.org (Postfix) with ESMTP
	id 385E737B684; Mon,  5 Feb 2001 14:36:56 -0800 (PST)
Received: from critter (localhost [127.0.0.1])
	by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id f15MaqB29301;
	Mon, 5 Feb 2001 23:36:52 +0100 (CET)
	(envelope-from phk@critter.freebsd.dk)
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
Cc: Alfred Perlstein <bright@wintelcom.net>,
	Randell Jesup <rjesup@wgate.com>,
	Matt Dillon <dillon@earth.backplane.com>,
	Matthew Jacob <mjacob@feral.com>, Mike Smith <msmith@FreeBSD.ORG>,
	Dag-Erling Smorgrav <des@ofug.org>,
	Dan Nelson <dnelson@emsphone.com>,
	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) 
In-Reply-To: Your message of "Mon, 05 Feb 2001 15:24:02 MST."
             <200102052224.f15MO2O51248@aslan.scsiguy.com> 
Date: Mon, 05 Feb 2001 23:36:52 +0100
Message-ID: <29299.981412612@critter>
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In message <200102052224.f15MO2O51248@aslan.scsiguy.com>, "Justin T. Gibbs" writes:
>>
>>It's not "a simple call".
>>
>
>It doesn't have to be a simple call if it only occurs once on mount
>and whenever a component makes an async upcall telling the system that
>its state has changed (array is degraded, or perhaps commonly accessed
>data has migrated to a different striping or RAID layout).

I think we are talking too many different things at the same time here.

The upcall I (and I belive Alfred) were discussing were happening
once per I/O.

The one you are talking about is obviously the one to formulate an
abstract clustering preference for a device ? 


--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 14:39: 9 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from grendel.bsdi.com (grendel.twistedbit.com [199.79.183.5])
	by hub.freebsd.org (Postfix) with ESMTP
	id 19FD837B684; Mon,  5 Feb 2001 14:38:50 -0800 (PST)
Received: from grendel.bsdi.com (cp@localhost.bsdi.com [127.0.0.1])
	by grendel.bsdi.com (8.11.1/8.9.3) with ESMTP id f15MYfW96817;
	Mon, 5 Feb 2001 15:34:41 -0700 (MST)
	(envelope-from cp@grendel.bsdi.com)
Message-Id: <200102052234.f15MYfW96817@grendel.bsdi.com>
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
Cc: Poul-Henning Kamp <phk@critter.freebsd.dk>,
	Alfred Perlstein <bright@wintelcom.net>,
	Randell Jesup <rjesup@wgate.com>,
	Matt Dillon <dillon@earth.backplane.com>,
	Matthew Jacob <mjacob@feral.com>, Mike Smith <msmith@FreeBSD.ORG>,
	Dag-Erling Smorgrav <des@ofug.org>,
	Dan Nelson <dnelson@emsphone.com>,
	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) 
In-reply-to: Your message of "Mon, 05 Feb 2001 15:24:02 MST."
             <200102052224.f15MO2O51248@aslan.scsiguy.com> 
From: Chuck Paterson <cp@bsdi.com>
Date: Mon, 05 Feb 2001 15:34:41 -0700
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

	In the discussions I noticed someone mentioned some
of the issues with architectures like Sparc. I haven't noticed
anyone discuss the need to deal with the limited DVMA space. You
really need to have some reservation policy on the buffer before
you send them down to a driver, or at least have the
driver do a call to get a reservatioin commitment before
actually doing the map ins. If not you could have problems
like two drivers trying to map there io buffer, both having them
half mapped and unable to get the resouces to finish the mapping.

Chuck
Chuck


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 14:51:20 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from feral.com (feral.com [192.67.166.1])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9DADF37B67D; Mon,  5 Feb 2001 14:50:59 -0800 (PST)
Received: from zeppo.feral.com (IDENT:mjacob@zeppo [192.67.166.71])
	by feral.com (8.9.3/8.9.3) with ESMTP id OAA05261;
	Mon, 5 Feb 2001 14:50:07 -0800
Date: Mon, 5 Feb 2001 14:50:04 -0800 (PST)
From: Matthew Jacob <mjacob@feral.com>
Reply-To: mjacob@feral.com
To: Chuck Paterson <cp@bsdi.com>
Cc: "Justin T. Gibbs" <gibbs@scsiguy.com>,
	Poul-Henning Kamp <phk@critter.freebsd.dk>,
	Alfred Perlstein <bright@wintelcom.net>,
	Randell Jesup <rjesup@wgate.com>,
	Matt Dillon <dillon@earth.backplane.com>,
	Mike Smith <msmith@FreeBSD.ORG>, Dag-Erling Smorgrav <des@ofug.org>,
	Dan Nelson <dnelson@emsphone.com>,
	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>, arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in
 i386) 
In-Reply-To: <200102052234.f15MYfW96817@grendel.bsdi.com>
Message-ID: <Pine.LNX.4.21.0102051442410.3402-100000@zeppo.feral.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Mon, 5 Feb 2001, Chuck Paterson wrote:

> 	In the discussions I noticed someone mentioned some
> of the issues with architectures like Sparc. I haven't noticed
> anyone discuss the need to deal with the limited DVMA space. You
> really need to have some reservation policy on the buffer before
> you send them down to a driver, or at least have the
> driver do a call to get a reservatioin commitment before
> actually doing the map ins. If not you could have problems
> like two drivers trying to map there io buffer, both having them
> half mapped and unable to get the resouces to finish the mapping.

True enough- but this is true for a single process that needs to map
more than any specific limited resource- so it isn't just two processes
getting deadlocked.

That's specifically why a 'mapping window' approach was added to the Solaris
DDI DMA model- this allowed one to do a dma transfer for darn near all of
physical memory as long as you had a device that could shift the mapping
window as needed during the transfer (yes, I actually did test it- it was
*wierd* doing 28MB 'single' dma transfers on a Sparc2).

From a more or less practical point of view, the newer Ultra machines have a
programmable iommu that allows you to pretty much map up to a gig of
memory. Then it becomes a very very interesting dance using full, uh, 36 bit I
think, physical address and some undefined stuff about I/O coherencey in that
case. I'll assert that FreeBSD, should it do a sparc port, shouldn't have the
slightest interest in anything less than this class of machines.

-matt


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 15:25:21 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
	by hub.freebsd.org (Postfix) with ESMTP
	id 62F2B37B698; Mon,  5 Feb 2001 15:25:03 -0800 (PST)
Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102])
	by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id KAA03439;
	Tue, 6 Feb 2001 10:24:59 +1100
Date: Tue, 6 Feb 2001 10:24:39 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-Sender: bde@besplex.bde.org
To: Robert Watson <rwatson@FreeBSD.ORG>
Cc: Nathan Gould <ngould@zoo.co.uk>, freebsd-arch@FreeBSD.ORG
Subject: Re: Tests for NULL p_ucred under p_cred -- are they needed?
In-Reply-To: <Pine.NEB.3.96L.1010205102219.74962L-100000@fledge.watson.org>
Message-ID: <Pine.BSF.4.21.0102060944420.21359-100000@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Mon, 5 Feb 2001, Robert Watson wrote:

> In any case, there seems to be some inconsistency.  It would seem that
> either (a) it is an invariant that p_cred is non-NULL for all reachable
> processes via various process lists (except unused processes), (b) it's an
> invariant that p_cred is non-NULL between the end of fork1() and the
> beginning of exit(), and that p_cred is therefore always defined if you're
> acting on behalf of the process, but not necessarily if you're acting on
> the process.
> 
> Clearly, (1) would make life easier, and mean we could remove a fair
> number of checks.  However, it may be that (b) is the case, in which case
> the signal code might require fixing, or the invariants it depends on at
> least require documenting.  This relevant also as I overhaul the process
> access control routines, because I need to know if it's possible to have
> processes without credentials, and if so, what it means.

p_cred is actually non-NULL until the middle of wait1(), so we are at
least close to case (a), and processes "always" have credentials -- even
zombies have them.

Bruce


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 15:46: 0 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from molly.straylight.com (molly.straylight.com [209.68.199.242])
	by hub.freebsd.org (Postfix) with ESMTP id 1D63337B6A2
	for <freebsd-arch@freebsd.org>; Mon,  5 Feb 2001 15:45:43 -0800 (PST)
Received: from dickie (case.straylight.com [209.68.199.244])
	by molly.straylight.com (8.11.0/8.10.0) with SMTP id f15NjbX18424
	for <freebsd-arch@freebsd.org>; Mon, 5 Feb 2001 15:45:37 -0800
From: "Jonathan Graehl" <jonathan@graehl.org>
To: <freebsd-arch@freebsd.org>
Subject: nonblocking sockets and EINTR
Date: Mon, 5 Feb 2001 15:46:20 -0800
Message-ID: <NCBBLOALCKKINBNNEDDLOEFNDKAA.jonathan@graehl.org>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

If a TCP or UDP socket is set nonblocking, do I ever have to worry about getting
my system calls for those sockets interrupted?  It is my understanding that you
should only have to check for EINTR for "slow" system calls (that can take an
indefinite amount of time), which should mean I'm home free, since the operation
either completes immediately, or I get EWOULDBLOCK.

For now, since I am not sure I can count on this behavior, I block all nonfatal
signals.  I would like to be able to use signals to communicate to my daemon
(with the caveat that I may get an EINTR for my kevent call, but not for any of
my socket operations).

Is there any standard behavior I can count on for nonblocking sockets w.r.t.
EINTR?

Thanks ...

--
Jonathan Graehl
  email: jonathan@graehl.org
  web: http://jonathan.graehl.org/
  phone: 858-642-7562


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 15:49: 0 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 07F8537B6A2
	for <freebsd-arch@FreeBSD.ORG>; Mon,  5 Feb 2001 15:48:44 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f15NmhR16126;
	Mon, 5 Feb 2001 15:48:43 -0800 (PST)
Date: Mon, 5 Feb 2001 15:48:43 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Jonathan Graehl <jonathan@graehl.org>
Cc: freebsd-arch@FreeBSD.ORG
Subject: Re: nonblocking sockets and EINTR
Message-ID: <20010205154842.J26076@fw.wintelcom.net>
References: <NCBBLOALCKKINBNNEDDLOEFNDKAA.jonathan@graehl.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <NCBBLOALCKKINBNNEDDLOEFNDKAA.jonathan@graehl.org>; from jonathan@graehl.org on Mon, Feb 05, 2001 at 03:46:20PM -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Jonathan Graehl <jonathan@graehl.org> [010205 15:46] wrote:
> If a TCP or UDP socket is set nonblocking, do I ever have to worry about getting
> my system calls for those sockets interrupted?  It is my understanding that you
> should only have to check for EINTR for "slow" system calls (that can take an
> indefinite amount of time), which should mean I'm home free, since the operation
> either completes immediately, or I get EWOULDBLOCK.
> 
> For now, since I am not sure I can count on this behavior, I block all nonfatal
> signals.  I would like to be able to use signals to communicate to my daemon
> (with the caveat that I may get an EINTR for my kevent call, but not for any of
> my socket operations).
> 
> Is there any standard behavior I can count on for nonblocking sockets w.r.t.
> EINTR?

You can specify that syscalls will or won't be automatically
restarted via the sigaction() API.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 16:10:18 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from mass.dis.org (mass.dis.org [216.240.45.41])
	by hub.freebsd.org (Postfix) with ESMTP id 93F0137B503
	for <arch@FreeBSD.ORG>; Mon,  5 Feb 2001 16:09:59 -0800 (PST)
Received: from mass.dis.org (localhost [127.0.0.1])
	by mass.dis.org (8.11.1/8.11.1) with ESMTP id f160BBe01822;
	Mon, 5 Feb 2001 16:11:12 -0800 (PST)
	(envelope-from msmith@mass.dis.org)
Message-Id: <200102060011.f160BBe01822@mass.dis.org>
X-Mailer: exmh version 2.1.1 10/15/1999
To: Chuck Paterson <cp@bsdi.com>
Cc: arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) 
In-reply-to: Your message of "Mon, 05 Feb 2001 15:34:41 MST."
             <200102052234.f15MYfW96817@grendel.bsdi.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Mon, 05 Feb 2001 16:11:11 -0800
From: Mike Smith <msmith@freebsd.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> 	In the discussions I noticed someone mentioned some
> of the issues with architectures like Sparc. I haven't noticed
> anyone discuss the need to deal with the limited DVMA space. You
> really need to have some reservation policy on the buffer before
> you send them down to a driver, or at least have the
> driver do a call to get a reservatioin commitment before
> actually doing the map ins. If not you could have problems
> like two drivers trying to map there io buffer, both having them
> half mapped and unable to get the resouces to finish the mapping.

This should be handled by having bus_dmamap_load and/or bus_dmamap_sync 
return success values, rather than void like they do now.

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
           V I C T O R Y   N O T   V E N G E A N C E


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 17:16:38 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from molly.straylight.com (molly.straylight.com [209.68.199.242])
	by hub.freebsd.org (Postfix) with ESMTP
	id 6E04F37B6A2; Mon,  5 Feb 2001 17:16:20 -0800 (PST)
Received: from dickie (case.straylight.com [209.68.199.244])
	by molly.straylight.com (8.11.0/8.10.0) with SMTP id f161GDX19005;
	Mon, 5 Feb 2001 17:16:13 -0800
From: "Jonathan Graehl" <jonathan@graehl.org>
To: "Alfred Perlstein" <bright@wintelcom.net>
Cc: <freebsd-arch@freebsd.org>, "Jonathan Lemon" <jlemon@freebsd.org>
Subject: RE: nonblocking sockets and EINTR (kevent does not observe SA_RESTART?)
Date: Mon, 5 Feb 2001 17:16:56 -0800
Message-ID: <NCBBLOALCKKINBNNEDDLGEFPDKAA.jonathan@graehl.org>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200
In-Reply-To: <20010205154842.J26076@fw.wintelcom.net>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> You can specify that syscalls will or won't be automatically
> restarted via the sigaction() API.
>
> --
> -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]

Thank you for reminding me of this (and making me feel like my question could
have been better directed at -questions, if it is so trivially answered ;)

I am using sigaction with SA_RESTART, and I still get EINTR from my kevent call
(no matter, this is easily dealt with, due to the straightforward kevent
semantics).  I assume that SA_RESTART then only applies to the traditional
syscalls (read/write,send/recv), and that this may be an oversight in the kqueue
implementation, at least meriting a warning in the man page

(I also assume that it is not possible to get EINTR for a datagram read/write,
since there is no message handle used in sendto/recvfrom)


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 17:34:14 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from prism.flugsvamp.com (cb58709-a.mdsn1.wi.home.com [24.17.241.9])
	by hub.freebsd.org (Postfix) with ESMTP
	id DB5FA37B491; Mon,  5 Feb 2001 17:33:55 -0800 (PST)
Received: (from jlemon@localhost)
	by prism.flugsvamp.com (8.11.0/8.11.0) id f161Z7Y95228;
	Mon, 5 Feb 2001 19:35:07 -0600 (CST)
	(envelope-from jlemon)
Date: Mon, 5 Feb 2001 19:35:07 -0600
From: Jonathan Lemon <jlemon@flugsvamp.com>
To: Jonathan Graehl <jonathan@graehl.org>
Cc: Alfred Perlstein <bright@wintelcom.net>,
	freebsd-arch@freebsd.org, Jonathan Lemon <jlemon@freebsd.org>
Subject: Re: nonblocking sockets and EINTR (kevent does not observe SA_RESTART?)
Message-ID: <20010205193507.J650@prism.flugsvamp.com>
References: <20010205154842.J26076@fw.wintelcom.net> <NCBBLOALCKKINBNNEDDLGEFPDKAA.jonathan@graehl.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <NCBBLOALCKKINBNNEDDLGEFPDKAA.jonathan@graehl.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Mon, Feb 05, 2001 at 05:16:56PM -0800, Jonathan Graehl wrote:
> > You can specify that syscalls will or won't be automatically
> > restarted via the sigaction() API.
> >
> > --
> > -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
> 
> Thank you for reminding me of this (and making me feel like my question could
> have been better directed at -questions, if it is so trivially answered ;)
> 
> I am using sigaction with SA_RESTART, and I still get EINTR from my kevent call
> (no matter, this is easily dealt with, due to the straightforward kevent
> semantics).  I assume that SA_RESTART then only applies to the traditional
> syscalls (read/write,send/recv), and that this may be an oversight in the kqueue
> implementation, at least meriting a warning in the man page

The difficulty in restarting the kevent call is that it would have
to re-apply the changelist, which is probably not what you want.  The
only case where it is possible to perform a restart is with an empty
changelist.  I didn't put this optimization in, as I think it would be
better if the interface was consistent in all cases.
--
Jonathan


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 17:50:20 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from molly.straylight.com (molly.straylight.com [209.68.199.242])
	by hub.freebsd.org (Postfix) with ESMTP id C898437B503
	for <freebsd-arch@freebsd.org>; Mon,  5 Feb 2001 17:50:00 -0800 (PST)
Received: from dickie (case.straylight.com [209.68.199.244])
	by molly.straylight.com (8.11.0/8.10.0) with SMTP id f161nrX19195;
	Mon, 5 Feb 2001 17:49:53 -0800
From: "Jonathan Graehl" <jonathan@graehl.org>
To: "Jonathan Lemon" <jlemon@flugsvamp.com>
Cc: <freebsd-arch@freebsd.org>
Subject: RE: nonblocking sockets and EINTR (kevent does not observe SA_RESTART?)
Date: Mon, 5 Feb 2001 17:50:37 -0800
Message-ID: <NCBBLOALCKKINBNNEDDLEEGADKAA.jonathan@graehl.org>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200
In-Reply-To: <20010205193507.J650@prism.flugsvamp.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I assume, then, that you guarantee that the changelist is applied (and errors
relating to the changes are placed in the received-events-buffer, if possible)
before the call becomes interruptible?  (and if there were an error that doesn't
fit in the buffer, the return would be immediate with the error code); that is,
only after the process goes to sleep waiting in kqueue, is there the possibility
of an EINTR return?  Or, is there the possibility of the changelist only being
partially executed when the result is EINTR?

I concur that the EINTR semantics are simple and consistent, but perhaps a
warning, to the effect that SA_RESTART does not prevent the EINTR outcome, is in
order (this may be the case for quite a few other syscalls as well, I have no
idea ... but it would be nice to see it documented)

> The difficulty in restarting the kevent call is that it would have
> to re-apply the changelist, which is probably not what you want.  The
> only case where it is possible to perform a restart is with an empty
> changelist.  I didn't put this optimization in, as I think it would be
> better if the interface was consistent in all cases.
> --
> Jonathan
>


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 18:48:55 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp10.phx.gblx.net (smtp10.phx.gblx.net [206.165.6.140])
	by hub.freebsd.org (Postfix) with ESMTP
	id 80A8437B503; Mon,  5 Feb 2001 18:48:35 -0800 (PST)
Received: (from daemon@localhost)
	by smtp10.phx.gblx.net (8.9.3/8.9.3) id TAA35152;
	Mon, 5 Feb 2001 19:47:59 -0700
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp10.phx.gblx.net, id smtpd4XrvEa; Mon Feb  5 19:47:50 2001
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id TAA08217;
	Mon, 5 Feb 2001 19:48:20 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200102060248.TAA08217@usr08.primenet.com>
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
To: phk@critter.freebsd.dk (Poul-Henning Kamp)
Date: Tue, 6 Feb 2001 02:48:19 +0000 (GMT)
Cc: gibbs@scsiguy.com (Justin T. Gibbs),
	bright@wintelcom.net (Alfred Perlstein),
	rjesup@wgate.com (Randell Jesup),
	dillon@earth.backplane.com (Matt Dillon),
	mjacob@feral.com (Matthew Jacob), msmith@FreeBSD.ORG (Mike Smith),
	des@ofug.org (Dag-Erling Smorgrav),
	dnelson@emsphone.com (Dan Nelson),
	tanimura@r.dl.itc.u-tokyo.ac.jp (Seigo Tanimura), arch@FreeBSD.ORG
In-Reply-To: <29299.981412612@critter> from "Poul-Henning Kamp" at Feb 05, 2001 11:36:52 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> >It doesn't have to be a simple call if it only occurs once on mount
> >and whenever a component makes an async upcall telling the system that
> >its state has changed (array is degraded, or perhaps commonly accessed
> >data has migrated to a different striping or RAID layout).
> 
> I think we are talking too many different things at the same time here.

Way too many irons in the fire here...


> The upcall I (and I belive Alfred) were discussing were happening
> once per I/O.

I don't think an upcall is really useful.  Given a stack of things,
possibly including Vinum and friends, it would be really difficult
to get the event propagation semantics right, in any case.  It only
gets worse, with vnode devices and FS stacks.


> The one you are talking about is obviously the one to formulate an
> abstract clustering preference for a device ? 

I still think it might be worthwhile to readdress the seek minimization
code, by reading mode page 2 on SCSI drives, and using the knowledge of
the real seek boundaries.  Your point about whiling away Winter nights
is well taken.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 21:46:37 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67])
	by hub.freebsd.org (Postfix) with ESMTP
	id 20E5E37B401; Mon,  5 Feb 2001 21:46:20 -0800 (PST)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.1/8.9.3) id f165kHq58398;
	Mon, 5 Feb 2001 21:46:17 -0800 (PST)
	(envelope-from dillon)
Date: Mon, 5 Feb 2001 21:46:17 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
Message-Id: <200102060546.f165kHq58398@earth.backplane.com>
To: Terry Lambert <tlambert@primenet.com>
Cc: phk@critter.freebsd.dk (Poul-Henning Kamp),
	gibbs@scsiguy.com (Justin T. Gibbs),
	bright@wintelcom.net (Alfred Perlstein),
	rjesup@wgate.com (Randell Jesup), mjacob@feral.com (Matthew Jacob),
	msmith@FreeBSD.ORG (Mike Smith), des@ofug.org (Dag-Erling Smorgrav),
	dnelson@emsphone.com (Dan Nelson),
	tanimura@r.dl.itc.u-tokyo.ac.jp (Seigo Tanimura), arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
References:  <200102060248.TAA08217@usr08.primenet.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

   At risk of throwing yet another iron into coals....

   The problem here is to try to give a 'hint' to the high level VFS/BIO
   and VM systems.  The hint doesn't have to be correct, it just has to be
   close 'most of the time'. 

   What this means is that we don't have to create massive infrastructure
   to get it exactly right.  Something as simple as an alignment size covers
   a wide range of topologies, including all standard RAID topologies.
   We don't have to propogate information about actual seek boundries,
   or reassigned sectors, for example.  We certainly do not have to
   propogate the information on-the-fly... we can get 95% of the way there
   at mount time, and that's good enough.  We can also simply assume a
   reasonable rule for intermediate topologies such as CCD, VN, or a 
   filesystem... we allow the intermediate layers to modify the parameters
   on their way up, and we assume they will do so prudently.  And we can
   assume for the most part that contiguous blocks translate to contiguous
   blocks 'most of the time', even when reading and writing a file.
   (And I will note here that the clustering code is already aware of the
   most common case -- a logically contiguous file that is not necessarily
   physically contiguous, and the system does the right thing).

   I think the idea Poul originally articulated -- having simple information
   like recommended I/O size, recommended cluster size, and/or maximum I/O
   size, is the correct solution.  Getting fancy might buy us a percent
   or two... it isn't worth the effort.

						-Matt


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 22: 8:23 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP
	id 6B39237B401; Mon,  5 Feb 2001 22:08:06 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f16684g29040;
	Mon, 5 Feb 2001 22:08:04 -0800 (PST)
Date: Mon, 5 Feb 2001 22:08:04 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Jonathan Lemon <jlemon@flugsvamp.com>
Cc: Jonathan Graehl <jonathan@graehl.org>, freebsd-arch@freebsd.org,
	Jonathan Lemon <jlemon@freebsd.org>
Subject: Re: nonblocking sockets and EINTR (kevent does not observe SA_RESTART?)
Message-ID: <20010205220804.M26076@fw.wintelcom.net>
References: <20010205154842.J26076@fw.wintelcom.net> <NCBBLOALCKKINBNNEDDLGEFPDKAA.jonathan@graehl.org> <20010205193507.J650@prism.flugsvamp.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20010205193507.J650@prism.flugsvamp.com>; from jlemon@flugsvamp.com on Mon, Feb 05, 2001 at 07:35:07PM -0600
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Jonathan Lemon <jlemon@flugsvamp.com> [010205 17:33] wrote:
> On Mon, Feb 05, 2001 at 05:16:56PM -0800, Jonathan Graehl wrote:
> > > You can specify that syscalls will or won't be automatically
> > > restarted via the sigaction() API.
> > >
> > > --
> > > -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
> > 
> > Thank you for reminding me of this (and making me feel like my question could
> > have been better directed at -questions, if it is so trivially answered ;)
> > 
> > I am using sigaction with SA_RESTART, and I still get EINTR from my kevent call
> > (no matter, this is easily dealt with, due to the straightforward kevent
> > semantics).  I assume that SA_RESTART then only applies to the traditional
> > syscalls (read/write,send/recv), and that this may be an oversight in the kqueue
> > implementation, at least meriting a warning in the man page
> 
> The difficulty in restarting the kevent call is that it would have
> to re-apply the changelist, which is probably not what you want.  The
> only case where it is possible to perform a restart is with an empty
> changelist.  I didn't put this optimization in, as I think it would be
> better if the interface was consistent in all cases.

I'm pretty sure select() and poll() do not respect SA_RESTART
either, so it's probably best that kevent doesn't as well.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 22:49:28 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp10.phx.gblx.net (smtp10.phx.gblx.net [206.165.6.140])
	by hub.freebsd.org (Postfix) with ESMTP
	id 421A337B503; Mon,  5 Feb 2001 22:49:11 -0800 (PST)
Received: (from daemon@localhost)
	by smtp10.phx.gblx.net (8.9.3/8.9.3) id XAA25964;
	Mon, 5 Feb 2001 23:48:36 -0700
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp10.phx.gblx.net, id smtpddhzwEa; Mon Feb  5 23:48:32 2001
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id XAA12348;
	Mon, 5 Feb 2001 23:49:04 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200102060649.XAA12348@usr08.primenet.com>
Subject: Re: nonblocking sockets and EINTR (kevent does not observe SA_RESTART?)
To: bright@wintelcom.net (Alfred Perlstein)
Date: Tue, 6 Feb 2001 06:49:02 +0000 (GMT)
Cc: jlemon@flugsvamp.com (Jonathan Lemon),
	jonathan@graehl.org (Jonathan Graehl), freebsd-arch@FreeBSD.ORG,
	jlemon@FreeBSD.ORG (Jonathan Lemon)
In-Reply-To: <20010205220804.M26076@fw.wintelcom.net> from "Alfred Perlstein" at Feb 05, 2001 10:08:04 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> I'm pretty sure select() and poll() do not respect SA_RESTART
> either, so it's probably best that kevent doesn't as well.

Historically, select() has respected SA_RESTART; all system
calls respected it; it was the default behaviour for 4.2,
and until the introduction of siginterrupt(), which was
obtained from DEC Ultrix.

The standard way that was used prior to that of causing a
signal handler to actually interrupt a call was to longjmp()
out of the signal handler, with a setjmp() wrapper around
the call being aborted.

It was only after the introduction of POSIX signals, which
have made life hell for wrapping system calls safely, that
the default changed to the POSIX (SVR4) behaviour.

Actually, I don't really see any problem with select() being
restarted, since it's trivial to set the bitmap.  If the call
is interrupted, the bitmap should be unmodified (ready to call
select() and go); if the bitmap was changed, then the bits
which are set are valid, so returning them isn't a problem:
the call has completed, but triggered the trampoline.

The poll() call might be more of a problem, particularly if
we are relying on SIGPOLL to signal pollable events pending
being reaped via a subsequent poll() call.  Otherwise, the
poll() interface is better for restarting than select() is.

Although I doubt we will return to the default-restart of
4.2 and 4.3 (even though it would make a threads library a
trivial thing to write, with all of the signal masking and
unmasking calls being dropped from the overhead), I think
that it would not be impossible to make SA_RESTART work like
POSIX says it should (at least one of the approaches that
has been suggested will work, I think); it's probably worth
the effort to think about how to fix kevent().


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Feb  5 22:58:21 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135])
	by hub.freebsd.org (Postfix) with ESMTP
	id 1F26437B401; Mon,  5 Feb 2001 22:58:04 -0800 (PST)
Received: (from daemon@localhost)
	by smtp05.primenet.com (8.9.3/8.9.3) id XAA25045;
	Mon, 5 Feb 2001 23:53:17 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp05.primenet.com, id smtpdAAAQuai2W; Mon Feb  5 23:53:09 2001
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id XAA12458;
	Mon, 5 Feb 2001 23:57:49 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200102060657.XAA12458@usr08.primenet.com>
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386)
To: dillon@earth.backplane.com (Matt Dillon)
Date: Tue, 6 Feb 2001 06:57:48 +0000 (GMT)
Cc: tlambert@primenet.com (Terry Lambert),
	phk@critter.freebsd.dk (Poul-Henning Kamp),
	gibbs@scsiguy.com (Justin T. Gibbs),
	bright@wintelcom.net (Alfred Perlstein),
	rjesup@wgate.com (Randell Jesup), mjacob@feral.com (Matthew Jacob),
	msmith@FreeBSD.ORG (Mike Smith), des@ofug.org (Dag-Erling Smorgrav),
	dnelson@emsphone.com (Dan Nelson),
	tanimura@r.dl.itc.u-tokyo.ac.jp (Seigo Tanimura), arch@FreeBSD.ORG
In-Reply-To: <200102060546.f165kHq58398@earth.backplane.com> from "Matt Dillon" at Feb 05, 2001 09:46:17 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

>    I think the idea Poul originally articulated -- having simple information
>    like recommended I/O size, recommended cluster size, and/or maximum I/O
>    size, is the correct solution.  Getting fancy might buy us a percent
>    or two... it isn't worth the effort.

I thought Poul had discarded that idea as unworkable, after having
tried to make it work; I got the impression that he still liked
the idea, but that he didn't have a way to make it practical (Poul,
please correct me if I am misinterpreting your last post).

I can't see hints being much more useful than the seek optimization
code, which was disabled as a pessimization for most ZBR drives,
where the track boundaries were unknown back in the early fictional
geometry days (predating SCSI II, where it could be fixed again).

I would think that you would want your optimization to work at
least 51% of the time for it to be worthwhile, or at least "mostly
harmless", and I really have doubts that "hints" would be able to
do that.

You really don't want to end up with something that makes a
microbenchmark run fast, at the expense of real system loads,
like some of the stuff that happened in the buffer cache code,
historically.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Tue Feb  6  0: 4:13 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147])
	by hub.freebsd.org (Postfix) with ESMTP
	id 46C4037B401; Tue,  6 Feb 2001 00:03:55 -0800 (PST)
Received: from critter (localhost [127.0.0.1])
	by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id f1683pB31838;
	Tue, 6 Feb 2001 09:03:51 +0100 (CET)
	(envelope-from phk@critter.freebsd.dk)
To: Terry Lambert <tlambert@primenet.com>
Cc: dillon@earth.backplane.com (Matt Dillon),
	gibbs@scsiguy.com (Justin T. Gibbs),
	bright@wintelcom.net (Alfred Perlstein),
	rjesup@wgate.com (Randell Jesup), mjacob@feral.com (Matthew Jacob),
	msmith@FreeBSD.ORG (Mike Smith), des@ofug.org (Dag-Erling Smorgrav),
	dnelson@emsphone.com (Dan Nelson),
	tanimura@r.dl.itc.u-tokyo.ac.jp (Seigo Tanimura), arch@FreeBSD.ORG
Subject: Re: Bumping up {MAX,DFLT}*PHYS (was Re: Bumping up {MAX,DFL}*SIZ in i386) 
In-Reply-To: Your message of "Tue, 06 Feb 2001 06:57:48 GMT."
             <200102060657.XAA12458@usr08.primenet.com> 
Date: Tue, 06 Feb 2001 09:03:51 +0100
Message-ID: <31836.981446631@critter>
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In message <200102060657.XAA12458@usr08.primenet.com>, Terry Lambert writes:
>>    I think the idea Poul originally articulated -- having simple information
>>    like recommended I/O size, recommended cluster size, and/or maximum I/O
>>    size, is the correct solution.  Getting fancy might buy us a percent
>>    or two... it isn't worth the effort.
>
>I thought Poul had discarded that idea as unworkable, after having
>tried to make it work; I got the impression that he still liked
>the idea, but that he didn't have a way to make it practical (Poul,
>please correct me if I am misinterpreting your last post).

No, that is perfectly possible and basically on requires the addition
of a preferred modulus to the current data in dev_t / struct disk.

Optimal individual clustering is unworkable.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Tue Feb  6  1: 6:59 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from njord.bart.nl (njord.bart.nl [194.158.170.15])
	by hub.freebsd.org (Postfix) with ESMTP
	id DCD7737B699; Tue,  6 Feb 2001 01:06:41 -0800 (PST)
Received: from daemon.chronias.ninth-circle.org (root@cable.ninth-circle.org [195.38.232.6])
	by njord.bart.nl (8.10.1/8.10.1) with ESMTP id f1696d650593;
	Tue, 6 Feb 2001 10:06:39 +0100 (CET)
Received: (from asmodai@localhost)
	by daemon.chronias.ninth-circle.org (8.11.1/8.11.0) id f1696YN91138;
	Tue, 6 Feb 2001 10:06:34 +0100 (CET)
	(envelope-from asmodai)
Date: Tue, 6 Feb 2001 10:06:34 +0100
From: Jeroen Ruigrok/Asmodai <asmodai@wxs.nl>
To: Nik Clayton <nik@freebsd.org>
Cc: arch@freebsd.org
Subject: Re: [andrew@ugh.net.au: docs/23745: man page for vcount(9)]
Message-ID: <20010206100634.K442@daemon.ninth-circle.org>
References: <20010202030540.B21835@canyon.nothing-going-on.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <20010202030540.B21835@canyon.nothing-going-on.org>; from nik@freebsd.org on Fri, Feb 02, 2001 at 03:05:43AM +0000
Organisation: Ninth-Circle Enterprises
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

-On [20010202 04:30], Nik Clayton (nik@freebsd.org) wrote:
>Anyone up for a review?  Cheers.

Done, and committed.

-- 
Jeroen Ruigrok vd Werven/Asmodai    asmodai@[wxs.nl|bart.nl|freebsd.org]
Documentation nutter/C-rated Coder BSD: Technical excellence at its best  
	  D78D D0AD 244D 1D12 C9CA  7152 035C 1138 546A B867
Let us eat and drink; for tomorrow we shall die...


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Tue Feb  6  3: 0:29 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94])
	by hub.freebsd.org (Postfix) with ESMTP
	id 5817737B503; Tue,  6 Feb 2001 03:00:08 -0800 (PST)
Received: by relay.butya.kz (Postfix, from userid 1000)
	id BAE6628E66; Tue,  6 Feb 2001 17:00:03 +0600 (ALMT)
Received: from localhost (localhost [127.0.0.1])
	by relay.butya.kz (Postfix) with ESMTP
	id ABBCA28E46; Tue,  6 Feb 2001 17:00:03 +0600 (ALMT)
Date: Tue, 6 Feb 2001 17:00:03 +0600 (ALMT)
From: Boris Popov <bp@butya.kz>
To: freebsd-arch@freebsd.org
Cc: freebsd-fs@freebsd.org
Subject: vnode interlock API
Message-ID: <Pine.BSF.4.21.0102061638280.82511-100000@lion.butya.kz>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

	Hello,

	Few months ago simple locks used for vnode interlock were replaced
by mutexes. It causes additional pain for externally maintained
filesystems and lowers portability of the code between -stable and
-current.

	So, I suggest to introduce two macro definitions which will hide
implementation details for interlocks:

#define VI_LOCK(vp)		mtx_enter(&(vp)->v_interlock, MTX_DEF)
#define VI_UNLOCK(vp)		mtx_exit(&(vp)->v_interlock, MTX_DEF)

	for RELENG_4 they will look like this:

#define VI_LOCK(vp)		simple_lock(&(vp)->v_interlock)
#define VI_UNLOCK(vp)		simple_unlock(&(vp)->v_interlock)

	Any comments, suggestions ?

--
Boris Popov
http://www.butya.kz/~bp/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Tue Feb  6  3: 3:47 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147])
	by hub.freebsd.org (Postfix) with ESMTP
	id CEB2037B401; Tue,  6 Feb 2001 03:03:24 -0800 (PST)
Received: from critter (localhost [127.0.0.1])
	by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id f16B33B33409;
	Tue, 6 Feb 2001 12:03:03 +0100 (CET)
	(envelope-from phk@critter.freebsd.dk)
To: Boris Popov <bp@butya.kz>
Cc: freebsd-arch@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG
Subject: Re: vnode interlock API 
In-Reply-To: Your message of "Tue, 06 Feb 2001 17:00:03 +0600."
             <Pine.BSF.4.21.0102061638280.82511-100000@lion.butya.kz> 
Date: Tue, 06 Feb 2001 12:03:03 +0100
Message-ID: <33407.981457383@critter>
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


Sounds like something which should have been done long time ago...

In message <Pine.BSF.4.21.0102061638280.82511-100000@lion.butya.kz>, Boris Popov writes:
>	Hello,
>
>	Few months ago simple locks used for vnode interlock were replaced
>by mutexes. It causes additional pain for externally maintained
>filesystems and lowers portability of the code between -stable and
>-current.
>
>	So, I suggest to introduce two macro definitions which will hide
>implementation details for interlocks:
>
>#define VI_LOCK(vp)		mtx_enter(&(vp)->v_interlock, MTX_DEF)
>#define VI_UNLOCK(vp)		mtx_exit(&(vp)->v_interlock, MTX_DEF)
>
>	for RELENG_4 they will look like this:
>
>#define VI_LOCK(vp)		simple_lock(&(vp)->v_interlock)
>#define VI_UNLOCK(vp)		simple_unlock(&(vp)->v_interlock)
>
>	Any comments, suggestions ?
>
>--
>Boris Popov
>http://www.butya.kz/~bp/
>
>
>
>To Unsubscribe: send mail to majordomo@FreeBSD.org
>with "unsubscribe freebsd-arch" in the body of the message
>

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Tue Feb  6  3:51:33 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94])
	by hub.freebsd.org (Postfix) with ESMTP
	id 2180B37B401; Tue,  6 Feb 2001 03:51:05 -0800 (PST)
Received: by relay.butya.kz (Postfix, from userid 1000)
	id E602728E45; Tue,  6 Feb 2001 17:50:52 +0600 (ALMT)
Received: from localhost (localhost [127.0.0.1])
	by relay.butya.kz (Postfix) with ESMTP
	id 7408928DEE; Tue,  6 Feb 2001 17:50:52 +0600 (ALMT)
Date: Tue, 6 Feb 2001 17:50:52 +0600 (ALMT)
From: Boris Popov <bp@butya.kz>
To: freebsd-arch@freebsd.org
Cc: freebsd-net@freebsd.org
Subject: CFR: Sequential mbuf read/write extensions
Message-ID: <Pine.BSF.4.21.0102061703030.82511-100000@lion.butya.kz>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

[Please trim CC list as necessary]

	Hello,

	Before starting import process for smbfs, I would like to
introduce new API which greatly simplifies process of packaging data into
mbufs and fetching it back (in fact, similar API already presented in the
tree, but it is private to the netncp code and it will be really nice to
share it).

	Basically, it requires additional structure (working context) and
related functions:

struct mbdata {
        struct mbuf *   mb_top;
        struct mbuf *   mb_cur;
        u_char *        mb_pos;
        int             mb_count;
};

	Where mb_top points at the first mbuf in the chain and mb_cur to
the current mbuf. Here is a slightly truncated API to illustrate how it
works:

int  mb_init(struct mbdata *mbp);
int  mb_initm(struct mbdata *mbp, struct mbuf *m);
int  mb_done(struct mbdata *mbp);
int  mb_put_byte(struct mbdata *mbp, u_int8_t x);
int  mb_put_wordbe(struct mbdata *mbp, u_int16_t x);
int  mb_put_wordle(struct mbdata *mbp, u_int16_t x);
int  mb_put_dwordbe(struct mbdata *mbp, u_int32_t x);
int  mb_get_byte(struct mbdata *mbp, u_int8_t *x);
int  mb_get_word(struct mbdata *mbp, u_int16_t *x);
int  mb_get_wordle(struct mbdata *mbp, u_int16_t *x);
int  mb_get_wordbe(struct mbdata *mbp, u_int16_t *x);


	The mb_put* functions allow to append new data to mbuf chain.
These functions take care about necessary mbuf allocations and additional
data conversions. For example, mb_put_wordbe will store a 16 bit integer
in the network format while mb_put_wordle will convert it to the little
endian format if necessary.

	The mb_get* functions allow to fetch data from mbuf chains with
appropriate handling of mbuf borders and data conversions.


	Here is a simple examples (error checks are omitted):

Send:
        error = mb_init(mbp);
        if (error)
                return error;
        mb_put_mem(mbp, SMB_SIGNATURE, SMB_SIGLEN, MB_MSYSTEM);
        mb_put_byte(mbp, cmd);
        mb_put_dwordle(mbp, 1234);
        mb_put_byte(mbp, vcp->vc_hflags);
        mb_fixhdr(mbp);
 	my_great_send_function(mbp->mb_top);
	mb_done(mbp);

Receive:

	mb_initm(mbp, just_received_mbuf_chain);
        mb_get_byte(mbp, &rqp->sr_rpflags);
        mb_get_wordle(mbp, &rqp->sr_rpflags2);
        mb_get_dword(mbp, &tdw);
        mb_get_dword(mbp, &tdw);
        mb_get_dword(mbp, &tdw);
        mb_get_wordle(mbp, &rqp->sr_rptid);
        mb_get_wordle(mbp, &rqp->sr_rppid);
        mb_get_wordle(mbp, &rqp->sr_rpuid);
        mb_get_wordle(mbp, &rqp->sr_rpmid);

	Since currently there isn't many consumers of this code I can
suggest to define an option LIBMBUF in the kernel configuration file and
add KLD libmbuf (with interface libmbuf), so kernel footprint will not be
significantly affected. The names of source and header files are
questionable too and I would appreciate good suggestions (currently they
are subr_mbuf.c and subr_mbuf.h).

	Well, and finally here you will find full source code of proposed
API: http://www.butya.kz/~bp/mbuf/

	Any comments and suggestions are greatly appreciated.

--
Boris Popov
http://www.butya.kz/~bp/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Tue Feb  6  7:31:11 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from prism.flugsvamp.com (cb58709-a.mdsn1.wi.home.com [24.17.241.9])
	by hub.freebsd.org (Postfix) with ESMTP id C36D437B401
	for <freebsd-arch@FreeBSD.ORG>; Tue,  6 Feb 2001 07:30:54 -0800 (PST)
Received: (from jlemon@localhost)
	by prism.flugsvamp.com (8.11.0/8.11.0) id f16FW1Y20391;
	Tue, 6 Feb 2001 09:32:01 -0600 (CST)
	(envelope-from jlemon)
Date: Tue, 6 Feb 2001 09:32:01 -0600
From: Jonathan Lemon <jlemon@flugsvamp.com>
To: Jonathan Graehl <jonathan@graehl.org>
Cc: Jonathan Lemon <jlemon@flugsvamp.com>, freebsd-arch@FreeBSD.ORG
Subject: Re: nonblocking sockets and EINTR (kevent does not observe SA_RESTART?)
Message-ID: <20010206093201.K650@prism.flugsvamp.com>
References: <20010205193507.J650@prism.flugsvamp.com> <NCBBLOALCKKINBNNEDDLEEGADKAA.jonathan@graehl.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <NCBBLOALCKKINBNNEDDLEEGADKAA.jonathan@graehl.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Mon, Feb 05, 2001 at 05:50:37PM -0800, Jonathan Graehl wrote:
> I assume, then, that you guarantee that the changelist is applied
> (and errors relating to the changes are placed in the 
> received-events-buffer, if possible) before the call becomes
> interruptible?  (and if there were an error that doesn't fit in the
> buffer, the return would be immediate with the error code); that is,
> only after the process goes to sleep waiting in kqueue, is there the
> possibility of an EINTR return? 

Correct.  Technically, an EINTR is returned when a signal interrupts
the process after it goes to sleep (that is, after it calls tsleep). 

So if (as an example) you call kevent() with a zero valued timespec,
you'll never get EINTR, since there's no possibility of it sleeping.
--
Jonathan


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Tue Feb  6  8:32:25 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from mailout02.sul.t-online.com (mailout02.sul.t-online.com [194.25.134.17])
	by hub.freebsd.org (Postfix) with ESMTP
	id 8948D37B4EC; Tue,  6 Feb 2001 08:32:03 -0800 (PST)
Received: from fwd07.sul.t-online.com 
	by mailout02.sul.t-online.com with smtp 
	id 14QB27-00052q-00; Tue, 06 Feb 2001 17:31:59 +0100
Received: from frolic.no-support.loc (520094253176-0001@[217.80.111.106]) by fmrl07.sul.t-online.com
	with esmtp id 14QB1l-2Kk35mC; Tue, 6 Feb 2001 17:31:37 +0100
Received: (from bjoern@localhost)
	by frolic.no-support.loc (8.11.1/8.9.3) id f16GLp600648;
	Tue, 6 Feb 2001 17:21:51 +0100 (CET)
	(envelope-from bjoern)
From: Bjoern Fischer <bfischer@Techfak.Uni-Bielefeld.DE>
Date: Tue, 6 Feb 2001 17:21:50 +0100
To: Boris Popov <bp@butya.kz>
Cc: freebsd-arch@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG
Subject: Re: vnode interlock API
Message-ID: <20010206172150.A528@frolic.no-support.loc>
References: <Pine.BSF.4.21.0102061638280.82511-100000@lion.butya.kz>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <Pine.BSF.4.21.0102061638280.82511-100000@lion.butya.kz>; from bp@butya.kz on Tue, Feb 06, 2001 at 05:00:03PM +0600
X-Sender: 520094253176-0001@t-dialin.net
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Hello,

> 	Few months ago simple locks used for vnode interlock were replaced
> by mutexes. It causes additional pain for externally maintained
> filesystems and lowers portability of the code between -stable and
> -current.
> 
> 	So, I suggest to introduce two macro definitions which will hide
> implementation details for interlocks:
> 
> #define VI_LOCK(vp)		mtx_enter(&(vp)->v_interlock, MTX_DEF)
> #define VI_UNLOCK(vp)		mtx_exit(&(vp)->v_interlock, MTX_DEF)

BTW, does this mean that -current vnode locking works sufficiently
enough to support stacked file systems a la Eric Zadok's FiST software?

  Bjoern

-- 
-----BEGIN GEEK CODE BLOCK-----
GCS d--(+) s++: a- C+++(-) UB++++OSI++++$ P+++(-) L---(++) !E W- N+ o>+
K- !w !O !M !V  PS++  PE-  PGP++  t+++  !5 X++ tv- b+++ D++ G e+ h-- y+ 
------END GEEK CODE BLOCK------


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Tue Feb  6 10:59:21 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP
	id 2968337B401; Tue,  6 Feb 2001 10:59:01 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f16Iwkd17957;
	Tue, 6 Feb 2001 10:58:46 -0800 (PST)
Date: Tue, 6 Feb 2001 10:58:46 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Boris Popov <bp@butya.kz>
Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG
Subject: Re: CFR: Sequential mbuf read/write extensions
Message-ID: <20010206105846.Q26076@fw.wintelcom.net>
References: <Pine.BSF.4.21.0102061703030.82511-100000@lion.butya.kz>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <Pine.BSF.4.21.0102061703030.82511-100000@lion.butya.kz>; from bp@butya.kz on Tue, Feb 06, 2001 at 05:50:52PM +0600
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Boris Popov <bp@butya.kz> [010206 03:51] wrote:
> [Please trim CC list as necessary]
> 
> 	Hello,
> 
> 	Before starting import process for smbfs, I would like to
> introduce new API which greatly simplifies process of packaging data into
> mbufs and fetching it back (in fact, similar API already presented in the
> tree, but it is private to the netncp code and it will be really nice to
> share it).

[snip]

Looks really cool, I can't get to http://www.butya.kz/~bp/mbuf/,
but from the examples it looks very useful.

I was wondering if you planned or already had an API for reading/writing
from/into host/network byte order?  Not that it's needed, but would
be nice to have.  Also any chance we'll get manpages that describe
these functions/macros?

On other idea is to give each op a 'count' parameter, your examples
seem to show various functions being called several times in a row,
maybe they would help optimize certain codepaths?

Not that any of these suggestions are really required, I just wanted
to give you some feedback. :)

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Tue Feb  6 11:51:55 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from meow.osd.bsdi.com (meow.osd.bsdi.com [204.216.28.88])
	by hub.freebsd.org (Postfix) with ESMTP
	id 192A337B401; Tue,  6 Feb 2001 11:51:34 -0800 (PST)
Received: from laptop.baldwin.cx (john@jhb-laptop.osd.bsdi.com [204.216.28.241])
	by meow.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id f16Jo9345186;
	Tue, 6 Feb 2001 11:50:09 -0800 (PST)
	(envelope-from jhb@FreeBSD.org)
Message-ID: <XFMail.010206115111.jhb@FreeBSD.org>
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <Pine.BSF.4.21.0102061638280.82511-100000@lion.butya.kz>
Date: Tue, 06 Feb 2001 11:51:11 -0800 (PST)
From: John Baldwin <jhb@FreeBSD.org>
To: Boris Popov <bp@butya.kz>
Subject: RE: vnode interlock API
Cc: freebsd-fs@FreeBSD.org, freebsd-arch@FreeBSD.org
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On 06-Feb-01 Boris Popov wrote:
>       Hello,
> 
>       Few months ago simple locks used for vnode interlock were replaced
> by mutexes. It causes additional pain for externally maintained
> filesystems and lowers portability of the code between -stable and
> -current.

Sounds good.

-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Tue Feb  6 18:18:20 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94])
	by hub.freebsd.org (Postfix) with ESMTP
	id E3C2437B401; Tue,  6 Feb 2001 18:17:59 -0800 (PST)
Received: by relay.butya.kz (Postfix, from userid 1000)
	id 40E6B29059; Wed,  7 Feb 2001 08:17:54 +0600 (ALMT)
Received: from localhost (localhost [127.0.0.1])
	by relay.butya.kz (Postfix) with ESMTP
	id 312C628698; Wed,  7 Feb 2001 08:17:54 +0600 (ALMT)
Date: Wed, 7 Feb 2001 08:17:53 +0600 (ALMT)
From: Boris Popov <bp@butya.kz>
To: Alfred Perlstein <bright@wintelcom.net>
Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG
Subject: Re: CFR: Sequential mbuf read/write extensions
In-Reply-To: <20010206105846.Q26076@fw.wintelcom.net>
Message-ID: <Pine.BSF.4.21.0102070809420.4563-100000@lion.butya.kz>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Tue, 6 Feb 2001, Alfred Perlstein wrote:

> Looks really cool, I can't get to http://www.butya.kz/~bp/mbuf/,
> but from the examples it looks very useful.

	Sorry, server was brought down and I wasn't notified :(. It should
be ok now.

> I was wondering if you planned or already had an API for reading/writing
> from/into host/network byte order?  Not that it's needed, but would
> be nice to have.  Also any chance we'll get manpages that describe
> these functions/macros?

	Yes, the header file contains macros which supports not only host
to network (big-endian) byte order conversion, but also to the
little-endian byte order. And of course, there will be a manpage(s) if
this is going to become a part of kernel API.

> On other idea is to give each op a 'count' parameter, your examples
> seem to show various functions being called several times in a row,
> maybe they would help optimize certain codepaths?

	Yes, there is a mb_{get|put}_mem() functions which allow
reading/writing of big memory regions (including user space). So, if
protocol is well designed and layout of the packet can be described as
structure, it is possible to fill it in the normal memory and copy the
mbuf chain in single operation.

> > Not that any of these suggestions are really required, I just wanted
> to give you some feedback. :)

	Thanks :)

--
Boris Popov
http://www.butya.kz/~bp/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Tue Feb  6 19:42:51 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from VL-MS-MR002.sc1.videotron.ca (relais.videotron.ca [24.201.245.36])
	by hub.freebsd.org (Postfix) with ESMTP
	id 30A4E37B491; Tue,  6 Feb 2001 19:42:27 -0800 (PST)
Received: from jehovah ([24.201.144.31]) by
          VL-MS-MR002.sc1.videotron.ca (Netscape Messaging Server 4.15)
          with SMTP id G8DBJZ05.88O; Tue, 6 Feb 2001 22:40:47 -0500 
Message-ID: <003001c090b8$0b067a50$1f90c918@jehovah>
From: "Bosko Milekic" <bmilekic@technokratis.com>
To: "Boris Popov" <bp@butya.kz>, <freebsd-arch@FreeBSD.ORG>
Cc: <freebsd-net@FreeBSD.ORG>
References: <Pine.BSF.4.21.0102061703030.82511-100000@lion.butya.kz>
Subject: Re: Sequential mbuf read/write extensions
Date: Tue, 6 Feb 2001 22:42:49 -0500
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2919.6700
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


Boris Popov wrote:

[...]
> Since currently there isn't many consumers of this code I can
> suggest to define an option LIBMBUF in the kernel configuration file
and
> add KLD libmbuf (with interface libmbuf), so kernel footprint will
not be

    I am in favor of such an option on the condition that it is
temporary. In other words, only until we decide "we have converted
enough code to use this code so we should remove the option now." The
reason is that otherwise, we will be faced with numerous "#ifdef
LIBMBUF ... #else ... #endif" code. I assume this is what you meant,
anyway, so I have no objections. :-) The API looks great by the way,
and I will try to give a more detailed review in the next few days.
:-)

For now:

#define M_TRYWAIT M_WAIT is not right.
(M_WAIT is no longer to be used in the mbuf code.)

The succesfull return values are 0, I don't have a problem with this,
specifically, but I would assume that this:
if (!mb_init(mbp))  ... would be more "logical" (I use the term
loosely) if it meant: "if initialization fails" (now it means "if
initialization is succesful").

> significantly affected. The names of source and header files are
> questionable too and I would appreciate good suggestions (currently
they
> are subr_mbuf.c and subr_mbuf.h).

    Hmmm. Maybe subr_mblib.c and libmb.h ? I don't want to turn this
into a bikeshed ( :-) ), so I suggest that you decide. Personally, I
would prefer that it be something other than "subr_mbuf.c" simply
because it may be a little misleading in some cases.

> Well, and finally here you will find full source code of proposed
> API: http://www.butya.kz/~bp/mbuf/
>
> Any comments and suggestions are greatly appreciated.
>
> --
> Boris Popov
> http://www.butya.kz/~bp/

Boris, this is really a great interface and nice looking, clean code.
Thank you!

Regards,
Bosko.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7  0:26:39 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by hub.freebsd.org (Postfix) with ESMTP id D02E037B491
	for <arch@FreeBSD.org>; Wed,  7 Feb 2001 00:26:19 -0800 (PST)
Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3])
	by fledge.watson.org (8.11.1/8.11.1) with SMTP id f178QHh11778
	for <arch@FreeBSD.org>; Wed, 7 Feb 2001 03:26:18 -0500 (EST)
	(envelope-from robert@fledge.watson.org)
Date: Wed, 7 Feb 2001 03:26:17 -0500 (EST)
From: Robert Watson <rwatson@FreeBSD.org>
X-Sender: robert@fledge.watson.org
To: arch@FreeBSD.org
Subject: Moving struct proc's p_prison to ucred as cr_prison
Message-ID: <Pine.NEB.3.96L.1010207030206.98384J-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


I'm planning on committing a close approximation to the following in the
near future:

  http://www.watson.org/~robert/jail-to-ucred.diff

The p_prison pointer in the process structure ties a process to its
jail(8) prison structure.  This patch moves that pointer from the process
structure to the credential structure, as well as cleaning up a few other
bits and pieces associated with jail and process access control.  Here are
some more details for those interested in reviewing the changes (which
will be committed in components, and is currently waiting on an xucred fix
so that mountd doesn't panic the system when it's concept of ucred doesn't
match with the kernerl version). 

- proc->p_prison moved to ucred->cr_prison
- abstract out jail reference counting using prison_hold() and
  prison_free()
- make jail inheritence be a function of credential inheritence
- make jail garbage collection be a function of credential garbage
  collection
- modify various jail (prison_*) functions to accepting ucred instead of
  proc
- introduce jailed(ucred) call to check if a ucred is in jail rather than
  direct (p->p_prison!=NULL) checks all over the place
- remove const qualifier from various calls, including suser, p_can,
  cap_check, to reflect mutex use in the near future
- remove unnecessary prison check in bpf device code (we use namespacing
  to protect devices, where possible)
- move various jail function prototypes to jail.h
- convert PRISON_CHECK from a macro to a function
- comment a number of situations where it's now possible to test jail
  presence with respects to a passed credential rather than the current
  process (usually in the socket code).  No semantics changes here just
  yet, but there may be in the future.  Comments won't be committed, but
  are there to guide the reader in understanding the diffs.

Generally, the benefits of this change include:
- increasingly modularized jail, making the idea of a kld-loadable jail
  or customized jail() more conceivable -- hide jail implementation
  from many consumers of jail (not all yet, especially in pty code and
  Linux ABI)
- move towards a model where access control decisions can be made without
  reference to the process, just the credential (won't be entirely
  possible as some access decisions are based on p_session and related
  concepts for signalling, but it helps).
- move towards a model where pre-bound sockets could be passed into
  a jail via UDS allowing the jail access to some outside system
  resources (much the same way as cached socket credentials allow non-root
  processes to use sockets bound while holding privilege).

As I mention above, right now applying this change without rebuilding
mountd can result in a system panic, due to differeing interpretations of
the ucred structure between kernel and userland.  Brian Feldman apparently
has patches to fix this by making the userland/kernel ABI/API use xucred; 
in the mean time if you decide to test this, disable NFS serving, or
remember to rebuild userland.

Working through these changes prompted my earlier question about NULL
credential references.  It may be that the race windows in fork1() and
exit1() (possibly wait1()) require additional checks in these patches.

BTW, while working with the ucred code, I noticed that while uidinfo
appears to have moved to ucred, there are still uidinfo references in
struct proc.  I haven't followed up on why this might be the case as yet.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7  0:34:31 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from mailhub.fokus.gmd.de (mailhub.fokus.gmd.de [193.174.154.14])
	by hub.freebsd.org (Postfix) with ESMTP
	id 5EA3737B401; Wed,  7 Feb 2001 00:34:12 -0800 (PST)
Received: from beagle (beagle [193.175.132.100])
	by mailhub.fokus.gmd.de (8.8.8/8.8.8) with ESMTP id JAA24496;
	Wed, 7 Feb 2001 09:33:15 +0100 (MET)
Date: Wed, 7 Feb 2001 09:33:15 +0100 (CET)
From: Harti Brandt <brandt@fokus.gmd.de>
To: Boris Popov <bp@butya.kz>
Cc: <freebsd-arch@FreeBSD.ORG>, <freebsd-net@FreeBSD.ORG>
Subject: Re: CFR: Sequential mbuf read/write extensions
In-Reply-To: <20010206105846.Q26076@fw.wintelcom.net>
Message-ID: <Pine.BSF.4.32.0102070927410.6318-100000@beagle.fokus.gmd.de>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


Looks nice, just what I needed two weeks ago and partly had
to implement myself :-)

But, I would recommend to stick with the ususal naming of size dependend
things, by appending a numeric suffix. Something like:

int  mb_get8(struct mbdata *mbp, u_int8_t *x);
int  mb_get16(struct mbdata *mbp, u_int16_t *x);
int  mb_get16le(struct mbdata *mbp, u_int16_t *x);
int  mb_get16be(struct mbdata *mbp, u_int16_t *x);
int  mb_get32(struct mbdata *mbp, u_int32_t *x);
...

Using 'word' and 'doubleword' is rather confusing (when speeking of words
I would think of 32 bit nowadays).

harti

-- 
harti brandt, http://www.fokus.gmd.de/research/cc/cats/employees/hartmut.brandt/private
              brandt@fokus.gmd.de, harti@begemot.org, lhbrandt@mail.ru


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7  1:29:26 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94])
	by hub.freebsd.org (Postfix) with ESMTP
	id 7863E37B684; Wed,  7 Feb 2001 01:28:59 -0800 (PST)
Received: by relay.butya.kz (Postfix, from userid 1000)
	id E5B812868D; Wed,  7 Feb 2001 15:28:49 +0600 (ALMT)
Received: from localhost (localhost [127.0.0.1])
	by relay.butya.kz (Postfix) with ESMTP
	id D66D82868A; Wed,  7 Feb 2001 15:28:49 +0600 (ALMT)
Date: Wed, 7 Feb 2001 15:28:49 +0600 (ALMT)
From: Boris Popov <bp@butya.kz>
To: Bosko Milekic <bmilekic@technokratis.com>
Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG
Subject: Re: Sequential mbuf read/write extensions
In-Reply-To: <003001c090b8$0b067a50$1f90c918@jehovah>
Message-ID: <Pine.BSF.4.21.0102071516110.7952-100000@lion.butya.kz>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Tue, 6 Feb 2001, Bosko Milekic wrote:

> > Since currently there isn't many consumers of this code I can
> > suggest to define an option LIBMBUF in the kernel configuration file
> and
> > add KLD libmbuf (with interface libmbuf), so kernel footprint will
> not be
> 
>     I am in favor of such an option on the condition that it is
> temporary. In other words, only until we decide "we have converted
> enough code to use this code so we should remove the option now." The
> reason is that otherwise, we will be faced with numerous "#ifdef
> LIBMBUF ... #else ... #endif" code. I assume this is what you meant,

	Not exactly so. 'option LIBMBUF' will just connect the source file
to kernel makefile. There is no need for any #ifdef's in the code.

> #define M_TRYWAIT M_WAIT is not right.
> (M_WAIT is no longer to be used in the mbuf code.)

	You omitted the surrounding "#ifndef M_TRYWAIT" which makes this
code portable to RELENG_4 (mind you, this code taken from smbfs). Of
course, this should be stripped before import.

> The succesfull return values are 0, I don't have a problem with this,
> specifically, but I would assume that this:
> if (!mb_init(mbp))  ... would be more "logical" (I use the term
> loosely) if it meant: "if initialization fails" (now it means "if
> initialization is succesful").

	I'm generally don't like such syntax if function or variable name
do not clearly specify which value it should have/return on success. 
Nearly all functions in this file return zero or error code, so the
correct syntax of the above will be:

	error = mb_init(mbp);
	if (!error)

or

	if (error)
		return error;

or

	if (mb_init(mbp) != 0)
		return ESOMETHINGEVIL;

> > significantly affected. The names of source and header files are
> > questionable too and I would appreciate good suggestions (currently
> they
> > are subr_mbuf.c and subr_mbuf.h).
> 
>     Hmmm. Maybe subr_mblib.c and libmb.h ? I don't want to turn this
> into a bikeshed ( :-) ), so I suggest that you decide. Personally, I
> would prefer that it be something other than "subr_mbuf.c" simply
> because it may be a little misleading in some cases.

	Good point.

> Boris, this is really a great interface and nice looking, clean code.

	I'm sure, this code can be significantly improved by mbuf gurus :)

--
Boris Popov
http://www.butya.kz/~bp/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7  1:35:38 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94])
	by hub.freebsd.org (Postfix) with ESMTP
	id BE07737B699; Wed,  7 Feb 2001 01:35:18 -0800 (PST)
Received: by relay.butya.kz (Postfix, from userid 1000)
	id CCF6A28648; Wed,  7 Feb 2001 15:35:16 +0600 (ALMT)
Received: from localhost (localhost [127.0.0.1])
	by relay.butya.kz (Postfix) with ESMTP
	id C394528647; Wed,  7 Feb 2001 15:35:16 +0600 (ALMT)
Date: Wed, 7 Feb 2001 15:35:16 +0600 (ALMT)
From: Boris Popov <bp@butya.kz>
To: Harti Brandt <brandt@fokus.gmd.de>
Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG
Subject: Re: CFR: Sequential mbuf read/write extensions
In-Reply-To: <Pine.BSF.4.32.0102070927410.6318-100000@beagle.fokus.gmd.de>
Message-ID: <Pine.BSF.4.21.0102071530140.7952-100000@lion.butya.kz>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, 7 Feb 2001, Harti Brandt wrote:

> But, I would recommend to stick with the ususal naming of size dependend
> things, by appending a numeric suffix. Something like:
> 
> int  mb_get8(struct mbdata *mbp, u_int8_t *x);
> int  mb_get16(struct mbdata *mbp, u_int16_t *x);
> int  mb_get16le(struct mbdata *mbp, u_int16_t *x);
> int  mb_get16be(struct mbdata *mbp, u_int16_t *x);
> int  mb_get32(struct mbdata *mbp, u_int32_t *x);
> ...
> 
> Using 'word' and 'doubleword' is rather confusing (when speeking of words
> I would think of 32 bit nowadays).

	Well, it depends. For me 'word', 'dword' and 'qword' are clear
from the good old 8bit days :)

	If numbers in the function names looks good I can live with it.

	Opinions ?

--
Boris Popov
http://www.butya.kz/~bp/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7  1:57:44 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from mailhub.fokus.gmd.de (mailhub.fokus.gmd.de [193.174.154.14])
	by hub.freebsd.org (Postfix) with ESMTP
	id AED2037B69E; Wed,  7 Feb 2001 01:57:17 -0800 (PST)
Received: from beagle (beagle [193.175.132.100])
	by mailhub.fokus.gmd.de (8.8.8/8.8.8) with ESMTP id KAA01306;
	Wed, 7 Feb 2001 10:44:55 +0100 (MET)
Date: Wed, 7 Feb 2001 10:44:55 +0100 (CET)
From: Harti Brandt <brandt@fokus.gmd.de>
To: Boris Popov <bp@butya.kz>
Cc: <freebsd-arch@freebsd.org>, <freebsd-net@freebsd.org>
Subject: Re: CFR: Sequential mbuf read/write extensions
In-Reply-To: <Pine.BSF.4.21.0102071530140.7952-100000@lion.butya.kz>
Message-ID: <Pine.BSF.4.32.0102071036000.6318-100000@beagle.fokus.gmd.de>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, 7 Feb 2001, Boris Popov wrote:

BP>> Using 'word' and 'doubleword' is rather confusing (when speeking of words
BP>> I would think of 32 bit nowadays).
BP>
BP>	Well, it depends. For me 'word', 'dword' and 'qword' are clear
BP>from the good old 8bit days :)
BP>
BP>	If numbers in the function names looks good I can live with it.

Well, I just looked back to the bus_space stuff and discovered, that they
use suffixes of _[1234] to count the number of bytes the functions operate
on. Perhaps this is a better variant? Anyway, I think, numbers are much
clearer, than words in this case (As an example, what does ntohl operate on
if longs are 64 bit??).

As a side note:

Someone told me that Mickeysoft is trying to persuade the C
standardisation people to drop the requirement that longs should not be
shorter than int's. This is, he said, because of their braindamage with
DWORD in -zillions of header files... If I look how they continue to
cripple C, this may also slip through :-(

harti
-- 
harti brandt, http://www.fokus.gmd.de/research/cc/cats/employees/hartmut.brandt/private
              brandt@fokus.gmd.de, harti@begemot.org, lhbrandt@mail.ru


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7 10:29:37 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from green.dyndns.org (localhost [127.0.0.1])
	by hub.freebsd.org (Postfix) with ESMTP id 0394A37B401
	for <arch@FreeBSD.org>; Wed,  7 Feb 2001 10:29:07 -0800 (PST)
Received: from localhost (6lbzax@localhost [127.0.0.1])
	by green.dyndns.org (8.11.1/8.11.1) with ESMTP id f17ISLr17637
	for <arch@FreeBSD.org>; Wed, 7 Feb 2001 13:28:33 -0500 (EST)
	(envelope-from green@FreeBSD.org)
Message-Id: <200102071828.f17ISLr17637@green.dyndns.org>
X-Mailer: exmh version 2.3.1 01/18/2001 with nmh-1.0.4
To: arch@FreeBSD.org
Subject: xucred introduction
From: "Brian F. Feldman" <green@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 07 Feb 2001 13:28:21 -0500
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I'd like to commit this further clean-up of the kernel API in which struct 
ucred's use outside of the kernel is to be a last resort, and everything 
which would use ucred will use xucred.  This mainly affects mount(2), and 
changes the size of those structures.  However, xucred won't have to be 
changing size all the time, so this will be the last time mountd or
i(den|ne)td would panic the kernel or return an error (respectively) for 
changes to ucred.

Mike Smith would prefer it that for userland, ucred and xucred would be more 
something like the in-kernel kucred and external ucred, but I believe this 
will introduce absolutely nothing but headaches for code due to 
conditionalized structure definition upon _KERNEL being defined.  Therefore, 
I've kept ucred as the in-kernel structure for kvm-using apps, xucred for 
everything else, with unfortunately the limitation that ucred.h must still 
be treated as a kernel header and dependencies noted accordingly by the 
programmer.

I've verified this works on at least -CURRENT from the past week.  I'd like 
to commit it soon to lessen any pain from more ucred changes, like 
rwatson's.  The only question is whether or not to add some spare fields to 
xucred now in case we /do/ want to expand it in the future, and also whether 
it's appropriate to make some of the field type changes (for example, 
sockaddr length type -> u_char, since that _IS_ what is defined by the 
sockaddr interface).

Discussion please :)

Index: sbin/mountd/mountd.c
===================================================================
RCS file: /usr2/ncvs/src/sbin/mountd/mountd.c,v
retrieving revision 1.39
diff -u -r1.39 mountd.c
--- sbin/mountd/mountd.c	1999/12/03 20:23:53	1.39
+++ sbin/mountd/mountd.c	2001/01/23 00:24:24
@@ -161,9 +161,9 @@
 void	del_mlist __P((char *, char *));
 struct dirlist *dirp_search __P((struct dirlist *, char *));
 int	do_mount __P((struct exportlist *, struct grouplist *, int,
-		struct ucred *, char *, int, struct statfs *));
+		struct xucred *, char *, int, struct statfs *));
 int	do_opt __P((char **, char **, struct exportlist *, struct grouplist *,
-				int *, int *, struct ucred *));
+				int *, int *, struct xucred *));
 struct	exportlist *ex_search __P((fsid_t *));
 struct	exportlist *get_exp __P((void));
 void	free_dir __P((struct dirlist *));
@@ -184,7 +184,7 @@
 void	mntsrv __P((struct svc_req *, SVCXPRT *));
 void	nextfield __P((char **, char **));
 void	out_of_mem __P((void));
-void	parsecred __P((char *, struct ucred *));
+void	parsecred __P((char *, struct xucred *));
 int	put_exlist __P((struct dirlist *, XDR *, struct dirlist *, int *));
 int	scan_tree __P((struct dirlist *, u_int32_t));
 static void usage __P((void));
@@ -202,8 +202,7 @@
 struct mountlist *mlhead;
 struct grouplist *grphead;
 char exname[MAXPATHLEN];
-struct ucred def_anon = {
-	1,
+struct xucred def_anon = {
 	(uid_t) -2,
 	1,
 	{ (gid_t) -2 }
@@ -732,7 +731,7 @@
 	struct dirlist *dirhead;
 	struct statfs fsb, *fsp;
 	struct hostent *hpe;
-	struct ucred anon;
+	struct xucred anon;
 	char *cp, *endcp, *dirp, *hst, *usr, *dom, savedc;
 	int len, has_host, exflags, got_nondir, dirplen, num, i, netgrp;
 
@@ -1332,7 +1331,7 @@
 	struct grouplist *grp;
 	int *has_hostp;
 	int *exflagsp;
-	struct ucred *cr;
+	struct xucred *cr;
 {
 	char *cpoptarg, *cpoptend;
 	char *cp, *endcp, *cpopt, savedc, savedc2;
@@ -1591,7 +1590,7 @@
 	struct exportlist *ep;
 	struct grouplist *grp;
 	int exflags;
-	struct ucred *anoncrp;
+	struct xucred *anoncrp;
 	char *dirp;
 	int dirplen;
 	struct statfs *fsb;
@@ -1842,7 +1841,7 @@
 void
 parsecred(namelist, cr)
 	char *namelist;
-	struct ucred *cr;
+	struct xucred *cr;
 {
 	char *name;
 	int cnt;
@@ -1854,7 +1853,6 @@
 	/*
 	 * Set up the unprivileged user.
 	 */
-	cr->cr_ref = 1;
 	cr->cr_uid = -2;
 	cr->cr_groups[0] = -2;
 	cr->cr_ngroups = 1;
Index: sys/kern/vfs_subr.c
===================================================================
RCS file: /usr2/ncvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.301
diff -u -r1.301 vfs_subr.c
--- sys/kern/vfs_subr.c	2001/01/31 04:54:23	1.301
+++ sys/kern/vfs_subr.c	2001/02/01 04:14:22
@@ -2319,7 +2319,11 @@
 			return (EPERM);
 		np = &nep->ne_defexported;
 		np->netc_exflags = argp->ex_flags;
-		np->netc_anon = argp->ex_anon;
+		bzero(&np->netc_anon, sizeof(np->netc_anon));
+		np->netc_anon.cr_uid = argp->ex_anon.cr_uid;
+		np->netc_anon.cr_ngroups = argp->ex_anon.cr_ngroups;
+		bcopy(argp->ex_anon.cr_groups, np->netc_anon.cr_groups,
+		    sizeof(np->netc_anon.cr_groups));
 		np->netc_anon.cr_ref = 1;
 		mp->mnt_flag |= MNT_DEFEXPORTED;
 		return (0);
@@ -2363,7 +2367,11 @@
 		goto out;
 	}
 	np->netc_exflags = argp->ex_flags;
-	np->netc_anon = argp->ex_anon;
+	bzero(&np->netc_anon, sizeof(np->netc_anon));
+	np->netc_anon.cr_uid = argp->ex_anon.cr_uid;
+	np->netc_anon.cr_ngroups = argp->ex_anon.cr_ngroups;
+	bcopy(argp->ex_anon.cr_groups, np->netc_anon.cr_groups,
+	    sizeof(np->netc_anon.cr_groups));
 	np->netc_anon.cr_ref = 1;
 	return (0);
 out:
Index: sys/netinet/tcp_subr.c
===================================================================
RCS file: /usr2/ncvs/src/sys/netinet/tcp_subr.c,v
retrieving revision 1.86
diff -u -r1.86 tcp_subr.c
--- sys/netinet/tcp_subr.c	2000/12/24 10:57:21	1.86
+++ sys/netinet/tcp_subr.c	2001/01/23 00:13:00
@@ -893,6 +893,7 @@
 static int
 tcp_getcred(SYSCTL_HANDLER_ARGS)
 {
+	struct xucred xuc;
 	struct sockaddr_in addrs[2];
 	struct inpcb *inp;
 	int error, s;
@@ -910,19 +911,25 @@
 		error = ENOENT;
 		goto out;
 	}
-	error = SYSCTL_OUT(req, inp->inp_socket->so_cred, sizeof(struct ucred));
+
+	xuc.cr_uid = inp->inp_socket->so_cred->cr_uid;
+	xuc.cr_ngroups = inp->inp_socket->so_cred->cr_ngroups;
+	bcopy(inp->inp_socket->so_cred->cr_groups, xuc.cr_groups,
+	    sizeof(xuc.cr_groups));
+	error = SYSCTL_OUT(req, &xuc, sizeof(struct xucred));
 out:
 	splx(s);
 	return (error);
 }
 
 SYSCTL_PROC(_net_inet_tcp, OID_AUTO, getcred, CTLTYPE_OPAQUE|CTLFLAG_RW,
-    0, 0, tcp_getcred, "S,ucred", "Get the ucred of a TCP connection");
+    0, 0, tcp_getcred, "S,xucred", "Get the xucred of a TCP connection");
 
 #ifdef INET6
 static int
 tcp6_getcred(SYSCTL_HANDLER_ARGS)
 {
+	struct xucred xuc;
 	struct sockaddr_in6 addrs[2];
 	struct inpcb *inp;
 	int error, s, mapped = 0;
@@ -956,8 +963,12 @@
 		error = ENOENT;
 		goto out;
 	}
-	error = SYSCTL_OUT(req, inp->inp_socket->so_cred, 
-			   sizeof(struct ucred));
+	
+	xuc.cr_uid = inp->inp_socket->so_cred->cr_uid;
+	xuc.cr_ngroups = inp->inp_socket->so_cred->cr_ngroups;
+	bcopy(inp->inp_socket->so_cred->cr_groups, xuc.cr_groups,
+	    sizeof(xuc.cr_groups));
+	error = SYSCTL_OUT(req, &xuc, sizeof(struct xucred));
 out:
 	splx(s);
 	return (error);
@@ -965,7 +976,7 @@
 
 SYSCTL_PROC(_net_inet6_tcp6, OID_AUTO, getcred, CTLTYPE_OPAQUE|CTLFLAG_RW,
 	    0, 0,
-	    tcp6_getcred, "S,ucred", "Get the ucred of a TCP6 connection");
+	    tcp6_getcred, "S,xucred", "Get the xucred of a TCP6 connection");
 #endif
 
 
Index: sys/netinet/udp_usrreq.c
===================================================================
RCS file: /usr2/ncvs/src/sys/netinet/udp_usrreq.c,v
retrieving revision 1.80
diff -u -r1.80 udp_usrreq.c
--- sys/netinet/udp_usrreq.c	2000/12/24 10:57:21	1.80
+++ sys/netinet/udp_usrreq.c	2001/01/23 00:13:50
@@ -606,6 +606,7 @@
 static int
 udp_getcred(SYSCTL_HANDLER_ARGS)
 {
+	struct xucred xuc;
 	struct sockaddr_in addrs[2];
 	struct inpcb *inp;
 	int error, s;
@@ -623,14 +624,19 @@
 		error = ENOENT;
 		goto out;
 	}
-	error = SYSCTL_OUT(req, inp->inp_socket->so_cred, sizeof(struct ucred));
+
+	xuc.cr_uid = inp->inp_socket->so_cred->cr_uid;
+	xuc.cr_ngroups = inp->inp_socket->so_cred->cr_ngroups;
+	bcopy(inp->inp_socket->so_cred->cr_groups, xuc.cr_groups,
+	    sizeof(xuc.cr_groups));
+	error = SYSCTL_OUT(req, &xuc, sizeof(struct xucred));
 out:
 	splx(s);
 	return (error);
 }
 
 SYSCTL_PROC(_net_inet_udp, OID_AUTO, getcred, CTLTYPE_OPAQUE|CTLFLAG_RW,
-    0, 0, udp_getcred, "S,ucred", "Get the ucred of a UDP connection");
+    0, 0, udp_getcred, "S,xucred", "Get the xucred of a UDP connection");
 
 static int
 udp_output(inp, m, addr, control, p)
Index: sys/netinet6/udp6_usrreq.c
===================================================================
RCS file: /usr2/ncvs/src/sys/netinet6/udp6_usrreq.c,v
retrieving revision 1.13
diff -u -r1.13 udp6_usrreq.c
--- sys/netinet6/udp6_usrreq.c	2000/10/23 07:11:01	1.13
+++ sys/netinet6/udp6_usrreq.c	2001/01/23 00:15:16
@@ -474,6 +474,7 @@
 static int
 udp6_getcred(SYSCTL_HANDLER_ARGS)
 {
+	struct xucred xuc;
 	struct sockaddr_in6 addrs[2];
 	struct inpcb *inp;
 	int error, s;
@@ -484,7 +485,7 @@
 
 	if (req->newlen != sizeof(addrs))
 		return (EINVAL);
-	if (req->oldlen != sizeof(struct ucred))
+	if (req->oldlen != sizeof(struct xucred))
 		return (EINVAL);
 	error = SYSCTL_IN(req, addrs, sizeof(addrs));
 	if (error)
@@ -498,9 +499,12 @@
 		error = ENOENT;
 		goto out;
 	}
-	error = SYSCTL_OUT(req, inp->inp_socket->so_cred,
-			   sizeof(struct ucred));
 
+	xuc.cr_uid = inp->inp_socket->so_cred->cr_uid;
+	xuc.cr_ngroups = inp->inp_socket->so_cred->cr_ngroups;
+	bcopy(inp->inp_socket->so_cred->cr_groups, xuc.cr_groups,
+	    sizeof(xuc.cr_groups));
+	error = SYSCTL_OUT(req, &xuc, sizeof(struct xucred));
 out:
 	splx(s);
 	return (error);
@@ -508,7 +512,7 @@
 
 SYSCTL_PROC(_net_inet6_udp6, OID_AUTO, getcred, CTLTYPE_OPAQUE|CTLFLAG_RW,
 	    0, 0,
-	    udp6_getcred, "S,ucred", "Get the ucred of a UDP6 connection");
+	    udp6_getcred, "S,xucred", "Get the xucred of a UDP6 connection");
 
 static int
 udp6_abort(struct socket *so)
Index: sys/nfs/nfs.h
===================================================================
RCS file: /usr2/ncvs/src/sys/nfs/nfs.h,v
retrieving revision 1.56
diff -u -r1.56 nfs.h
--- sys/nfs/nfs.h	2000/10/24 10:13:36	1.56
+++ sys/nfs/nfs.h	2001/01/23 00:28:27
@@ -197,7 +197,7 @@
 	struct nfsd	*nsd_nfsd;	/* Pointer to in kernel nfsd struct */
 	uid_t		nsd_uid;	/* Effective uid mapped to cred */
 	u_int32_t	nsd_haddr;	/* Ip address of client */
-	struct ucred	nsd_cr;		/* Cred. uid maps to */
+	struct xucred	nsd_cr;		/* Cred. uid maps to */
 	int		nsd_authlen;	/* Length of auth string (ret) */
 	u_char		*nsd_authstr;	/* Auth string (ret) */
 	int		nsd_verflen;	/* and the verfier */
Index: sys/nfs/nfs_syscalls.c
===================================================================
RCS file: /usr2/ncvs/src/sys/nfs/nfs_syscalls.c,v
retrieving revision 1.64
diff -u -r1.64 nfs_syscalls.c
--- sys/nfs/nfs_syscalls.c	2000/12/21 21:44:24	1.64
+++ sys/nfs/nfs_syscalls.c	2001/01/23 00:48:56
@@ -244,7 +244,7 @@
 				slp->ns_numuids++;
 				nuidp = (struct nfsuid *)
 				   malloc(sizeof (struct nfsuid), M_NFSUID,
-					M_WAITOK);
+					M_WAITOK | M_ZERO);
 			    } else
 				nuidp = (struct nfsuid *)0;
 			    if ((slp->ns_flag & SLP_VALID) == 0) {
@@ -260,7 +260,12 @@
 					FREE(nuidp->nu_nam, M_SONAME);
 			        }
 				nuidp->nu_flag = 0;
-				nuidp->nu_cr = nsd->nsd_cr;
+				nuidp->nu_cr.cr_uid = nsd->nsd_cr.cr_uid;
+				nuidp->nu_cr.cr_ngroups =
+				  nsd->nsd_cr.cr_ngroups;
+				bcopy(nsd->nsd_cr.cr_groups,
+				  nuidp->nu_cr.cr_groups,
+				  sizeof(nuidp->nu_cr.cr_groups));
 				if (nuidp->nu_cr.cr_ngroups > NGROUPS)
 				    nuidp->nu_cr.cr_ngroups = NGROUPS;
 				nuidp->nu_cr.cr_ref = 1;
Index: sys/sys/mount.h
===================================================================
RCS file: /usr2/ncvs/src/sys/sys/mount.h,v
retrieving revision 1.99
diff -u -r1.99 mount.h
--- sys/sys/mount.h	2000/12/04 09:21:05	1.99
+++ sys/sys/mount.h	2001/01/23 00:32:10
@@ -245,11 +245,11 @@
 struct export_args {
 	int	ex_flags;		/* export related flags */
 	uid_t	ex_root;		/* mapping for root uid */
-	struct	ucred ex_anon;		/* mapping for anonymous user */
+	struct	xucred ex_anon;		/* mapping for anonymous user */
 	struct	sockaddr *ex_addr;	/* net address to which exported */
-	int	ex_addrlen;		/* and the net address length */
+	u_char	ex_addrlen;		/* and the net address length */
 	struct	sockaddr *ex_mask;	/* mask of valid bits in saddr */
-	int	ex_masklen;		/* and the smask length */
+	u_char	ex_masklen;		/* and the smask length */
 	char	*ex_indexfile;		/* index file for WebNFS URLs */
 };
 
Index: sys/sys/ucred.h
===================================================================
RCS file: /usr2/ncvs/src/sys/sys/ucred.h,v
retrieving revision 1.19
diff -u -r1.19 ucred.h
--- sys/sys/ucred.h	2000/11/30 19:09:47	1.19
+++ sys/sys/ucred.h	2001/01/28 22:53:01
@@ -53,9 +53,18 @@
 	struct	uidinfo *cr_uidinfo;	/* per uid resource consumption */
 	struct	mtx cr_mtx;		/* protect refcount */
 };
-#define cr_gid cr_groups[0]
 #define NOCRED ((struct ucred *)0)	/* no credential available */
 #define FSCRED ((struct ucred *)-1)	/* filesystem credential */
+
+/*
+ * This is the external representation of struct ucred which "won't change".
+ */
+struct xucred {
+	uid_t	cr_uid;			/* effective user id */
+	short	cr_ngroups;		/* number of groups */
+	gid_t	cr_groups[NGROUPS];	/* groups */
+};
+#define cr_gid cr_groups[0]
 
 #ifdef _KERNEL
 
Index: usr.sbin/inetd/builtins.c
===================================================================
RCS file: /usr2/ncvs/src/usr.sbin/inetd/builtins.c,v
retrieving revision 1.29
diff -u -r1.29 builtins.c
--- usr.sbin/inetd/builtins.c	2000/12/05 13:56:01	1.29
+++ usr.sbin/inetd/builtins.c	2001/01/22 23:54:26
@@ -338,7 +338,7 @@
 	struct sockaddr_in6 sin6[2];
 #endif
 	struct sockaddr_storage ss[2];
-	struct ucred uc;
+	struct xucred uc;
 	struct timeval tv = {
 		10,
 		0


-- 
 Brian Fundakowski Feldman           \  FreeBSD: The Power to Serve!  /
 green@FreeBSD.org                    `------------------------------'


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7 10:33:19 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147])
	by hub.freebsd.org (Postfix) with ESMTP
	id DBF3B37B491; Wed,  7 Feb 2001 10:32:59 -0800 (PST)
Received: from critter (localhost [127.0.0.1])
	by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id f17IX5H02713;
	Wed, 7 Feb 2001 19:33:05 +0100 (CET)
	(envelope-from phk@critter.freebsd.dk)
To: "Brian F. Feldman" <green@FreeBSD.ORG>
Cc: arch@FreeBSD.ORG
Subject: Re: xucred introduction 
In-Reply-To: Your message of "Wed, 07 Feb 2001 13:28:21 EST."
             <200102071828.f17ISLr17637@green.dyndns.org> 
Date: Wed, 07 Feb 2001 19:33:05 +0100
Message-ID: <2711.981570785@critter>
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In message <200102071828.f17ISLr17637@green.dyndns.org>, "Brian F. Feldman" wri
tes:

>The only question is whether or not to add some spare fields to 
>xucred now in case we /do/ want to expand it in the future, and also whether 
>it's appropriate to make some of the field type changes (for example, 
>sockaddr length type -> u_char, since that _IS_ what is defined by the 
>sockaddr interface).

Have you already put a version number in it ?  Otherwise please
do so.  That is the best way to ensure that we don't get too many
problems in the future.

I think in general all structures shared between the kernel and userland
should be equipped with a version number as the first element.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7 10:37:31 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from syncopation-03.iinet.net.au (syncopation-03.iinet.net.au [203.59.24.49])
	by hub.freebsd.org (Postfix) with SMTP id 3E4F537B65D
	for <arch@FreeBSD.org>; Wed,  7 Feb 2001 10:37:13 -0800 (PST)
Received: (qmail 20518 invoked by uid 666); 7 Feb 2001 18:44:43 -0000
Received: from reggae-22-100.nv.iinet.net.au (HELO elischer.org) (203.59.87.100)
  by mail.m.iinet.net.au with SMTP; 7 Feb 2001 18:44:43 -0000
Message-ID: <3A8195D4.8CFECC9@elischer.org>
Date: Wed, 07 Feb 2001 10:37:08 -0800
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: "Brian F. Feldman" <green@FreeBSD.org>
Cc: arch@FreeBSD.org
Subject: Re: xucred introduction
References: <200102071828.f17ISLr17637@green.dyndns.org>
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

"Brian F. Feldman" wrote:
> 
> I'd like to commit this further clean-up of the kernel API in which struct
[...]
technically it seems ok..
it's a political decision as to whether it should be done...

-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000-2001
---> X_.---._/  
            v


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7 10:44:48 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from syncopation-03.iinet.net.au (syncopation-03.iinet.net.au [203.59.24.49])
	by hub.freebsd.org (Postfix) with SMTP id 0549C37B491
	for <arch@FreeBSD.ORG>; Wed,  7 Feb 2001 10:44:30 -0800 (PST)
Received: (qmail 20921 invoked by uid 666); 7 Feb 2001 18:52:00 -0000
Received: from reggae-22-100.nv.iinet.net.au (HELO elischer.org) (203.59.87.100)
  by mail.m.iinet.net.au with SMTP; 7 Feb 2001 18:52:00 -0000
Message-ID: <3A819788.DA35F78C@elischer.org>
Date: Wed, 07 Feb 2001 10:44:24 -0800
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: "Brian F. Feldman" <green@FreeBSD.ORG>, arch@FreeBSD.ORG
Subject: Re: xucred introduction
References: <2711.981570785@critter>
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Poul-Henning Kamp wrote:
> 
> In message <200102071828.f17ISLr17637@green.dyndns.org>, "Brian F. Feldman" wri
> tes:
> 
> >The only question is whether or not to add some spare fields to
> >xucred now in case we /do/ want to expand it in the future, and also whether
> >it's appropriate to make some of the field type changes (for example,
> >sockaddr length type -> u_char, since that _IS_ what is defined by the
> >sockaddr interface).
> 
> Have you already put a version number in it ?  Otherwise please
> do so.  That is the best way to ensure that we don't get too many
> problems in the future.
> 
> I think in general all structures shared between the kernel and userland
> should be equipped with a version number as the first element.

this brings up whether we should have 'rules' for kernel structures in general..

for example
"Always start with a version number followed by a magic number followed by the
reference count and the lock" or something like that. I know some systems DO
impoes such rules and seem to get advantages from it.  (you can add debug code
to check the magic numbers really easily for example).

Not a REALLY serious suggestion but something to consider.
what would YOU like to see as a standard part of kernel structures?

reference count?
magic number?
generation count?
lock (pointer?)
version number?

I leads to a general discussion about kernel architecture eventually :-)

> 
> --
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-arch" in the body of the message

-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000-2001
---> X_.---._/  
            v


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7 10:50:18 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147])
	by hub.freebsd.org (Postfix) with ESMTP
	id E716F37B401; Wed,  7 Feb 2001 10:50:00 -0800 (PST)
Received: from critter (localhost [127.0.0.1])
	by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id f17Io4H02865;
	Wed, 7 Feb 2001 19:50:04 +0100 (CET)
	(envelope-from phk@critter.freebsd.dk)
To: Julian Elischer <julian@elischer.org>
Cc: "Brian F. Feldman" <green@FreeBSD.ORG>, arch@FreeBSD.ORG
Subject: Re: xucred introduction 
In-Reply-To: Your message of "Wed, 07 Feb 2001 10:44:24 PST."
             <3A819788.DA35F78C@elischer.org> 
Date: Wed, 07 Feb 2001 19:50:04 +0100
Message-ID: <2863.981571804@critter>
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


>Not a REALLY serious suggestion but something to consider.
>what would YOU like to see as a standard part of kernel structures?

All I want is a layout version number as the first element in
the structure if it is in any way sanctioned for use from userland.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7 11:24:38 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4EC4837B503; Wed,  7 Feb 2001 11:24:22 -0800 (PST)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.1/8.9.3) id f17JNlX91394;
	Wed, 7 Feb 2001 11:23:47 -0800 (PST)
	(envelope-from dillon)
Date: Wed, 7 Feb 2001 11:23:47 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
Message-Id: <200102071923.f17JNlX91394@earth.backplane.com>
To: Julian Elischer <julian@elischer.org>
Cc: Poul-Henning Kamp <phk@critter.freebsd.dk>,
	"Brian F. Feldman" <green@FreeBSD.ORG>, arch@FreeBSD.ORG
Subject: Re: xucred introduction
References: <2711.981570785@critter> <3A819788.DA35F78C@elischer.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:this brings up whether we should have 'rules' for kernel structures in general..

    I'd have to say no.  It's too easy for this sort of thing to get
    completely out of control.

    I agree with Poul re: having a version number at the head of any 
    structure exported to userland.  Maybe a size as well (or some 
    out of band way to get the structure size).

						-Matt


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7 11:59:53 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from flood.ping.uio.no (flood.ping.uio.no [129.240.78.31])
	by hub.freebsd.org (Postfix) with ESMTP
	id 1A31237B503; Wed,  7 Feb 2001 11:59:35 -0800 (PST)
Received: (from des@localhost)
	by flood.ping.uio.no (8.9.3/8.9.3) id UAA63934;
	Wed, 7 Feb 2001 20:57:59 +0100 (CET)
	(envelope-from des@ofug.org)
X-URL: http://www.ofug.org/~des/
X-Disclaimer: The views expressed in this message do not necessarily
  coincide with those of any organisation or company with
  which I am or have been affiliated.
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: Julian Elischer <julian@elischer.org>,
	"Brian F. Feldman" <green@FreeBSD.ORG>, arch@FreeBSD.ORG
Subject: Re: xucred introduction
References: <2863.981571804@critter>
From: Dag-Erling Smorgrav <des@ofug.org>
Date: 07 Feb 2001 20:57:59 +0100
In-Reply-To: Poul-Henning Kamp's message of "Wed, 07 Feb 2001 19:50:04 +0100"
Message-ID: <xzpk872mfjs.fsf@flood.ping.uio.no>
Lines: 15
User-Agent: Gnus/5.0802 (Gnus v5.8.2) Emacs/20.4
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Poul-Henning Kamp <phk@critter.freebsd.dk> writes:
> All I want is a layout version number as the first element in
> the structure if it is in any way sanctioned for use from userland.

Some structures (specifically, those that are to be stored in zones)
*must* start with two pointers to their own type. This is arguably a
design flaw in the zone allocator. One possible fix is to add an extra
argument to zinit(), zinitna() and zbootinit() to specify the offset
of these pointers within the structure; another is to have the zone
allocator prepend those pointers itself, so they don't need to be in
the structure at all.

DES
-- 
Dag-Erling Smorgrav - des@ofug.org


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7 12:36:21 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [18.24.4.193])
	by hub.freebsd.org (Postfix) with ESMTP id 2D7AF37B4EC
	for <arch@freebsd.org>; Wed,  7 Feb 2001 12:36:03 -0800 (PST)
Received: (from wollman@localhost)
	by khavrinen.lcs.mit.edu (8.9.3/8.9.3) id PAA46466;
	Wed, 7 Feb 2001 15:35:57 -0500 (EST)
	(envelope-from wollman)
Date: Wed, 7 Feb 2001 15:35:57 -0500 (EST)
From: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
Message-Id: <200102072035.PAA46466@khavrinen.lcs.mit.edu>
To: des@ofug.org
Cc: arch@freebsd.org
Subject: Re: xucred introduction
X-Newsgroups: mit.lcs.mail.freebsd-arch
In-Reply-To: <mit.lcs.mail.freebsd-arch/xzpk872mfjs.fsf@flood.ping.uio.no>
References: <mit.lcs.mail.freebsd-arch/2863.981571804@critter>
Organization: MIT Laboratory for Computer Science
Cc: 
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In article <mit.lcs.mail.freebsd-arch/xzpk872mfjs.fsf@flood.ping.uio.no> you write:
>Some structures (specifically, those that are to be stored in zones)
>*must* start with two pointers to their own type.

No, they don't.

See, e.g., struct inpcb.

The restriction that you get from the zone allocator is that the
beginning of the zone is overlaid with two such pointers *while the
object is free*, so you cannot depend on type-stability for values
which would be stored there.  In the TCP stack, the only thing we
really care about being type-stable is the generation count, which was
intentionally placed at the end of the structure.

-GAWollman

-- 
Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
wollman@lcs.mit.edu  | O Siem / The fires of freedom 
Opinions not those of| Dance in the burning flame
MIT, LCS, CRS, or NSA|                     - Susan Aglukark and Chad Irschick


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7 13:26:42 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133])
	by hub.freebsd.org (Postfix) with ESMTP
	id 2A49137B65D; Wed,  7 Feb 2001 13:26:18 -0800 (PST)
Received: (from daemon@localhost)
	by smtp03.primenet.com (8.9.3/8.9.3) id OAA27535;
	Wed, 7 Feb 2001 14:23:20 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp03.primenet.com, id smtpdAAA7zaWQ1; Wed Feb  7 14:23:10 2001
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id OAA24284;
	Wed, 7 Feb 2001 14:26:00 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200102072126.OAA24284@usr08.primenet.com>
Subject: Re: vnode interlock API
To: bp@butya.kz (Boris Popov)
Date: Wed, 7 Feb 2001 21:26:00 +0000 (GMT)
Cc: freebsd-arch@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG
In-Reply-To: <Pine.BSF.4.21.0102061638280.82511-100000@lion.butya.kz> from "Boris Popov" at Feb 06, 2001 05:00:03 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> 	So, I suggest to introduce two macro definitions which will hide
> implementation details for interlocks:
> 
> #define VI_LOCK(vp)		mtx_enter(&(vp)->v_interlock, MTX_DEF)
> #define VI_UNLOCK(vp)		mtx_exit(&(vp)->v_interlock, MTX_DEF)
> 
> 	for RELENG_4 they will look like this:
> 
> #define VI_LOCK(vp)		simple_lock(&(vp)->v_interlock)
> #define VI_UNLOCK(vp)		simple_unlock(&(vp)->v_interlock)
> 
> 	Any comments, suggestions ?

1)	Macros are good; interfaces are better.  I've consistantly
	recommended that the NFS cookie interface be rewritten to
	not require cookies, even though the FreeBSD/NetBSD/OpenBSD
	differences _could_ be masked with macros.  The issue is
	one of binary vs. source compatability.

2)	If you are going to wrap vnode handling, it would probably
	be a good idea to wrap it using the same approach that
	another OS uses, instead of being gratuitously different
	in naming.  I would suggest using the Solaris names, but I
	will admit that doing that depends  heavily on the semantics
	being the same (I think they would be).  Worst case, pick an
	OS with the same semantics; if there are none, this may be
	an opportunity to learn from other OSs _why_ they don't have
	the same semantics.

3)	It seems to mee that the additional parameter of MTX_DEF is
	gratuitous, and tries to stretch mutex semantics further
	than they should be stretched.  I personally would have no
	problem with the conversion of simple_{un}lock() into the
	equivalent mtx_*() calls.  Even if the MTX_DEF can not be
	murdered without a large public outcry, using this as the
	the default demantic for the simple_*() equivalents isn't
	really a bad idea, in my book, and could be done with
	inline wrappers.  Best case, one could apply the WITNESS
	code to debugging 4.x problems, with some work.

4)	You need to wrap the calls with "{ ... }"; this is because
	it may be useful in the future to institute turnstile or
	single wakeup semantics, and converting the macro into a
	single statement instead of a statement block would mean
	a potentially large amount of work would be needed to cope
	with the change later, whereas, you seem to plan to already
	need to touch all those spots now.  Again, the Solaris SMP
	vnode lock management macros are, I think, a good example
	(or at least they were, six years ago, when Solaris faced
	the same problem).

I have other comments, but these are the four most important ones,
IMO, and I've been making a conscious effort to not clutter arguments
by giving more detail than people seem to want to hear before they
overflow and tune out.  8-).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7 13:38:28 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
	by hub.freebsd.org (Postfix) with ESMTP
	id AEE6237B6A6; Wed,  7 Feb 2001 13:38:09 -0800 (PST)
Received: (from daemon@localhost)
	by smtp02.primenet.com (8.9.3/8.9.3) id OAA05271;
	Wed, 7 Feb 2001 14:32:13 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp02.primenet.com, id smtpdAAAOCaGMh; Wed Feb  7 14:29:51 2001
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id OAA24445;
	Wed, 7 Feb 2001 14:35:45 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200102072135.OAA24445@usr08.primenet.com>
Subject: Re: CFR: Sequential mbuf read/write extensions
To: bp@butya.kz (Boris Popov)
Date: Wed, 7 Feb 2001 21:35:44 +0000 (GMT)
Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG
In-Reply-To: <Pine.BSF.4.21.0102061703030.82511-100000@lion.butya.kz> from "Boris Popov" at Feb 06, 2001 05:50:52 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> 	Before starting import process for smbfs, I would like to
> introduce new API which greatly simplifies process of packaging data into
> mbufs and fetching it back (in fact, similar API already presented in the
> tree, but it is private to the netncp code and it will be really nice to
> share it).

[ ... ]

Please include the ability to determine the length of the current
contents (as a marcro?) so that buffers can be padded, as necessary,
since some hardware and some protocols require this.

Also consider protecting the structure with a mutex, at least in
kernel space (this would make the macro harder to write, which is
why I put it into a parenthetical, question-marjed statement).

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7 14:33:19 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134])
	by hub.freebsd.org (Postfix) with ESMTP
	id 2F26737B6BA; Wed,  7 Feb 2001 14:33:02 -0800 (PST)
Received: (from daemon@localhost)
	by smtp04.primenet.com (8.9.3/8.9.3) id PAA01063;
	Wed, 7 Feb 2001 15:27:40 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp04.primenet.com, id smtpdAAAiway1b; Wed Feb  7 15:27:24 2001
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id PAA26462;
	Wed, 7 Feb 2001 15:32:28 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200102072232.PAA26462@usr08.primenet.com>
Subject: Re: xucred introduction
To: dillon@earth.backplane.com (Matt Dillon)
Date: Wed, 7 Feb 2001 22:32:28 +0000 (GMT)
Cc: julian@elischer.org (Julian Elischer),
	phk@critter.freebsd.dk (Poul-Henning Kamp),
	green@FreeBSD.ORG (Brian F. Feldman), arch@FreeBSD.ORG
In-Reply-To: <200102071923.f17JNlX91394@earth.backplane.com> from "Matt Dillon" at Feb 07, 2001 11:23:47 AM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> :this brings up whether we should have 'rules' for kernel structures in general..
> 
>     I'd have to say no.  It's too easy for this sort of thing to get
>     completely out of control.

Pretty soon you end up with things like "The VAX Calling Standard",
which leads to nasty things like clustering, transparent process
migration, autonatic load balancing, software fault tolerance,
automatic failover, and all those things we'd rather not think
matter to anyone unless they are running a server OS...

PS: My vote is to put the mutex first, not export it to user space,
and then put the version number.  I'd keep the version number even
if it weren't a user-space/kernel-space interface, since you never
know when it will be useful to deal with passing a structure between
a new kernel and an old driver/module, or vice versa...

PPS: User-to-kernel writes that change contents should hold the
mutex in the kernel, in the API.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7 15: 4: 7 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222])
	by hub.freebsd.org (Postfix) with ESMTP
	id A8A4237B6C4; Wed,  7 Feb 2001 15:03:46 -0800 (PST)
Received: from foo.osd.bsdi.com (root@foo.osd.bsdi.com [204.216.28.137])
	by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id f17N3cx18927;
	Wed, 7 Feb 2001 15:03:38 -0800 (PST)
	(envelope-from jhb@foo.osd.bsdi.com)
Received: (from jhb@localhost)
	by foo.osd.bsdi.com (8.11.1/8.11.1) id f17N3FU14366;
	Wed, 7 Feb 2001 15:03:15 -0800 (PST)
	(envelope-from jhb)
Message-ID: <XFMail.010207150315.jhb@FreeBSD.org>
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <2711.981570785@critter>
Date: Wed, 07 Feb 2001 15:03:15 -0800 (PST)
Organization: BSD, Inc.
From: John Baldwin <jhb@FreeBSD.ORG>
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Subject: Re: xucred introduction
Cc: arch@FreeBSD.ORG, "Brian F. Feldman" <green@FreeBSD.ORG>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On 07-Feb-01 Poul-Henning Kamp wrote:
> In message <200102071828.f17ISLr17637@green.dyndns.org>, "Brian F. Feldman"
> wri
> tes:
> 
>>The only question is whether or not to add some spare fields to 
>>xucred now in case we /do/ want to expand it in the future, and also whether 
>>it's appropriate to make some of the field type changes (for example, 
>>sockaddr length type -> u_char, since that _IS_ what is defined by the 
>>sockaddr interface).
> 
> Have you already put a version number in it ?  Otherwise please
> do so.  That is the best way to ensure that we don't get too many
> problems in the future.
> 
> I think in general all structures shared between the kernel and userland
> should be equipped with a version number as the first element.

As a sidebar, for anyone looking for something to do: kinfo_proc needs a version
number as well.  If you change the size of something in the middle of the
structure, things like ps(1) and top(1) won't notice a problem but will just
misparse the structure. :(

-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7 15:23:28 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from VL-MS-MR002.sc1.videotron.ca (relais.videotron.ca [24.201.245.36])
	by hub.freebsd.org (Postfix) with ESMTP
	id 71B2A37B6C3; Wed,  7 Feb 2001 15:23:04 -0800 (PST)
Received: from jehovah ([24.201.144.31]) by
          VL-MS-MR002.sc1.videotron.ca (Netscape Messaging Server 4.15)
          with SMTP id G8EUAA03.L5I; Wed, 7 Feb 2001 18:22:58 -0500 
Message-ID: <002e01c0915d$326a7ec0$1f90c918@jehovah>
From: "Bosko Milekic" <bmilekic@technokratis.com>
To: "Terry Lambert" <tlambert@primenet.com>,
	"Boris Popov" <bp@butya.kz>
Cc: <freebsd-arch@FreeBSD.ORG>, <freebsd-fs@FreeBSD.ORG>
References: <200102072126.OAA24284@usr08.primenet.com>
Subject: Re: vnode interlock API
Date: Wed, 7 Feb 2001 18:25:02 -0500
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2919.6700
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


Terry Lambert wrote:

[...]
> 3) It seems to mee that the additional parameter of MTX_DEF is
> gratuitous, and tries to stretch mutex semantics further
> than they should be stretched.  I personally would have no
> problem with the conversion of simple_{un}lock() into the
> equivalent mtx_*() calls.  Even if the MTX_DEF can not be
> murdered without a large public outcry, using this as the

    Actually, it has been murdered: 

    http://people.freebsd.org/~bmilekic/code/mutex_cleanup-7.1.diff

    Presently under testing.

> the default demantic for the simple_*() equivalents isn't
> really a bad idea, in my book, and could be done with
> inline wrappers.  Best case, one could apply the WITNESS
> code to debugging 4.x problems, with some work.
 
[...]
> 
> Terry Lambert
> terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.

Regards,
Bosko.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7 16:35:56 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from midas.ifour.com.br (unknown [200.238.229.70])
	by hub.freebsd.org (Postfix) with SMTP id E76F637B4EC
	for <freebsd-arch@freebsd.org>; Wed,  7 Feb 2001 16:35:36 -0800 (PST)
Received: (qmail 20944 invoked from network); 7 Feb 2001 21:30:53 -0000
Received: from unknown (HELO ifour.com.br) (192.168.1.11)
  by 192.168.1.10 with SMTP; 7 Feb 2001 21:30:53 -0000
Message-ID: <3A81CD00.9C5461FB@ifour.com.br>
Date: Wed, 07 Feb 2001 22:32:32 +0000
From: Gustavo Vieira Goncalves Coelho Rios <gustavo@ifour.com.br>
X-Mailer: Mozilla 4.76 [en] (X11; U; FreeBSD 4.2-STABLE i386)
X-Accept-Language: en
MIME-Version: 1.0
To: freebsd-arch@freebsd.org
Subject: own boot floppies set
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

May some one give me some help where i can find documentation on
building my own boot floppy disk for freebsd ?

Thanks in advance!


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Feb  7 17:47:35 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from molly.straylight.com (molly.straylight.com [209.68.199.242])
	by hub.freebsd.org (Postfix) with ESMTP
	id D0B2A37B491; Wed,  7 Feb 2001 17:47:15 -0800 (PST)
Received: from dickie (case.straylight.com [209.68.199.244])
	by molly.straylight.com (8.11.0/8.10.0) with SMTP id f181l5X04054;
	Wed, 7 Feb 2001 17:47:09 -0800
From: "Jonathan Graehl" <jonathan@graehl.org>
To: <freebsd-arch@freebsd.org>
Cc: "Jonathan Lemon" <jlemon@freebsd.org>
Subject: empirical results of waiting for nonblocking connect with kqueue/EVFILT_WRITE (EV_EOF is not set for timed out connections, bug?)
Date: Wed, 7 Feb 2001 17:47:54 -0800
Message-ID: <NCBBLOALCKKINBNNEDDLCEHNDKAA.jonathan@graehl.org>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200
Importance: Normal
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

cases for the kevent returned from EVFILT_WRITE for socket whose connect
returned error EWOULDBLOCK:

connection failed, refused: flags=0x8001 (= EV_EOF & EV_ADD); data=0x4000
connection failed, timed out (+ any icmp response, host unreachable, host admin
prohibited, etc): flags=0x1 (= EV_ADD); data=0x4000
connection succesful: flags=0x1 = EV_ADD; data=0x43e0 ( = socket buffer bytes
available to write)

if you want to see the particular error code (host or net unreachable, or just
plain timed out) for a timed out connection, you can use
getsockopt(SO_ERROR...).  also getpeername can determine if the socket is
connected (is there a more direct socket call to do so?)

question: clearly, the event for a pending connection has these reproducible
(aside from changing the socket send buffer size), undocumented values in the
flags/data fields.  what can be counted on (and documented) in the future?  for
now, i would use the test e.data != 0x4000, and make sure i don't set my socket
send buffer small enough for any confusion to arise.

i would think that EV_EOF should be set for timed out connections as well as
refused ones, and this should be the documented criteria

my suggestion would be to create a flag EV_SOERR, and change filt_soread and
filt_sowrite (in sys/kern/uipc_socket.c) from:
        if (so->so_error)       /* temporary udp error */
                return (1);
to:
        if (so->so_error) {
                kn->kn_flags |= EV_SOERR;
                kn->kn_data = so->so_error;
                return (1);
        }
or, to maintain compatibility (if it is necessary to return with no indication
for udp errors?), to:
	 if (so->so_error) {
	          if ((so->so_proto->pr_flags & PR_CONNREQUIRED))
                        kn->kn_flags |= EV_EOF;
                return (1);
        }

(EV_EOF and/or EV_SOERR would be fine in either case, as long as there is some
indication, although it would be nice to not have to getsockopt(SO_ERR,...))

larger context:

static int
filt_sowrite(struct knote *kn, long hint)
{
        struct socket *so = (struct socket *)kn->kn_fp->f_data;

        kn->kn_data = sbspace(&so->so_snd);
        if (so->so_state & SS_CANTSENDMORE) {
                kn->kn_flags |= EV_EOF;
                return (1);
        }
        if (so->so_error)       /* temporary udp error */
                return (1);
        if (((so->so_state & SS_ISCONNECTED) == 0) &&
            (so->so_proto->pr_flags & PR_CONNREQUIRED))
                return (0);
        return (kn->kn_data >= so->so_snd.sb_lowat);
}

disclaimer: i only vaguely understand what's going on ;)

--
Jonathan Graehl
  email: jonathan@graehl.org
  web: http://jonathan.graehl.org/
  phone: 858-642-7562


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Thu Feb  8  4:34:18 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from salmon.maths.tcd.ie (salmon.maths.tcd.ie [134.226.81.11])
	by hub.freebsd.org (Postfix) with SMTP
	id DAA3737B65D; Thu,  8 Feb 2001 04:33:51 -0800 (PST)
Received: from walton.maths.tcd.ie by salmon.maths.tcd.ie with SMTP
          id <aa16167@salmon>; 8 Feb 2001 12:33:50 +0000 (GMT)
To: Boris Popov <bp@butya.kz>
Cc: freebsd-arch@freebsd.org, freebsd-net@freebsd.org,
	iedowse@maths.tcd.ie
Subject: Re: CFR: Sequential mbuf read/write extensions 
In-Reply-To: Your message of "Tue, 06 Feb 2001 17:50:52 +0600."
             <Pine.BSF.4.21.0102061703030.82511-100000@lion.butya.kz> 
Date: Thu, 08 Feb 2001 12:33:50 +0000
From: Ian Dowse <iedowse@maths.tcd.ie>
Message-ID:  <200102081233.aa16167@salmon.maths.tcd.ie>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In message <Pine.BSF.4.21.0102061703030.82511-100000@lion.butya.kz>, Boris Popo
v writes:
>	Before starting import process for smbfs, I would like to
>introduce new API which greatly simplifies process of packaging data into
>mbufs and fetching it back (in fact, similar API already presented in the
>tree, but it is private to the netncp code and it will be really nice to
>share it).

Hi Boris,

These mbuf chain manipulation primitives look great! I was playing
around with some similar code myself a while ago, so I'll just
mention a few of the general issues that may be worth thinking
about. I don't have any strong opinions about what approaches are
best, so please don't take anything I say too seriously unless you
agree with it :-)


It may be beneficial to use separate structs for the build and
breakdown operations. The two cases have slightly different
requirements: the mb_count field is only useful when building, and
mb_pos is only strictly necessary when breaking down mbuf chains.
The main advantage of using separate structs is better compiler
type checking, especially in the arguments to functions that need
to break down one chain and build another.


The i386 architecture is not fussy about alignment of multi-byte
types in memory operations. However other architectures are not
so forgiving. Some NIC drivers have to do magic to ensure that IP
packets are 4-byte aligned, but this will not help if you are using
a protocol that does not guarantee 4-byte alignment of 32- or 64-bit
quantities within the IP packet. Doing a

	mb_get_dword(...);
	mb_get_byte(...);
	mb_get_dword(...);

will cause an alignment exception on the alpha, for example.


Someone suggested using numeric names to indicate the size of the
types rather than 'byte', 'word' etc. I'd agree with this too; the
text names are not intuitive unless you have used dos/windows for
too long :-) Maybe use names such as mb_get_uint32, so that it is
obvious what C type should be passed as an argument.


I wonder if 'mbdata' is the best name for the struct? I think I
had used something like 'mchain', but if separate build/breakdown
structs are used, maybe mbuild/mbreakdown or mbchain/mdchain? (the
NFS code uses the words 'build' and 'dissect' to refer to the two
operations). The main idea would just be to try and have the name
indicate what information is held by the struct.


Another useful 'put' function would be something that adds a number
of bytes of 'empty' space to the end of the chain, and sets up a
uio/iovec pointing to this space. e.g to read from a file to an
mbuf chain you could use:

	error = mb_put_muio(mbp, &uio, size, &iovp);
	...
	error = VOP_READ(vp, &uio, flag, cred);
	...
	FREE(iovp, M_TEMP);


For cases where there is a small (< MLEN) but relatively complex
data structure to be extracted from a chain, it may be useful to
have a function which just rearranges the mbufs to ensure that a
number of bytes become contiguous. It can make an in-mbuf pointer
to that space available. In most cases this will avoid having to
copy the data.


I wonder if these routines are the correct place for the endian
conversions? It certainly simplifies the code that must build and
parse requests, but requires duplication of each mb_get/mb_put
operation. I understand that there isn't currently code in the tree
for dealing with odd protocols that use little-endian format for
data transmitted on the network (smb is one of these?).


Sometimes it is useful to have idempotent init() and free() functions.
For example, consider a function which builds a request and sends
it, but which must handle errors both before and after the mbuf
chain is sent off to the protocol. If mb_init simply NULL'd out
the mb_top pointer, then the code could look like this:

		mb_init(&mb);
		
		if (mb_add_xxx(...) != 0)
			goto out;

		...->pru_sosend(..., mb.mb_top, ...);
		mb_init(&mb);
		...
		if (error)
			goto out;

	out:
		mb_free(&mb);
		return (error);

The pru_sosend() function takes over ownership of the mbuf chain,
so there is a need to just blank out the mbdata structure without
freeing the chain, and without performing any allocations. An init
function which cannot fail also simplifies the code. See callout_init()
in kern_timeout.c for similar code.


The mb_put_pstring function maybe belongs in the protocol-specific
code rather than here, since there are just too many different ways
of encoding strings. Different protocols are likely to encode
strings in different ways, with respect to length field type and
padding/alignment.


Some of these mb_ functions return EBADRPC when not enough bytes
of data are found in the mbuf chain. It might be better to choose
a more generic return code, since these routines are not specific
to RPC.

Ian


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Thu Feb  8 11:14:42 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from VL-MS-MR002.sc1.videotron.ca (relais.videotron.ca [24.201.245.36])
	by hub.freebsd.org (Postfix) with ESMTP
	id 79DF937B684; Thu,  8 Feb 2001 11:14:16 -0800 (PST)
Received: from jehovah ([24.201.144.31]) by
          VL-MS-MR002.sc1.videotron.ca (Netscape Messaging Server 4.15)
          with SMTP id G8GDFD04.BFD; Thu, 8 Feb 2001 14:14:01 -0500 
Message-ID: <013201c09203$971be9c0$1f90c918@jehovah>
From: "Bosko Milekic" <bmilekic@technokratis.com>
To: "Boris Popov" <bp@butya.kz>
Cc: <freebsd-arch@FreeBSD.ORG>, <freebsd-net@FreeBSD.ORG>
References: <Pine.BSF.4.21.0102071516110.7952-100000@lion.butya.kz>
Subject: Re: Sequential mbuf read/write extensions
Date: Thu, 8 Feb 2001 14:16:07 -0500
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2919.6700
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


Boris Popov wrote:

[...]
> Not exactly so. 'option LIBMBUF' will just connect the source file
> to kernel makefile. There is no need for any #ifdef's in the code.

    Right. But I assume LIBMBUF will absolutely be needed if code that
uses the routines is compiled. What I just meant to say was: "when the
code using these routines grows to be significant enough, then we can
just remove the option."

> > #define M_TRYWAIT M_WAIT is not right.
> > (M_WAIT is no longer to be used in the mbuf code.)
>
> You omitted the surrounding "#ifndef M_TRYWAIT" which makes this
> code portable to RELENG_4 (mind you, this code taken from smbfs). Of
> course, this should be stripped before import.

    I did, you're right. I guess I saw the "ifndef" wrong... I read
this with only -CURRENT in mind and was afraid that the mbuf code
flags would start mixing in with the malloc code flags -- something I
tried to fight off in the past while.

> > The succesfull return values are 0, I don't have a problem with
this,
> > specifically, but I would assume that this:
> > if (!mb_init(mbp))  ... would be more "logical" (I use the term
> > loosely) if it meant: "if initialization fails" (now it means "if
> > initialization is succesful").
>
> I'm generally don't like such syntax if function or variable name
> do not clearly specify which value it should have/return on success.
> Nearly all functions in this file return zero or error code, so the
> correct syntax of the above will be:
>
> error = mb_init(mbp);
> if (!error)
>
> or
>
> if (error)
> return error;
>
> or
>
> if (mb_init(mbp) != 0)
> return ESOMETHINGEVIL;

    OK.

> > > significantly affected. The names of source and header files are
> > > questionable too and I would appreciate good suggestions
(currently
> > they
> > > are subr_mbuf.c and subr_mbuf.h).
> >
> >     Hmmm. Maybe subr_mblib.c and libmb.h ? I don't want to turn
this
> > into a bikeshed ( :-) ), so I suggest that you decide. Personally,
I
> > would prefer that it be something other than "subr_mbuf.c" simply
> > because it may be a little misleading in some cases.
>
> Good point.
>
> > Boris, this is really a great interface and nice looking, clean
code.
>
> I'm sure, this code can be significantly improved by mbuf gurus :)
>
> --
> Boris Popov
> http://www.butya.kz/~bp/

Ok, I have a few things to add (although I'm sure you'll be more into
reading Ian Dowse's comments) :-)

in mb_append_record(), you walk all the "record" mbufs to get to the
last "record." How good would be the tradeoff? i.e. keeping a pointer
to the last pkt in the mbdata structure's mbuf chain? We would grow
the structure by a pointer, and we may have to maintain the last
record pointer; but isn't the only place where we would have to
"maintain it" in mb_append_record() anyway?

in mb_init(), the m->m_pkthdr.rcvif = NULL; can be ommitted, as
MGETHDR() will do that. The m->m_len = 0 should stay for now.

m_getm() looks like it should belong in uipc_mbuf.c -- it looks quite
a bit like the "all or nothing" allocation routine I have sitting
here. The difference is that mine doesn't take size as an argument,
but rather the actual count of mbufs and all it does is allocate
`count' mbufs and attach a cluster to each one of them. If it can't
allocate a cluster or an mbuf at any point, it frees everything and
returns. Now that I think about it, I'd much rather have `size' passed
in instead, even though some callers may not know the `size' (think
drivers that pre-allocate mbufs + clusters, they typically know the
`count'), it turns out that it is cheaper to compute the count from
the size than the optimal size from the count, in the mbuf case. If
you don't mind, I would strongly recommend moving m_getm() to
uipc_mbuf.c. Code that doesn't know the `size' but knows the `count'
(like some driver code) can do;

m = m_get(M_TRYWAIT, MT_DATA);
if (m == NULL) {
    /* we can't even allocate one mbuf, we're really low,
       so don't even bother calling m_getm(). The other
       option would be to have m_getm() not require
        us to pre-allocate an mbuf at all and do all the
       work, but then that may interfere with code like
       yours which needs to pass in an existing mbuf
       that has already been allocated. */
    m_free(m);
    /* fail right here */
} else {
    size = count * (MLEN + MCLBYTES);
    if (m_getm(m, size) == NULL) {
        /* everything has been properly freed for us,
            we don't have to worry about leaking mbufs. */
        /* fail right here. */
    }
}

For this to work, though, m_getm() needs to be modified to free all of
`top' chain if it can't get either a cluster or an mbuf. I don't know
if this was intentional, but it seems to me that there is a subtle
problem in m_getm() as it is now:

if (len > MINCLSIZE) {
    MCLGET(m, M_TRYWAIT);
    if ((m->m_flags & M_EXT) == 0) {
       m_freem(m); <------ frees only one mbuf
       return NULL;
			    }
		}

I think what may happen here is that you will leak your `top' chain if
you fail to allocate a cluster.

Assuming that the leak does exist and that it is fixed, we have a
pretty good mechanism for doing 'all or nothing' allocations. :-)

Later,
Bosko.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Thu Feb  8 18:31:26 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94])
	by hub.freebsd.org (Postfix) with ESMTP
	id E3CA837B401; Thu,  8 Feb 2001 18:30:57 -0800 (PST)
Received: by relay.butya.kz (Postfix, from userid 1000)
	id A23A328E1C; Fri,  9 Feb 2001 08:30:48 +0600 (ALMT)
Received: from localhost (localhost [127.0.0.1])
	by relay.butya.kz (Postfix) with ESMTP
	id 8565D2863E; Fri,  9 Feb 2001 08:30:48 +0600 (ALMT)
Date: Fri, 9 Feb 2001 08:30:48 +0600 (ALMT)
From: Boris Popov <bp@butya.kz>
To: Ian Dowse <iedowse@maths.tcd.ie>
Cc: freebsd-arch@freebsd.org, freebsd-net@freebsd.org
Subject: Re: CFR: Sequential mbuf read/write extensions 
In-Reply-To: <200102081233.aa16167@salmon.maths.tcd.ie>
Message-ID: <Pine.BSF.4.21.0102090752290.24710-100000@lion.butya.kz>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 8 Feb 2001, Ian Dowse wrote:

> It may be beneficial to use separate structs for the build and
> breakdown operations. The two cases have slightly different
> requirements: the mb_count field is only useful when building, and
> mb_pos is only strictly necessary when breaking down mbuf chains.
> The main advantage of using separate structs is better compiler
> type checking, especially in the arguments to functions that need
> to break down one chain and build another.

	Yes, I've been thinking about it, because once I've managed to 
mix build and breakdown buffers :). The only (not essential) disadvantage
is that it will require two init/done functions.

> The i386 architecture is not fussy about alignment of multi-byte
> types in memory operations. However other architectures are not
> so forgiving. Some NIC drivers have to do magic to ensure that IP
> packets are 4-byte aligned, but this will not help if you are using
> a protocol that does not guarantee 4-byte alignment of 32- or 64-bit
> quantities within the IP packet. Doing a
> 
> 	mb_get_dword(...);
> 	mb_get_byte(...);
> 	mb_get_dword(...);
> 
> will cause an alignment exception on the alpha, for example.

	No, in the current implementation mb_get* functions will work
properly. But mb_put* will fail. This can be avoided by implementing
alignment-safe set* macros (which can be written in two variants - first
form is for aligned objects and second for bad aligned ones).

> Someone suggested using numeric names to indicate the size of the
> types rather than 'byte', 'word' etc. I'd agree with this too; the
> text names are not intuitive unless you have used dos/windows for
> too long :-) Maybe use names such as mb_get_uint32, so that it is
> obvious what C type should be passed as an argument.

	Ok, I'd like type/numeric notation. It is definitely better than
just mb_get32.

> I wonder if 'mbdata' is the best name for the struct? I think I
> had used something like 'mchain', but if separate build/breakdown
> structs are used, maybe mbuild/mbreakdown or mbchain/mdchain? (the
> NFS code uses the words 'build' and 'dissect' to refer to the two
> operations). The main idea would just be to try and have the name
> indicate what information is held by the struct.

	Good point and good names too.

> Another useful 'put' function would be something that adds a number
> of bytes of 'empty' space to the end of the chain, and sets up a
> uio/iovec pointing to this space. e.g to read from a file to an
> mbuf chain you could use:
> 
> 	error = mb_put_muio(mbp, &uio, size, &iovp);
> 	...
> 	error = VOP_READ(vp, &uio, flag, cred);
> 	...
> 	FREE(iovp, M_TEMP);

	This can be added later when the code will be written.

> For cases where there is a small (< MLEN) but relatively complex
> data structure to be extracted from a chain, it may be useful to
> have a function which just rearranges the mbufs to ensure that a
> number of bytes become contiguous. It can make an in-mbuf pointer
> to that space available. In most cases this will avoid having to
> copy the data.

	Hmm, this can cause weird things if one have two or more such
structures in the mbuf chain. Eg, at first point mbufs will be rearranged
to place first structure properly but will misplace second structure. But
in general case - yes, this is useful.

> I wonder if these routines are the correct place for the endian
> conversions? It certainly simplifies the code that must build and
> parse requests, but requires duplication of each mb_get/mb_put
> operation. I understand that there isn't currently code in the tree
> for dealing with odd protocols that use little-endian format for
> data transmitted on the network (smb is one of these?).

	sys/netncp is another example of the code which deals with
little-endian formatted protocol (and mb* code was derived from
sys/netncp/ncp_rq.c) I think it is good idea to provide functions for
an in-place conversions because it makes code much more readable and
reduces the size of generated code. Few additional functions is a good
price for that.

> Sometimes it is useful to have idempotent init() and free() functions.
> For example, consider a function which builds a request and sends
> it, but which must handle errors both before and after the mbuf
> chain is sent off to the protocol. If mb_init simply NULL'd out
> the mb_top pointer, then the code could look like this:
[skip]
> The pru_sosend() function takes over ownership of the mbuf chain,
> so there is a need to just blank out the mbdata structure without
> freeing the chain, and without performing any allocations. An init
> function which cannot fail also simplifies the code. See callout_init()
> in kern_timeout.c for similar code.

	Hmm, since so_send() can fail and some erros can be recovered by
another call to so_send(), I'm just called m_copym() to duplicate the mbuf
chain and give it to so_send().

> The mb_put_pstring function maybe belongs in the protocol-specific
> code rather than here, since there are just too many different ways
> of encoding strings. Different protocols are likely to encode
> strings in different ways, with respect to length field type and
> padding/alignment.

	The name 'pstring' associated with 'pascal' type string which is
known as 'byte of length followed by data'. If this function doesn't suits
to be general then it can be omitted (only netncp/nwfs code uses it).
 
> Some of these mb_ functions return EBADRPC when not enough bytes
> of data are found in the mbuf chain. It might be better to choose
> a more generic return code, since these routines are not specific
> to RPC.

	EBADRPC returned by all mb_get* functions to indicate that the
format of reply is unexpected.

> Ian

	Thanks for great review :)

--
Boris Popov
http://www.butya.kz/~bp/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Thu Feb  8 18:48:50 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94])
	by hub.freebsd.org (Postfix) with ESMTP
	id B44AF37B4EC; Thu,  8 Feb 2001 18:48:29 -0800 (PST)
Received: by relay.butya.kz (Postfix, from userid 1000)
	id 0B5A828867; Fri,  9 Feb 2001 08:48:26 +0600 (ALMT)
Received: from localhost (localhost [127.0.0.1])
	by relay.butya.kz (Postfix) with ESMTP
	id ED14D28695; Fri,  9 Feb 2001 08:48:26 +0600 (ALMT)
Date: Fri, 9 Feb 2001 08:48:26 +0600 (ALMT)
From: Boris Popov <bp@butya.kz>
To: Bosko Milekic <bmilekic@technokratis.com>
Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG
Subject: Re: Sequential mbuf read/write extensions
In-Reply-To: <013201c09203$971be9c0$1f90c918@jehovah>
Message-ID: <Pine.BSF.4.21.0102090831010.24710-100000@lion.butya.kz>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 8 Feb 2001, Bosko Milekic wrote:

> in mb_init(), the m->m_pkthdr.rcvif = NULL; can be ommitted, as
> MGETHDR() will do that. The m->m_len = 0 should stay for now.

	Ok.

> drivers that pre-allocate mbufs + clusters, they typically know the
> `count'), it turns out that it is cheaper to compute the count from
> the size than the optimal size from the count, in the mbuf case. If
> you don't mind, I would strongly recommend moving m_getm() to
> uipc_mbuf.c. Code that doesn't know the `size' but knows the `count'

	Agreed, that why this function have a prefix 'm_' :)

[code sample skipped]

> For this to work, though, m_getm() needs to be modified to free all of
> `top' chain if it can't get either a cluster or an mbuf. I don't know
> if this was intentional, but it seems to me that there is a subtle
> problem in m_getm() as it is now:
> 
> if (len > MINCLSIZE) {
>     MCLGET(m, M_TRYWAIT);
>     if ((m->m_flags & M_EXT) == 0) {
>        m_freem(m); <------ frees only one mbuf
	 ^^^^^^^^^^ cluster is not in the chain yet, so it have to be
freed.

>        return NULL;
> 			    }
> 		}
> 
> I think what may happen here is that you will leak your `top' chain if
> you fail to allocate a cluster.

	The original semantic was not to free an entire chain because
m_getm() do not reallocates original (top) mbuf(s) (which may contain
data) and only adds new mbufs/clusters if possible. So, the calls like
m_get(mb->mb_top) will not left the wild pointer. There is also simple way
to deal with such behavior:

	mtop = m_get(...);
	if (mtop == NULL)
		fail;
	if (m_getm(mtop) == NULL) {
		m_freem(mtop);
		fail;
	}

	Probably m_getm() should return error code rather than pointer to
mbuf to avoid confusion.

--
Boris Popov
http://www.butya.kz/~bp/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Thu Feb  8 19: 7:44 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from VL-MS-MR001.sc1.videotron.ca (relais.videotron.ca [24.201.245.36])
	by hub.freebsd.org (Postfix) with ESMTP
	id 1AC5837B401; Thu,  8 Feb 2001 19:07:22 -0800 (PST)
Received: from jehovah ([24.201.144.31]) by
          VL-MS-MR001.sc1.videotron.ca (Netscape Messaging Server 4.15)
          with SMTP id G8GZC902.6Z2; Thu, 8 Feb 2001 22:07:21 -0500 
Message-ID: <001301c09245$b7400a00$1f90c918@jehovah>
From: "Bosko Milekic" <bmilekic@technokratis.com>
To: "Boris Popov" <bp@butya.kz>
Cc: <freebsd-arch@FreeBSD.ORG>, <freebsd-net@FreeBSD.ORG>
References: <Pine.BSF.4.21.0102090831010.24710-100000@lion.butya.kz>
Subject: Re: Sequential mbuf read/write extensions
Date: Thu, 8 Feb 2001 22:09:28 -0500
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2919.6700
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


Boris Popov wrote:

[...]
> > For this to work, though, m_getm() needs to be modified to free
all of
> > `top' chain if it can't get either a cluster or an mbuf. I don't
know
> > if this was intentional, but it seems to me that there is a subtle
> > problem in m_getm() as it is now:
> >
> > if (len > MINCLSIZE) {
> >     MCLGET(m, M_TRYWAIT);
> >     if ((m->m_flags & M_EXT) == 0) {
> >        m_freem(m); <------ frees only one mbuf
> ^^^^^^^^^^ cluster is not in the chain yet, so it have to be
> freed.

    m_free() may be more appropriate than m_freem() then, but see
below.

> >        return NULL;
> >     }
> > }
> >
> > I think what may happen here is that you will leak your `top'
chain if
> > you fail to allocate a cluster.
>
> The original semantic was not to free an entire chain because
> m_getm() do not reallocates original (top) mbuf(s) (which may
contain
> data) and only adds new mbufs/clusters if possible. So, the calls
like
> m_get(mb->mb_top) will not left the wild pointer. There is also
simple way
> to deal with such behavior:
>
> mtop = m_get(...);
> if (mtop == NULL)
> fail;
> if (m_getm(mtop) == NULL) {
> m_freem(mtop);
> fail;
> }
>
> Probably m_getm() should return error code rather than pointer to
> mbuf to avoid confusion.

    I understand this part, but what I think you missed in my comment
is that m_getm() should probably free what it already allocated before
finally failing. It may not need to free `top' because of the wild
pointer, as you say. But think of this:

m_getm() is called with a larger `size' - it decides that given the
`size' it will need to allocate a total of exactly 6 mbufs and 6
clusters for each mbuf. It loops and allocates, succesfully, 5 of
those mbufs and 5 clusters. So `top' chain has now grown and includes
those mbufs. Then what happens in the last iteration is that it
allocates the 6th mbuf OK (it has not yet placed it on the chain) and
fails to allocate a cluster, so it frees just that one mbuf (and not
the mbufs it allocated in prior iterations and attached to `top'
chain) and returns NULL. Your code that calls m_getm() then just
fails, leaving `top' with what it could allocate. Note that in my mail
I said "assuming this is a leak," thus recognizing the possibility
that you did this intentionally. :-) Right now, I'll assume that this
_was_ intentional, as that is what I understand from the above. But in
any case, if we do move this to uipc_mbuf.c, we need to do one of the
following:

(a) make m_getm() free what it allocated in previous loop iterations
before it failed (as described above) or

(b) leave m_getm() the way it is BUT write an additional function that
will simply wrap the call to m_getm() and flush properly for it if it
fails (EXACTLY like your code snippet above).

I'll gladly settle for either, but if we do go with (b), then the
m_freem() should be changed to an m_free(), as it reflects the fact
that we are only freeing the one mbuf and we should document this
behavior, certainly. If you want, I'll roll up a diff in a few days
(once I get what is presently dragging in my "commit this" queue out)
and commit it. If you prefer to do this yourself, then feel free. :-)

> --
> Boris Popov
> http://www.butya.kz/~bp/

Regards,
Bosko.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Thu Feb  8 19:21:25 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from grendel.bsdi.com (unknown [199.79.183.5])
	by hub.freebsd.org (Postfix) with ESMTP id 95F2F37B401
	for <freebsd-arch@FreeBSD.ORG>; Thu,  8 Feb 2001 19:21:07 -0800 (PST)
Received: from grendel.bsdi.com (cp@localhost.bsdi.com [127.0.0.1])
	by grendel.bsdi.com (8.11.1/8.9.3) with ESMTP id f193L6k00368
	for <freebsd-arch@FreeBSD.ORG>; Thu, 8 Feb 2001 20:21:06 -0700 (MST)
	(envelope-from cp@grendel.bsdi.com)
Message-Id: <200102090321.f193L6k00368@grendel.bsdi.com>
To: freebsd-arch@FreeBSD.ORG
Subject: usb, clists, spltty, splbio
From: Chuck Paterson <cp@bsdi.com>
Date: Thu, 08 Feb 2001 20:21:06 -0700
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


	I have been mucking with making moused talk to a usb joystick.
This all turned out pretty straight forward, all user land code in
moused talking to the hid device. The problem is that the kernel
crashes randomly, more often as the system get more loaded. A couple
of times I got a panic in the clist code, but it really didn't
show anything direct. Oh yah, this is with stable, not current.

	Reading through the code I found what looks like a problem.
The hid, and other usb code use clists. The various usb code is
protected by splusb which is a defined as splbio. The function
b_to_q() and all the other clist code use spltty.

I changed the definition of spltty from

GENSPL(spltty,		|=,	tty_imask,				14)

to

GENSPL(spltty,		|=,	tty_imask | bio_imask,			14)

and the crashes appear to have gone away. I say appear, it has run
longer now than it has before, but it hasn't been up much more than
twice as long yet.

	I am not quite sure the best way to deal with this. The only
idea I have thought of that I like at all is to create a splclist()
which is the or of tty and bio and put that into the code that
mucks with clists, perhaps just the allocation/free routines.


Comments
Chuck


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Thu Feb  8 21: 0:20 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94])
	by hub.freebsd.org (Postfix) with ESMTP
	id 91CB337B401; Thu,  8 Feb 2001 20:59:56 -0800 (PST)
Received: by relay.butya.kz (Postfix, from userid 1000)
	id C8FF428695; Fri,  9 Feb 2001 10:59:43 +0600 (ALMT)
Received: from localhost (localhost [127.0.0.1])
	by relay.butya.kz (Postfix) with ESMTP
	id BE4992863E; Fri,  9 Feb 2001 10:59:43 +0600 (ALMT)
Date: Fri, 9 Feb 2001 10:59:43 +0600 (ALMT)
From: Boris Popov <bp@butya.kz>
To: Bosko Milekic <bmilekic@technokratis.com>
Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG
Subject: Re: Sequential mbuf read/write extensions
In-Reply-To: <001301c09245$b7400a00$1f90c918@jehovah>
Message-ID: <Pine.BSF.4.21.0102091054490.25955-100000@lion.butya.kz>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 8 Feb 2001, Bosko Milekic wrote:

> any case, if we do move this to uipc_mbuf.c, we need to do one of the
> following:
> 
> (a) make m_getm() free what it allocated in previous loop iterations
> before it failed (as described above) or
> 
> (b) leave m_getm() the way it is BUT write an additional function that
> will simply wrap the call to m_getm() and flush properly for it if it
> fails (EXACTLY like your code snippet above).

	Ok, I think the (a) is a right way. There is no point to hold
partially allocated mbuf chain. And function should return error code, not
a pointer.

> I'll gladly settle for either, but if we do go with (b), then the
> m_freem() should be changed to an m_free(), as it reflects the fact
> that we are only freeing the one mbuf and we should document this
> behavior, certainly. If you want, I'll roll up a diff in a few days
> (once I get what is presently dragging in my "commit this" queue out)
> and commit it. If you prefer to do this yourself, then feel free. :-)

	Yes, I would appreciate your help on it.

--
Boris Popov
http://www.butya.kz/~bp/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Thu Feb  8 21:15: 3 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222])
	by hub.freebsd.org (Postfix) with ESMTP id 3B23237B491
	for <freebsd-arch@FreeBSD.ORG>; Thu,  8 Feb 2001 21:14:45 -0800 (PST)
Received: from foo.osd.bsdi.com (root@foo.osd.bsdi.com [204.216.28.137])
	by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id f195EXx65579;
	Thu, 8 Feb 2001 21:14:33 -0800 (PST)
	(envelope-from jhb@foo.osd.bsdi.com)
Received: (from jhb@localhost)
	by foo.osd.bsdi.com (8.11.1/8.11.1) id f195E9937701;
	Thu, 8 Feb 2001 21:14:09 -0800 (PST)
	(envelope-from jhb)
Message-ID: <XFMail.010208211408.jhb@FreeBSD.org>
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <200102090321.f193L6k00368@grendel.bsdi.com>
Date: Thu, 08 Feb 2001 21:14:08 -0800 (PST)
Organization: BSD, Inc.
From: John Baldwin <jhb@FreeBSD.ORG>
To: Chuck Paterson <cp@bsdi.com>
Subject: RE: usb, clists, spltty, splbio
Cc: freebsd-arch@FreeBSD.ORG
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On 09-Feb-01 Chuck Paterson wrote:
> 
>       I have been mucking with making moused talk to a usb joystick.
> This all turned out pretty straight forward, all user land code in
> moused talking to the hid device. The problem is that the kernel
> crashes randomly, more often as the system get more loaded. A couple
> of times I got a panic in the clist code, but it really didn't
> show anything direct. Oh yah, this is with stable, not current.
> 
>       Reading through the code I found what looks like a problem.
> The hid, and other usb code use clists. The various usb code is
> protected by splusb which is a defined as splbio. The function
> b_to_q() and all the other clist code use spltty.
> 
> I changed the definition of spltty from
> 
> GENSPL(spltty,                |=,     tty_imask,                             
14)
> 
> to
> 
> GENSPL(spltty,                |=,     tty_imask | bio_imask,                 
14)
> 
> and the crashes appear to have gone away. I say appear, it has run
> longer now than it has before, but it hasn't been up much more than
> twice as long yet.
> 
>       I am not quite sure the best way to deal with this. The only
> idea I have thought of that I like at all is to create a splclist()
> which is the or of tty and bio and put that into the code that
> mucks with clists, perhaps just the allocation/free routines.

We have a similar problem with the slip and ppp devices, which have
run code under botth spltty and splnet.  The trick we use there is
to actually change the imasks by doing something along the lines of:

        net_mask |= tty_imask;
        tty_imask = net_imask;

So there is at least prior precedent for doing this sort of thing.

> Comments
> Chuck


-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Fri Feb  9 11:12: 2 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from grendel.bsdi.com (grendel.twistedbit.com [199.79.183.5])
	by hub.freebsd.org (Postfix) with ESMTP
	id 0BB8037B69F; Fri,  9 Feb 2001 11:11:42 -0800 (PST)
Received: from grendel.bsdi.com (cp@localhost.bsdi.com [127.0.0.1])
	by grendel.bsdi.com (8.11.1/8.9.3) with ESMTP id f19JBfk06298;
	Fri, 9 Feb 2001 12:11:41 -0700 (MST)
	(envelope-from cp@grendel.bsdi.com)
Message-Id: <200102091911.f19JBfk06298@grendel.bsdi.com>
To: John Baldwin <jhb@FreeBSD.ORG>
Cc: freebsd-arch@FreeBSD.ORG
Subject: Re: usb, clists, spltty, splbio 
In-reply-to: Your message of "Thu, 08 Feb 2001 21:14:08 PST."
             <XFMail.010208211408.jhb@FreeBSD.org> 
From: Chuck Paterson <cp@bsdi.com>
Date: Fri, 09 Feb 2001 12:11:41 -0700
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


The following code segment is from the top the hid open
routine. I'll start running this code this afternoon. This
is one of those cases where checking it into current does zero
good.

Chuck


Index: uhid.c
===================================================================
RCS file: /cp/cvs.freebsd/src/sys/dev/usb/uhid.c,v
retrieving revision 1.27.2.4
diff -u -r1.27.2.4 uhid.c
--- uhid.c	2000/10/31 22:31:29	1.27.2.4
+++ uhid.c	2001/02/09 19:06:36
@@ -375,6 +375,18 @@
 {
 	struct uhid_softc *sc;
 	usbd_status err;
+#if defined(__FreeBSD__) && defined(__i386__)
+	static int hid_opened;
+
+	if (hid_opened == 0) {
+		int s;
+		s = splhigh();
+		tty_imask |= bio_imask;
+		update_intr_masks();
+		splx(s);
+		hid_opened = 1;
+	}
+#endif
 
 	USB_GET_SC_OPEN(uhid, UHIDUNIT(dev), sc);
 

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Fri Feb  9 17:38: 5 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from Awfulhak.org (awfulhak.demon.co.uk [194.222.196.252])
	by hub.freebsd.org (Postfix) with ESMTP
	id 50EEC37B6A4; Fri,  9 Feb 2001 17:37:35 -0800 (PST)
Received: from hak.lan.Awfulhak.org (root@hak.lan.Awfulhak.org [172.16.0.12])
	by Awfulhak.org (8.11.2/8.11.2) with ESMTP id f1A1bWR08160;
	Sat, 10 Feb 2001 01:37:32 GMT
	(envelope-from brian@lan.Awfulhak.org)
Received: from hak.lan.Awfulhak.org (brian@localhost [127.0.0.1])
	by hak.lan.Awfulhak.org (8.11.2/8.11.1) with ESMTP id f19HaJN01324;
	Fri, 9 Feb 2001 17:36:19 GMT
	(envelope-from brian@hak.lan.Awfulhak.org)
Message-Id: <200102091736.f19HaJN01324@hak.lan.Awfulhak.org>
X-Mailer: exmh version 2.3.1 01/18/2001 with nmh-1.0.4
To: John Baldwin <jhb@FreeBSD.ORG>
Cc: Chuck Paterson <cp@bsdi.com>, freebsd-arch@FreeBSD.ORG,
	brian@Awfulhak.org
Subject: Re: usb, clists, spltty, splbio 
In-Reply-To: Message from John Baldwin <jhb@FreeBSD.ORG> 
   of "Thu, 08 Feb 2001 21:14:08 PST." <XFMail.010208211408.jhb@FreeBSD.org> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 09 Feb 2001 17:36:19 +0000
From: Brian Somers <brian@Awfulhak.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> 
> On 09-Feb-01 Chuck Paterson wrote:
> > 
> >       I have been mucking with making moused talk to a usb joystick.
> > This all turned out pretty straight forward, all user land code in
> > moused talking to the hid device. The problem is that the kernel
> > crashes randomly, more often as the system get more loaded. A couple
> > of times I got a panic in the clist code, but it really didn't
> > show anything direct. Oh yah, this is with stable, not current.
> > 
> >       Reading through the code I found what looks like a problem.
> > The hid, and other usb code use clists. The various usb code is
> > protected by splusb which is a defined as splbio. The function
> > b_to_q() and all the other clist code use spltty.
> > 
> > I changed the definition of spltty from
> > 
> > GENSPL(spltty,                |=,     tty_imask,                             
> 14)
> > 
> > to
> > 
> > GENSPL(spltty,                |=,     tty_imask | bio_imask,                 
> 14)
> > 
> > and the crashes appear to have gone away. I say appear, it has run
> > longer now than it has before, but it hasn't been up much more than
> > twice as long yet.
> > 
> >       I am not quite sure the best way to deal with this. The only
> > idea I have thought of that I like at all is to create a splclist()
> > which is the or of tty and bio and put that into the code that
> > mucks with clists, perhaps just the allocation/free routines.
> 
> We have a similar problem with the slip and ppp devices, which have
> run code under botth spltty and splnet.  The trick we use there is
> to actually change the imasks by doing something along the lines of:
> 
>         net_mask |= tty_imask;
>         tty_imask = net_imask;
> 
> So there is at least prior precedent for doing this sort of thing.

Hmm.  I would think that Chucks' idea has the advantage that it 
doesn't adversely affect existing splnet/spltty code.  Despite this 
only mattering for a finite amount of time, I don't think the 
precedent is good here :-/

> > Comments
> > Chuck
> 
> 
> -- 
> 
> John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
> PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc
> "Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

-- 
Brian <brian@Awfulhak.org>                        <brian@[uk.]FreeBSD.org>
      <http://www.Awfulhak.org>                   <brian@[uk.]OpenBSD.org>
Don't _EVER_ lose your sense of humour !


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message