From owner-freebsd-hackers  Sun Sep  7 07:08:04 1997
Return-Path: <owner-freebsd-hackers>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.7/8.8.7) id HAA19276
          for hackers-outgoing; Sun, 7 Sep 1997 07:08:04 -0700 (PDT)
Received: from usr09.primenet.com (tlambert@usr09.primenet.com [206.165.6.209])
          by hub.freebsd.org (8.8.7/8.8.7) with ESMTP id HAA19262
          for <freebsd-hackers@FreeBSD.ORG>; Sun, 7 Sep 1997 07:08:00 -0700 (PDT)
Received: (from tlambert@localhost)
	by usr09.primenet.com (8.8.5/8.8.5) id HAA07835;
	Sun, 7 Sep 1997 07:07:56 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199709071407.HAA07835@usr09.primenet.com>
Subject: Re: IOCTL Commands - Where is my mistake?
To: joerg_wunsch@uriah.heep.sax.de
Date: Sun, 7 Sep 1997 14:07:54 +0000 (GMT)
Cc: freebsd-hackers@FreeBSD.ORG
In-Reply-To: <19970907110903.WE07508@uriah.heep.sax.de> from "J Wunsch" at Sep 7, 97 11:09:03 am
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Sender: owner-freebsd-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> > In SystemV, it would not have been luck, it would have been the way it
> > should be.  One could argue that BSD's way of encoding three separate
> > arguments into one is not exactly a mark of engineering elegance.
> 
> Well, it offers two advantages:
> 
> . It's failsafe.  Change the size of the structure, and it will make
>   it a different ioctl command.  You can still support the old one
>   if you want, if your kernel driver declares the old struct as
>   `ofoo_ioctl_t'.  Otherwise, an application will simply get an ENOTTY,
>   as opposed to trashing arbitrary data in the kernel in the assumption
>   the ioctl would be called from a matching userland program.

In fact, I use exactly this property to transparently include system id
and remote pid information in the NFS locking code.  The reason it
works is that the old fcntl() values don't transport the information,
but the new fcntl() values (F_R...) do.  So in the kernel, I can choose
to pull in only the old structure, which is a subset of the new
structure, anytime I'm not decoding a new call.

This maintains binary compatibility with old applications without
needing to recompile them for the larger structure size (which you
would otherwise have to do, since the data in user space being copied
to kernel space may butt up against an unmapped region, and attempting
to copy in more than the old structure could cause the program to
segfault).
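
To make the size-in-the-command point concrete, here is a minimal
sketch in C.  The X_IOW() macro, the field widths, and the structure
members are hypothetical; they are modeled on the idea behind _IOW(),
not copied from any kernel header.  Only ofoo_ioctl_t is a name from
the discussion above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding: the argument size is part of the command word. */
#define X_IOCPARM_MASK 0x1fffUL        /* assumed: 13 bits of length */
#define X_IOC_IN       0x80000000UL    /* assumed: "copy data in" flag */
#define X_IOW(g, n, t) \
    (X_IOC_IN | ((sizeof(t) & X_IOCPARM_MASK) << 16) | \
     ((unsigned long)(g) << 8) | (unsigned long)(n))
#define X_IOCPARM_LEN(cmd) (((cmd) >> 16) & X_IOCPARM_MASK)

/* Old layout, and a new layout that extends it (old is a prefix of new). */
typedef struct { int type; int start; int len; } ofoo_ioctl_t;
typedef struct { int type; int start; int len; int sysid; int rpid; } foo_ioctl_t;

/* Same group and number, different sizes, so different command values. */
#define OFOO_SET X_IOW('f', 1, ofoo_ioctl_t)
#define FOO_SET  X_IOW('f', 1, foo_ioctl_t)

/*
 * Accept both commands.  Only as many bytes as the command word says
 * are copied, so an old binary never has bytes read past its smaller
 * structure; the fields it did not supply stay zero.
 */
int foo_set(unsigned long cmd, const void *uarg, foo_ioctl_t *out)
{
    memset(out, 0, sizeof(*out));
    switch (cmd) {
    case OFOO_SET:
    case FOO_SET:
        assert(X_IOCPARM_LEN(cmd) <= sizeof(*out));
        memcpy(out, uarg, X_IOCPARM_LEN(cmd));
        return 0;
    default:
        return ENOTTY;    /* unknown size means unknown command */
    }
}
```

A caller compiled against the old structure passes OFOO_SET and gets
sysid and rpid of zero; a command built from any other structure size
falls through to ENOTTY instead of being misread.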


> . It concentrates the copyin/copyout at a single place, including all
>   the EFAULT handling etc (that older SysV's IMHO didn't even provide
>   for).  When i first saw the BSD approach, i immediately thought:
>   ``Hey, why hasn't it been this way all the time?''  The SysV approach
>   where each driver does a boring copyin/copyout plain sucks. :)
>   (...and is more prone to kernel programmer errors)

There are other issues here as well.

In a kernel threaded or kernel preemptive environment (realtime or
SMP, etc.), you can easily get screwed.  Doing the copy in up front
and the copy out at the end means that the intermediate code is no
longer dependent on maintaining the page mappings for the user process.
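
As a rough sketch of that shape in C: one generic layer copies the
argument in, the driver code works only on the kernel-side buffer,
and the same layer copies the result out.  Everything here is a
user-space simulation under stated assumptions; sim_copyin() and
sim_copyout() are hypothetical stand-ins for the kernel primitives
(a NULL pointer plays the unmapped address), and ARG_MAX_LEN is an
invented limit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define ARG_MAX_LEN 128    /* hypothetical cap on argument size */

/* Simulated copyin: NULL stands in for an unmapped user address. */
static int sim_copyin(const void *uaddr, void *kaddr, size_t len)
{
    if (uaddr == NULL)
        return EFAULT;
    memcpy(kaddr, uaddr, len);
    return 0;
}

/* Simulated copyout, same convention. */
static int sim_copyout(const void *kaddr, void *uaddr, size_t len)
{
    if (uaddr == NULL)
        return EFAULT;
    memcpy(uaddr, kaddr, len);
    return 0;
}

typedef int (*ioctl_handler)(void *karg);

/*
 * Generic layer: all user-memory access and EFAULT handling lives
 * here.  The handler sees only kbuf, so it does not depend on the
 * user mappings while it runs.
 */
int generic_ioctl(ioctl_handler fn, void *uarg, size_t len)
{
    unsigned char kbuf[ARG_MAX_LEN];
    int error;

    if (len > sizeof(kbuf))
        return ENOTTY;                     /* assumed policy for oversize */
    if ((error = sim_copyin(uarg, kbuf, len)) != 0)
        return error;
    if ((error = fn(kbuf)) != 0)
        return error;
    return sim_copyout(kbuf, uarg, len);
}

/* Example handler: doubles an int argument, touching kernel memory only. */
int double_it(void *karg)
{
    int v;

    memcpy(&v, karg, sizeof(v));
    v *= 2;
    memcpy(karg, &v, sizeof(v));
    return 0;
}
```

Calling generic_ioctl(double_it, &x, sizeof(x)) updates x in place,
while a NULL argument comes back as EFAULT without the handler ever
running.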

This would be an especially serious issue for an async call gate,
which is critical to cooperative scheduling of user space threads
on kernel threads, so that you don't give away your quantum as
frequently.

This is actually a necessity, since without a CPU affinity model in
the scheduler, a kernel thread (a normal process is a user thread
bound to a single kernel thread) may be run on any CPU... after all,
the CPUs are symmetric.  Without this, you will end up migrating
processes unnecessarily.  This destroys the value of your L1 cache
and your instruction pipelines, and would have a big negative impact
on overall performance and, in the end, on the number of CPUs you can
add before you hit diminishing returns.

Actually, SVR4 and Solaris kernel threading have this problem now,
which is why you won't see an unmodified version of either running
on Sequent-type boxes (i.e., tens of processors).

Think of it as "not being like SVR4"... most BSD people find that
palatable enough that they won't even adopt a good technology, if
it passed through SVR4, and was thus impugned by association.

8-) 8-).

Linux had a big problem, in that it prevalidated source and target
ranges, especially on ioctls that took arguments in a structure
and returned arguments in the same area.  This was a win, in that
it saved a validation, and increased concurrency (for some operations).
But overall, it's a loss, since with kernel preemption coming on
line, the mapping may have changed between the time the call started
and when it completed.  I don't know if they still do this, or if
they thrash the page table on each wakeup, or if they are simply
susceptible to race-condition-based hacks at this time (I haven't
looked lately).
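
The race can be shown with a toy model in C.  This is only a
simulation under stated assumptions, not Linux code: struct uspace,
its mapped_len field, and every function name below are invented for
illustration, and the "preemption" is just the mapping shrinking
between the up-front check and the final write.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy user address space: offsets below mapped_len count as mapped. */
struct uspace {
    unsigned char mem[64];
    size_t mapped_len;      /* must not exceed sizeof(mem) */
    int stray_writes;       /* writes that landed on an unmapped range */
};

static int range_ok(const struct uspace *us, size_t off, size_t len)
{
    return off <= us->mapped_len && len <= us->mapped_len - off;
}

/* Write that trusts an earlier validation and does no check itself. */
static void raw_copyout(struct uspace *us, size_t off,
                        const void *src, size_t len)
{
    if (!range_ok(us, off, len))
        us->stray_writes++;     /* a real kernel would fault or corrupt */
    else
        memcpy(us->mem + off, src, len);
}

/* Write that validates at the moment of use. */
static int checked_copyout(struct uspace *us, size_t off,
                           const void *src, size_t len)
{
    if (!range_ok(us, off, len))
        return EFAULT;
    memcpy(us->mem + off, src, len);
    return 0;
}

/* Validate once at entry; the mapping changes before the result is written. */
int call_prevalidated(struct uspace *us, size_t off, int result,
                      size_t shrink_to)
{
    if (!range_ok(us, off, sizeof(result)))
        return EFAULT;
    us->mapped_len = shrink_to;          /* simulated change while asleep */
    raw_copyout(us, off, &result, sizeof(result));
    return 0;                            /* reports success regardless */
}

/* Same call, but the range is checked when the copy actually happens. */
int call_checked(struct uspace *us, size_t off, int result,
                 size_t shrink_to)
{
    us->mapped_len = shrink_to;          /* same simulated change */
    return checked_copyout(us, off, &result, sizeof(result));
}
```

With a 64-byte mapping that shrinks to 16 bytes, a write at offset 32
goes astray in the prevalidated path yet still reports success, while
the checked path returns EFAULT and writes nothing.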


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.