From owner-freebsd-arch  Sun Nov 26  6:22:26 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.interware.hu (mail.interware.hu [195.70.32.130])
	by hub.freebsd.org (Postfix) with ESMTP
	id 3FAA137B479; Sun, 26 Nov 2000 06:22:21 -0800 (PST)
Received: from kinshasa-57.budapest.interware.hu ([195.70.51.185] helo=elischer.org)
	by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian))
	id 1402gp-0004yJ-00; Sun, 26 Nov 2000 15:22:00 +0100
Message-ID: <3A211C82.2464D07E@elischer.org>
Date: Sun, 26 Nov 2000 06:21:54 -0800
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: arch@FreeBSD.ORG, jasone@freebsd.org
Subject: Re: Threads (KSE etc) comments
References: <Pine.SUN.3.91.1001121160717.7102A-100000@pcnet1.pcnet.com> <3A1B0B64.6D694248@elischer.org>
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Where's jasone? He's been peculiarly silent during this....

There has been some discussion as to what the function of the KSEG
is....

I was shaving today (doesn't the best thinking happen at such times?)
and thinking about why we needed KSEGs.

The basic answer is, 

"We need some method by which we group the scheduled entities 
so as to be able to ensure that the scheduler has full 
information and control over what is going on."


Whether we actually need a KSEG and what it does depends upon what
semantics we want our threading support to have. If we want to provide a
virtual machine for the process, that looks as if it has an unlimited
number of virtual processors, then we allow the KSEG to spawn an
unlimited number of KSEs. In this case, do we allow the "scheduling
clout" to build up linearly with the number of KSEs or do we limit it in
some way? Theoretically you would want a KSEG with two KSEs to have the
same clout as a process running unthreaded, so that cpu time would be
divided 50-50. However this would mean assigning the threaded process
a 'partial quantum' on each processor.

By this I mean that after 5 ticks the KSE on each processor for the KSEG
would be interrupted and the other process allowed to run. This is
unworkable. Another way of sharing the processors between the two
processes would be to schedule both KSEs on one processor and allow the
other process to run uninterrupted on the other. This is also quite
unworkable - what if there are three competing processes and only 2
processors?

Maybe this 'exact fairness' is too hard to achieve..

In my world, we allow the KSEG to become SLIGHTLY unfair, by allowing
it to compete independently on each processor. If we allow the KSEG to
have an unlimited number of KSEs then we need some other item that
competes on behalf of the KSEG on each processor. That is, we invent
some other structure (KSEG-agent) that sits in the scheduling queue(s)
on behalf of the  KSEG. When the 'agent' gets a quantum, it allows the
KSEG to decide which of its KSEs will be run next. (The KSEG could
round robin them for example).
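
A minimal sketch of the agent idea, in userland C rather than kernel code. Everything here is an assumption made for illustration: the struct layouts, the field names (kg_next, ka_cpu, kse_runnable), the fixed-size array and the function kseg_agent_pick() are hypothetical and do not come from any existing source tree. The only thing the per-CPU queue would see is the agent; when the agent gets a quantum it asks the KSEG for the next runnable KSE in round-robin order.

```c
#include <stddef.h>

#define KSEG_MAX_KSES 8         /* illustrative cap, only to size the array */

struct kse {
	int	kse_id;         /* hypothetical identifier */
	int	kse_runnable;   /* nonzero if this KSE has work to do */
};

struct kseg {
	struct kse *kg_kses[KSEG_MAX_KSES];
	size_t	kg_nkses;       /* number of valid entries in kg_kses */
	size_t	kg_next;        /* round-robin cursor */
};

/* The agent is what sits in a scheduling queue on behalf of the KSEG. */
struct kseg_agent {
	struct kseg *ka_kseg;
	int	ka_cpu;         /* the processor this agent competes on */
};

/*
 * Called when the agent has been given a quantum.  Returns the next
 * runnable KSE in round-robin order, or NULL if none is runnable.
 */
struct kse *
kseg_agent_pick(struct kseg_agent *ka)
{
	struct kseg *kg = ka->ka_kseg;
	size_t i;

	for (i = 0; i < kg->kg_nkses; i++) {
		size_t idx = (kg->kg_next + i) % kg->kg_nkses;
		struct kse *ke = kg->kg_kses[idx];

		if (ke->kse_runnable) {
			/* Start the next search just after the one we chose. */
			kg->kg_next = (idx + 1) % kg->kg_nkses;
			return (ke);
		}
	}
	return (NULL);
}
```

Note that the choice of KSE is made here, below the UTS, which is exactly the property the following paragraphs worry about.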

When a KSE is pre-empted, the kernel saves state for that thread in the
thread-control-block and the next KSE to upcall to the UTS will include
that thread-control-block in its list of reportable entities. I'm not
clear on whether it's the next upcall on ANY KSE, or just the next
upcall on that KSE.. 

If the latter, then having multiple KSEs on the same processor allows
the KSEG round-robin scheduler to make the UTS believe that it has N
virtual processors, (N-KSEs). However, it also means that the KSEG
round-robin scheduler is usurping the decision from the UTS as to which
thread is to be run next, as the UTS doesn't know that the thread on the
other KSE was pre-empted in favour of this one. (It's on a different
virtual CPU).

If the former (all KSEs report all events) then there is no real
advantage to having more than N KSEs (N processors), because that means
that the UTS will probably keep swapping the threads it thinks are most
important to the KSEs which means that the thread that was pre-empted on
KSE-A will probably be re-scheduled on KSE-B when it preempts KSE-A. So
why have KSE-B at all? All it does is massively confuse things, and
creates a whole new class of scheduling problems.


So, in summary:
Assuming we allow only SLIGHT unfairness, if you allow the process to
have more than N KSEs in a KSEG, you have one of the following:
1/ A lot of unfairness if you allow each KSE to be in the queues by
itself.
2/ The KSEG scheduler usurping the role of the UTS if it really does
hide the true number of processors.
3/ An increased level of UTS complexity, and un-needed work, as the UTS
struggles to switch the important threads onto the ever-changing set of
running KSEs (it must be ever changing because there are more of them
than CPUs).
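
To put a number on case 1/, here is a small arithmetic sketch. It assumes a deliberately simplified model that is not stated above: a single run queue, equal priorities, every entity always runnable, and round-robin service, so each queue entry gets an equal share. The function names are hypothetical.

```c
/*
 * Share of CPU time a threaded process receives when each of its
 * `kses` KSEs is queued by itself, competing against `others`
 * single-threaded processes.  Assumes kses >= 1.
 */
double
share_per_kse(unsigned kses, unsigned others)
{
	return ((double)kses / (double)(kses + others));
}

/*
 * Share when only the KSEG (or its agent) is queued: the process
 * counts as one entry no matter how many KSEs it owns.
 */
double
share_per_kseg(unsigned kses, unsigned others)
{
	(void)kses;             /* the KSE count no longer matters */
	return (1.0 / (double)(1 + others));
}
```

Under those assumptions a process with four KSEs competing against one unthreaded process takes 4/5 of the time if its KSEs are queued individually, and 1/2 if the KSEG is the competing entity.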


If you only allow N KSEs to the KSEG, then all these problems go away.
The UTS can be aware that it has a limit. But it can also be aware that
a KSE will not be pre-empted by another of its own KSEs (this
simplifies things). It gets the same amount of CPU-time, but has less
work to do. It has full control of which threads are running, and
competes fairly with other processes and KSEGs.

The reason for having KSEGs is simply as an entity that competes for CPU
to assure fairness. It may not even exist as a separate structure in the
case where there are separate per-CPU scheduling queues (though I think
it would for efficiency's sake). It would PROBABLY have an analogous
partner in the UTS that represents the virtual machine that runs all the
threads that are competing at the same scope. On a single scheduling
queue system, I think I would have the KSEG in the queue rather than the
independent KSEs. When it gets to the head, you schedule KSEs on all the
CPUs. This allows the threads to communicate quickly using shared memory
should they want. The UTS has the entire quantum across as many CPUs as
it has.
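
The single-queue case can be sketched as follows. This is a toy model under stated assumptions, not a proposal for the real structures: NCPU is a made-up constant, and kseg_add_kse() and kseg_dispatch() are hypothetical names. It shows the two properties argued for above: a KSEG never holds more KSEs than there are CPUs, and when the KSEG reaches the head of the queue each of its KSEs is placed on its own CPU for the same quantum.

```c
#include <stddef.h>

#define NCPU 4                  /* hypothetical number of processors */

struct kse {
	int	kse_oncpu;      /* CPU index, or -1 when not running */
};

struct kseg {
	struct kse kg_kses[NCPU];       /* at most one KSE per CPU */
	size_t	kg_nkses;
};

/* Create a KSE in the group; refuse once there is one per CPU. */
struct kse *
kseg_add_kse(struct kseg *kg)
{
	struct kse *ke;

	if (kg->kg_nkses >= NCPU)
		return (NULL);
	ke = &kg->kg_kses[kg->kg_nkses++];
	ke->kse_oncpu = -1;
	return (ke);
}

/*
 * The KSEG has reached the head of the single run queue: run every
 * one of its KSEs at once, each on a different CPU.  Returns the
 * number of CPUs used for this quantum.
 */
size_t
kseg_dispatch(struct kseg *kg)
{
	size_t cpu;

	for (cpu = 0; cpu < kg->kg_nkses; cpu++)
		kg->kg_kses[cpu].kse_oncpu = (int)cpu;
	return (kg->kg_nkses);
}
```

Because no two KSEs of the group ever share a CPU, a KSE is never pre-empted in favour of a sibling, and the UTS keeps the decision about which thread runs where.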

I hope that this answers some of the questions as to why I think there
are reasons for having the KSEG entity.

I hope there will be a good argument about this. We want as many people
thinking about it as possible.

I'll try to draw up some more pictures.....(like last time) to illustrate
my thoughts as to how this all works.

Julian


-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000
---> X_.---._/  presently in:  Budapest
            v


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Sun Nov 26 12:18:36 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.interware.hu (mail.interware.hu [195.70.32.130])
	by hub.freebsd.org (Postfix) with ESMTP
	id 8491537B479; Sun, 26 Nov 2000 12:18:28 -0800 (PST)
Received: from dakar-60.budapest.interware.hu ([195.70.51.124] helo=elischer.org)
	by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian))
	id 1408Fj-0002ze-00; Sun, 26 Nov 2000 21:18:24 +0100
Message-ID: <3A216FFE.BE0F780F@elischer.org>
Date: Sun, 26 Nov 2000 12:18:06 -0800
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: arch@FreeBSD.ORG, jasone@freebsd.org
Subject: Re: Threads .. chopping up 'struct proc'
References: <Pine.SUN.3.91.1001121160717.7102A-100000@pcnet1.pcnet.com> <3A1B0B64.6D694248@elischer.org> <3A211C82.2464D07E@elischer.org>
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I've been looking at the proc structure..

The aim is to eventually move some of the fields into a
struct KSE (struct schedbox?)
struct KSEC (struct threadcontext?)
struct KSEG (struct schedgroup?)

Initially we would simply include one of each of these in the struct proc,
but link them together as if they were correctly connected up.
We would use macros such as:
#define p_estcpu p_kse.kse_estcpu
to keep present code working....
Eventually functions that get changed to receive a kse directly
would just use kse->kse_estcpu and if they need proc they
can use kse->kse_proc. But until then, we'd start by simply
separating the fields and using macros. Then we can convert
calls at our leisure.
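
A cut-down sketch of that transition, compilable on its own. Only p_estcpu, p_kse, kse_estcpu and kse_proc come from the text above; the trimmed struct contents and the helper functions proc_link_kse(), old_get_estcpu() and new_get_estcpu() are hypothetical and exist only to show both access styles side by side.

```c
struct proc;

struct kse {
	struct proc *kse_proc;          /* back pointer to the owning proc */
	unsigned int kse_estcpu;        /* field moved out of struct proc */
};

struct proc {
	struct kse p_kse;               /* exactly one embedded KSE for now */
	int	p_pid;
};

/* Compatibility macro: unconverted code keeps writing p->p_estcpu. */
#define p_estcpu	p_kse.kse_estcpu

/* Link the embedded KSE as if it were a separately allocated one. */
void
proc_link_kse(struct proc *p)
{
	p->p_kse.kse_proc = p;
}

/* Unconverted code: still takes a proc and uses the old field name. */
unsigned int
old_get_estcpu(struct proc *p)
{
	return (p->p_estcpu);
}

/* Converted code: takes the kse directly and reaches the proc through it. */
unsigned int
new_get_estcpu(struct kse *ke)
{
	return (ke->kse_estcpu);
}
```

Both functions read the same storage, so callers can be converted one at a time without a flag day.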

However when going through the fields in struct proc,
some difficulties become obvious. Here's my initial 
division of the fields. I've added a comment at the 
beginning of each line that indicates where I think 
it should go, however I'm not convinced about some of them:

P = stays in struct proc
E = goes to 'KSE' struct (schedulable entity)
G = goes to 'group' struct
C = goes to 'sleepable Context' struct.

I note with [XXX] things I am not sure about, or do not really understand.
These are usually new fields to do with things like events, or fields
where the semantics of the feature have not been decided for a
threaded environment.  E.g. WHO GETS A SIGNAL?

struct  proc {
/*E*/   TAILQ_ENTRY(proc) p_procq;      /* run/mutex queue. */
[this may need to be split into two entries.. one in a KSE
 and one in a KSEG, depending on how we do things ]

/*C*/   TAILQ_ENTRY(proc) p_slpq;       /* sleep queue. */
/*P*/   LIST_ENTRY(proc) p_list;        /* List of all processes. */

        /* substructures: */
/*P*/   struct  pcred *p_cred;          /* Process owner's identity. */
/*P*/   struct  filedesc *p_fd;         /* Ptr to open files structure. */
/*P*/   struct  pstats *p_stats;        /* Accounting/statistics (PROC ONLY). */
[some of these may need to be duplicated in the KSE and KSEG.. 
maybe even Context]
/*P*/   struct  plimit *p_limit;        /* Process limits. */
/*P*/   struct  vm_object *p_upages_obj;/* Upages object */
/*P*/   struct  procsig *p_procsig;
[Well, actually who gets signals?  maybe this is per KSE? per KSEG?
maybe even per Context as each context has a different user stack and
signals are delivered on the user stack.. (unless set otherwise)]


#define p_sigacts       p_procsig->ps_sigacts
#define p_sigignore     p_procsig->ps_sigignore
#define p_sigcatch      p_procsig->ps_sigcatch
 
#define p_ucred         p_cred->pc_ucred
#define p_rlimit        p_limit->pl_rlimit
 
/*C*/   int     p_flag;                 /* P_* flags. */
[these flags will probably need to be shared out amongst the structures]
/*C*/   char    p_stat;                 /* S* process status. */
[as will these]
        char    p_pad1[3];
 
/*P*/   pid_t   p_pid;                  /* Process identifier. */
/*P*/   LIST_ENTRY(proc) p_hash;        /* Hash chain. */
/*P*/   LIST_ENTRY(proc) p_pglist;      /* List of processes in pgrp. */
/*P*/   struct  proc *p_pptr;           /* Pointer to parent process. */
/*P*/   LIST_ENTRY(proc) p_sibling;     /* List of sibling processes. */ 
/*P*/   LIST_HEAD(, proc) p_children;   /* Pointer to list of children. */
 
/*P*/   struct callout_handle p_ithandle; /*
                                              * Callout handle for scheduling
                                              * p_realtimer.
                                              */
[So who gets the resulting signal? Can different KSEGs have
different timers running? what about KSEs? (I vote for KSEGs)]

/* The following fields are all zeroed upon creation in fork. */
#define p_startzero     p_oppid
  
/*P*/   pid_t   p_oppid;         /* Save parent pid during ptrace. XXX */ 
/*C*/   int     p_dupfd;         /* Sideways return value from fdopen. XXX */
[whatever THIS means.. it's a hack so C is the safest place for it] 
/*P*/   struct  vmspace *p_vmspace;     /* Address space. */
 
        /* scheduling */
[I've shown the following as being in the KSE structure. they would be 
collected there, but the priority is worked out for the entire KSEG
so it probably collects the data from all of the KSEs. UNLESS we decide that
all KSEs can have independent priorities, in which case how do you
control how their priorities relate..]

/*E*/   u_int   p_estcpu;        /* Time averaged value of p_cpticks. */
/*E*/   int     p_cpticks;       /* Ticks of cpu time. */
/*E*/   fixpt_t p_pctcpu;        /* %cpu for this process during p_swtime */
        void    *p_wchan;        /* Sleep address. */
        const char *p_wmesg;     /* Reason for sleep. */
/*P*/   u_int   p_swtime;        /* Time swapped in or out. */
/*E?*/  u_int   p_slptime;       /* Time since last blocked. */
[what does this mean?]
 
/*?*/   struct  itimerval p_realtimer;  /* Alarm timer. */
[who gets these? who can set them? what is their scope?]
/*P*/   u_int64_t p_runtime;            /* Real time in microsec. */

[If we treat separate KSEGs as separate processes, do we keep the
below fields per KSEG?]
/*G?*/  u_int64_t p_uu;                 /* Previous user time in microsec. */
/*G?*/  u_int64_t p_su;                 /* Previous system time in microsec. */
/*G?*/  u_int64_t p_iu;                 /* Previous interrupt time in usec. */
[how about these? do we aggregate? or collect per KSE? Is there a separate
statclock per CPU?]
/*P?*/  u_int64_t p_uticks;             /* Statclock hits in user mode. */
/*P?*/  u_int64_t p_sticks;             /* Statclock hits in system mode. */
/*P?*/  u_int64_t p_iticks;             /* Statclock hits processing intr. */

/*P*/   int     p_traceflag;            /* Kernel trace points. */
/*P*/   struct  vnode *p_tracep;        /* Trace to vnode. */
[do we trace all KSEs at once? how do we trace individual threads?]

/*P*/   sigset_t p_siglist;             /* Signals arrived but not delivered. */
[who gets signals? does each KSEG (KSE?) have its own handler?]
/*P*/   struct  vnode *p_textvp;        /* Vnode of executable. */

/*P*/   char    p_lock;                 /* Process lock (prevent swap) count. */
/*E*/   u_char  p_oncpu;                /* Which cpu we are on */
/*E?*/  u_char  p_lastcpu;              /* Last cpu we were on */
[each context or each KSE? KSEs can't migrate, (under discussion)]
/*EG?*/ char    p_rqindex;              /* Run queue index */
[Who is on the run queue? KSE or KSEG?]
   
/*C*/   short   p_locks;                /* DEBUG: lockmgr count of held locks */
/*C*/   short   p_simple_locks;         /* DEBUG: count of held simple locks */
[If you cannot sleep or be interrupted with these they could be in the KSE]
/*P?*/  unsigned int    p_stops;        /* procfs event bitmask */
/*P?*/  unsigned int    p_stype;        /* procfs stop event type */
/*P?*/  char    p_step;                 /* procfs stop *once* flag */
/*P?*/  unsigned char   p_pfsflags;     /* procfs flags */
[the procfs stuff is problematic... depending on what it does
and what it is used for, the semantics might vary]

        char    p_pad3[2];              /* padding for alignment */
/*C*/   register_t p_retval[2];         /* syscall aux returns */
/*P*/   struct  sigiolst p_sigiolst;    /* list of sigio sources */
[who gets signals?]

/*P*/   int     p_sigparent;            /* signal to parent on exit */
/*P*/   sigset_t p_oldsigmask;          /* saved mask from before sigpause */
[one per signal scope.. what IS the scope of a signal?]
/*P*/   int     p_sig;                  /* for core dump/debugger XXX */
/*P*/   u_long  p_code;                 /* for core dump/debugger XXX */
/*P?*/  struct  klist p_klist;          /* knotes attached to this process */
/*C?*/  LIST_HEAD(, mtx) p_heldmtx;     /* for debugging code */
/*CE?*/ struct mtx *p_blocked;          /* Mutex process is blocked on */
[depending on what this means ]
/*C*/   LIST_HEAD(, mtx) p_contested;   /* contested locks */

/* End area that is zeroed on creation. */
#define p_endzero       p_startcopy
  
/* The following fields are all copied upon creation in fork. */
#define p_startcopy     p_sigmask
        
/*P?*/  sigset_t p_sigmask;     /* Current signal mask. */
/*C?*/  stack_t p_sigstk;       /* sp & on stack state variable */
[what is the scope of a signal?]

/*??*/  int     p_magic;        /* Magic number. */

[The fields below would be in the KSEG if the priority of all KSEs in a KSEG
were to be calculated at one time.]

/*G*/   u_char  p_priority;     /* Process priority. */
/*G*/   u_char  p_usrpri;       /* User-priority based on p_cpu and p_nice. */
/*G*/   u_char  p_nativepri;    /* Priority before propagation. */
/*G*/   char    p_nice;         /* Process "nice" value. */
/*P*/   char    p_comm[MAXCOMLEN+1];
  
/*P*/   struct  pgrp *p_pgrp;   /* Pointer to process group. */
 
/*P*/   struct  sysentvec *p_sysent; /* System call dispatch information. */

/*G*/   struct  rtprio p_rtprio;        /* Realtime priority. */
[priorities are per KSEG]

/*P*/   struct  prison *p_prison;
/*P*/   struct  pargs *p_args;
[Either the whole Process is in gaol or it isn't]

/* End area that is copied on creation. */
#define p_endcopy       p_addr
/*P?*/  struct  user *p_addr;   /* Kernel virtual addr of u-area (PROC ONLY). */
[XXX    Are there 'per KSE' fields there? (actually yes there are...the pcb is
there).]
/*C?*/  struct  mdproc p_md;    /* Any machine-dependent fields. */
[there is a trapframe there. not sure what it's used for]
   
/*P*/   u_short p_xstat;        /* Exit status for wait; also stop signal. */
/*P*/   u_short p_acflag;       /* Accounting flags. */
[these may be collected per KSE and harvested when needed]
/*P*/   struct  rusage *p_ru;   /* Exit information. XXX */
        
/*P*/   int     p_nthreads;     /* number of threads (only in leader) */
[not sure how this is used... may become redundant]

/*G?*/  void    *p_aioinfo;     /* ASYNC I/O info */
[will aio be per KSE, per KSEG or per PROC?]

/*C*/   int     p_wakeup;       /* thread id */
[will surely change]
/*P*/   struct proc *p_peers;
/*P*/   struct proc *p_leader;
/*C*/   struct  pasleep p_asleep;       /* Used by asleep()/await(). */
/*P*/   void    *p_emuldata;    /* process-specific emulator state data */
/*C*/   struct ithd *p_ithd;    /* for interrupt threads only */
};
 

Obviously before we can really finish this we need to decide
what the scope of signals is.. Who gets externally generated signals?
Who gets signals that are the result of an action (e.g. SIGIO, SIGPIPE)?
Which signals are diverted when you allocate a signal stack?
In the same context, what is the scope of aio?
Where are the results delivered? Who is responsible for the
kernel threads that do the work? Do we allocate a KSE to run them? etc. etc.
What is the scope of the timers and such?

All this makes a difference in where the fields live....

Does anyone have comments?
(Everyone has been VERY quiet so far!!!)

julian


-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000
---> X_.---._/  presently in:  Budapest
            v




From owner-freebsd-arch  Sun Nov 26 13: 0:35 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from io.yi.org (unknown [24.70.218.157])
	by hub.freebsd.org (Postfix) with ESMTP
	id 535DE37B479; Sun, 26 Nov 2000 13:00:15 -0800 (PST)
Received: from io.yi.org (localhost.gvcl1.bc.wave.home.com [127.0.0.1])
	by io.yi.org (Postfix) with ESMTP
	id D824BBA7A; Sun, 26 Nov 2000 13:00:14 -0800 (PST)
X-Mailer: exmh version 2.1.1 10/15/1999
To: arch@freebsd.org
Cc: smp@freebsd.org
Subject: review: callout patch
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Sun, 26 Nov 2000 13:00:14 -0800
From: Jake Burkholder <jburkhol@home.com>
Message-Id: <20001126210014.D824BBA7A@io.yi.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


This patch makes most of sys/kern/* sources use callout_reset for
registering callouts rather than timeout(9).  This should greatly
reduce the use of the fixed size callfree allocator pool.  Currently
we panic when it runs out.
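
For readers who have not used both interfaces, the difference can be modelled in a few lines of userland C. This is a toy model only: the toy_* names, the two-entry pool and the struct layout are invented for illustration and are not the kernel's definitions. The argument order mirrors the calls in the patch below: timeout(func, arg, ticks) versus callout_reset(c, ticks, func, arg).

```c
#include <stddef.h>

typedef void toy_func_t(void *);

struct toy_callout {
	toy_func_t *c_func;
	void	*c_arg;
	int	c_ticks;
	int	c_pending;              /* nonzero while armed */
};

#define TOY_NCALLOUT 2                  /* deliberately tiny fixed pool */
static struct toy_callout toy_pool[TOY_NCALLOUT];

/* Example handler so callers have something to register. */
void
toy_count(void *arg)
{
	if (arg != NULL)
		++*(int *)arg;
}

/*
 * timeout(9)-style: storage comes from a fixed pool.  The toy returns
 * NULL when the pool is exhausted; the mail says the kernel panics.
 */
struct toy_callout *
toy_timeout(toy_func_t *func, void *arg, int ticks)
{
	size_t i;

	for (i = 0; i < TOY_NCALLOUT; i++) {
		if (!toy_pool[i].c_pending) {
			toy_pool[i].c_func = func;
			toy_pool[i].c_arg = arg;
			toy_pool[i].c_ticks = ticks;
			toy_pool[i].c_pending = 1;
			return (&toy_pool[i]);
		}
	}
	return (NULL);
}

/* callout_reset-style: the caller owns the storage, so arming cannot fail. */
void
toy_callout_reset(struct toy_callout *c, int ticks, toy_func_t *func, void *arg)
{
	c->c_func = func;
	c->c_arg = arg;
	c->c_ticks = ticks;
	c->c_pending = 1;
}

/* Cancel; returns 1 if the callout was pending, 0 otherwise. */
int
toy_callout_stop(struct toy_callout *c)
{
	int was_pending = c->c_pending;

	c->c_pending = 0;
	return (was_pending);
}
```

Re-arming a caller-owned callout simply overwrites the same structure, which is why embedding one in struct proc removes the dependency on the shared pool.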

This was motivated by NetBSD, who have completely removed timeout(9)
from their kernel.

Please review it.

Index: compat/linux/linux_misc.c
===================================================================
RCS file: /home/ncvs/src/sys/compat/linux/linux_misc.c,v
retrieving revision 1.88
diff -u -r1.88 linux_misc.c
--- compat/linux/linux_misc.c	2000/11/10 21:30:18	1.88
+++ compat/linux/linux_misc.c	2000/11/26 00:55:05
@@ -115,9 +115,9 @@
     old_it = p->p_realtimer;
     getmicrouptime(&tv);
     if (timevalisset(&old_it.it_value))
-	untimeout(realitexpire, (caddr_t)p, p->p_ithandle);
+	callout_stop(&p->p_itcallout);
     if (it.it_value.tv_sec != 0) {
-	p->p_ithandle = timeout(realitexpire, (caddr_t)p, tvtohz(&it.it_value));
+	callout_reset(&p->p_itcallout, tvtohz(&it.it_value), realitexpire, p);
 	timevaladd(&it.it_value, &tv);
     }
     p->p_realtimer = it;
Index: kern/init_main.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/init_main.c,v
retrieving revision 1.147
diff -u -r1.147 init_main.c
--- kern/init_main.c	2000/11/22 07:41:57	1.147
+++ kern/init_main.c	2000/11/26 00:21:00
@@ -312,6 +312,9 @@
 
 	bcopy("swapper", p->p_comm, sizeof ("swapper"));
 
+	callout_init(&p->p_itcallout, 0);
+	callout_init(&p->p_slpcallout, 0);
+
 	/* Create credentials. */
 	cred0.p_refcnt = 1;
 	cred0.p_uidinfo = uifind(0);
Index: kern/kern_acct.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_acct.c,v
retrieving revision 1.26
diff -u -r1.26 kern_acct.c
--- kern/kern_acct.c	2000/07/04 03:34:06	1.26
+++ kern/kern_acct.c	2000/11/26 07:30:52
@@ -77,11 +77,9 @@
 static void	acctwatch __P((void *));
 
 /*
- * Accounting callout handle used for periodic scheduling of
- * acctwatch.
+ * Accounting callout used for periodic scheduling of acctwatch.
  */
-static struct	callout_handle acctwatch_handle
-    = CALLOUT_HANDLE_INITIALIZER(&acctwatch_handle);
+static struct	callout acctwatch_callout;
 
 /*
  * Accounting vnode pointer, and saved vnode pointer.
@@ -148,7 +146,7 @@
 	 * close the file, and (if no new file was specified, leave).
 	 */
 	if (acctp != NULLVP || savacctp != NULLVP) {
-		untimeout(acctwatch, NULL, acctwatch_handle);
+		callout_stop(&acctwatch_callout);
 		error = vn_close((acctp != NULLVP ? acctp : savacctp), FWRITE,
 		    p->p_ucred, p);
 		acctp = savacctp = NULLVP;
@@ -161,6 +159,7 @@
 	 * free space watcher.
 	 */
 	acctp = nd.ni_vp;
+	callout_init(&acctwatch_callout, 0);
 	acctwatch(NULL);
 	return (error);
 }
@@ -329,5 +328,5 @@
 			log(LOG_NOTICE, "Accounting suspended\n");
 		}
 	}
-	acctwatch_handle = timeout(acctwatch, NULL, acctchkfreq * hz);
+	callout_reset(&acctwatch_callout, acctchkfreq * hz, acctwatch, NULL);
 }
Index: kern/kern_exit.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_exit.c,v
retrieving revision 1.104
diff -u -r1.104 kern_exit.c
--- kern/kern_exit.c	2000/11/22 07:41:58	1.104
+++ kern/kern_exit.c	2000/11/26 00:05:38
@@ -172,7 +172,7 @@
 	p->p_flag |= P_WEXIT;
 	SIGEMPTYSET(p->p_siglist);
 	if (timevalisset(&p->p_realtimer.it_value))
-		untimeout(realitexpire, (caddr_t)p, p->p_ithandle);
+		callout_stop(&p->p_itcallout);
 
 	/*
 	 * Reset any sigio structures pointing to us as a result of
Index: kern/kern_fork.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.84
diff -u -r1.84 kern_fork.c
--- kern/kern_fork.c	2000/11/22 07:41:58	1.84
+++ kern/kern_fork.c	2000/11/26 00:20:48
@@ -483,6 +483,9 @@
 	LIST_INIT(&p2->p_heldmtx);
 	LIST_INIT(&p2->p_contested);
 
+	callout_init(&p2->p_itcallout, 0);
+	callout_init(&p2->p_slpcallout, 0);
+
 #ifdef KTRACE
 	/*
 	 * Copy traceflag and tracefile if enabled.
Index: kern/kern_synch.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.110
diff -u -r1.110 kern_synch.c
--- kern/kern_synch.c	2000/11/22 07:41:58	1.110
+++ kern/kern_synch.c	2000/11/26 00:55:54
@@ -70,6 +70,9 @@
 int	lbolt;
 int	sched_quantum;		/* Roundrobin scheduling quantum in ticks. */
 
+static struct callout schedcpu_callout;
+static struct callout roundrobin_callout;
+
 static int	curpriority_cmp __P((struct proc *p));
 static void	endtsleep __P((void *));
 static void	maybe_resched __P((struct proc *chk));
@@ -175,7 +178,7 @@
  		need_resched();
 #endif
 
- 	timeout(roundrobin, NULL, sched_quantum);
+	callout_reset(&roundrobin_callout, sched_quantum, roundrobin, NULL);
 }
 
 /*
@@ -344,7 +347,7 @@
 	lockmgr(&allproc_lock, LK_RELEASE, NULL, CURPROC);
 	vmmeter();
 	wakeup((caddr_t)&lbolt);
-	timeout(schedcpu, (void *)0, hz);
+	callout_reset(&schedcpu_callout, hz, schedcpu, NULL);
 }
 
 /*
@@ -414,7 +417,6 @@
 {
 	struct proc *p = curproc;
 	int s, sig, catch = priority & PCATCH;
-	struct callout_handle thandle;
 	int rval = 0;
 	WITNESS_SAVE_DECL(mtx);
 
@@ -465,7 +467,7 @@
 		p, p->p_pid, p->p_comm, (void *) sched_lock.mtx_lock);
 	TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_slpq);
 	if (timo)
-		thandle = timeout(endtsleep, (void *)p, timo);
+		callout_reset(&p->p_slpcallout, timo, endtsleep, p);
 	/*
 	 * We put ourselves on the sleep queue and start our timeout
 	 * before calling CURSIG, as we could stop there, and a wakeup
@@ -517,7 +519,7 @@
 			goto out;
 		}
 	} else if (timo)
-		untimeout(endtsleep, (void *)p, thandle);
+		callout_stop(&p->p_slpcallout);
 	mtx_exit(&sched_lock, MTX_SPIN);
 
 	if (catch && (sig != 0 || (sig = CURSIG(p)))) {
@@ -628,7 +630,6 @@
 	s = splhigh();
 
 	if (p->p_wchan != NULL) {
-		struct callout_handle thandle;
 		int sig;
 		int catch;
 
@@ -646,7 +647,7 @@
 		 */
 
 		if (timo)
-			thandle = timeout(endtsleep, (void *)p, timo);
+			callout_reset(&p->p_slpcallout, timo, endtsleep, p);
 
 		sig = 0;
 		catch = priority & PCATCH;
@@ -687,7 +688,7 @@
 				goto out;
 			}
 		} else if (timo)
-			untimeout(endtsleep, (void *)p, thandle);
+			callout_stop(&p->p_slpcallout);
 		mtx_exit(&sched_lock, MTX_SPIN);
 
 		if (catch && (sig != 0 || (sig = CURSIG(p)))) {
@@ -1036,6 +1037,10 @@
 sched_setup(dummy)
 	void *dummy;
 {
+
+	callout_init(&schedcpu_callout, 1);
+	callout_init(&roundrobin_callout, 0);
+
 	/* Kick off timeout driven events by calling first time. */
 	roundrobin(NULL);
 	schedcpu(NULL);
Index: kern/kern_time.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_time.c,v
retrieving revision 1.70
diff -u -r1.70 kern_time.c
--- kern/kern_time.c	2000/04/18 15:15:20	1.70
+++ kern/kern_time.c	2000/11/26 01:13:48
@@ -513,10 +513,10 @@
 	s = splclock(); /* XXX: still needed ? */
 	if (uap->which == ITIMER_REAL) {
 		if (timevalisset(&p->p_realtimer.it_value))
-			untimeout(realitexpire, (caddr_t)p, p->p_ithandle);
+			callout_stop(&p->p_itcallout);
 		if (timevalisset(&aitv.it_value)) 
-			p->p_ithandle = timeout(realitexpire, (caddr_t)p,
-						tvtohz(&aitv.it_value));
+			callout_reset(&p->p_itcallout, tvtohz(&aitv.it_value),
+			    realitexpire, p);
 		getmicrouptime(&ctv);
 		timevaladd(&aitv.it_value, &ctv);
 		p->p_realtimer = aitv;
@@ -560,8 +560,8 @@
 		if (timevalcmp(&p->p_realtimer.it_value, &ctv, >)) {
 			ntv = p->p_realtimer.it_value;
 			timevalsub(&ntv, &ctv);
-			p->p_ithandle = timeout(realitexpire, (caddr_t)p,
-			    tvtohz(&ntv) - 1);
+			callout_reset(&p->p_itcallout, tvtohz(&ntv) - 1,
+			    realitexpire, p);
 			splx(s);
 			return;
 		}
Index: kern/uipc_domain.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_domain.c,v
retrieving revision 1.22
diff -u -r1.22 uipc_domain.c
--- kern/uipc_domain.c	1999/08/28 00:46:21	1.22
+++ kern/uipc_domain.c	2000/11/26 07:09:06
@@ -61,6 +61,9 @@
 static void domaininit __P((void *));
 SYSINIT(domain, SI_SUB_PROTO_DOMAIN, SI_ORDER_FIRST, domaininit, NULL)
 
+static struct callout pffast_callout;
+static struct callout pfslow_callout;
+
 static void	pffasttimo __P((void *));
 static void	pfslowtimo __P((void *));
 
@@ -136,9 +139,12 @@
 
 	if (max_linkhdr < 16)		/* XXX */
 		max_linkhdr = 16;
+
+	callout_init(&pffast_callout, 0);
+	callout_init(&pfslow_callout, 0);
 
-	timeout(pffasttimo, (void *)0, 1);
-	timeout(pfslowtimo, (void *)0, 1);
+	callout_reset(&pffast_callout, 1, pffasttimo, NULL);
+	callout_reset(&pfslow_callout, 1, pfslowtimo, NULL);
 }
 
 
@@ -214,7 +220,7 @@
 		for (pr = dp->dom_protosw; pr < dp->dom_protoswNPROTOSW; pr++)
 			if (pr->pr_slowtimo)
 				(*pr->pr_slowtimo)();
-	timeout(pfslowtimo, (void *)0, hz/2);
+	callout_reset(&pfslow_callout, hz/2, pfslowtimo, NULL);
 }
 
 static void
@@ -228,5 +234,5 @@
 		for (pr = dp->dom_protosw; pr < dp->dom_protoswNPROTOSW; pr++)
 			if (pr->pr_fasttimo)
 				(*pr->pr_fasttimo)();
-	timeout(pffasttimo, (void *)0, hz/5);
+	callout_reset(&pffast_callout, hz/5, pffasttimo, NULL);
 }
Index: sys/proc.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/proc.h,v
retrieving revision 1.124
diff -u -r1.124 proc.h
--- sys/proc.h	2000/11/22 07:42:01	1.124
+++ sys/proc.h	2000/11/26 00:28:23
@@ -157,10 +157,6 @@
 	LIST_ENTRY(proc) p_sibling;	/* List of sibling processes. */
 	LIST_HEAD(, proc) p_children;	/* Pointer to list of children. */
 
-	struct callout_handle p_ithandle; /*
-					      * Callout handle for scheduling
-					      * p_realtimer.
-					      */
 /* The following fields are all zeroed upon creation in fork. */
 #define	p_startzero	p_oppid
 
@@ -173,11 +169,13 @@
 	u_int	p_estcpu;	 /* Time averaged value of p_cpticks. */
 	int	p_cpticks;	 /* Ticks of cpu time. */
 	fixpt_t	p_pctcpu;	 /* %cpu for this process during p_swtime */
+	struct	callout p_slpcallout;	/* Callout for sleep. */
 	void	*p_wchan;	 /* Sleep address. */
 	const char *p_wmesg;	 /* Reason for sleep. */
 	u_int	p_swtime;	 /* Time swapped in or out. */
 	u_int	p_slptime;	 /* Time since last blocked. */
 
+	struct	callout p_itcallout;	/* Interval timer callout. */
 	struct	itimerval p_realtimer;	/* Alarm timer. */
 	u_int64_t p_runtime;		/* Real time in microsec. */
 	u_int64_t p_uu;			/* Previous user time in microsec. */




From owner-freebsd-arch  Sun Nov 26 13:38:33 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP
	id 7D14B37B479; Sun, 26 Nov 2000 13:38:26 -0800 (PST)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id QAA21022;
	Sun, 26 Nov 2000 16:37:52 -0500 (EST)
Date: Sun, 26 Nov 2000 16:37:49 -0500 (EST)
From: Daniel Eischen <eischen@vigrid.com>
To: Julian Elischer <julian@elischer.org>
Cc: arch@FreeBSD.ORG, jasone@FreeBSD.ORG
Subject: Re: Threads (KSE etc) comments
In-Reply-To: <3A211C82.2464D07E@elischer.org>
Message-ID: <Pine.SUN.3.91.1001126162942.20005A-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sun, 26 Nov 2000, Julian Elischer wrote:
> Where's jasone? He's been perculiarly silent during this....
> 
> There has been some discussion as to what the function of the KSEG
> is....
> 
> I was shaving today (doesn't the best thinkin happen at such times?)
> and thinking about why we needed KSEGs.
> 
> The basic answer is, 
> 
> "We need some method by which we group the scheduled entities 
> so as to be able to ensure that the scheduler has full 
> information and control over what is going on."

Which scheduler - the UTS or the kernel scheduler?  The UTS need
not know about KSEGs, except if that is the only way to get a
quantum.

> Whether we actually need a KSEG and what it does depends upon what
> semantics we want our threading support to have. If we want to provide a
> virtual machine for the process, that looks as if it has an unlimited
> number of virtual processors, then we allow the KSEG to spawn an
> unlimited number of KSEs. In this case, do we allow the "scheduling
> clout" to build up linearly with the number of KSEs or do we limit it in
> some way? Theoretically you would want a KSEG with two KSEs to have the
> same clout as a process running unthreaded, so that cpu time would be
> divided 50-50. However this would mean assigning the threaded process
> 'partial quantum' for each processor.
> 
> By this I mean that after 5 ticks the KSE on each processor for the KSE
> would be interrupted and the other process allowed to run. This is
> unworkable. Another way of sharing the processors between the two
> processors would be to schedule bith KSEs on one process and allow the
> other process to run uninterrupted on the other. This is also quite
> unworkable - what if there are three competing processes and only 2
> processors? 
> 
> Maybe this 'exact fairness' is too hard to achieve..
> 
> In my world, we allow the KSEG to become SLIGHTLTY unfair, by allowing
> it to compete independently on each processor. If we allow the KSEG to
> have an unlimited number of KSEs then we need some other item that
> competes on behalf of the KSEG on each processor. That is, we invent
> some other structure (KSEG-agent) that sits in the scheduling queue(s)

I like Terry's usage of "scheduler reservation" which includes quantum
and priority.

> on behalf of the  KSEG. When the 'agent' gets a quantum, it allows the
> KSEG to decide which of it's KSEGs will be run next. (The KSEG could
                               ^^^^^ KSEs
> round robin them for example).

If you are going to afford N quanta (for N CPUs) to a KSEG, then it
doesn't make sense to have more than N KSEs within that KSEG.  From the
UTS point of view, I will not attempt to create/ask for more than N
KSEs.  Let's ignore this case.

> 
> When a KSE is pre-empted, the kernel saves state for that thread in the
> thread-control-block and the next KSE to upcall to the UTS will include
> that thread-control-block in its list of reportable entities. I'm not
> clear on whether it's the next upcall on ANY KSE, or just the next
> upcall on that KSE.

It has to be on the next KSE, otherwise there will be too much
latency (possibly priority inversion) for RT threads if they are
being blocked by a preempted thread.  For instance, suppose a thread
is within a critical region and the KSE on which it is running is
preempted, and the next KSE to execute is running in RT (it's a
scope system thread).  The RT KSE must get notification of the
preemption so it can resume the thread that was preempted long
enough for it to leave the critical region.  It should also be
noted that without notification the RT KSE cannot determine
which thread is blocking it.  At the minimum, the RT KSE must be
able to search all the other KSE mailboxes to find the thread
that is blocking it.  You also have read-write hazards that have
to be avoided (what happens when the preempted KSE is resumed
on another processor while the RT KSE is resuming the preempted
thread?).  One idea I had was that the RT KSE (in this case) would
issue a system call to halt resumption of the preempted KSE.  It
would then resume the preempted thread until it leaves the critical
region, update the preempted KSE's mailbox, and issue another
system call to release the preempted KSE.  Critical regions are
very brief so this would not be a frequent occurrence.
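
The hold/resume/release sequence described above can be sketched as a
userland toy in plain C.  Every name here (kse_mailbox, kse_hold,
kse_release, the tc_in_critical flag) is hypothetical; no such system
calls exist, and a real implementation would need atomics and kernel
cooperation rather than plain flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread state saved when a KSE is preempted. */
struct thread_ctx {
	int	tc_id;
	bool	tc_in_critical;	/* preempted inside a critical region */
};

/* Hypothetical per-KSE mailbox shared between kernel and UTS. */
struct kse_mailbox {
	struct thread_ctx *km_preempted;	/* saved thread, or NULL */
	bool	km_held;			/* resumption is blocked */
};

/* Stand-in for the "halt resumption" system call. */
static void
kse_hold(struct kse_mailbox *km)
{
	km->km_held = true;
}

/* Stand-in for the "release" system call. */
static void
kse_release(struct kse_mailbox *km)
{
	km->km_held = false;
}

/* Stand-in for running the thread until it leaves the region. */
static void
run_until_unlocked(struct thread_ctx *tc)
{
	tc->tc_in_critical = false;
}

/*
 * Called by the RT KSE when it is blocked.  Searches the other
 * mailboxes for a thread preempted inside a critical region and
 * runs it out of the region while its home KSE is held.
 * Returns the id of the thread that was helped, or -1 if none.
 */
static int
rt_unblock(struct kse_mailbox *boxes, size_t nboxes)
{
	assert(boxes != NULL);
	for (size_t i = 0; i < nboxes; i++) {
		struct kse_mailbox *km = &boxes[i];
		struct thread_ctx *tc = km->km_preempted;

		if (tc == NULL || !tc->tc_in_critical)
			continue;
		kse_hold(km);		/* avoid the double-resume hazard */
		run_until_unlocked(tc);	/* mailbox now shows tc outside */
		kse_release(km);
		return (tc->tc_id);
	}
	return (-1);
}
```

The hold is what closes the read-write hazard: while km_held is set,
the preempted KSE cannot be resumed on another processor underneath
the RT KSE.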

Anyway, these problems really have to be worked out.  I really want
this to work well with a mix of RT and non-RT threads.  You could
have the same problem with threads of the same scheduling class
and it would be possible that no KSE makes any progress until the
preempted KSE gets its turn to run again.

> If the latter then having multiple KSEs on the same processor, allows
> the KSEG round-robin scheduler to make the UTS believe that it has N
> virtual processors, (N-KSEs). However, it also means that the KSEG
> round-robin scheduler is usurping the decision from the UTS as to which
> thread is to be run next, as the UTS doesn't know that the thread on the
> other KSE was pre-empted in favour of this one. (It's on a different
> virtual CPU).

It has to know.  See above.

> 
> If the Former (All KSEs report all events) then there is no real
> advantage to having more than N KSEs (N processors), because that means
> that the UTS will probably keep swapping the threads it thinks are most
> important to the KSEs which means that the thread that was pre-empted on
> KSE-A will probably be re-scheduled on KSE-B when it preempts KSE-A. So
> why have KSE-B at all? All it does is massively confuse things, and
> creates a whole new class of scheduling problems.

I am going to assume that you are talking about KSEs that have their
own scheduling quantum (I agree that it doesn't make sense to have
more than N KSEs if they don't have their own quantum).  What has
gone unasked is "what is the application interface to allow creation
of KSEs, quantum, thread/kse/processor binding?".  Let's look at
it from the UTS and application interface point of view.  Let's also
ignore scope system threads; they are uninteresting since we know
how they are scheduled.  So the question now is how are scope process
threads scheduled and what API is presented to the application?

I am in the middle of writing up my notes on this topic, and will post
them when I'm done.  But a brief synopsis is that we want to allow
the application to bind scope process threads to a specific KSE, bind
KSEs to a specific processor, and to allow creation of additional
quantum (KSEs or KSEGs, subject to limitations of course).  This
allows the application to decide how threads are scheduled.  If a
thread is bound to a specific KSE, then it is not rescheduled on
another KSE when it is preempted (unless it is in a critical region).
If a thread is not bound to a specific KSE and it is preempted, then
the UTS could decide to only reschedule it on the next KSE to execute
if there were no other threads of greater or equal priority.  The
UTS could also decide not to reschedule it regardless; this gets
into what scheduling allocation domain we are using.  For scheduling
allocation domains > 1, it is valid (perhaps against POLA) to have
multiple scheduling queues.  I submit that it is difficult for the
UTS to decide how to (soft or hard) bind threads to KSEs -- perhaps
we want to try to do this in the future, but let's keep it simple
for now.  Let the application decide how threads are bound to KSEs
and how much quantum (KSEs or KSEGs) it wants.  This makes it much
easier for the UTS and doesn't "massively confuse things".
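
The binding rules in the synopsis above can be written down as a small
decision function.  This is only a sketch: the structure, the field
names and the convention that a larger number means higher priority
are all assumptions made for illustration, and letting a thread in a
critical region always resume is my reading of the earlier discussion
rather than a settled rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_KSE	(-1)

/* Hypothetical UTS view of a scope process thread. */
struct uthread {
	int	ut_prio;	/* larger = more important (assumed) */
	int	ut_bound_kse;	/* KSE chosen by the app, or NO_KSE */
	bool	ut_in_critical;	/* preempted inside a critical region */
};

/*
 * Should the KSE `kse_id`, which has just been told about the
 * preempted thread `pre`, resume it?  `queue_empty` says whether the
 * UTS run queue has any other runnable thread, and `best_ready_prio`
 * is the priority of the best one when it does.
 */
static bool
uts_should_resume(const struct uthread *pre, int kse_id,
    bool queue_empty, int best_ready_prio)
{
	assert(pre != NULL);

	/* A thread in a critical region is always helped out of it. */
	if (pre->ut_in_critical)
		return (true);

	/* Bound threads stay on the KSE the application chose. */
	if (pre->ut_bound_kse != NO_KSE && pre->ut_bound_kse != kse_id)
		return (false);

	/* Otherwise only if nothing of greater or equal priority waits. */
	return (queue_empty || best_ready_prio < pre->ut_prio);
}
```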

> So, in summary:
> Assuming we allow only SLIGHT unfairness, if you allow the process to
> have more than N KSEs in a KSEG, you have one of the following:
> 1/ A lot of unfairness if you allow each KSE to be in the queues by
> itself.

No more than LinuxThreads or fork()'d processes.  Again, this can
be limited just as there is a user process limit.  I don't see this
as a problem.

> 2/ The KSEG scheduler usurping the role of the UTS if it really does
> hide the true number of processors.
> 3/ An increased level of UTS complexity, and un-needed work, as the UTS
> struggles to switch the important threads onto the ever-changing set of
> running KSEs (it must be ever changing because there are more of them
> than CPUs).

Not really true.  I've addressed this above.

> If you only allow N KSEs to the KSEG, then all these problems go away.
> The UTS can be aware that it has a limit. But it can also be aware that
> a KSE will not be re-empted by another of it's own KSEs. (this
> simplifies things). It gets the same amount of
> CPU-time, but has less work to do. It has full control of which threads
> are running,
> and competes fairly with other processes and KSEGs.

Whether there are N or N+d KSEs, it makes no difference to the UTS.
The same problem of scheduling scope process threads over more than
1 KSE exists; it is no more difficult or simple with a limit of N
KSEs.

> The reason for having KSEGs is simply as an entity that competes for CPU
> to assure fairness.

My argument is that if you assign the quantum (and priority) to the
KSE, then the _KSE_ is the entity that competes for CPU fairness.  There
is no visible advantage to me of having a KSEG, especially forcing
knowledge of this to the UTS when it doesn't really care.

> It may not even exist as a separate structure in the case where there
> are separate per-CPU scheduling queues, (though I think it would for
> efficiency's sake). It would PROBABLY have a analogous partner in the
> UTS that represents the virtual machine that runs all the threads that
> are competing at the same scope. On a single scheduling queue system, I
> think I would have the KSEG in the queue rather than the independent
> KSEs. When it get's to the head, you schedule
> KSEs on all the CPUs. This allows the threads to communicate quickly
> using shared memory should they want. The UTS has the entire quantum
> across as many CPUs as it has.

I'm confused.  Now you seem to be advocating having multiple KSEs
with one quantum.

> I hope that his answers some of the questions as to why I think there
> are reasons for having the KSEG entity.

I am not convinced :-)  I think we need to look more closely at what
the UTS needs and what API (both POSIX and non-POSIX) is needed/desired.
My point is that the UTS doesn't need to know about the KSEG.  If that's
the only way to get a quantum, then I guess it'll be forced to know
about it.  But also keep in mind that the UTS could also create a
KSEG just as easily as a KSE in order to provide additional quantum.
It already has to do this for system scope threads.

-- 
"Some folks are into open source, but me, I'm into open bar."
                                          -- Spencer F. Katt



From owner-freebsd-arch  Sun Nov 26 13:49:32 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP
	id BF1D237B479; Sun, 26 Nov 2000 13:49:29 -0800 (PST)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id QAA22240;
	Sun, 26 Nov 2000 16:49:05 -0500 (EST)
Date: Sun, 26 Nov 2000 16:49:05 -0500 (EST)
From: Daniel Eischen <eischen@vigrid.com>
To: Julian Elischer <julian@elischer.org>
Cc: arch@FreeBSD.ORG, jasone@FreeBSD.ORG
Subject: Re: Threads .. chopping up 'struct proc'
In-Reply-To: <3A216FFE.BE0F780F@elischer.org>
Message-ID: <Pine.SUN.3.91.1001126164028.20005B-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sun, 26 Nov 2000, Julian Elischer wrote:
> I'v been looking a the proc srtucture..
> 
> The aim is to eventually move some of the fields into a
> struct KSE (struct schedbox?)
> struct KSEC (struct threadcontext?)
> struct KSEG (struct schedgroup?)

[ ... ]

> I note with [XXX] things I am sure about, or do nut really understand.
> these are usually new fields to do with things like events, or fields 
> where the semantics of the feature have not been decided for a 
> threaded environment.  E.g. WHO GETS A SIGNAL?

First KSE to execute I suppose.  A signal is just an upcall, so
I'd assume you would want to treat this the same as if a KSE was
preempted.

> Obviously before we can really finish this we need to decide,
> what the scope of signals is.. Who gets externally genrated signals?
> Who gets signals that are the result of an action (e.g. SIGIO, SIGPIPE)?
> WHich signals are diverted when you allocate a signal stack?
> In the same context, what is the scope of aio?
> where are the results delivered? who is responsible for the 
> kernel threads that do the work? do we allocate a KSE to run them? etc.etc.

Have the kernel automatically allocate a separate KSE (or KSEG) with
quantum for aio?

> Does anyone have comments?
> (Everyone has been VERY quiet so far!!!)

Not me :-)  Remember that COMDEX was two weeks ago and last week
(and this weekend) was a holiday week in the US.  I suspect folks
are just plain busy.

-- 
Dan Eischen



From owner-freebsd-arch  Sun Nov 26 13:56:50 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from prism.flugsvamp.com (cb58709-a.mdsn1.wi.home.com [24.17.241.9])
	by hub.freebsd.org (Postfix) with ESMTP
	id 1324A37B479; Sun, 26 Nov 2000 13:56:46 -0800 (PST)
Received: (from jlemon@localhost)
	by prism.flugsvamp.com (8.11.0/8.11.0) id eAQLtQ433741;
	Sun, 26 Nov 2000 15:55:26 -0600 (CST)
	(envelope-from jlemon)
Date: Sun, 26 Nov 2000 15:55:26 -0600
From: Jonathan Lemon <jlemon@flugsvamp.com>
To: Jake Burkholder <jburkhol@home.com>
Cc: arch@FreeBSD.ORG, smp@FreeBSD.ORG
Subject: Re: review: callout patch
Message-ID: <20001126155526.K69183@prism.flugsvamp.com>
References: <20001126210014.D824BBA7A@io.yi.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <20001126210014.D824BBA7A@io.yi.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sun, Nov 26, 2000 at 01:00:14PM -0800, Jake Burkholder wrote:
> 
> This patch makes most of sys/kern/* sources use callout_reset for
> registering callouts rather than timeout(9).  This should greatly
> reduce the use of the fixed size callfree allocator pool.  Currently
> we panic when it runs out.
> 
> This was motivated by NetBSD, who have completely removed timeout(9)
> from their kernel.

Looks good to me.  I was moving in the same direction, but didn't know
that NetBSD had already done this.
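
For readers who have not looked at the two interfaces, here is a toy
userland model of why the change takes pressure off the fixed pool.
It is not the kernel code: toy_timeout, toy_callout_reset and
toy_fire are made-up stand-ins that only mimic the allocation
behaviour.  The one detail taken from the patch is a callout embedded
in the owning structure, as with p_itcallout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void timer_fn(void *);

struct toy_callout {
	timer_fn *c_func;
	void	*c_arg;
	int	c_ticks;
	bool	c_active;
};

/* timeout(9) style: storage comes out of a fixed-size pool. */
#define TOY_POOL_SIZE	2
static struct toy_callout toy_pool[TOY_POOL_SIZE];

static struct toy_callout *
toy_timeout(timer_fn *fn, void *arg, int ticks)
{
	for (size_t i = 0; i < TOY_POOL_SIZE; i++) {
		if (!toy_pool[i].c_active) {
			toy_pool[i] =
			    (struct toy_callout){ fn, arg, ticks, true };
			return (&toy_pool[i]);
		}
	}
	return (NULL);	/* pool exhausted: where the kernel panics */
}

/* callout_reset style: the caller owns the storage, nothing to run out. */
static void
toy_callout_reset(struct toy_callout *c, int ticks, timer_fn *fn,
    void *arg)
{
	assert(c != NULL);
	c->c_func = fn;
	c->c_arg = arg;
	c->c_ticks = ticks;
	c->c_active = true;
}

/* Run an armed callout once and disarm it. */
static void
toy_fire(struct toy_callout *c)
{
	if (c->c_active) {
		c->c_active = false;
		c->c_func(c->c_arg);
	}
}

struct toy_proc {
	int	p_pid;
	struct toy_callout p_itcallout;	/* embedded, as in the patch */
	int	p_alarms;
};

static void
toy_realitexpire(void *arg)
{
	struct toy_proc *p = arg;

	p->p_alarms++;
}
```

With the pool, the third toy_timeout() call fails once two entries are
in use; the embedded callout can be armed however many other timers
exist.
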
--
Jonathan



From owner-freebsd-arch  Sun Nov 26 14:39:30 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from green.dyndns.org (localhost [127.0.0.1])
	by hub.freebsd.org (Postfix) with ESMTP
	id 12F4F37B479; Sun, 26 Nov 2000 14:39:15 -0800 (PST)
Received: from localhost (vuvjir@localhost [127.0.0.1])
	by green.dyndns.org (8.11.0/8.11.0) with ESMTP id eAQMd0576413;
	Sun, 26 Nov 2000 17:39:07 -0500 (EST)
	(envelope-from green@FreeBSD.org)
Message-Id: <200011262239.eAQMd0576413@green.dyndns.org>
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
To: Julian Elischer <julian@elischer.org>
Cc: arch@FreeBSD.org, jasone@FreeBSD.org
Subject: Re: Threads .. chopping up 'struct proc' 
In-Reply-To: Message from Julian Elischer <julian@elischer.org> 
   of "Sun, 26 Nov 2000 12:18:06 PST." <3A216FFE.BE0F780F@elischer.org> 
From: "Brian F. Feldman" <green@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Sun, 26 Nov 2000 17:38:59 -0500
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Julian Elischer <julian@elischer.org> wrote:
> I'v been looking a the proc srtucture..
> 
> The aim is to eventually move some of the fields into a
> struct KSE (struct schedbox?)
> struct KSEC (struct threadcontext?)
> struct KSEG (struct schedgroup?)

Sounds about right, as far as I've been following the discussion (I read all 
of -arch, but don't follow -smp at all since I just don't have SMP ;)

My question thus far is, okay, given a proc has one of each; will a set of 
threads, in any form, ALWAYS have a proc backing it up?  It would make sense 
as such, and in that case I'd think that you would reduce a lot of the 
complexity in the switchover.

> Initially we would simply include one of each of these in the struct proc,
> but link them together as if they were correctly connected up.
> we would use macros such as:
> #define p_estcpu p_kse.kse_estcpu
> to keep present code working....
> eventually functions that get changed to receive a kse directly
> would just use kse->kse_estcpu and if they need proc they
> can use kse->kse_proc. But until then, we'd start by simply 
> separating the fields and using macros. Then we can convert 
> calls at our leasure.

What would be the difference between doing it "right" for struct proc in the 
first place rather than dummying them up?  I wouldn't want an artificial 
discrepancy here, if possible.  Perhaps you could explain a bit more of the 
vision you have here?  I haven't been able to pick that bit up from your 
posts as of yet.  A KSE of just one thread would seem to logically be 
handled the exact same as a process.
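
To make the quoted transition scheme concrete, here is a cut-down
sketch of the macro trick.  Only p_estcpu, kse_estcpu and kse_proc
come from the proposal; the two-field structures and the function
names are invented for illustration and are far smaller than the real
struct proc.

```c
#include <assert.h>
#include <stddef.h>

struct proc;

/* Hypothetical first cut of the schedulable entity. */
struct kse {
	struct proc *kse_proc;		/* back pointer to the owner */
	unsigned int kse_estcpu;	/* formerly p_estcpu */
};

struct proc {
	int	p_pid;
	struct kse p_kse;		/* exactly one, embedded for now */
};

/* Compatibility macro: old code keeps writing p->p_estcpu. */
#define p_estcpu	p_kse.kse_estcpu

/* Link the embedded KSE back to its proc, "as if correctly connected". */
static void
proc_link(struct proc *p)
{
	p->p_kse.kse_proc = p;
}

/* Old-style function, untouched by the split. */
static void
old_charge_tick(struct proc *p)
{
	p->p_estcpu++;
}

/* Converted function: takes the KSE and finds the proc through it. */
static int
new_charge_tick(struct kse *ke)
{
	assert(ke->kse_proc != NULL);
	ke->kse_estcpu++;
	return (ke->kse_proc->p_pid);
}
```

Both functions update the same storage, so callers can be converted
one at a time without a flag day.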

> However when going through the fields in struct proc,
> some difficulties become obvious. Here's my initial 
> division of the fields. I've added a comment at the 
> beginning of each line that indicates where I think 
> it should go, however I'm not convinced about some of them:
> 
> P = stays in struct proc
> E = goes to 'KSE' struct (schedulable entity)
> G = goes to 'group' struct
> C = goes to 'sleepable Context' struct.

Does each KSE get a sleepable context?  I don't know if I really see where 
it fits in; sounds like it would have a 1:1 mapping with KSEs.

> I note with [XXX] things I am sure about, or do nut really understand.
> these are usually new fields to do with things like events, or fields 
> where the semantics of the feature have not been decided for a 
> threaded environment.  E.g. WHO GETS A SIGNAL?
> 
> struct  proc {
> /*E*/   TAILQ_ENTRY(proc) p_procq;      /* run/mutex queue. */
> [this may need to be split to two entries.. one in a KSE or
>  and one in a KSEG, depending on how we do things ]
> 
> /*C*/   TAILQ_ENTRY(proc) p_slpq;       /* sleep queue. */
> /*P*/   LIST_ENTRY(proc) p_list;        /* List of all processes. */
> 
>         /* substructures: */
> /*P*/   struct  pcred *p_cred;          /* Process owner's identity. */
> /*P*/   struct  filedesc *p_fd;         /* Ptr to open files structure. */
> /*P*/   struct  pstats *p_stats;        /* Accounting/statistics (PROC ONLY). */
> [some of these may need to be duplicated in the KSE and KSEG.. 
> maybe even Context]

Sounds particularly evil to have a set of statistics in the process and in 
the KSEs.  How about only in the KSEs, and in the "traditional" case, the 
process usage info for example would be the sum of that of all the KSEs.
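
A sketch of what "only in the KSEs" could look like, with the process
totals computed on demand.  The structures and the linked list are
hypothetical; only the idea of uticks/sticks/iticks counters comes
from the struct proc fields quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-KSE statclock counters. */
struct toy_kse {
	uint64_t ke_uticks;	/* hits in user mode */
	uint64_t ke_sticks;	/* hits in system mode */
	uint64_t ke_iticks;	/* hits while processing interrupts */
	struct toy_kse *ke_next;
};

/* The process keeps no counters of its own, just its KSE list. */
struct toy_proc {
	struct toy_kse *p_kses;
};

struct toy_usage {
	uint64_t uticks;
	uint64_t sticks;
	uint64_t iticks;
};

/* Process-wide usage is the sum over all of its KSEs. */
static struct toy_usage
proc_usage(const struct toy_proc *p)
{
	struct toy_usage u = { 0, 0, 0 };

	assert(p != NULL);
	for (const struct toy_kse *ke = p->p_kses; ke != NULL;
	    ke = ke->ke_next) {
		u.uticks += ke->ke_uticks;
		u.sticks += ke->ke_sticks;
		u.iticks += ke->ke_iticks;
	}
	return (u);
}
```

An unthreaded process has a single KSE, so the sum degenerates to the
numbers it reports today.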

> /*P*/   struct  plimit *p_limit;        /* Process limits. */
> /*P*/   struct  vm_object *p_upages_obj;/* Upages object */

This maps to a KSE, really... The struct user maps to the signal handlers 
(should be per-KSE, I think...), the stats, and the pcb.  The pcb absolutely 
has to be one per CPU context, so proc won't work :)

> /*P*/   struct  procsig *p_procsig;
> [Well, actually who gets signals?  maybe this is per KSE? per KSEG?
> maybe even per Context as each context has a different user stack and
> signals are delivered on the user stack.. (unless set otherwise)]

I would think that a KSE should own its own signal state and that it should 
be configurable whether to use the signal info per-KSE or per-proc.

> #define p_sigacts       p_procsig->ps_sigacts
> #define p_sigignore     p_procsig->ps_sigignore
> #define p_sigcatch      p_procsig->ps_sigcatch
>  
> #define p_ucred         p_cred->pc_ucred
> #define p_rlimit        p_limit->pl_rlimit
>  
> /*C*/   int     p_flag;                 /* P_* flags. */
> [these flags will probably need to be shared out amongst the structures]
> /*C*/   char    p_stat;                 /* S* process status. */
> [as will these]
>         char    p_pad1[3];
>  
> /*P*/   pid_t   p_pid;                  /* Process identifier. */

If signals are per-KSE, would it then follow to give a KSEG a process id and 
each KSE another process id (same namespace as pids) that could be used to 
signal it and whatnot?

> /*P*/   LIST_ENTRY(proc) p_hash;        /* Hash chain. */
> /*P*/   LIST_ENTRY(proc) p_pglist;      /* List of processes in pgrp. */
> /*P*/   struct  proc *p_pptr;           /* Pointer to parent process. */
> /*P*/   LIST_ENTRY(proc) p_sibling;     /* List of sibling processes. */ 
> /*P*/   LIST_HEAD(, proc) p_children;   /* Pointer to list of children. */

Would non-RFMEM-fork()ed processes be the only ones here, and RFMEM ones 
automatically become a KSE of the proc?

> /*P*/   struct callout_handle p_ithandle; /*
>                                               * Callout handle for scheduling
>                                               * p_realtimer.
>                                               */
> [So who gets the resulting signal? Can differnt KSEGs have
> different timers running? what about KSEs? (I vote for KSEGs)]

KSEGs would be simplest.  BTW, I don't recall there really being a 
difference between a KSEG and a process containing KSEs.  Is there one?

> /* The following fields are all zeroed upon creation in fork. */
> #define p_startzero     p_oppid
>   
> /*P*/   pid_t   p_oppid;         /* Save parent pid during ptrace. XXX */ 
> /*C*/   int     p_dupfd;         /* Sideways return value from fdopen. XXX */
> [whatever THIS means.. it's a hack so C is the safest place for it]

Per-KSE?  Optionally, it would be nice to squash this kind of hack.

> /*P*/   struct  vmspace *p_vmspace;     /* Address space. */
>  
>         /* scheduling */
> [I've shown the following as being in the KSE structure. they would be 
> collected there, but the priority is worked out for the entire KSEG
> so it probably collects the data from all of the KSEs. UNLESS we decide that
> all KSEs can have independent priorities, in which case how do you
> control how their priorities relate..]
> 
> /*E*/   u_int   p_estcpu;        /* Time averaged value of p_cpticks. */
> /*E*/   int     p_cpticks;       /* Ticks of cpu time. */
> /*E*/   fixpt_t p_pctcpu;        /* %cpu for this process during p_swtime */
>         void    *p_wchan;        /* Sleep address. */
>         const char *p_wmesg;     /* Reason for sleep. */
> /*P*/   u_int   p_swtime;        /* Time swapped in or out. */
> /*E?*/  u_int   p_slptime;       /* Time since last blocked. */
> [what does this mean?]

The scheduler updates the amount of time the process has been in a tsleep() 
(msleep()?).  Should then be KSE, along with the process states and whatnot.

> /*?*/   struct  itimerval p_realtimer;  /* Alarm timer. */
> [who gets these? who can set them? what is their scope?]

Same as signals, no?

> /*P*/   u_int64_t p_runtime;            /* Real time in microsec. */
> 
> [If we treat separate KSEGs as seperate processes, do we keep the
> below fields per KSEG? */
> /*G?*/  u_int64_t p_uu;                 /* Previous user time in microsec. */
> /*G?*/  u_int64_t p_su;                 /* Previous system time in microsec. */
> /*G?*/  u_int64_t p_iu;                 /* Previous interrupt time in usec. */
> [how about these? do we agregate? or collect per KSE? Is there a separate
> statclock per CPU?]
> /*P?*/  u_int64_t p_uticks;             /* Statclock hits in user mode. */
> /*P?*/  u_int64_t p_sticks;             /* Statclock hits in system mode. */
> /*P?*/  u_int64_t p_iticks;             /* Statclock hits processing intr. */
> 
> /*P*/   int     p_traceflag;            /* Kernel trace points. */
> /*P*/   struct  vnode *p_tracep;        /* Trace to vnode. */
> [do we trace all KSEs at once? how do we trace individual threads? */

I'd think we'd want to enable tracing an individual KSE; this could be done 
by making the trace vnode per-KSE, but I think it would be advantageous just 
to change the ktrace info to include both the PID and the KSEid.

> /*P*/   sigset_t p_siglist;             /* Signals arrived but not delivered. */
> [who gets signals? does each KSEG (KSE?) have its own handler?]

Hm.  Do you think there's a good use for separate signal-spaces, actually?  
How would thread migration (across KSEs) be handled for signals, then?  Not 
at all?

> /*P*/   struct  vnode *p_textvp;        /* Vnode of executable. */
> 
> /*P*/   char    p_lock;                 /* Process lock (prevent swap) count. */
> /*E*/   u_char  p_oncpu;                /* Which cpu we are on */
> /*E?*/  u_char  p_lastcpu;              /* Last cpu we were on */
> [each context or each KSE? KSEs can't migrate, (under discussion)]

If I may, I believe KSEs should be able to migrate.  It doesn't much make 
sense to waste a CPU at no utilization by saying "KSE x runs on CPU 0, y on 
1, and z on 0" and if y is blocked and x and z are both runnable, they must 
compete for CPU 0 instead of splitting across.
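
The x/y/z example can be checked with a few lines of C.  This is a
counting toy, not a scheduler: it only asks how many CPUs have work
at one instant under each policy, and all names in it are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAXCPU	8

/* Hypothetical KSE: the CPU it is tied to and whether it can run. */
struct toy_kse {
	int	k_cpu;
	bool	k_runnable;
};

/* Pinned: a CPU is busy only if one of its own KSEs is runnable. */
static int
busy_cpus_pinned(const struct toy_kse *k, size_t n, int ncpu)
{
	bool busy[MAXCPU] = { false };
	int count = 0;

	assert(ncpu > 0 && ncpu <= MAXCPU);
	for (size_t i = 0; i < n; i++) {
		if (!k[i].k_runnable)
			continue;
		assert(k[i].k_cpu >= 0 && k[i].k_cpu < ncpu);
		if (!busy[k[i].k_cpu]) {
			busy[k[i].k_cpu] = true;
			count++;
		}
	}
	return (count);
}

/* Migratable: any runnable KSE may take any idle CPU. */
static int
busy_cpus_migratable(const struct toy_kse *k, size_t n, int ncpu)
{
	int runnable = 0;

	assert(ncpu > 0 && ncpu <= MAXCPU);
	for (size_t i = 0; i < n; i++) {
		if (k[i].k_runnable)
			runnable++;
	}
	return (runnable < ncpu ? runnable : ncpu);
}
```

With x and z runnable on CPU 0 and y blocked on CPU 1, the pinned
count is one busy CPU out of two, while the migratable count is two.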

> /*EG?*/ char    p_rqindex;              /* Run queue index */
> Who is on the run queue? KSE or KSEG?
>    
> /*C*/   short   p_locks;                /* DEBUG: lockmgr count of held locks */
> /*C*/   short   p_simple_locks;         /* DEBUG: count of held simple locks */
> [If you cannot sleep or be interrupted with these they could be in the KSE]

You can hold a lockmgr() lock while msleep()ing...

> /*P?*/  unsigned int    p_stops;        /* procfs event bitmask */
> /*P?*/  unsigned int    p_stype;        /* procfs stop event type */
> /*P?*/  char    p_step;                 /* procfs stop *once* flag */
> /*P?*/  unsigned char   p_pfsflags;     /* procfs flags */
> [the procfs stuff is problematical... dependign in what it does 
> and what it is used for, the semantics might vary]

Procfs would need modifications if we want to make KSEs visible in it, and 
this could be trouble...

>         char    p_pad3[2];              /* padding for alignment */
> /*C*/   register_t p_retval[2];         /* syscall aux returns */

E?

> /*P*/   struct  sigiolst p_sigiolst;    /* list of sigio sources */
> [who gets signals?]
> 
> /*P*/   int     p_sigparent;            /* signal to parent on exit */
> /*P*/   sigset_t p_oldsigmask;          /* saved mask from before sigpause */
> [one per signal scope.. what IS the scope of a signal?]
> /*P*/   int     p_sig;                  /* for core dump/debugger XXX */
> /*P*/   u_long  p_code;                 /* for core dump/debugger XXX */
> /*P?*/  struct  klist p_klist;          /* knotes attached to this process */

That seems right.

> /*C?*/  LIST_HEAD(, mtx) p_heldmtx;     /* for debugging code */
> /*CE?*/ struct mtx *p_blocked;          /* Mutex process is blocked on */
> [depending on what this means ]

E.

> /*C*/   LIST_HEAD(, mtx) p_contested;   /* contested locks */

Why not E?

> /* End area that is zeroed on creation. */
> #define p_endzero       p_startcopy
>   
> /* The following fields are all copied upon creation in fork. */
> #define p_startcopy     p_sigmask
>         
> /*P?*/  sigset_t p_sigmask;     /* Current signal mask. */
> /*C?*/  stack_t p_sigstk;       /* sp & on stack state variable */
> [what is the scope of a signal?]
> 
> /*??*/  int     p_magic;        /* Magic number. */
> 
> [The fields below would be in the KSEG if the priority of all KSEs in a KSEG
> were to be calculated at one time.]
> 
> /*G*/   u_char  p_priority;     /* Process priority. */
> /*G*/   u_char  p_usrpri;       /* User-priority based on p_cpu and p_nice. */
> /*G*/   u_char  p_nativepri;    /* Priority before propogation. */
> /*G*/   char    p_nice;         /* Process "nice" value. */
> /*P*/   char    p_comm[MAXCOMLEN+1];
>   
> /*P*/   struct  pgrp *p_pgrp;   /* Pointer to process group. */
>  
> /*P*/   struct  sysentvec *p_sysent; /* System call dispatch information. */
> 
> /*G*/   struct  rtprio p_rtprio;        /* Realtime priority. */
> [priorities ar eper KSEG]
> 
> /*P*/   struct  prison *p_prison;
> /*P*/   struct  pargs *p_args;
> [Either the whole Process is in gaol or it isn't]
> 
> /* End area that is copied on creation. */
> #define p_endcopy       p_addr
> /*P?*/  struct  user *p_addr;   /* Kernel virtual addr of u-area (PROC ONLY). */
> [XXX    Are there 'per KSE' filds there? (actually yes there are...the pcb is
> there).

The contents should be reevaluated.

> /*C?*/  struct  mdproc p_md;    /* Any machine-dependent fields. */
> [there is a trapframe there. not sure what it;s used for]

Trapframe?  E.

> /*P*/   u_short p_xstat;        /* Exit status for wait; also stop signal. */
> /*P*/   u_short p_acflag;       /* Accounting flags. */
> [these may be collected per KSE and harvested when needed]
> /*P*/   struct  rusage *p_ru;   /* Exit information. XXX */
>         
> /*P*/   int     p_nthreads;     /* number of threads (only in leader) */
> [not sure how this is used... may become redundant]
> 
> /*G?*/  void    *p_aioinfo;     /* ASYNC I/O info */
> [will aio be 'per KSE, per KSEG or per PROC?]

Probably the same as signals, but I'd be inclined to say per proc, keeping 
in mind that the aio is a separate thread.

> /*C*/   int     p_wakeup;       /* thread id */
> [will surely change]
> /*P*/   struct proc *p_peers;
> /*P*/   struct proc *p_leader;
> /*C*/   struct  pasleep p_asleep;       /* Used by asleep()/await(). */
> /*P*/   void    *p_emuldata;    /* process-specific emulator state data */

Should probably have another KSE-specific one, if needed.  That is, planning 
ahead :)

> /*C*/   struct ithd *p_ithd;    /* for interrupt threads only */
> };
>  
> 
> 
> 
> Obviously before we can really finish this we need to decide,
> what the scope of signals is.. Who gets externally genrated signals?
> Who gets signals that are the result of an action (e.g. SIGIO, SIGPIPE)?
> WHich signals are diverted when you allocate a signal stack?
> In the same context, what is the scope of aio?
> where are the results delivered? who is responsible for the 
> kernel threads that do the work? do we allocate a KSE to run them? etc.etc.
> What is the scope of the timers and such?

You can always be flexible enough to have a system call to set the behavior.

> All this makes a difference in where the fields live....
> 
> Does anyone have comments?
> (Everyone has been VERY quiet so far!!!)

I'll be less quiet now, at least!

> julian
> 
> 
> -- 
>       __--_|\  Julian Elischer
>      /       \ julian@elischer.org
>     (   OZ    ) World tour 2000
> ---> X_.---._/  presently in:  Budapest
>             v

--
 Brian Fundakowski Feldman           \  FreeBSD: The Power to Serve!  /
 green@FreeBSD.org                    `------------------------------'



From owner-freebsd-arch  Sun Nov 26 15:59:34 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.interware.hu (mail.interware.hu [195.70.32.130])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9606837B479; Sun, 26 Nov 2000 15:59:25 -0800 (PST)
Received: from luanda-56.budapest.interware.hu ([195.70.51.56] helo=elischer.org)
	by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian))
	id 140BhM-0001ld-00; Mon, 27 Nov 2000 00:59:09 +0100
Message-ID: <3A21A3C7.A836DE09@elischer.org>
Date: Sun, 26 Nov 2000 15:59:03 -0800
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: "Brian F. Feldman" <green@FreeBSD.org>
Cc: arch@FreeBSD.org, jasone@FreeBSD.org
Subject: Re: Threads .. chopping up 'struct proc'
References: <200011262239.eAQMd0576413@green.dyndns.org>
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

"Brian F. Feldman" wrote:
> 
> Julian Elischer <julian@elischer.org> wrote:
> > I'v been looking a the proc srtucture..
> >
> > The aim is to eventually move some of the fields into a
> > struct KSE (struct schedbox?)
> > struct KSEC (struct threadcontext?)
> > struct KSEG (struct schedgroup?)
> 
> Sounds about right, as far as I've been following the discussion (I read all
> of -arch, but don't follow -smp at all since I just don't have SMP ;)
> 
> My question thus far is, okay, given a proc has one of each; will a set of
> threads, in any form, ALWAYS have a proc backing it up?  It would make sense
> as such, and in that case I'd think that you would reduce a lot of the
> complexity in the switchover.
> 
> > Initially we would simply include one of each of these in the struct proc,
> > but link them together as if they were correctly connected up.
> > we would use macros such as:
> > #define p_estcpu p_kse.kse_estcpu
> > to keep present code working....
> > eventually functions that get changed to receive a kse directly
> > would just use kse->kse_estcpu and if they need proc they
> > can use kse->kse_proc. But until then, we'd start by simply
> > separating the fields and using macros. Then we can convert
> > calls at our leisure.
> 
> What would be the difference between doing it "right" for struct proc in the
> first place rather than dummying them up?  I wouldn't want an artificial
> discrepancy here, if possible.  Perhaps you could explain a bit more of the
> vision you have here?  I haven't been able to pick that bit up from your
> posts as of yet.  A KSE of just one thread would seem to logically be
> handled the exact same as a process.
> 
> > However when going through the fields in struct proc,
> > some difficulties become obvious. Here's my initial
> > division of the fields. I've added a comment at the
> > beginning of each line that indicates where I think
> > it should go, however I'm not convinced about some of them:
> >
> > P = stays in struct proc
> > E = goes to 'KSE' struct (schedulable entity)
> > G = goes to 'group' struct
> > C = goes to 'sleepable Context' struct.
> 
> Does each KSE get a sleepable context?  I don't know if I really see where
> it fits in; sounds like it would have a 1:1 mapping with KSEs.
> 

OK, I'm only going to answer this one question here, as I'm off to school in
the morning and it's 12:30 AM now, but you have a misconception, so I'll try to
clear that up quickly.

A KSE doesn't have a stack. It doesn't have any state WRT system call execution.
When a system call happens, control passes from userland to a waiting KSE that
is presently assigned to the processor you are on and to your process. The KSE
grabs a spare KSEC (KSE context; maybe it already has one sitting ready) and
uses it. The KSEC supplies a stack and storage for anything that describes the
state of the processor at any moment during the syscall.

When the system call blocks, the KSEC is left on the sleep queue, and the KSE
grabs another one, and performs an upcall to the Userland Thread scheduler,
which schedules another thread. When THAT thread does a system call, the system
call is executed, storing a set of frames and state onto the stack in the NEW
KSEC. If, in turn, that blocks, it too is thrown onto the sleep queue.
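
A minimal userland sketch of that hand-off, for illustration only. The
struct layouts and the names (spare_list, sleep_queue, syscall_enter,
syscall_block) are assumptions made up for this example, not proposed kernel
code. It only shows that the KSE holds no syscall state of its own: the state
lives in the KSEC, and a blocked KSEC is parked while the KSE picks up a
fresh one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the real structures. */
struct ksec {
	int		 thread_id;	/* user thread whose syscall this carries */
	struct ksec	*next;		/* link on the spare list or sleep queue */
};

struct kse {
	struct ksec	*cur;		/* context in use right now, or NULL */
};

static struct ksec *spare_list;		/* contexts nobody is using */
static struct ksec *sleep_queue;	/* contexts of blocked syscalls */

static void
push(struct ksec **head, struct ksec *c)
{
	c->next = *head;
	*head = c;
}

static struct ksec *
pop(struct ksec **head)
{
	struct ksec *c = *head;

	if (c != NULL) {
		*head = c->next;
		c->next = NULL;
	}
	return (c);
}

/* Syscall entry: the KSE grabs a spare KSEC unless one is sitting ready. */
static void
syscall_enter(struct kse *k, int thread_id)
{
	if (k->cur == NULL)
		k->cur = pop(&spare_list);
	assert(k->cur != NULL);		/* this sketch never runs out */
	k->cur->thread_id = thread_id;
}

/* The syscall blocks: park the KSEC; the KSE is free to do an upcall. */
static void
syscall_block(struct kse *k)
{
	push(&sleep_queue, k->cur);
	k->cur = NULL;
}

/* Number of contexts currently parked on the sleep queue. */
static int
sleeping_count(void)
{
	int n = 0;
	struct ksec *c;

	for (c = sleep_queue; c != NULL; c = c->next)
		n++;
	return (n);
}
```

With two spare KSECs, one KSE can carry two threads into blocking syscalls
in a row; both contexts end up on the sleep queue and the KSE itself is left
holding nothing.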

Everything needed to complete the system calls is in the KSECs, which are
hibernating on the sleep queues. When a system call is reawakened, the
kernel waits for a scheduling event in which a KSE from that process (possibly
the same one) is being scheduled. It then reassociates the first KSEC (with its
stack and stored processor context) with that KSE and then completes the system
call (including any copyout()s or copyin()s). However, instead of crossing back
to user space when it gets back up to the boundary, it puts the syscall's return
information in the mailbox that the thread system configured for that thread
(I skipped that bit, but don't worry, it's trivial), and checks if there are any
more awakened syscalls to complete. It keeps doing this until there are no more
awakening KSECs, at which time it does an upcall to the process. This results in
the Userland Thread Scheduler (UTS) picking up all the completed threads,
deciding which is the highest priority, and running it, as if it were just
returning from the kernel.
I forgot to mention that the mailboxes for the completed threads are linked
together by the kernel before doing the upcall, and the resulting list is passed
as a single pointer to the UTS.

Note: the thread that was running when the KSE was pre-empted is also in the
list of threads that is returned to the UTS when the upcall happens, so the UTS
may decide to let it continue running.
It didn't voluntarily do a syscall, but it did cross to the kernel when the
timer interrupt occurred, so it can be faked up to look the same. If it was in a
critical region, then of course it should have marked that fact, so it would be
scheduled first.

A process may have a KSE for each physical processor. When it
creates a new KSE (up to the maximum of N) it sets up a KSE mailbox. When it
schedules a thread, it places a pointer to the thread mailbox in the KSE mailbox.
The KSE always knows where its mailbox is, so it can always find the thread
mailbox of the thread that just made the system call. When the syscall blocks,
that thread mailbox address is stored in the KSEC, and it is zeroed out from
the KSE's mailbox.

When an upcall happens, the KSE adds the linked list of all completed syscalls'
mailboxes to that same KSE mailbox as well. The UTS just takes that list,
adds the threads mentioned onto its lists of runnable threads, and then makes
a scheduling decision and runs the highest priority thread. It sets the mailbox
address of that thread into the KSE's mailbox, and jumps into the thread,
etc. etc.
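
To make the mailbox traffic concrete, here is a rough sketch in plain C. The
field names, the "higher number means higher priority" rule and the
functions kernel_block(), kernel_complete() and uts_upcall() are all
hypothetical, invented for this illustration; the real layout is still to be
decided.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts, for illustration only. */
struct thread_mailbox {
	int			 priority;	/* assumed: bigger is more urgent */
	long			 retval;	/* syscall return information */
	struct thread_mailbox	*next;		/* kernel links completed threads */
};

struct kse_mailbox {
	struct thread_mailbox	*current;	/* thread running on this KSE */
	struct thread_mailbox	*completed;	/* list handed over at upcall */
};

static struct thread_mailbox *runq;		/* the UTS's runnable threads */

/* Kernel: syscall blocked; the pointer moves to the KSEC, slot is zeroed. */
static struct thread_mailbox *
kernel_block(struct kse_mailbox *km)
{
	struct thread_mailbox *tm = km->current;

	km->current = NULL;
	return (tm);			/* the caller would store this in the KSEC */
}

/* Kernel: a syscall finished; chain its mailbox for the next upcall. */
static void
kernel_complete(struct kse_mailbox *km, struct thread_mailbox *tm, long retval)
{
	tm->retval = retval;
	tm->next = km->completed;
	km->completed = tm;
}

/* UTS: absorb the completed list, then run the highest priority thread. */
static struct thread_mailbox *
uts_upcall(struct kse_mailbox *km)
{
	struct thread_mailbox **pp, **bestp = NULL, *best, *tm;

	while ((tm = km->completed) != NULL) {	/* move list onto the runq */
		km->completed = tm->next;
		tm->next = runq;
		runq = tm;
	}
	for (pp = &runq; *pp != NULL; pp = &(*pp)->next)
		if (bestp == NULL || (*pp)->priority > (*bestp)->priority)
			bestp = pp;
	if (bestp == NULL)
		return (NULL);			/* nothing runnable */
	best = *bestp;
	*bestp = best->next;			/* unlink from the runq */
	best->next = NULL;
	km->current = best;			/* publish it in the KSE mailbox */
	return (best);
}
```

The point of the single `completed` pointer is that one upcall can deliver any
number of finished syscalls; the UTS, not the kernel, decides which of them
runs next.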

I haven't mentioned KSEGs here, but if you are limited to N KSEs, you want a
container into which you can put extra competing KSEs (for example, for a super
high priority thread).
Usually you just have one KSEG, but you may start another, in which case they are
treated by the system much like two separate processes, each with its own KSEs.

more later.
Julian
 
-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000
---> X_.---._/  presently in:  Budapest
            v




From owner-freebsd-arch  Sun Nov 26 23:44:37 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mail-relay.eunet.no (mail-relay.eunet.no [193.71.71.242])
	by hub.freebsd.org (Postfix) with ESMTP
	id B268E37B4C5; Sun, 26 Nov 2000 23:44:34 -0800 (PST)
Received: from login-1.eunet.no (login-1.eunet.no [193.75.110.2])
	by mail-relay.eunet.no (8.9.3/8.9.3/GN) with ESMTP id IAA12864;
	Mon, 27 Nov 2000 08:43:57 +0100 (CET)
	(envelope-from mbendiks@eunet.no)
Received: from localhost (mbendiks@localhost)
	by login-1.eunet.no (8.9.3/8.8.8) with ESMTP id IAA54208;
	Mon, 27 Nov 2000 08:43:57 +0100 (CET)
	(envelope-from mbendiks@eunet.no)
X-Authentication-Warning: login-1.eunet.no: mbendiks owned process doing -bs
Date: Mon, 27 Nov 2000 08:43:57 +0100 (CET)
From: Marius Bendiksen <mbendiks@eunet.no>
To: John Baldwin <jhb@FreeBSD.org>
Cc: Jake Burkholder <jburkhol@home.com>,
	Daniel Eischen <eischen@vigrid.com>, arch@FreeBSD.org,
	Jonathan Lemon <jlemon@flugsvamp.com>
Subject: Re: Thread-specific data and KSEs
In-Reply-To: <XFMail.001121165957.jhb@FreeBSD.org>
Message-ID: <Pine.BSF.4.05.10011270840340.54186-100000@login-1.eunet.no>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Just a short question. As I recall, the Wine people had a lot of
difficulty with FreeBSD due to our abuse of the %fs register. Wouldn't
using %gs as well just aggravate this problem?

Besides, as I recall, the process could likely be obtained from the tss
number, which can be retrieved with str. And additional data could
actually be stuck in the tss.

Marius




From owner-freebsd-arch  Sun Nov 26 23:49:27 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mail-relay.eunet.no (mail-relay.eunet.no [193.71.71.242])
	by hub.freebsd.org (Postfix) with ESMTP
	id D0AC037B479; Sun, 26 Nov 2000 23:49:24 -0800 (PST)
Received: from login-1.eunet.no (login-1.eunet.no [193.75.110.2])
	by mail-relay.eunet.no (8.9.3/8.9.3/GN) with ESMTP id IAA14456;
	Mon, 27 Nov 2000 08:49:23 +0100 (CET)
	(envelope-from mbendiks@eunet.no)
Received: from localhost (mbendiks@localhost)
	by login-1.eunet.no (8.9.3/8.8.8) with ESMTP id IAA54244;
	Mon, 27 Nov 2000 08:49:23 +0100 (CET)
	(envelope-from mbendiks@eunet.no)
X-Authentication-Warning: login-1.eunet.no: mbendiks owned process doing -bs
Date: Mon, 27 Nov 2000 08:49:23 +0100 (CET)
From: Marius Bendiksen <mbendiks@eunet.no>
To: Alfred Perlstein <bright@wintelcom.net>
Cc: Daniel Eischen <eischen@vigrid.com>,
	John Baldwin <jhb@FreeBSD.ORG>,
	Jonathan Lemon <jlemon@flugsvamp.com>, arch@FreeBSD.ORG
Subject: Re: Thread-specific data and KSEs
In-Reply-To: <20001121192331.E18037@fw.wintelcom.net>
Message-ID: <Pine.BSF.4.05.10011270847060.54186-100000@login-1.eunet.no>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > It's just one more register that has to be saved.  I don't
> > think it's going to matter much.
> No extra TLB faults/invalidations?  Aren't segment registers
> somewhat expensive to load?

Upon loading a task state (with ltr or a gate), you will restore all
segment registers from the tss, regardless of their content, and a load of
the shadow portion of the segment will be attempted anyway. I don't think
this is the right place to shave off cycles, nor do I think the speed is
even the most relevant issue for this extension, but rather the abuse of
segments that are meant to hold real data.

Marius




From owner-freebsd-arch  Mon Nov 27 10:33: 3 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9F12B37B479; Mon, 27 Nov 2000 10:32:53 -0800 (PST)
Received: from laptop.baldwin.cx (john@jhb-laptop.osd.bsdi.com [204.216.28.241])
	by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id eARIWpC39794;
	Mon, 27 Nov 2000 10:32:51 -0800 (PST)
	(envelope-from jhb@FreeBSD.org)
Message-ID: <XFMail.001127103304.jhb@FreeBSD.org>
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <20001126210014.D824BBA7A@io.yi.org>
Date: Mon, 27 Nov 2000 10:33:04 -0800 (PST)
From: John Baldwin <jhb@FreeBSD.org>
To: Jake Burkholder <jburkhol@home.com>
Subject: RE: review: callout patch
Cc: smp@FreeBSD.org, arch@FreeBSD.org
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On 26-Nov-00 Jake Burkholder wrote:
> 
> This patch makes most of sys/kern/* sources use callout_reset for
> registering callouts rather than timeout(9).  This should greatly
> reduce the use of the fixed size callfree allocator pool.  Currently
> we panic when it runs out.
> 
> This was motivated by NetBSD, who have completely removed timeout(9)
> from their kernel.
> 
> Please review it.

Looks good to me. :)

Having a callout.9 manpage to go along with it would be nice as well. :)

-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/




From owner-freebsd-arch  Mon Nov 27 10:53:58 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222])
	by hub.freebsd.org (Postfix) with ESMTP id 1277737B4E5
	for <arch@FreeBSD.org>; Mon, 27 Nov 2000 10:53:45 -0800 (PST)
Received: from laptop.baldwin.cx (john@jhb-laptop.osd.bsdi.com [204.216.28.241])
	by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id eARIr3C40719;
	Mon, 27 Nov 2000 10:53:03 -0800 (PST)
	(envelope-from jhb@FreeBSD.org)
Message-ID: <XFMail.001127105316.jhb@FreeBSD.org>
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <Pine.BSF.4.05.10011270847060.54186-100000@login-1.eunet.no>
Date: Mon, 27 Nov 2000 10:53:16 -0800 (PST)
From: John Baldwin <jhb@FreeBSD.org>
To: Marius Bendiksen <mbendiks@eunet.no>
Subject: Re: Thread-specific data and KSEs
Cc: arch@FreeBSD.org, Jonathan Lemon <jlemon@flugsvamp.com>,
	Daniel Eischen <eischen@vigrid.com>,
	Alfred Perlstein <bright@wintelcom.net>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On 27-Nov-00 Marius Bendiksen wrote:
>> > It's just one more register that has to be saved.  I don't
>> > think it's going to matter much.
>> No extra TLB faults/invalidations?  Aren't segment registers
>> somewhat expensive to load?
> 
> Upon loading a task state (with ltr or a gate), you will restore all
> segment registers from the tss, regardless of their content, and a load of
> the shadow portion of the segment will be attempted anyway. I don't think
> this is the right place to shave off cycles, nor do I think the speed is
> even the most relevant issue for this extension, but rather the abuse of
> segments that are meant to hold real data.

Erm, we don't use task gates or a TSS for our task switches.  Go look at
cpu_switch() in sys/i386/i386/swtch.s.  %fs and %gs are intended to be used for
per-CPU data and thread-local storage, which is why x86-64 keeps them around
even after axeing %cs, %ds, %es, and %ss.

> Marius

-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/




From owner-freebsd-arch  Mon Nov 27 11:26: 4 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from io.yi.org (unknown [24.70.218.157])
	by hub.freebsd.org (Postfix) with ESMTP
	id C8F1337B479; Mon, 27 Nov 2000 11:25:59 -0800 (PST)
Received: from io.yi.org (localhost.gvcl1.bc.wave.home.com [127.0.0.1])
	by io.yi.org (Postfix) with ESMTP
	id 897B3BA7A; Mon, 27 Nov 2000 11:25:58 -0800 (PST)
X-Mailer: exmh version 2.1.1 10/15/1999
To: John Baldwin <jhb@FreeBSD.ORG>
Cc: smp@FreeBSD.ORG, arch@FreeBSD.ORG
Subject: Re: review: callout patch 
In-Reply-To: Message from John Baldwin <jhb@FreeBSD.ORG> 
   of "Mon, 27 Nov 2000 10:33:04 PST." <XFMail.001127103304.jhb@FreeBSD.org> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Mon, 27 Nov 2000 11:25:58 -0800
From: Jake Burkholder <jburkhol@home.com>
Message-Id: <20001127192558.897B3BA7A@io.yi.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> 
> On 26-Nov-00 Jake Burkholder wrote:
> > 
> > This patch makes most of sys/kern/* sources use callout_reset for
> > registering callouts rather than timeout(9).  This should greatly
> > reduce the use of the fixed size callfree allocator pool.  Currently
> > we panic when it runs out.
> > 
> > This was motivated by NetBSD, who have completely removed timeout(9)
> > from their kernel.
> > 
> > Please review it.
> 
> Looks good to me. :)
> 
> Having a callout.9 manpage to go along with it would be nice as well. :)

timeout.9 exists, it's just not linked.

> 
> -- 
> 
> John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
> PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
> "Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
> 
> 




From owner-freebsd-arch  Mon Nov 27 11:34:54 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222])
	by hub.freebsd.org (Postfix) with ESMTP
	id 64A9137B479; Mon, 27 Nov 2000 11:34:51 -0800 (PST)
Received: from laptop.baldwin.cx (john@jhb-laptop.osd.bsdi.com [204.216.28.241])
	by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id eARJYnC43140;
	Mon, 27 Nov 2000 11:34:49 -0800 (PST)
	(envelope-from jhb@FreeBSD.org)
Message-ID: <XFMail.001127113502.jhb@FreeBSD.org>
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <20001127192558.897B3BA7A@io.yi.org>
Date: Mon, 27 Nov 2000 11:35:02 -0800 (PST)
From: John Baldwin <jhb@FreeBSD.org>
To: Jake Burkholder <jburkhol@home.com>
Subject: Re: review: callout patch
Cc: arch@FreeBSD.org, smp@FreeBSD.org
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On 27-Nov-00 Jake Burkholder wrote:
>> 
>> On 26-Nov-00 Jake Burkholder wrote:
>> > 
>> > This patch makes most of sys/kern/* sources use callout_reset for
>> > registering callouts rather than timeout(9).  This should greatly
>> > reduce the use of the fixed size callfree allocator pool.  Currently
>> > we panic when it runs out.
>> > 
>> > This was motivated by NetBSD, who have completely removed timeout(9)
>> > from their kernel.
>> > 
>> > Please review it.
>> 
>> Looks good to me. :)
>> 
>> Having a callout.9 manpage to go along with it would be nice as well. :)
> 
> timeout.9 exists, it's just not linked.

Ah, I had thought timeout(9) didn't document those.  Well, then updating
timeout(9) and adding appropriate MLINK's would be cool. :)

-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/




From owner-freebsd-arch  Mon Nov 27 14: 8:29 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [18.24.4.193])
	by hub.freebsd.org (Postfix) with ESMTP id D557B37B4C5
	for <arch@freebsd.org>; Mon, 27 Nov 2000 14:08:27 -0800 (PST)
Received: (from wollman@localhost)
	by khavrinen.lcs.mit.edu (8.9.3/8.9.3) id RAA97482;
	Mon, 27 Nov 2000 17:08:16 -0500 (EST)
	(envelope-from wollman)
Date: Mon, 27 Nov 2000 17:08:16 -0500 (EST)
From: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
Message-Id: <200011272208.RAA97482@khavrinen.lcs.mit.edu>
To: jburkhol@home.com
Cc: arch@freebsd.org
Subject: Re: review: callout patch
X-Newsgroups: mit.lcs.mail.freebsd-arch
In-Reply-To: <mit.lcs.mail.freebsd-arch/20001126210014.D824BBA7A@io.yi.org>
Organization: MIT Laboratory for Computer Science
Cc: 
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In article <mit.lcs.mail.freebsd-arch/20001126210014.D824BBA7A@io.yi.org> you write:

>This should greatly reduce the use of the fixed size callfree
>allocator pool.

Keep in mind that the size of the callout wheel is currently based on
the number of pre-allocated callout structures there are.  This needs
to be revisited now that the number is effectively unlimited.  Some
instrumentation would be very helpful.
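
As a rough illustration of why this matters, here is a small sketch that
assumes the wheel is sized to the next power of two at or above the
pre-allocated count and indexed by masking the expiry tick. The function
names (wheel_size, wheel_bucket, avg_chain) are made up for the example and
are not the kernel's.

```c
#include <assert.h>

/* Smallest power of two >= ncallout (the pre-allocated callout count). */
static unsigned
wheel_size(unsigned ncallout)
{
	unsigned size = 1;

	while (size < ncallout)
		size <<= 1;
	return (size);
}

/* Bucket for a callout that expires at absolute tick `when`. */
static unsigned
wheel_bucket(unsigned when, unsigned size)
{
	assert(size != 0 && (size & (size - 1)) == 0);	/* power of two */
	return (when & (size - 1));
}

/* Average chain length if `active` callouts spread evenly over the wheel. */
static double
avg_chain(unsigned active, unsigned size)
{
	return ((double)active / size);
}
```

With 200 pre-allocated structures the wheel gets 256 buckets. If callers
then register 4096 callouts from their own storage, the average chain is 16
entries instead of less than one, which is the kind of number the
instrumentation would need to report.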

-GAWollman

-- 
Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
wollman@lcs.mit.edu  | O Siem / The fires of freedom 
Opinions not those of| Dance in the burning flame
MIT, LCS, CRS, or NSA|                     - Susan Aglukark and Chad Irschick




From owner-freebsd-arch  Mon Nov 27 14: 9: 4 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from winston.osd.bsdi.com (winston.osd.bsdi.com [204.216.27.229])
	by hub.freebsd.org (Postfix) with ESMTP id AFFDC37B4C5
	for <arch@freebsd.org>; Mon, 27 Nov 2000 14:08:48 -0800 (PST)
Received: from winston.osd.bsdi.com (localhost [127.0.0.1])
	by winston.osd.bsdi.com (8.11.1/8.11.1) with ESMTP id eARM8jh52696;
	Mon, 27 Nov 2000 14:08:45 -0800 (PST)
	(envelope-from jkh@winston.osd.bsdi.com)
To: arch@freebsd.org
Cc: rps@merlin.mat.uc.pt
Subject: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.)
Date: Mon, 27 Nov 2000 14:08:45 -0800
Message-ID: <52694.975362925@winston.osd.bsdi.com>
From: Jordan Hubbard <jkh@winston.osd.bsdi.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I just received this today and am kind of scratching my head over it.
On one hand, creating an "alias" for one specific piece of terminal
character mapping seems a hack; I can see the idea behind wanting to
use one of n characters for something like backspacing or line-killing
(^U or ^X for example) and would not frown (as much) on a more general
aliasing feature.  On the other hand, I can see that this specific
case (erase) is by far the most significant.  Which is why I'm
forwarding this to arch - this is one of those classic
architecture/feature trade-off decisions and I would like to hear more
opinions before deciding which way I'd like to respond to this.

- Jordan

------- Forwarded Message

Return-Path: rps@merlin.mat.uc.pt
Delivery-Date: Mon Nov 27 12:02:08 2000
Return-Path: <rps@merlin.mat.uc.pt>
Received: from merlin.mat.uc.pt (merlin-f.mat.uc.pt [193.137.206.2])
	by winston.osd.bsdi.com (8.11.1/8.11.1) with ESMTP id eARK25h52306
	for <jkh@winston.osd.bsdi.com>; Mon, 27 Nov 2000 12:02:05 -0800 (PST)
	(envelope-from rps@merlin.mat.uc.pt)
Received: (from rps@localhost)
	by merlin.mat.uc.pt (8.9.3/8.9.0) id UAA06153;
	Mon, 27 Nov 2000 20:01:52 GMT
Message-ID: <20001127200149.05857@merlin.mat.uc.pt>
Date: Mon, 27 Nov 2000 20:01:49 +0000
From: Rui Pedro Mendes Salgueiro <rps@mat.uc.pt>
To: Jordan Hubbard <jkh@winston.osd.bsdi.com>
Subject: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.)
References: <20001122191141.50422@merlin.mat.uc.pt> <80298.974921931@winston.osd.bsdi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.89.1i
In-Reply-To: <80298.974921931@winston.osd.bsdi.com>; from Jordan Hubbard on Wed, Nov 22, 2000 at 11:38:51AM -0800

I am not sure if you are the proper person to send this to:

One thing that has bothered me for a long time is the two conflicting
standards for the erase character: ^H (backspace) and ^? (del).

Years ago I used a Convex (mini-super-)computer which solved the problem
in an elegant way. stty(1) had an extra option for an erase2 character.
So you could have both usual erase chars working simultaneously.  

Then, around 1993, I reimplemented that in an early version of BSDI (1.0?).
At that time, I also tried other tricks, like a flag to replace each ^H
with an ^? and the reverse, but those interfered with Emacs (which uses ^H).

The next BSDI version lacked the necessary kernel source file due to the ATT
lawsuit, so I could not reimplement it.

Much later I started to reimplement it on FreeBSD, but this is the first
time I managed to get a new release (4.2) before it is obsolete.

dingo# uname -a
FreeBSD dingo.mat.uc.pt 4.2-RELEASE FreeBSD 4.2-RELEASE #0: Mon Nov 27 13:35:57 WET 2000     rps@dingo.mat.uc.pt:/usr/src/sys/compile/GENERIC  i386

dingo# stty -a
[...]
cchars: discard = ^O; dsusp = ^Y; eof = ^D; eol = <undef>;
        eol2 = <undef>; erase = ^?; erase2 = ^H; intr = ^C; kill = ^U;
[...]

The needed patches are simple:

1 - use a spare slot in the c_cc[] character array. This affects the header
    file "termios.h".

2 - define the default char for it in ttydefaults.h . Also, include it in
    the ttydefchars array.

(these files are in /usr/src/sys/sys and /usr/include/sys/ )

3 - the file tty.c in the kernel (/usr/src/sys/kern) is the one that does
    the real work. The modification there is just adding an OR to the
    relevant "if".

4 - modify stty(1) (/usr/src/bin/stty/cchar.c) so it knows about erase2.
    All that is needed is to add a line to the initialization of cchars1.

5 - document it in the man page (/usr/src/bin/stty/stty.1 ).
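
To show the effect of step 3 outside the kernel, here is a small standalone
sketch. The CCEQ() macro is modeled on the kernel's; VDISABLE, struct cchars
and edit_line() are stand-ins invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define	VDISABLE	0xff	/* stand-in for _POSIX_VDISABLE */
/* Modeled on the kernel's CCEQ(): a disabled slot never matches. */
#define	CCEQ(val, c)	((c) == (val) ? (val) != VDISABLE : 0)

struct cchars {
	unsigned char	erase;		/* plays the role of cc[VERASE] */
	unsigned char	erase2;		/* plays the role of cc[VERASE2] */
};

/*
 * Copy n input bytes to out, letting either erase character rub out the
 * previous byte.  out must have room for n + 1 bytes.  Returns the length.
 */
static size_t
edit_line(const struct cchars *cc, const unsigned char *in, size_t n,
    char *out)
{
	size_t i, len = 0;

	for (i = 0; i < n; i++) {
		unsigned char c = in[i];

		if (CCEQ(cc->erase, c) || CCEQ(cc->erase2, c)) {
			if (len > 0)
				len--;
			continue;
		}
		out[len++] = (char)c;
	}
	out[len] = '\0';
	return (len);
}
```

With erase = ^? and erase2 = ^H, typing "ab", DEL, "c", ^H, "d" leaves "ad".
Setting erase2 to the disabled value makes ^H an ordinary character again.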

Patch follows (paths are relative to /usr/src )

*** ./bin/stty/cchar.c.orig	Sat Aug 28 00:15:40 1999
--- ./bin/stty/cchar.c	Mon Nov 27 13:11:33 2000
***************
*** 64,69 ****
--- 64,70 ----
  	{ "eol",	VEOL,		CEOL },
  	{ "eol2",	VEOL2,		CEOL },
  	{ "erase",	VERASE,		CERASE },
+ 	{ "erase2",	VERASE2,	CERASE2 },
  	{ "intr",	VINTR,		CINTR },
  	{ "kill",	VKILL,		CKILL },
  	{ "lnext",	VLNEXT,		CLNEXT },
*** ./bin/stty/stty.1.orig	Wed Mar  1 10:43:07 2000
--- ./bin/stty/stty.1	Mon Nov 27 13:20:29 2000
***************
*** 374,379 ****
--- 374,380 ----
  .It eol Ta Tn VEOL	EOL No character
  .It eol2 Ta Tn VEOL2	EOL2 No character
  .It erase Ta Tn VERASE	ERASE No character
+ .It erase2 Ta Tn VERASE2	ERASE2 No character
  .It werase Ta Tn VWERASE	WERASE No character
  .It intr Ta Tn VINTR	INTR No character
  .It kill Ta Tn VKILL	KILL No character
***************
*** 420,426 ****
  -nl unsets inlcr and igncr.
  .It Cm ek
  Reset
! .Dv ERASE
  and
  .Dv KILL
  characters
--- 421,428 ----
  -nl unsets inlcr and igncr.
  .It Cm ek
  Reset
! .Dv ERASE ,
! .Dv ERASE2 ,
  and
  .Dv KILL
  characters
*** ./sys/kern/tty.c.orig	Thu Aug  3 01:09:33 2000
--- ./sys/kern/tty.c	Mon Nov 27 13:26:44 2000
***************
*** 452,460 ****
  		 * processing takes place.
  		 */
  		/*
! 		 * erase (^H / ^?)
  		 */
! 		if (CCEQ(cc[VERASE], c)) {
  			if (tp->t_rawq.c_cc)
  				ttyrub(unputc(&tp->t_rawq), tp);
  			goto endcase;
--- 452,460 ----
  		 * processing takes place.
  		 */
  		/*
! 		 * erase or erase2 (^H / ^?)
  		 */
! 		if (CCEQ(cc[VERASE], c) || CCEQ(cc[VERASE2], c) ) {
  			if (tp->t_rawq.c_cc)
  				ttyrub(unputc(&tp->t_rawq), tp);
  			goto endcase;
***************
*** 2003,2010 ****
  			(void)ttyoutput('\\', tp);
  		}
  		ttyecho(c, tp);
! 	} else
  		ttyecho(tp->t_cc[VERASE], tp);
  	--tp->t_rocount;
  }
  
--- 2003,2019 ----
  			(void)ttyoutput('\\', tp);
  		}
  		ttyecho(c, tp);
! 	} else {
  		ttyecho(tp->t_cc[VERASE], tp);
+ 		/*
+ 		 * This code may be executed not only when an ERASE key
+ 		 * is pressed, but also when ^U (KILL) or ^W (WERASE) are.
+ 		 * So, I didn't think it was worthwhile to pass the extra
+ 		 * information (which would need an extra parameter,
+ 		 * changing every call) needed to distinguish the ERASE2
+ 		 * case from the ERASE.
+ 		 */
+ 	}
  	--tp->t_rocount;
  }
  
*** ./sys/sys/termios.h.orig	Wed Dec 29 04:24:48 1999
--- ./sys/sys/termios.h	Mon Nov 27 13:06:35 2000
***************
*** 56,63 ****
  #define VKILL		5	/* ICANON */
  #ifndef _POSIX_SOURCE
  #define	VREPRINT 	6	/* ICANON together with IEXTEN */
  #endif
! /*			7	   spare 1 */
  #define VINTR		8	/* ISIG */
  #define VQUIT		9	/* ISIG */
  #define VSUSP		10	/* ISIG */
--- 56,64 ----
  #define VKILL		5	/* ICANON */
  #ifndef _POSIX_SOURCE
  #define	VREPRINT 	6	/* ICANON together with IEXTEN */
+ #define VERASE2 	7	/* ICANON */
  #endif
! /*			7	   ex-spare 1 */
  #define VINTR		8	/* ISIG */
  #define VQUIT		9	/* ISIG */
  #define VSUSP		10	/* ISIG */
*** ./sys/sys/ttydefaults.h.orig	Sat Aug 28 01:52:07 1999
--- ./sys/sys/ttydefaults.h	Mon Nov 27 13:09:13 2000
***************
*** 61,66 ****
--- 61,67 ----
  #define	CEOF		CTRL('d')
  #define	CEOL		0xff		/* XXX avoid _POSIX_VDISABLE */
  #define	CERASE		0177
+ #define	CERASE2		CTRL('h')
  #define	CINTR		CTRL('c')
  #define	CSTATUS		CTRL('t')
  #define	CKILL		CTRL('u')
***************
*** 90,96 ****
  #ifdef TTYDEFCHARS
  static cc_t	ttydefchars[NCCS] = {
  	CEOF,	CEOL,	CEOL,	CERASE, CWERASE, CKILL, CREPRINT,
! 	_POSIX_VDISABLE, CINTR,	CQUIT,	CSUSP,	CDSUSP,	CSTART,	CSTOP,	CLNEXT,
  	CDISCARD, CMIN,	CTIME,  CSTATUS, _POSIX_VDISABLE
  };
  #undef TTYDEFCHARS
--- 91,97 ----
  #ifdef TTYDEFCHARS
  static cc_t	ttydefchars[NCCS] = {
  	CEOF,	CEOL,	CEOL,	CERASE, CWERASE, CKILL, CREPRINT,
! 	CERASE2, CINTR,	CQUIT,	CSUSP,	CDSUSP,	CSTART,	CSTOP,	CLNEXT,
  	CDISCARD, CMIN,	CTIME,  CSTATUS, _POSIX_VDISABLE
  };
  #undef TTYDEFCHARS

------- End of Forwarded Message




From owner-freebsd-arch  Mon Nov 27 14:39:55 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from post.mail.nl.demon.net (post-10.mail.nl.demon.net [194.159.73.20])
	by hub.freebsd.org (Postfix) with ESMTP id 096F937B4C5
	for <arch@freebsd.org>; Mon, 27 Nov 2000 14:39:53 -0800 (PST)
Received: from [212.238.54.101] (helo=freebie.demon.nl)
	by post.mail.nl.demon.net with smtp (Exim 3.14 #2)
	id 140WwB-0002ym-00; Mon, 27 Nov 2000 22:39:51 +0000
Received: (from wkb@localhost)
	by freebie.demon.nl (8.11.1/8.11.0) id eARMd2R02442;
	Mon, 27 Nov 2000 23:39:02 +0100 (CET)
	(envelope-from wkb)
Date: Mon, 27 Nov 2000 23:39:02 +0100
From: Wilko Bulte <wkb@freebie.demon.nl>
To: Jordan Hubbard <jkh@winston.osd.bsdi.com>
Cc: arch@freebsd.org, rps@merlin.mat.uc.pt
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.)
Message-ID: <20001127233902.C2402@freebie.demon.nl>
References: <52694.975362925@winston.osd.bsdi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <52694.975362925@winston.osd.bsdi.com>; from jkh@winston.osd.bsdi.com on Mon, Nov 27, 2000 at 02:08:45PM -0800
X-OS: FreeBSD 4.2-RELEASE
X-PGP: finger wilko@freebsd.org
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Mon, Nov 27, 2000 at 02:08:45PM -0800, Jordan Hubbard wrote:

If you ever used DEC or Sun terminals/keyboards you will like this idea.
I, for one, like it ;-) As for technical elegance..

W/

> I just received this today and am kind of scratching my head over it.
> On one hand, creating an "alias" for one specific piece of terminal
> character mapping seems a hack; I can see the idea behind wanting to
> use one of n characters for something like backspacing or line-killing
> (^U or ^X for example) and would not frown (as much) on a more general
> aliasing feature.  On the other hand, I can see that this specific
> case (erase) is by far the most significant.  Which is why I'm
> forwarding this to arch - this is one of those classic
> architecture/feature trade-off decisions and I would like to hear more
> opinions before deciding which way I'd like to respond to this.
> 
> - Jordan
> 
> ------- Forwarded Message
> 
> Return-Path: rps@merlin.mat.uc.pt
> Delivery-Date: Mon Nov 27 12:02:08 2000
> Return-Path: <rps@merlin.mat.uc.pt>
> Received: from merlin.mat.uc.pt (merlin-f.mat.uc.pt [193.137.206.2])
> 	by winston.osd.bsdi.com (8.11.1/8.11.1) with ESMTP id eARK25h52306
> 	for <jkh@winston.osd.bsdi.com>; Mon, 27 Nov 2000 12:02:05 -0800 (PST)
> 	(envelope-from rps@merlin.mat.uc.pt)
> Received: (from rps@localhost)
> 	by merlin.mat.uc.pt (8.9.3/8.9.0) id UAA06153;
> 	Mon, 27 Nov 2000 20:01:52 GMT
> Message-ID: <20001127200149.05857@merlin.mat.uc.pt>
> Date: Mon, 27 Nov 2000 20:01:49 +0000
> From: Rui Pedro Mendes Salgueiro <rps@mat.uc.pt>
> To: Jordan Hubbard <jkh@winston.osd.bsdi.com>
> Subject: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.)
> References: <20001122191141.50422@merlin.mat.uc.pt> <80298.974921931@winston.osd.bsdi.com>
> Mime-Version: 1.0
> Content-Type: text/plain; charset=us-ascii
> X-Mailer: Mutt 0.89.1i
> In-Reply-To: <80298.974921931@winston.osd.bsdi.com>; from Jordan Hubbard on Wed, Nov 22, 2000 at 11:38:51AM -0800
> 
> I am not sure if you are the proper person to send this to:
> 
> One thing that has bothered me for a long time is the two conflicting
> standards for the erase character: ^H (backspace) and ^? (del).
> 
> Years ago I used a Convex (mini-super-)computer which solved the problem
> in an elegant way. stty(1) had an extra option for an erase2 character.
> So you could have both usual erase chars working simultaneously.  

...
-- 
Wilko Bulte  	 					Arnhem, the Netherlands
wilko@freebsd.org  	http://www.freebsd.org 		http://www.nlfug.nl




From owner-freebsd-arch  Mon Nov 27 14:41:10 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from beastie.mckusick.com (beastie.mckusick.com [209.31.233.184])
	by hub.freebsd.org (Postfix) with ESMTP id 36A9837B4C5
	for <arch@FreeBSD.ORG>; Mon, 27 Nov 2000 14:41:08 -0800 (PST)
Received: from beastie.mckusick.com (localhost [127.0.0.1])
	by beastie.mckusick.com (8.9.3/8.9.3) with ESMTP id OAA93364;
	Mon, 27 Nov 2000 14:41:05 -0800 (PST)
	(envelope-from mckusick@beastie.mckusick.com)
Message-Id: <200011272241.OAA93364@beastie.mckusick.com>
To: Jordan Hubbard <jkh@winston.osd.bsdi.com>
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) 
Cc: arch@FreeBSD.ORG
In-Reply-To: Your message of "Mon, 27 Nov 2000 14:08:45 PST."
             <52694.975362925@winston.osd.bsdi.com> 
Date: Mon, 27 Nov 2000 14:41:05 -0800
From: Kirk McKusick <mckusick@mckusick.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

When we first implemented termios at CSRG, we had an erase2
character. Mike Karels was vehemently opposed to it, and
insisted that it be deleted before we did our next release
(4.3-tahoe if I remember correctly). I am of the opinion that
it is a good idea, and should be there. I do not believe that
we need/want a general aliasing facility as erase is really
the only character for which there is widespread disagreement
over which character to use. So, my take would be to add
erase2 and be done with it.
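
As a rough illustration of what a second erase character means for
canonical-mode input, here is a minimal Python sketch. It is not the
proposed patch and not the tty driver code: the function name edit_line
and its parameters are hypothetical, and only the ^H and ^? values come
from the discussion.

```python
# Toy model of canonical-mode line editing with two erase characters.
# Hypothetical names; not the actual tty driver or the proposed patch.

ERASE = "\x7f"   # ^? (DEL)
ERASE2 = "\x08"  # ^H (backspace)


def edit_line(typed, erase=ERASE, erase2=ERASE2):
    """Return the line left after applying erase characters to `typed`."""
    line = []
    for ch in typed:
        if ch in (erase, erase2):
            # Either erase character removes the previous character, if any.
            if line:
                line.pop()
        else:
            line.append(ch)
    return "".join(line)
```

With erase2 unset (None in this sketch), a ^H typed by the user is kept
as an ordinary character instead of erasing anything, which is the
behaviour people complain about.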

	Kirk McKusick


From owner-freebsd-arch  Mon Nov 27 14:47:28 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from citusc17.usc.edu (citusc17.usc.edu [128.125.38.177])
	by hub.freebsd.org (Postfix) with ESMTP id ED37737B479
	for <arch@FreeBSD.ORG>; Mon, 27 Nov 2000 14:47:26 -0800 (PST)
Received: (from kris@localhost)
	by citusc17.usc.edu (8.11.1/8.11.1) id eARMm9c67449;
	Mon, 27 Nov 2000 14:48:10 -0800 (PST)
	(envelope-from kris)
Date: Mon, 27 Nov 2000 14:48:09 -0800
From: Kris Kennaway <kris@FreeBSD.ORG>
To: Jordan Hubbard <jkh@winston.osd.bsdi.com>
Cc: arch@FreeBSD.ORG, rps@merlin.mat.uc.pt
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.)
Message-ID: <20001127144809.A67395@citusc17.usc.edu>
References: <52694.975362925@winston.osd.bsdi.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-md5;
	protocol="application/pgp-signature"; boundary="W/nzBZO5zC0uMSeA"
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <52694.975362925@winston.osd.bsdi.com>; from jkh@winston.osd.bsdi.com on Mon, Nov 27, 2000 at 02:08:45PM -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


--W/nzBZO5zC0uMSeA
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon, Nov 27, 2000 at 02:08:45PM -0800, Jordan Hubbard wrote:
> I just received this today and am kind of scratching my head over it.
> On one hand, creating an "alias" for one specific piece of terminal
> character mapping seems a hack; I can see the idea behind wanting to
> use one of n characters for something like backspacing or line-killing
> (^U or ^X for example) and would not frown (as much) on a more general
> aliasing feature.  On the other hand, I can see that this specific
> case (erase) is by far the most significant.  Which is why I'm
> forwarding this to arch - this is one of those classic
> architecture/feature trade-off decisions and I would like to hear more
> opinions before deciding which way I'd like to respond to this.

This is a very common newbie problem ("Stupid FreeBSD won't let me
delete what I've typed, it just prints ^H!"). Commit please! :)

Kris

--W/nzBZO5zC0uMSeA
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.4 (FreeBSD)
Comment: For info see http://www.gnupg.org

iEYEARECAAYFAjoi5KgACgkQWry0BWjoQKXoTgCeNn+hADhsnoOrYTlphOsB0wAu
wKsAoKL4inb6IXesYokZf40t2h/G0qAB
=/mtv
-----END PGP SIGNATURE-----

--W/nzBZO5zC0uMSeA--


From owner-freebsd-arch  Mon Nov 27 15: 7:25 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from earth.backplane.com (placeholder-dcat-1076843399.broadbandoffice.net [64.47.83.135])
	by hub.freebsd.org (Postfix) with ESMTP
	id 78D7437B479; Mon, 27 Nov 2000 15:07:23 -0800 (PST)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.1/8.9.3) id eARN7Ln34886;
	Mon, 27 Nov 2000 15:07:21 -0800 (PST)
	(envelope-from dillon)
Date: Mon, 27 Nov 2000 15:07:21 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
Message-Id: <200011272307.eARN7Ln34886@earth.backplane.com>
To: Kris Kennaway <kris@FreeBSD.ORG>
Cc: Jordan Hubbard <jkh@winston.osd.bsdi.com>, arch@FreeBSD.ORG,
	rps@merlin.mat.uc.pt
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.)
References: <52694.975362925@winston.osd.bsdi.com> <20001127144809.A67395@citusc17.usc.edu>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:> I just received this today and am kind of scratching my head over it.
:> On one hand, creating an "alias" for one specific piece of terminal
:> character mapping seems a hack; I can see the idea behind wanting to
:> use one of n characters for something like backspacing or line-killing
:> (^U or ^X for example) and would not frown (as much) on a more general
:> aliasing feature.  On the other hand, I can see that this specific
:> case (erase) is by far the most significant.  Which is why I'm
:> forwarding this to arch - this is one of those classic
:> architecture/feature trade-off decisions and I would like to hear more
:> opinions before deciding which way I'd like to respond to this.
:
:This is a very common newbie problem ("Stupid FreeBSD won't let me
:delete what I've typed, it just prints ^H!"). Commit please! :)
:
:Kris

    This is one of those things where, 10 years ago, I would probably
    have been a purist and been opposed to it.

    But after 15+ years of pure hell having to deal with every
    conceivable combination of ^H and ^?, terminal types,
    telnet, rlogin, ssh, and so on and so forth...  I say to
    hell with the purist view on this one.  I'd love to
    see this committed!

						-Matt


From owner-freebsd-arch  Mon Nov 27 15:12:18 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from dan.emsphone.com (dan.emsphone.com [199.67.51.101])
	by hub.freebsd.org (Postfix) with ESMTP id 5E79D37B479
	for <arch@FreeBSD.ORG>; Mon, 27 Nov 2000 15:12:16 -0800 (PST)
Received: (from dan@localhost)
	by dan.emsphone.com (8.11.1/8.11.1) id eARNC5w15510;
	Mon, 27 Nov 2000 17:12:05 -0600 (CST)
	(envelope-from dan)
Date: Mon, 27 Nov 2000 17:12:05 -0600
From: Dan Nelson <dnelson@emsphone.com>
To: Wilko Bulte <wkb@freebie.demon.nl>
Cc: Jordan Hubbard <jkh@winston.osd.bsdi.com>, arch@FreeBSD.ORG,
	rps@merlin.mat.uc.pt
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.)
Message-ID: <20001127171205.B22109@dan.emsphone.com>
References: <52694.975362925@winston.osd.bsdi.com> <20001127233902.C2402@freebie.demon.nl>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.12i
In-Reply-To: <20001127233902.C2402@freebie.demon.nl>; from "Wilko Bulte" on Mon Nov 27 23:39:02 GMT 2000
X-OS: FreeBSD 5.0-CURRENT
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In the last episode (Nov 27), Wilko Bulte said:
> If you ever used DEC or Sun terminals/keyboards you will like this
> idea. I, for one, like it ;-) As for technical elegance..

There's precedent; we've already got "eol" and "eol2", both of which
seem to default to undefined :)

-- 
	Dan Nelson
	dnelson@emsphone.com


From owner-freebsd-arch  Mon Nov 27 15:40:11 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP id 5AC8A37B479
	for <arch@FreeBSD.ORG>; Mon, 27 Nov 2000 15:40:09 -0800 (PST)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id SAA21890;
	Mon, 27 Nov 2000 18:39:45 -0500 (EST)
Date: Mon, 27 Nov 2000 18:39:45 -0500 (EST)
From: Daniel Eischen <eischen@vigrid.com>
To: Matt Dillon <dillon@earth.backplane.com>
Cc: Jordan Hubbard <jkh@winston.osd.bsdi.com>, arch@FreeBSD.ORG
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.)
In-Reply-To: <200011272307.eARN7Ln34886@earth.backplane.com>
Message-ID: <Pine.SUN.3.91.1001127183815.21757A-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Mon, 27 Nov 2000, Matt Dillon wrote:
>     This is one of those things where, 10 years ago, I would probably
>     have been a purist and been opposed to it.
> 
>     But after 15+ years of pure hell having to deal with every
>     conceivable combination of ^H and ^?, terminal types,
>     telnet, rlogin, ssh, and so on and so forth...  I say to
>     hell with the purist view on this one.  I'd love to
>     see this committed!

I agree!  Commit this now before there are any objections!

-- 
Dan Eischen


From owner-freebsd-arch  Mon Nov 27 15:49: 4 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from magnesium.net (toxic.magnesium.net [207.154.84.15])
	by hub.freebsd.org (Postfix) with SMTP id A9B3F37B479
	for <arch@freebsd.org>; Mon, 27 Nov 2000 15:48:54 -0800 (PST)
Received: (qmail 94023 invoked by uid 1142); 27 Nov 2000 23:48:53 -0000
Date: 27 Nov 2000 15:48:53 -0800
Date: Mon, 27 Nov 2000 14:30:58 -0800
From: Jason Evans <jasone@canonware.com>
To: Julian Elischer <julian@elischer.org>
Cc: arch@freebsd.org
Subject: Re: Threads (KSE etc) comments
Message-ID: <20001127143058.L4140@canonware.com>
References: <3A15A2C1.1F3FB6CD@elischer.org> <3A192821.13463950@elischer.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3A192821.13463950@elischer.org>; from julian@elischer.org on Mon, Nov 20, 2000 at 05:33:21AM -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Mon, Nov 20, 2000 at 05:33:21AM -0800, Julian Elischer wrote:
> I've been thinking about the scheduling queues, and how to make sure
> that the process (KSEG actually) acts fairly with respect to other
> processes.  I was confused for a while by your description. I think part
> of my confusion came from something that we specified in the meeting but
> has not been written in your document directly. Let me see if we are
> agreed on what we decided..
> 
> A KSEG can have at most N KSEs associated with it, where N is
> the number of processors (unless artificially reduced by a lower
> concurrency declaration). (You said this, but only indirectly.)

There's no particular reason that we need to enforce a limit on the number
of KSEs within a KSEG (aside from resource limits), but in practice,
there's no reason that a program would want to create more KSEs within a
KSEG than there are processors.

> In
> general, KSEs are each assigned to a processor. They do not in general
> move between processors unless some explicit adjustment is being
> made(*), and as a general rule, two KSEs will not be assigned to the
> same processor. (in some transitional moments this may be allowed to
> briefly happen.) Thus in general, if you run a KSEC on the same KSE it was
> run on last time, you should be on the same processor,
> (and get any affinity advantages that might exist).

KSEs need to be able to float between processors in order to make use of
all the processors if there are fewer KSEs in a KSEG than there are
processors (in other words, KSEG concurrency less than the number of
processors).  In general practice, KSEs will tend to stay on the same
processor, but CPU load balancing may cause KSEs to migrate from time to
time.

> (*)I am inclined to make the requirement of binding KSEs to processors
> HARD, as this allows us to simplify some later decisions.

I wanted the binding to be soft, in order to simplify things. =)

> For example, if
> we hard bind KSEs to processors then since we assign a different
> communications mailbox for each KSE we create, we can be sure that
> different KSEs will never preempt each other when writing out to their
> mailboxes. This also means that since there can only be one UTS
> incarnation active per KSE (or one KSE per UTS incarnation), that we can
> not have a UTS preempted by another incarnation on the same processor.
> We can therefore make sure that there needs to be no locking on
> mailboxes, or even any checking.

The case where a KSE is preempted, only to be replaced by another KSE
within the same KSEG has no real meaning, and I expect we'd specifically
write the scheduler to avoid ever doing that.

> I think this is what we decided.. is this correct? The binding is not
> really mentioned in your document.

I made a number of minor changes to the design after our discussions.
Almost all of the changes were made in order to simplify implementation.
In this case, I felt that not binding KSEs to CPUs would make the scheduler
much simpler to implement, with no significant down sides.  If I'm missing
something that actually makes the changes more complex, please don't let
the issue drop; simplicity and efficiency are key.

> When we were talking about it, (at least in my memory) Each KSE had a
> mailbox. My memory of this was that we called a KSE creation call with a
> different argument, thus each KSE had a different return stack frame
> when it made upcalls. In the version you have outlined, there is no KSE
> creation call only KSEG creation calls. Thus all upcalls have the same
> frame, and there is the danger of colliding upcalls for different
> processors. I think it works more naturally with everything just
> 'falling into place' if we have calls to create KSEs rather than KSEGs.
> The "make KSEG" call is simply a version of the "make KSE" call that
> also puts it into the  new different group. You are left with the very
> first 'original' thread being different in my scheme, but my answer to
> this would be to simply make the first "make KSE" call reuse the current
> stack etc. and not return a new one.
>
> [...]

Yes, this is a shortcoming of the current paper.  I couldn't remember how
we had decided to do this, and was still working it out in my head.  Thanks
for the reminder.

>  When we have per-processor scheduling queues, there is only at most ONE
>  KSE from any given KSEG in the scheduling queues for any given
>  processor.

As mentioned above, I don't think we need to enforce this.

>  With the single scheduling queue we have now do we allow N to be in the
>  queues at once? (or do we put the KSEG in instead?)

We would still put all the KSEs in the scheduling queue.  However, I think
we really need to do the scheduler overhaul close to the same time as the
KSE changes, so that we never have production releases of FreeBSD running
this way.

>  The terms KSE etc. have probably served their useful life.
>  It's time to think of or find names that really describe them better
>  
>  KSE  -- a per process processor.. slot? opening? (a-la CAM/SCSI)
>  KSEC ---- stack plus context... KSC..trying to do something (task?)
>  KSEG ---- a class of schedulable  entities.. A slot cluster? :-)
>  PROC ---- probably needs to stay the same.

I'm not particularly attached to the names, but finding something better
may be hard. =)

Jason


From owner-freebsd-arch  Mon Nov 27 15:49: 4 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from magnesium.net (toxic.magnesium.net [207.154.84.15])
	by hub.freebsd.org (Postfix) with SMTP id B02DE37B4C5
	for <arch@FreeBSD.ORG>; Mon, 27 Nov 2000 15:48:54 -0800 (PST)
Received: (qmail 94026 invoked by uid 1142); 27 Nov 2000 23:48:54 -0000
Date: 27 Nov 2000 15:48:54 -0800
Date: Mon, 27 Nov 2000 15:48:00 -0800
From: Jason Evans <jasone@canonware.com>
To: Julian Elischer <julian@elischer.org>
Cc: arch@FreeBSD.ORG
Subject: Re: Threads (KSE etc) comments
Message-ID: <20001127154800.M4140@canonware.com>
References: <Pine.SUN.3.91.1001121160717.7102A-100000@pcnet1.pcnet.com> <3A1B0B64.6D694248@elischer.org> <3A211C82.2464D07E@elischer.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3A211C82.2464D07E@elischer.org>; from julian@elischer.org on Sun, Nov 26, 2000 at 06:21:54AM -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sun, Nov 26, 2000 at 06:21:54AM -0800, Julian Elischer wrote:
> There has been some discussion as to what the function of the KSEG
> is....
> 
> [...] why we needed KSEGs.
> The basic answer is, 
> 
> "We need some method by which we group the scheduled entities 
> so as to be able to ensure that the scheduler has full 
> information and control over what is going on."

Yes.

> Whether we actually need a KSEG and what it does depends upon what
> semantics we want our threading support to have. If we want to provide a
> virtual machine for the process, that looks as if it has an unlimited
> number of virtual processors, then we allow the KSEG to spawn an
> unlimited number of KSEs. In this case, do we allow the "scheduling
> clout" to build up linearly with the number of KSEs or do we limit it in
> some way? Theoretically you would want a KSEG with two KSEs to have the
> same clout as a process running unthreaded, so that cpu time would be
> divided 50-50. However this would mean assigning the threaded process
> 'partial quantum' for each processor.

There shouldn't be a need for assigning partial quanta.  In the case of a
single-threaded process, A, and a multi-threaded process B, on a 2
processor machine, B may initially get ~75% of the CPU resources.  However,
re-prioritization will notice this and lower the priority of B after a
short period of time (4 ticks or so).

> Maybe this 'exact fairness' is too hard to achieve..

IMO, the existing priority adjustment mechanisms are probably adequate.

> When a KSE is pre-empted, the kernel saves state for that thread in the
> thread-control-block and the next KSE to upcall to the UTS will include
> that thread-control-block in its list of reportable entities. I'm not
> clear on whether it's the next upcall on ANY KSE, or just the next
> upcall on that KSE.. 

I think it should be the next upcall on any KSE.

> If the latter then having multiple KSEs on the same processor, allows
> the KSEG round-robin scheduler to make the UTS believe that it has N
> virtual processors, (N-KSEs). However, it also means that the KSEG
> round-robin scheduler is usurping the decision from the UTS as to which
> thread is to be run next, as the UTS doesn't know that the thread on the
> other KSE was pre-empted in favour of this one. (It's on a different
> virtual CPU).

I don't understand how we're usurping the UTS's scheduling decisions.  If a
KSE is preempted, then an upcall (resulting in yet another preemption, if
necessary) must be done right away in order to give the UTS enough
information to correctly schedule threads on the KSEs that are still
running.  This is one of the basic tenets of scheduler activations, which
we really have to follow.

> If the Former (All KSEs report all events) then there is no real
> advantage to having more than N KSEs (N processors), because that means
> that the UTS will probably keep swapping the threads it thinks are most
> important to the KSEs which means that the thread that was pre-empted on
> KSE-A will probably be re-scheduled on KSE-B when it preempts KSE-A. So
> why have KSE-B at all? All it does is massively confuse things, and
> creates a whole new class of scheduling problems.

The main advantage I see of allowing more KSEs than processors (total
across all KSEGs) is that it simplifies the implementation considerably.
Very little has to be changed about how things currently work, which also
means that single-threaded applications work the same as they do now,
without a lot of extra work.

I agree that there's no good reason to have more KSEs in a KSEG than there
are processors, but it doesn't actually break anything to allow this, and
simply using process resource limits to control the number of KSEs is
simpler than enforcing limits on the number of KSEs per KSEG.

One example of why enforcing KSE/KSEG limits could become hard in the
future is if the number of processors is dynamic (i.e. processors can be
added and removed).  In discussions I've had with Mike Smith, this is a
very real possibility, and is something we should keep in mind.

> So, in summary:
> Assuming we allow only SLIGHT unfairness, if you allow the process to
> have more than N KSEs in a KSEG, you have one of the following:
> 1/ A lot of unfairness if you allow each KSE to be in the queues by
> itself.

Why is there unfairness?  Scheduler re-prioritization should prevent
long-term unfairness just fine.

> 2/ The KSEG scheduler usurping the role of the UTS if it really does
> hide the true number of processors.

We shouldn't be hiding the true number of processors.

> 3/ An increased level of UTS complexity, and un-needed work, as the UTS
> struggles to switch the important threads onto the ever-changing set of
> running KSEs (it must be ever changing because there are more of them
> than CPUs).

The UTS doesn't need to be any more complex.  It would simply get more
upcalls if there were more preemptions as a result of excessive KSEs, which
I don't think would happen anyway.

> The reason for having KSEGs is simply as an entity that competes for CPU
> to assure fairness.
> It may not even exist as a separate structure in the case where there
> are separate per-CPU scheduling queues, (though I think it would for
> efficiency's sake). It would PROBABLY have a analogous partner in the
> UTS that represents the virtual machine that runs all the threads that
> are competing at the same scope.

I agree with everything you say here.

> On a single scheduling queue system, I
> think I would have the KSEG in the queue rather than the independent
> KSEs. When it gets to the head, you schedule
> KSEs on all the CPUs. This allows the threads to communicate quickly
> using shared memory should they want. The UTS has the entire quantum
> across as many CPUs as it has. 

As I mentioned in another email, I don't think we should plan on having a
production release that is implemented with only a single scheduling queue.

Jason


From owner-freebsd-arch  Mon Nov 27 17: 2:16 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from net2.gendyn.com (nat2.gendyn.com [204.60.171.12])
	by hub.freebsd.org (Postfix) with ESMTP id 303E337B4C5
	for <arch@freebsd.org>; Mon, 27 Nov 2000 17:02:09 -0800 (PST)
Received: from [153.11.11.3] (helo=plunger.gdeb.com)
	by net2.gendyn.com with esmtp (Exim 2.12 #1)
	id 140Z9g-000Ln6-00
	for arch@freebsd.org; Mon, 27 Nov 2000 20:01:56 -0500
Received: from orion.caen.gdeb.com ([153.11.109.11])
	by plunger.gdeb.com  with ESMTP id TAA03337;
	Mon, 27 Nov 2000 19:58:45 -0500 (EST)
Received: from gdeb.com (gpz.clc.gdeb.com [192.168.3.12])
	by orion.caen.gdeb.com (8.9.3/8.9.3) with ESMTP id TAA00995;
	Mon, 27 Nov 2000 19:59:00 -0500 (EST)
	(envelope-from deischen@gdeb.com)
Message-ID: <3A230437.F8318078@gdeb.com>
Date: Mon, 27 Nov 2000 20:02:47 -0500
From: Dan Eischen <deischen@gdeb.com>
X-Mailer: Mozilla 4.75 [en] (X11; U; SunOS 5.8 sun4u)
X-Accept-Language: en
MIME-Version: 1.0
To: arch@freebsd.org
Cc: julian@elischer.org, jasone@canonware.com
Subject: Re: Threads (KSE etc) comments
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I promised to post my thoughts on the User Thread Scheduler.
Here they are in hopefully some presentable form.

--
Dan Eischen

----------------------------------------------------------------------

1. Overview

This is a discussion of some of the issues in the User Thread Scheduler
in the FreeBSD KSE project.  This doesn't go into much detail on the
design and implementation, but focuses more on the scheduling models
and the (userland) API.

2. Definitions

Kernel Scheduled Entities (KSE) - This is the entity in which user
threads are scheduled.  A process may have multiple KSEs in which
to schedule threads.  A KSE may be viewed as a virtual processor
to the UTS.  The number of KSEs available for threads with contention
scope process may be limited to the actual number of processors in
the system (TBD).

Kernel Scheduled Entity Group (KSEG) - A group of KSEs that, as
outlined so far, contains the shared quantum and priority.  This is
currently under discussion.

Scheduling Allocation Domain - The number of processors over which
threads can be scheduled.  This really correlates into how many KSEs
are allocated to the process.

User Thread Scheduler (UTS) - This is responsible for scheduling user
level threads over the available processors (KSEs).

3. Scheduling Models

Threads are scheduled according to their contention scope and allocation
domain.  The contention scope, as defined by POSIX, is either system or
process contention scope (PTHREAD_SCOPE_SYSTEM and PTHREAD_SCOPE_PROCESS
respectively).  Each contention scope system thread is bound to its own
private KSE.  This KSE has its own quantum and priority (if it needs
a KSEG to obtain that, so be it) such that the thread competes on a
system-wide basis with other system scope threads.  Contention scope
system threads need not be part of any scheduling queue since there are
no other threads in competition for processing time on that KSE.

Contention scope process threads are more interesting in that there are
several possible scheduling schemes depending on the allocation domain
and on the number of quanta granted by the kernel (not including quanta
granted for scope system threads).

3.1 Single Queue, Single Allocation Domain

This is the common case where there is only 1 CPU and only 1 quantum
granted to handle scope process threads.  All runnable threads are placed
in the same queue and compete for processor time on one schedulable
entity (with 1 quantum).

3.2 Single Queue, Multiple Allocation Domain

Here we have N schedulable entities over which threads can be executed.
All runnable threads are placed in the same queue and are scheduled onto
N schedulable entities as they become available.  When a thread blocks or
completes in a schedulable entity, then another thread is pulled from
the single run queue for execution.  If a KSE is preempted while running
a thread, and in lieu of any hint from the application as to the binding
of threads to KSEs, the UTS would only resume that thread on the next
KSE if its priority was (strictly) greater than the priority of the next
thread in the run queue, or if the preempted thread was in a critical
region.
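
The resume rule above can be sketched in a few lines of Python.  This
is only an illustration under stated assumptions: the Thread class, its
fields and next_thread() are hypothetical names, larger numbers are
assumed to mean higher priority, and requeueing the preempted thread
when it loses is my guess at the natural behaviour rather than
something decided.

```python
# Sketch of the section 3.2 rule for a thread whose KSE was preempted.
# Thread, its fields, and next_thread() are hypothetical names.
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    priority: int            # assumption: larger number = higher priority
    in_critical: bool = False


def next_thread(preempted, run_queue):
    """Pick what the next available KSE runs; run_queue is sorted best-first."""
    if not run_queue:
        return preempted
    head = run_queue[0]
    if preempted.in_critical or preempted.priority > head.priority:
        return preempted      # resume the preempted thread right away
    # Otherwise the queue head runs and the preempted thread is requeued
    # (assumption: it goes behind threads of equal priority).
    run_queue.pop(0)
    run_queue.append(preempted)
    run_queue.sort(key=lambda t: -t.priority)
    return head
```

Note the strict comparison: a preempted thread with the same priority
as the queue head does not jump the queue unless it is in a critical
region.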

Whether or not the scheduled entities have their own quantum is not known
at this point, but under the current design it is possible (the UTS could
create N KSEG/KSE pairs to allow this instead of N KSEs within 1 KSEG).  

3.3 Multiple Queue, Multiple Allocation Domain

Again we have N schedulable entities over which threads can be executed,
but in this case there are also multiple (up to N) scheduling queues.  In
this model, there may exist a run queue for each schedulable entity, and
threads may be bound to a particular entity.  As with the single queue model
above, whether or not scheduled entities have their own quantum is not yet
known, but it is possible with the current design.

How does the UTS decide which threads get bound to each of the N scheduled
entities?  At the least, the application should have the ability to decide
this.  In lieu of any hint from the application, the UTS could also
provide some method of (soft) binding threads to the N scheduled entities
and optimize for maximum CPU utilization and minimum thread reassignment.
My thought is that we concentrate on keeping it simple for now, and allow
for this possibility later.

For the case where not all threads are bound to a specific KSE by the
application, there could be a global run queue from which unbound threads
are taken.  In this case there would be as many as N+1 scheduling queues,
with each KSE taking a peek at the priority of the global run queue
before deciding on taking a thread from its own run queue.  The global
run queue would not have to be locked (unless adding/removing a thread),
so this would only add the overhead of a couple of instructions to examine
its priority.  You might also have KSEs that don't have any bound threads,
in which case they wouldn't have a run queue and would always obtain
threads from the global run queue.
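
A minimal Python sketch of the N+1 queue selection, to make the "peek
at the global queue first" step concrete.  RunQueue and pick() are
hypothetical names, larger numbers are assumed to mean higher priority,
and letting ties go to the KSE's own queue is an assumption on my part.
Locking is left out entirely.

```python
# Sketch of the N+1 queue idea: one global queue plus an optional
# per-KSE queue of bound threads. All names here are hypothetical.
import heapq
import itertools

_seq = itertools.count()  # tie-breaker so equal priorities stay FIFO


class RunQueue:
    def __init__(self):
        self._heap = []

    def push(self, priority, thread):
        # Assumption: larger number = higher priority (heapq is a min-heap).
        heapq.heappush(self._heap, (-priority, next(_seq), thread))

    def top_priority(self):
        """Peek at the best priority without removing anything."""
        return -self._heap[0][0] if self._heap else None

    def pop(self):
        return heapq.heappop(self._heap)[2]


def pick(local, global_q):
    """Choose the next thread for one KSE; `local` is None if nothing is bound."""
    gp = global_q.top_priority()
    lp = local.top_priority() if local is not None else None
    if lp is None and gp is None:
        return None                      # nothing runnable for this KSE
    if lp is None or (gp is not None and gp > lp):
        return global_q.pop()
    return local.pop()                   # ties favour the KSE's own queue
```

A KSE with no bound threads simply passes None for its local queue and
always draws from the global one.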

Yet another option would be to disallow binding of threads to the
original (main) KSE and only allow binding to other KSEs.  All unbound
threads would be executed on the main KSE (and any other KSE which does
not have bound threads).  Any KSE that has bound threads would only
execute those threads.

4. API

For the most part, the POSIX API is sufficient for our needs.  But if
we want to allow application control of how threads are assigned and
scheduled on the KSEs, we could define the following set of interfaces:

  pthread_setconcurrency() - This is the POSIX interface to set the
  concurrency level.  This requests the desired concurrency level
  and informs the UTS as to how many KSEs are to be requested from
  the kernel (which the kernel may limit).  Whether or not this also
  allows additional quantum remains to be seen.  Certainly, the UTS
  could create an additional KSEG/KSE pair for each level of concurrency
  above 1 in order to achieve additional quantum under the current
  design.  This function does not change/reflect the scheduled entities
  for system scope threads.  The limited concurrency level is returned.
  If we wanted this routine to act the same as it does under Solaris,
  then it would actually request the number of entities with quantum
  and priority (KSEGs as currently defined).

  thr_create(...) - This would be an alternative to using
  pthread_setconcurrency() to set the number of KSEs.  This function
  allows an application to specify additional attributes for thread
  creation.  Solaris allows additional flags to be specified, notably
  THR_NEW_LWP and THR_BOUND.  The effect of specifying THR_BOUND is the
  same as specifying PTHREAD_SCOPE_SYSTEM.  But specifying THR_NEW_LWP
  (and omitting THR_BOUND) allocates an additional LWP that can be used
  to schedule unbound (scope process) threads.  We could provide a
  similar flag THR_NEW_KSE (or THR_NEW_KSEG) that could tell the UTS
  to request an additional KSE (or KSEG) to be used to schedule scope
  process threads.  I'm not too keen on this interface as an alternative
  to pthread_setconcurrency(), but perhaps it has some merit if we
  want a Solaris-like API.

  _kse_self() - returns the current KSE ID, where the integer ID ranges
  from 0 (for the original KSE of the main process) to M-1 (where M
  is the total number of KSEs in the process).  Solaris provides a
  similar function _lwp_self().

  pthread_bind_np(pthread_t pthread, int kse_id) - binds a given thread
  to the specified KSE.  A kse_id of -1 refers to the current KSE.  I
  suppose this could also be called _kse_bind() depending on how you
  looked at it.

  _kse_bind(int kse_id, int processor) - binds a KSE to a particular
  processor.  This might also be called _cpu_bind() if you use _kse_bind
  instead of pthread_bind_np.  Solaris has processor_bind() which can
  handle both LWPs and PIDs, and pset_bind() which allows binding of
  LWPs or PIDs to processor sets.  This is probably getting a little
  ahead of ourselves, but something to think about anyways.
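
To make the ID conventions above a bit more concrete, here is a rough
sketch of the bookkeeping a UTS might keep for these bindings.  None of
this is an agreed interface: struct uts_model, KSE_ID_CURRENT and the
model_*() functions are names invented for illustration, and accepting
-1 in the KSE-to-processor call is an assumption (the text only defines
it for pthread_bind_np).

```c
#include <assert.h>
#include <errno.h>

#define MODEL_MAX_KSES		8	/* arbitrary cap for this sketch */
#define MODEL_MAX_THREADS	16
#define KSE_ID_CURRENT		(-1)	/* "the current KSE", as in the text */
#define MODEL_UNBOUND		(-1)	/* no binding recorded */

struct uts_model {
	int nkses;				/* M: KSE IDs run 0 .. M-1 */
	int ncpus;
	int current_kse;			/* KSE the caller is running on */
	int thread_kse[MODEL_MAX_THREADS];	/* thread -> KSE binding */
	int kse_cpu[MODEL_MAX_KSES];		/* KSE -> processor binding */
};

void model_init(struct uts_model *m, int nkses, int ncpus)
{
	int i;

	assert(nkses >= 1 && nkses <= MODEL_MAX_KSES && ncpus >= 1);
	m->nkses = nkses;
	m->ncpus = ncpus;
	m->current_kse = 0;		/* original KSE of the main process */
	for (i = 0; i < MODEL_MAX_THREADS; i++)
		m->thread_kse[i] = MODEL_UNBOUND;
	for (i = 0; i < MODEL_MAX_KSES; i++)
		m->kse_cpu[i] = MODEL_UNBOUND;
}

/* Stand-in for the proposed _kse_self(). */
int model_kse_self(const struct uts_model *m)
{
	return m->current_kse;
}

/* Stand-in for the proposed pthread_bind_np(); returns 0 or an errno value. */
int model_thread_bind(struct uts_model *m, int thread, int kse_id)
{
	if (thread < 0 || thread >= MODEL_MAX_THREADS)
		return ESRCH;
	if (kse_id == KSE_ID_CURRENT)
		kse_id = m->current_kse;
	if (kse_id < 0 || kse_id >= m->nkses)
		return EINVAL;
	m->thread_kse[thread] = kse_id;
	return 0;
}

/* Stand-in for the proposed _kse_bind(); returns 0 or an errno value. */
int model_kse_bind(struct uts_model *m, int kse_id, int processor)
{
	if (kse_id == KSE_ID_CURRENT)
		kse_id = m->current_kse;
	if (kse_id < 0 || kse_id >= m->nkses)
		return EINVAL;
	if (processor < 0 || processor >= m->ncpus)
		return EINVAL;
	m->kse_cpu[kse_id] = processor;
	return 0;
}
```

The model only records bindings in userland; whether the kernel would
honour a KSE-to-processor binding is exactly the open question.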

For a moment, let's make the assumption that each KSE has a priority
and quantum, or that we always use a KSEG with one KSE to achieve the
same effect.  We _could_ now present an interface that is very similar
to that provided by Solaris.  True, perhaps a KSE (or KSEG) is not as
heavy as an LWP in Solaris, but that is just an internal implementation
issue.  To the application, they are seen as very much similar things.
If we provide an API that is very similar to that provided by Solaris,
that would make porting Solaris applications trivial.  Again, something
to think about.

5. Interaction of Existing Scheduling Interfaces

We currently have the following interfaces that affect process scheduling:

  setpriority()
  rtprio()
  sched_setparam()
  sched_setscheduler()
  any others?

In a threaded process, I think these should operate on the entity that
contains the quantum and priority, not the process.  Whether that is a
KSE or a KSEG, I don't know.  If it is a KSEG, then that's the only
case I can see for forcing the threads library to know anything about
KSEGs.  Still, the kernel is responsible for setting these priorities,
not the UTS, so it wouldn't be strictly necessary for the UTS to have
any knowledge about KSEGs.
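
For reference, a small sketch of how two of these interfaces are
addressed today.  Both calls below take a process ID (0 meaning the
calling process), which is the granularity in question.  The struct
proc_sched and query_proc_sched() names are made up for the example;
sched_getscheduler() and getpriority() are the standard calls.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/time.h>
#include <sys/resource.h>

struct proc_sched {
	int policy;	/* e.g. SCHED_OTHER, SCHED_FIFO or SCHED_RR */
	int nice;	/* nice value as reported by getpriority() */
};

/*
 * Read the scheduling policy and nice value of the calling process.
 * Returns 0 on success, -1 on failure with errno set.
 */
int query_proc_sched(struct proc_sched *out)
{
	int policy, nice;

	assert(out != NULL);

	policy = sched_getscheduler(0);		/* 0 = calling process */
	if (policy == -1)
		return -1;

	/* -1 is a legal nice value, so errno must be checked as well. */
	errno = 0;
	nice = getpriority(PRIO_PROCESS, 0);	/* 0 = calling process */
	if (nice == -1 && errno != 0)
		return -1;

	out->policy = policy;
	out->nice = nice;
	return 0;
}
```

Nothing in either call names a thread, a KSE or a KSEG, so any
finer-grained meaning in a threaded process would have to be defined
by us.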

6. Summary

I'd like some resolution as to what interfaces the threads library
should provide to the application.  I've outlined some of my thoughts
above and I'd like some feedback.  My biggest question is do we want
to provide the ability for a threaded application to request more
scheduling time (aside from PTHREAD_SCOPE_SYSTEM threads)?  I've
already seen applications that always use PTHREAD_SCOPE_SYSTEM when
creating threads.  I suppose this is mostly to obtain as
much CPU as possible.  At USENIX, Jason and I attended a BOF on threads
and it was kind of amazing to me that folks seemed to prefer the
LinuxThreads model.  Given this attitude, I don't think it makes
sense to attempt to restrict an application to only 1 (or N where
N = # of CPUs) quantum; we'll just end up with applications that
always use system scope threads.
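
The "always use PTHREAD_SCOPE_SYSTEM" habit mentioned above looks
roughly like the sketch below.  The pthread_attr_*() calls are standard
POSIX; the helper name attr_init_system_scope() is made up for the
example.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Initialise *attr and request system contention scope for threads
 * created with it.  Returns 0 or an error number; on failure the
 * attribute object is destroyed again.
 */
int attr_init_system_scope(pthread_attr_t *attr)
{
	int error;

	assert(attr != NULL);

	error = pthread_attr_init(attr);
	if (error != 0)
		return error;

	error = pthread_attr_setscope(attr, PTHREAD_SCOPE_SYSTEM);
	if (error != 0)
		pthread_attr_destroy(attr);
	return error;
}
```

An application would pass the resulting attribute object to
pthread_create() for every thread it starts, which bypasses whatever
limit we put on process scope threads.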

Do we want to provide a method of binding threads to KSEs, and KSEs
to processors?  Binding threads to KSEs isn't really that hard to
implement in the UTS, and I wouldn't think it too difficult for the
kernel to bind KSEs to processors either (?).  Some KSEs may be
automatically bound to processors, but others might not; KSEs allocated
for system scope threads, or KSEs allocated (for process scope threads)
above and beyond the number of CPUs (assuming we allow this).

I'd like to resolve these issues (any others?) very soon so I can
concentrate on more of the UTS details (like what is the communication
channel between the kernel and the UTS).  At some point, it may be
worthwhile to have a telecon or IRC (never tried it) because it could
take too long via this mailing list.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Nov 27 17:41:41 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from winston.osd.bsdi.com (winston.osd.bsdi.com [204.216.27.229])
	by hub.freebsd.org (Postfix) with ESMTP id 3397D37B4CF
	for <arch@FreeBSD.ORG>; Mon, 27 Nov 2000 17:41:39 -0800 (PST)
Received: from winston.osd.bsdi.com (localhost [127.0.0.1])
	by winston.osd.bsdi.com (8.11.1/8.11.1) with ESMTP id eAS1fYh53354;
	Mon, 27 Nov 2000 17:41:34 -0800 (PST)
	(envelope-from jkh@winston.osd.bsdi.com)
To: Kirk McKusick <mckusick@mckusick.com>
Cc: arch@FreeBSD.ORG, rps@merlin.mat.uc.pt
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) 
In-Reply-To: Message from Kirk McKusick <mckusick@mckusick.com> 
   of "Mon, 27 Nov 2000 14:41:05 PST." <200011272241.OAA93364@beastie.mckusick.com> 
Date: Mon, 27 Nov 2000 17:41:33 -0800
Message-ID: <53352.975375693@winston.osd.bsdi.com>
From: Jordan Hubbard <jkh@winston.osd.bsdi.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> When we first implemented termios at CSRG, we had an erase2
> character. Mike Karels was vehemently opposed to it, and
> insisted that it be deleted before we did our next release
> (4.3-tahoe if I remember correctly). I am of the opinion that
> it is a good idea, and should be there. I do not believe that
> we need/want a general aliasing facility as erase is really
> the only character for which there is widespread disagreement
> over which character to use. So, my take would be to add
> erase2 and be done with it.

Well, there are the ^U vs ^X folks for line-kill (some even argue for
^W) which is why I cited it as another example; I agree that it's by
no means as prevalent as ^H vs DEL though.

That said, I'm still not fully convinced that termios was implemented
in a fully sane fashion to begin with.  If one uses a fairly competent
shell like bash, for example, you have a "bind" command which allows
you to map any key to any function and I've used that feature to good
effect in my .bashrc so I'd have a hard time with any argument that
fully bindable keys is an over-engineered solution.  The major
drawback, of course, is that these editing characters are only useful
at the shell prompt and not with other programs which take input,
which is why readline(3) type functionality would really not be such a
horrible thing to see in termios(4).

Back in the day when a really bloated kernel was a couple of hundred
kilobytes I'd also probably have been shot at dawn for even making
such a suggestion, but I'm hoping that times have changed enough
that my life will be spared for doing so. :)

- Jordan




From owner-freebsd-arch  Mon Nov 27 18: 2:24 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from winston.osd.bsdi.com (winston.osd.bsdi.com [204.216.27.229])
	by hub.freebsd.org (Postfix) with ESMTP id 8990937B4CF
	for <arch@FreeBSD.ORG>; Mon, 27 Nov 2000 18:02:22 -0800 (PST)
Received: from winston.osd.bsdi.com (localhost [127.0.0.1])
	by winston.osd.bsdi.com (8.11.1/8.11.1) with ESMTP id eAS22Hh53509;
	Mon, 27 Nov 2000 18:02:17 -0800 (PST)
	(envelope-from jkh@winston.osd.bsdi.com)
Cc: Kirk McKusick <mckusick@mckusick.com>, arch@FreeBSD.ORG,
	rps@merlin.mat.uc.pt
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) 
In-Reply-To: Message from Jordan Hubbard <jkh@winston.osd.bsdi.com> 
   of "Mon, 27 Nov 2000 17:41:33 PST." <53352.975375693@winston.osd.bsdi.com> 
Date: Mon, 27 Nov 2000 18:02:17 -0800
Message-ID: <53507.975376937@winston.osd.bsdi.com>
From: Jordan Hubbard <jkh@winston.osd.bsdi.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> Back in the day when a really bloated kernel was a couple of hundred
> kilobytes I'd also probably have been shot at dawn for even making
> such a suggestion, but I'm hoping that times have changed enough
> that my life will be spared for doing so. :)

Just to follow up to myself, I should also note that I'm just lightly
kvetching with my suggestion that termios(4) should be extended.  I
don't intend it as a rejection of the original patches by
Mr. Salgueiro and it does appear that there is wide-spread support for
them so I'll probably just commit them until such time (probably right
around the time that our Sun enters the red giant cycle) as termios(4)
grows more general functionality.

- Jordan




From owner-freebsd-arch  Mon Nov 27 20:54: 2 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.rpi.edu (mail.rpi.edu [128.113.100.7])
	by hub.freebsd.org (Postfix) with ESMTP id D2A6A37B479
	for <arch@FreeBSD.ORG>; Mon, 27 Nov 2000 20:54:00 -0800 (PST)
Received: from [128.113.24.47] (gilead.acs.rpi.edu [128.113.24.47])
	by mail.rpi.edu (8.9.3/8.9.3) with ESMTP id XAA15352;
	Mon, 27 Nov 2000 23:53:52 -0500
Mime-Version: 1.0
X-Sender: drosih@mail.rpi.edu
Message-Id: <p0433011cb648e840ae6b@[128.113.24.47]>
In-Reply-To: <52694.975362925@winston.osd.bsdi.com>
References: <52694.975362925@winston.osd.bsdi.com>
Date: Mon, 27 Nov 2000 23:53:51 -0500
To: Jordan Hubbard <jkh@winston.osd.bsdi.com>, arch@FreeBSD.ORG
From: Garance A Drosihn <drosih@rpi.edu>
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE
 ISO image for x86 updated.)
Cc: rps@merlin.mat.uc.pt
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

At 2:08 PM -0800 11/27/00, Jordan Hubbard wrote:
>On the other hand, I can see that this specific case
>(erase) is by far the most significant.  Which is why I'm
>forwarding this to arch - this is one of those classic
>architecture/feature trade-off decisions and I would like
>to hear more opinions before deciding which way I'd like
>to respond to this.

Due to the variety of unixes that I have to deal with,
and the variety of ways I connect to them, I am forever
having headaches with the erase character.  Conceptually,
I am not thrilled with the idea of having a special
"erase2" option in stty.  But I'm so fed up with del vs ^H
in my own day-to-day operations that any improvement would
be welcome.

I wouldn't mind a more architecturally grand solution,
but this would be helpful enough that I'd be happy to
see it, if someone has already written the changes to
make it happen.
-- 
Garance Alistair Drosehn            =   gad@eclipse.acs.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu




From owner-freebsd-arch  Mon Nov 27 20:57:25 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135])
	by hub.freebsd.org (Postfix) with ESMTP id B1BCC37B479
	for <arch@FreeBSD.ORG>; Mon, 27 Nov 2000 20:57:22 -0800 (PST)
Received: (from daemon@localhost)
	by smtp05.primenet.com (8.9.3/8.9.3) id VAA14111;
	Mon, 27 Nov 2000 21:57:51 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp05.primenet.com, id smtpdAAAylaaUA; Mon Nov 27 21:56:55 2000
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id VAA25380;
	Mon, 27 Nov 2000 21:56:15 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200011280456.VAA25380@usr08.primenet.com>
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.)
To: jkh@winston.osd.bsdi.com (Jordan Hubbard)
Date: Tue, 28 Nov 2000 04:56:14 +0000 (GMT)
Cc: mckusick@mckusick.com (Kirk McKusick), arch@FreeBSD.ORG,
	rps@merlin.mat.uc.pt
In-Reply-To: <53352.975375693@winston.osd.bsdi.com> from "Jordan Hubbard" at Nov 27, 2000 05:41:33 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> That said, I'm still not fully convinced that termios was implemented
> in a fully sane fashion to begin with.  If one uses a fairly competent
> shell like bash, for example, you have a "bind" command which allows
> you to map any key to any function and I've used that feature to good
> effect in my .bashrc so I'd have a hard time with any argument that
> fully bindable keys is an over-engineered solution.  The major
> drawback, of course, is that these editing characters are only useful
> at the shell prompt and not with other programs which take input,
> which is why readline(3) type functionality would really not be such a
> horrible thing to see in termios(4).

Shades of VMS' CTERM protocol...

One nice thing that VMS did was to implement a state machine
in their tty driver; this let them do nice things, like
session switching on VT3xx terminals.  It also let you know
when the terminal was in the base state, as opposed to being
in the middle of processing an escape sequence, so you could
do things like modify the contents of a status line, or turn
transparent printing on, send some data, and turn it back off,
all without worrying about managing multiplexing yourself.

Computone and Intelliport did this in a general way for
more than just ANSI terminals (the only thing VMS worked with)
with their Xenix and UNIX drivers by downloading the state
tree down to the driver when the terminal type was set.  They
didn't support session switching, but they did support a tty
and printer device, muxed in the kernel, to let them support
a printer off the back of a terminal.  I'm actually aware of
a video rental chain that used these cards with the mux drivers
and Wyse terminals to support receipt printing, and most of the
systems are still in use today.

All that said, I think that terminals are probably only going
to become less and less common, as time goes on, and that it
would be a lot of effort spent for naught to get readline or
similar functionality into FreeBSD's drivers.

Actually, this would probably be a perfect application for a
Streams module...


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-arch  Mon Nov 27 22: 4:39 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from winston.osd.bsdi.com (winston.osd.bsdi.com [204.216.27.229])
	by hub.freebsd.org (Postfix) with ESMTP id BCA8137B4D7
	for <arch@FreeBSD.ORG>; Mon, 27 Nov 2000 22:04:36 -0800 (PST)
Received: from winston.osd.bsdi.com (localhost [127.0.0.1])
	by winston.osd.bsdi.com (8.11.1/8.11.1) with ESMTP id eAS64Sh51161;
	Mon, 27 Nov 2000 22:04:28 -0800 (PST)
	(envelope-from jkh@winston.osd.bsdi.com)
To: Terry Lambert <tlambert@primenet.com>
Cc: mckusick@mckusick.com (Kirk McKusick), arch@FreeBSD.ORG,
	rps@merlin.mat.uc.pt
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) 
In-Reply-To: Message from Terry Lambert <tlambert@primenet.com> 
   of "Tue, 28 Nov 2000 04:56:14 GMT." <200011280456.VAA25380@usr08.primenet.com> 
Date: Mon, 27 Nov 2000 22:04:27 -0800
Message-ID: <51159.975391467@winston.osd.bsdi.com>
From: Jordan Hubbard <jkh@winston.osd.bsdi.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> All that said, I think that terminals are probably only going
> to become less and less common, as time goes on, and that it
> would be a lot of effort spent for naught to get readline or
> similar functionality into FreeBSD's drivers.

This doesn't just apply to terminals, this applies to anyone trying to
use a PTY through a remote session using anything from a Sun keyboard
to a Microsoft Unnatural keyboard.

> Actually, this would probably be a perfect application for a
> Streams module...

OK, I'm sorry, but we have to kill you now.

- Jordan




From owner-freebsd-arch  Tue Nov 28  6:28:45 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from point.osg.gov.bc.ca (point.osg.gov.bc.ca [142.32.102.44])
	by hub.freebsd.org (Postfix) with ESMTP id C2DCE37B400
	for <arch@FreeBSD.ORG>; Tue, 28 Nov 2000 06:28:42 -0800 (PST)
Received: (from daemon@localhost)
	by point.osg.gov.bc.ca (8.8.7/8.8.8) id GAA12929;
	Tue, 28 Nov 2000 06:28:00 -0800
Received: from passer.osg.gov.bc.ca(142.32.110.29)
 via SMTP by point.osg.gov.bc.ca, id smtpda12927; Tue Nov 28 06:27:40 2000
Received: (from uucp@localhost)
	by passer.osg.gov.bc.ca (8.11.1/8.9.1) id eASERYE06652;
	Tue, 28 Nov 2000 06:27:34 -0800 (PST)
Received: from cwsys9.cwsent.com(10.2.2.1), claiming to be "cwsys.cwsent.com"
 via SMTP by passer9.cwsent.com, id smtpdzp6650; Tue Nov 28 06:26:49 2000
Received: (from uucp@localhost)
	by cwsys.cwsent.com (8.11.1/8.9.1) id eASEQmU13919;
	Tue, 28 Nov 2000 06:26:48 -0800 (PST)
Message-Id: <200011281426.eASEQmU13919@cwsys.cwsent.com>
Received: from localhost.cwsent.com(127.0.0.1), claiming to be "cwsys"
 via SMTP by localhost.cwsent.com, id smtpde13915; Tue Nov 28 06:26:05 2000
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
Reply-To: Cy Schubert - ITSD Open Systems Group <Cy.Schubert@uumail.gov.bc.ca>
From: Cy Schubert - ITSD Open Systems Group <Cy.Schubert@uumail.gov.bc.ca>
X-OS: FreeBSD 4.2-RELEASE
X-Sender: cy
To: Daniel Eischen <eischen@vigrid.com>
Cc: Matt Dillon <dillon@earth.backplane.com>,
	Jordan Hubbard <jkh@winston.osd.bsdi.com>, arch@FreeBSD.ORG
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE 
 ISO image for x86 updated.)
In-reply-to: Your message of "Mon, 27 Nov 2000 18:39:45 EST."
             <Pine.SUN.3.91.1001127183815.21757A-100000@pcnet1.pcnet.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Tue, 28 Nov 2000 06:26:05 -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In message <Pine.SUN.3.91.1001127183815.21757A-100000@pcnet1.pcnet.com>,
Daniel Eischen writes:
> On Mon, 27 Nov 2000, Matt Dillon wrote:
> >     This is one of those things where, 10 years ago, I would probably
> >     have been a purist and been opposed to it.
> > 
> >     But after 15+ years of pure hell having to deal with every
> >     conceivable combination of ^H and ^?, terminal types,
> >     telnet, rlogin, ssh, and so on and so forth...  I say to
> >     hell with the purist view on this one.  I'd love to
> >     see this committed!
> 
> I agree!  Commit this now before there are any objections!

Let's do it before it becomes worthy of a bikeshed debate.


Regards,                       Phone:  (250)387-8437
Cy Schubert                      Fax:  (250)387-5766
Team Leader, Sun/DEC Team   Internet:  Cy.Schubert@osg.gov.bc.ca
Open Systems Group, ITSD, ISTA
Province of BC




From owner-freebsd-arch  Tue Nov 28  9: 2:14 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.interware.hu (mail.interware.hu [195.70.32.130])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9093537B400; Tue, 28 Nov 2000 09:02:11 -0800 (PST)
Received: from nairobi-20.budapest.interware.hu ([195.70.50.212] helo=elischer.org)
	by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian))
	id 140o8f-0003Ty-00; Tue, 28 Nov 2000 18:01:53 +0100
Message-ID: <3A23E4F7.8E42EB3E@elischer.org>
Date: Tue, 28 Nov 2000 09:01:43 -0800
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Marius Bendiksen <mbendiks@eunet.no>
Cc: Alfred Perlstein <bright@wintelcom.net>,
	Daniel Eischen <eischen@vigrid.com>, John Baldwin <jhb@FreeBSD.ORG>,
	Jonathan Lemon <jlemon@flugsvamp.com>, arch@FreeBSD.ORG
Subject: Re: Thread-specific data and KSEs
References: <Pine.BSF.4.05.10011270847060.54186-100000@login-1.eunet.no>
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Marius Bendiksen wrote:
> 
> > > It's just one more register that has to be saved.  I don't
> > > think it's going to matter much.
> > No extra TLB faults/invalidations?  Aren't segment registers
> > somewhat expensive to load?
> 
> Upon loading a task state (with ltr or a gate), you will restore all
> segment registers from the tss, regardless of their content, and a load of
> the shadow portion of the segment will be attempted anyway. I don't think
> this is the right place to shave off cycles, nor do I think the speed is
> even the most relevant issue for this extension, but rather the abuse of
> segments that are meant to hold real data.

We don't use TSS to swap between processes..

> 
> Marius
> 

-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000
---> X_.---._/  presently in:  Budapest
            v




From owner-freebsd-arch  Tue Nov 28  9:11:46 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.interware.hu (mail.interware.hu [195.70.32.130])
	by hub.freebsd.org (Postfix) with ESMTP id 4017037B401
	for <arch@freebsd.org>; Tue, 28 Nov 2000 09:11:44 -0800 (PST)
Received: from nairobi-20.budapest.interware.hu ([195.70.50.212] helo=elischer.org)
	by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian))
	id 140oIB-00046E-00; Tue, 28 Nov 2000 18:11:43 +0100
Message-ID: <3A227FF2.FD2CC41E@elischer.org>
Date: Mon, 27 Nov 2000 07:38:26 -0800
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: arch@FreeBSD.ORG
Cc: Daniel Eischen <eischen@vigrid.com>
Subject: Re: Thread-specific data and KSEs
References: <Pine.BSF.4.05.10011270847060.54186-100000@login-1.eunet.no>
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


One thing I just realised:

If we are using deferred FP state saving and restoring in the kernel, then we
will have troubles with that when switching threads in userland, since the
handler for that is in the kernel. Of course we could set the place for it in
the KSE mailbox and let the kernel save the information when it needs it.

Julian


-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000
---> X_.---._/  presently in:  Budapest
            v




From owner-freebsd-arch  Tue Nov 28 10:13:25 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP id C7E6937B400
	for <arch@FreeBSD.ORG>; Tue, 28 Nov 2000 10:13:22 -0800 (PST)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id NAA21649;
	Tue, 28 Nov 2000 13:12:54 -0500 (EST)
Date: Tue, 28 Nov 2000 13:12:53 -0500 (EST)
From: Daniel Eischen <eischen@vigrid.com>
To: Julian Elischer <julian@elischer.org>
Cc: arch@FreeBSD.ORG
Subject: Re: Thread-specific data and KSEs
In-Reply-To: <3A227FF2.FD2CC41E@elischer.org>
Message-ID: <Pine.SUN.3.91.1001128124003.15989A-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Mon, 27 Nov 2000, Julian Elischer wrote:
> 
> One thing I just realised:
> 
> If we are using deferred FP state saving and restoring in the kernel, then we
> will have troubles with that when switching threads in userland, since the
> handler for that is in the kernel. Of course we could set the place for it in
> the KSE mailbox and let the kernel save the information when it needs it.

Our current threads library knows when to save and restore FP state;
it currently only happens when a signal is received (for i386, I think
alpha FP state is always saved both in jmp_buf and ucontext_t).

I think we want to avoid saving and restoring FP state unless it's
necessary.  That's probably only when a fault occurs or when the
KSE is preempted.  I like the idea of having the kernel save the
FP state in the thread state storage area (ucontext_t?) in the
KSE mailbox thingy.

Also, are we going to allow the kernel to follow links out of
the mailbox, or are we going to limit UTS<->kernel communication
to just this one page?  I think it might be preferable to only
communicate via the mailbox and never have the kernel attempt
to read/write to other areas of KSE/thread storage.  For instance,
we could place the pointer to the thread state storage area
in the mailbox.  But that would require a copyin, and then a
copyout to another page that might be paged out.  The drawback
of only using the mailbox is that it requires an additional copy
by the UTS every time an upcall is made (to copy the thread state
from the mailbox to the storage area in the thread).
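
To illustrate the mailbox-only variant, here is a rough userland model.
Every name in it (struct model_mailbox, model_kernel_save(),
model_uts_upcall() and so on) is hypothetical, and the register and FP
areas are just placeholder arrays.  The "kernel" writes only into the
mailbox, and the UTS pays for the extra copy on each upcall.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct model_context {
	unsigned long regs[8];		/* placeholder register file */
	unsigned char fpstate[64];	/* placeholder FP save area */
	bool fp_valid;			/* FP area holds real state */
};

struct model_thread {
	struct model_context ctx;	/* per-thread storage owned by UTS */
};

struct model_mailbox {
	struct model_context ctx;	/* written by the "kernel" */
	bool ctx_pending;		/* ctx not yet collected by the UTS */
	struct model_thread *owner;	/* set and followed by the UTS only */
};

/* Kernel side: save the interrupted state into the mailbox only. */
void model_kernel_save(struct model_mailbox *mb,
    const struct model_context *live, bool fp_used)
{
	assert(mb != NULL && live != NULL);

	mb->ctx = *live;
	mb->ctx.fp_valid = fp_used;
	if (!fp_used)	/* FP state is saved only when it is needed */
		memset(mb->ctx.fpstate, 0, sizeof(mb->ctx.fpstate));
	mb->ctx_pending = true;
}

/*
 * UTS side, run on upcall: copy the state from the mailbox into the
 * owning thread's storage.  Returns true if a context was copied.
 */
bool model_uts_upcall(struct model_mailbox *mb)
{
	assert(mb != NULL);

	if (!mb->ctx_pending || mb->owner == NULL)
		return false;
	mb->owner->ctx = mb->ctx;	/* the additional copy */
	mb->ctx_pending = false;
	return true;
}
```

In the other variant the kernel would follow mb->owner itself and write
straight into the thread's storage, trading the copy for a copyin and
a copyout to a page that may not be resident.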

-- 
Dan Eischen




From owner-freebsd-arch  Tue Nov 28 11: 3:47 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mailhost.iprg.nokia.com (mailhost.iprg.nokia.com [205.226.5.12])
	by hub.freebsd.org (Postfix) with ESMTP id 551CA37B400
	for <arch@FreeBSD.ORG>; Tue, 28 Nov 2000 11:03:45 -0800 (PST)
Received: from darkstar.iprg.nokia.com (darkstar.iprg.nokia.com [205.226.5.69])
	by mailhost.iprg.nokia.com (8.9.3/8.9.3-GLGS) with ESMTP id LAA21472;
	Tue, 28 Nov 2000 11:03:45 -0800 (PST)
Received: (from root@localhost)
	by darkstar.iprg.nokia.com (8.11.0/8.11.0-DARKSTAR) id eASJ3h621434;
	Tue, 28 Nov 2000 11:03:43 -0800
X-Virus-Scanned:  Tue, 28 Nov 2000 11:03:43 -0800 Nokia Silicon Valley Email Exploit Scanner
Received: from dhcp-15-155.iprg.nokia.com (205.226.15.155, claiming to be "iprg.nokia.com")
	by darkstar.iprg.nokia.com(WTS.12.69) smtpdjDaDWg; Tue, 28 Nov 2000 11:02:13 PST
Message-ID: <3A240337.D8109556@iprg.nokia.com>
Date: Tue, 28 Nov 2000 11:10:47 -0800
From: Michael Williams <michaelw@IPRG.nokia.com>
Organization: Nokia
X-Mailer: Mozilla 4.7 [en] (Win98; U)
X-Accept-Language: en,pdf
MIME-Version: 1.0
To: Jason Evans <jasone@canonware.com>
Cc: Julian Elischer <julian@elischer.org>, arch@FreeBSD.ORG
Subject: Re: Threads (KSE etc) comments
References: <Pine.SUN.3.91.1001121160717.7102A-100000@pcnet1.pcnet.com> <3A1B0B64.6D694248@elischer.org> <3A211C82.2464D07E@elischer.org> <20001127154800.M4140@canonware.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

FYI SPARC machines from Sun or Fujitsu do this both in h/w or just in software
i.e. cpu_offline().

Michael

Jason Evans wrote:

One example of why enforcing KSE/KSEG limits could become hard in the
future is if the number of processors is dynamic (i.e. processors can be
added and removed).  In discussions I've had with Mike Smith, this is a
very real possibility, and is something we should keep in mind.




From owner-freebsd-arch  Tue Nov 28 12:48:43 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.interware.hu (mail.interware.hu [195.70.32.130])
	by hub.freebsd.org (Postfix) with ESMTP id 35DD837B402
	for <arch@freebsd.org>; Tue, 28 Nov 2000 12:48:37 -0800 (PST)
Received: from luanda-25.budapest.interware.hu ([195.70.51.25] helo=elischer.org)
	by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian))
	id 140rg1-0004xc-00; Tue, 28 Nov 2000 21:48:34 +0100
Message-ID: <3A24064C.A3DF52A8@elischer.org>
Date: Tue, 28 Nov 2000 11:23:56 -0800
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Jason Evans <jasone@canonware.com>
Cc: arch@FreeBSD.ORG
Subject: Re: Threads (KSE etc) comments
References: <Pine.SUN.3.91.1001121160717.7102A-100000@pcnet1.pcnet.com> <3A1B0B64.6D694248@elischer.org> <3A211C82.2464D07E@elischer.org> <20001127154800.M4140@canonware.com>
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Jason Evans wrote:
> 
>
> I don't understand how we're usurping the UTS's scheduling decisions.  If a
> KSE is preempted, then an upcall (resulting in yet another preemption, if
> necessary) must be done right away in order to give the UTS enough
> information to correctly schedule threads on the KSEs that are still
> running.  This is one of the basic tenets of scheduler activations, which
> we really have to follow.

I don't think it's practical to upcall to the UTS when you preempt one of 
its KSEs. Would you hold off servicing a higher priority process while 
you are chatting with the UTS? Of course not.. If the UTS wanted a thread 
to have a priority high enough to avoid being pre-empted by  a process of 
priority X then it should have put it into a KSEG running at priority X+1.
What you DO want to do is notify the UTS at the next possible convenient 
moment of the fact. This would be at the completion of a syscall in some
other KSE, or the resumption of a previously suspended KSE.
(these are the same times when signals are delivered).  You COULD
pre-empt another KSE on another processor (if it was in a thread) and
do an upcall to its UTS, but I don't know that the complexity (including
inter-CPU communications within the kernel and delivery of an inter-CPU
interrupt) is worth it.
You really can't do anything about the period of time between the
pre-emption and the first available notification moment. We put up with 
a delay equal to, or greater than this with signal delivery today.
You cannot hold up the higher priority process. (As I said, we give a 
method by which the UTS can ensure that some threads are harder to 
pre-empt than others (KSEGs). It should use this to avoid the pre-emption 
in the first place.)
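
A rough model of the deferred notification described above, with
made-up names throughout: pre-emption only records the event, and the
UTS hears about it the next time some KSE in the group returns from a
syscall or is resumed.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_KSES	8	/* arbitrary cap for this sketch */

struct kseg_model {
	int nkses;
	bool running[MODEL_MAX_KSES];
	int unreported;		/* pre-emptions the UTS has not seen yet */
};

void kseg_init(struct kseg_model *g, int nkses)
{
	int i;

	assert(nkses >= 1 && nkses <= MODEL_MAX_KSES);
	g->nkses = nkses;
	g->unreported = 0;
	for (i = 0; i < MODEL_MAX_KSES; i++)
		g->running[i] = (i < nkses);
}

/* A higher priority process takes the CPU: record it, no upcall. */
void kseg_preempt(struct kseg_model *g, int kse)
{
	assert(kse >= 0 && kse < g->nkses);

	if (g->running[kse]) {
		g->running[kse] = false;
		g->unreported++;
	}
}

/*
 * A KSE of the group returns from a syscall or is resumed.  This is the
 * first convenient moment, so hand all pending events to the UTS.
 * Returns the number of pre-emptions reported in this upcall.
 */
int kseg_notify_point(struct kseg_model *g, int kse)
{
	int reported;

	assert(kse >= 0 && kse < g->nkses);

	g->running[kse] = true;
	reported = g->unreported;
	g->unreported = 0;
	return reported;
}
```

Between kseg_preempt() and the next kseg_notify_point() the UTS simply
does not know, which is the same window we already live with for
signal delivery.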

> 
> > If the Former (All KSEs report all events) then there is no real
> > advantage to having more than N KSEs (N processors), because that means
> > that the UTS will probably keep swapping the threads it thinks are most
> > important to the KSEs which means that the thread that was pre-empted on
> > KSE-A will probably be re-scheduled on KSE-B when it preempts KSE-A. So
> > why have KSE-B at all? All it does is massively confuse things, and
> > creates a whole new class of scheduling problems.
> 
> The main advantage I see of allowing more KSEs than processors (total
> across all KSEGs) is that it simplifies the implementation considerably.
> Very little has to be changed about how things currently work, which also
> means that single-threaded applications work the same as they do now,
> without a lot of extra work.

We are agreed that the TOTAL number of KSEs may be greater than the number
of processors.

> 
> I agree that there's no good reason to have more KSEs in a KSEG than there
> are processors, but it doesn't actually break anything to allow this, and
> simply using process resource limits to control the number of KSEs is
> simpler than enforcing limits on the number of KSEs per KSEG.

I think that if you think of a KSEG as a contention domain it gives
a different viewpoint.
Threads are assigned to a KSEG and will not contend against each other
in the system-scope. All threads in the same KSEG share the same 
CPU resources and can migrate pretty easily around the KSEs assigned to
the KSEG. (the only reason to keep them on a cpu would be for cache effects).

KSEs within a KSEG do not directly contend with each other, and KSEGs
DO contend with each other (potentially), so you can see that it would 
simplify some things if you make the following rules..

1/ Only one KSE from any given KSEG may be on any single processor at any
given time. Maybe you only shift a KSE when it's idle, or in the kernel, and
only onto a processor which doesn't already have a KSE from that KSEG on it.

2/ You only allow the number of KSEs to be <= N where N is the number of 
processors available.
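
The two rules above can be sketched in a few lines of C. This is only an
illustration: struct kseg, MAXCPU and the function names are made up for
the example and are not existing kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXCPU 4                /* hypothetical: N, the number of processors */

/* Hypothetical per-KSEG bookkeeping; not an existing kernel structure. */
struct kseg {
    int  nkse;                  /* KSEs currently in this group */
    bool on_cpu[MAXCPU];        /* true if one of our KSEs is on that CPU */
};

/* Rule 2: a KSEG may hold at most N KSEs. */
static bool
kseg_can_add_kse(const struct kseg *kg)
{
    return (kg->nkse < MAXCPU);
}

/* Rule 1: never place a second KSE from the same KSEG on one processor. */
static bool
kseg_can_place(const struct kseg *kg, int cpu)
{
    assert(cpu >= 0 && cpu < MAXCPU);
    return (!kg->on_cpu[cpu]);
}

/* Create a KSE and place it on cpu; refuses if either rule would break. */
static bool
kseg_add_kse(struct kseg *kg, int cpu)
{
    if (!kseg_can_add_kse(kg) || !kseg_can_place(kg, cpu))
        return (false);
    kg->on_cpu[cpu] = true;
    kg->nkse++;
    return (true);
}
```

With both checks in one place, a second KSE aimed at an occupied processor
is simply refused, as is an (N+1)th KSE.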

I can't prove it yet, but thinking about the implementation, I can't help
but feel in my gut that making these rules will allow some solutions to just
"fall out" that otherwise may require a lot more work.

I'm particularly worried about a KSE being pre-empted while in the UTS.
The kernel isn't going to hold off the pre-emption just because the  
process thinks it's a bad idea... it has a higher priority process 
screaming in its ear wanting cycles. If, later, we then come in on 
another KSE on the same processor, we can't really guarantee that 
we will not have a locking collision within our resources, with 
the UTS that is presently swapped out. It's not really a 
deadlock but we will take a big hit in time as we have to 
wait for the other KSE to be run again before we can get what 
would otherwise have been a very short term lock.

I'm not explaining it very well, because it sort of relies on a lot 
of other stuff in my head. 

maybe I need to write that down.. ok here goes.

In my imagination of the implementation, KSEs each have a user upcall
stack and a mailbox. The UTS is run on those stacks. Threads are 
assigned to a KSEG and the UTS will prefer to run them on a single KSE
for as long as it's easy to do so, but will often and easily switch them
between KSEs (if there are several) for load balancing. i.e. there
is very little binding of threads to KSEs within a single KSEG. If there
are only N KSEs (N=numProcessors) then locking between the UTS instances
in the same KSEG is rather trivial, and can be pretty much limited 
to brief spinlocks. Since these contentions will be more common than 
contentions between UTS agents in other KSEGs (threads won't often be 
migrating between KSEGs), keeping the locks simple is important. 
If we have N>numProcessors then we need to take into account 
(by my thinking) the potential serialisation of KSEs and pre-emptions
such as that I mentioned above. If we don't (i.e. we keep to N) then we
can allow communications between threads in the same KSEG to use a much
simpler locking and synchronisation scheme. You use a more heavyweight scheme
between threads in different KSEGs. If you want to make a program that has 
many KSEs, just put them all in different KSEGs all with the same
scheduler priority, but be prepared to pay the price of heavier 
weight communications and synchronisations. With a limit of N KSEs 
we can also experiment with such things as gang scheduling, where 
we might ask that all KSEs in a KSEG are, if possible, scheduled 
across all the processors at once. This can give massive throughput 
improvements in some applications, particularly ones where threads
are communicating with each other using a ping-pong protocol.

From the kernel's point of view we need not limit the number, but I think
that it is foolish not to do so.


> 
> One example of why enforcing KSE/KSEG limits could become hard in the
> future is if the number of processors is dynamic (i.e. processors can be
> added and removed).  In discussions I've had with Mike Smith, this is a
> very real possibility, and is something we should keep in mind.

OK I agree that this is possible. And I see that this would require that
we can pre-empt a KSE in user mode, while it is in the UTS, and
allow it to try to run to completion (get out of the UTS) somewhere else.
yuk.
> 

> The UTS doesn't need to be any more complex.  It would simply get more
> upcalls if there were more preemptions as a result of excessive KSEs, which
> I don't think would happen anyway.

As I said, I'm worried about the UTS itself.

> 
> > The reason for having KSEGs is simply as an entity that competes for CPU
> > to assure fairness.
> > It may not even exist as a separate structure in the case where there
> > are separate per-CPU scheduling queues, (though I think it would for
> > efficiency's sake). It would PROBABLY have a analogous partner in the
> > UTS that represents the virtual machine that runs all the threads that
> > are competing at the same scope.
> 
> I agree with everything you say here.
> 
> > On a single scheduling queue system, I
> > think I would have the KSEG in the queue rather than the independent
> > KSEs. When it get's to the head, you schedule
> > KSEs on all the CPUs. This allows the threads to communicate quickly
> > using shared memory should they want. The UTS has the entire quantum
> > across as many CPUs as it has.
> 
> As I mentioned in another email, I don't think we should plan on having a
> production release that is implemented with only a single scheduling queue.

fair enough

> 
> Jason

-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000
---> X_.---._/  presently in:  Budapest
            v


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Tue Nov 28 12:49: 0 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.interware.hu (mail.interware.hu [195.70.32.130])
	by hub.freebsd.org (Postfix) with ESMTP
	id E8F0D37B400; Tue, 28 Nov 2000 12:48:54 -0800 (PST)
Received: from luanda-25.budapest.interware.hu ([195.70.51.25] helo=elischer.org)
	by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian))
	id 140rgA-0004yq-00; Tue, 28 Nov 2000 21:48:43 +0100
Message-ID: <3A2419AD.43A14605@elischer.org>
Date: Tue, 28 Nov 2000 12:46:37 -0800
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: "Brian F. Feldman" <green@FreeBSD.org>
Cc: arch@FreeBSD.org, jasone@FreeBSD.org
Subject: Re: Threads .. chopping up 'struct proc'
References: <200011262239.eAQMd0576413@green.dyndns.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

"Brian F. Feldman" wrote:
> 
> Julian Elischer <julian@elischer.org> wrote:
> > I'v been looking a the proc srtucture..
> >
> > The aim is to eventually move some of the fields into a
> > struct KSE (struct schedbox?)
> > struct KSEC (struct threadcontext?)
> > struct KSEG (struct schedgroup?)
> 
> Sounds about right, as far as I've been following the discussion (I read all
> of -arch, but don't follow -smp at all since I just don't have SMP ;)
> 
> My question thus far is, okay, given a proc has one of each; will a set of
> threads, in any form, ALWAYS have a proc backing it up?  It would make sense
> as such, and in that case I'd think that you would reduce a lot of the
> complexity in the switchover.
> 
> > Initially we would simply include one of each of these in the struct proc,
> > but link them together as if they were correctly connected up.
> > we would use macros such as:
> > #define p_estcpu p_kse.kse_estcpu
> > to keep present code working....
> > eventually functions that get changed to receive a kse directly
> > would just use kse->kse_estcpu and if they need proc they
> > can use kse->kse_proc. But until then, we'd start by simply
> > separating the fields and using macros. Then we can convert
> > calls at our leasure.
> 
> What would be the difference between doing it "right" for struct proc in the
> first place rather than dummying them up?  I wouldn't want an artificial
> discrepancy here, if possible.  Perhaps you could explain a bit more of the
> vision you have here?  I haven't been able to pick that bit up from your
> posts as of yet.  A KSE of just one thread would seem to logically be
> handled the exact same as a process.
> 
> > However when going through the fields in struct proc,
> > some difficulties become obvious. Here's my initial
> > division of the fields. I've added a comment at the
> > beginning of each line that indicates where I think
> > it should go, however I'm not convinced about some of them:
> >
> > P = stays in struct proc
> > E = goes to 'KSE' struct (schedulable entity)
> > G = goes to 'group' struct
> > C = goes to 'sleepable Context' struct.
> 
> Does each KSE get a sleepable context?  I don't know if I really see where
> it fits in; sounds like it would have a 1:1 mapping with KSEs.
> 

OK, I'm only going to answer this question here, as I'm off to school in the
morning and it's 12:30 AM now.. but you have a misconception so I'll try to
clear that up quickly..

A KSE doesn't have a stack. It doesn't have any state WRT system call execution.
When a system call happens, control passes from userland to a waiting KSE that
is presently assigned to the processor you are on, and to your process. The KSE
grabs a spare "KSEC" (KSE CONTEXT) (maybe it already has one sitting ready) and
uses it. The KSEC supplies a stack and storage for anything that describes the
state of the processor at any moment during the syscall.

When the system call blocks, the KSEC is left on the sleep queue, and the KSE
grabs another one, and performs an upcall to the Userland Thread scheduler,
which schedules another thread. When THAT thread does a system call, the system
call is executed, storing a set of frames and state onto the stack in the NEW
KSEC. If, in turn, that blocks, it too is thrown onto the sleep queue.
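
A rough sketch of that hand-off, assuming a single global sleep queue and
a per-KSE list of spare KSECs. All structure and function names here are
hypothetical, chosen only to mirror the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical KSEC: would carry the kernel stack and saved syscall state. */
struct ksec {
    struct ksec *next;
    int          id;
};

/* Hypothetical KSE: owns no stack itself, only borrows a KSEC. */
struct kse {
    struct ksec *cur;           /* KSEC in use for the current syscall */
    struct ksec *spare;         /* spare KSECs ready to be grabbed */
};

static struct ksec *sleepq;     /* simplification: one sleep queue */

static void
ksec_push(struct ksec **head, struct ksec *kc)
{
    kc->next = *head;
    *head = kc;
}

static struct ksec *
ksec_pop(struct ksec **head)
{
    struct ksec *kc = *head;

    if (kc != NULL) {
        *head = kc->next;
        kc->next = NULL;
    }
    return (kc);
}

/*
 * The syscall running on ke->cur has blocked: leave that KSEC on the
 * sleep queue and grab a spare. The caller would then upcall to the UTS.
 * Returns the new KSEC, or NULL if no spare was available.
 */
static struct ksec *
kse_syscall_blocked(struct kse *ke)
{
    assert(ke->cur != NULL);
    ksec_push(&sleepq, ke->cur);
    ke->cur = ksec_pop(&ke->spare);
    return (ke->cur);
}
```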

Everything needed to complete the system calls is in the KSECs, which are
hibernating on the sleep queues. When the system call is reawakened, the
kernel waits for a scheduling event in which a KSE from that process (possibly
the same one) is being scheduled. It then reassociates the first KSEC (with its
stack and stored processor context) with that KSE and then completes the system
call (including any copyout()s or copyin()s). However, instead of crossing back
to user space when it gets back up to the boundary, it puts the syscall's return
information in the mailbox that the Thread system configured (I skipped that
bit) for that thread (don't worry it's trivial), and checks if there are any
more awakened syscalls to complete. It keeps doing this until there are no more
awakening KSECs, at which time it does an upcall to the process. This results in
the Userland Thread Scheduler (UTS) picking up all the completed threads,
deciding which is the highest priority, and running it, as if it were just
returning from the kernel.
I forgot to mention that the mailboxes for the completed threads are linked
together by the kernel before doing the upcall, and the resulting list is passed
as a single pointer to UTS.
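
The completion side might look roughly like this. It is a sketch under
assumptions: struct thread_mailbox, struct woken_ksec and their fields are
invented, and a real kernel would use copyout() rather than plain stores
into user memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread mailbox set up by the thread library. */
struct thread_mailbox {
    struct thread_mailbox *next;    /* link filled in by the kernel */
    long                   retval;  /* syscall return information */
    int                    done;    /* nonzero once the syscall completed */
};

/* Hypothetical awakened KSEC whose syscall has run to the boundary. */
struct woken_ksec {
    struct woken_ksec     *next;
    struct thread_mailbox *tmb;     /* mailbox of the thread that called */
    long                   retval;
};

/*
 * Drain the list of awakened KSECs, store each result in the owning
 * thread's mailbox, and chain the mailboxes together in order.
 * Returns the head of the chain: the single pointer handed to the UTS.
 */
static struct thread_mailbox *
collect_completed(struct woken_ksec **awake)
{
    struct thread_mailbox *head = NULL, **tailp = &head;
    struct woken_ksec *kc;

    while ((kc = *awake) != NULL) {
        *awake = kc->next;
        assert(kc->tmb != NULL);
        kc->tmb->retval = kc->retval;
        kc->tmb->done = 1;
        kc->tmb->next = NULL;
        *tailp = kc->tmb;
        tailp = &kc->tmb->next;
    }
    return (head);
}
```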

Note: the thread that was running when the KSE was pre-empted is also in the
list of threads that is returned to the UTS when the upcall happens, so the UTS
may decide to let it continue running.
It didn't voluntarily do a syscall, but it did cross to the kernel when the
timer interrupt occurred, so it can be faked up to look the same. If it was in a
critical region, then of course it should have marked that fact, so it would be
scheduled first.

A process may have a KSE for each physical processor. When it
creates a new KSE (up to the maximum of N) it sets up a KSE mailbox. When it
schedules a thread, it places a pointer to the thread mailbox in the KSE mailbox.
The KSE always knows where its mailbox is so it can always find the thread
mailbox of the thread that just made the system call. When the syscall blocks,
that thread mailbox address is stored in the KSEC, and it is zeroed out from
the KSE's mailbox. 
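
In code, the pointer shuffle just described is about this small. The names
are hypothetical, and the kernel half would really go through
copyin()/copyout() since the mailbox lives in user memory.

```c
#include <assert.h>
#include <stddef.h>

struct thread_mbox {            /* hypothetical per-thread mailbox */
    int tid;
};

struct kse_mbox {               /* hypothetical per-KSE mailbox */
    struct thread_mbox *cur_thread; /* thread now running on this KSE */
};

struct ksec_ctx {               /* hypothetical KSEC */
    struct thread_mbox *owner;  /* thread whose syscall lives here */
};

/* UTS side: note which thread is about to run on this KSE. */
static void
uts_run_thread(struct kse_mbox *km, struct thread_mbox *tm)
{
    km->cur_thread = tm;
}

/*
 * Kernel side: the syscall has blocked. Remember the thread in the
 * KSEC and zero the slot in the KSE mailbox.
 */
static void
kern_syscall_blocked(struct kse_mbox *km, struct ksec_ctx *kc)
{
    assert(km->cur_thread != NULL);
    kc->owner = km->cur_thread;
    km->cur_thread = NULL;
}
```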

When an upcall happens, the KSE adds the linked list of all completed syscalls'
mailboxes to that same KSE mailbox as well. The UTS just takes that list,
adds the threads mentioned onto its lists of runnable threads, and then makes
a scheduling decision and runs the highest priority thread. It sets the mailbox
address of that thread into the KSE's mailbox, and jumps into the thread..
etc.etc.
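
A minimal sketch of that upcall handler on the UTS side. The types are
hypothetical, there is a single run queue for simplicity, and a larger
prio value is assumed to mean higher priority.

```c
#include <assert.h>
#include <stddef.h>

struct uthread {                /* hypothetical user thread / mailbox */
    struct uthread *next;
    int             prio;       /* assumption: larger means more urgent */
};

struct kse_mailbox {            /* hypothetical per-KSE mailbox */
    struct uthread *completed;  /* list handed over by the kernel */
    struct uthread *current;    /* thread the UTS chose to run */
};

static struct uthread *runq;    /* simplification: one run queue */

/* Take the completed list, queue it, pick the best thread, record it. */
static struct uthread *
uts_upcall(struct kse_mailbox *km)
{
    struct uthread *t, *best = NULL, **bestp = NULL, **pp;

    assert(km != NULL);
    /* Move everything the kernel returned onto the run queue. */
    while ((t = km->completed) != NULL) {
        km->completed = t->next;
        t->next = runq;
        runq = t;
    }
    /* Scheduling decision: highest priority runnable thread. */
    for (pp = &runq; *pp != NULL; pp = &(*pp)->next) {
        if (best == NULL || (*pp)->prio > best->prio) {
            best = *pp;
            bestp = pp;
        }
    }
    if (best != NULL) {
        *bestp = best->next;    /* unlink from the run queue */
        best->next = NULL;
    }
    km->current = best;         /* the UTS would now jump into it */
    return (best);
}
```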

I haven't mentioned KSEGs here but if you are limited to N KSEs, you want a
container into which you can put extra competing KSEs (for example for a super
high priority thread).
Usually you just have one KSEG, but you may start another, in which case they are
treated by the system much like two separate processes, each with its own KSEs.

more later.
Julian
 
-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000
---> X_.---._/  presently in:  Budapest
            v




From owner-freebsd-arch  Tue Nov 28 14:20:55 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.interware.hu (mail.interware.hu [195.70.32.130])
	by hub.freebsd.org (Postfix) with ESMTP id 9001E37B400
	for <arch@freebsd.org>; Tue, 28 Nov 2000 14:20:47 -0800 (PST)
Received: from timbuktu-06.budapest.interware.hu ([195.70.51.198] helo=elischer.org)
	by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian))
	id 140t7A-0004y2-00; Tue, 28 Nov 2000 23:20:40 +0100
Message-ID: <3A242FAF.313295F0@elischer.org>
Date: Tue, 28 Nov 2000 14:20:31 -0800
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Daniel Eischen <eischen@vigrid.com>
Cc: arch@FreeBSD.ORG
Subject: Re: Thread-specific data and KSEs
References: <Pine.SUN.3.91.1001128124003.15989A-100000@pcnet1.pcnet.com>
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Daniel Eischen wrote:
> 
> On Mon, 27 Nov 2000, Julian Elischer wrote:
> >
> > One thing I just realised:
> >
> > If we are using defered FP state saving and restoring in the kernel, then we
> > will have troubles with that when switching threads in userland, since the
> > handler for that is in the kernel. Of course we could set the place for it in
> > the KSE mailbox and let the kernel save the information when it needs it.
> 
> Our current threads library knows when to save and restore FP state;
> it currently only happens when a signal is received (for i386, I think
> alpha FP state is always saved both in jmp_buf and ucontext_t)

That's comforting.

I was looking at the ia64 specs..
that thing presents some interesting challenges in regards to 
the 'intelligent' stack it has. It will be very hard to play 
games with its stack when it's cached inside the chip. I 
presume they have a scheme to allow such things as threads, 
but it looks a mess from here.

> 
> I think we want to avoid saving and restoring FP state unless it's
> necessary.  That's probably only when a fault occurs or when the
> KSE is preempted.  I like the idea of having the kernel save the
> FP state in the thread state storage area (ucontext_t?) in the
> KSE mailbox thingy.

The question is, what happens to the FPU context when you swap threads?
Should each thread have its own FPU context?
If there is one for the KSE, might that be not enough?
Especially if the KSE was pre-empted. If a thread is migrated to
another KSE, having last been pre-empted, it becomes important 
that the FPU state go with it because it may have been part way 
through some calculations when that KSE was stopped. And what if 
the new KSE already has one that was stopped in the same way?
It looks to me like you need to have one per thread.

It's not much different from the point of view of the kernel.
When you create a KSE you give it its mailbox.
When you schedule a thread onto the KSE you set a pointer in that
mailbox to the thread's context and state storage area.
The kernel can easily follow that link when it pre-empts the KSE
to store the general regs, the FPU regs etc.
Theoretically it might only store the regs there in a syscall 
if it looks like the syscall will block, but the aim would be 
to make all threads look the same when stopped so that the UTS
can restart any one it chooses.


> 
> Also, are we going to allow the kernel to follow links out of
> the mailbox, or are we going to limit UTS<->kernel communication
> to just this one page?  I think it might be preferable to only
> communicate via the mailbox and never have the kernel attempt
> to read/write to other areas of KSE/thread storage. 

The kernel already has to follow links etc. for (for example)
the readv() syscall. It's not that big a step. If you allocate 
all the thread-context blocks together, the pages they are in will
be pretty hot. There are great advantages to having the KSEs
able to follow links. For example it means that the kernel
can ALWAYS deliver a linked list of ALL completed and 'ready-to-run'
threads. It can set them up so that each one will look exactly 
as if it has just returned from the syscall. If you only deal
with one structure, you have to consider what happens when you 
cannot fit all returning threads into the single structure.
As the kernel takes control (as the syscall or trap is entered)
it notes where the context block is, and when and if it decides 
it needs to save context, it knows where to put it. The UTS is 
given everything on a plate, and it's almost easier to do it
this way for the kernel too. It can store this address with
the KSEC and use it without any fear of ever having a clash with
some other returning syscall (for example). I can imagine a case where
a syscall starts on one KSE and is completed on another.
It makes sense for the context to travel with the thread/KSEC
rather than the KSE, which may suddenly have 4 syscalls
all coming back within the same upcall. (Where do you save
all that data?)


> For instance,
> we could place the pointer to the thread state storage area
> in the mailbox.  But that would require a copyin, and then a
> copyout to another page that might be paged out. 

Since the thread storage is part of the thread control block 
that the UTS has just used to schedule the thread, it's  
unlikely to be paged out. Even less so if it shares a page 
with other thread control blocks.
And you could always protect it with madvise().

In any case what you suggest above is EXACTLY what Jason and I 
were planning on doing. The kernel will define a structure
in /sys/i386/include/kse.h (or somewhere) called something like
struct user_process_context
which you would include in your thread control block.
It would include a link to other such blocks (so we can 
return a linked list of completed or pre-empted threads (KSECs)),
a status word that says whether it is a completed syscall, or a
preempted thread, or whatever, and a cookie that the kernel
doesn't touch so you can extend it with whatever else you need.
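
To make that concrete, here is one possible shape for it. Only the struct
name and the three ingredients (link, status word, cookie) come from the
description above; the field names, the status values and struct my_tcb
are guesses for illustration, and the saved register area is left out
because it is machine dependent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status values for the status word. */
enum upc_status {
    UPC_RUNNING = 0,
    UPC_SYSCALL_DONE,           /* a completed syscall */
    UPC_PREEMPTED               /* a pre-empted thread */
};

struct user_process_context {
    struct user_process_context *upc_next;   /* link to other blocks */
    int                          upc_status; /* why it was returned */
    void                        *upc_cookie; /* kernel never touches this */
    /* saved general and FPU registers would follow here */
};

/* A thread library's own control block, embedding the kernel's part. */
struct my_tcb {
    int                         prio;
    struct user_process_context upc;
};

/* Use the cookie to get from a returned block back to its TCB. */
static void
tcb_init(struct my_tcb *tcb)
{
    tcb->upc.upc_next = NULL;
    tcb->upc.upc_status = UPC_RUNNING;
    tcb->upc.upc_cookie = tcb;
}

static struct my_tcb *
tcb_from_upc(struct user_process_context *u)
{
    assert(u != NULL && u->upc_cookie != NULL);
    return ((struct my_tcb *)u->upc_cookie);
}

/* Walk a returned list and count the entries with a given status. */
static int
upc_count(const struct user_process_context *list, int status)
{
    int n = 0;

    for (; list != NULL; list = list->upc_next)
        if (list->upc_status == status)
            n++;
    return (n);
}
```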


> The drawback
> of only using the mailbox is that it requires an additional copy
> by the UTS every time an upcall is made (to copy the thread state
> from the mailbox to the storage area in the thread).

You forget that a single upcall may want to return 37 completed 
syscalls or pre-empted threads.

Using my scheme.. there is an upcall, and bingo the UTS has ALL
the completed items at once. 
It sorts them onto the runnable queues, selects its favourite
and puts that address into its mailbox, loads the context,
and it's off and running the next thread.


> 
> --
> Dan Eischen
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-arch" in the body of the message

-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000
---> X_.---._/  presently in:  Budapest
            v




From owner-freebsd-arch  Tue Nov 28 15:27:24 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222])
	by hub.freebsd.org (Postfix) with ESMTP id 5B92537B401
	for <arch@FreeBSD.org>; Tue, 28 Nov 2000 15:27:22 -0800 (PST)
Received: from laptop.baldwin.cx (john@dhcp246.osd.bsdi.com [204.216.28.246])
	by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id eASNQkC95291;
	Tue, 28 Nov 2000 15:26:46 -0800 (PST)
	(envelope-from jhb@FreeBSD.org)
Message-ID: <XFMail.001128152659.jhb@FreeBSD.org>
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <3A242FAF.313295F0@elischer.org>
Date: Tue, 28 Nov 2000 15:26:59 -0800 (PST)
From: John Baldwin <jhb@FreeBSD.org>
To: Julian Elischer <julian@elischer.org>
Subject: Re: Thread-specific data and KSEs
Cc: arch@FreeBSD.org, Daniel Eischen <eischen@vigrid.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On 28-Nov-00 Julian Elischer wrote:
> Daniel Eischen wrote:
>> 
>> On Mon, 27 Nov 2000, Julian Elischer wrote:
>> >
>> > One thing I just realised:
>> >
>> > If we are using defered FP state saving and restoring in the kernel, then
>> > we
>> > will have troubles with that when switching threads in userland, since the
>> > handler for that is in the kernel. Of course we could set the place for it
>> > in
>> > the KSE mailbox and let the kernel save the information when it needs it.
>> 
>> Our current threads library knows when to save and restore FP state;
>> it currently only happens when a signal is received (for i386, I think
>> alpha FP state is always saved both in jmp_buf and ucontext_t)
> 
> That's comforting.
> 
> I was looking at the ia64 specs..
> that thing presents some interesting challenges in regards to 
> the 'intelligent' stack it has. It will be very hard to play 
> games with it's stack when it's cached inside the chip. I 
> presume they have a scheme to allow such things as threads, 
> but it looks a mess from here.

You can disable the RSE and flush it out.  This is done during context switches
for example, and to setup the stack frame for signal handling I believe, though
signal handling isn't quite finished yet.

-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/




From owner-freebsd-arch  Tue Nov 28 16:15:50 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP id E157137B404
	for <arch@FreeBSD.ORG>; Tue, 28 Nov 2000 16:15:43 -0800 (PST)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id TAA18076;
	Tue, 28 Nov 2000 19:15:20 -0500 (EST)
Date: Tue, 28 Nov 2000 19:15:19 -0500 (EST)
From: Daniel Eischen <eischen@vigrid.com>
To: Julian Elischer <julian@elischer.org>
Cc: arch@FreeBSD.ORG
Subject: Re: Thread-specific data and KSEs
In-Reply-To: <3A242FAF.313295F0@elischer.org>
Message-ID: <Pine.SUN.3.91.1001128181622.9762A-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Tue, 28 Nov 2000, Julian Elischer wrote:
> Daniel Eischen wrote:
> > I think we want to avoid saving and restoring FP state unless it's
> > necessary.  That's probably only when a fault occurs or when the
> > KSE is preempted.  I like the idea of having the kernel save the
> > FP state in the thread state storage area (ucontext_t?) in the
> > KSE mailbox thingy.
> 
> The question is, what happens to the FPU context when you swap threads?
> should each thread have it's own FPU context?

Yes, it currently does now.  It's hard to imagine every thread
not having FPU context.  A thread that is preempted needs to
have its context restored when it runs again.  You seem to
be forgetting that the UTS can choose another thread to run
and _not_ resume the preempted thread.  So the UTS chooses
another thread, runs it, and then you can have another fault
or preemption which needs to save FP state again.

> If there is one for the KSE, might that be not enough?
> especially if the KSE was pre-empted. If a thread is migrated to
> another KSE, having last been pre-empted, it becomes important 
> that the FPU state go with it because it may have been part way 
> through some calculations when that KSE was stopped. And what if 
> the new KSE already has one that wsa stopped in thasame way?
> it looks to me like you need to have one per thread.

Ahh, exactly.

> It's not much different fromt eh point of view of the kernel.
> WHen you create a KSE you give it's mailbox.
> when you schedule a thread onto the KSE you set a pointer in that
> mailbox to the thread's context and state storage area.
> The kernel can easily follow that link when it pre-empts the KSE
> to store the General regs, the FPU regs etc.
> Theoretically it might only store the regs there in a syscall 
> if it looks like the syscall will block. but the aim would be 
> to make allthreads look the same when stopped so that the UTS
> can restart any one it chooses.

That would be my goal also.

> > Also, are we going to allow the kernel to follow links out of
> > the mailbox, or are we going to limit UTS<->kernel communication
> > to just this one page?  I think it might be preferable to only
> > communicate via the mailbox and never have the kernel attempt
> > to read/write to other areas of KSE/thread storage. 
> 
> The kernel already has to follow links etc for (for example)
> the readv() syscall. it's not that big a step. If you allocate 
> all the thread-context blocks together, the pages they are in will
> be pretty hot. There are great advantages to having the KSEs
> being able to follow links. For example it means that the kernel
> can ALWAYS deliver a linked list of ALL completed and 'ready-to-run'
> threads. It can set them up so that each one will look exactly 
> as if it has just returned from the syscall. If you only deal
> with one structure, you have to consider what happens when you 
> cannot fit all returning threads into the single structure.

You should only have to copy context back to the KSE for
one thread.  When a thread blocks in the kernel, you don't
need to wait to copyout its context.  You can do it immediately.
All you need to pass out when the thread is resumed in the
kernel and ready to return to userland is the return value
from the system call.  For faults and preemptions there are
no return values I'd guess.

> As the kernel takes control (as the syscall or trap is entered)
> it notes where the context block is and when and if it decide 
> it needs to save context, it knows where to put it. The UTS is 
> given everything on a plate, and it's almost easier to do it
> this way for the kernel too. It can store this address with
> the KSEC and use it without any fear of ever having a clash with
> some other returning syscall (for example). I can imagine where
> a syscall starts on one KSE and is completed on another.
> it makes sence for the context to travel with the thread/KSEC
> rather than the KSE, which may suddenly have 4 syscalls
> all coming back within the same upcall. (where do you save
>  all that data?)

Here's the way I see it.  A thread blocks in the kernel,
is preempted, or has a fault.  "There can be only one" (name the
movie!) thread running in the KSE at a time.  You copyout
the context to the KSE, _then_ make the upcall.  The KSE
upcall handler then copies the context (along with FP
state if saved) to the thread's context storage area.  Another
thread is chosen and executed.  The kernel need not follow
links; the KSE upcall handler can handle placing the context
in the thread's storage area.
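
Roughly, the upcall handler's part of this would be the following. It is
a sketch only: the structure layouts, the register counts and the size of
the FP save area are placeholders, not real i386 values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical saved context; sizes are placeholders. */
struct saved_ctx {
    unsigned long gregs[8];     /* general registers */
    unsigned char fpstate[108]; /* FP save area */
    int           fp_saved;     /* nonzero if fpstate is valid */
};

struct uthread {                /* hypothetical thread control block */
    struct saved_ctx ctx;       /* the thread's own storage area */
};

struct kse_mbox {               /* hypothetical one-page KSE mailbox */
    struct saved_ctx ctx;       /* where the kernel did its copyout */
    struct uthread  *cur;       /* thread that was running, if any */
};

/*
 * Upcall handler step: move the context the kernel left in the mailbox
 * into the interrupted thread's storage. This is the extra copy.
 */
static void
kse_upcall_save(struct kse_mbox *km)
{
    struct uthread *t;

    assert(km != NULL);
    t = km->cur;
    if (t == NULL)
        return;                 /* nothing was running on this KSE */
    memcpy(t->ctx.gregs, km->ctx.gregs, sizeof(t->ctx.gregs));
    t->ctx.fp_saved = km->ctx.fp_saved;
    if (km->ctx.fp_saved)       /* copy FP state only if it was saved */
        memcpy(t->ctx.fpstate, km->ctx.fpstate, sizeof(t->ctx.fpstate));
    km->cur = NULL;             /* the UTS is now free to pick a thread */
}
```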

When a thread becomes unblocked in the kernel, the UTS
already has its context.  All it needs now is enough information
for the return value of the system call.  If the UTS has to
munge with the context a bit, it can.

This will let you set the 1 page used for the mailbox so that
it won't be paged out and not worry about whether memory anywhere
else in the UTS/thread is paged out.

> > For instance,
> > we could place the pointer to the thread state storage area
> > in the mailbox.  But that would require a copyin, and then a
> > copyout to another page that might be paged out. 
> 
> Since the thread storage is part of the thread control block 
> that the UTS has just used to schedule the thread, it's  
> unlikely to be paged out. Even less so if it shares a page 
> with other thread control blocks
> And you could always protect it with madvise().
> 
> In any case what you suggest above is EXACTLY what jason and I 
> were planning on doing. The kernel will define a structure
> in /sys/i386/include/kse.h (or somewhere) called something like
> struct user_process_context
> which you would include in your thread control block.
> it would include a link to other such blocks (so we can 
> return a linked list of completed or pre-empted threads (KSECs)
> and a status word that says whether it is a completed syscall, or a
> preempted thread, or whatever, and a cookie that the kernel
> doesn't touch so you can extend it with whatever else you need.

Well, if you want to take the extra step of performing a copyin
and following a link, I can live with that.  But, I wouldn't wait until
the thread becomes unblocked to copyout its context.  Just
do it immediately and it's much easier.

Actually, you have to copyout its context immediately.  Let's
say a thread blocks in the kernel on a read().  Then let's say
the thread is sent a signal (and sa_flags is SA_RESTART).  The
UTS needs the thread's context (at least the stack pointer) so
it can create a signal frame on top of its stack.  The signal
handler will run in the context of the thread while it is still
blocked in the kernel.  The thread will also need to use its
context storage area because it may again be preempted or
blocked.  I've spent considerable time trying to get signal
handling working correctly in our threads library, and this
is about the only way that really works.

Actually, there is even another problem.  Suppose you have:

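  #include <sys/types.h>
  #include <pthread.h>
  #include <setjmp.h>
  #include <signal.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <unistd.h>

  static pthread_t tid;
  static jmp_buf jmpbuf;

  /* Assumed to be installed for both SIGUSR1 and SIGALRM. */
  static void sighandler(int signo)
  {
    /* Jump out of the handler, abandoning the blocked read(). */
    if (signo == SIGALRM && pthread_equal(pthread_self(), tid))
      _longjmp(jmpbuf, 1);
  }

  void *my_thread(void *arg)
  {
    char buf[128];
    int fd = (int)(intptr_t)arg;
    ssize_t ret;

    tid = pthread_self();
    if (_setjmp(jmpbuf) == 0) {
      ret = read(fd, buf, sizeof(buf));
    }
    else {
      printf("Thread is exiting.\n");
      pthread_exit(NULL);
    }
    return (NULL);
  }

  /* Elsewhere, from another thread: */
  pthread_kill(tid, SIGUSR1);
  ...
  pthread_kill(tid, SIGALRM);
  ...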

This is perfectly valid.  And you can also have compilers that generate
builtin longjmps for exception handling.  In that case, we can't even
wrap longjmp/_longjmp in order to do cleanup handling.  So the kernel
still thinks the read() is active.  All sorts of gnarly stuff can happen.

I think we need a way to tell the kernel to halt any pending activities
for the KSEC that was blocked before trying to deliver any signals.
If the thread returns normally from the signal handler, then the
KSEC can be resumed.  In the case of an abnormal return/jump out
of the signal handler, I don't know how we'd inform the kernel that
the KSEC could be reused; the UTS doesn't know if the thread is
still operating in the signal handler or has jumped out of it.

It'd be nice if the UTS could retrieve the KSEC state/storage.
If it could, the UTS could copy it to the signal handling frame
so that a normal return from the signal handler could pass it
back to the kernel.

We need to work this out.

> > The drawback
> > of only using the mailbox is that it requires an additional copy
> > by the UTS every time an upcall is made (to copy the thread state
> > from the mailbox to the storage area in the thread).
> 
> You forget that a single upcall may want to return 37 completed 
> syscalls or pre-empted threads.

Already explained above.

> Using my scheme.. there is an upcall, and bingo the UTS has ALL
> the completed items at once. 
> It sorts them onto the runnable queues, selects its favourite
> and puts that address into its mailbox, loads the context,
> and it's off and running the next thread.

-- 
Dan Eischen


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Tue Nov 28 23:52: 3 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from homer.softweyr.com (bsdconspiracy.net [208.187.122.220])
	by hub.freebsd.org (Postfix) with ESMTP
	id 139D337B401; Tue, 28 Nov 2000 23:52:01 -0800 (PST)
Received: from [127.0.0.1] (helo=softweyr.com ident=Fools trust ident!)
	by homer.softweyr.com with esmtp (Exim 3.16 #1)
	id 14124g-0000QL-00; Wed, 29 Nov 2000 00:54:42 -0700
Message-ID: <3A24B642.34B50961@softweyr.com>
Date: Wed, 29 Nov 2000 00:54:42 -0700
From: Wes Peters <wes@softweyr.com>
Organization: Softweyr LLC
X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.2.12 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: Matt Dillon <dillon@earth.backplane.com>
Cc: Kris Kennaway <kris@FreeBSD.ORG>,
	Jordan Hubbard <jkh@winston.osd.bsdi.com>, arch@FreeBSD.ORG,
	rps@merlin.mat.uc.pt
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image 
 for x86 updated.)
References: <52694.975362925@winston.osd.bsdi.com> <20001127144809.A67395@citusc17.usc.edu> <200011272307.eARN7Ln34886@earth.backplane.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Matt Dillon wrote:
> 
> :> I just received this today and am kind of scratching my head over it.
> :> On one hand, creating an "alias" for a one specific piece of terminal
> :> character mapping seems a hack; I can see the idea behind wanting to
> :> use one of n characters for something like backspacing or line-killing
> :> (^U or ^X for example) and would not frown (as much) on a more general
> :> aliasing feature.  On the other hand, I can see that this specific
> :> case (erase) is by far the most significant.  Which is why I'm
> :> forwarding this to arch - this is one of those classic
> :> architecture/feature trade-off decisions and I would like to hear more
> :> opinions before deciding which way I'd like to respond to this.
> :
> :This is a very common newbie problem ("Stupid FreeBSD won't let me
> :delete what I've typed, it just prints ^H!"). Commit please! :)
> :
> :Kris
> 
>     This is one of those things where, 10 years ago, I would probably
>     have been a purist and been opposed to it.
> 
>     But after 15+ years of pure hell having to deal with every
>     conceivable combination of ^H and ^?, terminal types,
>     telnet, rlogin, ssh, and so on and so forth...  I say to
>     hell with the purist view on this one.  I'd love to
>     see this committed!

IMHO, this is one of the biggest arguments for using bash.  I get bitten
all the time when I leave bash for another interactive program that no
longer provides BS/DEL compatibility.  Fixing it everywhere is a good
idea.

-- 
            "Where am I, and what am I doing in this handbasket?"

Wes Peters                                                         Softweyr LLC
wes@softweyr.com                                           http://softweyr.com/


From owner-freebsd-arch  Wed Nov 29  7:50:53 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mgw-dax2.ext.nokia.com (mgw-dax2.ext.nokia.com [63.78.179.217])
	by hub.freebsd.org (Postfix) with ESMTP id 0CE6037B400
	for <arch@freebsd.org>; Wed, 29 Nov 2000 07:50:52 -0800 (PST)
Received: from davir03nok.americas.nokia.com (davir03nok.americas.nokia.com [172.18.242.86])
	by mgw-dax2.ext.nokia.com (Switch-2.1.0/Switch-2.1.0) with ESMTP id eATFp2615063
	for <arch@freebsd.org>; Wed, 29 Nov 2000 09:51:12 -0600 (CST)
Received: from daebh01nok.americas.nokia.com (unverified) by davir03nok.americas.nokia.com
 (Content Technologies SMTPRS 4.1.5) with ESMTP id <Tac12f256502c3ac079@davir03nok.americas.nokia.com> for <arch@freebsd.org>;
 Wed, 29 Nov 2000 09:50:31 -0600
Received: by daebh01nok with Internet Mail Service (5.5.2652.78)
	id <XMR5P3T7>; Wed, 29 Nov 2000 09:46:12 -0600
Message-ID: <B9CFA6CE8FFDD211A1FB0008C7894E4602228767@bseis01nok>
From: Atul.Sharma@nokia.com
To: arch@freebsd.org
Subject: 
Date: Wed, 29 Nov 2000 09:42:17 -0600
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2652.78)
Content-Type: text/plain;
	charset="iso-8859-1"
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

subscribe arch@FreeBSD.ORG


From owner-freebsd-arch  Wed Nov 29 10:11:58 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from net2.gendyn.com (nat2.gendyn.com [204.60.171.12])
	by hub.freebsd.org (Postfix) with ESMTP id 2158F37B400
	for <arch@freebsd.org>; Wed, 29 Nov 2000 10:11:55 -0800 (PST)
Received: from [153.11.11.3] (helo=plunger.gdeb.com)
	by net2.gendyn.com with esmtp (Exim 2.12 #1)
	id 141Bhn-0002rX-00
	for arch@freebsd.org; Wed, 29 Nov 2000 13:11:43 -0500
Received: from orion.caen.gdeb.com ([153.11.109.11])
	by plunger.gdeb.com  with ESMTP id NAA01053
	for <arch@freebsd.org>; Wed, 29 Nov 2000 13:08:23 -0500 (EST)
Received: from vigrid.com (gpz.clc.gdeb.com [192.168.3.12])
	by orion.caen.gdeb.com (8.9.3/8.9.3) with ESMTP id NAA03358
	for <arch@freebsd.org>; Wed, 29 Nov 2000 13:08:35 -0500 (EST)
	(envelope-from eischen@vigrid.com)
Message-ID: <3A254710.ED8B2C26@vigrid.com>
Date: Wed, 29 Nov 2000 13:12:32 -0500
From: Dan Eischen <eischen@vigrid.com>
X-Mailer: Mozilla 4.75 [en] (X11; U; SunOS 5.8 sun4u)
X-Accept-Language: en
MIME-Version: 1.0
To: arch@freebsd.org
Subject: Modifying FILE to add lock
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Is there any objection to modifying struct __sFILE in stdio.h
to add a lock?  I think we need to do this for libpthread.
This should let us eliminate the _THREAD_SAFE macro.
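
A rough sketch of the idea, assuming a made-up stream structure (this
is not the real struct __sFILE layout, and the demo_* names are
hypothetical).  With the lock embedded in the stream, the locking
path is always compiled in instead of being selected by a macro:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stream; NOT the real struct __sFILE layout. */
struct demo_stream {
    pthread_mutex_t lock;     /* the per-stream lock under discussion */
    char            buf[64];
    size_t          len;
};

static void demo_stream_init(struct demo_stream *s)
{
    assert(s != NULL);
    pthread_mutex_init(&s->lock, NULL);
    s->len = 0;
}

/* Caller must already hold s->lock.  Returns c, or -1 if full. */
static int demo_putc_unlocked(struct demo_stream *s, int c)
{
    if (s->len >= sizeof(s->buf))
        return -1;
    s->buf[s->len++] = (char)c;
    return c;
}

/* Always-locking variant: no compile-time switch decides whether
 * the stream is protected. */
static int demo_putc(struct demo_stream *s, int c)
{
    int r;

    pthread_mutex_lock(&s->lock);
    r = demo_putc_unlocked(s, c);
    pthread_mutex_unlock(&s->lock);
    return r;
}
```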

-- 
Dan Eischen


From owner-freebsd-arch  Wed Nov 29 10:47:43 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 2A71F37B402
	for <arch@FreeBSD.ORG>; Wed, 29 Nov 2000 10:47:42 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id eATIlea24289;
	Wed, 29 Nov 2000 10:47:40 -0800 (PST)
Date: Wed, 29 Nov 2000 10:47:40 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Dan Eischen <eischen@vigrid.com>
Cc: arch@FreeBSD.ORG
Subject: Re: Modifying FILE to add lock
Message-ID: <20001129104740.L8051@fw.wintelcom.net>
References: <3A254710.ED8B2C26@vigrid.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3A254710.ED8B2C26@vigrid.com>; from eischen@vigrid.com on Wed, Nov 29, 2000 at 01:12:32PM -0500
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Dan Eischen <eischen@vigrid.com> [001129 10:11] wrote:
> Is there any objection to modifying struct __sFILE in stdio.h
> to add a lock?  I think we need to do this for libpthread.
> This should let us eliminate the _THREAD_SAFE macro.

I have no objection as long as you bump the shared lib version
from -stable.  This would be a great time to do it.

While you're at it, adding one to DIR structs would be very helpful
for fixing our thread safety with DIR handles.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


From owner-freebsd-arch  Wed Nov 29 10:50:41 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 9239B37B400
	for <arch@FreeBSD.ORG>; Wed, 29 Nov 2000 10:50:39 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id eATIocM24437;
	Wed, 29 Nov 2000 10:50:38 -0800 (PST)
Date: Wed, 29 Nov 2000 10:50:38 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Dan Eischen <eischen@vigrid.com>
Cc: arch@FreeBSD.ORG
Subject: Re: Modifying FILE to add lock
Message-ID: <20001129105038.M8051@fw.wintelcom.net>
References: <3A254710.ED8B2C26@vigrid.com> <20001129104740.L8051@fw.wintelcom.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20001129104740.L8051@fw.wintelcom.net>; from bright@wintelcom.net on Wed, Nov 29, 2000 at 10:47:40AM -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Alfred Perlstein <bright@wintelcom.net> [001129 10:47] wrote:
> * Dan Eischen <eischen@vigrid.com> [001129 10:11] wrote:
> > Is there any objection to modifying struct __sFILE in stdio.h
> > to add a lock?  I think we need to do this for libpthread.
> > This should let us eliminate the _THREAD_SAFE macro.
> 
> I have no objection as long as you bump the shared lib version
> from -stable.  This would be a great time to do it.

...er, but only if they aren't already bumped.  If libc in 4.x is
at 4 and in 5-current is at 5 already, then leave the versions
alone.

> 
> While you're at it adding one to DIR structs would be very helpful
> for fixing our threadsafeness with DIR handles.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


From owner-freebsd-arch  Wed Nov 29 10:55:30 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP id BCF7F37B401
	for <arch@FreeBSD.ORG>; Wed, 29 Nov 2000 10:55:25 -0800 (PST)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id NAA21275;
	Wed, 29 Nov 2000 13:54:55 -0500 (EST)
Date: Wed, 29 Nov 2000 13:54:55 -0500 (EST)
From: Daniel Eischen <eischen@vigrid.com>
To: Alfred Perlstein <bright@wintelcom.net>
Cc: Dan Eischen <eischen@vigrid.com>, arch@FreeBSD.ORG
Subject: Re: Modifying FILE to add lock
In-Reply-To: <20001129104740.L8051@fw.wintelcom.net>
Message-ID: <Pine.SUN.3.91.1001129134930.20431A-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, 29 Nov 2000, Alfred Perlstein wrote:
> * Dan Eischen <eischen@vigrid.com> [001129 10:11] wrote:
> > Is there any objection to modifying struct __sFILE in stdio.h
> > to add a lock?  I think we need to do this for libpthread.
> > This should let us eliminate the _THREAD_SAFE macro.
> 
> I have no objection as long as you bump the shared lib version
> from -stable.  This would be a great time to do it.

This would only be in -current (where the library versions have
already been bumped) and for our new libpthread.

> While you're at it adding one to DIR structs would be very helpful
> for fixing our threadsafeness with DIR handles.

Thanks!  I missed that one.

-- 
Dan Eischen


From owner-freebsd-arch  Wed Nov 29 11:48:30 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from palrel3.hp.com (palrel3.hp.com [156.153.255.226])
	by hub.freebsd.org (Postfix) with ESMTP id 62EA037B699
	for <arch@FreeBSD.ORG>; Wed, 29 Nov 2000 11:48:28 -0800 (PST)
Received: from adlmail.cup.hp.com (adlmail.cup.hp.com [15.0.100.30])
	by palrel3.hp.com (Postfix) with ESMTP
	id 2B4BF37E; Wed, 29 Nov 2000 11:48:27 -0800 (PST)
Received: from cup.hp.com (p1000180.nsr.hp.com [15.109.0.180])
	by adlmail.cup.hp.com (8.9.3 (PHNE_18546)/8.9.3 SMKit7.02) with ESMTP id LAA17261;
	Wed, 29 Nov 2000 11:48:26 -0800 (PST)
Message-ID: <3A255D8A.7F5CFB26@cup.hp.com>
Date: Wed, 29 Nov 2000 11:48:26 -0800
From: Marcel Moolenaar <marcel@cup.hp.com>
Organization: Hewlett-Packard
X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.12 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: Daniel Eischen <eischen@vigrid.com>
Cc: Alfred Perlstein <bright@wintelcom.net>, arch@FreeBSD.ORG
Subject: Re: Modifying FILE to add lock
References: <Pine.SUN.3.91.1001129134930.20431A-100000@pcnet1.pcnet.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Daniel Eischen wrote:
> 
> > I have no objection as long as you bump the shared lib version
> > from -stable.  This would be a great time to do it.
> 
> This would only be in -current (where the library versions have
> already been bumped) and for our new libpthread.

I agree. We should not MFC this. Library version bumps halfway through a
-stable branch don't seem appropriate. Changing structures is not
done, IMO.

Anyway: In stdio.h I'm told that I should read the warning before
changing the layout of struct __sFILE. There doesn't seem to be a
warning anywhere in the header file, so I figure it must be the long
comment before the struct declaration. The comment doesn't tell me what
happens if I change the layout. It only tells me what certain fields are
for and doesn't mention _offset at all, even though that field
specifically references the warning.

My point: We're hinted to be careful and cautious without actually being
told why. Can someone tell me what problems we might expect if we add a
new field, both specifically at the end and randomly within the
structure?

-- 
Marcel Moolenaar
  mail: marcel@cup.hp.com / marcel@FreeBSD.org
  tel:  (408) 447-4222


From owner-freebsd-arch  Wed Nov 29 12: 7:33 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from alive.znep.com (sense-sea-MegaSub-1-500.oz.net [216.39.145.246])
	by hub.freebsd.org (Postfix) with ESMTP id 119EB37B400
	for <arch@FreeBSD.ORG>; Wed, 29 Nov 2000 12:07:31 -0800 (PST)
Received: from localhost (marcs@localhost)
	by alive.znep.com (8.9.3/8.9.1) with ESMTP id MAA74914;
	Wed, 29 Nov 2000 12:03:40 -0800 (PST)
	(envelope-from marcs@znep.com)
Date: Wed, 29 Nov 2000 12:03:40 -0800 (PST)
From: Marc Slemko <marcs@znep.com>
To: Marcel Moolenaar <marcel@cup.hp.com>
Cc: Daniel Eischen <eischen@vigrid.com>,
	Alfred Perlstein <bright@wintelcom.net>, arch@FreeBSD.ORG
Subject: Re: Modifying FILE to add lock
In-Reply-To: <3A255D8A.7F5CFB26@cup.hp.com>
Message-ID: <Pine.BSF.4.20.0011291152160.58003-100000@alive.znep.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, 29 Nov 2000, Marcel Moolenaar wrote:

> Anyway: In stdio.h I'm told that I should read the warning before
> changing the layout of struct __sFILE. There doesn't seem to be a
> warning anywhere in the header file, so I figure it must be the long

Look in an older revision (e.g. 1.1) and, a few comments earlier, you
will see some alignment warnings that have since gone the way of the
dodo, sort of.  I think that is what it is referring to...

> comment before the struct declaration. The comment doesn't tell me what
> happens if I change the layout. It only tells me what certain fields are
> for and doesn't mention _offset at all, even though that field
> specifically references the warning.
> 
> My point: We're hinted to be careful and cautious without actually being
> told why. Can someone tell me what problems we might expect if we add a
> new field, both specifically at the end and randomly within the
> structure?

If you add a new field in the middle, then any programs compiled
against the old header file that have to access anything in the
struct after your addition will potentially fall over horribly
since a lot of the access to random fields is done with macros.

If you add a field at the end, then anything that allocates memory
for a FILE will break, although it is bogus to do that anyway.

There is a reason why Solaris was stuck with the lame 8-bit limit
on the size of the file descriptor behind a stream for so long,
until they changed stuff around anyway in a new (at the time) 64-bit
ABI and bumped it up there to a sane number...
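
A minimal sketch of the two breakage modes described above, using a
made-up stream structure (the demo_* types and macro are hypothetical
and are not the real struct __sFILE layout).  An "old binary" is
modelled as code that reads the descriptor through an offset fixed at
compile time, the way an accessor macro would:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout that existing binaries were compiled against. */
struct demo_stream_v1 {
    unsigned char *p;      /* current position in buffer */
    int            r;      /* bytes left to read */
    short          flags;
    short          file;   /* file descriptor */
};

/* New field inserted in the middle: later members move. */
struct demo_stream_mid {
    unsigned char *p;
    int            r;
    long           lock;   /* hypothetical new field */
    short          flags;
    short          file;
};

/* New field appended: old offsets are kept, but the size grows. */
struct demo_stream_end {
    unsigned char *p;
    int            r;
    short          flags;
    short          file;
    long           lock;   /* hypothetical new field */
};

/* Offset baked into old binaries by an accessor macro. */
#define DEMO_OLD_FILE_OFFSET offsetof(struct demo_stream_v1, file)

/* What an old binary effectively does when handed a stream that the
 * new library created. */
static short demo_old_binary_fileno(const void *stream)
{
    short fd;

    assert(stream != NULL);
    memcpy(&fd, (const char *)stream + DEMO_OLD_FILE_OFFSET, sizeof(fd));
    return fd;
}
```

With the field appended, demo_old_binary_fileno() still finds the
descriptor, but sizeof(struct demo_stream_end) is larger than the old
size, so code that allocates its own FILE-sized storage comes up
short.  With the field in the middle, the member offsets no longer
match what the old binary expects.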


From owner-freebsd-arch  Wed Nov 29 12: 8:21 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP id 7D6CC37B401
	for <arch@FreeBSD.ORG>; Wed, 29 Nov 2000 12:08:19 -0800 (PST)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id OAA01700;
	Wed, 29 Nov 2000 14:57:51 -0500 (EST)
Date: Wed, 29 Nov 2000 14:57:51 -0500 (EST)
From: Daniel Eischen <eischen@vigrid.com>
To: Marcel Moolenaar <marcel@cup.hp.com>
Cc: Alfred Perlstein <bright@wintelcom.net>, arch@FreeBSD.ORG
Subject: Re: Modifying FILE to add lock
In-Reply-To: <3A255D8A.7F5CFB26@cup.hp.com>
Message-ID: <Pine.SUN.3.91.1001129145439.1117A-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, 29 Nov 2000, Marcel Moolenaar wrote:
> Daniel Eischen wrote:
> > 
> > > I have no objection as long as you bump the shared lib version
> > > from -stable.  This would be a great time to do it.
> > 
> > This would only be in -current (where the library versions have
> > already been bumped) and for our new libpthread.
> 
> I agree. We should not MFC this. Library version bumps halfway through a
> -stable branch don't seem appropriate. Changing structures is not
> done, IMO.
> 
> Anyway: In stdio.h I'm told that I should read the warning before
> changing the layout of struct __sFILE. There doesn't seem to be a
> warning anywhere in the header file, so I figure it must be the long
> comment before the struct declaration. The comment doesn't tell me what
> happens if I change the layout. It only tells me what certain fields are
> for and doesn't mention _offset at all, even though that field
> specifically references the warning.

I was also confused by the warning, which is part of the reason
I posted this to -arch.

-- 
Dan Eischen


From owner-freebsd-arch  Wed Nov 29 12:55:12 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id EC9E537B404
	for <arch@freebsd.org>; Wed, 29 Nov 2000 12:55:09 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id eATKt9t27977
	for arch@freebsd.org; Wed, 29 Nov 2000 12:55:09 -0800 (PST)
Date: Wed, 29 Nov 2000 12:55:09 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: arch@freebsd.org
Subject: serious problem with mutexs and userland visibility?
Message-ID: <20001129125508.O8051@fw.wintelcom.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I'm looking for opinions from those who have:
1) been working closely on SMPng.
2) a lot of experience with doing the right thing with
   header files and untangling evil dependencies.
3) dealt with this situation on other operating systems.

I recently locked down struct ucred, not a big deal, basically just
a mutex in each struct to protect the refcount.

Unfortunately, struct ucred is used by some userland utils, and
sys/ucred.h is included in sys/mount.h as well as sys/user.h.  This
creates somewhat of a problem, forcing all users of sys/ucred.h to
include sys/mutex.h.

I have a patch here that sort of takes care of this problem; the
catch is that I had to add sys/mutex.h includes to both sys/mount.h
and sys/user.h, which doesn't make me very happy.

It actually removes some bogus includes of sys/ucred.h from userland.

http://people.FreeBSD.org/~alfred/mpsafe/bde.diff

(you can all guess why it's called "bde.diff" :) )

What I'd like to do is make a struct 'kucred' which contains the
mutex and either contains a struct ucred or all the fields of struct
ucred.  'kucred' will be used by the kernel and I'll write helper
functions/macros to convert between the two.

This looks like a lot of drudgework, but I'm ok with it.  However
if it becomes the only way to deal with this situation we may have
a lot of drudgework ahead of us when this issue starts popping up
with other structures.

For instance, the uidinfo struct isn't currently exported to the
user; however, it would be nice if it were, to determine how far off
one was from exceeding their limits.  We would need another
kernel/userland conversion pair for this facility if anyone wanted
to export the information contained in this structure.

If the general consensus is that exporting sys/mutex.h to userland
is to be avoided, but OK when necessary, then I'd rather just apply
the patch I have right now.

Right now I'm of the opinion of "by any means necessary", meaning I
really don't care about the visibility; proceeding with the mpsafe
work is far more important than avoiding header pollution right now.
I'm just concerned about taking it too far.

BSD/OS gets around the struct ucred problem by having a single
ucred mutex used for the entire system.  I don't like this because,
even though it's a very short-term lock, its cache line will be
heavily contended between processors, causing large amounts of bus
traffic.

I also don't like the BSD/OS approach because it doesn't address
the problem of mutexes being inside structures declared in headers
included from userland; it just avoids it for this specific case.

thanks,
-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


From owner-freebsd-arch  Wed Nov 29 13:53:16 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.242])
	by hub.freebsd.org (Postfix) with ESMTP id 15E4D37B404
	for <arch@FreeBSD.ORG>; Wed, 29 Nov 2000 13:53:14 -0800 (PST)
Received: from adlmail.cup.hp.com (adlmail.cup.hp.com [15.0.100.30])
	by palrel1.hp.com (Postfix) with ESMTP
	id 32DFB1093; Wed, 29 Nov 2000 13:53:02 -0800 (PST)
Received: from cup.hp.com (gauss.cup.hp.com [15.28.97.152])
	by adlmail.cup.hp.com (8.9.3 (PHNE_18546)/8.9.3 SMKit7.02) with ESMTP id NAA22402;
	Wed, 29 Nov 2000 13:53:01 -0800 (PST)
Message-ID: <3A257ABD.5238ED4E@cup.hp.com>
Date: Wed, 29 Nov 2000 16:53:01 -0500
From: Marcel Moolenaar <marcel@cup.hp.com>
Organization: Hewlett-Packard
X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.12 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: Marc Slemko <marcs@znep.com>
Cc: Daniel Eischen <eischen@vigrid.com>,
	Alfred Perlstein <bright@wintelcom.net>, arch@FreeBSD.ORG
Subject: Re: Modifying FILE to add lock
References: <Pine.BSF.4.20.0011291152160.58003-100000@alive.znep.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Marc Slemko wrote:
> 
> If you add a field at the end, then anything that allocates memory
> for a FILE will break, although it is bogus to do that anyway.

Having done the signal changes, I immediately have to think about the
Modula port...

-- 
Marcel Moolenaar
  mail: marcel@cup.hp.com / marcel@FreeBSD.org
  tel:  (408) 447-4222


From owner-freebsd-arch  Wed Nov 29 14: 3:54 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 4EDDC37B400
	for <arch@FreeBSD.ORG>; Wed, 29 Nov 2000 14:03:52 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id eATM1JP29867;
	Wed, 29 Nov 2000 14:01:19 -0800 (PST)
Date: Wed, 29 Nov 2000 14:01:19 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Marcel Moolenaar <marcel@cup.hp.com>
Cc: Marc Slemko <marcs@znep.com>,
	Daniel Eischen <eischen@vigrid.com>, arch@FreeBSD.ORG
Subject: Re: Modifying FILE to add lock
Message-ID: <20001129140119.P8051@fw.wintelcom.net>
References: <Pine.BSF.4.20.0011291152160.58003-100000@alive.znep.com> <3A257ABD.5238ED4E@cup.hp.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3A257ABD.5238ED4E@cup.hp.com>; from marcel@cup.hp.com on Wed, Nov 29, 2000 at 04:53:01PM -0500
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Marcel Moolenaar <marcel@cup.hp.com> [001129 13:53] wrote:
> Marc Slemko wrote:
> > 
> > If you add a field at the end, then anything that allocates memory
> > for a FILE will break, although it is bogus to do that anyway.
> 
> Having done the signal changes, I immediately have to think about the
> Modula port...

I've never ever looked at the contents of struct FILE except to
research how stdio works.  Why do we need to care about the
contents of struct FILE (or DIR)?  We have funopen to deal with
creating our own special streams, what's the point of digging
into struct FILE?

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


From owner-freebsd-arch  Wed Nov 29 14:43:36 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from palrel3.hp.com (palrel3.hp.com [156.153.255.226])
	by hub.freebsd.org (Postfix) with ESMTP id DF2C737B401
	for <arch@FreeBSD.ORG>; Wed, 29 Nov 2000 14:43:34 -0800 (PST)
Received: from adlmail.cup.hp.com (adlmail.cup.hp.com [15.0.100.30])
	by palrel3.hp.com (Postfix) with ESMTP
	id 5FBDA420; Wed, 29 Nov 2000 14:43:34 -0800 (PST)
Received: from cup.hp.com (gauss.cup.hp.com [15.28.97.152])
	by adlmail.cup.hp.com (8.9.3 (PHNE_18546)/8.9.3 SMKit7.02) with ESMTP id OAA24388;
	Wed, 29 Nov 2000 14:43:34 -0800 (PST)
Message-ID: <3A258696.EAD7BD7A@cup.hp.com>
Date: Wed, 29 Nov 2000 17:43:34 -0500
From: Marcel Moolenaar <marcel@cup.hp.com>
Organization: Hewlett-Packard
X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.12 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: Alfred Perlstein <bright@wintelcom.net>
Cc: Marc Slemko <marcs@znep.com>,
	Daniel Eischen <eischen@vigrid.com>, arch@FreeBSD.ORG
Subject: Re: Modifying FILE to add lock
References: <Pine.BSF.4.20.0011291152160.58003-100000@alive.znep.com> <3A257ABD.5238ED4E@cup.hp.com> <20001129140119.P8051@fw.wintelcom.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Alfred Perlstein wrote:
> 
> I've never ever looked at the contents of struct FILE except to
> research how stdio works.  Why do we need to care about the
> contents of struct FILE (or DIR)?  We have funopen to deal with
> creating our own special streams, what's the point of digging
> into struct FILE?

The fact that you (and I) can't see the point doesn't mean there is no
point. Ignoring the possibility that there is a point somehow or
somewhere is far worse than reaching general consensus that there
likely is no point at all.

Modula has some weird architecture and OS dependencies, IIRC. It doesn't
hurt to check it out before we commit the change.

-- 
Marcel Moolenaar
  mail: marcel@cup.hp.com / marcel@FreeBSD.org
  tel:  (408) 447-4222


From owner-freebsd-arch  Wed Nov 29 17:49: 7 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id CE62F37B401
	for <arch@FreeBSD.ORG>; Wed, 29 Nov 2000 17:49:05 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id eAU1n5K05837
	for arch@FreeBSD.ORG; Wed, 29 Nov 2000 17:49:05 -0800 (PST)
Date: Wed, 29 Nov 2000 17:49:05 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: arch@FreeBSD.ORG
Subject: HEADSUP user struct ucred -> xucred (Was: Re: serious problem with mutexs and userland visibility?)
Message-ID: <20001129174905.S8051@fw.wintelcom.net>
References: <20001129125508.O8051@fw.wintelcom.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20001129125508.O8051@fw.wintelcom.net>; from bright@wintelcom.net on Wed, Nov 29, 2000 at 12:55:09PM -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Alfred Perlstein <bright@wintelcom.net> [001129 12:55] wrote:
> 
> I recently locked down struct ucred, not a big deal, basically just
> a mutex in each struct to protect the refcount.
> 
> Unfortunately, struct ucred is used by some userland utils, and
> sys/ucred.h is included in sys/mount.h as well as sys/user.h.  This
> creates somewhat of a problem, forcing all users of sys/ucred.h to
> include sys/mutex.h.
> 
> I have a patch here that sort of takes care of this problem, the
> problem is that I had to add sys/mutex.h includes to both sys/mount.h
> and sys/user.h, this doesn't make me very happy.

After a short discussion it has been determined that there will be
an xucred exported to userland, following the convention of xsocket
and the various other xfoo structs exported by the kernel.

Struct ucred will no longer be visible outside the kernel.

Any userland things using struct ucred will need to use xucred.

This will be the convention used to resolve mutex (or other MD 
fields) in kernel exported structures in the future.
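
A minimal sketch of that convention, under loud assumptions: the
demo_kcred and demo_xcred structures and demo_cred_export() are
hypothetical names, the field list is illustrative rather than the
real ucred/xucred layout, and a pthread mutex stands in for a kernel
mutex so the example builds in userland.  The kernel-side structure
carries the lock; the exported one is plain data and needs no mutex
header at all:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>

#define DEMO_NGROUPS 16

/* Kernel-private credential: holds a lock, never shown to userland. */
struct demo_kcred {
    pthread_mutex_t lock;      /* stand-in for a kernel mutex */
    unsigned int    ref;       /* reference count protected by lock */
    uid_t           uid;
    short           ngroups;
    gid_t           groups[DEMO_NGROUPS];
};

/* Exported snapshot: plain data only, so no lock type is visible. */
struct demo_xcred {
    uid_t uid;
    short ngroups;
    gid_t groups[DEMO_NGROUPS];
};

/* Copy the exportable fields while holding the lock. */
static void demo_cred_export(struct demo_kcred *kc, struct demo_xcred *xc)
{
    assert(kc != NULL && xc != NULL);
    memset(xc, 0, sizeof(*xc));
    pthread_mutex_lock(&kc->lock);
    xc->uid = kc->uid;
    xc->ngroups = kc->ngroups;
    memcpy(xc->groups, kc->groups, sizeof(xc->groups));
    pthread_mutex_unlock(&kc->lock);
}
```

The cost is one conversion routine per exported structure, which is
the drudgework mentioned earlier in the thread.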

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]


From owner-freebsd-arch  Wed Nov 29 22:16:59 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from panzer.kdm.org (panzer.kdm.org [216.160.178.169])
	by hub.freebsd.org (Postfix) with ESMTP
	id 88A4C37B400; Wed, 29 Nov 2000 22:16:54 -0800 (PST)
Received: (from ken@localhost)
	by panzer.kdm.org (8.9.3/8.9.1) id XAA01540;
	Wed, 29 Nov 2000 23:16:53 -0700 (MST)
	(envelope-from ken)
Date: Wed, 29 Nov 2000 23:16:53 -0700
From: "Kenneth D. Merry" <ken@kdm.org>
To: arch@FreeBSD.org
Cc: gallatin@FreeBSD.org
Subject: zero copy code review
Message-ID: <20001129231653.A1503@panzer.kdm.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

[ -net and -current BCCed for wider coverage, this is probably best
handled on -arch ]

I would like to request reviews of the zero copy sockets and NFS code I've
been posting about for months:

http://people.FreeBSD.org/~ken/zero_copy

There are diffs posted above against -current as of early November 28th,
along with a FAQ, and change log.

These diffs include changes in:

 - the socket code
 - NFS code
 - VM code
 - ti(4) driver
 - sendfile code

Much of the code was written by Drew Gallatin <gallatin@FreeBSD.org>, but I
wrote a lot of the ti(4) driver mods and cleaned things up a fair bit.

The code is stable, and I don't know of any bugs at the moment.  I have run
with it enabled on one of my main development boxes for months without any
problems.

The way things are currently configured, it is not turned on by default.
You need two kernel options and a sysctl to turn it on.  The zero copy NFS
code can be turned on with gdb, although it might be better to make that
into a sysctl.  (I haven't played with the zero copy NFS code much, Drew
has done much more with that.)

How to turn the code on is covered in the web page, above.

Anyway, I'd like to commit this code sometime next week, if no one comes up
with any issues or problems.

Comments, bug reports, etc., are welcome.

Thanks!

Ken
-- 
Kenneth Merry
ken@kdm.org


From owner-freebsd-arch  Wed Nov 29 22:33:37 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp01.primenet.com (smtp01.primenet.com [206.165.6.131])
	by hub.freebsd.org (Postfix) with ESMTP id 6E7C437B400
	for <arch@FreeBSD.ORG>; Wed, 29 Nov 2000 22:33:34 -0800 (PST)
Received: (from daemon@localhost)
	by smtp01.primenet.com (8.9.3/8.9.3) id XAA16956;
	Wed, 29 Nov 2000 23:32:19 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp01.primenet.com, id smtpdAAAIdaigE; Wed Nov 29 23:27:07 2000
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id XAA06955;
	Wed, 29 Nov 2000 23:28:13 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200011300628.XAA06955@usr08.primenet.com>
Subject: Re: HEADSUP user struct ucred -> xucred (Was: Re: serious problem with mutexs and userland visibility?)
To: bright@wintelcom.net (Alfred Perlstein)
Date: Thu, 30 Nov 2000 06:28:12 +0000 (GMT)
Cc: arch@FreeBSD.ORG
In-Reply-To: <20001129174905.S8051@fw.wintelcom.net> from "Alfred Perlstein" at Nov 29, 2000 05:49:05 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > I recently locked down struct ucred, not a big deal, basically just
> > a mutex in each struct to protect the refcount.
> > 
> > Unfortunately struct ucred is used by some userland utils and
> > sys/ucred.h is included in sys/mount.h as well as sys/user.h, this
> > creates somewhat of a problem, forcing all users of sys/ucred.h to
> > include sys/mutex.h.
> > 
> > I have a patch here that sort of takes care of this problem, the
> > problem is that I had to add sys/mutex.h includes to both sys/mount.h
> > and sys/user.h, this doesn't make me very happy.
> 
> After a short discussion it has been determined that there will be
> an xucred exported to userland, following the convention of xsocket
> and the various other xfoo structs exported from the kernel.
> 
> Struct ucred will no longer be visible outside the kernel.
> 
> Any userland things using struct ucred will need to use xucred.
> 
> This will be the convention used to resolve mutex (or other MD 
> fields) in kernel exported structures in the future.

This is a really gross way to handle this.  The ucred structure
is used by a lot of user space programs.

You should do what several UNIX vendors have already done, and
implement a MUTEX() declaration macro that differs in user and
kernel space, and forces an alignment; then when you copy out,
copy out everything _BUT_ the mutex portion to the user space,
and no user space source or object code will need to change.

So:

	#ifdef _KERNEL
	#define	MUTEX(x)	mutex_t	x;
	#define	UREF(x,y)	(void *)&((x)->y)
	#else
	#define	MUTEX(x)	/* user space = no mutex*/
	#define UREF(x,y)	(void *)(x)
	#endif

	struct foo {
		MUTEX(save_foo_from_bad_programmers)
		int	normal_foo_item_1;
		char	normal_foo_item_2;
		...
	};

	...

	struct foo *foop;

	...

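	/* copyout() also needs a length: everything after the mutex */
	copyout(UREF(foop, normal_foo_item_1), user_space_foo,
	    sizeof(*foop) - offsetof(struct foo, normal_foo_item_1));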

It is much better to not impact user space code at all.
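
A user-space sketch of the layout idea above, with the kernel and user
views written out as two separate structs so that both can be compiled
together.  The names fake_mutex_t, struct kfoo, struct ufoo and
fake_copyout() are made up for illustration, and memcpy() stands in for
the real copyout(); this is not the actual FreeBSD code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel mutex; not the real FreeBSD type. */
typedef struct {
	long	owner;
	int	recurse;
} fake_mutex_t;

/* Kernel view: what the struct looks like when MUTEX(x) is a member. */
struct kfoo {
	fake_mutex_t	lock;
	int		item_1;
	char		item_2;
};

/* User view: the same fields, with MUTEX(x) expanded to nothing. */
struct ufoo {
	int		item_1;
	char		item_2;
};

/*
 * Stand-in for copyout(): copy everything after the mutex into the
 * user-visible struct.  memcpy() replaces the real kernel routine.
 */
void
fake_copyout(const struct kfoo *k, struct ufoo *u)
{
	size_t off = offsetof(struct kfoo, item_1);

	/* The visible tail of the kernel struct must cover the user struct. */
	assert(sizeof(*k) - off >= sizeof(*u));
	memcpy(u, (const char *)k + off, sizeof(*u));
}
```

The assert is the part that "forces an alignment" has to guarantee in
practice: if the mutex changed the padding of the fields after it, the
two views would no longer line up.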


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


From owner-freebsd-arch  Wed Nov 29 22:53:20 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134])
	by hub.freebsd.org (Postfix) with ESMTP id 3A33A37B402
	for <arch@FreeBSD.ORG>; Wed, 29 Nov 2000 22:53:17 -0800 (PST)
Received: (from daemon@localhost)
	by smtp04.primenet.com (8.9.3/8.9.3) id XAA24967;
	Wed, 29 Nov 2000 23:49:23 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp04.primenet.com, id smtpdAAAUqayRW; Wed Nov 29 23:49:19 2000
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id XAA07381;
	Wed, 29 Nov 2000 23:53:09 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200011300653.XAA07381@usr08.primenet.com>
Subject: Re: Modifying FILE to add lock
To: marcel@cup.hp.com (Marcel Moolenaar)
Date: Thu, 30 Nov 2000 06:53:09 +0000 (GMT)
Cc: bright@wintelcom.net (Alfred Perlstein),
	marcs@znep.com (Marc Slemko), eischen@vigrid.com (Daniel Eischen),
	arch@FreeBSD.ORG
In-Reply-To: <3A258696.EAD7BD7A@cup.hp.com> from "Marcel Moolenaar" at Nov 29, 2000 05:43:34 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > I've never ever looked at the contents of struct FILE except to
> > research how stdio works.  Why do we need to care about the
> > contents of struct FILE (or DIR)?  We have funopen to deal with
> > creating our own special streams, what's the point of digging
> > into struct FILE?
> 
> The fact that you (and I) can't see the point, doesn't mean there is no
> point. Ignoring the fact that maybe there's a point somehow or somewhere
> is far worse than reaching general consensus that there likely is
> no point at all.
> 
> Modula has some weird architecture and OS dependencies, IIRC. It doesn't
> hurt to check it out before we commit the change.

There are a number of programs which traditionally need to
be able to access the contents of the FILE buffers directly,
particularly with regard to things like "unget", and so on.

Mostly, these are mixed-mode programs, which do things like
bounce in and out of raw mode, or set cbreak, or modify the
value of vmin or vtime, and wish to act properly on already
typed-ahead or ungetc()'ed characters that have been buffered.

It would be terrifically useful, for example, for getpass()
to use this to permit scripting of the creation of user
accounts (as one example).  That it does not work that way
means you have to resort to "pw" (a perl abomination) to get
the job done right.

Historically, things like EMACS and simulations that like to
implement command "inertia" (no command in the timeout window
means the previous command is in effect) tend to directly
manipulate buffered input contents.

There is at least one "curses"-like library of which I'm
aware that actually manipulates buffered output contents to
remove redundant output (e.g. "don't draw X there, if you
are going to draw Y there immediately afterward").  It's very
useful for slow links for things like text editors, where I
can delete a character, insert another, and end up with only
a single character being redrawn once, instead of to the end
of the line from the deletion/insertion point needing to be
rendered twice.

There are also programs which move stdin/out/err around to
effect certain features, without telling the program about
it (screen used to be one, so that it could support session
detach and reattach).

Suffice it to say that not everyone uses the macros, and
those who do, tend to not want to recompile the world.

You might consider using the old "debugging malloc" trick,
of allocating one structure, but referring to another, and
reference your "hidden" lock at a negative offset.  This
would let you pass around FILE objects that were allocated
larger than they were supposed to be, and reference locks
at a negative offset.  This would require some simple pointer
math on allocation, and would ensure binary backward compatibility
with old programs and the new libc, without requiring a version
bump at all. 

If you use this trick, be wary of "#pragma pack()" in scope,
since unlike the kernel MUTEX() trick, the relative location
of the start of the shadow structure will end up moving around,
if you aren't explicit.

struct foo {
	whatever;
	whatever;
	...
};

struct foo_with_lock {
	LOCK	alfreds_new_lock;
	struct foo internal_foo;
};

Pass around:

struct foo *foop = &(foo_with_lockp->internal_foo);

Reference the lock with:

CVT_TO_LOCKED(struct foo_with_lock, foop)->alfreds_new_lock

/* Cast back to (x *) so that ->alfreds_new_lock compiles; offsetof() is in <stddef.h>. */
#define CVT_TO_LOCKED(x,y)	\
	((x *)(((char *)(y)) - offsetof(x, internal_foo)))

I would probably force the packing around the declaration in the
header file.
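
A compilable sketch of the negative-offset trick described above, under
stated assumptions: struct foo stands in for the public structure,
hidden_lock is a plain int rather than a real lock type, and the helper
names foo_alloc(), foo_lock() and foo_free() are invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Public layout; old binaries keep seeing exactly this. */
struct foo {
	int	a;
	int	b;
};

/* Private wrapper: the lock lives in front of the public part. */
struct foo_with_lock {
	int		hidden_lock;	/* stand-in for a real lock type */
	struct foo	internal_foo;
};

/* Step back from the public pointer to the enclosing wrapper. */
#define CVT_TO_LOCKED(type, p) \
	((type *)((char *)(p) - offsetof(type, internal_foo)))

/* Allocate the larger object but hand out only the public part. */
struct foo *
foo_alloc(void)
{
	struct foo_with_lock *w = calloc(1, sizeof(*w));

	return (w == NULL ? NULL : &w->internal_foo);
}

/* Reach the hidden lock at a negative offset from the public pointer. */
int *
foo_lock(struct foo *p)
{
	return (&CVT_TO_LOCKED(struct foo_with_lock, p)->hidden_lock);
}

/* free() must be given the address that calloc() actually returned. */
void
foo_free(struct foo *p)
{
	if (p != NULL)
		free(CVT_TO_LOCKED(struct foo_with_lock, p));
}
```

This only works for objects the library allocates itself; a struct foo
that the caller declared on its own has nothing in front of it.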

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


From owner-freebsd-arch  Wed Nov 29 22:55:52 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from hand.dotat.at (sfo-gw.covalent.net [207.44.198.62])
	by hub.freebsd.org (Postfix) with ESMTP
	id 7F71C37B400; Wed, 29 Nov 2000 22:55:49 -0800 (PST)
Received: from fanf by hand.dotat.at with local (Exim 3.15 #3)
	id 141NcV-0007Al-00; Thu, 30 Nov 2000 06:55:03 +0000
Date: Thu, 30 Nov 2000 06:55:03 +0000
From: Tony Finch <dot@dotat.at>
To: Daniel Eischen <eischen@vigrid.com>
Cc: Alfred Perlstein <bright@wintelcom.net>,
	John Baldwin <jhb@FreeBSD.ORG>, Arun Sharma <arun@sharmas.dhs.org>,
	arch@FreeBSD.ORG
Subject: Re: Thread-specific data and KSEs
Message-ID: <20001130065503.E58294@hand.dotat.at>
References: <20001122133421.S18037@fw.wintelcom.net> <Pine.SUN.3.91.1001122180448.7920A-100000@pcnet1.pcnet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <Pine.SUN.3.91.1001122180448.7920A-100000@pcnet1.pcnet.com>
Organization: Covalent Technologies, Inc
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Daniel Eischen <eischen@vigrid.com> wrote:
>On Wed, 22 Nov 2000, Alfred Perlstein wrote:
>> 
>> Was there something wrong with the suggestion to put the local info
>> on the stack?  I just don't see it being discussed at all.
>
>Yes, I stated that it could not be used.  We want to provide a POSIX
>compliant API, and this dictates that applications be able to create
>stacks of their own size and choosing.  We can't rely on stacks being
>any particular size, or starting at any particular address.

Additionally, wouldn't you have to walk up the stack to find its base?
(which I guess would be a bit more expensive than dereferencing %gs)
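
For context, one common way to find per-thread data from the stack
pointer alone is to round the address down to an aligned boundary.  The
thread does not spell out the scheme that was suggested, so this is an
assumption, and SKETCH_STACK_SIZE and stack_region_base() are
hypothetical names.  The sketch mainly shows why it depends on every
stack having one known, power-of-two size and alignment.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed fixed, power-of-two stack size.  This is exactly the
 * assumption that application-chosen stacks break.
 */
#define SKETCH_STACK_SIZE	((uintptr_t)0x10000)

/* Round an address inside a stack down to the start of its region. */
uintptr_t
stack_region_base(uintptr_t sp)
{
	return (sp & ~(SKETCH_STACK_SIZE - 1));
}
```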

Tony.
-- 
f.a.n.finch     dot@dotat.at     fanf@covalent.net     Chad for President!


From owner-freebsd-arch  Wed Nov 29 23: 8:37 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from hand.dotat.at (sfo-gw.covalent.net [207.44.198.62])
	by hub.freebsd.org (Postfix) with ESMTP
	id 57AC937B401; Wed, 29 Nov 2000 23:08:35 -0800 (PST)
Received: from fanf by hand.dotat.at with local (Exim 3.15 #3)
	id 141Nom-0007ZL-00; Thu, 30 Nov 2000 07:07:44 +0000
Date: Thu, 30 Nov 2000 07:07:44 +0000
From: Tony Finch <dot@dotat.at>
To: Terry Lambert <tlambert@primenet.com>
Cc: Daniel Eischen <eischen@vigrid.com>,
	Alfred Perlstein <bright@wintelcom.net>,
	John Baldwin <jhb@FreeBSD.ORG>,
	Jonathan Lemon <jlemon@flugsvamp.com>, arch@FreeBSD.ORG,
	Tony Finch <dot@dotat.at>
Subject: Re: Thread-specific data and KSEs
Message-ID: <20001130070744.F58294@hand.dotat.at>
References: <Pine.SUN.3.91.1001122180746.7920B-100000@pcnet1.pcnet.com> <200011240208.TAA06691@usr06.primenet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <200011240208.TAA06691@usr06.primenet.com>
Organization: Covalent Technologies, Inc
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Terry Lambert <tlambert@primenet.com> wrote:
>
>I suspect that someone, somewhere, is working on an OS like the one
>at the University of Utah, using source code to migrate processes
>between dissimilar architectures (as one over-the-top example).

In 1993 I saw an OS called Taos running on a PC with a transputer
expansion card, transparently migrating programs between the two
architectures using JIT compilation of bytecode. It also had support
for ARM and other architectures. They're still around:
http://www.tao.co.uk/.

Tony.
-- 
f.a.n.finch     dot@dotat.at     fanf@covalent.net     Chad for President!


From owner-freebsd-arch  Wed Nov 29 23:31:46 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from hand.dotat.at (sfo-gw.covalent.net [207.44.198.62])
	by hub.freebsd.org (Postfix) with ESMTP id 5BC5837B402
	for <arch@freebsd.org>; Wed, 29 Nov 2000 23:31:44 -0800 (PST)
Received: from fanf by hand.dotat.at with local (Exim 3.15 #3)
	id 141OBE-0008I8-00; Thu, 30 Nov 2000 07:30:56 +0000
Date: Thu, 30 Nov 2000 07:30:56 +0000
From: Tony Finch <dot@dotat.at>
To: Jordan Hubbard <jkh@winston.osd.bsdi.com>
Cc: Kirk McKusick <mckusick@mckusick.com>, arch@FreeBSD.ORG,
	rps@merlin.mat.uc.pt
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.)
Message-ID: <20001130073056.G58294@hand.dotat.at>
References: <mckusick@mckusick.com> <53352.975375693@winston.osd.bsdi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <53352.975375693@winston.osd.bsdi.com>
Organization: Covalent Technologies, Inc
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Jordan Hubbard <jkh@winston.osd.bsdi.com> wrote:
>
>> I do not believe that we need/want a general aliasing facility as
>> erase is really the only character for which there is widespread
>> disagreement over which character to use.
>
>Well, there are the ^U vs ^X folks for line-kill (some even argue for
>^W) which is why I cited it as another example; I agree that it's by
>no means as prevalent as ^H vs DEL though.

And we *love* SVR4 OSs that bind ^? to intr.

Tony.
-- 
f.a.n.finch     dot@dotat.at     fanf@covalent.net     Chad for President!


From owner-freebsd-arch  Wed Nov 29 23:39:16 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.242])
	by hub.freebsd.org (Postfix) with ESMTP id B0ADF37B400
	for <arch@FreeBSD.ORG>; Wed, 29 Nov 2000 23:39:14 -0800 (PST)
Received: from adlmail.cup.hp.com (adlmail.cup.hp.com [15.0.100.30])
	by palrel1.hp.com (Postfix) with ESMTP
	id ED6BE1113; Wed, 29 Nov 2000 23:39:13 -0800 (PST)
Received: from cup.hp.com (p1000180.nsr.hp.com [15.109.0.180])
	by adlmail.cup.hp.com (8.9.3 (PHNE_18546)/8.9.3 SMKit7.02) with ESMTP id XAA10696;
	Wed, 29 Nov 2000 23:39:13 -0800 (PST)
Message-ID: <3A260420.6A753ECB@cup.hp.com>
Date: Wed, 29 Nov 2000 23:39:12 -0800
From: Marcel Moolenaar <marcel@cup.hp.com>
Organization: Hewlett-Packard
X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.12 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: Terry Lambert <tlambert@primenet.com>
Cc: Alfred Perlstein <bright@wintelcom.net>,
	Marc Slemko <marcs@znep.com>, Daniel Eischen <eischen@vigrid.com>,
	arch@FreeBSD.ORG
Subject: Re: Modifying FILE to add lock
References: <200011300653.XAA07381@usr08.primenet.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Terry Lambert wrote:
> 
> You might consider using the old "debugging malloc" trick,
> of allocating one structure, but referring to another, and
> reference your "hidden" lock at a negative offset.

Hmmmm.... yes. This would present an unchanged struct __sFILE to
programs, but adding a field at the end would also present an unchanged
struct __sFILE. In both cases, the program doesn't know there are more
fields; either before or after what it thinks is struct __sFILE. Adding
to the struct however is much simpler.
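
A small sketch of the append-at-the-end case, using made-up structs
(old_stream and new_stream are hypothetical and are not the real
struct __sFILE).  It checks that the existing members keep their
offsets, and also that the size of the struct changes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "old" layout that existing binaries were built against. */
struct old_stream {
	unsigned char	*ptr;
	int		 avail;
	short		 flags;
};

/* Same members in the same order, with a lock appended at the end. */
struct new_stream {
	unsigned char	*ptr;
	int		 avail;
	short		 flags;
	int		 lock;	/* stand-in for a real lock type */
};

/* Returns 1 if every old member sits at the same offset in the new layout. */
int
old_members_unmoved(void)
{
	return (offsetof(struct old_stream, ptr) ==
	    offsetof(struct new_stream, ptr) &&
	    offsetof(struct old_stream, avail) ==
	    offsetof(struct new_stream, avail) &&
	    offsetof(struct old_stream, flags) ==
	    offsetof(struct new_stream, flags));
}

/* Returns 1 if the struct grew, which matters to code that uses sizeof. */
int
size_changed(void)
{
	return (sizeof(struct new_stream) != sizeof(struct old_stream));
}
```

Code that only follows pointers handed out by the library is unaffected
by the extra field.  Code that declares the struct itself, embeds it by
value, or indexes an array of them will disagree with the library about
the element size.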

-- 
Marcel Moolenaar
  mail: marcel@cup.hp.com / marcel@FreeBSD.org
  tel:  (408) 447-4222


From owner-freebsd-arch  Wed Nov 29 23:44:50 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from winston.osd.bsdi.com (winston.osd.bsdi.com [204.216.27.229])
	by hub.freebsd.org (Postfix) with ESMTP
	id 78B7D37B698; Wed, 29 Nov 2000 23:44:48 -0800 (PST)
Received: from winston.osd.bsdi.com (jkh@localhost [127.0.0.1])
	by winston.osd.bsdi.com (8.11.1/8.11.1) with ESMTP id eAU7igM76143;
	Wed, 29 Nov 2000 23:44:42 -0800 (PST)
	(envelope-from jkh@winston.osd.bsdi.com)
To: "Kenneth D. Merry" <ken@kdm.org>
Cc: arch@FreeBSD.ORG, gallatin@FreeBSD.ORG
Subject: Re: zero copy code review 
In-Reply-To: Message from "Kenneth D. Merry" <ken@kdm.org> 
   of "Wed, 29 Nov 2000 23:16:53 MST." <20001129231653.A1503@panzer.kdm.org> 
Date: Wed, 29 Nov 2000 23:44:42 -0800
Message-ID: <76139.975570282@winston.osd.bsdi.com>
From: Jordan Hubbard <jkh@winston.osd.bsdi.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> You need two kernel options and a sysctl to turn it on.  The zero copy NFS
> code can be turned on with gdb, although it might be better to make that
> into a sysctl.  (I haven't played with the zero copy NFS code much, Drew

I agree that it really should be a sysctl.

> Anyway, I'd like to commit this code sometime next week, if no one comes up
> with any issues or problems.

How about adding that extra sysctl first. :-)

- Jordan


From owner-freebsd-arch  Wed Nov 29 23:46:42 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from panzer.kdm.org (panzer.kdm.org [216.160.178.169])
	by hub.freebsd.org (Postfix) with ESMTP
	id DBAE737B402; Wed, 29 Nov 2000 23:46:39 -0800 (PST)
Received: (from ken@localhost)
	by panzer.kdm.org (8.9.3/8.9.1) id AAA02077;
	Thu, 30 Nov 2000 00:46:36 -0700 (MST)
	(envelope-from ken)
Date: Thu, 30 Nov 2000 00:46:36 -0700
From: "Kenneth D. Merry" <ken@kdm.org>
To: Jordan Hubbard <jkh@winston.osd.bsdi.com>
Cc: arch@FreeBSD.ORG, gallatin@FreeBSD.ORG
Subject: Re: zero copy code review
Message-ID: <20001130004636.A2061@panzer.kdm.org>
References: <ken@kdm.org> <76139.975570282@winston.osd.bsdi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <76139.975570282@winston.osd.bsdi.com>; from jkh@winston.osd.bsdi.com on Wed, Nov 29, 2000 at 11:44:42PM -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, Nov 29, 2000 at 23:44:42 -0800, Jordan Hubbard wrote:
> > You need two kernel options and a sysctl to turn it on.  The zero copy NFS
> > code can be turned on with gdb, although it might be better to make that
> > into a sysctl.  (I haven't played with the zero copy NFS code much, Drew
> 
> I agree that it really should be a sysctl.
> 
> > Anyway, I'd like to commit this code sometime next week, if no one comes up
> > with any issues or problems.
> 
> How about adding that extra sysctl first. :-)

Okay, will-do. :)

Ken
-- 
Kenneth Merry
ken@kdm.org


From owner-freebsd-arch  Thu Nov 30  1:31:49 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from flood.ping.uio.no (flood.ping.uio.no [129.240.78.31])
	by hub.freebsd.org (Postfix) with ESMTP id 76BDA37B401
	for <arch@FreeBSD.ORG>; Thu, 30 Nov 2000 01:31:47 -0800 (PST)
Received: (from des@localhost)
	by flood.ping.uio.no (8.9.3/8.9.3) id KAA79516;
	Thu, 30 Nov 2000 10:31:23 +0100 (CET)
	(envelope-from des@ofug.org)
X-URL: http://www.ofug.org/~des/
X-Disclaimer: The views expressed in this message do not necessarily
  coincide with those of any organisation or company with
  which I am or have been affiliated.
To: Tony Finch <dot@dotat.at>
Cc: Jordan Hubbard <jkh@winston.osd.bsdi.com>,
	Kirk McKusick <mckusick@mckusick.com>, arch@FreeBSD.ORG,
	rps@merlin.mat.uc.pt
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.)
References: <mckusick@mckusick.com> <53352.975375693@winston.osd.bsdi.com> <20001130073056.G58294@hand.dotat.at>
From: Dag-Erling Smorgrav <des@ofug.org>
Date: 30 Nov 2000 10:31:23 +0100
In-Reply-To: Tony Finch's message of "Thu, 30 Nov 2000 07:30:56 +0000"
Message-ID: <xzpwvdlx0us.fsf@flood.ping.uio.no>
Lines: 9
User-Agent: Gnus/5.0802 (Gnus v5.8.2) Emacs/20.4
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Tony Finch <dot@dotat.at> writes:
> And we *love* SVR4 OSs that bind ^? to intr.

The only one I've come across that does that is IRIX, but it's really
a *major* PITA.

DES
-- 
Dag-Erling Smorgrav - des@ofug.org


From owner-freebsd-arch  Thu Nov 30  1:50:18 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mimer.webgiro.com (unknown [213.162.128.50])
	by hub.freebsd.org (Postfix) with ESMTP id 03C4B37B401
	for <arch@FreeBSD.ORG>; Thu, 30 Nov 2000 01:50:14 -0800 (PST)
Received: by mimer.webgiro.com (Postfix, from userid 66)
	id F20682DC0B; Thu, 30 Nov 2000 10:52:15 +0100 (CET)
Received: by mx.webgiro.com (Postfix, from userid 1001)
	id 580E77817; Thu, 30 Nov 2000 10:48:43 +0100 (CET)
Received: from localhost (localhost [127.0.0.1])
	by mx.webgiro.com (Postfix) with ESMTP
	id 4792010E1B; Thu, 30 Nov 2000 10:48:43 +0100 (CET)
Date: Thu, 30 Nov 2000 10:48:43 +0100 (CET)
From: Andrzej Bialecki <abial@webgiro.com>
To: Terry Lambert <tlambert@primenet.com>
Cc: Alfred Perlstein <bright@wintelcom.net>, arch@FreeBSD.ORG
Subject: Re: HEADSUP user struct ucred -> xucred (Was: Re: serious problem
 with mutexs and userland visibility?)
In-Reply-To: <200011300628.XAA06955@usr08.primenet.com>
Message-ID: <Pine.BSF.4.20.0011301037190.51755-100000@mx.webgiro.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 30 Nov 2000, Terry Lambert wrote:

> > > Unfortunately struct ucred is used by some userland utils and
> > > sys/ucred.h is included in sys/mount.h as well as sys/user.h, this
> > > creates somewhat of a problem, forcing all users of sys/ucred.h to
> > > include sys/mutex.h.
> > > 
> > > I have a patch here that sort of takes care of this problem, the
> > > problem is that I had to add sys/mutex.h includes to both sys/mount.h
> > > and sys/user.h, this doesn't make me very happy.
> > 
> > After a short discussion it has been determined that there will be
> > an xucred exported to userland, following the convention of xsocket
> > and the various other xfoo structs exported from the kernel.
> > 
> > Struct ucred will no longer be visible outside the kernel.
> > 
> > Any userland things using struct ucred will need to use xucred.
> > 
> > This will be the convention used to resolve mutex (or other MD 
> > fields) in kernel exported structures in the future.
> 
> This is a really gross way to handle this.  The ucred structure
> is used by a lot of user space programs.
> 
> You should do what several UNIX vendors have already done, and
> implement a MUTEX() declaration macro that differs in user and
> kernel space, and forces an alignment; then when you copy out,
> copy out everything _BUT_ the mutex portion to the user space,
> and no user space source or object code will need to change.

But don't we have the same issue with other parts of kernel structures
that we don't want to make visible to userland, not just the
mutexes?

I had some discussion with Robert Watson a few days ago about the need to
hide the layout of struct proc (and the changes it undergoes) from
userland, which would allow us to stabilize the kernel interface to user
utilities, like libkvm and friends (which probably should use a
specialized sysctl anyway). This goal would be quite difficult to achieve
with just macros (and ugly at that..), so we thought about fixing all
places where these structs are accessible to use a special version of "user
space struct proc" (== struct xproc? :-).

This way no user space code will have to be changed (more than today,
i.e. recompile libkvm et al., as usual), we could hide the complexities
that we don't want to be visible outside the kernel, and we gain the
stability in kernel/user interface (i.e. no more recompiles of userland
needed if you update the kernel with changed struct proc size).
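
A minimal sketch of the export-struct idea, assuming a field-by-field
copy.  Both structs and the xproc_fill() helper are hypothetical; the
real struct proc has far more in it, and no struct xproc is defined in
this thread.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical kernel-private layout; free to change between releases. */
struct kproc_sketch {
	void	*private_lock;
	int	 pid;
	void	*private_list;
	int	 ppid;
	char	 comm[17];
};

/* Hypothetical exported layout; only this one is promised to userland. */
struct xproc_sketch {
	int	pid;
	int	ppid;
	char	comm[17];
};

/* Fill the exported struct from the kernel one, one field at a time. */
void
xproc_fill(const struct kproc_sketch *kp, struct xproc_sketch *xp)
{
	memset(xp, 0, sizeof(*xp));	/* no stale bytes in fields or padding */
	xp->pid = kp->pid;
	xp->ppid = kp->ppid;
	memcpy(xp->comm, kp->comm, sizeof(xp->comm));
	xp->comm[sizeof(xp->comm) - 1] = '\0';
}
```

Because the copy names each field, members can be added to or reordered
in the kernel struct without changing what userland sees.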

Andrzej Bialecki

//  <abial@webgiro.com> WebGiro AB, Sweden (http://www.webgiro.com)
// -------------------------------------------------------------------
// ------ FreeBSD: The Power to Serve. http://www.freebsd.org --------
// --- Small & Embedded FreeBSD: http://www.freebsd.org/~picobsd/ ----


From owner-freebsd-arch  Thu Nov 30  1:55: 9 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mimer.webgiro.com (unknown [213.162.128.50])
	by hub.freebsd.org (Postfix) with ESMTP id 7552737B400
	for <arch@FreeBSD.ORG>; Thu, 30 Nov 2000 01:55:07 -0800 (PST)
Received: by mimer.webgiro.com (Postfix, from userid 66)
	id BAFA42DC0E; Thu, 30 Nov 2000 10:57:16 +0100 (CET)
Received: by mx.webgiro.com (Postfix, from userid 1001)
	id A2BC07817; Thu, 30 Nov 2000 10:51:41 +0100 (CET)
Received: from localhost (localhost [127.0.0.1])
	by mx.webgiro.com (Postfix) with ESMTP
	id 9474E10E1B; Thu, 30 Nov 2000 10:51:41 +0100 (CET)
Date: Thu, 30 Nov 2000 10:51:41 +0100 (CET)
From: Andrzej Bialecki <abial@webgiro.com>
To: Dag-Erling Smorgrav <des@ofug.org>
Cc: arch@FreeBSD.ORG
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO
 image for x86 updated.)
In-Reply-To: <xzpwvdlx0us.fsf@flood.ping.uio.no>
Message-ID: <Pine.BSF.4.20.0011301049410.51755-100000@mx.webgiro.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On 30 Nov 2000, Dag-Erling Smorgrav wrote:

> Tony Finch <dot@dotat.at> writes:
> > And we *love* SVR4 OSs that bind ^? to intr.
> 
> The only one I've come across that does that is IRIX, but it's really
> a *major* PITA.

SCO OpenServer does this as well. I hate it.

Andrzej Bialecki

//  <abial@webgiro.com> WebGiro AB, Sweden (http://www.webgiro.com)
// -------------------------------------------------------------------
// ------ FreeBSD: The Power to Serve. http://www.freebsd.org --------
// --- Small & Embedded FreeBSD: http://www.freebsd.org/~picobsd/ ----


From owner-freebsd-arch  Thu Nov 30  2:26:21 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 2B18A37B400
	for <arch@FreeBSD.ORG>; Thu, 30 Nov 2000 02:26:19 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id eAUAQEn18760;
	Thu, 30 Nov 2000 02:26:14 -0800 (PST)
Date: Thu, 30 Nov 2000 02:26:14 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Andrzej Bialecki <abial@webgiro.com>
Cc: Terry Lambert <tlambert@primenet.com>, arch@FreeBSD.ORG
Subject: Re: HEADSUP user struct ucred -> xucred (Was: Re: serious problem with mutexs and userland visibility?)
Message-ID: <20001130022614.W8051@fw.wintelcom.net>
References: <200011300628.XAA06955@usr08.primenet.com> <Pine.BSF.4.20.0011301037190.51755-100000@mx.webgiro.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <Pine.BSF.4.20.0011301037190.51755-100000@mx.webgiro.com>; from abial@webgiro.com on Thu, Nov 30, 2000 at 10:48:43AM +0100
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Andrzej Bialecki <abial@webgiro.com> [001130 01:50] wrote:
> On Thu, 30 Nov 2000, Terry Lambert wrote:
> 
> > > After a short discussion it has been determined that there will be
> > > an xucred exported to userland, following the convention of xsocket
> > > and the various other xfoo structs exported from the kernel.
> > 
> > You should do what several UNIX vendors have already done, and
> > implement a MUTEX() declaration macro that differs in user and
> > kernel space, and forces an alignment; then when you copy out,
> > copy out everything _BUT_ the mutex portion to the user space,
> > and no user space source or object code will need to change.
> 
> But don't we have the same issue with other parts of kernel structures
> that we don't want to make visible to userland, not just the
> mutexes.

True.

> I had some discussion with Robert Watson a few days ago about the need to
> hide the layout of struct proc (and the changes it undergoes) from
> userland, which would allow to stabilize kernel interface to user
> utilities, like libkvm and friends (which probably should use
> specialized sysctl anyway). This goal would be quite difficult to achieve
> with just macros (and ugly at that..), so we thought about fixing all
> places where these structs are accessible to use special version of "user
> space struct proc" (== struct xproc? :-).

Ok, kvm is killing me. :/

see:
~"lib/libkvm/kvm_proc.c" line 125 of 793

libkvm expects to be able to copy the pointer in the struct proc into
its own struct.

My only chance (or so it seems) is to keep all userland visible parts
of the ucred at the beginning of it, as well as forcing the same
order to keep libkvm happy.  Then it can effectively:

  /* struct ucred *uc; struct xucred *xuc; */
  bcopy(uc, xuc, sizeof(struct xucred));

without worries.  This is pretty hackish, but libkvm isn't exactly
your state of the art interface.

This is pretty close to what Terry suggested but less scary in
my opinion as long as we add a comment to sys/ucred.h about
keeping kernel only fields at the end of the struct.

?
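
A sketch of the prefix-copy arrangement described above.  The two
structs and their field names are invented for illustration and are not
the real struct ucred or struct xucred; memcpy() is used in place of
bcopy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NGROUPS	16

/* Hypothetical exported struct; field names are illustrative only. */
struct xcred_sketch {
	unsigned int	uid;
	short		ngroups;
	unsigned int	groups[SKETCH_NGROUPS];
};

/* Hypothetical kernel struct: exported fields first, kernel-only last. */
struct kcred_sketch {
	unsigned int	uid;
	short		ngroups;
	unsigned int	groups[SKETCH_NGROUPS];
	/* kernel-only fields below this line */
	unsigned int	refcount;
	long		fake_mutex[4];
};

/* Returns 1 if the exported fields line up and the prefix is clean. */
int
cred_prefix_matches(void)
{
	return (offsetof(struct kcred_sketch, uid) ==
	    offsetof(struct xcred_sketch, uid) &&
	    offsetof(struct kcred_sketch, ngroups) ==
	    offsetof(struct xcred_sketch, ngroups) &&
	    offsetof(struct kcred_sketch, groups) ==
	    offsetof(struct xcred_sketch, groups) &&
	    /* no kernel-only field may start inside the copied prefix */
	    offsetof(struct kcred_sketch, refcount) >=
	    sizeof(struct xcred_sketch));
}

/* Export by copying only the leading, userland-visible part. */
void
cred_export(const struct kcred_sketch *kc, struct xcred_sketch *xc)
{
	assert(cred_prefix_matches());
	memcpy(xc, kc, sizeof(*xc));
}
```

The last check in cred_prefix_matches() covers a subtle case: if the
exported struct ended in padding, a kernel-only field could be laid out
inside the bytes that the prefix copy hands to userland.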

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


From owner-freebsd-arch  Thu Nov 30  5:44:41 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1])
	by hub.freebsd.org (Postfix) with ESMTP id 6B7AA37B401
	for <arch@FreeBSD.ORG>; Thu, 30 Nov 2000 05:44:39 -0800 (PST)
Received: from berserker.bsdi.com (cp@localhost.bsdi.com [127.0.0.1])
	by berserker.bsdi.com (8.11.1/8.9.3) with ESMTP id eAUDiTv03105;
	Thu, 30 Nov 2000 06:44:29 -0700 (MST)
	(envelope-from cp@berserker.bsdi.com)
Message-Id: <200011301344.eAUDiTv03105@berserker.bsdi.com>
To: Alfred Perlstein <bright@wintelcom.net>
Cc: arch@FreeBSD.ORG
Subject: Re: serious problem with mutexs and userland visibility? 
In-reply-to: Your message of "Wed, 29 Nov 2000 12:55:09 PST."
             <20001129125508.O8051@fw.wintelcom.net> 
From: Chuck Paterson <cp@bsdi.com>
Date: Thu, 30 Nov 2000 06:44:29 -0700
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

You might want to look at how the lock manager deals with
mutexes. This same approach ought to work for the cred stuff, which
has a lower usage rate than the lock manager, and you can adjust
your level of lock sharing.
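
A rough sketch of what such lock sharing could look like, assuming the
approach meant here is a small pool of mutexes selected by hashing the
object's address. That reading is an assumption, and every name below
(ex_mtx_pool, ex_pool_mutex, struct ex_cred) is hypothetical. POSIX mutexes
stand in for kernel mutexes so that the example builds in userland.

```c
#include <pthread.h>
#include <stdint.h>

#define EX_POOL_SIZE 4              /* power of two; larger means less sharing */

/* One small pool of mutexes shared by all credentials. */
static pthread_mutex_t ex_mtx_pool[EX_POOL_SIZE] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};

/* The structure itself carries no mutex, so its layout stays unchanged. */
struct ex_cred {
    unsigned ref;
};

/* Pick the pool entry for an object by hashing its address. */
static pthread_mutex_t *
ex_pool_mutex(const void *obj)
{
    uintptr_t h = (uintptr_t)obj;

    h ^= h >> 8;                    /* fold in some higher address bits */
    return (&ex_mtx_pool[(h >> 4) & (EX_POOL_SIZE - 1)]);
}

/* Take a reference under the shared mutex. */
static void
ex_cred_hold(struct ex_cred *c)
{
    pthread_mutex_t *m = ex_pool_mutex(c);

    pthread_mutex_lock(m);
    c->ref++;
    pthread_mutex_unlock(m);
}

/* Drop a reference; returns the number of references remaining. */
static unsigned
ex_cred_drop(struct ex_cred *c)
{
    pthread_mutex_t *m = ex_pool_mutex(c);
    unsigned left;

    pthread_mutex_lock(m);
    left = --c->ref;
    pthread_mutex_unlock(m);
    return (left);
}
```

Because the mutex lives outside the structure, userland headers would not
need sys/mutex.h; EX_POOL_SIZE is the knob that trades memory for contention.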


Chuck




From owner-freebsd-arch  Thu Nov 30  6:49:39 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from peorth.iteration.net (peorth.iteration.net [208.190.180.178])
	by hub.freebsd.org (Postfix) with ESMTP
	id 5160337B400; Thu, 30 Nov 2000 06:49:34 -0800 (PST)
Received: by peorth.iteration.net (Postfix, from userid 1001)
	id D45D9573A5; Thu, 30 Nov 2000 08:49:31 -0600 (CST)
Date: Thu, 30 Nov 2000 08:49:31 -0600
From: "Michael C . Wu" <keichii@iteration.net>
To: Cy Schubert - ITSD Open Systems Group <Cy.Schubert@uumail.gov.bc.ca>
Cc: Poul-Henning Kamp <phk@FreeBSD.ORG>, current@FreeBSD.ORG,
	arch@FreeBSD.ORG
Subject: Re: RFC: /dev/console -> /var/log/messages idea/patch
Message-ID: <20001130084931.C16834@peorth.iteration.net>
Reply-To: "Michael C . Wu" <keichii@peorth.iteration.net>
References: <1050.974925641@critter> <200011251540.eAPFe4N00849@cwsys.cwsent.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200011251540.eAPFe4N00849@cwsys.cwsent.com>; from Cy.Schubert@uumail.gov.bc.ca on Sat, Nov 25, 2000 at 07:39:33AM -0800
X-PGP-Fingerprint: 5025 F691 F943 8128 48A8  5025 77CE 29C5 8FA1 2E20
X-PGP-Key-ID: 0x8FA12E20
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sat, Nov 25, 2000 at 07:39:33AM -0800, Cy Schubert - ITSD Open Systems Group scribbled:
| In message <1050.974925641@critter>, Poul-Henning Kamp writes:
| >
| > The attached patch is a "proof-of-concept" on which I would like
| > to get some comments:
| >
| > It bugs me big time that the output from /etc/rc and all other output
| > to /dev/console is volatile and lost once it scrolls off your console.
|
| It's a no-brainer.  Let's do it.

How about networked ddb/gdb over {ether,ppp,usb,firewire,IrDA}?
Firewire and IrDA are works in progress AFAIK, but certainly
ddb/gdb networked debugging is what all of FreeBSD dreams of, right? :)

The PPC port would greatly benefit from this, as newer Apple stations
do not even have a serial port.

Darwin seems to have networked debugging.
--
+------------------------------------------------------------------+
| keichii@peorth.iteration.net         | keichii@bsdconspiracy.net |
| http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
+------------------------------------------------------------------+




From owner-freebsd-arch  Thu Nov 30  7: 3: 6 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from peorth.iteration.net (peorth.iteration.net [208.190.180.178])
	by hub.freebsd.org (Postfix) with ESMTP id 465AE37B400
	for <arch@FreeBSD.ORG>; Thu, 30 Nov 2000 07:02:59 -0800 (PST)
Received: by peorth.iteration.net (Postfix, from userid 1001)
	id 2C835573A5; Thu, 30 Nov 2000 09:03:01 -0600 (CST)
Date: Thu, 30 Nov 2000 09:03:01 -0600
From: "Michael C . Wu" <keichii@iteration.net>
To: Kirk McKusick <mckusick@mckusick.com>
Cc: Jordan Hubbard <jkh@winston.osd.bsdi.com>, arch@FreeBSD.ORG
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.)
Message-ID: <20001130090301.D16834@peorth.iteration.net>
Reply-To: "Michael C . Wu" <keichii@peorth.iteration.net>
References: <52694.975362925@winston.osd.bsdi.com> <200011272241.OAA93364@beastie.mckusick.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200011272241.OAA93364@beastie.mckusick.com>; from mckusick@mckusick.com on Mon, Nov 27, 2000 at 02:41:05PM -0800
X-PGP-Fingerprint: 5025 F691 F943 8128 48A8  5025 77CE 29C5 8FA1 2E20
X-PGP-Key-ID: 0x8FA12E20
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Mon, Nov 27, 2000 at 02:41:05PM -0800, Kirk McKusick scribbled:
| When we first implemented termios at CSRG, we had an erase2
| character. Mike Karels was vehemently opposed to it, and
| insisted that it be deleted before we did our next release
| (4.3-tahoe if I remember correctly). I am of the opinion that
| it is a good idea, and should be there. I do not believe that
| we need/want a general aliasing facility as erase is really
| the only character for which there is widespread disagreement
| over which character to use. So, my take would be to add
| erase2 and be done with it.

/me putting on I18N crybaby hat

This feature has one very important aspect that I18N can use very
well.  Currently, for two-byte characters, we need to put delete
twice in console/tty/et al.  The best way to solve this would be
having the tty determine whether it is a two-byte or one-byte
character.  Then the tty determines whether to push ^H/^? once
or twice depending on the character.
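
As a rough illustration of the width-aware erase described above, the sketch
below decides how many bytes one erase keystroke should remove. It assumes a
well-formed EUC-style encoding in which every two-byte character has the high
bit set on both bytes; the function names are hypothetical and this is not how
any existing tty code works.

```c
#include <stddef.h>

/*
 * Number of bytes occupied by the last character in the line buffer.
 * Assumes well-formed EUC-style text: ASCII is one byte with the high
 * bit clear, everything else is two bytes with the high bit set on both.
 */
static size_t
ex_erase_width(const unsigned char *line, size_t len)
{
    if (len == 0)
        return (0);
    if (len >= 2 && (line[len - 1] & 0x80) && (line[len - 2] & 0x80))
        return (2);
    return (1);
}

/* Erase one character; returns the new length of the line buffer. */
static size_t
ex_erase(const unsigned char *line, size_t len)
{
    return (len - ex_erase_width(line, len));
}
```

With this in the line discipline, a single erase key removes a whole
character regardless of its width, so the user never has to press it twice.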

It would be easy to simply alias backspace/delete to two "^H/^?"'s
when we meet a two-byte character.  Please do not lock us
into hardcoding these erase2 characters and assuming that everybody
uses English only.  I am not pointing fingers, but this mistake
was made many years ago in all *nix systems; perhaps we should not
hardcode this kind of stuff again. :)

/me hides and takes off all hats

--
+------------------------------------------------------------------+
| keichii@peorth.iteration.net         | keichii@bsdconspiracy.net |
| http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
+------------------------------------------------------------------+




From owner-freebsd-arch  Thu Nov 30  8: 6:53 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from feral.com (feral.com [192.67.166.1])
	by hub.freebsd.org (Postfix) with ESMTP id F194B37B402
	for <arch@FreeBSD.ORG>; Thu, 30 Nov 2000 08:06:40 -0800 (PST)
Received: from beppo (beppo [192.67.166.79])
	by feral.com (8.9.3/8.9.3) with ESMTP id IAA26804;
	Thu, 30 Nov 2000 08:06:16 -0800
Date: Thu, 30 Nov 2000 08:06:17 -0800 (PST)
From: Matthew Jacob <mjacob@feral.com>
Reply-To: mjacob@feral.com
To: Andrzej Bialecki <abial@webgiro.com>
Cc: Dag-Erling Smorgrav <des@ofug.org>, arch@FreeBSD.ORG
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO
 image for x86 updated.)
In-Reply-To: <Pine.BSF.4.20.0011301049410.51755-100000@mx.webgiro.com>
Message-ID: <Pine.BSF.4.21.0011300800420.96908-100000@beppo.feral.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


Tsk. Then y'all wouldn't love to have a PDP 11/45 running V6/PWB, now would
you then?

Jeez. All this beefing over defaults, and nobody has the little gray cells to
thank whatever deities they might believe in that it is possible to change
these defaults as part of their login process- a feature that is there so they
can do something clever like turn off that pesky echokill feature and change
their line kill character to SPACE (a favorite amongst us who were young once
and decided to stay that way- this was the default action we would do to
someone who wandered off and left themselves logged in to one of the Vt52s).

-matt


On Thu, 30 Nov 2000, Andrzej Bialecki wrote:

> On 30 Nov 2000, Dag-Erling Smorgrav wrote:
> 
> > Tony Finch <dot@dotat.at> writes:
> > > And we *love* SVR4 OSs that bind ^? to intr.
> > 
> > The only one I've come across that does that is IRIX, but it's really
> > a *major* PITA.
> 
> SCO OpenServer does this as well. I hate it.
> 
> Andrzej Bialecki
> 
> //  <abial@webgiro.com> WebGiro AB, Sweden (http://www.webgiro.com)
> // -------------------------------------------------------------------
> // ------ FreeBSD: The Power to Serve. http://www.freebsd.org --------
> // --- Small & Embedded FreeBSD: http://www.freebsd.org/~picobsd/ ----




From owner-freebsd-arch  Thu Nov 30  8:23:38 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.interware.hu (mail.interware.hu [195.70.32.130])
	by hub.freebsd.org (Postfix) with ESMTP id 4C81837B400
	for <arch@freebsd.org>; Thu, 30 Nov 2000 08:23:35 -0800 (PST)
Received: from luanda-16.budapest.interware.hu ([195.70.51.16] helo=elischer.org)
	by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian))
	id 141WUc-0001yp-00; Thu, 30 Nov 2000 17:23:31 +0100
Message-ID: <3A2664AC.493B4101@elischer.org>
Date: Thu, 30 Nov 2000 06:31:08 -0800
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Tony Finch <dot@dotat.at>
Cc: arch@freebsd.org
Subject: Re: Thread-specific data and KSEs
References: <20001122133421.S18037@fw.wintelcom.net> <Pine.SUN.3.91.1001122180448.7920A-100000@pcnet1.pcnet.com> <20001130065503.E58294@hand.dotat.at>
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Tony Finch wrote:
> 
> Daniel Eischen <eischen@vigrid.com> wrote:
> >On Wed, 22 Nov 2000, Alfred Perlstein wrote:
> >>
> >> Was there something wrong with the suggestion to put the local info
> >> on the stack?  I just don't see it being discussed at all.
> >
> >Yes, I stated that it could not be used.  We want to provide a POSIX
> >compliant API, and this dictates that applications be able to create
> >stacks of their own size and choosing.  We can't rely on stacks being
> >any particular size, or starting at any particular address.
> 
> Additionally, wouldn't you have to walk up the stack to find its base?
> (which I guess would be a bit more expensive than dereferencing %gs)

No, you start each stack on some multiple of (say) 1MB, and then
you just OR the stack pointer with 0xfffff to find the top of the stack.
(This is what one of the MACH threads packages used to do)
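
The address arithmetic can be sketched as follows. This assumes every stack
occupies exactly one 1MB-aligned region, which is precisely the assumption
that application-chosen stacks would break; the helper names are hypothetical.

```c
#include <stdint.h>

#define EX_STACK_ALIGN ((uintptr_t)1 << 20)     /* 1MB */
#define EX_STACK_MASK  (EX_STACK_ALIGN - 1)     /* 0xfffff */

/* Highest byte address of the 1MB region containing sp. */
static uintptr_t
ex_stack_top(uintptr_t sp)
{
    return (sp | EX_STACK_MASK);
}

/* Lowest byte address of the 1MB region containing sp. */
static uintptr_t
ex_stack_base(uintptr_t sp)
{
    return (sp & ~EX_STACK_MASK);
}
```

Per-thread data placed at a fixed offset from either end can then be reached
from any stack pointer value with one bitwise operation and no stack walk.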

> 
> Tony.
> --
> f.a.n.finch     dot@dotat.at     fanf@covalent.net     Chad for President!

-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000
---> X_.---._/  presently in:  Budapest
            v




From owner-freebsd-arch  Thu Nov 30 13:22:25 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222])
	by hub.freebsd.org (Postfix) with ESMTP id ECB0D37B402
	for <arch@FreeBSD.org>; Thu, 30 Nov 2000 13:22:20 -0800 (PST)
Received: from laptop.baldwin.cx (john@dhcp246.osd.bsdi.com [204.216.28.246])
	by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id eAULM8C71530;
	Thu, 30 Nov 2000 13:22:09 -0800 (PST)
	(envelope-from jhb@FreeBSD.org)
Message-ID: <XFMail.001130132229.jhb@FreeBSD.org>
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <200011300628.XAA06955@usr08.primenet.com>
Date: Thu, 30 Nov 2000 13:22:29 -0800 (PST)
From: John Baldwin <jhb@FreeBSD.org>
To: Terry Lambert <tlambert@primenet.com>
Subject: Re: HEADSUP user struct ucred -> xucred (Was: Re: serious proble
Cc: arch@FreeBSD.org, (Alfred Perlstein) <bright@wintelcom.net>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On 30-Nov-00 Terry Lambert wrote:
>> > I recently locked down struct ucred, not a big deal, basically just
>> > a mutex in each struct to protect the refcount.
>> > 
>> > Unfortunately struct ucred is used by some userland utils and
>> > sys/ucred.h is included in sys/mount.h as well as sys/user.h, this
>> > creates somewhat of a problem, forcing all users of sys/ucred.h to
>> > include sys/mutex.h.
>> > 
>> > I have a patch here that sort of takes care of this problem, the
>> > problem is that I had to add sys/mutex.h includes to both sys/mount.h
>> > and sys/user.h, this doesn't make me very happy.
>> 
>> After a short discussion it has been determined that there will be
>> a xucred exported to userland following the convention of xsocket
>> and the various other xfoo structs exported from the kernel.
>> 
>> Struct ucred will no longer be visible outside the kernel.
>> 
>> Any userland things using struct ucred will need to use xucred.
>> 
>> This will be the convention used to resolve mutex (or other MD 
>> fields) in kernel exported structures in the future.
> 
> This is a really gross way to handle this.  The ucred structure
> is used by a lot of user space programs.

Another way I suggested that was shot down was to do something like this:

#ifdef _KERNEL
struct ucred {
     ... kernel structure ...
};

struct xucred {
#else
struct ucred {
#endif
     ... userland structure ...
};

So that ucred didn't change for userland, but the kernel would have ucred for
its internal ucred and xucred for the userland ucred.  This allows no userland
changes, and all you would need to do is convert ucred to xucred and vice versa
at the boundary.
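
To make "convert at the boundary" concrete, here is a minimal sketch with
hypothetical names (ex_kcred for the kernel-internal layout, ex_xcred for the
userland one) and a made-up field set. Unlike a bulk copy, a field-by-field
conversion lets the two layouts differ freely.

```c
#include <string.h>
#include <sys/types.h>

#define EX_NGROUPS 16               /* hypothetical group-list size */

/* What userland sees. */
struct ex_xcred {
    uid_t cr_uid;
    short cr_ngroups;
    gid_t cr_groups[EX_NGROUPS];
};

/* Kernel-internal layout; free to differ, lock and refcount come first. */
struct ex_kcred {
    void    *cr_mtx;                /* stand-in for the real mutex */
    unsigned cr_ref;
    uid_t    cr_uid;
    short    cr_ngroups;
    gid_t    cr_groups[EX_NGROUPS];
};

/* Kernel -> user: copy only the exported fields. */
static void
ex_cred_to_user(const struct ex_kcred *k, struct ex_xcred *x)
{
    memset(x, 0, sizeof(*x));       /* no stray kernel bytes in padding */
    x->cr_uid = k->cr_uid;
    x->cr_ngroups = k->cr_ngroups;
    memcpy(x->cr_groups, k->cr_groups, sizeof(x->cr_groups));
}

/* User -> kernel: validate, and leave the lock and refcount alone. */
static void
ex_cred_from_user(const struct ex_xcred *x, struct ex_kcred *k)
{
    short n = x->cr_ngroups;

    if (n < 0)
        n = 0;
    if (n > EX_NGROUPS)
        n = EX_NGROUPS;
    k->cr_uid = x->cr_uid;
    k->cr_ngroups = n;
    memcpy(k->cr_groups, x->cr_groups, sizeof(k->cr_groups));
}
```

The cost is one copy per crossing; in exchange the kernel structure can grow
a mutex or be reordered without any userland-visible change.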

-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/




From owner-freebsd-arch  Thu Nov 30 14:47:11 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135])
	by hub.freebsd.org (Postfix) with ESMTP id C472A37B400
	for <arch@FreeBSD.ORG>; Thu, 30 Nov 2000 14:47:05 -0800 (PST)
Received: (from daemon@localhost)
	by smtp05.primenet.com (8.9.3/8.9.3) id PAA26953;
	Thu, 30 Nov 2000 15:43:50 -0700 (MST)
Received: from usr05.primenet.com(206.165.6.205)
 via SMTP by smtp05.primenet.com, id smtpdAAA2NaOM0; Thu Nov 30 15:43:43 2000
Received: (from tlambert@localhost)
	by usr05.primenet.com (8.8.5/8.8.5) id PAA23413;
	Thu, 30 Nov 2000 15:46:56 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200011302246.PAA23413@usr05.primenet.com>
Subject: Re: HEADSUP user struct ucred -> xucred (Was: Re: serious problem
To: abial@webgiro.com (Andrzej Bialecki)
Date: Thu, 30 Nov 2000 22:46:55 +0000 (GMT)
Cc: tlambert@primenet.com (Terry Lambert),
	bright@wintelcom.net (Alfred Perlstein), arch@FreeBSD.ORG
In-Reply-To: <Pine.BSF.4.20.0011301037190.51755-100000@mx.webgiro.com> from "Andrzej Bialecki" at Nov 30, 2000 10:48:43 AM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> But don't we have the same issue with other parts of kernel structures
> that we don't want to make visible to userland, not just the
> mutexes.
> 
> I had some discussion with Robert Watson a few days ago about the need to
> hide the layout of struct proc (and the changes it undergoes) from
> userland, which would allow to stabilize kernel interface to user
> utilities, like libkvm and friends (which probably should use
> specialized sysctl anyway). This goal would be quite difficult to achieve
> with just macros (and ugly at that..), so we thought about fixing all
> places where these structs are accessible to use special version of "user
> space struct proc" (== struct xproc? :-).
> 
> This way no user space code will have to be changed (more than today,
> i.e. recompile libkvm et al., as usual), we could hide the complexities
> that we don't want to be visible outside the kernel, and we gain the
> stability in kernel/user interface (i.e. no more recompiles of userland
> needed if you update the kernel with changed struct proc size).

If you want to get technical, data interfaces are bad engineering,
a bad idea all around, and something which should be immediately
deprecated.  XML is surrounded by similar problems.


Really, there should be _NO_ reading of /dev/kmem, under any
circumstances.  Likewise, there should never be a case where
a kernel structure is copied out to user space directly: all
data to be externalized should be abstracted before it is
externalized.

So the canonically correct thing to do would be to surround
most of the kernel dependent headers with "#ifdef _KERNEL",
and not externalize _ANY_ structure declarations, whatsoever.

There are two major, and many minor, problems with this approach,
which boil down to data interfaces with no other available method
to solve the problem (today).  The first is latent interfaces,
and the second is bimodal interfaces.

A latent interface occurs when data is communicated with a
latency, and the latency is unavoidable, and can not be easily
worked around in code.  The number one latent interface is the
file system, with the latencies being present in newfs, tunefs,
fsck, and other utilities.  Since these utilities operate on
data which is not visible to the kernel (for good reason!) at
the time of the operation, the only option is a latent interface,
or rolling the functionality into the kernel itself.  This could
be done, but it's prohibitively expensive without discardable
code segments, which, while supported by ELF, are not supported
by FreeBSD.  Even were these supported by FreeBSD, you would
still need to deal with discrete kernel object files, since the
issue of license can not be resolved in a static linkage.  In
other words, it's possible to deal with this (Windows supports
this in its PE [Portable Executable] objects via segment attributes,
including "initialization", "discardable", "pageable", etc.),
but FreeBSD does not have the necessary technical sophistication
at the present time.

A bimodal interface is an interface intended to operate both
interactively, and against latent data, potentially with huge
latencies which can not be overcome with segment attribution,
etc..  An example of an interface like this is the interface
used by the "ps" command in order to obtain information from
the current system image (the granddaddy of all of these is a
kernel debugger).  Since the "ps" command must be able to run
against the existing system, and it must be able to run against
a crashdump of a system, perhaps sent via parcel post or carrier
pigeon, the interfaces it uses can not be separated from the
data against which they are implemented.

Worst case, "_KERNEL" could be defined in scope, and the
utilities could remain in user space.


The second case here is the most interesting, and the most
applicable to the ucred structure under discussion.

Actually, the "ps" command has limited utility against a crash
dump.  This is because it is linked against a libkvm, and has
itself intimate knowledge of a kernel structure (a historically
volatile one -- proc -- which has shown no signs of stabilizing,
in fact).  The libkvm provides symbolic reference to the kmem
image data base addresses, which can then be followed as linked
lists in order to obtain information.  The information is then
interpreted by the "ps" program itself, based on its knowledge
of the structure contents.

I think in the limit, this interface will have to die.  Consider
the case of a "ps" command in user space, with the proc struct
list protected by mutex from multiple CPUs and/or kernel
preemption: the user space program will neither honor, nor will
it itself assert, the protection mutex.  This means that it may
be running on one processor, while another is manipulating the
structure linkages.  Best case failure mode is the user space
process sees the list appear to terminate prematurely.  Worst
case, the user space process causes a fault while reading kmem,
or sees a circular reference, and fails to terminate properly,
spending all its time traversing the circular reference.

Another problem that will commonly arise is that the proc struct
known to the "ps" program, or the information known to the libkvm,
will change.  When you go to apply this information to an older
image, the newer tools will not operate.  It's a royal pain, but
it is possible to resynchronize this information in the common
interactive case, by insisting that builds be grouped.  For the
latent data case, this will not work.  In fact, most people who
follow -current have, at one time or another, found themselves
booted on a "kernel.old" because the new "kernel" was too unstable
to use, even to correct the stability problem as a bootstrap for
replacing itself.  When this happens subsequent to a rebuild of
libkvm and "ps" (and other utilities, such as "mount"), it is not
as easy to revert the rest of the system as it was to revert the
kernel.

One way to deal with this problem would be to attach segments
to the running kernel, which implement libkvm.  Programs could
map these in and use them as they would use any shared library
to get kvm information.  This is attractive, since it means that
you could map your libkvm from the crashdump image, instead of
the running kernel, or an old kernel (if symbols could not be
obtained from the dump image, only from the kernel of which the
image is a dump; I dislike this, as it means pushing around
synchronized file sets, but it's at least a workable kludge).
In this scenario, the libkvm/kernel synchronization problem has
been resolved.

This still leaves us with the "ps" program knowing about the
proc structure (and the "mount" program knowing about the mount
parameter structure, etc.).  This intimate knowledge can only
be worked around by abstraction.  This might consist of providing
a set of descriptors for data elements, and externalizing this as
"ps" formatting argument strings, etc..  These descriptors could
be bundled in with what was previously described as the shared
objects that could be bundled with the kernel, and mapped by
user programs.  This would provide a generic API to a protocol,
defined by the descriptors' interpretation at compile time and at
runtime of the program using the descriptors.  Not as abstract as
SMTP, but a lot better than an application centric API for doing
the same thing, and infinitely better than a data interface.
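
One possible shape for such descriptors, as a sketch only: the kernel (or
crashdump) ships a table naming each exported field with its type, offset and
size, and the consumer looks fields up by name instead of compiling in the
structure layout. All names below (struct ex_field_desc, ex_proc_desc,
ex_desc_get_int32) and the record layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum ex_field_type { EX_INT32, EX_STRING };

/* One descriptor per exported data element. */
struct ex_field_desc {
    const char        *name;
    enum ex_field_type type;
    size_t             offset;
    size_t             size;
};

/* A hypothetical record layout; only the table below depends on it. */
struct ex_proc {
    int32_t p_pid;
    char    p_comm[16];
    int32_t p_uid;
};

/* Table that would be bundled with the kernel image, NULL-terminated. */
static const struct ex_field_desc ex_proc_desc[] = {
    { "pid",  EX_INT32,  offsetof(struct ex_proc, p_pid),  sizeof(int32_t) },
    { "comm", EX_STRING, offsetof(struct ex_proc, p_comm), 16 },
    { "uid",  EX_INT32,  offsetof(struct ex_proc, p_uid),  sizeof(int32_t) },
    { NULL,   EX_INT32,  0, 0 }
};

/* Look a field up by name; returns NULL if this kernel does not export it. */
static const struct ex_field_desc *
ex_desc_find(const struct ex_field_desc *tab, const char *name)
{
    for (; tab->name != NULL; tab++)
        if (strcmp(tab->name, name) == 0)
            return (tab);
    return (NULL);
}

/* Fetch a 32-bit field from a raw record; 0 on success, -1 otherwise. */
static int
ex_desc_get_int32(const struct ex_field_desc *tab, const void *rec,
    const char *name, int32_t *out)
{
    const struct ex_field_desc *d = ex_desc_find(tab, name);

    if (d == NULL || d->type != EX_INT32 || d->size != sizeof(*out))
        return (-1);
    memcpy(out, (const char *)rec + d->offset, sizeof(*out));
    return (0);
}
```

A consumer written this way keeps working when fields move or new ones are
added, and it fails cleanly with -1 when a field it wants is absent.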

This still doesn't resolve the SMP problem.  This could be
handled by externalizing access to the locks to user space.
This would, IMO, be a terrible mistake.

A second approach would be to define an access point that could
act as an API when used interactively, and as a data interface
when used latently.  This is actually rather easy, when you
realize that latent use will be against a static snapshot, and
not have to worry about locking.  The locking can be hidden
behind the API, and the API can straddle a user/kernel boundary.
For "ps", the most logical API is a procfs.  The procfs can
act as a descriptor tree automatically, since FSs are themselves
hierarchical in nature.  Similarly, the in-core implementation
is such that the structure representing it can be traversed as
data, in a static image (ideally, however, one would want to
"fake" an FS interface, so as to keep the "shared library"
segments of the kernel small, even though they are never loaded
by the kernel into the kernel address space; this "faking" could
be done by abstracting file I/O using libkvm descriptors, and by
providing control over sysspace vs. userspace copying when trying
to do a "uiomove" to externalize FS data).
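
A small sketch of an access point that hides the locking, under the
assumption that a live source carries a lock and a static snapshot does not.
The types and names (struct ex_proc_source, ex_proc_count) are hypothetical,
and a POSIX mutex stands in for whatever the kernel side would really use.

```c
#include <pthread.h>
#include <stddef.h>

struct ex_proc {
    int             pid;
    struct ex_proc *next;
};

/* One access point for both modes of use. */
struct ex_proc_source {
    struct ex_proc  *head;
    pthread_mutex_t *lock;          /* NULL for a static snapshot */
};

/* Count processes; callers never see whether a lock was involved. */
static size_t
ex_proc_count(struct ex_proc_source *src)
{
    const struct ex_proc *p;
    size_t n = 0;

    if (src->lock != NULL)
        pthread_mutex_lock(src->lock);
    for (p = src->head; p != NULL; p = p->next)
        n++;
    if (src->lock != NULL)
        pthread_mutex_unlock(src->lock);
    return (n);
}
```

The caller's code is identical for a running system and a crashdump; only
the construction of the source differs.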

In any case, the SMP problem means that the data interfaces must
die, at least in as far as they apply to active systems, rather
than crash dumps.

If they die, then there is no kernel structure externalization
to worry about (with the side benefit of not needing to recompile
"ps" and the rest of the tools which use kmem or externalized
kernel structures, each time those structures are changed).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-arch  Thu Nov 30 14:51:43 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
	by hub.freebsd.org (Postfix) with ESMTP id 1FF6237B400
	for <arch@FreeBSD.ORG>; Thu, 30 Nov 2000 14:51:41 -0800 (PST)
Received: (from daemon@localhost)
	by smtp02.primenet.com (8.9.3/8.9.3) id PAA08490;
	Thu, 30 Nov 2000 15:47:13 -0700 (MST)
Received: from usr05.primenet.com(206.165.6.205)
 via SMTP by smtp02.primenet.com, id smtpdAAA3WaqHq; Thu Nov 30 15:47:04 2000
Received: (from tlambert@localhost)
	by usr05.primenet.com (8.8.5/8.8.5) id PAA23494;
	Thu, 30 Nov 2000 15:51:25 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200011302251.PAA23494@usr05.primenet.com>
Subject: Re: HEADSUP user struct ucred -> xucred (Was: Re: serious problem with mutexs and userland visibility?)
To: bright@wintelcom.net (Alfred Perlstein)
Date: Thu, 30 Nov 2000 22:51:25 +0000 (GMT)
Cc: abial@webgiro.com (Andrzej Bialecki),
	tlambert@primenet.com (Terry Lambert), arch@FreeBSD.ORG
In-Reply-To: <20001130022614.W8051@fw.wintelcom.net> from "Alfred Perlstein" at Nov 30, 2000 02:26:14 AM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> Ok, kvm is killing me. :/

Data interfaces suck.

> see:
> ~"lib/libkvm/kvm_proc.c" line 125 of 793
> 
> libkvm expects to be able to copy the pointer in the struct proc into
> its own struct.
> 
> My only chance (or so it seems) is to keep all userland visible parts
> of the ucred at the beginning of it, as well as forcing the same
> order to keep libkvm happy.  Then it can effectively:
> 
>   bcopy(struct ucred *uc, struct xucred *xuc, sizeof(struct xucred));
> 
> without worries, this is pretty hackish, but libkvm isn't exactly
> your state of the art interface.
> 
> This is pretty close to what Terry suggested but less scary in
> my opinion as long as we add a comment to sys/ucred.h about
> keeping kernel-only fields at the end of the struct.
> 
> ?

What happens when you add a new _not_ kernel-only field and
boot an older kernel because the newer kernel is unstable?

You need to get away from data interfaces.  Please see my other
posting in this thread: mutex protected data objects accessed
via data interface in a userland which neither asserts nor
honors the mutex are inherently SMP and kernel preemption unsafe.
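
To illustrate the version-skew half of the question, the sketch below shows a
consumer that at least detects which layout it was handed. It assumes the
exported record starts with a version and a length, which is an assumption of
this example and not something proposed in the thread; all names are
hypothetical. It does nothing for the locking half of the objection.

```c
#include <stdint.h>
#include <string.h>

/* Layout exported by an older kernel. */
struct ex_xucred_v1 {
    uint32_t version;
    uint32_t length;
    uint32_t uid;
};

/* Layout exported by a newer kernel: one additional visible field. */
struct ex_xucred_v2 {
    uint32_t version;
    uint32_t length;
    uint32_t uid;
    uint32_t label;
};

/*
 * Parse a record from a kernel that may be older than this tool.
 * Returns 0 on success and -1 if the record cannot be interpreted.
 */
static int
ex_xucred_parse(const void *buf, size_t buflen, struct ex_xucred_v2 *out)
{
    struct ex_xucred_v1 v1;

    if (buflen < sizeof(v1))
        return (-1);
    memcpy(&v1, buf, sizeof(v1));
    if (v1.length > buflen)
        return (-1);
    if (v1.version == 2 && v1.length >= sizeof(*out)) {
        memcpy(out, buf, sizeof(*out));
        return (0);
    }
    if (v1.version == 1 && v1.length >= sizeof(v1)) {
        out->version = 1;
        out->length = v1.length;
        out->uid = v1.uid;
        out->label = 0;             /* field absent in the old layout */
        return (0);
    }
    return (-1);
}
```

A bare bcopy of sizeof(struct xucred) has no such check, so a tool built
against the newer header would read past the end of the older record.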


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-arch  Thu Nov 30 15: 0: 9 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from falla.videotron.net (falla.videotron.net [205.151.222.106])
	by hub.freebsd.org (Postfix) with ESMTP
	id 10BA537B402; Thu, 30 Nov 2000 15:00:00 -0800 (PST)
Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213])
 by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8)
 with ESMTP id <0G4V00MKE17X64@falla.videotron.net>; Thu, 30 Nov 2000 17:59:57 -0500 (EST)
Date: Thu, 30 Nov 2000 18:00:38 -0500 (EST)
From: Bosko Milekic <bmilekic@technokratis.com>
Subject: Re: zero copy code review
In-reply-to: <20001129231653.A1503@panzer.kdm.org>
To: "Kenneth D. Merry" <ken@kdm.org>
Cc: arch@FreeBSD.ORG, gallatin@FreeBSD.ORG
Message-id: <Pine.BSF.4.21.0011301657500.78741-100000@jehovah.technokratis.com>
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


  Hi,

On Wed, 29 Nov 2000, Kenneth D. Merry wrote:

> [ -net and -current BCCed for wider coverage, this is probably best
> handled on -arch ]
> 
> I would like to request reviews of the zero copy sockets and NFS code I've
> been posting about for months:
> 
> http://people.FreeBSD.org/~ken/zero_copy
> 
> There are diffs posted above against -current as of early November 28th,
> along with a FAQ, and change log.
> 
> These diffs include changes in:
> 
>  - the socket code
>  - NFS code
>  - VM code
>  - ti(4) driver
>  - sendfile code
> 
> Much of the code was written by Drew Gallatin <gallatin@FreeBSD.org>, but I
> wrote a lot of the ti(4) driver mods and cleaned things up a fair bit.
> 
> The code is stable, and I don't know of any bugs at the moment.  I have run
> with it enabled on one of my main development boxes for months without any
> problems.
> 
> The way things are currently configured, it is not turned on by default.
> You need two kernel options and a sysctl to turn it on.  The zero copy NFS
> code can be turned on with gdb, although it might be better to make that
> into a sysctl.  (I haven't played with the zero copy NFS code much, Drew
> has done much more with that.)
> 
> How to turn the code on is covered in the web page, above.
> 
> Anyway, I'd like to commit this code sometime next week, if no one comes up
> with any issues or problems.
>
> Comments, bug reports, etc., are welcome.

	In general, I am pro-the zero copy stuff you've been
  gathering/merging/updating/writing/etc. over the past several months.
  Looking at the sendfile portion of your changes, it's pretty obvious that
  they are very minimal, but I'm curious as to why you've bothered removing
  the "static" before the sf_buf_free(). I can see why it really has no
  significance in the sf_buf_alloc() case, but sf_buf_free() is attached to
  the mbuf's m_ext free function pointer (I'm really just curious if the
  motivation was strictly stylistic).
  	Here are some other notes, which I came across during a real quick read
  of some of the code (I am sort of in a pre-final-exam period, so I can't
  dedicate too much time to this for the next 2 weeks, about :-( ):

  in nfs/nfs_serv.c:
  	In your first "BEGIN SUSPECT REGION" block:

	- You allocate an sf_buf somewhere down the line and then attempt to
  allocate an mbuf to which you hope to attach the sf_buf. If the
  mbuf allocation fails, you don't seem to free the sf_buf anywhere and
  consequently, it looks as though you may leak sf_bufs.
  	- You only m_freem() on mb (the header mbuf) if mb->m_next != NULL,
  but if there is no m_next (m_next == NULL), you don't seem to free the mb
  mbuf (header mbuf) at all. Is this meant to be this way? (Note that it
  may very well be, I haven't looked at all the other surrounding code,
  just making sure).
	- In the actual MEXTADD(), you don't seem to be passing the M_RDONLY
  flag (which is done for sendfile buffer ext mbufs). M_RDONLY is used to
  indicate to the rest of the code that the m_data is not to be tampered
  with (trimmed, et al) -- in other words, it's read-only. Have you
  considered it?
	- Stylistic suggestion: please try to keep things 25x80. :-)
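
  The first point above (freeing the external buffer when the second
  allocation fails) can be sketched generically. This is not the NFS code
  under review: ex_buf and ex_hdr are hypothetical stand-ins for the sf_buf
  and the mbuf, and the fail_hdr argument only exists to simulate an mbuf
  shortage.

```c
#include <stdlib.h>

static int ex_bufs_live;            /* outstanding external buffers */

struct ex_buf { char data[64]; };
struct ex_hdr { struct ex_buf *ext; };

static struct ex_buf *
ex_buf_alloc(void)
{
    struct ex_buf *b = malloc(sizeof(*b));

    if (b != NULL)
        ex_bufs_live++;
    return (b);
}

static void
ex_buf_free(struct ex_buf *b)
{
    free(b);
    ex_bufs_live--;
}

/* Allocate a buffer, then a header to attach it to; undo on failure. */
static struct ex_hdr *
ex_attach(int fail_hdr)
{
    struct ex_buf *b;
    struct ex_hdr *h;

    b = ex_buf_alloc();
    if (b == NULL)
        return (NULL);
    h = fail_hdr ? NULL : malloc(sizeof(*h));
    if (h == NULL) {
        ex_buf_free(b);             /* without this the buffer leaks */
        return (NULL);
    }
    h->ext = b;
    return (h);
}

/* Release the header together with the buffer attached to it. */
static void
ex_detach(struct ex_hdr *h)
{
    ex_buf_free(h->ext);
    free(h);
}
```

  Keeping a live counter like ex_bufs_live around during testing is a cheap
  way to notice this kind of leak when allocations are forced to fail.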

 [ skipped all the other NFS + ti driver changes ]

 jumbo.h:
 	I would like to eventually split the cluster code out of mbuf.h and
 uipc_mbuf.c and change jumbo.h/uipc_jumbo.c -> cluster.h/uipc_cluster.c

 mbuf.h:
	- Make EXT_DISPOSABLE 3, instead of 300... if you decide to keep it.
 I say this because it seems to me that EXT_DISPOSABLE should
 be more of an m_flag than an ext_type, which would probably mean that we
 should make m_flags bigger than a short (which it is now). My reasoning
 is that EXT_DISPOSABLE seems to be more of an indication of
 what should be done with the contents of the mbuf. Perhaps what needs to
 be done instead is to make EXT_DISPOSABLE a flag, have if_ti use the DRV
 ext type (like it should be doing) for its external buffers, and make it
 set EXT_DISPOSABLE|M_RDONLY during the MEXTADD. Let's not get too strict
 with this for now, though; it would be better to make sure everything is
 working perfectly before we decide what to do with this - and it can be
 changed easily later.

 tiio.h: Are you sure tiio.h belongs in src/sys/sys ?

 Also, have you checked whether any locking should be performed here?
 Considering that this is all supposed to improve performance, it would be
 nice if it didn't all need to run under Giant. I realize that some of this
 will have to wait (i.e. VM), but what about the if_ti code? Is that
 something that can be looked at RSN?
	I would strongly urge you to run some tests under really heavy network
 activity (possibly lowering NMBCLUSTERS for this), starve out the mbuf
 resources, and see if anything strange happens - you may catch a couple of
 leaks that may have accidentally slipped through. Finally, I'd like to
 suggest possibly breaking up some of the diff into smaller chunks, just so
 it is easier to track things down if something does break. With -CURRENT
 now changing fairly dramatically, sometimes several times in a single
 day, I think this would be worth it for everybody.

> Thanks!
> 
> Ken
> -- 
> Kenneth Merry
> ken@kdm.org

	Thank *you*!

  Regards,
  Bosko Milekic
  bmilekic@technokratis.com


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Thu Nov 30 15:29:28 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from netau1.alcanet.com.au (ntp.alcanet.com.au [203.62.196.27])
	by hub.freebsd.org (Postfix) with ESMTP
	id 368DF37B400; Thu, 30 Nov 2000 15:29:23 -0800 (PST)
Received: from mfg1.cim.alcatel.com.au (mfg1.cim.alcatel.com.au [139.188.23.1])
	by netau1.alcanet.com.au (8.9.3 (PHNE_18979)/8.9.3) with ESMTP id KAA21472;
	Fri, 1 Dec 2000 10:29:19 +1100 (EDT)
Received: from gsmx07.alcatel.com.au by cim.alcatel.com.au
 (PMDF V5.2-32 #37645) with ESMTP id <01JX6QHY1B00EAF49C@cim.alcatel.com.au>;
 Fri, 1 Dec 2000 10:29:17 +1100
Received: (from jeremyp@localhost)	by gsmx07.alcatel.com.au (8.11.0/8.11.0)
 id eAUNTF802533; Fri, 01 Dec 2000 10:29:15 +1100 (EST envelope-from jeremyp)
Content-return: prohibited
Date: Fri, 01 Dec 2000 10:29:15 +1100
From: Peter Jeremy <peter.jeremy@alcatel.com.au>
Subject: Re: RANDOMDEV inspired realitycheck regarding i386/i486...
In-reply-to: <11485.974210886@critter>; from phk@FreeBSD.ORG on Tue, Nov 14,
 2000 at 03:08:06PM +0100
To: Poul-Henning Kamp <phk@FreeBSD.ORG>
Cc: arch@FreeBSD.ORG
Mail-followup-to: Poul-Henning Kamp <phk@FreeBSD.ORG>, arch@FreeBSD.ORG
Message-id: <20001201102915.G1474@gsmx07.alcatel.com.au>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Content-disposition: inline
User-Agent: Mutt/1.2.5i
References: <11485.974210886@critter>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On 2000-Nov-14 15:08:06 +0100, Poul-Henning Kamp <phk@FreeBSD.ORG> wrote:
>Has anybody run a 486 or 386 under current recently ?

X on a PRE_SMPNG 486 is painful - mouse movements no longer make
the X pointer move in real time.  I haven't noticed the seeding
issue (probably just luck).

>What is the consensus ?

I think 386/486 remains a significant market and would not like to
see support dropped.  I'd go so far as to suggest that if -current
does drop support for the 386/486, the then-stable version will need
to be actively maintained indefinitely to provide continued support.

Peter




From owner-freebsd-arch  Thu Nov 30 17:19:12 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.hz.zj.cn (unknown [202.101.172.2])
	by hub.freebsd.org (Postfix) with SMTP id C4D9237B401
	for <arch@freebsd.org>; Thu, 30 Nov 2000 17:19:08 -0800 (PST)
Received: from xyf([61.130.65.225]) by mail.hz.zj.cn(JetMail 2.5.3.0)
	with SMTP id jmc3a270a0e; Fri,  1 Dec 2000 01:19:06 -0000
Message-ID: <002501c05b34$b1609de0$e001a8c0@xyf>
From: "xuyifeng" <bsddiy@163.net>
To: "Julian Elischer" <julian@elischer.org>,
	"Tony Finch" <dot@dotat.at>
Cc: <arch@freebsd.org>
References: <20001122133421.S18037@fw.wintelcom.net> <Pine.SUN.3.91.1001122180448.7920A-100000@pcnet1.pcnet.com> <20001130065503.E58294@hand.dotat.at> <3A2664AC.493B4101@elischer.org>
Subject: Re: Thread-specific data and KSEs
Date: Fri, 1 Dec 2000 09:17:12 +0800
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2615.200
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

But this limits the total number of threads to fewer than 2000 if the
process address space is 2G; threads don't need 1M of stack space in most
cases.

XuYifeng

----- Original Message -----
From: Julian Elischer <julian@elischer.org>
To: Tony Finch <dot@dotat.at>
Cc: <arch@freebsd.org>
Sent: Thursday, November 30, 2000 10:31 PM
Subject: Re: Thread-specific data and KSEs


> Tony Finch wrote:
> >
> > Daniel Eischen <eischen@vigrid.com> wrote:
> > >On Wed, 22 Nov 2000, Alfred Perlstein wrote:
> > >>
> > >> Was there something wrong with the suggestion to put the local info
> > >> on the stack?  I just don't see it being discussed at all.
> > >
> > >Yes, I stated that it could not be used.  We want to provide a POSIX
> > >compliant API, and this dictates that applications be able to create
> > >stacks of their own size and choosing.  We can't rely on stacks being
> > >any particular size, or starting at any particular address.
> >
> > Additionally, wouldn't you have to walk up the stack to find its base?
> > (which I guess would be a bit more expensive than dereferencing %gs)
>
> No, you start each stack on some multiple of (say) 1MB
> and then you just or it with 0xfffff to find the top of the stack..
> (This is what one of the MACH threads packages used to do)
>
> >
> > Tony.
> > --
> > f.a.n.finch     dot@dotat.at     fanf@covalent.net     Chad for President!
>
> --
>       __--_|\  Julian Elischer
>      /       \ julian@elischer.org
>     (   OZ    ) World tour 2000
> ---> X_.---._/  presently in:  Budapest
>             v
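
A minimal sketch of the stack-placement arithmetic under discussion in this
thread, assuming every thread stack sits in its own 1MB-aligned slot. The
names STACK_SLOT_SIZE, stack_slot_top(), stack_slot_base() and
max_stack_slots() are made up for illustration and do not come from any
threads package. It shows why OR-ing an address with 0xfffff lands on the
last byte of the slot, and why a 2G address space can hold at most 2048
such slots (fewer in practice, since code and data need room too).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: one thread stack per 1MB-aligned slot. */
#define STACK_SLOT_SIZE	((uintptr_t)1 << 20)		/* 1MB */
#define STACK_SLOT_MASK	(STACK_SLOT_SIZE - 1)		/* 0xfffff */

/* Highest byte address in the slot containing sp: set the low 20 bits. */
static uintptr_t
stack_slot_top(uintptr_t sp)
{
	return (sp | STACK_SLOT_MASK);
}

/* Lowest address of the same slot: clear the low 20 bits. */
static uintptr_t
stack_slot_base(uintptr_t sp)
{
	return (sp & ~STACK_SLOT_MASK);
}

/* Upper bound on the number of slots in an address space of this size. */
static uint64_t
max_stack_slots(uint64_t address_space_bytes)
{
	return (address_space_bytes / STACK_SLOT_SIZE);
}
```

Any address inside a stack, such as the address of a local variable, can be
passed to stack_slot_top(); no walk up the stack is needed, but the scheme
only works if the application is not allowed to pick its own stack size or
placement.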




From owner-freebsd-arch  Thu Nov 30 17:24:29 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1])
	by hub.freebsd.org (Postfix) with ESMTP id C024437B400
	for <arch@FreeBSD.ORG>; Thu, 30 Nov 2000 17:24:25 -0800 (PST)
Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30])
	by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id UAA02260;
	Thu, 30 Nov 2000 20:24:15 -0500 (EST)
Received: (from gallatin@localhost)
	by grasshopper.cs.duke.edu (8.11.1/8.9.1) id eB11OF003769;
	Thu, 30 Nov 2000 20:24:15 -0500 (EST)
	(envelope-from gallatin@cs.duke.edu)
From: Andrew Gallatin <gallatin@cs.duke.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Date: Thu, 30 Nov 2000 20:24:14 -0500 (EST)
To: Bosko Milekic <bmilekic@technokratis.com>
Cc: "Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG
Subject: Re: zero copy code review
In-Reply-To: <Pine.BSF.4.21.0011301657500.78741-100000@jehovah.technokratis.com>
References: <20001129231653.A1503@panzer.kdm.org>
	<Pine.BSF.4.21.0011301657500.78741-100000@jehovah.technokratis.com>
X-Mailer: VM 6.43 under 20.4 "Emerald" XEmacs  Lucid
Message-ID: <14886.63486.157224.937225@grasshopper.cs.duke.edu>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


Bosko,

Thanks for your comments.  I'm a little disconnected from the code
these days, as I do most of my development in a 4.0-RELEASE
environment.   (Ken ported my contributions forward).

Bosko Milekic writes:
 > 
 > 	In general, I am pro-the zero copy stuff you've been
 >   gathering/merging/updating/writing/etc. over the past several months.
 >   Looking at the sendfile portion of your changes, it's pretty obvious that
 >   they are very minimal, but I'm curious as to why you've bothered removing
 >   the "static" before the sf_buf_free(). I can see why it really has no
 >   significance in the sf_buf_alloc() case, but sf_buf_free() is attached to
 >   the mbuf's m_ext free function pointer (I'm really just curious if the
 >   motivation was strictly stylistic).

It was un-staticized because it is called by socow_iodone(), which 
is the m_ext free for zero-copy transmissions.

 >   	Here some other notes, which I came across during a real quick read
 >   of some of the code (I am sort of in a pre-final-exam period, so I can't
 >   dedicate too much time to this for the next 2 weeks, about :-( ):
 > 
 >   in nfs/nfs_serv.c:
 >   	In your first "BEGIN SUSPECT REGION" block:
 > 
 > 	- You allocate an sf_buf somewhere down the line and then attempt to
 >   allocate an mbuf to which you will hope to attach the sf_buf to. If the
 >   mbuf allocation fails, you don't seem to free the sf_buf anywhere and
 >   consequently, it looks as though you may leak sf_bufs.

But the mbuf is allocated using M_WAIT.  Can that fail?  I haven't
kept up with the mbuf changes in -current.

 >   	- You only m_freem() on mb (the header mbuf) if mb->m_next != NULL,
 >   but if there is no m_next (m_next == NULL), you don't seem to free the mb
 >   mbuf (header mbuf) at all. Is this meant to be this way? (Note that it
 >   may very well be, I haven't looked at all the other surrounding code,
 >   just making sure).

Yes.  Like most of the NFS code, it is a little convoluted.  mb is a
pre-existing mbuf chain that we're attaching mbufs to.  In the failure
case (where the m_freem() I think you're talking about is), we back out
what we've done by freeing the mbufs we've added to mb, reset
mb->m_next to NULL, and continue in the normal (copy) path.
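
A rough user-space sketch of that back-out step, not the nfs_serv.c code
itself. struct fake_mbuf and the fake_* helpers are hypothetical stand-ins,
and the sketch assumes the pre-existing chain is a single header mbuf, so
that everything hanging off mb->m_next is what was added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf; only the chain link is modeled. */
struct fake_mbuf {
	struct fake_mbuf *m_next;
};

static int fake_mbufs_live;	/* allocated and not yet freed stand-ins */

static struct fake_mbuf *
fake_m_get(void)
{
	struct fake_mbuf *m = calloc(1, sizeof(*m));

	if (m != NULL)
		fake_mbufs_live++;
	return (m);
}

/* Free a whole chain, the way m_freem() does for real mbufs. */
static void
fake_m_freem(struct fake_mbuf *m)
{
	struct fake_mbuf *next;

	while (m != NULL) {
		next = m->m_next;
		free(m);
		fake_mbufs_live--;
		m = next;
	}
}

/* Link one new stand-in onto the end of the chain headed by mb. */
static int
fake_append(struct fake_mbuf *mb)
{
	struct fake_mbuf *m = fake_m_get();

	if (m == NULL)
		return (-1);
	while (mb->m_next != NULL)
		mb = mb->m_next;
	mb->m_next = m;
	return (0);
}

/* Back out: free everything added after mb, but keep mb itself. */
static void
fake_backout(struct fake_mbuf *mb)
{
	if (mb->m_next != NULL) {
		fake_m_freem(mb->m_next);
		mb->m_next = NULL;
	}
}
```

With this shape, mb is deliberately left alone when m_next is NULL: nothing
was added, so there is nothing to undo, and the caller still owns mb for the
copy path.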


<... some helpful comments deleted ....>

Many of your comments are directly related to -current, so I
think I'll let Ken address them...

 > 	I would strongly urge you to run some tests under real heavy network
 >  activity (possibly lower NMBCLUSTERS for this) and starve out the mbuf
 >  resources and see if anything strange happens - you may catch a couple of
 >  leaks that may have accidently slipped through. Finally, I'd like to
 >  suggest possibly breaking up some of the diff to smaller chunks, just so
 >  it is easier to track things down if something does break. With -CURRENT
 >  changing relatively dramatically now sometimes several times in a single
 >  day, I think this would be worth it for everybody.

FWIW, the client-side nfs changes (in their 4.0-RELEASE form) are in
daily use in our lab and have been for months. We run experiments with
8 clients running against our Slice cluster nfs file server.  Each
client is close to maxed-out (60-70MB/sec per client, typically) for
hours... ;)

Thank you for your feedback.  And thank you for improving the mbuf
system so much.  I wasted a whole afternoon yesterday doing something
which I could have done in 5 minutes if only I had mext_refcnt in 4.0 ;)

Drew




From owner-freebsd-arch  Thu Nov 30 19:18: 7 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from field.videotron.net (field.videotron.net [205.151.222.108])
	by hub.freebsd.org (Postfix) with ESMTP
	id 94ECF37B400; Thu, 30 Nov 2000 19:18:03 -0800 (PST)
Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213])
 by field.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8)
 with ESMTP id <0G4V003BYD61AF@field.videotron.net>; Thu, 30 Nov 2000 22:18:01 -0500 (EST)
Date: Thu, 30 Nov 2000 22:18:43 -0500 (EST)
From: Bosko Milekic <bmilekic@technokratis.com>
Subject: Re: zero copy code review
In-reply-to: <14886.63486.157224.937225@grasshopper.cs.duke.edu>
To: Andrew Gallatin <gallatin@cs.duke.edu>
Cc: "Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG,
	alfred@freebsd.org
Message-id: <Pine.BSF.4.21.0011302159210.79831-100000@jehovah.technokratis.com>
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


 Hi Andrew,

On Thu, 30 Nov 2000, Andrew Gallatin wrote:

[...]
> Bosko Milekic writes:
>  > 
>  > 	In general, I am pro-the zero copy stuff you've been
>  >   gathering/merging/updating/writing/etc. over the past several months.
>  >   Looking at the sendfile portion of your changes, it's pretty obvious that
>  >   they are very minimal, but I'm curious as to why you've bothered removing
>  >   the "static" before the sf_buf_free(). I can see why it really has no
>  >   significance in the sf_buf_alloc() case, but sf_buf_free() is attached to
>  >   the mbuf's m_ext free function pointer (I'm really just curious if the
>  >   motivation was strictly stylistic).
> 
> It was un-staticized because it is called by socow_iodone(), which 
> is the m_ext free for zero-copy transmissions.

	I see. But if the sendfile code still passes it as its own free
  routine, then shouldn't it remain staticized, strictly speaking? Although
  I may have missed it in the large diff, I did not see any changes to the
  registering of sf_bufs in the sendfile code itself (i.e.
  uipc_syscalls.c). I'm under the impression that in uipc_syscalls.c, the
  MEXTADD which sets up an sf_buf with an mbuf still passes sf_buf_free as
  its free routine.

>  >   	Here some other notes, which I came across during a real quick read
>  >   of some of the code (I am sort of in a pre-final-exam period, so I can't
>  >   dedicate too much time to this for the next 2 weeks, about :-( ):
>  > 
>  >   in nfs/nfs_serv.c:
>  >   	In your first "BEGIN SUSPECT REGION" block:
>  > 
>  > 	- You allocate an sf_buf somewhere down the line and then attempt to
>  >   allocate an mbuf to which you will hope to attach the sf_buf to. If the
>  >   mbuf allocation fails, you don't seem to free the sf_buf anywhere and
>  >   consequently, it looks as though you may leak sf_bufs.
> 
> But the mbuf is allocated using M_WAIT.  Can that fail?  I haven't
> kept up with the mbuf changes in -current.

	Yes, it can. M_WAIT just means "if nothing is available, first drain
  the stacks and if still nothing is available, then wait
  kern.ipc.mbuf_wait ticks (sysctl) and if still nothing is available, fail
  and set the passed in pointer to NULL and hope that the caller will deal
  with it." Waiting indefinitely can be dangerous in certain situations (for
  mbufs) but I won't get into that here.
	In your code, you do deal with the possibility of MGETHDR
  returning NULL (you check for it) and you set ENOBUFS in that case and
  jump to the "errorpath" label. But, before using MGETHDR, you allocate an
  sf_buf (in sf) and it just so happens that the code beyond "errorpath"
  does not take care of freeing the sf_buf you allocated before even
  trying to allocate the mbuf.
  	Another thing to note, especially if you are Pre-SMPng: sf_buf_alloc
  calls can block, even indefinitely (until the allocation is
  successful). In sendfile(2), this doesn't matter as you're not allocating
  the sf_buf from an interrupt. It has the potential to be a problem if you
  start allocating sf_bufs from interrupt context. Unfortunately, I haven't
  yet read and fully visualized all the code in the large diff, but this is
  something to take into account when reviewing.
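
  	A small user-space sketch of the cleanup being asked for, under loud
  assumptions: fake_sf_buf, fake_mbuf and every fake_* function below are
  hypothetical stand-ins, not the kernel's sf_buf or mbuf interfaces, and
  the simulate_starvation argument only models an allocation that still
  comes back NULL after a bounded wait.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's sf_buf or mbuf. */
struct fake_sf_buf {
	int unused;
};

struct fake_mbuf {
	struct fake_sf_buf *ext;	/* attached external buffer */
};

static int sf_bufs_in_use;	/* lets a caller check for leaks */

static struct fake_sf_buf *
fake_sf_buf_alloc(void)
{
	struct fake_sf_buf *sf = malloc(sizeof(*sf));

	if (sf != NULL)
		sf_bufs_in_use++;
	return (sf);
}

static void
fake_sf_buf_free(struct fake_sf_buf *sf)
{
	free(sf);
	sf_bufs_in_use--;
}

/* Models a bounded wait: the allocation may still return NULL. */
static struct fake_mbuf *
fake_mbuf_get(int simulate_starvation)
{
	if (simulate_starvation)
		return (NULL);
	return (calloc(1, sizeof(struct fake_mbuf)));
}

/* Allocate an sf_buf stand-in, then an mbuf stand-in to attach it to. */
static int
attach_page(int simulate_starvation, struct fake_mbuf **mp)
{
	struct fake_sf_buf *sf;
	struct fake_mbuf *m;

	*mp = NULL;
	sf = fake_sf_buf_alloc();
	if (sf == NULL)
		return (ENOMEM);
	m = fake_mbuf_get(simulate_starvation);
	if (m == NULL) {
		/* Error path: give back the sf_buf taken above. */
		fake_sf_buf_free(sf);
		return (ENOBUFS);
	}
	m->ext = sf;
	*mp = m;
	return (0);
}

/* Normal teardown: free the attached buffer, then the mbuf stand-in. */
static void
release_page(struct fake_mbuf *m)
{
	fake_sf_buf_free(m->ext);
	free(m);
}
```

  	The only point of the sketch is the ordering: whatever was allocated
  before the failing step has to be released on the way out, and the
  sf_bufs_in_use counter makes a missed release visible under starvation.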

>  >   	- You only m_freem() on mb (the header mbuf) if mb->m_next != NULL,
>  >   but if there is no m_next (m_next == NULL), you don't seem to free the mb
>  >   mbuf (header mbuf) at all. Is this meant to be this way? (Note that it
>  >   may very well be, I haven't looked at all the other surrounding code,
>  >   just making sure).
> 
> Yes.  Like most of the NFS code, it is a little convoluted.. mb is a
> pre-existing mbuf chain that we're attaching mbufs to.  In the failure
> case (where the mfreem I think you're talking about is), we backout
> what we've done by freeing the mbufs we've added to mb, return
> mb->next to null, and continue in the normal (copy) path.

	Excellent.

> <... some helpful comments deleted ....>
> 
> Many of your comments are directly related to -current, I
> think I'll let Ken address them...

	Another one directly related to -CURRENT:

	I just noticed that the uipc_jumbo.c stuff does not do any locking.
  Perhaps it would be nice to lock the code sooner or later. I would be
  willing to go over it and do it but, as I said, I am really not going to
  be able to do much until 2 weeks from now.
  	Furthermore, I wonder if Alfred is gutsy enough ( :-) ) to raise his
  voice and let us know how much this may interfere with the adding of
  locks to sockets in the uipc subsystem, and possibly the stack as well.
  Alfred, where are the potential problems? (As you've already written a
  portion of the latter, I assume you're very well aware)...

>  > 	I would strongly urge you to run some tests under real heavy network
>  >  activity (possibly lower NMBCLUSTERS for this) and starve out the mbuf
>  >  resources and see if anything strange happens - you may catch a couple of
>  >  leaks that may have accidently slipped through. Finally, I'd like to
>  >  suggest possibly breaking up some of the diff to smaller chunks, just so
>  >  it is easier to track things down if something does break. With -CURRENT
>  >  changing relatively dramatically now sometimes several times in a single
>  >  day, I think this would be worth it for everybody.
> 
> FWIW, the client-side nfs changes (in their 4.0-RELEASE form) are in
> daily use in our lab and have been for months. We run experiments with
> 8 clients running against our Slice cluster nfs file server.  Each
> client is close to maxed-out (60-70MB/sec per client, typically) for
> hours... ;)

	Okay. Well, it's my understanding that the code is pretty stable; I
  just want to make sure that the case is the same in -CURRENT, especially
  when _mbufs_ are _completely_ starved.

> Thank you for your feedback.  And thank you for impoving the mbuf
> system so much.  I wasted a whole afternoon yesterday doing something
> which I could have done in 5 minutes if only I had mext_refcnt in 4.0 ;)

	Heh; no problem, really. :-) Thanks!

> Drew

  Cheers,
  Bosko Milekic
  bmilekic@technokratis.com




From owner-freebsd-arch  Thu Nov 30 19:47:43 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from peorth.iteration.net (peorth.iteration.net [208.190.180.178])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9365837B400; Thu, 30 Nov 2000 19:47:41 -0800 (PST)
Received: by peorth.iteration.net (Postfix, from userid 1001)
	id 32863573A9; Thu, 30 Nov 2000 21:47:45 -0600 (CST)
Date: Thu, 30 Nov 2000 21:47:45 -0600
From: "Michael C . Wu" <keichii@iteration.net>
To: Peter Jeremy <peter.jeremy@alcatel.com.au>
Cc: Poul-Henning Kamp <phk@FreeBSD.ORG>, arch@FreeBSD.ORG
Subject: Re: RANDOMDEV inspired realitycheck regarding i386/i486...
Message-ID: <20001130214745.E28757@peorth.iteration.net>
Reply-To: "Michael C . Wu" <keichii@peorth.iteration.net>
References: <11485.974210886@critter> <20001201102915.G1474@gsmx07.alcatel.com.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20001201102915.G1474@gsmx07.alcatel.com.au>; from peter.jeremy@alcatel.com.au on Fri, Dec 01, 2000 at 10:29:15AM +1100
X-PGP-Fingerprint: 5025 F691 F943 8128 48A8  5025 77CE 29C5 8FA1 2E20
X-PGP-Key-ID: 0x8FA12E20
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Fri, Dec 01, 2000 at 10:29:15AM +1100, Peter Jeremy scribbled:
| On 2000-Nov-14 15:08:06 +0100, Poul-Henning Kamp <phk@FreeBSD.ORG> wrote:
| >Has anybody run a 486 or 386 under current recently ?
|
| X on a PRE_SMPNG 486 is painful - mouse movements no longer make
| the X pointer move in real time.  I haven't noticed the seeding
| issue (probably just luck).

PRE_SMPNG does not have the /dev/random seeding issue.

You actually expected X to run well on a 486? :-)

| >What is the consensus ?
|
| I think 386/486 remains a significant market and would not like to
| see support dropped.  I'd go so far as to suggest that if -current
| does drop support for the 386/486, the then-stable version will need
| to be actively maintained indefinitely to provide continued support.

I do not really think the latest XFree86 versions were designed
with running 386/486 in mind. 386/486 is still a market, but
not many people try to build an embedded system with a full X
and tools.

--
+------------------------------------------------------------------+
| keichii@peorth.iteration.net         | keichii@bsdconspiracy.net |
| http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
+------------------------------------------------------------------+




From owner-freebsd-arch  Thu Nov 30 20:21:53 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from netau1.alcanet.com.au (ntp.alcanet.com.au [203.62.196.27])
	by hub.freebsd.org (Postfix) with ESMTP id 1C7F637B400
	for <arch@FreeBSD.ORG>; Thu, 30 Nov 2000 20:21:49 -0800 (PST)
Received: from mfg1.cim.alcatel.com.au (mfg1.cim.alcatel.com.au [139.188.23.1])
	by netau1.alcanet.com.au (8.9.3 (PHNE_18979)/8.9.3) with ESMTP id PAA25911;
	Fri, 1 Dec 2000 15:21:44 +1100 (EDT)
Received: from gsmx07.alcatel.com.au by cim.alcatel.com.au
 (PMDF V5.2-32 #37641) with ESMTP id <01JX70PFD0XCE7XDQI@cim.alcatel.com.au>;
 Fri, 1 Dec 2000 15:21:39 +1100
Received: (from jeremyp@localhost)	by gsmx07.alcatel.com.au (8.11.0/8.11.0)
 id eB14LbJ03578; Fri, 01 Dec 2000 15:21:37 +1100 (EST envelope-from jeremyp)
Content-return: prohibited
Date: Fri, 01 Dec 2000 15:21:37 +1100
From: Peter Jeremy <peter.jeremy@alcatel.com.au>
Subject: Re: RANDOMDEV inspired realitycheck regarding i386/i486...
In-reply-to: <20001130214745.E28757@peorth.iteration.net>; from
 keichii@iteration.net on Thu, Nov 30, 2000 at 09:47:45PM -0600
To: "Michael C . Wu" <keichii@peorth.iteration.net>
Cc: arch@FreeBSD.ORG
Mail-followup-to: "Michael C . Wu" <keichii@peorth.iteration.net>,
 arch@FreeBSD.ORG
Message-id: <20001201152137.K1474@gsmx07.alcatel.com.au>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Content-disposition: inline
User-Agent: Mutt/1.2.5i
References: <11485.974210886@critter>
 <20001201102915.G1474@gsmx07.alcatel.com.au>
 <20001130214745.E28757@peorth.iteration.net>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On 2000-Nov-30 21:47:45 -0600, "Michael C . Wu" <keichii@iteration.net> wrote:
>On Fri, Dec 01, 2000 at 10:29:15AM +1100, Peter Jeremy scribbled:
>| On 2000-Nov-14 15:08:06 +0100, Poul-Henning Kamp <phk@FreeBSD.ORG> wrote:
>| >Has anybody run a 486 or 386 under current recently ?
>|
>| X on a PRE_SMPNG 486 is painful - mouse movements no longer make
>| the X pointer move in real time.  I haven't noticed the seeding
>| issue (probably just luck).
>
>PRE_SMPNG does not have the /dev/random seeding issue.
>
>You actually expected X to run well on a 486? :-)

It used to run reasonably well (ignoring hogs like Netscape) before
Yarrow was added.  I'm hoping that once Yarrow is threaded, performance
will return to a usable level.  Keep in mind that a 486 is relatively
powerful compared to the systems available when X was designed.

>I do not really think the latest XFree86 versions were designed
>with running 386/486 in mind. 386/486 is still a market, but
>not many people try to build an embedded system with a full X
>and tools.

I'm running XFree86 3.x, rather than 4.x.  I agree that X is unlikely
in most embedded applications, but blocking in the kernel for an
extended period is likely to be equally unacceptable.

Peter




From owner-freebsd-arch  Thu Nov 30 20:34:14 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 5D47F37B400
	for <arch@FreeBSD.ORG>; Thu, 30 Nov 2000 20:34:12 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id eB14Y8M19291;
	Thu, 30 Nov 2000 20:34:08 -0800 (PST)
Date: Thu, 30 Nov 2000 20:34:08 -0800
From: Alfred Perlstein <alfred@FreeBSD.ORG>
To: Bosko Milekic <bmilekic@technokratis.com>
Cc: Andrew Gallatin <gallatin@cs.duke.edu>,
	"Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG
Subject: Re: zero copy code review
Message-ID: <20001130203407.I8051@fw.wintelcom.net>
References: <14886.63486.157224.937225@grasshopper.cs.duke.edu> <Pine.BSF.4.21.0011302159210.79831-100000@jehovah.technokratis.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <Pine.BSF.4.21.0011302159210.79831-100000@jehovah.technokratis.com>; from bmilekic@technokratis.com on Thu, Nov 30, 2000 at 10:18:43PM -0500
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Bosko Milekic <bmilekic@technokratis.com> [001130 19:18] wrote:
> 
>   	Furthermore, I wonder if Alfred is gutsy enough ( :-) ) to raise his
>   voice and let us know how much this may interefere with the adding of
>   locks to sockets in the uipc subsystem, and possibly the stack as well.
>   Alfred, where are the potential problems? (As you've already written a
>   portion of the latter, I assume you're very well aware)...

This will be somewhat of a large setback for me, but I'm sure I can
work around it.  If not it will have to go.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."




From owner-freebsd-arch  Thu Nov 30 23:16:27 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from panzer.kdm.org (panzer.kdm.org [216.160.178.169])
	by hub.freebsd.org (Postfix) with ESMTP
	id C4DDC37B401; Thu, 30 Nov 2000 23:16:22 -0800 (PST)
Received: (from ken@localhost)
	by panzer.kdm.org (8.9.3/8.9.1) id AAA11183;
	Fri, 1 Dec 2000 00:16:19 -0700 (MST)
	(envelope-from ken)
Date: Fri, 1 Dec 2000 00:16:19 -0700
From: "Kenneth D. Merry" <ken@kdm.org>
To: Bosko Milekic <bmilekic@technokratis.com>
Cc: arch@FreeBSD.ORG, gallatin@FreeBSD.ORG
Subject: Re: zero copy code review
Message-ID: <20001201001619.C10772@panzer.kdm.org>
References: <20001129231653.A1503@panzer.kdm.org> <Pine.BSF.4.21.0011301657500.78741-100000@jehovah.technokratis.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <Pine.BSF.4.21.0011301657500.78741-100000@jehovah.technokratis.com>; from bmilekic@technokratis.com on Thu, Nov 30, 2000 at 06:00:38PM -0500
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

[ Drew answered some of this, I'll try to answer the rest. ]

Thanks for looking at the code!

On Thu, Nov 30, 2000 at 18:00:38 -0500, Bosko Milekic wrote:
[ ... ]
[ Drew answered this part ]

> 	- In the actual MEXTADD(), you don't seem to be passing the M_RDONLY
>   flag (which is done for sendfile buffer ext mbufs). M_RDONLY is used to
>   indicate to the rest of the code that the m_data is not to be tampered
>   with (trimmed, et al) -- in other words, it's read-only. Have you
>   considered it?

That was an oversight, I'll add the flag.  (The places where it is used are
in uipc_cow.c, if_ti.c and nfs_serv.c.)
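
For what it's worth, here is a tiny user-space sketch of what honoring a
read-only flag looks like to code that wants to trim an mbuf.  The struct,
the FAKE_* flag values and fake_trim_front() are all made up for
illustration; they are not the definitions in mbuf.h.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values; the real ones live in mbuf.h. */
#define FAKE_M_EXT	0x0001	/* data lives in an external buffer */
#define FAKE_M_RDONLY	0x0002	/* data must not be modified or trimmed */

struct fake_mbuf {
	char	*m_data;
	size_t	 m_len;
	int	 m_flags;
};

static int
fake_is_writable(const struct fake_mbuf *m)
{
	return ((m->m_flags & FAKE_M_RDONLY) == 0);
}

/* Drop len bytes from the front, but only if the mbuf may be touched. */
static int
fake_trim_front(struct fake_mbuf *m, size_t len)
{
	if (!fake_is_writable(m))
		return (EROFS);
	if (len > m->m_len)
		return (EINVAL);
	m->m_data += len;
	m->m_len -= len;
	return (0);
}
```

A caller that gets EROFS back would have to copy the data into a writable
mbuf before adjusting it, which is the situation the flag is meant to flag
up for external buffers that are shared or mapped from elsewhere.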

> 	- Stylistic suggestion: please try to keep things 25x80. :-)

I try, and I think most of the changes are, except for the NFS stuff.  I
didn't reformat that, although I suppose I could.  (It irritates me, too.)

>  [ skipped all the other NFS + ti driver changes ]
> 
>  jumbo.h:
>  	I would like to eventually split the cluster code out of mbuf.h and
>  uipc_mbuf.c and change jumbo.h/uipc_jumbo.c -> cluster.h/uipc_cluster.c
> 
>  mbuf.h:
> 	- Make EXT_DISPOSABLE 3, instead of 300... if you decide to keep it.
>  The reason I say this is because it seems to me that EXT_DISPOSABLE should
>  be more of an m_flag than an ext_type, which would probably mean that we
>  should make m_flag bigger than a short (which it is now). The reason I
>  argue this is because EXT_DISPOSABLE seems to be more of an indication of
>  what should be done with the contents of the mbuf. Perhaps what needs to
>  be done instead is make the EXT_DISPOSABLE flag, have if_ti use the DRV
>  ext type (like it should be doing) for its external buffers, and make it
>  set EXT_DISPOSABLE|M_RDONLY during the MEXTADD. Let's not get too strict
>  with this for now, though, it would be better to make sure everything is
>  working perfectly until we decide what to do with this - and it can be
>  changed easily later. 

In its current incarnation, EXT_DISPOSABLE indicates that the memory
used in the mbuf can be disposed of -- i.e. removed from the kernel's
virtual address map.  The contents aren't disposed of, they're just moved
elsewhere.

I don't think most of the rest of the mbuf code is set up to deal with the
memory inside a non-external mbuf going away.  (Which would be the
potential implication of having EXT_DISPOSABLE be a regular m_flag.)

>  tiio.h: Are you sure tiio.h belongs in src/sys/sys ?

Well, it defines the interface for the character device front end for the
ti(4) driver.  Usually ioctls and supporting structures go in sys/sys. 
Would you suggest another location?

>  Also, have you checked whether any locking should be performed here?
>  Considering that this is all supposed to improve performance, it would be
>  nice if it didn't all need to run under Giant. I realize that some of this
>  will have to wait (i.e. VM), but what about the if_ti code? Is that
>  something that can be looked at RSN?

When Bill converted the ti(4) driver from spls to mutexes, I did the same
conversion on my modifications to the driver.  Is that sufficient?  I'm not
terribly up-to-date on the mutex stuff.

As for the rest of the code, since it was written pre-mutex, it still has
the spls in the right places.  I suppose that they would just need to be
converted to mutexes.  (Or is that an overly simplistic way to look at it? :)

> 	I would strongly urge you to run some tests under real heavy network
>  activity (possibly lower NMBCLUSTERS for this) and starve out the mbuf
>  resources and see if anything strange happens - you may catch a couple of
>  leaks that may have accidentally slipped through.

Good idea, I'll do it if I have the time.  :(

>  Finally, I'd like to
>  suggest possibly breaking up some of the diff to smaller chunks, just so
>  it is easier to track things down if something does break. With -CURRENT
>  changing relatively dramatically now sometimes several times in a single
>  day, I think this would be worth it for everybody.

Heh, well, the big chunk is the Tigon firmware.  :)  

Are you suggesting just splitting the diffs out into multiple files, or
actually breaking the changes up?  The latter would be rather difficult to
do, I think.

In any case, the changes aren't on by default, so folks can just not turn
them on if they run into problems.

Thanks for the review, I'll try to incorporate your suggestions.

Ken
-- 
Kenneth Merry
ken@kdm.org


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Thu Nov 30 23:22:45 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from panzer.kdm.org (panzer.kdm.org [216.160.178.169])
	by hub.freebsd.org (Postfix) with ESMTP
	id E129537B400; Thu, 30 Nov 2000 23:22:38 -0800 (PST)
Received: (from ken@localhost)
	by panzer.kdm.org (8.9.3/8.9.1) id AAA11231;
	Fri, 1 Dec 2000 00:22:35 -0700 (MST)
	(envelope-from ken)
Date: Fri, 1 Dec 2000 00:22:35 -0700
From: "Kenneth D. Merry" <ken@kdm.org>
To: Bosko Milekic <bmilekic@technokratis.com>
Cc: Andrew Gallatin <gallatin@cs.duke.edu>, arch@FreeBSD.ORG,
	alfred@FreeBSD.ORG
Subject: Re: zero copy code review
Message-ID: <20001201002235.D10772@panzer.kdm.org>
References: <14886.63486.157224.937225@grasshopper.cs.duke.edu> <Pine.BSF.4.21.0011302159210.79831-100000@jehovah.technokratis.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <Pine.BSF.4.21.0011302159210.79831-100000@jehovah.technokratis.com>; from bmilekic@technokratis.com on Thu, Nov 30, 2000 at 10:18:43PM -0500
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, Nov 30, 2000 at 22:18:43 -0500, Bosko Milekic wrote:
> On Thu, 30 Nov 2000, Andrew Gallatin wrote:
> 
> [...]
> > <... some helpful comments deleted ....>
> > 
> > Many of your comments are directly related to -current, I
> > think I'll let Ken address them...
> 
> 	Another one directly related to -CURRENT:
> 
> 	I just noticed that the uipc_jumbo.c stuff does not do any locking.
>   Perhaps it would be nice to lock the code sooner or later. I would be
>   willing to go over it and do it but, as I said, I am really not going to
>   be able to do much until 2 weeks from now.

It does have spls in the right places, in this case splimp() and splvm().
Would you just convert those to the proper mutexes, or are we going to go
with per-data-structure mutexes (i.e. a little finer granularity), or...?
(I don't know much about the mutex strategy we're using...)
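
To make the per-data-structure option concrete, here is a minimal userland
sketch.  It assumes pthread mutexes as a stand-in for kernel mutexes, and
the jumbo_pool structure and its functions are hypothetical names invented
for illustration; this is not the uipc_jumbo.c code.  Each spl-protected
region becomes a lock/unlock pair on a mutex that lives inside the data
structure it protects.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Userland stand-in: pthread mutexes play the role of kernel mutexes.
 * jumbo_pool and its fields are hypothetical, not the uipc_jumbo.c layout.
 */
struct jumbo_pool {
	pthread_mutex_t	lock;	/* protects nfree only */
	int		nfree;	/* free buffers left in the pool */
};

static void
jumbo_pool_init(struct jumbo_pool *jp, int nbufs)
{
	pthread_mutex_init(&jp->lock, NULL);
	jp->nfree = nbufs;
}

/* Take one buffer; returns 1 on success, 0 if the pool is empty. */
static int
jumbo_pool_get(struct jumbo_pool *jp)
{
	int ok = 0;

	pthread_mutex_lock(&jp->lock);		/* was: s = splimp(); */
	if (jp->nfree > 0) {
		jp->nfree--;
		ok = 1;
	}
	pthread_mutex_unlock(&jp->lock);	/* was: splx(s); */
	return (ok);
}

/* Return one buffer to the pool. */
static void
jumbo_pool_put(struct jumbo_pool *jp)
{
	pthread_mutex_lock(&jp->lock);
	jp->nfree++;
	pthread_mutex_unlock(&jp->lock);
}
```

A single global mutex would be the more direct translation of the spls;
the per-structure lock above trades a little bookkeeping for less
contention between unrelated pools.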

>   	Furthermore, I wonder if Alfred is gutsy enough ( :-) ) to raise his
>   voice and let us know how much this may interfere with the adding of
>   locks to sockets in the uipc subsystem, and possibly the stack as well.
>   Alfred, where are the potential problems? (As you've already written a
>   portion of the latter, I assume you're very well aware)...

Hopefully it won't cause many problems.

Ken
-- 
Kenneth Merry
ken@kdm.org




From owner-freebsd-arch  Thu Nov 30 23:24:39 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from panzer.kdm.org (panzer.kdm.org [216.160.178.169])
	by hub.freebsd.org (Postfix) with ESMTP
	id 05A4037B400; Thu, 30 Nov 2000 23:24:37 -0800 (PST)
Received: (from ken@localhost)
	by panzer.kdm.org (8.9.3/8.9.1) id AAA11249;
	Fri, 1 Dec 2000 00:24:36 -0700 (MST)
	(envelope-from ken)
Date: Fri, 1 Dec 2000 00:24:36 -0700
From: "Kenneth D. Merry" <ken@kdm.org>
To: Alfred Perlstein <alfred@FreeBSD.ORG>
Cc: Bosko Milekic <bmilekic@technokratis.com>,
	Andrew Gallatin <gallatin@cs.duke.edu>, arch@FreeBSD.ORG
Subject: Re: zero copy code review
Message-ID: <20001201002436.E10772@panzer.kdm.org>
References: <14886.63486.157224.937225@grasshopper.cs.duke.edu> <Pine.BSF.4.21.0011302159210.79831-100000@jehovah.technokratis.com> <20001130203407.I8051@fw.wintelcom.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <20001130203407.I8051@fw.wintelcom.net>; from alfred@FreeBSD.ORG on Thu, Nov 30, 2000 at 08:34:08PM -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, Nov 30, 2000 at 20:34:08 -0800, Alfred Perlstein wrote:
> * Bosko Milekic <bmilekic@technokratis.com> [001130 19:18] wrote:
> > 
> >   	Furthermore, I wonder if Alfred is gutsy enough ( :-) ) to raise his
> >   voice and let us know how much this may interfere with the adding of
> >   locks to sockets in the uipc subsystem, and possibly the stack as well.
> >   Alfred, where are the potential problems? (As you've already written a
> >   portion of the latter, I assume you're very well aware)...
> 
> This will be somewhat of a large setback for me, but I'm sure I can
> work around it.  If not it will have to go.

If you need explanations of things, feel free to let Drew or me know.

Hopefully this won't be a major roadblock for your changes.

Ken
-- 
Kenneth Merry
ken@kdm.org




From owner-freebsd-arch  Thu Nov 30 23:30:46 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 32A9937B401
	for <arch@FreeBSD.ORG>; Thu, 30 Nov 2000 23:30:44 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id eB17Ubg23792;
	Thu, 30 Nov 2000 23:30:37 -0800 (PST)
Date: Thu, 30 Nov 2000 23:30:37 -0800
From: Alfred Perlstein <alfred@FreeBSD.ORG>
To: "Kenneth D. Merry" <ken@kdm.org>
Cc: Bosko Milekic <bmilekic@technokratis.com>,
	Andrew Gallatin <gallatin@cs.duke.edu>, arch@FreeBSD.ORG
Subject: Re: zero copy code review
Message-ID: <20001130233037.L8051@fw.wintelcom.net>
References: <14886.63486.157224.937225@grasshopper.cs.duke.edu> <Pine.BSF.4.21.0011302159210.79831-100000@jehovah.technokratis.com> <20001201002235.D10772@panzer.kdm.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20001201002235.D10772@panzer.kdm.org>; from ken@kdm.org on Fri, Dec 01, 2000 at 12:22:35AM -0700
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Kenneth D. Merry <ken@kdm.org> [001130 23:22] wrote:
> On Thu, Nov 30, 2000 at 22:18:43 -0500, Bosko Milekic wrote:
> > On Thu, 30 Nov 2000, Andrew Gallatin wrote:
> > 
> > [...]
> > > <... some helpful comments deleted ....>
> > > 
> > > Many of your comments are directly related to -current, I
> > > think I'll let Ken address them...
> > 
> > 	Another one directly related to -CURRENT:
> > 
> > 	I just noticed that the uipc_jumbo.c stuff does not do any locking.
> >   Perhaps it would be nice to lock the code sooner or later. I would be
> >   willing to go over it and do it but, as I said, I am really not going to
> >   be able to do much until 2 weeks from now.
> 
> It does have spls in the right places, in this case splimp() and splvm().
> Would you just convert those to the proper mutexes, or are we going to go
> with per-data-structure mutexes (i.e. a little finer granularity), or...?
> (I don't know much about the mutex strategy we're using...)

The vm system is likely to be the last thing to be locked down.  If
your code dips into the vm system you'll have to acquire Giant, possibly
several times through your codepath, and performance can drop
dramatically in the SMP case.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."




From owner-freebsd-arch  Fri Dec  1 10:17:29 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from wall.polstra.com (unknown [206.213.115.74])
	by hub.freebsd.org (Postfix) with ESMTP id 9E1D537B400
	for <arch@freebsd.org>; Fri,  1 Dec 2000 10:17:25 -0800 (PST)
Received: from vashon.polstra.com (vashon.polstra.com [206.213.73.13])
	by wall.polstra.com (8.9.3/8.9.3) with ESMTP id KAA22993;
	Fri, 1 Dec 2000 10:11:53 -0800 (PST)
	(envelope-from jdp@wall.polstra.com)
Received: (from jdp@localhost)
	by vashon.polstra.com (8.11.0/8.11.0) id eB1IBqY01763;
	Fri, 1 Dec 2000 10:11:52 -0800 (PST)
	(envelope-from jdp)
Date: Fri, 1 Dec 2000 10:11:52 -0800 (PST)
Message-Id: <200012011811.eB1IBqY01763@vashon.polstra.com>
To: arch@freebsd.org
From: John Polstra <jdp@polstra.com>
Reply-To: arch@freebsd.org
Cc: marcel@cup.hp.com
Subject: Re: Modifying FILE to add lock
In-Reply-To: <3A257ABD.5238ED4E@cup.hp.com>
References: <Pine.BSF.4.20.0011291152160.58003-100000@alive.znep.com> <3A257ABD.5238ED4E@cup.hp.com>
Organization: Polstra & Co., Seattle, WA
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In article <3A257ABD.5238ED4E@cup.hp.com>,
Marcel Moolenaar  <marcel@cup.hp.com> wrote:
> 
> Having done the signal changes, I immediately have to think about the
> Modula port...

Thank you, Marcel. :-)  Modula-3 does indeed have its own rendition
of the FILE structure, which is supposed to match the system's
version exactly.  So it is a problem, in theory.  In practice it is
not such a problem, because as far as I know, there aren't any
Modula-3 programs which use the stdio interface for their I/O.
Modula-3 has its own I/O system which uses read() and write() rather
than stdio.

The #1 biggest hassle with the Modula-3 stuff is that it has
Modula-3 versions of all of the system structures, and they have to
match exactly for things to work.  Some day I swear I'm going to
work out a way to generate the M3 versions automatically from the
header files in /usr/include ...
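
One possible starting point for such a generator, sketched under stated
assumptions: a small C program can ask the compiler itself for each
field's offset and size and print them in a form another tool could turn
into M3 declarations.  The struct example_hdr below is a hypothetical
stand-in for a system structure, and the "name offset size" output format
is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical example structure; stands in for any system struct. */
struct example_hdr {
	int	eh_flags;
	short	eh_type;
	long	eh_len;
};

/* Print one field as "name offset size" so a generator could consume it. */
#define DUMP_FIELD(fp, type, field)					\
	fprintf((fp), "%s %lu %lu\n", #field,				\
	    (unsigned long)offsetof(type, field),			\
	    (unsigned long)sizeof(((type *)0)->field))

/* Emit the total size of the structure followed by one line per field. */
static void
dump_example_hdr(FILE *fp)
{
	fprintf(fp, "struct example_hdr %lu\n",
	    (unsigned long)sizeof(struct example_hdr));
	DUMP_FIELD(fp, struct example_hdr, eh_flags);
	DUMP_FIELD(fp, struct example_hdr, eh_type);
	DUMP_FIELD(fp, struct example_hdr, eh_len);
}
```

Because the numbers come from the same compiler and headers that build the
system, padding and alignment are accounted for automatically; the hard
part that this sketch does not address is enumerating the structures and
fields in the first place.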

John
-- 
  John Polstra                                               jdp@polstra.com
  John D. Polstra & Co., Inc.                        Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa




From owner-freebsd-arch  Fri Dec  1 12: 8:20 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from rover.village.org (rover.village.org [204.144.255.66])
	by hub.freebsd.org (Postfix) with ESMTP id E443A37B400
	for <arch@FreeBSD.ORG>; Fri,  1 Dec 2000 12:08:17 -0800 (PST)
Received: from harmony.village.org (harmony.village.org [10.0.0.6])
	by rover.village.org (8.11.0/8.11.0) with ESMTP id eB1K8DQ79210;
	Fri, 1 Dec 2000 13:08:13 -0700 (MST)
	(envelope-from imp@harmony.village.org)
Received: from harmony.village.org (localhost.village.org [127.0.0.1]) by harmony.village.org (8.9.3/8.8.3) with ESMTP id NAA08306; Fri, 1 Dec 2000 13:08:12 -0700 (MST)
Message-Id: <200012012008.NAA08306@harmony.village.org>
To: Wes Peters <wes@softweyr.com>
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) 
Cc: arch@FreeBSD.ORG
In-reply-to: Your message of "Wed, 29 Nov 2000 00:54:42 MST."
		<3A24B642.34B50961@softweyr.com> 
References: <3A24B642.34B50961@softweyr.com>  <52694.975362925@winston.osd.bsdi.com> <20001127144809.A67395@citusc17.usc.edu> <200011272307.eARN7Ln34886@earth.backplane.com> 
Date: Fri, 01 Dec 2000 13:08:12 -0700
From: Warner Losh <imp@village.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In message <3A24B642.34B50961@softweyr.com> Wes Peters writes:
: IMHO, this is one of the biggest arguments for using bash.  I get bitten
: all the time when I leave bash for another interactive program that no
: longer provides BS/DEL compatibility.  Fixing it everywhere is a good
: idea.

I see that this has already been committed.  I'm not going to argue
with that (I think it was a good idea), but there are other issues in
the tree.

The issue that I have is that there are many places in the tree where
the erase character is known and things are done based on it.  Will
all of those be updated to have the two aces?  There's a hack in hack
right now:

./games/hack/hack.tty.c:                if(c == erase_char || c == '\b') {

as well as other examples in the tree.
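
As an illustration of what updating such call sites could look like, here
is a minimal sketch of a shared helper.  The name is_erase() and the two
macros are hypothetical and are not part of hack or of the committed
patch; the helper simply accepts the configured erase character plus both
BS and DEL.

```c
#include <assert.h>

#define CTRL_H		'\b'	/* backspace, 0x08 */
#define ASCII_DEL	0x7f	/* delete */

/*
 * Return nonzero if c should erase the previous character.
 * erase_char is whatever the terminal settings report (for example
 * termios c_cc[VERASE]); BS and DEL are accepted in addition to it.
 */
static int
is_erase(int c, int erase_char)
{
	return (c == erase_char || c == CTRL_H || c == ASCII_DEL);
}
```

A call site like the one in hack.tty.c would then read
"if (is_erase(c, erase_char)) {", keeping the policy in one place.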

Talk also has a provision for transporting these characters over the
interface.  If both were allowed, some translation would also be
needed.

Warner




From owner-freebsd-arch  Fri Dec  1 14:51:19 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1])
	by hub.freebsd.org (Postfix) with ESMTP
	id 7454137B400; Fri,  1 Dec 2000 14:51:16 -0800 (PST)
Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30])
	by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id RAA23194;
	Fri, 1 Dec 2000 17:51:14 -0500 (EST)
Received: (from gallatin@localhost)
	by grasshopper.cs.duke.edu (8.11.1/8.9.1) id eB1MpEp06117;
	Fri, 1 Dec 2000 17:51:14 -0500 (EST)
	(envelope-from gallatin@cs.duke.edu)
From: Andrew Gallatin <gallatin@cs.duke.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Date: Fri,  1 Dec 2000 17:51:14 -0500 (EST)
To: Bosko Milekic <bmilekic@technokratis.com>
Cc: "Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG,
	alfred@FreeBSD.ORG
Subject: Re: zero copy code review
In-Reply-To: <Pine.BSF.4.21.0011302159210.79831-100000@jehovah.technokratis.com>
References: <14886.63486.157224.937225@grasshopper.cs.duke.edu>
	<Pine.BSF.4.21.0011302159210.79831-100000@jehovah.technokratis.com>
X-Mailer: VM 6.43 under 20.4 "Emerald" XEmacs  Lucid
Message-ID: <14888.9802.415926.434956@grasshopper.cs.duke.edu>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


Bosko Milekic writes:
 > > It was un-staticized because it is called by socow_iodone(), which 
 > > is the m_ext free for zero-copy transmissions.
 > 
 > 	I see. But if the sendfile code still passes it as its own free
 >   routine, then shouldn't it remain staticized, strictly speaking? Although
 >   I may have missed it in the large diff, I did not see any changes to the
 >   actual registering of sf_bufs in the actual sendfile code (i.e.
 >   uipc_syscalls.c). I'm under the impression that in uipc_syscalls.c, the
 >   MEXTADD which sets up an sf_buf with an mbuf still passes sf_buf_free as
 >   its free routine.

I'm still not sure I understand your objection.  There's some code in
socow_cowsetup() which uses sf_bufs.  Prior to allocating the sf_buf, it
does some of its own fiddling with the page and introduces some state
that sf_buf_free() wouldn't know how to clear.  socow_iodone() undoes
that fiddling and then calls sf_buf_free() to free the sf_buf.  Isn't
it better to call sf_buf_free() than to cut & paste the code?
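
The shape of that arrangement can be sketched in a few lines.  Everything
here is a hypothetical stand-in (struct page, struct buf, cow_setup(),
cow_iodone(), buf_free()); none of it is the kernel's sf_buf or vm_page
code.  The point is only that the completion routine undoes its own extra
state and then delegates to the shared free routine rather than
duplicating it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's sf_buf or vm_page layout. */
struct page { int cow_busy; };
struct buf  { struct page *pg; int in_use; };

/* Shared free routine: knows only about the buffer itself. */
static void
buf_free(struct buf *b)
{
	b->in_use = 0;
}

/* Setup adds extra page state that buf_free() knows nothing about. */
static void
cow_setup(struct buf *b, struct page *pg)
{
	pg->cow_busy = 1;
	b->pg = pg;
	b->in_use = 1;
}

/* Completion routine: undo the extra state, then reuse buf_free(). */
static void
cow_iodone(struct buf *b)
{
	b->pg->cow_busy = 0;
	buf_free(b);
}
```

For the wrapper to call it from another file, the shared routine has to be
visible outside its own file, which is the un-staticizing under
discussion.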

<...>

 > > But the mbuf is allocated using M_WAIT.  Can that fail?  I haven't
 > > kept up with the mbuf changes in -current.
 > 
 > 	Yes, it can. M_WAIT just means "if nothing is available, first drain

Eeek!  I had no idea; I was thinking of it as blocking forever.  This
will have to be addressed.  Thank you for pointing it out!

 >   the stacks and if still nothing is available, then wait
 >   kern.ipc.mbuf_wait ticks (sysctl) and if still nothing is available, fail
 >   and set the passed in pointer to NULL and hope that the caller will deal
 >   with it." Waiting indefinitely can be dangerous in certain situations (for
 >   mbufs) but I won't get into that here.
 > 	In your code, you do deal with the possibility of the MGETHDR
 >   returning NULL (you check for it) and you set ENOBUFS in that case and
 >   jump to the "errorpath" label. But, before using MGETHDR, you allocate an
 >   sf_buf (in sf) and it just so happens that the code beyond "errorpath"
 >   does not take care of freeing the sf_buf you allocated before even
 >   trying to allocate the mbuf.

I see your point.  This was copied (bug for bug ;-) from sendfile itself.
Look at line 1700 or so of kern/uipc_syscalls.c.  This bug should
probably be fixed there too.
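
The leak pattern is easy to reproduce in miniature.  The sketch below is a
userland illustration with hypothetical helpers (buf_alloc(),
buf_release(), hdr_alloc(), setup()) built on malloc(); it is not the
sendfile code.  It shows a two-stage allocation where the failure path of
the second stage releases what the first stage obtained.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_bufs;	/* outstanding first-stage allocations */

/* First-stage allocator; counts successful allocations. */
static void *
buf_alloc(void)
{
	void *p = malloc(64);

	if (p != NULL)
		live_bufs++;
	return (p);
}

static void
buf_release(void *p)
{
	live_bufs--;
	free(p);
}

/* Second-stage allocator that can be made to fail on demand. */
static void *
hdr_alloc(int fail)
{
	return (fail ? NULL : malloc(32));
}

/* Returns 0 on success or an errno value; never leaks the first buffer. */
static int
setup(int fail_hdr, void **bufp, void **hdrp)
{
	void *buf, *hdr;

	buf = buf_alloc();
	if (buf == NULL)
		return (ENOMEM);
	hdr = hdr_alloc(fail_hdr);
	if (hdr == NULL) {
		buf_release(buf);	/* the cleanup the error path lacked */
		return (ENOBUFS);
	}
	*bufp = buf;
	*hdrp = hdr;
	return (0);
}
```

Without the buf_release() call in the failure branch, every failed header
allocation would strand one buffer, which is the behaviour described
above.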

 >   	Another thing to note, especially if you are Pre-SMPng: sf_buf_alloc
 >   calls can block, and even indefinitely (until the allocation is
 >   successful). In sendfile(2), this doesn't matter as you're not allocating
 >   the sf_buf from an interrupt. It has the potential to be a problem if you
 >   start allocating sf_bufs from interrupt context. Unfortunately, I haven't
 >   yet read+fully visualized all the code in the large diff, but this is
 >   something to take into account when reviewing.

The sf_buf_alloc() calls will be made either from a process
context (when doing a zero-copy send over a socket) or from the
context of an nfsiod (for the NFS code), so I think this should
be safe.


Thanks!

Drew




From owner-freebsd-arch  Fri Dec  1 15:40:14 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from implode.root.com (root.com [209.102.106.178])
	by hub.freebsd.org (Postfix) with ESMTP
	id 748C137B6D0; Fri,  1 Dec 2000 15:29:56 -0800 (PST)
Received: from implode.root.com (localhost [127.0.0.1])
	by implode.root.com (8.8.8/8.8.5) with ESMTP id PAA14154;
	Fri, 1 Dec 2000 15:26:19 -0800 (PST)
Message-Id: <200012012326.PAA14154@implode.root.com>
To: Andrew Gallatin <gallatin@cs.duke.edu>
Cc: Bosko Milekic <bmilekic@technokratis.com>,
	"Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG,
	alfred@FreeBSD.ORG
Subject: Re: zero copy code review 
In-reply-to: Your message of "Fri, 01 Dec 2000 17:51:14 EST."
             <14888.9802.415926.434956@grasshopper.cs.duke.edu> 
From: David Greenman <dg@root.com>
Reply-To: dg@root.com
Date: Fri, 01 Dec 2000 15:26:19 -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > 	In your code, you do deal with the possibility of the MGETHDR
> >   returning NULL (you check for it) and you set ENOBUFS in that case and
> >   jump to the "errorpath" label. But, before using MGETHDR, you allocate an
> >   sf_buf (in sf) and it just so happens that the code beyond "errorpath"
> >   does not take care of freeing the sf_buf you allocated before even
> >   trying to allocate the mbuf.
>
>I see your point.  This was copied (bug for bug ;-) from sendfile itself.
>Look at line 1700 or so of kern/uipc_syscalls.c.  This bug should
>probably be fixed there too.

   Oops. The original assumption (and code that I wrote) was that M_WAIT
_cannot_ return a NULL pointer. This was changed in FreeBSD recently, and
as you mentioned, the code added in rev 1.65 that now checks for it in
sendfile doesn't do complete cleanup in this case. It definitely should
be fixed so that the sf_buf is freed as well.
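
For readers who, like the original code, assumed a waiting allocation can
never fail, here is a minimal sketch of the bounded-wait behaviour
described earlier in the thread: try, drain once, retry for a limited
number of ticks, then return NULL.  The struct pool and both functions are
hypothetical and single-threaded; in this toy nothing refills the pool
between ticks, and a real allocator would sleep where the comment
indicates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool; hands out pointers to one dummy object. */
struct pool {
	int	avail;		/* objects available right now */
	int	reclaimable;	/* objects a drain can give back */
};

static char dummy_obj;

/* Non-blocking attempt: returns an object or NULL. */
static void *
pool_try(struct pool *p)
{
	if (p->avail == 0)
		return (NULL);
	p->avail--;
	return (&dummy_obj);
}

/*
 * Bounded-wait allocation: try, drain once, then retry for at most
 * max_ticks "ticks" before giving up and returning NULL.
 */
static void *
pool_get_wait(struct pool *p, int max_ticks)
{
	void *obj;
	int t;

	if ((obj = pool_try(p)) != NULL)
		return (obj);
	p->avail += p->reclaimable;	/* the "drain" step */
	p->reclaimable = 0;
	for (t = 0; t <= max_ticks; t++) {
		if ((obj = pool_try(p)) != NULL)
			return (obj);
		/* a real kernel would sleep for one tick here */
	}
	return (NULL);			/* caller must handle this */
}
```

Any caller of an allocator with these semantics has to check for NULL and
release whatever it already holds before bailing out.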

-DG

David Greenman
Co-founder, The FreeBSD Project - http://www.freebsd.org
President, TeraSolutions, Inc. - http://www.terasolutions.com
Pave the road of life with opportunities.




From owner-freebsd-arch  Fri Dec  1 16: 5:11 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from implode.root.com (root.com [209.102.106.178])
	by hub.freebsd.org (Postfix) with ESMTP
	id 8CE2B37B400; Fri,  1 Dec 2000 16:05:07 -0800 (PST)
Received: from implode.root.com (localhost [127.0.0.1])
	by implode.root.com (8.8.8/8.8.5) with ESMTP id QAA14320;
	Fri, 1 Dec 2000 16:01:42 -0800 (PST)
Message-Id: <200012020001.QAA14320@implode.root.com>
To: Andrew Gallatin <gallatin@cs.duke.edu>,
	Bosko Milekic <bmilekic@technokratis.com>,
	"Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG,
	alfred@FreeBSD.ORG
Subject: Re: zero copy code review 
In-reply-to: Your message of "Fri, 01 Dec 2000 15:26:19 PST."
             <200012012326.PAA14154@implode.root.com> 
From: David Greenman <dg@root.com>
Reply-To: dg@root.com
Date: Fri, 01 Dec 2000 16:01:41 -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

>> > 	In your code, you do deal with the possibility of the MGETHDR
>> >   returning NULL (you check for it) and you set ENOBUFS in that case and
>> >   jump to the "errorpath" label. But, before using MGETHDR, you allocate an
>> >   sf_buf (in sf) and it just so happens that the code beyond "errorpath"
>> >   does not take care of freeing the sf_buf you allocated before even
>> >   trying to allocate the mbuf.
>>
>>I see your point.  This was copied (bug for bug ;-) from sendfile itself.
>>Look at line 1700 or so of kern/uipc_syscalls.c.  This bug should
>>probably be fixed there too.
>
>   Oops. The original assumption (and code that I wrote) was that M_WAIT
>_cannot_ return a NULL pointer. This was changed in FreeBSD recently, and
>as you mentioned, the code added in rev 1.65 that now checks for it in
>sendfile doesn't do complete cleanup in this case. It definitely should
>be fixed so that the sf_buf is freed as well.

   Followup...the attached patch should fix the problem.

-DG

David Greenman
Co-founder, The FreeBSD Project - http://www.freebsd.org
President, TeraSolutions, Inc. - http://www.terasolutions.com
Pave the road of life with opportunities.


Index: uipc_syscalls.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_syscalls.c,v
retrieving revision 1.65.2.3
diff -c -r1.65.2.3 uipc_syscalls.c
*** uipc_syscalls.c	2000/08/16 19:20:31	1.65.2.3
--- uipc_syscalls.c	2000/12/01 23:54:19
***************
*** 1628,1633 ****
--- 1630,1636 ----
  		MGETHDR(m, M_WAIT, MT_DATA);
  		if (m == NULL) {
  			error = ENOBUFS;
+ 			sf_buf_free((void *)sf->kva, PAGE_SIZE);
  			goto done;
  		}
  		m->m_ext.ext_free = sf_buf_free;




From owner-freebsd-arch  Fri Dec  1 16:50:59 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from falla.videotron.net (falla.videotron.net [205.151.222.106])
	by hub.freebsd.org (Postfix) with ESMTP id 02E4137B400
	for <arch@FreeBSD.ORG>; Fri,  1 Dec 2000 16:50:58 -0800 (PST)
Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213])
 by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8)
 with ESMTP id <0G4X00ICE10VZM@falla.videotron.net> for arch@FreeBSD.ORG; Fri,  1 Dec 2000 19:50:55 -0500 (EST)
Date: Fri, 01 Dec 2000 19:51:39 -0500 (EST)
From: Bosko Milekic <bmilekic@technokratis.com>
Subject: Re: zero copy code review
In-reply-to: <200012020001.QAA14320@implode.root.com>
To: David Greenman <dg@root.com>
Cc: arch@FreeBSD.ORG
Message-id: <Pine.BSF.4.21.0012011950060.85148-100000@jehovah.technokratis.com>
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On Fri, 1 Dec 2000, David Greenman wrote:

>    Followup...the attached patch should fix the problem.
> 
> -DG
> 
> David Greenman
> Co-founder, The FreeBSD Project - http://www.freebsd.org
> President, TeraSolutions, Inc. - http://www.terasolutions.com
> Pave the road of life with opportunities.

	Cool. Committed to both -CURRENT and -STABLE...

  Cheers,
  Bosko.




From owner-freebsd-arch  Fri Dec  1 17:57:11 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1])
	by hub.freebsd.org (Postfix) with ESMTP
	id E5C6437B400; Fri,  1 Dec 2000 17:57:08 -0800 (PST)
Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30])
	by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id UAA25666;
	Fri, 1 Dec 2000 20:57:01 -0500 (EST)
Received: (from gallatin@localhost)
	by grasshopper.cs.duke.edu (8.11.1/8.9.1) id eB21v1Y06449;
	Fri, 1 Dec 2000 20:57:01 -0500 (EST)
	(envelope-from gallatin@cs.duke.edu)
From: Andrew Gallatin <gallatin@cs.duke.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Date: Fri,  1 Dec 2000 20:57:00 -0500 (EST)
To: dg@root.com
Cc: Bosko Milekic <bmilekic@technokratis.com>,
	"Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG,
	alfred@FreeBSD.ORG
Subject: Re: zero copy code review 
In-Reply-To: <200012012326.PAA14154@implode.root.com>
References: <14888.9802.415926.434956@grasshopper.cs.duke.edu>
	<200012012326.PAA14154@implode.root.com>
X-Mailer: VM 6.43 under 20.4 "Emerald" XEmacs  Lucid
Message-ID: <14888.22179.833528.247128@grasshopper.cs.duke.edu>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


David Greenman writes:
 >    Oops. The original assumption (and code that I wrote) was that M_WAIT
 > _cannot_ return a NULL pointer. This was changed in FreeBSD recently, and

Yes, that's always been my assumption too.  That's why I never noticed
it...

Drew




From owner-freebsd-arch  Fri Dec  1 18: 1:25 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from feral.com (feral.com [192.67.166.1])
	by hub.freebsd.org (Postfix) with ESMTP
	id 0C2B737B400; Fri,  1 Dec 2000 18:01:24 -0800 (PST)
Received: from beppo (beppo [192.67.166.79])
	by feral.com (8.9.3/8.9.3) with ESMTP id SAA08647;
	Fri, 1 Dec 2000 18:01:03 -0800
Date: Fri, 1 Dec 2000 18:01:04 -0800 (PST)
From: Matthew Jacob <mjacob@feral.com>
Reply-To: mjacob@feral.com
To: Andrew Gallatin <gallatin@cs.duke.edu>
Cc: dg@root.com, Bosko Milekic <bmilekic@technokratis.com>,
	"Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG,
	alfred@FreeBSD.ORG
Subject: Re: zero copy code review 
In-Reply-To: <14888.22179.833528.247128@grasshopper.cs.duke.edu>
Message-ID: <Pine.BSF.4.21.0012011758110.46782-100000@beppo.feral.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


> 
> David Greenman writes:
>  >    Oops. The original assumption (and code that I wrote) was that M_WAIT
>  > _cannot_ return a NULL pointer. This was changed in FreeBSD recently, and
> 
> Yes, that's always been my assumption too.  That's why I never noticed
> it...

IIRC, this has never been guaranteed. It's often unlikely that a request can't
be satisfied after a sleep with the current code.

We used to kill off shell pipes by spraying Sparc-1s as a test. This was
another reason (at the time) that SunOS (4.2-based with 4.3 changes; pipes were
implemented with mbufs) was considered eligible to be replaced with SVr4.

-matt




From owner-freebsd-arch  Fri Dec  1 18: 6:23 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from implode.root.com (root.com [209.102.106.178])
	by hub.freebsd.org (Postfix) with ESMTP
	id D073737B401; Fri,  1 Dec 2000 18:06:19 -0800 (PST)
Received: from implode.root.com (localhost [127.0.0.1])
	by implode.root.com (8.8.8/8.8.5) with ESMTP id SAA14681;
	Fri, 1 Dec 2000 18:02:20 -0800 (PST)
Message-Id: <200012020202.SAA14681@implode.root.com>
To: mjacob@feral.com
Cc: Andrew Gallatin <gallatin@cs.duke.edu>,
	Bosko Milekic <bmilekic@technokratis.com>,
	"Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG,
	alfred@FreeBSD.ORG
Subject: Re: zero copy code review 
In-reply-to: Your message of "Fri, 01 Dec 2000 18:01:04 PST."
             <Pine.BSF.4.21.0012011758110.46782-100000@beppo.feral.com> 
From: David Greenman <dg@root.com>
Reply-To: dg@root.com
Date: Fri, 01 Dec 2000 18:02:20 -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

>
>> 
>> David Greenman writes:
>>  >    Oops. The original assumption (and code that I wrote) was that M_WAIT
>>  > _cannot_ return a NULL pointer. This was changed in FreeBSD recently, and
>> 
>> Yes, that's always been my assumption too.  That's why I never noticed
>> it...
>
>IIRC, this has never been guaranteed. It's often unlikely that a request can't
>be satisfied after a sleep with the current code.

   FreeBSD blocked indefinitely and never returned a NULL pointer.

-DG

David Greenman
Co-founder, The FreeBSD Project - http://www.freebsd.org
President, TeraSolutions, Inc. - http://www.terasolutions.com
Pave the road of life with opportunities.




From owner-freebsd-arch  Fri Dec  1 18: 6:28 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from feral.com (feral.com [192.67.166.1])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9D58637B400; Fri,  1 Dec 2000 18:06:26 -0800 (PST)
Received: from beppo (beppo [192.67.166.79])
	by feral.com (8.9.3/8.9.3) with ESMTP id SAA08663;
	Fri, 1 Dec 2000 18:06:22 -0800
Date: Fri, 1 Dec 2000 18:06:22 -0800 (PST)
From: Matthew Jacob <mjacob@feral.com>
Reply-To: mjacob@feral.com
To: David Greenman <dg@root.com>
Cc: Andrew Gallatin <gallatin@cs.duke.edu>,
	Bosko Milekic <bmilekic@technokratis.com>,
	"Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG,
	alfred@FreeBSD.ORG
Subject: Re: zero copy code review 
In-Reply-To: <200012020202.SAA14681@implode.root.com>
Message-ID: <Pine.BSF.4.21.0012011805210.46782-100000@beppo.feral.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> >
> >> 
> >> David Greenman writes:
> >>  >    Oops. The original assumption (and code that I wrote) was that M_WAIT
> >>  > _cannot_ return a NULL pointer. This was changed in FreeBSD recently, and
> >> 
> >> Yes, that's always been my assumption too.  That's why I never noticed
> >> it...
> >
> >IIRC, this has never been guaranteed. It's often unlikely that a request can't
> >be satisfied after a sleep with the current code.
> 
>    FreeBSD blocked indefinitely and never returned a NULL pointer.

Smells like livelock somewhere here, but has it changed recently as has been
asserted?




From owner-freebsd-arch  Fri Dec  1 18:26:36 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from implode.root.com (root.com [209.102.106.178])
	by hub.freebsd.org (Postfix) with ESMTP
	id 99A2137B400; Fri,  1 Dec 2000 18:26:33 -0800 (PST)
Received: from implode.root.com (localhost [127.0.0.1])
	by implode.root.com (8.8.8/8.8.5) with ESMTP id SAA14753;
	Fri, 1 Dec 2000 18:22:36 -0800 (PST)
Message-Id: <200012020222.SAA14753@implode.root.com>
To: mjacob@feral.com
Cc: Andrew Gallatin <gallatin@cs.duke.edu>,
	Bosko Milekic <bmilekic@technokratis.com>,
	"Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG,
	alfred@FreeBSD.ORG
Subject: Re: zero copy code review 
In-reply-to: Your message of "Fri, 01 Dec 2000 18:06:22 PST."
             <Pine.BSF.4.21.0012011805210.46782-100000@beppo.feral.com> 
From: David Greenman <dg@root.com>
Reply-To: dg@root.com
Date: Fri, 01 Dec 2000 18:22:36 -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

>> >> Yes, that's always been my assumption too.  That's why I never noticed
>> >> it...
>> >
>> >IIRC, this has never been guaranteed. It's often unlikely that a request can't
>> >be satisfied after a sleep with the current code.
>> 
>>    FreeBSD blocked indefinitely and never returned a NULL pointer.
>
>Smells like livelock somewhere here, but has it changed recently as has been
>asserted?

   Huh? No, the process that is allocating the memory simply blocks, waiting
for memory. If memory never becomes available, then the process never wakes up,
but this is NOT a livelock.

-DG

David Greenman
Co-founder, The FreeBSD Project - http://www.freebsd.org
President, TeraSolutions, Inc. - http://www.terasolutions.com
Pave the road of life with opportunities.




From owner-freebsd-arch  Fri Dec  1 18:43:38 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from feral.com (feral.com [192.67.166.1])
	by hub.freebsd.org (Postfix) with ESMTP
	id 5B22C37B400; Fri,  1 Dec 2000 18:43:36 -0800 (PST)
Received: from beppo (beppo [192.67.166.79])
	by feral.com (8.9.3/8.9.3) with ESMTP id SAA08716;
	Fri, 1 Dec 2000 18:43:33 -0800
Date: Fri, 1 Dec 2000 18:43:33 -0800 (PST)
From: Matthew Jacob <mjacob@feral.com>
Reply-To: mjacob@feral.com
To: David Greenman <dg@root.com>
Cc: Andrew Gallatin <gallatin@cs.duke.edu>,
	Bosko Milekic <bmilekic@technokratis.com>,
	"Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG,
	alfred@FreeBSD.ORG
Subject: Re: zero copy code review 
In-Reply-To: <200012020222.SAA14753@implode.root.com>
Message-ID: <Pine.BSF.4.21.0012011843220.46782-100000@beppo.feral.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On Fri, 1 Dec 2000, David Greenman wrote:

> >> >> Yes, that's always been my assumption too.  That's why I never noticed
> >> >> it...
> >> >
> >> >IIRC, this has never been guaranteed. It's often unlikely that a request can't
> >> >be satisfied after a sleep with the current code.
> >> 
> >>    FreeBSD blocked indefinitely and never returned a NULL pointer.
> >
> >Smells like livelock somewhere here, but has it changed recently as has been
> >asserted?
> 
>    Huh? No, the process allocating the memory blocks waiting for memory. If
> memory never becomes available, then the process never wakes up, but this is
> NOT a livelock.
> 
oops, sorry, you're right.




From owner-freebsd-arch  Fri Dec  1 18:58:28 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [18.24.4.193])
	by hub.freebsd.org (Postfix) with ESMTP id DEFE837B400
	for <arch@freebsd.org>; Fri,  1 Dec 2000 18:58:25 -0800 (PST)
Received: (from wollman@localhost)
	by khavrinen.lcs.mit.edu (8.9.3/8.9.3) id VAA45432;
	Fri, 1 Dec 2000 21:58:21 -0500 (EST)
	(envelope-from wollman)
Date: Fri, 1 Dec 2000 21:58:21 -0500 (EST)
From: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
Message-Id: <200012020258.VAA45432@khavrinen.lcs.mit.edu>
To: dg@root.com
Cc: arch@freebsd.org
Subject: Re: zero copy code review 
X-Newsgroups: mit.lcs.mail.freebsd-arch
In-Reply-To: <mit.lcs.mail.freebsd-arch/200012020202.SAA14681@implode.root.com>
References: <mit.lcs.mail.freebsd-arch/Pine.BSF.4.21.0012011758110.46782-100000@beppo.feral.com>
Organization: MIT Laboratory for Computer Science
Cc: 
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In article <mit.lcs.mail.freebsd-arch/200012020202.SAA14681@implode.root.com> you write:

>   FreeBSD blocked indefinitely and never returned a NULL pointer.

It has never been like that in the FreeBSD era, to my knowledge.  4.3
(or at least 4.3+Wisconsin NFS) slept for mbufs but panicked if it
couldn't allocate a cluster; 4.4 as we got it would drain protocols
once, for mbufs only, and then return nil if there were still no mbufs
free -- thus causing a page-not-present fault a few instructions later
as code which assumed M_WAIT could never fail dereferenced the null
pointer.
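
As a rough illustration of the "drain once, retry, then return nil"
policy described above, here is a minimal userland sketch.  It is not the
4.4 code; the pool structure and every function name in it are
hypothetical stand-ins, and "draining" is modelled as simply handing
reclaimable slots back to the pool.

```c
#include <stddef.h>

#define POOL_SIZE 4

/* Hypothetical stand-in for an mbuf pool; not a kernel structure.
 * Assumes nfree + reclaimable never exceeds POOL_SIZE. */
struct pool {
    char slots[POOL_SIZE]; /* storage handed out one slot at a time */
    int  nfree;            /* slots available right now */
    int  reclaimable;      /* slots a drain pass could give back */
    int  drains;           /* number of drain passes performed */
};

/* Model of "draining the protocols": reclaim whatever can be reclaimed. */
void pool_drain(struct pool *p)
{
    p->nfree += p->reclaimable;
    p->reclaimable = 0;
    p->drains++;
}

/* Non-blocking attempt: returns NULL when nothing is free. */
void *pool_try_alloc(struct pool *p)
{
    if (p->nfree == 0)
        return NULL;
    p->nfree--;
    return &p->slots[p->nfree];
}

/* "Wait" allocation in the style described: try, drain exactly once,
 * try again, and give up with NULL instead of sleeping. */
void *pool_alloc_wait(struct pool *p)
{
    void *m = pool_try_alloc(p);

    if (m == NULL) {
        pool_drain(p);
        m = pool_try_alloc(p);
    }
    return m; /* may still be NULL: callers must check */
}
```

The point of the sketch is the last comment: with this policy a caller
that assumes the "wait" variant cannot fail will dereference a null
pointer as soon as the drain pass comes back empty.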

Deadlocks may have been possible under 4.3+NFS, if the kernel wanted
to allocate a page of physical memory for more mbufs, but all
potentially-available memory was both dirty and backed by NFS (think
diskless workstation).  My guess is that this is why 4.4 did not
sleep.

-GAWollman

-- 
Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
wollman@lcs.mit.edu  | O Siem / The fires of freedom 
Opinions not those of| Dance in the burning flame
MIT, LCS, CRS, or NSA|                     - Susan Aglukark and Chad Irschick




From owner-freebsd-arch  Fri Dec  1 22:37:50 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from homer.softweyr.com (bsdconspiracy.net [208.187.122.220])
	by hub.freebsd.org (Postfix) with ESMTP id 2335C37B400
	for <arch@freebsd.org>; Fri,  1 Dec 2000 22:37:45 -0800 (PST)
Received: from [127.0.0.1] (helo=softweyr.com ident=Fools trust ident!)
	by homer.softweyr.com with esmtp (Exim 3.16 #1)
	id 1426Lg-0000SZ-00; Fri, 01 Dec 2000 23:40:41 -0700
Message-ID: <3A289968.63C593E2@softweyr.com>
Date: Fri, 01 Dec 2000 23:40:40 -0700
From: Wes Peters <wes@softweyr.com>
Organization: Softweyr LLC
X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.2.12 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: Warner Losh <imp@village.org>
Cc: arch@FreeBSD.ORG
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image 
 for x86 updated.)
References: <3A24B642.34B50961@softweyr.com>  <52694.975362925@winston.osd.bsdi.com> <20001127144809.A67395@citusc17.usc.edu> <200011272307.eARN7Ln34886@earth.backplane.com> <200012012008.NAA08306@harmony.village.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Warner Losh wrote:
> 
> In message <3A24B642.34B50961@softweyr.com> Wes Peters writes:
> : IMHO, this is one of the biggest arguments for using bash.  I get bitten
> : all the time when I leave bash for another interactive program that no
> : longer provides BS/DEL compatibility.  Fixing it everywhere is a good
> : idea.
> 
> I see that this has already been committed.  I'm not going to argue
> with that (I think it was a good idea), but there are other issues in
> the tree.
> 
> The issue that I have is that there are many places in the tree where
> the erase character is known and things are done based on it.  Will
> all of those be updated to have the two aces?  There's a hack in hack
> right now:
> 
> ./games/hack/hack.tty.c:                if(c == erase_char || c == '\b') {
> 
> as well as other examples in the tree.
> 
> Talk also has a provision for transporting these characters over the
> interface.  If both were allowed, some translation would also be
> needed.

It shouldn't make any difference if the interface is in raw mode, which is
pretty much required for any character-at-a-time I/O.  I would have 
preferred to see this in a special line discipline module rather than
buried in the bowels of the tty driver, so it could be optional behavior.
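
For a program that reads in raw mode and does its own line editing, the
handling in question might look roughly like the sketch below.  This is
only an illustration under my own assumptions: line_edit() and the CH_*
names are made up for the example and do not come from hack, talk, or the
tty driver.

```c
#include <stddef.h>

#define CH_BS  '\b'   /* backspace, 0x08 */
#define CH_DEL 0x7f   /* delete */

/*
 * Apply one input byte to a line buffer of capacity cap that currently
 * holds len bytes, and return the new length.  Both BS and DEL are
 * treated as the erase character; anything else is appended if it fits.
 */
size_t line_edit(char *buf, size_t len, size_t cap, int c)
{
    if (c == CH_BS || c == CH_DEL)
        return len > 0 ? len - 1 : 0;
    if (len < cap)
        buf[len++] = (char)c;
    return len;
}
```

Since the program sees every byte itself in raw mode, which of the two
characters the tty driver considers "the" erase character never enters
into it.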

-- 
            "Where am I, and what am I doing in this handbasket?"

Wes Peters                                                         Softweyr LLC
wes@softweyr.com                                           http://softweyr.com/




From owner-freebsd-arch  Sat Dec  2  9:59:41 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from falla.videotron.net (falla.videotron.net [205.151.222.106])
	by hub.freebsd.org (Postfix) with ESMTP id 49A0A37B400
	for <arch@FreeBSD.ORG>; Sat,  2 Dec 2000 09:59:39 -0800 (PST)
Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213])
 by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8)
 with ESMTP id <0G4Y005JNCNC4R@falla.videotron.net> for arch@FreeBSD.ORG; Sat,  2 Dec 2000 12:59:36 -0500 (EST)
Date: Sat, 02 Dec 2000 13:00:22 -0500 (EST)
From: Bosko Milekic <bmilekic@technokratis.com>
Subject: Re: zero copy code review
In-reply-to: <20001201002235.D10772@panzer.kdm.org>
To: "Kenneth D. Merry" <ken@kdm.org>
Cc: arch@FreeBSD.ORG
Message-id: <Pine.BSF.4.21.0012021237540.91517-100000@jehovah.technokratis.com>
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On Fri, 1 Dec 2000, Kenneth D. Merry wrote:

> It does have spls in the right places, in this case splimp() and splvm().
> Would you just convert those to the proper mutexes, or are we going to go
> with per-data-structure mutexes (i.e. a little finer granularity), or...?
> (I don't know much about the mutex strategy we're using...)

	For now, you won't be able to do anything with the splvm() stuff, as
  the VM code has not yet been ripped out from under Giant (and likely
  won't be for a while).
  	A few notes Re: spl()s and mutexes in uipc_jumbo.c, in particular
  (since that's where I would begin putting in mutexes):

  - Your jumbo_kmap singly linked list should probably not be manipulated
    under splvm() [in fact, I think it's wrong]. The list should be
    protected by a lock.

  - jumbo_freem should just be called jumbo_free, if the naming convention
    is being adopted from the mbuf system (which it looks like it is). The
    reason is that for mbufs, m_free() frees a single mbuf while m_freem()
    frees an entire chain of them.

  - jumbo_pg_free should be ripped out from under splimp(); leave the
    explicit splvm() in there, but protect the list manipulations with the
    lock.
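
  To make the naming point concrete, here is a small userland sketch of
  the free/freem split. The struct and function names are hypothetical
  (they are not the mbuf or jumbo code), and buf_freem() returns a count
  only so the example can be checked; m_freem() itself returns nothing.

```c
#include <stdlib.h>

/* Hypothetical chained buffer; stands in for an mbuf-like object. */
struct buf {
    struct buf *next;
};

/* Free a single buffer and return its successor, as m_free() does. */
struct buf *buf_free(struct buf *b)
{
    struct buf *n = b->next;

    free(b);
    return n;
}

/* Free an entire chain, as m_freem() does.  Returns how many buffers
 * were released (an addition made for this example only). */
int buf_freem(struct buf *b)
{
    int n = 0;

    while (b != NULL) {
        b = buf_free(b);
        n++;
    }
    return n;
}
```

  A routine that only ever releases one object is the analogue of
  buf_free() here, which is why the "m" suffix reads as misleading.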

	If most of the things pointed out earlier are fixed, and as long as
  the code is not flawed (which I really doubt it would be anyway), I have
  no objections to it going in soon and then attacking the above issue a
  little later (If nobody gets to it within the next two weeks, I'll be
  glad to do it myself once those 2 weeks are past).

  Regards,
  Bosko Milekic
  bmilekic@technokratis.com




From owner-freebsd-arch  Sat Dec  2 10: 6:15 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from falla.videotron.net (falla.videotron.net [205.151.222.106])
	by hub.freebsd.org (Postfix) with ESMTP id BD31D37B6A1
	for <arch@FreeBSD.ORG>; Sat,  2 Dec 2000 10:06:10 -0800 (PST)
Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213])
 by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8)
 with ESMTP id <0G4Y0066FCY87W@falla.videotron.net> for arch@FreeBSD.ORG; Sat,  2 Dec 2000 13:06:09 -0500 (EST)
Date: Sat, 02 Dec 2000 13:06:54 -0500 (EST)
From: Bosko Milekic <bmilekic@technokratis.com>
Subject: Re: zero copy code review
In-reply-to: <14888.9802.415926.434956@grasshopper.cs.duke.edu>
To: Andrew Gallatin <gallatin@cs.duke.edu>
Cc: "Kenneth D. Merry" <ken@kdm.org>, arch@FreeBSD.ORG
Message-id: <Pine.BSF.4.21.0012021301450.91641-100000@jehovah.technokratis.com>
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On Fri, 1 Dec 2000, Andrew Gallatin wrote:

> I'm still not sure I understand your objection.  There's some code in
> socow_cowsetup() which uses sf bufs.  Prior to allocating the sf_buf, it
> does some of its own fiddling with the page and introduces some state
> the sf_buf_free() wouldn't know how to clear.  socow_iodone() undoes
> that fiddling and then calls sf_buf_free() to free the sfbuf.  Isn't
> it better to call sf_buf_free() than to cut & paste the code?
> 
> <...>

	Yeah, you're right. I overlooked things when I posted that.

> I see your point.  This was copied, (bug for bug ;-), from sendfile itself.
> Look at line 1700 or so of kern/uipc_syscalls.c.  This bug should
> probably be fixed there too.

	Yep. You're right. This is a bug that is the result of some of my
  code, actually (a while back, before I got the commit bit). When the wait
  code was first introduced, I had to go around the code looking for places
  previously expecting that M_WAIT will never return NULL and make them
  deal with the possibility. As we see now, I overlooked the fact that the
  sf_buf has to be freed in the case of failure, in the sendfile(2) case.
  Good thing we caught this now, and David Greenman was extremely quick to
  roll a diff.
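
  The shape of the fix can be sketched in userland as follows. This is
  not the sendfile(2) code or the actual diff; the fake_* types and
  functions, the live_sf_bufs counter and the failure switch are all
  hypothetical, there only to show the cleanup on the failure path.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's sf_buf or mbuf. */
struct fake_sf_buf { int page; };
struct fake_mbuf   { struct fake_sf_buf *ext; };

static int live_sf_bufs;           /* outstanding fake_sf_bufs */
static int mbuf_alloc_should_fail; /* switch to simulate exhaustion */

struct fake_sf_buf *fake_sf_buf_alloc(void)
{
    struct fake_sf_buf *sf = malloc(sizeof(*sf));

    if (sf != NULL)
        live_sf_bufs++;
    return sf;
}

void fake_sf_buf_free(struct fake_sf_buf *sf)
{
    free(sf);
    live_sf_bufs--;
}

/* Models an allocation that may return NULL even in the "wait" case. */
struct fake_mbuf *fake_mbuf_get(void)
{
    if (mbuf_alloc_should_fail)
        return NULL;
    return malloc(sizeof(struct fake_mbuf));
}

/* Returns 0 on success and -1 on failure.  On failure nothing leaks. */
int attach_page(struct fake_mbuf **out)
{
    struct fake_sf_buf *sf = fake_sf_buf_alloc();
    struct fake_mbuf *m;

    if (sf == NULL)
        return -1;
    m = fake_mbuf_get();
    if (m == NULL) {
        /* The step that was overlooked: release the first resource. */
        fake_sf_buf_free(sf);
        return -1;
    }
    m->ext = sf;
    *out = m;
    return 0;
}
```

  Once the second allocation is allowed to fail, every resource acquired
  before it needs a matching release on that path.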

> The nfs sf_buf_alloc() calls will be made from either a process
> context (when doing a zero-copy send over a socket) or from the
> context of an nfsiod for the NFS code, so I think this should
> be safe.

	Excellent.

> Thanks!
> 
> Drew

  Cheers,
  Bosko Milekic
  bmilekic@technokratis.com




From owner-freebsd-arch  Sat Dec  2 10:16:56 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from field.videotron.net (field.videotron.net [205.151.222.108])
	by hub.freebsd.org (Postfix) with ESMTP id 5B74437B400
	for <arch@FreeBSD.ORG>; Sat,  2 Dec 2000 10:16:53 -0800 (PST)
Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213])
 by field.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8)
 with ESMTP id <0G4Y0067MDG2K7@field.videotron.net> for arch@FreeBSD.ORG; Sat,  2 Dec 2000 13:16:51 -0500 (EST)
Date: Sat, 02 Dec 2000 13:17:36 -0500 (EST)
From: Bosko Milekic <bmilekic@technokratis.com>
Subject: Re: zero copy code review
In-reply-to: <20001201001619.C10772@panzer.kdm.org>
To: "Kenneth D. Merry" <ken@kdm.org>
Cc: arch@FreeBSD.ORG
Message-id: <Pine.BSF.4.21.0012021308040.91662-100000@jehovah.technokratis.com>
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On Fri, 1 Dec 2000, Kenneth D. Merry wrote:

> > 	- Stylistic suggestion: please try to keep things 25x80. :-)
> 
> I try, and I think most of the changes are, except for the NFS stuff.  I
> didn't reformat that, although I suppose I could.  (It irritates me, too.)

	Ah, that explains it.

> In its current incarnation, EXT_DISPOSABLE indicates that the memory
> used in the mbuf can be disposed of -- i.e. removed from the kernel's
> virtual address map.  The contents aren't disposed of, they're just moved
> elsewhere.
> 
> I don't think most of the rest of the mbuf code is set up to deal with the
> memory inside a non-external mbuf going away.  (Which would be the
> potential implication of having EXT_DISPOSABLE be a regular m_flag.)

	Okay, leaving that exactly the way it is now is The Right Thing To
  Do (I'm now convinced).

> >  tiio.h: Are you sure tiio.h belongs in src/sys/sys ?
> 
> Well, it defines the interface for the character device front end for the
> ti(4) driver.  Usually ioctls and supporting structures go in sys/sys. 
> Would you suggest another location?

	No, you're right.

> When Bill converted the ti(4) driver from spls to mutexes, I did the same
> conversion on my modifications to the driver.  Is that sufficient?  I'm not
> terribly up-to-date on the mutex stuff.
> 
> As for the rest of the code, since it was written pre-mutex, it still has
> the spls in the right places.  I suppose that they would just need to be
> converted to mutexes.  (Or is that an overly simplistic way to look at it? :)

	Well, you really only want to maintain data consistency with the
  lock. So you'll be looking at protecting your jumbo_kmap lists in the
  uipc_jumbo.c case with their own lock(s). If you're always looking at
  both of the lists (inuse and free) at the same time, protecting them with
  a single lock would be sufficient.
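
  As a sketch of the single-lock idea, here is a userland model in which
  a pthread mutex stands in for a kernel mutex. None of the names below
  come from uipc_jumbo.c; the pool and entry structures are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical list entry and pool; not the jumbo_kmap structures. */
struct kmap_entry {
    struct kmap_entry *next;
    int id;
};

struct kmap_pool {
    pthread_mutex_t    lock;        /* protects both lists below */
    struct kmap_entry *free_list;
    struct kmap_entry *inuse_list;
};

/* Move the head of the free list onto the inuse list.
 * Returns the entry, or NULL if the free list is empty. */
struct kmap_entry *pool_take(struct kmap_pool *p)
{
    struct kmap_entry *e;

    pthread_mutex_lock(&p->lock);
    e = p->free_list;
    if (e != NULL) {
        p->free_list = e->next;
        e->next = p->inuse_list;
        p->inuse_list = e;
    }
    pthread_mutex_unlock(&p->lock);
    return e;
}

/* Unlink e from the inuse list and push it back onto the free list.
 * Does nothing if e is not on the inuse list. */
void pool_give(struct kmap_pool *p, struct kmap_entry *e)
{
    struct kmap_entry **pp;

    pthread_mutex_lock(&p->lock);
    pp = &p->inuse_list;
    while (*pp != NULL && *pp != e)
        pp = &(*pp)->next;
    if (*pp != NULL) {
        *pp = e->next;
        e->next = p->free_list;
        p->free_list = e;
    }
    pthread_mutex_unlock(&p->lock);
}
```

  Both operations touch the two lists together, so one lock keeps them
  consistent; separate locks would only pay off if the lists were
  normally manipulated independently.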
  	As for splvm(), you can leave that as is for now. I've included
  comments on the locking in uipc_jumbo.c in another post.
  	As for if_ti, I would have Bill Paul review that.

> >  Finally, I'd like to
> >  suggest possibly breaking up some of the diff to smaller chunks, just so
> >  it is easier to track things down if something does break. With -CURRENT
> >  changing relatively dramatically now sometimes several times in a single
> >  day, I think this would be worth it for everybody.
> 
> Heh, well, the big chunk is the Tigon firmware.  :)  
> 
> Are you suggesting just splitting the diffs out into multiple files, or
> actually breaking the changes up?  The latter would be rather difficult to
> do, I think.

	I was suggesting breaking some of the changes up, actually, and
  committing in several chunks (two or three, as opposed to one). But if
  this is too much of a problem, you don't have to feel obliged to
  implement the suggestion.

> Ken
> -- 
> Kenneth Merry
> ken@kdm.org

  Regards,
  Bosko Milekic
  bmilekic@technokratis.com




From owner-freebsd-arch  Sat Dec  2 14: 9:40 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from implode.root.com (root.com [209.102.106.178])
	by hub.freebsd.org (Postfix) with ESMTP id 29D6C37B400
	for <arch@freebsd.org>; Sat,  2 Dec 2000 14:09:38 -0800 (PST)
Received: from implode.root.com (localhost [127.0.0.1])
	by implode.root.com (8.8.8/8.8.5) with ESMTP id OAA16275;
	Sat, 2 Dec 2000 14:07:03 -0800 (PST)
Message-Id: <200012022207.OAA16275@implode.root.com>
To: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
Cc: arch@freebsd.org
Subject: Re: zero copy code review 
In-reply-to: Your message of "Fri, 01 Dec 2000 21:58:21 EST."
             <200012020258.VAA45432@khavrinen.lcs.mit.edu> 
From: David Greenman <dg@root.com>
Reply-To: dg@root.com
Date: Sat, 02 Dec 2000 14:07:03 -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

>In article <mit.lcs.mail.freebsd-arch/200012020202.SAA14681@implode.root.com> you write:
>
>>   FreeBSD blocked indefinitely and never returned a NULL pointer.
>
>It has never been like that in the FreeBSD era, to my knowledge.  4.3

   What we're in dispute over is what happens when the kernel runs out
of virtual memory - "mb_map" space. I'm pretty certain that FreeBSD
versions < 2.0 did just sleep when running out of mb_map space, although
I don't have the code around to verify this claim. It's interesting to
note that a process that went to sleep on the map would never wake up
since virtual memory allocated to network buffers was never returned to
the map and thus the kernel would never satisfy the VM shortage. In
FreeBSD 2.0, however, the kernel panicked when running out of mb_map space
with a "mb_map full" panic. It did not return a NULL pointer in the M_WAIT
case. Starting with FreeBSD 2.0.5, FreeBSD printed a console message and
returned a NULL pointer when running out of mb_map. I should have remembered
this better since I was the one who made the change for it to do this in
rev 1.9 of uipc_mbuf.c.
   Going back to 4.3 BSD, I see that the code behaved the same way that
FreeBSD 2.0 did, specifically in m_clalloc:

        mbx = rmalloc(mbmap, (long)npg);
        if (mbx == 0) {
                if (canwait == M_WAIT)
                        panic("out of mbufs: map full");
                return (0);
        }

   My main point was that it used to be a safe assumption that a NULL pointer
wasn't returned in the M_WAIT case. Now that I see that I was the one who
originally broke this assumption, I feel a bit sheepish, so I'll just crawl
away quietly and let this discussion progress. :-)

-DG

David Greenman
Co-founder, The FreeBSD Project - http://www.freebsd.org
President, TeraSolutions, Inc. - http://www.terasolutions.com
Pave the road of life with opportunities.

