From owner-freebsd-arch Sun Nov 26 6:22:26 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.interware.hu (mail.interware.hu [195.70.32.130])
	by hub.freebsd.org (Postfix) with ESMTP
	id 3FAA137B479; Sun, 26 Nov 2000 06:22:21 -0800 (PST)
Received: from kinshasa-57.budapest.interware.hu ([195.70.51.185] helo=elischer.org)
	by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian))
	id 1402gp-0004yJ-00; Sun, 26 Nov 2000 15:22:00 +0100
Message-ID: <3A211C82.2464D07E@elischer.org>
Date: Sun, 26 Nov 2000 06:21:54 -0800
From: Julian Elischer
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: arch@FreeBSD.ORG, jasone@freebsd.org
Subject: Re: Threads (KSE etc) comments
References: <3A1B0B64.6D694248@elischer.org>
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Where's jasone? He's been peculiarly silent during this....

There has been some discussion as to what the function of the KSEG is....

I was shaving today (doesn't the best thinking happen at such times?) and thinking about why we needed KSEGs.

The basic answer is,

"We need some method by which we group the scheduled entities so as to be able to ensure that the scheduler has full information and control over what is going on."

Whether we actually need a KSEG and what it does depends upon what semantics we want our threading support to have. If we want to provide a virtual machine for the process, that looks as if it has an unlimited number of virtual processors, then we allow the KSEG to spawn an unlimited number of KSEs. In this case, do we allow the "scheduling clout" to build up linearly with the number of KSEs or do we limit it in some way? Theoretically you would want a KSEG with two KSEs to have the same clout as a process running unthreaded, so that cpu time would be divided 50-50.
However this would mean assigning the threaded process a 'partial quantum' for each processor.

By this I mean that after 5 ticks the KSE on each processor for the KSEG would be interrupted and the other process allowed to run. This is unworkable. Another way of sharing the processors between the two processes would be to schedule both KSEs on one processor and allow the other process to run uninterrupted on the other. This is also quite unworkable - what if there are three competing processes and only 2 processors?

Maybe this 'exact fairness' is too hard to achieve..

In my world, we allow the KSEG to become SLIGHTLY unfair, by allowing it to compete independently on each processor. If we allow the KSEG to have an unlimited number of KSEs then we need some other item that competes on behalf of the KSEG on each processor. That is, we invent some other structure (KSEG-agent) that sits in the scheduling queue(s) on behalf of the KSEG. When the 'agent' gets a quantum, it allows the KSEG to decide which of its KSEGs will be run next. (The KSEG could round robin them for example).

When a KSE is pre-empted, the kernel saves state for that thread in the thread-control-block and the next KSE to upcall to the UTS will include that thread-control-block in its list of reportable entities. I'm not clear on whether it's the next upcall on ANY KSE, or just the next upcall on that KSE..

If the latter, then having multiple KSEs on the same processor allows the KSEG round-robin scheduler to make the UTS believe that it has N virtual processors (N KSEs). However, it also means that the KSEG round-robin scheduler is usurping the decision from the UTS as to which thread is to be run next, as the UTS doesn't know that the thread on the other KSE was pre-empted in favour of this one. (It's on a different virtual CPU).
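
To make the 'agent' idea a bit more concrete, here is a rough user-space sketch (not kernel code) of an agent that round-robins the KSEs of its KSEG each time it gets a quantum. All of the names here (`struct kseg_agent`, `kseg_agent_pick`, `KSEG_MAX_KSES`) are made up for illustration, and the `runnable` flag is an assumption about how a KSE would advertise that it has work.

```c
#include <assert.h>
#include <stddef.h>

#define KSEG_MAX_KSES 8		/* arbitrary limit for this sketch */

struct kse {
	int id;
	int runnable;		/* non-zero if this KSE has work to do */
};

/* Hypothetical per-CPU stand-in that queues on behalf of a KSEG. */
struct kseg_agent {
	struct kse *kses[KSEG_MAX_KSES];
	size_t nkses;
	size_t next;		/* round-robin cursor */
};

/*
 * Called when the agent gets a quantum: return the next runnable KSE
 * in round-robin order, or NULL if none of them is runnable.
 */
struct kse *
kseg_agent_pick(struct kseg_agent *ag)
{
	size_t i, idx;

	assert(ag->nkses <= KSEG_MAX_KSES);
	for (i = 0; i < ag->nkses; i++) {
		idx = (ag->next + i) % ag->nkses;
		if (ag->kses[idx]->runnable) {
			/* Start the next search just past this KSE. */
			ag->next = (idx + 1) % ag->nkses;
			return (ag->kses[idx]);
		}
	}
	return (NULL);
}
```

Note that the pick happens in the agent and the UTS never sees it, which is exactly the "usurping" problem described above.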
If the former (all KSEs report all events) then there is no real advantage to having more than N KSEs (N processors), because that means that the UTS will probably keep swapping the threads it thinks are most important to the KSEs, which means that the thread that was pre-empted on KSE-A will probably be re-scheduled on KSE-B when it preempts KSE-A. So why have KSE-B at all? All it does is massively confuse things, and creates a whole new class of scheduling problems.

So, in summary:
Assuming we allow only SLIGHT unfairness, if you allow the process to have more than N KSEs in a KSEG, you have one of the following:

1/ A lot of unfairness if you allow each KSE to be in the queues by itself.
2/ The KSEG scheduler usurping the role of the UTS if it really does hide the true number of processors.
3/ An increased level of UTS complexity, and un-needed work, as the UTS struggles to switch the important threads onto the ever-changing set of running KSEs (it must be ever changing because there are more of them than CPUs).

If you only allow N KSEs to the KSEG, then all these problems go away. The UTS can be aware that it has a limit. But it can also be aware that a KSE will not be pre-empted by another of its own KSEs. (this simplifies things). It gets the same amount of CPU-time, but has less work to do. It has full control of which threads are running, and competes fairly with other processes and KSEGs.

The reason for having KSEGs is simply as an entity that competes for CPU to assure fairness. It may not even exist as a separate structure in the case where there are separate per-CPU scheduling queues, (though I think it would for efficiency's sake). It would PROBABLY have an analogous partner in the UTS that represents the virtual machine that runs all the threads that are competing at the same scope.

On a single scheduling queue system, I think I would have the KSEG in the queue rather than the independent KSEs.
When it gets to the head, you schedule KSEs on all the CPUs. This allows the threads to communicate quickly using shared memory should they want. The UTS has the entire quantum across as many CPUs as it has.

I hope that this answers some of the questions as to why I think there are reasons for having the KSEG entity.

I hope there will be a good argument about this. We want as many people thinking about it as possible.

I'll try to draw up some more pictures.....(like last time) to illustrate my thoughts as to how this all works.

Julian

--
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000
---> X_.---._/  presently in:  Budapest
            v

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Sun Nov 26 12:18:36 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.interware.hu (mail.interware.hu [195.70.32.130])
	by hub.freebsd.org (Postfix) with ESMTP
	id 8491537B479; Sun, 26 Nov 2000 12:18:28 -0800 (PST)
Received: from dakar-60.budapest.interware.hu ([195.70.51.124] helo=elischer.org)
	by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian))
	id 1408Fj-0002ze-00; Sun, 26 Nov 2000 21:18:24 +0100
Message-ID: <3A216FFE.BE0F780F@elischer.org>
Date: Sun, 26 Nov 2000 12:18:06 -0800
From: Julian Elischer
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: arch@FreeBSD.ORG, jasone@freebsd.org
Subject: Re: Threads .. chopping up 'struct proc'
References: <3A1B0B64.6D694248@elischer.org> <3A211C82.2464D07E@elischer.org>
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I've been looking at the proc structure..

The aim is to eventually move some of the fields into a
struct KSE (struct schedbox?)
struct KSEC (struct threadcontext?)
struct KSEG (struct schedgroup?)

Initially we would simply include one of each of these in the struct proc, but link them together as if they were correctly connected up. We would use macros such as:
#define p_estcpu p_kse.kse_estcpu
to keep present code working....
Eventually functions that get changed to receive a kse directly would just use kse->kse_estcpu and if they need proc they can use kse->kse_proc. But until then, we'd start by simply separating the fields and using macros. Then we can convert calls at our leisure.

However when going through the fields in struct proc, some difficulties become obvious.

Here's my initial division of the fields. I've added a comment at the beginning of each line that indicates where I think it should go, however I'm not convinced about some of them:
P = stays in struct proc
E = goes to 'KSE' struct (schedulable entity)
G = goes to 'group' struct
C = goes to 'sleepable Context' struct.

I note with [XXX] things I am not sure about, or do not really understand. These are usually new fields to do with things like events, or fields where the semantics of the feature have not been decided for a threaded environment. E.g. WHO GETS A SIGNAL?

struct proc {
/*E*/ TAILQ_ENTRY(proc) p_procq; /* run/mutex queue. */
	[this may need to be split to two entries.. one in a KSE and one in a KSEG, depending on how we do things ]
/*C*/ TAILQ_ENTRY(proc) p_slpq; /* sleep queue. */
/*P*/ LIST_ENTRY(proc) p_list; /* List of all processes. */

/* substructures: */
/*P*/ struct pcred *p_cred; /* Process owner's identity. */
/*P*/ struct filedesc *p_fd; /* Ptr to open files structure. */
/*P*/ struct pstats *p_stats; /* Accounting/statistics (PROC ONLY). */
	[some of these may need to be duplicated in the KSE and KSEG.. maybe even Context]
/*P*/ struct plimit *p_limit; /* Process limits. */
/*P*/ struct vm_object *p_upages_obj;/* Upages object */
/*P*/ struct procsig *p_procsig;
	[Well, actually who gets signals? maybe this is per KSE? per KSEG?
	maybe even per Context as each context has a different user stack and signals are delivered on the user stack.. (unless set otherwise)]
#define p_sigacts p_procsig->ps_sigacts
#define p_sigignore p_procsig->ps_sigignore
#define p_sigcatch p_procsig->ps_sigcatch
#define p_ucred p_cred->pc_ucred
#define p_rlimit p_limit->pl_rlimit

/*C*/ int p_flag; /* P_* flags. */
	[these flags will probably need to be shared out amongst the structures]
/*C*/ char p_stat; /* S* process status. */
	[as will these]
char p_pad1[3];

/*P*/ pid_t p_pid; /* Process identifier. */
/*P*/ LIST_ENTRY(proc) p_hash; /* Hash chain. */
/*P*/ LIST_ENTRY(proc) p_pglist; /* List of processes in pgrp. */
/*P*/ struct proc *p_pptr; /* Pointer to parent process. */
/*P*/ LIST_ENTRY(proc) p_sibling; /* List of sibling processes. */
/*P*/ LIST_HEAD(, proc) p_children; /* Pointer to list of children. */

/*P*/ struct callout_handle p_ithandle; /*
	 * Callout handle for scheduling
	 * p_realtimer.
	 */
	[So who gets the resulting signal? Can different KSEGs have different timers running? what about KSEs? (I vote for KSEGs)]

/* The following fields are all zeroed upon creation in fork. */
#define p_startzero p_oppid

/*P*/ pid_t p_oppid; /* Save parent pid during ptrace. XXX */
/*C*/ int p_dupfd; /* Sideways return value from fdopen. XXX */
	[whatever THIS means.. it's a hack so C is the safest place for it]
/*P*/ struct vmspace *p_vmspace; /* Address space. */

/* scheduling */
	[I've shown the following as being in the KSE structure. They would be collected there, but the priority is worked out for the entire KSEG so it probably collects the data from all of the KSEs. UNLESS we decide that all KSEs can have independent priorities, in which case how do you control how their priorities relate..]
/*E*/ u_int p_estcpu; /* Time averaged value of p_cpticks. */
/*E*/ int p_cpticks; /* Ticks of cpu time. */
/*E*/ fixpt_t p_pctcpu; /* %cpu for this process during p_swtime */
void *p_wchan; /* Sleep address.
*/
const char *p_wmesg; /* Reason for sleep. */
/*P*/ u_int p_swtime; /* Time swapped in or out. */
/*E?*/ u_int p_slptime; /* Time since last blocked. */
	[what does this mean?]

/*?*/ struct itimerval p_realtimer; /* Alarm timer. */
	[who gets these? who can set them? what is their scope?]
/*P*/ u_int64_t p_runtime; /* Real time in microsec. */
	[If we treat separate KSEGs as separate processes, do we keep the below fields per KSEG?]
/*G?*/ u_int64_t p_uu; /* Previous user time in microsec. */
/*G?*/ u_int64_t p_su; /* Previous system time in microsec. */
/*G?*/ u_int64_t p_iu; /* Previous interrupt time in usec. */
	[how about these? do we aggregate? or collect per KSE? Is there a separate statclock per CPU?]
/*P?*/ u_int64_t p_uticks; /* Statclock hits in user mode. */
/*P?*/ u_int64_t p_sticks; /* Statclock hits in system mode. */
/*P?*/ u_int64_t p_iticks; /* Statclock hits processing intr. */

/*P*/ int p_traceflag; /* Kernel trace points. */
/*P*/ struct vnode *p_tracep; /* Trace to vnode. */
	[do we trace all KSEs at once? how do we trace individual threads?]

/*P*/ sigset_t p_siglist; /* Signals arrived but not delivered. */
	[who gets signals? does each KSEG (KSE?) have its own handler?]

/*P*/ struct vnode *p_textvp; /* Vnode of executable. */

/*P*/ char p_lock; /* Process lock (prevent swap) count. */
/*E*/ u_char p_oncpu; /* Which cpu we are on */
/*E?*/ u_char p_lastcpu; /* Last cpu we were on */
	[each context or each KSE? KSEs can't migrate, (under discussion)]
/*EG?*/ char p_rqindex; /* Run queue index */
	[Who is on the run queue? KSE or KSEG?]

/*C*/ short p_locks; /* DEBUG: lockmgr count of held locks */
/*C*/ short p_simple_locks; /* DEBUG: count of held simple locks */
	[If you cannot sleep or be interrupted with these they could be in the KSE]
/*P?*/ unsigned int p_stops; /* procfs event bitmask */
/*P?*/ unsigned int p_stype; /* procfs stop event type */
/*P?*/ char p_step; /* procfs stop *once* flag */
/*P?*/ unsigned char p_pfsflags; /* procfs flags */
	[the procfs stuff is problematical... depending on what it does and what it is used for, the semantics might vary]
char p_pad3[2]; /* padding for alignment */
/*C*/ register_t p_retval[2]; /* syscall aux returns */
/*P*/ struct sigiolst p_sigiolst; /* list of sigio sources */
	[who gets signals?]
/*P*/ int p_sigparent; /* signal to parent on exit */
/*P*/ sigset_t p_oldsigmask; /* saved mask from before sigpause */
	[one per signal scope.. what IS the scope of a signal?]
/*P*/ int p_sig; /* for core dump/debugger XXX */
/*P*/ u_long p_code; /* for core dump/debugger XXX */
/*P?*/ struct klist p_klist; /* knotes attached to this process */
/*C?*/ LIST_HEAD(, mtx) p_heldmtx; /* for debugging code */
/*CE?*/ struct mtx *p_blocked; /* Mutex process is blocked on */
	[depending on what this means ]
/*C*/ LIST_HEAD(, mtx) p_contested; /* contested locks */

/* End area that is zeroed on creation. */
#define p_endzero p_startcopy

/* The following fields are all copied upon creation in fork. */
#define p_startcopy p_sigmask

/*P?*/ sigset_t p_sigmask; /* Current signal mask. */
/*C?*/ stack_t p_sigstk; /* sp & on stack state variable */
	[what is the scope of a signal?]

/*??*/ int p_magic; /* Magic number. */
	[The fields below would be in the KSEG if the priority of all KSEs in a KSEG were to be calculated at one time.]
/*G*/ u_char p_priority; /* Process priority. */
/*G*/ u_char p_usrpri; /* User-priority based on p_cpu and p_nice. */
/*G*/ u_char p_nativepri; /* Priority before propagation. */
/*G*/ char p_nice; /* Process "nice" value.
*/
/*P*/ char p_comm[MAXCOMLEN+1];

/*P*/ struct pgrp *p_pgrp; /* Pointer to process group. */

/*P*/ struct sysentvec *p_sysent; /* System call dispatch information. */

/*G*/ struct rtprio p_rtprio; /* Realtime priority. */
	[priorities are per KSEG]
/*P*/ struct prison *p_prison;
/*P*/ struct pargs *p_args;
	[Either the whole Process is in gaol or it isn't]
/* End area that is copied on creation. */
#define p_endcopy p_addr
/*P?*/ struct user *p_addr; /* Kernel virtual addr of u-area (PROC ONLY). */
	[XXX Are there 'per KSE' fields there? (actually yes there are...the pcb is there).]
/*C?*/ struct mdproc p_md; /* Any machine-dependent fields. */
	[there is a trapframe there. not sure what it's used for]

/*P*/ u_short p_xstat; /* Exit status for wait; also stop signal. */
/*P*/ u_short p_acflag; /* Accounting flags. */
	[these may be collected per KSE and harvested when needed]
/*P*/ struct rusage *p_ru; /* Exit information. XXX */

/*P*/ int p_nthreads; /* number of threads (only in leader) */
	[not sure how this is used... may become redundant]
/*G?*/ void *p_aioinfo; /* ASYNC I/O info */
	[will aio be per KSE, per KSEG or per PROC?]
/*C*/ int p_wakeup; /* thread id */
	[will surely change]
/*P*/ struct proc *p_peers;
/*P*/ struct proc *p_leader;
/*C*/ struct pasleep p_asleep; /* Used by asleep()/await(). */
/*P*/ void *p_emuldata; /* process-specific emulator state data */
/*C*/ struct ithd *p_ithd; /* for interrupt threads only */
};

Obviously before we can really finish this we need to decide what the scope of signals is..
Who gets externally generated signals?
Who gets signals that are the result of an action (e.g. SIGIO, SIGPIPE)?
Which signals are diverted when you allocate a signal stack?

In the same context, what is the scope of aio? Where are the results delivered? Who is responsible for the kernel threads that do the work? Do we allocate a KSE to run them? etc.etc.

What is the scope of the timers and such?

All this makes a difference in where the fields live....
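
For what it's worth, the macro transition described at the top of this message could look roughly like the sketch below. It is only an illustration under assumptions: the struct layouts are cut down to a couple of fields, `kg_priority`, `proc_link()` and `kse_charge_tick()` are invented names, and only `p_estcpu`, `kse_estcpu` and `kse_proc` come from the text above.

```c
#include <assert.h>
#include <stddef.h>

struct proc;

struct kse {				/* schedulable entity */
	struct proc *kse_proc;		/* back pointer to the owning process */
	unsigned int kse_estcpu;	/* time averaged cpu usage */
};

struct kseg {				/* scheduling group */
	unsigned char kg_priority;	/* hypothetical field name */
};

struct proc {
	int p_pid;
	struct kse p_kse;		/* exactly one KSE embedded, for now */
	struct kseg p_kseg;		/* exactly one KSEG embedded, for now */
};

/* Compatibility macros: unconverted code keeps using the old names. */
#define p_estcpu	p_kse.kse_estcpu
#define p_priority	p_kseg.kg_priority

/* Link the embedded pieces as if they had been allocated separately. */
static void
proc_link(struct proc *p)
{
	p->p_kse.kse_proc = p;
}

/* A converted function takes the KSE directly and never names the proc. */
static void
kse_charge_tick(struct kse *ke)
{
	assert(ke->kse_proc != NULL);
	ke->kse_estcpu++;
}
```

Old code that still says `p->p_estcpu` and new code that says `ke->kse_estcpu` touch the same storage, so callers can be converted one at a time.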

Does anyone have comments?
(Everyone has been VERY quiet so far!!!)

julian

--
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000
---> X_.---._/  presently in:  Budapest
            v

From owner-freebsd-arch Sun Nov 26 13: 0:35 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from io.yi.org (unknown [24.70.218.157])
	by hub.freebsd.org (Postfix) with ESMTP
	id 535DE37B479; Sun, 26 Nov 2000 13:00:15 -0800 (PST)
Received: from io.yi.org (localhost.gvcl1.bc.wave.home.com [127.0.0.1])
	by io.yi.org (Postfix) with ESMTP
	id D824BBA7A; Sun, 26 Nov 2000 13:00:14 -0800 (PST)
X-Mailer: exmh version 2.1.1 10/15/1999
To: arch@freebsd.org
Cc: smp@freebsd.org
Subject: review: callout patch
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Sun, 26 Nov 2000 13:00:14 -0800
From: Jake Burkholder
Message-Id: <20001126210014.D824BBA7A@io.yi.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

This patch makes most of sys/kern/* sources use callout_reset for registering callouts rather than timeout(9). This should greatly reduce the use of the fixed size callfree allocator pool. Currently we panic when it runs out.

This was motivated by NetBSD, who have completely removed timeout(9) from their kernel.

Please review it.
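
As a rough illustration of why caller-owned callout storage avoids the pool problem, here is a small user-space mock-up. It is not the kernel's implementation: `mock_timeout()`, `mock_callout_reset()`, `mock_callout_stop()`, `struct mock_callout` and `POOL_SIZE` are all invented for this sketch, and nothing here actually fires a timer. It only models where the storage comes from.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*timo_fn)(void *);

struct mock_callout {
	timo_fn func;
	void *arg;
	int ticks;
	int pending;
};

#define POOL_SIZE 2		/* tiny on purpose, to show exhaustion */
static struct mock_callout pool[POOL_SIZE];

/*
 * timeout(9)-style: storage comes from a shared fixed-size pool.
 * Returns NULL when the pool is exhausted (where the message above
 * says the kernel currently panics).
 */
struct mock_callout *
mock_timeout(timo_fn fn, void *arg, int ticks)
{
	size_t i;

	for (i = 0; i < POOL_SIZE; i++) {
		if (!pool[i].pending) {
			pool[i].func = fn;
			pool[i].arg = arg;
			pool[i].ticks = ticks;
			pool[i].pending = 1;
			return (&pool[i]);
		}
	}
	return (NULL);
}

/* callout_reset-style: the caller owns the storage, so no pool is used. */
void
mock_callout_reset(struct mock_callout *c, int ticks, timo_fn fn, void *arg)
{
	assert(c != NULL);
	c->func = fn;
	c->arg = arg;
	c->ticks = ticks;
	c->pending = 1;
}

/* callout_stop-style: returns non-zero if a pending callout was cancelled. */
int
mock_callout_stop(struct mock_callout *c)
{
	int was_pending = c->pending;

	c->pending = 0;
	return (was_pending);
}
```

With the second style each subsystem embeds its own `struct mock_callout` (compare `p_itcallout` and `p_slpcallout` in the patch below), so the number of outstanding callouts is bounded by the objects that own them rather than by a global pool.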

Index: compat/linux/linux_misc.c
===================================================================
RCS file: /home/ncvs/src/sys/compat/linux/linux_misc.c,v
retrieving revision 1.88
diff -u -r1.88 linux_misc.c
--- compat/linux/linux_misc.c	2000/11/10 21:30:18	1.88
+++ compat/linux/linux_misc.c	2000/11/26 00:55:05
@@ -115,9 +115,9 @@
 	old_it = p->p_realtimer;
 	getmicrouptime(&tv);
 	if (timevalisset(&old_it.it_value))
-		untimeout(realitexpire, (caddr_t)p, p->p_ithandle);
+		callout_stop(&p->p_itcallout);
 	if (it.it_value.tv_sec != 0) {
-		p->p_ithandle = timeout(realitexpire, (caddr_t)p, tvtohz(&it.it_value));
+		callout_reset(&p->p_itcallout, tvtohz(&it.it_value), realitexpire, p);
 		timevaladd(&it.it_value, &tv);
 	}
 	p->p_realtimer = it;
Index: kern/init_main.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/init_main.c,v
retrieving revision 1.147
diff -u -r1.147 init_main.c
--- kern/init_main.c	2000/11/22 07:41:57	1.147
+++ kern/init_main.c	2000/11/26 00:21:00
@@ -312,6 +312,9 @@
 
 	bcopy("swapper", p->p_comm, sizeof ("swapper"));
 
+	callout_init(&p->p_itcallout, 0);
+	callout_init(&p->p_slpcallout, 0);
+
 	/* Create credentials. */
 	cred0.p_refcnt = 1;
 	cred0.p_uidinfo = uifind(0);
Index: kern/kern_acct.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_acct.c,v
retrieving revision 1.26
diff -u -r1.26 kern_acct.c
--- kern/kern_acct.c	2000/07/04 03:34:06	1.26
+++ kern/kern_acct.c	2000/11/26 07:30:52
@@ -77,11 +77,9 @@
 static void	acctwatch __P((void *));
 
 /*
- * Accounting callout handle used for periodic scheduling of
- * acctwatch.
+ * Accounting callout used for periodic scheduling of acctwatch.
  */
-static struct	callout_handle acctwatch_handle
-    = CALLOUT_HANDLE_INITIALIZER(&acctwatch_handle);
+static struct	callout acctwatch_callout;
 
 /*
  * Accounting vnode pointer, and saved vnode pointer.
@@ -148,7 +146,7 @@
 	 * close the file, and (if no new file was specified, leave).
 	 */
 	if (acctp != NULLVP || savacctp != NULLVP) {
-		untimeout(acctwatch, NULL, acctwatch_handle);
+		callout_stop(&acctwatch_callout);
 		error = vn_close((acctp != NULLVP ? acctp : savacctp), FWRITE,
 		    p->p_ucred, p);
 		acctp = savacctp = NULLVP;
@@ -161,6 +159,7 @@
 	 * free space watcher.
 	 */
 	acctp = nd.ni_vp;
+	callout_init(&acctwatch_callout, 0);
 	acctwatch(NULL);
 	return (error);
 }
@@ -329,5 +328,5 @@
 			log(LOG_NOTICE, "Accounting suspended\n");
 		}
 	}
-	acctwatch_handle = timeout(acctwatch, NULL, acctchkfreq * hz);
+	callout_reset(&acctwatch_callout, acctchkfreq * hz, acctwatch, NULL);
 }
Index: kern/kern_exit.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_exit.c,v
retrieving revision 1.104
diff -u -r1.104 kern_exit.c
--- kern/kern_exit.c	2000/11/22 07:41:58	1.104
+++ kern/kern_exit.c	2000/11/26 00:05:38
@@ -172,7 +172,7 @@
 	p->p_flag |= P_WEXIT;
 	SIGEMPTYSET(p->p_siglist);
 	if (timevalisset(&p->p_realtimer.it_value))
-		untimeout(realitexpire, (caddr_t)p, p->p_ithandle);
+		callout_stop(&p->p_itcallout);
 
 	/*
 	 * Reset any sigio structures pointing to us as a result of
Index: kern/kern_fork.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.84
diff -u -r1.84 kern_fork.c
--- kern/kern_fork.c	2000/11/22 07:41:58	1.84
+++ kern/kern_fork.c	2000/11/26 00:20:48
@@ -483,6 +483,9 @@
 	LIST_INIT(&p2->p_heldmtx);
 	LIST_INIT(&p2->p_contested);
 
+	callout_init(&p2->p_itcallout, 0);
+	callout_init(&p2->p_slpcallout, 0);
+
 #ifdef KTRACE
 	/*
 	 * Copy traceflag and tracefile if enabled.
Index: kern/kern_synch.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.110
diff -u -r1.110 kern_synch.c
--- kern/kern_synch.c	2000/11/22 07:41:58	1.110
+++ kern/kern_synch.c	2000/11/26 00:55:54
@@ -70,6 +70,9 @@
 int	lbolt;
 int	sched_quantum;		/* Roundrobin scheduling quantum in ticks.
 */
 
+static struct callout schedcpu_callout;
+static struct callout roundrobin_callout;
+
 static int	curpriority_cmp __P((struct proc *p));
 static void	endtsleep __P((void *));
 static void	maybe_resched __P((struct proc *chk));
@@ -175,7 +178,7 @@
 	need_resched();
 #endif
 
-	timeout(roundrobin, NULL, sched_quantum);
+	callout_reset(&roundrobin_callout, sched_quantum, roundrobin, NULL);
 }
 
 /*
@@ -344,7 +347,7 @@
 	lockmgr(&allproc_lock, LK_RELEASE, NULL, CURPROC);
 	vmmeter();
 	wakeup((caddr_t)&lbolt);
-	timeout(schedcpu, (void *)0, hz);
+	callout_reset(&schedcpu_callout, hz, schedcpu, NULL);
 }
 
 /*
@@ -414,7 +417,6 @@
 {
 	struct proc *p = curproc;
 	int s, sig, catch = priority & PCATCH;
-	struct callout_handle thandle;
 	int rval = 0;
 	WITNESS_SAVE_DECL(mtx);
 
@@ -465,7 +467,7 @@
 		p, p->p_pid, p->p_comm, (void *) sched_lock.mtx_lock);
 	TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_slpq);
 	if (timo)
-		thandle = timeout(endtsleep, (void *)p, timo);
+		callout_reset(&p->p_slpcallout, timo, endtsleep, p);
 	/*
 	 * We put ourselves on the sleep queue and start our timeout
 	 * before calling CURSIG, as we could stop there, and a wakeup
@@ -517,7 +519,7 @@
 			goto out;
 		}
 	} else if (timo)
-		untimeout(endtsleep, (void *)p, thandle);
+		callout_stop(&p->p_slpcallout);
 	mtx_exit(&sched_lock, MTX_SPIN);
 
 	if (catch && (sig != 0 || (sig = CURSIG(p)))) {
@@ -628,7 +630,6 @@
 	s = splhigh();
 
 	if (p->p_wchan != NULL) {
-		struct callout_handle thandle;
 		int sig;
 		int catch;
 
@@ -646,7 +647,7 @@
 		 */
 
 		if (timo)
-			thandle = timeout(endtsleep, (void *)p, timo);
+			callout_reset(&p->p_slpcallout, timo, endtsleep, p);
 
 		sig = 0;
 		catch = priority & PCATCH;
@@ -687,7 +688,7 @@
 				goto out;
 			}
 		} else if (timo)
-			untimeout(endtsleep, (void *)p, thandle);
+			callout_stop(&p->p_slpcallout);
 		mtx_exit(&sched_lock, MTX_SPIN);
 
 		if (catch && (sig != 0 || (sig = CURSIG(p)))) {
@@ -1036,6 +1037,10 @@
 sched_setup(dummy)
 	void *dummy;
 {
+
+	callout_init(&schedcpu_callout, 1);
+	callout_init(&roundrobin_callout, 0);
+
 	/* Kick off timeout driven events by
calling first time. */
 	roundrobin(NULL);
 	schedcpu(NULL);
Index: kern/kern_time.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_time.c,v
retrieving revision 1.70
diff -u -r1.70 kern_time.c
--- kern/kern_time.c	2000/04/18 15:15:20	1.70
+++ kern/kern_time.c	2000/11/26 01:13:48
@@ -513,10 +513,10 @@
 	s = splclock(); /* XXX: still needed ? */
 	if (uap->which == ITIMER_REAL) {
 		if (timevalisset(&p->p_realtimer.it_value))
-			untimeout(realitexpire, (caddr_t)p, p->p_ithandle);
+			callout_stop(&p->p_itcallout);
 		if (timevalisset(&aitv.it_value))
-			p->p_ithandle = timeout(realitexpire, (caddr_t)p,
-			    tvtohz(&aitv.it_value));
+			callout_reset(&p->p_itcallout, tvtohz(&aitv.it_value),
+			    realitexpire, p);
 		getmicrouptime(&ctv);
 		timevaladd(&aitv.it_value, &ctv);
 		p->p_realtimer = aitv;
@@ -560,8 +560,8 @@
 	if (timevalcmp(&p->p_realtimer.it_value, &ctv, >)) {
 		ntv = p->p_realtimer.it_value;
 		timevalsub(&ntv, &ctv);
-		p->p_ithandle = timeout(realitexpire, (caddr_t)p,
-		    tvtohz(&ntv) - 1);
+		callout_reset(&p->p_itcallout, tvtohz(&ntv) - 1,
+		    realitexpire, p);
 		splx(s);
 		return;
 	}
Index: kern/uipc_domain.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_domain.c,v
retrieving revision 1.22
diff -u -r1.22 uipc_domain.c
--- kern/uipc_domain.c	1999/08/28 00:46:21	1.22
+++ kern/uipc_domain.c	2000/11/26 07:09:06
@@ -61,6 +61,9 @@
 static void domaininit __P((void *));
 SYSINIT(domain, SI_SUB_PROTO_DOMAIN, SI_ORDER_FIRST, domaininit, NULL)
 
+static struct callout pffast_callout;
+static struct callout pfslow_callout;
+
 static void	pffasttimo __P((void *));
 static void	pfslowtimo __P((void *));
 
@@ -136,9 +139,12 @@
 
 	if (max_linkhdr < 16)		/* XXX */
 		max_linkhdr = 16;
+
+	callout_init(&pffast_callout, 0);
+	callout_init(&pfslow_callout, 0);
 
-	timeout(pffasttimo, (void *)0, 1);
-	timeout(pfslowtimo, (void *)0, 1);
+	callout_reset(&pffast_callout, 1, pffasttimo, NULL);
+
	callout_reset(&pfslow_callout, 1, pfslowtimo, NULL);
 }
 
 
@@ -214,7 +220,7 @@
 		for (pr = dp->dom_protosw; pr < dp->dom_protoswNPROTOSW; pr++)
 			if (pr->pr_slowtimo)
 				(*pr->pr_slowtimo)();
-	timeout(pfslowtimo, (void *)0, hz/2);
+	callout_reset(&pfslow_callout, hz/2, pfslowtimo, NULL);
 }
 
 static void
@@ -228,5 +234,5 @@
 		for (pr = dp->dom_protosw; pr < dp->dom_protoswNPROTOSW; pr++)
 			if (pr->pr_fasttimo)
 				(*pr->pr_fasttimo)();
-	timeout(pffasttimo, (void *)0, hz/5);
+	callout_reset(&pffast_callout, hz/5, pffasttimo, NULL);
 }
Index: sys/proc.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/proc.h,v
retrieving revision 1.124
diff -u -r1.124 proc.h
--- sys/proc.h	2000/11/22 07:42:01	1.124
+++ sys/proc.h	2000/11/26 00:28:23
@@ -157,10 +157,6 @@
 	LIST_ENTRY(proc) p_sibling;	/* List of sibling processes. */
 	LIST_HEAD(, proc) p_children;	/* Pointer to list of children. */
 
-	struct callout_handle p_ithandle; /*
-					  * Callout handle for scheduling
-					  * p_realtimer.
-					  */
 /* The following fields are all zeroed upon creation in fork. */
 #define	p_startzero	p_oppid
 
@@ -173,11 +169,13 @@
 	u_int	p_estcpu;	 /* Time averaged value of p_cpticks. */
 	int	p_cpticks;	 /* Ticks of cpu time. */
 	fixpt_t	p_pctcpu;	 /* %cpu for this process during p_swtime */
+	struct	callout p_slpcallout;	/* Callout for sleep. */
 	void	*p_wchan;	 /* Sleep address. */
 	const char *p_wmesg;	 /* Reason for sleep. */
 	u_int	p_swtime;	 /* Time swapped in or out. */
 	u_int	p_slptime;	 /* Time since last blocked. */
 
+	struct	callout p_itcallout;	/* Interval timer callout. */
 	struct	itimerval p_realtimer;	/* Alarm timer. */
 	u_int64_t p_runtime;	/* Real time in microsec. */
 	u_int64_t p_uu;		/* Previous user time in microsec.
*/

From owner-freebsd-arch Sun Nov 26 13:38:33 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP
	id 7D14B37B479; Sun, 26 Nov 2000 13:38:26 -0800 (PST)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id QAA21022;
	Sun, 26 Nov 2000 16:37:52 -0500 (EST)
Date: Sun, 26 Nov 2000 16:37:49 -0500 (EST)
From: Daniel Eischen
To: Julian Elischer
Cc: arch@FreeBSD.ORG, jasone@FreeBSD.ORG
Subject: Re: Threads (KSE etc) comments
In-Reply-To: <3A211C82.2464D07E@elischer.org>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sun, 26 Nov 2000, Julian Elischer wrote:
> Where's jasone? He's been perculiarly silent during this....
>
> There has been some discussion as to what the function of the KSEG
> is....
>
> I was shaving today (doesn't the best thinkin happen at such times?)
> and thinking about why we needed KSEGs.
>
> The basic answer is,
>
> "We need some method by which we group the scheduled entities
> so as to be able to ensure that the scheduler has full
> information and control over what is going on."

Which scheduler - the UTS or the kernel scheduler? The UTS need not know about KSEGs, except if that is the only way to get a quantum.

> Whether we actually need a KSEG and what it does depends upon what
> semantics we want our threading support to have. If we want to provide a
> virtual machine for the process, that looks as if it has an unlimited
> number of virtual processors, then we allow the KSEG to spawn an
> unlimited number of KSEs. In this case, do we allow the "scheduling
> clout" to build up linearly with the number of KSEs or do we limit it in
> some way?
> Theoretically you would want a KSEG with two KSEs to have the
> same clout as a process running unthreaded, so that cpu time would be
> divided 50-50. However this would mean assigning the threaded process
> 'partial quantum' for each processor.
>
> By this I mean that after 5 ticks the KSE on each processor for the KSE
> would be interrupted and the other process allowed to run. This is
> unworkable. Another way of sharing the processors between the two
> processors would be to schedule bith KSEs on one process and allow the
> other process to run uninterrupted on the other. This is also quite
> unworkable - what if there are three competing processes and only 2
> processors?
>
> Maybe this 'exact fairness' is too hard to achieve..
>
> In my world, we allow the KSEG to become SLIGHTLTY unfair, by allowing
> it to compete independently on each processor. If we allow the KSEG to
> have an unlimited number of KSEs then we need some other item that
> competes on behalf of the KSEG on each processor. That is, we invent
> some other structure (KSEG-agent) that sits in the scheduling queue(s)

I like Terry's usage of "scheduler reservation" which includes quantum and priority.

> on behalf of the KSEG. When the 'agent' gets a quantum, it allows the
> KSEG to decide which of it's KSEGs will be run next. (The KSEG could
                               ^^^^^ KSEs
> round robin them for example).

If you are going to afford N quanta (for N CPUs) to a KSEG, then it doesn't make sense to have more than N KSEs within that KSEG. From the UTS point of view, I will not attempt to create/ask for more than N KSEs. Let's ignore this case.

>
> When a KSE is pre-empted, the kernel saves state for that thread in the
> thread-control-block and the next KSE to upcall to the UTS will include
> that thread-control-block in its list of reportable entities. I'm not
> clear on whether it's the next upcall on ANY KSE, or just the next
> upcall on that KSE.
It has to be on the next KSE to execute; otherwise there will be too much latency (possibly priority inversion) for RT threads if they are being blocked by a preempted thread. For instance, suppose a thread is within a critical region and the KSE on which it is running is preempted, and the next KSE to execute is running in RT (it's a scope system thread). The RT KSE must get notification of the preemption so it can resume the preempted thread long enough for it to leave the critical region. Note also that without notification the RT KSE cannot determine which thread is blocking it. At a minimum, the RT KSE must be able to search all the other KSE mailboxes to find the thread that is blocking it.

You also have read-write hazards that have to be avoided (what happens when the preempted KSE is resumed on another processor while the RT KSE is resuming the preempted thread?). One idea I had was that the RT KSE (in this case) would issue a system call to halt resumption of the preempted KSE. It would then resume the preempted thread until it leaves the critical region, update the preempted KSE's mailbox, and issue another system call to release the preempted KSE. Critical regions are very brief, so this would not be a frequent occurrence.

Anyway, these problems really have to be worked out. I really want this to work well with a mix of RT and non-RT threads. You could have the same problem with threads of the same scheduling class, and it would be possible that no KSE makes any progress until the preempted KSE gets its turn to run again.

> If the latter then having multiple KSEs on the same processor, allows
> the KSEG round-robin scheduler to make the UTS believe that it has N
> virtual processors, (N-KSEs). However, it also means that the KSEG
> round-robin scheduler is usurping the decision from the UTS as to which
> thread is to be run next, as the UTS doesn't know that the thread on the
> other KSE was pre-empted in favour of this one.
> (It's on a different virtual CPU).

It has to know. See above.

>
> If the Former (All KSEs report all events) then there is no real
> advantage to having more than N KSEs (N processors), because that means
> that the UTS will probably keep swapping the threads it thinks are most
> important to the KSEs which means that the thread that was pre-empted on
> KSE-A will probably be re-scheduled on KSE-B when it preempts KSE-A. So
> why have KSE-B at all? All it does is massively confuse things, and
> creates a whole new class of scheduling problems.

I am going to assume that you are talking about KSEs that have their own scheduling quantum (I agree that it doesn't make sense to have more than N KSEs if they don't have their own quantum). What has gone unasked is "what is the application interface to allow creation of KSEs, quantum, thread/kse/processor binding?".

Let's look at it from the UTS and application interface point of view. Let's also ignore scope system threads; they are uninteresting since we know how they are scheduled. So the question now is how are scope process threads scheduled and what API is presented to the application?

I am in the middle of writing up my notes on this topic, and will post them when I'm done. But a brief synopsis is that we want to allow the application to bind scope process threads to a specific KSE, bind KSEs to a specific processor, and create additional quantum (KSEs or KSEGs, subject to limitations of course). This allows the application to decide how threads are scheduled. If a thread is bound to a specific KSE, then it is not rescheduled on another KSE when it is preempted (unless it is in a critical region). If a thread is not bound to a specific KSE and it is preempted, then the UTS could decide to only reschedule it on the next KSE to execute if there were no other threads of greater or equal priority.
The UTS could also decide not to reschedule it regardless; this gets into what scheduling allocation domain we are using. For scheduling allocation domains > 1, it is valid (perhaps against POLA) to have multiple scheduling queues. I submit that it is difficult for the UTS to decide how to (soft or hard) bind threads to KSEs -- perhaps we want to try to do this in the future, but let's keep it simple for now. Let the application decide how threads are bound to KSEs and how much quantum (KSEs or KSEGs) it wants. This makes it much easier for the UTS and doesn't "massively confuse things". > So, in summary: > Assuming we allow only SLIGHT unfairness, if you allow the process to > have more than N KSEs in a KSEG, you have one of the following: > 1/ A lot of unfairness if you allow each KSE to be in the queues by > itself. No more than LinuxThreads or fork()'d processes. Again, this can be limited just as there is a user process limit. I don't see this as a problem. > 2/ The KSEG scheduler usurping the role of the UTS if it really does > hide the true number of processors. > 3/ An increased level of UTS complexity, and un-needed work, as the UTS > struggles to switch the important threads onto the ever-changing set of > running KSEs (it must be ever changing because there are more of them > than CPUs). Not really true. I've addressed this above. > If you only allow N KSEs to the KSEG, then all these problems go away. > The UTS can be aware that it has a limit. But it can also be aware that > a KSE will not be re-empted by another of it's own KSEs. (this > simplifies things). It gets the same amount of > CPU-time, but has less work to do. It has full control of which threads > are running, > and competes fairly with other processes and KSEGs. Whether there are N or N+d KSEs, it makes no difference to the UTS. The same problem of scheduling scope process threads over more than 1 KSE exists; it is no more difficult or simple with a limit of N KSEs. 
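
The binding rules sketched earlier in this reply (a bound thread stays on its KSE unless it is in a critical region; an unbound thread is picked up by the next KSE only if nothing of greater or equal priority is waiting) can be written down as a small decision function. This is only a sketch of one policy the UTS could choose, not a proposed interface; struct uthread, KSE_UNBOUND and uts_should_resume_preempted() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define KSE_UNBOUND (-1)        /* hypothetical marker: thread not bound to a KSE */

/* Hypothetical UTS-side view of a scope process thread. */
struct uthread {
    int  prio;                  /* larger value = more important */
    int  bound_kse;             /* id of the KSE it is bound to, or KSE_UNBOUND */
    bool in_critical;           /* set while inside a critical region */
};

/*
 * Called on KSE `kse_id` when it learns that thread `t` was preempted.
 * `have_other` says whether this KSE has other runnable threads and
 * `best_other_prio` is the highest priority among them.
 * Returns true if this KSE should resume `t` now.
 */
bool
uts_should_resume_preempted(const struct uthread *t, int kse_id,
    bool have_other, int best_other_prio)
{
    /* A thread in a critical region is resumed anywhere, so that it
     * can leave the region and stop blocking other threads. */
    if (t->in_critical)
        return true;

    /* A thread bound to some other KSE is never migrated here. */
    if (t->bound_kse != KSE_UNBOUND && t->bound_kse != kse_id)
        return false;

    /* Otherwise resume it only if no other runnable thread has
     * greater or equal priority. */
    return !have_other || best_other_prio < t->prio;
}
```

With this shape the application's binding choices are plain data on the thread, and the UTS does not have to guess at soft or hard affinity on its own.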
> The reason for having KSEGs is simply as an entity that competes for CPU > to assure fairness. My argument is that if you assign the quantum (and priority) to the KSE, then the _KSE_ is the entity that competes for CPU fairness. There is no visible advantage to me of having a KSEG, especially forcing knowledge of this to the UTS when it doesn't really care. > It may not even exist as a separate structure in the case where there > are separate per-CPU scheduling queues, (though I think it would for > efficiency's sake). It would PROBABLY have a analogous partner in the > UTS that represents the virtual machine that runs all the threads that > are competing at the same scope. On a single scheduling queue system, I > think I would have the KSEG in the queue rather than the independent > KSEs. When it get's to the head, you schedule > KSEs on all the CPUs. This allows the threads to communicate quickly > using shared memory should they want. The UTS has the entire quantum > across as many CPUs as it has. I'm confused. Now you seem to be advocating having multiple KSEs with one quantum. > I hope that his answers some of the questions as to why I think there > are reasons for having the KSEG entity. I am not convinced :-) I think we need to look more closely at what the UTS needs and what API (both POSIX and non-POSIX) is needed/desired. My point is that the UTS doesn't need to know about the KSEG. If that's the only way to get a quantum, then I guess it'll be forced to know about it. But also keep in mind that the UTS could also create a KSEG just as easily as a KSE in order to provide additional quantum. It already has to do this for system scope threads. -- "Some folks are into open source, but me, I'm into open bar." -- Spencer F. 
Katt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Nov 26 13:49:32 2000 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id BF1D237B479; Sun, 26 Nov 2000 13:49:29 -0800 (PST) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id QAA22240; Sun, 26 Nov 2000 16:49:05 -0500 (EST) Date: Sun, 26 Nov 2000 16:49:05 -0500 (EST) From: Daniel Eischen To: Julian Elischer Cc: arch@FreeBSD.ORG, jasone@FreeBSD.ORG Subject: Re: Threads .. chopping up 'struct proc' In-Reply-To: <3A216FFE.BE0F780F@elischer.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sun, 26 Nov 2000, Julian Elischer wrote: > I'v been looking a the proc srtucture.. > > The aim is to eventually move some of the fields into a > struct KSE (struct schedbox?) > struct KSEC (struct threadcontext?) > struct KSEG (struct schedgroup?) [ ... ] > I note with [XXX] things I am sure about, or do nut really understand. > these are usually new fields to do with things like events, or fields > where the semantics of the feature have not been decided for a > threaded environment. E.g. WHO GETS A SIGNAL? First KSE to execute I suppose. A signal is just an upcall, so I'd assume you would want to treat this the same as if a KSE was preempted. > Obviously before we can really finish this we need to decide, > what the scope of signals is.. Who gets externally genrated signals? > Who gets signals that are the result of an action (e.g. SIGIO, SIGPIPE)? > WHich signals are diverted when you allocate a signal stack? > In the same context, what is the scope of aio? > where are the results delivered? who is responsible for the > kernel threads that do the work? do we allocate a KSE to run them? etc.etc. 
Have the kernel automatically allocate a separate KSE (or KSEG) with quantum for aio?

> Does anyone have comments?
> (Everyone has been VERY quiet so far!!!)

Not me :-) Remember that COMDEX was two weeks ago and last week (and this weekend) was a holiday week in the US. I suspect folks are just plain busy.

--
Dan Eischen

From owner-freebsd-arch Sun Nov 26 13:56:50 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from prism.flugsvamp.com (cb58709-a.mdsn1.wi.home.com [24.17.241.9]) by hub.freebsd.org (Postfix) with ESMTP id 1324A37B479; Sun, 26 Nov 2000 13:56:46 -0800 (PST)
Received: (from jlemon@localhost) by prism.flugsvamp.com (8.11.0/8.11.0) id eAQLtQ433741; Sun, 26 Nov 2000 15:55:26 -0600 (CST) (envelope-from jlemon)
Date: Sun, 26 Nov 2000 15:55:26 -0600
From: Jonathan Lemon
To: Jake Burkholder
Cc: arch@FreeBSD.ORG, smp@FreeBSD.ORG
Subject: Re: review: callout patch
Message-ID: <20001126155526.K69183@prism.flugsvamp.com>
References: <20001126210014.D824BBA7A@io.yi.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <20001126210014.D824BBA7A@io.yi.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sun, Nov 26, 2000 at 01:00:14PM -0800, Jake Burkholder wrote:
>
> This patch makes most of sys/kern/* sources use callout_reset for
> registering callouts rather than timeout(9). This should greatly
> reduce the use of the fixed size callfree allocator pool. Currently
> we panic when it runs out.
>
> This was motivated by NetBSD, who have completely removed timeout(9)
> from their kernel.

Looks good to me. I was moving in the same direction, but didn't know that NetBSD had already done this.
-- Jonathan To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Nov 26 14:39:30 2000 Delivered-To: freebsd-arch@freebsd.org Received: from green.dyndns.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 12F4F37B479; Sun, 26 Nov 2000 14:39:15 -0800 (PST) Received: from localhost (vuvjir@localhost [127.0.0.1]) by green.dyndns.org (8.11.0/8.11.0) with ESMTP id eAQMd0576413; Sun, 26 Nov 2000 17:39:07 -0500 (EST) (envelope-from green@FreeBSD.org) Message-Id: <200011262239.eAQMd0576413@green.dyndns.org> X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4 To: Julian Elischer Cc: arch@FreeBSD.org, jasone@FreeBSD.org Subject: Re: Threads .. chopping up 'struct proc' In-Reply-To: Message from Julian Elischer of "Sun, 26 Nov 2000 12:18:06 PST." <3A216FFE.BE0F780F@elischer.org> From: "Brian F. Feldman" Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sun, 26 Nov 2000 17:38:59 -0500 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Julian Elischer wrote: > I'v been looking a the proc srtucture.. > > The aim is to eventually move some of the fields into a > struct KSE (struct schedbox?) > struct KSEC (struct threadcontext?) > struct KSEG (struct schedgroup?) Sounds about right, as far as I've been following the discussion (I read all of -arch, but don't follow -smp at all since I just don't have SMP ;) My question thus far is, okay, given a proc has one of each; will a set of threads, in any form, ALWAYS have a proc backing it up? It would make sense as such, and in that case I'd think that you would reduce a lot of the complexity in the switchover. > Initially we would simply include one of each of these in the struct proc, > but link them together as if they were correctly connected up. > we would use macros such as: > #define p_estcpu p_kse.kse_estcpu > to keep present code working.... 
> eventually functions that get changed to receive a kse directly > would just use kse->kse_estcpu and if they need proc they > can use kse->kse_proc. But until then, we'd start by simply > separating the fields and using macros. Then we can convert > calls at our leasure. What would be the difference between doing it "right" for struct proc in the first place rather than dummying them up? I wouldn't want an artificial discrepancy here, if possible. Perhaps you could explain a bit more of the vision you have here? I haven't been able to pick that bit up from your posts as of yet. A KSE of just one thread would seem to logically be handled the exact same as a process. > However when going through the fields in struct proc, > some difficulties become obvious. Here's my initial > division of the fields. I've added a comment at the > beginning of each line that indicates where I think > it should go, however I'm not convinced about some of them: > > P = stays in struct proc > E = goes to 'KSE' struct (schedulable entity) > G = goes to 'group' struct > C = goes to 'sleepable Context' struct. Does each KSE get a sleepable context? I don't know if I really see where it fits in; sounds like it would have a 1:1 mapping with KSEs. > I note with [XXX] things I am sure about, or do nut really understand. > these are usually new fields to do with things like events, or fields > where the semantics of the feature have not been decided for a > threaded environment. E.g. WHO GETS A SIGNAL? > > struct proc { > /*E*/ TAILQ_ENTRY(proc) p_procq; /* run/mutex queue. */ > [this may need to be split to two entries.. one in a KSE or > and one in a KSEG, depending on how we do things ] > > /*C*/ TAILQ_ENTRY(proc) p_slpq; /* sleep queue. */ > /*P*/ LIST_ENTRY(proc) p_list; /* List of all processes. */ > > /* substructures: */ > /*P*/ struct pcred *p_cred; /* Process owner's identity. */ > /*P*/ struct filedesc *p_fd; /* Ptr to open files structure. 
*/ > /*P*/ struct pstats *p_stats; /* Accounting/statistics (PROC ONLY). */ > [some of these may need to be duplicated in the KSE and KSEG.. > maybe even Context] Sounds particularly evil to have a set of statistics in the process and in the KSEs. How about only in the KSEs, and in the "traditional" case, the process usage info for example would be the addition of all that of the KSEs. > /*P*/ struct plimit *p_limit; /* Process limits. */ > /*P*/ struct vm_object *p_upages_obj;/* Upages object */ This maps to a KSE, really... The struct user maps to the signal handlers (should be per-KSE, I think...), the stats, and the pcb. The pcb absolutely has to be one per CPU context, so proc won't work :) > /*P*/ struct procsig *p_procsig; > [Well, actually who gets signals? maybe this is per KSE? per KSEG? > maybe even per Context as each context has a different user stack and > signals are delivered on the user stack.. (unless set otherwise)] I would think that a KSE should own its own and that it should be configurable whether to use the signal info per-KSE or per-proc. > #define p_sigacts p_procsig->ps_sigacts > #define p_sigignore p_procsig->ps_sigignore > #define p_sigcatch p_procsig->ps_sigcatch > > #define p_ucred p_cred->pc_ucred > #define p_rlimit p_limit->pl_rlimit > > /*C*/ int p_flag; /* P_* flags. */ > [these flags will probably need to be shared out amongst the structures] > /*C*/ char p_stat; /* S* process status. */ > [as will these] > char p_pad1[3]; > > /*P*/ pid_t p_pid; /* Process identifier. */ If signals are per-KSE, would it then follow to give a KSEG a process id and each KSE another process id (same namespace as pids) that could be used to signal it and whatnot? > /*P*/ LIST_ENTRY(proc) p_hash; /* Hash chain. */ > /*P*/ LIST_ENTRY(proc) p_pglist; /* List of processes in pgrp. */ > /*P*/ struct proc *p_pptr; /* Pointer to parent process. */ > /*P*/ LIST_ENTRY(proc) p_sibling; /* List of sibling processes. 
*/ > /*P*/ LIST_HEAD(, proc) p_children; /* Pointer to list of children. */ Would non-RFMEM-fork()ed processes be the only ones here, and RFMEM ones automatically become a KSE of the proc? > /*P*/ struct callout_handle p_ithandle; /* > * Callout handle for scheduling > * p_realtimer. > */ > [So who gets the resulting signal? Can differnt KSEGs have > different timers running? what about KSEs? (I vote for KSEGs)] KSEGs would be simplest. BTW, I don't recall there really being a difference between a KSEG and a process containing KSEs. Is there one? > /* The following fields are all zeroed upon creation in fork. */ > #define p_startzero p_oppid > > /*P*/ pid_t p_oppid; /* Save parent pid during ptrace. XXX */ > /*C*/ int p_dupfd; /* Sideways return value from fdopen. XXX */ > [whatever THIS means.. it's a hack so C is the safest place for it] Per-KSE? Optionally, it would be nice to squash these kind of hacks. > /*P*/ struct vmspace *p_vmspace; /* Address space. */ > > /* scheduling */ > [I've shown the following as being in the KSE structure. they would be > collected there, but the priority is worked out for the entire KSEG > so it probably collects the data from all of the KSEs. UNLESS we decide that > all KSEs can have independent priorities, in which case how do you > control how their priorities relate..] > > /*E*/ u_int p_estcpu; /* Time averaged value of p_cpticks. */ > /*E*/ int p_cpticks; /* Ticks of cpu time. */ > /*E*/ fixpt_t p_pctcpu; /* %cpu for this process during p_swtime */ > void *p_wchan; /* Sleep address. */ > const char *p_wmesg; /* Reason for sleep. */ > /*P*/ u_int p_swtime; /* Time swapped in or out. */ > /*E?*/ u_int p_slptime; /* Time since last blocked. */ > [what does this mean?] The scheduler updates the amount of time the process has been in a tsleep() (msleep()?). Should then be KSE, along with the process states and whatnot. > /*?*/ struct itimerval p_realtimer; /* Alarm timer. */ > [who gets these? who can set them? 
what is their scope?] Same as signals, no? > /*P*/ u_int64_t p_runtime; /* Real time in microsec. */ > > [If we treat separate KSEGs as seperate processes, do we keep the > below fields per KSEG? */ > /*G?*/ u_int64_t p_uu; /* Previous user time in microsec. */ > /*G?*/ u_int64_t p_su; /* Previous system time in microsec. */ > /*G?*/ u_int64_t p_iu; /* Previous interrupt time in usec. */ > [how about these? do we agregate? or collect per KSE? Is there a separate > statclock per CPU?] > /*P?*/ u_int64_t p_uticks; /* Statclock hits in user mode. */ > /*P?*/ u_int64_t p_sticks; /* Statclock hits in system mode. */ > /*P?*/ u_int64_t p_iticks; /* Statclock hits processing intr. */ > > /*P*/ int p_traceflag; /* Kernel trace points. */ > /*P*/ struct vnode *p_tracep; /* Trace to vnode. */ > [do we trace all KSEs at once? how do we trace individual threads? */ I'd think we'd want to enable tracing an individual KSE; this could be done by making the trace vnode per-KSE, but I think it would be advantageous just to change the ktrace info to include both the PID and the KSEid. > /*P*/ sigset_t p_siglist; /* Signals arrived but not delivered. */ > [who gets signals? does each KSEG (KSE?) have its own handler?] Hm. Do you think there's a good use for separate signal-spaces, actually? How would thread migration (across KSEs) be handled for signals, then? Not at all? > /*P*/ struct vnode *p_textvp; /* Vnode of executable. */ > > /*P*/ char p_lock; /* Process lock (prevent swap) count. */ > /*E*/ u_char p_oncpu; /* Which cpu we are on */ > /*E?*/ u_char p_lastcpu; /* Last cpu we were on */ > [each context or each KSE? KSEs can't migrate, (under discussion)] If I may, I believe KSEs should be able to migrate. It doesn't much make sense to waste a CPU at no utilization by saying "KSE x runs on CPU 0, y on 1, and z on 0" and if y is blocked and x and z are both runnable, they must compete for CPU 0 instead of splitting across. 
> /*EG?*/ char p_rqindex; /* Run queue index */ > Who is on the run queue? KSE or KSEG? > > /*C*/ short p_locks; /* DEBUG: lockmgr count of held locks */ > /*C*/ short p_simple_locks; /* DEBUG: count of held simple locks */ > [If you cannot sleep or be interrupted with these they could be in the KSE] You can hold a lockmgr() lock while msleep()ing... > /*P?*/ unsigned int p_stops; /* procfs event bitmask */ > /*P?*/ unsigned int p_stype; /* procfs stop event type */ > /*P?*/ char p_step; /* procfs stop *once* flag */ > /*P?*/ unsigned char p_pfsflags; /* procfs flags */ > [the procfs stuff is problematical... dependign in what it does > and what it is used for, the semantics might vary] Procfs would need modifications if we want to make KSEs visible in it, and this could be trouble... > char p_pad3[2]; /* padding for alignment */ > /*C*/ register_t p_retval[2]; /* syscall aux returns */ E? > /*P*/ struct sigiolst p_sigiolst; /* list of sigio sources */ > [who gets signals?] > > /*P*/ int p_sigparent; /* signal to parent on exit */ > /*P*/ sigset_t p_oldsigmask; /* saved mask from before sigpause */ > [one per signal scope.. what IS the scope of a signal?] > /*P*/ int p_sig; /* for core dump/debugger XXX */ > /*P*/ u_long p_code; /* for core dump/debugger XXX */ > /*P?*/ struct klist p_klist; /* knotes attached to this process */ That seems right. > /*C?*/ LIST_HEAD(, mtx) p_heldmtx; /* for debugging code */ > /*CE?*/ struct mtx *p_blocked; /* Mutex process is blocked on */ > [depending on what this means ] E. > /*C*/ LIST_HEAD(, mtx) p_contested; /* contested locks */ Why not E? > /* End area that is zeroed on creation. */ > #define p_endzero p_startcopy > > /* The following fields are all copied upon creation in fork. */ > #define p_startcopy p_sigmask > > /*P?*/ sigset_t p_sigmask; /* Current signal mask. */ > /*C?*/ stack_t p_sigstk; /* sp & on stack state variable */ > [what is the scope of a signal?] > > /*??*/ int p_magic; /* Magic number. 
*/ > > [The fields below would be in the KSEG if the priority of all KSEs in a KSEG > were to be calculated at one time.] > > /*G*/ u_char p_priority; /* Process priority. */ > /*G*/ u_char p_usrpri; /* User-priority based on p_cpu and p_nice. */ > /*G*/ u_char p_nativepri; /* Priority before propogation. */ > /*G*/ char p_nice; /* Process "nice" value. */ > /*P*/ char p_comm[MAXCOMLEN+1]; > > /*P*/ struct pgrp *p_pgrp; /* Pointer to process group. */ > > /*P*/ struct sysentvec *p_sysent; /* System call dispatch information. */ > > /*G*/ struct rtprio p_rtprio; /* Realtime priority. */ > [priorities ar eper KSEG] > > /*P*/ struct prison *p_prison; > /*P*/ struct pargs *p_args; > [Either the whole Process is in gaol or it isn't] > > /* End area that is copied on creation. */ > #define p_endcopy p_addr > /*P?*/ struct user *p_addr; /* Kernel virtual addr of u-area (PROC ONLY). */ > [XXX Are there 'per KSE' filds there? (actually yes there are...the pcb is > there). The contents should be reevaluated. > /*C?*/ struct mdproc p_md; /* Any machine-dependent fields. */ > [there is a trapframe there. not sure what it;s used for] Trapframe? E. > /*P*/ u_short p_xstat; /* Exit status for wait; also stop signal. */ > /*P*/ u_short p_acflag; /* Accounting flags. */ > [these may be collected per KSE and harvested when needed] > /*P*/ struct rusage *p_ru; /* Exit information. XXX */ > > /*P*/ int p_nthreads; /* number of threads (only in leader) */ > [not sure how this is used... may become redundant] > > /*G?*/ void *p_aioinfo; /* ASYNC I/O info */ > [will aio be 'per KSE, per KSEG or per PROC?] Probably the same as signals, but I'd be inclined to say per proc, keeping in mind that the aio is a separate thread. > /*C*/ int p_wakeup; /* thread id */ > [will surely change] > /*P*/ struct proc *p_peers; > /*P*/ struct proc *p_leader; > /*C*/ struct pasleep p_asleep; /* Used by asleep()/await(). 
*/ > /*P*/ void *p_emuldata; /* process-specific emulator state data */ Should probably have another KSE-specific one, if needed. That is, planning ahead :) > /*C*/ struct ithd *p_ithd; /* for interrupt threads only */ > }; > > > > > Obviously before we can really finish this we need to decide, > what the scope of signals is.. Who gets externally genrated signals? > Who gets signals that are the result of an action (e.g. SIGIO, SIGPIPE)? > WHich signals are diverted when you allocate a signal stack? > In the same context, what is the scope of aio? > where are the results delivered? who is responsible for the > kernel threads that do the work? do we allocate a KSE to run them? etc.etc. > What is the scope of the timers and such? You can always be flexible enough to have a system call to set the behavior. > All this makes a difference in where the fields live.... > > Does anyone have comments? > (Everyone has been VERY quiet so far!!!) I'll be less quiet now, at least! > julian > > > -- > __--_|\ Julian Elischer > / \ julian@elischer.org > ( OZ ) World tour 2000 > ---> X_.---._/ presently in: Budapest > v -- Brian Fundakowski Feldman \ FreeBSD: The Power to Serve! 
/ green@FreeBSD.org `------------------------------' To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Nov 26 15:59:34 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mail.interware.hu (mail.interware.hu [195.70.32.130]) by hub.freebsd.org (Postfix) with ESMTP id 9606837B479; Sun, 26 Nov 2000 15:59:25 -0800 (PST) Received: from luanda-56.budapest.interware.hu ([195.70.51.56] helo=elischer.org) by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian)) id 140BhM-0001ld-00; Mon, 27 Nov 2000 00:59:09 +0100 Message-ID: <3A21A3C7.A836DE09@elischer.org> Date: Sun, 26 Nov 2000 15:59:03 -0800 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: "Brian F. Feldman" Cc: arch@FreeBSD.org, jasone@FreeBSD.org Subject: Re: Threads .. chopping up 'struct proc' References: <200011262239.eAQMd0576413@green.dyndns.org> Content-Type: text/plain; charset=iso-8859-15 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG "Brian F. Feldman" wrote: > > Julian Elischer wrote: > > I'v been looking a the proc srtucture.. > > > > The aim is to eventually move some of the fields into a > > struct KSE (struct schedbox?) > > struct KSEC (struct threadcontext?) > > struct KSEG (struct schedgroup?) > > Sounds about right, as far as I've been following the discussion (I read all > of -arch, but don't follow -smp at all since I just don't have SMP ;) > > My question thus far is, okay, given a proc has one of each; will a set of > threads, in any form, ALWAYS have a proc backing it up? It would make sense > as such, and in that case I'd think that you would reduce a lot of the > complexity in the switchover. > > > Initially we would simply include one of each of these in the struct proc, > > but link them together as if they were correctly connected up. 
> > we would use macros such as:
> > #define p_estcpu p_kse.kse_estcpu
> > to keep present code working....
> > eventually functions that get changed to receive a kse directly
> > would just use kse->kse_estcpu and if they need proc they
> > can use kse->kse_proc. But until then, we'd start by simply
> > separating the fields and using macros. Then we can convert
> > calls at our leisure.
>
> What would be the difference between doing it "right" for struct proc in the
> first place rather than dummying them up? I wouldn't want an artificial
> discrepancy here, if possible. Perhaps you could explain a bit more of the
> vision you have here? I haven't been able to pick that bit up from your
> posts as of yet. A KSE of just one thread would seem to logically be
> handled the exact same as a process.
>
> > However when going through the fields in struct proc,
> > some difficulties become obvious. Here's my initial
> > division of the fields. I've added a comment at the
> > beginning of each line that indicates where I think
> > it should go, however I'm not convinced about some of them:
> >
> > P = stays in struct proc
> > E = goes to 'KSE' struct (schedulable entity)
> > G = goes to 'group' struct
> > C = goes to 'sleepable Context' struct.
>
> Does each KSE get a sleepable context? I don't know if I really see where
> it fits in; sounds like it would have a 1:1 mapping with KSEs.
>

OK, I'm going to answer only this question here, as I'm off to school in the morning and it's 12:30 AM now, but you have a misconception, so I'll try to clear that up quickly.

A KSE doesn't have a stack. It doesn't have any state WRT system call execution. When a system call happens, control passes from userland to a waiting KSE that is presently assigned to the processor you are on and to your process. The KSE grabs a spare KSEC (KSE context), which it may already have sitting ready, and uses it.
The KSEC supplies a stack and storage for anything that describes the state of the processor at any moment during the syscall. When the system call blocks, the KSEC is left on the sleep queue, and the KSE grabs another one and performs an upcall to the Userland Thread Scheduler, which schedules another thread. When THAT thread does a system call, the system call is executed, storing a set of frames and state onto the stack in the NEW KSEC. If, in turn, that blocks, it too is thrown onto the sleep queue. Everything needed to complete the system calls is in the KSECs, which are hibernating on the sleep queues.

When the system call is reawakened, the kernel waits for a scheduling event in which a KSE from that process (possibly the same one) is being scheduled. It then reassociates the first KSEC (with its stack and stored processor context) with that KSE and completes the system call (including any copyout()s or copyin()s). However, instead of crossing back to user space when it gets back up to the boundary, it puts the syscall's return information in the mailbox that the thread system configured for that thread (I skipped that bit; don't worry, it's trivial), and checks if there are any more awakened syscalls to complete. It keeps doing this until there are no more awakening KSECs, at which time it does an upcall to the process. This results in the Userland Thread Scheduler (UTS) picking up all the completed threads, deciding which is the highest priority, and running it, as if it were just returning from the kernel.

I forgot to mention that the mailboxes for the completed threads are linked together by the kernel before doing the upcall, and the resulting list is passed as a single pointer to the UTS.

Note: the thread that was running when the KSE was pre-empted is also in the list of threads that is returned to the UTS when the upcall happens, so the UTS may decide to let it continue running.
It didn't voluntarily do a syscall, but it did cross to the kernel when the timer interrupt occurred, so it can be faked up to look the same. If it was in a critical region, then of course it should have marked that fact, so it would be scheduled first.

A process may have a KSE for each physical processor. When it creates a new KSE (up to the maximum of N) it sets up a KSE mailbox. When it schedules a thread, it places a pointer to the thread mailbox in the KSE mailbox. The KSE always knows where its mailbox is, so it can always find the thread mailbox of the thread that just made the system call. When the syscall blocks, that thread mailbox address is stored in the KSEC, and it is zeroed out of the KSE's mailbox. When an upcall happens, the KSE adds the linked list of all completed syscalls' mailboxes to that same KSE mailbox as well. The UTS just takes that list, adds the threads mentioned onto its lists of runnable threads, and then makes a scheduling decision and runs the highest priority thread. It sets the mailbox address of that thread into the KSE's mailbox, and jumps into the thread.. etc. etc.

I haven't mentioned KSEGs here, but if you are limited to N KSEs, you want a container into which you can put extra competing KSEs (for example, for a super high priority thread). Usually you just have one KSEG, but you may start another, in which case they are treated by the system much like two separate processes, each with its own KSEs.

more later.
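The mailbox handoff described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the structure and function names (thread_mailbox, kse_mailbox, kse_block, uts_upcall) are hypothetical, the message does not specify any layout, and treating a larger number as a higher priority is an arbitrary choice made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread mailbox; field names are illustrative only. */
struct thread_mailbox {
    int tm_priority;                 /* assumption: larger value = more urgent */
    long tm_retval;                  /* syscall return info left by the kernel */
    struct thread_mailbox *tm_next;  /* link for completed list and run queue */
};

/* Hypothetical per-KSE mailbox shared between the kernel and the UTS. */
struct kse_mailbox {
    struct thread_mailbox *km_curthread; /* thread now running on this KSE */
    struct thread_mailbox *km_completed; /* list handed over by an upcall */
};

/*
 * Kernel side, when a syscall blocks: return the current thread mailbox
 * (the caller would store it in the KSEC) and zero the KSE mailbox slot.
 */
struct thread_mailbox *
kse_block(struct kse_mailbox *km)
{
    struct thread_mailbox *tm;

    assert(km != NULL);
    tm = km->km_curthread;
    km->km_curthread = NULL;
    return tm;
}

/*
 * UTS side, on an upcall: move every completed thread onto the run queue,
 * unlink the highest priority runnable thread, and record it in the KSE
 * mailbox. Returns NULL when nothing is runnable.
 */
struct thread_mailbox *
uts_upcall(struct kse_mailbox *km, struct thread_mailbox **runq)
{
    struct thread_mailbox *tm;
    struct thread_mailbox **pp, **best = NULL;

    assert(km != NULL && runq != NULL);

    /* Drain the kernel-provided list into the run queue. */
    while (km->km_completed != NULL) {
        tm = km->km_completed;
        km->km_completed = tm->tm_next;
        tm->tm_next = *runq;
        *runq = tm;
    }

    /* Find the link that points at the highest priority thread. */
    for (pp = runq; *pp != NULL; pp = &(*pp)->tm_next)
        if (best == NULL || (*pp)->tm_priority > (*best)->tm_priority)
            best = pp;
    if (best == NULL)
        return NULL;

    /* Unlink it and publish it as the thread running on this KSE. */
    tm = *best;
    *best = tm->tm_next;
    tm->tm_next = NULL;
    km->km_curthread = tm;
    return tm;
}
```

A real UTS would then jump into the returned thread; here the function only returns it so the bookkeeping can be inspected.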
Julian -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000 ---> X_.---._/ presently in: Budapest v To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Nov 26 23:44:37 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mail-relay.eunet.no (mail-relay.eunet.no [193.71.71.242]) by hub.freebsd.org (Postfix) with ESMTP id B268E37B4C5; Sun, 26 Nov 2000 23:44:34 -0800 (PST) Received: from login-1.eunet.no (login-1.eunet.no [193.75.110.2]) by mail-relay.eunet.no (8.9.3/8.9.3/GN) with ESMTP id IAA12864; Mon, 27 Nov 2000 08:43:57 +0100 (CET) (envelope-from mbendiks@eunet.no) Received: from localhost (mbendiks@localhost) by login-1.eunet.no (8.9.3/8.8.8) with ESMTP id IAA54208; Mon, 27 Nov 2000 08:43:57 +0100 (CET) (envelope-from mbendiks@eunet.no) X-Authentication-Warning: login-1.eunet.no: mbendiks owned process doing -bs Date: Mon, 27 Nov 2000 08:43:57 +0100 (CET) From: Marius Bendiksen To: John Baldwin Cc: Jake Burkholder , Daniel Eischen , arch@FreeBSD.org, Jonathan Lemon Subject: Re: Thread-specific data and KSEs In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Just a short question. As I recall, the Wine people had a lot of difficulty with FreeBSD due to our abuse of the %fs register. Wouldn't using %gs as well just aggravate this problem? Besides, as I recall, the process could likely be obtained from the tss number, which can be retrieved with str. And additional data could actually be stuck in the tss. 
Marius

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Sun Nov 26 23:49:27 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from mail-relay.eunet.no (mail-relay.eunet.no [193.71.71.242]) by hub.freebsd.org (Postfix) with ESMTP id D0AC037B479; Sun, 26 Nov 2000 23:49:24 -0800 (PST)
Received: from login-1.eunet.no (login-1.eunet.no [193.75.110.2]) by mail-relay.eunet.no (8.9.3/8.9.3/GN) with ESMTP id IAA14456; Mon, 27 Nov 2000 08:49:23 +0100 (CET) (envelope-from mbendiks@eunet.no)
Received: from localhost (mbendiks@localhost) by login-1.eunet.no (8.9.3/8.8.8) with ESMTP id IAA54244; Mon, 27 Nov 2000 08:49:23 +0100 (CET) (envelope-from mbendiks@eunet.no)
X-Authentication-Warning: login-1.eunet.no: mbendiks owned process doing -bs
Date: Mon, 27 Nov 2000 08:49:23 +0100 (CET)
From: Marius Bendiksen
To: Alfred Perlstein
Cc: Daniel Eischen, John Baldwin, Jonathan Lemon, arch@FreeBSD.ORG
Subject: Re: Thread-specific data and KSEs
In-Reply-To: <20001121192331.E18037@fw.wintelcom.net>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > It's just one more register that has to be saved. I don't
> > think it's going to matter much.
> No extra TLB faults/invalidations? Aren't segment registers
> somewhat expensive to load?

Upon loading a task state (with ltr or a gate), you will restore all segment registers from the tss, regardless of their content, and a load of the shadow portion of the segment will be attempted anyway. I don't think this is the right place to shave off cycles, nor do I think the speed is even the most relevant issue for this extension, but rather the abuse of segments that are meant to hold real data.
Marius To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Nov 27 10:33: 3 2000 Delivered-To: freebsd-arch@freebsd.org Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222]) by hub.freebsd.org (Postfix) with ESMTP id 9F12B37B479; Mon, 27 Nov 2000 10:32:53 -0800 (PST) Received: from laptop.baldwin.cx (john@jhb-laptop.osd.bsdi.com [204.216.28.241]) by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id eARIWpC39794; Mon, 27 Nov 2000 10:32:51 -0800 (PST) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.4.0 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <20001126210014.D824BBA7A@io.yi.org> Date: Mon, 27 Nov 2000 10:33:04 -0800 (PST) From: John Baldwin To: Jake Burkholder Subject: RE: review: callout patch Cc: smp@FreeBSD.org, arch@FreeBSD.org Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On 26-Nov-00 Jake Burkholder wrote: > > This patch makes most of sys/kern/* sources use callout_reset for > registering callouts rather than timeout(9). This should greatly > reduce the use of the fixed size callfree allocator pool. Currently > we panic when it runs out. > > This was motivated by NetBSD, who have completely removed timeout(9) > from their kernel. > > Please review it. Looks good to me. :) Having a callout.9 manpage to go along with it would be nice as well. :) -- John Baldwin -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" 
- http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Nov 27 10:53:58 2000 Delivered-To: freebsd-arch@freebsd.org Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222]) by hub.freebsd.org (Postfix) with ESMTP id 1277737B4E5 for ; Mon, 27 Nov 2000 10:53:45 -0800 (PST) Received: from laptop.baldwin.cx (john@jhb-laptop.osd.bsdi.com [204.216.28.241]) by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id eARIr3C40719; Mon, 27 Nov 2000 10:53:03 -0800 (PST) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.4.0 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: Date: Mon, 27 Nov 2000 10:53:16 -0800 (PST) From: John Baldwin To: Marius Bendiksen Subject: Re: Thread-specific data and KSEs Cc: arch@FreeBSD.org, Jonathan Lemon , Daniel Eischen , Alfred Perlstein Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On 27-Nov-00 Marius Bendiksen wrote: >> > It's just one more register that has to be saved. I don't >> > think it's going to matter much. >> No extra TLB faults/invalidations? Aren't segment registers >> somewhat expensive to load? > > Upon loading a task state (with ltr or a gate), you will restore all > segment registers from the tss, regardless of their content, and a load of > the shadow portion of the segment will be attempted anyway. I don't think > this is the right place to shave off cycles, nor do I think the speed is > even the most relevant issue for this extension, but rather the abuse of > segments that are ment to hold real data. Erm, we don't use task gates or a TSS for our task switches. Go look at cpu_switch() in sys/i386/i386/swtch.s. %fs and %gs are intended to be used for per-CPU data and thread-local storage, which is why x86-64 keeps them around even after axeing %cs, %ds, %es, and %ss. 
> Marius -- John Baldwin -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Nov 27 11:26: 4 2000 Delivered-To: freebsd-arch@freebsd.org Received: from io.yi.org (unknown [24.70.218.157]) by hub.freebsd.org (Postfix) with ESMTP id C8F1337B479; Mon, 27 Nov 2000 11:25:59 -0800 (PST) Received: from io.yi.org (localhost.gvcl1.bc.wave.home.com [127.0.0.1]) by io.yi.org (Postfix) with ESMTP id 897B3BA7A; Mon, 27 Nov 2000 11:25:58 -0800 (PST) X-Mailer: exmh version 2.1.1 10/15/1999 To: John Baldwin Cc: smp@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: review: callout patch In-Reply-To: Message from John Baldwin of "Mon, 27 Nov 2000 10:33:04 PST." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Mon, 27 Nov 2000 11:25:58 -0800 From: Jake Burkholder Message-Id: <20001127192558.897B3BA7A@io.yi.org> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > On 26-Nov-00 Jake Burkholder wrote: > > > > This patch makes most of sys/kern/* sources use callout_reset for > > registering callouts rather than timeout(9). This should greatly > > reduce the use of the fixed size callfree allocator pool. Currently > > we panic when it runs out. > > > > This was motivated by NetBSD, who have completely removed timeout(9) > > from their kernel. > > > > Please review it. > > Looks good to me. :) > > Having a callout.9 manpage to go along with it would be nice as well. :) timeout.9 exists, its just not linked. > > -- > > John Baldwin -- http://www.FreeBSD.org/~jhb/ > PGP Key: http://www.baldwin.cx/~john/pgpkey.asc > "Power Users Use the Power to Serve!" 
- http://www.FreeBSD.org/ > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-arch" in the body of the message To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Nov 27 11:34:54 2000 Delivered-To: freebsd-arch@freebsd.org Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222]) by hub.freebsd.org (Postfix) with ESMTP id 64A9137B479; Mon, 27 Nov 2000 11:34:51 -0800 (PST) Received: from laptop.baldwin.cx (john@jhb-laptop.osd.bsdi.com [204.216.28.241]) by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id eARJYnC43140; Mon, 27 Nov 2000 11:34:49 -0800 (PST) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.4.0 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <20001127192558.897B3BA7A@io.yi.org> Date: Mon, 27 Nov 2000 11:35:02 -0800 (PST) From: John Baldwin To: Jake Burkholder Subject: Re: review: callout patch Cc: arch@FreeBSD.org, smp@FreeBSD.org Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On 27-Nov-00 Jake Burkholder wrote: >> >> On 26-Nov-00 Jake Burkholder wrote: >> > >> > This patch makes most of sys/kern/* sources use callout_reset for >> > registering callouts rather than timeout(9). This should greatly >> > reduce the use of the fixed size callfree allocator pool. Currently >> > we panic when it runs out. >> > >> > This was motivated by NetBSD, who have completely removed timeout(9) >> > from their kernel. >> > >> > Please review it. >> >> Looks good to me. :) >> >> Having a callout.9 manpage to go along with it would be nice as well. :) > > timeout.9 exists, its just not linked. Ah, I had thought timeout(9) didn't document those. Well, then updating timeout(9) and adding appropriate MLINK's would be cool. 
:) -- John Baldwin -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Nov 27 14: 8:29 2000 Delivered-To: freebsd-arch@freebsd.org Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [18.24.4.193]) by hub.freebsd.org (Postfix) with ESMTP id D557B37B4C5 for ; Mon, 27 Nov 2000 14:08:27 -0800 (PST) Received: (from wollman@localhost) by khavrinen.lcs.mit.edu (8.9.3/8.9.3) id RAA97482; Mon, 27 Nov 2000 17:08:16 -0500 (EST) (envelope-from wollman) Date: Mon, 27 Nov 2000 17:08:16 -0500 (EST) From: Garrett Wollman Message-Id: <200011272208.RAA97482@khavrinen.lcs.mit.edu> To: jburkhol@home.com Cc: arch@freebsd.org Subject: Re: review: callout patch X-Newsgroups: mit.lcs.mail.freebsd-arch In-Reply-To: Organization: MIT Laboratory for Computer Science Cc: Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In article you write: >This should greatly reduce the use of the fixed size callfree >allocator pool. Keep in mind that the size of the callout wheel is currently based on the number of pre-allocated callout structures there are. This needs to be revisited now that the number is effectively unlimited. Some instrumentation would be very helpful. -GAWollman -- Garrett A. 
Wollman | O Siem / We are all family / O Siem / We're all the same wollman@lcs.mit.edu | O Siem / The fires of freedom Opinions not those of| Dance in the burning flame MIT, LCS, CRS, or NSA| - Susan Aglukark and Chad Irschick To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Nov 27 14: 9: 4 2000 Delivered-To: freebsd-arch@freebsd.org Received: from winston.osd.bsdi.com (winston.osd.bsdi.com [204.216.27.229]) by hub.freebsd.org (Postfix) with ESMTP id AFFDC37B4C5 for ; Mon, 27 Nov 2000 14:08:48 -0800 (PST) Received: from winston.osd.bsdi.com (localhost [127.0.0.1]) by winston.osd.bsdi.com (8.11.1/8.11.1) with ESMTP id eARM8jh52696; Mon, 27 Nov 2000 14:08:45 -0800 (PST) (envelope-from jkh@winston.osd.bsdi.com) To: arch@freebsd.org Cc: rps@merlin.mat.uc.pt Subject: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) Date: Mon, 27 Nov 2000 14:08:45 -0800 Message-ID: <52694.975362925@winston.osd.bsdi.com> From: Jordan Hubbard Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG I just received this today and am kind of scratching my head over it. On one hand, creating an "alias" for a one specific piece of terminal character mapping seems a hack; I can see the idea behind wanting to use one of n characters for something like backspacing or line-killing (^U or ^X for example) and would not frown (as much) on a more general aliasing feature. On the other hand, I can see that this specific case (erase) is by far the most significant. Which is why I'm forwarding this to arch - this is one of those classic architecture/feature trade-off decisions and I would like to hear more opinions before deciding which way I'd like to respond to this. 
- Jordan ------- Forwarded Message Return-Path: rps@merlin.mat.uc.pt Delivery-Date: Mon Nov 27 12:02:08 2000 Return-Path: Received: from merlin.mat.uc.pt (merlin-f.mat.uc.pt [193.137.206.2]) by winston.osd.bsdi.com (8.11.1/8.11.1) with ESMTP id eARK25h52306 for ; Mon, 27 Nov 2000 12:02:05 -0800 (PST) (envelope-from rps@merlin.mat.uc.pt) Received: (from rps@localhost) by merlin.mat.uc.pt (8.9.3/8.9.0) id UAA06153; Mon, 27 Nov 2000 20:01:52 GMT Message-ID: <20001127200149.05857@merlin.mat.uc.pt> Date: Mon, 27 Nov 2000 20:01:49 +0000 From: Rui Pedro Mendes Salgueiro To: Jordan Hubbard Subject: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) References: <20001122191141.50422@merlin.mat.uc.pt> <80298.974921931@winston.osd.bsdi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.89.1i In-Reply-To: <80298.974921931@winston.osd.bsdi.com>; from Jordan Hubbard on Wed, Nov 22, 2000 at 11:38:51AM -0800 I am not sure if you are the proper person to send this: One thing that has bothered me since a long time ago is the two confliting standards for the erase character: ^H (backspace) and ^? (del). Years ago I used a Convex (mini-super-)computer which solved the problem in an elegant way. stty(1) had an extra option for a erase2 character. So you could have both usual erase chars working simultaneously. Then, around 1993, I reimplemented that in an early version of BSDI (1.0?). At that time, I also tried other tricks, like a flag to replace each ^H with an ^? and the reverse, but those interfered with Emacs (which uses ^H). The next BSDI version lacked the necessary kernel source file due to the ATT lawsuit, so I could not reimplement it. Much later I started to reimplement it on FreeBSD, but this is the first time I managed to get a new release (4.2) before it is obsolete. 
dingo# uname -a FreeBSD dingo.mat.uc.pt 4.2-RELEASE FreeBSD 4.2-RELEASE #0: Mon Nov 27 13:35:57 WET 2000 rps@dingo.mat.uc.pt:/usr/src/sys/compile/GENERIC i386 dingo# stty -a [...] cchars: discard = ^O; dsusp = ^Y; eof = ^D; eol = ; eol2 = ; erase = ^?; erase2 = ^H; intr = ^C; kill = ^U; [...] The needed patches are simple: 1 - use a spare slot in the c_cc[] character array. This effects the header files "termios.h". 2 - define the default char for it in ttydefaults.h . Also, include it in the ttydefchars array. (these files are in /usr/src/sys/sys and /usr/include/sys/ ) 3 - the file tty.c in the kernel (/usr/src/sys/kern) is the one that does the real work. The modification there is just adding an OR to the relevant "if". 4 - modify stty(1) (/usr/src/bin/stty/cchar.c) so it knows about erase2. It is just needed to add the a line to the initialization of cchars1. 5 - document it in the man page (/usr/src/bin/stty/stty.1 ). Patch follows (paths are realtive to /usr/src ) *** ./bin/stty/cchar.c.orig Sat Aug 28 00:15:40 1999 - --- ./bin/stty/cchar.c Mon Nov 27 13:11:33 2000 *************** *** 64,69 **** - --- 64,70 ---- { "eol", VEOL, CEOL }, { "eol2", VEOL2, CEOL }, { "erase", VERASE, CERASE }, + { "erase2", VERASE2, CERASE2 }, { "intr", VINTR, CINTR }, { "kill", VKILL, CKILL }, { "lnext", VLNEXT, CLNEXT }, *** ./bin/stty/stty.1.orig Wed Mar 1 10:43:07 2000 - --- ./bin/stty/stty.1 Mon Nov 27 13:20:29 2000 *************** *** 374,379 **** - --- 374,380 ---- .It eol Ta Tn VEOL EOL No character .It eol2 Ta Tn VEOL2 EOL2 No character .It erase Ta Tn VERASE ERASE No character + .It erase2 Ta Tn VERASE2 ERASE2 No character .It werase Ta Tn VWERASE WERASE No character .It intr Ta Tn VINTR INTR No character .It kill Ta Tn VKILL KILL No character *************** *** 420,426 **** -nl unsets inlcr and igncr. .It Cm ek Reset ! .Dv ERASE and .Dv KILL characters - --- 421,428 ---- -nl unsets inlcr and igncr. .It Cm ek Reset ! .Dv ERASE , ! 
.Dv ERASE2 , and .Dv KILL characters *** ./sys/kern/tty.c.orig Thu Aug 3 01:09:33 2000 - --- ./sys/kern/tty.c Mon Nov 27 13:26:44 2000 *************** *** 452,460 **** * processing takes place. */ /* ! * erase (^H / ^?) */ ! if (CCEQ(cc[VERASE], c)) { if (tp->t_rawq.c_cc) ttyrub(unputc(&tp->t_rawq), tp); goto endcase; - --- 452,460 ---- * processing takes place. */ /* ! * erase or erase2 (^H / ^?) */ ! if (CCEQ(cc[VERASE], c) || CCEQ(cc[VERASE2], c) ) { if (tp->t_rawq.c_cc) ttyrub(unputc(&tp->t_rawq), tp); goto endcase; *************** *** 2003,2010 **** (void)ttyoutput('\\', tp); } ttyecho(c, tp); ! } else ttyecho(tp->t_cc[VERASE], tp); --tp->t_rocount; } - --- 2003,2019 ---- (void)ttyoutput('\\', tp); } ttyecho(c, tp); ! } else { ttyecho(tp->t_cc[VERASE], tp); + /* + * This code may be executed not only when an ERASE key + * is pressed, but also when ^U (KILL) or ^W (WERASE) are. + * So, I didn't think it was worthwhile to pass the extra + * information (which would need an extra parameter, + * changing every call) needed to distinguish the ERASE2 + * case from the ERASE. + */ + } --tp->t_rocount; } *** ./sys/sys/termios.h.orig Wed Dec 29 04:24:48 1999 - --- ./sys/sys/termios.h Mon Nov 27 13:06:35 2000 *************** *** 56,63 **** #define VKILL 5 /* ICANON */ #ifndef _POSIX_SOURCE #define VREPRINT 6 /* ICANON together with IEXTEN */ #endif ! /* 7 spare 1 */ #define VINTR 8 /* ISIG */ #define VQUIT 9 /* ISIG */ #define VSUSP 10 /* ISIG */ - --- 56,64 ---- #define VKILL 5 /* ICANON */ #ifndef _POSIX_SOURCE #define VREPRINT 6 /* ICANON together with IEXTEN */ + #define VERASE2 7 /* ICANON */ #endif ! 
/* 7 ex-spare 1 */ #define VINTR 8 /* ISIG */ #define VQUIT 9 /* ISIG */ #define VSUSP 10 /* ISIG */ *** ./sys/sys/ttydefaults.h.orig Sat Aug 28 01:52:07 1999 - --- ./sys/sys/ttydefaults.h Mon Nov 27 13:09:13 2000 *************** *** 61,66 **** - --- 61,67 ---- #define CEOF CTRL('d') #define CEOL 0xff /* XXX avoid _POSIX_VDISABLE */ #define CERASE 0177 + #define CERASE2 CTRL('h') #define CINTR CTRL('c') #define CSTATUS CTRL('t') #define CKILL CTRL('u') *************** *** 90,96 **** #ifdef TTYDEFCHARS static cc_t ttydefchars[NCCS] = { CEOF, CEOL, CEOL, CERASE, CWERASE, CKILL, CREPRINT, ! _POSIX_VDISABLE, CINTR, CQUIT, CSUSP, CDSUSP, CSTART, CSTOP, CLNEXT, CDISCARD, CMIN, CTIME, CSTATUS, _POSIX_VDISABLE }; #undef TTYDEFCHARS - --- 91,97 ---- #ifdef TTYDEFCHARS static cc_t ttydefchars[NCCS] = { CEOF, CEOL, CEOL, CERASE, CWERASE, CKILL, CREPRINT, ! CERASE2, CINTR, CQUIT, CSUSP, CDSUSP, CSTART, CSTOP, CLNEXT, CDISCARD, CMIN, CTIME, CSTATUS, _POSIX_VDISABLE }; #undef TTYDEFCHARS ------- End of Forwarded Message To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Nov 27 14:39:55 2000 Delivered-To: freebsd-arch@freebsd.org Received: from post.mail.nl.demon.net (post-10.mail.nl.demon.net [194.159.73.20]) by hub.freebsd.org (Postfix) with ESMTP id 096F937B4C5 for ; Mon, 27 Nov 2000 14:39:53 -0800 (PST) Received: from [212.238.54.101] (helo=freebie.demon.nl) by post.mail.nl.demon.net with smtp (Exim 3.14 #2) id 140WwB-0002ym-00; Mon, 27 Nov 2000 22:39:51 +0000 Received: (from wkb@localhost) by freebie.demon.nl (8.11.1/8.11.0) id eARMd2R02442; Mon, 27 Nov 2000 23:39:02 +0100 (CET) (envelope-from wkb) Date: Mon, 27 Nov 2000 23:39:02 +0100 From: Wilko Bulte To: Jordan Hubbard Cc: arch@freebsd.org, rps@merlin.mat.uc.pt Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) 
Message-ID: <20001127233902.C2402@freebie.demon.nl> References: <52694.975362925@winston.osd.bsdi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2i In-Reply-To: <52694.975362925@winston.osd.bsdi.com>; from jkh@winston.osd.bsdi.com on Mon, Nov 27, 2000 at 02:08:45PM -0800 X-OS: FreeBSD 4.2-RELEASE X-PGP: finger wilko@freebsd.org Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Mon, Nov 27, 2000 at 02:08:45PM -0800, Jordan Hubbard wrote: If you ever used DEC or Sun terminals/keyboards you will like this idea. I, for one, like it ;-) As for technical elegance.. W/ > I just received this today and am kind of scratching my head over it. > On one hand, creating an "alias" for a one specific piece of terminal > character mapping seems a hack; I can see the idea behind wanting to > use one of n characters for something like backspacing or line-killing > (^U or ^X for example) and would not frown (as much) on a more general > aliasing feature. On the other hand, I can see that this specific > case (erase) is by far the most significant. Which is why I'm > forwarding this to arch - this is one of those classic > architecture/feature trade-off decisions and I would like to hear more > opinions before deciding which way I'd like to respond to this. 
> > - Jordan > > ------- Forwarded Message > > Return-Path: rps@merlin.mat.uc.pt > Delivery-Date: Mon Nov 27 12:02:08 2000 > Return-Path: > Received: from merlin.mat.uc.pt (merlin-f.mat.uc.pt [193.137.206.2]) > by winston.osd.bsdi.com (8.11.1/8.11.1) with ESMTP id eARK25h52306 > for ; Mon, 27 Nov 2000 12:02:05 -0800 (PST) > (envelope-from rps@merlin.mat.uc.pt) > Received: (from rps@localhost) > by merlin.mat.uc.pt (8.9.3/8.9.0) id UAA06153; > Mon, 27 Nov 2000 20:01:52 GMT > Message-ID: <20001127200149.05857@merlin.mat.uc.pt> > Date: Mon, 27 Nov 2000 20:01:49 +0000 > From: Rui Pedro Mendes Salgueiro > To: Jordan Hubbard > Subject: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) > References: <20001122191141.50422@merlin.mat.uc.pt> <80298.974921931@winston.osd.bsdi.com> > Mime-Version: 1.0 > Content-Type: text/plain; charset=us-ascii > X-Mailer: Mutt 0.89.1i > In-Reply-To: <80298.974921931@winston.osd.bsdi.com>; from Jordan Hubbard on Wed, Nov 22, 2000 at 11:38:51AM -0800 > > I am not sure if you are the proper person to send this: > > One thing that has bothered me since a long time ago is the two confliting > standards for the erase character: ^H (backspace) and ^? (del). > > Years ago I used a Convex (mini-super-)computer which solved the problem > in an elegant way. stty(1) had an extra option for a erase2 character. > So you could have both usual erase chars working simultaneously. ... 
-- Wilko Bulte Arnhem, the Netherlands wilko@freebsd.org http://www.freebsd.org http://www.nlfug.nl

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Mon Nov 27 14:41:10 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from beastie.mckusick.com (beastie.mckusick.com [209.31.233.184]) by hub.freebsd.org (Postfix) with ESMTP id 36A9837B4C5 for ; Mon, 27 Nov 2000 14:41:08 -0800 (PST)
Received: from beastie.mckusick.com (localhost [127.0.0.1]) by beastie.mckusick.com (8.9.3/8.9.3) with ESMTP id OAA93364; Mon, 27 Nov 2000 14:41:05 -0800 (PST) (envelope-from mckusick@beastie.mckusick.com)
Message-Id: <200011272241.OAA93364@beastie.mckusick.com>
To: Jordan Hubbard
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.)
Cc: arch@FreeBSD.ORG
In-Reply-To: Your message of "Mon, 27 Nov 2000 14:08:45 PST." <52694.975362925@winston.osd.bsdi.com>
Date: Mon, 27 Nov 2000 14:41:05 -0800
From: Kirk McKusick
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

When we first implemented termios at CSRG, we had an erase2 character. Mike Karels was vehemently opposed to it, and insisted that it be deleted before we did our next release (4.3-tahoe if I remember correctly). I am of the opinion that it is a good idea, and should be there. I do not believe that we need/want a general aliasing facility, as erase is really the only character for which there is widespread disagreement over which character to use. So, my take would be to add erase2 and be done with it.
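To make the erase2 idea concrete, here is a minimal userland sketch in C of canonical-mode line editing with two erase characters treated alike, mirroring the "add an OR to the relevant if" change in the proposed tty.c patch. It is a toy model, not kernel code: the names toy_canon, TOY_ERASE and TOY_ERASE2 are made up for the illustration, and only the default values (0177 and ^H) are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ERASE  0x7f  /* ^? (DEL), the CERASE default (0177) in the patch */
#define TOY_ERASE2 0x08  /* ^H, the proposed CERASE2 default */

/*
 * Toy canonical-mode editor: copy input bytes to out, letting either
 * erase character remove the previously typed byte. The result is
 * NUL-terminated when outcap > 0. Returns the final line length.
 */
size_t
toy_canon(const char *in, size_t inlen, char *out, size_t outcap)
{
    size_t i, n = 0;

    assert(in != NULL || inlen == 0);
    assert(out != NULL || outcap == 0);

    for (i = 0; i < inlen; i++) {
        unsigned char c = (unsigned char)in[i];

        /* The patch's idea: ERASE or ERASE2 both rub out one character. */
        if (c == TOY_ERASE || c == TOY_ERASE2) {
            if (n > 0)
                n--;
            continue;
        }
        /* Keep one byte free for the terminating NUL. */
        if (n + 1 < outcap)
            out[n++] = (char)c;
    }
    if (outcap > 0)
        out[n] = '\0';
    return n;
}
```

With this model, typing "cat", then DEL, then ^H, then "r" leaves the line "cr", whichever of the two keys the terminal happens to send for backspace.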
Kirk McKusick To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Nov 27 14:47:28 2000 Delivered-To: freebsd-arch@freebsd.org Received: from citusc17.usc.edu (citusc17.usc.edu [128.125.38.177]) by hub.freebsd.org (Postfix) with ESMTP id ED37737B479 for ; Mon, 27 Nov 2000 14:47:26 -0800 (PST) Received: (from kris@localhost) by citusc17.usc.edu (8.11.1/8.11.1) id eARMm9c67449; Mon, 27 Nov 2000 14:48:10 -0800 (PST) (envelope-from kris) Date: Mon, 27 Nov 2000 14:48:09 -0800 From: Kris Kennaway To: Jordan Hubbard Cc: arch@FreeBSD.ORG, rps@merlin.mat.uc.pt Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) Message-ID: <20001127144809.A67395@citusc17.usc.edu> References: <52694.975362925@winston.osd.bsdi.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-md5; protocol="application/pgp-signature"; boundary="W/nzBZO5zC0uMSeA" Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <52694.975362925@winston.osd.bsdi.com>; from jkh@winston.osd.bsdi.com on Mon, Nov 27, 2000 at 02:08:45PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG --W/nzBZO5zC0uMSeA Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Mon, Nov 27, 2000 at 02:08:45PM -0800, Jordan Hubbard wrote: > I just received this today and am kind of scratching my head over it. > On one hand, creating an "alias" for a one specific piece of terminal > character mapping seems a hack; I can see the idea behind wanting to > use one of n characters for something like backspacing or line-killing > (^U or ^X for example) and would not frown (as much) on a more general > aliasing feature. On the other hand, I can see that this specific > case (erase) is by far the most significant. 
Which is why I'm > forwarding this to arch - this is one of those classic > architecture/feature trade-off decisions and I would like to hear more > opinions before deciding which way I'd like to respond to this. This is a very common newbie problem ("Stupid FreeBSD won't let me delete what I've typed, it just prints ^H!"). Commit please! :) Kris --W/nzBZO5zC0uMSeA Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.4 (FreeBSD) Comment: For info see http://www.gnupg.org iEYEARECAAYFAjoi5KgACgkQWry0BWjoQKXoTgCeNn+hADhsnoOrYTlphOsB0wAu wKsAoKL4inb6IXesYokZf40t2h/G0qAB =/mtv -----END PGP SIGNATURE----- --W/nzBZO5zC0uMSeA-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Nov 27 15: 7:25 2000 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (placeholder-dcat-1076843399.broadbandoffice.net [64.47.83.135]) by hub.freebsd.org (Postfix) with ESMTP id 78D7437B479; Mon, 27 Nov 2000 15:07:23 -0800 (PST) Received: (from dillon@localhost) by earth.backplane.com (8.11.1/8.9.3) id eARN7Ln34886; Mon, 27 Nov 2000 15:07:21 -0800 (PST) (envelope-from dillon) Date: Mon, 27 Nov 2000 15:07:21 -0800 (PST) From: Matt Dillon Message-Id: <200011272307.eARN7Ln34886@earth.backplane.com> To: Kris Kennaway Cc: Jordan Hubbard , arch@FreeBSD.ORG, rps@merlin.mat.uc.pt Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) References: <52694.975362925@winston.osd.bsdi.com> <20001127144809.A67395@citusc17.usc.edu> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG :> I just received this today and am kind of scratching my head over it. 
:> On one hand, creating an "alias" for a one specific piece of terminal :> character mapping seems a hack; I can see the idea behind wanting to :> use one of n characters for something like backspacing or line-killing :> (^U or ^X for example) and would not frown (as much) on a more general :> aliasing feature. On the other hand, I can see that this specific :> case (erase) is by far the most significant. Which is why I'm :> forwarding this to arch - this is one of those classic :> architecture/feature trade-off decisions and I would like to hear more :> opinions before deciding which way I'd like to respond to this. : :This is a very common newbie problem ("Stupid FreeBSD won't let me :delete what I've typed, it just prints ^H!"). Commit please! :) : :Kris This is one of those things where, 10 years ago, I would probably have been a purist and been opposed to it. But after 15+ years of pure hell having to deal with every conceivable combination of ^H and ^?, terminal types, telnet, rlogin, ssh, and so on and so forth... I say to hell with the purist view on this one. I'd love to see this committed! -Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Nov 27 15:12:18 2000 Delivered-To: freebsd-arch@freebsd.org Received: from dan.emsphone.com (dan.emsphone.com [199.67.51.101]) by hub.freebsd.org (Postfix) with ESMTP id 5E79D37B479 for ; Mon, 27 Nov 2000 15:12:16 -0800 (PST) Received: (from dan@localhost) by dan.emsphone.com (8.11.1/8.11.1) id eARNC5w15510; Mon, 27 Nov 2000 17:12:05 -0600 (CST) (envelope-from dan) Date: Mon, 27 Nov 2000 17:12:05 -0600 From: Dan Nelson To: Wilko Bulte Cc: Jordan Hubbard , arch@FreeBSD.ORG, rps@merlin.mat.uc.pt Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) 
Message-ID: <20001127171205.B22109@dan.emsphone.com> References: <52694.975362925@winston.osd.bsdi.com> <20001127233902.C2402@freebie.demon.nl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.12i In-Reply-To: <20001127233902.C2402@freebie.demon.nl>; from "Wilko Bulte" on Mon Nov 27 23:39:02 GMT 2000 X-OS: FreeBSD 5.0-CURRENT Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In the last episode (Nov 27), Wilko Bulte said: > If you ever used DEC or Sun terminals/keyboards you will like this > idea. I, for one, like it ;-) As for technical elegance.. There's precedent; we've already got "eol" and "eol2", both of which seem to default to undefined :) -- Dan Nelson dnelson@emsphone.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Nov 27 15:40:11 2000 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id 5AC8A37B479 for ; Mon, 27 Nov 2000 15:40:09 -0800 (PST) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id SAA21890; Mon, 27 Nov 2000 18:39:45 -0500 (EST) Date: Mon, 27 Nov 2000 18:39:45 -0500 (EST) From: Daniel Eischen To: Matt Dillon Cc: Jordan Hubbard , arch@FreeBSD.ORG Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) In-Reply-To: <200011272307.eARN7Ln34886@earth.backplane.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Mon, 27 Nov 2000, Matt Dillon wrote: > This is one of those things where, 10 years ago, I would probably > have been a purist and been opposed to it. 
>
> But after 15+ years of pure hell having to deal with every
> conceivable combination of ^H and ^?, terminal types,
> telnet, rlogin, ssh, and so on and so forth... I say to
> hell with the purist view on this one. I'd love to
> see this committed!

I agree! Commit this now before there are any objections!

--
Dan Eischen

From owner-freebsd-arch Mon Nov 27 15:49: 4 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from magnesium.net (toxic.magnesium.net [207.154.84.15])
    by hub.freebsd.org (Postfix) with SMTP id A9B3F37B479
    for ; Mon, 27 Nov 2000 15:48:54 -0800 (PST)
Received: (qmail 94023 invoked by uid 1142); 27 Nov 2000 23:48:53 -0000
Date: 27 Nov 2000 15:48:53 -0800
Date: Mon, 27 Nov 2000 14:30:58 -0800
From: Jason Evans
To: Julian Elischer
Cc: arch@freebsd.org
Subject: Re: Threads (KSE etc) comments
Message-ID: <20001127143058.L4140@canonware.com>
References: <3A15A2C1.1F3FB6CD@elischer.org> <3A192821.13463950@elischer.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3A192821.13463950@elischer.org>; from julian@elischer.org on Mon, Nov 20, 2000 at 05:33:21AM -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Mon, Nov 20, 2000 at 05:33:21AM -0800, Julian Elischer wrote:
> I've been thinking about the scheduling queues, and how to make sure
> that the process (KSEG actually) acts fairly with respect to other
> processes. I was confused for a while by your description. I think part
> of my confusion came from something that we specified in the meeting but
> has not been written in your document directly. Let me see if we are
> agreed on what we decided..
>
> A KSEG can only have as a maximum N KSEs associated with it, where N is
> the number of processors, (unless artificially reduced by a lower
> concurrency declaration).
> (you said this but only indirectly).

There's no particular reason that we need to enforce a limit on the number
of KSEs within a KSEG (aside from resource limits), but in practice,
there's no reason that a program would want to create more KSEs within a
KSEG than there are processors.

> In
> general, KSEs are each assigned to a processor. They do not in general
> move between processors unless some explicit adjustment is being
> made(*), and as a general rule, two KSEs will not be assigned to the
> same processor. (in some transitional moments this may be allowed to
> briefly happen) Thus, in general, if you run a KSEC on the same KSE it
> was run on last time, you should be on the same processor,
> (and get any affinity advantages that might exist).

KSEs need to be able to float between processors in order to make use of
all the processors if there are fewer KSEs in a KSEG than there are
processors (in other words, KSEG concurrency less than the number of
processors). In general practice, KSEs will tend to stay on the same
processor, but CPU load balancing may cause KSEs to migrate from time to
time.

> (*)I am inclined to make the requirement of binding KSEs to processors
> HARD, as this allows us to simplify some later decisions.

I wanted the binding to be soft, in order to simplify things. =)

> For example, if
> we hard bind KSEs to processors then since we assign a different
> communications mailbox for each KSE we create, we can be sure that
> different KSEs will never preempt each other when writing out to their
> mailboxes. This also means that since there can only be one UTS
> incarnation active per KSE (or one KSE per UTS incarnation), that we can
> not have a UTS preempted by another incarnation on the same processor.
> We can therefore make sure that there needs to be no locking on
> mailboxes, or even any checking.

The case where a KSE is preempted, only to be replaced by another KSE
within the same KSEG has no real meaning, and I expect we'd specifically
write the scheduler to avoid ever doing that.

> I think this is what we decided.. is this correct? The binding is not
> really mentioned in your document.

I made a number of minor changes to the design after our discussions.
Almost all of the changes were made in order to simplify implementation.
In this case, I felt that not binding KSEs to CPUs would make the
scheduler much simpler to implement, with no significant down sides. If
I'm missing something that actually makes the changes more complex, please
don't let the issue drop; simplicity and efficiency are key.

> When we were talking about it, (at least in my memory) Each KSE had a
> mailbox. My memory of this was that we called a KSE creation call with a
> different argument, thus each KSE had a different return stack frame
> when it made upcalls. In the version you have outlined, there is no KSE
> creation call, only KSEG creation calls. Thus all upcalls have the same
> frame, and there is the danger of colliding upcalls for different
> processors. I think it works more naturally with everything just
> 'falling into place' if we have calls to create KSEs rather than KSEGs.
> The "make KSEG" call is simply a version of the "make KSE" call that
> also puts it into the new different group. You are left with the very
> first 'original' thread being different in my scheme, but my answer to
> this would be to simply make the first "make KSE" call reuse the current
> stack etc. and not return a new one.
>
> [...]

Yes, this is a shortcoming of the current paper. I couldn't remember how
we had decided to do this, and was still working it out in my head. Thanks
for the reminder.

> When we have per-processor scheduling queues, there is only at most ONE
> KSE from any given KSEG in the scheduling queues for any given
> processor.

As mentioned above, I don't think we need to enforce this.

> With the single scheduling queue we have now do we allow N to be in the
> queues at once? (or do we put the KSEG in instead?)

We would still put all the KSEs in the scheduling queue. However, I think
we really need to do the scheduler overhaul close to the same time as the
KSE changes, so that we never have production releases of FreeBSD running
this way.

> The terms KSE etc. have probably served their useful life.
> It's time to think of or find names that really describe them better
>
> KSE -- a per process processor.. slot? opening? (a-la CAM/SCSI)
> KSEC ---- stack plus context... KSC..trying to do something (task?)
> KSEG ---- a class of schedulable entities.. A slot cluster? :-)
> PROC ---- probably needs to stay the same.

I'm not particularly attached to the names, but finding something better
may be hard. =)

Jason

From owner-freebsd-arch Mon Nov 27 15:49: 4 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from magnesium.net (toxic.magnesium.net [207.154.84.15])
    by hub.freebsd.org (Postfix) with SMTP id B02DE37B4C5
    for ; Mon, 27 Nov 2000 15:48:54 -0800 (PST)
Received: (qmail 94026 invoked by uid 1142); 27 Nov 2000 23:48:54 -0000
Date: 27 Nov 2000 15:48:54 -0800
Date: Mon, 27 Nov 2000 15:48:00 -0800
From: Jason Evans
To: Julian Elischer
Cc: arch@FreeBSD.ORG
Subject: Re: Threads (KSE etc) comments
Message-ID: <20001127154800.M4140@canonware.com>
References: <3A1B0B64.6D694248@elischer.org> <3A211C82.2464D07E@elischer.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3A211C82.2464D07E@elischer.org>; from julian@elischer.org on Sun, Nov 26, 2000 at 06:21:54AM -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sun, Nov 26, 2000 at 06:21:54AM -0800, Julian Elischer wrote:
>
There has been some discussion as to what the function of the KSEG > is.... > > [...] why we needed KSEGs. > The basic answer is, > > "We need some method by which we group the scheduled entities > so as to be able to ensure that the scheduler has full > information and control over what is going on." Yes. > Whether we actually need a KSEG and what it does depends upon what > semantics we want our threading support to have. If we want to provide a > virtual machine for the process, that looks as if it has an unlimited > number of virtual processors, then we allow the KSEG to spawn an > unlimited number of KSEs. In this case, do we allow the "scheduling > clout" to build up linearly with the number of KSEs or do we limit it in > some way? Theoretically you would want a KSEG with two KSEs to have the > same clout as a process running unthreaded, so that cpu time would be > divided 50-50. However this would mean assigning the threaded process > 'partial quantum' for each processor. There shouldn't be a need for assigning partial quanta. In the case of a single-threaded process, A, and a multi-threaded process B, on a 2 processor machine, B may initially get ~75% of the CPU resources. However, re-prioritization will notice this and lower the priority of B after a short period of time (4 ticks or so). > Maybe this 'exact fairness' is too hard to achieve.. IMO, the existing priority adjustment mechanisms are probably adequate. > When a KSE is pre-empted, the kernel saves state for that thread in the > thread-control-block and the next KSE to upcall to the UTS will include > that thread-control-block in its list of reportable entities. I'm not > clear on whether it's the next upcall on ANY KSE, or just the next > upcall on that KSE.. I think it should be the next upcall on any KSE. > If the latter then having multiple KSEs on the same processor, allows > the KSEG round-robin scheduler to make the UTS believe that it has N > virtual processors, (N-KSEs). 
However, it also means that the KSEG > round-robin scheduler is usurping the decision from the UTS as to which > thread is to be run next, as the UTS doesn't know that the thread on the > other KSE was pre-empted in favour of this one. (It's on a different > virtual CPU). I don't understand how we're usurping the UTS's scheduling decisions. If a KSE is preempted, then an upcall (resulting in yet another preemption, if necessary) must be done right away in order to give the UTS enough information to correctly schedule threads on the KSEs that are still running. This is one of the basic tenets of scheduler activations, which we really have to follow. > If the Former (All KSEs report all events) then there is no real > advantage to having more than N KSEs (N processors), because that means > that the UTS will probably keep swapping the threads it thinks are most > important to the KSEs which means that the thread that was pre-empted on > KSE-A will probably be re-scheduled on KSE-B when it preempts KSE-A. So > why have KSE-B at all? All it does is massively confuse things, and > creates a whole new class of scheduling problems. The main advantage I see of allowing more KSEs than processors (total across all KSEGs) is that it simplifies the implementation considerably. Very little has to be changed about how things currently work, which also means that single-threaded applications work the same as they do now, without a lot of extra work. I agree that there's no good reason to have more KSEs in a KSEG than there are processors, but it doesn't actually break anything to allow this, and simply using process resource limits to control the number of KSEs is simpler than enforcing limits on the number of KSEs per KSEG. One example of why enforcing KSE/KSEG limits could become hard in the future is if the number of processors is dynamic (i.e. processors can be added and removed). 
In discussions I've had with Mike Smith, this is a very real possibility,
and is something we should keep in mind.

> So, in summary:
> Assuming we allow only SLIGHT unfairness, if you allow the process to
> have more than N KSEs in a KSEG, you have one of the following:
> 1/ A lot of unfairness if you allow each KSE to be in the queues by
> itself.

Why is there unfairness? Scheduler re-prioritization should prevent
long-term unfairness just fine.

> 2/ The KSEG scheduler usurping the role of the UTS if it really does
> hide the true number of processors.

We shouldn't be hiding the true number of processors.

> 3/ An increased level of UTS complexity, and un-needed work, as the UTS
> struggles to switch the important threads onto the ever-changing set of
> running KSEs (it must be ever changing because there are more of them
> than CPUs).

The UTS doesn't need to be any more complex. It would simply get more
upcalls if there were more preemptions as a result of excessive KSEs,
which I don't think would happen anyway.

> The reason for having KSEGs is simply as an entity that competes for CPU
> to assure fairness.
> It may not even exist as a separate structure in the case where there
> are separate per-CPU scheduling queues, (though I think it would for
> efficiency's sake). It would PROBABLY have an analogous partner in the
> UTS that represents the virtual machine that runs all the threads that
> are competing at the same scope.

I agree with everything you say here.

> On a single scheduling queue system, I
> think I would have the KSEG in the queue rather than the independent
> KSEs. When it gets to the head, you schedule
> KSEs on all the CPUs. This allows the threads to communicate quickly
> using shared memory should they want. The UTS has the entire quantum
> across as many CPUs as it has.

As I mentioned in another email, I don't think we should plan on having a
production release that is implemented with only a single scheduling
queue.
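
A toy model may help make the re-prioritization argument in this message
concrete. The sketch below is not FreeBSD's scheduler. It assumes that CPU
usage is charged to the group (a stand-in for a KSEG) rather than to each
KSE, that priority is simply "least usage runs first", and it leaves out
usage decay entirely. The names Group and simulate are hypothetical and
exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Hypothetical stand-in for a KSEG: CPU usage is charged to the group."""
    name: str
    kses: int        # number of runnable KSEs in the group
    usage: int = 0   # CPU ticks charged so far (more usage = lower priority)


def simulate(groups, cpus, ticks):
    """Each tick, give the CPUs to the KSEs whose group has used the least CPU.

    Returns each group's share of all CPU ticks handed out.
    """
    for _ in range(ticks):
        # One candidate entry per runnable KSE, ordered by its group's usage.
        # sorted() is stable, so ties keep the order of the input list.
        candidates = sorted(
            (g for g in groups for _ in range(g.kses)),
            key=lambda g: g.usage,
        )
        for g in candidates[:cpus]:
            g.usage += 1
    total = sum(g.usage for g in groups)
    return {g.name: g.usage / total for g in groups}


if __name__ == "__main__":
    # B has two KSEs and is listed first, so it wins the initial tie.
    print(simulate([Group("B", 2), Group("A", 1)], cpus=2, ticks=1))
    print(simulate([Group("B", 2), Group("A", 1)], cpus=2, ticks=1000))
```

In this model the two-KSE group takes both CPUs on the first tick, after
which its accumulated usage lets the single-KSE group run on every tick and
the shares drift toward an even split. Charging usage per KSE instead of
per group would change that outcome, which is one way to see why the
grouping matters.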
Jason To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Nov 27 17: 2:16 2000 Delivered-To: freebsd-arch@freebsd.org Received: from net2.gendyn.com (nat2.gendyn.com [204.60.171.12]) by hub.freebsd.org (Postfix) with ESMTP id 303E337B4C5 for ; Mon, 27 Nov 2000 17:02:09 -0800 (PST) Received: from [153.11.11.3] (helo=plunger.gdeb.com) by net2.gendyn.com with esmtp (Exim 2.12 #1) id 140Z9g-000Ln6-00 for arch@freebsd.org; Mon, 27 Nov 2000 20:01:56 -0500 Received: from orion.caen.gdeb.com ([153.11.109.11]) by plunger.gdeb.com with ESMTP id TAA03337; Mon, 27 Nov 2000 19:58:45 -0500 (EST) Received: from gdeb.com (gpz.clc.gdeb.com [192.168.3.12]) by orion.caen.gdeb.com (8.9.3/8.9.3) with ESMTP id TAA00995; Mon, 27 Nov 2000 19:59:00 -0500 (EST) (envelope-from deischen@gdeb.com) Message-ID: <3A230437.F8318078@gdeb.com> Date: Mon, 27 Nov 2000 20:02:47 -0500 From: Dan Eischen X-Mailer: Mozilla 4.75 [en] (X11; U; SunOS 5.8 sun4u) X-Accept-Language: en MIME-Version: 1.0 To: arch@freebsd.org Cc: julian@elischer.org, jasone@canonware.com Subject: Re: Threads (KSE etc) comments Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG I promised to post my thoughts on the User Thread Scheduler. Here they are in hopefully some presentable form. -- Dan Eischen ---------------------------------------------------------------------- 1. Overview This is a discussion of some of the issues in the User Thread Scheduler in the FreeBSD KSE project. This doesn't go into much detail on the design and implementation, but focuses more on the scheduling models and the (userland) API. 2. Definitions Kernel Scheduled Entities (KSE) - This is the entity in which user threads are scheduled. A process may have multiple KSEs in which to schedule threads. A KSE may be viewed as a virtual processor to the UTS. 
The number of KSEs available for threads with contention scope process may be limited to the actual number of processors in the system (TBD). Kernel Scheduled Entity Group (KSEG) - A group of KSEs that, as outlined so far, contains the shared quantum and priority. This is currently under discussion. Scheduling Allocation Domain - The number of processors over which threads can be scheduled. This really correlates into how many KSEs are allocated to the process. User Thread Scheduler (UTS) - This is responsible for scheduling user level threads over the available processors (KSEs). 3. Scheduling Models Threads are scheduled according to their contention scope and allocation domain. The contention scope, as defined by POSIX, is either system or process contention scope (PTHREAD_SCOPE_SYSTEM and PTHREAD_SCOPE_PROCESS respectively). Each contention scope system thread is bound to its own private KSE. This KSE has its own quantum and priority (if it needs a KSEG to obtain that, so be it) such that the thread competes on a system-wide basis with other system scope threads. Contention scope system threads need not be part of any scheduling queue since there are no other threads in competition for processing time on that KSE. Contention scope process threads are more interesting in that there are several possible scheduling schemes depending on the allocation domain and on the number of quanta granted by the kernel (not including quanta granted for scope system threads). 3.1 Single Queue, Single Allocation Domain This is the common case where there is only 1 CPU and only 1 quantum granted to handle scope process threads. All runnable threads are placed in the same queue and compete for processor time on one scheduleable entity (with 1 quantum). 3.2 Single Queue, Multiple Allocation Domain Here we have N schedulable entities over which threads can be executed. 
All runnable threads are placed in the same queue and are scheduled onto
N schedulable entities as they become available. When a thread blocks or
completes in a schedulable entity, then another thread is pulled from the
single run queue for execution. If a KSE is preempted while running a
thread, and in lieu of any hint from the application as to the binding of
threads to KSEs, the UTS would only resume that thread on the next KSE if
its priority was (strictly) greater than the priority of the next thread
in the run queue, or if the preempted thread was in a critical region.

Whether or not the scheduled entities have their own quantum is not known
at this point, but under the current design it is possible (the UTS could
create N KSEG/KSE pairs to allow this instead of N KSEs within 1 KSEG).

3.3 Multiple Queue, Multiple Allocation Domain

Again we have N schedulable entities over which threads can be executed,
but in this case there are also multiple (up to N) scheduling queues. In
this model, there may exist a run queue for each schedulable entity, and
threads may be bound to a particular entity. As with the single queue
model above, whether or not scheduled entities have their own quantum is
not yet known, but it is possible with the current design.

How does the UTS decide which threads get bound to each of the N scheduled
entities? At the least, the application should have the ability to decide
this. In lieu of any hint from the application, the UTS could also provide
some method of (soft) binding threads to the N scheduled entities and
optimize for maximum CPU utilization and minimum thread reassignment. My
thought is that we concentrate on keeping it simple for now, and allow for
this possibility later.

For the case where not all threads are bound to a specific KSE by the
application, there could be a global run queue from which unbound threads
are taken.
In this case there would be as many as N+1 scheduling queues, with each KSE taking a peek at the priority of the global run queue before deciding on taking a thread from its own run queue. The global run queue would not have to be locked (unless adding/removing a thread), so this would only add the overhead of a couple of instructions to examine its priority. You might also have KSEs that don't have any bound threads, in which case they wouldn't have a run queue and would always obtain threads from the global run queue. Yet another option would be to disallow binding of threads to the original (main) KSE and only allow binding to other KSEs. All unbound threads would be executed on the main KSE (and any other KSE which does not have bound threads). Any KSE that has bound threads would only execute those threads. 4. API For the most part, the POSIX API is sufficient for our needs. But if we want to allow application control of how threads are assigned and scheduled on the KSEs, we could define the following set of interfaces: pthread_setconcurrency() - This is the POSIX interface to set the concurrency level. This will request the desired concurrency level and informs the UTS as to how many KSEs are to be requested from the kernel (which the kernel may limit). Whether or not this also allows additional quantum remains to be seen. Certainly, the UTS could create an additional KSEG/KSE pair for each level of concurrency above 1 in order to achieve additional quantum under the current design. This function does not change/reflect the scheduled entities for system scope threads. The limited concurrency level is returned. If we wanted this routine to act the same as it does under Solaris, then it would actually request the number of entities with quantum and priority (KSEGs as currently defined). thr_create(...) - This would be an alternative to using pthread_setconcurrency() to set the number of KSEs. 
This function allows an application to specify additional attributes for
thread creation. Solaris allows additional flags to be specified, notably
THR_NEW_LWP and THR_BOUND. The effect of specifying THR_BOUND is the same
as specifying PTHREAD_SCOPE_SYSTEM. But specifying THR_NEW_LWP (and
omitting THR_BOUND) allocates an additional LWP that can be used to
schedule unbound (scope process) threads. We could provide a similar flag
THR_NEW_KSE (or THR_NEW_KSEG) that could tell the UTS to request an
additional KSE (or KSEG) to be used to schedule scope process threads. I'm
not too keen on this interface as an alternative to
pthread_setconcurrency(), but perhaps it has some merit if we want a
Solaris-like API.

_kse_self() - returns the current KSE ID, where the integer ID ranges from
0 (for the original KSE of the main process) to M-1 (where M is the total
number of KSEs in the process). Solaris provides a similar function
_lwp_self().

pthread_bind_np(pthread_t pthread, int kse_id) - binds a given thread to
the specified KSE. A kse_id of -1 refers to the current KSE. I suppose
this could also be called _kse_bind() depending on how you looked at it.

_kse_bind(int kse_id, int processor) - binds a KSE to a particular
processor. This might also be called _cpu_bind() if you use _kse_bind
instead of pthread_bind_np. Solaris has processor_bind() which can handle
both LWPs and PIDs, and pset_bind() which allows binding of LWPs or PIDs
to processor sets. This is probably getting a little ahead of ourselves,
but something to think about anyways.

For a moment, let's make the assumption that each KSE has a priority and
quantum, or that we always use a KSEG with one KSE to achieve the same
effect. We _could_ now present an interface that is very similar to that
provided by Solaris. True, perhaps a KSE (or KSEG) is not as heavy as an
LWP in Solaris, but that is just an internal implementation issue. To the
application, they are seen as very much similar things.
If we provide an API that is very similar to that provided by Solaris, that would make porting Solaris applications trivial. Again, something to think about. 5. Interaction of Existing Scheduling Interfaces We currently have the following interfaces that affect process scheduling: setpriority() rtprio() sched_setparam() sched_setscheduler() any others? In a threaded process, I think these should operate on the entity that contains the quantum and priority, not the process. Whether that is a KSE or a KSEG, I don't know. If it is a KSEG, then that's the only case I can see for forcing the threads library to know anything about KSEGs. Still, the kernel is responsible for setting these priorities, not the UTS, so it wouldn't be strictly necessary for the UTS to have any knowledge about KSEGs. 6. Summary I'd like some resolution as to what interfaces the threads library should provide to the application. I've outlined some of my thoughts above and I'd like some feedback. My biggest question is do we want to provide the ability for a threaded application to request more scheduling time (aside from PTHREAD_SCOPE_SYSTEM threads)? I've already seen applications that always use PTHREAD_SCOPE_SYSTEM when creating threads. I suppose this is mostly in part to obtain as much CPU as possible. At USENIX, Jason and I attended a BOF on threads and it was kind of amazing to me that folks seemed to prefer the LinuxThreads model. Given this attitude, I don't think it makes sense to attempt to restrict an application to only 1 (or N where N = # of CPUs) quantum; we'll just end up with applications that always use system scope threads. Do we want to provide a method of binding threads to KSEs, and KSEs to processors? Binding threads to KSEs isn't really that hard to implement in the UTS, and I wouldn't think it too difficult for the kernel to bind KSEs to processors either (?). 
Some KSEs may be automatically bound to processors, but others might not; KSEs allocated for system scope threads, or KSEs allocated (for process scope threads) above and beyond the number of CPUs (assuming we allow this). I'd like to resolve these issues (any others?) very soon so I can concentrate on more of the UTS details (like what is the communication channel between the kernel and the UTS). At some point, it may be worthwhile to have a telecon or IRC (never tried it) because it could take too long via this mailing list. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Nov 27 17:41:41 2000 Delivered-To: freebsd-arch@freebsd.org Received: from winston.osd.bsdi.com (winston.osd.bsdi.com [204.216.27.229]) by hub.freebsd.org (Postfix) with ESMTP id 3397D37B4CF for ; Mon, 27 Nov 2000 17:41:39 -0800 (PST) Received: from winston.osd.bsdi.com (localhost [127.0.0.1]) by winston.osd.bsdi.com (8.11.1/8.11.1) with ESMTP id eAS1fYh53354; Mon, 27 Nov 2000 17:41:34 -0800 (PST) (envelope-from jkh@winston.osd.bsdi.com) To: Kirk McKusick Cc: arch@FreeBSD.ORG, rps@merlin.mat.uc.pt Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) In-Reply-To: Message from Kirk McKusick of "Mon, 27 Nov 2000 14:41:05 PST." <200011272241.OAA93364@beastie.mckusick.com> Date: Mon, 27 Nov 2000 17:41:33 -0800 Message-ID: <53352.975375693@winston.osd.bsdi.com> From: Jordan Hubbard Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > When we first implemented termios at CSRG, we had an erase2 > character. Mike Karels was vehemently opposed to it, and > insisted that it be deleted before we did our next release > (4.3-tahoe if I remember correctly). I am of the opinion that > it is a good idea, and should be there. 
> I do not believe that
> we need/want a general aliasing facility as erase is really
> the only character for which there is widespread disagreement
> over which character to use. So, my take would be to add
> erase2 and be done with it.

Well, there are the ^U vs ^X folks for line-kill (some even argue for ^W)
which is why I cited it as another example; I agree that it's by no means
as prevalent as ^H vs DEL though.

That said, I'm still not fully convinced that termios was implemented in a
fully sane fashion to begin with. If one uses a fairly competent shell
like bash, for example, you have a "bind" command which allows you to map
any key to any function and I've used that feature to good effect in my
.bashrc so I'd have a hard time with any argument that fully bindable keys
are an over-engineered solution. The major drawback, of course, is that
these editing characters are only useful at the shell prompt and not with
other programs which take input, which is why readline(3) type
functionality would really not be such a horrible thing to see in
termios(4).

Back in the day when a really bloated kernel was a couple of hundred
kilobytes I'd also probably have been shot at dawn for even making such a
suggestion, but I'm hoping that times have changed enough that my life
will be spared for doing so.
:) - Jordan To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Nov 27 18: 2:24 2000 Delivered-To: freebsd-arch@freebsd.org Received: from winston.osd.bsdi.com (winston.osd.bsdi.com [204.216.27.229]) by hub.freebsd.org (Postfix) with ESMTP id 8990937B4CF for ; Mon, 27 Nov 2000 18:02:22 -0800 (PST) Received: from winston.osd.bsdi.com (localhost [127.0.0.1]) by winston.osd.bsdi.com (8.11.1/8.11.1) with ESMTP id eAS22Hh53509; Mon, 27 Nov 2000 18:02:17 -0800 (PST) (envelope-from jkh@winston.osd.bsdi.com) Cc: Kirk McKusick , arch@FreeBSD.ORG, rps@merlin.mat.uc.pt Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) In-Reply-To: Message from Jordan Hubbard of "Mon, 27 Nov 2000 17:41:33 PST." <53352.975375693@winston.osd.bsdi.com> Date: Mon, 27 Nov 2000 18:02:17 -0800 Message-ID: <53507.975376937@winston.osd.bsdi.com> From: Jordan Hubbard Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > Back in the day when a really bloated kernel was a couple of hundred > kilobytes I'd also probably have been shot at dawn for even making > such a suggestion, but I'm hoping that times have changed enough > that my life will be spared for doing so. :) Just to follow up to myself, I should also note that I'm just lightly kvetching with my suggestion that termios(4) should be extended. I don't intend it as a rejection of the original patches by Mr. Salgueiro and it does appear that there is wide-spread support for them so I'll probably just commit them until such time (probably right around the time that our Sun enters the red giant cycle) as termios(4) grows more general functionality. 
- Jordan To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Nov 27 20:54: 2 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mail.rpi.edu (mail.rpi.edu [128.113.100.7]) by hub.freebsd.org (Postfix) with ESMTP id D2A6A37B479 for ; Mon, 27 Nov 2000 20:54:00 -0800 (PST) Received: from [128.113.24.47] (gilead.acs.rpi.edu [128.113.24.47]) by mail.rpi.edu (8.9.3/8.9.3) with ESMTP id XAA15352; Mon, 27 Nov 2000 23:53:52 -0500 Mime-Version: 1.0 X-Sender: drosih@mail.rpi.edu Message-Id: In-Reply-To: <52694.975362925@winston.osd.bsdi.com> References: <52694.975362925@winston.osd.bsdi.com> Date: Mon, 27 Nov 2000 23:53:51 -0500 To: Jordan Hubbard , arch@FreeBSD.ORG From: Garance A Drosihn Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) Cc: rps@merlin.mat.uc.pt Content-Type: text/plain; charset="us-ascii" ; format="flowed" Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG At 2:08 PM -0800 11/27/00, Jordan Hubbard wrote: >On the other hand, I can see that this specific case >(erase) is by far the most significant. Which is why I'm >forwarding this to arch - this is one of those classic >architecture/feature trade-off decisions and I would like >to hear more opinions before deciding which way I'd like >to respond to this. Due to the variety of unixes that I have to deal with, and the variety of ways I connect to them, I am forever having headaches with the erase character. Conceptually, I am not thrilled with the idea of having a special "erase2" option in stty. But I'm so fed up with del vs ^H in my own day-to-day operations that any improvement would be welcome. I wouldn't mind a more architecturally grand solution, but this would be helpful enough that I'd be happy to see it, if someone has already written the changes to make it happen. 
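
For readers who have not met the problem, the erase2 idea discussed in
this thread can be sketched as a toy line editor. This is only an
illustration of the concept, not the patch itself, whose details are not
shown in these messages. The function name edit_line and the choice of DEL
as erase and ^H as erase2 are assumptions made for the example.

```python
def edit_line(keys, erase="\x7f", erase2="\x08", kill="\x15"):
    """Toy canonical-mode editor: return the line left after applying keys.

    erase and erase2 both delete the previous character; kill (^U here)
    discards the whole line. Pass erase2=None to model a tty that knows
    only one erase character.
    """
    line = []
    for ch in keys:
        if ch in (erase, erase2):
            if line:
                line.pop()        # drop the previously typed character
        elif ch == kill:
            line.clear()          # discard everything typed so far
        else:
            line.append(ch)       # ordinary character: keep it
    return "".join(line)


if __name__ == "__main__":
    # With erase2 set, ^H and DEL both work as a backspace.
    print(repr(edit_line("lsx\x08")))
    print(repr(edit_line("lsx\x7f")))
    # Without erase2, ^H is just another character in the input line.
    print(repr(edit_line("lsx\x08", erase2=None)))
```

The last call shows the complaint quoted earlier in the thread: with a
single erase character, a terminal that sends the other one leaves a
literal ^H in the line instead of deleting anything.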
-- Garance Alistair Drosehn = gad@eclipse.acs.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Nov 27 20:57:25 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id B1BCC37B479 for ; Mon, 27 Nov 2000 20:57:22 -0800 (PST) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id VAA14111; Mon, 27 Nov 2000 21:57:51 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp05.primenet.com, id smtpdAAAylaaUA; Mon Nov 27 21:56:55 2000 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id VAA25380; Mon, 27 Nov 2000 21:56:15 -0700 (MST) From: Terry Lambert Message-Id: <200011280456.VAA25380@usr08.primenet.com> Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) To: jkh@winston.osd.bsdi.com (Jordan Hubbard) Date: Tue, 28 Nov 2000 04:56:14 +0000 (GMT) Cc: mckusick@mckusick.com (Kirk McKusick), arch@FreeBSD.ORG, rps@merlin.mat.uc.pt In-Reply-To: <53352.975375693@winston.osd.bsdi.com> from "Jordan Hubbard" at Nov 27, 2000 05:41:33 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > That said, I'm still not fully convinced that termios was implemented > in a fully sane fashion to begin with. If one uses a fairly competent > shell like bash, for example, you have a "bind" command which allows > you to map any key to any function and I've used that feature to good > effect in my .bashrc so I'd have a hard time with any argument that > fully bindable keys is an over-engineered solution. 
The major > drawback, of course, is that these editing characters are only useful > at the shell prompt and not with other programs which take input, > which is why readline(3) type functionality would really not be such a > horrible thing to see in termios(4). Shades of VMS' CTERM protocol... One nice thing that VMS did was to implement a state machine in their tty driver; this let them do nice things, like session switching on VT3xx terminals. It also let you know when the terminal was in the base state, as opposed to being in the middle of processing an escape sequence, so you could do things like modify the contents of a status line, or turn transparent printing on, send some data, and turn it back off, all without worrying about managing multiplexing yourself. Computone and Intelliport did this in a general way for more than just ANSI terminals (the only thing VMS worked with) with their Xenix and UNIX drivers by downloading the state tree down to the driver when the terminal type was set. They didn't support session switching, but they did support a tty and printer device, muxed in the kernel, to let them support a printer off the back of a terminal. I'm actually aware of a video rental chain that used these cards with the mux drivers and Wyse terminals to support receipt printing, and most of the systems are still in use today. All that said, I think that terminals are probably only going to become less and less common, as time goes on, and that it would be a lot of effort spent for naught to get readline or similar functionality into FreeBSD's drivers. Actually, this would probably be a perfect application for a Streams module... Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Nov 27 22: 4:39 2000 Delivered-To: freebsd-arch@freebsd.org Received: from winston.osd.bsdi.com (winston.osd.bsdi.com [204.216.27.229]) by hub.freebsd.org (Postfix) with ESMTP id BCA8137B4D7 for ; Mon, 27 Nov 2000 22:04:36 -0800 (PST) Received: from winston.osd.bsdi.com (localhost [127.0.0.1]) by winston.osd.bsdi.com (8.11.1/8.11.1) with ESMTP id eAS64Sh51161; Mon, 27 Nov 2000 22:04:28 -0800 (PST) (envelope-from jkh@winston.osd.bsdi.com) To: Terry Lambert Cc: mckusick@mckusick.com (Kirk McKusick), arch@FreeBSD.ORG, rps@merlin.mat.uc.pt Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) In-Reply-To: Message from Terry Lambert of "Tue, 28 Nov 2000 04:56:14 GMT." <200011280456.VAA25380@usr08.primenet.com> Date: Mon, 27 Nov 2000 22:04:27 -0800 Message-ID: <51159.975391467@winston.osd.bsdi.com> From: Jordan Hubbard Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > All that said, I think that terminals are probably only going > to become less and less common, as time goes on, and that it > would be a lot of effort spent for naught to get readline or > similar functionality into FreeBSD's drivers. This doesn't just apply to terminals; it applies to anyone trying to use a PTY through a remote session using anything from a Sun keyboard to a Microsoft Unnatural keyboard. > Actually, this would probably be a perfect application for a > Streams module... OK, I'm sorry, but we have to kill you now.
- Jordan To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Nov 28 6:28:45 2000 Delivered-To: freebsd-arch@freebsd.org Received: from point.osg.gov.bc.ca (point.osg.gov.bc.ca [142.32.102.44]) by hub.freebsd.org (Postfix) with ESMTP id C2DCE37B400 for ; Tue, 28 Nov 2000 06:28:42 -0800 (PST) Received: (from daemon@localhost) by point.osg.gov.bc.ca (8.8.7/8.8.8) id GAA12929; Tue, 28 Nov 2000 06:28:00 -0800 Received: from passer.osg.gov.bc.ca(142.32.110.29) via SMTP by point.osg.gov.bc.ca, id smtpda12927; Tue Nov 28 06:27:40 2000 Received: (from uucp@localhost) by passer.osg.gov.bc.ca (8.11.1/8.9.1) id eASERYE06652; Tue, 28 Nov 2000 06:27:34 -0800 (PST) Received: from cwsys9.cwsent.com(10.2.2.1), claiming to be "cwsys.cwsent.com" via SMTP by passer9.cwsent.com, id smtpdzp6650; Tue Nov 28 06:26:49 2000 Received: (from uucp@localhost) by cwsys.cwsent.com (8.11.1/8.9.1) id eASEQmU13919; Tue, 28 Nov 2000 06:26:48 -0800 (PST) Message-Id: <200011281426.eASEQmU13919@cwsys.cwsent.com> Received: from localhost.cwsent.com(127.0.0.1), claiming to be "cwsys" via SMTP by localhost.cwsent.com, id smtpde13915; Tue Nov 28 06:26:05 2000 X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4 Reply-To: Cy Schubert - ITSD Open Systems Group From: Cy Schubert - ITSD Open Systems Group X-OS: FreeBSD 4.2-RELEASE X-Sender: cy To: Daniel Eischen Cc: Matt Dillon , Jordan Hubbard , arch@FreeBSD.ORG Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) In-reply-to: Your message of "Mon, 27 Nov 2000 18:39:45 EST." 
Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Tue, 28 Nov 2000 06:26:05 -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In message , Daniel Eischen writes: > On Mon, 27 Nov 2000, Matt Dillon wrote: > > This is one of those things where, 10 years ago, I would probably > > have been a purist and been opposed to it. > > > > But after 15+ years of pure hell having to deal with every > > conceivable combination of ^H and ^?, terminal types, > > telnet, rlogin, ssh, and so on and so forth... I say to > > hell with the purist view on this one. I'd love to > > see this committed! > > I agree! Commit this now before there are any objections! Let's do it before it becomes worthy of a bikeshed debate. Regards, Phone: (250)387-8437 Cy Schubert Fax: (250)387-5766 Team Leader, Sun/DEC Team Internet: Cy.Schubert@osg.gov.bc.ca Open Systems Group, ITSD, ISTA Province of BC To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Nov 28 9: 2:14 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mail.interware.hu (mail.interware.hu [195.70.32.130]) by hub.freebsd.org (Postfix) with ESMTP id 9093537B400; Tue, 28 Nov 2000 09:02:11 -0800 (PST) Received: from nairobi-20.budapest.interware.hu ([195.70.50.212] helo=elischer.org) by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian)) id 140o8f-0003Ty-00; Tue, 28 Nov 2000 18:01:53 +0100 Message-ID: <3A23E4F7.8E42EB3E@elischer.org> Date: Tue, 28 Nov 2000 09:01:43 -0800 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: Marius Bendiksen Cc: Alfred Perlstein , Daniel Eischen , John Baldwin , Jonathan Lemon , arch@FreeBSD.ORG Subject: Re: Thread-specific data and KSEs References: Content-Type: text/plain; charset=iso-8859-15 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: 
FreeBSD.ORG Marius Bendiksen wrote: > > > > It's just one more register that has to be saved. I don't > > > think it's going to matter much. > > No extra TLB faults/invalidations? Aren't segment registers > > somewhat expensive to load? > > Upon loading a task state (with ltr or a gate), you will restore all > segment registers from the tss, regardless of their content, and a load of > the shadow portion of the segment will be attempted anyway. I don't think > this is the right place to shave off cycles, nor do I think the speed is > even the most relevant issue for this extension, but rather the abuse of > segments that are meant to hold real data. We don't use the TSS to switch between processes. > > Marius > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-arch" in the body of the message -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000 ---> X_.---._/ presently in: Budapest v To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Nov 28 9:11:46 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mail.interware.hu (mail.interware.hu [195.70.32.130]) by hub.freebsd.org (Postfix) with ESMTP id 4017037B401 for ; Tue, 28 Nov 2000 09:11:44 -0800 (PST) Received: from nairobi-20.budapest.interware.hu ([195.70.50.212] helo=elischer.org) by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian)) id 140oIB-00046E-00; Tue, 28 Nov 2000 18:11:43 +0100 Message-ID: <3A227FF2.FD2CC41E@elischer.org> Date: Mon, 27 Nov 2000 07:38:26 -0800 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: arch@FreeBSD.ORG Cc: Daniel Eischen Subject: Re: Thread-specific data and KSEs References: Content-Type: text/plain; charset=iso-8859-15 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG One thing I just realised: If we are
using deferred FP state saving and restoring in the kernel, then we will have troubles with that when switching threads in userland, since the handler for that is in the kernel. Of course we could set the place for it in the KSE mailbox and let the kernel save the information when it needs it. Julian -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000 ---> X_.---._/ presently in: Budapest v To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Nov 28 10:13:25 2000 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id C7E6937B400 for ; Tue, 28 Nov 2000 10:13:22 -0800 (PST) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id NAA21649; Tue, 28 Nov 2000 13:12:54 -0500 (EST) Date: Tue, 28 Nov 2000 13:12:53 -0500 (EST) From: Daniel Eischen To: Julian Elischer Cc: arch@FreeBSD.ORG Subject: Re: Thread-specific data and KSEs In-Reply-To: <3A227FF2.FD2CC41E@elischer.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Mon, 27 Nov 2000, Julian Elischer wrote: > > One thing I just realised: > > If we are using deferred FP state saving and restoring in the kernel, then we > will have troubles with that when switching threads in userland, since the > handler for that is in the kernel. Of course we could set the place for it in > the KSE mailbox and let the kernel save the information when it needs it. Our current threads library knows when to save and restore FP state; it currently only happens when a signal is received (for i386, I think alpha FP state is always saved both in jmp_buf and ucontext_t). I think we want to avoid saving and restoring FP state unless it's necessary. That's probably only when a fault occurs or when the KSE is preempted.
I like the idea of having the kernel save the FP state in the thread state storage area (ucontext_t?) in the KSE mailbox thingy. Also, are we going to allow the kernel to follow links out of the mailbox, or are we going to limit UTS<->kernel communication to just this one page? I think it might be preferable to only communicate via the mailbox and never have the kernel attempt to read/write to other areas of KSE/thread storage. For instance, we could place the pointer to the thread state storage area in the mailbox. But that would require a copyin, and then a copyout to another page that might be paged out. The drawback of only using the mailbox is that it requires an additional copy by the UTS every time an upcall is made (to copy the thread state from the mailbox to the storage area in the thread). -- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Nov 28 11: 3:47 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mailhost.iprg.nokia.com (mailhost.iprg.nokia.com [205.226.5.12]) by hub.freebsd.org (Postfix) with ESMTP id 551CA37B400 for ; Tue, 28 Nov 2000 11:03:45 -0800 (PST) Received: from darkstar.iprg.nokia.com (darkstar.iprg.nokia.com [205.226.5.69]) by mailhost.iprg.nokia.com (8.9.3/8.9.3-GLGS) with ESMTP id LAA21472; Tue, 28 Nov 2000 11:03:45 -0800 (PST) Received: (from root@localhost) by darkstar.iprg.nokia.com (8.11.0/8.11.0-DARKSTAR) id eASJ3h621434; Tue, 28 Nov 2000 11:03:43 -0800 X-Virus-Scanned: Tue, 28 Nov 2000 11:03:43 -0800 Nokia Silicon Valley Email Exploit Scanner Received: from dhcp-15-155.iprg.nokia.com (205.226.15.155, claiming to be "iprg.nokia.com") by darkstar.iprg.nokia.com(WTS.12.69) smtpdjDaDWg; Tue, 28 Nov 2000 11:02:13 PST Message-ID: <3A240337.D8109556@iprg.nokia.com> Date: Tue, 28 Nov 2000 11:10:47 -0800 From: Michael Williams Organization: Nokia X-Mailer: Mozilla 4.7 [en] (Win98; U) X-Accept-Language: en,pdf MIME-Version: 1.0 
To: Jason Evans Cc: Julian Elischer , arch@FreeBSD.ORG Subject: Re: Threads (KSE etc) comments References: <3A1B0B64.6D694248@elischer.org> <3A211C82.2464D07E@elischer.org> <20001127154800.M4140@canonware.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG FYI SPARC machines from Sun or Fujitsu do this both in h/w or just in software i.e. cpu_offline(). Michael Jason Evans wrote: One example of why enforcing KSE/KSEG limits could become hard in the future is if the number of processors is dynamic (i.e. processors can be added and removed). In discussions I've had with Mike Smith, this is a very real possibility, and is something we should keep in mind. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Nov 28 12:48:43 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mail.interware.hu (mail.interware.hu [195.70.32.130]) by hub.freebsd.org (Postfix) with ESMTP id 35DD837B402 for ; Tue, 28 Nov 2000 12:48:37 -0800 (PST) Received: from luanda-25.budapest.interware.hu ([195.70.51.25] helo=elischer.org) by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian)) id 140rg1-0004xc-00; Tue, 28 Nov 2000 21:48:34 +0100 Message-ID: <3A24064C.A3DF52A8@elischer.org> Date: Tue, 28 Nov 2000 11:23:56 -0800 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: Jason Evans Cc: arch@FreeBSD.ORG Subject: Re: Threads (KSE etc) comments References: <3A1B0B64.6D694248@elischer.org> <3A211C82.2464D07E@elischer.org> <20001127154800.M4140@canonware.com> Content-Type: text/plain; charset=iso-8859-15 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Jason Evans wrote: > > > I don't understand how we're usurping the UTS's scheduling decisions. 
If a > KSE is preempted, then an upcall (resulting in yet another preemption, if > necessary) must be done right away in order to give the UTS enough > information to correctly schedule threads on the KSEs that are still > running. This is one of the basic tenets of scheduler activations, which > we really have to follow. I don't think it's practical to upcall to the UTS when you preempt one of its KSEs. Would you hold off servicing a higher priority process while you are chatting with the UTS? Of course not. If the UTS wanted a thread to have a priority high enough to avoid being pre-empted by a process of priority X then it should have put it into a KSEG running at priority X+1. What you DO want to do is notify the UTS of the fact at the next convenient moment. This would be at the completion of a syscall in some other KSE, or the resumption of a previously suspended KSE. (These are the same times when signals are delivered.) You COULD pre-empt another KSE (on another processor) (if it was in a thread) and do an upcall to its UTS, but I don't know that the complexity (including inter-CPU communications within the kernel and delivery of an inter-CPU interrupt) is worth it. You really can't do anything about the period of time between the pre-emption and the first available notification moment. We put up with a delay equal to, or greater than, this with signal delivery today. You cannot hold up the higher priority process. (As I said, we give the UTS a method by which it can ensure that some threads are harder to pre-empt than others: KSEGs.) It should use this to avoid the pre-emption in the first place.
> > > If the Former (All KSEs report all events) then there is no real > > advantage to having more than N KSEs (N processors), because that means > > that the UTS will probably keep swapping the threads it thinks are most > > important to the KSEs which means that the thread that was pre-empted on > > KSE-A will probably be re-scheduled on KSE-B when it preempts KSE-A. So > > why have KSE-B at all? All it does is massively confuse things, and > > creates a whole new class of scheduling problems. > > The main advantage I see of allowing more KSEs than processors (total > across all KSEGs) is that it simplifies the implementation considerably. > Very little has to be changed about how things currently work, which also > means that single-threaded applications work the same as they do now, > without a lot of extra work. We are agreed that the TOTAL number of KSEs may be greater than the number of processors. > > I agree that there's no good reason to have more KSEs in a KSEG than there > are processors, but it doesn't actually break anything to allow this, and > simply using process resource limits to control the number of KSEs is > simpler than enforcing limits on the number of KSEs per KSEG. I think that if you think of a KSEG as a contention domain it gives a different viewpoint. Threads are assigned to a KSEG and will not contend against each other in the system-scope. All threads in the same KSEG share the same CPU resources and can migrate pretty easily around the KSEs assigned to the KSEG. (The only reason to keep them on a cpu would be for cache effects.) KSEs within a KSEG do not directly contend with each other, and KSEGs DO contend with each other (potentially), so you can see that it would simplify some things if you make the following rules: 1/ Only one KSE from any given KSEG may be on any single processor at any given time.
Maybe you only shift a KSE when it's idle, or in the kernel, and only onto a processor which doesn't already have a KSE from that KSEG on it. 2/ You only allow the number of KSEs in a KSEG to be <= N where N is the number of processors available. I can't prove it yet, but thinking about the implementation, I can't help but feel in my gut that making these rules will allow some solutions to just "fall out" that otherwise may require a lot more work. I'm particularly worried about a KSE being pre-empted while in the UTS. The kernel isn't going to hold off the pre-emption just because the process thinks it's a bad idea... it has a higher priority process screaming in its ear wanting cycles. If, later, we then come in on another KSE on the same processor, we can't really guarantee that we will not have a locking collision within our resources with the UTS that is presently swapped out. It's not really a deadlock, but we will take a big hit in time as we have to wait for the other KSE to be run again before we can get what would otherwise have been a very short term lock. I'm not explaining it very well, because it sort of relies on a lot of other stuff in my head. Maybe I need to write that down.. OK, here goes. In my imagination of the implementation, KSEs each have a user upcall stack and a mailbox. The UTS is run on those stacks. Threads are assigned to a KSEG and the UTS will prefer to run them on a single KSE for as long as it's easy to do so, but will often and easily switch them between KSEs (if there are several) for load balancing, i.e. there is very little binding of threads to KSEs within a single KSEG. If there are only N KSEs (N=numProcessors) then locking between the UTS instances in the same KSEG is rather trivial, and can be pretty much limited to brief spinlocks. Since these contentions will be more common than contentions between UTS agents in other KSEGs (threads won't often be migrating between KSEGs), keeping the locks simple is important.
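The "brief spinlocks" between UTS instances of one KSEG could look roughly like the following. This is only a sketch under stated assumptions: the names kseg_runq, uts_thread and the runq_* functions are hypothetical and appear nowhere in the proposal, and C11 atomics stand in for whatever primitive the real UTS would use. It only stays cheap under the rules above, where the lock holder is never waiting behind another KSE of the same KSEG on the same processor.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-thread record; only the queue link matters here. */
struct uts_thread {
    struct uts_thread *next;
};

/* Hypothetical run queue shared by all UTS instances of one KSEG. */
struct kseg_runq {
    atomic_flag        lock;   /* set while some KSE holds the queue */
    struct uts_thread *head;   /* LIFO list of runnable threads */
};

void runq_init(struct kseg_runq *q)
{
    atomic_flag_clear(&q->lock);
    q->head = NULL;
}

/* Returns 1 if the lock was taken, 0 if another KSE already holds it. */
int runq_trylock(struct kseg_runq *q)
{
    return !atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire);
}

/* Spin until the lock is free; assumes the holder is running elsewhere. */
void runq_lock(struct kseg_runq *q)
{
    while (!runq_trylock(q))
        ;
}

void runq_unlock(struct kseg_runq *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Make a thread runnable; callable from any KSE in the KSEG. */
void runq_push(struct kseg_runq *q, struct uts_thread *t)
{
    assert(t != NULL);
    runq_lock(q);
    t->next = q->head;
    q->head = t;
    runq_unlock(q);
}

/* Take the most recently queued thread, or NULL if the queue is empty. */
struct uts_thread *runq_pop(struct kseg_runq *q)
{
    runq_lock(q);
    struct uts_thread *t = q->head;
    if (t != NULL)
        q->head = t->next;
    runq_unlock(q);
    return t;
}
```

If a KSE could be pre-empted inside runq_push() while a second KSE of the same KSEG was then run on the same processor, the second one would spin in runq_lock() for the rest of its quantum, which is the time hit described above.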
If we have N>numProcessors then we need to take into account (by my thinking) the potential serialisation of KSEs and pre-emptions such as that I mentioned above. If we don't (i.e. N <= numProcessors), then we can allow communications between threads in the same KSEG to use a much simpler locking and synchronisation scheme. You use a more heavyweight scheme between threads in different KSEGs. If you want to make a program that has many KSEs, just put them all in different KSEGs all with the same scheduler priority, but be prepared to pay the price of heavier weight communications and synchronisations. With a limit of N KSEs we can also experiment with such things as gang scheduling, where we might ask that all KSEs in a KSEG are, if possible, scheduled across all the processors at once. This can give massive throughput improvements in some applications, particularly ones where threads are communicating with each other using a ping-pong protocol. From the kernel point of view we need not limit the number, but I think that it is foolish to not do so. > > One example of why enforcing KSE/KSEG limits could become hard in the > future is if the number of processors is dynamic (i.e. processors can be > added and removed). In discussions I've had with Mike Smith, this is a > very real possibility, and is something we should keep in mind. OK, I agree that this is possible. And I see that this would require that we can pre-empt a KSE in user mode, while it is in the UTS, and allow it to try to run to completion (get out of the UTS) somewhere else. Yuk. > > The UTS doesn't need to be any more complex. It would simply get more > upcalls if there were more preemptions as a result of excessive KSEs, which > I don't think would happen anyway. As I said, I'm worried about the UTS itself. > > > The reason for having KSEGs is simply as an entity that competes for CPU > > to assure fairness.
> > It may not even exist as a separate structure in the case where there > > are separate per-CPU scheduling queues, (though I think it would for > > efficiency's sake). It would PROBABLY have an analogous partner in the > > UTS that represents the virtual machine that runs all the threads that > > are competing at the same scope. > > I agree with everything you say here. > > > On a single scheduling queue system, I > > think I would have the KSEG in the queue rather than the independent > > KSEs. When it gets to the head, you schedule > > KSEs on all the CPUs. This allows the threads to communicate quickly > > using shared memory should they want. The UTS has the entire quantum > > across as many CPUs as it has. > > As I mentioned in another email, I don't think we should plan on having a > production release that is implemented with only a single scheduling queue. Fair enough. > > Jason -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000 ---> X_.---._/ presently in: Budapest v To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Nov 28 12:49: 0 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mail.interware.hu (mail.interware.hu [195.70.32.130]) by hub.freebsd.org (Postfix) with ESMTP id E8F0D37B400; Tue, 28 Nov 2000 12:48:54 -0800 (PST) Received: from luanda-25.budapest.interware.hu ([195.70.51.25] helo=elischer.org) by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian)) id 140rgA-0004yq-00; Tue, 28 Nov 2000 21:48:43 +0100 Message-ID: <3A2419AD.43A14605@elischer.org> Date: Tue, 28 Nov 2000 12:46:37 -0800 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: "Brian F. Feldman" Cc: arch@FreeBSD.org, jasone@FreeBSD.org Subject: Re: Threads ..
chopping up 'struct proc' References: <200011262239.eAQMd0576413@green.dyndns.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG "Brian F. Feldman" wrote: > > Julian Elischer wrote: > > I've been looking at the proc structure.. > > > > The aim is to eventually move some of the fields into a > > struct KSE (struct schedbox?) > > struct KSEC (struct threadcontext?) > > struct KSEG (struct schedgroup?) > > Sounds about right, as far as I've been following the discussion (I read all > of -arch, but don't follow -smp at all since I just don't have SMP ;) > > My question thus far is, okay, given a proc has one of each; will a set of > threads, in any form, ALWAYS have a proc backing it up? It would make sense > as such, and in that case I'd think that you would reduce a lot of the > complexity in the switchover. > > > Initially we would simply include one of each of these in the struct proc, > > but link them together as if they were correctly connected up. > > we would use macros such as: > > #define p_estcpu p_kse.kse_estcpu > > to keep present code working.... > > eventually functions that get changed to receive a kse directly > > would just use kse->kse_estcpu and if they need proc they > > can use kse->kse_proc. But until then, we'd start by simply > > separating the fields and using macros. Then we can convert > > calls at our leisure. > > What would be the difference between doing it "right" for struct proc in the > first place rather than dummying them up? I wouldn't want an artificial > discrepancy here, if possible. Perhaps you could explain a bit more of the > vision you have here? I haven't been able to pick that bit up from your > posts as of yet. A KSE of just one thread would seem to logically be > handled the exact same as a process. > > > However when going through the fields in struct proc, > > some difficulties become obvious.
Here's my initial > > division of the fields. I've added a comment at the > > beginning of each line that indicates where I think > > it should go, however I'm not convinced about some of them: > > > > P = stays in struct proc > > E = goes to 'KSE' struct (schedulable entity) > > G = goes to 'group' struct > > C = goes to 'sleepable Context' struct. > > Does each KSE get a sleepable context? I don't know if I really see where > it fits in; sounds like it would have a 1:1 mapping with KSEs. > OK, I'm going to only answer this question here as I'm off to school in the morning and it's 12:30 AM now.. but you have a misconception so I'll try to clear that up quickly.. A KSE doesn't have a stack. It doesn't have any state WRT system call execution. When a system call happens, control passes from userland to a waiting KSE that is presently assigned to the processor you are on, and your process. The KSE grabs a spare KSEC (KSE context) (maybe it already has one sitting ready) and uses it. The KSEC supplies a stack and storage for anything that describes the state of the processor at any moment during the syscall. When the system call blocks, the KSEC is left on the sleep queue, and the KSE grabs another one, and performs an upcall to the Userland Thread scheduler, which schedules another thread. When THAT thread does a system call, the system call is executed, storing a set of frames and state onto the stack in the NEW KSEC. If, in turn, that blocks, it too is thrown onto the sleep queue. Everything needed to complete the system calls is in the KSECs, which are hibernating on the sleep queues. When the system call is reawakened, the kernel waits for a scheduling event in which a KSE from that process (possibly the same one) is being scheduled. It then reassociates the first KSEC (with its stack and stored processor context) with that KSE and then completes the system call (including any copyout()s or copyin()s).
However, instead of crossing back to user space when it gets back up to the boundary, it puts the syscall's return information in the mailbox that the Thread system configured (I skipped that bit) for that thread (don't worry, it's trivial), and checks if there are any more awakened syscalls to complete. It keeps doing this until there are no more awakening KSECs, at which time it does an upcall to the process. This results in the Userland Thread Scheduler (UTS) picking up all the completed threads, deciding which is the highest priority, and running it, as if it were just returning from the kernel. I forgot to mention that the mailboxes for the completed threads are linked together by the kernel before doing the upcall, and the resulting list is passed as a single pointer to the UTS. Note: the thread that was running when the KSE was pre-empted is also in the list of threads that is returned to the UTS when the upcall happens, so the UTS may decide to let it continue running. It didn't voluntarily do a syscall, but it did cross to the kernel when the timer interrupt occurred, so it can be faked up to look the same. If it was in a critical region, then of course it should have marked that fact, so it would be scheduled first. A process may have a KSE for each physical processor. When it creates a new KSE (up to the maximum of N) it sets up a KSE mailbox. When it schedules a thread, it places a pointer to the thread mailbox in the KSE mailbox. The KSE always knows where its mailbox is so it can always find the thread mailbox of the thread that just made the system call. When the syscall blocks, that thread mailbox address is stored in the KSEC, and it is zeroed out of the KSE's mailbox. When an upcall happens, the KSE adds the linked list of all completed syscalls' mailboxes to that same KSE mailbox as well.
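The mailbox handshake described above can be sketched in C as follows. This is a user-space simulation only, not kernel code: every name in it (thread_mailbox, kse_mailbox, uts, and the kern_* / uts_* functions) is hypothetical, the "kernel side" functions merely stand in for what the kernel would do, and the pre-empted-thread case is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread mailbox, owned by the UTS. */
struct thread_mailbox {
    int                    priority;    /* UTS priority; higher runs first */
    long                   syscall_ret; /* filled in when the syscall completes */
    struct thread_mailbox *next;        /* link for completed/runnable lists */
};

/* Hypothetical per-KSE mailbox; the kernel always knows its address. */
struct kse_mailbox {
    struct thread_mailbox *current;     /* thread now running on this KSE */
    struct thread_mailbox *completed;   /* list handed over by the upcall */
};

/* Hypothetical UTS state: an unsorted list of runnable threads. */
struct uts {
    struct thread_mailbox *runnable;
};

/* UTS: publish which thread is about to run on this KSE. */
void uts_dispatch(struct kse_mailbox *km, struct thread_mailbox *tm)
{
    km->current = tm;
}

/* Kernel (simulated): the running thread blocked in a syscall. The thread
 * mailbox address is returned (it would be kept in the KSEC) and the slot
 * in the KSE mailbox is zeroed. */
struct thread_mailbox *kern_syscall_blocks(struct kse_mailbox *km)
{
    struct thread_mailbox *tm = km->current;
    assert(tm != NULL);
    km->current = NULL;
    return tm;
}

/* Kernel (simulated): a blocked syscall finished; record the result and
 * link the thread mailbox onto the list delivered with the next upcall. */
void kern_syscall_done(struct kse_mailbox *km, struct thread_mailbox *tm,
                       long ret)
{
    tm->syscall_ret = ret;
    tm->next = km->completed;
    km->completed = tm;
}

/* UTS, entered by the upcall: absorb the completed list, then pick and
 * dispatch the highest-priority runnable thread (NULL if there is none). */
struct thread_mailbox *uts_upcall(struct uts *u, struct kse_mailbox *km)
{
    while (km->completed != NULL) {
        struct thread_mailbox *tm = km->completed;
        km->completed = tm->next;
        tm->next = u->runnable;
        u->runnable = tm;
    }

    /* Track the link pointing at the best candidate so it can be unlinked. */
    struct thread_mailbox **best = &u->runnable;
    for (struct thread_mailbox **pp = &u->runnable; *pp != NULL;
         pp = &(*pp)->next)
        if ((*pp)->priority > (*best)->priority)
            best = pp;

    struct thread_mailbox *tm = *best;
    if (tm != NULL) {
        *best = tm->next;
        tm->next = NULL;
        uts_dispatch(km, tm);
    }
    return tm;
}
```

Because the kernel hands over one pointer to a linked list, the number of threads that can complete between two upcalls is not bounded by the size of the KSE mailbox.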
The UTS just takes that list, and adds the threads mentioned onto its lists of runnable threads, and then makes a scheduling decision and runs the highest priority thread. It sets the mailbox address of that thread into the KSE's mailbox, and jumps into the thread.. etc. etc. I haven't mentioned KSEGs here, but if you are limited to N KSEs, you want a container into which you can put extra competing KSEs (for example for a super high priority thread). Usually you just have one KSEG, but you may start another, in which case they are treated by the system much like two separate processes, each with its own KSEs. More later. Julian -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000 ---> X_.---._/ presently in: Budapest v To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Nov 28 14:20:55 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mail.interware.hu (mail.interware.hu [195.70.32.130]) by hub.freebsd.org (Postfix) with ESMTP id 9001E37B400 for ; Tue, 28 Nov 2000 14:20:47 -0800 (PST) Received: from timbuktu-06.budapest.interware.hu ([195.70.51.198] helo=elischer.org) by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian)) id 140t7A-0004y2-00; Tue, 28 Nov 2000 23:20:40 +0100 Message-ID: <3A242FAF.313295F0@elischer.org> Date: Tue, 28 Nov 2000 14:20:31 -0800 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: Daniel Eischen Cc: arch@FreeBSD.ORG Subject: Re: Thread-specific data and KSEs References: Content-Type: text/plain; charset=iso-8859-15 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Daniel Eischen wrote: > > On Mon, 27 Nov 2000, Julian Elischer wrote: > > > > One thing I just realised: > > > > If we are using deferred FP state saving and restoring in the kernel, then we > > will have troubles with that when
switching threads in userland, since the > > handler for that is in the kernel. Of course we could set the place for it in > > the KSE mailbox and let the kernel save the information when it needs it. > > Our current threads library knows when to save and restore FP state; > it currently only happens when a signal is received (for i386, I think > alpha FP state is always saved both in jmp_buf and ucontext_t) That's comforting. I was looking at the ia64 specs. That thing presents some interesting challenges in regard to the 'intelligent' stack it has. It will be very hard to play games with its stack when it's cached inside the chip. I presume they have a scheme to allow such things as threads, but it looks a mess from here. > > I think we want to avoid saving and restoring FP state unless it's > necessary. That's probably only when a fault occurs or when the > KSE is preempted. I like the idea of having the kernel save the > FP state in the thread state storage area (ucontext_t?) in the > KSE mailbox thingy. The question is, what happens to the FPU context when you swap threads? Should each thread have its own FPU context? If there is one for the KSE, might that not be enough, especially if the KSE was pre-empted? If a thread is migrated to another KSE, having last been pre-empted, it becomes important that the FPU state go with it because it may have been part way through some calculations when that KSE was stopped. And what if the new KSE already has one that was stopped in the same way? It looks to me like you need to have one per thread. It's not much different from the point of view of the kernel. When you create a KSE you give it its mailbox. When you schedule a thread onto the KSE you set a pointer in that mailbox to the thread's context and state storage area. The kernel can easily follow that link when it pre-empts the KSE to store the general regs, the FPU regs etc.
Theoretically it might only store the regs there in a syscall if it looks like the syscall will block. But the aim would be to make all threads look the same when stopped so that the UTS can restart any one it chooses. > > Also, are we going to allow the kernel to follow links out of > the mailbox, or are we going to limit UTS<->kernel communication > to just this one page? I think it might be preferable to only > communicate via the mailbox and never have the kernel attempt > to read/write to other areas of KSE/thread storage. The kernel already has to follow links etc. for (for example) the readv() syscall. It's not that big a step. If you allocate all the thread-context blocks together, the pages they are in will be pretty hot. There are great advantages to having the KSEs able to follow links. For example, it means that the kernel can ALWAYS deliver a linked list of ALL completed and 'ready-to-run' threads. It can set them up so that each one will look exactly as if it has just returned from the syscall. If you only deal with one structure, you have to consider what happens when you cannot fit all returning threads into the single structure. As the kernel takes control (as the syscall or trap is entered) it notes where the context block is, and when and if it decides it needs to save context, it knows where to put it. The UTS is given everything on a plate, and it's almost easier to do it this way for the kernel too. It can store this address with the KSEC and use it without any fear of ever having a clash with some other returning syscall (for example). I can imagine a case where a syscall starts on one KSE and is completed on another. It makes sense for the context to travel with the thread/KSEC rather than the KSE, which may suddenly have 4 syscalls all coming back within the same upcall. (Where do you save all that data?) > For instance, > we could place the pointer to the thread state storage area > in the mailbox.
But that would require a copyin, and then a > copyout to another page that might be paged out. Since the thread storage is part of the thread control block that the UTS has just used to schedule the thread, it's unlikely to be paged out. Even less so if it shares a page with other thread control blocks. And you could always protect it with madvise(). In any case, what you suggest above is EXACTLY what Jason and I were planning on doing. The kernel will define a structure in /sys/i386/include/kse.h (or somewhere) called something like struct user_process_context, which you would include in your thread control block. It would include a link to other such blocks (so we can return a linked list of completed or pre-empted threads (KSECs)), a status word that says whether it is a completed syscall, a preempted thread, or whatever, and a cookie that the kernel doesn't touch so you can extend it with whatever else you need. > The drawback > of only using the mailbox is that it requires an additional copy > by the UTS every time an upcall is made (to copy the thread state > from the mailbox to the storage area in the thread). You forget that a single upcall may want to return 37 completed syscalls or pre-empted threads. Using my scheme, there is an upcall, and bingo, the UTS has ALL the completed items at once. It sorts them onto the runnable queues, selects its favourite, puts that address into its mailbox, loads the context, and it's off and running the next thread.
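A minimal sketch of how the linked-list handoff described above could look from the UTS side. The message names only the struct (user_process_context) and its three ingredients: a link, a status word, and a cookie the kernel never touches. Every field name, the status values, struct uts_thread, and uts_pick_next() are hypothetical and exist only for illustration; no real kse.h layout is implied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status values; the message only asks for "a status word". */
enum upc_status {
    UPC_SYSCALL_DONE,   /* a blocked syscall has completed */
    UPC_PREEMPTED       /* thread was running when the KSE was pre-empted */
};

/* Hypothetical layout; only the three ingredients come from the message. */
struct user_process_context {
    struct user_process_context *upc_link;   /* next completed/pre-empted thread */
    enum upc_status              upc_status; /* why this thread is being reported */
    void                        *upc_cookie; /* opaque to the kernel, owned by the UTS */
    /* saved general and FPU registers would follow here */
};

/* UTS-side thread control block that embeds the kernel-visible part. */
struct uts_thread {
    struct user_process_context ctx;
    int priority;                            /* larger means more urgent */
};

/*
 * What a UTS might do on an upcall: walk the whole list handed over by
 * the kernel and return the highest-priority thread, or NULL when the
 * list is empty. A real UTS would also queue the threads it does not pick.
 */
struct uts_thread *
uts_pick_next(struct user_process_context *completed)
{
    struct uts_thread *best = NULL;

    for (struct user_process_context *c = completed; c != NULL; c = c->upc_link) {
        struct uts_thread *t = c->upc_cookie; /* cookie points back at the TCB */
        assert(t != NULL);
        if (best == NULL || t->priority > best->priority)
            best = t;
    }
    return best;
}
```

Because each context block carries its own link, one upcall can report any number of threads without a fixed-size mailbox, which is the argument being made for letting the kernel follow pointers out of the mailbox.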
> > -- > Dan Eischen > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-arch" in the body of the message -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000 ---> X_.---._/ presently in: Budapest v To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Nov 28 15:27:24 2000 Delivered-To: freebsd-arch@freebsd.org Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222]) by hub.freebsd.org (Postfix) with ESMTP id 5B92537B401 for ; Tue, 28 Nov 2000 15:27:22 -0800 (PST) Received: from laptop.baldwin.cx (john@dhcp246.osd.bsdi.com [204.216.28.246]) by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id eASNQkC95291; Tue, 28 Nov 2000 15:26:46 -0800 (PST) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.4.0 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <3A242FAF.313295F0@elischer.org> Date: Tue, 28 Nov 2000 15:26:59 -0800 (PST) From: John Baldwin To: Julian Elischer Subject: Re: Thread-specific data and KSEs Cc: arch@FreeBSD.org, Daniel Eischen Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On 28-Nov-00 Julian Elischer wrote: > Daniel Eischen wrote: >> >> On Mon, 27 Nov 2000, Julian Elischer wrote: >> > >> > One thing I just realised: >> > >> > If we are using defered FP state saving and restoring in the kernel, then >> > we >> > will have troubles with that when switching threads in userland, since the >> > handler for that is in the kernel. Of course we could set the place for it >> > in >> > the KSE mailbox and let the kernel save the information when it needs it. 
>> >> Our current threads library knows when to save and restore FP state; >> it currently only happens when a signal is received (for i386, I think >> alpha FP state is always saved both in jmp_buf and ucontext_t) > > That's comforting. > > I was looking at the ia64 specs.. > that thing presents some interesting challenges in regards to > the 'intelligent' stack it has. It will be very hard to play > games with it's stack when it's cached inside the chip. I > presume they have a scheme to allow such things as threads, > but it looks a mess from here. You can disable the RSE and flush it out. This is done during context switches for example, and to setup the stack frame for signal handling I believe, though signal handling isn't quite finished yet. -- John Baldwin -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Nov 28 16:15:50 2000 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id E157137B404 for ; Tue, 28 Nov 2000 16:15:43 -0800 (PST) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id TAA18076; Tue, 28 Nov 2000 19:15:20 -0500 (EST) Date: Tue, 28 Nov 2000 19:15:19 -0500 (EST) From: Daniel Eischen To: Julian Elischer Cc: arch@FreeBSD.ORG Subject: Re: Thread-specific data and KSEs In-Reply-To: <3A242FAF.313295F0@elischer.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Tue, 28 Nov 2000, Julian Elischer wrote: > Daniel Eischen wrote: > > I think we want to avoid saving and restoring FP state unless it's > > necessary. That's probably only when a fault occurs or when the > > KSE is preempted. 
I like the idea of having the kernel save the > > FP state in the thread state storage area (ucontext_t?) in the > > KSE mailbox thingy. > > The question is, what happens to the FPU context when you swap threads? > should each thread have it's own FPU context? Yes, it currently does now. It's hard to imagine every thread not having FPU context. A thread that is preempted needs to have it's context restored when it runs again. You seem to be forgetting that the UTS can choose another thread to run and _not_ resume the preempted thread. So the UTS chooses another thread, runs it, and then you can have another fault or preemption which needs to save FP state again. > If there is one for the KSE, might that be not enough? > especially if the KSE was pre-empted. If a thread is migrated to > another KSE, having last been pre-empted, it becomes important > that the FPU state go with it because it may have been part way > through some calculations when that KSE was stopped. And what if > the new KSE already has one that wsa stopped in thasame way? > it looks to me like you need to have one per thread. Ahh, exactly. > It's not much different fromt eh point of view of the kernel. > WHen you create a KSE you give it's mailbox. > when you schedule a thread onto the KSE you set a pointer in that > mailbox to the thread's context and state storage area. > The kernel can easily follow that link when it pre-empts the KSE > to store the General regs, the FPU regs etc. > Theoretically it might only store the regs there in a syscall > if it looks like the syscall will block. but the aim would be > to make allthreads look the same when stopped so that the UTS > can restart any one it chooses. That would be my goal also. > > Also, are we going to allow the kernel to follow links out of > > the mailbox, or are we going to limit UTS<->kernel communication > > to just this one page? 
I think it might be preferable to only > > communicate via the mailbox and never have the kernel attempt > > to read/write to other areas of KSE/thread storage. > > The kernel already has to follow links etc for (for example) > the readv() syscall. it's not that big a step. If you allocate > all the thread-context blocks together, the pages they are in will > be pretty hot. There are great advantages to having the KSEs > being able to follow links. For example it means that the kernel > can ALWAYS deliver a linked list of ALL completed and 'ready-to-run' > threads. It can set them up so that each one will look exactly > as if it has just returned from the syscall. If you only deal > with one structure, you have to consider what happens when you > cannot fit all returning threads into the single structure. You should only have to copy context back to the KSE for one thread. When a thread blocks in the kernel, you don't need to wait to copyout it's context. You can do it immediately. All you need to pass out when the thread is resumed in the kernel and ready to return to userland is the return value from the system call. For faults and preemptions there are no return values I'd guess. > As the kernel takes control (as the syscall or trap is entered) > it notes where the context block is and when and if it decide > it needs to save context, it knows where to put it. The UTS is > given everything on a plate, and it's almost easier to do it > this way for the kernel too. It can store this address with > the KSEC and use it without any fear of ever having a clash with > some other returning syscall (for example). I can imagine where > a syscall starts on one KSE and is completed on another. > it makes sence for the context to travel with the thread/KSEC > rather than the KSE, which may suddenly have 4 syscalls > all coming back within the same upcall. (where do you save > all that data?) Here's the way I see it. 
A thread blocks in the kernel, is preempted, or has a fault. "There can be only one" (name the movie!) thread running in the KSE at a time. You copyout the context to the KSE, _then_ make the upcall. The KSE upcall handler then copies the context (along with FP state if saved) to the thread's context storage area. Another thread is chosen and executed. The kernel need not follow links; the KSE upcall handler can handle placing the context in the thread's storage area. When a thread becomes unblocked in the kernel, the UTS already has its context. All it needs now is enough information for the return value of the system call. If the UTS has to munge the context a bit, it can. This will let you set up the one page used for the mailbox so that it won't be paged out, without worrying about whether memory anywhere else in the UTS/thread is paged out. > > For instance, > > we could place the pointer to the thread state storage area > > in the mailbox. But that would require a copyin, and then a > > copyout to another page that might be paged out. > > Since the thread storage is part of the thread control block > that the UTS has just used to schedule the thread, it's > unlikely to be paged out. Even less so if it shares a page > with other thread control blocks > And you could always protect it with madvise(). > > In any case what you suggest above is EXACTLY what jason and I > were planning on doing. The kernel will define a structure > in /sys/i386/include/kse.h (or somewhere) called something like > struct user_process_context > which you would include in your thread control block. > it would include a link to other such blocks (so we can > return a linked list of completed or pre-empted threads (KSECs) > and a status word that says whether it is a completed syscall, or a > preempted thread, or whatever, and a cookie that the kernel > doesn't touch so you can extend it with whatever else you need.
Well, if you want to take the extra step of performing a copyin and following a link, I can live with that. But I wouldn't wait until the thread becomes unblocked to copyout its context. Just do it immediately and it's much easier. Actually, you have to copyout its context immediately. Let's say a thread blocks in the kernel on a read(). Then let's say the thread is sent a signal (and sa_flags is SA_RESTART). The UTS needs the thread's context (at least the stack pointer) so it can create a signal frame on top of its stack. The signal handler will run in the context of the thread while it is still blocked in the kernel. The thread will also need to use its context storage area because it may again be preempted or blocked. I've spent considerable time trying to get signal handling working correctly in our threads library, and this is about the only way that really works. Actually, there is even another problem. Suppose you have:

    static pthread_t tid;
    static jmp_buf jmpbuf;

    static void
    sighandler(int signo)
    {
            /* Only the target thread jumps out of its blocked read(). */
            if (signo == SIGALRM && pthread_equal(pthread_self(), tid))
                    _longjmp(jmpbuf, 1);
    }

    void *
    my_thread(void *arg)
    {
            char buf[128];
            int fd = (int)arg;
            int ret;

            /* Record which thread the handler should act on. */
            tid = pthread_self();
            if (_setjmp(jmpbuf) == 0) {
                    ret = read(fd, buf, sizeof(buf));
            } else {
                    printf("Thread is exiting.\n");
                    pthread_exit(NULL);
            }
            return (NULL);
    }

    /* Meanwhile, in another thread: */
    pthread_kill(tid, SIGUSR1);
    ...
    pthread_kill(tid, SIGALRM);
    ...

This is perfectly valid. And you can also have compilers that generate builtin longjmps for exception handling. In that case, we can't even wrap longjmp/_longjmp in order to do cleanup handling. So the kernel still thinks the read() is active. All sorts of gnarly stuff can happen. I think we need a way to tell the kernel to halt any pending activities for the KSEC that was blocked before trying to deliver any signals. If the thread returns normally from the signal handler, then the KSEC can be resumed.
In the case of an abnormal return/jump out of the signal handler, I don't know how we'd inform the kernel that the KSEC could be reused; the UTS doesn't know if the thread is still operating in the signal handler or has jumped out of it. It'd be nice if the UTS could retrieve the KSEC state/storage. If it could, the UTS could copy it to the signal handling frame so that a normal return from the signal handler could pass it back to the kernel. We need to work this out. > > The drawback > > of only using the mailbox is that it requires an additional copy > > by the UTS every time an upcall is made (to copy the thread state > > from the mailbox to the storage area in the thread). > > You forget that a single upcall may want to return 37 completed > syscalls or pre-empted threads. Already explained above. > Using my scheme.. there is an upcall, and bingo the UTS has ALL > the completed items at once. > It sorts them onto the runnable queues, selects it's favourite > and puts that address into it's mailbox, loads the context, > and it's off an running the next thread. -- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Nov 28 23:52: 3 2000 Delivered-To: freebsd-arch@freebsd.org Received: from homer.softweyr.com (bsdconspiracy.net [208.187.122.220]) by hub.freebsd.org (Postfix) with ESMTP id 139D337B401; Tue, 28 Nov 2000 23:52:01 -0800 (PST) Received: from [127.0.0.1] (helo=softweyr.com ident=Fools trust ident!) 
by homer.softweyr.com with esmtp (Exim 3.16 #1) id 14124g-0000QL-00; Wed, 29 Nov 2000 00:54:42 -0700 Message-ID: <3A24B642.34B50961@softweyr.com> Date: Wed, 29 Nov 2000 00:54:42 -0700 From: Wes Peters Organization: Softweyr LLC X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.2.12 i386) X-Accept-Language: en MIME-Version: 1.0 To: Matt Dillon Cc: Kris Kennaway , Jordan Hubbard , arch@FreeBSD.ORG, rps@merlin.mat.uc.pt Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) References: <52694.975362925@winston.osd.bsdi.com> <20001127144809.A67395@citusc17.usc.edu> <200011272307.eARN7Ln34886@earth.backplane.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Matt Dillon wrote: > > :> I just received this today and am kind of scratching my head over it. > :> On one hand, creating an "alias" for a one specific piece of terminal > :> character mapping seems a hack; I can see the idea behind wanting to > :> use one of n characters for something like backspacing or line-killing > :> (^U or ^X for example) and would not frown (as much) on a more general > :> aliasing feature. On the other hand, I can see that this specific > :> case (erase) is by far the most significant. Which is why I'm > :> forwarding this to arch - this is one of those classic > :> architecture/feature trade-off decisions and I would like to hear more > :> opinions before deciding which way I'd like to respond to this. > : > :This is a very common newbie problem ("Stupid FreeBSD won't let me > :delete what I've typed, it just prints ^H!"). Commit please! :) > : > :Kris > > This is one of those things where, 10 years ago, I would probably > have been a purist and been opposed to it. > > But after 15+ years of pure hell having to deal with every > conceivable combination of ^H and ^?, terminal types, > telnet, rlogin, ssh, and so on and so forth... 
I say to > hell with the purist view on this one. I'd love to > see this committed! IMHO, this is one of the biggest arguments for using bash. I get bitten all the time when I leave bash for another interactive program that no longer provides BS/DEL compatibility. Fixing it everywhere is a good idea. -- "Where am I, and what am I doing in this handbasket?" Wes Peters Softweyr LLC wes@softweyr.com http://softweyr.com/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 7:50:53 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mgw-dax2.ext.nokia.com (mgw-dax2.ext.nokia.com [63.78.179.217]) by hub.freebsd.org (Postfix) with ESMTP id 0CE6037B400 for ; Wed, 29 Nov 2000 07:50:52 -0800 (PST) Received: from davir03nok.americas.nokia.com (davir03nok.americas.nokia.com [172.18.242.86]) by mgw-dax2.ext.nokia.com (Switch-2.1.0/Switch-2.1.0) with ESMTP id eATFp2615063 for ; Wed, 29 Nov 2000 09:51:12 -0600 (CST) Received: from daebh01nok.americas.nokia.com (unverified) by davir03nok.americas.nokia.com (Content Technologies SMTPRS 4.1.5) with ESMTP id for ; Wed, 29 Nov 2000 09:50:31 -0600 Received: by daebh01nok with Internet Mail Service (5.5.2652.78) id ; Wed, 29 Nov 2000 09:46:12 -0600 Message-ID: From: Atul.Sharma@nokia.com To: arch@freebsd.org Subject: Date: Wed, 29 Nov 2000 09:42:17 -0600 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2652.78) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG subscribe arch@FreeBSD.ORG To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 10:11:58 2000 Delivered-To: freebsd-arch@freebsd.org Received: from net2.gendyn.com (nat2.gendyn.com [204.60.171.12]) by hub.freebsd.org (Postfix) with ESMTP id 2158F37B400 for ; Wed, 29 Nov 2000 10:11:55 -0800 (PST) Received: from 
[153.11.11.3] (helo=plunger.gdeb.com) by net2.gendyn.com with esmtp (Exim 2.12 #1) id 141Bhn-0002rX-00 for arch@freebsd.org; Wed, 29 Nov 2000 13:11:43 -0500 Received: from orion.caen.gdeb.com ([153.11.109.11]) by plunger.gdeb.com with ESMTP id NAA01053 for ; Wed, 29 Nov 2000 13:08:23 -0500 (EST) Received: from vigrid.com (gpz.clc.gdeb.com [192.168.3.12]) by orion.caen.gdeb.com (8.9.3/8.9.3) with ESMTP id NAA03358 for ; Wed, 29 Nov 2000 13:08:35 -0500 (EST) (envelope-from eischen@vigrid.com) Message-ID: <3A254710.ED8B2C26@vigrid.com> Date: Wed, 29 Nov 2000 13:12:32 -0500 From: Dan Eischen X-Mailer: Mozilla 4.75 [en] (X11; U; SunOS 5.8 sun4u) X-Accept-Language: en MIME-Version: 1.0 To: arch@freebsd.org Subject: Modifying FILE to add lock Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Is there any objection to modifying struct __sFILE in stdio.h to add a lock? I think we need to do this for libpthread. This should let us eliminate the _THREAD_SAFE macro.
-- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 10:47:43 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 2A71F37B402 for ; Wed, 29 Nov 2000 10:47:42 -0800 (PST) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id eATIlea24289; Wed, 29 Nov 2000 10:47:40 -0800 (PST) Date: Wed, 29 Nov 2000 10:47:40 -0800 From: Alfred Perlstein To: Dan Eischen Cc: arch@FreeBSD.ORG Subject: Re: Modifying FILE to add lock Message-ID: <20001129104740.L8051@fw.wintelcom.net> References: <3A254710.ED8B2C26@vigrid.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3A254710.ED8B2C26@vigrid.com>; from eischen@vigrid.com on Wed, Nov 29, 2000 at 01:12:32PM -0500 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Dan Eischen [001129 10:11] wrote: > Is there any objection to modifying struct __sFILE in stdio.h > to add a lock. I am think we need to do this for libpthread. > This should let us eliminate the _THREAD_SAFE macro. I have no objection as long as you bump the shared lib version from -stable. This would be a great time to do it. While you're at it adding one to DIR structs would be very helpful for fixing our threadsafeness with DIR handles. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 10:50:41 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 9239B37B400 for ; Wed, 29 Nov 2000 10:50:39 -0800 (PST) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id eATIocM24437; Wed, 29 Nov 2000 10:50:38 -0800 (PST) Date: Wed, 29 Nov 2000 10:50:38 -0800 From: Alfred Perlstein To: Dan Eischen Cc: arch@FreeBSD.ORG Subject: Re: Modifying FILE to add lock Message-ID: <20001129105038.M8051@fw.wintelcom.net> References: <3A254710.ED8B2C26@vigrid.com> <20001129104740.L8051@fw.wintelcom.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20001129104740.L8051@fw.wintelcom.net>; from bright@wintelcom.net on Wed, Nov 29, 2000 at 10:47:40AM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Alfred Perlstein [001129 10:47] wrote: > * Dan Eischen [001129 10:11] wrote: > > Is there any objection to modifying struct __sFILE in stdio.h > > to add a lock. I am think we need to do this for libpthread. > > This should let us eliminate the _THREAD_SAFE macro. > > I have no objection as long as you bump the shared lib version > from -stable. This would be a great time to do it. ...er but only if they aren't already bumped, if libc in 4.x is at 4 and in 5-current is at 5 already then leave the versions alone. > > While you're at it adding one to DIR structs would be very helpful > for fixing our threadsafeness with DIR handles. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 10:55:30 2000 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id BCF7F37B401 for ; Wed, 29 Nov 2000 10:55:25 -0800 (PST) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id NAA21275; Wed, 29 Nov 2000 13:54:55 -0500 (EST) Date: Wed, 29 Nov 2000 13:54:55 -0500 (EST) From: Daniel Eischen To: Alfred Perlstein Cc: Dan Eischen , arch@FreeBSD.ORG Subject: Re: Modifying FILE to add lock In-Reply-To: <20001129104740.L8051@fw.wintelcom.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, 29 Nov 2000, Alfred Perlstein wrote: > * Dan Eischen [001129 10:11] wrote: > > Is there any objection to modifying struct __sFILE in stdio.h > > to add a lock. I am think we need to do this for libpthread. > > This should let us eliminate the _THREAD_SAFE macro. > > I have no objection as long as you bump the shared lib version > from -stable. This would be a great time to do it. This would only be in -current (where the library versions have already been bumped) and for our new libpthread. > While you're at it adding one to DIR structs would be very helpful > for fixing our threadsafeness with DIR handles. Thanks! I missed that one. 
-- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 11:48:30 2000 Delivered-To: freebsd-arch@freebsd.org Received: from palrel3.hp.com (palrel3.hp.com [156.153.255.226]) by hub.freebsd.org (Postfix) with ESMTP id 62EA037B699 for ; Wed, 29 Nov 2000 11:48:28 -0800 (PST) Received: from adlmail.cup.hp.com (adlmail.cup.hp.com [15.0.100.30]) by palrel3.hp.com (Postfix) with ESMTP id 2B4BF37E; Wed, 29 Nov 2000 11:48:27 -0800 (PST) Received: from cup.hp.com (p1000180.nsr.hp.com [15.109.0.180]) by adlmail.cup.hp.com (8.9.3 (PHNE_18546)/8.9.3 SMKit7.02) with ESMTP id LAA17261; Wed, 29 Nov 2000 11:48:26 -0800 (PST) Message-ID: <3A255D8A.7F5CFB26@cup.hp.com> Date: Wed, 29 Nov 2000 11:48:26 -0800 From: Marcel Moolenaar Organization: Hewlett-Packard X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.12 i386) X-Accept-Language: en MIME-Version: 1.0 To: Daniel Eischen Cc: Alfred Perlstein , arch@FreeBSD.ORG Subject: Re: Modifying FILE to add lock References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Daniel Eischen wrote: > > > I have no objection as long as you bump the shared lib version > > from -stable. This would be a great time to do it. > > This would only be in -current (where the library versions have > already been bumped) and for our new libpthread. I agree. We should not MFC this. Library version bumps halfway on a -stable branch doesn't seem appropriate. Changing structures is not done, IMO. Anyway: In stdio.h I'm told that I should read the warning before changing the layout of struct __sFILE. There doesn't seem to be a warning anywhere in the header file, so I figure it must be the long comment before the struct declaration. The comment doesn't tell me what happens if I change the layout. 
It only tells me what certain fields are for and doesn't mention _offset at all, even though that field specifically references the warning. My point: We're hinted to be careful and cautious without actually being told why. Can someone tell me what problems we might expect if we add a new field, both specifically at the end and randomly within the structure? -- Marcel Moolenaar mail: marcel@cup.hp.com / marcel@FreeBSD.org tel: (408) 447-4222 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 12: 7:33 2000 Delivered-To: freebsd-arch@freebsd.org Received: from alive.znep.com (sense-sea-MegaSub-1-500.oz.net [216.39.145.246]) by hub.freebsd.org (Postfix) with ESMTP id 119EB37B400 for ; Wed, 29 Nov 2000 12:07:31 -0800 (PST) Received: from localhost (marcs@localhost) by alive.znep.com (8.9.3/8.9.1) with ESMTP id MAA74914; Wed, 29 Nov 2000 12:03:40 -0800 (PST) (envelope-from marcs@znep.com) Date: Wed, 29 Nov 2000 12:03:40 -0800 (PST) From: Marc Slemko To: Marcel Moolenaar Cc: Daniel Eischen , Alfred Perlstein , arch@FreeBSD.ORG Subject: Re: Modifying FILE to add lock In-Reply-To: <3A255D8A.7F5CFB26@cup.hp.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, 29 Nov 2000, Marcel Moolenaar wrote: > Anyway: In stdio.h I'm told that I should read the warning before > changing the layout of struct __sFILE. There doesn't seem to be a > warning anywhere in the header file, so I figure it must be the long Look in an older revision (eg. 1.1) and you will see a few comments earlier some alignment warnings that have since gone the way of the dodo, sortof. I think that is what it is referring to... > comment before the struct declaration. The comment doesn't tell me what > happens if I change the layout. 
It only tells me what certain fields are > for and doesn't mention _offset at all, even though that field > specifically references the warning. > > My point: We're hinted to be careful and cautious without actually being > told why. Can someone tell me what problems we might expect if we add a > new field, both specifically at the end and randomly within the > structure? If you add a new field in the middle, then any programs compiled against the old header file that have to access anything in the struct after your addition will potentially fall over horribly since a lot of the access to random fields is done with macros. If you add a field at the end, then anything that allocates memory for a FILE will break, although it is bogus to do that anyway. There is a reason why Solaris was stuck with the lame 8-bit limit on the size of the file descriptor behind a stream for so long, until they changed stuff around anyway in a new (at the time) 64-bit ABI and bumped it up there to a sane number... 
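The point about macros can be made concrete with a small sketch. The structures below are hypothetical stand-ins, not the real struct __sFILE; they only assume that an old binary has the offset of a field compiled into it, as the stdio macros do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "old" stream layout; NOT the real struct __sFILE. */
struct stream_v1 {
	unsigned char *p;	/* current position in buffer */
	int r;			/* bytes left to read */
	short flags;
	short file;		/* file descriptor */
};

/* New field inserted in the middle: every later field moves. */
struct stream_v2_middle {
	unsigned char *p;
	int r;
	long lock;		/* hypothetical new field */
	short flags;
	short file;
};

/* New field appended: old offsets survive, but the size grows. */
struct stream_v2_end {
	unsigned char *p;
	int r;
	short flags;
	short file;
	long lock;		/* hypothetical new field */
};

/*
 * What a binary compiled against the v1 header effectively does:
 * read `file` at the v1 offset, whatever the library really passed it.
 */
static short
old_binary_fileno(const void *sp)
{
	short fd;

	memcpy(&fd, (const char *)sp + offsetof(struct stream_v1, file),
	    sizeof(fd));
	return (fd);
}
```

With the field appended, old_binary_fileno() still finds the descriptor; with the field in the middle it reads whatever now lives at the old offset. The appended case still breaks any caller that allocates or copies sizeof(struct stream_v1) bytes itself.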
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 12: 8:21 2000 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id 7D6CC37B401 for ; Wed, 29 Nov 2000 12:08:19 -0800 (PST) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id OAA01700; Wed, 29 Nov 2000 14:57:51 -0500 (EST) Date: Wed, 29 Nov 2000 14:57:51 -0500 (EST) From: Daniel Eischen To: Marcel Moolenaar Cc: Alfred Perlstein , arch@FreeBSD.ORG Subject: Re: Modifying FILE to add lock In-Reply-To: <3A255D8A.7F5CFB26@cup.hp.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, 29 Nov 2000, Marcel Moolenaar wrote: > Daniel Eischen wrote: > > > > > I have no objection as long as you bump the shared lib version > > > from -stable. This would be a great time to do it. > > > > This would only be in -current (where the library versions have > > already been bumped) and for our new libpthread. > > I agree. We should not MFC this. Library version bumps halfway on a > -stable branch doesn't seem appropriate. Changing structures is not > done, IMO. > > Anyway: In stdio.h I'm told that I should read the warning before > changing the layout of struct __sFILE. There doesn't seem to be a > warning anywhere in the header file, so I figure it must be the long > comment before the struct declaration. The comment doesn't tell me what > happens if I change the layout. It only tells me what certain fields are > for and doesn't mention _offset at all, even though that field > specifically references the warning. I was also confused by the warning, which is part of the reason I posted this to -arch. 
-- Dan Eischen

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Wed Nov 29 12:55:12 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id EC9E537B404 for ; Wed, 29 Nov 2000 12:55:09 -0800 (PST)
Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id eATKt9t27977 for arch@freebsd.org; Wed, 29 Nov 2000 12:55:09 -0800 (PST)
Date: Wed, 29 Nov 2000 12:55:09 -0800
From: Alfred Perlstein
To: arch@freebsd.org
Subject: serious problem with mutexs and userland visibility?
Message-ID: <20001129125508.O8051@fw.wintelcom.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I'm looking for opinions from those who have:

1) been working closely on SMPng.
2) have a lot of experience with doing the right thing with header files and untangling evil dependencies.
3) have dealt with this situation on other operating systems.

I recently locked down struct ucred, not a big deal, basically just a mutex in each struct to protect the refcount.

Unfortunately struct ucred is used by some userland utils and sys/ucred.h is included in sys/mount.h as well as sys/user.h, this creates somewhat of a problem, forcing all users of sys/ucred.h to include sys/mutex.h.

I have a patch here that sort of takes care of this problem, the problem is that I had to add sys/mutex.h includes to both sys/mount.h and sys/user.h, this doesn't make me very happy. It actually removes some bogus includes of sys/ucred.h from userland.

http://people.FreeBSD.org/~alfred/mpsafe/bde.diff
(you can all guess why it's called "bde.diff" :) )

What I'd like to do is make a struct 'kucred' which contains the mutex and either contains a struct ucred or all the fields of struct ucred.
'kucred' will be used by the kernel and I'll write helper functions/macros to convert between the two.

This looks like a lot of drudgework, but I'm ok with it. However if it becomes the only way to deal with this situation we may have a lot of drudgework ahead of us when this issue starts popping up with other structures.

For instance, the uidinfo struct isn't currently exported to the user, however it would be nice if it was, to determine how far off one was from exceeding their limits. We would need another kernel/userland conversion pair for this facility if anyone wanted to export the information contained in this structure.

If the general consensus is that exporting sys/mutex.h to userland is to be avoided, but OK when necessary, then I'd rather just apply the patch I have right now.

Right now I'm of the opinion of "by any means necessary", meaning I really don't care about the visibility, proceeding with the mpsafe work is far more important than header pollution right now. I'm just concerned about taking it too far.

BSD/OS gets around the struct ucred problem by having a single ucred mutex used for the entire system, I don't like this because even though it's a very short term lock, it will be cache contested heavily between processors causing large amounts of bus traffic.

I also don't like the BSD/OS approach, because it doesn't address the problem of mutexes being inside structures declared in userland included headers, it just avoids it for this specific case.

thanks,
-- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
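A minimal sketch of the 'kucred' idea with its conversion helpers might look like the following. Every name here (demo_mutex, demo_ucred, demo_kucred, the field names, DEMO_NGROUPS) is hypothetical and chosen for illustration; none of it is the real struct ucred or the patch referenced above.

```c
#include <assert.h>
#include <string.h>

#define DEMO_NGROUPS 16			/* hypothetical group limit */

/* Hypothetical stand-in for the kernel mutex from sys/mutex.h. */
struct demo_mutex {
	int locked;
};

/* The part userland is allowed to see (illustrative fields only). */
struct demo_ucred {
	unsigned int cr_uid;
	short cr_ngroups;
	unsigned int cr_groups[DEMO_NGROUPS];
};

/* Kernel-only wrapper: lock and refcount plus the exported fields. */
struct demo_kucred {
	struct demo_mutex kc_mtx;	/* protects kc_ref */
	unsigned int kc_ref;
	struct demo_ucred kc_cred;
};

/* Kernel -> userland: hand out only the lock-free part. */
static void
kucred_to_ucred(const struct demo_kucred *kc, struct demo_ucred *uc)
{
	*uc = kc->kc_cred;
}

/* Userland -> kernel: start from a clean lock and a single reference. */
static void
ucred_to_kucred(const struct demo_ucred *uc, struct demo_kucred *kc)
{
	memset(kc, 0, sizeof(*kc));
	kc->kc_ref = 1;
	kc->kc_cred = *uc;
}
```

The cost described in the message shows up directly: each kernel structure that gains a lock needs its own pair of helpers like these.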
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 13:53:16 2000 Delivered-To: freebsd-arch@freebsd.org Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.242]) by hub.freebsd.org (Postfix) with ESMTP id 15E4D37B404 for ; Wed, 29 Nov 2000 13:53:14 -0800 (PST) Received: from adlmail.cup.hp.com (adlmail.cup.hp.com [15.0.100.30]) by palrel1.hp.com (Postfix) with ESMTP id 32DFB1093; Wed, 29 Nov 2000 13:53:02 -0800 (PST) Received: from cup.hp.com (gauss.cup.hp.com [15.28.97.152]) by adlmail.cup.hp.com (8.9.3 (PHNE_18546)/8.9.3 SMKit7.02) with ESMTP id NAA22402; Wed, 29 Nov 2000 13:53:01 -0800 (PST) Message-ID: <3A257ABD.5238ED4E@cup.hp.com> Date: Wed, 29 Nov 2000 16:53:01 -0500 From: Marcel Moolenaar Organization: Hewlett-Packard X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.12 i386) X-Accept-Language: en MIME-Version: 1.0 To: Marc Slemko Cc: Daniel Eischen , Alfred Perlstein , arch@FreeBSD.ORG Subject: Re: Modifying FILE to add lock References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Marc Slemko wrote: > > If you add a field at the end, then anything that allocates memory > for a FILE will break, although it is bogus to do that anyway. Having done the signal changes, I immediately have to think about the Modula port... 
-- Marcel Moolenaar mail: marcel@cup.hp.com / marcel@FreeBSD.org tel: (408) 447-4222 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 14: 3:54 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 4EDDC37B400 for ; Wed, 29 Nov 2000 14:03:52 -0800 (PST) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id eATM1JP29867; Wed, 29 Nov 2000 14:01:19 -0800 (PST) Date: Wed, 29 Nov 2000 14:01:19 -0800 From: Alfred Perlstein To: Marcel Moolenaar Cc: Marc Slemko , Daniel Eischen , arch@FreeBSD.ORG Subject: Re: Modifying FILE to add lock Message-ID: <20001129140119.P8051@fw.wintelcom.net> References: <3A257ABD.5238ED4E@cup.hp.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3A257ABD.5238ED4E@cup.hp.com>; from marcel@cup.hp.com on Wed, Nov 29, 2000 at 04:53:01PM -0500 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Marcel Moolenaar [001129 13:53] wrote: > Marc Slemko wrote: > > > > If you add a field at the end, then anything that allocates memory > > for a FILE will break, although it is bogus to do that anyway. > > Having done the signal changes, I immediately have to think about the > Modula port... I've never ever looked at the contents of struct FILE except to research how stdio works. Why do we need to care about the contents of struct FILE (or DIR)? We have funopen do deal with creating our own special streams, what's the point of digging into struct FILE? -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 14:43:36 2000 Delivered-To: freebsd-arch@freebsd.org Received: from palrel3.hp.com (palrel3.hp.com [156.153.255.226]) by hub.freebsd.org (Postfix) with ESMTP id DF2C737B401 for ; Wed, 29 Nov 2000 14:43:34 -0800 (PST) Received: from adlmail.cup.hp.com (adlmail.cup.hp.com [15.0.100.30]) by palrel3.hp.com (Postfix) with ESMTP id 5FBDA420; Wed, 29 Nov 2000 14:43:34 -0800 (PST) Received: from cup.hp.com (gauss.cup.hp.com [15.28.97.152]) by adlmail.cup.hp.com (8.9.3 (PHNE_18546)/8.9.3 SMKit7.02) with ESMTP id OAA24388; Wed, 29 Nov 2000 14:43:34 -0800 (PST) Message-ID: <3A258696.EAD7BD7A@cup.hp.com> Date: Wed, 29 Nov 2000 17:43:34 -0500 From: Marcel Moolenaar Organization: Hewlett-Packard X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.12 i386) X-Accept-Language: en MIME-Version: 1.0 To: Alfred Perlstein Cc: Marc Slemko , Daniel Eischen , arch@FreeBSD.ORG Subject: Re: Modifying FILE to add lock References: <3A257ABD.5238ED4E@cup.hp.com> <20001129140119.P8051@fw.wintelcom.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Alfred Perlstein wrote: > > I've never ever looked at the contents of struct FILE except to > research how stdio works. Why do we need to care about the > contents of struct FILE (or DIR)? We have funopen do deal with > creating our own special streams, what's the point of digging > into struct FILE? The fact that you (and I) can't see the point, doesn't mean there is no point. Ignoring the fact that maybe there's a point somehow or somewhere is far more worse than reaching general consensus that there likely is no point at all. Modula has some weird architecture and OS dependencies, IIRC. It doesn't hurt to check it out before we commit the change. 
-- Marcel Moolenaar mail: marcel@cup.hp.com / marcel@FreeBSD.org tel: (408) 447-4222 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 17:49: 7 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id CE62F37B401 for ; Wed, 29 Nov 2000 17:49:05 -0800 (PST) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id eAU1n5K05837 for arch@FreeBSD.ORG; Wed, 29 Nov 2000 17:49:05 -0800 (PST) Date: Wed, 29 Nov 2000 17:49:05 -0800 From: Alfred Perlstein To: arch@FreeBSD.ORG Subject: HEADSUP user struct ucred -> xucred (Was: Re: serious problem with mutexs and userland visibility?) Message-ID: <20001129174905.S8051@fw.wintelcom.net> References: <20001129125508.O8051@fw.wintelcom.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20001129125508.O8051@fw.wintelcom.net>; from bright@wintelcom.net on Wed, Nov 29, 2000 at 12:55:09PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Alfred Perlstein [001129 12:55] wrote: > > I recently locked down struct ucred, not a big deal, basically just > a mutex in each struct to protect the refcount. > > Unfortunetly struct ucred is used by some userland utils and > sys/ucred is included in sys/mount.h as well as sys/user.h, this > creates somewhat of a problem, forcing all users of sys/ucred.h to > include sys/mutex.g. > > I have a patch here that sort of takes care of this problem, the > problem is that I had to add sys/mutex.h includes to both sys/mount.h > and sys/user.h, this doesn't make me very happy. After a short discussion it has been determined that there will be a xucred exported to userland following the concention of xsocket and the various other xfoo structs exported to the kernel. 
Struct ucred will no longer be visible outside the kernel. Any userland things using struct ucred will need to use xucred. This will be the convention used to resolve mutex (or other MD fields) in kernel exported structures in the future. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 22:16:59 2000 Delivered-To: freebsd-arch@freebsd.org Received: from panzer.kdm.org (panzer.kdm.org [216.160.178.169]) by hub.freebsd.org (Postfix) with ESMTP id 88A4C37B400; Wed, 29 Nov 2000 22:16:54 -0800 (PST) Received: (from ken@localhost) by panzer.kdm.org (8.9.3/8.9.1) id XAA01540; Wed, 29 Nov 2000 23:16:53 -0700 (MST) (envelope-from ken) Date: Wed, 29 Nov 2000 23:16:53 -0700 From: "Kenneth D. Merry" To: arch@FreeBSD.org Cc: gallatin@FreeBSD.org Subject: zero copy code review Message-ID: <20001129231653.A1503@panzer.kdm.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG [ -net and -current BCCed for wider coverage, this is probably best handled on -arch ] I would like to request reviews of the zero copy sockets and NFS code I've been posting about for months: http://people.FreeBSD.org/~ken/zero_copy There are diffs posted above against -current as of early November 28th, along with a FAQ, and change log. These diffs include changes in: - the socket code - NFS code - VM code - ti(4) driver - sendfile code Much of the code was written by Drew Gallatin , but I wrote a lot of the ti(4) driver mods and cleaned things up a fair bit. The code is stable, and I don't know of any bugs at the moment. I have run with it enabled on one of my main development boxes for months without any problems. The way things are currently configured, it is not turned on by default. 
You need two kernel options and a sysctl to turn it on. The zero copy NFS code can be turned on with gdb, although it might be better to make that into a sysctl. (I haven't played with the zero copy NFS code much, Drew has done much more with that.) How to turn the code on is covered in the web page, above. Anyway, I'd like to commit this code sometime next week, if no one comes up with any issues or problems. Comments, bug reports, etc., are welcome. Thanks! Ken -- Kenneth Merry ken@kdm.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 22:33:37 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp01.primenet.com (smtp01.primenet.com [206.165.6.131]) by hub.freebsd.org (Postfix) with ESMTP id 6E7C437B400 for ; Wed, 29 Nov 2000 22:33:34 -0800 (PST) Received: (from daemon@localhost) by smtp01.primenet.com (8.9.3/8.9.3) id XAA16956; Wed, 29 Nov 2000 23:32:19 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp01.primenet.com, id smtpdAAAIdaigE; Wed Nov 29 23:27:07 2000 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id XAA06955; Wed, 29 Nov 2000 23:28:13 -0700 (MST) From: Terry Lambert Message-Id: <200011300628.XAA06955@usr08.primenet.com> Subject: Re: HEADSUP user struct ucred -> xucred (Was: Re: serious problem with mutexs and userland visibility?) To: bright@wintelcom.net (Alfred Perlstein) Date: Thu, 30 Nov 2000 06:28:12 +0000 (GMT) Cc: arch@FreeBSD.ORG In-Reply-To: <20001129174905.S8051@fw.wintelcom.net> from "Alfred Perlstein" at Nov 29, 2000 05:49:05 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > I recently locked down struct ucred, not a big deal, basically just > > a mutex in each struct to protect the refcount. 
> >
> > Unfortunetly struct ucred is used by some userland utils and
> > sys/ucred is included in sys/mount.h as well as sys/user.h, this
> > creates somewhat of a problem, forcing all users of sys/ucred.h to
> > include sys/mutex.g.
> >
> > I have a patch here that sort of takes care of this problem, the
> > problem is that I had to add sys/mutex.h includes to both sys/mount.h
> > and sys/user.h, this doesn't make me very happy.
>
> After a short discussion it has been determined that there will be
> a xucred exported to userland following the concention of xsocket
> and the various other xfoo structs exported to the kernel.
>
> Struct ucred will no longer be visible outside the kernel.
>
> Any userland things using struct ucred will need to use xucred.
>
> This will be the convention used to resolve mutex (or other MD
> fields) in kernel exported structures in the future.

This is a really gross way to handle this. The ucred structure is used by a lot of user space programs.

You should do what several UNIX vendors have already done, and implement a MUTEX() declaration macro that differs in user and kernel space, and forces an alignment; then when you copy out, copy out everything _BUT_ the mutex portion to the user space, and no user space source or object code will need to change. So:

	#ifdef _KERNEL
	#define MUTEX(x)	mutex_t x;
	#define UREF(x,y)	(void *)&((x)->y)
	#else
	#define MUTEX(x)	/* user space = no mutex */
	#define UREF(x,y)	(void *)(x)
	#endif

	struct foo {
		MUTEX(save_foo_from_bad_programmers)
		int	normal_foo_item_1;
		char	normal_foo_item_2;
		...
	};

	...
	struct foo *foop;
	...
	copyout( UREF(foop, normal_foo_item_1), user_space_foo);

It is much better to not impact user space code at all.

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present or previous employers.
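The MUTEX()/UREF() idea above can be tried out in userland with a rough sketch. It assumes a made-up lock type (fake_mutex), writes the kernel and user views as two separate structs because one translation unit cannot be compiled both with and without _KERNEL, and uses memcpy as a stand-in for copyout. The length argument is an addition of the sketch; the call in the message does not show one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel mutex type. */
struct fake_mutex {
	long words[4];
};

/* Kernel view: what struct foo looks like when MUTEX() expands to a lock. */
struct kfoo {
	struct fake_mutex save_foo_from_bad_programmers;
	int normal_foo_item_1;
	char normal_foo_item_2;
};

/* User view: the same declaration with MUTEX() expanding to nothing. */
struct ufoo {
	int normal_foo_item_1;
	char normal_foo_item_2;
};

/* Copy out everything but the mutex; memcpy stands in for copyout(). */
static void
export_foo(const struct kfoo *kp, struct ufoo *up)
{
	size_t skip = offsetof(struct kfoo, normal_foo_item_1);

	/* The trick relies on the tail of kfoo matching ufoo exactly. */
	assert(sizeof(struct kfoo) - skip == sizeof(struct ufoo));
	memcpy(up, (const char *)kp + skip, sizeof(*up));
}
```

The assertion is where the "forces an alignment" remark matters: if the lock's alignment changed the padding of the fields after it, the two views would no longer line up and the copy would be wrong.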
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 22:53:20 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (Postfix) with ESMTP id 3A33A37B402 for ; Wed, 29 Nov 2000 22:53:17 -0800 (PST) Received: (from daemon@localhost) by smtp04.primenet.com (8.9.3/8.9.3) id XAA24967; Wed, 29 Nov 2000 23:49:23 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp04.primenet.com, id smtpdAAAUqayRW; Wed Nov 29 23:49:19 2000 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id XAA07381; Wed, 29 Nov 2000 23:53:09 -0700 (MST) From: Terry Lambert Message-Id: <200011300653.XAA07381@usr08.primenet.com> Subject: Re: Modifying FILE to add lock To: marcel@cup.hp.com (Marcel Moolenaar) Date: Thu, 30 Nov 2000 06:53:09 +0000 (GMT) Cc: bright@wintelcom.net (Alfred Perlstein), marcs@znep.com (Marc Slemko), eischen@vigrid.com (Daniel Eischen), arch@FreeBSD.ORG In-Reply-To: <3A258696.EAD7BD7A@cup.hp.com> from "Marcel Moolenaar" at Nov 29, 2000 05:43:34 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > I've never ever looked at the contents of struct FILE except to > > research how stdio works. Why do we need to care about the > > contents of struct FILE (or DIR)? We have funopen do deal with > > creating our own special streams, what's the point of digging > > into struct FILE? > > The fact that you (and I) can't see the point, doesn't mean there is no > point. Ignoring the fact that maybe there's a point somehow or somewhere > is far more worse than reaching general consensus that there likely is > no point at all. > > Modula has some weird architecture and OS dependencies, IIRC. 
It doesn't
> hurt to check it out before we commit the change.

There are a number of programs which traditionally need to be able to access the contents of the FILE buffers directly, particularly with regard to things like "unget", and so on.

Mostly, these are mixed-mode programs, which do things like bounce in and out of raw mode, or set cbreak, or modify the value of vmin or vtime, and wish to act properly on already typed-ahead or ungetc()'ed characters that have been buffered.

It would be terrifically useful, for example, for getpass() to use this to permit scripting of the creation of user accounts. That it does not work that way means you have to resort to "pw" (a perl abomination) to get the job done right.

Historically, things like EMACS and simulations that like to implement command "inertia" (no command in the timeout window means the previous command is in effect) tend to directly manipulate buffered input contents.

There is at least one "curses"-like library of which I'm aware that actually manipulates buffered output contents to remove redundant output (e.g. "don't draw X there, if you are going to draw Y there immediately afterward"). It's very useful for slow links for things like text editors, where I can delete a character, insert another, and end up with only a single character being redrawn once, instead of everything from the deletion/insertion point to the end of the line being rendered twice.

There are also programs which move stdin/out/err around to effect certain features, without telling the program about it (screen used to be one, so that it could support session detach and reattach).

Suffice it to say that not everyone uses the macros, and those who do, tend to not want to recompile the world.

You might consider using the old "debugging malloc" trick, of allocating one structure, but referring to another, and reference your "hidden" lock at a negative offset.
This would let you pass around FILE objects that were allocated larger than they were supposed to be, and reference locks at a negative offset. This would require some simple pointer math on allocation, and would ensure binary backward compatibility with old programs and the new libc, without requiring a version bump at all.

If you use this trick, be wary of "#pragma pack()" in scope, since unlike the kernel MUTEX() trick, the relative location of the start of the shadow structure will end up moving around, if you aren't explicit.

	struct foo {
		whatever;
		whatever;
		...
	};

	struct foo_with_lock {
		LOCK		alfreds_new_lock;
		struct foo	internal_foo;
	};

Pass around:

	struct foo *foop = &(foo_with_lockp->internal_foo);

Reference the lock with:

	CVT_TO_LOCKED(struct foo_with_lock, foop)->alfreds_new_lock

	/* cast back to (x *), fully parenthesized, so that -> can be applied */
	#define CVT_TO_LOCKED(x,y) \
		((x *)(((char *)(y)) - (int)&(((x *)0)->internal_foo)))

I would probably force the packing around the declaration in the header file.

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present or previous employers.
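The negative-offset trick can be sketched as a small self-contained allocator. This is only an illustration under assumptions: LOCK is taken to be a plain int, struct foo has two made-up fields, the helper names (foo_alloc, foo_lock, foo_free) are invented, and offsetof() from <stddef.h> replaces the null-pointer arithmetic. Note the cast back to the wrapper type; a macro that yields void * cannot be dereferenced with ->.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int LOCK;		/* hypothetical lock type */

/* The public structure old binaries know about (fields are made up). */
struct foo {
	int a;
	int b;
};

/* The real allocation: the lock lives in front of the public part. */
struct foo_with_lock {
	LOCK alfreds_new_lock;
	struct foo internal_foo;
};

/* Step back from the public pointer to the enclosing wrapper. */
#define CVT_TO_LOCKED(type, p) \
	((type *)((char *)(p) - offsetof(type, internal_foo)))

/* Allocate the larger object, hand out a pointer to the inner one. */
static struct foo *
foo_alloc(void)
{
	struct foo_with_lock *fl = calloc(1, sizeof(*fl));

	return (fl != NULL ? &fl->internal_foo : NULL);
}

/* Find the hidden lock at a negative offset from the public pointer. */
static LOCK *
foo_lock(struct foo *fp)
{
	return (&CVT_TO_LOCKED(struct foo_with_lock, fp)->alfreds_new_lock);
}

/* Free through the wrapper, since that is what calloc() returned. */
static void
foo_free(struct foo *fp)
{
	if (fp != NULL)
		free(CVT_TO_LOCKED(struct foo_with_lock, fp));
}
```

The catch is that this only works for objects the library allocated itself. A struct foo that a program declared statically or allocated on its own has no lock in front of it, so the library would need some way to tell the two apart.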
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 22:55:52 2000 Delivered-To: freebsd-arch@freebsd.org Received: from hand.dotat.at (sfo-gw.covalent.net [207.44.198.62]) by hub.freebsd.org (Postfix) with ESMTP id 7F71C37B400; Wed, 29 Nov 2000 22:55:49 -0800 (PST) Received: from fanf by hand.dotat.at with local (Exim 3.15 #3) id 141NcV-0007Al-00; Thu, 30 Nov 2000 06:55:03 +0000 Date: Thu, 30 Nov 2000 06:55:03 +0000 From: Tony Finch To: Daniel Eischen Cc: Alfred Perlstein , John Baldwin , Arun Sharma , arch@FreeBSD.ORG Subject: Re: Thread-specific data and KSEs Message-ID: <20001130065503.E58294@hand.dotat.at> References: <20001122133421.S18037@fw.wintelcom.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2i In-Reply-To: Organization: Covalent Technologies, Inc Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Daniel Eischen wrote: >On Wed, 22 Nov 2000, Alfred Perlstein wrote: >> >> Was there something wrong with the suggestion to put the local info >> on the stack? I just don't see it being discussed at all. > >Yes, I stated that it could not be used. We want to provide a POSIX >complaint API, and this dictates that applications be able to create >stacks of their own size and choosing. We can't rely on stacks being >any particular size, or starting at any particular address. Additionally, wouldn't you have to walk up the stack to find its base? (which I guess would be a bit more expensive than dereferencing %gs) Tony. -- f.a.n.finch dot@dotat.at fanf@covalent.net Chad for President! 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 23: 8:37 2000 Delivered-To: freebsd-arch@freebsd.org Received: from hand.dotat.at (sfo-gw.covalent.net [207.44.198.62]) by hub.freebsd.org (Postfix) with ESMTP id 57AC937B401; Wed, 29 Nov 2000 23:08:35 -0800 (PST) Received: from fanf by hand.dotat.at with local (Exim 3.15 #3) id 141Nom-0007ZL-00; Thu, 30 Nov 2000 07:07:44 +0000 Date: Thu, 30 Nov 2000 07:07:44 +0000 From: Tony Finch To: Terry Lambert Cc: Daniel Eischen , Alfred Perlstein , John Baldwin , Jonathan Lemon , arch@FreeBSD.ORG, Tony Finch Subject: Re: Thread-specific data and KSEs Message-ID: <20001130070744.F58294@hand.dotat.at> References: <200011240208.TAA06691@usr06.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2i In-Reply-To: <200011240208.TAA06691@usr06.primenet.com> Organization: Covalent Technologies, Inc Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Terry Lambert wrote: > >I suspect that someone, somewhere, is working on an OS like the one >at the University of Utah, using source code to migrate processes >between dissimilar architectures (as one over-the-top example). In 1993 I saw an OS called Taos running on a PC with a transputer expansion card, transparently migrating programs between the two architectures using JIT compilation of bytecode. It also had support for ARM and other architectures. They're still around: http://www.tao.co.uk/. Tony. -- f.a.n.finch dot@dotat.at fanf@covalent.net Chad for President! 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 23:31:46 2000 Delivered-To: freebsd-arch@freebsd.org Received: from hand.dotat.at (sfo-gw.covalent.net [207.44.198.62]) by hub.freebsd.org (Postfix) with ESMTP id 5BC5837B402 for ; Wed, 29 Nov 2000 23:31:44 -0800 (PST) Received: from fanf by hand.dotat.at with local (Exim 3.15 #3) id 141OBE-0008I8-00; Thu, 30 Nov 2000 07:30:56 +0000 Date: Thu, 30 Nov 2000 07:30:56 +0000 From: Tony Finch To: Jordan Hubbard Cc: Kirk McKusick , arch@FreeBSD.ORG, rps@merlin.mat.uc.pt Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) Message-ID: <20001130073056.G58294@hand.dotat.at> References: <53352.975375693@winston.osd.bsdi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2i In-Reply-To: <53352.975375693@winston.osd.bsdi.com> Organization: Covalent Technologies, Inc Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Jordan Hubbard wrote: > >> I do not believe that we need/want a general aliasing facility as >> erase is really the only character for which there is widespead >> disagreement over which character to use. > >Well, there are the ^U vs ^X folks for line-kill (some even argue for >^W) which is why I cited it as another example; I agree that it's by >no means as prevalent as ^H vs DEL though. And we *love* SVR4 OSs that bind ^? to intr. Tony. -- f.a.n.finch dot@dotat.at fanf@covalent.net Chad for President! 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 23:39:16 2000 Delivered-To: freebsd-arch@freebsd.org Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.242]) by hub.freebsd.org (Postfix) with ESMTP id B0ADF37B400 for ; Wed, 29 Nov 2000 23:39:14 -0800 (PST) Received: from adlmail.cup.hp.com (adlmail.cup.hp.com [15.0.100.30]) by palrel1.hp.com (Postfix) with ESMTP id ED6BE1113; Wed, 29 Nov 2000 23:39:13 -0800 (PST) Received: from cup.hp.com (p1000180.nsr.hp.com [15.109.0.180]) by adlmail.cup.hp.com (8.9.3 (PHNE_18546)/8.9.3 SMKit7.02) with ESMTP id XAA10696; Wed, 29 Nov 2000 23:39:13 -0800 (PST) Message-ID: <3A260420.6A753ECB@cup.hp.com> Date: Wed, 29 Nov 2000 23:39:12 -0800 From: Marcel Moolenaar Organization: Hewlett-Packard X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.12 i386) X-Accept-Language: en MIME-Version: 1.0 To: Terry Lambert Cc: Alfred Perlstein , Marc Slemko , Daniel Eischen , arch@FreeBSD.ORG Subject: Re: Modifying FILE to add lock References: <200011300653.XAA07381@usr08.primenet.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Terry Lambert wrote: > > You might consider using the old "debugging malloc" trick, > of allocating one structure, but referring to another, and > reference your "hidden" lock at a negative offset. Hmmmm.... yes. This would present an unchanged struct __sFILE to programs, but adding a field at the end would also present an unchanged struct __sFILE. In both cases, the program doesn't know there are more fields; either before or after what it thinks is struct __sFILE. Adding to the struct however is much simpler. 
-- Marcel Moolenaar mail: marcel@cup.hp.com / marcel@FreeBSD.org tel: (408) 447-4222 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 23:44:50 2000 Delivered-To: freebsd-arch@freebsd.org Received: from winston.osd.bsdi.com (winston.osd.bsdi.com [204.216.27.229]) by hub.freebsd.org (Postfix) with ESMTP id 78B7D37B698; Wed, 29 Nov 2000 23:44:48 -0800 (PST) Received: from winston.osd.bsdi.com (jkh@localhost [127.0.0.1]) by winston.osd.bsdi.com (8.11.1/8.11.1) with ESMTP id eAU7igM76143; Wed, 29 Nov 2000 23:44:42 -0800 (PST) (envelope-from jkh@winston.osd.bsdi.com) To: "Kenneth D. Merry" Cc: arch@FreeBSD.ORG, gallatin@FreeBSD.ORG Subject: Re: zero copy code review In-Reply-To: Message from "Kenneth D. Merry" of "Wed, 29 Nov 2000 23:16:53 MST." <20001129231653.A1503@panzer.kdm.org> Date: Wed, 29 Nov 2000 23:44:42 -0800 Message-ID: <76139.975570282@winston.osd.bsdi.com> From: Jordan Hubbard Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > You need two kernel options and a sysctl to turn it on. The zero copy NFS > code can be turned on with gdb, although it might be better to make that > into a sysctl. (I haven't played with the zero copy NFS code much, Drew I agree that it really should be a sysctl. > Anyway, I'd like to commit this code sometime next week, if no one comes up > with any issues or problems. How about adding that extra sysctl first. 
:-) - Jordan To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Nov 29 23:46:42 2000 Delivered-To: freebsd-arch@freebsd.org Received: from panzer.kdm.org (panzer.kdm.org [216.160.178.169]) by hub.freebsd.org (Postfix) with ESMTP id DBAE737B402; Wed, 29 Nov 2000 23:46:39 -0800 (PST) Received: (from ken@localhost) by panzer.kdm.org (8.9.3/8.9.1) id AAA02077; Thu, 30 Nov 2000 00:46:36 -0700 (MST) (envelope-from ken) Date: Thu, 30 Nov 2000 00:46:36 -0700 From: "Kenneth D. Merry" To: Jordan Hubbard Cc: arch@FreeBSD.ORG, gallatin@FreeBSD.ORG Subject: Re: zero copy code review Message-ID: <20001130004636.A2061@panzer.kdm.org> References: <76139.975570282@winston.osd.bsdi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2i In-Reply-To: <76139.975570282@winston.osd.bsdi.com>; from jkh@winston.osd.bsdi.com on Wed, Nov 29, 2000 at 11:44:42PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, Nov 29, 2000 at 23:44:42 -0800, Jordan Hubbard wrote: > > You need two kernel options and a sysctl to turn it on. The zero copy NFS > > code can be turned on with gdb, although it might be better to make that > > into a sysctl. (I haven't played with the zero copy NFS code much, Drew > > I agree that it really should be a sysctl. > > > Anyway, I'd like to commit this code sometime next week, if no one comes up > > with any issues or problems. > > How about adding that extra sysctl first. :-) Okay, will-do. 
:) Ken -- Kenneth Merry ken@kdm.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 1:31:49 2000 Delivered-To: freebsd-arch@freebsd.org Received: from flood.ping.uio.no (flood.ping.uio.no [129.240.78.31]) by hub.freebsd.org (Postfix) with ESMTP id 76BDA37B401 for ; Thu, 30 Nov 2000 01:31:47 -0800 (PST) Received: (from des@localhost) by flood.ping.uio.no (8.9.3/8.9.3) id KAA79516; Thu, 30 Nov 2000 10:31:23 +0100 (CET) (envelope-from des@ofug.org) X-URL: http://www.ofug.org/~des/ X-Disclaimer: The views expressed in this message do not necessarily coincide with those of any organisation or company with which I am or have been affiliated. To: Tony Finch Cc: Jordan Hubbard , Kirk McKusick , arch@FreeBSD.ORG, rps@merlin.mat.uc.pt Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) References: <53352.975375693@winston.osd.bsdi.com> <20001130073056.G58294@hand.dotat.at> From: Dag-Erling Smorgrav Date: 30 Nov 2000 10:31:23 +0100 In-Reply-To: Tony Finch's message of "Thu, 30 Nov 2000 07:30:56 +0000" Message-ID: Lines: 9 User-Agent: Gnus/5.0802 (Gnus v5.8.2) Emacs/20.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Tony Finch writes: > And we *love* SVR4 OSs that bind ^? to intr. The only one I've come across that does that is IRIX, but it's really a *major* PITA. 
DES -- Dag-Erling Smorgrav - des@ofug.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 1:50:18 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mimer.webgiro.com (unknown [213.162.128.50]) by hub.freebsd.org (Postfix) with ESMTP id 03C4B37B401 for ; Thu, 30 Nov 2000 01:50:14 -0800 (PST) Received: by mimer.webgiro.com (Postfix, from userid 66) id F20682DC0B; Thu, 30 Nov 2000 10:52:15 +0100 (CET) Received: by mx.webgiro.com (Postfix, from userid 1001) id 580E77817; Thu, 30 Nov 2000 10:48:43 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by mx.webgiro.com (Postfix) with ESMTP id 4792010E1B; Thu, 30 Nov 2000 10:48:43 +0100 (CET) Date: Thu, 30 Nov 2000 10:48:43 +0100 (CET) From: Andrzej Bialecki To: Terry Lambert Cc: Alfred Perlstein , arch@FreeBSD.ORG Subject: Re: HEADSUP user struct ucred -> xucred (Was: Re: serious problem with mutexs and userland visibility?) In-Reply-To: <200011300628.XAA06955@usr08.primenet.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, 30 Nov 2000, Terry Lambert wrote: > > > Unfortunetly struct ucred is used by some userland utils and > > > sys/ucred is included in sys/mount.h as well as sys/user.h, this > > > creates somewhat of a problem, forcing all users of sys/ucred.h to > > > include sys/mutex.g. > > > > > > I have a patch here that sort of takes care of this problem, the > > > problem is that I had to add sys/mutex.h includes to both sys/mount.h > > > and sys/user.h, this doesn't make me very happy. > > > > After a short discussion it has been determined that there will be > > a xucred exported to userland following the concention of xsocket > > and the various other xfoo structs exported to the kernel. > > > > Struct ucred will no longer be visible outside the kernel. 
> > > > Any userland things using struct ucred will need to use xucred. > > > > This will be the convention used to resolve mutex (or other MD > > fields) in kernel exported structures in the future. > > This is a really gross way to handle this. The ucred structure > is used by a lot of user space programs. > > You should do what several UNIX vendors have already done, and > implement a MUTEX() declaration macro that differes in user and > kernel space, and forces an alignment; then when you copy out, > copy out everything _BUT_ the mutex portion to the user space, > and no user space source or object code will need to change. But don't we have the same issue with other parts of kernel structures that we don't want to make visible to userland, not just the mutexes. I had some discussion with Robert Watson a few days ago about the need to hide the layout of struct proc (and the changes it undergoes) from userland, which would allow to stabilize kernel interface to user utilities, like libkvm and friends (which probably should use specialized sysctl anyway). This goal would be quite difficult to achieve with just macros (and ugly at that..), so we thought about fixing all places where these structs are accessible to use special version of "user space struct proc" (== struct xproc? :-). This way no user space code will have to be changed (more than today, i.e. recompile libkvm et al., as usual), we could hide the complexities that we don't want to be visible outside the kernel, and we gain the stability in kernel/user interface (i.e. no more recompiles of userland needed if you update the kernel with changed struct proc size). Andrzej Bialecki // WebGiro AB, Sweden (http://www.webgiro.com) // ------------------------------------------------------------------- // ------ FreeBSD: The Power to Serve. 
http://www.freebsd.org -------- // --- Small & Embedded FreeBSD: http://www.freebsd.org/~picobsd/ ---- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 1:55: 9 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mimer.webgiro.com (unknown [213.162.128.50]) by hub.freebsd.org (Postfix) with ESMTP id 7552737B400 for ; Thu, 30 Nov 2000 01:55:07 -0800 (PST) Received: by mimer.webgiro.com (Postfix, from userid 66) id BAFA42DC0E; Thu, 30 Nov 2000 10:57:16 +0100 (CET) Received: by mx.webgiro.com (Postfix, from userid 1001) id A2BC07817; Thu, 30 Nov 2000 10:51:41 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by mx.webgiro.com (Postfix) with ESMTP id 9474E10E1B; Thu, 30 Nov 2000 10:51:41 +0100 (CET) Date: Thu, 30 Nov 2000 10:51:41 +0100 (CET) From: Andrzej Bialecki To: Dag-Erling Smorgrav Cc: arch@FreeBSD.ORG Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On 30 Nov 2000, Dag-Erling Smorgrav wrote: > Tony Finch writes: > > And we *love* SVR4 OSs that bind ^? to intr. > > The only one I've come across that does that is IRIX, but it's really > a *major* PITA. SCO OpenServer does this as well. I hate it. Andrzej Bialecki // WebGiro AB, Sweden (http://www.webgiro.com) // ------------------------------------------------------------------- // ------ FreeBSD: The Power to Serve. 
http://www.freebsd.org -------- // --- Small & Embedded FreeBSD: http://www.freebsd.org/~picobsd/ ---- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 2:26:21 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 2B18A37B400 for ; Thu, 30 Nov 2000 02:26:19 -0800 (PST) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id eAUAQEn18760; Thu, 30 Nov 2000 02:26:14 -0800 (PST) Date: Thu, 30 Nov 2000 02:26:14 -0800 From: Alfred Perlstein To: Andrzej Bialecki Cc: Terry Lambert , arch@FreeBSD.ORG Subject: Re: HEADSUP user struct ucred -> xucred (Was: Re: serious problem with mutexs and userland visibility?) Message-ID: <20001130022614.W8051@fw.wintelcom.net> References: <200011300628.XAA06955@usr08.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from abial@webgiro.com on Thu, Nov 30, 2000 at 10:48:43AM +0100 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Andrzej Bialecki [001130 01:50] wrote: > On Thu, 30 Nov 2000, Terry Lambert wrote: > > > > After a short discussion it has been determined that there will be > > > a xucred exported to userland following the concention of xsocket > > > and the various other xfoo structs exported to the kernel. > > > > You should do what several UNIX vendors have already done, and > > implement a MUTEX() declaration macro that differes in user and > > kernel space, and forces an alignment; then when you copy out, > > copy out everything _BUT_ the mutex portion to the user space, > > and no user space source or object code will need to change. > > But don't we have the same issue with other parts of kernel structures > that we don't want to make visible to userland, not just the > mutexes. True. 
> I had some discussion with Robert Watson a few days ago about the need to > hide the layout of struct proc (and the changes it undergoes) from > userland, which would allow to stabilize kernel interface to user > utilities, like libkvm and friends (which probably should use > specialized sysctl anyway). This goal would be quite difficult to achieve > with just macros (and ugly at that..), so we thought about fixing all > places where these structs are accessible to use special version of "user > space struct proc" (== struct xproc? :-). Ok, kvm is killing me. :/ see: ~"lib/libkvm/kvm_proc.c" line 125 of 793 libkvm expects to be able to copy the pointer in the struct proc into its own struct. My only chance (or so it seems) is to keep all userland-visible parts of the ucred at the beginning of it, as well as forcing the same order to keep libkvm happy. Then it can effectively do: bcopy(struct ucred *uc, struct xucred *xuc, sizeof(struct xucred)); without worries. This is pretty hackish, but libkvm isn't exactly your state of the art interface. This is pretty close to what Terry suggested but less scary in my opinion, as long as we add a comment to sys/ucred.h about keeping kernel-only fields at the end of the struct. ? -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." 
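[Editorial sketch, not part of the original message.] The prefix-copy idea described above might look roughly like this. The struct names, field names and NGROUPS_SKETCH value are assumptions made for illustration and are not the real sys/ucred.h definitions; memcpy is used in place of bcopy. The compile-time checks are one possible way to enforce the "kernel-only fields at the end" comment rather than merely documenting it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NGROUPS_SKETCH 16   /* hypothetical; not the real NGROUPS */

/* Userland-visible view (hypothetical fields). */
struct xucred_sketch {
    unsigned int cr_uid;
    short        cr_ngroups;
    unsigned int cr_groups[NGROUPS_SKETCH];
};

/* Kernel view: same leading fields in the same order. */
struct ucred_sketch {
    unsigned int cr_uid;
    short        cr_ngroups;
    unsigned int cr_groups[NGROUPS_SKETCH];
    /* kernel-only fields must stay below this line */
    unsigned int cr_ref;
    int          cr_mtx;    /* stand-in for a real mutex */
};

/* Fail the build if the shared prefix ever drifts apart. */
_Static_assert(offsetof(struct ucred_sketch, cr_uid) ==
               offsetof(struct xucred_sketch, cr_uid), "cr_uid moved");
_Static_assert(offsetof(struct ucred_sketch, cr_ngroups) ==
               offsetof(struct xucred_sketch, cr_ngroups), "cr_ngroups moved");
_Static_assert(offsetof(struct ucred_sketch, cr_groups) ==
               offsetof(struct xucred_sketch, cr_groups), "cr_groups moved");
_Static_assert(offsetof(struct ucred_sketch, cr_ref) >=
               sizeof(struct xucred_sketch),
               "kernel-only field overlaps the exported prefix");

/* Export only the leading, userland-visible part of the kernel struct. */
static void ucred_export(const struct ucred_sketch *uc,
                         struct xucred_sketch *xuc)
{
    memcpy(xuc, uc, sizeof(*xuc));
}
```

The copy never touches cr_ref or cr_mtx, so the kernel-only tail can change size without affecting what userland receives, provided the asserts keep holding.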
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 5:44:41 2000 Delivered-To: freebsd-arch@freebsd.org Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1]) by hub.freebsd.org (Postfix) with ESMTP id 6B7AA37B401 for ; Thu, 30 Nov 2000 05:44:39 -0800 (PST) Received: from berserker.bsdi.com (cp@localhost.bsdi.com [127.0.0.1]) by berserker.bsdi.com (8.11.1/8.9.3) with ESMTP id eAUDiTv03105; Thu, 30 Nov 2000 06:44:29 -0700 (MST) (envelope-from cp@berserker.bsdi.com) Message-Id: <200011301344.eAUDiTv03105@berserker.bsdi.com> To: Alfred Perlstein Cc: arch@FreeBSD.ORG Subject: Re: serious problem with mutexs and userland visibility? In-reply-to: Your message of "Wed, 29 Nov 2000 12:55:09 PST." <20001129125508.O8051@fw.wintelcom.net> From: Chuck Paterson Date: Thu, 30 Nov 2000 06:44:29 -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG You might want to look at how the lock manager deals with mutexes. This same approach ought to work for the cred stuff, which has a lower usage rate than the lock manager, and you can adjust your level of lock sharing. Chuck To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 6:49:39 2000 Delivered-To: freebsd-arch@freebsd.org Received: from peorth.iteration.net (peorth.iteration.net [208.190.180.178]) by hub.freebsd.org (Postfix) with ESMTP id 5160337B400; Thu, 30 Nov 2000 06:49:34 -0800 (PST) Received: by peorth.iteration.net (Postfix, from userid 1001) id D45D9573A5; Thu, 30 Nov 2000 08:49:31 -0600 (CST) Date: Thu, 30 Nov 2000 08:49:31 -0600 From: "Michael C . 
Wu" To: Cy Schubert - ITSD Open Systems Group Cc: Poul-Henning Kamp , current@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: RFC: /dev/console -> /var/log/messages idea/patch Message-ID: <20001130084931.C16834@peorth.iteration.net> Reply-To: "Michael C . Wu" References: <1050.974925641@critter> <200011251540.eAPFe4N00849@cwsys.cwsent.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200011251540.eAPFe4N00849@cwsys.cwsent.com>; from Cy.Schubert@uumail.gov.bc.ca on Sat, Nov 25, 2000 at 07:39:33AM -0800 X-PGP-Fingerprint: 5025 F691 F943 8128 48A8 5025 77CE 29C5 8FA1 2E20 X-PGP-Key-ID: 0x8FA12E20 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sat, Nov 25, 2000 at 07:39:33AM -0800, Cy Schubert - ITSD Open Systems Group scribbled: | In message <1050.974925641@critter>, Poul-Henning Kamp writes: | > | > The attached patch is a "proof-of-concept" on which I would like | > to get some comments: | > | > It bugs me big time that the output from /etc/rc and all other output | > to /dev/console is volatile and lost once it scrolls of your console. | | It's a no-brainer. Let's do it. How about networked ddb/gdb over {ether,ppp,usb,firewire,IrDA}? Firewire and IrDA are works in progress AFAIK, but certainly ddb/gdb networked debugging is what all FreeBSD dream of, right? :) The PPC port would greatly benefit from this, as newer Apple stations do not even have a serial port. Darwin seems to have networked debugging. -- +------------------------------------------------------------------+ | keichii@peorth.iteration.net | keichii@bsdconspiracy.net | | http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. 
| +------------------------------------------------------------------+ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 7: 3: 6 2000 Delivered-To: freebsd-arch@freebsd.org Received: from peorth.iteration.net (peorth.iteration.net [208.190.180.178]) by hub.freebsd.org (Postfix) with ESMTP id 465AE37B400 for ; Thu, 30 Nov 2000 07:02:59 -0800 (PST) Received: by peorth.iteration.net (Postfix, from userid 1001) id 2C835573A5; Thu, 30 Nov 2000 09:03:01 -0600 (CST) Date: Thu, 30 Nov 2000 09:03:01 -0600 From: "Michael C . Wu" To: Kirk McKusick Cc: Jordan Hubbard , arch@FreeBSD.ORG Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) Message-ID: <20001130090301.D16834@peorth.iteration.net> Reply-To: "Michael C . Wu" References: <52694.975362925@winston.osd.bsdi.com> <200011272241.OAA93364@beastie.mckusick.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200011272241.OAA93364@beastie.mckusick.com>; from mckusick@mckusick.com on Mon, Nov 27, 2000 at 02:41:05PM -0800 X-PGP-Fingerprint: 5025 F691 F943 8128 48A8 5025 77CE 29C5 8FA1 2E20 X-PGP-Key-ID: 0x8FA12E20 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Mon, Nov 27, 2000 at 02:41:05PM -0800, Kirk McKusick scribbled: | When we first implemented termios at CSRG, we had an erase2 | character. Mike Karels was vehemently opposed to it, and | insisted that it be deleted before we did our next release | (4.3-tahoe if I remember correctly). I am of the opinion that | it is a good idea, and should be there. I do not believe that | we need/want a general aliasing facility as erase is really | the only character for which there is widespead disagreement | over which character to use. So, my take would be to add | erase2 and be done with it. 
/me putting on I18N crybaby hat This feature has one very important aspect that I18N can use very well. Currently, for two-byte characters, we need to press delete twice in console/tty/et al. The best way to solve this would be to have the tty determine whether it is a two-byte or one-byte character. Then the tty determines whether to push ^H/^? once or twice depending on the character. It would be easy to simply alias backspace/delete to two "^H/^?"'s when we meet a two-byte character. Please do not lock us into hardcoding these erase2 characters and assume that everybody uses English only. I am not pointing fingers, but this mistake was made many years ago in all *nix systems; perhaps we should not hardcode this kind of stuff again. :) /me hides and takes off all hats -- +------------------------------------------------------------------+ | keichii@peorth.iteration.net | keichii@bsdconspiracy.net | | http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. | +------------------------------------------------------------------+ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 8: 6:53 2000 Delivered-To: freebsd-arch@freebsd.org Received: from feral.com (feral.com [192.67.166.1]) by hub.freebsd.org (Postfix) with ESMTP id F194B37B402 for ; Thu, 30 Nov 2000 08:06:40 -0800 (PST) Received: from beppo (beppo [192.67.166.79]) by feral.com (8.9.3/8.9.3) with ESMTP id IAA26804; Thu, 30 Nov 2000 08:06:16 -0800 Date: Thu, 30 Nov 2000 08:06:17 -0800 (PST) From: Matthew Jacob Reply-To: mjacob@feral.com To: Andrzej Bialecki Cc: Dag-Erling Smorgrav , arch@FreeBSD.ORG Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Tsk. 
Then y'all wouldn't love to have a PDP 11/45 running V6/PWB, now would you then? Jeez. All this beefing over defaults, and nobody has the little gray cells to thank whatever deities they might believe in that it is possible to change these defaults as part of their login process- a feature that is there so they can do something clever like turn off that pesky echokill feature and change their line kill character to SPACE (a favorite amongst us who were young once and decided to stay that way- this was the default action we would do to someone who wandered off and left themselves logged in to one of the Vt52s). -matt On Thu, 30 Nov 2000, Andrzej Bialecki wrote: > On 30 Nov 2000, Dag-Erling Smorgrav wrote: > > > Tony Finch writes: > > > And we *love* SVR4 OSs that bind ^? to intr. > > > > The only one I've come across that does that is IRIX, but it's really > > a *major* PITA. > > SCO OpenServer does this as well. I hate it. > > Andrzej Bialecki > > // WebGiro AB, Sweden (http://www.webgiro.com) > // ------------------------------------------------------------------- > // ------ FreeBSD: The Power to Serve. 
http://www.freebsd.org -------- > // --- Small & Embedded FreeBSD: http://www.freebsd.org/~picobsd/ ---- > > > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-arch" in the body of the message > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 8:23:38 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mail.interware.hu (mail.interware.hu [195.70.32.130]) by hub.freebsd.org (Postfix) with ESMTP id 4C81837B400 for ; Thu, 30 Nov 2000 08:23:35 -0800 (PST) Received: from luanda-16.budapest.interware.hu ([195.70.51.16] helo=elischer.org) by mail.interware.hu with esmtp (Exim 3.16 #1 (Debian)) id 141WUc-0001yp-00; Thu, 30 Nov 2000 17:23:31 +0100 Message-ID: <3A2664AC.493B4101@elischer.org> Date: Thu, 30 Nov 2000 06:31:08 -0800 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: Tony Finch Cc: arch@freebsd.org Subject: Re: Thread-specific data and KSEs References: <20001122133421.S18037@fw.wintelcom.net> <20001130065503.E58294@hand.dotat.at> Content-Type: text/plain; charset=iso-8859-15 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Tony Finch wrote: > > Daniel Eischen wrote: > >On Wed, 22 Nov 2000, Alfred Perlstein wrote: > >> > >> Was there something wrong with the suggestion to put the local info > >> on the stack? I just don't see it being discussed at all. > > > >Yes, I stated that it could not be used. We want to provide a POSIX > >complaint API, and this dictates that applications be able to create > >stacks of their own size and choosing. We can't rely on stacks being > >any particular size, or starting at any particular address. > > Additionally, wouldn't you have to walk up the stack to find its base? 
> (which I guess would be a bit more expensive than dereferencing %gs) No, you start each stack on some multiple of (say) 1MB and then you just or it with 0xfffff to find the top of the stack.. (This is what one of the MACH threads packages used to do) > > Tony. > -- > f.a.n.finch dot@dotat.at fanf@covalent.net Chad for President! > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-arch" in the body of the message -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000 ---> X_.---._/ presently in: Budapest v To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 13:22:25 2000 Delivered-To: freebsd-arch@freebsd.org Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222]) by hub.freebsd.org (Postfix) with ESMTP id ECB0D37B402 for ; Thu, 30 Nov 2000 13:22:20 -0800 (PST) Received: from laptop.baldwin.cx (john@dhcp246.osd.bsdi.com [204.216.28.246]) by pike.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id eAULM8C71530; Thu, 30 Nov 2000 13:22:09 -0800 (PST) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.4.0 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <200011300628.XAA06955@usr08.primenet.com> Date: Thu, 30 Nov 2000 13:22:29 -0800 (PST) From: John Baldwin To: Terry Lambert Subject: Re: HEADSUP user struct ucred -> xucred (Was: Re: serious proble Cc: arch@FreeBSD.org, (Alfred Perlstein) Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On 30-Nov-00 Terry Lambert wrote: >> > I recently locked down struct ucred, not a big deal, basically just >> > a mutex in each struct to protect the refcount. 
>> > >> > Unfortunetly struct ucred is used by some userland utils and >> > sys/ucred is included in sys/mount.h as well as sys/user.h, this >> > creates somewhat of a problem, forcing all users of sys/ucred.h to >> > include sys/mutex.g. >> > >> > I have a patch here that sort of takes care of this problem, the >> > problem is that I had to add sys/mutex.h includes to both sys/mount.h >> > and sys/user.h, this doesn't make me very happy. >> >> After a short discussion it has been determined that there will be >> a xucred exported to userland following the concention of xsocket >> and the various other xfoo structs exported to the kernel. >> >> Struct ucred will no longer be visible outside the kernel. >> >> Any userland things using struct ucred will need to use xucred. >> >> This will be the convention used to resolve mutex (or other MD >> fields) in kernel exported structures in the future. > > This is a really gross way to handle this. The ucred structure > is used by a lot of user space programs. Another way I suggested, which was shot down, was to do something like this:

#ifdef _KERNEL
struct ucred {
	... kernel structure ...
};

struct xucred {
#else
struct ucred {
#endif
	... userland structure ...
};

So ucred wouldn't change for userland, but the kernel would have ucred for its internal ucred and xucred for the userland ucred. This requires no userland changes, and all you would need to do is convert ucred to xucred and vice versa at the boundary. -- John Baldwin -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" 
- http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 14:47:11 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id C472A37B400 for ; Thu, 30 Nov 2000 14:47:05 -0800 (PST) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id PAA26953; Thu, 30 Nov 2000 15:43:50 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp05.primenet.com, id smtpdAAA2NaOM0; Thu Nov 30 15:43:43 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id PAA23413; Thu, 30 Nov 2000 15:46:56 -0700 (MST) From: Terry Lambert Message-Id: <200011302246.PAA23413@usr05.primenet.com> Subject: Re: HEADSUP user struct ucred -> xucred (Was: Re: serious problem To: abial@webgiro.com (Andrzej Bialecki) Date: Thu, 30 Nov 2000 22:46:55 +0000 (GMT) Cc: tlambert@primenet.com (Terry Lambert), bright@wintelcom.net (Alfred Perlstein), arch@FreeBSD.ORG In-Reply-To: from "Andrzej Bialecki" at Nov 30, 2000 10:48:43 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > But don't we have the same issue with other parts of kernel structures > that we don't want to make visible to userland, not just the > mutexes. > > I had some discussion with Robert Watson a few days ago about the need to > hide the layout of struct proc (and the changes it undergoes) from > userland, which would allow to stabilize kernel interface to user > utilities, like libkvm and friends (which probably should use > specialized sysctl anyway). 
This goal would be quite difficult to achieve > with just macros (and ugly at that..), so we thought about fixing all > places where these structs are accessible to use special version of "user > space struct proc" (== struct xproc? :-). > > This way no user space code will have to be changed (more than today, > i.e. recompile libkvm et al., as usual), we could hide the complexities > that we don't want to be visible outside the kernel, and we gain the > stability in kernel/user interface (i.e. no more recompiles of userland > needed if you update the kernel with changed struct proc size). If you want to get technical, data interfaces are bad engineering, a bad idea all around, and something which should be immediately deprecated. XML is surrounded by similar problems. Really, there should be _NO_ reading of /dev/kmem, under any circumstances. Likewise, there should never be a case where a kernel structure is copied out to user space directly: all data to be externalized should be abstracted before it is externalized. So the canonically correct thing to do would be to surround most of the kernel dependent headers with "#ifdef _KERNEL", and not externalize _ANY_ structure declarations, whatsoever. There are two major, and many minor, problems with this approach, which boil down to data interfaces with no other available method to solve the problem (today). The first is latent interfaces, and the second is bimodal interfaces. A latent interface occurs when data is communicated with a latency, and the latency is unavoidable, and can not be easily worked around in code. The number one latent interface is the file system, with the latencies being present in newfs, tunefs, fsck, and other utilities. Since these utilities operate on data which is not visible to the kernel (for good reason!) at the time of the operation, the only option is a latent interface, or rolling the functionality into the kernel itself. 
This could be done, but it's prohibitively expensive without discardable code segments, which, while supported by ELF, are not supported by FreeBSD. Even were these supported by FreeBSD, you would still need to deal with discrete kernel object files, since the issue of license can not be resolved in a static linkage. In other words, it's possible to deal with this (Windows supports ELF [PE: Portable Executable] objects with segment attributes, including "initialization", "discardable", "pageable", etc.), but FreeBSD does not have the necessary technical sophistication at the present time. A bimodal interface is an interface intended to operate both interactively, and against latent data, potentially with huge latencies which can not be overcome with segment attribution, etc.. An example of an interface like this is the interface used by the "ps" command in order to obtain information from the current system image (the granddaddy of all of these is a kernel debugger). Since the "ps" command must be able to run against the existing system, and it must be able to run against a crashdump of a system, perhaps sent via parcel post or carrier pigeon, the interfaces it uses can not be separated from the data against which they are implemented. Worst case, "_KERNEL" could be defined in scope, and the utilities could remain in user space. The second case here is the most interesting, and the most applicable to the ucred structure under discussion. Actually, the "ps" command has limited utility against a crash dump. This is because it is linked against a libkvm, and has itself intimate knowledge of a kernel structure (a historically volatile one -- proc -- which has shown no signs of stabilizing, in fact). The libkvm provides symbolic reference to the kmem image data base addresses, which can then be followed as linked lists in order to obtain information. The information is then interpreted by the "ps" program itself, based on its knowledge of the structure contents. 
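[Editorial sketch, not part of the original message.] The data-interface pattern described above (resolve a base address, then follow linked-list pointers through a raw memory image and interpret the bytes with compiled-in structure knowledge) can be illustrated in a few lines of C. Everything here is hypothetical: the image is a plain byte buffer, "addresses" are offsets into it, struct proc_sketch is an invented two-field layout, and img_read() only mimics the role a kvm-style read call would play. It is not the libkvm API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Invented record layout that the reader must know at compile time. */
struct proc_sketch {
    uint32_t p_next;    /* image address of the next entry, 0 terminates */
    int32_t  p_pid;
};

/* Copy len bytes found at image address addr; -1 if out of range. */
static int img_read(const unsigned char *img, size_t img_len,
                    uint32_t addr, void *buf, size_t len)
{
    if (addr > img_len || len > img_len - addr)
        return -1;
    memcpy(buf, img + addr, len);
    return 0;
}

/*
 * Walk the list starting at head and collect up to max pids.
 * Returns the number collected, or -1 on a bad read. The max bound
 * also stops the walk if the links in the image are corrupt.
 */
static int list_pids(const unsigned char *img, size_t img_len,
                     uint32_t head, int32_t *pids, int max)
{
    int n = 0;
    uint32_t addr = head;

    while (addr != 0 && n < max) {
        struct proc_sketch p;
        if (img_read(img, img_len, addr, &p, sizeof(p)) != 0)
            return -1;
        pids[n++] = p.p_pid;
        addr = p.p_next;
    }
    return n;
}
```

If the layout compiled into the reader differs from the layout that produced the image, the walk still "succeeds" but yields garbage, which is the coupling between tool and kernel structure that the message objects to.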
I think in the limit, this interface will have to die. Consider the case of a "ps" command in user space, with the proc struct list protected by mutex from multiple CPUs and/or kernel preemption: the user space program will neither honor, nor will it itself assert, the protection mutex. This means that it may be running on one processor, while another is manipulating the structure linkages. Best case failure mode is the user space process sees the list appear to terminate prematurely. Worst case, the user space process causes a fault while reading kmem, or sees a circular reference, and fails to terminate properly, spending all its time traversing the circular reference. Another problem that will commonly arise is that the proc struct known to the "ps" program, or the information known to the libkvm, will change. When you go to apply this information to an older image, the newer tools will not operate. It's a royal pain, but it is possible to resynchronize this information in the common interactive case, by insisting that builds be grouped. For the latent data case, this will not work. In fact, most people who follow -current have, at one time or another, found themselves booted on a "kernel.old" because the new "kernel" was too unstable to use, even to correct the stability problem as a bootstrap for replacing itself. When this happens subsequent to a rebuild of libkvm and "ps" (and other utilities, such as "mount"), it is not as easy to revert the rest of the system as it was to revert the kernel. One way to deal with this problem would be to attach segments to the running kernel, which implement libkvm. Programs could map these in and use them as they would use any shared library to get kvm information. 
This is attractive, since it means that you could map your libkvm from the crashdump image, instead of the running kernel, or an old kernel (if symbols could not be obtained from the dump image, only from the kernel of which the image is a dump; I dislike this, as it means pushing around synchronized file sets, but it's at least a workable kludge). In this scenario, the libkvm/kernel synchronization problem has been resolved.

This still leaves us with the "ps" program knowing about the proc structure (and the "mount" program knowing about the mount parameter structure, etc.). This intimate knowledge can only be worked around by abstraction. This might consist of providing a set of descriptors for data elements, and externalizing this as "ps" formatting argument strings, etc. These descriptors could be bundled in with what was previously described as the shared objects that could be bundled with the kernel, and mapped by user programs. This would provide a generic API to a protocol, defined by the descriptors' interpretation at compile time and at runtime of the program using the descriptors. Not as abstract as SMTP, but a lot better than an application-centric API for doing the same thing, and infinitely better than a data interface.

This still doesn't resolve the SMP problem. This could be handled by externalizing access to the locks to user space. This would, IMO, be a terrible mistake.

A second approach would be to define an access point that could act as an API when used interactively, and as a data interface when used latently. This is actually rather easy, when you realize that latent use will be against a static snapshot, and will not have to worry about locking. The locking can be hidden behind the API, and the API can straddle a user/kernel boundary. For "ps", the most logical API is a procfs. The procfs can act as a descriptor tree automatically, since FSs are themselves hierarchical in nature.
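A rough sketch of the descriptor idea, assuming a much simpler form than anything proposed in this thread: the producer of the data publishes a table of (name, type, offset, length) entries, and the consumer looks fields up by name instead of compiling in the structure layout. `fld_desc`, `fld_find` and `fld_get_i32` are hypothetical names, not an existing FreeBSD or libkvm API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum fld_type { FLD_I32, FLD_STR };

/* One descriptor per exported field, shipped alongside the data it describes. */
struct fld_desc {
    const char   *name;
    enum fld_type type;
    size_t        off;   /* byte offset of the field inside a record */
    size_t        len;   /* size of the field in bytes */
};

/* Look a field up by name; returns NULL if the table does not export it. */
const struct fld_desc *
fld_find(const struct fld_desc *tab, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/*
 * Fetch a 32-bit integer field by name from an opaque record.
 * Returns 0 on success, -1 if the field is unknown, has another type,
 * or does not fit inside the record.
 */
int
fld_get_i32(const struct fld_desc *tab, size_t n, const void *rec,
    size_t reclen, const char *name, int32_t *out)
{
    const struct fld_desc *d = fld_find(tab, n, name);

    if (d == NULL || d->type != FLD_I32 || d->len != sizeof(*out))
        return -1;
    if (d->off > reclen || reclen - d->off < d->len)
        return -1;
    memcpy(out, (const unsigned char *)rec + d->off, sizeof(*out));
    return 0;
}
```

With this shape, a tool asking for "pid" keeps working when the record layout changes, as long as the table that travels with the data (for example, with the kernel or the crash dump) describes the new layout.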
Similarly, the in-core implementation is such that the structure representing it can be traversed as data, in a static image (ideally, however, one would want to "fake" an FS interface, so as to keep the "shared library" segments of the kernel small, even though they are never loaded by the kernel into the kernel address space; this "faking" could be done by abstracting file I/O using libkvm descriptors, and by providing control over syspace vs. userspace copying when trying to do a "uiomove" to externalize FS data). In any case, the SMP problem means that the data interfaces must die, at least in as far as they apply to active systems, rather than crash dumps. If they die, then there is no kernel structure externalization to worry about (with the side benefit of not needing to recompile "ps" and the rest of the tools which use kmem or externalized kernel structures, each time those structures are changed). Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 14:51:43 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132]) by hub.freebsd.org (Postfix) with ESMTP id 1FF6237B400 for ; Thu, 30 Nov 2000 14:51:41 -0800 (PST) Received: (from daemon@localhost) by smtp02.primenet.com (8.9.3/8.9.3) id PAA08490; Thu, 30 Nov 2000 15:47:13 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp02.primenet.com, id smtpdAAA3WaqHq; Thu Nov 30 15:47:04 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id PAA23494; Thu, 30 Nov 2000 15:51:25 -0700 (MST) From: Terry Lambert Message-Id: <200011302251.PAA23494@usr05.primenet.com> Subject: Re: HEADSUP user struct ucred -> xucred (Was: Re: serious problem with mutexs and userland visibility?) 
To: bright@wintelcom.net (Alfred Perlstein) Date: Thu, 30 Nov 2000 22:51:25 +0000 (GMT) Cc: abial@webgiro.com (Andrzej Bialecki), tlambert@primenet.com (Terry Lambert), arch@FreeBSD.ORG In-Reply-To: <20001130022614.W8051@fw.wintelcom.net> from "Alfred Perlstein" at Nov 30, 2000 02:26:14 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

> Ok, kvm is killing me. :/

Data interfaces suck.

> see:
> ~"lib/libkvm/kvm_proc.c" line 125 of 793
>
> libkvm expects to be able to copy the pointer in the struct proc into
> its own struct.
>
> My only chance (or so it seems) is to keep all userland visible parts
> of the ucred at the beginning of it, as well as forcing the same
> order to keep libkvm happy. Then it can effectively:
>
> bcopy(struct ucred *uc, struct xucred *xuc, sizeof(struct xucred));
>
> without worries, this is pretty hackish, but libkvm isn't exactly
> your state of the art interface.
>
> This is pretty close to what Terry suggested but less scary in
> my opinion as long as we add a comment to sys/ucred.h about
> keeping kernel only fields at the end of the struct.
>
> ?

What happens when you add a new _not_ kernel-only field and boot an older kernel because the newer kernel is unstable?

You need to get away from data interfaces.

Please see my other posting in this thread: mutex protected data objects accessed via data interface in a userland which neither asserts nor honors the mutex are inherently SMP and kernel preemption unsafe.

Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
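As an illustration of the prefix-copy scheme quoted above, and of why it is fragile, here is a small C sketch. `kcred` and `xcred` are hypothetical stand-ins, not the real `ucred`/`xucred` definitions, and the C11 `_Static_assert` guards are an addition of this sketch rather than something proposed in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define X_NGROUPS 4     /* hypothetical group limit for this sketch */

/* Exported view: what userland is allowed to see. */
struct xcred {
    unsigned int uid;
    short        ngroups;
    unsigned int groups[X_NGROUPS];
};

/* Kernel-side structure: exported fields first, in the same order. */
struct kcred {
    unsigned int uid;
    short        ngroups;
    unsigned int groups[X_NGROUPS];
    /* kernel-only fields must stay below this line */
    int          refcnt;
    void        *lock;
};

/* Compile-time guards: fail the build if the shared prefix drifts apart. */
_Static_assert(offsetof(struct kcred, uid) == offsetof(struct xcred, uid),
    "uid moved");
_Static_assert(offsetof(struct kcred, ngroups) == offsetof(struct xcred, ngroups),
    "ngroups moved");
_Static_assert(offsetof(struct kcred, groups) == offsetof(struct xcred, groups),
    "groups moved");
_Static_assert(offsetof(struct kcred, refcnt) >= sizeof(struct xcred),
    "kernel-only field overlaps the exported prefix");

/* The prefix copy itself: only the first sizeof(struct xcred) bytes leave. */
void
cred_export(const struct kcred *kc, struct xcred *xc)
{
    memcpy(xc, kc, sizeof(*xc));
}
```

The assertions only hold within a single build. They cannot help when userland was compiled against one layout and then runs on a kernel built from another, which is the old-kernel case raised in the reply above.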
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 15: 0: 9 2000 Delivered-To: freebsd-arch@freebsd.org Received: from falla.videotron.net (falla.videotron.net [205.151.222.106]) by hub.freebsd.org (Postfix) with ESMTP id 10BA537B402; Thu, 30 Nov 2000 15:00:00 -0800 (PST) Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213]) by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8) with ESMTP id <0G4V00MKE17X64@falla.videotron.net>; Thu, 30 Nov 2000 17:59:57 -0500 (EST) Date: Thu, 30 Nov 2000 18:00:38 -0500 (EST) From: Bosko Milekic Subject: Re: zero copy code review In-reply-to: <20001129231653.A1503@panzer.kdm.org> To: "Kenneth D. Merry" Cc: arch@FreeBSD.ORG, gallatin@FreeBSD.ORG Message-id: MIME-version: 1.0 Content-type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Hi, On Wed, 29 Nov 2000, Kenneth D. Merry wrote: > [ -net and -current BCCed for wider coverage, this is probably best > handled on -arch ] > > I would like to request reviews of the zero copy sockets and NFS code I've > been posting about for months: > > http://people.FreeBSD.org/~ken/zero_copy > > There are diffs posted above against -current as of early November 28th, > along with a FAQ, and change log. > > These diffs include changes in: > > - the socket code > - NFS code > - VM code > - ti(4) driver > - sendfile code > > Much of the code was written by Drew Gallatin , but I > wrote a lot of the ti(4) driver mods and cleaned things up a fair bit. > > The code is stable, and I don't know of any bugs at the moment. I have run > with it enabled on one of my main development boxes for months without any > problems. > > The way things are currently configured, it is not turned on by default. > You need two kernel options and a sysctl to turn it on. 
The zero copy NFS > code can be turned on with gdb, although it might be better to make that > into a sysctl. (I haven't played with the zero copy NFS code much, Drew > has done much more with that.) > > How to turn the code on is covered in the web page, above. > > Anyway, I'd like to commit this code sometime next week, if no one comes up > with any issues or problems. > > Comments, bug reports, etc., are welcome.

In general, I am pro-the zero copy stuff you've been gathering/merging/updating/writing/etc. over the past several months. Looking at the sendfile portion of your changes, it's pretty obvious that they are very minimal, but I'm curious as to why you've bothered removing the "static" before the sf_buf_free(). I can see why it really has no significance in the sf_buf_alloc() case, but sf_buf_free() is attached to the mbuf's m_ext free function pointer (I'm really just curious if the motivation was strictly stylistic).

Here are some other notes, which I came across during a real quick read of some of the code (I am sort of in a pre-final-exam period, so I can't dedicate too much time to this for about the next 2 weeks :-( ):

in nfs/nfs_serv.c:
In your first "BEGIN SUSPECT REGION" block:

- You allocate an sf_buf somewhere down the line and then attempt to allocate an mbuf to which you hope to attach the sf_buf. If the mbuf allocation fails, you don't seem to free the sf_buf anywhere and consequently, it looks as though you may leak sf_bufs.

- You only m_freem() on mb (the header mbuf) if mb->m_next != NULL, but if there is no m_next (m_next == NULL), you don't seem to free the mb mbuf (header mbuf) at all. Is this meant to be this way? (Note that it may very well be, I haven't looked at all the other surrounding code, just making sure).

- In the actual MEXTADD(), you don't seem to be passing the M_RDONLY flag (which is done for sendfile buffer ext mbufs). M_RDONLY is used to indicate to the rest of the code that the m_data is not to be tampered with (trimmed, et al) -- in other words, it's read-only. Have you considered it?

- Stylistic suggestion: please try to keep things 25x80. :-)

[ skipped all the other NFS + ti driver changes ]

jumbo.h:
I would like to eventually split the cluster code out of mbuf.h and uipc_mbuf.c and change jumbo.h/uipc_jumbo.c -> cluster.h/uipc_cluster.c

mbuf.h:
- Make EXT_DISPOSABLE 3, instead of 300... if you decide to keep it. The reason I say this is because it seems to me that EXT_DISPOSABLE should be more of an m_flag than an ext_type, which would probably mean that we should make m_flag bigger than a short (which it is now). The reason I argue this is because EXT_DISPOSABLE seems to be more of an indication of what should be done with the contents of the mbuf. Perhaps what needs to be done instead is to make EXT_DISPOSABLE a flag, have if_ti use the DRV ext type (like it should be doing) for its external buffers, and make it set EXT_DISPOSABLE|M_RDONLY during the MEXTADD. Let's not get too strict with this for now, though; it would be better to make sure everything is working perfectly before we decide what to do with this - and it can be changed easily later.

tiio.h:
Are you sure tiio.h belongs in src/sys/sys ? Also, have you checked whether any locking should be performed here? Considering that this is all supposed to improve performance, it would be nice if it didn't all need to run under Giant. I realize that some of this will have to wait (i.e. VM), but what about the if_ti code? Is that something that can be looked at RSN?

I would strongly urge you to run some tests under real heavy network activity (possibly lower NMBCLUSTERS for this) and starve out the mbuf resources and see if anything strange happens - you may catch a couple of leaks that may have accidentally slipped through.
Finally, I'd like to suggest possibly breaking up some of the diff to smaller chunks, just so it is easier to track things down if something does break. With -CURRENT changing relatively dramatically now sometimes several times in a single day, I think this would be worth it for everybody. > Thanks! > > Ken > -- > Kenneth Merry > ken@kdm.org Thank *you*! Regards, Bosko Milekic bmilekic@technokratis.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 15:29:28 2000 Delivered-To: freebsd-arch@freebsd.org Received: from netau1.alcanet.com.au (ntp.alcanet.com.au [203.62.196.27]) by hub.freebsd.org (Postfix) with ESMTP id 368DF37B400; Thu, 30 Nov 2000 15:29:23 -0800 (PST) Received: from mfg1.cim.alcatel.com.au (mfg1.cim.alcatel.com.au [139.188.23.1]) by netau1.alcanet.com.au (8.9.3 (PHNE_18979)/8.9.3) with ESMTP id KAA21472; Fri, 1 Dec 2000 10:29:19 +1100 (EDT) Received: from gsmx07.alcatel.com.au by cim.alcatel.com.au (PMDF V5.2-32 #37645) with ESMTP id <01JX6QHY1B00EAF49C@cim.alcatel.com.au>; Fri, 1 Dec 2000 10:29:17 +1100 Received: (from jeremyp@localhost) by gsmx07.alcatel.com.au (8.11.0/8.11.0) id eAUNTF802533; Fri, 01 Dec 2000 10:29:15 +1100 (EST envelope-from jeremyp) Content-return: prohibited Date: Fri, 01 Dec 2000 10:29:15 +1100 From: Peter Jeremy Subject: Re: RANDOMDEV inspired realitycheck regarding i386/i486... 
In-reply-to: <11485.974210886@critter>; from phk@FreeBSD.ORG on Tue, Nov 14, 2000 at 03:08:06PM +0100 To: Poul-Henning Kamp Cc: arch@FreeBSD.ORG Mail-followup-to: Poul-Henning Kamp , arch@FreeBSD.ORG Message-id: <20001201102915.G1474@gsmx07.alcatel.com.au> MIME-version: 1.0 Content-type: text/plain; charset=us-ascii Content-disposition: inline User-Agent: Mutt/1.2.5i References: <11485.974210886@critter> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On 2000-Nov-14 15:08:06 +0100, Poul-Henning Kamp wrote: >Has anybody run a 486 or 386 under current recently ? X on a PRE_SMPNG 486 is painful - mouse movements no longer make the X pointer move in real time. I haven't noticed the seeding issue (probably just luck). >What is the consensus ? I think 386/486 remains a significant market and would not like to see support dropped. I'd go so far as to suggest that if -current does drop support for the 386/486, the then-stable version will need to be actively maintained indefinitely to provide continued support. 
Peter To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 17:19:12 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mail.hz.zj.cn (unknown [202.101.172.2]) by hub.freebsd.org (Postfix) with SMTP id C4D9237B401 for ; Thu, 30 Nov 2000 17:19:08 -0800 (PST) Received: from xyf([61.130.65.225]) by mail.hz.zj.cn(JetMail 2.5.3.0) with SMTP id jmc3a270a0e; Fri, 1 Dec 2000 01:19:06 -0000 Message-ID: <002501c05b34$b1609de0$e001a8c0@xyf> From: "xuyifeng" To: "Julian Elischer" , "Tony Finch" Cc: References: <20001122133421.S18037@fw.wintelcom.net> <20001130065503.E58294@hand.dotat.at> <3A2664AC.493B4101@elischer.org> Subject: Re: Thread-specific data and KSEs Date: Fri, 1 Dec 2000 09:17:12 +0800 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2615.200 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

But this limits the total number of threads to fewer than 2000 if the process address space is 2G; threads don't need 1M of stack space in most cases.

XuYifeng

----- Original Message -----
From: Julian Elischer <julian@elischer.org>
To: Tony Finch <dot@dotat.at>
Cc: <arch@freebsd.org>
Sent: Thursday, November 30, 2000 10:31 PM
Subject: Re: Thread-specific data and KSEs

> Tony Finch wrote:
> >
> > Daniel Eischen <eischen@vigrid.com> wrote:
> > >On Wed, 22 Nov 2000, Alfred Perlstein wrote:
> > >>
> > >> Was there something wrong with the suggestion to put the local info
> > >> on the stack?  I just don't see it being discussed at all.
> > >
> > >Yes, I stated that it could not be used.  We want to provide a POSIX
> > >compliant API, and this dictates that applications be able to create
> > >stacks of their own size and choosing.  We can't rely on stacks being
> > >any particular size, or starting at any particular address.
> >
> > Additionally, wouldn't you have to walk up the stack to find its base?
> > (which I guess would be a bit more expensive than dereferencing %gs)
>
> No, you start each stack on some multiple of (say) 1MB
> and then you just or it with 0xfffff to find the top of the stack..
> (This is what one of the MACH threads packages used to do)
>
> >
> > Tony.
> > --
> > f.a.n.finch     dot@dotat.at     fanf@covalent.net     Chad for President!
>
> --
>       __--_|\  Julian Elischer
>      /       \ julian@elischer.org
>     (   OZ    ) World tour 2000
> ---> X_.---._/  presently in:  Budapest
>             v

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 17:24:29 2000 Delivered-To: freebsd-arch@freebsd.org Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by hub.freebsd.org (Postfix) with ESMTP id C024437B400 for ; Thu, 30 Nov 2000 17:24:25 -0800
(PST) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id UAA02260; Thu, 30 Nov 2000 20:24:15 -0500 (EST) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.11.1/8.9.1) id eB11OF003769; Thu, 30 Nov 2000 20:24:15 -0500 (EST) (envelope-from gallatin@cs.duke.edu) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Date: Thu, 30 Nov 2000 20:24:14 -0500 (EST) To: Bosko Milekic Cc: "Kenneth D. Merry" , arch@FreeBSD.ORG Subject: Re: zero copy code review In-Reply-To: References: <20001129231653.A1503@panzer.kdm.org> X-Mailer: VM 6.43 under 20.4 "Emerald" XEmacs Lucid Message-ID: <14886.63486.157224.937225@grasshopper.cs.duke.edu> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Bosko, Thanks for your comments. I'm a little disconnected from the code these days, as I do most of my development in a 4.0-RELEASE environment. (Ken ported my contributions forward). Bosko Milekic writes: > > In general, I am pro-the zero copy stuff you've been > gathering/merging/updating/writing/etc. over the past several months. > Looking at the sendfile portion of your changes, it's pretty obvious that > they are very minimal, but I'm curious as to why you've bothered removing > the "static" before the sf_buf_free(). I can see why it really has no > significance in the sf_buf_alloc() case, but sf_buf_free() is attached to > the mbuf's m_ext free function pointer (I'm really just curious if the > motivation was strictly stylistic). It was un-staticized because it is called by socow_iodone(), which is the m_ext free for zero-copy transmissions. 
> Here some other notes, which I came across during a real quick read > of some of the code (I am sort of in a pre-final-exam period, so I can't > dedicate too much time to this for the next 2 weeks, about :-( ): > > in nfs/nfs_serv.c: > In your first "BEGIN SUSPECT REGION" block: > > - You allocate an sf_buf somewhere down the line and then attempt to > allocate an mbuf to which you will hope to attach the sf_buf to. If the > mbuf allocation fails, you don't seem to free the sf_buf anywhere and > consequently, it looks as though you may leak sf_bufs. But the mbuf is allocated using M_WAIT. Can that fail? I haven't kept up with the mbuf changes in -current. > - You only m_freem() on mb (the header mbuf) if mb->m_next != NULL, > but if there is no m_next (m_next == NULL), you don't seem to free the mb > mbuf (header mbuf) at all. Is this meant to be this way? (Note that it > may very well be, I haven't looked at all the other surrounding code, > just making sure). Yes. Like most of the NFS code, it is a little convoluted.. mb is a pre-existing mbuf chain that we're attaching mbufs to. In the failure case (where the mfreem I think you're talking about is), we backout what we've done by freeing the mbufs we've added to mb, return mb->next to null, and continue in the normal (copy) path. <... some helpful comments deleted ....> Many of your comments are directly related to -current, I think I'll let Ken address them... > I would strongly urge you to run some tests under real heavy network > activity (possibly lower NMBCLUSTERS for this) and starve out the mbuf > resources and see if anything strange happens - you may catch a couple of > leaks that may have accidently slipped through. Finally, I'd like to > suggest possibly breaking up some of the diff to smaller chunks, just so > it is easier to track things down if something does break. 
With -CURRENT > changing relatively dramatically now sometimes several times in a single > day, I think this would be worth it for everybody. FWIW, the client-side nfs changes (in their 4.0-RELEASE form) are in daily use in our lab and have been for months. We run experiments with 8 clients running against our Slice cluster nfs file server. Each client is close to maxed-out (60-70MB/sec per client, typically) for hours... ;) Thank you for your feedback. And thank you for impoving the mbuf system so much. I wasted a whole afternoon yesterday doing something which I could have done in 5 minutes if only I had mext_refcnt in 4.0 ;) Drew To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 19:18: 7 2000 Delivered-To: freebsd-arch@freebsd.org Received: from field.videotron.net (field.videotron.net [205.151.222.108]) by hub.freebsd.org (Postfix) with ESMTP id 94ECF37B400; Thu, 30 Nov 2000 19:18:03 -0800 (PST) Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213]) by field.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8) with ESMTP id <0G4V003BYD61AF@field.videotron.net>; Thu, 30 Nov 2000 22:18:01 -0500 (EST) Date: Thu, 30 Nov 2000 22:18:43 -0500 (EST) From: Bosko Milekic Subject: Re: zero copy code review In-reply-to: <14886.63486.157224.937225@grasshopper.cs.duke.edu> To: Andrew Gallatin Cc: "Kenneth D. Merry" , arch@FreeBSD.ORG, alfred@freebsd.org Message-id: MIME-version: 1.0 Content-type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Hi Andrew, On Thu, 30 Nov 2000, Andrew Gallatin wrote: [...] > Bosko Milekic writes: > > > > In general, I am pro-the zero copy stuff you've been > > gathering/merging/updating/writing/etc. over the past several months. 
> > Looking at the sendfile portion of your changes, it's pretty obvious that > > they are very minimal, but I'm curious as to why you've bothered removing > > the "static" before the sf_buf_free(). I can see why it really has no > > significance in the sf_buf_alloc() case, but sf_buf_free() is attached to > > the mbuf's m_ext free function pointer (I'm really just curious if the > > motivation was strictly stylistic). > > It was un-staticized because it is called by socow_iodone(), which > is the m_ext free for zero-copy transmissions. I see. But if the sendfile code still passes it as its own free routine, then shouldn't it remain staticized, strictly speaking? Although I may have missed it in the large diff, I did not see any changes to the actual registering of sf_bufs in the actual sendfile code (i.e. uipc_syscalls.c). I'm under the impression that in uipc_syscalls.c, the MEXTADD which sets up an sf_buf with an mbuf still passes sf_buf_free as its free routine. > > Here some other notes, which I came across during a real quick read > > of some of the code (I am sort of in a pre-final-exam period, so I can't > > dedicate too much time to this for the next 2 weeks, about :-( ): > > > > in nfs/nfs_serv.c: > > In your first "BEGIN SUSPECT REGION" block: > > > > - You allocate an sf_buf somewhere down the line and then attempt to > > allocate an mbuf to which you will hope to attach the sf_buf to. If the > > mbuf allocation fails, you don't seem to free the sf_buf anywhere and > > consequently, it looks as though you may leak sf_bufs. > > But the mbuf is allocated using M_WAIT. Can that fail? I haven't > kept up with the mbuf changes in -current. Yes, it can. M_WAIT just means "if nothing is available, first drain the stacks and if still nothing is available, then wait kern.ipc.mbuf_wait ticks (sysctl) and if still nothing is available, fail and set the passed in pointer to NULL and hope that the caller will deal with it." 
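A userland sketch of the pattern being discussed, under stated assumptions: `pool`, `pool_get`, `pool_put` and `attach_buf` are hypothetical stand-ins for the sf_buf and mbuf allocators, not the kernel API, and the retry loop only imitates a bounded wait. The point it shows is that once an allocation with wait semantics can still fail, the caller has to release whatever it acquired earlier.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical fixed-size pool standing in for a kernel resource. */
struct pool {
    int avail;      /* objects currently free */
    int in_use;     /* objects currently handed out */
};

/*
 * Bounded-wait allocation: try up to max_tries times, then give up.
 * Returns 0 on success, -1 if nothing became available.
 */
int
pool_get(struct pool *p, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (p->avail > 0) {
            p->avail--;
            p->in_use++;
            return 0;
        }
        /* A real allocator would sleep here until woken or timed out. */
    }
    return -1;
}

/* Return one object to the pool. */
void
pool_put(struct pool *p)
{
    p->in_use--;
    p->avail++;
}

/*
 * Acquire a page buffer, then a header to attach it to.  If the second
 * allocation fails, the first one is released before returning.
 */
int
attach_buf(struct pool *bufs, struct pool *hdrs)
{
    if (pool_get(bufs, 1) != 0)
        return ENOBUFS;
    if (pool_get(hdrs, 3) != 0) {
        pool_put(bufs);     /* undo step one, otherwise the buffer leaks */
        return ENOBUFS;
    }
    return 0;
}
```

Starving `hdrs` while `bufs` still has free objects is a quick way to exercise the failure branch; after a failed call the `bufs` counters should be unchanged.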
Waiting indefinitely can be dangerous in certain situations (for mbufs) but I won't get into that here.

In your code, you do deal with the possibility of MGETHDR returning NULL (you check for it): you set ENOBUFS in that case and jump to the "errorpath" label. But, before using MGETHDR, you allocate an sf_buf (in sf), and the code beyond "errorpath" does not free the sf_buf you allocated before even trying to allocate the mbuf.

Another thing to note, especially if you are Pre-SMPng: sf_buf_alloc calls can block, even indefinitely (until the allocation is successful). In sendfile(2), this doesn't matter as you're not allocating the sf_buf from an interrupt. It has the potential to be a problem if you start allocating sf_bufs from interrupt context. Unfortunately, I haven't yet read and fully visualized all the code in the large diff, but this is something to take into account when reviewing.

> > - You only m_freem() on mb (the header mbuf) if mb->m_next != NULL,
> > but if there is no m_next (m_next == NULL), you don't seem to free the mb
> > mbuf (header mbuf) at all. Is this meant to be this way? (Note that it
> > may very well be, I haven't looked at all the other surrounding code,
> > just making sure).
>
> Yes. Like most of the NFS code, it is a little convoluted.. mb is a
> pre-existing mbuf chain that we're attaching mbufs to. In the failure
> case (where the mfreem I think you're talking about is), we backout
> what we've done by freeing the mbufs we've added to mb, return
> mb->next to null, and continue in the normal (copy) path.

Excellent.

> <... some helpful comments deleted ....>
>
> Many of your comments are directly related to -current, I
> think I'll let Ken address them...

Another one directly related to -CURRENT: I just noticed that the uipc_jumbo.c stuff does not do any locking. Perhaps it would be nice to lock the code sooner or later.
I would be willing to go over it and do it but, as I said, I am really not going to be able to do much until 2 weeks from now.

Furthermore, I wonder if Alfred is gutsy enough ( :-) ) to raise his voice and let us know how much this may interfere with the adding of locks to sockets in the uipc subsystem, and possibly the stack as well. Alfred, where are the potential problems? (As you've already written a portion of the latter, I assume you're very well aware)...

> > I would strongly urge you to run some tests under real heavy network
> > activity (possibly lower NMBCLUSTERS for this) and starve out the mbuf
> > resources and see if anything strange happens - you may catch a couple of
> > leaks that may have accidentally slipped through. Finally, I'd like to
> > suggest possibly breaking up some of the diff to smaller chunks, just so
> > it is easier to track things down if something does break. With -CURRENT
> > changing relatively dramatically now sometimes several times in a single
> > day, I think this would be worth it for everybody.
>
> FWIW, the client-side nfs changes (in their 4.0-RELEASE form) are in
> daily use in our lab and have been for months. We run experiments with
> 8 clients running against our Slice cluster nfs file server. Each
> client is close to maxed-out (60-70MB/sec per client, typically) for
> hours... ;)

Okay. Well, it's my understanding that the code is pretty stable; I just want to make sure that the case is the same in -CURRENT, especially when _mbufs_ are _completely_ starved.

> Thank you for your feedback. And thank you for improving the mbuf
> system so much. I wasted a whole afternoon yesterday doing something
> which I could have done in 5 minutes if only I had mext_refcnt in 4.0 ;)

Heh; no problem, really. :-) Thanks!
> Drew Cheers, Bosko Milekic bmilekic@technokratis.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 19:47:43 2000 Delivered-To: freebsd-arch@freebsd.org Received: from peorth.iteration.net (peorth.iteration.net [208.190.180.178]) by hub.freebsd.org (Postfix) with ESMTP id 9365837B400; Thu, 30 Nov 2000 19:47:41 -0800 (PST) Received: by peorth.iteration.net (Postfix, from userid 1001) id 32863573A9; Thu, 30 Nov 2000 21:47:45 -0600 (CST) Date: Thu, 30 Nov 2000 21:47:45 -0600 From: "Michael C . Wu" To: Peter Jeremy Cc: Poul-Henning Kamp , arch@FreeBSD.ORG Subject: Re: RANDOMDEV inspired realitycheck regarding i386/i486... Message-ID: <20001130214745.E28757@peorth.iteration.net> Reply-To: "Michael C . Wu" References: <11485.974210886@critter> <20001201102915.G1474@gsmx07.alcatel.com.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20001201102915.G1474@gsmx07.alcatel.com.au>; from peter.jeremy@alcatel.com.au on Fri, Dec 01, 2000 at 10:29:15AM +1100 X-PGP-Fingerprint: 5025 F691 F943 8128 48A8 5025 77CE 29C5 8FA1 2E20 X-PGP-Key-ID: 0x8FA12E20 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Fri, Dec 01, 2000 at 10:29:15AM +1100, Peter Jeremy scribbled: | On 2000-Nov-14 15:08:06 +0100, Poul-Henning Kamp wrote: | >Has anybody run a 486 or 386 under current recently ? | | X on a PRE_SMPNG 486 is painful - mouse movements no longer make | the X pointer move in real time. I haven't noticed the seeding | issue (probably just luck). PRE_SMPNG does not have the /dev/random seeding issue. You actually expected X to run well on a 486? :-) | >What is the consensus ? | | I think 386/486 remains a significant market and would not like to | see support dropped. 
I'd go so far as to suggest that if -current | does drop support for the 386/486, the then-stable version will need | to be actively maintained indefinitely to provide continued support. I do not really think the latest XFree86 versions were designed with running 386/486 in mind. 386/486 is still a market, but not many people try to build an embedded system with a full X and tools. -- +------------------------------------------------------------------+ | keichii@peorth.iteration.net | keichii@bsdconspiracy.net | | http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. | +------------------------------------------------------------------+ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 20:21:53 2000 Delivered-To: freebsd-arch@freebsd.org Received: from netau1.alcanet.com.au (ntp.alcanet.com.au [203.62.196.27]) by hub.freebsd.org (Postfix) with ESMTP id 1C7F637B400 for ; Thu, 30 Nov 2000 20:21:49 -0800 (PST) Received: from mfg1.cim.alcatel.com.au (mfg1.cim.alcatel.com.au [139.188.23.1]) by netau1.alcanet.com.au (8.9.3 (PHNE_18979)/8.9.3) with ESMTP id PAA25911; Fri, 1 Dec 2000 15:21:44 +1100 (EDT) Received: from gsmx07.alcatel.com.au by cim.alcatel.com.au (PMDF V5.2-32 #37641) with ESMTP id <01JX70PFD0XCE7XDQI@cim.alcatel.com.au>; Fri, 1 Dec 2000 15:21:39 +1100 Received: (from jeremyp@localhost) by gsmx07.alcatel.com.au (8.11.0/8.11.0) id eB14LbJ03578; Fri, 01 Dec 2000 15:21:37 +1100 (EST envelope-from jeremyp) Content-return: prohibited Date: Fri, 01 Dec 2000 15:21:37 +1100 From: Peter Jeremy Subject: Re: RANDOMDEV inspired realitycheck regarding i386/i486... In-reply-to: <20001130214745.E28757@peorth.iteration.net>; from keichii@iteration.net on Thu, Nov 30, 2000 at 09:47:45PM -0600 To: "Michael C . Wu" Cc: arch@FreeBSD.ORG Mail-followup-to: "Michael C . 
Wu" , arch@FreeBSD.ORG Message-id: <20001201152137.K1474@gsmx07.alcatel.com.au> MIME-version: 1.0 Content-type: text/plain; charset=us-ascii Content-disposition: inline User-Agent: Mutt/1.2.5i References: <11485.974210886@critter> <20001201102915.G1474@gsmx07.alcatel.com.au> <20001130214745.E28757@peorth.iteration.net> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On 2000-Nov-30 21:47:45 -0600, "Michael C . Wu" wrote: >On Fri, Dec 01, 2000 at 10:29:15AM +1100, Peter Jeremy scribbled: >| On 2000-Nov-14 15:08:06 +0100, Poul-Henning Kamp wrote: >| >Has anybody run a 486 or 386 under current recently ? >| >| X on a PRE_SMPNG 486 is painful - mouse movements no longer make >| the X pointer move in real time. I haven't noticed the seeding >| issue (probably just luck). > >PRE_SMPNG does not have the /dev/random seeding issue. > >You actually expected X to run well on a 486? :-) It used to run reasonably well (ignoring hogs like Netscape) before Yarrow was added. I'm hoping that once yarrow is threaded performance will return to a usable level. Keep in mind that a 486 is relatively powerful compared to the available systems when X was designed. >I do not really think the latest XFree86 versions were designed >with running 386/486 in mind. 386/486 is still a market, but >not many people try to build an embedded system with a full X >and tools. I'm running XFree86 3.x, rather than 4.x. I agree that X is unlikely in most embedded applications, but blocking in the kernel for an extended period is likely to be equally unacceptable. 
Peter To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 20:34:14 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 5D47F37B400 for ; Thu, 30 Nov 2000 20:34:12 -0800 (PST) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id eB14Y8M19291; Thu, 30 Nov 2000 20:34:08 -0800 (PST) Date: Thu, 30 Nov 2000 20:34:08 -0800 From: Alfred Perlstein To: Bosko Milekic Cc: Andrew Gallatin , "Kenneth D. Merry" , arch@FreeBSD.ORG Subject: Re: zero copy code review Message-ID: <20001130203407.I8051@fw.wintelcom.net> References: <14886.63486.157224.937225@grasshopper.cs.duke.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from bmilekic@technokratis.com on Thu, Nov 30, 2000 at 10:18:43PM -0500 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Bosko Milekic [001130 19:18] wrote: > > Furthermore, I wonder if Alfred is gutsy enough ( :-) ) to raise his > voice and let us know how much this may interefere with the adding of > locks to sockets in the uipc subsystem, and possibly the stack as well. > Alfred, where are the potential problems? (As you've already written a > portion of the latter, I assume you're very well aware)... This will be somewhat of a large setback for me, but I'm sure I can work around it. If not it will have to go. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 23:16:27 2000 Delivered-To: freebsd-arch@freebsd.org Received: from panzer.kdm.org (panzer.kdm.org [216.160.178.169]) by hub.freebsd.org (Postfix) with ESMTP id C4DDC37B401; Thu, 30 Nov 2000 23:16:22 -0800 (PST) Received: (from ken@localhost) by panzer.kdm.org (8.9.3/8.9.1) id AAA11183; Fri, 1 Dec 2000 00:16:19 -0700 (MST) (envelope-from ken) Date: Fri, 1 Dec 2000 00:16:19 -0700 From: "Kenneth D. Merry" To: Bosko Milekic Cc: arch@FreeBSD.ORG, gallatin@FreeBSD.ORG Subject: Re: zero copy code review Message-ID: <20001201001619.C10772@panzer.kdm.org> References: <20001129231653.A1503@panzer.kdm.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2i In-Reply-To: ; from bmilekic@technokratis.com on Thu, Nov 30, 2000 at 06:00:38PM -0500 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG [ Drew answered some of this, I'll try to answer the rest. ] Thanks for looking at the code! On Thu, Nov 30, 2000 at 18:00:38 -0500, Bosko Milekic wrote: [ ... ] [ Drew answered this part ] > - In the actual MEXTADD(), you don't seem to be passing the M_RDONLY > flag (which is done for sendfile buffer ext mbufs). M_RDONLY is used to > indicate to the rest of the code that the m_data is not to be tampered > with (trimmed, et al) -- in other words, it's read-only. Have you > considered it? That was an oversight, I'll add the flag. (The places where it is used are in uipc_cow.c, if_ti.c and nfs_serv.c.) > - Stylistic suggestion: please try to keep things 25x80. :-) I try, and I think most of the changes are, except for the NFS stuff. I didn't reformat that, although I suppose I could. (It irritates me, too.) 
> [ skipped all the other NFS + ti driver changes ] > > jumbo.h: > I would like to eventually split the cluster code out of mbuf.h and > uipc_mbuf.c and change jumbo.h/uipc_jumbo.c -> cluster.h/uipc_cluster.c > > mbuf.h: > - Make EXT_DISPOSABLE 3, instead of 300... if you decide to keep it. > The reason I say this is because it seems to me that EXT_DISPOSABLE should > be more of an m_flag than an ext_type, which would probably mean that we > should make m_flag bigger than a short (which it is now). The reason I > argue this is because EXT_DISPOSABLE seems to be more of an indication of > what should be done with the contents of the mbuf. Perhaps what needs to > be done instead is make the EXT_DISPOSABLE flag, have if_ti use the DRV > ext type (like it should be doing) for its external buffers, and make it > set EXT_DISPOSABLE|M_RDONLY during the MEXTADD. Let's not get too strict > with this for now, though, it would be better to make sure everything is > working perfectly until we decide what to do with this - and it can be > changed easily later. In its current incarnation, EXT_DISPOSABLE indicates that the the memory used in the mbuf can be disposed of -- i.e. removed from the kernel's virtual address map. The contents aren't disposed of, they're just moved elsewhere. I don't think most of the rest of the mbuf code is setup to deal with the memory inside a non-external mbuf going away. (Which would be the potential implication of having EXT_DISPOSABLE be a regular m_flag.) > tiio.h: Are you sure tiio.h belongs in src/sys/sys ? Well, it defines the interface for the character device front end for the ti(4) driver. Usually ioctls and supporting structures go in sys/sys. Would you suggest another location? > Also, have you checked whether any locking should be performed here? > Considering that this is all supposed to improve performance, it would be > nice if it didn't all need to run under Giant. I realize that some of this > will have to wait (i.e. 
VM), but what about the if_ti code? Is that > something that can be looked at RSN? When Bill converted the ti(4) driver from spls to mutexes, I did the same conversion on my modifications to the driver. Is that sufficient? I'm not terribly up-to-date on the mutex stuff. As for the rest of the code, since it was written pre-mutex, it still has the spls in the right places. I suppose that they would just need to be converted to mutexes. (Or is that an overly simplistic way to look at it? :) > I would strongly urge you to run some tests under real heavy network > activity (possibly lower NMBCLUSTERS for this) and starve out the mbuf > resources and see if anything strange happens - you may catch a couple of > leaks that may have accidently slipped through. Good idea, I'll do it if I have the time. :( > Finally, I'd like to > suggest possibly breaking up some of the diff to smaller chunks, just so > it is easier to track things down if something does break. With -CURRENT > changing relatively dramatically now sometimes several times in a single > day, I think this would be worth it for everybody. Heh, well, the big chunk is the Tigon firmware. :) Are you suggesting just splitting the diffs out into multiple files, or actually breaking the changes up? The latter would be rather difficult to do, I think. In any case, the changes aren't on by default, so folks can just not turn them on if they run into problems. Thanks for the review, I'll try to incorporate your suggestions. 
Ken -- Kenneth Merry ken@kdm.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 23:22:45 2000 Delivered-To: freebsd-arch@freebsd.org Received: from panzer.kdm.org (panzer.kdm.org [216.160.178.169]) by hub.freebsd.org (Postfix) with ESMTP id E129537B400; Thu, 30 Nov 2000 23:22:38 -0800 (PST) Received: (from ken@localhost) by panzer.kdm.org (8.9.3/8.9.1) id AAA11231; Fri, 1 Dec 2000 00:22:35 -0700 (MST) (envelope-from ken) Date: Fri, 1 Dec 2000 00:22:35 -0700 From: "Kenneth D. Merry" To: Bosko Milekic Cc: Andrew Gallatin , arch@FreeBSD.ORG, alfred@FreeBSD.ORG Subject: Re: zero copy code review Message-ID: <20001201002235.D10772@panzer.kdm.org> References: <14886.63486.157224.937225@grasshopper.cs.duke.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2i In-Reply-To: ; from bmilekic@technokratis.com on Thu, Nov 30, 2000 at 10:18:43PM -0500 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, Nov 30, 2000 at 22:18:43 -0500, Bosko Milekic wrote: > On Thu, 30 Nov 2000, Andrew Gallatin wrote: > > [...] > > <... some helpful comments deleted ....> > > > > Many of your comments are directly related to -current, I > > think I'll let Ken address them... > > Another one directly related to -CURRENT: > > I just noticed that the uipc_jumbo.c stuff does not do any locking. > Perhaps it would be nice to lock the code sooner or later. I would be > willing to go over it and do it but, as I said, I am really not going to > be able to do much until 2 weeks from now. It does have spls in the right places, in this case splimp() and splvm(). Would you just convert those to the proper mutexes, or are we going to go with per-data-structure mutexes (i.e. a little finer granularity), or...? (I don't know much about the mutex strategy we're using...) 
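The "per-data-structure mutex" option asked about above can be sketched in userland C. This is only an illustration under stated assumptions: a pthread mutex stands in for a kernel mutex, and struct jumbo_pool, pool_init(), pool_alloc() and pool_free() are hypothetical names, not the uipc_jumbo.c code. The point is that the lock covers one free list, where the spl version would raise the interrupt priority level with splimp() around the same critical section.

```c
#include <pthread.h>
#include <stddef.h>

#define POOL_SLOTS 4

/* One free-list entry; a real pool would carry a buffer address here. */
struct jumbo_slot {
	struct jumbo_slot *next;
};

/* Hypothetical pool: the mutex protects free_list and nfree, nothing else. */
struct jumbo_pool {
	pthread_mutex_t    lock;
	struct jumbo_slot *free_list;
	int                nfree;
	struct jumbo_slot  slots[POOL_SLOTS];
};

static void
pool_init(struct jumbo_pool *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->free_list = NULL;
	p->nfree = 0;
	for (int i = 0; i < POOL_SLOTS; i++) {
		p->slots[i].next = p->free_list;
		p->free_list = &p->slots[i];
		p->nfree++;
	}
}

/* Take one slot off the free list, or return NULL if the pool is empty. */
static struct jumbo_slot *
pool_alloc(struct jumbo_pool *p)
{
	struct jumbo_slot *s;

	pthread_mutex_lock(&p->lock);	/* spl code would call splimp() here */
	s = p->free_list;
	if (s != NULL) {
		p->free_list = s->next;
		p->nfree--;
	}
	pthread_mutex_unlock(&p->lock);	/* spl code would call splx() here */
	return (s);
}

/* Put a slot back on the free list. */
static void
pool_free(struct jumbo_pool *p, struct jumbo_slot *s)
{
	pthread_mutex_lock(&p->lock);
	s->next = p->free_list;
	p->free_list = s;
	p->nfree++;
	pthread_mutex_unlock(&p->lock);
}
```

Because the lock belongs to the pool and not to a global priority level, two unrelated pools would not contend with each other; whether that granularity is worth it in the kernel is exactly the open question in the message.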
> Furthermore, I wonder if Alfred is gutsy enough ( :-) ) to raise his > voice and let us know how much this may interefere with the adding of > locks to sockets in the uipc subsystem, and possibly the stack as well. > Alfred, where are the potential problems? (As you've already written a > portion of the latter, I assume you're very well aware)... Hopefully it won't cause many problems.. Ken -- Kenneth Merry ken@kdm.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 23:24:39 2000 Delivered-To: freebsd-arch@freebsd.org Received: from panzer.kdm.org (panzer.kdm.org [216.160.178.169]) by hub.freebsd.org (Postfix) with ESMTP id 05A4037B400; Thu, 30 Nov 2000 23:24:37 -0800 (PST) Received: (from ken@localhost) by panzer.kdm.org (8.9.3/8.9.1) id AAA11249; Fri, 1 Dec 2000 00:24:36 -0700 (MST) (envelope-from ken) Date: Fri, 1 Dec 2000 00:24:36 -0700 From: "Kenneth D. Merry" To: Alfred Perlstein Cc: Bosko Milekic , Andrew Gallatin , arch@FreeBSD.ORG Subject: Re: zero copy code review Message-ID: <20001201002436.E10772@panzer.kdm.org> References: <14886.63486.157224.937225@grasshopper.cs.duke.edu> <20001130203407.I8051@fw.wintelcom.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2i In-Reply-To: <20001130203407.I8051@fw.wintelcom.net>; from alfred@FreeBSD.ORG on Thu, Nov 30, 2000 at 08:34:08PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, Nov 30, 2000 at 20:34:08 -0800, Alfred Perlstein wrote: > * Bosko Milekic [001130 19:18] wrote: > > > > Furthermore, I wonder if Alfred is gutsy enough ( :-) ) to raise his > > voice and let us know how much this may interefere with the adding of > > locks to sockets in the uipc subsystem, and possibly the stack as well. > > Alfred, where are the potential problems? 
(As you've already written a > > portion of the latter, I assume you're very well aware)... > > This will be somewhat of a large setback for me, but I'm sure I can > work around it. If not it will have to go. If you need explanations of things, feel free to let Drew or me know. Hopefully this won't be a major roadblock for your changes. Ken -- Kenneth Merry ken@kdm.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Nov 30 23:30:46 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 32A9937B401 for ; Thu, 30 Nov 2000 23:30:44 -0800 (PST) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id eB17Ubg23792; Thu, 30 Nov 2000 23:30:37 -0800 (PST) Date: Thu, 30 Nov 2000 23:30:37 -0800 From: Alfred Perlstein To: "Kenneth D. Merry" Cc: Bosko Milekic , Andrew Gallatin , arch@FreeBSD.ORG Subject: Re: zero copy code review Message-ID: <20001130233037.L8051@fw.wintelcom.net> References: <14886.63486.157224.937225@grasshopper.cs.duke.edu> <20001201002235.D10772@panzer.kdm.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20001201002235.D10772@panzer.kdm.org>; from ken@kdm.org on Fri, Dec 01, 2000 at 12:22:35AM -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Kenneth D. Merry [001130 23:22] wrote: > On Thu, Nov 30, 2000 at 22:18:43 -0500, Bosko Milekic wrote: > > On Thu, 30 Nov 2000, Andrew Gallatin wrote: > > > > [...] > > > <... some helpful comments deleted ....> > > > > > > Many of your comments are directly related to -current, I > > > think I'll let Ken address them... > > > > Another one directly related to -CURRENT: > > > > I just noticed that the uipc_jumbo.c stuff does not do any locking. 
> > Perhaps it would be nice to lock the code sooner or later. I would be > > willing to go over it and do it but, as I said, I am really not going to > > be able to do much until 2 weeks from now. > > It does have spls in the right places, in this case splimp() and splvm(). > Would you just convert those to the proper mutexes, or are we going to go > with per-data-structure mutexes (i.e. a little finer granularity), or...? > (I don't know much about the mutex strategy we're using...) The vm system is likely to be the last thing to be locked down, if your code dips in the vm system you'll have to aquire Giant, possibly several times through your codepath, the performance can drop dramatically for the SMP case. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Dec 1 10:17:29 2000 Delivered-To: freebsd-arch@freebsd.org Received: from wall.polstra.com (unknown [206.213.115.74]) by hub.freebsd.org (Postfix) with ESMTP id 9E1D537B400 for ; Fri, 1 Dec 2000 10:17:25 -0800 (PST) Received: from vashon.polstra.com (vashon.polstra.com [206.213.73.13]) by wall.polstra.com (8.9.3/8.9.3) with ESMTP id KAA22993; Fri, 1 Dec 2000 10:11:53 -0800 (PST) (envelope-from jdp@wall.polstra.com) Received: (from jdp@localhost) by vashon.polstra.com (8.11.0/8.11.0) id eB1IBqY01763; Fri, 1 Dec 2000 10:11:52 -0800 (PST) (envelope-from jdp) Date: Fri, 1 Dec 2000 10:11:52 -0800 (PST) Message-Id: <200012011811.eB1IBqY01763@vashon.polstra.com> To: arch@freebsd.org From: John Polstra Reply-To: arch@freebsd.org Cc: marcel@cup.hp.com Subject: Re: Modifying FILE to add lock In-Reply-To: <3A257ABD.5238ED4E@cup.hp.com> References: <3A257ABD.5238ED4E@cup.hp.com> Organization: Polstra & Co., Seattle, WA Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In article 
<3A257ABD.5238ED4E@cup.hp.com>, Marcel Moolenaar wrote: > > Having done the signal changes, I immediately have to think about the > Modula port... Thank you, Marcel. :-) Modula-3 does indeed have its own rendition of the FILE structure, which is supposed to match the system's version exactly. So it is a problem, in theory. In practice it is not such a problem, because as far as I know, there aren't any Modula-3 programs which use the stdio interface for their I/O. Modula-3 has its own I/O system which uses read() and write() rather than stdio. The #1 biggest hassle with the Modula-3 stuff is that it has Modula-3 versions of all of the system structures, and they have to match exactly for things to work. Some day I swear I'm going to work out a way to generate the M3 versions automatically from the header files in /usr/include ... John -- John Polstra jdp@polstra.com John D. Polstra & Co., Inc. Seattle, Washington USA "Disappointment is a good sign of basic intelligence." -- Chögyam Trungpa To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Dec 1 12: 8:20 2000 Delivered-To: freebsd-arch@freebsd.org Received: from rover.village.org (rover.village.org [204.144.255.66]) by hub.freebsd.org (Postfix) with ESMTP id E443A37B400 for ; Fri, 1 Dec 2000 12:08:17 -0800 (PST) Received: from harmony.village.org (harmony.village.org [10.0.0.6]) by rover.village.org (8.11.0/8.11.0) with ESMTP id eB1K8DQ79210; Fri, 1 Dec 2000 13:08:13 -0700 (MST) (envelope-from imp@harmony.village.org) Received: from harmony.village.org (localhost.village.org [127.0.0.1]) by harmony.village.org (8.9.3/8.8.3) with ESMTP id NAA08306; Fri, 1 Dec 2000 13:08:12 -0700 (MST) Message-Id: <200012012008.NAA08306@harmony.village.org> To: Wes Peters Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.) 
Cc: arch@FreeBSD.ORG In-reply-to: Your message of "Wed, 29 Nov 2000 00:54:42 MST." <3A24B642.34B50961@softweyr.com> References: <3A24B642.34B50961@softweyr.com> <52694.975362925@winston.osd.bsdi.com> <20001127144809.A67395@citusc17.usc.edu> <200011272307.eARN7Ln34886@earth.backplane.com> Date: Fri, 01 Dec 2000 13:08:12 -0700 From: Warner Losh Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In message <3A24B642.34B50961@softweyr.com> Wes Peters writes: : IMHO, this is one of the biggest arguments for using bash. I get bitten : all the time when I leave bash for another interactive program that no : longer provides BS/DEL compatibility. Fixing it everywhere is a good : idea. I see that this has already been committed. I'm not going to argue with that (I think it was a good idea), but there are other issues in the tree. The issue that I have is that there are many places in the tree where the erase character is known and things are done based on it. Will all of those be updated to have the two aces? There's a hack in hack right now: ./games/hack/hack.tty.c: if(c == erase_char || c == '\b') { as well as other examples in the tree. Talk also has a provision for transporting these characters over the interface. If both were allowed, some translation would also be needed. 
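A minimal sketch of what "accept both" could look like in one of those places, assuming a hypothetical helper is_erase() and a hypothetical line_input() routine (neither is in the tree). Here erase_char stands for whatever the tty reports as its erase character, and BS and DEL are accepted in addition to it.

```c
#include <stddef.h>

#define CH_BS	'\b'	/* backspace, 0x08 */
#define CH_DEL	0x7f	/* delete */

/* Hypothetical helper: the configured erase character, BS and DEL all erase. */
static int
is_erase(int c, int erase_char)
{
	return (c == erase_char || c == CH_BS || c == CH_DEL);
}

/*
 * Apply one input character to a line buffer and return the new length.
 * An erase character drops the last byte; anything else is appended
 * if there is room.
 */
static size_t
line_input(char *buf, size_t len, size_t cap, int c, int erase_char)
{
	if (is_erase(c, erase_char)) {
		if (len > 0)
			len--;
		return (len);
	}
	if (len < cap)
		buf[len++] = (char)c;
	return (len);
}
```

Funnelling every comparison through one helper is the design choice being illustrated: call sites that currently test erase_char directly would only need to change in one way, and a program like talk that sends the character over the wire could translate at the same point.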
Warner To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Dec 1 14:51:19 2000 Delivered-To: freebsd-arch@freebsd.org Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by hub.freebsd.org (Postfix) with ESMTP id 7454137B400; Fri, 1 Dec 2000 14:51:16 -0800 (PST) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id RAA23194; Fri, 1 Dec 2000 17:51:14 -0500 (EST) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.11.1/8.9.1) id eB1MpEp06117; Fri, 1 Dec 2000 17:51:14 -0500 (EST) (envelope-from gallatin@cs.duke.edu) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Date: Fri, 1 Dec 2000 17:51:14 -0500 (EST) To: Bosko Milekic Cc: "Kenneth D. Merry" , arch@FreeBSD.ORG, alfred@FreeBSD.ORG Subject: Re: zero copy code review In-Reply-To: References: <14886.63486.157224.937225@grasshopper.cs.duke.edu> X-Mailer: VM 6.43 under 20.4 "Emerald" XEmacs Lucid Message-ID: <14888.9802.415926.434956@grasshopper.cs.duke.edu> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Bosko Milekic writes: > > It was un-staticized because it is called by socow_iodone(), which > > is the m_ext free for zero-copy transmissions. > > I see. But if the sendfile code still passes it as its own free > routine, then shouldn't it remain staticized, strictly speaking? Although > I may have missed it in the large diff, I did not see any changes to the > actual registering of sf_bufs in the actual sendfile code (i.e. > uipc_syscalls.c). I'm under the impression that in uipc_syscalls.c, the > MEXTADD which sets up an sf_buf with an mbuf still passes sf_buf_free as > its free routine. I'm still not sure I understand your objection. There's some code in socow_cowsetup() which uses sf bufs. 
Prior to allocating the sf_buf, it does some of its own fiddling with the page and introduces some state that sf_buf_free() wouldn't know how to clear. socow_iodone() undoes that fiddling and then calls sf_buf_free() to free the sf_buf. Isn't it better to call sf_buf_free() than to cut & paste the code? <...> > > But the mbuf is allocated using M_WAIT. Can that fail? I haven't > > kept up with the mbuf changes in -current. > > Yes, it can. M_WAIT just means "if nothing is available, first drain Eeek! I had no idea; I was thinking of it as blocking forever. This will have to be addressed. Thank you for pointing it out! > the stacks and if still nothing is available, then wait > kern.ipc.mbuf_wait ticks (sysctl) and if still nothing is available, fail > and set the passed-in pointer to NULL and hope that the caller will deal > with it." Waiting indefinitely can be dangerous in certain situations (for > mbufs) but I won't get into that here. > In your code, you do deal with the possibility of the MGETHDR > returning NULL (you check for it) and you set ENOBUFS in that case and > jump to the "errorpath" label. But, before using MGETHDR, you allocate an > sf_buf (in sf) and it just so happens that the code beyond "errorpath" > does not take care of freeing the sf_buf you allocated before even > trying to allocate the mbuf. I see your point. This was copied (bug for bug ;-) from sendfile itself. Look at line 1700 or so of kern/uipc_syscalls.c.. This bug should probably be fixed there too.. > Another thing to note, especially if you are Pre-SMPng: sf_buf_alloc > calls can block, and even indefinitely (until the allocation is > successful). In sendfile(2), this doesn't matter as you're not allocating > the sf_buf from an interrupt. It has the potential to be a problem if you > start allocating sf_bufs from interrupt context. Unfortunately, I haven't > yet read+fully visualized all the code in the large diff, but this is > something to take into account when reviewing.
The nfs sf_buf_alloc() calls will be made from either a process context (when doing a zero-copy send over a socket) or from the context of an nfsiod for the NFS code, so I think this should be safe. Thanks! Drew To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Dec 1 15:40:14 2000 Delivered-To: freebsd-arch@freebsd.org Received: from implode.root.com (root.com [209.102.106.178]) by hub.freebsd.org (Postfix) with ESMTP id 748C137B6D0; Fri, 1 Dec 2000 15:29:56 -0800 (PST) Received: from implode.root.com (localhost [127.0.0.1]) by implode.root.com (8.8.8/8.8.5) with ESMTP id PAA14154; Fri, 1 Dec 2000 15:26:19 -0800 (PST) Message-Id: <200012012326.PAA14154@implode.root.com> To: Andrew Gallatin Cc: Bosko Milekic , "Kenneth D. Merry" , arch@FreeBSD.ORG, alfred@FreeBSD.ORG Subject: Re: zero copy code review In-reply-to: Your message of "Fri, 01 Dec 2000 17:51:14 EST." <14888.9802.415926.434956@grasshopper.cs.duke.edu> From: David Greenman Reply-To: dg@root.com Date: Fri, 01 Dec 2000 15:26:19 -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > In your code, you do deal with the possibility of the MGETHDR > > returning NULL (you check for it) and you set ENOBUFS in that case and > > jump to the "errorpath" label. But, before using MGETHDR, you allocate an > > sf_buf (in sf) and it just so happens that the code beyond "errorpath" > > does not take care of freeing the sf_buf you allocated before even > > trying to allocate the mbuf. > >I see your point. This was copied, (bug for bug ;-), from sendfile itself. >Look at line 1700 or so of kern/uipc_syscalls.c.. This bug should >probaby be fixed there too.. Oops. The original assumption (and code that I wrote) was that M_WAIT _cannot_ return a NULL pointer. 
This was changed in FreeBSD recently, and as you mentioned, the code added in rev 1.65 that now checks for it in sendfile doesn't do complete cleanup in this case. It definitely should be fixed so that the sf_buf is freed as well. -DG David Greenman Co-founder, The FreeBSD Project - http://www.freebsd.org President, TeraSolutions, Inc. - http://www.terasolutions.com Pave the road of life with opportunities. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Dec 1 16: 5:11 2000 Delivered-To: freebsd-arch@freebsd.org Received: from implode.root.com (root.com [209.102.106.178]) by hub.freebsd.org (Postfix) with ESMTP id 8CE2B37B400; Fri, 1 Dec 2000 16:05:07 -0800 (PST) Received: from implode.root.com (localhost [127.0.0.1]) by implode.root.com (8.8.8/8.8.5) with ESMTP id QAA14320; Fri, 1 Dec 2000 16:01:42 -0800 (PST) Message-Id: <200012020001.QAA14320@implode.root.com> To: Andrew Gallatin , Bosko Milekic , "Kenneth D. Merry" , arch@FreeBSD.ORG, alfred@FreeBSD.ORG Subject: Re: zero copy code review In-reply-to: Your message of "Fri, 01 Dec 2000 15:26:19 PST." <200012012326.PAA14154@implode.root.com> From: David Greenman Reply-To: dg@root.com Date: Fri, 01 Dec 2000 16:01:41 -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG >> > In your code, you do deal with the possibility of the MGETHDR >> > returning NULL (you check for it) and you set ENOBUFS in that case and >> > jump to the "errorpath" label. But, before using MGETHDR, you allocate an >> > sf_buf (in sf) and it just so happens that the code beyond "errorpath" >> > does not take care of freeing the sf_buf you allocated before even >> > trying to allocate the mbuf. >> >>I see your point. This was copied (bug for bug ;-) from sendfile itself. >>Look at line 1700 or so of kern/uipc_syscalls.c.. This bug should >>probably be fixed there too.. > > Oops.
The original assumption (and code that I wrote) was that M_WAIT >_cannot_ return a NULL pointer. This was changed in FreeBSD recently, and >as you mentioned, the code added in rev 1.65 that now checks for it in >sendfile doesn't do complete cleanup in this case. It definitely should >be fixed so that the sf_buf is freed as well. Followup...the attached patch should fix the problem. -DG David Greenman Co-founder, The FreeBSD Project - http://www.freebsd.org President, TeraSolutions, Inc. - http://www.terasolutions.com Pave the road of life with opportunities.

Index: uipc_syscalls.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_syscalls.c,v
retrieving revision 1.65.2.3
diff -c -r1.65.2.3 uipc_syscalls.c
*** uipc_syscalls.c	2000/08/16 19:20:31	1.65.2.3
--- uipc_syscalls.c	2000/12/01 23:54:19
***************
*** 1628,1633 ****
--- 1630,1636 ----
  		MGETHDR(m, M_WAIT, MT_DATA);
  		if (m == NULL) {
  			error = ENOBUFS;
+ 			sf_buf_free((void *)sf->kva, PAGE_SIZE);
  			goto done;
  		}
  		m->m_ext.ext_free = sf_buf_free;

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Dec 1 16:50:59 2000 Delivered-To: freebsd-arch@freebsd.org Received: from falla.videotron.net (falla.videotron.net [205.151.222.106]) by hub.freebsd.org (Postfix) with ESMTP id 02E4137B400 for ; Fri, 1 Dec 2000 16:50:58 -0800 (PST) Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213]) by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8) with ESMTP id <0G4X00ICE10VZM@falla.videotron.net> for arch@FreeBSD.ORG; Fri, 1 Dec 2000 19:50:55 -0500 (EST) Date: Fri, 01 Dec 2000 19:51:39 -0500 (EST) From: Bosko Milekic Subject: Re: zero copy code review In-reply-to: <200012020001.QAA14320@implode.root.com> To: David Greenman Cc: arch@FreeBSD.ORG Message-id: MIME-version: 1.0 Content-type: TEXT/PLAIN; charset=US-ASCII Sender:
owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Fri, 1 Dec 2000, David Greenman wrote: > Followup...the attached patch should fix the problem. > > -DG > > David Greenman > Co-founder, The FreeBSD Project - http://www.freebsd.org > President, TeraSolutions, Inc. - http://www.terasolutions.com > Pave the road of life with opportunities. Cool. Committed to both -CURRENT and -STABLE... Cheers, Bosko. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Dec 1 17:57:11 2000 Delivered-To: freebsd-arch@freebsd.org Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by hub.freebsd.org (Postfix) with ESMTP id E5C6437B400; Fri, 1 Dec 2000 17:57:08 -0800 (PST) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id UAA25666; Fri, 1 Dec 2000 20:57:01 -0500 (EST) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.11.1/8.9.1) id eB21v1Y06449; Fri, 1 Dec 2000 20:57:01 -0500 (EST) (envelope-from gallatin@cs.duke.edu) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Date: Fri, 1 Dec 2000 20:57:00 -0500 (EST) To: dg@root.com Cc: Bosko Milekic , "Kenneth D. Merry" , arch@FreeBSD.ORG, alfred@FreeBSD.ORG Subject: Re: zero copy code review In-Reply-To: <200012012326.PAA14154@implode.root.com> References: <14888.9802.415926.434956@grasshopper.cs.duke.edu> <200012012326.PAA14154@implode.root.com> X-Mailer: VM 6.43 under 20.4 "Emerald" XEmacs Lucid Message-ID: <14888.22179.833528.247128@grasshopper.cs.duke.edu> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG David Greenman writes: > Oops. The original assumption (and code that I wrote) was that M_WAIT > _cannot_ return a NULL pointer. This was changed in FreeBSD recently, and Yes, that's always been my assumption too. 
That's why I never noticed it... Drew To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Dec 1 18: 1:25 2000 Delivered-To: freebsd-arch@freebsd.org Received: from feral.com (feral.com [192.67.166.1]) by hub.freebsd.org (Postfix) with ESMTP id 0C2B737B400; Fri, 1 Dec 2000 18:01:24 -0800 (PST) Received: from beppo (beppo [192.67.166.79]) by feral.com (8.9.3/8.9.3) with ESMTP id SAA08647; Fri, 1 Dec 2000 18:01:03 -0800 Date: Fri, 1 Dec 2000 18:01:04 -0800 (PST) From: Matthew Jacob Reply-To: mjacob@feral.com To: Andrew Gallatin Cc: dg@root.com, Bosko Milekic , "Kenneth D. Merry" , arch@FreeBSD.ORG, alfred@FreeBSD.ORG Subject: Re: zero copy code review In-Reply-To: <14888.22179.833528.247128@grasshopper.cs.duke.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > David Greenman writes: > > Oops. The original assumption (and code that I wrote) was that M_WAIT > > _cannot_ return a NULL pointer. This was changed in FreeBSD recently, and > > Yes, that's always been my assumption too. That's why I never noticed > it... IIRC, this has never been guaranteed. It's often unlikely that a request can't be satisfied after a sleep with the current code. We used to kill off shell pipes by spraying Sparc-1s as a test. This was another reason (at the time) that SunOS (4.2based with 4.3 changes- pipes were implemented with mbufs) was considered eligible to be replaced with SVr4. 
-matt

From owner-freebsd-arch Fri Dec 1 18: 6:23 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from implode.root.com (root.com [209.102.106.178]) by hub.freebsd.org (Postfix) with ESMTP id D073737B401; Fri, 1 Dec 2000 18:06:19 -0800 (PST)
Received: from implode.root.com (localhost [127.0.0.1]) by implode.root.com (8.8.8/8.8.5) with ESMTP id SAA14681; Fri, 1 Dec 2000 18:02:20 -0800 (PST)
Message-Id: <200012020202.SAA14681@implode.root.com>
To: mjacob@feral.com
Cc: Andrew Gallatin, Bosko Milekic, "Kenneth D. Merry", arch@FreeBSD.ORG, alfred@FreeBSD.ORG
Subject: Re: zero copy code review
In-reply-to: Your message of "Fri, 01 Dec 2000 18:01:04 PST."
From: David Greenman
Reply-To: dg@root.com
Date: Fri, 01 Dec 2000 18:02:20 -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

>
>>
>> David Greenman writes:
>>  > Oops. The original assumption (and code that I wrote) was that M_WAIT
>>  > _cannot_ return a NULL pointer. This was changed in FreeBSD recently, and
>>
>> Yes, that's always been my assumption too. That's why I never noticed
>> it...
>
>IIRC, this has never been guaranteed. It's often unlikely that a request can't
>be satisfied after a sleep with the current code.

   FreeBSD blocked indefinitely and never returned a NULL pointer.

-DG

David Greenman
Co-founder, The FreeBSD Project - http://www.freebsd.org
President, TeraSolutions, Inc. - http://www.terasolutions.com
Pave the road of life with opportunities.

From owner-freebsd-arch Fri Dec 1 18: 6:28 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from feral.com (feral.com [192.67.166.1]) by hub.freebsd.org (Postfix) with ESMTP id 9D58637B400; Fri, 1 Dec 2000 18:06:26 -0800 (PST)
Received: from beppo (beppo [192.67.166.79]) by feral.com (8.9.3/8.9.3) with ESMTP id SAA08663; Fri, 1 Dec 2000 18:06:22 -0800
Date: Fri, 1 Dec 2000 18:06:22 -0800 (PST)
From: Matthew Jacob
Reply-To: mjacob@feral.com
To: David Greenman
Cc: Andrew Gallatin, Bosko Milekic, "Kenneth D. Merry", arch@FreeBSD.ORG, alfred@FreeBSD.ORG
Subject: Re: zero copy code review
In-Reply-To: <200012020202.SAA14681@implode.root.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> >
> >>
> >> David Greenman writes:
> >>  > Oops. The original assumption (and code that I wrote) was that M_WAIT
> >>  > _cannot_ return a NULL pointer. This was changed in FreeBSD recently, and
> >>
> >> Yes, that's always been my assumption too. That's why I never noticed
> >> it...
> >
> >IIRC, this has never been guaranteed. It's often unlikely that a request can't
> >be satisfied after a sleep with the current code.
>
>    FreeBSD blocked indefinitely and never returned a NULL pointer.

Smells like livelock somewhere here, but has it changed recently as has been
asserted?

From owner-freebsd-arch Fri Dec 1 18:26:36 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from implode.root.com (root.com [209.102.106.178]) by hub.freebsd.org (Postfix) with ESMTP id 99A2137B400; Fri, 1 Dec 2000 18:26:33 -0800 (PST)
Received: from implode.root.com (localhost [127.0.0.1]) by implode.root.com (8.8.8/8.8.5) with ESMTP id SAA14753; Fri, 1 Dec 2000 18:22:36 -0800 (PST)
Message-Id: <200012020222.SAA14753@implode.root.com>
To: mjacob@feral.com
Cc: Andrew Gallatin, Bosko Milekic, "Kenneth D. Merry", arch@FreeBSD.ORG, alfred@FreeBSD.ORG
Subject: Re: zero copy code review
In-reply-to: Your message of "Fri, 01 Dec 2000 18:06:22 PST."
From: David Greenman
Reply-To: dg@root.com
Date: Fri, 01 Dec 2000 18:22:36 -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

>> >> Yes, that's always been my assumption too. That's why I never noticed
>> >> it...
>> >
>> >IIRC, this has never been guaranteed. It's often unlikely that a request can't
>> >be satisfied after a sleep with the current code.
>>
>>    FreeBSD blocked indefinitely and never returned a NULL pointer.
>
>Smells like livelock somewhere here, but has it changed recently as has been
>asserted?

   Huh? No, the process allocating the memory blocks waiting for memory. If
memory never becomes available, then the process never wakes up, but this is
NOT a livelock.

-DG

David Greenman
Co-founder, The FreeBSD Project - http://www.freebsd.org
President, TeraSolutions, Inc. - http://www.terasolutions.com
Pave the road of life with opportunities.

From owner-freebsd-arch Fri Dec 1 18:43:38 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from feral.com (feral.com [192.67.166.1]) by hub.freebsd.org (Postfix) with ESMTP id 5B22C37B400; Fri, 1 Dec 2000 18:43:36 -0800 (PST)
Received: from beppo (beppo [192.67.166.79]) by feral.com (8.9.3/8.9.3) with ESMTP id SAA08716; Fri, 1 Dec 2000 18:43:33 -0800
Date: Fri, 1 Dec 2000 18:43:33 -0800 (PST)
From: Matthew Jacob
Reply-To: mjacob@feral.com
To: David Greenman
Cc: Andrew Gallatin, Bosko Milekic, "Kenneth D. Merry", arch@FreeBSD.ORG, alfred@FreeBSD.ORG
Subject: Re: zero copy code review
In-Reply-To: <200012020222.SAA14753@implode.root.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Fri, 1 Dec 2000, David Greenman wrote:

> >> >> Yes, that's always been my assumption too. That's why I never noticed
> >> >> it...
> >> >
> >> >IIRC, this has never been guaranteed. It's often unlikely that a request can't
> >> >be satisfied after a sleep with the current code.
> >>
> >>    FreeBSD blocked indefinitely and never returned a NULL pointer.
> >
> >Smells like livelock somewhere here, but has it changed recently as has been
> >asserted?
>
>    Huh? No, the process allocating the memory blocks waiting for memory. If
> memory never becomes available, then the process never wakes up, but this is
> NOT a livelock.
>

oops, sorry, you're right.

From owner-freebsd-arch Fri Dec 1 18:58:28 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [18.24.4.193]) by hub.freebsd.org (Postfix) with ESMTP id DEFE837B400 for ; Fri, 1 Dec 2000 18:58:25 -0800 (PST)
Received: (from wollman@localhost) by khavrinen.lcs.mit.edu (8.9.3/8.9.3) id VAA45432; Fri, 1 Dec 2000 21:58:21 -0500 (EST) (envelope-from wollman)
Date: Fri, 1 Dec 2000 21:58:21 -0500 (EST)
From: Garrett Wollman
Message-Id: <200012020258.VAA45432@khavrinen.lcs.mit.edu>
To: dg@root.com
Cc: arch@freebsd.org
Subject: Re: zero copy code review
X-Newsgroups: mit.lcs.mail.freebsd-arch
In-Reply-To:
References:
Organization: MIT Laboratory for Computer Science
Cc:
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In article you write:

> FreeBSD blocked indefinitely and never returned a NULL pointer.

It has never been like that in the FreeBSD era, to my knowledge. 4.3
(or at least 4.3+Wisconsin NFS) slept for mbufs but panicked if it
couldn't allocate a cluster; 4.4 as we got it would drain protocols
once, for mbufs only, and then return nil if there were still no mbufs
free -- thus causing a page-not-present fault a few instructions later
as code which assumed M_WAIT could never fail dereferenced the null
pointer.

Deadlocks may have been possible under 4.3+NFS, if the kernel wanted
to allocate a page of physical memory for more mbufs, but all
potentially-available memory was both dirty and backed by NFS (think
diskless workstation). My guess is that this is why 4.4 did not
sleep.

-GAWollman

--
Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
wollman@lcs.mit.edu  | O Siem / The fires of freedom
Opinions not those of| Dance in the burning flame
MIT, LCS, CRS, or NSA| - Susan Aglukark and Chad Irschick

From owner-freebsd-arch Fri Dec 1 22:37:50 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from homer.softweyr.com (bsdconspiracy.net [208.187.122.220]) by hub.freebsd.org (Postfix) with ESMTP id 2335C37B400 for ; Fri, 1 Dec 2000 22:37:45 -0800 (PST)
Received: from [127.0.0.1] (helo=softweyr.com ident=Fools trust ident!) by homer.softweyr.com with esmtp (Exim 3.16 #1) id 1426Lg-0000SZ-00; Fri, 01 Dec 2000 23:40:41 -0700
Message-ID: <3A289968.63C593E2@softweyr.com>
Date: Fri, 01 Dec 2000 23:40:40 -0700
From: Wes Peters
Organization: Softweyr LLC
X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.2.12 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: Warner Losh
Cc: arch@FreeBSD.ORG
Subject: Re: Rui Pedro Mendes Salgueiro: erase2 patch (was: 4.2-RELEASE ISO image for x86 updated.)
References: <3A24B642.34B50961@softweyr.com> <52694.975362925@winston.osd.bsdi.com> <20001127144809.A67395@citusc17.usc.edu> <200011272307.eARN7Ln34886@earth.backplane.com> <200012012008.NAA08306@harmony.village.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Warner Losh wrote:
>
> In message <3A24B642.34B50961@softweyr.com> Wes Peters writes:
> : IMHO, this is one of the biggest arguments for using bash. I get bitten
> : all the time when I leave bash for another interactive program that no
> : longer provides BS/DEL compatibility. Fixing it everywhere is a good
> : idea.
>
> I see that this has already been committed. I'm not going to argue
> with that (I think it was a good idea), but there are other issues in
> the tree.
>
> The issue that I have is that there are many places in the tree where
> the erase character is known and things are done based on it. Will
> all of those be updated to have the two cases? There's a hack in hack
> right now:
>
> ./games/hack/hack.tty.c: if(c == erase_char || c == '\b') {
>
> as well as other examples in the tree.
>
> Talk also has a provision for transporting these characters over the
> interface. If both were allowed, some translation would also be
> needed.

It shouldn't make any difference if the interface is in raw mode, which
is pretty much required for any character-at-a-time I/O.

I would have preferred to see this in a special line discipline module
rather than buried in the bowels of the tty driver, so it could be
optional behavior.

--
"Where am I, and what am I doing in this handbasket?"

Wes Peters                                                         Softweyr LLC
wes@softweyr.com                                           http://softweyr.com/

From owner-freebsd-arch Sat Dec 2 9:59:41 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from falla.videotron.net (falla.videotron.net [205.151.222.106]) by hub.freebsd.org (Postfix) with ESMTP id 49A0A37B400 for ; Sat, 2 Dec 2000 09:59:39 -0800 (PST)
Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213]) by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8) with ESMTP id <0G4Y005JNCNC4R@falla.videotron.net> for arch@FreeBSD.ORG; Sat, 2 Dec 2000 12:59:36 -0500 (EST)
Date: Sat, 02 Dec 2000 13:00:22 -0500 (EST)
From: Bosko Milekic
Subject: Re: zero copy code review
In-reply-to: <20001201002235.D10772@panzer.kdm.org>
To: "Kenneth D. Merry"
Cc: arch@FreeBSD.ORG
Message-id:
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Fri, 1 Dec 2000, Kenneth D. Merry wrote:

> It does have spls in the right places, in this case splimp() and splvm().
> Would you just convert those to the proper mutexes, or are we going to go
> with per-data-structure mutexes (i.e. a little finer granularity), or...?
> (I don't know much about the mutex strategy we're using...)

For now, you won't be able to do anything with the splvm() stuff, as
the VM code has not yet been ripped out from under Giant (and likely
won't be for a while).

A few notes Re: spl()s and mutexes in uipc_jumbo.c, in particular (since
that's where I would begin putting in mutexes):

- Your jumbo_kmap singly linked list should probably not be manipulated
  under splvm() [in fact, I think it's wrong]. The list should be
  protected by a lock.

- jumbo_freem should just be called jumbo_free, if the naming convention
  is being adopted from the mbuf system (which it looks like it is). The
  reason is that for mbufs, m_free() frees a single mbuf while m_freem()
  frees an entire chain of them.

- jumbo_pg_free should be ripped out from under splimp(); leave the
  explicit splvm() in there, but protect the list manipulations with the
  lock.

If most of the things pointed out earlier are fixed, and as long as
the code is not flawed (which I really doubt it would be anyway), I have
no objections to it going in soon and then attacking the above issue a
little later (If nobody gets to it within the next two weeks, I'll be glad
to do it myself once those 2 weeks are past).
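
A minimal userspace sketch of the locking suggestion in the notes above, not
taken from uipc_jumbo.c: one lock guards both the free list and the in-use
list, so every list manipulation happens with the lock held. Everything here
is an assumption made for illustration: the names (struct entry, pool_add,
pool_get, pool_put) are hypothetical, and a POSIX pthread mutex stands in for
whatever kernel mutex the real code would use.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical list node; the real jumbo_kmap entries carry more state. */
struct entry {
    struct entry *next;
};

static struct entry *free_list;   /* entries available for use */
static struct entry *inuse_list;  /* entries currently handed out */

/* A single lock covers both lists, since they are always updated together. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Put a fresh entry on the free list. */
static void pool_add(struct entry *e)
{
    pthread_mutex_lock(&list_lock);
    e->next = free_list;
    free_list = e;
    pthread_mutex_unlock(&list_lock);
}

/* Move the head of the free list to the in-use list; NULL if none is free. */
static struct entry *pool_get(void)
{
    struct entry *e;

    pthread_mutex_lock(&list_lock);
    e = free_list;
    if (e != NULL) {
        free_list = e->next;
        e->next = inuse_list;
        inuse_list = e;
    }
    pthread_mutex_unlock(&list_lock);
    return e;
}

/* Unlink e from the in-use list and return it to the free list. */
static void pool_put(struct entry *e)
{
    struct entry **pp;

    pthread_mutex_lock(&list_lock);
    for (pp = &inuse_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == e) {
            *pp = e->next;
            e->next = free_list;
            free_list = e;
            break;
        }
    }
    pthread_mutex_unlock(&list_lock);
}
```

A caller would seed the pool with pool_add(), take entries with pool_get(),
and hand them back with pool_put(). Using one lock for both lists keeps the
move between them atomic; separate locks would only pay off if the lists
were manipulated independently.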
Regards,
Bosko Milekic
bmilekic@technokratis.com

From owner-freebsd-arch Sat Dec 2 10: 6:15 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from falla.videotron.net (falla.videotron.net [205.151.222.106]) by hub.freebsd.org (Postfix) with ESMTP id BD31D37B6A1 for ; Sat, 2 Dec 2000 10:06:10 -0800 (PST)
Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213]) by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8) with ESMTP id <0G4Y0066FCY87W@falla.videotron.net> for arch@FreeBSD.ORG; Sat, 2 Dec 2000 13:06:09 -0500 (EST)
Date: Sat, 02 Dec 2000 13:06:54 -0500 (EST)
From: Bosko Milekic
Subject: Re: zero copy code review
In-reply-to: <14888.9802.415926.434956@grasshopper.cs.duke.edu>
To: Andrew Gallatin
Cc: "Kenneth D. Merry", arch@FreeBSD.ORG
Message-id:
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Fri, 1 Dec 2000, Andrew Gallatin wrote:

> I'm still not sure I understand your objection. There's some code in
> socow_cowsetup() which uses sf bufs. Prior to allocating the sf_buf, it
> does some of its own fiddling with the page and introduces some state
> the sf_buf_free() wouldn't know how to clear. socow_iodone() undoes
> that fiddling and then calls sf_buf_free() to free the sfbuf. Isn't
> it better to call sf_buf_free() than to cut & paste the code?
>
> <...>

Yeah, you're right. I overlooked things when I posted that.

> I see your point. This was copied, (bug for bug ;-), from sendfile itself.
> Look at line 1700 or so of kern/uipc_syscalls.c.. This bug should
> probably be fixed there too..

Yep. You're right. This is a bug that is the result of some of my
code, actually (a while back, before I got the commit bit).
When the wait code was first introduced, I had to go around the code looking
for places previously expecting that M_WAIT will never return NULL and make
them deal with the possibility. As we see now, I overlooked the fact that
the sf_buf has to be freed in the case of failure, in the sendfile(2) case.
Good thing we caught this now, and David Greenman was extremely quick to
roll a diff.

> The nfs sf_buf_alloc() calls will be made from either a process
> context (when doing a zero-copy send over a socket) or from the
> context of an nfsiod for the NFS code, so I think this should
> be safe.

Excellent.

> Thanks!
>
> Drew

Cheers,
Bosko Milekic
bmilekic@technokratis.com

From owner-freebsd-arch Sat Dec 2 10:16:56 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from field.videotron.net (field.videotron.net [205.151.222.108]) by hub.freebsd.org (Postfix) with ESMTP id 5B74437B400 for ; Sat, 2 Dec 2000 10:16:53 -0800 (PST)
Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213]) by field.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8) with ESMTP id <0G4Y0067MDG2K7@field.videotron.net> for arch@FreeBSD.ORG; Sat, 2 Dec 2000 13:16:51 -0500 (EST)
Date: Sat, 02 Dec 2000 13:17:36 -0500 (EST)
From: Bosko Milekic
Subject: Re: zero copy code review
In-reply-to: <20001201001619.C10772@panzer.kdm.org>
To: "Kenneth D. Merry"
Cc: arch@FreeBSD.ORG
Message-id:
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Fri, 1 Dec 2000, Kenneth D. Merry wrote:

> > - Stylistic suggestion: please try to keep things 25x80. :-)
>
> I try, and I think most of the changes are, except for the NFS stuff. I
> didn't reformat that, although I suppose I could. (It irritates me, too.)

Ah, that explains it.

> In its current incarnation, EXT_DISPOSABLE indicates that the memory
> used in the mbuf can be disposed of -- i.e. removed from the kernel's
> virtual address map. The contents aren't disposed of, they're just moved
> elsewhere.
>
> I don't think most of the rest of the mbuf code is setup to deal with the
> memory inside a non-external mbuf going away. (Which would be the
> potential implication of having EXT_DISPOSABLE be a regular m_flag.)

Okay, leaving that exactly the way it is now is The Right Thing To Do
(I'm now convinced).

> > tiio.h: Are you sure tiio.h belongs in src/sys/sys ?
>
> Well, it defines the interface for the character device front end for the
> ti(4) driver. Usually ioctls and supporting structures go in sys/sys.
> Would you suggest another location?

No, you're right.

> When Bill converted the ti(4) driver from spls to mutexes, I did the same
> conversion on my modifications to the driver. Is that sufficient? I'm not
> terribly up-to-date on the mutex stuff.
>
> As for the rest of the code, since it was written pre-mutex, it still has
> the spls in the right places. I suppose that they would just need to be
> converted to mutexes. (Or is that an overly simplistic way to look at it? :)

Well, you really only want to maintain data consistency with the lock.
So you'll be looking at protecting your jumbo_kmap lists in the
uipc_jumbo.c case with their own lock(s). If you're always looking at both
of the lists (inuse and free) at the same time, protecting them with a
single lock would be sufficient. As for splvm(), you can leave that as is
for now. I've included comments regarding locking in another post, for
uipc_jumbo.c. As for if_ti, I would have Bill Paul review that.

> > Finally, I'd like to
> > suggest possibly breaking up some of the diff to smaller chunks, just so
> > it is easier to track things down if something does break.
> > With -CURRENT changing relatively dramatically now sometimes several
> > times in a single day, I think this would be worth it for everybody.
>
> Heh, well, the big chunk is the Tigon firmware. :)
>
> Are you suggesting just splitting the diffs out into multiple files, or
> actually breaking the changes up? The latter would be rather difficult to
> do, I think.

I was suggesting breaking some of the changes up, actually, and
committing in several chunks (two or three, as opposed to one). But if
this is too much of a problem, you don't have to feel obliged to implement
the suggestion.

> Ken
> --
> Kenneth Merry
> ken@kdm.org

Regards,
Bosko Milekic
bmilekic@technokratis.com

From owner-freebsd-arch Sat Dec 2 14: 9:40 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from implode.root.com (root.com [209.102.106.178]) by hub.freebsd.org (Postfix) with ESMTP id 29D6C37B400 for ; Sat, 2 Dec 2000 14:09:38 -0800 (PST)
Received: from implode.root.com (localhost [127.0.0.1]) by implode.root.com (8.8.8/8.8.5) with ESMTP id OAA16275; Sat, 2 Dec 2000 14:07:03 -0800 (PST)
Message-Id: <200012022207.OAA16275@implode.root.com>
To: Garrett Wollman
Cc: arch@freebsd.org
Subject: Re: zero copy code review
In-reply-to: Your message of "Fri, 01 Dec 2000 21:58:21 EST." <200012020258.VAA45432@khavrinen.lcs.mit.edu>
From: David Greenman
Reply-To: dg@root.com
Date: Sat, 02 Dec 2000 14:07:03 -0800
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

>In article you write:
>
>> FreeBSD blocked indefinitely and never returned a NULL pointer.
>
>It has never been like that in the FreeBSD era, to my knowledge. 4.3

   What we're in dispute over is what happens when the kernel runs out of
virtual memory - "mb_map" space.
I'm pretty certain that FreeBSD versions < 2.0 did just sleep when running
out of mb_map space, although I don't have the code around to verify this
claim. It's interesting to note that a process that went to sleep on the map
would never wake up since virtual memory allocated to network buffers was
never returned to the map and thus the kernel would never satisfy the VM
shortage.

   In FreeBSD 2.0, however, the kernel panicked when running out of mb_map
space with a "mb_map full" panic. It did not return a NULL pointer in the
M_WAIT case.

   Starting with FreeBSD 2.0.5, FreeBSD printed a console message and
returned a NULL pointer when running out of mb_map. I should have remembered
this better since I was the one who made the change for it to do this in
rev 1.9 of uipc_mbuf.c.

   Going back to 4.3 BSD, I see that the code behaved the same way that
FreeBSD 2.0 did, specifically in m_clalloc:

	mbx = rmalloc(mbmap, (long)npg);
	if (mbx == 0) {
		if (canwait == M_WAIT)
			panic("out of mbufs: map full");
		return (0);
	}

   My main point was that it used to be a safe assumption that a NULL
pointer wasn't returned in the M_WAIT case. Now that I see that I was the
one who originally broke this assumption, I feel a bit sheepish, so I'll
just crawl away quietly and let this discussion progress. :-)

-DG

David Greenman
Co-founder, The FreeBSD Project - http://www.freebsd.org
President, TeraSolutions, Inc. - http://www.terasolutions.com
Pave the road of life with opportunities.
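
The practical upshot of this thread is that a caller of a "waiting"
allocation still has to handle a NULL return and release whatever it already
holds, which is the sf_buf case raised earlier in the thread. A minimal
userspace sketch of that pattern follows. It is not the sendfile(2) code or
the committed patch, and every name in it (alloc_wait, side_buf_alloc,
side_buf_free, send_one) is a hypothetical stand-in rather than a kernel
interface.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical stand-ins, not FreeBSD kernel APIs. */
struct side_buf {
    int mapped;
};

static int pool_left = 1;       /* objects the "waiting" allocator can still give out */
static int side_bufs_held = 0;  /* side buffers currently allocated */

static char pool_storage[64];
static struct side_buf the_side_buf;

/* Models a waiting allocation that can nevertheless return NULL. */
static void *alloc_wait(void)
{
    if (pool_left == 0)
        return NULL;
    pool_left--;
    return pool_storage;
}

/* Return an object to the pool. */
static void alloc_release(void *p)
{
    assert(p == pool_storage);
    pool_left++;
}

/* Acquire the auxiliary resource that must not be leaked. */
static struct side_buf *side_buf_alloc(void)
{
    side_bufs_held++;
    the_side_buf.mapped = 1;
    return &the_side_buf;
}

/* Release the auxiliary resource. */
static void side_buf_free(struct side_buf *sb)
{
    sb->mapped = 0;
    side_bufs_held--;
}

/* Returns 0 on success, -1 on failure; never leaks the side buffer. */
static int send_one(void)
{
    struct side_buf *sb = side_buf_alloc();
    void *hdr = alloc_wait();

    if (hdr == NULL) {
        /* The cleanup step that is easy to overlook on the failure path. */
        side_buf_free(sb);
        return -1;
    }

    /* Real code would hand hdr and sb to the network stack here. */
    alloc_release(hdr);
    side_buf_free(sb);
    return 0;
}
```

Calling send_one() with pool_left set to 0 exercises the failure path: the
call reports an error and side_bufs_held is back at zero afterwards.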