From owner-freebsd-arch Sun Jan 26 0:26:17 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6714837B401 for ; Sun, 26 Jan 2003 00:26:16 -0800 (PST) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id BA70F43F18 for ; Sun, 26 Jan 2003 00:26:15 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0074.cvx40-bradley.dialup.earthlink.net ([216.244.42.74] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18ci6z-0001JQ-00; Sun, 26 Jan 2003 00:25:53 -0800 Message-ID: <3E339AA8.CBD92BDE@mindspring.com> Date: Sun, 26 Jan 2003 00:22:00 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Dan Nelson Cc: "Daniel C. Sobral" , Alexey Dokuchaev , Gordon Tetlow , Garance A Drosihn , arch@FreeBSD.ORG Subject: Re: CFR: Volume labels in FFS References: <20030124212259.GJ53114@roark.gnf.org> <20030124215753.GM53114@roark.gnf.org> <20030124222718.GN53114@roark.gnf.org> <3E31C4F5.972AA69C@mindspring.com> <20030125120433.GA24687@regency.nsu.ru> <3E32EF99.C3E07015@newsguy.com> <3E331344.6A2A60BC@mindspring.com> <20030125235221.GA23649@dan.emsphone.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4d3c2223a32ca456308319e049f291c0e350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Dan Nelson wrote: > > I first suggested this use of the "last mounted on" field back in > > 1994, for the purpose of supporting auto-mounting on device "arrival" > > for removable media. 
> > You should probably refer to it as your suggestion to rename "last > mounted on" to "volume label", since people seem to think you want it > to keep its original behaviour. 8-). It's string that doesn't need to be there; if the OS doesn't care about its contents, why should you? 8-) 8-). -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Jan 26 0:39:58 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EFE7537B401; Sun, 26 Jan 2003 00:39:56 -0800 (PST) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5814243F13; Sun, 26 Jan 2003 00:39:56 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0074.cvx40-bradley.dialup.earthlink.net ([216.244.42.74] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18ciKU-0002Gk-00; Sun, 26 Jan 2003 00:39:51 -0800 Message-ID: <3E339DEA.E1380FCE@mindspring.com> Date: Sun, 26 Jan 2003 00:35:54 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Matthew Dillon Cc: Steve Kargl , Jeff Roberson , Robert Watson , Gary Jennejohn , arch@FreeBSD.ORG Subject: Re: New scheduler (#3) References: <20030125171217.D18109-100000@mail.chesapeake.net> <200301252320.h0PNKVoq090077@apollo.backplane.com> <200301252350.h0PNo6xO009489@apollo.backplane.com> <200301260114.h0Q1EXuu017546@apollo.backplane.com> <20030126013234.GA19891@troutmask.apl.washington.edu> <200301260218.h0Q2IlkX024483@apollo.backplane.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4d5e72e10d85f994265f6dee34ecc7ae1350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c Sender: 
owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Matthew Dillon wrote: > I expect Jeff can fix the more obvious bugs in a few seconds. Dealing > with the cpu-thread-stealing issue is a much harder problem. I saw > the flip-flop and reported it, but did not try to fix it. My fuzzy logic suggestion is to make the CPU migration model "push" instead of "pull". This also gets rid of the need to lock in the scheduler, in all cases (whereas a pull model requires locking each time). The idea is that you create a process migration list for each CPU, and when you migrate, you do it by "pushing" the process onto the least-loaded CPU's migration queue (doing this requires grabbing a queue lock, but checking that the queue is non-NULL does not require grabbing a queue lock, so it is non-locking in the common case, which is the non-migration case). A "pull" model requires locking your queue all the time, since it permits another CPU to access, which requires a lock to keep someone from pulling from your queue while you are accessing your queue. By only permitting access to an exceptional queue, and not having read access to the normal queue for other processes, you save yourself from the locking of your own queue. Relatively simple, and obvious, once explained. The "flip flopping" is solved because "push" only occurs as a result of relative load... in the "pull" case, you have a virtual global queue, so there's no way to avoid it. Basically, "pull" prevents you from using a fuzzy difference in a "figure of merit" indicating CPU load, and from which relative load can be calculated. The only missing thing here is that the figure of merit needs to be atomically readable without a lock -- sizeof(int), in the general case -- which allows it to be calculated locally to the CPU in question, and read (and acted upon) without a lock. 
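As a rough illustration of the push model described above, here is a minimal user-space sketch in C. Everything in it is an assumption made for the example and none of it comes from the FreeBSD tree: the names (struct cpu_sched, migrate_push, migrate_drain, least_loaded), the fixed NCPU, and the C11 atomic_flag spinlock standing in for whatever queue lock a kernel would really use. It only tries to show that the owning CPU can test its inbound migration list with a plain atomic read, and takes the lock only when something has actually been pushed to it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NCPU 4				/* hypothetical CPU count for this sketch */

struct task {
	struct task *next;
	int id;
};

struct cpu_sched {
	atomic_flag mig_lock;		/* guards mig_head; stand-in for a queue lock */
	_Atomic(struct task *) mig_head;	/* inbound list, filled by other CPUs */
	atomic_int load;		/* int-sized figure of merit, read without a lock */
	struct task *runq;		/* private run queue, touched by the owner only */
};

static struct cpu_sched cpus[NCPU];

static void
mig_acquire(struct cpu_sched *c)
{
	while (atomic_flag_test_and_set_explicit(&c->mig_lock, memory_order_acquire))
		;			/* spin until the flag was clear */
}

static void
mig_release(struct cpu_sched *c)
{
	atomic_flag_clear_explicit(&c->mig_lock, memory_order_release);
}

void
sched_init(void)
{
	for (int i = 0; i < NCPU; i++) {
		atomic_flag_clear(&cpus[i].mig_lock);
		atomic_store(&cpus[i].mig_head, (struct task *)NULL);
		atomic_store(&cpus[i].load, 0);
		cpus[i].runq = NULL;
	}
}

/* Each CPU publishes its own load figure; nobody else writes it. */
void
set_load(int cpu, int value)
{
	atomic_store_explicit(&cpus[cpu].load, value, memory_order_relaxed);
}

/* Pick a push target by reading every CPU's figure of merit, lock-free. */
int
least_loaded(void)
{
	int best = 0;
	int best_load = atomic_load_explicit(&cpus[0].load, memory_order_relaxed);

	for (int i = 1; i < NCPU; i++) {
		int l = atomic_load_explicit(&cpus[i].load, memory_order_relaxed);
		if (l < best_load) {
			best = i;
			best_load = l;
		}
	}
	return best;
}

/* Source CPU side: push a task onto the target's migration list (locked). */
void
migrate_push(int target, struct task *t)
{
	struct cpu_sched *c = &cpus[target];

	mig_acquire(c);
	t->next = atomic_load_explicit(&c->mig_head, memory_order_relaxed);
	atomic_store_explicit(&c->mig_head, t, memory_order_release);
	mig_release(c);
}

/*
 * Owner CPU side: move pushed tasks onto the private run queue.
 * Returns the number of tasks moved.  The empty check takes no lock,
 * which is the common (non-migration) case.
 */
int
migrate_drain(int self)
{
	struct cpu_sched *c = &cpus[self];
	struct task *t, *next;
	int n = 0;

	if (atomic_load_explicit(&c->mig_head, memory_order_acquire) == NULL)
		return 0;

	mig_acquire(c);
	t = atomic_load_explicit(&c->mig_head, memory_order_relaxed);
	atomic_store_explicit(&c->mig_head, (struct task *)NULL, memory_order_relaxed);
	mig_release(c);

	for (; t != NULL; t = next) {
		next = t->next;
		t->next = c->runq;	/* private queue: no lock needed */
		c->runq = t;
		n++;
	}
	return n;
}
```

In this sketch the private run queue is never visible to another CPU, so only the migration list pays for a lock, and only on the uncommon path where a task is actually being moved.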
My suggestion is to calculate "percentage load". The easiest way to do this is to keep a count of the number of processes on the ready-to-run queue for a given CPU, and just use that number as the basis of a moving average, which is multiplied by 100 (or 1000) to get the integer figure-of-merit. The hysteresis values, which you could allow to be set via sysctl, provide high and low *differential* values, so that the flip-flopping can be damped, as necessary, to get to the point where the effect no longer impacts overall performance. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Jan 26 0:43:45 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6335A37B401; Sun, 26 Jan 2003 00:43:44 -0800 (PST) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 01D1A43F43; Sun, 26 Jan 2003 00:43:44 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.6/8.12.6) with ESMTP id h0Q8hg0i030573; Sun, 26 Jan 2003 00:43:42 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.6/8.12.6/Submit) id h0Q8hgoZ030572; Sun, 26 Jan 2003 00:43:42 -0800 (PST) Date: Sun, 26 Jan 2003 00:43:42 -0800 (PST) From: Matthew Dillon Message-Id: <200301260843.h0Q8hgoZ030572@apollo.backplane.com> To: Jeff Roberson Cc: Steve Kargl , Robert Watson , Gary Jennejohn , Subject: Re: New scheduler - Interactivity fixes References: <20030126001955.I7994-100000@mail.chesapeake.net> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Ok, I've run some preliminary tests w/ ULE. 
It's a lot better vis-a-vis interactive and batch operations. I think one thing you can do to get better MP results is to add a bit of code to sched_choose(). If sched_choose() cannot find any KSEs to run on kseq->ksq_curr or kseq->ksq_next it should search the other cpu's queues. I haven't tested your scheduler with this but I note that without it KSEs are left bound to the cpu they were originally scheduled on (cpu = ke->ke_oncpu in sched_add()), which will create a lot of lost cycles on an SMP box.

My gut feeling is that sched_choose() is the best place to deal with this and sched_add() should be left as-is.

I also think you can completely remove sched_pickcpu() without any detrimental effects (test that!). Just have the sched_fork() code leave ke_oncpu alone (like 4bsd does). My gut feeling is that additional work on sched_choose() will yield the best improvement.

I'll have some comparative buildworld numbers tomorrow, I've run out of time tonight.

-Matt
Matthew Dillon

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Sun Jan 26 1: 1:53 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DBED537B401; Sun, 26 Jan 2003 01:01:51 -0800 (PST) Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id ECC3643F3F; Sun, 26 Jan 2003 01:01:50 -0800 (PST) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id h0Q91hT94869; Sun, 26 Jan 2003 04:01:43 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Sun, 26 Jan 2003 04:01:43 -0500 (EST) From: Jeff Roberson To: Matthew Dillon Cc: Steve Kargl , Robert Watson , Gary Jennejohn , Subject: Re: New scheduler - Interactivity fixes In-Reply-To: <200301260843.h0Q8hgoZ030572@apollo.backplane.com>
Message-ID: <20030126035429.A64928-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

On Sun, 26 Jan 2003, Matthew Dillon wrote:

> Ok, I've run some preliminary tests w/ ULE. It's a lot better
> vis-a-vis interactive and batch operations. I think one
> thing you can do to get better MP results is to add a bit of
> code to sched_choose(). If sched_choose() cannot find any
> KSEs to run on kseq->ksq_curr or kseq->ksq_next it should
> search the other cpu's queues. I haven't tested your scheduler
> with this but I note that without it KSEs are left bound to the cpu
> they were originally scheduled on (cpu = ke->ke_oncpu in sched_add()),
> which will create a lot of lost cycles on an SMP box.

I actually have a local patch that does just this. It didn't improve the situation for my buildworld -j4 on a dual box. I'd like to leave the oncpu alone in sched_fork() as you suggest and instead move it to a new call, sched_exec(). My logic here is that since you're completely replacing the vm space you lose any locality advantage so might as well pick the least loaded cpu.

I think we need a push and a pull. The push could run periodically and sort the load of all cpus then see how far off the least and most loaded are. I need to come up with some metric that is more interesting than the number of entries in the runq for the load balancing though. It doesn't take into consideration the priority spread. Also, the run queue depth can change so quickly if processes are doing lots of IO.

It would be nice to have something like the total slice time of all runnable processes and processes sleeping for a very short period of time on a given cpu. Since the slice size is related to the priority, you would get a much more even load that way.
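To make the metric idea above a little more concrete, here is a small hedged sketch in C. It is not taken from sched_ule.c; struct rtask, cpu_slice_load(), find_imbalance() and should_balance() are hypothetical names invented for the example, and the threshold is an arbitrary knob. It sums the remaining slice time of the tasks handed to it (a real version would presumably also count briefly sleeping tasks) and reports how far apart the most and least loaded cpus are, which is the number a periodic push would look at.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical runnable task: only its remaining slice (in ticks) matters here. */
struct rtask {
	int slice;
};

/* Result of one balancing pass over the per-cpu load figures. */
struct imbalance {
	int busiest;	/* index of the cpu with the largest load */
	int idlest;	/* index of the cpu with the smallest load */
	int spread;	/* load[busiest] - load[idlest] */
};

/* Load of one cpu, taken as the total slice time of its runnable tasks. */
int
cpu_slice_load(const struct rtask *tasks, size_t ntasks)
{
	int total = 0;

	for (size_t i = 0; i < ntasks; i++)
		total += tasks[i].slice;
	return total;
}

/* Find the most and least loaded cpus and the gap between them. */
struct imbalance
find_imbalance(const int *load, int ncpu)
{
	struct imbalance im = { 0, 0, 0 };

	assert(ncpu > 0);
	for (int i = 1; i < ncpu; i++) {
		if (load[i] > load[im.busiest])
			im.busiest = i;
		if (load[i] < load[im.idlest])
			im.idlest = i;
	}
	im.spread = load[im.busiest] - load[im.idlest];
	return im;
}

/* Only push when the gap exceeds a tunable threshold. */
int
should_balance(struct imbalance im, int threshold)
{
	return im.spread > threshold;
}
```

Because the slice size follows the priority, a queue holding a few tasks with long slices counts as heavier here than a queue holding many tasks with short ones, which a plain run queue depth would not show.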
Anyway, this all needs lots of experimentation. I was working on that until the interactivity issues were brought to my attention. It looks like that is satisfactory now, so I'm going to go back to mp. Anyway, keep the good ideas coming! > > My gut feeling is that sched_choose() is the best place to deal with > this and sched_add() should be left as-is. > > I also think you can completely remove sched_pickcpu() without > any detrimental effects (test that!). Just have the sched_fork() > code leave ke_oncpu alone (like 4bsd does). My gut feeling is > that additional work on sched_choose() will yield the best > improvement. > > I'll have some comparative buildworld numbers tomorrow, I've run > out of time tonight. > > -Matt > Matthew Dillon > > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Jan 26 1: 4:43 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 92B3437B401; Sun, 26 Jan 2003 01:04:41 -0800 (PST) Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id AF4C143F43; Sun, 26 Jan 2003 01:04:40 -0800 (PST) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id h0Q94ZQ95672; Sun, 26 Jan 2003 04:04:35 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Sun, 26 Jan 2003 04:04:35 -0500 (EST) From: Jeff Roberson To: Julian Elischer Cc: Matthew Dillon , Steve Kargl , Robert Watson , Gary Jennejohn , Subject: Re: New scheduler - build options In-Reply-To: Message-ID: <20030126040154.A64928-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: 
X-Loop: FreeBSD.ORG

> > > I think that the option should be set up so that no option gives the
> > > current scheduler.
> >
> > We discussed that but that would require adding a #ifndef each time a
> > scheduler was added. It's less appealing long term. Sorry we didn't
> > include you on this discussion. I sent requests for feedback on this with
> > my first arch@ post and eventually ended up directly discussing it with
> > re.
>
> Huh? Did I not indicate an interest?
>
> if you mean the post earlier today, then it needs a bit more than
> one day....

It was 3 days ago.

>
> I happen to think that what you are doing is good because we need the
> ability to abstract the scheduler, but 're@' doesn't have any say in
> this.. it's not an 're' issue.

I felt that changes which would break config compatibility within a major release were a re issue just like breaking API/ABI compatibility.

>
> Making thousands of people go and edit their config files is just
> 'unfriendly'. This is an "arch@" issue and I think that you need to
> revert this change until it's been discussed in the correct forum.
> Maybe it's come to the decision that what you have done is correct but
> I suspect that having a default scheduler is more in line with POLA
> than suddenly having to specify one.

You are probably right about POLA. I'll look into fixing this up tomorrow. I'm a bit too tired to start trying to revert or move forward. In the meantime, if anyone else has any opinions on whether this option should be mandatory or not, please speak now.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Jan 26 11:13:46 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id ABB3E37B401; Sun, 26 Jan 2003 11:13:42 -0800 (PST) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3454E43E4A; Sun, 26 Jan 2003 11:13:42 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.6/8.12.6) with ESMTP id h0QJDe0i051877; Sun, 26 Jan 2003 11:13:40 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.6/8.12.6/Submit) id h0QJDd8c051876; Sun, 26 Jan 2003 11:13:39 -0800 (PST) Date: Sun, 26 Jan 2003 11:13:39 -0800 (PST) From: Matthew Dillon Message-Id: <200301261913.h0QJDd8c051876@apollo.backplane.com> To: Jeff Roberson Cc: Steve Kargl , Robert Watson , Gary Jennejohn , Subject: Re: New scheduler - Interactivity fixes References: <20030126035429.A64928-100000@mail.chesapeake.net> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG :I actually have a local patch that does just this. It didn't improve the :situation for my buildworld -j4 on a dual box. I'd like to leave the :oncpu alone in sched_fork() as you suggest and instead move it to a new :call, sched_exec(). My logical here is that since you're completely :replacing the vm space you lose any locality advantage so might as well :pick the least loaded cpu. I'll play with this a little today to see if I can improve the build times. 
The real time is still pretty horrendous:

    /usr/bin/time make -j 8 buildworld;     # DELL2550 2xCPU

    ULE -
        3435.42 real      2500.73 user       581.20 sys
        3343.95 real      2501.86 user       581.67 sys

    4BSD -
        2414.84 real      2648.92 user       758.28 sys

Though these numbers, like the numbers you originally posted, seem to imply that if we deal our cards right we can improve user and system times by doing a better job of avoiding cache issues when we do start stealing processes from other cpu's queues.

:I think we need a push and a pull. The push could run periodically and
:sort the load of all cpus then see how far off the least and most loaded
:are. I need to come up with some metric that is more interesting than the
:number of entries in the runq for the load balancing though. It doesn't
:take into consideration the priority spread. Also, the run queue depth
:can change so quickly if processes are doing lots of IO.

I don't think periodic balancing has ever worked. The balancing really has to be near term or instantaneous (in sched_choose() itself) to be effective. Run-queue depth isn't as important as being able to give a process the ability to run until it next blocks (i.e. reduce involuntary context switches). Not an easy problem to be sure since excessive latency also causes problems.

:It would be nice to have something like the total slice time of all
:runnable processes and processes sleeping for a very short period of time
:on a given cpu. Since the slice size is related to the priority, you
:would get a much more even load that way.

Total slice time is getting into the fractional-fair-share scheduler methodology. I'll outline it below (and later today I should have a sched_ffs.c to show you, it's turning out to be easier than I thought it would be!).

A fractional fair share scheduler basically works like this.
In this example, higher priorities are better (in FreeBSD it's reversed, higher priorities are worse):

    sched_add(task)
    {
        GlobalAgg += task->priority;
        AddHead(queue, task);
    }

    sched_del(task)
    {
        GlobalAgg -= task->priority;
        RemoveNode(task);
    }

    sched_clock()
    {
        task->slice -= GlobalAgg;
        if (task->slice < 0)
            involuntary_switch();
    }

    sched_choose()
    {
        while ((task = RemoveHead(queue)) != NULL) {
            if (task->slice >= 0)
                break;
            task->slice += task->priority * NOMINAL_TICKS; (usually 4)
            AddTail(queue, task);
        }
        return(task);
    }

There are some fundamental features to this sort of scheduler:

    * Scheduled tasks are added to the head of the queue. You don't get
      into races because a task's slice is only improved when the chooser
      decides to skip it. This generally gives you optimal scheduling for
      interrupts and pipes (two processes ping-ponging each other with
      data). The two processes remain cohesive until they both run out of
      their time slice, which is extremely cache efficient.

    * The time slice is docked by the aggregate priority, which means that
      lower priority or currently-running tasks lose their time slice more
      quickly when a high priority task is scheduled.

    * The time slice is fractional-fair. The priority is not sorted and
      overall cpu utilization will be almost exactly the fraction of the
      task's priority versus the aggregate priority regardless of the
      granularity of the clock.

In the version I'm working up I'm adding additional code to deal with interactive processes (reducing the priority of batch processes almost exactly the same way it's done in your code, except using a runtime and a slptime instead of just a slptime).

There is one big downside to a fractional fair scheduler, at least as I have described, and that is a certain degree of inefficiency if you have hundreds of runnable low priority processes and just a couple of high priority processes.
The algorithm does compensate somewhat for this situation by giving the high priority processes larger slices (due to the priority multiplication in sched_clock() versus the global aggregate), but it's still an issue. There is also an issue with interrupt latency though the vast majority of interrupts do in fact run almost instantly due to the front-loading in sched_add().

-Matt

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Sun Jan 26 11:17:51 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4DD8F37B401 for ; Sun, 26 Jan 2003 11:17:50 -0800 (PST) Received: from canning.wemm.org (canning.wemm.org [192.203.228.65]) by mx1.FreeBSD.org (Postfix) with ESMTP id BC7C943F18 for ; Sun, 26 Jan 2003 11:17:49 -0800 (PST) (envelope-from peter@wemm.org) Received: from wemm.org (localhost [127.0.0.1]) by canning.wemm.org (Postfix) with ESMTP id 9C7E52A89E; Sun, 26 Jan 2003 11:17:49 -0800 (PST) (envelope-from peter@wemm.org) X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4 To: Terry Lambert Cc: Dan Nelson , "Daniel C. Sobral" , Alexey Dokuchaev , Gordon Tetlow , Garance A Drosihn , arch@FreeBSD.ORG Subject: Re: CFR: Volume labels in FFS In-Reply-To: <3E339AA8.CBD92BDE@mindspring.com> Date: Sun, 26 Jan 2003 11:17:49 -0800 From: Peter Wemm Message-Id: <20030126191749.9C7E52A89E@canning.wemm.org> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

Terry Lambert wrote:
> Dan Nelson wrote:
> > > I first suggested this use of the "last mounted on" field back in
> > > 1994, for the purpose of supporting auto-mounting on device "arrival"
> > > for removable media.
> > > > You should probably refer to it as your suggestion to rename "last > > mounted on" to "volume label", since people seem to think you want it > > to keep its original behaviour. > > 8-). > > It's string that doesn't need to be there; if the OS doesn't care > about its contents, why should you? 8-) 8-). I've found it useful when recovering trashed partition info to make sure I've found the right superblocks. Cheers, -Peter -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com "All of this is for nothing if we don't go to the stars" - JMS/B5 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Jan 26 11:20:28 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 13C4437B401; Sun, 26 Jan 2003 11:20:27 -0800 (PST) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id A99B043F13; Sun, 26 Jan 2003 11:20:26 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.6/8.12.6) with ESMTP id h0QJKO0i051947; Sun, 26 Jan 2003 11:20:24 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.6/8.12.6/Submit) id h0QJKOBB051946; Sun, 26 Jan 2003 11:20:24 -0800 (PST) Date: Sun, 26 Jan 2003 11:20:24 -0800 (PST) From: Matthew Dillon Message-Id: <200301261920.h0QJKOBB051946@apollo.backplane.com> To: Jeff Roberson , Steve Kargl , Robert Watson , Gary Jennejohn , Subject: Re: New scheduler - Interactivity fixes References: <20030126035429.A64928-100000@mail.chesapeake.net> <200301261913.h0QJDd8c051876@apollo.backplane.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: 
List-Unsubscribe: X-Loop: FreeBSD.ORG

Oops, a few mistakes with my pseudo-code. Corrected below (ok, it IS only pseudo code but still...). (note also: GlobalAgg could also be a per-cpu Agg).

    sched_clock()
    {
        task->slice -= GlobalAgg + task->priority;  <<< must include task's
        if (task->slice < 0)                             own priority here.
            involuntary_switch();
    }

    sched_choose()
    {
        while ((task = RemoveHead(queue)) != NULL) {
            if (task->slice >= 0) {
                GlobalAgg -= task->priority;    <<<< task removed from queue
                break;
            }
            task->slice += task->priority * NOMINAL_TICKS; (usually 4)
            AddTail(queue, task);
        }
        return(task);
    }

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Sun Jan 26 12:46: 2 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 005C637B405; Sun, 26 Jan 2003 12:45:58 -0800 (PST) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 29B5243E4A; Sun, 26 Jan 2003 12:45:58 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.6/8.12.6) with ESMTP id h0QKjt0i067309; Sun, 26 Jan 2003 12:45:55 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.6/8.12.6/Submit) id h0QKjsVc067308; Sun, 26 Jan 2003 12:45:54 -0800 (PST) Date: Sun, 26 Jan 2003 12:45:54 -0800 (PST) From: Matthew Dillon Message-Id: <200301262045.h0QKjsVc067308@apollo.backplane.com> To: Jeff Roberson Cc: Julian Elischer , Steve Kargl , Robert Watson , Gary Jennejohn , Subject: Re: New scheduler - ULE performance w/ cpu stealing References: <20030126040154.A64928-100000@mail.chesapeake.net> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe:
List-Unsubscribe: X-Loop: FreeBSD.ORG

I added cpu-stealing code to ULE and it seems to have made a huge difference. Perhaps you missed something in your local patch set. Here are the results:

    /usr/bin/time make -j 8 buildworld

    4BSD - Original scheduler
        2414.84 real      2648.92 user       758.28 sys
        2399.05 real      2647.84 user       757.78 sys

    ULE -
        3435.42 real      2500.73 user       581.20 sys
        3343.95 real      2501.86 user       581.67 sys

    ULE - With cpu stealing code
        2489.76 real      2610.33 user       659.74 sys  <<<<<<<<<<<<<<<<<<

Next I am going to try removing sched_pickcpu(), but I don't expect it to improve things (nor do I expect the removal to make things worse).

-Matt

Index: sched_ule.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/sched_ule.c,v
retrieving revision 1.1
diff -u -r1.1 sched_ule.c
--- sched_ule.c	26 Jan 2003 05:23:15 -0000	1.1
+++ sched_ule.c	26 Jan 2003 20:42:47 -0000
@@ -53,6 +53,9 @@
 /* XXX This is bogus compatability crap for ps */
 static fixpt_t  ccpu = 0.95122942450071400909 * FSCALE; /* exp(-1/20) */
 SYSCTL_INT(_kern, OID_AUTO, ccpu, CTLFLAG_RD, &ccpu, 0, "");
+static int sched_stealcpu = 1;
+SYSCTL_INT(_kern, OID_AUTO, sched_stealcpu, CTLFLAG_RW, &sched_stealcpu, 0,
+    "Ok to steal KSEs from another cpu (0=disabled, 1=normal)");
 
 static void sched_setup(void *dummy);
 SYSINIT(sched_setup, SI_SUB_RUN_QUEUE, SI_ORDER_FIRST, sched_setup, NULL)
@@ -554,9 +557,26 @@
 	cpu = PCPU_GET(cpuid);
 	kseq = &kseq_cpu[cpu];
 
-	if (runq_check(kseq->ksq_curr) == 0)
-		return (runq_check(kseq->ksq_next));
-	return (1);
+	if (runq_check(kseq->ksq_curr))
+		return(1);
+	if (runq_check(kseq->ksq_next))
+		return(1);
+
+#ifdef SMP
+	/*
+	 * Check other cpus for runnable tasks
+	 */
+	if (sched_stealcpu) {
+		for (cpu = 0; cpu < mp_ncpus; ++cpu) {
+			kseq = &kseq_cpu[cpu];
+			if (runq_check(kseq->ksq_curr))
+				return(1);
+			if (runq_check(kseq->ksq_next))
+				return(1);
+		}
+	}
+#endif
+	return (0);
 }
 
 void
@@ -573,30 +593,53 @@
 	}
 }
 
+static __inline
 struct kse *
-sched_choose(void)
+sched_choose_kseq(struct kseq *kseq)
 {
-	struct kseq *kseq;
 	struct kse *ke;
 	struct runq *swap;
-	int cpu;
 
-	cpu = PCPU_GET(cpuid);
-	kseq = &kseq_cpu[cpu];
-
 	if ((ke = runq_choose(kseq->ksq_curr)) == NULL) {
 		swap = kseq->ksq_curr;
 		kseq->ksq_curr = kseq->ksq_next;
 		kseq->ksq_next = swap;
 		ke = runq_choose(kseq->ksq_curr);
 	}
+	return(ke);
+}
+
+struct kse *
+sched_choose(void)
+{
+	struct kseq *kseq;
+	struct kse *ke;
+	int cpu, i;
+
+	cpu = PCPU_GET(cpuid);
+	kseq = &kseq_cpu[cpu];
+
+	ke = sched_choose_kseq(kseq);
 	if (ke) {
 		runq_remove(ke->ke_runq, ke);
 		ke->ke_state = KES_THREAD;
 	}
-
-	return (ke);
+#ifdef SMP
+	else if (sched_stealcpu) {
+		for (i = mp_ncpus - 1; ke == NULL && i; --i) {
+			cpu = (cpu + 1) % mp_ncpus;
+			kseq = &kseq_cpu[cpu];
+			ke = sched_choose_kseq(kseq);
+		}
+		if (ke) {
+			runq_remove(ke->ke_runq, ke);
+			ke->ke_state = KES_THREAD;
+		}
+	}
+#endif
+	return(ke);
 }
 
+
 void
 sched_add(struct kse *ke)

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Sun Jan 26 15:40:12 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 50AFD37B401; Sun, 26 Jan 2003 15:40:07 -0800 (PST) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id C8BD543F43; Sun, 26 Jan 2003 15:40:06 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.6/8.12.6) with ESMTP id h0QNdw0i069472; Sun, 26 Jan 2003 15:39:58 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.6/8.12.6/Submit) id h0QNdwK1069471; Sun, 26 Jan 2003 15:39:58 -0800 (PST) Date: Sun, 26 Jan 2003 15:39:58 -0800 (PST) From: Matthew Dillon Message-Id: <200301262339.h0QNdwK1069471@apollo.backplane.com> To:
Jeff Roberson , Julian Elischer , Steve Kargl , Robert Watson , Gary Jennejohn , Subject: Re: New scheduler - ULE performance w/ cpu stealing & no pickcpu References: <20030126040154.A64928-100000@mail.chesapeake.net> <200301262045.h0QKjsVc067308@apollo.backplane.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

Here are the complete test results, the latest is with the stealcpu code plus pickcpu ripped out. The results without pickcpu are basically the same as with pickcpu (as expected).

-Matt

    /usr/bin/time make -j 8 buildworld (local /usr/src, local usr/obj)

    4BSD - Original scheduler
        2414.84 real      2648.92 user       758.28 sys
        2399.05 real      2647.84 user       757.78 sys

    ULE -
        3435.42 real      2500.73 user       581.20 sys
        3343.95 real      2501.86 user       581.67 sys

    ULE - STEALCPU/CHOOSE
        2489.76 real      2610.33 user       659.74 sys

    ULE - STEALCPU/CHOOSE, WITHOUT PICKCPU
        2486.76 real      2613.67 user       668.31 sys
        2470.49 real      2611.28 user       665.16 sys

Index: sched_ule.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/sched_ule.c,v
retrieving revision 1.1
diff -u -r1.1 sched_ule.c
--- sched_ule.c	26 Jan 2003 05:23:15 -0000	1.1
+++ sched_ule.c	26 Jan 2003 20:46:22 -0000
@@ -53,6 +53,9 @@
 /* XXX This is bogus compatability crap for ps */
 static fixpt_t  ccpu = 0.95122942450071400909 * FSCALE; /* exp(-1/20) */
 SYSCTL_INT(_kern, OID_AUTO, ccpu, CTLFLAG_RD, &ccpu, 0, "");
+static int sched_stealcpu = 1;
+SYSCTL_INT(_kern, OID_AUTO, sched_stealcpu, CTLFLAG_RW, &sched_stealcpu, 0,
+    "Ok to steal KSEs from another cpu (0=disabled, 1=normal)");
 
 static void sched_setup(void *dummy);
 SYSINIT(sched_setup, SI_SUB_RUN_QUEUE, SI_ORDER_FIRST, sched_setup, NULL)
@@ -181,7 +184,6 @@
 static int sched_slice(struct ksegrp *kg);
 static int sched_priority(struct ksegrp *kg);
 void sched_pctcpu_update(struct kse *ke);
-int sched_pickcpu(void);
 
 static void
 sched_setup(void *dummy)
@@ -279,40 +281,6 @@
 	ke->ke_ftick = ke->ke_ltick - SCHED_CPU_TICKS;
 }
 
-#ifdef SMP
-int
-sched_pickcpu(void)
-{
-	int cpu;
-	int load;
-	int i;
-
-	if (!smp_started)
-		return (0);
-
-	cpu = PCPU_GET(cpuid);
-	load = kseq_cpu[cpu].ksq_load;
-
-	for (i = 0; i < mp_maxid; i++) {
-		if (CPU_ABSENT(i))
-			continue;
-		if (kseq_cpu[i].ksq_load < load) {
-			cpu = i;
-			load = kseq_cpu[i].ksq_load;
-		}
-	}
-
-	CTR1(KTR_RUNQ, "sched_pickcpu: %d", cpu);
-	return (cpu);
-}
-#else
-int
-sched_pickcpu(void)
-{
-	return (0);
-}
-#endif
-
 void
 sched_prio(struct thread *td, u_char prio)
 {
@@ -444,7 +412,6 @@
 	child->kg_user_pri = kg->kg_user_pri;
 
 	ckse->ke_slice = pkse->ke_slice;
-	ckse->ke_oncpu = sched_pickcpu();
 	ckse->ke_runq = NULL;
 	/*
 	 * Claim that we've been running for one second for statistical
@@ -554,9 +521,26 @@
 	cpu = PCPU_GET(cpuid);
 	kseq = &kseq_cpu[cpu];
 
-	if (runq_check(kseq->ksq_curr) == 0)
-		return (runq_check(kseq->ksq_next));
-	return (1);
+	if (runq_check(kseq->ksq_curr))
+		return(1);
+	if (runq_check(kseq->ksq_next))
+		return(1);
+
+#ifdef SMP
+	/*
+	 * Check other cpus for runnable tasks
+	 */
+	if (sched_stealcpu) {
+		for (cpu = 0; cpu < mp_ncpus; ++cpu) {
+			kseq = &kseq_cpu[cpu];
+			if (runq_check(kseq->ksq_curr))
+				return(1);
+			if (runq_check(kseq->ksq_next))
+				return(1);
+		}
+	}
+#endif
+	return (0);
 }
 
 void
@@ -573,30 +557,53 @@
 	}
 }
 
+static __inline
 struct kse *
-sched_choose(void)
+sched_choose_kseq(struct kseq *kseq)
 {
-	struct kseq *kseq;
 	struct kse *ke;
 	struct runq *swap;
-	int cpu;
 
-	cpu = PCPU_GET(cpuid);
-	kseq = &kseq_cpu[cpu];
-
 	if ((ke = runq_choose(kseq->ksq_curr)) == NULL) {
 		swap = kseq->ksq_curr;
 		kseq->ksq_curr = kseq->ksq_next;
 		kseq->ksq_next = swap;
 		ke = runq_choose(kseq->ksq_curr);
 	}
+	return(ke);
+}
+
+struct kse *
+sched_choose(void)
+{
+	struct kseq *kseq;
+	struct kse *ke;
+	int cpu, i;
+
+	cpu = PCPU_GET(cpuid);
+	kseq = &kseq_cpu[cpu];
+
+	ke = sched_choose_kseq(kseq);
 	if (ke) {
 		runq_remove(ke->ke_runq, ke);
 		ke->ke_state = KES_THREAD;
 	}
-
-	return (ke);
+#ifdef SMP
+	else if (sched_stealcpu) {
+		for (i = mp_ncpus - 1; ke == NULL && i; --i) {
+			cpu = (cpu + 1) % mp_ncpus;
+			kseq = &kseq_cpu[cpu];
+			ke = sched_choose_kseq(kseq);
+		}
+		if (ke) {
+			runq_remove(ke->ke_runq, ke);
+			ke->ke_state = KES_THREAD;
+		}
+	}
+#endif
+	return(ke);
 }
 
+
 void
 sched_add(struct kse *ke)

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Sun Jan 26 17:21:36 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3C91C37B401 for ; Sun, 26 Jan 2003 17:21:35 -0800 (PST) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8981F43EB2 for ; Sun, 26 Jan 2003 17:21:29 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0261.cvx21-bradley.dialup.earthlink.net ([209.179.193.6] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18cxxG-0004oK-00; Sun, 26 Jan 2003 17:20:54 -0800 Message-ID: <3E348920.7EF6E966@mindspring.com> Date: Sun, 26 Jan 2003 17:19:28 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Peter Wemm Cc: Dan Nelson , "Daniel C.
Sobral" , Alexey Dokuchaev , Gordon Tetlow , Garance A Drosihn , arch@FreeBSD.ORG Subject: Re: CFR: Volume labels in FFS References: <20030126191749.9C7E52A89E@canning.wemm.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a40e9bb40a7c803848d53ebbb78c6a2fb5667c3043c0873f7e350badd9bab72f9c350badd9bab72f9c Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Peter Wemm wrote: > > 8-). > > > > It's string that doesn't need to be there; if the OS doesn't care > > about its contents, why should you? 8-) 8-). > > I've found it useful when recovering trashed partition info to make sure > I've found the right superblocks. I've personally used it this way as well. On the other hand, it just saves a step, and isn't necessary, if that's what you are trying to do, since you can find the same information by mounting the thing up read-only (FWIW). My own personal use is actually to get rid of fstab on an old FreeBSD 2.2 machine, which does auto-mounting on device arrival. But as people pointed out, this doesn't work so well in the case that you have multiple OS's on locally attached disks, and it doesn't work so well for SANdisk type disks. 
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Jan 26 18:12:47 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6B29D37B401; Sun, 26 Jan 2003 18:12:45 -0800 (PST) Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id 701F343E4A; Sun, 26 Jan 2003 18:12:44 -0800 (PST) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id h0R2CX130388; Sun, 26 Jan 2003 21:12:33 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Sun, 26 Jan 2003 21:12:33 -0500 (EST) From: Jeff Roberson To: Matthew Dillon Cc: Julian Elischer , Steve Kargl , Robert Watson , Gary Jennejohn , Subject: Re: New scheduler - ULE performance w/ cpu stealing In-Reply-To: <200301262045.h0QKjsVc067308@apollo.backplane.com> Message-ID: <20030126210706.V64928-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Sun, 26 Jan 2003, Matthew Dillon wrote: > I added cpu-stealing code to ULE and it seems to have made a > huge difference. Perhaps you missed something in your local patch > set. 
Here are the results:
>
> /usr/bin/time make -j 8 buildworld
>
> 4BSD - Original scheduler
>     2414.84 real   2648.92 user   758.28 sys
>     2399.05 real   2647.84 user   757.78 sys
>
> ULE -
>     3435.42 real   2500.73 user   581.20 sys
>     3343.95 real   2501.86 user   581.67 sys
>
> ULE - With cpu stealing code
>     2489.76 real   2610.33 user   659.74 sys <<<<<<<<<<<<<<<<<<
>
> Next I am going to try removing sched_pickcpu(), but I don't expect
> it to improve things (nor do I expect the removal to make things
> worse).

So, yes, there is some marginal improvement. Not as much as I would expect to see with the addition of cpu affinity. I think perhaps this may be partially due to the amount of time we spend blocked on Giant now. I'm going to try some runs with make -s.

In the meantime, I have a very similar version of this that I will commit. I also reworked some of the load stuff so it is slightly more representative of real load. I still think the run queue depth is not the right metric.

This mechanism is sufficient for compiles but it ignores problems in other situations. Apache presents one that is readily understandable and easily solvable. Since it uses a forked model there may be many long-lived Apache processes that are bound to cpus. They need to be evenly spread across all cpus. With only the pull model you may end up with 5 Apache processes on one cpu and 1 on the other, and that single process is enough to keep the second cpu from ever triggering a pull. So to handle this scenario I'm writing a timeout that runs every second or so and pushes processes around to balance the cpus.

So, in short, it looks like pull is good for many short-lived processes. Push is good for balancing long-term load. I believe the combination of the two is required.
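The 5-and-1 Apache case above can be sketched in a few lines of user-space C. This is only an illustration, not code from the thread or from sched_ule.c: `struct cpu_load`, `balance_pull()` and `balance_push()` are hypothetical names, and "load" is simplified to a count of long-lived, CPU-bound processes per cpu.

```c
#include <assert.h>

/* Hypothetical per-CPU state: count of long-lived, CPU-bound processes. */
struct cpu_load {
	int load;
};

/*
 * Pull model: CPU "self" steals one process, but only when it is idle.
 * Returns 1 if a process migrated, 0 otherwise.
 */
int
balance_pull(struct cpu_load *cpus, int ncpu, int self)
{
	int i, busiest = self;

	assert(self >= 0 && self < ncpu);
	if (cpus[self].load != 0)
		return (0);		/* not idle, so no pull is triggered */
	for (i = 0; i < ncpu; i++)
		if (cpus[i].load > cpus[busiest].load)
			busiest = i;
	if (cpus[busiest].load < 2)
		return (0);		/* nothing worth stealing */
	cpus[busiest].load--;
	cpus[self].load++;
	return (1);
}

/*
 * Push model: a periodic balancer moves processes from the most loaded
 * CPU to the least loaded one until they differ by at most one.
 * Returns the number of migrations performed.
 */
int
balance_push(struct cpu_load *cpus, int ncpu)
{
	int i, hi, lo, moved = 0;

	for (;;) {
		hi = lo = 0;
		for (i = 1; i < ncpu; i++) {
			if (cpus[i].load > cpus[hi].load)
				hi = i;
			if (cpus[i].load < cpus[lo].load)
				lo = i;
		}
		if (cpus[hi].load - cpus[lo].load <= 1)
			break;
		cpus[hi].load--;
		cpus[lo].load++;
		moved++;
	}
	return (moved);
}
```

Starting from loads {5, 1}, calling balance_pull() on the second cpu moves nothing because that cpu is not idle, while balance_push() ends at {3, 3} after two migrations.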
Cheers, Jeff To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Jan 27 3:42:40 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 75BAF37B401 for ; Mon, 27 Jan 2003 03:42:37 -0800 (PST) Received: from purple.the-7.net (purple.the-7.net [209.126.178.119]) by mx1.FreeBSD.org (Postfix) with ESMTP id 938D743E4A for ; Mon, 27 Jan 2003 03:42:34 -0800 (PST) (envelope-from ab@purple.the-7.net) Received: from purple.the-7.net (localhost [IPv6:::1]) by purple.the-7.net (8.12.6/8.12.6) with ESMTP id h0PIrhJ6066452; Sat, 25 Jan 2003 10:53:43 -0800 (PST) (envelope-from ab@purple.the-7.net) Received: (from ab@localhost) by purple.the-7.net (8.12.6/8.12.6/Submit) id h0PIrclA066439; Sat, 25 Jan 2003 10:53:38 -0800 (PST) (envelope-from ab) Date: Sat, 25 Jan 2003 10:53:38 -0800 From: "Eugene M. Kim" To: Alexey Dokuchaev Cc: Terry Lambert , Gordon Tetlow , Garance A Drosihn , arch@FreeBSD.ORG Subject: Re: CFR: Volume labels in FFS Message-ID: <20030125185338.GA54691@purple.the-7.net> References: <20030124212259.GJ53114@roark.gnf.org> <20030124215753.GM53114@roark.gnf.org> <20030124222718.GN53114@roark.gnf.org> <3E31C4F5.972AA69C@mindspring.com> <20030125120433.GA24687@regency.nsu.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030125120433.GA24687@regency.nsu.ru> User-Agent: Mutt/1.4i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Sat, Jan 25, 2003 at 06:04:33PM +0600, Alexey Dokuchaev wrote: > > > > > > I can also forsee being able to hook into devd to do some automounting magic > > > for things like zip disks and cdroms (obviously not with FFS, but cd9660 > > > support would be a good thing to have once 
GEOM recognizes cdroms). > > > > That's what "Last mounted on" is for. > > > > Gotta wonder why we need volume devices, when we know where we > > are going to mount the thing... > > I second Terry here; seeing little-to-none sense in volume lables as > they are. `Last mounted on' is useful only when a disk is assumed to be used on one computer. If you wanted to mount a removable data disk at /data on computer A but at /mydata on computer B and so on, we do need some volume label. Eugene To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Jan 27 9:32: 7 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6387837B401 for ; Mon, 27 Jan 2003 09:32:06 -0800 (PST) Received: from mail.rpi.edu (mail.rpi.edu [128.113.2.7]) by mx1.FreeBSD.org (Postfix) with ESMTP id A822C43F13 for ; Mon, 27 Jan 2003 09:32:05 -0800 (PST) (envelope-from drosih@rpi.edu) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by mail.rpi.edu (8.12.1/8.12.1) with ESMTP id h0RHVv2w228406; Mon, 27 Jan 2003 12:31:58 -0500 Mime-Version: 1.0 X-Sender: drosih@mail.rpi.edu Message-Id: In-Reply-To: <20030124215753.GM53114@roark.gnf.org> References: <20030124212259.GJ53114@roark.gnf.org> <20030124215753.GM53114@roark.gnf.org> Date: Mon, 27 Jan 2003 12:31:56 -0500 To: Gordon Tetlow From: Garance A Drosihn Subject: Re: CFR: Volume labels in FFS Cc: arch@FreeBSD.ORG Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-Scanned-By: MIMEDefang 2.3 (www dot roaringpenguin dot com slash mimedefang) Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG At 1:57 PM -0800 1/24/03, Gordon Tetlow wrote: >On Fri, Jan 24, 2003, Garance A Drosihn wrote: > > At 1:22 PM -0800 1/24/03, Gordon 
Tetlow wrote: >> >I'd like to get some review of my patch to add volume label >> >support to FFS. I've already changed struct fs (sorry non-i386 >> >users) to allocate some space for the label. I'm looking to >> >commit this relatively soon unless there is some major problems >> >with the review. >> > > > >http://people.freebsd.org/~gordon/patches/volume.diff > > > > This sounds interesting. The patch looks like it just adds >> support for setting and retrieving the label. How will this > > be usable at the /etc/fstab level? > >it actually creates device nodes in /dev/vol/, so it >will be more like this: > >/dev/vol/rootfs / ufs rw 1 1 >/dev/vol/usrfs /usr ufs rw 2 2 >...etc... While we're all having fun debating what Terry did or did not say, and what he does or does not want, has anyone been testing Gordon's actual change? I meant to try it this past weekend, but was distracted by network headaches (as I'm sure many others were...). -- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Jan 27 9:34: 0 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B755F37B401 for ; Mon, 27 Jan 2003 09:33:58 -0800 (PST) Received: from ns2.gnf.org (ns2.gnf.org [63.196.132.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id F2ABB43F13 for ; Mon, 27 Jan 2003 09:33:57 -0800 (PST) (envelope-from gtetlow@gnf.org) Received: from EXCHCLUSTER01.lj.gnf.org (exch01.lj.gnf.org [172.25.10.19]) by ns2.gnf.org (8.12.3/8.12.3) with ESMTP id h0RHXstk092333 for ; Mon, 27 Jan 2003 09:33:54 -0800 (PST) (envelope-from gtetlow@gnf.org) Received: from roark.gnf.org ([172.25.24.15]) by EXCHCLUSTER01.lj.gnf.org with Microsoft SMTPSVC(5.0.2195.5329); Mon, 27 Jan 2003 09:33:55 
-0800 Received: from roark.gnf.org (localhost [127.0.0.1]) by roark.gnf.org (8.12.6/8.12.6) with ESMTP id h0RHXt6t016547; Mon, 27 Jan 2003 09:33:55 -0800 (PST) (envelope-from gtetlow@gnf.org) Received: (from gtetlow@localhost) by roark.gnf.org (8.12.6/8.12.6/Submit) id h0RHXtMf016546; Mon, 27 Jan 2003 09:33:55 -0800 (PST) (envelope-from gtetlow) Date: Mon, 27 Jan 2003 09:33:55 -0800 From: Gordon Tetlow To: Garance A Drosihn Cc: arch@FreeBSD.ORG Subject: Re: CFR: Volume labels in FFS Message-ID: <20030127173355.GZ53114@roark.gnf.org> References: <20030124212259.GJ53114@roark.gnf.org> <20030124215753.GM53114@roark.gnf.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="p4I5Bchq69Fe6bBp" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i X-OriginalArrivalTime: 27 Jan 2003 17:33:55.0754 (UTC) FILETIME=[445B10A0:01C2C62A] Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG --p4I5Bchq69Fe6bBp Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Jan 27, 2003 at 12:31:56PM -0500, Garance A Drosihn wrote: > At 1:57 PM -0800 1/24/03, Gordon Tetlow wrote: > >On Fri, Jan 24, 2003, Garance A Drosihn wrote: > > > At 1:22 PM -0800 1/24/03, Gordon Tetlow wrote: > >> >I'd like to get some review of my patch to add volume label > >> >support to FFS. I've already changed struct fs (sorry non-i386 > >> >users) to allocate some space for the label. I'm looking to > >> >commit this relatively soon unless there is some major problems > >> >with the review. > >> > > > > >http://people.freebsd.org/~gordon/patches/volume.diff > > > > > > This sounds interesting. The patch looks like it just adds > >> support for setting and retrieving the label. How will this > > > be usable at the /etc/fstab level? 
> > > >it actually creates device nodes in /dev/vol/, so it > >will be more like this: > > > >/dev/vol/rootfs / ufs rw 1 1 > >/dev/vol/usrfs /usr ufs rw 2 2 > >...etc... >=20 >=20 > While we're all having fun debating what Terry did or did not say, > and what he does or does not want, has anyone been testing Gordon's > actual change? I meant to try it this past weekend, but was > distracted by network headaches (as I'm sure many others were...). Thanks for steering things back around. I'd also like to emphasize that this will be my first C code commit to the tree, so code pointers would be greatly appreciated. -gordon --p4I5Bchq69Fe6bBp Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (FreeBSD) iD8DBQE+NW2DRu2t9DV9ZfsRAg5LAJ9XrYGAqS7MkL92TswYQQnW5bGncQCgqAQR acGeDDCthmlbHrq/T5Cr9HE= =nfBd -----END PGP SIGNATURE----- --p4I5Bchq69Fe6bBp-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Jan 27 9:36:40 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 219FA37B401 for ; Mon, 27 Jan 2003 09:36:39 -0800 (PST) Received: from mail.rpi.edu (mail.rpi.edu [128.113.2.7]) by mx1.FreeBSD.org (Postfix) with ESMTP id 48A6A43EB2 for ; Mon, 27 Jan 2003 09:36:38 -0800 (PST) (envelope-from drosih@rpi.edu) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by mail.rpi.edu (8.12.1/8.12.1) with ESMTP id h0RHaU2w098778; Mon, 27 Jan 2003 12:36:31 -0500 Mime-Version: 1.0 X-Sender: drosih@mail.rpi.edu Message-Id: In-Reply-To: <20030126191749.9C7E52A89E@canning.wemm.org> References: <20030126191749.9C7E52A89E@canning.wemm.org> Date: Mon, 27 Jan 2003 12:36:29 -0500 To: Peter Wemm , Terry Lambert From: Garance A Drosihn Subject: Re: CFR: Volume labels in FFS Cc: arch@FreeBSD.ORG Content-Type: text/plain; 
charset="us-ascii" ; format="flowed" X-Scanned-By: MIMEDefang 2.3 (www dot roaringpenguin dot com slash mimedefang) Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG At 11:17 AM -0800 1/26/03, Peter Wemm wrote: >Terry Lambert wrote: > > Dan Nelson wrote: > > > You should probably refer to it as your suggestion to rename > > > "last mounted on" to "volume label", since people seem to > > > think you want it to keep its original behaviour. > > >> 8-). >> >> It's string that doesn't need to be there; if the OS doesn't care >> about its contents, why should you? 8-) 8-). > >I've found it useful when recovering trashed partition info to make >sure I've found the right superblocks. Yeah, I have a version of /usr/src/tools/tools/find-sb which makes use of that last-mounted on field. Although I imagine the new field would work just as well for my purposes, assuming I had run the utility to set the new field. 
-- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message
From owner-freebsd-arch Mon Jan 27 17:39:39 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7B31F37B401 for ; Mon, 27 Jan 2003 17:39:38 -0800 (PST) Received: from magic.adaptec.com (magic.adaptec.com [208.236.45.80]) by mx1.FreeBSD.org (Postfix) with ESMTP id 04A0243F79 for ; Mon, 27 Jan 2003 17:39:38 -0800 (PST) (envelope-from scott_long@btc.adaptec.com) Received: from redfish.adaptec.com (redfish.adaptec.com [162.62.50.11]) by magic.adaptec.com (8.11.6+Sun/8.11.6) with ESMTP id h0S1dbD12976 for ; Mon, 27 Jan 2003 17:39:37 -0800 (PST) Received: from btc.btc.adaptec.com (btc.btc.adaptec.com [10.100.0.52]) by redfish.adaptec.com (8.8.8+Sun/8.8.8) with ESMTP id RAA09125 for ; Mon, 27 Jan 2003 17:39:27 -0800 (PST) Received: from btc.adaptec.com (hollin [10.100.253.56]) by btc.btc.adaptec.com (8.8.8+Sun/8.8.8) with ESMTP id SAA24782 for ; Mon, 27 Jan 2003 18:39:24 -0700 (MST) Message-ID: <3E35DE8E.2080706@btc.adaptec.com> Date: Mon, 27 Jan 2003 18:36:14 -0700 From: Scott Long User-Agent: Mozilla/5.0
(X11; U; FreeBSD i386; en-US; rv:1.3a) Gecko/20030127 X-Accept-Language: en-us, en MIME-Version: 1.0 To: arch@freebsd.org Subject: bus_dmamem_alloc_size() Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

All,

With the ongoing effort to convert all of our drivers to busdma, it's becoming painfully apparent that bus_dmamem_alloc() needs to be able to specify the size of the buffer to create, and not just default to the max_size field of the dma tag. The reason is that a lot of simple hardware out there does not understand scatter-gather lists, so any data passed to it must be physically contiguous in a single segment. This in turn means that drivers need to allocate a single-segment buffer via busdma and copy their i/o contents into/out of it when talking to the card. These i/o lengths are often extremely variable in size. Without the ability to allocate variable-sized buffers, you're forced to either create a custom dma tag per i/o transaction, or pre-allocate a huge chunk up front and do your own sub-allocations out of it. Both are rather tedious and inefficient.

So, how about adding a method called bus_dmamem_alloc_size() that takes the normal arguments of bus_dmamem_alloc(), plus an allocation size? Driver writers would still be encouraged to be smart about memory management since contigmalloc() would still be the underlying allocator, and contigmalloc rounds all requests up to PAGE_SIZE.

Patches to do this are trivial and can be provided on request. If I don't hear any arguments against this, I'll commit it this week.

In case anyone cares, my motivation for this comes from trying to convert the usb driver to busdma.
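To make the proposal concrete, here is a small user-space mock of the idea, not the kernel patch itself. Everything in it is an assumption for illustration: `struct mock_dma_tag`, `mock_round_page()`, `mock_dmamem_alloc_size()` and the 4096-byte page size are hypothetical, and plain malloc() stands in for contigmalloc(). It only shows the two behaviours described above: the caller picks a size no larger than the tag's maximum, and the request is still rounded up to whole pages.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MOCK_PAGE_SIZE 4096u	/* assumed page size for the illustration */

/* Hypothetical stand-in for a DMA tag; only the maximum size matters here. */
struct mock_dma_tag {
	size_t maxsize;
};

/* Round a request up to a whole number of pages. */
size_t
mock_round_page(size_t size)
{
	return ((size + MOCK_PAGE_SIZE - 1) / MOCK_PAGE_SIZE * MOCK_PAGE_SIZE);
}

/*
 * Allocate "size" bytes rather than always tag->maxsize.
 * Returns 0 on success, EINVAL for a bad size, ENOMEM on failure.
 * On success *alloced holds the number of bytes actually consumed.
 */
int
mock_dmamem_alloc_size(const struct mock_dma_tag *tag, void **vaddr,
    size_t size, size_t *alloced)
{
	assert(tag != NULL && vaddr != NULL && alloced != NULL);
	if (size == 0 || size > tag->maxsize)
		return (EINVAL);
	*alloced = mock_round_page(size);
	*vaddr = malloc(*alloced);	/* stand-in for contigmalloc() */
	if (*vaddr == NULL)
		return (ENOMEM);
	return (0);
}
```

With a 64 KB tag, a 100-byte request in this mock still consumes one full 4096-byte page, which is why per-transfer allocations of very small buffers remain something a driver should avoid.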
Scott

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Mon Jan 27 21:32:26 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3335837B401; Mon, 27 Jan 2003 21:32:23 -0800 (PST) Received: from edgemaster.zombie.org (edgemaster.creighton.edu [147.134.112.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6B52543F75; Mon, 27 Jan 2003 21:32:22 -0800 (PST) (envelope-from smkelly@zombie.org) Received: by edgemaster.zombie.org (Postfix, from userid 1001) id D8431415C6; Mon, 27 Jan 2003 23:32:17 -0600 (CST) Date: Mon, 27 Jan 2003 23:32:17 -0600 From: Sean Kelly To: Bruce Evans Cc: Peter Wemm , phk@FreeBSD.ORG, Kris Kennaway , arch@FreeBSD.ORG Subject: Re: HEADSUP: DEVFS and GEOM mandatorification timeline. Message-ID: <20030128053217.GA738@edgemaster.zombie.org> References: <20030124190057.91A3B2A7EA@canning.wemm.org> <20030125110722.C8080-100000@gamplex.bde.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="zYM0uCDKw75PZbzx" Content-Disposition: inline In-Reply-To: <20030125110722.C8080-100000@gamplex.bde.org> User-Agent: Mutt/1.5.3i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

--zYM0uCDKw75PZbzx Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Sat, Jan 25, 2003 at 11:26:46AM +1100, Bruce Evans wrote:
...
> > > >To the best of your knowledge, there are no remaining serious bugs or
> > > >missing functionality with GEOM (like the disklabel editing problems
> > > >found before 5.0, etc)?
> > >
> > > There is one errata point (can't rewrite BSD boot code on a disk
> > > which is in use) which I am testing a patch for.
> > >
> > > I know of no bugs at present.
>
> Features like disklabel -r and disklabel -W not working are not bugs of
> course.

While I don't agree with the way he said it, I agree with what Bruce Evans is saying here. I am still having problems with disklabel(8) and GEOM as I have previously noted on -current. I consider this a bug, and a rather big one at that. If nobody can explain why it is specific to my configuration, then I will continue to consider it a bug.

I may not have a commit bit, and I may not have a @FreeBSD.ORG e-mail address, but I still have an opinion. I put myself in this strange boat called "the userbase." This is a boat that you might actually want to listen to the few voices in.

I would highly suggest GEOM actually properly support disklabel'ing before it is made The One True Way. I was originally in favor of GEOM (as an end-user), but I'm really starting to get a bitter taste in my mouth.

edgemaster# uname -v
FreeBSD 5.0-CURRENT #1: Mon Jan 27 14:00:00 CST 2003 root@edgemaster.zombie.org:/usr/obj/usr/src/sys/EDGEMASTER
edgemaster# disklabel -r ad1s1
8 partitions:
#        size    offset  fstype  [fsize bsize bps/cpg]
  a:   1024000   6291519  4.2BSD   2048 16384    90  # (Cyl.  391*-  455*)
  b:   6291456        63    swap                     # (Cyl.    0*-  391*)
  c: 117226242        63  unused      0     0        # (Cyl.    0*- 7296*)
  e:  10485760   7315519  4.2BSD   2048 16384    89  # (Cyl.  455*- 1108*)
  f:  99425026  17801279  4.2BSD   2048 16384    89  # (Cyl. 1108*- 7296*)
partition c: partition extends past end of unit
Warning, partition c doesn't start at 0!
Warning, An incorrect partition c may cause problems for standard system utilities
partition f: partition extends past end of unit
edgemaster# disklabel -B ad1s1
partition c: partition extends past end of unit
Warning, partition c doesn't start at 0!
Warning, An incorrect partition c may cause problems for standard system utilities
partition f: partition extends past end of unit

Note that if this is writing boot code, it is not working. When I select "F1 FreeBSD", I receive the message "Boot error". A keypress results in my system rebooting.

edgemaster# disklabel ad1s1
8 partitions:
#        size    offset  fstype  [fsize bsize bps/cpg]
  a:   1024000   6291456  4.2BSD   2048 16384    90  # (Cyl.  391*-  455*)
  b:   6291456         0    swap                     # (Cyl.    0 -  391*)
  c: 117226242         0  unused      0     0        # (Cyl.    0 - 7296*)
  e:  10485760   7315456  4.2BSD   2048 16384    89  # (Cyl.  455*- 1108*)
  f:  99425026  17801216  4.2BSD   2048 16384    89  # (Cyl. 1108*- 7296*)

-- 
Sean Kelly         | PGP KeyID: D2E5E296
smkelly@zombie.org | http://www.zombie.org

--zYM0uCDKw75PZbzx Content-Type: application/pgp-signature Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (FreeBSD)

iD8DBQE+NhXhPm7A9NLl4pYRApJFAJ4/qzeVtmuJQFEPSLcKnaxCHxCx+wCgk450
0SZbhCiyQaYNwNc9nKpY04M=
=vv5x
-----END PGP SIGNATURE-----

--zYM0uCDKw75PZbzx--

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Mon Jan 27 23:17: 6 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E211D37B401 for ; Mon, 27 Jan 2003 23:17:04 -0800 (PST) Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id EB13043F85 for ; Mon, 27 Jan 2003 23:17:03 -0800 (PST) (envelope-from phk@freebsd.org) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.12.6/8.12.6) with ESMTP id h0S7GgQl031799; Tue, 28 Jan 2003 08:16:43 +0100 (CET) (envelope-from phk@freebsd.org) To: Sean Kelly Cc: Bruce Evans , Peter Wemm , Kris Kennaway , arch@freebsd.org Subject: Re: HEADSUP: DEVFS and GEOM mandatorification timeline. From: phk@freebsd.org In-Reply-To: Your message of "Mon, 27 Jan 2003 23:32:17 CST." <20030128053217.GA738@edgemaster.zombie.org> Date: Tue, 28 Jan 2003 08:16:42 +0100 Message-ID: <31798.1043738202@critter.freebsd.dk> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

In message <20030128053217.GA738@edgemaster.zombie.org>, Sean Kelly writes:

>edgemaster# disklabel -r ad1s1

Don't use '-r'; I will remove this as soon as it is no longer needed for NO_GEOM compatibility.

>edgemaster# disklabel -B ad1s1
>partition c: partition extends past end of unit
>Warning, partition c doesn't start at 0!
>Warning, An incorrect partition c may cause problems for standard system utilities
>partition f: partition extends past end of unit
>
>Note that if this is writing boot code, it is not working.  When I select
>"F1 FreeBSD", I receive the message "Boot error".  A keypress results in my
>system rebooting.

Can you please check with hexdump what is at the start of the partition?

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Jan 27 23:31:20 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E904E37B401; Mon, 27 Jan 2003 23:31:16 -0800 (PST) Received: from edgemaster.zombie.org (edgemaster.creighton.edu [147.134.112.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id 08DBD43E4A; Mon, 27 Jan 2003 23:31:16 -0800 (PST) (envelope-from smkelly@zombie.org) Received: by edgemaster.zombie.org (Postfix, from userid 1001) id 7689C415C6; Tue, 28 Jan 2003 01:31:15 -0600 (CST) Date: Tue, 28 Jan 2003 01:31:15 -0600 From: Sean Kelly To: phk@freebsd.org, Bruce Evans , Peter Wemm , Kris Kennaway , arch@freebsd.org Subject: Re: HEADSUP: DEVFS and GEOM mandatorification timeline. Message-ID: <20030128073115.GA1507@edgemaster.zombie.org> References: <20030128053217.GA738@edgemaster.zombie.org> <31798.1043738202@critter.freebsd.dk> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="FCuugMFkClbJLl1L" Content-Disposition: inline In-Reply-To: <31798.1043738202@critter.freebsd.dk> User-Agent: Mutt/1.5.3i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG --FCuugMFkClbJLl1L Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jan 28, 2003 at 08:16:42AM +0100, phk@freebsd.org wrote: > In message <20030128053217.GA738@edgemaster.zombie.org>, Sean Kelly write= s: >=20 > >edgemaster# disklabel -r ad1s1 >=20 > Don't use '-r' I will remove this as soon as it is no longer needed > for NO_GEOM compatibility. Okay. 
> >edgemaster# disklabel -B ad1s1 > >partition c: partition extends past end of unit > >Warning, partition c doesn't start at 0! > >Warning, An incorrect partition c may cause problems for standard system utilities > >partition f: partition extends past end of unit > > > >Note that if this is writing boot code, it is not working. When I select > >"F1 FreeBSD", I receive the message "Boot error". A keypress results in my > >system rebooting. > > Can you please check with hexdump what is on the start of the partition ? I apologize for the tone in part of my previous message, but I'm a bit grumpy after not being able to properly boot for over a month now. I've not been able to rectify this with or without GEOM. However, without GEOM the situation was different. I'm not sure how much you want, so I'll start with 512 bytes. The other obvious amount is the whole 8k of boot, and that seems a bit long.

edgemaster# cd /boot
edgemaster# dd if=/dev/ad1s1 bs=512 count=1 >myboot
edgemaster# cmp myboot boot1
myboot boot1 differ: char 447, line 6

00000000  eb 3c 00 00 00 00 00 00  00 00 00 00 02 00 00 00  |.<..............|
00000010  00 00 00 00 00 00 00 00  12 00 02 00 00 00 00 00  |................|
00000020  00 00 00 00 00 16 1f 66  6a 00 51 50 06 53 31 c0  |.......fj.QP.S1.|
00000030  88 f0 50 6a 10 89 e5 e8  c7 00 8d 66 10 cb fc 31  |..Pj.......f...1|
00000040  c9 8e c1 8e d9 8e d1 bc  00 7c 89 e6 bf 00 07 fe  |.........|......|
00000050  c5 f3 a5 be ee 7d 80 fa  80 72 2c b6 01 e8 67 00  |.....}...r,...g.|
00000060  b9 01 00 be aa 8e b6 01  80 7c 04 a5 75 07 e3 19  |.........|..u...|
00000070  f6 04 80 75 14 83 c6 10  fe c6 80 fe 05 72 e9 49  |...u.........r.I|
00000080  e3 e1 be ac 7d eb 52 31  d2 89 16 00 09 b6 10 e8  |....}.R1........|
00000090  35 00 bb 00 90 8b 77 0a  01 de bf 00 c0 b9 00 ae  |5.....w.........|
000000a0  29 f1 f3 a4 29 f9 30 c0  f3 aa e8 03 00 e9 60 13  |)...).0.......`.|
000000b0  fa e4 64 a8 02 75 fa b0  d1 e6 64 e4 64 a8 02 75  |..d..u....d.d..u|
000000c0  fa b0 df e6 60 fb c3 bb  ec 8c 8b 44 08 8b 4c 0a  |....`......D..L.|
000000d0  0e e8 53 ff 73 2a be a7  7d e8 1c 00 be b1 7d e8  |..S.s*..}.....}.|
000000e0  16 00 30 e4 cd 16 c7 06  72 04 34 12 ea 00 00 ff  |..0.....r.4.....|
000000f0  ff bb 07 00 b4 0e cd 10  ac 84 c0 75 f4 b4 01 f9  |...........u....|
00000100  c3 52 b4 08 cd 13 88 f5  5a 72 f5 80 e1 3f 74 ed  |.R......Zr...?t.|
00000110  fa 66 8b 46 08 52 66 0f  b6 d9 66 31 d2 66 f7 f3  |.f.F.Rf...f1.f..|
00000120  88 eb 88 d5 43 30 d2 66  f7 f3 88 d7 5a 66 3d ff  |....C0.f....Zf=.|
00000130  03 00 00 fb 77 44 86 c4  c0 c8 02 08 e8 40 91 88  |....wD.......@..|
00000140  fe 28 e0 8a 66 02 38 e0  72 02 88 e0 bf 05 00 c4  |.(..f.8.r.......|
00000150  5e 04 50 b4 02 cd 13 5b  73 0a 4f 74 1c 30 e4 cd  |^.P....[s.Ot.0..|
00000160  13 93 eb eb 0f b6 c3 01  46 08 73 03 ff 46 0a d0  |........F.s..F..|
00000170  e3 00 5e 05 28 46 02 77  88 c3 2e f6 06 ba 08 80  |..^.(F.w........|
00000180  0f 84 79 ff bb aa 55 52  b4 41 cd 13 5a 0f 82 6f  |..y...UR.A..Z..o|
00000190  ff 81 fb 55 aa 0f 85 64  ff f6 c1 01 0f 84 5d ff  |...U...d......].|
000001a0  89 ee b4 42 cd 13 c3 52  65 61 64 00 42 6f 6f 74  |...B...Read.Boot|
000001b0  00 20 65 72 72 6f 72 0d  0a 00 80 90 90 90 3f 8c  |. error.......?.|
000001c0  ee 0d ba 96 af 6f 41 ee  40 de a2 42 c6 6d bd a7  |.....oA.@..B.m..|
000001d0  bc 1c 06 44 69 23 20 94  bd fa 5a 2f 92 53 26 dc  |...Di# ...Z/.S&.|
000001e0  56 d8 e7 34 70 35 56 f7  ff f4 23 5a 47 a6 3a a3  |V..4p5V...#ZG.:.|
000001f0  76 98 8a dc 18 c3 da 33  89 41 b6 e0 79 03 55 aa  |v......3.A..y.U.|
00000200

-- Sean Kelly | PGP KeyID: D2E5E296 smkelly@zombie.org | http://www.zombie.org --FCuugMFkClbJLl1L Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (FreeBSD) iD8DBQE+NjHCPm7A9NLl4pYRAvTkAJ4xwgOk0xVtwarbgAaCmlod2BNXUwCgrE5Y O5EaPiw1YwD+f/h0HjOH0xA= =rfV5 -----END PGP SIGNATURE----- --FCuugMFkClbJLl1L-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Jan 27 23:39:48 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D34B537B401 for ; Mon, 27 Jan 2003 23:39:46 -0800 (PST) Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id B2D4043F43 for ; Mon, 27 Jan 2003 23:39:45 -0800 (PST) (envelope-from phk@freebsd.org) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.12.6/8.12.6) with ESMTP id h0S7diQl032049; Tue, 28 Jan 2003 08:39:44 +0100 (CET) (envelope-from phk@freebsd.org) To: Sean Kelly Cc: Bruce Evans , Peter Wemm , Kris Kennaway , arch@freebsd.org Subject: Re: HEADSUP: DEVFS and GEOM mandatorification timeline. From: phk@freebsd.org In-Reply-To: Your message of "Tue, 28 Jan 2003 01:31:15 CST."
<20030128073115.GA1507@edgemaster.zombie.org> Date: Tue, 28 Jan 2003 08:39:44 +0100 Message-ID: <32048.1043739584@critter.freebsd.dk> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG In message <20030128073115.GA1507@edgemaster.zombie.org>, Sean Kelly writes: >I apologize for the tone in part of my previous message, but I'm a bit >grumpy after not being able to properly boot for over a month now. I've not >been able to rectify this with or without GEOM. However, without GEOM the >situation was different. I would be grumpy too then. >I'm not sure how much you want, so I'll start with 512 bytes. The other >obvious amount is the whole 8k of boot, and that seems a bit long. > >edgemaster# cd /boot >edgemaster# dd if=/dev/ad1s1 bs=512 count=1 >myboot >edgemaster# cmp myboot boot1 >myboot boot1 differ: char 447, line 6 This difference is OK in principle: the embedded MBR partition table starts at byte offset 446 and extends for 64 bytes, and as far as I can see the rest of boot1 is in good shape. The MBR area seems to be filled with junk though. Provided that ad1s1 does not start at the physically first sector of the disk, this should be OK. Can you mail me the output of

	dd if=/dev/ad1 bs=64k count=10 | uuencode sean.ad1

and I'll try to figure out what's wrong. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Jan 28 8:29:44 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CFD5F37B401 for ; Tue, 28 Jan 2003 08:29:42 -0800 (PST) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3105F43F43 for ; Tue, 28 Jan 2003 08:29:42 -0800 (PST) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.12.6/8.12.5) with SMTP id h0SGTFP4094113; Tue, 28 Jan 2003 11:29:16 -0500 (EST) (envelope-from robert@fledge.watson.org) Date: Tue, 28 Jan 2003 11:29:14 -0500 (EST) From: Robert Watson X-Sender: robert@fledge.watson.org To: Julian Elischer Cc: Jeff Roberson , Matthew Dillon , Steve Kargl , Gary Jennejohn , arch@FreeBSD.ORG Subject: Re: New scheduler - Interactivity fixes In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Sat, 25 Jan 2003, Julian Elischer wrote: > The standard procedure for adding a new "to be standard" feature is: > > Make the new feature an option, leaving the original as default. Make > the old one as a feature as well. Change the default I know there are other follow-ups to this one, but I couldn't find them, so I'll respond to this (apologies to Julian :-). I noticed recently, perhaps in P4, that Juli Mallett had introduced a "platform" config line for configuration files. It made me wonder whether we should consider adding a new "scheduler" directive, i.e.:

	scheduler	4bsd
	#scheduler	ule

Config(8) could be taught to allow at most one scheduler directive.
It would allow zero with a warning ("No scheduler defined at compile-time; one must be loaded as a module"). We'd either introduce a new class similar to OPTIONS, or map the argument of scheduler directly to an option, with the possible risk of getting some false positives in the argument validity checking. It may well be that at some point we flip to using ULE, but since the idea of pluggable schedulers comes up with relative frequency and there is interest in scheduler research, maintaining the ability to easily plug in schedulers sounds like the right strategy, so formalizing it to improve ease of use would be useful. (For the humor impaired, we could also use "sched" instead of "scheduler" above.) Robert N M Watson FreeBSD Core Team, TrustedBSD Projects robert@fledge.watson.org Network Associates Laboratories To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Jan 28 8:33:15 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2F99237B401 for ; Tue, 28 Jan 2003 08:33:14 -0800 (PST) Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id AB89C43F75 for ; Tue, 28 Jan 2003 08:33:10 -0800 (PST) (envelope-from phk@freebsd.org) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.12.6/8.12.6) with ESMTP id h0SGWmQl036866; Tue, 28 Jan 2003 17:32:48 +0100 (CET) (envelope-from phk@freebsd.org) To: Robert Watson Cc: Julian Elischer , Jeff Roberson , Matthew Dillon , Steve Kargl , Gary Jennejohn , arch@freebsd.org Subject: Re: New scheduler - Interactivity fixes From: phk@freebsd.org In-Reply-To: Your message of "Tue, 28 Jan 2003 11:29:14 EST."
Date: Tue, 28 Jan 2003 17:32:48 +0100 Message-ID: <36865.1043771568@critter.freebsd.dk> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG In message , Robert Watson writes: > >On Sat, 25 Jan 2003, Julian Elischer wrote: > >> The standard procedure for adding a new "to be standard" feature is: >> >> Make the new feature an option, leaving the original as default. Make >> the old one as a feature as well. Change the default > >I know there are other follow-ups to this one, but I couldn't find them, >so I'll respond to this (apologies to Julian :-). I noticed recently, >perhaps in P4, that Juli Mallett had introduced a "platform" config line >for configuration files. It made me wonder whether what we shouldn't >consider doing is adding a new "scheduler" directive, i.e.,: > > > scheduler 4bsd > #scheduler ule This is just putting duct tape over the hole rather than fixing it. Config(8) barely manages a one-dimensional view of the options; it needs to learn about dependencies in a proper way, rather than having magic keywords added for all kernel parts. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Jan 28 11: 7: 9 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7B3B037B401; Tue, 28 Jan 2003 11:07:08 -0800 (PST) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 05E9043F3F; Tue, 28 Jan 2003 11:07:08 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.6/8.12.6) with ESMTP id h0SJ720i028160; Tue, 28 Jan 2003 11:07:02 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.6/8.12.6/Submit) id h0SJ72tA028159; Tue, 28 Jan 2003 11:07:02 -0800 (PST) Date: Tue, 28 Jan 2003 11:07:02 -0800 (PST) From: Matthew Dillon Message-Id: <200301281907.h0SJ72tA028159@apollo.backplane.com> To: Robert Watson Cc: Julian Elischer , Jeff Roberson , Steve Kargl , Gary Jennejohn , arch@FreeBSD.ORG Subject: Re: New scheduler - Interactivity fixes References: Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG It seems to me that the correct solution is to use the SYSINIT mechanism to allow multiple schedulers to be loaded into the kernel, and choose one as a default (aka sched_4bsd) either with a flag in the SYSINIT or simply by order. Then a boot variable could be used to override the default and allow one to choose a particular scheduler. The work required to do this is minor. We need only structuralize / vectorize the few global procedures in sched_*() and then declare them static. 
-Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Jan 28 11:49:26 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0759337B401 for ; Tue, 28 Jan 2003 11:49:26 -0800 (PST) Received: from net1.gendyn.com (gate1.gendyn.com [204.60.171.22]) by mx1.FreeBSD.org (Postfix) with ESMTP id 06BB743F3F for ; Tue, 28 Jan 2003 11:49:24 -0800 (PST) (envelope-from eischen@vigrid.com) Received: from [153.11.11.3] (helo=ebnext01) by net1.gendyn.com with esmtp (Exim 2.12 #1) id 18dbjI-000CVJ-00 for arch@freebsd.org; Tue, 28 Jan 2003 14:49:08 -0500 Received: from clcrtr.gdeb.com ([153.11.109.11]) by ebnext01 with SMTP id h0SJn8oC015896 for ; Tue, 28 Jan 2003 14:49:08 -0500 Received: from vigrid.com (gpz.clc.gdeb.com [192.168.3.12]) by clcrtr.gdeb.com (8.11.4/8.11.4) with ESMTP id h0Q41BN18968 for ; Sat, 25 Jan 2003 23:01:12 -0500 (EST) (envelope-from eischen@vigrid.com) Message-ID: <3E36DEB1.8CC4E61F@vigrid.com> Date: Tue, 28 Jan 2003 14:49:05 -0500 From: Daniel Eischen X-Mailer: Mozilla 4.78 [en] (X11; U; SunOS 5.9 sun4u) X-Accept-Language: en MIME-Version: 1.0 To: arch@freebsd.org Subject: New scheduler and classes Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG How does the new scheduler/API handle different classes? If you want to have processes^Wthreads in 3 or 4 different scheduling classes (perhaps real-time, time-sharing, etc, a la Solaris), how does "a scheduler" handle this? Is there only ever one scheduler selected in the kernel, and it is up to this scheduler to handle all the threads in each class? 
Or can you stack schedulers somehow, so that each class gets its own scheduler, and it's up to the top-level scheduler to schedule amongst the classes? -- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Jan 28 12:11:35 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 810D137B401 for ; Tue, 28 Jan 2003 12:11:31 -0800 (PST) Received: from sccrmhc01.attbi.com (sccrmhc01.attbi.com [204.127.202.61]) by mx1.FreeBSD.org (Postfix) with ESMTP id D262543F93 for ; Tue, 28 Jan 2003 12:11:30 -0800 (PST) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org (12-232-168-4.client.attbi.com[12.232.168.4]) by sccrmhc01.attbi.com (sccrmhc01) with ESMTP id <200301282011290010088ii5e>; Tue, 28 Jan 2003 20:11:29 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id MAA16616 for ; Tue, 28 Jan 2003 12:11:28 -0800 (PST) Date: Tue, 28 Jan 2003 12:11:27 -0800 (PST) From: Julian Elischer To: arch@freebsd.org Subject: threads and the scheduler(s). Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG At the beginning of this we spec'd out a design and several things were said. 1/ the names of these entities might change as we discover the implementation details, and 2/ Some of these structures (thread, KSE, KSEGRP, proc) might become more or less 'virtual' when we try to implement them. Well, that time has come. In a stroke of brilliance David Xu has added a small 5th element, an 'upcall' structure, that pretty much changes the relationship between all the others.
** What follows is only true for Threaded processes ** The 'upcall' structure is a 'token with data'. Only a thread that has an upcall structure can cross into userland. Any thread that wishes to cross to userland but does not have one must simply save its state to its userland mailbox and exit, relying on some other thread to eventually deliver the completion notice to the UTS. There is a lot of shuffling here and there, but the end result is that the KSE structure is in effect only known by the KSE/BSD4.4 scheduler. If we were to put a couple of callbacks into the scheduler for fork/exit/(etc) points, we could completely isolate the existence of the KSE to the KSE scheduler. This would mean that the entities that the rest of the kernel would know about were:

proc -- owns resources and memory map.
ksegrp (possibly rename this now.. (subproc?))
thread -- the thing that gets to sleep, run, etc.

The KSE becomes an internal scheduler detail that is used to force some kind of fairness between threaded and unthreaded processes. It should be theoretically possible to write a scheduler that uses a different internal abstraction. We've been working towards this for a while and over the last few months we've isolated all the 'hard work' code regarding KSEs to kern_switch.c and kern_threads.c. There are some other places where KSEs are referenced, but usually they are just creating one, removing one, or doing other housekeeping with them, and most of these could be moved to scheduler specific callbacks. Here is an almost complete list of what the KSE is used for OUTSIDE of the SCHEDULER files. Some of these could be moved to within those files by extending the scheduler interface (e.g. sched_fork() could add a KSE to a process), while others of these uses could move to other structures.
For example, KEF_ASTPENDING might be better stored in the upcall structure or the thread structure (of the thread that is going up). What we need to decide is "What exactly does ASTPENDING mean in the context of a threaded program?" Same for NEEDRESCHED.

##############################
./alpha/alpha/genassym.c:ASSYM(KE_FLAGS, offsetof(struct kse, ke_flags));
./ddb/db_ps.c: db_printf("[CPU %d]", td->td_kse->ke_oncpu);
./i386/i386/genassym.c:ASSYM(KE_FLAGS, offsetof(struct kse, ke_flags));
./i386/i386/sys_machdep.c: td->td_kse->ke_flags |= KEF_NEEDRESCHED;
./i386/i386/trap.c: KASSERT((td->td_kse->ke_thread == td), ("syscall:kse/thread mismatch"));
./i386/isa/npx.c: td->td_kse->ke_flags |= KEF_ASTPENDING;
./ia64/ia64/genassym.c:ASSYM(KE_FLAGS, offsetof(struct kse, ke_flags));
./kern/init_main.c: ke->ke_sched = kse0_sched;
./kern/init_main.c: ke->ke_oncpu = 0;
./kern/init_main.c: ke->ke_state = KES_THREAD;
./kern/init_main.c: ke->ke_thread = td;
./kern/kern_clock.c: td->td_kse->ke_flags |= KEF_ASTPENDING;
./kern/kern_clock.c: td->td_kse->ke_flags |= KEF_ASTPENDING;
./kern/kern_clock.c: * ke->ke_uticks, p->p_sticks, p->p_iticks, and p->p_estcpu. This function
./kern/kern_clock.c: * ke->ke_uticks, p->p_sticks, p->p_iticks, and p->p_estcpu.
./kern/kern_fork.c: bzero(&ke2->ke_startzero,
./kern/kern_fork.c: (unsigned) RANGEOF(struct kse, ke_startzero, ke_endzero));
./kern/kern_fork.c: ke2->ke_state = KES_THREAD;
./kern/kern_fork.c: ke2->ke_thread = td2;
./kern/kern_fork.c: td->td_kse->ke_oncpu = PCPU_GET(cpuid);
./kern/kern_idle.c: td->td_kse->ke_flags |= KEF_IDLEKSE;
./kern/kern_intr.c: if (ctd->td_kse->ke_flags & KEF_IDLEKSE)
./kern/kern_intr.c: curthread->td_kse->ke_flags |= KEF_NEEDRESCHED;
./kern/kern_ktr.c: (td->td_kse->ke_flags & KEF_IDLEKSE) == 0 &&
./kern/kern_mutex.c: ((td)->td_kse != NULL && (td)->td_kse->ke_oncpu != NOCPU)
./kern/kern_proc.c: kp->ki_rqindex = ke->ke_rqindex;
./kern/kern_proc.c: kp->ki_oncpu = ke->ke_oncpu;
./kern/kern_sig.c: ke->ke_flags |= KEF_ASTPENDING;
./kern/subr_smp.c: id = td->td_kse->ke_oncpu;
./kern/subr_smp.c: td->td_kse->ke_flags |= KEF_NEEDRESCHED;
./kern/subr_trap.c: (td->td_kse->ke_flags & KEF_ASTPENDING) == 0))
./kern/subr_trap.c: flags = ke->ke_flags;
./kern/subr_trap.c: ke->ke_flags &= ~(KEF_ASTPENDING | KEF_NEEDRESCHED);
./kern/subr_witness.c: * td->td_kse->ke_oncpu to get the list of spinlocks for this thread
./posix4/ksched.c: td->td_kse->ke_flags |= KEF_NEEDRESCHED;
./posix4/ksched.c: td->td_kse->ke_flags |= KEF_NEEDRESCHED;
./posix4/ksched.c: curthread->td_kse->ke_flags |= KEF_NEEDRESCHED;
./powerpc/powerpc/genassym.c:ASSYM(KE_FLAGS, offsetof(struct kse, ke_flags));
./sys/sched.h:extern struct ke_sched *kse0_sched;
./security/mac_lomac/mac_lomac.c: curthread->td_kse->ke_flags |= KEF_ASTPENDING;
./sparc64/sparc64/genassym.c:ASSYM(KE_FLAGS, offsetof(struct kse, ke_flags));
################################################

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Jan 28 20:19:12 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5913E37B401 for ; Tue, 28
Jan 2003 20:19:10 -0800 (PST) Received: from citusc.usc.edu (citusc.usc.edu [128.125.38.123]) by mx1.FreeBSD.org (Postfix) with ESMTP id DD4E643F79 for ; Tue, 28 Jan 2003 20:19:09 -0800 (PST) (envelope-from kris@citusc.usc.edu) Received: (from kris@localhost) by citusc.usc.edu (8.11.6/8.11.2) id h0T4J2m01722; Tue, 28 Jan 2003 20:19:02 -0800 Date: Tue, 28 Jan 2003 20:19:02 -0800 From: Kris Kennaway To: Julian Elischer Cc: arch@FreeBSD.ORG Subject: Re: threads and the scheduler(s). Message-ID: <20030128201901.A1673@citusc.usc.edu> References: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-md5; protocol="application/pgp-signature"; boundary="yrj/dFKFPuw6o+aM" Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: ; from julian@elischer.org on Tue, Jan 28, 2003 at 12:11:27PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG --yrj/dFKFPuw6o+aM Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jan 28, 2003 at 12:11:27PM -0800, Julian Elischer wrote: > > At the beginning of this we spec'd out a design and several things > were said. > > 1/ the names of these entities might change as we discover the > implementation details, > > and > > 2/ Some of these structures (thread, KSE, KSEGRP, proc) might become > more or less 'virtual' when we try to implement them. > > Well, that time has come. I'm kind of concerned that KSE has turned into something much more complicated than it should ever have been (and already was at the beginning).
I understand scheduler activations, but as far as I can tell we're no longer even close to the original KSE design because of various incremental changes in direction, and there's no coherent documentation for the current state of affairs (or explanations for the various design decisions in departure from the original design). I guess I don't actually have anything constructive to say, but I really hope this whole KSE thing isn't going to turn out to be an over-designed way of not-really-solving non-problems. Kris --yrj/dFKFPuw6o+aM Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE+N1Y1Wry0BWjoQKURArGcAKCWUwZrYpHRVFXdNOEjsAKLIOrIKwCeIGOv bJ6w8nIkgbaBtUgRzE9or0Q= =7P7x -----END PGP SIGNATURE----- --yrj/dFKFPuw6o+aM-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Jan 28 21:48:33 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B63B937B401 for ; Tue, 28 Jan 2003 21:48:31 -0800 (PST) Received: from sccrmhc01.attbi.com (sccrmhc01.attbi.com [204.127.202.61]) by mx1.FreeBSD.org (Postfix) with ESMTP id ECC7E43FA3 for ; Tue, 28 Jan 2003 21:48:30 -0800 (PST) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org (12-232-168-4.client.attbi.com[12.232.168.4]) by sccrmhc01.attbi.com (sccrmhc01) with ESMTP id <200301290548290010086uboe>; Wed, 29 Jan 2003 05:48:29 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id VAA20486; Tue, 28 Jan 2003 21:48:28 -0800 (PST) Date: Tue, 28 Jan 2003 21:48:26 -0800 (PST) From: Julian Elischer To: Kris Kennaway Cc: arch@FreeBSD.ORG Subject: Re: threads and the scheduler(s). 
In-Reply-To: <20030128201901.A1673@citusc.usc.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Tue, 28 Jan 2003, Kris Kennaway wrote: > On Tue, Jan 28, 2003 at 12:11:27PM -0800, Julian Elischer wrote: > > > > At the beginning of this we spec'd out a design and several things > > were said. > > > > 1/ the names of these entities might change as we discover the > > implementation details, > > > > and > > > > 2/ Some of these structures (thread, KSE, KSEGRP, proc) might become > > more or less 'virtual' when we try to implement them. > > > > Well, that time has come. > > I'm kind of concerned that KSE has turned into something much more > complicated than it should ever have been (and already was at the > beginning). I understand scheduler activations, but as far as I can > tell we're no longer even close to the original KSE design because of > various incremental changes in direction, and there's no coherent > documentation for the current state of affairs (or explanations for > the various design decisions in departure from the original design). > > I guess I don't actually have anything constructive to say, but I > really hope this whole KSE thing isn't going to turn out to be an > over-designed way of not-really-solving non-problems. > Hmm, well, it's an interesting point of view, but the implementation is actually very close to what was proposed 2 years ago. If you think it's complicated you are misunderstanding something. It's amazingly simple; it's just a lot of work getting there from a non-threading start.
> Kris > > > > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Jan 29 1:10:32 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 12CC537B401; Wed, 29 Jan 2003 01:10:31 -0800 (PST) Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id 37E9943F75; Wed, 29 Jan 2003 01:10:30 -0800 (PST) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id h0T9ATo86808; Wed, 29 Jan 2003 04:10:29 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Wed, 29 Jan 2003 04:10:28 -0500 (EST) From: Jeff Roberson To: phk@FreeBSD.ORG Cc: Robert Watson , Julian Elischer , Matthew Dillon , Steve Kargl , Gary Jennejohn , Subject: Re: New scheduler - Interactivity fixes In-Reply-To: <36865.1043771568@critter.freebsd.dk> Message-ID: <20030129040835.G31308-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Tue, 28 Jan 2003 phk@FreeBSD.ORG wrote: > > > > scheduler 4bsd > > #scheduler ule > > This is just putting duck-tape over the hole rather than fix it. > > Config(8) barely managed to get a one-dimensional view of the options, > it needs to learn about dependencies in a proper way, rather than > to add magic keywords for all kernel parts. > Yes, I agree with this. I have no intentions of hacking up config though. In the absence of a real dependency mechanism I might be swayed to do some duct-tape. Perhaps we could come up with a spec and propose the idea to someone who is inclined to work on these sorts of things? 
I know peter had some good ideas for this. Cheers, Jeff To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Jan 29 1:22:33 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DF75D37B401 for ; Wed, 29 Jan 2003 01:22:31 -0800 (PST) Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id 260BB43F3F for ; Wed, 29 Jan 2003 01:22:31 -0800 (PST) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id h0T9K2590232; Wed, 29 Jan 2003 04:20:03 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Wed, 29 Jan 2003 04:20:02 -0500 (EST) From: Jeff Roberson To: Kris Kennaway Cc: Julian Elischer , Subject: Re: threads and the scheduler(s). In-Reply-To: <20030128201901.A1673@citusc.usc.edu> Message-ID: <20030129041155.Y31308-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG > > I'm kind of concerned that KSE has turned into something much more > complicated than it should ever have been (and already was at the > beginning). I understand scheduler activations, but as far as I can > tell we're no longer even close to the original KSE design because of > various incremental changes in direction, and there's no coherent > documentation for the current state of affairs (or explanations for > the various design decisions in departure from the original design). 
> > I guess I don't actually have anything constructive to say, but I > really hope this whole KSE thing isn't going to turn out to be an > over-designed way of not-really-solving non-problems. > I had many of these same concerns as I began to work on ULE. After thinking hard about the problem and how I might attack it, I eventually decided that this is the right way. A long conversation with mini on IRC one night actually got me to start liking it. I think I came at it with the same perspective that many in the project have. I had a reasonable understanding of SA and LWPs, but this didn't make much sense. I think a design doc, or at the very least a description of how and why this differs from traditional SA, would go a long way towards settling people's fears about KSE. I think it's just not well understood by many. My concerns now are mostly with how long it will take before it works and how long it will take before it works well. It is a complicated system. There are too few people working on it. It's also a major sore spot for FreeBSD to not have it.
Cheers, Jeff To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Jan 29 1:49:50 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 696AD37B401; Wed, 29 Jan 2003 01:49:49 -0800 (PST) Received: from cirb503493.alcatel.com.au (c18609.belrs1.nsw.optusnet.com.au [210.49.80.204]) by mx1.FreeBSD.org (Postfix) with ESMTP id 75A9743E4A; Wed, 29 Jan 2003 01:49:48 -0800 (PST) (envelope-from peterjeremy@optushome.com.au) Received: from cirb503493.alcatel.com.au (localhost.alcatel.com.au [127.0.0.1]) by cirb503493.alcatel.com.au (8.12.5/8.12.5) with ESMTP id h0T9niLZ027852; Wed, 29 Jan 2003 20:49:44 +1100 (EST) (envelope-from jeremyp@cirb503493.alcatel.com.au) Received: (from jeremyp@localhost) by cirb503493.alcatel.com.au (8.12.6/8.12.5/Submit) id h0T9nhoT027851; Wed, 29 Jan 2003 20:49:43 +1100 (EST) Date: Wed, 29 Jan 2003 20:49:43 +1100 From: Peter Jeremy To: Robert Watson Cc: Jeff Roberson , arch@FreeBSD.ORG Subject: Re: New scheduler - Interactivity fixes Message-ID: <20030129094943.GA27833@cirb503493.alcatel.com.au> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Tue, Jan 28, 2003 at 11:29:14AM -0500, Robert Watson wrote: > scheduler 4bsd > #scheduler ule > >Config(8) could be taught to allow at most one scheduler directive. It >would allow zero with a warning ("No scheduler defined at compile-time; >one must be loaded as a module"). 
We'd either introduce a new class >similar to OPTIONS, or map the argument of scheduler directly to an >option, with the possible risk of getting some false positives in the >argument validity checking. Traditionally, this is done by having the linker whinge about undefined or multiply defined symbols when it tries to link the kernel, leaving the poor user to try and guess which option he got wrong :-). I'd prefer to allow multiple schedulers to be compiled in (or KLD'd), with a sysctl to select which one (preferably alterable at runtime). This reduces(?) the problem to: - ensure that at least one scheduler is present before trying to use it - picking a default scheduler when multiple schedulers are present. >(For the humor impaired, we could also use "sched" instead of "scheduler" >above.) Maybe we should name all the schedulers after colours. "Blue" seems logical for Jeff's new scheduler. How about "red" for the standard one? Peter To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Jan 29 10: 4:19 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 65EB537B401; Wed, 29 Jan 2003 10:04:18 -0800 (PST) Received: from HAL9000.homeunix.com (12-233-57-224.client.attbi.com [12.233.57.224]) by mx1.FreeBSD.org (Postfix) with ESMTP id A73A043E4A; Wed, 29 Jan 2003 10:04:17 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.6/8.12.5) with ESMTP id h0TI45Nt004953; Wed, 29 Jan 2003 10:04:05 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.6/8.12.5/Submit) id h0TI45Me004952; Wed, 29 Jan 2003 10:04:05 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Date: Wed, 29 Jan 2003 10:04:05 -0800 From: David Schultz To: Peter Jeremy 
Cc: Robert Watson , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: New scheduler - Interactivity fixes Message-ID: <20030129180405.GA3139@HAL9000.homeunix.com> Mail-Followup-To: Peter Jeremy , Robert Watson , Jeff Roberson , arch@FreeBSD.ORG References: <20030129094943.GA27833@cirb503493.alcatel.com.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030129094943.GA27833@cirb503493.alcatel.com.au> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Thus spake Peter Jeremy : > I'd prefer to allow multiple schedulers to be compiled in (or KLD'd), > with a sysctl to select which one (preferably alterable at runtime). > This reduces(?) the problem to: > - ensure that at least one scheduler is present before trying to use it > - picking a default scheduler when multiple schedulers are present. Some researchers at Microsoft implemented dynamically loadable schedulers for Windows NT a few years ago. Their idea was to allow applications to load different algorithms for different parts of the system. I think they subsequently developed it to the point where you could have a hierarchy of schedulers, all at the same time. While some of their ideas have the potential to become incredibly complicated on the long and winding path from research prototype to the real world, it may be helpful to look at their design. (Also note that KSE does allow applications to have some control over scheduling policies among their threads and KSEs, so we probably ought to leverage that...AFTER we have a working KSE implementation.)
I can't find the Microsoft Research paper that I remember, but I did find an older, less detailed paper on the same topic: http://citeseer.nj.nec.com/candea98vassal.html To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Jan 29 10:15:55 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DF40F37B401; Wed, 29 Jan 2003 10:15:53 -0800 (PST) Received: from xorpc.icir.org (xorpc.icir.org [192.150.187.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id 630C443F43; Wed, 29 Jan 2003 10:15:53 -0800 (PST) (envelope-from rizzo@xorpc.icir.org) Received: from xorpc.icir.org (localhost [127.0.0.1]) by xorpc.icir.org (8.12.3/8.12.3) with ESMTP id h0TIFN8a054148; Wed, 29 Jan 2003 10:15:23 -0800 (PST) (envelope-from rizzo@xorpc.icir.org) Received: (from rizzo@localhost) by xorpc.icir.org (8.12.3/8.12.3/Submit) id h0TIFNpG054147; Wed, 29 Jan 2003 10:15:23 -0800 (PST) (envelope-from rizzo) Date: Wed, 29 Jan 2003 10:15:23 -0800 From: Luigi Rizzo To: David Schultz Cc: Peter Jeremy , Robert Watson , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: New scheduler - Interactivity fixes Message-ID: <20030129101523.A54088@xorpc.icir.org> References: <20030129094943.GA27833@cirb503493.alcatel.com.au> <20030129180405.GA3139@HAL9000.homeunix.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20030129180405.GA3139@HAL9000.homeunix.com>; from dschultz@uclink.Berkeley.EDU on Wed, Jan 29, 2003 at 10:04:05AM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, Jan 29, 2003 at 10:04:05AM -0800, David Schultz wrote: > Thus spake Peter Jeremy : > > I'd prefer to allow multiple schedulers to be compiled in 
(or KLD'd), > > with a sysctl to select which one (preferably alterable at runtime). > > This reduces(?) the problem to: > > - ensure that at least one scheduler is present before trying to use it > > - picking a default scheduler when multiple schedulers are present. > > Some researchers at Microsoft implemented dynamically loadable > schedulers for Windows NT a few years ago. Their idea was to And the same concept is used in our proportional share scheduler, where you can switch schedulers at runtime and have multiple schedulers compiled in (possibly kld works too, I have just been too lazy to write a suitable Makefile for the module). An older version of that code (for -stable) can be found at http://info.iet.unipi.it/~luigi/ps_sched.20020719a.diff I have been trying to suggest the introduction of this virtual interface for dynamically loadable schedulers for a while, with little success unfortunately (random complaints about the overhead of going through an indirect function call when a scheduler function should be invoked -- which is totally irrelevant given how expensive a potential context switch is, and the fact that we are already paying that overhead one or more times for each incoming packet, an event that is 1..3 orders of magnitude more frequent).
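To make the "virtual interface" idea concrete, here is a minimal userland sketch of a scheduler hidden behind a table of function pointers, with a runtime switch between two policies. Everything in it is an assumption made up for illustration: struct sched_ops, sched_select(), sched_pick() and the two toy policies are not taken from the FreeBSD tree or from the ps_sched patch, and a real kernel version would need locking and a sysctl or KLD hook on top.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical scheduler interface; none of these names come from
 * the FreeBSD tree or from the ps_sched patch.
 */
struct sched_ops {
	const char *name;
	/* Return the index of the entry to run next, or -1 if empty. */
	int (*pick_next)(const int *prio, size_t n);
};

/* Policy 1: the numerically lowest priority value wins. */
static int
pick_lowest(const int *prio, size_t n)
{
	size_t i, best = 0;

	if (n == 0)
		return (-1);
	for (i = 1; i < n; i++)
		if (prio[i] < prio[best])
			best = i;
	return ((int)best);
}

/* Policy 2: plain FIFO, always take the head of the queue. */
static int
pick_first(const int *prio, size_t n)
{
	(void)prio;
	return (n > 0 ? 0 : -1);
}

/* All policies that were "compiled in". */
static const struct sched_ops sched_table[] = {
	{ "prio", pick_lowest },
	{ "fifo", pick_first },
};

/* Default scheduler when several are present: the first entry. */
static const struct sched_ops *cur_sched = &sched_table[0];

/* Switch the active scheduler by name; 0 on success, -1 if unknown. */
int
sched_select(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(sched_table) / sizeof(sched_table[0]); i++) {
		if (strcmp(sched_table[i].name, name) == 0) {
			cur_sched = &sched_table[i];
			return (0);
		}
	}
	return (-1);
}

/* The one indirect call paid on each scheduling decision. */
int
sched_pick(const int *prio, size_t n)
{
	assert(cur_sched != NULL);
	return (cur_sched->pick_next(prio, n));
}
```

The only per-decision cost added by the indirection is the single call through cur_sched->pick_next, which is the overhead being argued about above. An unknown name leaves the current scheduler in place, so there is always at least one usable policy.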
cheers luigi To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Jan 29 10:42:23 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D431E37B401 for ; Wed, 29 Jan 2003 10:42:21 -0800 (PST) Received: from angelica.unixdaemons.com (angelica.unixdaemons.com [209.148.64.135]) by mx1.FreeBSD.org (Postfix) with ESMTP id 764DB43F93 for ; Wed, 29 Jan 2003 10:42:20 -0800 (PST) (envelope-from hiten@angelica.unixdaemons.com) Received: from angelica.unixdaemons.com (hiten@localhost.unixdaemons.com [127.0.0.1]) by angelica.unixdaemons.com (8.12.7/8.12.1) with ESMTP id h0TIgBa6071044; Wed, 29 Jan 2003 13:42:11 -0500 (EST) Received: (from hiten@localhost) by angelica.unixdaemons.com (8.12.7/8.12.1/Submit) id h0TIgBHk071043; Wed, 29 Jan 2003 13:42:11 -0500 (EST) (envelope-from hiten) Date: Wed, 29 Jan 2003 13:42:11 -0500 From: Hiten Pandya To: Scott Long Cc: arch@FreeBSD.ORG Subject: Re: bus_dmamem_alloc_size() Message-ID: <20030129184211.GA70010@unixdaemons.com> References: <3E35DE8E.2080706@btc.adaptec.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3E35DE8E.2080706@btc.adaptec.com> User-Agent: Mutt/1.4i X-Operating-System: FreeBSD i386 X-Public-Key: http://www.pittgoth.com/~hiten/pubkey.asc X-URL: http://www.unixdaemons.com/~hiten X-PGP: http://pgp.mit.edu:11371/pks/lookup?search=Hiten+Pandya&op=index Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, Jan 27, 2003 at 06:36:14PM -0700, Scott Long wrote the words in effect of: > All, > > With the ongoing effort to convert all of our drivers to busdma, it's > becoming painfully apparent that bus_dmamem_alloc() needs to > be able to specify the size of
the buffer to create, and not just > default to the max_size field of the dma tag. The reason is that a > lot of simple hardware out there doesn't understand scatter-gather lists, > so any data passed to it must be physically contiguous in a > single segment. This in turn means that drivers need to allocate a > single segment buffer via busdma and copy their i/o contents > into/out of it when talking to the card. These i/o lengths are often > extremely variable in size. Without the ability to allocate variable > sized buffers, you're forced to either create a custom dma tag per > i/o transaction, or pre-allocate a huge chunk up front and do your > own sub allocations out of it. Both are rather tedious and > inefficient. > > So, how about adding a method called bus_dmamem_alloc_size() > that takes the normal arguments of bus_dmamem_alloc(), plus an > allocation size. Driver writers would still be encouraged to be smart > about memory management since contigmalloc() would still be the > underlying allocator, and contigmalloc rounds all requests up to > PAGE_SIZE. > > Patches to do this are trivial and can be provided on request. If > I don't hear any arguments against this, I'll commit it this week. > In case anyone cares, my motivation for this comes from trying > to convert the usb driver to busdma. NetBSD had added a size parameter to their bus_dmamem_alloc() at some point. It was merged from thorpej's branch. We should probably do the same, but I think this will be a harder, more tedious task. JFYI. Cheers.
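As a rough illustration of the quoted point that a page-granular allocator such as contigmalloc() rounds every request up to PAGE_SIZE, here is a small userland sketch of the arithmetic. The 4096-byte page size and the sketch_* names are assumptions for this example only; they are not part of busdma or of the proposed bus_dmamem_alloc_size().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the sketch; the real PAGE_SIZE is per-platform. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* Round a request up to a whole number of pages. */
size_t
sketch_round_page(size_t len)
{
	/* Guard against wrap-around in the addition below. */
	assert(len <= SIZE_MAX - (SKETCH_PAGE_SIZE - 1));
	return ((len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE *
	    SKETCH_PAGE_SIZE);
}

/* Bytes handed out by the allocator but never asked for. */
size_t
sketch_waste(size_t len)
{
	return (sketch_round_page(len) - len);
}
```

Under these assumptions a 64-byte transfer buffer still costs a full page, which is why a driver doing many small, variable-sized transfers would still want to be careful about how often it allocates.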
-- Hiten Pandya (hiten@unixdaemons.com, hiten@uk.FreeBSD.org) http://www.unixdaemons.com/~hiten/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Jan 29 12:27:14 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4BA1637B401 for ; Wed, 29 Jan 2003 12:27:12 -0800 (PST) Received: from magic.adaptec.com (magic.adaptec.com [208.236.45.80]) by mx1.FreeBSD.org (Postfix) with ESMTP id A15B643F3F for ; Wed, 29 Jan 2003 12:27:11 -0800 (PST) (envelope-from scott_long@btc.adaptec.com) Received: from redfish.adaptec.com (redfish.adaptec.com [162.62.50.11]) by magic.adaptec.com (8.11.6+Sun/8.11.6) with ESMTP id h0TKR3D24897; Wed, 29 Jan 2003 12:27:03 -0800 (PST) Received: from btc.btc.adaptec.com (btc.btc.adaptec.com [10.100.0.52]) by redfish.adaptec.com (8.8.8+Sun/8.8.8) with ESMTP id MAA00538; Wed, 29 Jan 2003 12:26:52 -0800 (PST) Received: from btc.adaptec.com (hollin [10.100.253.56]) by btc.btc.adaptec.com (8.8.8+Sun/8.8.8) with ESMTP id NAA25761; Wed, 29 Jan 2003 13:26:48 -0700 (MST) Message-ID: <3E383846.20308@btc.adaptec.com> Date: Wed, 29 Jan 2003 13:23:34 -0700 From: Scott Long User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3a) Gecko/20030127 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Hiten Pandya Cc: arch@freebsd.org Subject: Re: bus_dmamem_alloc_size() References: <3E35DE8E.2080706@btc.adaptec.com> <20030129184211.GA70010@unixdaemons.com> In-Reply-To: <20030129184211.GA70010@unixdaemons.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Hiten Pandya wrote: > On Mon, Jan 27, 2003 at 06:36:14PM -0700, Scott Long wrote the words in effect of: > 
>>All, >> >>With the ongoing effort to convert all of our drivers to busdma, it's >>becoming painfully apparent that bus_dmamem_alloc() needs to >>be able to specify the size of the buffer to create, and not just >>default to the max_size field of the dma tag. The reason is that a >>lot of simple hardware out there doesn't understand scatter-gather lists, >>so any data passed to it must be physically contiguous in a >>single segment. This in turn means that drivers need to allocate a >>single segment buffer via busdma and copy their i/o contents >>into/out of it when talking to the card. These i/o lengths are often >>extremely variable in size. Without the ability to allocate variable >>sized buffers, you're forced to either create a custom dma tag per >>i/o transaction, or pre-allocate a huge chunk up front and do your >>own sub allocations out of it. Both are rather tedious and >>inefficient. >> >>So, how about adding a method called bus_dmamem_alloc_size() >>that takes the normal arguments of bus_dmamem_alloc(), plus an >>allocation size. Driver writers would still be encouraged to be smart >>about memory management since contigmalloc() would still be the >>underlying allocator, and contigmalloc rounds all requests up to >>PAGE_SIZE. >> >>Patches to do this are trivial and can be provided on request. If >>I don't hear any arguments against this, I'll commit it this week. >>In case anyone cares, my motivation for this comes from trying >>to convert the usb driver to busdma. > > > NetBSD had added a size parameter to their bus_dmamem_alloc() at some > point. It was merged from thorpej's branch. We should probably do the > same, but I think this will be a harder/tedious task. > > JFYI. > Cheers. > Well, doing it wasn't that hard (though the sparc64 side of things was horribly tedious). What I probably didn't stress enough in my proposal is that this is just a stop-gap to allow the busdma work to go on.
What really needs to happen is for bus_dmamap_load() to learn how to coalesce segments. That way, you define a tag with a max_segs field of 1, pass your data buffer to bus_dmamap_load(), and it does whatever memcpy()'ing or IOMMU magic is necessary to get it to one physical segment (the same applies to having more than one segment, but it is most relevant with 1). This is the thing that's not very easy to do. I don't have the time or foo right now to do a good job at it. If someone is interested in working on this, it would be very nice to have for 5-STABLE. Things that you would want to consider are 1) sparc64 IOMMU, 2) AGP GART, 3) whatever magic exists for alpha and ia64, 4) how to make this work with PAE on i386. Scott To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Jan 29 12:41:39 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EE00637B401 for ; Wed, 29 Jan 2003 12:41:37 -0800 (PST) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1417A43F3F for ; Wed, 29 Jan 2003 12:41:37 -0800 (PST) (envelope-from arr@watson.org) Received: from fledge.watson.org (localhost [127.0.0.1]) by fledge.watson.org (8.12.6/8.12.5) with ESMTP id h0TKfGP3013383; Wed, 29 Jan 2003 15:41:16 -0500 (EST) (envelope-from arr@watson.org) Received: from localhost (arr@localhost) by fledge.watson.org (8.12.6/8.12.6/Submit) with SMTP id h0TKfFh7013380; Wed, 29 Jan 2003 15:41:16 -0500 (EST) X-Authentication-Warning: fledge.watson.org: arr owned process doing -bs Date: Wed, 29 Jan 2003 15:41:15 -0500 (EST) From: "Andrew R.
Reiter" To: Scott Long Cc: arch@FreeBSD.ORG Subject: PAE (was Re: bus_dmamem_alloc_size()) In-Reply-To: <3E383846.20308@btc.adaptec.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG : :If someone is interested in working on this, it would be very nice to have :for 5-STABLE. Things that you would want to consider are 1) sparc64 :IOMMU, 2) AGP GART, 3) whatever magic exists for alpha and ia64, 4) :how to make this work with PAE on i386. Anyone know the status of PAE in fBSD? I heard rumors awhile back that people had patches, or Y! had patches... but has anyone actually coughed them up? Cheers, Andrew -- Andrew R. Reiter arr@watson.org arr@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Jan 29 13: 4:38 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 132A037B401 for ; Wed, 29 Jan 2003 13:04:37 -0800 (PST) Received: from magic.adaptec.com (magic.adaptec.com [208.236.45.80]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8FEB743F79 for ; Wed, 29 Jan 2003 13:04:36 -0800 (PST) (envelope-from scott_long@btc.adaptec.com) Received: from redfish.adaptec.com (redfish.adaptec.com [162.62.50.11]) by magic.adaptec.com (8.11.6+Sun/8.11.6) with ESMTP id h0TL4QD01326; Wed, 29 Jan 2003 13:04:26 -0800 (PST) Received: from btc.btc.adaptec.com (btc.btc.adaptec.com [10.100.0.52]) by redfish.adaptec.com (8.8.8+Sun/8.8.8) with ESMTP id NAA07499; Wed, 29 Jan 2003 13:04:11 -0800 (PST) Received: from btc.adaptec.com (hollin [10.100.253.56]) by btc.btc.adaptec.com (8.8.8+Sun/8.8.8) with ESMTP id OAA25779; Wed, 29 Jan 2003 14:04:08 -0700 (MST) Message-ID: <3E384106.5030107@btc.adaptec.com> 
Date: Wed, 29 Jan 2003 14:00:54 -0700 From: Scott Long User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3a) Gecko/20030127 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "Andrew R. Reiter" Cc: arch@freebsd.org Subject: Re: PAE (was Re: bus_dmamem_alloc_size()) References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Andrew R. Reiter wrote: > > > : > :If someone is interested in working on this, it would be very nice to have > :for 5-STABLE. Things that you would want to consider are 1) sparc64 > :IOMMU, 2) AGP GART, 3) whatever magic exists for alpha and ia64, 4) > :how to make this work with PAE on i386. > > > Anyone know the status of PAE in fBSD? I heard rumors awhile back that > people had patches, or Y! had patches... but has anyone actually coughed > them up? > > Cheers, > Andrew > > -- > Andrew R. Reiter > arr@watson.org > arr@FreeBSD.org I know of an unofficial rumor that a contract is being arranged for a FreeBSD developer to start work on it very soon. 
Scott To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Jan 29 14:49:24 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5DFFE37B401 for ; Wed, 29 Jan 2003 14:49:23 -0800 (PST) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 049F243F43 for ; Wed, 29 Jan 2003 14:49:23 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0131.cvx22-bradley.dialup.earthlink.net ([209.179.198.131] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18e11E-0003Uf-00; Wed, 29 Jan 2003 14:49:21 -0800 Message-ID: <3E385A1E.629EB694@mindspring.com> Date: Wed, 29 Jan 2003 14:47:58 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: "Andrew R. Reiter" Cc: Scott Long , arch@FreeBSD.ORG Subject: Re: PAE (was Re: bus_dmamem_alloc_size()) References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a41cff534f05c92ddc238729f429c31cc9387f7b89c61deb1d350badd9bab72f9c350badd9bab72f9c Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG "Andrew R. Reiter" wrote: > Anyone know the status of PAE in fBSD? I heard rumors awhile back that > people had patches, or Y! had patches... but has anyone actually coughed > them up? Contact Paul Saab. 
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Jan 29 16:40:34 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DC76737B401 for ; Wed, 29 Jan 2003 16:40:32 -0800 (PST) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7DC2743F3F for ; Wed, 29 Jan 2003 16:40:32 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.6/8.12.6) with ESMTP id h0U0eM0i070156; Wed, 29 Jan 2003 16:40:22 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.6/8.12.6/Submit) id h0U0eJQe070153; Wed, 29 Jan 2003 16:40:19 -0800 (PST) Date: Wed, 29 Jan 2003 16:40:19 -0800 (PST) From: Matthew Dillon Message-Id: <200301300040.h0U0eJQe070153@apollo.backplane.com> To: Jeff Roberson Cc: Kris Kennaway , Julian Elischer , Subject: Found another SCHED_ULE bug. References: <20030129041155.Y31308-100000@mail.chesapeake.net> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG I believe I found another bug. While checking interrupt latencies (using ping from another machine), I noticed that cpu-bound processes could cause large ping latencies. I think the problem has to do with how KEF_NEEDRESCHED is set in sched_wakeup(). Specifically, in sched_wakeup() you only set KEF_NEEDRESCHED on curthread, missing the fact that the KSE may have been added to a different cpu's queue. Setting KEF_NEEDRESCHED on curthread for the current cpu will not necessarily reschedule the thread being woken up on the target cpu. 
The result is the target cpu runs through its time slice before scheduling the new thread, causing interrupt latency. This isn't a problem in sched_4bsd because the run queue is consolidated, so setting NEEDRESCHED on any cpu's curthread is sufficient to get cpu to the new (interrupt) thread. I'm not sure how to solve the problem in ULE. I tried calling forward_signal() on the target cpu's curthread, but it panic'd with 'thread not TDS_RUNNING' or something like that. You may have to force the interrupt thread to be scheduled on the current cpu in order to properly reschedule it. -Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Jan 29 17:11: 4 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1D4D437B401 for ; Wed, 29 Jan 2003 17:11:02 -0800 (PST) Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id EDC2A43E4A for ; Wed, 29 Jan 2003 17:11:01 -0800 (PST) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id h0U195J75576; Wed, 29 Jan 2003 20:09:05 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Wed, 29 Jan 2003 20:09:05 -0500 (EST) From: Jeff Roberson To: Matthew Dillon Cc: Kris Kennaway , Julian Elischer , Subject: Re: Found another SCHED_ULE bug. 
In-Reply-To: <200301300040.h0U0eJQe070153@apollo.backplane.com> Message-ID: <20030129200528.Y31308-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 29 Jan 2003, Matthew Dillon wrote: > I think the problem has to do with how KEF_NEEDRESCHED is > set in sched_wakeup(). Specifically, in sched_wakeup() > you only set KEF_NEEDRESCHED on curthread, missing the fact > that the KSE may have been added to a different cpu's queue. > Setting KEF_NEEDRESCHED on curthread for the current cpu will > not necessarily reschedule the thread being woken up on the > target cpu. It looks like this isn't terribly important for anything other than interrupt handlers or maybe 'real-time' threads. The solution for interrupt handlers is simple. If we can preempt we can force it to run on the cpu that the interrupt fired on. I have this in my local tree but it needs to be implemented in a more general way. I will produce a patch in the next few days. If we want to solve it for things other than interrupt handlers we'll have to peek at the currently executing thread on the cpu that we're inserting the thread onto. This is sort of gross. > > I'm not sure how to solve the problem in ULE. I tried calling > forward_signal() on the target cpu's curthread, but it > panic'd with 'thread not TDS_RUNNING' or something like that. > You may have to force the interrupt thread to be scheduled on > the current cpu in order to properly reschedule it. Thanks for the bug report!
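For anyone trying to picture the failure mode, here is a toy userland model of per-cpu run queues and a need-resched flag. It is only a sketch under loud assumptions: struct sk_cpu, the sk_* functions and the "lower number is more urgent" priority convention are invented for this example and are not the sched_ule code, and a real fix would also have to get the remote cpu to notice the flag, which the model leaves out.

```c
#include <assert.h>
#include <string.h>

#define SK_NCPU 2	/* assumed cpu count for the model */

/* Hypothetical per-cpu state; not the kernel's data structures. */
struct sk_cpu {
	int cur_prio;		/* running thread's priority, lower = more urgent */
	int need_resched;	/* stand-in for KEF_NEEDRESCHED on that cpu's curthread */
	int nqueued;		/* threads waiting on this cpu's run queue */
};

static struct sk_cpu sk_cpus[SK_NCPU];

/* Reset the model: every cpu runs a thread of priority 'prio'. */
void
sk_reset(int prio)
{
	int i;

	memset(sk_cpus, 0, sizeof(sk_cpus));
	for (i = 0; i < SK_NCPU; i++)
		sk_cpus[i].cur_prio = prio;
}

/*
 * Behaviour described in the report: the woken thread is queued on
 * 'target', but the flag is set on the cpu doing the wakeup.
 */
void
sk_wakeup_flag_current(int curcpu, int target, int prio)
{
	assert(curcpu >= 0 && curcpu < SK_NCPU);
	assert(target >= 0 && target < SK_NCPU);
	sk_cpus[target].nqueued++;
	if (prio < sk_cpus[curcpu].cur_prio)
		sk_cpus[curcpu].need_resched = 1;
}

/*
 * Alternative: peek at what the target cpu is running and flag
 * that cpu instead.
 */
void
sk_wakeup_flag_target(int target, int prio)
{
	assert(target >= 0 && target < SK_NCPU);
	sk_cpus[target].nqueued++;
	if (prio < sk_cpus[target].cur_prio)
		sk_cpus[target].need_resched = 1;
}
```

In the first variant a wakeup issued on cpu 0 for a thread queued on cpu 1 leaves cpu 1's flag clear, so in this model cpu 1 would keep running its current thread until something else forces a reschedule. The second variant flags the cpu that actually holds the new thread.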
Cheers, Jeff To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Jan 29 17:50: 7 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 56BEF37B401 for ; Wed, 29 Jan 2003 17:50:06 -0800 (PST) Received: from canning.wemm.org (canning.wemm.org [192.203.228.65]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0B26643F43 for ; Wed, 29 Jan 2003 17:50:06 -0800 (PST) (envelope-from peter@wemm.org) Received: from wemm.org (localhost [127.0.0.1]) by canning.wemm.org (Postfix) with ESMTP id E2EFE2A89E; Wed, 29 Jan 2003 17:50:05 -0800 (PST) (envelope-from peter@wemm.org) X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4 To: Terry Lambert Cc: "Andrew R. Reiter" , Scott Long , arch@FreeBSD.ORG Subject: Re: PAE (was Re: bus_dmamem_alloc_size()) In-Reply-To: <3E385A1E.629EB694@mindspring.com> Date: Wed, 29 Jan 2003 17:50:05 -0800 From: Peter Wemm Message-Id: <20030130015005.E2EFE2A89E@canning.wemm.org> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Terry Lambert wrote: > "Andrew R. Reiter" wrote: > > Anyone know the status of PAE in fBSD? I heard rumors awhile back that > > people had patches, or Y! had patches... but has anyone actually coughed > > them up? > > Contact Paul Saab. Nope. 
Cheers, -Peter -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com "All of this is for nothing if we don't go to the stars" - JMS/B5 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Jan 29 18:42: 8 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3FD1D37B401 for ; Wed, 29 Jan 2003 18:42:07 -0800 (PST) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id A7D8643F93 for ; Wed, 29 Jan 2003 18:42:06 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0224.cvx22-bradley.dialup.earthlink.net ([209.179.198.224] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18e4eQ-0000o3-00; Wed, 29 Jan 2003 18:42:03 -0800 Message-ID: <3E3890A6.E7256A55@mindspring.com> Date: Wed, 29 Jan 2003 18:40:38 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Peter Wemm Cc: "Andrew R. Reiter" , Scott Long , arch@FreeBSD.ORG Subject: Re: PAE (was Re: bus_dmamem_alloc_size()) References: <20030130015005.E2EFE2A89E@canning.wemm.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4774102f7e9d2166129523b4f816a8045350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Peter Wemm wrote: > Terry Lambert wrote: > > "Andrew R. Reiter" wrote: > > > Anyone know the status of PAE in fBSD? I heard rumors awhile back that > > > people had patches, or Y! had patches... but has anyone actually coughed > > > them up? 
> > > > Contact Paul Saab. > > Nope. Peter Wemm? 8-) 8-). -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Jan 29 22:48: 7 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A595F37B47B for ; Wed, 29 Jan 2003 22:48:03 -0800 (PST) Received: from HAL9000.homeunix.com (12-233-57-224.client.attbi.com [12.233.57.224]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1CBB443F3F for ; Wed, 29 Jan 2003 22:48:03 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.6/8.12.5) with ESMTP id h0U6m1Nt007504; Wed, 29 Jan 2003 22:48:01 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.6/8.12.5/Submit) id h0U6m024007503; Wed, 29 Jan 2003 22:48:00 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Date: Wed, 29 Jan 2003 22:48:00 -0800 From: David Schultz To: Terry Lambert Cc: "Andrew R. Reiter" , Scott Long , arch@FreeBSD.ORG Subject: Re: PAE (was Re: bus_dmamem_alloc_size()) Message-ID: <20030130064800.GB7258@HAL9000.homeunix.com> Mail-Followup-To: Terry Lambert , "Andrew R. Reiter" , Scott Long , arch@FreeBSD.ORG References: <3E385A1E.629EB694@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3E385A1E.629EB694@mindspring.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Thus spake Terry Lambert : > "Andrew R. Reiter" wrote: > > Anyone know the status of PAE in fBSD? I heard rumors awhile back that > > people had patches, or Y! had patches... but has anyone actually coughed > > them up? > > Contact Paul Saab. 
A year ago, the rumor was that DG was eventually going to do it. Six months ago it was Peter Wemm. And now Paul Saab?! Sheesh. Why don't we just wait another few years so 64-bit machines solve all our problems and we don't have to hack up the VM system? ;-) To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Jan 30 0:30:26 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 71DB537B401 for ; Thu, 30 Jan 2003 00:30:25 -0800 (PST) Received: from canning.wemm.org (canning.wemm.org [192.203.228.65]) by mx1.FreeBSD.org (Postfix) with ESMTP id 29B2E43F43 for ; Thu, 30 Jan 2003 00:30:25 -0800 (PST) (envelope-from peter@wemm.org) Received: from wemm.org (localhost [127.0.0.1]) by canning.wemm.org (Postfix) with ESMTP id 05CC12A8A1; Thu, 30 Jan 2003 00:30:20 -0800 (PST) (envelope-from peter@wemm.org) X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4 To: Terry Lambert Cc: "Andrew R. Reiter" , Scott Long , arch@FreeBSD.ORG Subject: Re: PAE (was Re: bus_dmamem_alloc_size()) In-Reply-To: <3E3890A6.E7256A55@mindspring.com> Date: Thu, 30 Jan 2003 00:30:20 -0800 From: Peter Wemm Message-Id: <20030130083020.05CC12A8A1@canning.wemm.org> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Terry Lambert wrote: > Peter Wemm wrote: > > Terry Lambert wrote: > > > "Andrew R. Reiter" wrote: > > > > Anyone know the status of PAE in fBSD? I heard rumors awhile back that > > > > people had patches, or Y! had patches... but has anyone actually coughe d > > > > them up? > > > > > > Contact Paul Saab. > > > > Nope. > > Peter Wemm? We've been tinkering with it, but frankly, PAE isn't as useful to us as a larger virtual address space would be. eg: ia64 or x86-64. 
There have been various experiments and parts of the problem worked on, but nothing even remotely complete. Most of what I've tinkered with is available in the p4 tree. Reliable sources tell me that this should change soon, but it is not from Y!. I'll leave it up to the folks involved to add more details if they think it appropriate. In the mean time, please stop sending people in our direction. Cheers, -Peter -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com "All of this is for nothing if we don't go to the stars" - JMS/B5 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Jan 30 5: 2:27 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 604D237B401 for ; Thu, 30 Jan 2003 05:02:26 -0800 (PST) Received: from bluejay.mail.pas.earthlink.net (bluejay.mail.pas.earthlink.net [207.217.120.218]) by mx1.FreeBSD.org (Postfix) with ESMTP id BE25843F79 for ; Thu, 30 Jan 2003 05:02:25 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0033.cvx21-bradley.dialup.earthlink.net ([209.179.192.33] helo=mindspring.com) by bluejay.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18eEKa-00044a-00; Thu, 30 Jan 2003 05:02:13 -0800 Message-ID: <3E3921FD.BA247304@mindspring.com> Date: Thu, 30 Jan 2003 05:00:45 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Peter Wemm Cc: "Andrew R. 
Reiter" , Scott Long , arch@FreeBSD.ORG Subject: Re: PAE (was Re: bus_dmamem_alloc_size()) References: <20030130083020.05CC12A8A1@canning.wemm.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4b6c3fe537de370faa72e649002f0f1d0387f7b89c61deb1d350badd9bab72f9c350badd9bab72f9c Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Peter Wemm wrote: > > Peter Wemm? > > We've been tinkering with it, but frankly, PAE isn't as useful to us as a > larger virtual address space would be. eg: ia64 or x86-64. There have been > various experiments and parts of the problem worked on, but nothing even > remotely complete. Most of what I've tinkered with is available in the p4 > tree. > > Reliable sources tell me that this should change soon, but it is not from > Y!. I'll leave it up to the folks involved to add more details if they > think it appropriate. In the mean time, please stop sending people in our > direction. No problem. The reason I don't send them in my own direction is that I personally don't find it useful, either, so I didn't do anything much on it, past a proof-of-concept. The PSE36 actually seems marginally more useful, but in both cases, the resulting code is not nearly as useful as a large address space would be, and the overhead is prohibitively expensive for it to be useful. IMO, unless you use TSS for task switching, and are willing to eat the overhead of not being able to access task memory simultaneously, and jamming all shared memory into a small window of shared address space, the idea is totally screwed (shared libraries are particularly problematic, given where they are mapped). 
For some reason, people think PAE/PSE36 is a substitute for 64bit architectures, when it comes to accessing more memory; in reality, the hardware design is bad enough that it's not really possible to access the memory simultaneously, and that's the most interesting (and useful) application for "more memory". In general, I sent people your way because you guys were talking about playing with it in the context of the one place it might be considered useful (running too many processes on one machine, and eating the associated overhead), since it's not really possible to make the additional physical memory a DMA target on most hardware, even if you have a 64bit PCI controller available for all your network and disk controllers. If anything, I have *more* respect for you guys for abandoning it... 8-) 8-). -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Jan 30 5: 5:52 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 282AF37B401 for ; Thu, 30 Jan 2003 05:05:51 -0800 (PST) Received: from bluejay.mail.pas.earthlink.net (bluejay.mail.pas.earthlink.net [207.217.120.218]) by mx1.FreeBSD.org (Postfix) with ESMTP id B496343E4A for ; Thu, 30 Jan 2003 05:05:50 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0033.cvx21-bradley.dialup.earthlink.net ([209.179.192.33] helo=mindspring.com) by bluejay.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18eEO4-0004P6-00; Thu, 30 Jan 2003 05:05:49 -0800 Message-ID: <3E3922DA.55D9CDE9@mindspring.com> Date: Thu, 30 Jan 2003 05:04:26 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: David Schultz Cc: "Andrew R. 
Reiter" , Scott Long , arch@FreeBSD.ORG Subject: Re: PAE (was Re: bus_dmamem_alloc_size()) References: <3E385A1E.629EB694@mindspring.com> <20030130064800.GB7258@HAL9000.homeunix.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4b6c3fe537de370fad53262b1bcc4eccca7ce0e8f8d31aa3f350badd9bab72f9c350badd9bab72f9c Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG David Schultz wrote: > Thus spake Terry Lambert : > > "Andrew R. Reiter" wrote: > > > Anyone know the status of PAE in fBSD? I heard rumors awhile back that > > > people had patches, or Y! had patches... but has anyone actually coughed > > > them up? > > > > Contact Paul Saab. > > A year ago, the rumor was that DG was eventually going to do it. > Six months ago it was Peter Wemm. And now Paul Saab?! Sheesh. > Why don't we just wait another few years so 64-bit machines solve > all our problems and we don't have to hack up the VM system? ;-) PSE36 is more intelligent than PAE, but neither one are very smart; they were put there by hardware people who thought that what software people wanted was more processes in RAM, not more RAM in individual processes. As such, they are a generally bad idea. Most people asking the question seem to have bought into the hardware people's picture of the universe, without understanding that. 8-(. 
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Jan 30 9:51:33 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6EB0537B401 for ; Thu, 30 Jan 2003 09:51:32 -0800 (PST) Received: from HAL9000.homeunix.com (12-233-57-224.client.attbi.com [12.233.57.224]) by mx1.FreeBSD.org (Postfix) with ESMTP id BE7D343F3F for ; Thu, 30 Jan 2003 09:51:30 -0800 (PST) (envelope-from dschultz@uclink.berkeley.edu) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.6/8.12.5) with ESMTP id h0UHpSNt009925; Thu, 30 Jan 2003 09:51:28 -0800 (PST) (envelope-from dschultz@uclink.berkeley.edu) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.6/8.12.5/Submit) id h0UHpSJ5009924; Thu, 30 Jan 2003 09:51:28 -0800 (PST) (envelope-from dschultz@uclink.berkeley.edu) Date: Thu, 30 Jan 2003 09:51:28 -0800 From: David Schultz To: Terry Lambert Cc: "Andrew R. Reiter" , Scott Long , arch@FreeBSD.ORG Subject: Re: PAE (was Re: bus_dmamem_alloc_size()) Message-ID: <20030130175128.GA9891@HAL9000.homeunix.com> Mail-Followup-To: Terry Lambert , "Andrew R. Reiter" , Scott Long , arch@FreeBSD.ORG References: <3E385A1E.629EB694@mindspring.com> <20030130064800.GB7258@HAL9000.homeunix.com> <3E3922DA.55D9CDE9@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3E3922DA.55D9CDE9@mindspring.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Thus spake Terry Lambert : > David Schultz wrote: > > Thus spake Terry Lambert : > > > "Andrew R. Reiter" wrote: > > > > Anyone know the status of PAE in fBSD? I heard rumors awhile back that > > > > people had patches, or Y! had patches... 
but has anyone actually coughed > > > > them up? > > > > > > Contact Paul Saab. > > > > A year ago, the rumor was that DG was eventually going to do it. > > Six months ago it was Peter Wemm. And now Paul Saab?! Sheesh. > > Why don't we just wait another few years so 64-bit machines solve > > all our problems and we don't have to hack up the VM system? ;-) > > PSE36 is more intelligent than PAE, but neither one are very smart; > they were put there by hardware people who thought that what software > people wanted was more processes in RAM, not more RAM in individual > processes. As such, they are a generally bad idea. Most people > asking the question seem to have bought into the hardware people's > picture of the universe, without understanding that. 8-(. More specifically, they are the same people who brought us bank switching at least twice in the past, and lo and behold it still isn't a very good idea. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Jan 30 11: 1:20 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E290937B401 for ; Thu, 30 Jan 2003 11:01:18 -0800 (PST) Received: from rwcrmhc53.attbi.com (rwcrmhc53.attbi.com [204.127.198.39]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7CBAC43F3F for ; Thu, 30 Jan 2003 11:01:18 -0800 (PST) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org (12-232-168-4.client.attbi.com[12.232.168.4]) by rwcrmhc53.attbi.com (rwcrmhc53) with ESMTP id <20030130190117053001t9s5e>; Thu, 30 Jan 2003 19:01:18 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id LAA36023; Thu, 30 Jan 2003 11:01:16 -0800 (PST) Date: Thu, 30 Jan 2003 11:01:15 -0800 (PST) From: Julian Elischer To: David Schultz Cc: Terry Lambert , "Andrew R. 
Reiter" , Scott Long , arch@FreeBSD.ORG Subject: Re: PAE (was Re: bus_dmamem_alloc_size()) In-Reply-To: <20030130175128.GA9891@HAL9000.homeunix.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Thu, 30 Jan 2003, David Schultz wrote: > Thus spake Terry Lambert : > > David Schultz wrote: > > > Thus spake Terry Lambert : > > > > "Andrew R. Reiter" wrote: > > > > > Anyone know the status of PAE in fBSD? I heard rumors awhile back that > > > > > people had patches, or Y! had patches... but has anyone actually coughed > > > > > them up? > > > > > > > > Contact Paul Saab. > > > > > > A year ago, the rumor was that DG was eventually going to do it. > > > Six months ago it was Peter Wemm. And now Paul Saab?! Sheesh. > > > Why don't we just wait another few years so 64-bit machines solve > > > all our problems and we don't have to hack up the VM system? ;-) > > > > PSE36 is more intelligent than PAE, but neither one are very smart; > > they were put there by hardware people who thought that what software > > people wanted was more processes in RAM, not more RAM in individual > > processes. As such, they are a generally bad idea. Most people > > asking the question seem to have bought into the hardware people's > > picture of the universe, without understanding that. 8-(. > > More specifically, they are the same people who brought us bank > switching at least twice in the past, and lo and behold it still > isn't a very good idea. The reason for PAE is simple. Disk caches need not be in mapped memory. Physical memory will do. If you want to cache more than 4GB, then PAE is an effective answer. (Assuming I have my TLAs the right way around..) 
> > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-arch" in the body of the message > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Jan 30 11: 9:33 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5959637B405 for ; Thu, 30 Jan 2003 11:09:31 -0800 (PST) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id C13E443F85 for ; Thu, 30 Jan 2003 11:09:29 -0800 (PST) (envelope-from arr@watson.org) Received: from fledge.watson.org (localhost [127.0.0.1]) by fledge.watson.org (8.12.6/8.12.5) with ESMTP id h0UJ9CP3027415; Thu, 30 Jan 2003 14:09:12 -0500 (EST) (envelope-from arr@watson.org) Received: from localhost (arr@localhost) by fledge.watson.org (8.12.6/8.12.6/Submit) with SMTP id h0UJ9CST027412; Thu, 30 Jan 2003 14:09:12 -0500 (EST) X-Authentication-Warning: fledge.watson.org: arr owned process doing -bs Date: Thu, 30 Jan 2003 14:09:11 -0500 (EST) From: "Andrew R. Reiter" To: Julian Elischer Cc: David Schultz , Terry Lambert , Scott Long , arch@FreeBSD.ORG Subject: Re: PAE (was Re: bus_dmamem_alloc_size()) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Thu, 30 Jan 2003, Julian Elischer wrote: : : :On Thu, 30 Jan 2003, David Schultz wrote: : :> Thus spake Terry Lambert : :> > David Schultz wrote: :> > > Thus spake Terry Lambert : :> > > > "Andrew R. Reiter" wrote: :> > > > > Anyone know the status of PAE in fBSD? I heard rumors awhile back that :> > > > > people had patches, or Y! had patches... but has anyone actually coughed :> > > > > them up? 
:> > > > :> > > > Contact Paul Saab. :> > > :> > > A year ago, the rumor was that DG was eventually going to do it. :> > > Six months ago it was Peter Wemm. And now Paul Saab?! Sheesh. :> > > Why don't we just wait another few years so 64-bit machines solve :> > > all our problems and we don't have to hack up the VM system? ;-) :> > :> > PSE36 is more intelligent than PAE, but neither one are very smart; :> > they were put there by hardware people who thought that what software :> > people wanted was more processes in RAM, not more RAM in individual :> > processes. As such, they are a generally bad idea. Most people :> > asking the question seem to have bought into the hardware people's :> > picture of the universe, without understanding that. 8-(. :> :> More specifically, they are the same people who brought us bank :> switching at least twice in the past, and lo and behold it still :> isn't a very good idea. : :The reason for PAE is simple. : :Disk caches need not be in mapped memory. Physical memory will do. :If you want to cache more than 4GB, then PAE is an effective answer. : :(Assuming I have my TLAs the right way around..) : : Ya, well Im glad you brought that up, b/c aside from the anti-PAE rants that have been coming across (which are of ZERO USE -- THX FOR THAT), I do believe there are uses for it. I am glad to hear that someone is on it :) Thanks to them and those who organized the project for it. Cheers, Andrew -- Andrew R. 
Reiter arr@watson.org arr@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Jan 30 15:30:31 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 95A4437B401 for ; Thu, 30 Jan 2003 15:30:29 -0800 (PST) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id 005F243F3F for ; Thu, 30 Jan 2003 15:30:29 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0073.cvx22-bradley.dialup.earthlink.net ([209.179.198.73] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18eO8T-0004ZZ-00; Thu, 30 Jan 2003 15:30:22 -0800 Message-ID: <3E39B52E.E46AF9EA@mindspring.com> Date: Thu, 30 Jan 2003 15:28:46 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Julian Elischer Cc: David Schultz , "Andrew R. Reiter" , Scott Long , arch@FreeBSD.ORG Subject: Re: PAE (was Re: bus_dmamem_alloc_size()) References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4697461e3d96eabd8516879fd0c1639933ca473d225a0f487350badd9bab72f9c350badd9bab72f9c Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Julian Elischer wrote: > The reason for PAE is simple. > > Disk caches need not be in mapped memory. Physical memory will do. > If you want to cache more than 4GB, then PAE is an effective answer. > > (Assuming I have my TLAs the right way around..) Using the memory by declaring a small copy window that's accessed via PAE in the kernel, and not really supporting PAE at all, can make this work... 
BUT... it will only work with 64 bit disk controllers, on motherboards with chipsets that support the full 64 bit DMA address path to memory. Many nominally "64 bit" systems are 64 bit only in the data path, not in addressing, and it is nearly impossible to determine if a given card supports it, or not, based on attribution in the driver device control block, since most of the drivers in FreeBSD are not written to capture this information, or make it available.

In addition, you cannot use it for mbufs for scatter/gather DMA, without spreading the bank selection code throughout the kernel, and accessing the mbufs through a window you remap, for pullup into 32 bit processes from the "36 bit" address space, for received mbufs. You *might* be able to deal with this for writes to the wire, but you are talking about adding copies to get the data there, or talking about some serious stack modifications to define a "PAE" or "PSE36" external mbuf type.

The access penalty for memory, because of clock multipliers that are way in the hell up there (133MHz memory on 2.1GHz systems), means that bank selection is very expensive relative to just doing bus transfers.

It *may* be useful for something like an NFS server, but it's *not* useful for a proxy cache, and it's not useful for most other applications that need shared access to the same data, rather than their own copies of data. And it doesn't help with databases with more than a single access session, unless both sessions have a high locality of reference with each other. For something like a big Oracle server at a credit processing center, you aren't ever going to get the necessary locality.

What this boils down to is stall barriers, waiting for the CPU to process the data in one mapped window, so that it can then remap the window and reprocess the data there, instead, if you are doing any work on the data, whatsoever.
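The remap-and-stall pattern described above can be made concrete with a toy model. This is a minimal sketch, not FreeBSD code: it assumes a single remappable window onto a larger store, and the names `WINDOW_SIZE` and `count_remaps` are hypothetical, chosen only for illustration. It counts how often the window has to be re-pointed for a given access order.

```python
# Toy model of one remappable "window" onto memory that cannot all be
# mapped at once. Sizes and names are illustrative assumptions only.

WINDOW_SIZE = 4  # how many units of the large store are visible at a time


def count_remaps(addresses, window_size=WINDOW_SIZE):
    """Count how many times the window must be re-pointed to serve
    the given addresses in the order they are requested."""
    remaps = 0
    current_base = None
    for addr in addresses:
        base = addr - (addr % window_size)  # window-aligned base for addr
        if base != current_base:
            current_base = base
            remaps += 1
    return remaps


# One consumer walking its data in order: good locality.
sequential = list(range(16))

# Two consumers taking turns, each working in a different region:
# no locality of reference between them.
interleaved = [a for pair in zip(range(0, 8), range(64, 72)) for a in pair]

print(count_remaps(sequential))   # 4
print(count_remaps(interleaved))  # 16
```

Both runs touch 16 addresses; the interleaved one remaps on every single access, which is the "no locality between sessions" case the paragraph above is worried about.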
Add to this the fact that you can buy 64bit systems today, down at Fry's, and online from other vendors, and FreeBSD 5.0 runs on these systems in native 64bit mode, there's really no reason to try and cram more than 4G of RAM into a 32 bit system, and then grab another 4 bits worth of RAM through bank selection,and eat the stall and processing barriers. Personally, I don't really see the point; in the best case, the intrusions into the VM and other subsystems will be done in such a way that there are macros to make the code go away; it would be really painful to deal with 2M vs. 4M pages for PSE36, and it would be real painful to have to eat overhead that buys you nothing, and add complexity, for the common case of 32 bit machines with memory less than or equal to what can be addressed with 32 bits. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Jan 30 16:33:30 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3ED8A37B401 for ; Thu, 30 Jan 2003 16:33:29 -0800 (PST) Received: from canning.wemm.org (canning.wemm.org [192.203.228.65]) by mx1.FreeBSD.org (Postfix) with ESMTP id DF8EC43E4A for ; Thu, 30 Jan 2003 16:33:28 -0800 (PST) (envelope-from peter@wemm.org) Received: from wemm.org (localhost [127.0.0.1]) by canning.wemm.org (Postfix) with ESMTP id B42622A8A1; Thu, 30 Jan 2003 16:33:23 -0800 (PST) (envelope-from peter@wemm.org) X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4 To: Terry Lambert Cc: David Schultz , "Andrew R. 
Reiter" , Scott Long , arch@FreeBSD.ORG Subject: Re: PAE (was Re: bus_dmamem_alloc_size()) In-Reply-To: <3E3922DA.55D9CDE9@mindspring.com> Date: Thu, 30 Jan 2003 16:33:23 -0800 From: Peter Wemm Message-Id: <20030131003323.B42622A8A1@canning.wemm.org> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Terry Lambert wrote: > David Schultz wrote: > > Thus spake Terry Lambert : > > > "Andrew R. Reiter" wrote: > > > > Anyone know the status of PAE in fBSD? I heard rumors awhile back that > > > > people had patches, or Y! had patches... but has anyone actually coughed > > > > them up? > > > > > > Contact Paul Saab. > > > > A year ago, the rumor was that DG was eventually going to do it. > > Six months ago it was Peter Wemm. And now Paul Saab?! Sheesh. > > Why don't we just wait another few years so 64-bit machines solve > > all our problems and we don't have to hack up the VM system? ;-) > > PSE36 is more intelligent than PAE, but neither one are very smart; > they were put there by hardware people who thought that what software > people wanted was more processes in RAM, not more RAM in individual > processes. As such, they are a generally bad idea. Most people > asking the question seem to have bought into the hardware people's > picture of the universe, without understanding that. 8-(. I beg to differ about PSE36. Since it still runs on 32 bit page tables, all PSE36 does is enable 4MB mappings that are targeted at above the 4G boundary. It does this by shifting the PTD entries for 4MB pages across by 4 bits in order to squeeze the extra bits in. For some things this would be useful. But remember you can *only* use it in 4MB chunks. Our VM system isn't geared for that and we'd have to come up with an infrastructure to somehow get it to within reach of userland.
Maybe it could be used to provide backing store for things like system V shared memory, but the lack of size granularity would make it interesting. And since its 4MB chunks, forget paging and mmap etc. PSE36 really treats memory above 4G as second-class. On the other hand, PAE treats all memory as "first class" and is useable everywhere. The cost is that you need to do 64 bit idempotent writes to the page tables if you ever want to use it on SMP. But at least it can be used for page cache, generic process data, malloc etc etc. Cheers, -Peter -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com "All of this is for nothing if we don't go to the stars" - JMS/B5 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Jan 30 16:47:26 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 54C6E37B401 for ; Thu, 30 Jan 2003 16:47:25 -0800 (PST) Received: from magic.adaptec.com (magic.adaptec.com [208.236.45.80]) by mx1.FreeBSD.org (Postfix) with ESMTP id A9B4E43F3F for ; Thu, 30 Jan 2003 16:47:24 -0800 (PST) (envelope-from scott_long@btc.adaptec.com) Received: from redfish.adaptec.com (redfish.adaptec.com [162.62.50.11]) by magic.adaptec.com (8.11.6+Sun/8.11.6) with ESMTP id h0V0kaD27152; Thu, 30 Jan 2003 16:46:36 -0800 (PST) Received: from btc.btc.adaptec.com (btc.btc.adaptec.com [10.100.0.52]) by redfish.adaptec.com (8.8.8+Sun/8.8.8) with ESMTP id QAA26240; Thu, 30 Jan 2003 16:46:25 -0800 (PST) Received: from btc.adaptec.com (hollin [10.100.253.56]) by btc.btc.adaptec.com (8.8.8+Sun/8.8.8) with ESMTP id RAA26526; Thu, 30 Jan 2003 17:46:19 -0700 (MST) Message-ID: <3E39C764.3070500@btc.adaptec.com> Date: Thu, 30 Jan 2003 17:46:28 -0700 From: Scott Long User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.2b) Gecko/20021125 X-Accept-Language: en-us, en MIME-Version: 1.0 
To: Terry Lambert Cc: Julian Elischer , David Schultz , "Andrew R. Reiter" , arch@freebsd.org Subject: Re: PAE (was Re: bus_dmamem_alloc_size()) References: <3E39B52E.E46AF9EA@mindspring.com> In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Terry Lambert wrote: > Julian Elischer wrote: > > >The reason for PAE is simple. > > > >Disk caches need not be in mapped memory. Physical memory will do. > >If you want to cache more than 4GB, then PAE is an effective answer. > > > >(Assuming I have my TLAs the right way around..) > > > Using the memory by declaring a small copy window that's accessed > via PAE in the kernel, and not really supporting PAE at all, can > make this work... [...] > > -- Terry This troll is totally unnecessary. Making peripheral devices work with PAE is a matter handled between the device driver and the busdma system. Drivers that cannot pass 64 bit bus addresses to their hardware will have the data bounced by busdma, just like what happens in the ISA world. The whole point of the busdma push that Robert and Maxime started a few months ago is to prepare drivers for the possible coming of PAE. Honestly, though, if you're going to spend the money on a PAE-capable motherboard and all the memory to go along with it, are you really going to put a Realtek NIC and an Advansys SCSI card into it? Also, the PAE work that might happen is not going to affect the vast majority of FreeBSD/i386 users at all; I can only imagine that it will be a config(8) option that will most likely default to 'off'. There is nothing to bikeshed here. Please respect that there are people who need PAE, understand PAE, and will happily accept PAE.
Those who do not need, understand, or accept it can go along with their lives blissfully happy with it turned off. Scott To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Jan 30 16:56:53 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B04BE37B401 for ; Thu, 30 Jan 2003 16:56:52 -0800 (PST) Received: from canning.wemm.org (canning.wemm.org [192.203.228.65]) by mx1.FreeBSD.org (Postfix) with ESMTP id 63ED043F43 for ; Thu, 30 Jan 2003 16:56:52 -0800 (PST) (envelope-from peter@wemm.org) Received: from wemm.org (localhost [127.0.0.1]) by canning.wemm.org (Postfix) with ESMTP id 487652A89E; Thu, 30 Jan 2003 16:56:52 -0800 (PST) (envelope-from peter@wemm.org) X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4 To: Scott Long Cc: Terry Lambert , Julian Elischer , David Schultz , "Andrew R. Reiter" , arch@freebsd.org Subject: Re: PAE (was Re: bus_dmamem_alloc_size()) In-Reply-To: <3E39C764.3070500@btc.adaptec.com> Date: Thu, 30 Jan 2003 16:56:52 -0800 From: Peter Wemm Message-Id: <20030131005652.487652A89E@canning.wemm.org> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Scott Long wrote: > Also, the PAE work that might happen is not going to affect the vast > majority of FreeBSD/i386 users at all; I can only imagine that it will > be a config(8) option that will most likely default to 'off'. > There is nothing to bikeshed here. Please respect that there are people > who need PAE, understand PAE, and will happily accept PAE. Those who do > not need, understand, or accept it can go along with their lives > blissfully happy with it turned off. Especially when you consider that PAE makes things slower. 
Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5

From owner-freebsd-arch Thu Jan 30 16:59:56 2003
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6B1EA37B401 for ; Thu, 30 Jan 2003 16:59:54 -0800 (PST)
Received: from puffin.mail.pas.earthlink.net (puffin.mail.pas.earthlink.net [207.217.120.139]) by mx1.FreeBSD.org (Postfix) with ESMTP id BAEE843E4A for ; Thu, 30 Jan 2003 16:59:53 -0800 (PST) (envelope-from tlambert2@mindspring.com)
Received: from pool0073.cvx22-bradley.dialup.earthlink.net ([209.179.198.73] helo=mindspring.com) by puffin.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18ePWo-0003xc-00; Thu, 30 Jan 2003 16:59:35 -0800
Message-ID: <3E39C9FC.3EAF3345@mindspring.com>
Date: Thu, 30 Jan 2003 16:57:32 -0800
From: Terry Lambert
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Peter Wemm
Cc: David Schultz, "Andrew R. Reiter", Scott Long, arch@FreeBSD.ORG
Subject: Re: PAE (was Re: bus_dmamem_alloc_size())
References: <20030131003323.B42622A8A1@canning.wemm.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a470f523f2e5d47a7e59edcd74d518b3a72601a10902912494350badd9bab72f9c350badd9bab72f9c
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Peter Wemm wrote:
> I beg to differ about PSE36. Since it still runs on 32 bit page tables,
> all PSE36 does is enable 4MB mappings that are targeted at above the 4G
> boundary. It does this by shifting the PTD entries for 4MB pages across
> by 4 bits in order to squeeze the extra bits in.

Actually, they're 2M. They eat a bit there. 8-).

> For some things this would be useful. But remember you can *only* use
> it in 4MB chunks. Our VM system isn't geared for that and we'd have
> to come up with an infrastructure to somehow get it to within reach of
> userland. Maybe it could be used to provide backing store for things
> like system V shared memory, but the lack of size granularity would make
> it interesting. And since it's 4MB chunks, forget paging and mmap etc.
> PSE36 really treats memory above 4G as second-class.

That's pretty much my point: the memory above 4G *is* second class, in that it requires making memory below 4G *unavailable* in order to make itself available, even if you use PAE. The problem is one of simultaneous access by multiple processes, and PSE36 at least allows that, if badly, whereas PAE doesn't.

You're right about the VM system not being geared for it. Going to 2M instead of 4M "PSE pages" would be rather a pain, and that's just one of a half dozen issues.

As to paging of 2M pages, I've actually always thought it needed to be fixed so that large pages could be supported directly via paging. It's not unreasonable to want to page at a ratio of 1:32,768, which is what you would be getting. Compared with 4K pages on a 4G system, where it's a 1:1,048,576 ratio, that's only really a deflation of 32 times in the number of pageable objects mapping an entire address space.

> On the other hand, PAE treats all memory as "first class" and is useable
> everywhere. The cost is that you need to do 64 bit idempotent writes to
> the page tables if you ever want to use it on SMP. But at least it
> can be used for page cache, generic process data, malloc etc etc.

It's usable, but not simultaneously. A really good example here would be buffer cache entries and mbufs, for something like a "sendfile" operation.
If you have an FTP server with this arrangement, and it's loaded enough to actually use the RAM, then you will end up with FTP clients stalling each other at the driver level.

You could *maybe* get around it by making sure that the network cards all did checksum offloading, were all capable of doing 64 bit addressing, and then pre-creating the mbuf list for the entire "wired" region of the file, well in excess of the sendspace limit. I've done that in a product or two (jacked around with ignoring the sendspace limit, and putting huge chains of mbufs on a list).

But the cost of doing that is moving your mbufs to a 64 bit address space, separate from the rest of the kernel. If you don't separate inbound and outbound mbuf pools into 32 bit and 64 bit pools, then you have to face the possibility of dealing with the simultaneous access issue for, say, every mbuf in an mbuf chain for an m_pullup operation. The overhead for several TCP streams where you are doing that would be killer.

I think it's probably better to acknowledge that the memory above 4G *is* second class, and then treat it as an L3 cache, and (maybe) a DMA target for transfers *into* it, but not for transfers out. It gets ugly fast, because of the cross-boundary stalls.

To me, PAE is more like the segments in Windows 3.11; the OS has to be built from the ground up to expect them, and use them properly. 8-(.
-- Terry

From owner-freebsd-arch Thu Jan 30 17:24:29 2003
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1C41D37B401 for ; Thu, 30 Jan 2003 17:24:27 -0800 (PST)
Received: from puffin.mail.pas.earthlink.net (puffin.mail.pas.earthlink.net [207.217.120.139]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7B2AE43E4A for ; Thu, 30 Jan 2003 17:24:26 -0800 (PST) (envelope-from tlambert2@mindspring.com)
Received: from pool0073.cvx22-bradley.dialup.earthlink.net ([209.179.198.73] helo=mindspring.com) by puffin.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18ePuj-00075u-00; Thu, 30 Jan 2003 17:24:18 -0800
Message-ID: <3E39CFC3.9EF4A67E@mindspring.com>
Date: Thu, 30 Jan 2003 17:22:11 -0800
From: Terry Lambert
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Scott Long
Cc: Julian Elischer, David Schultz, "Andrew R. Reiter", arch@freebsd.org
Subject: Re: PAE (was Re: bus_dmamem_alloc_size())
References: <3E39B52E.E46AF9EA@mindspring.com> <3E39C764.3070500@btc.adaptec.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a444670d5480ef25fe788f799d90cd58c7666fa475841a1c7a350badd9bab72f9c350badd9bab72f9c
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Scott Long wrote:
> > Using the memory by declaring a small copy window that's accessed
> > via PAE in the kernel, and not really supporting PAE at all, can
> > make this work...
> > [...]
>
> This troll is totally unnecessary. Making peripheral devices work with
> PAE is a matter handled between the device driver and the busdma system.
> Drivers that cannot pass 64 bit bus addresses to their hardware will
> have the data bounced by busdma, just like what happens in the ISA
> world. The whole point of the busdma push that Robert and Maxime
> started a few months ago is to prepare drivers for the possible coming
> of PAE.

This is a great idea, until you realize that scatter/gather for network cards won't work very well in the context of the mbuf system, as it exists today, unless you are willing to split incoming and outgoing mbufs into two different pools, or you're willing to add a copy operation to everything.

> Honestly, though, if you're going to spend the money on a PAE-capable
> motherboard and all the memory to go along with it, are you really going
> to put a Realtek nic and an Advansys scsi card into it?

I'm going to have whatever the manufacturer put on the motherboard, most likely, which may or may not be 64bit capable. If I'm spending all the money building it up from "to spec" components in the first place, I'm more likely to just buy a 64bit machine, instead. My biggest cost is going to end up going to 3rd party 64 bit capable cards, and RAM, anyway.

> Also, the PAE work that might happen is not going to affect the vast
> majority of FreeBSD/i386 users at all; I can only imagine that it will
> be a config(8) option that will most likely default to 'off'.

This would result in potentially significant duplicate sections of code in the VM system, separated by #ifdef's, if true, unless all the VM references that needed to switch between 32 and 36 bits were macrotized, and certain parts rewritten from scratch. That's always possible, I suppose.

> There is nothing to bikeshed here. Please respect that there are people
> who need PAE, understand PAE, and will happily accept PAE. Those who do
> not need, understand, or accept it can go along with their lives
> blissfully happy with it turned off.
Realize that I've personally built a system with 4G of memory, based on FreeBSD, that could handle 1.6M simultaneous connections, for a proxy caching company. We had a lot of reason to look into PAE, because number of simultaneous connections and number of mbufs available for caching data are inversely proportional (obviously). Using PAE was one potential form of the "add more RAM" approach of throwing resources rather than intelligence at the problem.

The problem with using PAE for this application is that the mbuf chains cannot be simultaneously available in the inbound and outbound space, without copying. Now it doesn't matter whether the inbound space is from a network card, or from a disk controller: if there is host processing that has to take place, then you have to span multiple of these PAE pages simultaneously.

As Peter rightly points out, the regions are large enough to be problematic for paging. Effectively, you have to disassociate the VM and buffer cache, or find some way of supporting paging of much-larger-than-4K units.

I have yet to see someone suggest a real application for PAE that wasn't tantamount to an L3 cache and/or a RAMdisk. It does not increase your UVA or KVA size above the 4G limit: all the pointers your compiler generates are still 32 bits, and you are still limited to 4G.

What *would* have been useful is if the Intel guys had gone 64bit, like the AMD folks did, so that the UVA or KVA or both could be made larger than 4G.

Frankly, the most useful thing that might come out of this is a change to the copyin/copyout/copyinstr/etc. code to separate the UVA and KVA spaces, making them both 4G. At *that* point, it could be useful to make programs larger. But you could have that *without* PAE, and with PAE, you would *still* need to split the VM and buffer cache apart to create a copy boundary for kernel vs. user data.
At least with an explicit coherency requirement, and the code to implement it, we could expect FS stacking to start working like it was designed to work, ten years ago.

-- Terry

From owner-freebsd-arch Fri Jan 31 10:15:40 2003
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A8DEE37B401 for ; Fri, 31 Jan 2003 10:15:38 -0800 (PST)
Received: from web41206.mail.yahoo.com (web41206.mail.yahoo.com [66.218.93.39]) by mx1.FreeBSD.org (Postfix) with SMTP id 3FA7743F75 for ; Fri, 31 Jan 2003 10:15:38 -0800 (PST) (envelope-from gathorpe79@yahoo.com)
Message-ID: <20030131181538.80926.qmail@web41206.mail.yahoo.com>
Received: from [149.99.116.82] by web41206.mail.yahoo.com via HTTP; Fri, 31 Jan 2003 13:15:38 EST
Date: Fri, 31 Jan 2003 13:15:38 -0500 (EST)
From: Gary Thorpe
Subject: Re: PAE (was Re: bus_dmamem_alloc_size())
To: "Andrew R. Reiter", Julian Elischer
Cc: David Schultz, Terry Lambert, Scott Long, arch@FreeBSD.ORG
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

--- "Andrew R. Reiter" wrote:
> On Thu, 30 Jan 2003, Julian Elischer wrote:
> [...]
> :The reason for PAE is simple.
> :
> :Disk caches need not be in mapped memory. Physical memory will do.
> :If you want to cache more than 4GB, then PAE is an effective answer.
> :
> :(Assuming I have my TLAs the right way around..)
> :
> :
>
> Ya, well I'm glad you brought that up, b/c aside from the anti-PAE rants
> that have been coming across (which are of ZERO USE -- THX FOR THAT), I
> do believe there are uses for it.
> I am glad to hear that someone is on
> it :) Thanks to them and those who organized the project for it.
>
> Cheers,
> Andrew
>
> --
> Andrew R. Reiter
> arr@watson.org
> arr@FreeBSD.org

Would this be part of a unified buffer-cache scheme though? If I have been following correctly, this memory cannot be directly mapped into a process's address space (i.e. a process in one "segment" cannot directly access memory in another "segment"), so how would it be useful as a cache? Wouldn't this need lots of data copying as in bounce buffers?

From owner-freebsd-arch Fri Jan 31 10:43:29 2003
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6639837B401 for ; Fri, 31 Jan 2003 10:43:27 -0800 (PST)
Received: from canning.wemm.org (canning.wemm.org [192.203.228.65]) by mx1.FreeBSD.org (Postfix) with ESMTP id 10D4643E4A for ; Fri, 31 Jan 2003 10:43:27 -0800 (PST) (envelope-from peter@wemm.org)
Received: from wemm.org (localhost [127.0.0.1]) by canning.wemm.org (Postfix) with ESMTP id D70A42A89E; Fri, 31 Jan 2003 10:43:26 -0800 (PST) (envelope-from peter@wemm.org)
X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4
To: Gary Thorpe
Cc: "Andrew R. Reiter", Julian Elischer, David Schultz, Terry Lambert, Scott Long, arch@FreeBSD.ORG
Subject: Re: PAE (was Re: bus_dmamem_alloc_size())
In-Reply-To: <20030131181538.80926.qmail@web41206.mail.yahoo.com>
Date: Fri, 31 Jan 2003 10:43:26 -0800
From: Peter Wemm
Message-Id: <20030131184326.D70A42A89E@canning.wemm.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Gary Thorpe wrote:
> --- "Andrew R. Reiter" wrote:
> > On Thu, 30 Jan 2003, Julian Elischer wrote:
> > [...]
> > :The reason for PAE is simple.
> > :
> > :Disk caches need not be in mapped memory. Physical memory will do.
> > :If you want to cache more than 4GB, then PAE is an effective answer.
> > :
> > :(Assuming I have my TLAs the right way around..)
> > :
> > :
> >
> > Ya, well I'm glad you brought that up, b/c aside from the anti-PAE rants
> > that have been coming across (which are of ZERO USE -- THX FOR THAT), I
> > do believe there are uses for it. I am glad to hear that someone is on
> > it :) Thanks to them and those who organized the project for it.
> >
> > Cheers,
> > Andrew
> >
> > --
> > Andrew R. Reiter
> > arr@watson.org
> > arr@FreeBSD.org
>
> Would this be part of a unified buffer-cache scheme though? If I have
> been following correctly, this memory cannot be directly mapped into a
> process's address space (i.e. a process in one "segment" cannot directly
> access memory in another "segment"), so how would it be useful as a
> cache? Wouldn't this need lots of data copying as in bounce buffers?

It's the nasty PSE36 hack that can't be used for this. PAE works fine as cache since it is all available to all processes.
Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5

From owner-freebsd-arch Fri Jan 31 16:49:27 2003
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A362137B401 for ; Fri, 31 Jan 2003 16:49:26 -0800 (PST)
Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3B39343E4A for ; Fri, 31 Jan 2003 16:49:26 -0800 (PST) (envelope-from tlambert2@mindspring.com)
Received: from pool0203.cvx21-bradley.dialup.earthlink.net ([209.179.192.203] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18elq9-0000ZZ-00; Fri, 31 Jan 2003 16:49:02 -0800
Message-ID: <3E3B1928.384F8C42@mindspring.com>
Date: Fri, 31 Jan 2003 16:47:36 -0800
From: Terry Lambert
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Peter Wemm
Cc: Gary Thorpe, "Andrew R. Reiter", Julian Elischer, David Schultz, Scott Long, arch@FreeBSD.ORG
Subject: Re: PAE (was Re: bus_dmamem_alloc_size())
References: <20030131184326.D70A42A89E@canning.wemm.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4fa5be0a4ef094526822fbedd8f675bc7350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Peter Wemm wrote:
> Gary Thorpe wrote:
> > Would this be part of a unified buffer-cache scheme though? If I have
> > been following correctly, this memory cannot be directly mapped into a
> > process's address space (i.e. a process in one "segment" cannot directly
> > access memory in another "segment"), so how would it be useful as a
> > cache? Wouldn't this need lots of data copying as in bounce buffers?
>
> It's the nasty PSE36 hack that can't be used for this. PAE works fine as
> cache since it is all available to all processes.

What Peter said: the thing you don't get is the ability to have more than 4G of KVA + UVA space, unless you split the VM and buffer cache.

Also what Peter said about the performance penalty that comes from PAE, and what I said about the reliability penalty from #ifdef'ing and separately maintaining the code side-by-side.

-- Terry