From owner-freebsd-arch Sun Oct 6 11:58:35 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8619837B401; Sun, 6 Oct 2002 11:58:33 -0700 (PDT) Received: from out017.verizon.net (out017pub.verizon.net [206.46.170.94]) by mx1.FreeBSD.org (Postfix) with ESMTP id B646043E4A; Sun, 6 Oct 2002 11:58:32 -0700 (PDT) (envelope-from res03db2@verizon.net) Received: from verizon.net ([4.47.70.146]) by out017.verizon.net (InterMail vM.5.01.05.09 201-253-122-126-109-20020611) with ESMTP id <20021006185831.CUFU6394.out017.verizon.net@verizon.net>; Sun, 6 Oct 2002 13:58:31 -0500 Received: (from res03db2@localhost) by verizon.net (8.9.3/8.9.3) id LAA29059; Sun, 6 Oct 2002 11:58:16 -0700 (PDT) (envelope-from res03db2) Date: Sun, 6 Oct 2002 11:58:16 -0700 From: Robert Clark To: Terry Lambert Cc: Nate Lawson , David Francheski , freebsd-arch@FreeBSD.ORG, freebsd-smp@FreeBSD.ORG Subject: Re: Running independent kernel instances on dual-Xeon/E7500 system Message-ID: <20021006115816.A28963@darkstar.gte.net> References: <3D9EB0A4.4CD09E20@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline
User-Agent: Mutt/1.2.4i In-Reply-To: <3D9EB0A4.4CD09E20@mindspring.com>; from tlambert2@mindspring.com on Sat, Oct 05, 2002 at 02:28:04AM -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG I've often thought it would be nice to be able to devote one processor to an RT style OS instance that does continuous duty doing "throw away" work updating the display, audio, etc. Using a general purpose CPU for graphics and sound work may not result in the kinds of performance you get with a GPU, but I have to imagine it would have a better chance of encouraging "free" driver development. On the flip side, the OS instance that didn't have anything to do with audio/video could spend more of its time doing network/disk I/O, and more traditional duties. [RC] On Sat, Oct 05, 2002 at 02:28:04AM -0700, Terry Lambert wrote: > Nate Lawson wrote: > > On Fri, 4 Oct 2002, David Francheski wrote: > > > I have a dual-Xeon processor (with E7500 chipset) motherboard. > > > Can anybody tell me what the development effort would be to > > > boot and run two independent copies of the FreeBSD kernel, > > > one on each Xeon processor? By this I mean that an SMP > > > enabled kernel would not be utilized, each kernel would be UP. > > > > > > Regards, > > > David L. Francheski > > > > Not possible without another BIOS, PCI bus, and separate memory -- > > i.e. another PC. > > IPL'ing is not the same as "running". So long as you crafted the > memory image of the second OS and its page tables, etc., using the > first processor, there should be no problem running a second copy > of an OS on an AP, as a result of a START IPI from the BP, after > the code is crafted. Thus there is no need for a separate BIOS. 
> > -- > > I've personally considered pursuing the ability to run code separately, > though with the same 4G address space, separated, so as to permit > running a debugger against a "crashed" FreeBSD "system" running on an > AP, doing the debugging from the BP, as a hosted system. The cost > in labor would be 2-3 months of continuous work, I think... that is > the estimate I arrived at, when I considered the project previously. > Doing this certainly beats the cost of buying an ICE to get similar > capability. > > > It would be interesting to see what other people have to say on this, > other than "can't be done" (not to pick on you in particular, here; > this is the knee-jerk reaction many people have to things like this). > > -- Terry > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-smp" in the body of the message To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Oct 6 12:37:42 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4D9BA37B401 for ; Sun, 6 Oct 2002 12:37:41 -0700 (PDT) Received: from trantor.utsl.org (cvg-65-27-234-246.cinci.rr.com [65.27.234.246]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8F15E43E4A for ; Sun, 6 Oct 2002 12:37:40 -0700 (PDT) (envelope-from utsl@quic.net) Received: from hotrod.utsl.org ([10.10.57.3] helo=quic.net) by trantor.utsl.org with esmtp (Exim 3.35 #1 (Debian)) id 17yHDU-0007NB-00; Sun, 06 Oct 2002 15:37:28 -0400 Message-ID: <3DA090A2.1010602@quic.net> Date: Sun, 06 Oct 2002 15:36:02 -0400 From: Nathan Hawkins User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020615 Debian/1.0.0-3 MIME-Version: 1.0 To: Antony T Curtis Cc: freebsd-arch@FreeBSD.ORG Subject: Re: Running independent kernel instances on dual-Xeon/E7500 system References: <3D9EB0A4.4CD09E20@mindspring.com> 
<3D9EF6E9.9040700@ntlworld.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Antony T Curtis wrote: > I'm interested in pursuing the idea of creating some form of > partitioning within one machine.... Kind of like wrapping up as many > global variables as possible and sharing the memory between them. > > Things like netgraph to be used to allow each 'partition' to have its > own network interface and for communication between them. Admittedly, > I'm no expert on operating systems but I have been trying to study the > FreeBSD sources to see if I can do some crude implementation, partly to > satisfy my own curiosity. There are a lot of ways to do this sort of thing. Most software implementations seem to fall into one of the following categories: 1. Virtual machine. (Machine, not processor emulation.) VM is the classic implementation, but there's also VMware on PC's, and MOL on PPC. Probably a few others around that I'm not aware of. 2. Run one OS on top of another. Microkernel systems typically do this to some degree, like Lites on Mach. More recently there is User-Mode Linux, which seems like an interesting approach. 3. Extend the OS. In FreeBSD's case, jails provide a limited kind of partitioning. Some of the commercial Unices have add-ons, like Solaris Resource Manager, that provide a different sort of partitioning. Each approach has advantages and disadvantages, depending on what you're trying to accomplish. 
---Nathan To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Oct 6 17:16:30 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C434837B401; Sun, 6 Oct 2002 17:16:29 -0700 (PDT) Received: from flamingo.mail.pas.earthlink.net (flamingo.mail.pas.earthlink.net [207.217.120.232]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5F41743E97; Sun, 6 Oct 2002 17:16:29 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0162.cvx22-bradley.dialup.earthlink.net ([209.179.198.162] helo=mindspring.com) by flamingo.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17yLZS-0004wa-00; Sun, 06 Oct 2002 17:16:26 -0700 Message-ID: <3DA0D20D.C47E4EF8@mindspring.com> Date: Sun, 06 Oct 2002 17:15:09 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Antony T Curtis Cc: Nate Lawson , David Francheski , freebsd-arch@FreeBSD.ORG, freebsd-smp@FreeBSD.org Subject: Re: Running independent kernel instances on dual-Xeon/E7500 system References: <3D9EB0A4.4CD09E20@mindspring.com> <3D9EF6E9.9040700@ntlworld.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Nathan Hawkins wrote: > Each approach has advantages and disadvantages, depending on what you're > trying to accomplish. I think in this case, it's that benchmarked performance actually goes down in FreeBSD 4.6 when you run SMP, as opposed to running UP, and FreeBSD -current is even worse, even if you disable the debugging that's on by default. 
Tools like "netperf" aren't really capable of taking advantage of additional processors, but they are excellent at showing the incremental slowdown that results from lack of CPU affinity (if applicable), as well as any additional locking overhead (if applicable). Tools that run against web servers, where the web server has been written to run with multiple processes (or multithreaded, if the threads system on the platform is SMP scalable) show less improvement than expected; e.g.: http://www.softwareqatest.com/qatweb1.html#LOAD ...but they will at least show some small improvement with SMP. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Oct 6 19:48:46 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 983EE37B404 for ; Sun, 6 Oct 2002 19:48:43 -0700 (PDT) Received: from rootlabs.com (root.org [67.118.192.226]) by mx1.FreeBSD.org (Postfix) with SMTP id 0D5FC43E6A for ; Sun, 6 Oct 2002 19:48:39 -0700 (PDT) (envelope-from nate@rootlabs.com) Received: (qmail 5551 invoked by uid 1000); 7 Oct 2002 02:48:40 -0000 Date: Sun, 6 Oct 2002 19:48:40 -0700 (PDT) From: Nate Lawson To: freebsd-arch@FreeBSD.ORG Cc: freebsd-smp@FreeBSD.ORG Subject: Re: Running independent kernel instances on dual-Xeon/E7500 system In-Reply-To: <20021006115816.A28963@darkstar.gte.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Sorry for the unhelpful first posting. I sent a more detailed letter via private mail, recommending he look into the exokernel papers. My dismissiveness was due to anticipating the direction this was going, which is nicely shown by the response below. 
In short, dedicated processors for IO were used in the minicomputer days but are wasteful nowadays when you have lightweight interrupts and/or polling when appropriate. If your scheduler sucks, fix it. If a device needs extra processing equivalent to another N Ghz CPU, the vendor will add silicon. The "S" in SMP is for symmetric, lest we forget. -Nate On Sun, 6 Oct 2002, Robert Clark wrote: > I've often thought it would be nice to be able to devote > one processor to an RT style OS instance that does continuous > duty doing "throw away" work updating the display, audio, > etc. > > Using a general purpose CPU for graphics and sound work > may not result in the kinds of performance you get with > a GPU, but I have to imagine it would have a better > chance of encouraging "free" driver development. > > On the flip side, the OS instance that didn't have > anything to do with audio/video could spend more of > its time doing network/disk I/O, and more traditional > duties. > > [RC] > > On Sat, Oct 05, 2002 at 02:28:04AM -0700, Terry Lambert wrote: > > Nate Lawson wrote: > > > On Fri, 4 Oct 2002, David Francheski wrote: > > > > I have a dual-Xeon processor (with E7500 chipset) motherboard. > > > > Can anybody tell me what the development effort would be to > > > > boot and run two independent copies of the FreeBSD kernel, > > > > one on each Xeon processor? By this I mean that an SMP > > > > enabled kernel would not be utilized, each kernel would be UP. > > > > > > > > Regards, > > > > David L. Francheski > > > > > > Not possible without another BIOS, PCI bus, and separate memory -- > > > i.e. another PC. > > > > IPL'ing is not the same as "running". So long as you crafted the > > memory image of the second OS and its page tables, etc., using the > > first processor, there should be no problem running a second copy > > of an OS on an AP, as a result of a START IPI from the BP, after > > the code is crafted. Thus there is no need for a separate BIOS. 
> > > > > > > -- > > > > I've personally considered pursuing the ability to run code separately, > > though with the same 4G address space, separated, so as to permit > > running a debugger against a "crashed" FreeBSD "system" running on an > > AP, doing the debugging from the BP, as a hosted system. The cost > > in labor would be 2-3 months of continuous work, I think... that is > > the estimate I arrived at, when I considered the project previously. > > Doing this certainly beats the cost of buying an ICE to get similar > > capability. > > > > > > It would be interesting to see what other people have to say on this, > > other than "can't be done" (not to pick on you in particular, here; > > this is the knee-jerk reaction many people have to things like this). > > > > -- Terry > > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > > with "unsubscribe freebsd-smp" in the body of the message > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-arch" in the body of the message > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Oct 6 21:12: 5 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D151037B408; Sun, 6 Oct 2002 21:12:03 -0700 (PDT) Received: from gull.mail.pas.earthlink.net (gull.mail.pas.earthlink.net [207.217.120.84]) by mx1.FreeBSD.org (Postfix) with ESMTP id 70BA043E81; Sun, 6 Oct 2002 21:12:03 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0276.cvx22-bradley.dialup.earthlink.net ([209.179.199.21] helo=mindspring.com) by gull.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17yPFR-0007ci-00; Sun, 06 Oct 2002 21:12:02 -0700 Message-ID: <3DA10949.218488B9@mindspring.com> Date: Sun, 06 Oct 2002 21:10:49 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 
To: Nate Lawson Cc: freebsd-arch@FreeBSD.ORG, freebsd-smp@FreeBSD.ORG Subject: Re: Running independent kernel instances on dual-Xeon/E7500 system References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Nate Lawson wrote: > My dismissiveness was due to anticipating the direction this was going, > which is nicely shown by the response below. In short, dedicated > processors for IO were used in the minicomputer days but are wasteful > nowadays when you have lightweight interrupts and/or polling when > appropriate. Yet, I keep running into employers who want to pay people to do exactly that, particularly for offloading network processing to one processor, and running applications on the other. And then there's the Tigon II firmware rewrite for FreeBSD, to offload interrupt and copy processing. And CGD's work for Sibytes (NetBSD 64bit MIPS-based network coprocessor board) doing just that got the company sold to Broadcom for what, $700M? 8-). > If your scheduler sucks, fix it. If a device needs extra processing > equivalent to another N Ghz CPU, the vendor will add silicon. The "S" in > SMP is for symmetric, lest we forget. People keep saying that, and then keep not running interrupts in virtual wire mode, so that their delivery is "S" as in "symmetric"... ;^). Actually, NT proved that wiring particular interrupts to particular processors was the way to go -- that was one of the things they did to beat the Linux numbers in both the Netcraft and Ziff-Davis benchmarks... perfect symmetry isn't all that it's promised. 
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Oct 6 21:38:32 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 47F8E37B401; Sun, 6 Oct 2002 21:38:31 -0700 (PDT) Received: from ebb.errno.com (ebb.errno.com [66.127.85.87]) by mx1.FreeBSD.org (Postfix) with ESMTP id D6EF243E7B; Sun, 6 Oct 2002 21:38:30 -0700 (PDT) (envelope-from sam@errno.com) Received: from melange (melange.errno.com [66.127.85.82]) (authenticated bits=0) by ebb.errno.com (8.12.5/8.12.1) with ESMTP id g974cR1H000716 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Sun, 6 Oct 2002 21:38:30 -0700 (PDT) (envelope-from sam@errno.com) X-Authentication-Warning: ebb.errno.com: Host melange.errno.com [66.127.85.82] claimed to be melange Message-ID: <13e901c26dbb$63059f60$52557f42@errno.com> From: "Sam Leffler" To: , Subject: CFR: m_tag patch Date: Sun, 6 Oct 2002 21:38:26 -0700 Organization: Errno Consulting MIME-Version: 1.0 Content-Type: text/plain; charset="Windows-1252" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4807.1700 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG http://www.freebsd.org/~sam/mtag.patch has changes to -current to replace the "aux mbuf" with a more general mechanism borrowed from OpenBSD. Rather than dangling mbufs off a packet when auxiliary information needs to be associated with a packet, a list of variable-size struct m_tag's is kept. This is better because it: 1. Eliminates the use of mbufs as a general-purpose memory allocator. 2. Avoids confusing and problematic code (e.g. 
ipsec stuffs multiple data structures into an mbuf and often consults m_len to determine what might/should be present). 3. Means arbitrary size data can be stored (w/ mbufs you get what fits in a fixed-size mbuf or--if it were implemented--in a cluster). 4. Removes a recursive dependency that complicates locking in the mbuf code. The patch actually contains three sets of changes that are intertwined: 1. Remove use of aux mbufs and replace with m_tag's. 2. Add an additional parameter to ip_output and ip6_output that was previously passed through an aux mbuf. 3. Rename luigi's m_tag_id hack #define to avoid name conflict with the m_tag definition. I've been running something like this patch for ~9 months. The patch actually eliminates more code than it adds and is likely to improve performance (haven't measured). There should be no functional changes after this patch is applied. Timely feedback is desired as I'd like to commit these changes in time for DP2. Sam To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Oct 6 21:56:30 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 873A137B401 for ; Sun, 6 Oct 2002 21:56:28 -0700 (PDT) Received: from rootlabs.com (root.org [67.118.192.226]) by mx1.FreeBSD.org (Postfix) with SMTP id 2489F43E6E for ; Sun, 6 Oct 2002 21:56:24 -0700 (PDT) (envelope-from nate@rootlabs.com) Received: (qmail 5885 invoked by uid 1000); 7 Oct 2002 04:56:26 -0000 Date: Sun, 6 Oct 2002 21:56:26 -0700 (PDT) From: Nate Lawson To: freebsd-arch@FreeBSD.ORG Cc: freebsd-smp@FreeBSD.ORG Subject: Re: Running independent kernel instances on dual-Xeon/E7500 system In-Reply-To: <3DA10949.218488B9@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) 
List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Sun, 6 Oct 2002, Terry Lambert wrote: > Nate Lawson wrote: > > My dismissiveness was due to anticipating the direction this was going, > > which is nicely shown by the response below. In short, dedicated > > processors for IO were used in the minicomputer days but are wasteful > > nowadays when you have lightweight interrupts and/or polling when > > appropriate. > > Yet, I keep running into employers who want to pay people to do > exactly that, particularly for offloading network processing to > one processor, and running applications on the other. Been there, hence the touchiness. > And then there's the Tigon II firmware rewrite for FreeBSD, to > offload interrupt and copy processing. And CGD's work for Sibytes > (NetBSD 64bit MIPS-based network coprocessor board) doing just that > got the company sold to Broadcom for what, $700M? > > 8-). I agree that when there are spare cycles available on the _device_'s processor it should be doing more work. But that's very different from dedicating a processor that could otherwise be doing useful work (given a well-written SMP-aware OS of course). > > If your scheduler sucks, fix it. If a device needs extra processing > > equivalent to another N Ghz CPU, the vendor will add silicon. The "S" in > > SMP is for symmetric, lest we forget. > > People keep saying that, and then keep not running interrupts in > virtual wire mode, so that their delivery is "S" as in "symmetric"... > ;^). > > Actually, NT proved that wiring particular interrupts to particular > processors was the way to go -- that was one of the things they did > to beat the Linux numbers in both the Netcraft and Ziff-Davis > benchmarks... perfect symmetry isn't all that it's promised. > > -- Terry I'm not sure that breaks my definition of symmetric since that sounds like they were just setting the processor affinity per interrupt. 
;-) I agree that for a given fixed workload profile, it may make sense to build a single-purpose device out of off-the-shelf parts. But most of the time, the decision to go that way is the result of a knee-jerk reaction that "I'm a Real Programmer and I want bare metal because it's faster". I believe that mostly results in slower systems because the workload always changes out from under the designer's assumptions. Hence we get more real benefits from versatile things like branch prediction, register scoreboarding, etc. that make their decisions at runtime instead of chalkboard time. Here's an archived email by VJ that seems amazingly relevant, even today: http://www.root.org/ip-development/news/vanj.88jul20.txt -Nate To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Oct 6 22: 2: 3 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6E17237B49B for ; Sun, 6 Oct 2002 22:01:59 -0700 (PDT) Received: from rootlabs.com (root.org [67.118.192.226]) by mx1.FreeBSD.org (Postfix) with SMTP id C279743E65 for ; Sun, 6 Oct 2002 22:01:58 -0700 (PDT) (envelope-from nate@rootlabs.com) Received: (qmail 5907 invoked by uid 1000); 7 Oct 2002 05:02:01 -0000 Date: Sun, 6 Oct 2002 22:02:01 -0700 (PDT) From: Nate Lawson To: freebsd-arch@freebsd.org Cc: freebsd-net@freebsd.org Subject: Re: CFR: m_tag patch In-Reply-To: <13e901c26dbb$63059f60$52557f42@errno.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Sun, 6 Oct 2002, Sam Leffler wrote: > http://www.freebsd.org/~sam/mtag.patch > > has changes to -current to replace the "aux mbuf" with a more general > mechanism borrowed from openbsd. 
Rather than dangling mbuf's off a packet > when auxiliary information needs to be associated with a packet a list of > variable-size struct m_tag's are kept. This is better because it: > > 1. Eliminates the use of mbufs as a general-purpose memory allocator. > 2. Avoids confusing and problematic code (e.g. ipsec stuffs multiple data > structures into an mbuf and often consults m_len to determine what > might/should be present). > 3. Means arbitrary size data can be stored (w/ mbufs you get what fits in a > fixed-size mbuf or--if it were implemented--in a cluster). > 4. Removes a recursive dependency that complicates locking in the mbuf code. > > The patch actually contains three sets of changes that are intertwined: > > 1. Remove use of aux mbufs and replace with m_tag's. > 2. Add an additional parameter to ip_output and ip6_output that was > previously passed through an aux mbuf. > 3. Rename luigi's m_tag_id hack #define to avoid name conflict with the > m_tag definition. > > I've been running something like this patch for ~9 months. The patch > actually eliminates more code than it adds and is likely to improve > performance (haven't measured). There should be no functional changes after > this patch is applied. > > Timely feedback is desired as I'd like to commit these changes in time for > DP2. > > Sam I'm not familiar with that code so only a few questions: 1. Is ordering important or is an SLIST sufficient for all cases? 2. Is it possible to attach the aux argument to the mbuf chain instead of adding it as a new parameter to ip_output? 
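A rough illustration of the tag list described in the quoted proposal, and of the SLIST question above: the sketch below hangs variable-size tags off a packet header with the `SLIST` macros from `<sys/queue.h>`. It is a user-space model under stated assumptions, not the code in mtag.patch; `struct tag`, `struct packet`, `tag_alloc`, `tag_prepend`, `tag_find` and `tag_free_all` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical names throughout; this only models the idea in user space. */
struct tag {
    SLIST_ENTRY(tag) link;   /* next tag attached to the same packet */
    int type;                /* identifies what the payload means */
    size_t len;              /* payload length in bytes */
    unsigned char data[];    /* variable-size payload (C99 flexible array) */
};

SLIST_HEAD(tag_list, tag);

struct packet {
    struct tag_list tags;    /* stands in for a list head in the packet header */
};

void packet_init(struct packet *p)
{
    SLIST_INIT(&p->tags);
}

/* Allocate a tag with room for len payload bytes and copy the payload in. */
struct tag *tag_alloc(int type, const void *src, size_t len)
{
    struct tag *t = malloc(sizeof(*t) + len);
    if (t == NULL)
        return NULL;
    t->type = type;
    t->len = len;
    if (len > 0)
        memcpy(t->data, src, len);
    return t;
}

/* A singly linked list only inserts cheaply at the head, so the newest
 * tag is the first one a later search sees. */
void tag_prepend(struct packet *p, struct tag *t)
{
    assert(t != NULL);
    SLIST_INSERT_HEAD(&p->tags, t, link);
}

/* Return the first (most recently added) tag of the given type, or NULL. */
struct tag *tag_find(struct packet *p, int type)
{
    struct tag *t;
    SLIST_FOREACH(t, &p->tags, link) {
        if (t->type == type)
            return t;
    }
    return NULL;
}

/* Release every tag, as would happen when the packet itself is freed. */
void tag_free_all(struct packet *p)
{
    while (!SLIST_EMPTY(&p->tags)) {
        struct tag *t = SLIST_FIRST(&p->tags);
        SLIST_REMOVE_HEAD(&p->tags, link);
        free(t);
    }
}
```

With head insertion, two tags of the same type are found newest-first, which is the only ordering a plain SLIST gives for free; whether that matters depends on what the tags carry.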
-Nate To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Oct 6 22:21:45 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EE78637B401; Sun, 6 Oct 2002 22:21:43 -0700 (PDT) Received: from ebb.errno.com (ebb.errno.com [66.127.85.87]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8EA1843E97; Sun, 6 Oct 2002 22:21:43 -0700 (PDT) (envelope-from sam@errno.com) Received: from melange (melange.errno.com [66.127.85.82]) (authenticated bits=0) by ebb.errno.com (8.12.5/8.12.1) with ESMTP id g975Lg1H000879 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Sun, 6 Oct 2002 22:21:43 -0700 (PDT) (envelope-from sam@errno.com) X-Authentication-Warning: ebb.errno.com: Host melange.errno.com [66.127.85.82] claimed to be melange Message-ID: <142f01c26dc1$6c4fa5b0$52557f42@errno.com> From: "Sam Leffler" To: "Nate Lawson" , Cc: References: Subject: Re: CFR: m_tag patch Date: Sun, 6 Oct 2002 22:21:35 -0700 Organization: Errno Consulting MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4807.1700 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG > 1. Is ordering important or is an SLIST sufficient for all cases? Order is not important. > 2. Is it possible to attach the aux argument to the mbuf chain instead of > adding it as a new parameter to ip_output? > The "aux argument" _was_ originally attached to the mbuf chain. The change to add an extra arg to ip*_output was done to eliminate one of the biggest uses of the aux mbuf: the socket to use to get IPsec policy. 
This is a performance win and worth doing independent of the aux->m_tag switch. One could split the ip_output change out but doing it together avoids converting code that would just eventually be eliminated (unless you did the ip_output change first and then the m_tag switch). Sam To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Oct 6 23:11:18 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6EEC437B401; Sun, 6 Oct 2002 23:11:17 -0700 (PDT) Received: from hawk.mail.pas.earthlink.net (hawk.mail.pas.earthlink.net [207.217.120.22]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0D32843E9C; Sun, 6 Oct 2002 23:11:17 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0081.cvx40-bradley.dialup.earthlink.net ([216.244.42.81] helo=mindspring.com) by hawk.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17yR6f-00041e-00; Sun, 06 Oct 2002 23:11:06 -0700 Message-ID: <3DA12517.6D1B4EC2@mindspring.com> Date: Sun, 06 Oct 2002 23:09:27 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Sam Leffler Cc: Nate Lawson , freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: CFR: m_tag patch References: <142f01c26dc1$6c4fa5b0$52557f42@errno.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Sam Leffler wrote: > > 1. Is ordering important or is an SLIST sufficient for all cases? > > Order is not important. Hm. I don't know if this is actually correct. I think if it went LIFO, you might not be very happy. I think it depends on what it's used for. > > 2. 
Is it possible to attach the aux argument to the mbuf chain instead of
> > adding it as a new parameter to ip_output?
>
> The "aux argument" _was_ originally attached to the mbuf chain.  The change
> to add an extra arg to ip*_output was done to eliminate one of the biggest
> uses of the aux mbuf; the socket to use to get IPsec policy.  This is a
> performance win and worth doing independent of the aux->m_tag switch.
>
> One could split the ip_output change out but doing it together avoids
> converting code that would just eventually be eliminated (unless you did the
> ip_output change first and then the m_tag switch).

The IPSEC for IPv4 stuff is very ugly. It's tempting to say that it has
nothing to do with the IP encapsulation proper (conceptually, it should
not).

The big problem is that the IPSEC allocations are there if IPSEC is
compiled into the kernel at all, even if IPSEC is not being used on a
particular socket.

Actually, the integration into IPv4 strikes me as little more than
an afterthought: the KAME code handles it in IPv6 without the extra
overhead for the non-IPSEC sockets, and the IPv4 support is more of
a bolt-on than something designed in. I'd almost want to see the
IPSEC stuff treated as a separate encapsulation layer, on its own.

Adding a parameter for it specifically adds more cruft on the cruft
that's already there, and makes the IPSEC *not* an encapsulation, in
any way. 8-(.

Is there another way to do this? A general extension mechanism for
attributing mbufs seems to be a good idea. People have wanted this
before, for credentials (e.g. Robert suggested something like this
before).
-- Terry

From owner-freebsd-arch Sun Oct 6 23:40:15 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1BC0937B404; Sun, 6 Oct 2002 23:40:14 -0700 (PDT)
Received: from sccrmhc02.attbi.com (sccrmhc02.attbi.com [204.127.202.62]) by mx1.FreeBSD.org (Postfix) with ESMTP id 71F3343E9E; Sun, 6 Oct 2002 23:40:13 -0700 (PDT) (envelope-from julian@elischer.org)
Received: from InterJet.elischer.org ([12.232.206.8]) by sccrmhc02.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021007064012.ULJA27763.sccrmhc02.attbi.com@InterJet.elischer.org>; Mon, 7 Oct 2002 06:40:12 +0000
Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id XAA32049; Sun, 6 Oct 2002 23:32:35 -0700 (PDT)
Date: Sun, 6 Oct 2002 23:32:34 -0700 (PDT)
From: Julian Elischer
To: Sam Leffler
Cc: freebsd-arch@freebsd.org, freebsd-net@freebsd.org
Subject: Re: CFR: m_tag patch
In-Reply-To: <13e901c26dbb$63059f60$52557f42@errno.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

This is not a technical comment but more a project question.

What is the relationship between these changes and the KAME
code? In particular, are they going to take these
changes back into KAME? Can you outline the compatibility
issues, both with KAME, and with NetBSD and OpenBSD, as I know you have
been looking at OpenBSD?

(I am looking at the patches now and will respond technically separately.)

On Sun, 6 Oct 2002, Sam Leffler wrote:

> http://www.freebsd.org/~sam/mtag.patch
>
> has changes to -current to replace the "aux mbuf" with a more general
> mechanism borrowed from openbsd.  Rather than dangling mbuf's off a packet
> when auxiliary information needs to be associated with a packet a list of
> variable-size struct m_tag's are kept.  This is better because it:
>
> 1. Eliminates the use of mbufs as a general-purpose memory allocator.
> 2. Avoids confusing and problematic code (e.g. ipsec stuffs multiple data
>    structures into an mbuf and often consults m_len to determine what
>    might/should be present).
> 3. Means arbitrary size data can be stored (w/ mbufs you get what fits in a
>    fixed-size mbuf or--if it were implemented--in a cluster).
> 4. Removes a recursive dependency that complicates locking in the mbuf code.
>
> The patch actually contains three sets of changes that are intertwined:
>
> 1. Remove use of aux mbufs and replace with m_tag's.
> 2. Add an additional parameter to ip_output and ip6_output that was
>    previously passed through an aux mbuf.
> 3. Rename luigi's m_tag_id hack #define to avoid name conflict with the
>    m_tag definition.
>
> I've been running something like this patch for ~9 months.  The patch
> actually eliminates more code than it adds and is likely to improve
> performance (haven't measured).  There should be no functional changes after
> this patch is applied.
>
> Timely feedback is desired as I'd like to commit these changes in time for
> DP2.
>
> Sam

From owner-freebsd-arch Mon Oct 7 0:40:13 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DB9A137B401; Mon, 7 Oct 2002 00:40:09 -0700 (PDT)
Received: from rwcrmhc53.attbi.com (rwcrmhc53.attbi.com [204.127.198.39]) by mx1.FreeBSD.org (Postfix) with ESMTP id 72FD043EAA; Mon, 7 Oct 2002 00:40:09 -0700 (PDT) (envelope-from julian@elischer.org)
Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc53.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021007074008.OLCB18767.rwcrmhc53.attbi.com@InterJet.elischer.org>; Mon, 7 Oct 2002 07:40:08 +0000
Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id AAA32405; Mon, 7 Oct 2002 00:29:36 -0700 (PDT)
Date: Mon, 7 Oct 2002 00:29:35 -0700 (PDT)
From: Julian Elischer
To: Sam Leffler
Cc: Nate Lawson , freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG
Subject: Re: CFR: m_tag patch
In-Reply-To: <142f01c26dc1$6c4fa5b0$52557f42@errno.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

On Sun, 6 Oct 2002, Sam Leffler wrote:

> > 1. Is ordering important or is an SLIST sufficient for all cases?
>
> Order is not important.

I do similar in Netgraph but could move to whatever scheme becomes
standard in the rest of the system. However I have some serious
comments.

In netgraph I have a separate structure for the packet that contains
a pointer to the mbuf chain as well as a pointer to a malloc'd memory
buffer that can contain metadata such as is kept here.
The metadata is of the form [header][field][field][field]....

where each field was defined as:

    struct meta_field_header {
            u_long  cookie;     /* cookie for the field. Skip fields you don't
                                 * know about (same cookie as in messages) */
            u_short type;       /* field ID within this cookie scheme */
            u_short len;        /* total len of this field including extra
                                 * data */
            char    data[0];    /* data starts here */
    };

the header for metadata is:

    struct ng_meta {
            char    priority;       /* -ve is less priority, 0 is default */
            char    discardability; /* higher is less valuable.. discard first */
            u_short allocated_len;  /* amount malloc'd */
            u_short used_len;       /* sum of all fields, options etc. */
            u_short flags;          /* see below.. generic flags */
            struct meta_field_header options[0];    /* add as (if) needed */
    };

    /* Flags for meta-data */
    #define NGMF_TEST   0x01    /* discard at the last moment before sending */
    #define NGMF_TRACE  0x02    /* trace when handing this data to a node */

One metadata collection is associated with each packet, similar to what
you do except that in 4.x I pass it as an argument and in 5.0
I have a packet header that is passed around that identifies both
data and metadata. (I need the header for other reasons anyway.)

I show this only to show that I have tackled the same problem with a
similar but different scheme. My thought was that a packet usually only
has at most a couple of tags, and the tags are usually short, so that it
was ok to malloc a bigger chunk and walk them using the length fields.
If you didn't have room you could malloc a bigger one and copy the
initial fields, and then add your new ones at the end. It would
probably almost never happen..

I did however find a need to delete a tag once the packet passed out of
the scope of the module in question.
(The "cookie" represents a particular "ABI"; the "type" is only valid
within the ABI. If you do not support a particular cookie type you
cannot interpret the contents of that field.) Protocols and such have
their own cookies.

I point this out as a feature because the TAG values need to only be
defined within their own ABI/API include files.

If a module that supports a particular cookie passes a packet out to
the rest of the world, it probably should invalidate some fields that
are set with that particular cookie, in case that packet should in some
way be redirected back towards a module that does support it, and that
might cause unexpected results. For this reason I needed to 'remove'
fields. I ended up leaving them in place and setting the cookie to a
special "invalid" cookie value that no-one would match. In a similar
vein, I can imagine wanting to remove one of the 'tags' in your lists.

Each module has a cookie field that is pretty much guaranteed
to be unique (time since epoch when written) so in your terms,
the IP code would only know and recognise TAGS prepended
with the IP cookie and would handle other tags as opaque data.
Similarly IPSEC code would only see into tags with the IPSEC cookie
prepended.. I don't think you should be
defining the TAG values in a global file, but rather
each module should identify its own tags and ignore others.
If two modules need to share data, a .h file can contain the
definitions for that API and the cookie that identifies such tags,
and both modules would include that shared include. That way
the base code need not know about future improvements, which
has always been "difficult" :-)

>
> > 2. Is it possible to attach the aux argument to the mbuf chain instead of
> > adding it as a new parameter to ip_output?
> >
>
> The "aux argument" _was_ originally attached to the mbuf chain.  The change
> to add an extra arg to ip*_output was done to eliminate one of the biggest
> uses of the aux mbuf; the socket to use to get IPsec policy.
This is a
> performance win and worth doing independent of the aux->m_tag switch.
>
> One could split the ip_output change out but doing it together avoids
> converting code that would just eventually be eliminated (unless you did the
> ip_output change first and then the m_tag switch).

I'm not sure but it's possible m_aux was invented so that ip_output()
would not change interfaces to match what was defined in some interface
method table. I think NetBSD added entries in their
interface method tables with varargs (yech) to get around some of this..

So in summary:
Is it worth making it a linked list?
How many tags do you see on packets, and how large are they?
Can you use a scheme such as that outlined so that each module can
define its own tags privately?

>
> Sam

From owner-freebsd-arch Mon Oct 7 9:24:55 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0223237B401; Mon, 7 Oct 2002 09:24:54 -0700 (PDT)
Received: from ebb.errno.com (ebb.errno.com [66.127.85.87]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8873C43E42; Mon, 7 Oct 2002 09:24:53 -0700 (PDT) (envelope-from sam@errno.com)
Received: from melange (melange.errno.com [66.127.85.82]) (authenticated bits=0) by ebb.errno.com (8.12.5/8.12.1) with ESMTP id g97GOn1H003251 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Mon, 7 Oct 2002 09:24:50 -0700 (PDT) (envelope-from sam@errno.com)
X-Authentication-Warning: ebb.errno.com: Host melange.errno.com [66.127.85.82] claimed to be melange
Message-ID: <150501c26e1e$0f5702b0$52557f42@errno.com>
From: "Sam Leffler"
To: "Julian Elischer"
Cc: ,
References:
Subject: Re: CFR: m_tag patch
Date: Mon, 7 Oct 2002 09:24:49 -0700
Organization:
Errno Consulting
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.50.4807.1700
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

> What is the relationship between these changes and the KAME
> code? In particular, are they going to take these
> changes back into KAME? Can you outline the compatibility
> issues, both with KAME, and with NetBSD and OpenBSD, as I know you have
> been looking at OpenBSD?
>

I've looked at many systems: openbsd, netbsd, linux (freeswan), bsd/os and
of course I'm very familiar with commercial systems like irix and solaris.
The m_tag code comes from openbsd. netbsd uses aux mbuf's.

Not sure what KAME compatibility means as they do not have an IPsec
implementation in openbsd. The changes I proposed are intended to have the
minimum impact on their source code. In fact these changes should be good
for them under freebsd as they allow some obscure code to be simplified
and performance to improve.

Looking forward, having m_tag support (or something like it) is worthwhile
for improving various bits of freebsd by replacing ad hoc mechanisms such
as those used by dummynet and ipfw. It is also important to me for my IPsec
implementation that uses h/w crypto and for taking advantage of future
developments such as offloading IPsec calculations to NIC's.

I considered a lot of different options and decided the m_tag stuff was a
good way to go. It appears to do what's needed for now and the immediate
future. I'm also keen to promote compatibility across *bsd systems.

Sam

From owner-freebsd-arch Mon Oct 7 9:32:18 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4833037B401; Mon, 7 Oct 2002 09:32:17 -0700 (PDT)
Received: from ebb.errno.com (ebb.errno.com [66.127.85.87]) by mx1.FreeBSD.org (Postfix) with ESMTP id C863343E42; Mon, 7 Oct 2002 09:32:16 -0700 (PDT) (envelope-from sam@errno.com)
Received: from melange (melange.errno.com [66.127.85.82]) (authenticated bits=0) by ebb.errno.com (8.12.5/8.12.1) with ESMTP id g97GWF1H003280 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Mon, 7 Oct 2002 09:32:16 -0700 (PDT) (envelope-from sam@errno.com)
X-Authentication-Warning: ebb.errno.com: Host melange.errno.com [66.127.85.82] claimed to be melange
Message-ID: <150d01c26e1f$192baf10$52557f42@errno.com>
From: "Sam Leffler"
To: "Terry Lambert"
Cc: "Nate Lawson" , ,
References: <142f01c26dc1$6c4fa5b0$52557f42@errno.com> <3DA12517.6D1B4EC2@mindspring.com>
Subject: Re: CFR: m_tag patch
Date: Mon, 7 Oct 2002 09:32:15 -0700
Organization: Errno Consulting
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.50.4807.1700
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

> Actually, the integration into IPv4 strikes me as little more than
> an afterthought: the KAME code handles it in IPv6 without the extra
> overhead for the non-IPSEC sockets, and the IPv4 support is more of
> a bolt-on than something designed in.  I'd almost want to see the
> IPSEC stuff treated as a separate encapsulation layer, on its own.
>

IPsec integration is done the same for IPv4 and IPv6. Specifically, the
socket parameter is passed through the aux mbuf rather than as a function
param. I've changed both ip_output and ip6_output to pass the socket as an
additional parameter to eliminate this practice.

> Adding a parameter for it specifically adds more cruft on the cruft
> that's already there, and makes the IPSEC *not* an encapsulation, in
> any way.  8-(.
>

Adding an extra param to ip*_output is a pragmatic approach chosen to
minimize impact to the code and reduce overhead. FWIW this approach is
also found in openbsd, irix and bsd/os.

> Is there another way to do this?  A general extension mechanism for
> attributing mbufs seems to be a good idea.  People have wanted this
> before, for credentials (e.g. Robert suggested something like this
> before).
>

m_tag's are a general extension mechanism for attributing mbuf chains (i.e.
packets). If deemed worthwhile they could be promoted from the pkthdr to
the base mbuf. For now I've tried to make the change that has least impact
as we're (supposedly) close to a freeze for DP2. Also, the change I've made
permits MFC'ing to -stable w/ binary compatibility since the SLIST of
m_tag's requires only a single pointer, so this can replace the pointer to
the aux mbuf list.
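
The binary-compatibility point can be checked with a small sketch. This is
only an illustration under stated assumptions: struct pkthdr_aux and struct
pkthdr_tags are hypothetical, cut-down stand-ins (not the real pkthdr from
mbuf.h), and the one-pointer list head is written out by hand rather than
with SLIST_HEAD so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

struct mbuf;    /* opaque; only pointers to it are used here */
struct m_tag;   /* opaque; only pointers to it are used here */

/* Hypothetical "before" layout: the packet header points at an aux mbuf. */
struct pkthdr_aux {
    void        *rcvif;
    int          len;
    struct mbuf *aux;
};

/* Hypothetical "after" layout: a singly linked list head, one pointer wide. */
struct pkthdr_tags {
    void *rcvif;
    int   len;
    struct {
        struct m_tag *slh_first;   /* same shape as an SLIST head */
    } tags;
};

/* Compile-time checks: the swap changes neither size nor field offset. */
_Static_assert(sizeof(struct pkthdr_aux) == sizeof(struct pkthdr_tags),
    "list head must be the same size as the pointer it replaces");
_Static_assert(offsetof(struct pkthdr_aux, aux) ==
    offsetof(struct pkthdr_tags, tags),
    "list head must sit at the offset of the old pointer");

/* Run-time form of the same check, returning 1 when the layouts match. */
static int
layout_compatible(void)
{
    return sizeof(struct pkthdr_aux) == sizeof(struct pkthdr_tags) &&
        offsetof(struct pkthdr_aux, aux) == offsetof(struct pkthdr_tags, tags);
}
```

Code that only looks at the other header fields would then see the same
structure size and offsets before and after the change.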

Sam

From owner-freebsd-arch Mon Oct 7 9:47: 5 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DA53437B401; Mon, 7 Oct 2002 09:47:00 -0700 (PDT)
Received: from ebb.errno.com (ebb.errno.com [66.127.85.87]) by mx1.FreeBSD.org (Postfix) with ESMTP id 44ACA43E6E; Mon, 7 Oct 2002 09:47:00 -0700 (PDT) (envelope-from sam@errno.com)
Received: from melange (melange.errno.com [66.127.85.82]) (authenticated bits=0) by ebb.errno.com (8.12.5/8.12.1) with ESMTP id g97Gks1H003331 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Mon, 7 Oct 2002 09:46:58 -0700 (PDT) (envelope-from sam@errno.com)
X-Authentication-Warning: ebb.errno.com: Host melange.errno.com [66.127.85.82] claimed to be melange
Message-ID: <151b01c26e21$26adb690$52557f42@errno.com>
From: "Sam Leffler"
To: "Julian Elischer"
Cc: "Nate Lawson" , ,
References:
Subject: Re: CFR: m_tag patch
Date: Mon, 7 Oct 2002 09:46:53 -0700
Organization: Errno Consulting
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.50.4807.1700
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

> On Sun, 6 Oct 2002, Sam Leffler wrote:
>
> > > 1. Is ordering important or is an SLIST sufficient for all cases?
> >
> > Order is not important.
>
> I do similar in Netgraph but could move to whatever scheme becomes
> standard in the rest of the system. However I have some serious
> comments.
>
> In netgraph I have a separate structure for the packet that contains
> a pointer to the mbuf chain as well as a pointer to a malloc'd memory
> buffer that can contain metadata such as is kept here.
> The metadata is of the form [header][field][field][field]....
>
> where each field was defined as:
>
>     struct meta_field_header {
>             u_long  cookie;     /* cookie for the field. Skip fields you don't
>                                  * know about (same cookie as in messages) */
>             u_short type;       /* field ID within this cookie scheme */
>             u_short len;        /* total len of this field including extra
>                                  * data */
>             char    data[0];    /* data starts here */
>     };
>
> the header for metadata is:
>
>     struct ng_meta {
>             char    priority;       /* -ve is less priority, 0 is default */
>             char    discardability; /* higher is less valuable.. discard first */
>             u_short allocated_len;  /* amount malloc'd */
>             u_short used_len;       /* sum of all fields, options etc. */
>             u_short flags;          /* see below.. generic flags */
>             struct meta_field_header options[0];    /* add as (if) needed */
>     };
>
>     /* Flags for meta-data */
>     #define NGMF_TEST   0x01    /* discard at the last moment before sending */
>     #define NGMF_TRACE  0x02    /* trace when handing this data to a node */
>
> One metadata collection is associated with each packet, similar to what
> you do except that in 4.x I pass it as an argument and in 5.0
> I have a packet header that is passed around that identifies both
> data and metadata. (I need the header for other reasons anyway.)
>
> I show this only to show that I have tackled the same problem with a
> similar but different scheme. My thought was that a packet usually only
> has at most a couple of tags, and the tags are usually short, so that it
> was ok to malloc a bigger chunk and walk them using the length fields.
> If you didn't have room you could malloc a bigger one and copy the
> initial fields, and then add your new ones at the end. It would
> probably almost never happen..
>

I'm not sure I follow exactly how the above works, but the tag data
structures are very simple and would appear to accommodate your data
structures. An m_tag is simply a variable-length malloc'd block of memory
that has a type field (and linked list pointer). What you store in the
space that follows the fixed-length struct m_tag header is up to you. The
type field is used simply to locate tags once attached to a packet. You
can certainly allocate a tag with a larger chunk of memory than you
initially need and store the bookkeeping info in the tag data block. This
would appear to permit implementation of the above within the m_tag
framework.

> I did however find a need to delete a tag once the packet passed out of
> the scope of the module in question. (The "cookie" represents a
> particular "ABI"; the "type" is only valid within the ABI.
> If you do not support a particular cookie type you cannot interpret the
> contents of that field.) Protocols and such have their own cookies.
>

Yes, this is exactly what the m_tag_id item is for in the m_tag data
structure (what I called a type field above).

> I point this out as a feature because the TAG values need to only be
> defined within their own ABI/API include files.
>
> If a module that supports a particular cookie passes a packet out to
> the rest of the world, it probably should invalidate some fields that
> are set with that particular cookie, in case that packet should in some
> way be redirected back towards a module that does support it, and that
> might cause unexpected results. For this reason I needed to 'remove'
> fields. I ended up leaving them in place and setting the cookie to a
> special "invalid" cookie value that no-one would match. In a similar
> vein, I can imagine wanting to remove one of the 'tags' in your lists.
>

You can either remove tags from the chain attached to an mbuf or
invalidate them as you described--by setting the m_tag_id field to
something "invalid".
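
The two options just described can be sketched in userland C. This is a
minimal illustration, not the code in mtag.patch: struct tag, struct pkt,
tag_alloc(), tag_prepend(), tag_find(), tag_invalidate(), tag_delete() and
TAG_ID_INVALID are all hypothetical names. The only things taken from the
discussion are the shape of a tag (a list pointer and an id in front of
caller-defined data) and the choice between unlinking a tag and marking it
invalid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tag header; the payload follows it in the same allocation. */
struct tag {
    struct tag *next;   /* singly linked; order is not significant */
    uint16_t    id;     /* plays the role of m_tag_id */
    uint16_t    len;    /* bytes of payload after the header */
};

#define TAG_ID_INVALID 0xffffu  /* assumed id that no module ever looks up */

struct pkt {                    /* stand-in for the packet header */
    struct tag *tags;
};

/* Allocate header plus len bytes of zeroed payload as one block. */
static struct tag *
tag_alloc(uint16_t id, uint16_t len)
{
    struct tag *t = calloc(1, sizeof(*t) + len);

    if (t == NULL)
        return NULL;
    t->id = id;
    t->len = len;
    return t;
}

/* The payload starts right after the fixed-length header. */
static void *
tag_data(struct tag *t)
{
    return t + 1;
}

/* Attach a tag at the head of the packet's list. */
static void
tag_prepend(struct pkt *p, struct tag *t)
{
    assert(p != NULL && t != NULL);
    t->next = p->tags;
    p->tags = t;
}

/* Return the first tag with the given id, or NULL if none is attached. */
static struct tag *
tag_find(struct pkt *p, uint16_t id)
{
    struct tag *t;

    for (t = p->tags; t != NULL; t = t->next)
        if (t->id == id)
            return t;
    return NULL;
}

/* Option 1: leave the tag in place but make it unmatchable. */
static void
tag_invalidate(struct tag *t)
{
    t->id = TAG_ID_INVALID;
}

/* Option 2: unlink the tag from the packet and free it. */
static void
tag_delete(struct pkt *p, struct tag *t)
{
    struct tag **pp;

    for (pp = &p->tags; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == t) {
            *pp = t->next;
            free(t);
            return;
        }
    }
}
```

Invalidating is cheaper, since nothing is unlinked or freed, but the block
stays attached until the packet itself is released; deleting returns the
memory immediately at the cost of a list walk.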

> Each module has a cookie field that is pretty much guaranteed
> to be unique (time since epoch when written) so in your terms,
> the IP code would only know and recognise TAGS prepended
> with the IP cookie and would handle other tags as opaque data.
> Similarly IPSEC code would only see into tags with the IPSEC cookie
> prepended.. I don't think you should be
> defining the TAG values in a global file, but rather
> each module should identify its own tags and ignore others.
> If two modules need to share data, a .h file can contain the
> definitions for that API and the cookie that identifies such tags,
> and both modules would include that shared include. That way
> the base code need not know about future improvements, which
> has always been "difficult" :-)
>

Changing the way m_tag_id definitions are done is fine with me. It's done
statically in mbuf.h for compatibility with code I'm bringing over from
openbsd.

> >
> > > 2. Is it possible to attach the aux argument to the mbuf chain instead of
> > > adding it as a new parameter to ip_output?
> > >
> >
> > The "aux argument" _was_ originally attached to the mbuf chain.  The change
> > to add an extra arg to ip*_output was done to eliminate one of the biggest
> > uses of the aux mbuf; the socket to use to get IPsec policy.  This is a
> > performance win and worth doing independent of the aux->m_tag switch.
> >
> > One could split the ip_output change out but doing it together avoids
> > converting code that would just eventually be eliminated (unless you did the
> > ip_output change first and then the m_tag switch).
>
> I'm not sure but it's possible m_aux was invented so that ip_output()
> would not change interfaces to match what was defined in some interface
> method table. I think NetBSD added entries in their
> interface method tables with varargs (yech) to get around some of this..
>
>
> So in summary:
> Is it worth making it a linked list?
> How many tags do you see on packets, and how large are they?
> Can you use a scheme such as that outlined so that each module can
> define its own tags privately?
>

You can certainly use a scheme as you suggest. I don't believe doing
something like that is precluded by what I've offered; you'd just layer
the additional logic on top w/o any changes--I think.

Tags tend to be very small (e.g. a pointer or two ints) and there tend to
be only 1 or 2 when there are any. In my performance tuning work with
IPsec their manipulation has never shown up as significant in kernel
profiles.

Sam

From owner-freebsd-arch Mon Oct 7 10:40:31 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 90BDA37B404 for ; Mon, 7 Oct 2002 10:39:14 -0700 (PDT)
Received: from web11207.mail.yahoo.com (web11207.mail.yahoo.com [216.136.131.189]) by mx1.FreeBSD.org (Postfix) with SMTP id 4079E43E65 for ; Mon, 7 Oct 2002 10:39:14 -0700 (PDT) (envelope-from gathorpe79@yahoo.com)
Message-ID: <20021007173913.50425.qmail@web11207.mail.yahoo.com>
Received: from [149.99.116.61] by web11207.mail.yahoo.com via HTTP; Mon, 07 Oct 2002 13:39:13 EDT
Date: Mon, 7 Oct 2002 13:39:13 -0400 (EDT)
From: Gary Thorpe
Subject: Re: Running independent kernel instances on dual-Xeon/E7500 system
To: Terry Lambert
Cc: freebsd-arch@freebsd.org
In-Reply-To: <3DA10949.218488B9@mindspring.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

--- Terry Lambert wrote:
> Nate Lawson wrote:
> > My dismissiveness was due to anticipating the direction this was going,
> > which is nicely shown by the response below.
> > In short, dedicated processors for IO were used in the minicomputer
> > days but are wasteful nowadays when you have lightweight interrupts
> > and/or polling when appropriate.
>
> Yet, I keep running into employers who want to pay people to do
> exactly that, particularly for offloading network processing to
> one processor, and running applications on the other.

Wouldn't this be "solved" by using thread affinity?

>
> And then there's the Tigon II firmware rewrite for FreeBSD, to
> offload interrupt and copy processing.  And CGD's work for Sibytes
> (NetBSD 64bit MIPS-based network coprocessor board) doing just that
> got the company sold to Broadcom for what, $700M?
>
> 8-).
>
>
> > If your scheduler sucks, fix it.  If a device needs extra processing
> > equivalent to another N Ghz CPU, the vendor will add silicon.  The "S" in
> > SMP is for symmetric, lest we forget.
>
> People keep saying that, and then keep not running interrupts in
> virtual wire mode, so that their delivery is "S" as in "symmetric"...
> ;^).
>
> Actually, NT proved that wiring particular interrupts to particular
> processors was the way to go -- that was one of the things they did
> to beat the Linux numbers in both the Netcraft and Ziff-Davis
> benchmarks... perfect symmetry isn't all that it's promised.
>
> -- Terry

I remember when I mentioned that some time ago and got the general
response that this setup is highly specialized, inflexible, and
probably not very useful for a real-world server. People did point out
that with MORE cpus and/or MORE network adapters, or some combination
that is not an n:n ratio, NT would not have scaled well at all.

How would NT compare to Tru64, Solaris, AIX, or IRIX in a similar
test? Do any of these "hardwire" interrupts to particular cpus?

I think what the original poster would want is something like
user-mode linux or vmware.
Aside from machine emulation (via bochs and similar
simulators), does anything exist for FreeBSD which would allow you to
run separate, independent environments?

From owner-freebsd-arch Mon Oct 7 11:54:41 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3EDCC37B401 for ; Mon, 7 Oct 2002 11:54:39 -0700 (PDT)
Received: from mail.speakeasy.net (mail12.speakeasy.net [216.254.0.212]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6FDB343E77 for ; Mon, 7 Oct 2002 11:54:38 -0700 (PDT) (envelope-from jhb@FreeBSD.org)
Received: (qmail 24832 invoked from network); 7 Oct 2002 18:54:38 -0000
Received: from unknown (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) by mail12.speakeasy.net (qmail-ldap-1.03) with DES-CBC3-SHA encrypted SMTP for ; 7 Oct 2002 18:54:38 -0000
Received: from laptop.baldwin.cx (gw1.twc.weather.com [216.133.140.1]) by server.baldwin.cx (8.12.6/8.12.6) with ESMTP id g97Isan5003000; Mon, 7 Oct 2002 14:54:36 -0400 (EDT) (envelope-from jhb@FreeBSD.org)
Message-ID:
X-Mailer: XFMail 1.5.2 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <200210050929.g959T1vU023691@gw.catspoiler.org>
Date: Mon, 07 Oct 2002 14:54:40 -0400 (EDT)
From: John Baldwin
To: Don Lewis
Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc.,
Cc: arch@FreeBSD.ORG, jmallett@FreeBSD.ORG
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

On 05-Oct-2002 Don Lewis wrote:
> On 5 Oct, Juli Mallett wrote:
>> * De: Don Lewis [
Data: 2002-10-05 ]
>> [ Subjecte: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] ]
>>> On 5 Oct, Juli Mallett wrote:
>>> > To
>>> > accommodate situations where allocation of a 'ksiginfo' is a failure
>>> > mode (no memory), the destination process is told to exit via a new
>>> > member of 'struct proc', p_suicide, which tells a process to kill itself
>>> > next time it goes through userret.
>>>
>>> I hope that doesn't happen when I fg my editor ...
>>
>> In this situation (can't allocate 64 bytes) you're screwed if you have an
>> editor in the background, coming to the foreground, anyway.
>
> A lot of things that receive SIGCHLD, such as shells and inetd, could
> also be affected by a temporary shortage of kmem.
>
> Somehow it seems wasteful to have to allocate kmem to deliver SIGKILL.
>
> How is an ordinary userland program prevented from consuming all of kmem
> by blocking signal delivery and looping on kill()?  Does a quota system
> need to be added?
>
> The following code never sets error to anything other than zero.  It
> also looks like it is missing a return statement for the malloc() failed
> case.
>
> +int
> +ksiginfo_alloc(struct ksiginfo **ksip, struct proc *p, int signo)
> +{
> +       int error;
> +       struct ksiginfo *ksi;
> +
> +       error = 0;
> +
> +       PROC_LOCK_ASSERT(p, MA_NOTOWNED);
> +       ksi = malloc(sizeof *ksi, M_KSIGINFO, M_ZERO | M_NOWAIT);
> +       if (ksi == NULL) {
> +               PROC_LOCK(p);
> +               p->p_suicide = 1;
> +               PROC_UNLOCK(p);
> +       }
> +       ksi->ksi_signo = signo;
> +       if (curproc != NULL) {
> +               ksi->ksi_pid = curproc->p_pid;
> +               ksi->ksi_ruid = curproc->p_ucred->cr_uid;

This is not safe w/o proc lock held.  Probably should be using curthread
and td_ucred instead.  Also, curproc cannot be NULL in current.
> + } > + *ksip = ksi; > + return (error); > +} > > > > > > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-arch" in the body of the message -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 11:54:44 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 89AC737B401 for ; Mon, 7 Oct 2002 11:54:41 -0700 (PDT) Received: from mail.speakeasy.net (mail17.speakeasy.net [216.254.0.217]) by mx1.FreeBSD.org (Postfix) with ESMTP id 196FA43E4A for ; Mon, 7 Oct 2002 11:54:41 -0700 (PDT) (envelope-from jhb@FreeBSD.org) Received: (qmail 17979 invoked from network); 7 Oct 2002 18:54:40 -0000 Received: from unknown (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) by mail17.speakeasy.net (qmail-ldap-1.03) with DES-CBC3-SHA encrypted SMTP for ; 7 Oct 2002 18:54:40 -0000 Received: from laptop.baldwin.cx (gw1.twc.weather.com [216.133.140.1]) by server.baldwin.cx (8.12.6/8.12.6) with ESMTP id g97Iscn5003007; Mon, 7 Oct 2002 14:54:38 -0400 (EDT) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.5.2 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <200210051000.g95A0ZvU023752@gw.catspoiler.org> Date: Mon, 07 Oct 2002 14:54:42 -0400 (EDT) From: John Baldwin To: Don Lewis Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., Cc: arch@FreeBSD.ORG, jmallett@FreeBSD.ORG Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 05-Oct-2002 Don Lewis wrote: > On 5 Oct, Juli Mallett wrote: > >> diff -Nrdu 
-x *CVS* -x *dev* sys/kern/kern_exit.c kernel/kern/kern_exit.c >> --- sys/kern/kern_exit.c Tue Oct 1 12:15:51 2002 >> +++ kernel/kern/kern_exit.c Sat Oct 5 01:20:57 2002 > >> @@ -209,12 +210,12 @@ >> PROC_LOCK(p); >> if (p == p->p_leader) { >> q = p->p_peers; >> + PROC_UNLOCK(p); >> while (q != NULL) { >> - PROC_LOCK(q); >> psignal(q, SIGKILL); >> - PROC_UNLOCK(q); >> q = q->p_peers; >> } >> + PROC_LOCK(p); >> while (p->p_peers) >> msleep(p, &p->p_mtx, PWAIT, "exit1", 0); >> } > > This scary looking fragment of code in exit1() is relying on the lock on > p->p_leader being continuously held to prevent the p_peers list from > changing while the list traversal is in progress. The code in > kern_fork.c and elsewhere in kern_exit.c holds a lock on p_leader while > the list modifications are done. > > The existing code looks like it could deadlock if q is locked because it > is in fork() or exit(). Process p will block when it tries to lock q, > and q will block when it tries to lock its p_leader, which happens to be > p. Ugh. Probably the code should be changed to do something like this: --- kern_exit.c 2 Oct 2002 23:12:01 -0000 1.181 +++ kern_exit.c 7 Oct 2002 18:48:18 -0000 @@ -203,17 +203,18 @@ */ p->p_flag |= P_WEXIT; - PROC_UNLOCK(p); /* Are we a task leader? */ - PROC_LOCK(p); if (p == p->p_leader) { q = p->p_peers; while (q != NULL) { + nq = q->p_peers; + PROC_UNLOCK(p); PROC_LOCK(q); psignal(q, SIGKILL); PROC_UNLOCK(q); - q = q->p_peers; + PROC_LOCK(p); + q = nq; } while (p->p_peers) msleep(p, &p->p_mtx, PWAIT, "exit1", 0); Also, we might should check P_WEXIT and abort in fork1() if it is set. (We don't appear to do that presently.) -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" 
- http://www.FreeBSD.org/

From owner-freebsd-arch Mon Oct 7 13:29: 2 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 92A3F37B401 for ; Mon, 7 Oct 2002 13:29:01 -0700 (PDT) Received: from corbulon.video-collage.com (corbulon.video-collage.com [64.35.99.179]) by mx1.FreeBSD.org (Postfix) with ESMTP id E8AC043E65 for ; Mon, 7 Oct 2002 13:28:59 -0700 (PDT) (envelope-from mi+mx@aldan.algebra.com) Received: from misha.murex.com (250-217.customer.cloud9.net [168.100.250.217]) by corbulon.video-collage.com (8.12.2/8.12.2) with ESMTP id g97KSp1P067463 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=FAIL) for ; Mon, 7 Oct 2002 16:28:53 -0400 (EDT) (envelope-from mi+mx@aldan.algebra.com) X-Authentication-Warning: corbulon.video-collage.com: Host 250-217.customer.cloud9.net [168.100.250.217] claimed to be misha.murex.com Content-Type: text/plain; charset="us-ascii" From: Mikhail Teterin Organization: Virtual Estates, Inc. To: arch@FreeBSD.org Subject: swapon some regular file Date: Mon, 7 Oct 2002 16:30:42 -0400 User-Agent: KMail/1.4.3 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Message-Id: <200210071630.42512.mi+mx@aldan.algebra.com> X-Scanned-By: MIMEDefang 2.15 (www dot roaringpenguin dot com slash mimedefang) Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

Currently, swapon(2) will only succeed if the argument satisfies vn_isdisk(9), or if it is an NFS-accessed file.

Users wishing to swap onto a local regular file have to go through the vnconfig/mdconfig gymnastics. Is that intentional?
If not, should it be fixed by relaxing swapon(2) so that it does not require VFCF_NETWORK for regular files, or -- cosmetically -- by modifying swapon(8) to do the vnconfig/mdconfig-ing internally?

In both cases, pstat will probably need improving to display the regular file name in the -s case.

Thanks!

-mi

From owner-freebsd-arch Mon Oct 7 14:16: 0 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 90D9E37B401 for ; Mon, 7 Oct 2002 14:15:55 -0700 (PDT) Received: from swan.mail.pas.earthlink.net (swan.mail.pas.earthlink.net [207.217.120.123]) by mx1.FreeBSD.org (Postfix) with ESMTP id 12AB843E77 for ; Mon, 7 Oct 2002 14:15:55 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0113.cvx22-bradley.dialup.earthlink.net ([209.179.198.113] helo=mindspring.com) by swan.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17yfEG-0006AN-00; Mon, 07 Oct 2002 14:15:52 -0700 Message-ID: <3DA1F91F.F707826E@mindspring.com> Date: Mon, 07 Oct 2002 14:14:07 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Gary Thorpe Cc: freebsd-arch@freebsd.org Subject: Re: Running independent kernel instances on dual-Xeon/E7500 system References: <20021007173913.50425.qmail@web11207.mail.yahoo.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

Gary Thorpe wrote:
> --- Terry Lambert wrote:
> > Nate Lawson wrote:
> > > My dismissiveness was due to anticipating the direction this was going,
> > > which is nicely shown by the response below.
In short, dedicated
> > > processors for IO were used in the minicomputer days but are wasteful
> > > nowadays when you have lightweight interrupts and/or polling when
> > > appropriate.
> >
> > Yet, I keep running into employers who want to pay people to do
> > exactly that, particularly for offloading network processing to
> > one processor, and running applications on the other.
>
> Wouldn't this be "solved" by using thread affinity?

No. As I pointed out, "netperf" is a single server process, and will not gain advantage from multiple CPUs, under any circumstances, and will only suffer performance degradation attributable to SMP related overhead, when running on an SMP system.

Some of this overhead is the lack of process affinity for a given CPU; some of it is the network interrupts going to whichever CPU holds "giant", instead of going to a particular CPU; some of it is the use of interrupt threads, which can cause the processing to occur on a CPU other than the one the interrupt came in on; some of it is the NETISR processing; etc.

Basically, it's possible at this point to:

1) Receive packets on CPU #1
2) Run the interrupt thread processing on CPU #2
3) Run the NETISR on CPU #3
4) Run the user application on CPU #4

Even if you dealt with the CPU affinity issue for the process, and even if you dealt with it for the interrupt thread, there are still two (potentially) disjoint processing boundaries to cross, which *will* be statistically disjoint.

Running the protocol processing from interrupt through completion (queueing of data on the sb_rcv) on a single CPU is a significant win.

> > Actually, NT proved that wiring particular interrupts to particular
> > processors was the way to go -- that was one of the things they did
> > to beat the Linux numbers in both the Netcraft and Ziff-Davis
> > benchmarks... perfect symmetry isn't all that it's promised.
> >
> I remember when I mentioned that some time ago and got
> the general response that this setup is highly
> specialized, inflexible, and probably not very useful
> for a real-world server.

The NT setup had multiple interfaces on the same wire, which is what makes this assessment accurate: it's not intrinsically correct to claim this, it's only correct to claim it on the Netcraft benchmark hardware configuration. For the Ziff-Davis configuration, this is an inaccurate claim.

> People did point of that with
> MORE cpus and/or MORE network adapters or some
> combinations that is not n:n ratio, NT would not have
> scaled well at all. How would NT compare to Tru64,
> Solaris, AIX, or IRIX in a similar test?

If there are more adapters than CPUs, and the load is not constant, you will get better performance by sharing the load, particularly if you are compute bound; though that load sharing would be best done by moving the unbalanced network interfaces around.

If you have more CPUs than network interfaces, then you are in fact better off handling the interrupts from a particular adapter on a particular CPU, each time, instead of moving it around, to avoid destroying cache locality.

> Do any of
> these "hardwire" interrupts to particular cpus?

I know that IRIX does, and that AIX does. I can't speak for Solaris in general, but in general, all NUMA machines process interrupts local to the processor cluster in which they occur. 8-).

> I think what the original poster would want is
> something like user-mode linux or vmware.

I have had extensive off-list communications with the original poster. Without revealing anything not posted to the list, I can say that you are wrong: neither one of those would solve the problem.

Some of this might be my assumption. It's true that separating the hardware into logically separate machines is one approach to the problem, and not the only possible approach, so there is some "If I have a hammer, it must be a nail" here.
IMO, adding CPUs is a solution looking for a problem to be able to solve, in this case. > Aside from > machine emulation (via bochs and simular simulators), > does anything exist for FreeBSD which would allow you > to run seperate, independent environments? No. That's kind of what we were talking about implementing, with this thread. Actually, I was much more conservative with my own suggestions, in this regard: I think it would be useful for debugging purposes, and it would be less difficult than trying to implement something like "WINICE". It could *also* be useful for satisfying bosses who are under the impression that you should be able to multiply the performance of a box with one CPU by the number of CPUs you have in it after you upgrade it, and come out with a rough approximation of the performance. In a general sense, though, the idea of being able to run certain higher priority tasks on one CPU, and lower priority work on another, so that if the lower priority work gets overloaded, at least the high priority work does not end up with degraded performance, is a good one. Normally, you would handle this by having hard RT. PC hardware, unfortunately, is incapable of handling hard RT tasks, other than a single hard RT task per platform, due to its design; but you should be able to have at least *one* hard RT task. LinuxRT deals with this by having a hard RT executive, that runs the Linux kernel as one of its tasks, and assigns resources. Doing something similar on a two processor box in FreeBSD, without needing the RT executive because you have an extra CPU, is not that much of an intuitive leap, I think. 
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 14:23:34 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5FEBD37B404; Mon, 7 Oct 2002 14:23:32 -0700 (PDT) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2EB0D43EA9; Mon, 7 Oct 2002 14:23:31 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Received: from mousie.catspoiler.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.5/8.12.5) with ESMTP id g97LNGvU033246; Mon, 7 Oct 2002 14:23:20 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Message-Id: <200210072123.g97LNGvU033246@gw.catspoiler.org> Date: Mon, 7 Oct 2002 14:23:16 -0700 (PDT) From: Don Lewis Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., To: jhb@FreeBSD.ORG Cc: arch@FreeBSD.ORG, jmallett@FreeBSD.ORG In-Reply-To: MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 7 Oct, John Baldwin wrote: > > On 05-Oct-2002 Don Lewis wrote: >> On 5 Oct, Juli Mallett wrote: >> >>> diff -Nrdu -x *CVS* -x *dev* sys/kern/kern_exit.c kernel/kern/kern_exit.c >>> --- sys/kern/kern_exit.c Tue Oct 1 12:15:51 2002 >>> +++ kernel/kern/kern_exit.c Sat Oct 5 01:20:57 2002 >> >>> @@ -209,12 +210,12 @@ >>> PROC_LOCK(p); >>> if (p == p->p_leader) { >>> q = p->p_peers; >>> + PROC_UNLOCK(p); >>> while (q != NULL) { >>> - PROC_LOCK(q); >>> psignal(q, SIGKILL); >>> - PROC_UNLOCK(q); >>> q = q->p_peers; >>> } >>> + PROC_LOCK(p); >>> while (p->p_peers) >>> msleep(p, &p->p_mtx, PWAIT, "exit1", 0); >>> } >> >> This scary looking fragment of code in exit1() is relying on the lock on 
>> p->p_leader being continuously held to prevent the p_peers list from
>> changing while the list traversal is in progress. The code in
>> kern_fork.c and elsewhere in kern_exit.c holds a lock on p_leader while
>> the list modifications are done.
>>
>> The existing code looks like it could deadlock if q is locked because it
>> is in fork() or exit(). Process p will block when it tries to lock q,
>> and q will block when it tries to lock its p_leader, which happens to be
>> p.
>
> Ugh. Probably the code should be changed to do something like this:
>
> --- kern_exit.c 2 Oct 2002 23:12:01 -0000 1.181
> +++ kern_exit.c 7 Oct 2002 18:48:18 -0000
> @@ -203,17 +203,18 @@
> */
>
> p->p_flag |= P_WEXIT;
> - PROC_UNLOCK(p);
>
> /* Are we a task leader? */
> - PROC_LOCK(p);
> if (p == p->p_leader) {
> q = p->p_peers;
> while (q != NULL) {
> + nq = q->p_peers;
> + PROC_UNLOCK(p);
> PROC_LOCK(q);
> psignal(q, SIGKILL);
> PROC_UNLOCK(q);
> - q = q->p_peers;
> + PROC_LOCK(p);
> + q = nq;
> }
> while (p->p_peers)
> msleep(p, &p->p_mtx, PWAIT, "exit1", 0);

It's not obvious to me that your alternative is safe. It avoids the deadlock problem, but what keeps the list from changing while it is being traversed, especially while we're waiting for PROC_LOCK(q)?

A separate lock for the peer list (instead of using PROC_LOCK(p_leader)) looks like the obvious fix. Grabbing the peer list lock after unlocking p would avoid the deadlock and allow us to do whatever locking is needed for psignal().

> Also, we might should check P_WEXIT and abort in fork1() if it is
> set. (We don't appear to do that presently.)
>

Probably, but the list is also modified in the exit code. All those processes that we are sending SIGKILL to are removing themselves from the list.
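
The dedicated peer-list lock suggested above might look roughly like the following. This is only a userland sketch with pthreads, not kernel code: struct proc is reduced to three fields, p_killed stands in for posting SIGKILL, and the names peers_lock, peers_cv, peer_insert, peer_remove, leader_kill_peers and leader_wait_peers are hypothetical and do not come from the patch under review.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userland stand-in for the few struct proc fields involved. */
struct proc {
	struct proc *p_leader;	/* task leader; a leader points at itself */
	struct proc *p_peers;	/* next peer in the leader's list */
	int p_killed;		/* stand-in for "SIGKILL was posted" */
};

/* One lock guards every p_peers link, independent of any per-process lock. */
static pthread_mutex_t peers_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t peers_cv = PTHREAD_COND_INITIALIZER;

/* fork path: link a new peer in right behind its leader. */
static void
peer_insert(struct proc *leader, struct proc *p)
{
	pthread_mutex_lock(&peers_lock);
	p->p_leader = leader;
	p->p_peers = leader->p_peers;
	leader->p_peers = p;
	pthread_mutex_unlock(&peers_lock);
}

/* exit path: a peer unlinks itself and wakes a leader that may be waiting. */
static void
peer_remove(struct proc *p)
{
	struct proc *q;

	assert(p->p_leader != NULL && p->p_leader != p);
	pthread_mutex_lock(&peers_lock);
	/* Find the predecessor of p, starting from the leader. */
	for (q = p->p_leader; q != NULL && q->p_peers != p; q = q->p_peers)
		continue;
	if (q != NULL) {
		q->p_peers = p->p_peers;
		p->p_peers = NULL;
	}
	pthread_cond_broadcast(&peers_cv);
	pthread_mutex_unlock(&peers_lock);
}

/* Leader's exit path, step 1: visit every peer while the list is pinned. */
static int
leader_kill_peers(struct proc *leader)
{
	struct proc *q;
	int n = 0;

	pthread_mutex_lock(&peers_lock);
	for (q = leader->p_peers; q != NULL; q = q->p_peers) {
		q->p_killed = 1;	/* the kernel would psignal(q, SIGKILL) here */
		n++;
	}
	pthread_mutex_unlock(&peers_lock);
	return (n);
}

/* Step 2: sleep until every peer has removed itself from the list. */
static void
leader_wait_peers(struct proc *leader)
{
	pthread_mutex_lock(&peers_lock);
	while (leader->p_peers != NULL)
		pthread_cond_wait(&peers_cv, &peers_lock);
	pthread_mutex_unlock(&peers_lock);
}
```

Because the traversal never depends on PROC_LOCK(p_leader), a peer that is blocked in fork() or exit() cannot deadlock against the leader. Whether psignal() could safely be called with such a lock held in the real kernel, and what the lock order against PROC_LOCK(q) would have to be, is left open here.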
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 14:26: 3 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9365237B407 for ; Mon, 7 Oct 2002 14:25:59 -0700 (PDT) Received: from canning.wemm.org (canning.wemm.org [192.203.228.65]) by mx1.FreeBSD.org (Postfix) with ESMTP id E469B43E6A for ; Mon, 7 Oct 2002 14:25:55 -0700 (PDT) (envelope-from peter@wemm.org) Received: from wemm.org (localhost [127.0.0.1]) by canning.wemm.org (Postfix) with ESMTP id C363B2A88D; Mon, 7 Oct 2002 14:25:45 -0700 (PDT) (envelope-from peter@wemm.org) X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4 To: Mikhail Teterin Cc: arch@FreeBSD.org Subject: Re: swapon some regular file In-Reply-To: <200210071630.42512.mi+mx@aldan.algebra.com> Date: Mon, 07 Oct 2002 14:25:45 -0700 From: Peter Wemm Message-Id: <20021007212545.C363B2A88D@canning.wemm.org> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Mikhail Teterin wrote: > Currently, swapon(2) will only succeed if the argument vn_isdisk(9), or > if it is an NFS-accessed file. > > Users wishing to swap onto a local regular file have to go through the > vnconfig/mdconfig gimnastics. Is that intentional? Yes, it is quite intentional. swap_pager doesn't have the code to do logical to physical translation that file IO would require. And the VOP_BMAP calls that do that add new complications, including an additional place it can sleep or run out of memory. We can get away with it for swapping to a file over NFS because the remote server does the translation, not us. In reality, what is required is some careful cut/paste of code from vnode_pager to swap_pager to add the missing bits, and some care to deal with the complications. 
vnconfig/mdconfig work because that basically adds the logical -> physical translation step. I'd just as soon not have to mess with this though. Cheers, -Peter -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com "All of this is for nothing if we don't go to the stars" - JMS/B5 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 14:33:18 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F25FC37B401 for ; Mon, 7 Oct 2002 14:33:16 -0700 (PDT) Received: from corbulon.video-collage.com (corbulon.video-collage.com [64.35.99.179]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5CB8643E86 for ; Mon, 7 Oct 2002 14:33:16 -0700 (PDT) (envelope-from mi+mx@aldan.algebra.com) Received: from misha.murex.com (250-217.customer.cloud9.net [168.100.250.217]) by corbulon.video-collage.com (8.12.2/8.12.2) with ESMTP id g97LX91P068436 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=FAIL); Mon, 7 Oct 2002 17:33:11 -0400 (EDT) (envelope-from mi+mx@aldan.algebra.com) X-Authentication-Warning: corbulon.video-collage.com: Host 250-217.customer.cloud9.net [168.100.250.217] claimed to be misha.murex.com Content-Type: text/plain; charset="koi8-u" From: Mikhail Teterin Organization: Virtual Estates, Inc. 
To: Peter Wemm Subject: Re: swapon some regular file Date: Mon, 7 Oct 2002 17:35:00 -0400 User-Agent: KMail/1.4.3 Cc: arch@FreeBSD.org References: <20021007212545.C363B2A88D@canning.wemm.org> In-Reply-To: <20021007212545.C363B2A88D@canning.wemm.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Message-Id: <200210071735.00722.mi+mx@aldan.algebra.com> X-Scanned-By: MIMEDefang 2.15 (www dot roaringpenguin dot com slash mimedefang) Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Monday 07 October 2002 05:25 pm, Peter Wemm wrote: = Mikhail Teterin wrote: = > Currently, swapon(2) will only succeed if the argument vn_isdisk(9), = > or if it is an NFS-accessed file. = > = > Users wishing to swap onto a local regular file have to go through = > the vnconfig/mdconfig gimnastics. Is that intentional? = We can get away with it for swapping to a file over NFS because the = remote server does the translation, not us. I see. = In reality, what is required is some careful cut/paste of code from = vnode_pager to swap_pager to add the missing bits, and some care to = deal with the complications. = vnconfig/mdconfig work because that basically adds the logical -> = physical translation step. I'd just as soon not have to mess with this = though. I guess, the question is, will doing this directly to a file be faster? Otherwise, vnconfig/mdconfig is Ok, if not too good-looking... For users' convenience, the creation of the md or vn devices can be put into swapon(8)... 
-mi

From owner-freebsd-arch Mon Oct 7 14:40:17 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F3AE137B404; Mon, 7 Oct 2002 14:40:12 -0700 (PDT) Received: from sccrmhc03.attbi.com (sccrmhc03.attbi.com [204.127.202.63]) by mx1.FreeBSD.org (Postfix) with ESMTP id 426E943E77; Mon, 7 Oct 2002 14:40:12 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by sccrmhc03.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021007214011.LSTQ22381.sccrmhc03.attbi.com@InterJet.elischer.org>; Mon, 7 Oct 2002 21:40:11 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id OAA35699; Mon, 7 Oct 2002 14:21:15 -0700 (PDT) Date: Mon, 7 Oct 2002 14:21:14 -0700 (PDT) From: Julian Elischer To: Sam Leffler Cc: freebsd-arch@freebsd.org, freebsd-net@freebsd.org Subject: Re: CFR: m_tag patch In-Reply-To: <13e901c26dbb$63059f60$52557f42@errno.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

If we make this a 'standard' method of adding metadata to a packet, then I'd like to move netgraph to use it too. It's always better to use a standard method, if it will do, than to roll your own. However, there are some things that would preclude me from doing so at the moment. These are not major issues, but they do give a little feeling of "banging a square peg into a round hole" if I try to use what you have to replace the netgraph metadata system.

Issues:

Firstly, each netgraph module type is ignorant of all other types (except in some special cases). Each 'type' labels its structures, control messages, and in fact special metadata that it keeps on a packet, using a special magic number. Each 'type' of netgraph node (e.g. ppp node, or frame-relay-mux node) defines a different 32 bit magic number (traditionally in netgraph it's just the 32 bit time since the epoch when the module was written). Metadata that is associated with a packet may only have meaning to a subset of the modules that touch that packet, so in effect the magic number is an API/ABI indicator. It specifies which API defines this metadata (or, in netgraph, this control message).

A particular set of modules may want to know about a common API which includes some metadata that they share. In this case it is permissible for them to agree about a common magic number to identify these things, stored in an include file special to that API interface. All modules that do not know about that magic number (do not include that API include file) must ignore that metadata. In this way 3rd party nodes can define their own metadata types, assuming that they do not have a magic-number collision with some other node type (pretty damned unlikely). Once they have seen that the metadata belongs to an API they understand, then, and only then, will they look further to try to understand the type of the metadata. There is enough data (e.g. size info) to allow ignorant nodes to free or bypass unknown metadata.

Your tags have a single 16 bit tag ID field. This is insufficient for my needs. I need to be able to store the API cookie, which is a 32 bit unsigned number, and on top of that, I also need a 16 bit type field that specifies what the data is within the given API, and a 16 bit length to allow opaque handling.

Your structure is:

/*
 * Packet tags structure.
 */
struct m_tag {
	SLIST_ENTRY(m_tag) m_tag_link;	/* List of packet tags */
	u_int16_t m_tag_id;		/* Tag ID */
	u_int16_t m_tag_len;		/* Length of data */
};

and mine is:

struct meta_field_header {
	u_long cookie;	/* cookie for the field. Skip fields you don't
			 * know about (same cookie as in messages) */
	u_short type;	/* field ID */
	u_short len;	/* total len of this field including extra
			 * data */
	char data[0];	/* data starts here */
};

Basically I'd need to add a 32 bit API identifier to the metadata to allow me to use it, since (for example) both ppp and frame relay could define a metadata of type "1". Since neither ppp nor frame relay knows of the other, they cannot cooperate on selecting command and metadata IDs. However we DO send ppp packets over frame relay links, so the chance that a packet has to pass through both node types is very real.

I would define a 'global' API that defines a few characteristics that would be useful for all modules to know about, e.g. packet priority. (Frame relay needs this to work right, for example, but it needs to be understood at the physical layer so that hi-pri packets can bypass low-pri packets.)

Your TAG IDs are centrally allocated, e.g. in mbuf.h:

/* Packet tag types -- first ones are from NetBSD */
#define PACKET_TAG_NONE 0 /* Nadda */
#define PACKET_TAG_IPSEC_IN_DONE 1 /* IPsec applied, in */
#define PACKET_TAG_IPSEC_OUT_DONE 2 /* IPsec applied, out */
#define PACKET_TAG_IPSEC_IN_CRYPTO_DONE 3 /* NIC IPsec crypto done */
#define PACKET_TAG_IPSEC_OUT_CRYPTO_NEEDED 4 /* NIC IPsec crypto req'ed */
#define PACKET_TAG_IPSEC_IN_COULD_DO_CRYPTO 5 /* NIC notifies IPsec */
#define PACKET_TAG_IPSEC_PENDING_TDB 6 /* Reminder to do IPsec */
#define PACKET_TAG_BRIDGE 7 /* Bridge processing done */
#define PACKET_TAG_GIF 8 /* GIF processing done */
#define PACKET_TAG_GRE 9 /* GRE processing done */
[etc.]
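
To make the cookie-plus-type idea concrete, here is a minimal userland sketch of tags keyed by a 32 bit API cookie and a 16 bit type, kept on a singly linked list. It is an illustration under assumptions, not the netgraph code or the m_tag patch: the names meta_tag, packet, tag_add, tag_find and tag_free_all are hypothetical, len counts payload bytes only (unlike the total length in meta_field_header), and the two cookie values are simply the example numbers used in this message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EX_FRAMERELAY_COOKIE	872148478u	/* value of NGM_FRAMERELAY_COOKIE */
#define EX_IPSEC_COOKIE		1034025045u	/* example number from this message */

/* Hypothetical tag: the cookie names the API, the type is private to it. */
struct meta_tag {
	struct meta_tag *next;
	uint32_t cookie;	/* which API defines this tag */
	uint16_t type;		/* meaning within that API only */
	uint16_t len;		/* payload bytes that follow (assumption) */
	unsigned char data[];	/* opaque payload */
};

/* Hypothetical packet: only the tag list matters for this sketch. */
struct packet {
	struct meta_tag *tags;
};

/* Attach a copy of data to the packet; returns NULL if allocation fails. */
static struct meta_tag *
tag_add(struct packet *pkt, uint32_t cookie, uint16_t type,
    const void *data, uint16_t len)
{
	struct meta_tag *t;

	assert(pkt != NULL);
	assert(data != NULL || len == 0);
	t = malloc(sizeof(*t) + len);
	if (t == NULL)
		return (NULL);
	t->cookie = cookie;
	t->type = type;
	t->len = len;
	if (len > 0)
		memcpy(t->data, data, len);
	t->next = pkt->tags;
	pkt->tags = t;
	return (t);
}

/* A node only ever matches tags that carry a cookie it knows about. */
static struct meta_tag *
tag_find(const struct packet *pkt, uint32_t cookie, uint16_t type)
{
	struct meta_tag *t;

	for (t = pkt->tags; t != NULL; t = t->next)
		if (t->cookie == cookie && t->type == type)
			return (t);
	return (NULL);
}

/* Any node can discard a packet without understanding a single tag. */
static void
tag_free_all(struct packet *pkt)
{
	struct meta_tag *t;

	while ((t = pkt->tags) != NULL) {
		pkt->tags = t->next;
		free(t);
	}
}
```

With this layout a frame relay tag of type 1 and an IPsec tag of type 1 can sit on the same packet without colliding, and a node that knows neither cookie can still free both.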

If you allocated your API definitions correctly using my scheme, you might allocate an API number of 1034025045, for example, to the IPSEC-CRYPTO interface, and that API would define its own IDs for metadata. This would not be able to accidentally match with the Priority metadata used for frame relay (if you sent IPsec over frame relay) because (for example) the frame relay API number is 872148478:

/usr/include/netgraph/ng_frame_relay.h line 48
#define NGM_FRAMERELAY_COOKIE 872148478

You wouldn't have to know about frame relay, and frame relay doesn't need to know about IPSEC, but it does know how to free the metadata if it needs to discard the packet. (This is a contrived example, because priority is a "BASE API" metadata type in netgraph, and the base API doesn't have a magic number at the moment but probably should have one; it certainly would in this case.)

As I mentioned before, it is also not clear to me that the metadata needs to be in linked list form, but I could live with it.

julian

From owner-freebsd-arch Mon Oct 7 14:40:22 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 05CC037B408; Mon, 7 Oct 2002 14:40:21 -0700 (PDT) Received: from sccrmhc03.attbi.com (sccrmhc03.attbi.com [204.127.202.63]) by mx1.FreeBSD.org (Postfix) with ESMTP id 56DE143E7B; Mon, 7 Oct 2002 14:40:20 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by sccrmhc03.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021007214019.LSWJ22381.sccrmhc03.attbi.com@InterJet.elischer.org>; Mon, 7 Oct 2002 21:40:19 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id OAA35732; Mon, 7 Oct 2002 14:29:24 -0700 (PDT) Date: Mon, 7 Oct 2002 14:29:22
-0700 (PDT) From: Julian Elischer To: Don Lewis Cc: jhb@FreeBSD.ORG, arch@FreeBSD.ORG, jmallett@FreeBSD.ORG Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., In-Reply-To: <200210072123.g97LNGvU033246@gw.catspoiler.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

On Mon, 7 Oct 2002, Don Lewis wrote:
> > Also, we might should check P_WEXIT and abort in fork1() if it is
> > set. (We don't appear to do that presently.)
> >
>
> Probably, but the list is also modified in the exit code. All those
> processes that we are sending SIGKILL to are removing themselves from
> the list.
>

If you are forking and exiting at once, you are obviously a threaded program. In that case the single-threading code in fork and exit will kick in: the fork() will abort if another thread got to exit() first, and exit() will delay until it is safe to proceed if another thread is in fork() first. (Not sure if this is relevant to what you are talking about.)
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 14:55:57 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 768F637B401 for ; Mon, 7 Oct 2002 14:55:56 -0700 (PDT) Received: from falcon.mail.pas.earthlink.net (falcon.mail.pas.earthlink.net [207.217.120.74]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3495743E42 for ; Mon, 7 Oct 2002 14:55:56 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0113.cvx22-bradley.dialup.earthlink.net ([209.179.198.113] helo=mindspring.com) by falcon.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17yfqr-0000ug-00; Mon, 07 Oct 2002 14:55:45 -0700 Message-ID: <3DA2028E.87632EE1@mindspring.com> Date: Mon, 07 Oct 2002 14:54:22 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Mikhail Teterin Cc: arch@FreeBSD.org Subject: Re: swapon some regular file References: <200210071630.42512.mi+mx@aldan.algebra.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Mikhail Teterin wrote: > Currently, swapon(2) will only succeed if the argument vn_isdisk(9), or > if it is an NFS-accessed file. > > Users wishing to swap onto a local regular file have to go through the > vnconfig/mdconfig gimnastics. Is that intentional? Yes. You have to understand that the code that does the swapping for NFS is different than the code that does it for devices. Local swap to files is via the device pager. 
> If not, should it be fixed by relaxing the swapon(2)'s to not require > the VFCF_NETWORK for regular files, or -- cosmeticly -- by modifying the > swapon(8) to do the vnconfig/mdconfig-ing inside? And writing a "file_pager.c to live in /usr/src/sys/vm... 8-). > In both cases, pstat will, probably, need improving to display the > regular file name in the -s case. It would have to be recorded (it's not), because there might be multiple links to it. The name really doesn't matter, since the file might have been removed from the FS (e.g., it may no longer *have* a name). -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 14:58:42 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9ECFE37B401 for ; Mon, 7 Oct 2002 14:58:41 -0700 (PDT) Received: from canning.wemm.org (canning.wemm.org [192.203.228.65]) by mx1.FreeBSD.org (Postfix) with ESMTP id 559AA43E6E for ; Mon, 7 Oct 2002 14:58:41 -0700 (PDT) (envelope-from peter@wemm.org) Received: from wemm.org (localhost [127.0.0.1]) by canning.wemm.org (Postfix) with ESMTP id 371562A88D; Mon, 7 Oct 2002 14:58:41 -0700 (PDT) (envelope-from peter@wemm.org) X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4 To: Terry Lambert Cc: Mikhail Teterin , arch@FreeBSD.org Subject: Re: swapon some regular file In-Reply-To: <3DA2028E.87632EE1@mindspring.com> Date: Mon, 07 Oct 2002 14:58:41 -0700 From: Peter Wemm Message-Id: <20021007215841.371562A88D@canning.wemm.org> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Terry Lambert wrote: > Mikhail Teterin wrote: > > Currently, swapon(2) will only succeed if the argument vn_isdisk(9), or > > if it is an NFS-accessed file. 
> > > > Users wishing to swap onto a local regular file have to go through the > > vnconfig/mdconfig gimnastics. Is that intentional? > > Yes. You have to understand that the code that does the swapping > for NFS is different than the code that does it for devices. Local > swap to files is via the device pager. Nope. swap_pager, not device_pager. device_pager is the glue for mmaping character devices. Cheers, -Peter -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com "All of this is for nothing if we don't go to the stars" - JMS/B5 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 15: 4:51 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1EDB937B404 for ; Mon, 7 Oct 2002 15:04:50 -0700 (PDT) Received: from falcon.mail.pas.earthlink.net (falcon.mail.pas.earthlink.net [207.217.120.74]) by mx1.FreeBSD.org (Postfix) with ESMTP id B9EE043E75 for ; Mon, 7 Oct 2002 15:04:49 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0113.cvx22-bradley.dialup.earthlink.net ([209.179.198.113] helo=mindspring.com) by falcon.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17yfzY-0006C8-00; Mon, 07 Oct 2002 15:04:45 -0700 Message-ID: <3DA204A7.50530BE5@mindspring.com> Date: Mon, 07 Oct 2002 15:03:19 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Peter Wemm Cc: Mikhail Teterin , arch@FreeBSD.org Subject: Re: swapon some regular file References: <20021007212545.C363B2A88D@canning.wemm.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Peter Wemm wrote: > > Users wishing to swap onto a 
local regular file have to go through the > > vnconfig/mdconfig gimnastics. Is that intentional? > > Yes, it is quite intentional. swap_pager doesn't have the code to do > logical to physical translation that file IO would require. And the > VOP_BMAP calls that do that add new complications, including an additional > place it can sleep or run out of memory. > > We can get away with it for swapping to a file over NFS because the remote > server does the translation, not us. > > In reality, what is required is some careful cut/paste of code from > vnode_pager to swap_pager to add the missing bits, and some care to deal > with the complications. > > vnconfig/mdconfig work because that basically adds the logical -> physical > translation step. I'd just as soon not have to mess with this though. It would be useful to be able to ask a file for its list of physical blocks on the underlying device, so that you could sort them into contiguous extents, and then use *those*, instead of eating the translation overhead, each time... 
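The idea of asking a file for its physical blocks and sorting them into contiguous extents can be sketched in userland C. This is only an illustration under stated assumptions: struct extent, coalesce_extents() and the block numbers are hypothetical names invented for the sketch, the block list is assumed to hold distinct values, and nothing here is an existing FreeBSD interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: one run of physically contiguous device blocks. */
struct extent {
	long start;	/* first physical block number in the run */
	long count;	/* number of blocks in the run */
};

/* qsort(3) comparator for physical block numbers. */
static int
cmp_block(const void *a, const void *b)
{
	long x = *(const long *)a;
	long y = *(const long *)b;

	return ((x > y) - (x < y));
}

/*
 * Sort blocks[] in place, then merge neighbouring block numbers into
 * extents.  out[] must have room for n entries (the worst case, when
 * no two blocks are adjacent).  Returns the number of extents written.
 */
static size_t
coalesce_extents(long *blocks, size_t n, struct extent *out)
{
	size_t i, last = 0;

	assert(blocks != NULL && out != NULL);
	if (n == 0)
		return (0);

	qsort(blocks, n, sizeof(blocks[0]), cmp_block);

	out[0].start = blocks[0];
	out[0].count = 1;
	for (i = 1; i < n; i++) {
		if (blocks[i] == out[last].start + out[last].count) {
			/* Block continues the current run. */
			out[last].count++;
		} else {
			/* Gap on the device: start a new extent. */
			last++;
			out[last].start = blocks[i];
			out[last].count = 1;
		}
	}
	return (last + 1);
}
```

Once the extents exist, a swap-style consumer would pay the logical-to-physical lookup a single time when the file is configured, rather than on every I/O.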
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 15: 7:10 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 15BE937B401 for ; Mon, 7 Oct 2002 15:07:09 -0700 (PDT) Received: from freebie.xs4all.nl (freebie.xs4all.nl [213.84.32.253]) by mx1.FreeBSD.org (Postfix) with ESMTP id 02D5143E7B for ; Mon, 7 Oct 2002 15:07:08 -0700 (PDT) (envelope-from wkb@freebie.xs4all.nl) Received: from freebie.xs4all.nl (localhost [127.0.0.1]) by freebie.xs4all.nl (8.12.6/8.12.6) with ESMTP id g97M6uFr000617; Tue, 8 Oct 2002 00:06:57 +0200 (CEST) (envelope-from wkb@freebie.xs4all.nl) Received: (from wkb@localhost) by freebie.xs4all.nl (8.12.6/8.12.6/Submit) id g97M6uSv000616; Tue, 8 Oct 2002 00:06:56 +0200 (CEST) Date: Tue, 8 Oct 2002 00:06:56 +0200 From: Wilko Bulte To: Terry Lambert Cc: Peter Wemm , Mikhail Teterin , arch@FreeBSD.ORG Subject: Re: swapon some regular file Message-ID: <20021008000656.A598@freebie.xs4all.nl> References: <20021007212545.C363B2A88D@canning.wemm.org> <3DA204A7.50530BE5@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3DA204A7.50530BE5@mindspring.com>; from tlambert2@mindspring.com on Mon, Oct 07, 2002 at 03:03:19PM -0700 X-OS: FreeBSD 4.7-RC X-PGP: finger wilko@freebsd.org Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, Oct 07, 2002 at 03:03:19PM -0700, Terry Lambert wrote: > Peter Wemm wrote: > > > Users wishing to swap onto a local regular file have to go through the > > > vnconfig/mdconfig gimnastics. Is that intentional? > > > > Yes, it is quite intentional. 
swap_pager doesn't have the code to do > > logical to physical translation that file IO would require. And the > > VOP_BMAP calls that do that add new complications, including an additional > > place it can sleep or run out of memory. > > > > We can get away with it for swapping to a file over NFS because the remote > > server does the translation, not us. > > > > In reality, what is required is some careful cut/paste of code from > > vnode_pager to swap_pager to add the missing bits, and some care to deal > > with the complications. > > > > vnconfig/mdconfig work because that basically adds the logical -> physical > > translation step. I'd just as soon not have to mess with this though. > > It would be useful to be able to ask a file for its list of > physical blocks on the underlying device, so that you could > sort them into contiguous extents, and then use *those*, > instead of eating the translation overhead, each time... Sounds a bit like VMS (IIRC..) :) -- | / o / /_ _ wilko@FreeBSD.org |/|/ / / /( (_) Bulte Arnhem, the Netherlands To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 15: 9:18 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 80A0337B401 for ; Mon, 7 Oct 2002 15:09:17 -0700 (PDT) Received: from falcon.mail.pas.earthlink.net (falcon.mail.pas.earthlink.net [207.217.120.74]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3E70443E42 for ; Mon, 7 Oct 2002 15:09:17 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0113.cvx22-bradley.dialup.earthlink.net ([209.179.198.113] helo=mindspring.com) by falcon.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17yg3u-0004t9-00; Mon, 07 Oct 2002 15:09:15 -0700 Message-ID: <3DA205B5.2ED5548@mindspring.com> Date: Mon, 07 Oct 2002 15:07:49 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; 
U) X-Accept-Language: en MIME-Version: 1.0 To: Mikhail Teterin Cc: Peter Wemm , arch@FreeBSD.org Subject: Re: swapon some regular file References: <20021007212545.C363B2A88D@canning.wemm.org> <200210071735.00722.mi+mx@aldan.algebra.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Mikhail Teterin wrote: > For users' convenience, the creation of the md or vn devices can > be put into swapon(8)... Please don't. Please write a shell script, instead. Commands with side effects, especially one like requiring drivers that may not be compiled into your kernel, are a bad idea. It would be a *much* better idea to create a file_pager.c, which would let us go back to your original suggestion. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 15:14:54 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BA41B37B401 for ; Mon, 7 Oct 2002 15:14:53 -0700 (PDT) Received: from scaup.mail.pas.earthlink.net (scaup.mail.pas.earthlink.net [207.217.120.49]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1179643E6E for ; Mon, 7 Oct 2002 15:14:53 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0113.cvx22-bradley.dialup.earthlink.net ([209.179.198.113] helo=mindspring.com) by scaup.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17yg9E-0007Wg-00; Mon, 07 Oct 2002 15:14:44 -0700 Message-ID: <3DA206FD.70C081C9@mindspring.com> Date: Mon, 07 Oct 2002 15:13:17 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Peter Wemm Cc: Mikhail Teterin , arch@FreeBSD.org Subject: Re: swapon some regular file References: 
<20021007215841.371562A88D@canning.wemm.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Peter Wemm wrote: > > Yes. You have to understand that the code that does the swapping > > for NFS is different than the code that does it for devices. Local > > swap to files is via the device pager. > > Nope. swap_pager, not device_pager. device_pager is the glue for mmaping > character devices. You are, of course, correct. I meant to be descriptive, not give the file name, minus the "_" and the ".c". 8-). -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 15:36:52 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8C4AD37B401; Mon, 7 Oct 2002 15:36:49 -0700 (PDT) Received: from swan.mail.pas.earthlink.net (swan.mail.pas.earthlink.net [207.217.120.123]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1A8D943E65; Mon, 7 Oct 2002 15:36:49 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0113.cvx22-bradley.dialup.earthlink.net ([209.179.198.113] helo=mindspring.com) by swan.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17ygUR-0007mN-00; Mon, 07 Oct 2002 15:36:39 -0700 Message-ID: <3DA20C1C.A4B863B7@mindspring.com> Date: Mon, 07 Oct 2002 15:35:08 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Julian Elischer Cc: Sam Leffler , freebsd-arch@freebsd.org, freebsd-net@freebsd.org Subject: Re: CFR: m_tag patch References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) 
List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Julian Elischer wrote: > Firstly, each netgraph module type is ignorant of > all other types (except in some special cases). Each 'type' > labels its structures, control messages, and in fact special metadata > that it keeps on a packet, using a special magic number. Each 'type' > of netgraph node (e.g. ppp node, or frame-relay-mux node) > defines a different 32 bit magic number (traditionally in netgraph > it's just the 32 bit time since the epoch when the module was written). I think the biggest problem is that mbuf lists are not the same thing as packets, and they may not even contain *whole* packets, depending on where in the process you are examining them. Therefore the idea of "packet attributes" is broken to begin with -- particularly, since if the attribute results in TCP options being set on all frags for the packet in question, you will change the amount of real data being sent in the frag. If you are going to treat it as an attribute, then the abstract thing you need to be passing around is a packet, not an mbuf chain. This is probably not a good idea, in general, even if it would let you do nifty things. One obvious application here is to communicate flow information from a front end router to a back end server and/or load balancer, in order to permit it to make better decisions. Such information would be tunnelled in option data on the payload packets between the router and the other end. > Your tags have a single 16 bit tag ID field. > This is insufficient for my needs. > I need to be able to store the API cookie which is a 32 bit > unsigned number, and on top of that, I also need a 16 bit type field > that specifies what the data is within the given API and a 16 bit > length to allow opaque handling. This is insufficient for Alpha and other 64 bit architectures. I think what you are asking for is really a 'void *'.
The other issue here is that your idea of an opaque API/ABI indicator is in conflict, unless you say that this is a pointer, and then format the initial information pointed to by the pointer. Otherwise, you will need a small indirection structure that's pointed to by the pointer, AND which contains the API/ABI identifier (i.e. you will need two pieces of information for that, not one -- which is what you show, but not what you describe in your text). [ ... ] > If you allocated your API definitions correctly > using my scheme, you might allocate an API number of > 1034025045 for example to the IPSEC-CRYPTO interface. > and that API would define its own IDs for metadata. > This would not be able to accidentally match with the Priority > metadata used for frame relay (if you sent Ipsec over frame relay) > because (for example) the frame relay API number is 872148478 > /usr/include/netgraph/ng_frame_relay.h line 48 > > #define NGM_FRAMERELAY_COOKIE 872148478 This is moderately bogus. Specifically, if you are going to register new types without an assigned numbers authority (e.g. if I have a vendor private extension, which I wish to implement, yet not have collide with someone else's vendor private extension or a future FreeBSD "standard extension"), then you need to implement a registration interface for named registration, and use *that*. The easiest way to do this would be to ensure that you use the *runtime* kernel address as your identifier, which guarantees that it will be unique in any given system. Yes, I realize that this means that a switch statement may not be used internally to dispatch. In theory, this metadata is an exceptional condition, rather than a common condition (otherwise, the overhead becomes prohibitive, I think, and you might as well put it inline). I think the value of a unique identifier is more important than the ability to switch.
If it comes down to it, something which does not conflict is more useful than something which would be faster if it didn't conflict, but doesn't run at all because it doesn't get the chance. > You wouldn't have to know about frame relay and frame relay doesn't need > to know about IPSEC, but it does know how to free the metadata if it > needs to discard the packet. Yes. This implies that metadata ownership indicates a per-metadata destructor type; for example, a reference to an object that will no longer be referenced, should the packet be discarded. > As I mentioned before, it is also not clear to me that the metadata > needs to be in linked list form, but I could live with it. I think it has to. The reason he has this is pretty clear from his crypto work, and the reason for the linked list is to, in the limit, allow a linear traversal of the list elements to find data that's relevant to you. It's kind of ugly, but "anything that works is better than anything that doesn't"... it at least guarantees that it *can* work. 
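The points argued in this thread (a 32-bit API cookie plus a 16-bit type, a linear walk of a linked list, and a per-tag destructor so that a node ignorant of a cookie can still discard the tag) can be pictured with a small userland sketch. Everything named here is hypothetical: struct xtag and the xtag_*() functions are invented for illustration and are not the m_tag interface from the patch under review, nor the netgraph meta API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tag header; len bytes of payload follow it in memory. */
struct xtag {
	struct xtag *next;		/* singly linked list of tags */
	uint32_t cookie;		/* API/ABI identifier */
	uint16_t type;			/* meaning of the data within that API */
	uint16_t len;			/* payload length, for opaque handling */
	void (*free_fn)(struct xtag *);	/* per-tag destructor */
};

/* Default destructor: the tag owns nothing beyond its own allocation. */
static void
xtag_default_free(struct xtag *t)
{
	free(t);
}

/* Allocate a tag with len bytes of zeroed payload; NULL on failure. */
static struct xtag *
xtag_alloc(uint32_t cookie, uint16_t type, uint16_t len)
{
	struct xtag *t = calloc(1, sizeof(*t) + len);

	if (t == NULL)
		return (NULL);
	t->cookie = cookie;
	t->type = type;
	t->len = len;
	t->free_fn = xtag_default_free;
	return (t);
}

/* Push a tag onto the front of the list. */
static void
xtag_prepend(struct xtag **head, struct xtag *t)
{
	assert(head != NULL && t != NULL);
	t->next = *head;
	*head = t;
}

/* Linear search: first tag matching both cookie and type, or NULL. */
static struct xtag *
xtag_find(struct xtag *head, uint32_t cookie, uint16_t type)
{
	struct xtag *t;

	for (t = head; t != NULL; t = t->next)
		if (t->cookie == cookie && t->type == type)
			return (t);
	return (NULL);
}

/* Discard every tag without needing to understand any cookie. */
static void
xtag_free_all(struct xtag **head)
{
	struct xtag *t, *next;

	for (t = *head; t != NULL; t = next) {
		next = t->next;		/* read before the tag is destroyed */
		t->free_fn(t);
	}
	*head = NULL;
}
```

Two modules that both define a type of 1 do not collide as long as their cookies differ, which is the property the thread is after; the cost is a list walk on each lookup.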
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 15:45: 7 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E07DB37B428 for ; Mon, 7 Oct 2002 15:44:53 -0700 (PDT) Received: from mail.speakeasy.net (mail14.speakeasy.net [216.254.0.214]) by mx1.FreeBSD.org (Postfix) with ESMTP id 943A143E3B for ; Mon, 7 Oct 2002 15:44:52 -0700 (PDT) (envelope-from jhb@FreeBSD.org) Received: (qmail 14220 invoked from network); 7 Oct 2002 22:44:53 -0000 Received: from unknown (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) by mail14.speakeasy.net (qmail-ldap-1.03) with DES-CBC3-SHA encrypted SMTP for ; 7 Oct 2002 22:44:53 -0000 Received: from laptop.baldwin.cx (laptop.baldwin.cx [192.168.0.4]) by server.baldwin.cx (8.12.6/8.12.6) with ESMTP id g97Mipn5003705; Mon, 7 Oct 2002 18:44:51 -0400 (EDT) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.5.2 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <200210072123.g97LNGvU033246@gw.catspoiler.org> Date: Mon, 07 Oct 2002 18:44:55 -0400 (EDT) From: John Baldwin To: Don Lewis Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., Cc: jmallett@FreeBSD.ORG, arch@FreeBSD.ORG Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 07-Oct-2002 Don Lewis wrote: > On 7 Oct, John Baldwin wrote: >> >> On 05-Oct-2002 Don Lewis wrote: >>> On 5 Oct, Juli Mallett wrote: >>> >>>> diff -Nrdu -x *CVS* -x *dev* sys/kern/kern_exit.c kernel/kern/kern_exit.c >>>> --- sys/kern/kern_exit.c Tue Oct 1 12:15:51 2002 >>>> +++ kernel/kern/kern_exit.c Sat Oct 5 01:20:57 2002 >>> >>>> @@ -209,12 +210,12 @@ 
>>>> PROC_LOCK(p); >>>> if (p == p->p_leader) { >>>> q = p->p_peers; >>>> + PROC_UNLOCK(p); >>>> while (q != NULL) { >>>> - PROC_LOCK(q); >>>> psignal(q, SIGKILL); >>>> - PROC_UNLOCK(q); >>>> q = q->p_peers; >>>> } >>>> + PROC_LOCK(p); >>>> while (p->p_peers) >>>> msleep(p, &p->p_mtx, PWAIT, "exit1", 0); >>>> } >>> >>> This scary looking fragment of code in exit1() is relying on the lock on >>> p->p_leader being continuously held to prevent the p_peers list from >>> changing while the list traversal is in progress. The code in >>> kern_fork.c and elsewhere in kern_exit.c holds a lock on p_leader while >>> the list modifications are done. >>> >>> The existing code looks like it could deadlock if q is locked because it >>> is in fork() or exit(). Process p will block when it tries to lock q, >>> and q will block when it tries to lock its p_leader, which happens to be >>> p. >> >> Ugh. Probably the code should be changed to do something like this: >> >> --- kern_exit.c 2 Oct 2002 23:12:01 -0000 1.181 >> +++ kern_exit.c 7 Oct 2002 18:48:18 -0000 >> @@ -203,17 +203,18 @@ >> */ >> >> p->p_flag |= P_WEXIT; >> - PROC_UNLOCK(p); >> >> /* Are we a task leader? */ >> - PROC_LOCK(p); >> if (p == p->p_leader) { >> q = p->p_peers; >> while (q != NULL) { >> + nq = q->p_peers; >> + PROC_UNLOCK(p); >> PROC_LOCK(q); >> psignal(q, SIGKILL); >> PROC_UNLOCK(q); >> - q = q->p_peers; >> + PROC_LOCK(p); >> + q = nq; >> } >> while (p->p_peers) >> msleep(p, &p->p_mtx, PWAIT, "exit1", 0); > > It's not obvious to me that your alternative is safe. It avoids the > deadlock problem, but what keeps the list from changing while it is > being traversed, especially while we're waiting for PROC_LOCK(q)? It > separate lock for the peer list (instead of using PROC_LOCK(p_leader)) > looks like the obvious fix. Grabbing the peer list lock after unlocking > P would avoid the deadlock and allow us to do whatever locking is needed > for psignal(). Hmm, you are right. Yuck. 
*sigh* I think we need to check P_WEXIT in fork1() for this to really DTRT as well. >> Also, we might should check P_WEXIT and abort in fork1() if it is >> set. (We don't appear to do that presently.) >> > > Probably, but the list is also modified in the exit code. All those > processes that we are sending SIGKILL to are removing themselves from > the list. Processes dieing from SIGKILL that we send them aren't a problem since we have already read their p_peers member before we kill them. That's the point of 'nq'. The problem is that 'nq' could exit and could be an invalid pointer. If a process later in the list after 'nq' died that is not a problem either. Well, how about this: http://www.FreeBSD.org/~jhb/patches/ppeers.patch -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 15:50:18 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7BBEF37B401 for ; Mon, 7 Oct 2002 15:50:16 -0700 (PDT) Received: from swan.mail.pas.earthlink.net (swan.mail.pas.earthlink.net [207.217.120.123]) by mx1.FreeBSD.org (Postfix) with ESMTP id 02BA443E6A for ; Mon, 7 Oct 2002 15:50:16 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0113.cvx22-bradley.dialup.earthlink.net ([209.179.198.113] helo=mindspring.com) by swan.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17yghR-0004um-00; Mon, 07 Oct 2002 15:50:05 -0700 Message-ID: <3DA20F40.D3C3FD59@mindspring.com> Date: Mon, 07 Oct 2002 15:48:32 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Wilko Bulte Cc: Peter Wemm , Mikhail Teterin , arch@FreeBSD.ORG Subject: Re: swapon some regular file References: <20021007212545.C363B2A88D@canning.wemm.org> 
<3DA204A7.50530BE5@mindspring.com> <20021008000656.A598@freebie.xs4all.nl> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Wilko Bulte wrote: > > > vnconfig/mdconfig work because that basically adds the logical -> physical > > > translation step. I'd just as soon not have to mess with this though. > > > > It would be useful to be able to ask a file for its list of > > physical blocks on the underlying device, so that you could > > sort them into contiguous extents, and then use *those*, > > instead of eating the translation overhead, each time... > > Sounds a bit like VMS (IIRC..) > > :) Yes. Windows does the same thing, itself, for its own swap files. If you write an IFS for Windows, you will find that if you do not implement this optional-to-implement interface, you will not be able to configure swapping on the device in question. I was really more concerned with Peter's point, about getting around the problem of translation. One way to do it would be to front-load the cost, so that you only incur the overhead one time. The cost to swapping to a file with a vnconfig'ed device is that you pay the FS<->block translation penalty on each and every I/O. Creating this interface would avoid it. I haven't really paid a heck of a lot of attention to Poul's version of the slice code, as far as implementation details go; the overhead of translation layers, at least for linear translation, should be able to be optimized out, by way of ordered block lists plus the underlying device: most of these translations should be possible to complete statically, once. Poul said that there was a per-layer cost to using GEOM, which implies that he doesn't do this: in theory, you should be able to collapse all references to a single layer, no matter what.
If he did this, you could do the same thing for a discontiguous aggregate array of block extents -- which means that they could come from a file, or stripe sets, or whatever. The interesting thing in the swap case is that you care more about the physicality of the blocks, anyway: the implication there is that translation layers are a bad idea, no matter what, if there is more than one of them. The other interesting thing in the swap case is that there is not a "right" order: so long as the blocks are physically contiguous, and the translation proceeds reversibly, you will get the same data in as out, even if the logical ordering of the blocks is not maintained at the upper layer (all you *really* care about is that the data at a logical offset in is the same as the data at the logical offset out). The only layers you cannot collapse are content translation, and content size change (e.g. a crypto layer using a one-time-pad, or a compression layer, using a fixed compression ratio and block reallocation to approximate physical contiguity). Pretty much, that's basically everything but the sample AES layer, at this point, which could have its overhead squeezed out of it. 
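For the purely linear case, collapsing a stack of translation layers into a single one, computed once, might look like the sketch below. It is a userland illustration under stated assumptions and not GEOM code: struct linear_layer and collapse_layers() are hypothetical names, each layer is assumed to map its block b to block base + b of the layer beneath it, and arithmetic overflow is ignored.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical linear layer: block b of this layer (0 <= b < len)
 * lives at block base + b of the layer directly below it.
 */
struct linear_layer {
	long base;
	long len;
};

/*
 * Collapse layers[0..n-1] (topmost first, lowest last) into a single
 * equivalent layer, so a lookup costs one addition instead of n.
 * Returns 0 on success, or -1 if the stack is empty or some layer
 * does not fit inside the layer it sits on.
 */
static int
collapse_layers(const struct linear_layer *layers, size_t n,
    struct linear_layer *out)
{
	long off = 0;	/* start of the top layer, seen from layer i */
	long len;
	size_t i;

	assert(layers != NULL && out != NULL);
	if (n == 0 || layers[0].len < 0)
		return (-1);

	len = layers[0].len;
	for (i = 0; i < n; i++) {
		/* The window [off, off + len) must lie within layer i. */
		if (layers[i].base < 0 || off + len > layers[i].len)
			return (-1);
		off += layers[i].base;
	}
	out->base = off;
	out->len = len;
	return (0);
}
```

Layers that change content or size (the crypto and compression cases mentioned above) do not fit this model, since they are not a fixed offset.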
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 15:52:57 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7029537B401; Mon, 7 Oct 2002 15:52:53 -0700 (PDT) Received: from ebb.errno.com (ebb.errno.com [66.127.85.87]) by mx1.FreeBSD.org (Postfix) with ESMTP id D304D43E3B; Mon, 7 Oct 2002 15:52:52 -0700 (PDT) (envelope-from sam@errno.com) Received: from melange (melange.errno.com [66.127.85.82]) (authenticated bits=0) by ebb.errno.com (8.12.5/8.12.1) with ESMTP id g97Mqo1H005527 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Mon, 7 Oct 2002 15:52:50 -0700 (PDT) (envelope-from sam@errno.com) X-Authentication-Warning: ebb.errno.com: Host melange.errno.com [66.127.85.82] claimed to be melange Message-ID: <185201c26e54$43339f40$52557f42@errno.com> From: "Sam Leffler" To: "Julian Elischer" Cc: , References: Subject: Re: CFR: m_tag patch Date: Mon, 7 Oct 2002 15:52:49 -0700 Organization: Errno Consulting MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4807.1700 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG > If we make this a 'standard' method of adding metadata to a packet, then > I'd like to move netgraph to use it too. It's always better > to use a standard method if it will do than to roll your own. > > However, there are some things that would preclude me from doing so at > the moment.
These are not major issues but they do give a little feeling > of "banging a square peg into a round hole" if I try to use what you have > for replacing the netgraph metadata system. > > Issues: > > Firstly, each netgraph module type is ignorant of > all other types (except in some special cases). Each 'type' > labels its structures, control messages, and in fact special metadata > that it keeps on a packet, using a special magic number. Each 'type' > of netgraph node (e.g. ppp node, or frame-relay-mux node) > defines a different 32 bit magic number (traditionally in netgraph > it's just the 32 bit time since the epoch when the module was written). > > Metadata that is associated with a packet may only have meaning > to a subset of the modules that touch that packet. So in effect > the magic number is an API/ABI indicator. It specifies which API > defines this metadata. (or in netgraph, this control message). > A particular set of modules may want to know about a common API which > includes some metadata that they share. In this case it is permissible > for them to agree about a common magic number to identify these > things, stored in an include file special to that API interface. > > All modules that do not know about that magic number (do not include > that API include file) must ignore that metadata. In this way > 3rd party nodes can define their own metadata > types, assuming that they do not have a magic-number collision with some > other node type (pretty damned unlikely). Once they have seen that > the metadata belongs to an API they understand then, and only then will > they look further to try to understand the type of the metadata. > There is enough data (e.g. size info) to allow ignorant nodes to free or > bypass > unknown metadata. > > > Your tags have a single 16 bit tag ID field. > This is insufficient for my needs.
> I need to be able to store the API cookie which is a 32 bit > unsigned number, and on top of that, I also need a 16 bit type field > that specifies what the data is within teh given API and a 16 bit > length to allow opaque handling. > > your stucture is: > /* > * Packet tags structure. > */ > struct m_tag { > SLIST_ENTRY(m_tag) m_tag_link; /* List of packet tags */ > u_int16_t m_tag_id; /* Tag ID */ > u_int16_t m_tag_len; /* Length of data */ > }; > > > and mine is: > > struct meta_field_header { > u_long cookie; /* cookie for the field. Skip fields you don't > * know about (same cookie as in messages) */ > u_short type; /* field ID */ > u_short len; /* total len of this field including extra > * data */ > char data[0]; /* data starts here */ > }; > > Basically I'd need to add a 32 bit API identifier > to the metadata to allow me to use it. Since (for example) > both ppp and frame relay could define a metadata of type "1". > > Since neither ppp nor frame realy knows of the other they can not > co-operate on selecting command and metadata IDs. However > we DO send ppp packets over frame relay links, so the chance that > a packet has to pass through both node types is very real. > > I would define a 'global' API that defines a few characteristics that > would be useful for all modules to know about. > e.g. packet priority. (frame relay needs this to work right > for example, but it needs to be understood at the physical > layer so that hi-pri packets can bypass low-pri packets) > So all this is to say that you want m_tag_id to be u_int32_t. Having another 16-bit field is immaterial to anything I've thought of but given that switching to a 32-bit tag_id will require allocating an additional 32-bit item you can have that for free since there's little point in having a 32-bit length. The obvious downside is that m_tag structs now go from 8 bytes to 12, but this is still a darn sight better than allocating a 256 byte mbuf. > Your TAG IDS are centrally allocated. 
> e.g. in mbuf.h: > /* Packet tag types -- first ones are from NetBSD */ > > #define PACKET_TAG_NONE 0 /* Nadda */ > #define PACKET_TAG_IPSEC_IN_DONE 1 /* IPsec applied, in */ > #define PACKET_TAG_IPSEC_OUT_DONE 2 /* IPsec applied, out */ > #define PACKET_TAG_IPSEC_IN_CRYPTO_DONE 3 /* NIC IPsec crypto done */ > #define PACKET_TAG_IPSEC_OUT_CRYPTO_NEEDED 4 /* NIC IPsec crypto req'ed */ > #define PACKET_TAG_IPSEC_IN_COULD_DO_CRYPTO 5 /* NIC notifies IPsec */ > #define PACKET_TAG_IPSEC_PENDING_TDB 6 /* Reminder to do IPsec */ > #define PACKET_TAG_BRIDGE 7 /* Bridge processing done */ > #define PACKET_TAG_GIF 8 /* GIF processing done */ > #define PACKET_TAG_GRE 9 /* GRE processing done */ > [etc.] > > If you allocated your API definitions corectly > using my scheme, you might allocate a API number of > 1034025045 for example to the IPSEC-CRYPTO interface. > and that API would define it's own IDS for metadata. > This would not be able to accidentally match with the Priority > metadata used for frame relay (if you sent Ipsec over frame relay) > because (for example) teh frame relay API number is 872148478 > /usr/include/netgraph/ng_frame_relay.h line 48 > > #define NGM_FRAMERELAY_COOKIE 872148478 > > You wouldn't have to know about frame relay and frame relay doesn't need > to know about IPSEC, but it does know how to free the metadata if it > needs to discard the packet. > If you allocate tag id's using your 32-bit time scheme then the fixed values above would never be hit since they are all for impossible times and so there'd be no conflict. > > (this is a contrived example because priority is a "BASE API" metadata > type in netgraph, and the base API doesn't have a magic number at the > moment but probably should have one. (certainly would in this case) > > > As I mentionned before, it is also not clear to me that the metadata > needs to be in linked list form, but I could live with it. > Sounds to me like the real issue for you is insuring unique m_tag_id values. 
We're certainly less likely to have collisions with a 32-bit value than a 16-bit value and expanding this way gives you your "field ID" too. I guess the question I have is whether the existing API's that search only by "cookie" are sufficient for your needs. If so then I'm ok with changing things. Otherwise we have an API incompatibility with openbsd that I'd like to avoid. Sam To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 16:20:13 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 121FA37B401; Mon, 7 Oct 2002 16:20:09 -0700 (PDT) Received: from rwcrmhc53.attbi.com (rwcrmhc53.attbi.com [204.127.198.39]) by mx1.FreeBSD.org (Postfix) with ESMTP id A3B5643E6A; Mon, 7 Oct 2002 16:20:08 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc53.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021007232007.OMXC18767.rwcrmhc53.attbi.com@InterJet.elischer.org>; Mon, 7 Oct 2002 23:20:07 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id QAA36302; Mon, 7 Oct 2002 16:11:13 -0700 (PDT) Date: Mon, 7 Oct 2002 16:11:12 -0700 (PDT) From: Julian Elischer To: Terry Lambert Cc: Sam Leffler , freebsd-arch@freebsd.org, freebsd-net@freebsd.org Subject: Re: CFR: m_tag patch In-Reply-To: <3DA20C1C.A4B863B7@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, 7 Oct 2002, Terry Lambert wrote: > > > > Your tags have a single 16 bit tag ID field. > > This is insufficient for my needs. 
> > I need to be able to store the API cookie which is a 32 bit > > unsigned number, and on top of that, I also need a 16 bit type field > > that specifies what the data is within teh given API and a 16 bit > > length to allow opaque handling. > > This is insufficient for Alpha and other 64 bit architectures. I > think what you are asking for is really a 'void *'. IT IS NOT A POINTER! it is a 32 bit unsigned number. > > The other issue here is that your idea of an opaque API/ABI indicator > is in conflict, unless you say that this is a pointer, and then format > the initial information pointed to by the pointer. Otherwise, you > will need a small indirection structure that's pointed to the pointer, > AND which contains the API/ABI identifier (i.e. you will need two, not > one piece of information for that -- which is what you show, but not > what you describe in your text). it is not used to look up anything.. it is used to verify only. it is just working on the principal that there is not going to be a collision in the 32 bit space. Especially when we create them from "time since the epoch", and when teh various authors can see each other's choices of value. > This is moderately bogus. no it is not. I estimate that the chance of having a collision given all the factors is 1:2^50 or so ASSUMING THAT 1000000 PEOPLE DEVELOP THEIR OWN MODULES AND DO NOT CHECK THEM IN BUT DO ALL SHARE THEM WITH EACH OTHER. > > Specifically, if you are going to register in new types without an > assigned numbers authority (e.g. if I have a vendor private extension, > which I wish to implement, yet not have collide with someone else's > vendor private extension or a future FreeBSD "standard extension"), > then you need to implement a registration interface for named > registration, and use *that*. 
Terry, if you like your chances of developing a module within the next
100 years at exactly the same second that someone else does so, where
neither of you checks in that module, your modules have to co-exist,
and you don't TALK to each other, then I have some used lottery tickets
I can sell you.

> The easiest way to do this would be to ensure that you use the
> *runtime* kernel address as your identifier, which guarantees that
> it will be unique in any given system.

Terry, I am NOT going to do that. End of argument.

> I think it has to. The reason he has this is pretty clear from
> his crypto work, and the reason for the linked list is to, in the
> limit, allow a linear traversal of the list elements to find data
> that's relevant to you.

Read what he said, Terry, for God's sake. He said that there are usually
fewer than 2 metadata elements, each being a few bytes long.

>
> It's kind of ugly, but "anything that works is better than anything
> that doesn't"... it at least guarantees that it *can* work.

There is more than one way to skin a cat. The fact that it is currently
a linked list doesn't mean it has to be a linked list.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 16:31: 6 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5023337B401; Mon, 7 Oct 2002 16:31:05 -0700 (PDT) Received: from carp.icir.org (carp.icir.org [192.150.187.71]) by mx1.FreeBSD.org (Postfix) with ESMTP id EE83B43E91; Mon, 7 Oct 2002 16:31:04 -0700 (PDT) (envelope-from rizzo@carp.icir.org) Received: from carp.icir.org (localhost [127.0.0.1]) by carp.icir.org (8.12.3/8.12.3) with ESMTP id g97NV1O2028018; Mon, 7 Oct 2002 16:31:01 -0700 (PDT) (envelope-from rizzo@carp.icir.org) Received: (from rizzo@localhost) by carp.icir.org (8.12.3/8.12.3/Submit) id g97NV1Gn028017; Mon, 7 Oct 2002 16:31:01 -0700 (PDT) (envelope-from rizzo) Date: Mon, 7 Oct 2002 16:31:01 -0700 From: Luigi Rizzo To: Sam Leffler Cc: Julian Elischer , freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: CFR: m_tag patch Message-ID: <20021007163101.A27463@carp.icir.org> References: <185201c26e54$43339f40$52557f42@errno.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <185201c26e54$43339f40$52557f42@errno.com>; from sam@errno.com on Mon, Oct 07, 2002 at 03:52:49PM -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG the 16 vs. 32 bit On Mon, Oct 07, 2002 at 03:52:49PM -0700, Sam Leffler wrote: ... > Sounds to me like the real issue for you is insuring unique m_tag_id values. > We're certainly less likely to have collisions with a 32-bit value than a > 16-bit value and expanding this way gives you your "field ID" too. 
I guess > the question I have is whether the existing API's that search only by > "cookie" are sufficient for your needs. If so then I'm ok with changing > things. Otherwise we have an API incompatibility with openbsd that I'd like > to avoid. i wonder what do we gain by moving to 32 bit m_tag_id -- because there is is still no strict guarantee that we have no conflicts if people randomly choose "cookies", and also, using the same reasoning for allocations one could argue that having the cookie chosen as the number of _days_ since the epoch, will still give low conflict probability while still fitting in 16 bits. Also, those modules that require one or a very small number of different annotations (e.g. all the ones currently using m_tags) would just need the "cookie", whereas others with a large set of subfields could as well consider the field_id as part of the opaque data. cheers luigi ----------------------------------+----------------------------------------- Luigi RIZZO, luigi@iet.unipi.it . ICSI (on leave from Univ. di Pisa) http://www.iet.unipi.it/~luigi/ . 
1947 Center St, Berkeley CA 94704 Phone: (510) 666 2988 ----------------------------------+----------------------------------------- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 16:40:11 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C852F37B401; Mon, 7 Oct 2002 16:40:08 -0700 (PDT) Received: from sccrmhc02.attbi.com (sccrmhc02.attbi.com [204.127.202.62]) by mx1.FreeBSD.org (Postfix) with ESMTP id 372BE43E42; Mon, 7 Oct 2002 16:40:08 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by sccrmhc02.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021007234007.RWGE27763.sccrmhc02.attbi.com@InterJet.elischer.org>; Mon, 7 Oct 2002 23:40:07 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id QAA36379; Mon, 7 Oct 2002 16:23:02 -0700 (PDT) Date: Mon, 7 Oct 2002 16:23:00 -0700 (PDT) From: Julian Elischer To: Sam Leffler Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: CFR: m_tag patch In-Reply-To: <185201c26e54$43339f40$52557f42@errno.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, 7 Oct 2002, Sam Leffler wrote: > > So all this is to say that you want m_tag_id to be u_int32_t. Having > another 16-bit field is immaterial to anything I've thought of but given > that switching to a 32-bit tag_id will require allocating an additional > 32-bit item you can have that for free since there's little point in having > a 32-bit length. 
> The obvious downside is that m_tag structs now go from 8
> bytes to 12, but this is still a darn sight better than allocating a 256
> byte mbuf.

Sure... "an extra 4 bytes please sir" :-)

>
> If you allocate tag id's using your 32-bit time scheme then the fixed values
> above would never be hit since they are all for impossible times and so
> there'd be no conflict.

Just make them all IDs in a single "Legacy" API.

> >
> > (this is a contrived example because priority is a "BASE API" metadata
> > type in netgraph, and the base API doesn't have a magic number at the
> > moment but probably should have one. (certainly would in this case)
> >
> >
> > As I mentioned before, it is also not clear to me that the metadata
> > needs to be in linked list form, but I could live with it.
> >
>
> Sounds to me like the real issue for you is ensuring unique m_tag_id values.
> We're certainly less likely to have collisions with a 32-bit value than a
> 16-bit value and expanding this way gives you your "field ID" too. I guess
> the question I have is whether the existing API's that search only by
> "cookie" are sufficient for your needs. If so then I'm ok with changing
> things. Otherwise we have an API incompatibility with openbsd that I'd like
> to avoid.

You could "mis-use" the API identifier as you suggest, or you could just
keep them in the ID and have a "Legacy" API identifier of 1034032666
(the output of date -u +'%s' for right now, which is the suggested method
of deriving an API ID). I'd rather not pollute the namespace of other
modules, but it's up to you.

First check if it's an API you understand. If not, skip to the next tag.
If it is, go to some switch table to identify it more exactly.

Heck, take API identifier #0 :-)

I have actually had code with 2 versions of the same API. Being able to
just assign a different API ID was good; I could support both sorts of
input.
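The two-step check described above (match the API cookie first, and only then interpret the ID within that API) can be sketched in user-space C roughly as follows. This is only an illustration, not netgraph or mbuf code: `struct meta_tag`, `classify()`, `find_tag()`, and the two field IDs are hypothetical names. The cookie values are the ones quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cookie values quoted in the thread. */
#define NGM_FRAMERELAY_COOKIE 872148478u   /* from ng_frame_relay.h */
#define LEGACY_COOKIE         1034032666u  /* the suggested "Legacy" API ID */

/* Hypothetical field IDs. Each API numbers its own; both use 1 here. */
#define FR_META_PRIORITY      1
#define LEGACY_IPSEC_IN_DONE  1

/* Hypothetical tag header: API cookie, ID within that API, data length. */
struct meta_tag {
    uint32_t cookie;
    uint16_t type;
    uint16_t len;
    struct meta_tag *next;
};

enum meta_kind { META_UNKNOWN, META_FR_PRIORITY, META_LEGACY_IPSEC };

/* Step 1: is this an API we understand? Step 2: switch on its own IDs. */
static enum meta_kind classify(const struct meta_tag *t)
{
    switch (t->cookie) {
    case NGM_FRAMERELAY_COOKIE:
        return t->type == FR_META_PRIORITY ? META_FR_PRIORITY : META_UNKNOWN;
    case LEGACY_COOKIE:
        return t->type == LEGACY_IPSEC_IN_DONE ? META_LEGACY_IPSEC
                                               : META_UNKNOWN;
    default:
        return META_UNKNOWN;    /* unknown API: skip to the next tag */
    }
}

/* Walk the chain; a tag matches only if both cookie and type agree. */
static const struct meta_tag *find_tag(const struct meta_tag *head,
                                       uint32_t cookie, uint16_t type)
{
    for (; head != NULL; head = head->next)
        if (head->cookie == cookie && head->type == type)
            return head;
    return NULL;
}
```

Because the type is only examined after the cookie matches, two APIs can both use ID 1 without being confused, which is the ppp versus frame relay case raised earlier in the thread.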
Julian To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 17: 0:14 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1372F37B406; Mon, 7 Oct 2002 17:00:11 -0700 (PDT) Received: from sccrmhc01.attbi.com (sccrmhc01.attbi.com [204.127.202.61]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7066A43E75; Mon, 7 Oct 2002 17:00:10 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by sccrmhc01.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021008000009.VNMZ6431.sccrmhc01.attbi.com@InterJet.elischer.org>; Tue, 8 Oct 2002 00:00:09 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id QAA36501; Mon, 7 Oct 2002 16:55:43 -0700 (PDT) Date: Mon, 7 Oct 2002 16:55:42 -0700 (PDT) From: Julian Elischer To: Luigi Rizzo Cc: Sam Leffler , freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: CFR: m_tag patch In-Reply-To: <20021007163101.A27463@carp.icir.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, 7 Oct 2002, Luigi Rizzo wrote: > the 16 vs. 32 bit > On Mon, Oct 07, 2002 at 03:52:49PM -0700, Sam Leffler wrote: > ... > > Sounds to me like the real issue for you is insuring unique m_tag_id values. > > We're certainly less likely to have collisions with a 32-bit value than a > > 16-bit value and expanding this way gives you your "field ID" too. I guess > > the question I have is whether the existing API's that search only by > > "cookie" are sufficient for your needs. If so then I'm ok with changing > > things. 
Otherwise we have an API incompatibility with openbsd that I'd like > > to avoid. > > i wonder what do we gain by moving to 32 bit m_tag_id -- because there is > is still no strict guarantee that we have no conflicts > if people randomly choose "cookies", and also, using the same reasoning > for allocations one could argue that having the cookie chosen as the number > of _days_ since the epoch, will still give low conflict probability while > still fitting in 16 bits. > Also, those modules that require one or a very small number of different > annotations (e.g. all the ones currently using m_tags) would just need > the "cookie", whereas others with a large set of subfields could as well > consider the field_id as part of the opaque data. In my usage, the API id also includes other parts of the API, not just packet metadata. I use teh same cookie to define control command messags within netgraph. In fact the control messages currently FAR exceed the metadata types. At this moment there are about 37 APIS defined in netgraph and they implement about 150 different control messages there are only 2 metadata types in use. 
(priority and "dropability") To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 17: 6:31 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 82D2B37B401; Mon, 7 Oct 2002 17:06:29 -0700 (PDT) Received: from ebb.errno.com (ebb.errno.com [66.127.85.87]) by mx1.FreeBSD.org (Postfix) with ESMTP id 19F2243E77; Mon, 7 Oct 2002 17:06:29 -0700 (PDT) (envelope-from sam@errno.com) Received: from melange (melange.errno.com [66.127.85.82]) (authenticated bits=0) by ebb.errno.com (8.12.5/8.12.1) with ESMTP id g9806P1H005894 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Mon, 7 Oct 2002 17:06:26 -0700 (PDT) (envelope-from sam@errno.com) X-Authentication-Warning: ebb.errno.com: Host melange.errno.com [66.127.85.82] claimed to be melange Message-ID: <18d301c26e5e$8b5c7a30$52557f42@errno.com> From: "Sam Leffler" To: "Julian Elischer" Cc: , References: Subject: Re: CFR: m_tag patch Date: Mon, 7 Oct 2002 17:06:25 -0700 Organization: Errno Consulting MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4807.1700 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG > On Mon, 7 Oct 2002, Sam Leffler wrote: > > > > If you allocate tag id's using your 32-bit time scheme then the fixed values > > above would never be hit since they are all for impossible times and so > > there'd be no conflict. > > Just make them all IDs in a single "Legacy" API > Good idea; I see the way out.
Try this: struct m_tag { SLIST_ENTRY(m_tag) m_tag_link; /* List of packet tags */ u_int16_t m_tag_id; /* Tag ID */ u_int16_t m_tag_len; /* Length of data */ u_int32_t m_tag_cookie; /* Module/ABI */ }; Then define the "Legacy ABI" to be zero (or whatever you want). Then all the m_tag_* routines that I specified work only for the Legacy ABI. (Whether this is done with shims or whatever doesn't matter.) This gives me the compatiblity I want with openbsd and gives you the functionality you need for netgraph. For new work we can specify users should avoid the Legacy ABI. Cost is basically 4 bytes per tag and an extra compare when walking the tags. Happy? Sam To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 17:17:21 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AB14037B404 for ; Mon, 7 Oct 2002 17:17:20 -0700 (PDT) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 531C843E88 for ; Mon, 7 Oct 2002 17:17:20 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.5/8.12.4) with ESMTP id g980H4PQ049579; Mon, 7 Oct 2002 17:17:04 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.5/8.12.4/Submit) id g980H3Dl049578; Mon, 7 Oct 2002 17:17:03 -0700 (PDT) (envelope-from dillon) Date: Mon, 7 Oct 2002 17:17:03 -0700 (PDT) From: Matthew Dillon Message-Id: <200210080017.g980H3Dl049578@apollo.backplane.com> To: Terry Lambert Cc: Wilko Bulte , Peter Wemm , Mikhail Teterin , arch@FreeBSD.ORG Subject: Re: swapon some regular file References: <20021007212545.C363B2A88D@canning.wemm.org> <3DA204A7.50530BE5@mindspring.com> <20021008000656.A598@freebie.xs4all.nl> 
<3DA20F40.D3C3FD59@mindspring.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG The translation overhead isn't a major problem. What is a major problem is the fact that having to do a translation may require additional I/O's... in fact, may require additional memory allocations. Since pageouts usually occur when memory is low, the additional I/O can lead to a deadlock in the paging system. So the OS would definitely need to preallocate and record the physical block numbers / number ranges related to the new swap area. This is just as well since you can't safely use a sparse file to back the swap area anyway. Since pageouts are essentially random, the underlying file blocks would be allocated non deterministically and the result would be a mess of seeks and no ability to cluster anything. This is easy to demonstrate with VN. You can back a VN disk with a pre-allocated file, with a sparse file, with pre-reserved swap, or with sparse swap. The performance of VN disks created with sparse files or swap tend to degrade rather quickly due to the severe randomness in the seeking that winds up happening. 
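The point about preallocating and recording block ranges can be illustrated with a small user-space sketch. It assumes the file's extents were recorded once, when the swap area was set up, so that translating a swap block at pageout time needs no memory allocation and no extra I/O. `struct swap_extent`, `swap_translate()`, and `SWAP_UNMAPPED` are hypothetical names for this example, not FreeBSD kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SWAP_UNMAPPED UINT64_MAX

/* One contiguous run of blocks, recorded when the swap file is enabled. */
struct swap_extent {
    uint64_t logical_start;   /* first swap-relative block in this run */
    uint64_t physical_start;  /* matching on-disk block */
    uint64_t nblocks;         /* length of the run */
};

/*
 * Translate a swap-relative block to a disk block by binary search over
 * a fixed table sorted by logical_start. Nothing is allocated and no
 * I/O is issued, so the lookup is safe to run when memory is low.
 */
static uint64_t swap_translate(const struct swap_extent *map,
                               size_t nextents, uint64_t logical)
{
    size_t lo = 0, hi = nextents;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct swap_extent *e = &map[mid];

        if (logical < e->logical_start)
            hi = mid;
        else if (logical - e->logical_start >= e->nblocks)
            lo = mid + 1;
        else
            return e->physical_start + (logical - e->logical_start);
    }
    return SWAP_UNMAPPED;   /* hole: block was never preallocated */
}
```

A sparse backing file would show up here as holes in the table, which is exactly the case the message argues against: filling a hole during pageout would require allocating blocks at the worst possible time.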
-Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 17:20:12 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B0D0037B408; Mon, 7 Oct 2002 17:20:10 -0700 (PDT) Received: from sccrmhc02.attbi.com (sccrmhc02.attbi.com [204.127.202.62]) by mx1.FreeBSD.org (Postfix) with ESMTP id 00BAD43E75; Mon, 7 Oct 2002 17:20:10 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by sccrmhc02.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021008002009.TEYW27763.sccrmhc02.attbi.com@InterJet.elischer.org>; Tue, 8 Oct 2002 00:20:09 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id RAA36612; Mon, 7 Oct 2002 17:10:43 -0700 (PDT) Date: Mon, 7 Oct 2002 17:10:42 -0700 (PDT) From: Julian Elischer To: Sam Leffler Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: CFR: m_tag patch In-Reply-To: <18d301c26e5e$8b5c7a30$52557f42@errno.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, 7 Oct 2002, Sam Leffler wrote: > > On Mon, 7 Oct 2002, Sam Leffler wrote: > > > > > > If you allocate tag id's using your 32-bit time scheme then the fixed > values > > > above would never be hit since they are all for impossible times and so > > > there'd be no conflict. > > > > Just make them all IDs in a single "Legacy" API > > > > Good idea; I see the way out. 
Try this: > > struct m_tag { > SLIST_ENTRY(m_tag) m_tag_link; /* List of packet tags */ > u_int16_t m_tag_id; /* Tag ID */ > u_int16_t m_tag_len; /* Length of data */ > u_int32_t m_tag_cookie; /* Module/ABI */ > }; > > Then define the "Legacy ABI" to be zero (or whatever you want). Then all > the m_tag_* routines that I specified work only for the Legacy ABI. > (Whether this is done with shims or whatever doesn't matter.) This gives me > the compatiblity I want with openbsd and gives you the functionality you > need for netgraph. For new work we can specify users should avoid the > Legacy ABI. > > Cost is basically 4 bytes per tag and an extra compare when walking the > tags. Happy? > definitly. Each API authout gets to polute his own namespace as much as he wants.. :-) > Sam > > > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 17:20:21 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5800B37B401; Mon, 7 Oct 2002 17:20:19 -0700 (PDT) Received: from sccrmhc02.attbi.com (sccrmhc02.attbi.com [204.127.202.62]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6D7A143E3B; Mon, 7 Oct 2002 17:20:18 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by sccrmhc02.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021008002017.TFCJ27763.sccrmhc02.attbi.com@InterJet.elischer.org>; Tue, 8 Oct 2002 00:20:17 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id RAA36615; Mon, 7 Oct 2002 17:14:29 -0700 (PDT) Date: Mon, 7 Oct 2002 17:14:29 -0700 (PDT) From: Julian Elischer To: Luigi Rizzo Cc: Sam Leffler , freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: CFR: m_tag patch In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: 
TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, 7 Oct 2002, Julian Elischer wrote: > > > On Mon, 7 Oct 2002, Luigi Rizzo wrote: > > > the 16 vs. 32 bit > > On Mon, Oct 07, 2002 at 03:52:49PM -0700, Sam Leffler wrote: > > ... > > > Sounds to me like the real issue for you is insuring unique m_tag_id values. > > > We're certainly less likely to have collisions with a 32-bit value than a > > > 16-bit value and expanding this way gives you your "field ID" too. I guess > > > the question I have is whether the existing API's that search only by > > > "cookie" are sufficient for your needs. If so then I'm ok with changing > > > things. Otherwise we have an API incompatibility with openbsd that I'd like > > > to avoid. > > > > i wonder what do we gain by moving to 32 bit m_tag_id -- because there is > > is still no strict guarantee that we have no conflicts > > if people randomly choose "cookies", and also, using the same reasoning > > for allocations one could argue that having the cookie chosen as the number > > of _days_ since the epoch, will still give low conflict probability while > > still fitting in 16 bits. > > Also, those modules that require one or a very small number of different > > annotations (e.g. all the ones currently using m_tags) would just need > > the "cookie", whereas others with a large set of subfields could as well > > consider the field_id as part of the opaque data. > > > In my usage, the API id also includes other parts of the API, not just > packet metadata. > I use teh same cookie to define control command messags within > netgraph. In fact the control messages currently FAR exceed the > metadata types. > > At this moment there are about 37 APIS defined in netgraph and > they implement about 150 different control messages > > there are only 2 metadata types in use. 
> > (priority and "dropability") I lie.. there is a bandwidteh manager written in netgraph that uses metadata to hold the queueing and packet category info. so there are maybe 2 more types defined in their own API. > > > > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-arch" in the body of the message > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 17:34:34 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2E49337B401; Mon, 7 Oct 2002 17:34:32 -0700 (PDT) Received: from pintail.mail.pas.earthlink.net (pintail.mail.pas.earthlink.net [207.217.120.122]) by mx1.FreeBSD.org (Postfix) with ESMTP id B9A1C43E6E; Mon, 7 Oct 2002 17:34:31 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0113.cvx22-bradley.dialup.earthlink.net ([209.179.198.113] helo=mindspring.com) by pintail.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17yiKN-00065C-00; Mon, 07 Oct 2002 17:34:24 -0700 Message-ID: <3DA2278A.A1A33A02@mindspring.com> Date: Mon, 07 Oct 2002 17:32:10 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Julian Elischer Cc: Sam Leffler , freebsd-arch@freebsd.org, freebsd-net@freebsd.org Subject: Re: CFR: m_tag patch References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Julian Elischer wrote: > > This is insufficient for Alpha and other 64 bit architectures. I > > think what you are asking for is really a 'void *'. > > IT IS NOT A POINTER! > > it is a 32 bit unsigned number. OK, I give up. Why is 2^32 possibilities better than 2^16th possibilities? 
You have > 65536 different chunks of metadata you need to deal with? > > The other issue here is that your idea of an opaque API/ABI indicator > > is in conflict, unless you say that this is a pointer, and then format > > the initial information pointed to by the pointer. Otherwise, you > > will need a small indirection structure that's pointed to the pointer, > > AND which contains the API/ABI identifier (i.e. you will need two, not > > one piece of information for that -- which is what you show, but not > > what you describe in your text). > > it is not used to look up anything.. > it is used to verify only. > > it is just working on the principal that there is not going to be > a collision in the 32 bit space. Especially when we create them from > "time since the epoch", and when teh various authors can see each > other's choices of value. I don't buy this. At this point, you are arguing statistical protection, and you are talking about a difference in collision probability, not collision avoidance. If 16 is bad, and 32 is good, then 64 is better. If 64 is "way too big", then 16 vs. 32 is just a matter of opinion, nothing else. > > This is moderately bogus. > > no it is not. > > I estimate that the chance of having a collision given all the > factors is 1:2^50 or so ASSUMING THAT 1000000 PEOPLE DEVELOP THEIR OWN > MODULES AND DO NOT CHECK THEM IN BUT DO ALL SHARE THEM WITH EACH OTHER. That assumes that the numbers people pick are actually random; they won't be. The way to handle this is to ask the kernel for a unique number for use in the registration process. > > Specifically, if you are going to register in new types without an > > assigned numbers authority (e.g. if I have a vendor private extension, > > which I wish to implement, yet not have collide with someone else's > > vendor private extension or a future FreeBSD "standard extension"), > > then you need to implement a registration interface for named > > registration, and use *that*. 
> > Terry if you like your chances of developing a module within the next > 100 years in exactly the same moment to the second that someone else > does so, and neither of you checks in that module, and your modules > have to co-exist, and you don't TALK to each other, then I have some > used lottery tickets I an sell you.. So it's a timestamp? You didn't say that it was a 32 bit time counter, so that simultaneuity was a requirement for a collision. > > I think it has to. The reason he has this is pretty clear from > > his crypto work, and the reason for the linked list is to, in the > > limit, allow a linear traversal of the list elements to find data > > that's relevent to you. > > Read what he said Terry.. for gods sake. He said that there are usually > < 2 metadata elements each being a few bytes long.. You keep avoiding his reason for the linked list, Julian. > > It's kind of ugly, but "anything that works is better than anything > > that doesn't"... it at least guarantees that it *can* work. > > there is more than one way to skin a cat. the fact that it is a linked > list doesn't mean it has to be a linked list. 
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 17:40: 8 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AA4F537B401; Mon, 7 Oct 2002 17:40:07 -0700 (PDT) Received: from rwcrmhc51.attbi.com (rwcrmhc51.attbi.com [204.127.198.38]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5F27C43E4A; Mon, 7 Oct 2002 17:40:07 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc51.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021008004007.IDKW17535.rwcrmhc51.attbi.com@InterJet.elischer.org>; Tue, 8 Oct 2002 00:40:07 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id RAA36685; Mon, 7 Oct 2002 17:24:15 -0700 (PDT) Date: Mon, 7 Oct 2002 17:24:14 -0700 (PDT) From: Julian Elischer To: Sam Leffler Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: CFR: m_tag patch In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, 7 Oct 2002, Julian Elischer wrote: > definitly. > Each API authout gets to polute his own namespace as much as he author > wants.. 
:-) To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 17:50:31 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9DFA137B401; Mon, 7 Oct 2002 17:50:29 -0700 (PDT) Received: from harrier.mail.pas.earthlink.net (harrier.mail.pas.earthlink.net [207.217.120.12]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3442B43E6A; Mon, 7 Oct 2002 17:50:29 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0113.cvx22-bradley.dialup.earthlink.net ([209.179.198.113] helo=mindspring.com) by harrier.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17yiZs-000018-00; Mon, 07 Oct 2002 17:50:24 -0700 Message-ID: <3DA22B42.FBB6ECBF@mindspring.com> Date: Mon, 07 Oct 2002 17:48:02 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Luigi Rizzo Cc: Sam Leffler , Julian Elischer , freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: CFR: m_tag patch References: <185201c26e54$43339f40$52557f42@errno.com> <20021007163101.A27463@carp.icir.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Luigi Rizzo wrote: > i wonder what do we gain by moving to 32 bit m_tag_id -- because there is > is still no strict guarantee that we have no conflicts > if people randomly choose "cookies", and also, using the same reasoning > for allocations one could argue that having the cookie chosen as the number > of _days_ since the epoch, will still give low conflict probability while > still fitting in 16 bits. Julian wants to use "32 bit seconds since epoch" as the "random" value, and then rely on statistical protection. 
This wasn't clear to me, either. It depends on knowing netgraph
development well enough, and following the recommended approach
well enough that you use the seconds-since-epoch method of
obtaining interface IDs.

> Also, those modules that require one or a very small number of different
> annotations (e.g. all the ones currently using m_tags) would just need
> the "cookie", whereas others with a large set of subfields could as well
> consider the field_id as part of the opaque data.

That's why I suggested a void *, and said that a 32 bit field would
not be good enough for 64 bit machines.

If everyone followed the seconds-since-epoch assignment methodology,
though, Julian's approach would provide statistical protection (I
worry about "the birthday paradox" making the likelihood much higher;
for example, I'd expect the space to be smaller almost instantly, due
to time zones...).

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Mon Oct 7 19: 6:31 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 293B737B401; Mon, 7 Oct 2002 19:06:28 -0700 (PDT)
Received: from out011.verizon.net (out011pub.verizon.net [206.46.170.135]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7655C43E65; Mon, 7 Oct 2002 19:06:27 -0700 (PDT) (envelope-from res03db2@verizon.net)
Received: from verizon.net ([4.47.70.146]) by out011.verizon.net (InterMail vM.5.01.05.09 201-253-122-126-109-20020611) with ESMTP id <20021008020626.IVCC17563.out011.verizon.net@verizon.net>; Mon, 7 Oct 2002 21:06:26 -0500
Received: (from res03db2@localhost) by verizon.net (8.9.3/8.9.3) id TAA31352; Mon, 7 Oct 2002 19:06:15 -0700 (PDT) (envelope-from res03db2)
Date: Mon, 7 Oct 2002 19:06:10 -0700
From: Robert Clark
To: Nate Lawson
Cc: freebsd-arch@FreeBSD.ORG, freebsd-smp@FreeBSD.ORG
Subject: Re: Running independent kernel instances on
dual-Xeon/E7500 system Message-ID: <20021007190610.A31292@darkstar.gte.net> References: <20021006115816.A28963@darkstar.gte.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i In-Reply-To: ; from nate@root.org on Sun, Oct 06, 2002 at 07:48:40PM -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Ok, if I missed the clue boat, where does it call next? I'd like to catch it before it gets too far away. Is FreeBSD getting better on SMP? [RC] On Sun, Oct 06, 2002 at 07:48:40PM -0700, Nate Lawson wrote: > Sorry for the unhelpful first posting. I sent a more detailed letter via > private mail, recommending he look into the exokernel papers. > > My dismissiveness was due to anticipating the direction this was going, > which is nicely shown by the response below. In short, dedicated > processors for IO were used in the minicomputer days but are wasteful > nowadays when you have lightweight interrupts and/or polling when > appropriate. > > If your scheduler sucks, fix it. If a device needs extra processing > equivalent to another N Ghz CPU, the vendor will add silicon. The "S" in > SMP is for symmetric, lest we forget. > > -Nate > > On Sun, 6 Oct 2002, Robert Clark wrote: > > I've often thought it would be nice to be able to devote > > one processor to a RT style OS instance that continuous > > duty doing "throw away" work updating the display, audio, > > etc. > > > > Using a general purpose CPU for graphics and sound work > > may not result in the kinds of performance you get with > > a GPU, but I have to imagine it would have a better > > chance of encouraging "free" driver development. > > > > On the flip side, the OS instance that didn't have > > anything to do with audio/video could spend more of > > its time doing network/disk I/O, and more traditional > > duties. 
> >
> > [RC]
> >
> > On Sat, Oct 05, 2002 at 02:28:04AM -0700, Terry Lambert wrote:
> > > Nate Lawson wrote:
> > > > On Fri, 4 Oct 2002, David Francheski wrote:
> > > > > I have a dual-Xeon processor (with E7500 chipset) motherboard.
> > > > > Can anybody tell me what the development effort would be to
> > > > > boot and run two independent copies of the FreeBSD kernel,
> > > > > one on each Xeon processor? By this I mean that an SMP
> > > > > enabled kernel would not be utilized, each kernel would be UP.
> > > > >
> > > > > Regards,
> > > > > David L. Francheski
> > > >
> > > > Not possible without another BIOS, PCI bus, and separate memory --
> > > > i.e. another PC.
> > >
> > > IPL'ing is not the same as "running". So long as you crafted the
> > > memory image of the second OS and its page tables, etc., using the
> > > first processor, there should be no problem running a second copy
> > > of an OS on an AP, as a result of a START IPI from the BP, after
> > > the code is crafted. Thus there is no need for a separate BIOS.
> > >
> > >
> > >
> > > --
> > >
> > > I've personally considered pursuing the ability to run code separately,
> > > though with the same 4G address space, separated, so as to permit
> > > running a debugger against a "crashed" FreeBSD "system" running on an
> > > AP, doing the debugging from the BP, as a hosted system. The cost
> > > in labor would be 2-3 months of continuous work, I think... that is
> > > the estimate I arrived at, when I considered the project previously.
> > > Doing this certainly beats the cost of buying an ICE to get similar
> > > capability.
> > >
> > >
> > > It would be interesting to see what other people have to say on this,
> > > other than "can't be done" (not to pick on you in particular, here;
> > > this is the knee-jerk reaction many people have to things like this).
> > > > > > -- Terry > > > > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > > > with "unsubscribe freebsd-smp" in the body of the message > > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > > with "unsubscribe freebsd-arch" in the body of the message > > > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-smp" in the body of the message To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 20:22:33 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 073E337B401; Mon, 7 Oct 2002 20:22:32 -0700 (PDT) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4146E43E65; Mon, 7 Oct 2002 20:22:31 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Received: from mousie.catspoiler.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.5/8.12.5) with ESMTP id g983MIvU034090; Mon, 7 Oct 2002 20:22:22 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Message-Id: <200210080322.g983MIvU034090@gw.catspoiler.org> Date: Mon, 7 Oct 2002 20:22:18 -0700 (PDT) From: Don Lewis Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., To: jhb@FreeBSD.org Cc: jmallett@FreeBSD.org, arch@FreeBSD.org In-Reply-To: MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 7 Oct, John Baldwin wrote: > > On 07-Oct-2002 Don Lewis wrote: >> Probably, but the list is also modified in the exit code. All those >> processes that we are sending SIGKILL to are removing themselves from >> the list. 
>
> Processes dying from SIGKILL that we send them aren't a problem since
> we have already read their p_peers member before we kill them. That's
> the point of 'nq'. The problem is that 'nq' could exit and could be
> an invalid pointer. If a process later in the list after 'nq' died
> that is not a problem either. Well, how about this:

I missed your use of nq, even though this is a fairly common way of
handling similar problems if there is only a single thread.

> http://www.FreeBSD.org/~jhb/patches/ppeers.patch

That's pretty much what I had envisioned. I have a little bit of a
concern that funnelling everything through a single mutex could be a
bottleneck in some cases, but it is simple, safe, and otherwise low
overhead.

It looks like we've got a potential lock order reversal problem,
though. In fork1() we grab ppeers_lock while holding a couple of
PROC_LOCKs, while in the first part of exit1() we grab ppeers_lock
before PROC_LOCK. My caffeine level is insufficient to judge whether
P_WEXIT checking would save us in practice.
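To make the lock order concern concrete, here is a toy order checker, offered only as a sketch. The lock identifiers and the lo_reset(), lo_acquire() and lo_release() helpers are hypothetical names invented for this illustration; they are not the kernel's mutex API and nothing here is taken from the ppeers patch. The checker remembers which lock was held while another was acquired, and reports a reversal when the opposite order turns up later.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock identifiers, for illustration only. */
enum { LK_PROC = 0, LK_PPEERS = 1, LK_MAX = 2 };

/* seen_before[a][b] is true once lock a was held while b was acquired. */
static bool seen_before[LK_MAX][LK_MAX];
static bool held[LK_MAX];

/* Forget all recorded orders and all held locks. */
static void lo_reset(void)
{
    memset(seen_before, 0, sizeof(seen_before));
    memset(held, 0, sizeof(held));
}

/*
 * Record the acquisition of lock lk.  Returns false if some lock that is
 * currently held was previously acquired *after* lk, i.e. if this
 * acquisition reverses an order that was seen earlier.
 */
static bool lo_acquire(int lk)
{
    bool ok = true;

    assert(lk >= 0 && lk < LK_MAX);
    for (int h = 0; h < LK_MAX; h++) {
        if (!held[h] || h == lk)
            continue;
        if (seen_before[lk][h])
            ok = false;            /* earlier: lk then h; now: h then lk */
        seen_before[h][lk] = true; /* remember the order used this time */
    }
    held[lk] = true;
    return ok;
}

/* Record the release of lock lk. */
static void lo_release(int lk)
{
    assert(lk >= 0 && lk < LK_MAX);
    held[lk] = false;
}
```

Feeding it a fork1()-like sequence (LK_PROC, then LK_PPEERS) followed by an exit1()-like sequence (LK_PPEERS, then LK_PROC) makes the second lo_acquire(LK_PROC) return false. Whether the real code paths can actually deadlock still depends on details such as the P_WEXIT check, which this sketch does not model.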
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 21:55: 2 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A326C37B401 for ; Mon, 7 Oct 2002 21:55:01 -0700 (PDT) Received: from rootlabs.com (root.org [67.118.192.226]) by mx1.FreeBSD.org (Postfix) with SMTP id 3CDBC43E77 for ; Mon, 7 Oct 2002 21:55:01 -0700 (PDT) (envelope-from nate@rootlabs.com) Received: (qmail 9336 invoked by uid 1000); 8 Oct 2002 04:55:03 -0000 Date: Mon, 7 Oct 2002 21:55:03 -0700 (PDT) From: Nate Lawson To: Terry Lambert Cc: freebsd-arch@freebsd.org Subject: Re: Running independent kernel instances on dual-Xeon/E7500 system In-Reply-To: <3DA1F91F.F707826E@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, 7 Oct 2002, Terry Lambert wrote: > LinuxRT deals with this by having a hard RT executive, that runs > the Linux kernel as one of its tasks, and assigns resources. Doing > something similar on a two processor box in FreeBSD, without needing > the RT executive because you have an extra CPU, is not that much of > an intuitive leap, I think. > > -- Terry I believe you mean RTLinux and I agree that this is a great approach. 
-Nate To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 22:43: 7 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 382BB37B401 for ; Mon, 7 Oct 2002 22:43:05 -0700 (PDT) Received: from rootlabs.com (root.org [67.118.192.226]) by mx1.FreeBSD.org (Postfix) with SMTP id AA2FF43E6E for ; Mon, 7 Oct 2002 22:43:04 -0700 (PDT) (envelope-from nate@rootlabs.com) Received: (qmail 9484 invoked by uid 1000); 8 Oct 2002 05:43:06 -0000 Date: Mon, 7 Oct 2002 22:43:06 -0700 (PDT) From: Nate Lawson To: Julian Elischer Cc: Terry Lambert , Sam Leffler , freebsd-arch@freebsd.org, freebsd-net@freebsd.org Subject: Re: CFR: m_tag patch In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, 7 Oct 2002, Julian Elischer wrote: > On Mon, 7 Oct 2002, Terry Lambert wrote: > > On Mon, 7 Oct 2002, Julian Elischer wrote: > > > Your tags have a single 16 bit tag ID field. > > > This is insufficient for my needs. > > > I need to be able to store the API cookie which is a 32 bit > > > unsigned number, and on top of that, I also need a 16 bit type field > > > that specifies what the data is within teh given API and a 16 bit > > > length to allow opaque handling. > > > > This is insufficient for Alpha and other 64 bit architectures. I > > think what you are asking for is really a 'void *'. > > IT IS NOT A POINTER! > > it is a 32 bit unsigned number. Ok. > > The other issue here is that your idea of an opaque API/ABI indicator > > is in conflict, unless you say that this is a pointer, and then format > > the initial information pointed to by the pointer. 
Otherwise, you
> > will need a small indirection structure that's pointed to by the pointer,
> > AND which contains the API/ABI identifier (i.e. you will need two, not
> > one piece of information for that -- which is what you show, but not
> > what you describe in your text).
>
> it is not used to look up anything..
> it is used to verify only.
>
> it is just working on the principle that there is not going to be
> a collision in the 32 bit space. Especially when we create them from
> "time since the epoch", and when the various authors can see each
> other's choices of value.

There are deterministic ways to generate them.
1. A counter -- gettag() { return tag++; }
2. A LCRG -- gettag() { return (A * tag) % n; }
3. A global registry -- "Hey, gimme a major"

There are non-deterministic ways as well, i.e. hash functions and
PRNGs. And if code can run faster than a given time source, the output of
that source or permutation thereof can produce collisions.

What leads you towards the time-based option vs. the others, especially
the deterministic ones?

> > This is moderately bogus.
>
> no it is not.
>
> I estimate that the chance of having a collision given all the
> factors is 1:2^50 or so ASSUMING THAT 1000000 PEOPLE DEVELOP THEIR OWN
> MODULES AND DO NOT CHECK THEM IN BUT DO ALL SHARE THEM WITH EACH OTHER.

Hmmm, if they choose them at random, the chance of a collision is
sqrt(n) or 1/(2^16). If they choose them perfectly coordinated, the
chance is 1/(2^32). Much less than 2^50.
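For anyone who wants to put numbers on the random-choice case, the birthday bound can be computed directly. This is a minimal sketch that assumes every value is drawn independently and uniformly from the space, which is exactly the assumption questioned elsewhere in this thread (timestamps are not uniform). The function name collision_probability() is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Probability that at least two of k values collide when each value is
 * drawn independently and uniformly from a space of n values.
 * Computed as 1 - prod_{i=1}^{k-1} (1 - i/n).
 */
static double collision_probability(uint64_t k, double n)
{
    double p_unique = 1.0;

    assert(n > 0.0);
    for (uint64_t i = 1; i < k; i++) {
        if ((double)i >= n)
            return 1.0;  /* more draws than values: collision is certain */
        p_unique *= 1.0 - (double)i / n;
    }
    return 1.0 - p_unique;
}
```

With n = 2^32, three draws give a probability of roughly 7 in 10 billion, while 2^16 draws give a little over 39%. That is the sense in which sqrt(n) matters: it is about the number of draws at which a collision becomes likely, rather than the probability itself.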
-Nate To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 23: 7:17 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 19AEF37B401; Mon, 7 Oct 2002 23:07:16 -0700 (PDT) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 99E4B43E6E; Mon, 7 Oct 2002 23:07:15 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Received: from mousie.catspoiler.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.5/8.12.5) with ESMTP id g9866OvU034411; Mon, 7 Oct 2002 23:06:28 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Message-Id: <200210080606.g9866OvU034411@gw.catspoiler.org> Date: Mon, 7 Oct 2002 23:06:24 -0700 (PDT) From: Don Lewis Subject: Re: CFR: m_tag patch To: nate@root.org Cc: julian@elischer.org, tlambert2@mindspring.com, sam@errno.com, freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG In-Reply-To: MIME-Version: 1.0 Content-Type: TEXT/plain; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 7 Oct, Nate Lawson wrote: > On Mon, 7 Oct 2002, Julian Elischer wrote: >> it is just working on the principal that there is not going to be >> a collision in the 32 bit space. Especially when we create them from >> "time since the epoch", and when teh various authors can see each >> other's choices of value. > > There are deterministic ways to generate them. > 1. A counter -- gettag() { return tag++; } > 2. A LCRG -- gettag() { return (A * tag) % n; } > 3. A global registry -- "Hey, gimme a major" > > There are non-deterministic ways as well, i.e. hash functions and > PRNGs. 
And if code can run faster than a given time source, the output of > that source or permutation thereof can produce collisions. > > What leads you towards the time-based option vs. the others, especially > the deterministic ones? Why not name them? At boot or module load time stuff the name in a table and use the table index as the 16 bit ID. Is there any reason the ID has to be the same each time the system is booted? To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 23:40:12 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AAD8437B401; Mon, 7 Oct 2002 23:40:09 -0700 (PDT) Received: from rwcrmhc53.attbi.com (rwcrmhc53.attbi.com [204.127.198.39]) by mx1.FreeBSD.org (Postfix) with ESMTP id 48E6343E75; Mon, 7 Oct 2002 23:40:09 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc53.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021008064008.CZCQ18767.rwcrmhc53.attbi.com@InterJet.elischer.org>; Tue, 8 Oct 2002 06:40:08 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id XAA38152; Mon, 7 Oct 2002 23:28:30 -0700 (PDT) Date: Mon, 7 Oct 2002 23:28:29 -0700 (PDT) From: Julian Elischer To: Nate Lawson Cc: Terry Lambert , Sam Leffler , freebsd-arch@freebsd.org, freebsd-net@freebsd.org Subject: Re: CFR: m_tag patch In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, 7 Oct 2002, Nate Lawson wrote: > On Mon, 7 Oct 2002, Julian Elischer wrote: > > On Mon, 7 Oct 2002, Terry Lambert wrote: > > > On Mon, 7 Oct 
2002, Julian Elischer wrote: > > > > Your tags have a single 16 bit tag ID field. > > > > This is insufficient for my needs. > > > > I need to be able to store the API cookie which is a 32 bit > > > > unsigned number, and on top of that, I also need a 16 bit type field > > > > that specifies what the data is within teh given API and a 16 bit > > > > length to allow opaque handling. > > > > > > This is insufficient for Alpha and other 64 bit architectures. I > > > think what you are asking for is really a 'void *'. > > > > IT IS NOT A POINTER! > > > > it is a 32 bit unsigned number. > > Ok. > > > > The other issue here is that your idea of an opaque API/ABI indicator > > > is in conflict, unless you say that this is a pointer, and then format > > > the initial information pointed to by the pointer. Otherwise, you > > > will need a small indirection structure that's pointed to the pointer, > > > AND which contains the API/ABI identifier (i.e. you will need two, not > > > one piece of information for that -- which is what you show, but not > > > what you describe in your text). > > > > it is not used to look up anything.. > > it is used to verify only. > > > > it is just working on the principal that there is not going to be > > a collision in the 32 bit space. Especially when we create them from > > "time since the epoch", and when teh various authors can see each > > other's choices of value. > > There are deterministic ways to generate them. > 1. A counter -- gettag() { return tag++; } > 2. A LCRG -- gettag() { return (A * tag) % n; } > 3. A global registry -- "Hey, gimme a major" > > There are non-deterministic ways as well, i.e. hash functions and > PRNGs. And if code can run faster than a given time source, the output of > that source or permutation thereof can produce collisions. > > What leads you towards the time-based option vs. the others, especially > the deterministic ones? > > > > This is moderately bogus. > > > > no it is not. 
> > > > I estimate that the chance of having a collision given all the > > factors is 1:2^50 or so ASSUMING THAT 1000000 PEOPLE DEVELOP THEIR OWN > > MODULES AND DO NOT CHECK THEM IN BUT DO ALL SHARE THEM WITH EACH OTHER. > > Hmmm, if they choose them at random, the chance of a collision is > sqrt(n) or 1/(2^16). If they choose them perfectly coordinated, the > chance is 1/(2^32). Much less than 2^50. Nate, what is the chance you will load 1 million netgraph node types? If you load 3, what is the chance they were all non-checked in nodes and the authors weren't communicating, and you are the first person to ever combine them? now multiply That by 2^32 down 2 > > -Nate > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-arch" in the body of the message > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Oct 7 23:40:21 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2E7BA37B407; Mon, 7 Oct 2002 23:40:20 -0700 (PDT) Received: from rwcrmhc53.attbi.com (rwcrmhc53.attbi.com [204.127.198.39]) by mx1.FreeBSD.org (Postfix) with ESMTP id CD31E43E42; Mon, 7 Oct 2002 23:40:19 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc53.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021008064019.CZES18767.rwcrmhc53.attbi.com@InterJet.elischer.org>; Tue, 8 Oct 2002 06:40:19 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id XAA38161; Mon, 7 Oct 2002 23:33:21 -0700 (PDT) Date: Mon, 7 Oct 2002 23:33:20 -0700 (PDT) From: Julian Elischer To: Don Lewis Cc: nate@root.org, tlambert2@mindspring.com, sam@errno.com, freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: CFR: m_tag patch In-Reply-To: 
<200210080606.g9866OvU034411@gw.catspoiler.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

On Mon, 7 Oct 2002, Don Lewis wrote:

> On 7 Oct, Nate Lawson wrote:
> > On Mon, 7 Oct 2002, Julian Elischer wrote:
>
> >> it is just working on the principle that there is not going to be
> >> a collision in the 32 bit space. Especially when we create them from
> >> "time since the epoch", and when the various authors can see each
> >> other's choices of value.
> >
> > There are deterministic ways to generate them.
> > 1. A counter -- gettag() { return tag++; }
> > 2. A LCRG -- gettag() { return (A * tag) % n; }
> > 3. A global registry -- "Hey, gimme a major"
> >
> > There are non-deterministic ways as well, i.e. hash functions and
> > PRNGs. And if code can run faster than a given time source, the output of
> > that source or permutation thereof can produce collisions.
> >
> > What leads you towards the time-based option vs. the others, especially
> > the deterministic ones?
>
> Why not name them? At boot or module load time stuff the name in a
> table and use the table index as the 16 bit ID. Is there any reason the
> ID has to be the same each time the system is booted?

I want to be able to specify an OLD API and the NEW version
if I can only get a particular node in object form, and I know it uses the
old version, and some other code I have uses the new version,
and I need them to co-exist.

One binary sync driver and one open-source driver, running 2 sync cards,
both feeding into the "frame relay" code.

All the "perfect" methods are more work than this really requires, since
I'm pretty sure that a collision will not occur in the lifetime of this
civilisation.
> > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 0:43:50 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A446E37B401; Tue, 8 Oct 2002 00:43:49 -0700 (PDT) Received: from avocet.mail.pas.earthlink.net (avocet.mail.pas.earthlink.net [207.217.120.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id B5F8343E8A; Tue, 8 Oct 2002 00:43:45 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0058.cvx21-bradley.dialup.earthlink.net ([209.179.192.58] helo=mindspring.com) by avocet.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17yp1o-0004mR-00; Tue, 08 Oct 2002 00:43:40 -0700 Message-ID: <3DA28C65.DEA3E379@mindspring.com> Date: Tue, 08 Oct 2002 00:42:29 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Don Lewis Cc: nate@root.org, julian@elischer.org, sam@errno.com, freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: CFR: m_tag patch References: <200210080606.g9866OvU034411@gw.catspoiler.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Don Lewis wrote: > Why not name them? At boot or module load time stuff the name in a > table and use the table index as the 16 bit ID. Is there any reason the > ID has to be the same each time the system is booted? That was my suggestion. I stopped short of calling it XInternAtom()... 
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 0:55: 7 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 288C237B401; Tue, 8 Oct 2002 00:55:05 -0700 (PDT) Received: from gull.mail.pas.earthlink.net (gull.mail.pas.earthlink.net [207.217.120.84]) by mx1.FreeBSD.org (Postfix) with ESMTP id B6EF443E77; Tue, 8 Oct 2002 00:55:04 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0058.cvx21-bradley.dialup.earthlink.net ([209.179.192.58] helo=mindspring.com) by gull.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17ypCi-0004Yb-00; Tue, 08 Oct 2002 00:54:56 -0700 Message-ID: <3DA28F08.658274C3@mindspring.com> Date: Tue, 08 Oct 2002 00:53:44 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Julian Elischer Cc: Don Lewis , nate@root.org, sam@errno.com, freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: CFR: m_tag patch References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Julian Elischer wrote: > > Why not name them? At boot or module load time stuff the name in a > > table and use the table index as the 16 bit ID. Is there any reason the > > ID has to be the same each time the system is booted? > > I want to be able to specify an OLD API and the NEW version > if I can only get a particular node in object form, and I knowi uses the > old version, and some other code I have uses the new version, > and I need them to co-exist. > > one binary sync driver and one opensource drive,, running 2 sync cards, > both feeding into the "framerealy" code. 
>
> All the "perfect" methods are more work than this really requires since
> I'm pretty sure that a collision will not occur in the lifetime of this
> civilisation.

I've often thought that an interning process for atoms was actually
a good idea for the kernel.

The place I first thought of using this approach is for FS ID's.

As things currently sit, there are header files that need to be
hacked to add new members to (theoretically) anonymous classes of
objects. One of the most egregious files in this regard is vnode.h,
for the enumerated type values in 'vtype' and in 'vtagtype'.

As an example, copy the NULLFS code to "FOOFS" instead, do all the
name replacement in it, and see what breaks and/or what gets accounted
incorrectly. 8-(.

Among other things, if you could intern them, and then enumerate
them, based on defined classes, you could get rid of things like
the socket protocol family crap, and most of the places where you
end up pushing strings in API's across the user/kernel boundary.

IMO, most strings you push across should be considered const
members of range restricted sets.

This is happy, in that it would work for the netgraph API ID code,
as well (you look up a value by looking up the atom in a class table
to get an ID, and then pass the ID around). I think the ID should be
opaque, but you could call it a 16 or 32 bit value, if you wanted to
insist that it not be a pointer.
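A rough sketch of what such an interning table could look like follows. Everything in it is an assumption made for illustration: the names atom_intern() and atom_name(), the fixed table size, and the 16 bit ID are all hypothetical, and a kernel version would need locking and a per-class table, neither of which is shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ATOM_MAX      64          /* hypothetical table size */
#define ATOM_NAME_LEN 32          /* hypothetical name limit, including NUL */
#define ATOM_NONE     UINT16_MAX  /* returned when interning fails */

static char     atom_names[ATOM_MAX][ATOM_NAME_LEN];
static uint16_t atom_count;

/*
 * Return the ID for name, registering the name on first use.  The same
 * name always maps to the same ID for as long as the table lives, but
 * IDs are not stable across boots.  Returns ATOM_NONE if the name is
 * empty, too long, or the table is full.
 */
static uint16_t atom_intern(const char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (len == 0 || len >= ATOM_NAME_LEN)
        return ATOM_NONE;
    for (uint16_t i = 0; i < atom_count; i++)
        if (strcmp(atom_names[i], name) == 0)
            return i;              /* already interned */
    if (atom_count == ATOM_MAX)
        return ATOM_NONE;
    memcpy(atom_names[atom_count], name, len + 1);
    return atom_count++;
}

/* Reverse lookup; returns NULL for an ID that was never handed out. */
static const char *atom_name(uint16_t id)
{
    return id < atom_count ? atom_names[id] : NULL;
}
```

A module would call atom_intern("some_name") once at load time and pass the returned ID around afterwards. Note that this only moves the uniqueness problem from numbers to names: two modules that pick the same string end up sharing one ID.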
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 1:20:17 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B72D837B401; Tue, 8 Oct 2002 01:20:15 -0700 (PDT) Received: from mailhub.fokus.gmd.de (mailhub.fokus.gmd.de [193.174.154.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id 49CA943E4A; Tue, 8 Oct 2002 01:20:14 -0700 (PDT) (envelope-from brandt@fokus.gmd.de) Received: from beagle (beagle [193.175.132.100]) by mailhub.fokus.gmd.de (8.11.6/8.11.6) with ESMTP id g988Jqe07033; Tue, 8 Oct 2002 10:19:52 +0200 (MEST) Date: Tue, 8 Oct 2002 10:19:52 +0200 (CEST) From: Harti Brandt To: Don Lewis Cc: nate@root.org, , , , , Subject: Re: CFR: m_tag patch In-Reply-To: <200210080606.g9866OvU034411@gw.catspoiler.org> Message-ID: <20021008100826.H77302-100000@beagle.fokus.gmd.de> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, 7 Oct 2002, Don Lewis wrote: DL>On 7 Oct, Nate Lawson wrote: DL>> On Mon, 7 Oct 2002, Julian Elischer wrote: DL> DL>>> it is just working on the principal that there is not going to be DL>>> a collision in the 32 bit space. Especially when we create them from DL>>> "time since the epoch", and when teh various authors can see each DL>>> other's choices of value. DL>> DL>> There are deterministic ways to generate them. DL>> 1. A counter -- gettag() { return tag++; } DL>> 2. A LCRG -- gettag() { return (A * tag) % n; } DL>> 3. A global registry -- "Hey, gimme a major" DL>> DL>> There are non-deterministic ways as well, i.e. hash functions and DL>> PRNGs. 
And if code can run faster than a given time source, the output of
DL>> that source or permutation thereof can produce collisions.
DL>>
DL>> What leads you towards the time-based option vs. the others, especially
DL>> the deterministic ones?
DL>
DL>Why not name them? At boot or module load time stuff the name in a
DL>table and use the table index as the 16 bit ID. Is there any reason the
DL>ID has to be the same each time the system is booted?

Well, the point is that you need a common name between netgraph nodes and
their controlling application. As it is now, this common name is the 32-bit
cookie generated by issuing "date -u +'%s'" (so no timezone problem here).

I have, for example, a user-space ILMI daemon for ATM. This daemon needs to
talk to the call control netgraph node. To do this, the header file for the
call control node contains

    #define NGM_CCATM_COOKIE 984046139

Both the netgraph node and the daemon include this file, and the daemon
addresses messages to the node by filling in the cookie into the appropriate
field in the message. The node filters messages by comparing the cookie
field with the above cookie.

If you use a dynamically generated cookie (be it a ++tags or a hash over a
string or the address of a kernel structure), both the user-space application
and the node would need to call the code that generates these cookies with
just another cookie (for example a string). So what you would do is
replace the 32-bit cookie with, for example, a string cookie.

The question is, would a string cookie reduce the probability of conflicts
on cookies? This question is rather hard to answer, because on one hand
strings may contain more bits, but people would try to use descriptive and
short cookies. I see a very high probability of two people who develop a ppp
node using the same string "ppp". This would be bad, because the actual
API they implement would for sure be different. With the above method of
choosing the 32-bit cookie, this wouldn't happen.
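The cookie check described above can be sketched in a few lines. This is a toy model only: `struct toy_msg` and both functions are hypothetical and much simpler than the real netgraph message header; only the `NGM_CCATM_COOKIE` value comes from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cookie value quoted in the message; shared through a common header. */
#define NGM_CCATM_COOKIE 984046139

/* Hypothetical, simplified message header (not the real netgraph layout). */
struct toy_msg {
    uint32_t typecookie;  /* identifies which node type's API is meant */
    uint32_t cmd;         /* command number within that API */
};

/* Sender side: the daemon stamps the shared cookie into the message. */
struct toy_msg
toy_msg_make(uint32_t cmd)
{
    struct toy_msg m = { .typecookie = NGM_CCATM_COOKIE, .cmd = cmd };
    return m;
}

/* Receiver side: the node accepts only messages carrying its own cookie. */
bool
toy_node_accepts(const struct toy_msg *m)
{
    return m->typecookie == NGM_CCATM_COOKIE;
}
```

Because both sides compile against the same constant, no lookup is needed at run time; the cost is that the constant has to be unique across all node types.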
Given that the netgraph-hype is not of the dimensions of the Java-hype two people starting to develop netgraph node at the same moment of UTC is rather improbable. The only option that would make sense would be a assigned numbers authority, but again, given the dimensions of the netgraph-hype - is it worth the effort? harti -- harti brandt, http://www.fokus.gmd.de/research/cc/cats/employees/hartmut.brandt/private brandt@fokus.gmd.de, brandt@fokus.fhg.de To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 4: 1:30 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8E12437B401 for ; Tue, 8 Oct 2002 04:01:29 -0700 (PDT) Received: from vbook.express.ru (asplinux.ru [195.133.213.194]) by mx1.FreeBSD.org (Postfix) with ESMTP id 152EF43E65 for ; Tue, 8 Oct 2002 04:01:29 -0700 (PDT) (envelope-from vova@express.ru) Received: from vova by vbook.express.ru with local (Exim 3.36 #1) id 17ys73-0001wm-00; Tue, 08 Oct 2002 15:01:17 +0400 Subject: using mem above 4Gb was: swapon some regular file From: "Vladimir B. " Grebenschikov To: Mikhail Teterin Cc: arch@FreeBSD.org In-Reply-To: <200210071630.42512.mi+mx@aldan.algebra.com> References: <200210071630.42512.mi+mx@aldan.algebra.com> Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: quoted-printable X-Mailer: Ximian Evolution 1.0.7 Date: 08 Oct 2002 15:01:16 +0400 Message-Id: <1034074876.917.23.camel@vbook.express.ru> Mime-Version: 1.0 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG =F7 Tue, 08.10.2002, =D7 00:30, Mikhail Teterin =CE=C1=D0=C9=D3=C1=CC: > Users wishing to swap onto a local regular file have to go through the > vnconfig/mdconfig gimnastics. Is that intentional? Yes. 
Maybe we need to add a new type to the md device, like "highmem", to access
memory above 4G as a memory disk, and as a consequence use it as a swap device
or as a fast /tmp/ partition or whatever?

In this case we will be able to use more than 3Gb of RAM.

> Thanks!

--
Vladimir B. Grebenschikov
vova@sw.ru, SWsoft, Inc.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Tue Oct 8 7:14:30 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3490B37B408 for ; Tue, 8 Oct 2002 07:14:28 -0700 (PDT)
Received: from mail.speakeasy.net (mail14.speakeasy.net [216.254.0.214]) by mx1.FreeBSD.org (Postfix) with ESMTP id B72EE43E42 for ; Tue, 8 Oct 2002 07:14:27 -0700 (PDT) (envelope-from jhb@FreeBSD.org)
Received: (qmail 26005 invoked from network); 8 Oct 2002 14:14:27 -0000
Received: from unknown (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) by mail14.speakeasy.net (qmail-ldap-1.03) with DES-CBC3-SHA encrypted SMTP for ; 8 Oct 2002 14:14:27 -0000
Received: from laptop.baldwin.cx (gw1.twc.weather.com [216.133.140.1]) by server.baldwin.cx (8.12.6/8.12.6) with ESMTP id g98EEPn5006134; Tue, 8 Oct 2002 10:14:25 -0400 (EDT) (envelope-from jhb@FreeBSD.org)
Message-ID:
X-Mailer: XFMail 1.5.2 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <200210080322.g983MIvU034090@gw.catspoiler.org>
Date: Tue, 08 Oct 2002 10:14:29 -0400 (EDT)
From: John Baldwin
To: Don Lewis
Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc.,
Cc: arch@FreeBSD.org, jmallett@FreeBSD.org
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

On 08-Oct-2002 Don Lewis wrote:
> On 7 Oct, John Baldwin wrote:
>>
>> On 07-Oct-2002 Don Lewis wrote:
>
>>> Probably, but the list is also modified in the exit code. All those
>>> processes that we are sending SIGKILL to are removing themselves from
>>> the list.
>>
>> Processes dying from SIGKILL that we send them aren't a problem since
>> we have already read their p_peers member before we kill them. That's
>> the point of 'nq'. The problem is that 'nq' could exit and could be
>> an invalid pointer. If a process later in the list after 'nq' died
>> that is not a problem either. Well, how about this:
>
> I missed your use of nq, even though this is a fairly common way of
> handling similar problems if there is only a single thread.
>
>> http://www.FreeBSD.org/~jhb/patches/ppeers.patch
>
> That's pretty much what I had envisioned. I have a little bit of a
> concern that funnelling a single mutex could be a bottleneck in some
> cases, but it is simple, safe, and otherwise low overhead.

Well, the mutex is only used in the RFTHREAD case most of the time.
The only time it is unconditionally acquired, it is almost immediately
released in the !RFTHREAD case.

> It looks like we've got a potential lock order reversal problem, though.
> In fork1() we grab ppeers_lock while holding a couple of PROC_LOCKs,
> while in the first part of exit1() we grab ppeers_lock before PROC_LOCK.
> My caffeine level is insufficient to judge whether P_WEXIT checking
> would save us in practice.

Bah, fixed the reversal, thanks. We still need the P_WEXIT check
in fork1() since otherwise a new peer or child could be added after
we have finished going through the entire list. Hmm, adding this
is ugly though b/c we really need to check after we acquire the
ppeers_lock and do the actual hookup. Hmm, we can move the RFTHREAD
stuff a lot earlier and then this isn't such a big deal. Ok, I've
updated the patch again. One note: I've got a question about how to
handle the error condition in that case in fork1().
I'm really starting to think that instead of returning an error, the peer process should just go ahead and call exit1() in this case since it is about to be killed anyways. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 7:35:18 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1382237B401 for ; Tue, 8 Oct 2002 07:35:17 -0700 (PDT) Received: from fed1mtao03.cox.net (fed1mtao03.cox.net [68.6.19.242]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8BD2543E65 for ; Tue, 8 Oct 2002 07:35:16 -0700 (PDT) (envelope-from dchrist@cox.net) Received: from linus ([68.4.176.221]) by fed1mtao03.cox.net (InterMail vM.5.01.04.05 201-253-122-122-105-20011231) with ESMTP id <20021008143514.BUES12156.fed1mtao03.cox.net@linus> for ; Tue, 8 Oct 2002 10:35:14 -0400 From: "David Christensen" To: Subject: Device Driver Overview Date: Tue, 8 Oct 2002 07:34:12 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.4024 Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG I'm new to FreeBSD and I'm porting a disk controller over from Linux. So far the module is loading and the controller is being initialized, so the next step is adding drives. 
I can't seem to find a good overview of how the device driver is
supposed to be structured, so I'm piecing one together to get a feel for how
things are supposed to work (I've been looking at the 3Ware IDE RAID
controller as an example under sys/twe on FreeBSD 4.6-RELEASE). Here's
what I understand so far:

1) Load the kernel module.
2) The kernel will call the device_probe method for the controller (as
   defined in the device_method_t table) for any unattached devices on the
   parent bus (PCI in this case) looking for a match.
3) If the device_probe returns true, the controller's device_attach method
   is called, which sets up the device (claiming resources, initializing
   data structures, etc).
4) During the controller's device_attach method, if any drives are found,
   they are added into the system through calls to device_add_child(),
   device_set_ivars(), and bus_generic_attach().
5) The newly created disks generate a call to the drive's device_probe
   method.
6) If the drive's device_probe returns true (which it always does for the
   3Ware controller), the drive's device_attach method is called, which
   adds the drive through calls to devstat_add_entry() and disk_create().

Assuming the above is correct (if I'm missing something or I've got
anything wrong I'd appreciate hearing about it), a couple of questions
come to mind.

1) After completing step 6, is the disk usable (accessible through its
   device node)?
2) In the cdevsw structure for the drives, what do the physread and
   physwrite entries mean?
3) How do actual read/write requests make it to the device driver?
David Christensen

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Tue Oct 8 7:57:19 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5359E37B401 for ; Tue, 8 Oct 2002 07:57:18 -0700 (PDT)
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 67BE743E6A for ; Tue, 8 Oct 2002 07:57:17 -0700 (PDT) (envelope-from phk@critter.freebsd.dk)
Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.12.6/8.12.6) with ESMTP id g98Ev5pS028319; Tue, 8 Oct 2002 16:57:06 +0200 (CEST) (envelope-from phk@critter.freebsd.dk)
To: "David Christensen"
Cc: arch@FreeBSD.ORG
Subject: Re: Device Driver Overview
In-Reply-To: Your message of "Tue, 08 Oct 2002 07:34:12 PDT."
Date: Tue, 08 Oct 2002 16:57:05 +0200
Message-ID: <28318.1034089025@critter.freebsd.dk>
From: Poul-Henning Kamp
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

In message , "David Christensen" writes:

>1) After completing step 6, is the disk usable (accessible through its
>device node)?

>2) In the cdevsw structure for the drives, what do the physread and
>physwrite entries mean?

They point to generic routines which will do some tedious work and
call the driver back through the strategy routine.

>3) How do actual read/write requests make it to the device driver?

Which is the answer to this question: through the strategy routine.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
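The callback relationship in that answer can be illustrated with a toy userland model: generic read/write helpers do the bookkeeping, build a request, and hand it to the single entry point the driver supplies. All names here (`toy_req`, `toy_dev`, `toy_physread`, `toy_physwrite`, `toy_ramdisk_strategy`) are hypothetical; this is not the kernel's physread()/physwrite() or struct buf, only a sketch of the shape of the call path.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical I/O request handed to the driver (not the kernel struct buf). */
struct toy_req {
    int     is_write;  /* nonzero for a write, zero for a read */
    size_t  offset;    /* byte offset on the device */
    size_t  length;    /* number of bytes to transfer */
    char   *data;      /* caller's buffer */
    int     error;     /* set by the driver: 0 on success */
};

/* The one entry point a toy driver provides. */
typedef void (*toy_strategy_t)(struct toy_req *);

struct toy_dev {
    toy_strategy_t strategy;
};

/* Generic helpers: build the request, then call the driver back. */
int
toy_physread(struct toy_dev *dev, size_t off, char *buf, size_t len)
{
    struct toy_req req = { 0, off, len, buf, 0 };

    dev->strategy(&req);
    return req.error;
}

int
toy_physwrite(struct toy_dev *dev, size_t off, char *buf, size_t len)
{
    struct toy_req req = { 1, off, len, buf, 0 };

    dev->strategy(&req);
    return req.error;
}

/* Example driver: a 64-byte RAM-backed "disk". */
static char toy_disk[64];

void
toy_ramdisk_strategy(struct toy_req *req)
{
    /* Reject transfers that would run past the end of the device. */
    if (req->offset > sizeof(toy_disk) ||
        req->length > sizeof(toy_disk) - req->offset) {
        req->error = 1;
        return;
    }
    if (req->is_write)
        memcpy(toy_disk + req->offset, req->data, req->length);
    else
        memcpy(req->data, toy_disk + req->offset, req->length);
    req->error = 0;
}
```

The driver never sees the read or write call itself; it only sees requests arriving at its strategy function, which is why both questions share one answer.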
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 8:19:18 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7CDBB37B401 for ; Tue, 8 Oct 2002 08:19:15 -0700 (PDT) Received: from snipe.mail.pas.earthlink.net (snipe.mail.pas.earthlink.net [207.217.120.62]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1106843E75 for ; Tue, 8 Oct 2002 08:19:15 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0142.cvx22-bradley.dialup.earthlink.net ([209.179.198.142] helo=mindspring.com) by snipe.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17yw8M-0007nV-00; Tue, 08 Oct 2002 08:18:54 -0700 Message-ID: <3DA2F716.C5B69C7C@mindspring.com> Date: Tue, 08 Oct 2002 08:17:42 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: "Vladimir B. Grebenschikov" Cc: Mikhail Teterin , arch@FreeBSD.org Subject: Re: using mem above 4Gb was: swapon some regular file References: <200210071630.42512.mi+mx@aldan.algebra.com> <1034074876.917.23.camel@vbook.express.ru> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG "Vladimir B. Grebenschikov" wrote: > May be we need add new type to md device, like "highmem", to access > memory above 4G as memory disk, and as consequence use it as swap-device > or as fast /tmp/ partition or whatever ? > > In this case we will be able to use more than 3Gb of RAM. We can use 4G now. But: KVA + UVA + window <= 4G ...so by adding a window in which to access the extra memory above 4G, you actually *reduce* the amount of RAM available to either each process, or the kernel, or both. 
A RAM disk is probably not worth doing this for; the place most people are
bumping their head is the UVA (data space for the process itself)
or KVA (data space for mbufs, mappings for pages, etc.).

For example, if you have 4G of RAM, to support a large number of
network connections, you have to spend ~2G on mbufs, which means
spending 1G on mappings and other kernel structures, leaving only
1G for UVA.

That means that, in order to get your RAM disk, you have to either
further reduce the size of your server processes, or you have to
reduce the number of connections you will be able to support
simultaneously.

Example:

    64k simultaneous connections * 32k window per connection
    = 2G of mbufs

...say you overcommit this memory by a factor of 4; you are still
only talking a quarter of a million connections.

If you hack all your kernel allocations to use the minimum amount
possible, and pare down your structures to get rid of kevent
pointers that you don't use, and other things you don't use, you
can steal some of the 1G KVA for more mbufs. Then, if you hack
the TCP stack window management code rather significantly (e.g. drop
the average window to 4k), then you can push 1,000,000 connections.

That leaves you about 512 bytes of context per connection in the
user space application.

The best I've ever done is 1.6 million simultaneous connections;
to do that, I had to drop space out of a lot of structures (64
bytes for 1,000,000 connections is 64M of RAM -- not insignificant).

So whatever connections you are getting now... halve that, or less,
to get a window for your RAM disk (you will need KVA for mappings
for all the memory that *can* be in the window, etc.).

It's not really worth using it directly.
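The back-of-the-envelope figures above can be rechecked with a couple of helper functions. This is only a sketch of the arithmetic; the function names are hypothetical, "k" is taken to mean 1024, and the overcommit model (simply multiplying the connection count) is an assumption about what the message intends.

```c
#include <stdint.h>

/* Bytes of buffer space if every connection holds window_bytes at once. */
uint64_t
window_bytes(uint64_t connections, uint64_t window)
{
    return connections * window;
}

/*
 * Connections that fit in budget bytes at a given window size, scaled by
 * an overcommit factor (assumed: only 1/overcommit of windows are full).
 */
uint64_t
max_connections(uint64_t budget, uint64_t window, uint64_t overcommit)
{
    return budget / window * overcommit;
}
```

For instance, `window_bytes(65536, 32768)` is 2^31 bytes (the 2G of mbufs), `max_connections(2147483648, 32768, 4)` is 262144 (the "quarter of a million"), and `window_bytes(1000000, 64)` is 64,000,000 bytes (the 64M cost of 64 bytes per connection).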
On the other hand, if you could allocate pools of memory for per processor use, you basically gain most of that overhead back -- though, without TCP/IP stack changes and interrupt processing changes, you can't use the regained memory for, e.g., mbufs, because the way things stand now, mbufs have to be visible at: o DMA o IRQ o NETISR o Application ...and you can't guarantee a nice clean division, because you don't route interrupts to a particular CPU, or have connections to sockets in a particular CPU's address space, or have your applications running on a particular CPU, so that the CPU can have a seperate address space, so you don't have to worry about migration, etc.. So for an extremely high capacity box, you will have to do tricks, like logically splitting the box into seperate virtual machines, and seperating out the code path from the network card, all the way to the application. Just like we were discussing earlier. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 8:42:49 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9CFAD37B401 for ; Tue, 8 Oct 2002 08:42:48 -0700 (PDT) Received: from avocet.mail.pas.earthlink.net (avocet.mail.pas.earthlink.net [207.217.120.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 151FE43E3B for ; Tue, 8 Oct 2002 08:42:44 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0142.cvx22-bradley.dialup.earthlink.net ([209.179.198.142] helo=mindspring.com) by avocet.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17ywVC-00078Z-00; Tue, 08 Oct 2002 08:42:31 -0700 Message-ID: <3DA2FC9F.22C66877@mindspring.com> Date: Tue, 08 Oct 2002 08:41:19 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: "Vladimir B. 
Grebenschikov" , Mikhail Teterin , arch@FreeBSD.org Subject: Re: using mem above 4Gb was: swapon some regular file References: <200210071630.42512.mi+mx@aldan.algebra.com> <1034074876.917.23.camel@vbook.express.ru> <3DA2F716.C5B69C7C@mindspring.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Terry Lambert wrote: > So whatever connections you are getting now... halve that, or less, > to get a window for your RAM disk (you will need KVA for mappings > for all the memory that *can* be in the window, etc.). To emphasize this: if you are using 4K pages, you will need: 4K/1M * 64G = 256M ...1/4 of 1G of memory outside the window, just for page tables. Also, if we still were using an mbuf per connection for the template, for 1,000,000 connections, that's 256M of RAM -- another 1/4 gig. Yeah, most people don't think in these terms; personally, I like to call it "Extreme BSD". 8-). 
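Page-table estimates of this kind depend on the size of each page table entry, which the message does not state. A small sketch for redoing the estimate under different assumptions follows; the function names are hypothetical, and the entry sizes one might try (4 bytes for classic 32-bit entries, 8 bytes for PAE entries) are assumptions rather than figures from the thread. Only last-level entries are counted, one per mapped page.

```c
#include <stdint.h>

/* Number of pages needed to cover mem_bytes with pages of page_bytes. */
uint64_t
page_count(uint64_t mem_bytes, uint64_t page_bytes)
{
    return mem_bytes / page_bytes;
}

/*
 * Bytes of last-level page table entries needed to map mem_bytes,
 * assuming exactly one entry of entry_bytes per page.
 */
uint64_t
page_table_bytes(uint64_t mem_bytes, uint64_t page_bytes, uint64_t entry_bytes)
{
    return page_count(mem_bytes, page_bytes) * entry_bytes;
}
```

The result scales linearly with the entry size, so the choice of entry format matters as much as the amount of memory being mapped.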
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 8:48:28 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1C0A337B407 for ; Tue, 8 Oct 2002 08:48:27 -0700 (PDT) Received: from hawk.mail.pas.earthlink.net (hawk.mail.pas.earthlink.net [207.217.120.22]) by mx1.FreeBSD.org (Postfix) with ESMTP id C38CD43E4A for ; Tue, 8 Oct 2002 08:48:16 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0142.cvx22-bradley.dialup.earthlink.net ([209.179.198.142] helo=mindspring.com) by hawk.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17ywaR-0006uF-00; Tue, 08 Oct 2002 08:47:55 -0700 Message-ID: <3DA2FDE4.9A485E35@mindspring.com> Date: Tue, 08 Oct 2002 08:46:44 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: "Vladimir B. Grebenschikov" , Mikhail Teterin , arch@FreeBSD.org Subject: Re: using mem above 4Gb was: swapon some regular file References: <200210071630.42512.mi+mx@aldan.algebra.com> <1034074876.917.23.camel@vbook.express.ru> <3DA2F716.C5B69C7C@mindspring.com> <3DA2FC9F.22C66877@mindspring.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Oops. Divided by 4 on the page tables. Still a hell of a lot of your available memory. -- Terry Terry Lambert wrote: > > Terry Lambert wrote: > > So whatever connections you are getting now... halve that, or less, > > to get a window for your RAM disk (you will need KVA for mappings > > for all the memory that *can* be in the window, etc.). 
> > To emphasize this: if you are using 4K pages, you will need: > > 4K/1M * 64G = 256M > > ...1/4 of 1G of memory outside the window, just for page tables. > > Also, if we still were using an mbuf per connection for the > template, for 1,000,000 connections, that's 256M of RAM -- another > 1/4 gig. > > Yeah, most people don't think in these terms; personally, I like > to call it "Extreme BSD". 8-). > > -- Terry > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-arch" in the body of the message To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 9:11:33 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1037F37B401 for ; Tue, 8 Oct 2002 09:11:32 -0700 (PDT) Received: from corbulon.video-collage.com (corbulon.video-collage.com [64.35.99.179]) by mx1.FreeBSD.org (Postfix) with ESMTP id A7D7743E3B for ; Tue, 8 Oct 2002 09:11:27 -0700 (PDT) (envelope-from mi+mx@aldan.algebra.com) Received: from misha.murex.com (250-217.customer.cloud9.net [168.100.250.217]) by corbulon.video-collage.com (8.12.2/8.12.2) with ESMTP id g98Fmv1P093751 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=FAIL); Tue, 8 Oct 2002 11:48:57 -0400 (EDT) (envelope-from mi+mx@aldan.algebra.com) X-Authentication-Warning: corbulon.video-collage.com: Host 250-217.customer.cloud9.net [168.100.250.217] claimed to be misha.murex.com Content-Type: text/plain; charset="iso-8859-1" From: Mikhail Teterin Organization: Virtual Estates, Inc. To: Terry Lambert , "Vladimir B. 
Grebenschikov" , arch@FreeBSD.org Subject: Re: using mem above 4Gb was: swapon some regular file Date: Tue, 8 Oct 2002 11:50:47 -0400 User-Agent: KMail/1.4.3 References: <200210071630.42512.mi+mx@aldan.algebra.com> <3DA2F716.C5B69C7C@mindspring.com> <3DA2FC9F.22C66877@mindspring.com> In-Reply-To: <3DA2FC9F.22C66877@mindspring.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Message-Id: <200210081150.47943.mi+mx@aldan.algebra.com> X-Scanned-By: MIMEDefang 2.15 (www dot roaringpenguin dot com slash mimedefang) Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Tuesday 08 October 2002 11:41 am, Terry Lambert wrote: = Terry Lambert wrote: = > So whatever connections you are getting now... halve that, or less, = > to get a window for your RAM disk (you will need KVA for mappings = > for all the memory that *can* be in the window, etc.). = = To emphasize this: if you are using 4K pages, you will need: = = 4K/1M * 64G = 256M = = ...1/4 of 1G of memory outside the window, just for page tables. = = Also, if we still were using an mbuf per connection for the = template, for 1,000,000 connections, that's 256M of RAM -- another = 1/4 gig. = Yeah, most people don't think in these terms; personally, I like = to call it "Extreme BSD". 8-). Although this is fascinating read -- it getting further and further away from the original subject. And from the modified one too -- I don't believe Vladimir said anything about networking... 
-mi To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 9:24:13 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E318A37B404 for ; Tue, 8 Oct 2002 09:24:11 -0700 (PDT) Received: from vbook.express.ru (asplinux.ru [195.133.213.194]) by mx1.FreeBSD.org (Postfix) with ESMTP id 447E143E4A for ; Tue, 8 Oct 2002 09:24:11 -0700 (PDT) (envelope-from vova@express.ru) Received: from vova by vbook.express.ru with local (Exim 3.36 #1) id 17yx9K-0000Hk-00; Tue, 08 Oct 2002 20:23:58 +0400 Subject: Re: using mem above 4Gb was: swapon some regular file From: "Vladimir B. " Grebenschikov To: Mikhail Teterin Cc: Terry Lambert , arch@FreeBSD.org In-Reply-To: <200210081150.47943.mi+mx@aldan.algebra.com> References: <200210071630.42512.mi+mx@aldan.algebra.com> <3DA2F716.C5B69C7C@mindspring.com> <3DA2FC9F.22C66877@mindspring.com> <200210081150.47943.mi+mx@aldan.algebra.com> Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: quoted-printable X-Mailer: Ximian Evolution 1.0.7 Date: 08 Oct 2002 20:23:58 +0400 Message-Id: <1034094238.899.14.camel@vbook.express.ru> Mime-Version: 1.0 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG =F7 Tue, 08.10.2002, =D7 19:50, Mikhail Teterin =CE=C1=D0=C9=D3=C1=CC: > On Tuesday 08 October 2002 11:41 am, Terry Lambert wrote: > =3D Terry Lambert wrote: > =3D > So whatever connections you are getting now... halve that, or less, > =3D > to get a window for your RAM disk (you will need KVA for mappings > =3D > for all the memory that *can* be in the window, etc.). 
> =
> = To emphasize this: if you are using 4K pages, you will need:
> =
> =     4K/1M * 64G = 256M
> =
> = ...1/4 of 1G of memory outside the window, just for page tables.
> =
> = Also, if we still were using an mbuf per connection for the
> = template, for 1,000,000 connections, that's 256M of RAM -- another
> = 1/4 gig.
>
> = Yeah, most people don't think in these terms; personally, I like
> = to call it "Extreme BSD". 8-).
>
> Although this is fascinating read -- it getting further and further away
> from the original subject. And from the modified one too -- I don't
> believe Vladimir said anything about networking...

Exactly. Terry is right about a large number of relatively small
network-access processes (say, Apaches). But there are other cases:
say you have a DB server with a huge index, say 10Gb; I think keeping
the index in RAM is more effective than keeping it on disk.

Actually, the question is the density of KVA consumption per Mb of RAM used.

> -mi

--
Vladimir B. Grebenschikov
vova@sw.ru, SWsoft, Inc.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 11:25:45 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9526C37B401 for ; Tue, 8 Oct 2002 11:25:44 -0700 (PDT) Received: from rootlabs.com (root.org [67.118.192.226]) by mx1.FreeBSD.org (Postfix) with SMTP id A180143E8A for ; Tue, 8 Oct 2002 11:25:40 -0700 (PDT) (envelope-from nate@rootlabs.com) Received: (qmail 11393 invoked by uid 1000); 8 Oct 2002 18:25:41 -0000 Date: Tue, 8 Oct 2002 11:25:41 -0700 (PDT) From: Nate Lawson To: Terry Lambert Cc: freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG Subject: Re: CFR: m_tag patch In-Reply-To: <3DA28F08.658274C3@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Tue, 8 Oct 2002, Terry Lambert wrote: > I've often thought that an interning process for atoms was actually > a good idea for the kernel. > > The place I first thought of using this approach is for FS ID's. As > things currently sit, there are header files that need to be hacked > to add new members to (theoretically) anonymous classes of objects. > One of the most egregious files in this regard is vnode.h, for the > enumerated type values in 'vtype' and in 'vtagtype'. Well, I killed vtagtype in -current. > As an example, copy the NULLFS code to "FOOFS" instead, do all the > name replacement in it, and see what breaks and/or wht gets accounted > incorrectly. 8-(. 
> > Among other things, if you could intern them, and then enumerate them, > based on defined classes, you could get rid of things like the > socket protocol family crap, and most of the places where you end > up pushing strings in API's across the user/kernel boundary. IMO, > most strings you push across should be considered const members of > range restricted sets. > > This is happy, in that it would work for the netgraph API ID code, > as well (you look up a value by looking up the atom in a class > table to get an ID, and then pass the ID around. > > I think the ID should be opaque, but you could call it a 16 or 32 > bit calue, if you wanted to insist that it not be a pointer. I am really against using an extremely large space (32 bits) in a sparse manner just because the algorithm is non-deterministic. The only exception is when the system may be attacked (i.e. a cryptographic hash function). In this situation, all players are cooperating but may make mistakes or be lazy if the system is difficult. Whatever system is chosen, there is no reason to make the algorithm non-deterministic. -Nate To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 12:10:40 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 313F137B401 for ; Tue, 8 Oct 2002 12:10:39 -0700 (PDT) Received: from rootlabs.com (root.org [67.118.192.226]) by mx1.FreeBSD.org (Postfix) with SMTP id CB02D43E6A for ; Tue, 8 Oct 2002 12:10:38 -0700 (PDT) (envelope-from nate@rootlabs.com) Received: (qmail 11470 invoked by uid 1000); 8 Oct 2002 19:10:39 -0000 Date: Tue, 8 Oct 2002 12:10:39 -0700 (PDT) From: Nate Lawson To: "Vladimir B. 
Grebenschikov"
Cc: arch@FreeBSD.org
Subject: Re: using mem above 4Gb was: swapon some regular file
In-Reply-To: <1034094238.899.14.camel@vbook.express.ru>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=X-UNKNOWN
Content-Transfer-Encoding: 8BIT
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

On 8 Oct 2002, Vladimir B. Grebenschikov wrote:
> On Tue, 08.10.2002, at 19:50, Mikhail Teterin wrote:
> > On Tuesday 08 October 2002 11:41 am, Terry Lambert wrote:
> > = Terry Lambert wrote:
> > = > So whatever connections you are getting now... halve that, or less,
> > = > to get a window for your RAM disk (you will need KVA for mappings
> > = > for all the memory that *can* be in the window, etc.).
> > =
> > = To emphasize this: if you are using 4K pages, you will need:
> > =
> > = 4K/1M * 64G = 256M
> > =
> > = ...1/4 of 1G of memory outside the window, just for page tables.
> > =
> > = Also, if we still were using an mbuf per connection for the
> > = template, for 1,000,000 connections, that's 256M of RAM -- another
> > = 1/4 gig.
> >
> > = Yeah, most people don't think in these terms; personally, I like
> > = to call it "Extreme BSD". 8-).
> >
> > Although this is fascinating read -- it getting further and further away
> > from the original subject. And from the modified one too -- I don't
> > believe Vladimir said anything about networking...
>
> Exactly, Terry is right about large number of relative-small
> network-access processes (say apaches). But there are some other cases,
> say you have some DB server with huge index, say 10Gb, I think keep
> index in RAM effective than on disk.

It's often surprisingly effective to just access the index on disk and
tune your VM cache instead. You can lose performance by double-caching
data.

-Nate

You teach a child to read and he or her will be able to pass a literacy test.
-- President Bush, 2/21/2001 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 12:16:21 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BD34B37B401 for ; Tue, 8 Oct 2002 12:16:19 -0700 (PDT) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9DF5143E77 for ; Tue, 8 Oct 2002 12:16:15 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Received: from mousie.catspoiler.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.5/8.12.5) with ESMTP id g98JG2vU036242; Tue, 8 Oct 2002 12:16:07 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Message-Id: <200210081916.g98JG2vU036242@gw.catspoiler.org> Date: Tue, 8 Oct 2002 12:16:02 -0700 (PDT) From: Don Lewis Subject: Re: CFR: m_tag patch To: brandt@fokus.gmd.de Cc: nate@root.org, julian@elischer.org, tlambert2@mindspring.com, sam@errno.com, freebsd-arch@FreeBSD.ORG, freebsd-net@FreeBSD.ORG In-Reply-To: <20021008100826.H77302-100000@beagle.fokus.gmd.de> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 8 Oct, Harti Brandt wrote: > On Mon, 7 Oct 2002, Don Lewis wrote: > DL>Why not name them? At boot or module load time stuff the name in a > DL>table and use the table index as the 16 bit ID. Is there any reason the > DL>ID has to be the same each time the system is booted? > > Well, the point is that you need a common name between netgraph nodes and > their controling application. As it is now this common name is the 32-bit > cookie generated by issuing "date -u +'%s'" (so no timezone problem here). > > I have, for example, an user space ILMI daemon for ATM. 
This daemon needs > to talk to the call control netgraph node. To do this the header file for > the call control node contains > > #define NGM_CCATM_COOKIE 984046139 > > both, the netgraph node and the daemon include this file and the daemon > addresses messages to the node by filling in the cookie into the > appropriate field in the message. The node filters out the messages by > compare the cookie field with the above cookie. In this implementation you also have to worry about collisions in the include file name, possibly the #define name, as well as the actual cookie. > If you use a dynamically generated cookie (be it a ++tags or a hash over a > string or the address of a kernel structure) both the user space > application and the node would need to call the code that generates these > cookies with just another cookie (for example a string). So what you would > do is to replace the 32-bit cookie with, for example, a string cookie. > > The question is, would a string cookie reduce the probability of conflicts > on cookies? This question is rather hard to answer, because on one hand > strings may contain more bits, but people would try to use descriptive and > short cookie. I see a very high probability of two people that develop a > ppp node to use the same string "ppp". This would be bad, because the > actual API they implement would for sure be different. With the above > method to choose the 32-bit cookie, this wouldn't happen. Given that the > netgraph-hype is not of the dimensions of the Java-hype two people > starting to develop netgraph node at the same moment of UTC is rather > improbable. Pick a convention for generating the string cookies: *printf(..., "netgraphid_%08x", 32bitnetgraphid) netgraph_brandt_ccatm_v1.234 netgraph_brandt_ccatm_`date -u` All of these allow different versions to simultaneously coexist. 
In the latter two examples if the API is rich enough and the proper naming convention is chosen, a client could even look to see if a "close enough" version is already installed. I see the problem of arranging the rendezvous between the user and kernel parts as totally separate from the tag that finally gets tacked onto each packet. The latter only has to be unique for the system uptime. > The only option that would make sense would be a assigned numbers > authority, but again, given the dimensions of the netgraph-hype - is it > worth the effort? If the proper naming convention is chosen, each author can have his own name space to play in, so no central authority is needed other than to allocate author names. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 12:40: 2 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BD38A37B401 for ; Tue, 8 Oct 2002 12:39:59 -0700 (PDT) Received: from vbook.express.ru (asplinux.ru [195.133.213.194]) by mx1.FreeBSD.org (Postfix) with ESMTP id 288C343E6E for ; Tue, 8 Oct 2002 12:39:59 -0700 (PDT) (envelope-from vova@express.ru) Received: from vova by vbook.express.ru with local (Exim 3.36 #1) id 17z0Cw-0000QX-00; Tue, 08 Oct 2002 23:39:54 +0400 Subject: Re: using mem above 4Gb was: swapon some regular file From: "Vladimir B. 
" Grebenschikov To: Nate Lawson Cc: arch@FreeBSD.org In-Reply-To: References: Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: quoted-printable X-Mailer: Ximian Evolution 1.0.7 Date: 08 Oct 2002 23:39:52 +0400 Message-Id: <1034105993.913.1.camel@vbook.express.ru> Mime-Version: 1.0 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG =F7 Tue, 08.10.2002, =D7 23:10, Nate Lawson =CE=C1=D0=C9=D3=C1=CC: > On 8 Oct 2002, Vladimir B. Grebenschikov wrote: > > =F7 Tue, 08.10.2002, =D7 19:50, Mikhail Teterin =CE=C1=D0=C9=D3=C1=CC: > > > On Tuesday 08 October 2002 11:41 am, Terry Lambert wrote: > > > =3D Terry Lambert wrote: > > > =3D > So whatever connections you are getting now... halve that, or l= ess, > > > =3D > to get a window for your RAM disk (you will need KVA for mappin= gs > > > =3D > for all the memory that *can* be in the window, etc.). > > > =3D=20 > > > =3D To emphasize this: if you are using 4K pages, you will need: > > > =3D=20 > > > =3D 4K/1M * 64G =3D 256M > > > =3D=20 > > > =3D ...1/4 of 1G of memory outside the window, just for page tables. > > > =3D=20 > > > =3D Also, if we still were using an mbuf per connection for the > > > =3D template, for 1,000,000 connections, that's 256M of RAM -- anothe= r > > > =3D 1/4 gig. > > > =20 > > > =3D Yeah, most people don't think in these terms; personally, I like > > > =3D to call it "Extreme BSD". 8-). > > >=20 > > > Although this is fascinating read -- it getting further and further a= way > > > from the original subject. And from the modified one too -- I don't > > > believe Vladimir said anything about networking... > >=20 > > Exactly, Terry is right about large number of relative-small > > network-access processes (say apaches). But there are some other cases, > > say you have some DB server with huge index, say 10Gb, I think keep > > index in RAM effective than on disk. 
>
> It's often surprisingly effective to just access the index on disk and
> tune your VM cache instead. You can lose performance by double-caching
> data.

I don't want cache disk data in extra memory - simply store index in RAM
(no disk access at all) - I think it must be faster.

> -Nate

--
Vladimir B. Grebenschikov
vova@sw.ru, SWsoft, Inc.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Tue Oct 8 13:15:17 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EDF1737B401 for ; Tue, 8 Oct 2002 13:15:15 -0700 (PDT)
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7D4E543E6A for ; Tue, 8 Oct 2002 13:15:15 -0700 (PDT) (envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.5/8.12.4) with ESMTP id g98KFFPQ084626; Tue, 8 Oct 2002 13:15:15 -0700 (PDT) (envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost) by apollo.backplane.com (8.12.5/8.12.4/Submit) id g98KFFrq084625; Tue, 8 Oct 2002 13:15:15 -0700 (PDT) (envelope-from dillon)
Date: Tue, 8 Oct 2002 13:15:15 -0700 (PDT)
From: Matthew Dillon
Message-Id: <200210082015.g98KFFrq084625@apollo.backplane.com>
To: "Vladimir B. " Grebenschikov
Cc: Nate Lawson , arch@FreeBSD.ORG
Subject: Database indexes and ram (was Re: using mem above 4Gb was: swapon some regular file)
References: <1034105993.913.1.camel@vbook.express.ru>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

:..
:> It's often surprisingly effective to just access the index on disk and
:> tune your VM cache instead. You can lose performance by double-caching
:> data.
: :I don't want cache disk data in extra memory - simply store index in RAM :(no disk access at all) - I think it must be faster. : :> -Nate : :-- :Vladimir B. Grebenschikov :vova@sw.ru, SWsoft, Inc. If you have enough ram to hold the index, copying the index into anonymous memory will be no slower or faster then mmap()ing it into ram. If you do not have enough ram to hold the index then trying to store it in ram won't work. Database indexes, e.g. typically B+Trees or similar entities, are highly cacheable and designed to reduce the number of seek/reads required to do a lookup as much as possible. This tends to result in fairly good matching between our VM system and a fairly optimal caching of the index. For example, take a B+Tree with 64 elements per node and a database with 16 million records in it. 16 million records can be represented by four levels in the B+Tree. The first three levels (64*64* 64*sizeof(btreeelm)) = 262144 * sizeof(btreeelm), or, typically, less then 16 MB of data which the VM system will cache at a high priority due to the frequency of accesses. The last B+Tree level in this example represents the only seek/read that would have to occur on the disk (if you didn't have enough memory to hold the entire index). The only *PROBLEM* with using mmap() is that the database will not have a very good idea about whether a particular mapped memory location is resident or whether it will stall the process while doing a disk read, which can seriously impact multi-threaded access to the database. madvise() and mincore() can be used to some effect but that still means making system calls that one would rather not have to make. Still, mmap() can be used to good effect and I usually find it easier to use then having to write a userland shared memory disk cache manager. 
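The level arithmetic in the B+Tree example above can be checked with a short sketch. The helper names are hypothetical, and the 32-byte element size used in the note after the code is an assumption chosen only to stay under the "less than 16 MB" figure; the real sizeof(btreeelm) is not given in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Number of levels needed so that fanout^levels >= nrecords. */
static unsigned
btree_levels(uint64_t nrecords, uint64_t fanout)
{
	unsigned levels = 0;
	uint64_t capacity = 1;

	assert(fanout >= 2);
	while (capacity < nrecords) {
		levels++;
		if (capacity > UINT64_MAX / fanout)
			break;	/* next level already exceeds 64 bits */
		capacity *= fanout;
	}
	return (levels);
}

/* Elements at a given level (1 = root level), i.e. fanout^level. */
static uint64_t
btree_elems_at_level(uint64_t fanout, unsigned level)
{
	uint64_t n = 1;

	while (level-- > 0)
		n *= fanout;
	return (n);
}

/* Bytes needed to keep one whole level resident. */
static uint64_t
btree_level_bytes(uint64_t fanout, unsigned level, uint64_t elem_size)
{
	return (btree_elems_at_level(fanout, level) * elem_size);
}
```

With a fanout of 64, 16,777,216 records need four levels, and the third level holds 64*64*64 = 262144 elements, or 8 MB at an assumed 32 bytes each. Levels one and two add only 64 + 4096 more elements, so they barely change the total.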
-Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 13:31:11 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3BC0F37B401 for ; Tue, 8 Oct 2002 13:31:09 -0700 (PDT) Received: from vbook.express.ru (asplinux.ru [195.133.213.194]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9B30E43E77 for ; Tue, 8 Oct 2002 13:31:08 -0700 (PDT) (envelope-from vova@express.ru) Received: from vova by vbook.express.ru with local (Exim 3.36 #1) id 17z10H-0000zm-00; Wed, 09 Oct 2002 00:30:53 +0400 Subject: Re: Database indexes and ram (was Re: using mem above 4Gb was: swapon some regular file) From: "Vladimir B. " Grebenschikov To: Matthew Dillon Cc: Nate Lawson , arch@FreeBSD.ORG In-Reply-To: <200210082015.g98KFFrq084625@apollo.backplane.com> References: <1034105993.913.1.camel@vbook.express.ru> <200210082015.g98KFFrq084625@apollo.backplane.com> Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: quoted-printable X-Mailer: Ximian Evolution 1.0.7 Date: 09 Oct 2002 00:30:52 +0400 Message-Id: <1034109053.913.7.camel@vbook.express.ru> Mime-Version: 1.0 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG =F7 Wed, 09.10.2002, =D7 00:15, Matthew Dillon =CE=C1=D0=C9=D3=C1=CC: > :.. > :> It's often surprisingly effective to just access the index on disk and > :> tune your VM cache instead. You can lose performance by double-cachin= g > :> data. > : > :I don't want cache disk data in extra memory - simply store index in RAM > :(no disk access at all) - I think it must be faster. > : > :> -Nate > : > :--=20 > :Vladimir B. Grebenschikov > :vova@sw.ru, SWsoft, Inc. 
>
> If you have enough ram to hold the index, copying the index into
> anonymous memory will be no slower or faster then mmap()ing it into ram.
>
> If you do not have enough ram to hold the index then trying to store
> it in ram won't work.

Matthew, please look at my initial posting. My idea is to extend the RAM
available for storing things such as an index above the 4Gb (actually
about 3Gb) limit, if there is more physical RAM. The current mmap (read: VM)
implementation will map/cache only in memory below 4Gb, regardless of the
amount of physical RAM.

> Database indexes, e.g. typically B+Trees or similar entities, are
> highly cacheable and designed to reduce the number of seek/reads
> required to do a lookup as much as possible. This tends to result
> in fairly good matching between our VM system and a fairly optimal
> caching of the index.
>
> For example, take a B+Tree with 64 elements per node and a database with
> 16 million records in it. 16 million records can be represented by
> four levels in the B+Tree. The first three levels (64*64*
> 64*sizeof(btreeelm)) = 262144 * sizeof(btreeelm), or, typically,
> less then 16 MB of data which the VM system will cache at a high
> priority due to the frequency of accesses. The last B+Tree level in
> this example represents the only seek/read that would have to occur on
> the disk (if you didn't have enough memory to hold the entire index).
>
> The only *PROBLEM* with using mmap() is that the database will not have
> a very good idea about whether a particular mapped memory location is
> resident or whether it will stall the process while doing a disk read,
> which can seriously impact multi-threaded access to the database.
> madvise() and mincore() can be used to some effect but that still means
> making system calls that one would rather not have to make. Still,
> mmap() can be used to good effect and I usually find it easier to use
> then having to write a userland shared memory disk cache manager.
Agree, but see above.

> -Matt

--
Vladimir B. Grebenschikov
vova@sw.ru, SWsoft, Inc.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Tue Oct 8 13:51:46 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9B12937B401 for ; Tue, 8 Oct 2002 13:51:45 -0700 (PDT)
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3B3DD43E4A for ; Tue, 8 Oct 2002 13:51:45 -0700 (PDT) (envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.5/8.12.4) with ESMTP id g98KpjPQ084794; Tue, 8 Oct 2002 13:51:45 -0700 (PDT) (envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost) by apollo.backplane.com (8.12.5/8.12.4/Submit) id g98KpjU1084793; Tue, 8 Oct 2002 13:51:45 -0700 (PDT) (envelope-from dillon)
Date: Tue, 8 Oct 2002 13:51:45 -0700 (PDT)
From: Matthew Dillon
Message-Id: <200210082051.g98KpjU1084793@apollo.backplane.com>
To: "Vladimir B. " Grebenschikov
Cc: Nate Lawson , arch@FreeBSD.ORG
Subject: Re: Database indexes and ram (was Re: using mem above 4Gb was: swapon some regular file)
References: <1034105993.913.1.camel@vbook.express.ru> <200210082015.g98KFFrq084625@apollo.backplane.com> <1034109053.913.7.camel@vbook.express.ru>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

:Matthew, please look at my initial posting. My idea is to extend the RAM
:available for storing things such as an index above the 4Gb (actually
:about 3Gb) limit, if there is more physical RAM. The current mmap (read: VM)
:implementation will map/cache only in memory below 4Gb, regardless of the
:amount of physical RAM.

Well, this has been discussed before.
The issue with accessing ram over 4GB, apart from the fact that the page tables double in size (you have to use 64 bit pte's instead of 32 bit pte's) is that DMAing to/from memory above 4GB can be rather tricky. This creates all sorts of problem including not necessarily being able to read() or write() above the 4G mark (in regards to physical ram) without a lot of mess in the OS .. bounce buffers redux, so to speak. So while it would be possible use such memory as unswappable, unIOable anonymous-only memory, such use would be fairly limited and might not be worth implementing for a 32 bit platform. At that point you might as well move to a 64 bit platform. It also might be more effective to spend that money on more ram for the RAID system backing the database rather then trying to bump the PC past the 4G mark, or spend that money on purchasing a second server and distributing the load across the two servers. The types of accesses to the index that might result in cacheable table data are also the types of accesses to the index that will likely result in cacheable index data. Using the same argument, the types of accesses that might result in an uncacheable index would also likely result in uncacheable table data which means you are going to run up against seek/read problems on the table data, making it more worthwhile to spend the money on beefing up the storage subsystem. 
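A rough sketch of the page-table cost mentioned above, assuming 4K pages and counting only the leaf entries needed to map each physical page once. The function name pte_bytes() is hypothetical, and a real kernel needs more than this (page directories, multiple mappings of the same page, per-process tables).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of leaf page-table entries needed to map ram_bytes exactly once,
 * given the page size and the size of one entry (4 bytes for a 32-bit
 * pte, 8 bytes for a 64-bit pte).
 */
static uint64_t
pte_bytes(uint64_t ram_bytes, uint64_t page_size, uint64_t pte_size)
{
	assert(page_size > 0);
	return ((ram_bytes / page_size) * pte_size);
}
```

Under these assumptions, mapping 4GB takes 4MB of 32-bit entries or 8MB of 64-bit entries, which is the doubling referred to above; mapping 64GB with 64-bit entries takes 128MB.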
-Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 15: 3:25 2002 Delivered-To: freebsd-arch@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 931) id 2A3D137B401; Tue, 8 Oct 2002 15:03:24 -0700 (PDT) Date: Tue, 8 Oct 2002 15:03:24 -0700 From: Juli Mallett To: Garrett Wollman Cc: arch@FreeBSD.ORG Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] Message-ID: <20021008150324.A47084@FreeBSD.org> References: <20021005002021.A14635@FreeBSD.org> <200210051816.g95IGu7K026880@khavrinen.lcs.mit.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200210051816.g95IGu7K026880@khavrinen.lcs.mit.edu>; from wollman@lcs.mit.edu on Sat, Oct 05, 2002 at 02:16:56PM -0400 Organisation: The FreeBSD Project X-Alternate-Addresses: , , , , X-Towel: Yes X-LiveJournal: flata, jmallett X-Negacore: Yes Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG * De: Garrett Wollman [ Data: 2002-10-05 ] [ Subjecte: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] ] > >The most notable change is that the most recently sent && lowest > >numbered signal is sent, in the normal course of events, rather than > >simply the lowest numbered or most recently sent. > > This still isn't right. Real-time signals are QUEUED -- i.e., signals > of the same species are delivered in FIFO, not LIFO, order. POSIX > further specifies that signal N will be delivered before signal N+k, > for SIGRTMIN <= N <= N+k <= SIGRTMAX. The relative delivery order of > any signals outside of this range is unspecified beyond the special > behavior of SIGCONT, SIGSTOP, and SIGKILL. OK, I'm reading through this stuff extensively. 
There's a number of kernel interfaces that I'd like to add, related to
them, but the first thing is to get the queueing in there, IMHO, so that
the base functionality is there to be built on. sigqueue() for example is
about 10LOC with this stuff, and adding 'si_errno' stuff (which I'll love
to have around) is just a matter of 4 lines of code wherever it can be
used, once I've added a supportable in-kernel abstraction of psignal that
takes a ksi, and does the normal sanity checks. That will make psignal
about 12LOC, given that there's about 2LOC more than sigqueue() needed,
as most of that is allocation and filling out a structure.

So assuming the FIFO behaviour is fixed, and that I also deliver the
lowest available signal, and given that I plan to implement the above,
do you have any further objections? Other than the issue of the bitmask,
which I see no easy and reliable method for getting around cleanly...
And the failure cases.

Would you settle for me using subr_sigq.c as my abstraction, and making
actual queues optional, and having it use sigset_t under certain
circumstances? It will add about 8LOC to every sendsig() to support
pulling out the information when no ksiginfo is around.

Thanks,
juli.

--
Juli Mallett                              | FreeBSD: The Power To Serve
Will break world for fulltime employment. | finger jmallett@FreeBSD.org
http://people.FreeBSD.org/~jmallett/      | Support my FreeBSD hacking!
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 15:35:20 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 773AD37B401 for ; Tue, 8 Oct 2002 15:35:19 -0700 (PDT) Received: from avocet.mail.pas.earthlink.net (avocet.mail.pas.earthlink.net [207.217.120.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0FBC343E6E for ; Tue, 8 Oct 2002 15:35:19 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0137.cvx22-bradley.dialup.earthlink.net ([209.179.198.137] helo=mindspring.com) by avocet.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17z2wZ-0000lT-00; Tue, 08 Oct 2002 15:35:11 -0700 Message-ID: <3DA35D58.B1B5D78D@mindspring.com> Date: Tue, 08 Oct 2002 15:34:00 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Nate Lawson Cc: "Vladimir B. Grebenschikov" , arch@FreeBSD.org Subject: Re: using mem above 4Gb was: swapon some regular file References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Nate Lawson wrote: > On 8 Oct 2002, Vladimir B. Grebenschikov wrote: > > Exactly, Terry is right about large number of relative-small > > network-access processes (say apaches). But there are some other cases, > > say you have some DB server with huge index, say 10Gb, I think keep > > index in RAM effective than on disk. > > It's often surprisingly effective to just access the index on disk and > tune your VM cache instead. You can lose performance by double-caching > data. PSE-36 and PAE give you access to a 36 bit address space. But you are still limited to a 32 bit *linear* address space. 
More RAM in a 32 bit machine, even if you can wave the appropriate entrails over the keyboard so that it's accessible to the OS, will *NOT* increase the linear address space. IMO, if you want a larger linear address space, instead of pretending you have one, buy yourself an IA64 instead. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 17:26:59 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6E6D537B401 for ; Tue, 8 Oct 2002 17:26:58 -0700 (PDT) Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id B80B443E6A for ; Tue, 8 Oct 2002 17:26:57 -0700 (PDT) (envelope-from gallatin@cs.duke.edu) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id UAA26465 for ; Tue, 8 Oct 2002 20:26:57 -0400 (EDT) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.11.6/8.9.1) id g990QRE53055; Tue, 8 Oct 2002 20:26:27 -0400 (EDT) (envelope-from gallatin@cs.duke.edu) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15779.30643.68675.577669@grasshopper.cs.duke.edu> Date: Tue, 8 Oct 2002 20:26:27 -0400 (EDT) To: freebsd-arch@freebsd.org Subject: lp64 vs lp32 printf X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG What's the accepted way to printf something (like sizeof()) which boils down to "unsigned int" on x86 and "unsigned long" on the LP64 platforms? 
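For reference, a minimal userland sketch of the three usual answers to this question, assuming a C99 libc. The helper names are hypothetical. Whether the kernel printf() accepts the same modifiers is a separate matter that the replies in this thread address, so treat this as an illustration of the formats rather than a kernel patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* C99: the 'z' length modifier matches size_t directly. */
static int
fmt_size_z(char *buf, size_t len, size_t n)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "0x%zx", n));
}

/* C99: cast to the widest unsigned type and use the 'j' modifier. */
static int
fmt_size_j(char *buf, size_t len, size_t n)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "0x%jx", (uintmax_t)n));
}

/* Pre-C99: cast to unsigned long and use the 'l' modifier. */
static int
fmt_size_l(char *buf, size_t len, size_t n)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "0x%lx", (unsigned long)n));
}

/* Returns 1 if all three approaches print the same text for n. */
static int
fmt_size_agree(size_t n)
{
	char a[32], b[32], c[32];

	fmt_size_z(a, sizeof(a), n);
	fmt_size_j(b, sizeof(b), n);
	fmt_size_l(c, sizeof(c), n);
	return (strcmp(a, b) == 0 && strcmp(b, c) == 0);
}
```

All three compile without format warnings on both ILP32 and LP64 because the argument type always matches the conversion. The unsigned long cast would truncate only on a platform where size_t is wider than unsigned long.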
I'm trying to fix alpha LINT, which so far is mostly stuff like:

cc1: warnings being treated as errors
../../../dev/aic7xxx/aic79xx.c: In function `ahd_alloc':
../../../dev/aic7xxx/aic79xx.c:4208: warning: unsigned int format, different type arg (arg 3)
../../../dev/aic7xxx/aic79xx.c:4208: warning: unsigned int format, different type arg (arg 4)

        if ((ahd_debug & AHD_SHOW_MEMORY) != 0) {
                printf("%s: scb size = 0x%x, hscb size - 0x%x\n",
                       ahd_name(ahd), sizeof(struct scb),
                       sizeof(struct hardware_scb));
        }

I'm tempted to change the formats to 0x%lx and cast the args to
(unsigned long) and be done with it. Is that correct? Or should it be
%j and uintmax_t?

Thanks,

Drew

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Tue Oct 8 17:38:46 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8B25337B401 for ; Tue, 8 Oct 2002 17:38:45 -0700 (PDT)
Received: from espresso.q9media.com (espresso.q9media.com [65.39.129.122]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3085543E4A for ; Tue, 8 Oct 2002 17:38:45 -0700 (PDT) (envelope-from mike@espresso.q9media.com)
Received: by espresso.q9media.com (Postfix, from userid 1002) id 7B92A9C0B; Tue, 8 Oct 2002 20:31:20 -0400 (EDT)
Date: Tue, 8 Oct 2002 20:31:20 -0400
From: Mike Barcroft
To: Andrew Gallatin
Cc: freebsd-arch@freebsd.org
Subject: Re: lp64 vs lp32 printf
Message-ID: <20021008203120.K97120@espresso.q9media.com>
References: <15779.30643.68675.577669@grasshopper.cs.duke.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <15779.30643.68675.577669@grasshopper.cs.duke.edu>; from gallatin@cs.duke.edu on Tue, Oct 08, 2002 at 08:26:27PM -0400
Organization: The FreeBSD Project
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe: X-Loop: FreeBSD.ORG Andrew Gallatin writes: > > What's the accepted way to printf something (like sizeof()) which > boils down to "unsigned int" on x86 and "unsigned long" on the LP64 > platforms? In userland you can use %z for printing size_t's. In the kernel, casting to intmax_t/uintmax_t and using %j is correct. Best regards, Mike Barcroft To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 18:22:40 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0C67737B401 for ; Tue, 8 Oct 2002 18:22:39 -0700 (PDT) Received: from mail.speakeasy.net (mail11.speakeasy.net [216.254.0.211]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4782543E75 for ; Tue, 8 Oct 2002 18:22:38 -0700 (PDT) (envelope-from jhb@FreeBSD.org) Received: (qmail 6837 invoked from network); 9 Oct 2002 01:22:39 -0000 Received: from unknown (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) by mail11.speakeasy.net (qmail-ldap-1.03) with DES-CBC3-SHA encrypted SMTP for ; 9 Oct 2002 01:22:39 -0000 Received: from laptop.baldwin.cx (laptop.baldwin.cx [192.168.0.4]) by server.baldwin.cx (8.12.6/8.12.6) with ESMTP id g991Man5007959; Tue, 8 Oct 2002 21:22:36 -0400 (EDT) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.5.2 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <20021008203120.K97120@espresso.q9media.com> Date: Tue, 08 Oct 2002 21:22:40 -0400 (EDT) From: John Baldwin To: Mike Barcroft Subject: Re: lp64 vs lp32 printf Cc: freebsd-arch@freebsd.org, Andrew Gallatin Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 09-Oct-2002 Mike Barcroft wrote: > Andrew 
Gallatin writes:
>>
>> What's the accepted way to printf something (like sizeof()) which
>> boils down to "unsigned int" on x86 and "unsigned long" on the LP64
>> platforms?
>
> In userland you can use %z for printing size_t's. In the kernel,
> casting to intmax_t/uintmax_t and using %j is correct.

We could add '%z' to the kernel and change whatever hack %z DDB is
using in db_printf() to be some other letter.

--
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Tue Oct 8 18:26:24 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 29E9737B401; Tue, 8 Oct 2002 18:26:23 -0700 (PDT)
Received: from ebb.errno.com (ebb.errno.com [66.127.85.87]) by mx1.FreeBSD.org (Postfix) with ESMTP id B189143E75; Tue, 8 Oct 2002 18:26:22 -0700 (PDT) (envelope-from sam@errno.com)
Received: from melange (melange.errno.com [66.127.85.82]) (authenticated bits=0) by ebb.errno.com (8.12.5/8.12.1) with ESMTP id g991QC1H012001 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Tue, 8 Oct 2002 18:26:13 -0700 (PDT) (envelope-from sam@errno.com)
X-Authentication-Warning: ebb.errno.com: Host melange.errno.com [66.127.85.82] claimed to be melange
Message-ID: <1f3601c26f32$daf85170$52557f42@errno.com>
From: "Sam Leffler"
To: "SUZUKI Shinsuke"
Cc: "Julian Elischer" , , ,
References: <13e901c26dbb$63059f60$52557f42@errno.com>
Subject: Re: CFR: m_tag patch
Date: Tue, 8 Oct 2002 18:26:12 -0700
Organization: Errno Consulting
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.50.4807.1700
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG > But from the KAME maintenance point of view I request only one thing > regarding this. > > Please keep the m_tag design same as OpenBSD (i.e. I'd like to > avoid changes such as change of member name in m_tag > structure, behavior-change etc, to prevent inessential merging > effort among *BSDs) > > If this condition is accepted, then we have no problem in adopting > m_tag in FreeBSD-KAME. > (as far as I looked though, there is no necessity of such change in > m_tag architecture, but if you have some difficulties, please let me > know.) > It will be compatible at the source level. That was the intention from the start. Sam To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 19: 1:21 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8F70537B401; Tue, 8 Oct 2002 19:01:20 -0700 (PDT) Received: from espresso.q9media.com (espresso.q9media.com [65.39.129.122]) by mx1.FreeBSD.org (Postfix) with ESMTP id 335F443E42; Tue, 8 Oct 2002 19:01:20 -0700 (PDT) (envelope-from mike@espresso.q9media.com) Received: by espresso.q9media.com (Postfix, from userid 1002) id ADF539C0B; Tue, 8 Oct 2002 21:53:55 -0400 (EDT) Date: Tue, 8 Oct 2002 21:53:55 -0400 From: Mike Barcroft To: John Baldwin Cc: freebsd-arch@freebsd.org, Andrew Gallatin Subject: Re: lp64 vs lp32 printf Message-ID: <20021008215355.O97120@espresso.q9media.com> References: <20021008203120.K97120@espresso.q9media.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: ; from jhb@FreeBSD.org on Tue, Oct 08, 2002 at 09:22:40PM -0400 Organization: The FreeBSD Project Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: 
(Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG John Baldwin writes: > On 09-Oct-2002 Mike Barcroft wrote: > > Andrew Gallatin writes: > >> > >> What's the accepted way to printf something (like sizeof()) which > >> boils down to "unsigned int" on x86 and "unsigned long" on the LP64 > >> platforms? > > > > In userland you can use %z for printing size_t's. In the kernel, > > casting to intmax_t/uintmax_t and using %j is correct. > > We could add '%z' to the kernel and change whatever hack %z DDB is > using in db_printf() to be some other letter. This would be ideal. Best regards, Mike Barcroft To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 19:16:47 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C70AF37B401; Tue, 8 Oct 2002 19:16:46 -0700 (PDT) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1C00A43E6A; Tue, 8 Oct 2002 19:16:46 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Received: from mousie.catspoiler.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.5/8.12.5) with ESMTP id g992GYvU037201; Tue, 8 Oct 2002 19:16:38 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Message-Id: <200210090216.g992GYvU037201@gw.catspoiler.org> Date: Tue, 8 Oct 2002 19:16:34 -0700 (PDT) From: Don Lewis Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] To: jmallett@FreeBSD.ORG Cc: arch@FreeBSD.ORG In-Reply-To: <20021005002021.A14635@FreeBSD.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 5 Oct, Juli 
Mallett wrote:
> To
> accommodate situations where allocation of a 'ksiginfo' is a failure
> mode (no memory), the destination process is told to exit via a new
> member of 'struct proc', p_suicide, which tells a process to kill itself
> next time it goes through userret. It is done this way to prevent a
> recursive failure case, and to prevent possibly dying with extraneous
> locks held, as signals are sent from odd places of the kernel.

Another problem with p_suicide is that the process exit status will be
incorrect. If the process dies because of the receipt of a signal, the
exit status should contain the signal number.

I like Garrett's suggestion of keeping the bitmap. There's no sense in
queuing up information for any signals that will terminate the process
before it can retrieve the additional information, whether the signal is
uncatchable like SIGKILL, or because the process has not supplied a
handler and the default action is to terminate the process. That will
prevent kernel memory from being uselessly consumed by a user leaning on
^c in an attempt to kill a process stuck in an uninterruptible wait.
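The "keep the bitmap, queue only what can be read" idea can be sketched
roughly as below. This is a userland illustration only, not code from
the patch under review: struct sk_sigstate, sk_post() and the fixed-size
queue are hypothetical names and sizes invented here, and falling back
to the bitmap when the queue is full is an assumption about one possible
policy.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_NSIG 32	/* hypothetical: signals 1..32 are tracked */
#define SK_QMAX 8	/* hypothetical: cap on queued info records */

/* Hypothetical extra information carried with a queued signal. */
struct sk_siginfo {
	int signo;
	int sender_pid;
};

/* Hypothetical per-process state: pending bitmap plus a small queue. */
struct sk_sigstate {
	uint32_t pending;	/* bit (signo - 1) set when signo is pending */
	uint32_t caught;	/* bit set when a handler is installed */
	uint32_t default_fatal;	/* bit set when the default action terminates */
	struct sk_siginfo queue[SK_QMAX];
	int qlen;
};

static uint32_t
sk_bit(int signo)
{
	assert(signo >= 1 && signo <= SK_NSIG);
	return ((uint32_t)1 << (signo - 1));
}

/* True if the signal ends the process before it could read any info. */
static bool
sk_is_fatal(const struct sk_sigstate *st, int signo)
{
	uint32_t bit = sk_bit(signo);

	if (signo == SIGKILL)
		return (true);
	return ((st->caught & bit) == 0 && (st->default_fatal & bit) != 0);
}

/*
 * Post a signal. The bitmap is always updated, so delivery never depends
 * on memory. Info is queued only when the process could actually read it
 * and the queue has room. Returns true if an info record was queued.
 */
static bool
sk_post(struct sk_sigstate *st, int signo, int sender_pid)
{
	st->pending |= sk_bit(signo);
	if (sk_is_fatal(st, signo) || st->qlen == SK_QMAX)
		return (false);
	st->queue[st->qlen].signo = signo;
	st->queue[st->qlen].sender_pid = sender_pid;
	st->qlen++;
	return (true);
}
```

With this shape, a user leaning on ^c against a process that has no
SIGINT handler only ever sets one bit, and the exit status can still be
derived from the pending signal number.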
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Tue Oct 8 19:27:21 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B60BC37B401 for ; Tue, 8 Oct 2002 19:27:20 -0700 (PDT)
Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0BBCC43E3B for ; Tue, 8 Oct 2002 19:27:20 -0700 (PDT) (envelope-from jeff@freebsd.org)
Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g992RJa38681 for ; Tue, 8 Oct 2002 22:27:19 -0400 (EDT) (envelope-from jeff@freebsd.org)
X-Authentication-Warning: mail.chesapeake.net: jroberson owned process doing -bs
Date: Tue, 8 Oct 2002 22:27:19 -0400 (EDT)
From: Jeff Roberson
X-X-Sender: jroberson@mail.chesapeake.net
To: arch@freebsd.org
Subject: Scheduler framework.
Message-ID: <20021008221856.L35572-100000@mail.chesapeake.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

I have mostly finished writing a new scheduler for FreeBSD. In the
process I have modularized the scheduling decisions and broken the
scheduler up into an API. It has been done in such a way that the
scheduler could be chosen at compile time.

My diff is available at http://www.chesapeake.net/~jroberson/sched.patch.
It is a mostly complete reimplementation of the old scheduler on this new
API. The old scheduler has some empty stubs that my new scheduler uses.
My new scheduler is not included in this diff.

This diff isn't intended to be complete. I'm looking for a design review,
not a code review. At present it does not even boot. I will be fixing it
shortly.
I'd like to get an idea of whether or not it's too late in 5.0 to get this in. If it is, I won't pursue it any further, and I'll save my new scheduler for another day. If not, I'd like to get my new scheduler in as an option for 5.0. Any feedback is welcome. Style bugs should be submitted in private messages so we don't clog the lists. Thanks, Jeff To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 19:56:16 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3E16A37B401 for ; Tue, 8 Oct 2002 19:56:15 -0700 (PDT) Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id 839C543E42 for ; Tue, 8 Oct 2002 19:56:14 -0700 (PDT) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g992uEo49763 for ; Tue, 8 Oct 2002 22:56:14 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Tue, 8 Oct 2002 22:56:13 -0400 (EDT) From: Jeff Roberson Cc: arch@freebsd.org Subject: Re: Scheduler framework. In-Reply-To: <20021008221856.L35572-100000@mail.chesapeake.net> Message-ID: <20021008225520.R44108-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG > > My diff is available at http://www.chesapeake.net/~jroberson/sched.patch. > I fubared the diff. I fixed it and uploaded it again. My machine is running with this patch applied. 
Thanks, Jeff To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 20: 0:22 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6CDD937B401; Tue, 8 Oct 2002 20:00:21 -0700 (PDT) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id E4F6943E88; Tue, 8 Oct 2002 20:00:20 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Received: from mousie.catspoiler.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.5/8.12.5) with ESMTP id g9930BvU037277; Tue, 8 Oct 2002 20:00:15 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Message-Id: <200210090300.g9930BvU037277@gw.catspoiler.org> Date: Tue, 8 Oct 2002 20:00:11 -0700 (PDT) From: Don Lewis Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] To: jmallett@FreeBSD.ORG Cc: bright@mu.org, wollman@lcs.mit.edu, arch@FreeBSD.ORG In-Reply-To: <20021005192012.A87801@FreeBSD.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 5 Oct, Juli Mallett wrote: > I prefer a generalised approach to all signals, especially since SunOS > (ok, Solaris) sends "sender" information for all signals (istr), and > that furthermore SIGCHLD contains information about what child died. In Solaris it looks like the "sender" information is only sent if the sender is using sigqueue() and the target has set the SA_SIGINFO flag for the signal. If the target has not set SA_SIGINFO, it looks like sigqueue() behaves like kill() and probably just sets a flag. 
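As a point of reference for the behaviour described above, here is a
minimal userland sketch of the POSIX interface in question, assuming a
system that provides sigqueue(). The names queue_to_self(), on_usr1(),
got_value and got_code are made up for this illustration. A handler
installed with SA_SIGINFO receives the queued value, and si_code lets it
tell a sigqueue() sender (SI_QUEUE) from a plain kill().

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Filled in by the handler so the caller can inspect what arrived. */
static volatile sig_atomic_t got_value = -1;
static volatile sig_atomic_t got_code = 0;

/* Three-argument handler form, selected by SA_SIGINFO. */
static void
on_usr1(int signo, siginfo_t *info, void *ctx)
{
	(void)signo;
	(void)ctx;
	got_code = info->si_code;		/* SI_QUEUE when sent by sigqueue() */
	got_value = info->si_value.sival_int;	/* payload chosen by the sender */
}

/*
 * Install the handler, then queue SIGUSR1 with a payload to this same
 * process. Returns 0 on success, -1 on failure (sigqueue() may fail with
 * EAGAIN when the system cannot queue another signal).
 */
static int
queue_to_self(int payload)
{
	struct sigaction sa;
	union sigval value;
	sigset_t set;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_usr1;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) == -1)
		return (-1);

	/* Make sure SIGUSR1 is not blocked, so it is delivered promptly. */
	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);
	if (sigprocmask(SIG_UNBLOCK, &set, NULL) == -1)
		return (-1);

	value.sival_int = payload;
	if (sigqueue(getpid(), SIGUSR1, value) == -1)
		return (-1);
	return (0);
}
```

In a single-threaded process the unblocked signal is delivered before
sigqueue() returns, so got_value and got_code can be checked right after
queue_to_self() succeeds.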
Also, if the target has set SA_SIGINFO, sigqueue() can fail with EAGAIN
if the target has more than a certain number of queued signals pending
or if the system is out of resources. It looks like, if the sender of
the signal used kill(), the only valid information in the siginfo
structure may be the signal number.

This is a lot easier to implement than a more general scheme, since
sigqueue() is a syscall, so the locking and resource allocation issues
should be a lot easier to deal with, as opposed to having to deal with
these issues in all the other places in the kernel that send signals.
SIGCHLD can cheat and peek at the list of zombies.

> Very useful stuff, IMO.

Yup, it can be. It would also sometimes be useful to know which file
descriptor was responsible for a SIGIO, but that would be really nasty
on the kernel side ...

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Tue Oct 8 20: 6: 2 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5C22E37B401; Tue, 8 Oct 2002 20:06:01 -0700 (PDT)
Received: from canning.wemm.org (canning.wemm.org [192.203.228.65]) by mx1.FreeBSD.org (Postfix) with ESMTP id 23DCC43E42; Tue, 8 Oct 2002 20:06:01 -0700 (PDT) (envelope-from peter@wemm.org)
Received: from wemm.org (localhost [127.0.0.1]) by canning.wemm.org (Postfix) with ESMTP id 001042A88D; Tue, 8 Oct 2002 20:06:00 -0700 (PDT) (envelope-from peter@wemm.org)
X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4
To: Jeff Roberson
Cc: arch@freebsd.org
Subject: Re: Scheduler framework.
In-Reply-To: <20021008221856.L35572-100000@mail.chesapeake.net> Date: Tue, 08 Oct 2002 20:06:00 -0700 From: Peter Wemm Message-Id: <20021009030601.001042A88D@canning.wemm.org> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Jeff Roberson wrote: > I have mostly finished writing a new scheduler for FreeBSD. In the > process I have modularized the scheduling decicions and broken the > scheduler up into an API. It has been done in such a way that the > scheduler could be chosen at compile time. I'm greatly relieved that somebody is taking the time to sit down and figure out the places that are necessary to add the hooks so that this can be neatly encapsulated. To me, that's far more interesting than the actual scheduler changes itself. I've always wanted to try out a variation of a table based scheduler, but the existing one was so well entrenched all over the place that it wasn't funny. I know lots of other folks want to tinker with this stuff too, but nobody has seriously proposed (that I remember seeing) doing the encapsulation without imposing their new scheduler as well. 
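One way to picture the kind of encapsulation being discussed is a small
table of hooks that the rest of the kernel calls, with the policy hidden
behind it. The sketch below is a toy in userland C and is not the API in
the posted sched.h: struct sk_sched_ops, the "fifo" and "prio" policies,
and the SK_SCHED_FIFO macro are all hypothetical names invented here.

```c
#include <stddef.h>

#define SK_MAXTASKS 16	/* hypothetical run queue capacity */

struct sk_task {
	int id;
	int prio;	/* lower value runs first in the "prio" policy */
};

/* Hypothetical hook table: callers never see the policy behind it. */
struct sk_sched_ops {
	const char *name;
	void (*add)(struct sk_task *t);		/* make a task runnable */
	struct sk_task *(*choose)(void);	/* pick and remove the next task */
};

static struct sk_task *sk_runq[SK_MAXTASKS];
static size_t sk_nrun;

/* Shared by both policies: append to the run queue if there is room. */
static void
sk_add(struct sk_task *t)
{
	if (sk_nrun < SK_MAXTASKS)
		sk_runq[sk_nrun++] = t;
}

/* Remove and return the entry at index i, keeping the queue order. */
static struct sk_task *
sk_take(size_t i)
{
	struct sk_task *t = sk_runq[i];

	for (; i + 1 < sk_nrun; i++)
		sk_runq[i] = sk_runq[i + 1];
	sk_nrun--;
	return (t);
}

/* Policy 1: first in, first out. */
static struct sk_task *
fifo_choose(void)
{
	return (sk_nrun > 0 ? sk_take(0) : NULL);
}

/* Policy 2: lowest prio value first, ties broken by arrival order. */
static struct sk_task *
prio_choose(void)
{
	size_t best = 0;

	if (sk_nrun == 0)
		return (NULL);
	for (size_t i = 1; i < sk_nrun; i++)
		if (sk_runq[i]->prio < sk_runq[best]->prio)
			best = i;
	return (sk_take(best));
}

static const struct sk_sched_ops sk_fifo = { "fifo", sk_add, fifo_choose };
static const struct sk_sched_ops sk_prio = { "prio", sk_add, prio_choose };

/* Compile-time default; reassigning the pointer switches at run time. */
#ifdef SK_SCHED_FIFO
static const struct sk_sched_ops *sk_sched = &sk_fifo;
#else
static const struct sk_sched_ops *sk_sched = &sk_prio;
#endif
```

Callers only ever use sk_sched->add() and sk_sched->choose(), so a new
policy can be dropped in without touching them.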
Cheers, -Peter -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com "All of this is for nothing if we don't go to the stars" - JMS/B5 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 20: 9:21 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 91B8937B401; Tue, 8 Oct 2002 20:09:19 -0700 (PDT) Received: from rwcrmhc51.attbi.com (rwcrmhc51.attbi.com [204.127.198.38]) by mx1.FreeBSD.org (Postfix) with ESMTP id 374EA43E6A; Tue, 8 Oct 2002 20:09:19 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc51.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021009030013.NVWB18356.rwcrmhc51.attbi.com@InterJet.elischer.org>; Wed, 9 Oct 2002 03:00:13 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id TAA43107; Tue, 8 Oct 2002 19:43:01 -0700 (PDT) Date: Tue, 8 Oct 2002 19:43:00 -0700 (PDT) From: Julian Elischer To: Jeff Roberson Cc: arch@freebsd.org Subject: Re: Scheduler framework. In-Reply-To: <20021008221856.L35572-100000@mail.chesapeake.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Tue, 8 Oct 2002, Jeff Roberson wrote: > I have mostly finished writing a new scheduler for FreeBSD. In the > process I have modularized the scheduling decicions and broken the > scheduler up into an API. It has been done in such a way that the > scheduler could be chosen at compile time. More work needs to be done to modularise the scheduler. KSE is making this more difficult but is also making it more necessary. 
People can no longer read it and "understand it" in order to
tweak it for experimentation..
it needs to be more modular so that people
can just throw it away and replace it..
(e.g. what Luigi is up to).

BTW sorry, I just checked in a bunch of KSE changes that may
collide a bit with your patch.. probably you'll have some merging to do.

>
> My diff is available at http://www.chesapeake.net/~jroberson/sched.patch.
> It is a mostly complete reimplementation of the old scheduler on this new
> API. The old scheduler has some empty stubs that my new scheduler uses.
> My new scheduler is not included in this diff.

yeah I noticed.. where is it? how does it cope with threads vs
processes?

>
> This diff isn't intended to be complete. I'm looking for a design review
> not a code review. At present it does not even boot. I will be fixing it
> shortly. I'd like to get an idea of whether or not it's too late in 5.0
> to get this in. If it is, I won't pursue it any further, and I'll save my
> new scheduler for another day. If not, I'd like to get my new scheduler
> in as an option for 5.0.

Can we have a "rough outline" here on -arch of your ideas?

(BTW where are you?)

>
> Any feedback is welcome. Style bugs should be submitted in private
> messages so we don't clog the lists.
> > Thanks, > Jeff > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-arch" in the body of the message > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 20:18:37 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9EC3537B401; Tue, 8 Oct 2002 20:18:36 -0700 (PDT) Received: from carp.icir.org (carp.icir.org [192.150.187.71]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4BE6F43E65; Tue, 8 Oct 2002 20:18:36 -0700 (PDT) (envelope-from rizzo@carp.icir.org) Received: from carp.icir.org (localhost [127.0.0.1]) by carp.icir.org (8.12.3/8.12.3) with ESMTP id g993IYO2044525; Tue, 8 Oct 2002 20:18:34 -0700 (PDT) (envelope-from rizzo@carp.icir.org) Received: (from rizzo@localhost) by carp.icir.org (8.12.3/8.12.3/Submit) id g993IYXU044524; Tue, 8 Oct 2002 20:18:34 -0700 (PDT) (envelope-from rizzo) Date: Tue, 8 Oct 2002 20:18:34 -0700 From: Luigi Rizzo To: Peter Wemm Cc: Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Scheduler framework. Message-ID: <20021008201834.A44413@carp.icir.org> References: <20021008221856.L35572-100000@mail.chesapeake.net> <20021009030601.001042A88D@canning.wemm.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20021009030601.001042A88D@canning.wemm.org>; from peter@wemm.org on Tue, Oct 08, 2002 at 08:06:00PM -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Tue, Oct 08, 2002 at 08:06:00PM -0700, Peter Wemm wrote: ... > I've always wanted to try out a variation of a table based scheduler, but > the existing one was so well entrenched all over the place that it wasn't > funny. 
I know lots of other folks want to tinker with this stuff too, but > nobody has seriously proposed (that I remember seeing) doing the > encapsulation without imposing their new scheduler as well. well i don't know if you were talking about me, but in july we went along the exact same lines, trying to abstract the scheduler interface so that one could replace the stock one with something else. Given that our code lets you switch between schedulers at runtime, i wouldn't exactly call that "impose their new schedulers" :) I still have to look at jeff's patches in detail (and he said he has to do the same with mine :) but the only immediate difference i can see is the fact that his work applies to -current whereas mine applies to -stable. More after i study his code. Refs: http://www.FreeBSD.org/cgi/getmsg.cgi?fetch=708498+729070+/usr/local/www/db/text/2002/freebsd-stable/20020721.freebsd-stable (the code in the patch is slightly buggy, we have a much more robust version now). cheers luigi To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 20:22:33 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 60FA637B401 for ; Tue, 8 Oct 2002 20:22:31 -0700 (PDT) Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id BE32443E3B for ; Tue, 8 Oct 2002 20:22:30 -0700 (PDT) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g993MRO59995; Tue, 8 Oct 2002 23:22:27 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Tue, 8 Oct 2002 23:22:27 -0400 (EDT) From: Jeff Roberson To: Julian Elischer Cc: arch@freebsd.org Subject: Re: Scheduler framework. 
In-Reply-To:
Message-ID: <20021008231137.O44108-100000@mail.chesapeake.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

On Tue, 8 Oct 2002, Julian Elischer wrote:
>
> More work needs to be done to modularise the scheduler.
> KSE is making this more difficult but is also making it more necessary.
> People can no longer read it and "understand it" in order to
> tweak it for experimentation..
> it needs to be more modular so that people
> can just throw it away and replace it..
> (e.g. what Luigi is up to).

Yes, I agree. It seems to be getting closer though. The only way to
really nail down the API is to look at the requirements of a few
schedulers and implement it in a way that satisfies them all.

>
> BTW sorry, I just checked in a bunch of KSE changes that may
> collide a bit with your patch.. probably you'll have some merging to do.
>

No worries.

>
>
> yeah I noticed.. where is it? how does it cope with threads vs
> processes?

I will post it soon. It's similar to the Solaris/Linux schedulers with a
few exceptions. The base user priority is still assigned to the kseg, as
is the fixed nice value. The kseg also records how many ticks each of
its KSEs has slept for voluntarily. This value is decayed for every
tick that they are awake. It is used to determine the interactivity and
dynamic priority of the kseg.

Each thread inherits a priority from its kseg and that priority is
adjusted as it enters/exits the kernel and through priority inheritance.
Threads should eventually be bound to KSEs and only migrated if a KSE is
starved. This is essential for KSE-to-CPU affinity.

KSEs get a time slice that is determined by the kseg's priority and
interactivity. When this time slice expires, the priority and new slice
are recalculated based on the behavior of all KSEs in the group.
KSEs are bound to a particular cpu and are only migrated when one cpu's load is way out of balance. There are per cpu run queues, but there is still only one global sched lock. I did this for simplicity's sake. It's too late in 5.0 to risk complications such as that. This also makes migration a no brainer. It is, O(1) as well. That seems to be quite fashionable these days. > > Can we have a 'rough outline" here on -arch of your ideas? Well, the API in sched.h pretty much says it all. It's intended to be as minimal as possible. I hooked all the places that a scheduler might be interested in adjusting priorities and turned that into an api. Much of that was just figuring out where my scheduler and the old scheduler wanted to do different things. > > (BTW where are you?) > Seattle To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 21:42: 2 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 713FC37B401; Tue, 8 Oct 2002 21:42:01 -0700 (PDT) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id F02A443E4A; Tue, 8 Oct 2002 21:42:00 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Received: from mousie.catspoiler.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.5/8.12.5) with ESMTP id g994fqvU037422; Tue, 8 Oct 2002 21:41:56 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Message-Id: <200210090441.g994fqvU037422@gw.catspoiler.org> Date: Tue, 8 Oct 2002 21:41:52 -0700 (PDT) From: Don Lewis Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., To: jhb@FreeBSD.org Cc: arch@FreeBSD.org, jmallett@FreeBSD.org In-Reply-To: MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) 
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

On 8 Oct, John Baldwin wrote:
>
> On 08-Oct-2002 Don Lewis wrote:
>> It looks like we've got a potential lock order reversal problem, though.
>> In fork1() we grab ppeers_lock while holding a couple of PROC_LOCKs,
>> while in the first part of exit1() we grab ppeers_lock before PROC_LOCK.
>> My caffeine level is insufficient to judge whether P_WEXIT checking
>> would save us in practice.
>
> Bah, fixed the reversal, thanks. We still need the P_WEXIT check in
> fork1() since otherwise a new peer or child could be added after we
> have finished going through the entire list. Hmm, adding this is ugly
> though b/c we really need to check after we acquire the ppeers_lock and
> do the actual hookup. Hmm, we can move the RFTHREAD stuff a lot earlier
> and then this isn't such a big deal. Ok, I've updated the patch again.

Looks good.

> One note: I've got a question about how to handle the error condition
> in that case in fork1(). I'm really starting to think that instead of
> returning an error, the peer process should just go ahead and call
> exit1() in this case since it is about to be killed anyways.

I pretty much agree. I would worry about the process doing something
bogus based on the weird error returned by fork() before it is finally
killed off. Calling exit1() from within fork() is kind of icky, though.
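The two points in the exchange above (one lock order on both paths, and
checking the exiting flag only after the peers lock is held) can be
sketched with pthread mutexes. This is a userland analogy, not the
kernel patch: struct sk_proc, sk_peers_lock, sk_attach_peer() and
sk_begin_exit() are hypothetical stand-ins for PROC_LOCK, ppeers_lock,
the fork1() hookup and the start of exit1(), and taking the peers lock
first is an assumed order, since the thread does not say which order the
fix settled on.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a process with peers. */
struct sk_proc {
	pthread_mutex_t lock;	/* stands in for PROC_LOCK */
	bool exiting;		/* stands in for P_WEXIT */
	int npeers;		/* peers hooked up to this leader */
};

/* Stands in for ppeers_lock; always taken before any sk_proc lock. */
static pthread_mutex_t sk_peers_lock = PTHREAD_MUTEX_INITIALIZER;

static void
sk_proc_init(struct sk_proc *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->exiting = false;
	p->npeers = 0;
}

/*
 * Fork side: hook a new peer up to the leader. The exiting flag is
 * checked only once both locks are held, so a peer cannot slip in after
 * the exit side has walked the list. Returns 0, or -1 if the leader is
 * already exiting.
 */
static int
sk_attach_peer(struct sk_proc *leader)
{
	int error = 0;

	pthread_mutex_lock(&sk_peers_lock);	/* order: peers lock first */
	pthread_mutex_lock(&leader->lock);	/* then the process lock */
	if (leader->exiting)
		error = -1;
	else
		leader->npeers++;
	pthread_mutex_unlock(&leader->lock);
	pthread_mutex_unlock(&sk_peers_lock);
	return (error);
}

/*
 * Exit side: same lock order as the fork side. Marks the leader as
 * exiting and returns how many peers must be told to go away.
 */
static int
sk_begin_exit(struct sk_proc *leader)
{
	int n;

	pthread_mutex_lock(&sk_peers_lock);
	pthread_mutex_lock(&leader->lock);
	leader->exiting = true;
	n = leader->npeers;
	pthread_mutex_unlock(&leader->lock);
	pthread_mutex_unlock(&sk_peers_lock);
	return (n);
}
```

Because both paths acquire the locks in the same order, neither can hold
one lock while waiting for the other in reverse.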
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 23: 0:13 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AB3A437B401 for ; Tue, 8 Oct 2002 23:00:10 -0700 (PDT) Received: from rwcrmhc53.attbi.com (rwcrmhc53.attbi.com [204.127.198.39]) by mx1.FreeBSD.org (Postfix) with ESMTP id 544D043E6E for ; Tue, 8 Oct 2002 23:00:10 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc53.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021009060010.SUGO12956.rwcrmhc53.attbi.com@InterJet.elischer.org>; Wed, 9 Oct 2002 06:00:10 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id WAA43780; Tue, 8 Oct 2002 22:44:30 -0700 (PDT) Date: Tue, 8 Oct 2002 22:44:29 -0700 (PDT) From: Julian Elischer To: Jeff Roberson Cc: arch@freebsd.org Subject: Re: Scheduler framework. In-Reply-To: <20021008231137.O44108-100000@mail.chesapeake.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Tue, 8 Oct 2002, Jeff Roberson wrote: > I will post it soon. It's similar to the solaris/linux schedulers with a > few exceptions. The base user priority is still assigned to the kseg, as > is the fixed nice value. The kseg also records how many ticks each of > it's kse's have slept for voluntarily. This value is decayed for every > tick that they are awake. It is used to determine the interactivity and > dynamic priority of the kseg. fair enough... we are presently keeping some of this in the kse and some in the ksegrp. 
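The sleep-tick accounting described in the quoted paragraph could look
something like the sketch below. The scheduler itself has not been
posted, so every formula and constant here is an assumption made up for
illustration, as are the names struct sk_ksegrp, sk_slept(),
sk_awake_tick(), sk_interactivity(), sk_dyn_pri() and sk_slice(). Only
the general shape comes from the description: voluntary sleep ticks are
recorded per group, decayed while awake, and feed the dynamic priority
and the time slice.

```c
#include <assert.h>

#define SK_SLP_MAX	100	/* hypothetical cap on remembered sleep ticks */
#define SK_SLICE_MIN	2	/* hypothetical shortest slice, in ticks */
#define SK_SLICE_MAX	20	/* hypothetical longest slice, in ticks */
#define SK_PRI_MIN	0	/* most urgent priority in this sketch */
#define SK_PRI_MAX	63	/* least urgent priority in this sketch */

/* Hypothetical per-group scheduling state. */
struct sk_ksegrp {
	int base_pri;	/* base user priority, lower is more urgent */
	int nice;	/* fixed nice value */
	int slp_ticks;	/* voluntary sleep ticks, decayed while awake */
};

static int
sk_clamp(int v, int lo, int hi)
{
	return (v < lo ? lo : (v > hi ? hi : v));
}

/* A KSE in the group woke up after sleeping voluntarily for `ticks`. */
static void
sk_slept(struct sk_ksegrp *kg, int ticks)
{
	assert(ticks >= 0);
	/* Clamp the increment first so the sum cannot overflow. */
	kg->slp_ticks = sk_clamp(kg->slp_ticks + sk_clamp(ticks, 0, SK_SLP_MAX),
	    0, SK_SLP_MAX);
}

/* Called once for every tick a KSE in the group spends awake. */
static void
sk_awake_tick(struct sk_ksegrp *kg)
{
	if (kg->slp_ticks > 0)
		kg->slp_ticks--;
}

/* Interactivity score from 0 (CPU hog) to 100 (mostly sleeping). */
static int
sk_interactivity(const struct sk_ksegrp *kg)
{
	return (kg->slp_ticks * 100 / SK_SLP_MAX);
}

/* Dynamic priority: base plus nice, improved by up to 10 levels. */
static int
sk_dyn_pri(const struct sk_ksegrp *kg)
{
	int bonus = sk_interactivity(kg) / 10;

	return (sk_clamp(kg->base_pri + kg->nice - bonus,
	    SK_PRI_MIN, SK_PRI_MAX));
}

/* Time slice in ticks: more urgent groups get a longer slice. */
static int
sk_slice(const struct sk_ksegrp *kg)
{
	int pri = sk_dyn_pri(kg);

	return (SK_SLICE_MAX -
	    (SK_SLICE_MAX - SK_SLICE_MIN) * pri / SK_PRI_MAX);
}
```

Keeping all of this in the group rather than in each KSE matches the
description that the priority and slice are recomputed from the
behaviour of all KSEs in the group.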
>
> Each thread inherits a priority from its kseg and that priority is
> adjusted as it enters/exits the kernel and through priority inheritance.

that's basically what we do now..

> Threads should eventually be bound to KSEs and only migrated if a KSE is
> starved. This is essential for KSE-to-CPU affinity.

That's basically the plan, but it's rather a complicated
scenario. Presently KSEs are the vehicle via which threads are
scheduled, thereby stopping a process with a lot of threads from
swamping the run queues. Currently there is no binding between threads
(unless they are specifically bound) and KSEs and between KSEs and
CPUs. It was envisioned that at some time in the future
some 'affinity' would be added.

>
> KSEs get a time slice that is determined by the kseg's priority and
> interactivity. When this time slice expires the priority and new slice is
> calculated based on the behavior of all kses in the group. KSEs are bound
> to a particular cpu and are only migrated when one cpu's load is way out
> of balance.

I was thinking about that.. my thought was that
there is little point in moving a KSE to a processor that already has a
KSE from that group. just migrate the threads to the KSE that is on the
processor that has time.. However, since then I have decided that in a
heavily loaded system, it may be worth shifting 2 KSEs to the same
processor if it has other work and neither would run for 50% anyhow.

>
> There are per cpu run queues, but there is still only one global sched
> lock. I did this for simplicity's sake. It's too late in 5.0 to risk
> complications such as that. This also makes migration a no brainer.
>
> It is O(1) as well. That seems to be quite fashionable these days.
>
> >
> > Can we have a "rough outline" here on -arch of your ideas?
>
> Well, the API in sched.h pretty much says it all. It's intended to be as
> minimal as possible.
I hooked all the places that a scheduler might be > interested in adjusting priorities and turned that into an api. Much of > that was just figuring out where my scheduler and the old scheduler wanted > to do different things. Have you taken into account the KSE loaning that just went into the tree in the last week..? It's needed to stop deadlocks and starvation in some common cases. > > > > > (BTW where are you?) > > > Seattle > > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 23:13:46 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4FBAB37B401; Tue, 8 Oct 2002 23:13:45 -0700 (PDT) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 33A6343E4A; Tue, 8 Oct 2002 23:13:44 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id QAA13743; Wed, 9 Oct 2002 16:13:42 +1000 Date: Wed, 9 Oct 2002 16:23:46 +1000 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: Mike Barcroft Cc: Andrew Gallatin , Subject: Re: lp64 vs lp32 printf In-Reply-To: <20021008203120.K97120@espresso.q9media.com> Message-ID: <20021009161756.E4040-100000@gamplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Tue, 8 Oct 2002, Mike Barcroft wrote: > Andrew Gallatin writes: > > > > What's the accepted way to printf something (like sizeof()) which > > boils down to "unsigned int" on x86 and "unsigned long" on the LP64 > > platforms? > > In userland you can use %z for printing size_t's. 
In the kernel, > casting to intmax_t/uintmax_t and using %j is correct. Um, using intmax_t to print size_t's would be incorrect, since it is signed. Using uintmax_t would be bloat. Very few typedefed types need the full bloat of [u]intmax_t, and size_t is unlikely to become one of them before casting it to uintmax_t to print it becomes a style bug in the kernel too (when %z is implemented). To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Oct 8 23:15: 7 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 13FDA37B406 for ; Tue, 8 Oct 2002 23:15:04 -0700 (PDT) Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id 816E343E65 for ; Tue, 8 Oct 2002 23:15:03 -0700 (PDT) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g996Ex818482; Wed, 9 Oct 2002 02:14:59 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Wed, 9 Oct 2002 02:14:59 -0400 (EDT) From: Jeff Roberson To: Julian Elischer Cc: arch@freebsd.org Subject: Re: Scheduler framework. In-Reply-To: Message-ID: <20021009020755.N44108-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Tue, 8 Oct 2002, Julian Elischer wrote: > > On Tue, 8 Oct 2002, Jeff Roberson wrote: > > > I will post it soon. It's similar to the solaris/linux schedulers with a > > few exceptions. The base user priority is still assigned to the kseg, as > > is the fixed nice value. The kseg also records how many ticks each of > > it's kse's have slept for voluntarily. 
> > This value is decayed for every
> > tick that they are awake.  It is used to determine the interactivity and
> > dynamic priority of the kseg.
>
> Fair enough... we are presently keeping some of this in the kse and some
> in the ksegrp.

Yeah, I have some changes in proc.h that are a little messy.  I'm going to
have to figure out how to stuff scheduler-dependent data in there.  I don't
calculate pctcpu right now.  That's going to be my biggest problem with
this, I fear.

> > Each thread inherits a priority from its kseg and that priority is
> > adjusted as it enters/exits the kernel and through priority inheritance.
>
> That's basically what we do now..

Yep.

> > Threads should eventually be bound to KSEs and only migrated if a KSE is
> > starved.  This is essential for KSE to cpu affinity.
>
> That's basically the plan, but it's rather a complicated
> scenario.  Presently KSEs are the vehicle via which threads are
> scheduled, thereby stopping a process with a lot of threads from
> swamping the run queues.  Currently there is no binding between threads
> (unless they are specifically bound) and KSEs, or between KSEs and
> CPUs.  It was envisioned that at some time in the future
> some 'affinity' would be added.

Yes, I'm going to have to do that to complete the scheduler.  For now I'm
only implementing the KSE to cpu affinity.  The rest can fall into place
later.  This will at least give us good processor affinity in the
single-threaded case.

> > KSEs get a time slice that is determined by the kseg's priority and
> > interactivity.  When this time slice expires, the priority and new slice are
> > calculated based on the behavior of all KSEs in the group.  KSEs are bound
> > to a particular cpu and are only migrated when one cpu's load is way out
> > of balance.
>
> I was thinking about that.. my thought was that
> there is little point in moving a KSE to a processor that already has a
> KSE from that group.
> Just migrate the threads to the KSE that is on the
> processor that has time..  However, since then I have decided that in a
> heavily loaded system, it may be worth shifting 2 KSEs to the same
> processor if it has other work and neither would run for 50% anyhow.

I think that you should move other work onto the lightly loaded processor
and not 2 KSEs from the same process onto the same processor.  KSEs are
what define the parallelism of the application.  Having two in the same
kseg on the same processor defeats that.  Do you have any plans to
implement cpu binding?

> > There are per-cpu run queues, but there is still only one global sched
> > lock.  I did this for simplicity's sake.  It's too late in 5.0 to risk
> > complications such as that.  This also makes migration a no-brainer.
> >
> > It is O(1) as well.  That seems to be quite fashionable these days.
> >
> > > Can we have a 'rough outline' here on -arch of your ideas?
> >
> > Well, the API in sched.h pretty much says it all.  It's intended to be as
> > minimal as possible.  I hooked all the places that a scheduler might be
> > interested in adjusting priorities and turned that into an API.  Much of
> > that was just figuring out where my scheduler and the old scheduler wanted
> > to do different things.
>
> Have you taken into account the KSE loaning that just went into the
> tree in the last week..?  It's needed to stop deadlocks and starvation
> in some common cases.

Nope, I haven't looked into that.  It sounds like it would just make
performance a little worse by forcing a thread to temporarily migrate.
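
For readers following along, here is a rough sketch in C of the sleep-tick
accounting described earlier in this message: voluntary sleep ticks are
credited to the kseg, decayed for every tick spent awake, and folded into
an interactivity test and a dynamic priority.  The struct, the function
names, and every constant (cap, threshold, bonus scale, priority range) are
assumptions made up for illustration; none of them come from the actual
scheduler patch or from sched.h.

```c
#include <assert.h>

#define SLP_MAX       100  /* assumed cap on remembered sleep ticks */
#define INTERACT_MIN   50  /* assumed threshold for "interactive" */
#define PRI_MIN         0  /* assumed priority range, lower is better */
#define PRI_MAX       127

/* Hypothetical per-kseg scheduler data. */
struct kseg_sched {
    int base_pri;   /* base user priority */
    int nice;       /* fixed nice value */
    int slp_ticks;  /* decayed count of voluntary sleep ticks */
};

static int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* A KSE in the group woke up after sleeping voluntarily for `ticks`. */
void kseg_slept(struct kseg_sched *kg, int ticks)
{
    assert(ticks >= 0);
    if (ticks > SLP_MAX)            /* clamp first so the sum cannot overflow */
        ticks = SLP_MAX;
    kg->slp_ticks = clamp(kg->slp_ticks + ticks, 0, SLP_MAX);
}

/* One clock tick passed while a KSE in the group was awake. */
void kseg_tick_awake(struct kseg_sched *kg)
{
    if (kg->slp_ticks > 0)
        kg->slp_ticks--;
}

/* A group that has recently slept a lot is treated as interactive. */
int kseg_interactive(const struct kseg_sched *kg)
{
    return kg->slp_ticks >= INTERACT_MIN;
}

/* Dynamic priority: base plus nice, improved by a sleep-based bonus. */
int kseg_dyn_pri(const struct kseg_sched *kg)
{
    int bonus = kg->slp_ticks / 10;  /* assumed scale: 0..10 */

    return clamp(kg->base_pri + kg->nice - bonus, PRI_MIN, PRI_MAX);
}
```

With these made-up numbers, a group at base priority 60 that sleeps for 80
ticks gets a bonus of 8 and counts as interactive; after 40 ticks awake the
bonus drops to 4 and the group is no longer interactive.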
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 1:20:14 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 92CAE37B401 for ; Wed, 9 Oct 2002 01:20:11 -0700 (PDT) Received: from rwcrmhc52.attbi.com (rwcrmhc52.attbi.com [216.148.227.88]) by mx1.FreeBSD.org (Postfix) with ESMTP id D474843E42 for ; Wed, 9 Oct 2002 01:20:10 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc52.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021009082010.UJTF2722.rwcrmhc52.attbi.com@InterJet.elischer.org>; Wed, 9 Oct 2002 08:20:10 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id BAA44329; Wed, 9 Oct 2002 01:04:48 -0700 (PDT) Date: Wed, 9 Oct 2002 01:04:47 -0700 (PDT) From: Julian Elischer To: Jeff Roberson Cc: arch@freebsd.org Subject: Re: Scheduler framework. In-Reply-To: <20021009020755.N44108-100000@mail.chesapeake.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 9 Oct 2002, Jeff Roberson wrote: [...] > what defines the parallelism of the application. Having two in the same > kseg on the same processor defeats that. Do you have any plans to > implement cpu binding? I have plans to plan it.. :-) > > > to do different things. > > > > Have you taken into account the KSE loaning that just went into the > > tree in the last week..? It's needed to stop deadlocks and starvation > > in some common cases. > > > > Nope, I haven't looked into that. 
> It sounds like it would just make
> performance a little worse by forcing a thread to temporarily migrate.

No, that's not what it is.  Consider:

Thread A gets some resource (X) and then blocks, causing an upcall that
results in thread B running.  Thread B becomes BOUND to its KSE (through
one of several possible methods) and then needs resource X.  However, A
cannot proceed because the KSE is bound to B.

KSE loaning allows the KSE to be given to A to complete its operation
within the kernel, thereby allowing B to eventually run again and claim
back the KSE.

Basically, the life of a thread in a syscall is:

If a thread mailbox is not provided:
  The thread is considered BOUND.  Blocking will not disassociate the KSE
  from the thread.  While the thread is on the sleep queue, the KSE is
  still pointed to by the thread, and vice versa (except for loaning,
  see later).  When the thread is restarted, it continues with the same
  KSE (* that is important) and returns to userland directly as per normal.

If a thread mailbox IS provided:
  The syscall is entered.
  The thread blocks.  A second thread is invoked and attached
  to the KSE, which is disconnected from the original thread.
  The new thread is set to do an upcall, and since it must not
  create any more threads if IT blocks, and since IT does not have a thread
  mailbox, the new thread and the KSE are bound together.
  The upcall goes to the user boundary.  In thread_userret() the
  ksegrp is scanned for any runnable threads that need a KSE to complete.
  The upcalling thread is held to one side while the KSE is applied to
  each completing thread in turn, and they write their
  exit status back to their individual mailboxes.  After the last one
  is finished, the upcall is allowed to complete, and reports ALL the
  completed syscalls to the userland scheduler.
  At some time in the future the original thread is awoken
  and cannot proceed due to lack of a KSE.
  The next time a KSE is available,
  or the next time one tries to go to userland, the same scheme as described
  above happens and it acquires a KSE for long enough to
  run back to the user boundary and write its completion status back to
  the mailbox.  It then hands the KSE back to the owner, who will upcall
  and report the completed thread.

Note: upcalls are BOUND.

BOUND threads lend their KSEs when they block or cross to userland.
When there is no work to do, a borrowed KSE reverts to its "owner", which
continues (unless it is otherwise blocked).
The lender will never restart with some other KSE while its KSE is lent out.

A bound thread will ALWAYS use a particular KSE and can NEVER swap KSEs.

An unbound thread can swap KSEs any time it likes, and when it is blocked,
the KSE is free to run other threads, EVEN to USERLAND.  When it has no
work to do it becomes idle (on the idle queue).

A borrowed KSE can never go to userland.  When it has no work to do it
reverts to the owner thread and tries to run that.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message
From owner-freebsd-arch Wed Oct 9 9:33:22 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 562E437B404 for ; Wed, 9 Oct 2002 09:33:21 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [217.73.193.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id E524E43E4A for ; Wed, 9 Oct 2002 09:33:19 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is (is.stack.net [217.73.193.40]) by park.rambler.ru (8.11.6/8.9.3) with ESMTP id g99GXC075605; Wed, 9 Oct 2002 20:33:13 +0400 (MSD) (envelope-from is@rambler-co.ru) Date: Wed, 9 Oct 2002 20:33:12 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Julian Elischer Cc: arch@FreeBSD.ORG Subject: Re: Scheduler framework. In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 9 Oct 2002, Julian Elischer wrote: > If a thread mailbox IS provided: > the syscall is entered. > the thread blocks. A second thread is invoked and attached > to the KSE, which is disconnected from the original thread. Sorry, where did this second thread come from ?
> The new thread is set to do an upcall, and since it must not
> create any more threads if IT blocks, and since IT does not have a thread
> mailbox, the new thread and the KSE are bound together.
> The upcall goes to the user boundary.  In thread_userret() the
> ksegrp is scanned for any runnable threads that need a KSE to complete.
> The upcalling thread is held to one side while the KSE is applied to
> each completing thread in turn, and they write their
> exit status back to their individual mailboxes.  After the last one
> is finished, the upcall is allowed to complete, and reports ALL the
> completed syscalls to the userland scheduler.
> At some time in the future the original thread is awoken
> and cannot proceed due to lack of a KSE.  The next time a KSE is available,
> or the next time one tries to go to userland, the same scheme as described
> above happens and it acquires a KSE for long enough to
> run back to the user boundary and write its completion status back to
> the mailbox.  It then hands the KSE back to the owner, who will upcall
> and report the completed thread.
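
The loaning rules Julian lists in this thread (a bound thread keeps its
KSE, lends it only while blocked, and a borrowed KSE never crosses to
userland and reverts to its owner) can be modelled as a tiny state machine.
This is only a sketch under stated assumptions: the structs, field names,
and functions below are hypothetical and are not the kernel's real
struct thread or struct kse, and locking and run queues are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct kse;

/* Hypothetical, stripped-down thread. */
struct thread {
    struct kse *kse;  /* KSE currently attached, or NULL */
    bool bound;       /* a bound thread only ever runs on its own KSE */
    bool blocked;     /* sleeping, or waiting at the user boundary */
};

/* Hypothetical, stripped-down KSE. */
struct kse {
    struct thread *owner;  /* bound thread that owns this KSE */
    struct thread *user;   /* thread currently running on it */
};

/* Bind td to ke: td becomes both the owner and the current user. */
void kse_bind(struct kse *ke, struct thread *td)
{
    assert(ke != NULL && td != NULL);
    ke->owner = td;
    ke->user = td;
    td->kse = ke;
    td->bound = true;
    td->blocked = false;
}

/* True while the KSE is running a thread other than its owner. */
bool kse_is_borrowed(const struct kse *ke)
{
    return ke->owner != NULL && ke->user != ke->owner;
}

/*
 * Lend a blocked owner's KSE to `borrower` so it can finish its work
 * inside the kernel.  Returns false if the loan is not allowed.
 * The owner keeps pointing at the KSE for the whole loan.
 */
bool kse_lend(struct kse *ke, struct thread *borrower)
{
    if (ke->owner == NULL || !ke->owner->blocked || kse_is_borrowed(ke))
        return false;
    if (borrower->bound)  /* bound threads never swap KSEs */
        return false;
    ke->user = borrower;
    borrower->kse = ke;
    borrower->blocked = false;
    return true;
}

/* A borrowed KSE may run kernel code only; it never goes to userland. */
bool kse_may_enter_userland(const struct kse *ke)
{
    return !kse_is_borrowed(ke);
}

/* The borrower has no more work: hand the KSE back to its owner. */
void kse_revert(struct kse *ke)
{
    if (kse_is_borrowed(ke)) {
        ke->user->kse = NULL;
        ke->user = ke->owner;
    }
}
```

In the deadlock example from the thread, B is bound to the KSE and blocks
waiting for resource X; kse_lend() gives the KSE to A so that A can finish
in the kernel and release X, and kse_revert() then returns it to B.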
Igor Sysoev http://sysoev.ru To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 10:20:33 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D730C37B401 for ; Wed, 9 Oct 2002 10:20:32 -0700 (PDT) Received: from sccrmhc03.attbi.com (sccrmhc03.attbi.com [204.127.202.63]) by mx1.FreeBSD.org (Postfix) with ESMTP id C592B43E65 for ; Wed, 9 Oct 2002 10:20:31 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by sccrmhc03.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021009172030.CQJJ20316.sccrmhc03.attbi.com@InterJet.elischer.org>; Wed, 9 Oct 2002 17:20:30 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id KAA46397; Wed, 9 Oct 2002 10:03:30 -0700 (PDT) Date: Wed, 9 Oct 2002 10:03:29 -0700 (PDT) From: Julian Elischer To: Igor Sysoev Cc: arch@FreeBSD.ORG Subject: Re: Scheduler framework. In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

On Wed, 9 Oct 2002, Igor Sysoev wrote:

> On Wed, 9 Oct 2002, Julian Elischer wrote:
>
> > If a thread mailbox IS provided:
> >   the syscall is entered.
> >   the thread blocks. A second thread is invoked and attached
> > to the KSE, which is disconnected from the original thread.
>
> Sorry, where did this second thread come from ?

There is a thread_allocator that allocates threads on demand.

Actually the process has a couple of spare threads "up its sleeve"
so it doesn't have to go to the thread allocator every time.
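
A minimal sketch of that "spare threads" idea in C, for illustration only:
a small per-process cache sits in front of a general allocator, so the
common case never reaches the allocator.  The names thread_get and
thread_put, the cache size of 2, and the use of calloc()/free() as a
stand-in for the kernel's thread allocator are all assumptions; the real
kernel code is not shown in this thread.

```c
#include <assert.h>
#include <stdlib.h>

#define SPARE_MAX 2  /* "a couple" of spares; the exact count is assumed */

/* Hypothetical, stripped-down thread. */
struct thread {
    int id;
};

/* Hypothetical per-process state holding the spare threads. */
struct proc {
    struct thread *spare[SPARE_MAX];
    int nspare;
    int alloc_calls;  /* counts trips to the allocator, for illustration */
};

/* Get a thread: use a cached spare if one exists, else allocate. */
struct thread *thread_get(struct proc *p)
{
    if (p->nspare > 0)
        return p->spare[--p->nspare];
    p->alloc_calls++;
    return calloc(1, sizeof(struct thread));  /* stand-in allocator */
}

/* Return a thread: keep it as a spare if there is room, else free it. */
void thread_put(struct proc *p, struct thread *td)
{
    assert(td != NULL);
    if (p->nspare < SPARE_MAX)
        p->spare[p->nspare++] = td;
    else
        free(td);
}
```

Since the cache belongs to one process, access to it can be made exclusive
without taking the allocator's lock, at the cost of hiding a few objects
from the allocator's global view of memory.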
> > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 10:29:58 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 87D3837B401 for ; Wed, 9 Oct 2002 10:29:57 -0700 (PDT) Received: from mail.speakeasy.net (mail12.speakeasy.net [216.254.0.212]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1DB9943E6E for ; Wed, 9 Oct 2002 10:29:57 -0700 (PDT) (envelope-from jhb@FreeBSD.org) Received: (qmail 4005 invoked from network); 9 Oct 2002 17:29:56 -0000 Received: from unknown (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) by mail12.speakeasy.net (qmail-ldap-1.03) with DES-CBC3-SHA encrypted SMTP for ; 9 Oct 2002 17:29:56 -0000 Received: from laptop.baldwin.cx (gw1.twc.weather.com [216.133.140.1]) by server.baldwin.cx (8.12.6/8.12.6) with ESMTP id g99HTsn5011428; Wed, 9 Oct 2002 13:29:54 -0400 (EDT) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.5.2 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: Date: Wed, 09 Oct 2002 13:29:58 -0400 (EDT) From: John Baldwin To: Julian Elischer Subject: Re: Scheduler framework. Cc: arch@FreeBSD.ORG, Igor Sysoev Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 09-Oct-2002 Julian Elischer wrote: > > > On Wed, 9 Oct 2002, Igor Sysoev wrote: > >> On Wed, 9 Oct 2002, Julian Elischer wrote: >> >> > If a thread mailbox IS provided: >> > the syscall is entered. >> > the thread blocks. A second thread is invoked and attached >> > to the KSE, which is disconnected from the original thread. >> >> Sorry, where did this second thread come from ? > > there is a thread_allocator that allocates threads on demand. 
> > Actually the process ahs a couple of spare threads "Up its sleave" > so it doesn't have to go to teh thread allocator every time.. Which kind of defeats the point of letting the slab allocator manage memory from a larger whole-view perspective. :-P -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 10:31:15 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3DA3537B401; Wed, 9 Oct 2002 10:31:14 -0700 (PDT) Received: from canning.wemm.org (canning.wemm.org [192.203.228.65]) by mx1.FreeBSD.org (Postfix) with ESMTP id 04F6343E42; Wed, 9 Oct 2002 10:31:10 -0700 (PDT) (envelope-from peter@wemm.org) Received: from wemm.org (localhost [127.0.0.1]) by canning.wemm.org (Postfix) with ESMTP id 9D1862A88D; Wed, 9 Oct 2002 10:31:06 -0700 (PDT) (envelope-from peter@wemm.org) X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4 To: Bruce Evans Cc: Mike Barcroft , Andrew Gallatin , freebsd-arch@FreeBSD.ORG Subject: Re: lp64 vs lp32 printf In-Reply-To: <20021009161756.E4040-100000@gamplex.bde.org> Date: Wed, 09 Oct 2002 10:31:06 -0700 From: Peter Wemm Message-Id: <20021009173106.9D1862A88D@canning.wemm.org> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Bruce Evans wrote: > On Tue, 8 Oct 2002, Mike Barcroft wrote: > > > Andrew Gallatin writes: > > > > > > What's the accepted way to printf something (like sizeof()) which > > > boils down to "unsigned int" on x86 and "unsigned long" on the LP64 > > > platforms? > > > > In userland you can use %z for printing size_t's. In the kernel, > > casting to intmax_t/uintmax_t and using %j is correct. 
> > Um, using intmax_t to print size_t's would be incorrect, since it is > signed. Using uintmax_t would be bloat. Very few typedefed types > need the full bloat of [u]intmax_t, and size_t is unlikely to become > one of them before casting it to uintmax_t to print it becomes a style > bug in the kernel too (when %z is implemented). Bring it on! The sooner %z gets here the better. The only problem is that gcc has been taught that %z means something different in the kernel. :-( Cheers, -Peter -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com "All of this is for nothing if we don't go to the stars" - JMS/B5 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 10:31:46 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6222B37B401; Wed, 9 Oct 2002 10:31:44 -0700 (PDT) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id AF2F043E42; Wed, 9 Oct 2002 10:31:40 -0700 (PDT) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.12.4/8.12.4) with SMTP id g99HVAOo021184; Wed, 9 Oct 2002 13:31:11 -0400 (EDT) (envelope-from robert@fledge.watson.org) Date: Wed, 9 Oct 2002 13:31:10 -0400 (EDT) From: Robert Watson X-Sender: robert@fledge.watson.org To: Juli Mallett Cc: Garrett Wollman , arch@FreeBSD.org Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] In-Reply-To: <20021008150324.A47084@FreeBSD.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Tue, 8 Oct 2002, Juli Mallett wrote: > * De: 
Garrett Wollman [ Data: 2002-10-05 ] > [ Subjecte: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] ] > > >The most notable change is that the most recently sent && lowest > > >numbered signal is sent, in the normal course of events, rather than > > >simply the lowest numbered or most recently sent. > > > > This still isn't right. Real-time signals are QUEUED -- i.e., signals > > of the same species are delivered in FIFO, not LIFO, order. POSIX > > further specifies that signal N will be delivered before signal N+k, > > for SIGRTMIN <= N <= N+k <= SIGRTMAX. The relative delivery order of > > any signals outside of this range is unspecified beyond the special > > behavior of SIGCONT, SIGSTOP, and SIGKILL. > > OK, I'm reading through this stuff extensively. There's a number of > kernel interfaces that I'd like to add, related to them, but first thing > is to get the queueing in there, IMHO, so that the base functionality is > there to be built on. sigqueue() for example is about 10LOC with this > stuff, and adding 'si_errno' stuff (which I'll love to have around) is > just a matter of 4 lines of code wherever it can be used, once I've > added a supportable in-kernel abstraction of psignal that takes a ksi, > and does the normal sanity checks. > > That will make psignal about 12LOC, given that there's about 2LOC more > than sigqueue() needed, as most of that is allocation and filling out a > structure. Lines of code is not a good measure of complexity, especially when what you're doing is moving and introducing complexity in other bits of the code. I appreciate that this improves the abstractions some, but psignal() is actually not all that terrible. > So assuming the FIFO behaviour is fixed, and that I also deliver the > lowest available signal, and given that I plan to implement the above, > do you have ny further objections? > > Other than the issue of the bitmask, which I see no easy and reliable > method for getting around cleanly... 
And the failure cases. Would you > settle for me using subr_sigq.c as my abstraction, and making actual > queues optional, and having it use sigset_t under certain circumstances? > It will add about 8LOC to every sendsig() to support pulling out the > information when no ksiginfo is around. Signal queues involve failures. If at all possible, I'd like us to use a strategy that: (1) Avoids the failure modes of signal queues in critical places (i.e., termination of a process that is consuming too many resources, such as in OOM). (2) Avoids the failure modes of signal queues in situations where access to the signal data is not critical (i.e., if the receiving process isn't requesting information on signals, don't store it -- I don't know if the POSIX API supports this semantic though). (3) Leaves the failure mode semantics up to the caller, so that the caller can decide if the signal delivery attempt is something worth retrying or just ignoring. The Linux behavior I looked at (and told you about) is that the actual signal queueing routines return EAGAIN if the slab allocation fails, permitting the caller to retry if it wants, or more likely, simply drop the signal. I believe this is how Linux handles slab allocator failures for things like SIGIO, SIGCHLD, etc. In terms of strategy for supporting a changed to queued signals, my recommendation would be that you go ahead and implement the POSIX realtime signals based on your structural changes in a local tree and make sure the structural changes end up doing what you need. Then present the whole bundle as one big patch on arch@, along with an indication of how the elements of the commit relate. 
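
To make the ordering and failure semantics discussed in this thread
concrete (FIFO within one signal number, lower numbers delivered first,
and EAGAIN handed back so the caller decides whether to retry or drop),
here is a small userland sketch in C.  The names sigq_post and sigq_take,
the struct layout, and the range of 8 signal slots are invented for
illustration; slot N stands in for SIGRTMIN+N, and malloc() stands in for
whatever allocator the kernel would use.  This is not the code from the
patch under review.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NSIGQ 8  /* assumed number of queued signal slots for the demo */

/* One queued signal instance with its payload. */
struct ksig {
    int signo;
    int value;
    struct ksig *next;
};

/* One FIFO list per signal number. */
struct sigq {
    struct ksig *head[NSIGQ];
    struct ksig *tail[NSIGQ];
};

/*
 * Queue a signal.  Returns 0 on success, EINVAL for a bad number, or
 * EAGAIN if memory is short; the caller chooses to retry or drop.
 */
int sigq_post(struct sigq *q, int signo, int value)
{
    struct ksig *ks;

    assert(q != NULL);
    if (signo < 0 || signo >= NSIGQ)
        return EINVAL;
    ks = malloc(sizeof(*ks));
    if (ks == NULL)
        return EAGAIN;
    ks->signo = signo;
    ks->value = value;
    ks->next = NULL;
    if (q->tail[signo] != NULL)
        q->tail[signo]->next = ks;
    else
        q->head[signo] = ks;
    q->tail[signo] = ks;
    return 0;
}

/*
 * Dequeue the oldest instance of the lowest-numbered pending signal.
 * Returns 1 and fills signo/value if something was pending, else 0.
 */
int sigq_take(struct sigq *q, int *signo, int *value)
{
    assert(q != NULL);
    for (int s = 0; s < NSIGQ; s++) {
        struct ksig *ks = q->head[s];

        if (ks == NULL)
            continue;
        q->head[s] = ks->next;
        if (q->head[s] == NULL)
            q->tail[s] = NULL;
        *signo = ks->signo;
        *value = ks->value;
        free(ks);
        return 1;
    }
    return 0;
}
```

Posting slot 5, then slot 3, then slot 5 again and draining the queue
yields slot 3 first, followed by the two slot-5 entries in the order they
were posted.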
Robert N M Watson FreeBSD Core Team, TrustedBSD Projects robert@fledge.watson.org Network Associates Laboratories To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 10:33:32 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CDB9537B401 for ; Wed, 9 Oct 2002 10:33:31 -0700 (PDT) Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2AD5F43E65 for ; Wed, 9 Oct 2002 10:33:31 -0700 (PDT) (envelope-from gallatin@cs.duke.edu) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id NAA23894; Wed, 9 Oct 2002 13:33:30 -0400 (EDT) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.11.6/8.9.1) id g99HX0954344; Wed, 9 Oct 2002 13:33:00 -0400 (EDT) (envelope-from gallatin@cs.duke.edu) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15780.26700.615985.133379@grasshopper.cs.duke.edu> Date: Wed, 9 Oct 2002 13:33:00 -0400 (EDT) To: Peter Wemm Cc: freebsd-arch@FreeBSD.ORG Subject: Re: lp64 vs lp32 printf In-Reply-To: <20021009173106.9D1862A88D@canning.wemm.org> References: <20021009161756.E4040-100000@gamplex.bde.org> <20021009173106.9D1862A88D@canning.wemm.org> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Peter Wemm writes: > > > > Um, using intmax_t to print size_t's would be incorrect, since it is > > signed. Using uintmax_t would be bloat. 
Very few typedefed types > > need the full bloat of [u]intmax_t, and size_t is unlikely to become > > one of them before casting it to uintmax_t to print it becomes a style > > bug in the kernel too (when %z is implemented). > > Bring it on! The sooner %z gets here the better. The only problem is that > gcc has been taught that %z means something different in the kernel. :-( Where is gcc taught these things? Can we update it? Drew To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 10:41:57 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4BF9A37B401; Wed, 9 Oct 2002 10:41:56 -0700 (PDT) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id B4E7C43E65; Wed, 9 Oct 2002 10:41:55 -0700 (PDT) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id 95088AE2B7; Wed, 9 Oct 2002 10:41:55 -0700 (PDT) Date: Wed, 9 Oct 2002 10:41:55 -0700 From: Alfred Perlstein To: John Baldwin Cc: Julian Elischer , arch@FreeBSD.ORG, Igor Sysoev Subject: Re: Scheduler framework. Message-ID: <20021009174155.GJ95327@elvis.mu.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG * John Baldwin [021009 10:30] wrote: > > On 09-Oct-2002 Julian Elischer wrote: > > > > > > On Wed, 9 Oct 2002, Igor Sysoev wrote: > > > >> On Wed, 9 Oct 2002, Julian Elischer wrote: > >> > >> > If a thread mailbox IS provided: > >> > the syscall is entered. > >> > the thread blocks. 
A second thread is invoked and attached > >> > to the KSE, which is disconnected from the original thread. > >> > >> Sorry, where did this second thread come from ? > > > > there is a thread_allocator that allocates threads on demand. > > > > Actually the process ahs a couple of spare threads "Up its sleave" > > so it doesn't have to go to teh thread allocator every time.. > > Which kind of defeats the point of letting the slab allocator manage > memory from a larger whole-view perspective. :-P Kind of, but not entirely, since one can guarantee exclusive access to a private pool and therefor doesn't need locks. I'd be nice if there was a macro or something to do this in a official sanctioned API. -- -Alfred Perlstein [alfred@freebsd.org] 'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 11: 4:21 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 096FC37B401 for ; Wed, 9 Oct 2002 11:04:20 -0700 (PDT) Received: from mail.speakeasy.net (mail16.speakeasy.net [216.254.0.216]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3788843E4A for ; Wed, 9 Oct 2002 11:04:19 -0700 (PDT) (envelope-from jhb@FreeBSD.org) Received: (qmail 8084 invoked from network); 9 Oct 2002 18:04:19 -0000 Received: from unknown (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) by mail16.speakeasy.net (qmail-ldap-1.03) with DES-CBC3-SHA encrypted SMTP for ; 9 Oct 2002 18:04:19 -0000 Received: from laptop.baldwin.cx (gw1.twc.weather.com [216.133.140.1]) by server.baldwin.cx (8.12.6/8.12.6) with ESMTP id g99I4Fn5011539; Wed, 9 Oct 2002 14:04:16 -0400 (EDT) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.5.2 on FreeBSD X-Priority: 3 (Normal) 
Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <15780.26700.615985.133379@grasshopper.cs.duke.edu> Date: Wed, 09 Oct 2002 14:04:19 -0400 (EDT) From: John Baldwin To: Andrew Gallatin Subject: Re: lp64 vs lp32 printf Cc: freebsd-arch@FreeBSD.ORG, Peter Wemm Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 09-Oct-2002 Andrew Gallatin wrote: > > Peter Wemm writes: > > > > > > Um, using intmax_t to print size_t's would be incorrect, since it is > > > signed. Using uintmax_t would be bloat. Very few typedefed types > > > need the full bloat of [u]intmax_t, and size_t is unlikely to become > > > one of them before casting it to uintmax_t to print it becomes a style > > > bug in the kernel too (when %z is implemented). > > > > Bring it on! The sooner %z gets here the better. The only problem is that > > gcc has been taught that %z means something different in the kernel. :-( > > Where is gcc taught these things? Can we update it? We should be able to change the kernel %z to some other weird letter. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" 
- http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 11:11:49 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 241EC37B401; Wed, 9 Oct 2002 11:11:48 -0700 (PDT) Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id 62F3943E4A; Wed, 9 Oct 2002 11:11:47 -0700 (PDT) (envelope-from gallatin@cs.duke.edu) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id OAA25403; Wed, 9 Oct 2002 14:11:47 -0400 (EDT) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.11.6/8.9.1) id g99IBGL54381; Wed, 9 Oct 2002 14:11:17 -0400 (EDT) (envelope-from gallatin@cs.duke.edu) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15780.28996.936657.152472@grasshopper.cs.duke.edu> Date: Wed, 9 Oct 2002 14:11:16 -0400 (EDT) To: John Baldwin Cc: freebsd-arch@FreeBSD.org, Peter Wemm Subject: Re: lp64 vs lp32 printf In-Reply-To: References: <15780.26700.615985.133379@grasshopper.cs.duke.edu> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG John Baldwin writes: > > On 09-Oct-2002 Andrew Gallatin wrote: > > > > Peter Wemm writes: > > > > > > > > Um, using intmax_t to print size_t's would be incorrect, since it is > > > > signed. Using uintmax_t would be bloat. 
Very few typedefed types > > > > need the full bloat of [u]intmax_t, and size_t is unlikely to become > > > > one of them before casting it to uintmax_t to print it becomes a style > > > > bug in the kernel too (when %z is implemented). > > > > > > Bring it on! The sooner %z gets here the better. The only problem is that > > > gcc has been taught that %z means something different in the kernel. :-( > > > > Where is gcc taught these things? Can we update it? > > We should be able to change the kernel %z to some other weird letter. Sure.. but do you know where in the sources %z is defined to be something weird? Drew To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 11:20:30 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A13F837B401 for ; Wed, 9 Oct 2002 11:20:29 -0700 (PDT) Received: from mail.speakeasy.net (mail14.speakeasy.net [216.254.0.214]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3469843E75 for ; Wed, 9 Oct 2002 11:20:29 -0700 (PDT) (envelope-from jhb@FreeBSD.org) Received: (qmail 6928 invoked from network); 9 Oct 2002 18:20:28 -0000 Received: from unknown (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) by mail14.speakeasy.net (qmail-ldap-1.03) with DES-CBC3-SHA encrypted SMTP for ; 9 Oct 2002 18:20:28 -0000 Received: from laptop.baldwin.cx (gw1.twc.weather.com [216.133.140.1]) by server.baldwin.cx (8.12.6/8.12.6) with ESMTP id g99IKPn5011602; Wed, 9 Oct 2002 14:20:26 -0400 (EDT) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.5.2 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: Date: Wed, 09 Oct 2002 14:20:29 -0400 (EDT) From: John Baldwin To: John Baldwin Subject: Re: lp64 vs lp32 printf Cc: Peter Wemm , freebsd-arch@FreeBSD.ORG, Andrew 
Gallatin Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 09-Oct-2002 John Baldwin wrote: > > On 09-Oct-2002 Andrew Gallatin wrote: >> >> Peter Wemm writes: >> > > >> > > Um, using intmax_t to print size_t's would be incorrect, since it is >> > > signed. Using uintmax_t would be bloat. Very few typedefed types >> > > need the full bloat of [u]intmax_t, and size_t is unlikely to become >> > > one of them before casting it to uintmax_t to print it becomes a style >> > > bug in the kernel too (when %z is implemented). >> > >> > Bring it on! The sooner %z gets here the better. The only problem is that >> > gcc has been taught that %z means something different in the kernel. :-( >> >> Where is gcc taught these things? Can we update it? > > We should be able to change the kernel %z to some other weird letter. Actually, nothing in the kernel uses %z. It is a version of %x that allows for a sign (e.g. -0x10 instead of 0xfffffff0, or +0x10). -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" 
- http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 11:30:31 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4512137B401 for ; Wed, 9 Oct 2002 11:30:30 -0700 (PDT) Received: from mail.speakeasy.net (mail17.speakeasy.net [216.254.0.217]) by mx1.FreeBSD.org (Postfix) with ESMTP id BD9E943E77 for ; Wed, 9 Oct 2002 11:30:29 -0700 (PDT) (envelope-from jhb@FreeBSD.org) Received: (qmail 21218 invoked from network); 9 Oct 2002 18:30:29 -0000 Received: from unknown (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) by mail17.speakeasy.net (qmail-ldap-1.03) with DES-CBC3-SHA encrypted SMTP for ; 9 Oct 2002 18:30:29 -0000 Received: from laptop.baldwin.cx (gw1.twc.weather.com [216.133.140.1]) by server.baldwin.cx (8.12.6/8.12.6) with ESMTP id g99IUPn5011649; Wed, 9 Oct 2002 14:30:25 -0400 (EDT) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.5.2 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <15780.28996.936657.152472@grasshopper.cs.duke.edu> Date: Wed, 09 Oct 2002 14:30:29 -0400 (EDT) From: John Baldwin To: Andrew Gallatin Subject: Re: lp64 vs lp32 printf Cc: Peter Wemm , freebsd-arch@FreeBSD.org Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 09-Oct-2002 Andrew Gallatin wrote: > > John Baldwin writes: > > > > On 09-Oct-2002 Andrew Gallatin wrote: > > > > > > Peter Wemm writes: > > > > > > > > > > Um, using intmax_t to print size_t's would be incorrect, since it is > > > > > signed. Using uintmax_t would be bloat. 
Very few typedefed types > > > > > need the full bloat of [u]intmax_t, and size_t is unlikely to become > > > > > one of them before casting it to uintmax_t to print it becomes a style > > > > > bug in the kernel too (when %z is implemented). > > > > > > > > Bring it on! The sooner %z gets here the better. The only problem is that > > > > gcc has been taught that %z means something different in the kernel. :-( > > > > > > Where is gcc taught these things? Can we update it? > > > > We should be able to change the kernel %z to some other weird letter. > > Sure.. but do you know where in the sources %z is defined to be > something weird? sys/kern/subr_prf.c in the kernel, and in the -fformat-extensions local patches stuff for gcc. I think the gcc work wouldn't be too difficult to do since it would just be renaming a letter. Hmm, I was incorrect (my grep re was busted) and %z is actually used in two places in ddb. We can either pick a letter to use or just use %x with explicit signs in those two cases: ddb/db_examine.c: db_printf("%-*lz", width, (long)value); ddb/db_examine.c: db_printf("%8lz", (long)addr); Hmm, the second case doesn't even use a sign so it can be %x anyways. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" 
- http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 11:39:25 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BE58237B407 for ; Wed, 9 Oct 2002 11:39:23 -0700 (PDT) Received: from mail.speakeasy.net (mail17.speakeasy.net [216.254.0.217]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7664443E88 for ; Wed, 9 Oct 2002 11:39:20 -0700 (PDT) (envelope-from jhb@FreeBSD.org) Received: (qmail 28130 invoked from network); 9 Oct 2002 18:39:21 -0000 Received: from unknown (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) by mail17.speakeasy.net (qmail-ldap-1.03) with DES-CBC3-SHA encrypted SMTP for ; 9 Oct 2002 18:39:21 -0000 Received: from laptop.baldwin.cx (gw1.twc.weather.com [216.133.140.1]) by server.baldwin.cx (8.12.6/8.12.6) with ESMTP id g99IdIn5011662; Wed, 9 Oct 2002 14:39:18 -0400 (EDT) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.5.2 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: Date: Wed, 09 Oct 2002 14:39:22 -0400 (EDT) From: John Baldwin To: John Baldwin Subject: Re: lp64 vs lp32 printf Cc: freebsd-arch@FreeBSD.org, Peter Wemm , Andrew Gallatin Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 09-Oct-2002 John Baldwin wrote: > > On 09-Oct-2002 Andrew Gallatin wrote: >> >> John Baldwin writes: >> > >> > On 09-Oct-2002 Andrew Gallatin wrote: >> > > >> > > Peter Wemm writes: >> > > > > >> > > > > Um, using intmax_t to print size_t's would be incorrect, since it is >> > > > > signed. Using uintmax_t would be bloat. 
Very few typedefed types >> > > > > need the full bloat of [u]intmax_t, and size_t is unlikely to become >> > > > > one of them before casting it to uintmax_t to print it becomes a style >> > > > > bug in the kernel too (when %z is implemented). >> > > > >> > > > Bring it on! The sooner %z gets here the better. The only problem is that >> > > > gcc has been taught that %z means something different in the kernel. :-( >> > > >> > > Where is gcc taught these things? Can we update it? >> > >> > We should be able to change the kernel %z to some other weird letter. >> >> Sure.. but do you know where in the sources %z is defined to be >> something weird? > > sys/kern/subr_prf.c in the kernel, and in the -fformat-extensions local > patches stuff for gcc. I think the gcc work wouldn't be too difficult > to do since it would just be renaming a letter. Hmm, I was incorrect > (my grep re was busted) and %z is actually used in two places in ddb. > We can either pick a letter to use or just use %x with explicit signs > in those two cases: > > ddb/db_examine.c: db_printf("%-*lz", width, (long)value); > ddb/db_examine.c: db_printf("%8lz", (long)addr); > > Hmm, the second case doesn't even use a sign so it can be %x anyways. And the first one doesn't use the '+' modifier either so it can just be converted to use '%x' as well. Hmm, more likely is that probably these two places should be using '+z' instead of just 'z'. So, maybe 'y' instead of 'z'? -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" 
- http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 11:52:58 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8F18E37B401 for ; Wed, 9 Oct 2002 11:52:57 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [217.73.193.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0AC0643E75 for ; Wed, 9 Oct 2002 11:52:56 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is (is.stack.net [217.73.193.40]) by park.rambler.ru (8.11.6/8.9.3) with ESMTP id g99Iqm079966; Wed, 9 Oct 2002 22:52:48 +0400 (MSD) (envelope-from is@rambler-co.ru) Date: Wed, 9 Oct 2002 22:52:48 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Julian Elischer Cc: arch@FreeBSD.ORG Subject: Re: Scheduler framework. In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 9 Oct 2002, Julian Elischer wrote: > > > If a thread mailbox IS provided: > > > the syscall is entered. > > > the thread blocks. A second thread is invoked and attached > > > to the KSE, which is disconnected from the original thread. > > > > Sorry, where did this second thread come from ? > > there is a thread_allocator that allocates threads on demand. > > Actually the process ahs a couple of spare threads "Up its sleave" > so it doesn't have to go to teh thread allocator every time.. As I understand this second thread has user-level context and its context pointed by tm_context.uc_link of blocked thread mailbox. Am I right ? If it's so then can several threads use the same second thread context ? 
Igor Sysoev http://sysoev.ru To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 12:11:10 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BF22237B401 for ; Wed, 9 Oct 2002 12:11:09 -0700 (PDT) Received: from rootlabs.com (root.org [67.118.192.226]) by mx1.FreeBSD.org (Postfix) with SMTP id 659EF43E4A for ; Wed, 9 Oct 2002 12:11:09 -0700 (PDT) (envelope-from nate@rootlabs.com) Received: (qmail 14803 invoked by uid 1000); 9 Oct 2002 19:11:10 -0000 Date: Wed, 9 Oct 2002 12:11:10 -0700 (PDT) From: Nate Lawson To: Bruce Evans Cc: freebsd-arch@FreeBSD.ORG Subject: Re: lp64 vs lp32 printf In-Reply-To: <20021009161756.E4040-100000@gamplex.bde.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 9 Oct 2002, Bruce Evans wrote: > On Tue, 8 Oct 2002, Mike Barcroft wrote: > > > Andrew Gallatin writes: > > > > > > What's the accepted way to printf something (like sizeof()) which > > > boils down to "unsigned int" on x86 and "unsigned long" on the LP64 > > > platforms? > > > > In userland you can use %z for printing size_t's. In the kernel, > > casting to intmax_t/uintmax_t and using %j is correct. > > Um, using intmax_t to print size_t's would be incorrect, since it is > signed. Using uintmax_t would be bloat. Very few typedefed types > need the full bloat of [u]intmax_t, and size_t is unlikely to become > one of them before casting it to uintmax_t to print it becomes a style > bug in the kernel too (when %z is implemented). Ok, so back to Drew's original question. What's the accepted way (both kernel and user)? 
-Nate To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 12:16:17 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D4A5237B401 for ; Wed, 9 Oct 2002 12:16:16 -0700 (PDT) Received: from rootlabs.com (root.org [67.118.192.226]) by mx1.FreeBSD.org (Postfix) with SMTP id A1E2E43E42 for ; Wed, 9 Oct 2002 12:16:16 -0700 (PDT) (envelope-from nate@rootlabs.com) Received: (qmail 14819 invoked by uid 1000); 9 Oct 2002 19:16:17 -0000 Date: Wed, 9 Oct 2002 12:16:17 -0700 (PDT) From: Nate Lawson To: arch@freebsd.org Subject: slab allocator performance? Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG John Baldwin wrote: >>On 09-Oct-2002 Julian Elischer wrote: >> there is a thread_allocator that allocates threads on demand. >> >> Actually the process ahs a couple of spare threads "Up its sleave" >> so it doesn't have to go to teh thread allocator every time.. > >Which kind of defeats the point of letting the slab allocator manage >memory from a larger whole-view perspective. :-P I've written a driver that used to use a private struct pool. It mallocs/frees about 120 bytes per 32KB transaction. I'm curious how others are using the allocator and what kind of performance/usage it is best at. 
-Nate To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 12:20:42 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0D17C37B401 for ; Wed, 9 Oct 2002 12:20:41 -0700 (PDT) Received: from sccrmhc01.attbi.com (sccrmhc01.attbi.com [204.127.202.61]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7D1B743E3B for ; Wed, 9 Oct 2002 12:20:40 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by sccrmhc01.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021009192039.GOPB29655.sccrmhc01.attbi.com@InterJet.elischer.org>; Wed, 9 Oct 2002 19:20:39 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id MAA46976; Wed, 9 Oct 2002 12:05:47 -0700 (PDT) Date: Wed, 9 Oct 2002 12:05:44 -0700 (PDT) From: Julian Elischer To: Igor Sysoev Cc: arch@FreeBSD.ORG Subject: Re: Scheduler framework. In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 9 Oct 2002, Igor Sysoev wrote: > On Wed, 9 Oct 2002, Julian Elischer wrote: > > > > > If a thread mailbox IS provided: > > > > the syscall is entered. > > > > the thread blocks. A second thread is invoked and attached > > > > to the KSE, which is disconnected from the original thread. > > > > > > Sorry, where did this second thread come from ? > > > > there is a thread_allocator that allocates threads on demand. > > > > Actually the process ahs a couple of spare threads "Up its sleave" > > so it doesn't have to go to teh thread allocator every time.. 
> > As I understand this second thread has user-level context and > its context pointed by tm_context.uc_link of blocked thread mailbox. > Am I right ? The first thread has a user context. The second thread has no user context, so we manufacture one. This makes the second thread jump into user space, much like exec() jumps into user space, except that we don't jump to start() (or whatever it is called) but to the userland thread scheduler. We have a special stack for that upcall as well. In userspace the second thread will convert to another thread, with a different stack. When it does a syscall it is now in the same category as thread 1. If it blocks, then: if thread 1 is ready to run, we will switch back to thread 1. It will run until it wants to go back to userland, at which point it writes its context back to its mailbox and allows an upcall to happen. If thread 1 was NOT ready to run, then we will allocate thread 3, which will upcall on the upcall stack (as it is not being used any more). tm_context is where a completed syscall writes its state before an upcall reports that it has finished. > > If it's so then can several threads use the same second thread context ?
no > > Igor Sysoev > http://sysoev.ru > > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 12:30:34 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6E62B37B401 for ; Wed, 9 Oct 2002 12:30:32 -0700 (PDT) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id EE81C43E4A for ; Wed, 9 Oct 2002 12:30:30 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id FAA19401; Thu, 10 Oct 2002 05:30:23 +1000 Date: Thu, 10 Oct 2002 05:40:30 +1000 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: Andrew Gallatin Cc: Peter Wemm , Subject: Re: lp64 vs lp32 printf In-Reply-To: <15780.26700.615985.133379@grasshopper.cs.duke.edu> Message-ID: <20021010051056.C6361-100000@gamplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 9 Oct 2002, Andrew Gallatin wrote: > Peter Wemm writes: > > > > > > Um, using intmax_t to print size_t's would be incorrect, since it is > > > signed. Using uintmax_t would be bloat. Very few typedefed types > > > need the full bloat of [u]intmax_t, and size_t is unlikely to become > > > one of them before casting it to uintmax_t to print it becomes a style > > > bug in the kernel too (when %z is implemented). > > > > Bring it on! The sooner %z gets here the better. The only problem is that > > gcc has been taught that %z means something different in the kernel. :-( > > Where is gcc taught these things? Can we update it? From c-format.c: %%% /* BSD conversion specifiers. 
*/ /* FreeBSD kernel extensions (src/sys/kern/subr_prf.c). The format %b is supported to decode error registers. Its usage is: printf("reg=%b\n", regval, "*"); which produces: reg=3 The format %D provides a hexdump given a pointer and separator string: ("%6D", ptr, ":") -> XX:XX:XX:XX:XX:XX ("%*D", len, ptr, " ") -> XX XX XX XX ... */ { "D", 1, STD_EXT, { T89_C, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "-wp", "cR" }, { "b", 1, STD_EXT, { T89_C, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "-wp", "" }, { "rz", 0, STD_EXT, { T89_I, BADLEN, BADLEN, T89_L, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "-wp0 +#", "i" }, { NULL, 0, 0, NOLENGTHS, NULL, NULL } %%% The "z" in "rz" here gives the FreeBSD extension. There seem to be no conflicts in practice, because %z is a format specifier (like %x) in the extension, but is (bogusly) a length modifier (like %l) in C99. Plain %z (which can only be reasonably interpreted as a format specifier) is implemented by the T89_I entry in the above table. This is broken, because -fformat-extensions (brokenly) doesn't turn off the things that are not supported by the kernel printf (e.g., %z as a length modifier), and gcc interprets %z as a length modifier first. But this is harmless because plain %z is not used in the kernel. %lz is used and still works because there is no ambiguity for %z. This is implemented by the T89_L entry in the above table. Format checking for printing size_t's using %zu (or %zd) works too, since there is no ambiguity.
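For reference, the two spellings for printing a size_t that come up in this thread can be sketched in standard C99 as below. This is a userland illustration only, not kernel code; the helper names fmt_size_z and fmt_size_j are hypothetical, and whether the kernel printf accepts the z length modifier is exactly what is under discussion here.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a size_t with the C99 'z' length modifier.
 * Hypothetical helper name, for illustration only.
 */
static int fmt_size_z(char *buf, size_t len, size_t v)
{
    return snprintf(buf, len, "%zu", v);
}

/*
 * Format a size_t by widening it to uintmax_t and using the 'j'
 * length modifier.  The cast is unsigned, so no value is lost and
 * no sign is introduced.  Hypothetical helper name.
 */
static int fmt_size_j(char *buf, size_t len, size_t v)
{
    return snprintf(buf, len, "%ju", (uintmax_t)v);
}
```

Both helpers produce the same text for any size_t value; the second only costs a conversion to the widest unsigned type, which is the "bloat" objected to earlier in the thread.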
Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 13:30:47 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1C52737B401; Wed, 9 Oct 2002 13:30:46 -0700 (PDT) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id C673A43E3B; Wed, 9 Oct 2002 13:30:44 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id GAA23457; Thu, 10 Oct 2002 06:30:34 +1000 Date: Thu, 10 Oct 2002 06:40:41 +1000 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: John Baldwin Cc: Andrew Gallatin , Peter Wemm , Subject: Re: lp64 vs lp32 printf In-Reply-To: Message-ID: <20021010062921.T6622-100000@gamplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 9 Oct 2002, John Baldwin wrote: > ... Hmm, I was incorrect > (my grep re was busted) and %z is actually used in two places in ddb. > We can either pick a letter to use or just use %x with explicit signs > in those two cases: > > ddb/db_examine.c: db_printf("%-*lz", width, (long)value); > ddb/db_examine.c: db_printf("%8lz", (long)addr); > > Hmm, the second case doesn't even use a sign so it can be %x anyways. This seems to be just a bug. The original db_printf() prints -1 as -1 for %z format. From db_output.c rev.1.1: %%% case 'z': ul = lflag ? va_arg(ap, u_long) : va_arg(ap, u_int); if ((long)ul < 0) { neg = 1; ul = -(long)ul; } base = 16; goto number; %%% I "restored" this wrong in subr_prf.c 1.47. 
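The behaviour in the rev.1.1 excerpt (hex magnitude with a leading minus sign for negative values) can be sketched in portable C as follows. This is an illustration under stated assumptions, not the db_output.c or subr_prf.c code: fmt_signed_hex is a hypothetical helper, it uses the standard snprintf rather than the kernel formatter, and it emits no 0x prefix.

```c
#include <stdio.h>

/*
 * Format v as signed hexadecimal: the magnitude in hex, preceded by
 * '-' when v is negative.  Hypothetical helper, not kernel code.
 */
static int fmt_signed_hex(char *buf, size_t len, long v)
{
    /* Negate in unsigned arithmetic so the most negative long is safe. */
    unsigned long mag = (v < 0) ? 0UL - (unsigned long)v : (unsigned long)v;

    return snprintf(buf, len, "%s%lx", (v < 0) ? "-" : "", mag);
}
```

With this helper, -1 formats as "-1" and -16 as "-10", whereas a plain %lx of the same values would print the two's-complement bit pattern.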
Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 13:45: 7 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7FEF737B401 for ; Wed, 9 Oct 2002 13:45:05 -0700 (PDT) Received: from mail.speakeasy.net (mail17.speakeasy.net [216.254.0.217]) by mx1.FreeBSD.org (Postfix) with ESMTP id 14F8643E75 for ; Wed, 9 Oct 2002 13:45:05 -0700 (PDT) (envelope-from jhb@FreeBSD.org) Received: (qmail 17340 invoked from network); 9 Oct 2002 20:45:05 -0000 Received: from unknown (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) by mail17.speakeasy.net (qmail-ldap-1.03) with DES-CBC3-SHA encrypted SMTP for ; 9 Oct 2002 20:45:05 -0000 Received: from laptop.baldwin.cx (gw1.twc.weather.com [216.133.140.1]) by server.baldwin.cx (8.12.6/8.12.6) with ESMTP id g99Kj2n5012116; Wed, 9 Oct 2002 16:45:02 -0400 (EDT) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.5.2 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <20021010062921.T6622-100000@gamplex.bde.org> Date: Wed, 09 Oct 2002 16:45:06 -0400 (EDT) From: John Baldwin To: Bruce Evans Subject: Re: lp64 vs lp32 printf Cc: freebsd-arch@FreeBSD.ORG, Peter Wemm , Andrew Gallatin Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 09-Oct-2002 Bruce Evans wrote: > On Wed, 9 Oct 2002, John Baldwin wrote: > >> ... Hmm, I was incorrect >> (my grep re was busted) and %z is actually used in two places in ddb. 
>> We can either pick a letter to use or just use %x with explicit signs >> in those two cases: >> >> ddb/db_examine.c: db_printf("%-*lz", width, (long)value); >> ddb/db_examine.c: db_printf("%8lz", (long)addr); >> >> Hmm, the second case doesn't even use a sign so it can be %x anyways. > > This seems to be just a bug. The original db_printf() prints -1 as -1 > for %z format. From db_output.c rev.1.1: So should %z force sign on? --- subr_prf.c 28 Sep 2002 21:34:31 -0000 1.88 +++ subr_prf.c 9 Oct 2002 20:41:53 -0000 @@ -664,8 +664,8 @@ goto handle_nosign; case 'z': base = 16; - if (sign) - goto handle_sign; + sign = 1; + goto handle_sign; handle_nosign: sign = 0; if (jflag) -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 13:55: 5 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0FC2C37B401; Wed, 9 Oct 2002 13:55:04 -0700 (PDT) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id E34BD43E6A; Wed, 9 Oct 2002 13:55:02 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id GAA25112; Thu, 10 Oct 2002 06:55:00 +1000 Date: Thu, 10 Oct 2002 07:05:07 +1000 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: John Baldwin Cc: freebsd-arch@FreeBSD.ORG, Peter Wemm , Andrew Gallatin Subject: Re: lp64 vs lp32 printf In-Reply-To: Message-ID: <20021010064644.C6622-100000@gamplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: 
X-Loop: FreeBSD.ORG On Wed, 9 Oct 2002, John Baldwin wrote: > > ddb/db_examine.c: db_printf("%-*lz", width, (long)value); > > ddb/db_examine.c: db_printf("%8lz", (long)addr); > > > > Hmm, the second case doesn't even use a sign so it can be %x anyways. > And the first one doesn't use the '+' modifier either so it can just be > converted to use '%x' as well. Hmm, more likely is that probably > these two places should be using '+z' instead of just 'z'. So, > maybe 'y' instead of 'z'? '+' doesn't work normally in the kernel. It is a no-op before %d and it should be a similar no-op before %z (unless %+d is fixed or %z is renamed). However, it currently has the effect of unbreaking %z. s/z/y/ seems reasonable. Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 14:23: 3 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1CF0137B401; Wed, 9 Oct 2002 14:23:02 -0700 (PDT) Received: from dragon.nuxi.com (trang.nuxi.com [66.92.13.169]) by mx1.FreeBSD.org (Postfix) with ESMTP id 907BE43E3B; Wed, 9 Oct 2002 14:23:01 -0700 (PDT) (envelope-from obrien@NUXI.com) Received: from dragon.nuxi.com (obrien@localhost [127.0.0.1]) by dragon.nuxi.com (8.12.6/8.12.2) with ESMTP id g99LMw5b065548; Wed, 9 Oct 2002 14:22:58 -0700 (PDT) (envelope-from obrien@dragon.nuxi.com) Received: (from obrien@localhost) by dragon.nuxi.com (8.12.6/8.12.5/Submit) id g99LMw9f065547; Wed, 9 Oct 2002 14:22:58 -0700 (PDT) Date: Wed, 9 Oct 2002 14:22:58 -0700 From: "David O'Brien" To: Peter Wemm Cc: Bruce Evans , Mike Barcroft , Andrew Gallatin , freebsd-arch@FreeBSD.ORG Subject: Re: lp64 vs lp32 printf Message-ID: <20021009212258.GA65457@dragon.nuxi.com> Reply-To: obrien@FreeBSD.ORG References: <20021009161756.E4040-100000@gamplex.bde.org> <20021009173106.9D1862A88D@canning.wemm.org> Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20021009173106.9D1862A88D@canning.wemm.org> User-Agent: Mutt/1.4i X-Operating-System: FreeBSD 5.0-CURRENT Organization: The NUXI BSD Group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A X-Pgp-Rsa-Keyid: 1024/34F9F9D5 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, Oct 09, 2002 at 10:31:06AM -0700, Peter Wemm wrote: > Bruce Evans wrote: > > On Tue, 8 Oct 2002, Mike Barcroft wrote: > > > > > Andrew Gallatin writes: > > > > > > > > What's the accepted way to printf something (like sizeof()) which > > > > boils down to "unsigned int" on x86 and "unsigned long" on the LP64 > > > > platforms? > > > > > > In userland you can use %z for printing size_t's. In the kernel, > > > casting to intmax_t/uintmax_t and using %j is correct. > > > > Um, using intmax_t to print size_t's would be incorrect, since it is > > signed. Using uintmax_t would be bloat. Very few typedefed types > > need the full bloat of [u]intmax_t, and size_t is unlikely to become > > one of them before casting it to uintmax_t to print it becomes a style > > bug in the kernel too (when %z is implemented). > > Bring it on! The sooner %z gets here the better. The only problem is that > gcc has been taught that %z means something different in the kernel. :-( Yeah, I can fix that along with fixing %z. 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 15: 5:29 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9DE2737B406; Wed, 9 Oct 2002 15:05:24 -0700 (PDT) Received: from dragon.nuxi.com (trang.nuxi.com [66.92.13.169]) by mx1.FreeBSD.org (Postfix) with ESMTP id 17C1D43E4A; Wed, 9 Oct 2002 15:05:24 -0700 (PDT) (envelope-from obrien@NUXI.com) Received: from dragon.nuxi.com (obrien@localhost [127.0.0.1]) by dragon.nuxi.com (8.12.6/8.12.2) with ESMTP id g99M5M5b065967; Wed, 9 Oct 2002 15:05:22 -0700 (PDT) (envelope-from obrien@dragon.nuxi.com) Received: (from obrien@localhost) by dragon.nuxi.com (8.12.6/8.12.5/Submit) id g99M5Mve065966; Wed, 9 Oct 2002 15:05:22 -0700 (PDT) Date: Wed, 9 Oct 2002 15:05:22 -0700 From: "David O'Brien" To: John Baldwin Cc: Mike Barcroft , freebsd-arch@FreeBSD.org, Andrew Gallatin Subject: Re: lp64 vs lp32 printf Message-ID: <20021009220522.GA65943@dragon.nuxi.com> Reply-To: obrien@FreeBSD.org References: <20021008203120.K97120@espresso.q9media.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i X-Operating-System: FreeBSD 5.0-CURRENT Organization: The NUXI BSD Group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A X-Pgp-Rsa-Keyid: 1024/34F9F9D5 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG How is this patch? 
Index: contrib/gcc/c-format.c =================================================================== RCS file: /home/ncvs/src/contrib/gcc/c-format.c,v retrieving revision 1.5 diff -u -r1.5 c-format.c --- contrib/gcc/c-format.c 12 Jul 2002 00:49:52 -0000 1.5 +++ contrib/gcc/c-format.c 9 Oct 2002 21:52:40 -0000 @@ -795,10 +795,12 @@ The format %D provides a hexdump given a pointer and separator string: ("%6D", ptr, ":") -> XX:XX:XX:XX:XX:XX ("%*D", len, ptr, " ") -> XX XX XX XX ... + The format %H is a version of %x that allows for a sign + (e.g. -0x10 instead of 0xfffffff0, or +0x10). */ { "D", 1, STD_EXT, { T89_C, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "-wp", "cR" }, { "b", 1, STD_EXT, { T89_C, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "-wp", "" }, - { "rz", 0, STD_EXT, { T89_I, BADLEN, BADLEN, T89_L, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "-wp0 +#", "i" }, + { "rH", 0, STD_EXT, { T89_I, BADLEN, BADLEN, T89_L, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "-wp0 +#", "i" }, { NULL, 0, 0, NOLENGTHS, NULL, NULL } }; Index: share/man/man9/printf.9 =================================================================== RCS file: /home/ncvs/src/share/man/man9/printf.9,v retrieving revision 1.3 diff -u -r1.3 printf.9 --- share/man/man9/printf.9 1 Oct 2001 16:09:25 -0000 1.3 +++ share/man/man9/printf.9 9 Oct 2002 21:55:51 -0000 @@ -66,7 +66,7 @@ .Xr printf 3 . However, .Xr printf 9 -adds two other conversion specifiers. +adds four conversion specifiers. .Pp The .Cm \&%b @@ -90,6 +90,10 @@ for the last bit identifier. .Pp The +.Cm \&%r +identifier is undocumented. +.Pp +The .Cm \&%D identifier is meant to assist in hexdumps. It requires two arguments: a @@ -102,6 +106,12 @@ The string is used as a delimiter between individual bytes. If present, a width directive will specify the number of bytes to display. By default, 16 bytes of data are output. 
+.Pp +The +.Cm \&%H +identifier is a version of +.Cm \&%x +that allows for a sign (e.g. -0x10 instead of 0xfffffff0, or +0x10). .Sh RETURN VALUES The .Fn printf Index: sys/ddb/db_examine.c =================================================================== RCS file: /home/ncvs/src/sys/ddb/db_examine.c,v retrieving revision 1.29 diff -u -r1.29 db_examine.c --- sys/ddb/db_examine.c 25 Jun 2002 15:59:24 -0000 1.29 +++ sys/ddb/db_examine.c 9 Oct 2002 21:49:55 -0000 @@ -129,7 +129,7 @@ case 'z': /* signed hex */ value = db_get_value(addr, size, TRUE); addr += size; - db_printf("%-*lz", width, (long)value); + db_printf("%-*lH", width, (long)value); break; case 'd': /* signed decimal */ value = db_get_value(addr, size, TRUE); @@ -212,8 +212,8 @@ case 'x': db_printf("%8lx", (unsigned long)addr); break; - case 'z': - db_printf("%8lz", (long)addr); + case 'H': + db_printf("%8lH", (long)addr); break; case 'd': db_printf("%11ld", (long)addr); Index: sys/kern/subr_prf.c =================================================================== RCS file: /home/ncvs/src/sys/kern/subr_prf.c,v retrieving revision 1.88 diff -u -r1.88 subr_prf.c --- sys/kern/subr_prf.c 28 Sep 2002 21:34:31 -0000 1.88 +++ sys/kern/subr_prf.c 9 Oct 2002 21:49:08 -0000 @@ -662,7 +662,7 @@ case 'X': base = 16; goto handle_nosign; - case 'z': + case 'H': base = 16; if (sign) goto handle_sign; To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 15:33:35 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7607037B401 for ; Wed, 9 Oct 2002 15:33:34 -0700 (PDT) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3B7EB43E3B for ; Wed, 9 Oct 2002 15:33:34 -0700 (PDT) (envelope-from baka@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1921) id 0EF94AE28D; Wed, 9 
Oct 2002 15:33:34 -0700 (PDT) Date: Wed, 9 Oct 2002 15:33:34 -0700 From: Jon Mini To: Julian Elischer Cc: Igor Sysoev , arch@FreeBSD.ORG Subject: Re: Scheduler framework. Message-ID: <20021009223333.GH30246@elvis.mu.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Julian Elischer [julian@elischer.org] wrote : > > Sorry, where did this second thread come from ? > > there is a thread_allocator that allocates threads on demand. > > Actually the process has a couple of spare threads "up its sleeve" > so it doesn't have to go to the thread allocator every time.. I know Jeff asked in an earlier message "why do this? Isn't that why we have UMA?" The short answer is that we can't allocate from within the scheduler, because if a page is allocated from the VM to fill another slab, we run into locking problems. 
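The spare-thread idea described above can be sketched roughly as follows. This is only an illustration of the pattern (refill a small per-process cache where allocation is safe, and take from it without allocating where it is not); it is not the FreeBSD KSE code. The names thread_stub, proc_stub, spare_refill and spare_take are hypothetical, malloc() stands in for the real thread allocator, and the cache size of two is an assumption based on "a couple of spare threads".

```c
#include <stdlib.h>

#define NSPARE	2	/* assumed size of the per-process cache */

struct thread_stub {		/* stand-in for the real thread structure */
	int id;
};

struct proc_stub {		/* stand-in for the real process structure */
	struct thread_stub *spare[NSPARE];
	int nspare;
};

/*
 * Top up the cache.  Meant to be called only where allocation is
 * safe, i.e. outside the scheduler.  Returns the number of spares
 * now held.
 */
static int
spare_refill(struct proc_stub *p)
{
	while (p->nspare < NSPARE) {
		struct thread_stub *td = malloc(sizeof(*td));

		if (td == NULL)
			break;
		td->id = 0;
		p->spare[p->nspare++] = td;
	}
	return (p->nspare);
}

/*
 * Hand out a cached thread.  Never allocates, so it can be called
 * from scheduler context; returns NULL if the cache is empty.
 */
static struct thread_stub *
spare_take(struct proc_stub *p)
{
	if (p->nspare == 0)
		return (NULL);
	return (p->spare[--p->nspare]);
}
```

The design point is that the allocating path and the scheduler path never overlap: the scheduler only ever pops from memory that was set aside earlier.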
-- Jonathan Mini http://www.freebsd.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 16:46:29 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2C6DB37B401 for ; Wed, 9 Oct 2002 16:46:27 -0700 (PDT) Received: from softweyr.com (softweyr.com [65.88.244.127]) by mx1.FreeBSD.org (Postfix) with ESMTP id 49B1243E7B for ; Wed, 9 Oct 2002 16:46:26 -0700 (PDT) (envelope-from wes@softweyr.com) Received: from nextgig-9.access.nethere.net ([66.63.140.201] helo=softweyr.com) by softweyr.com with esmtp (Exim 3.35 #1) id 17zQWe-000LRA-00; Wed, 09 Oct 2002 17:46:00 -0600 Message-ID: <3DA4C271.37AACAA3@softweyr.com> Date: Wed, 09 Oct 2002 16:57:37 -0700 From: Wes Peters Organization: Softweyr LLC X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.2 i386) X-Accept-Language: en MIME-Version: 1.0 To: Matthew Dillon Cc: "Vladimir B. Grebenschikov" , Nate Lawson , arch@FreeBSD.ORG Subject: Re: Database indexes and ram (was Re: using mem above 4Gb was:swapon some regular file) References: <1034105993.913.1.camel@vbook.express.ru> <200210082015.g98KFFrq084625@apollo.backplane.com> <1034109053.913.7.camel@vbook.express.ru> <200210082051.g98KpjU1084793@apollo.backplane.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Matthew Dillon wrote: > > :Mattew, please look at my initial posting. My idea is to extend ram > :available for storing such thing as index above 4Gb (actually about 3Gb) > :limit, if there more physical ram. Current mmap(read vm) implementation > :will map/cache only in memory below 4Gb not depending of amount of > :physical ram. > > Well, this has been discussed before. 
The issue with accessing ram > over 4GB, apart from the fact that the page tables double in size (you > have to use 64 bit pte's instead of 32 bit pte's) is that DMAing to/from > memory above 4GB can be rather tricky. This creates all sorts of > problems including not necessarily being able to read() or write() > above the 4G mark (in regards to physical ram) without a lot of mess > in the OS .. bounce buffers redux, so to speak. Linux solved this problem by refusing to do it. The candidates for DMA transfers include skbufs and buffers from the disk buffer pool, both of which are allocated from the lowest 4GB of physical ram when using PAE mode. > So while it would be possible to use such memory as unswappable, unIOable > anonymous-only memory, such use would be fairly limited and might not > be worth implementing for a 32 bit platform. At that point you might > as well move to a 64 bit platform. Nah, it works great. Each process gets 3GB process virtual address and 1GB kernel virtual address and all of the program text+data can be located anywhere in physical ram. For things like databases that need large indices in memory, this is a big win. > It also might be more effective to spend that money on more ram for > the RAID system backing the database rather than trying to bump the > PC past the 4G mark, or spend that money on purchasing a second > server and distributing the load across the two servers. The types Neither will help you with index sizes if you're using really honking big tables, where the index just won't fit. We actually use multiple processes to hold cached data, including indexes, in order to make use of the extra RAM. I should shut up now. ;^) > of accesses to the index that might result in cacheable table data are > also the types of accesses to the index that will likely result in > cacheable index data. 
Using the same argument, the types of accesses > that might result in an uncacheable index would also likely result in > uncacheable table data which means you are going to run up against > seek/read problems on the table data, making it more worthwhile to > spend the money on beefing up the storage subsystem. That's only true if your database server is I/O bound. Depending on your job mix, this may or may not be the problem. -- "Where am I, and what am I doing in this handbasket?" Wes Peters Softweyr LLC wes@softweyr.com http://softweyr.com/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 16:48:37 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 43D0437B406 for ; Wed, 9 Oct 2002 16:48:36 -0700 (PDT) Received: from softweyr.com (softweyr.com [65.88.244.127]) by mx1.FreeBSD.org (Postfix) with ESMTP id 508BF43E7B for ; Wed, 9 Oct 2002 16:48:35 -0700 (PDT) (envelope-from wes@softweyr.com) Received: from nextgig-8.access.nethere.net ([66.63.140.200] helo=softweyr.com) by softweyr.com with esmtp (Exim 3.35 #1) id 17zQYs-000LRh-00; Wed, 09 Oct 2002 17:48:19 -0600 Message-ID: <3DA4C2F1.74450081@softweyr.com> Date: Wed, 09 Oct 2002 16:59:45 -0700 From: Wes Peters Organization: Softweyr LLC X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.2 i386) X-Accept-Language: en MIME-Version: 1.0 To: Terry Lambert Cc: Nate Lawson , "Vladimir B. 
Grebenschikov" , arch@FreeBSD.org Subject: Re: using mem above 4Gb was: swapon some regular file References: <3DA35D58.B1B5D78D@mindspring.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Terry Lambert wrote: > > PSE-36 and PAE give you access to a 36 bit address space. But you > are still limited to a 32 bit *linear* address space. > > More RAM in a 32 bit machine, even if you can wave the appropriate > entrails over the keyboard so that it's accessible to the OS, will > *NOT* increase the linear address space. > > IMO, if you want a larger linear address space, instead of pretending > you have one, buy yourself an IA64 instead. Or an Alpha, or a SPARC64, or a MIPS64, etc. But they all seem to cost more than a PIII solution, except perhaps a Netra and you can't cram enough RAM in that to make a difference. -- "Where am I, and what am I doing in this handbasket?" 
Wes Peters Softweyr LLC wes@softweyr.com http://softweyr.com/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 17:15:55 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BBDFA37B401 for ; Wed, 9 Oct 2002 17:15:53 -0700 (PDT) Received: from flamingo.mail.pas.earthlink.net (flamingo.mail.pas.earthlink.net [207.217.120.232]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0BF2B43E42 for ; Wed, 9 Oct 2002 17:15:53 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0210.cvx21-bradley.dialup.earthlink.net ([209.179.192.210] helo=mindspring.com) by flamingo.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17zQz9-0003LO-00; Wed, 09 Oct 2002 17:15:27 -0700 Message-ID: <3DA4C632.325F2EBE@mindspring.com> Date: Wed, 09 Oct 2002 17:13:38 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Wes Peters Cc: Matthew Dillon , "Vladimir B. Grebenschikov" , Nate Lawson , arch@FreeBSD.ORG Subject: Re: Database indexes and ram (was Re: using mem above 4Gb was:swapon some regular file) References: <1034105993.913.1.camel@vbook.express.ru> <200210082015.g98KFFrq084625@apollo.backplane.com> <1034109053.913.7.camel@vbook.express.ru> <200210082051.g98KpjU1084793@apollo.backplane.com> <3DA4C271.37AACAA3@softweyr.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Wes Peters wrote: > Linux solved this problem by refusing to do it. The candidates for DMA > transfers include skbufs and buffers from the disk buffer pool, both of > which are allocated from the lowest 4GB of physical ram when using PAE > mode. 
Yes; this is the "Fast RAM/bounce buffer" approach I mentioned already. Linux has an advantage here, in that they already run software virtualization on the VM system, in order to try to be architecture independent. The result is overhead in reverse lookups that has only recently been fixed (and you need patches to use it). FreeBSD would eat more overhead doing this, where it sort of "fell out" of the extra overhead they already eat in the Linux case. > > So while it would be possible to use such memory as unswappable, unIOable > > anonymous-only memory, such use would be fairly limited and might not > > be worth implementing for a 32 bit platform. At that point you might > > as well move to a 64 bit platform. > > Nah, it works great. Each process gets 3GB process virtual address and > 1GB kernel virtual address and all of the program text+data can be located > anywhere in physical ram. For things like databases that need large > indices in memory, this is a big win. This, I don't get: I don't understand how they can live with only 1G of KVA space. I guess they are expecting a small number of net connections... > > It also might be more effective to spend that money on more ram for > > the RAID system backing the database rather than trying to bump the > > PC past the 4G mark, or spend that money on purchasing a second > > server and distributing the load across the two servers. The types > > Neither will help you with index sizes if you're using really honking big > tables, where the index just won't fit. We actually use multiple processes > to hold cached data, including indexes, in order to make use of the extra > RAM. I should shut up now. ;^) ...or you'll have to kill you. 8-) 8-). > > of accesses to the index that might result in cacheable table data are > > also the types of accesses to the index that will likely result in > > cacheable index data. 
Using the same argument, the types of accesses > > that might result in an uncacheable index would also likely result in > > uncacheable table data which means you are going to run up against > > seek/read problems on the table data, making it more worthwhile to > > spend the money on beefing up the storage subsystem. > > That's only true if your database server is I/O bound. Depending on your > job mix, this may or may not be the problem. Likely, it will not be true, for any very large database, particularly if you end up doing a reasonable number of joins. Hardly anybody goes past 3rd normal form, and some people never even get that far. 8-). -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 17:23: 7 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E54BA37B401 for ; Wed, 9 Oct 2002 17:23:05 -0700 (PDT) Received: from flamingo.mail.pas.earthlink.net (flamingo.mail.pas.earthlink.net [207.217.120.232]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8145843E4A for ; Wed, 9 Oct 2002 17:23:05 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0210.cvx21-bradley.dialup.earthlink.net ([209.179.192.210] helo=mindspring.com) by flamingo.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17zR6H-0006DB-00; Wed, 09 Oct 2002 17:22:49 -0700 Message-ID: <3DA4C7EC.F749B803@mindspring.com> Date: Wed, 09 Oct 2002 17:21:00 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Wes Peters Cc: Nate Lawson , "Vladimir B. 
Grebenschikov" , arch@FreeBSD.org Subject: Re: using mem above 4Gb was: swapon some regular file References: <3DA35D58.B1B5D78D@mindspring.com> <3DA4C2F1.74450081@softweyr.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Wes Peters wrote: > Terry Lambert wrote: > > IMO, if you want a larger linear address space, instead of pretending > > you have one, buy yourself an IA64 instead. > > Or an Alpha, or a SPARC64, or a MIPS64, etc. But they all seem to cost > more than a PIII solution, except perhaps a Netra and you can't cram enough > RAM in that to make a difference. People always say this, but... the Alpha is unsuitable, because FreeBSD on the Alpha doesn't support more than 2G of physical RAM, because the drivers choke. The MIPS is not an option, because though there is a FreeBSD port, as reported at last year's "developer summit" at Usenix, it was never integrated into the source tree. The SPARC64 isn't a mainstream port yet (I know this because my patch to kdenetwork3 was adulterated to be "if Alpha", when it should have been adulterated to "if !32_bit_x86", if at all, because the SPARC64 and IA64 GOT will go over 64K, as well... the problem is the 64bit vs. 32bit values, not symbol names, etc., that causes the table size to be bigger there). Right now, IA64 is about the only supported 64 bit architecture that gives you the real benefit of a 64 bit address space; I guess you can mmap a lot of stuff on the Alpha, too, up to your KVA mapping limit, but that's not a win for this application. 
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 18:17:21 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 011AC37B401 for ; Wed, 9 Oct 2002 18:17:21 -0700 (PDT) Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2FC2243E42 for ; Wed, 9 Oct 2002 18:17:20 -0700 (PDT) (envelope-from jeff@freebsd.org) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g9A1HIF13420 for ; Wed, 9 Oct 2002 21:17:18 -0400 (EDT) (envelope-from jeff@freebsd.org) X-Authentication-Warning: mail.chesapeake.net: jroberson owned process doing -bs Date: Wed, 9 Oct 2002 21:17:18 -0400 (EDT) From: Jeff Roberson X-X-Sender: jroberson@mail.chesapeake.net To: arch@freebsd.org Subject: Scheduler patch, ready for commit. Message-ID: <20021009211321.M23516-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG I haven't heard any objections to the goals of this patch. I have cleaned it up and readied it for commit. This step is important so that I can stop manually merging in new scheduler changes and get on with the new scheduler. This patch does not change any functionality in the current system. It is only a code reorg. As always any comments are welcome. 
The patch is available at http://www.chesapeake.net/~jroberson/sched.patch Thanks, Jeff To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 19:35: 4 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0BBC037B401; Wed, 9 Oct 2002 19:35:02 -0700 (PDT) Received: from carp.icir.org (carp.icir.org [192.150.187.71]) by mx1.FreeBSD.org (Postfix) with ESMTP id A6D3943E6A; Wed, 9 Oct 2002 19:35:01 -0700 (PDT) (envelope-from rizzo@carp.icir.org) Received: from carp.icir.org (localhost [127.0.0.1]) by carp.icir.org (8.12.3/8.12.3) with ESMTP id g9A2Z1O2055798; Wed, 9 Oct 2002 19:35:01 -0700 (PDT) (envelope-from rizzo@carp.icir.org) Received: (from rizzo@localhost) by carp.icir.org (8.12.3/8.12.3/Submit) id g9A2Z1aS055797; Wed, 9 Oct 2002 19:35:01 -0700 (PDT) (envelope-from rizzo) Date: Wed, 9 Oct 2002 19:35:01 -0700 From: Luigi Rizzo To: Jeff Roberson Cc: arch@FreeBSD.ORG Subject: Re: Scheduler patch, ready for commit. Message-ID: <20021009193501.A55534@carp.icir.org> References: <20021009211321.M23516-100000@mail.chesapeake.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20021009211321.M23516-100000@mail.chesapeake.net>; from jeff@FreeBSD.ORG on Wed, Oct 09, 2002 at 09:17:18PM -0400 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, Oct 09, 2002 at 09:17:18PM -0400, Jeff Roberson wrote: > I haven't heard any objections to the goals of this patch. I have cleaned > it up and readied it for commit. This step is important so that I can well, you said you posted it just to get feedback, and now in 24 hours you declare it "ready for commit". A bit rushing, aren't you! 
I totally agree on the goals of the change (to abstract the scheduler from the base system, generalise it, etc.), but it seems to me that the specific details still need some cleanup before committing. And, well, maybe leave a little bit more time for people to provide feedback! So to come to the specific points: * there is one important API function which in my opinion is missing: -stable has a function, curpriority_cmp, which compares the priority of a currently running "thing" (process/thread) with that of a newly awakened one, and decides who has more right to get the CPU. In -current this is done inline (by comparing priorities), but this is very specific to the scheduler used there -- e.g. it doesn't work for the scheduler used in -stable (where we have 3 different classes for rtprio, normal and idlepri processes), and it does not work in general in cases where you could have different metrics to decide who is going to proceed. * Another API which should be made generic is forward_roundrobin(). I believe the purpose of a "generic" version of this function is to dispatch a generic timeout to the appropriate CPU(s). In FreeBSD's scheduler this timeout happens to be the roundrobin timeout, hence the name, but other schedulers (e.g. the one we wrote) have different timeout routines, and the dispatching requirements vary (e.g. could need to go to a single CPU as opposed to all cpus). On the same grounds, sched_rr_interval() is not generic, but specific to the FreeBSD scheduler. Not all schedulers have the concept of a roundrobin interval. * The other thing that i would really like to see is to call the scheduler functions through function pointers, so life is a lot easier when we decide to enable kldloading of schedulers (or having multiple alternative ones). The way to implement it is trivial, see how we did in http://info.iet.unipi.it/~luigi/ps_sched.20020719a.diff. 
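As a rough illustration of the last point, calling the scheduler through function pointers could look something like the sketch below. It is not taken from Jeff's patch or from the diff linked above; every name in it (thread_stub, sched_ops, sched_register, sched_should_preempt, simple_sched) is hypothetical, and the "lower number wins" priority rule is just an assumed example policy. It also shows where a generic priority comparison in the spirit of curpriority_cmp could live.

```c
#include <stddef.h>

struct thread_stub {		/* stand-in for struct proc / struct thread */
	int priority;		/* assumed: lower value means more urgent */
};

/* Hypothetical table of entry points that a scheduler would provide. */
struct sched_ops {
	const char *name;
	/* Returns > 0 if 'woken' has more right to the CPU than 'cur'. */
	int	(*prio_cmp)(const struct thread_stub *cur,
		    const struct thread_stub *woken);
	/* Per-tick hook, called on behalf of the running thread. */
	void	(*clock_tick)(struct thread_stub *cur);
};

static const struct sched_ops *cur_sched;	/* the active scheduler */

/* Install a scheduler; rejects incomplete tables.  0 on success. */
static int
sched_register(const struct sched_ops *ops)
{
	if (ops == NULL || ops->prio_cmp == NULL || ops->clock_tick == NULL)
		return (-1);
	cur_sched = ops;
	return (0);
}

/* Generic code asks the active scheduler instead of comparing inline. */
static int
sched_should_preempt(const struct thread_stub *cur,
    const struct thread_stub *woken)
{
	return (cur_sched->prio_cmp(cur, woken) > 0);
}

/* Example policy: a numerically lower priority value wins. */
static int
simple_prio_cmp(const struct thread_stub *cur,
    const struct thread_stub *woken)
{
	return (cur->priority - woken->priority);
}

/* Example tick handler: make the running thread slightly less urgent. */
static void
simple_clock_tick(struct thread_stub *cur)
{
	cur->priority++;
}

static const struct sched_ops simple_sched = {
	"simple", simple_prio_cmp, simple_clock_tick
};
```

With a table like this, swapping schedulers means registering a different sched_ops, at the cost of an indirect call on each hook, which is the overhead Jeff's reply is wary of.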
Oh, another thing: struct proc (or struct thread, whatever it is in -current)) contains some scheduler-specific information, such as the priority fields. Other schedulers might need different data structures to work on. Given that this structure is something we should not change lightly or frequently, it would be a good idea to provide one (1) field (e.g. a void *) to be used by the scheduler to reach a "scheduler extension" block from it. The way we provided this extensibility in our scheduler framework (in -stable) was to use some padding bytes in struct proc to store an integer which is an index in an array of extended process descriptors, but that was just a hack motivated by the need of not changing "struct proc". Finally, thanks for working on this stuff! cheers luigi ----------------------------------+----------------------------------------- Luigi RIZZO, luigi@iet.unipi.it . ICSI (on leave from Univ. di Pisa) http://www.iet.unipi.it/~luigi/ . 1947 Center St, Berkeley CA 94704 Phone: (510) 666 2988 ----------------------------------+----------------------------------------- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 21:11:16 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D52C837B401 for ; Wed, 9 Oct 2002 21:11:13 -0700 (PDT) Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id 36CF543E6E for ; Wed, 9 Oct 2002 21:11:13 -0700 (PDT) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g9A4BBv81584; Thu, 10 Oct 2002 00:11:11 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Thu, 10 Oct 2002 00:11:11 -0400 (EDT) From: Jeff Roberson To: Luigi Rizzo Cc: arch@FreeBSD.ORG Subject: Re: Scheduler patch, ready 
for commit. In-Reply-To: <20021009193501.A55534@carp.icir.org> Message-ID: <20021009234324.F23516-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 9 Oct 2002, Luigi Rizzo wrote: > > well, you said you posted it just to get feedback, and now in 24 hours > you declare it "ready for commit". A bit rushing, aren't you! Perhaps I wasn't clear. I was looking for an indication of whether or not this was something people were interested in having in 5.0. I'm not going to rush the commit but by declaring it to be of a sufficient quality for committing I was hoping to scare more people into reviewing it. ;-) > So to come to the specific points: > > * there is one important API function which in my opinion is > missing: -stable has a function, curpriority_cmp, which compares > the priority of a currently running "thing" (process/thread) > with that of a newly awakened one, and decides who has more right > to get the CPU. This decision is currently only made within code that has intimate knowledge of the scheduler, i.e., within sched_4bsd.c. There is currently no need for an externally visible api. sched_wakeup() handles this specific case. The only other place that knows a significant amount about priorities is kern_mutex.c. It calls a helper function to adjust priorities though. > > * Another API which should be made generic is forward_roundrobin(). > I believe the purpose of a "generic" version of this function > is to dispatch a generic timeout to the appropriate CPU(s). In > FreeBSD's scheduler this timeout happens to be the roundrobin > timeout, hence the name, but other schedulers (e.g. the one we > wrote) have different timeout routines, and the dispatching > requirements vary (e.g. could need to go to a single CPU as > opposed to all cpus). 
FreeBSD already has generic timeout handling code. Schedulers are free to use it however they like. This timeout is only enabled in sched_4bsd.c right now. It is only using the helper function from subr_smp.c. I think good arguments could be made for putting this function in either file so I'm leaving it where it is. I only assume that schedulers will implement a handler for hardclock. So that's all I have hooks for. > > On the same grounds, sched_rr_interval() is not generic, > but specific to the FreeBSD scheduler. Not all schedulers > have the concept of a roundrobin interval. I agree. There is one bit of code that depends on this that I'm not intimately familiar with (specifically, sys/posix4/ksched.c). My intention here was for the individual schedulers to provide a good approximation of this if they didn't support it exactly. That seems necessary for this code to work at all. > > * The other thing that i would really like to see is to > call the scheduler functions through function pointers, so > life is a lot easier when we decide to enable kldloading of > schedulers (or having multiple alternative ones). > The way to implement it is trivial, see how we did in > http://info.iet.unipi.it/~luigi/ps_sched.20020719a.diff. This is not a priority for me at the moment. I don't think it's important for now. At least not for my purposes. I'm not prepared to deal with the overhead or the complications that arise from runtime scheduler selection. I'd just like to have an alternate scheduler ready for 5.0. This seems to be the cleanest path to that goal. I consider this an intermediate stage on the way towards a pluggable scheduler interface. > > Oh, another thing: struct proc (or struct thread, whatever it is > in -current)) contains some scheduler-specific information, such > as the priority fields. Other schedulers might need different data > structures to work on. 
Given that this structure is something we > should not change lightly or frequently, it would be a good idea > to provide one (1) field (e.g. a void *) to be used by the scheduler > to reach a "scheduler extension" block from it. The way we provided > this extensibility in our scheduler framework (in -stable) was to > use some padding bytes in struct proc to store an integer which is > an index in an array of extended process descriptors, but that was > just a hack motivated by the need of not changing "struct proc". > Yes, this whole problem is quite ugly. Perhaps I should be clear about my intentions. I'm not attempting to create the perfect scheduler abstraction. I'm trying to get us close enough so that I can continue my own work on a new scheduler. I believe that what I have done so far can provide a good foundation for abstracted, loadable, dynamic scheduling in the future. My real goal, however, is just to have a better scheduler for 5.0. For me that is more important than a loadable scheduler infrastructure. > Finally, thanks for working on this stuff! > Thank you very much for the feedback! Perhaps at the next BSDcon we can sit down and devise a good plan for a full-featured framework. 
Thanks,
Jeff

From owner-freebsd-arch Wed Oct 9 21:46:13 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3461937B401 for ; Wed, 9 Oct 2002 21:46:12 -0700 (PDT)
Received: from sccrmhc01.attbi.com (sccrmhc01.attbi.com [204.127.202.61]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7879343E4A for ; Wed, 9 Oct 2002 21:46:11 -0700 (PDT) (envelope-from bmah@employees.org)
Received: from bmah.dyndns.org ([12.233.149.189]) by sccrmhc01.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021010044606.DOMP29655.sccrmhc01.attbi.com@bmah.dyndns.org>; Thu, 10 Oct 2002 04:46:06 +0000
Received: from intruder.bmah.org (localhost [IPv6:::1]) by bmah.dyndns.org (8.12.6/8.12.6) with ESMTP id g9A4k68W026652; Wed, 9 Oct 2002 21:46:06 -0700 (PDT) (envelope-from bmah@intruder.bmah.org)
Received: (from bmah@localhost) by intruder.bmah.org (8.12.6/8.12.6/Submit) id g9A4k6kx026651; Wed, 9 Oct 2002 21:46:06 -0700 (PDT)
Message-Id: <200210100446.g9A4k6kx026651@intruder.bmah.org>
X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4
To: Jeff Roberson
Cc: Luigi Rizzo , arch@FreeBSD.ORG
Subject: Re: Scheduler patch, ready for commit.
In-Reply-To: <20021009234324.F23516-100000@mail.chesapeake.net>
References: <20021009234324.F23516-100000@mail.chesapeake.net>
Comments: In-reply-to Jeff Roberson message dated "Thu, 10 Oct 2002 00:11:11 -0400."
From: "Bruce A.
Mah"
Reply-To: bmah@FreeBSD.ORG
X-Face: g~c`.{#4q0"(V*b#g[i~rXgm*w;:nMfz%_RZLma)UgGN&=j`5vXoU^@n5v4:OO)c["!w)nD/!!~e4Sj7LiT'6*wZ83454H""lb{CC%T37O!!'S$S&D}sem7I[A 2V%N&+
X-Image-Url: http://www.employees.org/~bmah/Images/bmah-cisco-small.gif
X-Url: http://www.employees.org/~bmah/
Mime-Version: 1.0
Content-Type: multipart/signed; boundary="==_Exmh_-79557579P"; micalg=pgp-sha1; protocol="application/pgp-signature"
Content-Transfer-Encoding: 7bit
Date: Wed, 09 Oct 2002 21:46:05 -0700
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe:
X-Loop: FreeBSD.ORG

--==_Exmh_-79557579P
Content-Type: text/plain; charset=us-ascii

If memory serves me right, Jeff Roberson wrote:
> On Wed, 9 Oct 2002, Luigi Rizzo wrote:
>
> >
> > well, you said you posted it just to get feedback, and now in 24 hours
> > you declare it "ready for commit". A bit rushing, aren't you!
>
> Perhaps I wasn't clear. I was looking for an indication of whether or not
> this was something people were interested in having in for 5.0. I'm not
> going to rush the commit but by declaring it to be of a sufficient quality
> for committing I was hoping to scare more people into reviewing it. ;-)

Let me just briefly don my RE team member hat and say that for right
now, I'm much more interested in seeing commits to make CURRENT more
stable, rather than seeing people add lots of new functionality.
Remember that we're targeting a release in less than two months. It's
not going to be possible to make CURRENT perfect by then, but we need to
avoid making this process more complicated by adding loads of new
features, especially in the area of something as fundamental as the
scheduler.

This is not a comment on the quality of your work. I freely admit that
I'm not qualified to review it.

Bruce.
--==_Exmh_-79557579P
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (FreeBSD)
Comment: Exmh version 2.5+ 20020506

iD8DBQE9pQYN2MoxcVugUsMRAg9fAKD/Ui02gaSloTbChFt4Fyb46n28vwCg2lDo
G7ars+aXi5Lqsrms3NNd1UU=
=gwyI
-----END PGP SIGNATURE-----

--==_Exmh_-79557579P--

From owner-freebsd-arch Wed Oct 9 22:12:50 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4B06637B401; Wed, 9 Oct 2002 22:12:49 -0700 (PDT)
Received: from canning.wemm.org (canning.wemm.org [192.203.228.65]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0E2E243E75; Wed, 9 Oct 2002 22:12:45 -0700 (PDT) (envelope-from peter@wemm.org)
Received: from wemm.org (localhost [127.0.0.1]) by canning.wemm.org (Postfix) with ESMTP id 700B22A88D; Wed, 9 Oct 2002 22:12:41 -0700 (PDT) (envelope-from peter@wemm.org)
X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4
To: bmah@FreeBSD.ORG
Cc: Jeff Roberson , Luigi Rizzo , arch@FreeBSD.ORG
Subject: Re: Scheduler patch, ready for commit.
In-Reply-To: <200210100446.g9A4k6kx026651@intruder.bmah.org>
Date: Wed, 09 Oct 2002 22:12:41 -0700
From: Peter Wemm
Message-Id: <20021010051241.700B22A88D@canning.wemm.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe:
X-Loop: FreeBSD.ORG

"Bruce A. Mah" wrote:
> If memory serves me right, Jeff Roberson wrote:
> > On Wed, 9 Oct 2002, Luigi Rizzo wrote:
> >
> > >
> > > well, you said you posted it just to get feedback, and now in 24 hours
> > > you declare it "ready for commit". A bit rushing, aren't you!
> >
> > Perhaps I wasn't clear. I was looking for an indication of whether or not
> > this was something people were interested in having in for 5.0.
> > I'm not
> > going to rush the commit but by declaring it to be of a sufficient quality
> > for committing I was hoping to scare more people into reviewing it. ;-)
>
> Let me just briefly don my RE team member hat and say that for right
> now, I'm much more interested in seeing commits to make CURRENT more
> stable, rather than seeing people add lots of new functionality.
> Remember that we're targeting a release in less than two months. It's
> not going to be possible to make CURRENT perfect by then, but we need to
> avoid making this process more complicated by adding loads of new
> features, especially in the area of something as fundamental as the
> scheduler.

To answer your concerns: what Jeff is doing is trying to neatly
encapsulate the existing scheduler into one place with a well-defined
interface and hooks to the rest of the kernel. As long as this is done
right, it is a NOP change, but with an important difference: it then
allows optional drop-in replacements to be worked on independently.

I personally think it is worth it since the potential gains are so
great, as long as this step is done carefully and doesn't change the
existing policy and strategies. And that just happens to be what Jeff
is trying to do.
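As a rough illustration of the encapsulation described above, the sketch
below keeps all policy behind a small set of entry points and picks one
implementation at build time, so no function pointers are involved. It
is an assumption-laden toy, not the contents of the patch: only the name
sched_rr_interval() appears in this thread, while sched_clock(),
demo_hardclock(), struct demo_thread, DEMO_HZ, DEMO_SCHED_ALT and the
policies themselves are invented for the example.

```c
#include <assert.h>

#define DEMO_HZ	100	/* assumed clock ticks per second */

/* Stand-in for the kernel's thread structure. */
struct demo_thread {
	int	td_ticks;	/* ticks charged to this thread */
	int	td_priority;	/* larger value means lower priority */
};

/* ---- Interface: the only scheduler entry points callers may use ---- */
void	sched_clock(struct demo_thread *td);	/* per-tick hook */
int	sched_rr_interval(void);		/* quantum, in ticks */

/* ---- Exactly one implementation is compiled in ---- */
#ifndef DEMO_SCHED_ALT
/* Toy "classic" policy: decay priority every fourth tick. */
void
sched_clock(struct demo_thread *td)
{
	td->td_ticks++;
	if (td->td_ticks % 4 == 0 && td->td_priority < 255)
		td->td_priority++;
}

int
sched_rr_interval(void)
{
	return (DEMO_HZ / 10);
}
#else
/* Toy alternative with no real quantum; it reports an approximation. */
void
sched_clock(struct demo_thread *td)
{
	td->td_ticks++;
}

int
sched_rr_interval(void)
{
	return (DEMO_HZ / 20);
}
#endif

/* ---- Caller: knows the interface, not the policy ---- */
static void
demo_hardclock(struct demo_thread *td)
{
	sched_clock(td);
}
```

Building with -DDEMO_SCHED_ALT swaps the policy without touching
demo_hardclock(), which is the sense in which the reorganisation leaves
behaviour unchanged while making drop-in replacements possible.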
Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5

From owner-freebsd-arch Wed Oct 9 22:28:16 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3D7BB37B404; Wed, 9 Oct 2002 22:28:15 -0700 (PDT)
Received: from flamingo.mail.pas.earthlink.net (flamingo.mail.pas.earthlink.net [207.217.120.232]) by mx1.FreeBSD.org (Postfix) with ESMTP id BB48B43E77; Wed, 9 Oct 2002 22:28:14 -0700 (PDT) (envelope-from tlambert2@mindspring.com)
Received: from pool0137.cvx22-bradley.dialup.earthlink.net ([209.179.198.137] helo=mindspring.com) by flamingo.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17zVrp-0005AX-00; Wed, 09 Oct 2002 22:28:14 -0700
Message-ID: <3DA50FA4.3C8BE165@mindspring.com>
Date: Wed, 09 Oct 2002 22:27:00 -0700
From: Terry Lambert
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Jeff Roberson
Cc: arch@freebsd.org
Subject: Re: Scheduler patch, ready for commit.
References: <20021009211321.M23516-100000@mail.chesapeake.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe:
X-Loop: FreeBSD.ORG

Jeff Roberson wrote:
> I haven't heard any objections to the goals of this patch. I have cleaned
> it up and readied it for commit. This step is important so that I can
> stop manually merging in new scheduler changes and get on with the new
> scheduler. This patch does not change any functionality in the current
> system. It is only a code reorg.
>
> As always any comments are welcome.
> The patch is available at
> http://www.chesapeake.net/~jroberson/sched.patch

The documentation that you remove in kern_exit.c is not recreated
anywhere. I think the XXX comments are important expository
information.

I'm somewhat concerned that you go to all this trouble, and then
don't separate out the statistics data from the proc structure;
this probably means pushing the allocation of the proc structure
into the scheduler code, if it's supposed to be one lump, but it
should be just as easy to allocate it separately with an encapsulated
allocation that allocates the scheduler part, the proc part, and then
aggregates them, all protected by the proc lock, and then imply that
the proc lock protects the data (since they will never divorce, even
on deallocation, because the proc structs go to a free list, unless
the memory is freed back to the system). I rather expected the
statistical data, which is algorithm dependent, to be broken out.

-- Terry

From owner-freebsd-arch Wed Oct 9 22:35:40 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7EDEB37B401 for ; Wed, 9 Oct 2002 22:35:31 -0700 (PDT)
Received: from pirzyk.org (dsl-65-184-181-29.telocity.com [65.184.181.29]) by mx1.FreeBSD.org (Postfix) with ESMTP id 87D6B43E4A for ; Wed, 9 Oct 2002 22:35:30 -0700 (PDT) (envelope-from jim@pirzyk.org)
Received: from snoopy (snoopy.pirzyk.org [10.26.0.4]) by pirzyk.org (8.12.3/8.12.3) with ESMTP id g9A5X0Fb000389 for ; Wed, 9 Oct 2002 22:33:02 -0700 (PDT) (envelope-from jim@pirzyk.org)
From: Jim Pirzyk
To: freebsd-arch@freebsd.org
Subject: getnetby* functions
Date: Wed, 9 Oct 2002 22:36:06 -0700
User-Agent: KMail/1.4.3
MIME-Version: 1.0
Content-Type: Multipart/Mixed; boundary="------------Boundary-00=_6K3RC8Q3XF1UIKP2VYDR"
Message-Id:
<200210092236.06338.jim@pirzyk.org> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG --------------Boundary-00=_6K3RC8Q3XF1UIKP2VYDR Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable The getnetby* functions seem to be broken for the DNS case. It cannot find the .0 address in dns. I propose that the _getnetbydns*() functions call the _gethostbydns*() function, reformat the result and stuff it into the netent struct. Included is a patch to do such a thing (it is patched against 4.6.2-RELEASE). Comments? - JimP --=20 --- @(#) $Id: dot.signature,v 1.10 2001/05/17 23:38:49 Jim.Pirzyk Exp $ __o jim@pirzyk.org ----------------------------------------------- _'\<,_ =20 (*)/ (*) =20 --------------Boundary-00=_6K3RC8Q3XF1UIKP2VYDR Content-Type: text/x-diff; charset="us-ascii"; name="getnetbydns.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="getnetbydns.patch" *** lib/libc/net/getnetbydns.c.orig Thu Sep 19 08:58:31 2002 --- lib/libc/net/getnetbydns.c Sat Sep 21 09:52:14 2002 *************** *** 81,88 **** extern int h_errno; - #define BYADDR 0 - #define BYNAME 1 #define MAXALIASES 35 #if PACKETSZ > 1024 --- 81,86 ---- *************** *** 91,222 **** #define MAXPACKET 1024 #endif ! typedef union { ! HEADER hdr; ! u_char buf[MAXPACKET]; ! } querybuf; ! ! typedef union { ! long al; ! char ac; ! } align; static struct netent * ! getnetanswer(answer, anslen, net_i) ! querybuf *answer; ! int anslen; ! int net_i; { - register HEADER *hp; - register u_char *cp; - register int n; - u_char *eom; - int type, class, buflen, ancount, qdcount, haveanswer, i, nchar; - char aux1[MAXHOSTNAMELEN], aux2[MAXHOSTNAMELEN], ans[MAXHOSTNAMELEN]; - char *in, *st, *pauxt, *bp, **ap; - char *paux1 = &aux1[0], *paux2 = &aux2[0], flag = 0; static struct netent net_entry; ! 
static char *net_aliases[MAXALIASES], netbuf[PACKETSZ]; ! /* ! * find first satisfactory answer ! * ! * answer --> +------------+ ( MESSAGE ) ! * | Header | ! * +------------+ ! * | Question | the question for the name server ! * +------------+ ! * | Answer | RRs answering the question ! * +------------+ ! * | Authority | RRs pointing toward an authority ! * | Additional | RRs holding additional information ! * +------------+ ! */ ! eom = answer->buf + anslen; ! hp = &answer->hdr; ! ancount = ntohs(hp->ancount); /* #/records in the answer section */ ! qdcount = ntohs(hp->qdcount); /* #/entries in the question section */ ! bp = netbuf; ! buflen = sizeof(netbuf); ! cp = answer->buf + HFIXEDSZ; ! if (!qdcount) { ! if (hp->aa) ! h_errno = HOST_NOT_FOUND; ! else ! h_errno = TRY_AGAIN; ! return (NULL); } - while (qdcount-- > 0) - cp += __dn_skipname(cp, eom) + QFIXEDSZ; - ap = net_aliases; - *ap = NULL; net_entry.n_aliases = net_aliases; ! haveanswer = 0; ! while (--ancount >= 0 && cp < eom) { ! n = dn_expand(answer->buf, eom, cp, bp, buflen); ! if ((n < 0) || !res_dnok(bp)) ! break; ! cp += n; ! ans[0] = '\0'; ! (void)strncpy(&ans[0], bp, sizeof(ans) - 1); ! ans[sizeof(ans) - 1] = '\0'; ! GETSHORT(type, cp); ! GETSHORT(class, cp); ! cp += INT32SZ; /* TTL */ ! GETSHORT(n, cp); ! if (class == C_IN && type == T_PTR) { ! n = dn_expand(answer->buf, eom, cp, bp, buflen); ! if ((n < 0) || !res_hnok(bp)) { ! cp += n; ! return (NULL); ! } ! cp += n; ! *ap++ = bp; ! n = strlen(bp) + 1; ! bp += n; ! buflen -= n; ! net_entry.n_addrtype = ! (class == C_IN) ? AF_INET : AF_UNSPEC; ! haveanswer++; ! } ! } ! if (haveanswer) { ! *ap = NULL; ! switch (net_i) { ! case BYADDR: ! net_entry.n_name = *net_entry.n_aliases; ! net_entry.n_net = 0L; ! break; ! case BYNAME: ! in = *net_entry.n_aliases; ! net_entry.n_name = &ans[0]; ! aux2[0] = '\0'; ! for (i = 0; i < 4; i++) { ! for (st = in, nchar = 0; ! *st != '.'; ! st++, nchar++) ! ; ! if (nchar != 1 || *in != '0' || flag) { ! flag = 1; ! 
(void)strncpy(paux1, ! (i==0) ? in : in-1, ! (i==0) ?nchar : nchar+1); ! paux1[(i==0) ? nchar : nchar+1] = '\0'; ! pauxt = paux2; ! paux2 = strcat(paux1, paux2); ! paux1 = pauxt; ! } ! in = ++st; ! } ! net_entry.n_net = inet_network(paux2); ! break; ! } ! net_entry.n_aliases++; ! return (&net_entry); ! } ! h_errno = TRY_AGAIN; ! return (NULL); } struct netent * --- 89,132 ---- #define MAXPACKET 1024 #endif ! struct hostent * _gethostbydnsaddr(const char *, int, int); ! struct hostent * _gethostbydnsname(const char *, int); static struct netent * ! getnetanswer(answer) ! struct hostent *answer; { static struct netent net_entry; ! static char *net_aliases[MAXALIASES]; ! u_long net; ! int i; ! /* Check to make sure we found a hostent */ ! if ( !answer ) return (NULL); ! ! net_entry.n_name = answer->h_name; ! ! for (i = 0; answer->h_aliases[i] && i < MAXALIASES; i++) { ! net_aliases[i] = answer->h_aliases[i]; } net_entry.n_aliases = net_aliases; ! ! net_entry.n_addrtype = answer->h_addrtype; ! ! /* One difference between gethostby* and getnetby* is the */ ! /* address for the former is in network byte order and the */ ! /* latter is in machine byte order :( */ ! /* Do the memcpy instead of a cast to make sure we have */ ! /* aligned memory for a u_long */ ! memcpy (&net, answer->h_addr, sizeof (net)); ! net_entry.n_net = ntohl(net); ! ! /* Strip trailing zeros */ ! while ((net_entry.n_net & 0xff) == 0 && net_entry.n_net != 0) ! net_entry.n_net >>= 8; ! ! return (&net_entry); } struct netent * *************** *** 224,301 **** register unsigned long net; register int net_type; { ! unsigned int netbr[4]; ! int nn, anslen; ! querybuf buf; ! char qbuf[MAXDNAME]; ! unsigned long net2; ! struct netent *net_entry; ! ! if (net_type != AF_INET) ! return (NULL); ! ! for (nn = 4, net2 = net; net2; net2 >>= 8) ! netbr[--nn] = net2 & 0xff; ! switch (nn) { ! case 3: /* Class A */ ! sprintf(qbuf, "0.0.0.%u.in-addr.arpa", netbr[3]); ! break; ! case 2: /* Class B */ ! 
sprintf(qbuf, "0.0.%u.%u.in-addr.arpa", netbr[3], netbr[2]); ! break; ! case 1: /* Class C */ ! sprintf(qbuf, "0.%u.%u.%u.in-addr.arpa", netbr[3], netbr[2], ! netbr[1]); ! break; ! case 0: /* Class D - E */ ! sprintf(qbuf, "%u.%u.%u.%u.in-addr.arpa", netbr[3], netbr[2], ! netbr[1], netbr[0]); ! break; ! } ! anslen = res_query(qbuf, C_IN, T_PTR, (u_char *)&buf, sizeof(buf)); ! if (anslen < 0) { ! #ifdef DEBUG ! if (_res.options & RES_DEBUG) ! printf("res_query failed\n"); ! #endif ! return (NULL); ! } ! net_entry = getnetanswer(&buf, anslen, BYADDR); ! if (net_entry) { ! unsigned u_net = net; /* maybe net should be unsigned ? */ ! ! /* Strip trailing zeros */ ! while ((u_net & 0xff) == 0 && u_net != 0) ! u_net >>= 8; ! net_entry->n_net = u_net; ! return (net_entry); ! } ! return (NULL); } struct netent * _getnetbydnsname(net) register const char *net; { ! int anslen; ! querybuf buf; ! char qbuf[MAXDNAME]; ! ! if ((_res.options & RES_INIT) == 0 && res_init() == -1) { ! h_errno = NETDB_INTERNAL; ! return (NULL); ! } ! strncpy(qbuf, net, sizeof(qbuf) - 1); ! qbuf[sizeof(qbuf) - 1] = '\0'; ! anslen = res_search(qbuf, C_IN, T_PTR, (u_char *)&buf, sizeof(buf)); ! if (anslen < 0) { ! #ifdef DEBUG ! if (_res.options & RES_DEBUG) ! printf("res_query failed\n"); ! #endif ! return (NULL); ! } ! return getnetanswer(&buf, anslen, BYNAME); } void --- 134,194 ---- register unsigned long net; register int net_type; { ! unsigned int netbr[4]; ! int nn, anslen; ! char qbuf[MAXDNAME]; ! unsigned long net2; ! struct hostent *buf; ! ! for (nn = 4, net2 = net; net2; net2 >>= 8) ! netbr[--nn] = net2 & 0xff; ! switch (nn) { ! case 3: /* Class A */ ! sprintf(qbuf, "0.0.0.%u.in-addr.arpa", netbr[3]); ! break; ! case 2: /* Class B */ ! sprintf(qbuf, "0.0.%u.%u.in-addr.arpa", netbr[3], netbr[2]); ! break; ! case 1: /* Class C */ ! sprintf(qbuf, "0.%u.%u.%u.in-addr.arpa", netbr[3], netbr[2], ! netbr[1]); ! break; ! case 0: /* Class D - E */ ! 
sprintf(qbuf, "%u.%u.%u.%u.in-addr.arpa", netbr[3], netbr[2], ! netbr[1], netbr[0]); ! break; ! } ! ! anslen = strlen(qbuf); ! ! if ((_res.options & RES_INIT) == 0 && res_init() == -1) { ! h_errno = NETDB_INTERNAL; ! return (NULL); ! } ! if (_res.options & RES_USE_INET6) { /* XXX */ ! buf = _gethostbydnsaddr(qbuf, anslen, AF_INET6); /* XXX */ ! if (buf) /* XXX */ ! return (getnetanswer(buf)); /* XXX */ ! } /* XXX */ ! return getnetanswer(_gethostbydnsaddr(qbuf, anslen, AF_INET)); } struct netent * _getnetbydnsname(net) register const char *net; { ! struct hostent *hp; ! ! if ((_res.options & RES_INIT) == 0 && res_init() == -1) { ! h_errno = NETDB_INTERNAL; ! return (NULL); ! } ! if (_res.options & RES_USE_INET6) { /* XXX */ ! hp = _gethostbydnsname(net, AF_INET6); /* XXX */ ! if (hp) /* XXX */ ! return (getnetanswer(hp)); /* XXX */ ! } /* XXX */ ! return getnetanswer(_gethostbydnsname(net, AF_INET)); } void --------------Boundary-00=_6K3RC8Q3XF1UIKP2VYDR-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Oct 9 23:25:57 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 992F437B401; Wed, 9 Oct 2002 23:25:55 -0700 (PDT) Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id CF99543E42; Wed, 9 Oct 2002 23:25:54 -0700 (PDT) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g9A6Psh26983; Thu, 10 Oct 2002 02:25:54 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Thu, 10 Oct 2002 02:25:53 -0400 (EDT) From: Jeff Roberson To: Terry Lambert Cc: Jeff Roberson , Subject: Re: Scheduler patch, ready for commit. 
In-Reply-To: <3DA50FA4.3C8BE165@mindspring.com>
Message-ID: <20021010022058.A23516-100000@mail.chesapeake.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe:
X-Loop: FreeBSD.ORG

On Wed, 9 Oct 2002, Terry Lambert wrote:

> Jeff Roberson wrote:
> > I haven't heard any objections to the goals of this patch. I have cleaned
> > it up and readied it for commit. This step is important so that I can
> > stop manually merging in new scheduler changes and get on with the new
> > scheduler. This patch does not change any functionality in the current
> > system. It is only a code reorg.
> >
> > As always any comments are welcome. The patch is available at
> > http://www.chesapeake.net/~jroberson/sched.patch
>
> The documentation that you remove in kern_exit.c is not recreated
> anywhere. I think the XXX comments are important expository
> information.

I tried to be careful about that. I guess I missed one. I'll throw it
back in. Thanks!

>
> I'm somewhat concerned that you go to all this trouble, and then
> don't separate out the statistics data from the proc structure;
> this probably means pushing the allocation of the proc structure
> into the scheduler code, if it's supposed to be one lump, but it
> should be just as easy to allocate it separately with an encapsulated
> allocation that allocates the scheduler part, the proc part, and then
> aggregates them, all protected by the proc lock, and then imply that
> the proc lock protects the data (since they will never divorce, even
> on deallocation, because the proc structs go to a free list, unless
> the memory is freed back to the system). I rather expected the
> statistical data, which is algorithm dependent, to be broken out.
>

Yes, I agree, this is an important next step.
I'm thinking that the scheduler should tell the proc allocation code
how much space it needs. That much extra space could be allocated, and
the pointer to scheduler-specific data could really be a pointer within
that allocated structure. This way it might be near enough for
processor caches to be effective. Clearly this needs more work. That is
outside of the scope of the current patch though.

Thanks,
Jeff

From owner-freebsd-arch Wed Oct 9 23:40:15 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AF65437B401 for ; Wed, 9 Oct 2002 23:40:14 -0700 (PDT)
Received: from sccrmhc03.attbi.com (sccrmhc03.attbi.com [204.127.202.63]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3777143E3B for ; Wed, 9 Oct 2002 23:40:14 -0700 (PDT) (envelope-from julian@elischer.org)
Received: from InterJet.elischer.org ([12.232.206.8]) by sccrmhc03.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021010064013.HDNZ20316.sccrmhc03.attbi.com@InterJet.elischer.org>; Thu, 10 Oct 2002 06:40:13 +0000
Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id XAA49880; Wed, 9 Oct 2002 23:32:05 -0700 (PDT)
Date: Wed, 9 Oct 2002 23:32:04 -0700 (PDT)
From: Julian Elischer
To: Jeff Roberson
Cc: Luigi Rizzo , arch@FreeBSD.ORG
Subject: Re: Scheduler patch, ready for commit.
In-Reply-To: <20021009234324.F23516-100000@mail.chesapeake.net>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe:
X-Loop: FreeBSD.ORG

Luigi.
I'm giving a detailed walkthrough of KSE at the BAFUG meeting Thursday evening.
http://www.bafug.org/
if you can make it.. that'd be great..

BTW, jeff, mini just moved to Seattle.
maybe he can lease with you re: KSE scheduling..

On Thu, 10 Oct 2002, Jeff Roberson wrote:

> On Wed, 9 Oct 2002, Luigi Rizzo wrote:
[comments on jeffs changes]

From owner-freebsd-arch Wed Oct 9 23:40:19 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 764E537B401; Wed, 9 Oct 2002 23:40:18 -0700 (PDT)
Received: from sccrmhc03.attbi.com (sccrmhc03.attbi.com [204.127.202.63]) by mx1.FreeBSD.org (Postfix) with ESMTP id D999143E3B; Wed, 9 Oct 2002 23:40:17 -0700 (PDT) (envelope-from julian@elischer.org)
Received: from InterJet.elischer.org ([12.232.206.8]) by sccrmhc03.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021010064017.HDOS20316.sccrmhc03.attbi.com@InterJet.elischer.org>; Thu, 10 Oct 2002 06:40:17 +0000
Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id XAA49889; Wed, 9 Oct 2002 23:38:23 -0700 (PDT)
Date: Wed, 9 Oct 2002 23:38:22 -0700 (PDT)
From: Julian Elischer
To: "Bruce A. Mah"
Cc: Jeff Roberson , Luigi Rizzo , arch@FreeBSD.ORG
Subject: Re: Scheduler patch, ready for commit.
In-Reply-To: <200210100446.g9A4k6kx026651@intruder.bmah.org>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe:
X-Loop: FreeBSD.ORG

On Wed, 9 Oct 2002, Bruce A. Mah wrote:

> If memory serves me right, Jeff Roberson wrote:
> > On Wed, 9 Oct 2002, Luigi Rizzo wrote:
> >
> > >
> > > well, you said you posted it just to get feedback, and now in 24 hours
> > > you declare it "ready for commit".
> > > A bit rushing, aren't you!
> >
> > Perhaps I wasn't clear. I was looking for an indication of whether or not
> > this was something people were interested in having in for 5.0. I'm not
> > going to rush the commit but by declaring it to be of a sufficient quality
> > for committing I was hoping to scare more people into reviewing it. ;-)
>
> Let me just briefly don my RE team member hat and say that for right
> now, I'm much more interested in seeing commits to make CURRENT more
> stable, rather than seeing people add lots of new functionality.
> Remember that we're targeting a release in less than two months. It's
> not going to be possible to make CURRENT perfect by then, but we need to
> avoid making this process more complicated by adding loads of new
> features, especially in the area of something as fundamental as the
> scheduler.
>
> This is not a comment on the quality of your work. I freely admit that
> I'm not qualified to review it.

THIS change only moves stuff around, but it allows people to write
replacement parts. It'd be good to have the framework in 5.0 so
people can use 5.0 as a scheduler testbed.

>
> Bruce.
> > > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Oct 10 0: 0:31 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 46DDB37B401 for ; Thu, 10 Oct 2002 00:00:30 -0700 (PDT) Received: from sccrmhc01.attbi.com (sccrmhc01.attbi.com [204.127.202.61]) by mx1.FreeBSD.org (Postfix) with ESMTP id 712F943E88 for ; Thu, 10 Oct 2002 00:00:29 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by sccrmhc01.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021010070028.HPFH29655.sccrmhc01.attbi.com@InterJet.elischer.org>; Thu, 10 Oct 2002 07:00:28 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id XAA49946; Wed, 9 Oct 2002 23:45:13 -0700 (PDT) Date: Wed, 9 Oct 2002 23:45:12 -0700 (PDT) From: Julian Elischer To: Jeff Roberson Cc: Luigi Rizzo , arch@FreeBSD.ORG Subject: Re: Scheduler patch, ready for commit. In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 9 Oct 2002, Julian Elischer wrote: > BTW, jeff, mini just moved to Seattle. > maybe he can lease with you re: KSE scheduling.. Duh liase.... 
From owner-freebsd-arch Thu Oct 10 0: 4:33 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 22E3837B401 for ; Thu, 10 Oct 2002 00:04:33 -0700 (PDT)
Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id 75EF643E65 for ; Thu, 10 Oct 2002 00:04:32 -0700 (PDT) (envelope-from jroberson@chesapeake.net)
Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g9A74RO40223; Thu, 10 Oct 2002 03:04:28 -0400 (EDT) (envelope-from jroberson@chesapeake.net)
Date: Thu, 10 Oct 2002 03:04:27 -0400 (EDT)
From: Jeff Roberson
To: Julian Elischer
Cc: Luigi Rizzo ,
Subject: Re: Scheduler patch, ready for commit.
In-Reply-To:
Message-ID: <20021010030316.S23516-100000@mail.chesapeake.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe:
X-Loop: FreeBSD.ORG

On Wed, 9 Oct 2002, Julian Elischer wrote:

>
> On Wed, 9 Oct 2002, Julian Elischer wrote:
> > BTW, jeff, mini just moved to Seattle.
> > maybe he can lease with you re: KSE scheduling..
> Duh liase....
>
>

Yes, he was very helpful in bringing me up to speed on KSE. I'm sure we
can collaborate more in the future.
Cheers,
jeff

From owner-freebsd-arch Thu Oct 10 0: 7:21 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2FFEF37B401; Thu, 10 Oct 2002 00:07:18 -0700 (PDT)
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8D1B543E65; Thu, 10 Oct 2002 00:07:16 -0700 (PDT) (envelope-from bde@zeta.org.au)
Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id RAA16231; Thu, 10 Oct 2002 17:07:12 +1000
Date: Thu, 10 Oct 2002 17:17:20 +1000 (EST)
From: Bruce Evans
X-X-Sender: bde@gamplex.bde.org
To: "David O'Brien"
Cc: John Baldwin , Mike Barcroft , , Andrew Gallatin
Subject: Re: lp64 vs lp32 printf
In-Reply-To: <20021009220522.GA65943@dragon.nuxi.com>
Message-ID: <20021010162920.Y8030-100000@gamplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe:
X-Loop: FreeBSD.ORG

On Wed, 9 Oct 2002, David O'Brien wrote:

> How is this patch?
>
> Index: contrib/gcc/c-format.c
> ===================================================================
> RCS file: /home/ncvs/src/contrib/gcc/c-format.c,v
> retrieving revision 1.5
> diff -u -r1.5 c-format.c
> --- contrib/gcc/c-format.c 12 Jul 2002 00:49:52 -0000 1.5
> +++ contrib/gcc/c-format.c 9 Oct 2002 21:52:40 -0000
> @@ -795,10 +795,12 @@
> The format %D provides a hexdump given a pointer and separator string:
> ("%6D", ptr, ":") -> XX:XX:XX:XX:XX:XX
> ("%*D", len, ptr, " ") -> XX XX XX XX ...
> + The format %H is a version of %x that allows for a sign
> + (e.g. -0x10 instead of 0xfffffff0, or +0x10).

I'm not sure if I like 'H'.
It's closer to the floating point specifiers [EFG] than to the hex specifiers [xX]. The following comment assumes that %H and %+ have been fixed. Currently, plain %H is equivalent to %x, and %+H gives the non-broken %H but not the normal userland behaviour for %+. This format doesn't "allow" for a sign. It gives one if the value is negative when cast to a signed integer of the relevant size. It does not give one if this value is positive, and it does not give an 0x prefix, so +0x10 is a bad example. %+H would give +10 in the same way that %+d would give +16 if %+ actually worked in the kernel; %#+H would also give the 0x prefix. 0xfffffff0 being printed as -0x10 is a bad example. 0xfffffff0 is only printed as -10 (not -0x10) on 32-bit 2's complement machines. -0x10 is printed as -0x10. There should be a comma after "e.g.". Correcting the comment gives: The format %H is like %x except it takes a signed (int or long long) arg instead of an unsigned (int, long, long long) or uintmax_t arg, and prints in "signed hex" format instead of hex format (e.g., -16 is printed as "-10", and 16 is printed as "10"). > { "D", 1, STD_EXT, { T89_C, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "-wp", "cR" }, > { "b", 1, STD_EXT, { T89_C, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "-wp", "" }, > - { "rz", 0, STD_EXT, { T89_I, BADLEN, BADLEN, T89_L, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "-wp0 +#", "i" }, > + { "rH", 0, STD_EXT, { T89_I, BADLEN, BADLEN, T89_L, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN }, "-wp0 +#", "i" }, Try adding long long (ugh) support here. It is implemented for %H in printf(9), since %H mostly uses the same code as %x.
> Index: share/man/man9/printf.9 > =================================================================== > RCS file: /home/ncvs/src/share/man/man9/printf.9,v > retrieving revision 1.3 > diff -u -r1.3 printf.9 > --- share/man/man9/printf.9 1 Oct 2001 16:09:25 -0000 1.3 > +++ share/man/man9/printf.9 9 Oct 2002 21:55:51 -0000 > ... > @@ -102,6 +106,12 @@ > The string is used as a delimiter between individual bytes. > If present, a width directive will specify the number of bytes to display. > By default, 16 bytes of data are output. > +.Pp > +The > +.Cm \&%H > +identifier is a version of > +.Cm \&%x > +that allows for a sign (e.g. -0x10 instead of 0xfffffff0, or +0x10). The details mostly belong here, not in gcc. More details: If the arg is negative, then a "-" followed by the negation of the arg (in infinite precision) is printed (in hex). The current implementation doesn't actually use infinite precision and only works on normal 2's complement machines. E.g., it prints -1 as "0" on normal 1's complement machines, and it has overflow problems negating INT_MAX. I only tested this behaviour by reading the code. > Index: sys/kern/subr_prf.c > =================================================================== > RCS file: /home/ncvs/src/sys/kern/subr_prf.c,v > retrieving revision 1.88 > diff -u -r1.88 subr_prf.c > --- sys/kern/subr_prf.c 28 Sep 2002 21:34:31 -0000 1.88 > +++ sys/kern/subr_prf.c 9 Oct 2002 21:49:08 -0000 > @@ -662,7 +662,7 @@ > case 'X': > base = 16; > goto handle_nosign; > - case 'z': > + case 'H': > base = 16; > if (sign) > goto handle_sign; > OK. jhb has a patch to make plain %H actually work. 
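The "signed hex" behaviour described above (a "-" followed by the magnitude in hex, no 0x prefix) can be sketched in userland C. This is only an illustration, not the subr_prf.c code: signed_hex() is a hypothetical helper name, and it assumes a 2's complement machine. It negates in unsigned arithmetic so that the most negative value does not overflow.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format `value` as "signed hex": a leading '-' for negative values,
 * then the magnitude in lower-case hex, with no 0x prefix.
 * signed_hex() is a hypothetical userland helper, not kernel code.
 * Returns what snprintf() returns.
 */
int
signed_hex(char *buf, size_t len, long long value)
{
    unsigned long long mag;

    if (value < 0) {
        /* Negate as unsigned so the most negative value is handled. */
        mag = 0ULL - (unsigned long long)value;
        return (snprintf(buf, len, "-%llx", mag));
    }
    mag = (unsigned long long)value;
    return (snprintf(buf, len, "%llx", mag));
}
```

With this helper, -16 formats as "-10" and 16 formats as "10", matching the corrected comment text proposed earlier in this message.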
Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Oct 10 1:16:10 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9989437B401; Thu, 10 Oct 2002 01:16:07 -0700 (PDT) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id F149643EAA; Thu, 10 Oct 2002 01:16:05 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id SAA24768; Thu, 10 Oct 2002 18:15:52 +1000 Date: Thu, 10 Oct 2002 18:26:00 +1000 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: Luigi Rizzo Cc: Jeff Roberson , Subject: Re: Scheduler patch, ready for commit. In-Reply-To: <20021009193501.A55534@carp.icir.org> Message-ID: <20021010171931.U8144-100000@gamplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 9 Oct 2002, Luigi Rizzo wrote: > On Wed, Oct 09, 2002 at 09:17:18PM -0400, Jeff Roberson wrote: > > I haven't heard any objections to the goals of this patch. I have cleaned > > it up and readied it for commit. This step is important so that I can > > well, you said you posted it just to get feedback, and now in 24 hours > you declare it "ready for commit". A bit rushing, aren't you! I agree. I got it too late yesterday to respond then. > I totally agree on the goals of the change (to abstract the scheduler > from the base system, generalise it, etc.), but it seems to me that > the specific details still need some cleanup before committing. > And, well, maybe leave a little bit more time to people to provide feedback! 
> > So to come to the specific points: > > * there is one important API function which in my opinion is > missing: -stable has a function, curpriority_cmp, which compares > the priority of a currently running "thing" (process/thread) > with that of a newly awaken one, and decides who has more right > to get the CPU. This function is a mistake which never actually worked. > In -current this is done inline (by comparing priorities), -current cleaned this up by restoring priorities to a simple totally ordered scheme, but didn't actually restore the inline comparisons that worked. It still does the main comparison in maybe_resched(), and still gets it wrong by comparing wrong priorities (td->td_priority vs. curthread->td_curpriority). KSE has made this bug more obvious: td is quite often curthread, so the comparison fails, and even when td is not curthread, td->td_priority is often stale. The correct priority for at least the call to maybe_resched() from resetpriority() is kg->kg_user_pri (was p->p_usrpri). td->td_priority is set to kg->kg_user_pri in schedclock() _after_ calling resetpriority() there, so the bug only causes rescheduling delays of at most INVERSE_ESTCPU_WEIGHT ticks (default 1/16 seconds). maybe_resched() is a mistake too. It is easier to get the comparison correct by comparing the correct terms inline in the few places that the comparison is done than to pass the correct terms to maybe_resched(). KSE reduced the number of such places by uninlining "OPTIMIZED EXPANSION OF setrunnable()". > but this is very specific of the scheduler used there -- e.g. > it doesn't work for the scheduler used in -stable (where we have > 3 different classes for rtprio, normal and idlepri processes), > and it does not work in general in cases where you could have > different metrics to decide who is going to proceed.
curpriority_cmp() and maybe_resched() are unsuitable for APIs _because_ they are very scheduler-specific (non-broken schedulers don't even have them :-). However, I think totally ordered priorities are more or less required for priority propagation to work right. The scheduling algorithm is unimportant for priority propagation -- there just must be a way to make selected processes run in preference to others so that priority inversion doesn't occur. Schedulers can work by mapping their decisions into generic td_priority values. The range of generic values should be larger than [0..UCHAR_MAX]. > * Another API which should be made generic is forward_roundrobin(). > I believe the purpose of a "generic" version of this function > is to dispatch a generic timeout to the appropriate CPU(s). In > the FreeBSD's scheduler this timeout happens to be the roundrobin > timeout, hence the name, but other schedulers (e.g. the one we > wrote) have different timeout routines, and the dispatching > requirements vary (e.g. could need to go to a single CPU as > opposed to all cpus). forward_roundrobin() seems to be generic enough already. It doesn't decide the timeout. It just reschedules the current thread on _all_ CPUs. This is a bit too scheduler-dependent. A generic version would let the scheduler choose which threads to reschedule and just kick the CPUs that these threads are running on. But it is only an optimization to avoid null rescheduling. Schedulers can kick running threads directly with different costs. I agree with most of Luigi's other points. Just don't pessimize things more than 0.01% using too many function calls and indirections.
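To illustrate the idea of mapping scheduler decisions into generic, totally ordered priority values, here is a rough sketch. All names (sched_map_priority(), needs_resched(), the class enum and LEVELS_PER_CLASS) are hypothetical and not taken from the patch under discussion; the only assumption carried over from FreeBSD is that a numerically lower priority runs first.

```c
#include <stdint.h>

/* Hypothetical scheduling classes, most urgent first. */
enum sched_class { CLASS_REALTIME = 0, CLASS_TIMESHARE = 1, CLASS_IDLE = 2 };

#define LEVELS_PER_CLASS 256    /* assumed size of each class's range */

/*
 * Map a scheduler-private (class, level) pair onto one totally ordered
 * value.  Lower values run first.  Three classes of 256 levels do not
 * fit in [0..UCHAR_MAX], so the result is a 16-bit quantity.
 */
uint16_t
sched_map_priority(enum sched_class cls, uint8_t level)
{
    return ((uint16_t)((unsigned)cls * LEVELS_PER_CLASS + level));
}

/*
 * The simple comparison that callers can do inline: reschedule only if
 * the newly runnable thread has a strictly better (numerically lower)
 * priority than the one currently running.
 */
int
needs_resched(uint16_t new_pri, uint16_t cur_pri)
{
    return (new_pri < cur_pri);
}
```

Because the mapped values are totally ordered, priority propagation only ever has to compare two integers, whatever algorithm produced them.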
Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Oct 10 1:19:59 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1ED8637B401; Thu, 10 Oct 2002 01:19:58 -0700 (PDT) Received: from snipe.mail.pas.earthlink.net (snipe.mail.pas.earthlink.net [207.217.120.62]) by mx1.FreeBSD.org (Postfix) with ESMTP id B4C7643EB1; Thu, 10 Oct 2002 01:19:57 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0055.cvx40-bradley.dialup.earthlink.net ([216.244.42.55] helo=mindspring.com) by snipe.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 17zYXz-0004zo-00; Thu, 10 Oct 2002 01:19:56 -0700 Message-ID: <3DA537E4.274A3714@mindspring.com> Date: Thu, 10 Oct 2002 01:18:44 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Jeff Roberson Cc: Jeff Roberson , arch@freebsd.org Subject: Re: Scheduler patch, ready for commit. References: <20021010022058.A23516-100000@mail.chesapeake.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Jeff Roberson wrote: [ ... separate out scheduler specific proc struct gunk ... ] > Yes, I agree, this is an important next step. I'm thinking that the > scheduler should indicate how much space is needed to the proc allocation > code. This much extra space could be allocated, and a pointer to > scheduler specific data could really be a pointer within that allocated > structure. This way it might be near enough for processor caches to be > effective. Clearly this needs more work. That is outside of the scope of > the current patch though.
I understand that, but it means that it leaves data interfaces lying around; to be completely abstract, everything has to be procedural, with data references occurring via accessor/mutator functions. If the point is to create the abstract interface, then you probably want to be thorough. As far as throwing it to the proc allocation code, to allocate the proc struct and the data as a single lump, that's probably *not* a good idea, unless there is a scheduler specific allocation and free function ("new/delete"). The reason is that it's a short step from having an abstract interface to supporting multiple scheduler classes simultaneously in the same system. You may actually want to look at the Solaris/SVR4 implementation, which supports both scheduling classes as loadable modules, and simultaneous multiple scheduler classes (SVID III(RT) and the "fixed" scheduling class, used to improve interactive response of the X server, as well as a batch scheduler, are included in the defaults for both systems). 
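A fully procedural interface with scheduler-specific allocation and free functions might look roughly like the sketch below. It is an assumption-laden illustration only: struct sched_class_ops, struct proc_sketch, and the "timeshare" class are hypothetical names, malloc(3) stands in for a kernel allocator, and nothing here is taken from the FreeBSD or Solaris/SVR4 sources.

```c
#include <stdlib.h>

/* Hypothetical per-class operations: all access goes through functions. */
struct sched_class_ops {
    const char *name;
    void *(*data_alloc)(void);              /* scheduler-specific "new" */
    void (*data_free)(void *);              /* scheduler-specific "delete" */
    int (*get_priority)(const void *);      /* accessor */
    void (*set_priority)(void *, int);      /* mutator */
};

/* Stand-in for the proc structure: it holds only an opaque pointer. */
struct proc_sketch {
    const struct sched_class_ops *ops;
    void *sched_data;
};

/* One example class with its own private data layout. */
struct ts_data { int pri; int estcpu; };

static void *ts_alloc(void) { return (calloc(1, sizeof(struct ts_data))); }
static void ts_free(void *d) { free(d); }
static int ts_get(const void *d) { return (((const struct ts_data *)d)->pri); }
static void ts_set(void *d, int pri) { ((struct ts_data *)d)->pri = pri; }

const struct sched_class_ops ts_class = {
    "timeshare", ts_alloc, ts_free, ts_get, ts_set
};

/* Attach a process to a class; returns 0 on success, -1 on failure. */
int
proc_sketch_init(struct proc_sketch *p, const struct sched_class_ops *ops)
{
    p->ops = ops;
    p->sched_data = ops->data_alloc();
    return (p->sched_data != NULL ? 0 : -1);
}

/* Release the class-private data through the class's own free hook. */
void
proc_sketch_fini(struct proc_sketch *p)
{
    p->ops->data_free(p->sched_data);
    p->sched_data = NULL;
}
```

Since each process carries its own ops pointer, two processes can belong to different classes at the same time, which is the property that a single lump allocation sized for one scheduler makes harder to reach.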
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Oct 10 5: 5:19 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3102037B401; Thu, 10 Oct 2002 05:05:18 -0700 (PDT) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0B9E143EAC; Thu, 10 Oct 2002 05:05:17 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id WAA12533; Thu, 10 Oct 2002 22:05:14 +1000 Date: Thu, 10 Oct 2002 22:15:23 +1000 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: John Baldwin Cc: freebsd-arch@FreeBSD.org, Peter Wemm , Andrew Gallatin Subject: Re: lp64 vs lp32 printf In-Reply-To: Message-ID: <20021010221405.G8817-100000@gamplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 9 Oct 2002, John Baldwin wrote: > On 09-Oct-2002 Bruce Evans wrote: > > On Wed, 9 Oct 2002, John Baldwin wrote: > >> [ddb/db_examine.c's use of %z] > >> Hmm, the second case doesn't even use a sign so it can be %x anyways. > > > > This seems to be just a bug. The original db_printf() prints -1 as -1 > > for %z format. From db_output.c rev.1.1: > > So should %z force sign on? > > --- subr_prf.c 28 Sep 2002 21:34:31 -0000 1.88 > +++ subr_prf.c 9 Oct 2002 20:41:53 -0000 > @@ -664,8 +664,8 @@ > goto handle_nosign; > case 'z': > base = 16; > - if (sign) > - goto handle_sign; > + sign = 1; > + goto handle_sign; > handle_nosign: > sign = 0; > if (jflag) > This seems to be correct. I have not checked any of this at runtime. 
Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Oct 10 6:30:30 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 081B137B404 for ; Thu, 10 Oct 2002 06:30:23 -0700 (PDT) Received: from mail.speakeasy.net (mail15.speakeasy.net [216.254.0.215]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7001943EAA for ; Thu, 10 Oct 2002 06:30:20 -0700 (PDT) (envelope-from jhb@FreeBSD.org) Received: (qmail 26910 invoked from network); 10 Oct 2002 13:30:19 -0000 Received: from unknown (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) by mail15.speakeasy.net (qmail-ldap-1.03) with DES-CBC3-SHA encrypted SMTP for ; 10 Oct 2002 13:30:19 -0000 Received: from laptop.baldwin.cx (laptop.baldwin.cx [192.168.0.4]) by server.baldwin.cx (8.12.6/8.12.6) with ESMTP id g9ADUHn5014556; Thu, 10 Oct 2002 09:30:17 -0400 (EDT) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.5.2 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <20021009220522.GA65943@dragon.nuxi.com> Date: Thu, 10 Oct 2002 09:30:21 -0400 (EDT) From: John Baldwin To: "David O'Brien" Subject: Re: lp64 vs lp32 printf Cc: Andrew Gallatin , freebsd-arch@FreeBSD.org, Mike Barcroft Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 09-Oct-2002 David O'Brien wrote: > > How is this patch? Perhaps "%y" instead of %H? It's closer to %x and was somewhat agreed upon earlier. 
> Index: contrib/gcc/c-format.c > =================================================================== > RCS file: /home/ncvs/src/contrib/gcc/c-format.c,v > retrieving revision 1.5 > diff -u -r1.5 c-format.c > --- contrib/gcc/c-format.c 12 Jul 2002 00:49:52 -0000 1.5 > +++ contrib/gcc/c-format.c 9 Oct 2002 21:52:40 -0000 > @@ -795,10 +795,12 @@ > The format %D provides a hexdump given a pointer and separator string: > ("%6D", ptr, ":") -> XX:XX:XX:XX:XX:XX > ("%*D", len, ptr, " ") -> XX XX XX XX ... > + The format %H is a version of %x that allows for a sign > + (e.g. -0x10 instead of 0xfffffff0, or +0x10). > */ > { "D", 1, STD_EXT, { T89_C, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, > BADLEN }, "-wp", "cR" }, > { "b", 1, STD_EXT, { T89_C, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, BADLEN, > BADLEN }, "-wp", "" }, > - { "rz", 0, STD_EXT, { T89_I, BADLEN, BADLEN, T89_L, BADLEN, BADLEN, BADLEN, BADLEN, > BADLEN }, "-wp0 +#", "i" }, > + { "rH", 0, STD_EXT, { T89_I, BADLEN, BADLEN, T89_L, BADLEN, BADLEN, BADLEN, BADLEN, > BADLEN }, "-wp0 +#", "i" }, > { NULL, 0, 0, NOLENGTHS, NULL, NULL } > }; > > Index: share/man/man9/printf.9 > =================================================================== > RCS file: /home/ncvs/src/share/man/man9/printf.9,v > retrieving revision 1.3 > diff -u -r1.3 printf.9 > --- share/man/man9/printf.9 1 Oct 2001 16:09:25 -0000 1.3 > +++ share/man/man9/printf.9 9 Oct 2002 21:55:51 -0000 > @@ -66,7 +66,7 @@ > .Xr printf 3 . > However, > .Xr printf 9 > -adds two other conversion specifiers. > +adds four conversion specifiers. > .Pp > The > .Cm \&%b > @@ -90,6 +90,10 @@ > for the last bit identifier. > .Pp > The > +.Cm \&%r > +identifier is undocumented. > +.Pp > +The > .Cm \&%D > identifier is meant to assist in hexdumps. > It requires two arguments: a > @@ -102,6 +106,12 @@ > The string is used as a delimiter between individual bytes. > If present, a width directive will specify the number of bytes to display. 
> By default, 16 bytes of data are output. > +.Pp > +The > +.Cm \&%H > +identifier is a version of > +.Cm \&%x > +that allows for a sign (e.g. -0x10 instead of 0xfffffff0, or +0x10). > .Sh RETURN VALUES > The > .Fn printf > Index: sys/ddb/db_examine.c > =================================================================== > RCS file: /home/ncvs/src/sys/ddb/db_examine.c,v > retrieving revision 1.29 > diff -u -r1.29 db_examine.c > --- sys/ddb/db_examine.c 25 Jun 2002 15:59:24 -0000 1.29 > +++ sys/ddb/db_examine.c 9 Oct 2002 21:49:55 -0000 > @@ -129,7 +129,7 @@ > case 'z': /* signed hex */ > value = db_get_value(addr, size, TRUE); > addr += size; > - db_printf("%-*lz", width, (long)value); > + db_printf("%-*lH", width, (long)value); > break; > case 'd': /* signed decimal */ > value = db_get_value(addr, size, TRUE); > @@ -212,8 +212,8 @@ > case 'x': > db_printf("%8lx", (unsigned long)addr); > break; > - case 'z': > - db_printf("%8lz", (long)addr); > + case 'H': > + db_printf("%8lH", (long)addr); > break; > case 'd': > db_printf("%11ld", (long)addr); > Index: sys/kern/subr_prf.c > =================================================================== > RCS file: /home/ncvs/src/sys/kern/subr_prf.c,v > retrieving revision 1.88 > diff -u -r1.88 subr_prf.c > --- sys/kern/subr_prf.c 28 Sep 2002 21:34:31 -0000 1.88 > +++ sys/kern/subr_prf.c 9 Oct 2002 21:49:08 -0000 > @@ -662,7 +662,7 @@ > case 'X': > base = 16; > goto handle_nosign; > - case 'z': > + case 'H': > base = 16; > if (sign) > goto handle_sign; -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" 
- http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Oct 10 7:38:59 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 89FDB37B401; Thu, 10 Oct 2002 07:38:58 -0700 (PDT) Received: from dragon.nuxi.com (trang.nuxi.com [66.92.13.169]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1092E43EAF; Thu, 10 Oct 2002 07:38:54 -0700 (PDT) (envelope-from obrien@NUXI.com) Received: from dragon.nuxi.com (obrien@localhost [127.0.0.1]) by dragon.nuxi.com (8.12.6/8.12.2) with ESMTP id g9AEck5b001521; Thu, 10 Oct 2002 07:38:46 -0700 (PDT) (envelope-from obrien@dragon.nuxi.com) Received: (from obrien@localhost) by dragon.nuxi.com (8.12.6/8.12.5/Submit) id g9AEck8X001520; Thu, 10 Oct 2002 07:38:46 -0700 (PDT) Date: Thu, 10 Oct 2002 07:38:45 -0700 From: "David O'Brien" To: John Baldwin Cc: Andrew Gallatin , bde@FreeBSD.org, Mike Barcroft , freebsd-arch@FreeBSD.org Subject: Re: lp64 vs lp32 printf Message-ID: <20021010143845.GA1448@dragon.nuxi.com> Reply-To: obrien@FreeBSD.org References: <20021009220522.GA65943@dragon.nuxi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i X-Operating-System: FreeBSD 5.0-CURRENT Organization: The NUXI BSD Group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A X-Pgp-Rsa-Keyid: 1024/34F9F9D5 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Thu, Oct 10, 2002 at 05:17:20PM +1000, Bruce Evans wrote: > I'm not sure if I like 'H'. It's closer to the floating point > specifiers > [EFG] than to the hex specifiers [xX]. On Thu, Oct 10, 2002 at 09:30:21AM -0400, John Baldwin wrote: > Perhaps "%y" instead of %H? 
It's closer to %x and was somewhat agreed upon > earlier. I was looking for something actually implied what the thing does. It took too much digging to figure out what it meant. That is why I picked 'H' for Hex. Does anyone have a more suggestive letter than y? To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Oct 10 8: 4:29 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4726737B401 for ; Thu, 10 Oct 2002 08:04:28 -0700 (PDT) Received: from mail.speakeasy.net (mail15.speakeasy.net [216.254.0.215]) by mx1.FreeBSD.org (Postfix) with ESMTP id F173B43EA3 for ; Thu, 10 Oct 2002 08:04:26 -0700 (PDT) (envelope-from jhb@FreeBSD.org) Received: (qmail 14538 invoked from network); 10 Oct 2002 15:04:26 -0000 Received: from unknown (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) by mail15.speakeasy.net (qmail-ldap-1.03) with DES-CBC3-SHA encrypted SMTP for ; 10 Oct 2002 15:04:26 -0000 Received: from laptop.baldwin.cx (gw1.twc.weather.com [216.133.140.1]) by server.baldwin.cx (8.12.6/8.12.6) with ESMTP id g9AF4On5014873; Thu, 10 Oct 2002 11:04:24 -0400 (EDT) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.5.2 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <20021010143845.GA1448@dragon.nuxi.com> Date: Thu, 10 Oct 2002 11:04:28 -0400 (EDT) From: John Baldwin To: "David O'Brien" Subject: Re: lp64 vs lp32 printf Cc: freebsd-arch@FreeBSD.org, Mike Barcroft , bde@FreeBSD.org, Andrew Gallatin Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 10-Oct-2002 David O'Brien wrote: > On Thu, Oct 10, 2002 at 05:17:20PM +1000, Bruce Evans wrote: >> I'm not 
sure if I like 'H'. It's closer to the floating point >> specifiers >> [EFG] than to the hex specifiers [xX]. > > > On Thu, Oct 10, 2002 at 09:30:21AM -0400, John Baldwin wrote: >> Perhaps "%y" instead of %H? It's closer to %x and was somewhat agreed upon >> earlier. > > I was looking for something actually implied what the thing does. It > took too much digging to figure out what it meant. That is why I picked > 'H' for Hex. Does anyone have a more suggestive letter than y? %+x is the most logical thing but we can't use that. :-P %h means short, and this has nothing to do with printing shorts. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Oct 10 9:58:30 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E0AB937B401; Thu, 10 Oct 2002 09:58:28 -0700 (PDT) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9D43943EAF; Thu, 10 Oct 2002 09:58:28 -0700 (PDT) (envelope-from baka@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1921) id 6C5B6AE165; Thu, 10 Oct 2002 09:58:28 -0700 (PDT) Date: Thu, 10 Oct 2002 09:58:28 -0700 From: Jon Mini To: Peter Wemm Cc: bmah@FreeBSD.ORG, Jeff Roberson , Luigi Rizzo , arch@FreeBSD.ORG Subject: Re: Scheduler patch, ready for commit. 
Message-ID: <20021010165828.GA82783@elvis.mu.org> References: <200210100446.g9A4k6kx026651@intruder.bmah.org> <20021010051241.700B22A88D@canning.wemm.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20021010051241.700B22A88D@canning.wemm.org> User-Agent: Mutt/1.4i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Peter Wemm [peter@wemm.org] wrote : > "Bruce A. Mah" wrote: > > > Let me just briefly don my RE team member hat and say that for right > > now, I'm much more interested in seeing commits to make CURRENT more > > stable, rather than seeing people add lots of new functionality. > > Remember that we're targeting a release in less than two months. It's > > not going to be possible to make CURRENT perfect by then, but we need to > > avoid making this process more complicated by adding loads of new > > features, especially in the area of something as fundamental as the > > scheduler. > > To answer your concerns.. What Jeff is doing is trying to neatly > encapsulate the existing scheduler into one place with a well defined > interface and hooks to the rest of the kernel. As long as this is done > right, it is a NOP change.. but with an important difference. It then > allows optional drop-in replacements to be worked on independently. > > I personally think it is worth it since the potential gains are so great - > as long as this step is done carefully and doesn't change the existing > policy and strategies. And that just happens to be what Jeff is trying to > do. FWIW, I agree. We should push this into 5.0-R. There are no real functional changes, just an abstraction.
This abstraction will help us along the lifetime of 5.0-R noticeably, because a fair amount of scheduler tweaking is going to have to happen within the next 5.x-R timeline to adjust our scheduling methods so that KSE processes schedule well. This type of work will benefit well from Jeff's changes. As he said, it's not perfect, but it is a strong step in the right direction. -- Jonathan Mini http://www.freebsd.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Oct 10 9:59: 3 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E8A8837B401 for ; Thu, 10 Oct 2002 09:59:02 -0700 (PDT) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id AE2A043EAC for ; Thu, 10 Oct 2002 09:59:02 -0700 (PDT) (envelope-from baka@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1921) id 7CA94AE1C1; Thu, 10 Oct 2002 09:59:02 -0700 (PDT) Date: Thu, 10 Oct 2002 09:59:02 -0700 From: Jon Mini To: Jeff Roberson Cc: Luigi Rizzo , arch@FreeBSD.ORG Subject: Re: Scheduler patch, ready for commit. Message-ID: <20021010165902.GB82783@elvis.mu.org> References: <20021009193501.A55534@carp.icir.org> <20021009234324.F23516-100000@mail.chesapeake.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20021009234324.F23516-100000@mail.chesapeake.net> User-Agent: Mutt/1.4i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Jeff Roberson [jroberson@chesapeake.net] wrote : > > Finally, thanks for working on this stuff! > > > Thank you very much for the feedback! Perhaps at the next BSDcon we can > sit down and devise a good plan for a full featured framework.
I would very much like to join that conversation. =) -- Jonathan Mini http://www.freebsd.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Oct 10 10:32:46 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3E86237B401 for ; Thu, 10 Oct 2002 10:32:45 -0700 (PDT) Received: from canning.wemm.org (canning.wemm.org [192.203.228.65]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0586F43EB1 for ; Thu, 10 Oct 2002 10:32:41 -0700 (PDT) (envelope-from peter@wemm.org) Received: from wemm.org (localhost [127.0.0.1]) by canning.wemm.org (Postfix) with ESMTP id 583872A88D; Thu, 10 Oct 2002 10:32:37 -0700 (PDT) (envelope-from peter@wemm.org) X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4 To: Jeff Roberson Cc: arch@freebsd.org Subject: Re: Scheduler patch, ready for commit. In-Reply-To: <20021010022058.A23516-100000@mail.chesapeake.net> Date: Thu, 10 Oct 2002 10:32:37 -0700 From: Peter Wemm Message-Id: <20021010173237.583872A88D@canning.wemm.org> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Jeff Roberson wrote: > On Wed, 9 Oct 2002, Terry Lambert wrote: > > I'm somewhat concerned that you go to all this trouble, and then > > don't seperate out the statistics data from the proc structure; > > this probably means pushing the allocation of the proc structure > > into the scheduler code, if it's supposed to be one lump, but it > > should be just as easy to allocate it seperately with an encapsulated > > allocation that allocated the scheduler part, the proc part, and then > > aggregates them, all protected by the proc lock, and then imply that > > the proc lock protexts the data (since they will never divorce, even > > on deallocation, because the proc 
structs go to a free list, unless > > the memory is freed back to the system). I rather expected the > > statistical data, which is algorithm dependent, to be broken out. > > > Yes, I agree, this is an important next step. I'm thinking that the > scheduler should indicate how much space is needed to the proc allocation > code. This much extra space could be allocated, and a pointer to > scheduler specific data could really be a pointer within that allocated > structure. This way it might be near enough for processor caches to be > effective. Clearly this needs more work. That is outside of the scope of > the current patch though. If you're going to do this, a low impact way to do it might be do to what the pcpu stuff does and what vm_page_t does to insert MD fields into a MI structure. Cheers, -Peter -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com "All of this is for nothing if we don't go to the stars" - JMS/B5 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Oct 10 10:34:34 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2211B37B401 for ; Thu, 10 Oct 2002 10:34:34 -0700 (PDT) Received: from rootlabs.com (root.org [67.118.192.226]) by mx1.FreeBSD.org (Postfix) with SMTP id 92DB443E9C for ; Thu, 10 Oct 2002 10:34:28 -0700 (PDT) (envelope-from nate@rootlabs.com) Received: (qmail 18175 invoked by uid 1000); 10 Oct 2002 17:34:29 -0000 Date: Thu, 10 Oct 2002 10:34:29 -0700 (PDT) From: Nate Lawson To: Jeff Roberson Cc: arch@freebsd.org Subject: Re: Scheduler patch, ready for commit. 
In-Reply-To: <20021010022058.A23516-100000@mail.chesapeake.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Thu, 10 Oct 2002, Jeff Roberson wrote: > Yes, I agree, this is an important next step. I'm thinking that the > scheduler should indicate how much space is needed to the proc allocation > code. This much extra space could be allocated, and a pointer to > scheduler specific data could really be a pointer within that allocated > structure. This way it might be near enough for processor caches to be > effective. Clearly this needs more work. That is outside of the scope of > the current patch though. > > Thanks, > Jeff Just a minor point: if it's allocated as one chunk, the offset will need to be word-aligned. -Nate To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Oct 10 19:33:23 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E9AC937B401; Thu, 10 Oct 2002 19:33:21 -0700 (PDT) Received: from shuttle.wide.toshiba.co.jp (shuttle.wide.toshiba.co.jp [202.249.10.124]) by mx1.FreeBSD.org (Postfix) with ESMTP id A1BD943E8A; Thu, 10 Oct 2002 19:33:20 -0700 (PDT) (envelope-from jinmei@isl.rdc.toshiba.co.jp) Received: from localhost ([3ffe:501:4819:2000:200:39ff:fe10:85d7]) by shuttle.wide.toshiba.co.jp (8.11.6/8.9.1) with ESMTP id g9B2XAt31374; Fri, 11 Oct 2002 11:33:10 +0900 (JST) Date: Fri, 11 Oct 2002 11:33:52 +0900 Message-ID: From: JINMEI Tatuya / =?ISO-2022-JP?B?GyRCP0BMQEMjOkgbKEI=?= To: "Sam Leffler" Cc: "Julian Elischer" , , Subject: Re: CFR: m_tag patch In-Reply-To: <18d301c26e5e$8b5c7a30$52557f42@errno.com> References: <18d301c26e5e$8b5c7a30$52557f42@errno.com> 
User-Agent: Wanderlust/2.6.1 (Upside Down) Emacs/21.2 Mule/5.0 (SAKAKI) Organization: Research & Development Center, Toshiba Corp., Kawasaki, Japan. MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya") Content-Type: text/plain; charset=US-ASCII X-Dispatcher: imput version 20000228(IM140) Lines: 42 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG >>>>> On Mon, 7 Oct 2002 17:06:25 -0700, >>>>> "Sam Leffler" said: >> > If you allocate tag id's using your 32-bit time scheme then the fixed > values >> > above would never be hit since they are all for impossible times and so >> > there'd be no conflict. >> >> Just make them all IDs in a single "Legacy" API >> > Good idea; I see the way out. Try this:
> struct m_tag {
>         SLIST_ENTRY(m_tag) m_tag_link;   /* List of packet tags */
>         u_int16_t          m_tag_id;     /* Tag ID */
>         u_int16_t          m_tag_len;    /* Length of data */
>         u_int32_t          m_tag_cookie; /* Module/ABI */
> };
> Then define the "Legacy ABI" to be zero (or whatever you want). Then all > the m_tag_* routines that I specified work only for the Legacy ABI. > (Whether this is done with shims or whatever doesn't matter.) This gives me > the compatibility I want with openbsd and gives you the functionality you > need for netgraph. For new work we can specify users should avoid the > Legacy ABI. > Cost is basically 4 bytes per tag and an extra compare when walking the > tags. Happy? Sorry for interrupting, but please let me make sure. Do you intend to hide the additional member from modules other than the m_tag internals? I'm afraid of a scenario where (e.g.) some code fragments in the network layer refer directly to m_tag_cookie, which would break source-level compatibility with other BSDs (when the code fragments are shared with others). As suz said before, we (KAME) are very much afraid of this kind of scenario. JINMEI, Tatuya Communication Platform Lab. 
Corporate R&D Center, Toshiba Corp. jinmei@isl.rdc.toshiba.co.jp To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Oct 11 3: 3:32 2002 Delivered-To: freebsd-arch@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 931) id BA36E37B401; Fri, 11 Oct 2002 03:03:28 -0700 (PDT) Date: Fri, 11 Oct 2002 03:03:28 -0700 From: Juli Mallett To: Robert Watson Cc: Garrett Wollman , arch@FreeBSD.org Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] Message-ID: <20021011030328.A92175@FreeBSD.org> References: <20021008150324.A47084@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: ; from rwatson@FreeBSD.org on Wed, Oct 09, 2002 at 01:31:10PM -0400 Organisation: The FreeBSD Project X-Alternate-Addresses: , , , , X-Towel: Yes X-LiveJournal: flata, jmallett X-Negacore: Yes Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG * De: Robert Watson [ Data: 2002-10-09 ] [ Subjecte: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] ] > On Tue, 8 Oct 2002, Juli Mallett wrote: > > > * De: Garrett Wollman [ Data: 2002-10-05 ] > > [ Subjecte: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] ] > > > >The most notable change is that the most recently sent && lowest > > > >numbered signal is sent, in the normal course of events, rather than > > > >simply the lowest numbered or most recently sent. > > > > > > This still isn't right. Real-time signals are QUEUED -- i.e., signals > > > of the same species are delivered in FIFO, not LIFO, order. POSIX > > > further specifies that signal N will be delivered before signal N+k, > > > for SIGRTMIN <= N <= N+k <= SIGRTMAX. 
The relative delivery order of > > > any signals outside of this range is unspecified beyond the special > > > behavior of SIGCONT, SIGSTOP, and SIGKILL. > > > > OK, I'm reading through this stuff extensively. There's a number of > > kernel interfaces that I'd like to add, related to them, but first thing > > is to get the queueing in there, IMHO, so that the base functionality is > > there to be built on. sigqueue() for example is about 10LOC with this > > stuff, and adding 'si_errno' stuff (which I'll love to have around) is > > just a matter of 4 lines of code wherever it can be used, once I've > > added a supportable in-kernel abstraction of psignal that takes a ksi, > > and does the normal sanity checks. > > > > That will make psignal about 12LOC, given that there's about 2LOC more > > than sigqueue() needed, as most of that is allocation and filling out a > > structure. > > Lines of code is not a good measure of complexity, especially when what > you're doing is moving and introducing complexity in other bits of the > code. I appreciate that this improves the abstractions some, but > psignal() is actually not all that terrible. Actually, in this case, it's more than that. It's expository with regard to the fact that this abstracts a few existing things, and further, it makes it actually possible to get a siginfo_t to/from somewhere. This is more or less impossible to do right with the current code. > > So assuming the FIFO behaviour is fixed, and that I also deliver the > > lowest available signal, and given that I plan to implement the above, > > do you have any further objections? > > > > Other than the issue of the bitmask, which I see no easy and reliable > > method for getting around cleanly... And the failure cases. Would you > > settle for me using subr_sigq.c as my abstraction, and making actual > > queues optional, and having it use sigset_t under certain circumstances? 
> > It will add about 8LOC to every sendsig() to support pulling out the > > information when no ksiginfo is around. > > Signal queues involve failures. If at all possible, I'd like us to use a > strategy that: > > (2) Avoids the failure modes of signal queues in situations where access > to the signal data is not critical (i.e., if the receiving process > isn't requesting information on signals, don't store it -- I don't > know if the POSIX API supports this semantic though). I'm playing with an idea in my head such that: in a signal queuer/sender(not sendsig, that's really md_postsig -- it posts a signal),
1. signal_add is called.
   a. Does this signal have an SA_SIGINFO handler?
      I. Were we given a ksiginfo to queue?
         1. Allocate one... Does that fail?
            a. Invoke an OOM killer, or such.
         :
            a. Continue.
      :
         1. Enqueue it!
   :
      I. Add it to the bitmask.
And I'd resurrect the bitmask for plain signals, but I'd put all the operations inside wrapped versions in subr_sigq, so that the higher-level has no idea what is going on. > (3) Leaves the failure mode semantics up to the caller, so that the caller > can decide if the signal delivery attempt is something worth retrying > or just ignoring. The Linux behavior I looked at (and told you > about) is that the actual signal queueing routines return EAGAIN if > the slab allocation fails, permitting the caller to retry if it wants, > or more likely, simply drop the signal. I believe this is how Linux > handles slab allocator failures for things like SIGIO, SIGCHLD, etc. Along with the above, would returning EAGAIN satisfy you? > In terms of strategy for supporting a change to queued signals, my > recommendation would be that you go ahead and implement the POSIX realtime > signals based on your structural changes in a local tree and make sure the > structural changes end up doing what you need. Then present the whole > bundle as one big patch on arch@, along with an indication of how the > elements of the commit relate. 
I'm doing this in the jmallett_hack branch in Perforce now, so others will have access to this, and so I don't have to lug around patch files, or 10 different local CVS repositories. This stuff is too low-impact to warrant that for me. As for the further implementation, I'm going to look at sigqueue() and friends soon. I read the whole of the specs the other day, and it cleared up a good number of my concerns, and also gave me a handful of good ideas. I will go back over them some time soon, and have a go at implementing some of the more basic signal queue/rts interfaces. I want to lay groundwork, something to build on in the future. I'm also very concerned that we have siginfo_t in the kernel right now, but it's always a lie, bogus, etc., and pretty much has no business being there. Sound OK? Thanks, juli. -- Juli Mallett | FreeBSD: The Power To Serve Will break world for fulltime employment. | finger jmallett@FreeBSD.org http://people.FreeBSD.org/~jmallett/ | Support my FreeBSD hacking! 
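[Editorial sketch] One reading of the signal_add outline in the message above, combined with the EAGAIN behaviour Robert Watson asks about, can be modelled in plain userspace C. This is only an illustration under stated assumptions: `struct sigq`, `struct ksiginfo` as laid out here, `sigq_add()`, and the `fail_alloc` switch that simulates an allocation failure are all hypothetical names, not code from the jmallett_hack branch or from subr_sigq.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SIGQ_NSIG 64

/* Hypothetical queued record; a real ksiginfo would carry a full siginfo_t. */
struct ksiginfo {
	struct ksiginfo *next;
	int signo;
	int code;
};

/* Hypothetical per-process state: a bitmask for plain signals plus a FIFO. */
struct sigq {
	unsigned long long pending;    /* bit (signo - 1): plain signal pending */
	unsigned long long wants_info; /* bit (signo - 1): SA_SIGINFO handler set */
	struct ksiginfo *head, *tail;  /* FIFO of queued records */
	int fail_alloc;                /* test hook: pretend allocation fails */
};

static struct ksiginfo *
ksi_alloc(struct sigq *q)
{
	return (q->fail_alloc ? NULL : calloc(1, sizeof(struct ksiginfo)));
}

/*
 * Post signo to q.  Returns 0 on success, EINVAL for a bad signal number,
 * or EAGAIN when a record is needed but cannot be allocated.  On EAGAIN
 * the target state is left untouched and the caller decides what to do.
 */
int
sigq_add(struct sigq *q, int signo, const struct ksiginfo *info)
{
	unsigned long long bit;
	struct ksiginfo *ksi;

	assert(q != NULL);
	if (signo < 1 || signo > SIGQ_NSIG)
		return (EINVAL);
	bit = 1ULL << (signo - 1);

	if ((q->wants_info & bit) == 0) {
		/* No SA_SIGINFO handler: the bitmask is all we need. */
		q->pending |= bit;
		return (0);
	}

	ksi = ksi_alloc(q);
	if (ksi == NULL)
		return (EAGAIN);
	if (info != NULL)
		*ksi = *info;          /* caller supplied the details */
	ksi->signo = signo;
	ksi->next = NULL;

	/* Append at the tail so same-numbered signals stay in FIFO order. */
	if (q->tail != NULL)
		q->tail->next = ksi;
	else
		q->head = ksi;
	q->tail = ksi;
	return (0);
}
```

In this reading a plain signal never allocates, so it cannot fail for lack of memory; only the SA_SIGINFO path can return EAGAIN. Whether the pending bit should also be set for queued signals is a design choice the thread leaves open.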
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Oct 11 5:28:55 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7672837B401; Fri, 11 Oct 2002 05:28:54 -0700 (PDT) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id DD2F843EA9; Fri, 11 Oct 2002 05:28:53 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Received: from mousie.catspoiler.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.5/8.12.5) with ESMTP id g9BCIpvU045027; Fri, 11 Oct 2002 05:18:55 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Message-Id: <200210111218.g9BCIpvU045027@gw.catspoiler.org> Date: Fri, 11 Oct 2002 05:18:51 -0700 (PDT) From: Don Lewis Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] To: jmallett@FreeBSD.ORG Cc: rwatson@FreeBSD.ORG, wollman@lcs.mit.edu, arch@FreeBSD.ORG In-Reply-To: <20021011030328.A92175@FreeBSD.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 11 Oct, Juli Mallett wrote: > * De: Robert Watson [ Data: 2002-10-09 ] >> Signal queues involve failures. If at all possible, I'd like us to use a >> strategy that: >> >> (2) Avoids the failure modes of signal queues in situations where access >> to the signal data is not critical (i.e., if the receiving process >> isn't requesting information on signals, don't store it -- I don't >> know if the POSIX API supports this semantic though). 
The Solaris man page for sigqueue() doesn't specifically say what happens in this case, but it implies that just setting a signal bit on the target process is one of the possibilities. > I'm playing with an idea in my head such that: > in a signal queuer/sender(not sendsig, that's really md_postsig -- it posts > a signal), > 1. signal_add is called. > a. Does this signal have an SA_SIGINFO handler? > I. Were we given a ksiginfo to queue? > 1. Allocate one... Does that fail? > a. Invoke an OOM killer, or such. Solaris returns an EAGAIN to the caller and the target is unaffected. If the caller really wants to nuke the target, it could retry with kill(). The same error will be returned if there are too many signals in the target's queue, which should prevent the signal queue for a wedged process from consuming all of kmem. > : > a. Continue. > : > 1. Enqueue it! > : > I. Add it to the bitmask. > > > And I'd resurrect the bitmask for plain signals, but I'd put all the > operations inside wrapped versions in subr_sigq, so that the higher-level > has no idea what is going on. 
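[Editorial sketch] The queue limit described above (EAGAIN once the target already has too many signals queued, so a wedged process cannot consume all of kmem) can be illustrated with a fixed-size ring buffer. This is a hedged userspace model, not Solaris or FreeBSD code: `SIGQ_MAX`, `struct bounded_sigq`, `bounded_sigq_post()` and `bounded_sigq_take()` are hypothetical names, and the limit of 4 is chosen only to keep the demonstration small.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SIGQ_MAX 4 /* hypothetical per-target limit, tiny on purpose */

struct sigq_entry {
	int signo;
	int value;
};

struct bounded_sigq {
	struct sigq_entry slot[SIGQ_MAX];
	size_t head;  /* index of the oldest entry */
	size_t count; /* number of entries currently queued */
};

/*
 * Queue one entry.  Returns 0 on success or EAGAIN when the target's
 * queue is full; in the EAGAIN case nothing about the target changes.
 */
int
bounded_sigq_post(struct bounded_sigq *q, int signo, int value)
{
	size_t tail;

	assert(q != NULL);
	if (q->count == SIGQ_MAX)
		return (EAGAIN);
	tail = (q->head + q->count) % SIGQ_MAX;
	q->slot[tail].signo = signo;
	q->slot[tail].value = value;
	q->count++;
	return (0);
}

/* Dequeue the oldest entry into *out.  Returns 1 if one was taken, else 0. */
int
bounded_sigq_take(struct bounded_sigq *q, struct sigq_entry *out)
{
	assert(q != NULL && out != NULL);
	if (q->count == 0)
		return (0);
	*out = q->slot[q->head];
	q->head = (q->head + 1) % SIGQ_MAX;
	q->count--;
	return (1);
}
```

Because the storage is bounded up front, a sender looping on a stuck target eventually sees EAGAIN and can choose another action, which is the behaviour the message attributes to Solaris.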
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Oct 11 5:37:21 2002 Delivered-To: freebsd-arch@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 931) id 2F0AA37B401; Fri, 11 Oct 2002 05:37:20 -0700 (PDT) Date: Fri, 11 Oct 2002 05:37:20 -0700 From: Juli Mallett To: Don Lewis Cc: rwatson@FreeBSD.ORG, wollman@lcs.mit.edu, arch@FreeBSD.ORG Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] Message-ID: <20021011053720.A2431@FreeBSD.org> References: <20021011030328.A92175@FreeBSD.org> <200210111218.g9BCIpvU045027@gw.catspoiler.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200210111218.g9BCIpvU045027@gw.catspoiler.org>; from dl-freebsd@catspoiler.org on Fri, Oct 11, 2002 at 05:18:51AM -0700 Organisation: The FreeBSD Project X-Alternate-Addresses: , , , , X-Towel: Yes X-LiveJournal: flata, jmallett X-Negacore: Yes Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG * De: Don Lewis [ Data: 2002-10-11 ] [ Subjecte: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] ] > On 11 Oct, Juli Mallett wrote: > > * De: Robert Watson [ Data: 2002-10-09 ] > > >> Signal queues involve failures. If at all possible, I'd like us to use a > >> strategy that: > >> > >> (2) Avoids the failure modes of signal queues in situations where access > >> to the signal data is not critical (i.e., if the receiving process > >> isn't requesting information on signals, don't store it -- I don't > >> know if the POSIX API supports this semantic though). 
> > The Solaris man page for sigqueue() doesn't specifically say what > happens in this case, but it implies that just setting a signal bit on > the target process is one of the possibilities. I'd really really rather avoid that, but I'm not wholly opposed to it. > > I'm playing with an idea in my head such that: > > in a signal queuer/sender(not sendsig, that's really md_postsig -- it posts > > a signal), > > 1. signal_add is called. > > a. Does this signal have an SA_SIGINFO handler? > > I. Were we given a ksiginfo to queue? > > 1. Allocate one... Does that fail? > > a. Invoke an OOM killer, or such. > > Solaris returns an EAGAIN to the caller and the target is unaffected. If > the caller really wants to nuke the target, it could retry with kill(). > The same error will be returned if there are too many signals in the > target's queue, which should prevent the signal queue for a wedged > process from consuming all of kmem. Uhm, not really. Retrying with SIGKILL won't result in the signal being queued. -- Juli Mallett | FreeBSD: The Power To Serve Will break world for fulltime employment. | finger jmallett@FreeBSD.org http://people.FreeBSD.org/~jmallett/ | Support my FreeBSD hacking! 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Oct 11 5:40:54 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 002AA37B401; Fri, 11 Oct 2002 05:40:52 -0700 (PDT) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 29F1343E9E; Fri, 11 Oct 2002 05:40:40 -0700 (PDT) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.12.4/8.12.4) with SMTP id g9BCbmOo049187; Fri, 11 Oct 2002 08:37:48 -0400 (EDT) (envelope-from robert@fledge.watson.org) Date: Fri, 11 Oct 2002 08:37:48 -0400 (EDT) From: Robert Watson X-Sender: robert@fledge.watson.org To: Don Lewis Cc: jmallett@FreeBSD.ORG, wollman@lcs.mit.edu, arch@FreeBSD.ORG Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] In-Reply-To: <200210111218.g9BCIpvU045027@gw.catspoiler.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Fri, 11 Oct 2002, Don Lewis wrote: > > I'm playing with an idea in my head such that: > > in a signal queuer/sender(not sendsig, that's really md_postsig -- it posts > > a signal), > > 1. signal_add is called. > > a. Does this signal have an SA_SIGINFO handler? > > I. Were we given a ksiginfo to queue? > > 1. Allocate one... Does that fail? > > a. Invoke an OOM killer, or such. > > Solaris returns an EAGAIN to the caller and the target is unaffected. If > the caller really wants to nuke the target, it could retry with kill(). 
> The same error will be returned if there are too many signals in the > target's queue, which should prevent the signal queue for a wedged > process from consuming all of kmem. Agreed. I think it would be best if the signal code itself didn't kill processes (Well, with the exception of cases where it is supposed to :-) to reclaim resources. Or, if that's the best place to put it, the caller should definitely be able to indicate its disposition with regards to failure modes. The temptation would be (assuming this was feasible): 1 If the target isn't doing anything special for the signal, don't pay the price of reliable delivery. 2 If the target is doing something special for the signal, allow the code attempting to deliver the signal figure out what to do if it fails. I know that (2) is possible, because Linux does that. I don't know much/anything about (1), but the conversation seems suggestive that that is possible. I'd be comfortable with this route as the experimental direction to see how well it all pulls together in the Perforce branch. However, for each case where we're considering (2) for a kernel generated signal, we need to determine what (if any) failure mode is appropriate. That would probably take looking at the specs closely, looking at other implementations, etc. 
Robert N M Watson FreeBSD Core Team, TrustedBSD Projects robert@fledge.watson.org Network Associates Laboratories To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Oct 11 5:53:38 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A442F37B404; Fri, 11 Oct 2002 05:53:36 -0700 (PDT) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2539A43E91; Fri, 11 Oct 2002 05:53:34 -0700 (PDT) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.12.4/8.12.4) with SMTP id g9BCr0Oo049306; Fri, 11 Oct 2002 08:53:00 -0400 (EDT) (envelope-from robert@fledge.watson.org) Date: Fri, 11 Oct 2002 08:52:59 -0400 (EDT) From: Robert Watson X-Sender: robert@fledge.watson.org To: Juli Mallett Cc: Don Lewis , wollman@lcs.mit.edu, arch@FreeBSD.org Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] In-Reply-To: <20021011053720.A2431@FreeBSD.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Fri, 11 Oct 2002, Juli Mallett wrote: > > Solaris returns an EAGAIN to the caller and the target is unaffected. If > > the caller really wants to nuke the target, it could retry with kill(). > > The same error will be returned if there are too many signals in the > > target's queue, which should prevent the signal queue for a wedged > > process from consuming all of kmem. > > Uhm, not really. Retrying with SIGKILL won't result in the signal being > queued. 
I think you may be missing the thrust: there are two sources of signals in the world: (1) User processes signalling each other or themselves. (2) Kernel services signalling user processes in response to a trap or an event. In both cases, we're talking about an EAGAIN error getting returned if insufficient resources are available to the source of the signal, and in both cases, we may be interested in a fail-stop approach. The case I believe Don is talking about specifically is the: Application boomctl tries to deliver SIGUSR1 to boomd, the reliable boom daemon. boomctl gets back EAGAIN because the kernel does not have the resources to reliably deliver the signal, and boomd has a handler for SIGUSR1. boomctl/boomd have fail-stop semantics, so boomctl calls kill(boomd_pid, SIGKILL). Or, if it doesn't care about the failure very much, it queues the instance delivery via some other sort of non-asynchronous-delivery IPC. This permits fail-stop semantics where they are needed, but doesn't force them on applications that would rather not stop. Another case to consider is that of init. Init may be interested in SIGCHLD with process information, but not so interested that it wants to be terminated if the pid can't be delivered with a siginfo; it can always call wait(). You care a lot about reliable init behavior in a memory constraint situation because if init dies, your system either halts or panics, depending on the circumstance. 
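[Editorial sketch] The boomctl/boomd scenario above can be written against the standard POSIX calls `sigqueue()` and `kill()`. The helper name `notify_failstop()` and its return convention are hypothetical, and this only illustrates the caller-side policy being discussed; it assumes a system that provides `sigqueue()`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Queue signo with an integer payload to pid.  If the system reports
 * EAGAIN (no resources to queue the signal), apply fail-stop semantics
 * and send SIGKILL with kill(), which needs no queue entry.
 *
 * Returns 0 if the signal was queued, 1 if we fell back to SIGKILL,
 * and -1 on any other error (errno is left set by the failing call).
 */
int
notify_failstop(pid_t pid, int signo, int payload)
{
	union sigval sv;

	/* kill() treats pid <= 0 as a process group; refuse that here. */
	assert(pid > 0);

	sv.sival_int = payload;
	if (sigqueue(pid, signo, sv) == 0)
		return (0);
	if (errno != EAGAIN)
		return (-1);
	if (kill(pid, SIGKILL) == -1)
		return (-1);
	return (1);
}
```

A caller that does not want fail-stop behaviour, such as the init example above, would simply treat EAGAIN as "try something else" (for init, calling wait()) instead of escalating to SIGKILL.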
Robert N M Watson FreeBSD Core Team, TrustedBSD Projects robert@fledge.watson.org Network Associates Laboratories To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Oct 11 6: 4:42 2002 Delivered-To: freebsd-arch@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 931) id E087C37B401; Fri, 11 Oct 2002 06:04:40 -0700 (PDT) Date: Fri, 11 Oct 2002 06:04:40 -0700 From: Juli Mallett To: Robert Watson Cc: Don Lewis , wollman@lcs.mit.edu, arch@FreeBSD.org Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] Message-ID: <20021011060440.A5569@FreeBSD.org> References: <20021011053720.A2431@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: ; from rwatson@FreeBSD.org on Fri, Oct 11, 2002 at 08:52:59AM -0400 Organisation: The FreeBSD Project X-Alternate-Addresses: , , , , X-Towel: Yes X-LiveJournal: flata, jmallett X-Negacore: Yes Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG * De: Robert Watson [ Data: 2002-10-11 ] [ Subjecte: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] ] > > On Fri, 11 Oct 2002, Juli Mallett wrote: > > > > Solaris returns an EAGAIN to the caller and the target is unaffected. If > > > the caller really wants to nuke the target, it could retry with kill(). > > > The same error will be returned if there are too many signals in the > > > target's queue, which should prevent the signal queue for a wedged > > > process from consuming all of kmem. > > > > Uhm, not really. Retrying with SIGKILL won't result in the signal being > > queued. > > I think you may be missing the thrust: there are two sources of signals in > the world: My lexical analysis failed, that's all. 
IWPTA "if the sender tries to resend with SIGKILL, it will just fail again because of the queueing", because that's something a number of people have brought up with me, ignoring the fact that we special-process such things. Thanks for triggering a re-parse, juli. -- Juli Mallett | FreeBSD: The Power To Serve Will break world for fulltime employment. | finger jmallett@FreeBSD.org http://people.FreeBSD.org/~jmallett/ | Support my FreeBSD hacking! To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Oct 11 6: 6:17 2002 Delivered-To: freebsd-arch@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 931) id 147AD37B401; Fri, 11 Oct 2002 06:06:16 -0700 (PDT) Date: Fri, 11 Oct 2002 06:06:16 -0700 From: Juli Mallett To: Robert Watson Cc: Don Lewis , wollman@lcs.mit.edu, arch@FreeBSD.ORG Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] Message-ID: <20021011060615.B5569@FreeBSD.org> References: <200210111218.g9BCIpvU045027@gw.catspoiler.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: ; from rwatson@FreeBSD.ORG on Fri, Oct 11, 2002 at 08:37:48AM -0400 Organisation: The FreeBSD Project X-Alternate-Addresses: , , , , X-Towel: Yes X-LiveJournal: flata, jmallett X-Negacore: Yes Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG * De: Robert Watson [ Data: 2002-10-11 ] [ Subjecte: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] ] > On Fri, 11 Oct 2002, Don Lewis wrote: > > > > I'm playing with an idea in my head such that: > > > in a signal queuer/sender(not sendsig, that's really md_postsig -- it posts > > > a signal), > > > 1. signal_add is called. > > > a. Does this signal have an SA_SIGINFO handler? 
> > > I. Were we given a ksiginfo to queue? > > > 1. Allocate one... Does that fail? > > > a. Invoke an OOM killer, or such. > > > > Solaris returns an EAGAIN to the caller and the target is unaffected. If > > the caller really wants to nuke the target, it could retry with kill(). > > The same error will be returned if there are too many signals in the > > target's queue, which should prevent the signal queue for a wedged > > process from consuming all of kmem. > > Agreed. I think it would be best if the signal code itself didn't kill > processes (Well, with the exception of cases where it is supposed to :-) > to reclaim resources. Or, if that's the best place to put it, the caller > should definitely be able to indicate its disposition with regards to > failure modes. The temptation would be (assuming this was feasible): > > 1 If the target isn't doing anything special for the signal, don't pay the > price of reliable delivery. That's what top-1.a. was for, and of course we can do this, because if the handler isn't SA_SIGINFO, the second argument is u_long code, not a siginfo_t, ergo all we need to have is traditional-quality code delivery... Which sorta sucks :) > 2 If the target is doing something special for the signal, allow the code > attempting to deliver the signal figure out what to do if it fails. Of course this can be done, too. -- Juli Mallett | FreeBSD: The Power To Serve Will break world for fulltime employment. | finger jmallett@FreeBSD.org http://people.FreeBSD.org/~jmallett/ | Support my FreeBSD hacking! 
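[Editorial sketch] For the SA_SIGINFO case discussed above, the userland view looks roughly like this: a handler installed with SA_SIGINFO receives a siginfo_t, and a signal sent with `sigqueue()` arrives with `si_code` set to `SI_QUEUE` and the payload in `si_value`. The function name `queue_to_self()` and the two globals are hypothetical; the calls themselves are standard POSIX. A handler installed without SA_SIGINFO would not get the siginfo_t, which is why nothing needs to be stored for it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t last_code;  /* si_code seen by the handler */
static volatile sig_atomic_t last_value; /* payload seen by the handler */

/* Three-argument handler form, used only when SA_SIGINFO is set. */
static void
on_queued(int signo, siginfo_t *si, void *ucontext)
{
	(void)signo;
	(void)ucontext;
	last_code = si->si_code;
	last_value = si->si_value.sival_int;
}

/*
 * Install an SA_SIGINFO handler for signo and queue signo to ourselves
 * with an integer payload.  Returns 0 on success, -1 on error.
 */
int
queue_to_self(int signo, int payload)
{
	struct sigaction sa;
	union sigval sv;
	sigset_t set;

	/* This sketch only handles the two user-defined signals. */
	assert(signo == SIGUSR1 || signo == SIGUSR2);

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_queued;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(signo, &sa, NULL) == -1)
		return (-1);

	/* Make sure the signal is not blocked, so delivery is not deferred. */
	sigemptyset(&set);
	sigaddset(&set, signo);
	if (sigprocmask(SIG_UNBLOCK, &set, NULL) == -1)
		return (-1);

	sv.sival_int = payload;
	return (sigqueue(getpid(), signo, sv));
}
```

In a single-threaded process with the signal unblocked, POSIX requires the signal to be delivered before `sigqueue()` returns, so the globals can be inspected right after the call.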
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Oct 11 6:12:17 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4FE9E37B401; Fri, 11 Oct 2002 06:12:15 -0700 (PDT) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8C26843E9E; Fri, 11 Oct 2002 06:12:14 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Received: from mousie.catspoiler.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.5/8.12.5) with ESMTP id g9BDC3vU045194; Fri, 11 Oct 2002 06:12:07 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Message-Id: <200210111312.g9BDC3vU045194@gw.catspoiler.org> Date: Fri, 11 Oct 2002 06:12:03 -0700 (PDT) From: Don Lewis Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] To: rwatson@FreeBSD.ORG Cc: dl-freebsd@catspoiler.org, jmallett@FreeBSD.ORG, wollman@lcs.mit.edu, arch@FreeBSD.ORG In-Reply-To: MIME-Version: 1.0 Content-Type: TEXT/plain; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 11 Oct, Robert Watson wrote: > On Fri, 11 Oct 2002, Don Lewis wrote: > >> > I'm playing with an idea in my head such that: >> > in a signal queuer/sender(not sendsig, that's really md_postsig -- it posts >> > a signal), >> > 1. signal_add is called. >> > a. Does this signal have an SA_SIGINFO handler? >> > I. Were we given a ksiginfo to queue? >> > 1. Allocate one... Does that fail? >> > a. Invoke an OOM killer, or such. >> >> Solaris returns an EAGAIN to the caller and the target is unaffected. If >> the caller really wants to nuke the target, it could retry with kill(). 
>> The same error will be returned if there are too many signals in the >> target's queue, which should prevent the signal queue for a wedged >> process from consuming all of kmem. > > Agreed. I think it would be best if the signal code itself didn't kill > processes (Well, with the exception of cases where it is supposed to :-) > to reclaim resources. Or, if that's the best place to put it, the caller > should definitely be able to indicate its disposition with regards to > failure modes. The temptation would be (assuming this was feasible): > > 1 If the target isn't doing anything special for the signal, don't pay the > price of reliable delivery. > 2 If the target is doing something special for the signal, allow the code > attempting to deliver the signal figure out what to do if it fails. > > I know that (2) is possible, because Linux does that. I don't know > much/anything about (1), but the conversation seems suggestive that that > is possible. I'd be comfortable with this route as the experimental > direction to see how well it all pulls together in the Perforce branch. > However, for each case where we're considering (2) for a kernel generated > signal, we need to determine what (if any) failure mode is appropriate. > That would probably take looking at the specs closely, looking at other > implementations, etc. Alas, RH 7.3 doesn't seem to have a man page for sigqueue(), so I don't know much about its failure modes. The sigaction() man page describes all sorts of wonderful things that can be returned in the siginfo structure. One thing in the Solaris implementation that is not in the Linux implementation is the SI_NOINFO value for si_code, which indicates that no other information is being returned. Nothing needs to be allocated on the kernel side to implement this, and it looks like a reasonable precedent for doing an incremental implementation. I wonder if the Linux version actually queues the information for SIGSEGV, etc. 
If the info is returned only when the signal is enabled at the time the error occurs, then the info could be just copied back to user space by the trap handler (well, it's not that easy because of the way we have to return to user space to invoke the signal handler ...). It should be easy to cheat for SIGCHLD. The information can be harvested from the not yet waited-for child process. The Linux implementation appears to have made provisions for returning information for SIGIO, but it doesn't appear to be implemented yet. I wonder why that is ... To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Oct 11 6:20:10 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 772CA37B401; Fri, 11 Oct 2002 06:20:09 -0700 (PDT) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 01CB743E88; Fri, 11 Oct 2002 06:20:09 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Received: from mousie.catspoiler.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.5/8.12.5) with ESMTP id g9BDJxvU045210; Fri, 11 Oct 2002 06:20:03 -0700 (PDT) (envelope-from dl-freebsd@catspoiler.org) Message-Id: <200210111320.g9BDJxvU045210@gw.catspoiler.org> Date: Fri, 11 Oct 2002 06:19:59 -0700 (PDT) From: Don Lewis Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., [for review]] To: jmallett@FreeBSD.ORG Cc: rwatson@FreeBSD.ORG, wollman@lcs.mit.edu, arch@FreeBSD.ORG In-Reply-To: <20021011053720.A2431@FreeBSD.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 11 Oct, Juli Mallett wrote: > * From: Don Lewis [ Date: 2002-10-11 ] >> Solaris
returns an EAGAIN to the caller and the target is unaffected. If >> the caller really wants to nuke the target, it could retry with kill(). >> The same error will be returned if there are too many signals in the >> target's queue, which should prevent the signal queue for a wedged >> process from consuming all of kmem. > > Uhm, not really. Retrying with SIGKILL won't result in the signal being > queued. The sender may be periodically sending SIGUSR1 in a loop to wake up an associated process. If the other process is stuffed up, then the sender will eventually find out when the target's queue fills up. It can then use kill(), or take whatever other action is appropriate. Have you never been blessed with a process stuck in an uninterruptible wait that even SIGKILL won't touch? It's a real PITA when this happens to a process on THE VERY IMPORTANT SERVER that happens to have a file descriptor open on the only tape drive. The only way to unwedge things is to reboot. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Oct 11 11:12: 8 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 52CE137B401; Fri, 11 Oct 2002 11:12:06 -0700 (PDT) Received: from ebb.errno.com (ebb.errno.com [66.127.85.87]) by mx1.FreeBSD.org (Postfix) with ESMTP id B395343EA3; Fri, 11 Oct 2002 11:12:05 -0700 (PDT) (envelope-from sam@errno.com) Received: from melange (melange.errno.com [66.127.85.82]) (authenticated bits=0) by ebb.errno.com (8.12.5/8.12.1) with ESMTP id g9BIC21I029236 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Fri, 11 Oct 2002 11:12:02 -0700 (PDT) (envelope-from sam@errno.com) X-Authentication-Warning: ebb.errno.com: Host melange.errno.com [66.127.85.82] claimed to be melange Message-ID: <080101c27151$b2e92a30$52557f42@errno.com> From: "Sam Leffler" To: Cc: "Julian Elischer" , ,
References: <18d301c26e5e$8b5c7a30$52557f42@errno.com> Subject: Re: CFR: m_tag patch Date: Fri, 11 Oct 2002 11:12:02 -0700 Organization: Errno Consulting MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4807.1700 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG > >>>>> On Mon, 7 Oct 2002 17:06:25 -0700, > >>>>> "Sam Leffler" said: > > >> > If you allocate tag IDs using your 32-bit time scheme then the fixed values > >> > above would never be hit since they are all for impossible times and so > >> > there'd be no conflict. > >> > >> Just make them all IDs in a single "Legacy" API > >> > > > Good idea; I see the way out. Try this: > > > struct m_tag { > > SLIST_ENTRY(m_tag) m_tag_link; /* List of packet tags */ > > u_int16_t m_tag_id; /* Tag ID */ > > u_int16_t m_tag_len; /* Length of data */ > > u_int32_t m_tag_cookie; /* Module/ABI */ > > }; > > > Then define the "Legacy ABI" to be zero (or whatever you want). Then all > > the m_tag_* routines that I specified work only for the Legacy ABI. > > (Whether this is done with shims or whatever doesn't matter.) This gives me > > the compatibility I want with OpenBSD and gives you the functionality you > > need for netgraph. For new work we can specify users should avoid the > > Legacy ABI. > > > Cost is basically 4 bytes per tag and an extra compare when walking the > > tags. Happy? > > Sorry for interrupting, but please let me make sure. Do you intend > to hide the additional member from other modules than the m_tag > internal? I'm afraid of a story where (e.g.)
some code fragments in the > network layer directly refer to m_tag_cookie, which will break source > level compatibility with other BSDs (when the code fragments are > shared with others). As suz said before, we (KAME) are very much > afraid of this kind of story. > The changes I'm proposing for KAME code make no references to m_tag_cookie. Things should be clear when you have a patch to look at. I'm working on getting that to you. Sam To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Oct 11 12: 8:16 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9F0A637B406 for ; Fri, 11 Oct 2002 12:08:14 -0700 (PDT) Received: from spqr.osg.gov.bc.ca (spqr.osg.gov.bc.ca [142.32.102.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id AB2C843E91 for ; Fri, 11 Oct 2002 12:08:13 -0700 (PDT) (envelope-from Cy.Schubert@osg.gov.bc.ca) Received: from passer.osg.gov.bc.ca (passer.osg.gov.bc.ca [142.32.110.29]) by spqr.osg.gov.bc.ca (Postfix) with ESMTP id 31FB69EF18; Fri, 11 Oct 2002 12:08:13 -0700 (PDT) Received: from cwsys.cwsent.com (cwsys2 [10.1.2.1]) by passer.osg.gov.bc.ca (8.12.6/8.12.3) with ESMTP id g9BJ7mKt003025; Fri, 11 Oct 2002 12:07:48 -0700 (PDT) (envelope-from cy@cwsent.com) Received: from cwsys (localhost [127.0.0.1]) by cwsys.cwsent.com (8.12.6/8.12.3) with ESMTP id g9BJ7grW002634; Fri, 11 Oct 2002 12:07:43 -0700 (PDT) (envelope-from cy@cwsys.cwsent.com) Message-Id: <200210111907.g9BJ7grW002634@cwsys.cwsent.com> X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4 Reply-To: Cy Schubert - CITS Open Systems Group From: Cy Schubert - CITS Open Systems Group X-os: FreeBSD X-Sender: cy@cwsent.com To: "Vladimir B. " Grebenschikov Cc: Mikhail Teterin , arch@FreeBSD.ORG Subject: Re: using mem above 4Gb was: swapon some regular file In-Reply-To: Message from "Vladimir B. 
" Grebenschikov of "08 Oct 2002 15:01:16 +0400." <1034074876.917.23.camel@vbook.express.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Date: Fri, 11 Oct 2002 12:07:42 -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG In message <1034074876.917.23.camel@vbook.express.ru>, "Vladimir B. Grebenschikov" writes: > On Tue, 08.10.2002, at 00:30, Mikhail Teterin wrote: > > > Users wishing to swap onto a local regular file have to go through the > > vnconfig/mdconfig gymnastics. Is that intentional? > > Yes. > Maybe we need to add a new type to the md device, like "highmem", to access > memory above 4G as a memory disk, and as a consequence use it as a swap device > or as a fast /tmp/ partition or whatever? > > In this case we will be able to use more than 3Gb of RAM. This is reminiscent of S390 ESTOR.
-- Cheers, Phone: 250-387-8437 Cy Schubert Fax: 250-387-5231 Team Leader, Sun/Alpha Team Email: Cy.Schubert@osg.gov.bc.ca Open Systems Group, CITS Ministry of Management Services Province of BC = FreeBSD UNIX: cy@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Oct 11 23:28:42 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6DB4037B401; Fri, 11 Oct 2002 23:28:41 -0700 (PDT) Received: from softweyr.com (softweyr.com [65.88.244.127]) by mx1.FreeBSD.org (Postfix) with ESMTP id BFDE443E88; Fri, 11 Oct 2002 23:28:40 -0700 (PDT) (envelope-from wes@softweyr.com) Received: from nextgig-9.access.nethere.net ([66.63.140.201] helo=softweyr.com) by softweyr.com with esmtp (Exim 3.35 #1) id 180FlL-0001n4-00; Sat, 12 Oct 2002 00:28:35 -0600 Message-ID: <3DA7C3DF.1CFD6978@softweyr.com> Date: Fri, 11 Oct 2002 23:40:31 -0700 From: Wes Peters Reply-To: arch@freebsd.org Organization: Softweyr LLC X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.2 i386) X-Accept-Language: en MIME-Version: 1.0 To: developers@FreeBSD.ORG Cc: arch@freebsd.org Subject: 6.0 branching (no longer: HEADS UP: 5.0 Feature Freeze October 16, 2002) References: <200210112056.g9BKuZEx041686@apollo.backplane.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Matthew Dillon wrote: > > I definitely agree with the 5.0 time schedule. I also agree with > Julian that it would be premature to branch 5.0 into -stable and > make 6.x -current. We should go through at *least* one more release > cycle (5.1) before branching, IMHO, simply to reduce the amount of > MFCing that would otherwise be necessary. 
Matt brings up a good point here. I'm daring to cross-post this because I want to move THIS discussion to -arch, where it belongs. I've directed replies to -arch. I think we need to discuss when we will branch 6.x. I think we need to wait until we have a 5.x release that is stable enough to consider for production workstation usage levels, and hope we may reach that point by the 5.2 release. I think arbitrarily whacking off a new development branch before 5.x is really and truly stabilized could hurt the FreeBSD project greatly. This is obviously not my decision to make, and many of you know much more about the actual work to be done than I do. Please provide your input. I'm not asking that we make a decision at this point, just getting people thinking about how we might go about this, since it is likely to be different from how we've done it in the past. -- "Where am I, and what am I doing in this handbasket?" Wes Peters Softweyr LLC wes@softweyr.com http://softweyr.com/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Oct 11 23:53:45 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C159E37B401 for ; Fri, 11 Oct 2002 23:53:42 -0700 (PDT) Received: from softweyr.com (softweyr.com [65.88.244.127]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1788E43E77 for ; Fri, 11 Oct 2002 23:53:42 -0700 (PDT) (envelope-from wes@softweyr.com) Received: from nextgig-9.access.nethere.net ([66.63.140.201] helo=softweyr.com) by softweyr.com with esmtp (Exim 3.35 #1) id 180G96-0001p7-00; Sat, 12 Oct 2002 00:53:09 -0600 Message-ID: <3DA7C997.95F3C4FF@softweyr.com> Date: Sat, 12 Oct 2002 00:04:55 -0700 From: Wes Peters Organization: Softweyr LLC X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.2 i386) X-Accept-Language: en MIME-Version: 1.0 To: Terry Lambert Cc: Matthew Dillon , arch@FreeBSD.ORG
Subject: Re: Database indexes and ram (was Re: using mem above 4Gb was:swapon some regular file) References: <1034105993.913.1.camel@vbook.express.ru> <200210082015.g98KFFrq084625@apollo.backplane.com> <1034109053.913.7.camel@vbook.express.ru> <200210082051.g98KpjU1084793@apollo.backplane.com> <3DA4C271.37AACAA3@softweyr.com> <3DA4C632.325F2EBE@mindspring.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Terry Lambert wrote: > > Wes Peters wrote: > > Linux solved this problem by refusing to do it. The candidates for DMA > > transfers include skbufs and buffers from the disk buffer pool, both of > > which are allocated from the lowest 4GB of physical RAM when using PAE > > mode. > > Yes; this is the "Fast RAM/bounce buffer" approach I mentioned > already. Linux has an advantage here, in that they already run > software virtualization on the VM system, in order to try to be > architecture independent. The result is overhead in reverse > lookups that has only recently been fixed (and you need patches > to use it). FreeBSD would eat more overhead doing this, where > it sort of "fell out" of the extra overhead they already eat in > the Linux case. Yup. We could do much the same, but it'll take a bit of architecting. Adding some physical locality preferences to pools in JeffR's slab allocator would be a way to start investigating this, at a guess. > > Nah, it works great. Each process gets 3GB of process virtual address space and > > 1GB of kernel virtual address space, and all of the program text+data can be located > > anywhere in physical RAM. For things like databases that need large > > indices in memory, this is a big win. > > This, I don't get: I don't understand how they can live with only > 1G of KVA space. I guess they are expecting a small number of net > connections...
Per-process. I don't know whether they've made socket PCBs (or their equivalent) per-process or not, but it seems a logical leap. I haven't looked into any of this because for our application, with a relatively small number of connections, it just works. > > Neither will help you with index sizes if you're using really honking big > > tables, where the index just won't fit. We actually use multiple processes > > to hold cached data, including indexes, in order to make use of the extra > > RAM. I should shut up now. ;^) > > ...or you'll have to kill you. 8-) 8-). Gurk! Sad, but true. > > > of accesses to the index that might result in cacheable table data are > > > also the types of accesses to the index that will likely result in > > > cacheable index data. Using the same argument, the types of accesses > > > that might result in an uncacheable index would also likely result in > > > uncacheable table data which means you are going to run up against > > > seek/read problems on the table data, making it more worthwhile to > > > spend the money on beefing up the storage subsystem. > > > > That's only true if your database server is I/O bound. Depending on your > > job mix, this may or may not be the problem. > > Likely, it will not be true, for any very large database, particularly > if you end up doing a reasonable number of joins. Hardly anybody goes > past 3rd normal form, and some people never even get that far. 8-). Some? You've seen a production database that was normalized at ALL? Gee, that'd be... nice? astonishing? like seeing the Pope tour Temple Square? DBA stands for Data Base A..... The key to accelerating database access rarely has much to do with I/O speed. How many Oracle servers do you know that can stuff a Gigabit channel full, even doing straight selects? Memory usage is VERY important and DBAs are not famous for optimizing queries to make efficient use of the processor cache. Or anything else, for that matter.
-- "Where am I, and what am I doing in this handbasket?" Wes Peters Softweyr LLC wes@softweyr.com http://softweyr.com/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Oct 12 0: 7:36 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2445237B401 for ; Sat, 12 Oct 2002 00:07:36 -0700 (PDT) Received: from critter.freebsd.dk (l155.freebsd.dk [212.242.86.155]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2377643E8A for ; Sat, 12 Oct 2002 00:07:35 -0700 (PDT) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.12.6/8.12.6) with ESMTP id g9C77U4B047536 for ; Sat, 12 Oct 2002 09:07:31 +0200 (CEST) (envelope-from phk@critter.freebsd.dk) To: arch@FreeBSD.ORG Subject: Re: 6.0 branching (no longer: HEADS UP: 5.0 Feature Freeze October 16, 2002) In-Reply-To: Your message of "Fri, 11 Oct 2002 23:40:31 PDT." <3DA7C3DF.1CFD6978@softweyr.com> Date: Sat, 12 Oct 2002 09:07:30 +0200 Message-ID: <47535.1034406450@critter.freebsd.dk> From: Poul-Henning Kamp Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG In message <3DA7C3DF.1CFD6978@softweyr.com>, Wes Peters writes: >I think we need to discuss when we will branch 6.x. Of course we need to. I don't think anybody in a sane state of mind would even dream about branching right now, and we will certainly not do it before 5.0-R, so how about we put this on the agenda for our mid-november bikeshed ? Discussing it now can only lead to political posturing and handwaving. 
-- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Oct 12 0:14:55 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4C97837B401 for ; Sat, 12 Oct 2002 00:14:54 -0700 (PDT) Received: from softweyr.com (softweyr.com [65.88.244.127]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7EAD843E88 for ; Sat, 12 Oct 2002 00:14:53 -0700 (PDT) (envelope-from wes@softweyr.com) Received: from nextgig-11.access.nethere.net ([66.63.140.203] helo=softweyr.com) by softweyr.com with esmtp (Exim 3.35 #1) id 180GTn-0001rd-00; Sat, 12 Oct 2002 01:14:31 -0600 Message-ID: <3DA7CE9B.97BD3989@softweyr.com> Date: Sat, 12 Oct 2002 00:26:19 -0700 From: Wes Peters Organization: Softweyr LLC X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.2 i386) X-Accept-Language: en MIME-Version: 1.0 To: Terry Lambert Cc: Nate Lawson , "Vladimir B. Grebenschikov" , arch@FreeBSD.org Subject: Re: using mem above 4Gb was: swapon some regular file References: <3DA35D58.B1B5D78D@mindspring.com> <3DA4C2F1.74450081@softweyr.com> <3DA4C7EC.F749B803@mindspring.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Terry Lambert wrote: > > Wes Peters wrote: > > Terry Lambert wrote: > > > IMO, if you want a larger linear address space, instead of pretending > > > you have one, buy yourself an IA64 instead. > > > > Or an Alpha, or a SPARC64, or a MIPS64, etc. 
But they all seem to cost > > more than a PIII solution, except perhaps a Netra and you can't cram enough > > RAM in that to make a difference. > > People always say this, but... the Alpha is unsuitable, because > FreeBSD on the Alpha doesn't support more than 2G of physical > RAM, because the drivers choke. The MIPS is not an option, > because though there is a FreeBSD port, as reported at last > year's "developer summit" at Usenix, it was never integrated into > the source tree. The SPARC64 isn't a mainstream port yet (I know > this because my patch to kdenetwork3 was adulterated to be "if Alpha", > when it should have been adulterated to "if !32_bit_x86", if at all, > because the SPARC64 and IA64 GOT will go over 64K, as well... the > problem is the 64bit vs. 32bit values, not symbol names, etc., that > causes the table size to be bigger there). > > Right now, IA64 is about the only supported 64 bit architecture > that gives you the real benefit of a 64 bit address space; I guess > you can mmap a lot of stuff on the Alpha, too, up to your KVA > mapping limit, but that's not a win for this application. There aren't any architectural issues on the SPARC64 itself that prevent it from being a fully 64-bit system. It will take a while before FreeBSD developers catch up with the idea of 64-bitness. If you want to help, find a way to get more 64-bit systems into developers' hands. Got a secret cache of Netras hiding somewhere you just have to get rid of? ;^) At least you can buy a brand-new, supported SPARC64 machine from Sun for < $1000 (in the USA). Two, in fact: the Sun Blade V100, and the Netra, nee Sun Fire V100. -- "Where am I, and what am I doing in this handbasket?"
Wes Peters Softweyr LLC wes@softweyr.com http://softweyr.com/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Oct 12 5:53: 6 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 54F6F37B401 for ; Sat, 12 Oct 2002 05:53:05 -0700 (PDT) Received: from phoenix.infradead.org (carisma.slowglass.com [195.224.96.167]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4AF0543EAF for ; Sat, 12 Oct 2002 05:53:04 -0700 (PDT) (envelope-from hch@infradead.org) Received: from hch by phoenix.infradead.org with local (Exim 4.10) id 180Ll7-0004Jk-00; Sat, 12 Oct 2002 13:52:45 +0100 Date: Sat, 12 Oct 2002 13:52:45 +0100 From: Christoph Hellwig To: Wes Peters Cc: Matthew Dillon , "Vladimir B. Grebenschikov" , Nate Lawson , arch@FreeBSD.ORG Subject: Re: Database indexes and ram (was Re: using mem above 4Gb was:swapon some regular file) Message-ID: <20021012135245.A16453@infradead.org> References: <1034105993.913.1.camel@vbook.express.ru> <200210082015.g98KFFrq084625@apollo.backplane.com> <1034109053.913.7.camel@vbook.express.ru> <200210082051.g98KpjU1084793@apollo.backplane.com> <3DA4C271.37AACAA3@softweyr.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <3DA4C271.37AACAA3@softweyr.com>; from wes@softweyr.com on Wed, Oct 09, 2002 at 04:57:37PM -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, Oct 09, 2002 at 04:57:37PM -0700, Wes Peters wrote: > Linux solved this problem by refusing to do it. The candidates for DMA > transfers include skbufs and buffers from the disk buffer pool, both of > which are allocated from the lowest 4GB of physical ram when using PAE > mode. 
Umm, Linux _does_ DMA into any memory if the NIC/HBA/whatever supports it. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Oct 12 7:22:28 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B6C5E37B401 for ; Sat, 12 Oct 2002 07:22:24 -0700 (PDT) Received: from trantor.utsl.org (cvg-65-27-234-246.cinci.rr.com [65.27.234.246]) by mx1.FreeBSD.org (Postfix) with ESMTP id B9EBE43ECD for ; Sat, 12 Oct 2002 07:22:23 -0700 (PDT) (envelope-from utsl@quic.net) Received: from hotrod.utsl.org ([10.10.57.3] helo=quic.net) by trantor.utsl.org with esmtp (Exim 3.35 #1 (Debian)) id 180N9X-0001ln-00; Sat, 12 Oct 2002 10:22:03 -0400 Message-ID: <3DA82FBA.9070607@quic.net> Date: Sat, 12 Oct 2002 10:20:42 -0400 From: Nathan Hawkins User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020615 Debian/1.0.0-3 MIME-Version: 1.0 To: Wes Peters Cc: arch@FreeBSD.ORG Subject: Re: Database indexes and ram (was Re: using mem above 4Gb was:swapon some regular file) References: <1034105993.913.1.camel@vbook.express.ru> <200210082015.g98KFFrq084625@apollo.backplane.com> <1034109053.913.7.camel@vbook.express.ru> <200210082051.g98KpjU1084793@apollo.backplane.com> <3DA4C271.37AACAA3@softweyr.com> <3DA4C632.325F2EBE@mindspring.com> <3DA7C997.95F3C4FF@softweyr.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Wes Peters wrote: > Some? You've seen a production database that was normalized at ALL? > Gee, that'd be... nice? astonishing? like seeing the pope tour temple > square? DBA stands for Data Base A..... Sigh. 
Apparently some actually believe that denormalizing a database is an optimization. I should become a DBA. They get paid more than I do, and most seem to know less. :( > The key to accelerating database access rarely has much to do with I/O > speed. How many Oracle servers do you know that can stuff a Gigabit > channel full, even doing straight selects? Memory usage is VERY important > and DBAs are not famous for optimizing queries to make efficient use of > the processor cache. Or anything else, for that matter. Hmm. I have seen I/O speed become a problem. But that's generally when the DBA did a poor job of placing data volumes. Putting logs on the same disk with the filesystem where the online backups get dumped was spectacular. User imported some data while a backup was going, and that particular disk saw more activity than the other 30 or so combined... (Striping would have helped, but then, separating heavily used regions of disk onto different spindles is what striping is _for_.) Memory usage is critical, and so is correct processor use. Oracle and Sybase require correct tuning on SMP machines, or they waste CPU. Lock tuning, IIRC, can be more critical to performance than memory, at least with Sybase. (i.e., you can add more memory and CPUs and performance gets worse.) Informix either worked vastly better, or we had a better DBA... :) The worst performance problem I ever saw was one where the DBA insisted on creating a shared memory segment that was larger than physical memory in the machine. Keeping half your database's cache in swap is a _bad_ idea.
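The sanity check that would have caught that last one is small. What follows is a minimal sketch, not anything a particular database ships: phys_ram_bytes() and shm_request_fits() are invented names, the reserve is a policy knob the administrator would have to pick, and _SC_PHYS_PAGES is a common sysconf() extension rather than POSIX, so the code treats it as optional.

```c
#include <unistd.h>

/* Physical RAM in bytes, or 0 if this platform will not tell us. */
unsigned long long
phys_ram_bytes(void)
{
#ifdef _SC_PHYS_PAGES
	long pages = sysconf(_SC_PHYS_PAGES);	/* extension, not POSIX */
	long pgsz = sysconf(_SC_PAGESIZE);

	if (pages <= 0 || pgsz <= 0)
		return (0);
	return ((unsigned long long)pages * (unsigned long long)pgsz);
#else
	return (0);
#endif
}

/*
 * Hypothetical policy check: accept a shared memory request of `want`
 * bytes only if it still leaves `reserve` bytes of the `phys` bytes of
 * RAM for everything else.  Returns 1 if acceptable, 0 otherwise.
 */
int
shm_request_fits(unsigned long long want, unsigned long long phys,
    unsigned long long reserve)
{
	if (phys <= reserve)
		return (0);
	return (want <= phys - reserve);
}
```

Something like shm_request_fits(segment_size, phys_ram_bytes(), reserve) before creating the segment would refuse a cache that can only live in swap; when phys_ram_bytes() returns 0 the check fails closed.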
---Nathan To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Oct 12 8:18: 6 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 74FEC37B404 for ; Sat, 12 Oct 2002 08:18:05 -0700 (PDT) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7790543EB2 for ; Sat, 12 Oct 2002 08:18:04 -0700 (PDT) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.12.4/8.12.4) with SMTP id g9CFHMOo001076; Sat, 12 Oct 2002 11:17:23 -0400 (EDT) (envelope-from robert@fledge.watson.org) Date: Sat, 12 Oct 2002 11:17:21 -0400 (EDT) From: Robert Watson X-Sender: robert@fledge.watson.org To: Poul-Henning Kamp Cc: arch@FreeBSD.ORG Subject: Re: 6.0 branching (no longer: HEADS UP: 5.0 Feature Freeze October 16, 2002) In-Reply-To: <47535.1034406450@critter.freebsd.dk> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Sat, 12 Oct 2002, Poul-Henning Kamp wrote: > In message <3DA7C3DF.1CFD6978@softweyr.com>, Wes Peters writes: > > >I think we need to discuss when we will branch 6.x. > > Of course we need to. > > I don't think anybody in a sane state of mind would even dream about > branching right now, and we will certainly not do it before 5.0-R, so > how about we put this on the agenda for our mid-november bikeshed ? > > Discussing it now can only lead to political posturing and handwaving. Agreed. At this point, it's not clear how rapidly 5.x will land. Once we've either seen it land, or decided it's taking too long, we can take appropriate action. 
For at least the next month, everything else is pure speculation. I'd much rather people invested time in making 5.x get into strong production shape as soon as possible than that we get involved in long discussions of the possible strategies. It's in everyone's best interest to make 5.0 work as well as possible, and since that's the one thing we can all agree on, we should work on that :-). Robert N M Watson FreeBSD Core Team, TrustedBSD Projects robert@fledge.watson.org Network Associates Laboratories To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Oct 12 8:40:10 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C2B3837B413; Sat, 12 Oct 2002 08:40:08 -0700 (PDT) Received: from sccrmhc03.attbi.com (sccrmhc03.attbi.com [204.127.202.63]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2A05B43E6A; Sat, 12 Oct 2002 08:40:08 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by sccrmhc03.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021012154007.IWIG24958.sccrmhc03.attbi.com@InterJet.elischer.org>; Sat, 12 Oct 2002 15:40:07 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id IAA63230; Sat, 12 Oct 2002 08:30:21 -0700 (PDT) Date: Sat, 12 Oct 2002 08:30:20 -0700 (PDT) From: Julian Elischer To: Robert Watson Cc: Poul-Henning Kamp , arch@FreeBSD.ORG Subject: Re: 6.0 branching (no longer: HEADS UP: 5.0 Feature Freeze October 16, 2002) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG I don't think it is a waste of time to make 
sure that all developers know the issues and have the same expectations of what is likely to happen. On Sat, 12 Oct 2002, Robert Watson wrote: > > On Sat, 12 Oct 2002, Poul-Henning Kamp wrote: > > > > Discussing it now can only lead to political posturing and handwaving. > > Agreed. At this point, it's not clear how rapidly 5.x will land. Once > we've either seen it land, or decided it's taking too long, we can take [...] To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Oct 12 9:25:32 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B5B3A37B401; Sat, 12 Oct 2002 09:25:31 -0700 (PDT) Received: from angelica.unixdaemons.com (angelica.unixdaemons.com [209.148.64.135]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9BA7D43EB1; Sat, 12 Oct 2002 09:25:30 -0700 (PDT) (envelope-from hiten@angelica.unixdaemons.com) Received: from angelica.unixdaemons.com (hiten@localhost.unixdaemons.com [127.0.0.1]) by angelica.unixdaemons.com (8.12.5/8.12.1) with ESMTP id g9CGPCc8020374; Sat, 12 Oct 2002 12:25:12 -0400 (EDT) X-Authentication-Warning: angelica.unixdaemons.com: Host hiten@localhost.unixdaemons.com [127.0.0.1] claimed to be angelica.unixdaemons.com Received: (from hiten@localhost) by angelica.unixdaemons.com (8.12.5/8.12.1/Submit) id g9CGPBNl020373; Sat, 12 Oct 2002 12:25:11 -0400 (EDT) (envelope-from hiten) Date: Sat, 12 Oct 2002 12:25:11 -0400 From: Hiten Pandya To: Terry Lambert Cc: Jeff Roberson , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Scheduler patch, ready for commit.
Message-ID: <20021012122510.A13430@angelica.unixdaemons.com> References: <20021010022058.A23516-100000@mail.chesapeake.net> <3DA537E4.274A3714@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <3DA537E4.274A3714@mindspring.com>; from tlambert2@mindspring.com on Thu, Oct 10, 2002 at 01:18:44AM -0700 X-Operating-System: FreeBSD i386 X-Public-Key: http://www.pittgoth.com/~hiten/pubkey.asc X-URL: http://www.unixdaemons.com/~hiten X-PGP: http://pgp.mit.edu:11371/pks/lookup?search=Hiten+Pandya&op=index Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Thu, Oct 10, 2002 at 01:18:44AM -0700, Terry Lambert wrote the words in effect of: > > Yes, I agree, this is an important next step. I'm thinking that the > > scheduler should indicate how much space is needed to the proc allocation > > code. This much extra space could be allocated, and a pointer to > > scheduler specific data could really be a pointer within that allocated > > structure. This way it might be near enough for processor caches to be > > effective. Clearly this needs more work. That is outside of the scope of > > the current patch though. > > [...] > You may actually want to look at the Solaris/SVR4 implementation, > which supports both scheduling classes as loadable modules, and > simultaneous multiple scheduler classes (SVID III(RT) and the > "fixed" scheduling class, used to improve interactive response of > the X server, as well as a batch scheduler, are included in the > defaults for both systems). FWIW, the Solaris Internals book discusses this topic of scheduler classes in detail, IIRC. It has been time since I touched the book. Cheers. 
-- Hiten Pandya http://www.unixdaemons.com/~hiten hiten@unixdaemons.com, hiten@uk.FreeBSD.org, hiten@softweyr.com PGP: http://pgp.mit.edu:11371/pks/lookup?search=Hiten+Pandya&op=index To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Oct 12 12:20:14 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8AA3C37B407; Sat, 12 Oct 2002 12:20:11 -0700 (PDT) Received: from sccrmhc02.attbi.com (sccrmhc02.attbi.com [204.127.202.62]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9EA0843EB1; Sat, 12 Oct 2002 12:20:10 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by sccrmhc02.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20021012192009.OIYB24595.sccrmhc02.attbi.com@InterJet.elischer.org>; Sat, 12 Oct 2002 19:20:09 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id MAA64071; Sat, 12 Oct 2002 12:05:26 -0700 (PDT) Date: Sat, 12 Oct 2002 12:05:24 -0700 (PDT) From: Julian Elischer To: Hiten Pandya Cc: Terry Lambert , Jeff Roberson , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Scheduler patch, ready for commit. In-Reply-To: <20021012122510.A13430@angelica.unixdaemons.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Sat, 12 Oct 2002, Hiten Pandya wrote: > On Thu, Oct 10, 2002 at 01:18:44AM -0700, Terry Lambert wrote the words in effect of: > > > Yes, I agree, this is an important next step. I'm thinking that the > > > scheduler should indicate how much space is needed to the proc allocation > > > code. 
This much extra space could be allocated, and a pointer to > > > scheduler specific data could really be a pointer within that allocated > > > structure. This way it might be near enough for processor caches to be > > > effective. Clearly this needs more work. That is outside of the scope of > > > the current patch though. If done on the fly, this would require freeing all the allocated procs in the uma cache and changing the size of the zone, and re-filling it, and replacing all the existing procs with the new larger ones.. hardly a likely scenario. Pretty obviously the additional storage is in the form of an extra blob hanging off the proc/kse/ksegrp/thread structures as needed. (Unless the scheduler can make use of a couple of void * 'p_sched_private' type fields we can preallocate.) > > > > [...] > > You may actually want to look at the Solaris/SVR4 implementation, > > which supports both scheduling classes as loadable modules, and > > simultaneous multiple scheduler classes (SVID III(RT) and the > > "fixed" scheduling class, used to improve interactive response of > > the X server, as well as a batch scheduler, are included in the > > defaults for both systems). > > FWIW, the Solaris Internals book discusses this topic of scheduler > classes in detail, IIRC. It has been some time since I touched the book. > > Cheers. 
> > -- > Hiten Pandya > http://www.unixdaemons.com/~hiten > hiten@unixdaemons.com, hiten@uk.FreeBSD.org, hiten@softweyr.com > PGP: http://pgp.mit.edu:11371/pks/lookup?search=Hiten+Pandya&op=index > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-arch" in the body of the message > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Oct 12 12:27:36 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5EF1437B404; Sat, 12 Oct 2002 12:27:35 -0700 (PDT) Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7EA4D43ED1; Sat, 12 Oct 2002 12:27:33 -0700 (PDT) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g9CJRK196235; Sat, 12 Oct 2002 15:27:20 -0400 (EDT) (envelope-from jroberson@chesapeake.net) Date: Sat, 12 Oct 2002 15:27:19 -0400 (EDT) From: Jeff Roberson To: Julian Elischer Cc: Hiten Pandya , Terry Lambert , Jeff Roberson , Subject: Re: Scheduler patch, ready for commit. In-Reply-To: Message-ID: <20021012152434.U30714-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Sat, 12 Oct 2002, Julian Elischer wrote: > On Sat, 12 Oct 2002, Hiten Pandya wrote: > > > On Thu, Oct 10, 2002 at 01:18:44AM -0700, Terry Lambert wrote the words in effect of: > > > > Yes, I agree, this is an important next step. I'm thinking that the > > > > scheduler should indicate how much space is needed to the proc allocation > > > > code. 
This much extra space could be allocated, and a pointer to > > > > scheduler specific data could really be a pointer within that allocated > > > > structure. This way it might be near enough for processor caches to be > > > > effective. Clearly this needs more work. That is outside of the scope of > > > > the current patch though. > > If done on the fly, this would require freeing all the allocated procs > in the uma cache and changing the size of the zone, and re-filling it, > and replacing all the existing procs with the new larger ones.. hardly a > likely scenario. > Pretty obviously the additional storage is in the form of an extra blob > hanging off the proc/kse/ksegrp/thread structures as needed. (Unless the > scheduler can make use of a couple of void * 'p_sched_private' type > fields we can preallocate.) > Is there really demand for on the fly scheduler changes? I guess I always thought of it as a neat trick and not something useful. It doesn't seem like it's worth the overhead for the extremely small number of scenarios where it's needed. 
Cheers, Jeff To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Oct 12 12:58: 2 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CB39B37B401 for ; Sat, 12 Oct 2002 12:58:00 -0700 (PDT) Received: from harmony.village.org (rover.bsdimp.com [204.144.255.66]) by mx1.FreeBSD.org (Postfix) with ESMTP id 787E743EB3 for ; Sat, 12 Oct 2002 12:57:58 -0700 (PDT) (envelope-from imp@bsdimp.com) Received: from localhost (warner@rover2.village.org [10.0.0.1]) by harmony.village.org (8.12.3/8.12.3) with ESMTP id g9CJvvpk008348 for ; Sat, 12 Oct 2002 13:57:57 -0600 (MDT) (envelope-from imp@bsdimp.com) Date: Sat, 12 Oct 2002 13:57:09 -0600 (MDT) Message-Id: <20021012.135709.38051542.imp@bsdimp.com> To: arch@FreeBSD.org Subject: Re: 6.0 branching (no longer: HEADS UP: 5.0 Feature Freeze October 16, 2002) From: "M. Warner Losh" In-Reply-To: <3DA7C3DF.1CFD6978@softweyr.com> References: <200210112056.g9BKuZEx041686@apollo.backplane.com> <3DA7C3DF.1CFD6978@softweyr.com> X-Mailer: Mew version 2.1 on Emacs 21.2 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG [[ don't cross post to private and public lists ]] In message: <3DA7C3DF.1CFD6978@softweyr.com> Wes Peters writes: : Matthew Dillon wrote: : > : > I definitely agree with the 5.0 time schedule. I also agree with : > Julian that it would be premature to branch 5.0 into -stable and : > make 6.x -current. We should go through at *least* one more release : > cycle (5.1) before branching, IMHO, simply to reduce the amount of : > MFCing that would otherwise be necessary. : : Matt brings up a good point here. 
I'm daring to cross-post this because : I want to move THIS discussion to -arch, where it belongs. I've directed : replies to -arch. : : I think we need to discuss when we will branch 6.x. I think we need to : wait until we have a 5.x release that is stable enough to consider for : production workstation usage levels, and hope we may reach that point : by the 5.2 release. I think arbitrarily whacking off a new development : branch before 5.x is really and truly stabilized could hurt the FreeBSD : project greatly. : : This is obviously not my decision to make, and many of you know much : more about the actual work to be done than I do. Please provide your : input. I'm not asking that we make a decision at this point, just : getting people thinking about how we might go about this, since it is : likely to be different from how we've done it in the past. I think that the general consensus has been to wait until 5.1 or 5.2 to do the branch. This has been the position of the last couple of developer summits that I've been at. We knew this a year ago. I'd let RE make the final call on this, but what you (and Matt) have said makes good sense. It is basically what most of the folks that have an opinion on this have been saying for some time now. This does mean that we'll need to keep the amount of rototilling down to a minimum during this time, and exclude new features that impact the stability of the system until after the branch. It will be a new feature slush rather than an outright freeze, since many new features can be integrated w/o impacting system stability. 
Warner To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Oct 12 13:32:47 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2D9ED37B401; Sat, 12 Oct 2002 13:32:46 -0700 (PDT) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 92EEC43EB7; Sat, 12 Oct 2002 13:32:44 -0700 (PDT) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.12.4/8.12.4) with SMTP id g9CKVlOo044182; Sat, 12 Oct 2002 16:31:47 -0400 (EDT) (envelope-from robert@fledge.watson.org) Date: Sat, 12 Oct 2002 16:31:47 -0400 (EDT) From: Robert Watson X-Sender: robert@fledge.watson.org To: Jeff Roberson Cc: Julian Elischer , Hiten Pandya , Terry Lambert , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Scheduler patch, ready for commit. In-Reply-To: <20021012152434.U30714-100000@mail.chesapeake.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Sat, 12 Oct 2002, Jeff Roberson wrote: > > If done on the fly, this would require freeing all the allocated procs > > in the uma cache and changing the size of the zone, and re-filling it, > > and replacing all the existing procs with the new larger ones.. hardly a > > likely scenario. > > Pretty obviously the additional storage is in the form of an extra blob > > hanging off the proc/kse/ksegrp/thread structures as needed. (Unless the > > scheduler can make use of a couple of void * 'p_sched_private' type > > fields we can preallocate.) > > Is there really demand for on the fly scheduler changes? 
I guess I > always thought of it as a neat trick and not something useful. It > doesn't seem like it's worth the overhead for the extremely small number > of scenarios where it's needed. It seems to me that there are several levels of "pluggability": (1) Source code pluggable. You modify the source to do the plugging. Also known as the patchset. (2) Compile pluggable. You don't have to modify existing code, but you do have to add new code in a module; the system is designed to easily allow this extensibility. (3) Boot-time pluggable. You can insert a module by linking it to the kernel, or by loading it with the boot loader. (4) Run-time pluggable. You can link it to the kernel, load it prior to boot, or load it after boot. There are also open questions about removal. With the MAC Framework, we've taken it to (4), although there are some policies that by definition can't be loaded after system start since they need the opportunity to impact the system from inception, as well as policies that can't be unloaded. Likewise with file systems, device drivers, etc. My personal feeling is that it would be nice, eventually, to be able to do boot-time selectable schedulers, but that that is about where you hit diminishing returns: to do more, it requires a lot more work with a lot less return, since you have to do a lot more in the way of allocation management, cleanup, not to mention whether you either "compose" or "replace" scheduling policies, etc. 
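A minimal user-space sketch of level (3), boot-time selection, under stated assumptions: the names sched_entry, sched_table, sched_select_at_boot and the labels "sched_a"/"sched_b" are hypothetical and are not taken from the patch under discussion or from any FreeBSD interface. It only illustrates the idea that the scheduler is picked once, by name, before anything is scheduled, and that later attempts to change it are refused, which avoids the allocation-management and cleanup work that run-time switching would need.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for a scheduler compiled or preloaded into the kernel. */
struct sched_entry {
    const char *name;
    void (*init)(void);          /* one-time setup, run when selected */
};

static int sched_a_ready, sched_b_ready;
static void sched_a_init(void) { sched_a_ready = 1; }
static void sched_b_init(void) { sched_b_ready = 1; }

/* Everything that was linked in or loaded by the boot loader. */
static const struct sched_entry sched_table[] = {
    { "sched_a", sched_a_init },
    { "sched_b", sched_b_init },
};

static const struct sched_entry *active_sched;   /* NULL until boot-time selection */

/*
 * Select a scheduler by name. Returns 0 on success, -1 if the name is
 * unknown or a scheduler has already been selected (no run-time switching).
 */
int sched_select_at_boot(const char *name)
{
    size_t i;

    assert(name != NULL);
    if (active_sched != NULL)
        return -1;
    for (i = 0; i < sizeof(sched_table) / sizeof(sched_table[0]); i++) {
        if (strcmp(sched_table[i].name, name) == 0) {
            active_sched = &sched_table[i];
            active_sched->init();
            return 0;
        }
    }
    return -1;
}

const char *sched_active_name(void)
{
    return active_sched != NULL ? active_sched->name : NULL;
}
```

In this sketch the boot path would call sched_select_at_boot() exactly once with a name taken from loader configuration; every later call fails, so no existing per-process scheduler state ever has to be converted.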
Robert N M Watson FreeBSD Core Team, TrustedBSD Projects robert@fledge.watson.org Network Associates Laboratories To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Oct 12 13:47:26 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id ABE6637B401; Sat, 12 Oct 2002 13:47:24 -0700 (PDT) Received: from scaup.mail.pas.earthlink.net (scaup.mail.pas.earthlink.net [207.217.120.49]) by mx1.FreeBSD.org (Postfix) with ESMTP id 43CA143EAC; Sat, 12 Oct 2002 13:47:24 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0083.cvx21-bradley.dialup.earthlink.net ([209.179.192.83] helo=mindspring.com) by scaup.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 180TAM-0004Yd-00; Sat, 12 Oct 2002 13:47:18 -0700 Message-ID: <3DA88A09.12402F6C@mindspring.com> Date: Sat, 12 Oct 2002 13:46:01 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Julian Elischer Cc: Hiten Pandya , Jeff Roberson , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Scheduler patch, ready for commit. References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG > > > > Yes, I agree, this is an important next step. I'm thinking that the > > > > scheduler should indicate how much space is needed to the proc allocation > > > > code. This much extra space could be allocated, and a pointer to > > > > scheduler specific data could really be a pointer within that allocated > > > > structure. This way it might be near enough for processor caches to be > > > > effective. Clearly this needs more work. That is outside of the scope of > > > > the current patch though. 
This is actually Jeff you are quoting here... > If done on the fly, this would require freeing all the allocated procs > in the uma cache and changing the size of the zone, and re-filling it, > and replacing all the existing procs with the new larger ones.. hardly a > likely scenario. > Pretty obviously the additional storage is in the form of an extra blob > hanging off the proc/kse/ksegrp/thread structures as needed. (Unless the > scheduler can make use of a couple of void * 'p_sched_private' type > fields we can preallocate.) Yes. The point is to encapsulate the allocation, so that it occurs in the context of a scheduler. By doing that, you permit the proc lock to protect both the proc struct, and the "blob" pointed to by the proc struct. What this boils down to is asking a scheduler for a new process that it manages, as opposed to asking the system for a process, and then assigning it a scheduler. This is consistent with "inherit on fork" semantics. Initially, you could preclude migration of control between schedulers of individual processes: it's something you could handle later. Note that any migration is going to have different measures, so it's not like you are going to be able to translate a measure between one and the other, and have the second scheduler class pick up as if the proc had always been running under it. 
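A user-space sketch of "asking a scheduler for a new process", assuming hypothetical names throughout (sched_class, sched_proc_alloc, sched_fork, ts_private); the struct proc here is a toy stand-in, not the kernel's, and locking is not modelled. The scheduler class states how much private space it needs, the process and that space come from a single allocation so the two sit close together in memory, and fork inherits the parent's class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sched_class;

/* Toy process structure; the real one is far larger. */
struct proc {
    int pid;
    const struct sched_class *p_sched;   /* class that allocated and manages us */
    void *p_sched_private;               /* points into the same allocation */
};

/* Hypothetical scheduler class: declares its per-process space needs. */
struct sched_class {
    const char *name;
    size_t private_size;
    void (*init_private)(void *priv, const void *parent_priv);
};

/* Used only to round the header up to a safely aligned size. */
union sched_align { long long ll; long double ld; void *p; };

/* Ask a scheduler class for a new process that it manages. */
struct proc *sched_proc_alloc(const struct sched_class *sc, int pid,
                              const struct proc *parent)
{
    size_t a = sizeof(union sched_align);
    size_t hdr = (sizeof(struct proc) + a - 1) / a * a;
    struct proc *p;

    assert(sc != NULL);
    p = calloc(1, hdr + sc->private_size);
    if (p == NULL)
        return NULL;
    p->pid = pid;
    p->p_sched = sc;
    p->p_sched_private = (char *)p + hdr;    /* the "blob", same allocation */
    if (sc->init_private != NULL)
        sc->init_private(p->p_sched_private,
                         parent != NULL ? parent->p_sched_private : NULL);
    return p;
}

/* "Inherit on fork": the child comes from the parent's scheduler class. */
struct proc *sched_fork(const struct proc *parent, int pid)
{
    return sched_proc_alloc(parent->p_sched, pid, parent);
}

/* Example class with made-up per-process measures. */
struct ts_private { int estcpu; int nice; };

static void ts_init(void *priv, const void *parent_priv)
{
    struct ts_private *ts = priv;
    const struct ts_private *pts = parent_priv;

    ts->estcpu = 0;                          /* usage history is not inherited */
    ts->nice = pts != NULL ? pts->nice : 0;  /* nice value is inherited */
}

const struct sched_class ts_class = { "timeshare", sizeof(struct ts_private), ts_init };
```

Because the class that created a process is the only code that knows the layout behind p_sched_private, moving a process to another class would mean a fresh allocation and a fresh set of measures, which matches the point above about precluding migration at first.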
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Oct 12 14: 7:18 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A725437B401 for ; Sat, 12 Oct 2002 14:07:17 -0700 (PDT) Received: from harmony.village.org (rover.bsdimp.com [204.144.255.66]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0397F43EC2 for ; Sat, 12 Oct 2002 14:07:17 -0700 (PDT) (envelope-from imp@bsdimp.com) Received: from localhost (warner@rover2.village.org [10.0.0.1]) by harmony.village.org (8.12.3/8.12.3) with ESMTP id g9CL7Bpk008713; Sat, 12 Oct 2002 15:07:12 -0600 (MDT) (envelope-from imp@bsdimp.com) Date: Sat, 12 Oct 2002 15:06:16 -0600 (MDT) Message-Id: <20021012.150616.129769790.imp@bsdimp.com> To: hch@infradead.org Cc: wes@softweyr.com, dillon@apollo.backplane.com, vova@sw.ru, nate@root.org, arch@FreeBSD.ORG Subject: Re: Database indexes and ram From: "M. Warner Losh" In-Reply-To: <20021012135245.A16453@infradead.org> References: <200210082051.g98KpjU1084793@apollo.backplane.com> <3DA4C271.37AACAA3@softweyr.com> <20021012135245.A16453@infradead.org> X-Mailer: Mew version 2.1 on Emacs 21.2 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG In message: <20021012135245.A16453@infradead.org> Christoph Hellwig writes: : On Wed, Oct 09, 2002 at 04:57:37PM -0700, Wes Peters wrote: : > Linux solved this problem by refusing to do it. The candidates for DMA : > transfers include skbufs and buffers from the disk buffer pool, both of : > which are allocated from the lowest 4GB of physical ram when using PAE : > mode. 
: : Umm, Linux _does_ DMA into any memory if the NIC/HBA/whatever supports : it. Unless the card is 64bit, it can't DMA past 4G. Warner To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Oct 12 14:17: 0 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D5DDC37B401; Sat, 12 Oct 2002 14:16:58 -0700 (PDT) Received: from carp.icir.org (carp.icir.org [192.150.187.71]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5B5B643EAF; Sat, 12 Oct 2002 14:16:58 -0700 (PDT) (envelope-from rizzo@carp.icir.org) Received: from carp.icir.org (localhost [127.0.0.1]) by carp.icir.org (8.12.3/8.12.3) with ESMTP id g9CLGnpJ091748; Sat, 12 Oct 2002 14:16:49 -0700 (PDT) (envelope-from rizzo@carp.icir.org) Received: (from rizzo@localhost) by carp.icir.org (8.12.3/8.12.3/Submit) id g9CLGnIS091747; Sat, 12 Oct 2002 14:16:49 -0700 (PDT) (envelope-from rizzo) Date: Sat, 12 Oct 2002 14:16:49 -0700 From: Luigi Rizzo To: Jeff Roberson Cc: Julian Elischer , Hiten Pandya , Terry Lambert , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Scheduler patch, ready for commit. Message-ID: <20021012141649.A91655@carp.icir.org> References: <20021012152434.U30714-100000@mail.chesapeake.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20021012152434.U30714-100000@mail.chesapeake.net>; from jroberson@chesapeake.net on Sat, Oct 12, 2002 at 03:27:19PM -0400 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Sat, Oct 12, 2002 at 03:27:19PM -0400, Jeff Roberson wrote: ... 
> > If done on the fly, this would require freeing all the allocated procs > > in the uma cache and changing the size of the zone, and re-filling it, > > and replacing all the existing procs with the new larger ones.. hardly a > > likely scenario. > > Pretty obviously the additional storage is in the form of an extra blob > > hanging off the proc/kse/ksegrp/thread structures as needed. (Unless the > > scheduler can make use of a couple of void * 'p_sched_private' type > > fields we can preallocate.) > > > > Is there really demand for on the fly scheduler changes? I guess I always > thought of it as a neat trick and not something useful. It doesn't seem > like it's worth the overhead for the extremely small number of scenarios > where it's needed. if you would actually have a look at how i did it in stable, the code to support that is just trivial, and the only overhead in supporting this capability during normal operation is one indirect (as opposed to one direct) function call for each of the scheduler functions, which do not occur very often, and disappear in the noise compared to the rest of the work done by the scheduling code. Besides, the indirect function call is something we are already paying for multiple times on each packet handled by the network stack -- the protocol input routines are handled like this, so is the firewall call, so are the various if_* functions (interrupts, if_start ...), in netgraph, and wherever there is a loadable kernel module. Network events happen in the order of 100,000 times per second on a busy box, whereas scheduling decisions are probably taken 1-2 orders of magnitude less frequently. Not that we _need_ to switch schedulers at runtime, but it is terribly convenient when you are doing testing, and it basically comes for free when you want to make schedulers loadable as KLDs. And since most of the system is going that route i do not see why there are objections to this. 
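A small sketch of the indirection described above, assuming hypothetical names (sched_ops, sched_switch, sched_pick_next) and toy policies; it is not the code from the -stable patch. Every scheduler entry point is reached through a table of function pointers, so the steady-state cost is one indirect call per entry point, and switching schedulers is a pointer assignment. Any per-process state a real scheduler keeps, and the locking needed around the switch, are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scheduler interface: one function pointer per entry point. */
struct sched_ops {
    const char *name;
    /* Returns the index of the runnable entity to run next, or -1 if none. */
    int (*pick_next)(const int *prio, size_t n);
};

/* Toy policy 1: the lowest numeric priority value wins. */
static int pick_lowest(const int *prio, size_t n)
{
    size_t i, best = 0;

    if (n == 0)
        return -1;
    for (i = 1; i < n; i++)
        if (prio[i] < prio[best])
            best = i;
    return (int)best;
}

/* Toy policy 2: always take the head of the queue. */
static int pick_first(const int *prio, size_t n)
{
    (void)prio;
    return n > 0 ? 0 : -1;
}

static const struct sched_ops sched_prio = { "prio", pick_lowest };
static const struct sched_ops sched_fifo = { "fifo", pick_first };

static const struct sched_ops *cur_sched = &sched_prio;

/* Run-time switch: replace the table the rest of the system calls through. */
void sched_switch(const struct sched_ops *ops)
{
    assert(ops != NULL && ops->pick_next != NULL);
    cur_sched = ops;
}

/* What callers use: one indirect call instead of one direct call. */
int sched_pick_next(const int *prio, size_t n)
{
    return cur_sched->pick_next(prio, n);
}
```

A loadable module would supply its own struct sched_ops and hand it to sched_switch() from its load handler; the callers of sched_pick_next() do not change.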
cheers luigi To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Oct 12 14:57:45 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4FC3037B401 for ; Sat, 12 Oct 2002 14:57:44 -0700 (PDT) Received: from mail.rpi.edu (mail.rpi.edu [128.113.22.40]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7A1EB43E7B for ; Sat, 12 Oct 2002 14:57:43 -0700 (PDT) (envelope-from drosih@rpi.edu) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by mail.rpi.edu (8.12.1/8.12.1) with ESMTP id g9CLvfh1282038 for ; Sat, 12 Oct 2002 17:57:41 -0400 Mime-Version: 1.0 X-Sender: drosih@mail.rpi.edu Message-Id: In-Reply-To: <3DA7C3DF.1CFD6978@softweyr.com> References: <200210112056.g9BKuZEx041686@apollo.backplane.com> <3DA7C3DF.1CFD6978@softweyr.com> Date: Sat, 12 Oct 2002 17:57:41 -0400 To: arch@FreeBSD.ORG From: Garance A Drosihn Subject: Re: 6.0 branching (no longer: HEADS UP: 5.0 Feature Freeze October 16, 2002) Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-Scanned-By: MIMEDefang 2.3 (www dot roaringpenguin dot com slash mimedefang) Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG At 11:40 PM -0700 10/11/02, Wes Peters wrote: >I think we need to discuss when we will branch 6.x. I think we need >to wait until we have a 5.x release that is stable enough to consider >for production workstation usage levels, and hope we may reach that >point by the 5.2 release. It is good to explicitly say something about this topic now, so people don't think a 6.0-current branch is going to happen right away. That said, the decision of exactly *when* to do the branch can only be guessed at right now. 
Right now we can say that 5.0-release will not be quite production-quality enough for a new 6.0 branch at that time. If we find that 5.1-release is production quality, then we should do the branch then. If it is not production-quality, then we should wait until after 5.2-release. If 5.2 is not production quality, then we will have to wait some more, no matter what hopes we have for 5.2 as we sit here talking about it before 5.0-release is even out the door. We should be clear that the criterion is "5.x is production quality", and not "when .x reaches .2". At some point (which might be 5.1) it will probably be helpful to have an explicit list of what issues need to be fixed before we can make the new -current branch. Let me also invoke the popular image of herding cats, and point out that "the project" can only hold off from making an official branch for so long before individual developers are going to feel that they (personally) have to start working on the Next Great Thing, and they will start doing private branches using their own source repositories. 
-- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Oct 12 17:32: 2 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 684CE37B401 for ; Sat, 12 Oct 2002 17:32:01 -0700 (PDT) Received: from canning.wemm.org (canning.wemm.org [192.203.228.65]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0564B43E75 for ; Sat, 12 Oct 2002 17:31:57 -0700 (PDT) (envelope-from peter@wemm.org) Received: from wemm.org (localhost [127.0.0.1]) by canning.wemm.org (Postfix) with ESMTP id 728432A88D for ; Sat, 12 Oct 2002 17:31:53 -0700 (PDT) (envelope-from peter@wemm.org) X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4 To: arch@FreeBSD.ORG Subject: Re: 6.0 branching (no longer: HEADS UP: 5.0 Feature Freeze October 16, 2002) In-Reply-To: Date: Sat, 12 Oct 2002 17:31:53 -0700 From: Peter Wemm Message-Id: <20021013003153.728432A88D@canning.wemm.org> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG The decision when to branch RELENG_5 can only be made when the time is right. If the tree is in no fit state to be branched (and it isn't right now), then the answer is simple - "not yet". Let's worry about getting the tree in shape before we worry about when to branch. Cheers, -Peter -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com "All of this is for nothing if we don't go to the stars" - JMS/B5 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message