From owner-freebsd-smp Sun Mar 18 12:13:12 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from gratis.grondar.za (grouter.grondar.za [196.7.18.65]) by hub.freebsd.org (Postfix) with ESMTP id 7D1EB37B718 for ; Sun, 18 Mar 2001 12:13:04 -0800 (PST) (envelope-from mark@grondar.za)
Received: from grondar.za (root@gratis.grondar.za [196.7.18.133]) by gratis.grondar.za (8.11.1/8.11.1) with ESMTP id f2IKCvf28535 for ; Sun, 18 Mar 2001 22:12:59 +0200 (SAST) (envelope-from mark@grondar.za)
Message-Id: <200103182012.f2IKCvf28535@gratis.grondar.za>
To: smp@freebsd.org
Subject: Reliable KASSERT panic in CURRENT
Date: Sun, 18 Mar 2001 22:13:43 +0200
From: Mark Murray
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Hi

I'm getting a reliable panic (KASSERT actually) in kern_mutex.c
line 215.

It happens when I ^Z in vi on an intel SMP box running (very) CURRENT.

M
--
Mark Murray
Warning: this .sig is umop ap!sdn

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

From owner-freebsd-smp Sun Mar 18 17: 7:48 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from moby.geekhouse.net (moby.geekhouse.net [64.81.6.36]) by hub.freebsd.org (Postfix) with ESMTP id D077E37B718 for ; Sun, 18 Mar 2001 17:07:45 -0800 (PST) (envelope-from jhb@FreeBSD.org)
Received: from laptop.baldwin.cx (john@dhcp152.geekhouse.net [192.168.1.152]) by moby.geekhouse.net (8.11.0/8.9.3) with ESMTP id f2J19j195919; Sun, 18 Mar 2001 17:09:47 -0800 (PST) (envelope-from jhb@FreeBSD.org)
Message-ID:
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <200103182012.f2IKCvf28535@gratis.grondar.za>
Date: Sun, 18 Mar 2001 17:07:04 -0800 (PST)
From: John Baldwin
To: Mark Murray
Subject: RE: Reliable KASSERT panic in CURRENT
Cc: smp@FreeBSD.org
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On
18-Mar-01 Mark Murray wrote:
> Hi
>
> I'm getting a reliable panic (KASSERT actually) in kern_mutex.c
> line 215.
>
> It happens when I ^Z in vi on an intel SMP box running (very) CURRENT.

It's a bogus assertion now (like I said on IRC :-P). We can actually
run for a very little bit while we are in SSTOP just before we go to
sleep, so on an SMP system during priority propagation we might hit
a running process that's not in SZOMB or SRUN. You can either add
SSTOP to the MPASS() there or just remove the assertion entirely. I'm
leaning towards removing the assertion but don't feel too strongly
about it either way.

> M
> --
> Mark Murray
> Warning: this .sig is umop ap!sdn

--
John Baldwin -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/

From owner-freebsd-smp Sun Mar 18 21:52: 5 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from gratis.grondar.za (grouter.grondar.za [196.7.18.65]) by hub.freebsd.org (Postfix) with ESMTP id 53E5037B719; Sun, 18 Mar 2001 21:51:59 -0800 (PST) (envelope-from mark@grondar.za)
Received: from grondar.za (root@gratis.grondar.za [196.7.18.133]) by gratis.grondar.za (8.11.1/8.11.1) with ESMTP id f2J5psf30575; Mon, 19 Mar 2001 07:51:54 +0200 (SAST) (envelope-from mark@grondar.za)
Message-Id: <200103190551.f2J5psf30575@gratis.grondar.za>
To: John Baldwin
Cc: smp@FreeBSD.org
Subject: Re: Reliable KASSERT panic in CURRENT
References:
In-Reply-To: ; from John Baldwin "Sun, 18 Mar 2001 17:07:04 PST."
Date: Mon, 19 Mar 2001 07:52:37 +0200
From: Mark Murray
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> > I'm getting a reliable panic (KASSERT actually) in kern_mutex.c line
> > 215.
> >
> > It happens when I ^Z in vi on an intel SMP box running (very)
> > CURRENT.
>
> It's a bogus assertion now (like I said on IRC :-P).

I kept missing you on IRC :-) :-(

> We can actually
> run for a very little bit while we are in SSTOP just before we go to
> sleep, so on an SMP system during priority propagation we might hit
> a running process that's not in SZOMB or SRUN. You can either add
> SSTOP to the MPASS() there or just remove the assertion entirely. I'm
> leaning towards removing the assertion but don't feel too strongly
> about it either way.

Cool! Is this commitworthy (if it works)?

M
--
Mark Murray
Warning: this .sig is umop ap!sdn

From owner-freebsd-smp Mon Mar 19 9:30: 7 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from moby.geekhouse.net (moby.geekhouse.net [64.81.6.36]) by hub.freebsd.org (Postfix) with ESMTP id DB66137B719 for ; Mon, 19 Mar 2001 09:30:01 -0800 (PST) (envelope-from jhb@FreeBSD.org)
Received: from laptop.baldwin.cx (john@dhcp152.geekhouse.net [192.168.1.152]) by moby.geekhouse.net (8.11.0/8.9.3) with ESMTP id f2JHW8199533; Mon, 19 Mar 2001 09:32:09 -0800 (PST) (envelope-from jhb@FreeBSD.org)
Message-ID:
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <200103190551.f2J5psf30575@gratis.grondar.za>
Date: Mon, 19 Mar 2001 09:29:23 -0800 (PST)
From: John Baldwin
To: Mark Murray
Subject: Re: Reliable KASSERT panic in CURRENT
Cc: smp@FreeBSD.org
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On 19-Mar-01 Mark Murray wrote:
>> > I'm getting a reliable panic (KASSERT actually) in kern_mutex.c line
>> > 215.
>> >
>> > It happens when I ^Z in vi on an intel SMP box running (very)
>> > CURRENT.
>>
>> It's a bogus assertion now (like I said on IRC :-P).
>
> I kept missing you on IRC :-) :-(
>
>> We can actually
>> run for a very little bit while we are in SSTOP just before we go to
>> sleep, so on an SMP system during priority propagation we might hit
>> a running process that's not in SZOMB or SRUN. You can either add
>> SSTOP to the MPASS() there or just remove the assertion entirely. I'm
>> leaning towards removing the assertion but don't feel too strongly
>> about it either way.
>
> Cool! Is this commitworthy (if it works?)

Yes, and it will work. :)

--
John Baldwin -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/

From owner-freebsd-smp Wed Mar 21 2:38:32 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from unit11.support.nl (unit11.support.nl [195.114.229.252]) by hub.freebsd.org (Postfix) with ESMTP id C99D637B719; Wed, 21 Mar 2001 02:38:19 -0800 (PST) (envelope-from marcel@support.nl)
Received: from localhost (marcel@localhost) by unit11.support.nl (8.9.3/8.9.3/Debian 8.9.3-21) with ESMTP id LAA15979; Wed, 21 Mar 2001 11:38:28 +0100
Date: Wed, 21 Mar 2001 11:38:28 +0100 (CET)
From: Marcel Lemmen
To: freebsd-scsi@freebsd.org, freebsd-smp@freebsd.org
Subject: High load with FreeBSD 4.2-REL
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Hello,

I'm running FreeBSD 4.2-REL on a box dedicated to newsfeeding. The server
specs are PIII733/1024MB RAM/Adaptec 160/7x Seagate 18GB disks. The
problem is the load. The load averages are between 20-30!!! I've started a
discussion at the Diablo list, but they couldn't find anything strange.
The machine is working, currently at 30Mbit/s input and 80Mbit/s output,
which is a lot, but it shouldn't affect the load that much.
On a previous server (PIII450/256MB RAM/Adaptec 2940) the load was around
2, which should be normal (also a bandwidth eater..).

I think the problem is the Adaptec 29160 (PCI64) adapter in combination
with FreeBSD 4.2-REL. Or a kernel option I've forgotten ;)
The server mainboard is a Micro with a ServerWorks chipset and 2
processor slots (only 1 used).

I've disabled as much as possible in the kernel, even the SMP options
(should these be enabled?). I've also enabled softupdates on the spool and
set maxusers to 512 in the kernel. An "iostat -d 1" looks good.

Below I've attached the dmesg and a top. Please let me know if I've
forgotten something or if you have any other options!

Cheers,
Marcel Lemmen

--------------------------------------------------------------
| Marcel Lemmen   | Support Net BV            |              |
| System Engineer | beheer@support.nl         |    \|/       |
|                 |                           | ___.oO___|_  |
| Jobs@SupportNet | http://jobs.supportnet.nl |              |
--------------------------------------------------------------
(It's a snowman in the desert next to a saguaro)

Top:

last pid: 3184; load averages: 30.96, 31.26, 31.62 up 0+02:23:06 11:30:25
213 processes: 16 running, 197 sleeping
CPU states: 26.4% user, 0.0% nice, 24.8% system, 36.0% interrupt, 12.8% idle
Mem: 816M Active, 13M Inact, 134M Wired, 40M Cache, 112M Buf, 1820K Free
Swap: 512M Total, 17M Used, 495M Free, 3% Inuse

  PID USERNAME PRI NICE  SIZE    RES STATE  TIME   WCPU    CPU COMMAND
  168 news      62    0 2652K  1100K RUN    6:00  2.69%  2.69% diablo
 3097 news      52    0 1188K   368K RUN    0:10  2.05%  2.05% dnewslink
 2876 news      55    0 1032K   360K RUN    0:27  1.95%  1.95% dnewslink

DMESG:

Copyright (c) 1992-2000 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.2-RELEASE #12: Mon Mar 19 15:32:31 CET 2001
    support@news-x2.support.nl:/usr/src/sys/compile/NEWS-X2
Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 732985260 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (732.99-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x683 Stepping = 3
  Features=0x383fbff
real memory = 1073676288 (1048512K bytes)
avail memory = 1042333696 (1017904K bytes)
Preloaded elf kernel "kernel" at 0xc02a1000.
ccd0-3: Concatenated disk drivers
Pentium Pro MTRR support enabled
md0: Malloc disk
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
pci0: on pcib0
pci0: at 2.0 irq 11
xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xd080-0xd0ff mem 0xfeafdf00-0xfeafdf7f irq 5 at device 3.0 on pci0
xl0: Ethernet address: 00:50:04:35:0a:49
miibus0: on xl0
xlphy0: <3Com internal media interface> on miibus0
xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl1: <3Com 3c905B-TX Fast Etherlink XL> port 0xdc00-0xdc7f mem 0xfeafdf80-0xfeafdfff irq 15 at device 4.0 on pci0
xl1: Ethernet address: 00:50:04:35:0a:39
miibus1: on xl1
xlphy1: <3Com internal media interface> on miibus1
xlphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pci0: (vendor=0x8086, dev=0x1229) at 6.0 irq 9
isab0: at device 15.0 on pci0
isa0: on isab0
pci0: at 15.1
pci0: at 15.2 irq 0
pcib1: on motherboard
pci1: on pcib1
ahc0: port 0xe800-0xe8ff mem 0xfebff000-0xfebfffff irq 10 at device 1.0 on pci1
aic7892: Wide Channel A, SCSI Id=7, 32/255 SCBs
fdc0: at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: at port 0x60,0x64 on isa0
atkbd0: flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
Waiting 8 seconds for SCSI devices to settle
Mounting root from
ufs:/dev/da0s1a
da0 at ahc0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-3 device
da0: 80.000MB/s transfers (40.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da0: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)
da4 at ahc0 bus 0 target 4 lun 0
da4: Fixed Direct Access SCSI-3 device
da4: 80.000MB/s transfers (40.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da4: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)
da3 at ahc0 bus 0 target 3 lun 0
da3: Fixed Direct Access SCSI-3 device
da3: 80.000MB/s transfers (40.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da3: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)
da2 at ahc0 bus 0 target 2 lun 0
da2: Fixed Direct Access SCSI-3 device
da2: 80.000MB/s transfers (40.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da2: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)
da1 at ahc0 bus 0 target 1 lun 0
da1: Fixed Direct Access SCSI-3 device
da1: 80.000MB/s transfers (40.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da1: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)
da6 at ahc0 bus 0 target 6 lun 0
da6: Fixed Direct Access SCSI-3 device
da6: 80.000MB/s transfers (40.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da6: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)
da5 at ahc0 bus 0 target 5 lun 0
da5: Fixed Direct Access SCSI-3 device
da5: 80.000MB/s transfers (40.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da5: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)

From owner-freebsd-smp Wed Mar 21 9:33:43 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id E254A37B719; Wed, 21 Mar 2001 09:33:39 -0800 (PST) (envelope-from bright@fw.wintelcom.net)
Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id f2LHV2226396; Wed, 21 Mar 2001
        09:31:02 -0800 (PST)
Date: Wed, 21 Mar 2001 09:31:02 -0800
From: Alfred Perlstein
To: Marcel Lemmen
Cc: freebsd-scsi@FreeBSD.ORG, freebsd-smp@FreeBSD.ORG
Subject: Re: High load with FreeBSD 4.2-REL
Message-ID: <20010321093102.C12319@fw.wintelcom.net>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: ; from marcel@support.nl on Wed, Mar 21, 2001 at 11:38:28AM +0100
X-all-your-base: are belong to us.
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

* Marcel Lemmen [010321 02:38] wrote:
> Hello,
>
> I'm running FreeBSD 4.2-REL on a box dedicated for newsfeeding. The server
> specs are PIII733/1024MB Ram/Adaptec 160/7x Seagate 18GB disks. The
> problem is the load. The load averages are between 20-30!!! I've started a
> discussion at the Diablo list, but they couldn't find anything strange.

A high load average is fine; come back when you have a problem other
than disliking a number. :)

-Alfred

From owner-freebsd-smp Thu Mar 22 7:24: 4 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from hand.dotat.at (inch.demon.co.uk [194.222.223.128]) by hub.freebsd.org (Postfix) with ESMTP id 33E0E37B71D for ; Thu, 22 Mar 2001 07:24:01 -0800 (PST) (envelope-from fanf@dotat.at)
Received: from fanf by hand.dotat.at with local (Exim 3.20 #3) id 14g6vW-00077j-00 for freebsd-smp@freebsd.org; Thu, 22 Mar 2001 15:23:02 +0000
From: Tony Finch
To: freebsd-smp@freebsd.org
Subject: Locked data-structures and delayed writes.
Message-Id:
Date: Thu, 22 Mar 2001 15:23:02 +0000
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

I've been having an interesting discussion elsewhere with someone
about the problems caused by delayed writes within the CPU.
He's of the general opinion that everything is broken and can be very
enlightening when explaining why he thinks this but he can also be
frustratingly vague.

Anyway, the question at hand is what happens if two threads on
different CPUs are accessing the same locked data structure when the
CPU delays writes to RAM, i.e.

    acquire_lock(s);
    modify(s);
    release_lock(s);

Things are very broken if the write can be delayed until after the
lock is released. What prevents that?

A related question, but perhaps more implausible, is what happens if a
page is unmapped from underneath a delayed write. This is particularly
pathological if the destination page was mmapped and the program is
exiting: the write may be lost.

Tony.
--
f.a.n.finch fanf@covalent.net dot@dotat.at

From owner-freebsd-smp Thu Mar 22 8:43:49 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from prism.flugsvamp.com (cb58709-a.mdsn1.wi.home.com [24.17.241.9]) by hub.freebsd.org (Postfix) with ESMTP id 5DC0237B71B for ; Thu, 22 Mar 2001 08:43:46 -0800 (PST) (envelope-from jlemon@flugsvamp.com)
Received: (from jlemon@localhost) by prism.flugsvamp.com (8.11.0/8.11.0) id f2MGdgW18437; Thu, 22 Mar 2001 10:39:42 -0600 (CST) (envelope-from jlemon)
Date: Thu, 22 Mar 2001 10:39:42 -0600 (CST)
From: Jonathan Lemon
Message-Id: <200103221639.f2MGdgW18437@prism.flugsvamp.com>
To: dot@dotat.at, smp@freebsd.org
Subject: Re: Locked data-structures and delayed writes.
X-Newsgroups: local.mail.freebsd-smp
In-Reply-To:
Organization:
Cc:
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

In article you write:
>
>I've been having an interesting discussion elsewhere with someone
>about the problems caused by delayed writes within the CPU. He's of
>the general opinion that everything is broken and can be very
>enlightening when explaining why he thinks this but he can also be
>frustratingly vague.
>
>Anyway, the question at hand is what happens if two threads on
>different CPUs are accessing the same locked data structure when the
>CPU delays writes to RAM, i.e.
>
>    acquire_lock(s);
>    modify(s);
>    release_lock(s);
>
>Things are very broken if the write can be delayed until after the
>lock is released. What prevents that?

Uhm. Why would things be broken simply because the write is delayed?
This isn't a trick answer; if you don't subsequently read the location,
then why would it matter if the write is delayed? (modulo writes to
device memory; in these cases, you probably want to mark the writes as
uncacheable)

Now, if your modify involves doing a read of the location, then that
is a different question. The IA-32 architecture uses a strong cache
coherence model, so that writes to the same location appear to be seen
in the same order by all processors (write serialization). So even if
the memory write in modify() above is delayed until after the
release_lock() call, any reads from that location will return the new
data.

On other architectures, you may need to explicitly manage the cache
coherence yourself.

>A related question, but perhaps more implausible, is what happens if a
>page is unmapped from underneath a delayed write. This is particularly
>pathological if the destination page was mmapped and the program is
>exiting: the write may be lost.

On the Alpha, the write buffers are physically addressed, so if the
virtual address mapping is removed, it will not affect the delayed write,
since the write does not require the page tables. I'm not sure what
Intel does; I would guess probably the same thing.

I would assume that if some architecture uses virtually addressed blocks
in a write buffer (!!), then part of the task of a TLB flush would be to
complete the delayed write before removing the mapping.
--
Jonathan

From owner-freebsd-smp Thu Mar 22 9:29:50 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id DB49137B71C for ; Thu, 22 Mar 2001 09:29:47 -0800 (PST) (envelope-from bright@fw.wintelcom.net)
Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id f2MHTkw05202; Thu, 22 Mar 2001 09:29:46 -0800 (PST)
Date: Thu, 22 Mar 2001 09:29:46 -0800
From: Alfred Perlstein
To: Tony Finch
Cc: freebsd-smp@FreeBSD.ORG
Subject: Re: Locked data-structures and delayed writes.
Message-ID: <20010322092946.M9431@fw.wintelcom.net>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: ; from dot@dotat.at on Thu, Mar 22, 2001 at 03:23:02PM +0000
X-all-your-base: are belong to us.
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

* Tony Finch [010322 07:24] wrote:
>
> I've been having an interesting discussion elsewhere with someone
> about the problems caused by delayed writes within the CPU. He's of
> the general opinion that everything is broken and can be very
> enlightening when explaining why he thinks this but he can also be
> frustratingly vague.

Ah, the glass is not only half empty, but it will most likely shatter
and slice your lips off kinda guy... Yes, I know a couple of people
like that. :)

> Anyway, the question at hand is what happens if two threads on
> different CPUs are accessing the same locked data structure when the
> CPU delays writes to RAM, i.e.
>
>     acquire_lock(s);
>     modify(s);
>     release_lock(s);
>
> Things are very broken if the write can be delayed until after the
> lock is released. What prevents that?

Usually one of two things:
1) any locked op forces the CPU to flush all writes before it completes
2) there are explicit write/read barrier opcodes that people who design
   lock primitives are expected to use.

> A related question, but perhaps more implausible, is what happens if a
> page is unmapped from underneath a delayed write. This is particularly
> pathological if the destination page was mmapped and the program is
> exiting: the write may be lost.

Yes, this is why there's such a thing as a 'TLB shootdown', which I
think requires an IPI (interprocessor interrupt) to notify the other
CPUs that the system pagetables are being changed.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
Instead of asking why a piece of software is using "1970s technology,"
start asking why software is ignoring 30 years of accumulated wisdom.

From owner-freebsd-smp Thu Mar 22 9:32: 2 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id A23AA37B718 for ; Thu, 22 Mar 2001 09:32:00 -0800 (PST) (envelope-from bright@fw.wintelcom.net)
Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id f2MHVwF05355; Thu, 22 Mar 2001 09:31:58 -0800 (PST)
Date: Thu, 22 Mar 2001 09:31:58 -0800
From: Alfred Perlstein
To: Jonathan Lemon
Cc: dot@dotat.at, smp@FreeBSD.ORG
Subject: Re: Locked data-structures and delayed writes.
Message-ID: <20010322093158.N9431@fw.wintelcom.net>
References: <200103221639.f2MGdgW18437@prism.flugsvamp.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200103221639.f2MGdgW18437@prism.flugsvamp.com>; from jlemon@flugsvamp.com on Thu, Mar 22, 2001 at 10:39:42AM -0600
X-all-your-base: are belong to us.
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

* Jonathan Lemon [010322 08:44] wrote:
> In article you write:
> >
> >I've been having an interesting discussion elsewhere with someone
> >about the problems caused by delayed writes within the CPU. He's of
> >the general opinion that everything is broken and can be very
> >enlightening when explaining why he thinks this but he can also be
> >frustratingly vague.
> >
> >Anyway, the question at hand is what happens if two threads on
> >different CPUs are accessing the same locked data structure when the
> >CPU delays writes to RAM, i.e.
> >
> >    acquire_lock(s);
> >    modify(s);
> >    release_lock(s);
> >
> >Things are very broken if the write can be delayed until after the
> >lock is released. What prevents that?
>
> Uhm. Why would things be broken simply because the write is delayed?
> This isn't a trick answer; if you don't subsequently read the location,
> then why would it matter if the write is delayed? (modulo writes to
> device memory; in these cases, you probably want to mark the writes as
> uncacheable)
>
> Now, if your modify involves doing a read of the location, then that
> is a different question.

Heh, what if your write to 's' happens after lock release and without
it 's' is not consistent? You need a write barrier.
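
For illustration, the barrier pairing argued about above can be sketched
in C. This is only a minimal sketch, assuming C11 <stdatomic.h> rather
than the FreeBSD atomic(9) interface or the kernel's mutex code; the names
spin_lock_t, spin_acquire, spin_release, struct shared, and locked_add are
hypothetical and exist only for this example. The acquire ordering keeps
modify(s) from moving above the lock, and the release ordering keeps its
write from being delayed past the unlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin lock for illustration; not the FreeBSD mutex code. */
typedef struct {
    atomic_bool held;
} spin_lock_t;

/* Data protected by the lock; 'value' is deliberately a plain long. */
struct shared {
    spin_lock_t lock;
    long value;
};

static void spin_acquire(spin_lock_t *l)
{
    /*
     * Acquire ordering: memory accesses that follow in program order
     * cannot be reordered before the exchange that takes the lock.
     */
    while (atomic_exchange_explicit(&l->held, true, memory_order_acquire))
        ; /* spin until the current owner releases the lock */
}

static void spin_release(spin_lock_t *l)
{
    /* Releasing a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->held, memory_order_relaxed));

    /*
     * Release ordering: every write made while holding the lock becomes
     * visible before another CPU can observe the lock as free.
     */
    atomic_store_explicit(&l->held, false, memory_order_release);
}

/* The acquire_lock(s); modify(s); release_lock(s); pattern from the thread. */
static long locked_add(struct shared *s, long n)
{
    long result;

    spin_acquire(&s->lock);
    s->value += n;          /* modify(s) */
    result = s->value;
    spin_release(&s->lock);
    return result;
}
```

On a strongly ordered machine the two orderings may compile to little or
nothing extra; on a weakly ordered one the compiler has to emit whatever
barrier instructions the target needs.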

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
Represent yourself, show up at BABUG http://www.babug.org/

From owner-freebsd-smp Thu Mar 22 9:41:54 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from prism.flugsvamp.com (cb58709-a.mdsn1.wi.home.com [24.17.241.9]) by hub.freebsd.org (Postfix) with ESMTP id 7F20937B71A for ; Thu, 22 Mar 2001 09:41:51 -0800 (PST) (envelope-from jlemon@flugsvamp.com)
Received: (from jlemon@localhost) by prism.flugsvamp.com (8.11.0/8.11.0) id f2MHbiG20456; Thu, 22 Mar 2001 11:37:44 -0600 (CST) (envelope-from jlemon)
Date: Thu, 22 Mar 2001 11:37:44 -0600
From: Jonathan Lemon
To: Alfred Perlstein
Cc: Jonathan Lemon , dot@dotat.at, smp@FreeBSD.ORG
Subject: Re: Locked data-structures and delayed writes.
Message-ID: <20010322113744.T82645@prism.flugsvamp.com>
References: <200103221639.f2MGdgW18437@prism.flugsvamp.com> <20010322093158.N9431@fw.wintelcom.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <20010322093158.N9431@fw.wintelcom.net>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Thu, Mar 22, 2001 at 09:31:58AM -0800, Alfred Perlstein wrote:
> * Jonathan Lemon [010322 08:44] wrote:
> > In article you write:
> > >
> > >I've been having an interesting discussion elsewhere with someone
> > >about the problems caused by delayed writes within the CPU. He's of
> > >the general opinion that everything is broken and can be very
> > >enlightening when explaining why he thinks this but he can also be
> > >frustratingly vague.
> > >
> > >Anyway, the question at hand is what happens if two threads on
> > >different CPUs are accessing the same locked data structure when the
> > >CPU delays writes to RAM, i.e.
> > >
> > >    acquire_lock(s);
> > >    modify(s);
> > >    release_lock(s);
> > >
> > >Things are very broken if the write can be delayed until after the
> > >lock is released. What prevents that?
> >
> > Uhm. Why would things be broken simply because the write is delayed?
> > This isn't a trick answer; if you don't subsequently read the location,
> > then why would it matter if the write is delayed? (modulo writes to
> > device memory; in these cases, you probably want to mark the writes as
> > uncacheable)
> >
> > Now, if your modify involves doing a read of the location, then that
> > is a different question.
>
> Heh, what if your write to 's' happens after lock release and without
> it 's' is not consistent? You need a write barrier.

Well, cache coherency and memory ordering are two different things.
If the architecture has a relaxed memory ordering (say, release
consistency) then you will need a write barrier to enforce ordering
of the lock with respect to modify(). But that is a different question
from just a delayed write.
--
Jonathan

From owner-freebsd-smp Thu Mar 22 10: 4:16 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from gratis.grondar.za (grouter.grondar.za [196.7.18.65]) by hub.freebsd.org (Postfix) with ESMTP id BE7FB37B71F for ; Thu, 22 Mar 2001 10:04:11 -0800 (PST) (envelope-from mark@grondar.za)
Received: from grondar.za (root@gratis.grondar.za [196.7.18.133]) by gratis.grondar.za (8.11.1/8.11.1) with ESMTP id f2MI3tf50897; Thu, 22 Mar 2001 20:03:55 +0200 (SAST) (envelope-from mark@grondar.za)
Message-Id: <200103221803.f2MI3tf50897@gratis.grondar.za>
To: Tony Finch
Cc: freebsd-smp@FreeBSD.ORG
Subject: Re: Locked data-structures and delayed writes.
References:
In-Reply-To: ; from Tony Finch "Thu, 22 Mar 2001 15:23:02 GMT."
Date: Thu, 22 Mar 2001 20:05:04 +0200
From: Mark Murray
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> Anyway, the question at hand is what happens if two threads on
> different CPUs are accessing the same locked data structure when the
> CPU delays writes to RAM, i.e.
>
>     acquire_lock(s);
>     modify(s);
>     release_lock(s);
>
> Things are very broken if the write can be delayed until after the
> lock is released. What prevents that?

"man atomic", and look at the "acquire" and "release" memory barriers.

M
--
Mark Murray
Warning: this .sig is umop ap!sdn

From owner-freebsd-smp Thu Mar 22 11: 1:29 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from meow.osd.bsdi.com (meow.osd.bsdi.com [204.216.28.88]) by hub.freebsd.org (Postfix) with ESMTP id 627EC37B720 for ; Thu, 22 Mar 2001 11:01:26 -0800 (PST) (envelope-from jhb@FreeBSD.org)
Received: from laptop.baldwin.cx (john@jhb-laptop.osd.bsdi.com [204.216.28.241]) by meow.osd.bsdi.com (8.11.2/8.11.2) with ESMTP id f2MIxsG80547; Thu, 22 Mar 2001 10:59:54 -0800 (PST) (envelope-from jhb@FreeBSD.org)
Message-ID:
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To:
Date: Thu, 22 Mar 2001 10:59:42 -0800 (PST)
From: John Baldwin
To: Tony Finch
Subject: RE: Locked data-structures and delayed writes.
Cc: freebsd-smp@FreeBSD.org
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On 22-Mar-01 Tony Finch wrote:
>
> I've been having an interesting discussion elsewhere with someone
> about the problems caused by delayed writes within the CPU. He's of
> the general opinion that everything is broken and can be very
> enlightening when explaining why he thinks this but he can also be
> frustratingly vague.
>
> Anyway, the question at hand is what happens if two threads on
> different CPUs are accessing the same locked data structure when the
> CPU delays writes to RAM, i.e.
>
>     acquire_lock(s);
>     modify(s);
>     release_lock(s);
>
> Things are very broken if the write can be delayed until after the
> lock is released. What prevents that?

Memory barriers. When we acquire a lock, we enforce a memory barrier to
ensure that the data accesses needed to actually obtain the lock are
completed before we perform any 'sensitive' operations. Secondly, we use
another memory barrier during the release to ensure that all 'sensitive'
operations are finished before the lock is released.

> A related question, but perhaps more implausible, is what happens if a
> page is unmapped from underneath a delayed write. This is particularly
> pathological if the destination page was mmapped and the program is
> exiting: the write may be lost.

If the program doesn't need the data, who cares if it is lost? If the data
is not program specific (e.g. kernel data structures) then the page won't
be unmapped. :) However, it is actually a concern to make sure that if the
data is still in the cache, it doesn't get written out later when some
other program is using this page. This can be handled in various ways
depending on what cache architecture is being used. For an excellent
treatment of this topic, see "Unix Systems for Modern Architectures:
Symmetric Multiprocessing and Caching for Kernel Programmers" by Curt
Schimmel ISBN 0-201-63338-8.

> Tony.
> --
> f.a.n.finch fanf@covalent.net dot@dotat.at

--
John Baldwin -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"
- http://www.FreeBSD.org/

From owner-freebsd-smp Thu Mar 22 12: 2:14 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from linuxpower.p00t.net (mke-24-167-255-186.wi.rr.com [24.167.255.186]) by hub.freebsd.org (Postfix) with ESMTP id E080837B719 for ; Thu, 22 Mar 2001 12:02:07 -0800 (PST) (envelope-from tduffey@wi.rr.com)
Received: from localhost (trout@localhost) by linuxpower.p00t.net (8.11.3/8.11.3) with ESMTP id f2MK27F11149 for ; Thu, 22 Mar 2001 14:02:07 -0600
Date: Thu, 22 Mar 2001 14:02:07 -0600 (CST)
From: Tom Duffey
To: freebsd-smp@freebsd.org
Subject: IBM Netfinity 3500 SMP problem
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Dear Fellow FreeBSD Users and Developers,

I'm attempting to make FreeBSD 4.2-RELEASE run with SMP support on the
aforementioned Netfinity machine. The kernel boots fine w/o SMP, but the
SMP kernel hangs somewhere after "Waiting 15 seconds for SCSI devices to
settle down" appears.

I've checked the archives and see many mentions of similar trouble but no
solutions or conclusions. Any help is appreciated. The kernel works w/o
SMP and I'm using the defaults when attempting to boot with SMP.
Here is the mptable output:

===============================================================================

MPTable, version 2.0.15

-------------------------------------------------------------------------------

MP Floating Pointer Structure:

  location:                     EBDA
  physical address:             0x0009e1d0
  signature:                    '_MP_'
  length:                       16 bytes
  version:                      1.4
  checksum:                     0xd6
  mode:                         Virtual Wire

-------------------------------------------------------------------------------

MP Config Table Header:

  physical address:             0x0009e1e0
  signature:                    'PCMP'
  base table length:            244
  version:                      1.4
  checksum:                     0xef
  OEM ID:                       'IBM ENSW'
  Product ID:                   'NF 6000R SMP'
  OEM table pointer:            0x00000000
  OEM table size:               0
  entry count:                  22
  local APIC address:           0xfee00000
  extended table length:        168
  extended table checksum:      117

-------------------------------------------------------------------------------

MP Config Base Table Entries:

--
Processors:     APIC ID  Version  State         Family  Model  Step  Flags
                 3       0x11     BSP, usable   6       8      3     0x0301
                 0       0x11     AP, usable    6       8      3     0x0301
--
Bus:            Bus ID  Type
                 0      PCI
                 1      PCI
                 2      ISA
--
I/O APICs:      APIC ID  Version  State   Address
                14       0x11     usable  0xfec00000
                13       0x11     usable  0xfec01000
--
I/O Ints:       Type    Polarity  Trigger   Bus ID  IRQ   APIC ID  PIN#
                INT     conforms  conforms  2       1     14       1
                INT     conforms  conforms  2       0     14       2
                INT     conforms  conforms  2       3     14       3
                INT     conforms  conforms  2       4     14       4
                INT     conforms  conforms  2       6     14       6
                INT     conforms  conforms  2       7     14       7
                INT     conforms  conforms  2       8     14       8
                INT     conforms  conforms  2       12    14       12
                INT     conforms  conforms  2       13    14       13
                INT     conforms  conforms  2       14    14       14
                INT     conforms  conforms  0       2:A   13       11
                INT     conforms  conforms  0       15:A  14       10
                INT     conforms  conforms  1       3:A   13       12
--
Local Ints:     Type    Polarity  Trigger   Bus ID  IRQ   APIC ID  PIN#
                NMI     conforms  conforms  2       0     255      1
                ExtINT  conforms  conforms  2       0     255      0

-------------------------------------------------------------------------------

MP Config Extended Table Entries:

--
System Address Space
 bus ID: 0 address type: memory address
 address base: 0xa0000
 address range: 0x20000
--
System Address Space
 bus ID: 0 address type: memory address
 address base: 0xd0000
 address range: 0x10000
--
System Address Space
 bus ID: 0 address type: memory address
 address base: 0xfd000000
 address range: 0x3000000
--
System Address Space
 bus ID: 0 address type: prefetch address
 address base: 0xf0000000
 address range: 0xd000000
--
System Address Space
 bus ID: 1 address type: memory address
 address base: 0xee000000
 address range: 0x2000000
--
System Address Space
 bus ID: 1 address type: prefetch address
 address base: 0x8000000
 address range: 0xe6000000
--
System Address Space
 bus ID: 0 address type: I/O address
 address base: 0x0
 address range: 0x2040
--
System Address Space
 bus ID: 1 address type: I/O address
 address base: 0x2040
 address range: 0xdfc0
--
Bus Heirarchy
 bus ID: 2 bus info: 0x01 parent bus ID: 0

-------------------------------------------------------------------------------

# SMP kernel config file options:

# Required:
options         SMP                     # Symmetric MultiProcessor Kernel
options         APIC_IO                 # Symmetric (APIC) I/O

# Optional (built-in defaults will work in most cases):
#options        NCPU=2                  # number of CPUs
#options        NBUS=3                  # number of busses
#options        NAPIC=2                 # number of IO APICs
#options        NINTR=24                # number of INTs

===============================================================================

Best Regards,

Tom Duffey

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

From owner-freebsd-smp Thu Mar 22 12:55:31 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from meow.osd.bsdi.com (meow.osd.bsdi.com [204.216.28.88])
    by hub.freebsd.org (Postfix) with ESMTP id 6CFB237B719
    for ; Thu, 22 Mar 2001 12:55:29 -0800 (PST)
    (envelope-from jhb@FreeBSD.org)
Received: from laptop.baldwin.cx (john@jhb-laptop.osd.bsdi.com [204.216.28.241])
    by meow.osd.bsdi.com (8.11.2/8.11.2) with ESMTP id f2MKtBG84070
    for ; Thu, 22 Mar 2001 12:55:12 -0800 (PST)
    (envelope-from jhb@FreeBSD.org)
Message-ID:
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
Date: Thu, 22 Mar 2001 12:55:00 -0800 (PST)
From: John Baldwin
To: smp@FreeBSD.org
Subject: SMPng Status Report
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Until such time as a new SMPng Project Manager is appointed or whatever, I
figured that someone needs to at least send out an occasional status report,
so here goes:

- Bosko Milekic is presently changing the msleep()/wakeup() code in the mbuf
  subsystem to make use of condition variables instead. He is also looking
  into some optimizations in the mbuf subsystem in terms of mutex locks in
  conjunction with some potential internal changes along with Alfred
  Perlstein.
- I have just finished overhauling the witness code to not be mutex
  specific, but to instead use abstract lock objects. Each lock object has
  a lock class that specifies properties of all locks of a certain type.
  Individual lock objects also have additional properties that can override
  and/or add to the class properties. I haven't updated the sx locks yet,
  but that should be a 15 minute job. Once this is done sx locks can safely
  be used throughout the system. The first ones in widespread use will
  replace the lockmgr locks currently backing the allproc and proctree
  locks. I've also implemented a small critical_enter/exit API that will be
  used to replace the restore/save_intr() functions that came in with the
  original SMPng commit. With this, disable/enable_intr() will go back to
  being trivial one instruction functions that are i386 and ia64 specific.

The todo list still resides at its old location:
http://www.FreeBSD.org/~jasone/smp/

Some of the notable items on the todo list for those who would like to help
out but are not sure where to start include removing the syscall MP safe
flag in favor of explicit mtx_lock/unlock's of Giant for all syscalls and
removing nested includes of in other kernel headers.
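[Editor's sketch] The lock class / lock object split described in the report above can be pictured roughly like this. The struct layouts, flag names, and the helper toy_lock_properties() are invented for illustration; they are an assumption about the general shape and not the actual witness or lock object definitions in the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical property flags; the real kernel flags may differ. */
#define TOY_LC_SLEEPABLE	0x01	/* owner may sleep while holding it */
#define TOY_LC_RECURSABLE	0x02	/* owner may acquire it again */
#define TOY_LO_QUIET		0x04	/* per-object: do not log operations */

/* Properties shared by every lock of one type (e.g. all sx locks). */
struct toy_lock_class {
	const char	*lc_name;
	unsigned int	 lc_flags;
};

/* One individual lock: points at its class, may add or override. */
struct toy_lock_object {
	const struct toy_lock_class *lo_class;
	const char	*lo_name;
	unsigned int	 lo_flags;	/* properties added by this lock */
	unsigned int	 lo_clear;	/* class properties it turns off */
};

/*
 * Effective properties: start from the class, drop whatever this
 * object overrides, then add the object's own extras.
 */
static unsigned int
toy_lock_properties(const struct toy_lock_object *lo)
{
	assert(lo != NULL && lo->lo_class != NULL);
	return ((lo->lo_class->lc_flags & ~lo->lo_clear) | lo->lo_flags);
}
```

A checker written against toy_lock_object only needs to consult the class for type-wide rules, which is one plausible reason for making the witness code independent of mutexes.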

Some of the big projects that should be coming very soon include:

- Removing or at least ignoring all the priorities passed in to
  msleep/tsleep now that priority propagation works.
- Converting lockmgr locks over to using mutexes and sx locks.

--

John Baldwin -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

From owner-freebsd-smp Thu Mar 22 17: 4: 0 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from linuxpower.p00t.net (mke-24-167-255-186.wi.rr.com [24.167.255.186])
    by hub.freebsd.org (Postfix) with ESMTP id B951837B71D
    for ; Thu, 22 Mar 2001 17:03:55 -0800 (PST)
    (envelope-from tduffey@wi.rr.com)
Received: from localhost (trout@localhost)
    by linuxpower.p00t.net (8.11.3/8.11.3) with ESMTP id f2N13sK11477
    for ; Thu, 22 Mar 2001 19:03:54 -0600
Date: Thu, 22 Mar 2001 19:03:54 -0600 (CST)
From: Tom Duffey
To: freebsd-smp@freebsd.org
Subject: More IBM Netfinity 3500 SMP problem
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

The output of mptable shows that this system has three busses, but FreeBSD
defaults to 4. So, I attempted to recompile a kernel using more specific
SMP options, namely:

    options SMP
    options APIC_IOB
    options NBUS=3

But 'config' complains:

    smp:61: unknown option "NBUS".

Are the optional SMP parameters no longer available? Does it
matter? Please let me know if there's anything I can do to help make
FreeBSD's SMP support work with this hardware.
Best Regards, Tom Duffey To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Thu Mar 22 17:23:47 2001 Delivered-To: freebsd-smp@freebsd.org Received: from femail12.sdc1.sfba.home.com (femail12.sdc1.sfba.home.com [24.0.95.108]) by hub.freebsd.org (Postfix) with ESMTP id 1F7AF37B71F for ; Thu, 22 Mar 2001 17:23:42 -0800 (PST) (envelope-from jgowdy@home.com) Received: from cx443070b ([24.0.36.170]) by femail12.sdc1.sfba.home.com (InterMail vM.4.01.03.20 201-229-121-120-20010223) with SMTP id <20010323012341.HEMK7377.femail12.sdc1.sfba.home.com@cx443070b>; Thu, 22 Mar 2001 17:23:41 -0800 Message-ID: <000a01c0b338$4bc3d680$aa240018@cx443070b> From: "Jeremiah Gowdy" To: "Tom Duffey" , References: Subject: Re: More IBM Netfinity 3500 SMP problem Date: Thu, 22 Mar 2001 17:26:32 -0800 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4133.2400 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org ----- Original Message ----- From: "Tom Duffey" To: Sent: Thursday, March 22, 2001 5:03 PM Subject: More IBM Netfinity 3500 SMP problem > The output of mptable shows that this system has three busses, but FreeBSD > defaults to 4. So, I attempted to recompile a kernel using more specific > SMP options, namely: > > options SMP > options APIC_IOB > options NBUS=3 > > But 'config' complains: > > smp:61: unknown option "NBUS". > > Are the optional SMP paramaters no longer available? Does it > matter? Please let me know if there's anything I can do to help make > FreeBSD's SMP support work with this hardware. You shouldn't need those options. 
I use the 3500 with Dual PIII 500s, and it works fine, just options SMP To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Thu Mar 22 17:36:45 2001 Delivered-To: freebsd-smp@freebsd.org Received: from meow.osd.bsdi.com (meow.osd.bsdi.com [204.216.28.88]) by hub.freebsd.org (Postfix) with ESMTP id 6438437B71E for ; Thu, 22 Mar 2001 17:36:42 -0800 (PST) (envelope-from jhb@FreeBSD.org) Received: from laptop.baldwin.cx (john@jhb-laptop.osd.bsdi.com [204.216.28.241]) by meow.osd.bsdi.com (8.11.2/8.11.2) with ESMTP id f2N1aIG93081; Thu, 22 Mar 2001 17:36:18 -0800 (PST) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.4.0 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <000a01c0b338$4bc3d680$aa240018@cx443070b> Date: Thu, 22 Mar 2001 17:36:10 -0800 (PST) From: John Baldwin To: Jeremiah Gowdy Subject: Re: More IBM Netfinity 3500 SMP problem Cc: freebsd-smp@FreeBSD.org, Tom Duffey Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On 23-Mar-01 Jeremiah Gowdy wrote: > > ----- Original Message ----- > From: "Tom Duffey" > To: > Sent: Thursday, March 22, 2001 5:03 PM > Subject: More IBM Netfinity 3500 SMP problem > > >> The output of mptable shows that this system has three busses, but FreeBSD >> defaults to 4. So, I attempted to recompile a kernel using more specific >> SMP options, namely: >> >> options SMP >> options APIC_IOB >> options NBUS=3 >> >> But 'config' complains: >> >> smp:61: unknown option "NBUS". >> >> Are the optional SMP paramaters no longer available? Does it >> matter? Please let me know if there's anything I can do to help make >> FreeBSD's SMP support work with this hardware. > > You shouldn't need those options. I use the 3500 with Dual PIII 500s, and > it works fine, just options SMP Err, and APIC_IO I hope. 
The kernel now examines the MP table and dynamically figures out how many
busses, cpus, etc. to deal with on the fly, so it should work fine without
needing NBUS tweaked.

--

John Baldwin -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

From owner-freebsd-smp Thu Mar 22 19:55:42 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from linuxpower.p00t.net (mke-24-167-255-186.wi.rr.com [24.167.255.186])
    by hub.freebsd.org (Postfix) with ESMTP id 92E5337B71A
    for ; Thu, 22 Mar 2001 19:55:35 -0800 (PST)
    (envelope-from tduffey@wi.rr.com)
Received: from localhost (trout@localhost)
    by linuxpower.p00t.net (8.11.3/8.11.3) with ESMTP id f2N3tYG11743
    for ; Thu, 22 Mar 2001 21:55:34 -0600
Date: Thu, 22 Mar 2001 21:55:34 -0600 (CST)
From: Tom Duffey
To: freebsd-smp@freebsd.org
Subject: Re: More IBM Netfinity 3500 SMP problem
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Thanks for clearing up the confusion regarding optional SMP kernel
parameters.

Unfortunately, my 3500 with dual PIII 733's still fails to boot if I
compile the kernel with "options SMP" and "options APIC_IO." Is there
anything out of the ordinary that must be done to make SMP work with the
Netfinity 3500 M20's? Is it possible to step through the kernel boot
procedure to determine what exactly is causing the machine to stop? Or, is
there any information I can provide that would be useful to the SMP
developers to resolve this issue?
Thanks, Tom Duffey > On 23-Mar-01 Jeremiah Gowdy wrote: > > > > ----- Original Message ----- > > From: "Tom Duffey" > > To: > > Sent: Thursday, March 22, 2001 5:03 PM > > Subject: More IBM Netfinity 3500 SMP problem > > > > > >> The output of mptable shows that this system has three busses, but FreeBSD > >> defaults to 4. So, I attempted to recompile a kernel using more specific > >> SMP options, namely: > >> > >> options SMP > >> options APIC_IOB > >> options NBUS=3 > >> > >> But 'config' complains: > >> > >> smp:61: unknown option "NBUS". > >> > >> Are the optional SMP paramaters no longer available? Does it > >> matter? Please let me know if there's anything I can do to help make > >> FreeBSD's SMP support work with this hardware. > > > > You shouldn't need those options. I use the 3500 with Dual PIII 500s, and > > it works fine, just options SMP > > Err, and APIC_IO I hope. The kernel now examines the MP table and dynamically > figures out how many busses, cpus, etc. to deal with on the fly, so it should > work fine without needing NBUS tweaked. > > -- > > John Baldwin -- http://www.FreeBSD.org/~jhb/ > PGP Key: http://www.baldwin.cx/~john/pgpkey.asc > "Power Users Use the Power to Serve!" 
- http://www.FreeBSD.org/
>

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

From owner-freebsd-smp Thu Mar 22 20:27:15 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from ambrisko.com (adsl-216-103-208-74.dsl.snfc21.pacbell.net [216.103.208.74])
    by hub.freebsd.org (Postfix) with ESMTP id 52DF437B71A
    for ; Thu, 22 Mar 2001 20:27:13 -0800 (PST)
    (envelope-from ambrisko@ambrisko.com)
Received: (from ambrisko@localhost)
    by ambrisko.com (8.11.2/8.11.2) id f2N4RBu19168;
    Thu, 22 Mar 2001 20:27:11 -0800 (PST)
    (envelope-from ambrisko)
From: Doug Ambrisko
Message-Id: <200103230427.f2N4RBu19168@ambrisko.com>
Subject: Re: More IBM Netfinity 3500 SMP problem
In-Reply-To: "from Tom Duffey at Mar 22, 2001 09:55:34 pm"
To: Tom Duffey
Date: Thu, 22 Mar 2001 20:27:11 -0800 (PST)
Cc: freebsd-smp@FreeBSD.ORG
X-Mailer: ELM [version 2.4ME+ PL82 (25)]
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Tom Duffey writes:
| Thanks for clearing up the confusion regarding optional SMP kernel
| paramaters.
|
| Unfortunately, my 3500 with dual PIII 733's still fails to boot if I
| compile the kernel with "options SMP" and "options APIC_IO." Is there
| anything out of the ordinary that must be done to make SMP work with the
| Netfinity 3500 M20's?

FYI the dual CPU IBM IntelliStation M Pro 6868 needed an upgraded BIOS to
work with SMP. It was a BIOS error; the release notes for the BIOS
mention problems with SMP. The IntelliStation uses the Intel 840 chipset.
The prior 6889 used the BX chipset and was just fine. Since IBM rebrands
the same hardware under different names, you could be using the same board
and then need the BIOS update.

Doug A.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

From owner-freebsd-smp Fri Mar 23 1:56:32 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from pasteur.alize-sfl.com (pasteur.alize-sfl.com [195.6.237.2])
    by hub.freebsd.org (Postfix) with ESMTP id 0FE9237B718
    for ; Fri, 23 Mar 2001 01:56:25 -0800 (PST)
    (envelope-from receiver@alize-sfl.com)
Received: (from receiver@localhost)
    by pasteur.alize-sfl.com (8.9.3/8.9.3) id KAA29579
    for freebsd-smp@freebsd.org; Fri, 23 Mar 2001 10:56:24 +0100
Date: Fri, 23 Mar 2001 10:56:24 +0100
From: receiver@alize-sfl.com
To: freebsd-smp@freebsd.org
Subject: SMP / trap12 / heat problem.
Message-ID: <20010323105624.C28104@pasteur.alize-sfl.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.2i
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Hi all,

yesterday i bought an ASUS CUV4X-D with 2 PIII 800 & 4x256MB SDRAM.
(i've been waiting for an SMP machine for 4 years ;-) ).

please note that i'm totally new to the SMP world, so please be indulgent :)

i've encountered different problems:

* first, when everything is alright (temperature disabled in BIOS, bi-pro
  kernel), the system can eat up to 45% (avg 15-35%) of CPU when building
  world -j 4. do you think it's normal ? (/usr/src and /usr/obj are on the
  same 40MB/s SCSI disk (seagate) on an AHA2940UW (not U2W))

* second, when building world -j 4, i see some (not many) calcru errors,
  only with ld and as, not make or cc1. some = 12 exactly (the build world
  didn't terminate : i checked LINT, and sysctl'd kern.timecounter.method,
  then the console was *FLOODED* with 'microuptime went backward' messages,
  i switched to X to be a bit cooler to type things, xconsole froze, then X
  froze, keyboard too, then i powered off and went to bed. ;-( )

* third, bios problem : when all hardware monitors in BIOS are on, in
  monoproc kernel, everything is fine. when booting my SMP kernel, the
  machine starts *beeping* near the 'waiting 5 seconds for scsi devices to
  settle'. if i disable CPU#0 temperature watch in BIOS, everything is fine.

* independently (sorry, i don't know if this word exists), healthd:
  * does not find CPU temp / fan properties in ISA mode.
  * cannot find smb0, even if my kernel is compiled with support for it
    (it worked with the same options on my old ABIT P2something).
  * reports 6.86V for 5V, 14.xx for 12V, and 4.x for 3.3VCORE.
    (the bios reports 5.02, 12.8 and 3.01).

* when i try AUTO_EOI_1 and AUTO_EOI_2 and NTIMECOUNTER=20 in my SMP kernel,
  nmbd (at boot) kills the kernel which says : abort trap 12 : page fault
  while in kernel mode, and so on. i didn't have time this morning to test
  which of the 3 options is faultive (does this exist too ?).

I suspect my power supply is not so good. It's a 300W, but not for SMP
mother boards, it has no EAUX pin. i will try to change it at noon. (i've
got two 250W to test, only to see if healthd reports better voltage).

I will try to swap the CPU's too (CPU#0 is at ~170K (75°C) and CPU#1 is at
~100K (49°C), according to the bios hardware monitor). Generally, the #0 is
*much* hotter than #1.

I think i've said everything. All the hardware is brand new.

if you want my KERNEL file, my dmesg, i will give you all that in the
afternoon (it's currently 10:36AM in France), and my box is at home.

note that for the moment, i don't know how to debug the kernel and catch
the page fault message and all the magic kdb things... i've got only a
little knowledge of that sort of thing. but i can learn (and i want to).

thanks for your comments and your help, keep up the good work, i love
freebsd.

PS: does anyone know why at BIOS boot i get :
------------------------------
blahblah ACPI rev blah
CPU1: Intel blahblah
CPU2:*Intel blahblah
     ^ this star ?
------------------------------
could this be a problem on CPU2 (#1 for us) ? or is the mb just happy to
see a second CPU ?

thanks again,

Olivier Cortes
Free Software Admin

PS2: i checked the mailing lists archives (for -questions, -hackers, and
-smp), and have read smp(4), mptable(1), sync(*), fsync(*), syncer(*). i
didn't find anything relevant for my problems. overall, i rebooted ~10
times during 4 hours, and i love vinum (fscking a 2x30 striped IDE storage
is quick :) ).

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

From owner-freebsd-smp Fri Mar 23 8:28:29 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from linuxpower.p00t.net (mke-24-167-255-186.wi.rr.com [24.167.255.186])
    by hub.freebsd.org (Postfix) with ESMTP id 037B237B71D
    for ; Fri, 23 Mar 2001 08:28:26 -0800 (PST)
    (envelope-from tduffey@wi.rr.com)
Received: from localhost (trout@localhost)
    by linuxpower.p00t.net (8.11.3/8.11.3) with ESMTP id f2NGPhH00226
    for ; Fri, 23 Mar 2001 10:28:24 -0600
Date: Fri, 23 Mar 2001 10:25:43 -0600 (CST)
From: Tom Duffey
To: freebsd-smp@freebsd.org
Subject: More detailed debugging info for Netfinity SMP lock
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Thanks to everyone for the help so far. This morning I upgraded the
system BIOS to the latest provided by IBM (v1.06) but still cannot boot
the system with an SMP enabled kernel.
However, I enabled the kernel debugger and came up with the following:

stopped at atkbd_isa_intr+0x19; ret
stopped at Xresume1+0x35: cli
stopped at Xresume1+0x36: lock andl $-0x3,iactive
stopped at Xresume1+0x3e: pushl $0xc02e3140
stopped at Xresume1+0x43: call s_lock
stopped at s_lock: movel 0x4(%esp),$edx
stopped at setlock: movel %fs:0x94,%ecx
stopped at setlock+0x7: incl %ecx
stopped at setlock+0x8: movl $0,%eax
stopped at setlock+0xd: lock cmpxchgl %ecx,0(%edx)
panic: rslock: cpu: 1, addr 0xc02e3140, lock: 0x01000001
mp_lock = 01000003; cpuid = 1; lapic.id = 00000000
Debugger("panic")
Stopped at Debugger+0x34: movb $0,in_Debugger,597
db>

I am not a kernel hacker so unfortunately I don't know where to go next.
Is this information useful to anyone? I realize this might be some obscure
problem since others claim to be doing SMP on similar Netfinity machines.
So, should I continue to post debugging information to this list or just
forget about it? The machine runs fine without SMP so I will probably get
by for the time being but it would be nice to hear whether or not this
issue is going to be looked at by someone and possibly fixed in the future.

Once again, thanks for your time.
Best Regards, Tom Duffey To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Fri Mar 23 18:27: 5 2001 Delivered-To: freebsd-smp@freebsd.org Received: from meow.osd.bsdi.com (meow.osd.bsdi.com [204.216.28.88]) by hub.freebsd.org (Postfix) with ESMTP id 8F5E737B71B for ; Fri, 23 Mar 2001 18:26:42 -0800 (PST) (envelope-from jhb@FreeBSD.org) Received: from laptop.baldwin.cx (john@jhb-laptop.osd.bsdi.com [204.216.28.241]) by meow.osd.bsdi.com (8.11.2/8.11.2) with ESMTP id f2O2PwG30915; Fri, 23 Mar 2001 18:25:59 -0800 (PST) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.4.0 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: Date: Fri, 23 Mar 2001 18:26:08 -0800 (PST) From: John Baldwin To: Tom Duffey Subject: RE: More detailed debugging info for Netfinity SMP lock Cc: freebsd-smp@FreeBSD.org Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On 23-Mar-01 Tom Duffey wrote: > Thanks to everyone for the help so far. This morning I upgraded the > system BIOS to the latest provided by IBM (v1.06) but still cannot boot > the system with an SMP enabled kernel. However, I enabled the kernel > debugger and came up with the following: > > stopped at atkbd_isa_intr+0x19; ret > stopped at Xresume1+0x35: cli > stopped at Xresume1+0x36: lock andl $-0x3,iactive > stopped at Xresume1+0x3e: pushl $0xc02e3140 > stopped at Xresume1+0x43: call s_lock > stopped at s_lock: movel 0x4(%esp),$edx > stopped at setlock: movel %fs:0x94,%ecx > stopped at setlock+0x7: incl %ecx > stopped at setlock+0x8: movl $0,%eax > stopped at setlock+0xd: lock cmpxchgl %ecx,0(%edx) > panic: rslock: cpu: 1, addr 0xc02e3140, lock: 0x01000001 > mp_lock = 01000003; cpuid = 1; lapic.id = 00000000 > Debugger("panic") > Stopped at Debugger+0x34: movb $0,in_Debugger,597 > db> Type 'trace' here to see where we are. 
It is recursing on a simplelock which is very bad. It is probably a kernel bug but could be something else. -- John Baldwin -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Sat Mar 24 10:52:20 2001 Delivered-To: freebsd-smp@freebsd.org Received: from linuxpower.p00t.net (mke-24-167-255-186.wi.rr.com [24.167.255.186]) by hub.freebsd.org (Postfix) with ESMTP id CCC2037B71A; Sat, 24 Mar 2001 10:52:12 -0800 (PST) (envelope-from tduffey@wi.rr.com) Received: from localhost (trout@localhost) by linuxpower.p00t.net (8.11.3/8.11.3) with ESMTP id f2OIqB601670; Sat, 24 Mar 2001 12:52:11 -0600 Date: Sat, 24 Mar 2001 12:52:11 -0600 (CST) From: Tom Duffey To: John Baldwin Cc: freebsd-smp@FreeBSD.org Subject: RE: More detailed debugging info for Netfinity SMP lock In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Here's the trace, taken directly after the panic shown below. Hopefully I'm doing this right ... 

db> trace
Debugger(c0x798d2) at Debugger+0x34
panic(c0258f5a,1,c02e3140,10000001,c024a26b) at panic+0xa4
bsl1(0,a0,0) at bsl1
selected_apic_ipi(1,a0,0,a,ff80dee8) at selected_apic_ipi+0x3a
stop_cpus(1,0,0,1,ff80df34) at stop_cpus+0x21
kdb_trap(a,0,ff80df3c) at kdb_trap+0xe5
trap(18,10,10,ff80a864,0) at trap+0x454
calltrap() at calltrap+0x17
--- trap 0xa, eip = 0xc0258f2d, esp = 0xff80df7c, ebp = 0xff80dfdc
setlock(3a60000,0,0,1,0) at setlock+0x11
vm_page_zero_idle(c02596c4,

Fatal trap 12: page fault while in kernel mode
mp_lock = 01000007; cpuid = 1, lapic.id = 00000000
fault virtual address   = 0xff80e000
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0247c30
stack pointer           = 0x10:0xff80dbc8
frame pointer           = 0x10:0xff80dbcc
code segment            = hbase 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, defs32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = tty <- SMP: XXX
kernel: type 12 trap, code 0

Let me know if I've done something incorrectly or missed anything.

Thanks,

Tom Duffey

On Fri, 23 Mar 2001, John Baldwin wrote:

>
> On 23-Mar-01 Tom Duffey wrote:
> > Thanks to everyone for the help so far. This morning I upgraded the
> > system BIOS to the latest provided by IBM (v1.06) but still cannot boot
> > the system with an SMP enabled kernel.
However, I enabled the kernel > > debugger and came up with the following: > > > > stopped at atkbd_isa_intr+0x19; ret > > stopped at Xresume1+0x35: cli > > stopped at Xresume1+0x36: lock andl $-0x3,iactive > > stopped at Xresume1+0x3e: pushl $0xc02e3140 > > stopped at Xresume1+0x43: call s_lock > > stopped at s_lock: movel 0x4(%esp),$edx > > stopped at setlock: movel %fs:0x94,%ecx > > stopped at setlock+0x7: incl %ecx > > stopped at setlock+0x8: movl $0,%eax > > stopped at setlock+0xd: lock cmpxchgl %ecx,0(%edx) > > panic: rslock: cpu: 1, addr 0xc02e3140, lock: 0x01000001 > > mp_lock = 01000003; cpuid = 1; lapic.id = 00000000 > > Debugger("panic") > > Stopped at Debugger+0x34: movb $0,in_Debugger,597 > > db> > > Type 'trace' here to see where we are. It is recursing on a simplelock which > is very bad. It is probably a kernel bug but could be something else. > > -- > > John Baldwin -- http://www.FreeBSD.org/~jhb/ > PGP Key: http://www.baldwin.cx/~john/pgpkey.asc > "Power Users Use the Power to Serve!" 
- http://www.FreeBSD.org/ > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Sat Mar 24 16:49:37 2001 Delivered-To: freebsd-smp@freebsd.org Received: from matt.MUNICH.v-net.org (u57n248.syd.eastlink.ca [24.222.57.248]) by hub.freebsd.org (Postfix) with ESMTP id BE1BB37B71A for ; Sat, 24 Mar 2001 16:49:27 -0800 (PST) (envelope-from matt@researcher.com) Received: from researcher.com (Windozzze [192.168.8.2]) by matt.MUNICH.v-net.org (8.9.3/8.9.3) with ESMTP id UAA32813 for ; Sat, 24 Mar 2001 20:49:19 -0400 (AST) (envelope-from matt@researcher.com) Message-ID: <3ABD405F.10802@researcher.com> Date: Sat, 24 Mar 2001 20:48:31 -0400 From: Matt Rudderham User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; m18) Gecko/20010131 Netscape6/6.01 X-Accept-Language: en MIME-Version: 1.0 To: freebsd-smp@freebsd.org Subject: SMP Server Wont Reboot Properly Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Hi, I'm having a problem with a new server. It is dual PIII - 800s / 256MB running FreeBSD 4.1.1-Release. When doing a reboot, it appears to stop both CPUs, syncs disks as normal, but then locks where it is just about to restart. Does anyone have any ideas on what could be up? Custom kernel built with the standard SMP options, etc... 
Thanks - Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Sat Mar 24 17: 4:34 2001 Delivered-To: freebsd-smp@freebsd.org Received: from matt.MUNICH.v-net.org (u57n248.syd.eastlink.ca [24.222.57.248]) by hub.freebsd.org (Postfix) with ESMTP id 7D6C637B71B for ; Sat, 24 Mar 2001 17:04:29 -0800 (PST) (envelope-from matt@researcher.com) Received: from researcher.com (Windozzze [192.168.8.2]) by matt.MUNICH.v-net.org (8.9.3/8.9.3) with ESMTP id VAA32845; Sat, 24 Mar 2001 21:04:24 -0400 (AST) (envelope-from matt@researcher.com) Message-ID: <3ABD43E7.3050400@researcher.com> Date: Sat, 24 Mar 2001 21:03:35 -0400 From: Matt Rudderham User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; m18) Gecko/20010131 Netscape6/6.01 X-Accept-Language: en MIME-Version: 1.0 To: Barney Wolff , freebsd-smp@freebsd.org Subject: Re: SMP Server Wont Reboot Properly References: <3ABD405F.10802@researcher.com> <20010324195351.A58196@mx.databus.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org That was my first thinking, did a compile apm, etc out, and then recompiled with it in. Same result. Any other ideas? - Matt Barney Wolff wrote: > Try making sure all the power management stuff is disabled. > I don't think it plays nice with smp. > Barney Wolff > > On Sat, Mar 24, 2001 at 08:48:31PM -0400, Matt Rudderham wrote: > >> Hi, >> I'm having a problem with a new server. It is dual PIII - 800s / 256MB >> running FreeBSD 4.1.1-Release. When doing a reboot, it appears to stop >> both CPUs, syncs disks as normal, but then locks where it is just about >> to restart. Does anyone have any ideas on what could be up? Custom >> kernel built with the standard SMP options, etc... 
Thanks >> - Matt >> >> >> To Unsubscribe: send mail to majordomo@FreeBSD.org >> with "unsubscribe freebsd-smp" in the body of the message > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message