From: Brad Zimmerman
To: freebsd-smp@freebsd.org
Date: Mon, 7 Apr 2003 22:32:49 -0500
Subject: problems getting SMP kernel to run on IBM intellistation
List-Id: FreeBSD SMP implementation group

I'm running an IBM IntelliStation 6898-18U with 128MB of RAM and two Intel P2s running at 333MHz. I recently upgraded the box from a single 266MHz P2. The box had been running great with the single 266MHz processor, so, for the geek factor and because processors are so cheap, I decided to spend the $20, buy two new processors and "max the box out." Well, I installed both processors, upgraded the BIOS when prompted, and let the BIOS set up the two processors more or less automatically. The BIOS thinks everything is fine with both processors (for what little that is worth). I did change the DIP switches so that the box is configured for 333MHz.
Now, the box still runs great under the GENERIC kernel (no SMP enabled), or under pretty much every other kernel revision I have tried, -except- when I enable SMP in the kernel. I turn on the three SMP options listed in the kernel config (they were the only three I could find in the LINT config), compile and install the new kernel, reboot and... the box freezes up hard shortly after FreeBSD prints its message about enabling processor 1.

I've run through this several times and with several versions of FreeBSD (4.7, the 4.8 release candidates, and 4.8). They all wedge up, although with FreeBSD 4.8 it actually wedges earlier than before. The 4.8 release candidates locked up about 5 minutes after the box finished booting. I could log in on the console and everything, but as soon as I tried to do much of anything, the box would freeze and that was that.

I'm open to suggestions. I can't guarantee that SMP even works properly on this box, as FreeBSD is the only OS I've run on it since installing the dual processors. If there is a way to test this that I don't know about, that would certainly help narrow it down to either a hardware problem or an OS problem.

Now, on that front, I did take both processors out and re-seated them - same problem. I went down to one processor and put the "blank" back into the 2nd processor slot. The box ran fine. I pulled that processor out and put the 2nd one into slot 0. The box ran fine. Both processors seem to work _individually_ but not with SMP.

I did search through the archives of -questions and -smp and didn't find anything too relevant. I've avoided 5.0 since it seemed to me that, with only two processors, I shouldn't need 5.0's SMP features. I also use the box for things every day and would like it to be relatively stable. Honestly, I can live without SMP on the box; I just wanted to get it working if I could.

Thank you. If I've left out relevant information, I'll be happy to provide it.
bz

From: Sêrêciya Kurdistanî
To: freebsd-smp@freebsd.org
Cc: bzimmer@megavision.com
Date: Mon, 7 Apr 2003 20:40:05 -0700
Subject: Re: problems getting SMP kernel to run on IBM intellistation

Hello,

> Now, on that front, I did take both processors out, re-seated them -
> same problem. I went down to one processor and put the "blank" back
> into the 2nd processor slot. The box ran fine. I pulled that processor
> out and put the 2nd one into the slot 0. The box ran fine. Both
> processors seem to work _individually_ but not with SMP.

What specifically are the lines you enabled for SMP in the kernel?
I believe that with 4.7+ there are only two lines needed:

options SMP # Symmetric MultiProcessor Kernel
options APIC_IO # Symmetric (APIC) I/O

You may want to have the following in there too:

options SYSVSHM # include support for shared memory
options SHMSEG=9 # max shared memory segments per process
options SYSVSEM # include support for semaphores
options SYSVMSG # include support for message queues

Try that and let us know what happens.

-- 
Welat xwe ava nake, dest bidin hevdu, pist nedin tu dijminî
Riya azadiyê ne hêsan e, hêviya xwe bernedin, dema me nêzîk e.

Hevaltî bi kesên du rû nekin, hevaltî bi hevdu ra bikin
Ne ji hevaltiya wan kesên pêxwas û rû dirêj, ne bi wan kesên xwînperest, ne jî ji yên din.

-Sêrêciya Kurdistanî

translation provided on request: sereciya@kurdistan.ath.cx

From: Attila Nagy
To: "Cagle, John (ISS-Houston)"
Cc: freebsd-smp@freebsd.org, Gabor Esperon
Date: Tue, 8 Apr 2003 11:05:36 +0200 (CEST)
Subject: RE: DUAL XEON 2.40Mhz SMP APIC_IO Troubles
Hello,

> If you're using -current (which has a patch for this issue) you'll need
> to also enable HTT.

I have a Fujitsu-Siemens F250 machine here (dual 1.8 GHz Xeon) with the same problem. Enabling or disabling HTT does not help. Disabling ACPI helps.

----------[ Free Software ISOs - http://www.fsn.hu/?f=download ]----------
Attila Nagy  e-mail: Attila.Nagy@fsn.hu
Free Software Network (FSN.HU)  phone @work: +361 210 1415 (194)
cell.: +3630 306 6758

From: John Baldwin
To: Harti Brandt
Cc: jeff@FreeBSD.ORG, smp@FreeBSD.ORG
Date: Tue, 08 Apr 2003 15:31:40 -0400 (EDT)
Subject: Re: malloc.9 locking section
On 17-Mar-2003 Harti Brandt wrote:
> Index: malloc.9
> ===================================================================
> RCS file: /home/ncvs/src/share/man/man9/malloc.9,v
> retrieving revision 1.30
> diff -u -r1.30 malloc.9
> --- malloc.9 24 Feb 2003 05:53:27 -0000 1.30
> +++ malloc.9 17 Mar 2003 15:06:14 -0000
>
> [snip]

Looks good to me. While you are at it, please kill the following from the manpage (if you aren't already doing so):

Any calls to malloc() or free() when holding a vnode(9) interlock, will cause a LOR (Lock Order Reversal) due to the intertwining of VM Objects and Vnodes.

-- 
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/

From: Alan Cox
To: John Baldwin
Date: Tue, 8 Apr 2003 15:02:56 -0500
Cc: jeff@FreeBSD.ORG, smp@FreeBSD.ORG
Subject: Re: malloc.9 locking section

On Tue, Apr 08, 2003 at 03:31:40PM -0400, John Baldwin wrote:
>
> On 17-Mar-2003 Harti Brandt wrote:
> > Index: malloc.9
> > ===================================================================
> > RCS file: /home/ncvs/src/share/man/man9/malloc.9,v
> > retrieving revision 1.30
> > diff -u -r1.30 malloc.9
> > --- malloc.9 24 Feb 2003 05:53:27 -0000 1.30
> > +++ malloc.9 17 Mar 2003 15:06:14 -0000
> >
> > [snip]
>
> Looks good to me. While you are at it, please kill the following
> from the manpage (if you aren't already doing so):
>
> Any calls to malloc() or free() when holding a vnode(9) interlock, will
> cause a LOR (Lock Order Reversal) due to the intertwining of VM Objects
> and Vnodes.
>

Why? The above statement is true.
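The manpage sentence above refers to a LOR (lock order reversal): one code path takes lock X and then lock Y, another takes Y and then X, and the two can deadlock against each other. As a rough illustration only, here is a toy checker in the spirit of FreeBSD's witness(4) (it is not witness's actual code): it records "b was acquired while a was held" and flags a later acquisition that reverses a recorded order. The names `lo_acquire`, `lo_release` and `lock_names` are hypothetical, and a single held-lock list stands in for one thread's state.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy lock-order checker (hypothetical; not FreeBSD's witness(4) code).
 * Locks are small integer IDs and a single held-lock list stands in for
 * one thread's state, so this only does bookkeeping, no real locking. */

#define MAX_LOCKS 8
#define MAX_HELD  8

static const char *lock_names[MAX_LOCKS];
/* order_seen[a][b] != 0 means lock b was acquired while lock a was held. */
static int order_seen[MAX_LOCKS][MAX_LOCKS];
static int held[MAX_HELD];
static int nheld;

static const char *name_of(int id)
{
    return lock_names[id] != NULL ? lock_names[id] : "(unnamed)";
}

/* Note the acquisition of lock `id`. Returns 1 if it reverses an order
 * recorded earlier (a LOR); otherwise records the new order and returns 0. */
static int lo_acquire(int id)
{
    int reversal = 0;

    assert(id >= 0 && id < MAX_LOCKS);
    for (int i = 0; i < nheld; i++) {
        int h = held[i];
        if (order_seen[id][h]) {
            printf("lock order reversal: %s -> %s, established %s -> %s\n",
                   name_of(h), name_of(id), name_of(id), name_of(h));
            reversal = 1;
        } else {
            order_seen[h][id] = 1;
        }
    }
    assert(nheld < MAX_HELD);
    held[nheld++] = id;
    return reversal;
}

/* Note the release of lock `id`; locks need not be released in LIFO order. */
static void lo_release(int id)
{
    for (int i = 0; i < nheld; i++) {
        if (held[i] == id) {
            memmove(&held[i], &held[i + 1],
                    (size_t)(nheld - i - 1) * sizeof held[0]);
            nheld--;
            return;
        }
    }
    assert(!"released a lock that is not held");
}
```

For example, naming ID 0 "vm object" and ID 1 "vnode interlock", acquiring 0 then 1 once and later acquiring 1 then 0 makes the second `lo_acquire` report a reversal. The labels are only illustrative; the sketch does not encode which order the kernel actually treats as correct.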
Alan

From: John Baldwin
To: Alan Cox
Cc: smp@FreeBSD.ORG, jeff@FreeBSD.ORG
Date: Tue, 08 Apr 2003 16:47:14 -0400 (EDT)
Subject: Re: malloc.9 locking section

On 08-Apr-2003 Alan Cox wrote:
> On Tue, Apr 08, 2003 at 03:31:40PM -0400, John Baldwin wrote:
>>
>> On 17-Mar-2003 Harti Brandt wrote:
>> > Index: malloc.9
>> > ===================================================================
>> > RCS file: /home/ncvs/src/share/man/man9/malloc.9,v
>> > retrieving revision 1.30
>> > diff -u -r1.30 malloc.9
>> > --- malloc.9 24 Feb 2003 05:53:27 -0000 1.30
>> > +++ malloc.9 17 Mar 2003 15:06:14 -0000
>> >
>> > [snip]
>>
>> Looks good to me.
>> While you are at it, please kill the following
>> from the manpage (if you aren't already doing so):
>>
>> Any calls to malloc() or free() when holding a vnode(9) interlock, will
>> cause a LOR (Lock Order Reversal) due to the intertwining of VM Objects
>> and Vnodes.
>>
>
> Why? The above statement is true.

It's highly specific. Harti is adding wording to say "don't hold any locks when calling malloc() with M_WAITOK," not just vnode interlocks. If vnode interlocks are even a problem with M_NOWAIT, then perhaps you could add wording for that case to Harti's statement ("even with M_NOWAIT one cannot hold vnode interlocks..."). My main concern is that I don't want a situation where malloc(9) grows a huge laundry list of all the locks in the kernel saying they can't be held when it is called. Such a list would be hard to maintain and would easily rot, be incomplete, etc.

From: Brad Zimmerman
To: freebsd-smp@freebsd.org
Date: Tue, 8 Apr 2003 18:19:36 -0500
Cc: Sêrêciya Kurdistanî
Subject: Re: problems getting SMP kernel to run on IBM intellistation

On Monday, April 7, 2003, at 10:40 PM, Sêrêciya Kurdistanî wrote:

> Hello,
>
>> Now, on that front, I did take both processors out, re-seated them -
>> same problem. I went down to one processor and put the "blank" back
>> into the 2nd processor slot. The box ran fine. I pulled that processor
>> out and put the 2nd one into the slot 0. The box ran fine. Both
>> processors seem to work _individually_ but not with SMP.
>
> What specifically are the lines you enabled for SMP in the kernel?
>
> I believe that with 4.7+ there are only two lines needed:
>
> options SMP # Symmetric MultiProcessor Kernel
> options APIC_IO # Symmetric (APIC) I/O
>
> You may want to have the following in there too:
>
> options SYSVSHM # include support for shared memory
> options SHMSEG=9 # max shared memory segments per process
> options SYSVSEM # include support for semaphores
> options SYSVMSG # include support for message queues
>
> Try that and let us know what happens.
>

Okay, I just got a chance to try it, and the system got a little further. I already had the following enabled:

options SMP
options APIC_IO
options SYSVSHM
options SYSVSEM
options SYSVMSG

I added the following:

options SHMSEG=9

I took out the following:

options HTT

Configured, compiled, installed, rebooted. This time, the system booted all of the way up. I got the message telling me my second processor was enabled and then got a login prompt. Definitely better.
I logged in and ran top just to see what was happening. It appeared that processes were being assigned to both processors and everything went great... for about 20 seconds, and then the machine froze again. I rebooted and got the same result. I'm now back on my GENERIC kernel and I'm good again - but with only one processor working.

I appreciate the ideas, and they definitely seemed to help, but I've still got something not quite right. I'm open to trying other ideas or posting any information that might be useful.

Thanks!

From: Andre Guibert de Bruet
To: current@freebsd.org
Cc: freebsd-smp@freebsd.org
Date: Wed, 9 Apr 2003 03:31:34 -0400 (EDT)
Subject: ATA activity under SMP affecting USB interrupts?
    (Was Re: usb scheduling overruns under system load)

Hello again,

I've been doing some additional testing. It appears that the scheduling overrun errors printed from ohci.c:1192 only happen under moderate to high disk load. I've noticed that the message gets printed under the following conditions:

- (un)bzip2'ing, (un)gzip'ing larger files (20MB+).
- cvsup'ing ports or system sources.
- moving large files between disks.
- compiling world and building a port.
- large ftp file transfers over the local network.

Everything in this list seems to involve disk activity. Unless I'm missing something, is it fair to assume that something in ATA-land may be affecting or delaying USB interrupt delivery?

The disks in this system are identical Maxtor 200GB drives with 8MB cache, each connected to a separate channel on the bundled Maxtor ATA133 card.

ad4: 194481MB [395136/16/63] at ata2-master UDMA133
ad6: 194481MB [395136/16/63] at ata3-master UDMA133

I've made the output of 'pciconf -vl' available at: http://siliconlandmark.com/staff/andre/files/BLING.pciconf

In order to narrow down the list of possible causes, I installed the dnetc port to see if CPU load would cause the message to be printed. I then proceeded to disable ACPI, but that didn't help any. I guess I'll copy freebsd-smp@ on this one... Any ideas?

> Andre Guibert de Bruet | Enterprise Software Consultant >
> Silicon Landmark, LLC. | http://siliconlandmark.com/ >

On Sun, 6 Apr 2003, Andre Guibert de Bruet wrote:
> I've just installed -current on another one of my machines and have been
> seeing "usb0: 1 scheduling overruns" messages printed to the console over
> and over under load.
> As the load increases, so does the frequency of the
> messages. After a bit of experimenting, I've found that large amounts of
> network traffic (I used ftp) and bzip2'ing a large dump cause the message
> to be printed roughly ten times a second.
>
> This computer had been running -stable for a few months prior and did not
> exhibit this behavior. It's a dual Athlon MP 2000+ running on an Asus
> A7M-266D with an onboard AMD-brand OHCI USB controller. The only USB
> devices that are connected are a Microsoft Internet Keyboard Pro and an
> IntelliMouse Optical (1.0A). Unplugging both devices doesn't fix the
> problem.
>
> pciconf -lv thinks the following of the controller:
> ohci0@pci2:0:0: class=0x0c0310 card=0x80441043 chip=0x74491022 rev=0x07 hdr=0x00
>     vendor   = 'Advanced Micro Devices (AMD)'
>     device   = 'AMD-768 USB Controller'
>     class    = serial bus
>     subclass = USB
>
> usbdevs shows:
> addr 1: OHCI root hub, AMD
> addr 2: product 0x001c, Microsoft
> addr 3: Microsoft IntelliMouse® Optical, Microsoft
>
> The kernel config file and dmesg can be found at:
> http://siliconlandmark.com/staff/andre/files/BLING
> http://siliconlandmark.com/staff/andre/files/BLING.dmesg
>
> uname -a:
> FreeBSD bling.properkernel.com 5.0-CURRENT FreeBSD 5.0-CURRENT #1: Sun Apr 6 16:47:32 EDT 2003 root@bling.properkernel.com:/usr/src/sys/i386/compile/BLING i386
From: Alan Cox
To: John Baldwin
Cc: smp@FreeBSD.ORG, jeff@FreeBSD.ORG
Date: Wed, 9 Apr 2003 02:58:26 -0500
Subject: Re: malloc.9 locking section

On Tue, Apr 08, 2003 at 04:47:14PM -0400, John Baldwin wrote:
>
> On 08-Apr-2003 Alan Cox wrote:
> > On Tue, Apr 08, 2003 at 03:31:40PM -0400, John Baldwin wrote:
> >>
> >> On 17-Mar-2003 Harti Brandt wrote:
> >> > Index: malloc.9
> >> > ===================================================================
> >> > RCS file: /home/ncvs/src/share/man/man9/malloc.9,v
> >> > retrieving revision 1.30
> >> > diff -u -r1.30 malloc.9
> >> > --- malloc.9 24 Feb 2003 05:53:27 -0000 1.30
> >> > +++ malloc.9 17 Mar 2003 15:06:14 -0000
> >> >
> >> > [snip]
> >>
> >> Looks good to me. While you are at it, please kill the following
> >> from the manpage (if you aren't already doing so):
> >>
> >> Any calls to malloc() or free() when holding a vnode(9) interlock, will
> >> cause a LOR (Lock Order Reversal) due to the intertwining of VM Objects
> >> and Vnodes.
> >>
> >
> > Why? The above statement is true.
>
> It's highly specific. Harti is adding wording to say "don't hold any
> locks when calling malloc() with M_WAITOK," not just vnode interlocks.
> If vnode interlocks are even a problem with M_NOWAIT, then perhaps you
> could add wording for that case to Harti's statement ("even with M_NOWAIT
> one cannot hold vnode interlocks...").

Holding a vnode interlock is problematic regardless of whether M_WAITOK or M_NOWAIT is specified. It's a rather non-obvious special case. Even free() is problematic. In December or January, I recall there being several reported lock order reversals due to this. This inspired someone to add the above comment to the man page.

> ... My main concern is that I don't want
> a situation where malloc(9) grows a huge laundry list of all the locks
> in the kernel saying that can't be held when it is called. Such a list
> would be hard to maintain and would easily rot, be incomplete, etc.
>

I think this is a one-of-a-kind special case. The vnode interlock is the only lock in this part of the memory management system that gets shared with another part of the kernel.
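The two rules being debated can be written down as a small check. This is a userspace toy under stated assumptions: `struct held_locks`, `check_malloc`, `check_free`, `toy_malloc` and the `TOY_M_WAITOK`/`TOY_M_NOWAIT` flags are all hypothetical (the kernel's real M_WAITOK and M_NOWAIT come from <sys/malloc.h> and are not reproduced). It flags an M_WAITOK allocation while any lock is held, since such an allocation may sleep, and, following the point that the vnode interlock is a special case, flags a held vnode interlock for malloc() with either flag and for free().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical flags for this sketch; not the kernel's M_NOWAIT/M_WAITOK values. */
#define TOY_M_NOWAIT 0x1
#define TOY_M_WAITOK 0x2

/* Assumed per-thread lock bookkeeping, e.g. kept by a debugging lock layer. */
struct held_locks {
    int count;                /* number of locks currently held */
    int vnode_interlock_held; /* nonzero if one of them is a vnode interlock */
};

enum alloc_verdict {
    ALLOC_OK,
    ALLOC_SLEEP_WITH_LOCK,  /* M_WAITOK may sleep while a lock is held */
    ALLOC_VNODE_INTERLOCK   /* the special case: problematic with either flag */
};

static enum alloc_verdict check_malloc(int flags, const struct held_locks *hl)
{
    assert(hl != NULL);
    /* Checked first: per the thread, M_NOWAIT does not make this case safe. */
    if (hl->vnode_interlock_held)
        return ALLOC_VNODE_INTERLOCK;
    /* General rule: hold no locks across an allocation that may sleep. */
    if ((flags & TOY_M_WAITOK) && hl->count > 0)
        return ALLOC_SLEEP_WITH_LOCK;
    return ALLOC_OK;
}

static enum alloc_verdict check_free(const struct held_locks *hl)
{
    assert(hl != NULL);
    /* "Even free() is problematic" while a vnode interlock is held. */
    return hl->vnode_interlock_held ? ALLOC_VNODE_INTERLOCK : ALLOC_OK;
}

/* Userspace wrapper: report an unsafe context, then use the C allocator. */
static void *toy_malloc(size_t size, int flags, const struct held_locks *hl)
{
    enum alloc_verdict v = check_malloc(flags, hl);

    if (v != ALLOC_OK)
        fprintf(stderr, "toy_malloc: unsafe calling context (verdict %d)\n",
                (int)v);
    return malloc(size);
}
```

Keeping the vnode interlock as one named special case, rather than a list of every lock in the kernel, is one possible way to reconcile the maintainability concern with the point about vnode interlocks; the sketch does not claim this is how malloc(9) or its manpage was eventually changed.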
Alan

From: Aleksandr Melentiev
Reply-To: Aleksandr Melentiev
Date: Wed, 9 Apr 2003 01:31:16 -0700
Subject: System freezes with SMP support enabled

Hello all,

A quick note: I am not subscribed here yet, so please reply directly to me.

Problem:
Apparently SMP support and networking do not work well together here. I have tried both the 5.0 and 4.8 releases. Once I recompile the kernels with SMP support, I start getting constant 'watchdog timeout' messages on 5.0; the network halts and the system stops responding. The same happens on a 4.8 machine when I try to transfer files larger than 1MB to the SMP machine: the system freezes, though with no error messages. Nothing in /var/log/messages. Changing network cards and increasing NMBCLUSTERS doesn't help.
No such problems occur whatsoever without SMP support.

The motherboard is an Intel N440BX with two Pentium III 500MHz CPUs. Here's part of dmesg:

Apr 8 23:40:12 kronos /kernel: FreeBSD 4.8-RELEASE #4: Tue Apr 8 22:14:12 PDT 2003
Apr 8 23:40:12 kronos /kernel: alex@kronos.homeunix.org:/usr/obj/usr/src/sys/KRONOS
Apr 8 23:40:12 kronos /kernel: Timecounter "i8254" frequency 1193182 Hz
Apr 8 23:40:12 kronos /kernel: CPU: Pentium III/Pentium III Xeon/Celeron (498.75-MHz 686-class CPU)
Apr 8 23:40:12 kronos /kernel: Origin = "GenuineIntel" Id = 0x672 Stepping = 2
Apr 8 23:40:12 kronos /kernel: Features=0x383fbff
Apr 8 23:40:12 kronos /kernel: real memory = 536805376 (524224K bytes)
Apr 8 23:40:12 kronos /kernel: avail memory = 518299648 (506152K bytes)
Apr 8 23:40:12 kronos /kernel: Programming 24 pins in IOAPIC #0
Apr 8 23:40:12 kronos /kernel: IOAPIC #0 intpin 2 -> irq 0
Apr 8 23:40:12 kronos /kernel: FreeBSD/SMP: Multiprocessor motherboard
Apr 8 23:40:12 kronos /kernel: cpu0 (BSP): apic id: 1, version: 0x00040011, at 0xfee00000
Apr 8 23:40:12 kronos /kernel: cpu1 (AP): apic id: 0, version: 0x00040011, at 0xfee00000
Apr 8 23:40:12 kronos /kernel: io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec00000
Apr 8 23:40:12 kronos /kernel: Preloaded elf kernel "kernel" at 0xc03e4000.
Apr 8 23:40:12 kronos /kernel: Pentium Pro MTRR support enabled
Apr 8 23:40:13 kronos /kernel: APIC_IO: Testing 8254 interrupt delivery
Apr 8 23:40:13 kronos /kernel: APIC_IO: routing 8254 via IOAPIC #0 intpin 2
Apr 8 23:40:13 kronos /kernel: SMP: AP CPU #1 Launched!

Has anyone experienced anything like this? Any possible solutions?

Thanks in advance.
Alex.

P.S. Please reply directly to me as I am not subscribed to the list yet. Thanks.
From: Terry Lambert
To: Aleksandr Melentiev
Cc: freebsd-smp@freebsd.org
Date: Wed, 09 Apr 2003 03:17:21 -0700
Subject: Re: System freezes with SMP support enabled

Aleksandr Melentiev wrote:
> A quick note: I am not subscribed here yet, please reply directly to me.
>
> Problem:
> Apparently SMP support and network do not work well together here. I have
> tried both 5.0 and 4.8 releases. Once I recompile the kernels with SMP
> support, I start getting constant 'watchdog timeout' on 5.0, network would
> halt and system would not respond.
> Same happens on a 4.8 machine too when I
> try to transfer files >1mb in size to an SMP machine, system would freeze,
> no error messages though. Nothing in /var/log/messages. Changing network
> cards and increasing NMBCLUSTERS doesn't help. No such problems occur
> whatsoever without SMP support.
>
> Motherboard is an Intel N440BX with two Pentium III 500MHz CPUs.

What are both network cards? Are they fxp?

There are a couple of possibilities to consider...

The first is that if both cards are identical (same vendor, etc.),
you might want to use a network card from a different vendor, to
make sure it's not the network card driver.

Second, it seems to me that there's a possibility for a deadlock
if an interrupt comes in on one CPU, and an ithread to handle it
is scheduled to run on a different CPU. You may want to try using
SCHED_4BSD to see if that changes anything.

--
Terry

From owner-freebsd-smp@FreeBSD.ORG Wed Apr 9 09:00:25 2003
In-Reply-To: <20030409075826.GM1112@cs.rice.edu>
Date: Wed, 09 Apr 2003 12:00:21 -0400 (EDT)
From: John Baldwin
To: Alan Cox
cc: jeff@FreeBSD.ORG
cc: smp@FreeBSD.ORG
Subject: Re: malloc.9 locking section

On 09-Apr-2003 Alan Cox wrote:
> On Tue, Apr 08, 2003 at 04:47:14PM -0400, John Baldwin wrote:
>>
>> On 08-Apr-2003 Alan Cox wrote:
>> > On Tue, Apr 08, 2003 at 03:31:40PM -0400, John Baldwin wrote:
>> >>
>> >> On 17-Mar-2003 Harti Brandt wrote:
>> >> > Index: malloc.9
>> >> > ===================================================================
>> >> > RCS file: /home/ncvs/src/share/man/man9/malloc.9,v
>> >> > retrieving revision 1.30
>> >> > diff -u -r1.30 malloc.9
>> >> > --- malloc.9 24 Feb 2003 05:53:27 -0000 1.30
>> >> > +++ malloc.9 17 Mar 2003 15:06:14 -0000
>> >> >
>> >> > [snip]
>> >>
>> >> Looks good to me. While you are at it, please kill the following
>> >> from the manpage (if you aren't already doing so):
>> >>
>> >> Any calls to malloc() or free() when holding a vnode(9) interlock, will
>> >> cause a LOR (Lock Order Reversal) due to the intertwining of VM Objects
>> >> and Vnodes.
>> >>
>> >
>> > Why? The above statement is true.
>>
>> It's highly specific. Harti is adding wording to say "don't hold any
>> locks when calling malloc() with M_WAITOK," not just vnode interlocks.
>> If vnode interlocks are even a problem with M_NOWAIT, then perhaps you
>> could add wording for that case to Harti's statement ("even with M_NOWAIT
>> one cannot hold vnode interlocks...").
>
> Holding a vnode interlock is problematic regardless of whether M_WAITOK
> or M_NOWAIT is specified. It's a rather non-obvious special case. Even
> free() is problematic. In December or January, I recall there being
> several reported lock order reversals due to this. This inspired someone
This inspired someone > to add the above comment to the man page. > >> ... My main concern is that I don't want >> a situation where malloc(9) grows a huge laundry list of all the locks >> in the kernel saying that can't be held when it is called. Such a list >> would be hard to maintain and would easily rot, be incomplete, etc. >> > > I think this is a one-of-a-kind special case. The vnode interlock is > the only lock in this part of the memory management system that gets > shared with another part of the kernel. Ok. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ From owner-freebsd-smp@FreeBSD.ORG Wed Apr 9 09:00:35 2003 Return-Path: Delivered-To: freebsd-smp@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8A66137B401 for ; Wed, 9 Apr 2003 09:00:33 -0700 (PDT) Received: from mail.speakeasy.net (mail11.speakeasy.net [216.254.0.211]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6630E43FE3 for ; Wed, 9 Apr 2003 09:00:30 -0700 (PDT) (envelope-from jhb@FreeBSD.org) Received: (qmail 10042 invoked from network); 9 Apr 2003 16:00:34 -0000 Received: from unknown (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender )encrypted SMTP for ; 9 Apr 2003 16:00:34 -0000 Received: from laptop.baldwin.cx (gw1.twc.weather.com [216.133.140.1]) by server.baldwin.cx (8.12.8/8.12.8) with ESMTP id h39G0ROv049701; Wed, 9 Apr 2003 12:00:28 -0400 (EDT) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.5.4 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <3E93F331.E92FC7DF@mindspring.com> Date: Wed, 09 Apr 2003 12:00:27 -0400 (EDT) From: John Baldwin To: Terry Lambert cc: Aleksandr Melentiev cc: freebsd-smp@freebsd.org Subject: Re: System freezes with SMP support enabled X-BeenThere: freebsd-smp@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list 
On 09-Apr-2003 Terry Lambert wrote:
> What are both network cards? Are they fxp?
>
> There are a couple of possibilities to consider...
>
> The first is that if both cards are identical (same vendor, etc.),
> you might want to use a network card from a different vendor, to
> make sure it's not the network card driver.

This sounds like a sensible possibility and a worthy test.

> Second, it seems to me that there's a possibility for a deadlock
> if an interrupt comes in on one CPU, and an ithread to handle it
> is scheduled to run on a different CPU. You may want to try using
> SCHED_4BSD to see if that changes anything.

Huh? Where in the code do you see this happening exactly? All the
bits you should need to look at for this are in ithread_schedule()
and ithread_loop() in sys/kern/kern_intr.c. Not only that, but 4.8
doesn't have ithreads so I doubt seriously that this is causing the
lockups on 4.x.

--
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/

From owner-freebsd-smp@FreeBSD.ORG Wed Apr 9 14:58:12 2003
Message-ID: <001b01c2fee3$16e1e710$0300a8c0@kronos>
From: "Aleksandr Melentiev"
To: "John Baldwin" ,
Date: Wed, 9 Apr 2003 14:57:36 -0700
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0018_01C2FEA8.5B9C5C90"
Subject: Re: System freezes with SMP support enabled
Reply-To: Aleksandr Melentiev

This is a multi-part message in MIME format.

------=_NextPart_000_0018_01C2FEA8.5B9C5C90
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Here's the exact behavior when SMP support is enabled. Note that one of the network cards is onboard: fxp0.

5.0-RELEASE: Frequent, spontaneous and persistent 'watchdog timeout' errors with the different network cards I have tried, including an SMC 1255 and a 3Com 3C905-TX. An Intel Pro/100+ showed 'device timeout'. In all cases, the system would stop responding.
In most cases the network connection would disappear. Sometimes it would happen again within the first second after reboot. The onboard network card didn't show any errors! Only the PCI cards did.

4.8-RELEASE: Same behavior as above, only with no error messages whatsoever: the network connection goes down and the system locks up. It is not spontaneous; it happens only when I try to transfer >1MB files locally via ftp (I tried several different ftpd servers and clients too). Other ways of high-speed transferring may be affected too. However, if I throttle my ftp client's upload speed to 15KBytes/sec, the file transfers without a problem and the system stays stable.

None of the above happens when SMP support is disabled. I am attaching a dmesg taken with SMP support enabled. Might it be because of the PCI bridge?

Regards,
Alex

----- Original Message -----
From: "John Baldwin"
To: "Terry Lambert"
Cc: ; "Aleksandr Melentiev"
Sent: Wednesday, April 09, 2003 9:00 AM
Subject: Re: System freezes with SMP support enabled

>
> On 09-Apr-2003 Terry Lambert wrote:
> > What are both network cards? Are they fxp?
> >
> > There are a couple of possibilities to consider...
> >
> > The first is that if both cards are identical (same vendor, etc.),
> > you might want to use a network card from a different vendor, to
> > make sure it's not the network card driver.
>
> This sounds like a sensible possibility and a worthy test.
>
> > Second, it seems to me that there's a possibility for a deadlock
> > if an interrupt comes in on one CPU, and an ithread to handle it
> > is scheduled to run on a different CPU. You may want to try using
> > SCHED_4BSD to see if that changes anything.
>
> Huh? Where in the code do you see this happening exactly? All the
> bits you should need to look at for this are in ithread_schedule()
> and ithread_loop() in sys/kern/kern_intr.c. Not only that, but 4.8
> doesn't have ithreads so I doubt seriously that this is causing the
> lockups on 4.x.
>
> --
>
> John Baldwin <>< http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/

------=_NextPart_000_0018_01C2FEA8.5B9C5C90
Content-Type: application/octet-stream; name="dmesg"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="dmesg"

Apr 8 23:40:12 kronos /kernel: Copyright (c) 1992-2003 The FreeBSD Project.
Apr 8 23:40:12 kronos /kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Apr 8 23:40:12 kronos /kernel: The Regents of the University of California. All rights reserved.
Apr 8 23:40:12 kronos /kernel: FreeBSD 4.8-RELEASE #4: Tue Apr 8 22:14:12 PDT 2003
Apr 8 23:40:12 kronos /kernel: alex@kronos.homeunix.org:/usr/obj/usr/src/sys/KRONOS
Apr 8 23:40:12 kronos /kernel: Timecounter "i8254" frequency 1193182 Hz
Apr 8 23:40:12 kronos /kernel: CPU: Pentium III/Pentium III Xeon/Celeron (498.75-MHz 686-class CPU)
Apr 8 23:40:12 kronos /kernel: Origin = "GenuineIntel" Id = 0x672 Stepping = 2
Apr 8 23:40:12 kronos /kernel: Features=0x383fbff
Apr 8 23:40:12 kronos /kernel: real memory = 536805376 (524224K bytes)
Apr 8 23:40:12 kronos /kernel: avail memory = 518299648 (506152K bytes)
Apr 8 23:40:12 kronos /kernel: Programming 24 pins in IOAPIC #0
Apr 8 23:40:12 kronos /kernel: IOAPIC #0 intpin 2 -> irq 0
Apr 8 23:40:12 kronos /kernel: FreeBSD/SMP: Multiprocessor motherboard
Apr 8 23:40:12 kronos /kernel: cpu0 (BSP): apic id: 1, version: 0x00040011, at 0xfee00000
Apr 8 23:40:12 kronos /kernel: cpu1 (AP): apic id: 0, version: 0x00040011, at 0xfee00000
Apr 8 23:40:12 kronos /kernel: io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec00000
Apr 8 23:40:12 kronos /kernel: Preloaded elf kernel "kernel" at 0xc03e4000.
Apr 8 23:40:12 kronos /kernel: VESA: v2.0, 2048k memory, flags:0x0, mode table:0xc036a882 (1000022)
Apr 8 23:40:12 kronos /kernel: VESA: Cirrus Logic GD-5480 VGA
Apr 8 23:40:12 kronos /kernel: Pentium Pro MTRR support enabled
Apr 8 23:40:12 kronos /kernel: md0: Malloc disk
Apr 8 23:40:12 kronos /kernel: Using $PIR table, 8 entries at 0xc00fdf40
Apr 8 23:40:12 kronos /kernel: npx0: on motherboard
Apr 8 23:40:12 kronos /kernel: npx0: INT 16 interface
Apr 8 23:40:12 kronos /kernel: pcib0: on motherboard
Apr 8 23:40:12 kronos /kernel: pci0: on pcib0
Apr 8 23:40:12 kronos /kernel: sym0: <875> port 0x1400-0x14ff mem 0xfa200000-0xfa200fff,0xfa204000-0xfa2040ff irq 11 at device 13.0 on pci0
Apr 8 23:40:12 kronos /kernel: sym0: No NVRAM, ID 7, Fast-20, SE, parity checking
Apr 8 23:40:12 kronos /kernel: sym1: <875> port 0x1800-0x18ff mem 0xfa201000-0xfa201fff,0xfa204400-0xfa2044ff irq 10 at device 13.1 on pci0
Apr 8 23:40:12 kronos /kernel: sym1: No NVRAM, ID 7, Fast-20, SE, parity checking
Apr 8 23:40:12 kronos /kernel: fxp0: port 0x1060-0x107f mem 0xfa000000-0xfa0fffff,0xfa205000-0xfa205fff irq 5 at device 15.0 on pci0
Apr 8 23:40:12 kronos /kernel: fxp0: Ethernet address 00:90:27:73:5c:04
Apr 8 23:40:12 kronos /kernel: inphy0: on miibus0
Apr 8 23:40:12 kronos /kernel: inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
Apr 8 23:40:12 kronos /kernel: fxp1: port 0x1080-0x10bf mem 0xfa100000-0xfa1fffff,0xfa202000-0xfa202fff irq 11 at device 16.0 on pci0
Apr 8 23:40:12 kronos /kernel: fxp1: Ethernet address 00:d0:b7:53:ed:3c
Apr 8 23:40:12 kronos /kernel: inphy1: on miibus1
Apr 8 23:40:12 kronos /kernel: inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
Apr 8 23:40:12 kronos /kernel: isab0: at device 18.0 on pci0
Apr 8 23:40:12 kronos /kernel: isa0: on isab0
Apr 8 23:40:12 kronos /kernel: atapci0: port 0x1050-0x105f at device 18.1 on pci0
Apr 8 23:40:12 kronos /kernel: ata0: at 0x1f0 irq 14 on atapci0
Apr 8 23:40:12 kronos /kernel: ata1: at 0x170 irq 15 on atapci0
Apr 8 23:40:12 kronos /kernel: uhci0: port 0x10c0-0x10df irq 10 at device 18.2 on pci0
Apr 8 23:40:12 kronos /kernel: usb0: on uhci0
Apr 8 23:40:12 kronos /kernel: usb0: USB revision 1.0
Apr 8 23:40:12 kronos /kernel: uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
Apr 8 23:40:12 kronos /kernel: uhub0: 2 ports with 2 removable, self powered
Apr 8 23:40:13 kronos /kernel: Timecounter "PIIX" frequency 3579545 Hz
Apr 8 23:40:13 kronos /kernel: chip1: port 0x1040-0x104f at device 18.3 on pci0
Apr 8 23:40:13 kronos /kernel: pci0: at 20.0
Apr 8 23:40:13 kronos /kernel: orm0: