From owner-freebsd-smp Sun Jul 29 15:39:18 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
    by hub.freebsd.org (Postfix) with ESMTP id 222BA37B401;
    Sun, 29 Jul 2001 15:39:15 -0700 (PDT)
    (envelope-from mjacob@feral.com)
Received: from wonky.feral.com (mjacob@wonky.feral.com [192.67.166.7])
    by beppo.feral.com (8.11.3/8.11.3) with ESMTP id f6TMdEI92567;
    Sun, 29 Jul 2001 15:39:14 -0700 (PDT)
    (envelope-from mjacob@feral.com)
Date: Sun, 29 Jul 2001 15:39:00 -0700 (PDT)
From: Matthew Jacob
Reply-To:
To: John Baldwin
Cc:
Subject: RE: kaboom...
In-Reply-To:
Message-ID: <20010729153617.C44279-100000@wonky.feral.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

>
> This is new however.  The real problem is something trap'd during exit1().  It
> would be helpful to see what source line is at exit1+0x15e4.  The other issue
> is that I've no idea why the process lock is being used as the interlock for a
> vm_map lock.

Happened again once. It may be if I do 2 make -j 8 kernel builds I get this.
The panic is somewhere in exit1 where marked:

--------
        /*
         * notify interested parties of our demise.
         */
        PROC_LOCK(p);
        KNOTE(&p->p_klist, NOTE_EXIT);

        /*
         * Notify parent that we're gone.  If parent has the PS_NOCLDWAIT
         * flag set, or if the handler is set to SIG_IGN, notify process
         * 1 instead (and hope it will handle this situation).
         */
        if ((p->p_pptr->p_procsig->ps_flag & PS_NOCLDWAIT)
            || p->p_pptr->p_sigacts->ps_sigact[_SIG_IDX(SIGCHLD)] == SIG_IGN) {
                struct proc *pp = p->p_pptr;
>>>>>           proc_reparent(p, initproc);
                /*
                 * If this was the last child of our parent, notify
                 * parent, so in case he was wait(2)ing, he will
                 * continue.
                 */
                if (LIST_EMPTY(&pp->p_children))
                        wakeup((caddr_t)pp);
        }
        PROC_LOCK(p->p_pptr);
        if (p->p_sigparent && p->p_pptr != initproc)
                psignal(p->p_pptr, p->p_sigparent);
        else
                psignal(p->p_pptr, SIGCHLD);
        PROC_UNLOCK(p->p_pptr);
---------

Shouldn't there be a PROC_LOCK(p->p_pptr) prior to trying to touch or test
stuff in the parent's PROC structure? Would it be easier to move
PROC_LOCK(p->p_pptr) above the conditional?

Naive....

-matt

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

From owner-freebsd-smp Sun Jul 29 15:50:22 2001
To: mjacob@feral.com
Cc: John Baldwin, smp@FreeBSD.org
Subject: Re: kaboom...
In-Reply-To: Your message of "Sun, 29 Jul 2001 15:39:00 PDT."
    <20010729153617.C44279-100000@wonky.feral.com>
Date: Sun, 29 Jul 2001 23:50:17 +0100
From: Ian Dowse
Message-ID: <200107292350.aa62249@salmon.maths.tcd.ie>

In message <20010729153617.C44279-100000@wonky.feral.com>, Matthew Jacob writes:
>Happened again once. It may be if I do 2 make -j 8 kernel builds I get this.
>The panic is somewhere in exit1 where marked:

>        if ((p->p_pptr->p_procsig->ps_flag & PS_NOCLDWAIT)
>            || p->p_pptr->p_sigacts->ps_sigact[_SIG_IDX(SIGCHLD)] == SIG_IGN)

Yeah, see my post to -current on friday ("SIGCHLD changes causing..").
Matt Dillon is apparently looking into this, but I was able to find
out that this is caused when the parent process is swapped out.
It is ok to check p_procsig->ps_flag, but p_sigacts is in the struct
user area that is inaccessible when the process is swapped out.

Backing out kern_sig.c r1.125 and kern_exit.c r1.131 should fix it.

Ian

From owner-freebsd-smp Sun Jul 29 15:54:35 2001
Date: Sun, 29 Jul 2001 15:54:12 -0700 (PDT)
From: Matthew Jacob
To: Ian Dowse
Cc: John Baldwin
Subject: Re: kaboom...
In-Reply-To: <200107292350.aa62249@salmon.maths.tcd.ie>
Message-ID: <20010729155402.V44279-100000@wonky.feral.com>

Ah. Sorry for the noise then.

On Sun, 29 Jul 2001, Ian Dowse wrote:

> In message <20010729153617.C44279-100000@wonky.feral.com>, Matthew Jacob writes:
> >Happened again once. It may be if I do 2 make -j 8 kernel builds I get this.
> >The panic is somewhere in exit1 where marked:
>
> >        if ((p->p_pptr->p_procsig->ps_flag & PS_NOCLDWAIT)
> >            || p->p_pptr->p_sigacts->ps_sigact[_SIG_IDX(SIGCHLD)] == SIG_IGN)
>
> Yeah, see my post to -current on friday ("SIGCHLD changes causing..").
> Matt Dillon is apparently looking into this, but I was able to find
> out that this is caused when the parent process is swapped out.
> It is ok to check p_procsig->ps_flag, but p_sigacts is in the struct
> user area that is inaccessible when the process is swapped out.
>
> Backing out kern_sig.c r1.125 and kern_exit.c r1.131 should fix it.
>
> Ian
>

From owner-freebsd-smp Mon Jul 30 03:06:22 2001
Message-ID: <170f01c118df$c4c8eb80$7600a8c0@signout>
From: Dennis Kjær Jensen
Cc: "Vincent Janelle"
References: <20010728075553.49d57063.random@carnagecopia.com>
Subject: Re: install on a quad xeon, 4GB of ram, 4.3-STABLE
Date: Mon, 30 Jul 2001 12:09:49 +0200

I had the same problem on a 8-way XEON w/ 4 gig

Removed some memory so it had 2Gig left, and the system booted without a
glitch.

dmesg output can be found at
http://leech.dk/compaq_8500_8_xeon_550_4_gig_memory/dmesg

The kernel simply froze if more than 2 gig of mem was in the machine.

I didn't have any more time to dig for a solution, but 2 gig was enough for
starters.

...
Dennis

----- Original Message -----
From: "Vincent Janelle"
Sent: Saturday, July 28, 2001 4:55 PM
Subject: install on a quad xeon, 4GB of ram, 4.3-STABLE

> I can't seem to get the kernel to boot up on a machine with 4GB of ram..  On
> bootup, it panics with:
>
> panic: swap_pager_swap_init: swap_zone=NULL
>
> I took the drives out and plugged it into another machine, modified NKGPT to be
> 64.
>
> Any clues?
>

From owner-freebsd-smp Mon Jul 30 09:04:59 2001
In-Reply-To: <20010729153617.C44279-100000@wonky.feral.com>
Date: Mon, 30 Jul 2001 09:04:43 -0700 (PDT)
From: John Baldwin
To: Matthew Jacob
Subject: RE: kaboom...
Cc: smp@FreeBSD.org

On 29-Jul-01 Matthew Jacob wrote:
> ---------
> Shouldn't there be a PROC_LOCK(p->p_pptr) prior to trying to touch or test
> stuff in the parent's PROC structure?

No.  p_pptr is also locked by the proctree lock, so holding the proctree lock
is sufficient to protect reads of p_pptr.
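
The rule in that reply can be illustrated outside the kernel. What follows is only a user-space sketch under stated assumptions: it assumes the convention is "hold both locks to change the parent pointer, hold either one to read it", and every name in it (tree_lock, struct node, node_init, node_reparent, node_parent_id) is hypothetical, with plain pthread mutexes standing in for the proctree lock and the per-process lock. It is not the kernel's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the tree-wide (proctree-like) lock. */
static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;

struct node {
    pthread_mutex_t lock;  /* hypothetical stand-in for the per-process lock */
    struct node *parent;   /* written only with tree_lock AND lock held */
    int id;                /* set once in node_init(), never changed */
};

static void node_init(struct node *n, int id)
{
    assert(n != NULL);
    pthread_mutex_init(&n->lock, NULL);
    n->parent = NULL;
    n->id = id;
}

/* Writer: takes both locks, tree lock first, then the node's own lock. */
static void node_reparent(struct node *child, struct node *new_parent)
{
    assert(child != NULL);
    pthread_mutex_lock(&tree_lock);
    pthread_mutex_lock(&child->lock);
    child->parent = new_parent;
    pthread_mutex_unlock(&child->lock);
    pthread_mutex_unlock(&tree_lock);
}

/*
 * Reader: because every writer holds tree_lock, holding tree_lock alone
 * keeps child->parent stable; the child's own lock is not needed here.
 */
static int node_parent_id(struct node *child)
{
    int id;

    assert(child != NULL);
    pthread_mutex_lock(&tree_lock);
    id = (child->parent != NULL) ? child->parent->id : -1;
    pthread_mutex_unlock(&tree_lock);
    return id;
}
```

A caller would node_init() two nodes, node_reparent() one under the other, and then read the parent's id through node_parent_id() without ever touching the child's own lock. Note that this only covers the pointer itself; it says nothing about whether the memory behind the parent's other fields is accessible, which is the separate swapped-out u-area problem Ian describes.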
-- 
John Baldwin -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

From owner-freebsd-smp Mon Jul 30 09:05:09 2001
In-Reply-To: <200107292350.aa62249@salmon.maths.tcd.ie>
Date: Mon, 30 Jul 2001 09:04:44 -0700 (PDT)
From: John Baldwin
To: Ian Dowse
Subject: Re: kaboom...
Cc: smp@FreeBSD.org, mjacob@feral.com

On 29-Jul-01 Ian Dowse wrote:
> In message <20010729153617.C44279-100000@wonky.feral.com>, Matthew Jacob
> writes:
>>Happened again once. It may be if I do 2 make -j 8 kernel builds I get this.
>>The panic is somewhere in exit1 where marked:
>
>>        if ((p->p_pptr->p_procsig->ps_flag & PS_NOCLDWAIT)
>>            || p->p_pptr->p_sigacts->ps_sigact[_SIG_IDX(SIGCHLD)] == SIG_IGN)
>
> Yeah, see my post to -current on friday ("SIGCHLD changes causing..").
> Matt Dillon is apparently looking into this, but I was able to find
> out that this is caused when the parent process is swapped out.  It
> is ok to check p_procsig->ps_flag, but p_sigacts is in the struct
> user area that is inaccessible when the process is swapped out.
>
> Backing out kern_sig.c r1.125 and kern_exit.c r1.131 should fix it.

Ah.

> Ian

-- 
John Baldwin -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

From owner-freebsd-smp Tue Jul 31 12:47:46 2001
Message-ID: <5FE9B713CCCDD311A03400508B8B30130828F203@bdr-xcln.corp.matchlogic.com>
From: Charles Randall
To: "smp@freebsd.org"
Subject: RE: Dell 1550 SMP crash
Date: Tue, 31 Jul 2001 13:45:33 -0600

Again, no panic message and possibly a different problem. This time it
points to NFS I/O (which has always been present, but not always the
suspect).

Is there anyone willing to work through this with me off-list?
-Charles

mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
instruction pointer     = 0x8:0xc01ceea2
stack pointer           = 0x10:0xf78b0d14
frame pointer           = 0x10:0xf78b0d3c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, IOPL = 0
current process         = 2399 (db_build)
interrupt mask          = bio  <- SMP: XXX
kernel: type 29 trap, code=0
Stopped at      getblk+0x372:   movl    %edx,0x4(%ebx)

db> trace
getblk(f7987940,4080,2000,0,0) at getblk+0x372
nfs_getcacheblk(f7987940,4080,2000,edaf0380,f7987940) at nfs_getcacheblk+0x83
nfs_bioread(f7987940,f78b0ed8,0,c569d880,f78b0e7c) at nfs_bioread+0x5ed
nfs_read(f78b0e68,edaf0380,edaf0380,200,0) at nfs_read+0x1e
vn_read(c5761600,f78b0ed8,c569d880,1,edaf0380) at vn_read+0x110
dofileread(edaf0380,c5761600,3,281e9eec,200) at dofileread+0xb0
pread(edaf0380,f78b0f80,281ab628,28204cd0,bfbfe148) at pread+0x48
syscall2(bfbf002f,2820002f,bfbf002f,bfbfe148,28204cd0) at syscall2+0x219
Xint0x80_syscall() at Xint0x80_syscall+0x2b

db> show registers
cs                 0x8  gd_npxproc
ds           0x8100010
es                0x10  gd_switchtime
fs          0xd1ab0018
ss                0x10  gd_switchtime
eax         0xd1d700a0
ecx             0x4080  gd_astpending+0x3fbc
edx         0xd1ef60f8
ebx         0xd1ab1088
esp         0xf78b0d14
ebp         0xf78b0d3c
esi          0x8100000
edi                  0
eip         0xc01ceea2  getblk+0x372
efl              0x286  gd_astpending+0x1c2
getblk+0x372:   movl    %edx,0x4(%ebx)

From owner-freebsd-smp Tue Jul 31 14:57:28 2001
In-Reply-To: <5FE9B713CCCDD311A03400508B8B30130828F203@bdr-xcln.corp.matchlogic.com>
Date: Tue, 31 Jul 2001 14:57:18 -0700 (PDT)
From: John Baldwin
To: Charles Randall
Subject: RE: Dell 1550 SMP crash
Cc: "smp@freebsd.org"

On 31-Jul-01 Charles Randall wrote:
> Again, no panic message and possibly a different problem. This time it
> points to NFS I/O (which has always been present, but not always the
> suspect).
>
> Is there anyone willing to work through this with me off-list?
>
> -Charles
>
> mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
> instruction pointer     = 0x8:0xc01ceea2
> stack pointer           = 0x10:0xf78b0d14
> frame pointer           = 0x10:0xf78b0d3c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, IOPL = 0
> current process         = 2399 (db_build)
> interrupt mask          = bio  <- SMP: XXX
> kernel: type 29 trap, code=0

Ah, didn't see the type 29 before.  Hrm, there aren't really 29 traps
according to trap.c, the highest is 28, which is a machine check.  Let me go
look around.. :(

Oh, ok.  type 28 is a machine check which is INT 18.  INT 19 on the P3 is a new
exception we don't handle: SIMD Floating-Point exception.  I thought we had
some SIMD code floating around somewhere though, but not in 4.x.  Peter?

Granted, I don't know why a mov would generate that. :(
Only SSE and SSE2 instructions are supposed to generate that.
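
For readers following the INT 18 / INT 19 numbering above, here is a small lookup of the architectural x86 exception vectors 0 through 19, using the usual Intel mnemonics. It is a reference sketch only: the table and the helper x86_vector_name() are hypothetical names for illustration and are not taken from trap.c. As the message notes, the kernel's trap "type" numbers are a separate numbering from these hardware vectors (type 28 corresponds to INT 18), so a "type 29" printout should not be read directly as vector 29.

```c
#include <assert.h>
#include <stddef.h>

/* Architectural x86 exception vectors 0-19, indexed by vector number. */
static const char *const x86_vector_names[] = {
    "#DE divide error",                     /*  0 */
    "#DB debug",                            /*  1 */
    "NMI non-maskable interrupt",           /*  2 */
    "#BP breakpoint",                       /*  3 */
    "#OF overflow",                         /*  4 */
    "#BR BOUND range exceeded",             /*  5 */
    "#UD invalid opcode",                   /*  6 */
    "#NM device not available",             /*  7 */
    "#DF double fault",                     /*  8 */
    "coprocessor segment overrun",          /*  9 */
    "#TS invalid TSS",                      /* 10 */
    "#NP segment not present",              /* 11 */
    "#SS stack-segment fault",              /* 12 */
    "#GP general protection",               /* 13 */
    "#PF page fault",                       /* 14 */
    "reserved",                             /* 15 */
    "#MF x87 FPU floating-point error",     /* 16 */
    "#AC alignment check",                  /* 17 */
    "#MC machine check",                    /* 18 */
    "#XM SIMD floating-point exception"     /* 19 */
};

/* Hypothetical helper: map a hardware vector number to a description. */
static const char *x86_vector_name(unsigned vec)
{
    size_t n = sizeof(x86_vector_names) / sizeof(x86_vector_names[0]);

    assert(vec <= 255);     /* x86 vector numbers are 8 bits wide */
    if (vec < n)
        return x86_vector_names[vec];
    return "(not in this table)";
}
```

Calling x86_vector_name(18) and x86_vector_name(19) gives the machine check and SIMD floating-point entries discussed in the message; anything from 20 upward falls through to the placeholder string.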
> Stopped at      getblk+0x372:   movl    %edx,0x4(%ebx)
>
> db> trace
> getblk(f7987940,4080,2000,0,0) at getblk+0x372
> nfs_getcacheblk(f7987940,4080,2000,edaf0380,f7987940) at nfs_getcacheblk+0x83
> nfs_bioread(f7987940,f78b0ed8,0,c569d880,f78b0e7c) at nfs_bioread+0x5ed
> nfs_read(f78b0e68,edaf0380,edaf0380,200,0) at nfs_read+0x1e
> vn_read(c5761600,f78b0ed8,c569d880,1,edaf0380) at vn_read+0x110
> dofileread(edaf0380,c5761600,3,281e9eec,200) at dofileread+0xb0
> pread(edaf0380,f78b0f80,281ab628,28204cd0,bfbfe148) at pread+0x48
> syscall2(bfbf002f,2820002f,bfbf002f,bfbfe148,28204cd0) at syscall2+0x219
> Xint0x80_syscall() at Xint0x80_syscall+0x2b
>
> db> show registers
> cs                 0x8  gd_npxproc
> ds           0x8100010
> es                0x10  gd_switchtime
> fs          0xd1ab0018
> ss                0x10  gd_switchtime
> eax         0xd1d700a0
> ecx             0x4080  gd_astpending+0x3fbc
> edx         0xd1ef60f8
> ebx         0xd1ab1088
> esp         0xf78b0d14
> ebp         0xf78b0d3c
> esi          0x8100000
> edi                  0
> eip         0xc01ceea2  getblk+0x372
> efl              0x286  gd_astpending+0x1c2
> getblk+0x372:   movl    %edx,0x4(%ebx)

-- 
John Baldwin -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"
- http://www.FreeBSD.org/

From owner-freebsd-smp Tue Jul 31 15:18:22 2001
Message-ID: <5FE9B713CCCDD311A03400508B8B30130828F20A@bdr-xcln.corp.matchlogic.com>
From: Charles Randall
To: "smp@freebsd.org"
Subject: RE: Dell 1550 SMP crash
Date: Tue, 31 Jul 2001 16:16:07 -0600

From: John Baldwin [mailto:jhb@FreeBSD.org]
>Ah, didn't see the type 29 before.  Hrm, there aren't really 29 traps
>according to trap.c, the highest is 28, which is a machine check.  Let me go
>look around.. :(
>
>Oh, ok.  type 28 is a machine check which is INT 18.  INT 19 on the P3 is a new
>exception we don't handle: SIMD Floating-Point exception.  I thought we had
>some SIMD code floating around somewhere though, but not in 4.x.  Peter?
>
>Granted, I don't know why a mov would generate that. :(
>Only SSE and SSE2 instructions are supposed to generate that.

Searching the archives for "type 29", this looks like a problem that James
FitzGibbon reported in May 2000 with a Dell PowerEdge 2450. Drew Eckhardt
posted a response and then the thread basically died.

Has anyone with the Dell 1550 or 2450 seen this before?
Charles

From owner-freebsd-smp Tue Jul 31 15:33:55 2001
Message-ID: <5FE9B713CCCDD311A03400508B8B30130828F20B@bdr-xcln.corp.matchlogic.com>
From: Charles Randall
To: "smp@freebsd.org"
Subject: RE: Dell 1550 SMP crash
Date: Tue, 31 Jul 2001 16:31:43 -0600

There's also an ancient discussion of this in the -hackers archives from May
1997 ("trap type 29 on P6").

Following that thread, it was believed that this was due to spurious hardware
problems. Some thought that trap 29 should be ignored, others thought
ignoring it may mask a real problem. It was noted that NetBSD ignores this.
Two patches were posted without response.

Does anyone have a current opinion on this? Is Dell producing buggy hardware
or should FreeBSD ignore this (or both)?
-Charles

From owner-freebsd-smp Tue Jul 31 15:38:31 2001
In-Reply-To: <5FE9B713CCCDD311A03400508B8B30130828F20A@bdr-xcln.corp.matchlogic.com>
Date: Tue, 31 Jul 2001 15:38:20 -0700 (PDT)
From: John Baldwin
To: Charles Randall
Subject: RE: Dell 1550 SMP crash
Cc: "smp@freebsd.org"

On 31-Jul-01 Charles Randall wrote:
> From: John Baldwin [mailto:jhb@FreeBSD.org]
>>Ah, didn't see the type 29 before.  Hrm, there aren't really 29 traps
>>according to trap.c, the highest is 28, which is a machine check.  Let me go
>>look around.. :(
>>
>>Oh, ok.  type 28 is a machine check which is INT 18.  INT 19 on the P3 is a new
>>exception we don't handle: SIMD Floating-Point exception.  I thought we had
>>some SIMD code floating around somewhere though, but not in 4.x.  Peter?
>>
>>Granted, I don't know why a mov would generate that. :(
>>Only SSE and SSE2 instructions are supposed to generate that.
>
> Searching the archives for "type 29", this looks like a problem that James
> FitzGibbon reported in May 2000 with a Dell PowerEdge 2450. Drew Eckhardt
> posted a response and then the thread basically died.
>
> Has anyone with the Dell 1550 or 2450 seen this before?
Well, it may not actually be a SIMD trap.  Is this reproducible, and is it
always at the same place?  Peter Wemm (I think) added some code to handle SIMD
traps in the kernel recently.  I'll try and backport that to 4.x and let you
try it out though to see if it helps any.

> Charles

-- 
John Baldwin -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

From owner-freebsd-smp Tue Jul 31 16:02:04 2001
Message-Id: <200107312301.f6VN1wd38295@revolt.poohsticks.org>
To: Charles Randall
Cc: "smp@freebsd.org"
Subject: Re: Dell 1550 SMP crash
In-reply-to: Your message of "Tue, 31 Jul 2001 16:16:07 MDT."
    <5FE9B713CCCDD311A03400508B8B30130828F20A@bdr-xcln.corp.matchlogic.com>
Date: Tue, 31 Jul 2001 17:01:58 -0600
From: Drew Eckhardt

>Searching the archives for "type 29", this looks like a problem that James
>FitzGibbon reported in May 2000 with a Dell PowerEdge 2450. Drew Eckhardt
>posted a response and then the thread basically died.
I had some headaches with frequent (I couldn't do a parallel make buildworld)
unexplained T_RESERVEDs and looked at several possibilities:

1. The SIMD instruction fault.  This wasn't happening.

2. One of the other 32 CPU traps which is allegedly still reserved.  This
   wasn't happening.

3. Something generated by the APIC.  Explicitly initializing the unused
   APIC pins eliminated most but not all of these (every week or so it
   might crash), suggesting that some form of hardware problem may be
   responsible.

The simplest sometimes-working solutions to hardware problems are

1. Unplugging and replugging things

2. Swapping parts

So I unplugged and swapped my two Slot-1 PIII 600E processors and found
that the problem stopped entirely.

-- 
Home Page
For those who do, no explanation is necessary.
For those who don't, no explanation is possible.

From owner-freebsd-smp Wed Aug  1 13:11:21 2001
Date: Wed, 1 Aug 2001 13:11:11 -0700 (PDT)
From: Matthew Jacob
Reply-To: mjacob@feral.com
To: Ian Dowse
Cc: John Baldwin, smp@FreeBSD.org
Subject: Re: kaboom...
In-Reply-To: <20010729155402.V44279-100000@wonky.feral.com>

Has there been anything further on this?
Leaving this serious amount of breakage at the top of the tree for more than
a week is really bad. I think I'm going to roll this out if nothing happens
in a couple of days.

On Sun, 29 Jul 2001, Matthew Jacob wrote:

>
> Ah. Sorry for the noise then.
>
> On Sun, 29 Jul 2001, Ian Dowse wrote:
>
> > In message <20010729153617.C44279-100000@wonky.feral.com>, Matthew Jacob writes:
> > >Happened again once. It may be if I do 2 make -j 8 kernel builds I get this.
> > >The panic is somewhere in exit1 where marked:
> >
> > >        if ((p->p_pptr->p_procsig->ps_flag & PS_NOCLDWAIT)
> > >            || p->p_pptr->p_sigacts->ps_sigact[_SIG_IDX(SIGCHLD)] == SIG_IGN)
> >
> > Yeah, see my post to -current on friday ("SIGCHLD changes causing..").
> > Matt Dillon is apparently looking into this, but I was able to find
> > out that this is caused when the parent process is swapped out.  It
> > is ok to check p_procsig->ps_flag, but p_sigacts is in the struct
> > user area that is inaccessible when the process is swapped out.
> >
> > Backing out kern_sig.c r1.125 and kern_exit.c r1.131 should fix it.
> >
> > Ian
> >
>