From owner-freebsd-smp@FreeBSD.ORG Sun Aug 14 10:31:35 2005
X-Original-To: freebsd-smp@freebsd.org
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DD05416A41F for ; Sun, 14 Aug 2005 10:31:35 +0000 (GMT) (envelope-from hvleest@signet.nl)
Received: from bsd.local.jellevanleest.nl (cp417515-a.dbsch1.nb.home.nl [84.27.32.116]) by mx1.FreeBSD.org (Postfix) with ESMTP id 317DC43D48 for ; Sun, 14 Aug 2005 10:31:34 +0000 (GMT) (envelope-from hvleest@signet.nl)
Received: from [192.168.100.105] (unknown [192.168.100.105]) by bsd.local.jellevanleest.nl (Postfix) with ESMTP id C5B7F60D0 for ; Sun, 14 Aug 2005 12:31:32 +0200 (CEST)
Message-ID: <42FF1D68.4080706@signet.nl>
Date: Sun, 14 Aug 2005 12:31:04 +0200
From: Hans van Leest
User-Agent: Mozilla Thunderbird 0.7.1 (Windows/20040626)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: freebsd-smp@freebsd.org
References: <42CD9115.70302@signet.nl> <200507291322.40113.jhb@FreeBSD.org>
In-Reply-To: <200507291322.40113.jhb@FreeBSD.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: SMP boot errors
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group
X-List-Received-Date: Sun, 14 Aug 2005 10:31:36 -0000

Hello,

I've started with a new 6.0 source. Patched the source and did a make
buildworld, with no problems.
When I do a make buildkernel I get an error:

cc -c -O -pipe -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -fformat-extensions -std=c99 -g -nostdinc -I- -I. -I/usr/src/sys -I/usr/src/sys/contrib/dev/acpica -I/usr/src/sys/contrib/altq -I/usr/src/sys/contrib/ipfilter -I/usr/src/sys/contrib/pf -I/usr/src/sys/contrib/dev/ath -I/usr/src/sys/contrib/dev/ath/freebsd -I/usr/src/sys/contrib/ngatm -I/usr/src/sys/dev/twa -D_KERNEL -include opt_global.h -fno-common -finline-limit=8000 --param inline-unit-growth=100 --param large-function-growth=1000 -mno-align-long-strings -mpreferred-stack-boundary=2 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -ffreestanding -Werror /usr/src/sys/dev/pci/pci.c
/usr/src/sys/dev/pci/pci.c:928: error: conflicting types for 'pci_assign_interrupt'
/usr/src/sys/dev/pci/pci.c:80: error: previous declaration of 'pci_assign_interrupt' was here
/usr/src/sys/dev/pci/pci.c:928: error: conflicting types for 'pci_assign_interrupt'
/usr/src/sys/dev/pci/pci.c:80: error: previous declaration of 'pci_assign_interrupt' was here
/usr/src/sys/dev/pci/pci.c: In function `pci_add_resources':
/usr/src/sys/dev/pci/pci.c:979: warning: unused variable `irq'
/usr/src/sys/dev/pci/pci.c: At top level:
/usr/src/sys/dev/pci/pci.c:80: warning: 'pci_assign_interrupt' declared `static' but never defined
*** Error code 1

Stop in /usr/obj/usr/src/sys/KERNEL-6.0.
*** Error code 1

Stop in /usr/src.
*** Error code 1

Stop in /usr/src.
bsd#

How do I continue?

Hans

John Baldwin wrote:

> On Thursday 07 July 2005 04:31 pm, Hans van Leest wrote:
>
>>Hello,
>>
>>I also posted last week about my problems with an SMP kernel on a dual
>>Xeon machine.
>>
>>Maybe this error message is easier to debug, I hope.
>>
>>I've upgraded the source to 6.0 current. Did not modify my GENERIC file
>>and compiled the source. This includes the options SMP and device APIC.
>>I've booted several times and had the same error:
>>
>>panic: multiple IRQs for PCI interrupt 0.31.INTA 18 AND 16
>>
>>
>>Where do I start to fix this?
>
>
> Ok, I have a patch you can try to use to work around this. Apply this patch
> and then try setting 'hw.pci0.31.INTA.irq=16' from the loader to force that
> specific PCI interrupt to IRQ 16. If that results in interrupt storms, you
> can try setting it to 18 instead of 16.
>
> --- //depot/vendor/freebsd/src/sys/dev/pci/pci.c 2005/06/03 19:45:18
> +++ //depot/user/jhb/acpipci/dev/pci/pci.c 2005/07/29 14:35:47
> @@ -76,6 +76,8 @@
>
>  static int pci_porten(device_t pcib, int b, int s, int f);
>  static int pci_memen(device_t pcib, int b, int s, int f);
> +static int pci_assign_interrupt(device_t bus, device_t dev,
> +    int force_route);
>  static int pci_add_map(device_t pcib, device_t bus, device_t dev,
>      int b, int s, int f, int reg,
>      struct resource_list *rl);
> @@ -922,6 +924,52 @@
>  }
>
>  static void
> +pci_assign_interrupt(device_t bus, device_t dev, int force_route)
> +{
> +        struct pci_devinfo *dinfo = device_get_ivars(dev);
> +        pcicfgregs *cfg = &dinfo->cfg;
> +        char tunable_name[64];
> +        int irq;
> +
> +        /* Has to have an intpin to have an interrupt. */
> +        if (cfg->intpin == 0)
> +                return;
> +
> +        /* Let the user override the IRQ with a tunable. */
> +        irq = PCI_INVALID_IRQ;
> +        snprintf(tunable_name, sizeof(tunable_name), "hw.pci%d.%d.INT%c.irq",
> +            cfg->bus, cfg->slot, cfg->intpin + 'A' - 1);
> +        if (TUNABLE_INT_FETCH(tunable_name, &irq) && (irq >= 255 || irq <= 0))
> +                irq = PCI_INVALID_IRQ;
> +
> +        /*
> +         * If we didn't get an IRQ via the tunable, then we either use the
> +         * IRQ value in the intline register or we ask the bus to route an
> +         * interrupt for us. If force_route is true, then we only use the
> +         * value in the intline register if the bus was unable to assign an
> +         * IRQ.
> +         */
> +        if (!PCI_INTERRUPT_VALID(irq)) {
> +                irq = cfg->intline;
> +                if (!PCI_INTERRUPT_VALID(irq) || force_route)
> +                        irq = PCI_ASSIGN_INTERRUPT(bus, dev);
> +        }
> +
> +        /* If after all that we don't have an IRQ, just bail. */
> +        if (!PCI_INTERRUPT_VALID(irq))
> +                return;
> +
> +        /* Update the config register if it changed. */
> +        if (irq != cfg->intline) {
> +                cfg->intline = irq;
> +                pci_write_config(dev, PCIR_INTLINE, irq, 1);
> +        }
> +
> +        /* Add this IRQ as rid 0 interrupt resource. */
> +        resource_list_add(&dinfo->resources, SYS_RES_IRQ, 0, irq, irq, 1);
> +}
> +
> +static void
>  pci_add_resources(device_t pcib, device_t bus, device_t dev)
>  {
>          struct pci_devinfo *dinfo = device_get_ivars(dev);
> @@ -959,14 +1007,10 @@
>                   * If the re-route fails, then just stick with what we
>                   * have.
>                   */
> -                irq = PCI_ASSIGN_INTERRUPT(bus, dev);
> -                if (PCI_INTERRUPT_VALID(irq)) {
> -                        pci_write_config(dev, PCIR_INTLINE, irq, 1);
> -                        cfg->intline = irq;
> -                } else
> +                pci_assign_interrupt(bus, dev, 1);
> +#else
> +                pci_assign_interrupt(bus, dev, 0);
>  #endif
> -                        irq = cfg->intline;
> -                resource_list_add(rl, SYS_RES_IRQ, 0, irq, irq, 1);
>          }
>  }
>
> @@ -1705,15 +1749,8 @@
>                           * interrupt, try to assign it one.
>                           */
>                          if (!PCI_INTERRUPT_VALID(cfg->intline) &&
> -                            (cfg->intpin != 0)) {
> -                                cfg->intline = PCI_ASSIGN_INTERRUPT(dev, child);
> -                                if (PCI_INTERRUPT_VALID(cfg->intline)) {
> -                                        pci_write_config(child, PCIR_INTLINE,
> -                                            cfg->intline, 1);
> -                                        resource_list_add(rl, SYS_RES_IRQ, 0,
> -                                            cfg->intline, cfg->intline, 1);
> -                                }
> -                        }
> +                            (cfg->intpin != 0))
> +                                pci_assign_interrupt(dev, child, 0);
>                          break;
>                  case SYS_RES_IOPORT:
>                  case SYS_RES_MEMORY:
>

From owner-freebsd-smp@FreeBSD.ORG Mon Aug 15 21:01:18 2005
X-Original-To: freebsd-smp@freebsd.org
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2030816A41F for ; Mon, 15 Aug 2005 21:01:18 +0000 (GMT) (envelope-from jhb@FreeBSD.org)
Received: from mv.twc.weather.com (mv.twc.weather.com [65.212.71.225]) by mx1.FreeBSD.org (Postfix) with ESMTP id BD74943D46 for ; Mon, 15 Aug 2005 21:01:17 +0000 (GMT) (envelope-from jhb@FreeBSD.org)
Received: from [10.50.40.201] (Not Verified[10.50.40.201]) by mv.twc.weather.com with NetIQ MailMarshal (v6, 0,
    3, 8) id ; Mon, 15 Aug 2005 17:16:13 -0400
From: John Baldwin
To: freebsd-smp@freebsd.org
Date: Mon, 15 Aug 2005 13:37:58 -0400
User-Agent: KMail/1.8
References: <42CD9115.70302@signet.nl> <200507291322.40113.jhb@FreeBSD.org> <42FF1D68.4080706@signet.nl>
In-Reply-To: <42FF1D68.4080706@signet.nl>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200508151337.59240.jhb@FreeBSD.org>
Cc: Hans van Leest
Subject: Re: SMP boot errors
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group
X-List-Received-Date: Mon, 15 Aug 2005 21:01:18 -0000

On Sunday 14 August 2005 06:31 am, Hans van Leest wrote:
> Hello,
>
> I've started with a new 6.0 source. Patched the source and did a make
> buildworld, with no problems.
> When I do a make buildkernel I get an error:

Yes, the patch had some bugs which I've since fixed (it actually completely
busted interrupts on my alpha at first). Here is the updated patch:

--- //depot/vendor/freebsd/src/sys/dev/pci/pci.c 2005/06/03 19:45:18
+++ //depot/user/jhb/acpipci/dev/pci/pci.c 2005/08/03 19:57:34
@@ -76,6 +76,8 @@

 static int pci_porten(device_t pcib, int b, int s, int f);
 static int pci_memen(device_t pcib, int b, int s, int f);
+static void pci_assign_interrupt(device_t bus, device_t dev,
+    int force_route);
 static int pci_add_map(device_t pcib, device_t bus, device_t dev,
     int b, int s, int f, int reg,
     struct resource_list *rl);
@@ -922,13 +924,60 @@
 }

 static void
+pci_assign_interrupt(device_t bus, device_t dev, int force_route)
+{
+        struct pci_devinfo *dinfo = device_get_ivars(dev);
+        pcicfgregs *cfg = &dinfo->cfg;
+        char tunable_name[64];
+        int irq;
+
+        /* Has to have an intpin to have an interrupt. */
+        if (cfg->intpin == 0)
+                return;
+
+        /* Let the user override the IRQ with a tunable. */
+        irq = PCI_INVALID_IRQ;
+        snprintf(tunable_name, sizeof(tunable_name), "hw.pci%d.%d.INT%c.irq",
+            cfg->bus, cfg->slot, cfg->intpin + 'A' - 1);
+        if (TUNABLE_INT_FETCH(tunable_name, &irq) && (irq >= 255 || irq <= 0))
+                irq = PCI_INVALID_IRQ;
+
+        /*
+         * If we didn't get an IRQ via the tunable, then we either use the
+         * IRQ value in the intline register or we ask the bus to route an
+         * interrupt for us. If force_route is true, then we only use the
+         * value in the intline register if the bus was unable to assign an
+         * IRQ.
+         */
+        if (!PCI_INTERRUPT_VALID(irq)) {
+                if (!PCI_INTERRUPT_VALID(cfg->intline) || force_route)
+                        irq = PCI_ASSIGN_INTERRUPT(bus, dev);
+                if (!PCI_INTERRUPT_VALID(irq))
+                        irq = cfg->intline;
+        }
+
+        /* If after all that we don't have an IRQ, just bail. */
+        if (!PCI_INTERRUPT_VALID(irq))
+                return;
+
+        /* Update the config register if it changed. */
+        if (irq != cfg->intline) {
+                cfg->intline = irq;
+                pci_write_config(dev, PCIR_INTLINE, irq, 1);
+        }
+
+        /* Add this IRQ as rid 0 interrupt resource. */
+        resource_list_add(&dinfo->resources, SYS_RES_IRQ, 0, irq, irq, 1);
+}
+
+static void
 pci_add_resources(device_t pcib, device_t bus, device_t dev)
 {
         struct pci_devinfo *dinfo = device_get_ivars(dev);
         pcicfgregs *cfg = &dinfo->cfg;
         struct resource_list *rl = &dinfo->resources;
         struct pci_quirk *q;
-        int b, i, irq, f, s;
+        int b, i, f, s;

         b = cfg->bus;
         s = cfg->slot;
@@ -959,14 +1008,10 @@
                  * If the re-route fails, then just stick with what we
                  * have.
                  */
-                irq = PCI_ASSIGN_INTERRUPT(bus, dev);
-                if (PCI_INTERRUPT_VALID(irq)) {
-                        pci_write_config(dev, PCIR_INTLINE, irq, 1);
-                        cfg->intline = irq;
-                } else
+                pci_assign_interrupt(bus, dev, 1);
+#else
+                pci_assign_interrupt(bus, dev, 0);
 #endif
-                        irq = cfg->intline;
-                resource_list_add(rl, SYS_RES_IRQ, 0, irq, irq, 1);
         }
 }

@@ -1705,15 +1750,8 @@
                          * interrupt, try to assign it one.
                          */
                         if (!PCI_INTERRUPT_VALID(cfg->intline) &&
-                            (cfg->intpin != 0)) {
-                                cfg->intline = PCI_ASSIGN_INTERRUPT(dev, child);
-                                if (PCI_INTERRUPT_VALID(cfg->intline)) {
-                                        pci_write_config(child, PCIR_INTLINE,
-                                            cfg->intline, 1);
-                                        resource_list_add(rl, SYS_RES_IRQ, 0,
-                                            cfg->intline, cfg->intline, 1);
-                                }
-                        }
+                            (cfg->intpin != 0))
+                                pci_assign_interrupt(dev, child, 0);
                         break;
                 case SYS_RES_IOPORT:
                 case SYS_RES_MEMORY:

-- 
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org

From owner-freebsd-smp@FreeBSD.ORG Tue Aug 16 12:50:01 2005
X-Original-To: freebsd-smp@freebsd.org
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1F78316A41F for ; Tue, 16 Aug 2005 12:50:01 +0000 (GMT) (envelope-from hvleest@signet.nl)
Received: from bsd.local.jellevanleest.nl (cp417515-a.dbsch1.nb.home.nl [84.27.32.116]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7C98543D53 for ; Tue, 16 Aug 2005 12:49:59 +0000 (GMT) (envelope-from hvleest@signet.nl)
Received: from [192.168.100.105] (unknown [192.168.100.105]) by bsd.local.jellevanleest.nl (Postfix) with ESMTP id 406666171 for ; Tue, 16 Aug 2005 14:49:58 +0200 (CEST)
Message-ID: <4301E0D9.8090307@signet.nl>
Date: Tue, 16 Aug 2005 14:49:29 +0200
From: Hans van Leest
User-Agent: Mozilla Thunderbird 0.7.1 (Windows/20040626)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: freebsd-smp@freebsd.org
References: <42CD9115.70302@signet.nl> <200507291322.40113.jhb@FreeBSD.org> <42FF1D68.4080706@signet.nl> <200508151337.59240.jhb@FreeBSD.org>
In-Reply-To: <200508151337.59240.jhb@FreeBSD.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: SMP boot errors
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: hvleest@signet.nl
List-Id: FreeBSD SMP implementation group
X-List-Received-Date: Tue, 16 Aug 2005 12:50:01 -0000

Still problems, unfortunately.
I did the following:
Started with a new CURRENT source.
Applied the patch, then built and installed the kernel.

At boot, with ACPI enabled in the BIOS,
in the bootloader I typed:
#set hw.pci0.31.INTA.irq=16
#boot
I get the following error:
Instruc. pointer: 0x20:0xc0800a87
Same error with ..irq=18

With ACPI disabled in the BIOS:
#set hw.pci0.31.INTA.irq=16 (or 18)
#boot
Instruc. pointer: 0x20:0xc0530f4d

I noticed the following line in dmesg after booting the previous kernel:
atapci0: port 0xf000-0xf00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 irq 18 at device 31.2 on pci0

It looks like the Intel RAID controller uses irq 18.

Any suggestions?

From owner-freebsd-smp@FreeBSD.ORG Tue Aug 16 17:50:15 2005
X-Original-To: freebsd-smp@freebsd.org
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A7AF416A421 for ; Tue, 16 Aug 2005 17:50:15 +0000 (GMT) (envelope-from jhb@FreeBSD.org)
Received: from mv.twc.weather.com (mv.twc.weather.com [65.212.71.225]) by mx1.FreeBSD.org (Postfix) with ESMTP id 72B6743D49 for ; Tue, 16 Aug 2005 17:50:13 +0000 (GMT) (envelope-from jhb@FreeBSD.org)
Received: from [10.50.40.201] (Not Verified[10.50.40.201]) by mv.twc.weather.com with NetIQ MailMarshal (v6, 0, 3, 8) id ; Tue, 16 Aug 2005 14:05:10 -0400
From: John Baldwin
To: freebsd-smp@freebsd.org, hvleest@signet.nl
Date: Tue, 16 Aug 2005 11:26:11 -0400
User-Agent: KMail/1.8
References: <42CD9115.70302@signet.nl> <200508151337.59240.jhb@FreeBSD.org> <4301E0D9.8090307@signet.nl>
In-Reply-To: <4301E0D9.8090307@signet.nl>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200508161126.12330.jhb@FreeBSD.org>
Subject: Re: SMP boot errors
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group
X-List-Received-Date: Tue, 16 Aug 2005 17:50:15 -0000

On Tuesday 16 August 2005 08:49 am, Hans van Leest wrote:
> Still problems, unfortunately.
> I did the following:
> Started with a new CURRENT source.
> Applied the patch, then built and installed the kernel.
>
> At boot, with ACPI enabled in the BIOS,
> in the bootloader I typed:
> #set hw.pci0.31.INTA.irq=16
> #boot
> I get the following error:
> Instruc. pointer: 0x20:0xc0800a87
> Same error with ..irq=18
>
> With ACPI disabled in the BIOS:
> #set hw.pci0.31.INTA.irq=16 (or 18)
> #boot
> Instruc. pointer: 0x20:0xc0530f4d
>
> I noticed the following line in dmesg after booting the previous kernel:
> atapci0: port 0xf000-0xf00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 irq 18 at device 31.2 on pci0
>
> It looks like the Intel RAID controller uses irq 18.
>
> Any suggestions?

I need the full error message when it dies. It seems to be a kernel panic.
At the very least, can you build a debug kernel and then get the panic
messages, including the faulting virtual address and instruction pointer
lines? Even better would be if you compile ddb and kdb into your kernel and
get a stack trace from ddb. Also, I'm pretty sure this patch can only help in
the non-ACPI case, where it panics with the multiple IRQs problem.

> _______________________________________________
> freebsd-smp@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-smp
> To unsubscribe, send any mail to "freebsd-smp-unsubscribe@freebsd.org"
>

-- 
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org

From owner-freebsd-smp@FreeBSD.ORG Wed Aug 17 12:32:01 2005
X-Original-To: freebsd-smp@freebsd.org
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7457C16A41F for ; Wed, 17 Aug 2005 12:32:01 +0000 (GMT) (envelope-from hvleest@signet.nl)
Received: from bsd.local.jellevanleest.nl (cp417515-a.dbsch1.nb.home.nl [84.27.32.116]) by mx1.FreeBSD.org (Postfix) with ESMTP id BE63443D45 for ; Wed, 17 Aug 2005 12:31:59 +0000 (GMT) (envelope-from hvleest@signet.nl)
Received: from 84.27.32.116 (localhost.local.jellevanleest.nl [127.0.0.1]) by bsd.local.jellevanleest.nl (Postfix) with ESMTP id 36AA96208 for ; Wed, 17 Aug 2005 14:31:59 +0200 (CEST)
Received: from 80.100.141.112 (SquirrelMail authenticated user hans); by 84.27.32.116 with HTTP; Wed, 17 Aug 2005 14:31:59 +0200 (CEST)
Message-ID: <2168.80.100.141.112.1124281919.squirrel@80.100.141.112>
In-Reply-To: <200508161126.12330.jhb@FreeBSD.org>
References: <42CD9115.70302@signet.nl> <200508151337.59240.jhb@FreeBSD.org> <4301E0D9.8090307@signet.nl> <200508161126.12330.jhb@FreeBSD.org>
Date: Wed, 17 Aug 2005 14:31:59
    +0200 (CEST)
From: "Hans van Leest"
To: freebsd-smp@freebsd.org
User-Agent: SquirrelMail/1.4.3a
X-Mailer: SquirrelMail/1.4.3a
MIME-Version: 1.0
Content-Type: text/plain;charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Priority: 3 (Normal)
Importance: Normal
Subject: Re: SMP boot errors
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: hvleest@signet.nl
List-Id: FreeBSD SMP implementation group
X-List-Received-Date: Wed, 17 Aug 2005 12:32:01 -0000

> On Tuesday 16 August 2005 08:49 am, Hans van Leest wrote:
>> Still problems, unfortunately.
>> I did the following:
>> Started with a new CURRENT source.
>> Applied the patch, then built and installed the kernel.
>>
>> At boot, with ACPI enabled in the BIOS,
>> in the bootloader I typed:
>> #set hw.pci0.31.INTA.irq=16
>> #boot
>> I get the following error:
>> Instruc. pointer: 0x20:0xc0800a87
>> Same error with ..irq=18
>>
>> With ACPI disabled in the BIOS:
>> #set hw.pci0.31.INTA.irq=16 (or 18)
>> #boot
>> Instruc. pointer: 0x20:0xc0530f4d
>>
>> I noticed the following line in dmesg after booting the previous kernel:
>> atapci0: port 0xf000-0xf00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 irq 18 at device 31.2 on pci0
>>
>> It looks like the Intel RAID controller uses irq 18.
>>
>> Any suggestions?
>
> I need the full error message when it dies. It seems to be a kernel panic.
> At the very least, can you build a debug kernel and then get the panic
> messages, including the faulting virtual address and instruction pointer
> lines? Even better would be if you compile ddb and kdb into your kernel and
> get a stack trace from ddb. Also, I'm pretty sure this patch can only help in
> the non-ACPI case, where it panics with the multiple IRQs problem.
>

I have ddb, kdb and debugging compiled into my kernel, so I see all of that
on my screen. Last time I tried to grab it over a serial console, with no
success: neither my XP box nor my FreeBSD box could make a connection. Does
anyone know a good howto for this?

Thanks

From owner-freebsd-smp@FreeBSD.ORG Wed Aug 17 23:55:11 2005
X-Original-To: freebsd-smp@freebsd.org
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E603116A41F for ; Wed, 17 Aug 2005 23:55:11 +0000 (GMT) (envelope-from girgen@FreeBSD.org)
Received: from mxfep01.bredband.com (mxfep01.bredband.com [195.54.107.70]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0B3F543D48 for ; Wed, 17 Aug 2005 23:55:10 +0000 (GMT) (envelope-from girgen@FreeBSD.org)
Received: from palle.girgensohn.se ([213.114.205.87] [213.114.205.87]) by mxfep01.bredband.com with ESMTP id <20050817235509.OLO23053.mxfep01.bredband.com@palle.girgensohn.se>; Thu, 18 Aug 2005 01:55:09 +0200
Received: from localhost (palle.girgensohn.se [127.0.0.1]) by palle.girgensohn.se (Postfix) with ESMTP id 568AA1CFCE; Thu, 18 Aug 2005 01:55:09 +0200 (CEST)
Received: from palle.girgensohn.se ([127.0.0.1])
	by localhost (palle.girgensohn.se [127.0.0.1]) (amavisd-new, port 10024)
	with LMTP id 99127-09; Thu, 18 Aug 2005 01:55:09 +0200 (CEST)
Received: from palle.girgensohn.se (palle.girgensohn.se [127.0.0.1])
	by palle.girgensohn.se (Postfix) with ESMTP id 24EEF1CF5F;
	Thu, 18 Aug 2005 01:55:09 +0200 (CEST)
Date: Thu, 18 Aug 2005 01:55:09 +0200
From: Palle Girgensohn
To: Rutger Bevaart , freebsd-smp@freebsd.org
Message-ID: <54A5EA8AE63A943A718F6AF2@palle.girgensohn.se>
In-Reply-To: <24434.193.172.18.3.1121433324.squirrel@193.172.18.3>
References: <24434.193.172.18.3.1121433324.squirrel@193.172.18.3>
X-Mailer: Mulberry/3.1.6 (Linux/x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
X-Virus-Scanned: by amavisd-new at pingpong.net
Cc:
Subject: Re: FreeBSD unstable on Dell 1750 using SMP?
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 17 Aug 2005 23:55:12 -0000

--On Friday, July 15, 2005 15.15.24 +0200 Rutger Bevaart wrote:

>
> hello list,
>
> For the past year we've been running several Dell PowerEdge 1750 servers
> on FreeBSD 4.10, 4.11 and 5.3. All these machines have dual Xeons running
> with HT enabled. This install has proven to be unstable in that the
> machine will reboot between 3 days and 170 days without apparent reason.
> No log is written. Other machines we have with a single CPU (HT enabled)
> do not experience this problem.
>
> As it is present in both 4.x and 5.x and googling the last year has not
> revealed similar experience I'm consulting this list. As all of these
> machines are production machines that have a continuous load (not heavy
> load, but a light average - some peaks) it's not easy to experiment with
> HT setting etc.
> I dislike driving to the datacenter for locked systems
> with fubarred kernels ;-)
>
> The only error I've ever seen just before a reboot is "bge0: discard frame
> w/o packet header" on the 5.3 machine.

Late comment while browsing the list for tips...

No good clues, I'm afraid, but we have a 2850, and it is far from stable,
crashing within hours when running SMP, often but not always under high
load. Single CPU works like a charm. This is very annoying, to say the
least. See my posts on amd64@ around June 15.

FreeBSD 5.4p1 (amd64). Dell 2850 with dual Xeon CPUs, EM64T.

/Palle

From owner-freebsd-smp@FreeBSD.ORG Thu Aug 18 12:30:40 2005
Return-Path:
X-Original-To: freebsd-smp@freebsd.org
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id BFD6E16A41F;
	Thu, 18 Aug 2005 12:30:40 +0000 (GMT)
	(envelope-from rutger.bevaart@illian.net)
Received: from darwin.illian.net (darwin.illian.net [80.69.74.160])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 425ED43D46;
	Thu, 18 Aug 2005 12:30:39 +0000 (GMT)
	(envelope-from rutger.bevaart@illian.net)
Received: from localhost (localhost.illian.net [127.0.0.1])
	by darwin.illian.net (Postfix) with ESMTP id 0EB884506F;
	Thu, 18 Aug 2005 14:30:45 +0200 (CEST)
Received: from darwin.illian.net ([127.0.0.1])
	by localhost (darwin.illian.net [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id 89413-05; Thu, 18 Aug 2005 14:30:44 +0200 (CEST)
Received: from www.illian.net (localhost.illian.net [127.0.0.1])
	by darwin.illian.net (Postfix) with ESMTP id 4747745059;
	Thu, 18 Aug 2005 14:30:44 +0200 (CEST)
Received: from 193.172.18.3 (SquirrelMail authenticated user rutger);
	by www.illian.net with HTTP; Thu, 18 Aug 2005 14:30:44 +0200 (CEST)
Message-ID: <14564.193.172.18.3.1124368244.squirrel@193.172.18.3>
In-Reply-To: <54A5EA8AE63A943A718F6AF2@palle.girgensohn.se>
References: <24434.193.172.18.3.1121433324.squirrel@193.172.18.3>
	<54A5EA8AE63A943A718F6AF2@palle.girgensohn.se>
Date: Thu, 18 Aug 2005 14:30:44 +0200 (CEST)
From: "Rutger Bevaart"
To: "Palle Girgensohn"
User-Agent: SquirrelMail/1.4.3a
X-Mailer: SquirrelMail/1.4.3a
MIME-Version: 1.0
Content-Type: text/plain;charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Priority: 3 (Normal)
Importance: Normal
X-Virus-Scanned: amavisd-new at illian.net
Cc: freebsd-smp@freebsd.org, Rutger Bevaart
Subject: Re: FreeBSD unstable on Dell 1750 using SMP?
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 18 Aug 2005 12:30:40 -0000

It seems that updating our machine to 5.4-p5 (RELENG_5_4) has solved this,
or at least made it occur less frequently. Our last reboot was after
building and installing the new kernel and it hasn't gone down since. This
is with SMP, ACPI and HT enabled on a Dell 1750 with two 3GHz Xeons. The
2850 has been rock-stable running 5.4-p3. Whatever it was, it seems to
have been fixed around that time.

Could be that your issues are amd64 related. We run the i386 branch
because we need stable systems, not 64bit.

The issue still persists on 4.11 though. Can somebody explain what the
ACPI fixes were around that time and if they will be backported to 4.X?

Regards
Rutger Bevaart

On Thu, August 18, 2005 1:55, Palle Girgensohn said:
>
>
> --On Friday, July 15, 2005 15.15.24 +0200 Rutger Bevaart
> wrote:
>
>>
>> hello list,
>>
>> For the past year we've been running several Dell PowerEdge 1750 servers
>> on FreeBSD 4.10, 4.11 and 5.3. All these machines have dual Xeons running
>> with HT enabled. This install has proven to be unstable in that the
>> machine will reboot between 3 days and 170 days without apparent reason.
>> No log is written. Other machines we have with a single CPU (HT enabled)
>> do not experience this problem.
>>
>> As it is present in both 4.x and 5.x and googling the last year has not
>> revealed similar experience I'm consulting this list. As all of these
>> machines are production machines that have a continuous load (not heavy
>> load, but a light average - some peaks) it's not easy to experiment with
>> HT setting etc. I dislike driving to the datacenter for locked systems
>> with fubarred kernels ;-)
>>
>> The only error I've ever seen just before a reboot is "bge0: discard frame
>> w/o packet header" on the 5.3 machine.
>
> Late comment while browsing the list for tips...
>
> No good clues, I'm afraid, but we have a 2850, and it is far from stable,
> crashing within hours when running SMP, often but not always under high
> load. Single CPU works like a charm. This is very annoying, to say the
> least. See my posts on amd64@ around June 15.
>
> FreeBSD 5.4p1 (amd64). Dell 2850 with dual Xeon CPUs, EM64T.
>
> /Palle
>
>

Rutger Bevaart :: illian.networks

From owner-freebsd-smp@FreeBSD.ORG Thu Aug 18 12:46:06 2005
Return-Path:
X-Original-To: freebsd-smp@freebsd.org
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A018816A41F
	for ; Thu, 18 Aug 2005 12:46:06 +0000 (GMT)
	(envelope-from girgen@FreeBSD.org)
Received: from mxfep01.bredband.com (mxfep01.bredband.com [195.54.107.70])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 9522443D45
	for ; Thu, 18 Aug 2005 12:46:05 +0000 (GMT)
	(envelope-from girgen@FreeBSD.org)
Received: from palle.girgensohn.se ([213.114.205.87] [213.114.205.87])
	by mxfep01.bredband.com with ESMTP
	id <20050818124604.GBZO23053.mxfep01.bredband.com@palle.girgensohn.se>;
	Thu, 18 Aug 2005 14:46:04 +0200
Received: from localhost (palle.girgensohn.se [127.0.0.1])
	by palle.girgensohn.se (Postfix) with ESMTP id D474F1D12D;
	Thu, 18 Aug 2005 14:46:03 +0200 (CEST)
Received: from palle.girgensohn.se ([127.0.0.1])
	by localhost (palle.girgensohn.se [127.0.0.1])
	(amavisd-new, port 10024) with LMTP id 06128-05;
	Thu, 18 Aug 2005 14:46:03 +0200 (CEST)
Received: from palle.girgensohn.se (palle.girgensohn.se [127.0.0.1])
	by palle.girgensohn.se (Postfix) with ESMTP id 9CCC21CFCE;
	Thu, 18 Aug 2005 14:46:03 +0200 (CEST)
Date: Thu, 18 Aug 2005 14:46:03 +0200
From: Palle Girgensohn
To: Rutger Bevaart
Message-ID: <1FD3C2C1CA1D994795EC5288@palle.girgensohn.se>
In-Reply-To: <14564.193.172.18.3.1124368244.squirrel@193.172.18.3>
References: <24434.193.172.18.3.1121433324.squirrel@193.172.18.3>
	<54A5EA8AE63A943A718F6AF2@palle.girgensohn.se>
	<14564.193.172.18.3.1124368244.squirrel@193.172.18.3>
X-Mailer: Mulberry/3.1.6 (Linux/x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
X-Virus-Scanned: by amavisd-new at pingpong.net
Cc: freebsd-smp@freebsd.org, Rutger Bevaart
Subject: Re: FreeBSD unstable on Dell 1750 using SMP?
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 18 Aug 2005 12:46:06 -0000

--On Thursday, August 18, 2005 14.30.44 +0200 Rutger Bevaart wrote:

> It seems that updating our machine to 5.4-p5 (RELENG_5_4) has solved this,
> or at least made it occur less frequently. Our last reboot was after
> building and installing the new kernel and it hasn't gone down since.

Very interesting. We're still at 5.4-p1. The version bump fixes didn't look
like they were addressing stability, only security, but why not...

> This
> is with SMP, ACPI and HT enabled on a Dell 1750 with two 3GHz Xeons.

Pretty identical to our system.

> The
> 2850 has been rock-stable running 5.4-p3.

And you never ran previous versions on that system?

> Whatever it was, it seems to
> have been fixed around that time.
>
> Could be that your issues are amd64 related.
> We run the i386 branch
> because we need stable systems, not 64bit.

I have indications that the problems have occurred equally on i386 and
amd64, and that amd64 is considered stable, but that might not be quite
true?

Regards,
Palle

> The issue still persists on 4.11 though. Can somebody explain what the
> ACPI fixes were around that time and if they will be backported to 4.X?
>
> Regards
> Rutger Bevaart
>
> On Thu, August 18, 2005 1:55, Palle Girgensohn said:
>>
>>
>> --On Friday, July 15, 2005 15.15.24 +0200 Rutger Bevaart
>> wrote:
>>
>>>
>>> hello list,
>>>
>>> For the past year we've been running several Dell PowerEdge 1750 servers
>>> on FreeBSD 4.10, 4.11 and 5.3. All these machines have dual Xeons running
>>> with HT enabled. This install has proven to be unstable in that the
>>> machine will reboot between 3 days and 170 days without apparent reason.
>>> No log is written. Other machines we have with a single CPU (HT enabled)
>>> do not experience this problem.
>>>
>>> As it is present in both 4.x and 5.x and googling the last year has not
>>> revealed similar experience I'm consulting this list. As all of these
>>> machines are production machines that have a continuous load (not heavy
>>> load, but a light average - some peaks) it's not easy to experiment with
>>> HT setting etc. I dislike driving to the datacenter for locked systems
>>> with fubarred kernels ;-)
>>>
>>> The only error I've ever seen just before a reboot is "bge0: discard frame
>>> w/o packet header" on the 5.3 machine.
>>
>> Late comment while browsing the list for tips...
>>
>> No good clues, I'm afraid, but we have a 2850, and it is far from stable,
>> crashing within hours when running SMP, often but not always under high
>> load. Single CPU works like a charm. This is very annoying, to say the
>> least. See my posts on amd64@ around June 15.
>>
>> FreeBSD 5.4p1 (amd64). Dell 2850 with dual Xeon CPUs, EM64T.
>>
>> /Palle
>>
>>
>
>
> Rutger Bevaart :: illian.networks
>

From owner-freebsd-smp@FreeBSD.ORG Thu Aug 18 12:49:51 2005
Return-Path:
X-Original-To: freebsd-smp@freebsd.org
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id EEA7E16A41F
	for ; Thu, 18 Aug 2005 12:49:51 +0000 (GMT)
	(envelope-from nicklas@dinpris.no)
Received: from dp-mail-01.dinpris.com (dp-mail-01.dinpris.com [62.73.247.20])
	by mx1.FreeBSD.org (Postfix) with SMTP id 5875F43D46
	for ; Thu, 18 Aug 2005 12:49:48 +0000 (GMT)
	(envelope-from nicklas@dinpris.no)
Received: (qmail 58372 invoked by uid 1004); 18 Aug 2005 12:58:53 -0000
Received: from 62.73.247.155 by dp-mail-01.dinpris.com (envelope-from , uid 98)
	with qmail-scanner-1.25
	(clamdscan: 0.86.2/1030. spamassassin: 3.1.0-rc1. perlscan: 1.25.
	Clear:RC:1(62.73.247.155):. Processed in 0.049316 secs);
	18 Aug 2005 12:58:53 -0000
X-Qmail-Scanner-Mail-From: nicklas@dinpris.no via dp-mail-01.dinpris.com
X-Qmail-Scanner: 1.25 (Clear:RC:1(62.73.247.155):. Processed in 0.049316 secs)
Received: from 529c-tbg7-5fl.oslo.dinpris.com (HELO ?62.73.247.155?)
	(nicklas@dinpris.no@62.73.247.155)
	by dp-mail-01.dinpris.com with SMTP; 18 Aug 2005 12:58:53 -0000
Message-ID: <430483CA.5090103@dinpris.no>
Date: Thu, 18 Aug 2005 14:49:14 +0200
From: "Nicklas B. Westerlund"
User-Agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Palle Girgensohn
References: <24434.193.172.18.3.1121433324.squirrel@193.172.18.3>
	<54A5EA8AE63A943A718F6AF2@palle.girgensohn.se>
	<14564.193.172.18.3.1124368244.squirrel@193.172.18.3>
	<1FD3C2C1CA1D994795EC5288@palle.girgensohn.se>
In-Reply-To: <1FD3C2C1CA1D994795EC5288@palle.girgensohn.se>
X-Enigmail-Version: 0.91.0.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-smp@freebsd.org, Rutger Bevaart
Subject: Re: FreeBSD unstable on Dell 1750 using SMP?
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 18 Aug 2005 12:49:52 -0000

Palle Girgensohn wrote:

>
>
> --On Thursday, August 18, 2005 14.30.44 +0200 Rutger Bevaart
> wrote:
>
>> It seems that updating our machine to 5.4-p5 (RELENG_5_4) has solved this,
>> or at least made it occur less frequently. Our last reboot was after
>> building and installing the new kernel and it hasn't gone down since.
>
>
> Very interesting. We're still at 5.4-p1. The version bump fixes didn't
> look like they were addressing stability, only security, but why not...

I'm running p1 on a few webservers, as p2 (which was the next level when
I set them up) seemed to break parts of the NFS throughput, and since
then I haven't dared patching up anything more. Anyone of you using NFS
on something higher than p1?

>
>
>
> Regards,
> Palle
>
>

Regards,
Nick.

From owner-freebsd-smp@FreeBSD.ORG Thu Aug 18 12:53:06 2005
Return-Path:
X-Original-To: freebsd-smp@freebsd.org
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id CA5F616A41F
	for ; Thu, 18 Aug 2005 12:53:06 +0000 (GMT)
	(envelope-from girgen@FreeBSD.org)
Received: from mxfep02.bredband.com (mxfep02.bredband.com [195.54.107.73])
	by mx1.FreeBSD.org (Postfix) with ESMTP id E1B3B43D48
	for ; Thu, 18 Aug 2005 12:53:05 +0000 (GMT)
	(envelope-from girgen@FreeBSD.org)
Received: from palle.girgensohn.se ([213.114.205.87] [213.114.205.87])
	by mxfep02.bredband.com with ESMTP
	id <20050818125304.HWKE17509.mxfep02.bredband.com@palle.girgensohn.se>;
	Thu, 18 Aug 2005 14:53:04 +0200
Received: from localhost (palle.girgensohn.se [127.0.0.1])
	by palle.girgensohn.se (Postfix) with ESMTP id 8E1BA1CE61;
	Thu, 18 Aug 2005 14:53:04 +0200 (CEST)
Received: from palle.girgensohn.se ([127.0.0.1])
	by localhost (palle.girgensohn.se [127.0.0.1]) (amavisd-new, port 10024)
	with LMTP id 06128-06; Thu, 18 Aug 2005 14:53:04 +0200 (CEST)
Received: from palle.girgensohn.se (palle.girgensohn.se [127.0.0.1])
	by palle.girgensohn.se (Postfix) with ESMTP id 5462C1CC16;
	Thu, 18 Aug 2005 14:53:04 +0200 (CEST)
Date: Thu, 18 Aug 2005 14:53:04 +0200
From: Palle Girgensohn
To: "Nicklas B.
	Westerlund"
Message-ID: <5170F4A2B44FD350339F1681@palle.girgensohn.se>
In-Reply-To: <430483CA.5090103@dinpris.no>
References: <24434.193.172.18.3.1121433324.squirrel@193.172.18.3>
	<54A5EA8AE63A943A718F6AF2@palle.girgensohn.se>
	<14564.193.172.18.3.1124368244.squirrel@193.172.18.3>
	<1FD3C2C1CA1D994795EC5288@palle.girgensohn.se>
	<430483CA.5090103@dinpris.no>
X-Mailer: Mulberry/3.1.6 (Linux/x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
X-Virus-Scanned: by amavisd-new at pingpong.net
Cc: freebsd-smp@freebsd.org, Rutger Bevaart
Subject: Re: FreeBSD unstable on Dell 1750 using SMP?
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 18 Aug 2005 12:53:06 -0000

--On Thursday, August 18, 2005 14.49.14 +0200 "Nicklas B. Westerlund" wrote:

> Palle Girgensohn wrote:
>
>>
>>
>> --On Thursday, August 18, 2005 14.30.44 +0200 Rutger Bevaart
>> wrote:
>>
>>> It seems that updating our machine to 5.4-p5 (RELENG_5_4) has solved this,
>>> or at least made it occur less frequently. Our last reboot was after
>>> building and installing the new kernel and it hasn't gone down since.
>>
>>
>> Very interesting. We're still at 5.4-p1. The version bump fixes didn't
>> look like they were addressing stability, only security, but why not...
>
>
> I'm running p1 on a few webservers, as p2 (which was the next level when
> I set them up) seemed to break parts of the NFS throughput, and since
> then I haven't dared patching up anything more. Anyone of you using NFS
> on something higher than p1?

I'm using NFS a bit, not for important tasks though, and I'm still on
5.4-p1, but this is subject to change soon.

/Palle

From owner-freebsd-smp@FreeBSD.ORG Thu Aug 18 12:57:08 2005
Return-Path:
X-Original-To: freebsd-smp@freebsd.org
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3883316A41F;
	Thu, 18 Aug 2005 12:57:08 +0000 (GMT)
	(envelope-from rutger.bevaart@illian.net)
Received: from darwin.illian.net (darwin.illian.net [80.69.74.160])
	by mx1.FreeBSD.org (Postfix) with ESMTP id ACE4843D58;
	Thu, 18 Aug 2005 12:57:07 +0000 (GMT)
	(envelope-from rutger.bevaart@illian.net)
Received: from localhost (localhost.illian.net [127.0.0.1])
	by darwin.illian.net (Postfix) with ESMTP id B9EFD450B5;
	Thu, 18 Aug 2005 14:57:17 +0200 (CEST)
Received: from darwin.illian.net ([127.0.0.1])
	by localhost (darwin.illian.net [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id 93128-02; Thu, 18 Aug 2005 14:57:17 +0200 (CEST)
Received: from www.illian.net (localhost.illian.net [127.0.0.1])
	by darwin.illian.net (Postfix) with ESMTP id 06F874506F;
	Thu, 18 Aug 2005 14:57:17 +0200 (CEST)
Received: from 193.172.18.3 (SquirrelMail authenticated user rutger);
	by www.illian.net with HTTP; Thu, 18 Aug 2005 14:57:17 +0200 (CEST)
Message-ID: <44669.193.172.18.3.1124369837.squirrel@193.172.18.3>
In-Reply-To: <430483CA.5090103@dinpris.no>
References: <24434.193.172.18.3.1121433324.squirrel@193.172.18.3>
	<54A5EA8AE63A943A718F6AF2@palle.girgensohn.se>
	<14564.193.172.18.3.1124368244.squirrel@193.172.18.3>
	<1FD3C2C1CA1D994795EC5288@palle.girgensohn.se>
	<430483CA.5090103@dinpris.no>
Date: Thu, 18 Aug 2005 14:57:17 +0200 (CEST)
From: "Rutger Bevaart"
To: "Nicklas B. Westerlund"
User-Agent: SquirrelMail/1.4.3a
X-Mailer: SquirrelMail/1.4.3a
MIME-Version: 1.0
Content-Type: text/plain;charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Priority: 3 (Normal)
Importance: Normal
X-Virus-Scanned: amavisd-new at illian.net
Cc: Palle Girgensohn , freebsd-smp@freebsd.org
Subject: Re: FreeBSD unstable on Dell 1750 using SMP?
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 18 Aug 2005 12:57:08 -0000

I have to check. We're running a reasonably sized domain ordering system on
5.4 but as I currently don't have access to those machines I cannot check
the exact version.

I must correct myself: our stable 2850 is running 5.3-p9. Therefore I
cannot say if it's stable because of 5.3 or the relatively light load
(Java Tomcat stuff).

Rgds
Rutger

On Thu, August 18, 2005 14:49, Nicklas B. Westerlund said:
> Palle Girgensohn wrote:
>
>>
>>
>> --On Thursday, August 18, 2005 14.30.44 +0200 Rutger Bevaart
>> wrote:
>>
>>> It seems that updating our machine to 5.4-p5 (RELENG_5_4) has solved this,
>>> or at least made it occur less frequently. Our last reboot was after
>>> building and installing the new kernel and it hasn't gone down since.
>>
>>
>> Very interesting. We're still at 5.4-p1. The version bump fixes didn't
>> look like they were addressing stability, only security, but why not...
>
>
> I'm running p1 on a few webservers, as p2 (which was the next level when
> I set them up) seemed to break parts of the NFS throughput, and since
> then I haven't dared patching up anything more. Anyone of you using NFS
> on something higher than p1?
>
>>
>>
>>
>> Regards,
>> Palle
>>
>>
>
> Regards,
> Nick.
>
>

Rutger Bevaart :: illian.networks