From: Johan Ström
To: freebsd-stable@freebsd.org
Date: Thu, 17 Nov 2005 18:20:38 +0100
Subject: Page fault, GEOM problem??

OK, I just got this not-so-nice error on a RELENG_6_0 box (built from sources this morning, GENERIC kernel minus the drivers I don't use):

Nov 17 15:35:43 elfi kernel: subdisk10: detached
Nov 17 15:35:43 elfi kernel: ad10: detached
Nov 17 15:35:43 elfi kernel: unknown: TIMEOUT - READ_DMA retrying (1 retry left) LBA=85720528
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad10s1 disconnected.
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134356992, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134373376, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134438912, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268591104, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268607488, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268623872, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268640256, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=20151026176, length=2048)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=32299655680, length=8192)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[READ(offset=37363671552, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[READ(offset=38349087232, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[READ(offset=45453566464, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[READ(offset=54459458048, length=131072)]
Nov 17 17:59:18 elfi syslogd: kernel boot file is /boot/kernel/kernel
Nov 17 17:59:18 elfi kernel:
Nov 17 17:59:18 elfi kernel:
Nov 17 17:59:18 elfi kernel: Fatal trap 12: page fault while in kernel mode
Nov 17 17:59:18 elfi kernel: fault virtual address = 0x48
Nov 17 17:59:18 elfi kernel: fault code = supervisor read, page not present
Nov 17 17:59:18 elfi kernel: instruction pointer = 0x20:0xc0506b92
Nov 17 17:59:18 elfi kernel: stack pointer = 0x28:0xd56d7c9c
Nov 17 17:59:18 elfi kernel: frame pointer = 0x28:0xd56d7c9c
Nov 17 17:59:18 elfi kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Nov 17 17:59:18 elfi kernel: = DPL 0, pres 1, def32 1, gran 1
Nov 17 17:59:18 elfi kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Nov 17 17:59:18 elfi kernel: current process = 36 (swi4: clock sio)
Nov 17 17:59:18 elfi kernel: trap number = 12
Nov 17 17:59:18 elfi kernel: panic: page fault
Nov 17 17:59:18 elfi kernel: Uptime: 8h55m1s

ad10 and ad6, two brand-new Maxtor Maxline 300GB SATA disks attached to a Promise PDC40518 SATA150 controller, make up the GEOM mirror gm0s1. I had been running this setup in another "test" machine (MSI K8N neo Platinum, KT333 chip I believe), and I hadn't had a single problem. I moved the disks and controller card to my "real" server 24 hours ago, and the only apparent "problem" I seemed to have was this:

Nov 17 07:06:12 elfi kernel: xl0: transmission error: 90
Nov 17 07:06:12 elfi kernel: xl0: tx underrun, increasing tx start threshold to 120 bytes
Nov 17 07:06:18 elfi kernel: xl0: watchdog timeout
Nov 17 07:06:18 elfi kernel: xl0: link state changed to DOWN
Nov 17 07:06:18 elfi kernel: vlan5: link state changed to DOWN
Nov 17 07:06:20 elfi kernel: xl0: link state changed to UP
Nov 17 07:06:20 elfi kernel: vlan5: link state changed to UP

Coming and going...
these problems only appeared during the first 20-30 minutes after boot, then they disappeared completely (and yes, there was plenty of network I/O going on both during and after these messages). Sometimes I just got the first two messages and nothing "happened", but sometimes the watchdog message came and the network died for a minute or so.

Here is the dmesg from the last boot (directly after the crash):

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 6.0-RELEASE #0: Thu Nov 17 00:49:29 CET 2005
johan@elfi.stromnet.org:/usr/obj/usr/src/sys/ELFI
ACPI APIC Table:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(TM) XP 1900+ (1599.56-MHz 686-class CPU)
Origin = "AuthenticAMD" Id = 0x662 Stepping = 2
Features=0x383fbff
AMD Features=0xc0480800
real memory = 536854528 (511 MB)
avail memory = 516014080 (492 MB)
ioapic0: Changing APIC ID to 2
ioapic0 irqs 0-23 on motherboard
npx0: [FAST]
npx0: on motherboard
npx0: INT 16 interface
acpi0: on motherboard
acpi0: Power Button (fixed)
pci_link0: irq 11 on acpi0
pci_link1: irq 10 on acpi0
pci_link2: irq 0 on acpi0
pci_link3: irq 12 on acpi0
pci_link4: irq 5 on acpi0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <32-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0
cpu0: on acpi0
acpi_button0: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
agp0: mem 0xe0000000-0xe3ffffff at device 0.0 on pci0
pcib1: at device 1.0 on pci0
pci1: on pcib1
pci0: at device 5.0 (no driver attached)
fwohci0: mem 0xdf000000-0xdf0007ff,0xde800000-0xde803fff irq 17 at device 7.0 on pci0
fwohci0: OHCI version 1.10 (ROM=1)
fwohci0: No. of Isochronous channels is 4.
fwohci0: EUI64 00:e0:18:00:00:02:7e:fe
fwohci0: Phy 1394a available S400, 1 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0: on fwohci0
sbp0: on firewire0
fwe0: on firewire0
if_fwe0: Fake Ethernet address: 02:e0:18:02:7e:fe
fwe0: Ethernet address: 02:e0:18:02:7e:fe
fwe0: if_start running deferred for Giant
fwohci0: Initiate bus reset
fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
firewire0: bus manager 0 (me)
uhci0: port 0xd400-0xd41f irq 19 at device 9.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: port 0xd000-0xd01f irq 16 at device 9.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: on uhci1
usb1: USB revision 1.0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
ehci0: mem 0xde000000-0xde0000ff irq 17 at device 9.2 on pci0
ehci0: [GIANT-LOCKED]
usb2: EHCI version 0.95
usb2: companion controllers, 2 ports each: usb0 usb1
usb2: on ehci0
usb2: USB revision 2.0
uhub2: VIA EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 4 ports with 4 removable, self powered
pci0: at device 12.0 (no driver attached)
atapci0: port 0xb400-0xb47f, 0xb000-0xb0ff mem 0xdc000000-0xdc000fff,0xdb800000-0xdb81ffff irq 17 at device 14.0 on pci0
ata2: on atapci0
ata3: on atapci0
ata4: on atapci0
ata5: on atapci0
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xa800-0xa87f mem 0xdb000000-0xdb00007f irq 19 at device 16.0 on pci0
miibus0: on xl0
xlphy0: <3c905C 10/100 internal PHY> on miibus0
xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:04:76:ef:c6:36
isab0: at device 17.0 on pci0
isa0: on isab0
atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xa400-0xa40f at device 17.1 on pci0
ata0: on atapci1
ata1: on atapci1
uhci2: port 0xa000-0xa01f at device 17.2 on pci0
uhci2: [GIANT-LOCKED]
usb3: on uhci2
usb3: USB revision 1.0
uhub3: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
uhci3: port 0x9800-0x981f irq 21 at device 17.3 on pci0
uhci3: [GIANT-LOCKED]
usb4: on uhci3
usb4: USB revision 1.0
uhub4: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub4: 2 ports with 2 removable, self powered
ppc0: port 0x378-0x37f,0x778-0x77b irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0: on ppc0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xccfff, 0xd0000-0xd07ff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 1599556047 Hz quality 800
Timecounters tick every 1.000 msec
acd0: CDROM at ata1-master PIO4
ad6: 286188MB at ata3-master SATA150
ad10: 286188MB at ata5-master SATA150
GEOM_MIRROR: Device gm0s1 created (id=4118114647).
GEOM_MIRROR: Device gm0s1: provider ad6s1 detected.
GEOM_MIRROR: Device gm0s1: provider ad10s1 detected.
GEOM_MIRROR: Device gm0s1: provider ad6s1 activated.
GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 launched.
GEOM_MIRROR: Device gm0s1: rebuilding provider ad10s1.
Trying to mount root from ufs:/dev/mirror/gm0s1a
WARNING: / was not properly dismounted
WARNING: /tmp was not properly dismounted
WARNING: /usr was not properly dismounted
/usr: mount pending error: blocks 8076 files 28
WARNING: /var was not properly dismounted
/var: mount pending error: blocks 4508 files 2
xl0: transmission error: 90
xl0: tx underrun, increasing tx start threshold to 120 bytes
xl0: transmission error: 90
xl0: tx underrun, increasing tx start threshold to 180 bytes

The network card is the exact same model as the one I used in the "test" machine, and I didn't have any problems there.

So, any ideas what this can be? If there was a disk crash, which I have a hard time believing since I ran PowerMax (Maxtor's test program) on both of these disks 3 weeks ago and they have been running fine without a single problem since I started using them, why didn't GEOM just kick in and run on the other disk? Page faulting is no way to react when a disk goes dead.

I hope someone can help me, or that this problem doesn't occur any more, but I suppose that is too much to hope for...

Thanks
Johan