From owner-freebsd-stable@FreeBSD.ORG Fri Apr 7 12:15:10 2006
Date: Fri, 7 Apr 2006 14:14:52 +0200
From: Albert Shih
To: freebsd-stable@FreeBSD.ORG
Message-ID: <20060407121452.GO1784@math.jussieu.fr>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="0ntfKIWw70PvrIHh"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.5.6i
Subject: Disappointed-new
Reply-To: shih@math.jussieu.fr
List-Id: Production branch of FreeBSD source code

--0ntfKIWw70PvrIHh
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Hi all,

Two days ago I sent a message with the subject "Disappointed". Many of you answered that I had not described the bug. First, my English is very bad; second, I am not blaming anyone, least of all the developers. Personally, I am very impressed by the work you are doing.

Here is a detailed description of my problem.

Hardware:

	HP ProLiant ML350 G4
	2 x Xeon 3.2 GHz (HT disabled)
	1 GB RAM
	1 bge network interface on the motherboard
	1 dual-port 1000 Mbit/s card (em chipset)
	1 dual-port 100 Mbit/s card (fxp chipset)
	1 internal Smart Array 641 with 2 hot-plug disks in RAID 1
	1 MSA1000 with 14 disks, attached over Fibre Channel

Situation:

	Each network card is on a different IP subnet.
	Each network card has its own IP address.
	All interfaces are connected to a Foundry 1000 Mbit/s L2 switch.

Purpose:

	It is a central NFS server (a NetApp for a «small» budget...). The
	server also runs a dhcpd server, and that is all. There are no user
	accounts on this server. The server is not routed to the Internet.

	NFS is bound to 3 of the 5 IP addresses (the other 2 only serve ssh,
	for scp).
	There are 13 NFS clients running Linux (various kernel versions).
	There are also 4 NFS clients running FreeBSD (various versions, but
	all > 5.2 and < 5.5).

Timeline:

	6-STABLE was installed on the server at the beginning of February 2006.

Problems:

First time: Kernel: SMP + ipfw

	At first the «main» nfsd was bound to the bge0 interface (main = 90%
	of the NFS traffic). After 10-15 days of running perfectly, bge0
	stopped working, but the other interfaces kept working perfectly.
	The server stayed up, and the other NFS clients could still access the
	NFS partition without any problem. I could log in on the console. I
	tried many things:

		ifconfig bge0 down
		ifconfig bge0 up
		ifconfig bge0 delete

	and so on; nothing worked. The console repeatedly showed messages
	like:

		bge0 watchdog timeout problems
		bge0 watchdog timeout problems

	For me, only a reboot got the system working again, and after the
	reboot everything worked fine. But after some days the problem came
	back, and in that second case none of the interfaces worked. I could
	still log in on the console, but the reboot was not clean (I needed to
	run a big fsck).

Second time: Kernel: NO_SMP + ipfw

	After some advice on this mailing list I switched to the
	uniprocessor version of the kernel. This time, after some days of
	working fine, bge0 stopped working again (same conditions as the
	first time).

Third time: Kernel: NO_SMP + ipfw

	I moved the main NFS interface (= 90% of the traffic) to em0 and put
	an IP address that does not serve NFS (only scp) on bge0. Again, after
	some days, the em0 interface stopped working. This time the message
	on the console was:

		em0 watchdog timeout problems

	Sometimes I get fxpX watchdog timeout problems too.

Fourth time: Kernel: NO_SMP + polling + ipfw

	Now I am running all interfaces in polling mode. And... I hope it
	works (it has been running for 2 days).

Information:

	I can't tell whether it happens during heavy NFS load, but I really
	don't think so. There was one crash on a Saturday (and we don't have
	many users on that day).

	I cannot reproduce this bug. I tried to generate heavy NFS access: on
	4 Linux clients I ran, at the same time, something like

		find . -type f -exec md5sum {} \;

	but the server would not crash. That partition holds 30 GB.

	I forgot to say that I ran a very similar configuration (an old
	ML350 G3 with the same MSA1000, under the same conditions) on 4.x for
	4 years without any crash (with the same clients, etc.).

	Attached is the dmesg taken just after the server booted.

	Next Monday I will switch to a debugging (DB) kernel, but for now all
	I can do is reboot the server (600 users).
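The read test described above (find with md5sum, started on several clients at once) could be made repeatable with something like the following. This is only a sketch, not what was actually run: the function names read_pass and stress_nfs, the mount point /mnt/nfs and the pass count are assumptions for illustration. It prints one timestamped line per pass, so that a later watchdog timeout on the server could be compared against what the clients were doing at that moment.

```shell
#!/bin/sh
# Sketch of a repeatable NFS read-load test, meant to run on each client.
# Assumption: function names, mount point and pass count are placeholders
# and do not come from the original report.

# Read every regular file under $1 once (same find/md5sum command as in
# the report) and print how many files were checksummed.
read_pass() {
    find "$1" -type f -exec md5sum {} \; 2>/dev/null | wc -l | tr -d ' '
}

# Run $2 passes over the tree $1, printing one timestamped line per pass.
stress_nfs() {
    dir=$1
    passes=$2
    i=1
    while [ "$i" -le "$passes" ]; do
        count=$(read_pass "$dir")
        echo "$(date '+%Y-%m-%d %H:%M:%S') pass=$i files=$count"
        i=$((i + 1))
    done
}

# Example with a hypothetical mount point:
#   stress_nfs /mnt/nfs 10
```

Started at the same time on the 4 Linux clients, this gives the same kind of load as the single find run, but repeated and logged.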
I hope this helps you make FreeBSD better than the best OS ;-).

Many thanks.

-- 
Albert SHIH
Universite de Paris 7 (Denis DIDEROT)
U.F.R. de Mathematiques.
Heure local/Local time:
Fri Apr 7 13:38:55 CEST 2006

--0ntfKIWw70PvrIHh
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=dmesg-20060405-174250

Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 6.1-PRERELEASE #1: Wed Apr 5 17:27:03 CEST 2006
    root@nfs3.math.jussieu.fr:/usr/obj/usr/src/sys/NFS3-mono
ACPI APIC Table:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.20GHz (3200.13-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0xf41 Stepping = 1
  Features=0xbfebfbff
  Features2=0x641d>
  AMD Features=0x20000000
  Hyperthreading: 2 logical CPUs
real memory = 1073688576 (1023 MB)
avail memory = 1041752064 (993 MB)
ioapic1: Changing APIC ID to 9
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-47 on motherboard
ioapic2 irqs 48-71 on motherboard
ioapic3 irqs 72-95 on motherboard
npx0: [FAST]
npx0: on motherboard
npx0: INT 16 interface
acpi0: on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x908-0x90b on acpi0
cpu0: on acpi0
pcib0: on acpi0
pci0: on pcib0
pcib1: at device 2.0 on pci0
pci5: on pcib1
pcib2: at device 0.0 on pci5
pci6: on pcib2
isp0: port 0x6000-0x60ff mem 0xfdef0000-0xfdef0fff irq 48 at device 1.0 on pci6
isp0: [GIANT-LOCKED]
pcib3: at device 0.2 on pci5
pci9: on pcib3
em0: port 0x7000-0x703f mem 0xfdfe0000-0xfdffffff,0xfdf80000-0xfdfbffff irq 76 at device 1.0 on pci9
em0: Ethernet address: 00:11:0a:56:57:9e
em1: port 0x7040-0x707f mem 0xfdf60000-0xfdf7ffff irq 77 at device 1.1 on pci9
em1: Ethernet address: 00:11:0a:56:57:9f
ciss0: port 0x7400-0x74ff mem 0xfdf50000-0xfdf51fff,0xfdf00000-0xfdf3ffff irq 72 at device 2.0 on pci9
ciss0: [GIANT-LOCKED]
pcib4: at device 4.0 on pci0
pci13: on pcib4
pcib5: at device 6.0 on pci0
pci16: on pcib5
pcib6: at device 28.0 on pci0
pci2: on pcib6
pcib7: at device 2.0 on pci2
pci3: on pcib7
fxp0: port 0x5000-0x503f mem 0xfddf0000-0xfddf0fff,0xfdc00000-0xfdcfffff irq 26 at device 4.0 on pci3
miibus0: on fxp0
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:08:02:cd:d5:be
fxp1: port 0x5040-0x507f mem 0xfdbf0000-0xfdbf0fff,0xfda00000-0xfdafffff irq 26 at device 5.0 on pci3
miibus1: on fxp1
inphy1: on miibus1
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: Ethernet address: 00:08:02:cd:d5:bf
mpt0: port 0x4000-0x40ff mem 0xfd9e0000-0xfd9fffff,0xfd9c0000-0xfd9dffff irq 24 at device 3.0 on pci2
mpt0: [GIANT-LOCKED]
mpt0: MPI Version=1.2.14.0
mpt0: Unhandled Event Notify Frame. Event 0xa.
mpt1: port 0x4400-0x44ff mem 0xfd9a0000-0xfd9bffff,0xfd980000-0xfd99ffff irq 25 at device 3.1 on pci2
mpt1: [GIANT-LOCKED]
mpt1: MPI Version=1.2.14.0
mpt1: Unhandled Event Notify Frame. Event 0xa.
uhci0: port 0x2000-0x201f irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: port 0x2020-0x203f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
pci0: at device 29.4 (no driver attached)
pci0: at device 29.5 (no driver attached)
ehci0: mem 0xfbee0000-0xfbee03ff irq 23 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
usb2: EHCI version 1.0
usb2: companion controllers, 2 ports each: usb0 usb1
usb2: on ehci0
usb2: USB revision 2.0
uhub2: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 4 ports with 4 removable, self powered
pcib8: at device 30.0 on pci0
pci1: on pcib8
bge0: mem 0xfd8f0000-0xfd8fffff irq 17 at device 2.0 on pci1
miibus2: on bge0
brgphy0: on miibus2
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:15:60:0b:09:b4
pci1: at device 3.0 (no driver attached)
pci1: at device 4.0 (no driver attached)
isab0: at device 31.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x500-0x50f at device 31.1 on pci0
ata0: on atapci0
ata1: on atapci0
acpi_tz0: on acpi0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
ppc0: port 0x378-0x37f,0x778-0x77d irq 7 drq 0 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: on ppc0
plip0: on ppbus0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
sio0: port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
fdc0: port 0x3f2-0x3f5 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xc87ff,0xc8800-0xc9fff,0xca000-0xcdfff,0xee000-0xeffff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 3200131784 Hz quality 800
Timecounters tick every 1.000 msec
ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding disabled, default to deny, logging unlimited
acd0: CDROM at ata1-master UDMA33
Waiting 5 seconds for SCSI devices to settle
sa0 at mpt0 bus 0 target 0 lun 0
sa0: Removable Sequential Access SCSI-4 device
sa0: 160.000MB/s transfers (80.000MHz, offset 126, 16bit)
pass0 at isp0 bus 0 target 125 lun 0
pass0: Fixed Storage Array SCSI-4 device
pass0: 200.000MB/s transfers, Tagged Queueing Enabled
da2 at ciss0 bus 0 target 0 lun 0
da2: Fixed Direct Access SCSI-0 device
da2: 135.168MB/s transfers
da2: 69459MB (142253280 512 byte sectors: 255H 32S/T 17433C)
da0 at isp0 bus 0 target 125 lun 1
da0: Fixed Direct Access SCSI-4 device
da0: 200.000MB/s transfers, Tagged Queueing Enabled
da0: 555714MB (1138103296 512 byte sectors: 255H 63S/T 70843C)
da1 at isp0 bus 0 target 125 lun 2
da1: Fixed Direct Access SCSI-4 device
da1: 200.000MB/s transfers, Tagged Queueing Enabled
da1: 277850MB (569038365 512 byte sectors: 255H 63S/T 35421C)
Trying to mount root from ufs:/dev/da2s1a
em0: link state changed to UP
em1: link state changed to UP

--0ntfKIWw70PvrIHh--