From: Adrian Wontroba <aw1@stade.co.uk>
To: freebsd-stable@freebsd.org
Date: Tue, 13 Mar 2007 14:08:48 +0000
Subject: 6.2-STABLE deadlock?

At work, amongst my stable of old computers running FreeBSD, I have a Fujitsu M800: a four-processor Xeon SMP machine with 4 GB of memory. It primarily runs Nagios and a small, lightly used MySQL database, along with a few inbound FTP transfers per minute. Its disc subsystem is based on a Mylex card, which rules out crash dumps.

At some point during 5.5-STABLE this machine started to hang occasionally while performing its daily "application" housekeeping: stopping and restarting Apache and Nagios, and dumping the database. Upgrading to 6.2-STABLE appeared to solve the problem; no hangs were seen while running 1,000 cycles of the sequence that seemed to provoke it. cvsup for this version of the kernel and userland was run at 01:20 GMT on 06 March.

However, shortly after 15:15 last Sunday afternoon (11 March) the machine hung again, "out of the blue". kdb diagnostics were taken some 12 hours later, and they look somewhat odd. Perhaps the machine was left to fester for too long.

The ps and other diagnostic output is at http://www.stade.co.uk/crash/console, which contains the boot-to-boot serial console output, including some output from the test cycles. I'd be grateful for any expert comments on it.

Supporting stuff:
[root@beastie ~/crash]# df
Filesystem     1K-blocks     Used    Avail Capacity  Mounted on
/dev/mlxd0s1a     507630    70074   396946      15%  /
devfs                  1        1        0     100%  /dev
/dev/mlxd0s1f   63541498 44355014 14103166      76%  /home
/dev/mlxd0s1e   16244334  6784900  8159888      45%  /usr
/dev/mlxd0s1d    1012974   117456   814482      13%  /var
/dev/md0            1646       32     1484       2%  /home/topftp/instances
/dev/md1          253678      132   233252       0%  /tmp

[root@beastie ~]# find /var -inum 23 -ls
23 4 -rw-r--r-- 1 daemon daemon 60 Mar 12 20:22 /var/rwho/whod.xjamesfriis

The problem stopped HTTP and FTP logging soon after 15:14 on Sunday 11 March; diagnostics were taken and the machine was rebooted around 04:30 on Monday 12 March.

172.19.112.92 - - [11/Mar/2007:15:14:53 +0000] "GET / HTTP/1.0" 200 688 "-" "check_http/1.89 (nagios-plugins 1.4.3)"