From: "Arno J. Klaassen"
To: amd64@freebsd.org
Date: 29 Jun 2006 22:25:18 +0200
Subject: Re: SMP system not running SMP

"UEMURA (fka. MAENAKA) Tetsuya" writes:

> Posted on Tue, 27 Jun 2006 15:06:51 +0100
> By default, FreeBSD couldn't start. Dumping the ahd state when probing
> the da and simply stopped. So I set the SCSI BIOS to restrict the device
> speed upto 80MB/s and the problem went away. After that, the machine
> runs flawlessly for 8 months.

I have a Tyan S2882 which I cannot keep up for more than a couple of
days under moderate load, and the symptoms seem related :

config :
 - tracking -stable
 - 8G RAM
 - latest BIOS
 - 3ware 9500S-12 with 1.1T data
 - RAID-1 MAXTOR ATLAS10K5_73WLS as system-disk on ahd0
 - doing nothing else than some test-scripts implying fairly moderate
   nfs-traffic (i.e. scripts via nfs, (rarely needed) data either on
   NFS or raid, scripts being CPU-intensive)

symptom :
 - system cold-boots fine (SMP dual opteron 248)
 - runs OK for a couple of minutes/hours/days
 - then total freeze; *never* a panic in 9 months
 - warm reset either does not detect da0 or indeed dumps ahd state
   when probing it
 - even a cold reboot sometimes has to be repeated once or twice
   before da0 is detected correctly again

have tried :
 - changed scsi-cables and termination three times : no deal
 - decreased device speed to 80MHz : seems to eliminate the "minutes"
   part from "runs OK for a couple of minutes/hours/days" ...

observations :
 - this week I downloaded the latest manual from tyan and came across
   the following jumper setting (dunno if it was in the original
   version or whether I overlooked it; the printed manual is at the
   customer's site) :

    "Set PCI-X Bridge A (PCI 3 & PCI 4 & SCSI7902 & BCM5704) to
     operate at a maximum 66MHz;
     Note: Due to the PCI-X specifications it will be necessary to
     set this bus to 66MHz if a 133/100MHz PCI-X card is added to
     this bus."

   Since I do have a 100MHz PCI-X card (3ware) I set this jumper; the
   system has been up for three days now. I cannot yet confirm this
   was the culprit, but other AMD811X based systems might have the
   same issue.

 - this board has dual ahd and dual bge :

   vmstat -i (I just rebooted for an upgrade -stable + linux_base) :

    irq24: bge0 ahd0      16826      2
    irq25: bge1 ahd1    1305665    157

   network is attached to bge1, disk is on ahd0.
   Interestingly, when I provoke insane swapping, it is the "irq25:"
   process which consumes 50-90% (!) of cpu-time, but when I stop the
   program provoking swapping and redo vmstat -i, it indeed reports
   slightly increased irq24 activity but no noticeable change in
   irq25 activity ...
   ( I put hint.ahd.1.disabled="1" in /boot/loader.conf since I do
     not need ahd1, but that does not seem to do anything )

FYI. I can test on this box for a couple more weeks; feel free to
contact me for more information.

Thanx, regards,

Arno

-- 
  Arno J. Klaassen

  SCITO S.A.
  8 rue des Haies
  F-75020 Paris, France
  http://scito.com
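
The before/after comparison of vmstat -i totals described in the observations can be sketched in a few lines of Python. This is only an illustration, assuming the three-part layout quoted in this message (an "irqN:" label with device names, a total count, a rate). The names parse_vmstat_i and irq_deltas are hypothetical, and the second snapshot in the usage is invented for illustration, not measured on this box.

```python
import re

# Matches lines such as "irq25: bge1 ahd1    1305665    157":
# an "irqN:" label, one or more device names, then total and rate.
# Header, "Total" and non-irq lines do not match and are skipped.
_LINE = re.compile(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def parse_vmstat_i(text):
    """Return {irq: (devices, total, rate)} for each irq line in text."""
    result = {}
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m:
            irq, devices, total, rate = m.groups()
            result[irq] = (devices.split(), int(total), int(rate))
    return result


def irq_deltas(before, after):
    """Return {irq: increase in total} for irqs present in both snapshots."""
    a, b = parse_vmstat_i(before), parse_vmstat_i(after)
    return {irq: b[irq][1] - a[irq][1] for irq in a if irq in b}


if __name__ == "__main__":
    before = "irq24: bge0 ahd0 16826 2\nirq25: bge1 ahd1 1305665 157"
    # Hypothetical second snapshot; these totals are made up.
    after = "irq24: bge0 ahd0 17826 2\nirq25: bge1 ahd1 1305670 157"
    print(irq_deltas(before, after))
```

Taking one snapshot before provoking the swapping and one right after, then looking at the per-irq increase, shows which shared interrupt line actually accumulated counts during the test, independent of the long-run rate column.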