Date:      29 Jun 2006 22:25:18 +0200
From:      "Arno J. Klaassen" <arno@heho.snv.jussieu.fr>
To:        amd64@freebsd.org
Subject:   Re: SMP system not running SMP
Message-ID:  <wpu063n1zl.fsf@heho.labo>
In-Reply-To: <42450.192.168.0.10.1151509103.squirrel@webmail.sd73.bc.ca>
References:  <74DFB78C-4710-4DD2-A3DA-222BABAECE96@khera.org> <E1FvEDL-000A7v-BH@dilbert.firstcallgroup.co.uk> <20060627230716.44120c49.kgunders@teamcool.net> <42450.192.168.0.10.1151509103.squirrel@webmail.sd73.bc.ca>

"UEMURA (fka. MAENAKA) Tetsuya" <maenaka@pluto.dti.ne.jp> writes:

> Posted on Tue, 27 Jun 2006 15:06:51 +0100
> By default, FreeBSD couldn't start. Dumping the ahd state when probing
> the da and simply stopped. So I set the SCSI BIOS to restrict the device
> speed up to 80MB/s and the problem went away. After that, the machine
> runs flawlessly for 8 months.

I have a Tyan S2882 which I cannot keep up for more than a couple of
days under moderate load, and the symptoms seem related :

config :

 - tracking -stable 
 - 8G RAM 
 - latest BIOS
 - 3ware 9500S-12 with 1.1T data
 - RAID-1 MAXTOR ATLAS10K5_73WLS as system-disk on ahd0
 - doing nothing else than some test-scripts implying fairly
   moderate nfs-traffic (i.e. scripts via nfs, (rarely needed) data
   either on NFS or raid, scripts being CPU-intensive)


symptom :

 - system cold-boots fine (SMP dual opteron 248)
 - runs OK for a couple of minutes/hours/days
 - then total freeze; *never* a panic in 9 months
 - warm reset either does not detect da0 or indeed dumps ahd state
   when probing it
 - even cold reboot sometimes has to be repeated once or twice in order
   to redetect correctly da0


have tried :

 - changed scsi-cables and termination three times : no deal
 - decreased device speed to 80MHz : seems to eliminate the "minutes"
   part from "runs OK for a couple of minutes/hours/days" ...


observations :

 - this week I downloaded the latest manual from Tyan and came across
   the following jumper setting (dunno if it was in the original
   version or whether I overlooked it; the printed manual is at the
   customer's site) :

    "Set PCI-X Bridge A (PCI 3 & PCI 4 & SCSI7902 & BCM5704) to operate at
     a maximum 66MHz;
     Note: Due to the PCI-X specifications it will be necessary to set
     this bus to 66MHz if a 133/100MHz PCI-X card is
     added to this bus."

   Since I do have a 100MHz PCI-X card (3ware) I set this jumper;
   the system has been up for three days now. I cannot yet confirm that
   this was the culprit, but other AMD811X based systems might have the
   same issue.

 - this board has dual ahd and dual bge :

   vmstat -i (I just rebooted for an upgrade -stable + linux_base) :

    irq24: bge0 ahd0                   16826          2
    irq25: bge1 ahd1                 1305665        157

   The network is attached to bge1 and the disk is on ahd0.  Interestingly,
   when I provoke insane swapping, it is the "irq25:" process which
   consumes 50-90% of CPU time, yet when I stop the program provoking
   the swapping and rerun vmstat -i, it reports slightly increased irq24
   activity but no noticeable change in irq25 activity ...
   (I put hint.ahd.1.disabled="1" in /boot/loader.conf since
    I do not need ahd1, but that does not seem to have any effect.)
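
For anyone who wants to repeat the before/after comparison of vmstat -i
without eyeballing the numbers, here is a rough sketch in Python. It only
assumes the "irqNN: devices  total  rate" line layout shown above; the
function names are made up for this example, and the "after" snapshot is
invented purely for illustration, not measured on this box.

```python
import re

# Matches lines such as "irq25: bge1 ahd1     1305665     157".
# Assumption: interrupt name, device list, total count, rate, in that order.
LINE_RE = re.compile(r"^\s*(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def parse_vmstat_i(text):
    """Return {irq: (devices, total)} parsed from `vmstat -i` output."""
    counts = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            irq, devices, total, _rate = m.groups()
            counts[irq] = (devices, int(total))
    return counts


def interrupt_deltas(before, after):
    """Return {irq: increase in total count} between two snapshots."""
    b = parse_vmstat_i(before)
    a = parse_vmstat_i(after)
    return {irq: a[irq][1] - b[irq][1] for irq in a if irq in b}


if __name__ == "__main__":
    # "before" is the output quoted in this mail.
    before = """
    irq24: bge0 ahd0                   16826          2
    irq25: bge1 ahd1                 1305665        157
    """
    # "after" is a hypothetical second snapshot, for illustration only.
    after = """
    irq24: bge0 ahd0                   17826          2
    irq25: bge1 ahd1                 1305700        157
    """
    for irq, delta in sorted(interrupt_deltas(before, after).items()):
        print(irq, delta)
```

In practice one would save two real snapshots (vmstat -i > before.txt,
run the swapping test, vmstat -i > after.txt) and feed the file contents
to interrupt_deltas().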

FYI.

I can test on this box for a couple more weeks, feel free to
contact me for more information.

Thanx, regards, Arno

-- 

  Arno J. Klaassen

  SCITO S.A.
  8 rue des Haies
  F-75020 Paris, France
  http://scito.com


