Date: Mon, 16 Oct 2000 11:47:23 +0200
From: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
To: dbhague@allstor-sw.co.uk
Cc: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>, freebsd-scsi@FreeBSD.org, freebsd-fs@FreeBSD.org, smcintyre@allstor-sw.co.uk
Subject: Re: Stressed SCSI subsystem locks up the system
Message-ID: <20001016114723.A22193@curry.mchp.siemens.de>
In-Reply-To: <8025697A.00340E6C.00@mail.plasmon.co.uk>; from dbhague@allstor-sw.co.uk on Mon, Oct 16, 2000 at 10:28:34AM +0100
References: <8025697A.00340E6C.00@mail.plasmon.co.uk>
On Mon, 16-Oct-2000 at 10:28:34 +0100, dbhague@allstor-sw.co.uk wrote:
> Andre,
> What were your SCSI errors?

Oct 13 10:05:28 <kern.crit> server /ktry: (da2:ahc0:0:2:0): data overrun detected in Data-out phase. Tag == 0xe.
Oct 13 10:09:40 <kern.crit> server /ktry: (da2:ahc0:0:2:0): Have seen Data Phase. Length = 65536. NumSGs = 16.

These appeared with the 3940AU. Since replacing it with two 2940s,
everything has worked fine for several days now, so I am keeping it
this way. Once Justin makes some driver changes, I will try my 3940AU
again...

	-Andre

> We have one system that has now run for five days without failure. Today we
> will start to deconstruct this unit; any advice would be welcome.
>
> We also ran five systems over the weekend, and all but one, the IDE system,
> failed. These were:
>
> A repeat of the passing system above, which failed with
>     Bad blocks 135666304, inode 5142534
> then, 6 seconds later,
>     Bad blocks 135666304, inode 5634466
> and then panic: ffs_blkfree: freeing free frag (this is on the /RAID partition).
> A test run against an IDE disk (still running, but slowly)
> A test run against a SCSI disk
> A test run using a Symbios dual SCSI card
> A test running FreeBSD 3.0
>
> Two of the above tests have got stuck in iowait, for example:
>
> root 451 0.0 0.1 368 172 p0 D Fri06PM 0:17.77 rm -rf /RAID/5
> root 454 0.0 0.2 368 196 p0 D Fri06PM 0:17.85 rm -rf /RAID/7
> root 455 0.0 0.2 368 196 p0 D Fri06PM 0:17.42 rm -rf /RAID/1
> root 457 0.0 0.2 368 196 p0 D Fri06PM 0:17.44 rm -rf /RAID/2
> root 459 0.0 0.2 368 196 p0 D Fri06PM 0:17.71 rm -rf /RAID/6
> root 461 0.0 0.2 368 196 p0 D Fri06PM 0:17.10 rm -rf /RAID/4
> root 463 0.0 0.2 368 196 p0 D Fri06PM 0:17.56 rm -rf /RAID/3
>
> Just a few minutes ago cron started to die with a signal 10; we don't think
> this is relevant, but...
> Oct 16 09:55:02 birch /kernel: pid 3551 (cron), uid 0: exited on signal 10 (core dumped)
> Oct 16 10:00:00 birch /kernel: pid 3555 (cron), uid 0: exited on signal 10 (core dumped)
> Oct 16 10:00:00 birch /kernel: pid 3556 (cron), uid 0: exited on signal 10 (core dumped)
> Oct 16 10:05:01 birch /kernel: pid 3558 (cron), uid 0: exited on signal 10 (core dumped)
> Oct 16 10:10:00 birch /kernel: pid 3560 (cron), uid 0: exited on signal 10 (core dumped)
> Oct 16 10:15:00 birch /kernel: pid 3562 (cron), uid 0: exited on signal 10 (core dumped)
> Oct 16 10:20:00 birch /kernel: pid 3564 (cron), uid 0: exited on signal 10 (core dumped)
>
> Regards, Dave
> --
> Micro$oft: Which virus will you get today?

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message
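[Archive note: the `rm` processes quoted above are parked in state "D",
which on FreeBSD marks a process blocked uninterruptibly in the kernel,
typically waiting on disk I/O; processes stuck there for days point at a
wedged SCSI subsystem rather than a slow disk. A minimal sketch of how
such processes can be filtered out of a ps listing follows; the sample
input is hypothetical, and on a live box the same awk filter would be
applied to real `ps ax -o stat,pid,command` output.]

```shell
# Keep the header line plus any row whose STAT column begins with "D"
# (uninterruptible disk wait). Sample input mimics the listing above.
printf '%s\n' \
  'STAT  PID COMMAND' \
  'D     451 rm -rf /RAID/5' \
  'Ss      1 /sbin/init' \
  'D     454 rm -rf /RAID/7' \
| awk 'NR == 1 || $1 ~ /^D/'
```

The `^D` anchor matters: STAT can carry trailing modifier letters (e.g.
"Ds"), so matching the first character rather than the whole field still
catches those.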