Date:      Thu, 9 Sep 2004 09:00:36 -0400
From:      "Carroll Kong" <me@carrollkong.com>
To:        "Jason Thomson" <jason.thomson@mintel.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: 3ware 7506,  FreeBSD 4.x,  Maxtor Disks & SMART Problems.
Message-ID:  <1a3a01c4966d$0332f790$0200a8c0@athena>
References:  <40CD80F1.6020107@mintel.com><52270.24.8.51.173.1087217572.squirrel@webmail.liquidneon.com><40CDAA9A.1090507@mintel.com> <17b501c495d9$84af2850$0200a8c0@athena> <19db01c49613$b3054ab0$0200a8c0@athena> <41402CA2.90801@mintel.com>

I really appreciate your timely feedback.  In a weird way, it's good to know
you found the problem because we have been hunting high and low for a reason
for the crashes and it's scary when there are so many possibilities and zero
panics.  Of course you get a lot of those "it works fine for me, it's
[insert every other possibility] instead".

I am a bit frightened to try to reproduce the problem, but I think we will
do it when we have a chance, to confirm that we have the same problem.
Once again, thanks, this will help a lot in confirming it's our disks.

I do not get the sector repair occurred error, but then again I am running a
very old firmware compared to you.  I would like to believe that my firmware
is probably not throwing up as many diagnostics.

> I think that the disk I have on port 3 is flakey.  I could replace the
> disk,  but I'm waiting until 3ware get back to me / issue a fix so I can
> have some reasonable idea that the problem is fixed.

Yeah, I get a feeling that the handful of people who do have Maxtor disks on
3ware controllers are going to eventually get bitten by this.  Even though you
'resolved' 2 of your systems with disk fixes, I bet once the disks start
getting some minor errors you will probably experience the problem again.

I do typically get a lot of these errors across every port
twe0: AEN: <twe0: port 0: ATA UDMA downgrade>

But I spoke to Mike Smith about that a while ago and he said it wasn't a big
deal or that if anything it's a cabling issue.  The cabling issue would have
to be the IDE backplane we are using which only supports UDMA 66 or
something and the disks are UDMA 100 I believe?  I have been getting those
errors intermittently for a while, so I doubt that's it.
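
For anyone who wants to check whether these AENs cluster on one port, here is
a minimal sketch that tallies them from logged console output.  It is not
something from this thread: the function name is made up, and the pattern is
inferred only from the two AEN lines quoted in this message, so other AEN
formats may not match.

```python
import re
from collections import Counter

# Pattern inferred from the two AEN lines quoted in this thread, e.g.
#   twe0: AEN: <twe0: port 3: sector repair occurred>
# Other AEN formats (assumption) may exist and would not be counted.
AEN_RE = re.compile(r"(twe\d+): AEN: <twe\d+: port (\d+): ([^>]+)>")


def tally_aens(lines):
    """Count AEN events keyed by (controller, port, message).

    `tally_aens` is a hypothetical helper name, not part of any tool.
    """
    counts = Counter()
    for line in lines:
        match = AEN_RE.search(line)
        if match:
            controller = match.group(1)
            port = int(match.group(2))
            message = match.group(3).strip()
            counts[(controller, port, message)] += 1
    return counts
```

Passing it the lines of whatever file the console messages end up in (often
/var/log/messages, but that depends on the syslog setup) gives a count per
controller, port and message, which makes it easier to see whether one port
stands out.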

And finally, no matter what, (and as others have posted before) even if a
disk was failing to some degree, the array should never hard fail like that.
That somewhat defeats the purpose of a RAID.

Well I think this is good enough for me to warrant a swap out of the disks.
We are on a low budget but need to be stable again as soon as possible.

Thanks again, I really appreciate your help in narrowing this issue down.



- Carroll Kong
----- Original Message -----
From: "Jason Thomson" <jason.thomson@mintel.com>
To: "Carroll Kong" <me@carrollkong.com>
Cc: <freebsd-stable@freebsd.org>
Sent: Thursday, September 09, 2004 6:12 AM
Subject: Re: 3ware 7506, FreeBSD 4.x, Maxtor Disks & SMART Problems.


> Hi Carroll,
>
> I posted the original problem report you referred to earlier.
>
> 3ware are looking into the problem.  It looks like it's a problem with
> 3ware's firmware (perhaps related to some anomaly in the way that Maxtor
> disks behave).
>
> It would appear that it's only a problem when the disk has errors.
>
> On one machine,  I can reproduce this problem by dd'ing from the RAID5
> array:
>
> dd if=/dev/twed0s1h iseek=137510 bs=1m of=/dev/null
>
> On that machine the lockup will *always* be preceded by the
> following message on the console:
>
> twe0: AEN: <twe0: port 3: sector repair occurred>
>
> Do you have any error messages on the console?
>
>
> I think that the disk I have on port 3 is flakey.  I could replace the
> disk,  but I'm waiting until 3ware get back to me / issue a fix so I can
> have some reasonable idea that the problem is fixed.
>
> 3ware *have* been looking into this problem,  and I think have
> established that it's a firmware rather than a driver issue (it occurs
> with other OSes as well apparently).  I don't know how close they are to
> being able to fix this.
>
> We buy all our new machines with Western Digital disks (and 3ware
> controllers).  No problems yet (and we have about 10 of them - more than
> we have with Maxtor disks).
>
> (BTW I have established over a period of months that this problem
> existed with various versions of the driver and firmware dating back to
> 2003.  It still exists with the latest FBSD driver and 3ware firmware:
> FE7X 1.05.00.068)
>
> Carroll Kong wrote:
> > I tried using the SmartD 5.33 (CVS).  It appeared to work, but did not pick
> > up anything in the next crash.  I noticed some temperature changes, and I
> > plan on running some difference tests, but nothing out of the ordinary.
> >
> > This time the crash hung a lot of httpds and got them stuck into the D
> > state.  We had something like this happen before ... but now that I think
> > about it, it matches the experience of Jason almost perfectly.
> >
> > Upon lockup, sometimes we still have partial control of the system.  The
> > processes waiting on the 3ware card cannot be killed.  The web sites that
> > are still in cache are servable.
> >
> > It occurred when a big I/O request was going through (along with the normal
> > web traffic).  The odd thing is, it's not a function of raw I/O, since our
> > definition of big I/O was simply 3-4MB/sec according to iostat.  It seems
> > over time it just... well it just goes kaput if you push it a bit hard after
> > a long days run of non-stop I/Os.
> >
> > The initial fsck we do runs at 17MB/sec at far more transactions per second.
> > Anyway, I am convinced the problem is somehow related to the 3ware system
> > (either the disks, the controller or something).  Originally I was looking
> > at other possibilities, but seeing people's experiences here, and a
> > colleague of mine's experience, something fishy is going on.
> >
> > I am leaning towards a full hdd swap, seems like I will have to replace one
> > disk at a time and let it rebuild slowly to eventually swap out all the
> > disks.  I am able to get this problem to occur faster and faster now,
> > unfortunately it is a production box and we would much rather it not.  And I
> > am going to switch off to Seagate instead of Maxtor.  Despite using
> > 3ware+maxtor on other machines here, (but they have considerably less load),
> > it's just too much of a coincidence that 3 different people including myself
> > have had problems with 3ware+maxtor whereas you can easily find that many
> > and more that have it working fine with another vendor.
> >
> >
> >
> > - Carroll Kong
> > ----- Original Message -----
> > From: "Carroll Kong" <me@carrollkong.com>
> > To: "Jason Thomson" <jason.thomson@mintel.com>; <so14k@so14k.com>
> > Cc: <vkayshap@amcc.com>; <freebsd-stable@freebsd.org>
> > Sent: Wednesday, September 08, 2004 3:24 PM
> > Subject: Re: 3ware 7506, FreeBSD 4.x, Maxtor Disks & SMART Problems.
> >
> >
> >
> >>Hi, in reference to this
> >>http://lists.freebsd.org/pipermail/freebsd-stable/2004-June/007828.html
> >>
> >>I have a FreeBSD 4.10-p2 system, using a 7450 with 4xMAXTOR 6L080J4  (80
> >>gig) disks.
> >>
> >>Raid 5 setup.
> >>
> >>      Monitor version: ME7X 1.01.00.035
> >>      Firmware version: FE7X 1.05.00.036
> >>      BIOS version: BE7X 1.08.00.044
> >>
> >>
> >>(Firmware 7.5.3 basically).
> >>
> >>I am also having the same problems you are having.  Randomly under heavy I/O
> >>the system will just halt I/O requests.  No error messages on the console,
> >>it would just start to hang and halt completely.  (no kernel panics at all).
> >>
> >>I believe I have the same problem you do.  Were you able to resolve the
> >>issue or narrow it down?  The machine is not local, but I am curious if you
> >>did resolve it, what version of FreeBSD did you have?  What firmware?  And
> >>did you have to do the powermax testing on all the disks or not?
> >>
> >>I cannot easily do the powermax testing yet, and my firmware is older and I
> >>am still running into this problem (which should have all the twe driver
> >>fixes).
> >>
> >>I tried using "Smartmontools" to verify if the Maxtor disks are okay since
> >>they only work for Linux + 3Ware.
> >>
> >>Thanks in advance!
> >>
> >>
> >>
> >>- Carroll Kong
> >>
> >>
> >>
> >>_______________________________________________
> >>freebsd-stable@freebsd.org mailing list
> >>http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> >>To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
> >>
> >
> >
> >
> >
>


