Date:      Fri, 28 Oct 2011 00:19:26 +0100
From:      "Pegasus Mc Cleaft" <ken@mthelicon.com>
To:        "'Alexander Kabaev'" <kabaev@gmail.com>, "'C. P. Ghost'" <cpghost@cordula.ws>
Cc:        'Alexey Shuvaev' <shuvaev@physik.uni-wuerzburg.de>, freebsd-current@freebsd.org
Subject:   RE: Panics after AHCI timeouts
Message-ID:  <005e01cc94fe$dfbe3390$9f3a9ab0$@com>
In-Reply-To: <20111027185957.54ece0ad@kan.dyndns.org>
References:  <20111008201456.GA3529@lexx.ifp.tuwien.ac.at>	<20111017190027.GA9873@lexx.ifp.tuwien.ac.at>	<CAJ-Vmokbm5z3GPbKjc6_o0_Ea6u_b7twDu=xLeYpORiUpp6Z=Q@mail.gmail.com>	<20111018131353.GA83797@lexx.ifp.tuwien.ac.at>	<649509EEAEBA42D4A3DCC1FDF5DA72E5@multiplay.co.uk>	<20111025202755.4243ae74@kan.dyndns.org>	<CADGWnjX95yMEO06o%2B8xUho4Yc2-R9S=GJTWkGqvfbzDMHqCiGw@mail.gmail.com> <20111027185957.54ece0ad@kan.dyndns.org>

>> If it's only one process, the machine (usually) doesn't hang, even 
>> when that process is copying big files back and forth for a long 
>> period of time (it's a backup process). But interleave that process 
>> with another one accessing the same disk, and poof!, almost 
>> immediately AHCI timeouts occur. Very strange... Maybe a race 
>> condition of some sort after all?
>> 
>
>No, I cannot say there is any specific correlation to IO load of the
>machine; the timeouts I saw happen randomly and almost always seem to
>happen as system uptime crosses the two-week boundary. I am suspecting
>Samsung firmware at this point.

Now that's interesting, as I use a mixture of Samsung, WD, and Seagate, and
I do believe the Samsungs tend to do this more. I see AHCI timeouts from
time to time on my machine (10-Current AMD64), but normally only when I am
doing something like a scrub. The machine has never panicked as a result of
this; it normally just FAULTS the drive in the pool and keeps on going. At
that point, doing a camcontrol rescan all does not bring the drive back into
existence (it will normally just hang on that bus for 15-20 seconds and then
carry on without identifying a drive). I have to pull the drive, let it spin
down, and then reinsert it. Once it's reinserted, the drive comes back on the
bus and I can online it again.

The weird thing is this: for me, it only ever seems to happen when I am
writing to the pool/disk. Pure reads don't seem to bother it.

I don't really know at this point if the SATA ports have gone wonky on the
motherboard, or if the processor on the HD has crashed. I almost tend to
believe it's the drive, because camcontrol stops on that port almost as if it
knows there is a link there but can't talk to it.

Peg




