Date:      Tue, 2 Mar 2010 23:52:54 -0800
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        freebsd-stable@freebsd.org
Subject:   Re: ahcich timeouts, only with ahci, not with ataahci
Message-ID:  <20100303075254.GA47119@icarus.home.lan>
In-Reply-To: <4B8E1489.2070306@omnilan.de>
References:  <1266934981.00222684.1266922202@10.7.7.3> <4B83EFD4.8050403@FreeBSD.org> <4B8E1489.2070306@omnilan.de>

On Wed, Mar 03, 2010 at 08:49:29AM +0100, Harald Schmalzbauer wrote:
> Alexander Motin schrieb am 23.02.2010 16:10 (localtime):
> >Harald Schmalzbauer wrote:
> >>My machine frequently locks up with ahcichX timeouts:
> >>ahcich2: Timeout on slot 0
> >>ahcich2: is 00000000 cs 00000001 ss 00000000 rs 00000001 tfd c0 serr 00000000
> >>ahcich2: Timeout on slot 8
> >>ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd c0 serr 00000000
> >>ahcich2: Timeout on slot 8
> >>ahcich2: is 00000000 cs fffff07f ss ffffff7f rs ffffff7f tfd c0 serr 00000000
> >>...
> >
> >Since IS (interrupt status) is zero and `rs == cs | ss` (the running
> >command bitmasks in the driver and the hardware), the controller is not
> >reporting command completion. Given the TFD status of 0xc0 with the BUSY
> >bit set, I suspect that either the disk got stuck processing a command
> >for some reason, or the controller missed the command completion status.
> >
> >Did you notice a 30-second pause (the default ATA timeout) before the
> >timeout message was printed? I just want to be sure the driver waited
> >long enough before giving up.
> >
> >>This happens when a backup over GbE exceeds what ZFS and the disks can
> >>handle. I reduced vfs.zfs.txg.timeout to 1 to keep the machine from
> >>locking up almost immediately, but it still happens.
> >>When I use ataahci instead of ahci (the old driver, if I understand
> >>things correctly), I also see the ZFS burst-write congestion, but it
> >>doesn't lead to controller timeouts that block the machine.
> >>
> >>Sometimes the machine recovers from the disk lock, but most often I have
> >>to reboot.
> >
> >What does it look like when it doesn't? Can you send me the full log messages?
> 
> Hello, this morning I had a stall, but the machine recovered after
> about one minute. Here's what I got from the kernel:
> ahcich2: Timeout on slot 29
> ahcich2: is 00000000 cs 00000003 ss e0000003 rs e0000003 tfd c0 serr 00000000
> em1: watchdog timeout -- resetting
> em1: watchdog timeout -- resetting
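
Alexander's reading of the registers can be re-checked by hand. A minimal sketch, assuming the status line has been re-joined onto one line in the field order shown above: decode_ahcich is a hypothetical POSIX sh helper (not part of FreeBSD or ahci(4)) that tests whether IS is zero, whether RS equals CS | SS, and which ATA status bits the TFD byte carries (0x80 = BSY, 0x40 = DRDY).

```shell
#!/bin/sh
# Minimal sketch: redo the register arithmetic for one ahcich timeout line.
# Assumption: the message is on a single line in exactly this field order:
#   ahcichN: is XXXXXXXX cs XXXXXXXX ss XXXXXXXX rs XXXXXXXX tfd XX serr XXXXXXXX
# decode_ahcich is a hypothetical helper, not part of FreeBSD.

decode_ahcich() {
    # Word-split the message into positional parameters:
    # $3=is, $5=cs, $7=ss, $9=rs, ${11}=tfd (hex without 0x prefix).
    # serr (${13}) is not examined here.
    set -- $1
    is=$(( 0x$3 ))
    cs=$(( 0x$5 ))
    ss=$(( 0x$7 ))
    rs=$(( 0x$9 ))
    tfd=$(( 0x${11} ))

    # IS == 0: the controller has not raised an interrupt (no completion).
    [ "$is" -eq 0 ] && irq=none || irq=pending

    # rs == (cs | ss): the driver's running set matches the commands the
    # hardware still reports as issued/active.
    [ "$rs" -eq $(( cs | ss )) ] && match=yes || match=no

    # ATA status byte in TFD: bit 7 (0x80) is BSY, bit 6 (0x40) is DRDY.
    [ $(( tfd & 0x80 )) -ne 0 ] && bsy=yes || bsy=no
    [ $(( tfd & 0x40 )) -ne 0 ] && drdy=yes || drdy=no

    echo "irq=$irq rs==cs|ss=$match BSY=$bsy DRDY=$drdy"
}

# The line from this morning's stall.
decode_ahcich "ahcich2: is 00000000 cs 00000003 ss e0000003 rs e0000003 tfd c0 serr 00000000"
# prints: irq=none rs==cs|ss=yes BSY=yes DRDY=yes
```

Applied to the line from this morning's stall, it gives the same picture as the earlier timeouts: no interrupt pending, the driver's running-command mask matching CS | SS, and the disk still reporting BSY.
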

Please provide the following output:

pciconf -lv
vmstat -i

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20100303075254.GA47119>