Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 14 Sep 2011 10:08:31 +0200
From:      Dennis Koegel <dk@neveragain.de>
To:        stable@freebsd.org
Subject:   System freeze: Adaptec (aac) timeouts (releng 8)
Message-ID:  <20110914080831.GB41431@neveragain.de>

next in thread | raw e-mail | index | archive | help
Cheers,

we have a reproducible system freeze due to Adaptec driver (aac) timeouts:

Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005ae4c0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005ac0e0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
Sep  3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005b0fa0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
<dozens more of these...>

Once this happens, the userland seems to be alive, but the controller is
completely dead. As soon as the disk subsystem is involved, any process
hangs forever (e.g. SSH crypto-exchange still happens, but a shell won't
even start anymore).

We observe the same issue on two systems of (mostly) identical spec, so
it's not a hardware issue.

Apparently this only happens under heavy disk i/o and high cpu load.
Notably high write throughput plus a 'zpool scrub' on a large
GELI-backed zpool usually triggers the problem after a few hours.
Without high activity, they run smooth for weeks.

Both systems are amd64 with an Adaptec 5805 controller and 16 disks (of
which two form a RAID-1 system volume (UFS), and the remaining 14 serve
as JBOD for a large zpool -- a total of 15 "aacd" devices).

Both were running 8.2R originally. I've taken them to 8-STABLE now and
also applied svn r222951 (where the MFC was forgotten, it seems), but
the problem remains.

Any help is greatly appreciated.

Thanks,
- D.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110914080831.GB41431>