Date:      Wed, 30 Jan 2013 23:51:45 +0200
From:      "Vladislav Prodan" <universite@ukr.net>
To:        "Beeblebrox" <zaphod@berentweb.com>
Cc:        freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Subject:   Re[2]: Re[2]: AHCI timeout when using ZFS + AIO + NCQ
Message-ID:  <87448.1359582705.624376220320202752@ffe17.ukr.net>
In-Reply-To: <1359317924363-5781425.post@n5.nabble.com>
References:  <70362.1359299605.3196836531757973504@ffe11.ukr.net> <16B555759C2041ED8185DF478193A59D@multiplay.co.uk> <917933DB5C9A490D93A739058C2507A1@multiplay.co.uk> <93308.1359297551.14145052969567453184@ffe15.ukr.net> <13391.1359029978.3957795939058384896@ffe16.ukr.net> <70578.1359313319.18126575192049975296@ffe16.ukr.net> <221B307551154F489452F89E304CA5F7@multiplay.co.uk> <1359317924363-5781425.post@n5.nabble.com>




> I once ran into a very severe AHCI timeout problem. After months of trying to
> figure it out and insane "Hardware_ECC_Recovered" error values, I found that
> the error was with the power connector plug / sata HDD interface. All errors
> disappeared after replacing that cable. Since you have error on more than 1
> HDD, I suggest:
> 1. Check smartctl output for each AND all HDD
> 2. Check whether your power supply unit is still healthy or if it is
> supplying inconsistent power.
> 3. Check the main power supply line and whether it shows any voltage
> fluctuations or if there is a new heavy consumer of amps on the same power
> line as the server is plugged to.
> 
> 

I deliberately chose a different server, one with a different chipset and no previous HDD problems.

I added AHCI support to the kernel configuration:
device ahci # AHCI-compatible SATA controllers

And now, after 2.5 days, one HDD has dropped off.

[3:14]beastie:root->/root# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        tank                     DEGRADED     0     0     0
          mirror-0               ONLINE       0     0     0
            gpt/disk0            ONLINE       0     0     0
            gpt/disk2            ONLINE       0     0     0
          mirror-1               DEGRADED     0     0     0
            gpt/disk1            ONLINE       0     0     0
            4931885954389536913  REMOVED      0     0     0  was /dev/gpt/disk3

errors: No known data errors


Jan 30 09:49:28 beastie kernel: ahcich3: Timeout on slot 29 port 0
Jan 30 09:49:28 beastie kernel: ahcich3: is 00000000 cs 20000000 ss 00000000 rs 20000000 tfd c0 serr 00000000 cmd 0004dd17
Jan 30 09:49:28 beastie kernel: (ada3:ahcich3:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Jan 30 09:49:28 beastie kernel: (ada3:ahcich3:0:0:0): CAM status: Command timeout
Jan 30 09:49:28 beastie kernel: (ada3:ahcich3:0:0:0): Retrying command
Jan 30 09:51:31 beastie kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Jan 30 09:51:31 beastie kernel: ahcich3: Timeout on slot 29 port 0
Jan 30 09:51:31 beastie kernel: ahcich3: is 00000000 cs 20000000 ss 00000000 rs 20000000 tfd 80 serr 00000000 cmd 0004dd17
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): Error 5, Retry was blocked
Jan 30 09:51:31 beastie kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Jan 30 09:51:31 beastie kernel: ahcich3: Timeout on slot 29 port 0
Jan 30 09:51:31 beastie kernel: ahcich3: is 00000000 cs 00000000 ss 00000000 rs 20000000 tfd 58 serr 00000000 cmd 0004dd17
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): Error 5, Retry was blocked
Jan 30 09:51:31 beastie kernel: (ada3:ahcich3:0:0:0): lost device
Jan 30 09:51:31 beastie kernel: (pass3:ahcich3:0:0:0): passdevgonecb: devfs entry is gone
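
To help read the register dumps in the log above, here is a minimal Python sketch that decodes the "cs" and "tfd" fields. It assumes that "cs" is a bitmask of outstanding command slots and that the low byte of "tfd" is the standard ATA status register; the helper names (decode_status, slots, parse_line) are hypothetical and are not part of any FreeBSD tool.

```python
import re

# ATA status register bits, assumed to be the low byte of the "tfd" field.
ATA_STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x10: "DSC",   # seek complete (legacy bit)
    0x08: "DRQ",   # data request
    0x01: "ERR",   # error
}

# Matches the "cs" and "tfd" fields of an ahcich register dump line.
LINE_RE = re.compile(r"\bcs (?P<cs>[0-9a-f]{8})\b.*?\btfd (?P<tfd>[0-9a-f]+)\b")


def decode_status(tfd):
    """Return the names of the status bits set in the low byte of tfd."""
    status = tfd & 0xFF
    return [name for bit, name in ATA_STATUS_BITS.items() if status & bit]


def slots(mask):
    """Return the slot numbers whose bits are set in a 32-bit slot mask."""
    return [i for i in range(32) if mask & (1 << i)]


def parse_line(line):
    """Decode one register dump line, or return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "cs_slots": slots(int(m.group("cs"), 16)),
        "status": decode_status(int(m.group("tfd"), 16)),
    }


if __name__ == "__main__":
    sample = ("ahcich3: is 00000000 cs 20000000 ss 00000000 rs 20000000 "
              "tfd c0 serr 00000000 cmd 0004dd17")
    print(parse_line(sample))
```

Under these assumptions, cs 20000000 corresponds to slot 29, which matches the "Timeout on slot 29" messages, and tfd 80 in the reset lines decodes to BSY alone, i.e. the drive never left the busy state within the 31000 ms reset window.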


-- 
Vladislav V. Prodan            
System & Network Administrator 
http://support.od.ua           
+380 67 4584408, +380 99 4060508
VVP88-RIPE



