Date:      Fri, 18 Dec 2009 22:11:52 +1300
From:      Phil Murray <pmurray@nevada.net.nz>
To:        Phil Murray <pmurray@nevada.net.nz>
Cc:        "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, Alexander Zagrebin <alexz@visp.ru>
Subject:   Re: 8.0-RELEASE: disk IO temporarily hangs up (ZFS or ATA related problem)
Message-ID:  <6FAA390A-1E40-4D7A-AAD5-DC72578CE974@nevada.net.nz>
In-Reply-To: <4C1C2598-4157-4B04-8DB8-C84F353AB8B8@nevada.net.nz>
References:  <39309F560B98453EBB9AEA0F29D9D80E@vosz.local> <4B2A341C.5000802@clearchain.com> <6D3B0162A2134CAEA9F4DF5BC03707AA@vosz.local> <4C1C2598-4157-4B04-8DB8-C84F353AB8B8@nevada.net.nz>


On 18/12/2009, at 9:39 PM, Phil Murray wrote:

>
>
> On 18/12/2009, at 9:15 PM, "Alexander Zagrebin" <alexz@visp.ru> wrote:
>
>> Big thanks for your reply!
>>
>>>> I use onboard ICH7 SATA controller with two disks attached:
>>>>
>>>> atapci1: <Intel ICH7 SATA300 controller> port
>>>> 0x30c8-0x30cf,0x30ec-0x30ef,0x30c0-0x30c7,0x30e8-0x30eb,0x30a0-0x30af irq 19
>>>> at device 31.2 on pci0
>>>> atapci1: [ITHREAD]
>>>> ata2: <ATA channel 0> on atapci1
>>>> ata2: [ITHREAD]
>>>> ata3: <ATA channel 1> on atapci1
>>>> ata3: [ITHREAD]
>>>> ad4: 1430799MB <Seagate ST31500541AS CC34> at ata2-master SATA150
>>>> ad6: 1430799MB <WDC WD15EADS-00P8B0 01.00A01> at ata3-master SATA150
>>>>
>>>> The disks are used for a mirrored ZFS pool.
>>>> I have noticed that the system periodically locks up on disk operations.
>>>> After approx. 10 min of very slow disk I/O (several KB/s), the speed of
>>>> disk operations returns to normal.
>>>> gstat has shown that the problem is in ad6.
>>>> For example, here is filtered output of iostat -x 1:
>>>>
>>>>                        extended device statistics
>>>> device     r/s   w/s    kr/s    kw/s wait svc_t  %b
>>>> ad6      985.1   0.0  5093.9     0.0    0   0.2  23
>>>> ad6      761.8   0.0  9801.3     0.0    1   0.4  31
>>>> ad6      698.7   0.0  9215.1     0.0    0   0.4  30
>>>> ad6      434.2 513.9  5903.1 13658.3   48  10.2  55
>>>> ad6        3.0 762.8   191.2 28732.3    0  57.6  99
>>>> ad6       10.0   4.0   163.9     4.0    1   1.6   2
>>>>
>>>> Before this line the operations are normal.
>>>> Then the behaviour of ad6 changes (note the high average access time
>>>> and the "busy" percentage significantly greater than 100):
>>>>
>>>> ad6        0.0   0.0     0.0     0.0    1   0.0   0
>>>> ad6        1.0   0.0     0.5     0.0    1 1798.3 179
>>>> ad6        1.0   0.0     1.5     0.0    1 1775.4 177
>>>> ad6        0.0   0.0     0.0     0.0    1   0.0   0
>>>> ad6       10.0   0.0    75.2     0.0    1 180.3 180
>>>> ad6        0.0   0.0     0.0     0.0    1   0.0   0
>>>> ad6        1.0   0.0     2.0     0.0    1 1786.7 178
>>>> ad6        0.0   0.0     0.0     0.0    1   0.0   0
>>>>
>>>> And so on for about 10 minutes.
>>>> Then the disk I/O returns to normal:
>>>>
>>>> ad6      139.4   0.0  8860.5     0.0    1   4.4  61
>>>> ad6      167.3   0.0 10528.7     0.0    1   3.3  55
>>>> ad6       60.8 411.5  3707.6  8574.8    1  19.6  87
>>>> ad6      163.4   0.0 10334.9     0.0    1   4.4  72
>>>> ad6      157.4   0.0  9770.7     0.0    1   5.0  78
>>>> ad6      108.5   0.0  6886.8     0.0    0   3.9  43
>>>>
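
(As an aside: a per-device view like the listings above can be produced with
something along these lines - only a sketch, assuming ad6 is the device of
interest:)

  # sample iostat once per second, keeping the header and the ad6 rows
  iostat -x 1 | egrep 'device|^ad6'

  # or watch just ad6 in gstat
  gstat -f '^ad6$'
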
>>>> There are no ATA error messages either in the system log or on the
>>>> console.
>>>> The manufacturer's diagnostic test passes on ad6 without any errors.
>>>> ad6 also contains a swap partition.
>>>> I have tried to run several (10..20) instances of dd, reading and
>>>> writing data from and to the swap partition simultaneously, but this
>>>> did not trigger the lockup.
>>>> So it is probable that this problem is ZFS related.
>>>>
>>>> I have been forced to switch ad6 to the offline state... :(
>>>>
>>>> Any suggestions on this problem?
>>>>
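
(A stress test like the one described above - several concurrent dd readers
and writers hitting the swap partition - might look roughly like this; the
partition name is only an example, and swap must be disabled on it first,
since the writes destroy its contents:)

  # example partition name; adjust to the actual swap slice on ad6
  swapoff /dev/ad6s1b

  # start 10 reader/writer pairs in the background, then wait for them
  for i in $(jot 10); do
      dd if=/dev/ad6s1b of=/dev/null bs=1m &
      dd if=/dev/zero of=/dev/ad6s1b bs=1m count=1000 &
  done
  wait
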
>>> I have also been experiencing the same problem with a different
>>> disk/controller (via mpt on a VMware machine). During the same period
>>> I notice that system CPU usage hits 80+% and top shows the zfskern
>>> process being the main culprit. At the same time I've seen the
>>> kstat.zfs.misc.arcstats.memory_throttle_count sysctl rising. The ARC
>>> is also normally close to the arc_max limit.
>>
>> My case differs:
>> 1. CPU usage is near 0%
>> 2. ZFS's sysctls don't change significantly during the
>>  "normal operation" -> "lockup" -> "normal" transition
>> 3. ARC size is far from its limits,
>> kstat.zfs.misc.arcstats.memory_throttle_count: 0
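
(The figures above can be checked while a lockup is in progress using the
sysctls already named in this thread - a minimal sketch:)

  # current ARC size, the configured limit, and the throttle counter
  sysctl kstat.zfs.misc.arcstats.size vfs.zfs.arc_max \
         kstat.zfs.misc.arcstats.memory_throttle_count
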
>>
>> Here are my actions, observations and conclusions:
>> 1. I have tried to swap the disks between SATA channels.
>>  Nothing has changed - the problem is still on the WD15EADS, although
>>  it became ad4.
>>  So the issue isn't in the south bridge, SATA cables and so on.
>> 2. I have tried to detach ad6 from the pool, zero its system area, and
>>  reattach it again.
>>  Of course, resilvering started. During resilvering, 250 GB was copied
>>  without lockups or delays. While resilvering, I periodically loaded
>>  the drive with read operations (dd if=/dev/ad6 of=/dev/null ...).
>>  But after resilvering and several minutes of normal mirror operation,
>>  the lockups appeared again.
>>  So the drive seems to be OK and we have a software problem?
>> 3. I have noticed that the lockups often happen during PostgreSQL
>>  activity.
>>  PostgreSQL often uses sync writes, so I have tried to disable the ZIL.
>>  No success.
>> 4. The "IDE LED" is constantly on during lockups,
>>  so these really are read/write delays.
>> 5. I see two variants of zfskern's state:
>>  a) it is constantly in vgeom:io
>>  b) it is either in the zio->io_ state (when active), or in tx->tx_s
>>     (when idle).
>>     During lockups it is mostly in zio->io_.
>>  What is the difference between vgeom:io and zio->io_/tx->tx_s?
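
(Regarding points 2 and 3 in the list above: on 8.0-era ZFS the reattach and
the ZIL experiment would typically look something like the sketch below - the
pool name "tank" is only an example, and vfs.zfs.zil_disable is assumed to be
the tunable used here:)

  # drop ad6 from the mirror, then reattach it to trigger a resilver
  zpool detach tank ad6
  zpool attach tank ad4 ad6

  # disable the ZIL via /boot/loader.conf (takes effect after a reboot)
  vfs.zfs.zil_disable="1"
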
>>
>> Maybe the problem is in ata? The WD15EADS is a "green" series drive.
>
> The WD green drives have a feature called Time Limited Error Recovery
> where the disk can spend several minutes trying to read a bad block etc.
>
> It plays havoc with RAID arrays which is why WD recommend you don't use
> the green drives in arrays. They have more info about the "feature" in
> the WD FAQ/knowledgebase.
>

Sorry, TLER is the feature that 'fixes' the problem, see:

http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=1397&p_created=1131638613&p_sid=vfyE1KPj&p_accessibility=0&p_redirect=&p_srch=1&p_lva=&p_sp=cF9zcmNoPTEmcF9zb3J0X2J5PSZwX2dyaWRzb3J0PSZwX3Jvd19jbnQ9MTcsMTcmcF9wcm9kcz0yMjcsMjk0JnBfY2F0cz0mcF9wdj0yLjI5NCZwX2N2PSZwX3BhZ2U9MSZwX3NlYXJjaF90ZXh0PXJhaWQ!&p_li=&p_topview=1


Sounds like your drive is going into the recovery procedure...
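
(Whether a given drive honours a bounded error-recovery time can be queried
from FreeBSD if smartmontools is installed and both the tool version and the
drive support SCT Error Recovery Control - a sketch only, and many desktop
drives simply do not report it:)

  # report the current SCT ERC (TLER-style) read/write recovery timeouts
  smartctl -l scterc /dev/ad6
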



>
>> Maybe I have a problem with its power management?
>> Is there a method to completely reset the SATA channel and drive?
>> Will atacontrol reinit do it?
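
(For the question above: atacontrol works on a channel rather than a single
drive, so for ad6, which the dmesg lines quoted earlier show as ata3-master,
a reset would look roughly like this - a sketch, not something tested against
this particular lockup:)

  # reinitialise the ATA channel that ad6 hangs off
  atacontrol reinit ata3
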
>
>
>
>>
>> Any help is welcomed.
>>
>> --
>> Alexander Zagrebin
>>
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"



