Date:      Wed, 09 Jan 2013 19:35:04 +0100
From:      "Ronald Klop" <ronald-freebsd8@klop.yi.org>
To:        freebsd-fs@freebsd.org
Subject:   Re: slowdown of zfs (tx->tx)
Message-ID:  <op.wqnpwqu08527sy@212-182-167-131.ip.telfort.nl>
In-Reply-To: <20130109162613.GA34276@mid.pc5.i.0x5.de>
References:  <20130108174225.GA17260@mid.pc5.i.0x5.de> <CAFqOu6jgA8RWV5d%2BrOBk8D=3Vu3yWSnDkAi1cFJ0esj4OpBy2Q@mail.gmail.com> <20130109162613.GA34276@mid.pc5.i.0x5.de>

On Wed, 09 Jan 2013 17:26:13 +0100, Nicolas Rachinsky  
<fbsd-mas-0@ml.turing-complete.org> wrote:

> * Artem Belevich <art@freebsd.org> [2013-01-08 12:47 -0800]:
>> On Tue, Jan 8, 2013 at 9:42 AM, Nicolas Rachinsky
>> <fbsd-mas-0@ml.turing-complete.org> wrote:
>> >         NAME                      STATE     READ WRITE CKSUM
>> >         pool1                     DEGRADED     0     0     0
>> >           raidz2-0                DEGRADED     0     0     0
>> >             ada5                  ONLINE       0     0     0
>> >             ada8                  ONLINE       0     0     0
>> >             ada2                  ONLINE       0     0     0
>> >             ada3                  ONLINE       0     0     0
>> >             11846390416703086268  UNAVAIL      0     0     0  was /dev/dsk/ada1
>> >             ada6                  ONLINE       0     0     0
>> >             ada0                  ONLINE       0     0     1
>> >             ada7                  ONLINE       0     0     0
>> >             ada4                  ONLINE       0     0     3
>>
>> You seem to have some checksum errors, which does suggest hardware
>> trouble.
>
> I somehow missed these. Is there any way to learn when these checksum
> errors happen?
>
>> For starters, check SMART info for all drives and see if they have any
>> reallocated sectors.
>
> There are some disks with reallocated sectors, but for both ada0 and
> ada4 Reallocated_Sector_Ct is 0.
>
>> Use gstat during your workload to see if any of the drives takes much
>> longer than others to handle its job.
>
> There is one disk sticking out a bit.
>
>> > There is almost no disk activity during this time.
>>
>> What kind of disk activity *is* there?
>
> What would be interesting?
>
>
>> > sync is disabled for the whole pool.
>>
>> If that's the case (assuming you're talking about sync=disabled zfs
>> property), then synchronous writes are probably not the cause of
>> slowdown. My guess would be either failing HDD or something funky with
>> cabling or sata controller.
>
> Yes, sync=disabled for pool1.
>
>
> Ok, I will start swapping hardware (sadly the machine is quite a drive
> away).
>
> Thank you very much for your help.
>
> Nicolas


If you are driving there anyway, replace this one:

>> >             11846390416703086268  UNAVAIL      0     0     0  was /dev/dsk/ada1

If the pool is healthy, the sysadmin will notice checksum errors
sooner.
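
On learning when such errors show up: a rough sketch, not anything
that ships with ZFS, would be to run the text of "zpool status"
through a small filter from cron and mail yourself whatever it flags.
The function name problem_devices and the SAMPLE text (copied from the
listing earlier in this thread) are only for illustration, and the
sketch assumes the error counters are plain integers.

```python
import re

# One device row of `zpool status`: name, state, READ, WRITE, CKSUM.
# Assumption: counters are plain integers (abbreviated values are not handled).
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)"
)

def problem_devices(status_text):
    """Return (name, state, read, write, cksum) for rows needing attention."""
    problems = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header line or other non-device text
        name, state = m.group(1), m.group(2)
        read, write, cksum = (int(m.group(i)) for i in (3, 4, 5))
        # Flag anything that is not ONLINE or has a non-zero error counter.
        if state != "ONLINE" or read or write or cksum:
            problems.append((name, state, read, write, cksum))
    return problems

# Sample input taken from the listing quoted in this thread.
SAMPLE = """\
  NAME                      STATE     READ WRITE CKSUM
  pool1                     DEGRADED     0     0     0
    raidz2-0                DEGRADED     0     0     0
      ada5                  ONLINE       0     0     0
      11846390416703086268  UNAVAIL      0     0     0  was /dev/dsk/ada1
      ada0                  ONLINE       0     0     1
      ada4                  ONLINE       0     0     3
"""

if __name__ == "__main__":
    for row in problem_devices(SAMPLE):
        print(*row)
```

Note that a degraded pool is always reported, by this filter and by
"zpool status -x" alike, so a new checksum error is easy to overlook
until the missing disk has been replaced.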

Ronald.


