Date:      Fri, 5 Jun 2009 04:28:38 -0700
From:      Kip Macy <kmacy@freebsd.org>
To:        Kip Macy <kmacy@freebsd.org>, freebsd-stable@freebsd.org
Subject:   Re: ZFS weird device tasting loop since MFC
Message-ID:  <3c1674c90906050428mafb5760gc706e879193345e0@mail.gmail.com>
In-Reply-To: <20090605084423.GA1609@acme.spoerlein.net>
References:  <20090602091610.GE93344@acme.spoerlein.net> <20090602092408.GF93344@acme.spoerlein.net> <20090605084423.GA1609@acme.spoerlein.net>

Must be a weird geom interaction. I don't see this with raw disk. I'll
look at it eventually but UMA and performance are further up in the
queue.

-Kip

On Fri, Jun 5, 2009 at 1:44 AM, Ulrich Spörlein <uqs@spoerlein.net> wrote:
> On Tue, 02.06.2009 at 11:24:08 +0200, Ulrich Spörlein wrote:
>> On Tue, 02.06.2009 at 11:16:10 +0200, Ulrich Spörlein wrote:
>> > Hi all,
>> >
>> > so I went ahead and updated my ~7.2 file server to the new ZFS goodness,
>> > and before running any further tests, I already discovered something
>> > weird and annoying.
>> >
>> > I'm using a mirror on GELI, where one disk is usually *not* attached as
>> > a means of poor man's backup. (I had to go that route, as send/recv of
>> > snapshots frequently deadlocked the system, whereas a mirror scrubbing
>> > did not)
>> >
>> > root@coyote:~# zpool status
>> >   pool: tank
>> >  state: DEGRADED
>> > status: The pool is formatted using an older on-disk format.  The pool can
>> >         still be used, but some features are unavailable.
>> > action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
>> >         pool will no longer be accessible on older software versions.
>> >  scrub: none requested
>> > config:
>> >
>> >         NAME                      STATE     READ WRITE CKSUM
>> >         tank                      DEGRADED     0     0     0
>> >           mirror                  DEGRADED     0     0     0
>> >             ad4.eli               ONLINE       0     0     0
>> >             12333765091756463941  REMOVED      0     0     0  was /dev/da0.eli
>> >
>> > errors: No known data errors
>> >
>> > When imported, there is a constant "tasting" of all devices in the system,
>> > which also makes the floppy drive go spinning constantly, which is really
>> > annoying. It did not do this with the old ZFS, are there any remedies?
>> >
>> > gstat(8) is displaying the following every other second, together with a
>> > spinning fd0 drive.
>> >
>> > dT: 1.010s  w: 1.000s  filter: ^...$
>> >  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>> >     0      0      0      0    0.0      0      0    0.0    0.0| fd0
>> >     0      8      8   1014    0.1      0      0    0.0    0.1| md0
>> >     0     32     32   4055    9.2      0      0    0.0   29.2| ad0
>> >     0     77     10   1267    7.1     63   1125    2.3   31.8| ad4
>> >
>> > There is no activity going on, especially md0 is for /tmp, yet it
>> > constantly tries to read stuff from everywhere. I will now insert the
>> > second drive and see if ZFS shuts up then ...
>>
>> It does, but it also did not start resilvering the second disk:
>>
>> root@coyote:~# zpool status
>>   pool: tank
>>  state: ONLINE
>> status: One or more devices has experienced an unrecoverable error.  An
>>         attempt was made to correct the error.  Applications are unaffected.
>> action: Determine if the device needs to be replaced, and clear the errors
>>         using 'zpool clear' or replace the device with 'zpool replace'.
>>    see: http://www.sun.com/msg/ZFS-8000-9P
>>  scrub: none requested
>> config:
>>
>>         NAME         STATE     READ WRITE CKSUM
>>         tank         ONLINE       0     0     0
>>           mirror     ONLINE       0     0     0
>>             ad4.eli  ONLINE       0     0     0
>>             da0.eli  ONLINE       0     0    16
>>
>> errors: No known data errors
>>
>> Will now run the scrub and report back in 6-9h.
>
> Another datapoint: While the floppy-tasting has stopped, since the mirror sees
> all devices again, there is some other problem here:
>
> root@coyote:/# zpool online tank da0.eli
> root@coyote:/# zpool status
>   pool: tank
>  state: ONLINE
>  scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         ONLINE       0     0     0
>           mirror     ONLINE       0     0     0
>             ad4.eli  ONLINE       0     0     0  684K resilvered
>             da0.eli  ONLINE       0     0     0  2.20M resilvered
>
> errors: No known data errors
> root@coyote:/# zpool offline tank da0.eli
> root@coyote:/# zpool status
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>  scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         DEGRADED     0     0     0
>           mirror     DEGRADED     0     0     0
>             ad4.eli  ONLINE       0     0     0  684K resilvered
>             da0.eli  OFFLINE      0     0     0  2.20M resilvered
>
> errors: No known data errors
> root@coyote:/# zpool status
>   pool: tank
>  state: DEGRADED
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         DEGRADED     0     0     0
>           mirror     DEGRADED     0     0     0
>             ad4.eli  ONLINE       0     0     0  684K resilvered
>             da0.eli  OFFLINE      0   339     0  2.20M resilvered
>
> errors: No known data errors
> root@coyote:/# zpool status
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>  scrub: resilver completed after 0h0m with 0 errors on Fri Jun  5 10:21:36 2009
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         DEGRADED     0     0     0
>           mirror     DEGRADED     0     0     0
>             ad4.eli  ONLINE       0     0     0  684K resilvered
>             da0.eli  OFFLINE      0     0     0  2.20M resilvered
>
> errors: No known data errors
>
>
> So I ran 'zpool status' thrice after the offline, and the second one reports
> write errors on the OFFLINE device (WTF?). Running zpool status in a loop, this
> will constantly show up and then vanish again.
>
> I also get constant write requests to the remaining device, even though no
> applications are accessing it. What the hell is ZFS trying to do here?
>
> root@coyote:/# zpool iostat 1
>                capacity     operations    bandwidth
> pool         used  avail   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> tank         883G  48.4G      8    246  56.8K  1.53M
> tank         883G  48.4G      8    249  55.9K  1.55M
> tank         883G  48.4G      8    250  55.0K  1.54M
> tank         883G  48.4G      8    252  54.1K  1.56M
> tank         883G  48.4G      8    254  53.3K  1.57M
> tank         883G  48.4G      8    253  52.5K  1.56M
> tank         883G  48.4G      7    255  51.7K  1.57M
> ^C
>
> Again, WTF? Can someone please enlighten me here?
>
> Cheers,
> Ulrich Spörlein
> --
> http://www.dubistterrorist.de/
>



-- 
When bad men combine, the good must associate; else they will fall one
by one, an unpitied sacrifice in a contemptible struggle.

    Edmund Burke


