Date:      Thu, 11 Jul 2019 16:18:00 +0300
From:      Daniel Braniss <danny@cs.huji.ac.il>
To:        Allan Jude <allanjude@freebsd.org>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: zpool errors
Message-ID:  <88CFC175-8275-4C4E-B7BE-110E07C0A31C@cs.huji.ac.il>
In-Reply-To: <05D8BD75-78B4-4336-8A8A-C84A901CB3D4@cs.huji.ac.il>
References:  <52CE32B1-7E01-4C35-A2AB-84D3D5BD4E2F@cs.huji.ac.il> <27c3e59a-07ea-5df3-9de2-302d5290a477@freebsd.org> <831204B6-3F3B-4736-89FA-1207C4C46A7E@cs.huji.ac.il> <70f1be10-e37a-de20-e188-6155fda2d06a@freebsd.org> <05D8BD75-78B4-4336-8A8A-C84A901CB3D4@cs.huji.ac.il>



> On 11 Jul 2019, at 10:39, Daniel Braniss <danny@cs.huji.ac.il> wrote:
>
>> On 10 Jul 2019, at 20:23, Allan Jude <allanjude@freebsd.org> wrote:
>>
>> On 2019-07-10 11:37, Daniel Braniss wrote:
>>>
>>>> On 10 Jul 2019, at 18:24, Allan Jude <allanjude@freebsd.org> wrote:
>>>>
>>>> On 2019-07-10 10:48, Daniel Braniss wrote:
>>>>> hi,
>>>>> I got a degraded pool, but can't make sense of the file name:
>>>>>
>>>>> protonew-2# zpool status -vx
>>>>> pool: h
>>>>> state: ONLINE
>>>>> status: One or more devices has experienced an error resulting in data
>>>>>     corruption.  Applications may be affected.
>>>>> action: Restore the file in question if possible.  Otherwise restore the
>>>>>     entire pool from backup.
>>>>> see: http://illumos.org/msg/ZFS-8000-8A
>>>>> scan: scrub repaired 6.50K in 17h30m with 0 errors on Wed Jul 10 12:06:14 2019
>>>>> config:
>>>>>
>>>>>     NAME          STATE     READ WRITE CKSUM
>>>>>     h             ONLINE       0     0 14.4M
>>>>>       gpt/r5/zfs  ONLINE       0     0 57.5M
>>>>>
>>>>> errors: Permanent errors have been detected in the following files:
>>>>>
>>>>>     <0x102>:<0x30723>
>>>>>     <0x102>:<0x30726>
>>>>>     <0x102>:<0x3062a>
>>>>> …
>>>>>     <0x281>:<0x0>
>>>>>     <0x6aa>:<0x305cd>
>>>>>     <0xffffffffffffffff>:<0x305cd>
>>>>>
>>>>> any hints as to how I can identify these files?
>>>>>
>>>>> thanks,
>>>>> 	danny
>>>>>
>>>>
>>>> Once a file has been deleted, ZFS can have a hard time determining its
>>>> filename.
>>>>
>>>> It is inode 198186 (0x3062a) on dataset 0x102. The file has been
>>>> deleted, but still exists in at least one snapshot.
>>>>
>>>> Although, 57 million checksum errors seems like there may be some other
>>>> problem. You might look for and resolve the problem with what appears to
>>>> be the raid5 you have built your ZFS pool on top of. Then do 'zpool
>>>> clear' to reset the counters to zero, and 'zpool scrub' to try to read
>>>> everything again.
>>>>
>>>> --
>>>> Allan Jude
>>>>
>>> I don't know when the first error was detected, and this host has been up
>>> for 367 days!
>>> I did a scrub but no change.
>>> I will remove old snapshots and see if it helps.
>>>
>>> Is it possible to know at least which volume?
>>>=20
>>> thanks,
>>> 	danny
>>>
>>
>> zdb -ddddd h 0x102
>>
>> Should tell you about which dataset that is.
>>
>> --
>> Allan Jude
>>
>

the above didn't work for me, but
after removing old snapshots I reduced the problematic files to just one:
     <0xffffffffffffffff>:<0x305cd>
which seems very odd: 0xffffffffffffffff is -1 as a signed 64-bit value, presumably a dataset that no longer exists?
so now I removed more old snapshots and started a new zpool scrub.
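for the record, this is roughly the sequence I took away from your earlier mail
(the /h mountpoint below is just my guess, and the find would have to run under
the .zfs/snapshot directory of whatever dataset 0x102 turns out to be):

    # look for the deleted file (object 0x3062a = inode 198186) inside the snapshots
    find /h/.zfs/snapshot -inum 198186 -print 2>/dev/null
    # reset the error counters, then force a re-read of the whole pool
    zpool clear h
    zpool scrub h
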
what still worries me is the fast-growing checksum count.
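btw, would listing the datasets with their numeric IDs be another way to map
0x102 to a name? something like (just guessing from the zdb man page, so I may
be holding it wrong):

    zdb -d h    # each dataset should be printed with its ID; 0x102 would show as 'ID 258'
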
thanks,
	danny


> firstly, thanks for your help!
> now, after doing a zpool clear, I notice that the CHKSUM is growing,
> the pool is on a raid-controller raid5 (a PERC from Dell) which is showing
> it's correcting the errors ('Corrected medium error during recovery on PD …').
>
> so what can be the cause? btw, the FreeBSD version is 10.3-stable.
>



