From: Michelle Sullivan
To: Borja Marcos
Cc: Walter Parker, freebsd-stable@freebsd.org
Subject: Re: ZFS...
Date: Thu, 09 May 2019 21:15:11 +1000

Michelle Sullivan
http://www.mhix.org/
Sent from my iPad

> On 09 May 2019, at 19:41, Borja Marcos wrote:
>
>> On 9 May 2019, at 00:55, Michelle Sullivan wrote:
>>
>> This is true, but I am of the thought, in alignment with the ZFS devs, that this might not be a good idea... if ZFS can’t work it out already, the best thing to do will probably be to get everything off it and reformat…
>
> That’s true, I would rescue what I could and create the pool again, but only after testing the setup thoroughly.
>

+1

> It would be worth having a look at the excellent guide offered by the FreeNAS people.
> It’s full of excellent advice and a priceless list of “don’ts” such as SATA port multipliers, etc.
>

Yeah, I had already worked out over time that port multipliers can’t be good.

>>> That should not be hard to write if everything else on the disk has no
>>> issues. Didn’t you say in another message that the system is now returning
>>> 100s of drive errors?
>>
>> No, one disk in the 16-disk RAID-Z2 ... previously unseen, but it could be the errors have occurred in the last 6 weeks... every time I reboot it starts resilvering, gets to 761M resilvered and then stops.
>
> That’s a really bad sign. It shouldn’t happen.

That’s since the metadata corruption. That is probably part of the problem.

>>> How does that relate to the statement => Everything on
>>> the disk is fine except for a little bit of corruption in the freespace map?
>>
>> Well, I think it goes through until it hits that little bit of corruption that stops it mounting... then stops again..
>>
>> I’m seeing 100s of hard errors at the beginning of one of the drives.. they were reported in syslog but only just, so it could be a new thing. Could be previously undetected.. no way to know.
>
> As for disk monitoring, smartmontools can be pretty good, although only as an indicator. I also monitor my systems using Orca (I wrote a crude “devilator” many years ago) and I gather disk I/O statistics using GEOM, of which the read/write/delete/flush times are very valuable. An ailing disk can be returning valid data but become very slow due to retries.

Yes, though often these will show up in syslog (something I monitor religiously... though I concede that when it hits syslog it’s probably already an urgent issue).

Michelle
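
To illustrate the point quoted above, that an ailing disk can still return valid data while becoming very slow, here is a minimal sketch of comparing per-disk latency against the other disks in the same pool. It is not anything used in this thread: the function name `flag_slow_disks`, the threshold `factor`, the disk names and the sample values are all hypothetical, and it assumes latency samples (in milliseconds) have already been collected by some other means.

```python
from statistics import median


def flag_slow_disks(latency_ms, factor=3.0):
    """Return the names of disks that look slow compared with their peers.

    latency_ms maps a disk name to a list of per-operation latency samples
    in milliseconds. A disk is flagged when its median latency is more than
    `factor` times the median of the other disks' medians. Both the input
    layout and the threshold are assumptions made for this sketch.
    """
    # Median per disk, ignoring disks with no samples.
    per_disk = {
        name: median(samples)
        for name, samples in latency_ms.items()
        if samples
    }

    flagged = []
    for name, value in per_disk.items():
        # Compare each disk only against the other disks.
        peers = [v for other, v in per_disk.items() if other != name]
        if peers and value > factor * median(peers):
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical samples: da2 answers correctly but slowly.
    samples = {
        "da0": [4.0, 5.0, 6.0],
        "da1": [5.0, 5.5, 6.0],
        "da2": [40.0, 55.0, 60.0],
        "da3": [4.5, 5.0, 5.5],
    }
    print(flag_slow_disks(samples))
```

Comparing against the other disks rather than a fixed limit means a pool that is uniformly busy does not flag every disk at once. Like the SMART data mentioned above, the result is only an indicator.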