Date:      Tue, 18 Oct 2016 15:55:37 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-stable@freebsd.org
Subject:   Re: Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE
Message-ID:  <1fefed03-6062-50f9-be97-d693e25a64c9@denninger.net>
In-Reply-To: <4d4909b7-c44b-996e-90e1-ca446e8e4813@multiplay.co.uk>
References:  <3d4f25c9-a262-a373-ec7e-755325f8810b@denninger.net> <9adecd24-6659-0da5-5c05-d0d3957a2cb3@denninger.net> <CANCZdfq5QCDNhLY5GOpmBoh5ONYy2VPteuaMhQ2=3v%2B0vcoM0g@mail.gmail.com> <0f58b11f-0bca-bc08-6f90-4e6e530f9956@denninger.net> <43a67287-f4f8-5d3e-6c5e-b3599c6adb4d@multiplay.co.uk> <76551fd6-0565-ee6c-b0f2-7d472ad6a4b3@denninger.net> <25ff3a3e-77a9-063b-e491-8d10a06e6ae2@multiplay.co.uk> <26e092b2-17c6-8744-5035-d0853d733870@denninger.net> <d2afc0b0-0e7f-e7ac-fb21-fa4ffd1c1003@multiplay.co.uk> <f9a4a12d-62df-482d-feeb-9d9f64de3e55@denninger.net> <4d4909b7-c44b-996e-90e1-ca446e8e4813@multiplay.co.uk>

On 10/17/2016 18:32, Steven Hartland wrote:
>
>
> On 17/10/2016 22:50, Karl Denninger wrote:
>> I will make some effort on the sandbox machine to see if I can come up
>> with a way to replicate this.  I do have plenty of spare larger drives
>> laying around that used to be in service and were obsolesced due to
>> capacity -- but what I don't know is whether the system will misbehave
>> if the source is all spinning rust.
>>
>> In other words:
>>
>> 1. Root filesystem is mirrored spinning rust (production is mirrored
>> SSDs)
>>
>> 2. Backup is mirrored spinning rust (of approx the same size)
>>
>> 3. Set up auto-snapshot exactly as the production system has now (which
>> the sandbox is NOT since I don't care about incremental recovery on that
>> machine; it's a sandbox!)
>>
>> 4. Run a bunch of build-somethings (e.g. buildworlds, cross-build for
>> the Pi2s I have here, etc) to generate a LOT of filesystem entropy
>> across lots of snapshots.
>>
>> 5. Back that up.
>>
>> 6. Export the backup pool.
>>
>> 7. Re-import it and "zfs destroy -r" the backup filesystem.
>>
>> That is what got me in a reboot loop after the *first* panic; I was
>> simply going to destroy the backup filesystem and re-run the backup, but
>> as soon as I issued that zfs destroy the machine panic'd and as soon as
>> I re-attached it after a reboot it panic'd again.  Repeat until I set
>> trim=0.
>>
>> But... if I CAN replicate it that still shouldn't be happening, and the
>> system should *certainly* survive attempting to TRIM on a vdev that
>> doesn't support TRIMs, even if the removal is for a large amount of
>> space and/or files on the target, without blowing up.
>>
>> BTW I bet it isn't that rare -- if you're taking timed snapshots on an
>> active filesystem (with lots of entropy) and then make the mistake of
>> trying to remove those snapshots (as is the case with a zfs destroy -r
>> or a zfs recv of an incremental copy that attempts to sync against a
>> source) on a pool that has been imported before the system realizes that
>> TRIM is unavailable on those vdevs.
>>
>> Noting this:
>>
>>      Yes need to find some time to have a look at it, but given how rare
>>      this is and with TRIM being re-implemented upstream in a totally
>>      different manner I'm reticent to spend any real time on it.
>>
>> What's in-process in this regard, if you happen to have a reference?
> Looks like it may be still in review: https://reviews.csiden.org/r/263/
>
>
Initial attempts to provoke the panic have failed on the sandbox machine
-- it appears that I need a materially-fragmented backup volume (which
makes sense, as that would greatly increase the number of TRIMs queued.)

Running a bunch of builds with snapshots taken between generates a
metric ton of entropy in the filesystem, but it appears that the number
of TRIMs actually issued when you bulk-remove them (with zfs destroy -r)
is small enough to not cause it -- probably because the system issues
one per area of freed disk, and since there is no interleaving with
other (non-removed) data that number is "reasonable" since there's
little fragmentation of that free space.
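
To illustrate that reasoning, here is a toy model in Python. It is not ZFS code: it simply assumes, as the paragraph above guesses, that one TRIM is issued per contiguous freed region. The function name `count_free_extents` and both block maps are made up for the illustration.

```python
def count_free_extents(block_map):
    """Count contiguous runs of freed blocks in a toy disk layout.

    block_map is a sequence of booleans: True means the block was freed
    by the destroy, False means it still holds live data.
    """
    extents = 0
    in_run = False
    for freed in block_map:
        if freed and not in_run:
            extents += 1  # a new contiguous freed region starts here
        in_run = freed
    return extents


# Sandbox-like case: everything being destroyed sits together on disk.
contiguous = [True] * 1000 + [False] * 1000

# Fragmented case: freed blocks are interleaved with live data.
interleaved = [True, False] * 1000

print(count_free_extents(contiguous))   # 1
print(count_free_extents(interleaved))  # 1000
```

Both layouts free the same 1000 blocks, but under this assumption the interleaved one would queue a thousand times as many TRIM requests, which fits with the sandbox not reproducing the panic.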

The TRIMs *are* attempted, and they *do* fail, however.....

I'm running with the 6 pages of kstack now on the production machine,
and we'll see if I get another panic...

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
