From: Karl Denninger <karl@denninger.net>
To: freebsd-fs@freebsd.org
Subject: Re: Panic in ZFS during zfs recv (while snapshots being destroyed)
Date: Thu, 27 Aug 2015 15:22:41 -0500
Message-ID: <55DF7191.2080409@denninger.net>
In-Reply-To: <55CF7926.1030901@denninger.net>
References: <55BB443E.8040801@denninger.net> <55CF7926.1030901@denninger.net>
List-Id: Filesystems

On 8/15/2015 12:38, Karl Denninger wrote:
> Update:
>
> This /appears/ to be related to attempting to send or receive a
> /cloned/ snapshot.
>
> I use /beadm/ to manage boot environments and the crashes have all
> come while send/recv-ing the root pool, which is the one where these
> clones get created.  It is /not/ consistent within a given snapshot
> when it crashes and a second attempt (which does a "recovery"
> send/receive) succeeds every time -- I've yet to have it panic twice
> sequentially.
>
> I surmise that the problem comes about when a file in the cloned
> snapshot is modified, but this is a guess at this point.
>
> I'm going to try to force replication of the problem on my test system.
>
> On 7/31/2015 04:47, Karl Denninger wrote:
>> I have an automated script that runs zfs send/recv copies to bring a
>> backup data set into congruence with the running copies nightly.  The
>> source has automated snapshots running on a fairly frequent basis
>> through zfs-auto-snapshot.
>>
>> Recently I have started having a panic show up about once a week during
>> the backup run, but it's inconsistent.  It is in the same place, but I
>> cannot force it to repeat.
>>
>> The trap itself is a page fault in kernel mode in the zfs code at
>> zfs_unmount_snap(); here's the traceback from the kvm (sorry for the
>> image link but I don't have a better option right now.)
>>
>> I'll try to get a dump, this is a production machine with encrypted swap
>> so it's not normally turned on.
>>
>> Note that the pool that appears to be involved (the backup pool) has
>> passed a scrub and thus I would assume the on-disk structure is ok.....
>> but that might be an unfair assumption.
>> It is always occurring in the same dataset although there are a
>> half-dozen that are sync'd -- if this one (the first one) successfully
>> completes during the run then all the rest will as well (that is,
>> whenever I restart the process it has always failed here.)  The source
>> pool is also clean and passes a scrub.
>>
>> traceback is at http://www.denninger.net/kvmimage.png; apologies for the
>> image traceback but this is coming from a remote KVM.
>>
>> I first saw this on 10.1-STABLE and it is still happening on FreeBSD
>> 10.2-PRERELEASE #9 r285890M, which I updated to in an attempt to see if
>> the problem was something that had been addressed.
>>
>
> -- 
> Karl Denninger
> karl@denninger.net
> /The Market Ticker/
> /[S/MIME encrypted email preferred]/

Second update:

I have now taken another panic on 10.2-STABLE, same deal, but without
any cloned snapshots in the source image.  I had thought that removing
cloned snapshots might eliminate the issue; that is now out the window.

It ONLY happens on this one filesystem (the root one, incidentally),
which was created fairly recently when I moved this machine from
spinning rust to SSDs for the OS and root pool -- and only when it is
being backed up using zfs send | zfs recv (with the receive going to a
different pool in the same machine.)

I have yet to be able to provoke it when using zfs send to copy to a
different machine on the same LAN, but given that it cannot be
reproduced on demand I can't be certain whether it's timing related
(e.g. performance between the two pools in question) or just that I
haven't hit the unlucky combination.
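
For anyone who wants to picture the local send | recv arrangement
described above, here is a minimal sketch.  It is NOT the actual backup
script: the dataset and snapshot names are hypothetical, the helper
functions are made up for illustration, and the choice of an
incremental send (-i) with a forced receive (-F) is an assumption about
how such a job is typically run.

```python
import shlex
import subprocess


def build_send_cmd(dataset, prev_snap, cur_snap):
    """argv for an incremental send from prev_snap to cur_snap (assumed flags)."""
    return ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{cur_snap}"]


def build_recv_cmd(target):
    """argv for receiving into target, rolling it back first if needed (-F)."""
    return ["zfs", "recv", "-F", target]


def pipeline_string(send_cmd, recv_cmd):
    """Render the two argv lists as the equivalent shell pipeline, for logging."""
    return " | ".join(
        " ".join(shlex.quote(arg) for arg in cmd) for cmd in (send_cmd, recv_cmd)
    )


def replicate(send_cmd, recv_cmd):
    """Run send | recv on the local machine; returns (send_rc, recv_rc)."""
    sender = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(recv_cmd, stdin=sender.stdout)
    # Close our copy of the pipe so the sender sees SIGPIPE if the receiver dies.
    sender.stdout.close()
    recv_rc = receiver.wait()
    send_rc = sender.wait()
    return send_rc, recv_rc


if __name__ == "__main__":
    # Hypothetical names: a root dataset on the SSD pool, a copy on the backup pool.
    send = build_send_cmd("zroot/ROOT/default", "backup-prev", "backup-cur")
    recv = build_recv_cmd("backup/ROOT/default")
    print(pipeline_string(send, recv))
```

Only the command construction is exercised when the file is run
directly; replicate() would actually start both zfs processes on the
same host, which is the configuration in which the panic has been seen.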

This looks like some sort of race condition, and I will keep trying to
craft a case that makes it occur "on demand".

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/