From owner-freebsd-stable@freebsd.org Fri Nov 18 12:31:24 2016
Return-Path:
Delivered-To: freebsd-stable@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
	by mailman.ysv.freebsd.org (Postfix) with ESMTP id D1EE9C48AB0
	for ; Fri, 18 Nov 2016 12:31:24 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from citapm.icyb.net.ua (citapm.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id C7966A89;
	Fri, 18 Nov 2016 12:31:23 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100])
	by citapm.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA18390;
	Fri, 18 Nov 2016 14:31:15 +0200 (EET)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1c7iK3-000HaV-E5; Fri, 18 Nov 2016 14:31:15 +0200
Subject: Re: Freebsd 11.0 RELEASE - ZFS deadlock
To: Henri Hennebert , Konstantin Belousov
References: <0c223160-b76f-c635-bb15-4a068ba7efe7@restart.be>
	<9d1f9a76-5a8d-6eca-9a50-907d55099847@FreeBSD.org>
	<6bc95dce-31e1-3013-bfe3-7c2dd80f9d1e@restart.be>
	<23a66749-f138-1f1a-afae-c775f906ff37@restart.be>
	<8e7547ef-87f7-7fab-6f45-221e8cea1989@FreeBSD.org>
	<6d991cea-b420-531e-12cc-001e4aeed66b@restart.be>
	<67f2e8bd-bff0-f808-7557-7dabe5cad78c@FreeBSD.org>
	<1cb09c54-5f0e-2259-a41a-fefe76b4fe8b@restart.be>
	<9f20020b-e2f1-862b-c3fc-dc6ff94e301e@restart.be>
	<599c5a5b-aa08-2030-34f3-23ff19d09a9b@restart.be>
	<32686283-948a-6faf-7ded-ed8fcd23affb@FreeBSD.org>
	<26512d69-94c2-92da-e3ea-50aebf17e3a0@restart.be>
	<80f65c86-1015-c409-1bf6-c01a5fe569c8@restart.be>
Cc: freebsd-stable@FreeBSD.org
From: Andriy Gapon
Message-ID: <7c932021-ff99-9ef9-7042-4f267fb0b955@FreeBSD.org>
Date: Fri, 18 Nov 2016 14:30:39 +0200
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0
MIME-Version: 1.0
In-Reply-To: <80f65c86-1015-c409-1bf6-c01a5fe569c8@restart.be>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 18 Nov 2016 12:31:24 -0000

On 14/11/2016 14:00, Henri Hennebert wrote:
> On 11/14/2016 12:45, Andriy Gapon wrote:
>> Okay.  Luckily for us, it seems that 'm' is available in frame 5.  It also
>> happens to be the first field of 'struct faultstate'.  So, could you please go
>> to frame and print '*m' and '*(struct faultstate *)m' ?
>>
> (kgdb) fr 4
> #4  0xffffffff8089d1c1 in vm_page_busy_sleep (m=0xfffff800df68cd40, wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753
> 753             msleep(m, vm_page_lockptr(m), PVM | PDROP, wmesg, 0);
> (kgdb) print *m
> $1 = {plinks = {q = {tqe_next = 0xfffff800dc5d85b0, tqe_prev =
> 0xfffff800debf3bd0}, s = {ss = {sle_next = 0xfffff800dc5d85b0},
> pv = 0xfffff800debf3bd0}, memguard = {p = 18446735281313646000, v =
> 18446735281353604048}}, listq = {tqe_next = 0x0,
> tqe_prev = 0xfffff800dc5d85c0}, object = 0xfffff800b62e9c60, pindex = 11,
> phys_addr = 3389358080, md = {pv_list = {
> tqh_first = 0x0, tqh_last = 0xfffff800df68cd78}, pv_gen = 426, pat_mode =
> 6}, wire_count = 0, busy_lock = 6, hold_count = 0,
> flags = 0, aflags = 2 '\002', oflags = 0 '\0', queue = 0 '\0', psind = 0 '\0',
> segind = 3 '\003', order = 13 '\r',
> pool = 0 '\0', act_count = 0 '\0', valid = 0 '\0', dirty = 0 '\0'}

If I interpret this correctly, the page is in the 'exclusive busy' state
(busy_lock = 6).  Unfortunately, I can't tell much beyond that.
But I am confident that this is the root cause of the lock-up.
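For readers following along, here is a rough sketch of how the busy_lock = 6 value above could be decoded.  The VPB_* bit values are an assumption (copied from memory of sys/vm/vm_page.h in FreeBSD 11, so please check them against your own tree), and the busy_*() helpers are hypothetical names made up for illustration; they are not kernel functions.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * ASSUMPTION: these values are meant to mirror the VPB_* definitions in
 * FreeBSD 11's sys/vm/vm_page.h.  Verify against the actual source tree.
 */
#define VPB_BIT_SHARED		0x01u
#define VPB_BIT_EXCLUSIVE	0x02u
#define VPB_BIT_WAITERS		0x04u
#define VPB_BIT_FLAGMASK	\
	(VPB_BIT_SHARED | VPB_BIT_EXCLUSIVE | VPB_BIT_WAITERS)
#define VPB_SHARERS_SHIFT	3

/* Hypothetical helper: true if the page is exclusively busied. */
static inline bool
busy_is_exclusive(unsigned int busy_lock)
{
	return ((busy_lock & VPB_BIT_EXCLUSIVE) != 0);
}

/* Hypothetical helper: true if some thread sleeps waiting for the page. */
static inline bool
busy_has_waiters(unsigned int busy_lock)
{
	return ((busy_lock & VPB_BIT_WAITERS) != 0);
}

/* Hypothetical helper: sharer count; only meaningful in shared mode. */
static inline unsigned int
busy_sharers(unsigned int busy_lock)
{
	if ((busy_lock & VPB_BIT_SHARED) == 0)
		return (0);
	return ((busy_lock & ~VPB_BIT_FLAGMASK) >> VPB_SHARERS_SHIFT);
}

/* Hypothetical helper: print a one-line summary of a busy_lock value. */
static inline void
busy_describe(unsigned int busy_lock)
{
	printf("busy_lock=%u: exclusive=%d waiters=%d sharers=%u\n",
	    busy_lock, busy_is_exclusive(busy_lock),
	    busy_has_waiters(busy_lock), busy_sharers(busy_lock));
}
```

Under those assumed bit values, 6 is VPB_BIT_EXCLUSIVE | VPB_BIT_WAITERS: the page is exclusively busied and at least one thread is sleeping on it, which would be consistent with the thread parked in vm_page_busy_sleep() in frame 4.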
> (kgdb) print *(struct faultstate *)m
> $2 = {m = 0xfffff800dc5d85b0, object = 0xfffff800debf3bd0, pindex = 0, first_m =
> 0xfffff800dc5d85c0,
> first_object = 0xfffff800b62e9c60, first_pindex = 11, map = 0xca058000, entry
> = 0x0, lookup_still_valid = -546779784,
> vp = 0x6000001aa}
> (kgdb)

I was wrong on this one, as 'm' is actually a pointer, so the above is not correct.
Maybe 'info reg' in frame 5 would give a clue about the value of 'fs'.

I am not sure how to proceed from here.
The only thing I can think of is a lock order reversal between the vnode lock
and the page busying quasi-lock.  But examining the code, I cannot spot it.
Another possibility is a leak of a busy page, but that's hard to debug.

How hard is it to reproduce the problem?

Maybe Konstantin would have some ideas or suggestions.

-- 
Andriy Gapon