Date:      Mon, 14 Nov 2016 13:45:58 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Henri Hennebert <hlh@restart.be>, freebsd-stable@FreeBSD.org
Cc:        Konstantin Belousov <kib@FreeBSD.org>
Subject:   Re: Freebsd 11.0 RELEASE - ZFS deadlock
Message-ID:  <f406ad95-bd3f-710c-5a2c-cc526d1a9812@FreeBSD.org>
In-Reply-To: <26512d69-94c2-92da-e3ea-50aebf17e3a0@restart.be>
References:  <0c223160-b76f-c635-bb15-4a068ba7efe7@restart.be> <43c9d4d4-1995-5626-d70a-f92a5b456629@FreeBSD.org> <a14d508d-351f-71f4-c7cc-ac73dbcde357@restart.be> <9d1f9a76-5a8d-6eca-9a50-907d55099847@FreeBSD.org> <6bc95dce-31e1-3013-bfe3-7c2dd80f9d1e@restart.be> <e4878992-a362-3f12-e743-8efa1347cabf@FreeBSD.org> <23a66749-f138-1f1a-afae-c775f906ff37@restart.be> <8e7547ef-87f7-7fab-6f45-221e8cea1989@FreeBSD.org> <6d991cea-b420-531e-12cc-001e4aeed66b@restart.be> <67f2e8bd-bff0-f808-7557-7dabe5cad78c@FreeBSD.org> <1cb09c54-5f0e-2259-a41a-fefe76b4fe8b@restart.be> <d25c8035-b710-5de9-ebe3-7990b2d0e3b1@FreeBSD.org> <9f20020b-e2f1-862b-c3fc-dc6ff94e301e@restart.be> <c1b7aa94-1f1d-7edd-8764-adb72fdc053c@FreeBSD.org> <599c5a5b-aa08-2030-34f3-23ff19d09a9b@restart.be> <32686283-948a-6faf-7ded-ed8fcd23affb@FreeBSD.org> <cf0fc1e3-b621-074e-1351-4dd89d980ddd@restart.be> <af4e0c2b-00f8-bbaa-bcb7-d97062a393b8@FreeBSD.org> <26512d69-94c2-92da-e3ea-50aebf17e3a0@restart.be>

On 14/11/2016 11:35, Henri Hennebert wrote:
> 
> 
> On 11/14/2016 10:07, Andriy Gapon wrote:
>> Hmm, I've just noticed another interesting thread:
>> Thread 668 (Thread 101245):
>> #0  sched_switch (td=0xfffff800b642aa00, newtd=0xfffff8000285f000, flags=<value
>> optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
>> #1  0xffffffff80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at
>> /usr/src/sys/kern/kern_synch.c:455
>> #2  0xffffffff805ae8da in sleepq_wait (wchan=0x0, pri=0) at
>> /usr/src/sys/kern/subr_sleepqueue.c:646
>> #3  0xffffffff805614b1 in _sleep (ident=<value optimized out>, lock=<value
>> optimized out>, priority=<value optimized out>, wmesg=0xffffffff809c51bc
>> "vmpfw", sbt=0, pr=<value optimized out>, flags=<value optimized out>) at
>> /usr/src/sys/kern/kern_synch.c:229
>> #4  0xffffffff8089d1c1 in vm_page_busy_sleep (m=0xfffff800df68cd40, wmesg=<value
>> optimized out>) at /usr/src/sys/vm/vm_page.c:753
>> #5  0xffffffff8089dd4d in vm_page_sleep_if_busy (m=0xfffff800df68cd40,
>> msg=0xffffffff809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
>> #6  0xffffffff80886be9 in vm_fault_hold (map=<value optimized out>, vaddr=<value
>> optimized out>, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at
>> /usr/src/sys/vm/vm_fault.c:495
>> #7  0xffffffff80885448 in vm_fault (map=0xfffff80011d66000, vaddr=<value
>> optimized out>, fault_type=4 '\004', fault_flags=<value optimized out>) at
>> /usr/src/sys/vm/vm_fault.c:273
>> #8  0xffffffff808d3c49 in trap_pfault (frame=0xfffffe0101836c00, usermode=1) at
>> /usr/src/sys/amd64/amd64/trap.c:741
>> #9  0xffffffff808d3386 in trap (frame=0xfffffe0101836c00) at
>> /usr/src/sys/amd64/amd64/trap.c:333
>> #10 0xffffffff808b7af1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
> 
> This thread is another program from the news system:
> 668 Thread 101245 (PID=49124: innfeed)  sched_switch (td=0xfffff800b642aa00,
> newtd=0xfffff8000285f000, flags=<value optimized out>) at
> /usr/src/sys/kern/sched_ule.c:1973
> 
>>
>> I strongly suspect that this is the thread that we were looking for.
>> I think that it has the vnode lock in the shared mode while trying to fault in a
>> page.
>>
>> Could you please check that by going to frame 6 and printing 'fs' and '*fs.vp'?
>> It'd be interesting to understand why this thread is waiting here.
>> So, please also print '*fs.m' and '*fs.object'.
> 
> No luck :-(
> (kgdb) fr 6
> #6  0xffffffff80886be9 in vm_fault_hold (map=<value optimized out>, vaddr=<value
> optimized out>, fault_type=4 '\004',
>     fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:495
> 495                        vm_page_sleep_if_busy(fs.m, "vmpfw");
> (kgdb) print fs
> Cannot access memory at address 0xffff00001fa0
> (kgdb)

Okay.  Luckily for us, it seems that 'm' is available in frame 5.  It also
happens to be the first field of 'struct faultstate'.  So, could you please go
to frame 5 and print '*m' and '*(struct faultstate *)m'?
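
For reference, a rough sketch of that kgdb session, assuming the frame meant
is frame 5 (the vm_page_sleep_if_busy frame in the backtrace above).  Output is
omitted because it depends on the contents of the dump:

```
(kgdb) fr 5
(kgdb) print *m
(kgdb) print *(struct faultstate *)m
```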

-- 
Andriy Gapon


