Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 07 Mar 2017 11:30:09 +0100
From:      Harry Schmalzbauer <freebsd@omnilan.de>
To:        hiren panchasara <hiren@strugglingcoder.info>
Cc:        stable@freebsd.org, Rick Macklem <rmacklem@uoguelph.ca>
Subject:   Re: 'show alllocks' of completely locked machine [Was: Re: Complete IO lockup, state "ufs" from userland, debuging help wanted]
Message-ID:  <58BE8BB1.1000200@omnilan.de>
In-Reply-To: <20170306201042.GB71537@strugglingcoder.info>
References:  <58BC8A44.8020607@omnilan.de> <58BDBEE3.6080601@omnilan.de> <20170306201042.GB71537@strugglingcoder.info>

next in thread | previous in thread | raw e-mail | index | archive | help
Bezüglich hiren panchasara's Nachricht vom 06.03.2017 21:10 (localtime):
> On 03/06/17 at 08:56P, Harry Schmalzbauer wrote:
>>  Bez?glich Harry Schmalzbauer's Nachricht vom 05.03.2017 22:59 (localtime):
>>>  Hello,
>>>
>>> I can easily lock up FreeBSD stable/11 from userland. Not that I want to...
>>> I'm running squid, which starts an authentication helper
>>> "*negotiate_kerberos_auth*", which seems to be the culprit.
>>> Completely all IO is blocked, there's no way to get anything from any
>>> filesystem.
>>> All non IO-requesting processes(threads) run well, including sshd and
>>> shells.
>>> There's no load (neither cpu nor io) just any process requesting io
>>> stucks in state "ufs"
>>>
>>> Can anyone help me finding out what's going wrong?
>>> Serial console is available.
>>
>> Dear hackers,
>>
>> I managed to get into DDB, but I'm lost from there?
>>
>> What information could be usefull to find out the cause of this complete
>> lockup?
>>
>> I'd need someone who could guide me through ? I'd pay for a debuging
>> lesson! (quiet constrained budget though)
>>
>> This happens when the machine got stuck:
>>
>> intr_event_handle() at intr_event_handle+0x9c/frame 0xfffffe0093dcb7d0
>> intr_execute_handlers() at intr_execute_handlers+0x48/frame
>> 0xfffffe0093dcb800
>> lapic_handle_intr() at lapic_handle_intr+0x68/frame 0xfffffe0093dcb840
>> Xapic_isr1() at Xapic_isr1+0xb7/frame 0xfffffe0093dcb840
>> --- interrupt, rip = 0xffffffff807b9bd6, rsp = 0xfffffe0093dcb910, rbp =
>> 0xfffffe0093dcb910 ---
>> acpi_cpu_c1() at acpi_cpu_c1+0x6/frame 0xfffffe0093dcb910
>> acpi_cpu_idle() at acpi_cpu_idle+0x2ea/frame 0xfffffe0093dcb960
>> cpu_idle_acpi() at cpu_idle_acpi+0x3f/frame 0xfffffe0093dcb980
>> cpu_idle() at cpu_idle+0x8f/frame 0xfffffe0093dcb9a0
>> sched_idletd() at sched_idletd+0x436/frame 0xfffffe0093dcba70
>> fork_exit() at fork_exit+0x84/frame 0xfffffe0093dcbab0
>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0093dcbab0
>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>>
>>
>> db> show alllocks
>> Process 1259 (negotiate_kerberos_) thread 0xfffff80005ddea00 (100096)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 1258 (negotiate_kerberos_) thread 0xfffff80005ddc500 (100252)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 1257 (negotiate_kerberos_) thread 0xfffff80005ddda00 (100247)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 1256 (negotiate_kerberos_) thread 0xfffff80065612500 (100261)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 1255 (negotiate_kerberos_) thread 0xfffff80065612a00 (100260)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 1254 (negotiate_kerberos_) thread 0xfffff80065613000 (100257)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 1253 (negotiate_kerberos_) thread 0xfffff80065614000 (100254)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 1252 (negotiate_kerberos_) thread 0xfffff800651e1000 (100246)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 1251 (negotiate_kerberos_) thread 0xfffff80005ddca00 (100251)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 1250 (negotiate_kerberos_) thread 0xfffff800651e2a00 (100241)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 1251 (negotiate_kerberos_) thread 0xfffff80005ddca00 (100251)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 1250 (negotiate_kerberos_) thread 0xfffff800651e2a00 (100241)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 1247 (sqtop) thread 0xfffff80065650a00 (100259)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 1184 (systat) thread 0xfffff80065613a00 (100255)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 1042 (negotiate_kerberos_) thread 0xfffff800651e2500 (100242)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 1041 (negotiate_kerberos_) thread 0xfffff800055e4000 (100078)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 639 (cron) thread 0xfffff80005afa000 (100095)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 605 (log_file_daemon) thread 0xfffff800057eda00 (100088)
>> exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffff80005b547b0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/uipc_sockbuf.c:283
>> Process 604 (ext_ldap_group_acl) thread 0xfffff800057dc000 (100087)
>> exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffff80005b550f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/uipc_sockbuf.c:283
>> Process 603 (ext_ldap_group_acl) thread 0xfffff80005478000 (100063)
>> exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffff80005b410f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/uipc_sockbuf.c:283
>> Process 601 (negotiate_kerberos_) thread 0xfffff80005afc000 (100085)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 600 (negotiate_kerberos_) thread 0xfffff80005afc500 (100084)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 599 (negotiate_kerberos_) thread 0xfffff80005478a00 (100061)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 598 (negotiate_kerberos_) thread 0xfffff8000547a000 (100057)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 597 (negotiate_kerberos_) thread 0xfffff80005afca00 (100083)
>> exclusive lockmgr ufs (ufs) r = 0 (0xfffff800057555f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/unionfs/union_vnops.c:1905
>> Process 596 (negotiate_kerberos_) thread 0xfffff8000559a500 (100068)
>> exclusive lockmgr ufs (ufs) r = 0 (0xfffff800055125f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/unionfs/union_vnops.c:1936
>> exclusive lockmgr ufs (ufs) r = 0 (0xfffff8000523d068) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/unionfs/union_vnops.c:1905
>> Process 595 (negotiate_kerberos_) thread 0xfffff800055e3a00 (100079)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
>> Process 594 (negotiate_kerberos_) thread 0xfffff80005599000 (100072)
>> shared lockmgr ufs (ufs) r = 0 (0xfffff8000523d5f0) locked @
>> /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_lookup.c:611
> 
> IMO, you'd want to take a crashdump so you can analyze it better.
> But first while you are at db> prompt,
> 
> db> show allchains

Thank you very much!
That's a empty one.


> db> show sleepchain

thread 100004 (pid 11, idle: cpu1) running on CPU 1

Won't help much I guess, since:
 100004 (0xfffff8000287da00) (stack 0xfffffe0093dc8000)
kdb_alt_break_internal() at kdb_alt_break_internal+0x106/frame
0xfffffe0093dcb700


> Paste that o/p here and then take the dump:
> db> call doadump

This backtrace looks similar to me:
intr_event_handle() at intr_event_handle+0x9c/frame 0xfffffe0093dcb7d0
intr_execute_handlers() at intr_execute_handlers+0x48/frame
0xfffffe0093dcb800
lapic_handle_intr() at lapic_handle_intr+0x68/frame 0xfffffe0093dcb840
Xapic_isr1() at Xapic_isr1+0xb7/frame 0xfffffe0093dcb840
--- interrupt, rip = 0xffffffff807b9bd6, rsp = 0xfffffe0093dcb910, rbp =
0xfffffe0093dcb910 ---
acpi_cpu_c1() at acpi_cpu_c1+0x6/frame 0xfffffe0093dcb910
acpi_cpu_idle() at acpi_cpu_idle+0x2ea/frame 0xfffffe0093dcb960
cpu_idle_acpi() at cpu_idle_acpi+0x3f/frame 0xfffffe0093dcb980
cpu_idle() at cpu_idle+0x8f/frame 0xfffffe0093dcb9a0
sched_idletd() at sched_idletd+0x436/frame 0xfffffe0093dcba70
fork_exit() at fork_exit+0x84/frame 0xfffffe0093dcbab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0093dcbab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

No swap unfortunately, that's a newly added service to a production
machine :-(

Rick pointed me to the right direction I think.  I haven't looked very
closely to what I posted (because I have no idea about that code), but
there's unionfs involved :-(
I thought the process wouldn't traverse /etc if I setenv KRB5_KTNAME and
KRB5_CONFIG outside /etc; /etc is a union mount, which I've been using
upt tp 10.3 without problems (but an uncommited locking-panic
workarround).  This still improves situation on 11, but there are other
areas in 11's unionfs(5?) which unfortunately seem to make it unusable.
 Locally skill by time product is far too small to start looking into
unionfs with the expectation to get anything useful :-(

Haven't decided yet if I take unionfs out of servie or better go back to
10.3.  I'm doing updates with 'newfs /dev/label/ROOT' (the like) and
unionfs is irreplacable in that scenario.

Thanks,

-harry



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?58BE8BB1.1000200>