Date:      Fri, 29 Nov 2019 08:37:22 -0800
From:      bob prohaska <fbsd@www.zefox.net>
To:        freebsd-arm@freebsd.org
Subject:   RPI3, panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffffd003ad8f000, blocked for 1802596 ticks
Message-ID:  <20191129163722.GA40850@www.zefox.net>

This morning an RPi3 was found reporting "panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffffd003ad8f000, blocked for 1802596 ticks".

System and kernel are at r355024.

Late last night the machine seemed to have stalled on a make of www/chromium:
make-related processes showed zero percent CPU in top, storage activity was low,
and the log files showed no progress, but the machine still responded to shell
commands. This is the same machine that stalled with "smsc0: failed to create
mbuf" on Nov. 25th, though in that case it didn't panic on its own.
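For scale, the tick count in the panic message converts to roughly half an hour of wall-clock time; a minimal sketch of the arithmetic, assuming the usual default of kern.hz=1000 (worth verifying with `sysctl kern.hz`):

```shell
# Ticks from the panic message, divided by the clock rate (hz).
ticks=1802596
hz=1000          # assumed default; verify with: sysctl kern.hz
secs=$((ticks / hz))
echo "blocked for ${secs} s (~$((secs / 60)) min)"
```

That lands just past deadlkres's default 1800-second sleep threshold (debug.deadlkres.slptime_threshold), consistent with the panic firing as soon as the thread crossed it.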

This time I captured a backtrace:
panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffffd003ad8f000, blocked for 1802596 ticks

cpuid = 0
time = 1575027501
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
	 pc = 0xffff000000729e4c  lr = 0xffff0000001066c8
	 sp = 0xffff0000518d2580  fp = 0xffff0000518d2790

db_trace_self_wrapper() at vpanic+0x18c
	 pc = 0xffff0000001066c8  lr = 0xffff000000400d7c
	 sp = 0xffff0000518d27a0  fp = 0xffff0000518d2850

vpanic() at panic+0x44
	 pc = 0xffff000000400d7c  lr = 0xffff000000400b2c
	 sp = 0xffff0000518d2860  fp = 0xffff0000518d28e0

panic() at deadlkres+0x314
	 pc = 0xffff000000400b2c  lr = 0xffff00000039cdf0
	 sp = 0xffff0000518d28f0  fp = 0xffff0000518d2940

deadlkres() at fork_exit+0x7c
	 pc = 0xffff00000039cdf0  lr = 0xffff0000003c10d4
	 sp = 0xffff0000518d2950  fp = 0xffff0000518d2980

fork_exit() at fork_trampoline+0x10
	 pc = 0xffff0000003c10d4  lr = 0xffff0000007471a4
	 sp = 0xffff0000518d2990  fp = 0x0000000000000000

KDB: enter: panic
[ thread pid 0 tid 100052 ]
Stopped at      0
db> bt
Tracing pid 0 tid 100052 td 0xfffffd0000bb5560
db_trace_self() at db_stack_trace+0xf8
         pc = 0xffff000000729e4c  lr = 0xffff000000103b0c
         sp = 0xffff0000518d2150  fp = 0xffff0000518d2180

db_stack_trace() at db_command+0x228
         pc = 0xffff000000103b0c  lr = 0xffff000000103784
         sp = 0xffff0000518d2190  fp = 0xffff0000518d2270

db_command() at db_command_loop+0x58
         pc = 0xffff000000103784  lr = 0xffff00000010352c
         sp = 0xffff0000518d2280  fp = 0xffff0000518d22a0

db_command_loop() at db_trap+0xf4
         pc = 0xffff00000010352c  lr = 0xffff000000106830
         sp = 0xffff0000518d22b0  fp = 0xffff0000518d24d0

db_trap() at kdb_trap+0x1d8
         pc = 0xffff000000106830  lr = 0xffff0000004492fc
         sp = 0xffff0000518d24e0  fp = 0xffff0000518d2590
        
kdb_trap() at do_el1h_sync+0xf4
         pc = 0xffff0000004492fc  lr = 0xffff000000747418
         sp = 0xffff0000518d25a0  fp = 0xffff0000518d25d0

do_el1h_sync() at handle_el1h_sync+0x78
         pc = 0xffff000000747418  lr = 0xffff00000072c878
         sp = 0xffff0000518d25e0  fp = 0xffff0000518d26f0

handle_el1h_sync() at kdb_enter+0x34
         pc = 0xffff00000072c878  lr = 0xffff000000448948
         sp = 0xffff0000518d2700  fp = 0xffff0000518d2790

kdb_enter() at vpanic+0x1a8
         pc = 0xffff000000448948  lr = 0xffff000000400d98
         sp = 0xffff0000518d27a0  fp = 0xffff0000518d2850

vpanic() at panic+0x44
         pc = 0xffff000000400d98  lr = 0xffff000000400b2c
         sp = 0xffff0000518d2860  fp = 0xffff0000518d28e0
        
panic() at deadlkres+0x314
         pc = 0xffff000000400b2c  lr = 0xffff00000039cdf0
         sp = 0xffff0000518d28f0  fp = 0xffff0000518d2940

deadlkres() at fork_exit+0x7c
         pc = 0xffff00000039cdf0  lr = 0xffff0000003c10d4
         sp = 0xffff0000518d2950  fp = 0xffff0000518d2980

fork_exit() at fork_trampoline+0x10
         pc = 0xffff0000003c10d4  lr = 0xffff0000007471a4
         sp = 0xffff0000518d2990  fp = 0x0000000000000000

db> 

The last top window reported:
last pid: 50623;  load averages:  0.32,  0.35,  0.39                                up 3+20:13:48  03:38:19
60 processes:  1 running, 58 sleeping, 1 zombie
CPU:  2.3% user,  0.0% nice,  8.1% system,  0.2% interrupt, 89.4% idle
Mem: 86M Active, 629M Inact, 7536K Laundry, 116M Wired, 26M Buf, 62M Free
Swap: 7194M Total, 1772M Used, 5422M Free, 24% Inuse
packet_write_wait: Connection to 50.1.20.28 port 22: Broken pipe
  PID USERNAME    THR PRI NICE   SIZE    RES SWAP STATE    C   TIME    WCPU COMMAND
67897 root          1  23    0   679M    93M  18M vmpfw    1  11:34   0.00% c++
37083 root          1  52    0    12M  3756K   0B ufs      1   4:29   0.00% find
37086 root          1  52    0    11M  2720K   0B piperd   0   0:00   0.00% cat
50623 bob           1  52    0    12M  2652K   0B nanslp   2   0:00   0.00% gstat
37079 root          1  52    0    11M  2172K   0B wait     1   0:00   0.00% sh
37084 root          1  52    0    11M  2168K   0B wait     3   0:00   0.00% sh
36967 root          1  52    0    11M  2164K   0B wait     0   0:00   0.00% sh
37077 root          1  52    0    11M  2164K   0B wait     3   0:00   0.00% sh
37066 root          1  52    0    11M  2160K   0B wait     1   0:00   0.00% sh
37070 root          1  52    0    11M  2156K   0B wait     2   0:00   0.00% sh
37067 root          1  52    0    11M  2156K   0B wait     3   0:00   0.00% sh
36949 root          1  32    0    11M  2156K   0B wait     1   0:00   0.00% sh
36959 root          1  52    0    11M  2156K   0B wait     2   0:00   0.00% sh
37078 root          1  52    0    11M  1860K   0B piperd   0   0:00   0.00% mail
36968 root          1  29    0    11M  1852K   0B piperd   1   0:00   0.00% mail

The swap usage was high, but lower than earlier in the make session. A log of both
make output and system activity was captured, the last entries of which are:
[12229/16483] c++ -MMD -MF obj/chrome/browser/ui/ui/ui_jumbo_41.o.d -DUSE_DBUS -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_NSS_CERTS=1 -DUSE_X11=1 -DOFFICIAL_BUILD -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE
.......(long command line, list of more-or-less routine warnings)

and

0/1530/1530/19178 mbuf clusters in use (current/cache/total/max)
procs     memory       page                      disks     faults       cpu
r b w     avm     fre  flt  re  pi  po    fr   sr mm0 da0   in   sy   cs us sy id
 0  0  9 2472776   63368  7126 159 228 129  6751 24168   0   0 23120  1800 15731 41 22 37
dT: 1.063s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    0      2      0      0    0.0      2     53    3.2      0      0    0.0    0.6  mmcsd0
    0      2      0      0    0.0      2     53    3.2      0      0    0.0    0.6  mmcsd0s2
    0      2      0      0    0.0      2     53    3.2      0      0    0.0    0.6  mmcsd0s2a
    0      2      0      0    0.0      2     53    3.3      0      0    0.0    0.6  ufs/rootfs
Fri Nov 29 03:37:44 PST 2019
Device          1K-blocks     Used    Avail Capacity
/dev/mmcsd0s2b    4404252   906260  3497992    21%
/dev/da0p6        5242880   907904  4334976    17%
Total             9647132  1814164  7832968    19%
Nov 29 03:37:13 www sshd[50211]: error: PAM: Authentication error for illegal user support from 103.133.104.114
Nov 29 03:37:13 www sshd[50211]: error: Received disconnect from 103.133.104.114 port 60251:14: No more user authentication methods available. [preauth]
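As an aside, the mbuf-cluster line in that log reduces to a utilization figure; a minimal sketch of the parsing, using only awk on the sample line quoted above:

```shell
# Split the current/cache/total/max counts out of the netstat -m line
# and report total clusters as a percentage of the maximum.
line='0/1530/1530/19178 mbuf clusters in use (current/cache/total/max)'
echo "$line" | awk -F'[/ ]' '{ printf "total %d of max %d (%.1f%%)\n", $3, $4, 100*$3/$4 }'
```

At roughly 8% of the cluster limit, mbuf clusters do not appear to have been near exhaustion in the last sample.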

The commands used to capture the system activity were:
#!/bin/sh
# Each pass logs one sample of everything; gstat's batch mode (-b)
# exits after its 1-second interval, so the loop cycles about once a second.
while true
do vmstat ; gstat -abd -I 1s ; date ; swapinfo ; tail -n 2 /var/log/messages ; netstat -m | grep "mbuf clusters"
done

About three hours elapsed between the last interactive observation
and the last logfile entry.

The log files are too large to post in their entirety, but they can be
put on a web server if that would be useful.

Thanks for reading, and for any suggestions.

bob prohaska



