Date: Fri, 29 Nov 2019 08:37:22 -0800
From: bob prohaska <fbsd@www.zefox.net>
To: freebsd-arm@freebsd.org
Subject: RPI3, panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffffd003ad8f000, blocked for 1802596 ticks
Message-ID: <20191129163722.GA40850@www.zefox.net>

This morning an RPi3 was found reporting

  panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffffd003ad8f000, blocked for 1802596 ticks

System and kernel are at r355024.

Late last night the machine seemed to have stalled on a make in www/chromium: the make-related processes showed zero percent CPU in top, storage activity was low and the log files showed no progress, but the system was still responsive to shell commands. This is the same machine that stalled with "smsc0: failed to create mbuf" on Nov. 25th, but in that case it didn't panic by itself.
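As an aside, if I'm doing the arithmetic right and kern.hz is 1000 on this kernel, 1802596 ticks works out to just over 1800 seconds, which I gather is the deadlock resolver's default sleep threshold. From what I can tell the DEADLKRES thresholds are adjustable via sysctl; the names below are from my (possibly faulty) reading of the source, so corrections are welcome:

    # show the deadlock-resolver knobs (thresholds are in seconds)
    sysctl debug.deadlkres

    # e.g. let a sleeping thread sit for an hour before the panic fires
    sysctl debug.deadlkres.slptime_threshold=3600

That wouldn't fix whatever the build is stuck on, of course, only give it more time before the box panics.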
This time I captured a backtrace:

panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffffd003ad8f000, blocked for 1802596 ticks
cpuid = 0
time = 1575027501
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
	 pc = 0xffff000000729e4c  lr = 0xffff0000001066c8
	 sp = 0xffff0000518d2580  fp = 0xffff0000518d2790
db_trace_self_wrapper() at vpanic+0x18c
	 pc = 0xffff0000001066c8  lr = 0xffff000000400d7c
	 sp = 0xffff0000518d27a0  fp = 0xffff0000518d2850
vpanic() at panic+0x44
	 pc = 0xffff000000400d7c  lr = 0xffff000000400b2c
	 sp = 0xffff0000518d2860  fp = 0xffff0000518d28e0
panic() at deadlkres+0x314
	 pc = 0xffff000000400b2c  lr = 0xffff00000039cdf0
	 sp = 0xffff0000518d28f0  fp = 0xffff0000518d2940
deadlkres() at fork_exit+0x7c
	 pc = 0xffff00000039cdf0  lr = 0xffff0000003c10d4
	 sp = 0xffff0000518d2950  fp = 0xffff0000518d2980
fork_exit() at fork_trampoline+0x10
	 pc = 0xffff0000003c10d4  lr = 0xffff0000007471a4
	 sp = 0xffff0000518d2990  fp = 0x0000000000000000
KDB: enter: panic
[ thread pid 0 tid 100052 ]
Stopped at      0
db> bt
Tracing pid 0 tid 100052 td 0xfffffd0000bb5560
db_trace_self() at db_stack_trace+0xf8
	 pc = 0xffff000000729e4c  lr = 0xffff000000103b0c
	 sp = 0xffff0000518d2150  fp = 0xffff0000518d2180
db_stack_trace() at db_command+0x228
	 pc = 0xffff000000103b0c  lr = 0xffff000000103784
	 sp = 0xffff0000518d2190  fp = 0xffff0000518d2270
db_command() at db_command_loop+0x58
	 pc = 0xffff000000103784  lr = 0xffff00000010352c
	 sp = 0xffff0000518d2280  fp = 0xffff0000518d22a0
db_command_loop() at db_trap+0xf4
	 pc = 0xffff00000010352c  lr = 0xffff000000106830
	 sp = 0xffff0000518d22b0  fp = 0xffff0000518d24d0
db_trap() at kdb_trap+0x1d8
	 pc = 0xffff000000106830  lr = 0xffff0000004492fc
	 sp = 0xffff0000518d24e0  fp = 0xffff0000518d2590
kdb_trap() at do_el1h_sync+0xf4
	 pc = 0xffff0000004492fc  lr = 0xffff000000747418
	 sp = 0xffff0000518d25a0  fp = 0xffff0000518d25d0
do_el1h_sync() at handle_el1h_sync+0x78
	 pc = 0xffff000000747418  lr = 0xffff00000072c878
	 sp = 0xffff0000518d25e0  fp = 0xffff0000518d26f0
handle_el1h_sync() at kdb_enter+0x34
	 pc = 0xffff00000072c878  lr = 0xffff000000448948
	 sp = 0xffff0000518d2700  fp = 0xffff0000518d2790
kdb_enter() at vpanic+0x1a8
	 pc = 0xffff000000448948  lr = 0xffff000000400d98
	 sp = 0xffff0000518d27a0  fp = 0xffff0000518d2850
vpanic() at panic+0x44
	 pc = 0xffff000000400d98  lr = 0xffff000000400b2c
	 sp = 0xffff0000518d2860  fp = 0xffff0000518d28e0
panic() at deadlkres+0x314
	 pc = 0xffff000000400b2c  lr = 0xffff00000039cdf0
	 sp = 0xffff0000518d28f0  fp = 0xffff0000518d2940
deadlkres() at fork_exit+0x7c
	 pc = 0xffff00000039cdf0  lr = 0xffff0000003c10d4
	 sp = 0xffff0000518d2950  fp = 0xffff0000518d2980
fork_exit() at fork_trampoline+0x10
	 pc = 0xffff0000003c10d4  lr = 0xffff0000007471a4
	 sp = 0xffff0000518d2990  fp = 0x0000000000000000
db>

The last top window reported (the column header was partly overwritten by the shell prompt when the ssh session died, so it is reconstructed here):

last pid: 50623;  load averages:  0.32,  0.35,  0.39    up 3+20:13:48  03:38:19
60 processes:  1 running, 58 sleeping, 1 zombie
CPU:  2.3% user,  0.0% nice,  8.1% system,  0.2% interrupt, 89.4% idle
Mem: 86M Active, 629M Inact, 7536K Laundry, 116M Wired, 26M Buf, 62M Free
Swap: 7194M Total, 1772M Used, 5422M Free, 24% Inuse

packet_write_wait: Connection to 50.1.20.28 port 22: Broken pipe
bob@raspberrypi:~ $

  PID USERNAME    THR PRI NICE   SIZE    RES   SWAP STATE    C   TIME    WCPU COMMAND
67897 root          1  23    0   679M    93M    18M vmpfw    1  11:34   0.00% c++
37083 root          1  52    0    12M  3756K     0B ufs      1   4:29   0.00% find
37086 root          1  52    0    11M  2720K     0B piperd   0   0:00   0.00% cat
50623 bob           1  52    0    12M  2652K     0B nanslp   2   0:00   0.00% gstat
37079 root          1  52    0    11M  2172K     0B wait     1   0:00   0.00% sh
37084 root          1  52    0    11M  2168K     0B wait     3   0:00   0.00% sh
36967 root          1  52    0    11M  2164K     0B wait     0   0:00   0.00% sh
37077 root          1  52    0    11M  2164K     0B wait     3   0:00   0.00% sh
37066 root          1  52    0    11M  2160K     0B wait     1   0:00   0.00% sh
37070 root          1  52    0    11M  2156K     0B wait     2   0:00   0.00% sh
37067 root          1  52    0    11M  2156K     0B wait     3   0:00   0.00% sh
36949 root          1  32    0    11M  2156K     0B wait     1   0:00   0.00% sh
36959 root          1  52    0    11M  2156K     0B wait     2   0:00   0.00% sh
37078 root          1  52    0    11M  1860K     0B piperd   0   0:00   0.00% mail
36968 root          1  29    0    11M  1852K     0B piperd   1   0:00   0.00% mail

The swap usage was high, but lower than earlier in the make session.

A log of both the make output and system activity was captured; the last make entries are:

[12229/16483] c++ -MMD -MF obj/chrome/browser/ui/ui/ui_jumbo_41.o.d -DUSE_DBUS -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_NSS_CERTS=1 -DUSE_X11=1 -DOFFICIAL_BUILD -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE
.......(long command line, list of more-or-less routine warnings)

and the last system-activity entries are:

0/1530/1530/19178 mbuf clusters in use (current/cache/total/max)
procs     memory        page                       disks     faults        cpu
r b w     avm    fre    flt  re  pi  po    fr    sr mm0 da0    in    sy    cs us sy id
0 0 9 2472776  63368   7126 159 228 129  6751 24168   0   0 23120  1800 15731 41 22 37
dT: 1.063s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    0      2      0      0    0.0      2     53    3.2      0      0    0.0    0.6  mmcsd0
    0      2      0      0    0.0      2     53    3.2      0      0    0.0    0.6  mmcsd0s2
    0      2      0      0    0.0      2     53    3.2      0      0    0.0    0.6  mmcsd0s2a
    0      2      0      0    0.0      2     53    3.3      0      0    0.0    0.6  ufs/rootfs
Fri Nov 29 03:37:44 PST 2019
Device          1K-blocks     Used    Avail Capacity
/dev/mmcsd0s2b    4404252   906260  3497992    21%
/dev/da0p6        5242880   907904  4334976    17%
Total             9647132  1814164  7832968    19%
Nov 29 03:37:13 www sshd[50211]: error: PAM: Authentication error for illegal user support from 103.133.104.114
Nov 29 03:37:13 www sshd[50211]: error: Received disconnect from 103.133.104.114 port 60251:14: No more user authentication methods available. [preauth]

The commands used to capture the system activity were:

#!/bin/sh
while true
do
    vmstat ; gstat -abd -I 1s ; date ; swapinfo ; tail -n 2 /var/log/messages ; netstat -m | grep "mbuf clusters"
done

About three hours elapsed between the last interactive observation and the last logfile entry. The log files are too large to post in their entirety, but they can be put on a webserver if that would be useful.

Thanks for reading, and for any suggestions.

bob prohaska
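P.S. If it would help, the next time this happens I can try to collect more from the db> prompt before power-cycling. From my reading of ddb(4), something like the following should be available (and "show alllocks" too, I think, if WITNESS is in the kernel):

db> ps
db> show allchains
db> alltrace

Please correct me if those aren't the right commands, or if there's something more useful to gather.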