Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 27 Mar 2012 16:40:13 GMT
From:      Christian Esken <Christian.Esken@trivago.com>
To:        freebsd-bugs@FreeBSD.org
Subject:   Re: misc/166340: Process under FreeBSD 9.0 hangs in uninterruptable sleep with apparently no syscall (empty wchan)
Message-ID:  <201203271640.q2RGeDGv000643@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help
The following reply was made to PR kern/166340; it has been noted by GNATS.

From: Christian Esken <Christian.Esken@trivago.com>
To: bug-followup@freebsd.org
Cc: Konstantin Belousov <kostikbel@gmail.com>, avg@freebsd.org
Subject: Re: misc/166340: Process under FreeBSD 9.0 hangs in
 uninterruptable sleep with apparently no syscall (empty wchan)
Date: Tue, 27 Mar 2012 17:30:27 +0200

 --=-0r5Lk3awdhxqDAjyo6lR
 Content-Type: text/plain; charset="UTF-8"
 Content-Transfer-Encoding: 7bit
 
 Konstantin Belousov wrote:
 > Thank you for the data. Semi-obviously, the callout_stop() call in
 > sleepq_check_timeout() have to return 0, otherwise we would not call
 > mi_switch() there. But I do not see how this can happen, because
 > the callout state, printed from kgdb, still indicates that callout
 > is pending. Callout cannot be reset while in sleepq code.
 > 
 > So there are two possible routes to go forward: preferrable is for
 > you to extract the self-contained C program that would illustrate
 > the issue and send this sample to me. Second is to recompile your
 > kernel with INVARIANTS/WITNESS and possibly KTR and see what happen.
 
 I repeated the test with INVARIANTS/WITNESS and KTR compiled in
 (actually WITNESS was already included during the last test).
 
 I ran KTR with nothing filtered out, and formatted the dump with
 "ktrdump -cftH  -i ktr.out". The whole log is excessive (1GB), so
 I have extrated two short sections (see attachment).
 
 The first section shows the last action of the application, namely a
 succselful sendto() to a TCP socket, and then waiting for an answer via
 recvfrom().
 The second section illustrates the lock/unlock sequence of the sleep
 mutex for the recfrom(). It goes like LOCK, LOCK, UNLOCK.
 
 This time the signal status is different. We have a pending signal:
 USER     PID  PPID  PENDING   CAUGHT  IGNORED  BLOCKED STAT WCHAN
 nobody  9163     1     4000 80005006 79f88010        0 D    -     
 
 Looks like SIGPROF (27). Just wondering where it comes from.
 
 
 
 By the way: I evaluated the possibility to implement a standalone test
 case. It would be extremely complicated, as the issue is while writing
 to the socket, and thus it would require extracting the socket code from
 the Thrift procect (http://thrift.apache.org/ ).
 
   Christian
 
 
 
 --=-0r5Lk3awdhxqDAjyo6lR
 Content-Disposition: attachment; filename="wait_recvfrom.txt"
 Content-Type: text/plain; name="wait_recvfrom.txt"; charset="UTF-8"
 Content-Transfer-Encoding: 7bit
 
 Last actions of pid 9163 (serelog):
 sendto() succesful
 recvfrom() waits for data, uisng sleep mutex
 
 
 7463551   5    5644560314159 /usr/src/sys/kern/kern_sx.c:352          0xfffffe01972b2480 XUNLOCK (sx) so_snd_sx 0xfffffe0344b07490 r = 0 at /usr/src/sys/kern/uipc_sockbuf.c:160
 7463552   1    5644560316280 /usr/src/sys/kern/kern_mutex.c:205       0xfffffe01bcf4d000 LOCK (sleep mutex) kqueue 0xfffffe03eef5eb00 r = 0 at /usr/src/sys/kern/kern_event.c:1779
 7463553   5    5644560319107 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:149 0xfffffe01972b2480 syscall: p=0xfffffe01bc1b0910 error=0 return 0x77d 0x77d
 7463557   5    5644560329931 /usr/src/sys/kern/subr_trap.c:101        0xfffffe01972b2480 userret: thread 0xfffffe01972b2480 (pid 9163, serelog)
 7463559   4    5644560336733 /usr/src/sys/vm/uma_core.c:1975          0xfffffe0008364480 uma_zalloc_arg thread 8364480 zone mbuf flags 1
 7463561   6    5644560344432 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:84 0xfffffe01972ae900 syscall: td=0xfffffe01972ae900 pid 9767 gettimeofday (0x7fffffffac40, 0, 0x43fd18)
 7463562   1    5644560347528 /usr/src/sys/kern/kern_mutex.c:222       0xfffffe01bcf4d000 UNLOCK (sleep mutex) kqueue 0xfffffe03eef5eb00 r = 0 at /usr/src/sys/kern/kern_event.c:1796
 7463563   6    5644560348788 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:149 0xfffffe01972ae900 syscall: p=0xfffffe01bc1ad000 error=0 return 0 0x43fd18
 7463564   5    5644560351047 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:187 0xfffffe01972b2480 syscall sendto exit thread 0xfffffe01972b2480 pid 9163 proc serelog
 7463565   1    5644560354848 /usr/src/sys/kern/kern_mutex.c:205       0xfffffe01bcf4d000 LOCK (sleep mutex) kqueue 0xfffffe02f7525700 r = 0 at /usr/src/sys/kern/kern_event.c:1779
 7463567   5    5644560360499 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:84 0xfffffe01972b2480 syscall: td=0xfffffe01972b2480 pid 9163 gettimeofday (0x7fffffffc600, 0, 0xffffffff
 )
 7463568   2    5644560364617 /usr/src/sys/kern/kern_mutex.c:356       0xfffffe004e384000 _mtx_lock_sleep: taskqueue contested (lock=0xfffffe0008375000) at /usr/src/sys/kern/kern_mutex.c
 :147
 7463569   5    5644560366559 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:149 0xfffffe01972b2480 syscall: p=0xfffffe01bc1b0910 error=0 return 0 0xffffffff
 7463570   3    5644560369374 /usr/src/sys/kern/kern_mutex.c:222       0xfffffe0008375000 UNLOCK (sleep mutex) taskqueue 0xfffffe004e2604b0 r = 0 at /usr/src/sys/kern/subr_taskqueue.c:21
 6
 7463572   6    5644560378372 /usr/src/sys/kern/subr_trap.c:101        0xfffffe01972ae900 userret: thread 0xfffffe01972ae900 (pid 9767, httpd)
 7463573   3    5644560380378 /usr/src/sys/kern/kern_mutex.c:205       0xfffffe0008375000 LOCK (sleep mutex) bio queue 0xffffffff81146ad0 r = 0 at /usr/src/sys/geom/geom_io.c:77
 7463574   4    5644560385085 /usr/src/sys/kern/kern_mutex.c:205       0xfffffe0008364480 LOCK (sleep mutex) mbuf 0xfffffe043ffe5e10 r = 0 at /usr/src/sys/vm/uma_core.c:2013
 7463575   1    5644560388724 /usr/src/sys/kern/kern_mutex.c:222       0xfffffe01bcf4d000 UNLOCK (sleep mutex) kqueue 0xfffffe02f7525700 r = 0 at /usr/src/sys/kern/kern_event.c:1796
 7463576   6    5644560391648 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:187 0xfffffe01972ae900 syscall gettimeofday exit thread 0xfffffe01972ae900 pid 9767 proc httpd
 7463577   3    5644560394202 /usr/src/sys/kern/kern_mutex.c:222       0xfffffe0008375000 UNLOCK (sleep mutex) bio queue 0xffffffff81146ad0 r = 0 at /usr/src/sys/geom/geom_io.c:84
 7463578   1    5644560396480 /usr/src/sys/kern/kern_mutex.c:205       0xfffffe01bcf4d000 LOCK (sleep mutex) kqueue 0xfffffe01e96bd000 r = 0 at /usr/src/sys/kern/kern_event.c:1779
 7463580   5    5644560404315 /usr/src/sys/kern/subr_trap.c:101        0xfffffe01972b2480 userret: thread 0xfffffe01972b2480 (pid 9163, serelog)
 7463581   2    5644560408289 /usr/src/sys/kern/kern_mutex.c:374       0xfffffe004e384000 _mtx_lock_sleep: spinning on 0xfffffe004e2604b0 held by 0xfffffe0008375000
 7463582   5    5644560410639 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:187 0xfffffe01972b2480 syscall gettimeofday exit thread 0xfffffe01972b2480 pid 9163 proc serelog
 7463583   2    5644560414981 /usr/src/sys/kern/kern_mutex.c:205       0xfffffe004e384000 LOCK (sleep mutex) taskqueue 0xfffffe004e2604b0 r = 0 at /usr/src/sys/kern/kern_mutex.c:147
 7463584   5    5644560419167 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:84 0xfffffe01972b2480 syscall: td=0xfffffe01972b2480 pid 9163 recvfrom (0x6, 0x7fffffffc734, 0x4)
 7463585   1    5644560421688 /usr/src/sys/kern/kern_mutex.c:222       0xfffffe01bcf4d000 UNLOCK (sleep mutex) kqueue 0xfffffe01e96bd000 r = 0 at /usr/src/sys/kern/kern_event.c:1796
 7463586   5    5644560423707 /usr/src/sys/kern/kern_sx.c:291          0xfffffe01972b2480 XLOCK (sx) so_rcv_sx 0xfffffe0344b073a0 r = 0 at /usr/src/sys/kern/uipc_sockbuf.c:148
 7463587   3    5644560426330 /usr/src/sys/geom/geom_io.c:678          0xfffffe0008375000 g_up biodone bp 0xfffffe004e8a0740 provider da0s1 off 134525138944 len 131072
 7463588   5    5644560428231 /usr/src/sys/kern/kern_mutex.c:205       0xfffffe01972b2480 LOCK (sleep mutex) so_rcv 0xfffffe0344b07380 r = 0 at /usr/src/sys/kern/uipc_socket.c:1488
 
 This is the last place where I see 0xfffffe01972b2480 or "pid 9163".
 The value behind so_rcv os 0xfffffe0344b07380, and that one comes twice again:
 
 
 7464953   4    5644563693233 /usr/src/sys/kern/kern_mutex.c:205       0xfffffe0008364480 LOCK (sleep mutex) so_rcv 0xfffffe0344b07380 r = 0 at /usr/src/sys/netinet/tcp_input.c:2834
 7464954   0    5644563696896 /usr/src/sys/kern/kern_mutex.c:205       0xfffffe004e75a900 LOCK (sleep mutex) KNOTE 0xfffffe004e104010 r = 0 at /usr/src/sys/vm/uma_core.c:2013
 7464955   6    5644563699004 /usr/src/sys/kern/subr_trap.c:101        0xfffffe01972ae900 userret: thread 0xfffffe01972ae900 (pid 9767, httpd)
 7464956   1    5644563701340 /usr/src/sys/kern/kern_mutex.c:222       0xfffffe01bcf4d000 UNLOCK (sleep mutex) kqueue 0xfffffe01f3fb8900 r = 0 at /usr/src/sys/kern/kern_event.c:1796
 7464957   6    5644563704636 /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:187 0xfffffe01972ae900 syscall gettimeofday exit thread 0xfffffe01972ae900 pid 9767 proc httpd
 7464958   3    5644563706986 /usr/src/sys/geom/geom_io.c:165          0xfffffe0008375000 #2 0xffffffff807c5a45 at g_io_schedule_up+0x175
 7464959   4    5644563708981 /usr/src/sys/kern/kern_mutex.c:222       0xfffffe0008364480 UNLOCK (sleep mutex) so_rcv 0xfffffe0344b07380 r = 0 at /usr/src/sys/kern/uipc_sockbuf.c:201
 
 What I find notewothy is, that the lock/unlock sequence with so_rcv 0xfffffe0344b07380 looks odd:
 LOCK, LOCK, UNLOCK
 
 Natively one would expect to see one more UNLOCK like this:
 LOCK, UNLOCK, LOCK, UNLOCK
 
 --=-0r5Lk3awdhxqDAjyo6lR--
 



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201203271640.q2RGeDGv000643>