Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 13 Oct 2004 12:23:22 +0300
From:      Andriy Gapon <avg@icyb.net.ua>
To:        freebsd-emulation@freebsd.org
Subject:   linux threaded application hang
Message-ID:  <416CF40A.3070008@icyb.net.ua>

next in thread | raw e-mail | index | archive | help

This is probably not a problem of our linux emulation, but I think this
is most likely place for me to get help.
Here's what I have:

linux_base-7.1_7
linux_devtools-7.1_3

I have installed IBM WebSphere MQ:

MQSeriesJava-5.3.0-1
MQSeriesSDK-5.3.0-1
MQSeriesRuntime-5.3.0-1
MQSeriesServer-5.3.0-1
MQSeriesClient-5.3.0-1
MQSeriesJava-U486878-5.3.0-4
MQSeriesRuntime-U486878-5.3.0-4
MQSeriesSDK-U486878-5.3.0-4
MQSeriesServer-U486878-5.3.0-4
MQSeriesClient-U486878-5.3.0-4


I am trying to execute `setmqcap -1` command and it hangs, seemingly
because of inter-thread communication problem.
Here's ktrace | linux_kdump of the ineteresting part:

#main thread creates a new thread/process with pid 5716
  5715 setmqcap CALL  linux_clone(0xf00,0x805b3a0)
  5715 setmqcap RET   linux_clone 5716/0x1654
  5716 setmqcap RET   linux_fork 0

#writes/reads are made to a pipe and probably are some sort of
#inter-thread communication in old linux threads
#fd 4 in 5715 is writing end and fd 3 in 5716 is reading end of the pipe
  5715 setmqcap CALL  write(0x4,0xbfbfd900,0x94)
  5715 setmqcap RET   write 148/0x94
  5716 setmqcap CALL  linux_rt_sigprocmask(0x2,0x805b304,0,0x8)
  5716 setmqcap RET   linux_rt_sigprocmask 0
  5715 setmqcap CALL  linux_rt_sigprocmask(0x2,0,0xbfbfd9d4,0x8)
  5715 setmqcap RET   linux_rt_sigprocmask 0
  5716 setmqcap CALL  read(0x3,0x805b264,0x94)
  5716 setmqcap GIO   fd 3 read 148 bytes
  5716 setmqcap RET   read 148/0x94
  5715 setmqcap CALL  write(0x4,0xbfbfd9c0,0x94)
  5715 setmqcap GIO   fd 4 wrote 148 bytes
  5715 setmqcap RET   write 148/0x94

#child threas performs timed poll on a pipe before reading from it
#0x7d0 == 2000ms == 2s
  5716 setmqcap CALL  poll(0x805b25c,0x1,0x7d0)
  5716 setmqcap RET   poll 1
  5715 setmqcap CALL  linux_rt_sigprocmask(0x2,0,0xbfbfd910,0x8)
  5715 setmqcap RET   linux_rt_sigprocmask 0
  5716 setmqcap CALL  getppid
  5716 setmqcap RET   getppid 5715/0x1653

#it seems that main thread starts to wait on some condition
#until it is awaken by child thread by a signal
  5715 setmqcap CALL  linux_rt_sigsuspend(0xbfbfd910,0x8)
  5716 setmqcap CALL  read(0x3,0x805b264,0x94)
  5716 setmqcap GIO   fd 3 read 148 bytes
  5716 setmqcap RET   read 148/0x94
  5716 setmqcap CALL  linux_mmap(0x805b194)
  5716 setmqcap RET   linux_mmap -1086394368/0xbf3ef000
  5716 setmqcap CALL  mprotect(0xbf3ef000,0x1000,0)
  5716 setmqcap RET   mprotect 0

#child thread creates another thread, that's probably not important
  5716 setmqcap CALL  linux_clone(0xf21,0xbf3ffbf8)
  5716 setmqcap RET   linux_clone 5717/0x1655
  5717 setmqcap RET   linux_fork 0

#child thread wants to wake up main thread (pid == 5715 == 0x1653)
#by sending signal 0x20 == 32 (== linux SIGRTMIN ?)
  5716 setmqcap CALL  linux_kill(0x1653,0x20)

#but it seems to fail.. or not ?
#I am not sure how to interpret "RESTART" here
  5716 setmqcap RET   linux_kill RESTART
  5717 setmqcap CALL  linux_getpid
  5717 setmqcap RET   linux_getpid 5717/0x1655
  5716 setmqcap CALL  poll(0x805b25c,0x1,0x7d0)
  5717 setmqcap CALL  linux_rt_sigprocmask(0x2,0xbf3ffca4,0,0x8)
  5717 setmqcap RET   linux_rt_sigprocmask 0
  5717 setmqcap CALL  linux_sched_setscheduler(0x1655,0,0xbf3ffd28)
  5717 setmqcap RET   linux_sched_setscheduler 0
  5717 setmqcap CALL  linux_getpid
  5717 setmqcap RET   linux_getpid 5717/0x1655
  5717 setmqcap CALL  linux_getpid
  5717 setmqcap RET   linux_getpid 5717/0x1655
  5717 setmqcap CALL  linux_sched_setscheduler(0x1655,0,0xbf3ff7a8)
  5717 setmqcap RET   linux_sched_setscheduler 0
  5717 setmqcap CALL  linux_rt_sigprocmask(0,0xbf3feecc,0xbf3ff5ac,0x8)
  5717 setmqcap RET   linux_rt_sigprocmask 0
  5717 setmqcap CALL  linux_rt_sigaction(0x1c,0xbf3fefcc,0xbf3fef3c,0x8)
  5717 setmqcap RET   linux_rt_sigaction 0
  5717 setmqcap CALL  linux_rt_sigprocmask(0x2,0,0xbf3feb3c,0x8)
  5717 setmqcap RET   linux_rt_sigprocmask 0

#second child thread starts to wait for a signal
  5717 setmqcap CALL  linux_rt_sigsuspend(0xbf3feb3c,0x8)

#first child thread continues polling for read communication pipe
#but there is nothing there because 5715 is still suspended
  5716 setmqcap RET   poll 0
  5716 setmqcap CALL  getppid
  5716 setmqcap RET   getppid 5715/0x1653
  5716 setmqcap CALL  poll(0x805b25c,0x1,0x7d0)
  5716 setmqcap RET   poll 0
  5716 setmqcap CALL  getppid
  5716 setmqcap RET   getppid 5715/0x1653
  5716 setmqcap CALL  poll(0x805b25c,0x1,0x7d0)
  5716 setmqcap RET   poll 0
  5716 setmqcap CALL  getppid
  5716 setmqcap RET   getppid 5715/0x1653
  5716 setmqcap CALL  poll(0x805b25c,0x1,0x7d0)
  5716 setmqcap RET   poll 0
  5716 setmqcap CALL  getppid
  5716 setmqcap RET   getppid 5715/0x1653
  5716 setmqcap CALL  poll(0x805b25c,0x1,0x7d0)
#this keeps going on forever (until program is interrupted)

If my understanding of how old linux threads work is correct, then it
seems that the problem is that thread 5715 is never waken up from
linux_rt_sigsuspend. I am not sure if this is because that signal 32 was
never delivered or something more fishy is going on there:

I have attached with gdb to main thread process and here's some data:
(gdb) bt
#0  0x481c69b6 in __sigsuspend (set=0xbfbfd8b0) at
../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1  0x48186d45 in __pthread_wait_for_restart_signal (self=0x481905a0) at
pthread.c:978
#2  0x48186e1d in __pthread_create_2_1 (thread=0xbfbfdaf4,
attr=0xbfbfdf80, start_routine=0x480b6b80 <ThreadMain>, arg=0x8058438)
at restart.h:34
#3  0x480b5683 in xcsCreateThreadWithHandle () at eval.c:41
#4  0x480861ed in xehStartAsySignalMonitor () at eval.c:41
#5  0x480824d0 in xehInitialize () at eval.c:41
#6  0x480928b5 in InitPrivateServices () at eval.c:41
#7  0x48094531 in xcsInitialize () at eval.c:41
#8  0x08049a01 in main () at eval.c:41
#9  0x481b4336 in __libc_start_main (main=0x80496d0 <main>, argc=1,
ubp_av=0xbfbfe744, init=0x8049108 <_init>, fini=0x80540a0 <_fini>,
    rtld_fini=0x4806325c <_dl_fini>, stack_end=0xbfbfe73c) at
../sysdeps/generic/libc-start.c:129
(gdb) p set
$1 = (sigset_t *) 0x482c87f0
(gdb) p *set
$2 = {__val = {1250168, 1208405312, 1208363760, 1209738430, 1209738446,
1210126092, 1209738478, 1209738494, 1210131344, 1209738526, 1209738542,
1209738558,
    1209738574, 1209738590, 1209554352, 1209738622, 1209738638,
1209738654, 1209738670, 1209738686, 1209738702, 1209738718, 1209738734,
1209738750,
    1209738766, 1209738782, 1209738798, 1210090948, 1209738830,
1209738846, 1209738862, 1209738878}}

if I understand correctly linux sigset_t then mask for signal 32 should
be bit 31 (highest) of __val[0]:

1250168 == 100110001001101111000

e.g. it seems that the signal was blocked.

Unfortunately I was not able to attach to other threads, but this looks
insane, especially considering this:

(gdb) fr 1
#1  0x48186d45 in __pthread_wait_for_restart_signal (self=0x481905a0) at
pthread.c:978
978     pthread.c: No such file or directory.
        in pthread.c
(gdb) p mask
$3 = {__val = {0, 0, 0, 0, 0, 1208347125, 0, 3217021232, 0, 0, 0, 0, 0,
0, 3, 0, 0, 0, 0, 0, 16777216, 0, 1209525676, 102, 1209536638, 1209529244,
    1209524044, 1208404144, 7, 1210900512, 3, 1210877936}}
(gdb) p __pthread_sig_restart
$4 = 32
(gdb) p &mask
$5 = (sigset_t *) 0xbfbfd890

How can &mask in frame 1 be different from set in frame 0 ?

P.S.
I tried setting LD_ASSUME_KERNEL to various values from 2.2.5 to 2.4.2,
but it didn't change anything.

-- 
Andriy Gapon



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?416CF40A.3070008>