Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 23 Dec 2008 01:18:21 -0500
From:      "Dylan Cochran" <a134qaed@gmail.com>
To:        freebsd-stable <freebsd-stable@freebsd.org>
Subject:   Re: Hard lock on 7.1-RC1
Message-ID:  <bdf82f800812222218j3c77f564gf9b8e48bbbda7ca2@mail.gmail.com>
In-Reply-To: <bdf82f800812211205k1a90c926q473f468126d58d7f@mail.gmail.com>
References:  <bdf82f800812211205k1a90c926q473f468126d58d7f@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, Dec 21, 2008 at 3:05 PM, Dylan Cochran <a134qaed@gmail.com> wrote:
> I'm hitting a strange lockup on 7.1-RC1, where some socket operations
> seem to stall, as well as basic file operations. The only reproducable
> way I have of triggering it is by doing multiple inserts into
> phpmyadmin on lighttpd+fastcgi php5 + mysql51-server, though this
> isn't the only thing which triggers it, just the only one which is
> semi reliable. I've also reproduced this on another machine, set up
> specifically to rule out any machine specific problems (as they have
> different drive controllers, one uses gjournal, etc).
>
> I inititially built a kernel with SW_WATCHDOG, and attempted to use
> watchdogd and DDB to get an output from show locks, but the watchdogd
> hasn't panicked the machine, so at least devfs is still unlocked; I'm
> not able to get physical access to the machine until monday.
>
> The bug was introduced as far as I can tell, between 7.1-BETA2 and 7.1-RC=
1.
>
> Any suggestions on what I can test for tommorow?
>
I updated the kernel source to RELENG_7_1 as of a few hours ago, and
built with DEBUG_VFS_LOCKS as well.

Luckily the backtrace included the operating it was at before the
watchdog, which seems to be kern_sendfile(). I'm no expert at kernel
debugging, so any assistance on tracking this down further would be
greatly appreciated.

And, as promised, here is the output of script after the watchdog induced p=
anic:

Script started on Tue Dec 23 01:05:56 2008
# cu -l cua01
interrupt                   total
irq4: sio0                           623
irq6: fdc0                             1
irq17: fwohci0                         3
irq18: rl0 uhci2++                 60718
irq23: rl1 ehci0                     206
cpu0: timer                       514596
Total                      576147
KDB: stack backtrace:
db_trace_self_wrapper(c0b55b52,e66e0ae0,c07615e9,c0b50617,8ca93,...)
at db_trace_self_wrapper+0x26
kdb_backtrace(c0b50617,8ca93,0,c41a7690,2,...) at kdb_backtrace+0x29
hardclock(0,c07ff29d,0,0,4,...) at hardclock+0x1f9
lapic_handle_timer(e66e0b08) at lapic_handle_timer+0x9c
Xtimerint() at Xtimerint+0x1f
--- interrupt, eip =3D 0xc07ff29d, esp =3D 0xe66e0b48, ebp =3D 0xe66e0c34 -=
--
kern_sendfile(c41a7690,e66e0cfc,0,0,0,...) at kern_sendfile+0x90d
do_sendfile(e66e0d2c,c0aba265,c41a7690,e66e0cfc,20,...) at do_sendfile+0xb1
sendfile(c41a7690,e66e0cfc,20,16,e66e0d2c,...) at sendfile+0x13
syscall(e66e0d38) at syscall+0x335
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (393, FreeBSD ELF32, sendfile), eip =3D 0x282cb0cb, esp =3D
0xbfbfc7cc, ebp =3D 0xbfbfe848 ---
KDB: enter: watchdog timeout
[thread pid 1288 tid 100060 ]
Stopped at      kdb_enter_why+0x3a:     movl    $0,kdb_why
db> show lock
db> p=08 =08show all proc
  pid  ppid  pgrp   uid   state   wmesg     wchan    cmd
 1600   902   902     0  R                           watchdogd
 1470  1469  1470     0  S+      ttyin    0xc418fc10 csh
 1469     1  1469     0  Ss+     wait     0xc46032b8 login
 1468     1  1468     0  Ss+     ttyin    0xc41ac810 getty
 1427     1  1427     0  Ss      nanslp   0xc0c7dc44 cron
 1420     1  1420     0  Ss      select   0xc0c88eb8 sshd
 1419  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1418  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1417  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1416  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1415  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1414  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1413  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1412  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1411  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1410  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1409  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1408  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1407  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
--More--

 1406  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
--More--

 1405  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1404  1289  1289    80  SJ      accept   0xc445ab9a php-cgi
 1403  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1402  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1401  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1400  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1399  1300  1300    80  RJ                          php-cgi
 1398  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1397  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1396  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1395  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1394  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1393  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1392  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1391  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1390  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1389  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1388  1300  1300    80  SJ      accept   0xc445a6ba php-cgi
 1322     1  1322    25  SsJ     pause    0xc4562b40 sendmail
 1316     1  1316     0  RsJ                         sendmail
--More--

 1306     1  1306     0  SsJ     select   0xc0c88eb8 sshd
 1300  1288  1300    80  SsJ     wait     0xc43beae0 initial thread
 1289  1288  1289    80  SsJ     wait     0xc4564000 initial thread
 1288     1  1287    80  RJ      CPU 0               lighttpd
 1286  1173  1170    88  R+J     (threaded)          mysqld
100114                   S       sigwait  0xe67a2be0 mysqld
100090                   S       ucond    0xc4359580 mysqld
100092                   RunQ                        mysqld
100091                   RunQ                        mysqld
100089                   S       ucond    0xc43598c0 mysqld
100088                   S       ucond    0xc4358c80 mysqld
100087                   S       ucond    0xc4094780 mysqld
100086                   S       ucond    0xc4359140 mysqld
100049                   S       select   0xc0c88eb8 initial thread
 1173     1  1170    88  S+J     wait     0xc43be2b8 sh
  920     1   920     0  R+      (threaded)          venti
100072                   S       ucond    0xc4358300 venti
100071                   S       accept   0xc445b03a venti
100070                   S       ucond    0xc4358200 venti
100069                   S       ucond    0xc4095980 venti
--More--

100068                   S       ucond    0xc41b1e00 venti
100067                   S       ucond    0xc4358100 venti
100066                   S       ucond    0xc4358280 venti
100065                   S       ucond    0xc41a45c0 venti
100064                   S       accept   0xc445a1da venti
100063                   RunQ                        venti
100053                   RunQ                        venti
  902     1   902     0  Ss      wait     0xc4343570 watchdogd
  891     1   891   102  Rs                          gmond
  770     1   770     0  Ss      select   0xc0c88eb8 devd
  338     1   338    65  Ss      select   0xc0c88eb8 dhclient
  322     1    49     0  S+      select   0xc0c88eb8 dhclient
   48     0     0     0  SL      sdflush  0xc0c95228 [softdepflush]
   47     0     0     0  RL                          [syncer]
   46     0     0     0  SL      vlruwt   0xc4342000 [vnlru]
   45     0     0     0  SL      psleep   0xc0c89324 [bufdaemon]
   44     0     0     0  SL      pgzero   0xc0c95e00 [pagezero]
   43     0     0     0  SL      psleep   0xc0c95a18 [vmdaemon]
   42     0     0     0  SL      psleep   0xc0c959e0 [pagedaemon]
   41     0     0     0  SL      waiting_ 0xc0c8b0b4 [sctp_iterator]
--More--

   40     0     0     0  WL                          [irq7: ppbus0 ppc0]
   39     0     0     0  WL                          [irq1: atkbd0]
   38     0     0     0  WL                          [irq15: ata1]
   37     0     0     0  WL                          [irq14: ata0]
   36     0     0     0  WL                          [swi0: sio]
   35     0     0     0  SL      -        0xc418d83c [fdc0]
   34     0     0     0  SL      -        0xc4097000 [fw0_probe]
   33     0     0     0  SL      -        0xc408c200 [fw0_taskq]
   32     0     0     0  SL      usbevt   0xc4013210 [usb4]
   31     0     0     0  WL                          [irq23: rl1 ehci0]
   30     0     0     0  SL      usbevt   0xc407f210 [usb3]
   29     0     0     0  SL      usbevt   0xc406c210 [usb2]
   28     0     0     0  WL                          [irq18: rl0 uhci2++]
   27     0     0     0  SL      usbevt   0xc405c210 [usb1]
   26     0     0     0  WL                          [irq19: uhci1]
   25     0     0     0  SL      usbtsk   0xc0c7b234 [usbtask-dr]
   24     0     0     0  SL      usbtsk   0xc0c7b220 [usbtask-hc]
   23     0     0     0  SL      usbevt   0xc4024210 [usb0]
   22     0     0     0  WL                          [irq16: uhci0 uhci3]
   21     0     0     0  WL                          [irq9: acpi0]
--More--

   20     0     0     0  WL                          [swi6: task queue]
   19     0     0     0  WL                          [swi6: Giant taskq]
   18     0     0     0  SL      -        0xc3fe3700 [thread taskq]
   17     0     0     0  WL                          [swi5: +]
    9     0     0     0  SL      -        0xc3fe3880 [acpi_task_2]
    8     0     0     0  SL      -        0xc3fe3880 [acpi_task_1]
    7     0     0     0  SL      -        0xc3fe3880 [acpi_task_0]
   16     0     0     0  WL                          [swi2: cambio]
    6     0     0     0  SL      ccb_scan 0xc0c4b054 [xpt_thrd]
    5     0     0     0  SL      -        0xc3fe3a80 [kqueue taskq]
   15     0     0     0  SL      -        0xc0c7da74 [yarrow]
    4     0     0     0  SL      -        0xc0c7b62c [g_down]
    3     0     0     0  SL      -        0xc0c7b628 [g_up]
    2     0     0     0  SL      -        0xc0c7b620 [g_event]
   14     0     0     0  WL                          [swi3: vm]
   13     0     0     0  WL                          [swi4: clock sio]
   12     0     0     0  WL                          [swi1: net]
   11     0     0     0  RL                          [idle: cpu0]
    1     0     1     0  SLs     wait     0xc3f16ae0 [init]
   10     0     0     0  SL      audit_wo 0xc0c94c64 [audit]
--More--

    0     0     0     0  SLs     sched    0xc0c7b6e0 [swapper]
db> Killed



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bdf82f800812222218j3c77f564gf9b8e48bbbda7ca2>