Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 30 Aug 2011 20:53:17 -0700
From:      =?iso-8859-1?Q?Ask_Bj=F8rn_Hansen?= <ask@develooper.com>
To:        Julian Elischer <julian@freebsd.org>
Cc:        Mike Tancsa <mike@sentex.net>, net@freebsd.org
Subject:   system locks up when pf is enabled? (was: system locks up with vr driver on alix board)
Message-ID:  <8D7CB918-F7E0-4BC6-B9FC-624B1E438692@develooper.com>
In-Reply-To: <4E4B3267.8050707@freebsd.org>
References:  <D8B41107-90ED-4357-A7DA-4FF987C70567@develooper.com>	<4E4AB3BE.4090603@sentex.net> <9255C71C-BB78-417E-A900-85140FC2050C@develooper.com> <4E4B3267.8050707@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help

On Aug 16, 2011, at 20:15, Julian Elischer wrote:

> from your description it doesn't sound like a vr problem.
> I suggest you hook up teh serial console (I am guessing you already =
have)
> and set the config options to allow break-to-debugger or =
alt-break-to-debugger on it
> when it happens next, drop into the debugger..
>=20
> in fact, drop in, and do a ps to see what processes are runnable,
> 'tr [pid] (or thread id)' to get a stack trace of anything that looks
> interesting, and then cont and do it again a few times to get a feel
> for where the processor is hanging out (a straight 'tr' will give you
> the interrupt of the com port which is not intresting..)

Alright, I thought I had locked this down to ZERO_COPY_SOCKETS being =
enabled in the kernel (as unlikely as it seems that it'd break things =
without being explicitly used -- I wanted to believe!).

A couple days ago I figured out how to consistently make it happen (or =
not).  For about a week we'd been running with no trouble; until I =
realized that a hostname added to our pf rules made them not load on =
startup.  When running without our pf rules loaded, everything is fine.  =
WIth the pf rules loaded the system will hang within 2-4 hours.

Our ruleset was about 240 rules.  We cut it down to ~140 rules today to =
see if that made a difference; it didn't.  We also turned off as much =
IPv6 traffic as we could in case that was what was disturbing things =
(adding IPv6 traffic is relatively new).

Flipping the backup system to be the master for the active IP addresses =
doesn't (as far as I can tell) make the old master recover.

I tried looking for interesting information with the kernel debugger; =
but honestly nothing springs out as interesting -- though that may be =
due to my lack of skills more than anything else.   I am including a =
snapshot below.  I ran ps and tr on 100024, 100025 (vr0 and vr1 =
interrupt threads) and 100022 (thread taskq) a few more times but it =
seemed to me to give basically the same thread trace information.

For what it's worth, if I let 'cont' run for long enough for the system =
to send out carp packets and get it's IPs back, 'ps' seemed more likely =
to have 100024 or 100025 (vr0/vr1) in 'Run' state.  If I interrupted the =
system again relatively quickly after continuing, it was more likely to =
have 'thread tasks' be the busy process.

Any tips for what I can do to extract something more useful would be =
greatly appreciated.

As a last note -- earlier today I noticed that "Searches" in 'pfctl -si' =
seemed to go up significantly just before the console locks up (which is =
15-30 minutes before the box stops routing packets).   My theory was =
that we were getting some sort of tiny DoS attack (doesn't take much to =
take out a computer the size of a CD case); but that doesn't really make =
sense as the system 1) never recovers and 2) the backup is fine as soon =
as the master has been turned off / changed priorities.


 - ask


db> ps
  pid  ppid  pgrp   uid   state   wmesg     wchan    cmd
 3300  2371  2371     0  R                           sshd
 3299  3298  3298     0  S+      sbwait   0xc2ddb58c ssh
 3298  3100  3298     0  S+      piperd   0xc2624620 scp
 3100  3098  3100     0  Ss+     pause    0xc2e95850 csh
 3098  2371  3098     0  Rs                          sshd
 2904  2371  2904     0  Ss      select   0xc379a924 sshd
 2476  2474  2476     0  Ss+     ttyin    0xc2e44070 csh
 2474  2371  2474     0  Rs                          sshd
 2440     1  2440     0  Rs+                         getty
 2433     1  2433     0  Rs                          cron
 2426     1  2426    62  Rs                          ftp-proxy
 2397     1  2397     0  Ss      select   0xc2d71964 inetd
 2371     1  2371     0  Rs                          sshd
 2349     1  2349     0  Rs                          bird
 2302     1  2302     0  Rs                          radvd
 2298  2292  2285 65534  S       piperd   0xc2624188 multilog
 2297  2291  2285     0  R                           openvpn
 2292  2287  2285     0  S       select   0xc2d713a4 supervise
 2291  2287  2285     0  S       select   0xc2d71a64 supervise
 2288     1  2285     0  S       piperd   0xc2624c40 readproctitle
 2287     1  2285     0  R                           svscan
 2244     1  2244     0  Rs                          ntpd
 2129     1  2129     0  Rs                          syslogd
   50     0     0     0  SL      mdwait   0xc2717000 [md1]
   40     0     0     0  SL      mdwait   0xc24ef000 [md0]
   22     0     0     0  RL                          [flowcleaner]
   21     0     0     0  SL      sdflush  0xc0b24d80 [softdepflush]
   20     0     0     0  SL      vlruwt   0xc25f52a8 [vnlru]
   19     0     0     0  SL      syncer   0xc0b14774 [syncer]
   18     0     0     0  RL                          [bufdaemon]
   17     0     0     0  RL                          [pagezero]
   16     0     0     0  SL      psleep   0xc0b2567c [vmdaemon]
   15     0     0     0  SL      psleep   0xc0b25644 [pagedaemon]
    9     0     0     0  SL      ccb_scan 0xc0b006d4 [xpt_thrd]
    8     0     0     0  RL                          [pfpurge]
    7     0     0     0  SL      waiting_ 0xc0b181d8 [sctp_iterator]
   14     0     0     0  SL      (threaded)          usb
100036                   D       -        0xc250ed34 [usbus1]
100035                   D       -        0xc250ed04 [usbus1]
100034                   D       -        0xc250ecd4 [usbus1]
100033                   D       -        0xc250eca4 [usbus1]
100032                   D       -        0xc2504b5c [usbus0]
100031                   D       -        0xc2504b2c [usbus0]
100030                   D       -        0xc2504afc [usbus0]
100029                   D       -        0xc2504acc [usbus0]
   13     0     0     0  SL      -        0xc0b0df64 [yarrow]
    6     0     0     0  SL      crypto_r 0xc0b2432c [crypto returns]
    5     0     0     0  SL      crypto_w 0xc0b24308 [crypto]
    4     0     0     0  SL      -        0xc0b0bc44 [g_down]
    3     0     0     0  SL      -        0xc0b0bc40 [g_up]
    2     0     0     0  SL      -        0xc0b0bc38 [g_event]
   12     0     0     0  RL      (threaded)          intr
100037                   I                           [swi0: uart uart]
100028                   I                           [irq12: ohci0 =
ehci0]
100027                   I                           [irq14: ata0]
100026                   I                           [irq15: vr2 ata1]
100025                   Run     CPU 0               [irq11: vr1]
100024                   RunQ                        [irq10: vr0]
100021                   I                           [swi5: +]
100016                   I                           [swi2: cambio]
100015                   I                           [swi6: task queue]
100014                   I                           [swi6: Giant taskq]
100006                   I                           [swi1: netisr 0]
100005                   I                           [swi4: clock]
100004                   I                           [swi3: vm]
   11     0     0     0  RL                          [idle]
    1     0     1     0  SLs     wait     0xc2433d48 [init]
   10     0     0     0  SL      audit_wo 0xc0b246a0 [audit]
    0     0     0     0  RLs     (threaded)          kernel
100040                   D       -        0xc2527880 [dummynet]
100023                   D       -        0xc24d6580 [glxsb0 taskq]
100022                   RunQ                        [thread taskq]
100020                   D       -        0xc24afd40 [acpi_task_2]
100019                   D       -        0xc24afd40 [acpi_task_1]
100018                   D       -        0xc24afd40 [acpi_task_0]
100017                   D       -        0xc24afd80 [kqueue taskq]
100010                   D       -        0xc24512c0 [firmware taskq]
100000                   D       sched    0xc0b0bd20 [swapper]


db>  tr 100025
Tracing pid 12 tid 100025 td 0xc24d42d0
kdb_enter(c0a48fbe,c0a5d00a,1,c2522280,0,...) at 0xc0733b3a =3D =
kdb_enter+0x3a
uart_intr(c2522200,c24d42d0,c0afdc90,c2430980,4,...) at 0xc0632206 =3D =
uart_intr+0x126
intr_event_handle(c2430980,cc918ac4,cc918aa4,cc918b40,4,...) at =
0xc06dff2c =3D intr_event_handle+0x5c
intr_execute_handlers(c0afdc90,cc918ac4,cc918b40,c2ef8a00,cc918b68,...) =
at 0xc09d4929 =3D intr_execute_handlers+0x49
atpic_handle_intr(4,cc918ac4) at 0xc09ee047 =3D atpic_handle_intr+0x67
Xatpic_intr4() at 0xc09cfa52 =3D Xatpic_intr4+0x22
--- interrupt, eip =3D 0xc09e0f58, esp =3D 0xcc918b04, ebp =3D =
0xcc918b68 ---
bzero(c2ef8a00,0,12,0,0,...) at 0xc09e0f58 =3D bzero
ip_input(c2ef8a00,246,c24a77c0,cc918bd0,c0622c21,...) at 0xc083ff8d =3D =
ip_input+0x6cd
netisr_dispatch_src(1,0,c2ef8a00,cc918c08,c07bb331,...) at 0xc07c9109 =3D =
netisr_dispatch_src+0x89
netisr_dispatch(1,c2ef8a00,c24d8c00,c24d8c00,c2f16008,...) at 0xc07c93a0 =
=3D netisr_dispatch+0x20
ether_demux(c24d8c00,c2ef8a00,3,0,3,...) at 0xc07bb331 =3D =
ether_demux+0x161
ether_input(c24d8c00,c2ef8a00,c09d6ad0,c0b11780,c24d42d0,...) at =
0xc07bb8af =3D ether_input+0x33f
vr_intr(c2499000,c24d42d0,0,109,dc13a3e2,...) at 0xc0690034 =3D =
vr_intr+0x4d4
intr_event_execute_handlers(c24337f8,c2430600,c0a6b180,52d,c2430670,...) =
at 0xc06def8b =3D intr_event_execute_handlers+0x13b
ithread_loop(c24de360,cc918d28,0,0,0,...) at 0xc06e05ea =3D =
ithread_loop+0x6a
fork_exit(c06e0580,c24de360,cc918d28) at 0xc06dc830 =3D fork_exit+0x90
fork_trampoline() at 0xc09cf964 =3D fork_trampoline+0x8
--- trap 0, eip =3D 0, esp =3D 0xcc918d60, ebp =3D 0 ---

db> tr 100024
Tracing pid 12 tid 100024 td 0xc24d45a0
sched_switch(c24d45a0,0,109,dc1435c0,660,...) at 0xc07270bf =3D =
sched_switch+0x1df
mi_switch(109,0,c0a6b180,52d,c24306f0,...) at 0xc071086a =3D =
mi_switch+0x11a
ithread_loop(c24de8c0,cc894d28,0,0,0,...) at 0xc06e06df =3D =
ithread_loop+0x15f
fork_exit(c06e0580,c24de8c0,cc894d28) at 0xc06dc830 =3D fork_exit+0x90
fork_trampoline() at 0xc09cf964 =3D fork_trampoline+0x8
--- trap 0, eip =3D 0, esp =3D 0xcc894d60, ebp =3D 0 ---


db> cont
KDB: enter: Line break on console
[thread pid 0 tid 100022 ]
Stopped at      0xc0733b3a =3D kdb_enter+0x3a:    movl    $0,0xc0b124b4 =
=3D kdb_why
db> ps=20
  pid  ppid  pgrp   uid   state   wmesg     wchan    cmd
 3300  2371  2371     0  R                           sshd
 3299  3298  3298     0  S+      sbwait   0xc2ddb58c ssh
 3298  3100  3298     0  S+      piperd   0xc2624620 scp
 3100  3098  3100     0  Ss+     pause    0xc2e95850 csh
 3098  2371  3098     0  Rs                          sshd
 2904  2371  2904     0  Ss      select   0xc379a924 sshd
 2476  2474  2476     0  Ss+     ttyin    0xc2e44070 csh
 2474  2371  2474     0  Rs                          sshd
 2440     1  2440     0  Rs+                         getty
 2433     1  2433     0  Rs                          cron
 2426     1  2426    62  Rs                          ftp-proxy
 2397     1  2397     0  Ss      select   0xc2d71964 inetd
 2371     1  2371     0  Rs                          sshd
 2349     1  2349     0  Rs                          bird
 2302     1  2302     0  Rs                          radvd
 2298  2292  2285 65534  S       piperd   0xc2624188 multilog
 2297  2291  2285     0  R                           openvpn
 2292  2287  2285     0  S       select   0xc2d713a4 supervise
 2291  2287  2285     0  S       select   0xc2d71a64 supervise
 2288     1  2285     0  S       piperd   0xc2624c40 readproctitle
 2287     1  2285     0  R                           svscan
 2244     1  2244     0  Rs                          ntpd
 2129     1  2129     0  Rs                          syslogd
   50     0     0     0  SL      mdwait   0xc2717000 [md1]
   40     0     0     0  SL      mdwait   0xc24ef000 [md0]
   22     0     0     0  RL                          [flowcleaner]
   21     0     0     0  SL      sdflush  0xc0b24d80 [softdepflush]
   20     0     0     0  SL      vlruwt   0xc25f52a8 [vnlru]
   19     0     0     0  SL      syncer   0xc0b14774 [syncer]
   18     0     0     0  RL                          [bufdaemon]
   17     0     0     0  RL                          [pagezero]
   16     0     0     0  SL      psleep   0xc0b2567c [vmdaemon]
   15     0     0     0  SL      psleep   0xc0b25644 [pagedaemon]
    9     0     0     0  SL      ccb_scan 0xc0b006d4 [xpt_thrd]
    8     0     0     0  RL                          [pfpurge]
    7     0     0     0  SL      waiting_ 0xc0b181d8 [sctp_iterator]
   14     0     0     0  SL      (threaded)          usb
100036                   D       -        0xc250ed34 [usbus1]
100035                   D       -        0xc250ed04 [usbus1]
100034                   D       -        0xc250ecd4 [usbus1]
100033                   D       -        0xc250eca4 [usbus1]
100032                   D       -        0xc2504b5c [usbus0]
100031                   D       -        0xc2504b2c [usbus0]
100030                   D       -        0xc2504afc [usbus0]
100029                   D       -        0xc2504acc [usbus0]
   13     0     0     0  SL      -        0xc0b0df64 [yarrow]
    6     0     0     0  SL      crypto_r 0xc0b2432c [crypto returns]
    5     0     0     0  SL      crypto_w 0xc0b24308 [crypto]
    4     0     0     0  SL      -        0xc0b0bc44 [g_down]
    3     0     0     0  SL      -        0xc0b0bc40 [g_up]
    2     0     0     0  SL      -        0xc0b0bc38 [g_event]
   12     0     0     0  WL      (threaded)          intr
100037                   I                           [swi0: uart uart]
100028                   I                           [irq12: ohci0 =
ehci0]
100027                   I                           [irq14: ata0]
100026                   I                           [irq15: vr2 ata1]
100025                   I                           [irq11: vr1]
100024                   I                           [irq10: vr0]
100021                   I                           [swi5: +]
100016                   I                           [swi2: cambio]
100015                   I                           [swi6: task queue]
100014                   I                           [swi6: Giant taskq]
100006                   I                           [swi1: netisr 0]
100005                   I                           [swi4: clock]
100004                   I                           [swi3: vm]
   11     0     0     0  RL                          [idle]
    1     0     1     0  SLs     wait     0xc2433d48 [init]
   10     0     0     0  SL      audit_wo 0xc0b246a0 [audit]
    0     0     0     0  RLs     (threaded)          kernel
100040                   D       -        0xc2527880 [dummynet]
100023                   D       -        0xc24d6580 [glxsb0 taskq]
100022                   Run     CPU 0               [thread taskq]
100020                   D       -        0xc24afd40 [acpi_task_2]
100019                   D       -        0xc24afd40 [acpi_task_1]
100018                   D       -        0xc24afd40 [acpi_task_0]
100017                   D       -        0xc24afd80 [kqueue taskq]
100010                   D       -        0xc24512c0 [firmware taskq]
100000                   D       sched    0xc0b0bd20 [swapper]
db>  tr 100024
Tracing pid 12 tid 100024 td 0xc24d45a0
sched_switch(c24d45a0,0,109,b67db121,673,...) at 0xc07270bf =3D =
sched_switch+0x1df
mi_switch(109,0,c0a6b180,52d,c24306f0,...) at 0xc071086a =3D =
mi_switch+0x11a
ithread_loop(c24de8c0,cc894d28,0,0,0,...) at 0xc06e06df =3D =
ithread_loop+0x15f
fork_exit(c06e0580,c24de8c0,cc894d28) at 0xc06dc830 =3D fork_exit+0x90
fork_trampoline() at 0xc09cf964 =3D fork_trampoline+0x8
--- trap 0, eip =3D 0, esp =3D 0xcc894d60, ebp =3D 0 ---
db> tr 100022
Tracing pid 0 tid 100022 td 0xc24d4b40
kdb_enter(c0a48fbe,c0a5d00a,0,c2522280,0,...) at 0xc0733b3a =3D =
kdb_enter+0x3a
uart_intr(c2522200,c24d4b40,c0afdc90,c2430980,4,...) at 0xc0632206 =3D =
uart_intr+0x126
intr_event_handle(c2430980,c23f5c20,0,c25db2bc,4,...) at 0xc06dff2c =3D =
intr_event_handle+0x5c
intr_execute_handlers(c0afdc90,c23f5c20,c25db2bc,0,c23f5c60,...) at =
0xc09d4929 =3D intr_execute_handlers+0x49
atpic_handle_intr(4,c23f5c20) at 0xc09ee047 =3D atpic_handle_intr+0x67
Xatpic_intr4() at 0xc09cfa52 =3D Xatpic_intr4+0x22
--- interrupt, eip =3D 0xc073cf01, esp =3D 0xc23f5c60, ebp =3D =
0xc23f5c60 ---
sleepq_release(c25db2bc,0,0,0,c24afc80,...) at 0xc073cf01 =3D =
sleepq_release+0x31
wakeup(c25db2bc,0,0,0,0,...) at 0xc07109f1 =3D wakeup+0x41
taskqueue_run_locked(c24afc80,c24afc9c,0,c0a66f59,0,...) at 0xc073e8e2 =3D=
 taskqueue_run_locked+0x142
taskqueue_thread_loop(c0b134ec,c23f5d28,0,0,0,...) at 0xc073ef27 =3D =
taskqueue_thread_loop+0x87
fork_exit(c073eea0,c0b134ec,c23f5d28) at 0xc06dc830 =3D fork_exit+0x90
fork_trampoline() at 0xc09cf964 =3D fork_trampoline+0x8
--- trap 0, eip =3D 0, esp =3D 0xc23f5d60, ebp =3D 0 ---

... cont dance again ...

db> tr 100022
Tracing pid 0 tid 100022 td 0xc24d4b40
kdb_enter(c0a48fbe,c0a5d00a,0,c2522280,0,...) at 0xc0733b3a =3D =
kdb_enter+0x3a
uart_intr(c2522200,c24d4b40,c0afdc90,c2430980,4,...) at 0xc0632206 =3D =
uart_intr+0x126
intr_event_handle(c2430980,c23f5c48,c23f5c14,c24afc80,4,...) at =
0xc06dff2c =3D intr_event_handle+0x5c
intr_execute_handlers(c0afdc90,c23f5c48,c24afc80,c25db2bc,c23f5cbc,...) =
at 0xc09d4929 =3D intr_execute_handlers+0x49
atpic_handle_intr(4,c23f5c48) at 0xc09ee047 =3D atpic_handle_intr+0x67
Xatpic_intr4() at 0xc09cfa52 =3D Xatpic_intr4+0x22
--- interrupt, eip =3D 0xc073e7e3, esp =3D 0xc23f5c88, ebp =3D =
0xc23f5cbc ---
taskqueue_run_locked(c24afc80,c24afc9c,0,c0a66f59,0,...) at 0xc073e7e3 =3D=
 taskqueue_run_locked+0x43
taskqueue_thread_loop(c0b134ec,c23f5d28,0,0,0,...) at 0xc073ef27 =3D =
taskqueue_thread_loop+0x87
fork_exit(c073eea0,c0b134ec,c23f5d28) at 0xc06dc830 =3D fork_exit+0x90
fork_trampoline() at 0xc09cf964 =3D fork_trampoline+0x8
--- trap 0, eip =3D 0, esp =3D 0xc23f5d60, ebp =3D 0 ---

db> tr 100024
Tracing pid 12 tid 100024 td 0xc24d45a0
sched_switch(c24d45a0,0,109,4b19c082,684,...) at 0xc07270bf =3D =
sched_switch+0x1df
mi_switch(109,0,c0a6b180,52d,c24306f0,...) at 0xc071086a =3D =
mi_switch+0x11a
ithread_loop(c24de8c0,cc894d28,0,0,0,...) at 0xc06e06df =3D =
ithread_loop+0x15f
fork_exit(c06e0580,c24de8c0,cc894d28) at 0xc06dc830 =3D fork_exit+0x90
fork_trampoline() at 0xc09cf964 =3D fork_trampoline+0x8
--- trap 0, eip =3D 0, esp =3D 0xcc894d60, ebp =3D 0 ---

db> tr 100025
Tracing pid 12 tid 100025 td 0xc24d42d0
kdb_enter(c0a48fbe,c0a5d00a,58,c2522280,0,...) at 0xc0733b3a =3D =
kdb_enter+0x3a
uart_intr(c2522200,c24d42d0,c0afdc90,c2430980,4,...) at 0xc0632206 =3D =
uart_intr+0x126
intr_event_handle(c2430980,cc918ac8,0,0,4,...) at 0xc06dff2c =3D =
intr_event_handle+0x5c
intr_execute_handlers(c0afdc90,cc918ac8,0,c2efa300,cc918b68,...) at =
0xc09d4929 =3D intr_execute_handlers+0x49
atpic_handle_intr(4,cc918ac8) at 0xc09ee047 =3D atpic_handle_intr+0x67
Xatpic_intr4() at 0xc09cfa52 =3D Xatpic_intr4+0x22
--- interrupt, eip =3D 0xc083e822, esp =3D 0xcc918b08, ebp =3D =
0xcc918b68 ---
ip_forward(c2efa300,0,12,0,0,...) at 0xc083e822 =3D ip_forward+0x3b2
ip_input(c2efa300,246,c24a1580,cc918bd0,c0622c21,...) at 0xc083ff8d =3D =
ip_input+0x6cd
netisr_dispatch_src(1,0,c2efa300,cc918c08,c07bb331,...) at 0xc07c9109 =3D =
netisr_dispatch_src+0x89
netisr_dispatch(1,c2efa300,c24d8c00,c24d8c00,c2f1a808,...) at 0xc07c93a0 =
=3D netisr_dispatch+0x20
ether_demux(c24d8c00,c2efa300,3,0,3,...) at 0xc07bb331 =3D =
ether_demux+0x161
ether_input(c24d8c00,c2efa300,c09d6ad0,c0b11780,c24d42d0,...) at =
0xc07bb8af =3D ether_input+0x33f
vr_intr(c2499000,c24d42d0,0,109,48d02b08,...) at 0xc0690034 =3D =
vr_intr+0x4d4
intr_event_execute_handlers(c24337f8,c2430600,c0a6b180,52d,c2430670,...) =
at 0xc06def8b =3D intr_event_execute_handlers+0x13b
ithread_loop(c24de360,cc918d28,0,0,0,...) at 0xc06e05ea =3D =
ithread_loop+0x6a
fork_exit(c06e0580,c24de360,cc918d28) at 0xc06dc830 =3D fork_exit+0x90
fork_trampoline() at 0xc09cf964 =3D fork_trampoline+0x8
--- trap 0, eip =3D 0, esp =3D 0xcc918d60, ebp =3D 0 ---





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?8D7CB918-F7E0-4BC6-B9FC-624B1E438692>