Date:      Mon, 12 May 2003 10:19:00 -0400 (EDT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        current@FreeBSD.org, alfred@FreeBSD.org
Subject:   rpc.lockd spinning; much breakage
Message-ID:  <Pine.NEB.3.96L.1030512100945.76804F-100000@fledge.watson.org>


I have three boxes in the current configuration: a build server (cboss)
and two diskless workstations (crash1, crash2).  crash1 and crash2 use
pxeboot to load a kernel via NFS, then run almost entirely out of NFS,
with the exception of local swap and perhaps a local file system for
testing.

The clients will frequently "hang" with many processes in the "lockd"
state when the daily event runs -- here's the ps from crash2 in such a
state:

db> ps
  pid   proc     addr    uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
23872 c1de05a0 c934c000    0   446   446 0000100 norm[SLPQ   lockd c1dbc620][SLP] sendmail
23870 c1de0960 c934e000    0   446   446 0000100 norm[SLPQ   lockd c1dbc540][SLP] sendmail
23868 c1dbf780 c92f2000    0   446   446 0000100 norm[SLPQ   lockd c1dbc020][SLP] sendmail
23866 c1dbf000 c92ee000    0   446   446 0000100 norm[SLPQ   lockd c1dbc1e0][SLP] sendmail
23864 c1b72960 c9226000    0   446   446 0000100 norm[SLPQ   lockd c1dbc200][SLP] sendmail
23862 c1b725a0 c9224000    0   446   446 0000100 norm[SLPQ   lockd c1dbc1a0][SLP] sendmail
23860 c1b73d20 c927e000    0   446   446 0000100 norm[SLPQ   lockd c1dbc400][SLP] sendmail
23858 c1b733c0 c922b000    0   446   446 0000100 norm[SLPQ   lockd c1dbc640][SLP] sendmail
23856 c1dbe5a0 c92e9000    0   446   446 0000100 norm[SLPQ   lockd c1dbc5e0][SLP] sendmail
23854 c1a395a0 c9160000    0   446   446 0000100 norm[SLPQ   lockd c1dbc1c0][SLP] sendmail
23852 c1b723c0 c9223000    0   446   446 0000100 norm[SLPQ   lockd c1dbc080][SLP] sendmail
23850 c1b37d20 c920a000    0   446   446 0000100 norm[SLPQ   lockd c1dbc2c0][SLP] sendmail
23848 c1dbf3c0 c92f0000    0   446   446 0000100 norm[SLPQ   lockd c1db8d60][SLP] sendmail
23846 c1a3e960 c91b8000    0   446   446 0000100 norm[SLPQ   lockd c1dbc2a0][SLP] sendmail
23845 c1b73780 c927b000    0 23840 23820 0004102 norm[SLPQ   lockd c1dbc140][SLP] su
23840 c1dbe780 c92ea000    0 23828 23820 0004002 norm[SLPQ    wait c1dbe780][SLP] sh
23832 c1b72000 c9221000    0 23829 23820 0004002 norm[SLPQ  piperd c19cc8f0][SLP] mail
23831 c1b73960 c927c000    0     1 23831 0000100 norm[SLPQ   lockd c1db8d80][SLP] sendmail
23829 c1b37780 c9207000    0 23820 23820 0000002 norm[SLPQ    wait c1b37780][SLP] sh
23828 c1a3eb40 c91b9000    0 23820 23820 0000002 norm[SLPQ    wait c1a3eb40][SLP] sh
23821 c1b37960 c9208000    0 23789 23821 0004000 norm[SLPQ  piperd c19ccb00][SLP] mail.local
23820 c1b73000 c9229000    0   521 23820 0004002 norm[SLPQ    wait c1b73000][SLP] sh
23818 c1dbf1e0 c92ef000   25     1 23575 0004102 norm[SLPQ   lockd c1db8dc0][SLP] sendmail
23789 c1dbe000 c92e6000    0 23781   446 0000100 norm[SLPQ   lockd c1db8ec0][SLP] sendmail
23781 c1a393c0 c915f000    0   446   446 0000100 norm[SLPQ   lockd c1dbc0c0][SLP] sendmail
23627 c1dbeb40 c92ec000    0 23602 23627 0004000 norm[SLPQ  piperd c19cde70][SLP] mail.local
23626 c1dbf5a0 c92f1000    0 23601 23626 0004000 norm[SLPQ  piperd c19cc6e0][SLP] mail.local
23602 c1dbe3c0 c92e8000    0 23589   446 0000100 norm[SLPQ   lockd c1dbca20][SLP] sendmail
23601 c1dbed20 c92ed000    0 23591   446 0000100 norm[SLPQ   lockd c1dbca00][SLP] sendmail
23591 c1a3e000 c91b3000    0   446   446 0000100 norm[SLPQ   lockd c1dbc6a0][SLP] sendmail
23589 c1dbe1e0 c92e7000    0   446   446 0000100 norm[SLPQ   lockd c1dbc760][SLP] sendmail
23529 c1b73b40 c927d000    0 23514 23529 0004000 norm[SLPQ  piperd c19cc840][SLP] mail.local
23514 c1b721e0 c9222000    0 23508   446 0000100 norm[SLPQ   lockd c1dbc160][SLP] sendmail
23508 c1b731e0 c922a000    0   446   446 0000100 norm[SLPQ   lockd c1dbc000][SLP] sendmail
23471 c1b72d20 c9228000    0 23457 23471 0004000 norm[SLPQ  piperd c19cdd10][SLP] mail.local
23457 c1b735a0 c922c000    0 23452   446 0000100 norm[SLPQ   lockd c1db8da0][SLP] sendmail
23452 c1dbe960 c92eb000    0   446   446 0000100 norm[SLPQ   lockd c1db8e80][SLP] sendmail
23324 c1b72780 c9225000    0 23279 23324 0004000 norm[SLPQ  piperd c19cd370][SLP] mail.local
23323 c1b37b40 c9209000    0 23276 23323 0004000 norm[SLPQ  piperd c19cd9a0][SLP] mail.local
23279 c1a39b40 c9163000    0     1   446 0000100 norm[SLPQ   lockd c1dbcb00][SLP] sendmail
23276 c1b72b40 c9227000    0     1   446 0000100 norm[SLPQ   lockd c1dbc8e0][SLP] sendmail
  521 c1a3ed20 c91ba000    0   515   521 0004002 norm[SLPQ   pause c91ba000][SLP] csh
  520 c1b36000 c91c0000  425   471   471 0000100 norm[SLPQ  accept c19b1c36][SLP] httpd
  519 c1b361e0 c91c1000  425   471   471 0000100 norm[SLPQ  accept c19b1c36][SLP] httpd
  518 c1b363c0 c91c4000  425   471   471 0000100 norm[SLPQ  accept c19b1c36][SLP] httpd
  517 c1b365a0 c91d7000  425   471   471 0000100 norm[SLPQ  accept c19b1c36][SLP] httpd
  516 c1b36780 c91d8000  425   471   471 0000100 norm[SLPQ  accept c19b1c36][SLP] httpd
  515 c1b36960 c91d9000    0     1   515 0004102 norm[SLPQ    wait c1b36960][SLP] login
  514 c1b36b40 c91da000    0     1   514 0004002 norm[SLPQ   ttyin c1a27010][SLP] getty
  513 c1b36d20 c91db000    0     1   513 0004002 norm[SLPQ   ttyin c1a29010][SLP] getty
  512 c1b37000 c91dc000    0     1   512 0004002 norm[SLPQ   ttyin c1a28c10][SLP] getty
  511 c1b371e0 c9204000    0     1   511 0004002 norm[SLPQ   ttyin c1a28810][SLP] getty
  510 c1b373c0 c9205000    0     1   510 0004002 norm[SLPQ   ttyin c1a27c10][SLP] getty
  509 c1b375a0 c9206000    0     1   509 0004002 norm[SLPQ   ttyin c1a27810][SLP] getty
  508 c1a39780 c9161000    0     1   508 0004002 norm[SLPQ   ttyin c1a27410][SLP] getty
  507 c19ad780 c8d01000    0     1   507 0004002 norm[SLPQ   ttyin c1907010][SLP] getty
  488 c19155a0 c8cd1000    0     1   488 0000000 norm[CVQ  select c05b65b4][SLP] inetd
  471 c1a39960 c9162000    0     1   471 0000000 norm[CVQ  select c05b65b4][SLP] httpd
  449 c1915960 c8cd3000   25     1   449 0000100 norm[SLPQ   pause c8cd3000][SLP] sendmail
  446 c1a39000 c914d000    0     1   446 0000100 norm[CVQ  select c05b65b4][SLP] sendmail
  440 c19add20 c8d04000    0     1   440 0000100 norm[CVQ  select c05b65b4][SLP] sshd
  430 c1a3e3c0 c91b5000    0     1   430 0000000 norm[CVQ  select c05b65b4][SLP] usbd
  378 c1a391e0 c915e000    0   372   372 0000100 norm[CVQ  select c05b65b4][SLP] rpc.lockd
  372 c1a39d20 c91b2000    0     1   372 0000000 norm[CVQ  select c05b65b4][SLP] rpc.lockd
  370 c1a3e1e0 c91b4000    0     1   370 0000000 norm[CVQ  select c05b65b4][SLP] rpc.statd
  368 c1a3e5a0 c91b6000    0   362   362 0000000 norm[SLPQ    nfsd c1a27e00][SLP] nfsd
  367 c1a3e780 c91b7000    0   362   362 0000000 norm[SLPQ    nfsd c1a28000][SLP] nfsd
  365 c1915780 c8cd2000    0   362   362 0000000 norm[SLPQ    nfsd c1a28200][SLP] nfsd
  364 c19ad1e0 c8cfe000    0   362   362 0000000 norm[SLPQ    nfsd c1a28400][SLP] nfsd
  362 c1915b40 c8cfb000    0     1   362 0000000 norm[CVQ  select c05b65b4][SLP] nfsd
  360 c19ad000 c8cfd000    0     1   360 0000000 norm[CVQ  select c05b65b4][SLP] mountd
  291 c1915d20 c8cfc000    0     1   291 0000000 norm[CVQ  select c05b65b4][SLP] ypbind
  276 c19ad5a0 c8d00000    0     1   276 0000000 norm[CVQ  select c05b65b4][SLP] rpcbind
  210 c19153c0 c8cd0000    0     1   210 0000000 norm[CVQ  select c05b65b4][SLP] dhclient
  124 c19ad3c0 c8cff000    0     1   124 0000000 norm[SLPQ   pause c8cff000][SLP] adjkerntz
   44 c19ad960 c8d02000    0     0     0 0000204 norm[SLPQ  nfsidl c05bd14c][SLP] nfsiod 3
   43 c19adb40 c8d03000    0     0     0 0000204 norm[SLPQ  nfsidl c05bd148][SLP] nfsiod 2
   42 c189bb40 c8c9a000    0     0     0 0000204 norm[SLPQ  nfsidl c05bd144][SLP] nfsiod 1
   41 c189bd20 c8c9b000    0     0     0 0000204 norm[SLPQ  nfsidl c05bd140][SLP] nfsiod 0
   40 c1913000 c8cc6000    0     0     0 0000204 norm[SLPQ  vlruwt c1913000][SLP] vnlru
   39 c19131e0 c8cc7000    0     0     0 0000204 norm[SLPQ  syncer c0591160][SLP] syncer
   38 c19133c0 c8cc8000    0     0     0 0000204 norm[SLPQ  psleep c05b697c][SLP] bufdaemon
   37 c19135a0 c8cc9000    0     0     0 000020c norm[SLPQ  pgzero c05be808][SLP] pagezero
    9 c1913780 c8cca000    0     0     0 0000204 norm[SLPQ  psleep c05be834][SLP] vmdaemon
    8 c1913960 c8ccb000    0     0     0 0000204 norm[SLPQ  psleep c05be820][SLP] pagedaemon
   36 c1913b40 c8ccc000    0     0     0 0000204 new [IWAIT] irq8: rtc
   35 c1913d20 c8ccd000    0     0     0 0000204 new [IWAIT] irq0: clk
   34 c1915000 c8cce000    0     0     0 0000204 new [IWAIT] irq3: sio1
   33 c19151e0 c8ccf000    0     0     0 0000204 new [IWAIT] irq4: sio0
   32 c17c85a0 c8c62000    0     0     0 0000204 norm[IWAIT] swi0: tty:sio
   31 c17c8780 c8c63000    0     0     0 0000204 norm[IWAIT] irq6: fdc0
   30 c17c8960 c8c64000    0     0     0 0000204 new [IWAIT] irq5: fxp3
   29 c17c8b40 c8c65000    0     0     0 0000204 new [IWAIT] irq7: fxp2
   28 c17c8d20 c8c66000    0     0     0 0000204 norm[SLPQ  usbtsk c0588118][SLP] usbtask
   27 c189b000 c8c6d000    0     0     0 0000204 norm[SLPQ  usbevt c18ef210][SLP] usb0
   26 c189b1e0 c8c6e000    0     0     0 0000204 new [IWAIT] irq11: ohci0
   25 c189b3c0 c8c6f000    0     0     0 0000204 norm[IWAIT] irq15: ata1
   24 c189b5a0 c8c70000    0     0     0 0000204 norm[IWAIT] irq14: ata0
   23 c189b780 c8c71000    0     0     0 0000204 norm[IWAIT] irq10: fxp0 fxp1
    7 c189b960 c8c72000    0     0     0 0000204 norm[SLPQ  actask c06da0cc][SLP] acpi_task2
    6 c0b8b1e0 c8670000    0     0     0 0000204 norm[SLPQ  actask c06da0cc][SLP] acpi_task1
    5 c0b8b3c0 c8671000    0     0     0 0000204 norm[SLPQ  actask c06da0cc][SLP] acpi_task0
   22 c0b8b5a0 c8672000    0     0     0 0000204 new [IWAIT] irq9: acpi0
   21 c0b8b780 c8673000    0     0     0 0000204 new [IWAIT] irq13:
   20 c0b8b960 c8674000    0     0     0 0000204 new [IWAIT] swi5:+
   19 c0b8bb40 c8675000    0     0     0 0000204 new [IWAIT] swi6: task queue
   18 c0b8bd20 c869d000    0     0     0 0000204 new [IWAIT] swi3: cambio
   17 c17c8000 c8c5f000    0     0     0 0000204 new [IWAIT] swi2: camnet
   16 c17c81e0 c8c60000    0     0     0 0000204 new [IWAIT] swi6: acpitaskq
   15 c17c83c0 c8c61000    0     0     0 0000204 norm[SLPQ   sleep c057bf60][SLP] random
    4 c0b84000 c8619000    0     0     0 0000204 norm[SLPQ  g_down c058a678][SLP] g_down
    3 c0b841e0 c8668000    0     0     0 0000204 norm[SLPQ    g_up c058a674][SLP] g_up
    2 c0b843c0 c8669000    0     0     0 0000204 norm[SLPQ  g_events c058a66c][SLP] g_event
   14 c0b845a0 c866a000    0     0     0 0000204 new [IWAIT] swi4: vm
   13 c0b84780 c866b000    0     0     0 000020c norm[IWAIT] swi7: tty:sio clock
   12 c0b84960 c866c000    0     0     0 0000204 norm[IWAIT] swi1: net
   11 c0b84b40 c866d000    0     0     0 000020c norm[CPU 0] idle
    1 c0b84d20 c866e000    0     0     1 0004200 norm[SLPQ    wait c0b84d20][SLP] init
   10 c0b8b000 c866f000    0     0     0 0000204 norm[CVQ  ktrace c058dc34][SLP] ktrace
    0 c058a760 c0709000    0     0     0 0000200 norm[SLPQ   sched c058a760][SLP] swapper

When in this state, the rpc.lockd on crash2 (.11) appears to be spinning,
spewing out the following sorts of packets to cboss (.1):

10:13:37.329753 0:0:e2:3c:70:63 0:d0:b7:68:4a:8c 0800 290:
192.168.50.11.49938 > 192.168.50.1.1016: udp 248
                         4500 0114 9515 0000 4011 ff66 c0a8 320b
                         c0a8 3201 c312 03f8 0100 4bf5 3eba ef11
                         0000 0000 0000 0002 0001 86b5 0000 0001
                         0000 0007 0000 0001 0000 0054 3ebf a6d0
                         0000 0015 6372 6173 6832 2e72 762e 6e61
                         696c
10:13:37.330089 0:0:e2:3c:70:63 0:d0:b7:68:4a:8c 0800 250:
192.168.50.11.49938 > 192.168.50.1.1016: udp 208
                         4500 00ec 9516 0000 4011 ff8d c0a8 320b
                         c0a8 3201 c312 03f8 00d8 2d1b 3eba ef12
                         0000 0000 0000 0002 0001 86b5 0000 0001
                         0000 0007 0000 0001 0000 002c 3ebf a6d0
                         0000 0015 6372 6173 6832 2e72 762e 6e61
                         696c
10:13:37.519632 0:0:e2:3c:70:63 0:d0:b7:68:4a:8c 0800 250:
192.168.50.11.49938 > 192.168.50.1.1016: udp 208
                         4500 00ec 9517 0000 4011 ff8c c0a8 320b
                         c0a8 3201 c312 03f8 00d8 3a03 3eba ef13
                         0000 0000 0000 0002 0001 86b5 0000 0001
                         0000 0007 0000 0001 0000 002c 3ebf a6d0
                         0000 0015 6372 6173 6832 2e72 762e 6e61
                         696c
10:13:37.829638 0:0:e2:3c:70:63 0:d0:b7:68:4a:8c 0800 290:
192.168.50.11.49938 > 192.168.50.1.1016: udp 248
                         4500 0114 9518 0000 4011 ff63 c0a8 320b
                         c0a8 3201 c312 03f8 0100 601b 3eba ef14
                         0000 0000 0000 0002 0001 86b5 0000 0001
                         0000 0007 0000 0001 0000 0054 3ebf a6d0
                         0000 0015 6372 6173 6832 2e72 762e 6e61
                         696c
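Picking the first packet apart by hand (a quick sketch assuming standard
ONC RPC call framing, field names per RFC 1057), the payload is an RPC
call to program 100021 (NLM), version 1, procedure 7 (NLM_LOCK_MSG), with
an AUTH_UNIX credential whose machine name begins "crash2.rv.nail" before
the dump is truncated -- i.e., the client seems to be retransmitting the
same async lock request over and over:

```python
import struct

# Hex bytes of the first captured IP datagram exactly as tcpdump printed
# them (the dump is cut off by the default snap length).
hexdump = (
    "4500 0114 9515 0000 4011 ff66 c0a8 320b "
    "c0a8 3201 c312 03f8 0100 4bf5 3eba ef11 "
    "0000 0000 0000 0002 0001 86b5 0000 0001 "
    "0000 0007 0000 0001 0000 0054 3ebf a6d0 "
    "0000 0015 6372 6173 6832 2e72 762e 6e61 "
    "696c"
)
data = bytes.fromhex(hexdump.replace(" ", ""))

ip_hlen = (data[0] & 0x0F) * 4          # 20-byte IP header, no options
rpc = data[ip_hlen + 8:]                # skip the 8-byte UDP header

# ONC RPC call header: xid, msg_type, rpcvers, prog, vers, proc
xid, msg_type, rpcvers, prog, vers, proc = struct.unpack(">6I", rpc[:24])
print(hex(xid), msg_type, rpcvers, prog, vers, proc)
# -> 0x3ebaef11 0 2 100021 1 7

# AUTH_UNIX credential: flavor 1, body length, stamp, then machine name
flavor, cred_len, stamp, name_len = struct.unpack(">4I", rpc[24:40])
name_prefix = rpc[40:].decode()         # only 14 of name_len=21 bytes survive
print(flavor, name_len, name_prefix)
# -> 1 21 crash2.rv.nail
```

The subsequent packets differ only in xid and credential length, which is
consistent with the same NLM_LOCK_MSG being resent several times a second.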

On the server side, rpc.lockd is unusually busy as a result:

  PID USERNAME   PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
47547 root       121    0  1592K  1224K RUN    448:30 98.68% 98.68% rpc.lockd

The behavior that usually upsets the client is running the following
csh loop:

while (1)
    periodic daily
    periodic weekly
    periodic monthly
end

The server's rpc.lockd process remains extremely busy even after crash2
has been rebooted and the stream of packets is no longer present.

I'm not sure how to go about debugging these problems, but in the
current scenario I basically can't get both crash boxes through their
daily events if the client and server are both very busy (i.e., if they
run their daily events at the same time).  I'm going to reboot cboss and
the other systems to see if that flushes whatever nasty state hangs
around, but any advice on the debugging process would be helpful.  Is
there a way to get rpc.lockd on the server to dump its state to a file?

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories


