Date:      Tue, 29 Jan 2008 19:27:49 -0700
From:      Dennis Glatting <freebsd@penx.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-amd64@freebsd.org
Subject:   Re: Multi processor locking problem under 7.0
Message-ID:  <1201660069.95413.9.camel@Sylvester.dco.penx.com>
In-Reply-To: <200801291900.42989.jhb@freebsd.org>
References:  <1201388299.84900.12.camel@Sylvester.dco.penx.com> <20080129202643.6BF568DE@fep1.cogeco.net> <200801291900.42989.jhb@freebsd.org>

On Tue, 2008-01-29 at 19:00 -0500, John Baldwin wrote:
> On Tuesday 29 January 2008 03:26:44 pm Paul wrote:
> > 
> > >I have several systems of two different types running 7.0. One is an IBM
> > >3550 and the other a Dell 2950. The IBMs more than the Dells
> > >consistently seem to have a kernel locking problem during dump.
> > >Specifically, if I execute this command:
> > >
> > >         dump 0uaLCf 64 /dev/null /usr
> > >
> > >Dump consistently stops in Phase IV. However, if I set
> > >machdep.hlt_logical_cpus=1, dump does not stop. At the end of this
> > >message is my boot information.
> > >
> > >When logical_cpus=0, the following is typical of what is displayed by
> > >top when dump stops:
> > >
> > >   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> > >   926 root        1   4    0 75476K 71744K sbwait 0   0:04  0.00% dump
> > >   928 root        1  20    0 75348K 67740K pause  1   0:02  0.00% dump
> > >   929 root        1  20    0 75348K 67740K pause  1   0:02  0.00% dump
> > >   927 root        1  20    0 75348K 67740K pause  1   0:02  0.00% dump
> > >   919 root        1   8    0 75348K 67144K wait   0   0:00  0.00% dump
> > >
> > >Fooling around a bit I have found that if I truss dump, the dump
> > >continues. On the Dells, if I force disk activity during the dump, such
> > >as executing a ls -lR /usr > /dev/null, the dump finishes.
> > >
> > >I am unsure how to proceed in debugging this problem. It has been around
> > >for a while but I am now installing the IBMs and the dump problem is a
> > >non-starter. Please contact me directly on how to proceed.
> > 
> > I have noticed something similar on my Intel test box.
> > 
> > When compiling many ports in the tree that is updated on 7.0RC1 with 
> > a S5000pal with 2 Quadcore Xeons the process just STOPS. I am using 
> > the install disk and have not updated to the latest cvsup release yet 
> > (I am trying to make the world now with fingers crossed :)  ) I tried 
> > it with just one quadcore and the same problem happens.
> > 
> > There are no errors on the screen but it no longer proceeds with the 
> > port build. When I suspend the process and restart the make in the 
> > same session it has no problem getting past this impasse and with a 
> > few suspends the make finishes without error. It does not happen 
> > every time which is very odd.
> > 
> > Based on your description above it seems like it may be the same problem.
> > 
> > What do you think?
> 
> If you have threads blocked on "vmo_de" then upgrade to the latest RELENG_7 or 
> RELENG_7_0 (specifically the sys/kern/subr_sleepqueue.c file) and try again.
> 

I fetched the updated sys/kern/subr_sleepqueue.c and updated my systems.
I then ran dump on the IBM system five times. It hung four times, three
of them at 99.99% complete. The ps output is below.

How do I tell what the threads are blocked on?


Daffy> ps -axwHl | grep dump
    0   801     1   0  96  0 20952  4060 select Is    ??    0:00.00 /usr/sbin/sshd -f /etc/ssh/dumper/sshd_config
    0 14682   870   0   8  0 34388 26628 wait   I+    p0    0:00.20 dump 0uaLCf 24 /dev/null /usr (dump)
    0 14774 14682   0   4  0 34388 30680 sbwait I+    p0    0:01.01 dump: /dev/aacd0s1e: pass 4: 14.97% done, finished in 0:03 at T
    0 14775 14774   0  20  0 34388 26644 pause  I+    p0    0:00.69 dump 0uaLCf 24 /dev/null /usr (dump)
    0 14776 14774   0  20  0 34388 26644 pause  I+    p0    0:00.69 dump 0uaLCf 24 /dev/null /usr (dump)
    0 14777 14774   0  20  0 34388 26644 pause  I+    p0    0:00.69 dump 0uaLCf 24 /dev/null /usr (dump)
  600 14896 12552   0  96  0  5900  1184 -      R+    p2    0:00.00 grep dump
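
For reference, the wait channel is already present in the listing above.
Assuming the usual FreeBSD "ps -l" column order (UID PID PPID CPU PRI NI
VSZ RSS MWCHAN STAT TT TIME COMMAND), the ninth field is what each
thread is sleeping on. Because grep removed the header row, the column
is easy to miss. The following is only a sketch: the helper name
wchan_of is made up for illustration, and the sample rows are copied
from the output above with the wrapped lines rejoined.

```shell
#!/bin/sh
# Hypothetical helper: print "PID MWCHAN" for each row of `ps -l` style
# output read from stdin.
# Assumes the FreeBSD `ps -l` column order:
#   UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
wchan_of() {
    # Skip the header row (its second field is "PID", not a number).
    awk '$2 ~ /^[0-9]+$/ { print $2, $9 }'
}

# Sample rows taken from the ps output in this message.
wchan_of <<'EOF'
    0 14682   870   0   8  0 34388 26628 wait   I+    p0    0:00.20 dump 0uaLCf 24 /dev/null /usr (dump)
    0 14774 14682   0   4  0 34388 30680 sbwait I+    p0    0:01.01 dump: /dev/aacd0s1e: pass 4: 14.97% done, finished in 0:03 at T
    0 14775 14774   0  20  0 34388 26644 pause  I+    p0    0:00.69 dump 0uaLCf 24 /dev/null /usr (dump)
EOF
```

On a live system the same filter could be fed directly, for example
"ps -axwHl | wchan_of". In the rows shown here the dump processes sit in
wait, sbwait and pause, and none of them reads "vmo_de".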






