From: John Baldwin <jhb@freebsd.org>
To: freebsd-amd64@freebsd.org
Date: Tue, 29 Jan 2008 19:00:42 -0500
Subject: Re: Multi processor locking problem under 7.0

On Tuesday 29 January 2008 03:26:44 pm Paul wrote:
>
> >I have several systems of two different types running 7.0. One is an IBM
> >3550 and the other a Dell 2950. The IBMs more than the Dells
> >consistently seem to have a kernel locking problem during dump.
> >Specifically, if I execute this command:
> >
> >     dump 0uaLCf 64 /dev/null /usr
> >
> >Dump consistently stops in Phase IV. However, if I set
> >machdep.hlt_logical_cpus=1, dump does not stop. At the end of this
> >message is my boot information.
> >
> >When logical_cpus=0, the following is typical of what is displayed by
> >top when dump stops:
> >
> >  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> >  926 root       1   4    0 75476K 71744K sbwait 0   0:04  0.00% dump
> >  928 root       1  20    0 75348K 67740K pause  1   0:02  0.00% dump
> >  929 root       1  20    0 75348K 67740K pause  1   0:02  0.00% dump
> >  927 root       1  20    0 75348K 67740K pause  1   0:02  0.00% dump
> >  919 root       1   8    0 75348K 67144K wait   0   0:00  0.00% dump
> >
> >Fooling around a bit I have found that if I truss dump, the dump
> >continues. On the Dells, if I force disk activity during the dump, such
> >as executing a ls -lR /usr > /dev/null, the dump finishes.
> >
> >I am unsure how to proceed in debugging this problem. It has been around
> >for a while but I am now installing the IBMs and the dump problem is a
> >no-starter. Please contact me directly on how to proceed.
>
> I have noticed something similar on my Intel test box.
>
> When compiling many ports in the tree that is updated on 7.0RC1 with
> a S5000pal with 2 Quadcore Xeons the process just STOPS. I am using
> the install disk and have not updated to the latest cvsup release yet
> (I am trying to make the world now with fingers crossed :) ) I tried
> it with just one quadcore and the same problem happens.
>
> There are no errors on the screen but it no longer proceeds with the
> port build. When I suspend the process and restart the make in the
> same session it has no problem getting past this impasse and with a
> few suspends the make finishes without error. It does not happen
> every time which is very odd.
>
> Based on your description above it seems like it may be the same problem.
>
> What do you think?

If you have threads blocked on "vmo_de" (the wait channel shown in the
STATE column of top), upgrade to the latest RELENG_7 or RELENG_7_0 (the
relevant fix is in sys/kern/subr_sleepqueue.c) and try again.

-- 
John Baldwin
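A minimal sketch for checking whether a stuck dump or make is in the "vmo_de" state described above. It assumes FreeBSD's ps(1) accepts -H (list threads) and the lwp and wchan keywords, and that wchan reports the same sleep message top shows under STATE. The function name blocked_on is hypothetical, not part of any FreeBSD tool:

```sh
#!/bin/sh
# Hypothetical helper: blocked_on CHAN
# Reads "ps -axH -o pid,lwp,wchan,comm" style output on stdin (one header
# line, then PID, thread ID, wait channel, command) and prints every data
# row whose wait-channel column (field 3) is exactly CHAN.
blocked_on() {
    awk -v chan="$1" 'NR > 1 && $3 == chan'
}

# Usage on a live system while the dump or make appears hung
# (assumes the ps(1) flags and keywords named above):
#   ps -axH -o pid,lwp,wchan,comm | blocked_on vmo_de
```

If this prints nothing while the process is stuck, the hang may be a different problem from the one the subr_sleepqueue.c fix addresses.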