Date: Wed, 7 Feb 2007 01:55:49 +1100 (EST)
From: Bruce Evans
To: David Xu
Cc: freebsd-bugs@FreeBSD.org
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Message-ID: <20070207014307.B31879@besplex.bde.org>
In-Reply-To: <200701020250.l022oKEM040902@freefall.freebsd.org>
References: <200701020250.l022oKEM040902@freefall.freebsd.org>

Long ago, On Tue, 2 Jan 2007, David Xu wrote:

> The following reply was made to PR kern/72979; it has been noted by GNATS.

> On Tuesday 02 January 2007 03:12, Andrew Pantyukhin wrote:
> > I've got picard (a very fat python app) in this very
> > state, on latest current. I tried killing its parent
> > (zsh) which only reassigned it to ppid 1.
> > I won't try to kill that one :-)
> >
> > Tell me I'm on crack or I'll reopen this PR.
> >
> > Thanks!
>
> Can you reproduce it on 6.2 RC ?

This or something similar is still happening under
6.2-PRERELEASE-Dec-10-2006 (nosedive.freebsd.org). It seemed to be more
common under 6.0. Stopping a kernel compile used to leave lots of
processes that seemed to be in infinite disk wait on nfs files. The stop
(^Z and then ^C) doesn't finish and the shell appears to be hung too, and
cannot be killed properly using SIGHUP. However, last time this happened
I was able to clear all the hung processes by sending SIGCONT from another
shell to some of the deeply nested ones.

I haven't seen this on any local machines, so it may be timing-related.
When it seemed to be happening more, I was running kernel builds with
excessive concurrency due to bugs in getting enough parallelism to work
around network latency, so it may be related to just running lots of
similar competing processes.

The recent patches don't seem to be in -current or RELENG_6.

Bruce