Date: Tue, 9 Oct 2007 10:16:15 -0400
From: Bill Moran
To: freebsd-jail@freebsd.org
Cc: netops@collaborativefusion.com
Subject: Mysterious jail lockups
Message-Id: <20071009101615.bd2601de.wmoran@collaborativefusion.com>

Has anyone else seen this?

The symptoms are a jail that has no processes in it, and thus can not be stopped/killed/whatever. Only solution is to reboot the host system. Trying to jexec into the jail results in an error, so new processes can't be started therein.

It doesn't happen very often, and I've been unable to reproduce it on demand.
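One way to catch a jail in this state when it happens might be to compare the jails the host knows about against the jail IDs that running processes actually belong to. The following is only a rough sketch, assuming the stuck jail still shows up in `jls` output and that `ps -ax -o jid` on the host prints one jail ID per process (0 for host processes). The function names and the sample listings are made up for illustration and are not taken from any real system.

```python
import subprocess

# Hypothetical sample output, for illustration only.
SAMPLE_JLS = """\
   JID  IP Address      Hostname                      Path
     1  192.168.0.10    www.example.org               /usr/jails/www
     2  192.168.0.11    db.example.org                /usr/jails/db
"""

SAMPLE_PS = """\
JID
  0
  0
  1
  1
"""


def parse_jls(text):
    """Return {jid: hostname} from the default `jls` listing."""
    jails = {}
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) >= 4 and fields[0].isdigit():
            jails[int(fields[0])] = fields[2]
    return jails


def parse_ps_jids(text):
    """Return the set of jail IDs that at least one process belongs to."""
    return {int(tok) for tok in text.split() if tok.isdigit()}


def empty_jails(jls_text, ps_text):
    """Return {jid: hostname} for jails that have no processes at all."""
    populated = parse_ps_jids(ps_text)
    return {jid: host
            for jid, host in parse_jls(jls_text).items()
            if jid not in populated}


def scan_host():
    """Run jls and ps on the live host and report empty jails.

    Only meaningful on a FreeBSD host; not called automatically.
    """
    jls_out = subprocess.run(["jls"], capture_output=True,
                             text=True, check=True).stdout
    ps_out = subprocess.run(["ps", "-ax", "-o", "jid"], capture_output=True,
                            text=True, check=True).stdout
    return empty_jails(jls_out, ps_out)
```

With the sample listings, `empty_jails(SAMPLE_JLS, SAMPLE_PS)` reports jail 2 as having no processes. Run periodically from cron, something like `scan_host()` could log the moment a jail first becomes empty, which might help narrow down what the last process in it was doing. A jail that is in the middle of a normal shutdown may briefly look the same, so a single hit is not proof of the problem.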
What I'm looking for at this point is whether anyone else has seen this, and advice on how to track it down/reproduce it, with the eventual goal of fixing the problem.

It would be nice if there were a command, let's say "jkill", that killed the _jail_. There is a port called jkill that (allegedly) does this, but looking at the Perl code, all it does is loop through a ps listing, killing off processes. In the event of a jail with no processes, this doesn't help any.

Theoretically, this would be some sort of kernel bug, whereby the reference counter to the jail is not properly decremented when processes die and thus the jail never shuts down. Given the infrequency of the occurrence and my inability to produce a reproducible case, I expect it to be challenging to track down.

Any advice?

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023