From: Arnaud Lacombe
To: Attilio Rao
Cc: freebsd-stable
Date: Wed, 29 Feb 2012 14:31:39 -0500
Subject: Re: Complete hang on 9.0-RELEASE

Hi,

On Wed, Feb 29, 2012 at 2:22 PM, Attilio Rao wrote:
> 2012/2/29, Arnaud Lacombe:
>> Hi,
>>
>> On Wed, Feb 29, 2012 at 1:44 PM, Attilio Rao wrote:
>>> 2012/2/29, Arnaud Lacombe:
>>>> Hi,
>>>>
>>>> On Wed, Feb 29, 2012 at 12:59 PM, Arnaud Lacombe wrote:
>>>>> Hi,
>>>>>
>>>>> On Mon, Feb 27, 2012 at 12:48 PM, Arnaud Lacombe wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Mon, Feb 27, 2012 at 10:36 AM, Attilio Rao wrote:
>>>>>>> 2012/2/27, Arnaud Lacombe:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On Tue, Feb 14, 2012 at 11:41 AM, Arnaud Lacombe wrote:
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> For the record, I was running some tests yesterday on top of a
>>>>>>>>> 9.0-RELEASE, amd64, kernel when the box hung. At the time of the
>>>>>>>>> hang, the box was running a process with about 2800 threads with
>>>>>>>>> heavy IPC between 1400 writers and 1400 readers. The box was in
>>>>>>>>> single-user mode (/bin/sh coming from FreeBSD 7.4-STABLE). Here is
>>>>>>>>> the beginning of the dmesg:
>>>>>>>>>
>>>>>>>> This happened a second time, now with FreeBSD 8.2-RELEASE. Complete
>>>>>>>> machine hang. The machine was running about 4000 threads in a single
>>>>>>>> process; all the other conditions are the same.
>>>>>>>
>>>>>>> Arnaud,
>>>>>>> can you please break into your kernel via KDB and collect the
>>>>>>> following information from the DDB prompt:
>>>>>>> - ps
>>>>>>> - alltrace
>>>>>>> - show allpcpu
>>>>>>> - possibly get a coredump with 'call doadump'
>>>>>>>
>>>>>> Will do, but I'll need to rebuild a kernel to include DDB.
>>>>>>
>>>>>>> and in the end provide all those along with kernel binary and possibly
>>>>>>> sources somewhere?
>>>>>>>
>>>>>> I'll be testing a bare `release/8.2.0' with the following patch:
>>>>>>
>>>>>> diff --git a/sys/amd64/conf/GENERIC b/sys/amd64/conf/GENERIC
>>>>>> index c3e0095..7bd997f 100644
>>>>>> --- a/sys/amd64/conf/GENERIC
>>>>>> +++ b/sys/amd64/conf/GENERIC
>>>>>> @@ -79,6 +79,10 @@ options      INCLUDE_CONFIG_FILE     # Include this file in kernel
>>>>>>
>>>>>>  options        KDB           # Kernel debugger related code
>>>>>>  options        KDB_TRACE     # Print a stack trace for a panic
>>>>>> +options        DDB
>>>>>> +options        BREAK_TO_DEBUGGER
>>>>>> +options        ALT_BREAK_TO_DEBUGGER
>>>>>>
>>>>>>  # Make an SMP-capable kernel by default
>>>>>>  options        SMP           # Symmetric MultiProcessor Kernel
>>>>>>
>>>>> OK, it happened again after 2 days; the process was running about 3200
>>>>> threads. I'm trying to break into DDB and will let you know. I'm not
>>>>> that successful for now...
>>>>>
>>>> No luck. Neither BREAK nor ALT_BREAK is responding. I will not touch
>>>> the system in the next few hours if you want me to test something on
>>>> it. In the event that 8.2-RELEASE or 9.0-RELEASE is not meant to work
>>>> reliably on top of a 7.4-RELEASE userland, I will set the test up again
>>>> on a clean 9.0-RELEASE system and retry.
>>>
>>> We allow the KBI to break when new releases happen, so this may cause
>>> breakage for you, even if a deadlock is really not something you want.
>>>
>>> Can you try enabling SW_WATCHDOG, DEADLKRES and possibly arm your ichwd?
>>> If the breakage involves clocks or interrupt sources, there is still
>>> a chance they will be able to catch it though.
>>>
>>> However, it doesn't seem you are set up with a proper serial console?
>> The serial console is definitely working fine. I can break into DDB
>> at will when the test is running.
>> I did not test with ALT_BREAK per se, but BREAK does work.
>
> So if you try to break into DDB via serial break, it doesn't work?
> That is definitely very bad...
>
Just to be sure, I rebooted the system and I could break into DDB at
the first attempt with ALT_BREAK; BREAK was a bit more reluctant but
worked too. So yes, this does not taste good :/

> Can you try with the options I mentioned earlier and see if something
> changes?
>
Will do, but I will first attempt to reproduce this on 9.0-RELEASE.

 - Arnaud

> Attilio
>
>
> --
> Peace can only be achieved by understanding - A. Einstein