From: Arnaud Lacombe
Date: Wed, 18 Apr 2012 02:22:33 -0400
To: freebsd-stable, FreeBSD Current
Subject: Re: Complete hang on 9.0-RELEASE

Hi,

On Mon, Apr 16, 2012 at 5:50 PM, Arnaud Lacombe wrote:
> Hi,
>
> [for the
> record...]
>
> On Tue, Feb 14, 2012 at 11:41 AM, Arnaud Lacombe wrote:
>> Hi folks,
>>
>> For the records, I was running some tests yesterday on top of a
>> 9.0-RELEASE, amd64, kernel when the box hanged. At the time of the
>> hang, the box was running a process with about 2800 threads with heavy
>> IPC between 1400 writers and 1400 readers. The box was in single user
>> mode (/bin/sh coming from FreeBSD 7.4-STABLE). Here is the beginning
>> of the dmesg:
>>
>> Copyright (c) 1992-2012 The FreeBSD Project.
>> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>>         The Regents of the University of California. All rights reserved.
>> FreeBSD is a registered trademark of The FreeBSD Foundation.
>> FreeBSD 9.0-RELEASE #0: Tue Jan  3 07:46:30 UTC 2012
>>     root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
>> CPU: Intel(R) Atom(TM) CPU D510   @ 1.66GHz (1666.70-MHz K8-class CPU)
>>   Origin = "GenuineIntel"  Id = 0x106ca  Family = 6  Model = 1c  Stepping = 10
>>   Features=0xbfebfbff
>>   Features2=0x40e31d
>>   AMD Features=0x20000800
>>   AMD Features2=0x1
>>   TSC: P-state invariant, performance statistics
>> real memory  = 2137587712 (2038 MB)
>> avail memory = 2037841920 (1943 MB)
>> Event timer "LAPIC" quality 400
>> ACPI APIC Table: <070611 APIC1125>
>> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
>> FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 HTT threads
>>  cpu0 (BSP): APIC ID:  0
>>  cpu1 (AP/HT): APIC ID:  1
>>  cpu2 (AP): APIC ID:  2
>>  cpu3 (AP/HT): APIC ID:  3
>>
>> I will restart the test and see if this happens again.
>>
> I reproduced the previous problem on 10-CURRENT from r233917, on the
> following platform (here running 8.2-RELEASE):
>
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 8.2-RELEASE #0: Thu Feb 17 02:41:51 UTC 2011
>     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Intel(R) Atom(TM) CPU D525   @ 1.80GHz (1800.01-MHz K8-class CPU)
>   Origin = "GenuineIntel"  Id = 0x106ca  Family = 6  Model = 1c  Stepping = 10
>   Features=0xbfebfbff
>   Features2=0x40e31d
>   AMD Features=0x20100800
>   AMD Features2=0x1
>   TSC: P-state invariant
> real memory  = 2136539136 (2037 MB)
> avail memory = 2043772928 (1949 MB)
> ACPI APIC Table: <010312 APIC0947>
> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 HTT threads
>  cpu0 (BSP): APIC ID:  0
>  cpu1 (AP/HT): APIC ID:  1
>  cpu2 (AP): APIC ID:  2
>  cpu3 (AP/HT): APIC ID:  3
>
> Complete system freeze while running about 2400 threads. I had to
> power cycle the system to get it back alive. I discussed a way to
> debug this with attilio@ on freebsd-stable@, but still did not had
> time to implement it.
>
10-CURRENT from r233917 hung again today while running 3600 threads. I
enabled WITNESS and INVARIANTS on that specific kernel, secretly
hoping that they would produce some meaningful information, but they
did not. I guess my last resort is to enable SW_WATCHDOG and gather
some state information from DDB when the watchdog triggers, if it
does...

Btw, this issue seems to happen specifically on the Atom/ICH8M
platform running an amd64 kernel: I have never seen it on other
platforms, despite running extensive tests there. I am not sure
whether it also happens on i386; I would need to check.

 - Arnaud
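
PS: for reference, the debugging setup described above might look
roughly like the kernel configuration fragment below. This is only a
sketch, not the configuration actually used on the test box. The file
name and ident WATCHDOG_DEBUG, the choice to include GENERIC, and the
extra KDB and INVARIANT_SUPPORT options are assumptions; only WITNESS,
INVARIANTS, SW_WATCHDOG and DDB are named in this thread. Options that
the included GENERIC already sets would be redundant and can be
dropped.

```
# Hypothetical kernel config, e.g. sys/amd64/conf/WATCHDOG_DEBUG
include         GENERIC
ident           WATCHDOG_DEBUG

options         KDB                 # kernel debugger framework
options         DDB                 # interactive in-kernel debugger
options         INVARIANTS          # extra run-time consistency checks
options         INVARIANT_SUPPORT   # support code required by INVARIANTS
options         WITNESS             # lock order verification
options         SW_WATCHDOG         # software watchdog driven by the clock interrupt
```

The software watchdog does nothing until it is armed, for example by
running watchdogd(8) with a timeout (-t). If the timeout then expires
and the debugger is compiled in, the kernel should enter DDB, where
commands such as "ps", "show allpcpu", "show alllocks" (WITNESS only)
and "alltrace" give a first picture of what each CPU and thread was
doing. Since SW_WATCHDOG depends on the clock interrupt still being
serviced, it may never fire if the hang also stops those interrupts.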