From: Charles Owens
Organization: Great Bay Software
Date: Wed, 31 Oct 2012 12:58:36 -0400
Message-ID: <509158BC.7090901@greatbaysoftware.com>
To: freebsd-stable@freebsd.org, Steve McCoy, Jack Vogel, jdc@parodius.com
Subject: Panic during kernel boot, igb-init related? (8.3-RELEASE)

Hello,

We're seeing boot-time panics in about 4% of cases when upgrading from FreeBSD 8.1 to 8.3-RELEASE (i386). This problem is subtle enough that it escaped detection during our regular testing cycle... now, with over 100 systems upgraded, we're convinced there's a real issue.

Our kernel config is essentially PAE (i.e., static modules, with a few drivers added/removed). The hardware is Intel Server System SR1625UR.

This appears to match a finding discussed in these threads, having to do with the timing of initialization of the igb(4)-based NICs (if I'm understanding it properly):

http://lists.freebsd.org/pipermail/freebsd-stable/2011-May/062596.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062949.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063867.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063958.html

These threads include some potential patches and the possibility of a commit/MFC... but it isn't clear that there was ever a final resolution (and an MFC to 8-stable). I've cc'd a few folks from back then.

A real challenge here is the frequency of occurrence. As mentioned, it only hits a fraction of our systems.
When it _does_ hit, the system may enter a reboot loop for days and then mysteriously break out of it... and thereafter seem to work fine.

I'd be very grateful for any help. Some questions:

  * Was there ever a final "blessed" patch?
      o If so, will it apply to RELENG_8_3?
  * Is there anything that could be said that might help us with reproducing the problem, testing, or validating a fix?

Panic message is --

panic: m_getzone: m_getjcl: invalid cluster type
cpuid = 0
KDB: stack backtrace:
#0 0xc059c717 at kdb_backtrace+0x47
#1 0xc056caf7 at panic+0x117
#2 0xc03c979e at igb_refresh_mbufs+0x25e
#3 0xc03c9f98 at igb_rxeof+0x638
#4 0xc03ca135 at igb_msix_que+0x105
#5 0xc0541e2b at intr_event_execute_handlers+0x13b
#6 0xc05434eb at ithread_loop+0x6b
#7 0xc053efb7 at fork_exit+0x97
#8 0xc0806744 at fork_trampoline+0x8

Thanks very much,
Charles

--
Charles Owens
Great Bay Software, Inc.
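
Since the panic hits only a fraction of systems, one way to measure how often it occurs (and later to check a candidate fix) might be to scan captured console output for this specific signature. The following is a minimal sketch, assuming console text is saved to files, for example from a serial console. The helper names parse_panic and is_igb_refresh_panic are hypothetical and not part of FreeBSD, and the patterns are derived only from the panic message and backtrace quoted above.

```python
import re

# Matches KDB backtrace frames such as
# "#2 0xc03c979e at igb_refresh_mbufs+0x25e".
FRAME_RE = re.compile(
    r"#(\d+)\s+(0x[0-9a-fA-F]+)\s+at\s+([A-Za-z_]\w*)\+(0x[0-9a-fA-F]+)"
)

# Captures the panic message up to "cpuid =" or the end of the line,
# so it works on both wrapped and flattened console text.
PANIC_RE = re.compile(r"panic:\s*(.+?)(?=\s+cpuid\s*=|\n|$)")


def parse_panic(text):
    """Return (panic message or None, [(frame index, function, offset), ...])."""
    match = PANIC_RE.search(text)
    message = match.group(1).strip() if match else None
    frames = [
        (int(index), func, int(offset, 16))
        for index, _addr, func, offset in FRAME_RE.findall(text)
    ]
    return message, frames


def is_igb_refresh_panic(text):
    """True if the text shows the 'invalid cluster type' panic via igb_refresh_mbufs."""
    message, frames = parse_panic(text)
    if message is None or "invalid cluster type" not in message:
        return False
    return any(func == "igb_refresh_mbufs" for _index, func, _offset in frames)
```

Running is_igb_refresh_panic over each saved console log and counting the True results would give a rough per-fleet occurrence rate to compare before and after applying a patch. It matches on function names, not addresses, because addresses change between kernel builds.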