From: Alexander Motin
Date: Tue, 23 Apr 2013 11:09:42 +0300
To: John
Cc: FreeBSD SCSI
Subject: Re: Repeated msgs & kernel panic w/ r246437 (Revamp the CAM enclosure services driver)
Message-ID: <517641C6.7010905@FreeBSD.org>
In-Reply-To: <20130422030053.GA23186@FreeBSD.org>

On 22.04.2013 06:00, John wrote:
> Hi Folks,
>
> After updating one of our servers to the latest stable image,
> it appears that commit r246437 appears to be causing it to panic.
>
> The commit:
>
> http://svnweb.freebsd.org/base?view=revision&revision=246437
>
> What one of our servers looks like:
>
> http://people.freebsd.org/~jwd/zfsnfsserver.jpg
>
> The last known working commit:
>
> http://people.freebsd.org/~jwd/r246437/dmesg.r246431.clean.txt
>
> With commit r246437:
>
> http://people.freebsd.org/~jwd/r246437/dmesg.r246437.log.txt
>
> Note, most of the dmesg output is related to the ses devices. It
> repeats itself multiple times before the panic.
>
> ses39: ses0,pass20: Element descriptor: ' '
> ses39: ses0,pass20: SAS Expander: 24 Physses39: phy 0: connector 255 other 255
> ses39: phy 1: connector 255 other 255
> ses39: phy 2: connector 255 other 255
> ses39: phy 3: connector 255 other 255
> ses39: phy 4: connector 255 other 255
> ses39: phy 5: connector 255 other 255
> ses39: phy 6: connector 255 other 255
>
> etc, etc...

That is not my part of the code, but I think these are just overly
verbose debug messages that should be hidden.

> After just a few minutes, the system panics. A pair of images
> of the screen (sorry, no serial console at this time):
>
> Panic: http://people.freebsd.org/~jwd/r246437/20130419_160143.jpg
>
> bt: http://people.freebsd.org/~jwd/r246437/20130419_110158.jpg

Although you mention the "latest stable image", I believe your kernel is
not the latest 9-STABLE. Your backtrace reminds me of locking problems
that should already have been fixed from several sides. For example, on
present 9-STABLE ses_path_iter_devid_callback() no longer calls
xpt_create_path(); it calls xpt_create_path_unlocked() instead. If you
can reproduce the issue with the latest 9-STABLE, please provide the
corresponding information.

> We are currently running a test to see if the fact that all our
> shelves are dual-attached, allowing us to use geom multipath is
> related. ie: we have disabled the 2nd HBA thus cutting the total
> number of da & ses devices in half and thus not executing the
> code in the commit that tracks duplicate ses devices.
>
> Note, if we disable both HBA devices and boot the system up it
> does not panic or print out the repeated messages, but of course
> we have no disks :-)
>
> I am unclear on the "connector 255 other 255" messages and have not
> taken the time to look into them yet.
>
> I would appreciate any insights folks can provide.
>
> Many Thanks,
> John
>
> ps: We've had to seriously increase the console buffer size to
> capture the complete dmesg output...
>
> options MSGBUF_SIZE=(32768*32)
>
> Can we delay starting the kernel daemon until after the system
> is up and /var/log/messages is available? Just a thought...

The goal of this code was to create persistent location-dependent names
for devices. It may be better to have them earlier.

--
Alexander Motin