Date:      Tue, 23 Apr 2013 11:09:42 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        John <jwd@FreeBSD.org>
Cc:        FreeBSD SCSI <freebsd-scsi@freebsd.org>
Subject:   Re: Repeated msgs & kernel panic w/ r246437 (Revamp the CAM enclosure services driver)
Message-ID:  <517641C6.7010905@FreeBSD.org>
In-Reply-To: <20130422030053.GA23186@FreeBSD.org>
References:  <20130422030053.GA23186@FreeBSD.org>

On 22.04.2013 06:00, John wrote:
> Hi Folks,
>
>     After updating one of our servers to the latest stable image,
> it appears that commit r246437 is causing it to panic.
>
> The commit:
>
> http://svnweb.freebsd.org/base?view=revision&revision=246437
>
> What one of our servers looks like:
>
> http://people.freebsd.org/~jwd/zfsnfsserver.jpg
>
> The last known working commit:
>
> http://people.freebsd.org/~jwd/r246437/dmesg.r246431.clean.txt
>
> With commit r246437:
>
> http://people.freebsd.org/~jwd/r246437/dmesg.r246437.log.txt
>
> Note, most of the dmesg output is related to the ses devices. It
> repeats itself multiple times before the panic.
>
> ses39: ses0,pass20: Element descriptor: '            '
> ses39: ses0,pass20: SAS Expander: 24 Physses39:  phy 0: connector 255 other 255
> ses39:  phy 1: connector 255 other 255
> ses39:  phy 2: connector 255 other 255
> ses39:  phy 3: connector 255 other 255
> ses39:  phy 4: connector 255 other 255
> ses39:  phy 5: connector 255 other 255
> ses39:  phy 6: connector 255 other 255
>
> etc, etc...

That is not my part of the code, but I think these are just overly 
verbose debug messages that should be hidden.

> After just a few minutes, the system panics. A pair of images
> of the screen (sorry, no serial console at this time):
>
> Panic: http://people.freebsd.org/~jwd/r246437/20130419_160143.jpg
>
> bt: http://people.freebsd.org/~jwd/r246437/20130419_110158.jpg

Although you mention the "latest stable image", I believe your kernel 
is not the latest 9-STABLE. Your backtrace reminds me of locking 
problems that should already have been fixed in several places. For 
example, on present 9-STABLE ses_path_iter_devid_callback() no longer 
calls xpt_create_path(); it calls xpt_create_path_unlocked() instead. 
If you can reproduce the issue with the latest 9-STABLE, please provide 
the corresponding information.

> We are currently running a test to see if the fact that all our
> shelves are dual-attached, allowing us to use geom multipath is
> related. ie: we have disabled the 2nd HBA thus cutting the total
> number of da & ses devices in half and thus not executing the
> code in the commit that tracks duplicate ses devices.
>
> Note, if we disable both HBA devices and boot the system up it
> does not panic or print out the repeated messages, but of course
> we have no disks :-)
>
> I am unclear on the "connector 255 other 255" messages and have not
> taken the time to look into them yet.
>
> I would appreciate any insights folks can provide.
>
> Many Thanks,
> John
>
> ps: We've had to seriously increase the console buffer size to
> capture the complete dmesg output...
>
> options   MSGBUF_SIZE=(32768*32)
>
> Can we delay starting the kernel daemon until after the system
> is up and /var/log/messages is available?  Just a thought...

The goal of this code was to create persistent, location-dependent 
names for devices, so it may be better to have them available earlier 
rather than later.

-- 
Alexander Motin


