Date:      Fri, 19 Feb 2010 19:16:19 +0200
From:      Alexander Motin <mav@FreeBSD.org>
To:        Lawrence Stewart <lstewart@freebsd.org>
Cc:        svn-src-stable@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, svn-src-stable-8@FreeBSD.org
Subject:   Re: svn commit: r203889 - in stable/8/sys: cam cam/ata cam/scsi dev/ahci dev/asr dev/ata dev/ciss dev/hptiop dev/hptrr dev/mly dev/mpt dev/ppbus dev/siis dev/trm dev/twa dev/usb/storage
Message-ID:  <4B7EC763.4090507@FreeBSD.org>
In-Reply-To: <4B7D4962.8070706@freebsd.org>
References:  <201002141938.o1EJcRpx065470@svn.freebsd.org> <4B7D4962.8070706@freebsd.org>

Lawrence Stewart wrote:
> A couple of times it has gotten even more upset reporting things like this:
> 
> mpt0: mpt_cam_event: 0x16
> mpt0: mpt_cam_event: 0x16
> mpt0: request 0xffffff80002f1400:54058 timed out for ccb
> 0xffffff0001c65000 (req->ccb 0xffffff0001c65000)
> mpt0: attempting to abort req 0xffffff80002f1400:54058 function 0
> mpt0: request 0xffffff80002fd100:54059 timed out for ccb
> 0xffffff009f3ec800 (req->ccb 0xffffff009f3ec800)
> mpt0: request 0xffffff80002efcf0:54060 timed out for ccb
> 0xffffff0001bd2000 (req->ccb 0xffffff0001bd2000)
> mpt0: mpt_recover_commands: IOC Status 0x4a. Resetting controller.
> mpt0: mpt_cam_event: 0x0
> mpt0: mpt_cam_event: 0x0
> mpt0: completing timedout/aborted req 0xffffff80002f1400:54058
> mpt0: completing timedout/aborted req 0xffffff80002fd100:54059
> mpt0: completing timedout/aborted req 0xffffff80002efcf0:54060
> mpt0: mpt_cam_event: 0x16
> mpt0: mpt_cam_event: 0x12
> mpt0: mpt_cam_event: 0x12
> mpt0: mpt_cam_event: 0x16
> mpt0: Volume(0:2): Volume Status Changed
> mpt0: request 0xffffff80002f8990:0 timed out for ccb 0xffffff009f3cb800
> (req->ccb 0)
> 
> No ill effects are observed after such an episode and the array remains
> in healthy as-normal state. The only observable problem is the stall of
> all disk IO while these events occur.

I have no idea how the mpt driver works, nor do I have the hardware to
experiment with, but a quick look shows that event 0x12 is
MPI_EVENT_SAS_PHY_LINK_STATUS and 0x16 is MPI_EVENT_SAS_DISCOVERY.
Neither is handled by the mpt driver, so both are simply logged. I would
say something is going on at the physical level of your SAN. The
timeouts could also be the result of physical issues.
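
To see how often each event fires around a stall, the log can be tallied
by event code. What follows is only a rough sketch: the table holds just
the two codes named above, anything else is reported as unknown, and the
helper name tally_events is made up for illustration.

```python
import re
from collections import Counter

# Only the two event codes identified in this message.
# Any other code is reported as unknown rather than guessed at.
EVENT_NAMES = {
    0x12: "MPI_EVENT_SAS_PHY_LINK_STATUS",
    0x16: "MPI_EVENT_SAS_DISCOVERY",
}

# Matches lines such as "mpt0: mpt_cam_event: 0x16".
EVENT_RE = re.compile(r"mpt\d+: mpt_cam_event: (0x[0-9a-fA-F]+)")


def tally_events(lines):
    """Count mpt_cam_event log lines per event name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # not an mpt_cam_event line
        code = int(match.group(1), 16)
        counts[EVENT_NAMES.get(code, "unknown (0x%x)" % code)] += 1
    return counts


# A few lines taken from the quoted log.
sample = [
    "mpt0: mpt_cam_event: 0x16",
    "mpt0: mpt_cam_event: 0x12",
    "mpt0: mpt_cam_event: 0x12",
    "mpt0: mpt_cam_event: 0x16",
    "mpt0: mpt_cam_event: 0x0",
    "mpt0: Volume(0:2): Volume Status Changed",
]

for name, count in sorted(tally_events(sample).items()):
    print(name, count)
```

Feeding it the whole of /var/log/messages instead of the sample list
should show whether the link-status and discovery events cluster around
the timeouts.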

> As best I can tell, the hardware is ok, both disks report as fine
> without SMART errors and are only 2 months old, so wanted to rule out
> software issues. On upgrading to recent 8-STABLE, I got a page fault
> kernel panic on boot in the mpt driver mpt_raid0 kproc. After some trial
> and error, r203888 is the most recent revision that boots fine, whilst
> r203889 exhibits the page fault. I should also note that r203888 still
> sees the "mpt0: mpt_cam_event: 0x16" messages and associated disk IO
> stalls.
> 
> I compiled DDB into my r203889 kernel. Unfortunately my ILO emulates a
> USB keyboard so I can't do anything in DDB which is a huge pain, but
> here's the info I did get (hand transcribed):
> 
> Fatal trap 12: page fault while in kernel mode
> current process: mpt_raid0
> Stopped at xpt_rescan+0x1d:     movq   0x10(%rsi),%rdx
> 
> 1. Any thoughts on how to resolve the regression in the mpt driver with
> the r203889 commit?

Any thoughts on where to find a good telepath? :)

For a start, show at least the verbose boot messages up to the crash.
The full panic message could also be useful: it may show the address of
the faulting instruction, which can be resolved to a source line with
the addr2line tool. If you could find a good old PS/2 keyboard, a
backtrace would be interesting to see.
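
If the panic message gives only the symbol and offset (xpt_rescan+0x1d
in your transcript), the absolute address is the symbol's base address
plus the offset. A small sketch of putting the addr2line command
together follows; the base address and the kernel image path are
placeholders I made up, so take the real base from nm on the exact
kernel that crashed, and use an image built with debug symbols.

```python
def addr2line_argv(symbol_base, offset, kernel_image):
    """Build an addr2line command line for symbol_base + offset.

    -e names the image to read, -f also prints the function name.
    """
    fault_addr = symbol_base + offset
    return ["addr2line", "-f", "-e", kernel_image, hex(fault_addr)]


# Placeholder values, NOT taken from the crashed system:
XPT_RESCAN_BASE = 0xFFFFFFFF80200000   # hypothetical; look it up with nm
KERNEL_IMAGE = "/boot/kernel/kernel.symbols"  # assumed path to a debug image

argv = addr2line_argv(XPT_RESCAN_BASE, 0x1D, KERNEL_IMAGE)
print(" ".join(argv))
```

The printed command can then be run by hand on a machine that has the
matching kernel image.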

-- 
Alexander Motin


