Message-ID: <4BCE6016.5020108@feral.com>
Date: Tue, 20 Apr 2010 19:16:54 -0700
From: Matthew Jacob
To: freebsd-scsi@freebsd.org
References: <4BCE6988.1060302@fuujingroup.com>
In-Reply-To: <4BCE6988.1060302@fuujingroup.com>
Subject: Re: isp and scsi_target

What a mess. I need to look at this in detail. The stuff was working
(sort of) in RELENG_8, but got very little testing otherwise.

> We're trying to get an emulated disk to show up on 7.3-REL and not
> having much luck. This is a point-to-point connection with a pair of
> Qlogic cards (pciconf below). There is no FC switch in between the
> machines, and both cards were defaulted prior to testing (factory BIOS
> settings).
> The moment I rescan the bus on the initiator, the target
> machine panics and dumps core. The initiator hangs until the FC card
> on the initiator resets, then returns to the prompt (wedge??).
>
> Here's the card (same in both machines though different scsi bus)
>
> isp0@pci0:5:1:0: class=0x0c0400 card=0x00091077 chip=0x23001077
> rev=0x01 hdr=0x00
> vendor = 'QLogic Corporation'
> device = 'QLA2300 SANblade 2300 64-bit FC-AL Adapter'
> class = serial bus
> subclass = Fibre Channel
>
>
> I get tons of debugging output on the target machine when launching
> scsi_target with the following command:
>
> test001# scsi_target -d 3:0:0 /usr/home/testuser/target0
>
> Here's a snip-it of the debugging output on the target machine after
> the above command (goes on for pages):
>
> scsi_target: sending ccb (0x332)
> scsi_target: sending ccb (0x334)
> scsi_target: sending ccb (0x332)
> scsi_target: sending ccb (0x334)
> scsi_target: main loop beginning
>
> Then this when the initiator rescans the bus just before it tanks:
>
> scsi_target: read ready
> scsi_target: event -1 done
> scsi_target: Working on ATIO 0x2825c200
> scsi_target: tcmd_handle atio 0x2825c200 ctio 0x2825e0c0 atioflags 0x8000
>
> And this in the log on the initiator when it comes back up:
>
> isp0: bad pdb (110) @ handle 0x1
> isp0: 0: hdl 0x1 PROB al1 tgt 0 TGT 0x0000e8 => UNK 0x000000; WWNN
> 0x200000e08b08f56d WWPN 0x210000e08b08f56d
>
>
> Here's the relevant kernel info on the target:
>
> # ISP SCSI Controllers
> device isp # Qlogic family
> device ispfw # Firmware for QLogic HBAs
> options ISP_TARGET_MODE # Qlogic family target mode
> device targ
> device targbh
> options CAMDEBUG
> options VFS_AIO
>
> /boot/device.hints on the target:
>
> hint.isp.0.fullduplex="1"
> hint.isp.0.topology="nport-only"
> hint.isp.0.role="target"
>
> Here's the relevant kernel info on the initiator:
>
> # ISP SCSI Controllers
> device isp # Qlogic family
> device ispfw # Firmware for QLogic HBAs
> device targ
> device targbh
> options CAMDEBUG
> options VFS_AIO
>
> /boot/device.hints on the initiator:
>
> hint.isp.0.fullduplex="1"
> hint.isp.0.topology="nport-only"
> hint.isp.0.role="initiator"
> hint.isp.0.iid="4"
>
>
> I'm seeing this in the syslog on the initiator:
>
> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.5 (count 36,
> resid 36, status not marked)
> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36,
> resid 36, status not marked)
> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.5 (count 36,
> resid 36, status not marked)
> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36,
> resid 36, status not marked)
> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.5 (count 36,
> resid 36, status not marked)
> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36,
> resid 36, status not marked)
> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36,
> resid 36, status not marked)
> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.7 (count 36,
> resid 36, status not marked)
> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36,
> resid 36, status not marked)
> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.7 (count 36,
> resid 36, status not marked)
> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36,
> resid 36, status not marked)
> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.7 (count 36,
> resid 36, status not marked)
> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36,
> resid 36, status not marked)
> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.7 (count 36,
> resid 36, status not marked)
> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36,
> resid 36, status not marked)
> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.7 (count 36,
> resid 36, status not marked)
>
>
> Here's the bt for the core dump after the panic which looks to be
> pretty useless from my
> observation (I'd _love_ to be wrong!!):
>
> test001# kgdb kernel.debug /var/crash/vmcore.0
>
> Unread portion of the kernel message buffer:
> (targ0:isp0:0:0:0): targdone 0xc7b7b400
> (targ0:isp0:0:0:0): targread
> (targ0:isp0:0:0:0): targread ccb 0xc7b7b400 (0x2825c200)
> (targ0:isp0:0:0:0): targreturnccb 0xc7b7b400
> cam_debug: targfreeccb descr 0xc7b80060 and
> cam_debug: freeing ccb 0xc7b7b400
> (targ0:isp0:0:0:0): write - uio_resid 4
> (targ0:isp0:0:0:0): Sending queued ccb 0x933 (0x2825e0c0)
> (targ0:isp0:0:0:0): targstart 0xc73bd400
> (targ0:isp0:0:0:0): sendccb 0xc73bd400
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 4; apic id = 04
> fault virtual address = 0x4
> fault code = supervisor read, page not present
> instruction pointer = 0x20:0xc04f0a66
> stack pointer = 0x28:0xc6fe5900
> frame pointer = 0x28:0xc6fe5950
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 639 (scsi_target)
> trap number = 12
> panic: page fault
> cpuid = 4
> Uptime: 51s
> Physical memory: 3767 MB
> Dumping 102 MB: 87 71 55 39 23 7
>
> Reading symbols from /boot/kernel/ispfw.ko...Reading symbols from
> /boot/kernel/ispfw.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/ispfw.ko
> Reading symbols from /boot/kernel/acpi.ko...Reading symbols from
> /boot/kernel/acpi.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/acpi.ko
> #0 doadump () at pcpu.h:196
> 196 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
> (kgdb) bt
> #0 doadump () at pcpu.h:196
> #1 0xc05c4e87 in boot (howto=260) at
> /usr/src/sys/kern/kern_shutdown.c:418
> #2 0xc05c5159 in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:574
> #3 0xc08258bc in trap_fatal (frame=0xc6fe58c0, eva=4) at
> /usr/src/sys/i386/i386/trap.c:950
> #4 0xc0825b20 in trap_pfault (frame=0xc6fe58c0, usermode=0, eva=4) at
> /usr/src/sys/i386/i386/trap.c:863
> #5 0xc08264d9 in trap (frame=0xc6fe58c0) at
> /usr/src/sys/i386/i386/trap.c:541
> #6 0xc080a1db in calltrap () at /usr/src/sys/i386/i386/exception.s:166
> #7 0xc04f0a66 in isp_pci_dmasetup (isp=0xc71de000, csio=0xc73bd400,
> rq=0xc6fe59c4, nxtip=0xc6fe5a0c, optr=1) at
> /usr/src/sys/dev/isp/isp_pci.c:2781
> #8 0xc04e96a1 in isp_action (sim=0xc7198e00, ccb=0xc73bd400) at
> /usr/src/sys/dev/isp/isp_freebsd.c:1373
> #9 0xc0449104 in xpt_run_dev_sendq (bus=0xc71d65c0) at
> /usr/src/sys/cam/cam_xpt.c:3894
> #10 0xc04495ce in xpt_action (start_ccb=0xc73bd400) at
> /usr/src/sys/cam/cam_xpt.c:3056
> #11 0xc0466ee6 in targsendccb (softc=0xc744ee00, ccb=0xc73bd400,
> descr=0xc7b80020) at /usr/src/sys/cam/scsi/scsi_target.c:787
> #12 0xc0467027 in targstart (periph=0xc71cc700, start_ccb=0xc73bd400)
> at /usr/src/sys/cam/scsi/scsi_target.c:654
> #13 0xc044dd1d in xpt_run_dev_allocq (bus=0xc71d65c0) at
> /usr/src/sys/cam/cam_xpt.c:3765
> #14 0xc044e0ad in xpt_schedule (perph=0xc71cc700, new_priority=1) at
> /usr/src/sys/cam/cam_xpt.c:3665
> #15 0xc04684f4 in targwrite (dev=0xc7681000, uio=0xc6fe5c60, ioflag=0)
> at /usr/src/sys/cam/scsi/scsi_target.c:599
> #16 0xc0586359 in giant_write (dev=0xc7681000, uio=0xc6fe5c60,
> ioflag=0) at /usr/src/sys/kern/kern_conf.c:434
> #17 0xc054cbde in devfs_write_f (fp=0xc7631b94, uio=0xc6fe5c60,
> cred=0xc7681600, flags=0, td=0xc7889240) at
> /usr/src/sys/fs/devfs/devfs_vnops.c:1446
> #18 0xc05ff917 in dofilewrite (td=0xc7889240, fd=4, fp=0xc7631b94,
> auio=0xc6fe5c60, offset=-1, flags=0) at file.h:257
> #19 0xc05ffbf8 in kern_writev (td=0xc7889240, fd=4, auio=0xc6fe5c60)
> at /usr/src/sys/kern/sys_generic.c:402
> #20 0xc05ffc6f in write (td=0xc7889240, uap=0xc6fe5cfc) at
> /usr/src/sys/kern/sys_generic.c:318
> #21 0xc0825e75 in syscall (frame=0xc6fe5d38) at
> /usr/src/sys/i386/i386/trap.c:1101
> #22 0xc080a240 in Xint0x80_syscall () at
> /usr/src/sys/i386/i386/exception.s:262
> #23 0x00000033 in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)
>
> Platform is a pair of HP DL580-G3 servers, quad 2.8GHz Xeon CPU's with
> 4 gigs of ram in each (x86-32/i386, not x86-64/amd64). I've tried this
> with and without the device.hints options, all resulting in a core
> dump on the target and a hang on the initiator until the card in the
> target gets reset on reboot.
>
> Any thoughts would be great. I'd like to get a SQL server up on these
> FC cards. I understand I could use iSCSI, but the powers that be have
> requested FC.
>
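
One thing the backtrace does show: the instruction pointer in the trap
message (0xc04f0a66) is the same address as frame #7, isp_pci_dmasetup
at isp_pci.c:2781, so that is where the fault was taken. A fault virtual
address of 0x4 is what a read through a NULL structure pointer commonly
looks like, but that is only a guess until the source line is checked.
The matching step can be sketched in Python as below. The input lines
are copied from the output quoted above (wrapped frames rejoined), and
the function names faulting_pc and find_frame are made up for this
illustration; they are not part of kgdb or any FreeBSD tool.

```python
import re

# Excerpt of the trap message quoted above.
TRAP_TEXT = """\
fault virtual address = 0x4
instruction pointer = 0x20:0xc04f0a66
"""

# Excerpt of the kgdb backtrace quoted above, wrapped lines rejoined.
BACKTRACE = """\
#5 0xc08264d9 in trap (frame=0xc6fe58c0) at /usr/src/sys/i386/i386/trap.c:541
#6 0xc080a1db in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#7 0xc04f0a66 in isp_pci_dmasetup (isp=0xc71de000, csio=0xc73bd400, rq=0xc6fe59c4, nxtip=0xc6fe5a0c, optr=1) at /usr/src/sys/dev/isp/isp_pci.c:2781
#8 0xc04e96a1 in isp_action (sim=0xc7198e00, ccb=0xc73bd400) at /usr/src/sys/dev/isp/isp_freebsd.c:1373
"""


def faulting_pc(trap_text):
    """Return the offset part of 'instruction pointer = seg:offset', or None."""
    m = re.search(
        r"instruction pointer\s*=\s*0x[0-9a-f]+:(0x[0-9a-f]+)", trap_text
    )
    return int(m.group(1), 16) if m else None


def find_frame(backtrace, pc):
    """Return (frame number, function, file, line) for the frame at pc, or None."""
    frame_re = re.compile(
        r"^#(\d+)\s+(0x[0-9a-f]+) in (\w+) \(.*\) at (\S+):(\d+)$"
    )
    for line in backtrace.splitlines():
        m = frame_re.match(line)
        if m and int(m.group(2), 16) == pc:
            return int(m.group(1)), m.group(3), m.group(4), int(m.group(5))
    return None


pc = faulting_pc(TRAP_TEXT)
frame = find_frame(BACKTRACE, pc)
print(hex(pc), frame)
```

The frames above the match (calltrap, trap, trap_pfault, trap_fatal,
panic) are just the trap handling path, so the first place to look is
the matched frame and its caller, isp_action.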