From owner-freebsd-scsi@FreeBSD.ORG Sun Sep 12 21:23:01 2010
Return-Path:
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7D553106566C for ; Sun, 12 Sep 2010 21:23:01 +0000 (UTC) (envelope-from freebsd-ml@bommel.de)
Received: from mail.terralink.de (mail.terralink.de [217.9.16.16]) by mx1.freebsd.org (Postfix) with ESMTP id 445288FC14 for ; Sun, 12 Sep 2010 21:23:01 +0000 (UTC)
Received: from sulaco.terralink.de (p579A4C1A.dip.t-dialin.net [87.154.76.26]) by mail.terralink.de (Postfix) with ESMTPA id 8CE16181652 for ; Sun, 12 Sep 2010 23:05:29 +0200 (CEST)
Message-ID: <4C8D4073.9050100@bommel.de>
Date: Sun, 12 Sep 2010 23:04:51 +0200
From: Gregor Moeller
User-Agent: Thunderbird 2.0.0.23 (X11/20091021)
MIME-Version: 1.0
To: freebsd-scsi@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: lost iscsi devices not recognised by zfs
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 12 Sep 2010 21:23:01 -0000

Hi folks,

given the following setup:

iscsi initiator: 2.2.4.2, 8.1-STABLE (tried also 8.1-PRERELEASE with initiator 2.1.0)
iscsi target: 8.1-STABLE as of 2010-09-12

ZFS pool on the initiator box:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            da0     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            da1     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            da2     ONLINE       0     0     0

ada1, ada2, ada3: local disks
da0, da1, da2: iscsi disks

A loss of all iscsi devices shouldn't render the pool unusable due to the mirroring.
Problem: When I stop the iscsi target service to simulate an error on the target box (reboot etc.), the initiator recognises this:

iscontrol[5634]: trapped signal 30
iscontrol[5628]: trapped signal 30
iscontrol[5638]: trapped signal 30
iscontrol: supervise going down
iscontrol: supervise going down
iscontrol[5628]: sess flags=2000040d
iscontrol[5634]: sess flags=2000040d
iscontrol[5628]: Reconnect
iscontrol[5634]: Reconnect
iscontrol: supervise going down
iscontrol[5638]: sess flags=2000040d
iscontrol[5638]: Reconnect
recvpdu: Socket is not connected
recvpdu failed
iscontrol[5638]: terminated
recvpdu: Socket is not connected
recvpdu failed
iscontrol[5634]: terminated
recvpdu: Socket is not connected
recvpdu failed
iscontrol[5628]: terminated

and shortly after this I see:

(da2:iscsi2:0:0:0): lost device
(da1:iscsi1:0:0:0): lost device
(da0:iscsi0:0:0:0): lost device

But somehow ZFS does not recognise the lost devices. The devices remain in ZFS "online" status with some r/w errors reported. The pool is unusable though, any r/w access hangs, as does a zpool detach tank da0 or a ls /tank.

If I restart the target service after some minutes, the initiator doesn't reconnect although the processes are running:

root  5628  0.0  0.0  9200  1616  ??  DEs   7:19PM   0:00.00 iscontrol -c /etc/iscsi/disk1.conf -n disk1
root  5634  0.0  0.0  9200  1616  ??  DEs   7:19PM   0:00.00 iscontrol -c /etc/iscsi/disk2.conf -n disk2
root  5638  0.0  0.0  9200  1616  ??  DEs   7:19PM   0:00.00 iscontrol -c /etc/iscsi/disk3.conf -n disk3

I don't know if this issue is related to the initiator (my guess) or ZFS or some other component (maybe even my misunderstanding of some concepts) so I kindly ask if someone can give me a hint?
Best regards,
Gregor

From owner-freebsd-scsi@FreeBSD.ORG Mon Sep 13 11:07:02 2010
Return-Path:
Delivered-To: freebsd-scsi@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C910E10656F0 for ; Mon, 13 Sep 2010 11:07:02 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id B47258FC24 for ; Mon, 13 Sep 2010 11:07:02 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o8DB723F001994 for ; Mon, 13 Sep 2010 11:07:02 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o8DB72wq001992 for freebsd-scsi@FreeBSD.org; Mon, 13 Sep 2010 11:07:02 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 13 Sep 2010 11:07:02 GMT
Message-Id: <201009131107.o8DB72wq001992@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-scsi@FreeBSD.org
Cc:
Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 13 Sep 2010 11:07:03 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases.

S Tracker       Resp.      Description
--------------------------------------------------------------------------------
o kern/149502   scsi       [mpt] Latent buglet in debug print code
o kern/148785   scsi       [twa] [patch] twa driver doesn't pass proper max. io s
o kern/148083   scsi       [aac] Strange device reporting
o kern/147704   scsi       [mpt] sys/dev/mpt: new chip revision, partially unsupp
o kern/146287   scsi       [ciss] ciss(4) cannot see more than one SmartArray con
o kern/145768   scsi       [mpt] can't perform I/O on SAS based SAN disk in freeb
o kern/144648   scsi       [aac] Strange values of speed and bus width in dmesg
o kern/144301   scsi       [ciss] [hang] HP proliant server locks when using ciss
o kern/142351   scsi       [mpt] LSILogic driver performance problems
o kern/141934   scsi       [cam] [patch] add support for SEAGATE DAT Scopion 130
o kern/134488   scsi       [mpt] MPT SCSI driver probes max. 8 LUNs per device
o kern/132250   scsi       [ciss] ciss driver does not support more then 15 drive
o kern/132206   scsi       [mpt] system panics on boot when mirroring and 2nd dri
p kern/130735   scsi       [cam] [patch] pass M_NOWAIT to the malloc() call insid
o kern/130621   scsi       [mpt] tranfer rate is inscrutable slow when use lsi213
o kern/129602   scsi       [ahd] ahd(4) gets confused and wedges SCSI bus
o kern/128452   scsi       [sa] [panic] Accessing SCSI tape drive randomly crashe
o kern/128245   scsi       [scsi] "inquiry data fails comparison at DV1 step" [re
o kern/127927   scsi       [isp] isp(4) target driver crashes kernel when set up
o kern/124667   scsi       [amd] [panic] FreeBSD-7 kernel page faults at amd-scsi
o kern/123674   scsi       [ahc] ahc driver dumping
o sparc/121676  scsi       [iscsi] iscontrol do not connect iscsi-target on sparc
o kern/120487   scsi       [sg] scsi_sg incompatible with scanners
o kern/120247   scsi       [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s
o kern/119668   scsi       [cam] [patch] certain errors are too verbose comparing
o kern/114597   scsi       [sym] System hangs at SCSI bus reset with dual HBAs
o kern/110847   scsi       [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/99954    scsi       [ahc] reading from DVD failes on 6.x [regression]
o kern/94838    scsi       Kernel panic while mounting SD card with lock switch o
o kern/92798    scsi       [ahc] SCSI problem with timeouts
o kern/90282    scsi       [sym] SCSI bus resets cause loss of ch device
o kern/76178    scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/74627    scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
s kern/61165    scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/60641    scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
o kern/60598    scsi       wire down of scsi devices conflicts with config
s kern/57398    scsi       [mly] Current fails to install on mly(4) based RAID di
o kern/52638    scsi       [panic] SCSI U320 on SMP server won't run faster than
o kern/44587    scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/40895    scsi       wierd kernel / device driver bug
o kern/39388    scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/35234    scsi       World access to /dev/pass? (for scanner) requires acce

42 problems total.
From owner-freebsd-scsi@FreeBSD.ORG Mon Sep 13 23:17:29 2010
Return-Path:
Delivered-To: freebsd-scsi@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 647CD1065670; Mon, 13 Sep 2010 23:17:29 +0000 (UTC) (envelope-from delphij@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 3A85A8FC14; Mon, 13 Sep 2010 23:17:29 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o8DNHTvY058603; Mon, 13 Sep 2010 23:17:29 GMT (envelope-from delphij@freefall.freebsd.org)
Received: (from delphij@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o8DNHSoD058599; Mon, 13 Sep 2010 23:17:28 GMT (envelope-from delphij)
Date: Mon, 13 Sep 2010 23:17:28 GMT
Message-Id: <201009132317.o8DNHSoD058599@freefall.freebsd.org>
To: lampa@fit.vutbr.cz, delphij@FreeBSD.org, freebsd-scsi@FreeBSD.org, delphij@FreeBSD.org
From: delphij@FreeBSD.org
Cc:
Subject: Re: kern/148785: [twa] [patch] twa driver doesn't pass proper max. io size to cam
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 13 Sep 2010 23:17:29 -0000

Synopsis: [twa] [patch] twa driver doesn't pass proper max. io size to cam

State-Changed-From-To: open->patched
State-Changed-By: delphij
State-Changed-When: Mon Sep 13 23:16:56 UTC 2010
State-Changed-Why:
This issue has been fixed in HEAD and RELENG_8.

Responsible-Changed-From-To: freebsd-scsi->delphij
Responsible-Changed-By: delphij
Responsible-Changed-When: Mon Sep 13 23:16:56 UTC 2010
Responsible-Changed-Why:
Take.
http://www.freebsd.org/cgi/query-pr.cgi?pr=148785

From owner-freebsd-scsi@FreeBSD.ORG Tue Sep 14 16:12:25 2010
Return-Path:
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7D68210656A8 for ; Tue, 14 Sep 2010 16:12:25 +0000 (UTC) (envelope-from niklas@saers.com)
Received: from mail-ew0-f54.google.com (mail-ew0-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id 198F58FC23 for ; Tue, 14 Sep 2010 16:12:24 +0000 (UTC)
Received: by ewy4 with SMTP id 4so3678384ewy.13 for ; Tue, 14 Sep 2010 09:12:24 -0700 (PDT)
Received: by 10.213.17.7 with SMTP id q7mr212708eba.23.1284480743985; Tue, 14 Sep 2010 09:12:23 -0700 (PDT)
Received: from [172.10.20.2] ([109.56.201.224]) by mx.google.com with ESMTPS id a48sm462548eei.12.2010.09.14.09.12.12 (version=TLSv1/SSLv3 cipher=RC4-MD5); Tue, 14 Sep 2010 09:12:22 -0700 (PDT)
Mime-Version: 1.0 (Apple Message framework v1081)
Content-Type: text/plain; charset=us-ascii
From: Niklas Saers
In-Reply-To: <4C80A52A.5080300@darkbsd.org>
Date: Tue, 14 Sep 2010 18:11:56 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <6D86094C-866F-4C7A-92CD-BD7A8B570BEB@saers.com>
References: <4C80A52A.5080300@darkbsd.org>
To: Stephane LAPIE
X-Mailer: Apple Mail (2.1081)
Cc: freebsd-scsi@freebsd.org, =?iso-8859-1?Q?Dag-Erling_Sm=F8rgrav?=
Subject: Re: mpt0 and removing disks
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 14 Sep 2010 16:12:25 -0000

Hi Stephane,

I thought you'd be interested in hearing the follow-up: we've now got an mfi controller (LSI MegaSAS Gen2) instead. I'm looking forward to trying it out, but the cable turned out to be too short, so it'll be interesting to see how it works out once we get the cable.
:-)

Cheers

   Nik

From owner-freebsd-scsi@FreeBSD.ORG Thu Sep 16 18:12:28 2010
Return-Path:
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 942ED106566C for ; Thu, 16 Sep 2010 18:12:28 +0000 (UTC) (envelope-from PMahan@adaranet.com)
Received: from barracuda.adaranet.com (smtp.adaranet.com [72.5.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 7C8928FC0A for ; Thu, 16 Sep 2010 18:12:28 +0000 (UTC)
X-ASG-Debug-ID: 1284659816-5061225e0001-NzfR5x
Received: from SJ-EXCH-1.adaranet.com ([10.10.1.29]) by barracuda.adaranet.com with ESMTP id rkED9r3SxhqRVfBp for ; Thu, 16 Sep 2010 10:56:56 -0700 (PDT)
X-Barracuda-Envelope-From: PMahan@adaranet.com
Received: from mycroft.adaranet.com (10.10.24.100) by SJ-EXCH-1.adaranet.com (10.10.1.29) with Microsoft SMTP Server (TLS) id 8.1.240.5; Thu, 16 Sep 2010 10:56:56 -0700
Message-ID: <4C925B75.5090305@adaranet.com>
X-Barracuda-BBL-IP: nil
Date: Thu, 16 Sep 2010 11:01:25 -0700
From: Patrick Mahan
User-Agent: Thunderbird 2.0.0.23 (X11/20091021)
MIME-Version: 1.0
To:
X-ASG-Orig-Subj: [Fwd: Crash dump on HP Proliant G6 broken as of V8.0]
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
X-Barracuda-Connect: UNKNOWN[10.10.1.29]
X-Barracuda-Start-Time: 1284659816
X-Barracuda-URL: http://172.16.10.203:8000/cgi-mod/mark.cgi
X-Virus-Scanned: by bsmtpd at adaranet.com
Subject: [Fwd: Crash dump on HP Proliant G6 broken as of V8.0]
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 16 Sep 2010 18:12:28 -0000

Originally posted this to freebsd-hackers, now posting it to freebsd-scsi.

Any help is appreciated.
Thanks,

Patrick

-------- Original Message --------
Subject: Crash dump on HP Proliant G6 broken as of V8.0
Date: Wed, 15 Sep 2010 11:53:16 -0700
From: Paul Heyman
To: freebsd-hackers@freebsd.org
CC: Patrick Mahan
References: <32AB5C9615CC494997D9ABB1DB12783C024C8C5A95@SJ-EXCH-1.adaranet.com>,<32AB5C9615CC494997D9ABB1DB12783C024C8DE83F@SJ-EXCH-1.adaranet.com>,<32AB5C9615CC494997D9ABB1DB12783C024C8C5A9C@SJ-EXCH-1.adaranet.com>

ALL,

The crash dump worked fine in V7.3.

I am debugging a crash dump problem on an HP Proliant G6 which uses a SATA drive connected to a CISS Raid Controller. I have tried this on an x86 box using a non-raid ATA/SATA disk controller and it works well.

I noticed that in V8.0 there is a new SCSI operating method. In the v7.3 version there was only CISS_TRANSPORT_METHOD_SIMPLE, but in v8.0 a CISS_TRANSPORT_METHOD_PERF method has been added. These methods have different function calls in ciss_poll_request.

The dump command starts with a call to dadump. This function will set up a struct ccb_scsiio structure. This is done by calling scsi_read_write. Then the meat of the dump happens when it calls xpt_polled_action, which manages and simulates interrupt functionality that is working fine. The disk operations work fine except during a crash dump.

I have turned debug on for CISS and CAMDEBUG to debug this problem.

In xpt_polled_action (cam_xpt.c) we get past the first polling loop at line 3013, as both devq->send_opening and dev->ccbq.dev_openings are > 0 (256 and 254). But we do get stuck in the second one at line 3025. We eventually time out, setting start_ccb->ccb_h.status to CAM_CMD_TIMEOUT. The timeout is set with DA_DEFAULT_TIMEOUT (scsi_da.c), which is set to 60 and is used in the call to scsi_read_write.
Here is the debug trace:

Dumping 1240 MB:
ciss_cam_action_io: XPT_SCSI_IO 0:0:0
ciss_get_request: called
ciss_start: post command 150 tag 600
ciss_map_request: called
ciss_request_map_helper: called
ciss_cam_poll: called
ciss_perf_done: completed command 150
ciss_perf_done: completed command 150
ciss_complete: called
ciss_unmap_request: called
ciss_cam_complete: called
_ciss_report_request: called
ciss_cam_complete: SCSI_STATUS_OK
ciss_release_request: called
ciss_complete: called
ciss_unmap_request: called
ciss0: WARNING: completing non-busy request
ciss_cam_complete: called
_ciss_report_request: called
ciss_cam_complete: SCSI_STATUS_OK
. . . . after about 60 seconds
ciss0: WARNING: completing non-busy request
ciss0: WARNING: completed command with no submitter
ciss_unmap_request: called
. . . This goes on forever

Thanks
Paul

Paul Heyman
pheyman@adaranetworks.com
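
For readers following the thread, the pattern in the message above (a polled wait that gives up after a timeout, and a trace in which command 150 is reported complete twice with the second pass logging "completing non-busy request") can be sketched in a few lines of C. This is only an illustration under stated assumptions: it is not the ciss(4) or CAM code, every name in it (struct request, request_complete, polled_wait and the two poll routines) is hypothetical, and it makes no claim about the actual cause of the hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from ciss(4) or CAM. */
enum req_status { REQ_INPROG, REQ_DONE, REQ_TIMEOUT };

struct request {
    enum req_status status;   /* what the polling caller watches */
    bool busy;                /* true while the command is outstanding */
    unsigned completions;     /* completions that were accepted */
    unsigned spurious;        /* completions rejected as non-busy */
};

/* Mark a request as submitted and outstanding. */
static void request_submit(struct request *r)
{
    r->status = REQ_INPROG;
    r->busy = true;
    r->completions = 0;
    r->spurious = 0;
}

/* Accept a completion only for a busy request; count anything else. */
static bool request_complete(struct request *r)
{
    if (!r->busy) {
        r->spurious++;        /* comparable to a "non-busy request" warning */
        return false;
    }
    r->busy = false;
    r->status = REQ_DONE;
    r->completions++;
    return true;
}

typedef void (*poll_fn)(struct request *);

/* Poll until the status leaves REQ_INPROG or the tick budget runs out. */
static enum req_status polled_wait(struct request *r, poll_fn poll,
                                   unsigned ticks)
{
    assert(r != NULL && poll != NULL);
    for (unsigned i = 0; i < ticks && r->status == REQ_INPROG; i++)
        poll(r);
    if (r->status == REQ_INPROG)
        r->status = REQ_TIMEOUT;
    return r->status;
}

/* A poll routine that reports the same command twice in one pass. */
static void poll_reports_twice(struct request *r)
{
    request_complete(r);
    request_complete(r);
}

/* A poll routine whose completion never reaches the waiter. */
static void poll_reports_nothing(struct request *r)
{
    (void)r;
}
```

Calling polled_wait() with poll_reports_twice ends the wait on the first tick and records one accepted and one rejected completion, while poll_reports_nothing runs out the budget and leaves the request marked as timed out. The busy flag is the design point: it lets a duplicate completion be detected and counted instead of being processed a second time.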