From owner-freebsd-scsi@FreeBSD.ORG Mon Jul 1 11:06:53 2013
Delivered-To: freebsd-scsi@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id ED01265B; Mon, 1 Jul 2013 11:06:53 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id DDC3F10EA; Mon, 1 Jul 2013 11:06:53 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r61B6rFN085911; Mon, 1 Jul 2013 11:06:53 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r61B6rgc085909 for freebsd-scsi@FreeBSD.org; Mon, 1 Jul 2013 11:06:53 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 1 Jul 2013 11:06:53 GMT
Message-Id: <201307011106.r61B6rgc085909@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-scsi@FreeBSD.org
Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: SCSI subsystem
X-List-Received-Date: Mon, 01 Jul 2013 11:06:54 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases.

S Tracker      Resp.  Description
--------------------------------------------------------------------------------
o kern/179932  scsi   [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP
o kern/178795  scsi   [mps] MSI for mps driver doesn't work under vmware
o kern/165982  scsi   [mpt] mpt instability, drive resets, and losses on Fre
o kern/165740  scsi   [cam] SCSI code must drain callbacks before free
f kern/162256  scsi   [mpt] QUEUE FULL EVENT and 'mpt_cam_event: 0x0'
o docs/151336  scsi   Missing documentation of scsi_ and ata_ functions in c
o kern/148083  scsi   [aac] Strange device reporting
o kern/144648  scsi   [aac] Strange values of speed and bus width in dmesg
o kern/142351  scsi   [mpt] LSILogic driver performance problems
o kern/134488  scsi   [mpt] MPT SCSI driver probes max. 8 LUNs per device
o kern/130621  scsi   [mpt] tranfer rate is inscrutable slow when use lsi213
f kern/129602  scsi   [ahd] ahd(4) gets confused and wedges SCSI bus
f kern/123674  scsi   [ahc] ahc driver dumping
o sparc/121676 scsi   [iscsi] iscontrol do not connect iscsi-target on sparc

14 problems total.

From owner-freebsd-scsi@FreeBSD.ORG Mon Jul 1 11:40:01 2013
Date: Mon, 1 Jul 2013 11:40:01 GMT
Message-Id: <201307011140.r61Be1Gx095459@freefall.freebsd.org>
To: freebsd-scsi@FreeBSD.org
From: Markus Gebert
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Reply-To: Markus Gebert

The following reply was made to PR kern/179932; it has been noted by GNATS.

From: Markus Gebert
To: bug-followup@FreeBSD.org, Philipp Mächler, "sean_bruno@yahoo.com"
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Date: Fri, 28 Jun 2013 15:23:25 +0200

I already replied to Sean last Friday, but the copy has not made its way into the PR system, probably due to the size of the alltrace attachment.
So here's a dropbox link:
https://dl.dropboxusercontent.com/u/10669369/alltrace%20G8%20PERF%209.1ciss.txt

And the email I sent:

----

Hi again

One of the G8 blades we had booted back into PERF mode yesterday stalled last night. This one was _not_ patched with the ciss changes from head, so it was the plain 9.1 driver running in default PERF mode. This means that as soon as we restore the original default configuration, the stalls come back quite quickly. Of course we still don't know whether it is the patch, SIMPLE mode, or the combination of both that seems to help the other systems. Hopefully we'll know that soon.

Anyway, we got an alltrace from the server that stalled. Unfortunately there is no ciss debug output, since it was not patched.

Markus

----

From owner-freebsd-scsi@FreeBSD.ORG Wed Jul 3 03:20:02 2013
Date: Wed, 3 Jul 2013 03:20:02 GMT
Message-Id: <201307030320.r633K2eQ000255@freefall.freebsd.org>
To: freebsd-scsi@FreeBSD.org
From: Sean Bruno
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Reply-To: Sean Bruno

The following reply was made to PR kern/179932; it has been noted by GNATS.

From: Sean Bruno
To: bug-followup@FreeBSD.org, philipp.maechler@hostpoint.ch
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Date: Tue, 02 Jul 2013 20:13:38 -0700

While I wait for a p420 and p410 to test your configuration, I set up a dl180g6 with a p400 in a test configuration. I've had some interesting results.

I set up two disks in a RAID1+0 volume for root, swap and UFS. I created a 10 disk raidz2 volume named "ztest":

--------------------------------------------------------------------------------
bash-4.2# swapinfo; df -k; zpool status
Device          512-blocks     Used    Avail Capacity
/dev/da0p3        16777216        0 16777216     0%
--------------------------------------------------------------------------------
Filesystem 1024-blocks    Used      Avail Capacity  Mounted on
/dev/da0p2     4058430 1256338    2477418    34%    /
devfs                1       1          0   100%    /dev
/dev/da0p4   245197222 6081304  219500142     3%    /home
procfs               4       4          0   100%    /proc
ztest       1118637925      67 1118637858     0%    /ztest
--------------------------------------------------------------------------------
  pool: ztest
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ztest       ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da7     ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da9     ONLINE       0     0     0
            da10    ONLINE       0     0     0

errors: No known data errors
--------------------------------------------------------------------------------

I find that as we reach maximum capacity of /ztest, zfs really starts bogging down.

I'm running iozone with the following parameters:

--------------------------------------------------------------------------------
iozone -s 50G -r 256K -+q 30 -i 0 -i 1 -R -t 12 -F /ztest/1 /ztest/2 /ztest/3 /ztest/4 /ztest/5 /ztest/6 /ztest/7 /ztest/8 /ztest/9 /ztest/10 /ztest/11 /ztest/12
--------------------------------------------------------------------------------

This would be a good start to more reliably test your configuration. I didn't bother doing gpt partitions and the geom partitioning scheme in this configuration because it's not bootable.

Sean

From owner-freebsd-scsi@FreeBSD.ORG Fri Jul 5 08:30:01 2013
Date: Fri, 5 Jul 2013 08:30:01 GMT
Message-Id: <201307050830.r658U1X4013653@freefall.freebsd.org>
To: freebsd-scsi@FreeBSD.org
From: Markus Gebert
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Reply-To: Markus Gebert

The following reply was made to PR kern/179932; it has been noted by GNATS.

From: Markus Gebert
To: bug-followup@FreeBSD.org, Philipp Mächler, "sean_bruno@yahoo.com"
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Date: Fri, 5 Jul 2013 10:19:58 +0200

Hey Sean

I'm glad to hear you're getting the same controller as ours to test. In the meantime, the backported ciss changes from head seem to help a lot on the G8 blades with the p220 controllers. It's quite likely that the G8 problem is already fixed in head. Of course, we can't be sure yet, but still it might be better to focus on the G7 with p410 and storage blade, where the issue has occurred even with ciss from head. So it's good you're getting a p410.

We discussed your test scenario. ZFS is known to go nuts and do a great deal of IO once a zpool gets quite full, so is your goal just to maximise IO to reproduce the problem more reliably? Or is there a specific reason why you want us to fill a zpool?

Our problem is that half of the G7 blades are in production, so filling the zpool is no option there.
The second half is where the first half replicates all data to, so they're kind of a hot standby and we're more flexible doing tests there, but we still have to keep the replication running, which makes filling the pool impossible as well.

The day before yesterday we installed the patched kernel that has ciss from head and CISS_DEBUG defined on all these standby systems. We run zpool scrubs non-stop on all of them to generate IO, and as they are replication targets, they also receive some amount of write IO. That way, we hope to get a system to stall more often, so we can progress more quickly debugging the G7 problem. If you think that more write IO would help, we can look into using iozone, but as stated before, we won't be able to do things like filling the zpool.

Also, once a G7 blade stalls, is there any information apart from alltrace and the DDB ciss debug print you want us to pull out of the system?

When reading through the ciss driver source I noticed that the DDB print may only output information about the first controller. Since the storage blade contains a second p410, do you think it'd be worth altering the debug function to print out information about any ciss controller in the system?

Markus

From owner-freebsd-scsi@FreeBSD.ORG Fri Jul 5 12:11:07 2013
From: Borja Marcos
Subject: -STABLE: scsi_da.c and ATA on SAS
Date: Fri, 5 Jul 2013 14:01:55 +0200
Message-Id: <779CB2D3-C7D3-410E-BF7B-0BB931CF7E10@sarenet.es>
To: freebsd-scsi@freebsd.org

Hi

I am trying to update to -STABLE (as of today) in order to test the new trim facility for ZFS.

In my system I have a mix of SAS and SATA drives on a SAS backplane:

# camcontrol devlist
at scbus6 target 8 lun 0 (da0,pass0)
at scbus6 target 9 lun 0 (da1,pass1)
at scbus6 target 10 lun 0 (da2,pass2)
at scbus6 target 11 lun 0 (da3,pass3)
at scbus6 target 12 lun 0 (da4,pass4)
at scbus6 target 13 lun 0 (da5,pass5)
at scbus6 target 14 lun 0 (da6,pass6)
at scbus6 target 15 lun 0 (da7,pass7)
at scbus6 target 16 lun 0 (da8,pass8)
at scbus6 target 17 lun 0 (da9,pass9)
at scbus6 target 18 lun 0 (da10,pass10)
at scbus6 target 19 lun 0 (da11,pass11)
at scbus6 target 20 lun 0 (da12,pass12)
at scbus6 target 21 lun 0 (da13,pass13)
< OCZ-VERTEX4 1.5> at scbus6 target 22 lun 0 (da14,pass14)
at scbus6 target 23 lun 0 (da15,pass15)
at scbus8 target 0 lun 0 (ses0,pass16)
at scbus8 target 1 lun 0 (ses1,pass17)
at scbus8 target 2 lun 0 (ses2,pass18)
at scbus16 target 0 lun 0 (pass19,cd0)

So far, the system worked perfectly. It's a Sun server with an "aac" raid card, and I applied a patch to aac_cam.c so that disks can be directly attached to the "da" driver instead of creating "JBOD" volumes for the disks, which is error prone and silly using ZFS anyway.

It has been working flawlessly, even though I had to add a "quirk" to scsi_da.c so that it won't try a read_capacity(16) on the OCZ disk.

However, after updating to -STABLE yesterday I can't get the SSD to be recognized:

(da14:aacp0:0:22:0): got CAM status 0x84
(da14:aacp0:0:22:0): fatal error, failed to attach to device
(da14:aacp0:0:22:0): lost device - 0 outstanding, 5 refs
(da14:aacp0:0:22:0): removing device entry

root@rasputin:/pool/usrsrc/sys/dev/aac # camcontrol reset 0:22:0
camcontrol: cam_open_btl: no passthrough device found at 0:22:0
root@rasputin:/pool/usrsrc/sys/dev/aac #

The only differences between the OCZ and the other disks are:

- It's an SSD
- It's a SATA disk, hence it's speaking SATA-on-SAS (the other disks are SAS)

Any clues at all?
I see there are plenty of changes in scsi_da.c and I am not that familiar with its code after all :)

Borja.

From owner-freebsd-scsi@FreeBSD.ORG Fri Jul 5 12:30:02 2013
Date: Fri, 5 Jul 2013 12:30:01 GMT
Message-Id: <201307051230.r65CU15d060678@freefall.freebsd.org>
To: freebsd-scsi@FreeBSD.org
From: Markus Gebert
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Reply-To: Markus Gebert

The following reply was made to PR kern/179932; it has been noted by GNATS.

From: Markus Gebert
To: bug-followup@FreeBSD.org, Philipp Mächler, "sean_bruno@yahoo.com"
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Date: Fri, 5 Jul 2013 14:28:42 +0200

Hey Sean

Two of the G7 blades stalled today, that was quick… Here's an alltrace and ciss debug output for each:

First stall:
https://dl.dropboxusercontent.com/u/10669369/fbsd%20ciss%20iostall%20debug/20130705%20-%20G7%20crash%201%20(12s)/20130705%20-%201%20-%20alltrace.txt
https://dl.dropboxusercontent.com/u/10669369/fbsd%20ciss%20iostall%20debug/20130705%20-%20G7%20crash%201%20(12s)/20130705%20-%201%20-%20cissdebug.txt

Second stall:
https://dl.dropboxusercontent.com/u/10669369/fbsd%20ciss%20iostall%20debug/20130705%20-%20G7%20crash%202%20(16s)/20130705%20-%202%20-%20alltrace.txt
https://dl.dropboxusercontent.com/u/10669369/fbsd%20ciss%20iostall%20debug/20130705%20-%20G7%20crash%202%20(16s)/20130705%20-%202%20-%20cissdebug.txt

As said, the debug information is for ciss0 only, because the driver does not iterate through all controllers when printing debug information. Also, if you need more information when this happens the next time, please let us know.

Markus

From owner-freebsd-scsi@FreeBSD.ORG Fri Jul 5 13:22:49 2013
Message-ID: <6CCD8BF318A741518CAC3E674D658B44@multiplay.co.uk>
From: "Steven Hartland"
To: "Borja Marcos"
References: <779CB2D3-C7D3-410E-BF7B-0BB931CF7E10@sarenet.es>
Subject: Re: -STABLE: scsi_da.c and ATA on SAS
Date: Fri, 5 Jul 2013 14:22:57 +0100

There was a commit to stable/9 for aac yesterday:

Check out:
http://svnweb.freebsd.org/changeset/base/252778

----- Original Message -----
From: "Borja Marcos"
Sent: Friday, July 05, 2013 1:01 PM
Subject: -STABLE: scsi_da.c and ATA on SAS

> Hi
>
> I am trying to update to -STABLE (as of today) in order to test the new trim facility for ZFS.
>
> In my system I have a mix of SAS and SATA drives on a SAS backplane:
>
> # camcontrol devlist
> at scbus6 target 8 lun 0 (da0,pass0)
> at scbus6 target 9 lun 0 (da1,pass1)
> at scbus6 target 10 lun 0 (da2,pass2)
> at scbus6 target 11 lun 0 (da3,pass3)
> at scbus6 target 12 lun 0 (da4,pass4)
> at scbus6 target 13 lun 0 (da5,pass5)
> at scbus6 target 14 lun 0 (da6,pass6)
> at scbus6 target 15 lun 0 (da7,pass7)
> at scbus6 target 16 lun 0 (da8,pass8)
> at scbus6 target 17 lun 0 (da9,pass9)
> at scbus6 target 18 lun 0 (da10,pass10)
> at scbus6 target 19 lun 0 (da11,pass11)
> at scbus6 target 20 lun 0 (da12,pass12)
> at scbus6 target 21 lun 0 (da13,pass13)
> < OCZ-VERTEX4 1.5> at scbus6 target 22 lun 0 (da14,pass14)
> at scbus6 target 23 lun 0 (da15,pass15)
> at scbus8 target 0 lun 0 (ses0,pass16)
> at scbus8 target 1 lun 0 (ses1,pass17)
> at scbus8 target 2 lun 0 (ses2,pass18)
> at scbus16 target 0 lun 0 (pass19,cd0)
>
> So far, the system worked perfectly. It's a Sun server with an "aac" raid card, and I applied a patch to aac_cam.c so that
> disks can be directly attached to the "da" driver instead of creating "JBOD" volumes for the disks, which is error prone and
> silly using ZFS anyway.
>
> It has been working flawlessly, even though I had to add a "quirk" to scsi_da.c so that it won't try
> a read_capacity(16) on the OCZ disk.
>
> However, after updating to -STABLE yesterday I can't get the SSD to be recognized:
>
> (da14:aacp0:0:22:0): got CAM status 0x84
> (da14:aacp0:0:22:0): fatal error, failed to attach to device
> (da14:aacp0:0:22:0): lost device - 0 outstanding, 5 refs
> (da14:aacp0:0:22:0): removing device entry
>
> root@rasputin:/pool/usrsrc/sys/dev/aac # camcontrol reset 0:22:0
> camcontrol: cam_open_btl: no passthrough device found at 0:22:0
> root@rasputin:/pool/usrsrc/sys/dev/aac #
>
> The only differences between the OCZ and the other disks are:
>
> - It's an SSD
> - It's a SATA disk, hence it's speaking SATA-on-SAS (the other disks are SAS)
>
> Any clues at all? I see there are plenty of changes in scsi_da.c and I am not that familiar with its
> code after all :)
>
> Borja.

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.

From owner-freebsd-scsi@FreeBSD.ORG Fri Jul 5 13:42:58 2013
From: "Steven Hartland"
To: "Markus Gebert"
References: <201307051230.r65CU15d060678@freefall.freebsd.org>
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Date: Fri, 5 Jul 2013 14:43:03 +0100

You might also want to get the output from "show sleepchain" for all threads too, as that will easily identify sleep lock deadlocks.

Also, what's the check_disk process?

Regards
Steve

From owner-freebsd-scsi@FreeBSD.ORG Fri Jul 5 14:15:10 2013
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
From: Markus Gebert
Date: Fri, 5 Jul 2013 16:14:26 +0200
References: <201307051230.r65CU15d060678@freefall.freebsd.org>
To: Steven Hartland
Cc: freebsd-scsi@FreeBSD.org, "bug-followup@FreeBSD.org"

Hey Steven

Thanks for your input.

On 05.07.2013, at 15:43, Steven Hartland wrote:

> Might also want to get the output from "show sleepchain" for all threads
> too as that will easily identify sleep lock dead locks.

Is there an easy way to do this for all threads with one command? The first server that crashed had 800 threads… If not, we should probably script this outside of ddb using thread ids from the alltrace output. Or is there a subset of threads you're particularly interested in?

> Also whats the check_disk process?

This is Nagios' check_disk plugin, which we use to check the filesystem usage on all mountpoints. It runs quite frequently, which is why several instances may get started before we notice and break into the debugger.

Markus

From owner-freebsd-scsi@FreeBSD.ORG Fri Jul 5 14:20:01 2013
Date: Fri, 5 Jul 2013 14:20:00 GMT
Message-Id: <201307051420.r65EK03e083946@freefall.freebsd.org>
To: freebsd-scsi@FreeBSD.org
From: Markus Gebert
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Reply-To: Markus Gebert

The following reply was made to PR kern/179932; it has been noted by GNATS.

From: Markus Gebert
To: Steven Hartland
Cc: freebsd-scsi@FreeBSD.org, "bug-followup@FreeBSD.org"
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Date: Fri, 5 Jul 2013 16:14:26 +0200

Hey Steven

Thanks for your input.

On 05.07.2013, at 15:43, Steven Hartland wrote:

> Might also want to get the output from "show sleepchain" for all threads
> too as that will easily identify sleep lock dead locks.

Is there an easy way to do this for all threads with one command? The first server that crashed had 800 threads… If not, we should probably script this outside of ddb using thread ids from the alltrace output. Or is there a subset of threads you're particularly interested in?

> Also whats the check_disk process?

This is Nagios' check_disk plugin, which we use to check the filesystem usage on all mountpoints. It runs quite frequently, which is why several instances may get started before we notice and break into the debugger.

Markus
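
The idea raised in the last message, scripting "show sleepchain" outside of ddb from the thread ids in a saved alltrace capture, could look roughly like the sketch below. It is only an illustration under stated assumptions: it assumes each thread in the capture is introduced by a line of the form "Tracing command <name> pid <pid> tid <tid> td <addr>", and that "show sleepchain" accepts a thread id as its argument (how DDB interprets that number is worth checking on the target system). The function name sleepchain_commands, its only_command filter and the sample text are made up for the example.

```python
import re

# Assumed header line printed before each thread's backtrace in a saved
# "alltrace" capture, for example:
#   Tracing command zfskern pid 8 tid 100123 td 0xfffffe0012345000
TRACE_HEADER = re.compile(
    r"^Tracing command (?P<name>.+?) pid (?P<pid>\d+) tid (?P<tid>\d+) td "
)


def sleepchain_commands(alltrace_text, only_command=None):
    """Return one 'show sleepchain <tid>' line per thread in the capture.

    only_command is a hypothetical filter that restricts the result to the
    threads of a single process name, e.g. "check_disk".
    """
    commands = []
    seen = set()
    for line in alltrace_text.splitlines():
        match = TRACE_HEADER.match(line.strip())
        if match is None:
            continue  # backtrace frame or unrelated console output
        if only_command is not None and match.group("name") != only_command:
            continue
        tid = match.group("tid")
        if tid in seen:
            continue  # the same thread may appear twice in a joined capture
        seen.add(tid)
        commands.append("show sleepchain " + tid)
    return commands


# Made-up sample in the assumed format, just to show the call.
sample_capture = """
Tracing command check_disk pid 4242 tid 100321 td 0xfffffe0001
sched_switch() at sched_switch+0x1

Tracing command zfskern pid 8 tid 100123 td 0xfffffe0002
"""
print("\n".join(sleepchain_commands(sample_capture)))
```

The resulting lines would still have to be pasted or fed into the debugger console by hand. Passing only_command="check_disk" narrows the list to the Nagios plugin threads if those turn out to be the interesting subset.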