Date: Mon, 6 Jun 2016 17:19:02 -0500
From: list-news <list-news@mindpackstudios.com>
To: freebsd-scsi@freebsd.org
Subject: Re: Avago LSI SAS 3008 & Intel SSD Timeouts
Message-ID: <986e03a7-5dc8-f5e0-5a17-4bf49459f905@mindpackstudios.com>
In-Reply-To: <08C01646-9AF3-4E89-A545-C051A284E039@sarenet.es>

System was running solid all weekend with camcontrol tags set to 1 for each device, zero errors.
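In case it's useful, applying that is just a loop over the pool members; a minimal sketch (the device names are the ones that show up in my gstat output below, adjust for your own pool):

    #!/bin/sh
    # Drop the CAM tagged command (queue) depth to 1 on each disk.
    # Device list is specific to my setup; edit to match your pool.
    for d in da2 da4 da6 da7; do
        camcontrol tags "${d}" -N 1
    done

The same loop with -N 255 puts them back where they started.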
Last week I did try

    # sysctl kern.cam.da.X.delete_method=DISABLE

for each drive, but it still threw errors. Also, I tried out bio_delete_disable earlier today. First I reset tags back to 255 for each device, since they are currently at 1:

    # camcontrol tags daX -N 255

then:

    # sysctl vfs.zfs.vdev.bio_delete_disable=1

A few minutes later:

Jun 6 12:28:36 s18 kernel: (da2:mpr0:0:12:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 577 command timeout cm 0xfffffe0001351550 ccb 0xfffff804e78e3800 target 12, handle(0x000c)
Jun 6 12:28:36 s18 kernel: mpr0: At enclosure level 0, slot 4, connector name ( )
Jun 6 12:28:36 s18 kernel: mpr0: timedout cm 0xfffffe0001351550 allocated tm 0xfffffe0001322150
Jun 6 12:28:36 s18 kernel: (noperiph:mpr0:0:4294967295:0): SMID 1 Aborting command 0xfffffe0001351550
Jun 6 12:28:36 s18 kernel: mpr0: Sending reset from mprsas_send_abort for target ID 12
Jun 6 12:28:36 s18 kernel: (da2:mpr0:0:12:0): READ(10). CDB: 28 00 18 45 1c c0 00 00 08 00 length 4096 SMID 583 command timeout cm 0xfffffe0001351d30 ccb 0xfffff806b9556800 target 12, handle(0x000c)
Jun 6 12:28:36 s18 kernel: mpr0: At enclosure level 0, slot 4, connector name ( )
Jun 6 12:28:36 s18 kernel: mpr0: queued timedout cm 0xfffffe0001351d30 for processing by tm 0xfffffe0001322150
...

During the 60 second hang:

    # gstat -do
     L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d    o/s   ms/o   %busy Name
       70      0      0      0    0.0      0      0    0.0      0      0    0.0      0    0.0    0.0| da2
        0      0      0      0    0.0      0      0    0.0      0      0    0.0      0    0.0    0.0| da4
        0      0      0      0    0.0      0      0    0.0      0      0    0.0      0    0.0    0.0| da6
        0      0      0      0    0.0      0      0    0.0      0      0    0.0      0    0.0    0.0| da7

Also during the 60 second hang:

    # camcontrol tags da3 -v
    (pass2:mpr0:0:12:0): dev_openings  248
    (pass2:mpr0:0:12:0): dev_active    7
    (pass2:mpr0:0:12:0): allocated     7
    (pass2:mpr0:0:12:0): queued        0
    (pass2:mpr0:0:12:0): held          0
    (pass2:mpr0:0:12:0): mintags       2
    (pass2:mpr0:0:12:0): maxtags       255

Also during the 60 second hang:

    # sysctl dev.mpr
    dev.mpr.0.spinup_wait_time: 3
    dev.mpr.0.chain_alloc_fail: 0
    dev.mpr.0.enable_ssu: 1
    dev.mpr.0.max_chains: 2048
    dev.mpr.0.chain_free_lowwater: 2022
    dev.mpr.0.chain_free: 2048
    dev.mpr.0.io_cmds_highwater: 71
    dev.mpr.0.io_cmds_active: 4
    dev.mpr.0.driver_version: 09.255.01.00-fbsd
    dev.mpr.0.firmware_version: 10.00.03.00
    dev.mpr.0.disable_msi: 0
    dev.mpr.0.disable_msix: 0
    dev.mpr.0.debug_level: 895
    dev.mpr.0.%parent: pci1
    dev.mpr.0.%pnpinfo: vendor=0x1000 device=0x0097 subvendor=0x15d9 subdevice=0x0808 class=0x010700
    dev.mpr.0.%location: pci0:1:0:0 handle=\_SB_.PCI0.BR1A.H000
    dev.mpr.0.%driver: mpr
    dev.mpr.0.%desc: Avago Technologies (LSI) SAS3008
    dev.mpr.%parent:

Something else that may be worth considering: I ran fio & bonnie++ for about an hour of heavy IO (with tags still set to 255, drive busy showing 90-100%). No errors. Then I fired up my application (a threaded Java/Postgres application), and within minutes:

    # gstat -do
    dT: 1.002s  w: 1.000s
     L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d    o/s   ms/o   %busy Name
        0      0      0      0    0.0      0      0    0.0      0      0    0.0      0    0.0    0.0| da2
        0      0      0      0    0.0      0      0    0.0      0      0    0.0      0    0.0    0.0| da4
       71      0      0      0    0.0      0      0    0.0      0      0    0.0      0    0.0    0.0| da6
        0      0      0      0    0.0      0      0    0.0      0      0    0.0      0    0.0    0.0| da7

Error:

Jun 6 13:36:15 s18 kernel: (da6:mpr0:0:16:0): WRITE(10). CDB: 2a 00 30 65 13 90 00 00 10 00
Jun 6 13:36:15 s18 kernel: (da6:mpr0:0:16:0): CAM status: SCSI Status Error
Jun 6 13:36:15 s18 kernel: (da6:mpr0:0:16:0): SCSI status: Check Condition
Jun 6 13:36:15 s18 kernel: (da6:mpr0:0:16:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Jun 6 13:36:15 s18 kernel: (da6:mpr0:0:16:0): Retrying command (per sense data)
...

And again 2 minutes later:

Jun 6 13:38:43 s18 kernel: (da2:mpr0:0:12:0): WRITE(10). CDB: 2a 00 21 66 63 58 00 00 10 00
Jun 6 13:38:43 s18 kernel: (da2:mpr0:0:12:0): CAM status: SCSI Status Error
Jun 6 13:38:43 s18 kernel: (da2:mpr0:0:12:0): SCSI status: Check Condition
Jun 6 13:38:43 s18 kernel: (da2:mpr0:0:12:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Jun 6 13:38:43 s18 kernel: (da2:mpr0:0:12:0): Retrying command (per sense data)
...

And again 3 minutes later:

Jun 6 13:41:29 s18 kernel: (da7:mpr0:0:18:0): WRITE(10). CDB: 2a 00 33 44 b5 b8 00 00 10 00
...

    # camcontrol tags daX -N 1

And now, after 15 minutes, zero errors.

Putting some thoughts together on this, which may or may not be off base (please feel free to correct me), I've noticed the following:

1) There doesn't seem to be any indication of what causes a drive to time out. The command that fails in the error log is one of READ(10), WRITE(10), ATA COMMAND PASS THROUGH(16), or SYNCHRONIZE CACHE(10). As I understand it, that is just the command that was being executed, timed out, and got retried, not necessarily what caused the drive to lock up.

2) When my application runs, it hammers postgres pretty hard, and when postgres is running I get the errors. fio & bonnie++ don't give me errors; daily use of the system doesn't give me errors. I'm assuming postgresql is sending far more of a certain type of command to the IO subsystem than those other applications, and the first command that comes to mind is fsync.

3) I turned fsync off in postgresql.conf (I'm brave for science!), then ran my application again with tags at 255, 100% CPU load, and 70-80% drive busy.

1.5 hours later at full load - finally, a single timeout:

Jun 6 16:31:33 s18 kernel: (da2:mpr0:0:12:0): READ(10). CDB: 28 00 2d 50 1b 78 00 00 08 00 length 4096 SMID 556 command timeout cm 0xfffffe000134f9c0 ccb 0xfffff83aa5b25000 target 12, handle(0x000c)

I ran it for another 20 minutes with no additional timeouts. I assume an fsync turns into a zfs -> cam -> SYNCHRONIZE CACHE command for each device, and postgres sends that command considerably more often than a typical application (at least with fsync turned on in postgresql.conf). That would explain why the error is rare when fsync is turned off or when only occasional fsyncs are being sent (i.e. typical system usage), yet when fsync is sent repeatedly the errors start happening every few minutes.
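If anyone wants to poke at that theory without postgres in the picture, something like this ought to generate a steady stream of fsyncs against the pool (just a sketch, untested here; /tank/fsync-test is a placeholder path, point it at a dataset on the affected pool):

    #!/bin/sh
    # Rewrite a small file and force it to stable storage in a tight loop,
    # using the base system's fsync(1) utility. If the theory is right, each
    # pass should end up as a cache flush on the pool's devices.
    f=/tank/fsync-test
    while :; do
        dd if=/dev/zero of="${f}" bs=8k count=1 2>/dev/null
        fsync "${f}"
    done

Run alongside some ordinary read/write load, that should trip the timeouts a lot faster than fio or bonnie++ did, if the fsync angle is really what matters.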
The only explanation I can come up with for why setting tags to 1 eliminates the errors entirely is that it must have something to do with the Intel drives not handling parallel commands from CAM well when one (or more) of those commands is SYNCHRONIZE CACHE.

Thoughts?

Thanks,
-Kyle

On 6/6/16 3:42 AM, Borja Marcos wrote:
>> On 03 Jun 2016, at 23:49, Steven Hartland wrote:
>>
>> First thing would be to run gstat with -d to see if you're actually stacking up deletes, a symptom of which can be r/w dropping to zero.
>>
>> If you are seeing significant deletes it could be a FW issue on the drives.
>
> Hmm. I've suffered that badly with Intel P3500 NVMe drives, which suffer at least from a driver problem: trims are not coalesced.
> However, I didn't experience command timeouts. Reads and, especially, writes stalled badly.
>
> A quick test for trim-related trouble is setting the sysctl variable vfs.zfs.vdev.bio_delete_disable to 1. It doesn't require
> a reboot and you can quickly compare results.
>
> In my case, a somewhat similar problem in an IBM server was caused by a faulty LSI3008 card, it seems. As I didn't have spare LSI3008 cards
> at the time, I replaced it with an LSI2008 and everything works perfectly. Before anyone chimes in suggesting card incompatibility of some sort,
> I have a twin system with an LSI3008 working like a charm. ;)
>
> Borja.
>
> _______________________________________________
> freebsd-scsi@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"