From owner-freebsd-scsi@FreeBSD.ORG Sun Feb 3 17:12:36 2013
Message-ID: <510E987C.4090509@oxit.fi>
Date: Sun, 03 Feb 2013 19:03:56 +0200
From: "Jukka A. Ukkonen"
To: freebsd-scsi@FreeBSD.org
Cc: Joerg Wunsch
Subject: Re: Multiple FreeBSD SCSI Hosts

Hello all,

I have been browsing through these old messages about the SCSI
RESERVE/RELEASE operations in FreeBSD.

What nobody seems to have quite realized at the time is that a multiply
attached SCSI device can be used as the trusted 3rd party for a "cluster
group" or for anything which should be active only at one node at any
particular time. E.g. Solaris cluster groups do exactly that. If one
node gets the reservation through, no other node will until the first
successful one either releases the device or crashes. The reserved LUN
can be either an otherwise unused small storage unit or one that is
going to be anyhow mounted and unmounted as the cluster group dictates.
The same method would work also for selecting the leader among Shared
QFS metadata servers and for other similar purposes.

The reservation should definitely not be only bundled inside the mount
operations. Instead it should be possible to trigger reserve and release
through ioctl() or through a separate system call. This is because
sometimes the feature might be used for unmounted raw devices or for
devices which could be logically mounted to multiple systems while
anyhow busy for all other systems but one. E.g. something in the style
of Shared QFS could use a SCSI reservation on its metadata volumes,
which need not be visible to the users as separate file systems at all.

Anyhow I could see seriously more use for this particular SCSI feature
than just locking a mounted tape drive, were it implemented for other
devices than sa only and somehow exported to the user space. Obviously
it should be available for root only, though.
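To make concrete what a user-space reserve/release interface would end up
sending to the device, here is a minimal sketch in Python. It only builds
the 6-byte command descriptor blocks for the legacy RESERVE(6) and
RELEASE(6) commands and interprets the returned status byte; it does not
talk to any device. The function names build_cdb() and holds_reservation()
are hypothetical, chosen for illustration, and are not part of any FreeBSD
interface.

```python
# Operation codes of the legacy (non-persistent) reservation commands.
RESERVE_6 = 0x16
RELEASE_6 = 0x17

# SCSI status bytes that matter for a reservation attempt.
STATUS_GOOD = 0x00
STATUS_RESERVATION_CONFLICT = 0x18


def build_cdb(opcode: int) -> bytes:
    """Return a 6-byte CDB that reserves or releases the whole logical unit.

    Bytes 1-4 and the control byte are left at zero in this sketch.
    """
    if opcode not in (RESERVE_6, RELEASE_6):
        raise ValueError("only RESERVE(6) and RELEASE(6) are handled here")
    return bytes([opcode, 0, 0, 0, 0, 0])


def holds_reservation(status: int) -> bool:
    """Hypothetical helper: did the RESERVE succeed for this node?

    GOOD means this initiator now holds the reservation; anything else,
    including RESERVATION CONFLICT, means it does not.
    """
    return status == STATUS_GOOD


if __name__ == "__main__":
    for name, opcode in (("RESERVE(6)", RESERVE_6), ("RELEASE(6)", RELEASE_6)):
        cdb = build_cdb(opcode)
        print(name, " ".join(f"{b:02x}" for b in cdb))
```

For experiments, the same bytes can probably be handed to the existing
pass-through path, e.g. something like camcontrol cmd da0 -v -c "16 0 0 0 0 0",
though that is a debugging aid and not the dedicated interface argued for
above.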
Cheers,
--jau

From owner-freebsd-scsi@FreeBSD.ORG Sun Feb 3 17:35:31 2013
Message-ID: <510E9FD1.5070907@freebsd.org>
Date: Sun, 03 Feb 2013 09:35:13 -0800
From: Matthew Jacob
Organization: FreeBSD
Reply-To: mjacob@freebsd.org
To: freebsd-scsi@freebsd.org, j@uriah.heep.sax.de
Subject: Re: Multiple FreeBSD SCSI Hosts
References: <510E987C.4090509@oxit.fi>
In-Reply-To: <510E987C.4090509@oxit.fi>

On 2/3/2013 9:03 AM, Jukka A. Ukkonen wrote:
>
> a multiply attached SCSI device can be used as the trusted

For SANs or iSCSI this can make some sense, but only if you really
really trust the release mechanism (which I don't in any heterogeneous
environment).

The other question to raise is how do you sensibly represent the disks
to the non-winner node and field failed attempts to operate on the
shared disk? In other words, how do you percolate RESERVATION CONFLICT
errors up to the application level?

From owner-freebsd-scsi@FreeBSD.ORG Mon Feb 4 11:06:51 2013
Message-Id: <201302041106.r14B6of6028896@freefall.freebsd.org>
Date: Mon, 4 Feb 2013 11:06:50 GMT
From: FreeBSD bugmaster
To: freebsd-scsi@FreeBSD.org
Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD
users. These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/171650  scsi       [da] da(4) driver does not recognize end of cciss (Sma
o kern/169403  scsi       [cam] [patch] CAM layer, I/O starvation, no fairness
o kern/165982  scsi       [mpt] mpt instability, drive resets, and losses on Fre
o kern/165740  scsi       [cam] SCSI code must drain callbacks before free
o kern/163713  scsi       [aic7xxx] [patch] Add Adaptec29329LPE to aic79xx_pci.c
o kern/162256  scsi       [mpt] QUEUE FULL EVENT and 'mpt_cam_event: 0x0'
o kern/161809  scsi       [cam] [patch] set kern.cam.boot_delay via build option
o kern/157770  scsi       [iscsi] [panic] iscsi_initiator panic
o kern/154432  scsi       [xpt] run_interrupt_driven_hooks: still waiting after
o kern/153514  scsi       [cam] [panic] CAM related panic
o docs/151336  scsi       Missing documentation of scsi_ and ata_ functions in c
s kern/149927  scsi       [cam] hard drive not stopped before removing power dur
o kern/148083  scsi       [aac] Strange device reporting
o kern/147704  scsi       [mpt] sys/dev/mpt: new chip revision, partially unsupp
o kern/145768  scsi       [mpt] can't perform I/O on SAS based SAN disk in freeb
o kern/144648  scsi       [aac] Strange values of speed and bus width in dmesg
o kern/142351  scsi       [mpt] LSILogic driver performance problems
o kern/134488  scsi       [mpt] MPT SCSI driver probes max. 8 LUNs per device
o kern/132206  scsi       [mpt] system panics on boot when mirroring and 2nd dri
o kern/130621  scsi       [mpt] tranfer rate is inscrutable slow when use lsi213
o kern/129602  scsi       [ahd] ahd(4) gets confused and wedges SCSI bus
o kern/128452  scsi       [sa] [panic] Accessing SCSI tape drive randomly crashe
o kern/128245  scsi       [scsi] "inquiry data fails comparison at DV1 step" [re
o kern/127927  scsi       [isp] isp(4) target driver crashes kernel when set up
o kern/127717  scsi       [ata] [patch] [request] - support write cache toggling
o kern/123674  scsi       [ahc] ahc driver dumping
o kern/123520  scsi       [ahd] unable to boot from net while using ahd
o sparc/121676 scsi       [iscsi] iscontrol do not connect iscsi-target on sparc
o kern/120487  scsi       [sg] scsi_sg incompatible with scanners
o kern/120247  scsi       [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s
o kern/114597  scsi       [sym] System hangs at SCSI bus reset with dual HBAs
o kern/110847  scsi       [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/99954   scsi       [ahc] reading from DVD failes on 6.x [regression]
o kern/92798   scsi       [ahc] SCSI problem with timeouts
o kern/90282   scsi       [sym] SCSI bus resets cause loss of ch device
o kern/76178   scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/74627   scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
s kern/61165   scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/60641   scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
o kern/60598   scsi       wire down of scsi devices conflicts with config
s kern/57398   scsi       [mly] Current fails to install on mly(4) based RAID di
o kern/52638   scsi       [panic] SCSI U320 on SMP server won't run faster than
o kern/44587   scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/39388   scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/35234   scsi       World access to /dev/pass? (for scanner) requires acce

45 problems total.

From owner-freebsd-scsi@FreeBSD.ORG Tue Feb 5 02:36:09 2013
Message-ID: <20130205023609.GA99100@FreeBSD.org>
Date: Tue, 5 Feb 2013 02:36:09 +0000
From: John
To: freebsd-scsi@freebsd.org
Subject: Increase mps sequential read performance with ZFS/zvol
References: <510E987C.4090509@oxit.fi> <510E9FD1.5070907@freebsd.org>
In-Reply-To: <510E9FD1.5070907@freebsd.org>

Hi Folks,

I'm in the process of putting together another ZFS server and after
running some sequential read performance tests I'm thinking things
could be better. It's running 9.1-stable from late January:

  FreeBSD vprzfs30p.unx.sas.com 9.1-STABLE FreeBSD 9.1-STABLE #1 r246079M

I have two HP D2700 shelves populated with 600GB drives connected to a
pair of LSI 9207-8e HBA cards installed in a Dell R620 with 128GB of
RAM, with the OS installed on an internal RAID volume. The shelves are
dual channel, each LSI card with a channel through both shelves.
Gmultipath is used to bind the disks such that each disk can be
addressed by either controller and the I/O balanced.

The zfs pool consists of 24 mirrors, each pair one from each shelf. The
multipaths are rotated such that I/O is balanced between shelves and
controllers.

For testing, two 300GB zvols are created, each almost full:

  NAME              USED  AVAIL  REFER  MOUNTPOINT
  pool0            1.46T  11.4T    31K  /pool0
  pool0/lun000004   301G  11.4T   261G  -
  pool0/lun000005   301G  11.4T   300G  -

Running a simple dd test:

  # dd if=/dev/zvol/pool0/lun000005 of=/dev/null bs=512k
  614400+0 records in
  614400+0 records out
  322122547200 bytes transferred in 278.554656 secs (1156406975 bytes/sec)

The drives are spread and balanced across four 6Gb/s channels, so
1.1GB/s seems a bit slow. Note, changing the bs= option makes no real
difference.

Now, if I run 2 'dd' operations against different zvols in parallel:

  # dd if=/dev/zvol/pool0/lun000005 of=/dev/null bs=512k
  614400+0 records in
  614400+0 records out
  322122547200 bytes transferred in 278.605380 secs (1156196435 bytes/sec)

  # dd if=/dev/zvol/pool0/lun000004 of=/dev/null bs=512k
  614400+0 records in
  614400+0 records out
  322122547200 bytes transferred in 282.065008 secs (1142015274 bytes/sec)

This tells me the I/O subsystem has plenty of headroom available such
that the first 'dd' operation could run faster.

I've included some basic config information below. No kmem values in
/boot/loader.conf. I did play around with block_cap but it made no
difference. It seems like something is holding the system back.

Thanks for any ideas.

-John

Output from top during a single dd run:

      5 root       11  -8    -     0K   208K zvol:i  1   5:11 41.65% zfskern
      0 root      350  -8    0     0K  5600K -       5   3:59 15.23% kernel
   1784 root        1  26    0  9944K  2072K CPU1    1   0:31 13.87% dd

The zvol:io state appears to be a simple wait loop, waiting for
outstanding I/O requests to complete. How to get more I/O requests
going?
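The dd figures above can be cross-checked with a little arithmetic. The
following Python sketch recomputes the single-stream rate, adds up the two
parallel streams, and derives a nominal payload rate for 6Gb/s SAS links
assuming 8b/10b line encoding. The function names are illustrative only,
and the number of lanes behind each "channel" is not stated in the post,
so it is left as a parameter and tried with two guessed values.

```python
# 614400 records of 512 KiB each, as reported by dd.
BYTES_PER_RUN = 614400 * 512 * 1024


def rate(nbytes: int, seconds: float) -> float:
    """Average transfer rate in bytes per second."""
    return nbytes / seconds


def sas_payload_rate(line_gbit: int, lanes: int) -> int:
    """Nominal payload bytes/s for SAS links using 8b/10b encoding.

    Only 8 of every 10 line bits carry data; protocol overhead above the
    encoding layer is ignored in this sketch.
    """
    per_lane = line_gbit * 1_000_000_000 * 8 // 10 // 8
    return per_lane * lanes


single = rate(BYTES_PER_RUN, 278.554656)
parallel = rate(BYTES_PER_RUN, 278.605380) + rate(BYTES_PER_RUN, 282.065008)

if __name__ == "__main__":
    print(f"single stream: {single / 1e9:.2f} GB/s")
    print(f"two streams:   {parallel / 1e9:.2f} GB/s")
    # Lane counts are assumptions: 1 (narrow) or 4 (wide port) per channel.
    for lanes in (1, 4):
        total = 4 * sas_payload_rate(6, lanes)
        print(f"4 channels x {lanes} lane(s): {total / 1e9:.1f} GB/s nominal")
```

With these inputs the two streams together move roughly twice what a
single stream does, which is the observation behind the question of why
one stream cannot go faster.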

Sample of the highest number of I/O requests per controller:

  dev.mps.0.io_cmds_highwater: 207
  dev.mps.1.io_cmds_highwater: 126

IOCFACTS (identical):

  mps0: port 0xec00-0xecff mem 0xdaff0000-0xdaffffff,0xdaf80000-0xdafbffff irq 48 at device 0.0 on pci5
  mps0: Doorbell= 0x22000000
  mps0: mps_wait_db_ack: successfull count(2), timeout(5)
  mps0: Doorbell= 0x12000000
  mps0: mps_wait_db_ack: successfull count(1), timeout(5)
  mps0: mps_wait_db_ack: successfull count(1), timeout(5)
  mps0: mps_wait_db_ack: successfull count(1), timeout(5)
  mps0: mps_wait_db_ack: successfull count(1), timeout(5)
  mps0: IOCFacts :
          MsgVersion: 0x200
          HeaderVersion: 0x1b00
          IOCNumber: 0
          IOCExceptions: 0x0
          MaxChainDepth: 128
          WhoInit: ROM BIOS
          NumberOfPorts: 1
          RequestCredit: 10240
          ProductID: 0x2214
          IOCCapabilities: 1285c
          FWVersion= 15-0-0-0
          IOCRequestFrameSize: 32
          MaxInitiators: 32
          MaxTargets: 1024
          MaxSasExpanders: 64
          MaxEnclosures: 65
          ProtocolFlags: 3
          HighPriorityCredit: 128
          MaxReplyDescriptorPostQueueDepth: 65504
          ReplyFrameSize: 32
          MaxVolumes: 0
          MaxDevHandle: 1128
          MaxPersistentEntries: 128
  mps0: Firmware: 15.00.00.00, Driver: 14.00.00.01-fbsd
  mps0: IOCCapabilities: 1285c

And some output from 'gstat -f Z -I 300ms':

  dT: 0.302s  w: 0.300s  filter: Z
   L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
      0    202    202  25450    2.6      0      0    0.0   25.5| multipath/Z0
      1    202    202  25046    6.2      0      0    0.0   36.6| multipath/Z2
      7    185    185  23735    6.3      0      0    0.0   33.1| multipath/Z4
      0    212    212  27125    5.4      0      0    0.0   30.4| multipath/Z6
      0    169    169  21616    5.0      0      0    0.0   28.1| multipath/Z8
      0    162    162  20768    5.0      0      0    0.0   25.7| multipath/Z10
      0    175    175  22463    6.0      0      0    0.0   30.4| multipath/Z12
      0    192    192  24582    4.4      0      0    0.0   32.1| multipath/Z14
      2    169    169  21616    3.3      0      0    0.0   18.8| multipath/Z16
      4    169    169  20808    4.1      0      0    0.0   23.0| multipath/Z18
      2    195    195  24602    4.5      0      0    0.0   28.5| multipath/Z20
      5    172    172  22039    4.4      0      0    0.0   22.7| multipath/Z22
      0    166    166  21192    3.7      0      0    0.0   20.2| multipath/Z24
      7    179    179  22887    5.4      0      0    0.0   27.8| multipath/Z26
      7    172    172  22039    3.5      0      0    0.0   23.1| multipath/Z28
      0    192    192  24582    3.8      0      0    0.0   25.5| multipath/Z30
      1    175    175  22463    6.0      0      0    0.0   30.5| multipath/Z32
      1    182    182  22907    3.9      0      0    0.0   25.6| multipath/Z34
      0    212    212  27125    6.3      0      0    0.0   32.7| multipath/Z36
      0    179    179  22483    4.8      0      0    0.0   27.5| multipath/Z38
      2    185    185  23735    4.6      0      0    0.0   30.0| multipath/Z40
      0    179    179  22887    4.5      0      0    0.0   28.2| multipath/Z42
      3    195    195  25006    4.4      0      0    0.0   32.3| multipath/Z44
      3    192    192  24582    4.0      0      0    0.0   30.5| multipath/Z46
      0      0      0      0    0.0      0      0    0.0    0.0| multipath/Z48
      0    179    179  22887    4.7      0      0    0.0   31.0| multipath/Z1
      0    185    185  23331    4.1      0      0    0.0   24.8| multipath/Z3
      0    175    175  21639    5.3      0      0    0.0   28.2| multipath/Z5
      4    162    162  20768    5.1      0      0    0.0   26.6| multipath/Z7
      0    195    195  25006    3.5      0      0    0.0   23.4| multipath/Z9
      3    179    179  22887    5.0      0      0    0.0   25.7| multipath/Z11
      4    159    159  20344    4.9      0      0    0.0   23.7| multipath/Z13
      4    166    166  21192    4.3      0      0    0.0   25.1| multipath/Z15
      0    169    169  21616    3.9      0      0    0.0   24.7| multipath/Z17
      7    189    189  23334    4.2      0      0    0.0   25.7| multipath/Z19
      4    169    169  21212    4.3      0      0    0.0   28.1| multipath/Z21
      0    159    159  20344    5.3      0      0    0.0   25.8| multipath/Z23
      5    185    185  23316    4.1      0      0    0.0   26.0| multipath/Z25
      0    192    192  24582    4.9      0      0    0.0   30.6| multipath/Z27
      0    172    172  22039    5.5      0      0    0.0   27.4| multipath/Z29
      4    166    166  21192    4.2      0      0    0.0   23.7| multipath/Z31
      0    169    169  20778    3.5      0      0    0.0   22.2| multipath/Z33
      2    172    172  21232    5.1      0      0    0.0   29.4| multipath/Z35
      3    169    169  21616    2.9      0      0    0.0   20.1| multipath/Z37
      0    179    179  22887    5.2      0      0    0.0   32.0| multipath/Z39
      0    212    212  26721    5.4      0      0    0.0   31.7| multipath/Z41
      2    175    175  22463    4.4      0      0    0.0   28.0| multipath/Z43
      0    179    179  22887    3.6      0      0    0.0   18.2| multipath/Z45
      0    179    179  22887    4.3      0      0    0.0   28.3| multipath/Z47
      0      0      0      0    0.0      0      0    0.0    0.0| multipath/Z49

Each individual disk on the system shows the capability of 255 tags:

  # camcontrol tags da0 -v
  (pass2:mps0:0:10:0): dev_openings  255
  (pass2:mps0:0:10:0): dev_active    0
  (pass2:mps0:0:10:0): devq_openings 255
  (pass2:mps0:0:10:0): devq_queued   0
  (pass2:mps0:0:10:0): held          0
  (pass2:mps0:0:10:0): mintags       2
  (pass2:mps0:0:10:0): maxtags       255

zpool:

  # zpool status
    pool: pool0
   state: ONLINE
    scan: none requested
  config:

          NAME               STATE     READ WRITE CKSUM
          pool0              ONLINE       0     0     0
            mirror-0         ONLINE       0     0     0
              multipath/Z0   ONLINE       0     0     0
              multipath/Z1   ONLINE       0     0     0
            mirror-1         ONLINE       0     0     0
              multipath/Z2   ONLINE       0     0     0
              multipath/Z3   ONLINE       0     0     0
            mirror-2         ONLINE       0     0     0
              multipath/Z4   ONLINE       0     0     0
              multipath/Z5   ONLINE       0     0     0
            mirror-3         ONLINE       0     0     0
              multipath/Z6   ONLINE       0     0     0
              multipath/Z7   ONLINE       0     0     0
            mirror-4         ONLINE       0     0     0
              multipath/Z8   ONLINE       0     0     0
              multipath/Z9   ONLINE       0     0     0
            mirror-5         ONLINE       0     0     0
              multipath/Z10  ONLINE       0     0     0
              multipath/Z11  ONLINE       0     0     0
            ...
            mirror-21        ONLINE       0     0     0
              multipath/Z42  ONLINE       0     0     0
              multipath/Z43  ONLINE       0     0     0
            mirror-22        ONLINE       0     0     0
              multipath/Z44  ONLINE       0     0     0
              multipath/Z45  ONLINE       0     0     0
            mirror-23        ONLINE       0     0     0
              multipath/Z46  ONLINE       0     0     0
              multipath/Z47  ONLINE       0     0     0
          spares
            multipath/Z48    AVAIL
            multipath/Z49    AVAIL

  errors: No known data errors

From owner-freebsd-scsi@FreeBSD.ORG Tue Feb 5 21:18:39 2013
Date: Tue, 5 Feb 2013 14:16:42 -0700
From: "Kenneth D. Merry"
To: "Desai, Kashyap"
Subject: Re: Max Queue depth of HBA limited to 256 ?
Message-ID: <20130205211642.GA75343@nargothrond.kdm.org>
References: <20130121170529.GA64188@nargothrond.kdm.org>
Cc: "freebsd-scsi@freebsd.org", "McConnell, Stephen", "jhb@freebsd.org"

I'm able to get more than 255 commands outstanding to the controller in
my configuration. For example:

  dev.mps.0.%desc: LSI SAS2116
  dev.mps.0.%driver: mps
  dev.mps.0.%location: slot=6 function=0 handle=\_SB_.PCI0.S30_
  dev.mps.0.%pnpinfo: vendor=0x1000 device=0x0064 subvendor=0x1000 subdevice=0x30c0 class=0x010700
  dev.mps.0.%parent: pci0
  dev.mps.0.debug_level: 4
  dev.mps.0.disable_msix: 0
  dev.mps.0.disable_msi: 0
  dev.mps.0.firmware_version: 13.00.01.00
  dev.mps.0.driver_version: 14.00.00.01-fbsd
  dev.mps.0.io_cmds_active: 442
  dev.mps.0.io_cmds_highwater: 464
  dev.mps.0.chain_free: 354
  dev.mps.0.chain_free_lowwater: 181
  dev.mps.0.max_chains: 2048
  dev.mps.0.chain_alloc_fail: 0

io_cmds_highwater is 464.

Can you get more than 255 commands outstanding if you use more than 1
target?

This is with 272 'dd' processes doing 1MB reads to 16 2TB and 3TB SAS
drives behind 2 3Gb Maxim expanders:

  at scbus2 target 144 lun 0 (pass4,sg4,da0)
  at scbus2 target 145 lun 0 (pass5,sg5,da1)
  at scbus2 target 146 lun 0 (pass6,sg6,da2)
  at scbus2 target 147 lun 0 (pass7,sg7,da3)
  at scbus2 target 148 lun 0 (pass8,sg8,da4)
  at scbus2 target 149 lun 0 (pass9,sg9,da5)
  at scbus2 target 150 lun 0 (pass10,sg10,da6)
  at scbus2 target 151 lun 0 (pass11,sg11,da7)
  at scbus2 target 152 lun 0 (pass12,sg12,da8)
  at scbus2 target 153 lun 0 (pass13,sg13,da9)
  at scbus2 target 154 lun 0 (pass14,sg14,da10)
  at scbus2 target 155 lun 0 (pass15,sg15,da11)
  at scbus2 target 156 lun 0 (pass16,sg16,da12)
  at scbus2 target 157 lun 0 (pass17,sg17,da13)
  at scbus2 target 158 lun 0 (pass18,sg18,da14)
  at scbus2 target 159 lun 0 (pass19,sg19,da15)

i.e. 17 iterations of this:

  ((i=0)); while [ $i -le 15 ]; do dd if=/dev/da$i of=/dev/null bs=1m & ((i++)); done

The individual drives see varying numbers of tags, but nowhere near the
maximum:

  [root@storage-domain ~]# camcontrol tags da15 -v
  (pass19:mps0:0:159:0): dev_openings  230
  (pass19:mps0:0:159:0): dev_active    25
  (pass19:mps0:0:159:0): devq_openings 230
  (pass19:mps0:0:159:0): devq_queued   0
  (pass19:mps0:0:159:0): held          0
  (pass19:mps0:0:159:0): mintags       2
  (pass19:mps0:0:159:0): maxtags       255

What kind of drive is the target?

Ken

On Wed, Jan 23, 2013 at 00:44:31 +0530, Desai, Kashyap wrote:
> LSI h/w needs more outstanding command in FW to get better Perf counts
> compare to other OS.
>
> Please suggest if whatever I have been observed is limitation from
> FreeBSD or we can tune it in Driver ?
> My goals is to pump ~1000 outstanding IOs to the HBA. I see that it
> never goes beyond 255.
>
> Thanks,
> Kashyap
>
> > -----Original Message-----
> > From: owner-freebsd-scsi@freebsd.org
> > [mailto:owner-freebsd-scsi@freebsd.org] On Behalf Of Desai, Kashyap
> > Sent: Monday, January 21, 2013 11:18 PM
> > To: Kenneth D. Merry
> > Cc: freebsd-scsi@freebsd.org; jhb@freebsd.org; McConnell, Stephen
> > Subject: RE: Max Queue depth of HBA limited to 256 ?
> >
> > > -----Original Message-----
> > > From: Kenneth D. Merry [mailto:ken@freebsd.org]
> > > Sent: Monday, January 21, 2013 10:35 PM
> > > To: Desai, Kashyap
> > > Cc: freebsd-scsi@freebsd.org; McConnell, Stephen; Saxena, Sumit;
> > > jhb@freebsd.org
> > > Subject: Re: Max Queue depth of HBA limited to 256 ?
> > >
> > > On Mon, Jan 21, 2013 at 20:15:47 +0530, Desai, Kashyap wrote:
> > > > Hi,
> > > >
> > > > I was trying to check few things on LSI controller, where we have
> > > > more than 256 queue depth support.
> > > > I added default maxtags in scsi/scsi_xpt.c as below. (Because I
> > > > don't want maxtags to restrict any outstanding commands to the
> > > > LSI HBA.)
> > > >
> > > > {
> > > >         /* Default tagged queuing parameters for all devices */
> > > >         {
> > > >           T_ANY, SIP_MEDIA_REMOVABLE|SIP_MEDIA_FIXED,
> > > >           /*vendor*/"*", /*product*/"*", /*revision*/"*"
> > > >         },
> > > >         /*quirks*/0, /*mintags*/2, /*maxtags*/1024
> > > >         /* <--- Default maxtags were 256. I increase it to 1024 */
> > > > },
> > > >
> > > > LSI's SAS-HBA and MR-HBA can support more than 256 outstanding
> > > > commands in Firmware. But due to some reason, I am not able to
> > > > pump more than 256 outstanding commands to the HBA.
> > > >
> > > > I used "rawio -p 256 /dev/da1" and more /dev/dax in loop. I have
> > > > sysctl parameter in Driver to display outstanding "FW commands".
> > > > Max value for FW outstanding only goes up to 256.
> > > >
> > > > Also from some other mail thread Subject "mfi driver
> > > > performance", I found that folks talk about tuning queue depth
> > > > _but_ nobody discussed to increase it beyond 256. Is there any
> > > > limitation in FreeBSD ?
> > > >
> > >
> > > As Jim pointed out, one thing to check is the values passed into
> > > cam_sim_alloc(). In the case of the mps(4) driver, the calculation
> > > is in mps_attach():
> > >
> > >         sc->num_reqs = MIN(MPS_REQ_FRAMES, sc->facts->RequestCredit);
> > >
> > > What is reported for the RequestCredit on this particular adapter?
> > >
> > > The other question is, what does 'camcontrol tags daX -v' show when
> > > you are running the test?
> >
> > Below is output of camcontrol tags da1 -v.
> >
> > dhcp-135-24-192-127# camcontrol tags da13 -v
> > (pass13:mrsas0:0:13:0): dev_openings  1024
> > (pass13:mrsas0:0:13:0): dev_active    0
> > (pass13:mrsas0:0:13:0): devq_openings 1024
> > (pass13:mrsas0:0:13:0): devq_queued   0
> > (pass13:mrsas0:0:13:0): held          0
> > (pass13:mrsas0:0:13:0): mintags       2
> > (pass13:mrsas0:0:13:0): maxtags       1024
> > dhcp-135-24-192-127# camcontrol tags da1 -v
> > (pass1:mrsas0:0:1:0): dev_openings  1024
> > (pass1:mrsas0:0:1:0): dev_active    0
> > (pass1:mrsas0:0:1:0): devq_openings 1024
> > (pass1:mrsas0:0:1:0): devq_queued   0
> > (pass1:mrsas0:0:1:0): held          0
> > (pass1:mrsas0:0:1:0): mintags       2
> > (pass1:mrsas0:0:1:0): maxtags       1024
> >
> > Value 1024 is hard coded for my testing. In MegaRaid controller and
> > SAS-HBA Driver read max commands value from FW.
> > Similar to "RequestCredit".. Different FW has different value, but
> > they are every time above 255.
> >
> > When I run IOs dev_active stays in range of 0-255 only. See below
> > output when I run IOs on /dev/da1 and /dev/da13. I expect total
> > dev_openings should go beyond 255, which is not happening.
> >
> > dhcp-135-24-192-127# camcontrol tags da1 -v
> > (pass1:mrsas0:0:1:0): dev_openings  832
> > (pass1:mrsas0:0:1:0): dev_active    192
> > (pass1:mrsas0:0:1:0): devq_openings 832
> > (pass1:mrsas0:0:1:0): devq_queued   0
> > (pass1:mrsas0:0:1:0): held          0
> > (pass1:mrsas0:0:1:0): mintags       2
> > (pass1:mrsas0:0:1:0): maxtags       1024
> > dhcp-135-24-192-127# camcontrol tags da13 -v
> > (pass13:mrsas0:0:13:0): dev_openings  881
> > (pass13:mrsas0:0:13:0): dev_active    143
> > (pass13:mrsas0:0:13:0): devq_openings 881
> > (pass13:mrsas0:0:13:0): devq_queued   0
> > (pass13:mrsas0:0:13:0): held          0
> > (pass13:mrsas0:0:13:0): mintags       2
> > (pass13:mrsas0:0:13:0): maxtags       1024
> >
> > Jim:
> > Below is my API call. I have hard code value "queue_depth" = 1024
> >
> > sc->sim_0 = cam_sim_alloc(mrsas_action, mrsas_poll, "mrsas", sc,
> >     device_get_unit(sc->mrsas_dev), &sc->sim_lock, queue_depth,
> >     queue_depth, devq);
> >
> > ~ Kashyap
> >
> > > Ken
> > > --
> > > Kenneth Merry
> > > ken@FreeBSD.ORG

--
Kenneth Merry
ken@FreeBSD.ORG
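
The queue-depth clamp quoted above from mps_attach() can be worked through
with numbers that appear in this archive. The Python sketch below only
mirrors the arithmetic of the MIN() expression and of the dd loop; it is
not driver code. The value used for MPS_REQ_FRAMES (1024) is an assumption
made for illustration, since the thread does not state it, and should be
checked against the driver headers in the tree being used. The
RequestCredit value is the one printed in the IOCFacts output posted
earlier in this archive.

```python
# Assumption for illustration only: the thread does not give this value.
ASSUMED_MPS_REQ_FRAMES = 1024

# RequestCredit as printed in the IOCFacts output earlier in this archive.
REPORTED_REQUEST_CREDIT = 10240


def sim_queue_depth(req_frames: int, request_credit: int) -> int:
    """Mirror MIN(MPS_REQ_FRAMES, RequestCredit): the smaller limit wins."""
    return min(req_frames, request_credit)


def dd_process_count(iterations: int, drives: int) -> int:
    """Each iteration of the loop starts one background dd per drive."""
    return iterations * drives


if __name__ == "__main__":
    depth = sim_queue_depth(ASSUMED_MPS_REQ_FRAMES, REPORTED_REQUEST_CREDIT)
    print("requests the driver would allocate:", depth)
    print("dd processes for 17 iterations over 16 drives:",
          dd_process_count(17, 16))
```

Under that assumption the driver-side constant, not the firmware's
RequestCredit, would be the binding limit, and either is well above the
255 ceiling being asked about.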