From owner-freebsd-scsi@FreeBSD.ORG  Tue Feb  5 02:36:09 2013
Delivered-To: freebsd-scsi@freebsd.org
Received: by hub.freebsd.org (Postfix, from userid 821)
        id A853C2E1; Tue, 5 Feb 2013 02:36:09 +0000 (UTC)
Date: Tue, 5 Feb 2013 02:36:09 +0000
From: John
To: freebsd-scsi@freebsd.org
Subject: Increase mps sequential read performance with ZFS/zvol
Message-ID: <20130205023609.GA99100@FreeBSD.org>
References: <510E987C.4090509@oxit.fi> <510E9FD1.5070907@freebsd.org>
In-Reply-To: <510E9FD1.5070907@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: SCSI subsystem

Hi Folks,

I'm in the process of putting together another ZFS server, and after running
some sequential read performance tests I'm thinking things could be better.
It's running 9.1-STABLE from late January:

    FreeBSD vprzfs30p.unx.sas.com 9.1-STABLE FreeBSD 9.1-STABLE #1 r246079M

I have two HP D2700 shelves populated with 600GB drives, connected to a pair
of LSI 9207-8e HBA cards installed in a Dell R620 with 128GB of RAM; the OS is
installed on an internal RAID volume. The shelves are dual-channel, and each
LSI card has a channel through both shelves. Gmultipath is used to bind the
disks so that each disk can be addressed through either controller and the
I/O balanced between them. The zfs pool consists of 24 mirrors, each pair
taking one disk from each shelf. The multipaths are rotated so that I/O is
balanced across shelves and controllers. (A rough sketch of how the multipath
labels and the pool were put together is at the bottom of this mail, after
the zpool status.)

For testing, two 300GB zvols were created, each almost full:

NAME              USED  AVAIL  REFER  MOUNTPOINT
pool0            1.46T  11.4T    31K  /pool0
pool0/lun000004   301G  11.4T   261G  -
pool0/lun000005   301G  11.4T   300G  -

Running a simple dd test:

# dd if=/dev/zvol/pool0/lun000005 of=/dev/null bs=512k
614400+0 records in
614400+0 records out
322122547200 bytes transferred in 278.554656 secs (1156406975 bytes/sec)

With the drives spread and balanced across four 6Gb/s channels, 1.1GB/s seems
a bit slow. Note that changing the bs= option makes no real difference.

Now, if I run two 'dd' operations against the two zvols in parallel:

# dd if=/dev/zvol/pool0/lun000005 of=/dev/null bs=512k
614400+0 records in
614400+0 records out
322122547200 bytes transferred in 278.605380 secs (1156196435 bytes/sec)

# dd if=/dev/zvol/pool0/lun000004 of=/dev/null bs=512k
614400+0 records in
614400+0 records out
322122547200 bytes transferred in 282.065008 secs (1142015274 bytes/sec)

This tells me the I/O subsystem has plenty of headroom, such that the first
'dd' operation on its own could be running faster. I've included some basic
config information below. There are no kmem values in /boot/loader.conf. I
did play around with block_cap, but it made no difference. It seems like
something is holding the system back. Thanks for any ideas.

-John

Output from top during a single dd run:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
    5 root       11  -8    -     0K   208K zvol:i  1   5:11 41.65% zfskern
    0 root      350  -8    0     0K  5600K -       5   3:59 15.23% kernel
 1784 root        1  26    0  9944K  2072K CPU1    1   0:31 13.87% dd

The zvol:io state appears to be a simple wait loop, waiting for outstanding
I/O requests to complete. How do I get more I/O requests going?
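For reference, the two test volumes were created with a plain 'zfs create -V'
and then filled with data before testing, and the parallel run above was just
the same two dd commands backgrounded. These are reconstructed from memory,
so treat the exact invocations as illustrative rather than exact:

# zfs create -V 300g pool0/lun000004
# zfs create -V 300g pool0/lun000005
  ... (volumes filled with data here) ...
# dd if=/dev/zvol/pool0/lun000005 of=/dev/null bs=512k &
# dd if=/dev/zvol/pool0/lun000004 of=/dev/null bs=512k &
# wait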
A sample of the highest number of I/O requests per controller:

dev.mps.0.io_cmds_highwater: 207
dev.mps.1.io_cmds_highwater: 126

IOCFACTS (identical on both controllers):

mps0: port 0xec00-0xecff mem 0xdaff0000-0xdaffffff,0xdaf80000-0xdafbffff irq 48 at device 0.0 on pci5
mps0: Doorbell= 0x22000000
mps0: mps_wait_db_ack: successfull count(2), timeout(5)
mps0: Doorbell= 0x12000000
mps0: mps_wait_db_ack: successfull count(1), timeout(5)
mps0: mps_wait_db_ack: successfull count(1), timeout(5)
mps0: mps_wait_db_ack: successfull count(1), timeout(5)
mps0: mps_wait_db_ack: successfull count(1), timeout(5)
mps0: IOCFacts  :
    MsgVersion: 0x200
    HeaderVersion: 0x1b00
    IOCNumber: 0
    IOCExceptions: 0x0
    MaxChainDepth: 128
    WhoInit: ROM BIOS
    NumberOfPorts: 1
    RequestCredit: 10240
    ProductID: 0x2214
    IOCCapabilities: 1285c
    FWVersion= 15-0-0-0
    IOCRequestFrameSize: 32
    MaxInitiators: 32
    MaxTargets: 1024
    MaxSasExpanders: 64
    MaxEnclosures: 65
    ProtocolFlags: 3
    HighPriorityCredit: 128
    MaxReplyDescriptorPostQueueDepth: 65504
    ReplyFrameSize: 32
    MaxVolumes: 0
    MaxDevHandle: 1128
    MaxPersistentEntries: 128
mps0: Firmware: 15.00.00.00, Driver: 14.00.00.01-fbsd
mps0: IOCCapabilities: 1285c

And some output from 'gstat -f Z -I 300ms':

dT: 0.302s  w: 0.300s  filter: Z
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    202    202  25450    2.6      0      0    0.0   25.5| multipath/Z0
    1    202    202  25046    6.2      0      0    0.0   36.6| multipath/Z2
    7    185    185  23735    6.3      0      0    0.0   33.1| multipath/Z4
    0    212    212  27125    5.4      0      0    0.0   30.4| multipath/Z6
    0    169    169  21616    5.0      0      0    0.0   28.1| multipath/Z8
    0    162    162  20768    5.0      0      0    0.0   25.7| multipath/Z10
    0    175    175  22463    6.0      0      0    0.0   30.4| multipath/Z12
    0    192    192  24582    4.4      0      0    0.0   32.1| multipath/Z14
    2    169    169  21616    3.3      0      0    0.0   18.8| multipath/Z16
    4    169    169  20808    4.1      0      0    0.0   23.0| multipath/Z18
    2    195    195  24602    4.5      0      0    0.0   28.5| multipath/Z20
    5    172    172  22039    4.4      0      0    0.0   22.7| multipath/Z22
    0    166    166  21192    3.7      0      0    0.0   20.2| multipath/Z24
    7    179    179  22887    5.4      0      0    0.0   27.8| multipath/Z26
    7    172    172  22039    3.5      0      0    0.0   23.1| multipath/Z28
    0    192    192  24582    3.8      0      0    0.0   25.5| multipath/Z30
    1    175    175  22463    6.0      0      0    0.0   30.5| multipath/Z32
    1    182    182  22907    3.9      0      0    0.0   25.6| multipath/Z34
    0    212    212  27125    6.3      0      0    0.0   32.7| multipath/Z36
    0    179    179  22483    4.8      0      0    0.0   27.5| multipath/Z38
    2    185    185  23735    4.6      0      0    0.0   30.0| multipath/Z40
    0    179    179  22887    4.5      0      0    0.0   28.2| multipath/Z42
    3    195    195  25006    4.4      0      0    0.0   32.3| multipath/Z44
    3    192    192  24582    4.0      0      0    0.0   30.5| multipath/Z46
    0      0      0      0    0.0      0      0    0.0    0.0| multipath/Z48
    0    179    179  22887    4.7      0      0    0.0   31.0| multipath/Z1
    0    185    185  23331    4.1      0      0    0.0   24.8| multipath/Z3
    0    175    175  21639    5.3      0      0    0.0   28.2| multipath/Z5
    4    162    162  20768    5.1      0      0    0.0   26.6| multipath/Z7
    0    195    195  25006    3.5      0      0    0.0   23.4| multipath/Z9
    3    179    179  22887    5.0      0      0    0.0   25.7| multipath/Z11
    4    159    159  20344    4.9      0      0    0.0   23.7| multipath/Z13
    4    166    166  21192    4.3      0      0    0.0   25.1| multipath/Z15
    0    169    169  21616    3.9      0      0    0.0   24.7| multipath/Z17
    7    189    189  23334    4.2      0      0    0.0   25.7| multipath/Z19
    4    169    169  21212    4.3      0      0    0.0   28.1| multipath/Z21
    0    159    159  20344    5.3      0      0    0.0   25.8| multipath/Z23
    5    185    185  23316    4.1      0      0    0.0   26.0| multipath/Z25
    0    192    192  24582    4.9      0      0    0.0   30.6| multipath/Z27
    0    172    172  22039    5.5      0      0    0.0   27.4| multipath/Z29
    4    166    166  21192    4.2      0      0    0.0   23.7| multipath/Z31
    0    169    169  20778    3.5      0      0    0.0   22.2| multipath/Z33
    2    172    172  21232    5.1      0      0    0.0   29.4| multipath/Z35
    3    169    169  21616    2.9      0      0    0.0   20.1| multipath/Z37
    0    179    179  22887    5.2      0      0    0.0   32.0| multipath/Z39
    0    212    212  26721    5.4      0      0    0.0   31.7| multipath/Z41
    2    175    175  22463    4.4      0      0    0.0   28.0| multipath/Z43
    0    179    179  22887    3.6      0      0    0.0   18.2| multipath/Z45
    0    179    179  22887    4.3      0      0    0.0   28.3| multipath/Z47
    0      0      0      0    0.0      0      0    0.0    0.0| multipath/Z49

Each individual disk on the system shows a capability of 255 tags:

# camcontrol tags da0 -v
(pass2:mps0:0:10:0): dev_openings  255
(pass2:mps0:0:10:0): dev_active    0
(pass2:mps0:0:10:0): devq_openings 255
(pass2:mps0:0:10:0): devq_queued   0
(pass2:mps0:0:10:0): held          0
(pass2:mps0:0:10:0): mintags       2
(pass2:mps0:0:10:0): maxtags       255

zpool:

# zpool status
  pool: pool0
 state: ONLINE
  scan: none requested
config:

        NAME               STATE     READ WRITE CKSUM
        pool0              ONLINE       0     0     0
          mirror-0         ONLINE       0     0     0
            multipath/Z0   ONLINE       0     0     0
            multipath/Z1   ONLINE       0     0     0
          mirror-1         ONLINE       0     0     0
            multipath/Z2   ONLINE       0     0     0
            multipath/Z3   ONLINE       0     0     0
          mirror-2         ONLINE       0     0     0
            multipath/Z4   ONLINE       0     0     0
            multipath/Z5   ONLINE       0     0     0
          mirror-3         ONLINE       0     0     0
            multipath/Z6   ONLINE       0     0     0
            multipath/Z7   ONLINE       0     0     0
          mirror-4         ONLINE       0     0     0
            multipath/Z8   ONLINE       0     0     0
            multipath/Z9   ONLINE       0     0     0
          mirror-5         ONLINE       0     0     0
            multipath/Z10  ONLINE       0     0     0
            multipath/Z11  ONLINE       0     0     0
          ...
          mirror-21        ONLINE       0     0     0
            multipath/Z42  ONLINE       0     0     0
            multipath/Z43  ONLINE       0     0     0
          mirror-22        ONLINE       0     0     0
            multipath/Z44  ONLINE       0     0     0
            multipath/Z45  ONLINE       0     0     0
          mirror-23        ONLINE       0     0     0
            multipath/Z46  ONLINE       0     0     0
            multipath/Z47  ONLINE       0     0     0
        spares
          multipath/Z48    AVAIL
          multipath/Z49    AVAIL

errors: No known data errors
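And, as promised above, a rough sketch of how the multipath labels and the
pool were originally put together. This is reconstructed from memory, the da
numbers are illustrative, and the real script walks all 50 disks, pairing one
from each shelf per mirror:

(one label per physical disk; the second path through the other controller is
picked up automatically when it is tasted, and -A requests active/active so
reads are spread across both HBAs)

# gmultipath label -A Z0 /dev/da0
# gmultipath label -A Z1 /dev/da25
  ...

(each mirror pairs one multipath device from each shelf; Z48/Z49 are spares)

# zpool create pool0 \
    mirror multipath/Z0 multipath/Z1 \
    mirror multipath/Z2 multipath/Z3 \
    ...
    mirror multipath/Z46 multipath/Z47 \
    spare multipath/Z48 multipath/Z49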