From: Roger Pau Monné
Date: Sat, 5 Jul 2014 12:35:11 +0200
To: Konstantin Belousov
Cc: freebsd-fs@freebsd.org, Stefan Parvu, FreeBSD Hackers
Subject: Re: Strange IO performance with UFS
Message-ID: <53B7D4DF.40301@citrix.com>
In-Reply-To: <20140705095831.GO93733@kib.kiev.ua>

On 05/07/14 11:58, Konstantin Belousov wrote:
> On Sat, Jul 05, 2014 at 11:32:06AM +0200, Roger Pau Monné wrote:
>> On 04/07/14 23:19, Stefan Parvu wrote:
>>> Hi,
>>>
>>>>> I'm doing some tests on IO performance using fio, and I've
>>>>> found something weird when using UFS and large files. I
>>>>> have the following very simple sequential fio workload:
>>>
>>> System: FreeBSD ox 10.0-RELEASE-p6 FreeBSD 10.0-RELEASE-p6 #0:
>>> Tue Jun 24 07:47:37 UTC 2014
>>> root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
>>> amd64
>>>
>>> 1. Seq Write to 1 file, 10GB size, single writer, block 4k, UFS2:
>>>
>>> I tried to write sequentially, using a single writer and an
>>> IOSIZE similar to your example: 10 GB to a 14TB hdw RAID 10 LSI
>>> device using fio 2.1.9 under FreeBSD 10.0.
>>>
>>> Result:
>>> Run status group 0 (all jobs):
>>>   WRITE: io=10240MB, aggrb=460993KB/s, minb=460993KB/s,
>>>   maxb=460993KB/s, mint=22746msec, maxt=22746msec
>>
>> This looks much better than what I've seen in my benchmarks; how
>> much memory does the system have?
>>
>> In my case I've seen the read issue when trying to write to files
>> that were greater than the memory the system has. My box has 6GB
>> of RAM and I was using a 10GB file.
>>
>>> 2. Seq Write to 2500 files, each file 5MB size, multiple
>>> writers, UFS2:
>>>
>>> Result:
>>> Run status group 0 (all jobs):
>>>   WRITE: io=12500MB, aggrb=167429KB/s, minb=334KB/s,
>>>   maxb=9968KB/s, mint=2568msec, maxt=76450msec
>>>
>>> Questions:
>>>
>>> - where are you writing, what storage: hdw / sfw RAID?
>>
>> The storage is a simple SATA disk, no RAID:
>>
>> pass0 at ahcich0 bus 0 scbus0 target 0 lun 0
>> pass0: ATA-8 SATA 3.x device
>> pass0: Serial Number Z3T3FJXL
>> pass0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
>> pass0: Command Queueing enabled
>>
>>> - are you using time-based fio tests?
>>
>> I'm using the following fio workload, as stated in the first
>> email:
>>
>> [global]
>> rw=write
>> size=4g
>> bs=4k
>>
>> [job1]
>>
>> The problem doesn't seem to be related to the hardware (I've also
>> seen this when running inside a VM), but to UFS itself, which at
>> some point (or maybe under certain conditions) starts issuing a
>> lot of reads while servicing a simple write:
>>
>> kernel`g_io_request+0x384
>> kernel`g_part_start+0x2c3
>> kernel`g_io_request+0x384
>> kernel`g_part_start+0x2c3
>> kernel`g_io_request+0x384
>> kernel`ufs_strategy+0x8a
>> kernel`VOP_STRATEGY_APV+0xf5
>> kernel`bufstrategy+0x46
>> kernel`cluster_read+0x5e6
>> kernel`ffs_balloc_ufs2+0x1be2
>> kernel`ffs_write+0x310
>> kernel`VOP_WRITE_APV+0x166
>> kernel`vn_write+0x2eb
>> kernel`vn_io_fault_doio+0x22
>> kernel`vn_io_fault1+0x78
>> kernel`vn_io_fault+0x173
>> kernel`dofilewrite+0x85
>> kernel`kern_writev+0x65
>> kernel`sys_write+0x63
>>
>> This can also be seen by running iostat in parallel with the fio
>> workload:
>>
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     243.3 233.7 31053.3 29919.1   31  57.4 100
>>
>> This clearly shows that even though I was doing a sequential
>> write (the fio workload shown above), the disk was actually
>> reading more data than it was writing, which makes no sense, and
>> all the reads come from the code path in the trace above.
>
> The backtrace above means that BA_CLRBUF was specified for
> UFS_BALLOC(). In turn, this occurs when the write size is less
> than the UFS block size. UFS has to read the block to ensure that
> a partial write does not corrupt the rest of the buffer.

Thanks for the clarification, that makes sense. I'm not opening the
file with O_DIRECT, so shouldn't the write be cached in memory and
flushed to disk once we have the full block? It's a sequential
write, so the whole block is going to be rewritten very soon.

> You can get the block size for a file with stat(2), the st_blksize
> field of struct stat, or with statfs(2), the f_iosize field of
> struct statfs, or just by looking at the dumpfs output for your
> filesystem, the bsize value. For modern UFS the typical value is
> 32KB.

Yes, the block size is 32KB, checked with dumpfs. I've changed the
block size in fio to 32k and now I get the expected results in
iostat and fio:

                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       1.0 658.2    31.1 84245.1   58 108.4 101
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 689.8     0.0 88291.4   54 112.1  99
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       1.0 593.3    30.6 75936.9   80 111.7  97

write: io=10240MB, bw=81704KB/s, iops=2553, runt=128339msec

Roger.
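
P.S. The kernel stack above reads like DTrace fbt output. The message
does not say how it was captured, but a one-liner along these lines
would produce it (an assumption, not confirmed by the thread):

    # Aggregate kernel stacks entering g_io_request; with the 4k
    # workload the ffs_balloc_ufs2 -> cluster_read path dominates.
    dtrace -n 'fbt::g_io_request:entry { @[stack()] = count(); }'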
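
A minimal sketch of checking the block size from C, using the
stat(2) and statfs(2) fields Konstantin mentions (an illustrative
program, not code from the thread):

    /* Print the I/O size a writer should use to avoid the UFS
     * read-modify-write (BA_CLRBUF) path described above. */
    #include <sys/param.h>
    #include <sys/mount.h>
    #include <sys/stat.h>

    #include <stdint.h>
    #include <stdio.h>

    int
    main(int argc, char **argv)
    {
            struct stat sb;
            struct statfs fsb;

            if (argc != 2) {
                    fprintf(stderr, "usage: %s file\n", argv[0]);
                    return (1);
            }
            if (stat(argv[1], &sb) != 0) {
                    perror("stat");
                    return (1);
            }
            if (statfs(argv[1], &fsb) != 0) {
                    perror("statfs");
                    return (1);
            }
            /* Preferred I/O size for this file (the UFS bsize). */
            printf("st_blksize: %ju\n", (uintmax_t)sb.st_blksize);
            /* Optimal transfer block size for the filesystem. */
            printf("f_iosize:   %ju\n", (uintmax_t)fsb.f_iosize);
            return (0);
    }

Writes issued in aligned multiples of this size fill whole blocks,
so ffs_balloc_ufs2() never needs to read back the partial block.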
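
The message says only that bs was changed to 32k; presumably the job
file from earlier in the thread became something like:

    [global]
    rw=write
    size=4g
    bs=32k

    [job1]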