Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 5 Jul 2014 18:18:07 +0200
From:      =?ISO-8859-1?Q?Roger_Pau_Monn=E9?= <roger.pau@citrix.com>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-fs@freebsd.org, Stefan Parvu <sparvu@systemdatarecorder.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: Strange IO performance with UFS
Message-ID:  <53B8253F.5060403@citrix.com>
In-Reply-To: <20140705112448.GQ93733@kib.kiev.ua>
References:  <53B691EA.3070108@citrix.com> <53B69C73.7090806@citrix.com> <20140705001938.54a3873dd698080d93d840e2@systemdatarecorder.org> <53B7C616.1000702@citrix.com> <20140705095831.GO93733@kib.kiev.ua> <53B7D4DF.40301@citrix.com> <20140705112448.GQ93733@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On 05/07/14 13:24, Konstantin Belousov wrote:
> On Sat, Jul 05, 2014 at 12:35:11PM +0200, Roger Pau Monn? wrote:
>> On 05/07/14 11:58, Konstantin Belousov wrote:
>>> On Sat, Jul 05, 2014 at 11:32:06AM +0200, Roger Pau Monn?
>>> wrote:
>>>> kernel`g_io_request+0x384 kernel`g_part_start+0x2c3 
>>>> kernel`g_io_request+0x384 kernel`g_part_start+0x2c3 
>>>> kernel`g_io_request+0x384 kernel`ufs_strategy+0x8a 
>>>> kernel`VOP_STRATEGY_APV+0xf5 kernel`bufstrategy+0x46 
>>>> kernel`cluster_read+0x5e6 kernel`ffs_balloc_ufs2+0x1be2 
>>>> kernel`ffs_write+0x310 kernel`VOP_WRITE_APV+0x166 
>>>> kernel`vn_write+0x2eb kernel`vn_io_fault_doio+0x22 
>>>> kernel`vn_io_fault1+0x78 kernel`vn_io_fault+0x173 
>>>> kernel`dofilewrite+0x85 kernel`kern_writev+0x65 
>>>> kernel`sys_write+0x63
>>>> 
>>>> This can also be seen by running iostat in parallel with the
>>>> fio workload:
>>>> 
>>>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b ada0 
>>>> 243.3 233.7 31053.3 29919.1   31  57.4 100
>>>> 
>>>> This clearly shows that even when I was doing a sequential
>>>> write (the fio workload shown above), the disk was actually
>>>> reading more data than writing it, which makes no sense, and
>>>> all the reads come from the path trace shown above.
>>> 
>>> The backtrace above means that the BA_CLRBUF was specified for 
>>> UFS_BALLOC(). In turns, this occurs when the write size is
>>> less than the UFS block size. UFS has to read the block to
>>> ensure that partial write does not corrupt the rest of the
>>> buffer.
>> 
>> Thanks for the clarification, that makes sense. I'm not opening
>> the file with O_DIRECT, so shouldn't the write be cached in
>> memory and flushed to disk when we have the full block? It's a
>> sequential write, so the whole block is going to be rewritten
>> very soon.
>> 
>>> 
>>> You can get the block size for file with stat(2), st_blksize
>>> field of the struct stat, or using statfs(2), field f_iosize of
>>> struct statfs, or just looking at the dumpfs output for your
>>> filesystem, the bsize value.  For modern UFS typical value is
>>> 32KB.
>> 
>> Yes, block size is 32KB, checked with dumpfs. I've changed the
>> block size in fio to 32k and then I get the expected results in
>> iostat and fio:
>> 
>> extended device statistics device     r/s   w/s    kr/s    kw/s
>> qlen svc_t  %b ada0       1.0 658.2    31.1 84245.1   58 108.4
>> 101 extended device statistics device     r/s   w/s    kr/s
>> kw/s qlen svc_t  %b ada0       0.0 689.8     0.0 88291.4   54
>> 112.1  99 extended device statistics device     r/s   w/s    kr/s
>> kw/s qlen svc_t  %b ada0       1.0 593.3    30.6 75936.9   80
>> 111.7  97
>> 
>> write: io=10240MB, bw=81704KB/s, iops=2553, runt=128339msec
> 
> The current code in ffs_write() only avoids read before write when 
> write covers complete block.  I think we can somewhat loose the
> test to also avoid read when we are at EOF and write covers
> completely the valid portion of the last block.
> 
> This leaves the unwritten portion of the block with the garbage. I 
> believe that it is not harmful, since the only way for usermode to 
> access that garbage is through the mmap(2). The
> vnode_generic_getpages() zeroes out parts of the page which are
> after EOF.
> 
> Try this, almost completely untested:

Doesn't seem to help much, I'm still seeing the same issue. I'm
sampling iostat every 1s, and here's the output form the start of the
4k block fio workload:

                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 349.5     0.0 44612.3   48  88.0  52
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 655.4     0.0 83773.6   76  99.8 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 699.2     0.0 89493.1   59 109.4 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 628.1     0.0 80392.6   55 114.8  98
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 655.7     0.0 83799.6   79  98.4 102
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 701.4     0.0 89782.0   80 105.5  97
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 697.9     0.0 89331.6   78 112.0 103
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 714.1     0.0 91408.7   77 110.3 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 724.0     0.0 92675.0   67 112.5 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 700.4     0.0 89646.6   49 102.5 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 686.4     0.0 87857.2   78 110.0 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 702.0     0.0 89851.6   80 112.9  97
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 736.3     0.0 94246.4   67 110.1 103
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 624.6     0.0 79950.0   48 115.7 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 704.0     0.0 90118.4   77 106.1 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 714.6     0.0 91470.0   80 103.6  99
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 710.4     0.0 90926.1   80 111.1 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 655.3     0.0 83882.1   70 115.8  99
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 539.8     0.0 69094.5   80 121.2 101
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 711.6     0.0 91087.6   79 107.9 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 705.5     0.0 90304.5   81 111.3  97
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 727.3     0.0 93092.8   81 108.9 102
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 699.5     0.0 89296.4   55 109.0 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 689.0     0.0 88066.1   78  96.6 101
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 738.3     0.0 94496.1   56 109.1 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 615.4     0.0 78770.0   80 112.3  98
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 707.3     0.0 90529.8   86 105.7 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 704.3     0.0 89333.9   67  98.3  99
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 641.3     0.0 82081.5   80 112.3 101
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 701.6     0.0 89747.9   51 101.1 101
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 693.0     0.0 88702.1   80 103.6  97
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 632.7     0.0 80991.8   80 112.0  99
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 709.0     0.0 90748.2   80 107.5 102
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 715.0     0.0 91523.0   80 104.7 101
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 650.1     0.0 83210.5   56 110.9 101
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 682.2     0.0 87319.1   57 107.9 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 719.0     0.0 92032.6   80 103.6  99
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 624.3     0.0 79905.8   80 110.5  97
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 696.5     0.0 89151.7   80 109.9 103
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 664.2     0.0 85017.6   77 109.9 101
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 681.7     0.0 87254.0   80 107.5  98
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 668.5     0.0 85569.3   57 109.9  99
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 682.3     0.0 87329.0   53 110.8 102
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0       0.0 643.9     0.0 82420.9   77 104.8 101
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     107.5 457.1 13701.7 58471.3   57 106.0 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     220.9 253.9 28281.4 32498.9   54 108.8 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     290.6 277.9 37198.8 35576.1   65  94.3 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     309.3 267.9 39590.7 34295.9   80  89.5 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     213.6 302.0 27212.7 38562.0   24  93.5 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     232.1 224.3 29712.5 28339.8   31 117.4 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     262.9 249.4 33654.0 31928.1   47  81.4 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     232.2 229.2 29721.6 29340.5   50  78.5 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     222.8 229.4 28430.0 29362.7   42  85.9 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     231.5 246.5 29628.8 31555.9    6  72.9 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     261.7 256.8 33498.7 32769.1   33  83.9 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     262.7 260.7 33628.3 33279.4   35  85.5 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     234.0 249.1 29867.9 31883.1   18  90.9 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     252.1 239.8 32263.0 30581.4   32  91.2 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     241.5 257.5 30917.0 32961.1   16  69.5 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     257.9 243.5 33011.9 31164.2   32  86.8 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     237.5 235.6 30311.2 30046.9   31  67.4 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     290.4 213.1 37172.8 27277.0   79  65.3 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     216.4 284.3 27703.7 36392.5   42  95.4 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     223.8 248.2 28645.1 31774.4   16  69.4  89
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     294.0 217.7 37544.4 27864.2   64  68.0 110
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     210.7 245.6 26966.6 31439.8   59 107.4 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     228.5 265.2 29246.6 33940.5   10  99.2  98
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     279.1 218.4 35727.2 27955.0   52  71.9 102
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     232.3 293.4 29607.9 37521.4   14  93.2 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     299.5 236.6 38340.2 30288.8   79  69.7 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     216.3 268.9 27686.3 34417.3    4  90.5 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     285.8 261.0 36585.3 33409.5   53  84.6 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     228.5 232.5 29059.7 29661.1   48  74.3 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     242.7 262.4 31060.0 33588.2   27  69.9 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     248.2 252.2 31766.1 32149.3    8  78.9 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     267.9 230.2 34288.6 29462.8   62  68.5 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
ada0     238.0 266.2 30375.8 34075.6    0  95.4 100

As can be seen from the log above, at first the workload runs fine,
and the disk is only performing writes, but at some point (in this
case around 40% of completion) it starts performing this
read-before-write dance that completely screws up performance.

Roger.




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?53B8253F.5060403>