Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 6 Jul 2014 14:49:16 -0700 (PDT)
From:      Don Lewis <truckman@FreeBSD.org>
To:        kostikbel@gmail.com
Cc:        freebsd-fs@FreeBSD.org, sparvu@systemdatarecorder.org, freebsd-hackers@FreeBSD.org
Subject:   Re: Strange IO performance with UFS
Message-ID:  <201407062149.s66LnGnm021769@gw.catspoiler.org>
In-Reply-To: <20140705195816.GV93733@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On  5 Jul, Konstantin Belousov wrote:
> On Sat, Jul 05, 2014 at 06:18:07PM +0200, Roger Pau Monn? wrote:
>> On 05/07/14 13:24, Konstantin Belousov wrote:
>> > On Sat, Jul 05, 2014 at 12:35:11PM +0200, Roger Pau Monn? wrote:
>> >> On 05/07/14 11:58, Konstantin Belousov wrote:
>> >>> On Sat, Jul 05, 2014 at 11:32:06AM +0200, Roger Pau Monn?
>> >>> wrote:
>> >>>> kernel`g_io_request+0x384 kernel`g_part_start+0x2c3 
>> >>>> kernel`g_io_request+0x384 kernel`g_part_start+0x2c3 
>> >>>> kernel`g_io_request+0x384 kernel`ufs_strategy+0x8a 
>> >>>> kernel`VOP_STRATEGY_APV+0xf5 kernel`bufstrategy+0x46 
>> >>>> kernel`cluster_read+0x5e6 kernel`ffs_balloc_ufs2+0x1be2 
>> >>>> kernel`ffs_write+0x310 kernel`VOP_WRITE_APV+0x166 
>> >>>> kernel`vn_write+0x2eb kernel`vn_io_fault_doio+0x22 
>> >>>> kernel`vn_io_fault1+0x78 kernel`vn_io_fault+0x173 
>> >>>> kernel`dofilewrite+0x85 kernel`kern_writev+0x65 
>> >>>> kernel`sys_write+0x63
>> >>>> 
>> >>>> This can also be seen by running iostat in parallel with the
>> >>>> fio workload:
>> >>>> 
>> >>>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b ada0 
>> >>>> 243.3 233.7 31053.3 29919.1   31  57.4 100
>> >>>> 
>> >>>> This clearly shows that even when I was doing a sequential
>> >>>> write (the fio workload shown above), the disk was actually
>> >>>> reading more data than writing it, which makes no sense, and
>> >>>> all the reads come from the path trace shown above.
>> >>> 
>> >>> The backtrace above means that the BA_CLRBUF was specified for 
>> >>> UFS_BALLOC(). In turns, this occurs when the write size is
>> >>> less than the UFS block size. UFS has to read the block to
>> >>> ensure that partial write does not corrupt the rest of the
>> >>> buffer.
>> >> 
>> >> Thanks for the clarification, that makes sense. I'm not opening
>> >> the file with O_DIRECT, so shouldn't the write be cached in
>> >> memory and flushed to disk when we have the full block? It's a
>> >> sequential write, so the whole block is going to be rewritten
>> >> very soon.
>> >> 
>> >>> 
>> >>> You can get the block size for file with stat(2), st_blksize
>> >>> field of the struct stat, or using statfs(2), field f_iosize of
>> >>> struct statfs, or just looking at the dumpfs output for your
>> >>> filesystem, the bsize value.  For modern UFS typical value is
>> >>> 32KB.
>> >> 
>> >> Yes, block size is 32KB, checked with dumpfs. I've changed the
>> >> block size in fio to 32k and then I get the expected results in
>> >> iostat and fio:
>> >> 
>> >> extended device statistics device     r/s   w/s    kr/s    kw/s
>> >> qlen svc_t  %b ada0       1.0 658.2    31.1 84245.1   58 108.4
>> >> 101 extended device statistics device     r/s   w/s    kr/s
>> >> kw/s qlen svc_t  %b ada0       0.0 689.8     0.0 88291.4   54
>> >> 112.1  99 extended device statistics device     r/s   w/s    kr/s
>> >> kw/s qlen svc_t  %b ada0       1.0 593.3    30.6 75936.9   80
>> >> 111.7  97
>> >> 
>> >> write: io=10240MB, bw=81704KB/s, iops=2553, runt=128339msec
>> > 
>> > The current code in ffs_write() only avoids read before write when 
>> > write covers complete block.  I think we can somewhat loose the
>> > test to also avoid read when we are at EOF and write covers
>> > completely the valid portion of the last block.
>> > 
>> > This leaves the unwritten portion of the block with the garbage. I 
>> > believe that it is not harmful, since the only way for usermode to 
>> > access that garbage is through the mmap(2). The
>> > vnode_generic_getpages() zeroes out parts of the page which are
>> > after EOF.
>> > 
>> > Try this, almost completely untested:
>> 
>> Doesn't seem to help much, I'm still seeing the same issue. I'm
>> sampling iostat every 1s, and here's the output form the start of the
>> 4k block fio workload:
>> 
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 349.5     0.0 44612.3   48  88.0  52
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 655.4     0.0 83773.6   76  99.8 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 699.2     0.0 89493.1   59 109.4 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 628.1     0.0 80392.6   55 114.8  98
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 655.7     0.0 83799.6   79  98.4 102
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 701.4     0.0 89782.0   80 105.5  97
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 697.9     0.0 89331.6   78 112.0 103
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 714.1     0.0 91408.7   77 110.3 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 724.0     0.0 92675.0   67 112.5 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 700.4     0.0 89646.6   49 102.5 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 686.4     0.0 87857.2   78 110.0 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 702.0     0.0 89851.6   80 112.9  97
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 736.3     0.0 94246.4   67 110.1 103
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 624.6     0.0 79950.0   48 115.7 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 704.0     0.0 90118.4   77 106.1 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 714.6     0.0 91470.0   80 103.6  99
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 710.4     0.0 90926.1   80 111.1 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 655.3     0.0 83882.1   70 115.8  99
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 539.8     0.0 69094.5   80 121.2 101
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 711.6     0.0 91087.6   79 107.9 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 705.5     0.0 90304.5   81 111.3  97
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 727.3     0.0 93092.8   81 108.9 102
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 699.5     0.0 89296.4   55 109.0 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 689.0     0.0 88066.1   78  96.6 101
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 738.3     0.0 94496.1   56 109.1 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 615.4     0.0 78770.0   80 112.3  98
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 707.3     0.0 90529.8   86 105.7 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 704.3     0.0 89333.9   67  98.3  99
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 641.3     0.0 82081.5   80 112.3 101
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 701.6     0.0 89747.9   51 101.1 101
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 693.0     0.0 88702.1   80 103.6  97
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 632.7     0.0 80991.8   80 112.0  99
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 709.0     0.0 90748.2   80 107.5 102
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 715.0     0.0 91523.0   80 104.7 101
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 650.1     0.0 83210.5   56 110.9 101
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 682.2     0.0 87319.1   57 107.9 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 719.0     0.0 92032.6   80 103.6  99
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 624.3     0.0 79905.8   80 110.5  97
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 696.5     0.0 89151.7   80 109.9 103
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 664.2     0.0 85017.6   77 109.9 101
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 681.7     0.0 87254.0   80 107.5  98
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 668.5     0.0 85569.3   57 109.9  99
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 682.3     0.0 87329.0   53 110.8 102
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0       0.0 643.9     0.0 82420.9   77 104.8 101
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     107.5 457.1 13701.7 58471.3   57 106.0 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     220.9 253.9 28281.4 32498.9   54 108.8 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     290.6 277.9 37198.8 35576.1   65  94.3 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     309.3 267.9 39590.7 34295.9   80  89.5 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     213.6 302.0 27212.7 38562.0   24  93.5 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     232.1 224.3 29712.5 28339.8   31 117.4 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     262.9 249.4 33654.0 31928.1   47  81.4 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     232.2 229.2 29721.6 29340.5   50  78.5 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     222.8 229.4 28430.0 29362.7   42  85.9 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     231.5 246.5 29628.8 31555.9    6  72.9 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     261.7 256.8 33498.7 32769.1   33  83.9 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     262.7 260.7 33628.3 33279.4   35  85.5 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     234.0 249.1 29867.9 31883.1   18  90.9 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     252.1 239.8 32263.0 30581.4   32  91.2 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     241.5 257.5 30917.0 32961.1   16  69.5 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     257.9 243.5 33011.9 31164.2   32  86.8 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     237.5 235.6 30311.2 30046.9   31  67.4 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     290.4 213.1 37172.8 27277.0   79  65.3 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     216.4 284.3 27703.7 36392.5   42  95.4 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     223.8 248.2 28645.1 31774.4   16  69.4  89
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     294.0 217.7 37544.4 27864.2   64  68.0 110
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     210.7 245.6 26966.6 31439.8   59 107.4 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     228.5 265.2 29246.6 33940.5   10  99.2  98
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     279.1 218.4 35727.2 27955.0   52  71.9 102
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     232.3 293.4 29607.9 37521.4   14  93.2 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     299.5 236.6 38340.2 30288.8   79  69.7 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     216.3 268.9 27686.3 34417.3    4  90.5 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     285.8 261.0 36585.3 33409.5   53  84.6 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     228.5 232.5 29059.7 29661.1   48  74.3 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     242.7 262.4 31060.0 33588.2   27  69.9 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     248.2 252.2 31766.1 32149.3    8  78.9 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     267.9 230.2 34288.6 29462.8   62  68.5 100
>>                         extended device statistics
>> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
>> ada0     238.0 266.2 30375.8 34075.6    0  95.4 100
>> 
>> As can be seen from the log above, at first the workload runs fine,
>> and the disk is only performing writes, but at some point (in this
>> case around 40% of completion) it starts performing this
>> read-before-write dance that completely screws up performance.
> 
> I reproduced this locally.  I think my patch is useless for the fio/4k write
> situation.
> 
> What happens is indeed related to the amount of the available memory.
> When the size of the file written by fio is larger than the memory,
> system has to recycle the cached pages.  So after some moment, doing
> a write has to do read-before-write, and this occurs not at the EOF
> (since fio pre-allocated the job file).

It would seem to be much better to recycle pages associated parts of the
file that haven't been touched in a long time before recycling pages
associated with the filesystem block that is currently being written. If
the writes are sequential, then it definitely makes sense to hang on to
the block until the last portion of the block is written.  It sounds
like we are doing pretty much the opposite of this.

What seems odd is that it sounds like we are detecting the partial write
of a block, reading the block from the disk, updating it with the new
partial data, writing the block, and then immediately tossing it.  It
seems odd that the the dirty block isn't allowed to stick around until
the syncer causes it to be written, with clean blocks being reclaimed in
the meantime.




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201407062149.s66LnGnm021769>