Date:      Thu, 07 Jan 2010 00:31:12 +0200
From:      Alexander Motin <mav@FreeBSD.org>
To:        Ivan Voras <ivoras@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r201658 - head/sbin/geom/class/stripe
Message-ID:  <4B450F30.20705@FreeBSD.org>
In-Reply-To: <9bbcef731001061103u33fd289q727179454b21ce18@mail.gmail.com>
References:  <201001061712.o06HCICF087127@svn.freebsd.org> <9bbcef731001060938k2b0014a2m15eef911b9922b2c@mail.gmail.com> <4B44D8FA.2000608@FreeBSD.org> <9bbcef731001061103u33fd289q727179454b21ce18@mail.gmail.com>

Ivan Voras wrote:
> 2010/1/6 Alexander Motin <mav@freebsd.org>:
>> Ivan Voras wrote:
> 
>>> I think there was one more reason - though I'm not sure if it is still
>>> valid given your current and future work - the MAXPHYS limitation. If
>>> MAXPHYS is 128k, with 64k stripes data would only be read from a
>>> maximum of 2 drives. With 4k stripes it would have been read from
>>> 128/4 = 32 drives, though I agree 4k is too low in any case nowadays.
>>> I usually choose 16k or 32k for my setups.
>> While you are right about the MAXPHYS influence, and I hope we can
>> raise it in the not too distant future, IMHO it is the file system's
>> business to manage read-ahead/write-back deep enough to keep all
>> drives busy, independently of the MAXPHYS value. With a small MAXPHYS
>> value the FS should just generate more requests in advance. Except
>> for some RAID3/5/6 cases, where short writes are ineffective, the
>> MAXPHYS value should only affect processing overhead.
> 
> Yes, the experience which led to my post was mostly with UFS which,
> while AFAIK it does read-ahead, still does it serially (I think this
> is implied by your experiments with NCQ and ZFS vs UFS) - so in any
> case only 2 drives are hit at any moment in time with a 64k stripe
> size.
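
To make the quoted MAXPHYS/stripe arithmetic concrete, here is a
minimal C sketch (an illustration only, not code from this thread; the
function name and the 32-member array are made up for the example):

#include <stdio.h>

/*
 * Illustration: how many members of a striped array a single I/O of
 * size xfer_size can touch, assuming a stripe-aligned transfer.
 */
static unsigned
disks_touched(unsigned xfer_size, unsigned stripe_size, unsigned ndisks)
{
        unsigned n = xfer_size / stripe_size;

        if (n == 0)
                n = 1;
        return (n < ndisks ? n : ndisks);
}

int
main(void)
{
        /* 128k MAXPHYS, 64k stripes: at most 2 members per request. */
        printf("%u\n", disks_touched(128 * 1024, 64 * 1024, 32));
        /* 128k MAXPHYS, 4k stripes: up to 32 members per request. */
        printf("%u\n", disks_touched(128 * 1024, 4 * 1024, 32));
        return (0);
}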

I do not think the read-ahead is that limited. On a system with the
default MAXPHYS I made a gstripe with a 64K stripe size out of 4
identical drives, each with a maximal read speed of 108MB/s. Reads
with dd from a large pre-written file on UFS showed:

vfs.read_max=8 (default) - 235090074 bytes/sec
vfs.read_max=16          - 378385148 bytes/sec
vfs.read_max=32          - 386620109 bytes/sec

I've put some printfs into the clustering read code and found enough
read-ahead there. So it works.
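
For what it is worth, those numbers also match a back-of-the-envelope
estimate of the cluster read-ahead window. The sketch below is only an
illustration; it assumes vfs.read_max is counted in file system blocks
and that the file system used 16K blocks, which may not match the
actual test setup:

#include <stdio.h>

int
main(void)
{
        const unsigned fs_block = 16 * 1024;    /* assumed UFS block size */
        const unsigned stripe = 64 * 1024;      /* gstripe stripe size */
        const unsigned read_max[] = { 8, 16, 32 };
        unsigned i, window;

        for (i = 0; i < 3; i++) {
                /*
                 * Approximate read-ahead window and how many stripes,
                 * i.e. array members, it can keep busy at once.
                 */
                window = read_max[i] * fs_block;
                printf("vfs.read_max=%u: window %uK, ~%u stripes\n",
                    read_max[i], window / 1024, window / stripe);
        }
        return (0);
}

With those assumptions read_max=8 covers only two 64K stripes, which
lines up with the roughly 2x single-drive speed above, while 16 and 32
cover all four members.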

One thing that IMHO would be nice to see there is alignment of the
read-ahead requests to the array stripe size/offset. A dirty hack I
tried there reduced the number of requests to the array components by
30%.
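
As a rough illustration of what such alignment could look like (a
sketch only, not the actual hack; the real clustering code operates on
buffers, and a non-zero stripe offset is ignored here):

#include <stdint.h>

/*
 * Illustration: trim the speculative tail of a read-ahead request so
 * it ends on a stripe boundary and is not split across one extra
 * array component.  Returns the (possibly shortened) length.
 */
static uint64_t
trim_readahead(uint64_t offset, uint64_t length, uint64_t stripesize)
{
        uint64_t end = offset + length;

        end -= end % stripesize;
        if (end <= offset)
                return (length);        /* too short to trim, leave it */
        return (end - offset);
}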

-- 
Alexander Motin


