Date:      Fri, 10 Dec 2010 18:26:21 +0300
From:      Lev Serebryakov <lev@serebryakov.spb.ru>
To:        Alexander Motin <mav@FreeBSD.org>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Where userland read/write requests, whcih is larger than MAXPHYS, are splitted?
Message-ID:  <242059106.20101210182621@serebryakov.spb.ru>
In-Reply-To: <4D023D00.10301@FreeBSD.org>
References:  <mailpost.1291988544.5326917.42118.mailing.freebsd.hackers@FreeBSD.cs.nctu.edu.tw> <4D023D00.10301@FreeBSD.org>

Hello, Alexander.
You wrote on 10 December 2010, 17:45:20:

>>    I'm digging through the GEOM/IO code and cannot find the place where
>>  requests from userland to read more than MAXPHYS bytes are split
>>  into several "struct bio"s.
>>       It seems that these child requests are issued one-by-one, not in
>>  parallel -- am I right? Why? That breaks parallelism when the
>>  underlying GEOM can process several requests simultaneously.
> AFAIK, requests from userland are first broken into MAXPHYS-sized pieces
> by physio() before entering GEOM. Requests are indeed serialized there, I
> suppose to limit the KVA a thread can harvest, but IMHO it could be
> reconsidered.
  It is a good idea -- maybe add a GEOM flag for this? For example, any
  stripe/graid3/graid5 code could process a series of reads much faster
  than sequentially, if userland wants to read big blocks, bigger than
  the stripe size. And a small stripe size is a bad idea due to the high
  fixed cost of each transaction. Right now, when an application reads
  files on RAID5 with big blocks (say, read() is called with a 1 MB
  buffer), the RAID5 GEOM sees 128 KB read requests, one by one. And
  with a stripe size of 128 KB, it performs like a single disk :(  I can
  add pre-reading for full-sized reads, but that is not a generic
  solution; sending the BIOs from one (logical/userland) read/write
  request without awaiting their completion is the generic solution.

> One more split happens (when needed) in the geom_disk module to honor
> the disk driver's maximal I/O size. There is no serialization there.
> Most ATA/SATA drivers in 8-STABLE support I/O of at least
> min(512K, MAXPHYS), i.e. 128K by default. Many SCSI drivers are still
> limited by DFLTPHYS, i.e. 64K.
  Yep, that is what I saw in my investigation.

-- 
// Black Lion AKA Lev Serebryakov <lev@serebryakov.spb.ru>
