Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 6 Mar 2013 16:53:37 +0400
From:      Lev Serebryakov <lev@FreeBSD.org>
To:        Don Lewis <truckman@FreeBSD.org>
Cc:        freebsd-fs@FreeBSD.org, ivoras@FreeBSD.org, freebsd-geom@FreeBSD.org
Subject:   Re: Unexpected SU+J inconsistency AGAIN -- please, don't shift topic to ZFS!
Message-ID:  <1402477662.20130306165337@serebryakov.spb.ru>
In-Reply-To: <201303061001.r26A18n3015414@gw.catspoiler.org>
References:  <1198028260.20130306124139@serebryakov.spb.ru> <201303061001.r26A18n3015414@gw.catspoiler.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Hello, Don.
You wrote 6 =D0=BC=D0=B0=D1=80=D1=82=D0=B0 2013 =D0=B3., 14:01:08:

>> DL> With NCQ or TCQ, the drive can have a sizeable number of writes
>> DL> internally queued and it is free to reorder them as it pleases even =
with
>> DL> write caching disabled, but if write caching is disabled it has to d=
elay
>> DL> the notification of their completion until the data is on the platte=
rs
>> DL> so that UFS+SU can enforce the proper dependency ordering.
>>   But, again, performance would be terrible :( I've checked it. On
>>  very sparse multi-threaded patterns (multiple torrents download on
>>  fast channel in my simple home case, and, I think, things could be
>>  worse in case of big file server in organization) and "simple" SATA
>>  drives it significant worse in my experience :(

DL> I'm surprised that a typical drive would have enough onboard cache for
DL> write caching to help signficantly in that situation.  Is the torrent
   It is 5x64MiB in my case, oh, effectively, 4x64MiB :)
   Really, I could repeat experiment with some predictable and
  repeatable benchmark. What in out ports could be used for
  massively-parallel (16+ files) random (with blocks like 64KiB and
  file sizes like 2+GiB) but "repeatable" benchmark?

DL> software doing a lot of fsync() calls?  Those would essentially turn
  Nope. It trys to avoid fsync(), of course

DL> Creating a file by writing it in random order is fairly expensive.  Each
DL> time a new block is written by the application, UFS+SU has to first find
DL> a free block by searching the block bitmaps, mark that block as
DL> allocated, wait for that write of the bitmap block to complete, write
DL> the data to that block, wait for that to complete, and then write the
DL> block pointer to the inode or an indirect block.  Because of the random
DL> write ordering, there is probably not enough locality to do coalesce
DL> multiple updates to the bitmap and indirect blocks into one write before
DL> the syncer interval expires.  These operations all happen in the
DL> background after the write() call, but once you hit the I/O per second
DL> limit of the drive, eventually enough backlog builds to stall the
DL> application.  Also, if another update needs to be done to a block that
DL> the syncer has queued for writing, that may also cause a stall until the
DL> write completes.  If you hack the torrent software to create and
DL> pre-zero each file before it starts downloading it, then each bitmap and
DL> indirect block will probably only get written once during that operation
DL> and won't get written again during the actual download, and zeroing the
DL> data blocks will be sequential and fast. During the download, the only
DL> writes will be to the data blocks, so you might see something like a 3x
DL> performance improvement.
   My client (transmission, from ports) is configured to do "real
  preallocation" (not sparse one), but it doesn't help much. It surely
  limited by disk I/O :(
    But anyway, torrent client is bad benchmark if we start to speak
  about some real experiments to decide what could be improved in
  FFS/GEOM stack, as it is not very repeatable.


--=20
// Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1402477662.20130306165337>