Date:      Wed, 6 Mar 2013 02:01:08 -0800 (PST)
From:      Don Lewis <truckman@FreeBSD.org>
To:        lev@FreeBSD.org
Cc:        freebsd-fs@FreeBSD.org, ivoras@FreeBSD.org, freebsd-geom@FreeBSD.org
Subject:   Re: Unexpected SU+J inconsistency AGAIN -- please, don't shift topic to ZFS!
Message-ID:  <201303061001.r26A18n3015414@gw.catspoiler.org>
In-Reply-To: <1198028260.20130306124139@serebryakov.spb.ru>

On  6 Mar, Lev Serebryakov wrote:
 
> DL> With NCQ or TCQ, the drive can have a sizeable number of writes
> DL> internally queued and it is free to reorder them as it pleases even with
> DL> write caching disabled, but if write caching is disabled it has to delay
> DL> the notification of their completion until the data is on the platters
> DL> so that UFS+SU can enforce the proper dependency ordering.
>   But, again, performance would be terrible :( I've checked it. On
>  very sparse multi-threaded write patterns (multiple torrents
>  downloading on a fast channel in my simple home case, and I think
>  things could be worse for a big file server in an organization) and
>  "simple" SATA drives, it is significantly worse in my experience :(

I'm surprised that a typical drive would have enough onboard cache for
write caching to help significantly in that situation.  Is the torrent
software doing a lot of fsync() calls?  Those would essentially turn
into NOPs if write caching is enabled, but would stall the thread until
the data hits the platter if write caching is disabled.
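You could see the difference with a quick test like the one below (an
untested sketch; the file name and buffer size are arbitrary).  It
times a single fsync() after a small write.  With write caching
enabled it should return almost immediately; with caching disabled it
should take something on the order of a seek plus rotational delay:

    /* Sketch: measure how long fsync() stalls the calling thread. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        char buf[4096];
        memset(buf, 0, sizeof(buf));
        if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
            perror("write");
            return 1;
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (fsync(fd) < 0)              /* stalls until data is stable */
            perror("fsync");
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e6;
        printf("fsync took %.3f ms\n", ms);
        close(fd);
        return 0;
    }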

One limitation of NCQ is that it only supports 32 simultaneous commands.
With write caching enabled, you might be able to stuff more writes into
the drive's onboard memory so that it can do a better job of optimizing
the ordering and increase its number of I/Os per second, though I
wouldn't expect miracles.  A SAS drive and controller with TCQ would
support more simultaneous commands and might also perform better.
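If you want to experiment with keeping the drive's queue full from a
single thread, POSIX AIO is one way to have more than 32 writes in
flight at once.  This is only a sketch, assuming the aio(4) facility
is available; the file name, request count, and offsets are made up,
and O_SYNC is used so that completions actually reflect stable
storage rather than the buffer cache:

    /* Sketch: keep more writes in flight than the 32-tag NCQ limit. */
    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define NREQ  64        /* more requests than NCQ's 32 tags */
    #define BLKSZ 4096

    int main(void)
    {
        int fd = open("testfile", O_WRONLY | O_CREAT | O_SYNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        static struct aiocb cb[NREQ];
        for (int i = 0; i < NREQ; i++) {
            void *buf;
            if (posix_memalign(&buf, BLKSZ, BLKSZ) != 0)
                return 1;
            memset(buf, 0, BLKSZ);
            memset(&cb[i], 0, sizeof(cb[i]));
            cb[i].aio_fildes = fd;
            cb[i].aio_buf    = buf;
            cb[i].aio_nbytes = BLKSZ;
            /* scatter the offsets so the drive has writes to reorder */
            cb[i].aio_offset = (off_t)(random() % 1024) * BLKSZ;
            if (aio_write(&cb[i]) != 0) { perror("aio_write"); return 1; }
        }

        /* reap completions; aio_suspend + aio_return is plain POSIX */
        for (int i = 0; i < NREQ; i++) {
            const struct aiocb *list[1] = { &cb[i] };
            while (aio_error(&cb[i]) == EINPROGRESS)
                aio_suspend(list, 1, NULL);
            if (aio_return(&cb[i]) < 0)
                perror("aio_return");
        }
        close(fd);
        return 0;
    }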

Creating a file by writing it in random order is fairly expensive.  Each
time a new block is written by the application, UFS+SU has to first find
a free block by searching the block bitmaps, mark that block as
allocated, wait for that write of the bitmap block to complete, write
the data to that block, wait for that to complete, and then write the
block pointer to the inode or an indirect block.  Because of the random
write ordering, there is probably not enough locality to coalesce
multiple updates to the bitmap and indirect blocks into one write before
the syncer interval expires.  These operations all happen in the
background after the write() call, but once you hit the I/O per second
limit of the drive, eventually enough backlog builds to stall the
application.  Also, if another update needs to be done to a block that
the syncer has queued for writing, that may also cause a stall until the
write completes.  If you hack the torrent software to create and
pre-zero each file before it starts downloading it, then each bitmap and
indirect block will probably only get written once during that operation
and won't get written again during the actual download, and zeroing the
data blocks will be sequential and fast. During the download, the only
writes will be to the data blocks, so you might see something like a 3x
performance improvement.
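A sketch of the pre-zeroing idea is below (hypothetical names; real
torrent software would use the actual file length).  Note that the
zeroes have to actually be written: extending the file with
ftruncate() would just create a sparse file and leave all of the
bitmap and indirect block updates to happen during the download.

    /* Sketch: pre-zero a file sequentially so the bitmap and indirect
     * blocks are allocated and written once, up front. */
    #include <sys/types.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static int prezero(const char *path, off_t size)
    {
        int fd = open(path, O_WRONLY | O_CREAT, 0644);
        if (fd < 0)
            return -1;

        char buf[64 * 1024];
        memset(buf, 0, sizeof(buf));
        for (off_t off = 0; off < size; ) {
            size_t n = sizeof(buf);
            if ((off_t)n > size - off)
                n = (size_t)(size - off);
            ssize_t w = write(fd, buf, n);  /* sequential, so fast */
            if (w < 0) { close(fd); return -1; }
            off += w;
        }
        /* push the zeroes and the allocation metadata out once */
        if (fsync(fd) < 0) { close(fd); return -1; }
        return close(fd);
    }

    int main(void)
    {
        /* 16 MB is an arbitrary stand-in for the real file size */
        if (prezero("testfile", (off_t)16 * 1024 * 1024) != 0) {
            perror("prezero");
            return 1;
        }
        return 0;
    }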
