Date: Sat, 17 Oct 2015 00:54:52 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Warner Losh <imp@bsdimp.com>
Cc: Warner Losh <imp@FreeBSD.org>, src-committers <src-committers@freebsd.org>,
    svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r289405 - head/sys/ufs/ffs
Message-ID: <20151016215452.GQ6469@zxy.spb.ru>
In-Reply-To: <4FC55895-99AF-4E5B-9E1B-C5085F3FC178@bsdimp.com>
References: <201510160306.t9G3622O049128@repo.freebsd.org>
 <20151016131940.GE42243@zxy.spb.ru>
 <3ADA7934-3EE1-449E-A8D1-723B73020C13@bsdimp.com>
 <20151016201850.GP6469@zxy.spb.ru>
 <4FC55895-99AF-4E5B-9E1B-C5085F3FC178@bsdimp.com>
On Fri, Oct 16, 2015 at 03:00:50PM -0600, Warner Losh wrote:
> >>>> Do not relocate extents to make them contiguous if the underlying
> >>>> drive can do deletions. Ability to do deletions is a strong
> >>>> indication that this optimization will not help performance. It will
> >>>> only generate extra write traffic. These devices are typically flash
> >>>> based and have a limited number of write cycles. In addition, making
> >>>> the file contiguous in LBA space doesn't improve the access times
> >>>> from flash devices because they have no seek time.
> >>>
> >>> In reality, flash devices have seek time, about 0.1ms. Many flash
> >>> devices can do 8 simultaneous "seeks" (I think NVMe can do more).
> >>
> >> That's just not true. tREAD for most flash is a few tens of
> >> microseconds. The streaming time is at most 10 microseconds. There's
> >> no "seek" time in the classic sense. Once you get the data, you have
> >> it. There's no extra "read time" in the NAND flash parts.
> >>
> >> And the number of simultaneous reads depends a lot on how the flash
> >> vendor organized the flash. Many of today's designs use 8 or 16 die
> >> parts that have 2 to 4 planes on them, giving parallelism in the
> >> 16-64 range. And that's before we get into innovative strategies that
> >> use partial page reads to decrease tREAD time and novel data striping
> >> methods.
> >>
> >> Seek time, as a separate operation, simply doesn't exist.
> >>
> >> Furthermore, NAND-based devices are log-structured with garbage
> >> collection for both retention and to deal with retired blocks in the
> >> underlying NAND. The relationship between LBA ranges and where the
> >> data is at any given time on the NAND is almost uncorrelated.
> >>
> >> So, rearranging data so that it is in LBA contiguous ranges doesn't
> >> help once you're above the FFS block level.
> >
> > A stream of random 512-4096 byte reads from most flash SATA drives in
> > a single thread gives about 10K IOPS. That is only 40 Mbit/s out of
> > the 6*0.8 Gbit/s SATA bandwidth. You may decompose the 0.1ms into
> > distinct real delays (bank select, command processing, etc.), or
> > treat it as a 0.1ms seek time for all practical purposes.
>
> I strongly disagree. That's not seek time in the classic sense. All of
> those 100us are the delay from reading the data from the flash. The
> reason I'm so adamant is that adjacent page reads have exactly the same
> cost. In a spinning disk, adjacent sector reads have a tiny cost
> compared to moving the head (seeking).
>
> Then again, I spent almost three years building a PCIe NAND-based flash
> drive, so maybe I'm biased by that experience...

From an internal view you are right. From an external view, this delay
behaves like a seek time.

For a typical HDD, total_time = seek_time + transfer_time. seek_time is
independent of block_size; transfer_time depends on block_size and is
block_size/transfer_speed.

http://tweakers.net/benchdb/testcombo/3817
https://docs.google.com/spreadsheets/d/1BJ-XY0xwc1JvHJxfOGbcW2Be-q5GQbATcMOh9W0d1-U/edit?usp=sharing

These measurements are closely approximated by 0.11 + block_size/461456.
Just as with an HDD. Yes, there is no real seek, but in this model the
constant term behaves like seek time, regardless of its real cause.
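A minimal sketch of that fitted latency model, assuming the constants are in milliseconds and bytes per millisecond (the thread does not state the units, so that is my reading of the fit):

```python
def access_time_ms(block_size, seek_ms=0.11, bytes_per_ms=461456):
    """Effective per-request latency: a fixed 'seek-like' delay plus
    a transfer term proportional to the block size, per the fit above."""
    return seek_ms + block_size / bytes_per_ms

def single_thread_iops(block_size):
    """Requests per second one queue-depth-1 stream can sustain:
    the reciprocal of the per-request latency (1000 ms per second)."""
    return 1000.0 / access_time_ms(block_size)

if __name__ == "__main__":
    for bs in (512, 4096, 65536):
        iops = single_thread_iops(bs)
        mbit = iops * bs * 8 / 1e6  # resulting bandwidth in Mbit/s
        print(f"{bs:6d} B: {access_time_ms(bs):.3f} ms, "
              f"{iops:8.0f} IOPS, {mbit:7.1f} Mbit/s")
```

For 512-byte blocks the model predicts roughly 9K IOPS (about 37 Mbit/s), consistent with the ~10K IOPS / 40 Mbit/s figures quoted above; the constant 0.11 ms term dominates until block sizes reach tens of kilobytes.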