Date:      Sat, 30 Apr 2011 10:52:58 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: TRIM clustering
Message-ID:  <4DBBBFDA.8040908@FreeBSD.org>
In-Reply-To: <20110430072831.GA65598@icarus.home.lan>
References:  <4DBBB20A.5050102@FreeBSD.org> <20110430072831.GA65598@icarus.home.lan>

Jeremy Chadwick wrote:
> On Sat, Apr 30, 2011 at 09:54:02AM +0300, Alexander Motin wrote:
>> I've noticed that on file deletion from UFS with TRIM enabled, the kernel
>> issues a BIO_DELETE for each 16K block (the block size?) separately --
>> thousands per second for a single big file deletion. Fortunately the ada
>> driver will try to aggregate them for the device, but wouldn't some
>> clustering code be worthwhile there?
> 
> I'd like to know who decided it would be best to submit the TRIM command
> automatically on every single block that is deemed free by UFS during
> inode removal.  From what I've been reading, the performance hit from
> doing this is quite severe.  Many SSDs take hundreds of milliseconds to
> complete TRIM operations, which greatly impacts filesystem performance.
> I appreciate the efforts to get TRIM into FreeBSD for UFS, but the
> implementation -- if what Alexander says is accurate -- seems like a bad
> choice.

There is special code in the ada driver that groups multiple ranges into a
single disk command to address that hundreds-of-milliseconds delay, so it
is not such a major problem. But 25K BIO_DELETEs per second flowing
through GEOM is not very good for the system, and 25K ranges could still
be harder for the disk to handle. And while ATA provides a way to delete
multiple ranges with one command, I am not sure SCSI or proprietary
drivers can do the same.
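
For reference, here is a minimal sketch of the kind of range packing such a
driver has to do when it turns many small deletes into one ATA DATA SET
MANAGEMENT (TRIM) command. The entry layout follows the ATA spec (each 8-byte
entry holds a 48-bit starting LBA plus a 16-bit sector count, 64 entries per
512-byte payload block); the function and variable names are illustrative
only, not the actual sys/cam/ata code:

#include <stdint.h>
#include <stddef.h>

#define TRIM_ENTRY_SIZE    8       /* 48-bit LBA + 16-bit sector count     */
#define TRIM_MAX_SECTORS   0xffff  /* most sectors one entry can describe  */
#define TRIM_ENTRIES_BLOCK 64      /* 8-byte entries per 512-byte block    */

/*
 * Pack one LBA range into consecutive TRIM entries, splitting it when it
 * exceeds 65535 sectors.  Returns the number of entries written, or -1 if
 * the payload buffer is full.  'buf' is the data block(s) sent with the
 * DATA SET MANAGEMENT command; 'idx' is the next free entry slot.
 */
static int
trim_pack_range(uint8_t *buf, int idx, int max_entries,
    uint64_t lba, uint64_t count)
{
        int used = 0;

        while (count > 0) {
                uint64_t chunk = count > TRIM_MAX_SECTORS ?
                    TRIM_MAX_SECTORS : count;
                uint8_t *e;

                if (idx + used >= max_entries)
                        return (-1);
                e = buf + (size_t)(idx + used) * TRIM_ENTRY_SIZE;
                /* Bytes 0-5: starting LBA, little-endian. */
                e[0] = lba & 0xff;
                e[1] = (lba >> 8) & 0xff;
                e[2] = (lba >> 16) & 0xff;
                e[3] = (lba >> 24) & 0xff;
                e[4] = (lba >> 32) & 0xff;
                e[5] = (lba >> 40) & 0xff;
                /* Bytes 6-7: sector count, little-endian. */
                e[6] = chunk & 0xff;
                e[7] = (chunk >> 8) & 0xff;

                lba += chunk;
                count -= chunk;
                used++;
        }
        return (used);
}

Even with this packing in the driver, pushing 25K individual BIO_DELETEs
through GEOM just to have them repacked here is the overhead being discussed
above.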

> Solutions as I see them:
> 
> a) Provide appropriate UFS framework to obtain a list of freed blocks (I
> do not know much about UFS under the hood so I don't know how to
> accomplish this), and let a userspace daemon issue the appropriate
> commands to the underlying ATA/CAM layer, providing a list (more
> importantly, a range of) LBAs to initiate TRIM for.  The daemon could run
> at some particular interval (controlled by the user of course), or wait
> until a set of required criteria is met before actually doing it.
> 
> b) periodic(8) script (relying on appropriate ways of getting freed
> blocks) which could run weekly.  Maybe the TRIM-issuing piece could be
> implemented in both atacontrol(8) and camcontrol(8)?
> 
> c) Don't want it in userspace?  Okay, make it some kind of kernel
> thread.  It still needs to be configurable, probably through sysctl.  It
> should also provide some form of accounting details -- how many LBAs it
> has freed, as well as how many times TRIM itself has been run (these are
> two separate metrics); see the counter sketch after this quoted list.
> 
> d) Look at how Linux and/or Windows 7 does this.  I believe Linux
> doesn't do it automatically at all, but instead provides necessary
> frameworks within libata and their SCSI layer to offer the capability.
> There was a script circulating within the Linux community called "wiper.sh"
> which required use of a very new version of hdparm(8) that would find
> freed blocks on ext3 (I think?) and issue hdparm commands to induce
> TRIM on sets of LBAs.  ext4 seems to offer some sort of "support" for
> this but only when the filesystem is mounted with an option called
> "discard" (and specifying that mount option is a manual process).
> 
> Catches: whatever method is used needs to be able to handle the situation
> where a device is added on-the-fly (e.g. hot-swap insertion of a new
> disk), so for TRIM capability identification, probing the devices listed
> in kern.disks via the appropriate ioctls would be ideal.
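
Option (c) above asks for accounting of the TRIM work done. As a rough sketch
only -- the sysctl node and counter names below are hypothetical, nothing like
this exists in the tree -- per-system counters could be exported like this:

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

/* Hypothetical counters: total LBAs freed and TRIM commands issued. */
static unsigned long trim_lbas_freed;
static unsigned long trim_commands_issued;

SYSCTL_NODE(_kern, OID_AUTO, trim_stats, CTLFLAG_RD, 0,
    "TRIM accounting (illustrative sketch)");
SYSCTL_ULONG(_kern_trim_stats, OID_AUTO, lbas_freed, CTLFLAG_RD,
    &trim_lbas_freed, 0, "LBAs handed down via BIO_DELETE");
SYSCTL_ULONG(_kern_trim_stats, OID_AUTO, commands_issued, CTLFLAG_RD,
    &trim_commands_issued, 0, "TRIM commands actually sent to devices");

The delete path would then bump trim_lbas_freed by the size of each range and
trim_commands_issued once per command, giving the two separate metrics
mentioned above.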

I don't think user level can or should do anything here for live file
systems; otherwise it would require the FS code to report block reuse to
it and to wait for a running TRIM to complete. Bad choice. The existing
implementation, integrated into SU (as I understand it), is IMHO fine,
except for the lack of clustering. If you want to do it offline, there is
a new -E option just added to fsck_ffs.
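
To make the "lack of clustering" point concrete: a coalescing pass would only
need to sort the pending freed extents and merge the ones that touch, so one
BIO_DELETE covers what would otherwise be many 16K ones. A purely illustrative
userland-style sketch (the struct and function are mine, not the SU or GEOM
code):

#include <sys/types.h>
#include <stdlib.h>

/* Illustrative extent type; not the real struct bio. */
struct del_range {
        off_t   start;          /* byte offset of freed extent */
        off_t   length;         /* length in bytes             */
};

static int
range_cmp(const void *a, const void *b)
{
        const struct del_range *ra = a, *rb = b;

        return (ra->start > rb->start) - (ra->start < rb->start);
}

/*
 * Sort the pending delete ranges and merge any that touch or overlap.
 * Returns the new number of ranges; the merged result is left in place.
 */
static size_t
coalesce_deletes(struct del_range *r, size_t n)
{
        size_t i, out;

        if (n == 0)
                return (0);
        qsort(r, n, sizeof(r[0]), range_cmp);
        out = 0;
        for (i = 1; i < n; i++) {
                if (r[i].start <= r[out].start + r[out].length) {
                        /* Adjacent or overlapping: extend the current run. */
                        off_t end = r[i].start + r[i].length;
                        if (end > r[out].start + r[out].length)
                                r[out].length = end - r[out].start;
                } else {
                        r[++out] = r[i];
                }
        }
        return (out + 1);
}

Run over the extents freed by one big unlink, something like this would hand
GEOM a few large ranges instead of thousands of 16K ones.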

-- 
Alexander Motin


