Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 9 Dec 2010 12:34:11 +0200
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Kirk McKusick <mckusick@mckusick.com>
Cc:        freebsd-fs@freebsd.org, pjd@freebsd.org, Oliver Fromme <olli@lurza.secnetix.de>
Subject:   Re: TRIM support for UFS?
Message-ID:  <20101209103411.GL33073@deviant.kiev.zoral.com.ua>
In-Reply-To: <201012082356.oB8NuxwX040914@chez.mckusick.com>
References:  <20101208225352.GK33073@deviant.kiev.zoral.com.ua> <201012082356.oB8NuxwX040914@chez.mckusick.com>

Next in thread | Previous in thread | Raw E-Mail | Index | Archive | Help

--wA9WyeW1yVBM2Q32
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Dec 08, 2010 at 03:56:59PM -0800, Kirk McKusick wrote:
> > Date: Thu, 9 Dec 2010 00:53:52 +0200
> > From: Kostik Belousov <kostikbel@gmail.com>
> > To: Kirk McKusick <mckusick@mckusick.com>
> > Cc: Julian Elischer <julian@freebsd.org>, freebsd-fs@freebsd.org,
> >         pjd@freebsd.org, Oliver Fromme <olli@lurza.secnetix.de>
> > Subject: Re: TRIM support for UFS?
> >=20
> > On Wed, Dec 08, 2010 at 02:23:52PM -0800, Kirk McKusick wrote:
> > >
> > > It is safe to do the BIO_DELETE here because at this point we have
> > > progressed through the file delete to the point where the inode that
> > > claimed this block has been updated on the disk with a NULL pointer.
> >=20
> > Thanks for the explanation. On the other hand, can my scenario be real
> > for async mounts ?
>=20
> Async mounts are not compatible with SU or SUJ. You are running the
> filesystem in a mode where there is no guarantee of recovery after
> a crash. So yes, you may TRIM blocks that are still claimed by on-disk
> inodes. But that is likely the least of your worries. Since async makes
> no claims about filesystem consistency after a crash, adding one more
> way for it to break does not seem like a big deal :-)
>=20
> > > The above patch looks good though I would do it unconditionally
> > > (e.g., also for snapshots). It seems sensible to make it conditional
> > > on a sysctl variable so that folks could experiment with it more
> > > easily. And I would leave it off by default as non-SSD disks are
> > > unlikely to benefit from it. If it does prove useful for SSD disks
> > > then I would make it conditional on a filesystem flag so that it
> > > is possible to enable it on a filesystem-by-filesystem basis (e.g.,
> > > on for filesystems on the SSD and off for the rest).
> >=20
> > I think it is better to have a flag in superblock instead of the mount
> > option.
> >
> > Most of diff deleted...
> >=20
> > diff --git a/sys/ufs/ffs/fs.h b/sys/ufs/ffs/fs.h
> > index ba98ed3..13d9ede 100644
> > --- a/sys/ufs/ffs/fs.h
> > +++ b/sys/ufs/ffs/fs.h
> > @@ -417,6 +417,7 @@ CTASSERT(sizeof(struct fs) =3D=3D 1376);
> >  #define FS_FLAGS_UPDATED 0x0080 /* flags have been moved to new locati=
on */
> >  #define FS_NFS4ACLS	0x0100	/* file system has NFSv4 ACLs enabled */
> >  #define FS_INDEXDIRS	0x0200	/* kernel supports indexed directories */
> > +#define FS_TRIM 	0x0400	/* issue BIO_DELETE for deleted blocks */
> >=20
> >  /*
> >   * Macros to access bits in the fs_active array.
>=20
> I approve of this change and am happy to see it (finally) get added.
>=20
> > On Wed, 8 Dec 2010 at 23:33:27 +0100, Pawel Jakub Dawidek wrote:
> >> On Wed, Dec 08, 2010 at 02:23:52PM -0800, Kirk McKusick wrote:
> >> The above patch looks good though I would do it unconditionally
> >> (e.g., also for snapshots). It seems sensible to make it conditional
> >> on a sysctl variable so that folks could experiment with it more
> >> easily. And I would leave it off by default as non-SSD disks are
> >> unlikely to benefit from it. If it does prove useful for SSD disks
> >> then I would make it conditional on a filesystem flag so that it
> >> is possible to enable it on a filesystem-by-filesystem basis (e.g.,
> >> on for filesystems on the SSD and off for the rest).
> >=20
> > We already have a flag for disks: DISKFLAG_CANDELETE, which tells if the
> > disk support TRIM or not. Next we should add BIO_GETATTR attribute for
> > DISK class to return true if TRIM is supported. This way UFS can ask if
> > TRIM is supported on mount and don't bother sending BIO_DELETE if it is
> > not supported.
>=20
> Clearly asking the underlying device is the most automated way to decide
> whether to do TRIM. Once it is possible to ask the underlying disk if it
> supports TRIM, your change could be modified to do the querry to the disk
> and set the FS_TRIM flag accordingly each time that the filesystem is
> mounted. If it is easy to add a BIO_GETATTR attribute for the DISK class,
> then we can skip the intermediate stage of using tunefs to enable/disable=
 it.

I still think that TRIM should be opt-in, at least for the first time.
Below is the BIO_GETATTR change integrated.

Now, both FS_TRIM should be set and device shall report the
GEOM::candelete attribute as true to have BIO_DELETE issued. geom_disk
and md(4) report GEOM::candelete. Also, I updated tunefs(8) manpage,
missed in the previous version.

diff --git a/sbin/dumpfs/dumpfs.c b/sbin/dumpfs/dumpfs.c
index 38c05f6..fa562dc 100644
--- a/sbin/dumpfs/dumpfs.c
+++ b/sbin/dumpfs/dumpfs.c
@@ -253,9 +253,11 @@ dumpfs(const char *name)
 		printf("fs_flags expanded ");
 	if (fsflags & FS_NFS4ACLS)
 		printf("nfsv4acls ");
+	if (fsflags & FS_TRIM)
+		printf("trim ");
 	fsflags &=3D ~(FS_UNCLEAN | FS_DOSOFTDEP | FS_NEEDSFSCK | FS_INDEXDIRS |
 		     FS_ACLS | FS_MULTILABEL | FS_GJOURNAL | FS_FLAGS_UPDATED |
-		     FS_NFS4ACLS | FS_SUJ);
+		     FS_NFS4ACLS | FS_SUJ | FS_TRIM);
 	if (fsflags !=3D 0)
 		printf("unknown flags (%#x)", fsflags);
 	putchar('\n');
diff --git a/sbin/tunefs/tunefs.8 b/sbin/tunefs/tunefs.8
index a883cd4..c62af94 100644
--- a/sbin/tunefs/tunefs.8
+++ b/sbin/tunefs/tunefs.8
@@ -28,7 +28,7 @@
 .\"     @(#)tunefs.8	8.2 (Berkeley) 12/11/93
 .\" $FreeBSD$
 .\"
-.Dd March 6, 2010
+.Dd December 9, 2010
 .Dt TUNEFS 8
 .Os
 .Sh NAME
@@ -51,6 +51,7 @@
 .Op Fl p
 .Op Fl s Ar avgfpdir
 .Op Fl S Ar size
+.Op Fl t Cm enable | disable
 .Ar special | filesystem
 .Sh DESCRIPTION
 The
@@ -143,6 +144,10 @@ Specify the expected number of files per directory.
 .It Fl S Ar size
 Specify the softdep journal size in bytes.
 The minimum is 4M.
+.It Fl t Cm enable | disable
+Turn on/off the TRIM enable flag.
+If enabled, and if the underlaying device supports BIO_DELETE
+command, kernel will delete the freed blocks.
 .El
 .Pp
 At least one of the above flags is required.
diff --git a/sbin/tunefs/tunefs.c b/sbin/tunefs/tunefs.c
index b2ea602..0ed713d 100644
--- a/sbin/tunefs/tunefs.c
+++ b/sbin/tunefs/tunefs.c
@@ -82,11 +82,13 @@ int
 main(int argc, char *argv[])
 {
 	char *avalue, *jvalue, *Jvalue, *Lvalue, *lvalue, *Nvalue, *nvalue;
+	char *tvalue;
 	const char *special, *on;
 	const char *name;
 	int active;
 	int Aflag, aflag, eflag, evalue, fflag, fvalue, jflag, Jflag, Lflag;
 	int lflag, mflag, mvalue, Nflag, nflag, oflag, ovalue, pflag, sflag;
+	int tflag;
 	int svalue, Sflag, Svalue;
 	int ch, found_arg, i;
 	const char *chg[2];
@@ -101,7 +103,8 @@ main(int argc, char *argv[])
 	evalue =3D fvalue =3D mvalue =3D ovalue =3D svalue =3D Svalue =3D 0;
 	active =3D 0;
 	found_arg =3D 0;		/* At least one arg is required. */
-	while ((ch =3D getopt(argc, argv, "Aa:e:f:j:J:L:l:m:N:n:o:ps:S:")) !=3D -=
1)
+	while ((ch =3D getopt(argc, argv, "Aa:e:f:j:J:L:l:m:N:n:o:ps:S:t:"))
+	    !=3D -1)
 		switch (ch) {
=20
 		case 'A':
@@ -268,6 +271,18 @@ main(int argc, char *argv[])
 			Sflag =3D 1;
 			break;
=20
+		case 't':
+			found_arg =3D 1;
+			name =3D "trim";
+			tvalue =3D optarg;
+			if (strcmp(tvalue, "enable") !=3D 0 &&
+			    strcmp(tvalue, "disable") !=3D 0) {
+				errx(10, "bad %s (options are %s)",
+				    name, "`enable' or `disable'");
+			}
+			tflag =3D 1;
+			break;
+
 		default:
 			usage();
 		}
@@ -493,6 +508,24 @@ main(int argc, char *argv[])
 			sblock.fs_avgfpdir =3D svalue;
 		}
 	}
+	if (tflag) {
+		name =3D "issue TRIM to the disk";
+ 		if (strcmp(tvalue, "enable") =3D=3D 0) {
+			if (sblock.fs_flags & FS_TRIM)
+				warnx("%s remains unchanged as enabled", name);
+			else {
+ 				sblock.fs_flags |=3D FS_TRIM;
+ 				warnx("%s set", name);
+			}
+ 		} else if (strcmp(tvalue, "disable") =3D=3D 0) {
+			if ((~sblock.fs_flags & FS_TRIM) =3D=3D FS_TRIM)
+				warnx("%s remains unchanged as disabled", name);
+			else {
+ 				sblock.fs_flags &=3D ~FS_TRIM;
+ 				warnx("%s cleared", name);
+			}
+ 		}
+	}
=20
 	if (sbwrite(&disk, Aflag) =3D=3D -1)
 		goto err;
@@ -1011,12 +1044,13 @@ out:
 void
 usage(void)
 {
-	fprintf(stderr, "%s\n%s\n%s\n%s\n%s\n",
+	fprintf(stderr, "%s\n%s\n%s\n%s\n%s\n%s\n",
 "usage: tunefs [-A] [-a enable | disable] [-e maxbpg] [-f avgfilesize]",
 "              [-J enable | disable] [-j enable | disable]",=20
 "              [-L volname] [-l enable | disable] [-m minfree]",
 "              [-N enable | disable] [-n enable | disable]",
-"              [-o space | time] [-p] [-s avgfpdir] special | filesystem");
+"              [-o space | time] [-p] [-s avgfpdir] [-t enable | disable]",
+"              special | filesystem");
 	exit(2);
 }
=20
@@ -1035,6 +1069,8 @@ printfs(void)
 		(sblock.fs_flags & FS_SUJ)? "enabled" : "disabled");
 	warnx("gjournal: (-J)                                     %s",
 		(sblock.fs_flags & FS_GJOURNAL)? "enabled" : "disabled");
+	warnx("trim: (-p)                                         %s",=20
+		(sblock.fs_flags & FS_TRIM)? "enabled" : "disabled");
 	warnx("maximum blocks per file in a cylinder group: (-e)  %d",
 	      sblock.fs_maxbpg);
 	warnx("average file size: (-f)                            %d",
diff --git a/sys/dev/md/md.c b/sys/dev/md/md.c
index 6c3484e..c294d1b 100644
--- a/sys/dev/md/md.c
+++ b/sys/dev/md/md.c
@@ -713,11 +713,12 @@ md_kthread(void *arg)
 		}
 		mtx_unlock(&sc->queue_mtx);
 		if (bp->bio_cmd =3D=3D BIO_GETATTR) {
-			if (sc->fwsectors && sc->fwheads &&
+			if ((sc->fwsectors && sc->fwheads &&
 			    (g_handleattr_int(bp, "GEOM::fwsectors",
 			    sc->fwsectors) ||
 			    g_handleattr_int(bp, "GEOM::fwheads",
-			    sc->fwheads)))
+			    sc->fwheads))) ||
+			    g_handleattr_int(bp, "GEOM::candelete", 1))
 				error =3D -1;
 			else
 				error =3D EOPNOTSUPP;
diff --git a/sys/geom/geom_disk.c b/sys/geom/geom_disk.c
index 25d2e3b..eda702c 100644
--- a/sys/geom/geom_disk.c
+++ b/sys/geom/geom_disk.c
@@ -297,6 +297,9 @@ g_disk_start(struct bio *bp)
 		} while (bp2 !=3D NULL);
 		break;
 	case BIO_GETATTR:
+		if (g_handleattr_int(bp, "GEOM::candelete",
+		    (dp->d_flags & DISKFLAG_CANDELETE) !=3D 0))
+			break;
 		if (g_handleattr_int(bp, "GEOM::fwsectors", dp->d_fwsectors))
 			break;
 		else if (g_handleattr_int(bp, "GEOM::fwheads", dp->d_fwheads))
diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c
index b740bbb..926229c 100644
--- a/sys/ufs/ffs/ffs_alloc.c
+++ b/sys/ufs/ffs/ffs_alloc.c
@@ -83,6 +83,8 @@ __FBSDID("$FreeBSD$");
=20
 #include <security/audit/audit.h>
=20
+#include <geom/geom.h>
+
 #include <ufs/ufs/dir.h>
 #include <ufs/ufs/extattr.h>
 #include <ufs/ufs/quota.h>
@@ -1850,6 +1852,7 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd)
 	u_int cg;
 	u_int8_t *blksfree;
 	struct cdev *dev;
+	struct bio *bip;
=20
 	cg =3D dtog(fs, bno);
 	if (devvp->v_type =3D=3D VREG) {
@@ -1962,6 +1965,16 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd)
 		softdep_setup_blkfree(UFSTOVFS(ump), bp, bno,
 		    numfrags(fs, size), dephd);
 	bdwrite(bp);
+
+	if (ump->um_candelete) {
+		bip =3D g_alloc_bio();
+		bip->bio_cmd =3D BIO_DELETE;
+		bip->bio_offset =3D dbtob(fsbtodb(fs, bno));
+		bip->bio_done =3D g_destroy_bio;
+		bip->bio_length =3D size;
+		g_io_request(bip,
+		    (struct g_consumer *)devvp->v_bufobj.bo_private);
+	}
 }
=20
 #ifdef INVARIANTS
diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
index 72f40da..f578382 100644
--- a/sys/ufs/ffs/ffs_vfsops.c
+++ b/sys/ufs/ffs/ffs_vfsops.c
@@ -895,6 +895,21 @@ ffs_mountfs(devvp, mp, td)
 		    mp->mnt_stat.f_mntonname);
 #endif
 	}
+	if ((fs->fs_flags & FS_TRIM) !=3D 0) {
+		size =3D sizeof(int);
+		if (g_io_getattr("GEOM::candelete", cp, &size,
+		    &ump->um_candelete) =3D=3D 0) {
+			if (!ump->um_candelete)
+				printf(
+"WARNING: %s: TRIM flag on fs but disk does not support TRIM\n",
+				    mp->mnt_stat.f_mntonname);
+		} else {
+			printf(
+"WARNING: %s: TRIM flag on fs but cannot get whether disk supports TRIM\n",
+			    mp->mnt_stat.f_mntonname);
+			ump->um_candelete =3D 0;
+		}
+	}
=20
 	ump->um_mountp =3D mp;
 	ump->um_dev =3D dev;
diff --git a/sys/ufs/ffs/fs.h b/sys/ufs/ffs/fs.h
index ba98ed3..13d9ede 100644
--- a/sys/ufs/ffs/fs.h
+++ b/sys/ufs/ffs/fs.h
@@ -417,6 +417,7 @@ CTASSERT(sizeof(struct fs) =3D=3D 1376);
 #define FS_FLAGS_UPDATED 0x0080	/* flags have been moved to new location */
 #define FS_NFS4ACLS	0x0100	/* file system has NFSv4 ACLs enabled */
 #define FS_INDEXDIRS	0x0200	/* kernel supports indexed directories */
+#define	FS_TRIM		0x0400	/* issue BIO_DELETE for deleted blocks */
=20
 /*
  * Macros to access bits in the fs_active array.
diff --git a/sys/ufs/ufs/ufsmount.h b/sys/ufs/ufs/ufsmount.h
index 03edd73..b13db40 100644
--- a/sys/ufs/ufs/ufsmount.h
+++ b/sys/ufs/ufs/ufsmount.h
@@ -95,6 +95,7 @@ struct ufsmount {
 	time_t	um_itime[MAXQUOTAS];		/* inode quota time limit */
 	char	um_qflags[MAXQUOTAS];		/* quota specific flags */
 	int64_t	um_savedmaxfilesize;		/* XXX - limit maxfilesize */
+	int	um_candelete;			/* devvp supports TRIM */
 	int	(*um_balloc)(struct vnode *, off_t, int, struct ucred *, int, struct =
buf **);
 	int	(*um_blkatoff)(struct vnode *, off_t, char **, struct buf **);
 	int	(*um_truncate)(struct vnode *, off_t, int, struct ucred *, struct thr=
ead *);

--wA9WyeW1yVBM2Q32
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk0AsKMACgkQC3+MBN1Mb4gTWgCgr4ImFVQhYac2hVOHZOnTneXv
VR0AoKXYQX99T79attXilSYn6/kjX86+
=RLhi
-----END PGP SIGNATURE-----

--wA9WyeW1yVBM2Q32--



Want to link to this message? Use this URL: <http://docs.FreeBSD.org/cgi/mid.cgi?20101209103411.GL33073>