Date:      Mon, 2 Apr 2012 13:31:19 +0300
From:      Gleb Kurtsou <gleb.kurtsou@gmail.com>
To:        "O. Hartmann" <ohartman@zedat.fu-berlin.de>
Cc:        Current FreeBSD <freebsd-current@freebsd.org>, utisoft@gmail.com
Subject:   Re: Using TMPFS for /tmp and /var/run?
Message-ID:  <20120402103119.GA2389@reks>
In-Reply-To: <4F74BCD5.4040609@zedat.fu-berlin.de>
References:  <4F746F1E.6090702@mail.zedat.fu-berlin.de> <20120329161452.GZ1709@albert.catwhisker.org> <4F74BCD5.4040609@zedat.fu-berlin.de>


--MGYHOYXEY6WxJCY8
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

On (29/03/2012 21:49), O. Hartmann wrote:
> On 03/29/12 18:14, David Wolfskill wrote:
> > On Thu, Mar 29, 2012 at 04:18:06PM +0200, O. Hartmann wrote:
> >> I was wondering if there are some objections using TMPFS for /tmp and
> >> /var/run.
> >> ...
> >> My question is whether there are objections using TMPFS for both /tmp/
> >> and /var/run/ at this stage on FreeBSD 10.0-CURRENT/amd64?
> >> ....
> > 
> > I have no experience using tmpfs for /var/run, but I have been using it
> > for /tmp for some time (mostly in i386, though).
> > 
> > While I use it quite successfully on machines with a small number of
> > folks actively busy (e.g., my desktop, my laptop, my home machines), I
> > encountered some issues when I tried to do so on machines that were
> > intended for significantly "heavier" use.  Specifically:
> > 
> > * Compared to an md-resident /tmp, a tmpfs-resident /tmp has much less
> >   flexibility for specifying the size.  Per mdconfig(8), the former
> >   uses:
> > 
> >      -s size
> >              Size of the memory disk.  Size is the number of 512 byte sectors
> >              unless suffixed with a b, k, m, g, or t which denotes byte, kilo-
> >              byte, megabyte, gigabyte and terabyte respectively. Options -a
> >              and -t swap are implied if not specified.
> > 
> >   while the latter uses:
> > 
> >      size    Specifies the total file system size in bytes.  If zero (the
> >              default) or a value larger than SIZE_MAX - PAGE_SIZE is given,
> >              the available amount of memory (including main memory and swap
> >              space) will be used.
> > 
> >   In this configuration, I would have preferred to have specified
> >   about 10GB for /tmp.  I wouldn't mind if it spilled to swap space,
> >   but I certainly didn't want it using 10GB of RAM -- especially since
> >   the machines only had 6GB RAM.
> > 
> >   Nor did I especially want *all* of the swap space used for /tmp.  I
> >   would have allocated (say) 20GB for swap.  I wouldn't mind if half of
> >   that were used for /tmp -- but a reason I allocate so much swap is
> >   that I've seen what happens when a machine runs out of swap, and it
> >   wasn't pretty.
> > 
> > 
> >   In any case, effective maximum usable size for tmpfs involves SIZE_MAX
> >   (~4G) & PAGE_SIZE (4K, in my case).

size_t is 64-bit on 64-bit archs.
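
To check this on a given machine, here is a minimal userland sketch. It is
not part of the patch, and the helper names size_t_bits() and size_t_limit()
are made up for illustration:

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Width of size_t in bits on the platform this is compiled for. */
static unsigned
size_t_bits(void)
{
	return ((unsigned)(sizeof(size_t) * CHAR_BIT));
}

/* Largest value a size_t can hold, widened so it can be printed with %ju. */
static uintmax_t
size_t_limit(void)
{
	return ((uintmax_t)SIZE_MAX);
}
```

On i386 size_t_bits() should report 32, which is where the ~4G ceiling comes
from; on amd64 it should report 64, so SIZE_MAX is not the limiting factor
there.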

> > 
> > * Even when I went ahead and created a tmpfs for /tmp, I'd get ENOSPC
> >   whenever I tried to allocate anything on it -- until I dropped the
> >   size specification to <2G (2**31).  Well, 2GB for /tmp just wasn't at
> >   all likely to be useful for my purposes in this case.

Are you using ZFS alongside tmpfs? It should be fixed in 9-STABLE.

> > So I continue to use tmpfs for /tmp for machines with fewer folks
> > logging in, but I'm a bit less enthusiastic about its use unless the
> > workload and other requirements are fairly carefully considered
> > beforehand.
> > 
> > Peace,
> > david
> 
> 
> It seems there is only one switch which determines the size of the tmpfs
> in question (size) and there is no convenient way to say what amount of
> RAM is being used before using the swap space. I'd like to have at least
> a knob determining the limit of RAM being used.

There is no way to force tmpfs to use only a given amount of RAM. It's the
VM subsystem that decides which pages to swap, although some tweaking to
make VM prefer swapping tmpfs pages before process pages would be nice.

You could try the attached patch. It adds support for size option suffixes
(like 1g) and introduces a swap limit (part of the older patch; not sure
if it's of any use).

Patch is against 10-CURRENT.
Older version: https://github.com/glk/freebsd-head/commit/3bd8f7d
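
The suffix handling can be tried out in userland with a rough sketch like
the one below. It mirrors the logic of tmpfs_getopt_size() from the attached
patch, but parse_size() is a hypothetical name, it uses strtoll() in place
of the kernel's strtoq(), and like the patch it does not check for overflow:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a non-negative number with an optional one-letter k/m/g/t suffix
 * (powers of 1024).  Returns 0 on success or EINVAL on malformed input.
 */
static int
parse_size(const char *s, uint64_t *out)
{
	char *end;
	long long v;

	if (s == NULL || s[0] == '\0')
		return (EINVAL);
	v = strtoll(s, &end, 0);
	/* Reject input without digits or with more than one suffix char. */
	if (end == s || (end[0] != '\0' && end[1] != '\0'))
		return (EINVAL);
	if (v < 0)
		return (EINVAL);
	switch (end[0]) {
	case 't': case 'T':
		v *= 1024;
		/* FALLTHROUGH */
	case 'g': case 'G':
		v *= 1024;
		/* FALLTHROUGH */
	case 'm': case 'M':
		v *= 1024;
		/* FALLTHROUGH */
	case 'k': case 'K':
		v *= 1024;
		/* FALLTHROUGH */
	case '\0':
		break;
	default:
		return (EINVAL);
	}
	*out = (uint64_t)v;
	return (0);
}
```

With this logic, "1g" and "1073741824" give the same value, while "10GB" is
rejected because only a single suffix letter is accepted.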

Thanks,
Gleb.

> 
> On the other hand - my view of those things is really naive. I think
> having tmpfs isn't just a benefit in terms of security; it should also
> offer speedy access to files kept in memory, shouldn't it?
> 
> Linux is using TMPFS filesystems a lot for these purposes. How do they
> overcome the size restrictions or avoid flooding RAM and/or swap?
> 
> Regards,
> Oliver
> 



--MGYHOYXEY6WxJCY8
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment; filename="tmpfs-memlimit+size.patch.txt"

diff --git a/sys/fs/tmpfs/tmpfs.h b/sys/fs/tmpfs/tmpfs.h
index efa7c6d..3fc72ab 100644
--- a/sys/fs/tmpfs/tmpfs.h
+++ b/sys/fs/tmpfs/tmpfs.h
@@ -337,11 +337,10 @@ struct tmpfs_mount {
 	 * system, set during mount time.  This variable must never be
 	 * used directly as it may be bigger than the current amount of
 	 * free memory; in the extreme case, it will hold the SIZE_MAX
-	 * value.  Instead, use the TMPFS_PAGES_MAX macro. */
+	 * value. */
 	size_t			tm_pages_max;
 
-	/* Number of pages in use by the file system.  Cannot be bigger
-	 * than the value returned by TMPFS_PAGES_MAX in any case. */
+	/* Number of pages in use by the file system. */
 	size_t			tm_pages_used;
 
 	/* Pointer to the node representing the root directory of this
@@ -486,58 +485,32 @@ int	tmpfs_truncate(struct vnode *, off_t);
  * Memory management stuff.
  */
 
-/* Amount of memory pages to reserve for the system (e.g., to not use by
+/*
+ * Amount of memory pages to reserve for the system (e.g., to not use by
  * tmpfs).
- * XXX: Should this be tunable through sysctl, for instance? */
-#define TMPFS_PAGES_RESERVED (4 * 1024 * 1024 / PAGE_SIZE)
+ */
+#define TMPFS_PAGES_MINRESERVED		(4 * 1024 * 1024 / PAGE_SIZE)
 
 /*
- * Returns information about the number of available memory pages,
- * including physical and virtual ones.
- *
- * Remember to remove TMPFS_PAGES_RESERVED from the returned value to avoid
- * excessive memory usage.
- *
+ * Number of reserved swap pages should not be lower than
+ * swap_pager_almost_full high water mark.
  */
-static __inline size_t
-tmpfs_mem_info(void)
-{
+#define TMPFS_SWAP_MINRESERVED		1024
 
-	return (swap_pager_avail + cnt.v_free_count + cnt.v_cache_count);
-}
+size_t tmpfs_mem_avail(void);
 
-/* Returns the maximum size allowed for a tmpfs file system.  This macro
- * must be used instead of directly retrieving the value from tm_pages_max.
- * The reason is that the size of a tmpfs file system is dynamic: it lets
- * the user store files as long as there is enough free memory (including
- * physical memory and swap space).  Therefore, the amount of memory to be
- * used is either the limit imposed by the user during mount time or the
- * amount of available memory, whichever is lower.  To avoid consuming all
- * the memory for a given mount point, the system will always reserve a
- * minimum of TMPFS_PAGES_RESERVED pages, which is also taken into account
- * by this macro (see above). */
 static __inline size_t
-TMPFS_PAGES_MAX(struct tmpfs_mount *tmp)
+tmpfs_pages_used(struct tmpfs_mount *tmp)
 {
-	size_t freepages;
-
-	freepages = tmpfs_mem_info();
-	freepages -= freepages < TMPFS_PAGES_RESERVED ?
-	    freepages : TMPFS_PAGES_RESERVED;
+	const size_t node_size = sizeof(struct tmpfs_node) +
+	    sizeof(struct tmpfs_dirent);
+	size_t meta_pages;
 
-	return MIN(tmp->tm_pages_max, freepages + tmp->tm_pages_used);
+	meta_pages = howmany((uintmax_t)tmp->tm_nodes_inuse * node_size,
+	    PAGE_SIZE);
+	return (meta_pages + tmp->tm_pages_used);
 }
 
-/* Returns the available space for the given file system. */
-#define TMPFS_META_PAGES(tmp) (howmany((tmp)->tm_nodes_inuse * (sizeof(struct tmpfs_node) \
-				+ sizeof(struct tmpfs_dirent)), PAGE_SIZE))
-#define TMPFS_FILE_PAGES(tmp) ((tmp)->tm_pages_used)
-
-#define TMPFS_PAGES_AVAIL(tmp) (TMPFS_PAGES_MAX(tmp) > \
-			TMPFS_META_PAGES(tmp)+TMPFS_FILE_PAGES(tmp)? \
-			TMPFS_PAGES_MAX(tmp) - TMPFS_META_PAGES(tmp) \
-			- TMPFS_FILE_PAGES(tmp):0)
-
 #endif
 
 /* --------------------------------------------------------------------- */
diff --git a/sys/fs/tmpfs/tmpfs_subr.c b/sys/fs/tmpfs/tmpfs_subr.c
index fe596aa..5123fcc 100644
--- a/sys/fs/tmpfs/tmpfs_subr.c
+++ b/sys/fs/tmpfs/tmpfs_subr.c
@@ -59,6 +59,76 @@ __FBSDID("$FreeBSD$");
 
 SYSCTL_NODE(_vfs, OID_AUTO, tmpfs, CTLFLAG_RW, 0, "tmpfs file system");
 
+static long tmpfs_swap_reserved = TMPFS_SWAP_MINRESERVED * 2;
+
+static long tmpfs_pages_reserved = TMPFS_PAGES_MINRESERVED;
+
+static int
+sysctl_mem_reserved(SYSCTL_HANDLER_ARGS)
+{
+	int error;
+	long pages, bytes, reserved;
+
+	pages = *(long *)arg1;
+	bytes = pages * PAGE_SIZE;
+
+	error = sysctl_handle_long(oidp, &bytes, 0, req);
+	if (error || !req->newptr)
+		return (error);
+
+	pages = bytes / PAGE_SIZE;
+	if (arg1 == &tmpfs_swap_reserved)
+		reserved = TMPFS_SWAP_MINRESERVED;
+	else
+		reserved = TMPFS_PAGES_MINRESERVED;
+	if (pages < reserved)
+		return (EINVAL);
+
+	*(long *)arg1 = pages;
+	return (0);
+}
+
+SYSCTL_PROC(_vfs_tmpfs, OID_AUTO, memory_reserved, CTLTYPE_LONG|CTLFLAG_RW,
+    &tmpfs_pages_reserved, 0, sysctl_mem_reserved, "L", "reserved memory");
+SYSCTL_PROC(_vfs_tmpfs, OID_AUTO, swap_reserved, CTLTYPE_LONG|CTLFLAG_RW,
+    &tmpfs_swap_reserved, 0, sysctl_mem_reserved, "L", "reserved swap memory");
+
+size_t
+tmpfs_mem_avail(void)
+{
+	vm_ooffset_t avail_swap, avail_mem;
+
+	avail_swap = swap_pager_avail - tmpfs_swap_reserved;
+	if (__predict_false(avail_swap <= 0)) {
+		/* FIXME No swap or disabled swap check */
+		if (swap_pager_avail == 0)
+			avail_swap = 0;
+		else
+			return (0);
+	}
+	avail_mem = cnt.v_free_count + cnt.v_cache_count - tmpfs_pages_reserved;
+	if (__predict_false(avail_mem < 0))
+		avail_mem = 0;
+	return (avail_swap + avail_mem);
+}
+
+static size_t
+tmpfs_pages_check_avail(struct tmpfs_mount *tmp, size_t req_pages)
+{
+	size_t avail;
+
+	avail = tmpfs_mem_avail();
+	if (avail < req_pages)
+		return (0);
+
+	if (tmp->tm_pages_max != SIZE_MAX) {
+		avail = tmp->tm_pages_max - tmpfs_pages_used(tmp);
+		if (avail < req_pages)
+			return (0);
+	}
+	return (1);
+}
+
 /* --------------------------------------------------------------------- */
 
 /*
@@ -99,6 +169,8 @@ tmpfs_alloc_node(struct tmpfs_mount *tmp, enum vtype type,
 
 	if (tmp->tm_nodes_inuse >= tmp->tm_nodes_max)
 		return (ENOSPC);
+	if (tmpfs_pages_check_avail(tmp, 1) == 0)
+		return (ENOSPC);
 
 	nnode = (struct tmpfs_node *)uma_zalloc_arg(
 				tmp->tm_node_pool, tmp, M_WAITOK);
@@ -917,7 +989,7 @@ tmpfs_reg_resize(struct vnode *vp, off_t newsize, boolean_t ignerr)
 	MPASS(oldpages == uobj->size);
 	newpages = OFF_TO_IDX(newsize + PAGE_MASK);
 	if (newpages > oldpages &&
-	    newpages - oldpages > TMPFS_PAGES_AVAIL(tmp))
+	    tmpfs_pages_check_avail(tmp, newpages - oldpages) == 0)
 		return (ENOSPC);
 
 	VM_OBJECT_LOCK(uobj);
diff --git a/sys/fs/tmpfs/tmpfs_vfsops.c b/sys/fs/tmpfs/tmpfs_vfsops.c
index e04c410..32adc71 100644
--- a/sys/fs/tmpfs/tmpfs_vfsops.c
+++ b/sys/fs/tmpfs/tmpfs_vfsops.c
@@ -90,6 +90,8 @@ tmpfs_node_ctor(void *mem, int size, void *arg, int flags)
 	struct tmpfs_node *node = (struct tmpfs_node *)mem;
 
 	node->tn_gen++;
+	if (node->tn_gen == 0)
+		node->tn_gen = (arc4random() / 2) + 1;
 	node->tn_size = 0;
 	node->tn_status = 0;
 	node->tn_flags = 0;
@@ -114,7 +116,7 @@ tmpfs_node_init(void *mem, int size, int flags)
 	node->tn_id = 0;
 
 	mtx_init(&node->tn_interlock, "tmpfs node interlock", NULL, MTX_DEF);
-	node->tn_gen = arc4random();
+	node->tn_gen = (arc4random() / 2) + 1;
 
 	return (0);
 }
@@ -127,17 +129,59 @@ tmpfs_node_fini(void *mem, int size)
 	mtx_destroy(&node->tn_interlock);
 }
 
+/*
+ * XXX Rename to vfs_getopt_size()
+ */
+static int
+tmpfs_getopt_size(struct vfsoptlist *opts, const char *name, u_quad_t *data)
+{
+	char *opt_value, *vtp;
+	quad_t	iv;
+	int error, opt_len;
+
+	error = vfs_getopt(opts, name, (void **)&opt_value, &opt_len);
+	if (error != 0)
+		return (error);
+	if (opt_len == 0 || opt_value == NULL)
+		return (EINVAL);
+	if (opt_value[0] == '\0' || opt_value[opt_len - 1] != '\0')
+		return (EINVAL);
+
+	iv = strtoq(opt_value, &vtp, 0);
+	if (vtp == opt_value || (vtp[0] != '\0' && vtp[1] != '\0'))
+		return (EINVAL);
+	if (iv < 0)
+		return (EINVAL);
+	switch (vtp[0]) {
+	case 't': case 'T':
+		iv *= 1024;	/* FALLTHROUGH */
+	case 'g': case 'G':
+		iv *= 1024;	/* FALLTHROUGH */
+	case 'm': case 'M':
+		iv *= 1024;	/* FALLTHROUGH */
+	case 'k': case 'K':
+		iv *= 1024;	/* FALLTHROUGH */
+	case '\0':
+		break;
+	default:
+		return (EINVAL);
+	}
+	*data = iv;
+
+	return (0);
+}
+
 static int
 tmpfs_mount(struct mount *mp)
 {
+	const size_t nodes_per_page = howmany(PAGE_SIZE,
+	    sizeof(struct tmpfs_dirent) + sizeof(struct tmpfs_node));
 	struct tmpfs_mount *tmp;
 	struct tmpfs_node *root;
-	size_t pages;
-	uint32_t nodes;
 	int error;
 	/* Size counters. */
-	u_int nodes_max;
-	u_quad_t size_max, maxfilesize;
+	u_quad_t pages;
+	u_quad_t nodes_max, size_max, maxfilesize;
 
 	/* Root node attributes. */
 	uid_t root_uid;
@@ -173,17 +217,16 @@ tmpfs_mount(struct mount *mp)
 	if (mp->mnt_cred->cr_ruid != 0 ||
 	    vfs_scanopt(mp->mnt_optnew, "mode", "%ho", &root_mode) != 1)
 		root_mode = va.va_mode;
-	if (vfs_scanopt(mp->mnt_optnew, "inodes", "%u", &nodes_max) != 1)
+	if (tmpfs_getopt_size(mp->mnt_optnew, "inodes", &nodes_max) != 0)
 		nodes_max = 0;
-	if (vfs_scanopt(mp->mnt_optnew, "size", "%qu", &size_max) != 1)
+	if (tmpfs_getopt_size(mp->mnt_optnew, "size", &size_max) != 0)
 		size_max = 0;
-	if (vfs_scanopt(mp->mnt_optnew, "maxfilesize", "%qu",
-	    &maxfilesize) != 1)
+	if (tmpfs_getopt_size(mp->mnt_optnew, "maxfilesize", &maxfilesize) != 0)
 		maxfilesize = 0;
 
 	/* Do not allow mounts if we do not have enough memory to preserve
 	 * the minimum reserved pages. */
-	if (tmpfs_mem_info() < TMPFS_PAGES_RESERVED)
+	if (tmpfs_mem_avail() < TMPFS_PAGES_MINRESERVED)
 		return ENOSPC;
 
 	/* Get the maximum number of memory pages this file system is
@@ -196,21 +239,27 @@ tmpfs_mount(struct mount *mp)
 		pages = howmany(size_max, PAGE_SIZE);
 	MPASS(pages > 0);
 
+	if (pages < SIZE_MAX / PAGE_SIZE)
+		size_max = pages * PAGE_SIZE;
+	else
+		size_max = SIZE_MAX;
+
 	if (nodes_max <= 3) {
-		if (pages > UINT32_MAX - 3)
-			nodes = UINT32_MAX;
+		if (pages < INT_MAX / nodes_per_page)
+			nodes_max = pages * nodes_per_page;
 		else
-			nodes = pages + 3;
-	} else
-		nodes = nodes_max;
-	MPASS(nodes >= 3);
+			nodes_max = INT_MAX;
+	}
+	if (nodes_max > INT_MAX)
+		nodes_max = INT_MAX;
+	MPASS(nodes_max >= 3);
 
 	/* Allocate the tmpfs mount structure and fill it. */
 	tmp = (struct tmpfs_mount *)malloc(sizeof(struct tmpfs_mount),
 	    M_TMPFSMNT, M_WAITOK | M_ZERO);
 
 	mtx_init(&tmp->allnode_lock, "tmpfs allnode lock", NULL, MTX_DEF);
-	tmp->tm_nodes_max = nodes;
+	tmp->tm_nodes_max = nodes_max;
 	tmp->tm_nodes_inuse = 0;
 	tmp->tm_maxfilesize = maxfilesize > 0 ? maxfilesize : UINT64_MAX;
 	LIST_INIT(&tmp->tm_nodes_used);
@@ -355,16 +404,16 @@ tmpfs_fhtovp(struct mount *mp, struct fid *fhp, int flags,
 	if (tfhp->tf_len != sizeof(struct tmpfs_fid))
 		return EINVAL;
 
-	if (tfhp->tf_id >= tmp->tm_nodes_max)
+	if (tfhp->tf_id > INT_MAX || tfhp->tf_id <= 0)
 		return EINVAL;
 
 	found = FALSE;
 
 	TMPFS_LOCK(tmp);
 	LIST_FOREACH(node, &tmp->tm_nodes_used, tn_entries) {
-		if (node->tn_id == tfhp->tf_id &&
-		    node->tn_gen == tfhp->tf_gen) {
-			found = TRUE;
+		if (node->tn_id == tfhp->tf_id) {
+			if (node->tn_gen == tfhp->tf_gen)
+				found = TRUE;
 			break;
 		}
 	}
@@ -373,7 +422,7 @@ tmpfs_fhtovp(struct mount *mp, struct fid *fhp, int flags,
 	if (found)
 		return (tmpfs_alloc_vp(mp, node, LK_EXCLUSIVE, vpp));
 
-	return (EINVAL);
+	return (ESTALE);
 }
 
 /* --------------------------------------------------------------------- */
@@ -382,22 +431,26 @@ tmpfs_fhtovp(struct mount *mp, struct fid *fhp, int flags,
 static int
 tmpfs_statfs(struct mount *mp, struct statfs *sbp)
 {
-	fsfilcnt_t freenodes;
 	struct tmpfs_mount *tmp;
+	size_t used;
 
 	tmp = VFS_TO_TMPFS(mp);
 
 	sbp->f_iosize = PAGE_SIZE;
 	sbp->f_bsize = PAGE_SIZE;
 
-	sbp->f_blocks = TMPFS_PAGES_MAX(tmp);
-	sbp->f_bavail = sbp->f_bfree = TMPFS_PAGES_AVAIL(tmp);
-
-	freenodes = MIN(tmp->tm_nodes_max - tmp->tm_nodes_inuse,
-	    TMPFS_PAGES_AVAIL(tmp) * PAGE_SIZE / sizeof(struct tmpfs_node));
-
-	sbp->f_files = freenodes + tmp->tm_nodes_inuse;
-	sbp->f_ffree = freenodes;
+	used = tmpfs_pages_used(tmp);
+	if (tmp->tm_pages_max != SIZE_MAX)
+		sbp->f_blocks = tmp->tm_pages_max;
+	else
+		sbp->f_blocks = used + tmpfs_mem_avail();
+	if (sbp->f_blocks <= used)
+		sbp->f_bavail = 0;
+	else
+		sbp->f_bavail = sbp->f_blocks - used;
+	sbp->f_bfree = sbp->f_bavail;
+	sbp->f_files = tmp->tm_nodes_max;
+	sbp->f_ffree = tmp->tm_nodes_max - tmp->tm_nodes_inuse;
 	/* sbp->f_owner = tmp->tn_uid; */
 
 	return 0;

--MGYHOYXEY6WxJCY8--


