Date:      Wed, 6 Sep 1995 13:33:35 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        rashid@haven.ios.com (Rashid Karimov.)
Cc:        hackers@FreeBSD.ORG
Subject:   Re: QUOTAs code causes WEIRD system locks in 210Stable
Message-ID:  <199509062033.NAA00669@phaeton.artisoft.com>
In-Reply-To: <199509061926.PAA27707@haven.ios.com> from "Rashid Karimov." at Sep 6, 95 03:26:22 pm

> 	A LOT of processes are sleeping on wmesg "ufslk2" with stat of 3
> 	and flag of 00004. Saying a lot I mean probably 100-200 of them.

It's a deadlock between VOP_LOCK and VFS_LOCK operations.  The lock
graph is not hierarchical, and it needs to be, or it's impossible to
compute transitive closure over the graph.
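
As a rough illustration of why the hierarchy matters (a Python sketch,
not kernel code): record each "lock X held while requesting lock Y"
event as an edge X -> Y.  If the resulting graph has no cycle, the locks
can be ordered into a hierarchy; a cycle means a deadly embrace is
possible.  The lock names and the find_cycle helper are hypothetical and
are not part of the FreeBSD sources.

```python
from collections import defaultdict

def find_cycle(edges):
    """Return a list of lock names forming a cycle, or None if acyclic.

    edges: iterable of (held, wanted) pairs, meaning some process
    requested `wanted` while already holding `held`.
    """
    graph = defaultdict(list)
    for held, wanted in edges:
        graph[held].append(wanted)

    WHITE, GREY, BLACK = 0, 1, 2       # unvisited, on current path, done
    color = defaultdict(int)
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:     # back edge: cycle found
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):           # snapshot: visit() may add keys
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

# Hypothetical lock orders: one strictly top-down, one with a reversal.
hierarchical = [("root_fs", "usr_fs"), ("usr_fs", "home_fs")]
cyclic = [("fs_A", "fs_B"), ("fs_B", "fs_A")]
print(find_cycle(hierarchical))
print(find_cycle(cyclic))
```

A depth-first search is used here only because it is the shortest way to
show the idea; a real lock manager would more likely enforce the order
when the lock is requested rather than search for cycles afterwards.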

You can effectively flatten the graph by putting the quota file in the
root of the file system to which the quota entries themselves apply.  This
will avoid the a->B b->A deadly embrace deadlock caused by directory
traversal not unlocking and backing off to the root on failure (also
fixable by asserting the VFS_LOCK and waiting for all VOP_LOCK locks
to drain out before granting the VFS_LOCK request to the caller).
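
The "assert, drain, then grant" idea in the parenthetical can be modeled
in user space as follows.  This is a minimal Python sketch under stated
assumptions, not the kernel's VFS_LOCK/VOP_LOCK implementation: the
class DrainingFsLock and its method names are hypothetical, and a single
counter stands in for all per-vnode locks on one file system.

```python
import threading

class DrainingFsLock:
    """Toy model: the FS-wide lock is granted only after every
    outstanding vnode-level lock on that FS has been released."""

    def __init__(self):
        self._cond = threading.Condition()
        self._vnode_holders = 0     # outstanding vnode-level locks
        self._fs_asserted = False   # FS lock requested or held

    def vnode_lock(self):
        with self._cond:
            # New vnode locks are refused while the FS lock is asserted.
            while self._fs_asserted:
                self._cond.wait()
            self._vnode_holders += 1

    def vnode_unlock(self):
        with self._cond:
            self._vnode_holders -= 1
            self._cond.notify_all()

    def fs_lock(self):
        with self._cond:
            while self._fs_asserted:          # one FS locker at a time
                self._cond.wait()
            self._fs_asserted = True          # assert: block newcomers
            while self._vnode_holders > 0:    # drain existing holders
                self._cond.wait()

    def fs_unlock(self):
        with self._cond:
            self._fs_asserted = False
            self._cond.notify_all()
```

Note that this toy model ignores one hard case: a holder that needs a
second vnode lock on the same file system while an FS lock request is
pending would block, so a real implementation has to deal with that
separately.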

This still leaves the A->a a->A and A->a a->B b->A deadlocks possible.
To avoid them, don't run quotas in a submount of a file system that has
quotas, i.e.:

			/		<- quotas NOT enabled
			|
			|
			/usr		<- quotas enabled
			   |
			   |
			   /usr/home	<- quotas NOT enabled

This is because the quota files are rooted relative to the system root,
and you must traverse directories in order to get to the quota file.
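
The layout rule above can be checked mechanically.  The following is a
small Python sketch, assuming a hand-written table of mount points and
quota flags; the helper names are hypothetical and nothing here reads
the real mount table or /etc/fstab.

```python
def is_below(path, ancestor):
    """True if mount point `path` lies strictly beneath `ancestor`."""
    prefix = ancestor.rstrip("/") + "/"   # "/" stays "/", "/usr" -> "/usr/"
    return path != ancestor and path.startswith(prefix)

def nested_quota_conflicts(mounts):
    """Return (outer, inner) pairs where both mount points have quotas
    enabled and `inner` is mounted somewhere beneath `outer`.

    mounts: dict mapping mount point -> True if quotas are enabled.
    """
    quota_points = sorted(p for p, enabled in mounts.items() if enabled)
    return [(outer, inner)
            for outer in quota_points
            for inner in quota_points
            if is_below(inner, outer)]

# The layout from the diagram: only /usr has quotas, so no conflict.
safe = {"/": False, "/usr": True, "/usr/home": False}
# Enabling quotas on the submount as well creates the risky nesting.
risky = {"/": False, "/usr": True, "/usr/home": True}

print(nested_quota_conflicts(safe))
print(nested_quota_conflicts(risky))
```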

Quota editing should probably be done through file system operations
rather than reads and writes on the quota files themselves, i.e. through
a system call interface that calls the file system quota ops.

> 	I tried to see what was on the caddr_t they were sleeping at ( that
> 	should be i_node struct) and I got dev_t = 400 and inode = 2 (!!?).

Inode 2 is the / inode for any file system.  If /usr is a file system
mounted on /usr, a subdirectory of /, then the mount point traversal
puts you in inode 2 on the /usr file system.  It's the parent/child
sequence

	lock a/lock A/unlock a/lock b/unlock A/lock B/unlock b

to get 'B' that causes the deadlock over the mount point traversal.
Because of the VFS_LOCK, the parent vnode locked prior to the traversal
cannot be unlocked and recovered if the traversal fails.

So this is the expected behaviour for a quota'd FS mounted on a quota'd
FS at mount point traversal time (or traversal from the FS containing
the quota file for the quota'd FS to the quota'd FS).

> 	What makes me think that was QUOTAs code which  caused this is 
> 	that in the beginning there were a couple of processes "edquota"
> 	which I started to change the QUOTAs for a couple of users. 

Running concurrent edquota processes should be prevented, or the
VFS_LOCK on the quota'd FS and the VOP_LOCK on the quota'd FS's quota
file should be a/A/b/B locked in hierarchy and the lock not granted
until the other process closes the quota file.

> 	So that's about it .... I'm not sure  who was the author of the 
> 	QUOTAs code and where is the bug exactly.

The QUOTA code is from the original BSD FFS/UFS sources.

>	I also know that certain ppl here think that the current
>	implementation of QUOTA mechanism sucks and aren't willing to
>	change it, voting for rewriting the thing form the scratch.

A certain amount of rewriting is inevitable.  Whether this takes the
form of enforcing the placement of the quota file in the root of the
FS being quota'd and minor mods to VFS_LOCK, or a full rewrite and
the importation of a hierarchical lock manager (which should be there
anyway, to return EWOULDBLOCK on embrace deadlock in flock()
operations), is really immaterial.

My vote for a rewrite would make the quota code a stackable layer, where
you mount /usr on /usr and do quota file I/O so you can put quotas on DOS
and other partitions.

Currently, I do not have time for this.  I'm chasing issues in the namei()
and lookup() code, in the unionfs and portalfs use of the bogus
nameidata fields for consumption of path components, and in the in-place
expansion of symlinks into the pathname buffer, which can cause NFS
mounted symlinks to exceed MAXPATHLEN when the mount depth differs from
that on the source host's FS.

I might have time for this after FreeBSD runs on a couple of platforms
and supports SMP and kernel level file system multithreading.  8-(.

> 	What should we do about it ? I don't have enough time to dedicate
> 	to this problem and frankly don't have enough kernel programming
> 	experience to work on it.
> 	The same time QUOTAs are a must for FBSD to be used as a user
> 	server.

Limit the way in which you use quotas.  Don't run multiple instances of
edquota.  Turn quotas off before running edquota and back on when you
are done editing.  Use the quotactl(2) interface to turn quotas on using
specific file paths to put the quota files on the drives where quotas
are being enforced (i.e., get rid of the 'userquota' option and turn
them on manually per FS in your /etc/rc after they've been mounted).
That should keep you away from at least the known failure modes.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


