Date:      Wed, 6 Sep 1995 15:26:22 -0400 (EDT)
From:      "Rashid Karimov." <rashid@haven.ios.com>
To:        hackers@freebsd.org
Subject:   QUOTAs code causes WEIRD system locks in 210Stable
Message-ID:  <199509061926.PAA27707@haven.ios.com>

		Hi there folx,

	looks like we're getting closer to the biggest problem
	FreeBSD faces in the server market:
	system locks. Random and confusing.

	A long time ago I was advised by Terry (Lambert) that
	the QUOTAs implementation in FreeBSD could cause
	system lockups.

	Citation:

From looking at the quota code, it looks like it may not take the
dev_t field into account when computing quotas, which means that it
doesn't use the right mount record to locate the quota file.  So
it locks entrancy across file systems, but doesn't compute transitive
closure over the directed graph which is the set of locks held in
all file systems.

Really, it should per-fs lock to guarantee reentrancy.

Probably this is a 10-15 line fix in the quota code alone, if someone
spends the time working it out (I've just taken a quick pass through
the code that does vnode-based I/O that the quota stuff uses, and
I didn't see any obvious bugs).

	=-=-=-= End of citation =-=-=-=
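
	The "per-fs lock" idea from the citation can be pictured with a
	small userland model. This is only a sketch under stated assumptions,
	not the kernel quota code: the names fs_quota_lock, qlock_lookup,
	qlock_try and qlock_release are hypothetical, and a fixed table keyed
	by device number stands in for the real mount records.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FS 8   /* arbitrary table size for the model */

/* Hypothetical per-filesystem quota lock record, keyed by device number. */
struct fs_quota_lock {
    unsigned dev;   /* device number of the mounted filesystem */
    int in_use;     /* slot has been allocated to a device */
    int locked;     /* quota file of this filesystem is busy */
};

static struct fs_quota_lock qlocks[MAX_FS];

/* Find (or allocate) the record for one device; NULL if the table is full. */
struct fs_quota_lock *qlock_lookup(unsigned dev)
{
    struct fs_quota_lock *free_slot = NULL;

    for (size_t i = 0; i < MAX_FS; i++) {
        if (qlocks[i].in_use && qlocks[i].dev == dev)
            return &qlocks[i];
        if (!qlocks[i].in_use && free_slot == NULL)
            free_slot = &qlocks[i];
    }
    if (free_slot != NULL) {
        free_slot->dev = dev;
        free_slot->in_use = 1;
        free_slot->locked = 0;
    }
    return free_slot;
}

/* Try to take the quota lock of one filesystem: 1 on success, 0 if busy. */
int qlock_try(unsigned dev)
{
    struct fs_quota_lock *l = qlock_lookup(dev);

    if (l == NULL || l->locked)
        return 0;
    l->locked = 1;
    return 1;
}

/* Release a lock previously taken with qlock_try(). */
void qlock_release(unsigned dev)
{
    struct fs_quota_lock *l = qlock_lookup(dev);

    assert(l != NULL && l->locked);
    l->locked = 0;
}
```

	The point of the model is only that a busy quota file on one
	device does not block quota work on another device, because the
	lock is selected by the device number.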

	But it happened that starting with 2.0.5 we all got working QUOTAs
	and they still work kind of fine with 2.1Stable. But:

	on certain servers here (they are all pretty much the same:
	P90-120, PCI Adaptecs/SMC EtherPowers, 4000 users) I often get locks.
	The system literally dies. And now that I have DDB compiled in, I was
	able to see what's going on when the systems lock.

	A LOT of processes are sleeping on wmesg "ufslk2" with stat of 3
	and flag of 00004. By a lot I mean probably 100-200 of them.

	I tried to see what was at the caddr_t they were sleeping on (that
	should be the inode struct) and I got dev_t = 400 and inode = 2 (!!?).
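
	For reading those two numbers, here is a minimal sketch. It
	assumes the 400 is hexadecimal, as a debugger would typically print
	it, and that dev_t uses the traditional BSD layout with the major
	number in the second byte and the minor number in the low byte.
	The helper names are made up for the example. Inode 2 is, by UFS
	convention, the root directory of a filesystem.

```c
#include <assert.h>

#define UFS_ROOT_INODE 2UL   /* UFS convention: inode 2 is the fs root */

/* Traditional layout assumed: major in bits 8..15, minor in bits 0..7. */
int dev_major(unsigned dev) { return (int)((dev >> 8) & 0xff); }
int dev_minor(unsigned dev) { return (int)(dev & 0xff); }

/* True if the inode number is the root directory of its filesystem. */
int is_fs_root(unsigned long ino) { return ino == UFS_ROOT_INODE; }
```

	Under those assumptions 0x400 decodes to major 4, minor 0, and
	the locked inode is the root directory of whatever filesystem lives
	on that device. Which driver owns major 4 depends on the kernel's
	device tables and is not claimed here.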

	Looks like those processes went to sleep from ufs_lock() func.
	("ufs_vnops.c" file).

	What makes me think it was the QUOTAs code that caused this is
	that in the beginning there were a couple of "edquota" processes
	which I had started to change the QUOTAs for a couple of users.
	When I ran ps a bit later (the system was still alive at that
	time) I saw that those processes (only!) were sleeping on the
	same "ufslk2" event.
	I decided to start an extra "edquota" process, just for the heck
	of it, and the system locked up within a minute. When I ran "ps"
	from DDB later, nearly all the processes the system was running
	were sleeping on the same event (probably 90% of them; a few were
	sleeping on the same wmesg but with different wait channels).
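
	The pile-up described above can be reasoned about as a chain
	of "who waits for whom". The following is a hypothetical model, not
	something read out of the kernel: waits_for[p] is assumed to hold the
	index of the process that owns the lock process p sleeps on, and
	find_blocker is a name invented for this sketch.

```c
#include <assert.h>

#define NOT_WAITING (-1)   /* marks a process that is not asleep on a lock */

/*
 * Follow the wait chain starting at process p.
 * Returns the index of the runnable process at the end of the chain,
 * or -1 if the chain never ends, i.e. the waiters form a cycle (deadlock).
 * All entries of waits_for[] must be NOT_WAITING or a valid index.
 */
int find_blocker(const int waits_for[], int nproc, int p)
{
    int steps = 0;

    while (waits_for[p] != NOT_WAITING) {
        p = waits_for[p];
        if (++steps > nproc)   /* more hops than processes: a loop */
            return -1;
    }
    return p;
}
```

	If every sleeper leads back to one runnable process, that
	process is the one to look at; if the walk returns -1, the sleepers
	are waiting on each other and nothing will ever wake them.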


	So that's about it .... I'm not sure who the author of the
	QUOTAs code was or where exactly the bug is. I also know that certain
	ppl here think that the current implementation of the QUOTA mechanism
	sucks and aren't willing to change it, voting instead for rewriting
	the thing from scratch.


	What should we do about it? I don't have enough time to dedicate
	to this problem and frankly don't have enough kernel programming
	experience to work on it.
	At the same time, QUOTAs are a must if FBSD is to be used as a user
	server.

	A bit more about that system:

	P90 ASUSP54TP4 motherboard, Adaptec 2940 + 2 SEAGATE BARRACUDAs,
	QUOTAs are on:

	/dev/sd0a           /       ufs rw 1 1
	/dev/sd0s1b         none        swap    sw 0 0
	proc                /proc       procfs  rw 0 0
	/dev/sd0s1e         /usr        ufs rw 1 1
	/dev/sd0s1f         /var        ufs rw,userquota 1 1
	/dev/sd1s1e         /u/u1       ufs rw,userquota 1 1
	/dev/sd1s1f         /u/u2       ufs rw,userquota 1 1
	/dev/sd1s1g         /u/u3       ufs rw,userquota 1 1
	/dev/sd1s1h         /u/u4       ufs rw,userquota 1 1

	Rashid



