Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 16 Feb 2002 14:21:57 -0500
From:      Bill Wells <bill@twwells.com>
To:        FreeBSD-gnats-submit@freebsd.org
Subject:   kern/35004: [PATCH] Fix for pcm driver lockups
Message-ID:  <E16cAPF-0002Qk-00@bill.twwells.com>

next in thread | raw e-mail | index | archive | help

>Number:         35004
>Category:       kern
>Synopsis:       [PATCH] Fix for pcm driver lockups
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Feb 16 11:30:01 PST 2002
>Closed-Date:
>Last-Modified:
>Originator:     Bill Wells
>Release:        FreeBSD 4.5-STABLE i386
>Organization:
>Environment:
System: FreeBSD bill.twwells.com 4.5-STABLE FreeBSD 4.5-STABLE #0: Sat Feb 16 04:01:38 EST 2002 toor@bill.twwells.com:/usr/obj/usr/src/sys/BILL i386


	
>Description:
	There is a race condition in pcm that will cause it to
	lock up in certain circumstances. When the driver is
	locked up, all further attempts at accessing it result in
	"Device busy". (NB: The -current fixes don't correct
	this.)

>How-To-Repeat:

	Run this:

	while :; do cp beep.au /dev/audio; done

	And while it is running, enter this:

	cp beep.au /dev/audio

	(beep.au can be anything, but the shorter it is the
	better.)

>Fix:
	The appended patches (against stable) fix this problem and
	a few others as well. Before the patches (use patch -l and
	you'll probably have to fix the spacing by hand), I'll
	explain what they're about.

	The first patch, against sound.c, simply adds some
	newlines to certain debugging messages. The second patch
	involves a number of changes to resolve the lockup problem
	and other problems.

	First, this bit of code in dsp_open (and its equivalent
	for wrch), sets up the channel. Its main problem is that
	the reference count is not updated for a newly opened
	channel but is instead updated if the channel had been
	previously opened. (E.g., an open to read will not
	increment the count but a following call to write will
	cause the *read* count to be incremented.) These reference
	count problems have the effect of preventing kldload from
	unloading the driver under certain circumstances.

	if (rdch) {
		if (flags & FREAD) {
			chn_reset(rdch, fmt);
			if (flags & O_NONBLOCK)
				rdch->flags |= CHN_F_NBIO;
		} else {
			CHN_LOCK(rdch);
			pcm_chnref(rdch, 1);
		}
		CHN_UNLOCK(rdch);
	}

	Here's the replacement code. This code is much simpler
	because it takes advantage of a couple of facts. First, if
	FREAD is set, rdch (i_dev->si_drv1) *must* have previously
	been null and *must* be non-null now. So, no need to test
	it. Also, the channel that was allocated is already
	locked, so no need to do that again.

	if (flags & FREAD) {
		chn_reset(rdch, fmt);
		if (flags & O_NONBLOCK)
			rdch->flags |= CHN_F_NBIO;
		pcm_chnref(rdch, 1);
		CHN_UNLOCK(rdch);
	}

	The remaining changes are in dsp_close. This bit of code
	(and the equivalent wrch code) leaves the channel locked
	if the reference count doesn't go to zero. That means that
	the channel remains locked when dsp_close exits; I don't
	think that's intended. The unlock goes afer the "if", not
	inside it.

	if (rdch) {
		CHN_LOCK(rdch);
		if (pcm_chnref(rdch, -1) > 0) {
			CHN_UNLOCK(rdch);
			exit = 1;
		}
	}

	Note this comment that I've included in the patch. If the
	answer is "no", then all is good and the comment can go
	away. Otherwise, it's necessary to someday fix the problem
	and the comment should be retained in the code.

	/* XXX And what happens if one of the channels had 2 references and
	   the other has but one? The latter won't get reset. Can that
	   happen? */

	In the original code, these twe lines are executed if both
	of the reference counts go to zero but not if either is
	nonzero. If the reference counts are supposed to indicate
	the number of references from si_drv?'s, that's wrong;
	these need to be nulled out regardless of what the
	reference counts do.

	i_dev->si_drv1 = NULL;
	i_dev->si_drv2 = NULL;

	So, one fix involving these lines is to add them to the
	"if (exit)" code. This ensures that the reference counts
	correspond to the references.

	However, that's not the only problem with these two lines.
	In the original code, the fields are cleared before the
	abort and flush. In my patch, they are cleared after it.
	This misplacement is the actual cause of the lockups.

*** sound.c.orig        Fri Feb 15 16:44:01 2002
--- sound.c     Sat Feb 16 03:28:30 2002
***************
*** 416,434 ****

	snd_mtxlock(d->lock);
	if (d->inprog) {
!               device_printf(dev, "unregister: operation in progress");
		snd_mtxunlock(d->lock);
		return EBUSY;
	}
	SLIST_FOREACH(sce, &d->channels, link) {
		if (sce->channel->refcount > 0) {
!                       device_printf(dev, "unregister: channel busy");
			snd_mtxunlock(d->lock);
			return EBUSY;
		}
	}
	if (mixer_uninit(dev)) {
!               device_printf(dev, "unregister: mixer busy");
		snd_mtxunlock(d->lock);
		return EBUSY;
	}
--- 416,434 ----

	snd_mtxlock(d->lock);
	if (d->inprog) {
!               device_printf(dev, "unregister: operation in progress\n");
		snd_mtxunlock(d->lock);
		return EBUSY;
	}
	SLIST_FOREACH(sce, &d->channels, link) {
		if (sce->channel->refcount > 0) {
!                       device_printf(dev, "unregister: channel busy\n");
			snd_mtxunlock(d->lock);
			return EBUSY;
		}
	}
	if (mixer_uninit(dev)) {
!               device_printf(dev, "unregister: mixer busy\n");
		snd_mtxunlock(d->lock);
		return EBUSY;
	}


*** dsp.c.orig  Fri Feb 15 16:28:58 2002
--- dsp.c       Sat Feb 16 03:33:40 2002
***************
*** 240,265 ****
	/* finished with snddev, new channels still locked */

	/* bump refcounts, reset and unlock any channels that we just opened */
-       if (rdch) {
		if (flags & FREAD) {
			chn_reset(rdch, fmt);
			if (flags & O_NONBLOCK)
				rdch->flags |= CHN_F_NBIO;
-               } else {
-                       CHN_LOCK(rdch);
			pcm_chnref(rdch, 1);
-               }
		CHN_UNLOCK(rdch);
	}
-       if (wrch) {
		if (flags & FWRITE) {
			chn_reset(wrch, fmt);
			if (flags & O_NONBLOCK)
				wrch->flags |= CHN_F_NBIO;
-               } else {
-                       CHN_LOCK(wrch);
			pcm_chnref(wrch, 1);
-               }
		CHN_UNLOCK(wrch);
	}
	splx(s);
--- 240,257 ----
***************
*** 286,316 ****
	if (rdch) {
		CHN_LOCK(rdch);
		if (pcm_chnref(rdch, -1) > 0) {
-                       CHN_UNLOCK(rdch);
			exit = 1;
		}
	}
	if (wrch) {
		CHN_LOCK(wrch);
		if (pcm_chnref(wrch, -1) > 0) {
-                       CHN_UNLOCK(wrch);
			exit = 1;
		}
	}
	if (exit) {
		snd_mtxunlock(d->lock);
		splx(s);
		return 0;
	}
-
	/* both refcounts are zero, abort and release */

	if (d->fakechan)
		d->fakechan->flags = 0;

-       i_dev->si_drv1 = NULL;
-       i_dev->si_drv2 = NULL;
-
	d->flags &= ~SD_F_TRANSIENT;
	snd_mtxunlock(d->lock);

--- 278,310 ----
	if (rdch) {
		CHN_LOCK(rdch);
		if (pcm_chnref(rdch, -1) > 0) {
			exit = 1;
		}
+               CHN_UNLOCK(rdch);
	}
	if (wrch) {
		CHN_LOCK(wrch);
		if (pcm_chnref(wrch, -1) > 0) {
			exit = 1;
		}
+               CHN_UNLOCK(wrch);
	}
+       /* XXX And what happens if one of the channels had 2 references and
+          the other has but one? The latter won't get reset. Can that
+          happen? */
+
	if (exit) {
+               i_dev->si_drv1 = NULL;
+               i_dev->si_drv2 = NULL;
		snd_mtxunlock(d->lock);
		splx(s);
		return 0;
	}
	/* both refcounts are zero, abort and release */

	if (d->fakechan)
		d->fakechan->flags = 0;

	d->flags &= ~SD_F_TRANSIENT;
	snd_mtxunlock(d->lock);

***************
*** 326,331 ****
--- 320,327 ----
		chn_reset(wrch, 0);
		pcm_chnrelease(wrch);
	}
+       i_dev->si_drv1 = NULL;
+       i_dev->si_drv2 = NULL;

	splx(s);
	return 0;
>Release-Note:
>Audit-Trail:
>Unformatted:

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E16cAPF-0002Qk-00>