From owner-freebsd-bugs Sat Feb 16 11:30:19 2002 Delivered-To: freebsd-bugs@hub.freebsd.org Received: from freefall.freebsd.org (freefall.FreeBSD.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id 4703437B416 for ; Sat, 16 Feb 2002 11:30:01 -0800 (PST) Received: (from gnats@localhost) by freefall.freebsd.org (8.11.6/8.11.6) id g1GJU1V24852; Sat, 16 Feb 2002 11:30:01 -0800 (PST) (envelope-from gnats) Received: from mail.junkproof.net (mail.junkproof.net [206.55.70.12]) by hub.freebsd.org (Postfix) with ESMTP id 0D46E37B404 for ; Sat, 16 Feb 2002 11:27:03 -0800 (PST) Received: from mail (helo=mail.junkproof.net) by mail.junkproof.net with local-bsmtp (Exim 3.32 #1) id 16cAW1-0009l3-00 for freebsd-gnats-submit@freebsd.org; Sat, 16 Feb 2002 13:28:57 -0600 Received: from bill.twwells.com ( [68.44.48.161] ) by mail.junkproof.net via tcp with submission id 3c6eb1cf-009224; Sat, 16 Feb 2002 13:23:59 -0600 Received: from bill by bill.twwells.com with local (Exim 3.34 #1) id 16cAPF-0002Qk-00 for FreeBSD-gnats-submit@freebsd.org; Sat, 16 Feb 2002 14:21:57 -0500 Message-Id: Date: Sat, 16 Feb 2002 14:21:57 -0500 From: Bill Wells Reply-To: Bill Wells To: FreeBSD-gnats-submit@freebsd.org X-Send-Pr-Version: 3.113 Subject: kern/35004: [PATCH] Fix for pcm driver lockups Sender: owner-freebsd-bugs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org >Number: 35004 >Category: kern >Synopsis: [PATCH] Fix for pcm driver lockups >Confidential: no >Severity: serious >Priority: medium >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Sat Feb 16 11:30:01 PST 2002 >Closed-Date: >Last-Modified: >Originator: Bill Wells >Release: FreeBSD 4.5-STABLE i386 >Organization: >Environment: System: FreeBSD bill.twwells.com 4.5-STABLE FreeBSD 4.5-STABLE #0: Sat Feb 16 04:01:38 EST 2002 toor@bill.twwells.com:/usr/obj/usr/src/sys/BILL i386 >Description: There is a race condition in pcm that will cause it to lock up in certain circumstances. When the driver is locked up, all further attempts at accessing it result in "Device busy". (NB: The -current fixes don't correct this.) >How-To-Repeat: Run this: while :; do cp beep.au /dev/audio; done And while it is running, enter this: cp beep.au /dev/audio (beep.au can be anything, but the shorter it is the better.) >Fix: The appended patches (against stable) fix this problem and a few others as well. Before the patches (use patch -l and you'll probably have to fix the spacing by hand), I'll explain what they're about. The first patch, against sound.c, simply adds some newlines to certain debugging messages. The second patch involves a number of changes to resolve the lockup problem and other problems. First, this bit of code in dsp_open (and its equivalent for wrch), sets up the channel. Its main problem is that the reference count is not updated for a newly opened channel but is instead updated if the channel had been previously opened. (E.g., an open to read will not increment the count but a following call to write will cause the *read* count to be incremented.) These reference count problems have the effect of preventing kldload from unloading the driver under certain circumstances. if (rdch) { if (flags & FREAD) { chn_reset(rdch, fmt); if (flags & O_NONBLOCK) rdch->flags |= CHN_F_NBIO; } else { CHN_LOCK(rdch); pcm_chnref(rdch, 1); } CHN_UNLOCK(rdch); } Here's the replacement code. This code is much simpler because it takes advantage of a couple of facts. First, if FREAD is set, rdch (i_dev->si_drv1) *must* have previously been null and *must* be non-null now. So, no need to test it. Also, the channel that was allocated is already locked, so no need to do that again. if (flags & FREAD) { chn_reset(rdch, fmt); if (flags & O_NONBLOCK) rdch->flags |= CHN_F_NBIO; pcm_chnref(rdch, 1); CHN_UNLOCK(rdch); } The remaining changes are in dsp_close. This bit of code (and the equivalent wrch code) leaves the channel locked if the reference count doesn't go to zero. That means that the channel remains locked when dsp_close exits; I don't think that's intended. The unlock goes afer the "if", not inside it. if (rdch) { CHN_LOCK(rdch); if (pcm_chnref(rdch, -1) > 0) { CHN_UNLOCK(rdch); exit = 1; } } Note this comment that I've included in the patch. If the answer is "no", then all is good and the comment can go away. Otherwise, it's necessary to someday fix the problem and the comment should be retained in the code. /* XXX And what happens if one of the channels had 2 references and the other has but one? The latter won't get reset. Can that happen? */ In the original code, these twe lines are executed if both of the reference counts go to zero but not if either is nonzero. If the reference counts are supposed to indicate the number of references from si_drv?'s, that's wrong; these need to be nulled out regardless of what the reference counts do. i_dev->si_drv1 = NULL; i_dev->si_drv2 = NULL; So, one fix involving these lines is to add them to the "if (exit)" code. This ensures that the reference counts correspond to the references. However, that's not the only problem with these two lines. In the original code, the fields are cleared before the abort and flush. In my patch, they are cleared after it. This misplacement is the actual cause of the lockups. *** sound.c.orig Fri Feb 15 16:44:01 2002 --- sound.c Sat Feb 16 03:28:30 2002 *************** *** 416,434 **** snd_mtxlock(d->lock); if (d->inprog) { ! device_printf(dev, "unregister: operation in progress"); snd_mtxunlock(d->lock); return EBUSY; } SLIST_FOREACH(sce, &d->channels, link) { if (sce->channel->refcount > 0) { ! device_printf(dev, "unregister: channel busy"); snd_mtxunlock(d->lock); return EBUSY; } } if (mixer_uninit(dev)) { ! device_printf(dev, "unregister: mixer busy"); snd_mtxunlock(d->lock); return EBUSY; } --- 416,434 ---- snd_mtxlock(d->lock); if (d->inprog) { ! device_printf(dev, "unregister: operation in progress\n"); snd_mtxunlock(d->lock); return EBUSY; } SLIST_FOREACH(sce, &d->channels, link) { if (sce->channel->refcount > 0) { ! device_printf(dev, "unregister: channel busy\n"); snd_mtxunlock(d->lock); return EBUSY; } } if (mixer_uninit(dev)) { ! device_printf(dev, "unregister: mixer busy\n"); snd_mtxunlock(d->lock); return EBUSY; } *** dsp.c.orig Fri Feb 15 16:28:58 2002 --- dsp.c Sat Feb 16 03:33:40 2002 *************** *** 240,265 **** /* finished with snddev, new channels still locked */ /* bump refcounts, reset and unlock any channels that we just opened */ - if (rdch) { if (flags & FREAD) { chn_reset(rdch, fmt); if (flags & O_NONBLOCK) rdch->flags |= CHN_F_NBIO; - } else { - CHN_LOCK(rdch); pcm_chnref(rdch, 1); - } CHN_UNLOCK(rdch); } - if (wrch) { if (flags & FWRITE) { chn_reset(wrch, fmt); if (flags & O_NONBLOCK) wrch->flags |= CHN_F_NBIO; - } else { - CHN_LOCK(wrch); pcm_chnref(wrch, 1); - } CHN_UNLOCK(wrch); } splx(s); --- 240,257 ---- *************** *** 286,316 **** if (rdch) { CHN_LOCK(rdch); if (pcm_chnref(rdch, -1) > 0) { - CHN_UNLOCK(rdch); exit = 1; } } if (wrch) { CHN_LOCK(wrch); if (pcm_chnref(wrch, -1) > 0) { - CHN_UNLOCK(wrch); exit = 1; } } if (exit) { snd_mtxunlock(d->lock); splx(s); return 0; } - /* both refcounts are zero, abort and release */ if (d->fakechan) d->fakechan->flags = 0; - i_dev->si_drv1 = NULL; - i_dev->si_drv2 = NULL; - d->flags &= ~SD_F_TRANSIENT; snd_mtxunlock(d->lock); --- 278,310 ---- if (rdch) { CHN_LOCK(rdch); if (pcm_chnref(rdch, -1) > 0) { exit = 1; } + CHN_UNLOCK(rdch); } if (wrch) { CHN_LOCK(wrch); if (pcm_chnref(wrch, -1) > 0) { exit = 1; } + CHN_UNLOCK(wrch); } + /* XXX And what happens if one of the channels had 2 references and + the other has but one? The latter won't get reset. Can that + happen? */ + if (exit) { + i_dev->si_drv1 = NULL; + i_dev->si_drv2 = NULL; snd_mtxunlock(d->lock); splx(s); return 0; } /* both refcounts are zero, abort and release */ if (d->fakechan) d->fakechan->flags = 0; d->flags &= ~SD_F_TRANSIENT; snd_mtxunlock(d->lock); *************** *** 326,331 **** --- 320,327 ---- chn_reset(wrch, 0); pcm_chnrelease(wrch); } + i_dev->si_drv1 = NULL; + i_dev->si_drv2 = NULL; splx(s); return 0; >Release-Note: >Audit-Trail: >Unformatted: To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-bugs" in the body of the message