From owner-freebsd-current@FreeBSD.ORG Wed Aug 11 03:29:41 2004
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 2CCE916A4CE; Wed, 11 Aug 2004 03:29:41 +0000 (GMT)
Received: from tethys.ringofsaturn.com (tethys.ringofsaturn.com [66.13.175.242])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 7200D43D1D; Wed, 11 Aug 2004 03:29:40 +0000 (GMT)
	(envelope-from rnejdl@ringofsaturn.com)
Received: from mail.ringofsaturn.com (localhost [127.0.0.1])
	i7B3TYRN004594; Tue, 10 Aug 2004 22:29:34 -0500 (CDT)
	(envelope-from rnejdl@ringofsaturn.com)
Received: from 66.13.175.242 (SquirrelMail authenticated user rnejdl);
	by mail.ringofsaturn.com with HTTP;
	Tue, 10 Aug 2004 22:29:34 -0500 (CDT)
Message-ID: <49521.66.13.175.242.1092194974.squirrel@[66.13.175.242]>
In-Reply-To: <200408100826.i7A8Qa8H013148@gw.catspoiler.org>
References: <44129.12.148.147.242.1092077172.squirrel@[12.148.147.242]>
	<200408100826.i7A8Qa8H013148@gw.catspoiler.org>
Date: Tue, 10 Aug 2004 22:29:34 -0500 (CDT)
From: "Rusty Nejdl"
To: "Don Lewis"
User-Agent: SquirrelMail/1.5.1 [CVS]
MIME-Version: 1.0
Content-Type: text/plain;charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Priority: 3 (Normal)
Importance: Normal
X-Virus-Scanned: clamd / ClamAV version 0.75, clamav-milter version 0.75 on tethys.ringofsaturn.com
X-Virus-Status: Clean
cc: freebsd-current@FreeBSD.org
cc: dnelson@allantgroup.com
Subject: Re: Is anything being done re: the pcm timeout issue?
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: rnejdl@ringofsaturn.com
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Wed, 11 Aug 2004 03:29:41 -0000

>>
>> And I have seen that these will eventually stop working one by one
>> until I have none left.
>> lsof and fstat don't show any programs using
>> them, but nonetheless, programs like xmms and gaim can't use them
>> anymore.

Well, try as I might, I haven't been able to duplicate this tonight.
I've got 4 vchans set up and I was running madplay continuously on 4
channels for 4 hours and it worked the whole time.

>
> The vchan code is fairly broken. I was hoping to have some time to
> work on this (and other problems in the top half of the sound code) before
> 5.3, but it looks like the clock has just about run out.

I'm not seeing the locked channels yet, but that doesn't mean that they
aren't there.

>
>
>> Do you have any more details on the pcm play timeout? Are you using
>> vchans? What program are you using?
>
> My suspicion is that there is either a problem in ich_intr() that is
> causing it to stop receiving interrupts or to stop calling chn_intr(), or
> there is enough interrupt latency to allow the DMA pointer to wrap and
> fool chn_dmaupdate() into thinking no data was consumed. It is possible
> that the ich_intr() problem is specific to amd64.
>
> I previously sent out these suggestions on how to debug the problem:

I remembered seeing these, but I'm learning as I go so that is a bit more
than I can do at present.

>
>
> ------ Forwarded message ------
> From: Don Lewis
> Subject: Re: Questionable code in sys/dev/sound/pcm/channel.c
> Date: Tue, 27 Jul 2004 15:15:06 -0700 (PDT)
> To: mat@cnd.mcgill.ca
> Cc: freebsd-current@freebsd.org
>
>
> On 27 Jul, Mathew Kanner wrote:
>
>> On Jul 26, John-Mark Gurney wrote:
>>
>>> Conrad J. Sabatier wrote this message on Mon, Jul 26, 2004 at 16:35
>>> -0500:
>>>
>>>> Why the formulaic calculation of timeout, if it's simply going to
>>>> be unconditionally set to 1 immediately afterwards anyway? What's
>>>> going on here?
>>>
>>> Well, if you look at the annotations, that absolute set of timeout
>>> was added in rev 1.65 by cg with the comment: tweaks to reduce
>>> latency/pauses in output
>>>
>>
>>
>> I think this has been raised on the mailing list before.
>> IIRC, the logic for this is to check frequently for dead channels but
>> CG is the authority.
>>
>
> My suspicion is that this change was made to reduce the consequences of
> lost wakeups from the interrupt routine. This would have been more of a
> problem when tsleep() was used in chn_sleep() and shouldn't be needed now
> that the top and bottom halves of the code use the channel lock and
> chn_sleep() uses msleep() to atomically release the lock and wait for the
> wakeup from the interrupt code. That said, setting timeout to 1 shouldn't
> hurt anything and will just waste a bit of CPU time.
>
>
>>>> Also, at the end of the function:
>>>>
>>>>
>>>> 	if (count <= 0) {
>>>> 		c->flags |= CHN_F_DEAD;
>>>> 		printf("%s: play interrupt timeout, channel dead\n",
>>>> 		    c->name);
>>>> 	}
>>>>
>>>>
>>>> 	return ret;
>>>> }
>>>>
>>>
>>> that was changed in rev1.52 (by cg also), and previously was just a
>>> check for count == 0..
>>>
>>> So, I'd recommend a message off to cg and ask why he made these
>>> changes...
>
> The original version of the code always set timeout to 1 and looped on
> (count > 0), so count could never go negative. When the code was
> changed to set timeout to something larger than 1, count could go
> negative if (hz % timeout != 0), so the condition for setting
> CHN_F_DEAD had to be modified accordingly.
>
> My suspicion is that there is sometimes enough latency in executing the
> interrupt routine that the hardware DMA pointer is wrapping and
> chn_dmaupdate() is calculating delta as zero. This would cause
> chn_wrfeed() not to consume any data from the software buffer (and skip
> the wakeup()), which might be enough to cause the chn_write() to time out
> while waiting for space to become available in the software buffer.
> It would be interesting to enable the debug code in chn_dmaupdate(), and
> add (delta == 0) as a condition to trigger the device_printf().
>
>
> The bigger question is what is the cause of the latency ...
>
>
>
> ------ Forwarded message ------
> From: Don Lewis
> Subject: Re: Questionable code in sys/dev/sound/pcm/channel.c
> Date: Tue, 27 Jul 2004 15:21:57 -0700 (PDT)
> To: conrads@cox.net
> Cc: freebsd-current@freebsd.org
>
>
> On 27 Jul, Conrad J. Sabatier wrote:
>
>>
>> On 26-Jul-2004 Conrad J. Sabatier wrote:
>>
>>>
>>> On 26-Jul-2004 Conrad J. Sabatier wrote:
>>>
>>>> I'm a little perplexed at the following bit of logic in chn_write()
>>>> (which is where the "interrupt timeout, channel dead" messages are
>>>> being generated).
>>
>> [snip]
>>
>>
>>>> Also, at the end of the function:
>>>>
>>>>
>>>> 	if (count <= 0) {
>>>> 		c->flags |= CHN_F_DEAD;
>>>> 		printf("%s: play interrupt timeout, channel dead\n",
>>>> 		    c->name);
>>>> 	}
>>>>
>>>>
>>>> 	return ret;
>>>> }
>>>>
>>>>
>>>> Could it be that the conditional test is wrong here? Perhaps
>>>> we should be using (count < 0) instead?
>>>
>>> I'm now running a kernel built with this last conditional test
>>> changed to "if (count < 0)" and sound is still working OK. Have yet to
>>> see if this eliminates the interrupt timeout messages.
>>
>> Well, that was a failure. :-) Didn't see any timeout error messages,
>> but the device still died eventually, nonetheless. I've since changed
>> back to the original code.
>
> That's an interesting data point. At this point I'd start looking at the
> driver code for your sound hardware. I suspect that the driver interrupt
> code is either no longer seeing interrupts, or it is no longer calling
> chn_intr().
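
The two pieces of arithmetic in the forwarded messages can be modelled in a few lines of user-space C. This is only a rough sketch, not the kernel source. The function names final_count() and ring_delta() are made up for illustration. It assumes count starts at hz and drops by timeout each time a sleep times out, and that the DMA delta is computed modulo the buffer size; the real chn_write() and chn_dmaupdate() may differ in detail.

```c
#include <assert.h>

/*
 * Hypothetical model of the wait loop in chn_write(), assuming every
 * sleep times out: count starts at hz and drops by timeout per pass.
 * Returns the value of count when the (count > 0) loop exits.
 */
int
final_count(int hz, int timeout)
{
	int count = hz;

	assert(timeout > 0);	/* a non-positive timeout would never terminate */
	while (count > 0)
		count -= timeout;
	return (count);
}

/*
 * Hypothetical model of a ring-buffer delta: the number of bytes the
 * hardware pointer appears to have advanced between two reads, assuming
 * the pointer wraps at bufsize. A full lap is indistinguishable from no
 * movement at all.
 */
unsigned int
ring_delta(unsigned int old_ptr, unsigned int new_ptr, unsigned int bufsize)
{
	assert(bufsize > 0 && old_ptr < bufsize && new_ptr < bufsize);
	return ((new_ptr + bufsize - old_ptr) % bufsize);
}
```

With timeout fixed at 1, final_count(100, 1) stops at exactly 0. With a timeout that does not divide hz, final_count(100, 3) ends at -2, which is why a test for (count == 0) would miss the dead channel and (count <= 0) is needed. Similarly, ring_delta(100, 100, 4096) is 0 whether the pointer stood still or advanced a full 4096 bytes, which is the case that would make it look as if no data had been consumed.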