From: Jo Rhett
Organization: Silicon Valley Colocation
Date: Fri, 13 Oct 2006 02:12:31 -0700
To: Bruce Evans
Cc: "Rick C. Petty", freebsd-hardware@freebsd.org
Subject: Re: Bounty offered to fix sio device lock problem
Message-ID: <452F587F.4080108@svcolo.com>
In-Reply-To: <20061013152714.Y49451@delplex.bde.org>

Bruce Evans wrote:
> On Thu, 12 Oct 2006, Jo Rhett wrote:
>
>> Bruce -- who owns getting this fixed? Or who should own it? Or who
>> will take on getting it fixed if we offer a bounty on it?
>>
>> Replication scenario:
>> Modem on sio0 (or sio1 or any normal i386 serial port)
>> /etc/ttys has port enabled with "dialup"
>> qpage (from ports, unchanged) uses modem for dialout
>> ** or just write a script that periodically dials out using tip
>>
>> Within a day and often within a few hours, the serial port will go
>> awol. You can't talk to the modem any more. Modem is just fine.
>> Rebooting the system solves the problem. Rebooting the modem does
>> not solve it.
>>
>> 100% replicable, and sooner versus later if you call out more often.
>
> [context lost to top posting]
>
> I mentioned an old vfs refcounting bug. New ones turned up a week or
> two ago. They cause leaked pty masters and worse. The pty leak is
> caused by last-close sometimes not being called. For pty masters, the
> leak is permanent since reopening of the master is not permitted for
> security reasons so there is no way to reach the device close, but for
> sio devices it should be possible to fix up the problem by reopening
> and closing the relevant device after ensuring that it is not
> already open:
> - for cua*, simply stty -f'ing it or just using it should be enough.
>   I guess this is not your problem, since the fix is almost automatic.
> - for tty*, it may be necessary to disable getty on the port and kill
>   the current getty, since the old vfs refcounting bug normally prevents
>   reaching last-close if any process is sleeping in open, so if you don't
>   disable getty on the port then you have to race with the new getty to
>   complete the open/last-close before the new getty sleeps in open.
>
> Many nearby vfs bugs will be fixed in 6.2-RELEASE, but no fix is in
> sight for the main refcounting ones.

So we are seeing these problems on 6.0-RELEASE, not on 6.1 or -CURRENT.
(I assume they persist in later versions, but the new bugs you mention
may be newer than this.)

On one system I've had getty disabled on the port for over a month and
am ... well, waiting for scheduled downtime on the host. I still can't
use the device.

I'd be happy to give you any debugging output or other information you
need to diagnose this. And we'd be happy to give you money to adjust
your priorities too :-)

It's a fairly serious annoyance for us, causing our emergency
out-of-band pagers to miss crucial messages.
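A minimal sketch of the "script that periodically dials out" from the replication scenario, written in Python purely as an illustration (the thread itself only mentions tip, stty and qpage). Instead of dialing, it opens the port with O_NONBLOCK and O_NOCTTY, sends "AT", waits for "OK", and closes the port again. If nothing else has the port open, each check therefore also goes through the open/last-close path Bruce describes for cua*. The device name /dev/cuad0, the 300-second interval, and the names probe_modem and watch are hypothetical. A failed check only shows that the port stopped answering; it is not a diagnosis. For a tty* port, getty would still have to be turned off in /etc/ttys first, as Bruce notes.

```python
import os
import select
import time
import tty


def probe_modem(path, timeout=2.0):
    """Open `path`, send "AT", and return True if "OK" comes back in time.

    Any error opening the port (missing device, EBUSY, ...) counts as a
    failed check. Closing the fd afterwards reaches last-close only if no
    other process (getty, qpage, tip) still holds the port open.
    """
    try:
        # O_NONBLOCK keeps open() from sleeping until carrier appears
        # (relevant for tty* call-in devices); O_NOCTTY stops the port
        # from becoming our controlling terminal.
        fd = os.open(path, os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK)
    except OSError:
        return False
    try:
        if os.isatty(fd):
            tty.setraw(fd)  # no echo, no line editing, no output translation
        os.write(fd, b"AT\r")
        buf = b""
        deadline = time.monotonic() + timeout
        while b"OK" not in buf:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            ready, _, _ = select.select([fd], [], [], remaining)
            if not ready:
                break  # timed out waiting for the modem
            try:
                chunk = os.read(fd, 256)
            except BlockingIOError:
                continue
            if not chunk:
                break  # EOF: nothing more will arrive
            buf += chunk
        return b"OK" in buf
    finally:
        os.close(fd)


def watch(path="/dev/cuad0", interval=300.0, rounds=None):
    """Probe `path` every `interval` seconds and log each failure.

    Loops forever when `rounds` is None; otherwise runs `rounds` checks
    and returns how many of them failed. Defaults are hypothetical.
    """
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        if not probe_modem(path):
            failures += 1
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, path, "did not answer AT", flush=True)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
    return failures
```

Calling watch("/dev/cuad0") from cron or a long-running process logs a timestamp each time the port stops answering. Running a check at the same moment qpage or tip is using the port would interfere with them, so a real deployment would need to avoid that overlap.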