From: Scott Long
Date: Tue, 03 Oct 2006 23:38:28 -0600
To: David G Lawrence
Cc: freebsd-stable@freebsd.org, John Marshall
Subject: Re: Watchdog Timeout - bge devices

David G Lawrence wrote:
>> Very interesting data point. I wonder if this accounts for some of the
>> inconsistency in the reporting from others.
>> In any case, SCHED_ULE is still considered to be highly experimental.
>> Hopefully it will get some more attention in the near future to bring
>> it closer to production quality.
>
> I'm not using SCHED_ULE on any of the machines that I'm seeing the
> timeout problem with em and fxp devices. I suspect the problem has to do
> with interrupt thread scheduling; maybe SCHED_ULE just somehow makes the
> problem worse?
>
> -DG
>

Well, the two things that will block the scheduler are critical sections
and spinlocks. The system will complain about spinlocks that are held too
long, but you might need WITNESS and/or INVARIANTS enabled for it.
Critical section debugging is almost nonexistent; the only way to do it is
to turn on KTR and then feed the output to schedgraph.py. This only works
reliably for UP, though.

Scott
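
For reference, a minimal sketch of the kernel configuration the debugging
steps above would need. The option names follow the usage notes in the
schedgraph.py header comments and the stock debugging options; the
KTR_ENTRIES value is only an illustrative assumption, and the exact
quoting of the KTR_COMPILE/KTR_MASK values may differ between releases.

```
# Spinlock and invariant checking (INVARIANTS requires INVARIANT_SUPPORT)
options 	WITNESS
options 	INVARIANTS
options 	INVARIANT_SUPPORT

# Scheduler event tracing for schedgraph.py
options 	KTR
options 	KTR_ENTRIES=32768	# assumed size; larger buffers capture more history
options 	KTR_COMPILE=(KTR_SCHED)
options 	KTR_MASK=(KTR_SCHED)

# After rebuilding and reproducing the problem, dump the trace and graph it:
#   ktrdump -ct > ktr.out
#   python schedgraph.py ktr.out
```

WITNESS and INVARIANTS add noticeable overhead, so a kernel built this
way is probably best kept to a test machine that reproduces the timeout.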