From: Bruce Evans
Date: Thu, 10 Jul 2008 01:13:05 +1000 (EST)
To: Robert Watson
Cc: John Baldwin, net@FreeBSD.org
Subject: Re: svn commit: r180256 - head/sys/dev/arl

On Wed, 9 Jul 2008, Robert Watson wrote:

> On Sat, 5 Jul 2008, Bruce Evans wrote:
>
>> On Fri, 4 Jul 2008, John Baldwin wrote:
>>
>> Since ifqmaxlen isn't a tunable or sysctl, and is statically initialized
>> to IFQ_MAXLEN, not using it only makes a difference if someone
>> initializes it differently using a debugger, so these bugs are normally
>> just spelling errors.  IFQ_MAXLEN is also too small for 1Gbps or even
>> 100Mbps hardware devices, so only drivers for old hardware and some
>> software drivers can use it anyway.
>
> I was actually thinking about this this morning -- Paul Saab pointed out
> to me that, on Linux, you can run-time tune the transmit queue limit
> using ifconfig(8).  I think doing something similar would, if nothing
> else, make it easier to understand the impact of our current queue
> settings in testing.

Yes, the control should really be per-device.  However, I don't like the
bloat for dynamic everything in every driver.  However2, I use a hack (a
per-driver global possibly set by ddb at boot time) to optionally enlarge
the tx queue for all drivers that I touch.  It was in editing this and
having to change it for ALTQ and its unnecessary macro that I noticed the
bogusness of IFQ_MAXLEN and the ALTQ macro.

> And, just to put it on the table in e-mail, since I know it has come up
> a lot at developer summits: the ALTQ infrastructure is decreasingly
> compatible with current network devices, which often have quite large
> queues (descriptor rings) in hardware, or where there are multiple
> transmit queues.  One

Hardware queues are never large :-).  512 is common, but enlargement gives
~20000.  20000 is too large for most purposes but rarely matters.  I don't
use ALTQ, and just notice that, very rarely, latency can be enormous if
the queue length builds up to the maximum.

> possibility I've been considering is making the whole ifq subsystem a
> library to device drivers, rather than a required interface to transmit.
> This would allow the device driver to instantiate more than one if there
> are multiple hardware queues that need to be represented, or, for
> example, allow synthetic encapsulation interfaces (such as vlan) to avoid
> queueing entirely and directly dispatch to the lower layer interface
> without requiring a mandatory enqueue/dequeue step.  I've started hacking
> on this every now and then, but it requires a lot of code to be touched
> -- it's something we do need to address before 8.0, however.

Could this be more efficient?  I think direct dispatch wouldn't work well.
It didn't help as much as hoped for rx, and tx is predictable so perfect
scheduling of it is possible (only dispatch in bulk in order to be more
efficient).  Also, the current implementation gives necessary watermark
stuff almost automatically -- the queue split gives a virtual low
watermark at the split point, and this reduces the chance of the combined
queue running dry.

Bruce
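
A rough sketch of the latency remark above, for anyone who wants numbers.
It estimates how long a completely full tx queue takes to drain at line
rate.  The queue lengths (512 and ~20000) are the ones mentioned in the
message; the 1500-byte frame size, the two line rates, and the function
name drain_time_seconds are assumptions chosen for illustration, and
framing overhead is ignored.

```python
def drain_time_seconds(queue_len, frame_bytes, link_bps):
    """Time to transmit a full queue of equal-sized frames at line rate."""
    return queue_len * frame_bytes * 8 / link_bps


# Assumed full-size Ethernet payload; preamble, headers and gaps are ignored.
FRAME_BYTES = 1500

for queue_len in (512, 20000):  # queue lengths mentioned in the message
    for name, bps in (("100Mbps", 100e6), ("1Gbps", 1e9)):
        ms = drain_time_seconds(queue_len, FRAME_BYTES, bps) * 1000
        print(f"{queue_len:>6} packets at {name}: {ms:.1f} ms")
```

Under these assumptions a full 20000-packet queue takes about 240 ms to
drain at 1Gbps and about 2.4 s at 100Mbps, while 512 packets take roughly
6 ms and 61 ms respectively.  Smaller packets drain proportionally faster.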