Date: Fri, 8 Mar 2013 02:10:41 -0500
From: Garrett Wollman
To: freebsd-net@freebsd.org
Cc: jfv@freebsd.org
Subject: Limits on jumbo mbuf cluster allocation
Message-ID: <20793.36593.774795.720959@hergotha.csail.mit.edu>

I have a machine (actually six of them) with an Intel dual-10G NIC on the
motherboard.  Two of them (so far) are connected to a network using jumbo
frames, with an MTU a little under 9k, so the ixgbe driver allocates
32,000 9k clusters for its receive rings.

I have noticed, on the machine that is an active NFS server, that it can
get into a state where allocating more 9k clusters fails (as reflected in
the mbuf failure counters) at a utilization far lower than the configured
limits -- in fact, quite close to the number allocated by the driver for
its rx ring.  Eventually, network traffic grinds completely to a halt,
and if one of the interfaces is administratively downed, it cannot be
brought back up again.  There's generally plenty of physical memory free
(at least two or three GB).  There are no console messages generated to
indicate what is going on, and overall UMA usage doesn't look extreme.

I'm guessing that this is a result of kernel memory fragmentation,
although I'm a little bit unclear as to how this actually comes about.
I am assuming that this hardware has only limited scatter-gather
capability and can't receive a single packet into multiple buffers of a
smaller size, which would reduce the requirement for two-and-a-quarter
consecutive pages of KVA for each packet.  In actual usage, most of our
clients aren't on a jumbo network, so most of the time, all the packets
will fit into a normal 2k cluster, and we've never observed this issue
when the *server* is on a non-jumbo network.

Does anyone have suggestions for dealing with this issue?  Will
increasing the amount of KVA (to, say, twice physical memory) help
things?
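For reference, here is a minimal userland sketch (assuming a FreeBSD
system that exposes the kern.ipc.nmbclusters and kern.ipc.nmbjumbo9
sysctls, as 9.x does) that reads the configured cluster limits netstat -m
reports and works out the contiguous-page footprint of each 9k cluster.
It is purely illustrative, not a diagnostic for the failure described
above.

	/*
	 * Illustrative sketch only: print the 2k and 9k cluster limits
	 * and the number of consecutive pages each 9k cluster needs.
	 */
	#include <sys/types.h>
	#include <sys/sysctl.h>

	#include <stdio.h>
	#include <unistd.h>

	static int
	get_int_sysctl(const char *name)
	{
		int val = -1;
		size_t len = sizeof(val);

		if (sysctlbyname(name, &val, &len, NULL, 0) == -1)
			perror(name);
		return (val);
	}

	int
	main(void)
	{
		int nmbclusters = get_int_sysctl("kern.ipc.nmbclusters");
		int nmbjumbo9 = get_int_sysctl("kern.ipc.nmbjumbo9");
		long pagesize = sysconf(_SC_PAGESIZE);
		long jumbo9 = 9 * 1024;		/* MJUM9BYTES */

		printf("2k cluster limit (kern.ipc.nmbclusters): %d\n",
		    nmbclusters);
		printf("9k cluster limit (kern.ipc.nmbjumbo9):   %d\n",
		    nmbjumbo9);

		/*
		 * A 9k cluster covers 9216/4096 = 2.25 pages, so each
		 * one is carved out of three consecutive pages; 32,000
		 * of them for the rx rings comes to roughly 375 MB of
		 * such contiguous chunks.
		 */
		printf("pages per 9k cluster: %.2f (allocated as %ld)\n",
		    (double)jumbo9 / pagesize,
		    (jumbo9 + pagesize - 1) / pagesize);
		return (0);
	}

The arithmetic is only there to make the point that every 9k cluster
costs a run of consecutive pages, which is exactly where fragmentation
bites.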
It seems to me like a bug that these large packets don't have their own
submap to ensure that allocation is always possible when sufficient
physical pages are available.

-GAWollman
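A rough sketch of the submap idea above, modeled on the way submaps such
as buffer_map are carved out of kernel_map: the kmem_suballoc() call is
the existing KPI, but the names mbuf_jumbo_map, jumbo_map_size, and
mbuf_jumbo_map_init are hypothetical, and none of this has been compiled
or tested.

	/* Hypothetical only: a dedicated KVA submap for jumbo clusters. */
	#include <sys/param.h>
	#include <sys/systm.h>

	#include <vm/vm.h>
	#include <vm/vm_extern.h>
	#include <vm/vm_kern.h>

	static vm_map_t mbuf_jumbo_map;		/* hypothetical name */
	static vm_offset_t jumbo_minaddr, jumbo_maxaddr;

	static void
	mbuf_jumbo_map_init(vm_size_t jumbo_map_size)
	{
		/*
		 * Reserve a dedicated range of KVA so that 9k/16k
		 * cluster allocations never have to compete with the
		 * rest of the kernel map for contiguous space.
		 */
		mbuf_jumbo_map = kmem_suballoc(kernel_map, &jumbo_minaddr,
		    &jumbo_maxaddr, jumbo_map_size, FALSE);
	}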