From owner-freebsd-net@FreeBSD.ORG Thu Mar 20 23:01:01 2014
Date: Thu, 20 Mar 2014 20:01:00 -0300
Subject: Re: 9.2 ixgbe tx queue hang
From: Christopher Forgeron
To: Jack Vogel
Cc: FreeBSD Net <freebsd-net@freebsd.org>, Garrett Wollman
In-Reply-To: <201403202113.s2KLD7GB085085@hergotha.csail.mit.edu>
References: <1159309884.25490921.1395282576806.JavaMail.root@uoguelph.ca>
 <201403202113.s2KLD7GB085085@hergotha.csail.mit.edu>
List-Id: Networking and TCP/IP with FreeBSD

Ah, good point about the 4k buff size: I will allocate more to
kern.ipc.nmbjumbop, perhaps taking it from the 9k and 16k pools.

Yes, I did have to tweak the patch slightly to work on 10.0, but it's
basically the same thing I was trying after looking at Garrett's notes.

I see this is part of a larger problem, but I didn't see any issues with a
9.0 system for over a year, and my 9.2 system seems to be stable (all the
same hardware, same use). I was thinking it was an issue with later 9.2's
or 10, but ultimately I guess it's just a problem on any system that can't
allocate three contiguous 4k memory pages quickly enough.

I do notice ~30% more NFS speed to my ZFS pool with 10.0 - perhaps that's
the performance level at which this problem starts to show. Then again, my
10.0 system starts out with denied 9k bufs at boot, where my 9.2 doesn't,
and I would expect no real memory pressure at boot when I have 96G of RAM.

(I also wonder whether I should be considering an MTU that fits inside a
MJUMPAGESIZE. I don't think my switches support an MTU equal to 3 or 4
full MJUMPAGESIZE clusters. Then again, wasting a bit of memory on the
server may be worth it to send slightly fewer frames.)

What should be done about the other network drivers that still call
MJUM9BYTES? http://fxr.watson.org/fxr/ident?im=excerpts;i=MJUM9BYTES

I have a collection of different NICs; I could test a few of them to
verify they work okay with the same sort of patch we're talking about. I
appreciate the help everyone gives me here, so I'm willing to help out if
it's needed.

Thanks again.

On Thu, Mar 20, 2014 at 7:42 PM, Jack Vogel wrote:

> Your 4K mbuf pool is not being used; make sure you increase its size once
> you are using it, or you'll just be having the same issue with a
> different pool.
>
> Oh, and that patch was against the code in HEAD; it might need some
> manual hacking if you're using anything older.
>
> Not sure what you mean about memory allocation in 10 - this change is not
> 10-specific, it's something I intended on doing and it just slipped
> between the cracks.
>
> Jack
>
> On Thu, Mar 20, 2014 at 3:32 PM, Christopher Forgeron <
> csforgeron@gmail.com> wrote:
>
>> I agree, performance is noticeably worse with TSO off, but I thought it
>> would be a good step in troubleshooting. I'm glad you're a regular
>> reader of the list, so I don't have to settle for slow performance. :-)
>>
>> I'm applying your patch now; I think it will fix it, but I'll report in
>> after it's run iometer for the night regardless.
>>
>> On another note: what's so different about memory allocation in 10 that
>> is making this an issue?
>>
>> On Thu, Mar 20, 2014 at 7:24 PM, Jack Vogel wrote:
>>
>>> I strongly discourage anyone from disabling TSO on 10G; it's necessary
>>> to get the performance one wants to see on the hardware.
>>>
>>> Here is a patch to do what I'm talking about:
>>>
>>> *** ixgbe.c     Fri Jan 10 18:12:20 2014
>>> --- ixgbe.jfv.c Thu Mar 20 23:04:15 2014
>>> *************** ixgbe_init_locked(struct adapter *adapte
>>> *** 1140,1151 ****
>>>      */
>>>     if (adapter->max_frame_size <= 2048)
>>>             adapter->rx_mbuf_sz = MCLBYTES;
>>> -   else if (adapter->max_frame_size <= 4096)
>>> -           adapter->rx_mbuf_sz = MJUMPAGESIZE;
>>> -   else if (adapter->max_frame_size <= 9216)
>>> -           adapter->rx_mbuf_sz = MJUM9BYTES;
>>>     else
>>> !           adapter->rx_mbuf_sz = MJUM16BYTES;
>>>
>>>     /* Prepare receive descriptors and buffers */
>>>     if (ixgbe_setup_receive_structures(adapter)) {
>>> --- 1140,1147 ----
>>>      */
>>>     if (adapter->max_frame_size <= 2048)
>>>             adapter->rx_mbuf_sz = MCLBYTES;
>>>     else
>>> !           adapter->rx_mbuf_sz = MJUMPAGESIZE;
>>>
>>>     /* Prepare receive descriptors and buffers */
>>>     if (ixgbe_setup_receive_structures(adapter)) {
>>>
>>> On Thu, Mar 20, 2014 at 3:12 PM, Christopher Forgeron <
>>> csforgeron@gmail.com> wrote:
>>>
>>>> Hi Jack,
>>>>
>>>> I'm on ixgbe 2.5.15.
>>>>
>>>> I see a few other threads about using MJUMPAGESIZE instead of
>>>> MJUM9BYTES.
>>>>
>>>> If you have a patch you'd like me to test, I'll compile it in and let
>>>> you know. I was just looking at Garrett's if_em.c patch and thinking
>>>> about applying it to ixgbe.
>>>>
>>>> As it stands, I seem to not be having the problem now that I have
>>>> disabled TSO on ix0, but I still need more test runs to confirm -
>>>> which is also in line (I think) with what you are all saying.
>>>>
>>>> On Thu, Mar 20, 2014 at 7:00 PM, Jack Vogel wrote:
>>>>
>>>>> What he's saying is that the driver should not be using 9K mbuf
>>>>> clusters. I thought this had been changed, but I see the code in HEAD
>>>>> is still using the larger clusters when you up the MTU. I will put it
>>>>> on my list to change with the next update to HEAD.
>>>>>
>>>>> What version of ixgbe are you using?
>>>>>
>>>>> Jack
>>>>>
>>>>> On Thu, Mar 20, 2014 at 2:34 PM, Christopher Forgeron <
>>>>> csforgeron@gmail.com> wrote:
>>>>>
>>>>>> I have found this:
>>>>>>
>>>>>> http://lists.freebsd.org/pipermail/freebsd-net/2013-October/036955.html
>>>>>>
>>>>>> I think what you're saying is that:
>>>>>> - an MTU of 9000 doesn't need to equal a 9k mbuf / jumbo cluster
>>>>>> - modern NIC drivers can gather 9000 bytes of data from various
>>>>>>   memory locations
>>>>>> - the fact that I'm seeing 9k jumbo clusters is showing me that my
>>>>>>   driver is trying to allocate 9k of contiguous space, and it's
>>>>>>   failing
>>>>>>
>>>>>> Please correct me if I'm off here, I'd love to understand more.
>>>>>>
>>>>>> On Thu, Mar 20, 2014 at 6:13 PM, Garrett Wollman <
>>>>>> wollman@hergotha.csail.mit.edu> wrote:
>>>>>>
>>>>>>> In article <...>, csforgeron@gmail.com writes:
>>>>>>>
>>>>>>>> 50/27433/0 requests for jumbo clusters denied (4k/9k/16k)
>>>>>>>
>>>>>>> This is going to screw you. You need to make sure that no NIC
>>>>>>> driver ever allocates 9k jumbo pages -- unless you are using one of
>>>>>>> those mythical drivers that can't do scatter/gather DMA on receive,
>>>>>>> which you don't appear to be.
>>>>>>>
>>>>>>> These failures occur when the driver is trying to replenish its
>>>>>>> receive queue but is unable to allocate three *physically*
>>>>>>> contiguous pages of RAM to construct the 9k jumbo cluster (of which
>>>>>>> the remaining 3k is simply wasted). This happens on any moderately
>>>>>>> active server, once physical memory gets checkerboarded with active
>>>>>>> single pages, particularly with ZFS, where those pages are wired in
>>>>>>> kernel memory and so can't be evicted.
>>>>>>>
>>>>>>> -GAWollman
>>>>>>
>>>>>> _______________________________________________
>>>>>> freebsd-net@freebsd.org mailing list
>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>>>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"