Date: Thu, 06 Mar 2014 22:45:55 +0100
From: "Dr. A. Haakh" <bugReporter@Haakh.de>
To: freebsd-net@freebsd.org
Subject: Re: 9.2 ixgbe tx queue hang

Markus Gebert wrote:
> On 06.03.2014, at 19:33, Jack Vogel wrote:
>
>> You did not make it explicit before, but I noticed in your dtrace info
>> that you are using lagg. It's been the source of lots of problems, so
>> take it out of the setup and see if this queue problem still happens,
>> please.
>>
>> Jack
>
> Well, last year when upgrading another batch of servers (same hardware)
> to 9.2, we tried to find a solution to this network problem, and we
> eliminated lagg where we had used it before, which did not help at all.
> That's why I didn't mention it explicitly.
>
> My point is, I can confirm that 9.2 has network problems on this same
> hardware with or without lagg, so it's unlikely that removing it will
> bring immediate success. OTOH, I didn't have this tx queue theory back
> then, so I cannot be sure that what we saw then without lagg, and what
> we see now with lagg, really are the same problem.
>
> I guess, for the sake of simplicity, I will remove lagg on these new
> systems. But before I do that, to save time, I wanted to ask whether I
> should remove vlan interfaces too?
> While that didn't help either last year, my guess is that I should take
> them out of the picture, unless you say otherwise.
>
> Thanks for looking into this.
>
>
> Markus
>

I don't use ixgbe, but this might be related to the problem being discussed.
I too noticed network problems when I moved from 9.1 to 9.2 last October.
Occasionally I use vlc to watch TV on udp://@224.0.0.1:7792 coming from an
XP system; it displayed perfectly on 9.1 but got scrambled on 9.2. By
accident I noticed that vlc worked fine again when I had a CPU-intensive
job like portupgrade -a running, so I thought it might be a problem related
to the scheduler. In the meantime I have upgraded to 10.0-STABLE and things
look better now -- it still takes about 20 seconds for a video stream to
get synchronized.

My system is:

CPU: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz (2675.02-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x106e5  Family = 0x6  Model = 0x1e  Stepping = 5
  Features=0xbfebfbff
  Features2=0x98e3fd
  AMD Features=0x28100800
  AMD Features2=0x1
  TSC: P-state invariant, performance statistics
real memory  = 12884901888 (12288 MB)
avail memory = 12438151168 (11861 MB)

with this ethernet card:

re0: port 0xd800-0xd8ff mem 0xf6fff000-0xf6ffffff,0xf6ff8000-0xf6ffbfff irq 19 at device 0.0 on pci2
re0: Using 1 MSI-X message
re0: Chip rev. 0x28000000
re0: MAC rev. 0x00300000
miibus0: on re0
rgephy0: PHY 1 on miibus0
rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX,
100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master,
1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re0: Ethernet address: 90:e6:ba:bb:28:3e

Andreas

>
>> On Thu, Mar 6, 2014 at 2:24 AM, Markus Gebert wrote:
>>
>>> (creating a new thread, because I'm no longer sure this is related to
>>> Johan's thread that I originally used to discuss this)
>>>
>>> On 27.02.2014, at 18:02, Jack Vogel wrote:
>>>
>>>> I would make SURE that you have enough mbuf resources of whatever size
>>>> pool you are using (2, 4, 9K), and I would try the code in HEAD if you
>>>> had not.
>>>>
>>>> Jack
>>>
>>> Jack, we've upgraded some other systems on which I get more time to
>>> debug (no impact for customers). Although those systems use the
>>> nfsclient too, I no longer think that NFS is the source of the problem
>>> (hence the new thread). I think it's the ixgbe driver and/or card. When
>>> our problem occurs, it looks like it's a single tx queue that gets
>>> stuck somehow (its buf_ring remains full).
>>>
>>> I tracked ping using dtrace to determine the source of the ENOBUFS it
>>> returns every few packets when things get weird:
>>>
>>> # dtrace -n 'fbt:::return / arg1 == ENOBUFS && execname == "ping" / { stack(); }'
>>> dtrace: description 'fbt:::return ' matched 25476 probes
>>> CPU     ID                    FUNCTION:NAME
>>>  26   7730          ixgbe_mq_start:return
>>>               if_lagg.ko`lagg_transmit+0xc4
>>>               kernel`ether_output_frame+0x33
>>>               kernel`ether_output+0x4fe
>>>               kernel`ip_output+0xd74
>>>               kernel`rip_output+0x229
>>>               kernel`sosend_generic+0x3f6
>>>               kernel`kern_sendit+0x1a3
>>>               kernel`sendit+0xdc
>>>               kernel`sys_sendto+0x4d
>>>               kernel`amd64_syscall+0x5ea
>>>               kernel`0xffffffff80d35667
>>>
>>> The only way ixgbe_mq_start could return ENOBUFS would be when
>>> drbr_enqueue() encounters a full tx buf_ring. Since a new ping packet
>>> probably has no flow id, it should be assigned to a queue based on
>>> curcpu, which made me try to pin ping to single CPUs to check whether
>>> it's always the same tx buf_ring that reports being full.
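For context, the queue selection in the 9.2-era ixgbe_mq_start() works
roughly like the sketch below. This is a simplified illustration, not the
verbatim driver source; the function name mq_start_sketch() and the
nqueues/brings parameters are invented for readability, while M_FLOWID,
curcpu, m_pkthdr.flowid and drbr_enqueue() are the actual kernel symbols
involved.

#include <sys/param.h>
#include <sys/buf_ring.h>
#include <sys/mbuf.h>
#include <sys/pcpu.h>           /* curcpu */
#include <net/if.h>
#include <net/if_var.h>         /* drbr_enqueue() */

/*
 * Simplified sketch of the 9.2-era tx queue selection in ixgbe_mq_start().
 * Illustration only -- not the verbatim driver code.
 */
static int
mq_start_sketch(struct ifnet *ifp, struct buf_ring **brings, int nqueues,
    struct mbuf *m)
{
        int i;

        if (m->m_flags & M_FLOWID)
                /* Packets carrying a flow id (e.g. TCP) hash onto a ring. */
                i = m->m_pkthdr.flowid % nqueues;
        else
                /* Everything else -- like ping's raw socket -- follows curcpu. */
                i = curcpu % nqueues;

        /*
         * drbr_enqueue() returns ENOBUFS when the chosen buf_ring is full,
         * which is the "No buffer space available" that ping reports.
         */
        return (drbr_enqueue(ifp, brings[i], m));
}

With this mapping, a process pinned to one CPU always hits the same ring,
so on a 32-core box several CPUs (here 2, 10, 18 and 26) can all land on a
single stuck ring -- which is exactly what the cpuset experiment below shows.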
>>> This turned out to be true:
>>>
>>> # cpuset -l 0 ping 10.0.4.5
>>> PING 10.0.4.5 (10.0.4.5): 56 data bytes
>>> 64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.347 ms
>>> 64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.135 ms
>>>
>>> # cpuset -l 1 ping 10.0.4.5
>>> PING 10.0.4.5 (10.0.4.5): 56 data bytes
>>> 64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.184 ms
>>> 64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.232 ms
>>>
>>> # cpuset -l 2 ping 10.0.4.5
>>> PING 10.0.4.5 (10.0.4.5): 56 data bytes
>>> ping: sendto: No buffer space available
>>> ping: sendto: No buffer space available
>>> ping: sendto: No buffer space available
>>> ping: sendto: No buffer space available
>>> ping: sendto: No buffer space available
>>>
>>> # cpuset -l 3 ping 10.0.4.5
>>> PING 10.0.4.5 (10.0.4.5): 56 data bytes
>>> 64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.130 ms
>>> 64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.126 ms
>>> [...snip...]
>>>
>>> The system has 32 cores. If ping runs on CPU 2, 10, 18 or 26, which use
>>> the third tx buf_ring, ping reliably returns ENOBUFS. If ping is run on
>>> any other CPU, using any other tx queue, it runs without any packet loss.
>>>
>>> So, when ENOBUFS is returned, this is not due to an mbuf shortage, it's
>>> because the buf_ring is full. Not surprisingly, netstat -m looks pretty
>>> normal:
>>>
>>> # netstat -m
>>> 38622/11823/50445 mbufs in use (current/cache/total)
>>> 32856/11642/44498/132096 mbuf clusters in use (current/cache/total/max)
>>> 32824/6344 mbuf+clusters out of packet secondary zone in use (current/cache)
>>> 16/3906/3922/66048 4k (page size) jumbo clusters in use (current/cache/total/max)
>>> 0/0/0/33024 9k jumbo clusters in use (current/cache/total/max)
>>> 0/0/0/16512 16k jumbo clusters in use (current/cache/total/max)
>>> 75431K/41863K/117295K bytes allocated to network (current/cache/total)
>>> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>>> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>>> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>>> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>>> 0/0/0 sfbufs in use (current/peak/max)
>>> 0 requests for sfbufs denied
>>> 0 requests for sfbufs delayed
>>> 0 requests for I/O initiated by sendfile
>>> 0 calls to protocol drain routines
>>>
>>> In the meantime I've checked the commit log of the ixgbe driver in HEAD,
>>> and aside from a few small differences between HEAD and 9.2, I don't see
>>> a commit that fixes anything related to what we're seeing...
>>>
>>> So, what's the conclusion here? A firmware bug that's only triggered
>>> under 9.2? A driver bug introduced between 9.1 and 9.2 when the new
>>> multiqueue stuff was added? Jack, how should we proceed?
>>>
>>>
>>> Markus
>>>
>>>
>>> On Thu, Feb 27, 2014 at 8:05 AM, Markus Gebert wrote:
>>>
>>>> On 27.02.2014, at 02:00, Rick Macklem wrote:
>>>>
>>>>> John Baldwin wrote:
>>>>>> On Tuesday, February 25, 2014 2:19:01 am Johan Kooijman wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a weird situation here that I can't get my head around.
>>>>>>>
>>>>>>> One FreeBSD 9.2-STABLE ZFS/NFS box, multiple Linux clients. Once in
>>>>>>> a while the Linux clients lose their NFS connection:
>>>>>>>
>>>>>>> Feb 25 06:24:09 hv3 kernel: nfs: server 10.0.24.1 not responding, timed out
>>>>>>>
>>>>>>> Not all boxes, just one out of the cluster.
>>>>>>> The weird part is that when I try to ping a Linux client from the
>>>>>>> FreeBSD box, I have between 10 and 30% packet loss - all day long,
>>>>>>> no specific timeframe. If I ping the Linux clients - no loss. If I
>>>>>>> ping back from the Linux clients to the FBSD box - no loss.
>>>>>>>
>>>>>>> The error I get when pinging a Linux client is this one:
>>>>>>> ping: sendto: File too large
>>>>
>>>> We were facing similar problems when upgrading to 9.2 and have stayed
>>>> with 9.1 on affected systems for now. We've seen this on HP G8 blades
>>>> with 82599EB controllers:
>>>>
>>>> ix0@pci0:4:0:0: class=0x020000 card=0x18d0103c chip=0x10f88086 rev=0x01 hdr=0x00
>>>>     vendor   = 'Intel Corporation'
>>>>     device   = '82599EB 10 Gigabit Dual Port Backplane Connection'
>>>>     class    = network
>>>>     subclass = ethernet
>>>>
>>>> We didn't find a way to trigger the problem reliably. But when it
>>>> occurs, it usually affects only one interface. Symptoms include:
>>>>
>>>> - socket functions return the 'File too large' error mentioned by Johan
>>>> - socket functions return 'No buffer space available'
>>>> - heavy to full packet loss on the affected interface
>>>> - "stuck" TCP connections, i.e. ESTABLISHED TCP connections that should
>>>>   have timed out stick around forever (the socket on the other side
>>>>   could have been closed hours ago)
>>>> - userland programs using the corresponding sockets usually got stuck
>>>>   too (can't find kernel traces right now, but always in network
>>>>   related syscalls)
>>>>
>>>> The network is only lightly loaded on the affected systems (usually
>>>> 5-20 mbit, capped at 200 mbit, per server), and netstat never showed
>>>> any indication of resource shortage (like mbufs).
>>>>
>>>> What made the problem go away temporarily was to ifconfig down/up the
>>>> affected interface.
>>>>
>>>> We tested a 9.2 kernel with the 9.1 ixgbe driver, which was not really
>>>> stable. Also, we tested a few revisions between 9.1 and 9.2 to find out
>>>> when the problem started. Unfortunately, the ixgbe driver turned out to
>>>> be mostly unstable on our systems between these releases, worse than on
>>>> 9.2. The instability was introduced shortly after 9.1 and fixed only
>>>> very shortly before the 9.2 release. So no luck there. We ended up
>>>> using 9.1 with backports of the 9.2 features we really need.
>>>>
>>>> What we can't tell is whether it's the 9.2 kernel or the 9.2 ixgbe
>>>> driver, or a combination of both, that causes these problems.
>>>> Unfortunately we ran out of time (and ideas).
>>>>
>>>>
>>>>>> EFBIG is sometimes used by drivers when a packet takes too many
>>>>>> scatter/gather entries. Since you mentioned NFS, one thing you can
>>>>>> try is to disable TSO on the interface you are using for NFS to see
>>>>>> if that "fixes" it.
>>>>>>
>>>>> And please email if you try it and let us know if it helps.
>>>>>
>>>>> I think I've figured out how 64K NFS read replies can do this,
>>>>> but I'll admit "ping" is a mystery? (Doesn't it just send a single
>>>>> packet that would be in a single mbuf?)
>>>>>
>>>>> I think the EFBIG is returned by bus_dmamap_load_mbuf_sg(), but I
>>>>> don't know if it can happen for an mbuf chain with < 32 entries?
>>>>
>>>> We don't use the nfs server on our systems, but they're (new)nfsclients.
>>>> So I don't think our problem is nfs related, unless the default
>>>> rsize/wsize for client mounts is not 8K, which I thought it was. Can
>>>> you confirm this, Rick?
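As an aside, the "too many scatter/gather entries" path John and Rick
describe usually follows the pattern sketched below in a driver's transmit
routine: bus_dmamap_load_mbuf_sg() fails with EFBIG, the driver compacts the
mbuf chain with m_defrag() and retries. This is an illustrative sketch only,
not the actual ixgbe transmit code; the function name and the fixed
32-segment array are placeholders (32 simply mirrors the "< 32 entries"
limit Rick mentions above).

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bus.h>
#include <sys/mbuf.h>
#include <machine/bus.h>

/*
 * Illustrative sketch of the usual EFBIG -> m_defrag() -> retry pattern
 * in a transmit routine.  Not the verbatim ixgbe code.
 */
static int
xmit_load_sketch(bus_dma_tag_t tag, bus_dmamap_t map, struct mbuf **m_headp)
{
        bus_dma_segment_t segs[32];     /* placeholder segment limit */
        struct mbuf *m;
        int error, nsegs;

        error = bus_dmamap_load_mbuf_sg(tag, map, *m_headp, segs, &nsegs,
            BUS_DMA_NOWAIT);
        if (error == EFBIG) {
                /* Too many scatter/gather entries: compact the chain, retry. */
                m = m_defrag(*m_headp, M_NOWAIT);
                if (m == NULL) {
                        m_freem(*m_headp);
                        *m_headp = NULL;
                        return (ENOBUFS);
                }
                *m_headp = m;
                error = bus_dmamap_load_mbuf_sg(tag, map, *m_headp, segs,
                    &nsegs, BUS_DMA_NOWAIT);
        }
        return (error);
}

A TSO-sized 64K reply is the kind of packet that can build a chain long
enough to hit this path, which is presumably why disabling TSO was suggested
as a test; it also underlines why a single-mbuf ping packet returning EFBIG
remains puzzling.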
>>>> IIRC, disabling TSO did not make any difference in our case.
>>>>
>>>>
>>>> Markus
>>>
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"