Date:      Thu, 06 Mar 2014 22:45:55 +0100
From:      "Dr. A. Haakh" <bugReporter@Haakh.de>
To:        freebsd-net@freebsd.org
Subject:   Re: 9.2 ixgbe tx queue hang
Message-ID:  <5318EC93.4000303@Haakh.de>
In-Reply-To: <02AD7510-C862-4C29-9420-25ABF1A6E744@hostpoint.ch>
References:  <9C5B43BD-9D80-49EA-8EDC-C7EF53D79C8D@hostpoint.ch> <CAFOYbcmrVms7VJmPCZHCTMDvBfsV775aDFkHhMrGAEAtPx8-Mw@mail.gmail.com> <02AD7510-C862-4C29-9420-25ABF1A6E744@hostpoint.ch>


Markus Gebert wrote:
> On 06.03.2014, at 19:33, Jack Vogel <jfvogel@gmail.com> wrote:
>
>> You did not make it explicit before, but I noticed in your dtrace info
>> that you are using lagg. It's been the source of lots of problems, so
>> please take it out of the setup and see if this queue problem still
>> happens.
>>
>> Jack
> Well, last year when upgrading another batch of servers (same hardware) to 9.2, we tried to find a solution to this network problem, and we eliminated lagg where we had used it before, which did not help at all. That’s why I didn’t mention it explicitly.
>
> My point is, I can confirm that 9.2 has network problems on this same hardware with or without lagg, so it’s unlikely that removing it will bring immediate success. OTOH, I didn’t have this tx queue theory back then, so I cannot be sure that what we saw then without lagg, and what we see now with lagg, really are the same problem.
>
> I guess, for the sake of simplicity, I will remove lagg on these new systems. But before I do that, to save time, I wanted to ask whether I should remove the vlan interfaces too. While that didn’t help either last year, my guess is that I should take them out of the picture, unless you say otherwise.
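>
> (For reference, a lagg-with-vlans setup of the kind in question
> typically looks roughly like this in rc.conf -- the interface names,
> vlan tag and address here are made up:
>
> cloned_interfaces="lagg0"
> ifconfig_lagg0="laggproto lacp laggport ix0 laggport ix1"
> vlans_lagg0="4"
> ifconfig_lagg0_4="inet 10.0.4.4/24"
>
> so removing lagg would mean attaching the vlan directly to one of the
> physical interfaces instead.)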
>
> Thanks for looking into this.
>
>
> Markus
>
I don't use ixgbe, but this might be related to the problem being
discussed. I too noticed network problems when I moved from 9.1 to 9.2
last October. Occasionally I use vlc to watch TV on udp://@224.0.0.1:7792
coming from an XP system; the stream displayed perfectly on 9.1 but got
scrambled on 9.2. By accident I noticed that vlc worked fine again
whenever a cpu-intensive job like portupgrade -a was running. So I
thought it might be a problem related to the scheduler.

In the meantime I upgraded to 10.0-STABLE and things look better now --
though it still takes about 20 seconds for a video stream to get
synchronized.

My system is
CPU: Intel(R) Core(TM) i5 CPU         750  @ 2.67GHz (2675.02-MHz K8-class CPU)
   Origin = "GenuineIntel"  Id = 0x106e5  Family = 0x6  Model = 0x1e  Stepping = 5
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x98e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT>
   AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
   AMD Features2=0x1<LAHF>
   TSC: P-state invariant, performance statistics
real memory  = 12884901888 (12288 MB)
avail memory = 12438151168 (11861 MB)

with this ethernet card:
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xd800-0xd8ff mem 0xf6fff000-0xf6ffffff,0xf6ff8000-0xf6ffbfff irq 19 at device 0.0 on pci2
re0: Using 1 MSI-X message
re0: Chip rev. 0x28000000
re0: MAC rev. 0x00300000
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 
100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 
1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 
1000baseT-FDX-flow-master, auto, auto-flow
re0: Ethernet address: 90:e6:ba:bb:28:3e


Andreas

>
>> On Thu, Mar 6, 2014 at 2:24 AM, Markus Gebert <markus.gebert@hostpoint.ch> wrote:
>>
>>> (creating a new thread, because I'm no longer sure this is related to
>>> Johan's thread that I originally used to discuss this)
>>>
>>> On 27.02.2014, at 18:02, Jack Vogel <jfvogel@gmail.com> wrote:
>>>
>>>> I would make SURE that you have enough mbuf resources of whatever size
>>>> pool you are using (2, 4, 9K), and I would try the code in HEAD if you
>>>> have not.
>>>>
>>>> Jack
>>> Jack, we've upgraded some other systems on which I get more time to
>>> debug (no impact for customers). Although those systems use the
>>> nfsclient too, I no longer think that NFS is the source of the problem
>>> (hence the new thread). I think it's the ixgbe driver and/or the card.
>>> When our problem occurs, it looks like a single tx queue gets stuck
>>> somehow (its buf_ring remains full).
>>>
>>> I tracked ping using dtrace to determine the source of the ENOBUFS it
>>> returns every few packets when things get weird:
>>>
>>> # dtrace -n 'fbt:::return / arg1 == ENOBUFS && execname == "ping" / {
>>> stack(); }'
>>> dtrace: description 'fbt:::return ' matched 25476 probes
>>> CPU     ID                    FUNCTION:NAME
>>> 26   7730            ixgbe_mq_start:return
>>>               if_lagg.ko`lagg_transmit+0xc4
>>>               kernel`ether_output_frame+0x33
>>>               kernel`ether_output+0x4fe
>>>               kernel`ip_output+0xd74
>>>               kernel`rip_output+0x229
>>>               kernel`sosend_generic+0x3f6
>>>               kernel`kern_sendit+0x1a3
>>>               kernel`sendit+0xdc
>>>               kernel`sys_sendto+0x4d
>>>               kernel`amd64_syscall+0x5ea
>>>               kernel`0xffffffff80d35667
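>>>
>>> The same check can be narrowed to that one function and aggregated to
>>> count hits per stack (standard D syntax; a sketch, I didn't run this
>>> exact one):
>>>
>>> # dtrace -n 'fbt::ixgbe_mq_start:return / arg1 == ENOBUFS / {
>>> @[stack()] = count(); }'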
>>>
>>>
>>>
>>> The only way ixgbe_mq_start() could return ENOBUFS is when
>>> drbr_enqueue() encounters a full tx buf_ring. Since a new ping packet
>>> probably has no flow id, it should be assigned to a queue based on
>>> curcpu, which made me try pinning ping to single cpus to check whether
>>> it's always the same tx buf_ring that reports being full. This turned
>>> out to be true:
>>>
>>> # cpuset -l 0 ping 10.0.4.5
>>> PING 10.0.4.5 (10.0.4.5): 56 data bytes
>>> 64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.347 ms
>>> 64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.135 ms
>>>
>>> # cpuset -l 1 ping 10.0.4.5
>>> PING 10.0.4.5 (10.0.4.5): 56 data bytes
>>> 64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.184 ms
>>> 64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.232 ms
>>>
>>> # cpuset -l 2 ping 10.0.4.5
>>> PING 10.0.4.5 (10.0.4.5): 56 data bytes
>>> ping: sendto: No buffer space available
>>> ping: sendto: No buffer space available
>>> ping: sendto: No buffer space available
>>> ping: sendto: No buffer space available
>>> ping: sendto: No buffer space available
>>>
>>> # cpuset -l 3 ping 10.0.4.5
>>> PING 10.0.4.5 (10.0.4.5): 56 data bytes
>>> 64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.130 ms
>>> 64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.126 ms
>>> [...snip...]
>>>
>>> The system has 32 cores. If ping runs on cpu 2, 10, 18 or 26, which use
>>> the third tx buf_ring, ping reliably returns ENOBUFS. If ping is run on
>>> any other cpu, using any other tx queue, it runs without any packet loss.
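>>>
>>> To make that mapping concrete, here is a tiny standalone program (a
>>> sketch, not driver code) that just does the modulo arithmetic the queue
>>> assignment implies; I'm assuming 8 tx queues, which the stride of 8
>>> between the affected cpus suggests:
>>>
>>> #include <stdio.h>
>>>
>>> int
>>> main(void)
>>> {
>>>         const int ncpus = 32;   /* cores in this system */
>>>         const int nqueues = 8;  /* tx queues (assumed) */
>>>         int cpu;
>>>
>>>         /* Without a flow id, a packet is enqueued on tx ring
>>>          * (curcpu % nqueues); print the cpus that share ring 2,
>>>          * the third ring, whose buf_ring is the stuck one. */
>>>         for (cpu = 0; cpu < ncpus; cpu++)
>>>                 if (cpu % nqueues == 2)
>>>                         printf("cpu %2d -> tx ring 2\n", cpu);
>>>         return (0);
>>> }
>>>
>>> It prints cpus 2, 10, 18 and 26 -- exactly the set on which ping gets
>>> ENOBUFS.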
>>>
>>> So, when ENOBUFS is returned, this is not due to an mbuf shortage; it's
>>> because the buf_ring is full. Not surprisingly, netstat -m looks pretty
>>> normal:
>>>
>>> # netstat -m
>>> 38622/11823/50445 mbufs in use (current/cache/total)
>>> 32856/11642/44498/132096 mbuf clusters in use (current/cache/total/max)
>>> 32824/6344 mbuf+clusters out of packet secondary zone in use
>>> (current/cache)
>>> 16/3906/3922/66048 4k (page size) jumbo clusters in use
>>> (current/cache/total/max)
>>> 0/0/0/33024 9k jumbo clusters in use (current/cache/total/max)
>>> 0/0/0/16512 16k jumbo clusters in use (current/cache/total/max)
>>> 75431K/41863K/117295K bytes allocated to network (current/cache/total)
>>> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>>> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>>> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>>> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>>> 0/0/0 sfbufs in use (current/peak/max)
>>> 0 requests for sfbufs denied
>>> 0 requests for sfbufs delayed
>>> 0 requests for I/O initiated by sendfile
>>> 0 calls to protocol drain routines
>>>
>>> In the meantime I've checked the commit log of the ixgbe driver in HEAD,
>>> and apart from the fact that there is little difference between HEAD and
>>> 9.2, I don't see a commit that fixes anything related to what we're
>>> seeing...
>>>
>>> So, what's the conclusion here? Firmware bug that's only triggered under
>>> 9.2? Driver bug introduced between 9.1 and 9.2 when new multiqueue stuff
>>> was added? Jack, how should we proceed?
>>>
>>>
>>> Markus
>>>
>>>
>>>
>>> On Thu, Feb 27, 2014 at 8:05 AM, Markus Gebert <markus.gebert@hostpoint.ch> wrote:
>>>
>>>> On 27.02.2014, at 02:00, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>>>
>>>>> John Baldwin wrote:
>>>>>> On Tuesday, February 25, 2014 2:19:01 am Johan Kooijman wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a weird situation here where I can't get my head around.
>>>>>>>
>>>>>>> One FreeBSD 9.2-STABLE ZFS/NFS box, multiple Linux clients. Once
>>>>>>> in a while the Linux clients lose their NFS connection:
>>>>>>>
>>>>>>> Feb 25 06:24:09 hv3 kernel: nfs: server 10.0.24.1 not responding,
>>>>>>> timed out
>>>>>>>
>>>>>>> Not all boxes, just one out of the cluster. The weird part is that
>>>>>>> when I try to ping a Linux client from the FreeBSD box, I have
>>>>>>> between 10 and 30% packet loss - all day long, no specific
>>>>>>> timeframe. If I ping the Linux clients - no loss. If I ping back
>>>>>>> from the Linux clients to the FreeBSD box - no loss.
>>>>>>>
>>>>>>> The error I get when pinging a Linux client is this one:
>>>>>>> ping: sendto: File too large
>>>> We were facing similar problems when upgrading to 9.2 and have stayed
>>>> with 9.1 on affected systems for now. We've seen this on HP G8 blades
>>>> with 82599EB controllers:
>>>>
>>>> ix0@pci0:4:0:0: class=0x020000 card=0x18d0103c chip=0x10f88086 rev=0x01
>>>> hdr=0x00
>>>>    vendor     = 'Intel Corporation'
>>>>    device     = '82599EB 10 Gigabit Dual Port Backplane Connection'
>>>>    class      = network
>>>>    subclass   = ethernet
>>>>
>>>> We didn't find a way to trigger the problem reliably. But when it occurs,
>>>> it usually affects only one interface. Symptoms include:
>>>>
>>>> - socket functions return the 'File too large' error mentioned by Johan
>>>> - socket functions return 'No buffer space available'
>>>> - heavy to full packet loss on the affected interface
>>>> - "stuck" TCP connections, i.e. ESTABLISHED TCP connections that should
>>>> have timed out stick around forever (the socket on the other side could
>>>> have been closed hours ago)
>>>> - userland programs using the corresponding sockets usually got stuck too
>>>> (can't find kernel traces right now, but always in network-related
>>>> syscalls)
>>>> Network is only lightly loaded on the affected systems (usually 5-20
>>>> mbit, capped at 200 mbit, per server), and netstat never showed any
>>>> indication of resource shortage (like mbufs).
>>>>
>>>> What made the problem go away temporarily was to ifconfig the affected
>>>> interface down and up again.
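>>>>
>>>> (I.e. something like "ifconfig ix0 down; ifconfig ix0 up"; substitute
>>>> the actual interface name.)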
>>>>
>>>> We tested a 9.2 kernel with the 9.1 ixgbe driver, which was not really
>>>> stable. Also, we tested a few revisions between 9.1 and 9.2 to find out
>>>> when the problem started. Unfortunately, the ixgbe driver turned out to
>>>> be mostly unstable on our systems between these releases, worse than on
>>>> 9.2. The instability was introduced shortly after 9.1 and fixed only
>>>> very shortly before the 9.2 release. So no luck there. We ended up using
>>>> 9.1 with backports of the 9.2 features we really need.
>>>>
>>>> What we can't tell is whether it's the 9.2 kernel or the 9.2 ixgbe
>>>> driver, or a combination of both, that causes these problems.
>>>> Unfortunately we ran out of time (and ideas).
>>>>
>>>>
>>>>>> EFBIG is sometimes used by drivers when a packet takes too many
>>>>>> scatter/gather entries.  Since you mentioned NFS, one thing you can
>>>>>> try is to disable TSO on the interface you are using for NFS to see
>>>>>> if that "fixes" it.
>>>>>>
>>>>> And please email if you try it and let us know if it helps.
>>>>>
>>>>> I think I've figured out how 64K NFS read replies can do this,
>>>>> but I'll admit "ping" is a mystery. (Doesn't it just send a single
>>>>> packet that would fit in a single mbuf?)
>>>>>
>>>>> I think the EFBIG is returned by bus_dmamap_load_mbuf_sg(), but I
>>>>> don't know if it can happen for an mbuf chain with < 32 entries?
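>>>>>
>>>>> (The arithmetic I have in mind, for what it's worth: a 64K reply
>>>>> carried in 2K mbuf clusters makes a chain of 64K / 2K = 32 clusters
>>>>> plus a leading header mbuf, i.e. 33 entries, just over a 32-entry
>>>>> scatter/gather map.)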
>>>> We don't use the nfs server on our systems, but they're (new)nfsclients.
>>>> So I don't think our problem is nfs related, unless the default
>>>> rsize/wsize for client mounts is not 8K, which I thought it was. Can you
>>>> confirm this, Rick?
>>>>
>>>> IIRC, disabling TSO did not make any difference in our case.
>>>>
>>>>
>>>> Markus
>>>>
>>>>
>>>
>>>
>>>
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>>
>


