Date:      Mon, 24 Mar 2014 18:47:13 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Christopher Forgeron <csforgeron@gmail.com>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>, Garrett Wollman <wollman@freebsd.org>, Jack Vogel <jfvogel@gmail.com>, Markus Gebert <markus.gebert@hostpoint.ch>
Subject:   Re: 9.2 ixgbe tx queue hang
Message-ID:  <1149589960.2454997.1395701233190.JavaMail.root@uoguelph.ca>
In-Reply-To: <CAB2_NwCHM9D1HZSMsuQQ-dYNAt-t2721jKqfO=2h3M4qdumY7w@mail.gmail.com>

Christopher Forgeron wrote:
> I'm going to split this into different posts to focus on each topic.
> This is about setting IP_MAXPACKET to 65495.
> 
> Update on Last Night's Run:
> 
> (Last night's run is a kernel with IP_MAXPACKET = 65495)
> 
> - Uptime on this run: 10:53AM  up 13:21, 5 users, load averages:
>   1.98, 2.09, 2.13
> - Ping logger records no ping errors for the entire run.
> - At Mar 24th 10:57 I did a grep through the night's log for 'before'
>   (which is the printf logging that Rick suggested a few days ago), and
>   saved it to before_total.txt
> - With wc -l on before_total.txt I can see that we have 504 lines,
>   thus 504 incidents of the packet being above IP_MAXPACKET during
>   this run.
> - I did
>     tr -c '[:alnum:]' '[\n*]' < before_total.txt | sort | uniq -c | sort -nr | head -50
>   to list the most common words. Ignoring the non-pklen output, the
>   relevant output is:
> 
>  344 65498 (3)
>  330 65506 (11)
>  330 65502 (7)
> 
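For anyone who wants to repeat the word count above without the shell pipeline, here is a rough Python equivalent. It is only a sketch: the function name `top_tokens` and the sample log lines are made up for illustration (the real log format is not shown in this thread), it assumes ASCII alphanumerics, and unlike the `tr | sort | uniq -c` pipeline it drops empty tokens instead of counting them.

```python
import re
from collections import Counter

def top_tokens(text, n=50):
    """Split on runs of non-alphanumeric characters and count each token,
    most frequent first (roughly tr | sort | uniq -c | sort -nr | head)."""
    tokens = [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]
    return Counter(tokens).most_common(n)

# Hypothetical log lines, only to exercise the function.
sample_log = (
    "before pklen=65498\n"
    "before pklen=65498\n"
    "before pklen=65506\n"
)

for token, count in top_tokens(sample_log):
    print(count, token)
```

On the real file this would be called as `top_tokens(open("before_total.txt").read())`.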
This makes sense to me, since tp->t_tsomax is used in tcp_output() to
limit the TCP/IP packet, which does not include the link level (ethernet)
header. Once that header is added, I would expect the length to be up to
14 (or maybe 18 for vlan cases) greater than IP_MAXPACKET. Since none of
these are greater than 65509 (65495 + 14), this looks fine to me.

So, unless you get ones greater than (65495 + 18 = 65513), this makes
sense and does not indicate a problem.
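
The arithmetic above can be checked with a few lines of Python. This is only a sketch, not kernel code: it assumes the three logged pklen values include the link level header, and the constant and function names are made up for illustration.

```python
# Illustrative arithmetic only; these names are hypothetical, not kernel symbols.
IP_MAXPACKET = 65495        # value compiled into the test kernel for this run
ETHER_HDR_LEN = 14          # plain ethernet header
ETHER_VLAN_HDR_LEN = 18     # ethernet header plus a vlan tag

observed = [65498, 65506, 65502]    # pklen values reported in the log

def overrun(pklen, limit=IP_MAXPACKET):
    """Bytes by which pklen exceeds the limit."""
    return pklen - limit

def explained_by_link_header(pklen, hdr_len=ETHER_VLAN_HDR_LEN,
                             limit=IP_MAXPACKET):
    """True if the overshoot is no larger than the link level header."""
    return overrun(pklen, limit) <= hdr_len

for pklen in observed:
    print(pklen, overrun(pklen), explained_by_link_header(pklen))
```

All three logged sizes fall within even the 14 byte case; only a length above 65513 would fail the 18 byte check.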

In another post, you indicate that having the driver set if_hw_tsomax
didn't set tp->t_tsomax to the same value.
--> I believe that is a bug and would mean my ixgbe.patch would not
    fix the problem, because it is tp->t_tsomax that must be decreased
    to no more than (65536 - 18 = 65518).
    --> Now, have you tried a case between 65495 and 65518 and seen
        any EFBIG errors?
        If so, then I don't understand why 65518 isn't small enough.
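
To make the bound above concrete, here is a small Python sketch of the same arithmetic. It is not kernel code: the 65536 total and the 18 byte header are the figures from this message, and the names `max_tsomax` and `fits` are made up for illustration.

```python
# Illustrative arithmetic only; these names are hypothetical, not kernel symbols.
TOTAL_LIMIT = 65536         # total size limit used in the message above
MAX_LINK_HDR_LEN = 18       # ethernet header with a vlan tag

def max_tsomax(total_limit=TOTAL_LIMIT, link_hdr_len=MAX_LINK_HDR_LEN):
    """Largest TCP/IP packet size that still fits once the header is added."""
    return total_limit - link_hdr_len

def fits(tsomax, total_limit=TOTAL_LIMIT, link_hdr_len=MAX_LINK_HDR_LEN):
    """True if a packet of tsomax bytes plus the link header stays in bounds."""
    return tsomax + link_hdr_len <= total_limit

for tsomax in (65495, 65518, 65535):
    print(tsomax, fits(tsomax))
```

Under these assumptions 65495 and 65518 both fit, while the default 65535 does not.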

rick

>  - First # being the # of times. (Each pklen is printed twice on the
>    log, thus 2x the total line count).
>  - Last (#) being the byte overrun from 65495
>  - A fairly even distribution of each type of packet overrun.
> 
>  You will recall that my IP_MAXPACKET is 65495, so each of these
>  packet lengths represents an overshoot.
> 
>  The fact that we have only 3 different types of overrun is good - It
> suggests a non-random event, more like a broken 'if' statement for a
> particular case.
> 
I think it just means that your load happens to do only 3 sizes of I/O
that are a little less than 65536.

>  If IP_MAXPACKET was set to 65535 as it normally is, I would have had
>  504 incidents of errors, with a chance that any one of them could
>  have blocked the queue for considerable time.
> 
If tp->t_tsomax hasn't been set to a smaller value than 65535, the
ixgbe.patch didn't do what I thought it would.

>  Question: Should there be logic that discards packets that are over
>  IP_MAXPACKET to ensure that we don't end up in a blocked queue
>  situation again?
> 
> 
>  Moving forward, I am doing two things:
> 
>  1) I'm running a longer test with TSO disabled on my ix0 adapter. I
>  want to make sure that over, say, 4 hours I don't have even 1 packet
>  over 65495. This will at least localize the issue to TSO-related code.
> 
>  2) I have tcpdump running, to see if I can capture the packets over
>  65495. Here is my command. Any suggestions on additional switches I
>  should include?
> 
> tcpdump -ennvvXS greater 65495
> 
> I'll report in on this again once I have new info.
> 
> Thanks for reading.
> 
> On Mon, Mar 24, 2014 at 2:14 AM, Christopher Forgeron
> <csforgeron@gmail.com> wrote:
> 
> > Hi,
> >
> >  I'll follow up more tomorrow, as it's late and I don't have time
> >  for detail.
> >
> >  The basic TSO patch didn't work, as packets were still going over
> >  65535 by a fair amount. I thought I wrote that earlier, but I am
> >  dumping a lot of info into a few threads, so I apologize if I'm not
> >  as concise as I could be.
> >
> >  However, setting IP_MAXPACKET did. 4 hours of continuous run-time,
> >  no issues. No lost pings, no issues. Of course this isn't a fix -
> >  but it helps isolate the problem.
> > > what the story is a few months down the road.
> > >
> > >
> > > Thanks for the patches, will have to start giving them code-names
> > > so
> > > we can keep them straight. :-) I guess we have printf, tsomax,
> > > and
> > > this one.
> > >
> > >
> >
> >
> 


