Date:      Mon, 24 Mar 2014 11:56:55 -0300
From:      Christopher Forgeron <csforgeron@gmail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>, Garrett Wollman <wollman@freebsd.org>, Jack Vogel <jfvogel@gmail.com>, Markus Gebert <markus.gebert@hostpoint.ch>
Subject:   Re: 9.2 ixgbe tx queue hang
Message-ID:  <CAB2_NwCHM9D1HZSMsuQQ-dYNAt-t2721jKqfO=2h3M4qdumY7w@mail.gmail.com>
In-Reply-To: <CAB2_NwAbHzFqa8RM5pwV7Yy5t=96JwzaF%2BSdjJN9kK3uhKKn_w@mail.gmail.com>
References:  <CAB2_NwAcDPM6YKNLQMC0=YSp%2Bn9nBpXGJQR9ajbgbfcQFoWYPw@mail.gmail.com> <1164414873.1690348.1395622026185.JavaMail.root@uoguelph.ca> <CAB2_NwAbHzFqa8RM5pwV7Yy5t=96JwzaF%2BSdjJN9kK3uhKKn_w@mail.gmail.com>

I'm going to split this into different posts to focus on each topic. This
one is about setting IP_MAXPACKET to 65495.

Update on Last Night's Run:

(Last night's run is a kernel with IP_MAXPACKET = 65495)

- Uptime on this run: 10:53AM  up 13:21, 5 users, load averages: 1.98,
2.09, 2.13
- Ping logger records no ping errors for the entire run.
- At 10:57 on Mar 24th I grepped the night's log for 'before' (the printf
logging that Rick suggested a few days ago) and saved the result to
before_total.txt.
- wc -l on before_total.txt shows 504 lines, so there were 504 incidents
of a packet exceeding IP_MAXPACKET during this run.
- I ran tr -c '[:alnum:]' '[\n*]' < before_total.txt | sort | uniq -c |
sort -nr | head -50 to list the most common words, ignoring the non-pklen
output. The relevant output is:

 344 65498 (3)
 330 65506 (11)
 330 65502 (7)

 - The first number is how many times that pklen appears. (Each pklen is
printed twice in the log, so these counts come to roughly twice the line
count.)
 - The number in parentheses is the byte overrun beyond 65495.
 - The three overrun sizes are fairly evenly distributed.

 You will recall that my IP_MAXPACKET is 65495, so each of these packet
lengths represents an overshoot.

 The fact that there are only three distinct overrun sizes is good: it
suggests a non-random event, more like a broken 'if' statement for a
particular case.

 If IP_MAXPACKET had been set to 65535, as it normally is, I would have had
504 incidents of errors, any one of which could have blocked the queue for a
considerable time.

 Question: Should there be logic that discards packets that are over
IP_MAXPACKET to ensure that we don't end up in a blocked queue situation
again?


 Moving forward, I am doing two things:

 1) I'm running a longer test with TSO disabled on my ix0 adapter. I want
to make sure that over, say, 4 hours I don't get even one packet over 65495.
This will at least narrow the issue down to TSO-related code.

 2) I have tcpdump running, to see if I can capture the packets over 65495.
Here is my command. Any suggestions on additional switches I should include?

tcpdump -ennvvXS greater 65495

I'll report in on this again once I have new info.

Thanks for reading.

On Mon, Mar 24, 2014 at 2:14 AM, Christopher Forgeron
<csforgeron@gmail.com> wrote:

> Hi,
>
>  I'll follow up more tomorrow, as it's late and I don't have time for
> detail.
>
>  The basic TSO patch didn't work, as packets were still going over
> 65535 by a fair amount. I thought I wrote that earlier, but I am dumping a
> lot of info into a few threads, so I apologize if I'm not as concise as I
> could be.
>
>  However, setting IP_MAXPACKET did help: 4 hours of continuous run time
> with no lost pings and no other issues. Of course this isn't a fix, but it
> helps isolate the problem.
> > what the story is a few months down the road.
> >
> >
> > Thanks for the patches, will have to start giving them code-names so
> > we can keep them straight. :-) I guess we have printf, tsomax, and
> > this one.
> >
> >
>
>


