Date:      Mon, 1 Feb 2016 17:24:36 -0600
From:      Xiaoye Sun <Xiaoye.Sun@rice.edu>
To:        Luigi Rizzo <rizzo@iet.unipi.it>
Cc:        Pavel Odintsov <pavel.odintsov@gmail.com>,  "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: swaping ring slots between NIC ring and Host ring does not always success
Message-ID:  <CAJnByzht-qfDcm8oEg1aSRyVBZ1ygPvc2eMuoyJcq4geueTZ0Q@mail.gmail.com>
In-Reply-To: <CA%2BhQ2%2Bim9nFfYnqDS2HgRbAzdf5D0iaLCmCYhfXQVVRMouUFuw@mail.gmail.com>
References:  <CAJnByzj6Dj3vouZ2NbxqvCV-2-7TVtTR4FaWKuCFaaRN2X%2ByAA@mail.gmail.com> <CALgsdbd3XuE3wMYp4ey%2B1aer%2BHSVNojLYoVqwqTBPAXXdf9i%2BQ@mail.gmail.com> <CAJnByzirLXdCe-kwHV2s_E6ytGJG0Dth=0Ms12RrEk7FK_%2B8Og@mail.gmail.com> <CA%2BhQ2%2BgMWY0eabjHGw0=PJCAkS-wO=RBrN5brSbaqWc3_AOYoQ@mail.gmail.com> <CAJnByziBS8o6LtmpUrUu5xtRUd008Z2hnCsp=WVFv35r2J0rHw@mail.gmail.com> <CA%2BhQ2%2Bim9nFfYnqDS2HgRbAzdf5D0iaLCmCYhfXQVVRMouUFuw@mail.gmail.com>

Hi Luigi,

Thanks for the detailed advice.

After more detailed experiments, I found that the UDP sender/receiver
packet reordering issue *might* be unrelated to the original issue I
posted. However, I think we should solve the UDP sender/receiver issue
first.
I ran the experiment with more detailed logging. Here are my findings.

1. I am running the netmap version from GitHub
(https://github.com/luigirizzo/netmap) as of around Oct 13, so I don't
think it is affected by the buffer allocation bug. I tried running the
newest version; however, it causes a problem when I exit the bridge
program (some kind of kernel error that crashes the OS).

2 & 3. I changed receiver.c and bridge.c so that they log more detailed
information.
The reordering happens multiple times (about 10 times) per second. Here
is one example trace collected from those two programs (recall that the
UDP sender runs on one machine, while the netmap bridge and the UDP
receiver run on another machine).
There is only one pair of rings on the receiver machine, each with 512
slots (511 usable).

=================== packet trace collected from receiver.c ===================
===== together with the slot and buf_idx of the corresponding netmap ring slots ======
[seq]   [slot]   [buf_idx]
8208   294    1833
*8209   295    1834*
*8388   474    2013*
... (packet received in order)
8398   484    2023

*8399   485    2024*
*8210   296    1835*
8211   297    1836
... (packet received in order)
...

*8222   308    1847*
*8400   486    2025*

*8223   309    1848*
*8401   487    2026*
*8224   310    1849*
*8402   488    2027*
*8225   311    1850*
*8403   489    2028*
*8226   312    1851*
*8404   450    2029*
*8227   313    1852*
8228   314    1853
===================================================================
As we can see, the UDP receiver got packet 8210 after it got 8399,
which is the first reordering. Then the receiver got 8211 to 8222
sequentially. After that, it got packets 8223-8227 and 8400-8404
interleaved.


==================== event order seen by netmap bridge ==================
get 8209
poll called
*get 8210*
...
...
get 8228
poll called
get 8229
...
...
get 8383
poll called
get 8384
...
get 8387
poll called
get 8388
...
get 8393
poll called
get 8394
...
*get 8399*
poll called
*get 8400*
...
get 8404
poll called
get 8405
===================================================================
As we can see from the event order seen by bridge.c, all the packets are
received in order, which means the reordering happens when the bridge
code swaps the buf_idx between the NIC ring slot and the host ring slot.
The reordered sequence numbers usually appear right before or after a
poll() call.

Best,
Xiaoye

On Fri, Jan 29, 2016 at 4:27 PM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:

> On Fri, Jan 29, 2016 at 2:12 PM, Xiaoye Sun <Xiaoye.Sun@rice.edu> wrote:
> > Hi Luigi,
> >
> > Thanks for your advice.
> > I forgot to mention that I use the command "ethtool -L eth1 combined 1" to
> > set the number of rings of the NIC to 1. The host also only has one ring.
> > I understand the situation where the first TX ring is full, so the bridge
> > will swap the packets to the second TX ring and then the host/NIC might
> > drain either ring. But this is not the case in the experiment.
>
> ok good to know that.
>
> So if we have ruled out multiqueue and iommu, let's look at
> the internal allocator and at bridge.c
>
> 1. are you running the most recent version of netmap ?
>    Some older version (probably 1-2 years ago) had a bug
>    in the buffer allocator and some buffers were allocated
>    twice.
>
> 2. can you tweak your receiver.c to report some more info
>    on how often you get out of sequence packets, how much
>    out of sequence they are ?
>    Also it would be useful to report gaps on the increasing side
>    (i.e. new_seq != old_seq +1 )
>
> 3. can you tweak bridge.c so that it writes into the packet
>    the netmap buffer indexes and slots on the rx and tx side,
>    so when you detect a sequence error we can figure out
>    where it is happening.
>    Ideally you could also add the sequence number detection
>    code in bridge.c so we can check whether the errors appear
>    on the input or output sides.
>
> cheers
> luigi
>
>


