Date:      Tue, 24 Sep 2013 09:47:24 +0100
From:      Joe Holden <lists@rewt.org.uk>
To:        freebsd-net@freebsd.org
Subject:   Re: Network stack changes
Message-ID:  <5241519C.9040908@rewt.org.uk>
In-Reply-To: <201309240958.06172.zec@fer.hr>
References:  <521E41CB.30700@yandex-team.ru> <523F4F14.9090404@yandex-team.ru> <CAEW%2BogZttyScUBQQWht%2BYGfLEDU_APcoRyYeMy_wDseAcZwVnA@mail.gmail.com> <201309240958.06172.zec@fer.hr>

On 24/09/2013 08:58, Marko Zec wrote:
> On Tuesday 24 September 2013 00:46:46 Sami Halabi wrote:
>> Hi,
>>
>>> http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf
>>> http://www.nxlab.fer.hr/dxr/stable_8_20120824.diff
>>
>> I've tried the diff on 10-CURRENT; it applied cleanly, but I got errors
>> compiling the new kernel... is there any work underway to make it build?
>> I'd love to test it.
>
> Even if you got it to compile on current, you could only run synthetic tests
> measuring lookup performance using streams of random keys, as outlined in
> the paper (btw. the paper at Luigi's site is an older draft, the final
> version with slightly revised benchmarks is available here:
> http://www.sigcomm.org/sites/default/files/ccr/papers/2012/October/2378956-2378961.pdf)
>
> I.e. the code only hooks into the routing API for testing purposes, but is
> completely disconnected from the forwarding path.
>
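For illustration, a synthetic benchmark of that kind could be as small as the
sketch below; dxr_lookup() is a hypothetical stand-in for the patch's lookup
entry point, not its actual API.

/*
 * Sketch: measure lookups/sec over a stream of random IPv4 keys.
 * dxr_lookup() is a hypothetical stand-in for the patched lookup path.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NKEYS   (1 << 20)

extern uint32_t dxr_lookup(uint32_t dst);       /* hypothetical */

int
main(void)
{
        static uint32_t keys[NKEYS];
        struct timespec t0, t1;
        uint64_t sink = 0;              /* keeps the loop from being elided */
        double secs;
        int i;

        /* Pre-generate a stream of random IPv4 destination keys. */
        srandom(42);
        for (i = 0; i < NKEYS; i++)
                keys[i] = ((uint32_t)random() << 1) ^ (uint32_t)random();

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < NKEYS; i++)
                sink += dxr_lookup(keys[i]);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.2f Mlookups/s (sink %ju)\n", NKEYS / secs / 1e6,
            (uintmax_t)sink);
        return (0);
}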
aha!  How much work would it be to enable it to be used?

> We have a prototype in the works which combines DXR with Netmap in userspace
> and is capable of sustaining well above line rate forwarding with
> full-sized BGP views using Intel 10G cards on commodity multicore machines.
> The work was somewhat stalled during the summer but I plan to wrap it up
> and release the code by the end of this year.  With recent advances in
> netmap it might also be feasible to merge DXR and netmap entirely inside
> the kernel but I've not explored that path yet...
>
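A userspace forwarding loop of that shape might look roughly like the sketch
below; nm_open()/nm_nextpkt()/nm_inject() are the stock netmap user helpers,
while dxr_lookup() again stands in for the route lookup and the interface
names are arbitrary.

/*
 * Sketch: forward frames from ix0 to ix1 in userspace via netmap.
 * A real forwarder would also rewrite the Ethernet header, decrement
 * the TTL and pick the egress port from the lookup result.
 */
#define NETMAP_WITH_LIBS
#include <sys/types.h>
#include <net/netmap_user.h>
#include <net/ethernet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <arpa/inet.h>
#include <poll.h>

extern uint32_t dxr_lookup(uint32_t dst);       /* hypothetical */

int
main(void)
{
        struct nm_desc *rx = nm_open("netmap:ix0", NULL, 0, NULL);
        struct nm_desc *tx = nm_open("netmap:ix1", NULL, 0, NULL);
        struct pollfd pfd;
        struct nm_pkthdr h;
        u_char *buf;

        if (rx == NULL || tx == NULL)
                return (1);
        pfd.fd = rx->fd;
        pfd.events = POLLIN;

        for (;;) {
                poll(&pfd, 1, -1);
                while ((buf = nm_nextpkt(rx, &h)) != NULL) {
                        struct ip *ip = (struct ip *)(buf + ETHER_HDR_LEN);

                        /* Route lookup on the destination address. */
                        (void)dxr_lookup(ntohl(ip->ip_dst.s_addr));
                        nm_inject(tx, buf, h.len);
                }
        }
}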
mmm, forwarding using netmap would be pretty awesome...
> Marko
>
>
>> Sami
>>
>>
>> On Sun, Sep 22, 2013 at 11:12 PM, Alexander V. Chernikov <
>>
>> melifaro@yandex-team.ru> wrote:
>>> On 29.08.2013 15:49, Adrian Chadd wrote:
>>>> Hi,
>>>
>>> Hello Adrian!
>>> I'm very sorry for the looong reply.
>>>
>>>> There's a lot of good stuff to review here, thanks!
>>>>
>>>> Yes, the ixgbe RX lock needs to die in a fire. It's kinda pointless to
>>>> keep locking things like that on a per-packet basis. We should be able
>>>> to do this in a cleaner way - we can defer RX into a CPU pinned
>>>> taskqueue and convert the interrupt handler to a fast handler that
>>>> just schedules that taskqueue. We can ignore the ithread entirely
>>>> here.
>>>>
>>>> What do you think?
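Roughly, that filter-plus-taskqueue arrangement could look like the sketch
below; struct ix_queue and the helper names are illustrative, not the actual
ixgbe structures.

/*
 * Sketch: fast (filter) interrupt handler that only schedules a
 * taskqueue; all RX processing then runs in a dedicated taskqueue
 * thread that can be pinned to a CPU, bypassing the ithread.
 */
#include <sys/param.h>
#include <sys/bus.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/priority.h>
#include <sys/taskqueue.h>
#include <machine/bus.h>

struct ix_queue {                       /* illustrative, not the real one */
        struct taskqueue        *tq;
        struct task              rx_task;
        struct resource         *res;
        void                    *tag;
        /* ... RX ring state ... */
};

/* Fast handler: no per-packet work, just kick the taskqueue. */
static int
ix_rx_filter(void *arg)
{
        struct ix_queue *que = arg;

        taskqueue_enqueue(que->tq, &que->rx_task);
        return (FILTER_HANDLED);
}

/* All RX work happens here; no per-packet lock needed in this thread. */
static void
ix_rx_task(void *arg, int pending)
{
        /* drain the RX ring, refill descriptors, pass packets up */
}

static int
ix_setup_rxq(device_t dev, struct ix_queue *que)
{
        TASK_INIT(&que->rx_task, 0, ix_rx_task, que);
        que->tq = taskqueue_create_fast("ix_rxq", M_NOWAIT,
            taskqueue_thread_enqueue, &que->tq);
        taskqueue_start_threads(&que->tq, 1, PI_NET, "%s rxq",
            device_get_nameunit(dev));
        return (bus_setup_intr(dev, que->res, INTR_TYPE_NET | INTR_MPSAFE,
            ix_rx_filter, NULL, que, &que->tag));
}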
>>>
>>> Well, it sounds good :) But performance numbers and Jack's opinion are
>>> more important :)
>>>
>>> Are you going to Malta?
>>>
>>>> Totally pie in the sky handwaving at this point:
>>>>
>>>> * create an array of mbuf pointers for completed mbufs;
>>>> * populate the mbuf array;
>>>> * pass the array up to ether_demux().
>>>>
>>>> For vlan handling, it may end up populating its own list of mbufs to
>>>> push up to ether_demux(). So maybe we should extend the API to have a
>>>> bitmap of packets to actually handle from the array, so we can pass up
>>>> a larger array of mbufs, note which ones are for the destination, and
>>>> then the upcall can mark which frames it has consumed (one possible
>>>> shape for such a batch is sketched below).
>>>>
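One hypothetical shape for such a batch-with-bitmap API, purely as a sketch
(none of these names exist in the tree):

#include <sys/types.h>

struct mbuf;

#define MBUF_BATCH_MAX  64              /* fits one uint64_t bitmap */

struct mbuf_batch {
        struct mbuf     *mb_pkt[MBUF_BATCH_MAX];
        uint64_t         mb_pending;    /* bit set => frame still to handle */
        u_int            mb_count;
};

/* An upcall marks frame i as consumed so later layers skip it. */
static __inline void
mbuf_batch_consume(struct mbuf_batch *b, u_int i)
{
        b->mb_pending &= ~(1ULL << i);
}
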
>>>> I specifically wonder how much work/benefit we may see by doing:
>>>>
>>>> * batching packets into lists so various steps can batch process
>>>> things rather than run to completion;
>>>> * batching the processing of a list of frames under a single lock
>>>> instance - e.g., if the forwarding code could do the forwarding lookup
>>>> for 'n' packets under a single lock, then pass that list of frames up
>>>> to inet_pfil_hook() to do the work under one lock, etc. (a sketch
>>>> follows below).
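As a sketch, the single-lock batch could look like this; RT_BATCH_LOCK(),
route_lookup_one() and inet_pfil_hook_batch() are placeholders for whatever
the real interfaces would be, reusing the mbuf_batch shape from above.

/* Sketch: one lock acquisition amortized over a whole batch. */
static void
ip_forward_batch(struct mbuf_batch *b)
{
        u_int i;

        RT_BATCH_LOCK();                /* one acquisition for 'n' packets */
        for (i = 0; i < b->mb_count; i++)
                if (b->mb_pending & (1ULL << i))
                        route_lookup_one(b->mb_pkt[i]);  /* lookup only */
        RT_BATCH_UNLOCK();

        /* Then one pass through the pfil hooks for the whole list. */
        inet_pfil_hook_batch(b);
}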
>>>
>>> I'm thinking the same way, but we're stuck at the 'forwarding lookup'
>>> step due to the problem with the egress interface pointer, as I
>>> mentioned earlier. Still, it would be interesting to see how much
>>> batching helps, regardless of locking.
>>>
>>> Currently I'm thinking that we should try changing the radix code to
>>> something different (it seems this can be evaluated quickly) and see
>>> what happens. Luigi's performance numbers for our radix are awful, and
>>> there is a patch implementing an alternative trie:
>>> http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf
>>> http://www.nxlab.fer.hr/dxr/stable_8_20120824.diff
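For reference, the core DXR (D16R) lookup described in the paper boils down
to a direct table indexed by the top 16 bits of the destination, falling back
to a binary search over a small range table; the sketch below illustrates the
idea with a made-up field layout, not the patch's actual encoding.

#include <stdint.h>

struct direct_entry {
        uint32_t base;          /* range-table offset, or next hop if leaf */
        uint16_t count;         /* 0 => leaf: base is the next hop */
};

struct range_entry {
        uint16_t start;         /* low 16 bits of range start */
        uint16_t nexthop;
};

static struct direct_entry direct_tbl[1 << 16];
static struct range_entry range_tbl[1 << 19];   /* sized arbitrarily here */

static uint16_t
dxr_lookup(uint32_t dst)
{
        struct direct_entry de = direct_tbl[dst >> 16];
        uint16_t key = dst & 0xffff;
        uint32_t lo, hi;

        if (de.count == 0)
                return ((uint16_t)de.base);     /* resolved directly */

        /* Binary search the fragment for the last start <= key. */
        lo = de.base;
        hi = de.base + de.count - 1;
        while (lo < hi) {
                uint32_t mid = (lo + hi + 1) / 2;

                if (range_tbl[mid].start <= key)
                        lo = mid;
                else
                        hi = mid - 1;
        }
        return (range_tbl[lo].nexthop);
}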
>>>
>>>> Here, the processing would look less like "grab lock and process to
>>>> completion" and more like "mark and sweep" - ie, we have a list of
>>>> frames that we mark as needing processing and mark as having been
>>>> processed at each layer, so we know where to dispatch them next.
>>>>
>>>> I still have some tool coding to do with PMC before I even think about
>>>> tinkering with this, as I'd like to measure things like per-packet
>>>> latency as well as top-level processing overhead (i.e.,
>>>> CPU_CLK_UNHALTED.THREAD_P / lagg0 TX bytes/pkts, RX bytes/pkts, NIC
>>>> interrupts on that core, etc.)
>>>
>>> That will be great to see!
>>>
>>>> Thanks,
>>>>
>>>>
>>>>
>>>> -adrian
>>>
>>> _______________________________________________
>>> freebsd-net@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>>
>> --
>> Sami Halabi
>> Information Systems Engineer
>> NMS Projects Expert
>> FreeBSD SysAdmin Expert
>>
>>



