From owner-freebsd-net@FreeBSD.ORG  Mon Sep 23 22:46:47 2013
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 8EB6AE72;
 Mon, 23 Sep 2013 22:46:47 +0000 (UTC)
 (envelope-from sodynet1@gmail.com)
Received: from mail-pd0-x22a.google.com (mail-pd0-x22a.google.com
 [IPv6:2607:f8b0:400e:c02::22a])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 4269C2F8F;
 Mon, 23 Sep 2013 22:46:47 +0000 (UTC)
Received: by mail-pd0-f170.google.com with SMTP id x10so3809582pdj.29
 for <multiple recipients>; Mon, 23 Sep 2013 15:46:46 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=UjilBwU9OcaYlukiK4g1agmh0oG9lXC8IZz6YH7pBlg=;
 b=zNd2nK4fBP8S1qNkzhLLiAaFq3C4x9cs2l/LTZGD8uliIScxephPgtW+7SCmsbDgyP
 zPOwbSas1L4JqJYHFeXl+xDyHGgcKGQxW90RfR95vFmZzoqsZPL6IkW4nE98w++cfJsZ
 +e6CJwB7dPIYAg0V1byVyhshHzsVmidAdwxDXn3Aij1tX5RjR+b7zexAkbHtjh1aDB/L
 FafyXOjh/rDms1JVnjmCUv9t2V6LedW+gx722WWwMGk7TiOi77IdH6DE5B9lYEfwZ4g7
 trkHotGxZ1WWlEq6SF2TtwsgUIzr8ghc2hO73/neyiPvynvGoPaLlM/NHjuRcQjZss2Q
 lv5A==
MIME-Version: 1.0
X-Received: by 10.68.11.103 with SMTP id p7mr3431565pbb.84.1379976406783; Mon,
 23 Sep 2013 15:46:46 -0700 (PDT)
Received: by 10.70.30.98 with HTTP; Mon, 23 Sep 2013 15:46:46 -0700 (PDT)
In-Reply-To: <523F4F14.9090404@yandex-team.ru>
References: <521E41CB.30700@yandex-team.ru>
 <CAJ-Vmo=N=HnZVCD41ZmDg2GwNnoa-tD0J0QLH80x=f7KA5d+Ug@mail.gmail.com>
 <523F4F14.9090404@yandex-team.ru>
Date: Tue, 24 Sep 2013 01:46:46 +0300
Message-ID: <CAEW+ogZttyScUBQQWht+YGfLEDU_APcoRyYeMy_wDseAcZwVnA@mail.gmail.com>
Subject: Re: Network stack changes
From: Sami Halabi <sodynet1@gmail.com>
To: "Alexander V. Chernikov" <melifaro@yandex-team.ru>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
Cc: Adrian Chadd <adrian@freebsd.org>, Andre Oppermann <andre@freebsd.org>,
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>,
 Luigi Rizzo <luigi@freebsd.org>, "Andrey V. Elsukov" <ae@freebsd.org>,
 FreeBSD Net <net@freebsd.org>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 23 Sep 2013 22:46:47 -0000

Hi,
> http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf
> http://www.nxlab.fer.hr/dxr/stable_8_20120824.diff
I tried the diff on 10-CURRENT. It applied cleanly, but the new kernel
failed to compile. Is anyone working on updating the patch? I'd love to
test it.

Sami


On Sun, Sep 22, 2013 at 11:12 PM, Alexander V. Chernikov <
melifaro@yandex-team.ru> wrote:

> On 29.08.2013 15:49, Adrian Chadd wrote:
>
>> Hi,
>>
> Hello Adrian!
> I'm very sorry for the looong reply.
>
>
>
>> There's a lot of good stuff to review here, thanks!
>>
>> Yes, the ixgbe RX lock needs to die in a fire. It's kinda pointless to
>> keep locking things like that on a per-packet basis. We should be able to
>> do this in a cleaner way - we can defer RX into a CPU pinned taskqueue and
>> convert the interrupt handler to a fast handler that just schedules that
>> taskqueue. We can ignore the ithread entirely here.
>>
>> What do you think?
>>
> Well, it sounds good :) But performance numbers and Jack's opinion are more
> important :)
>
> Are you going to Malta?
>
>
>> Totally pie in the sky handwaving at this point:
>>
>> * create an array of mbuf pointers for completed mbufs;
>> * populate the mbuf array;
>> * pass the array up to ether_demux().
>>
>> For vlan handling, it may end up populating its own list of mbufs to push
>> up to ether_demux(). So maybe we should extend the API to have a bitmap of
>> packets to actually handle from the array, so we can pass up a larger array
>> of mbufs, note which ones are for the destination and then the upcall can
>> mark which frames it's consumed.
>>
>> I specifically wonder how much work/benefit we may see by doing:
>>
>> * batching packets into lists so various steps can batch process things
>> rather than run to completion;
>> * batching the processing of a list of frames under a single lock
>> instance - eg, if the forwarding code could do the forwarding lookup for
>> 'n' packets under a single lock, then pass that list of frames up to
>> inet_pfil_hook() to do the work under one lock, etc, etc.
>>
> I'm thinking the same way, but we're stuck with 'forwarding lookup' due to
> the problem with the egress interface pointer, as I mentioned earlier.
> However, it would be interesting to see how much it helps, regardless of
> locking.
>
> Currently I'm thinking that we should try to change radix to something
> different (it seems that it can be checked fast) and see what happens.
> Luigi's performance numbers for our radix are too awful, and there is a
> patch implementing an alternative trie:
> http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf
> http://www.nxlab.fer.hr/dxr/stable_8_20120824.diff
>
>
>
>
>> Here, the processing would look less like "grab lock and process to
>> completion" and more like "mark and sweep" - ie, we have a list of frames
>> that we mark as needing processing and mark as having been processed at
>> each layer, so we know where to next dispatch them.
>>
>> I still have some tool coding to do with PMC before I even think about
>> tinkering with this as I'd like to measure stuff like per-packet latency as
>> well as top-level processing overhead (ie, CPU_CLK_UNHALTED.THREAD_P /
>> lagg0 TX bytes/pkts, RX bytes/pkts, NIC interrupts on that core, etc.)
>>
> That will be great to see!
>
>>
>> Thanks,
>>
>>
>>
>> -adrian
>>
>>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>
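
A side note on the batching idea quoted above (an array of mbuf pointers
plus a bitmap of which packets still need handling): a minimal userspace
C sketch of how that could look is below. Everything in it is
hypothetical. struct pkt is only a stand-in for struct mbuf, and
pkt_batch, batch_init(), batch_add() and batch_consume_vlan() are names
made up for illustration, not existing FreeBSD APIs. It ignores locking
and mbuf ownership entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX 64		/* one bit per slot in a 64-bit bitmap */

/* Hypothetical stand-in for struct mbuf; only what the sketch needs. */
struct pkt {
	uint16_t vlan_id;
	size_t   len;
};

/* A batch of completed packets plus a bitmap of slots still to handle. */
struct pkt_batch {
	struct pkt *pkts[BATCH_MAX];
	size_t      count;
	uint64_t    pending;	/* bit i set => pkts[i] not yet consumed */
};

static void
batch_init(struct pkt_batch *b)
{
	b->count = 0;
	b->pending = 0;
}

/* Driver side: append a completed packet; returns 0, or -1 when full. */
static int
batch_add(struct pkt_batch *b, struct pkt *p)
{
	assert(p != NULL);
	if (b->count >= BATCH_MAX)
		return (-1);
	b->pkts[b->count] = p;
	b->pending |= UINT64_C(1) << b->count;
	b->count++;
	return (0);
}

/*
 * Upper-layer side: consume every pending packet tagged with vlan_id and
 * clear its bit, leaving the others for the next consumer. Returns the
 * number of packets consumed.
 */
static size_t
batch_consume_vlan(struct pkt_batch *b, uint16_t vlan_id)
{
	size_t i, n = 0;

	for (i = 0; i < b->count; i++) {
		uint64_t bit = UINT64_C(1) << i;

		if ((b->pending & bit) == 0 || b->pkts[i]->vlan_id != vlan_id)
			continue;
		/* Real code would hand pkts[i] to the next layer here. */
		b->pending &= ~bit;
		n++;
	}
	return (n);
}
```

The driver would fill one batch per RX pass, and each layer would walk
the same array, clearing the bits of the frames it takes. Whatever is
still set in "pending" afterwards belongs to someone else (or gets
dropped).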



-- 
Sami Halabi
Information Systems Engineer
NMS Projects Expert
FreeBSD SysAdmin Expert

