From owner-freebsd-net@FreeBSD.ORG Mon Sep 23 22:46:47 2013
Date: Tue, 24 Sep 2013 01:46:46 +0300
From: Sami Halabi
To: "Alexander V. Chernikov"
Cc: Adrian Chadd, Andre Oppermann, "freebsd-hackers@freebsd.org",
    "freebsd-arch@freebsd.org", Luigi Rizzo, "Andrey V. Elsukov", FreeBSD Net
Subject: Re: Network stack changes
In-Reply-To: <523F4F14.9090404@yandex-team.ru>
References: <521E41CB.30700@yandex-team.ru> <523F4F14.9090404@yandex-team.ru>
List-Id: Networking and TCP/IP with FreeBSD

Hi,

> http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf
> http://www.nxlab.fer.hr/dxr/stable_8_20120824.diff

I've tried the diff in 10-current. It applied cleanly, but I had errors
compiling the new kernel... Is there any work to make it work? I'd love to
test it.

Sami

On Sun, Sep 22, 2013 at 11:12 PM, Alexander V. Chernikov
<melifaro@yandex-team.ru> wrote:

> On 29.08.2013 15:49, Adrian Chadd wrote:
>
>> Hi,
>>
> Hello Adrian!
> I'm very sorry for the looong reply.
>
>> There's a lot of good stuff to review here, thanks!
>>
>> Yes, the ixgbe RX lock needs to die in a fire. It's kinda pointless to
>> keep locking things like that on a per-packet basis. We should be able to
>> do this in a cleaner way - we can defer RX into a CPU pinned taskqueue and
>> convert the interrupt handler to a fast handler that just schedules that
>> taskqueue. We can ignore the ithread entirely here.
>>
>> What do you think?
>>
> Well, it sounds good :) But performance numbers and Jack's opinion are more
> important :)
>
> Are you going to Malta?
>
>> Totally pie in the sky handwaving at this point:
>>
>> * create an array of mbuf pointers for completed mbufs;
>> * populate the mbuf array;
>> * pass the array up to ether_demux().
>>
>> For vlan handling, it may end up populating its own list of mbufs to push
>> up to ether_demux().
>> So maybe we should extend the API to have a bitmap of
>> packets to actually handle from the array, so we can pass up a larger array
>> of mbufs, note which ones are for the destination and then the upcall can
>> mark which frames it has consumed.
>>
>> I specifically wonder how much work/benefit we may see by doing:
>>
>> * batching packets into lists so various steps can batch process things
>> rather than run to completion;
>> * batching the processing of a list of frames under a single lock
>> instance - eg, if the forwarding code could do the forwarding lookup for
>> 'n' packets under a single lock, then pass that list of frames up to
>> inet_pfil_hook() to do the work under one lock, etc, etc.
>>
> I'm thinking the same way, but we're stuck with 'forwarding lookup' due to
> the problem with the egress interface pointer, as I mentioned earlier.
> However, it is interesting to see how much it helps, regardless of locking.
>
> Currently I'm thinking that we should try to change radix to something
> different (it seems that it can be checked fast) and see what happens.
> Luigi's performance numbers for our radix are too awful, and there is a
> patch implementing an alternative trie:
> http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf
> http://www.nxlab.fer.hr/dxr/stable_8_20120824.diff
>
>> Here, the processing would look less like "grab lock and process to
>> completion" and more like "mark and sweep" - ie, we have a list of frames
>> that we mark as needing processing and mark as having been processed at
>> each layer, so we know where to next dispatch them.
>>
>> I still have some tool coding to do with PMC before I even think about
>> tinkering with this as I'd like to measure stuff like per-packet latency as
>> well as top-level processing overhead (ie, CPU_CLK_UNHALTED.THREAD_P /
>> lagg0 TX bytes/pkts, RX bytes/pkts, NIC interrupts on that core, etc.)
>>
> That will be great to see!
>
>> Thanks,
>>
>> -adrian
>>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

-- 
Sami Halabi
Information Systems Engineer
NMS Projects Expert
FreeBSD SysAdmin Expert
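
The array-plus-bitmap idea from the quoted discussion (pass up an array of
frames, let each layer mark which ones it has consumed) might look roughly
like the sketch below. This is only an illustration under stated assumptions,
not code from the thread or from the FreeBSD tree: struct frame,
struct frame_batch, batch_init() and batch_consume_vlan() are hypothetical
names, and struct frame is a stand-in for struct mbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX 64            /* one bit per frame in a 64-bit map */

/* Hypothetical stand-in for struct mbuf; only the VLAN tag matters here. */
struct frame {
    uint16_t vlan;              /* 0 = untagged */
};

/* Hypothetical batch: an array of frame pointers plus a "pending" bitmap. */
struct frame_batch {
    struct frame *frames[BATCH_MAX];
    size_t        count;
    uint64_t      pending;      /* bit i set: frames[i] not yet consumed */
};

/* Fill a batch from an array of completed frames and mark all as pending. */
static void
batch_init(struct frame_batch *b, struct frame **src, size_t n)
{
    assert(n <= BATCH_MAX);
    b->count = n;
    b->pending = 0;
    for (size_t i = 0; i < n; i++) {
        b->frames[i] = src[i];
        b->pending |= UINT64_C(1) << i;
    }
}

/*
 * One layer's upcall: consume every pending frame that carries `vlan`,
 * clear its bit, and return how many frames were taken. Frames whose
 * bits stay set are left for the next layer to look at.
 */
static size_t
batch_consume_vlan(struct frame_batch *b, uint16_t vlan)
{
    size_t taken = 0;

    for (size_t i = 0; i < b->count; i++) {
        uint64_t bit = UINT64_C(1) << i;

        if ((b->pending & bit) != 0 && b->frames[i]->vlan == vlan) {
            b->pending &= ~bit;
            taken++;
        }
    }
    return taken;
}
```

A driver-side caller would build one batch per RX pass and hand the same
batch to each consumer in turn; whatever bits are still set afterwards are
the frames nobody claimed. Whether this actually beats per-packet
run-to-completion is exactly the measurement question raised in the thread.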