Date:      Fri, 19 Aug 2011 17:21:08 +0900
From:      Takuya ASADA <syuu@dokukino.com>
To:        net@freebsd.org
Subject:   Re: Multiqueue support for bpf
Message-ID:  <CALG4x-Vvu=LsRjdvaz19+_QTr2uAqcY514OxA3dy=L+nY-qV5g@mail.gmail.com>
In-Reply-To: <CALG4x-VFC0yJK_dB9Z+DoBvBv1FGjOuVYWd=jtTBs0FeArjALg@mail.gmail.com>
References:  <CALG4x-VwhLmnh+Rq0T8zdzp=yMD8o_WQ64_eqzc_dEhF-_mrGA@mail.gmail.com> <2AB05A3E-BDC3-427D-B4A7-ABDDFA98D194@dudu.ro> <0BB87D28-3094-422D-8262-5FA0E40BFC7C@dudu.ro> <CALG4x-VFC0yJK_dB9Z+DoBvBv1FGjOuVYWd=jtTBs0FeArjALg@mail.gmail.com>

Any comments or suggestions?

2011/8/18 Takuya ASADA <syuu@dokukino.com>:
> 2011/8/16 Vlad Galu <dudu@dudu.ro>:
>> On Aug 16, 2011, at 11:50 AM, Vlad Galu wrote:
>>> On Aug 16, 2011, at 11:13 AM, Takuya ASADA wrote:
>>>> Hi all,
>>>>
>>>> I have implemented multiqueue support for bpf, which I'd like to present for review.
>>>> This is a Google Summer of Code project; its goal is to support
>>>> multiqueue network interfaces in BPF and to provide interfaces
>>>> for multithreaded packet processing using BPF.
>>>> Modern high-performance NICs have multiple receive/send queues and
>>>> RSS support, which allows packets to be processed concurrently on
>>>> multiple processors.
>>>> The main purpose of the project is to support such hardware and to
>>>> benefit from this parallelism.
>>>>
>>>> This provides the following new APIs:
>>>> - queue filter for each bpf descriptor (bpf ioctl)
>>>>   - BIOCENAQMASK     Enable the multiqueue filter on the descriptor
>>>>   - BIOCDISQMASK     Disable the multiqueue filter on the descriptor
>>>>   - BIOCSTRXQMASK    Set the mask bit for the specified RX queue
>>>>   - BIOCCRRXQMASK    Clear the mask bit for the specified RX queue
>>>>   - BIOCGTRXQMASK    Get the mask bit for the specified RX queue
>>>>   - BIOCSTTXQMASK    Set the mask bit for the specified TX queue
>>>>   - BIOCCRTXQMASK    Clear the mask bit for the specified TX queue
>>>>   - BIOCGTTXQMASK    Get the mask bit for the specified TX queue
>>>>   - BIOCSTOTHERMASK  Set the mask bit for packets not tied to any queue
>>>>   - BIOCCROTHERMASK  Clear the mask bit for packets not tied to any queue
>>>>   - BIOCGTOTHERMASK  Get the mask bit for packets not tied to any queue
>>>>
>>>> - generic interface for getting hardware queue information from the
>>>> NIC driver (socket ioctl)
>>>>   - SIOCGIFQLEN         Get the interface RX/TX queue length
>>>>   - SIOCGIFRXQAFFINITY  Get the interface RX queue affinity
>>>>   - SIOCGIFTXQAFFINITY  Get the interface TX queue affinity
>>>>
>>>> A patch against -CURRENT is here; right now it only supports igb(4),
>>>> ixgbe(4), and mxge(4):
>>>> http://www.dokukino.com/mq_bpf_20110813.diff
>>>>
>>>> And below are the benchmark results:
>>>>
>>>> ====
>>>> I implemented benchmark programs based on
>>>> bpfnull (//depot/projects/zcopybpf/utils/bpfnull/).
>>>>
>>>> test_sqbpf measures bpf throughput on one thread, without using the
>>>> multiqueue APIs.
>>>> http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/soc2011/mq_bpf/src/tools/regression/bpf/mq_bpf/test_sqbpf/test_sqbpf.c
>>>>
>>>> test_mqbpf is a multithreaded version of test_sqbpf, using the
>>>> multiqueue APIs.
>>>> http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/soc2011/mq_bpf/src/tools/regression/bpf/mq_bpf/test_mqbpf/test_mqbpf.c
>>>>
>>>> I benchmarked under six conditions:
>>>> - benchmark1 only reads from bpf, doesn't write packets anywhere
>>>> - benchmark2 writes packets to memory (mfs)
>>>> - benchmark3 writes packets to hdd (zfs)
>>>> - benchmark4 only reads from bpf, doesn't write packets anywhere, with zerocopy
>>>> - benchmark5 writes packets to memory (mfs), with zerocopy
>>>> - benchmark6 writes packets to hdd (zfs), with zerocopy
>>>>
>>>> From the benchmark results, I can say the performance is increased by
>>>> using mq_bpf on 10GbE, but not on GbE.
>>>>
>>>> * Throughput benchmark
>>>> - Test environment
>>>> - FreeBSD node
>>>>   CPU: Core i7 X980 (12 threads)
>>>>   MB: ASUS P6X58D Premium (Intel X58)
>>>>   NIC1: Intel Gigabit ET Dual Port Server Adapter (82576)
>>>>   NIC2: Intel Ethernet X520-DA2 Server Adapter (82599)
>>>> - Linux node
>>>>   CPU: Core 2 Quad (4 threads)
>>>>   MB: GIGABYTE GA-G33-DS3R (Intel G33)
>>>>   NIC1: Intel Gigabit ET Dual Port Server Adapter (82576)
>>>>   NIC2: Intel Ethernet X520-DA2 Server Adapter (82599)
>>>>
>>>> iperf was used to generate network traffic, with the following options:
>>>>  - Linux node: iperf -c [IP] -i 10 -t 100000 -P12
>>>>  - FreeBSD node: iperf -s
>>>>  # 12 threads, TCP
>>>>
>>>> The following sysctl parameter was changed:
>>>>  sysctl -w net.bpf.maxbufsize=1048576
>>>
>>>
>>> Thank you for your work! You may want to increase that (4x/8x) and rerun the test, though.
>>
>> More, actually. Your current buffer is easily filled.
>
> Hi,
>
> I measured the performance again with maxbufsize = 268435456 and
> multiple CPU configurations; here are the results.
> The performance on 10GbE seems a bit unstable, and does not scale
> linearly as CPUs/queues are added.
> Maybe it depends on some system parameter, but I haven't figured out
> the cause.
>
> In any case, multithreaded BPF throughput is higher than
> single-threaded BPF in every configuration.
>
> * Test environment
>  - FreeBSD node
>    CPU: Core i7 X980 (12 threads)
>    # Tested in 1-core, 2-core, 4-core, and 6-core configurations (each
>    core has 2 threads using HT)
>    MB: ASUS P6X58D Premium (Intel X58)
>    NIC: Intel Ethernet X520-DA2 Server Adapter (82599)
>
>  - Linux node
>    CPU: Core 2 Quad (4 threads)
>    MB: GIGABYTE GA-G33-DS3R (Intel G33)
>    NIC: Intel Ethernet X520-DA2 Server Adapter (82599)
>
>  - iperf
>    Linux node: iperf -c [IP] -i 10 -t 100000 -P16
>    FreeBSD node: iperf -s
>    # 16 threads, TCP
>  - system parameters
>    net.bpf.maxbufsize=268435456
>    hw.ixgbe.num_queues=[n queues]
>
> * 2 threads, 2 queues
>  - iperf throughput
>    iperf only: 8.845Gbps
>    test_mqbpf: 5.78Gbps
>    test_sqbpf: 6.89Gbps
>  - test program throughput
>    test_mqbpf: 4526.863414 Mbps
>    test_sqbpf: 762.452475 Mbps
>  - received/dropped
>    test_mqbpf:
>      45315011 packets received (BPF)
>      9646958 packets dropped (BPF)
>    test_sqbpf:
>      56216145 packets received (BPF)
>      49765127 packets dropped (BPF)
>
> * 4 threads, 4 queues
>  - iperf throughput
>    iperf only: 3.03Gbps
>    test_mqbpf: 2.49Gbps
>    test_sqbpf: 2.57Gbps
>  - test program throughput
>    test_mqbpf: 2420.195051 Mbps
>    test_sqbpf: 430.774870 Mbps
>  - received/dropped
>    test_mqbpf:
>      19601503 packets received (BPF)
>      0 packets dropped (BPF)
>    test_sqbpf:
>      22803778 packets received (BPF)
>      18869653 packets dropped (BPF)
>
> * 8 threads, 8 queues
>  - iperf throughput
>    iperf only: 5.80Gbps
>    test_mqbpf: 4.42Gbps
>    test_sqbpf: 4.30Gbps
>  - test program throughput
>    test_mqbpf: 4242.314913 Mbps
>    test_sqbpf: 1291.719866 Mbps
>  - received/dropped
>    test_mqbpf:
>      34996953 packets received (BPF)
>      361947 packets dropped (BPF)
>    test_sqbpf:
>      35738058 packets received (BPF)
>      24749546 packets dropped (BPF)
>
> * 12 threads, 12 queues
>  - iperf throughput
>    iperf only: 9.31Gbps
>    test_mqbpf: 8.06Gbps
>    test_sqbpf: 5.67Gbps
>  - test program throughput
>    test_mqbpf: 8089.242472 Mbps
>    test_sqbpf: 5754.910665 Mbps
>  - received/dropped
>    test_mqbpf:
>      73783957 packets received (BPF)
>      9938 packets dropped (BPF)
>    test_sqbpf:
>      49434479 packets received (BPF)
>      0 packets dropped (BPF)
>


