Date:      Mon, 8 Aug 2011 21:31:27 -0400
From:      George Neville-Neil <gnn@freebsd.org>
To:        Takuya ASADA <syuu@dokukino.com>
Cc:        "Robert N. M. Watson" <rwatson@freebsd.org>, soc-status@freebsd.org, Kazuya Goda <gockzy@gmail.com>
Subject:   Re: [mq_bpf] status report #9
Message-ID:  <7FB7BCF6-5224-420D-85FA-3B82F1407E93@freebsd.org>
In-Reply-To: <CALG4x-UdHdg6NYgvrD986_kPeyYLR3KmJ8ijOLr+kQ-8_SaByA@mail.gmail.com>
References:  <CALG4x-UdHdg6NYgvrD986_kPeyYLR3KmJ8ijOLr+kQ-8_SaByA@mail.gmail.com>


On Jul 27, 2011, at 19:11, Takuya ASADA wrote:

> * Project summary
> The project goal is to support multiqueue network interfaces in BPF
> and to provide interfaces for multithreaded packet processing using
> BPF. Modern high-performance NICs have multiple receive/send queues
> and RSS support, which allows packets to be processed concurrently
> on multiple processors.
> The main purpose of the project is to support such hardware and to
> benefit from that parallelism.
>
> Here's a status update from last week:
> * Throughput benchmark
> - Test environment
>     CPU: Core i7 X980
>     MB: ASUS P6X58D Premium (Intel X58)
>     NIC: Intel Gigabit ET Dual Port Server Adapter (82576)
>
> - Benchmark programs
> test_sqbpf is a single-threaded BPF benchmark that uses only the
> existing bpf ioctls. It fetches all packets from a NIC and writes
> them to a file.
>
> test_mqbpf is a multithreaded BPF benchmark that uses the new
> multiqueue bpf ioctls. Each thread fetches packets only from its
> pinned queue and writes them to a separate per-thread file.
>
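A minimal sketch of the per-queue reader loop test_mqbpf describes. The
BIOCSETQUEUE name and ioctl number below are placeholders for the
branch's actual multiqueue ioctls, which this report doesn't show; the
remaining calls are the stock BPF interface. Error handling is omitted.

/*
 * Per-thread reader: open a BPF descriptor, bind it to the NIC, pin it
 * to one RX queue (placeholder ioctl), and drain packets to a file.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <net/bpf.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BIOCSETQUEUE _IOW('B', 200, u_int) /* placeholder, not the real ioctl */

static void
queue_reader(const char *ifname, u_int queue, const char *outpath)
{
	struct ifreq ifr;
	u_int blen;
	char *buf;
	ssize_t n;
	int bpf, out;

	bpf = open("/dev/bpf", O_RDONLY);
	ioctl(bpf, BIOCGBLEN, &blen);		/* kernel buffer size */
	buf = malloc(blen);

	memset(&ifr, 0, sizeof(ifr));
	strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
	ioctl(bpf, BIOCSETIF, &ifr);		/* attach to the NIC */
	ioctl(bpf, BIOCSETQUEUE, &queue);	/* pin to one RX queue */

	out = open(outpath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	while ((n = read(bpf, buf, blen)) > 0)	/* drain only this queue */
		write(out, buf, n);
}

test_mqbpf would then start one such reader thread per hardware queue,
so no single descriptor funnels traffic from all queues.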
> - Test conditions
> iperf used for generate network traffic, with following argument =
options
>     test node: iperf -s -i1
>     other node:  iperf -c [IP] -i1 -t 100000 -P8
>     # 8 threads, TCP
>
> Four kernels were tested for comparison:
>     current: GENERIC kernel on CURRENT, BPFIF_LOCK: mtx, BPFQ_LOCK: does not exist
>     mq_bpf1: RSS kernel on mq_bpf, BPFIF_LOCK: mtx, BPFQ_LOCK: mtx
>     mq_bpf2: RSS kernel on mq_bpf, BPFIF_LOCK: mtx, BPFQ_LOCK: rmlock
>     mq_bpf3: RSS kernel on mq_bpf, BPFIF_LOCK: rmlock, BPFQ_LOCK: rmlock
>
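The step from mq_bpf1 to mq_bpf3 is essentially a swap of the hot-path
lock primitive. A read-mostly lock (rmlock(9)) makes the shared read
side nearly free while pushing the cost onto rare writers, which fits
BPF's access pattern: every packet takes the lock shared, while
descriptor attach/detach is infrequent. A kernel-side sketch of that
pattern, with illustrative names rather than the branch's actual
symbols:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/rmlock.h>

static struct rmlock bpfq_lock;

static void
bpfq_lock_init(void)
{
	rm_init(&bpfq_lock, "bpfq");
}

/* Per-packet delivery path: read locks are cheap and scale across CPUs. */
static void
bpfq_deliver(void)
{
	struct rm_priotracker tracker;

	rm_rlock(&bpfq_lock, &tracker);
	/* ... hand the packet to listeners on this queue ... */
	rm_runlock(&bpfq_lock, &tracker);
}

/* Rare configuration path: writers pay the cost instead. */
static void
bpfq_update(void)
{
	rm_wlock(&bpfq_lock);
	/* ... attach or detach a descriptor ... */
	rm_wunlock(&bpfq_lock);
}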
> - Benchmark results (MB/s)
> Each result is the average of 20 runs of test_sqbpf / test_mqbpf.
> 		test_sqbpf	test_mqbpf
> current	26.65568315	-
> mq_bpf1	24.96387975	36.608574
> mq_bpf2	27.13427415	41.76666665
> mq_bpf3	27.0958332	51.48198915


This looks good, and it looks as if the performance scales linearly.
Were the test programs cpuset to each core?  Is the test code in the
p4 tree yet?
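For reference, the pinning can be done from outside with cpuset(1),
e.g. "cpuset -l 0 ./test_mqbpf", or inside the benchmark itself; a
minimal in-process sketch (the helper name is illustrative, not part
of the benchmark as posted):

#include <sys/param.h>
#include <sys/cpuset.h>
#include <pthread.h>
#include <pthread_np.h>

/*
 * Pin the calling thread to a single CPU so each reader thread stays
 * on the core that services its queue.
 */
static int
pin_to_cpu(int cpu)
{
	cpuset_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return (pthread_setaffinity_np(pthread_self(), sizeof(set), &set));
}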

Best,
George



