Date:      Mon, 14 Jun 2004 08:38:57 -0400
From:      Ed Maste <emaste@sandvine.com>
To:        'Sergey Lyubka' <devnull@uptsoft.com>, freebsd-hackers@freebsd.org
Subject:   RE: memory mapped packet capturing - bpf replacement ?
Message-ID:  <FE045D4D9F7AED4CBFF1B3B813C8533701BD40C7@mail.sandvine.com>

> The module is a netgraph node, called ng_mmq. mmq stands for
> memory-mapped queue. The node has one hook, called "input".
> When this hook is connected,
> 	o memory buffer is allocated. size is controlled by the
> 	debug.mmq_size sysctl.
> 	o a device /dev/mmqX is created, where X is a node ID
> 	o /dev/mmqX is mmap-able by the user, mmap() returns an
> 	allocated buffer
> 	o when packet arrives on hook, it is copied to the buffer,
> 	which is actually a ringbuffer. The ringbuffer's head is
> 	advanced.
> 	o user spins until tail != head, which means new data arrived.
> 	Then it reads from ringbuffer, and advances the tail.
> 	o no mutexes are used
> 
> The code is at 
> 
> So this is the basic idea. I connected ng_mmq node to my rl0:
> ethernet node via the ng_hub, and benchmarked it against the
> pcap, using the same pcap callback function. Packet processing was
> simulated by the delay() function that just takes some CPU cycles.
> What I have found is:
> 	1. bpf seems to be faster, i.e. it drops less packets than mmq
> 	2. mmq seems to capture more packets.
> 
> This is sample output from the benchmark utility:
> # ./benchmark rl0 /dev/mmq5 1000
> pcap: rcvd: 15061, dropped: 14047, seen: 1000
> mmq: rcvd: 23172, dropped: 21789, seen: 1000
> 
> Now, the questions:
> 	1. is my interpretation of benchmark results correct?
> 	2. if they are correct, why bpf is faster?
> 	3. is it OK to have no mutexes for ringbuffer operations ?

Hello Sergey.  I haven't looked at your code, but I'll provide
some comments, having implemented an mmap'ed ring buffer BPF
replacement myself.

First off, you should be able to do significantly better than
vanilla BPF.  Gigabit line rate is doable with "reasonably" sized
packets on good hardware.

Watch how much time you spend in your simulated packet
processing.  I also needed to add a delay to my
benchmarking, because without it I'd run into the hardware
limit (i.e. 1 Gbps), hiding the effects of further tweaking.
However, if the delay is too long it will swamp the
bpf/ringbuffer overhead, making your results less useful.

I did my benchmark by increasing the packet rate until I found
the point at which packets started to be dropped.  
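
As a rough illustration of that procedure (not the benchmark
actually used for either implementation), the sketch below ramps
the offered rate and reports the last rate that saw no drops.
All names are hypothetical: trial_fn stands in for "send at this
rate and report the drop count", and fake_trial is a toy model
that drops everything above a fixed capacity, included only so
the loop can be exercised.

```c
#include <stdint.h>

/* Hypothetical callback: offer traffic at `pps` packets per second
 * and return how many packets the capture path dropped. */
typedef uint64_t (*trial_fn)(uint64_t pps, void *ctx);

/* Ramp the offered rate from `start` to `limit` in increments of
 * `step`.  Returns the highest rate tested with zero drops, or 0 if
 * even the first rate dropped packets. */
static uint64_t
max_lossless_rate(trial_fn run, void *ctx,
    uint64_t start, uint64_t step, uint64_t limit)
{
	uint64_t best = 0;

	if (step == 0)
		return (0);		/* would never terminate */
	for (uint64_t pps = start; pps <= limit; pps += step) {
		if (run(pps, ctx) != 0)
			break;		/* first rate with drops */
		best = pps;
	}
	return (best);
}

/* Toy stand-in for a real trial: `ctx` points to a capacity in pps,
 * and anything offered above that capacity is counted as dropped. */
static uint64_t
fake_trial(uint64_t pps, void *ctx)
{
	uint64_t capacity = *(const uint64_t *)ctx;

	return (pps > capacity ? pps - capacity : 0);
}
```

In a real run the callback would drive a traffic generator and
read the drop counters, and the same per-packet delay would be
used for both the bpf and the ringbuffer path so that the two
thresholds are comparable.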

In my testing I found the call to microtime() to be quite
expensive.  (It will vary depending on which timecounter is 
being used.)

Is this in an SMP or uniprocessor environment?  I think your gain
from a ringbuffer interface will be more significant in the SMP
case.

Does the ng_hub cause the packet to be copied?  If so you've 
still got the same number of copies as vanilla BPF.

Are you using the same snap length (or copying the entire packet)
in each case?

As for question 3, be careful that you're atomically modifying
the head and tail indices/pointers.  But yes, you can do it 
without a mutex.
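
To make the point about atomic index updates concrete, here is a
minimal single-producer/single-consumer sketch in userland C11.
It is an assumption-laden illustration, not the ng_mmq code or
my own implementation: the names (struct ring, ring_put,
ring_get), the slot layout, and the sizes are all made up, and
kernel code would use the kernel's own atomic primitives rather
than <stdatomic.h>.  The idea is that each index has exactly one
writer, and each side publishes its index with a release store
only after it has finished touching the slot.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 8u	/* hypothetical; must be a power of two */
#define SLOT_BYTES 64u	/* hypothetical per-packet snap length */

_Static_assert((RING_SLOTS & (RING_SLOTS - 1)) == 0,
    "RING_SLOTS must be a power of two");

struct ring {
	_Atomic uint32_t head;	/* written only by the producer */
	_Atomic uint32_t tail;	/* written only by the consumer */
	uint32_t dropped;	/* producer-private drop counter */
	struct {
		uint32_t len;
		uint8_t data[SLOT_BYTES];
	} slot[RING_SLOTS];
};

static void
ring_init(struct ring *r)
{
	atomic_init(&r->head, 0);
	atomic_init(&r->tail, 0);
	r->dropped = 0;
}

/* Producer: copy the packet in, then publish it by advancing head.
 * Returns 1 on success, 0 if the ring was full (packet dropped). */
static int
ring_put(struct ring *r, const void *pkt, uint32_t len)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SLOTS) {	/* full */
		r->dropped++;
		return (0);
	}
	if (len > SLOT_BYTES)
		len = SLOT_BYTES;		/* truncate to snap length */
	r->slot[head % RING_SLOTS].len = len;
	memcpy(r->slot[head % RING_SLOTS].data, pkt, len);
	/* Release: the slot contents become visible before the new head. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return (1);
}

/* Consumer: copy one packet out, then free the slot by advancing
 * tail.  `buf` must hold SLOT_BYTES.  Returns 1 if a packet was
 * read, 0 if the ring was empty. */
static int
ring_get(struct ring *r, void *buf, uint32_t *len)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (tail == head)			/* empty */
		return (0);
	*len = r->slot[tail % RING_SLOTS].len;
	memcpy(buf, r->slot[tail % RING_SLOTS].data, *len);
	/* Release: the copy-out is finished before the slot is reused. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return (1);
}
```

The indices run freely and are reduced modulo the slot count only
when used, so head - tail is the fill level even across 32-bit
wraparound.  This only holds with exactly one producer and one
consumer; a second writer on either index would need a
compare-and-swap or a lock.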

-ed


