From owner-p4-projects@FreeBSD.ORG Mon Dec 24 20:20:51 2007
Date: Mon, 24 Dec 2007 20:20:49 GMT
Message-Id: <200712242020.lBOKKnrh007550@repoman.freebsd.org>
From: Robert Watson
To: Perforce Change Reviews
Subject: PERFORCE change 131540 for review
List-Id: p4 projects tree changes

http://perforce.freebsd.org/chv.cgi?CH=131540

Change 131540 by rwatson@rwatson_cinnamon on 2007/12/24 20:20:17

	Rewrite buffering portions of the man page.

Affected files ...

.. //depot/projects/zcopybpf/src/share/man/man4/bpf.4#5 edit

Differences ...

==== //depot/projects/zcopybpf/src/share/man/man4/bpf.4#5 (text+ko) ====

@@ -89,53 +89,18 @@
 Each descriptor that accepts the packet receives its own copy.
 .Pp
 .Nm
-devices operate in one of two buffering modes: buffered
+devices operate in one of two buffering modes:
 .Xr read 2 ,
-in which packet data is copied from the kernel explicitly using the
+and zero-copy.
+In buffered read mode, packet data is copied explicitly from the kernel to
+user memory buffers using the
 .Xr read 2
-system call, and zero-copy buffer mode, in which the user process provides
-two memory regions that
-.Nm
-will write to directly as the packets are accepted.
-The buffering mode may be set with the
-.Dv BIOCSETBUFMODE
-ioctl (see below), and will default to buffered
-.Xr read 2
-mode
-.Dv ( BPF_BUFMODE_BUFFER )
-by default.
-Buffers return the next group of packets that have matched the filter.
+system call.
+In zero-copy buffering mode, the kernel writes packet data directly into
+shared memory buffers provided by the user application.
 Note that an individual packet larger than the buffer size is necessarily
 truncated.
 .Pp
-In the case of buffered
-.Xr read 2 ,
-the user process will declare a fixed buffer size that will be used both for
-sizing internal buffers and for all
-.Xr read 2
-operations on the file.
-This size is returned by the
-.Dv BIOCGBLEN
-ioctl (see below), and
-can be set with
-.Dv BIOCSBLEN .
-.Pp
-In zero-copy buffering, the user process registers two memory buffers with
-.Nm
-via the
-.Dv BIOCSETZBUF
-ioctl (see below).
-The user process may monitor for completion (filling) of a buffer, at which
-point the memory contents of the buffer will be stable until the buffer is
-returned for further kernel use using the
-.Dv BIOCACKZBUF
-ioctl.
-Buffers will be of a fixed (and equal) size, be
-page-aligned, and the size must be an integer multiple of the page size.
-The maximum zero-copy buffer size is returned by the
-.Dv BIOCGETZMAX
-ioctl (see below).
-.Pp
 The packet filter will support any link level protocol that has fixed length
 headers.
 Currently, only Ethernet,
@@ -156,6 +121,144 @@
 Currently, only writes to Ethernets and
 .Tn SLIP
 links are supported.
+.Sh BUFFER MODES
+.Nm
+devices deliver packet data to the application via memory buffers provided by
+the application.
+The buffer mode is set using the
+.Dv BIOCSETBUFMODE
+ioctl, and read using the
+.Dv BIOCGETBUFMODE
+ioctl.
+.Ss Buffered read mode
+By default,
+.Nm
+devices operate in the
+.Dv BPF_BUFMODE_BUFFER
+mode, in which packet data is copied explicitly from the kernel to user
+memory using the
+.Xr read 2
+system call.
+The user process will declare a fixed buffer size that will be used both for
+sizing internal buffers and for all
+.Xr read 2
+operations on the file.
+This size is queried using the
+.Dv BIOCGBLEN
+ioctl, and is set using the
+.Dv BIOCSBLEN
+ioctl.
+.Ss Zero-copy buffer mode
+.Nm
+devices may also operate in the
+.Dv BPF_BUFMODE_ZBUF
+mode, in which packet data is written directly by the kernel to memory
+buffers provided by the process, avoiding both system call and memory
+copying overhead.
+Buffers are of fixed (and equal) size, page-aligned, and an integer multiple of
+the page size.
+The maximum zero-copy buffer size is returned by the
+.Dv BIOCGETZMAX
+ioctl.
+.Pp
+The user process registers two memory buffers using the
+.Dv BIOCSETZBUF
+ioctl, which accepts a
+.Vt struct bpf_zbuf
+pointer as an argument:
+.Bd -literal
+struct bpf_zbuf {
+	void *bz_bufa;
+	void *bz_bufb;
+	size_t bz_buflen;
+};
+.Ed
+.Pp
+.Vt bz_bufa
+is a pointer to the userspace address of the first buffer that will be
+filled, and
+.Vt bz_bufb
+is a pointer to the second buffer.
+.Nm
+will then cycle between the two buffers.
+.Pp
+Buffer memory begins with a short, fixed-length header holding
+synchronization and data length information for the buffer:
+.Bd -literal
+struct bpf_zbuf_header {
+	volatile u_int	bzh_kernel_gen;	/* Kernel generation number. */
+	volatile u_int	bzh_kernel_len;	/* Length of buffer. */
+	volatile u_int	bzh_user_gen;	/* User generation number. */
+	/* ...padding for future use... */
+};
+.Ed
+.Pp
+This is followed immediately by packet data, laid out as described below.
+.Pp
+The kernel and the user process follow a simple acknowledgement protocol
+using shared memory and ioctls to synchronize access to the two buffers.
+Ownership of the buffer is signaled using the kernel and user generation
+numbers in shared memory: the kernel modifies
+.Vt bzh_kernel_gen
+to assign ownership to userspace, and the user process sets
+.Vt bzh_user_gen
+to the value in
+.Vt bzh_kernel_gen
+to acknowledge the buffer and return it to kernel ownership.
+While the kernel owns the buffer, the contents are unstable and will change
+asynchronously; while the user process owns the buffer, its contents are
+considered stable and will not be changed until the buffer is acknowledged.
+The user process will initialize the
+.Vt struct bpf_zbuf_header
+to all 0's before registering the buffer, assigning initial ownership to the
+kernel.
+.Pp
+In order to avoid caching and memory re-ordering effects, the user process
+must use appropriate atomic operations and memory barriers when checking for
+and acknowledging buffers:
+.Bd -literal
+#include <machine/atomic.h>
+
+/*
+ * Return ownership of a buffer to the kernel for reuse.
+ */
+static void
+buffer_acknowledge(struct bpf_zbuf_header *bzh)
+{
+
+	atomic_store_rel_int(&bzh->bzh_user_gen, bzh->bzh_kernel_gen);
+}
+
+/*
+ * Check whether a buffer has been assigned to userspace by the kernel.
+ * Return true if userspace owns the buffer, and false otherwise.
+ */
+static int
+buffer_check(struct bpf_zbuf_header *bzh)
+{
+
+	return (bzh->bzh_user_gen !=
+	    atomic_load_acq_int(&bzh->bzh_kernel_gen));
+}
+.Ed
+.Pp
+The user process may force the assignment of the next buffer, if any data
+is pending, to userspace using the
+.Dv BIOCROTZBUF
+ioctl.
+This allows the user process to retrieve data in a partially filled buffer
+before the buffer is completed, such as following a timeout; the process must
+still check to see if ownership has been assigned using the header generation
+numbers, as the buffer will not be assigned if there is no data available.
+.Pp
+As in the read buffering mode,
+.Xr kqueue 2 ,
+.Xr poll 2 ,
+and
+.Xr select 2
+may be used to sleep awaiting the availability of a completed buffer.
+They will return a readable file descriptor once at least one buffer is
+assigned to user space.
 .Sh IOCTLS
 The
 .Xr ioctl 2
@@ -418,27 +521,9 @@
 .Nm
 buffering mode; possible values are
 .Dv BPF_BUFMODE_BUFFER ,
-buffered
-.Xr read 2
-mode, and
+buffered read mode, and
 .Dv BPF_BUFMODE_ZBUF ,
 zero-copy buffer mode.
-.It Dv BIOCACKZBUF
-.Pq Li struct bpf_zbuf
-Return a completed zero-copy buffer to the kernel for reuse.
-The following structure is used as an argument to these and other zero-copy
-buffer ioctls:
-.Bd -literal
-struct bpf_zbuf {
-	void *bz_bufa;
-	void *bz_bufb;
-	size_t bz_buflen;
-};
-.Ed
-.Pp
-Only the
-.Vt bz_bufa
-field will be used with this ioctl.
 .It Dv BIOCGETZBUF
 .It Dv BIOCSETZBUF
 .Pq Li struct bpf_zbuf
@@ -455,6 +540,7 @@
 and
 .Vt bz_buflen
 must be filled out.
+If buffers have already been set for this device, the ioctl will fail.
 .It Dv BIOCGETZMAX
 .Pq Li size_t
 Get the largest individual zero-copy buffer size allowed.
@@ -464,38 +550,23 @@
 buffer size, especially when there are multiple
 .Nm
 descriptors in use on 32-bit systems.
-.It Dv BIOCGETZNEXT
 .It Dv BIOCROTZBUF
-.Pq Li struct bpf_zbuf
-Get the buffer pointer and length of the next zero-copy buffer buffer ready
-for userspace use, or
-.Dv NULL
-if there is no pending buffer.
-.Pp
-.Dv BIOCGETZNEXT
-queries for the next completely filled buffer ready for immediate use,
-returning NULL if there are only empty or partially filled buffers available.
-.Pp
-.Dv BIOCROTZBUF
-queries for a filled buffer, but in the event there is only a partially
-filled buffer, will make that buffer available for userspace to use
-immediately.
+Force ownership of the next buffer to be assigned to userspace, if any data
+is present in the buffer.
+If no data is present, the buffer will remain owned by the kernel.
+If userspace already owns the buffer, this operation will be a no-op.
 This allows consumers of zero-copy buffering to implement timeouts and
 retrieve partially filled buffers.
-.Dv BIOCROTZBUF
-will return
-.Dv NULL
-only if no data is present in either of the zero-copy buffers.
-.Pp
-Only the
-.Vt bz_bufa
-and
-.Vt bz_buflen
-fields will be used with this ioctl.
+In order to handle the case where no data is present in the buffer and
+therefore ownership is not assigned, the user process must check
+.Vt bzh_kernel_gen
+against
+.Vt bzh_user_gen .
 .El
 .Sh BPF HEADER
 The following structure is prepended to each packet returned by
-.Xr read 2 :
+.Xr read 2
+or via a zero-copy buffer:
 .Bd -literal
 struct bpf_hdr {
 	struct timeval	bh_tstamp;	/* time stamp */
@@ -861,6 +932,9 @@
 .Sh SEE ALSO
 .Xr tcpdump 1 ,
 .Xr ioctl 2 ,
+.Xr kqueue 2 ,
+.Xr poll 2 ,
+.Xr select 2 ,
 .Xr byteorder 3 ,
 .Xr ng_bpf 4 ,
 .Xr bpf 9
@@ -893,6 +967,10 @@
 Summer 1990.
 Much of the design is due to
 .An Van Jacobson .
+.Pp
+Support for zero-copy buffers was added by
+.An Robert N. M. Watson
+under contract to Seccuris Inc.
 .Sh BUGS
 The read buffer must be of a fixed size (returned by the
 .Dv BIOCGBLEN
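
Review note: the generation-number handshake described in the new "Zero-copy buffer mode" text can be tried outside the kernel. What follows is only a minimal sketch under stated assumptions, not the code in this change. It uses portable C11 atomics in place of the FreeBSD atomic operations shown in the man page. The names sim_zbuf_header, sim_buffer_check, sim_buffer_acknowledge and sim_kernel_assign are hypothetical. The kernel side in particular is a guess at behaviour consistent with the text (bump the kernel generation to hand a buffer over), not the actual implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct bpf_zbuf_header; not the kernel's type. */
struct sim_zbuf_header {
	atomic_uint kernel_gen;	/* changed by the simulated kernel */
	atomic_uint kernel_len;	/* bytes of packet data in the buffer */
	atomic_uint user_gen;	/* set by the user process to acknowledge */
};

/* Userspace owns the buffer whenever the two generation numbers differ. */
static bool
sim_buffer_check(struct sim_zbuf_header *h)
{
	return (atomic_load_explicit(&h->user_gen, memory_order_relaxed) !=
	    atomic_load_explicit(&h->kernel_gen, memory_order_acquire));
}

/* Return the buffer to the kernel by copying kernel_gen into user_gen. */
static void
sim_buffer_acknowledge(struct sim_zbuf_header *h)
{
	unsigned int gen;

	gen = atomic_load_explicit(&h->kernel_gen, memory_order_relaxed);
	atomic_store_explicit(&h->user_gen, gen, memory_order_release);
}

/*
 * Simulated kernel side (an assumption for illustration): record the
 * data length, then bump kernel_gen to hand the buffer to userspace.
 */
static void
sim_kernel_assign(struct sim_zbuf_header *h, unsigned int len)
{
	assert(!sim_buffer_check(h));	/* the kernel must own the buffer */
	atomic_store_explicit(&h->kernel_len, len, memory_order_relaxed);
	atomic_fetch_add_explicit(&h->kernel_gen, 1, memory_order_release);
}
```

With a zeroed header the buffer starts out kernel-owned. After sim_kernel_assign() the check reports userspace ownership, and sim_buffer_acknowledge() flips it back. The release store and acquire load stand in for the atomic_store_rel_int() and atomic_load_acq_int() calls in the man page example.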