From: "Scott Hess" <scott@avantgo.com>
To: freebsd-hackers@FreeBSD.ORG
Subject: Performance issue with rfork() and single socketpairs versus multiple socketpairs.
Date: Mon, 24 Jan 2000 10:11:03 -0800

I've found an odd performance issue that I cannot explain.  I'm using
socketpairs to communicate with multiple rfork(RFPROC) processes.
Initially, I used a separate socketpair to communicate requests to each
process, with locking in the parent to synchronize access to each client.
I realized that by using a single shared socketpair, I could do away with
all the per-process socketpairs, and perhaps also improve performance by
allowing more requests to be dispatched than there were processes to
handle them: whenever a worker process finished one request, it could
immediately start the next, without having to wait for the parent to
receive the response and reprocess the request structures.

Unfortunately, I've found that a group of processes each reading from its
own socketpair performs better than the same group all reading from a
single socketpair, and I've been unable to determine why.  I've reduced
the problem to a simple program, included as an attachment (sorry about
that).  The results of two runs of the program:

ganja% time ./commtest --single
./commtest --single  0.00s user 0.66s system 15% cpu 4.132 total
ganja% time ./commtest --multi
./commtest --multi  0.00s user 0.46s system 68% cpu 0.675 total

Note that in the --single case, the system time rises a bit, but the
wallclock time rises a _lot_.  At first I thought this was a variant of
the "thundering herd" problem, but the CPU times don't seem to bear that
out.  Any ideas?

This is running under 3.2-RELEASE on an SMP machine, though I saw the
same results on 3.4-RELEASE.

Thanks,
scott

[Attachment: commtest.c]

// commtest.c
// gcc -Wall -g -o commtest commtest.c
//
// Test performance differences for multiple socketpairs versus a
// single shared socketpair.
#include <sys/types.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef unsigned char request_t;

#define CLIENT_EXIT ((request_t)(~0))
#define CLIENT_COUNT 32
#define REQUEST_TARGET 10000

int client_fd_count=0;
int client_fds[ CLIENT_COUNT];
int server_fds[ CLIENT_COUNT];

/* Reflect requests. */
void client( int fd)
{
    request_t request;
    int rc;

    while( 1) {
        if( (rc=read( fd, &request, sizeof( request)))==-1) {
            perror( "client read");
            _exit( 1);
        } else if( rc<sizeof( request)) {
            /* ... */
        }
        /* ... */
    }
}

/* [The rest of client() and the first part of main() did not survive
   the list archive, which dropped the text between '<' and '>'
   characters; the surviving fragments of main() follow.] */

        if( client_fds[ ii]>maxfd) {
            maxfd=client_fds[ ii];
        }
    }

    /* Spin off children to process requests. */
    for( ii=0; ii<CLIENT_COUNT; ii++) {
        /* ... (the remainder of the file was lost as well) ... */
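
Since the attachment comes through the archive mangled, here is a
minimal, self-contained sketch of the two layouts being compared: a
fixed pool of workers created with rfork(RFPROC), each echoing one-byte
requests back to the parent, driven either through one shared socketpair
(--single) or through one socketpair per worker multiplexed with
select() (--multi).  This is not the original commtest.c; the names
worker(), run_single() and run_multi(), the constants WORKERS, REQUESTS
and EXIT_REQ, and the exact dispatch loops are one plausible
implementation, not Scott's.  On systems without rfork(), plain fork()
can stand in for rfork(RFPROC) here.

/* sketch.c - illustrative only, not the original commtest.c.
 * cc -Wall -o sketch sketch.c      (FreeBSD; fork() works in place of
 * rfork(RFPROC) on systems that lack rfork.) */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <err.h>
#include <string.h>
#include <unistd.h>

#define WORKERS  32                 /* worker processes */
#define REQUESTS 10000              /* one-byte requests to dispatch */
#define EXIT_REQ ((unsigned char)~0)

/* Echo one-byte requests on the same descriptor until EXIT_REQ. */
static void worker( int fd)
{
    unsigned char req;

    while( read( fd, &req, 1)==1) {
        if( req==EXIT_REQ)
            _exit( 0);
        if( write( fd, &req, 1)!=1)
            _exit( 1);
    }
    _exit( 1);
}

/* --single: all workers read from one end of a single shared pair. */
static void run_single( void)
{
    int sp[ 2], ii, sent=0, got=0;
    unsigned char b=1;

    if( socketpair( AF_UNIX, SOCK_STREAM, 0, sp)==-1)
        err( 1, "socketpair");
    for( ii=0; ii<WORKERS; ii++)
        if( rfork( RFPROC)==0)
            worker( sp[ 1]);
    while( got<REQUESTS) {
        /* Keep up to WORKERS requests in flight, then reap a reply. */
        while( sent<REQUESTS && sent-got<WORKERS) {
            write( sp[ 0], &b, 1); sent++;
        }
        read( sp[ 0], &b, 1); got++;
    }
    b=EXIT_REQ;
    for( ii=0; ii<WORKERS; ii++)
        write( sp[ 0], &b, 1);
}

/* --multi: one pair per worker, multiplexed in the parent by select(). */
static void run_multi( void)
{
    int sp[ WORKERS][ 2], ii, maxfd=0, sent=0, got=0;
    unsigned char b=1;
    fd_set rfds;

    for( ii=0; ii<WORKERS; ii++) {
        if( socketpair( AF_UNIX, SOCK_STREAM, 0, sp[ ii])==-1)
            err( 1, "socketpair");
        if( sp[ ii][ 0]>maxfd)
            maxfd=sp[ ii][ 0];
        if( rfork( RFPROC)==0)
            worker( sp[ ii][ 1]);
    }
    for( ii=0; ii<WORKERS && sent<REQUESTS; ii++, sent++)
        write( sp[ ii][ 0], &b, 1);         /* prime each worker */
    while( got<REQUESTS) {
        FD_ZERO( &rfds);
        for( ii=0; ii<WORKERS; ii++)
            FD_SET( sp[ ii][ 0], &rfds);
        if( select( maxfd+1, &rfds, NULL, NULL, NULL)==-1)
            err( 1, "select");
        for( ii=0; ii<WORKERS; ii++) {
            if( !FD_ISSET( sp[ ii][ 0], &rfds))
                continue;
            read( sp[ ii][ 0], &b, 1); got++;
            if( sent<REQUESTS) {
                write( sp[ ii][ 0], &b, 1); sent++;
            }
        }
    }
    b=EXIT_REQ;
    for( ii=0; ii<WORKERS; ii++)
        write( sp[ ii][ 0], &b, 1);
}

int main( int argc, char **argv)
{
    if( argc>1 && strcmp( argv[ 1], "--multi")==0)
        run_multi();
    else
        run_single();
    while( wait( NULL)!=-1);                /* reap the workers */
    return 0;
}

Timing the two modes with the shell's time builtin, the same way as the
transcript above, should let you compare the single shared socketpair
against the per-worker socketpairs on your own kernel.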