From owner-freebsd-bugs@FreeBSD.ORG Sun Nov 4 12:57:26 2007 Return-Path: Delivered-To: freebsd-bugs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9865A16A468; Sun, 4 Nov 2007 12:57:26 +0000 (UTC) (envelope-from iedowse@iedowse.com) Received: from nowhere.iedowse.com (nowhere.iedowse.com [IPv6:2001:770:13b::1]) by mx1.freebsd.org (Postfix) with SMTP id A9AED13C4B6; Sun, 4 Nov 2007 12:57:25 +0000 (UTC) (envelope-from iedowse@iedowse.com) Received: from localhost ([127.0.0.1] helo=iedowse.com) by nowhere.iedowse.com via local-iedowse id ; 4 Nov 2007 12:57:24 +0000 (GMT) To: Danny Braniss In-reply-to: References: Comments: In-reply-to Danny Braniss message dated "Sun, 04 Nov 2007 13:46:49 +0200." Date: Sun, 04 Nov 2007 12:57:23 +0000 From: Ian Dowse Message-ID: <200711041257.aa85528@nowhere.iedowse.com> Cc: freebsd-bugs@FreeBSD.org, freebsd-gnats-submit@FreeBSD.org, iedowse@FreeBSD.org Subject: Re: bin/117603: [patch] dump(8) hangs on SMP - 4way and higher. X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 04 Nov 2007 12:57:26 -0000 Hi, It's probably better to keep the caught variable, since isn't there a risk of a different signal completing the sigsuspend(), e.g. SIGINFO caused by Ctrl-T? This is why the patch I posted has a while loop around the sigsuspend call - it presumbably was an issue with the original pause() call too. Would you be able to test the patch I posted? I haven't yet found a setup here that triggers the original problem, so it would be good to know if using sigsuspend alone is enough. It wasn't clear to me what exactly was involved in the original race - the use of volatiles and the setjmp()/longjmp(), while fairly horrible, did appear to cover the more obvious race conditions. Ian In message , Danny Braniss writes: >This is a multipart MIME message. > >--==_Exmh_1194176785_547800 >Content-Type: text/plain; charset=us-ascii > >wups, sent an old diffs. > > >--==_Exmh_1194176785_547800 >Content-Type: text/plain ; name="dump.c.diffs"; charset=us-ascii >Content-Description: dump.c.diffs >Content-Disposition: attachment; filename="dump.c.diffs" > >--- tape.c.orig Wed Mar 2 04:30:08 2005 >+++ tape.c Sun Nov 4 13:42:55 2007 >@@ -109,11 +109,6 @@ > > int master; /* pid of master, for sending error signals */ > int tenths; /* length of tape used per block written */ >-static volatile sig_atomic_t caught; /* have we caught the signal to proceed? > */ >-static volatile sig_atomic_t ready; /* reached the lock point without having >*/ >- /* received the SIGUSR2 signal from the prev slave? */ >-static jmp_buf jmpbuf; /* where to jump to if we are ready when the */ >- /* SIGUSR2 arrives from the previous slave */ > > int > alloctape(void) >@@ -685,15 +680,13 @@ > void > proceed(int signo __unused) > { >- >- if (ready) >- longjmp(jmpbuf, 1); >- caught++; >+ // do nothing ... > } > > void > enslave(void) > { >+ sigset_t s_mask; > int cmd[2]; > int i, j; > >@@ -704,13 +697,11 @@ > signal(SIGUSR1, tperror); /* Slave sends SIGUSR1 on tape errors */ > signal(SIGUSR2, proceed); /* Slave sends SIGUSR2 to next slave */ > >- for (i = 0; i < SLAVES; i++) { >- if (i == slp - &slaves[0]) { >- caught = 1; >- } else { >- caught = 0; >- } >+ sigemptyset(&s_mask); >+ sigaddset(&s_mask, SIGUSR2); >+ sigprocmask(SIG_BLOCK, &s_mask, NULL); > >+ for (i = 0; i < SLAVES; i++) { > if (socketpair(AF_UNIX, SOCK_STREAM, 0, cmd) < 0 || > (slaves[i].pid = fork()) < 0) > quit("too many slaves, %d (recompile smaller): %s\n", >@@ -733,6 +724,7 @@ > sizeof slaves[0].pid); > > master = 0; >+ kill(slp->pid, SIGUSR2); // start the ball rolling > } > > void >@@ -757,6 +749,7 @@ > static void > doslave(int cmd, int slave_number) > { >+ sigset_t s_mask; > int nread; > int nextslave, size, wrote, eot_count; > >@@ -774,7 +767,7 @@ > != sizeof nextslave) { > quit("master/slave protocol botched - didn't get pid of next sl >ave.\n"); > } >- >+ sigemptyset(&s_mask); > /* > * Get list of blocks to dump, read the blocks into tape buffer > */ >@@ -793,14 +786,7 @@ > quit("master/slave protocol botched.\n") >; > } > } >- if (setjmp(jmpbuf) == 0) { >- ready = 1; >- if (!caught) >- (void) pause(); >- } >- ready = 0; >- caught = 0; >- >+ sigsuspend(&s_mask); > /* Try to write the data... */ > eot_count = 0; > size = 0; > >--==_Exmh_1194176785_547800-- > >