From: Peter Edwards
To: Greg 'groggy' Lehey, FreeBSD Current
Date: Mon, 18 Apr 2005 01:34:19 +0100
Subject: Re: Race condition in debugger?
Message-ID: <34cb7c8405041717342891f2@mail.gmail.com>
In-Reply-To: <20050214014217.GB85932@wantadilla.lemis.com>
References: <20050214014217.GB85932@wantadilla.lemis.com>

[Very late response: I just experienced the same problem and remembered
the issue had been brought up before]

On 2/14/05, Greg 'groggy' Lehey wrote:
> I'm having some problems with userland gdb on recent -CURRENT builds:
> at some point it hangs.
>
> Specifically, I'm setting a conditional breakpoint like this:
>
>   b Minsert_blockletpointer if I->inode_num == 0x1f0bb
>
> inode_num increments for 1, so I hit this breakpoint about 100,000
> times.  Or I should.  What happens is that the debugger hangs at some
> point on the way.  ktrace shows multiple copies of:
>
>  12325 gdb      CALL  ptrace(12,0x3026,0xbfbfd5e0,0)
>  12325 gdb      RET   ptrace 0
>  12325 gdb      CALL  ptrace(PT_STEP,0x3026,0x1,0)
>  12325 gdb      RET   ptrace 0
>  12325 gdb      CALL  wait4(0xffffffff,0xbfbfd808,0,0)    <-- stops here
>  12325 gdb      RET   wait4 12326/0x3026
>  12325 gdb      CALL  kill(0x3026,0)
>  12325 gdb      RET   kill 0
>  12325 gdb      CALL  ptrace(PT_GETREGS,0x3026,0xbfbfd5c0,0)
>
> When it hangs, it's at the call to wait4, as shown.  It looks like the
> completion of the ptrace request isn't being reported back.

I think I know what's going on with this, and I have a feeling that
a couple of other wait()-related issues that were left open on the
lists might be explained by the same problem.

Here's my hypothesis: kern_wait() checks each child of the current
process to see whether it has exited or otherwise has status to report
to wait/wait3/wait4/waitpid. If it finds that no candidate child has
anything to report, it goes to sleep, waiting to be woken by a child
reporting status, and then repeats the scan. It looks a bit like this:

    kern_wait()
    {
    loop:
            foreach child of self {
                    if (child has status to report)
                            return status;
            }
            /* a wakeup sent by a child at this point is lost */
            lock self
            msleep(on "self")
            unlock self
            goto loop;
    }

The problem is that no lock guarantees that the conditions checked in
the inner loop still hold by the time the current process locks its own
"struct proc" and invokes msleep(). In other words, you can miss the
wakeup generated by a particular child between checking that child in
the inner loop and going to sleep. (The race is most likely to happen
on an SMP machine or with PREEMPTION, but acquiring curproc's lock
could also open the window if the acquisition had to sleep.)

I can at least reproduce this for the ptrace/gdb case, but AFAICT it
could happen for the standard wait()/exit() path, too.

I worked up a patch to fix the problem. The parts of the kernel that
wake the parent up now set a flag in the parent's flags and do the
wakeup while holding the parent's process lock, and kern_wait() checks
whether that flag has been set before sleeping. (A simpler solution
would be to hold the parent lock across the bulk of kern_wait(), but
from what I can gather that would lead to at least one lock order
reversal (LOR).)

I've been unable to reproduce the problem on a kernel with this patch,
and a nice sprinkling of printfs shows that when gdb hangs, the race
has just occurred.

Anyone got opinions on this?

Cheers,
Peadar.
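
For anyone who wants to play with the idea outside the kernel, the "flag it under the parent lock, check the flag before sleeping" approach can be sketched in user space. This is only a rough analogue, assuming a pthread mutex and condition variable stand in for the proc lock and msleep()/wakeup(); struct parent, parent_init(), child_report(), parent_wait() and run_wait_demo() are hypothetical names made up for the illustration, not kernel interfaces.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the parent's struct proc. */
struct parent {
    pthread_mutex_t lock;          /* plays the role of the proc lock */
    pthread_cond_t  wake;          /* plays the role of the sleep channel */
    bool            child_changed; /* plays the role of the new flag */
    int             reports;       /* status changes consumed (demo only) */
};

static void parent_init(struct parent *p)
{
    int rc = pthread_mutex_init(&p->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&p->wake, NULL);
    assert(rc == 0);
    (void)rc;
    p->child_changed = false;
    p->reports = 0;
}

/* Child side: set the flag and issue the wakeup under the parent lock. */
static void child_report(struct parent *p)
{
    pthread_mutex_lock(&p->lock);
    p->child_changed = true;
    pthread_cond_signal(&p->wake);
    pthread_mutex_unlock(&p->lock);
}

/*
 * Parent side: sleep only if no child has flagged a change.  A report
 * that arrives before we take the lock is remembered in the flag, so
 * it cannot be lost.  (The loop also covers spurious wakeups, which
 * pthread condition variables permit.)
 */
static void parent_wait(struct parent *p)
{
    pthread_mutex_lock(&p->lock);
    while (!p->child_changed)
        pthread_cond_wait(&p->wake, &p->lock);
    p->child_changed = false;
    p->reports++;
    pthread_mutex_unlock(&p->lock);
}

static void *child_main(void *arg)
{
    child_report(arg);
    return NULL;
}

/* Runs `rounds` report/wait pairs; returns how many reports were seen. */
int run_wait_demo(int rounds)
{
    struct parent p;
    parent_init(&p);
    for (int i = 0; i < rounds; i++) {
        pthread_t t;
        int rc = pthread_create(&t, NULL, child_main, &p);
        assert(rc == 0);
        parent_wait(&p);   /* may run before or after child_report() */
        rc = pthread_join(t, NULL);
        assert(rc == 0);
        (void)rc;
    }
    return p.reports;
}

#ifdef DEMO_MAIN
#include <stdio.h>
int main(void)
{
    printf("reports seen: %d\n", run_wait_demo(1000));
    return 0;
}
#endif
```

Build with something like `cc -pthread -DDEMO_MAIN demo.c` to get a runnable program. Dropping the child_changed check from parent_wait() and sleeping unconditionally brings back the lost-wakeup hang whenever the child reports first.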

[Attachment: waitpatch.txt]

Index: kern/kern_exit.c
===================================================================
RCS file: /usr/cvs/FreeBSD-CVS/src/sys/kern/kern_exit.c,v
retrieving revision 1.257
diff -u -r1.257 kern_exit.c
--- kern/kern_exit.c	13 Mar 2005 11:47:04 -0000	1.257
+++ kern/kern_exit.c	18 Apr 2005 00:08:30 -0000
@@ -572,6 +572,7 @@
 	return (error);
 }
 
+int fixrace = 1;
 int
 kern_wait(struct thread *td, pid_t pid, int *status, int options,
     struct rusage *rusage)
@@ -739,7 +740,11 @@
 	}
 	PROC_LOCK(q);
 	sx_xunlock(&proctree_lock);
-	error = msleep(q, &q->p_mtx, PWAIT | PCATCH, "wait", 0);
+	if (fixrace == 0 || (q->p_flag & P_STATCHILD) == 0)
+		error = msleep(q, &q->p_mtx, PWAIT | PCATCH, "wait", 0);
+	else
+		error = 0;
+	q->p_flag &= ~P_STATCHILD;
 	PROC_UNLOCK(q);
 	if (error)
 		return (error);
Index: kern/kern_sig.c
===================================================================
RCS file: /usr/cvs/FreeBSD-CVS/src/sys/kern/kern_sig.c,v
retrieving revision 1.304
diff -u -r1.304 kern_sig.c
--- kern/kern_sig.c	10 Apr 2005 02:31:24 -0000	1.304
+++ kern/kern_sig.c	18 Apr 2005 00:08:31 -0000
@@ -71,6 +71,7 @@
 #include <sys/sysproto.h>
 #include <sys/unistd.h>
 #include <sys/wait.h>
+#include <sys/kdb.h>
 
 #include <machine/cpu.h>
 
@@ -2259,8 +2260,10 @@
 {
 
 	PROC_LOCK_ASSERT(p, MA_OWNED);
+	PROC_LOCK_ASSERT(p->p_pptr, MA_OWNED);
 	p->p_flag |= P_STOPPED_SIG;
 	p->p_flag &= ~P_WAITED;
+	p->p_pptr->p_flag |= P_STATCHILD;
 	wakeup(p->p_pptr);
 }
 
@@ -2281,8 +2284,8 @@
 		n++;
 	if ((p->p_flag & P_STOPPED_SIG) && (n == p->p_numthreads)) {
 		mtx_unlock_spin(&sched_lock);
-		stop(p);
 		PROC_LOCK(p->p_pptr);
+		stop(p);
 		ps = p->p_pptr->p_sigacts;
 		mtx_lock(&ps->ps_mtx);
 		if ((ps->ps_flag & PS_NOCLDSTOP) == 0) {
Index: sys/proc.h
===================================================================
RCS file: /usr/cvs/FreeBSD-CVS/src/sys/sys/proc.h,v
retrieving revision 1.424
diff -u -r1.424 proc.h
--- sys/proc.h	8 Apr 2005 03:37:52 -0000	1.424
+++ sys/proc.h	18 Apr 2005 00:08:44 -0000
@@ -636,6 +636,7 @@
 #define	P_SINGLE_BOUNDARY 0x400000 /* Threads should suspend at user boundary. */
 #define	P_JAILED	0x1000000 /* Process is in jail. */
 #define	P_INEXEC	0x4000000 /* Process is in execve(). */
+#define	P_STATCHILD	0x8000000 /* A child has status to report. */
 
 #define	P_STOPPED	(P_STOPPED_SIG|P_STOPPED_SINGLE|P_STOPPED_TRACE)
 #define	P_SHOULDSTOP(p)	((p)->p_flag & P_STOPPED)
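
The status gdb is waiting for in the ktrace output is a "child has stopped" report rather than an exit. As a small illustration of that kind of report from the parent's side, here is a sketch using plain POSIX waitpid() with WUNTRACED instead of ptrace; stopped_child_signal() is a hypothetical helper name, and the example does not try to reproduce the race itself.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks a child that stops itself, then waits for the stop to be
 * reported.  Returns the signal that stopped the child, or -1 on
 * failure.
 */
int stopped_child_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(SIGSTOP);   /* child stops; the parent must be told */
        _exit(0);
    }

    int status = 0;
    int sig = -1;
    /* WUNTRACED asks for stopped children, not only exited ones. */
    if (waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status))
        sig = WSTOPSIG(status);

    /* Clean up: kill the stopped child and reap it. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return sig;
}

#ifdef DEMO_MAIN
#include <stdio.h>
int main(void)
{
    int sig = stopped_child_signal();
    assert(sig == SIGSTOP);
    printf("child stopped by signal %d\n", sig);
    return 0;
}
#endif
```

If the stop notification were lost in the way described in the message, the first waitpid() call here is where the parent would sleep forever, which matches the wait4() hang in the trace.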