Date:      Mon, 18 Apr 2005 01:34:19 +0100
From:      Peter Edwards <peadar.edwards@gmail.com>
To:        Greg 'groggy' Lehey <grog@freebsd.org>, FreeBSD Current <current@freebsd.org>
Subject:   Re: Race condition in debugger?
Message-ID:  <34cb7c8405041717342891f2@mail.gmail.com>
In-Reply-To: <20050214014217.GB85932@wantadilla.lemis.com>
References:  <20050214014217.GB85932@wantadilla.lemis.com>

[Very late response: I just experienced the same problem and
remembered the issue had been brought up before]

On 2/14/05, Greg 'groggy' Lehey <grog@freebsd.org> wrote:
> I'm having some problems with userland gdb on recent -CURRENT builds:
> at some point it hangs.
>
> Specifically, I'm setting a conditional breakpoint like this:
>
>   b Minsert_blockletpointer if I->inode_num == 0x1f0bb
>
> inode_num increments for 1, so I hit this breakpoint about 100,000
> times.  Or I should.  What happens is that the debugger hangs at some
> point on the way.  ktrace shows multiple copies of:
>
>  12325 gdb      CALL  ptrace(12,0x3026,0xbfbfd5e0,0)
>  12325 gdb      RET   ptrace 0
>  12325 gdb      CALL  ptrace(PT_STEP,0x3026,0x1,0)
>  12325 gdb      RET   ptrace 0
>  12325 gdb      CALL  wait4(0xffffffff,0xbfbfd808,0,0)  <-- stops here
>  12325 gdb      RET   wait4 12326/0x3026
>  12325 gdb      CALL  kill(0x3026,0)
>  12325 gdb      RET   kill 0
>  12325 gdb      CALL  ptrace(PT_GETREGS,0x3026,0xbfbfd5c0,0)
>
> When it hangs, it's at the call to wait4, as shown.  It looks like the
> completion of the ptrace request isn't being reported back.

I think I know what's going on here, and I suspect that a couple of
other wait()-related issues left open on the lists might have the
same explanation.

Here's my hypothesis: kern_wait() checks each child of the current
process to see if it has exited, or should otherwise report status
to wait/wait3/wait4/waitpid. If it finds that no candidate child has
anything to report, it goes to sleep, waiting to be woken by a child
reporting status, and then repeats the process. It looks a bit like
this:

kern_wait()
{
loop:
    foreach child of self {
        if (child has status to report)
            return status;
    }
    /* nothing to report yet: block until a child calls wakeup() */
    lock self
    msleep(on "self")
    unlock self
    goto loop;
}

The problem is that no lock guarantees that the conditions checked in
the inner loop still hold by the time the current process locks its
own "struct proc" and invokes msleep(). That is, you can miss the
wakeup generated by a particular child between checking that child in
the inner loop and going to sleep. (The race is most likely to happen
on an SMP machine or with PREEMPTION, but acquiring curproc's lock
could possibly trigger it too, if that acquisition needed to sleep.)
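
To make the window concrete, here is a minimal single-threaded model of
the interleaving described above. It is only a sketch, not kernel code:
struct model, parent_scan(), child_report(), parent_sleep() and
parent_stuck() are hypothetical names invented for this illustration,
and the "sleep" is just a boolean rather than a real msleep().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state shared by a parent and one child. */
struct model {
    bool child_has_status;  /* what the inner loop of kern_wait() inspects */
    bool parent_asleep;     /* parent is blocked in msleep() */
};

/* Parent: scan the children; true means there is status to return. */
static bool parent_scan(const struct model *m)
{
    return m->child_has_status;
}

/* Child: post status and call wakeup(). A wakeup only affects a
 * process that is already asleep; otherwise it has no effect. */
static void child_report(struct model *m)
{
    m->child_has_status = true;
    m->parent_asleep = false;
}

/* Parent: sleep unconditionally, as in the pseudocode above. */
static void parent_sleep(struct model *m)
{
    m->parent_asleep = true;
}

/* Replay one interleaving; returns true if the parent is left asleep. */
bool parent_stuck(bool child_reports_before_sleep)
{
    struct model m = { false, false };

    assert(!parent_scan(&m));   /* the scan finds nothing to report */
    if (child_reports_before_sleep) {
        child_report(&m);       /* wakeup arrives in the window: lost */
        parent_sleep(&m);
    } else {
        parent_sleep(&m);
        child_report(&m);       /* wakeup arrives after the sleep: seen */
    }
    return m.parent_asleep;
}
```

In this model parent_stuck(true) leaves the parent asleep with a child
status pending, which is the hang; parent_stuck(false) is the ordering
that works.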

I can at least reproduce this for the ptrace/gdb case, but AFAICT, it
could happen for the standard wait()/exit() path, too. I worked up a
patch to fix the problem: those parts of the kernel that wake the
parent up now record the fact in the parent's flags and do the wakeup
while holding the parent's process lock, and kern_wait() checks
whether this flag has been set before sleeping. (A simpler solution
would be to hold the parent lock across the bulk of kern_wait(), but
from what I can gather this would lead to at least one LOR, i.e.
lock order reversal.)
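
The flag-based approach can be sketched with the same kind of
single-threaded model. Again this is only an illustration under
simplifying assumptions: struct parent, child_report_flagged(),
parent_try_sleep(), parent_resume() and parent_stuck_with_flag() are
hypothetical names, the parent's lock is assumed to be held wherever
the comments say so, and the "sleep" is a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the parent's state under the flag scheme. */
struct parent {
    bool statchild;  /* "a child has status to report" flag */
    bool asleep;     /* parent is blocked in msleep() */
};

/* Child: with the parent's lock held, set the flag, then wakeup(). */
static void child_report_flagged(struct parent *q)
{
    q->statchild = true;
    q->asleep = false;
}

/* Parent: with its own lock held, sleep only if no child has flagged
 * status since the last scan. */
static void parent_try_sleep(struct parent *q)
{
    if (!q->statchild)
        q->asleep = true;
}

/* Parent: once running again, consume the flag and rescan children. */
static void parent_resume(struct parent *q)
{
    if (!q->asleep)
        q->statchild = false;
}

/* Replay one interleaving; returns true if the parent is left asleep. */
bool parent_stuck_with_flag(bool child_reports_before_sleep)
{
    struct parent q = { false, false };

    if (child_reports_before_sleep) {
        child_report_flagged(&q);  /* nobody asleep, but the flag stays */
        parent_try_sleep(&q);      /* sees the flag and skips the sleep */
    } else {
        parent_try_sleep(&q);      /* goes to sleep */
        child_report_flagged(&q);  /* ordinary wakeup */
    }
    parent_resume(&q);
    assert(!q.statchild || q.asleep);  /* flag consumed once awake */
    return q.asleep;
}
```

Because the flag is set and tested under the same lock, the parent is
not left asleep in either ordering in this model.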

I've been unable to reproduce the problem on a kernel with this
patch, and a liberal sprinkling of printfs shows that whenever GDB
hangs, the race has just occurred.

Anyone got opinions on this?
Cheers,
Peadar.

[Attachment: waitpatch.txt]

Index: kern/kern_exit.c
===================================================================
RCS file: /usr/cvs/FreeBSD-CVS/src/sys/kern/kern_exit.c,v
retrieving revision 1.257
diff -u -r1.257 kern_exit.c
--- kern/kern_exit.c	13 Mar 2005 11:47:04 -0000	1.257
+++ kern/kern_exit.c	18 Apr 2005 00:08:30 -0000
@@ -572,6 +572,7 @@
 	return (error);
 }
 
+int fixrace = 1;
 int
 kern_wait(struct thread *td, pid_t pid, int *status, int options,
     struct rusage *rusage)
@@ -739,7 +740,11 @@
 	}
 	PROC_LOCK(q);
 	sx_xunlock(&proctree_lock);
-	error = msleep(q, &q->p_mtx, PWAIT | PCATCH, "wait", 0);
+	if (fixrace == 0 || (q->p_flag & P_STATCHILD) == 0)
+		error = msleep(q, &q->p_mtx, PWAIT | PCATCH, "wait", 0);
+	else
+		error = 0;
+	q->p_flag &= ~P_STATCHILD;
 	PROC_UNLOCK(q);
 	if (error)
 		return (error);	
Index: kern/kern_sig.c
===================================================================
RCS file: /usr/cvs/FreeBSD-CVS/src/sys/kern/kern_sig.c,v
retrieving revision 1.304
diff -u -r1.304 kern_sig.c
--- kern/kern_sig.c	10 Apr 2005 02:31:24 -0000	1.304
+++ kern/kern_sig.c	18 Apr 2005 00:08:31 -0000
@@ -71,6 +71,7 @@
 #include <sys/sysproto.h>
 #include <sys/unistd.h>
 #include <sys/wait.h>
+#include <sys/kdb.h>
 
 #include <machine/cpu.h>
 
@@ -2259,8 +2260,10 @@
 {
 
 	PROC_LOCK_ASSERT(p, MA_OWNED);
+	PROC_LOCK_ASSERT(p->p_pptr, MA_OWNED);
 	p->p_flag |= P_STOPPED_SIG;
 	p->p_flag &= ~P_WAITED;
+	p->p_pptr->p_flag |= P_STATCHILD;
 	wakeup(p->p_pptr);
 }
 
@@ -2281,8 +2284,8 @@
 		n++;
 	if ((p->p_flag & P_STOPPED_SIG) && (n == p->p_numthreads)) {
 		mtx_unlock_spin(&sched_lock);
-		stop(p);
 		PROC_LOCK(p->p_pptr);
+		stop(p);
 		ps = p->p_pptr->p_sigacts;
 		mtx_lock(&ps->ps_mtx);
 		if ((ps->ps_flag & PS_NOCLDSTOP) == 0) {
Index: sys/proc.h
===================================================================
RCS file: /usr/cvs/FreeBSD-CVS/src/sys/sys/proc.h,v
retrieving revision 1.424
diff -u -r1.424 proc.h
--- sys/proc.h	8 Apr 2005 03:37:52 -0000	1.424
+++ sys/proc.h	18 Apr 2005 00:08:44 -0000
@@ -636,6 +636,7 @@
 #define	P_SINGLE_BOUNDARY 0x400000 /* Threads should suspend at user boundary. */
 #define	P_JAILED	0x1000000 /* Process is in jail. */
 #define	P_INEXEC	0x4000000 /* Process is in execve(). */
+#define	P_STATCHILD	0x8000000 /* A child has status to report. */
 
 #define	P_STOPPED	(P_STOPPED_SIG|P_STOPPED_SINGLE|P_STOPPED_TRACE)
 #define	P_SHOULDSTOP(p)	((p)->p_flag & P_STOPPED)


