From owner-freebsd-hackers Mon Feb 18 8:15:15 2002
Delivered-To: freebsd-hackers@freebsd.org
Received: from mail.openet-telecom.com (mail.openet-telecom.com [62.17.151.60])
	by hub.freebsd.org (Postfix) with ESMTP id 6239D37B400
	for ; Mon, 18 Feb 2002 08:15:04 -0800 (PST)
Received: from mail.openet-telecom.com (unverified)
	by mail.openet-telecom.com (Content Technologies SMTPRS 4.2.1)
	with ESMTP id for ; Mon, 18 Feb 2002 16:23:17 +0000
Received: from openet-telecom.com (10.0.0.40)
	by mail.openet-telecom.com (NPlex 5.5.034) id 3C6A0D5F000050E4
	for freebsd-hackers@freebsd.org; Mon, 18 Feb 2002 16:01:44 +0000
Message-ID: <3C712885.A0A91264@openet-telecom.com>
Date: Mon, 18 Feb 2002 16:15:01 +0000
From: Peter Edwards
X-Mailer: Mozilla 4.79 [en] (X11; U; Linux 2.4.2 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: freebsd-hackers@freebsd.org
Subject: ptrace() bug?
Content-Type: multipart/mixed ; boundary="------------86D04AF17E37DA3C5C849E89"
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

This is a multi-part message in MIME format.
--------------86D04AF17E37DA3C5C849E89
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi,

I recently wrote a program that was using ptrace() to suspend a
process, and resume it later. Maybe ptrace just isn't used enough, or
maybe I just don't get all the reasons behind why it's implemented the
way it is, but it seems to be somewhat buggy. (I'm bringing this up on
-hackers rather than filing a PR, because I'm not 100% sure that what
I'm seeing is indeed a bug.)

Firstly, I noticed that PT_DETACH wasn't working as expected: the
signal argument was effectively being ignored, and a combination of
ptrace(PT_ATTACH, pid, 0, 0) followed by
ptrace(PT_DETACH, pid, (caddr_t)1, 0) would result in the target
process getting stopped.
The only way around this was to kill(pid, SIGCONT) before the
PT_DETACH. This seems to be less than satisfactory: I would have
thought the ptrace() call should be capable of at least attaching to
and detaching from the process without causing such spurious signals to
be delivered to its victim.

This apparent mishandling of the signal argument to ptrace() appears to
occur in issignal(). (I'm discussing line numbers from kern_sig.c,
1.72.2.14.)

On ptrace(PT_ATTACH, ...), the target process gets to the stop() on
line 1254, with p->p_xstat == SIGSTOP.

On ptrace(PT_DETACH, pid, (caddr_t)1, mySig), the parent kicks the
stopped child, after clearing the P_TRACED flag.

The child wakes up, with the new signal, mySig, set in p->p_xstat, and
the old signal still present in p->p_siglist from the "psignal()" call
on line 1252. It then notices that the P_TRACED flag has been switched
off, and starts all over again, discarding the new signal, and
re-finding the original SIGSTOP that was sent by the ptrace(PT_ATTACH)
call. I.e., the process gets sent the signal that the ptrace() call
explicitly tried to replace.

So, is my patch correct, or would one for the ptrace manpage be a
better approach?

Secondly, the entire mechanism for delivering these signals with
ptrace() seems to be somewhat unreliable. The general idea is that
issignal() is immediately followed by a call to postsig(), but that is
not always what happens (e.g., tsleep() called with PCATCH). If we end
up with multiple calls to issignal() for one call to postsig(), the
debugger can sometimes see the signal it tries to continue a process
with arrive after a wait(), and it has no idea whether this signal
arrived because it was raised by the child or an external process, or
from its own call to ptrace().

I haven't investigated further, but I'd have thought that postsig() was
a better place to do the stop/resume and signal replacement for a
traced child.
I know this would "hide" ignored signals, etc, from the child, but in
that event, I don't think a user-space debugger would really care: The
signal would never arrive at the child under normal circumstances
anyway, and ptrace()/wait() combinations would show much more
deterministic behaviour.

Is it worthwhile trying to "fix" this, or is there an obvious (to
someone else) stumbling block I'll fall over in attempting it?
--
Peter.
--------------86D04AF17E37DA3C5C849E89
Content-Type: text/plain; charset=us-ascii; name="kern_sig.c.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="kern_sig.c.patch"

Index: kern_sig.c
===================================================================
RCS file: /pub/FreeBSD/development/FreeBSD-CVS/src/sys/kern/kern_sig.c,v
retrieving revision 1.72.2.14
diff -u -r1.72.2.14 kern_sig.c
--- kern_sig.c	14 Dec 2001 03:05:32 -0000	1.72.2.14
+++ kern_sig.c	12 Feb 2002 09:22:52 -0000
@@ -1257,14 +1257,6 @@
 				 && p->p_flag & P_TRACED);
 
 			/*
-			 * If the traced bit got turned off, go back up
-			 * to the top to rescan signals.  This ensures
-			 * that p_sig* and ps_sigact are consistent.
-			 */
-			if ((p->p_flag & P_TRACED) == 0)
-				continue;
-
-			/*
 			 * If parent wants us to take the signal,
 			 * then it will leave it in p->p_xstat;
 			 * otherwise we just look for signals again.
@@ -1275,10 +1267,21 @@
 				continue;
 
 			/*
-			 * Put the new signal into p_siglist.  If the
-			 * signal is being masked, look for other signals.
+			 * Put the new signal into p_siglist.
 			 */
 			SIGADDSET(p->p_siglist, sig);
+
+			/*
+			 * If the traced bit got turned off, go back up
+			 * to the top to rescan signals.  This ensures
+			 * that p_sig* and ps_sigact are consistent.
+			 */
+			if ((p->p_flag & P_TRACED) == 0)
+				continue;
+			/*
+			 * If the signal is being masked, look for other
+			 * signals.
+			 */
 			if (SIGISMEMBER(p->p_sigmask, sig))
 				continue;
 		}

--------------86D04AF17E37DA3C5C849E89--

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
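
The ordering problem described in the message can be illustrated with a
small user-space toy model. This is only a sketch under loud
assumptions: struct toy_proc, toy_resume() and toy_attach_detach() are
hypothetical names invented for the illustration, signal masks and the
real issignal() loop are left out, and "the next signal found" is
approximated as the lowest-numbered pending signal. It is not the kernel
code, just the same sequence of steps in the two orders (P_TRACED check
before versus after the p_siglist update).

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the struct proc fields involved. */
struct toy_proc {
    unsigned long siglist; /* pending signals, one bit per signal number */
    int xstat;             /* signal the debugger asked to deliver, 0 = none */
    bool traced;           /* models the P_TRACED flag */
};

#define TOY_NSIG ((int)(8 * sizeof(unsigned long)))

static unsigned long sig_bit(int sig)
{
    assert(sig > 0 && sig < TOY_NSIG);
    return 1UL << sig;
}

/* Lowest-numbered pending signal, or 0 if nothing is pending. */
static int first_pending(const struct toy_proc *p)
{
    for (int sig = 1; sig < TOY_NSIG; sig++)
        if (p->siglist & (1UL << sig))
            return sig;
    return 0;
}

/*
 * Models what the stopped child does once the debugger lets it run.
 * old_sig is the signal that caused the stop. Returns the signal the
 * next scan would find (0 = none).
 */
static int toy_resume(struct toy_proc *p, int old_sig, bool patched)
{
    /* Original order: tracing was turned off, so rescan immediately.
     * The old signal is still pending and xstat is never consulted. */
    if (!patched && !p->traced)
        return first_pending(p);

    p->siglist &= ~sig_bit(old_sig);      /* clear the old signal */
    if (p->xstat != 0)
        p->siglist |= sig_bit(p->xstat);  /* queue the replacement */

    /* Patched order: the traced check (and rescan) happens only now. */
    return first_pending(p);
}

/* Models PT_ATTACH followed by PT_DETACH with detach_sig as the signal. */
static int toy_attach_detach(int detach_sig, bool patched)
{
    /* After PT_ATTACH: SIGSTOP is pending and the child stopped on it. */
    struct toy_proc p = {
        .siglist = sig_bit(SIGSTOP), .xstat = SIGSTOP, .traced = true
    };

    /* PT_DETACH: the debugger stores its signal and clears the flag. */
    p.xstat = detach_sig;
    p.traced = false;

    return toy_resume(&p, SIGSTOP, patched);
}
```

In this model, toy_attach_detach(0, false) returns SIGSTOP (the child
stops again, as reported above), while toy_attach_detach(0, true)
returns 0 and toy_attach_detach(SIGUSR1, true) returns SIGUSR1.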