Date:      Tue, 27 Oct 2009 17:32:34 -0700 (PDT)
From:      "Dorr H. Clark" <dclark@engr.scu.edu>
To:        freebsd-stable@freebsd.org
Cc:        freebsd-hackers@freebsd.org, freebsd-bugs@freebsd.org
Subject:   ptrace problem 6.x/7.x - can someone explain this?
Message-ID:  <Pine.GSO.4.21.0910271711580.17024-100000@nova46.dc.engr.scu.edu>
In-Reply-To: <Pine.GSO.4.21.0810072312220.4889-100000@nova41.dc.engr.scu.edu>

We believe ptrace has a problem in 6.3; we have not tried other
releases.  The same code, however, exists in 7.1. 

The bug was first encountered in gdb...

(gdb) det
Detaching from program: /usr/local/bin/emacs, process 66217
(gdb) att 66224
Attaching to program: /usr/local/bin/emacs, process 66224
Error accessing memory address 0x281ba5a4: Device busy.
(gdb) det
Detaching from program: /usr/local/bin/emacs, process 66224
ptrace: Device busy.
(gdb) quit	<--- target process 66224 dies here

To isolate this problem, a simple-minded test program was written that
just attaches and detaches. With this test program, even the very first
detach fails with EBUSY (see test source below):

$ ./test1 -p 66217 -c 1 -d 10
pid 66217 count 1 delay 10
Start of pass 0
Calling PT_ATTACH pid 66217 addr 0x0 sig 0
Calling PT_DETACH pid 66217 addr 0xffffffff sig 0
Call 0 to PT_DETACH returned -1, errno 16

Once again, the target process died when the ptracing test program
exited, as would be expected if a detach had failed.

The failure return comes from the following test in kern_ptrace()
in sys_process.c:

                /* not currently stopped */ 
                if ((p->p_flag & (P_STOPPED_SIG | P_STOPPED_TRACE)) == 0 || 
                    p->p_suspcount != p->p_numthreads  || 
                    (p->p_flag & P_WAITED) == 0) { 
                        error = EBUSY; 
                        goto fail; 
                } 

This is applied to all operations except PT_TRACE_ME, PT_ATTACH, and
some instances of PT_CLEAR_STEP. 

P_WAITED is generally not set. In particular, it's not set
automatically when a process is PT_ATTACHed. It is cleared by
PT_DETACH, and again when ptrace sends a signal (PT_CONTINUE,
PT_DETACH). _But_ it's set in only two places, and neither of them
is in the ptrace code:

sys/kern/kern_exit.c      kern_wait         773  p->p_flag |= P_WAITED;
compat/svr4/svr4_misc.c   svr4_sys_waitsys 1351  q->p_flag |= P_WAITED;

The first one, in kern_wait(), is the one that primarily matters. Here's the code:

                mtx_lock_spin(&sched_lock); 
                if ((p->p_flag & P_STOPPED_SIG) && 
                    (p->p_suspcount == p->p_numthreads) && 
                    (p->p_flag & P_WAITED) == 0 && 
                    (p->p_flag & P_TRACED || options & WUNTRACED)) { 
                        mtx_unlock_spin(&sched_lock); 
                        p->p_flag |= P_WAITED; 
                        sx_xunlock(&proctree_lock); 
                        td->td_retval[0] = p->p_pid; 
                        if (status) 
                                *status = W_STOPCODE(p->p_xstat); 
                        PROC_UNLOCK(p); 
                        return (0); 
                } 
                mtx_unlock_spin(&sched_lock); 

So, for a traced process, it's set only once the process is stopped
and someone calls wait4() on it, or the equivalent SysV compatibility
routine.
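To make those conditions explicit, here is a minimal user-space model of that kern_wait() branch. It is a sketch only: struct model_proc, model_wait_reports_stop() and the flag bit values are hypothetical stand-ins (the real flags live in sys/proc.h), sched_lock and the proc lock are left out, and WUNTRACED is the ordinary wait4() option from <sys/wait.h>.

```c
#include <stdbool.h>
#include <sys/wait.h>

/* Illustrative flag bits for this model only; not the kernel's values. */
#define P_TRACED      0x0001
#define P_WAITED      0x0002
#define P_STOPPED_SIG 0x0004

/* Just the struct proc fields the branch reads (hypothetical model). */
struct model_proc {
	int p_flag;
	int p_suspcount;
	int p_numthreads;
};

/*
 * Mirrors the stopped-child branch of kern_wait(): the stop is reported,
 * and P_WAITED set, only for a signal-stopped, fully suspended process
 * whose stop has not been reported yet, and only if it is traced or the
 * caller passed WUNTRACED. Returns true if the stop was reported.
 */
bool
model_wait_reports_stop(struct model_proc *p, int options)
{
	if ((p->p_flag & P_STOPPED_SIG) &&
	    p->p_suspcount == p->p_numthreads &&
	    (p->p_flag & P_WAITED) == 0 &&
	    ((p->p_flag & P_TRACED) || (options & WUNTRACED))) {
		p->p_flag |= P_WAITED;
		return (true);
	}
	return (false);
}
```

A traced, stopped process gets P_WAITED on the first wait and is not reported again until something clears the flag; nothing in the attach path itself ever sets it.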

Gdb doesn't always call wait4() on processes immediately upon attaching
to them, and the ptrace man page does not imply that this is needed.

Moreover, it's not clear why it should matter. The process 
needs to be stopped in order for it to make sense to do most 
of the things ptrace does. But why should it also need to have been waited for?
And what kind of sense does this make to someone writing a debugging
tool, where the natural logic seems to be:
- attach to process
- look at some stuff
- stick in some kind of breakpoint or similar and start it going again
  (or 'step' it)
- wait for it to stop
- look at and modify stuff
- detach, or set it moving again

By way of experiment, the test for P_WAITED was removed. Gdb no longer had
problems, and no new issues with gdb were encountered (although this
was only interactive testing; no gdb coverage test was attempted).
The test program also stopped having issues. The modified check:

                /* not currently stopped */ 
                if ((p->p_flag & (P_STOPPED_SIG | P_STOPPED_TRACE)) == 0 || 
                    p->p_suspcount != p->p_numthreads) {
                        error = EBUSY; 
                        goto fail; 
                } 
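To make the difference concrete, a minimal user-space model of the check with and without the P_WAITED clause. struct model_proc, model_stopped_check() and the flag bit values are hypothetical stand-ins, not the kernel's; only the predicate logic is taken from the two versions of the check.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag bits for this model only; not the kernel's values. */
#define P_WAITED        0x0002
#define P_STOPPED_SIG   0x0004
#define P_STOPPED_TRACE 0x0008

struct model_proc {
	int p_flag;
	int p_suspcount;
	int p_numthreads;
};

/*
 * The "not currently stopped" test as shipped (require_waited == true),
 * or with the P_WAITED clause removed (require_waited == false).
 * Returns EBUSY when the request would be refused, else 0.
 */
int
model_stopped_check(const struct model_proc *p, bool require_waited)
{
	if ((p->p_flag & (P_STOPPED_SIG | P_STOPPED_TRACE)) == 0 ||
	    p->p_suspcount != p->p_numthreads)
		return (EBUSY);
	if (require_waited && (p->p_flag & P_WAITED) == 0)
		return (EBUSY);
	return (0);
}
```

With the clause removed, a process that is stopped but whose stop has not been reported through wait4() passes; a running or only partially suspended process is still rejected with EBUSY. The model says nothing about whether other code relies on P_WAITED being set here.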

So does anyone know whether it's safe to simply remove that test?

Thanks,

Arlie Stephens
Engineer

Dorr H. Clark
Advisor

Graduate School of Engineering
Santa Clara University,
Santa Clara, CA

---------------------------------------------------------------------
Test program here
---------------------------------------------------------------------
/*
 * experiment with ptrace, try to see which is broken - gdb or ptrace
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ptrace.h>

void usage(void)
{
	printf("Simple program to play with ptrace\n");
	printf("Usage: test1 -p <pid> -c <count> -d <delay (sec)>\n");
	printf("Specify -n for no explicit detach\n");
	printf("Will attach and detach repeatedly from target process\n");
	exit(1);
}

int main(int argc, char *argv[])
{
	pid_t pid = -1;
	int count = 2;
	int delay = 5;
	int nodetach = 0;
	int opt;
	int ret;
	int i;

	
	while((opt = getopt(argc, argv, ":p:c:d:n")) != -1) {
		switch(opt) {
		case 'c':
			if (sscanf(optarg, "%d", &count) != 1) {
				printf("Count should be numeric\n");
				usage();
			}
			break;
		case 'd':
			if (sscanf(optarg, "%d", &delay) != 1) {
				printf("Delay should be numeric\n");
				usage();
			}
			break;
		case 'n':
			nodetach = 1;
			break;
		case 'p':
			if (sscanf(optarg, "%d", &pid) != 1) {
				printf("Pid should be numeric\n");
				usage();
			}
			break;
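		case ':':
			/* leading ':' in optstring: missing option argument */
			printf("Option -%c needs an argument\n", optopt);
			usage();
			break;
		default:
			/* getopt() returns '?' here; the bad option is in optopt */
			printf("Illegal option -%c\n", optopt);
			usage();
			break;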
		}
	} 
	printf("pid %d count %d delay %d\n", pid, count, delay);
	if (pid == -1) {
		printf("Pid must be specified\n");
		usage();
	}
	if (count <= 0) {
		printf("Count must be positive\n");
		usage();
	}
	if (delay < 0) {
		printf("Delay must not be negative\n");
		usage();
	}

	for (i = 0; i < count; i++) {
		printf("Start of pass %d\n", i);
		errno = 0;
		printf("Calling PT_ATTACH pid %d addr 0x%lx sig %d\n",
		       pid, (unsigned long)(caddr_t)NULL, 0);
		ret = ptrace(PT_ATTACH, pid, NULL, 0);
		if (ret != 0) {
			printf("Call %d to PT_ATTACH returned %d, errno %d\n",
			       i, ret, errno);
		}
		sleep(delay);
		if (!nodetach) {
			errno = 0;
			printf("Calling PT_DETACH pid %d addr 0x%lx sig %d\n",
			       pid, (unsigned long)(caddr_t)-1, 0);
			ret = ptrace(PT_DETACH, pid, (caddr_t)-1, 0);
			if (ret != 0) {
				printf("Call %d to PT_DETACH returned %d, "
				       "errno %d\n",
				       i, ret, errno);
			}
		}
		sleep(delay);
	}

	return 0;
}
