From: "Dorr H. Clark"
Date: Tue, 27 Oct 2009 17:32:34 -0700 (PDT)
To: freebsd-stable@freebsd.org
Cc: freebsd-hackers@freebsd.org, freebsd-bugs@freebsd.org
Subject: ptrace problem 6.x/7.x - can someone explain this?

We believe ptrace has a problem in 6.3; we have not tried other releases. The same code, however, exists in 7.1.

The bug was first encountered in gdb...

(gdb) det
Detaching from program: /usr/local/bin/emacs, process 66217
(gdb) att 66224
Attaching to program: /usr/local/bin/emacs, process 66224
Error accessing memory address 0x281ba5a4: Device busy.
(gdb) det
Detaching from program: /usr/local/bin/emacs, process 66224
ptrace: Device busy.
(gdb) quit                  <--- target process 66224 dies here

To isolate this problem, a simple-minded test program was written that just attached and detached. It showed that even the very first detach fails with EBUSY (see test source below):

$ ./test1 -p 66217 -c 1 -d 10
pid 66217 count 1 delay 10
Start of pass 0
Calling PT_ATTACH pid 66217 addr 0x0 sig 0
Calling PT_DETACH pid 66217 addr 0xffffffff sig 0
Call 0 to PT_DETACH returned -1, errno 16

Once again, the target process died when the ptracing test program exited, as would be expected if a detach had failed.

The failure return was coming from the following test in kern_ptrace() in sys_process.c:

    /* not currently stopped */
    if ((p->p_flag & (P_STOPPED_SIG | P_STOPPED_TRACE)) == 0 ||
        p->p_suspcount != p->p_numthreads ||
        (p->p_flag & P_WAITED) == 0) {
            error = EBUSY;
            goto fail;
    }

This is applied to all operations except PT_TRACE_ME, PT_ATTACH, and some instances of PT_CLEAR_STEP.

P_WAITED is generally not true. In particular, it's not set automatically when a process is PT_ATTACHed. It is cleared by PT_DETACH and again when ptrace sends a signal (PT_CONTINUE, PT_DETACH). _But_ it's set in only two places, and they aren't in ptrace code:

2 sys/kern/kern_exit.c       kern_wait          773   p->p_flag |= P_WAITED;
3 compat/svr4/svr4_misc.c    svr4_sys_waitsys   1351  q->p_flag |= P_WAITED;

The relevant one is primarily the first. Here's the code:

    mtx_lock_spin(&sched_lock);
    if ((p->p_flag & P_STOPPED_SIG) &&
        (p->p_suspcount == p->p_numthreads) &&
        (p->p_flag & P_WAITED) == 0 &&
        (p->p_flag & P_TRACED || options & WUNTRACED)) {
            mtx_unlock_spin(&sched_lock);
            p->p_flag |= P_WAITED;
            sx_xunlock(&proctree_lock);
            td->td_retval[0] = p->p_pid;
            if (status)
                    *status = W_STOPCODE(p->p_xstat);
            PROC_UNLOCK(p);
            return (0);
    }
    mtx_unlock_spin(&sched_lock);

So it's only set on processes which are already traced.
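The way the two kernel excerpts above interact can be modeled in a few lines of user-space C. This is only a sketch: the `M_*` flag values, `struct model_proc`, and both function names are made up for illustration and are not the kernel's definitions, and the starting state used in the note after the code (traced and stopped right after PT_ATTACH) is an assumption of the model.

```c
#include <errno.h>
#include <stdbool.h>

/* Placeholder bit values for this model only; the kernel's real
 * P_* definitions differ. */
enum {
    M_TRACED        = 1 << 0,
    M_STOPPED_SIG   = 1 << 1,
    M_STOPPED_TRACE = 1 << 2,
    M_WAITED        = 1 << 3
};

struct model_proc {
    int flag;        /* stands in for p_flag */
    int suspcount;   /* stands in for p_suspcount */
    int numthreads;  /* stands in for p_numthreads */
};

/* Mirrors the "not currently stopped" test quoted from kern_ptrace():
 * returns EBUSY unless the process is stopped, fully suspended, and
 * has been waited for; returns 0 otherwise. */
static int
model_ptrace_check(const struct model_proc *p)
{
    if ((p->flag & (M_STOPPED_SIG | M_STOPPED_TRACE)) == 0 ||
        p->suspcount != p->numthreads ||
        (p->flag & M_WAITED) == 0)
        return EBUSY;
    return 0;
}

/* Mirrors the quoted kern_wait() branch: a stopped, not-yet-waited
 * process that is traced (or waited on with WUNTRACED) is reported,
 * and only then does it gain the WAITED flag. */
static bool
model_wait(struct model_proc *p, bool wuntraced)
{
    if ((p->flag & M_STOPPED_SIG) &&
        p->suspcount == p->numthreads &&
        (p->flag & M_WAITED) == 0 &&
        ((p->flag & M_TRACED) || wuntraced)) {
        p->flag |= M_WAITED;
        return true;
    }
    return false;
}
```

Starting from `{ M_TRACED | M_STOPPED_SIG, 1, 1 }`, `model_ptrace_check()` returns EBUSY; after one successful `model_wait()` it returns 0. In this model, nothing other than the wait path can make the check pass.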
But it's not set until someone calls wait4() on them, or the equivalent SysV compatibility routine.

Gdb doesn't always wait4() for processes immediately upon tracing them, and the ptrace man page does not imply this is needed.

Moreover, it's not clear why it should matter. The process needs to be stopped in order for it to make sense to do most of the things ptrace does. But why should it need to be waited for?

And what kind of sense does this make to someone writing a debugging tool, where the natural logic seems to be:

- attach to process
- look at some stuff
- stick in some kind of breakpoint or similar and start it going again (or 'step' it)
- wait for it to stop
- look at and modify stuff
- detach, or set it moving again

By way of experiment, the test for P_WAITED was removed. Gdb no longer had problems, and no new issues with gdb were encountered (although this was just interactive use; no "gdb coverage test" was attempted). The test program also stopped having issues.

    /* not currently stopped */
    if ((p->p_flag & (P_STOPPED_SIG | P_STOPPED_TRACE)) == 0 ||
        p->p_suspcount != p->p_numthreads) {
            error = EBUSY;
            goto fail;
    }

So does anyone know whether it's safe to simply remove that test?

Thanks,

Arlie Stephens
Engineer

Dorr H. Clark
Advisor

Graduate School of Engineering
Santa Clara University,
Santa Clara, CA

---------------------------------------------------------------------
Test program here
---------------------------------------------------------------------

/*
 * experiment with ptrace, try to see which is broken - gdb or ptrace
 */

/*
 * The header names were lost in the archived copy of this message;
 * the six below are the ones this program needs in order to compile.
 */
#include <sys/types.h>
#include <sys/ptrace.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void
usage(void)
{
    printf("Simple program to play with ptrace\n");
    printf("Usage: test1 -p <pid> -c <count> -d <delay>\n");
    printf("Specify -n for no explicit detach\n");
    printf("Will attach and detach repeatedly from target process\n");
    exit(1);
}

int
main(int argc, char *argv[])
{
    pid_t pid = -1;
    int count = 2;
    int delay = 5;
    int nodetach = 0;
    int opt;
    int ret;
    int i;

    /* The leading ':' makes getopt return ':' for a missing argument. */
    while ((opt = getopt(argc, argv, ":p:c:d:n")) != -1) {
        switch (opt) {
        case 'c':
            if (sscanf(optarg, "%d", &count) != 1) {
                printf("Count should be numeric\n");
                usage();
            }
            break;
        case 'd':
            if (sscanf(optarg, "%d", &delay) != 1) {
                printf("Delay should be numeric\n");
                usage();
            }
            break;
        case 'n':
            nodetach = 1;
            break;
        case 'p':
            if (sscanf(optarg, "%d", &pid) != 1) {
                printf("Pid should be numeric\n");
                usage();
            }
            break;
        default:
            /* opt is '?' or ':' here; optopt holds the offending option. */
            printf("Illegal option -%c\n", optopt);
            usage();
            break;
        }
    }

    printf("pid %d count %d delay %d\n", pid, count, delay);

    if (pid == -1) {
        printf("Pid must be specified\n");
        usage();
    }
    if (count <= 0) {
        printf("Count must be positive\n");
        usage();
    }
    if (delay < 0) {
        printf("Delay must not be negative\n");
        usage();
    }

    for (i = 0; i < count; i++) {
        printf("Start of pass %d\n", i);

        errno = 0;
        printf("Calling PT_ATTACH pid %d addr 0x%lx sig %d\n",
            pid, (unsigned long)(caddr_t)NULL, 0);
        ret = ptrace(PT_ATTACH, pid, NULL, 0);
        if (ret != 0) {
            printf("Call %d to PT_ATTACH returned %d, errno %d\n",
                i, ret, errno);
        }

        /* Note: the target is never waited for between attach and detach. */
        sleep(delay);

        if (!nodetach) {
            errno = 0;
            printf("Calling PT_DETACH pid %d addr 0x%lx sig %d\n",
                pid, (unsigned long)(caddr_t)-1, 0);
            ret = ptrace(PT_DETACH, pid, (caddr_t)-1, 0);
            if (ret != 0) {
                printf("Call %d to PT_DETACH returned %d, errno %d\n",
                    i, ret, errno);
            }
        }

        sleep(delay);
    }

    return 0;
}