Date: Mon, 28 Jul 2008 08:40:05 GMT
From: Nate Eldredge <neldredge@math.ucsd.edu>
To: freebsd-bugs@FreeBSD.org
Subject: Re: bin/125185: csh(1) exit on signal 11
Message-ID: <200807280840.m6S8e5At064548@freefall.freebsd.org>
The following reply was made to PR bin/125185; it has been noted by GNATS.

From: Nate Eldredge <neldredge@math.ucsd.edu>
To: bug-followup@FreeBSD.org, 666.root@gmail.com
Cc:
Subject: Re: bin/125185: csh(1) exit on signal 11
Date: Mon, 28 Jul 2008 01:18:38 -0700 (PDT)

I tracked this down. Here is the explanation as I understand it.

The traceback from the segfault is as follows, for the record:

  #0  0x000000080096cd1e in malloc () from /lib/libc.so.7
  #1  0x000000080096cfee in free () from /lib/libc.so.7
  #2  0x0000000000448066 in sfree (p=0x427e46) at /usr/src/bin/csh/../../contrib/tcsh/tc.alloc.c:562
  #3  0x0000000000450e79 in bb_cleanup (xbb=0x7fffffffdf70) at /usr/src/bin/csh/../../contrib/tcsh/tc.str.c:521
  #4  0x000000000040d450 in cleanup_until (last_var=0x57b730) at /usr/src/bin/csh/../../contrib/tcsh/sh.err.c:444
  #5  0x0000000000406423 in process (catch=1) at /usr/src/bin/csh/../../contrib/tcsh/sh.c:2027
  #6  0x0000000000404f5f in main (argc=0, argv=0x7fffffffe7d8) at /usr/src/bin/csh/../../contrib/tcsh/sh.c:1304

However, the source of the bug is actually in the function `dobackp', sh.glob.c:646. tcsh has a "cleanup stack", onto which a function can push things to be cleaned up and run them later. `dobackp' pushes some entries onto the cleanup stack, then detects the parse error and exits by calling stderror() without popping them. The problem is that the whole thing was being run in a subshell started with vfork(), so those entries end up on the parent's cleanup stack even though they point to objects that only existed in the child. (More specifically, they point to a piece of the regular stack that is below the parent's current stack pointer, so it can get overwritten.) When the parent eventually runs its cleanup stack, it passes those stale pointers to the cleanup functions, which is how we end up in free() in the traceback above.

If you run csh with the -F option, which makes it use fork() instead of vfork(), it does not crash.

It would be easy to fix this specific instance of the bug by calling cleanup_until() in `dobackp' before calling stderror().
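For illustration, here is a minimal sketch of a cleanup stack and of the one-line fix proposed above. It is not tcsh's code: cs_push(), cs_pop_until(), error_exit(), parse_backquote() and toplevel_run() are hypothetical stand-ins for the push side of the cleanup stack, cleanup_until(), stderror(), `dobackp' and the main loop, and the longjmp()-based error exit is an assumption about how such an error return could work.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CS_MAX  64
#define BUF_LEN 16

/* One cleanup-stack entry: an object and the function that releases it. */
struct cs_entry {
    void *var;
    void (*fn)(void *);
};

static struct cs_entry cs_stack[CS_MAX];
static size_t cs_depth;
static jmp_buf toplevel;     /* error exits unwind to here */
static int cleanups_run;     /* counts cleanup calls, for the demo */

/* Hypothetical: push a cleanup for var. */
static void cs_push(void *var, void (*fn)(void *))
{
    assert(cs_depth < CS_MAX);
    cs_stack[cs_depth].var = var;
    cs_stack[cs_depth].fn = fn;
    cs_depth++;
}

/* Hypothetical stand-in for cleanup_until(): run and pop entries,
   newest first, down to and including the one for last_var. */
static void cs_pop_until(void *last_var)
{
    while (cs_depth > 0) {
        struct cs_entry *e = &cs_stack[--cs_depth];
        e->fn(e->var);
        if (e->var == last_var)
            break;
    }
}

/* Cleanup for a local buffer; a real shell would free heap memory here. */
static void clear_buf(void *p)
{
    memset(p, 0, BUF_LEN);
    cleanups_run++;
}

/* Hypothetical stand-in for stderror(): report, then unwind to toplevel. */
static void error_exit(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    longjmp(toplevel, 1);
}

/* A dobackp-like function with the fix applied: its entries point into its
   own stack frame, so it pops them before that frame is abandoned. */
static void parse_backquote(int bad_input)
{
    char word[BUF_LEN], line[BUF_LEN];

    cs_push(word, clear_buf);
    cs_push(line, clear_buf);
    if (bad_input) {
        cs_pop_until(word);          /* the fix: clean up before erroring */
        error_exit("parse error");
    }
    cs_pop_until(word);              /* normal path */
}

/* Stand-in for the shell's main loop; returns 1 if an error unwound here. */
static int toplevel_run(int bad_input)
{
    if (setjmp(toplevel) != 0)
        return 1;
    parse_backquote(bad_input);
    return 0;
}
```

Calling toplevel_run(1) takes the error path and still leaves cs_depth at 0 with both cleanups run. Without the cs_pop_until() call before error_exit(), two entries pointing into the dead parse_backquote() frame would stay on the stack, which is the same shape as the bug described above.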
Unfortunately, it looks like there are many places where the code tries to exit without cleaning up first, and it is not clear when they might be run in a vforked subshell. Here are some possibilities:

1. Audit the whole source to find and fix all places where a function may exit without popping the cleanup stack.

2. Set a mark on the cleanup stack as soon as vfork() returns in the child, and add code to xexit() or something similar so that it pops to that mark before exiting. I have not thought this through completely and am not sure if it is safe.

3. Stop using vfork() altogether. tcsh should really not be using it when there is non-trivial work for the child to do. How significant is the extra overhead of fork() in this day and age, when we have copy-on-write?

The upstream tcsh people might also have some ideas, but a bit of Googling did not reveal who they are.

--
Nate Eldredge
neldredge@math.ucsd.edu
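Why -F avoids the crash, and why option 3 would fix the whole class of bugs, can be sketched with POSIX fork() and waitpid(). The cleanup stack here is reduced to an array of pointers, and cs_push() and run_in_forked_child() are hypothetical names. The vfork() variant is deliberately not shown: POSIX left the behavior undefined when a vfork() child modifies data or calls functions other than _exit() or an exec function, which is exactly what the buggy code path does.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CS_MAX 64

/* Hypothetical cleanup stack, reduced to the pointers it holds. */
static void *cs_vars[CS_MAX];
static size_t cs_depth;

static void cs_push(void *var)
{
    assert(cs_depth < CS_MAX);
    cs_vars[cs_depth++] = var;
}

/* Run a subshell-like job with fork(), as csh -F does. The child pushes an
   entry pointing into its own stack frame and then exits without popping it,
   like the error path in dobackp. With fork() that push lands in the child's
   private copy of cs_vars, so the parent's stack is unchanged afterwards.
   Returns the child's exit status, or -1 on failure. */
static int run_in_forked_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        char word[16];
        cs_push(word);       /* modifies the child's copy only */
        _exit(1);            /* error exit without popping */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

After run_in_forked_child() returns 1, the parent's cs_depth is whatever it was before the call. With vfork() the child would share the parent's memory, so the same push would be left behind on the parent's stack with a pointer into a dead frame.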