Date:      Mon, 28 Jul 2008 08:40:05 GMT
From:      Nate Eldredge <neldredge@math.ucsd.edu>
To:        freebsd-bugs@FreeBSD.org
Subject:   Re: bin/125185: csh(1) exit on signal 11
Message-ID:  <200807280840.m6S8e5At064548@freefall.freebsd.org>

The following reply was made to PR bin/125185; it has been noted by GNATS.

From: Nate Eldredge <neldredge@math.ucsd.edu>
To: bug-followup@FreeBSD.org, 666.root@gmail.com
Cc:  
Subject: Re: bin/125185: csh(1) exit on signal 11
Date: Mon, 28 Jul 2008 01:18:38 -0700 (PDT)

 I tracked this down.  Here is the explanation as I understand it.
 
 The traceback from the segfault is as follows, for the record:
 
 #0  0x000000080096cd1e in malloc () from /lib/libc.so.7
 #1  0x000000080096cfee in free () from /lib/libc.so.7
 #2  0x0000000000448066 in sfree (p=0x427e46)
      at /usr/src/bin/csh/../../contrib/tcsh/tc.alloc.c:562
 #3  0x0000000000450e79 in bb_cleanup (xbb=0x7fffffffdf70)
      at /usr/src/bin/csh/../../contrib/tcsh/tc.str.c:521
 #4  0x000000000040d450 in cleanup_until (last_var=0x57b730)
      at /usr/src/bin/csh/../../contrib/tcsh/sh.err.c:444
 #5  0x0000000000406423 in process (catch=1)
      at /usr/src/bin/csh/../../contrib/tcsh/sh.c:2027
 #6  0x0000000000404f5f in main (argc=0, argv=0x7fffffffe7d8)
      at /usr/src/bin/csh/../../contrib/tcsh/sh.c:1304
 
 However, the source of the bug is actually in the function `dobackp', 
 sh.glob.c:646.  tcsh has a "cleanup stack", where a function can push 
 things to be cleaned up, and run them later.  `dobackp' pushes some things 
 on the cleanup stack, then detects the parse error and exits by calling 
 stderror().  The problem is that the whole thing was being run in a 
 subshell started with vfork(), so the entries end up on the parent's 
 cleanup stack, although they hold pointers to objects that only existed 
 for the child.  (More specifically, pointers to a piece of the (regular) 
 stack that is below the parent's current stack pointer, so it can get 
 overwritten.)  When the parent eventually runs its cleanup stack, bad 
 things happen.
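
To make the mechanism concrete, here is a minimal sketch of a cleanup stack in C. It is not tcsh's actual code: the names `cs_push`, `cs_cleanup_until`, `release_flag` and `cs_demo` are made up for illustration, and the real implementation in sh.err.c is more involved. The point it tries to show is that entries hold raw pointers, often into the pushing function's own stack frame, so they have to be popped before that frame goes away.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature cleanup stack; NOT tcsh's real implementation. */
#define CS_MAX 16

typedef void (*cs_fn)(void *);

struct cs_entry {
    void *var;   /* object to clean up; also identifies the entry */
    cs_fn fn;    /* how to clean it up */
};

static struct cs_entry cs_stack[CS_MAX];
static size_t cs_len;

/* Register `var` to be cleaned up later by calling `fn(var)`. */
static void cs_push(void *var, cs_fn fn)
{
    assert(cs_len < CS_MAX);
    cs_stack[cs_len].var = var;
    cs_stack[cs_len].fn = fn;
    cs_len++;
}

/* Run handlers from the top of the stack down to and including `last_var`. */
static void cs_cleanup_until(void *last_var)
{
    while (cs_len > 0) {
        struct cs_entry e = cs_stack[--cs_len];
        e.fn(e.var);
        if (e.var == last_var)
            break;
    }
}

/* Example handler: mark an object as released by zeroing its flag. */
static void release_flag(void *p)
{
    *(int *)p = 0;
}

/* Pushes two entries that point into this frame and pops both before
 * returning.  Returns the number of entries left on the stack. */
static size_t cs_demo(void)
{
    int outer = 1, inner = 1;

    cs_push(&outer, release_flag);
    cs_push(&inner, release_flag);   /* points into this stack frame */
    cs_cleanup_until(&inner);        /* must run before the frame dies */
    assert(inner == 0 && outer == 1);
    cs_cleanup_until(&outer);
    return cs_len;
}
```

If `cs_demo` returned without the two `cs_cleanup_until` calls, the stack would keep pointers to `outer` and `inner` after their frame was gone, and a later cleanup would write through them. That is the shape of the failure described above, with the additional twist that under vfork() the stale entries are left behind for the parent.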
 
 If you run csh with the -F option, to use fork() instead of vfork(), it 
 does not crash.
 
 It would be easy to fix this specific instance of the bug, by calling 
 cleanup_until() in `dobackp' before calling stderror().  Unfortunately, it 
 looks like there are lots of places where the code tries to exit without 
 cleaning up first, and it is not clear when they might be run in a vforked 
 subshell.  Here are some possibilities:
 
 1. Audit the whole source to find and fix all places where a function may 
 exit without popping the cleanup stack.
 
 2. Set a mark on the stack as soon as vfork() returns in the child, and 
 add code to xexit() or something to have it pop to that mark before 
 exiting.  I have not thought this through completely and am not sure if it 
 is safe.
 
 3. Stop using vfork() altogether.  tcsh should really not be using it when 
 there is non-trivial work for the child to do.  How significant is the 
 extra overhead of fork() in this day and age, when we have copy-on-write?
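
As a rough illustration of possibility 2, the sketch below records the depth of a toy cleanup stack at the point where the child would start, and unwinds to that depth before the child "exits". Everything here is an assumption made for the example: `cs_mark`, `cs_pop_to_mark`, `simulated_child` and `mark_demo` are hypothetical names, the child is simulated by an ordinary function call rather than a real vfork(), and nothing is claimed about how this would interact with the rest of tcsh or whether it is safe there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature cleanup stack; NOT tcsh's real implementation. */
#define CS_MAX 16

typedef void (*cs_fn)(void *);

struct cs_entry {
    void *var;   /* object to clean up */
    cs_fn fn;    /* how to clean it up */
};

static struct cs_entry cs_stack[CS_MAX];
static size_t cs_len;

static void cs_push(void *var, cs_fn fn)
{
    assert(cs_len < CS_MAX);
    cs_stack[cs_len].var = var;
    cs_stack[cs_len].fn = fn;
    cs_len++;
}

/* Remember the current depth of the cleanup stack. */
static size_t cs_mark(void)
{
    return cs_len;
}

/* Run and discard every entry pushed since `mark` was taken. */
static void cs_pop_to_mark(size_t mark)
{
    assert(mark <= cs_len);
    while (cs_len > mark) {
        struct cs_entry e = cs_stack[--cs_len];
        e.fn(e.var);
    }
}

static void clear_flag(void *p)
{
    *(int *)p = 0;
}

/* Stand-in for the child's work: pushes an entry pointing into its own
 * frame, hits an error, and unwinds to the mark before "exiting". */
static void simulated_child(size_t mark)
{
    int child_local = 1;

    cs_push(&child_local, clear_flag);
    /* ... parse error detected here ... */
    cs_pop_to_mark(mark);   /* what a hypothetical exit hook would do */
    assert(child_local == 0);
}

/* Returns how many entries the child left behind above the mark. */
static size_t mark_demo(int *parent_flag)
{
    size_t mark;

    cs_push(parent_flag, clear_flag);   /* belongs to the parent */
    mark = cs_mark();   /* in a real shell: right after vfork() returns 0 */
    simulated_child(mark);
    return cs_len - mark;
}
```

With the unwind in place, `mark_demo` leaves only the parent's own entry on the stack and does not run its handler. Whether a single hook in the exit path would catch every way the child can leave is exactly the open question raised in possibility 2.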
 
 The upstream tcsh people might also have some ideas, but a bit of Googling 
 did not reveal who they are.
 
 -- 
 
 Nate Eldredge
 neldredge@math.ucsd.edu


