From owner-freebsd-threads@FreeBSD.ORG  Wed Oct 22 14:51:31 2003
Return-Path: <owner-freebsd-threads@FreeBSD.ORG>
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 85F0B16A4B3
	for <threads@freebsd.org>; Wed, 22 Oct 2003 14:51:31 -0700 (PDT)
Received: from phantom.cris.net (phantom.cris.net [212.110.130.74])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 0232243FB1
	for <threads@freebsd.org>; Wed, 22 Oct 2003 14:51:27 -0700 (PDT)
	(envelope-from phantom@FreeBSD.org.ua)
Received: (from phantom@localhost)
	by phantom.cris.net (8.12.6/8.12.6) id h9MM0csb071555;
	Thu, 23 Oct 2003 01:00:38 +0300 (EEST)
	(envelope-from phantom)
Date: Thu, 23 Oct 2003 01:00:38 +0300
From: Alexey Zelkin <phantom@freebsd.org>
To: threads@freebsd.org
Message-ID: <20031023010038.A71141@phantom.cris.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
X-Operating-System: FreeBSD 4.7-STABLE i386
Subject: libc_r & direct usage of syscalls
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD <freebsd-threads.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>,
	<mailto:freebsd-threads-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-threads>
List-Post: <mailto:freebsd-threads@freebsd.org>
List-Help: <mailto:freebsd-threads-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>,
	<mailto:freebsd-threads-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 22 Oct 2003 21:51:31 -0000

hi,

Some of you may remember a story about strange problems I had
with native jdk14 and fork() calls.

In a few words -- sometimes, at seemingly random moments, the JVM becomes
unusable right after a call to fork() because of a SIGBUS signal storm
(the JVM signal handler decides that the signal is not fatal and does not
stop the application).

Today I have completely tracked it down.  Or, more correctly, I got
a 100% reproducible .java testcase and wrote a few more .c testcases in
order to prove my point of view.

The JVM internally uses the usual stack protection logic.  Two pages at
the borders of each stack are protected with mmap().  When something
accesses them, SIGBUS is generated and the signal handler forces the
overflowing thread to roll back some operation until it can safely
continue its job.
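
A minimal sketch of this kind of guard-page logic, not the JVM's actual
code.  All names here (guard_page_demo, guard_fault_handler,
rollback_point) are hypothetical, and the sketch uses mprotect() plus
siglongjmp() as a stand-in for whatever rollback the JVM really performs.
It handles both SIGBUS and SIGSEGV because which one is delivered for a
protection fault depends on the platform.

```c
#define _DEFAULT_SOURCE		/* assumption: needed for MAP_ANON on glibc */
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical rollback point the handler unwinds to. */
static sigjmp_buf rollback_point;

/* Treat the fault as non-fatal: abandon the faulting operation. */
static void
guard_fault_handler(int sig)
{
	(void)sig;
	siglongjmp(rollback_point, 1);
}

/*
 * Returns 1 if touching the guard page faulted and was rolled back,
 * 0 if the write unexpectedly succeeded, -1 on setup failure.
 */
int
guard_page_demo(void)
{
	struct sigaction sa, old_bus, old_segv;
	long pagesz = sysconf(_SC_PAGESIZE);
	volatile int caught = 0;
	char *stack;

	if (pagesz <= 0)
		return (-1);

	/* Two pages stand in for a thread stack; the lowest is the guard. */
	stack = mmap(NULL, 2 * (size_t)pagesz, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (stack == MAP_FAILED)
		return (-1);
	if (mprotect(stack, (size_t)pagesz, PROT_NONE) != 0) {
		munmap(stack, 2 * (size_t)pagesz);
		return (-1);
	}

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = guard_fault_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGBUS, &sa, &old_bus);
	sigaction(SIGSEGV, &sa, &old_segv);

	if (sigsetjmp(rollback_point, 1) == 0)
		*(volatile char *)stack = 1;	/* "overflow" into the guard */
	else
		caught = 1;			/* handler rolled us back */

	/* Put the previous handlers back and release the mapping. */
	sigaction(SIGBUS, &old_bus, NULL);
	sigaction(SIGSEGV, &old_segv, NULL);
	munmap(stack, 2 * (size_t)pagesz);
	return (caught);
}
```

Calling guard_page_demo() from main() and checking for a return value of
1 confirms that the write into the protected page was intercepted rather
than killing the process.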

fork() is a special case here.  When fork() is called, the child process
needs to reinitialize libc_r's internal state (this job is done by the
fork() wrapper located in libc_r/uthread/uthread_fork.c).  One of the
steps of the reinitialization is free()'ing the pthread stacks.  The
caveat here is that the protections on the stack pages are left
unchanged.  Right after some stacks are free()'ed, malloc's internal
(struct pginfo *) info gets allocated in the protected region, and as
soon as this info is modified we get a big *KABOOM* (i.e. SIGBUS).
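
The underlying effect (page protections survive fork(), so reusing such
a page in the child faults) can be sketched without libc_r.  This is an
illustration only: inherited_protection_demo and child_touches_page are
hypothetical names, the direct write stands in for malloc updating its
bookkeeping, and restoring access with mprotect() before reuse is one
possible remedy, not something the libc_r code is known to do.

```c
#define _DEFAULT_SOURCE		/* assumption: needed for MAP_ANON on glibc */
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that writes to `page`.  If `restore` is nonzero the child
 * first makes the page readable and writable again.
 * Returns 0 if the child exited cleanly, the signal number if it was
 * killed by a signal, -1 on any other failure.
 */
static int
child_touches_page(void *page, size_t len, int restore)
{
	pid_t pid;
	int status;

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		/* Make sure a fault kills the child instead of being handled. */
		signal(SIGBUS, SIG_DFL);
		signal(SIGSEGV, SIG_DFL);
		if (restore &&
		    mprotect(page, len, PROT_READ | PROT_WRITE) != 0)
			_exit(2);
		*(volatile char *)page = 1;	/* stands in for malloc reuse */
		_exit(0);
	}
	if (waitpid(pid, &status, 0) == -1)
		return (-1);
	if (WIFSIGNALED(status))
		return (WTERMSIG(status));
	return ((WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1);
}

/* Protect one page in the parent, then let a forked child touch it. */
int
inherited_protection_demo(int restore)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	void *page;
	int rc;

	if (pagesz <= 0)
		return (-1);
	page = mmap(NULL, (size_t)pagesz, PROT_NONE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (page == MAP_FAILED)
		return (-1);
	rc = child_touches_page(page, (size_t)pagesz, restore);
	munmap(page, (size_t)pagesz);
	return (rc);
}
```

With restore set to 0 the child is expected to die from SIGBUS or
SIGSEGV (depending on the platform); with restore set to 1 it should
exit normally.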

The original code looked like:

[..]

pid = fork();
if (pid == 0) {
	make_pipes();
	close_descriptors();
	execvp();
}

[..]

In all cases the signal was raised inside the fork() call itself.

I changed it into:

[..]

pthread_suspend_all_np();
pid = __sys_fork();
if (pid == 0) {
	make_pipes();
	close_descriptors();
	execvp();
}
pthread_resume_all_np();

[..]

From my review, I do not expect this to cause problems with libc_r on
-STABLE.  But I am worried about -CURRENT (especially KSE) -- could such
a hack have side effects?

Comments and any input on potential problems are welcome!

PS:  This description may be useful to anybody else affected by the
     same scenario (stack protections + libc_r's fork()), so here is
     an example backtrace that indicates the problem:

: #0  0xbfbfffa8 in ?? ()
: #1  0x280f7b1d in free (ptr=0x828d000)
:     at /home/phantom/src/lib/libc_r/../libc/stdlib/malloc.c:1096
: #2  0x280b4b57 in _fork ()
:     at /home/phantom/src/lib/libc_r/uthread/uthread_fork.c:154
[..]