Date: Mon, 13 Oct 2003 16:36:20 -0400 (EDT)
From: Christopher Sedore
To: Daniel Eischen
cc: freebsd-threads@freebsd.org
Subject: Re: odd problem(s) with libthr and libkse

On Sat, 11 Oct 2003, Daniel Eischen wrote:

> On Sat, 11 Oct 2003, Christopher M. Sedore wrote:
>
> > I have a multithreaded program that I've built to run under libc_r, libthr,
> > and libkse. I use the libc_r build for debugging and the others for actual
> > work (the program is disk/network io intensive and I want the disk io
> > concurrency from thr or kse).
> >
> > Anyway, here is the issue I'm seeing. It may be the same or a related
> > problem for both, or may not be.
> >
> > When running under libthr, everything works fine for an indeterminate
> > period, usually between 10 seconds and 30 minutes. Eventually, all program
> > function stops. If I watch in top, threads get stuck in "sigwai". First
> > one, then a couple, then all.
> >
> > When running under kse, the program pauses periodically. I have one thread
> > that prints out a heartbeat once per second, and prints debug info. I get
> > pauses of up to 5 seconds between my heartbeat:
>
> sigwait() may not be behaving as you'd expect in libkse.
> It is slightly different than in libc_r, but should be
> POSIX compliant nonetheless.

The strange thing is that I don't call sigwait().

> I use the following to test libkse for I/O intensive applications:
>
>   http://people.freebsd.org/~deischen/kse/crew.c
>   http://people.freebsd.org/~deischen/kse/sched_bug.c
>
> The latter test may be similar to what you are describing.
> It spawns a bunch of threads to perform disk I/O and one thread
> that just sleeps and prints an incrementing number once a
> second.
>
> Use the first test as "crew node /usr/src" and it will spawn
> worker threads to search for the string "node" in all files
> in /usr/src. It is one of Butenhof's tests.

Thanks, I'll try these and see if I get similar behaviour.

> Other than that, you'll need to give more info. SCHED_4BSD
> or SCHED_ULE? SMP or UP? scope system threads or scope
> process threads? Sample program to demonstrate the problem?

4BSD, SMP. I'm running with default parameters, which I assume is scope
process threads (I'd like to take advantage of M:N threading...). I've
peaked out around 15 threads, so I don't think I should be bumping any
limits.

I can try to boil the program down to a sample. At this point it is too
large (~6600 lines :-).

-Chris
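
For reference, the kind of boiled-down sample discussed above might look
roughly like the sketch below. It is not sched_bug.c and not the ~6600-line
program; it only follows the description in the thread: several threads doing
file I/O plus one thread that sleeps, prints a counter, and records the
longest gap between beats. All names here (run_heartbeat_test, io_worker,
heartbeat, MAX_WORKERS) are made up for illustration, and the scratch-file
I/O via tmpfile() is an assumption standing in for the real disk/network
load. The contention scope is passed in explicitly so that the "scope
process or scope system" question can be tested rather than assumed; a
platform may refuse the requested scope, in which case the sketch keeps the
default.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_WORKERS 64              /* arbitrary cap for this sketch */

static atomic_int stop_workers;     /* set once the heartbeat thread is done */

struct heartbeat_state {
    int beats;                      /* number of beats to print */
    long interval_ms;               /* nominal sleep between beats */
    double max_gap;                 /* out: longest observed gap, seconds */
};

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Worker: rewrite and reread a scratch file until told to stop. */
static void *io_worker(void *arg)
{
    char buf[16384];
    FILE *fp = tmpfile();

    (void)arg;
    if (fp == NULL)
        return NULL;
    memset(buf, 'x', sizeof(buf));
    while (!atomic_load(&stop_workers)) {
        rewind(fp);
        if (fwrite(buf, 1, sizeof(buf), fp) != sizeof(buf))
            break;
        fflush(fp);
        rewind(fp);
        if (fread(buf, 1, sizeof(buf), fp) != sizeof(buf))
            break;
    }
    fclose(fp);
    return NULL;
}

/* Heartbeat: sleep, print an incrementing number, track the longest gap. */
static void *heartbeat(void *arg)
{
    struct heartbeat_state *hb = arg;
    struct timespec nap = { hb->interval_ms / 1000,
                            (hb->interval_ms % 1000) * 1000000L };
    double prev = now_sec();

    for (int i = 1; i <= hb->beats; i++) {
        nanosleep(&nap, NULL);
        double t = now_sec();
        if (t - prev > hb->max_gap)
            hb->max_gap = t - prev;
        printf("beat %d (gap %.3f s)\n", i, t - prev);
        prev = t;
    }
    return NULL;
}

/* Returns the longest gap between beats in seconds, or -1.0 on failure. */
double run_heartbeat_test(int nworkers, int beats, long interval_ms, int scope)
{
    pthread_t workers[MAX_WORKERS], hb_thread;
    pthread_attr_t attr;
    struct heartbeat_state hb = { beats, interval_ms, 0.0 };
    int started = 0, hb_ok;

    assert(nworkers >= 0 && nworkers <= MAX_WORKERS && beats > 0);
    atomic_store(&stop_workers, 0);
    pthread_attr_init(&attr);
    /* Request the scope; if the platform refuses, the default is kept. */
    (void)pthread_attr_setscope(&attr, scope);

    for (int i = 0; i < nworkers; i++)
        if (pthread_create(&workers[started], &attr, io_worker, NULL) == 0)
            started++;
    hb_ok = pthread_create(&hb_thread, &attr, heartbeat, &hb) == 0;
    if (hb_ok)
        pthread_join(hb_thread, NULL);
    atomic_store(&stop_workers, 1);
    for (int i = 0; i < started; i++)
        pthread_join(workers[i], NULL);
    pthread_attr_destroy(&attr);
    return hb_ok ? hb.max_gap : -1.0;
}
```

Calling run_heartbeat_test(15, 30, 1000, PTHREAD_SCOPE_PROCESS) from a small
main() would approximate the setup described (about 15 threads, one beat per
second). Running it again with PTHREAD_SCOPE_SYSTEM, and linking against
each thread library in turn, would show whether a maximum gap well above one
second depends on the scope or the library.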