From owner-freebsd-current@FreeBSD.ORG  Sun Dec 23 18:55:47 2012
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 2852D979
 for <freebsd-current@freebsd.org>; Sun, 23 Dec 2012 18:55:47 +0000 (UTC)
 (envelope-from freebsd@damnhippie.dyndns.org)
Received: from duck.symmetricom.us (duck.symmetricom.us [206.168.13.214])
 by mx1.freebsd.org (Postfix) with ESMTP id D7A0A8FC0A
 for <freebsd-current@freebsd.org>; Sun, 23 Dec 2012 18:55:46 +0000 (UTC)
Received: from damnhippie.dyndns.org (daffy.symmetricom.us [206.168.13.218])
 by duck.symmetricom.us (8.14.5/8.14.5) with ESMTP id qBNItcJq067249
 for <freebsd-current@freebsd.org>; Sun, 23 Dec 2012 11:55:38 -0700 (MST)
 (envelope-from freebsd@damnhippie.dyndns.org)
Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240])
 by damnhippie.dyndns.org (8.14.3/8.14.3) with ESMTP id qBNItF9p070079
 for <freebsd-current@freebsd.org>; Sun, 23 Dec 2012 11:55:15 -0700 (MST)
 (envelope-from freebsd@damnhippie.dyndns.org)
Subject: Why does sleep(1) end up blocked in bwillwrite()?
From: Ian Lepore <freebsd@damnhippie.dyndns.org>
To: freebsd-current@freebsd.org
Content-Type: text/plain; charset="us-ascii"
Date: Sun, 23 Dec 2012 11:55:15 -0700
Message-ID: <1356288915.1129.68.camel@revolution.hippie.lan>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
 <mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 23 Dec 2012 18:55:47 -0000

Background:  I'm trying to get nandfs working on a low-end small-memory
embedded system.  I'm debugging performance problems that manifest as
the system (or large portions of it) becoming unresponsive for many
seconds at a time.  It appears that sometimes the nandfs background
garbage collector does things that lead to dirtying lots of buffers way
faster than they can be written.  When that happens it seems to take too
long (many seconds) for the problem to clear.  That's the basic
situation I'm investigating, but NOT what this mail is about, that's
just the background.

When this situation happens, some of the threads in my application keep
running fine.  Others get blocked unexpectedly even though they do no
disk IO at all, they're working with sockets and serial (uart) devices.
I discovered by accident that I can see a form of the problem happening
just using sleep(1) and hitting ^T while the buffer starvation is in
progress...

  guava# sleep 999999
[ hit ^T]
  load: 1.03  cmd: sleep 472 [nanslp] 2.03r 0.01u 0.02s 0% 1372k
  sleep: about 999997 second(s) left out of the original 999999
[ hit ^T]
  load: 1.27  cmd: sleep 472 [nanslp] 9.32r 0.01u 0.02s 0% 1376k
  sleep: about 999989 second(s) left out of the original 999999
[ hit ^T]
  load: 1.49  cmd: sleep 472 [nanslp] 11.53r 0.01u 0.02s 0% 1376k
[ note no output from sleep(1) here, repeated ^T now gives...]
  load: 1.49  cmd: sleep 472 [flswai] 12.01r 0.01u 0.03s 0% 1376k
  load: 1.49  cmd: sleep 472 [flswai] 12.27r 0.01u 0.03s 0% 1376k
  load: 1.49  cmd: sleep 472 [flswai] 12.76r 0.01u 0.03s 0% 1376k
  load: 1.49  cmd: sleep 472 [flswai] 13.06r 0.01u 0.03s 0% 1376k
  load: 1.49  cmd: sleep 472 [flswai] 13.26r 0.01u 0.03s 0% 1376k
  load: 1.61  cmd: sleep 472 [flswai] 20.03r 0.02u 0.07s 0% 1376k
  load: 1.64  cmd: sleep 472 [flswai] 20.49r 0.02u 0.07s 0% 1376k
  load: 1.64  cmd: sleep 472 [flswai] 20.68r 0.02u 0.08s 0% 1376k
  sleep: about 999987 second(s) left out of the original 999999

So here sleep(1) was blocked in bwillwrite() for about 9 seconds on a
write to stderr (which is an ssh xterm connection).

The call to bwillwrite() is in kern/sys_generic.c in dofilewrite():

	if (fp->f_type == DTYPE_VNODE)
              bwillwrite();

I just noticed the checkin message that added the DTYPE_VNODE check
specifically mentions not penalizing devices and pipes and such.  I
think maybe things have evolved since then (Dec 2000) and this check is
no longer sufficient.  Maybe it needs to be something more like

	if (fp->f_type == DTYPE_VNODE && fp->f_vnode->v_type == VREG)

but I have a gut feeling it needs to be more complex than that (can
f_vnode be NULL, what sort of locking is required to peek into f_vnode
at this point, etc), so I can't really propose a patch for this.  In
fact, I can't even say for sure it's a bug, but it sure feels like one
to the application-developer part of me.

-- Ian