Date:      Thu, 20 Dec 2001 11:04:59 -0800 (PST)
From:      Julian Elischer <julian@elischer.org>
To:        Poul-Henning Kamp <phk@freebsd.org>
Cc:        arch@freebsd.org
Subject:   Re: Kernel stack size and stacking: do we have a problem ?
Message-ID:  <Pine.BSF.4.21.0112201053020.46573-100000@InterJet.elischer.org>
In-Reply-To: <600.1008837822@critter.freebsd.dk>


On Thu, 20 Dec 2001, Poul-Henning Kamp wrote:

>
> As most of you have probably heard, I'm working on a stacking
> disk I/O layer (http://freefall.freebsd.org/~phk/Geom).
>
> This is as far as I know, only the third freely stackable subsystem
> in the kernel, the first two being VFS/filesystems and netgraph.
>
> The problem with stacking layered systems is that the naïve and
> simple implementation, just calling into the layer below, has
> basically unbounded kernel stack usage.
>
> Fortunately for us, neither VFS nor netgraph has had too much use
> yet, so we have not been excessively bothered by people running
> out of kernel-stack.
>
> It is well documented how to avoid the unbounded stack usage for
> such setups: simply queue the requests at each "gadget" and run
> a scheduler but this no where near as simple nor as fast as the
> direct call.
>
> So I guess we need to ask our selves the following questions:
>
> 1. What do we do when people start to run out of kernel stack
>    because they stack filesystems ?
> 	a) Tell them not to.
> 	b) Tell them to increase UPAGES.
> 	c) Increase default UPAGES.
> 	d) Redesign VFS/VOP to avoid the problem.

A couple of points..

Firstly, the stacks were just increased, with an unmapped guard page at
the end (well, it's an option). That doesn't solve the problem; it's just
related info.

Secondly, UPAGES will make no difference as it no longer exists;
use KSTACK_PAGES instead. Also, we should implement the
'stack-hogs' patch for gcc, of which there are 3 versions around.
Some fs layers are just massive HOGS of stack space for very little reason.


>
> 2. Do we in general want to incur the overhead of scheduling
>    in stacking layers or does increasing the kernel stack as
>    needed make more sense ?

Try an adaptive scheme such as I mentioned above. 99.99% of the time
it avoids scheduling.
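
This message does not spell the adaptive scheme out, so what follows is only
one possible reading, sketched in userland C: call straight into the layer
below while the nesting is shallow, and fall back to a queue once a depth
threshold is crossed. Every name here (pass_down, submit, MAX_DIRECT_DEPTH,
QUEUE_SLOTS) is hypothetical, and a real kernel would keep the depth counter
per thread rather than in a global.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIRECT_DEPTH 4      /* hypothetical threshold for direct calls */
#define QUEUE_SLOTS      64     /* hypothetical deferred-queue capacity */

struct layer {
    struct layer *below;        /* next layer down, NULL at the bottom */
    int handled;                /* requests this layer has processed */
};

struct deferred {
    struct layer *l;
    int req;
};

static struct deferred queue[QUEUE_SLOTS];
static size_t q_head, q_tail;
static int depth;               /* current nesting of direct calls */
static int deferrals;           /* times we had to queue instead of call */

static void layer_input(struct layer *l, int req);

/* Hand a request to a layer: direct call if shallow, queue it otherwise. */
static void
pass_down(struct layer *l, int req)
{
    if (l == NULL)
        return;
    if (depth < MAX_DIRECT_DEPTH) {
        depth++;
        layer_input(l, req);
        depth--;
    } else {
        assert(q_tail - q_head < QUEUE_SLOTS);
        queue[q_tail++ % QUEUE_SLOTS] = (struct deferred){ l, req };
        deferrals++;
    }
}

/* A trivial layer: count the request and pass it on. */
static void
layer_input(struct layer *l, int req)
{
    l->handled++;
    pass_down(l->below, req);
}

/* Entry point: once the direct calls have unwound, drain the queue. */
static void
submit(struct layer *top, int req)
{
    pass_down(top, req);
    while (q_head != q_tail) {
        struct deferred d = queue[q_head++ % QUEUE_SLOTS];
        pass_down(d.l, d.req);
    }
}
```

With a stack of three or four layers the queue is never touched; only
unusually deep stacks pay for the deferral, which is the point of making
the scheme adaptive.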

>
> 3. Would it be possible to make kernel stack size a sysctl ?

Hmmm, it might, but it would be tricky: the constant KSTACK_PAGES
is used for both allocation and deallocation, so if you just made it
a variable and changed it in between, a stack allocated at one size
would be freed at another.
This is about to change BTW in KSE, as there is a kstack per thread
and the allocation routines will be different.
The problem is that I'm caching threads and their stacks for quick
reallocation, so I'd have to check each stack as I hand it out and
resize it if it no longer matches the new size.
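
To make the check-on-handout idea concrete, here is a minimal userland
sketch. It is not the KSE code: kstack_get, kstack_put, struct kstack and
the kstack_pages variable are hypothetical stand-ins, and malloc stands in
for the kernel's page allocator. The point it illustrates is that each
stack records the size it was allocated with, so the cache never has to
trust the current value of the tunable when reusing or freeing it.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_BYTES 4096

/* Hypothetical run-time tunable standing in for a kstack-size sysctl. */
static int kstack_pages = 2;

struct kstack {
    struct kstack *next;    /* link in the free cache */
    int pages;              /* size this stack was actually allocated with */
    void *base;
};

static struct kstack *kstack_cache;

/* Hand out a stack, resizing a cached one if the tunable has changed. */
static struct kstack *
kstack_get(void)
{
    struct kstack *ks = kstack_cache;

    if (ks != NULL) {
        kstack_cache = ks->next;
        ks->next = NULL;
        if (ks->pages == kstack_pages)
            return (ks);            /* fast path: cached size still right */
        free(ks->base);             /* stale size: drop the old pages */
    } else {
        ks = malloc(sizeof(*ks));
        if (ks == NULL)
            return (NULL);
        ks->next = NULL;
    }
    ks->pages = kstack_pages;
    ks->base = malloc((size_t)ks->pages * PAGE_BYTES);
    if (ks->base == NULL) {
        free(ks);
        return (NULL);
    }
    return (ks);
}

/* Return a stack to the cache; its recorded size travels with it. */
static void
kstack_put(struct kstack *ks)
{
    assert(ks != NULL && ks->next == NULL);
    ks->next = kstack_cache;
    kstack_cache = ks;
}
```

The cost of the scheme is one integer comparison per handout, and a
reallocation only for stacks that were cached before the size changed.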

>
> 4. Would it make sense to build an intelligent kernel-stack
>    overflow handling into the kernel, rather than "handling"
>    this with a panic ?
>
Presently we have an unmapped guard page.
We could possibly allocate more pages and fill them in if a page fault
occurs. It would be quite a change to the current code, but it COULD be
done (but not by me: you'd have to have a good handle on the in-kernel
fault handling, which was hair-raising last time I looked).

>    It should be trivially simple to make a function called
>    enough_stack() which would return false if we were in the
>    danger zone.  This function could then be used to fail
>    intelligently at strategic high-risk points in the kernel:
>
> 	int
> 	somefunction(...)
> 	{
> 		...
>
> 		if (!enough_stack())
> 			return (ENOMEM);
> 		...
> 	}
>
> Think about it...

We COULD have a page available to map into the guard page
that would allow completion, but the activation of it would
cause such a low-stack state to be entered.
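
For reference, the enough_stack() test quoted above can be approximated in
userland by comparing the address of a local variable against a recorded
stack base. This is a rough sketch only: it assumes a downward-growing
stack, and STACK_BYTES, DANGER_BYTES, stack_mark_base() and descend() are
made-up names and numbers for illustration. A kernel version would derive
the limits from the thread's real kstack bounds instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_BYTES  (64 * 1024)   /* hypothetical stack budget */
#define DANGER_BYTES (8 * 1024)    /* hypothetical reserve: the danger zone */

static uintptr_t stack_base;       /* address near the top of the stack */

/* Record roughly where the stack starts; call once, early. */
static void
stack_mark_base(void)
{
    char here;

    stack_base = (uintptr_t)&here;
}

/* Bytes consumed since stack_mark_base(), assuming the stack grows down. */
static size_t
stack_used(void)
{
    char here;
    uintptr_t now = (uintptr_t)&here;

    return (now < stack_base ? (size_t)(stack_base - now) : 0);
}

/* Returns false once we are within DANGER_BYTES of the budget. */
static int
enough_stack(void)
{
    return (stack_used() + DANGER_BYTES < STACK_BYTES);
}

/* A stand-in for a stacked layer: fail cleanly instead of overflowing. */
static int
descend(int depth, int limit)
{
    volatile char pad[1024];       /* makes each frame cost about 1 KB */
    int rc;

    pad[0] = (char)depth;
    if (depth >= limit)
        return (0);
    if (!enough_stack())
        return (ENOMEM);
    rc = descend(depth + 1, limit);
    pad[1023] = (char)rc;          /* touch the frame after the call */
    return (rc);
}
```

A shallow call such as descend(0, 4) succeeds, while an effectively
unbounded one comes back with ENOMEM once the budget minus the reserve is
used up, instead of running off the end of the stack.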
>
> -- 
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-arch" in the body of the message
>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.BSF.4.21.0112201053020.46573-100000>