Date:      Thu, 20 Dec 2001 11:26:16 -0800 (PST)
From:      Julian Elischer <julian@elischer.org>
To:        Alfred Perlstein <bright@mu.org>
Cc:        Poul-Henning Kamp <phk@freebsd.org>, arch@freebsd.org
Subject:   Re: Kernel stack size and stacking: do we have a problem
Message-ID:  <Pine.BSF.4.21.0112201106150.46573-100000@InterJet.elischer.org>
In-Reply-To: <20011220094723.B48837@elvis.mu.org>



On Thu, 20 Dec 2001, Alfred Perlstein wrote:

> * Poul-Henning Kamp <phk@freebsd.org> [011220 04:46] wrote:
> >
> > As most of you have probably heard, I'm working on a stacking
> > disk I/O layer (http://freefall.freebsd.org/~phk/Geom).
> >
> > This is, as far as I know, only the third freely stackable subsystem
> > in the kernel, the first two being VFS/filesystems and netgraph.
> >
> > The problem with stacking layered systems is that the naïve and
> > simple implementation, just calling into the layer below, has
> > basically unbounded kernel stack usage.
>

My first version of this tried to avoid recursion and use iteration instead:
the output of each layer was supplied back to the caller (the framework),
whose job it was to then call the next layer. The return could be a linked
list of multiple work requests in the case of a raid array, with each item
referencing a different 'next' module.  The difficulty was in getting the
error and status information to flow back along exactly the reverse path.
I ended up having to pass completed work units back through a second entry
point in each case.  The returned items from each entry point had been set
up during processing with a pointer to the next module that should process
them. The layers were connected much like netgraph nodes are now..
(raidgraph anyone?)

I ended up just going for the stacked (recursive) version, but that had a
lot of problems too.
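
(To make that concrete, the framework loop was roughly of this shape;
the structures and names here are invented for illustration, this is
not the code that was actually written:)

struct layer;

struct work {
    struct work  *w_next;   /* linked list of requests handed back */
    struct layer *w_layer;  /* module that should see this item next */
    void         *w_data;   /* the actual I/O request */
};

struct layer {
    /* process one item; return a list of new requests, each already
     * tagged with the module that should process it next */
    struct work *(*l_start)(struct layer *, struct work *);
    /* the second entry point: completed work coming back up */
    void         (*l_done)(struct layer *, struct work *);
};

static void
framework_run(struct work *todo)
{
    while (todo != NULL) {
        struct work *w = todo;
        struct work *more;

        todo = w->w_next;
        w->w_next = NULL;

        /* iteration, not recursion: the layer hands its output back
         * to the framework, which then calls the next module */
        more = w->w_layer->l_start(w->w_layer, w);
        while (more != NULL) {
            struct work *m = more;

            more = m->w_next;
            m->w_next = todo;
            todo = m;
        }
    }
}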


> I think you're thinking way too hard about this, what would make
> sense is a near surefire way to catch stack overflow along with a
> panic message that was clear like "increase upages in kernel".

err upages has been gone for 4 months now.. try KSTACK_PAGES.
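
(For reference, the knob now looks roughly like this in the kernel
config file; the exact default is platform-dependent, so treat the
value here as an example only:)

# kernel configuration file
options         KSTACK_PAGES=4  # pages of kernel stack per process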

>
> Btw, Windows catches this and somehow assigns additional kernel
> stack pages to processes (or at least it seems).  Do a search for
> "MinSp" or "MinSps".

it can be done.. who has the time and interest?

>
> Lastly I would really avoid redesigning the VOPs however one
> suggestion would be to define an entry/exit function so that
> instead of having traditional stacking code like so:
>
> my_layer(void *foo)
> {
>
>    if (mytest()) {
>       NEXT_LAYER(foo);
>       something;
>    }
>    return ERROR;
> }
>
> You could have:
>=20
> my_layer_entry(void *foo)
> {
>=20
>   if (mytest())
>      return (foo, next_layer_ptr);
>   else
>      return ERROR;
> }
>
> my_layer_exit(void *foo)
> {
>=20
>    something;
>
> }
>

This is similar to the scheme I originally did,
and it does work, but it also has a lot of complications..
for example, you need to keep rigid track of how many requests you
passed back to be processed further down and how many you saw coming back up..
in a recursive scheme you get this almost for free, but it can get weird
in iterative schemes.
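
(Roughly, each layer ends up carrying bookkeeping of this shape; just a
sketch with made-up names, to show where the counting comes in:)

struct req {
    struct req *r_parent;       /* request this one was split from */
    int         r_outstanding;  /* children passed down, not yet back */
    int         r_error;        /* first error seen among the children */
};

/* hypothetical helper: report a finished request to the layer above */
static void report_up(struct req *r);

/*
 * Called from the completion entry point when a child request that was
 * handed further down finally comes back up.
 */
static void
child_done(struct req *child)
{
    struct req *parent = child->r_parent;

    if (parent == NULL)
        return;                 /* a top-level request, completes elsewhere */

    if (child->r_error != 0 && parent->r_error == 0)
        parent->r_error = child->r_error;

    /* the parent is only finished when the number of children that came
     * back up matches the number that were passed down */
    if (--parent->r_outstanding == 0)
        report_up(parent);
}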


> This would keep the stack at a near constant level at the expense
> of programming complexity.

Netgraph in -current has a hybrid of these two schemes. It queues items
AND it runs directly, depending on the dynamic state at the time of the
execution request. If it queues the data, it also puts the appropriate
queue onto a list of queues with work, and it will go service the next
entry on that list as soon as it has finished what it's doing now. The same
scheme may work for disk layering too. It's implemented this way to make
netgraph capable of running optimally in a true SMP environment while
still allowing the graph to be reconfigured while data is traversing it.
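
(In outline the delivery path behaves something like this; a simplified
sketch, not the real netgraph code, and the queue/worklist helpers are
stand-ins:)

struct item;

struct node {
    void (*nd_rcvdata)(struct node *, struct item *);
    /* ... per-node queue of pending items, flags, etc. ... */
};

/* stand-ins for the real queue and worklist primitives */
static int          can_run_directly(struct node *n);
static void         enqueue_item(struct node *n, struct item *i);
static void         worklist_add(struct node *n);
static struct node *worklist_next(void);
static void         service_queue(struct node *n);

static void
deliver(struct node *node, struct item *item)
{
    struct node *n;

    if (can_run_directly(node)) {
        /* fast path: call straight into the node, no queueing */
        node->nd_rcvdata(node, item);
    } else {
        /* slow path: queue the item and note that this node's queue
         * now has work pending */
        enqueue_item(node, item);
        worklist_add(node);
    }

    /* before returning, keep servicing queues that have work */
    while ((n = worklist_next()) != NULL)
        service_queue(n);
}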

>  If you think about it though, it's quite
> like how network interrupts are handled, hardware queues the packets
> and then software runs after hardware returns.
>

Though some people are trying to change this, which leads to the
possibility of circular packet paths..
(rcv->ip_input->ip_forward->ip_output->if_loop->ip_input->........)




> -Alfred


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message



