Date:      Mon, 11 Sep 2000 10:36:32 -0700
From:      Alfred Perlstein <bright@wintelcom.net>
To:        net@freebsd.org
Subject:   Network stack journal.
Message-ID:  <20000911103632.E12231@fw.wintelcom.net>

Journal: threading the FreeBSD network stack.
---------------------------------------------
Notes:
  When I use a lowercase name for someone, it refers
to their freefall login name.
---------------------------------------------
Preface:
I'm writing the journal for several reasons:
1) to provide a place for notes.  The network stack is so large
   that there are going to be many parts I have to skip over.
   I'll note what I've skipped so that either I can get back
   to it or someone else can jump in and do it.
2) to document how the locking systems I'm putting in work
3) random thoughts as I progress either towards my goal or insanity

I started working on this a day or two after the SMPng commit, which
brought FreeBSD mutex primitives and interrupt threads.  That was
sometime in the first week of Sept 2000.

I started this journal a couple of days after starting my work so
I will detail a few things that have happened so far:

Initially I wanted to place mutex locks in both the socket and
socketbuffer structures.  That proved to be too painful, so instead
I use a lock on the socket and keep the old sleep/flags locking on
the socketbuffer.  There isn't a race because the socketbuffer flags
are protected by my socket lock, and the newly added msleep() function
allows me to manipulate the flags and sleep on them safely with
my socket mutex interlocked.
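
A minimal userland sketch of that pattern, not the kernel code itself.
It assumes pthread_cond_wait() as a stand-in for msleep(), since both
release the interlocked mutex atomically while going to sleep.  The
struct names, the SKETCH_SB_WAIT flag and the so_* helpers are all
hypothetical; only the field names are borrowed from the BSD sources
for flavor.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_SB_WAIT 0x01     /* hypothetical: a sleeper is waiting */

struct sockbuf_sketch {
    int    sb_flags;            /* protected by the socket lock */
    size_t sb_cc;               /* bytes queued, same protection */
};

struct socket_sketch {
    pthread_mutex_t so_mtx;     /* the single per-socket lock */
    pthread_cond_t  so_rcv_cv;  /* stands in for the sleep channel */
    struct sockbuf_sketch so_rcv;
};

void so_init(struct socket_sketch *so)
{
    pthread_mutex_init(&so->so_mtx, NULL);
    pthread_cond_init(&so->so_rcv_cv, NULL);
    so->so_rcv.sb_flags = 0;
    so->so_rcv.sb_cc = 0;
}

/* Consumer: sleep until data is queued, then take all of it. */
size_t so_wait_for_data(struct socket_sketch *so)
{
    size_t n;

    pthread_mutex_lock(&so->so_mtx);
    while (so->so_rcv.sb_cc == 0) {
        so->so_rcv.sb_flags |= SKETCH_SB_WAIT;
        /* Drops so_mtx and sleeps in one atomic step, then
         * reacquires so_mtx before returning. */
        pthread_cond_wait(&so->so_rcv_cv, &so->so_mtx);
    }
    n = so->so_rcv.sb_cc;
    so->so_rcv.sb_cc = 0;
    pthread_mutex_unlock(&so->so_mtx);
    return n;
}

/* Producer: queue data and wake a sleeper if the flag says one exists. */
void so_deliver(struct socket_sketch *so, size_t nbytes)
{
    pthread_mutex_lock(&so->so_mtx);
    so->so_rcv.sb_cc += nbytes;
    if (so->so_rcv.sb_flags & SKETCH_SB_WAIT) {
        so->so_rcv.sb_flags &= ~SKETCH_SB_WAIT;
        pthread_cond_broadcast(&so->so_rcv_cv);
    }
    pthread_mutex_unlock(&so->so_mtx);
}

static void *so_producer(void *arg)
{
    so_deliver(arg, 128);
    return NULL;
}

/* One producer, one consumer; returns the byte count received. */
size_t so_demo(void)
{
    struct socket_sketch so;
    pthread_t tid;
    size_t n;
    int rc;

    so_init(&so);
    rc = pthread_create(&tid, NULL, so_producer, &so);
    assert(rc == 0);
    (void)rc;
    n = so_wait_for_data(&so);
    pthread_join(tid, NULL);
    return n;
}
```

Because the flag is only read and written with so_mtx held, the
producer cannot slip in between the consumer setting the flag and
going to sleep, whichever thread runs first.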

I've gone through a lot of the code replacing manipulation of
statistical counters with atomic_ operations.  Some places have many
manipulations (particularly the tcp code), and there it may make more
sense to keep a local statistics counter on the stack and do a batched
update of the global statistics structure under a spinlock.  Other
alternatives include per-cpu counters, but I've heard many negative
comments about doing stats like that.
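
The batched alternative could look roughly like the userland sketch
below.  Everything in it is an assumption made for illustration: the
three counters, the input_sketch() routine and the use of a pthread
mutex in place of a kernel spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters; real statistics structures have many more. */
struct stats_sketch {
    uint64_t pkts_rcvd;
    uint64_t bytes_rcvd;
    uint64_t bad_cksum;
};

static struct stats_sketch global_stats;
/* A pthread mutex stands in for the kernel spinlock. */
static pthread_mutex_t global_stats_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Fold the local counts into the globals with one lock round trip
 * instead of one atomic operation per counter. */
static void stats_flush(const struct stats_sketch *local)
{
    pthread_mutex_lock(&global_stats_mtx);
    global_stats.pkts_rcvd  += local->pkts_rcvd;
    global_stats.bytes_rcvd += local->bytes_rcvd;
    global_stats.bad_cksum  += local->bad_cksum;
    pthread_mutex_unlock(&global_stats_mtx);
}

/* Hypothetical input path: count on the stack, flush once on exit. */
void input_sketch(size_t len, int cksum_ok)
{
    struct stats_sketch local = {0, 0, 0};

    local.pkts_rcvd++;
    if (cksum_ok)
        local.bytes_rcvd += len;
    else
        local.bad_cksum++;
    stats_flush(&local);
}

#define STATS_NTHREADS 4
#define STATS_NPKTS    1000

static void *stats_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < STATS_NPKTS; i++)
        input_sketch(100, i % 10 != 0);  /* every 10th packet is bad */
    return NULL;
}

/* Runs the workers and returns a snapshot of the global counters. */
struct stats_sketch stats_demo(void)
{
    pthread_t tid[STATS_NTHREADS];
    struct stats_sketch snapshot;

    pthread_mutex_lock(&global_stats_mtx);
    global_stats = (struct stats_sketch){0, 0, 0};
    pthread_mutex_unlock(&global_stats_mtx);

    for (int i = 0; i < STATS_NTHREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, stats_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < STATS_NTHREADS; i++)
        pthread_join(tid[i], NULL);

    pthread_mutex_lock(&global_stats_mtx);
    snapshot = global_stats;
    pthread_mutex_unlock(&global_stats_mtx);
    return snapshot;
}
```

The trade-off is that the global numbers lag slightly behind reality
until each routine flushes, which is usually acceptable for statistics.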

Bosko Milekic <bmilekic@dsuper.net> was kind enough to MPsafe the
mbuf allocator code.  We need to test this: he used await/asleep
rather than msleep, and this ought to be checked for validity as the
asleep interface was implemented before SMPng and may not be safe.
I'm hoping that Bosko sticks around to help out; he's got some
great programming skill and there's a lot of code to work on.

I've already decided that my initial goal is going to be getting
udp and tcp4 working.  Unfortunately that means I'm most likely not
working on:

BRIDGE, DUMMYNET, INET6, NETATALK, NS, IPX, IPSEC, NETGRAPH

I suspect that they can easily be made MPsafe, but they aren't a
consideration at this point.  I just want to get something working
right now, and that means userland<-(tcp/udp)->wire MPsafe code.

The good part is that now more than ever developers are active enough
to jump in and fix these.  And before I get flamed off the earth:
I most likely will not be committing until the INET6, IPSEC and
NETGRAPH maintainers are comfortable with it.

Malloc is now MPsafe thanks to jasone and jake, which is obviously
a key starting point.

I had an interesting discovery the other night: when replacing an
spl with a mutex over a particular structure we must be very careful.
While the spl is raised we can tsleep, which effectively drops the
mutual exclusion.  A held mutex is not dropped when we sleep, so we
must be wary of that when switching over to mutexes to avoid deadlocks.

A quick (stupid) example: calling a function to wait for data to
arrive on a socket while holding the socket lock and forgetting to
drop the lock before calling it.  Normally spl would be dropped the
instant you slept and the network stack could churn along and dump
some data into your socketbuffer, but this is no longer the case.  The
interrupt must also block against your mutex, and if you screw up you
block waiting for data while the socket is locked against outside
manipulation, including data arrival.
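
The hazard can be shown in userland with a small sketch.  The names
here (sock_sketch, deliver_try, hazard_demo) are hypothetical, a second
thread plays the part of the interrupt, and it uses
pthread_mutex_trylock() so that the demo reports the blocked delivery
instead of actually hanging.

```c
#include <errno.h>
#include <pthread.h>

struct sock_sketch {
    pthread_mutex_t mtx;        /* the socket lock */
    int bytes_queued;           /* protected by mtx */
};

/* Delivery path: needs the socket lock to queue data.  Returns 0 on
 * success or the trylock error (EBUSY when the lock is held). */
static int deliver_try(struct sock_sketch *so, int nbytes)
{
    int rc = pthread_mutex_trylock(&so->mtx);

    if (rc != 0)
        return rc;
    so->bytes_queued += nbytes;
    pthread_mutex_unlock(&so->mtx);
    return 0;
}

struct attempt {
    struct sock_sketch *so;
    int nbytes;
    int rc;
};

static void *intr_thread(void *arg)
{
    struct attempt *a = arg;

    a->rc = deliver_try(a->so, a->nbytes);
    return NULL;
}

/* Run one delivery attempt from another thread and report its result. */
static int deliver_from_intr(struct sock_sketch *so, int nbytes)
{
    struct attempt a = { so, nbytes, -1 };
    pthread_t tid;

    if (pthread_create(&tid, NULL, intr_thread, &a) != 0)
        return -1;
    pthread_join(tid, NULL);
    return a.rc;
}

/* Returns 1 if delivery failed while the lock was held and succeeded
 * once it was dropped. */
int hazard_demo(void)
{
    struct sock_sketch so;
    int while_held, after_drop;

    pthread_mutex_init(&so.mtx, NULL);
    so.bytes_queued = 0;

    /* The mistake: "waiting for data" with the socket lock held. */
    pthread_mutex_lock(&so.mtx);
    while_held = deliver_from_intr(&so, 64);   /* cannot get in */
    pthread_mutex_unlock(&so.mtx);

    /* The fix: the lock is dropped before waiting. */
    after_drop = deliver_from_intr(&so, 64);   /* data arrives */

    pthread_mutex_destroy(&so.mtx);
    return while_held == EBUSY && after_drop == 0 &&
        so.bytes_queued == 64;
}
```

With a blocking lock in the delivery path instead of trylock, the
first attempt would never return: the waiter holds the lock while
waiting for data that can only arrive once the lock is free.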

So far I think I have a pretty sound system protecting sockets.  There
is also some preliminary stuff with routes and pcbs, but I need to work
on those more.

I've switched the ucred system to use atomic ops which should make it
mpsafe.
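
As an illustration of what atomic ops on a credential might involve,
here is a userland sketch of an atomically reference-counted structure
using C11 <stdatomic.h>.  It assumes the atomic ops in question guard
a reference count; the struct and the cred_* functions are
hypothetical and are not the kernel's ucred interface.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical credential with an atomic reference count. */
struct cred_sketch {
    atomic_uint cr_ref;
    unsigned    cr_uid;
};

struct cred_sketch *cred_alloc(unsigned uid)
{
    struct cred_sketch *cr = malloc(sizeof(*cr));

    if (cr == NULL)
        return NULL;
    atomic_init(&cr->cr_ref, 1);    /* the caller owns one reference */
    cr->cr_uid = uid;
    return cr;
}

void cred_hold(struct cred_sketch *cr)
{
    atomic_fetch_add(&cr->cr_ref, 1);
}

/* Drop one reference; frees and returns 1 when it was the last. */
int cred_drop(struct cred_sketch *cr)
{
    if (atomic_fetch_sub(&cr->cr_ref, 1) == 1) {
        free(cr);
        return 1;
    }
    return 0;
}

static atomic_int early_frees;

static void *cred_worker(void *arg)
{
    struct cred_sketch *cr = arg;

    for (int i = 0; i < 10000; i++) {
        cred_hold(cr);
        if (cred_drop(cr))          /* must never be the last ref */
            atomic_fetch_add(&early_frees, 1);
    }
    return NULL;
}

/* Returns 1 if only the final drop freed the credential. */
int cred_demo(void)
{
    pthread_t tid[4];
    struct cred_sketch *cr = cred_alloc(1001);
    int last;

    if (cr == NULL)
        return 0;
    atomic_store(&early_frees, 0);
    for (int i = 0; i < 4; i++) {
        int rc = pthread_create(&tid[i], NULL, cred_worker, cr);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 4; i++)
        pthread_join(tid[i], NULL);
    last = cred_drop(cr);           /* the original reference */
    return last == 1 && atomic_load(&early_frees) == 0;
}
```

Only the count is protected this way.  Any other field that can change
after the structure becomes shared would still need its own locking.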

Journal continued at:
  http://people.freebsd.org/~alfred/mpsafe/stackjournal.txt

Work in progress:
  http://people.freebsd.org/~alfred/mpsafe/mpsafestack.diff
  

Ok, and here begins a time based journal.
----------------------------------------------

Mon Sep 11 10:16:50 PDT 2000

Realized that attempting to thread tcp_input code before ether code was
a bad idea.  The tcp code uses global variables from the IP code
which probably uses globals from the ether code, so I'm working in
the wrong direction (or working in a direction that's going to have
me spread out too thin).

I've decided to take this route:
ether_input->ip_input->tcp/udp_input->
and
tcp_output->ip_output->ether_output


-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message



