Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 02 Sep 2008 15:55:53 +0100
From:      "Bruce M. Simpson" <bms@FreeBSD.org>
To:        Luigi Rizzo <rizzo@iet.unipi.it>
Cc:        FreeBSD networking and TCP/IP list <freebsd-net@freebsd.org>
Subject:   Re: how to read dynamic data structures from the kernel (was Re: reading routing table)
Message-ID:  <48BD53F9.50002@FreeBSD.org>
In-Reply-To: <20080902105124.GA22832@onelab2.iet.unipi.it>
References:  <3170f42f0809010507q6c37a9d5q19649bc261d7656d@mail.gmail.com>	<48BBE7B2.4050409@FreeBSD.org> <48BCE4AA.6050807@elischer.org>	<3170f42f0809020017k643180efte155a5b5701a40cf@mail.gmail.com>	<alpine.BSF.1.10.0809021017500.1150@fledge.watson.org> <20080902105124.GA22832@onelab2.iet.unipi.it>

next in thread | previous in thread | raw e-mail | index | archive | help
Luigi Rizzo wrote:
> do you know if any of the *BSD kernels implements some good mechanism
> to access a dynamic kernel data structure (e.g. the routing tree/trie,
> or even a list or hash table) without the flaws of the two approaches
> i indicate above ?
>   

Hahaha. I ran into an isomorphic problem with Net-SNMP at work last week.

    There's a need to export the BGP routing table via SNMP. Of course 
doing this in our framework at work requires some IPC calls which always 
require a select() (or WaitForMultipleObjects()) based continuation.
    Net-SNMP doesn't support continuations at the table iterator level, 
so somehow, we need to implement an iterator which can accomodate our 
blocking IPC mechanism.

   [No, we don't use threads, and that would actually create more 
problems than it solves -- running single-threaded with continuations 
lets us run lock free, and we rely on the OS's IPC primitives to 
serialize our code. works just fine for us so far...]

    So we would end up caching the whole primary key range in the SNMP 
sub-agent on a table OID access, a technique which would allow us to 
defer the IPC calls providing we walk the entire range of the iterator 
and cache the keys -- but even THAT is far too much data for the BGP 
table, which is a trie with ~250,000 entries. I hate SNMP GETNEXT.

    Back to the FreeBSD kernel, though.

    If you look at in_mcast.c, particularly in p4 bms_netdev, this is 
what happens for the per-socket multicast source filters -- there is the 
linearization of an RB-tree for setsourcefilter().
    This is fine for something with a limit of ~256 entries per socket 
(why RB for something so small? this is for space vs time -- and also it 
has to merge into a larger filter list in the IGMPv3 paths.)
    And the lock granularity is per-socket. However it doesn't do for 
something as big as a BGP routing table.

    C++ lends itself well to expressing these kinds of smart-pointer 
idioms, though.
    I'm thinking perhaps we need the notion of a sysctl iterator, which 
allocates a token for walking a shared data structure, and is able to 
guarantee that the token maps to a valid pointer for the same entry, 
until its 'advance pointer' operation is called.

Question is, who's going to pull the trigger?

cheers
BMS

P.S. I'm REALLY getting fed up with the lack of openness and 
transparency largely incumbent in doing work in p4.

Come one come all -- we shouldn't need accounts for folk to see and 
contribute what's going on, and the stagnation is getting silly. FreeBSD 
development should not be a committer or chum-of-committer in-crowd.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?48BD53F9.50002>