Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 15 Dec 2011 01:30:14 +1100
From:      Julien Ridoux <jrid@cubinlab.ee.unimelb.edu.au>
To:        Jung-uk Kim <jkim@freebsd.org>
Cc:        svn-src-head@freebsd.org, Lawrence Stewart <lstewart@freebsd.org>, Benjamin Kaduk <kaduk@mit.edu>, freebsd-arch@freebsd.org
Subject:   Re: svn commit: r227778 - head/sys/net
Message-ID:  <405E9C58-F999-45C2-BC20-81ED4CA8A3E9@cubinlab.ee.unimelb.edu.au>
In-Reply-To: <4EDB6814.9040403@freebsd.org>
References:  <201111210417.pAL4HOdi023556@svn.freebsd.org> <4ED8575D.2010405@freebsd.org> <201112021927.25234.jkim@FreeBSD.org> <201112022002.49618.jkim@FreeBSD.org> <4EDB6814.9040403@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help

On 04/12/2011, at 11:31 PM, Lawrence Stewart wrote:

> On 12/03/11 12:02, Jung-uk Kim wrote:
>> On Friday 02 December 2011 07:27 pm, Jung-uk Kim wrote:
>>> On Thursday 01 December 2011 11:43 pm, Lawrence Stewart wrote:
>>>> On 12/02/11 03:43, Jung-uk Kim wrote:
>>>>> On Thursday 01 December 2011 10:11 am, Lawrence Stewart wrote:
>>>>>> On 11/30/11 05:09, Jung-uk Kim wrote:
>>>>>>> On Tuesday 29 November 2011 11:13 am, Lawrence Stewart wrote:
> [snip]
>>>>>>>> Here's the first of the patches:
>>>>>>>>=20
>>>>>>>> http://people.freebsd.org/~lstewart/patches/misc/ffclock_bpf
>>>>>>>> _i nt act abi_10.x.r228130.patch
>>>>>>>=20
>>>>>>> I only glanced at it but it looks very close to what I wanted
>>>>>>> to suggest.
>>>>>>=20
>>>>>> Final candidate patch is at:
>>>>>>=20
>>>>>> http://people.freebsd.org/~lstewart/patches/misc/ffclock_bpf_i
>>>>>> nt act abi_10.x.r228180.patch
>>>>>>=20
>>>>>> Assuming it passes the "make tinderbox" build I'm currently
>>>>>> running and no further input is received from interested
>>>>>> parties, I plan to commit it in ~10 hours.
>>>>>>=20
>>>>>> Changes since the r228130 patch I sent previously:
>>>>>>=20
>>>>>> - The new flags in bpf.h are added unconditionally so that
>>>>>> they can always be referenced at compile time and a decision
>>>>>> made at runtime as to whether a flag will be set or not. Using
>>>>>> one of the new flags when the kernel doesn't have FFCLOCK
>>>>>> compiled in results in the flag being ignored. An app should
>>>>>> check for the existence of the "ffclock" kernel feature or the
>>>>>> "kern.sysclock" sysctl tree before attempting to use the
>>>>>> flags.
>>>>>>=20
>>>>>> - This patch will hopefully be MFCed at some point, so I added
>>>>>> a CTASSERT to bpf.c to ensure that the ABI of structs
>>>>>> bpf_hdr32, bpf_hdr and bpf_xhdr remains intact when FFCLOCK is
>>>>>> enabled and the union of a ffcounter and struct
>>>>>> timeval32/timeval/bpf_ts is switched in.
>>>>>>=20
>>>>>> - bpf_gettime() more comprehensively covers all the possible
>>>>>> cases of flag combination and does sensible things for each
>>>>>> case (even though some cases are rather silly).
>>>>>>=20
>>>>>> - The snippet of code at the beginning of catchpacket() that
>>>>>> was manipulating the struct bintime derived from bpf_gettime()
>>>>>> was gross and has been removed in favour of selecting the
>>>>>> right {get}bin{up}time() function call in bpf_gettime().
>>>>>=20
>>>>> I did that to reduce branching.  Since you are introducing more
>>>>> branches, it warrants a function pointer now.
>>>=20
>>> There's another reason, BTW.  If mbufs are tagged with timestamps
>>> (where only monotonic timestamps are allowed), they must be
>>> converted within the bpf.c.  I forgot all the details. :-(
>=20
> We should document this knowledge in some code comments.
>=20
>>>> I see, thanks for the explanation. Could you elaborate a bit more
>>>> about how you would implement the function pointer idea? I'm also
>>>> curious in the !FFCLOCK case just how much overhead having the
>>>> 2-layer nested if/else adds. I'm not an very optimisation savvy
>>>> person, but I'm wondering if it's actually worth micro-optimising
>>>> this code.
>>>>=20
>>>> My initial thoughts about your function pointer idea lead to
>>>> adding a function pointer in the bpf_d and setting it to the
>>>> appropriate function to get the timestamp from at bpf_d creation
>>>> or ioctl time. Whilst I like this idea, I can't see how it would
>>>> work given that the various functions involved in time/ffcounter
>>>> stamp generation all have different signatures.
>>>>=20
>>>> We could have multiple variants of bpf_gettime() which each call
>>>> the appropriate underlying functions to generate the appropriate
>>>> stamp. Would add quite a lot of code but would reduce the
>>>> overhead of calling bpf_gettime() to an indirect function call +
>>>> the underlying stamp generation function call. This also solves
>>>> the problem of multiple function signatures.
>>>=20
>>> Please see my patch:
>>>=20
>>> http://people.freebsd.org/~jkim/bpf_ffclock.diff
>>=20
>> I booted up the kernel and found it just crashes because of stupid
>> typos.  Now a new patch is uploaded in place.  Sorry.
>>=20
>> Anyway, I see no regression nor ABI breakage. :-)
>=20
> struct bpf_d being part of the ABI was the main thing I was concerned =
about, so given that it isn't I like your approach a lot.
>=20
> As noted by Julien, this approach does introduce problems with respect =
to the follow up patch that adds a permanent bh_ffcounter member to the =
bpf header. I thought the fromclock() API would sufficiently meet the =
needs of consumers like BPF, but if we were to proceed with something =
like Jung-uk's proposed patch I don't think that's true anymore.
>=20
> We decided to bite the bullet and devise an API that is more compact =
and can return all appropriate time information from all underlying =
sysclocks in an efficient manner - something like a more generic version =
of ffclock_abstime(). Julien and I spent quite some time today nutting =
out details, and Julien has done a proof of concept patch against my =
proposed BPF patch which looks good as a starting point for discussion:
>=20
> =
http://www.cubinlab.ee.unimelb.edu.au/~jrid/patches/r228254/sysclock_snap.=
patch
>=20
> It needs a bit more work and should be split into two patches (one =
introducing the API and the other being the BPF changes).
>=20
> With something like this though, the BPF patch becomes radically =
simplified in the FFCLOCK case. We don't even need a function pointer =
cached in the bpf_d anymore, but could cache a struct sysclock_snap =
there instead if we really wanted to minimise overhead in bpf_gettime().
>=20
> We'll have a go at refining the patch tomorrow hopefully, but wanted =
to put this out there for early consideration.
>=20
> Cheers,
> Lawrence


Hi Jung-uk (and all),

It took us a bit of time but we have a new patch. There are a lot of =
changes in the patch, and I will start by summarising our motivation for =
the changes. Hopefully I won't forget anything, but if I leave something =
unclear, don't hesitate to ask Lawrence or me.

Starting with the easy bits, we drop the union in the BPF header and =
followed something similar to what Ben and you proposed.

Next, the core motivation was to ensure that the timestamping function =
be called once only per packet captured, while keeping the ability to =
read both clocks and return the time from one of the two to the user. =
This forced us to redesign the timestamping function and one thing =
leading to another a fair bit of the BPF code.

There 3 orthogonal parameters to the timestamping function in the BPF =
context:
- which clock is used to return the time
- what is the format of the value returned
- and a performance issue (nothing to do with timestamping or =
timekeeping): reading the hardware counter or not

We handle the first one with a new sysclock_getsnapshot() which saves =
both clock parameters when timestamping and potentially reads the =
hardware counter.
The second one has been covered in previous discussion with the addition =
of BPF_T_FFCOUNTER.

The performance issue created problems. In the current code, the order =
with which the BPF devices are open on one interface impacts the =
timestamping function. Let's consider two BPF devices respectively open =
with BPF_TSTAMP_NORMAL and BPF_TSTAMP_FAST quality. If NORMAL is open =
after FAST, it is at the head of the list, the hardware counter is read, =
and FAST benefits from higher quality timestamps. If FAST is open after =
NORMAL, the timestamping function is called once and return last kernel =
tick time, then it is called a second time for NORMAL, this time =
accessing the hardware counter.

None of these cases are satisfactory. This lead to inconsistent =
behaviour based on the order the BPF are open. An application can impact =
the settings of other BPF devices and other applications / kernel =
consumers. The timestamping suffers from higher latency and worse, =
higher variability of the latency, which is bad.

We then took a different approach, which is inspired by your work on =
BPF. We believe the FAST/NORMAL settings should have 2 level of =
granularity: system wide and per interface. The system wide is the =
default behaviour (for example all BPF devices are NORMAL) and the =
interface one super-seeds the system one. This is implemented by the use =
of new sysctl living in the net.bpf tree.

This enables many scenarios such as:
- system default is FAST to improve performance because the system uses =
a slow timecounter (eg HPET or ACPI).
- the system has 3 interfaces. Interface 2 is put in NORMAL mode (need =
for accuracy but not much traffic), and the last can do hardware =
timestamping and is put in BPF_TSTAMP_EXTERNAL mode.

This then naturally led to store the performance setting =
(none/fast/normal/external) for each interface instead of each BPF =
descriptor.

Each BPF descriptor still maintains the other 2 settings: the choice of =
clock and the format of the timestamp independently from each other.

We believe this new patch keeps the spirit of the changes you proposed =
and improves it while accomodating the changes required by the =
feed-forward clock and future evolutions with things like IEEE 1588 =
compatible NICs.

We plan to push this patch as soon as possible. We would really =
appreciate if you could comment on this long email and pach very soon.

The patch can be found here:
=
http://www.cubinlab.ee.unimelb.edu.au/~jrid/patches/r228429-v4/ffclock_bpf=
_intactabi.patch

We will adjust the documentation as well once we have reached consensus.

Julien (and Lawrence of course)






Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?405E9C58-F999-45C2-BC20-81ED4CA8A3E9>