Date:      Fri, 11 Jun 2010 23:08:24 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Jung-uk Kim <jkim@freebsd.org>
Cc:        freebsd-net@freebsd.org
Subject:   Re: [RFC] BPF timestamping
Message-ID:  <20100611215032.U35046@delplex.bde.org>
In-Reply-To: <201006102124.02005.jkim@FreeBSD.org>
References:  <201006091444.50560.jkim@FreeBSD.org> <20100610173950.T33647@delplex.bde.org> <201006102124.02005.jkim@FreeBSD.org>

On Thu, 10 Jun 2010, Jung-uk Kim wrote:

> On Thursday 10 June 2010 05:45 am, Bruce Evans wrote:
>> On Wed, 9 Jun 2010, Jung-uk Kim wrote:
>>> bpf(4) can only timestamp packets with microtime(9).  I want to
>>> expand it to be able to use different format and resolution.  The
>>> ...
>> This has too many timestamp types, yet not one timestamp type which
>> is any good except possibly BPF_T_NONE, and not one monotonic
>> timestamp type.  Only external uses and compatibility require use
>> of CLOCK_REALTIME.
>> ...
> Please note that I am not trying to solve timecounter issues here.
> The current BPF timestamping is not too good because of two main
> reasons; 1) it is too slow with some timecounter hardware as you have
> noted and 2) we have no API to change timestamp resolution, accuracy,
> format, offset, or whatever *at all*.
>
> The most common trick for the first problem is using getmicrotime(9)
> instead of microtime() if the users don't care much about its
> accuracy.  For those people who want to collect as many packets as
> possible without spending fortunes, it works pretty well.  However,
> suppose you have multiple interfaces.  You want good timestamps from
> a slower controller (LAN side) and less accurate timestamps from a
> super fast controller (WAN side), but you can't.  My patch solves
> this problem by assigning time stamping function per descriptor.  So,
> you can use the same resolution but different accuracies, for
> example.

I now think you should provide exactly the same timestamping features
as provided to userland by clock_gettime(2), clock_getres(2) and
clock_getaccprecres(2missing), using essentially the same interface
and code.  The userland interface involves clock ids of type clockid_t
with names like CLOCK_REALTIME instead of bpf-specific names and types.
Unfortunately it only supports the timespec format.
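
A minimal userland sketch of that interface, assuming only the POSIX
clock ids CLOCK_REALTIME and CLOCK_MONOTONIC (the FreeBSD-specific
*_FAST and *_PRECISE ids would be passed the same way).  The helper
names sample_clock() and show_clock() are hypothetical, not part of
any existing API:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Read one clock and its advertised resolution; returns 0 on success. */
static int
sample_clock(clockid_t id, struct timespec *now, struct timespec *res)
{
	if (clock_gettime(id, now) != 0)
		return (-1);
	return (clock_getres(id, res));
}

/* Print one reading for a named clock id (hypothetical helper). */
static void
show_clock(const char *name, clockid_t id)
{
	struct timespec now, res;

	if (sample_clock(id, &now, &res) != 0) {
		perror(name);
		return;
	}
	/* A normalized timespec keeps tv_nsec in [0, 10^9). */
	assert(now.tv_nsec >= 0 && now.tv_nsec < 1000000000L);
	printf("%-16s %lld.%09ld (res %lld.%09ld)\n", name,
	    (long long)now.tv_sec, now.tv_nsec,
	    (long long)res.tv_sec, res.tv_nsec);
}
```

Calling show_clock("CLOCK_MONOTONIC", CLOCK_MONOTONIC) from main()
prints one reading together with its resolution.  Both values come
back as struct timespec, which is the format limitation noted above.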

> The second problem is a little bit harder for us without breaking
> libpcap and its consumers as it expects struct timeval and nothing
> else.  That's why I had to introduce new header format with compat
> shims.  In fact, struct bpf_hdr (and struct pcap_sf_pkthdr) is really
> obsolete and people have been talking about pcap NG for many years,
> which can store timestamps in variable resolutions and offsets.

Does it prefer or support bintimes?

> However, we can only use the default resolution even if libpcap gets
> the new format because we are stuck with struct bpf_hdr[1].
>
> BTW, I updated my patch, which includes monotonic clocks now.
>
> 	BPF_T_MICROTIME_MONOTONIC	microuptime(9)
> 	BPF_T_NANOTIME_MONOTONIC	nanouptime(9)
> 	BPF_T_BINTIME_MONOTONIC		binuptime(9)
> 	BPF_T_MICROTIME_MONOTONIC_FAST	getmicrouptime(9)
> 	BPF_T_NANOTIME_MONOTONIC_FAST	getnanouptime(9)
> 	BPF_T_BINTIME_MONOTONIC_FAST	getbinuptime(9)
>
> http://people.freebsd.org/~jkim/bpf_tstamp2.diff
>
> Thanks for the hint, Bruce, although you may say there are more bogus
> clock types now. ;-)

Yes, there are far too many, but many are still missing:
- aliases BPF_T_*TIME_PRECISE for BPF_T_*TIME, matching the
   corresponding aliases for clockid_t's.  This gives 18 clock ids
   per timecounter instead of only 12.  clock_gettime() only supports
   6 of these (it doesn't support the micro or bin time formats).
- aliases BPF_T_UPTIME* for BPF_T_*TIME_MONOTONIC.  This gives 27
   clock ids per timecounter instead of only 18.  clock_gettime()
   only supports 9 of these.
- BPF_T_SECOND corresponding to CLOCK_SECOND.  clock_gettime()
   supports this.
- BPF_T_THREAD_CPUTIME corresponding to CLOCK_THREAD_CPUTIME_ID, but
   without the bogus _ID suffix.  The latter gives the runtime of the current
   thread in nanoseconds.  This might be almost useful for bpf if all the
   packets are stamped by the same kernel or user thread.  Then it would
   function as a packet id with extra info about the time spent processing
   packets.
- BPF_T_VIRTUAL and BPF_T_PROF corresponding to CLOCK_VIRTUAL and
   CLOCK_PROF.  The latter give user and user+sys times for processes.
   They would be about as useful as BPF_T_THREAD_CPUTIME for bpf.
- the total is now 31 for bpf (19 missing) and 13 for clock_gettime().
- multiply this by the number of timecounters.  Non-primary timecounters
   should be available iff something has a use for them.
- raw cputicker timestamps.  CLOCK_THREAD_CPUTIME_ID's timer uses these.
   These are not available in userland.  They are easily available in the
   kernel, by calling cpu_tick().  Scaling them is nontrivial.
- raw timecounter reads.  These are already available in userland via
   sysctlbyname("kern.timecounter.tc.<name>.counter", ...).  Strangely,
   they are hard to call from the kernel.
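
The counts in the list above can be checked with a small tally.  This
is only a sketch and the factorization is an assumption: ids are
counted as formats (micro, nano, bin) times accuracy variants (plain,
_FAST, _PRECISE) times time bases (realtime, monotonic, uptime), plus
the extra ids (SECOND, THREAD_CPUTIME, VIRTUAL, PROF).  The function
name clock_id_count() is hypothetical:

```c
#include <assert.h>

/* Count clock ids as formats x accuracy variants x time bases, plus extras. */
static int
clock_id_count(int formats, int variants, int bases, int extras)
{
	assert(formats > 0 && variants > 0 && bases > 0 && extras >= 0);
	return (formats * variants * bases + extras);
}
```

With 3 formats the steps are (3, 2, 2, 0) for the 12 ids in the patch,
(3, 3, 2, 0) for 18, (3, 3, 3, 0) for 27 and (3, 3, 3, 4) for 31.
clock_gettime() has only the timespec format, so the same steps with
1 format give 6, 9 and 13.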

By using normal clock ids and calling kern_clock_gettime(), you can
avoid lots of duplication (including documentation of the bpf clock
ids) and automatically support new normal clock ids.  However, I
can't see how to implement the following features as efficiently:
- direct scaling to the final precision (kern_clock_gettime() only
   returns timespecs -- see above)
- delayed scaling to the final precision (bpf seems to make timestamps
   as binuptimes and scale them later)
- avoiding going through layers and switches.  bpf goes through several
   layers and switches now, but perhaps it can go directly to the
   *time() function in kern_tc.c via a single function pointer, where
   kern_clock_gettime() and delayed scaling have to use a switch or
   an indexed function pointer since their clock id is highly variable.
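
A userland sketch of the last two points, under stated assumptions:
struct xbintime is a hypothetical stand-in for the kernel's struct
bintime, xclock_monotonic() stands in for binuptime() or
getbinuptime(), and struct xdesc is a hypothetical per-descriptor
state, not the actual bpf descriptor.  The stamp is taken through one
function pointer chosen at configuration time and scaled to the final
format only when it is consumed:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical stand-in for struct bintime: seconds plus a binary fraction. */
struct xbintime {
	int64_t  sec;
	uint64_t frac;		/* units of 2^-64 second */
};

/* Delayed scaling to nanoseconds, using the top 32 fraction bits. */
static void
xbintime_to_timespec(const struct xbintime *bt, struct timespec *ts)
{
	ts->tv_sec = (time_t)bt->sec;
	ts->tv_nsec = (long)(((uint64_t)1000000000 *
	    (uint32_t)(bt->frac >> 32)) >> 32);
}

/* Delayed scaling to microseconds, for consumers that expect a timeval. */
static void
xbintime_to_timeval(const struct xbintime *bt, struct timeval *tv)
{
	tv->tv_sec = (time_t)bt->sec;
	tv->tv_usec = (long)(((uint64_t)1000000 *
	    (uint32_t)(bt->frac >> 32)) >> 32);
}

typedef void xclock_fn(struct xbintime *);

/* Hypothetical per-descriptor state: one pointer set at configuration time. */
struct xdesc {
	xclock_fn *stamp;
};

/* Stand-in clock source; a kernel would call binuptime() or similar here. */
static void
xclock_monotonic(struct xbintime *bt)
{
	struct timespec ts;
	int error;

	error = clock_gettime(CLOCK_MONOTONIC, &ts);
	assert(error == 0);
	(void)error;
	bt->sec = ts.tv_sec;
	/* 18446744073 is 2^64 / 10^9, rounded down. */
	bt->frac = (uint64_t)ts.tv_nsec * (uint64_t)18446744073ULL;
}

/* Hot path: one indirect call, no switch on a clock id. */
static void
xdesc_stamp(const struct xdesc *d, struct xbintime *bt)
{
	d->stamp(bt);
}
```

Each descriptor could point at a different clock function, and the
conversion cost is paid only for packets that are actually delivered.
A clockid_t-based interface would need a switch or table lookup in
place of the direct pointer.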

Bruce


