Date:      Sun, 18 Aug 2019 22:53:51 +0200
From:      Per Hedeland <per@hedeland.org>
To:        Ian Lepore <ian@freebsd.org>
Cc:        freebsd-arm@freebsd.org
Subject:   Re: Is it a good idea to use a usb-serial adapter for PPS? Yes, it is.
Message-ID:  <fe2c2d77-3030-6734-e1d8-c1375f231a24@hedeland.org>
In-Reply-To: <72a964c78cbfc36be2345919633ca2196f0783e3.camel@freebsd.org>
References:  <alpine.BSF.2.21.99999.352.1908071046410.98975@autopsy.pc.athabascau.ca> <69a9bed3-4d0a-f8f6-91af-a8f7d84ee307@hedeland.org> <345bae77417c2495f55799b4c7ca2784f4ece9ed.camel@freebsd.org> <7312032d-2908-9414-0445-6b442c3a02e5@hedeland.org> <523b6f0a0fa5f2aeec298fa74df25d3c4af66acc.camel@freebsd.org> <0426fc8b-5398-d8ab-561e-7823c24403a5@hedeland.org> <24b0eaf25b64d6098b390df092866c69e352d859.camel@freebsd.org> <16c91be1-6f2a-b26d-22c7-be8e4ba8eec0@hedeland.org> <72a964c78cbfc36be2345919633ca2196f0783e3.camel@freebsd.org>

On 2019-08-18 21:27, Ian Lepore wrote:
> On Thu, 2019-08-15 at 23:05 +0200, Per Hedeland wrote:
>> On 2019-08-15 17:49, Ian Lepore wrote:
>>> On Thu, 2019-08-15 at 13:46 +0200, Per Hedeland wrote:
>>>> On 2019-08-09 22:17, Ian Lepore wrote:
>>>>> [...]
>>>>
>>>> I have a theory that your making the kernel clock be based on the 10
>>>> MHz clock also ended up locking the USB poll frequency to that clock,
>>>> and thus to the PPS signal - this would certainly explain the result.
>>>> Do you think this is a possibility? Would it be possible for you to
>>>> re-run the test without modifying the kernel clock? (I do understand
>>>> that the results will be harder to interpret with the drift, and
>>>> ntpd's correction of it, coming into play.)
>>>>
>>>> --Per
>>>>
>>>
>>> I'm not sure what you mean by "modifying the kernel clock".  The kernel
>>> clock always runs on some frequency source.  Typically it's derived
>>> from the cheap 24 MHz crystal that clocks the SoC, sometimes after
>>> being scaled up to 66 MHz by a phase-fractional PLL within the SoC.  I
>>> arranged to use a very stable nearly-drift-free frequency source
>>> instead of a cheap crystal for counting time in the kernel.
>>>
>>> The kernel clock has nothing to do with usb, including polling
>>> intervals; the usb controller hardware handles that, and the root
>>> source clock for that is the cheap 24 MHz crystal.
>>
>> The thing that made me hypothesize that the kernel clock *could* have
>> *something* to do with the USB polling frequency was this observation
>> in https://blog.dan.drown.org/pps-over-usb (link provided by one of
>> the posters in the newsgroup, though he didn't refer specifically to
>> this):
>>
>>      Looking closer at the USB latency, you can see the PPS drifting
>>      relative to the host schedule of polling the USB device for its
>>      status. The system clock error was 2.215ppm during this time
>>      period, and this drift matches that error exactly. This probably
>>      means USB on this system shares the same clock as the system
>>      clock. This hardware is a Raspberry Pi 2, and I suspect it won't be
>>      true for other platforms.
>>
>> So at least on RPi 2, there appears to be a relation between the
>> "normal" system/kernel clock and the USB polling frequency. But I have
>> no idea if there is such a relation on the system you used, and even
>> in that case, *I* certainly can't see how using a different source for
>> the kernel clock could affect the USB polling frequency, which is why I
>> asked if you thought that it was a possibility...
>>
> 
> I probably should have been clearer that I meant there was no
> correlation between the kernel clock and the usb polling on the system
> I was using as a testbed.  On most SoCs, and probably even modern x86
> systems, the same frequency source (typically a 24MHz crystal) will be
> the root clock for both the usb controller hardware and the timer
> hardware from which the kernel clock is derived.  However, the kernel
> clock is numerically steered to be more stable in frequency and
> accurate in phase, so once ntpd has been running for long enough to
> capture and discipline the kernel clock, the situation will change.  The
> usb polling will still be happening at the drifting frequency of the
> underlying crystal, while the kernel timestamps used to mark the PPS
> pulse time will not be drifting at that rate.

Understood. What I don't understand is if, and if so how, your
"replacing" the kernel clock with an "exact" frequency from your 10
MHz clock might affect the USB polling.

> I have a hard time understanding how the measurements were made in that
> pps-over-usb page you cited.  There is mention of a STM32F103
> microcontroller, but it's not clear to me what role it plays.  There is
> also mention of a usb irq and something about a message in a buffer.

My understanding is that the "STM32F103 devboard" actually
*implements* the USB-to-serial adapter. This means that the author has
detailed insight into the workings of the adapter, including the
possibility to modify its firmware - something that is obviously not
the case for an off-the-shelf adapter. The "PPS IRQ" is thus something
that happens *inside* the "adapter".

> For the measurements I made, I was using FTDI usb-serial devices
> directly connected to the usb bus on the Wandboard I was using to make
> measurements.  When the DCD pin changes at the ftdi chip, the chip
> internally notes that it has a line-status change that must be
> communicated upstream at the next opportunity.  When the time comes to
> send the data, it sends a 2-byte packet which contains the modem and
> line status register bits.  (If there are any routine uart data bytes
> in the buffer, they also get transferred, but I'm not doing any data
> transfer on the adapters I'm using for this test, in fact the only pins
> connected are ground and DCD.)  When the input packet arrives, the
> uftdi driver sees the change in the DCD bit and captures a pps event.
> For ftdi chips, all of the foregoing is done with usb BULK-IN
> transfers, not control or interrupt transfers.
> 
> I need to run the same tests with some other brands of usb-serial
> adapters.  I think I may have a cable laying around based on Prolific
> PL2303 chipset.  If I can find it.  I should just go buy a few other-
> brand breakout boards and test them.

FWIW, I did a bleak replica of your setup, using a "noname"
USB-to-serial adapter that I had laying around, which was actually
identified as

pi kernel: uplcom0: <Prolific Technology Inc. USB-Serial Controller, class 0/0, rev 1.10/3.00, addr 4> on usbus0

- and since the uplcom driver has "support for Prolific
PL-2303/2303X/2303HX", I assume it is one of those. Since I don't
have a "real" PPS source, I simulated one with a simple program
running on an RPi 3, that generated a pulse on a gpio pin at the turn
of the second. This pin was then connected to a gpio pin on an RPi B,
and to the DCD pin on the above adapter, also connected to the RPi B -
notably without any ttl-to-rs232 converter (since I also don't have
one of those). I then set up ntpd on the RPi B to use gpiopps plus the
PPS from the uplcom driver:

server <lan host> iburst prefer

server 127.127.22.0 minpoll 4 maxpoll 4
fudge  127.127.22.0 refid gpio

server 127.127.22.1 minpoll 4 maxpoll 4 noselect
fudge  127.127.22.1 refid usb

The result was very nice for gpiopps, but the offset for the uplcom PPS
varied way more than the 1 ms that could be expected even from a constant 1
ms polling interval (this is a 1.1 device), more like a 5-6 ms interval -
some ntpq samples approximately 16 s apart:

      remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*<lan host>      194.58.205.148   2 u   40   64  377    0.996   -0.049   0.034
oPPS(0)          .gpio.           0 l   14   16  377    0.000   -0.004   0.004
  PPS(1)          .usb.            0 l   13   16  377    0.000   -7.090   3.273
      remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*<lan host>      194.58.205.148   2 u   56   64  377    0.996   -0.049   0.034
oPPS(0)          .gpio.           0 l   14   16  377    0.000    0.000   0.004
  PPS(1)          .usb.            0 l   13   16  377    0.000   -2.957   2.567
      remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*<lan host>      194.58.205.148   2 u    6   64  377    0.996   -0.049   0.034
oPPS(0)          .gpio.           0 l   15   16  377    0.000   -0.001   0.004
  PPS(1)          .usb.            0 l   14   16  377    0.000   -8.627   4.871

This *could* be taken to imply that there was also some polling going
on *in* the adapter towards the DCD pin - especially since the above
was with a 10 ms pulse, while if I shortened the pulse to 1 ms, the
variation went down to ~ 1 ms, but almost half the pulses were missed.
Or it might be due to shaky detection due to the lack of a ttl-to-rs232
converter. In any case pretty inconclusive, other than the observation that
it's certainly possible to mess things up...:-)
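
One way to reason about the "polling inside the adapter" idea is a toy
model in which the adapter samples DCD at a fixed period and a pulse is
noticed only if a sample lands inside it. The sketch below is purely
illustrative: the function names, the sampling periods (5003 and 2003
microseconds) and the fixed-rate sampling itself are hypothetical, not
known properties of the PL2303 or of the uplcom driver.

```python
def detection_latency_us(pulse_start_us, pulse_width_us, sample_period_us,
                         sample_phase_us=0):
    """Delay from pulse start to the first sample inside the pulse, or None if missed.

    Samples are taken at sample_phase_us + k * sample_period_us (integer microseconds).
    """
    offset = pulse_start_us - sample_phase_us
    k = -(-offset // sample_period_us)  # ceiling division: first sample at or after pulse start
    latency = sample_phase_us + k * sample_period_us - pulse_start_us
    return latency if latency < pulse_width_us else None


def summarize(pulse_width_us, sample_period_us, n_pulses):
    """Feed one pulse per second into the model; return (missed fraction, min, max latency)."""
    results = [detection_latency_us(i * 1_000_000, pulse_width_us, sample_period_us)
               for i in range(n_pulses)]
    seen = [r for r in results if r is not None]
    missed = 1 - len(seen) / n_pulses
    return missed, min(seen), max(seen)


if __name__ == "__main__":
    # Hypothetical periods, chosen so that they do not divide one second evenly.
    for width, period in [(10_000, 5003), (1_000, 2003)]:
        missed, lo, hi = summarize(width, period, n_pulses=period)
        print(f"width={width}us period={period}us missed={missed:.2f} latency={lo}..{hi}us")
```

In this model a pulse wider than the sampling period is never missed and
its latency spreads over the whole period, while a pulse narrower than the
period is missed for a fraction of about 1 - width/period and the latency
spread shrinks to the pulse width. Taken at face value, the 10 ms
observation would suggest a period of 5-6 ms and the 1 ms observation one
of about 2 ms, so this simple model cannot explain both by itself.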

>>> I think people are massively confused by usb.  A usb 2.0 bus runs at
>>> 480MHz.  That means the time to transmit a packet describing a usb
>>> serial pin-change event takes literally a dozen or so nanoseconds.  The
>>> time it takes to transmit an entire sector of disk data is 2
>>> microseconds; even if continuous disk data is flowing, the usb serial
>>> adapter gets its round-robin opportunity to send a packet on the bus in
>>> between them.
>>
>> Yes, the transmission speed is obviously not a problem, the question
>> is about varying latency due to the polling.
>>
>>> A USB 2.0 bus spends most of its time idle.  The
>>> devices on the bus are polled, but the polling happens in time slots
>>> that are 125 microseconds wide.  There's just no reason for a lot of
>>> jitter or latency.
>>
>> In the newsgroup it was claimed that the polling frequency was 1 kHz
>> for USB 1.1 and 4 kHz for USB 2.0, but it seems it should indeed be 8
>> kHz for 2.0 "high" speed. And your test used one USB 1.1 device and
>> one 2.0 device.
>>
>> And "a lot" is a bit subjective, but for any polling at a frequency
>> that isn't an exact integral number of periods per second, there will
>> be a latency between the start of the PPS pulse and the detection in
>> the host that *varies* in an interval the size of the polling
>> interval. I believe that interval should thus be expected to be 1000
>> microseconds for 1.1 and 125 microseconds for 2.0.
>>
> 
> I think there is some confusion around the concept of usb 1.x devices
> on a usb 2.0 bus.  I think there may even be some confusion when a 1.x
> bus is involved.  And then adding to the confusion is the likelihood
> that different usb-serial adapters use different usb transfer types
> (bulk vs interrupt) to communicate line-state changes.
> 
> A usb 1.x bus is divided into 1ms frames.  A 2.0 bus is divided into
> 125us (micro-)frames.  For interrupt endpoints, a usb 1.x bus limits
> devices to 1 interrupt transfer per frame, and that may imply that
> there is up to 1ms of latency for reporting a DCD change on such a 1.x
> bus.  A 2.0 bus allows up to 3 interrupt transfers per microframe,
> implying latency of up to 125us.
> 
> However, there is no limit on either 1.x or 2.0 busses for how many
> bulk transfers can happen to a given endpoint during a frame.  The
> controller needs to fill a frame with transactions in a way that first
> provides all the guaranteed bandwidth that is promised to control,
> interrupt, and isochronous transfers.  It is then free to fill all the
> remaining time with bulk transfers.
> 
> To me, this implies that you may end up with nearly no latency (and
> negligible jitter) if you have a usb 2.0 bus that has just one or two
> devices on it which are communicating via bulk-transfer endpoints.  The
> controller would be continuously sending BULK IN tokens to the one or
> two devices, so that as soon as one of them has data, it gets an
> opportunity to deliver it almost immediately (meaning within a few
> microseconds).
> 
> The results I see with FTDI usb-serial adapters which use bulk
> transfers provide some evidence that my theory may be correct.  I think
> the bus looks like this:
> 
>    BULK IN token to usb 1.x device (do you have anything to say?)
>    1.x device NAKs
>    BULK IN token to usb 2.0 device
>    2.0 device NAKs
>    <no significant amount of time elapses here>
>    BULK IN token to usb 1.x device
>    1.x device NAKs
>    ... (repeat forever)
> 
> In other words, the device(s) aren't getting 1 chance per frame to
> transfer data, they are getting many thousands of chances per second.
>   I think the bus overhead of the BULK-IN token followed by a NAK from
> the device, along with the various framing bits and crc and all that
> probably adds up to less than 64 bytes per poll.  But assuming it took
> as much as 64 bytes to do that, if there was one usb-serial device on a
> usb 2.0 bus, it would be getting asked about 1 million times per second
> whether it had anything to say.

This is extremely interesting - if it really is the case that the host
will poll "as fast as it can", as opposed to always doing it with a
fixed frequency, it would definitely change the picture. Unfortunately
I haven't seen any documentation to support that this is the case.
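
The "about 1 million times per second" figure quoted above can be checked
with back-of-the-envelope arithmetic. This sketch assumes the 64 bytes per
poll upper bound from the quoted text and the nominal signalling rates (480
Mbit/s high speed, 12 Mbit/s full speed); it ignores bit stuffing,
start-of-frame packets, inter-packet gaps and whatever the host
controller's scheduler actually does, so it is an upper bound under those
assumptions rather than a measurement. The function names are made up for
the illustration.

```python
HIGH_SPEED_BPS = 480_000_000  # USB 2.0 high-speed signalling rate, bits per second
FULL_SPEED_BPS = 12_000_000   # USB 1.x full-speed signalling rate, bits per second


def max_polls_per_second(bus_bps, bytes_per_poll=64):
    """Upper bound on IN-token/NAK exchanges per second if the bus did nothing else."""
    return bus_bps / (bytes_per_poll * 8)


def poll_interval_us(bus_bps, bytes_per_poll=64):
    """Time between consecutive polls, in microseconds, under the same assumption."""
    return 1_000_000 / max_polls_per_second(bus_bps, bytes_per_poll)


if __name__ == "__main__":
    for name, bps in [("high speed", HIGH_SPEED_BPS), ("full speed", FULL_SPEED_BPS)]:
        print(f"{name}: {max_polls_per_second(bps):.0f} polls/s, "
              f"one every {poll_interval_us(bps):.2f} us")
```

With these numbers the high-speed case comes out at 937,500 polls per
second (one every 1.07 microseconds or so) and the full-speed case at
roughly 23,400 per second (one every 43 microseconds), both well below the
125 and 1000 microsecond frame intervals.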

> I'd welcome input from low-level USB gurus about the bus and controller
> behavior in this regard.

I'm afraid we have lost freebsd-usb@ in this sub-thread (it was actually
the case before my first comment, but it would probably have happened
anyway, since I'm not subscribed to that list). Unless you know that
"low-level USB gurus" are also subscribed to freebsd-arm@, it might make
sense to forward your message to freebsd-usb@.

>> Your ntpq output showed an offset close to 200 microseconds for both
>> devices, and I *assumed* that it was more or less constant and thus
>> ntpd could trivially be told to correct for it - but maybe that
>> assumption was incorrect, there was only one instance of ntpq output?
>>
>> If it actually varied in an interval per above, I would expect the
>> jitter to be significantly higher though. And if it *is* more or less
>> constant, can you explain how this is possible? Even the 2.0 125
>> microsecond case should be clearly visible in the offset reported by
>> ntpd across a sequence of ntpq requests.
>>
>>> I'm not on a crusade to change the minds of people who make judgements
>>> based on gut feelings and reject objective measurements.  I put the
>>> measurements out there, and I described the measurement methodology.
>>> (Precision timing is what I do for a living, btw.)  I'm perfectly
>>> willing to explain the methodology in more detail or help interpret the
>>> results, but I'm not going to butt heads with people who just reject
>>> data they don't like for emotional reasons.
>>
>> Well, I guess a problem here is that it's my confused head that is
>> butted between yours and those of the supposedly-experts that
>> participate in the NTP newsgroup/maillist:-) - you already declined to
>> participate there, and I don't expect that any of them will take the
>> trouble to participate here. Maybe we'll just have to leave it at
>> that...
> 
> I had a brief look at whether I could get posting access to the ntp
> newsgroup and didn't find anything easy to set up and use, and I'm
> reluctant to get an nntp provider and install a newsreader for one
> conversation.

Understood - there are free nntp providers, but you do need a newsreader of
some kind. The newsgroup is sort-of gatewayed to the
questions@lists.ntp.org mailing list, but the gatewaying is pretty broken -
posts to the mailing list do not appear at all in the newsgroup, and posts
to the newsgroup only appear on the mailing list after manual approval by a
moderator (at least unless you are subscribed to the mailing list). Thus I
believe most participants use the newsgroup.

> (The 1980s me would be astounded to hear that future-me
> would have any reluctance to get involved in usenet.)

Well, there isn't much value there anymore, but there are some groups that
refuse to die.:-)

--Per


