Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 19 Sep 2020 13:46:18 +0200
From:      Hans Petter Selasky <hps@selasky.org>
To:        Mark Millard <marklmi@yahoo.com>
Cc:        freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: Comment #135 for bugzilla 237666 : a USB3-handling problem with a investigatory fix for a cortex-a72 context
Message-ID:  <578faf26-6042-91ec-c639-2bca17105fae@selasky.org>
In-Reply-To: <F9E8A6A0-619B-4C6F-8485-032B8358B780@yahoo.com>
References:  <6E618C3D-12DF-429E-A249-5BAB90FC6B15.ref@yahoo.com> <6E618C3D-12DF-429E-A249-5BAB90FC6B15@yahoo.com> <866bd652-5a8b-7500-b1e0-7528c035048b@selasky.org> <F9E8A6A0-619B-4C6F-8485-032B8358B780@yahoo.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On 2020-09-19 11:31, Mark Millard wrote:
> 
> 
> On 2020-Sep-19, at 01:08, Hans Petter Selasky <hps at selasky.org> wrote:
> 
>> On 2020-09-19 05:41, Mark Millard via freebsd-arm wrote:
>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237666 is about USB3
>>> problems that folks have been having for over a year. Bjoern A. Zeeb
>>> comment #125 that looked like something I'd seen on a cortex-a72 when
>>> I tried a kernel that had been built with -mcpu=cortex-a72 : an
>>> indefinite looping that involved "Resetting controller" for xhci0.
>>> (This prevented booting from the USB3 device.)
>>> Well, I got the A72 to boot and comment #135 indicates how but I'll
>>> paste the information below. All that was involved was adding
>>> examples of "dsb st" and one "dsb ld".
>>> Comment 135's content:
>>> A cortex-a72 success! (In the form of investigatory code, not
>>> code to check-in as is.)
>>> Just by adding some "dsb st" commands and a "dsb ld" command in
>>> places in the xhci code (just for __arach64__), I've gotten the
>>> cortext-A72 to boot from the USB3 SSD via a -mcpu=cortex-a72
>>> based kernel build. No more indefinitely repeating "Resetting
>>> controller" problem.
>>> I do not claim the additions are the minimal ones. I know one place
>>> is required for sure: the last one added enabled the boot (given
>>> the others were already present). Prior to that I still had the
>>> indefinitely repeating "Resetting controller" problem.
>>> There could be more places that should have dsb st or dsb ld for
>>> overall correctness.
>>> This evidence may suggest somewhat analogous problems for other
>>> platforms than aarch64 that are seeing problems.
>>> For the aarch64 context, the evidence is (the changes are)
>>> as follows. The first "dsb st" textually is the last one I
>>> added.
>>> # svnlite diff /usr/src/sys/dev/usb/controller/xhci.c
>>> Index: /usr/src/sys/dev/usb/controller/xhci.c
>>> ===================================================================
>>> --- /usr/src/sys/dev/usb/controller/xhci.c	(revision 363590)
>>> +++ /usr/src/sys/dev/usb/controller/xhci.c	(working copy)
>>> @@ -431,6 +431,9 @@
>>>     	phwr->hwr_ring_seg[0].qwEvrsTablePtr = htole64(addr);
>>>   	phwr->hwr_ring_seg[0].dwEvrsTableSize = htole32(XHCI_MAX_EVENTS);
>>> +#ifdef __aarch64__
>>> +__asm __volatile("dsb st" : : : "memory");
>>> +#endif
>>>     	DPRINTF("ERDP(0)=0x%016llx\n", (unsigned long long)addr);
>>>   @@ -1098,6 +1101,9 @@
>>>     	while (1) {
>>>   +#ifdef __aarch64__
>>> +__asm __volatile("dsb ld" : : : "memory");
>>> +#endif
>>>   		temp = le32toh(phwr->hwr_events[i].dwTrb3);
>>>     		k = (temp & XHCI_TRB_3_CYCLE_BIT) ? 1 : 0;
>>> @@ -1107,6 +1113,9 @@
>>>     		event = XHCI_TRB_3_TYPE_GET(temp);
>>>   +#ifdef __aarch64__
>>> +__asm __volatile("dsb ld" : : : "memory");
>>> +#endif
>>>   		DPRINTFN(10, "event[%u] = %u (0x%016llx 0x%08lx 0x%08lx)\n",
>>>   		    i, event, (long long)le64toh(phwr->hwr_events[i].qwTrb0),
>>>   		    (long)le32toh(phwr->hwr_events[i].dwTrb2),
>>> @@ -1239,6 +1248,9 @@
>>>   	sc->sc_command_idx = i;
>>>   	sc->sc_command_ccs = j;
>>>   +#ifdef __aarch64__
>>> +__asm __volatile("dsb st" : : : "memory");
>>> +#endif
>>>   	XWRITE4(sc, door, XHCI_DOORBELL(0), 0);
>>>     	err = cv_timedwait(&sc->sc_cmd_cv, &sc->sc_bus.bus_mtx,
>>> @@ -2855,6 +2867,9 @@
>>>   	index = xfer->xroot->udev->controller_slot_id;
>>>     	if (xfer->xroot->udev->flags.self_suspended == 0) {
>>> +#ifdef __aarch64__
>>> +__asm __volatile("dsb st" : : : "memory");
>>> +#endif
>>>   		XWRITE4(sc, door, XHCI_DOORBELL(index),
>>>   		    epno | XHCI_DB_SID_SET(xfer->stream_id));
>>>   	}
>>> @@ -4180,6 +4195,9 @@
>>>     	for (n = 1; n != XHCI_MAX_ENDPOINTS; n++) {
>>>   		for (p = 0; p != XHCI_MAX_STREAMS; p++) {
>>> +#ifdef __aarch64__
>>> +__asm __volatile("dsb st" : : : "memory");
>>> +#endif
>>>   			XWRITE4(sc, door, XHCI_DOORBELL(index),
>>>   			    n | XHCI_DB_SID_SET(p));
>>>   		}
>>> I'm clearly just "evidence hacking" here. I've no clue what a
>>> good design for allowing aarch64 build to supply having any of
>>> those "dsb st"s or the "dsb ld".
>>> Hopefully someone that knows what they are doing can figure out
>>> if any of the other examples on other platforms are analogous
>>> --and ,if they are, provide some hook for platform to contribute
>>> such code to the xhci code.
>>
>> Hi Mark,
>>
>> Your finding indicate a problem in usb_pc_cpu_flush() and
>>
>> bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE);
>>
>> Try to put the dsb only after dmamap_sync.
>>
>> void
>> usb_pc_cpu_flush(struct usb_page_cache *pc)
>> {
>>         if (pc->page_offset_end == pc->page_offset_buf) {
>>                 /* nothing has been loaded into this page cache! */
>>                 return;
>>         }
>>         bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE);
>> }
>>
>> The PCI I/O memory should be coherent and should not need any DSB's.
> 
> [I'll note that historically I've gotten one "Resetting Controller"
> notice from the -mcpu=cortex-a53 kernel during the sequence leading
> to the root file system mount. I get none from the -mcpu=cortex-a72
> kernel after the changes that I reported.]
> 
> After adding a couple of more DSB ST instructions and rebuilding
> (cross build), installing, and booting, I started a self-hosted,
> from-scratch, buildworld buildlkernel and intend to installkernel
> installworld and reboot if it appears to go well. (Better than
> just a boot test for if things seem at least sufficient, even if
> overkill.) But it will be hours before buildworld build kernel is
> done (-j3). (I'll sleep during much of that time.)
> 
> I'll get to experiments you suggest after that. I assume that
> you want the two (not one) dsb ld instructions removed as well.
> 
> I'm happy to do your experiments, despite the following note
> about what I expect the result will be for the first one.
> 
> I do not expect your experiment to work in one place because
> the addition that started things working does not involve a
> usb_pc_cpu_flush for the event ring initialization that the
> specific change was for:
> 
>          phwr = buf_res.buffer;
>          addr = buf_res.physaddr;
>          addr += (uintptr_t)&((struct xhci_hw_root *)0)->hwr_events[0];
>          
>          /* reset hardware root structure */
>          memset(phwr, 0, sizeof(*phwr));
>          
>          phwr->hwr_ring_seg[0].qwEvrsTablePtr = htole64(addr);
>          phwr->hwr_ring_seg[0].dwEvrsTableSize = htole32(XHCI_MAX_EVENTS);
>          
>          DPRINTF("ERDP(0)=0x%016llx\n", (unsigned long long)addr);
>              
> #ifdef __aarch64__
> __asm __volatile("dsb st" : : : "memory");
> #endif

Hi Mark,

The values in ERDP_LO/HI are not used until the RUN bit is set.

>          XWRITE4(sc, runt, XHCI_ERDP_LO(0), (uint32_t)addr);
>          XWRITE4(sc, runt, XHCI_ERDP_HI(0), (uint32_t)(addr >> 32));
>          
>          addr = buf_res.physaddr;
>          
>          DPRINTF("ERSTBA(0)=0x%016llx\n", (unsigned long long)addr);
> 
>          XWRITE4(sc, runt, XHCI_ERSTBA_LO(0), (uint32_t)addr);
>          XWRITE4(sc, runt, XHCI_ERSTBA_HI(0), (uint32_t)(addr >> 32));
> 
> My understanding is that those last two lines start the event ring
> and without the DSB ST above the:

No they don't.

> 
>          phwr->hwr_ring_seg[0].qwEvrsTablePtr = htole64(addr);
>          phwr->hwr_ring_seg[0].dwEvrsTableSize = htole32(XHCI_MAX_EVENTS);
> 
> result was not observed by xhci but needs to be observed at that ring
> start time.
> 
> Without the DSB ST above, xhci_interrupt_poll's code (with one
> of my additions shown):
> 
> #ifdef __aarch64__
> __asm __volatile("dsb ld" : : : "memory");
> #endif

Probably same bug in the USB invalidate function.

Try to patch those generic flush/invalidate functions instead. I've been 
very careful about those.

>                  temp = le32toh(phwr->hwr_events[i].dwTrb3);
>          
>                  k = (temp & XHCI_TRB_3_CYCLE_BIT) ? 1 : 0;
>          
>                  if (j != k)
>                          break;
> 
> always took the break because temp was always zero, k was always zero,
> and j was 1. I expect this is because the wrong qwEvrsTablePtr and
> dwEvrsTableSize content was in use by the xhci (fixed by the DSB ST).
> (I had a temporarily version that printed the values. With and without
> the DSB LD made no difference without the DSB ST that was later added.)
> 
>  From what I understand, the cortex-a72 has far more structure that
> involves the "DSB completes when" behavior being needed compared
> to the cortex-a53 and code tuned to the A72 is more likely to end
> up needing such. (Speculation and speculative-driven cache contents,
> out of order execution, and probably more.)
> 
> QUOTE
> 	• all explicit memory accesses that are observed by Pe before the DSB is executed, are of the required access types, and are from observers in the same required shareability domain as Pe, are complete for the set of observers in the required shareability domain
> 	• all cache, branch predictor, and TLB maintenance operations issued by Pe before the DSB are complete for the required shareability domain.
> END QUOTE

It smells like a generic problem in the busdma synchronize code.

> 
> (Pe means "executing processor".)
> 
> I expect the removal of that specific DSB ST to result in
> xhci_interrupt_poll's code again taking the break every time,
> just based on event ring behavior alone.
> 

Again, try to patch both
usb_pc_cpu_flush()
and
usb_pc_cpu_invalidate()

> 
> I'll note that the RPi4B SBC has a not-physically-exposed PCIe
> that the USB3 is off of, and the UEFI/ACPI that I have in use for
> the RPi4B does not expose the PICe to the loader or OS at all.
> ACPI does expose the USB3 related material.

--HPS



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?578faf26-6042-91ec-c639-2bca17105fae>