Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 19 Sep 2020 02:31:03 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Hans Petter Selasky <hps@selasky.org>
Cc:        freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: Comment #135 for bugzilla 237666 : a USB3-handling problem with a investigatory fix for a cortex-a72 context
Message-ID:  <F9E8A6A0-619B-4C6F-8485-032B8358B780@yahoo.com>
In-Reply-To: <866bd652-5a8b-7500-b1e0-7528c035048b@selasky.org>
References:  <6E618C3D-12DF-429E-A249-5BAB90FC6B15.ref@yahoo.com> <6E618C3D-12DF-429E-A249-5BAB90FC6B15@yahoo.com> <866bd652-5a8b-7500-b1e0-7528c035048b@selasky.org>

next in thread | previous in thread | raw e-mail | index | archive | help


On 2020-Sep-19, at 01:08, Hans Petter Selasky <hps at selasky.org> =
wrote:

> On 2020-09-19 05:41, Mark Millard via freebsd-arm wrote:
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D237666 is about =
USB3
>> problems that folks have been having for over a year. Bjoern A. Zeeb
>> comment #125 that looked like something I'd seen on a cortex-a72 when
>> I tried a kernel that had been built with -mcpu=3Dcortex-a72 : an
>> indefinite looping that involved "Resetting controller" for xhci0.
>> (This prevented booting from the USB3 device.)
>> Well, I got the A72 to boot and comment #135 indicates how but I'll
>> paste the information below. All that was involved was adding
>> examples of "dsb st" and one "dsb ld".
>> Comment 135's content:
>> A cortex-a72 success! (In the form of investigatory code, not
>> code to check-in as is.)
>> Just by adding some "dsb st" commands and a "dsb ld" command in
>> places in the xhci code (just for __arach64__), I've gotten the
>> cortext-A72 to boot from the USB3 SSD via a -mcpu=3Dcortex-a72
>> based kernel build. No more indefinitely repeating "Resetting
>> controller" problem.
>> I do not claim the additions are the minimal ones. I know one place
>> is required for sure: the last one added enabled the boot (given
>> the others were already present). Prior to that I still had the
>> indefinitely repeating "Resetting controller" problem.
>> There could be more places that should have dsb st or dsb ld for
>> overall correctness.
>> This evidence may suggest somewhat analogous problems for other
>> platforms than aarch64 that are seeing problems.
>> For the aarch64 context, the evidence is (the changes are)
>> as follows. The first "dsb st" textually is the last one I
>> added.
>> # svnlite diff /usr/src/sys/dev/usb/controller/xhci.c
>> Index: /usr/src/sys/dev/usb/controller/xhci.c
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> --- /usr/src/sys/dev/usb/controller/xhci.c	(revision 363590)
>> +++ /usr/src/sys/dev/usb/controller/xhci.c	(working copy)
>> @@ -431,6 +431,9 @@
>>    	phwr->hwr_ring_seg[0].qwEvrsTablePtr =3D htole64(addr);
>>  	phwr->hwr_ring_seg[0].dwEvrsTableSize =3D =
htole32(XHCI_MAX_EVENTS);
>> +#ifdef __aarch64__
>> +__asm __volatile("dsb st" : : : "memory");
>> +#endif
>>    	DPRINTF("ERDP(0)=3D0x%016llx\n", (unsigned long long)addr);
>>  @@ -1098,6 +1101,9 @@
>>    	while (1) {
>>  +#ifdef __aarch64__
>> +__asm __volatile("dsb ld" : : : "memory");
>> +#endif
>>  		temp =3D le32toh(phwr->hwr_events[i].dwTrb3);
>>    		k =3D (temp & XHCI_TRB_3_CYCLE_BIT) ? 1 : 0;
>> @@ -1107,6 +1113,9 @@
>>    		event =3D XHCI_TRB_3_TYPE_GET(temp);
>>  +#ifdef __aarch64__
>> +__asm __volatile("dsb ld" : : : "memory");
>> +#endif
>>  		DPRINTFN(10, "event[%u] =3D %u (0x%016llx 0x%08lx =
0x%08lx)\n",
>>  		    i, event, (long =
long)le64toh(phwr->hwr_events[i].qwTrb0),
>>  		    (long)le32toh(phwr->hwr_events[i].dwTrb2),
>> @@ -1239,6 +1248,9 @@
>>  	sc->sc_command_idx =3D i;
>>  	sc->sc_command_ccs =3D j;
>>  +#ifdef __aarch64__
>> +__asm __volatile("dsb st" : : : "memory");
>> +#endif
>>  	XWRITE4(sc, door, XHCI_DOORBELL(0), 0);
>>    	err =3D cv_timedwait(&sc->sc_cmd_cv, &sc->sc_bus.bus_mtx,
>> @@ -2855,6 +2867,9 @@
>>  	index =3D xfer->xroot->udev->controller_slot_id;
>>    	if (xfer->xroot->udev->flags.self_suspended =3D=3D 0) {
>> +#ifdef __aarch64__
>> +__asm __volatile("dsb st" : : : "memory");
>> +#endif
>>  		XWRITE4(sc, door, XHCI_DOORBELL(index),
>>  		    epno | XHCI_DB_SID_SET(xfer->stream_id));
>>  	}
>> @@ -4180,6 +4195,9 @@
>>    	for (n =3D 1; n !=3D XHCI_MAX_ENDPOINTS; n++) {
>>  		for (p =3D 0; p !=3D XHCI_MAX_STREAMS; p++) {
>> +#ifdef __aarch64__
>> +__asm __volatile("dsb st" : : : "memory");
>> +#endif
>>  			XWRITE4(sc, door, XHCI_DOORBELL(index),
>>  			    n | XHCI_DB_SID_SET(p));
>>  		}
>> I'm clearly just "evidence hacking" here. I've no clue what a
>> good design for allowing aarch64 build to supply having any of
>> those "dsb st"s or the "dsb ld".
>> Hopefully someone that knows what they are doing can figure out
>> if any of the other examples on other platforms are analogous
>> --and ,if they are, provide some hook for platform to contribute
>> such code to the xhci code.
>=20
> Hi Mark,
>=20
> Your finding indicate a problem in usb_pc_cpu_flush() and
>=20
> bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE);
>=20
> Try to put the dsb only after dmamap_sync.
>=20
> void
> usb_pc_cpu_flush(struct usb_page_cache *pc)
> {
>        if (pc->page_offset_end =3D=3D pc->page_offset_buf) {
>                /* nothing has been loaded into this page cache! */
>                return;
>        }
>        bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE);
> }
>=20
> The PCI I/O memory should be coherent and should not need any DSB's.

[I'll note that historically I've gotten one "Resetting Controller"
notice from the -mcpu=3Dcortex-a53 kernel during the sequence leading
to the root file system mount. I get none from the -mcpu=3Dcortex-a72
kernel after the changes that I reported.]

After adding a couple of more DSB ST instructions and rebuilding
(cross build), installing, and booting, I started a self-hosted,
from-scratch, buildworld buildlkernel and intend to installkernel
installworld and reboot if it appears to go well. (Better than
just a boot test for if things seem at least sufficient, even if
overkill.) But it will be hours before buildworld build kernel is
done (-j3). (I'll sleep during much of that time.)

I'll get to experiments you suggest after that. I assume that
you want the two (not one) dsb ld instructions removed as well.

I'm happy to do your experiments, despite the following note
about what I expect the result will be for the first one.

I do not expect your experiment to work in one place because
the addition that started things working does not involve a
usb_pc_cpu_flush for the event ring initialization that the
specific change was for:

        phwr =3D buf_res.buffer;
        addr =3D buf_res.physaddr;
        addr +=3D (uintptr_t)&((struct xhci_hw_root *)0)->hwr_events[0];
       =20
        /* reset hardware root structure */
        memset(phwr, 0, sizeof(*phwr));
       =20
        phwr->hwr_ring_seg[0].qwEvrsTablePtr =3D htole64(addr);
        phwr->hwr_ring_seg[0].dwEvrsTableSize =3D =
htole32(XHCI_MAX_EVENTS);
       =20
        DPRINTF("ERDP(0)=3D0x%016llx\n", (unsigned long long)addr);
           =20
#ifdef __aarch64__
__asm __volatile("dsb st" : : : "memory");
#endif
        XWRITE4(sc, runt, XHCI_ERDP_LO(0), (uint32_t)addr);
        XWRITE4(sc, runt, XHCI_ERDP_HI(0), (uint32_t)(addr >> 32));
       =20
        addr =3D buf_res.physaddr;
       =20
        DPRINTF("ERSTBA(0)=3D0x%016llx\n", (unsigned long long)addr);

        XWRITE4(sc, runt, XHCI_ERSTBA_LO(0), (uint32_t)addr);
        XWRITE4(sc, runt, XHCI_ERSTBA_HI(0), (uint32_t)(addr >> 32));

My understanding is that those last two lines start the event ring
and without the DSB ST above the:

        phwr->hwr_ring_seg[0].qwEvrsTablePtr =3D htole64(addr);
        phwr->hwr_ring_seg[0].dwEvrsTableSize =3D =
htole32(XHCI_MAX_EVENTS);

result was not observed by xhci but needs to be observed at that ring
start time.

Without the DSB ST above, xhci_interrupt_poll's code (with one
of my additions shown):

#ifdef __aarch64__
__asm __volatile("dsb ld" : : : "memory");
#endif
                temp =3D le32toh(phwr->hwr_events[i].dwTrb3);
       =20
                k =3D (temp & XHCI_TRB_3_CYCLE_BIT) ? 1 : 0;
       =20
                if (j !=3D k)
                        break;

always took the break because temp was always zero, k was always zero,
and j was 1. I expect this is because the wrong qwEvrsTablePtr and
dwEvrsTableSize content was in use by the xhci (fixed by the DSB ST).
(I had a temporarily version that printed the values. With and without
the DSB LD made no difference without the DSB ST that was later added.)

=46rom what I understand, the cortex-a72 has far more structure that
involves the "DSB completes when" behavior being needed compared
to the cortex-a53 and code tuned to the A72 is more likely to end
up needing such. (Speculation and speculative-driven cache contents,
out of order execution, and probably more.)

QUOTE
	=E2=80=A2 all explicit memory accesses that are observed by Pe =
before the DSB is executed, are of the required access types, and are =
from observers in the same required shareability domain as Pe, are =
complete for the set of observers in the required shareability domain
	=E2=80=A2 all cache, branch predictor, and TLB maintenance =
operations issued by Pe before the DSB are complete for the required =
shareability domain.
END QUOTE

(Pe means "executing processor".)

I expect the removal of that specific DSB ST to result in
xhci_interrupt_poll's code again taking the break every time,
just based on event ring behavior alone.


I'll note that the RPi4B SBC has a not-physically-exposed PCIe
that the USB3 is off of, and the UEFI/ACPI that I have in use for
the RPi4B does not expose the PICe to the loader or OS at all.
ACPI does expose the USB3 related material.


=3D=3D=3D
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?F9E8A6A0-619B-4C6F-8485-032B8358B780>