Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 30 Jul 2016 20:54:00 +0200
From:      Michal Meloun <mmel@FreeBSD.org>
To:        Nathan Whitehorn <nwhitehorn@freebsd.org>
Cc:        "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>, Svatopluk Kraus <skra@freebsd.org>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: INTRNG (Was: svn commit: r301453....)
Message-ID:  <579CF7C8.1040302@FreeBSD.org>
In-Reply-To: <460fa0b3-ddb7-6247-2412-3d75a589d5e7@freebsd.org>
References:  <201606051620.u55GKD5S066398@repo.freebsd.org> <5794720F.4050303@FreeBSD.org> <8bfd8668-bc49-e109-e610-b5cd470be3ec@freebsd.org> <57950005.6070403@FreeBSD.org> <f82018ee-51e7-60fa-2682-f0ef307a52b5@freebsd.org> <57961549.4020105@FreeBSD.org> <e2cace17-0924-2084-5fcf-626f87e41cc3@freebsd.org> <CANCZdfr%2BZ4XxXRY0yMiWXwp=8iKq54y3uJ9-OfAOdfxAs1qdtw@mail.gmail.com> <f94bfd25-fabf-efc3-55c9-cfdfd9e4d6e6@freebsd.org> <CANCZdfpz=z3gc3pyb_Qssa3vGJSnPv_r6J-SWDPPpE9zPYB9=w@mail.gmail.com> <ab44ddb1-515b-94ac-6b12-673b7c53d658@freebsd.org> <57976867.6080705@FreeBSD.org> <f2edac8f-2859-cd98-754e-881e2b2d1e63@freebsd.org> <5798E104.5020104@FreeBSD.org> <a5d43044-1733-6cc7-2e99-e85b60b0fcf3@freebsd.org> <579A25BB.8070206@FreeBSD.org> <30790e40-58b4-3371-c0f0-b7545571f389@freebsd.org> <579AFFC5.1040005@FreeBSD.org> <eb603349-eb88-866d-7a26-9e026518fd39@freebsd.org> <579CD355.1050203@FreeBSD.org> <460fa0b3-ddb7-6247-2412-3d75a589d5e7@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Dne 30.07.2016 v 18:59 Nathan Whitehorn napsal(a):
>
>
> On 07/30/16 09:18, Michal Meloun wrote:
>> Dne 29.07.2016 v 16:41 Nathan Whitehorn napsal(a):
>> [snip]
>> Svata is online now, so i think that he is more competent for responding
>> to all questions about INTRNG internals.
>>
>>
>>>>>> But, as you wrote,  INTRNG code is "absolutely free to map and
>>>>>> dispatch
>>>>>> multiple virtual IRQs from the same shared interrupt pin" (allow
>>>>>> me to
>>>>>> expand this also to INTRNG code itself) .
>>>>>> In attempt to simply  situation abound cache, in first step, we
>>>>>> removed
>>>>>> duplicates check from above model.  This also removes unpleasantly
>>>>>> sharing of entries.
>>>>>> This step converts cache to some sort of duplicate of resource list
>>>>>> entries. Each existing RLE have exactly one corresponding entry in
>>>>>> cache.
>>>>>> Also,  the cache is not needed for INTRNG core. At this point, its
>>>>>> only
>>>>>> transfers data from ofw_bus_intr_to_rl() to consumers PIC.
>>>>>>
>>>>>> So, if we are able to attach 'key' to RLE,  then we can remove the
>>>>>> whole
>>>>>> cache machinery from INTRNG.
>>>>>> Do you agree?
>>>>> No. Because you have to support two things that this still can't
>>>>> allow:
>>>>> 1. Devices attaching before their interrupt controllers.
>>>> You still expect that data readed from OFW  must be parsed at RLE
>>>> creation time. I don't see single reason why we cannot postpone
>>>> parsing to bus_alloc_resource() time.
>>>>    At this point, its irrelevant if  bus_alloc_resource()  supports
>>>> detached PIC or not.
>>> Because there are *lots* of cases in the kernel where interrupts are
>>> represented as a number and not a struct resource *. I've listed many
>>> of them; I'm sure there are more. PCI is the most notable example
>>> (both LSI and MSI). For both of these, the route_interrupt (LSI) and
>>> alloc_msi[x]() (MSI) functions give the bus driver a chance to look at
>>> the device tree and then they return an IRQ number. There doesn't seem
>>> to be any way to map that back to a struct resource * or something
>>> besides an internal table.
>>>
>>> Moreover, there are cases (self-added interrupts by children, for
>>> instance) in which the RL assigned by the parent does not reflect the
>>> final RL used by the child and there is no way for the parent bus to
>>> map an RID to a device tree entry.
>>>
>>> When we moved to doing this at interrupt assignment time rather than
>>> resource allocation time or another occasion on PowerPC, it was for
>>> these kinds of reasons. There were ways to do it later, but they were
>>> hugely invasive and involved changing large parts of the kernel. Even
>>> then, it would still be fragile: Trying to guess what device tree
>>> entry was meant at bus_alloc_resource() time is much more error-prone
>>> than doing the mapping when reading that device tree to assign the
>>> resources in the first place, when you can make guarantees.
>>>
>>>>> 2. Interrupts encoded entirely in a 32-bit integer rather than a
>>>>> struct resource * in a shared rman. This is required to support PCI
>>>>> (both LSI and MSI).
>>>>>
>>>>> #2 could of course be fixed by completely retooling the API of the
>>>>> PCI
>>>>> bus code, but, since it fixes something that currently is not a
>>>>> problem, why would we do that?
>>>> All relevant bus functions have struct resource * as an argument. And,
>>>> in post r301453 kernel, all necessary data are attached to it.
>>>> I don't see any reason for API change.
>>> There is:
>>> int PCIB_ROUTE_INTERRUPT(device_t, device_t, pin)
>>>
>>> Which takes a PIN (A, B, C, D) and returns the IRQ to add to the RL.
>>> This is the only real opportunity to parse an interrupt-map and does
>>> not deal with struct resource.
>>>
>>> There are also PCIB_ALLOC_MSI[X], which take a device_t and count and
>>> return a vector of IRQ numbers.
>>>
>>> How would you do resource allocation given these APIs? There seem to
>>> be some dubious-looking attempts at generic MSI mapping code in the
>>> new kern/subr_intr.c that don't allow the specification of more than
>>> 32-bit interrupt specifiers and otherwise exactly duplicate the
>>> pre-301453 flow for MSIs, but in a more complicated and
>>> less-functional way.
>>>
>>> (While investigating this, I also just found
>>> dev/pci/pci_host_generic.c, which mostly does the wrong things, is
>>> actually specific to a couple of ARM boards, and duplicates code in
>>> dev/ofw/ofwpci.c. But that's another discussion...)
>> [snip][2]
>>>>>>>> Each OFW device has a lot of dependencies. It must enable  clocks,
>>>>>>>> power
>>>>>>>> regulators..., it must be taken from reset. All these action
>>>>>>>> must be
>>>>>>>> done before we can access single register.
>>>>>>>> Most of these action can be done  only with attached
>>>>>>>> clock/power/reset
>>>>>>>> controller.
>>>>>>>> Within this, interrupts are not special.
>>>>>>> They are actually quite special in that the kernel enables them
>>>>>>> late
>>>>>>> and so you can defer setup. They are also special in that, for that
>>>>>>> reason, it is possible to design systems with circular dependency
>>>>>>> graphs with interrupts. It is not possible to do this with, for
>>>>>>> example, clocks: if I need to apply a clock to the clock controller
>>>>>>> with itself, the system is just not bootable and such a system will
>>>>>>> not be built or shipped. Interrupts need only happen later, after
>>>>>>> device setup, and so such systems *are* designed and shipped. The
>>>>>>> same
>>>>>>> is true for power.
>>>>>> The G5 problem is standard  'cross type' circular dependency.  You
>>>>>> must
>>>>>> (at BUS_PASS_BUS)  initialize memory address window of PCI bridge
>>>>>> and
>>>>>> start bus enumeration.
>>>>>> Rest of PCI bridge initialization (including all IRQ stuff) must be
>>>>>> postponed to any later pass. Of course, all interrupt controllers
>>>>>> must
>>>>>> be attached at BUS_PASS_INTERRUPT.
>>>>>> Or I missed something?
>>>>> Yes. As I said, you could solve it that way. It would, of course,
>>>>> require a massive retooling of newbus (to allow partial delayed
>>>>> attach), everything in dev/pci (which doesn't expect that), and
>>>>> (almost) every PCI driver in and out of the tree.
>>>> I still think that only host to PCI bridge must me modified, and
>>>> PCI-PCI
>>>> bridge must be run at  BUS_PASS_BUS (which is single line
>>>> modification).
>>> On Powermacs, one PIC is 5 levels deep in the bus hierarchy, under 3
>>> PCI<->PCI bridges and a device that also has an ATA controller and a
>>> bunch of other things.
>> Yes, I see. And, as a said before,  it's pretty common situation in OFW
>> based world.
>> Imho, all whats is needed is:
>> https://github.com/strejda/freebsd/commit/96ab64e04dcebb131643c2ae1cf373194d23e8e5
>>
>> (of course, it's not tested, I havn't access to powermac HW)
>
> That would probably work on at least some of the hardware. Some of
> those parents can have their own interrupts, however, which would
> break. I'll investigate how often this happens.
>
>> It's hard for me to understand why you say that this change requires
>> "massive retooling of newbus,  everything in dev/pci,  every PCI
>> driver).
>>
>>> Some of those can have their own interrupts (which is fairly common
>>> for error reporting or PCI-E hotplug). It wouldn't be impossible, but
>>> it is decidedly nontrivial.
>> That's true.  Any 'bus' driver, in multipass environment, *must* export
>> provided resources,  make bus accessible and enumerate it at
>> 'BUS_PASS_BUS', and
>> consume resources at some later bus pass.
>> At this time, only pcie-hotplug  needs this change. But I don't see why
>> a few lines change is "decidedly nontrivial".
>
> It's not just a few lines change. Newbus provides no mechanism for a
> bus to attach at two different bus passes.
??
Please see:
https://github.com/strejda/tegra/blob/master/sys/arm/nvidia/drm2/tegra_host1x.c#L639

and
https://github.com/strejda/tegra/blob/master/sys/arm/nvidia/drm2/tegra_host1x.c#L451


>
>>>>> Or: you could skip all of that and use the fully functional mechanism
>>>>> that already exists.
>>>> The problem is, at least from my point of view, that we don't have it.
>>>> The actual MIPS code doesn't work on ARM for numerous reasons. 
>>>> This is
>>>> first one ->
>>>>  
>>>> https://svnweb.freebsd.org/base/head/sys/powerpc/powerpc/nexus.c?revision=297000&view=markup#l186
>>>> The pre-r301453  ARM INTRNG doesn't support detached interrupts.
>>>> And r301453  doesn't changed nothing about this.
>>> I'm happy to write whatever code is missing. The ARM implementation of
>>> ofw_bus_map_intr() does seem fairly braindead and should be replaced.
>>>
>>> So here is where I am right now:
>>> - The perceived advantage of the new API seems to be that the mapping
>>> information (interrupt parent, etc.) ends up in a struct resource
>>> instead of in a centralized mapping table
>> True. And, in optimal case,  also in RLE. See [1].
>
> Your code also has a centralized table (static struct intr_irqsrc
> *irq_sources[NIRQ];), basically the same one, so I'm actually really
> confused on this point. The only actual difference seems to be that
> the firmware interrupt descriptor is stored in the RLE at
> bus_alloc_resource() time instead of in the table at bus enumeration
> time and that the new code requires some extra bus methods.
The irq_sources holds only real interrupts (all interrupts pin from all
interrupts controllers).  Your effort is move all 'interrupt description
data' out of INTRNG code -
to RLE (at bus enumeration time) (see [1])  and to struct resource (at 
bus_alloc_resource() time). The new bus method is only temporary
shortcut to bypass 11 release.

But, to be fair, I was hoping it might be necessary.

[1] This is last missing step in whole patch series and, in optimal
case, it need new filed(s) in RLE structure. So we chose to postpone
discussion about this to time after 11 release.

>
>>> - Additionally, it gives you a better shot at having the PIC online
>>> before the PIC's interrupts are parsed (which is not required, but
>>> nice).
>> Nope,  we simply fount that detached pics is very dangerous and not
>> needed. But, if you love it, it can be added as MD extension.
>
> Dangerous how? I don't "love it"; it's an unfortunate necessity.
Imho, some drivers may rely on correct return code from
bus_alloc_resource() or from bus_setup_intr().
For example to detect if interrupt exist, or if given interrupt
configuration is supported.

>
>>> - Beyond these aesthetic points, there are no concrete examples of new
>>> functionality added by this API, aside from some minor implementation
>>> bugs of the old one on ARM that are easily fixed.
>> Really? Or you just ignore it?
>
> Which ones? I've been asking for a week and a half now and you have
> responded with some hypothetical examples, all of which either (a) do
> not seem to occur in the real world and (b) were trivially
> implementable with the existing code.
>
>>
>>> - There are, conversely, a number of concrete cases where this new API
>>> would not be able to do the right thing. Some of these can be worked
>>> around or fixed with significant restructuring in the MI parts of the
>>> kernel.
>> And i offer you a fix, without "significant restructuring in the MI
>> parts of the kernel". Unfortunately, you did not want to accept anything
>> other than the old API.
>
> For one of them, partially. For the others, not so much. For example,
> I have no idea how to implement PCI interrupts (PCIB_ROUTE_INTERRUPT)
> in this framework. I am perfectly happy to accept a new API. What I am
> not perfectly happy to accept is an API that makes it impossible for
> me to support the platforms that I need to support and that,
> simultaneously, doesn't even provide any new capabilities on other
> platforms.
>From one of previous mail:
<quote>
But again, all what I want in this area is (for me) simple change of
your ofw_bus_map_intr() method to something like:

enum intr_map_data_type {
   INTR_MAP_DATA_OFW,
   INTR_MAP_DATA_GPIO,
   ...
};

int
bus_map_intr(device_t dev,  enum intr_map_data_type type,  uintptr_t
pic_id,  void *config_data, int config_size)
</quote>
> The current API is certainly a bit of a hack and I would be pleased to
> see it go; but the replacement needs to be more functional and more
> robust. As far as I can tell, I literally *cannot* move PowerPC over
> to this architecture.
Yes, at this time, I agree. If you can accept enhancement of
(owf_)bus_map_intr() then we can move to next, significantly less
hackish, state.
 
> The ideal API would be one in which the resource enumeration code
> could assign, instead of numbers, some structured information that
> contains the full firmware specifier (string parent + interrupt pin
> for ACPI, interrupt parent phandle + specifier for device tree, bare
> numbers for simple systems). 
Yes, exactly. I am happy that at least we agree on something. The only
difference is that we want triplet <OFW, interrupt parent phandle,
specifier> or <ACPI, string parent, interrupt pin>

> That seemed, and seems, too invasive. I think we agree on this and
> have chosen to approximate that API in two different ways.
I still think that we found way how we can realize the above idea in
non-invasive way. 
>From my point of view the solution is:
Put/attach the above triplet to RLE and then copy it (within
bus_resource_attach()  to struct resource.
But yes, I understand that I am not able to explain clearly enough.

>
> When we designed the interrupt mapping code, we evaluated a large
> number of possible approaches. Something very much like this was one
> of them. It was rejected as being too fragile (it requires resource
> allocation and enumeration to be coupled) and to have too many corner
> cases (the PCI MSI and LSI APIs, for instance). The interrupt mapping
> stuff seemed at the time, and so far still seems, like the least-bad
> approximation to the ideal case: it is essentially that ideal API, in
> that it happens at bus enumeration and involves just assigning the
> firmware specifiers, but with lookup keys to that information stuffed
> into the 32-bit numbers that the rest of the system uses.
The raw PCI (or MSIX)  case is relatively simple. The PCI domain uses
raw numbers and it's host-to-pci bridge job to translate raw numbers
from PCI domain to (for example) OFW based resource.
At this point, I'm still not sure about MSI...

Michal

>>> - If we have both, the interrupt handling code in the MI parts of the
>>> kernel will bifurcate. This patch alone has added a bunch of #ifdef to
>>> kern/subr_bus.c, a new file to kern/, and lots of #ifdef to dev/ofw.
>>> This is going to be really hard to maintain if we need both.
>> All those #ifdefs  has been added as safeguards for 11/stable and they
>> can be removed immediately.
>
> They should not be and I will request immediate reversion and appeal
> to core@ if they are before this discussion is complete.
> -Nathan
>
>> Michal
>>
>>> Is this all correct?
>>>
>>> If so, can we please back this out until this discussion is complete?
>>> I'm asking this formally at this point, under the Committer's Guide
>>> section about reversion requests.
>>> -Nathan
>> _______________________________________________
>> freebsd-arch@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>>
>
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?579CF7C8.1040302>