Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 29 Jul 2016 09:03:33 +0200
From:      Michal Meloun <mmel@FreeBSD.org>
To:        Nathan Whitehorn <nwhitehorn@freebsd.org>
Cc:        "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>, Svatopluk Kraus <skra@freebsd.org>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: INTRNG (Was: svn commit: r301453....)
Message-ID:  <579AFFC5.1040005@FreeBSD.org>
In-Reply-To: <30790e40-58b4-3371-c0f0-b7545571f389@freebsd.org>
References:  <201606051620.u55GKD5S066398@repo.freebsd.org> <57907B0F.9070204@FreeBSD.org> <9d2a224c-b787-2875-5984-a7a2354e8695@freebsd.org> <57934ABD.6010807@FreeBSD.org> <4e7a3e8f-cc21-f5f2-e3e0-4dbd554a4cd0@freebsd.org> <5794720F.4050303@FreeBSD.org> <8bfd8668-bc49-e109-e610-b5cd470be3ec@freebsd.org> <57950005.6070403@FreeBSD.org> <f82018ee-51e7-60fa-2682-f0ef307a52b5@freebsd.org> <57961549.4020105@FreeBSD.org> <e2cace17-0924-2084-5fcf-626f87e41cc3@freebsd.org> <CANCZdfr%2BZ4XxXRY0yMiWXwp=8iKq54y3uJ9-OfAOdfxAs1qdtw@mail.gmail.com> <f94bfd25-fabf-efc3-55c9-cfdfd9e4d6e6@freebsd.org> <CANCZdfpz=z3gc3pyb_Qssa3vGJSnPv_r6J-SWDPPpE9zPYB9=w@mail.gmail.com> <ab44ddb1-515b-94ac-6b12-673b7c53d658@freebsd.org> <57976867.6080705@FreeBSD.org> <f2edac8f-2859-cd98-754e-881e2b2d1e63@freebsd.org> <5798E104.5020104@FreeBSD.org> <a5d43044-1733-6cc7-2e99-e85b60b0fcf3@freebsd.org> <579A25BB.8070206@FreeBSD.org> <30790e40-58b4-3371-c0f0-b7545571f389@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Dne 28.07.2016 v 19:23 Nathan Whitehorn napsal(a):
>
>
> On 07/28/16 08:33, Michal Meloun wrote:
>> Dne 27.07.2016 v 19:19 Nathan Whitehorn napsal(a):
>>>
>>> On 07/27/16 09:27, Michal Meloun wrote:
>>>
>>>
>>> [snip]
>>>>> The concept is *extremely* necessary, which is why both Linux and
>>>>> FreeBSD decided to use it independently There is no way to handle
>>>>> parent buses with a single rman and devices on multiple PICs with
>>>>> overlapping interrupt ranges without them; neither is there a way to
>>>>> decode arbitrary-length interrupt specifiers or to handle things like
>>>>> MSIs. Please see the list of cases at
>>>>> https://wiki.freebsd.org/Complicated_Interrupts and in my earlier
>>>>> email for some examples of things that you just can't represent with
>>>>> this new system, as far as I can tell.
>>>>>
>>>>>>      Also, both variants needs attached PIC at bus_alloc_resource()
>>>>>> time,
>>>>>> so timing wasn't been changed.
>>>>> They absolutely do not, as I have explained repeatedly. Due to parent
>>>>> devices with interrupts handled by their bus children, this is a hard
>>>>> requirement of any workable system. This is not a theoretical
>>>>> issue; I
>>>>> have lots of hardware like this.
>>>> Nathan,  I'm talking  about pre /post  r301453 state. Current
>>>> INTRNG it
>>>> has never supported. But yes,  I'm fully understand why you want
>>>> it. See
>>>> [2].
>>> Current "INTRNG", at least on PowerPC, has supported this for 10
>>> years. If it doesn't on ARM, it's because of some trivial
>>> implementation bug.
>>>>>> - it implements new BUS_MAP_INTR(). As I understand it,  this is
>>>>>> problematic for you, and I'm ready to change it. But I need more
>>>>>> details
>>>>>> than "it's fundamentally broken".
>>>>> Please explain (a) what cases it handles that the existing code and
>>>>> does, and (b) how you would resolve each of the cases on the wiki
>>>>> page
>>>>> I sent.
>>>>>
>>>>> The general issue is that it traces the newbus hierarchy, when
>>>>> interrupts often do not, and so breaks when you have links to
>>>>> as-yet-unattached parts of the hierarchy. It also relies on the
>>>>> assigned "IRQ" numbers being nonoverlapping to avoid rman errors, but
>>>>> that's a relatively minor thing.
>>>> Well, from my point of view,  only  "get interrupt data from
>>>> OFW/GPIO/.."  part of bus_alloc_resource() follows bus hierarchy, 
>>>> real
>>>> execution not changed.
>>>> In any case,  I see many possible variants how we can modify current
>>>> implementation. For rest, see [2].
>>> But that's an important part, for three reasons:
>>> 1. The PIC may not exist when bus_alloc_resource() is called
>>> 2. The bus parents have no useful information to contribute to this
>>> since the hierarchy doesn't go that way
>> See [2]
>>
>>> 3. If you try to use the interrupt pin as the resource ID, which
>>> r301453 does, you end up with rman conflicts if, for example, a bus
>>> has two children with interrupts on identically numbered pins on two
>>> different interrupt controllers.
>> No, r301453 doesn't do this.
>> Index to global interrupt source table (irq_sources) is used as resource
>> ID (and r301453 not changed it).
>
> Ah, I see that now. Apologies for the confusion.
>
> So, to understand fully: your major complaint about the existing code
> is that it uses arbitrary numbers reflecting some table lookup value
> as IRQ numbers, which r301453 *also* does?
No, my major complaint is that MIPS and pre-r301453 INTRNG mixes data
from independent sources in one table (irq_sources) . Which  r301453
eliminates.
All what I want is to have data associated to RLE separated from data
associated to already allocated (by bus_alloc_resouce()) interrupts.

>>>>>>   
>>>>>>> There are some other differences, of varying degrees of importance,
>>>>>>> but that's the really fundamental one. I haven't seen any cases
>>>>>>> where
>>>>>>> r301453 provides functionality absent in the already existing
>>>>>>> system,
>>>>>>> but there seem to be a large number of cases (see the first email I
>>>>>>> sent to freebsd-arch or
>>>>>>> https://wiki.freebsd.org/Complicated_Interrupts) that the API in
>>>>>>> r301453 cannot accommodate and that are needed to support a
>>>>>>> variety of
>>>>>>> hardware.
>>>>>> Which single feature has been removed by r301453?
>>>>> Nothing has been removed by it, because the normal code is intact.
>>>>> However, the new code is less functional than the old code, so cannot
>>>>> replace it. In particular, it is architecturally incapable of working
>>>>> with the kinds of device trees found on PowerPC systems (see the wiki
>>>>> page). That means we would have to keep both indefinitely, which is a
>>>>> significant maintenance burden to no gain whatsoever.
>>>>> -Nathan
>>>> Firstly,  please,  ignore dependency problem. It will be explained
>>>> later
>>>> in [2].
>>> Except that that isn't a workable solution (see the end)
>>>
>>>> [1]
>>>> The original (your) INTRNG assume several things:
>>>>
>>>> - value of interrupt parent xref  together with byte contents of
>>>> 'interrupts' property forms some sort of  'key' (aka virtual IRQ)
>>>> - byte contents of  'interrupts' property cannot be parsed in any way
>>>> inside INTRNG core.
>>>> - data forming this key are sufficient for subsequent mapping (and/or
>>>> configuring) to real IRQ.
>>>> - there must  exist bidirectional unique relation between virtual and
>>>> real IRQs.  So one key (virtual IRQ)  can be mappable to one and just
>>>> one
>>>>      real IRQ.  And only one virtual IRQ  can be mapped to any
>>>> given real
>>>> IRQ.
>>>> - we have cache that stores all observed 'keys' and associated real
>>>> IRQ
>>>> ) the they already mapped.
>>>>
>>>> Unfortunately,  necessity of unique mapping is very problematic.
>>>> We cannot handle situation, where shared interrupt is declared by
>>>> different but compatible 'interrupts' properties.
>>> There isn't actually a requirement of bidirectional uniqueness, as I
>>> explained the last time you brought this up. One "virtual" IRQ does
>>> need to map to exactly one interrupt specifier, but the reverse is not
>>> the case. The PIC driver is absolutely free to map and dispatch
>>> multiple virtual IRQs from the same shared interrupt pin, with no more
>>> overhead than you usually get from shared interrupt lines.
>> Oki, lets me to expand this a little bit more.
>>
>> The pre r301453 INTRNG[ARM] does this:
>> - ofw_bus_intr_to_rl() reads and pre-parses  'interrupts' property
>> into  'key'.
>
> Sometimes. For some buses. ofw_bus_intr_to_rl() is a function of very
> limited application that should really only be used by simplebus; how
> that works depends on the bus bindings. There are many cases (e.g.
> MSIs, PCI buses generally) where interrupts are set up by a different
> route.
Yes, I agree.
>> - cache is searched for duplicates and new entry is created.
>> - position of this entry in cache  is used as resource ID.
>
> Resource ID is arbitrary, but it could be this.
>
>> - given resource ID is stored to RL  and is later used in
>> bus_alloc_resource()
> Yes.
>> I'm right ?
>>
>> In the above model, individual cache entries may be shared are
>> referenced by multiple entities (PIC, RL, ..).  Due to this, life cycle
>> of cache entries is not well defined and we probably must use some sort
>> of reference counting to control it.
>
> Why? Interrupts are not unmapped and remapped with unique values often
> enough (or, really, at all) to warrant even thinking about life-cycle
> management. The maximum number is limited by the maximum of the number
> of interrupt specifiers in the device tree, which are all static values.
No, we must also support  attachable/removable  devices. GPIO board
attached to hot-pug capable interfaces(USB, ExpressCard,...) for example.
Due to this we must be able to attach/detach PIC at run-time.

>
>> But, as you wrote,  INTRNG code is "absolutely free to map and dispatch
>> multiple virtual IRQs from the same shared interrupt pin" (allow me to
>> expand this also to INTRNG code itself) .
>> In attempt to simply  situation abound cache, in first step, we removed
>> duplicates check from above model.  This also removes unpleasantly
>> sharing of entries.
>> This step converts cache to some sort of duplicate of resource list
>> entries. Each existing RLE have exactly one corresponding entry in
>> cache.
>> Also,  the cache is not needed for INTRNG core. At this point, its only
>> transfers data from ofw_bus_intr_to_rl() to consumers PIC.
>>
>> So, if we are able to attach 'key' to RLE,  then we can remove the whole
>> cache machinery from INTRNG.
>> Do you agree?
>
> No. Because you have to support two things that this still can't allow:
> 1. Devices attaching before their interrupt controllers.
You still expect that data readed from OFW  must be parsed at RLE
creation time. I don't see single reason why we cannot postpone
parsing to bus_alloc_resource() time.
 At this point, its irrelevant if  bus_alloc_resource()  supports
detached PIC or not.

> 2. Interrupts encoded entirely in a 32-bit integer rather than a
> struct resource * in a shared rman. This is required to support PCI
> (both LSI and MSI).
>
> #2 could of course be fixed by completely retooling the API of the PCI
> bus code, but, since it fixes something that currently is not a
> problem, why would we do that?
All relevant bus functions have struct resource * as an argument. And,
in post r301453 kernel, all necessary data are attached to it.
I don't see any reason for API change.
 
>
>>
>>>> For example:
>>>> -------------------------
>>>>        foo1: bar@12345678 {
>>>>            interrupt-parent = <&pic1>;
>>>>            interrupts = <1 IRQ_TYPE_LEVEL_HIGH>;
>>>>       }
>>>>
>>>>        foo2: bar@1234567C {
>>>>            interrupt-parent = <&pic1>;
>>>>            interrupts = <1 IRQ_TYPE_NONE>;
>>>>       }
>>>>
>>>> Is this valid (from DT point of view) configuration?  I think that
>>>> yes.
>>> No, it's malformed and a violation of the standard.
>> Can you give me pointer, please? I'm not able to find anything about
>> this...
>
> I will try to find the reference -- unfortunately these things are
> scattered about and not all of the documents are public. 
Don't worry, I trust you :)

> You can see that this must be invalid immediately, however, because it
> would make the interrupt configuration dependent on which devices
> attach and in which order. I've attached the main interrupt mapping
> spec if you would like to look at it.
Well, if IRQ_TYPE_LEVEL_HIGH is default interrupt configuration then
not. And with current hell with various shield boards and its overlays...

>>
>>>> Is this frequent configuration?  I'm sure,  isn't.
>>>> Is this possible configuration?  I'm afraid that yes.
>>> People do in fact do deranged non-conformant things with device trees
>>> sometimes, unfortunately, so it is indeed worth worrying about.
>>>
>>>> --------------------------
>>>>
>>>> Next example are GPIO interrupts. These are encoded differently, 
>>>> so two
>>>> 'keys' may point to to same real IRQ.
>>>> After that we decided to change our approach and utilize standard
>>>> resources and resource list entries to deliveryconfiguration data from
>>>> various sources (OFW/GPIO/...)  to consumers.
>>> This is a non-issue. You make the GPIO controller a PIC, it handles
>>> interrupts however you like. If you got a weird device tree that uses
>>> two encodings (one for "interrupts" properties with GPIOs and one for
>>> "GPIO" properties on which you need to take interrupts but that
>>> weren't added to the "interrupts" list for mysterious reasons), you
>>> introduce a helper function for the GPIO case.
>>>
>>> Are there any actual problems with the pre-existing interrupt mapping
>>> code? I have not seen any so far.
>>>
>>>> [2]
>>>> Each OFW device has a lot of dependencies. It must enable  clocks,
>>>> power
>>>> regulators..., it must be taken from reset. All these action must be
>>>> done before we can access single register.
>>>> Most of these action can be done  only with attached clock/power/reset
>>>> controller.
>>>> Within this, interrupts are not special.
>>> They are actually quite special in that the kernel enables them late
>>> and so you can defer setup. They are also special in that, for that
>>> reason, it is possible to design systems with circular dependency
>>> graphs with interrupts. It is not possible to do this with, for
>>> example, clocks: if I need to apply a clock to the clock controller
>>> with itself, the system is just not bootable and such a system will
>>> not be built or shipped. Interrupts need only happen later, after
>>> device setup, and so such systems *are* designed and shipped. The same
>>> is true for power.
>> The G5 problem is standard  'cross type' circular dependency.  You must
>> (at BUS_PASS_BUS)  initialize memory address window of PCI bridge and
>> start bus enumeration.
>> Rest of PCI bridge initialization (including all IRQ stuff) must be
>> postponed to any later pass. Of course, all interrupt controllers must
>> be attached at BUS_PASS_INTERRUPT.
>> Or I missed something?
>
> Yes. As I said, you could solve it that way. It would, of course,
> require a massive retooling of newbus (to allow partial delayed
> attach), everything in dev/pci (which doesn't expect that), and
> (almost) every PCI driver in and out of the tree.
I still think that only host to PCI bridge must me modified, and PCI-PCI
bridge must be run at  BUS_PASS_BUS (which is single line modification).

>
> Or: you could skip all of that and use the fully functional mechanism
> that already exists.
The problem is, at least from my point of view, that we don't have it.
The actual MIPS code doesn't work on ARM for numerous reasons.  This is
first one ->
 
https://svnweb.freebsd.org/base/head/sys/powerpc/powerpc/nexus.c?revision=297000&view=markup#l186
The pre-r301453  ARM INTRNG doesn't support detached interrupts.
And r301453  doesn't changed nothing about this.

Michal

> This is my point in a nutshell, basically. This code reinvents a
> perfectly functional wheel in a way that could, in the future, with a
> huge amount of work, accomplish the same things. For now, it is less
> functional and offers no visible advantages or new capabilities of any
> kind -- nor are there any such capabilities it could obviously provide
> in the future. Between now and that future, which will likely never
> arrive, we will have to maintain both in parallel and cause the device
> tree support on ARM and PowerPC to diverge.
> -Nathan
>
>>>> Moreover, 'cross type' circular
>>>> dependencies are not that rare.
>>>> I want to say:
>>>> Resource dependencies are must solved (and resolved) at different
>>>> level
>>>> than is INTRNG.
>>>> Blind resource allocation is not universal solution because given
>>>> resource may/must be accessible immediately after allocation (in many
>>>> cases).
>>> Absolutely: for GPIOs, clocks, power, etc. you want a system that
>>> looks different than the interrupt virtualization system. Probably
>>> with extra resource types and maybe with some API similar to r301453.
>>> But interrupts have different, and more complex, requirements and
>>> cannot be mapped this way.
>>>
>>>> Unfortunately,  and at time time, I known only one really working
>>>> solution:
>>>> staggered driver initialization combined with multipass bus
>>>> initialization.
>>> That does not solve the interrupt problem. If devices have interrupts
>>> on their own children, no amount of multipass initialization can
>>> possibly break the loop. Multipass would work for other kinds of
>>> resources (clocks, power, etc.) and is a perfect match for those
>>> resource types. A dynamic multipass (where a driver can return EAGAIN
>>> or something from attach if its resources don't exist yet) would work
>>> great.
>>>
>>> *But* in the specific, special case of interrupts, it is not a
>>> workable model. You could imagine some change to initialization that
>>> gives devices a late attach and an early attach method and does
>>> interrupts in the late part, but that is hugely invasive change that
>>> would need to touch every single driver in the tree -- to solve a
>>> problem that is already solved by interrupt virtualization.
>>> -Nathan
>>>
>>>> I'm sorry, I'm not able to express all this accurately and clearly,
>>>> but
>>>> I've really tried...
>>>> Michal
>>>>
>>>>
>>>> _______________________________________________
>>>> freebsd-arch@freebsd.org mailing list
>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
>>>> To unsubscribe, send any mail to
>>>> "freebsd-arch-unsubscribe@freebsd.org"
>>>>
>> _______________________________________________
>> freebsd-arch@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>>
>




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?579AFFC5.1040005>