Date:      Tue, 5 Jun 2018 09:18:06 +0200
From:      Harry Schmalzbauer <freebsd@omnilan.de>
To:        Scott Long <scottl@samsco.org>
Cc:        scsi@freebsd.org
Subject:   Re: What is ENXIO – MSI allocation regression in :[Was Re: svn commit: r321714 - in head/sys/dev: mpr mps]
Message-ID:  <d99e383d-b09a-f3bd-f1e2-a6a808016347@omnilan.de>
In-Reply-To: <78611650-D7A4-4B1D-A254-DB058E1AC1C6@samsco.org>
References:  <201707300653.v6U6rwLN099096@repo.freebsd.org> <597DA578.6030101@omnilan.de> <597F56A8.1060603@omnilan.de> <D18DFAD4-6E93-4AE2-BE15-EFF4D8ABCB2A@samsco.org> <59804C8C.1020003@omnilan.de> <e7d94e6a-89e8-ffa1-40da-7fb67e6bfc2b@omnilan.de> <78611650-D7A4-4B1D-A254-DB058E1AC1C6@samsco.org>

On 05.06.2018 at 00:22, Scott Long wrote:
> 
> 
>> On Jun 4, 2018, at 4:51 AM, Harry Schmalzbauer <freebsd@omnilan.de> wrote:
>>
>> On 01.08.2017 at 11:40, Harry Schmalzbauer wrote:
>>> Regarding Scott Long's message of 31.07.2017 18:56 (localtime):
>>>
>>> …
>>>>> I'd like to report one I hadn't expected:
>>>>>
>>>>> mps0: <Avago Technologies (LSI) SAS2008> port 0x4000-0x40ff mem 0xc3bc0000-0xc3bc3fff,0xc3b80000-0xc3bbffff irq 19 at device 0.0 on pci7
>>>>>
>>>>> mps0: Firmware: 20.00.04.00, Driver: 21.02.00.00-fbsd
>>>>> mps0: IOCCapabilities: 185c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,IR>
>>>>> mps0: Cannot allocate INTx interrupt
>>>>> mps0: mps_iocfacts_allocate failed to setup interrupts
>>>>> mps0: mps_attach IOC Facts based allocation failed with error 6
>>>>> panic: resource_list_release: resource entry is not busy
>>>>> cpuid = 6
>>>>> KDB: stack backtrace:
>>>>> #0 0xffffffff805e32d7 at kdb_backtrace+0x67
>>>>> #1 0xffffffff805a1d26 at vpanic+0x186
>>>>> #2 0xffffffff805a1b93 at panic+0x43
>>>>> #3 0xffffffff805d71c6 at resource_list_release+0x1c6
>>>>> #4 0xffffffff8040fef1 at mps_pci_free+0xe1
>>>>> #5 0xffffffff8040fa23 at mps_pci_attach+0x1b3
>>>>> #6 0xffffffff805d6594 at device_attach+0x3a4
>>>>> #7 0xffffffff805d774d at bus_generic_attach+0x3d
>>>>> #8 0xffffffff8044ac05 at pci_attach+0xd5
>>>>> #9 0xffffffff805d6594 at device_attach+0x3a4
>>>>> #10 0xffffffff805d774d at bus_generic_attach+0x3d
>>>>> #11 0xffffffff80364761 at acpi_pcib_pci_attach+0xa1
>>>>> #12 0xffffffff805d6594 at device_attach+0x3a4
>>>>> #13 0xffffffff805d774d at bus_generic_attach+0x3d
>>>>> #14 0xffffffff8044ac05 at pci_attach+0xd5
>>>>> #15 0xffffffff805d6594 at device_attach+0x3a4
>>>>> #16 0xffffffff805d774d at bus_generic_attach+0x3d
>>>>> #17 0xffffffff80363e4d at acpi_pcib_acpi_attach+0x42d
>>>>> Uptime: 1s
>>> …
>>>
>>>> Fixed in r321799, thanks for the report.
>>> Fix confirmed; merged together with r321733 (and r321737) to 11.1 and
>>> the panic vanished.
>>
>> Late in the 11.2 phase, I identified this commit as a regression for MSI (non-x) allocation.
>> I have an idea what probably causes the problem here (INTx gets allocated although the device has MSI (and MSI-x) capability):
>> disable_msix is not 0 (I need to disable MSI-x because of ESXi-passthru…).
>>
>> Corresponding lines:
>> {
>>          device_t dev;
>>          int error, msgs;
>>
>>          dev = sc->mps_dev;
>>          error = 0;
>>          msgs = 0;
>>
>>          if ((sc->disable_msix == 0) &&
>>              ((msgs = pci_msix_count(dev)) >= MPS_MSI_COUNT))
>>                  error = mps_alloc_msix(sc, MPS_MSI_COUNT);
>>          if ((error != 0) && (sc->disable_msi == 0) &&
>>              ((msgs = pci_msi_count(dev)) >= MPS_MSI_COUNT))
>>                  error = mps_alloc_msi(sc, MPS_MSI_COUNT);
>>          if (error != 0)
>>                  msgs = 0;
>>
>>          sc->msi_msgs = msgs;
>>          return (error);
>> }
>>
>> Before r321714, error was assigned ENXIO; if that is != 0, it could help me understand the problem.
>> Unfortunately I have no idea what ENXIO means, where it's defined and, most importantly, how to find the place where the declaration/definition happens.  Only joe and vi are available here, any hints highly appreciated.
>>
>> I can confirm that MSI allocation works with mps.ko_21.02.00.00-fbsd-r321415 with my ESXi-passthru-non_msi-x setup.
>> However, the driver emits no message that an MSI was allocated, like other drivers do.  That's a cosmetic one though.
>> But the MSI->INTx regression is a severe one for me, which I'd like to fix myself, but I'm missing so many fundamental skills :-(
>>
> 
> Hi Harry,
> 
> You are correct about the bug.  Please change the line at the top of the function that reads
> 
> error = 0;
> 
> to
> 
> error = ENXIO;
> 
> Let me know if that fixes the MSI problem for you.

Hello Scott,

thanks for your hint.
Unfortunately I have a lot more problems – the system (11.2-RC1)
deadlocks for some seconds under iSCSI load...
This is far easier to reproduce, and has a heavier impact, with mps(4) and INTx
allocation than with MSI, but overnight backup runs triggered that
extreme slowdown even though mps(4) was allocating MSI – locks of up to 20 sec,
during which not even the terminal updates.
All those updates are queued though, so after about 10-20 seconds, the
screen flickers, showing all queued output.

One symptom is that systat(1) shows 25% intr usage which is one core.
It's a ZFS machine, so high sys usage is normal, but intr usually is 
about 10% with GbE traffic.
Only when the slowdown/lockup happens, intr usage constantly stays at 25%.

Can't imagine ctld(8) or zfs is causing this, but who knows – I don't at 
the moment.
Will have to revert to 11.1 and see if things change, the machine was 
10.? before – without such problems.

BTW, does anybody have a link where I can get info about ENXIO?

Thanks,

-harry



