Date:      Mon, 15 Jan 2018 10:31:04 -0600 (CST)
From:      "Valeri Galtsev" <galtsev@kicp.uchicago.edu>
To:        "Mehmet Erol Sanliturk" <m.e.sanliturk@gmail.com>
Cc:        "Grzegorz Junka" <list1@gjunka.com>, "FreeBSD Questions Mailing List" <freebsd-questions@freebsd.org>, "Warner Losh" <imp@bsdimp.com>, freebsd-drivers@freebsd.org
Subject:   Re: Server doesn't boot when 3 PCIe slots are populated
Message-ID:  <57715.108.68.169.115.1516033864.squirrel@cosmo.uchicago.edu>
In-Reply-To: <CAOgwaMvusKzt%2BYvmKeuyox0c=wgqEv9UP475Eacm2B0OkF7OrQ@mail.gmail.com>
References:  <ecce3fa6-3909-0947-685c-8a412684e99c@gjunka.com> <CAOgwaMsf9zByJYhL3KqpUMW5qKAzQEHpDWcwejY-uK=9swWbUQ@mail.gmail.com> <3d0ad00c-5214-71b0-017b-c2d5ba608e37@gjunka.com> <CAOgwaMsOKrGfGNmRt-C9Skjssj8JPAtFpk8bwG9v55LmaWdoVw@mail.gmail.com> <8df1e967-01e0-d3c2-e14c-64c7fc8c66b0@gjunka.com> <CANCZdfqZ-dogHXBdoyMPLOPs_R-vD%2BwLM-r6sm6ypesd0Nvp4A@mail.gmail.com> <0e582bdb-e1f9-438c-3da2-2bcdc950aab5@gjunka.com> <CAOgwaMvusKzt%2BYvmKeuyox0c=wgqEv9UP475Eacm2B0OkF7OrQ@mail.gmail.com>


On Mon, January 15, 2018 3:44 am, Mehmet Erol Sanliturk wrote:
> On Mon, Jan 15, 2018 at 9:44 AM, Grzegorz Junka <list1@gjunka.com> wrote:
>
>>
>> On 15/01/2018 06:18, Warner Losh wrote:
>>
>>>
>>>
>>> On Jan 14, 2018 11:05 PM, "Grzegorz Junka" <list1@gjunka.com> wrote:
>>>
>>>
>>>     On 14/01/2018 16:18, Mehmet Erol Sanliturk wrote:
>>>
>>>
>>>
>>>         On Sun, Jan 14, 2018 at 5:46 PM, Grzegorz Junka
>>>         <list1@gjunka.com> wrote:
>>>
>>>
>>>             On 13/01/2018 17:56, Mehmet Erol Sanliturk wrote:
>>>
>>>
>>>
>>>                 On Sat, Jan 13, 2018 at 7:21 PM, Grzegorz Junka
>>>                 <list1@gjunka.com> wrote:
>>>
>>>                     Hello,
>>>
>>>                     I am installing a FreeBSD server based on Supermicro
>>>                     H8SML-iF. There are three PCIe slots to which I
>>>                     installed 2 NVMe drives and one network card Intel
>>>                     I350-T4 (with 4 Ethernet slots).
>>>
>>>                     I am observing a strange behavior where the system
>>>                     doesn't boot if all three PCIe slots are populated.
>>>                     It shows this message:
>>>
>>>                     nvme0: <Generic NVMe Device> mem 0xfd8fc000-0xfd8fffff irq 24 at device 0.0 on pci1
>>>                     nvme0: controller ready did not become 1 within 30000 ms
>>>                     nvme0: did not complete shutdown within 5 seconds of notification
>>>
>>>                     Then I see a kernel panic/dump and the system reboots
>>>                     after 15 seconds.
>>>
>>>                     If I remove one card, either one of the NVMe drives
>>>                     or the network card, the system boots fine. Also, if
>>>                     in BIOS I set PnP OS to YES then sometimes it boots
>>>                     (but not always). If I set PnP OS to NO, and all
>>>                     three cards are installed, the system never boots.
>>>
>>>                     When the system boots OK I can see that the network
>>>                     card is reported as 4 separate devices on one of the
>>>                     PCIe slots. I tried different NVMe drives as well as
>>>                     changing which device is installed to which slot but
>>>                     the result seems to be the same in any case.
>>>
>>>                     What may be the issue? Amount of power drawn by the
>>>                     hardware? Too many devices not supported by the
>>>                     motherboard? Too many interrupts for the FreeBSD
>>>                     kernel to handle?
>>>
>>>                     Any help would be greatly appreciated.
>>>
>>>                     GregJ
>>>
>>>                     _______________________________________________
>>>
>>>
>>>
>>>
>>>
>>>                 From my experience with other brand-name main boards,
>>>                 one step is to check the manual of your server board to
>>>                 see whether there are rules about the use of these
>>>                 slots. Sometimes differently shaped slots are wired to
>>>                 the same ports, so that if one slot is occupied the
>>>                 other should be left empty, or there are rules against
>>>                 inserting a certain kind of device into a slot, for
>>>                 example graphics cards.
>>>
>>>
>>>                 Mehmet Erol Sanliturk
>>>
>>>
>>>             I checked the manual but couldn't find any restrictions
>>>             regarding PCIe ports. It only says how many lanes are
>>>             available in each slot. Would there be any obvious BIOS
>>>             setting that could cause this issue? I tried after resetting
>>>             BIOS to default settings but maybe something is set
>>>             incorrectly by default?
>>>
>>>             GregJ
>>>             _______________________________________________
>>>
>>>
>>>
>>>
>>>
>>>         http://www.supermicro.com/Aplus/motherboard/Opteron3000/SR56x0/H8SML-iF.cfm
>>>         H8SML-iF
>>>
>>>
>>>         On the above page , click "OS Compatibility"
>>>
>>>
>>>         On the following page , click "SR5650"
>>>
>>>         http://www.supermicro.com/Aplus/support/resources/OS/OS_Comp_SR5650.cfm
>>>         OS Compatibility Chart
>>>
>>>
>>>         On the column ( third )
>>>
>>>         H8SML-7F
>>>         H8SML-7
>>>         H8SML-iF
>>>         H8SML-i
>>>
>>>
>>>         only these are listed:
>>>
>>>         FreeBSD 8.0
>>>         FreeBSD 9.1
>>>
>>>         From this list, it appears that this motherboard is old,
>>>         meaning that newer OS versions have not been tested on it
>>>         beyond those listed.
>>>
>>>
>>>         To check the interaction between the operating system and your
>>>         Supermicro H8SML-iF, select one of the operating systems tested
>>>         on this board (Unix-class OSes are more suitable) and try to
>>>         install it with all your components in place. If it boots
>>>         successfully, there is an incompatibility between your FreeBSD
>>>         and the main board. If none of them boots, you may conclude
>>>         that there is a problem in your settings.
>>>
>>>
>>>         BIOS settings are important, because the OS communicates with
>>>         the main board through these settings.
>>>
>>>
>>>         In the manual (downloaded from the above page: Manual Revision
>>>         1.0c, Release Date: March 12, 2014), page 4-9, "PCI/PnP
>>>         Configuration" is defined.
>>>         If YES is selected for PnP, the OS adjusts some device settings.
>>>         If NO is selected, the BIOS adjusts them. When the device
>>>         settings adjusted by the BIOS do not conform to the OS
>>>         parameters, the result will be "FAIL".
>>>
>>>         Therefore, the more suitable selection is YES.
>>>
>>>
>>>         Another point is that there are many more BIOS-selectable
>>>         parameters and jumpers for the PCI slots and other components.
>>>         There are some BIOS settings for the PCI slots:
>>>
>>>         PCI x4 Slot 6 (page 4-9)
>>>         PCI x8 Slot 7 (page 4-10)
>>>
>>>
>>>
>>>         Please review these BIOS settings in your manual and set them
>>>         according to your requirements.
>>>
>>>
>>>     Thanks Mehmet for looking into this. It's an old motherboard but
>>>     my point is that it boots fine when either: one NVMe and the
>>>     network card, or both NVMe are installed, but not when all three
>>>     are installed. How would that be related to FreeBSD compatibility?
>>>     The chipset and all devices that I am trying to install are
>>>     supported by FreeBSD 11.x.
>>>
>>>     I just tried booting into a Debian live system and it also didn't
>>>     enumerate NVMe drives properly. This means that it's not FreeBSD
>>>     related and is no longer relevant for this list. I will try to
>>>     play with BIOS settings to see if I can make it work that way.
>>>     Thanks for all the help.
>>>
>>>
>>>
>>> Nvme drives are weird about power. I distrust the power estimate of 5-9w
>>> earlier in the thread... given the oddity with debian, it's not too crazy
>>> to think that. How far does FreeBSD boot though?
>>>
>>>
>> I tried with a different power supply but the outcome was exactly the
>> same. Sometimes FreeBSD boots fine but one of the NVMe drives is not
>> visible (i.e. dmesg grep shows only one NVMe). When it doesn't work it
>> boots up to the point of enumerating drives (SATA, USB, NVMe). Then it
>> stops at the first NVMe and reboots.
>>
>> The funny thing is that very often it's enough to pull out one of the
>> cards and put it back in. Then the system boots fine with all three
>> cards. I had that a few times. Once it's booted it works, I can restart
>> the system and it boots every time. As soon as I power off, unplug from
>> the power main, wait a few minutes and power it on again, the issue
>> comes back - can't boot as NVMe can't be enumerated.
>>
>> I thought it might be caused by the hardware being too cold. I left the
>> server once overnight but it didn't boot up, it was trying and
>> restarting the whole night.
>>
>> GregJ
>>
>>
>> _______________________________________________
>>
>>
>
>
>
> The above explanation brings to mind the "impedance mismatch in
> electronics" problem.

Hm, I wouldn't say so. First of all, I seriously doubt that sane cards
are out of spec as far as impedance is concerned.

But before going further, let's make sure we are talking about the same
thing. I assume impedance mismatch means that the impedance of the load
attached to a transmission line differs from the impedance of the
transmission line itself. In that case part of the transmitted signal is
reflected from the load back into the transmission line. This can make a
mess, as the transmitted signal is mixed with the reflected one at the
different positions of the loads along the same transmission line. One
has to have a really large mismatch (over 20% at least) for that to
matter. Many of us remember this from at least two computer-related
cases: 1. We used terminators at the end of SCSI cables (or attached a
"self-terminating" SCSI device to the end of the line). 2. On some system
boards whose memory buses had no terminators, the manual would say to
populate slots beginning with the one farthest from the CPU (to defeat
reflection from the open end of the memory bus lines).
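
For reference, the reflection described above is usually quantified by
the voltage reflection coefficient, (Z_load - Z_line) / (Z_load + Z_line).
Below is a minimal Python sketch of that textbook formula, assuming purely
resistive impedances. The function names and the 50 ohm line are made up
for illustration only, and reading "20% mismatch" as a load 20% above the
line impedance is an assumption.

```python
import math


def reflection_coefficient(z_load, z_line):
    """Voltage reflection coefficient for real impedances given in ohms."""
    if math.isinf(z_load):
        # Open (unterminated) end: the whole signal is reflected in phase.
        return 1.0
    return (z_load - z_line) / (z_load + z_line)


def reflected_power_fraction(z_load, z_line):
    """Fraction of incident power sent back down the line."""
    return reflection_coefficient(z_load, z_line) ** 2


if __name__ == "__main__":
    z_line = 50.0  # assumed line impedance, illustration only
    cases = [
        ("matched terminator", 50.0),
        ("load 20% above line", 60.0),
        ("short circuit", 0.0),
        ("open end", math.inf),
    ]
    for label, z_load in cases:
        gamma = reflection_coefficient(z_load, z_line)
        power = reflected_power_fraction(z_load, z_line)
        print(f"{label:20s} gamma={gamma:+.3f} reflected power={power:.1%}")
```

Under those assumptions a load 20% above the line impedance reflects 1/11
(about 9%) of the signal amplitude, i.e. under 1% of the power, while an
open end reflects all of it.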

I have never heard of anything like that on the PCI Express bus. If I am
wrong, could you give me some pointers so I can read about it?

Thanks in advance for pointers! (I know: you learn something every day -
which I bet I am about to ;-)

Valeri

>
> ( Please search
>
>
> impedance mismatch  in electronics
> impedance matching  in electronics
>
>
> in Internet if you want explanations about them . )
>
>
> When all of these cards are inserted into slots simultaneously, their
> accumulated electronic effect may distort the behaviour of your mother
> board circuits or of the attached card circuit(s).
>
>
> Therefore, if you can find another NVMe and/or network card, please test
> their effect.
> Such tests may be inconclusive, because mother board circuits may be
> affected negatively by "properly" operating add-on cards when they are
> inserted together.
>
>
> If it is feasible for you, you may use USB-attached network card(s) to
> eliminate the network card slot.
> Or you may use one more capable NVMe card instead of two smaller NVMe
> cards, or you may use only one of them, and/or select a SATA SSD.
> Such a choice would save your investment and produce a working server
> with a "little" loss when compared to "all".
>
>
>
>
> Mehmet Erol Sanliturk
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to
> "freebsd-questions-unsubscribe@freebsd.org"
>


++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++


