Date:      Thu, 26 Jan 2017 13:08:06 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 216493] [Hyper-V] Mellanox ConnectX-3 VF driver can't work when FreeBSD runs on Hyper-V 2016
Message-ID:  <bug-216493-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=216493

            Bug ID: 216493
           Summary: [Hyper-V] Mellanox ConnectX-3 VF driver can't work
                    when FreeBSD runs on Hyper-V 2016
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: decui@microsoft.com

Windows Server 2016 (Hyper-V 2016) supports PCIe pass-through and NIC SR-IOV
for non-Windows virtual machines (VMs) such as Linux and FreeBSD VMs. A few
months ago, we enabled PCIe pass-through for FreeBSD VMs running on Hyper-V
and successfully assigned a Mellanox ConnectX-3 PF device to the VM, where
the device worked fine.

Now we have added code to support NIC SR-IOV (which is based on PCIe
pass-through) in the Hyper-V hv_netvsc driver, but the VF driver failed to
load, so I ported two patches from Linux:
https://reviews.freebsd.org/D8867
https://reviews.freebsd.org/D8868

(Note: I tested the PF/VF drivers only in a FreeBSD VM running on Hyper-V. I
didn’t test them with the patches on a bare-metal FreeBSD machine, because
it’s not easy to set up such a machine in our lab for now, so it would be
really helpful and important if people could review the patches and help
test on bare metal.)

With the 2 patches, the VF driver worked in my limited test.

BTW, this link (https://community.mellanox.com/docs/DOC-2242) shows how to
enable Mellanox ConnectX-3 VF for a Windows VM running on Hyper-V 2012 R2.
What I did for the FreeBSD VM on Hyper-V 2016 is pretty similar.


Next, I did more testing and identified 4 issues we need to address:
1. When the VF is hot-removed, I see the error below, but it looks nonfatal,
because the VF still works when it is hot-added again later.

mlx4_core0: Failed to free mtt range at:20769 order:0
mlx4_core0: detached


2. The VF works fine when the VM has <=12 virtual CPUs, but if the VM has
>=13 vCPUs, the VF driver fails to load:

  mlx4_core0: <mlx4_core> at device 2.0 on pci1
  mlx4_core: Initializing mlx4_core: Mellanox ConnectX VPI driver v2.1.6
  vmbus0: allocated type 3 (0xfe0800000-0xfe0ffffff) for rid 18 of mlx4_core0
  mlx4_core0: Lazy allocation of 0x800000 bytes rid 0x18 type 3 at 0xfe0800000
  mlx4_core0: Detected virtual function - running in slave mode
  mlx4_core0: Sending reset
  mlx4_core0: Sending vhcr0
  mlx4_core0: HCA minimum page size:512
  mlx4_core0: Timestamping is not supported in slave mode.
  mlx4_core0: attempting to allocate 20 MSI-X vectors (52 supported)
  mlx4_core0: using IRQs 256-275 for MSI-X
  mlx4_core0: Failed to allocate mtts for 1024 pages(order 10)
  mlx4_core0: Failed to initialize event queue table (err=-12), aborting.
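
As a reading aid for the last two log lines, here is a minimal C sketch. It
assumes the driver reports negated errno values (so err=-12 would be
-ENOMEM) and that "order" is the base-2 logarithm of the page count, which
is consistent with "1024 pages(order 10)". The helper names
pages_for_order() and neg_errno_name() are hypothetical and are not part of
the mlx4 driver.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: pages covered by an allocation of this order (2^order). */
static unsigned long pages_for_order(unsigned int order)
{
    return 1UL << order;
}

/*
 * Hypothetical helper: map a negated errno value, as printed in the log
 * ("err=-12"), to its symbolic name. Only the two values that are the same
 * on FreeBSD and Linux are handled here.
 */
static const char *neg_errno_name(int err)
{
    switch (-err) {
    case ENOMEM:
        return "ENOMEM";   /* 12: out of memory or resources */
    case EIO:
        return "EIO";      /* 5: input/output error */
    default:
        return NULL;       /* not covered by this sketch */
    }
}
```

Under these assumptions, pages_for_order(10) gives 1024 and
neg_errno_name(-12) gives "ENOMEM", so the load failure reads as the MTT
allocation for the event queue table running out of resources.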


3. The VF can't ping another VM's VF on the same host, and can't ping the PF
on the same host either.

On the same host,
    Windows VM <-> Windows VM
and
    Windows VM <-> Linux VM
are both OK.

Only FreeBSD VM <-> Windows/Linux VM doesn't work.

I suspect something is wrong or missing in the mlx4 VF driver in FreeBSD.


4. I got the messages below when Live Migration didn’t work. It seems the
VF’s detach method couldn’t finish successfully.

Jan 11 19:16:43 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:16:43 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode FREE_RES (0xf01)
Jan 11 19:16:43 decui-b11 kernel: mlx4_core0: Failed to free mtt range at:5937 order:0
Jan 11 19:16:54 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:16:54 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode CLOSE_PORT (0xa)
Jan 11 19:18:04 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:18:04 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode FREE_RES (0xf01)
Jan 11 19:19:14 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:19:14 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:19:14 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000002
Jan 11 19:20:24 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:20:24 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:20:24 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000003
Jan 11 19:21:34 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:21:34 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:21:34 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000004
Jan 11 19:22:46 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:22:46 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:22:46 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000005
Jan 11 19:23:56 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:23:56 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:23:56 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000006
Jan 11 19:25:06 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:25:06 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode QP_FLOW_STEERING_DETACH (0x66)
Jan 11 19:25:06 decui-b11 kernel: mlx4_core0: Fail to detach network rule. registration id = 0x9000000000007
Jan 11 19:26:16 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:26:16 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode SET_MCAST_FLTR (0x48)
Jan 11 19:27:26 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:27:26 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode FREE_RES (0xf01)
Jan 11 19:27:26 decui-b11 kernel: mlx4_core0: Failed to free icm of qp:2279
Jan 11 19:28:36 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:28:36 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode FREE_RES (0xf01)
Jan 11 19:28:36 decui-b11 kernel: mlx4_core0: Failed to release qp range base:2279 cnt:1
Jan 11 19:29:46 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:29:46 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode 2RST_QP (0x21)
Jan 11 19:30:56 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:30:56 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode HW2SW_CQ (0x17)
Jan 11 19:30:56 decui-b11 kernel: mlx4_core0: HW2SW_CQ failed (-35) for CQN 0000b5
Jan 11 19:32:06 decui-b11 kernel: mlx4_core0: mlx4_comm_cmd_wait: Comm channel is not idle. My toggle is 0 (op: 0x5)
Jan 11 19:32:06 decui-b11 kernel: mlx4_core0: failed execution of VHCR_POST commandopcode FREE_RES (0xf01)
Jan 11 19:32:06 decui-b11 kernel: mlx4_core0: Failed freeing cq:181

More info about issue 4:

In the case of Live Migration, it looks like the host just rescinds the VF
by force without sending the PCI_EJECT message to the VM. It looks like the
current Mellanox VF driver in FreeBSD can’t handle this case (i.e. the VF
device disappears suddenly) and always hangs due to command timeouts,
because at that point the host denies the VM’s access to the VF.
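
In the FreeBSD log for issue 4, most of the failed teardown commands are
about 70 seconds apart, which suggests each command waits out its full
timeout before giving up. A minimal C sketch of the alternative, failing
fast once the device is known to be gone, is shown here. All names (struct
vf_dev, chan_idle(), vf_cmd_wait()) are hypothetical, and this is neither
the mlx4 driver code nor the Linux recovery patch; it only illustrates the
idea of checking a "device gone" flag before polling the channel.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device state; not the real mlx4 structures. */
struct vf_dev {
    bool gone;           /* set when the host has rescinded the VF */
    bool idle;           /* stand-in for the comm channel status   */
    unsigned int polls;  /* counts channel polls, for illustration */
};

/* Hypothetical poll of the communication channel. */
static bool chan_idle(struct vf_dev *dev)
{
    dev->polls++;
    return dev->idle;
}

/*
 * Wait for the channel to become idle, polling at most max_polls times.
 * Returns 0 when the channel is idle, -EIO without polling when the device
 * is flagged as gone, and -ETIMEDOUT when the poll budget is used up.
 */
static int vf_cmd_wait(struct vf_dev *dev, unsigned int max_polls)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if (dev->gone)
            return -EIO;       /* fail fast instead of waiting out the timeout */
        if (chan_idle(dev))
            return 0;
    }
    return -ETIMEDOUT;
}
```

With such a flag set when the VF is rescinded, every later teardown command
in the detach path would return immediately instead of each one waiting for
its own timeout. How the flag gets set (for example from the Hyper-V
rescind notification) is left open in this sketch.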

BTW, the VF driver in a Linux VM doesn’t hang and it looks like Live
Migration can work, but the driver also prints these scary messages:

Jan 26 02:40:06 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: Internal error detected on the communication channel
Jan 26 02:40:06 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: device is going to be reset
Jan 26 02:40:06 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: VF reset is not needed
Jan 26 02:40:06 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: device was reset successfully
Jan 26 02:40:06 decui-lin-vm kernel: mlx4_en 99bb:00:02.0: Internal error detected, restarting device
Jan 26 02:40:06 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: command 0x5 failed: fw status = 0x1
Jan 26 02:40:06 decui-lin-vm kernel: hv_netvsc vmbus_16 eth1: VF down: enP39355p0s2
Jan 26 02:40:06 decui-lin-vm kernel: hv_netvsc vmbus_16 eth1: Data path switched from VF: enP39355p0s2
Jan 26 02:40:06 decui-lin-vm kernel: hv_netvsc vmbus_16 eth1: VF unregistering: enP39355p0s2

Jan 26 02:40:07 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: Failed to close slave function
Jan 26 02:40:07 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: Detected virtual function - running in slave mode
Jan 26 02:40:37 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: recovering from previously mis-behaved VM
Jan 26 02:41:07 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: Communication channel is offline.
Jan 26 02:41:07 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: PF is not responsive, skipping initialization
Jan 26 02:41:07 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: Failed to initialize slave
Jan 26 02:41:07 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: mlx4_restart_one: ERROR: mlx4_load_one failed, pci_name=99bb:00:02.0, err=-5
Jan 26 02:41:07 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: mlx4_restart_one was ended, ret=-5
Jan 26 02:41:07 decui-lin-vm kernel: mlx4_core 99bb:00:02.0: mlx4_remove_one: interface is down

I think at least we need to port this patch
“net/mlx4_core: Enable device recovery flow with SRIOV”
(https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=55ad359225b2232b9b8f04a0dfa169bd3a7d86d2)
from Linux to FreeBSD.



