Date:      Mon, 24 Sep 2018 21:41:27 +0000
From:      bugzilla-noreply@freebsd.org
To:        ports-bugs@FreeBSD.org
Subject:   [Bug 231697] net/openmpi2:  MPI_Send to self fails (or receive from self fails?)
Message-ID:  <bug-231697-7788@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231697

            Bug ID: 231697
           Summary: net/openmpi2:  MPI_Send to self fails (or receive from
                    self fails?)
           Product: Ports & Packages
           Version: Latest
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: Individual Port(s)
          Assignee: danilo@FreeBSD.org
          Reporter: russo@bogodyn.org
             Flags: maintainer-feedback?(danilo@FreeBSD.org)
 Attachment #197466
         mime type: text/plain

Created attachment 197466
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=197466&action=edit
Simple test case that just does send/receives, fails with OpenMPI 2 or 3.

We have observed our code failing when built with OpenMPI 2.1 or 3.x on FreeBSD
(and also on one Linux platform), and have tracked it down, at least on FreeBSD,
to a simple test case: data sent via MPI_Send to the same processor the sender
is running on is never received by the corresponding MPI_Irecv. That use case is
supposed to be standard compliant *AND* it DOES work with OpenMPI 1.10 on the
same machine.

My uname -a:
FreeBSD yyy.zzz 10.4-STABLE FreeBSD 10.4-STABLE #0 r327510: Tue Jan  2 21:52:13
MST 2018     xxx@yyy.zzz:/usr/obj/usr/src/sys/GENERIC  amd64

If compiled with OpenMPI 2.1.x or 3.x, the attached test program prints BAD on
each line where it is supposed to report that proc #N received something from
proc #M with M == N.  It passes just fine with OpenMPI 1.10.
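In outline, the test does roughly the following (a simplified sketch of the
pattern, not the attached file verbatim; the buffer names, the -1000 sentinel,
and the exact print formats here are illustrative): each rank posts an
MPI_Irecv from every rank, itself included, then MPI_Sends one int to every
rank, then waits and checks what arrived.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *vals_from = malloc(size * sizeof(int));
    MPI_Request *reqs = malloc(size * sizeof(MPI_Request));

    /* Post a receive from every rank, including this one. */
    for (i = 0; i < size; i++) {
        vals_from[i] = -1000;   /* sentinel meaning "nothing received" */
        printf("%d posting receive %d %p\n", rank, i, (void *)&vals_from[i]);
        MPI_Irecv(&vals_from[i], 1, MPI_INT, i, 0, MPI_COMM_WORLD, &reqs[i]);
    }

    /* Send one int to every rank, including this one (the failing case). */
    for (i = 0; i < size; i++) {
        int val = (rank + 1) * 1000 + i;
        printf("%d sending to %d value %d\n", rank, i, val);
        MPI_Send(&val, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
    }

    /* Wait for each receive and report what, if anything, arrived. */
    for (i = 0; i < size; i++) {
        MPI_Status st;
        int nbytes;

        MPI_Wait(&reqs[i], &st);
        MPI_Get_count(&st, MPI_BYTE, &nbytes);   /* byte count of the message */
        printf("%d wait source %d count %d\n", rank, st.MPI_SOURCE, nbytes);
        printf("%d procs_from %d vals_from %d%s\n", rank, i, vals_from[i],
               vals_from[i] == (i + 1) * 1000 + rank ? "" : " BAD BAD BAD");
    }

    free(vals_from);
    free(reqs);
    MPI_Finalize();
    return 0;
}

Since every receive is posted before any send is issued, the send-to-self case
is required to complete; that is the part that works with OpenMPI 1.10 but not
with 2.1.x/3.x.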

We have run this on a few OSes other than FreeBSD, including RHEL 6, RHEL 7,
and OS X, and none of them show the same issue.  It does appear, however, that
Ubuntu 18.04's OpenMPI 2.1.x has the same problem.

It is not at all clear where this problem lies, except that the symptom is that
the receive requests do not in fact receive any data if the sender is the same
processor.

To reproduce:
   /usr/local/mpi/openmpi2/bin/mpicc -o testBUG967 testBUG967.c
   /usr/local/mpi/openmpi2/bin/mpirun -np 2 ./testBUG967
On my machine, this gives the output:
0 posting receive 0 0x803fc78b0
0 posting receive 1 0x803fc78b4
0 sending to 0 value 1000
1 posting receive 0 0x803fc78b0
1 posting receive 1 0x803fc78b4
1 sending to 0 value 2000
1 sending to 1 value 2001
0 sending to 1 value 1001
0 wait source 0 count 0
0 wait source 1 count 4
0 procs_from 0 vals_from -1000 BAD BAD BAD
0 procs_from 1 vals_from 2000
1 wait source 1 count 0
1 wait source 0 count 4
1 procs_from 1 vals_from -1000 BAD BAD BAD
1 procs_from 0 vals_from 1001

When run instead with OpenMPI 1, it gives the expected output:
> /usr/local/mpi/openmpi/bin/mpicc -o testBUG967 testBUG967.c
> /usr/local/mpi/openmpi/bin/mpirun -np 2 ./testBUG967
1 posting receive 0 0x803e23ad8
1 posting receive 1 0x803e23adc
1 sending to 0 value 2000
0 posting receive 0 0x803e23ad8
0 posting receive 1 0x803e23adc
0 sending to 0 value 1000
1 sending to 1 value 2001
0 sending to 1 value 1001
1 wait source 1 count 4
1 wait source 0 count 4
1 procs_from 1 vals_from 2001
0 wait source 0 count 4
0 wait source 1 count 4
0 procs_from 0 vals_from 1000
1 procs_from 0 vals_from 1001
0 procs_from 1 vals_from 2000

I have also tried it with varying --mca btl options (tcp,self; sm,self;
vader,self), and it always hits the failed-receive issue with all of them
unless I use OpenMPI 1.x.
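For reference, the BTL selections above can be exercised with invocations along
these lines (same binary as in the reproduction steps):
   /usr/local/mpi/openmpi2/bin/mpirun --mca btl tcp,self -np 2 ./testBUG967
   /usr/local/mpi/openmpi2/bin/mpirun --mca btl sm,self -np 2 ./testBUG967
   /usr/local/mpi/openmpi2/bin/mpirun --mca btl vader,self -np 2 ./testBUG967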


Additional information:
> pkg info openmpi2
openmpi2-2.1.5
Name           : openmpi2
Version        : 2.1.5
Installed on   : Mon Sep 24 15:31:19 2018 MDT
Origin         : net/openmpi2
Architecture   : FreeBSD:10:amd64
Prefix         : /usr/local
Categories     : net parallel
Licenses       : BSD3CLAUSE
Maintainer     : danilo@FreeBSD.org
WWW            : http://www.open-mpi.org/
Comment        : High Performance Message Passing Library
Options        :
        DEBUG          : on
        IPV6           : on
        SLURM          : off
        TORQUE         : off
Shared Libs required:
        libhwloc.so.5
        libevent-2.1.so.6
        libevent_pthreads-2.1.so.6
        libquadmath.so.0
        libgcc_s.so.1
        libgfortran.so.4
        libmunge.so.2

> pkg info openmpi
openmpi-1.10.7_3
Name           : openmpi
Version        : 1.10.7_3
Installed on   : Wed Aug 22 23:44:37 2018 MDT
Origin         : net/openmpi
Architecture   : FreeBSD:10:amd64
Prefix         : /usr/local
Categories     : net parallel
Licenses       : BSD3CLAUSE
Maintainer     : danilo@FreeBSD.org
WWW            : http://www.open-mpi.org/
Comment        : High Performance Message Passing Library
Options        :
        IPV6           : on
        SLURM          : off
        TORQUE         : off
        VT             : off
Shared Libs required:
        libquadmath.so.0
        libevent_pthreads-2.1.so.6
        libevent-2.1.so.6
        libhwloc.so.5
        libgfortran.so.4
        libgcc_s.so.1

> pkg info hwloc
hwloc-1.11.11
Name           : hwloc
Version        : 1.11.11
Installed on   : Wed Sep 19 08:08:13 2018 MDT
Origin         : devel/hwloc
Architecture   : FreeBSD:10:amd64
Prefix         : /usr/local
Categories     : devel
Licenses       : BSD3CLAUSE
Maintainer     : phd_kimberlite@yahoo.co.jp
WWW            : http://www.open-mpi.org/projects/hwloc/
Comment        : Portable Hardware Locality software package
Options        :
        CAIRO          : off
        DOCS           : on
Shared Libs required:
        libxml2.so.2
        libpciaccess.so.0
