Date:      Sun, 25 May 2003 02:49:30 -0400
From:      Anthony Schneider <anthony@x-anthony.com>
To:        freebsd-current@freebsd.org
Subject:   mpi + shmem issues
Message-ID:  <20030525064929.GA96588@x-anthony.com>



Hello,
My machine is a dual athlon:
FreeBSD pickle. 5.1-BETA FreeBSD 5.1-BETA #6: Sun May 25 02:16:15 EDT 2003
anthony@pickle.:/usr/src/sys/i386/compile/PICKLE  i386

I have started seeing a problem that may or may not also exist on
uniprocessor or 4.x systems.  I built MPI with the ch_shmem device for
shared-memory programs (instead of the more common rsh/ssh), and something
strange happens.  Even the most basic program will (usually) launch fine
the first time I run it after the system boots, but after a few executions
it starts failing consistently until I reboot.

As an example, here is a small acknowledgment program:

#include <mpi.h>
#include <stdio.h>

int main (int argc, char *argv[]) {
        int mpiRank, mpiSize;

        MPI_Init (&argc, &argv);
        MPI_Comm_rank (MPI_COMM_WORLD, &mpiRank);

        printf ("#%d here\n", mpiRank);

        return 0;

}

And here is the history of executing it:

pickle:anthony:/home/anthony/src/mpi:6% mpirun -np 2 ./foo
#0 here
#1 here
Child process exited unexpectedly 0
Abort trap (core dumped)
pickle:anthony:/home/anthony/src/mpi:7% mpirun -np 2 ./foo
#0 here
pickle:anthony:/home/anthony/src/mpi:8% #1 here

pickle:anthony:/home/anthony/src/mpi:8% mpirun -np 2 ./foo
#0 here
#1 here
pickle:anthony:/home/anthony/src/mpi:9% mpirun -np 2 ./foo
#0 here
#1 here
pickle:anthony:/home/anthony/src/mpi:10% mpirun -np 2 ./foo
#1 here
#0 here
Child process exited unexpectedly 0
Abort trap (core dumped)
pickle:anthony:/home/anthony/src/mpi:11% mpirun -np 2 ./foo
#0 here
#1 here
Child process exited unexpectedly 0
Abort trap (core dumped)
pickle:anthony:/home/anthony/src/mpi:12% mpirun -np 2 ./foo
#0 here
#1 here
pickle:anthony:/home/anthony/src/mpi:13% mpirun -np 2 ./foo
#1 here
#0 here
Child process exited unexpectedly 0
Abort trap (core dumped)
pickle:anthony:/home/anthony/src/mpi:14% mpirun -np 2 ./foo
#0 here
#1 here
pickle:anthony:/home/anthony/src/mpi:15% mpirun -np 2 ./foo
#0 here
#1 here
pickle:anthony:/home/anthony/src/mpi:16% mpirun -np 2 ./foo
semget failed for setnum =  0
Abort trap (core dumped)
pickle:anthony:/home/anthony/src/mpi:17% mpirun -np 2 ./foo
semget failed for setnum =  0
Abort trap (core dumped)
pickle:anthony:/home/anthony/src/mpi:18% mpirun -np 2 ./foo
semget failed for setnum =  0
Abort trap (core dumped)

... (continues until i reboot)

The first run that aborts is strange, but since it is not something
I've witnessed previously, I'd like to set it aside and focus on
the repeated semget failures.  Normally I would look into the MPI
implementation (mpich 1.2.5), but once semget has failed it never
seems to succeed again, even for other MPI programs, so I think
this is a FreeBSD problem.
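
One possible explanation, offered as an assumption rather than a
diagnosis: System V semaphore sets are kernel objects that outlive the
process that created them, so a run that aborts before cleaning up (the
program as posted also never calls MPI_Finalize) could leave sets behind
until semget hits the system limit.  Nothing in the output above confirms
that mpich's ch_shmem device behaves this way.  The sketch below only
illustrates the persistence itself with plain semget/semctl calls; the
helper names sem_set_create, sem_set_exists and sem_set_remove are made
up for illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical helper: create a private set of nsems semaphores.
 * Returns the set id, or -1 with errno set by semget(). */
int sem_set_create(int nsems) {
        assert(nsems > 0);
        return semget(IPC_PRIVATE, nsems, IPC_CREAT | 0600);
}

/* Hypothetical helper: 1 if the kernel still knows the set, else 0.
 * GETVAL needs no fourth semctl() argument and fails once the set
 * has been removed. */
int sem_set_exists(int semid) {
        return semctl(semid, 0, GETVAL) != -1;
}

/* Hypothetical helper: remove the set explicitly.  Exiting the
 * process does not do this; only IPC_RMID (or a reboot) frees it. */
int sem_set_remove(int semid) {
        return semctl(semid, 0, IPC_RMID);
}
```

If sem_set_exists still returns 1 for an id after the program that
created it has exited, the set has leaked.  From the shell, ipcs -s lists
the sets currently allocated and ipcrm -s <id> removes one, which would
be a way to test this theory without rebooting.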

I'm running a (barely) custom kernel, with nothing added to it.
I cvsup'd and rebuilt less than an hour ago, and the problem
has persisted from beta #5 through beta #6.

any suggestions?

thank you for your help.

-Anthony.


