
Assignment 11: Problem 1 Bcast large amount of numbers

Moderators: Dan Negrut, ME964 Spring 2012

<<

S12fwang

Newbie

Posts: 46

Joined: Mon Jan 23, 2012 9:07 pm

Unread post Wed Apr 18, 2012 5:39 pm

Assignment 11: Problem 1 Bcast large amount of numbers

I was working on HW 11 Problem 1, using MPI_Bcast to send data to the other processes. I found that once the array size exceeds 65536, the program hangs as if it were deadlocked. I wonder whether the array length exceeds some buffer on each process. Does anyone have any idea?
I simply use:
  Code:
MPI_Bcast(A, length, MPI_CHAR, 0, MPI_COMM_WORLD);
<<

Dan Negrut

Global Moderator

Posts: 833

Joined: Wed Sep 03, 2008 12:24 pm

Unread post Fri Apr 20, 2012 8:45 am

Re: Assignment 11: Problem 1 Bcast large amount of numbers

This is peculiar.
Can you please post the *smallest* amount of code that compiles and, when run, illustrates your problem?
Dan
<<

S12fwang

Newbie

Posts: 46

Joined: Mon Jan 23, 2012 9:07 pm

Unread post Fri Apr 20, 2012 3:41 pm

Re: Assignment 11: Problem 1 Bcast large amount of numbers

I allocate the memory like this:
  Code:
char* A = (char*) malloc( sizeof(char)*MAX);

Then I use a for loop to broadcast:
  Code:
for (length = 0; length < 31; length++)
{
      if (rank == 0) {
            MPI_Bcast(A, num, MPI_CHAR, 0, MPI_COMM_WORLD);
      }
      num *= 2;
}

I request 16 processors with qsub:
  Code:
[fwang@euler26 ~]$ qsub -I -l nodes=2:ppn=8

Then I run it:
  Code:
[fwang@euler20 prob1]$ mpiexec -machinefile $PBS_NODEFILE ./prob1_1
Warning: Permanently added 'euler30,10.0.0.30' (RSA) to the list of known hosts.
That took for size of 1  0.170832 seconds
That took for size of 2  0.055991 seconds
That took for size of 4  0.080939 seconds
That took for size of 8  0.075992 seconds
That took for size of 16  0.055997 seconds
That took for size of 32  0.055992 seconds
That took for size of 64  0.055992 seconds
That took for size of 128  0.061000 seconds
That took for size of 256  0.070989 seconds
That took for size of 512  0.055991 seconds
That took for size of 1024  0.055993 seconds
That took for size of 2048  0.078988 seconds
That took for size of 4096  0.061004 seconds
That took for size of 8192  0.051004 seconds
That took for size of 16384  0.051011 seconds
^Cmpiexec: killing job...

--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 59590 on node euler20 exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
16 total processes killed (some possibly by mpiexec during cleanup)
mpiexec: clean termination accomplished


What's the matter?
<<

S12fwang

Newbie

Posts: 46

Joined: Mon Jan 23, 2012 9:07 pm

Unread post Fri Apr 20, 2012 3:52 pm

Re: Assignment 11: Problem 1 Bcast large amount of numbers

Something interesting happened when I changed the qsub request:
  Code:
[fwang@euler26 ~]$ qsub -I -l nodes=4:ppn=4


The result is:
  Code:
That took for size of 1  3.000923 seconds
That took for size of 2  0.000203 seconds
That took for size of 4  0.000233 seconds
That took for size of 8  0.000320 seconds
That took for size of 16  0.000249 seconds
That took for size of 32  0.000200 seconds
That took for size of 64  0.000219 seconds
That took for size of 128  0.000234 seconds
That took for size of 256  0.000267 seconds
That took for size of 512  0.000219 seconds
That took for size of 1024  0.000584 seconds
That took for size of 2048  0.000812 seconds
That took for size of 4096  0.000296 seconds
That took for size of 8192  0.000963 seconds
That took for size of 16384  0.000613 seconds
That took for size of 32768  0.000979 seconds
That took for size of 65536  0.001036 seconds
^Cmpiexec: killing job...

--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 57912 on node euler19 exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
16 total processes killed (some possibly by mpiexec during cleanup)
mpiexec: clean termination accomplished


It gets to a larger size before hanging.
<<

S12nvandam

Newbie

Posts: 17

Joined: Mon Jan 23, 2012 9:07 pm

Unread post Fri Apr 20, 2012 4:02 pm

Re: Assignment 11: Problem 1 Bcast large amount of numbers

Your problem is this:
  Code:
if (rank == 0) {
      MPI_Bcast(A,num, MPI_CHAR, 0, MPI_COMM_WORLD);
}


It should simply be:
  Code:
MPI_Bcast(A, num, MPI_CHAR, 0, MPI_COMM_WORLD);


Every rank must make the MPI_Bcast call; otherwise the non-root processes won't know that they are supposed to be receiving a broadcast. MPI works out whether a specific rank is sending or receiving from the root you've provided (0 in your code). I don't know why the code succeeded for small transfer sizes. It may be behaving like the standard MPI_Send, which switches between eager and rendezvous modes depending on the transfer size: for small messages the root buffers the broadcast, returns, and carries on without any of the other processes having received the data. Someone who knows more about MPI could correct me.

Noah
<<

S12fwang

Newbie

Posts: 46

Joined: Mon Jan 23, 2012 9:07 pm

Unread post Fri Apr 20, 2012 4:11 pm

Re: Assignment 11: Problem 1 Bcast large amount of numbers

Oh, I fixed it. Thank you so much!!
<<

Dan Negrut

Global Moderator

Posts: 833

Joined: Wed Sep 03, 2008 12:24 pm

Unread post Fri Apr 20, 2012 9:27 pm

Re: Assignment 11: Problem 1 Bcast large amount of numbers

Noah - thanks for your help.
Dan
