
MPI_Send hangs?

Moderators: Dan Negrut, ME964 Spring 2012


S12ymyang

Jr. Member

Posts: 66

Joined: Mon Jan 23, 2012 9:07 pm

Posted: Sat Apr 21, 2012 5:27 pm

MPI_Send hangs?

Is anyone having issues with MPI_Send hanging?

It hangs at different points (i.e. different send sizes), but usually around the send to process 12 at 2^26 (67,108,864) bytes. The sends are in a for loop, so plenty of earlier sends complete correctly (processes 0-15 for 2^0 up to 2^25 all succeed). Is there any reason a process that prints it's about to wait for the right amount of data would suddenly hang, so that the root process can't move on from the blocking send?
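
For concreteness, here is a minimal sketch of the kind of loop being described (an assumption about its structure, not the actual assignment code): rank 0 MPI_Sends a buffer of 2^k bytes to each other rank, and every worker posts a matching MPI_Recv. All names and sizes below are illustrative.

  Code:
/* Minimal sketch (not the actual assignment code): rank 0 sends a buffer
 * of 2^k bytes to every other rank with blocking MPI_Send; each worker
 * posts a matching MPI_Recv.  All names and sizes are illustrative. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, k, dest, nbytes;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (k = 0; k <= 26; ++k) {                  /* 2^0 .. 2^26 bytes */
        nbytes = 1 << k;
        buf = malloc(nbytes);

        if (rank == 0) {
            for (dest = 1; dest < size; ++dest)
                MPI_Send(buf, nbytes, MPI_BYTE, dest, 0, MPI_COMM_WORLD);
        } else {
            printf("rank %d waiting for %d bytes\n", rank, nbytes);
            MPI_Recv(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
        free(buf);
    }

    MPI_Finalize();
    return 0;
}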

Thanks,
Ming

S12dmcao

Newbie

Posts: 15

Joined: Mon Jan 23, 2012 9:07 pm

Posted: Sun Apr 22, 2012 12:11 am

Re: MPI_Send hangs?

You're not alone; I'm having a similar issue with hangs using MPI_Ssend. My transfers proceed fine up until N_bytes = 2^25; at that point they only succeed up to process 6. However, if I then restart the program from the troublesome byte count (i.e. 2^25), the transfer runs fine through all the processes, but then hangs on the next size (i.e. 2^26) on some other process. Anyway, very confusing.

S12dmcao

Newbie

Posts: 15

Joined: Mon Jan 23, 2012 9:07 pm

Posted: Sun Apr 22, 2012 1:09 am

Re: MPI_Send hangs?

By the way, try using just 1 node, 16 processors. I just ran with that and my program no longer hangs. Apparently, it only hangs when I start to use more than one node (I have no clue why). I realize this may not be the best answer you're looking for.

S12ymyang

Jr. Member

Posts: 66

Joined: Mon Jan 23, 2012 9:07 pm

Posted: Sun Apr 22, 2012 2:30 am

Re: MPI_Send hangs?

S12dmcao wrote:
By the way, try using just 1 node, 16 processors. I just ran with that and my program no longer hangs. Apparently, it only hangs when I start to use more than one node (I have no clue why). I realize this may not be the best answer you're looking for.


I thought each node could only have 8 processors/cores? Or is there some confusion on my end regarding nodes/processors/cores?

Haha I'll definitely try that though, thanks!
Ming

S12pandey

Newbie

Posts: 17

Joined: Mon Jan 23, 2012 9:07 pm

Posted: Sun Apr 22, 2012 6:14 am

Re: MPI_Send hangs?

Facing a similar situation here. Everything goes well for MPI_Bcast with different numbers of nodes and processes, but with MPI_Send it hangs after 2^26. Has anyone found a way around this? Is it an issue of deadlock?
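
For what it's worth, one classic way blocking sends hang only above a certain message size is a send/send ordering deadlock: below the library's eager threshold, MPI_Send can return after the message is buffered internally, but above it MPI_Send blocks until the matching receive is posted (and that threshold is usually much smaller across nodes than within one). Whether that is what's happening here is only a guess; a minimal sketch of the pattern and one fix, with illustrative names and sizes:

  Code:
/* Hypothetical two-rank exchange (a guess at the failure mode, not a
 * diagnosis of anyone's code).  The commented-out version has both ranks
 * call blocking MPI_Send first: small messages complete out of the
 * library's eager buffers, but past the eager threshold each MPI_Send
 * waits for a receive that is never posted, and the program hangs. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, partner, n = 1 << 26;           /* 2^26 bytes, illustrative */
    char *sendbuf, *recvbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    sendbuf = malloc(n);
    recvbuf = malloc(n);

    if (rank < 2) {
        partner = 1 - rank;

        /* Deadlock-prone once n exceeds the eager threshold:
         * MPI_Send(sendbuf, n, MPI_BYTE, partner, 0, MPI_COMM_WORLD);
         * MPI_Recv(recvbuf, n, MPI_BYTE, partner, 0, MPI_COMM_WORLD,
         *          MPI_STATUS_IGNORE);
         */

        /* Safe alternative: let MPI pair the send and the receive. */
        MPI_Sendrecv(sendbuf, n, MPI_BYTE, partner, 0,
                     recvbuf, n, MPI_BYTE, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}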

S12puthoor

Newbie

Posts: 21

Joined: Mon Jan 23, 2012 9:07 pm

Posted: Sun Apr 22, 2012 10:26 am

Re: MPI_Send hangs?

Same situation for me.

S12ymyang

Jr. Member

Posts: 66

Joined: Mon Jan 23, 2012 9:07 pm

Posted: Sun Apr 22, 2012 3:08 pm

Re: MPI_Send hangs?

S12dmcao wrote:
By the way, try using just 1 node, 16 processors. I just ran with that and my program no longer hangs. Apparently, it only hangs when I start to use more than one node (I have no clue why). I realize this may not be the best answer you're looking for.


This post saved the day for me. I got into an interactive shell via:


  Code:
$ qsub -I -l nodes=1:ppn=16

Then I ran via:

  Code:
mpiexec -n 16 --hostfile $PBS_NODEFILE ./programname

S12pandey

Newbie

Posts: 17

Joined: Mon Jan 23, 2012 9:07 pm

Posted: Sun Apr 22, 2012 6:43 pm

Re: MPI_Send hangs?

Thanks guys, this works for me. :)

S12ag

Newbie

Posts: 16

Joined: Thu Jan 26, 2012 11:23 pm

Posted: Sun Apr 22, 2012 7:24 pm

Re: MPI_Send hangs?

Yup. Works with one node.

S12tatannen

Newbie

Posts: 23

Joined: Mon Jan 30, 2012 10:17 am

Posted: Sun Apr 22, 2012 11:04 pm

Re: MPI_Send hangs?

While it does indeed all work with one node, I found that the broadcast times took a big hit (i.e. they increased a lot) when using one node. In the end, I did qsub nodes=16:ppn=1 and was able to run my code up to 2^29 successfully after a couple of tries. While I was never able to get all the way to 2^30, I felt it was better (for the goals of this assignment anyway) to run with multiple physical nodes and get reasonable times for MPI_Bcast.
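
For reference, here is a minimal sketch of one way to time MPI_Bcast under a given node layout (the loop bounds, buffer sizes, and the choice of reporting the slowest rank's time are illustrative assumptions, not the assignment's prescribed method):

  Code:
/* Minimal MPI_Bcast timing sketch: broadcast 2^k bytes from rank 0 and
 * report the slowest rank's time.  All sizes and bounds are illustrative. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, k, nbytes;
    double t0, t1, local, slowest;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (k = 10; k <= 26; ++k) {
        nbytes = 1 << k;
        buf = malloc(nbytes);

        MPI_Barrier(MPI_COMM_WORLD);           /* start all ranks together */
        t0 = MPI_Wtime();
        MPI_Bcast(buf, nbytes, MPI_BYTE, 0, MPI_COMM_WORLD);
        t1 = MPI_Wtime();

        local = t1 - t0;
        MPI_Reduce(&local, &slowest, 1, MPI_DOUBLE, MPI_MAX, 0,
                   MPI_COMM_WORLD);
        if (rank == 0)
            printf("2^%d bytes: %.6f s\n", k, slowest);

        free(buf);
    }

    MPI_Finalize();
    return 0;
}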
