Time Out error

<<

Vikalp

Newbie

Posts: 13

Joined: Mon Aug 18, 2008 9:31 am

Sun Nov 09, 2008 2:24 pm

Time Out error

My code works fine for numBodies up to 65536.

Beyond that it gives the error
"The launch timed out and was terminated at line "blah blah" "

and the message points to the line where I copy the number of collisions (just an int) from device to host.

Makarand looked at the NVIDIA forum, where people suggest the following:
1) It might be a hardware error: someone changed the video card and it worked for him --> I tried running the code on Makarand's system, and it gave the same error.
2) For some people it worked after changing the combination of block size and number of blocks (which is not working for me).
3) Driver issues --> one person wrote that this is a bug in CUDA and NVIDIA is working on it.


Any suggestions?
<<

BSme964uw

Newbie

Posts: 15

Joined: Mon Aug 18, 2008 9:32 am

Sun Nov 09, 2008 4:41 pm

Re: Time Out error

Hi,

I don't know if it is a coincidence, but the maximum number of blocks in each dimension of a grid is 65535.

Brandon
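
A rough sketch of how the 65535 limit mentioned above could be checked and worked around. The limit applies per grid dimension on the hardware of that time (compute capability below 3.0; newer GPUs allow far more blocks in x). The helper names `makeGrid` and `globalIndex` are made up for this illustration and are not from the thread. With 65536 bodies and a block size larger than 1 the block count stays far below the limit, so it may well be a coincidence here.

```cuda
#include <cuda_runtime.h>

// Per-dimension grid limit on pre-3.0 compute capability hardware.
const unsigned int MAX_GRID_DIM = 65535;

// Hypothetical helper: choose a grid with at least one thread per body
// while keeping each grid dimension within MAX_GRID_DIM.
dim3 makeGrid(unsigned int numBodies, unsigned int blockSize)
{
    unsigned int numBlocks = (numBodies + blockSize - 1) / blockSize;
    if (numBlocks <= MAX_GRID_DIM)
        return dim3(numBlocks, 1, 1);

    // Too many blocks for one dimension: spread them over x and y.
    unsigned int gy = (numBlocks + MAX_GRID_DIM - 1) / MAX_GRID_DIM;
    unsigned int gx = (numBlocks + gy - 1) / gy;
    return dim3(gx, gy, 1);
}

// Hypothetical device helper: flatten a (possibly 2D) grid of 1D blocks
// into a single global thread index. The kernel still has to check that
// the index is below numBodies, because gx * gy may exceed numBlocks.
__device__ unsigned int globalIndex()
{
    unsigned int blockId = blockIdx.y * gridDim.x + blockIdx.x;
    return blockId * blockDim.x + threadIdx.x;
}
```

For example, `makeGrid(65536, 256)` gives a plain 1D grid of 256 blocks, while one block per body for 70000 bodies would need the 2D layout.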
<<

Vikalp

Newbie

Posts: 13

Joined: Mon Aug 18, 2008 9:31 am

Sun Nov 09, 2008 10:18 pm

Re: Time Out error

I have some more observations:

1) My code works fine in EmuDebug mode for any number of bodies. The number of collisions from Bullet and from my code match perfectly.

2) In debug mode the error never shows up in the kernel, and the error message never points to the kernel. I print a statement right after the kernel call and it prints fine every time; the error appears only where I try to copy this integer (or long int) from device to host, which is strange.

I tried freeing some CUDA memory before the memcpy, but that doesn't help either, and I have no idea what is going wrong or where.


Vikalp
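
One likely reason the error points at the copy rather than at the kernel: kernel launches are asynchronous, so a statement printed right after the launch runs before the kernel has finished, and an execution error is reported by the next call that waits for the device, such as a `cudaMemcpy`. A minimal sketch of checking right after the launch is below. `dummyKernel` and `launchAndCheck` are placeholder names, not code from this thread, and `cudaDeviceSynchronize` is the current name of the call that was `cudaThreadSynchronize` in the 2008 toolkit.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real collision kernel.
__global__ void dummyKernel(int *out)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
        *out = 42;
}

// Launch the kernel and return the first error encountered, so that a
// failure is attributed to the launch or the kernel, not to the copy.
cudaError_t launchAndCheck(int *hostResult)
{
    int *devResult = NULL;
    cudaError_t err = cudaMalloc((void **)&devResult, sizeof(int));
    if (err != cudaSuccess)
        return err;

    dummyKernel<<<1, 32>>>(devResult);

    // Reports launch problems such as an invalid grid or block size.
    err = cudaGetLastError();

    // Waits for the kernel; execution errors, including a launch
    // timeout, are reported here instead of at the later memcpy.
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();

    if (err == cudaSuccess)
        err = cudaMemcpy(hostResult, devResult, sizeof(int),
                         cudaMemcpyDeviceToHost);

    cudaFree(devResult);
    return err;
}
```

If the value returned is not `cudaSuccess`, `cudaGetErrorString(err)` gives the readable message.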
<<

Dan Negrut

Global Moderator

Posts: 833

Joined: Wed Sep 03, 2008 12:24 pm

Mon Nov 10, 2008 7:45 am

Re: Time Out error

Vikalp - you say "2) In the debug mode the error never pops-up in the kernel...". Do you mean EmuDebug or Debug? If it works in Debug for more than 65K bodies, just run it in Debug instead of Release.

On a different note, what happens when you don't try to copy that integer from device to host?  Does the code run OK?

Finally, it would be useful if you posted the kernel code.  Don't post everything (include files, etc.), but rather the important part of it so that people can take a quick look.
Dan
<<

Vikalp

Newbie

Posts: 13

Joined: Mon Aug 18, 2008 9:31 am

Mon Nov 10, 2008 12:08 pm

Re: Time Out error

I think I have figured out the problem. I need to reduce global memory access in my code (which works fine on Sai's system even for a very large number of bodies).

Basically I need to change this portion of the kernel, where x, y, z and r are read from global memory:

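for (int iA = 0; iA < (*nBody); iA++)
{
    // Global index of the body handled by this thread.
    int iB = thid + blid*BLOCK_SIZE;

    // NOTE: the condition was cut off when the post was rendered ("if (iA {").
    // iA < iB is assumed here, so that each pair is tested only once.
    if (iA < iB)
    {
        // x, y, z and r are read from global memory on every iteration.
        centerDist = sqrt(pow(x[iA]-xP[thid],2) + pow(y[iA]-yP[thid],2) + pow(z[iA]-zP[thid],2));
        rAB = r[iA] + radius[thid];
        if (centerDist <= rAB)
        {
            atomicAdd(&collideID, 1);
            *nContacts = collideID;
        }
    }
}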
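
One common way to cut down the global memory reads described above is to stage the bodies through shared memory one tile at a time, so each value is loaded from global memory once per block rather than once per thread. The sketch below is only an illustration under assumptions, not the poster's final code: `countContactsTiled`, `countContactsHost` and the value of `BLOCK_SIZE` are made up here, each pair is assumed to be counted once via `iA < iB`, squared distances are compared to avoid `sqrt` and `pow`, and every thread accumulates locally and issues at most one `atomicAdd`. Error checks are left out for brevity.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // assumed value; must equal the launch block size

__global__ void countContactsTiled(const float *x, const float *y,
                                   const float *z, const float *r,
                                   int nBody, unsigned int *nContacts)
{
    __shared__ float sx[BLOCK_SIZE], sy[BLOCK_SIZE];
    __shared__ float sz[BLOCK_SIZE], sr[BLOCK_SIZE];

    int iB = blockIdx.x * BLOCK_SIZE + threadIdx.x;
    bool active = (iB < nBody);   // no early return: all threads must sync
    float xB = active ? x[iB] : 0.0f;
    float yB = active ? y[iB] : 0.0f;
    float zB = active ? z[iB] : 0.0f;
    float rB = active ? r[iB] : 0.0f;
    unsigned int local = 0;       // contacts found by this thread

    for (int tile = 0; tile < nBody; tile += BLOCK_SIZE) {
        // Each thread loads one body of the current tile into shared memory.
        int j = tile + threadIdx.x;
        if (j < nBody) {
            sx[threadIdx.x] = x[j];  sy[threadIdx.x] = y[j];
            sz[threadIdx.x] = z[j];  sr[threadIdx.x] = r[j];
        }
        __syncthreads();

        int limit = min(BLOCK_SIZE, nBody - tile);
        for (int k = 0; k < limit; ++k) {
            if (active && tile + k < iB) {   // count each pair once
                float dx = sx[k] - xB, dy = sy[k] - yB, dz = sz[k] - zB;
                float rAB = sr[k] + rB;
                if (dx * dx + dy * dy + dz * dz <= rAB * rAB)
                    ++local;
            }
        }
        __syncthreads();  // finish with this tile before overwriting it
    }
    if (local > 0)
        atomicAdd(nContacts, local);
}

// Hypothetical host wrapper: copies the data over, runs the kernel and
// returns the contact count.
unsigned int countContactsHost(const float *hx, const float *hy,
                               const float *hz, const float *hr, int nBody)
{
    size_t bytes = nBody * sizeof(float);
    const float *h[4] = { hx, hy, hz, hr };
    float *d[4];
    for (int i = 0; i < 4; ++i) {
        cudaMalloc((void **)&d[i], bytes);
        cudaMemcpy(d[i], h[i], bytes, cudaMemcpyHostToDevice);
    }
    unsigned int *dCount = NULL, hCount = 0;
    cudaMalloc((void **)&dCount, sizeof(unsigned int));
    cudaMemset(dCount, 0, sizeof(unsigned int));

    int blocks = (nBody + BLOCK_SIZE - 1) / BLOCK_SIZE;
    countContactsTiled<<<blocks, BLOCK_SIZE>>>(d[0], d[1], d[2], d[3],
                                               nBody, dCount);
    cudaMemcpy(&hCount, dCount, sizeof(unsigned int), cudaMemcpyDeviceToHost);

    for (int i = 0; i < 4; ++i)
        cudaFree(d[i]);
    cudaFree(dCount);
    return hCount;
}
```

This reduces memory traffic but not the number of pair tests, which still grows with the square of the number of bodies.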
<<

Dan Negrut

Global Moderator

Posts: 833

Joined: Wed Sep 03, 2008 12:24 pm

Mon Nov 10, 2008 3:01 pm

Re: Time Out error

ok, Vikalp, i'm not going to look at the code you posted unless you come back and complain about it.
Dan
<<

MDme964uw

Newbie

Posts: 13

Joined: Fri Sep 05, 2008 10:32 am

Tue Nov 11, 2008 1:17 pm

Re: Time Out error

I rewrote the entire program last night (I am not sure the algorithm is fundamentally different, though).
I get the same error, which says that the kernel timed out.
I don't know what else to do. It works for 2^16 bodies. If nothing works by tonight, I will probably just submit what I have.
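
For what it is worth, "the launch timed out and was terminated" is the message the CUDA runtime gives when a single kernel runs longer than the display driver's watchdog allows (a few seconds on a GPU that also drives a display). An all-pairs test grows with the square of the number of bodies, so a rewrite with the same algorithm would hit the same limit. One possible workaround, sketched below purely as an assumption about what might help, is to split the work over several shorter launches. The names `countContactsRange` and `countContactsChunked`, the `float4` packing (x, y, z, radius in w) and the chunk size are all made up for this illustration, and error checks are omitted.

```cuda
#include <cuda_runtime.h>

// Each launch tests every body iB only against bodies iA in
// [aBegin, aEnd), so one launch does a bounded amount of work.
__global__ void countContactsRange(const float4 *bodies, int nBody,
                                   int aBegin, int aEnd,
                                   unsigned int *nContacts)
{
    int iB = blockIdx.x * blockDim.x + threadIdx.x;
    if (iB >= nBody)
        return;

    float4 b = bodies[iB];
    unsigned int local = 0;
    for (int iA = aBegin; iA < aEnd && iA < iB; ++iA) {  // each pair once
        float4 a = bodies[iA];
        float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        float rAB = a.w + b.w;
        if (dx * dx + dy * dy + dz * dz <= rAB * rAB)
            ++local;
    }
    if (local > 0)
        atomicAdd(nContacts, local);
}

// Hypothetical host driver: runs one short launch per chunk of iA values
// and accumulates the contact count on the device.
unsigned int countContactsChunked(const float4 *hostBodies, int nBody,
                                  int chunk)
{
    float4 *dBodies = NULL;
    unsigned int *dCount = NULL, hCount = 0;
    cudaMalloc((void **)&dBodies, nBody * sizeof(float4));
    cudaMalloc((void **)&dCount, sizeof(unsigned int));
    cudaMemcpy(dBodies, hostBodies, nBody * sizeof(float4),
               cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(unsigned int));

    int threads = 256;
    int blocks = (nBody + threads - 1) / threads;
    for (int aBegin = 0; aBegin < nBody; aBegin += chunk) {
        int aEnd = (aBegin + chunk < nBody) ? aBegin + chunk : nBody;
        countContactsRange<<<blocks, threads>>>(dBodies, nBody,
                                                aBegin, aEnd, dCount);
        // Wait for this chunk; an error from it would surface here.
        cudaDeviceSynchronize();
    }

    cudaMemcpy(&hCount, dCount, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(dBodies);
    cudaFree(dCount);
    return hCount;
}
```

A suitable `chunk` depends on the GPU and would have to be found by timing. Running on a card that is not attached to a display is another way around the watchdog.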
<<

Dan Negrut

Global Moderator

Posts: 833

Joined: Wed Sep 03, 2008 12:24 pm

Wed Nov 12, 2008 6:51 am

Re: Time Out error

sounds good, Makarand.  i will take a close look and if the code is correct and the problem is CUDA internal i'll grade your work accordingly.
dan
