
A general question about CUDA 4.0

Moderator: Dan Negrut


ME964KwangC

Newbie

Posts: 11

Joined: Tue Jan 25, 2011 12:03 pm

Thu May 05, 2011 3:27 pm


As you know, we learned CUDA 3.2 in class. The GPU is a lot faster than the CPU when the matrices being computed are really big (for example, 4096x4096).

However, when the matrices are very small, such as 32x32, the CPU is faster than the GPU because of the time spent copying data between the host and the GPU.
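One way to see this effect is to time the copies on their own. The following is a minimal sketch, not a benchmark: the helper name `roundTripMs` is made up for illustration, it uses ordinary pageable host memory, and it times a single host-to-device plus device-to-host round trip with CUDA events. Actual numbers depend on the machine, but a small transfer is typically dominated by fixed per-copy overhead rather than by the number of bytes moved.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: time one host->device->host round trip for an
// n x n float matrix. Returns milliseconds, or a negative value on error.
float roundTripMs(int n) {
    const size_t count = static_cast<size_t>(n) * n;
    const size_t bytes = count * sizeof(float);

    float *h = static_cast<float *>(malloc(bytes));
    if (!h) return -1.0f;
    float *d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) { free(h); return -1.0f; }
    for (size_t i = 0; i < count; ++i) h[i] = 1.0f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Only the two copies sit between the events; no kernel is launched.
    cudaEventRecord(start);
    cudaError_t e1 = cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    cudaError_t e2 = cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    free(h);
    return (e1 == cudaSuccess && e2 == cudaSuccess) ? ms : -1.0f;
}

int main() {
    const int sizes[] = {32, 4096};
    for (int n : sizes) {
        float ms = roundTripMs(n);
        printf("%4d x %4d: round trip %.3f ms\n", n, n, ms);
    }
    return 0;
}
```

Comparing the reported time for 32x32 against the time the CPU needs for the same small computation shows whether the copy alone already costs more than doing the work on the host.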

So in this case, I don't think CUDA is an effective approach. As far as I know, CUDA 4.0 supports unified virtual addressing (UVA), so host and device memory are unified and work as one single address space.

If I use CUDA 4.0, can I avoid the time wasted copying data between the GPU and the host? Do I understand unified virtual addressing correctly?

In my research, I sometimes need to compute with many different small matrices, so I don't think I would gain much from CUDA 3.2.

Does CUDA 4.0 solve this problem? Should I start learning CUDA 4.0?

Thank you for your attention to this matter.

Andrew Seidl

Administrator

Posts: 193

Joined: Thu Oct 28, 2010 11:54 am

Fri May 06, 2011 1:50 pm

Re: A general question about CUDA 4.0

Should you learn it? Yes. Does UVA solve the problem of it taking 'forever' to move data to the GPU? Unfortunately not. What it does is make it easier for you, the programmer, to move data between the GPU and CPU by making their memories appear as a single address space (you don't have to maintain two separate sets of pointers). It does not solve the hardware issue of the memories being physically separate. That issue is being addressed by projects such as AMD's Fusion.
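To make the distinction concrete, here is a minimal sketch of what UVA changes in code. It assumes a 64-bit system and a device that reports `unifiedAddressing`; the function name `uvaRoundTrip` and the `scale` kernel are made up for illustration. With UVA, `cudaMemcpy` can be called with `cudaMemcpyDefault` and the runtime infers the copy direction from the pointer values. The copies themselves still happen, so the transfer cost is still there.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: multiply every element by s.
__global__ void scale(float *a, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= s;
}

// Hypothetical helper: returns true if UVA is available and a 32x32
// matrix came back from the GPU correctly scaled.
bool uvaRoundTrip() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    if (!prop.unifiedAddressing) return false;  // no UVA on this setup

    const int n = 32 * 32;
    const size_t bytes = n * sizeof(float);

    float *h = nullptr, *d = nullptr;
    // Pinned host memory, so the runtime knows this pointer's location.
    if (cudaMallocHost(&h, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d, bytes) != cudaSuccess) { cudaFreeHost(h); return false; }
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // No explicit direction: it is inferred from the pointers.
    // The data is still physically copied to the device.
    cudaError_t e1 = cudaMemcpy(d, h, bytes, cudaMemcpyDefault);
    scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);
    // This blocking copy also waits for the kernel to finish.
    cudaError_t e2 = cudaMemcpy(h, d, bytes, cudaMemcpyDefault);

    bool ok = (e1 == cudaSuccess && e2 == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) ok = (h[i] == 2.0f);

    cudaFree(d);
    cudaFreeHost(h);
    return ok;
}

int main() {
    printf("%s\n", uvaRoundTrip() ? "ok" : "failed or UVA unavailable");
    return 0;
}
```

The only difference from the CUDA 3.2 style is the `cudaMemcpyDefault` argument in place of `cudaMemcpyHostToDevice` and `cudaMemcpyDeviceToHost`; the number of transfers is unchanged.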

As a side note, IBM's Cell BE processors (what the PS3 uses, and what Blue Waters will have) are somewhat similar to AMD Fusion. [For those still reading, this is based on my understanding of the architectures. It might be (probably is) wrong or oversimplified.] You have one main processor (called a PPE) and eight coprocessors (called SPEs). The main difference is that the SPEs are a bit beefier than the scalar processors in a GPU. Lastly, the 'official' OtherOS/Linux support on the PS3 only allowed you to use up to 6 SPEs at a time. Just a few days ago some people published a firmware that gives you direct hardware access, allowing use of all 8 SPEs (or 7, depending on whether you keep the 'test' one disabled).
