
Progress with GPU Cluster



Supreme Overlord

Posts: 37

Joined: Wed Sep 03, 2008 12:23 pm

Unread post Thu Apr 01, 2010 3:11 pm

Progress with GPU Cluster

Information on problems I encountered and any solutions I found will go here

Basic Outline [details to be added later]
(Restart Computer As Required):
Install Windows Server 2008 R2 on the Head Node
Activate Windows, apply updates as needed

Install Active Directory
Install HPC Cluster Manager
Install Remote Desktop

Add Nodes

Configuring Cluster:
Once all compute nodes have been added to the cluster, the next step is to install software.
Begin by going into Cluster Manager and creating a new group.
Call this group "Nodes" and add only the compute nodes (not the head node) to it.
This group will be used to specify that only the compute nodes should be used for a job.
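The same group can also be created from the HPC PowerShell instead of the Cluster Manager GUI. A sketch only: the cmdlet names below (New-HpcGroup, Get-HpcNode, Add-HpcGroup) and the NetBiosName property are my recollection of the HPC Pack 2008 R2 snap-in, so check them with Get-Command before relying on this.

```shell
# Sketch, assuming HPC Pack 2008 R2 cmdlet names -- verify locally.
# Create the empty node group on the head node:
New-HpcGroup -Name "Nodes" -Description "Compute nodes only"

# Add every node except the head node (SBELHPCHEAD) to the group:
Get-HpcNode | Where-Object { $_.NetBiosName -ne "SBELHPCHEAD" } |
    Add-HpcGroup -Name "Nodes"
```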

Installation of software will be done using the "clusrun" command from an HPC PowerShell prompt

clusrun /nodegroup:Nodes /workdir:\\SBELHPCHEAD\TestFolder Example.exe /s

clusrun               -  run a command on the cluster
/nodegroup:Nodes         -  the command will be run on the group "Nodes" that was just created
/workdir:\\SBELHPCHEAD\TestFolder   -  specifies the working directory the code will run from; note that SBELHPCHEAD is the name of the head node and the folder "TestFolder" is a shared directory
Example.exe            -  the code that the job will execute
/s               -  flag passed to the executable
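Before pushing any installers this way, it is worth confirming the group and clusrun are wired up correctly with a harmless command (hostname is just an example; any command on the nodes' PATH would do):

```shell
# Sanity check: each node in the "Nodes" group should print its own
# machine name and return 0 in the clusrun summary.
clusrun /nodegroup:Nodes hostname
```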

Installing CUDA:
Install the CUDA Toolkit and SDK on Head node
Install CUDA Tesla Server Driver on compute nodes:
clusrun /nodegroup:Nodes /workdir:\\SBELHPCHEAD\Downloads 197.03_Tesla_winserv2008R2_64bit_international_whql.exe /s

where: 197.03_Tesla_winserv2008R2_64bit_international_whql.exe is the driver to be installed
Note that in this case the working directory is called "Downloads" and is a shared folder

PS C:\Program Files\Microsoft HPC Pack 2008 R2\Bin> clusrun /nodegroup:Nodes /workdir:\\SBELHPCHEAD\Downloads 197.03_Tesla_winserv2008R2_64bit_international_whql.exe /s

-------------------------- COMPUTENODE003 returns 0 --------------------------
-------------------------- COMPUTENODE005 returns 0 --------------------------
-------------------------- COMPUTENODE006 returns 0 --------------------------
-------------------------- COMPUTENODE004 returns 0 --------------------------
-------------------------- COMPUTENODE001 returns 0 --------------------------
-------------------------- COMPUTENODE002 returns 0 --------------------------

-------------------------- Summary --------------------------
6 Nodes succeeded
0 Nodes failed
PS C:\Program Files\Microsoft HPC Pack 2008 R2\Bin>

Testing if CUDA works:
PS C:\Program Files\Microsoft HPC Pack 2008 R2\Bin> clusrun /nodes:Computenode001 /workdir:\\SBELHPCHEAD\Downloads\TestingCUDA deviceQuery

This job runs on only one of the compute nodes (COMPUTENODE001), since /nodes: names a specific node rather than a group
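Once deviceQuery works on a single node, the same check can be pushed to the whole group by swapping /nodes: for /nodegroup:, using the same shared working directory as above; any node that returns non-zero in the summary likely has a driver problem:

```shell
# Run the SDK's deviceQuery on every compute node at once and compare
# the per-node return codes in the clusrun summary.
clusrun /nodegroup:Nodes /workdir:\\SBELHPCHEAD\Downloads\TestingCUDA deviceQuery
```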

Note that the clusrun command can only be run by an admin.

When compiling programs for the cluster, make sure that the sm_13 GPU architecture is selected in the CUDA build rule; unexpected things (such as crashing programs) may happen if this is not done.
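When compiling outside Visual Studio, the equivalent of that build-rule setting is nvcc's -arch flag. A minimal sketch (kernel.cu is a placeholder file name, not from this cluster):

```shell
# Target compute capability 1.3 (Tesla C1060-class hardware) explicitly;
# omitting -arch may fall back to sm_10, which lacks double precision.
nvcc -arch=sm_13 kernel.cu -o kernel.exe
```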
Last edited by HammadM on Thu May 20, 2010 4:09 pm, edited 1 time in total.

Dan Negrut

Global Moderator

Posts: 833

Joined: Wed Sep 03, 2008 12:24 pm

Unread post Thu Apr 01, 2010 3:38 pm

Re: Progress with GPU Cluster

Hammad - please post the two emails that you sent out to Frank, and also the reply you got from him.
Thank you for starting this up.
