mpiexec/mpirun examples for Euler
Below are some examples of how to use MPI on Euler. It is assumed that these are performed inside a Torque job (interactive or batch) with the appropriate resources requested (nodes=m:ppn=n).
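For example, a two-node interactive job with eight cores per node might be requested like this (queue names and limits are site-specific, so adjust to your allocation):

```shell
# Request 2 nodes with 8 processors per node (m=2, n=8), interactively
qsub -I -l nodes=2:ppn=8
```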
Use all resources currently available within the job. Starts m*n processes.
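A minimal invocation for this case might look like the following, where my_program is a placeholder for your own executable:

```shell
# With Torque integration, mpiexec discovers the job's node list itself
# and starts one process per allocated slot (m*n processes total)
mpiexec ./my_program
```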
Start x processes, where x <= m*n. See the manpage for how to control where these run.
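For instance, to start exactly four processes (my_program is again a placeholder):

```shell
# Start 4 processes; 4 must be <= m*n, the total slots in the job
mpiexec -n 4 ./my_program
```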
Use y nodes, each with x processes, where y <= m && x <= n. Starts x*y processes.
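With OpenMPI's mpirun, this layout can be sketched roughly as below; the exact option name for per-node placement varies between MPI implementations and versions, so check your manpage before relying on it:

```shell
# 2 processes on each of 4 nodes = 8 processes total (x=2, y=4)
mpirun -npernode 2 -n 8 ./my_program
```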
See the mpiexec manpage (man mpiexec) for many more options.
If you are asked for a password when trying to run MPI jobs, enable passwordless logins between Euler and the compute nodes. Run the following on Euler, making sure to not set a passphrase:
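A typical key setup looks like the following. This sketch assumes your home directory is shared between Euler and the compute nodes, which is common on clusters but worth confirming locally:

```shell
# Generate a key pair; press Enter at both passphrase prompts
ssh-keygen -t rsa
# Authorize the new key for login on any node sharing this home directory
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```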
Some other notes:
- Our current install of OpenMPI (sources provided by Mellanox) has some issues with Torque integration; use the --hostfile option to work around this.
- The InfiniBand cards in Euler's GPU nodes sit in PCIe 2.0 x4 slots (there are no x8 slots, and all the x16 slots hold GPUs). As such, the theoretical maximum achievable bandwidth is 2000 MB/s.
- We will be adding 15 more CPU nodes (same specs as Euler15) to Euler during the week of April 1. [edit: added. See the hint below.]
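The --hostfile workaround mentioned in the notes can be sketched as follows; Torque writes the job's node list to the file named in $PBS_NODEFILE:

```shell
# Bypass the broken Torque integration by passing the node list explicitly
mpirun --hostfile $PBS_NODEFILE -n 8 ./my_program
```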
Another helpful hint for getting slightly more consistent timings: force your jobs to be run on similar hardware by appending the :intel or :amd flags to your resource request. :intel will direct your jobs to our Intel Xeon 5520 nodes (the ones with the GPUs) while :amd will direct them to the AMD Opteron 6274 nodes (the ones with 64 cores each).
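For example, to pin an interactive two-node job to the Intel nodes (swap :intel for :amd to target the Opteron nodes instead):

```shell
# Same resource request as before, restricted to Intel Xeon 5520 nodes
qsub -I -l nodes=2:ppn=8:intel
```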