While assisting some customers at a High Performance Computing event, I needed to remember how to debug an MPI application. See, when you create distributed applications that run on several computers (nodes), you need special tools to debug them. Think about it: you want to have a centralized Visual Studio instance and be able to debug each process within the same IDE. Even though the idea sounds demented, the implementation is actually quite simple, provided that you follow the steps carefully. Let's get started.
This is a lengthy tutorial, so it will most likely be split into several parts. Edit: It is now a two-part tutorial; Part 2 is found here.
Step 1: Install the Remote Debugger
You need to install the Remote Debugger on EACH of the nodes that will run the application you are trying to debug. The remote debugger is included on the Visual Studio 2005 distribution media in the “\vs\Remote Debugger\x64” folder.
You need to install it on each of the compute nodes (and on the head node if it is going to be working as a compute node). Once you install it, make sure you fire it up so that it will be awaiting connections.
You need to use the x64 remote debugger. Distributed applications on Windows Server 2003 Compute Cluster Edition NEED to be 64-bit if you would like to debug them with mpishim.
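If you are not sure whether a given build is really 64-bit, a quick check can be sketched in Python. This is not part of the original setup and does not use any Visual Studio tooling; it simply reads the machine field in the executable's PE header. The function name `pe_machine` is made up for illustration.

```python
import struct

# Machine values from the PE/COFF header that matter here.
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64"}

def pe_machine(path):
    """Return "x86", "x64", or the raw machine value (hex) of a PE executable."""
    with open(path, "rb") as f:
        dos_header = f.read(0x40)
        if len(dos_header) < 0x40 or dos_header[:2] != b"MZ":
            raise ValueError("not a DOS/PE executable")
        # Offset 0x3C of the DOS header holds the file offset of the PE header.
        (pe_offset,) = struct.unpack_from("<I", dos_header, 0x3C)
        f.seek(pe_offset)
        pe_start = f.read(6)
    if len(pe_start) < 6 or pe_start[:4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 2-byte machine field follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", pe_start, 4)
    return MACHINE_NAMES.get(machine, hex(machine))
```

Calling `pe_machine` on your application's .exe should return "x64" for a build that can be debugged with the x64 remote debugger.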
Step 2: Make mpishim Easily Accessible
When you install the remote debugger, mpishim is installed along with it. Mpishim is the binary responsible for launching the processes on each of the nodes for debugging. The default location for mpishim is "C:\Program Files\Microsoft Visual Studio 8\Common7\IDE\Remote Debugger\x64". The trick here is to copy all the binaries from that x64 folder to a place that is easier to specify (such as c:\windows\system32). By doing so, you do not need to specify the whole path of mpishim when modifying the project's debugging properties (which will be done later on).
Furthermore, you want to make sure that you copy mpishim to the same location on each compute node. That is, if you copied mpishim to c:\windows\system32 on Node 1, then you must copy it to that exact same directory on the rest of the nodes as well.
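If you have more than a couple of nodes, the copying can be scripted. Here is a minimal sketch in Python, assuming you can reach each node's target directory from one machine (for example through an administrative share). The function names and the node names in the comment are hypothetical; adjust them to your cluster.

```python
import shutil
from pathlib import Path

def copy_all_files(src_dir, dest_dir):
    """Copy every regular file in src_dir into dest_dir; return the copied names."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file():  # subfolders are skipped
            shutil.copy2(item, dest / item.name)
            copied.append(item.name)
    return copied

def copy_to_each(src_dir, dest_dirs):
    """Copy the same set of files into every destination directory."""
    return {dest: copy_all_files(src_dir, dest) for dest in dest_dirs}

# Example call (hypothetical node names, assumes the c$ share is reachable):
#   copy_to_each(
#       r"C:\Program Files\Microsoft Visual Studio 8\Common7\IDE\Remote Debugger\x64",
#       [r"\\NODE1\c$\windows\system32", r"\\NODE2\c$\windows\system32"],
#   )
```

The sketch copies the whole folder rather than mpishim alone, so every node ends up with the same files in the same directory.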
It is a good idea to copy all of the files within that directory in order to avoid missing a dependency that mpishim may have.
Step 3: Modify the Registry
Cmd.exe has an issue with UNC paths: by default it refuses to use one as its current directory. MPI debugging relies on these paths, so just to be safe and make sure nothing breaks, carry out the following modification on each of the cluster nodes. Access the following registry key:
HKEY_CURRENT_USER\Software\Microsoft\Command Processor
Add a DWORD entry entitled “DisableUNCCheck” and set the value to 1.
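If you would rather not click through regedit on every node, the same change can be sketched with Python's standard winreg module. This is just one way to do it, assuming Python is available on the node; the function name `disable_unc_check` is made up for illustration.

```python
import sys

# Key and value named in this step (under HKEY_CURRENT_USER).
KEY_PATH = r"Software\Microsoft\Command Processor"
VALUE_NAME = "DisableUNCCheck"

def disable_unc_check():
    """Create the DisableUNCCheck DWORD and set it to 1 (Windows only)."""
    import winreg  # standard library, only present on Windows
    # CreateKey opens the key, creating it first if it does not exist yet.
    with winreg.CreateKey(winreg.HKEY_CURRENT_USER, KEY_PATH) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, 1)

if __name__ == "__main__":
    if sys.platform == "win32":
        disable_unc_check()
    else:
        print("This script only makes sense on a Windows node.")
```

Since the key lives under HKEY_CURRENT_USER, the change applies only to the account that runs the script, so run it as the user whose jobs will be debugged.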
That about covers the first half. In my next post I will cover what needs to be done at the scheduler and Visual Studio level. Read the second part in this link.