When you submit a job to run on a Compute Cluster Server, you will find some information about the running tasks in the bottom pane. Information such as error output, task name, and so on is shown, but one vital piece of information that should be shown by default (IMHO) is not: which nodes are running the current task?
Luckily, this can easily be solved by right-clicking on the column headers, selecting Add/Remove Columns and adding the Allocated Nodes column. This will make it easier to know where to look for output. The following clip shows how it is done (BTW, if you have not checked out Jing, make sure you do, it's amazing):
Oops...nevermind, apparently our Community Server blog cannot embed objects correctly :(
Check out the video in this link:
Finding Nodes of your Job
To recap, in my last post we went through some of the steps that need to be taken when debugging an MPI application, namely:
- Install the x64 remote debugger
- Copy mpishim to an accessible location
- Modify the registry to avoid UNC path problems in the future
Let's go ahead and finish the rest of the steps in order to debug an MPI application.
Step 4: Configure an Empty Job with the Job Scheduler
The job scheduler is a utility by which all jobs that are submitted to the cluster are managed. If you want to have something done on the cluster for you, then you need to use the job scheduler. Debugging is no exception, as you need to create an empty job that will host your debugging application.
To get started, open the job scheduler and from the File menu, select Submit Job:
Name your job "Debugging Job" and move over to the Processors tab. Select the number of processors you would like to use for this job and then (this is actually quite important), check the box that says "Run Job until end of run time or until cancelled". Failure to check this box will cause the empty job to run and finish - which is not what we want. We want the job to continually run, so that Visual Studio will then attach the running processes to this specific job. Don't forget to mark this!:
Next, you need to move to the Advanced tab and select which nodes will be part of your debugging scheme. In this case, I will only use 2 nodes, namely Kim03a (the head node) and Kim02a:
Click on Submit Job and you should see your job running. Make sure you write down the ID of the job (in this case, it is 3), as you will need this info later on!
Step 5: Configure Visual Studio
Open Visual Studio and the project you are working on. Go to the project properties and access the Debugging section. From there, instead of the Local Debugger, select MPI Cluster Debugger:
The following screenshot shows my debugger properties window with all necessary values filled in:
Let's go ahead and talk about each of these values:
MPIRun Command: This needs to be mpiexec for MPI applications.
MPIRun Arguments: The first argument, "-job 3.0", specifies which job in the scheduler to use. In my case, the job ID was 3 when I created the job, and the 0 specifies the task, which every job has by default. We then have "-np 2", which specifies the number of processes to launch (2 in this case, matching my 2 nodes). Finally, you see I have "-machinefile \\kim03a\bin\machines.txt". The "-machinefile" argument is used to specify the UNC location of a text file that contains the names of the machines that will be part of this job. The text file should have one machine name per line.
MPIRun Working Directory: Use this location to specify the path where any output will be written to. Remember NOT to use local paths but rather UNC paths, to make sure that this location is available to every node.
Application Command: This is the UNC path to the MPI application that you would like to debug. This application HAS to be compiled as 64-bit, and its debugging symbols should be in that same directory as well.
MPIShim Location: In this location, specify the path to the mpishim.exe binary that you copied in step 2 of this tutorial. Remember, mpishim should exist on each and every one of the machines at the specified local path.
MPI network security mode: I usually change it to "Accept connections from any address" to avoid problems.
You probably also noticed that there is an Application Arguments row. Here you would specify any additional arguments you would like to pass to the application.
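If you prefer to generate the machine file and the MPIRun Arguments string rather than type them by hand, here is a minimal Python sketch of the idea. The function names are my own (hypothetical), and the values are simply the ones used in this walkthrough: job 3, task 0, "-np 2", and the machine file on the kim03a share.

```python
from pathlib import Path


def write_machine_file(path, machines):
    """Write one machine name per line, the layout the -machinefile text file uses."""
    Path(path).write_text("\n".join(machines) + "\n")


def mpirun_arguments(job_id, np, machinefile, task_id=0):
    """Assemble the string that goes into the MPIRun Arguments field.

    job_id and task_id form the "-job <job>.<task>" argument; task 0 is the
    task every job has by default.
    """
    return f"-job {job_id}.{task_id} -np {np} -machinefile {machinefile}"


# Values from this walkthrough (adjust to your own job ID and share).
args = mpirun_arguments(3, 2, r"\\kim03a\bin\machines.txt")
print(args)
```

To create the file itself you would call something like write_machine_file with the UNC path and the list ["kim03a", "kim02a"], then paste the printed string into the MPIRun Arguments row.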
Apply the settings, hit F5 and you should be ready to go and debug your processes. While trying to get this to work, I experienced pretty much every error out there, so post in the comments if you have any issues and I will help you resolve them. Happy debugging!
The Compute Cluster Pack can be downloaded from Microsoft's site; however, it is not as trivial as it sounds. These steps will hopefully make it easier to obtain the bits:
- Go to http://www.microsoft.com/windowsserver2003/ccs/default.aspx
- Click on the Get the Trial Software link:
- Click on the big blue button that says Get Started Today
- Sign in to Microsoft
- Select your country from the list
- Fill out the information that is being requested
- Review your order total (a whopping $0.00), agree to the terms and conditions and click Place Order
- You will get a receipt and can now click on the link to go to the installation instructions
- You will then be presented with the option to download the Compute Cluster Pack:
Enjoy!
While assisting some customers at a High Performance Computing Event, I had the need to remember how to debug an MPI application. See, when you create distributed applications that will run on various computers (nodes), you need to use special tools to debug them. Think about it: you want to have a centralized Visual Studio instance and be able to debug each process within the same IDE. Even though the idea sounds demented, the implementation is actually quite simple, provided that you follow the steps carefully. Let's get started.
This is a lengthy tutorial, so it will most likely be split into several parts.
Edit: It is now a 2-part tutorial; Part 2 is found here.
Step 1: Install the Remote Debugger
You need to install the Remote Debugger on EACH of the nodes that will run the application you are trying to debug. The remote debugger is included on the Visual Studio 2005 distribution media within the “\vs\Remote Debugger\x64” folder.
You need to install it on each of the compute nodes (and on the head node if it is going to be working as a compute node). Once you install it, make sure you fire it up so that it will be awaiting connections.
You need to use the x64 remote debugger. Distributed applications on Windows Server 2003 Compute Cluster Edition NEED to be 64-bit if you would like to debug them with mpishim.
Step 2: Make mpishim Easily Accessible
When you install the remote debugger, mpishim is installed. Mpishim is the binary responsible for launching the processes on each of the nodes for debugging. The default location for mpishim is "C:\Program Files\Microsoft Visual Studio 8\Common7\IDE\Remote Debugger\x64". The trick here is to copy all those binaries from that x64 folder to a place that is easier to specify (such as c:\windows\system32). By doing so, you do not need to specify the whole path of mpishim when modifying the project properties debug info (which will be done later on).
Furthermore, you want to make sure that you copy mpishim to the same location on EACH compute node. That is, if you copied mpishim to c:\windows\system32 on Node 1, then you must copy it to the exact same directory on the rest of the nodes as well.
It is a good idea to copy all of the files within that directory in order to avoid missing on a dependency that mpishim may have.
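Copying the same set of files to the same directory on every node is easy to get wrong by hand, so here is a minimal Python sketch of that step. It is only an illustration: the function name is my own, and the commented usage assumes each node exposes its C: drive through an administrative share (\\node\c$), which may not be the case on your cluster.

```python
import shutil
from pathlib import Path


def copy_debugger_files(source_dir, dest_dirs):
    """Copy every file in source_dir into each destination directory.

    All files are copied (not just mpishim.exe) so that no dependency is
    missed. Returns the sorted list of copied file names.
    """
    files = [f for f in Path(source_dir).iterdir() if f.is_file()]
    for dest in dest_dirs:
        for f in files:
            shutil.copy2(f, Path(dest) / f.name)
    return sorted(f.name for f in files)


# Hypothetical usage; node names and admin shares are assumptions:
#   source = r"C:\Program Files\Microsoft Visual Studio 8\Common7\IDE\Remote Debugger\x64"
#   dests = [rf"\\{node}\c$\windows\system32" for node in ("kim03a", "kim02a")]
#   copy_debugger_files(source, dests)
```

Because every destination is built from the same relative path, mpishim ends up in the exact same directory on each node, which is what the debugger properties expect.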
Step 3: Modify the Registry
Cmd.exe has an issue with UNC paths. MPI debugging relies on these paths, so just to be safe and make sure nothing breaks, carry out the following modification on each of the cluster nodes. Access the following registry key:
HKEY_CURRENT_USER\Software\Microsoft\Command Processor
Add a DWORD entry entitled “DisableUNCCheck” and set the value to 1:
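Since the same change has to be made on several machines, it can be handy to keep it in a .reg file instead of editing the registry by hand each time. The following Python sketch just produces the text of such a file for the key and value described above; the function name is my own, and writing the result to a file (for example disable_unc_check.reg, a hypothetical name) is left to you.

```python
def disable_unc_check_reg():
    """Return the text of a .reg file that sets DisableUNCCheck to 1.

    The key and value name are the ones from this step; dword:00000001
    is the .reg notation for a DWORD with value 1.
    """
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        r"[HKEY_CURRENT_USER\Software\Microsoft\Command Processor]",
        '"DisableUNCCheck"=dword:00000001',
        "",
    ]
    return "\n".join(lines)


print(disable_unc_check_reg())
```

Keep in mind that HKEY_CURRENT_USER is per user, so the file would need to be merged while logged in as the account that will run the debugging job.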
That about covers the first half. In my next post I will cover what needs to be done at the scheduler and Visual Studio level. Read the second part in this link.