Monday, December 14, 2009

Advanced debugging of job submission to Swarm-Grid

These instructions help advanced users of Swarm-Grid track their submitted jobs.
(1) Checking the remote cluster
[Step 1] Log in to a machine that has Globus installed.
[Step 2] Get your proxy certificate:
myproxy-logon -s myproxy.teragrid.org -l your_user_name
[Step 3] Submit a test request to the remote cluster:
globusrun -o -r grid-co.ncsa.teragrid.org/jobmanager '&(executable=/bin/ls)'

* If this step succeeds, the remote cluster and the jobmanager on that cluster are working correctly.
* If myproxy-logon cannot be found, check that your Globus user environment is set up correctly:
source $GLOBUS_LOCATION/etc/globus-user-env.sh
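If you script these checks, it helps to fail fast when the environment is missing. A minimal sketch, assuming the standard Globus layout from the line above (the helper name `globus_env` is made up for illustration):

```shell
# Hypothetical helper: source the Globus user environment, failing fast
# if GLOBUS_LOCATION is unset. The etc/globus-user-env.sh path is the
# standard Globus layout; adjust for your installation.
globus_env() {
    if [ -z "$GLOBUS_LOCATION" ]; then
        echo "GLOBUS_LOCATION is not set" >&2
        return 1
    fi
    . "$GLOBUS_LOCATION/etc/globus-user-env.sh"
}
```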

(2) Checking the MySQL table directly
[Step 1] Log in to MySQL with the username and password for Swarm:
mysql -u jobsub -p
[Step 2] Select the database:
use jobsubmission;
[Step 3] Check the debugging record for your job:
select * from SubmitRecord where TicketID="213655869" AND InternalID=200;
* As soon as a job reaches a given stage, Swarm records the timestamp in the corresponding field. If a field's value is NULL, the job has not reached that stage yet.
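The NULL-means-not-reached rule can be sketched as a small shell helper that reports the last stage a job actually reached. The stage names below are purely illustrative, not Swarm's actual SubmitRecord columns (use DESCRIBE SubmitRecord in MySQL to see the real schema):

```shell
# Hypothetical sketch: given name=value pairs in the order the stages are
# reached, print the last stage whose timestamp is not NULL. Stage names
# here (Received, Submitted, ...) are examples, not real Swarm columns.
last_stage() {
    last="none"
    for pair in "$@"; do
        name=${pair%%=*}
        value=${pair#*=}
        [ "$value" != "NULL" ] && last=$name
    done
    echo "$last"
}
```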

(3) Globus Temporary Files and Locations
your_home
--.globus
----g1
------h6.bigred.teragrid.iu.edu
--------directories named with the actual Globus job numbers
----------remote_io_url
----------scheduler_loadleveler_job_script
----------stdout
----------x509_up
--Globus*** : temporary directory that keeps input, output, and other files; this is the current working directory for your script.

* The temporary directory is usually deleted after the job is processed. The layout above is for Big Red; other TeraGrid machines have similar structures for temporary files and unstaged files.
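To see what a job left behind, you can walk the .globus tree yourself. A sketch (the function name is made up; the default path matches the Big Red layout above):

```shell
# Sketch: list the directories under a .globus tree, as in the Big Red
# layout above. Pass a base directory, or default to ~/.globus.
list_globus_jobs() {
    base=${1:-$HOME/.globus}
    find "$base" -type d -print
}
```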
