Tuesday, March 11, 2008

Running AMBER-pbsa on BigRed:[4] Parallel pmemd through condorG

There were a few obstacles to submitting Amber jobs through the condorG system.

(1) LoadLeveler-specific commands
BigRed uses LoadLeveler as its batch system. Compared to PBS or LSF, LoadLeveler has some distinctive keywords, such as class instead of queue, and it uses a machine list file. We did not have any problems with these when submitting jobs through condorG: the machine list file is generated and used automatically by the job script that the job manager creates.

(2) passing arguments
Amber takes its input, output, reference, and other files as command-line arguments. You have to include those arguments in the globusrsl string. The Condor keyword "arguments" appears to relate to the arguments for mpirun rather than to those for the Amber executable. However, it did not really work for mpirun arguments either.
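As a small illustration, the arguments clause of the globusrsl string can be assembled from the Amber options and file names used in this post. This is only a sketch in Python; the helper name rsl_arguments is hypothetical, it just builds text, and it is not part of Condor or Globus.

```python
def rsl_arguments(flags):
    """Build the (arguments= ... ) clause of a globusrsl string.

    `flags` is a list of (option, value) pairs. Use None as the value
    for options that take no value, such as -O.
    """
    parts = []
    for option, value in flags:
        parts.append(option)
        if value is not None:
            parts.append(value)
    return "(arguments= " + " ".join(parts) + " )"


# Options and file names taken from the submit script in this post.
amber_flags = [
    ("-O", None),
    ("-i", "amber_min.in"),
    ("-o", "min.out.$(Cluster)"),
    ("-c", "ZINC04273785_ini.crd"),
    ("-p", "ZINC04273785_com.top"),
    ("-r", "ZINC04273785.crd"),
    ("-ref", "ZINC04273785_ini.crd"),
]

clause = rsl_arguments(amber_flags)
print(clause)
```

The printed clause can be pasted as the last line of the globusrsl entry in the submit file. $(Cluster) is left as-is so that Condor expands it at submit time.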

(3) specifying the number of processes and machines
LoadLeveler requires keywords for setting the number of nodes and processes, such as node and tasks_per_node. mpirun also provides the -np argument to specify the total number of processes used for the job. To pass the right values to LoadLeveler and mpirun, you have to specify them in the globusrsl string. "count" in the globusrsl string is mapped to the value of -np for mpirun, and "hostCount" is mapped to the value of "node" in the job script generated by the job manager. I could not find a way to specify tasks_per_node directly. However, the job manager somehow generated tasks_per_node based on the values of node and -np.
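The rule the job manager uses to derive tasks_per_node was not confirmed. A plausible guess is that it divides count by hostCount. The Python sketch below only illustrates that assumption; the function name guess_tasks_per_node is hypothetical and does not come from the job manager.

```python
def guess_tasks_per_node(count, host_count):
    """Guess tasks_per_node as count / hostCount.

    Assumption: processes are spread evenly over the nodes. The actual
    rule used by the job manager was not confirmed.
    """
    if count <= 0 or host_count <= 0:
        raise ValueError("count and host_count must be positive")
    if count % host_count != 0:
        raise ValueError("count is not a multiple of host_count")
    return count // host_count


# Values used in the submit script of this post: count=16, hostCount=4.
tasks = guess_tasks_per_node(16, 4)
print(tasks)
```

Under this assumption, choosing count as a multiple of hostCount keeps the per-node value unambiguous.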

(4) script
This is the condorG script for this job submission.
========================================================================
executable = /N/soft/linux-sles9-ppc64/amber9-ibm-64/exe/pmemd.MPI
transfer_executable = false
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
transfer_input_files = /home/leesangm/bio/mm_pbsa/amber_min.in, /home/leesangm/bio/mm_pbsa/ZINC04273785_ini.crd, /home/leesangm/bio/mm_pbsa/ZINC04273785_com.top, /home/leesangm/bio/mm_pbsa/ZINC04273785.crd

universe = grid
grid_resource = gt2 gatekeeper.bigred.iu.teragrid.org/jobmanager-loadleveler
transfer_output_files = min.out.$(Cluster), ZINC04273785.crd
error = amber.err.$(Cluster)
log = amber.log.$(Cluster)
x509userproxy = /tmp/x509up_u500

globusrsl = (jobtype=mpi)\
(count=16)\
(hostCount=4)\
(maxWallTime=00:15)\
(queue=DEBUG)\
(arguments= -O -i amber_min.in -o min.out.$(Cluster) -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd )


queue

=========================================================================
