Wednesday, April 30, 2008

Running PaCE via the TeraGrid Job Submission Service

To submit a PaCE job through the TeraGrid Job Submission Service (TJSS), you need:

-- input files (FASTA-format file(s) and a .cfg file) with valid URL(s)
-- the number of clusters

Here is sample Java code to access TJSS.
====================================================================
import javax.xml.namespace.QName;

import org.apache.axis2.AxisFault;
import org.apache.axis2.addressing.EndpointReference;
import org.apache.axis2.client.Options;
import org.apache.axis2.rpc.client.RPCServiceClient;
import org.ogce.jobsubmissionservice.databean.*;

public class SubmitPaCEJob{
public static void main(String[] args) throws AxisFault {
int option = 0;
String serviceLoc = "http://localhost:8080/axis2/services/JobSubmissionService";
String serviceMethod = "submitJob";
String myproxy_username = null;
String myproxy_passwd = null;

// parse the -s/-l/-p command-line flags (see usage() below)
for (int j = 0; j < args.length; j++){
String argVal = args[j];
if (argVal.equals("-s"))
option = 1;
else if (argVal.equals("-l"))
option = 2;
else if (argVal.equals("-p"))
option = 3;
else if (option > 0){
if (option == 1)
serviceLoc = argVal;
else if (option == 2)
myproxy_username = argVal;
else if (option == 3)
myproxy_passwd = argVal;
}
}

if (myproxy_username == null || myproxy_passwd == null){
usage();
return;
}

String [] inputFileString ={
"http://validURLS:8080/tmp/PaCEexample/Brassica_rapa.mRNA.EST.fasta.PaCE",
"http://validURLS:8080/tmp/PaCEexample/Phase.cfg"
};
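// Globus RSL for the job: an MPI job with 4 processes (count) on 2 hosts (hostCount).
// The strings in (arguments=...) are passed to the PaCE executable itself.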
String rslString =
"(jobtype=mpi)"+
"(count=4)"+
"(hostCount=2)"+
"(maxWallTime=00:15)"+
"(queue=DEBUG)"+
"(arguments= Brassica_rapa.mRNA.EST.fasta.PaCE 33316 Phase.cfg)";

String [] outputFileString = {
"estClust.33316.3.PaCE",
"ContainedESTs.33316.PaCE",
"estClustSize.33316.3.PaCE",
"large_merges.33316.9.PaCE"};

try{
CondorJob cj = new CondorJob();
cj.setExecutable("/N/u/leesangm/BigRed/bin/PaCE_v9");
cj.setTransfer_input_files(inputFileString);
cj.setGrid_resource("gt2 gatekeeper.bigred.iu.teragrid.org/jobmanager-loadleveler");
cj.setTransfer_output_files(outputFileString);
cj.setGlobusrsl(rslString);
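// MyProxy settings: the service uses these to retrieve a delegated proxy credential for the job.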
cj.setMyProxyHost("myproxy.teragrid.org:7512");
cj.setMyProxyNewProxyLifetime("7200");
cj.setMyProxyCredentialName(myproxy_username);
cj.setMyProxyPassword(myproxy_passwd);
cj.setMyProxyRefreshThreshold("3600");


System.out.println(cj.toString());

RPCServiceClient serviceClient = new RPCServiceClient();

Options options = serviceClient.getOptions();

EndpointReference targetEPR = new EndpointReference(serviceLoc);

options.setTo(targetEPR);

QName query = new QName("http://jobsubmissionservice.ogce.org/xsd", serviceMethod);
Class [] returnTypes = new Class []{JobMessage[].class};
Object[] queryArgs = new Object[] {cj};
Object [] response = serviceClient.invokeBlocking(query,queryArgs,returnTypes);
JobMessage[] result = (JobMessage[])response[0];

System.out.println(result[0].toString());
}catch (Exception e){
e.printStackTrace();
}
}


private static void usage(){
System.out.println("Usage: submit_job -s \n"+
"-l \n"+
"-p \n"+
"==========================================================="+
"\n[Example]:\n"+
"submit_job "+
"-s http://localhost:8080/axis2/services/JobSubmissionService "+
"-l yourusername "+
"-p yourpassword ");
return;
}
}
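
The service also exposes job-management operations (GetStatus, GetError, GetLog; see the April 23 post below), and the same RPC pattern applies. Here is a minimal sketch assuming a getStatus operation that takes the job ID from the submitJob response; the operation and argument names are assumptions, so check the service WSDL for the actual signature.
====================================================================
// Sketch only: "getStatus" and its argument are assumptions, not a confirmed API.
QName statusQuery = new QName("http://jobsubmissionservice.ogce.org/xsd", "getStatus");
Object[] statusArgs = new Object[]{ jobId }; // jobId: hypothetical, read from the submitJob JobMessage
Object[] statusResponse = serviceClient.invokeBlocking(statusQuery, statusArgs,
new Class[]{ JobMessage[].class });
JobMessage[] status = (JobMessage[]) statusResponse[0];
System.out.println(status[0].toString());
====================================================================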

Tuesday, April 29, 2008

Running PaCE on BigRed

PaCE is software for clustering large collections of Expressed Sequence Tags (ESTs). BigRed provides the PaCE package for internal use only. Here is an example of a Condor-G job submission to BigRed for the PaCE package. Note: I set the "OutputFolder" parameter in Phase.cfg to ".", which lets the GRAM job manager put all of the output files into the Globus scratch directory.
============================================================

executable = /N/u/leesangm/BigRed/bin/PaCE_v9
transfer_executable = false
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
transfer_input_files = /home/leesangm/EST/data/Brassica_rapa.mRNA.EST.fasta.PaCE, /home/leesangm/EST/data/Phase.cfg
universe = grid
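# gt2 = pre-WS GRAM; the gatekeeper hands the job to BigRed's LoadLeveler job manager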
grid_resource = gt2 gatekeeper.bigred.iu.teragrid.org/jobmanager-loadleveler
transfer_output_files = estClust.33316.3.PaCE
error = PaCE.err.$(Cluster)
output = PaCE.out.$(Cluster)
log = PaCE.log.$(Cluster)
x509userproxy = /tmp/x509up_u500

globusrsl = (jobtype=mpi)(queue=DEBUG)(maxWallTime=00:15)\
(count=4)\
(hostCount=2)\
(arguments= 'Brassica_rapa.mRNA.EST.fasta.PaCE' '33316' 'Phase.cfg')

queue

Wednesday, April 23, 2008

Sample code: Access Job Submission Service from PHP page

If you have installed WSF/PHP with your PHP server, you can try these simple test PHP pages to access the job submission service. Please make sure the SOAP body follows the style defined in the WSDL file. In my case, I could easily get the XML string from my SOAP monitor. Alternatively, you can simply download my examples and use them as templates.

(1)Example of the SubmitJob operation
source code(mpi job)
source code(perl job)
Please note that you have to replace the username and password with your valid TeraGrid account.

(2)Example of the job management operations
source code(GetStatus)
source code(GetError)
source code(GetLog)

(3)Example of the retrieving output operation
source code(GetOutput)

Access to a Web Service from your PHP page

If you want to access an Axis2-based Web Service, such as the OGCE local services (the Job Submission Service and the File Agent Service), from your PHP page, there are several ways to do it. NuSOAP and PEAR SOAP provide Web Service APIs, and PHP also has built-in SOAP libraries.

Since I'm a complete beginner with PHP, I'm not that familiar with the advanced design issues of PHP-based applications. However, WSO2's WSF/PHP was a good starting point for me. In particular, it gave us a proof of concept of our service interacting with PHP pages through the standard WSDL interface.

You can use WSF/PHP both to build a Web Service and to build a client of a Web Service. In my case, I wanted to create a PHP client accessing a Web Service running in the Axis2 container. WSF/PHP provides quite a simple interface to PHP clients; for examples, please refer to the next blog message. Basically, I needed to provide the EPR of the service and the XML string that goes into the SOAP body. Besides the ease of use, I was happy that WSF/PHP lets users take control of the SOAP message when needed (such as choosing the SOAP version).

I've tried WSF/PHP with PHP 5.1.1 on a Linux box. Here is a step-by-step guide to the installation.

Step 1. Install the Apache HTTP server (if you don't have one already).
Download and install PHP 5.1.1.
Download the WSO2 WSF/PHP source from the project web site.

Step 2. Go to the directory of the WSF/PHP source code and build it:
./configure
make
make install

Step 3. In php.ini (it will be in /usr/local/lib/php.ini if you didn't change the location), add the following lines:

extension=wsf.so
extension=xsl.so
extension_dir="/usr/local/lib/php/extensions/no-debug-non-zts-***".
include_path = "/home/username/php/wso2-wsf-php-src-1.2.1/script"

Step 4. Copy the sample code included in the code distribution to the Web server's document root, then test http://localhost/samples/.

Tuesday, March 11, 2008

Running AMBER-pbsa on BigRed:[4] Parallel pmemd through condorG

There were a few obstacles to submitting AMBER jobs through the Condor-G system.

(1) LoadLeveler-specific commands
BigRed uses LoadLeveler as its batch system. Compared to PBS or LSF, LoadLeveler has some distinctive keywords, such as class instead of queue, and it uses a machine list file. But we didn't have any problems with these when submitting jobs through Condor-G: the machine list file is generated and used automatically by the job script that the job manager creates.

(2) Passing arguments
AMBER takes its input/output/reference files as command-line arguments, and you have to include those arguments in the globusrsl string. The Condor keyword "arguments" is meant to carry the arguments for mpirun; however, it didn't really work for mpirun arguments either.

(3) Specifying the number of processes and machines
LoadLeveler requires commands that set the number of nodes and processes, such as node and tasks_per_node, and mpirun provides the -np argument to specify the total number of processes for the job. To pass the right values to both, you have to specify them in the globusrsl string: "count" maps to the -np value for mpirun, and "hostCount" maps to the "node" value in the job script generated by the job manager. I could not find how to specify tasks_per_node directly; however, the job manager somehow derived it from the values of node and -np (with count=16 and hostCount=4 below, that works out to 4 tasks per node).
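
The same values can also be set programmatically through the TJSS CondorJob bean from the April 30 post above. Here is a minimal sketch mirroring the script in (4) below; it assumes the bean setters behave exactly as in that post.
====================================================================
// Sketch only: mirrors the Condor-G script in (4) using the CondorJob bean.
CondorJob cj = new CondorJob();
cj.setExecutable("/N/soft/linux-sles9-ppc64/amber9-ibm-64/exe/pmemd.MPI");
cj.setGrid_resource("gt2 gatekeeper.bigred.iu.teragrid.org/jobmanager-loadleveler");
cj.setGlobusrsl(
"(jobtype=mpi)" +
"(count=16)" +     // mapped to mpirun -np 16
"(hostCount=4)" +  // mapped to LoadLeveler node = 4, giving 4 tasks per node
"(maxWallTime=00:15)" +
"(queue=DEBUG)" +
"(arguments= -O -i amber_min.in -o min.out -c ZINC04273785_ini.crd" +
" -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd)");
====================================================================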

(4) script
This is the Condor-G script for this job submission.
========================================================================
executable = /N/soft/linux-sles9-ppc64/amber9-ibm-64/exe/pmemd.MPI
transfer_executable = false
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
transfer_input_files = /home/leesangm/bio/mm_pbsa/amber_min.in, /home/leesangm/bio/mm_pbsa/ZINC04273785_ini.crd, /home/leesangm/bio/mm_pbsa/ZINC04273785_com.top, /home/leesangm/bio/mm_pbsa/ZINC04273785.crd, /home/leesangm/bio/mm_pbsa/ZINC04273785_ini.crd

universe = grid
grid_resource = gt2 gatekeeper.bigred.iu.teragrid.org/jobmanager-loadleveler
transfer_output_files = min.out.$(Cluster), ZINC04273785.crd
error = amber.err.$(Cluster)
log = amber.log.$(Cluster)
x509userproxy = /tmp/x509up_u500

globusrsl = (jobtype=mpi)\
(count=16)\
(hostCount=4)\
(maxWallTime=00:15)\
(queue=DEBUG)\
(arguments= -O -i amber_min.in -o min.out.$(Cluster) -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd )


queue

=========================================================================

Friday, March 7, 2008

Running AMBER-pbsa on SDSC machines:[1] Serial Job interactively

Good news! There are AMBER installations on three of the SDSC machines: DataStar, BlueGene, and TeraGrid. I could not run the serial example on BlueGene, because there was no "sander" executable under the amber9 installation. However, here are some guidelines for running AMBER on SDSC machines.
All of the machines keep the AMBER installation under a directory with the same name structure:
/usr/local/apps/amber9
Therefore, just set the environment and run the same command on each account.


leesangm/amber> setenv AMBERHOME /usr/local/apps/amber9
leesangm/amber> set path = ( $path $AMBERHOME/exe )
leesangm/amber> $AMBERHOME/exe/sander -O -i amber_min.in -o min.out -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd

Running AMBER-pbsa on NCSA machines:[1] Serial Job interactively

Running on tungsten and cobalt took much longer than I expected (tungsten: 55 minutes, cobalt: 32 minutes). It takes longer than on BigRed.

1. tungsten
Set environment variables,
[leesangm@tund mm_pbsa]$ setenv AMBERHOME /usr/apps/chemistry/AMBER/Amber9/amber9
[leesangm@tund mm_pbsa]$ set path = ( $path $AMBERHOME/exe )
Then get the same AMBER test package and run the command:
$AMBERHOME/exe/sander -O -i amber_min.in -o min.out -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd

2. cobalt
Set environment variables,
[leesangm@tund mm_pbsa]$ setenv AMBERHOME /usr/apps/chemistry/amber/amber9/amber9
[leesangm@tund mm_pbsa]$ set path = ( $path $AMBERHOME/exe )
Then get the same AMBER test package and run the command:
$AMBERHOME/exe/sander -O -i amber_min.in -o min.out -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd