Tuesday, December 9, 2008
renewing proxy
#!/bin/bash
export GLOBUS_LOCATION=$HOME/globus-condor/globus
source $GLOBUS_LOCATION/etc/globus-user-env.sh
myproxy-logon -s myproxy.teragrid.org -l quakesim -t 5000 -S << EOF
PUT_PASSWORD_HERE
EOF
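One way to run this renewal periodically (assuming the script above is saved as, say, ~/bin/renew-proxy.sh and made executable; the path and schedule below are placeholders) is a cron entry:

# crontab entry: renew the proxy every 12 hours
0 */12 * * * $HOME/bin/renew-proxy.sh >> $HOME/renew-proxy.log 2>&1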
Monday, December 1, 2008
Writing a simple Java application with the HBase APIs
The code is mostly taken from the HBase site,
http://hadoop.apache.org/hbase/docs/r0.2.1/api/index.html
import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.Cell;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.HBaseConfiguration;
public class MySimpleTest {
public static void main(String args[]) throws IOException {
// You need a configuration object to tell the client where to connect.
// But don't worry, the defaults are pulled from the local config file.
HBaseConfiguration config = new HBaseConfiguration();
// This instantiates an HTable object that connects you to the "myTable"
// table.
HTable table = new HTable(config, "myTable");
// To do any sort of update on a row, you use an instance of the BatchUpdate
// class. A BatchUpdate takes a row and optionally a timestamp which your
// updates will affect.
BatchUpdate batchUpdate = new BatchUpdate("myRow");
// The BatchUpdate#put method takes a Text that describes what cell you want
// to put a value into, and a byte array that is the value you want to
// store. Note that if you want to store strings, you have to getBytes()
// from the string for HBase to understand how to store it. (The same goes
// for primitives like ints and longs and user-defined classes - you must
// find a way to reduce it to bytes.)
batchUpdate.put("myColumnFamily:columnQualifier1",
"columnQualifier1 value!".getBytes());
// Deletes are batch operations in HBase as well.
batchUpdate.delete("myColumnFamily:cellIWantDeleted");
// Once you've done all the puts you want, you need to commit the results.
// The HTable#commit method takes the BatchUpdate instance you've been
// building and pushes the batch of changes you made into HBase.
table.commit(batchUpdate);
// Now, to retrieve the data we just wrote. The values that come back are
// Cell instances. A Cell is a combination of the value as a byte array and
// the timestamp the value was stored with. If you happen to know that the
// value contained is a string and want an actual string, then you must
// convert it yourself.
Cell cell = table.get("myRow", "myColumnFamily:columnQualifier1");
String valueStr = new String(cell.getValue());
// Sometimes, you won't know the row you're looking for. In this case, you
// use a Scanner. This will give you cursor-like interface to the contents
// of the table.
Scanner scanner =
// we want to get back only "myColumnFamily:columnQualifier1" when we iterate
table.getScanner(new String[]{"myColumnFamily:columnQualifier1"});
// Scanners in HBase 0.2 return RowResult instances. A RowResult is like the
// row key and the columns all wrapped up in a single interface.
// RowResult#getRow gives you the row key. RowResult also implements
// Map, so you can get to your column results easily.
// Now, for the actual iteration. One way is to use a while loop like so:
RowResult rowResult = scanner.next();
while(rowResult != null) {
// print out the row we found and the columns we were looking for
System.out.println("Found row: " + new String(rowResult.getRow()) + " with value: " +
rowResult.get("myColumnFamily:columnQualifier1".getBytes()));
rowResult = scanner.next();
}
// The other approach is to use a foreach loop. Scanners are iterable!
for (RowResult result : scanner) {
// print out the row we found and the columns we were looking for
System.out.println("Found row: " + new String(result.getRow()) + " with value: " +
result.get("myColumnFamily:columnQualifier1".getBytes()));
}
// Make sure you close your scanners when you are done!
scanner.close();
}
}
Hadoop: java.io.IOException: Incompatible namespaceIDs
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
The complete error message was,
... ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: Incompatible namespaceIDs in /usr/local/hadoop-datastore/hadoop-hadoop/dfs/data: namenode namespaceID = 308967713; datanode namespaceID = 113030094
at org.apache.hadoop.dfs.DataStorage.doTransition(DataStorage.java:281)
at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:121)
at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:230)
at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:199)
at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:1202)
at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1146)
at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:1167)
at org.apache.hadoop.dfs.DataNode.main(DataNode.java:1326)
Friday, November 21, 2008
Test #4(2mil): Nov.21.2008
(1) Resource setup: BigRed(150), Ornl(80), Cobalt(80)
(2) Service setup: Status check interval: 60 secs
Job queue scan interval: 60 secs
Job queue size: 100
(3) Input files: 2mil.tar was copied to each cluster's $TG_CLUSTER_SCRATCH directory.
The input files are referred to by arguments giving their full paths.
(4) Output files: staged out to the swarm host
(5) Client side setup: input files are located on my desktop (the same machine as the swarm host).
The client scans the directory, finds files containing more than one sequence (using the Unix grep command through Java Runtime), and sends requests to swarm in batches of 10 per RPC call.
- Total duration of the submission: 170364307 milliseconds (around 47.3 hours).
- Total number of jobs submitted: 75533
- Total number of files scanned: 536825
(7) Held Jobs: To be added
(8) Open Issues:
Submission time needs to be improved.
Reasons:
- Loading 536825 objects representing the filenames takes too much memory. [Approach]: use a file filter and load a partial list at a time.
- Using Java Runtime: Runtime requires extra memory to execute the system fork. [Approach]: try checking the number of sequences with a Java FileInputStream instead.
- Running the client and the host on the same machine. [Approach]: try the client on a different machine.
Friday, November 14, 2008
Java Runtime Class with VM memory
On Linux, Runtime.exec does a fork and exec. That means you need double what your current Java process is using in virtual memory (real + swap). Therefore, if I specify the initial heap as 512M, my total available memory must be bigger than 1024M.
I tried setting my Java options to -Xmn256m -Xmx1024m. The crawler slowed down quite a bit, but it no longer threw IOExceptions.
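A minimal sketch of the FileInputStream approach mentioned in the open issues above: counting FASTA header lines (lines starting with '>') inside the JVM instead of forking grep through Runtime.exec, so no extra virtual memory is needed beyond the heap. The class and file names here are illustrative.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class SequenceCounter {
  // Count FASTA sequences by counting header lines that start with '>'.
  public static int countSequences(String path) throws IOException {
    int count = 0;
    BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(path)));
    try {
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.startsWith(">")) {
          count++;
        }
      }
    } finally {
      reader.close();
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    // e.g. java SequenceCounter myfile.fasta
    System.out.println(countSequences(args[0]) + " sequences in " + args[0]);
  }
}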
Friday, November 7, 2008
Test #3(2mil): Nov.07.2008
(1) Starting time: 04:00 pm
(2) Server Setup:
-BigRed : max 5
-Cobalt : max 5
(3) Client Setup:
-Total job max: 20
-Input source: EST Human 2mil
* Note
New Setup:
(1) Increased timeout from 20 secs to 6 mins. (axis2.xml)
(2) Decreased resource pool size
(3) Decreased client jobs
Test #2(2mil) : Nov.07.2008
(2) Server Setup:
-BigRed : max 20
-Cobalt : max 20
(3) Client Setup:
-Total job max: 1000
-Input source: EST Human 2mil
* Note
New Setup:
(1) Increased timeout from 20 secs to 6 mins. (axis2.xml)
(2) Decreased resource pool size
(3) Decreased client jobs
Result:
(1) Job submission successfully done. (total 1000 jobs)
References:
First condorjob clusterID: 50537
Test #1(2mil): Nov.07.2008
(2) Server Setup:
-BigRed max 400
-Cobalt max 200
(3) Client Setup:
-Max job submission: 2000000
-Input files source: EST Human 2mil
Note:
(1) Reading the directory with 2 million sequences takes less than 20 secs.
(2) The client hung with an HTTP timeout.
(3) Cobalt started to hold jobs with Globus error code 17.
Wednesday, October 15, 2008
[ABSTRACT] Scheduling Large-scale Jobs over the Loosely-Coupled HPC Clusters
Tuesday, September 9, 2008
Installing Hadoop
http://hadoop.apache.org/core/docs/current/quickstart.html
I installed it as root, but I'm not sure that is necessary.
Step 0. You need ssh, rsync, and a Java VM on your machine. I used:
1)ssh OpenSSH_4.3p2, OpenSSL 0.9.8b 04
2)rsync version 2.6.8
3)java 1.5.0_12
Step 1. Download software from a Hadoop distribution site.
http://hadoop.apache.org/core/releases.html
Step 2. Untar file
Step 3. Set JAVA_HOME in your_hadoop_dir/conf/hadoop-env.sh
*note: I had JAVA_HOME defined in my .bashrc file, but I had to specify it again in hadoop-env.sh.
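For example, in conf/hadoop-env.sh (the JDK path below is just a placeholder for wherever your Java 1.5 install lives):

# conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.5.0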
Step 4. Now you can run the standalone operation as is:
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
Step 5. For pseudo-distributed operation, which runs each Hadoop daemon in a separate Java virtual machine on a single node so that it imitates a real distributed file system, you have to set up the configuration in conf/hadoop-site.xml.
The 'name' elements are defined by the Hadoop system, so you can just use the names from the example on the Hadoop page. I changed the value of fs.default.name to hdfs://localhost:54310, and that of mapred.job.tracker to localhost:54311.
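For reference, a conf/hadoop-site.xml with those two values would look roughly like this (a sketch using the property names stated above; adjust the ports to taste):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>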
Step 6. Check ssh localhost
In my case, I could not connect to localhost, but I could connect to my numerical IP address. I changed my /etc/hosts.allow to include ALL:127.0.0.1 and it started to recognize localhost.
If it asks for your passphrase:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
If you still prefer to use a passphrase, it will cause problems when starting the daemons.
Step 7. Format a new distributed filesystem and start the Hadoop daemons:
$ bin/hadoop namenode -format
$ bin/start-all.sh
Now you can check your namenode at http://localhost:50070/
Also your job tracker is available at http://localhost:50030/
Step 8. Test functions
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
Copy the output files from the distributed filesystem to the local filesytem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
Step 9. Stop the daemon
$ bin/stop-all.sh
Monday, September 8, 2008
Update OGCE:file-manager portlet with new tacc classes
The list of updated files is the following.
portlets/comp-file-management/src/main/webapp/jsp/fileBrowser.jsp
portlets/comp-file-management/src/main/webapp/jsp/view.jsp
portlets/comp-file-management/src/main/webapp/css/fileManagement.css
portlets/comp-file-management/src/main/java/edu/tacc/gridport/portlets/interactive/FileManagementConstants.java
portlets/comp-file-management/src/main/java/edu/tacc/gridport/portlets/interactive/FileManagementPortlet.java
portlets/gp-common/src/main/webapp/jsp/fileBrowser.jsp - identical to fileBrowser.jsp above
portlets/gp-common/src/main/webapp/javascript/fileBrowser.js
* edited parts are noted with "lukas edit"
Thursday, August 21, 2008
Teragrid access end-to-end test
- condorG (grid universe)
- cap3 apps
- stage output file
(08/22/2008 current)
==============================================================
[bigred] gatekeeper.iu.teragrid.org:2119/jobmanager-loadleveler yes
[steele] tg-steele.purdue.teragrid.org:2119/jobmanager-pbs yes
[sdsc(ds)] dslogin.sdsc.teragrid.org:2119/jobmanager-loadleveler job state write error
[mercury] https://grid-hg.ncsa.teragrid.org:2119/jobmanager-pbs yes
[ornl] tg-login.ornl.teragrid.org:2119/jobmanager-pbs yes
[lonestar] gatekeeper.lonestar.tacc.teragrid.org:2119/jobmanager-lsf job state read error
[cobalt] grid-co.ncsa.teragrid.org:2119/jobmanager-pbs yes
[pople] gram.pople.psc.teragrid.org:2119/jobmanager-pbs cannot login
[sdsc(dtf)]tg-login1.sdsc.teragrid.org:2119/jobmanager-pbs disk quota error
Friday, August 15, 2008
Limit of job submissions?
http://kb.iu.edu/data/awyt.html
http://kb.iu.edu/data/axal.html
NCSA(cobalt)
http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/SGIAltix/Doc/Jobs.html
qstat -Q
BigRed
llclass
Wednesday, August 13, 2008
Java memory setup
export JAVA_OPTS="-server -Xms512m -Xmx1024m -XX:MaxPermSize=256m"
Otherwise, your Java virtual machine will provide only about 8M of memory, which might be too small for running the server.
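A quick way to confirm the settings took effect is to print the JVM's heap limits with the same options (e.g. java $JAVA_OPTS HeapCheck); a minimal sketch:

public class HeapCheck {
  public static void main(String[] args) {
    // print the JVM's maximum and currently committed heap sizes in MB
    Runtime rt = Runtime.getRuntime();
    System.out.println("max heap:   " + rt.maxMemory() / (1024 * 1024) + " MB");
    System.out.println("total heap: " + rt.totalMemory() / (1024 * 1024) + " MB");
  }
}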
Wednesday, July 2, 2008
Access to the Job Submission Service with PHP using NuSOAP
Here is a simple example of the getStatus operation:
// Pull in the NuSOAP code
require_once('../lib/nusoap.php');
$client = new soapclient("http://your.service.location:8080/axis2/services/JobSubmissionService?wsdl",true);
$TaskId = array('clusterID' => "403", 'jobID' =>"0");
$taskId = array('TaskId' => $TaskId);
$getStatus = array('taskId'=>$TaskId);
$result = $client->call('getStatus',array('parameters'=> $getStatus),
'http://jobsubmissionservice.ogce.org/xsd','',false,null,'rpc','encoded');
if ($client->fault){
echo 'Fault';
}
print_r($result);
Creating a PHP client of the Web Service(Axis2) using NuSOAP
Step1. pull in the NuSOAP code into your php source code
require_once('../lib/nusoap.php');
Step2. Create a new client and set the WSDL flag to true.
soapclient("http://service.location.url:8080/axis2/services/YourTargetService?wsdl",true);
Step3. Send the request using call() method
mixed call (string $operation, [mixed $params = array()], [string $namespace = 'http://tempuri.org'], [string $soapAction = ''], [mixed $headers = false], [boolean $rpcParams = null], [string $style = 'rpc'], [string $use = 'encoded'])
Please note that if you don't specify the namespace, your outgoing SOAP message will include a SOAP Body with the namespace 'http://tempuri.org'. This will cause a fault from the server.
Here is my test request:
$result = $client->call('yourMethod',array('parameters'=> $yourType),
'http://yourservice.namespace.org/xsd','',false,null,'rpc','encoded');
Also, $params, the arguments passed to the service, is the element defined in wsdl:message.
Step4. See the result.
I just used print_r($result).
Wednesday, April 30, 2008
Running PaCE via Teragrid Job submission service
Required inputs:
-- input files (fasta format file(s) and .cfg file) with valid URL(s)
-- number of clusters
Here is the sample Java code to access TJSS.
====================================================================
import javax.xml.namespace.QName;
import org.apache.axis2.AxisFault;
import org.apache.axis2.addressing.EndpointReference;
import org.apache.axis2.client.Options;
import org.apache.axis2.rpc.client.RPCServiceClient;
import org.ogce.jobsubmissionservice.databean.*;
public class SubmitPaCEJob{
public static void main(String[] args) throws AxisFault {
int option = 0;
String serviceLoc = "http://localhost:8080/axis2/services/JobSubmissionService";
String serviceMethod = "submitJob";
String myproxy_username = null;
String myproxy_passwd = null;
for (int j = 0; j < args.length; j++) {
// parse command-line flags: -s <service>, -l <myproxy username>, -p <myproxy password>
String argVal = args[j];
if (argVal.equals("-s")) option = 1;
else if (argVal.equals("-l")) option = 2;
else if (argVal.equals("-p")) option = 3;
else if (option > 0) {
if (option == 1)
serviceLoc = argVal;
else if (option == 2)
myproxy_username = argVal;
else if (option == 3)
myproxy_passwd = argVal;
option = 0;
}
}
String [] inputFileString ={
"http://validURLS:8080/tmp/PaCEexample/Brassica_rapa.mRNA.EST.fasta.PaCE",
"http://validURLS:8080/tmp/PaCEexample/Phase.cfg"
};
String rslString =
"(jobtype=mpi)"+
"(count=4)"+
"(hostCount=2)"+
"(maxWallTime=00:15)"+
"(queue=DEBUG)"+
"(arguments= Brassica_rapa.mRNA.EST.fasta.PaCE 33316 Phase.cfg)";
String [] outputFileString = {
"estClust.33316.3.PaCE",
"ContainedESTs.33316.PaCE",
"estClustSize.33316.3.PaCE",
"large_merges.33316.9.PaCE"};
try{
CondorJob cj = new CondorJob();
cj.setExecutable("/N/u/leesangm/BigRed/bin/PaCE_v9");
cj.setTransfer_input_files(inputFileString);
cj.setGrid_resource("gt2 gatekeeper.bigred.iu.teragrid.org/jobmanager-loadleveler");
cj.setTransfer_output_files(outputFileString);
cj.setGlobusrsl(rslString);
cj.setMyProxyHost("myproxy.teragrid.org:7512");
cj.setMyProxyNewProxyLifetime("7200");
cj.setMyProxyCredentialName(myproxy_username);
cj.setMyProxyPassword(myproxy_passwd);
cj.setMyProxyRefreshThreshold("3600");
System.out.println(cj.toString());
RPCServiceClient serviceClient = new RPCServiceClient();
Options options = serviceClient.getOptions();
EndpointReference targetEPR = new EndpointReference(serviceLoc);
options.setTo(targetEPR);
QName query = new QName("http://jobsubmissionservice.ogce.org/xsd", serviceMethod);
Class [] returnTypes = new Class []{JobMessage[].class};
Object[] queryArgs = new Object[] {cj};
Object [] response = serviceClient.invokeBlocking(query,queryArgs,returnTypes);
JobMessage[] result = (JobMessage[])response[0];
System.out.println(result[0].toString());
}catch (Exception e){
e.printStackTrace();
}
}
private static void usage(){
System.out.println("Usage: submit_job -s
"-l
"-p
"==========================================================="+
"\n[Example]:\n"+
"submit_job "+
"-s http://localhost:8080/axis2/services/JobSubmissionService "+
"-l yourusername "+
"-p yourpassword ");
return;
}
}
Tuesday, April 29, 2008
Running PaCE on BigRed
============================================================
executable = /N/u/leesangm/BigRed/bin/PaCE_v9
transfer_executable = false
should_transfer_files = true
when_to_transfer_output = ON_EXIT
transfer_input_files = /home/leesangm/EST/data/Brassica_rapa.mRNA.EST.fasta.PaCE, /home/leesangm/EST/data/Phase.cfg
universe = grid
grid_resource = gt2 gatekeeper.bigred.iu.teragrid.org/jobmanager-loadleveler
transfer_output_files = estClust.33316.3.PaCE
error = PaCE.err.$(Cluster)
err = PaCE.standardErr.$(Cluster)
log = PaCE.log.$(Cluster)
x509userproxy = /tmp/x509up_u500
globusrsl = (jobtype=mpi)(queue=DEBUG)(maxWallTime=00:15)\
(count = 4)\
(hostCount = 2)\
(maxWallTime=00:15)\
(arguments= 'Brassica_rapa.mRNA.EST.fasta.PaCE' '33316' 'Phase.cfg')
queue
Wednesday, April 23, 2008
Sample code: Access Job Submission Service from PHP page
(1)Example of the SubmitJob operation
source code(mpi job)
source code(perl job)
Please note that you have to replace the username and password with your valid teragrid account.
(2)Example of the job management operations
source code(GetStatus)
source code(GetError)
source code(GetLog)
(3)Example of the retrieving output operation
source code(GetOutput)
Access to a Web Service from your PHP page
Since I'm a complete beginner at PHP, I'm not that familiar with the advanced design issues of PHP-based applications. However, WSO2's WSF/PHP was a good starting point for me. In particular, it gave us a proof of concept of our service interacting with PHP pages through the standard WSDL interface.
You can use WSF/PHP both to build a Web Service and to build a client of a Web Service. In my case, I wanted to create a PHP client that accesses a Web Service running in the Axis2 container. WSF/PHP provides quite a simple interface for PHP clients; for examples, please refer to the next blog message. Basically, I needed to provide the EPR of the service and the XML string that goes into the SOAP body. Besides the ease of use, I was happy that WSF/PHP lets users control the SOAP message when needed (such as the SOAP version).
I've tried WSF/PHP with PHP 5.1.1 on a Linux box. Here is the step-by-step guide to the installation.
Step 1. Install the Apache HTTP server (if you don't have one already),
download PHP 5.1.1 and install it, and
download the WSO2 WSF/PHP source from the project web site.
Step 2. go to the directory of WSF/PHP source code
./configure
make
make install
Step 3. in php.ini (it will be in /usr/local/lib/php.ini if you didn't change the location)
add following lines:
extension=wsf.so
extension=xsl.so
extension_dir="/usr/local/lib/php/extensions/no-debug-non-zts-***".
include_path = "/home/username/php/wso2-wsf-php-src-1.2.1/script"
Step 4. Copy the sample code included in the code distribution to the Web server's document root, then test http://localhost/samples/
Tuesday, March 11, 2008
Running AMBER-pbsa on BigRed:[4] Parallel pmemd through condorG
(1) loadleveler specific commands
BigRed uses LoadLeveler as its batch system. Compared to PBS or LSF, LoadLeveler has some distinctive keywords, such as class instead of queue, and it uses a machine list file. But we didn't have any problems with those when submitting jobs through condorG; the machine list file is generated and used automatically by the job script created by the job manager.
(2) passing arguments
Amber uses arguments for its input/output/reference files. You have to include those arguments in the globusrsl string. The condor keyword "arguments" is meant to carry the arguments for mpirun; however, it didn't really work for the mpirun arguments either.
(3) specifying number of process and machine
LoadLeveler requires commands to set the number of nodes and processes, such as node and tasks_per_node. mpirun also provides the -np argument to specify the total number of processes used for the job. To pass the right values to LoadLeveler and mpirun, you have to specify them in the globusrsl: "count" in the globusrsl string is mapped to the -np value for mpirun, and "hostCount" is mapped to the "node" value in the job script generated by the job manager. I could not find how to specify tasks_per_node; however, the job manager generated tasks_per_node based on the values of node and -np.
(4) script
This is the condorG script for this job submission.
========================================================================
executable = /N/soft/linux-sles9-ppc64/amber9-ibm-64/exe/pmemd.MPI
transfer_executable = false
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
transfer_input_files = /home/leesangm/bio/mm_pbsa/amber_min.in, /home/leesangm/bio/mm_pbsa/ZINC04273785_ini.crd, /home/leesangm/bio/mm_pbsa/ZINC04273785_com.top, /home/leesangm/bio/mm_pbsa/ZINC04273785.crd, /home/leesangm/bio/mm_pbsa/ZINC04273785_ini.crd
universe = grid
grid_resource = gt2 gatekeeper.bigred.iu.teragrid.org/jobmanager-loadleveler
transfer_output_files = min.out.$(Cluster), ZINC04273785.crd
error = amber.err.$(Cluster)
log = amber.log.$(Cluster)
x509userproxy = /tmp/x509up_u500
globusrsl = (jobtype=mpi)\
(count=16)\
(hostCount=4)\
(maxWallTime=00:15)\
(queue=DEBUG)\
(arguments= -O -i amber_min.in -o min.out.$(Cluster) -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd )
queue
=========================================================================
Friday, March 7, 2008
Running AMBER-pbsa on SDSC machines:[1] Serial Job interactively
All of the machines keep the Amber installation under a directory with the same name structure:
/usr/local/apps/amber9
Therefore, just set the environment and run the same command on each account.
leesangm/amber> setenv AMBERHOME /usr/local/apps/amber9
leesangm/amber> set path = ( $path $AMBERHOME/exe )
leesangm/amber> $AMBERHOME/exe/sander -O -i amber_min.in -o min.out -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd
Running AMBER-pbsa on NCSA machines:[1] Serial Job interactively
1. tungsten
Set environment variables,
[leesangm@tund mm_pbsa]$ setenv AMBERHOME /usr/apps/chemistry/AMBER/Amber9/amber9
[leesangm@tund mm_pbsa]$ set path = ( $path $AMBERHOME/exe )
And get the same test package of amber test and run the command,
$AMBERHOME/exe/sander -O -i amber_min.in -o min.out -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd
2. cobalt
Set environment variables,
[leesangm@tund mm_pbsa]$ setenv AMBERHOME /usr/apps/chemistry/amber/amber9/amber9
[leesangm@tund mm_pbsa]$ set path = ( $path $AMBERHOME/exe )
And get the same test package of amber test and run the command,
$AMBERHOME/exe/sander -O -i amber_min.in -o min.out -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd
Running AMBER-pbsa on BigRed:[3] Serial Job submit through CondorG
[Step 1] Set up the environment in the .soft file:
#
# This is the .soft file.
# It is used to customize your environment by setting up environment
# variables such as PATH and MANPATH.
# To learn what can be in this file, use 'man softenv'.
#
#
@bigred
@amber9
@teragrid-basic
@globus-4.0
@teragrid-dev
+mpich-mx-ibm-64
[Step 2] Create a condor script including the relevant arguments. I put all the required arguments in the "arguments" line of the script. I could get the result using both the batch system and system fork. Don't forget to transfer back the output file. My test script file is the following:
executable = /N/soft/linux-sles9-ppc64/amber9-ibm-64/exe/sander
arguments = -O -i amber_min.in -o min.out.$(Cluster) -c ZINC04273785_ini.crd
-p ZINC04273785_com.top -r ZINC04273785.crd
-ref ZINC04273785_ini.crd
transfer_executable = false
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
transfer_input_files = /home/leesangm/bio/mm_pbsa/amber_min.in,
/home/leesangm/bio/mm_pbsa/ZINC04273785_ini.crd,
/home/leesangm/bio/mm_pbsa/ZINC04273785_com.top,
/home/leesangm/bio/mm_pbsa/ZINC04273785.crd,
/home/leesangm/bio/mm_pbsa/ZINC04273785_ini.crd
universe = grid
grid_resource = gt2 gatekeeper.bigred.iu.teragrid.org/jobmanager-loadleveler
transfer_output_files = min.out.$(Cluster)
error = condorG.err.$(Cluster)
log = condorG.log.$(Cluster)
x509userproxy = /tmp/x509up_u500
queue
Thursday, March 6, 2008
Running AMBER-pbsa on BigRed:[2] Serial-LoadLeveler
Step 1. setup the environment in .soft file
@amber9
+mpich-mx-ibm-64
Step 2. go to the work directory
Step 3. llsubmit serial.job
Running AMBER-pbsa on BigRed:[1] Serial-Interactive
Step 1. Set up the environment in the .soft file
@amber9
+mpich-mx-ibm-64
Step 2. Go to the work directory and run the following command
$AMBERHOME/exe/sander -O -i amber_min.in -o min.out -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd
Step 3. The output file is updated every 2 minutes, and it took 20 minutes for the job to finish completely.
Monday, February 18, 2008
Draft of PolarGrid database table design
use PolarGrid
# possible entry unit of dataset
# CREATE TABLE Expedition{
# ExpeditionID bigint,
# }
# possible entry unit of dataset
# CREATE TABLE Radar{
# RadarID bigint,
# }
#
# DataChunk
#
# DataChunk is a unit of dataset which is identified by
# (1) spatial information
# (2) temporal information
# (3) triplet of radar information (waveform, transmit antenna, receive antenna)
#
CREATE TABLE DataChunk(
DataChunkID BIGINT NOT NULL AUTO_INCREMENT,
UUID VARCHAR(255),
Description VARCHAR(255),
SamplingFrequency int,
SampleAverage int,
NumberOfWaveform int,
DSPMode VARCHAR(255),
SystemDelay int,
StartPoint point,
StopPoint point,
StartUTC double,
StopUTC double,
Microformat MEDIUMBLOB,
CreationTimestamp timestamp,
RevisionTimestamp timestamp,
PGContactID bigint,
PRIMARY KEY(DataChunkID),
INDEX(StartPoint),
INDEX(StopPoint),
INDEX(StartUTC),
INDEX(StopUTC)
);
#
# FileObject:
#
# FileObject represents the minimum unit of a dataset. In general
# we assume that this object can be instrumental data, an
# output visualization file, or a revised data file.
# Please note that WaveformName, TXAntennaName, and RXAntennaName
# come from the file name. There is no validation of these names
# against the antenna/waveform tables.
#
CREATE TABLE FileObject(
FileObjectID bigint NOT NULL AUTO_INCREMENT,
DataChunkID bigint,
UUID VARCHAR(255),
FileName VARCHAR(255),
RecordTimestamp timestamp,
RadarType VARCHAR(255),
DistributionFormat VARCHAR(255),
WaveformName VARCHAR(255),
TXAntennaName VARCHAR(255),
RXAntennaName VARCHAR(255),
OnlineResource VARCHAR(255),
CreationTimestamp timestamp,
RevisionTimestamp timestamp,
PRIMARY KEY (FileObjectID),
INDEX(DataChunkID),
INDEX(WaveformName),
INDEX(TXAntennaName),
INDEX(RXAntennaName),
INDEX(RecordTimestamp)
);
#
# Waveform
#
# This table defines the waveforms transmitted between antennas. Each
# radar system can have several different waveforms that it
# can transmit, and a waveform transmitted on a given transmit antenna
# can be received on any combination of antennas. Each Waveform row
# describes a single waveform that is used by a DataChunk.
#
CREATE TABLE Waveform(
WaveformID bigint NOT NULL AUTO_INCREMENT,
DataChunkID bigint,
WaveformName VARCHAR(255),
StartFrequency int,
StopFrequence int,
PulseWidth double,
ZeroPiMode int,
PRIMARY KEY (WaveformID),
INDEX(DataChunkID)
);
#
# DataAcquisition
#
# This table defines how we describe the data-acquisition setup.
# This information is included for both waveforms and data chunks.
# The AssociationType field specifies whether this setup information is
# used for a waveform or a data chunk. Similarly, the AssociationID field
# specifies the ID that exactly identifies that item.
#
CREATE TABLE DataAcquisition(
DataAcquisitionID bigint NOT NULL AUTO_INCREMENT,
NumberOfSamples int,
SampleDelay int,
BlankingTime int,
AssociationType VARCHAR(255),
AssociationID bigint,
PRIMARY KEY (DataAcquisitionID),
INDEX(AssociationType),
INDEX(AssociationID)
);
#
# Antenna
#
# This table specifies how we describe the antenna.
#
CREATE TABLE Antenna(
AntennaID bigint NOT NULL AUTO_INCREMENT,
AntennaName VARCHAR(255),
AntennaType VARCHAR(255),
Antennuation int,
AssociationType VARCHAR(255),
AssociationID bigint,
PRIMARY KEY (AntennaID),
INDEX(AssociationType),
INDEX(AssociationID)
);
#
# PGContact
#
# This table specifies contact information.
#
#
CREATE TABLE PGContact(
PGContactID bigint NOT NULL AUTO_INCREMENT,
IndividualName VARCHAR(255),
UNIXLoginName VARCHAR(255),
Email VARCHAR(255),
OrganizationName VARCHAR(255),
PositionName VARCHAR(255),
Voice VARCHAR(255),
Facsimile VARCHAR(255),
Address VARCHAR(255),
OnlineResource VARCHAR(255),
HoursOfService VARCHAR(255),
ContactInstruction VARCHAR(255),
PRIMARY KEY (PGContactID),
INDEX(UNIXLoginName),
INDEX(Email)
);
Friday, February 15, 2008
PolarGrid database table (initial draft)
#CREATE TABLE Expedition{
# ExpeditionID bigint,
#}
CREATE TABLE DataChunk(
DataChunkID bigint NOT NULL,
UUID VARCHAR(255),
Description VARCHAR(255),
SamplingFrequency int,
SampleAverage int,
NumberOfWaveform int,
DSPMode VARCHAR(255),
StartPoint point,
StopPoint point,
StartUTC double,
StopUTC double,
PRIMARY KEY (DataChunkID)
);
CREATE TABLE FileObject(
FileObjectID bigint NOT NULL,
DataChunkID bigint,
UUID VARCHAR(255),
FileName VARCHAR(255),
RadarType VARCHAR(255),
Timestamp timestamp,
FileType VARCHAR(255),
WaveformName VARCHAR(255),
TXAntennaName VARCHAR(255),
RXAntennaName VARCHAR(255),
OnLink VARCHAR(255),
PRIMARY KEY (FileObjectID)
);
CREATE TABLE Waveform(
WaveformID bigint NOT NULL,
DataChunkID bigint,
WaveformName VARCHAR(255),
StartFrequency int,
StopFrequence int,
PulseWidth double,
ZeroPiMode int,
PRIMARY KEY (WaveformID)
);
CREATE TABLE DataAcquisition(
DataAcquisitionID bigint NOT NULL,
NumberOfSamples int,
SampleDelay int,
BlankingTime int,
AssociationType VARCHAR(255),
AssociationID bigint,
PRIMARY KEY (DataAcquisitionID)
);
CREATE TABLE Antenna(
AntennaID bigint NOT NULL,
AntennaName VARCHAR(255),
AntennaType VARCHAR(255),
Antennuation int,
AssociationType VARCHAR(255),
AssociationID bigint,
PRIMARY KEY (AntennaID)
);
Wednesday, February 13, 2008
[PG] 80 TB of mobile data
Wednesday, February 6, 2008
[PG]mysql GIS [1] Creating Spatial data and using functions
I found a useful manual which covers almost everything I was looking for.
http://www.browardphp.com/mysql_manual_en/manual_Spatial_extensions_in_MySQL.html
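A minimal sketch of the kind of spatial SQL this enables, assuming the DataChunk table from the February 18 draft and the MySQL 5.0-era function names (GeomFromText, MBRContains, AsText); the coordinates are made up:

-- insert a row with a spatial StartPoint
INSERT INTO DataChunk (UUID, StartPoint, StartUTC)
VALUES ('example-uuid', GeomFromText('POINT(-68.5 76.2)'), 1202860800);

-- find chunks whose StartPoint falls inside a bounding box
SELECT DataChunkID, AsText(StartPoint)
FROM DataChunk
WHERE MBRContains(
  GeomFromText('POLYGON((-70 75, -67 75, -67 77, -70 77, -70 75))'),
  StartPoint);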
Thursday, January 17, 2008
Running parallel pw.x on the LoneStar of TACC: on site/condorG/condor-birdbath APIs
(1) Submit on site with the command line:
bsub -I -n 4 -W 0:05 -q development -o pwscf.out ibrun /home/teragrid/tg459247/vlab/espresso/bin/pw.x < /home/teragrid/tg459247/vlab/__CC5f_7/Pwscf_Input
(2) Submit through a condorG script file. Globus RSL parameters are documented at http://www.globus.org/toolkit/docs/2.4/gram/gram_rsl_parameters.html
The actual script file is the following:
=============================================
executable = /home/teragrid/tg459247/vlab/bin/pw_mpi.x
transfer_executable = false
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
transfer_input_files = /home/leesangm/catalina/VLAB_Codes/__CC5f_7/008-O-ca--bm3.vdb,/home/leesangm/catalina/VLAB_Codes/__CC5f_7/__cc5_7,/home/leesangm/catalina/VLAB_Codes/__CC5f_7/Mg.vbc3
universe = grid
grid_resource = gt2 tg-login.tacc.teragrid.org/jobmanager-lsf
output = tmpfile.out.$(Cluster)
error = condorG.err.$(Cluster)
log = condorG.log.$(Cluster)
input = /home/leesangm/catalina/VLAB_Codes/__CC5f_7/Pwscf_Input
x509userproxy = /tmp/x509up_u500
globusrsl = (environment=(PATH /usr/bin))\
(jobtype=mpi)\
(count=4)\
(queue=development)\
(maxWallTime=5)
queue
(3) submit through condor birdbath APIs
Almost the same as the serial job submission, except for setting the wall clock time. When you generate the globusrsl, add
(maxWallTime=yourWallMaxTime)
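For example, with the birdbath ClassAdStructAttr approach shown in the January 4 post, the GlobusRSL attribute entry in the extraAttributes array might look like this (the RSL values are illustrative; maxWallTime is given in minutes here):

new ClassAdStructAttr("GlobusRSL", ClassAdAttrType.value2,
  "\"(environment=(PATH /usr/bin))(jobtype=mpi)(count=4)(queue=development)(maxWallTime=5)\"")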
Friday, January 11, 2008
Job submission to TG machines
Blue: Serial pw.x is ready to run and accessible by Task Executor
Red: pw.x installation failed.
Green: Serial + MPI pw.x is ready to run and accessed from Task Executor
==================================================
machine hostname architecture job sub job manager
-----------------------------------------------------------------------------------------
BigRed login.bigred.iu.teragrid.org ppc64 GT4 loadleveler
*QueenBee login-qb.lsu-loni.teragrid.org GT4
NCAR tg-login.frost.ncar.teragrid.org i686 GT4
*Abe login-abe.ncsa.teragrid.org Intel64 GT4 pbs
Cobalt login-co.ncsa.teragrid.org ia64 GT4/GT2 pbs/fork
Mercury login-hg.ncsa.teragrid.org ia64 GT4/GT2 pbs/fork
Tungsten login-w.ncsa.teragrid.org ia32 GT4/GT2 LSF/fork
ORNL tg-login.ornl.teragrid.org i686 GT4/GT2 pbs/fork
*BigBen tg-login.bigben.psc.teragrid.org AMD Opteron GT4/GT2 pbs
*Rachel tg-login.rachel.psc.teragrid.org GT4/GT2 pbs
Purdue tg-login.purdue.teragrid.org GT4/GT2 pbs
*sdsc BG bglogin.sdsc.edu ppc64 GT4/GT2 no job manager??
*sdsc DS dslogin.sdsc.edu 002628DA4C00 GT4/GT2 loadleveler/fork
sdsc IBM tg-login.sdsc.teragrid.org ia64 GT4/GT2 pbs/fork
lonestar tg-login.lonestar.tacc.teragrid.org ia64 GT4/GT2 LSF/fork
maverik tg-viz-login.tacc.teragrid.org sun4u GT4/GT2 sge/fork
*ranger tg-login.ranger.tacc.teragrid.org GT4 sge/fork
IA-VIS tg-viz-login.uc.teragrid.org i686 GT4/GT2 pbs
IS-64 tg-login.uc.teragrid.org ia64 GT4/GT2 pbs/fork
=================================================
*QueenBee : could not login
*Abe doesn't support single-sign-on
*BigBen: could not login
*Abe: could not login
*Rachel: could not login
*Purdue: could not login
*sdsc BlueGene: unknown job manager?
*sdsc DataStar: unusual architecture?
*ranger: could not login
Compiling espresso on the TG machines
Here are the instructions for installing espresso for serial runs. README.install was very useful.
* Cobalt, Mercury, and Tungsten NCSA
step 1. copy espressoXXX.tar
step 2. On the espresso directory, set the environment variable to select architecture.
setenv BIN_DIR /home/ac/quakesim/vlab/espresso/bin
setenv PSEUDO_DIR /home/ac/quakesim/vlab/espresso/pseudo
setenv TMP_DIR /home/ac/quakesim/vlab/espresso/tmp
setenv ARCH linux64
setenv PARA_PREFIX
setenv PARA_POSTFIX
note: for serial runs, PARA_PREFIX MUST be left empty. For parallel runs,
setenv PARA_PREFIX "mpirun -np 2"
setenv PARA_POSTFIX
step 2.5 make sure you have tmp, pseudo, and bin directories under your espresso directory
step 3. ./configure
step 4. make all
* Lonestar parallel pw.x, ph.x
step 1. setenv PARA_PREFIX "mpirun"
step 2. setenv ARCH linux64
step 3. ./configure
step 4. make all
Submit job to pbs[1]: on site with command line
#!/bin/sh
/bin/hostname
(1) submit job test to the pbs queue.
qsub -o test.out -e test.err test
(2) check result file
*Useful guide
http://www.teragrid.org/userinfo/jobs/pbs.php
Friday, January 4, 2008
Submitting a job to LSF job queue [3]: through CondorG with birdbath APIs
* The attributes In, Out, and Err are used to specify standard input, output, and error redirection. Therefore, if your executable uses standard input/output and redirects them to files, those files should be specified with these attributes.
* In this case, pw.x generates multiple files besides the stdout output file. The attribute
TransferOutput specifies the files that should be transferred after the process is done.
* The attribute GlobusRSL is equivalent to the keyword globusrsl in the script file for command-line submission with condor_submit.
* Many many thanks to Marlon for helping me out!!
The actual ClassAdStructAttr[] is the following:
------------------------------------------------------------------------------------------------------------------------------
ClassAdStructAttr[] extraAttributes =
{
new ClassAdStructAttr("GridResource", ClassAdAttrType.value3, gridResourceVal),
new ClassAdStructAttr("TransferExecutable",ClassAdAttrType.value4,"FALSE"),
new ClassAdStructAttr("Out", ClassAdAttrType.value3, tmpDir+"/"+"pwscf-"+clusterId+".out"),
new ClassAdStructAttr("UserLog",ClassAdAttrType.value3, tmpDir+"/"+"pwscf-"+clusterId+".log"),
new ClassAdStructAttr("Err",ClassAdAttrType.value3, tmpDir+"/"+"pwscf-"+clusterId+".err"),
new ClassAdStructAttr("In",ClassAdAttrType.value3, workDir+"/"+"Pwscf_Input"),
new ClassAdStructAttr("ShouldTransferFiles", ClassAdAttrType.value2,"\"YES\""),
new ClassAdStructAttr("WhenToTransferOutput", ClassAdAttrType.value2,"\"ON_EXIT\""),
new ClassAdStructAttr("StreamOut", ClassAdAttrType.value4, "TRUE"),
new ClassAdStructAttr("StreamErr",ClassAdAttrType.value4,"TRUE"),
new ClassAdStructAttr("TransferOutput",ClassAdAttrType.value2,
"\"pwscf.pot, pwscf.rho, pwscf.wfc, pwscf.md, pwscf.oldrho, pwscf.save, pwscf.update\""),
new ClassAdStructAttr("TransferOutputRemaps",ClassAdAttrType.value2,
"\"pwscf.pot="+tmpDir+"/"+"pwscf-"+clusterId+
".pot; pwscf.rho="+tmpDir+"/"+"pwscf-"+clusterId+
".rho;pwscf.wfc="+tmpDir+"/"+"pwscf-"+clusterId+
".wfc; pwscf.md="+tmpDir+"/"+"pwscf-"+clusterId+
".md; pwscf.oldrho="+tmpDir+"/"+"pwscf-"+clusterId+
".oldrho; pwscf.save="+tmpDir+"/"+"pwscf-"+clusterId+
".save; pwscf.update="+tmpDir+"/"+"pwscf-"+clusterId+".update\""),
new ClassAdStructAttr("GlobusRSL", ClassAdAttrType.value2,
"\"(queue=development)(environment=(PATH /usr/bin))(jobtype=single)(count=1)\""),
new ClassAdStructAttr("x509userproxy",ClassAdAttrType.value3,proxyLocation),
};
------------------------------------------------------------------------------------------------------------------------------
Pwscf output files?
Normally pw.x generates pwscf.pot, pwscf.rho, and pwscf.wfc,
unless I reuse the tmp directory.
However, on Lonestar it generates
pwscf.md pwscf.oldrho pwscf.pot pwscf.rho pwscf.save pwscf.update pwscf.wfc
To be safe, I transfer all of the possible files from the remote machine.
Submitting a job to LSF job queue [2]: through CondorG with condor_submit
---------------------------------------------------------------------------------------------------------------------------------
executable = /home/teragrid/tg459282/vlab/pw.x
transfer_executable = false
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
transfer_input_files = /home/leesangm/catalina/VLAB_Codes/__CC5f_7/008-O-ca--bm3.vdb,/home/leesangm/catalina/VLAB_Codes/__CC5f_7/__cc5_7,/home/leesangm/catalina/VLAB_Codes/__CC5f_7/Mg.vbc3
universe = grid
grid_resource = gt2 tg-login.tacc.teragrid.org/jobmanager-lsf
output = tmpfile.out.$(Cluster)
error = condorG.err.$(Cluster)
log = condorG.log.$(Cluster)
input = /home/leesangm/catalina/VLAB_Codes/__CC5f_7/Pwscf_Input
globusrsl = (queue=development)\
(environment=(PATH /usr/bin))\
(jobtype=single)\
(count=1)
queue
---------------------------------------------------------------------------------------------------------------------------------
This script file is almost the same as a normal condor submit script, except for the globusrsl keyword. This is a simple case for a serial job; for parallel jobs, it should be modified.
Then submit condor job,
condor_submit script_file_name
Submitting a job to LSF job queue [1] : On the Cluster
bsub: submit jobs
bjobs: display information about jobs
bkill: send a signal to kill a job
For more commands,
http://its.unc.edu/dci/dci_components/lsf/lsf_commands.htm
* Useful options of bsub command
-q : name of the queue
-n: desired number of processors
-W: Walltime limit in batch jobs -W[hours]:[minutes]
-i : input file
-o : output file
-e: error file
Example lsf submit of pw.x in lonestar
bsub -q development -n 1 -W 15 -i "Pwscf_Input" -o "myout.out" ../pw.x
Thursday, January 3, 2008
Building a Client of the Task Executor
If the service is running on localhost, the WSDL file is located at
http://localhost:8080/task-executor/services/TaskExecutor?wsdl
With this WSDL file, we can generate Java classes with WSDL2Java, which is included in the Axis package:
java org.apache.axis.wsdl.WSDL2Java http://localhost:8080/task-executor/services/TaskExecutor?wsdl
Then compile and jar the generated Java code.
The jar files required to run WSDL2Java are the following:
- axis-1.4.jar
- activation-1.1.jar
- commons-discovery-0.2.jar
- saaj.jar
- jaxrpc.jar
- mail-1.4.jar
- wsdl4j-1.5.1.jar