Last week, I found three additional TeraGrid sites that work with Swarm. Please note that I experienced some problems with ANL earlier while staging out files. Also, Marlon informed me that Pople doesn't support community certificates.
NCSA-Abe
(1) grid_resource: gt2 grid-abe.ncsa.teragrid.org:2119/jobmanager-pbs
(2) globusrsl: (jobtype=single)(count=1)
(3) Test cap3: success
Pople
(1) grid_resource: gt2 gram.pople.psc.teragrid.org:2119/jobmanager-pbs
(2) globusrsl: (jobtype=single)(count=1)
(3) Test cap3: success
ANL
(1) grid_resource: gt2 tg-grid.uc.teragrid.org:2119/jobmanager-pbs
(2) globusrsl: (jobtype=single)(count=1)
(3) Test cap3: success
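The grid_resource and globusrsl values above have the same shape as Condor-G submit attributes. If you want to try one of these gatekeepers by hand, outside of Swarm, a minimal sketch could look like the following. The executable path and the output file names are placeholders of mine, not values from my tests.

```shell
#!/bin/sh
# Print a Condor-G submit description for a GT2 gatekeeper.
# $1 = gatekeeper contact string (taken from the listings above)
# $2 = executable path (hypothetical; adjust to your installation)
make_submit() {
    cat <<EOF
universe = grid
grid_resource = gt2 $1
globusrsl = (jobtype=single)(count=1)
executable = $2
output = job.out
error = job.err
log = job.log
queue
EOF
}

# Example: a description for NCSA-Abe
make_submit grid-abe.ncsa.teragrid.org:2119/jobmanager-pbs /path/to/cap3
```

Redirect the output to a file and pass it to condor_submit, assuming you have a Condor installation with a valid grid proxy.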
Wednesday, March 18, 2009
Friday, March 6, 2009
Being a nice client of Inca service: monitoring Grid Resource
If you are considering a monitoring mechanism for TeraGrid resources, the Inca project from SDSC (http://inca.sdsc.edu/) provides very useful information. With access to the Inca service, you can easily use its monitoring information from a light-weight software component. Thanks to the Inca team for hosting the service and for their very helpful support! It was a very pleasant experience.
If you are using TeraGrid resources, you might be interested in Inca's real-time monitoring testbed (http://www.teragridforum.org/mediawiki/index.php?title=Inca_Real-Time_Monitoring_Testbed). The Inca server runs the tests described in that document. The results are available to clients through RESTful URLs, encoded as either XML or HTML.
However, to access the results, you need a query. Pre-created queries are available for particular projects, including the TeraGrid Resource Monitoring page in the TeraGrid user portal. That query includes the monitoring status of remote login to the TeraGrid clusters, the pre-WS gatekeeper, and the WS gatekeeper server. To see the most recent test results encoded in XML:
http://inca.teragrid.org/inca/XML/kit-status-v1/portal
Or, as an HTML page,
http://inca.teragrid.org/inca/HTML/kit-status-v1/portal
For more detailed information,
http://inca.teragrid.org/inca/XML/kit-status-v1/portal/[clusterID]/prews-gram-batch
For example,
http://inca.teragrid.org/inca/XML/kit-status-v1/portal/sdsc-ia64/prews-gram-batch
You will see more detailed information about the SDSC IA64 cluster, such as testing intervals.
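The URLs above all follow one pattern, so they can be built with a small helper. This is only a sketch: the function name inca_url is mine, and it assumes the layout format/kit-status-v1/query, optionally followed by the cluster ID and test name, as in the examples above.

```shell
#!/bin/sh
# Build an Inca status URL following the pattern shown above.
# $1 = format (XML or HTML), $2 = query name,
# $3 = cluster ID (optional), $4 = test name (optional)
inca_url() {
    base="http://inca.teragrid.org/inca/$1/kit-status-v1/$2"
    if [ -n "${3:-}" ] && [ -n "${4:-}" ]; then
        echo "$base/$3/$4"
    else
        echo "$base"
    fi
}

# Example: the detailed pre-WS GRAM result for the SDSC IA64 cluster
inca_url XML portal sdsc-ia64 prews-gram-batch
```

If curl is installed, something like curl -s "$(inca_url HTML portal)" should download the summary page.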
To create a query for your own purpose, you have to contact the administrator of the Inca service. Inca provides a nice user interface for the administrator to create a new query. After the new query is created, you can access your results through:
http://inca.teragrid.org/inca/[XML|HTML]/kit-status-v1//
This output can be easily integrated into your software in a light-weight way, for example with an HTTP client. After downloading the information, we used the conventional Java DOM implementation. With a customized query, the response size seems reasonable for a DOM implementation.
Please note that if you access the Inca service hosted by SDSC, the testing interval is not changeable.
Here is my test code:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class Test {
    public static void main(String[] args) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            // parse(String) takes a URI, so the XML is fetched over HTTP
            Document dom = db.parse("http://inca.teragrid.org/inca/XML/kit-status-v1/portal_summary");
            Element docEle = dom.getDocumentElement();
            NodeList nl = docEle.getElementsByTagName("row");
            System.out.println("There are " + nl.getLength() + " elements.");
            for (int i = 0; i < nl.getLength(); i++) {
                // for each <row>, read the hostname of its first <reportSummary>
                Element el = (Element) nl.item(i);
                NodeList sub1 = el.getElementsByTagName("reportSummary");
                if (sub1.getLength() == 0) continue; // row without a summary
                NodeList sub2 = ((Element) sub1.item(0)).getElementsByTagName("hostname");
                if (sub2.getLength() == 0) continue; // summary without a hostname
                String hostname = sub2.item(0).getTextContent();
                System.out.println("HostName " + i + ":" + hostname);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Friday, February 20, 2009
Store and Register your EC2 machine image
To launch your machine image again later, it has to be stored in S3. This takes several steps.
Step 1) Copy your X.509 certificate and private key to your instance.
scp -i id_rsa-gsg-keypair pk-HKZYKTAIG2ECMXYIBH3HXV4ZBZQ55CLO.pem cert-HKZYKTAIG2ECMXYIBH3HXV4ZBZQ55CLO.pem root@domU-12-34-31-00-00-05.compute-1.amazonaws.com:/mnt
This is done on your local desktop. Then, your security files will be stored in the /mnt directory of your instance.
Step 2) Now you have to bundle your instance with the command:
ec2-bundle-vol -d /mnt -k /mnt/pk-HKZYKTAIG2ECMXYIBH3HXV4ZBZQ55CLO.pem -c /mnt/cert-HKZYKTAIG2ECMXYIBH3HXV4ZBZQ55CLO.pem -u 495219933132 -r i386 -p sampleimage
This takes some time (several minutes). If it was successful, you can see the bundled files under your /mnt directory. They are a bunch of files named sampleimage*.
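Before uploading, it can be worth confirming that the bundle really is there. The sketch below only checks for the manifest file, whose name follows from the -p prefix used above, and then lists everything with that prefix. The function name check_bundle is my own.

```shell
#!/bin/sh
# Check that bundling left a manifest in the given directory, then list the bundle files.
# $1 = bundle directory (the -d value), $2 = image prefix (the -p value)
check_bundle() {
    if [ ! -f "$1/$2.manifest.xml" ]; then
        echo "missing $1/$2.manifest.xml" >&2
        return 1
    fi
    ls "$1/$2"*
}

# Usage on the instance: check_bundle /mnt sampleimage
```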
Step 3) Upload bundled files to S3 from your instance
ec2-upload-bundle -b -m /mnt/sampleimage.manifest.xml -a -s
I had some problems with this step. Although I have S3 access, this command complained that I don't. Interestingly, after I created a bucket in S3 manually, it worked.
I use the Firefox extension that was recommended in an EC2 forum.
http://www.s3fox.net/
It is quite useful. If I want to sync part of my disk, it would be a great tool.
Step 4) Then, log out of your instance, and on your desktop,
ec2-register/sampleimage.manifest.xml
Now you can see your machine image with,
ec2-describe-images -o self
Step 5) That's it! Whenever you want to launch your machine image again,
ec2-run-instances ami-5bae4b32
For a longer version of this tutorial,
http://docs.amazonwebservices.com/AWSEC2/latest/GettingStartedGuide/
Tuesday, February 17, 2009
Getting started with EC2
To get started with EC2, you have to create an account on the EC2 site.
After you have successfully created your EC2 account on the Amazon site, there are several steps before you can actually use EC2 instances.
Step 1) Create an X.509 certificate
Log in to your Amazon account and open the AWS access identifiers page. In the X.509 certificate section, click Create New. Then download your certificate and store it safely on your desktop.
mkdir .ec2
cd .ec2
mv ~/Desktop/*.pem .
Step 2) Download the EC2 command line tools: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=351&categoryID=88
Then unzip the files under the .ec2 directory.
mv ~/Desktop/ec2-api-tools.zip .
unzip ec2-api-tools.zip
Step 3) Modify your shell startup file. In my .bashrc file, I added,
export EC2_HOME=~/.ec2
export PATH=$PATH:$EC2_HOME/bin
export EC2_PRIVATE_KEY=pk-YOURKEYNAME.pem
export EC2_CERT=cert-YOURKEYNAME.pem
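Since the key and certificate are given as plain file names here, they presumably resolve against whatever directory you run the tools from, which is why the later steps start with cd .ec2. A quick sanity check along these lines can catch a missing variable or file early. The function name check_ec2_env is hypothetical, not part of the EC2 tools.

```shell
#!/bin/sh
# Verify the EC2 variables from .bashrc before running any ec2-* command.
check_ec2_env() {
    for name in EC2_HOME EC2_PRIVATE_KEY EC2_CERT; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            return 1
        fi
    done
    # The key and certificate must be readable from the current directory.
    for file in "$EC2_PRIVATE_KEY" "$EC2_CERT"; do
        if [ ! -r "$file" ]; then
            echo "cannot read $file" >&2
            return 1
        fi
    done
    echo "EC2 environment looks usable"
}

# Usage: cd ~/.ec2 && check_ec2_env
```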
Step 4) Generate a key pair to ssh to your instance
cd .ec2
ec2-add-keypair pstam-keypair > id_rsa-pstam-keypair
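One thing to watch: ssh normally refuses a private key file that other users can read, and a file created by redirection usually gets looser permissions than that. A small helper (the name protect_key is mine) restricts the file to its owner:

```shell
#!/bin/sh
# Restrict a private key file to its owner so that ssh will accept it.
# $1 = path to the key file
protect_key() {
    chmod 600 "$1"
}

# Usage: protect_key id_rsa-pstam-keypair
```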
Step 5) Now select the image for your new instance
ec2-describe-images -o amazon
Step 6) Create an instance
ec2-run-instances ami-6138dd08 -k pstam-keypair
Step 7) Check the description of the instance
Starting an instance takes some time. Once the description shows a valid URL for the instance, it is ready to access.
To check the description,
ec2-describe-instances
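Instead of rerunning the command by hand, you could poll it. The following is a rough sketch under two assumptions of mine: that the description contains the word running once the instance is up, and that ec2-describe-instances accepts the instance ID as an argument. DESCRIBE_CMD and POLL_SECONDS are hypothetical knobs, added so the loop can be tried without EC2 access.

```shell
#!/bin/sh
# Poll the instance description until it reports "running".
# $1 = instance ID, $2 = maximum number of attempts (default 30)
wait_until_running() {
    instance_id=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        if ${DESCRIBE_CMD:-ec2-describe-instances} "$instance_id" | grep -q running; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "${POLL_SECONDS:-10}"
    done
    return 1
}

# Usage: wait_until_running i-yourinstance && echo "ready"
```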
Step 8) Open the ports for ssh and http connections
You have to open the ports that you want to be reachable from outside the instance.
ec2-authorize default -p 22
ec2-authorize default -p 80
After this, you can ssh to your instance.
Step 9) SSH to your instance
ssh -i id_rsa-pstam-keypair root@ec2-XXX-XXX-XXX-XXX.z-2.compute-1.amazonaws.com
Step 10) Getting a static IP address
Sometimes your application needs a static IP address for your instance.
First you have to allocate a static IP address and associate it with your instance.
ec2-allocate-address
ec2-associate-address -i i-yourinstance XXX.XXX.XXX.XXX
Step 11) Now you can SSH to your instance like any other remote machine.
ssh root@XXX.XXX.XXX.XXX
Step 12) Terminate Instance
ec2-terminate-instances i-yourinstance
Wednesday, February 11, 2009
Recent Change in TG
cobalt now requires (queue=) in the RSL string. It was optional previously. Now, if you don't specify either standard or industrial, Globus will throw an exception,
Globus error 37
Condor will also throw an exception,
Exec format error Code 6 Subcode 8
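To avoid forgetting the queue, the RSL string can be built by a helper that rejects anything other than the two queue names mentioned above. This is just a sketch: the function name cobalt_rsl is mine, and /bin/ls is only a stand-in executable.

```shell
#!/bin/sh
# Build an RSL string that always carries a queue for cobalt.
# $1 = executable, $2 = queue (standard or industrial)
cobalt_rsl() {
    case "${2:-}" in
        standard|industrial)
            echo "&(executable=$1)(queue=$2)"
            ;;
        *)
            echo "queue must be standard or industrial" >&2
            return 1
            ;;
    esac
}

# Example
cobalt_rsl /bin/ls standard
```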
Submitting job to Ranger
Gatekeeper : gatekeeper.ranger.tacc.teragrid.org:2119/jobmanager-sge
Batch queue system: SGE
Check the queue status: showq -u
(Test 0) Try locally installed executables with data files stored in the scratch directory
./cap3 /scratch/00891/tg459282/2mil/cluster1807.fsa -p 95 -o 49 -t 100
(Test 1) Try a simple job submission through the Globus Toolkit.
globusrun -o -r gatekeeper.ranger.tacc.teragrid.org:2119/jobmanager-sge '&(executable=/bin/ls)'
It works with Swarm!
Monday, February 9, 2009
Submit and Manage Vanilla Condor Job to Swarm
To submit vanilla Condor job(s) to Swarm, first you have to install Swarm and configure the service to process vanilla Condor jobs. The source code is available from
svn co https://ogce.svn.sourceforge.net/svnroot/ogce/ogce-services-incubator
For general information on installing Swarm, please refer to,
http://decemberstorm.blogspot.com/2008/11/swarm-sever-installation-guide.html
Now, configure your Swarm for the vanilla Condor pool.
Step 1. Modify swarm/TGResource.properties to process only the vanilla Condor pool. These are my properties:
matchmaking=FirstAvailableResource
taskQueueMaxSize = 100
taskQueueScanningInterval = 3000
condorCluster=true
teragridHPC=false
eucalyptus=false
condorCluster_numberOfToken=10
condorRefreshInterval = 2000
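Judging from the values above, condorCluster, teragridHPC, and eucalyptus are the switches that select which resources Swarm uses. A quick check before starting the server could look like this sketch. Both function names are hypothetical, and the parsing only handles simple key=value lines, with or without spaces around the equals sign.

```shell
#!/bin/sh
# Read one key from a Java-style properties file ("key=value" or "key = value").
# $1 = file, $2 = key
prop() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Succeed only if the vanilla Condor pool is the single enabled resource.
# $1 = properties file
only_condor_enabled() {
    [ "$(prop "$1" condorCluster)" = "true" ] &&
    [ "$(prop "$1" teragridHPC)" = "false" ] &&
    [ "$(prop "$1" eucalyptus)" = "false" ]
}

# Usage: only_condor_enabled swarm/TGResource.properties && echo "Condor only"
```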
Step 2. Start the Swarm server.
Upload swarm/build/jobsub.aar to your Axis2 installation.
Step 3. A client example for vanilla Condor jobs is stored at,
swarm/src/core/org/ogce/jobsub/clients/SubmitVanillaJob.java
There are three example methods:
submitJobWithStandardOutput(): job with standard output, without input files
submitJobWithOutputTransfer(): job with standard output and output files
submitJobWithInputOutputTransfer(): job with standard output, output files, and local input files
Step 4. Compile example file
[swarm]ant clean
[swarm]ant
Step 5. Run example file
[swarm]./run.sh submit_vanilla_job
Step 6. Check the status
[swarm]cd ClientKit
[ClientKit]./swarm status http://serverIP:8080/axis2/services/Swarm your_ticket_ID
Step 7. Get the location of output file
[ClientKit]./swarm outputURL http://serverIP:8080/axis2/services/Swarm your_ticket_ID
This will return the URL of the output file.