Friday, November 7, 2008

Test #1(2mil): Nov.07.2008

(1) Starting time: 12:54 pm

(2) Server Setup:
-BigRed max 400
-Cobalt max 200

(3) Client Setup:
-Max job submission: 2000000
-Input files source: EST Human 2mil

Note:
(1) Reading the directory with 2 million sequences takes less than 20 seconds.
(2) Client hung with an HTTP timeout.
(3) Cobalt started to hold jobs with Globus error code 17.

Wednesday, October 15, 2008

[ABSTRACT] Scheduling Large-scale Jobs over Loosely-Coupled HPC Clusters

Compute-intensive scientific applications are heavily reliant on the quantity of available computing resources. The Grid paradigm provides a large-scale computing environment for scientific users. However, conventional Grid job submission tools do not provide a high-level job scheduling environment for these users across multiple institutions. For extremely large numbers of jobs, a more scalable job scheduling framework that can leverage highly distributed clusters and supercomputers is required. In this presentation, we propose a high-level job scheduling Web service framework, Swarm. Swarm is developed for scientific applications that must submit a massive number of high-throughput jobs or workflows to highly distributed computing clusters. The Swarm service itself is designed to be extensible, lightweight, and easily installable on a desktop or small server. As a Web service, derivative services based on Swarm can be straightforwardly integrated with Web portals and science gateways. In this talk, we present the motivation for this research, the architecture of the Swarm framework, and a performance evaluation of the system prototype.

Tuesday, September 9, 2008

Installing Hadoop

These are my notes, written while following the installation documentation on Hadoop's web page:
http://hadoop.apache.org/core/docs/current/quickstart.html
I installed as root, but I'm not sure that is necessary.

Step 0. You need ssh, rsync, and a Java VM on your machine. I used:
1) ssh OpenSSH_4.3p2, OpenSSL 0.9.8b 04
2) rsync version 2.6.8
3) java 1.5.0_12
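You can check what you already have with the usual version flags:

$ ssh -V
$ rsync --version
$ java -version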

Step 1. Download software from a Hadoop distribution site.
http://hadoop.apache.org/core/releases.html

Step 2. Untar the file.
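For example (the archive name depends on the release you downloaded):

$ tar xzf hadoop-*.tar.gz
$ cd hadoop-*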

Step 3. Set JAVA_HOME in your_hadoop_dir/conf/hadoop-env.sh.
*note: I had JAVA_HOME defined in my .bashrc file, but I had to specify it again in hadoop-env.sh.
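That just means uncommenting or adding the export line in conf/hadoop-env.sh; the JDK path below is only a placeholder for wherever your Java install actually lives:

# conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.5.0_12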

Step 4. Now you can run the standalone operation as-is:
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*

Step 5. For the Pseudo-Distributed Operation, which runs each Hadoop daemon in a separate Java process on a single node so that it imitates a real distributed file system, you have to set up the configuration in conf/hadoop-site.xml.
The 'name' elements are defined by the Hadoop system, so you can just use the names from the example on the Hadoop page. I changed the value of fs.default.name to hdfs://localhost:54310, and that of mapred.job.tracker to localhost:54311; the resulting file is shown below.
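With those two changes, conf/hadoop-site.xml ends up looking roughly like this (only the two properties I mention above; the example on the Hadoop page may set a few more):

<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>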

Step 6. Check ssh localhost
In my case, I could not connect to localhost, but I could connect using my numerical IP address. I changed my /etc/hosts.allow to include ALL:127.0.0.1 and it started to recognize localhost.
If ssh asks for your passphrase, generate a passphraseless key:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

If you still prefer to use a passphrase, it will cause problems when starting the daemons.

Step 7. Format the namenode and start the daemons

$ bin/hadoop namenode -format

$ bin/start-all.sh

Now you can check your namenode at http://localhost:50070/

Also your job tracker is available at http://localhost:50030/

Step 8. Test functions

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

Run some of the examples provided:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*

Step 9. Stop the daemon

$ bin/stop-all.sh

Monday, September 8, 2008

Update OGCE file-manager portlet with new TACC classes

This update adds a sorted view to the directory listing. The JSP files and two Java classes are modified: FileManagerConstants.java and FileManagerPortlet.java.
The list of files updated follows:
portlets/comp-file-management/src/main/webapp/jsp/fileBrowser.jsp
portlets/comp-file-management/src/main/webapp/jsp/view.jsp
portlets/comp-file-management/src/main/webapp/css/fileManagement.css
portlets/comp-file-management/src/main/java/edu/tacc/gridport/portlets/interactive/FileManagementConstants.java
portlets/comp-file-management/src/main/java/edu/tacc/gridport/portlets/interactive/FileManagementPortlet.java
portlets/gp-common/src/main/webapp/jsp/fileBrowser.jsp - identical to fileBrowser.jsp above
portlets/gp-common/src/main/webapp/javascript/fileBrowser.js

* edited parts are marked as "lukas edit"

Thursday, August 21, 2008

TeraGrid access end-to-end test

End-to-end test of TeraGrid sites:
- Condor-G (grid universe); a sample submit description is sketched after the table below
- cap3 application
- stage the output file back
(08/22/2008 current)
==============================================================
[bigred] gatekeeper.iu.teragrid.org:2119/jobmanager-loadleveler yes
[steele] tg-steele.purdue.teragrid.org:2119/jobmanager-pbs yes
[sdsc(ds)] dslogin.sdsc.teragrid.org:2119/jobmanager-loadleveler job state write error
[mercury] https://grid-hg.ncsa.teragrid.org:2119/jobmanager-pbs yes
[ornl] tg-login.ornl.teragrid.org:2119/jobmanager-pbs yes
[lonestar] gatekeeper.lonestar.tacc.teragrid.org:2119/jobmanager-lsf job state read error
[cobalt] grid-co.ncsa.teragrid.org:2119/jobmanager-pbs yes
[pople] gram.pople.psc.teragrid.org:2119/jobmanager-pbs cannot login
[sdsc(dtf)]tg-login1.sdsc.teragrid.org:2119/jobmanager-pbs disk quota error
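For reference, a minimal Condor-G submit description for this kind of test looks roughly like the sketch below. Only the gatekeeper contact strings in the table above are real; the executable, input, and output file names here are placeholders:

# Condor-G submit description (sketch; placeholder file names)
universe      = grid
grid_resource = gt2 gatekeeper.iu.teragrid.org:2119/jobmanager-loadleveler
executable    = cap3
arguments     = est_input.fsa
transfer_input_files    = est_input.fsa
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_output_files   = est_input.fsa.cap.contigs
output = cap3.out
error  = cap3.err
log    = cap3.log
queue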

Friday, August 15, 2008

Limit of job submissions?

I'm gathering information about limits on job submissions.
http://kb.iu.edu/data/awyt.html
http://kb.iu.edu/data/axal.html
NCSA (Cobalt):
http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/SGIAltix/Doc/Jobs.html
qstat -Q
BigRed:
llclass

Wednesday, August 13, 2008

Java memory setup

To run the job submission service or the Swarm service, note that your environment variable should be set up as:
export JAVA_OPTS="-server -Xms512m -Xmx1024m -XX:MaxPermSize=256m"

Otherwise, your Java virtual machine will provide only 8 MB of memory, which might be too small for running the server.