Friday, November 14, 2008

Java Runtime Class with VM memory

For the Swarm service, I wrote a client kit that crawls a directory and finds cluster files which need to be assembled. To access the files and count the number of gene sequences, I used the Java Runtime/Process classes. With the 2 million sequences, I got around 600,000 clustered files, which the crawler program then visited. Especially when I also created DOM objects to interact with the Web service, the crawler started to throw an IOException with a memory-allocation error. My Java options were -Xmn512M -Xmx1024M.

On Linux, Runtime.exec() does a fork followed by an exec. That means the system momentarily needs roughly double the virtual memory (physical + swap) that the current Java process is using. Therefore, if I specify an initial heap of 512M, more than 1024M of virtual memory must be available every time the crawler launches a child process.

I changed my Java options to -Xmn256M -Xmx1024M. The crawler slowed down quite a bit, but it no longer threw the IO exception.
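The client kit itself isn't shown here, but a minimal sketch of the kind of Runtime.exec() call it makes looks like the following. The command line (counting FASTA-style '>' headers with grep) and the file layout are assumptions for illustration, not the actual crawler code.

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;

public class SequenceCounter {

    // Counts sequences in one cluster file by shelling out with Runtime.exec().
    // Each exec() forks the JVM first, so the OS must briefly have enough
    // virtual memory for a second copy of the whole Java process.
    public static int countSequences(File clusterFile) throws IOException, InterruptedException {
        // Hypothetical command: count lines containing a FASTA header marker '>'.
        String[] cmd = { "grep", "-c", ">", clusterFile.getAbsolutePath() };
        Process p = Runtime.getRuntime().exec(cmd);

        BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line = out.readLine();
        int count = (line == null) ? 0 : Integer.parseInt(line.trim());
        out.close();
        p.waitFor(); // reap the child process so it does not linger
        return count;
    }
}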

Friday, November 7, 2008

Test #3(2mil): Nov.07.2008

(1) Starting time: 04:00 pm

(2) Server Setup:

-BigRed : max 5

-Cobalt : max 5

(3) Client Setup:

-Total job max: 20

-Input source: EST Human 2mil

* Note

New Setup:
(1) Increased timeout from 20 secs to 6 mins. (axis2.xml)
(2) Decreased resource pool size
(3) Decreased client jobs

Test #2(2mil): Nov.07.2008

(1) Starting time: 02:05 pm

(2) Server Setup:
-BigRed : max 20
-Cobalt : max 20

(3) Client Setup:
-Total job max: 1000
-Input source: EST Human 2mil

* Note
New Setup:
(1) Increased timeout from 20 secs to 6 mins. (axis2.xml)
(2) Decreased resource pool size
(3) Decreased client jobs

Result:
(1) Job submission successfully done. (total 1000 jobs)

References:
First Condor job cluster ID: 50537

Test #1(2mil): Nov.07.2008

(1) Starting time: 12:54 pm

(2) Server Setup:
-BigRed : max 400
-Cobalt : max 200

(3) Client Setup:
-Max job submission: 2000000
-Input files source: EST Human 2mil

Note:
(1) Reading the directory with the 2 million sequences takes less than 20 secs.

(2) Client hung with an HTTP timeout.
(3) Cobalt started to hold jobs with Globus error code 17.

Wednesday, October 15, 2008

[ABSTRACT] Scheduling Large-scale Jobs over the Loosely-Coupled HPC Clusters

Compute-intensive scientific applications are heavily reliant on the available quantity of computing resources. The Grid paradigm provides a large-scale computing environment for scientific users. However, conventional Grid job submission tools do not provide a high-level job scheduling environment for these users across multiple institutions. For an extremely large number of jobs, a more scalable job scheduling framework that can leverage highly distributed clusters and supercomputers is required. In this presentation, we propose Swarm, a high-level job scheduling Web service framework. Swarm is developed for scientific applications that must submit a massive number of high-throughput jobs or workflows to highly distributed computing clusters. The Swarm service itself is designed to be extensible, lightweight, and easily installable on a desktop or small server. As a Web service, derivative services based on Swarm can be straightforwardly integrated with Web portals and science gateways. In this talk, we present the motivation for this research, the architecture of the Swarm framework, and a performance evaluation of the system prototype.

Tuesday, September 9, 2008

Installing Hadoop

These are my notes, written while following the installation documentation on Hadoop's web page.
http://hadoop.apache.org/core/docs/current/quickstart.html
I installed as root, but I'm not sure whether that is necessary.

Step 0. You need ssh, rsync, and a Java VM on your machine. I used:
1) ssh OpenSSH_4.3p2, OpenSSL 0.9.8b 04
2) rsync version 2.6.8
3) java 1.5.0_12

Step 1. Download software from a Hadoop distribution site.
http://hadoop.apache.org/core/releases.html

Step 2. Untar the file.

Step 3. Set JAVA_HOME in your_hadoop_dir/conf/hadoop-env.sh.
*note: I had JAVA_HOME defined in my .bashrc file, but I still had to specify it again in hadoop-env.sh.
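A typical line in conf/hadoop-env.sh looks like the one below; the JDK path is only an illustration, so point it at wherever your Java 1.5 installation actually lives.

export JAVA_HOME=/usr/java/jdk1.5.0_12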

Step 4. Now you can just run the standalone operation as-is.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*

Step 5. For pseudo-distributed operation, which runs each Hadoop daemon in its own Java virtual machine on a single node so that it imitates a real distributed file system, you have to set up the configuration in
conf/hadoop-site.xml
The 'name' elements are defined by the Hadoop system, so you can just use the names from the example on the Hadoop page. I changed the value of fs.default.name to hdfs://localhost:54310, and the value of mapred.job.tracker to localhost:54311.
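For reference, a minimal conf/hadoop-site.xml with just those two properties looks roughly like this (any other properties from the quickstart example are added the same way):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>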

Step 6. Check that ssh localhost works.
In my case, I could not connect to localhost, but I could connect to my numerical IP address. I changed my /etc/hosts.allow to include ALL:127.0.0.1, and it started to recognize localhost.
If it asks for your passphrase, generate a passphrase-less key:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

If you still prefer to use a passphrase, it will cause problems when starting the daemons.

Step 7. Format the namenode and start the daemons

$ bin/hadoop namenode -format

$ bin/start-all.sh

Now you can check your namenode at http://localhost:50070/

Also your job tracker is available at http://localhost:50030/

Step 8. Test functions

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

Run some of the examples provided:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*

Step 9. Stop the daemon

$ bin/stop-all.sh

Monday, September 8, 2008

Update OGCE:file-manager portlet with new tacc classes

This update adds a sorted view of the directory. JSP files and two Java classes are modified: FileManagementConstants.java and FileManagementPortlet.java.
The list of updated files follows.
portlets/comp-file-management/src/main/webapp/jsp/fileBrowser.jsp
portlets/comp-file-management/src/main/webapp/jsp/view.jsp
portlets/comp-file-management/src/main/webapp/css/fileManagement.css
portlets/comp-file-management/src/main/java/edu/tacc/gridport/portlets/interactive/FileManagementConstants.java
portlets/comp-file-management/src/main/java/edu/tacc/gridport/portlets/interactive/FileManagementPortlet.java
portlets/gp-common/src/main/webapp/jsp/fileBrowser.jsp - identical to fileBrowser.jsp above
portlets/gp-common/src/main/webapp/javascript/fileBrowser.js

* edited parts are noted as "lukas edit"