Friday, November 14, 2008
Java Runtime Class with VM memory
On Linux, Runtime.exec does a fork and exec. That means that, at the moment of the fork, you may need up to double the virtual memory (RAM + swap) your current Java process is using. Therefore, if I specify the initial heap as 512M, my total virtual memory must be bigger than 1024M.
I set my Java options to -Xmn256m -Xmx1024m. The crawler slowed down quite a bit, but it no longer threw an IOException.
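The call that triggers the fork can be seen in a minimal sketch; the `echo` child below is just a placeholder, since the post does not show the crawler's actual child command:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Sketch: on Linux, Runtime.exec() forks the whole JVM (heap included)
// before exec'ing the child, so virtual memory must briefly cover
// roughly twice the JVM's current size. "echo" is a placeholder child.
public class ForkDemo {
    public static void main(String[] args) throws Exception {
        Process p = Runtime.getRuntime().exec(new String[] {"echo", "hello"});
        BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()));
        System.out.println(r.readLine()); // prints "hello"
        p.waitFor();
    }
}
```

When the fork fails for lack of address space, exec throws an IOException (typically "Cannot allocate memory"), which matches the exception the crawler was hitting before the heap settings were changed.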
Friday, November 7, 2008
Test #3(2mil): Nov.07.2008
(1) Starting time: 04:00 pm
(2) Server Setup:
-BigRed : max 5
-Cobalt : max 5
(3) Client Setup:
-Total job max: 20
-Input source: EST Human 2mil
* Note
New Setup:
(1) Increased timeout from 20 secs to 6 mins. (axis2.xml)
(2) Decreased resource pool size
(3) Decreased client jobs
Test #2(2mil) : Nov.07.2008
(2) Server Setup:
-BigRed : max 20
-Cobalt : max 20
(3) Client Setup:
-Total job max: 1000
-Input source: EST Human 2mil
* Note
New Setup:
(1) Increased timeout from 20 secs to 6 mins. (axis2.xml)
(2) Decreased resource pool size
(3) Decreased client jobs
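The timeout increase in (1) is, as a sketch, an edit to the HTTP transport sender in axis2.xml; the parameter names below are the stock Axis2 ones and are an assumption about how our deployment was configured (360000 ms = 6 min):

```xml
<!-- axis2.xml sketch (assumed stock Axis2 parameter names):
     raise the HTTP sender's socket/connection timeouts to 6 min. -->
<transportSender name="http"
                 class="org.apache.axis2.transport.http.CommonsHTTPTransportSender">
    <parameter name="SO_TIMEOUT">360000</parameter>
    <parameter name="CONNECTION_TIMEOUT">360000</parameter>
</transportSender>
```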
Result:
(1) Job submission successfully done. (total 1000 jobs)
References:
First condorjob clusterID: 50537
Test #1(2mil): Nov.07.2008
(2) Server Setup:
-BigRed max 400
-Cobalt max 200
(3) Client Setup:
-Max job submission: 2000000
-Input files source: EST Human 2mil
Note:
(1) Reading a directory with 2mil sequences takes less than 20 secs.
(2) Client hung with an HTTP timeout.
(3) Cobalt started to hold jobs with Globus error code 17.
Wednesday, October 15, 2008
[ABSTRACT] Scheduling Large-scale Jobs over the Loosely-Coupled HPC Clusters
Tuesday, September 9, 2008
Installing Hadoop
http://hadoop.apache.org/core/docs/current/quickstart.html
I installed it as root, but I'm not sure that is necessary.
Step 0. You need ssh, rsync, and a Java VM on your machine. I used:
1) ssh OpenSSH_4.3p2, OpenSSL 0.9.8b 04
2) rsync version 2.6.8
3) java 1.5.0_12
Step 1. Download software from a Hadoop distribution site.
http://hadoop.apache.org/core/releases.html
Step 2. Untar file
Step 3. Reset JAVA_HOME in your_hadoop_dir/conf/hadoop-env.sh.
*Note: I had JAVA_HOME defined in my .bashrc file, but I had to specify it again in hadoop-env.sh.
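As a sketch, the line to set in conf/hadoop-env.sh looks like this; the JDK path below is a placeholder, not the path from my machine:

```sh
# conf/hadoop-env.sh -- the JDK path here is a placeholder; use your own.
export JAVA_HOME=/usr/lib/jvm/java-1.5.0
```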
Step 4. Now you can run the standalone operation as it is:
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
Step 5. For the Pseudo-Distributed Operation, which runs each Hadoop daemon as a separate Java process on a single node so that it imitates a real distributed file system, you have to set up the configuration in conf/hadoop-site.xml.
The 'name' elements are defined by the Hadoop system, so you can just use the names in the example from the Hadoop page. I changed the value of fs.default.name to hdfs://localhost:54310, and that of mapred.job.tracker to localhost:54311.
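With those two values, conf/hadoop-site.xml looks roughly like this; the dfs.replication property is taken from the Hadoop quickstart example rather than from my notes:

```xml
<!-- conf/hadoop-site.xml: pseudo-distributed setup.
     fs.default.name / mapred.job.tracker values are from the notes above;
     dfs.replication=1 follows the Hadoop quickstart example. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```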
Step 6. Check ssh localhost
In my case, I could not connect to localhost, but I could connect to my numerical IP address. I changed my /etc/hosts.allow to include ALL:127.0.0.1, and it started to recognize localhost.
If it asks for a passphrase, generate a passphraseless key:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
If you keep using a passphrase, it will cause problems when starting the daemons.
Step 7. Format a new distributed filesystem and start the daemons:
$ bin/hadoop namenode -format
$ bin/start-all.sh
Now you can check your namenode at http://localhost:50070/
Also your job tracker is available at http://localhost:50030/
Step 8. Test functions
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
Step 9. Stop the daemons
$ bin/stop-all.sh
Monday, September 8, 2008
Update OGCE: file-manager portlet with new TACC classes
The list of updated files follows.
portlets/comp-file-management/src/main/webapp/jsp/fileBrowser.jsp
portlets/comp-file-management/src/main/webapp/jsp/view.jsp
portlets/comp-file-management/src/main/webapp/css/fileManagement.css
portlets/comp-file-management/src/main/java/edu/tacc/gridport/portlets/interactive/FileManagementConstants.java
portlets/comp-file-management/src/main/java/edu/tacc/gridport/portlets/interactive/FileManagementPortlet.java
portlets/gp-common/src/main/webapp/jsp/fileBrowser.jsp - identical to fileBrowser.jsp above
portlets/gp-common/src/main/webapp/javascript/fileBrowser.js
* Edited parts are marked as "lukas edit"