Friday, November 7, 2008
Test #1(2mil): Nov.07.2008
(2) Server Setup:
-BigRed max 400
-Cobalt max 200
(3) Client Setup:
-Max job submission: 2000000
-Input files source: EST Human 2mil
Note:
(1) Reading a directory with 2mil sequences takes less than 20 secs.
(2) Client hung with an HTTP timeout.
(3) Cobalt started to hold jobs with Globus error code 17.
Wednesday, October 15, 2008
[ABSTRACT] Scheduling Large-scale Jobs over the Loosely-Coupled HPC Clusters
Tuesday, September 9, 2008
Installing Hadoop
http://hadoop.apache.org/core/docs/current/quickstart.html
I installed it as root, but I'm not sure that's necessary.
Step 0. You need ssh, rsync, and a Java VM on your machine. I used:
1) ssh: OpenSSH_4.3p2, OpenSSL 0.9.8b
2) rsync: version 2.6.8
3) java: 1.5.0_12
Step 1. Download software from a Hadoop distribution site.
http://hadoop.apache.org/core/releases.html
Step 2. Untar the file.
Step 3. Set JAVA_HOME in your_hadoop_dir/conf/hadoop-env.sh.
*note: I had JAVA_HOME defined in my .bashrc file, but I still had to set it again in hadoop-env.sh.
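The line I ended up setting in conf/hadoop-env.sh looked like this (the JDK path below is only an example; substitute your own install location):

```shell
# conf/hadoop-env.sh -- example path, use your own JDK location
export JAVA_HOME=/usr/lib/jvm/java-1.5.0
```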
Step 4. Now you can run the standalone operation as-is:
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
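The example job searches the input files for strings matching the given regex. As a quick local preview of what 'dfs[a-z.]+' matches (plain grep, not Hadoop; the sample file and its contents are made up for illustration):

```shell
# Preview the regex locally before running the Hadoop example.
mkdir -p input
printf 'dfs.replication\nmapred.job.tracker\n' > input/sample.txt
grep -Eo 'dfs[a-z.]+' input/sample.txt   # prints: dfs.replication
```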
Step 5. For pseudo-distributed operation, which runs each Hadoop daemon as a separate Java process on a single node to imitate a real distributed file system, you have to set up the configuration in conf/hadoop-site.xml.
The 'name' elements are defined by the Hadoop system, so you can just use the names from the example on the Hadoop page. I changed the value of fs.default.name to hdfs://localhost:54310, and the value of mapred.job.tracker to localhost:54311.
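For reference, a sketch of the resulting conf/hadoop-site.xml (the two values are the ones above; dfs.replication set to 1 is taken from the quickstart's pseudo-distributed example):

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```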
Step 6. Check that ssh localhost works.
In my case, I could not connect to localhost, but I could connect to my numeric IP address. I added ALL: 127.0.0.1 to /etc/hosts.allow and localhost started being recognized.
If it requires your passphrase:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
If you still prefer to use a passphrase, it will cause problems when starting the daemons.
Step 7. Format a new distributed filesystem and start the daemons:
$ bin/hadoop namenode -format
$ bin/start-all.sh
Now you can check your namenode at http://localhost:50070/
Also your job tracker is available at http://localhost:50030/
Step 8. Test functions
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
Step 9. Stop the daemons:
$ bin/stop-all.sh
Monday, September 8, 2008
Update OGCE file-manager portlet with new TACC classes
The list of updated files follows.
portlets/comp-file-management/src/main/webapp/jsp/fileBrowser.jsp
portlets/comp-file-management/src/main/webapp/jsp/view.jsp
portlets/comp-file-management/src/main/webapp/css/fileManagement.css
portlets/comp-file-management/src/main/java/edu/tacc/gridport/portlets/interactive/FileManagementConstants.java
portlets/comp-file-management/src/main/java/edu/tacc/gridport/portlets/interactive/FileManagementPortlet.java
portlets/gp-common/src/main/webapp/jsp/fileBrowser.jsp - identical to fileBrowser.jsp above
portlets/gp-common/src/main/webapp/javascript/fileBrowser.js
* Edited parts are noted as "lukas edit"
Thursday, August 21, 2008
TeraGrid access end-to-end test
- condorG (grid universe)
- cap3 apps
- stage output file
(08/22/2008 current)
==============================================================
[bigred] gatekeeper.iu.teragrid.org:2119/jobmanager-loadleveler yes
[steele] tg-steele.purdue.teragrid.org:2119/jobmanager-pbs yes
[sdsc(ds)] dslogin.sdsc.teragrid.org:2119/jobmanager-loadleveler job state write error
[mercury] https://grid-hg.ncsa.teragrid.org:2119/jobmanager-pbs yes
[ornl] tg-login.ornl.teragrid.org:2119/jobmanager-pbs yes
[lonestar] gatekeeper.lonestar.tacc.teragrid.org:2119/jobmanager-lsf job state read error
[cobalt] grid-co.ncsa.teragrid.org:2119/jobmanager-pbs yes
[pople] gram.pople.psc.teragrid.org:2119/jobmanager-pbs cannot login
[sdsc(dtf)] tg-login1.sdsc.teragrid.org:2119/jobmanager-pbs disk quota error
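A minimal Condor-G submit file for this kind of test might look like the sketch below. The gatekeeper contact is the BigRed one from the list above; the cap3 executable name and the input/output file names are hypothetical, stage your actual output file via transfer_output_files:

```
universe                = grid
grid_resource           = gt2 gatekeeper.iu.teragrid.org:2119/jobmanager-loadleveler
executable              = cap3
arguments               = input.fsa
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.fsa
transfer_output_files   = input.fsa.cap.contigs
output                  = cap3.out
error                   = cap3.err
log                     = cap3.log
queue
```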
Friday, August 15, 2008
Limit of job submissions?
http://kb.iu.edu/data/awyt.html
http://kb.iu.edu/data/axal.html
NCSA (Cobalt):
http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/SGIAltix/Doc/Jobs.html
qstat -Q (lists the PBS queues and their limits)
BigRed:
llclass (lists the LoadLeveler job classes and their limits)
Wednesday, August 13, 2008
Java memory setup
export JAVA_OPTS="-server -Xms512m -Xmx1024m -XX:MaxPermSize=256m"
Otherwise, the Java virtual machine will default to a small heap (8M in my case), which might be too small for running the server.