Friday, November 21, 2008

Test #4 (2mil): Nov.21.2008

Full sequence test for 2 million sequences.

(1) Resource setup: BigRed (150), Ornl (80), Cobalt (80)
(2) Service setup: Status check interval: 60 secs
Job queue scan interval: 60 secs
Job queue size: 100
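
For reference, a minimal sketch of how these settings could drive the service's polling loops; checkStatus/scanQueue are hypothetical hooks, not the actual swarm implementation:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SwarmServiceLoop {
    static final int STATUS_CHECK_INTERVAL_SECS = 60;
    static final int QUEUE_SCAN_INTERVAL_SECS = 60;
    static final int JOB_QUEUE_SIZE = 100;

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        // Poll job status on every resource every 60 seconds.
        scheduler.scheduleAtFixedRate(SwarmServiceLoop::checkStatus,
                0, STATUS_CHECK_INTERVAL_SECS, TimeUnit.SECONDS);
        // Scan the job queue every 60 seconds.
        scheduler.scheduleAtFixedRate(SwarmServiceLoop::scanQueue,
                0, QUEUE_SCAN_INTERVAL_SECS, TimeUnit.SECONDS);
    }

    static void checkStatus() { /* query each resource for job states */ }
    static void scanQueue()   { /* dispatch queued jobs, at most JOB_QUEUE_SIZE in flight */ }
}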
(3) Input files: 2mil.tar copied to each cluster's $TG_CLUSTER_SCRATCH directory.
Jobs refer to the input files by arguments containing their full paths.
(4) Output files: staged out to the swarm host

(5) Client side setup: input files are located on my desktop (the same machine as the swarm host).
The client scans the directory and finds files that contain more than one sequence (using the Unix grep command through Java Runtime), then sends requests to swarm in batches of 10 jobs per RPC call (a sketch of this loop follows the stats below).
  • Total duration of the submission: 170364307 milliseconds (around 47.3 hours).
  • Total number of jobs submitted: 75533
  • Total number of files scanned: 536825
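
A rough sketch of this client loop, assuming FASTA-style input (each sequence starts with '>'); submitBatch is a hypothetical stand-in for the real swarm RPC call:

import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class BatchSubmitter {
    static final int BATCH_SIZE = 10;

    public static void main(String[] args) throws Exception {
        File dir = new File(args[0]);
        List<String> batch = new ArrayList<String>();
        for (File f : dir.listFiles()) {
            if (countSequences(f) > 1) {
                batch.add(f.getAbsolutePath());
                if (batch.size() == BATCH_SIZE) {
                    submitBatch(batch);  // one RPC call per 10 jobs
                    batch.clear();
                }
            }
        }
        if (!batch.isEmpty()) submitBatch(batch);  // flush the last partial batch
    }

    // Counts sequence headers by forking grep, as in the test above.
    static int countSequences(File f) throws Exception {
        Process p = Runtime.getRuntime().exec(
                new String[] { "grep", "-c", ">", f.getAbsolutePath() });
        BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()));
        int count = Integer.parseInt(r.readLine().trim());
        p.waitFor();
        return count;
    }

    static void submitBatch(List<String> files) {
        // hypothetical: one RPC request to the swarm host carrying these jobs
    }
}

Note that each countSequences call forks a child process, which is the Runtime memory overhead flagged under (8) below.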
(6) Completed Jobs: To be added
(7) Held Jobs: To be added
(8) Open Issues:
Submission time needs to be improved.
Reasons:
  • Loading 536825 objects representing the filenames takes too much memory. [Approach]: Use a FileFilter and load a partial list at a time (see the sketch after this list).
  • Using Java Runtime: Java Runtime requires extra memory to execute the system fork. [Approach]: Try counting the sequences with a Java FileInputStream instead.
  • Running the client and the host on the same machine. [Approach]: Try running the client on a different machine.
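
A sketch of the first two fixes, under the same FASTA assumption: a FilenameFilter trims the directory listing before File objects are built (and can be applied in slices, e.g. by name prefix), and sequences are counted in-process with a FileInputStream instead of forking grep. The .fasta extension is an assumption:

import java.io.File;
import java.io.FileInputStream;
import java.io.FilenameFilter;
import java.io.IOException;

public class SequenceCounter {

    // Counts '>' characters at line starts without forking a child process.
    static int countSequences(File f) throws IOException {
        FileInputStream in = new FileInputStream(f);
        try {
            int count = 0;
            boolean atLineStart = true;
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (atLineStart && buf[i] == '>') count++;
                    atLineStart = (buf[i] == '\n');
                }
            }
            return count;
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args[0]);
        // The filter runs inside list(), so non-matching names are never kept.
        String[] names = dir.list(new FilenameFilter() {
            public boolean accept(File d, String name) {
                return name.endsWith(".fasta");  // hypothetical extension
            }
        });
        for (String name : names) {
            if (countSequences(new File(dir, name)) > 1) {
                // submit as before
            }
        }
    }
}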
