(1) Resource setup: BigRed (150), Ornl (80), Cobalt (80)
(2) Service setup:
- Status check interval: 60 secs
- Job queue scan interval: 60 secs
- Job queue size: 100
(3) Input files: 2mil.tar copied to each cluster under the $TG_CLUSTER_SCRATCH directory.
The input files are referenced in the job arguments by their full paths.
(4) Output files: staged out to the Swarm host
(5) Client side setup: input files are located on my desktop (the same machine as the Swarm host).
The client scans the directory and finds the files that contain more than one sequence (using the Unix grep command through Java Runtime), then sends the requests to Swarm in batches of 10 jobs per RPC call, as sketched below.
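A minimal sketch of this scan-and-batch loop, assuming FASTA-style inputs where each sequence header line starts with '>'. The class, method, and constant names (SwarmClientScan, countSequences, submitBatch, BATCH_SIZE) are illustrative only, and submitBatch() is just a placeholder for the real Swarm RPC call:

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch only: class, method, and constant names are made up.
    public class SwarmClientScan {

        static final int BATCH_SIZE = 10;   // 10 jobs per RPC call, as described above

        public static void main(String[] args) throws Exception {
            File dir = new File(args[0]);                  // directory holding the input files
            List<String> batch = new ArrayList<String>();

            for (File f : dir.listFiles()) {
                if (f.isFile() && countSequences(f) > 1) { // keep only multi-sequence files
                    batch.add(f.getAbsolutePath());
                    if (batch.size() == BATCH_SIZE) {
                        submitBatch(batch);
                        batch.clear();
                    }
                }
            }
            if (!batch.isEmpty()) {
                submitBatch(batch);                        // flush the remainder
            }
        }

        // Counts sequence headers by forking "grep -c" through Java Runtime,
        // mirroring the approach used in this experiment.
        static int countSequences(File f) throws Exception {
            Process p = Runtime.getRuntime().exec(
                    new String[] { "grep", "-c", ">", f.getAbsolutePath() });
            BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
            String line = r.readLine();
            p.waitFor();
            return line == null ? 0 : Integer.parseInt(line.trim());
        }

        static void submitBatch(List<String> files) {
            // Placeholder for the real Swarm submission RPC.
            System.out.println("Submitting batch of " + files.size() + " jobs");
        }
    }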
(6) Submission results:
- Total duration of the submission: 170364307 milliseconds (around 47.3 hours)
- Total number of jobs submitted: 75533
- Total number of files scanned: 536825
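For reference, a rough average from the numbers above (simple division, not a separately measured figure): 170364307 ms / 75533 jobs is about 2.3 seconds of submission time per job, and 170364307 ms / 536825 files is about 0.32 seconds per file scanned. This per-job overhead is what the open issues in (8) aim to reduce.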
(7) Held Jobs: To be added
(8) Open Issues:
Submission time needs to be improved.
Reasons:
- Loading 536825 objects representing the filenames takes too much memory. [Approach]: Use a FileFilter and load a partial list at a time (see the sketch after this list).
- Using Java Runtime: Java Runtime requires extra memory to fork the system process for grep. [Approach]: Try checking the number of sequences with a Java FileInputStream instead (see the sketch after this list).
- Running the client and the host on the same machine. [Approach]: Try running the client on a different machine.
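A minimal sketch of the first two approaches, again assuming FASTA-style inputs with '>' header lines; the class and method names (SequenceCheck, hasMultipleSequences, listMultiSequenceFiles) are illustrative only:

    import java.io.File;
    import java.io.FileFilter;
    import java.io.FileInputStream;
    import java.io.IOException;

    // Illustrative sketch of the two proposed changes: filter files while listing
    // (FileFilter) and count sequence headers in-process with a FileInputStream
    // instead of forking grep through Java Runtime.
    public class SequenceCheck {

        // Returns true as soon as a second header is seen, so the whole file
        // does not have to be read just to decide "more than one sequence".
        public static boolean hasMultipleSequences(File f) throws IOException {
            FileInputStream in = new FileInputStream(f);
            try {
                int headers = 0;
                boolean atLineStart = true;
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    for (int i = 0; i < n; i++) {
                        if (atLineStart && buf[i] == '>') {
                            if (++headers > 1) {
                                return true;
                            }
                        }
                        atLineStart = (buf[i] == '\n');
                    }
                }
                return false;
            } finally {
                in.close();
            }
        }

        // FileFilter variant: only multi-sequence files are kept while listing,
        // so File objects for single-sequence inputs are not retained.
        public static File[] listMultiSequenceFiles(File dir) {
            return dir.listFiles(new FileFilter() {
                public boolean accept(File f) {
                    try {
                        return f.isFile() && hasMultipleSequences(f);
                    } catch (IOException e) {
                        return false;   // skip unreadable files
                    }
                }
            });
        }
    }

Note that File.listFiles() still enumerates every directory entry internally; the sketch only reduces what is retained afterwards, so loading the list in smaller portions would need additional work on top of this.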