For swarm service, I worte a client kit to crawl a directory and find cluster files which need to be assembled. To access the files and count the number of gene sequences, I used Java Runtime Process class. With the 2 million sequences, I could get around 600000 clustered files. Those clustered files were visited by the crawler program. Especially when I tried to create DOM object to interact with Web service, this crawler started to throw IO exception-memory allocation error. My Java option was -Xmn 512M -Xmx 1024M.
On linux, Runtime.exec does a fork and execute. That means that you will need double what your current java process is using in virtual memory(real + swap). Therefore, if I specify initial heap as 512M then my total heap must be bigger than 1024M.
I tried to set my java option to -Xmn 256M -Xmx 1024M. Crawler was slowed down quite a bit, but it did not throw IO exception anymore.