Abstract
Scientists are increasingly using the current state of the art big data analytic software (e.g., Hadoop, Giraph, etc.) for their data-intensive applications over HPC environment. However, understanding and designing the hardware environment that these data- and compute-intensive applications require for good performance is challenging. With this motivation, we evaluated the performance of big data software over three different distributed-cyber-infrastructures, including a traditional HPC-cluster called SuperMikeII, a regular datacenter called SwatIII, and a novel MicroBrick-based hyperscale system called CeresII, using our own benchmark Parallel Genome Assembler (PGA). PGA is developed atop Hadoop and Giraph and serves as a good real-world example of a data- as well as compute-intensive workload. To evaluate the impact of both individual hardware components as well as overall organization, we changed the configuration of SwatIII in different ways. Comparing the individual impact of different hardware components (e.g., network, storage and memory) over different clusters, we observed 70% improvement in the Hadoop-workload and almost 35% improvement in the Giraph-workload in SwatIII over SuperMikeII by using SSD (thus, increasing the disk I/O rate) and scaling it up in terms of memory (which increases the caching). Then, we provide significant insight on efficient and cost-effective organization of these hardware components. Here, The MicroBrick-based CeresII prototype shows similar performance as SuperMikeII while giving more than 2-times improvement in performance/$ in the entire benchmark test.
Recommended Citation
A. K. Das et al., "Evaluating Different Distributed-cyber-infrastructure For Data And Compute Intensive Scientific Application," Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015, pp. 134 - 143, article no. 7363750, Institute of Electrical and Electronics Engineers, Dec 2015.
The definitive version is available at https://doi.org/10.1109/BigData.2015.7363750
Department(s)
Computer Science
International Standard Book Number (ISBN)
978-147999925-5
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2024 Institute of Electrical and Electronics Engineers, All rights reserved.
Publication Date
22 Dec 2015
Comments
National Sleep Foundation, Grant 1341008