Computer Science Faculty Research & Creative Works

Evaluating Different Distributed-cyber-infrastructure For Data And Compute Intensive Scientific Application

Arghya Kusum Das
Seung Jong Park, Missouri University of Science and Technology
Jaeki Hong
Wooseok Chang

Abstract

Scientists are increasingly using current state-of-the-art big data analytics software (e.g., Hadoop and Giraph) for their data-intensive applications in HPC environments. However, understanding and designing the hardware environment that these data- and compute-intensive applications require for good performance is challenging. With this motivation, we evaluated the performance of big data software on three different distributed-cyber-infrastructures: a traditional HPC cluster called SuperMikeII, a regular datacenter called SwatIII, and a novel MicroBrick-based hyperscale system called CeresII, using our own benchmark, the Parallel Genome Assembler (PGA). PGA is built atop Hadoop and Giraph and serves as a good real-world example of a workload that is both data- and compute-intensive. To evaluate the impact of both the individual hardware components and their overall organization, we varied the configuration of SwatIII in several ways. Comparing the individual impact of different hardware components (e.g., network, storage, and memory) across clusters, we observed a 70% improvement in the Hadoop workload and an almost 35% improvement in the Giraph workload on SwatIII over SuperMikeII by using SSDs (thus increasing the disk I/O rate) and scaling up memory (which increases caching). We then provide insight into the efficient and cost-effective organization of these hardware components. Here, the MicroBrick-based CeresII prototype shows performance similar to that of SuperMikeII while delivering a more than 2-times improvement in performance/$ across the entire benchmark test.
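The performance/$ comparison mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the metric, so the sketch assumes that performance is the reciprocal of execution time and that cost is a single dollar figure per cluster. The function names and all numbers below are hypothetical and are not taken from the paper.

```python
def performance_per_dollar(runtime_seconds: float, cost_dollars: float) -> float:
    """Return performance/$, assuming performance = 1 / runtime (an assumption)."""
    if runtime_seconds <= 0 or cost_dollars <= 0:
        raise ValueError("runtime and cost must be positive")
    return 1.0 / (runtime_seconds * cost_dollars)


def relative_improvement(baseline: float, candidate: float) -> float:
    """Return how many times better the candidate metric is than the baseline."""
    return candidate / baseline


# Hypothetical figures: both clusters finish in the same time,
# but the candidate cluster costs half as much as the baseline.
baseline = performance_per_dollar(runtime_seconds=1000.0, cost_dollars=200.0)
candidate = performance_per_dollar(runtime_seconds=1000.0, cost_dollars=100.0)
ratio = relative_improvement(baseline, candidate)
print(f"performance/$ improvement: {ratio:.1f}x")
```

Under this assumed definition, two systems with similar runtimes differ in performance/$ only through their cost, which is one way to read the abstract's statement about similar performance alongside improved performance/$.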

Recommended Citation

A. K. Das et al., "Evaluating Different Distributed-cyber-infrastructure For Data And Compute Intensive Scientific Application," Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015, pp. 134 - 143, article no. 7363750, Institute of Electrical and Electronics Engineers, Dec 2015.

The definitive version is available at https://doi.org/10.1109/BigData.2015.7363750

Department(s)

Computer Science

Comments

National Science Foundation, Grant 1341008

International Standard Book Number (ISBN)

978-1-4799-9925-5

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Rights

Publication Date

22 Dec 2015

Included in

Computer Sciences Commons
