Network-aware Scheduling Of Mapreduce Framework On Distributed Clusters Over High Speed Networks

Abstract

Google's MapReduce has gained significant popularity as a platform for large-scale distributed data processing. Hadoop [1] is an open-source implementation of the MapReduce [11] framework. It was originally developed to operate in a single-cluster environment and could not be leveraged for distributed data processing across federated clusters. When multiple federated clusters are connected by high-speed networks, computing resources are provisioned from any of the clusters in the federation. Placing each map task close to its data split is critical for Hadoop's performance. In this work, we add network awareness to Hadoop when scheduling map tasks over federated clusters. We observe a 12% to 15% reduction in execution time with Hadoop's FIFO and FAIR schedulers for varying workloads. Copyright 2012 ACM.

Recommended Citation

P. Kondikoppa et al., "Network-aware Scheduling Of Mapreduce Framework On Distributed Clusters Over High Speed Networks," FederatedClouds'12 - Proceedings of the 2012 Workshop on Cloud Services, Federation, and the 8th Open Cirrus Summit, Co-located with ICAC'12, pp. 38 - 43, Association for Computing Machinery, Oct 2012.

The definitive version is available at https://doi.org/10.1145/2378975.2378985

Department(s)

Computer Science

Keywords and Phrases

Federated clouds; Hadoop scheduling

International Standard Book Number (ISBN)

978-145031754-2

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Rights

Publication Date

30 Oct 2012

Computer Science Faculty Research & Creative Works
