Abstract

Forecasts of daily pollutant levels have become a standard part of weather reports on television, online, and in newspapers. Research groups also need to analyze longer time frames across more locations to correlate long-term developments in different pollutants with multiple serious health effects such as asthma. This paper compares the Hadoop MapReduce and Spark programming models for air quality simulations, to guide future code development by research groups interested in these analyses. Two use cases are considered: (i) calculating the eight-hour rolling average of pollutants in a restricted region, and (ii) identifying clusters of sensors showing similar patterns in pollutant concentration over multiple years in the state of Texas. The data set is air pollution data collected over fifteen years at 179 monitor sites across the state of Texas for a variety of pollutants. Our results reveal 20-25% performance benefits for the Spark solutions over MapReduce. Furthermore, the paper documents performance benefits of the Spark MLlib machine learning library over the Mahout library, which is based on the MapReduce programming model.
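As an illustration of the first use case, the eight-hour rolling average can be sketched on a single machine in plain Python. This is not the authors' Hadoop MapReduce or Spark code. The function name `rolling_mean`, the use of `None` for missing readings, and the `min_valid` threshold are assumptions made for this example; only the eight-hour window comes from the abstract.

```python
from collections import deque
from typing import Iterable, Iterator, Optional


def rolling_mean(hourly: Iterable[Optional[float]],
                 window: int = 8,
                 min_valid: int = 6) -> Iterator[Optional[float]]:
    """Yield the trailing `window`-hour mean for each hourly reading.

    `None` marks a missing reading. A mean is reported only once the
    window is full and holds at least `min_valid` readings; that
    completeness rule is an assumption, not taken from the paper.
    """
    buf: deque = deque(maxlen=window)  # keeps only the last `window` values
    for value in hourly:
        buf.append(value)
        valid = [v for v in buf if v is not None]
        if len(buf) == window and len(valid) >= min_valid:
            yield sum(valid) / len(valid)
        else:
            yield None


if __name__ == "__main__":
    # Hypothetical hourly readings for one sensor and one pollutant.
    readings = [30.0, 32.0, None, 35.0, 40.0, 42.0, 41.0, 38.0, 36.0]
    for hour, avg in enumerate(rolling_mean(readings)):
        print(hour, avg)
```

In a distributed setting the same computation would typically be grouped by sensor and pollutant, so that each group's time series is processed independently.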

Recommended Citation

H. Ayyalasomayajula et al., "Air Quality Simulations using Big Data Programming Models," Proceedings - 2016 IEEE 2nd International Conference on Big Data Computing Service and Applications, BigDataService 2016, pp. 182-184, article no. 7474371, Institute of Electrical and Electronics Engineers, May 2016.

The definitive version is available at https://doi.org/10.1109/BigDataService.2016.26

Department(s)

Engineering Management and Systems Engineering

Comments

National Science Foundation, Grant CRI-0958464

Keywords and Phrases

Air Quality Simulations; MapReduce; Spark

International Standard Book Number (ISBN)

978-150902251-9

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Publication Date

19 May 2016

Included in

Operations Research, Systems Engineering and Industrial Engineering Commons

Engineering Management and Systems Engineering Faculty Research & Creative Works

Air Quality Simulations using Big Data Programming Models
