Abstract
Forecasts of daily pollutant levels have become a standard part of weather predictions on television, online, and in newspapers. Research groups also need to analyze longer timeframes across more locations to correlate long-term developments for different pollutants with serious health effects such as asthma. This paper compares the Hadoop MapReduce and Spark programming models for air quality simulations, with the aim of guiding future code development for research groups interested in these analyses. Two use cases are considered: (i) calculating the eight-hour rolling average of pollutants in a restricted region, and (ii) identifying clusters of sensors that show similar patterns in pollutant concentration over multiple years in the state of Texas. The data set used in this analysis is air pollution data collected over fifteen years at 179 monitor sites across the state of Texas for a variety of pollutants. Our results reveal 20-25% performance benefits for the Spark solutions over MapReduce. Furthermore, the paper documents performance benefits of the Spark MLlib machine learning library over the Mahout library, which is based on the MapReduce programming model.
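The first use case, an eight-hour rolling average per monitor site, can be illustrated with a minimal sketch in plain Python. This is not the paper's MapReduce or Spark code. The function name `rolling_8h_average`, the `(site_id, hour_index, value)` record layout, and the choice of a trailing window that is emitted only when all eight hourly readings are present are assumptions made for illustration.

```python
from collections import defaultdict, deque


def rolling_8h_average(readings, window=8):
    """Trailing rolling mean per site (hypothetical helper, not from the paper).

    readings: iterable of (site_id, hour_index, value) tuples, where
    hour_index is an integer count of hours (assumed record layout).
    Returns {(site_id, hour_index): mean of the last `window` hours}.
    """
    # "Map" step: key every reading by its monitor site.
    by_site = defaultdict(list)
    for site, hour, value in readings:
        by_site[site].append((hour, value))

    # "Reduce" step: slide a window over each site's time-ordered series.
    result = {}
    for site, series in by_site.items():
        series.sort()
        buf = deque()
        total = 0.0
        for hour, value in series:
            buf.append((hour, value))
            total += value
            # Drop readings that fall outside the trailing window.
            while buf[0][0] <= hour - window:
                _, old_value = buf.popleft()
                total -= old_value
            # Emit only when every hour in the window has a reading.
            if len(buf) == window:
                result[(site, hour)] = total / window
    return result


# Ten consecutive hourly readings (values 1..10) for one hypothetical site.
sample = [("A", h, float(h + 1)) for h in range(10)]
averages = rolling_8h_average(sample)
```

Grouping by site first mirrors the key-based partitioning that both programming models rely on, so each site's window can be computed independently of the others.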
Recommended Citation
H. Ayyalasomayajula et al., "Air Quality Simulations using Big Data Programming Models," Proceedings 2016 IEEE 2nd International Conference on Big Data Computing Service and Applications, BigDataService 2016, pp. 182-184, article no. 7474371, Institute of Electrical and Electronics Engineers, May 2016.
The definitive version is available at https://doi.org/10.1109/BigDataService.2016.26
Department(s)
Engineering Management and Systems Engineering
Keywords and Phrases
Air Quality Simulations; MapReduce; Spark
International Standard Book Number (ISBN)
978-150902251-9
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2025 Institute of Electrical and Electronics Engineers, All rights reserved.
Publication Date
19 May 2016

Comments
National Science Foundation, Grant CRI-0958464