Reproducibility of Survey Results: A New Method to Quantify Similarity of Human Subject Pools


Smart Connected Communities (SCCs) are a novel paradigm that brings together multiple disciplines, including the social sciences, computer science, and engineering. Large-scale surveys are a fundamental tool for understanding the needs of human populations and the impact of new technologies on them, both of which are necessary to realize the SCC paradigm. However, there is a growing debate regarding the reproducibility of survey results. For example, it has been shown that surveys can easily produce contradictory results even when the subject populations are statistically equivalent from a demographic perspective. In this paper, we take initial steps toward addressing the reproducibility of survey results by providing formal methods to quantitatively justify apparently inconsistent results. Specifically, we define a new dissimilarity metric between two populations based on the users' answers to non-demographic questions. To this end, we propose two algorithms, based on submodular optimization and information theory, respectively, to select the most representative questions in a survey. Results show that our method effectively identifies and quantifies differences that are not evident from a purely demographic point of view.
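
The abstract does not give the definition of the dissimilarity metric, so the following is only an illustrative sketch of the general idea of comparing two subject pools through their answers to non-demographic questions. It assumes, purely for illustration, that each question is scored by the Jensen-Shannon divergence between the two pools' answer distributions and that the per-question scores are averaged; this is not the paper's metric, and the function names (answer_distribution, js_divergence, population_dissimilarity) and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def answer_distribution(answers, options):
    """Empirical distribution of answers over a fixed list of options."""
    if not answers:
        raise ValueError("a population must provide at least one answer")
    counts = Counter(answers)
    return [counts[option] / len(answers) for option in options]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), a value between 0 and 1."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


def population_dissimilarity(pool_a, pool_b, options):
    """Average per-question divergence over the questions both pools answered.

    pool_a, pool_b: dict mapping question id -> list of answers.
    options: dict mapping question id -> list of allowed answers.
    This averaging rule is an assumption of the sketch, not the paper's metric.
    """
    shared = sorted(set(pool_a) & set(pool_b))
    if not shared:
        raise ValueError("the two pools share no questions")
    scores = {
        q: js_divergence(
            answer_distribution(pool_a[q], options[q]),
            answer_distribution(pool_b[q], options[q]),
        )
        for q in shared
    }
    return sum(scores.values()) / len(scores), scores


# Toy data: two pools that agree on q1 and disagree completely on q2.
options = {"q1": ["yes", "no"], "q2": ["yes", "no"]}
pool_a = {"q1": ["yes", "no"], "q2": ["yes", "yes"]}
pool_b = {"q1": ["no", "yes"], "q2": ["no", "no"]}
overall, per_question = population_dissimilarity(pool_a, pool_b, options)
```

In this toy case the two pools are indistinguishable on q1 and maximally different on q2, so a per-question breakdown shows where the pools diverge even when a demographic comparison would not.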

Meeting Name

2020 IEEE Global Communications Conference, GLOBECOM 2020 (2020: Dec. 7-11, Taipei, Taiwan)


Psychological Science


National Science Foundation, Grant CPS-1943035

Keywords and Phrases

Dissimilarity Metrics; Reproducibility; Surveys

Document Type

Article - Conference proceedings

© 2020 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.

Publication Date

11 Dec 2020