We present a framework for an explainable and statistically validated ensemble clustering model applied to Traumatic Brain Injury (TBI). The objective of our analysis is to identify patient subgroups of injury severity, and the key phenotypes that delineate those subgroups, using varied clinical and computed tomography data. Explainable and statistically validated models are essential because data-driven identification of subgroups is an inherently multidisciplinary undertaking. In our case, this procedure yielded six distinct patient subgroups with respect to mechanism of injury, severity of presentation, anatomy, psychometrics, and functional outcome. The framework integrates statistical methods at several stages of the analysis to enhance the quality and explainability of the results. The methodology is applicable to other clinical data sets that exhibit significant heterogeneity, as well as to other data science applications in biomedicine and elsewhere.


Electrical and Computer Engineering

Second Department

Computer Science

Third Department

Mathematics and Statistics

Research Center/Lab(s)

Center for High Performance Computing Research

Second Research Center/Lab

Intelligent Systems Center


Army Research Laboratory, Grant W911NF-14-2-0034

Keywords and Phrases

Canonical Discriminant Analysis; Clustering; Ensemble Learning; Explainable AI; Hybrid Human-Machine Systems; Mixed Models; Multicollinearity; Precision Medicine

Document Type

Article - Journal

Document Version

Final Version

© 2020 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.

Creative Commons Licensing

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Publication Date

28 Sep 2020