Data Mining and Machine Learning Retention Models in Higher Education


This study presents a systematic review of the literature on predicting student retention in higher education through machine learning algorithms, based on measures such as dropout risk, attrition risk, and completion risk. A systematic review methodology was employed, comprising a review protocol, requirements for study selection, and analysis of paper classification. The review aims to answer the following research questions: (1) What techniques are currently used to predict student retention rates? (2) Which techniques have shown better performance under specific contexts? (3) Which factors influence the prediction of completion rates in higher education? (4) What are the challenges with predicting student retention? Increasing student retention in higher education is critical to increasing graduation rates. Further, predicting student retention provides insight into opportunities for intentional student advising. The review provides a research perspective on predicting student retention using machine learning through several key findings, such as the identification of the factors utilized in past studies and the methodologies used for prediction. These findings can be used to develop more comprehensive studies that further increase prediction capability and, therefore, to develop strategies to improve student retention.


Engineering Management and Systems Engineering

Keywords and Phrases

data mining; education; machine learning; retention

International Standard Serial Number (ISSN)

1541-4167; 1521-0251

Document Type

Article - Journal

© 2023 SAGE Publications, All rights reserved.

Publication Date

01 May 2023