40% of Success is Showing Up: Random Forest Classification of Academic Success Using Non-Grade Features
In our recent project, we explored how Random Forests can classify high school students’ academic success using non-grade features. This investigation aimed to identify key predictors that could help educational institutions better support their students.
Project Overview
Our dataset, sourced from Kaggle, included detailed records of 2,392 high school students. We focused on predicting the ‘grade class,’ which categorizes students’ GPAs into five ranges: A, B, C, D, and F. The features used in our analysis spanned demographics, study habits, parental involvement, and extracurricular activities.
Correlation Analysis
Our correlation analysis showed that the number of absences was the feature most strongly associated with academic success. GPA had a correlation of -0.92 with absences, and grade class had a correlation of 0.73 with absences. The opposite signs suggest that grade class is encoded so that higher values correspond to lower grades. The heatmap below illustrates the correlation matrix between all features in our dataset.
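To make the correlation step concrete, here is a minimal sketch using pandas. It runs on synthetic data, not the Kaggle records, and the column names (Absences, StudyTimeWeekly, GPA, GradeClass), the GPA cut-offs, and the 0 = A through 4 = F encoding are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the student records (NOT the real dataset).
rng = np.random.default_rng(0)
n = 500
absences = rng.integers(0, 30, size=n)
study_time = rng.uniform(0, 20, size=n)
gpa = np.clip(
    3.6 - 0.1 * absences + 0.03 * study_time + rng.normal(0, 0.2, size=n),
    0.0,
    4.0,
)

# Assumed encoding: 0 = A (GPA >= 3.5) down to 4 = F (GPA < 2.0).
grade_class = 4 - np.digitize(gpa, [2.0, 2.5, 3.0, 3.5])

df = pd.DataFrame(
    {
        "Absences": absences,
        "StudyTimeWeekly": study_time,
        "GPA": gpa,
        "GradeClass": grade_class,
    }
)

# Pearson correlation matrix; this is the table a heatmap would visualize.
corr = df.corr()

# Correlation of every other column with absences, strongest first.
absences_corr = corr["Absences"].drop("Absences")
order = absences_corr.abs().sort_values(ascending=False).index
absences_corr = absences_corr[order]
print(absences_corr)
```

With this encoding, absences correlate negatively with GPA and positively with grade class, which matches the sign pattern reported above.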
Model Development
We implemented a Random Forest classifier, which is well-suited for classification tasks like ours. A Random Forest trains many decision trees on bootstrap samples of the data and aggregates their predictions, which reduces the risk of overfitting compared with a single tree. We tuned the model with GridSearchCV, and the final configuration used 200 estimators, a maximum depth of 10, and the Gini impurity criterion.
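The tuning setup can be sketched with scikit-learn as follows. This is an illustration under assumptions, not our exact pipeline: the data is a synthetic five-class problem from make_classification, and the parameter grid, the 80/20 split, and the 3-fold cross-validation are hypothetical choices. Only the final values (200 estimators, depth 10, Gini) come from the project itself.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic five-class data standing in for the student features.
X, y = make_classification(
    n_samples=600,
    n_features=12,
    n_informative=6,
    n_classes=5,
    n_clusters_per_class=1,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid; it includes the reported final configuration.
param_grid = {
    "n_estimators": [50, 200],
    "max_depth": [5, 10],
    "criterion": ["gini"],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# GridSearchCV refits the best configuration on the full training set,
# so score() evaluates that refitted model on the held-out split.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Scoring on a split that the grid search never saw keeps the reported accuracy separate from the data used to choose the hyperparameters.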
Hyperparameter Tuning
To get the best performance from the model, we tuned its hyperparameters systematically. The plots below show how model accuracy changes with individual hyperparameters, including the number of estimators, the maximum depth, and the minimum number of samples required to split a node.
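One way to produce the numbers behind such plots is scikit-learn's validation_curve, which varies a single hyperparameter and records training and cross-validated scores. The sketch below is an assumption about how this could be done, using synthetic data and a hypothetical range of max_depth values. It prints the mean scores instead of plotting them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

# Synthetic five-class data standing in for the student features.
X, y = make_classification(
    n_samples=600,
    n_features=12,
    n_informative=6,
    n_classes=5,
    n_clusters_per_class=1,
    random_state=0,
)

# Hypothetical range of tree depths to compare.
depths = [2, 5, 10, 20]

# Each returned array has shape (len(depths), number of CV folds).
train_scores, cv_scores = validation_curve(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X,
    y,
    param_name="max_depth",
    param_range=depths,
    cv=3,
    scoring="accuracy",
)

mean_train = train_scores.mean(axis=1)
mean_cv = cv_scores.mean(axis=1)
for depth, tr, cv in zip(depths, mean_train, mean_cv):
    print(f"max_depth={depth:>2}  train={tr:.3f}  cv={cv:.3f}")
```

Plotting mean_train and mean_cv against depths gives an accuracy curve for this hyperparameter. A widening gap between the two lines is a common sign of overfitting.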
Feature Significance
The Random Forest model also let us assess how much each feature contributes to its predictions of student performance. As shown in the bar plot below, absences emerged as the most important factor, accounting for over 40% of the model’s total feature importance. Other significant features included weekly study time, parental support, and parental education.
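For readers who want to reproduce this kind of ranking, scikit-learn exposes impurity-based importances through the feature_importances_ attribute of a fitted forest. These values sum to 1, so a figure like "over 40%" is a share of the total. The sketch below uses synthetic data in which absences are made to dominate by construction. The column names, coefficients, and grade cut-offs are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic features (NOT the real dataset); names are assumed.
rng = np.random.default_rng(0)
n = 800
X = pd.DataFrame(
    {
        "Absences": rng.integers(0, 30, size=n),
        "StudyTimeWeekly": rng.uniform(0, 20, size=n),
        "ParentalSupport": rng.integers(0, 5, size=n),
        "ParentalEducation": rng.integers(0, 5, size=n),
    }
)

# Synthetic GPA in which absences dominate by construction.
gpa = np.clip(
    3.4
    - 0.1 * X["Absences"]
    + 0.03 * X["StudyTimeWeekly"]
    + 0.05 * X["ParentalSupport"]
    + rng.normal(0, 0.2, size=n),
    0.0,
    4.0,
)

# Assumed encoding: 0 = A (GPA >= 3.5) down to 4 = F (GPA < 2.0).
y = 4 - np.digitize(gpa, [2.0, 2.5, 3.0, 3.5])

model = RandomForestClassifier(
    n_estimators=200, max_depth=10, criterion="gini", random_state=0
)
model.fit(X, y)

# Impurity-based importances, sorted from most to least important.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances can favor features with many distinct values, so it can be worth comparing them with permutation importance before drawing strong conclusions.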
Conclusions
Our findings suggest that non-grade features can predict academic success reasonably well: the model reached nearly 70% accuracy on the five-class grade prediction task. The insights gained, particularly the importance of student attendance, could be valuable for schools seeking to identify and support at-risk students.
Acknowledgements
We extend our gratitude to Emily Rothenberg for her guidance and to Isaac Chang for his creative contributions. Special thanks to our peers and volunteers who supported us throughout this project.