Jackson McBride

8 object(s)
 

Random Forest Classification of Academic Success

40% of Success is Showing Up: Random Forest Classification of Academic Success Using Non-Grade Features

In our recent project, we explored how Random Forests can classify high school students’ academic success using non-grade features. This investigation aimed to identify key predictors that could help educational institutions better support their students.

Project Overview

Our dataset, sourced from Kaggle, included detailed records of 2,392 high school students. We focused on predicting the ‘grade class,’ which categorizes students’ GPAs into five ranges: A, B, C, D, and F. The features used in our analysis spanned demographics, study habits, parental involvement, and extracurricular activities.

Correlation Analysis

Through our correlation analysis, we found that the number of absences was the most significant predictor of academic success. Notably, GPA had a correlation of -0.92 with absences, and grade class had a correlation of 0.73 with absences. The heatmap below illustrates the correlation matrix between all features in our dataset.

Correlation Heatmap

Model Development

We implemented a Random Forest classifier, which is well-suited for classification tasks like ours. Random Forests create multiple decision trees from bootstrap samples of the data, reducing the risk of overfitting. Our model was optimized using GridSearchCV, and the final configuration included 200 estimators, a maximum depth of 10, and the Gini impurity criterion.

Hyperparameter Tuning

To ensure our model was performing at its best, we conducted extensive hyperparameter tuning. The plots below show the impact of different hyperparameters on model accuracy, such as the number of estimators, maximum depth, and minimum samples split.

Hyperparameter Tuning

Feature Significance

The Random Forest model allowed us to assess the importance of each feature in predicting student performance. As shown in the bar plot below, absences emerged as the most critical factor, contributing to over 40% of the model’s overall feature importance. Other significant features included weekly study time, parental support, and parental education.

Feature Significance

Conclusions

Our findings demonstrate that non-academic features can effectively predict academic success, achieving nearly 70% accuracy. The insights gained, particularly the importance of student attendance, could be valuable for schools seeking to identify and support at-risk students.

View Full PDF

Acknowledgements

We extend our gratitude to Emily Rothenberg for her guidance and to Isaac Chang for his creative contributions. Special thanks to our peers and volunteers who supported us throughout this project.


I hope you enjoy this site. I made it based on a cool site that I saw online: https://h01000110.github.io/20170917/windows-95

ᛅᛘᛚᛁᚦ