Students’ Class Performance Prediction Using Machine Learning Classifiers

Nowadays, educational data mining is being employed as an assessment tool for studying and analyzing hidden patterns in academic databases, which can be used to predict students' academic performance. This paper implements various machine learning classification techniques on students' academic records for result prediction. For this purpose, data of MS(CS) students were collected from a public university of Pakistan through their assignments, quizzes, and sessional marks. The WEKA data mining tool was used for all experiments, namely data pre-processing, classification, and visualization. For performance measurement, classifier models were trained with 3-fold and 10-fold cross-validation to evaluate their accuracy. The results show that the bagging classifier combined with support vector machines outperforms the other classifiers in terms of accuracy, precision, recall, and F-measure. The obtained outcomes confirm that our research makes a significant contribution to the prediction of students' academic performance, which can ultimately be used to assist faculty members in helping low-performing students improve their academic records.


Introduction
Data mining is the process of extracting valuable, explicit, and nontrivial knowledge from large data repositories. Its main purpose is to query large databases in order to find novel and useful patterns that could assist in better decision-making for future events and that might otherwise remain unknown [1]. Nowadays, data mining is used in vast areas such as e-commerce, bioinformatics, recommendation systems, outlier analysis, and other scientific applications. In this research, our focus is on its use in education, which is termed Educational Data Mining.
Educational Data Mining (EDM) is an emerging discipline concerned with developing methods for exploring the unique and increasingly large-scale data that come from educational settings, and using those methods to better understand students for future learning [2]. Researchers in educational data mining analyze students' failures, attrition, and the prediction of student performance. A key area of EDM application is to improve the student model by identifying the various characteristics and attributes which play a seminal role in it [3]. Currently, it is a hot area among researchers due to its potential benefits in the education sector, as a better model can uncover the hidden knowledge in academic data [4]. For this purpose, many supervised and unsupervised machine learning techniques have been proposed in the past to develop an effective model for predicting students' performance [5]. Implementing data mining techniques in higher education institutes therefore plays a paramount role in improving the quality of education and the learning experience of students. Moreover, students are important assets in the higher education system, and their timely graduation plays a dominant role for the market. It is crucial for institutes to detect the factors that adversely affect students' performance [6]. Many public and private universities collect a large amount of academic data but do not utilize it in a manner that could provide valuable information to education decision-makers in order to plan, evaluate, and improve their educational programs. Therefore, there is a huge gap in the usage of EDM techniques in academics to improve the quality of education and the research capabilities of students [7].
The main aim of this research is to predict the overall class score of students before the final-term examinations using their assignment, quiz, and mid-term scores. For this purpose, the past results of the students were collected from a public-sector university in Pakistan. Based on these data, a prediction model was developed using various machine learning techniques. We used the data of past batches of the MS program in order to predict the performance of students in various batches, and implemented the machine learning classification algorithms using the WEKA tool. Our main task is to ascertain the best classifier based on classification evaluation metrics such as accuracy, precision, recall, and F-measure.
The rest of this paper is arranged as follows. Section 2 reviews previous work in the field of Educational Data Mining. Section 3 describes the proposed methodology, covering dataset collection, pre-processing, and visualization. Section 4 discusses the implementation of the machine learning algorithms and provides results, in tabular form, using the well-known evaluation metrics of accuracy, precision, recall, and F-measure. Section 5 presents a comparative evaluation of the classifiers used. Finally, Section 6 gives the conclusion and future work.

Related Work
Educational data mining is an evolving research area in the field of educational analytics. Researchers focus on it to find meaningful patterns which can lead to accurate models for predicting students' performance and improving the quality of higher education in institutes. In [8], the authors proposed a method for analyzing students' performance using classification techniques. They aggregated the students' data and employed the ID3 and J48 algorithms on 239 instances. It was found that J48 gives higher accuracy than ID3. It was concluded that class attendance has the highest gain value and is largely responsible for the success or failure of the students.
Similarly, another study was conducted at the British University in Dubai for the prediction of distinguished students. The objective of this work was to improve students' research capabilities in order to solve real-world problems. Experiments were performed on a dataset of the university's students by implementing several data mining algorithms. The results showed that a Support Vector Machine (SVM) with a radial kernel provided the most accurate prediction of students' performance [9]. In [10], supervised machine learning techniques were employed to analyze the dropout ratio of students from a course. The classification was performed on past students' data, and the results showed that artificial neural networks outperformed the other methods.
In [11], the authors presented classification techniques to predict the performance of students using different algorithms, namely J48, ID3, Naive Bayes, IB1, and OneR, implemented with the WEKA Explorer tool. The dataset of 60 instances was pre-processed and prepared in order to visualize the students' performance. The results showed that the Naive Bayes algorithm was more accurate than the other classifiers.

TABLE 1: List of attributes and values
In [12], the authors used data collected at a university to predict student performance in a course. They implemented Naive Bayes, JRip, and rule-based algorithms to predict the performance in final exams and the total points obtained in the course. The model achieved 91% accuracy in predicting failing students prior to the final exams.
In another paper [13], the authors employed different classification techniques for students' performance prediction, such as ID3, C4.5, and Naive Bayes. After comparison, the results showed that decision tree classifiers provide higher prediction accuracy than the other methods. In [14], decision tree algorithms were used to predict whether students would pass or fail the final exams based on their assessment data. The C4.5 and ID3 algorithms were employed and compared; the results showed that C4.5 performed better in predicting the pass and fail ratios of the students. The results were given to the instructors in order to improve the performance of the students who were predicted to fail.
The authors at Michigan State University (MSU) applied various tree-based and non-tree-based classification algorithms to predict the final grades of students, using data collected through their web-based portal. The results revealed that the accuracy of predicting final exam grades increased when the classifier was combined with a genetic algorithm [15].
In [16], the authors presented a comparative analysis of different data mining methods and algorithms for classifying students by their Moodle usage data and final exam marks, to support decision making. In [17], Yadav and Pal implemented different decision tree classification algorithms to predict the grades of students most likely to fail their final exams; they achieved overall prediction accuracies of 62.2%, 62.2%, and 67.77% for the ID3, CART, and C4.5 decision tree algorithms, respectively. Dorina Kabakchieva [18] implemented data mining classification techniques as a research project for a Bulgarian university. Data about the students' personal information and performance, comprising 20 parameters for 10330 students, were collected from the university's databases. The accuracies obtained with the different classifiers varied between 52% and 67%.
Shana and Venkatachalam [19] investigated students' accomplishment using 182 student records with 20 attributes. They used Pearson's coefficient and F-test techniques to predict students' performance, and implemented classification algorithms such as decision-tree induction, REP tree, Simple CART, and Naive Bayes. The results showed that Naive Bayes classification obtained the highest accuracy.
In [20], Tanner and Toivanen analyzed 15000 students from an online touch-typing course using the K-nearest neighbor (KNN) method. This research aimed to predict the students at high risk of failure in the early stage of the course and then alert the teacher to take improvement measures. The results indicate that the KNN method works well for early test-score prediction.
The reviewed literature reveals that in EDM research, classification techniques have mostly been employed to predict the performance of students. In our work, we have used various classification techniques to predict students' performance on university databases consisting of different courses. The detailed research methodology is explained in the following section.

Proposed Methodology
In this section, our proposed methodology is discussed; its steps are shown in Figure 1. Initially, a dataset of students is collected from a public university of Pakistan which contains attributes such as batch year, program, research student, student type, financial condition, class, class participation, midterm, sessional, attendance, and final exam marks. After that, data pre-processing is performed in order to select specific attributes for model building. In the next step, different classifiers are trained on the dataset to predict the students' performance, and their results are compared to assess each classifier's accuracy. The details are provided in the following subsections.
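The three-step workflow above can be sketched in a few lines. This is a minimal illustrative sketch in scikit-learn with synthetic data; the paper's actual experiments were run in the WEKA tool, and the attribute layout and grade-band thresholds here are assumptions for illustration only.

```python
# Minimal sketch of the workflow: collect marks, derive a class label,
# train a classifier, and estimate accuracy via cross-validation.
# Synthetic data only; the paper's experiments used WEKA.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(42)

# Step 1: "collected" pre-final-term marks: assignment, quiz, midterm, sessional.
n = 300
X = rng.uniform(0, 1, size=(n, 4))

# Step 2: pre-processing derives a grade-band label from the aggregate score
# (the band thresholds are hypothetical).
total = X.sum(axis=1)
y = np.digitize(total, bins=[1.4, 2.0, 2.6])  # four ordinal grade bands

# Step 3: train a classifier and evaluate with 10-fold cross-validation.
scores = cross_val_score(GaussianNB(), X, y, cv=10)
accuracy = scores.mean()
```

Any of the classifiers discussed later (SVM, decision trees, bagging ensembles) can be swapped in for `GaussianNB` in this skeleton.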

Data Collection
The data of graduated students were collected from a public university; the dataset contains 952 tuples from four different departments, namely Environmental Sciences, Computer Sciences, Media Sciences, and Business Administration. Out of the 952 tuples, 13 were omitted after the detection of anomalies, leaving 939 tuples for the data mining process. Each instance contains 20 attributes, out of which the 12 most representative attributes were selected for building the model, as shown in Figure 2.

Data Pre-processing
Data pre-processing is an important step for filtering inconsistent, incomplete and noisy data. We collected the dataset from various tables of the university database by using the structured query language (SQL) and implemented the following steps in order to prepare our data for the analytics.
1) First, we eliminated discrepancies in naming conventions according to the ID attributes of the tables and replaced the IDs with the corresponding attribute values.
2) The duplicate records were removed from the tables.
3) The incomplete records of students who were absent since the start of the course were discarded.
4) Finally, we resolved contradictions in the data.
The pre-processed data were then converted from an Excel file into Comma-Separated Values (CSV) format, after which the WEKA tool was used to convert the CSV file into the Attribute-Relation File Format (ARFF). An ARFF file is an ASCII text file that describes a list of instances sharing a set of attributes [21]. During pre-processing, we also created two extra attributes for the total marks obtained by the students at the end of the semester in the selected course, including the final exam marks. The description of the attributes is shown in Table 1.
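The cleaning steps above (deduplication, dropping absentee records, and deriving a total-marks attribute) can be sketched with pandas. This is an illustrative sketch only: the paper performed these steps via SQL queries and WEKA's CSV-to-ARFF conversion, and the column names and values below are hypothetical.

```python
# Sketch of the pre-processing steps (hypothetical records).
import pandas as pd

raw = pd.DataFrame({
    "student_id": [1, 1, 2, 3, 4],
    "assignment": [8, 8, 7, None, 9],    # None marks an absentee record
    "quiz":       [9, 9, 6, None, 8],
    "midterm":    [30, 30, 25, None, 28],
    "final_exam": [45, 45, 40, None, 50],
})

clean = (raw.drop_duplicates()   # step 2: remove duplicate records
            .dropna())           # step 3: drop incomplete/absentee rows

# Derived attribute: total marks including the final exam.
clean = clean.assign(
    total=clean[["assignment", "quiz", "midterm", "final_exam"]].sum(axis=1)
)
```

After this stage the frame would be exported to CSV and loaded into WEKA, which writes the ARFF header (attribute names and types) automatically.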

Data Visualization
After data pre-processing, the created ARFF file was loaded into the WEKA tool for data analysis, which shows each attribute and its distribution. Figure 3 presents the students-per-class attribute: the numbers of students who obtained grades "A+", "A", "B", "C", "D", "B+", and "F" are 140, 350, 133, 96, 30, 147, and 43, respectively. Further, Figure 4 shows the numbers of male (510) and female (429) students against the class attribute. Similarly, Figure 5 depicts the batch-year and class attributes; the numbers of students in the batch years 2018 and 2019 are 401 and 538, respectively.
Figure 7 provides the distribution of sessional marks, with a minimum value of 10 and a maximum value of 20; the mean of the marks is 20.24 and the standard deviation is 5.143. Figure 8 shows class-participation marks, which range from 2 to 10 with a mean of 6.02 and a standard deviation of 2.28. Figure 9 shows that there are 480 research-oriented students, while 459 students are in a non-research degree program. Figure 10 represents the type of student: 470 are part-time and the remaining 469 are full-time. Similarly, Figure 11 represents the financial condition of the students: 306 students have a good financial condition, 360 a bad one, and 273 an average one.
In addition, some other graphical representations relate pairs of attributes: grade and gender, grade and batch year, and grade and program type. Figure 12 plots gender against grades and indicates that fewer female than male students have a below-average grade (below B). In Figure 13, academic years are plotted against grades, showing that the number of average students in 2018 is smaller than in 2019, which suggests that the students of 2018 performed better than those of 2019.
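The per-grade counts reported for Figure 3 can be tabulated directly, as a small consistency check against the 939 retained tuples. The sketch below simply re-derives aggregate figures from the counts stated in the text; WEKA's attribute-distribution view shows the same information graphically.

```python
# Sketch: tabulating the class (grade) distribution from the counts
# reported for Figure 3.
import pandas as pd

grade_counts = pd.Series(
    {"A+": 140, "A": 350, "B": 133, "C": 96, "D": 30, "B+": 147, "F": 43},
    name="students",
)

total_students = int(grade_counts.sum())  # matches the 939 retained tuples
share_failing = grade_counts["F"] / total_students
```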
Figure 14 provides a comparison of the grades with the programs. It reveals that the number of below-average students is smaller in the Computer Science and Media Studies programs than in the other programs.
After the data visualization in WEKA, we applied different classification techniques to the data to predict students' academic performance. The main purpose of this research is to select the model that correctly classifies the most test instances compared to the other implemented classifiers, and can thus predict students' grades on future instances [22].

Results & Discussion
In this section, the implemented classifiers for predicting students' class performance are discussed. For this purpose, the dataset was divided into training and test sets, and we employed 3-fold and 10-fold cross-validation for performance analysis. As shown in Table 12, the BayesNet algorithm achieved 68.33% and 65.28% accuracy with 3-fold and 10-fold cross-validation, respectively. Table 2 presents its confusion matrix, which shows that the model correctly predicted 16 students who failed and mis-classified 2 instances. For classes A+, A, and B, it accurately classified the students. The true-positive rate is good for all classes except B+, for which the number of correctly classified instances is smaller than the number of incorrectly classified ones. Table 3 shows the precision, recall, F-measure, and weighted averages for each class; the average precision and recall of the BayesNet algorithm are 0.7 and 0.683, with an F-measure of 0.675. In addition, the Naive Bayes algorithm provides 67.71% accuracy with 3-fold and 65.17% accuracy with 10-fold cross-validation.
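A confusion matrix such as the one in Table 2 is obtained by pooling the predictions made on each held-out fold. The sketch below shows the general mechanics with scikit-learn on synthetic three-class data; it is not the paper's dataset or WEKA output, only an illustration of how the per-class true-positive rates discussed above are read off the matrix.

```python
# Sketch: confusion matrix from cross-validated predictions
# (synthetic data; the paper's matrices were produced by WEKA).
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_classes=3, n_informative=4,
                           random_state=0)

# 3-fold cross-validation, as in one of the paper's settings.
pred = cross_val_predict(GaussianNB(), X, y, cv=3)
cm = confusion_matrix(y, pred)

# Per-class true-positive rate (recall): diagonal over row sums.
tp_rate = cm.diagonal() / cm.sum(axis=1)
```

A class like B+ in Table 2 would show up here as a row whose diagonal entry is smaller than the sum of its off-diagonal entries.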
The variant of Naive Bayes, namely the Averaged 2-Dependence Estimators (A2DE) algorithm, provided 69.96% and 66% accuracy with 10-fold and 3-fold cross-validation, respectively. The confusion matrix of A2DE, shown in Table 4, reveals that with 3-fold cross-validation it predicts the classes considerably better than the BayesNet algorithm. Table 5 shows improved average precision and recall of 0.733 and 0.721, respectively; the average F-measure of the A2DE model is 0.719, which is also better than that of the Naive Bayes algorithm. Further, LibSVM, a variant of the support vector machine classification algorithm, provides accuracies of 80.87% and 77.95% with 3-fold and 10-fold cross-validation, as shown in Table 12. Table 6 depicts the confusion matrix of LibSVM. The results in Table 7 show that the SVM algorithm gives much better results than the Naive Bayes algorithm: it provides an average precision of 0.816 and a recall of 0.809, and the average F-measure of the LibSVM model is 0.805 (Table 7). The Logistic Regression algorithm, per Table 12, provides accuracies of 74.12% and 74.60% with 10-fold and 3-fold cross-validation, respectively; Table 8 presents its confusion matrix and Table 9 its average precision, recall, and F-measure. The lazy KNN algorithm provides accuracies of 71.56% and 75.23% with 10-fold and 3-fold cross-validation, as shown in Table 12. Similarly, some other algorithms are compared in Table 12, where the JRip algorithm provides accuracies of 78.05% and 74.76% with 10-fold and 3-fold cross-validation, respectively. Further, three tree-based algorithms are also evaluated and compared: J48, Logistic Model Tree, and Random Forest. The J48 algorithm achieved accuracies of 78.05% and 76.47% using 3-fold and 10-fold cross-validation, respectively.
Furthermore, the Logistic Model Tree (LMT) algorithm achieves accuracies of 76.80% and 78.38% using 3-fold and 10-fold cross-validation, respectively. Finally, the Random Forest algorithm achieves accuracies of 77.63% and 78.99% with 3-fold and 10-fold cross-validation, respectively. The average precision, recall, and F-measure values of all tested algorithms are shown in the tabular results.
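For reference, the precision, recall, and F-measure values reported throughout the tables follow the standard definitions. The sketch below computes them from hypothetical per-class counts (16 failing students correctly flagged, 4 false alarms, 2 missed, echoing the magnitudes discussed for Table 2); the numbers are illustrative, not taken from the paper's tables.

```python
# Sketch: precision, recall, and F-measure from per-class counts.
# tp = true positives, fp = false positives, fn = false negatives.
def prf(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical counts for a "fail" class.
precision, recall, f_measure = prf(tp=16, fp=4, fn=2)
```

WEKA's weighted averages are these per-class values weighted by each class's instance count.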

Comparative Evaluation
This section provides a comparative analysis of the implemented classifiers. The bagging classifier, a meta-estimator, improves accuracy when combined with the LibSVM classifier, compared to LibSVM alone. The combined algorithm predicts students' performance accurately and achieves 80.87% and 77.95% accuracy with 3-fold and 10-fold cross-validation, as shown in Table 12. Figure 15 depicts the ROC curve, which shows higher accuracy for predicting failure instances compared to the other classifiers; the area under the curve for the class value F is 0.972, while the remaining instances are correctly predicted. The prediction of students in class B+ also has higher precision, recall, and F-measure than with the LibSVM algorithm: Figure 16 shows the ROC curve for class B+, whose area of approximately 71% is better than that of the other tested models. The average precision, recall, and F-measure values of the bagging classifier are considerably higher, which confirms the efficacy of this model in predicting students' failure and success in the final examination.
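The winning combination, bagging wrapped around an SVM, can be sketched as follows. This is an illustrative scikit-learn analogue of the paper's WEKA Bagging + LibSVM setup, run on synthetic data; the accuracies it produces are not comparable to the paper's results.

```python
# Sketch: bagging meta-estimator over an RBF-kernel SVM, analogous to
# the Bagging + LibSVM combination (synthetic data, illustrative only).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_informative=6, random_state=1)

# Plain SVM vs. a bag of 10 SVMs, each trained on a bootstrap sample.
svm_acc = cross_val_score(SVC(kernel="rbf"), X, y, cv=10).mean()
bag_acc = cross_val_score(
    BaggingClassifier(SVC(kernel="rbf"), n_estimators=10, random_state=1),
    X, y, cv=10,
).mean()
```

Bagging reduces the variance of the base learner by averaging predictions over bootstrap resamples, which is the mechanism behind the accuracy gain the paper reports for this combination.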
In this research, a total of 11 classification algorithms were implemented to find the best classifier for predicting students' performance in the final exams. Table 12 provides the results of the implemented classifiers. It can be seen from the tabulated results that bagging with the support vector machine algorithm outperformed the other tested algorithms under both 3-fold and 10-fold cross-validation. It provided the highest accuracy of 81.50%, with average precision, recall, and F-measure values of 0.820, 0.815, and 0.811, respectively. From these numerical results, it can be concluded that the bagging algorithm with SVM works well for predicting students' success and failure rates.

Conclusion
In this paper, we investigated university students' academic records to predict their performance. For this purpose, a dataset of students was created using their quizzes, assignments, mid-term, and final-term exams, and WEKA was used to evaluate the results. Our main purpose in this research was to find the most effective classifier, i.e., the one yielding the best accuracy. After comparative analysis, the obtained results revealed that the bagging algorithm, when combined with the SVM classifier, gives the best outcomes in terms of accuracy, precision, recall, and F-measure on all class attributes. This research was carried out on MS program students; in the future, we plan to extend this model to more academic programs to validate its efficacy. Furthermore, one can work on selecting more significant attributes from academic records that might affect students' class performance.