Statistical Analysis and Classifier Accuracy Improvements Models for Road Accident Issues on National Highways in India

  • Chitnis S.D., P Gokhale
Keywords: Machine learning, correlation matrix, k-nearest neighbor, Logistic Regression, Random Forest, Cross-validation, confusion matrix, AUROC


Road traffic accidents are unpredictable incidents and have been a major cause of concern across the Indian subcontinent. As in the Indian subcontinent, most developing countries have rising death rates due to vehicle collisions. Furthermore, there are not enough safety indicators with which traffic collisions can be analyzed before they happen. However, conventional machine learning algorithms hold a strong position in this field. This paper presents an analysis of vehicle collisions using a correlation matrix and identifies the most suitable machine learning classification techniques for road accident estimation. A data set of 2368 accident cases was examined for road conditions and accident types using supervised learning methods. The data set was analyzed with potentially relevant indicators. A comparative study of supervised machine learning techniques, namely k-nearest neighbor, Logistic Regression, and Random Forest, was carried out. The data set was randomly split into two parts: training (70%) and test (30%). A 5-fold Cross-validation method was applied during modeling, and the test set was used to validate the prediction performance of the supervised learning models. Model performance was evaluated using three measures: confusion matrix, AUROC, and classification report. The accuracy analysis, together with the correlation matrix, revealed that the best model for the prediction of road traffic accidents on the given data set is the k-nearest neighbor classifier when compared with the other learning algorithms, and that the best hyperparameters for k-nearest neighbor are Manhattan (metric), 17 (n_neighbor), and uniform (weights).
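
As a rough illustration of the workflow described in the abstract, the sketch below runs a k-nearest neighbor classifier with the Manhattan metric and uniform (unweighted) voting through a 70%/30% split, a 5-fold cross-validation over candidate neighbor counts, and a confusion matrix on the test part. It is not the authors' implementation. The data are synthetic two-feature points generated for the example, the candidate values 5, 17 and 31 are an assumed search grid, and all function and variable names are hypothetical. Only plain accuracy is computed here; AUROC and the classification report are left out to keep the sketch short.

```python
import random
from collections import Counter


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k nearest training points (uniform weights)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: manhattan(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, eval_X, eval_y, k):
    """Fraction of evaluation points whose predicted label is correct."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(eval_X, eval_y))
    return hits / len(eval_y)


def cv_accuracy(X, y, k, folds=5):
    """Mean accuracy over `folds` interleaved folds of already-shuffled data."""
    scores = []
    for f in range(folds):
        held = set(range(f, len(X), folds))  # indices held out in this fold
        fit_X = [X[i] for i in range(len(X)) if i not in held]
        fit_y = [y[i] for i in range(len(X)) if i not in held]
        val_X = [X[i] for i in sorted(held)]
        val_y = [y[i] for i in sorted(held)]
        scores.append(accuracy(fit_X, fit_y, val_X, val_y, k))
    return sum(scores) / folds


# Synthetic stand-in data (assumption): 150 points per class, two features,
# class 0 centred at (0, 0) and class 1 centred at (3, 3).
rng = random.Random(0)
rows = [([rng.gauss(3.0 * label, 1.0), rng.gauss(3.0 * label, 1.0)], label)
        for label in (0, 1) for _ in range(150)]
rng.shuffle(rows)
X = [features for features, _ in rows]
y = [label for _, label in rows]

# Random 70% / 30% split into training and test parts.
cut = len(X) * 7 // 10
train_X, test_X = X[:cut], X[cut:]
train_y, test_y = y[:cut], y[cut:]

# 5-fold cross-validation on the training part over an assumed grid of k.
cv_scores = {k: cv_accuracy(train_X, train_y, k) for k in (5, 17, 31)}
best_k = max(cv_scores, key=cv_scores.get)

# Confusion matrix on the held-out test part: rows = actual, columns = predicted.
confusion = [[0, 0], [0, 0]]
for x, actual in zip(test_X, test_y):
    confusion[actual][knn_predict(train_X, train_y, x, best_k)] += 1
test_accuracy = (confusion[0][0] + confusion[1][1]) / len(test_y)

print("cross-validated accuracy per k:", cv_scores)
print("selected k:", best_k)
print("confusion matrix:", confusion)
print("test accuracy:", test_accuracy)
```

The test part is touched only once, after the neighbor count has been chosen by cross-validation on the training part, which mirrors the separation between modeling and validation stated in the abstract.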

How to Cite
P Gokhale, C. S. (2021). Statistical Analysis and Classifier Accuracy Improvements Models for Road Accident Issues on National Highways in India. Design Engineering, 5576-5591. Retrieved from