Ensemble Supported Fake News Detection using Natural Language Processing
The growth of communication mediated by social media has opened considerable scope for research on fake news detection using machine learning and natural language processing. The inability of human readers to reliably distinguish legitimate news from fake news is a major challenge in both print media and social media. There is now a tendency to share or forward anything received on a social platform to one's close contacts without much verification, and the recipients exhibit the same tendency. The aim of this paper is to classify news as either legitimate or fake. The dataset used for the experiments is a Health and Wellbeing dataset containing 1000 examples. After careful pre-processing of the dataset, vectorization methods such as count vectorization (CV), term frequency (TF) and term frequency-inverse document frequency (TF-IDF) are used to convert the examples of the corpus into numerical vectors. Base estimators such as logistic regression, naïve Bayes, decision tree, support vector classifier and random forest are used to train the models. Various combinations of these base models are also evaluated to find the best-performing voting ensemble. An ensemble of logistic regression, decision tree and support vector classifier is found to outperform every other model in accuracy, at 88.80%. For other metrics such as precision, recall and F1-score, the voting ensemble of logistic regression, naïve Bayes, support vector classifier and random forest performs better. It is also found that increasing the number of features yields a slight improvement in results; for instance, good results are obtained on all metrics when the number of features reaches 10000. Thus, ensembling edges out individual machine learning models for the task of fake news detection.
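The pipeline summarized above (vectorize the text, train several base estimators, combine their predictions by voting) can be illustrated with a minimal standard-library sketch. It is not the implementation used in the experiments: the three toy documents, the whitespace tokenizer, the smoothed IDF variant, the `max_features` cap of 5 and the hard-coded base-model predictions are all hypothetical choices made only for illustration, and hard (majority) voting is assumed as the combination rule.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lower-casing plus whitespace splitting stands in for real pre-processing.
    return text.lower().split()

def build_vocabulary(docs, max_features):
    # Keep the max_features most frequent terms; ties are broken alphabetically.
    totals = Counter(tok for doc in docs for tok in tokenize(doc))
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_features]]

def count_vector(doc, vocab):
    # Count vectorization: raw occurrences of each vocabulary term.
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]

def tf_vector(doc, vocab):
    # Term frequency: counts divided by the document's total token count.
    total = len(tokenize(doc)) or 1
    return [c / total for c in count_vector(doc, vocab)]

def idf_weights(docs, vocab):
    # Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1, one common variant.
    n = len(docs)
    token_sets = [set(tokenize(d)) for d in docs]
    return [
        math.log((1 + n) / (1 + sum(term in s for s in token_sets))) + 1
        for term in vocab
    ]

def tfidf_vector(doc, vocab, idf):
    # TF-IDF: term frequency scaled by inverse document frequency.
    return [tf * w for tf, w in zip(tf_vector(doc, vocab), idf)]

def hard_vote(predictions):
    # Majority label among the base-model predictions for one example.
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical toy corpus (not taken from the Health and Wellbeing dataset).
docs = [
    "vitamin c cures all colds instantly",
    "regular exercise supports heart health",
    "miracle tea cures cancer overnight",
]
vocab = build_vocabulary(docs, max_features=5)
idf = idf_weights(docs, vocab)
cv_row = count_vector(docs[0], vocab)
tf_row = tf_vector(docs[0], vocab)
tfidf_row = tfidf_vector(docs[0], vocab, idf)

# Hypothetical per-example predictions from three already-trained base models.
lr_pred = ["fake", "real", "fake"]
dt_pred = ["fake", "fake", "fake"]
svc_pred = ["real", "real", "fake"]
ensemble_pred = [hard_vote(votes) for votes in zip(lr_pred, dt_pred, svc_pred)]
```

With three binary voters, a majority always exists, so no tie-breaking rule is needed in this sketch. Raising `max_features` enlarges the vocabulary and therefore the dimensionality of every vector, which is the knob referred to when the number of features is increased.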