Classification of Crime News on online Newspaper Articles by Applying NB Classifier Through Various Feature Extraction Methods

Sukumar P, Robert L, Gilbert Rozario S
Keywords: Data preprocessing, Feature Extraction, Count Vectorizer, TFIDF, Naive Bayes, News Classification algorithm, Crime Classification.


News is a collection of reports on current events. It covers a wide range of topics, such as agriculture, business, economics, education, employment, entertainment, health, politics, technology, and world affairs. News about unlawful events is called crime news; breaking the law is a criminal offense. Collecting crime news helps provide the knowledge needed to prevent crime. With the growth of digital content, classification and prediction have become some of the most important research areas of the current era. Most research on news classification focuses on news headlines. In this paper, the proposed system instead focuses on the main story of each news article. The Naive Bayes machine learning algorithm is evaluated and used to classify the articles automatically as Crime or Non-Crime news. Real-world raw data is not directly fit for classification algorithms because it is noisy, incomplete, and contains outliers, any of which can degrade the results. Text pre-processing techniques are therefore applied to clean the text, and the NLP techniques of stop-word removal and lemmatization are then applied to enrich the data. The feature extraction phase uses the Count Vectorizer, N-grams, and the TF-IDF Vectorizer to extract features from the news data. The next phase uses the Naive Bayes algorithm to classify the news articles. Finally, the classifier's results are evaluated with the metrics of accuracy, precision, recall, and F1 score.
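The abstract names the pipeline stages (pre-processing, stop-word removal, Count Vectorizer with N-grams, TF-IDF, Naive Bayes, and the four metrics) but not their settings. The following is a minimal from-scratch sketch of those stages in Python, not the authors' implementation. It assumes unigram plus bigram features, the smoothed IDF formula that scikit-learn's `TfidfVectorizer` uses by default (`ln((1 + N) / (1 + df)) + 1` followed by L2 normalization), and Laplace smoothing with `alpha = 1`. The stop-word list, the toy sentences, and all function names are hypothetical and chosen only for illustration; lemmatization is omitted because it needs an external lexical resource.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list used only for this illustration.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "was", "is",
              "by", "for", "at", "after", "with"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stop words (no lemmatization)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def count_features(text, n_max=2):
    """Raw counts of 1..n_max-grams, in the spirit of a count vectorizer."""
    tokens = preprocess(text)
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)


def fit_idf(count_docs):
    """Smoothed IDF: ln((1 + N) / (1 + df)) + 1, learned from training docs."""
    df = Counter(term for doc in count_docs for term in doc)
    n_docs = len(count_docs)
    return {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(counts, idf):
    """Scale counts by IDF (terms unseen in training are dropped), then L2-normalize."""
    w = {t: c * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
    return {t: v / norm for t, v in w.items()}


class MultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing alpha."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for d in docs for t in d}
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        totals = {c: defaultdict(float) for c in self.classes}
        for d, y in zip(docs, labels):
            for t, v in d.items():
                totals[y][t] += v
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {t: math.log((totals[c][t] + self.alpha) / denom)
                               for t in self.vocab}
        return self

    def predict(self, docs):
        def score(d, c):
            # log P(c) + sum of weight * log P(term | c) over known terms
            return self.log_prior[c] + sum(v * self.log_lik[c][t]
                                           for t, v in d.items() if t in self.vocab)
        return [max(self.classes, key=lambda c: score(d, c)) for d in docs]


def evaluate(y_true, y_pred, positive="crime"):
    """Accuracy, plus precision/recall/F1 for the positive (crime) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy sentences invented for illustration; not the paper's dataset.
train_texts = [
    "Police arrested a man for robbery at a jewellery shop",
    "Two suspects held for murder after a knife attack",
    "Thieves looted cash from an ATM, police registered a case",
    "Government announces new scheme for farmers",
    "Team wins the cricket series after a thrilling final",
    "Stock markets rise on strong quarterly earnings",
]
train_labels = ["crime"] * 3 + ["non-crime"] * 3
test_texts = ["Police arrested two suspects for theft",
              "Farmers welcome the new irrigation scheme"]
test_labels = ["crime", "non-crime"]

train_counts = [count_features(t) for t in train_texts]
idf = fit_idf(train_counts)
model = MultinomialNB().fit([tfidf(c, idf) for c in train_counts], train_labels)
pred = model.predict([tfidf(count_features(t), idf) for t in test_texts])
print(pred, evaluate(test_labels, pred))
```

Passing the raw `Counter` from `count_features` to the classifier instead of `tfidf(...)` gives a plain count-vectorizer variant, and `n_max=1` drops the bigrams, so the same sketch can compare feature extraction methods in the way the title describes. A toy set this small says nothing about real accuracy; it only exercises the code path.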

How to Cite
Sukumar P, Robert L, & Gilbert Rozario S. (2021). Classification of Crime News on online Newspaper Articles by Applying NB Classifier Through Various Feature Extraction Methods. Design Engineering, 12094-12107. Retrieved from