TY  - JOUR
AU  - Dekka Satish
AU  - Bugata Nageswara Rao
AU  - K. Narasimha Raju
PY  - 2022/01/23
Y2  - 2024/03/29
TI  - Malicious URL Detection using Natural Language Processing and Machine Learning
JF  - Design Engineering
JA  - DE
VL  - 
IS  - 1
SE  - Articles
DO  - 
UR  - http://thedesignengineering.com/index.php/DE/article/view/8804
SP  - 248
EP  - 255
AB  - Malicious (phishing) URLs are a critical threat to cybersecurity because they can lead to scams in which individuals lose money, personally identifiable information, and accounts. It is crucial to be able to respond appropriately to these attacks. The most reliable established procedure for controlling this issue is to use blacklists, but this method struggles to respond to new URLs that have not yet been listed. Machine learning is a process in which a system learns from training data and uses what it has learned to make predictions. In today's digital world, machine learning has become a buzzword because it can be applied to many cybersecurity problems. In this work, we collected a phishing URL dataset from Kaggle containing more than 500,000 URLs and provide a machine learning solution for malicious URL detection. Because the URLs are in text format, we applied various text preprocessing and text encoding techniques. First, we applied three text encoding techniques: hashing vectorization, count vectorization, and TF-IDF vectorization. We then applied five machine learning algorithms: SVM, decision tree, k-NN, logistic regression, and random forest. We achieved an accuracy of 97.8% with random forest. Our model outperforms previous models for malicious URL detection. We used Python for the implementation.
ER  - 
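The abstract names three text encodings but does not show how a URL becomes a feature vector. The following is a minimal standard-library Python sketch of those encodings, not the authors' code. It assumes URLs are lowercased and split on non-alphanumeric characters (the paper's actual preprocessing is not described in the abstract), and the function names are hypothetical. The hashing step uses `zlib.crc32` as a stand-in for the MurmurHash3 used by scikit-learn's `HashingVectorizer`, and it produces plain unsigned counts without the sign alternation and normalization that scikit-learn applies by default. The TF-IDF step follows scikit-learn's default smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 row normalization.

```python
import math
import re
import zlib


def tokenize(url):
    """Assumed preprocessing: lowercase the URL and split it on any non-alphanumeric character."""
    return re.findall(r"[a-z0-9]+", url.lower())


def count_vectorize(docs):
    """Count vectorization: bag-of-words counts over a vocabulary learned from `docs`."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    rows = []
    for toks in token_lists:
        row = [0] * len(vocab)
        for t in toks:
            row[index[t]] += 1
        rows.append(row)
    return vocab, rows


def hashing_vectorize(docs, n_features=16):
    """Hashing trick: no vocabulary is stored; each token is counted in a fixed-size bucket.

    crc32 is a deterministic stand-in hash (Python's built-in str hash is salted per run).
    """
    rows = []
    for d in docs:
        row = [0] * n_features
        for t in tokenize(d):
            row[zlib.crc32(t.encode("utf-8")) % n_features] += 1
        rows.append(row)
    return rows


def tfidf_vectorize(docs):
    """TF-IDF: raw counts reweighted by smoothed IDF, then each row scaled to unit L2 norm."""
    vocab, counts = count_vectorize(docs)
    n = len(docs)
    # Document frequency: in how many URLs each token appears at least once.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0  # avoid dividing by zero for empty URLs
        rows.append([v / norm for v in weighted])
    return vocab, rows


if __name__ == "__main__":
    urls = [
        "http://paypal.com.secure-login.example.net/verify",  # hypothetical suspicious-looking URL
        "https://www.wikipedia.org/wiki/URL",
    ]
    print(tokenize(urls[0]))
    # ['http', 'paypal', 'com', 'secure', 'login', 'example', 'net', 'verify']
    vocab, counts = count_vectorize(urls)
    print(len(vocab), counts)
    print(hashing_vectorize(urls, n_features=16))
    print(tfidf_vectorize(urls)[1])
```

Each row is the feature vector that a classifier (SVM, decision tree, k-NN, logistic regression, or random forest) would be trained on together with the URL's benign/malicious label. In scikit-learn the corresponding encoders are `HashingVectorizer`, `CountVectorizer`, and `TfidfVectorizer`. Their default tokenization differs from the assumed one above; for example, `CountVectorizer` ignores single-character tokens by default. The design trade-off is that hashing needs no stored vocabulary and therefore scales to very large URL collections, while TF-IDF down-weights tokens such as `com` or `www` that appear in most URLs.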