An Ensemble Classification Algorithm Based on Semantics for Text Data Streams with Concept Drifts
Abstract
Mining information of interest to users from text data streams with concept drifts is an active research topic in natural language processing. This paper proposes a new semantics-based ensemble classification algorithm for text data streams. The algorithm first applies the minimum-redundancy maximum-relevance (mRMR) feature selection method to remove irrelevant and redundant features from the text data stream. It then uses a topic model to compute semantic similarity in the text data stream and to detect concept drifts. Finally, an ensemble classification model classifies the text data stream. Experimental results show that the proposed ensemble classification algorithm effectively detects concept drifts and achieves good classification performance on text data streams.
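As an illustration of the drift-detection step only, the following minimal sketch flags a concept drift when the topic distributions of two consecutive stream chunks become dissimilar. It is not the paper's implementation: the abstract does not specify the similarity measure or the decision rule, so the use of cosine similarity, the chunk-level averaging, the function names, and the `threshold` value of 0.8 are all assumptions made for this example. The per-document topic distributions are assumed to come from some topic model (for example, LDA) that has already been fitted.

```python
import math


def mean_distribution(doc_topics):
    """Average per-document topic distributions into one chunk-level vector."""
    n_topics = len(doc_topics[0])
    return [sum(d[k] for d in doc_topics) / len(doc_topics) for k in range(n_topics)]


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def detect_drift(prev_chunk, curr_chunk, threshold=0.8):
    """Return (drift_flag, similarity) for two chunks of topic distributions.

    Assumption: a drift is flagged when the similarity of the chunk-level
    topic vectors falls below `threshold` (a hypothetical value).
    """
    sim = cosine_similarity(mean_distribution(prev_chunk), mean_distribution(curr_chunk))
    return sim < threshold, sim


# Hypothetical topic distributions over three topics.
prev_chunk = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]]
same_topic_chunk = [[0.75, 0.15, 0.1]]
shifted_chunk = [[0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]

drift_same, _ = detect_drift(prev_chunk, same_topic_chunk)
drift_shifted, _ = detect_drift(prev_chunk, shifted_chunk)
```

In a full pipeline of the kind the abstract outlines, a flagged drift would typically trigger an update of the ensemble, such as training a new base classifier on the current chunk; how the proposed algorithm handles this is not stated in the abstract.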