NLP Disaster Tweets Classification

Authors

  • Kristopher Flint
  • Zachary Kellerman

Abstract

In this research, we use natural language processing to predict whether a text refers to a natural disaster. Using the Kaggle competition data, which consists of 10,000 labeled Tweets, we apply machine learning models to determine which Tweets are about real disasters and which are not. We first perform exploratory data analysis to visualize the dataset, then preprocess the text to remove irrelevant characters. We find that most disaster Tweets concern natural disasters and criminal activity, while non-disaster Tweets center on slang and YouTube videos. We implement five machine learning models: Naive Bayes, Support Vector Machines, Random Forest, a Neural Network, and an Ensemble Learner. Naive Bayes and the Ensemble Learner perform slightly better than the other models, each reaching an accuracy of about 80%. Finally, we tune the model parameters to improve predictive performance.
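The abstract does not give implementation details, so the following is only a minimal sketch of the kind of pipeline it describes: a preprocessing step that removes irrelevant characters, followed by a Naive Bayes classifier. It assumes a from-scratch multinomial Naive Bayes with add-one smoothing. The names `clean_tweet` and `NaiveBayes` and the six toy Tweets are hypothetical illustrations, not the authors' code or the Kaggle data.

```python
import math
import re
from collections import Counter, defaultdict


def clean_tweet(text):
    """Lowercase a Tweet and strip URLs, @mentions, and non-letter characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # drop digits, punctuation, symbols
    return text.split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # Tweets per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(clean_tweet(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        tokens = clean_tweet(text)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data (1 = disaster, 0 = not a disaster); not the Kaggle set.
train = [
    ("Forest fire near the town, residents evacuated", 1),
    ("Earthquake damage reported, people evacuated http://t.co/x", 1),
    ("Flood waters rising after the storm", 1),
    ("This new song is fire, love it", 0),
    ("Watching YouTube videos all night lol", 0),
    ("My exam was a disaster lol @friend", 0),
]
texts, labels = zip(*train)
model = NaiveBayes().fit(texts, labels)
print(model.predict("Residents evacuated after earthquake"))
print(model.predict("lol watching youtube videos"))
```

On real data, the same two steps would typically be evaluated on a held-out split, and the other four models would be compared on the same cleaned tokens.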

Published

2020-05-11

Section

Computer Science