Predicting the Spread of Dengue Fever in Tropical Cities Using Machine Learning Methods
Abstract
Predicting the spread of communicable diseases is more important than ever, given the impact of the recent COVID-19 pandemic. Pandemics receive most of the attention, but they are not the only health crises that must be managed. Endemic diseases persist in tropical countries where health services and living standards are particularly deficient. Many of these are neglected tropical diseases (NTDs). This project focuses on one of them, the viral disease dengue fever. To help combat this debilitating disease, we predict future cases and outbreaks in Iquitos, Peru, and San Juan, Puerto Rico, from prior dengue statistics, namely case numbers and the corresponding week of the year. We also draw on climate variables such as temperature, humidity, precipitation amount, and dew point, and examine their correlation with case counts. We use machine learning and data science techniques, including value imputation to clean the data and Random Forest Regression to model it. The target variable is the number of cases per week, and success is measured by the mean absolute error between predicted and observed counts. Based on this analysis and our evaluation of the models, we believe future outbreaks can be forecast accurately. Ultimately, our approach would help response teams be assigned more efficiently to the most severely affected areas. We validate our approach using DrivenData's DengAI competition data set.
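As a rough illustration of two of the steps named above, the sketch below shows one simple form of value imputation and the mean absolute error metric in plain Python. Forward-filling is only an assumption here, since the abstract does not say which imputation strategy is used, and the function names and sample numbers are hypothetical rather than taken from the DengAI data.

```python
def forward_fill(values):
    """Replace each None with the most recent observed value.

    Gaps at the very start of the series have nothing to copy from,
    so they are left as None.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all weeks."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


# Hypothetical weekly temperature readings (degrees C) with missing weeks.
weekly_temp_c = [26.5, None, 27.1, None, None, 25.8]
print(forward_fill(weekly_temp_c))

# Hypothetical observed vs. predicted weekly case counts.
observed_cases = [4, 7, 12, 9]
predicted_cases = [5, 6, 10, 9]
print(mean_absolute_error(observed_cases, predicted_cases))
```

A lower mean absolute error means the predicted weekly case counts sit closer, on average, to the observed ones. Because the metric is expressed in cases per week, it is easy to interpret alongside the raw data.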