Predicting where the Zika virus will spread using machine learning
Olivia P, Sneha N
Using machine learning, we combined several attributes that we expected to contribute to the spread of Zika in order to predict which countries are most likely to be affected in the future. The attributes were world health ranking, GDP per capita, literacy rate, unemployment rate, percentage of the population below the poverty line, number of flights from Brazil (and their destinations), and whether the country shares a border with an infected country. The algorithm was given these attributes, relied on the ones it found most informative, and predicted which countries have the highest likelihood of acquiring Zika.
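One way to store the attributes listed above for each country is a simple record type. The sketch below is a hypothetical layout, not the authors' actual data format: the class name, field names, and the unit for flights are assumptions. Its `features` method turns every attribute into a number (encoding the border flag as 0 or 1) so a classifier can use it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CountryRecord:
    """One row of the dataset: a country's attributes plus an optional label (hypothetical schema)."""
    name: str
    health_rank: int            # world health ranking (1 = best)
    gdp_per_capita: float       # US dollars per person
    literacy_rate: float        # percent of the adult population
    unemployment_rate: float    # percent of the labor force
    poverty_rate: float         # percent of the population below the poverty line
    flights_from_brazil: int    # number of flights from Brazil (unit is an assumption)
    borders_infected: bool      # shares a border with an infected country
    has_zika: Optional[bool] = None  # known label; None for countries still to be predicted

    def features(self) -> List[float]:
        """Return the attributes as numbers, encoding the border flag as 0.0 or 1.0."""
        return [
            float(self.health_rank),
            self.gdp_per_capita,
            self.literacy_rate,
            self.unemployment_rate,
            self.poverty_rate,
            float(self.flights_from_brazil),
            1.0 if self.borders_infected else 0.0,
        ]
```

Keeping the label optional lets the same record type hold both the training countries (where Zika status is known) and the countries to be predicted.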
Since the outbreak that began in Brazil in 2015, the Zika virus has spread around the world at an alarming rate. It has affected more than 70 countries, infected hundreds of thousands of people, and been linked to thousands of cases of microcephaly, a birth defect in which a baby's brain does not develop fully. Zika is transmitted mainly by mosquito bites, but also through sexual contact, blood transfusion, and from a pregnant woman to her fetus. Although the virus was first identified in Uganda in 1947, the recent epidemic started in Brazil and spread largely through travel and contact with people who travel. The most common symptoms are fever, rash, joint pain, red eyes, muscle pain, and headache. There is currently no cure or vaccine, but national and international health organizations are actively researching the virus and how to treat it.
Because there was no established way to predict where the Zika virus will spread next, we decided to use machine learning to build models that learn from where the virus is now and forecast where it will go. To do this, we first learned the basics of machine learning, then explored the different algorithms we could use, and finally analyzed our results and found a way to present our data. We chose two approaches: a decision tree built with J48 (Weka's implementation of the C4.5 decision-tree algorithm) and a user classifier, in which we decided the classification ourselves.
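J48 follows C4.5, which chooses each split by the gain ratio: the information gain of splitting the training set on an attribute, divided by the split information of that attribute. The standard definitions are below, where S is the set of training countries, p_c is the fraction of S in class c (for example "Zika" or "no Zika"), and S_v is the subset of S sent to branch v of a split on attribute A.

```latex
\begin{aligned}
H(S) &= -\sum_{c} p_c \log_2 p_c \\
\mathrm{Gain}(S, A) &= H(S) - \sum_{v} \frac{|S_v|}{|S|}\, H(S_v) \\
\mathrm{SplitInfo}(S, A) &= -\sum_{v} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|} \\
\mathrm{GainRatio}(S, A) &= \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInfo}(S, A)}
\end{aligned}
```

As a hypothetical worked case: with 8 countries, 4 of them with Zika, H(S) = 1. A GDP threshold that splits them into two groups of 4, each containing only one class, gives Gain = 1 - 0 = 1 and SplitInfo = 1, so GainRatio = 1, the best possible score for a two-way split of this set.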
Our first algorithm, a decision tree, used a country's GDP per capita and literacy rate to predict where the Zika virus will spread. We chose these two attributes because, when studying the data, we noticed that they were correlated with the spread of Zika. This is machine learning: the algorithm was trained on the data, learned decision rules from it, and then made predictions based on what it had learned.
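The training step described above can be illustrated with a minimal, from-scratch decision tree on two numeric features (GDP per capita and literacy rate). This is a sketch, not J48 itself: for simplicity it picks each binary split by plain information gain rather than C4.5's gain ratio, and it does no pruning. The function names and the toy values in the usage lines are hypothetical and do not come from the project's data.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


@dataclass
class Node:
    label: str                      # majority class of the training rows at this node
    feature: Optional[int] = None   # index of the tested feature; None means leaf
    threshold: float = 0.0
    left: Optional["Node"] = None   # rows with value <= threshold
    right: Optional["Node"] = None  # rows with value > threshold


def best_split(rows: List[Tuple[float, ...]], labels: List[str]):
    """Return (gain, feature, threshold) of the best binary split, or None."""
    base = entropy(labels)
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between neighboring values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best


def build_tree(rows, labels, max_depth: int = 3, min_size: int = 2) -> Node:
    """Recursively split until nodes are pure, too small, or max_depth is reached."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1 or len(labels) < min_size:
        return Node(label=majority)
    split = best_split(rows, labels)
    if split is None or split[0] <= 0:
        return Node(label=majority)
    _, f, t = split
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return Node(
        label=majority, feature=f, threshold=t,
        left=build_tree([rows[i] for i in li], [labels[i] for i in li], max_depth - 1, min_size),
        right=build_tree([rows[i] for i in ri], [labels[i] for i in ri], max_depth - 1, min_size),
    )


def predict(node: Node, row: Tuple[float, ...]) -> str:
    """Follow the tests from the root down to a leaf and return its label."""
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.label


# Usage with made-up toy values: (GDP per capita in USD, literacy rate in percent).
toy_rows = [(3000, 70), (5000, 80), (8000, 85), (12000, 92), (30000, 99), (45000, 99)]
toy_labels = ["zika", "zika", "zika", "no_zika", "no_zika", "no_zika"]
tree = build_tree(toy_rows, toy_labels)
print(predict(tree, (4000, 75)))  # zika
```

The depth limit and minimum node size play a role similar to pruning: they keep the tree from memorizing a small training set of countries.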
Our second algorithm was managed manually and took many attributes into account: world health ranking, GDP per capita, literacy rate, unemployment rate, percentage of the population below the poverty line, number of flights from Brazil (and their destinations), and whether the country shares a border with an infected country. This approach is not considered machine learning, because we, as users, entered the data, analyzed it ourselves, and decided how each country should be classified; the computer did not learn from the data but only applied our categories. We used this approach to compare against the decision tree (machine learning), so we could see which was more accurate.
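A manually defined classifier of this kind can be sketched as a few hand-written rules, plus an accuracy score for comparing it with the decision tree. The rules, thresholds, and dictionary keys below are hypothetical placeholders chosen for illustration, not the rules the authors actually used.

```python
from typing import List

# Hypothetical thresholds, chosen only for illustration.
FLIGHT_THRESHOLD = 10          # flights from Brazil
POVERTY_THRESHOLD = 25.0       # percent below the poverty line
HEALTH_RANK_THRESHOLD = 100    # world health ranking (higher = worse)


def manual_classify(country: dict) -> str:
    """Apply hand-written rules: a shared border decides alone; otherwise 2 of 3 risk signs."""
    if country["borders_infected"]:
        return "zika"
    votes = 0
    if country["flights_from_brazil"] >= FLIGHT_THRESHOLD:
        votes += 1
    if country["poverty_rate"] >= POVERTY_THRESHOLD:
        votes += 1
    if country["health_rank"] > HEALTH_RANK_THRESHOLD:
        votes += 1
    return "zika" if votes >= 2 else "no_zika"


def accuracy(predicted: List[str], actual: List[str]) -> float:
    """Fraction of countries whose predicted label matches the known label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

Computing `accuracy` for both classifiers on the same countries with known outcomes, ideally ones that were not used to build the tree, gives a like-for-like comparison.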
In many machine learning projects, the hardest and most time-consuming part is collecting accurate data. Because the Zika epidemic is relatively recent, no single source compiled all of the data we needed, so we searched hundreds of websites to find the most accurate and representative data.
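When each attribute comes from a different website, the tables have to be joined by country before training. The sketch below assumes each source has already been reduced to a mapping from country name to one value; the function names are hypothetical. It merges the tables and reports which countries are still missing required attributes.

```python
from typing import Dict, List

# Each source table maps a country name to one attribute value.
SourceTable = Dict[str, float]


def normalize_name(name: str) -> str:
    """Lower-case and collapse whitespace so 'Country A ' and 'country a' match."""
    return " ".join(name.lower().split())


def merge_sources(sources: Dict[str, SourceTable]) -> Dict[str, Dict[str, float]]:
    """Combine per-attribute tables into one dictionary of attributes per country."""
    merged: Dict[str, Dict[str, float]] = {}
    for attribute, table in sources.items():
        for country, value in table.items():
            merged.setdefault(normalize_name(country), {})[attribute] = value
    return merged


def missing_attributes(merged: Dict[str, Dict[str, float]], required: List[str]) -> Dict[str, List[str]]:
    """For each incomplete country, list the required attributes no source supplied."""
    gaps: Dict[str, List[str]] = {}
    for country, attrs in merged.items():
        missing = sorted(set(required) - set(attrs))
        if missing:
            gaps[country] = missing
    return gaps
```

Simple normalization only fixes capitalization and spacing; sources that use different names for the same country still need a hand-maintained alias table before merging.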