Modeling of COVID-19 cases of selected states in Nigeria using linear and non-linear prediction models

COVID-19 has left an indelible mark on human history as one of the deadliest recorded viruses. It has claimed millions of lives, including many deaths whose exact cause could not be established for lack of knowledge. It has become a household name in every nook and cranny of the world, from developed to underdeveloped nations. Most of the prominent signs of COVID-19, such as fever, cough, difficulty in breathing and occasional muscle pain, resemble those of many other notable diseases, which makes a diagnostic test necessary to identify COVID-19 patients categorically. Medical diagnostic tests can also help determine which patients have recovered from COVID-19. Many studies have tried to predict and forecast the level of damage and disruption of economic activities that the disease has brought to almost every nation of the world. This research investigates the nature of the spread of the virus using Autoregressive Integrated Moving Average (ARIMA) and Artificial Neural Network (ANN) models. The aim is to ascertain which model to use in forecasting future occurrences of the pandemic, especially at this stage where a second wave is in view. The study found that both linear and nonlinear prediction models can fit the trend of the virus in Nigeria, with ARIMA producing an accuracy of over 97% on a 120-day period while ANN produced an accuracy of about 98.01% in some states. We conclude that future waves of the virus, as well as other epidemics of this nature, can be predicted with a high degree of accuracy using ARIMA or ANN.


Introduction
Coronavirus disease 2019 (COVID-19) is an infectious and transmittable viral disease caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), which emerged in Wuhan, China [1]. An earlier, related coronavirus epidemic (SARS) began in Guangdong province, China, in late 2002 and spread to Hong Kong on February 21, 2003 [2]. COVID-19 has since grown into a pandemic that has negatively affected almost every nation of the world. The rate at which the disease is spreading is alarming, as it has been reported on almost every continent, from Asia to America, Australia, Europe and Africa. As the virus spread to various parts of the world, the Nigerian government took some proactive measures to prevent it from getting into the country. These mainly took the form of strict security checks at the airports, enforced as early as January 2020. The measures yielded insufficient results, as the country recorded her index case in late February from a victim who imported the virus from Europe. The announcement of the presence of the virus in Nigeria immediately brought great fear and concern to Nigerians. Many began to doubt the effectiveness of the surveillance operations deployed in the airports and the country's general preparedness as the virus continued to spread around Europe, Asia and the United States at an astronomical rate. Prior to testing positive, the index case in Nigeria had visited other states of the country apart from Lagos State. This points to an inadequate level of readiness to combat the virus, especially considering the level of publicity given to the pandemic worldwide. Just after the index case was detected, the Nigeria Centre for Disease Control (NCDC) launched a National Emergency Operations Centre (EOC) to oversee the national response to COVID-19. As a follow-up, the Presidential Task Force (PTF) for coronavirus control was inaugurated on March 9, 2020 [3].
As at August 13, 2020, Nigeria had recorded 47,743 confirmed cases of COVID-19, of which 12,844 were active, 33,943 had been discharged and 956 had died. These figures come from a total of 338,084 samples tested throughout the country [4]. As at that date, the country had 62 NCDC COVID-19 laboratories spread across all the states of the federation, serving a population of about 200,000,000. Andres et al. (2020) found that there is a relationship between COVID-19 behavior and the population of the region being considered. One can therefore argue that the small number of cases recorded in Nigeria may not adequately reflect the true spread of COVID-19. Jester, Uyeki & Jernigan (2018) [5] traced the pandemics of 1918, 1957, 1968 and 2009 and noted that all four pandemics of the last 100 years have had some genes that originated from avian influenza viruses. These viruses are constantly changing and therefore require ongoing surveillance and frequent vaccine virus changes. Improvements in healthcare, scientific research, vaccines and communications have, however, improved pandemic response. Various studies exist on the prediction of cases and mortality rates related to COVID-19. This research examines the effectiveness of both the ANN and ARIMA models in predicting the rate of spread of the virus in Nigeria. Its major significance is that it compares a linear (ARIMA) and a non-linear (ANN) method of time series forecasting when applied to the confirmed cases of COVID-19 in Nigeria, which will help the concerned agencies in tackling the prevalence and future course of the disease.
Oladipo & Babatunde (2016) [6] used Decision Tree, Logistic Regression and ANN for the prediction of diabetes among two sets of people from China and concluded that machine learning algorithms can efficiently predict an ailment that is regarded as one of the top five causes of death in the United States [7].

Methodology
In this study, daily COVID-19 situation reports as presented by the Nigeria Centre for Disease Control (https://covid19.ncdc.gov.ng/) were used. The data cover a sample of 10 states for a period of 120 days between April 2020 and July 2020. The data are normalized and, for the ANN, a unipolar sigmoid function is applied to produce values between 0 and 1. For the states selected, only the observed cases, the number of patients already discharged and the number of deaths were used as input variables to the neural network, while the output is a single attribute representing the predicted number of cases. The simulation was implemented in the R programming language, with the predicted values written to spreadsheets. These worksheets were exported to a C# interface, where the relative error values were computed and the output written to MySQL data tables.

General Form of ARIMA(p,d,q)
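An AR(p) model is a discrete-time linear equation with noise of the form:
$$X_t = \varphi_1 X_{t-1} + \varphi_2 X_{t-2} + \dots + \varphi_p X_{t-p} + \varepsilon_t$$
Here, $p$ is the order, $\varphi_1, \dots, \varphi_p$ are the parameters or coefficients (real numbers), and $\varepsilon_t$ is an error term, which is usually white noise.
Since AR is a time series model, we introduce the time lag operator $L$, defined by $L X_t = X_{t-1}$ for all $t \in \mathbb{Z}$ (the set of integers). Since the time lag operator is a linear operator, its powers, positive or negative, can be denoted as:
$$L^k X_t = X_{t-k}$$
With this lag notation, the AR model becomes:
$$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right) X_t = \varepsilon_t$$
For $p = 1$, for example, $X_t = \varphi_1 X_{t-1} + \varepsilon_t$, so that $X_t - \varphi_1 X_{t-1} = \varepsilon_t$. The MA(q) model in ARIMA, which denotes the moving average of order $q$, is an explicit formula for $X_t$ in terms of noise of the form:
$$X_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} = \left(1 + \sum_{j=1}^{q} \theta_j L^j\right)\varepsilon_t$$
where $\theta_1, \dots, \theta_q$ are the moving-average coefficients. Combining the AR and the MA, with orders $p$ and $q$, and differencing the series $d$ times, we have:
$$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right)(1 - L)^d X_t = \left(1 + \sum_{j=1}^{q} \theta_j L^j\right)\varepsilon_t$$
This is the general form of the ARIMA(p,d,q) model as a discrete-time linear equation.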
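As a small illustration of the lag operator, the differencing step $(1 - L)^d$ that supplies the "integrated" part of ARIMA can be sketched in Python. This is a minimal sketch and not the authors' R implementation; the function name `difference` and the case counts are hypothetical.

```python
def difference(series, d=1):
    """Apply (1 - L)^d to a series by differencing it d times.

    Each pass replaces x_t with x_t - x_{t-1}, so the result is
    d elements shorter than the input.
    """
    out = list(series)
    for _ in range(d):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


# Hypothetical cumulative case counts, for illustration only.
cumulative_cases = [2, 5, 9, 14, 20, 27]

first_diff = difference(cumulative_cases, d=1)   # [3, 4, 5, 6, 7]
second_diff = difference(cumulative_cases, d=2)  # [1, 1, 1, 1]
```

In this toy series the first difference still trends upward while the second difference is constant, which is the kind of behavior that would lead to choosing d = 2.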

Data Preparation for ARIMA
To process the data we used the ARIMA(p,d,q) model. The steps involved in the use of an ARIMA model are identification, parameter estimation and forecasting. These steps were repeated until an appropriate model was identified. The model selected is the one with the lowest Akaike Information Criterion (AIC) value [8]. Depending on the dataset, the data can also be made stationary by differencing before training and forecasting.
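The selection rule can be sketched as follows, using the standard definition AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. This is only an illustration: the log-likelihood values below are hypothetical, the parameter count k = p + q + 1 (coefficients plus the noise variance) is an assumption, and the helper names `aic` and `select_by_aic` are not from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def select_by_aic(candidates):
    """Return the order with the lowest AIC, plus all AIC scores.

    candidates maps an ARIMA order (p, d, q) to a tuple of
    (maximized log-likelihood, number of estimated parameters).
    """
    scores = {order: aic(ll, k) for order, (ll, k) in candidates.items()}
    best_order = min(scores, key=scores.get)
    return best_order, scores


# Hypothetical fitted log-likelihoods for three candidate orders.
candidates = {
    (0, 1, 1): (-412.3, 2),
    (2, 1, 1): (-409.8, 4),
    (0, 1, 2): (-411.0, 3),
}

best_order, scores = select_by_aic(candidates)
# best_order is (2, 1, 1): its AIC of about 827.6 is the lowest.
```

In practice the log-likelihoods would come from the fitted models themselves (for example from the output of an ARIMA fitting routine) rather than being typed in by hand.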

Data Preparation for ANN
The general equation model for ANN is given as:
$$y_t = w_0 + \sum_{j=1}^{q} w_j \, f\!\left(w_{0,j} + \sum_{i=1}^{p} w_{i,j} \, x_{i,t}\right) + \varepsilon_t$$
where $y_t$ is the predicted value, $x_{i,t}$ is the $i$-th input at time $t$, $w_j$ are the output layer weights, $w_{i,j}$ are the input layer weights, $w_0$ and $w_{0,j}$ are bias terms, $f$ is a transfer function, $q$ is the number of hidden nodes, $p$ is the number of input nodes, and $\varepsilon_t$ is a random error at time $t$. Of the 120 days of data collected for the selected states, we used 70% for training the algorithm and the remaining 30% for testing. The 70% set aside for training is further divided into two: 40% of the data is reserved for actual training and 30% is used for cross-validation. Before the data are fed into the ANN, they are normalized. Normalization is needed because the features used as input do not share a uniform scale. This experiment uses the min-max normalization function.
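The split and the normalization described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the split is taken in chronological order, and the minimum and maximum are computed from the training portion only, neither of which the paper states explicitly. The function names and the placeholder series are hypothetical.

```python
def chronological_split(series, train_frac=0.4, val_frac=0.3):
    """Split a series into training, validation and test parts, in time order."""
    n = len(series)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test


def min_max_scale(values, lo, hi):
    """Min-max normalization: map lo to 0 and hi to 1."""
    if hi == lo:
        # A constant feature carries no scale information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Placeholder 120-day series (the values 1..120), for illustration only.
daily_values = list(range(1, 121))

train, val, test = chronological_split(daily_values)  # 48, 36 and 36 days

# Assumption: scaling bounds come from the training data alone.
lo, hi = min(train), max(train)
train_scaled = min_max_scale(train, lo, hi)
test_scaled = min_max_scale(test, lo, hi)
```

With bounds taken from the training data, validation and test values can fall outside the interval from 0 to 1 when the series keeps growing; computing the bounds over the whole series avoids this at the cost of using information from the test period.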
The unipolar sigmoid function is adopted and is defined as:
$$f(x) = \frac{1}{1 + e^{-x}}$$
It was used to normalize input vectors of the training data prior to processing and returns values between 0 and 1.
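To make the model concrete, the following Python sketch evaluates the unipolar sigmoid and one forward pass through a network with a single hidden layer. It is an illustration rather than the authors' R code: the bias terms, the weights, the choice of three inputs and two hidden nodes, and the function names are all assumptions made for the example.

```python
import math


def sigmoid(x):
    """Unipolar sigmoid 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: sigmoid hidden layer followed by a linear output.

    hidden_weights[j][i] is the weight from input i to hidden node j.
    """
    hidden = [
        sigmoid(bias + sum(w * x for w, x in zip(weights, inputs)))
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return output_bias + sum(w * h for w, h in zip(output_weights, hidden))


# Hypothetical normalized inputs: cases, discharged, deaths.
inputs = [0.42, 0.30, 0.05]

# Hypothetical weights for 3 inputs and 2 hidden nodes.
hidden_weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.6]]
hidden_biases = [0.0, 0.1]
output_weights = [0.7, 0.4]
output_bias = 0.05

predicted = forward(inputs, hidden_weights, hidden_biases,
                    output_weights, output_bias)
```

Training would adjust these weights to reduce the prediction error on the training data; the sketch only shows how a prediction is computed once the weights are fixed.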

Validation and evaluation
The daily time series reports of confirmed COVID-19 cases from April 1, 2020 to July 30, 2020 were extracted from the Nigeria Centre for Disease Control website and analyzed for the purpose of the research. The sample is made up of states that had confirmed cases from April, after the index case was recorded in late February. For the ARIMA prediction, the time series data for all the states were checked for stationarity with the Augmented Dickey-Fuller test, using the adf.test() function. Each state's data were differenced as needed to arrive at a stationary series. None of the series was stationary in its original form: some states became stationary at differences = 1, while others attained stationarity at differences = 2. Stationarity is a precondition for the use of ARIMA, since the data must not have any form of trend.
Table 1 shows the various models for the individual states along with their error values. The data series from the states were tested with the ARIMA(0,1,1), ARIMA(2,1,1), ARIMA(1,2,1), ARIMA(2,2,1), ARIMA(0,1,2), ARIMA(0,2,2) and ARIMA(0,2,1) models. For each model, the AIC value is computed, and the model that produces the lowest AIC value is selected as the optimum model and used for the prediction. For the ANN models (3,1), (3,2), (4,1) and (4,2), whose errors are shown in Table 2, the model with the minimum error is selected for prediction. ANN(4:2) was, however, dominant over the other models. The unipolar sigmoid activation function was used for the mapping, while min-max normalization was used to normalize the data.
The prediction performance is evaluated using various error measures. These include the Mean Absolute Percentage Error (MAPE), defined as follows:
$$\mathrm{MAPE} = \frac{100\%}{T} \sum_{t=1}^{T} \left| \frac{A_t - F_t}{A_t} \right|$$
where $A_t$ is the actual value, $F_t$ is the forecasted value and $T$ is the number of terms in the summation.
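A minimal Python sketch of the MAPE computation is given below. The sample values are hypothetical, and reading "accuracy" as 100 minus MAPE is an assumption, since the paper does not state how its accuracy percentages were derived from the error values.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    if any(a == 0 for a in actual):
        # The ratio |A_t - F_t| / |A_t| is undefined when A_t is zero.
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical actual and forecasted case counts.
actual = [100, 200, 400]
forecast = [110, 190, 400]

error_pct = mape(actual, forecast)   # about 5.0
accuracy_pct = 100.0 - error_pct     # about 95.0 (assumed definition)
```

Days with zero confirmed cases make the measure undefined, so such days would need to be excluded or handled with a different error measure.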

Results
Figure 2 shows the 60-day prediction error plots of both ARIMA and ANN for the FCT and Lagos State. For easy comparison, the predictions corresponding to the 30% of the data used for testing the ANN were also extracted from the ARIMA predictions using a C# programming module. This led to the presentation of 18 periods rather than the entire 30-day duration. Table 3 presents a summary of the ARIMA and ANN error values for all the states modeled.

Discussion
For the prediction using ANN, for each state and prediction period, the data were split into training and testing datasets of 70% and 30% respectively. The input is made up of the number of patients on admission and the number already discharged. On the 120-day period, the two models posted accuracy levels of 97.96% and 97.01%, which is in line with the results obtained in the studies by [9-11]. Though no particular ANN model produced the optimal error value across all the data modeled, ANN(4:2) was predominant over the other ANN models tested. On the 120-day period tested for Lagos State and the FCT, the model produced accuracy levels of 90.37% and 98.01% respectively.

Conclusion
There is no known universal antiviral treatment for COVID-19 at the moment, and no vaccine has been certified by the WHO, although various scientists claim to have one kind of cure or vaccine or another. The results of the study are a true representation of the trend in all the states of Nigeria, including the Federal Capital Territory (FCT), considering that almost all geopolitical areas in Nigeria were represented, with the exception of areas that did not have testing facilities during the early days of COVID-19. This shows that either of the two models (ARIMA and ANN) can be used to predict the COVID-19 confirmed cases in Nigeria. We are of the opinion that ANN would have produced a better all-round result if the input neurons had included more of the factors influencing positive confirmations. These might include the number of laboratories in each state and the number of people tested daily, which are not available in real time.
As the spread of the virus has continued to disrupt economic activities the world over, the risks posed, such as restriction of traffic, closure of cities and continued closure of schools, will create a huge economic downturn and reduce the welfare of the people. To ease the tension and the suffering of the masses, the government must intensify its effort in providing necessary financial incentives to the people during this period [12]. With the gradual relaxation of the lockdown in most cities, leading to the reopening of markets, churches and schools, it becomes imperative that safety measures be put in place. We therefore advise that more proactive measures be taken by all stakeholders to forestall an outbreak that the country cannot contain, considering the level of health facilities and experts available in the country. As the lifting of stay-at-home and other restrictions continues across the states of Nigeria, it is important that the populace continues to observe some degree of safety measures. Adams et al. (2020) [13] found a positive correlation between the number of deaths resulting from COVID-19 and the number of active and critical cases. This implies that it is not yet time for Nigerians to celebrate and let down their guard, as more cases are still being confirmed on a daily basis. It has been observed that people have abandoned masks and continued to flock together in market places, religious houses and schools, while social gatherings have resumed without strict adherence to COVID-19 protocols. In the future, other variables that could contribute to the forecasting accuracy of our models, such as humidity, temperature, hygiene and sentiments, can be included in the models.

Funding
The authors certify that there is no conflict of interest with any financial organization regarding the material discussed in the manuscript.