MODELING NOVEL COVID-19 PANDEMIC IN NIGERIA USING COUNT DATA REGRESSION MODELS

This study aimed to model COVID-19 daily cases in Nigeria, focusing on confirmed, active, critical, recovered, and death cases using count data regression models. Three count data regression models (Poisson regression, Negative Binomial regression, and Generalized Poisson regression) were applied to predict COVID-19-related deaths from the confirmed, active, critical, and recovered cases. Secondary data from the Nigeria Centre for Disease Control (NCDC) between February 29, 2020, and October 19, 2020, were used. The study found that Poisson Regression could not handle the over-dispersion inherent in the data. Consequently, Negative Binomial Regression and Generalized Poisson Regression were considered, with Generalized Poisson Regression identified as the best model through performance criteria such as -2 log likelihood (-2logL), Akaike information criterion (AIC), and Bayesian information criterion (BIC). The study revealed positive and significant impacts of confirmed, active, and critical cases on COVID-19-related deaths, while recovered cases had a negative effect. Recommendations included increased attention to confirmed, active, and critical cases by relevant authorities to mitigate COVID-19-related deaths in Nigeria.


INTRODUCTION
Coronavirus disease 2019 (COVID-19) is caused by the novel coronavirus SARS-CoV-2. The origins of the virus are believed to be zoonotic, meaning it likely originated in animals before transmitting to humans (Cheng and Shan, 2020). The first known cases of COVID-19 were reported in December 2019 in the city of Wuhan, Hubei province, China. Many early cases were linked to a seafood market in Wuhan, suggesting a possible zoonotic origin. The market also sold live wild animals, raising concerns about the virus's transmission from animals to humans (Cheng and Shan, 2020). Scientists initially suspected that the virus might have originated in bats and then jumped to humans, possibly through an intermediate host species (Hassan et al., 2020). Some early studies suggested a link to pangolins, a type of mammal, as a potential intermediary host, though this has not been definitively proven (Shereen et al., 2020). The SARS-CoV-2 virus is a betacoronavirus, similar to the Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) that caused the SARS outbreak in 2002-2003. Genetic analysis indicates that SARS-CoV-2 shares a high degree of similarity with coronaviruses found in bats (Hassan et al., 2020). The virus quickly spread globally, leading to a pandemic. The World Health Organization (WHO) declared COVID-19 a Public Health Emergency of International Concern on January 30, 2020, and later declared it a pandemic on March 11, 2020 (WHO, 2020). International efforts, including studies conducted by the World Health Organization, have been underway to investigate the origins of SARS-CoV-2. These investigations involve collaboration between scientists from various countries and aim to understand how the virus emerged and whether it involved an intermediary host species (WHO, 2020).
Nigeria confirmed its first case of COVID-19 on February 27, 2020. The index case was an Italian citizen who arrived in Lagos, Nigeria's largest city, from Milan, Italy. Following the confirmation of the first case, the Nigerian government implemented various measures to curb the spread of the virus (NCDC, 2020). These measures included travel restrictions, quarantine protocols, and public health awareness campaigns.
Subsequent cases in Nigeria were primarily linked to travel or contact with confirmed cases. However, community transmission soon became a concern, prompting the government to escalate response efforts. To contain the spread of the virus, Nigeria implemented lockdowns and restrictions in major cities. These measures included the closure of schools and businesses and restrictions on movement. Nigeria scaled up testing and surveillance efforts to identify and isolate cases promptly. Testing centers were established across the country to enhance diagnostic capabilities. Nigeria collaborated with international organizations such as the World Health Organization (WHO) and received support in terms of expertise, medical supplies, and equipment. As COVID-19 vaccines became available, Nigeria initiated vaccination campaigns to immunize the population against the virus. The vaccination efforts aimed to achieve widespread coverage and reduce the impact of the disease.
Modeling COVID-19 cases is crucial in public health and epidemiology for several reasons. It helps researchers and officials understand the virus's spread, considering factors such as transmission rates and intervention impacts. Modeling anticipates healthcare resource needs, aiding preparedness and allocation of resources like hospital beds, ventilators, PPE, and staff. Policymakers rely on models for informed decisions on interventions, including lockdowns, social distancing, travel restrictions, and vaccination campaigns. Models guide vaccine distribution planning by identifying priority populations and estimating doses required. They monitor the pandemic's progression, track intervention effectiveness, and provide early warnings for timely responses. Models support effective public communication, explaining health recommendations and projecting potential consequences. Additionally, modeling contributes to the development of treatments and vaccines, offering a framework to understand intervention impacts. Collaboratively, models facilitate international cooperation, enabling a coordinated global response to the pandemic. It is crucial to acknowledge that models are simplifications with inherent uncertainties but remain invaluable for decision-making in rapidly evolving health crises like COVID-19.
Count data regression models, such as Poisson regression, Negative Binomial regression and Generalized Poisson regression, can be valuable in modeling COVID-19 cases due to their ability to handle discrete count data. These models are specifically designed for variables that represent the number of occurrences of an event in a fixed unit of time, space, or other well-defined interval, such as disease cases, traffic accidents or customer arrivals. They provide flexibility in handling discrete, non-negative data with a skewed distribution, allowing for variations in the data that may not be adequately captured by traditional linear regression models (Nwankwo and Nwaigwe, 2016).
Several studies have used statistical models, including Poisson-related models, to analyze and forecast infectious disease dynamics, particularly COVID-19 cases in different regions. Babatunde and Igboeli (2020) sought to understand the dynamics of COVID-19 spread by employing Autoregressive Integrated Moving Average (ARIMA) models and Artificial Neural Networks (ANN). The objective was to identify the most suitable model for forecasting future occurrences of the pandemic, especially anticipating the second wave. The study revealed that both linear and nonlinear prediction models effectively captured the virus trend in Nigeria. ARIMA demonstrated over 97% accuracy over a 120-day period, while ANN yielded approximately 98.01% accuracy in certain states. The research concluded that ARIMA or ANN could accurately predict future waves of the virus and similar epidemics. Adewole et al. (2021) formulated a mathematical model of COVID-19 dynamics in Nigeria, incorporating key compartments and parameters. They obtained the basic reproduction number and analyzed the stability of the disease-free equilibrium. Using data from the Nigeria Centre for Disease Control, they calibrated the model and estimated its key parameters. Sensitivity analysis explored parameter influence on disease control, and optimization using Pontryagin's maximum principle identified time-dependent intervention strategies to suppress virus transmission. Numerical simulations explored optimal control solutions, highlighting the necessity of stringent intervention efforts for rapid disease suppression. Zhao et al. (2022) proposed an autoregressive count data model for predicting COVID-19 cases in Sweden. They recommended specific lag periods based on population density and intervention inclusion for accurate predictions. Giudici et al. (2022) introduced Bayesian time-dependent Poisson autoregressive models to estimate the impact of policy measures on COVID-19 cases in Italy and the United States. Their results indicated a significant reduction in disease counts due to policy interventions, including closures and vaccine distribution. Overall, these studies highlight the versatility and effectiveness of Poisson-related models in understanding and forecasting infectious disease dynamics, particularly during the COVID-19 pandemic.

MATERIALS AND METHODS
Data Source
The data used in this work comprised daily confirmed, active, critical, recovered and death cases due to COVID-19 in Nigeria from 29th February, 2020 to 19th October, 2020. The data were obtained as secondary data from the NCDC (2020) website.

Model Specification
The models adopted for the current study are Poisson Regression (PR), Negative Binomial Regression (NBR) and Generalized Poisson Regression (GPR). The daily death cases are the dependent variable, whereas the daily confirmed, active, critical, and recovered cases of COVID-19 are the independent variables. All the variables considered are count data, that is, non-negative integers.

Poisson Regression Model
Poisson regression is a type of regression analysis used when the dependent variable is a count variable, representing the number of events occurring in a fixed interval of time or space. The Poisson regression model assumes that the counts follow a Poisson distribution, which is characterized by the mean rate of events ($\mu_i$) occurring in the fixed interval (Agresti, 2007). The mathematical form of the Poisson regression model is given by:
$$\ln(\mu_i) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon_i \qquad (1)$$
where $\mu_i$ is the expected number of COVID-19 related deaths for the $i$th observation, $\ln(\mu_i)$ is the natural logarithm of the expected number of COVID-19 related deaths and serves as the link function, $\beta_0$ is the intercept term, $X_1$ = confirmed cases, $X_2$ = active cases, $X_3$ = critical cases, $X_4$ = recovered cases, $\beta_1$, $\beta_2$, $\beta_3$ and $\beta_4$ are the slope coefficients corresponding to the confirmed, active, critical and recovered COVID-19 cases respectively, and $\varepsilon_i$ are the error terms which account for the unexplained variations or factors not included in the model. The model assumes that the observed counts $Y_i$ follow a Poisson distribution with mean $\mu_i$:
$$P(Y_i = y_i) = \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots \qquad (2)$$

Kuhe et al., FJS
The link function $\ln(\mu_i)$ maps the positive mean $\mu_i$ onto the whole real line, so the linear combination of the independent variables may take any real value while the predicted mean, obtained by exponentiating that combination, is always positive. The Poisson regression model thus expresses the logarithm of the expected count as a linear combination of predictor variables, and it is commonly estimated using maximum likelihood methods.
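The role of the log link and of the Poisson probability function can be illustrated with a minimal Python sketch. The coefficient and predictor values are hypothetical and chosen only for illustration; the function names `expected_count` and `poisson_log_pmf` are not taken from the study.

```python
import math

def expected_count(beta, x):
    """Inverse log link: mu = exp(beta_0 + sum_j beta_j * x_j)."""
    eta = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
    return math.exp(eta)

def poisson_log_pmf(y, mu):
    """log P(Y = y) for a Poisson variable with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

# Hypothetical coefficients (intercept first) and predictor values.
beta = [0.5, 0.02, -0.01]
x = [10.0, 5.0]

mu = expected_count(beta, x)  # positive whatever the sign of the linear predictor
# The probabilities over y = 0, 1, 2, ... should sum to (numerically) one.
total = sum(math.exp(poisson_log_pmf(y, mu)) for y in range(200))
```

Working on the log scale with `math.lgamma` avoids overflow in the factorial for large counts.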

The Negative Binomial Regression
The Negative Binomial Regression model is a type of generalized linear model (GLM) that is used for count data when the variance is greater than the mean, which is a common characteristic of count data. The Negative Binomial Regression model is particularly suitable when there is overdispersion in the data, meaning that the variability is higher than what would be expected under a Poisson distribution (Hilbe, 2011).
The probability mass function (PMF) of the Negative Binomial distribution is given by:
$$P(Y = y) = \frac{\Gamma(y + r)}{\Gamma(r)\, y!}\, p^{r} (1 - p)^{y}, \quad y = 0, 1, 2, \ldots \qquad (3)$$
where $Y$ is the random variable representing the count, $y$ is the observed count, $r$ is the dispersion parameter (shape parameter) of the distribution, $p$ is the probability of success in a single trial and $\Gamma(\cdot)$ is the gamma function (Hilbe, 2011).
The mean $\mu$ and variance $\sigma^2$ of the Negative Binomial distribution are given by:
$$\mu = \frac{r(1 - p)}{p}, \qquad \sigma^2 = \frac{r(1 - p)}{p^2} \qquad (4)$$
In the context of regression modeling, the Negative Binomial Regression model extends the Poisson Regression model by introducing a dispersion parameter $r$. The mean $\mu$ is related to the linear predictor $\eta$ through the logarithmic link function, $\ln(\mu) = \eta$ (Hilbe, 2011).
The linear predictor $\eta$ is modeled as a linear combination of predictors:
$$\eta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon_i \qquad (5)$$
where $X_1$ = confirmed cases, $X_2$ = active cases, $X_3$ = critical cases, $X_4$ = recovered cases, $\beta_0$ = intercept, $\beta_1$, $\beta_2$, $\beta_3$ and $\beta_4$ are the slope coefficients corresponding to the confirmed, active, critical and recovered COVID-19 cases respectively, and $\varepsilon_i$ are the error terms which account for the unexplained variations or factors not included in the model. The dispersion parameter $r$ is estimated along with the coefficients during the model fitting process. The Negative Binomial Regression model allows for handling overdispersed count data and provides a flexible approach for modeling relationships between predictors and count outcomes.
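As a numerical check of overdispersion, the sketch below evaluates a Negative Binomial PMF and compares the resulting mean and variance. It assumes the common shape and success-probability parameterization, with symbols `r` and `p` that are labels chosen here; the parameter values are hypothetical.

```python
import math

def nb_log_pmf(y, r, p):
    """log P(Y = y) = log[Gamma(y + r) / (Gamma(r) y!)] + r log p + y log(1 - p)."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(p) + y * math.log(1.0 - p))

r, p = 2.5, 0.4  # hypothetical shape and success probability

# Truncate the infinite support; the tail beyond 500 is negligible here.
probs = [math.exp(nb_log_pmf(y, r, p)) for y in range(500)]
mean = sum(y * q for y, q in enumerate(probs))
var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))

# Closed forms under this parameterization.
mean_closed = r * (1.0 - p) / p
var_closed = r * (1.0 - p) / p ** 2
```

Under this parameterization the variance equals the mean divided by `p`, so it exceeds the mean whenever `p` is below one, which is the overdispersion the model is meant to absorb.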

The Generalized Poisson Regression (GPR)
The Generalized Poisson Regression (GPR) is a model that extends the Poisson Regression by introducing an additional parameter to account for over-dispersion (Agresti, 2007). The probability mass function (PMF) of the Generalized Poisson distribution is given by:
$$P(Y = y) = \left(\frac{\mu}{1 + \alpha\mu}\right)^{y} \frac{(1 + \alpha y)^{y - 1}}{y!} \exp\left(-\frac{\mu(1 + \alpha y)}{1 + \alpha\mu}\right), \quad y = 0, 1, 2, \ldots \qquad (6)$$
where $Y$ is the random variable representing the count, $y$ is the observed count, $\mu$ is the mean parameter and $\alpha$ is the overdispersion parameter. The mean of the Generalized Poisson distribution is given as $E(Y) = \mu$ and the variance $\sigma^2$ is given as $\sigma^2 = \mu(1 + \alpha\mu)^2$.
In the context of regression modeling, the Generalized Poisson Regression model expresses the logarithm of the mean $\mu$ as a linear combination of predictors. The link function is usually the logarithmic function, leading to a linear predictor $\eta$ with $\ln(\mu) = \eta$ (Agresti, 2007). The linear predictor $\eta$ is modeled as:
$$\eta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon_i \qquad (7)$$
All the parameters are as earlier defined. The model parameters, including the over-dispersion parameter $\alpha$, are estimated during the model fitting process. The Generalized Poisson Regression allows for flexibility in handling over-dispersed count data, and it is particularly useful when the assumption of equidispersion in the Poisson model is violated. The choice of the link function and the inclusion of predictors follow similar principles to other generalized linear models.
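The behaviour of the over-dispersion parameter can be checked numerically. The sketch below assumes one widely used mean-parameterized form of the Generalized Poisson distribution, with mean `mu` and dispersion `alpha`; whether the study used exactly this form is an assumption, and the parameter values are hypothetical.

```python
import math

def gp_log_pmf(y, mu, alpha):
    """log P(Y = y) for a mean-parameterized generalized Poisson variable."""
    scale = 1.0 + alpha * mu
    return (y * math.log(mu / scale)
            + (y - 1) * math.log(1.0 + alpha * y)
            - math.lgamma(y + 1)
            - mu * (1.0 + alpha * y) / scale)

mu, alpha = 3.0, 0.1  # hypothetical values; alpha = 0 gives the Poisson model

# Truncate the infinite support; the tail beyond 400 is negligible here.
probs = [math.exp(gp_log_pmf(y, mu, alpha)) for y in range(400)]
mean = sum(y * q for y, q in enumerate(probs))
var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))

# Under this parameterization the variance is mu * (1 + alpha * mu) ** 2.
var_closed = mu * (1.0 + alpha * mu) ** 2
```

With a positive `alpha` the variance exceeds the mean, and setting `alpha` to zero reduces the expression to the Poisson log-probability.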

Model Evaluation and Diagnostic Checks
The following goodness-of-fit measures are employed to evaluate model performance and to check the adequacy of the estimated models.

Deviance statistic
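Deviance is defined as the log likelihood of the final model multiplied by (-2), measured relative to a model that fits the data perfectly. For the Poisson model it is mathematically expressed as:
$$D = 2\sum_{i=1}^{n}\left[ y_i \ln\left(\frac{y_i}{\hat{y}_i}\right) - (y_i - \hat{y}_i) \right] \qquad (8)$$
where $\hat{y}_i$ is the predicted value of $y_i$.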

Pearson chi-square
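This is a goodness-of-fit measure that compares fitted values of the outcome variable with the actual values. For the Poisson model it is computed mathematically as:
$$\chi^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{y}_i)^2}{\hat{y}_i} \qquad (9)$$
where $\hat{y}_i$ is the predicted value of $y_i$.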
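Both measures can be computed directly from observed counts and fitted means. The sketch below assumes the Poisson forms of the deviance and of the Pearson statistic (variance equal to the mean); the data are hypothetical.

```python
import math

def poisson_deviance(y, mu):
    """D = 2 * sum[y ln(y / mu) - (y - mu)], taking y ln(y / mu) = 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

def pearson_chi2(y, mu):
    """Sum of squared residuals scaled by the Poisson variance (the fitted mean)."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

y_obs = [0, 2, 5, 1]           # hypothetical observed counts
mu_hat = [0.5, 2.0, 4.0, 1.5]  # hypothetical fitted means

deviance = poisson_deviance(y_obs, mu_hat)
chi2 = pearson_chi2(y_obs, mu_hat)
```

Both statistics are zero for a perfect fit and grow as the fitted means move away from the observed counts.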

Bayesian information criterion (BIC)
Bayesian information criterion (BIC) is a goodness-of-fit measure defined as:
$$\mathrm{BIC} = -2\ln L + k\ln(n) \qquad (11)$$
where $n$ is the total number of observations, $k$ is the number of model parameters and $L$ is the likelihood function of the final model, defined for the Poisson model as:
$$L = \prod_{i=1}^{n} \frac{e^{-\hat{\mu}_i}\,\hat{\mu}_i^{y_i}}{y_i!} \qquad (12)$$
Thus, given a set of estimated count data regression models for a given dataset, the preferred model is the one with the minimum values of the information criteria and the maximum log likelihood value.
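The selection rule can be sketched in a few lines of Python. The AIC is computed with its standard definition, $-2\ln L + 2k$; the log-likelihoods, parameter counts and sample size below are hypothetical placeholders, not values from the study.

```python
import math

def information_criteria(log_lik, k, n):
    """Return -2logL, AIC and BIC for a model with k parameters fitted to n observations."""
    neg2ll = -2.0 * log_lik
    return {
        "-2logL": neg2ll,
        "AIC": neg2ll + 2 * k,
        "BIC": neg2ll + k * math.log(n),
    }

# Hypothetical (log-likelihood, number of parameters) pairs for three models.
fits = {"PR": (-520.0, 5), "NBR": (-310.0, 6), "GPR": (-300.0, 6)}
n_obs = 200  # hypothetical sample size

table = {name: information_criteria(ll, k, n_obs) for name, (ll, k) in fits.items()}
best_by_aic = min(table, key=lambda name: table[name]["AIC"])
best_by_bic = min(table, key=lambda name: table[name]["BIC"])
```

Because BIC penalizes each parameter by $\ln(n)$ instead of 2, it favors smaller models than AIC once $n$ exceeds about 7 observations.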

Test of multicollinearity for COVID-19 cases in Nigeria
Before parameter estimation for the COVID-19 cases in Nigeria, a multicollinearity test was conducted. The test involves calculating the variance inflation factor (VIF) for each predictor variable; multicollinearity is identified if the VIF value for any independent variable exceeds 10. The results, presented in Table 3, show that the VIF values for all predictor variables are below 10. This signifies that all variables are suitable for inclusion in subsequent analyses.
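A VIF check of this kind can be sketched without external libraries by using the fact that the VIF of predictor $j$, $1/(1 - R_j^2)$, equals the $j$th diagonal element of the inverse of the predictors' correlation matrix. The helper names and the data below are hypothetical.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (intended for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long data columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centred = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sum(v * v for v in c) ** 0.5 for c in centred]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centred, norms)]
            for ci, ni in zip(centred, norms)]

def vif(columns):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical predictors: the first two are strongly correlated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
flags = [value > 10 for value in vif([x1, x2, x3])]  # True marks a VIF above 10
```

A predictor flagged here would be a candidate for removal or combination before fitting the count models.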

Modeling COVID-19 Cases in Nigeria using Family of Count Data Regression Models
The COVID-19 daily confirmed, active, critical, recovered, and death cases are modeled using a set of count data regression models, specifically Poisson Regression (PR), Negative Binomial Regression (NBR), and Generalized Poisson Regression (GPR), and the results are presented in Table 4.

The intercept signifies the anticipated Poisson regression estimate when all predictor variables in the model are set to zero. In the Poisson regression model reported in the upper panel of Table 4, the intercept is negative and statistically significant at the 5% level: the logarithm of the expected COVID-19 deaths is -2.202 when all independent variables are zero. The slope coefficients for confirmed ($\beta_1$), active ($\beta_2$), and critical ($\beta_3$) cases of COVID-19 exhibit positive relationships with COVID-19 death cases and achieve statistical significance at the 5% level. This implies that an increase in the numbers of COVID-19 confirmed, active, and critical cases is associated with a corresponding increase in COVID-19-related deaths. Specifically, a one-unit increase in confirmed, active, and critical COVID-19 cases is expected to increase the logarithm of expected deaths due to COVID-19 by 2.781, 1.007, and 0.167, respectively, assuming the other predictor variables in the model remain constant. The slope coefficient for recovered cases ($\beta_4$) of COVID-19 displays a negative relationship with COVID-19 death cases and attains statistical significance at the 5% level. This indicates that an increase in the number of COVID-19 recovered cases is associated with a decrease in COVID-19 deaths. To elaborate, a one-unit increase in recovered COVID-19 cases is anticipated to reduce the logarithm of expected deaths due to COVID-19 by 0.775, assuming that the other predictor variables in the model remain constant.
In the estimated Negative Binomial regression model presented in the middle panel of Table 4, the intercept is negative and statistically significant at the 1% level: the logarithm of expected COVID-19 deaths is -6.513 when all independent variables in the model are zero. The slope coefficients for confirmed ($\beta_1$), active ($\beta_2$), and critical ($\beta_3$) cases of COVID-19 exhibit positive relationships with COVID-19 death cases and achieve statistical significance at the 5% level. This implies that an increase in the numbers of COVID-19 confirmed, active, and critical cases is associated with a corresponding increase in COVID-19-related deaths. Specifically, for a one-person increase in the number of confirmed, active, and critical COVID-19 cases, the logarithm of expected deaths due to COVID-19 is expected to increase by 5.665, 1.015, and 1.018, respectively, assuming the other predictor variables in the model remain constant. Conversely, the slope coefficient for recovered cases ($\beta_4$) of COVID-19 shows a negative relationship with COVID-19 death cases and reaches statistical significance at the 5% level. This indicates that an increase in the number of COVID-19 recovered cases is associated with a decrease in COVID-19 deaths. To be more precise, for a one-person increase in recovered COVID-19 cases, the logarithm of expected deaths due to COVID-19 is expected to decrease by 0.612, given that the other predictor variables in the model are held constant.
The parameter estimates from the Generalized Poisson Regression model, as presented in the lower panel of Table 4, indicate that all model parameters are statistically significant at the 5% level, as evidenced by p-values less than 0.05. The findings further reveal that the slope coefficients for confirmed, active, and critical cases are positively associated with COVID-19 death cases, whereas the coefficient for recovered cases is negative. This implies that a one-person increase in the number of confirmed, active, and critical cases of COVID-19 is expected to result in approximately 4 persons, 2 persons, and 1 person increase, respectively, in the number of COVID-19-related deaths in Nigeria. Additionally, the z-statistics for confirmed, active, and critical cases (3.4748, 13.012, and 12.412, respectively) are greater than 2. According to the rule of thumb, this suggests that confirmed, active, and critical COVID-19 cases have a positive and significant impact on the number of deaths associated with COVID-19 in Nigeria.

However, the slope coefficient for recovered cases has a negative and significant value (-1.062) at the 5% significance level. This implies that a one-person increase in the number of recovered cases of COVID-19 in Nigeria during the investigated period is expected to lead to approximately a 1-person decrease in the number of deaths due to COVID-19. The z-statistic for recovered cases is -8.544, which exceeds 2 in absolute value. Following the rule of thumb, this implies that recovered cases have a significant negative effect on the number of COVID-19-related deaths in Nigeria.
The outcomes of the Poisson regression, Negative Binomial regression, and Generalized Poisson regression models distinctly indicate that the predictor variables positively contributing to the number of COVID-19-related deaths in Nigeria are the confirmed cases ($X_1$), active cases ($X_2$), and critical cases ($X_3$). Conversely, the recovered cases ($X_4$) exhibit a negative impact on the number of COVID-19-related deaths in Nigeria. This result agrees with the empirical findings of Adams et al. (2020).
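The rule of thumb on z-statistics can be made explicit by converting each reported z-value into a two-sided p-value under a standard normal reference distribution. The sketch below uses the four z-statistics quoted above for the Generalized Poisson model; the normal reference and the helper name `two_sided_p_value` are assumptions of this illustration.

```python
import math

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, computed with the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# z-statistics reported for the Generalized Poisson regression model.
z_stats = {"confirmed": 3.4748, "active": 13.012, "critical": 12.412, "recovered": -8.544}

p_values = {name: two_sided_p_value(z) for name, z in z_stats.items()}
significant = {name: p < 0.05 for name, p in p_values.items()}
```

The sign of the z-statistic gives the direction of the effect, while its absolute value determines significance, which is why a value of -8.544 indicates a significant negative effect.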

Goodness of fit test for count data regression models
In Poisson regression modeling, a key assumption is equidispersion, where the mean and variance are equal. However, this assumption is often not met due to the presence of over-dispersion in most cases. Over-dispersion can be identified by examining the ratio of deviance to degrees of freedom (df): a ratio greater than 1 indicates over-dispersion, while a ratio less than 1 suggests under-dispersion. Table 5 presents the results of a goodness-of-fit test used to detect over-dispersion in the three count data regression models.
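The dispersion check described above amounts to a single division. In the sketch below the deviance, sample size and number of parameters are hypothetical, and the optional `tolerance` argument is an addition of this illustration for treating ratios close to 1 as equidispersed.

```python
def dispersion_ratio(deviance, n_obs, n_params):
    """Deviance divided by the residual degrees of freedom (n_obs - n_params)."""
    df = n_obs - n_params
    if df <= 0:
        raise ValueError("the model needs more observations than parameters")
    return deviance / df

def describe_dispersion(ratio, tolerance=0.0):
    """Classify a deviance/df ratio relative to the equidispersion value of 1."""
    if ratio > 1.0 + tolerance:
        return "over-dispersion"
    if ratio < 1.0 - tolerance:
        return "under-dispersion"
    return "equidispersion"

# Hypothetical values for a Poisson fit with an intercept and four predictors.
ratio = dispersion_ratio(deviance=1200.0, n_obs=200, n_params=5)
verdict = describe_dispersion(ratio)
```

A ratio far above 1, as in this hypothetical case, is the situation in which the Negative Binomial and Generalized Poisson models are preferred over the Poisson model.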

Model comparison using performance criteria
To choose the most suitable model among the three competing count data models for modeling COVID-19 cases in Nigeria, three performance criteria are taken into account: the -2 log-likelihood, the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC). The preferred model is the one exhibiting the highest log-likelihood value (equivalently, the lowest -2 log-likelihood) and the lowest values for the information criteria. The outcomes of the model comparison are presented in Table 6.
Ibrahim and Oladipo (2020) undertook a study focusing on analyzing COVID-19 spread in Nigeria using statistical models and data from the NCDC. Their research aimed to establish a predictive model to guide health interventions and mitigate virus spread. Daily spread data from February 27 to April 26, 2020, were utilized to build an autoregressive integrated moving average (ARIMA) model using R software. The study conducted stability analysis, stationarity tests, parameter tests, and model diagnostics. Based on AICc model selection criteria, the ARIMA (1,1,0) model was chosen, projecting a steep increase in COVID-19 spread in Nigeria over a ten-day period. Adams et al. (2020) studied COVID-19 cases in Nigeria using Poisson, Negative Binomial, and Generalized Poisson Regression models. They found that the Generalized Poisson Regression was the best model, highlighting the positive impact of active and critical cases on COVID-19-related deaths. Barria-Sandoval et al. (2021) modeled COVID-19 cases and deaths in Chile using ARIMA, Exponential Smoothing, and Poisson models. They determined ARIMA as the best model for confirmed cases and a damped trend method for predicting deaths. Agosto and Giudici (2020) favoured Poisson autoregressive models over exponential growth models for studying COVID-19 cases in South Korea, Iran, and Italy, citing superior predictive performance. Nasution et al. (2021) used Poisson Autoregression to predict COVID-19 cases in Jakarta, finding it more accurate than ARIMA, Exponential Smoothing, BATS, and Prophet models. Tawiah et al. (2021) investigated COVID-19 deaths in Ghana using zero-inflated models. They found the zero-inflated negative binomial autoregressive model outperformed others, predicting a sharp rise and fall in deaths. Alzahrani (2022) applied log-linear Poisson autoregressive and ARIMA models to study COVID-19 in Saudi Arabia, concluding that the former provided better forecasting and could be applied comprehensively. Adewole et al. (2021) formulated a mathematical model to comprehend COVID-19 dynamics in Nigeria.

©2024
Among the three models compared in Table 6, GPR emerges as the preferred model for COVID-19 cases in Nigeria, displaying the highest log-likelihood value (-299.735), along with the lowest AIC (1006.517) and BIC (1519.361) values. This finding agrees with the previous result of Adams et al. (2020), who also found that Generalized Poisson regression was better in modeling COVID-19 data in Nigeria. Beyond these metrics, GPR excels due to its flexibility in handling over- and under-dispersion, accommodating varied variance structures. Its adaptability makes it superior for real-world scenarios where traditional models may falter in capturing data variability, establishing GPR as a robust and versatile choice for count data regression modeling.

CONCLUSION
This study utilized three count data regression models (Poisson, Negative Binomial, and Generalized Poisson) to analyze the factors influencing COVID-19 related deaths in Nigeria, considering confirmed, active, critical, and recovered cases as predictor variables. Secondary data from the Nigeria Centre for Disease Control (NCDC) covering the period from February 29, 2020, to October 19, 2020, were employed. Results indicated that Poisson Regression failed to capture data over-dispersion, leading to the selection of Negative Binomial Regression and Generalized Poisson Regression.

This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International license viewed via https://creativecommons.org/licenses/by/4.0/ which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is cited appropriately.

MODELING NOVEL COVID-19 PANDEMIC… Kuhe et al., FJS FUDMA
Table 1 displays the computed summary statistics for the confirmed, active, critical, recovered, and death cases of COVID-19 in Nigeria.

Table 4: Parameter Estimates of Count Regression Models
The Poisson regression entails modeling the logarithm of the anticipated count in relation to the predictor variables. The parameters of the Poisson regression, as depicted in Table 4, can also be expressed as:
$$\hat{\mu}_i = \exp(-2.202 + 2.781X_1 + 1.007X_2 + 0.167X_3 - 0.775X_4)$$
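The fitted equation can be evaluated directly, and exponentiating each slope gives the multiplicative change in expected deaths for a one-unit increase in that predictor (often called an incidence rate ratio). The sketch below uses the Poisson estimates quoted above; the measurement scale of the predictors is not stated in this section, so the inputs are treated as abstract units and the function name `expected_deaths` is hypothetical.

```python
import math

INTERCEPT = -2.202
COEF = {"confirmed": 2.781, "active": 1.007, "critical": 0.167, "recovered": -0.775}

def expected_deaths(x):
    """Fitted Poisson mean exp(intercept + sum of coefficient * predictor value)."""
    eta = INTERCEPT + sum(COEF[name] * x.get(name, 0.0) for name in COEF)
    return math.exp(eta)

# Multiplicative effect on the expected count of a one-unit increase in each predictor.
rate_ratios = {name: math.exp(b) for name, b in COEF.items()}

baseline = expected_deaths({})  # all predictors at zero
one_more_critical = expected_deaths({"critical": 1.0})
```

A rate ratio above 1 corresponds to a positive coefficient and a ratio below 1 to a negative one, so only recovered cases reduce the expected count in this fit.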

Table 6: Model Comparison Using Performance Criteria
Table 6 compares three count data regression models: Poisson Regression (PR), Negative Binomial Regression (NBR), and Generalized Poisson Regression (GPR).

Generalized Poisson Regression emerged as the best-fitting model based on performance criteria such as -2logL, AIC, and BIC. The study identified positive and significant impacts of confirmed, active, and critical cases on COVID-19 deaths, while recovered cases exhibited a negative effect. The study recommends the use of Generalized Poisson Regression (GPR) for modeling the influence of daily confirmed, critical, active, and recovered COVID-19 cases on related deaths in Nigeria. Additionally, it suggests that the Nigerian Government, particularly through the Presidential Task Force (PTF) on COVID-19 and the Nigeria Centre for Disease Control (NCDC), should prioritize efforts and resources toward monitoring and addressing confirmed, active, and critical cases, as they are identified as the main factors influencing COVID-19 related deaths in the country.