CROP YIELD PREDICTION USING SELECTED MACHINE LEARNING ALGORITHMS

Agriculture is paramount to global food security, and predicting crop yields is crucial for policy and planning. However, predicting these yields is challenging because of the many influencing factors, from soil quality to climate conditions. While traditional methods relied on historical data and farmer experience, recent work has shifted towards machine learning (ML) for improved accuracy. This study explored the application of ML techniques in predicting crop yields using data from Nigeria. Previous efforts lacked transferability across crops and localities; this research aimed to devise modular and reusable workflows. Using data from the Agricultural Performance Survey of Nigeria, this study evaluated the performance of four machine learning algorithms: Linear Regression, Support Vector Regressor, K-Nearest Neighbor, and Decision Tree Regressor. Results revealed the Decision Tree Regressor as the superior model for crop yield prediction, achieving a prediction score of 72%. The findings underscore the potential of integrating ML in agricultural planning in Nigeria, where agriculture significantly impacts the economy. Further research is encouraged to refine these models for broader application across varying agroecological zones.


INTRODUCTION
Crop yield is the quantity of a farm product harvested per unit area of land, usually measured in kilograms per hectare or bushels per acre. According to a report (Factors That Influence Crop Yield - Omnia Nutriology®, 2017), the yield performance of many crops can be attributed to four main factors: soil fertility, availability of water, climate, and diseases or pests. These are among the most important inputs used by scientists to predict crop yield (Xu et al., 2019). Predicting crop yield is critical to addressing one of the growing problems in food security, particularly given the impact of global climate change (Ansarifar and Wang 2019). Such predictions are vital but complex, and they are needed for sustainably boosting production and making good use of natural resources (Phalan, Green, and Balmford 2014). Accurate crop yield prediction is highly pertinent to global food production because the predictions not only aid farmers in making informed commercial and management decisions but also help in famine prevention activities (Ansarifar and Wang 2019).

Different approaches have been used to predict crop performance, including field surveys, crop growth models, remote sensing, statistical models, and their combinations (Paudel et al., 2021). Each of these approaches addresses different aspects of crop yield prediction independently. Field surveys try to capture the ground truth, while crop growth models simulate crop growth and development, taking agronomic principles and environmental and management interactions into consideration (Chipanshi et al., 2015). Remote sensing depends on satellite instruments that provide frequent, coarse-resolution image time series for yield estimation (Atzberger et al., 2016). Statistical models use weather variables and the outputs of field surveys, crop growth models and remote sensing as predictors to develop linear relationships between the predictors and crop yield (Paudel et al., 2021). Some studies have combined two or more of these approaches to predict crop yield. For example, Zhao, Potgieter, Zhang, Wu and Hammer (2020) combined crop modelling and high-resolution remote sensing data to build statistical models to predict crop yield. In a similar approach, Newlands et al. (2014) proposed probabilistic yield prediction in Canada using crop modelling, remote sensing, Bayesian inference and statistical models.

Machine learning (ML) takes a data-driven or empirical modeling approach to learn useful patterns and relationships from input data (Willcock et al., 2018) and offers a promising opportunity for improving crop yield predictions (Paudel et al., 2021). Machine learning models have shown strong performance in several data-driven applications, including crop yield prediction (Zhao, Potgieter, Zhang, Wu and Hammer 2020; Paudel et al., 2021). Many studies have employed machine learning approaches such as multivariate regression, random forest, association rule mining, regression tree and artificial neural network for crop yield prediction (Khaki, Wang, and Archontoulis 2020). These models treat the output, crop yield, as an implicit function of input variables such as weather parameters and soil conditions, which may be a highly complex and nonlinear function (Khaki, Wang, and Archontoulis 2020). Just as in statistical models, machine learning algorithms can also use the output of other prediction approaches as features. Machine learning algorithms have the distinct benefit that they can model nonlinear relationships between multiple sources of data (Chlingaryan, Sukkarieh and Whelan 2018). Their performance generally improves when more training data is available, and regularization techniques are employed to reduce variance and generalization error and to make the models robust to noisy data (Goodfellow, Bengio and Courville 2016). Therefore, machine learning could combine the benefits of other approaches, such as remote sensing, data-driven models, and crop growth modelling, to make reliable crop yield predictions (Paudel et al., 2021).

The European Commission's Joint Research Centre (JRC) and the National Agricultural Statistics Service (NASS) of the US Department of Agriculture operate large-scale crop yield forecasting systems, such as the MARS Crop Yield Forecasting System (MCYFS), that rely on infrastructure and historical data to build and assess crop prediction models for various crops in different localities (Paudel et al., 2021). These systems utilize statistical models built from field survey results, crop growth model output, weather observations, remote sensing indicators and yield statistics (MARSWiki, 2020; USDA-NASS, 2012). However, a performance evaluation of MCYFS from 1993 to 2015 shows no significant improvement in performance from 2006 onwards (Van der Velde and Nisini, 2019). Machine learning could be well suited to such large-scale systems, and it is especially promising when large datasets are gathered and made public (Lokers, Knapen, Janssen, Randen, and Jansen 2016). For example, Jeong et al. (2016) employed multiple linear regression and random forest for yield prediction of potato, wheat and maize. The same machine learning algorithms were used by Shahhosseini, Martinez-Feria, Hu and Archontoulis (2019) to predict nitrate loss and corn yield. Awad (2019) proposed a mathematical optimization model and calculated biomass to predict potato yield. Romero (2013) applied several machine learning methods, including decision tree and association rule mining, to classify yield components of durum wheat and showed that association rule mining performed best across all locations of the study. Ransom et al. (2019) evaluated machine learning approaches for a corn nitrogen recommendation tool using soil and weather information.

Related Works
Various researchers have explored the use of machine learning and data-driven approaches in optimizing agricultural practices and predicting crop yield. Chipanshi et al. (2015) focus on using an Extreme Learning Machine (ELM) model to accurately estimate coffee yield based on soil fertility properties, showing superior performance compared to traditional models. Goldstein et al. (2018) integrate data from various sources to predict irrigation recommendations for Jojoba crops, achieving high accuracy with regression and classification algorithms. Zhong, Li, Lobell, Ermon and Brandeau (2018) propose a hierarchical machine learning mechanism for seed variety selection, considering yield maximization and risk. Filippi et al. (2019) explored the value of combining data from multiple fields and years for predicting crop yield. They used large farms in Western Australia as a case study and developed random forest models to predict crop yield. The models showed accurate predictions, improving as the season progressed and more within-season data became available. Ranjan and Parida (2019) focused on paddy acreage mapping and yield prediction in Sahibganj district, India, using Sentinel-based optical and SAR sensor data. They employed a Random Forest classification technique for mapping paddy acreage and developed a linear regression model for yield prediction. The study highlighted the usefulness of SAR data for accurate acreage mapping and the potential of timely information for decision-makers. Agarwal and Tarar (2021) addressed crop prediction in Indian agriculture using machine learning algorithms. They proposed an enhanced hybrid model combining machine learning and deep learning techniques such as Support Vector Machine (SVM), Long Short-Term Memory (LSTM), and Recurrent Neural Network (RNN). The model aimed to predict the most productive crop and provide information on soil ingredients and expenses. The study emphasized the use of climatic and soil conditions for accurate yield predictions and to assist farmers in decision-making processes. Paudel et al. (2021) used supervised regression and found that explainable features designed using principles of crop modeling can be used to predict crop yield at sub-national level. Ahmed, Adewumi, and Yemi-peters (2023) deployed the Random Forest algorithm to improve prediction accuracy with minimal errors compared to the manual process.

Machine learning models
Machine learning models are mathematical algorithms or computational systems that are designed to learn patterns and make predictions or decisions based on input data. These models are trained on large datasets to recognize and generalize patterns, enabling them to perform tasks such as classification, regression, clustering, or anomaly detection.
The machine learning models considered in this study are described in the following subsections.

Regression
In a machine learning regression model, the goal is to predict a continuous output value ($y$) given an input feature vector ($x$) (Xu et al., 2019). The predicted output value is represented as a function of the input features, as shown in Equation 2.1:

$$y = f(x) + \varepsilon \quad (1)$$

where $f(x)$ is the predicted value of $y$ given $x$, and $\varepsilon$ is the error term. In a linear regression model, the function $f(x)$ is a linear function of the input features (Equation 2.2):

$$f(x) = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b \quad (2)$$

where $w_1, w_2, \ldots, w_n$ are the model coefficients (also known as weights) and $b$ is the bias term. For example, for a single input feature $x$ and a linear regression model with coefficient $w$ and bias $b$, the predicted output value $y$ is expressed in Equation 2.3.
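The single-feature case can be illustrated with a minimal Python sketch of ordinary least squares, which is one common way to learn the coefficient $w$ and bias $b$. The function names and the toy rainfall and yield values are hypothetical and are not taken from the study's dataset.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (w, b) that minimize the squared error of y = w*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    # The fitted line passes through the point of means.
    b = mean_y - w * mean_x
    return w, b


def predict(x, w, b):
    """Evaluate the fitted single-feature model."""
    return w * x + b


# Hypothetical toy data lying exactly on y = 2x + 1.
rainfall = [1.0, 2.0, 3.0, 4.0]
crop_yield = [3.0, 5.0, 7.0, 9.0]
w, b = fit_simple_linear_regression(rainfall, crop_yield)
```

With more than one feature the same idea applies, but the coefficients are usually obtained with a linear algebra routine rather than the closed-form slope used here.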

Support Vector Regressor
Support Vector Regression (SVR) is a powerful regression tool that predicts a continuous output value, represented as $y$, based on a given input feature vector $x$ (Xu et al., 2019). The estimated output value is expressed as a function of the input features:

$$y = f(x) + \varepsilon$$

where $f(x)$ is the estimated output value given the input feature vector $x$, and $\varepsilon$ signifies the error term, representing the error in prediction.
Distinct from traditional regression techniques, SVR does not aim to minimize the error directly. Instead, it seeks the optimal hyperplane within a predefined error value $\varepsilon$, establishing an $\varepsilon$-insensitive tube (Agarwal and Tarar, 2021). The fundamental strategy of SVR is to identify a function that deviates from the actual response by at most $\varepsilon$ and at the same time is as flat as possible. SVR operates by mapping the input space into a high-dimensional feature space via a kernel function, where the regression function is then formulated as a function of the mapped input features.
The mathematical representation of the SVR model is somewhat more complex because of the kernel function used to map the data to higher dimensions and the introduction of the $\varepsilon$-insensitive loss function (Cravero, Pardo, Sepúlveda and Muñoz 2022). Formally, a linear Support Vector Regression function can be expressed as in Equation 2.4:

$$f(x) = \langle w, x \rangle + b \quad (4)$$

where $f(x)$ is the regression estimate, $\langle w, x \rangle$ denotes the dot product of the weight vector $w$ and the input vector $x$, and $b$ is the bias term. However, in most practical situations the data is not linear. In such cases, SVR employs the kernel trick to map the input data to a higher-dimensional feature space where a linear function can fit the data. This allows the use of linear methods (like SVR) to solve non-linear problems.

Here $\varepsilon$ is the width of the insensitive tube. In simple terms, the SVR algorithm tries to find a function $f(x)$ that has at most $\varepsilon$ deviation from the actually obtained target $y_i$ for all the training data and, at the same time, is as flat as possible (Agarwal and Tarar, 2021). This is achieved by minimizing $\|w\|$, which gives the flatness. Where this is not possible, the function is allowed to deviate by more than $\varepsilon$, but these deviations are penalized in the objective function of the optimization problem.
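As an illustration of the ε-insensitive idea, the following Python sketch evaluates the linear estimate of Equation 2.4 and a loss that is zero inside the tube and grows linearly outside it. It is a minimal sketch rather than the study's implementation; the weights, bias, and input values are assumed purely for demonstration.

```python
def linear_svr_predict(x, w, b):
    """Linear SVR estimate f(x) = <w, x> + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero when |y - f(x)| <= epsilon, otherwise the excess over epsilon."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Hypothetical model parameters and a single input vector.
w = [0.5, -1.0]
b = 2.0
x = [4.0, 1.0]
pred = linear_svr_predict(x, w, b)

# A target inside the tube incurs no loss; one outside is penalized.
loss_inside = epsilon_insensitive_loss(3.2, pred, epsilon=0.5)
loss_outside = epsilon_insensitive_loss(4.0, pred, epsilon=0.5)
```

Errors smaller than epsilon are ignored entirely, which is what distinguishes this loss from the squared error used in ordinary linear regression.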

K-Nearest Neighbour
K-Nearest Neighbors (K-NN) is a simple yet effective supervised learning algorithm used for both classification and regression (Xu et al., 2019). It works on the assumption that similar inputs have similar outputs, so the algorithm's output is determined by the properties of neighboring data points. The K-NN algorithm identifies the K instances that are nearest to the test instance and, for classification, assigns the most common class in that neighborhood. In the case of a regression problem, it takes the mean (or median, depending on the use case) of the values of the nearest neighbors. The distance between two instances can be measured in many ways, such as Euclidean, Manhattan, or Minkowski distance, and the choice of distance measure depends on the problem at hand (Cravero, Pardo, Sepúlveda and Muñoz 2022). To explain it mathematically, let $x$ denote the input vector to be classified or used for prediction, and $D$ the dataset. The K nearest neighbors are identified as shown in Equation 2.6:

$$N_K(x) = \underset{x_i \in D}{\operatorname{arg\,min}^{K}} \; d(x, x_i) \quad (6)$$

that is, the $K$ points $x_i$ in $D$ with the smallest distance to $x$. Here, $d(x, x_i)$ is a distance metric such as the Euclidean distance, which for two points $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ in an $n$-dimensional space can be computed as (Equation 2.7):

$$d(x, y) = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2} \quad (7)$$

This formula is used to calculate the distance between the new instance and all the instances in the training data. For classification, once the K neighbors are identified, the algorithm assigns the class that is most common among the neighbors:

$$y = \operatorname{mode}(y_i) \quad \text{for } x_i \in N_K(x)$$

Here, $\operatorname{mode}(\cdot)$ is the most common output (class) among the K nearest neighbors. For regression, the predicted output is typically the mean or median of the K nearest neighbors:

$$y = \operatorname{mean}(y_i) \quad \text{for } x_i \in N_K(x) \qquad \text{or} \qquad y = \operatorname{median}(y_i) \quad \text{for } x_i \in N_K(x)$$

K-NN is a non-parametric, lazy learning algorithm, meaning that it does not learn a discriminative function from the training set but 'memorizes' the training dataset instead (Cravero, Pardo, Sepúlveda and Muñoz 2022). The parameter K is crucial in this algorithm, and choosing the right K is a complex task. A smaller K gives a more flexible fit with low bias but high variance, whereas a larger K gives a smoother decision boundary (less variance) but increased bias.
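The regression variant of K-NN described above can be sketched in a few lines of Python. This is a minimal illustration using Euclidean distance and the mean of the neighbors; the function names and the toy training points are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (Equation 2.7)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_regress(query, train_X, train_y, k):
    """Predict the mean target of the k training points closest to query."""
    # Rank every training point by its distance to the query.
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: euclidean(query, pair[0]))
    neighbour_targets = [y for _, y in ranked[:k]]
    return sum(neighbour_targets) / len(neighbour_targets)


# Hypothetical toy data: three points near the origin and one far away.
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [1.0, 2.0, 3.0, 10.0]

pred_k3 = knn_regress([0.1, 0.1], train_X, train_y, k=3)
pred_k1 = knn_regress([0.1, 0.1], train_X, train_y, k=1)
```

Because every prediction scans the whole training set, this sketch also shows why K-NN is called a lazy learner: there is no training step, only a lookup at prediction time.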

Decision Tree
A decision tree is a machine learning model used for classification and regression tasks (Xu et al., 2019). It is a tree-like model that makes decisions based on the value of an input feature and splits the data into different branches based on the decision. The final decisions at the leaf nodes of the tree determine the output class or value for the input data (Cravero, Pardo, Sepúlveda and Muñoz 2022). In a decision tree model, the goal is to predict a class label (in the case of classification) or a continuous output value (in the case of regression) based on a set of input features. For example, in a classification tree, the Gini impurity at a node $t$ is calculated as in Equation 2.8:

$$\mathrm{Gini}(t) = 1 - \sum_{i} p(i \mid t)^2 \quad (8)$$

where $p(i \mid t)$ is the proportion of the samples at node $t$ that belong to class $i$. The decision tree is constructed by recursively splitting the data at each node until the tree is fully grown. The final tree can then be used to make predictions on new, unseen data by following the decisions made at each node and reaching a leaf node, at which the output class or value is determined.
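The Gini impurity of Equation 2.8 can be computed directly from the class labels at a node, as in the following minimal Python sketch. The function names and the example labels are illustrative only; the weighted split score is a common way to compare candidate splits and is assumed here rather than taken from the text.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_i p(i|t)^2 for the labels at one node."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_split_impurity(left, right):
    """Size-weighted average impurity of the two children of a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)


# Hypothetical node with two equally frequent classes.
mixed = ["high", "high", "low", "low"]
pure = ["high", "high", "high", "high"]

gini_mixed = gini_impurity(mixed)
gini_pure = gini_impurity(pure)
# A split that separates the classes perfectly leaves no impurity.
perfect_split = weighted_split_impurity(["high", "high"], ["low", "low"])
```

Gini impurity applies to classification trees; regression trees typically choose splits by the reduction in the variance (or mean squared error) of the target instead.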

MATERIALS AND METHODS
In the proposed framework, four (4) machine learning algorithms, namely Linear Regression, Support Vector Regressor, K-Nearest Neighbor, and Decision Tree Regressor, were applied to predict crop yield. Several of the most common cash crops were selected, taking atmospheric conditions, locations, and climatic parameters into consideration. In this model, data extracted from multiple sources with a variety of parameters was loaded, together with the libraries and packages needed for data preprocessing. Feature selection was performed to extract the most important features in the dataset for the best performance. The dataset was then divided into training and testing sets, which were used to train and test the machine learning algorithms. The testing dataset was then used to evaluate various performance metrics, as in Figure 1.

Figure 1: Architecture of Proposed Model for Crop Yield Prediction

Data Collection and Description
The crop yield/performance datasets were generated from the Agricultural Performance Survey of Nigeria by the National Agricultural Extension and Research Liaison Services (NAERLS) and the Federal Department of Agricultural Extension (FDAE) for a five (5) year period. The dataset covers several crops, including wheat, rice, maize, millet, yam, cocoyam, and sorghum. Climatic data were collected from the Nigerian Meteorological Agency (NiMet) for the same five-year period. The prediction parameters in this dataset include temperature, rainfall, relative humidity, soil moisture, soil surface, and area. Several values are available for each prediction parameter for a single crop. For instance, for a crop such as wheat, each prediction parameter can take any of the values recorded for wheat in the dataset. The same applies to all the other crops in the dataset. A sample of the dataset is captured in Table 1.

Data preprocessing
The raw extracted data was cleaned, transformed, and organized. Exploratory data analysis was performed to identify outliers and missing values, and feature scaling and data transformation were applied where necessary. All the features were evaluated and only the best candidates were selected for the machine learning prediction. The dataset was then divided into training and testing datasets with a test fraction of 0.2. This means that 80% of the data was used for training, while the remaining 20% was used for testing and subsequent performance metrics evaluation.
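The 80/20 split can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions: the function name, the fixed seed, and the placeholder rows are hypothetical, and the study does not say which tool performed the split.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and hold out test_fraction of them."""
    shuffled = rows[:]                      # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_fraction))
    test = shuffled[:n_test]
    train = shuffled[n_test:]
    return train, test


# Hypothetical dataset of 100 records, represented here by their indices.
rows = list(range(100))
train, test = train_test_split(rows, test_fraction=0.2)
```

Shuffling before splitting avoids holding out only the last records, which matters when the data is ordered by year, state, or crop.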

Feature Extraction
After data cleaning, a feature selection process was undertaken. Using an algorithm based on a tree-structured model such as Random Forest or Gradient Boosting, importance scores were attributed to each feature. The results highlighted land_area, crop, and humidity as having the highest importance scores and hence as the most influential variables in the dataset.
Following the feature selection process, several models were trained, and their performance was evaluated using two metrics: Mean Squared Error (MSE) and R-Squared (R2 score). The model using all features ("Full features") did not yield optimal results, with a low R2 score of only 0.17. The Random Forest model displayed a negative R2 score, meaning that it performed worse than simply predicting the mean yield, which indicates its unsuitability for this specific dataset or possible overfitting. However, the model employing Recursive Feature Elimination (RFE) for feature selection displayed the best performance, with an R2 score of 0.47. This underscores the need for prudent feature selection, as several features may not substantially contribute to the predictive capacity of the model and could therefore be removed.
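The two metrics can be written out explicitly, which also shows how a negative R2 score arises. The sketch below is illustrative only; the function names and the toy values are hypothetical and unrelated to the scores reported in this study.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot: the share of variance explained by the model."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]     # reproduces every target
mean_only = [2.5, 2.5, 2.5, 2.5]   # always predicts the mean
reversed_fit = [4.0, 3.0, 2.0, 1.0]  # worse than predicting the mean

r2_perfect = r2_score(y_true, perfect)
r2_mean = r2_score(y_true, mean_only)
r2_bad = r2_score(y_true, reversed_fit)
mse_bad = mean_squared_error(y_true, reversed_fit)
```

An R2 of 0 corresponds to always predicting the mean, so any model whose squared error exceeds that baseline receives a negative score.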

RESULTS AND DISCUSSION

Crop Yield Prediction Scores of the Various Machine Learning Algorithms
The crop yield prediction scores are expressed as percentages. The higher the score, the better the algorithm's performance in predicting crop yield, as presented in Table 3. For the Full-Features dataset, the Linear Regression algorithm achieved a prediction score of 0.92%, the Support Vector Regressor obtained 5.94%, the Decision Tree Regressor achieved 42.11%, and the K-Nearest Neighbor algorithm achieved 25.98%. With the RFE-Features dataset, the Linear Regression algorithm obtained a slightly higher prediction score of 1.15%, the Support Vector Regressor achieved 5.89%, the Decision Tree Regressor showed a significant improvement with a score of 71.59%, and the K-Nearest Neighbor algorithm obtained 25.93%. These results indicate that the performance of the algorithms varies depending on the feature set used. In the case of the Decision Tree Regressor, the algorithm performed significantly better with the RFE-Features dataset compared to the Full-Features dataset as shown in

Shuaibu et al., FJS
Zhong, Li, Lobell, Ermon and Brandeau (2018) propose a hierarchical machine learning mechanism for seed variety selection, considering yield maximization and risk. Crane-Droesch (2018) introduces a deep neural network approach to model the relationship between weather and corn yield, outperforming traditional methods and showing less severe climate change impacts. Khanal, Fulton, Klopfenstein, Douridas and Shearer (2018) demonstrate the effectiveness of machine learning algorithms and remotely sensed data in predicting soil properties and corn yield. Taherei Ghazvinei et al. (2018) apply an extreme learning machine to predict sugarcane growth, providing a swift and accurate model for the sugarcane industry. Ahmed et al. (2018) combine remote sensing and crop modeling to estimate maize yield, showcasing the potential of both techniques with high accuracy. These studies collectively highlight the value of machine learning and data-driven approaches in optimizing agricultural practices and yield prediction. Xu et al. (2019) developed an integrated climatic assessment indicator (ICAI) in Jiangsu Province, China, to evaluate the synthetic effects of meteorological factors on crop production. They used machine learning algorithms to construct the indicator, with Random Forest (RF) performing the best. The ICAI provided values for yield loss, normal conditions, and yield increment. The study assessed the past climatic suitability of winter wheat and predicted future suitability under global warming conditions. Filippi et al. (2019) explored the value of combining data from multiple fields and years for predicting crop yield.

For a single input feature $x$ with coefficient $w$ and bias $b$, the linear regression model takes the form of Equation 2.3:

$$y = wx + b \quad (3)$$

The model coefficients and bias are learned from the data during the training process. The goal is to find the values of the coefficients and bias that minimize the error between the predicted values and the true values of $y$ in the training data. Once the model is trained, it can be used to make predictions on new, unseen data by plugging the appropriate values for $x$ into the equation for $f(x)$ (Cravero, Pardo, Sepúlveda and Muñoz 2022).
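The kernel trick allows linear methods (like SVR) to solve non-linear problems. The kernelized version of the SVR function becomes (Equation 2.5):

$$f(x) = \sum_{i} (\alpha_i - \alpha_i^*) K(x_i, x) + b \quad (5)$$

where $f(x)$ is the regression estimate, $K(x_i, x)$ is the kernel function, which evaluates the inner product of $x_i$ and $x$ in a higher-dimensional feature space, $\alpha_i$ and $\alpha_i^*$ are Lagrange multipliers obtained from the solution of the dual problem, and $b$ is the bias term. The optimization problem in SVR is to find the values of $w$ and $b$ that minimize

$$\frac{1}{2} \|w\|^2 + C \sum_{i} (\xi_i + \xi_i^*)$$

under the constraints

$$y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i, \qquad \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i, \xi_i^* \ge 0 \ \text{for all } i$$

where $\|w\|^2$ is the square of the Euclidean norm of $w$, $C$ is the regularization parameter, and $\xi_i$ and $\xi_i^*$ are slack variables introduced to cope with otherwise infeasible constraints of the optimization problem.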
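To make Equation 2.5 concrete, the sketch below evaluates a kernelized SVR prediction in Python. It is a hedged illustration: the radial basis function (RBF) kernel and its gamma parameter are one common choice assumed here, since the text does not state which kernel was used, and the support vectors and dual coefficients are hypothetical values rather than the solution of an actual optimization.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def kernel_svr_predict(x, support_vectors, dual_coefs, b, gamma=1.0):
    """f(x) = sum_i (alpha_i - alpha_i^*) K(x_i, x) + b.

    dual_coefs[i] stands for the difference (alpha_i - alpha_i^*).
    """
    weighted = sum(c * rbf_kernel(sv, x, gamma)
                   for sv, c in zip(support_vectors, dual_coefs))
    return weighted + b


# Hypothetical support vectors and dual coefficients (not fitted values).
support_vectors = [[0.0], [1.0]]
dual_coefs = [1.0, -1.0]
pred_at_zero = kernel_svr_predict([0.0], support_vectors, dual_coefs, b=0.5)
```

Only training points with a non-zero dual coefficient contribute to the sum, which is why they are called support vectors.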

Discussion
The varying prediction scores achieved by the different machine learning algorithms indicate the importance of dataset and feature selection. The Decision Tree Regressor performed significantly better with the RFE-Features dataset compared to the Full-Features dataset. This finding suggests that the use of feature selection techniques, such as RFE, can improve the performance of prediction models for crop yield. These findings are consistent with the study by Gopal and Bhargavi (2019), which demonstrated the impact of dataset quality and feature selection on crop yield prediction accuracy. The superior performance of the Decision Tree Regressor on both the Full-Feature and RFE-Feature datasets, as indicated by lower MSE and MAE values and higher R-Squared, implies its effectiveness in predicting crop yield. These results align with the findings of previous studies by Kuradusenge et al. (2023) and Javadinejad, Eslamian and Ostad-Ali-Askari (2021), which highlighted the superiority of decision tree-based algorithms in agricultural forecasting. The implication is that employing the Decision Tree Regressor, particularly with the RFE-Feature dataset, can lead to more accurate crop yield predictions. Our study advances the field by clarifying the connection between algorithmic choice, feature selection, and prediction accuracy, and by highlighting the advantages of the Decision Tree Regressor with the RFE-Feature dataset. This study contributes to the existing body of knowledge and provides practitioners with data to help them improve crop yield prediction precision.

CONCLUSION
Considering Nigeria's agricultural climate, our research supports the transformative potential of integrating Machine Learning (ML) into crop yield prediction models. Although our findings are in line with previous research on the efficacy of machine learning, our contribution is the detailed evaluation of the Decision Tree Regressor. With improved error metrics and a prediction score of 72%, the Decision Tree Regressor demonstrates its robustness in crop yield prediction by consistently outperforming the competing algorithms. Beyond validating past research, our work offers specific insights for practitioners to optimize crop yield predictions in the Nigerian agricultural setting. In summary, our research contributes to the field by examining the use of the Decision Tree Regressor and providing practitioners and policymakers with useful suggestions. These results enrich existing knowledge and provide strategic direction for the growth of sustainable agriculture in Nigeria and other comparable economies.

REFERENCES
Agarwal, S., and Tarar, S. (2021). A hybrid approach for crop yield prediction using machine learning and deep learning algorithms. In Journal of Physics: Conference Series (Vol. 1714, No. 1, p. 012012). IOP Publishing.
Ahmed, A., Adewumi, S. E., and Yemi-peters, V. (2023). Seasonal Crop Yield Prediction in Nigeria Using Machine Learning Technique. Journal of Applied Artificial Intelligence, 4(1), 9-20.