Evaluating agronomic factors influencing cotton yield using multivariate regression and model validation

Iftakhar Ahmad, Mohammad H. Lakho, Riaz A. Buriro, Aijaz A. Khooharo

Abstract


Regression analysis is a statistical technique for estimating relationships between variables that are assumed to be linked by cause and effect. Although it is widely used to identify associations and predict outcomes, the assumptions and reliability of a multivariate regression model must be validated before its results are interpreted. This study applies multivariate regression analysis, using multiple independent variables to predict cotton yield in kilograms per hectare (YldPhec). The secondary dataset was obtained from the Cotton Research Station (CRS) in Uthal, Lasbela, Balochistan, Pakistan. The model included eight predictors. The intercept and PlntPl were marginally significant and positive (Beta = 4425.26 and 0.04, respectively; p < 0.1), while Grmn and FbrSnt had highly significant positive effects (Beta = 48.76 and 177.97, respectively; p < 0.001). Conversely, BolWt and StpLt had highly significant negative effects (Beta = -741.49 and -246.77, respectively; p < 0.001), and Lnt also had a significant negative effect (Beta = -145.57; p < 0.01). MikV and BolPp were not significant. Zero-order correlations with the dependent variable were strongly positive for Grmn (0.56) and PlntPl (0.50) and strongly negative for BolWt (-0.41) and Lnt (-0.42). Tolerance values above 0.39 and variance inflation factor (VIF) values below 3 indicate that multicollinearity is not a significant concern among the predictors. The Shapiro-Wilk, Rainbow, and studentized Breusch-Pagan tests gave p-values of 0.74, 0.84, and 0.32, respectively, providing no evidence against normality of the residuals, linearity of the model, or homoscedasticity. Meeting these assumptions supports the validity of the model and underscores the importance of the identified predictors in explaining the variability of YldPhec.
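The multicollinearity check reported above can be illustrated with a minimal sketch. The tolerance of a predictor is 1 - R², where R² comes from regressing that predictor on all the others, and the VIF is the reciprocal of the tolerance, so a tolerance of 0.39 corresponds to a VIF of about 2.56, consistent with the reported VIF values below 3. The code below is not the authors' analysis: the data are synthetic, and the names x_a, x_b, x_c and all function names are hypothetical placeholders chosen for illustration.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(y, predictors):
    """R^2 of an ordinary least squares fit of y on the predictors plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(p) for p in predictors]
    k = len(cols)
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(beta[i] * cols[i][t] for i in range(k)) for t in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((y[t] - fitted[t]) ** 2 for t in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


def tolerance_and_vif(data):
    """Map each predictor name to (tolerance, VIF), regressing it on the other predictors."""
    out = {}
    for name, col in data.items():
        others = [c for other, c in data.items() if other != name]
        tol = 1.0 - r_squared(col, others)
        out[name] = (tol, 1.0 / tol)
    return out


# Synthetic, hypothetical predictors: x_c is built from x_a, so the two are collinear.
rng = random.Random(0)
n_obs = 200
x_a = [rng.gauss(0, 1) for _ in range(n_obs)]
x_b = [rng.gauss(0, 1) for _ in range(n_obs)]
x_c = [a + 0.5 * rng.gauss(0, 1) for a in x_a]

result = tolerance_and_vif({"x_a": x_a, "x_b": x_b, "x_c": x_c})
for name, (tol, vif) in result.items():
    print(f"{name}: tolerance={tol:.2f}, VIF={vif:.2f}")
```

In this sketch the independent predictor x_b should show a VIF close to 1, while the deliberately collinear pair x_a and x_c should show clearly inflated values. Solving the normal equations directly keeps the example free of dependencies; for real data a numerically more stable least squares routine from a statistics library would be the safer choice.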


Keywords


Agronomic parameters; Regression model assumptions; Mahalanobis distance (MD); Model validation; Standardized beta coefficient; Variance inflation factor (VIF)


DOI: 10.33687/ijae.012.003.5418


Copyright (c) 2024 Iftakhar Ahmed Khanzada

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.