An interpretable approach combining Shapley additive explanations and LightGBM based on data augmentation for improving wheat yield estimates
Accurate and timely yield estimation supports food security and effective farm management, contributing to the sustainable development goals (SDGs) related to zero hunger and responsible agricultural practices. Light gradient boosting machine (LightGBM) and Shapley additive explanations (SHAP) were employed to develop an interpretable crop yield estimation model using remotely sensed leaf area index (LAI) and vegetation temperature condition index (VTCI). To expand and balance the VTCIs, LAIs and yields, the synthetic minority over-sampling technique (SMOTE) was used to augment the dataset, resulting in a synthetic dataset with advantages in both quantity and quality. The results showed that the yield estimation model trained on data_SO2 (double magnification of the data) was able to establish complex nonlinear relationships between VTCIs, LAIs and yields, and performed well (R² = 0.63, RMSE = 514.18 kg/ha, MRE = 8.79%). To further assess the model, 10-fold cross-validation was conducted, yielding R² values from 0.46 to 0.66 and corresponding RMSEs from 439.11 kg/ha to 639.26 kg/ha across the ten subsets, which supports the model's generalization and robustness. Additionally, the importance of model interpretability was discussed and the variables that significantly affect the estimated yield were explored. The global interpretability results highlighted the contributions of LAIs and VTCIs at different growth stages of winter wheat to yield; the features contributing most to yield formation are LAI and VTCI at the jointing stage, and LAI at the green-up stage. Local interpretability showed the reasons for differences in yield between low-yield and high-yield years. Moreover, the jointing stage of winter wheat is crucial for yield, with a positive correlation between LAI and VTCI. When normalized LAI exceeds 0.50 and winter wheat has sufficient moisture, SHAP values can surpass 600, providing important guidance for field management. The study can help improve agricultural production efficiency, optimize field management practices, and provide essential references for decision-making in the agricultural sector.
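The abstract reports R², RMSE and MRE, both overall and across ten cross-validation subsets, without giving their formulas. The sketch below shows one common way such figures are computed. It is an illustration, not the authors' code. It assumes that R² is the coefficient of determination (1 - SS_res/SS_tot) and that MRE is the mean of |observed - estimated| / observed, expressed as a percentage. The function names and the sample yields are hypothetical.

```python
import math


def r2_score(observed, estimated):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, estimated):
    """Root mean square error, in the same units as the yields (kg/ha)."""
    squared = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    return math.sqrt(squared / len(observed))


def mre_percent(observed, estimated):
    """Mean relative error in percent (assumed definition)."""
    relative = sum(abs(o - e) / o for o, e in zip(observed, estimated))
    return 100.0 * relative / len(observed)


def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Hypothetical yields in kg/ha, invented purely for illustration.
observed = [5000.0, 5500.0, 6000.0, 6500.0]
estimated = [5200.0, 5400.0, 6300.0, 6400.0]

print(round(r2_score(observed, estimated), 2))
print(round(rmse(observed, estimated), 2))
print(round(mre_percent(observed, estimated), 2))
```

In a 10-fold evaluation, each fold returned by `k_fold_indices` would be held out once while a model is fitted on the remaining nine, and the three metrics would be computed on the held-out fold. Samples would normally be shuffled before splitting, which this sketch leaves out.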
Funding
National Natural Science Foundation of China under Grant 42171332
UKRI funding from a Science and Technology Facilities Council grant administered through Rothamsted Research under Grant SM008 CAU, and in part by the Royal Society-Newton Mobility grant (UK)
UKRI BBSRC (BB/W009439/1)
History
Author affiliation
College of Science & Engineering Geography, Geology & Environment
Version
- AM (Accepted Manuscript)
Published in
Computers and Electronics in Agriculture
Volume
229
Pagination
109758 - 109758
Publisher
Elsevier BV
ISSN
0168-1699
Copyright date
2024
Available date
2025-01-07
Publisher DOI
Language
en
Publisher version
Deposited by
Professor Kevin Tansey
Deposit date
2024-12-17
Rights Retention Statement
- Yes