European beech spring phenological phase prediction with UAV-derived multispectral indices and machine learning … – Nature.com

Phenological data historical overview

In the observation period from 2006 to 2020, spring leaf-out observations typically began before budburst, at the point when buds began swelling (phase 0.5). Figure 3 shows a comprehensive overview of the yearly averaged spring phenological observations from 2006 to 2020 for the beech plot. Between years, the start of the phases varied by up to 40 days and the end of all phases by up to 20 days.

The phenological spring phase development for Beech at the Britz research station between 2006 and 2020. As shown in the figure, the timing of phenological phases can vary considerably over the years, due to a variety of climatic factors.

The analysis of the duration between different phenological phases is crucial for understanding two key aspects: first, the timing of budburst in relation to climate change impacts, and second, the progression to later stages, such as phase 4 and phase 5, when leaves are nearing full development. The "hardening" of leaf cell tissues, which occurs at these later stages, renders the leaves less vulnerable to late frosts, intense early spring solar radiation, and biotic pests such as Orchestes fagi. Additionally, in early spring drought conditions, certain phases may be delayed, extending the development period from phases 1.0 to 5.0. This phenomenon was observed at the Britz research station in 2006, 2012, 2015, and 2019.

Figure 4 illustrates the variability in phase duration from 2006 to 2020, which ranged from 23 to 41 days, and Table 5 summarizes descriptive statistics for the length of time between phases. The phase lengths presented in Fig. 4 and Table 5 are derived from the average timings across all sampled beech trees in the phenology plot. For more accurate predictions of other phases based on a single observed phase, it might be more effective to model using data from individual trees, given the significant heterogeneity that can exist among them during the spring phenological phases. Further research in this direction is warranted.

Average duration of the spring phenological development between phases 1.0 and 5.0 at the Britz research station from 2006 to 2020.

Trends show an earlier onset of phase 1.0 (see Fig. 5; left), as well as phase 5.0 (see Fig. 5; right). A gradual increase in average yearly air temperature (see Fig. 6; left) is also evident, alongside a steady decrease in yearly precipitation (Fig. 6; right).

(left) Yearly linear trend in phenological phase 1.0; (right) Yearly linear trend in phenological phase 5.0.

(left) Yearly linear trend of average air temperature between 2006 and 2020; (right) Yearly linear trend of average precipitation between 2006 and 2020. Both are results from the Britz research station.
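
The linear trends shown in Figs. 5 and 6 correspond to an ordinary least-squares fit of a yearly value against the year. The following is a minimal sketch of such a fit, not the authors' code. The day-of-year values are hypothetical placeholders rather than the Britz observations, and the helper name linear_trend is an assumption made for illustration.

```python
def linear_trend(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical day-of-year values for the onset of phase 1.0 (NOT the Britz data).
years = [2006, 2008, 2010, 2012, 2014, 2016, 2018, 2020]
onset_doy = [118, 116, 117, 113, 112, 113, 109, 108]

slope, intercept = linear_trend(years, onset_doy)
# A negative slope means the phase is reached earlier in the year over time.
print(f"Trend: {slope:.2f} days per year ({slope * 10:.1f} days per decade)")
```

The same function applies unchanged to yearly mean air temperature or precipitation in place of the onset dates.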

Several of the trees used for phenological observations at the research site are equipped with electronic band dendrometers and sap flow measurement devices. Figure 7 depicts the relationship between the phenological phases and the onset of stem growth for tree number 328 during the growth season. Notably, in both 2017 and 2018, the onset of stem diameter growth in this tree coincided with the achievement of phase 3.0, which is marked by the emergence of the first fully unfolded leaves.

Spring phenological phases shown in relation to band dendrometer measurements from 2017 (left) and 2018 (right). Stem growth typically began around the arrival of phase 3.0.

The dendrometer data from 2018 reveal significant fluctuations in growth deficit throughout the growth season. These fluctuations align with the prolonged drought conditions reported in that year, as documented by Schuldt et al.45. This correlation highlights the impact of environmental factors, such as drought, on the growth patterns and phenological development of trees, providing valuable insights into the interplay between climatic conditions and tree physiology.

The analysis of the phase and foliation datasets is further elaborated through the histograms presented in Fig. 8. These histograms exhibit a distinct bimodal distribution, characterized by noticeable left- and right-skewed distributions on the tail ends. This pattern arises from a typical surplus of observations occurring before phase 1.0, which is primarily due to the intensified frequency of observations in anticipation of budburst. Additionally, the extended duration between phases 4.0 and 5.0 contributes to this bimodal distribution. This phenomenon highlights the uneven distribution of observations across different phenological phases, influenced by the varying rates of development and the specific focus of the observation periods.

Histograms showing a distinct bimodal distribution of the phase and foliation ground observations from 2019 and 2020.

Due to the spectral reflectance characteristics of vegetation, visible bands tend to show a positive correlation among each other, whereas the NIR band shows a negative correlation (Mather & Koch, 2011). All the vegetation indices, whether derived from visible or NIR bands or a combination thereof, have a positive correlation with the phase and foliation datasets except for the NDWI, which typically has an inverse relationship with the phases and foliation (see Fig. 9). The most consistent index throughout all datasets, whether originating from single or combined years, is evidently the NDVI with a persistent correlation of r>0.9 (p<0.001) over all datasets.

Spearman correlation analysis of the spectral indices derived from the 2019 and 2020 datasets in relation to the ground observations.
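
To make the correlation analysis concrete, the sketch below computes a few of the indices named above from band reflectances and then their Spearman rank correlation with observed phases. It is an illustration under stated assumptions. The index formulas are the commonly used definitions and are not taken from this text, the NDWI is assumed to be the green/NIR form (in line with the later remark that NDWI and GCC share only the green band), and all reflectance and phase values are hypothetical.

```python
def ndvi(nir, red):
    return (nir - red) / (nir + red)


def gcc(red, green, blue):
    return green / (red + green + blue)


def ngrdi(green, red):
    return (green - red) / (green + red)


def ndwi(green, nir):
    # Assumed green/NIR form of the NDWI.
    return (green - nir) / (green + nir)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical crown-level reflectances (red, green, nir) and observed phases.
observations = [
    (0.08, 0.07, 0.25), (0.07, 0.08, 0.30), (0.06, 0.09, 0.38),
    (0.05, 0.10, 0.45), (0.04, 0.10, 0.50), (0.04, 0.09, 0.52),
]
phases = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0]

ndvi_values = [ndvi(nir, red) for red, green, nir in observations]
ndwi_values = [ndwi(green, nir) for red, green, nir in observations]
print("NDVI vs phase:", round(spearman(ndvi_values, phases), 2))
print("NDWI vs phase:", round(spearman(ndwi_values, phases), 2))
```

In this toy data the NDVI rises and the NDWI falls monotonically with phase, which mirrors the sign pattern described for Fig. 9 but says nothing about the magnitudes found in the study.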

Indices derived from the visible bands (i.e., GCC and NGRDI) showed a correlation of r=0.65 (p<0.001), and their uncalibrated counterparts correlated even more poorly. Interestingly, the meteorological feature AIRTEMP correlated very well with the ground observations (r=0.9; p<0.001), reaching r=0.95 (p<0.001) for the phenological phases.

In terms of correlation among independent features (see Fig. 10), the aim was to avoid combining highly correlated features when multiple independent features were incorporated into the modeling process. This could be especially problematic when multiple indices are derived from the same bands (e.g., NDVI and EVI). Here, we could deduce that the NDREI and GCC, when used together for the modeling process, have a lower correlation (r=0.73) and do not share any bands. Likewise, the NDRE and the NDWI do not share the same bands and correlate negatively (r=-0.8). The NDWI and the GCC share only the green band and also correlate negatively (r=-0.74).

Between-variable Spearman correlation assessment of the 2019/2020 features.
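
The screening of feature pairs described above can be sketched as a filter on the absolute pairwise correlation. The three pair values below are those quoted in the text, entered with a negative sign where the text describes the correlation as negative. The NDVI/EVI value and the cutoff MAX_ABS_R are hypothetical and serve only to show a pair being rejected; the article does not state a numeric cutoff.

```python
# Pairwise Spearman coefficients between candidate features.
pairwise_r = {
    ("NDREI", "GCC"): 0.73,   # quoted in the text
    ("NDRE", "NDWI"): -0.80,  # quoted in the text as a negative correlation
    ("NDWI", "GCC"): -0.74,   # quoted in the text as a negative correlation
    ("NDVI", "EVI"): 0.97,    # HYPOTHETICAL value for illustration only
}

MAX_ABS_R = 0.85  # HYPOTHETICAL cutoff, not taken from the article


def acceptable_pairs(correlations, max_abs_r):
    """Feature pairs whose absolute correlation does not exceed the cutoff."""
    return sorted(pair for pair, r in correlations.items() if abs(r) <= max_abs_r)


for first, second in acceptable_pairs(pairwise_r, MAX_ABS_R):
    print(f"{first} + {second}: |r| = {abs(pairwise_r[(first, second)]):.2f}")
```

The sign is ignored on purpose, since a strongly negative correlation indicates redundancy between two features just as a strongly positive one does.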

In analyzing the use of correlation for feature selection, it is important to note that while this method is informative, particularly for evaluating multicollinearity, it can potentially be misleading. This is because correlation coefficients might be artificially high due to the bimodal influence on the dataset. The aggregation of data points at the tail ends of the distribution results in a biased similarity caused by an oversampling of similar phases, thus leading to high correlation coefficients. Consequently, correlation filtering methods were not the sole reliance for feature selection, as outlined by Chandrashekar and Sahin46. This approach recognizes the limitations of using correlation analysis in isolation, especially in datasets with unique distribution characteristics such as the one described here.

The addition of polynomial terms into regression models can aid in the characterization of nonlinear patterns43 and is conducive to representing phenological trends, particularly those of the spring green-up phases. As polynomial fitting may not be capable of identifying the complexities of phenology metrics in comparison to other algorithms47,48, we used the fitting of polynomials here for the purpose of feature selection, where the aim was to identify which features best correspond to the typical spring phenology curve. Figure 11 shows the fitting of the five polynomial orders using the example for the NDVI, resulting in an RMSE of 0.55, MAE of 0.41 and R-squared of 0.91. Here, the third polynomial order was deemed the best choice for further analysis, as its curve is neither oversimplified nor overly complex.

Modelling of the spring phenological phases (2019/2020) dataset with polynomial regression of the first to fifth order.
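
The comparison of polynomial orders can be sketched as follows. This is a minimal, dependency-free least-squares fit for illustration only, not the authors' implementation. The NDVI and phase values are hypothetical, and the helper names fit_polynomial and rmse are assumptions.

```python
def fit_polynomial(x, y, order):
    """Return a predictor for the least-squares polynomial of the given order."""
    centre = (max(x) + min(x)) / 2
    half_range = (max(x) - min(x)) / 2
    z = [(xi - centre) / half_range for xi in x]  # rescale to [-1, 1] for stability

    n = order + 1
    # Normal equations: a[i][j] = sum(z^(i+j)), b[i] = sum(y * z^i).
    a = [[sum(zi ** (i + j) for zi in z) for j in range(n)] for i in range(n)]
    b = [sum(yi * zi ** i for zi, yi in zip(z, y)) for i in range(n)]

    # Gaussian elimination with partial pivoting, then back substitution.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]

    def predict(xi):
        zi = (xi - centre) / half_range
        return sum(c * zi ** p for p, c in enumerate(coeffs))

    return predict


def rmse(observed, predicted):
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)) ** 0.5


# Hypothetical NDVI values and observed phases (NOT data from the study).
ndvi = [0.45, 0.48, 0.52, 0.58, 0.66, 0.74, 0.80, 0.84, 0.86, 0.87]
phase = [0.0, 0.5, 0.5, 1.0, 2.0, 3.0, 3.5, 4.0, 4.5, 5.0]

rmse_by_order = {}
for order in range(1, 6):
    model = fit_polynomial(ndvi, phase, order)
    rmse_by_order[order] = rmse(phase, [model(v) for v in ndvi])
    print(f"order {order}: training RMSE = {rmse_by_order[order]:.3f}")
```

Because the training error of a least-squares polynomial cannot increase as terms are added, the RMSE alone cannot select the order. The shape of the fitted curve also has to be judged, which is consistent with the reasoning given for choosing the third order.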

To follow, each of the selected individual features was tested with the 3rd-order polynomial separately for the 2019/2020 and 2020/2021 datasets for both phase (Fig. 12) and foliation (Fig. 13). In terms of the phenological phases, the GNDVI shows quite a low dispersal of RMSE for the 2019/2020 dataset, yet the dispersal is higher for the 2020/2021 dataset. A similar result is evident for the NDVI, where less dispersal is found in the 2020/2021 dataset than in the 2019/2020 dataset. The cumulative warming days (AIRTEMP) as well as the indices derived from the uncalibrated visible bands (GCC_UC and NGRDI_UC) fared poorly for both datasets. This was also the case for foliation; however, AIRTEMP performed better for the 2019/2020 dataset. Regarding foliation, the NDVI also performed well for the 2020/2021 dataset, as did the NDREI for both datasets.

Overview of the spring phenological phases and indices modelled with third-order polynomial regression for the 2019/2020 (left) and 2020/2021 (right) datasets.

Overview of spring foliation and indices modelled with polynomial regression of the third order for the 2019/2020 (left) and 2020/2021 (right) datasets.

Based on the results of the correlation analysis and polynomial fitting, we were able to select the most relevant features for further scrutiny during the subsequent modeling process. It is important to note that using the correlation analysis alone in the initial feature selection could have produced an unseen bias due to an aggregation of data points at the tail ends of the datasets, which was especially evident for the 2019/2020 dataset. We proceeded to build three models based on ML algorithms that aided in choosing the best performing algorithms as well as features. Each of the selected individual and combined indices was modeled with each algorithm and evaluated using an 80/20 training/validation data split. This not only helped in choosing the best ML algorithm but also served as a type of model-based feature selection by further narrowing down the selected features. In terms of the phenological phases, an RMSE of 0.5 (0.6) is deemed acceptable and similar to the magnitude of potential human error. For the Britz method of foliation, an RMSE of 10% is assumed to be acceptable; however, some may argue that an RMSE of 5% in terms of foliage observations is possible with ground observations. Here, it should be noted that the Britz method of foliation is based on the percentage of leaves that have fully opened rather than fractional cover or greening-up.
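
The 80/20 evaluation procedure can be sketched as below. This is an illustration under clear assumptions. A straight-line fit stands in for the ML algorithms, because GAM boosting is not reproduced here. The data are synthetic, the random seed is arbitrary, and R-squared is computed as 1 - SS_res/SS_tot, which is one common definition and may differ from the one used in the study.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def rmse(observed, predicted):
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)) ** 0.5


def mae(observed, predicted):
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (one common definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def fit_line(x, y):
    """Stand-in model: a least-squares line, NOT the GAM boosting of the article."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic (index value, observed phase) pairs; NOT data from the study.
rows = [
    (0.40 + 0.01 * i, min(5.0, max(0.0, 0.1 * i + 0.3 * (i % 3 - 1))))
    for i in range(50)
]

train, validation = train_validation_split(rows)
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
observed = [y for _, y in validation]
predicted = [slope * x + intercept for x, _ in validation]

PHASE_RMSE_CUTOFF = 0.5  # acceptance threshold for phases quoted in the text
score = rmse(observed, predicted)
print(f"RMSE={score:.2f}  MAE={mae(observed, predicted):.2f}  "
      f"R2={r_squared(observed, predicted):.2f}  "
      f"within cutoff: {score <= PHASE_RMSE_CUTOFF}")
```

Holding out 20% of the samples gives an error estimate on data the model has not seen, but only from the same stand and years; it does not replace testing on independent sites.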

Regarding the phenological phases, the GAM boosting algorithm showed the best results overall (see Table 6). The GAM models with the features NDREI+GCC resulted in an RMSE of 0.51, MAE of 0.33 and an R-squared of 0.95. The feature combination of NDWI+GCC resulted in an RMSE of 0.46, MAE of 0.3 and R-squared of 0.96. The top performing model was that of GAM boosting with the NDVI, which produced an RMSE of 0.28, MAE of 0.18, and R-squared of 0.98. The second-best performing model was that of the GAM model with the NDRE+NDWI input features, resulting in an RMSE of 0.44, MAE of 0.31 and R-squared of 0.96. Interestingly, the uncalibrated GCC (GCC_UC) outperformed the calibrated GCC with an RMSE of 0.73 for gradient boosting and the GCC_UC index as opposed to an RMSE of 0.81 for GAM boosting and the GCC.

At this stage of the modeling process, the NDVI with GAM boosting showed very good results (RMSE=0.28), which raises the question of whether the model is overfit to the Britz research station beech stand (Table 7). At this point, it is imperative to test the models with unseen data and assess which ones are generalizable over various beech stands, especially those of increased age. In terms of the models derived from indices from the visible bands, the uncalibrated GCC performed slightly better than the radiometrically calibrated GCC and better than some of the models derived from the calibrated multispectral bands, which is particularly interesting, as RGB sensors are typically much cheaper.

For foliation, nearly all models failed the 10% cutoff point; the exceptions were those using the NDVI as an input feature. Both the NDVI-based GAM boosting and gradient boosting models obtained an RMSE of 7%, MAE of 4% and R-squared of 0.98. Here, overfitting could also be a factor; however, it will still be of interest to assess the prediction of foliation on a new dataset (2022) as well as on datasets from outside the Britz research station. The worst performing models were those utilizing the radiometrically calibrated GCC, which yielded an RMSE of 22%, MAE of 16%, and R-squared of 0.92.

With the aim of testing the robustness and generalizability of the developed models, new data from 2022 as well as data from different beech stands were introduced (Table 7). Here, we tested the models on new spring phenological data from the same stand from 2022 (n=17) as well as an older beech stand in Kahlenberg (n=10) located in the same region as the Britz research station and a beech stand in the more mountainous region of the Black Forest (n=8) in southwestern Germany. The three test datasets are each limited to a single epoch; the Kahlenberg site comprises mostly later phases, whereas the Britz and Black Forest datasets have a wide range of earlier phases (<4.0). Additionally, training datasets were divided into three different subdivisions based on the year of origin: 2019/2020, 2020/2021 and all datasets together (2019–2021). This was carried out for the purpose of distinguishing whether data acquisition methods from a certain year contributed to error propagation. For example, the 2019 field data were collected by a different observer, were often not recorded on the same day as the flights (3 days), and had low-quality radiometric calibration. The models chosen for testing were those implementing GAM boosting and the RGB-derived indices GCC (Micasense Altum) and GCC_UC (Zenmuse X7) and the NDVI (Micasense Altum). Table 8 displays a list of all the tested models with reference to the applied index, location, training data subdivision and date.

The results of the model testing of the phenological phase prediction (see Fig. 14) and foliation (see Fig. 15) were ranked in order of the RMSE. Notably, all the models of the phenological phase prediction that achieved the 0.5 threshold (left of green dotted line) were those of the calibrated and uncalibrated GCC, which originate from bands of the visible portion of the electromagnetic spectrum. Five of six of these models were from the Kahlenberg dataset, and one was from the Black Forest dataset. The best performing models were selected for each of the test sites and are mapped out in Figs. 16, 17, 18 and 19. All image data acquired for the test sites with Zenmuse X7 lack radiometric calibration except for the Britz dataset (see Fig. 19), which was acquired with both the X7 and radiometrically calibrated Micasense Altum data.

Graph showing the RMSE for the phase prediction, ranked from poorest to best. The green dashed line depicts the cut-off point of acceptable accuracy. Allowing an RMSE of up to 0.6 would admit the NDVI model derived from the multispectral datasets. Otherwise, only models originating from the visible bands are considered operational.

Graph showing the RMSE for foliation prediction, ranked from poorest to best. The green dashed line depicts the cut-off point of 10%. None of the models for foliation prediction are considered functional.

Phase prediction of an older beech stand (>100 years) utilizing the model originating from the uncalibrated GCC 2020/2021 dataset. The very low RMSE of 0.22 suggests a highly generalizable model; however, it should be noted that this is a relatively small dataset (n=10) that comprises only later phases (>3.0). The ML phase is the predicted phase, and the Phase originates from ground-based observations.

Phase prediction of a beech stand (<70 years) utilizing the model originating from the calibrated GCC 2019/2020 dataset. The Black Forest dataset is particularly challenging, as a wide range of phases are available. An RMSE of 0.43 is within the accepted error cut-off of 0.5.

Phase prediction of a beech stand (47 years) utilizing the model originating from the calibrated GCC 2020/2021 dataset. Despite being a larger dataset (n=17) in comparison to the other test sites, an RMSE of 0.54 was achieved, which can be regarded as achieving the 0.5 threshold.

Phase prediction of a beech stand (50 years) utilizing the model originating from the calibrated NDVI 2020/2021 dataset. This is the only model derived from the nonvisible band (NIR) that is in proximity to the 0.5 threshold (RMSE=0.61). CIR=Color-infrared.

The Kahlenberg dataset (see Fig. 16) with the gcc-uc-2021 model resulted in a very low RMSE of 0.22, MAE of 0.16 and R-squared of 0.08 (n=10). Such a low RMSE for an uncalibrated RGB-based model is an unexpected result here and shows that the later phases, in particular phase 4.0, predict well. Phase 4.0 is a significant phase in the spring green-up, as it corresponds to the completion of all leaf and shoot development. The transition to Phase 5.0 would then follow with the hardening of leaf tissue alongside a change to darker green and increased late-frost hardiness.

Regarding the Black Forest dataset with the bf-gcc-19-20 model, an RMSE of 0.43, MAE of 0.32, and R-squared of 0.02 (n=8) were achieved (see Fig. 17). Here, a scene with a wide range of phases (0.9–3.8) was available, and a successful phenological phase prediction was possible with the calibrated GCC model and training data from 2019 and 2020. It is important to note that the radiometrically calibrated GCC model was used to predict from the GCC derived from the noncalibrated Zenmuse X7. Significant here is that sensor mixing, in terms of model training with the multispectral sensor and prediction with a consumer-grade RGB sensor, is feasible. We considered the low R-squared insignificant due to the small sample size of the test datasets.

The Britz dataset (see Fig. 18) also used the GCC and the 2019/2020 training model (br-gcc-19-20), resulting in an RMSE of 0.54, MAE of 0.45 and R-squared of 0.65 (n=17). The Britz test dataset contains more samples than the other test sites and still achieves the 0.5 threshold. This test dataset, however, comprises the same trees as those in the training dataset, which gives the model an advantage at the Britz test site. This advantage might not extend to other test sites, potentially limiting the model's ability to generalize to different settings.

With respect to the test sites involving phase prediction from the multispectral sensor (Micasense Altum), only the Britz and Kahlenberg sites were available. The only NDVI-based model that was in proximity to the 0.5 threshold was the Britz test dataset (br-ndvi-20-21), with an RMSE of 0.61, MAE of 0.52, and R-squared of 0.58 (n=17). We hypothesized that the radiometric calibration methods from 2019 would influence the model accuracy; however, there was only a marginal difference in the RMSEs of the 2019/2020 and 2020/2021 datasets.

Overall, the best performing and most consistent model for predicting the spring phenological phases was the calibrated GCC model trained on the 2019/2020 dataset. This model (gcc-uc-19-20) demonstrated strong generalization across all test sites, including the Black Forest (bf-gcc-19-20) and Kahlenberg (ka-gcc-uc-19-20), with the highest RMSE observed at the Britz (br-gcc-uc-19-20) 2022 test site (RMSE=0.54). For a visual representation of the model's performance, refer back to Fig. 14.
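
The ranking against the accuracy thresholds can be sketched with the four test RMSE values reported above. The labels are descriptive rather than the model identifiers used in the tables. Rounding the RMSE to one decimal place before comparison is an assumption made here; it reproduces the reading in the text that 0.54 meets the 0.5 threshold and that the NDVI model (0.61) would pass with a 0.6 allowance.

```python
# Test RMSE values for phase prediction as reported in the text.
reported_rmse = {
    "Kahlenberg, uncalibrated GCC": 0.22,
    "Black Forest, calibrated GCC": 0.43,
    "Britz 2022, calibrated GCC": 0.54,
    "Britz 2022, NDVI": 0.61,
}

STRICT_CUTOFF = 0.5   # threshold quoted in the text
RELAXED_CUTOFF = 0.6  # relaxed threshold mentioned in the Fig. 14 caption


def classify(rmse):
    """Label an RMSE against both cutoffs after rounding to one decimal (assumption)."""
    rounded = round(rmse, 1)
    if rounded <= STRICT_CUTOFF:
        return "meets 0.5"
    if rounded <= RELAXED_CUTOFF:
        return "meets 0.6 only"
    return "fails both"


# Rank from poorest to best, the ordering used in Fig. 14.
ranked = sorted(reported_rmse.items(), key=lambda item: item[1], reverse=True)
for label, rmse in ranked:
    print(f"{label:32s} RMSE={rmse:.2f}  {classify(rmse)}")
```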

This research highlights the challenges in obtaining radiometrically calibrated datasets over multiple growing seasons, despite pre- and post-mission calibration panel acquisition and the use of downwelling light sensor (DLS) data. Issues arise when reflectance values bottom out, such as during the calculation of NDVI or other indices involving the NIR band, which occurs when clouds temporarily clear during flight missions, exposing the terrain to direct sunlight. This issue of oversaturation in the NIR band was also reported by Wang41. While the DLS compensates for fluctuations in irradiance, it is effective only for global changes in lighting conditions. The problem is exacerbated in dense forests, where obtaining shadow-free reference panels is nearly impossible, and capturing calibration data at different locations before and after missions is impractical. This could result in time differences from the actual flight mission, during which considerable changes in solar angle might occur.

The size of the reflectance panels also affects the difficulty of radiometric calibration. Honkavaara et al.49 showed a better calibration for larger, custom-made reference panels of 1 × 1 m than for the manufacturer-provided method. Some studies have also demonstrated improved calibration methods using even larger reflectance tarps50,51,52. However, this does not alleviate the problem of acquiring calibration data in dense forests or the previously mentioned sudden changes in illumination. Therefore, further testing and development of improved field radiometric calibration strategies are imperative to more effectively utilize multispectral sensor capabilities.

Despite the challenges with multispectral sensors, particularly in the NIR band, the utility of the RGB bands is notable. Low-cost UAV setups with RGB sensors are widely available, facilitating the collection of vast data. This high data volume is crucial for developing models for various tree species in intensive monitoring plots. A key question is whether training data for models derived from visible bands need calibration from the multispectral sensor. In this case, the model trained with calibrated GCC generalized well with the uncalibrated GCC, but it remains to be seen if this holds true for new datasets and other tree species.

Errors can also arise from crown segmentation in pixel value extraction. For instance, branches from a neighboring tree with earlier phenological onset could overlap into the segmented crown area of the target tree. As segmentation is typically performed with a fully developed canopy (after phase 5.0), such overlapping branches are challenging to account for. Recording influential branches from neighboring trees during ground observations and excluding them from training datasets could improve the quality of training data.

The feature selection process in this research, especially partitioning training datasets by year for testing, was effective. It allowed for scrutinizing and removing training data portions that could affect model generalizability. For instance, the br-ndvi-20-21 derived from multispectral sensors excludes the 2019 dataset due to its lower quality radiometric calibration, time differences between observations, a slightly different multispectral sensor, and a different observer for ground observations. Conversely, the gcc-19-20 models generalized well with the 2019 datasets incorporated, using only bands from the visible spectrum. This suggests that the main factors in error propagation lie in the quality of radiometric calibration and sensor mixing with NIR bands, a conclusion that might not have been apparent without partitioning training by year. Interestingly, sensor mixing does not seem to be an issue with RGB imagery, which is advantageous for acquiring large data volumes.

Incorporating meteorological data, such as warming days (AIRTEMP), as a model feature suggests that other factors, such as a dynamic start date and chilling days, should also be considered for a successful phenological model in fusion with spectral data. However, this concept is somewhat limited, as meteorological data at the individual tree level might not explain the heterogeneity of individual trees in phenological development. The fusion of meteorological and spectral data is more suited for larger-scale applications, where phenological data are applied standwise rather than at the individual tree level.
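
As an illustration of a warming-day feature with an adjustable start date, the sketch below keeps a running count of days whose mean air temperature exceeds a base temperature. The article does not define how AIRTEMP is calculated, so the counting rule, the base temperature of 5 °C, the start_index parameter and the temperature series are all assumptions made for illustration.

```python
def cumulative_warming_days(daily_mean_temp_c, base_temp_c=5.0, start_index=0):
    """Running count of days warmer than base_temp_c, counted from start_index."""
    running_total = 0
    series = []
    for day, temp in enumerate(daily_mean_temp_c):
        if day >= start_index and temp > base_temp_c:
            running_total += 1
        series.append(running_total)
    return series


# Hypothetical daily mean air temperatures in degrees Celsius.
temps = [2.0, 6.0, 7.5, 4.0, 9.0, 11.0]

print(cumulative_warming_days(temps))                 # [0, 1, 2, 2, 3, 4]
print(cumulative_warming_days(temps, start_index=2))  # [0, 0, 1, 1, 2, 3]
```

Shifting start_index changes the accumulated value on every later day, which is why the choice of a dynamic start date matters for such a feature. A chilling-day count could be built the same way by counting days below a threshold.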

Regarding the Britzer foliation method, translating ground observations into remote sensing data was not feasible. Consequently, the Britzer method of foliation has been abandoned at the Britz research station and replaced with the ICP Forests flushing method. Currently, the long-term Britzer phase method, alongside the flushing method, is conducted with the aim of simplifying observations and enabling harmonization of Britz research station data with the ICP Forests network at the international level.
