Atmospheric drought is a natural phenomenon characterized by abnormally low precipitation and elevated temperatures [1]. It occurs in many geographical regions across the globe, with an extremely uneven spatial distribution. The demand for atmospheric drought forecasts is driven by the significant economic damage drought inflicts, its considerable social consequences, and its frequent escalation into humanitarian disasters.
When developing a predictive model based on a regression equation, the key factors for its accuracy are the stability of the resulting equation when it is applied to an independent sample and the optimal selection of predictors. The predictors included in the regression equation are chosen primarily from physical considerations about the processes that govern the dynamics of the predictand, supported by a statistical analysis of the dependence of the predictand on the selected set of predictors.
The choice of method for constructing the regression equation is also important, because ordinary multiple regression often performs poorly on independent samples.
In this paper, the regression model is constructed using the method of characteristic roots (eigenvalues) of the inverse correlation matrix of the predictand and predictors [2,3].
The data used and the rationale for the selected predictors
The data used in this study comprise observed precipitation from the Uzhydromet meteorological network for the period 1966-2023, covering 12 regional centers and the city of Tashkent. Average monthly precipitation values for each month of the year were derived from these data and used to calculate the Standardized Precipitation Index (SPI) [4], which the World Meteorological Organization (WMO) recommends as one of the most informative indices of the state of aridity (humidity) of the atmosphere [5]. This index serves as the predictand in the developed model. For the same period, the database was supplemented with monthly average Wolf numbers, which characterize variations in solar activity and are freely available at http://www.kosmofizika.ru/spravka/spots.htm, and with average monthly values of the Southern Oscillation Index (SOI), freely available at https://www.ncdc.noaa.gov/teleconnections/enso/indicators/soi/data.csv. Both indices, along with precipitation for the 3 months preceding the forecast period, are used as predictors in the model with a one-month lag, i.e. the value for the month preceding the forecast month is taken from the database.
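The predictor selection described above can be sketched as follows. This is only an illustration of one reading of the text: the function name `build_predictors` and the assumption that all three series are monthly sequences aligned on a common time index are hypothetical, not taken from the original data set.

```python
def build_predictors(precip, soi, wolf, t):
    """Return the five predictors for the forecast month with index t.

    precip, soi, wolf: monthly sequences aligned on the same time axis
    (assumed layout). Precipitation is taken for the three months before
    the forecast month; SOI and the Wolf number are taken with a
    one-month lag.
    """
    if t < 3:
        raise ValueError("at least three preceding months are required")
    return [precip[t - 3], precip[t - 2], precip[t - 1], soi[t - 1], wolf[t - 1]]
```

For a forecast of month `t`, the call uses no information from month `t` itself, so the same routine can be applied to both the training and the independent sample.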
The Wolf number (W) characterizes the state of solar activity and is calculated as:
W = k (f + 10g), (1)
where f is the number of observed spots, g is the number of observed groups of spots, and k is a normalization coefficient.
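Formula (1) translates directly into code. The sketch below is illustrative only; the function name and the default value k = 1 are assumptions, since in practice k depends on the observatory and instrument.

```python
def wolf_number(spots, groups, k=1.0):
    """Wolf number W = k * (f + 10 * g), following formula (1).

    spots  -- number of observed spots (f)
    groups -- number of observed groups of spots (g)
    k      -- normalization coefficient (default 1.0 is an assumption)
    """
    return k * (spots + 10 * groups)
```

For example, 12 spots in 3 groups with k = 1 give W = 12 + 30 = 42.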
Studies [6-8] have established a significant influence of the year's position within the 11-year solar activity cycle on drought occurrence. However, the impact of solar activity variations on atmospheric conditions, particularly drought (wetness), varies across different geographic regions of the planet, even to the extent of a change in the sign of influence.
For instance, according to the catalog of Kamenkova NG, spring-summer droughts in the European territory occur on the ascending branch of solar activity, while droughts in Central Asia, taken from the catalog of Uteshev AS [9], are grouped on the descending branch. For Central Asia, the ratio of droughts on the ascending branch to those on the descending branch is 1:11 in percentage terms. This relationship is illustrated in Figure 1 by the cross-correlation function and the causality function [10,11] between the Wolf numbers and the SPI index as functions of time lag. For the selection of predictors, the crucial fact is that the causality function remains within the domain of normal causality over the entire range of time lags, meaning that solar activity variations unequivocally influence drought variability in the specified region. Furthermore, the negative values of the cross-correlation function in the time lag range of ±3 years confirm the inverse relationship between solar activity variations and the processes of drought formation.
Figure 1: Cross-correlation function (a) and causality function (b) between Wolf numbers W and the aridity index SPI, as a function of time lag.
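A cross-correlation function of the kind shown in panel (a) can be estimated as the Pearson correlation between one series and a lagged copy of the other. The sketch below is a minimal illustration under that assumption; the function name and the sign convention for the lag are hypothetical, and the causality function of [10,11] is not reproduced here.

```python
from statistics import mean, pstdev

def cross_correlation(x, y, lag):
    """Pearson correlation between x(t) and y(t + lag).

    A positive lag means y follows x; a negative lag means y leads x.
    Both series are assumed to be equally spaced and of equal length.
    """
    if lag >= 0:
        xs, ys = x[:len(x) - lag], y[lag:]
    else:
        xs, ys = x[-lag:], y[:len(y) + lag]
    mx, my = mean(xs), mean(ys)
    cov = mean((a - mx) * (b - my) for a, b in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))
```

Evaluating the function over a range of lags, for example from -3 to +3 years in monthly steps, gives the curve as a function of time lag; negative values indicate an inverse relationship at that lag.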
The Southern Oscillation Index (SOI). The climate system is characterized by large-scale self-oscillatory processes, such as the Southern Oscillation (a redistribution of air masses in the low latitudes of the Southern Hemisphere between the Indian and Pacific Oceans) and the associated ocean fluctuations, El Niño (warm phase) and La Niña (cold phase), which together are known as ENSO.
A quantitative characteristic of ENSO is the Southern Oscillation Index (SOI), introduced by Walker GT [12]. The Southern Oscillation is the atmospheric component of ENSO and represents fluctuations in surface air pressure between the eastern and western parts of the Pacific Ocean. The SOI is calculated from the difference in surface air pressure between Tahiti (French Polynesia) and Darwin (Australia) and is determined by the following relationships [12]:
In (2), the quantities involved are the surface pressure at Tahiti, the surface pressure at Darwin, and the average surface pressures over the base period at the corresponding points.
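One common way to form a standardized index from these quantities is sketched below. It is an assumption made for illustration: the normalization by the standard deviation of the anomaly difference and the use of the whole input sample as the base period may differ in detail from relationships (2), and the function name is hypothetical.

```python
from statistics import mean, pstdev

def soi_series(p_tahiti, p_darwin):
    """Standardized Tahiti-minus-Darwin pressure anomaly difference.

    p_tahiti, p_darwin: surface pressures at the two points for the same
    dates. The base period is assumed to be the whole input sample.
    """
    mean_t, mean_d = mean(p_tahiti), mean(p_darwin)
    # Difference of the pressure anomalies at the two points
    diff = [(pt - mean_t) - (pd - mean_d) for pt, pd in zip(p_tahiti, p_darwin)]
    sd = pstdev(diff)
    return [d / sd for d in diff]
```

In operational practice the means and standard deviations are usually computed separately for each calendar month, so the function would be applied to, say, all Januaries of the base period at once.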
Figure 2 shows the cross-correlation functions between SPI and SOI for each month of the year in Uzbekistan. The influence of ENSO on atmospheric processes that stimulate aridity in the territory of Uzbekistan varies: it is less pronounced from June to September, and most significant during the cold period of the year.
Figure 2: Correlation functions between SPI and SOI as a function of time lag.
Methods of analysis and construction of the regression equation
Construction of a predictive regression equation based on characteristic roots. The procedure of regression on characteristic roots was developed by Webster JT, Gunst RF and Mason RL [13] and, independently, by Hawkins DM [14]. The method is described in detail in [2,3,15]. Following, for example, [15], we present an algorithm that implements the construction of a regression on characteristic roots (eigenvalues).
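The time series of the predictors F_i(t_j), i = 1, 2, …, 5, j = 1, 2, …, N (precipitation for the three months preceding the forecast, the Southern Oscillation index, and the Wolf numbers) and of the predictand Z(t_j) (the SPI index) are standardized:
f_i(t_j) = (F_i(t_j) - \bar{F}_i) / \sigma_{F_i},   z(t_j) = (Z(t_j) - \bar{Z}) / \sigma_Z, (3)
where \sigma_{F_i} and \sigma_Z are the mean square deviations and the overbar denotes averaging over the sample. An extended matrix A of size N × (i + 1) is constructed from the standardized predictand and predictors:
A = [z, f_1, f_2, …, f_5]. (4)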
Given the matrix (4), the correlation matrix R = (A^T A)^{-1} of size (i + 1) × (i + 1) is calculated. The matrix R is a symmetric matrix with ones on the diagonal:
The next step is to calculate the eigenvalues and eigenvectors of the correlation matrix (5) and to arrange the eigenvectors in order of decreasing eigenvalue. The regression coefficients aj are estimated by the formula:
where vom is the 1st eigenvector, vm are the subsequent eigenvectors arranged in descending order of the eigenvalues λm, and c is determined from the following expression:
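The procedure can be sketched in Python with NumPy as shown below. This is an illustration of the standard latent (characteristic) root regression estimator as it appears in the statistical literature, not necessarily the exact form of formula (6). The function name, the thresholds `lam_tol` and `v0_tol` used to discard non-predictive near-singular components, and the scaling of the correlation matrix by N are all assumptions. The code decomposes the correlation matrix itself; its inverse has the same eigenvectors with reciprocal eigenvalues, which is where the 1/λ weights come from.

```python
import numpy as np

def latent_root_coefficients(F, Z, lam_tol=0.05, v0_tol=0.1):
    """Standardized regression coefficients by latent root regression.

    F: (N, p) array of predictors, Z: (N,) array of the predictand.
    lam_tol, v0_tol: hypothetical thresholds; a component is dropped
    when its eigenvalue AND its predictand element are both small.
    """
    f = (F - F.mean(axis=0)) / F.std(axis=0)    # standardize predictors
    z = (Z - Z.mean()) / Z.std()                # standardize predictand
    A = np.column_stack([z, f])                 # extended matrix, N x (p + 1)
    R = A.T @ A / len(Z)                        # correlation matrix of [z, f]
    lam, V = np.linalg.eigh(R)                  # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]               # reorder to descending
    lam, V = lam[order], V[:, order]
    v0 = V[0, :]                                # predictand element of each eigenvector
    keep = ~((lam < lam_tol) & (np.abs(v0) < v0_tol))
    c = -1.0 / np.sum(v0[keep] ** 2 / lam[keep])
    # Weighted sum of the predictor parts of the retained eigenvectors
    return c * (V[1:, keep] @ (v0[keep] / lam[keep]))
```

With all components retained (for example `lam_tol=0.0`) the estimate coincides with ordinary least squares on the standardized data; the method differs from multiple regression only when near-singular components with little predictive value are removed, which is what is expected to improve stability on an independent sample. A standardized forecast is then obtained as the product of the standardized predictors and the returned coefficients.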
Numerical experiments
To conduct numerical experiments on one-month-ahead atmospheric drought forecasting, a database was used consisting of the predictors (average monthly precipitation, the SOI, and Wolf numbers) and the predictand (calculated values of the SPI aridity index). The training sample period for all predictors and the predictand was 50 years, and the independent (test) sample was 10 years. Forecasts using the model were calculated for 12 regional centers and the city of Tashkent.
As quantitative estimates of the accuracy of the forecast, the following were used:
- standard error (7);
- maximum absolute error (8);
- sign agreement (9).
In (7) and (8), SPIakt is the actual value of the aridity index and SPIfor is the forecast value of the aridity index. In (9), n+ is the number of SPI values with matching signs, n− is the number of SPI values with non-matching signs, and n is the total number of cases.
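The three scores can be computed as in the sketch below. The exact forms of (7) and (9) are assumptions here: the standard error is taken as the root-mean-square error with divisor n, and the sign agreement as the percentage n+/n. Pairs in which either value is exactly zero are counted as non-matching. The function name is hypothetical.

```python
import math

def forecast_scores(spi_act, spi_for):
    """Return (standard error, maximum absolute error, sign agreement in %).

    spi_act: actual SPI values, spi_for: forecast SPI values (same length).
    """
    n = len(spi_act)
    errors = [a - f for a, f in zip(spi_act, spi_for)]
    std_err = math.sqrt(sum(e * e for e in errors) / n)   # assumed RMSE form
    max_abs = max(abs(e) for e in errors)
    n_plus = sum(1 for a, f in zip(spi_act, spi_for) if a * f > 0)
    sign_agreement = 100.0 * n_plus / n                   # assumed percentage form
    return std_err, max_abs, sign_agreement
```

Applied to the independent sample for each location and month, these values can then be averaged across locations.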
Table 1 summarizes the quantitative assessment of the accuracy of one-month lead-time forecasts for the average monthly SPI, averaged across 13 locations according to formulas (7)-(9).
As follows from the estimates presented in the table, the average verification rate of one-month-ahead forecasts of atmospheric aridity exceeds 80%, which approaches the verification level of short-term forecasts, currently the most accurate category of forecasts.