Wind and solar output data
Hourly wind and solar output data for 2016 for 30 provinces of China are retrieved from previous work (ref. 11), except for Tibet wind, Chongqing solar, Taiwan, Hong Kong, and Macao. The dataset contains 8760 h of wind and solar output data, together with the wind and solar installed capacity of these 30 provinces. We denote the hourly wind output as \(W_{i,t,0}\) and the hourly solar output as \(S_{i,t,0}\), where i and t are province and time slot indices, respectively, for \(i\in[1,N]\), \(t\in[1,T]\), \(N=30\), and \(T=8760\). As previously mentioned, daily wind and solar output data are also required for the analysis, which can be calculated as Eqs. (1)-(2):
$$W_{\mathrm{Day},i,c,0}=\max\left(W_{i,t,0},W_{i,t+1,0},\cdots,W_{i,t+23,0}\right),\quad t=24\cdot(c-1)$$
(1)
$$S_{\mathrm{Day},i,c,0}=\max\left(S_{i,t,0},S_{i,t+1,0},\cdots,S_{i,t+23,0}\right),\quad t=24\cdot(c-1)$$
(2)
where \(S_{\mathrm{Day},i,c,0}\) and \(W_{\mathrm{Day},i,c,0}\) are the daily solar and wind output, respectively, of province i on day c, and c is a day index, for \(c\in[1,C]\) and \(C=365\).
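As a minimal sketch of Eqs. (1)-(2), assume the hourly data are held in a NumPy array of shape (N, T) with zero-based hour indices. The array layout, the function name `daily_max`, and the synthetic values are illustrative assumptions, not part of the original dataset.

```python
import numpy as np

def daily_max(hourly: np.ndarray, hours_per_day: int = 24) -> np.ndarray:
    """Collapse an (N, T) hourly array into an (N, C) array of daily maxima."""
    n_prov, n_hours = hourly.shape
    if n_hours % hours_per_day != 0:
        raise ValueError("T must be a whole number of days")
    # Reshape to (N, C, 24) so that the last axis holds the 24 hours of each day,
    # then take the maximum over that axis as in Eqs. (1)-(2).
    n_days = n_hours // hours_per_day
    return hourly.reshape(n_prov, n_days, hours_per_day).max(axis=2)

# Synthetic stand-in data (assumption): 2 provinces, 3 days of increasing output.
hourly = np.arange(2 * 72, dtype=float).reshape(2, 72)
daily = daily_max(hourly)
print(daily.shape)  # (2, 3)
```

With real data, `hourly` would have shape (30, 8760) and the result shape (30, 365).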
Time series prediction relies on historical data. The autoregressive (AR), moving average (MA), and autoregressive moving average (ARMA) techniques are typical methods for studying stationary time series and suit a large number of problems. However, the fluctuations in wind and solar energy indicate that their power generation forms a nonstationary time series with a time-varying mean and variance, which is difficult to study with these methods. To predict nonstationary sequences, Box and Jenkins introduced the autoregressive integrated moving average (ARIMA) prediction model. By applying a suitable number of differencing steps, the ARIMA model converts the wind and solar power generation series into a stationary series that is convenient for prediction analysis. In the literature, the ARIMA model is widely used in short-term renewable forecasting and has been validated to yield satisfactory results.
In prediction model construction, it is necessary to first determine whether the series is stationary. If the series is not stationary, it should be differenced until it meets the stationarity requirements. Suppose the real wind and solar power generation series is \(Y_t\) and the differencing order is denoted by d; the differencing process can then be expressed as Eq. (3):
$$X_t=(1-B)^{d}Y_t,\quad \mathrm{ADFtest}(X_t)=1,$$
(3)
where \(X_t\) is the stationary series obtained from the original real data, B is the lag operator, and \(\mathrm{ADFtest}=1\) indicates that the series passes the augmented Dickey-Fuller (ADF) stationarity test. In addition to the differencing order d, the ARIMA model requires the autoregressive order p and moving average order q, and the ARMA model for \(X_t\) can be expressed as Eq. (4):
$$\left(1-\sum_{i=1}^{p}\varphi_i B^{i}\right)X_t=\mu_0+\left(1-\sum_{i=1}^{q}\mu_i B^{i}\right)\alpha_t,$$
(4)
where \(\varphi_i\) and \(\mu_i\) are the autoregressive parameter and moving average parameter, respectively, \(\alpha_t\) is white noise with a mean of 0, \(\mu_0\) is a deterministic trend quantity greater than 0, and \(B^{i}\) is the ith power of B. With the prediction model, we can obtain the predicted series \(X_{\mathrm{predict},t}\), which is the differenced series of the predicted wind and solar power generation. Thus, the predicted power generation can be obtained through Eq. (5):
$$Y_{\mathrm{predict},t}=(1-B)^{-d}X_{\mathrm{predict},t},$$
(5)
where \(Y_{\mathrm{predict},t}\) denotes the predicted results of the ARIMA-based prediction model, and in this paper, this variable indicates the wind and solar output.
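The differencing in Eq. (3) and its inversion in Eq. (5) can be illustrated with a short NumPy sketch. The function names `difference` and `integrate` and the toy series are assumptions made for illustration. The sketch covers only the \((1-B)^{d}\) and \((1-B)^{-d}\) steps, not the ADF test or the ARMA fit of Eq. (4).

```python
import numpy as np

def difference(y: np.ndarray, d: int) -> np.ndarray:
    """Apply (1 - B)^d to y; the result is d samples shorter than y."""
    return np.diff(y, n=d)

def integrate(x_pred: np.ndarray, history: np.ndarray, d: int) -> np.ndarray:
    """Invert (1 - B)^d for forecasts x_pred that directly follow `history`."""
    # Last observed value of the k-th difference of the history, k = 0 .. d-1.
    anchors = [np.diff(history, n=k)[-1] for k in range(d)]
    out = np.asarray(x_pred, dtype=float)
    # Undo one level of differencing at a time, from order d-1 down to 0.
    for anchor in reversed(anchors):
        out = anchor + np.cumsum(out)
    return out

# Toy series (assumption): y_t = t^2 has a constant second difference of 2.
y = np.array([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])
history, future = y[:4], y[4:]
x_future = difference(y, d=2)[-2:]          # differenced values for the last 2 steps
recovered = integrate(x_future, history, d=2)
print(recovered)  # [25. 36.]
```

In practice `x_future` would come from the fitted ARMA model rather than from the true series; the toy example only checks that the two operators are inverses.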
There are three major parameters of the ARIMA-based prediction model: differencing order d, autoregressive order p, and moving average order q. Parameter d is determined as the minimum number of differences required to obtain a stationary time series. The d value is generally smaller than three because the greater the differencing order, the more information is lost (ref. 52). It should be noted that parameter d is completely determined by the properties of the original sequence, while the selection of p and q should consider the overall prediction effect. In general, p and q should remain within 1/5 of the length of the input data. Because each province has a large amount of wind and solar power generation data in one year, usually 8760 h, we split the data into multiple prediction windows for each province and use the moving window method to predict wind and solar power generation. At present, p and q are usually determined with the Akaike information criterion (AIC) or the Bayesian information criterion (BIC), but these criteria provide the optimal parameter configuration only for a single prediction window. To unify the prediction models across the different prediction windows in the same province and minimize the prediction error, we randomly select 5 weeks of data throughout the year as a sample and traverse p and q for each province to obtain the parameters with the minimum prediction error. The detailed parameters for each province are listed in Supplementary Table 4.
Other parameters, such as the autoregressive parameter \(\varphi_i\) and moving average parameter \(\mu_i\), can vary with the input data. These two parameters are determined by the autocorrelation coefficient and autocovariance, respectively, which can be obtained with the Yule-Walker estimation, least squares estimation, or maximum likelihood estimation method (ref. 53). In this paper, we build the ARIMA-based prediction model, and all the parameters except p, d, and q are generated automatically.
In this paper, we set 6 h as the prediction time scale and 168 h as the input data dimension to predict wind and solar power generation. The reason is that the 6-h-ahead forecast of renewable generation is widely used for power system scheduling and electricity trading in practice. The 6-h-ahead forecast also results in moderate errors that can serve as a benchmark for the uncertainty analysis.
In this paper, we compare four prediction methods: RF, FCNN, RNN, and SVM. These four methods are all sample-based prediction approaches. We begin by constructing the samples using 168-h wind and solar generation data as input features and extracting subsequences of 2, 6, and 24 h as output for 2-h, 6-h, and 24-h step predictions, respectively. The RF method employs a tree-based prediction model that builds multiple decision trees during training. The structure of the decision trees is determined by parameters such as tree depth, the number of trees, and the maximum number of features considered when splitting nodes. The FCNN method utilizes a network structure consisting of interconnected perceptrons. Each time slot's generation data serves as an input feature for the FCNN, and the predicted generation is the output. The network structure is designed based on factors such as regularization, batch size during training, learning rate, and the number of neurons in each layer. The RNN is a neural network structure specifically designed for time series data, incorporating hidden variables to carry information from previous time slots. Similar to the FCNN, the RNN's network structure is determined by parameters including the number of neurons, batch size, and learning rate. The SVM is an early machine learning method that was first employed to separate datasets; it solves an optimization problem to find an optimal hyperplane. Key considerations for the SVM include regularization parameters, the margin of tolerance around predicted regression values, and the influence attributed to each sample. Further details on the network parameters and the tuning process can be found in the Supplementary Note and Supplementary Table 5.
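The sample construction described above can be sketched as follows. This is one possible reading: the one-hour stride between consecutive samples, the function name `make_samples`, and the synthetic series are assumptions, since the text specifies only the 168-h input length and the 2-, 6-, and 24-h output lengths.

```python
import numpy as np

def make_samples(series, n_in: int = 168, n_out: int = 6):
    """Slice a 1-D series into (inputs, targets) pairs with a stride of one hour."""
    series = np.asarray(series, dtype=float)
    n_samples = series.size - n_in - n_out + 1
    if n_samples <= 0:
        raise ValueError("series is shorter than n_in + n_out")
    start = np.arange(n_samples)[:, None]                 # first hour of each sample
    X = series[start + np.arange(n_in)[None, :]]          # shape (n_samples, n_in)
    y = series[start + n_in + np.arange(n_out)[None, :]]  # shape (n_samples, n_out)
    return X, y

# Synthetic stand-in for one province's hourly generation (assumption).
series = np.arange(8760, dtype=float)
X, y = make_samples(series, n_in=168, n_out=6)
print(X.shape, y.shape)  # (8587, 168) (8587, 6)
```

Changing `n_out` to 2 or 24 yields the samples for the other two prediction horizons; the resulting `X` and `y` arrays have the two-dimensional layout that most sample-based regressors accept.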
In this paper, the prediction error of wind and solar energy is calculated per unit megawatt (MW) of installed capacity. With the ARIMA-based benchmark prediction model, we obtain the predicted wind and solar energy generation, and the prediction error can then be calculated as Eq. (6):
$$\varepsilon_{\mathrm{W},i,t}=\frac{W_{i,t,*}-W_{i,t,0}}{C_{\mathrm{W},i}}\cdot 100\%,\quad \varepsilon_{\mathrm{S},i,t}=\frac{S_{i,t,*}-S_{i,t,0}}{C_{\mathrm{S},i}}\cdot 100\%,$$
(6)
where \(\varepsilon_{\mathrm{W},i,t}\) and \(\varepsilon_{\mathrm{S},i,t}\) are the wind and solar prediction errors in province i in time slot t, \(W_{i,t,*}\) and \(S_{i,t,*}\) are the predicted wind and solar output, respectively, of province i in time slot t, and \(C_{\mathrm{W},i}\) and \(C_{\mathrm{S},i}\) are the wind and solar installed capacities, respectively, in province i. When determining the prediction error in a given province, we calculate the average value over 8760 h.
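Eq. (6) can be written as a few lines of NumPy. The text does not state whether the provincial average is taken over signed or absolute errors, so the sketch below reports both and labels the absolute version as an assumption. The function name and the numbers are illustrative.

```python
import numpy as np

def capacity_normalized_error(predicted, actual, capacity: float) -> np.ndarray:
    """Eq. (6): hourly prediction error as a percentage of installed capacity."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return (predicted - actual) / capacity * 100.0

# Illustrative values in MW (assumption), with 1000 MW of installed capacity.
actual = np.array([100.0, 200.0, 300.0])
predicted = np.array([110.0, 190.0, 300.0])
err = capacity_normalized_error(predicted, actual, capacity=1000.0)

signed_mean = err.mean()             # over- and under-prediction cancel out
absolute_mean = np.abs(err).mean()   # assumption: average of absolute errors
print(err, signed_mean, absolute_mean)
```

In this toy case the signed mean is zero even though two of the three hours are mispredicted, which is why an absolute-value average may be the more informative provincial summary.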
The first-order difference can be used to assess the variation in discrete time-series data. With the use of the first-order difference, we can obtain the increment in the original data, which can reflect gradient information. In this paper, prediction is conducted hour-by-hour, and the prediction accuracy is primarily determined by the hourly change in the generation data. Thus, in terms of wind energy, we use the first-order difference of hourly wind generation data to measure the hourly change, which can be calculated as Eq. (7):
$$F_{\mathrm{H},i,t}=\frac{W_{i,t+1,0}-W_{i,t,0}}{C_{\mathrm{W},i}},$$
(7)
where \(F_{\mathrm{H},i,t}\) is the hourly first-order difference in province i in time slot t and \(W_{i,t+1,0}\) and \(W_{i,t,0}\) are the real wind energy generation in time slots t+1 and t, respectively. When evaluating the hourly first-order difference in a province, we calculate the average value over 8760 h.
Regarding solar energy, power generation exhibits daily periodicity, so we use daily solar energy generation data to measure the fluctuation, which can be expressed as Eq. (8):
$$F_{\mathrm{Day},i,c}=\frac{S_{\mathrm{Day},i,c+1,0}-S_{\mathrm{Day},i,c,0}}{C_{\mathrm{S},i}},$$
(8)
where \(F_{\mathrm{Day},i,c}\) is the daily first-order difference in province i on day c. We also calculate the average value over 365 days to evaluate the solar energy fluctuations in a given province.
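Eqs. (7) and (8) share the same form, so a single helper covers both the hourly wind series and the daily solar series. The helper name and the sample values below are assumptions. The sketch also shows that the plain mean of signed differences telescopes to (last value minus first value) divided by the capacity and the number of steps, so averaging the absolute differences (an assumption, not stated in the text) may better reflect fluctuation.

```python
import numpy as np

def normalized_first_difference(output, capacity: float) -> np.ndarray:
    """Eqs. (7)/(8): change between successive slots as a fraction of capacity."""
    return np.diff(np.asarray(output, dtype=float)) / capacity

# Illustrative hourly wind output in MW (assumption), 1000 MW installed capacity.
wind = np.array([200.0, 260.0, 230.0, 230.0, 300.0])
f_h = normalized_first_difference(wind, capacity=1000.0)

signed_mean = f_h.mean()          # equals (300 - 200) / (1000 * 4)
absolute_mean = np.abs(f_h).mean()
print(f_h, signed_mean, absolute_mean)
```

Passing a daily solar series and the solar installed capacity to the same function gives \(F_{\mathrm{Day},i,c}\).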
In this paper, we use the peak ratio to evaluate the prediction error. It should be noted that all the prediction methods learn the variation tendency of a given data series to predict future data. The easier a tendency is to learn, the more accurate the prediction. Thus, we aim to obtain a feature that could indicate the change in tendency to better measure the prediction error. The peaks of series data indicate inflection points, with previous data exhibiting an upward tendency and subsequent data exhibiting a downward tendency, which is a key feature reflecting the tendency change.
In regard to wind energy, we use four consecutive time slots to determine hourly peaks and traverse the time series one slot at a time, i.e., \(t=t+1\), to find all peaks. The power generation in these four time slots should satisfy the following conditions to reach a peak: the output should not decrease over the first three hours, the increase over the first three hours should be at least 10% of the installed capacity, and the output should decrease in the fourth hour, which can be expressed as Eqs. (9)-(11):
$$P_{\mathrm{H},i,t}=1,\quad W_{i,t,0}-W_{i,t-1,0}<0,\ W_{i,t-1,0}-W_{i,t-2,0}\ge 0,\ W_{i,t-2,0}-W_{i,t-3,0}\ge 0,\ W_{i,t-1,0}-W_{i,t-3,0}\ge 0.1\cdot C_{\mathrm{W},i},$$
(9)
$$P_{\mathrm{N},\mathrm{H},i}=\sum_{t\in T}P_{\mathrm{H},i,t},$$
(10)
$$P_{\mathrm{R},\mathrm{H},i}=P_{\mathrm{N},\mathrm{H},i}/T,$$
(11)
where \(P_{\mathrm{H},i,t}\) denotes the hourly peaks in province i in time slot t, \(P_{\mathrm{N},\mathrm{H},i}\) is the number of hourly peaks in province i, and \(P_{\mathrm{R},\mathrm{H},i}\) is the ratio of hourly peaks in province i. We also calculate the average value over 8760 h to evaluate the wind energy fluctuations in each province.
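The peak conditions of Eq. (9) and the ratio of Eqs. (10)-(11) can be sketched with vectorized NumPy comparisons. The function name `peak_ratio`, the zero-based indexing, and the toy series are assumptions. Slots with fewer than three predecessors are left unflagged, which the text does not address explicitly.

```python
import numpy as np

def peak_ratio(output, capacity: float, min_rise: float = 0.1):
    """Eqs. (9)-(11): flag peaks over four-slot windows; return (flags, ratio)."""
    w = np.asarray(output, dtype=float)
    flags = np.zeros(w.size, dtype=int)
    d = np.diff(w)                        # d[k] = w[k+1] - w[k]
    t = np.arange(3, w.size)              # slots that have three predecessors
    is_peak = (
        (d[t - 1] < 0)                    # w[t]   - w[t-1] <  0
        & (d[t - 2] >= 0)                 # w[t-1] - w[t-2] >= 0
        & (d[t - 3] >= 0)                 # w[t-2] - w[t-3] >= 0
        & (w[t - 1] - w[t - 3] >= min_rise * capacity)
    )
    flags[t] = is_peak
    return flags, flags.sum() / w.size    # Eq. (10) summed, Eq. (11) divided by T

# Toy hourly output in MW (assumption), 1000 MW installed capacity.
wind = np.array([100.0, 150.0, 220.0, 180.0, 190.0, 195.0, 200.0, 150.0])
flags, ratio = peak_ratio(wind, capacity=1000.0)
print(flags, ratio)  # [0 0 0 1 0 0 0 0] 0.125
```

The drop at the last slot is not counted because the preceding rise (10 MW) is below 10% of capacity. The same function applied to a daily series with the corresponding installed capacity gives a daily peak ratio.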
Regarding solar energy, we use daily power generation data to obtain daily peaks. Similar to the hourly peak calculation, four consecutive days are chosen to determine peaks, and similar conditions should be satisfied, which can be expressed as Eqs. (12)-(14):
$$P_{\mathrm{Day},i,c}=1,\quad S_{\mathrm{Day},i,c,0}-S_{\mathrm{Day},i,c-1,0}<0,\ S_{\mathrm{Day},i,c-1,0}-S_{\mathrm{Day},i,c-2,0}\ge 0,\ S_{\mathrm{Day},i,c-2,0}-S_{\mathrm{Day},i,c-3,0}\ge 0,\ S_{\mathrm{Day},i,c-1,0}-S_{\mathrm{Day},i,c-3,0}\ge 0.1\cdot C_{\mathrm{S},i},$$
(12)
$$P_{\mathrm{N},\mathrm{Day},i}=\sum_{c\in C}P_{\mathrm{Day},i,c},$$
(13)
$$P_{\mathrm{R},\mathrm{Day},i}=P_{\mathrm{N},\mathrm{Day},i}/C,$$
(14)
where \(P_{\mathrm{Day},i,c}\) is the daily peak in province i on day c, \(P_{\mathrm{N},\mathrm{Day},i}\) is the number of daily peaks in province i, and \(P_{\mathrm{R},\mathrm{Day},i}\) is the ratio of daily peaks in province i. The average value over 365 days is also calculated to express the solar energy fluctuations in each province.
Inherent spatiotemporal uncertainty of renewable power in China - Nature.com