Abstract
In the field of renewable energy, the accurate forecasting of wind power generation is paramount for grid stability and efficient resource utilization. This paper presents a novel approach to enhance wind power forecasting (WPF) accuracy using an ensemble of deep learning (DL) models and a meta-heuristic framework. The proposed methodology encompasses a comprehensive pre-processing phase involving data cleaning and normalization through the Box-Cox transformation, as well as data imputation to handle missing values. Feature extraction leverages statistical attributes such as mean, median, standard deviation, mode, variance, moment, interquartile range, skewness, kurtosis, and autocorrelation. Principal Component Analysis (PCA) is applied to reduce dimensionality while preserving essential information. To further enhance model performance, a hybrid feature selection algorithm, named Hybrid Male Mayflies and Motherly Chicks (HMMMC), is employed. HMMMC synergizes the chicken swarm optimization algorithm (CSO) and the mayfly optimization algorithm (MA) to identify the most relevant features for WPF. The heart of our approach lies in the Hybrid Sequential Forest (HybSeqFor), which combines Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, and Random Forest (RF). This ensemble effectively captures both spatial and temporal dependencies in wind power data, yielding superior forecasting accuracy. The proposed model is implemented using MATLAB.
Introduction
WPF plays a pivotal role in the modern energy landscape, especially as the world moves towards a more sustainable and renewable future. The harnessing of wind energy has seen remarkable growth in recent years, making it a leading source of clean, renewable power [1]. However, the inherent variability and intermittency of wind pose significant challenges to the effective integration of wind power into the electricity grid [2]. To address these challenges, accurate WPF has become an essential tool for ensuring the reliability and stability of power systems [3,4]. One of the fundamental characteristics of wind power generation is its dependency on meteorological conditions, primarily wind speed and direction. Unlike traditional power generation from fossil fuels or nuclear sources, wind power production is subject to the whims of nature. Wind turbines, which convert the kinetic energy of the wind into electricity, rely on a consistent and adequate wind resource to generate power efficiently [5,6]. As the wind varies in strength and direction over time, accurately forecasting the wind’s behaviour becomes crucial for efficient power generation and grid management.
WPF addresses the critical issue of predicting how much power will be generated by wind turbines at a specific point in the future [7]. This prediction is essential for various aspects of power system operation, including grid planning, scheduling, and real-time balancing [8,9]. The ability to anticipate wind power generation helps grid operators make informed decisions about how to meet electricity demand, allocate resources, and maintain the stability of the grid. In today’s power systems, three main forecasting tasks exist: load forecasting, price forecasting, and renewable energy source forecasting [10]. Among these, WPF stands out due to the unique characteristics of wind as a renewable energy source. Unlike solar power, which exhibits more predictable daily patterns, wind power is highly variable, intermittent, and stochastic in nature.
WPF is performed over different time horizons, depending on the specific application [11]. Short-term WPF is crucial for grid operators to make real-time decisions about power generation and load balancing. The medium-term horizon extends from a few hours to several days ahead. Long-term forecasts extend beyond several days, often covering weeks, months, or even years. Accurate short-term and medium-term WPFs are critical for maintaining the stability and reliability of the electricity grid. They enable grid operators to balance supply and demand, mitigate the impact of wind variability, and optimize the use of available resources [12,13]. Additionally, these forecasts support the efficient operation of energy markets, where electricity is bought and sold in advance. WPF leverages deep learning techniques to enhance accuracy and reliability. Deep learning models, such as recurrent neural networks (RNN) and convolutional neural networks (CNN), are employed to analyze vast datasets of historical weather patterns and wind turbine performance. These models capture complex temporal and spatial dependencies in wind behaviour, allowing for more precise short-term and long-term wind power predictions [14,15]. By harnessing the power of deep learning, WPF aims to optimize grid management, reduce energy production costs, and facilitate the seamless integration of wind energy into the global electricity grid, contributing to a greener and more sustainable future.
The major contributions of this research work are:
In the domain of data preprocessing, a notable aspect involves applying the Box-Cox transformation for data normalization and ensuring robustness in handling missing values through data imputation techniques.
To enhance model performance, a novel hybrid feature selection algorithm, HMMMC, is presented, which synergizes CSO and MA to identify the most pertinent features for WPF.
At the core of our approach, HybSeqFor combines CNN, LSTM, and RF to effectively capture both spatial and temporal dependencies in wind power data, thereby significantly improving forecasting accuracy.
The rest of this paper is arranged as follows: Section 2 reviews the literature on WPF. Section 3 describes the architecture of the proposed ensemble-classifier WPF framework. Section 4 presents the recorded results, and Section 5 concludes the paper.
Literature Review
In 2020, Ahmadi et al. [16] developed three six-month-ahead WPF models using tree-based learning algorithms. Model 1 used average and standard deviation of 10-minute wind speed data at 40m height. Model 2 examined the impact of sampling time (1-h, 12-h, and 24-h) at the same height. Model 3 assessed height extrapolation using data from 30m and 10m heights to predict 40m wind speeds. Longer time intervals and height extrapolation reduced accuracy.
In 2021, Yang et al. [17] introduced an enhanced Fuzzy C-means Clustering Algorithm for day-ahead wind power prediction. By improving initial cluster centre selection based on minimum distance principles, more accurate clustering outcomes are achieved. Turbines with similar power output patterns are grouped together, allowing for the selection of a representative power curve. This equivalent curve is then used to assess wind turbine performance, contributing to the development of a day-ahead wind power prediction model leveraging numerical weather predictions (NWPs) as inputs.
In 2021, Dong et al. [18] used a novel hybrid forecasting model. It begins by decomposing the wind power series into intrinsic mode functions using complete ensemble empirical mode decomposition. Subsequently, a Bernstein polynomial forecasting model with a mixture of Gaussians is constructed. The model’s parameters are optimized using a population-based multi-objective state transition algorithm with a parallel search mechanism.
In 2022, Rayi et al. [19] developed an innovative hybrid time series forecasting model for accurate wind power prediction, combining Variational Mode Decomposition (VMD) with Deep Learning Mixed Kernel Extreme Learning Machine Autoencoder (MKELM-AE). Unlike other deep learning neural network models with nonconvex optimization issues and time-consuming training, MKELM-AE offers advantages such as avoiding manual weight tuning, excellent model generalization, reduced execution time, and precise output weight solutions through generalized least squares, resulting in compact storage.
In 2020, Ko et al. [20] introduced a novel deep residual network designed to enhance time-series forecasting models, vital for the dependable and cost-effective operation of power grids, particularly in the presence of high renewable energy contributions. Addressing potential performance issues related to overfitting in existing stacked bidirectional long short-term memory (Bi-LSTM) layers, the proposed approach combines multi-level residual networks (MRN) and DenseNet. It integrates long and short Bi-LSTM networks, ReLU, and SeLU activation functions, resulting in superior prediction accuracy and parameter efficiency.
In 2022, Ahmad et al. [21] developed an extended deep sequence-to-sequence long short-term memory regression (STSR-LSTM) for time-series WPF. This statistical learning technique enhances feature reliability and overall performance. Experiments across various locations, forecasting classes, and seasons demonstrate the model’s ability to achieve higher accuracy, even with input wind power load curve variations.
In 2020, Sun et al. [22] used an advanced multi-distribution ensemble (MDE) probabilistic WPF framework that leverages various predictive distributions. Employing competitive and cooperative strategies, the framework generates probabilistic WPF for different time horizons. It integrates three probabilistic forecasting models based on Gaussian, gamma, and Laplace distributions and optimizes ensemble model parameters during training. Surrogate models are used to establish relationships between optimal parameters and deterministic forecasts for online forecasting.
In 2019, Du et al. [23] presented a hybrid model that involves three stages: decomposition of wind energy series, an improved wavelet neural network, and rigorous testing. Experimental results demonstrate significantly lower mean absolute per cent errors for the proposed hybrid model compared to other models, showcasing its effectiveness in wind energy prediction.
In 2020, Wang et al. [24] introduced a hybrid WPF approach, BMA-EL, which combines Bayesian model averaging and Ensemble learning. It starts with SOM clustering and K-fold cross-validation to create diverse training subsets from meteorological data. These subsets are used to train three base learners: BPNN, RBFNN, and SVM. The BMA strategy is then applied to combine the outputs of these base learners on a validation set.
In 2020, Liu et al. [25] proposed a hybrid model for Chinese wind power generation combining Wavelet Decomposition (WD) and a Long Short-Term Memory neural network (LSTM). WD stabilizes the data, and LSTM predicts national wind power output. Testing shows that WD-LSTM outperforms other models with a MAPE of 5.831, making it a valuable tool for forecasting wind power generation in China under different development scenarios for the next two years.
2.1. Problem Statement
The problems in WPF include the highly variable and intermittent nature of wind, data complexity with multiple meteorological variables, the need for both short-term and long-term forecasting, uncertainty in model predictions, and spatial variability across geographical regions [1]. These challenges can lead to inaccurate and unreliable WPF, hindering the efficient integration of wind energy into the power grid [16]. By combining Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Random Forests, an Ensemble Deep Learning approach offers a comprehensive solution to these multifaceted challenges, leading to improved accuracy, reliability, and adaptability in forecasting wind power generation for grid operations and renewable energy integration.
Proposed Methodology
WPF is a crucial process for predicting the amount of electricity that can be generated from wind energy sources in the future. It plays a vital role in the efficient operation of power grids, as wind energy generation is highly variable and dependent on weather conditions. Forecasting methods, including DL and statistical models, are employed to provide accurate predictions of wind power output, aiding in grid management, energy trading, and ensuring a stable supply of electricity from renewable sources. This work developed an ensembled DL for WPF that employs models to enhance wind power predictions. This approach leverages the strengths of each model and uses a meta-heuristic optimization method to improve forecast accuracy.
3.1. Dataset Description
This project aims to forecast wind turbine power generation using key variables such as wind speed, wind direction, month, and hour. With the Wind Turbine Power Prediction-GBTRegressor PySpark dataset [26] comprising 50,530 observations, this project not only addresses power prediction but also demonstrates the handling of big data. To effectively manage and analyze this large-scale dataset, the PySpark library is employed. The dataset includes critical labels such as Date/Time for temporal reference, LV ActivePower (kW) indicating actual power output, Wind Speed (m/s), Wind Direction (°), and Theoretical_Power_Curve (KWh) serving as a reference for optimal power production.
3.2. Pre-processing
In this study, data pre-processing techniques, namely data cleaning and normalization, have been employed as essential steps to enhance the quality and suitability of the dataset for subsequent analysis and modelling.
3.2.1. Data Cleaning
In WPF, data cleansing is a crucial step that improves the precision and dependability of predictions. To estimate future energy output, WPF relies heavily on historical data, particularly wind speed, direction, and other meteorological factors, but this data often arrives with numerous problems. First and foremost, data validation is crucial: it entails locating and fixing data points that fall outside expected ranges or take implausible values. The next critical step is dealing with missing data; gaps can be filled using methods such as interpolation or data from adjacent sources. Outlier detection is essential for locating and managing erroneous data items that can skew forecasts. Under the principle of temporal consistency, data from several sources are synchronised and aligned with one another. Feature engineering adds new variables that capture relevant information. Finally, continual quality-control processes are required to track data quality over time and adapt to changing circumstances. In sum, data cleaning in WPF is a thorough process encompassing data validation, missing-value filling, outlier identification, temporal consistency, feature engineering, and continual quality control; it ultimately yields more precise and trustworthy wind power estimates, facilitating effective energy grid integration.
3.2.2. Normalization – Box-Cox Transformation
The process of scaling data to a common range or distribution is called normalisation. It is especially helpful when working with data attributes that have different scales or units. The objective is to scale all variables to a common range, normally between 0 and 1, although other scales may also be employed. Statistical preprocessing techniques like the Box-Cox transformation are utilised in many disciplines, including statistics, machine learning, and data analysis. These techniques aim to improve the distribution of the data, making it better suited for modelling and analysis. The Box-Cox transformation is a specific kind of power transformation used to reduce variance and improve the normality of a dataset. It is appropriate for heteroscedastic data, where the spread of data points differs across levels of the independent variable.
The transformation is governed by a parameter λ, which is tuned to determine the best-fit transformation of the original data x: y(λ) = (x^λ − 1)/λ for λ ≠ 0, and y(λ) = ln(x) for λ = 0. The Box-Cox transformation is useful when working with data that departs from the assumptions of many statistical models, such as linear regression. In a variety of domains, both normalisation and the Box-Cox transformation are efficient techniques for raising the quality of data and preparing it for analysis and modelling.
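As a hedged illustration of this preprocessing step, the sketch below applies SciPy's Box-Cox transform followed by min-max scaling; the gamma-distributed sample is a hypothetical stand-in for a right-skewed wind-speed column, not the actual dataset.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed, strictly positive sample standing in for a
# wind-speed column (Box-Cox requires strictly positive data).
rng = np.random.default_rng(42)
wind_speed = rng.gamma(shape=2.0, scale=3.0, size=1000) + 0.1

# Box-Cox estimates the power parameter lambda by maximum likelihood.
transformed, lam = stats.boxcox(wind_speed)

# Min-max normalization to the [0, 1] range described above.
normalized = (transformed - transformed.min()) / (transformed.max() - transformed.min())
```

After the transform, the sample's skewness is pulled toward zero, which is exactly the "improved normality" property the section describes.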
3.2.3. Data Imputation
A crucial statistical technique used to deal with missing or insufficient data points in a dataset is data imputation. Due to different variables, including measurement errors or sensor problems, data can be rife with gaps. The general integrity of the dataset is maintained by using imputation techniques to estimate these missing values based on the information that is currently available. There are several widely used imputation techniques, including regression, mean, and median imputation. Mean imputation involves substituting missing values for a variable with the mean of the observed data. Regression imputation predicts missing values by building a regression model based on other variables in the dataset, while median imputation uses the median.
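The three imputation strategies above can be sketched in a few lines; the toy DataFrame and column names below are illustrative, not the paper's dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps standing in for missing sensor readings.
df = pd.DataFrame({
    "wind_speed": [5.2, np.nan, 6.1, 5.8, np.nan, 7.0],
    "power_kw":   [310.0, 295.0, np.nan, 350.0, 330.0, 340.0],
})

# Mean imputation: replace each missing value with its column mean.
mean_imputed = df.fillna(df.mean())

# Median imputation: replace each missing value with its column median.
median_imputed = df.fillna(df.median())

# Regression imputation (sketch): predict wind_speed from power_kw using a
# least-squares line fitted on the fully observed rows.
obs = df.dropna()
slope, intercept = np.polyfit(obs["power_kw"], obs["wind_speed"], 1)
reg_imputed = df.copy()
mask = reg_imputed["wind_speed"].isna()
reg_imputed.loc[mask, "wind_speed"] = intercept + slope * reg_imputed.loc[mask, "power_kw"]
```

Mean imputation is the simplest but shrinks the variance of the column; regression imputation preserves relationships between variables at the cost of fitting an auxiliary model.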
3.3. Feature Extraction
After the data preprocessing phase, the next step in the analysis involves feature extraction. Various statistical and higher-order statistical features are computed from the data to capture important characteristics and features. Here are some of the key features extracted:
3.3.1. Mean
The mean is the average of the numbers provided, computed by dividing the sum of the values by the total number of values.
3.3.2. Median
The median of a dataset is its middle value. If a set has an odd number of values, the middle value is the median; if the set contains an even number of values, the median is the average of the two middle values.
Thus, a set of data can be divided into two sections using the median. It is necessary to arrange the set’s components in ascending order in order to determine the set’s median. Then, locate the midpoint.
3.3.3. Standard Deviation
A measurement of how much the data deviates from the mean is referred to as the standard deviation (SD). Whenever the standard deviation is low, all of the values are close to the mean; when it is high, they are widely spread.
3.3.4. Mode
The most frequent value within a dataset is identified by the mode, a statistical measure of central tendency. Unlike the mean and median, which emphasise the average and middle values respectively, the mode captures the most common value. Whether the dataset contains numerical values or categorical data, it is arranged in ascending or descending order, and the value that appears most frequently is taken as the mode. The mode offers important insights into the features of the dataset and is especially helpful for identifying the most common value or category.
3.3.5. Variance
The variance of a data set is a measure of numerical variation. In particular, variance determines how far each value in the set lies from the mean and, consequently, from the other values in the set.
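The basic descriptive statistics of Sections 3.3.1–3.3.5 can be computed directly; the sample values below are illustrative only.

```python
import numpy as np

data = np.array([3.1, 4.2, 4.2, 5.0, 6.3, 7.8, 4.2, 5.5])

mean = data.mean()              # sum of the values divided by their count
median = np.median(data)        # middle value of the sorted data
std = data.std(ddof=0)          # spread of the data around the mean
variance = data.var(ddof=0)     # average squared deviation from the mean

# Mode: the most frequent value in the sample.
values, counts = np.unique(data, return_counts=True)
mode = values[np.argmax(counts)]
```

Note that variance is exactly the square of the (population) standard deviation, which ties Sections 3.3.3 and 3.3.5 together.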
3.3.6. Moment
A statistical moment characterises the shape of a distribution: the k-th central moment is the average of the k-th power of deviations from the mean. The second central moment is the variance, while the third and fourth central moments underlie skewness and kurtosis. Eq. (6) – Eq. (8) can be used to calculate the central moments employed in this study.
3.3.7. Interquartile Range (IQR)
IQR is a statistical measure used to understand the spread or dispersion of data in a dataset. It is calculated as the difference between the third quartile (Q3) and the first quartile (Q1). Quartiles divide a dataset into four equal parts, with Q1 representing the 25th percentile and Q3 the 75th percentile. The IQR captures the middle 50% of the data, making it robust against outliers. It is often used in box-and-whisker plots to visualize the variability and distribution of data, helping analysts identify trends and anomalies.
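The IQR and the outlier fences commonly drawn on box-and-whisker plots can be sketched as follows (the data is illustrative, and the third and first quartiles are the conventional Q3 and Q1):

```python
import numpy as np

data = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19])

q1, q3 = np.percentile(data, [25, 75])  # 25th and 75th percentiles
iqr = q3 - q1                           # spread of the middle 50% of the data

# Fences commonly used in box-and-whisker plots to flag outliers:
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
```

Because the fences depend only on Q1 and Q3, a single extreme value cannot move them much, which is the robustness property mentioned above.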
3.3.8. Kurtosis
A statistical measure known as kurtosis describes the shape of a distribution of values. It measures the peakedness or flatness of the data relative to a normal distribution: zero (excess) kurtosis corresponds to a normal distribution, positive kurtosis denotes a more pronounced peak with heavier tails, and negative kurtosis denotes a flatter distribution. In financial and economic analysis, kurtosis is frequently used to describe the distribution of returns from investments or financial assets, where high kurtosis may suggest greater tail risk, i.e., the risk of extreme events such as large losses.
3.3.9. Skewness
A statistical measure known as skewness identifies an asymmetrical distribution of values. It gauges how far the numbers deviate from the mean by tilting them to one side or the other. The values in a normal distribution are symmetrical around the mean because the skewness is zero. Having a positive skewness implies that the values are moved to the right or that the right side of the distribution has a long tail, whereas having a negative skewness suggests that the values are shifted to the left or that the left side of the distribution has a long tail.
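Both shape measures are available in SciPy; the sketch below contrasts a symmetric sample with a right-skewed one (samples are synthetic, for illustration only).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_sample = rng.normal(size=100_000)
right_skewed = rng.exponential(size=100_000)

# Under Fisher's definition, a normal sample has skewness ~0 and excess
# kurtosis ~0; an exponential sample is right-skewed (skewness ~2) and
# leptokurtic (positive excess kurtosis, heavier right tail).
sk_norm = stats.skew(normal_sample)
ku_norm = stats.kurtosis(normal_sample)   # Fisher (excess) kurtosis
sk_exp = stats.skew(right_skewed)
ku_exp = stats.kurtosis(right_skewed)
```

`scipy.stats.kurtosis` reports excess kurtosis by default, matching the convention in Section 3.3.8 where zero indicates normality.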
3.3.10. Autocorrelation
Autocorrelation is a widely used mathematical operation in the time domain that quantifies the similarity of a given time signal to itself over a time scale.
As per Eq. (12), the subscript indexes the correlation sequence, and the lag factor m of the autocorrelation process signifies the time-shift parameter. For a time series with N data points, the resulting autocorrelation sequence comprises (2N − 1) data points. In this study, autocorrelation is utilized to measure the self-similarity of the signals in the time domain.
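A minimal sketch of the full autocorrelation sequence, assuming the standard definition (the paper's Eq. (12) is not reproduced here): for a length-N signal the two-sided sequence has 2N − 1 points, with the zero-lag value at index N − 1.

```python
import numpy as np

def autocorrelation(x):
    """Full (two-sided) autocorrelation sequence of a length-N signal.

    The result has 2N - 1 points; after normalization, the lag-0 value
    (index N - 1) equals 1, and values near 1 at other lags indicate the
    signal resembles a time-shifted copy of itself.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    r = np.correlate(x, x, mode="full")
    return r / r[len(x) - 1]  # normalize by the lag-0 (zero-shift) value

signal = np.sin(np.linspace(0, 8 * np.pi, 200))
r = autocorrelation(signal)
```

For a periodic signal like the sine above, secondary peaks in `r` appear at lags equal to the signal's period, which is how autocorrelation exposes temporal structure.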
3.3.11. PCA
PCA can be used as a feature extraction technique in WPF. By applying PCA to the dataset containing various correlated variables related to WPF, the dimensionality of the data is reduced while retaining the most important information.
Eigen Decomposition
To determine the principal components, the following steps are applied:
Compute the eigenvalues and eigenvectors of the covariance matrix.
Sort the eigenvalues in descending order.
Determine the optimal number of principal components to retain.
Select the top eigenvalues and their corresponding eigenvectors.
Eigenvalue Thresholding
A threshold value τ is defined to separate significant eigenvalues from noise: eigenvalues smaller than τ, along with their corresponding eigenvectors, are discarded, while the significant eigenvalues and eigenvectors are retained. This process effectively separates signal from noise, improving the representation of the underlying data structure.
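The eigen decomposition and thresholding steps can be sketched as below; the threshold `tau` is a hypothetical stand-in for the paper's threshold value, and the synthetic data is constructed so that one near-constant noise column is discarded.

```python
import numpy as np

def pca_eigen(X, tau=1e-2):
    """PCA via eigendecomposition of the covariance matrix (sketch).

    Eigenvalues below the noise threshold `tau` are discarded together
    with their eigenvectors; the retained eigenvectors project the data
    onto the principal components.
    """
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # returned in ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > tau                     # eigenvalue thresholding
    return Xc @ eigvecs[:, keep], eigvals[keep]

# Synthetic data: two strongly correlated informative columns, one
# independent column, and one near-constant noise column.
rng = np.random.default_rng(1)
base = rng.normal(size=(500, 1))
X = np.hstack([base,
               2 * base + 0.1 * rng.normal(size=(500, 1)),
               rng.normal(size=(500, 1)),
               1e-4 * rng.normal(size=(500, 1))])
scores, kept_eigvals = pca_eigen(X)
```

Here thresholding keeps two components: one for the correlated pair and one for the independent column, while the residual of the correlated pair and the noise column fall below `tau`.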
3.4. Feature Selection
Following feature extraction, feature selection becomes important for refining the dataset. This study employs the hybrid HMMMC algorithm, which combines CSO and MA.
3.4.1. Hybrid Male Mayflies and Motherly Chicks (HMMMC)
MA is inspired by the unique life cycle of mayflies. Mayflies hatch as mature adults, and their fitness depends on initial traits, not lifespan. CSO mimics chicken swarms’ hierarchical structure and food-searching behaviour. Chickens are categorized as roosters, hens, or chicks based on fitness. Hierarchies and mother-child relationships are updated iteratively. Chickens cooperate and compete for food. CSO has an initialization step and assumptions about hen numbers, mother hen selection, and chick numbers.
3.4.1.1. Male Mayflies Movement
Males tend to congregate in swarms, which implies that each male mayfly adjusts its position according to both its own experience and that of its neighbours. In the proposed hybrid, this male movement is updated using the CSO-inspired behaviour of chicks: the natural tendency of chicks to follow their mother is mathematically formulated and incorporated into the male position update.
Here, the next position of a male mayfly is obtained by adding its updated velocity to its current position. Male mayflies perform their movements a few metres above the water level. On its own, the algorithm may yield only a locally optimal solution and cannot guarantee that the solution found is the best possible for the entire problem.
Here, v_ij^t denotes the velocity of mayfly i in dimension j at time t, and x_ij^t the same mayfly's position; a1 and a2 are positive attraction constants quantifying the contribution of the cognitive and social components, and β is a fixed visibility coefficient that limits a mayfly's visibility to others. pbest_i is the best position mayfly i has inspected so far, and gbest is the position of the best male mayfly.
Here, f(·) gives the fitness value of a position, i.e., the quality of the solution. The Manhattan distance between x_i and pbest_i is denoted r_p, and the Manhattan distance between x_i and gbest is denoted r_g. This Manhattan-based distance, combined with CSO-based hybridization, is adopted in the algorithm; compared with conventional MA and CSO optimization, the hybrid metaheuristic approach improves optimization performance. One key advantage of the Manhattan distance is its simplicity: it requires only simple arithmetic, making it easy to implement and computationally fast, and a single extreme value in the data does not greatly affect the calculated distance. Its use also provides a way to balance the trade-off between search efficiency and reliability during optimization.
Here, v_ij^t represents the velocity of element j of the best mayfly i at time t. The nuptial dance performed by the best mayfly is important because it introduces a stochastic element into the algorithm.
Here, d denotes the nuptial-dance coefficient and r ∈ [−1, 1] a random value. The nuptial-dance coefficient drops progressively over iterations: with initial value d_0 and current iteration count t, it is reduced by a random factor δ ∈ [0, 1]. In the male mayfly update stage, a gravitational coefficient g is also considered to enhance the convergence of the solution.
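A hedged sketch of one male-mayfly update, assuming the standard mayfly-algorithm notation (attraction constants a1 and a2, visibility coefficient beta, gravitational coefficient g, nuptial-dance coefficient d0 with decay delta), since the paper's own equation numbers are not reproduced here; the distances use the Manhattan (L1) norm as in HMMMC.

```python
import numpy as np

rng = np.random.default_rng(7)

def male_update(x, v, pbest, gbest, t, a1=1.0, a2=1.5, beta=2.0,
                g=0.8, d0=5.0, delta=0.9):
    """One male-mayfly velocity/position update (illustrative sketch).

    r_p and r_g are Manhattan (L1) distances, as used in the HMMMC hybrid.
    The best male performs a nuptial dance whose coefficient d0 * delta**t
    decays over iterations; other males are attracted toward pbest/gbest.
    """
    if np.array_equal(x, gbest):                  # best male: nuptial dance
        v_new = g * v + d0 * (delta ** t) * rng.uniform(-1, 1, size=x.shape)
    else:
        r_p = np.abs(x - pbest).sum()             # Manhattan distance to pbest
        r_g = np.abs(x - gbest).sum()             # Manhattan distance to gbest
        v_new = (g * v
                 + a1 * np.exp(-beta * r_p ** 2) * (pbest - x)
                 + a2 * np.exp(-beta * r_g ** 2) * (gbest - x))
    return x + v_new, v_new

x = np.array([0.5, -0.2])
x_new, v_new = male_update(x, np.zeros(2),
                           pbest=np.array([0.4, 0.0]),
                           gbest=np.array([0.0, 0.0]), t=0)
```

The exponential attraction terms shrink as a male drifts far from its guides, while the gravitational coefficient g damps the previous velocity to aid convergence.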
3.4.1.2. Movement of Female Mayflies
Female mayflies do not congregate in swarms. Instead, they fly toward males for the purpose of mating. The algorithm runs the risk of getting stuck in a sub-optimal solution and may not produce the most optimal solution possible.
As per Eq. (24), y_ij^t represents the position of female mayfly i in dimension j at time t, and v_ij^t the corresponding velocity component; x_ij^t is the position component of the male mayfly; a2 and β are pre-defined constants (the attraction and visibility coefficients), and r_mf is the Manhattan distance between the male and the female mayfly. fl is a random-walk coefficient applied when a female does not find a male attractive and flies randomly, and r represents a random value in the range [−1, 1].
3.4.1.3. Crossover
The crossover procedure starts by selecting a male mayfly and then a female mayfly. Selections are made according to fitness value, with the best male paired with the best female.
Here, male denotes the male parent mayfly, female denotes the female parent mayfly, and L is a random value between 0 and 1 that weights the two parents. Initial velocities for the first generation of offspring are set to 0.
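The crossover step can be sketched as a blend of the two parents, assuming the standard mayfly-algorithm form offspring1 = L·male + (1 − L)·female and offspring2 = L·female + (1 − L)·male (the paper's own crossover equations are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(3)

def crossover(male, female):
    """Mayfly crossover (sketch): blend parents with a random L in (0, 1).

    offspring1 = L * male + (1 - L) * female
    offspring2 = L * female + (1 - L) * male
    Offspring velocities are initialized to zero, as stated above.
    """
    L = rng.uniform(0.0, 1.0)
    off1 = L * male + (1 - L) * female
    off2 = L * female + (1 - L) * male
    return (off1, np.zeros_like(male)), (off2, np.zeros_like(female))

(child1, v1), (child2, v2) = crossover(np.array([1.0, 2.0]),
                                       np.array([3.0, 4.0]))
```

A useful property of this blend is that the two offspring always sum to the sum of their parents, so the population's centroid is preserved by crossover.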
3.5. Wind-Power Forecasting via Hybrid Sequential Forest (HybSeqFor)
After selecting the optimal features, WPF employs a hybrid approach that leverages advanced learning techniques. Fig. 2 illustrates HybSeqFor, a model for wind power forecasting. It combines CNN for spatial analysis, LSTM for temporal dependencies, and RF for ensemble predictions, enhancing accuracy. By combining the strengths of these models, the forecasting system enhances the accuracy and reliability of wind power predictions.
3.5.1. CNN
Because it can automatically extract spatial information from data, CNN, an Artificial Neural Network (ANN) based on deep learning theory, has found extensive use in the field of forecasting. The convolutional layer, the activation function, the pooling layer, and the fully connected layer are the four main layers that make up a CNN.
3.5.1.1. Convolutional layer
The convolution operation in the convolutional layer, which is used to extract features and learn the mapping between the input and output layers, replaces the matrix multiplication operation in CNN. Sharing parameters during the convolution operation allows the network to learn just one set of parameters, drastically reducing the number of parameters and greatly enhancing computational efficiency.
3.5.1.2. Activation Function
To avoid vanishing gradients and speed up training, CNNs typically use the Rectified Linear Unit (ReLU) activation function, defined in Eq. (28) as f(x) = max(0, x).
3.5.1.3. Pooling Layer
The network’s computational complexity can be reduced by the pooling layer, which also concentrates the data into feature maps. Max pooling is a common pooling layer.
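The activation and pooling operations of Sections 3.5.1.2–3.5.1.3 can be sketched with plain NumPy; the 1-D feature map below is illustrative, not an actual CNN output.

```python
import numpy as np

def relu(x):
    """ReLU activation: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def max_pool_1d(x, size=2):
    """Non-overlapping 1-D max pooling: keep the strongest activation per window."""
    n = len(x) // size * size
    return x[:n].reshape(-1, size).max(axis=1)

feature_map = np.array([-1.0, 2.0, 0.5, -0.3, 4.0, 1.0])
activated = relu(feature_map)        # [0.0, 2.0, 0.5, 0.0, 4.0, 1.0]
pooled = max_pool_1d(activated)      # [2.0, 0.5, 4.0]
```

Pooling halves the length of the feature map while retaining the strongest activation in each window, which is the computational-complexity reduction described above.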
3.5.1.4. Fully Connected Layer
Fully connected layers, also known as dense layers, are a form of neural network layer where every neuron in the layer is connected to every neuron in the layer below and above it. Each link between neurons has a learnable weight that is modified during training to improve the performance of the network. Fully connected layers are used to find non-linear patterns and correlations in the input data. These layers can record intricate feature interactions.
3.5.2. LSTM
A form of RNN architecture called LSTM was created to address the vanishing and exploding gradient problem, a typical issue in conventional RNNs. LSTMs are excellent for processing time-series data because they can identify long-term dependencies in sequential data. An LSTM cell's memory is stored and transformed from input to output in the cell state. An LSTM cell is made up of the input gate, update gate, forget gate, and output gate. The forget gate discards irrelevant information from the cell state, the input gate chooses which information is absorbed into the cell, the update gate refreshes the cell state, and the output gate produces the new short-term memory (hidden state). As the LSTM absorbs the long-term memory, short-term memory, and input sequence at one time step and generates new long-term memory, new short-term memory, and a new output at the next, these four essential components function and interact in a coordinated way.
3.5.3. RF
RF is an ensemble learning method that combines multiple classification decision trees; a random forest is a collection of randomized decision trees. The algorithm works as follows: (i) re-sample the initial training data numerous times; (ii) choose a random subset of features for each re-sample; (iii) estimate a decision tree for each re-sample using its random feature subset; and (iv) aggregate the set of estimated trees into a single predictor. The fundamental idea of RFs is therefore to build numerous decision trees on random subsets. For classification, RF exploits the variance among individual trees: each tree casts a vote for a class, and the final class is chosen according to the results of the vote. The RF is more precise and dependable than the individual classifiers due to a number of advantages: it can effectively manage large databases; handle thousands of input variables without deleting any of them; generate an internal, unbiased estimate of the generalization error; estimate the significance of each variable for classification; and compute proximities between pairs of cases for locating outliers. These ensembles also perform consistently and accurately on challenging datasets with little need for fine-tuning, even in the presence of noisy variables. In this study, each classification tree is grown on a sample drawn from the training data, and each node of the tree is constructed using a random subset of factors. Based on the gain factor, the optimal split is performed to maximise purity in the resulting groups, and splitting continues until each branch terminates in a leaf node. This procedure is repeated until the desired number of trees has been built.
During this stage, around a third of the instances are left out of each bootstrap sample as an out-of-bag sample, which can be used to gauge how accurately the tree classifies unseen data. The RF also uses the Gini index as an attribute selection metric to determine how impure an attribute's relationship to the classes is.
The Gini index gives the probability that a randomly selected element would be incorrectly labelled if it were labelled according to the distribution of classes in the set. When building each decision tree, the quality of candidate splits is assessed using the Gini index as a criterion: at each node of a decision tree within the random forest, different potential splits are evaluated, and the split that minimises the Gini index is chosen as the optimal one. By using the Gini index as the splitting criterion, the technique seeks to divide each node's data points into homogeneous groups, improving the random forest's overall predictive accuracy.
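A minimal sketch of the Gini impurity and the weighted split criterion described above (illustrative only; the label arrays are hypothetical):

```python
import numpy as np

def gini(labels):
    """Gini impurity: probability that a randomly chosen element is
    mislabelled when labelled according to the class distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gini(left, right):
    """Size-weighted Gini impurity of a candidate split; the split
    that minimises this value is chosen at each node."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

pure = gini([1, 1, 1, 1])             # → 0.0 (perfectly pure node)
mixed = gini([0, 0, 1, 1])            # → 0.5 (maximally impure, 2 classes)
good = split_gini([0, 0, 0], [1, 1])  # perfect split → 0.0
```

A pure node scores 0, a 50/50 two-class node scores 0.5, and a split that separates the classes completely scores 0, which is why minimising this quantity drives each node towards homogeneous groups.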
4. Results and Discussion
4.1. Experimental Setup
The proposed model is implemented in MATLAB and compared with existing models, namely K-nearest neighbour (KNN), artificial neural network (ANN), Random Forest (RF), long short-term memory (LSTM), and convolutional neural network (CNN). Performance metrics such as Normalized Mean Squared Error (NMSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Correlation, and R-square are used to evaluate the proposed model.
4.2. Performance Metrics
NMSE – It assesses the goodness of fit of a model by comparing the squared differences between actual and predicted values with the squared differences between actual values and their mean. It provides a normalized measure of the mean squared error, where a lower NMSE indicates a better model fit.
MAE – It calculates the average absolute differences between actual and predicted values. It provides a straightforward measure of prediction accuracy, with lower MAE values indicating better model performance.
RMSE – It is similar to MAE but gives more weight to larger errors. It calculates the square root of the average of the squared differences between actual and predicted values, providing a measure of prediction accuracy with the same unit as the target variable.
Correlation – It quantifies the linear relationship between two variables. It ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no linear relationship. It measures how closely actual values align with predicted values.
R-square – It represents the proportion of the variance in the dependent variable explained by the model. It ranges from 0 to 1, with higher values indicating a better fit. It provides insight into how well the model captures the variability in the data.
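The five metrics above can be computed as in the following sketch. NMSE definitions vary in the literature; this sketch assumes the variant described here, which normalises the mean squared error by the variance of the actual values. The toy series is hypothetical, not the paper's data.

```python
import numpy as np

def metrics(y_true, y_pred):
    """Compute MAE, RMSE, NMSE, Correlation, and R-square."""
    err = y_true - y_pred
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # NMSE: mean squared error normalised by the variance of the actuals
    nmse = np.mean(err ** 2) / (ss_tot / len(y_true))
    corr = np.corrcoef(y_true, y_pred)[0, 1]
    r2 = 1.0 - np.sum(err ** 2) / ss_tot
    return {'MAE': mae, 'RMSE': rmse, 'NMSE': nmse,
            'Correlation': corr, 'R-square': r2}

# toy check with near-perfect predictions (hypothetical values)
y_true = np.array([100.0, 120.0, 150.0, 130.0, 110.0])
y_pred = y_true + np.array([1.0, -2.0, 0.5, -1.0, 1.5])
m = metrics(y_true, y_pred)
```

With near-perfect predictions like these, NMSE approaches 0 while Correlation and R-square approach 1, mirroring the pattern reported in Table 1.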
4.3. Overall performance of the proposed model
In Table 1, the methods are evaluated on the performance metrics described above, which assess the accuracy and quality of the predictions made by each method. The proposed HybSeqFor (HSF) method demonstrates outstanding performance with a remarkably low MAE of 13.122 and RMSE of 15.217, indicating superior accuracy in predicting the target variable compared to the other methods. Furthermore, the NMSE value of 0.0001 suggests minimal prediction errors. The high correlation value of 0.99993 and R-square value of 0.9998 emphasize the strong linear relationship between the predicted and actual values, underlining the method's excellence. In contrast, the baseline methods CNN, LSTM, RF, ANN, and KNN exhibit progressively higher MAE and RMSE values, indicating comparatively lower predictive accuracy. Their NMSE values are also higher, implying larger prediction errors, and their correlation and R-square values are lower than those of the proposed HSF, signifying weaker linear relationships between predictions and actual values.
Table 1: Overall performance metrics
Methods | MAE | RMSE | NMSE | Correlation | R-square |
Proposed HSF | 13.122 | 15.217 | 0.0001 | 0.99993 | 0.9998 |
CNN | 20.776 | 24.141 | 0.0002 | 0.99984 | 0.9996 |
LSTM | 27.776 | 32.212 | 0.0004 | 0.99971 | 0.9994 |
Random Forest | 35.996 | 41.768 | 0.0008 | 0.99951 | 0.9990 |
ANN | 46.141 | 53.335 | 0.0013 | 0.99921 | 0.9984 |
KNN | 54.471 | 63.039 | 0.0018 | 0.99890 | 0.9977 |
4.4. Graphical Representation
Figure 2 displays graphical representations of the performance metrics MAE, RMSE, NMSE, Correlation, and R-square, offering a comprehensive view of the model evaluation.
Figure 2: Graphical representation of (a) MAE (b) RMSE (c) NMSE (d) Correlation (e) R-square
In Figure 3, the graph illustrates the alignment between actual wind power generation values and those predicted by the model. It visually assesses the model’s accuracy and its ability to capture real-world wind power patterns and fluctuations.
Figure 3: Actual vs Predicted graph
Conclusion
Reliable forecasting of wind power generation is essential for grid stability and effective resource use in the field of renewable energy. This research described a novel method that utilised an ensemble of DL models and a meta-heuristic framework to improve the accuracy of WPF. The proposed methodology's thorough pre-processing phase included data cleaning and normalisation using the Box-Cox transformation, as well as data imputation to accommodate missing values. The statistical properties of skewness, kurtosis, and autocorrelation, together with basic statistics such as the mean, median, standard deviation, mode, variance, moment, and interquartile range, were used in the feature extraction process. PCA was applied to reduce dimensionality while retaining crucial information. HMMMC, a hybrid feature selection approach, was then used to improve model performance further: to find the most pertinent features for WPF, HMMMC combined CSO with MA. The core of the method was HybSeqFor, which brought together CNN, LSTM, and RF. This ensemble produced higher forecasting accuracy by successfully capturing both the spatial and temporal dependencies in wind power data. MATLAB was used to implement the proposed model.
References
Kisvari, A., Lin, Z. and Liu, X., 2021. Wind power forecasting–A data-driven method along with gated recurrent neural network. Renewable Energy, 163, pp.1895-1909.
Demolli, H., Dokuz, A.S., Ecemis, A. and Gokcek, M., 2019. Wind power forecasting based on daily wind speed data using machine learning algorithms. Energy Conversion and Management, 198, p.111823.
Duan, J., Wang, P., Ma, W., Fang, S. and Hou, Z., 2022. A novel hybrid model based on nonlinear weighted combination for short-term wind power forecasting. International Journal of Electrical Power & Energy Systems, 134, p.107452.
Niu, Z., Yu, Z., Tang, W., Wu, Q. and Reformat, M., 2020. Wind power forecasting using attention-based gated recurrent unit network. Energy, 196, p.117081.
Zhang, Y., Li, Y. and Zhang, G., 2020. Short-term wind power forecasting approach based on Seq2Seq model using NWP data. Energy, 213, p.118371.
Lin, Z. and Liu, X., 2020. Wind power forecasting of an offshore wind turbine based on high-frequency SCADA data and deep learning neural network. Energy, 201, p.117693.
Xiong, B., Lou, L., Meng, X., Wang, X., Ma, H. and Wang, Z., 2022. Short-term wind power forecasting based on Attention Mechanism and Deep Learning. Electric Power Systems Research, 206, p.107776.
Abedinia, O., Lotfi, M., Bagheri, M., Sobhani, B., Shafie-Khah, M. and Catalão, J.P., 2020. Improved EMD-based complex prediction model for wind power forecasting. IEEE Transactions on Sustainable Energy, 11(4), pp.2790-2802.
Khazaei, S., Ehsan, M., Soleymani, S. and Mohammadnezhad-Shourkaei, H., 2022. A high-accuracy hybrid method for short-term wind power forecasting. Energy, 238, p.122020.
Li, L.L., Zhao, X., Tseng, M.L. and Tan, R.R., 2020. Short-term wind power forecasting based on support vector machine with improved dragonfly algorithm. Journal of Cleaner Production, 242, p.118447.
Scarabaggio, P., Grammatico, S., Carli, R. and Dotoli, M., 2021. Distributed demand side management with stochastic wind power forecasting. IEEE Transactions on Control Systems Technology, 30(1), pp.97-112.
Dong, Y., Zhang, H., Wang, C. and Zhou, X., 2021. Wind power forecasting based on stacking ensemble model, decomposition and intelligent optimization algorithm. Neurocomputing, 462, pp.169-184.
Wang, C., Zhang, H. and Ma, P., 2020. Wind power forecasting based on singular spectrum analysis and a new hybrid Laguerre neural network. Applied Energy, 259, p.114139.
Hong, Y.Y. and Rioflorido, C.L.P.P., 2019. A hybrid deep learning-based neural network for 24-hour ahead wind power forecasting. Applied Energy, 250, pp.530-539.
Al-qaness, M.A., Ewees, A.A., Fan, H., Abualigah, L. and Abd Elaziz, M., 2022. Boosted ANFIS model using augmented marine predator algorithm with mutation operators for wind power forecasting. Applied Energy, 314, p.118851.
Ahmadi, A., Nabipour, M., Mohammadi-Ivatloo, B., Amani, A.M., Rho, S. and Piran, M.J., 2020. Long-term wind power forecasting using tree-based learning algorithms. IEEE Access, 8, pp.151511-151522.
Yang, M., Shi, C. and Liu, H., 2021. Day-ahead wind power forecasting based on the clustering of equivalent power curves. Energy, 218, p.119515.
Dong, Y., Zhang, H., Wang, C. and Zhou, X., 2021. A novel hybrid model based on Bernstein polynomial with mixture of Gaussians for wind power forecasting. Applied Energy, 286, p.116545.
Rayi, V.K., Mishra, S.P., Naik, J. and Dash, P.K., 2022. Adaptive VMD based optimized deep learning mixed kernel ELM autoencoder for single and multistep wind power forecasting. Energy, 244, p.122585.
Ko, M.S., Lee, K., Kim, J.K., Hong, C.W., Dong, Z.Y. and Hur, K., 2020. Deep concatenated residual network with bidirectional LSTM for one-hour-ahead wind power forecasting. IEEE Transactions on Sustainable Energy, 12(2), pp.1321-1335.
Ahmad, T. and Zhang, D., 2022. A data-driven deep sequence-to-sequence long-short memory method along with a gated recurrent neural network for wind power forecasting. Energy, 239, p.122109.
Sun, M., Feng, C. and Zhang, J., 2020. Multi-distribution ensemble probabilistic wind power forecasting. Renewable Energy, 148, pp.135-149.
Du, P., Wang, J., Yang, W. and Niu, T., 2019. A novel hybrid model for short-term wind power forecasting. Applied Soft Computing, 80, pp.93-106.
Wang, G., Jia, R., Liu, J. and Zhang, H., 2020. A hybrid wind power forecasting approach based on Bayesian model averaging and ensemble learning. Renewable energy, 145, pp.2426-2434.
Liu, B., Zhao, S., Yu, X., Zhang, L. and Wang, Q., 2020. A novel deep learning approach for wind power forecasting based on WD-LSTM model. Energies, 13(18), p.4964.
Dataset taken from: https://www.kaggle.com/code/akdagmelih/wind-turbine-power-prediction-gbtregressor-pyspark/input (accessed 02/10/2023).