PHI has been designed to capture and assess the condition of equipment during its life cycle. Thus, it may be utilized in data-driven condition-based maintenance and helps in predicting failures and malfunctions [20].
Data acquisition refers to the collection of historical data over a long duration for training a predictive model under normal operating conditions. The collected data should preferably span various operating modes and may also include abnormal conditions and operational variations resulting from, for example, equipment aging, fouling, and catalyst deactivation.
The training datasets are collected in real time directly from the sensors associated with the plant components. The datasets capture the three operational modes: startup, normal operation, and shutdown. These modes can be subdivided into more detailed modes in some circumstances.
Although parameters may possess a strong correlation, a time lag between them can prevent the relationship from being extracted. The time delay between physically related parameters arises because, once a change occurs, it takes time for its effect to migrate from one portion of the process to another and to reach a new steady state. Consequently, even strongly associated parameters that change over time may yield a modest correlation coefficient, resulting in errors during the grouping procedure. We therefore employed a dynamic sampling window that examines the temporal lag among parameters to aid the effective grouping of strongly linked variables.
The time lag was dealt with using cross-correlation. For a delay duration \(t_d\), Eq. (9) defines the cross-correlation coefficient between two parameters \(A\) \((a_0, a_1, \ldots, a_M)\) and \(B\) \((b_0, b_1, \ldots, b_M)\) [21]. The averages of \(A\) and \(B\) are \(\mu_A\) and \(\mu_B\), respectively.
$$\gamma_{AB}(t_d) = \frac{\sum_{i=0}^{M-1} \left( a_i - \mu_A \right) \left( b_{i-t_d} - \mu_B \right)}{\sqrt{\sum_{i=0}^{M-1} \left( a_i - \mu_A \right)^2} \, \sqrt{\sum_{i=0}^{M-1} \left( b_{i-t_d} - \mu_B \right)^2}}$$
(9)
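The cross-correlation of Eq. (9) can be sketched in a few lines. The following is an illustrative Python/NumPy version (the PHI system itself is implemented in MATLAB); the helper `best_lag`, which scans candidate delays to find the strongest alignment, is an assumption about how the dynamic sampling window could use Eq. (9), not a description of PHI's internal code.

```python
import numpy as np

def cross_corr(a, b, t_d):
    """Cross-correlation coefficient of Eq. (9): pair a[i] with b[i - t_d],
    using only the samples where both series are defined."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if t_d >= 0:
        a_w, b_w = a[t_d:], b[:len(b) - t_d]
    else:
        a_w, b_w = a[:t_d], b[-t_d:]
    a_w = a_w - a_w.mean()   # subtract the window means (mu_A, mu_B)
    b_w = b_w - b_w.mean()
    return float(a_w @ b_w / (np.linalg.norm(a_w) * np.linalg.norm(b_w)))

def best_lag(a, b, max_lag):
    """Scan candidate delays and return the one with the strongest |gamma_AB|."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda d: abs(cross_corr(a, b, d)))
```

Aligning two tags at the lag returned by `best_lag` before computing Eq. (10) prevents a physically strong relationship from appearing as a weak correlation.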
Grouping parameters aims to remove elements that do not provide meaningful data and to limit the number of parameters needed to adequately observe a component. The correlation coefficient used as a reference for this grouping procedure is calculated for each pair of variables using Eq. (10); if it exceeds a specified threshold, the variable is included in the training set; otherwise, it is discarded [21].
$$\rho_{AB} = \frac{1}{M} \sum_{i=0}^{M-1} \left( \frac{a_i - \mu_A}{\sigma_A} \right) \left( \frac{b_i - \mu_B}{\sigma_B} \right)$$
(10)
where \(\rho_{AB}\) is the correlation coefficient between \(A\) and \(B\), and \(\sigma_A\) and \(\sigma_B\) are their standard deviations.
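The threshold-based inclusion rule around Eq. (10) can be illustrated as follows. This is a minimal Python sketch, not PHI's MATLAB code; the tag names and the reference-tag formulation are assumptions for illustration.

```python
import numpy as np

def pearson(a, b):
    """Correlation coefficient of Eq. (10), with population standard deviations."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean((a - a.mean()) / a.std() * (b - b.mean()) / b.std()))

def group_tags(data, reference, threshold=0.20):
    """Keep only tags whose |rho| against the reference tag meets the threshold
    (0.20 is the minimum correlation coefficient quoted in the text)."""
    return [name for name, series in data.items()
            if abs(pearson(series, reference)) >= threshold]
```

For example, a tag that is a scaled or inverted copy of the reference survives the cut, while an uncorrelated tag is discarded.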
There are three possible ways to group the parameters: relational grouping (tags with the same patterns are grouped together), manual grouping (each group possesses all of the tags), and success-tree-based grouping. The cut-off value of the correlation coefficients is known as the group sensitivity; the larger the group sensitivity, the more precise the grouping. The Group Resolution (Shrink) feature is employed when data is compressed during grouping. If a tag has 1000 samples and the compression ratio is 100, the samples will be compressed to 100, and the missing information is filled in by the Grid Size. The major benefits of compression are reduced data storage, data-transfer time, and communication bandwidth. Time-series datasets frequently grow to terabytes and beyond, so the collected datasets must be compressed to obtain the most effective model while preserving available resources.
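One plausible reading of the Group Resolution (Shrink) feature is block-wise downsampling to a target number of samples. The sketch below is an assumption about that mechanism (the text does not specify the aggregation rule); block averaging is used here purely for illustration.

```python
import numpy as np

def shrink(series, resolution):
    """Compress a tag's samples down to `resolution` points by block averaging.
    Illustrative only: PHI's actual Shrink/Grid Size mechanism is not specified."""
    series = np.asarray(series, dtype=float)
    blocks = np.array_split(series, resolution)  # ~equal-length blocks
    return np.array([blk.mean() for blk in blocks])
```

A 1000-sample tag compressed to 100 points thus keeps one representative value per 10-sample block, trading resolution for storage and bandwidth.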
Preprocessing of the collected data is indispensable to ensure the accuracy of the developed empirical models, which are sensitive to noise and outliers. The selection of the sampling rate is also crucial, mainly because in oil refinery processes the sampling rate (measurement frequency) is much faster than the process dynamics. In the current implementation, low-pass frequency filtering with Fourier analysis was used to eliminate outliers, a 10 min sampling rate was selected, and the compression rate (group resolution or shrink) was set at 1000. Moreover, a Kalman filter was applied to ensure a robust noise distribution in the collected data [5]. Another important preprocessing step is grouping. First, variables carrying useful information are grouped together. This helps to remove redundant variables that do not carry useful information and reduces the number of variables required to monitor the plant properly. Finally, the available information must be appropriately compressed via the transformation of high-dimensional data sets into low-dimensional features with minimal loss of class separability [21]. The maximum number of tags per group is limited to 51 in this simulation, and success-tree-based grouping is used in most cases. The minimum value of the correlation coefficient \(\rho\) is set to 0.20 and the group sensitivity to 0.90; the higher the group sensitivity, the more accurate the grouping.
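The low-pass Fourier filtering step can be sketched as follows. This is an illustrative Python version under assumed details: the cutoff parameter `keep_fraction` is hypothetical, and the paper does not specify how PHI chooses its cutoff frequency.

```python
import numpy as np

def lowpass_fft(signal, keep_fraction=0.05):
    """Low-pass filtering via Fourier analysis: transform, zero the
    high-frequency bins, and invert. `keep_fraction` (assumed) sets which
    share of the low-frequency bins survives."""
    spectrum = np.fft.rfft(signal)
    cutoff = max(1, int(len(spectrum) * keep_fraction))
    spectrum[cutoff:] = 0.0                      # discard fast components
    return np.fft.irfft(spectrum, n=len(signal))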
Kernel regression is a well-known non-parametric method for estimating the conditional expectation of a random variable [22,23,24,25]. The goal is to discover a non-linear relationship between two random variables. Kernel regression is a good choice when dealing with data that has a skewed distribution. The model determines the value of a parameter from the exemplar observation and a weighted average of historical data. The kernel function serves as the weights in kernel regression; it is a symmetric, continuous, and bounded real function that integrates to 1 and cannot take negative values. The Nadaraya-Watson estimator, given by Eq. (11), is the most concise expression of kernel regression for estimating \(y\) with respect to the input \(x\) [21,23,24].
$$\hat{y} = \frac{\sum_{i=1}^{n} K(X_i - x) \, Y_i}{\sum_{i=1}^{n} K(X_i - x)}$$
(11)
The selection of an appropriate kernel for the situation is limited by practical and theoretical concerns. Reported kernels include Epanechnikov, Gaussian, quartic (biweight), tricube (triweight), uniform, triangular, cosine, logistic, and sigmoid [25]. The current implementation of PHI provides three types of kernel regression, with kernels defined as:
Uniform kernel (rectangular window): \(K(x) = \frac{1}{2}\) for \(|x| \le 1\)
Triangular kernel (triangular window): \(K(x) = 1 - |x|\) for \(|x| \le 1\)
Gaussian kernel: \(K(x) = \frac{1}{\sqrt{2\pi}} \, e^{-x^2/2}\)
The default is the Gaussian kernel, which proved to be the most effective for the current implementation.
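The Nadaraya-Watson estimator of Eq. (11) with the Gaussian kernel can be sketched as below. This is an illustrative Python version (PHI itself is a MATLAB system), and the `bandwidth` smoothing parameter is an assumption: Eq. (11) as written leaves the kernel scale implicit.

```python
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel K(x) = exp(-x^2/2) / sqrt(2*pi)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(x_query, X, Y, bandwidth=1.0):
    """Eq. (11): the estimate of y at x_query is the kernel-weighted average
    of the historical outputs Y, with weights K((X_i - x) / h).
    `bandwidth` (h) is an assumed smoothing parameter."""
    w = gaussian_kernel((np.asarray(X, float) - x_query) / bandwidth)
    return float(np.sum(w * np.asarray(Y, float)) / np.sum(w))
```

Because the weights are largest for historical exemplars closest to the query point, the estimator tracks local structure without assuming any parametric form for the relationship.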
PHI monitors plant signals, derives actual values of operational variables, compares actual values with expected values predicted using empirical models, and quantifies deviations between actual and expected values. Before being deployed to monitor plant operation, PHI must first be trained to predict the normal operating conditions of a process. Developing the empirical predictive model is based on a statistical learning technique consisting of an execution mode and a training mode. Methods and algorithms used in both modes of the PHI system are shown in Fig. 9.
Algorithms of the PHI [26].
In the training mode, statistical methods are used to train the model on past operating data. In the execution mode, the system identifies possible anomalies in operation by inspecting the discrepancies between values predicted by the empirical model and actual online measurements. For example, if a current operating condition approaches the normal condition, the health index is 100%; conversely, if an operating condition approaches the alarm set point, the health index is 0%. In terms of process uncertainty, on the other hand, the health index is characterized by the residual deviations: it is 100% if a current operating condition matches the model estimate (i.e., the residual is 0.0), and 0% if the operating conditions are far enough from the model estimate (i.e., the residual tends to infinity). The overall plant index is a combination of these two health indices. Details of the method are presented in [21] and [26] and summarized as an improved statistical learning framework described below.
The framework of PHI is shown in Fig. 10. The sequence of actions in the training mode is as follows:
Acquisition of long-term historical data.
Data preprocessing such as filtering, signal compression, and grouping.
Development of the statistical model.
Evaluation of Health Index.
On the other hand, the sequence of actions in the execution mode is as follows:
Acquisition of real-time data.
Calculation of expected value from the model.
Calculation of residuals.
Determination of process uncertainty.
Calculation of PHI.
In the execution phase, the first step is to gather real-time data from the sensor signals and compare this information with the model estimates. Based on this comparison, the residuals between the model estimates and the real-time measurements are evaluated; these residuals are used to predict abnormalities in the plant. Suppose the online values are [11 12 13 9 15] and the model estimates are [11 12 13 14 15]; the estimated residuals are then [0 0 0 5 0]. These values are used to evaluate the process uncertainty (healthiness) by applying Eq. (2). Process margins, on the other hand, refer to the differences between alarms/trips and the operational conditions, and are evaluated using Eq. (1). An early warning is generated when an abnormal process uncertainty is observed earlier than a process margin. The process margins and process uncertainties are combined into overall health indices using Eq. (3).
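The residual step of the example above can be sketched as follows. Note that Eqs. (1)-(3) are defined elsewhere in the paper, so the linear `health_index` mapping and its `residual_limit` parameter below are illustrative assumptions, not the paper's actual formulas; only the residual computation is taken directly from the text.

```python
import numpy as np

def residuals(online, estimates):
    """Absolute deviations between real-time measurements and model estimates."""
    return np.abs(np.asarray(online, dtype=float) - np.asarray(estimates, dtype=float))

def health_index(residual, residual_limit):
    """Illustrative stand-in for Eq. (2): 100% health at zero residual,
    falling to 0% as the residual reaches `residual_limit` (assumed linear)."""
    return 100.0 * max(0.0, 1.0 - residual / residual_limit)
```

Running this on the example data reproduces the residual vector [0 0 0 5 0], with only the fourth signal contributing a degraded health index.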
The PHI system has been developed using MATLAB. A modular approach has been used so that modifications may be easily introduced, and new algorithms may be added, integrated, and tested as independent modules. This approach was found quite appropriate for research and development purposes. Moreover, the PHI system is delivered as executable MATLAB files.
The main features and functionalities of PHI are (1) detecting process uncertainty, in terms of a health index, for individual signals as well as for an entire plant, (2) warning of anomalies in health indices, and (3) customized user interfaces and historians. Furthermore, since PHI deals separately with safety-related and performance-related health indices, users can make appropriate decisions for their situation.
The PHI system has a client-server architecture, as shown in Fig. 11. The server side is divided into the core modules necessary to build the PHI functionality and PRISM, a real-time BNF (Breakthrough and Fusion) technology database. The clients are divided into a standard client and a web-based client. Figure 12 shows the main display of the PHI client. All of these functions bridge the server-side information to users.
Server architecture of the PHI system [26].
Example display of the PHI indicating the (a) overall plant health index and the health indices of the (b) reaction and (c) stripper sections.
The results of the PHI can be monitored through the client computer, which has the following main features:
Index display: the default display shows the index, in percent, of the topmost groups, including the trend. The indices of other subsystems can be viewed and accessed as well.
Success tree display: a success tree view offering both a hierarchical display and a group-wise display.
Trend display: a trend view showing the actual versus expected value trends.
Alarms display: a grid-based alarm view showing the latest alarm at the top.
Reports: reports can be generated on the health status and regular alarms.
Configuration manager: invoked at the start of the PHI client application, it checks the port and the server's IP address; if it cannot connect, the configuration manager window pops up at startup.