|
1. INTRODUCTION
Compressor stations transport natural gas downstream through a series of regulation and control measures and are important nodes within the natural gas pipeline network. Extreme events are events with a low probability of occurrence but a significant impact, such as hurricanes, earthquakes, deliberate cyber attacks, and terrorist attacks. When a compressor station is hit by an extreme event, its gas transmission function can be substantially affected and may even be interrupted (that is, the gas transmission capacity is reduced to the minimum). The compressor station must therefore be able to cope with, absorb, and quickly recover from disturbances; in other words, it must be resilient. The literature features a plethora of definitions for resilience, each emphasizing different facets of the concept. Zhao et al.[1] argued for a comprehensive view of infrastructure resilience that integrates resistance, absorption, and recovery capabilities and developed quantifiable resilience metrics to assess the system’s ability to absorb, adapt, and recover, taking into account the interplay of these resilience capacities. Xu and Chopra[2] perceived resilience as the capacity to “prepare and plan for, absorb, recover from, and more successfully adapt to adverse events” and proposed a resilience cycle framework consisting of four life cycle stages: preparedness, robustness, recoverability, and adaptability. Vugrin et al.[3] categorized the resilience of a system into three distinct capabilities: absorptive, restorative, and adaptive. Absorptive capacity refers to the extent to which a system can inherently mitigate the impacts of disturbances and minimize their consequences with little effort. Restorative capacity emphasizes the system’s ability to be repaired exogenously after disruptions occur. Adaptive capacity focuses on endogenous mechanisms, such as self-organization and learning processes, to contend with disturbances.
In summary, resilience encompasses the capacity of systems, organizations, or individuals to not only recover to their original state after experiencing external disturbances or internal stresses but also to potentially enhance their structural or functional optimization through adaptation and learning. This potential elevates their capacity to withstand future pressures and improves overall adaptability. The concept of resilience emphasizes the dynamic nature of recovery and the potential for growth in the face of adversity, reflecting a duality of restorative and evolutionary characteristics. In contrast, reliability within the fields of engineering and quality management specifically refers to the attribute of a product, system, or service to consistently perform its intended function without failure under specified conditions and over an anticipated lifespan. It quantifies system stability and is often associated with metrics such as failure rates and Mean Time Between Failures (MTBF), reflecting the level of quality assurance throughout the design, manufacturing, and maintenance processes. Researchers commonly consider resilience to be an extension of reliability[4, 5]. Within the resilience framework, resistance denotes the ability of a system to maintain its normal operation in the face of external perturbations, which aligns with the core concept of reliability—the continuity and stability of a system. However, reliability focuses on system performance under normal operating conditions, while resilience places greater emphasis on the capacity of a system to cope under abnormal conditions. Although there is an overlap, the definition of resilience is broader, encompassing the system’s capacity for recovery and adaptation in the face of unforeseen events. In the face of threats from extreme events, resilience assessment has become an indispensable requirement for modern energy systems. 
Quantitative resilience assessment methods have received growing attention. According to Hosseini et al.[6], these methods can be categorized into two types: measurement-based methods and structure-based models. The resilience curve approach, proposed by Bruneau et al.[7] in 2003, is a classical quantitative evaluation method. In the resilience curve, system resilience is described as the temporal variation of system performance across the stages that follow a disruptive event: performance typically declines and then gradually recovers. The dynamics of system performance can be highly complex in such cases and are therefore approximated by distinct stages. During the disturbance phase, not only does system performance decline, but the system’s capacity for recovery also diminishes, which increases its exposure to further attacks. The resilience curve approach is mainly suitable for systems whose performance can be measured using continuous metrics. In practice, however, energy systems are often represented by multi-state models with discrete states in order to control the complexity of modeling, and the quantification of the resilience of multi-state systems, such as compressor stations, remains an open research question. Markov processes are a powerful tool for describing the multi-state behavior of energy systems. Lin et al.[8] proposed a simulation-based Building Performance Recovery Model (BPRM) based on Markov chains and subsequently applied it to a medium-sized community to forecast post-disaster recovery. Zeng et al.[9] proposed a Markov reward process-based assessment framework to explore the impact of multiple hazards on nuclear power plants.
Some researchers have established discrete Markov models for the stability of power grids under multiple operating conditions and various failure modes[10, 11]. Therefore, this paper proposes a resilience evaluation method for natural gas compressor stations based on Markov processes. The theoretical and practical contributions of this paper are summarized as follows:
2. A FRAMEWORK FOR MODELING THE RESILIENCE OF COMPRESSOR STATION
2.1 Modeling assumptions and establishment
Fig. 1 shows that after a disturbance has occurred, the system has some ability to maintain stable operation at the baseline functional level; this is the normal operating phase at the beginning of the curve. Over time, however, the system function begins to degrade due to external shocks or internal failures. At the end of this degradation phase, the system’s functionality is at its lowest point, which can result from direct damage or from a defense mechanism. The recovery phase is characterized by a gradual improvement in system function. This phase can be further divided into a fast recovery, during which the function rapidly returns to an acceptable level, and a subsequent slow recovery, during which the system function transitions to a new steady state at a slower rate. Eventually, the system enters the new normal phase, which may resemble the original state or demonstrate increased resilience or greater vulnerability as the system adjusts and learns following the event.
Let the random variable $X(t)$ denote the state of the system at time $t$ under a perturbation, and assume that the state space of the system is finite, $\Omega = \{0, 1, 2, \dots, N\}$. If the sojourn time of the stochastic process in each state follows an exponential distribution, the process constitutes a Continuous-Time Markov Chain (CTMC). Within a CTMC, the sojourn time in any given state, that is, the time spent in the current state before the transition to another state, is an exponentially distributed random variable. This attribute endows the CTMC with the mathematical property of memorylessness, meaning that the future behavior of the system depends solely on its current state and not on the path by which it arrived at that state.
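The memorylessness stated above follows directly from the exponential sojourn time. As a short worked derivation, assume that $T_i$ denotes the sojourn time in state $i$ and $\nu_i$ its total exit rate (both symbols are introduced here for illustration only and do not appear in the original text):

```latex
P(T_i > s + t \mid T_i > s)
  = \frac{P(T_i > s + t)}{P(T_i > s)}
  = \frac{e^{-\nu_i (s + t)}}{e^{-\nu_i s}}
  = e^{-\nu_i t}
  = P(T_i > t), \qquad s, t \ge 0 .
```

In words, the time the system has already spent in a state carries no information about how much longer it will remain there.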
To analyze and forecast the behavior of the system under disturbances, one constructs a rate matrix that describes the possible state transitions. The value of each element represents the transition rate, indicating the average number of transitions from one state to another per unit of time. By analyzing this matrix, one can elucidate the dynamic characteristics of the system over time, including the steady-state probability distribution of each state and the probabilities of transitions between states. The transfer rate matrix is shown below:

$$
Q = \begin{bmatrix}
q_{00} & q_{01} & \cdots & q_{0N} \\
q_{10} & q_{11} & \cdots & q_{1N} \\
\vdots & \vdots & \ddots & \vdots \\
q_{N0} & q_{N1} & \cdots & q_{NN}
\end{bmatrix}
$$

where $q_{ij}$ ($i \neq j$) denotes the rate of transfer from state $i$ to state $j$, and each diagonal element $q_{ii}$ equals the negative sum of the other elements in its row. Moving from state $i$ to state $j$ ($i < j$) denotes a degradation of system performance, which is generally caused by disturbances such as external impacts or internal equipment failures. Similarly, moving from state $j$ to state $i$ denotes the recovery of the system, which is achieved through emergency measures and subsequent repair actions. Within the framework of resilience theory, the system states can be subdivided into subsets in order to analyze in depth the performance of the system after it has suffered a disturbance. We assume that all possible states of the system can be classified into three subsets of states: the normal states $S_1$, in which the system performs its full function; the degraded states $S_2$, in which performance is reduced but the system still operates; and the failed states $S_3$, in which the system function is lost.
All states $X_i$ in the state set follow the Markov characteristic shown in Fig. 1, where the Markov process describes system failure and recovery as a series of state transitions.
2.2 System resilience evaluation metrics
To quantify the behavior of a multistate system that is exposed to a shock, we propose the following metrics for describing the resilience performance of the system at different stages.
Definition 1 Resistance: the ability of a system to maintain its original load function unaffected when subjected to an external shock. This attribute reflects the ability of a system to hold back or slow down performance degradation at the moment of initial impact due to its inherent structural integrity and material strength. Typically, resistance is closely related to factors such as the design criteria of the system, the quality of the materials used to build it, and the redundant design of the system. A system with a high level of resistance can maintain stable operation for a longer period in the presence of external shocks. Formally, with $S_1$ the normal state subset introduced earlier, the system resistance at time $t$ is the probability that the system state belongs to $S_1$.
Definition 2 Absorption: the degree to which a system can adjust itself to absorb and mitigate the effects of persistent disturbances or stresses. It reflects the internal capacity of the system to buffer disturbances and mitigate damage. When system performance declines gradually, the system is able to maintain some, although not all, of its functions. The level of absorption often depends on the adaptive design of the system and its internal mechanisms for coping with disturbances.
To prevent the system from entering a non-resilient state, safety barriers are usually designed in engineering practice. For example, a compressor station may be equipped with a series of carefully designed safety barriers (safety instrumented systems, pressure relief valves, control system isolation) to stop faults before they escalate to critical points. These may include advanced monitoring systems for early detection of anomalies, surge control systems to manage flow instability, and emergency shutdown mechanisms that are activated in the event of an operational anomaly. Such barriers not only improve the ability of compressor stations to absorb disturbances and maintain their operational integrity, but also enhance their overall safety profile.
Definition 3 Recovery: the ability of a system to quickly return to its pre-disturbance operating state and performance level during or after a disturbance. This attribute covers the system’s ability to self-heal, the speed of emergency response, and the effectiveness of recovery measures. In the recovery phase, the ability of the system to return from partial or complete loss of functionality depends not only on timely repair and spare parts replacement at the technical level but also on the efficiency of the organization’s emergency response and the robust execution of the recovery strategy. Recoverability is an important indicator of the resilience of a system and reflects its long-term ability to withstand shocks. Here $T_{lp}$ denotes the moment at which the system leaves the normal operating state.
2.3 Metrics solving methods
In the study of continuous-time Markov chains, the solution of the transfer probability $p(t)$ is often complicated because the Kolmogorov differential equations must be utilized for its calculation[12].
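For a finite state space, the Kolmogorov equations reduce to a linear matrix differential equation whose solution is the matrix exponential of the rate matrix. The following is a minimal sketch of that calculation, not the authors' implementation. The three-state chain and its rates are illustrative assumptions, and the helper names `make_generator`, `expm`, and `staying_probability` are hypothetical. The exponential is approximated with a truncated Taylor series plus scaling and squaring so that the example needs only the standard library; a numerical library routine would normally be used instead.

```python
from math import exp


def make_generator(rates, n):
    """Build an n x n CTMC rate matrix from off-diagonal rates {(i, j): q_ij}."""
    Q = [[0.0] * n for _ in range(n)]
    for (i, j), q in rates.items():
        if i == j or q < 0:
            raise ValueError("rates must be off-diagonal and non-negative")
        Q[i][j] = q
    for i in range(n):
        Q[i][i] = -sum(Q[i])  # diagonal makes every row sum to zero
    return Q


def matmul(A, B):
    """Plain dense matrix product for small matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def expm(A, t=1.0, terms=20):
    """Approximate exp(A * t) by a Taylor series with scaling and squaring."""
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A) * abs(t)
    squarings = 0
    while norm > 0.5:  # shrink the argument until the series converges fast
        norm /= 2.0
        squarings += 1
    scale = t / (2 ** squarings)
    M = [[x * scale for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, M)]
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):  # undo the scaling: exp(A) = exp(A / 2)^2
        result = matmul(result, result)
    return result


def staying_probability(Q, subset, start, t):
    """Probability of remaining inside `subset` throughout [0, t], starting in `start`."""
    sub = [[Q[i][j] for j in subset] for i in subset]  # block of Q restricted to the subset
    P_sub = expm(sub, t)
    return sum(P_sub[subset.index(start)])  # row of exp(Q_sub * t) times a vector of ones


# Illustrative rates only (assumed, not taken from the paper):
# state 0 = normal, state 1 = degraded, state 2 = failed.
Q = make_generator({(0, 1): 0.2, (1, 0): 1.0, (1, 2): 0.1, (2, 0): 0.5}, 3)
P = expm(Q, 5.0)  # transition probabilities over five time units

# For a single normal state the staying probability reduces to exp(-exit_rate * t).
print(staying_probability(Q, [0], 0, 3.0), exp(-0.2 * 3.0))
```

Restricting the rate matrix to a subset of states before exponentiating treats every exit from the subset as absorbing, which is why the row sum gives the probability of never having left the subset rather than the probability of being in it at time t.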
The Kolmogorov equations describe how the state transfer probabilities change over time and form a system of differential equations, comprising the forward equation (also known as the Fokker-Planck equation) and the backward equation. These equations embody the continuity of transfer probabilities between states over time. The backward and forward equations can be written in matrix form as

$$
P'(t) = Q\,P(t), \qquad P'(t) = P(t)\,Q, \tag{3}
$$

where $P'(t)$ denotes the derivative of the transfer probability matrix and $Q$ denotes the transfer rate matrix. Thus, solving for the transfer probability of a continuous-time Markov chain amounts to solving a matrix differential equation that is fully determined by the transfer rate matrix $Q$. In particular, if $Q$ is a finite-dimensional matrix, then the solution of (3) is

$$
P(t) = e^{Qt} = \sum_{k=0}^{\infty} \frac{(Qt)^k}{k!}. \tag{4}
$$

Calculating the probability of staying in a particular state (e.g., state $s$) usually involves processing the transfer rate matrix $Q$, which describes the rates of transitions between states in the CTMC. To compute the probability of staying in state $s$, the transfer rate matrix $Q$ can be partitioned so that it reflects the interactions between state $s$ and the other states. Specifically, the matrix $Q$ can be divided into four parts: $Q_{ss}$, $Q_{s\bar{s}}$, $Q_{\bar{s}s}$, and $Q_{\bar{s}\bar{s}}$. Here, $Q_{ss}$ is a matrix; $Q_{s\bar{s}}$ is a row vector representing the transfer rates from state $s$ to all non-$s$ states; $Q_{\bar{s}s}$ is a column vector representing the transfer rates from all non-$s$ states to state $s$; and $Q_{\bar{s}\bar{s}}$ is a matrix representing the transfer rates between all non-$s$ states. Therefore, from the definition of resistance and equation (4), the probability of staying in $S_1$ can be expressed in terms of the matrix exponential of the sub-block $Q_{S_1 S_1}$ and $u_{S_1}$, the unit column vector of dimension $|S_1|$.
3. CASE STUDY
The evaluation object of this paper is a natural gas compressor station[13]. Fig. 3 shows the typical structure of the compressor station, which is mainly divided into the filtration and separation section (one unit in operation with four on standby), the compressor group section (two units in operation with one on standby), and the station pipeline sections. During the operating period of a natural gas compressor station, the realization of key functionalities depends mainly on the filtration and separation section and the compressor section. The filtration and separation section removes impurities and liquids from the natural gas and ensures that the compressor processes clean, dry gas, thereby increasing the operational efficiency of the system and extending the life of the equipment. The compressor section raises the pressure of the processed gas to the level required by the transmission network. Given the importance of these two sections, this paper focuses on them when assessing unit failures and ignores failures of the pipe sections for the time being. The following states may occur in the units in the compressor station:
Note: There is no degradation state (BP) for the filter separation section, since only one filter separator is required for this section to function properly.
In Table 1, the state subset $S_1$ corresponds to $X(t)=1$, in which the system is in its ideal operating state, with all units functioning perfectly, ensuring continuous and stable delivery of natural gas and meeting the maximum design load requirements. In subset $S_2$, i.e., when $X(t)=2$, the system exhibits some performance degradation, which may be caused by equipment wear, parts failure, or operational errors; the system can still operate and maintain the basic gas transmission task, but optimal efficiency cannot be guaranteed. When the system enters any of the remaining states, the system functions are completely lost, and the compressor station cannot operate normally and must be shut down for repair or replacement of components; these states constitute state subset $S_3$. The operation of the compressor station is subject to strict safety requirements, so the state transfer is designed such that any unit abnormality triggers the emergency stop procedure. This safety warning mechanism ensures that the system responds quickly when a potential risk is detected and prevents accidents from occurring. The state transfer process is shown in detail in Fig. 4. A transfer from state 1 to states 2 to 6 means that the system moves from normal operation to a state of impaired performance, whereas a transfer from states 2 to 6 back to state 1 means that the system returns to normal operation after the necessary repair and recovery measures. To obtain the transfer rate matrix $Q$, the following assumptions were made:
Table 1 Different states of the compressor station
Because vulnerability experiments on the components of the compressor station are lacking and the corresponding parameters are difficult to obtain directly, the Fault Tree Analysis (FTA) methodology is used to derive the state transfer rates of the compressor station under a specific shock event. Fault tree analysis is a systematic fault diagnosis method that uses a tree logic model to identify the multiple factors, and their interrelationships, that may lead to system-level failures. In this analytical framework, we assume that the probability of failure of a single compressor when perturbed is 0.56, while the probability of failure of a single filter is 0.23. The construction of the fault tree is based on the principles of system reliability engineering, which enables the simulation and quantitative analysis of the impact of potential failures of individual critical components in the compressor station on the overall system performance. With the fault tree for state 6 as a worked example, fault trees for the other states can be constructed in the same way. Each fault tree portrays in detail the logical transition from the normal operation state to that particular fault state. For the fault tree of state 6, the combination of component failures that triggers the top event is specified by defining the top event (i.e., the fault of state 6), the bottom events (the basic component faults), and the logical relationships between them (e.g., AND gates and OR gates). Further, by applying probabilistic principles and Boolean algebra, the overall probability of the top event can be calculated quantitatively. The failure rate $\lambda_j$, which is the state transfer rate in the $Q$ matrix, is then calculated from $P_b$, the probability of occurrence of each state.
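The gate logic described above can be sketched numerically. The example below is an illustration under stated assumptions, not the paper's fault tree: it assumes that component failures are independent, reuses the single-unit failure probabilities 0.56 and 0.23 from the text, and evaluates a hypothetical top event (either section can no longer deliver its duty) that is not claimed to match state 6. The function names are likewise hypothetical.

```python
from math import comb, prod


def and_gate(probs):
    """Output event occurs only if all independent input events occur."""
    return prod(probs)


def or_gate(probs):
    """Output event occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


def at_least_k_fail(n, k, p):
    """Probability that at least k of n identical, independent units fail."""
    return sum(comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(k, n + 1))


P_COMPRESSOR = 0.56  # single-compressor failure probability given in the text
P_FILTER = 0.23      # single-filter failure probability given in the text

# Filtration section (assumed one duty unit plus four standby units):
# the section is lost only if all five filter separators fail.
filters_lost = and_gate([P_FILTER] * 5)

# Compressor section (assumed two duty units plus one standby unit):
# the two-unit duty cannot be met once at least two of the three units fail.
compressors_short = at_least_k_fail(3, 2, P_COMPRESSOR)

# Hypothetical top event: either section can no longer deliver its duty.
top_event = or_gate([filters_lost, compressors_short])

print(f"filters lost:      {filters_lost:.6f}")
print(f"compressors short: {compressors_short:.6f}")
print(f"top event:         {top_event:.6f}")
```

The k-out-of-n helper plays the role of a voting gate; for deeper trees, the same three functions can be nested to mirror the AND/OR structure level by level, provided the independence assumption remains acceptable.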
The transfer rate matrix $Q$ is obtained from these rates. Following the definitions and solution method for the resilience metrics proposed in the previous section, the three capacities of the compressor station were calculated.
The results presented in Fig. 6 illustrate how the resistance probability of the compressor station system changes over time. Specifically, sub-figure (a) reveals that the system’s resistance probability decreases to 50% of its initial value within a 10-year operational period and gradually approaches zero over a span of approximately 70 years. This phenomenon can be attributed to the quadruple redundancy design of the filtration and separation components, which provides prolonged reliability and resistance. The structural configuration in this case study approximates a series arrangement, resulting in a relatively rapid decline of the overall system’s resistance probability to zero.
Sub-figure (b) shows how the absorption probability of the compressor station system evolves. Notably, even after 80 years of operation, the system retains a probability of approximately 18% of remaining at a normal absorption level. This slow decline indicates that the backup designs implemented in each independent unit effectively enhance the system’s fault tolerance, allowing it to maintain essential operational functions even in the event of component failures.
Finally, sub-figure (c) depicts the recovery probability of the compressor station system. Within the initial five years, the system’s recovery probability rapidly increases to 87%. This substantial rate of recovery suggests that the system is well equipped to return swiftly to normal operational states, which is critical for the continuous operation of the compressor facility. The recovery rate depends not only on the redundancy and robustness of the design but also on the recovery capabilities of each component within the system. Collectively, these results demonstrate that through thoughtful design and consideration of redundancy, the long-term reliability and stability of the compressor station system can be significantly improved.
4. SUMMARY
5. REFERENCES
[1] ZHAO S, LIU X, ZHUO Y, “Hybrid Hidden Markov Models for resilience metrics in a dynamic infrastructure system[J],” Reliability Engineering & System Safety, 164, 84–97 (2017). https://doi.org/10.1016/j.ress.2017.02.009
[2] XU Z, CHOPRA S S, “Network-based Assessment of Metro Infrastructure with a Spatial–temporal Resilience Cycle Framework[J],” Reliability Engineering & System Safety, 223, 108434 (2022). https://doi.org/10.1016/j.ress.2022.108434
[3] VUGRIN E D, WARREN D E, EHLEN M A, et al., Sustainable and Resilient Critical Infrastructure Systems: Simulation, Modeling, and Intelligent Engineering, 77–116, Springer Berlin Heidelberg, Berlin, Heidelberg (2010). https://doi.org/10.1007/978-3-642-11405-2
[4] HOSSEINI S, BARKER K, RAMIREZ-MARQUEZ J E, “A review of definitions and measures of system resilience[J],” Reliability Engineering & System Safety, 145, 47–61 (2016). https://doi.org/10.1016/j.ress.2015.08.006
[5] YOUN B D, HU C, WANG P, “Resilience-Driven System Design of Complex Engineered Systems[J],” Journal of Mechanical Design, 133(10) (2011). https://doi.org/10.1115/1.4004981
[6] HOSSEINI S, BARKER K, RAMIREZ-MARQUEZ J E, “A review of definitions and measures of system resilience[J],” Reliability Engineering & System Safety, 145, 47–61 (2016). https://doi.org/10.1016/j.ress.2015.08.006
[7] BRUNEAU M, CHANG S E, EGUCHI R T, et al., “A Framework to Quantitatively Assess and Enhance the Seismic Resilience of Communities[J],” Earthquake Spectra, 19(4), 733–752 (2003). https://doi.org/10.1193/1.1623497
[8] LIN P, WANG N, “Stochastic post-disaster functionality recovery of community building portfolios I: Modeling[J],” Structural Safety, 69, 96–105 (2017). https://doi.org/10.1016/j.strusafe.2017.05.002
[9] ZENG Z, FANG Y, ZHAI Q, et al., “A Markov reward process-based framework for resilience analysis of multistate energy systems under the threat of extreme events[J],” Reliability Engineering & System Safety, 209, 107443 (2021). https://doi.org/10.1016/j.ress.2021.107443
[10] A. W, Y. L, G. T, et al., “Vulnerability Assessment Scheme for Power System Transmission Networks Based on the Fault Chain Theory[J],” IEEE Transactions on Power Systems, 26(1), 442–450 (2011). https://doi.org/10.1109/TPWRS.2010.2052291
[11] J. M, S. W, Y. Q, et al., “Angle Stability Analysis of Power System With Multiple Operating Conditions Considering Cascading Failure[J],” IEEE Transactions on Power Systems, 32(2), 873–882 (2017).
[12] DOBROW R P, Introduction to Stochastic Processes With R[M], Wiley (2016). https://doi.org/10.1002/9781118740712
[13] MINGFEI L, XIANGDONG X, JIAN M, et al., “A reliability evaluation system of complex gas pipeline network system in the operation period[J],” Oil & Gas Storage and Transportation, 38(07), 738–744 (2019).
|