Sunday, September 19, 2010

QUALITATIVE RISK ANALYSIS METHODOLOGIES

1.    Qualitative risk analysis methodologies

In this section, we will deal with the qualitative methods used in risk analysis, namely preliminary risk analysis (PHA), hazard and operability study (HAZOP), and failure mode and effects analysis (FMEA/FMECA).

1.1    Preliminary Risk Analysis

Preliminary risk analysis or hazard analysis[1, 2, 3, 4, 5] is a qualitative technique which involves a disciplined analysis of the event sequences which could transform a potential hazard into an accident[1]. In this technique, the possible undesirable events are identified first and then analysed separately. For each undesirable event or hazard, possible improvements or preventive measures are then formulated.
The results from this methodology provide a basis for determining which categories of hazard should be looked into more closely and which analysis methods are most suitable. Such an analysis also proves valuable in the working environment, where activities lacking safety measures can be readily identified. With the aid of a frequency/consequence diagram, the identified hazards can then be ranked according to risk, allowing measures to be prioritized to prevent accidents.
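As a rough illustration of this ranking step, the Python sketch below scores a few hypothetical hazards on simple frequency and consequence scales and orders them by a frequency-consequence product; the hazards, scales and scores are invented for illustration and are not taken from any of the cited references.

    # Illustrative sketch only: a minimal frequency/consequence ranking for a
    # preliminary risk analysis. The hazard list, category scales and scores
    # are hypothetical examples, not values from any standard.

    hazards = [
        # (hazard description, frequency score 1-5, consequence score 1-5)
        ("Toxic gas release during transfer", 2, 5),
        ("Slip on unguarded walkway",         4, 2),
        ("Pump seal leak",                    3, 3),
    ]

    def risk_index(frequency, consequence):
        """Simple risk index: product of frequency and consequence scores."""
        return frequency * consequence

    # Rank hazards so that preventive measures can be prioritized.
    for name, freq, cons in sorted(hazards, key=lambda h: risk_index(h[1], h[2]), reverse=True):
        print(f"{name}: frequency={freq}, consequence={cons}, risk={risk_index(freq, cons)}")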

1.2    Hazard and Operability Studies (HAZOP)

The HAZOP technique was developed in the early 1970s[7] by Imperial Chemical Industries Ltd[1]. HAZOP[1, 2, 7, 8, 17] can be defined as the application of a formal, systematic, critical examination of the process and engineering intentions of new or existing facilities to assess the hazard potential arising from deviations in the design specifications and their consequential effects on the facility as a whole.
This technique is usually performed using a set of guide words: NO/NOT, MORE OF, LESS OF, AS WELL AS, PART OF, REVERSE and OTHER THAN. From these guide words, scenarios that may result in a hazard or an operational problem are identified. Considering the possible flow problems in a process line, the guide word MORE OF corresponds to a high flow rate, while LESS OF corresponds to a low flow rate. The consequences of the hazard and measures to reduce the frequency with which the hazard will occur are then discussed. This technique has gained wide acceptance in the process industries as an effective tool for plant safety and operability improvements. Detailed procedures on how to perform the technique are available in the literature [7, 17].
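The guide-word step lends itself to a simple worksheet generator. The Python sketch below pairs the standard guide words with a few hypothetical process parameters to enumerate candidate deviations; the parameters and the example interpretations are illustrative assumptions, and in practice each deviation is worked through by the HAZOP team.

    # Illustrative sketch only: enumerating HAZOP deviations by combining guide
    # words with process parameters. The parameters and interpretations below
    # are hypothetical examples, following the flow-rate illustration in the text.

    guide_words = ["NO/NOT", "MORE OF", "LESS OF", "AS WELL AS",
                   "PART OF", "REVERSE", "OTHER THAN"]
    parameters = ["flow", "temperature", "pressure"]

    # Example interpretations for the "flow" parameter:
    # MORE OF -> high flow rate, LESS OF -> low flow rate.
    interpretations = {
        ("NO/NOT", "flow"):  "no flow",
        ("MORE OF", "flow"): "high flow rate",
        ("LESS OF", "flow"): "low flow rate",
        ("REVERSE", "flow"): "reverse flow",
    }

    for gw in guide_words:
        for p in parameters:
            deviation = interpretations.get((gw, p), f"{gw} {p} (to be interpreted by the team)")
            print(f"{gw:11s} x {p:11s} -> {deviation}")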

1.3    Failure Mode and Effects Analysis (FMEA/FMECA)

This method was developed in the 1950s by reliability engineers to determine problems that could arise from malfunctions of military systems. Failure mode and effects analysis[1, 2, 3, 5, 6, 7, 8, 9, 17, 25, 26, 27, 28, 29, 30, 31, 58] is a procedure by which each potential failure mode in a system is analysed to determine its effect on the system and to classify it according to its severity[1].
When the FMEA is extended by a criticality analysis, the technique is called failure mode, effects and criticality analysis (FMECA). Failure mode and effects analysis has gained wide acceptance in the aerospace and military industries[7]. In fact, the technique has been adapted into other forms, such as misuse mode and effects analysis[15].
Detailed procedures on how to carry out an FMEA and its various applications in different industries are documented in [16], while the evaluation of the criticality index is described in [1] and [3]. The use of knowledge-based systems for the automation of the FMEA process is discussed in [25, 26, 27], whereas the use of a causal reasoning model for FMEA is documented in [28]. An improved FMEA methodology, which uses a single matrix to model the entire system and a set of indices derived from probabilistic combination to reflect the importance of an event relative to the indenture under consideration and to the entire system, is presented in [29, 30]. A similar approach, modeling the entire system using a fuzzy cognitive map, is taken in [31].
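As a minimal illustration of how an FMEA worksheet can be ranked, the Python sketch below computes a risk priority number (severity x occurrence x detection) for a few hypothetical failure modes; the components, ratings and the RPN measure itself are illustrative assumptions and do not reproduce the criticality index of [1] and [3].

    # Illustrative sketch only: a minimal FMEA worksheet with a risk priority
    # number (RPN = severity x occurrence x detection). The failure modes and
    # the 1-10 ratings are hypothetical; the criticality index referenced in
    # [1] and [3] is defined differently and is not reproduced here.

    failure_modes = [
        # (component, failure mode, severity, occurrence, detection)
        ("Cooling fan",  "Bearing seizure",     7, 4, 3),
        ("Relief valve", "Fails to open",       9, 2, 6),
        ("Level sensor", "Drifts out of range", 5, 5, 4),
    ]

    def rpn(severity, occurrence, detection):
        return severity * occurrence * detection

    # Rank failure modes so that corrective actions target the highest RPN first.
    for comp, mode, s, o, d in sorted(failure_modes,
                                      key=lambda r: rpn(r[2], r[3], r[4]),
                                      reverse=True):
        print(f"{comp:13s} {mode:22s} RPN = {rpn(s, o, d)}")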

1.4    Discussion and Conclusion

The three techniques outlined above require only personnel who are familiar with the hardware. However, FMEA tends to be more labour intensive, as the failure of each individual component in the system has to be considered. A point to note is that these qualitative techniques can be used in the design stage as well as the operational stage of a system.
All the techniques mentioned above have seen wide usage in nuclear power plants and chemical processing plants. In fact, FMEA, one of the most documented, has been used by Intel[52] and National Semiconductor[53] to improve the reliability of their products. Preliminary risk analysis has seen application in safety analysis[2] as well as on offshore platforms[1]. HAZOP, on the other hand, has been widely used in the chemical industries[3] for detailed failure and effect studies on piping and instrumentation layouts.

2    Tree based techniques

In this section, fault tree analysis (FTA), event tree analysis (ETA), cause-consequence analysis (CCA), the management oversight risk tree (MORT) and the safety management organisation review technique (SMORT) will be discussed.

2.1    Fault tree analysis

The concept of fault tree analysis (FTA)[1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 17, 23] originated at Bell Telephone Laboratories in 1962 as a technique with which to perform a safety evaluation of the Minuteman Intercontinental Ballistic Missile Launch Control System[23]. A fault tree is a logical diagram which shows the relation between a system failure, i.e. a specific undesirable event in the system, and failures of the components of the system[2]. It is a technique based on deductive logic: an undesirable event is first defined, and the causal relationships of the failures leading to that event are then identified.
Figure 1: A fault tree depicting the event "Fire breaks out".
Fault trees can be used in qualitative or quantitative risk analysis. The difference between the two is that the qualitative fault tree is looser in structure and does not require the same rigorous logic as the formal fault tree[7]. Figure 1 shows a fault tree with the top event "Fire breaks out". This method is used in a wide range of industries, and there is extensive support in the form of published literature and software packages, such as CARA[2]. An application of fault tree analysis to causal relations in large vehicle accidents is documented in [11], while detailed descriptions of how to carry out fault tree analysis are given in [1, 3, 7].
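For the quantitative case, the top-event probability of a small fault tree can be computed by propagating basic-event probabilities through AND and OR gates, assuming independent basic events. The Python sketch below illustrates this on an invented tree; the events and probabilities are hypothetical and do not reproduce Figure 1.

    # Illustrative sketch only: quantifying a small fault tree with independent
    # basic events. The event names and probabilities are hypothetical.

    def or_gate(*probs):
        """P(at least one input event occurs), assuming independence."""
        p_none = 1.0
        for p in probs:
            p_none *= (1.0 - p)
        return 1.0 - p_none

    def and_gate(*probs):
        """P(all input events occur), assuming independence."""
        result = 1.0
        for p in probs:
            result *= p
        return result

    # Basic event probabilities (per demand), hypothetical values.
    p_ignition_source = 0.01
    p_fuel_leak       = 0.005
    p_solvent_spill   = 0.002

    # Top event "Fire breaks out": an ignition source is present AND
    # combustible material is available (fuel leak OR solvent spill).
    p_combustible = or_gate(p_fuel_leak, p_solvent_spill)
    p_top         = and_gate(p_ignition_source, p_combustible)
    print(f"P(top event) = {p_top:.2e}")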

2.2    Event tree analysis

Event tree analysis[3, 5, 6, 7, 8, 10, 17] is a method for illustrating the sequence of outcomes which may arise after the occurrence of a selected initiating event. This technique, unlike the fault tree, uses inductive logic. It is mainly used in consequence analysis for pre-incident and post-incident applications. The left side of the diagram connects with the initiator and the right side with the plant damage states; the headings across the top define the systems, and the nodes (dots) call for branching probabilities obtained from the system analysis. If the path goes up at a node, the system succeeded; if it goes down, the system failed.
ETA has seen application in the nuclear industry for operability analysis of nuclear power plants as well as for the accident sequence of the Three Mile Island-2 reactor accident[6].
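A minimal quantification of an event tree multiplies the initiating-event frequency by the success or failure probability at each node along a path. The Python sketch below enumerates all branches for a hypothetical initiator and three assumed safety systems; the names and numbers are illustrative only.

    # Illustrative sketch only: quantifying event tree sequences. The initiating
    # event, the safety systems and all numbers are hypothetical.
    from itertools import product

    initiating_frequency = 1.0e-2   # initiating events per year (assumed)

    # Probability that each system fails on demand (branch goes "down").
    systems = {"detection": 0.05, "automatic shutdown": 0.02, "fire suppression": 0.1}

    # Enumerate every success/failure path through the tree.
    for outcome in product([True, False], repeat=len(systems)):
        freq = initiating_frequency
        labels = []
        for (name, p_fail), works in zip(systems.items(), outcome):
            freq *= (1.0 - p_fail) if works else p_fail
            labels.append(f"{name}:{'success' if works else 'failure'}")
        print(f"{' / '.join(labels)} -> {freq:.2e} per year")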

2.3    Cause-Consequence Analysis

Cause-consequence analysis (CCA)[2, 3, 5, 8, 17] is a blend of fault tree and event tree analysis[17]. This technique combines cause analysis (described by fault trees) and consequence analysis (described by event trees), and hence both deductive and inductive analysis are used. The purpose of CCA is to identify chains of events that can result in undesirable consequences. With the probabilities of the various events in the CCA diagram, the probabilities of the various consequences can be calculated, thus establishing the risk level of the system. Figure 2 below shows a typical CCA.
Figure 2: A typical cause-consequence analysis.
This technique was developed at the Risø Laboratories in Denmark for use in the risk analysis of nuclear power stations[2]. However, it can also be adapted by other industries for estimating the safety of protective and other systems[2]. Details on how to carry out cause-consequence analysis, as well as its benefits and restrictions, are documented in [2, 17].
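The calculation implied by a CCA diagram can be sketched by chaining the two halves: a fault-tree style estimate of the critical event feeds the event-tree branches on the consequence side. The Python fragment below shows this with hypothetical events and probabilities.

    # Illustrative sketch only: a cause-consequence calculation that chains a
    # fault-tree estimate of the critical event into event-tree branches.
    # All names and numbers are hypothetical.

    # Cause side (fault tree): critical event = valve fails AND operator misses alarm.
    p_valve_fails  = 1.0e-3
    p_alarm_missed = 5.0e-2
    p_critical     = p_valve_fails * p_alarm_missed      # independence assumed

    # Consequence side (event tree): does the relief system work?
    p_relief_fails = 1.0e-2
    consequences = {
        "controlled venting (relief works)": p_critical * (1.0 - p_relief_fails),
        "vessel rupture (relief fails)":     p_critical * p_relief_fails,
    }
    for outcome, prob in consequences.items():
        print(f"{outcome}: {prob:.2e}")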

2.4    Management Oversight Risk Tree

The management oversight risk tree (MORT) was developed in the early 1970s[2] for the U.S. Energy Research and Development Administration as a safety analysis method that would be compatible with complex, goal-oriented management systems[17]. MORT[2, 8, 12, 17, 21, 22, 23] is a diagram which arranges safety program elements in an orderly and logical manner. The analysis is carried out by means of a fault tree, where the top event is "Damage, destruction, other costs, lost production or reduced credibility of the enterprise in the eyes of society"[2]. The tree gives an overview of the causes of the top event, from management oversights and omissions, from assumed risks, or from both.
The MORT tree has more than 1,500 possible basic events feeding into 100 generic events, which have been progressively identified in the fields of accident prevention, administration and management. A generic MORT diagram is included at the end of this report. MORT is used in the analysis or investigation of accidents and events, and in the evaluation of safety programs. Its usefulness is illustrated in [17]: "normal investigations revealed an average of 18 problems (and recommendations). Complementary investigations with MORT analysis revealed additional 20 contributions per case".

2.5    Safety Management Organization Review Technique

The safety management organization review technique (SMORT)[2, 17] is a simplified modification of MORT developed in Scandinavia[17]. This technique is structured by means of analysis levels with associated checklists, whereas MORT is based on a comprehensive tree structure. Owing to its structured analytical process, SMORT is classified as one of the tree-based methodologies.
A SMORT analysis includes data collection based on the checklists and their associated questions, in addition to evaluation of the results. The information can be collected from interviews, studies of documents and investigations. This technique can be used to perform detailed investigations of accidents and near misses. It also serves well as a method for safety audits and the planning of safety measures[2].


2.6    Discussion and Conclusion

The tree-based methods are mainly used to find the cut sets leading to the undesired events. In fact, event trees and fault trees have been widely used in probabilistic risk assessment to quantify the probabilities of occurrence of accidents and other undesired events leading to loss of life or economic losses. However, the usage of fault trees and event trees is confined to static, logical modeling of accident scenarios[13]. Because hardware failures and human errors receive the same treatment in fault tree and event tree analysis, the conditions affecting human behaviour cannot be modeled explicitly. This affects the assessed level of dependency between events. Although techniques such as human cognitive reliability[5, 12] exist to reconcile such deficiencies in fault tree analysis, new methodologies that model such responses directly have emerged.

3    Methodologies for the analysis of dynamic systems

In this section, the GO method, the digraph/fault graph method, event sequence diagrams, Markov modeling, the dynamic event logic analytical methodology and the dynamic event tree analysis method will be discussed.

3.1    GO method

The GO method[5, 13] is a success-oriented system analysis that uses seventeen operators to aid in model construction[5]. It was developed by Kaman Sciences Corporation during the 1960s for the reliability analysis of electronics for the U.S. Department of Defense.
The GO model can be constructed from engineering drawings by replacing system elements with one or more GO operators. Such operators are of three basic types: (1) independent, (2) dependent and (3) logic. Independent operators are used to model components requiring no input, while dependent operators require at least one input in order to produce an output. Logic operators, on the other hand, combine the operators into the success logic of the system being modeled. With the probability data for each independent and dependent operator, the probability of successful operation can then be calculated.
The GO method is used in practical applications where the boundary conditions of the system to be modeled are well defined by a system schematic or other design documents. However, failure modes are modeled only implicitly, making the method unsuitable for detailed analysis of failure modes beyond the level of the component events shown in the system drawing. Furthermore, it neither treats common cause failures nor provides structural information (i.e. the minimal cut sets) about the system. A brief description of GO-FLOW, which is based on the GO method, is given in [13].
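As a rough sketch of the success-oriented propagation described above, the Python fragment below chains a few GO-style operators with hypothetical success probabilities. It uses only a source operator, a dependent component operator and one logic operator, so it merely hints at the full set of seventeen operators used in the actual method.

    # Illustrative sketch only: success-oriented probability propagation in the
    # spirit of the GO method, with hypothetical probability data. Real GO models
    # use seventeen operator types; only three simple ones are imitated here.

    def independent_operator(p_success):
        """Source element needing no input, e.g. a power supply."""
        return p_success

    def dependent_operator(p_input, p_success):
        """Component that passes its input signal with its own success probability."""
        return p_input * p_success

    def or_logic(*signals):
        """Logic operator: the system succeeds if at least one train succeeds
        (trains assumed independent)."""
        p_all_fail = 1.0
        for s in signals:
            p_all_fail *= (1.0 - s)
        return 1.0 - p_all_fail

    # Two independent trains, each a power source feeding a pump.
    train_a = dependent_operator(independent_operator(0.999), 0.98)
    train_b = dependent_operator(independent_operator(0.995), 0.97)
    system  = or_logic(train_a, train_b)
    print(f"P(successful operation) = {system:.5f}")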

3.2    Digraph/Fault Graph

The fault graph method, or digraph matrix analysis[5, 13], uses the mathematics and language of graph theory, such as "path set" (the set of nodes traversed on a path) and "reachability" (the complete set of all possible paths between any two nodes)[5].
This method is similar to a GO chart but uses AND and OR gates instead. The connectivity matrix, derived from the adjacency matrix of the system, shows whether a fault node will lead to the top event. These matrices are then analysed by computer to give singletons (single components that can cause system failure) or doubletons (pairs of components that can cause system failure). The digraph method allows cycles and feedback loops, which makes it attractive for dynamic systems. Figure 3 shows a success-oriented system digraph of a simplified emergency core cooling system.
Figure 3: Success-oriented system digraph of a simplified emergency core cooling system in a nuclear power plant (Ralph R. Fullwood & Robert E. Hall).
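The connectivity (reachability) analysis can be illustrated with a toy digraph. The Python sketch below builds a small hypothetical adjacency matrix, computes the reachability matrix with Warshall's algorithm, and lists the singletons, i.e. single component faults that can reach the top event; the nodes and structure are invented and unrelated to Figure 3.

    # Illustrative sketch only: reachability analysis on a small, hypothetical
    # fault digraph. Node indices: 0 = pump fault, 1 = loss of flow,
    # 2 = sensor fault, 3 = top event (core cooling lost).
    N = 4
    adjacency = [[0, 1, 0, 0],    # pump fault -> loss of flow
                 [0, 0, 0, 1],    # loss of flow -> top event
                 [0, 0, 0, 1],    # sensor fault -> top event
                 [0, 0, 0, 0]]

    # Warshall's algorithm: connectivity (reachability) matrix from the adjacency matrix.
    reach = [row[:] for row in adjacency]
    for k in range(N):
        for i in range(N):
            for j in range(N):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])

    TOP = 3
    components = [0, 2]                     # nodes that represent component faults
    singletons = [i for i in components if reach[i][TOP]]
    print("Singletons (single component faults reaching the top event):", singletons)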

3.3    Markov Modeling

Markov modeling[1, 3, 5, 7, 8, 14, 18] is a classical modeling technique used for assessing the time-dependent behaviour of many dynamic systems[13]. In a 'Markov chain' process, transitions between states are assumed to occur only at discrete points in time, whereas in a 'continuous Markov process', transitions between states are allowed to occur at any point in time. For process systems, the discrete system states can be defined in terms of ranges of process variables as well as component status.
This methodology incorporates time explicitly, and it can be extended to cover situations where the problem parameters are time dependent. The state probabilities P(t) of the system in a continuous Markov analysis are obtained by solving a coupled set of first-order, constant-coefficient differential equations:
                        dP(t)/dt = M.P(t),
where M is the matrix of coefficients whose off-diagonal elements are the transition rates and whose diagonal elements are such that the matrix columns sum to zero. An application of Markov modeling to a hold-up tank problem is discussed in [13], while Pate-Cornell (1993) used the technique to study fire propagation for a subsystem on board an offshore platform in [14].
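As a concrete example of solving dP(t)/dt = M.P(t), the Python sketch below models a single repairable component with two states (working and failed), assumed failure and repair rates, and integrates the equations numerically with SciPy.

    # Illustrative sketch only: a two-state (working/failed) continuous Markov
    # model with hypothetical failure rate lam and repair rate mu, solved
    # numerically as dP/dt = M.P(t).
    import numpy as np
    from scipy.integrate import solve_ivp

    lam, mu = 1.0e-3, 1.0e-1        # failure and repair rates (per hour, assumed)

    # Off-diagonal entries are transition rates; columns sum to zero.
    M = np.array([[-lam,  mu ],
                  [ lam, -mu ]])

    def dPdt(t, P):
        return M @ P

    P0 = np.array([1.0, 0.0])        # start in the working state
    sol = solve_ivp(dPdt, (0.0, 1000.0), P0, t_eval=[1000.0])
    print("P(working), P(failed) after 1000 h:", sol.y[:, -1])
    print("Steady-state unavailability ~", lam / (lam + mu))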

3.4    Dynamic Event Logic Analytical Methodology

The dynamic event logic analytical methodology (DYLAM)[13, 18, 19] provides an integrated framework to treat explicitly time, process variables and system behaviour[13]. A DYLAM analysis usually comprises the following steps: (a) component modeling, (b) system equation resolution algorithms, (c) setting of TOP conditions and (d) event sequence generation and analysis.
DYLAM is useful for the description of dynamic incident scenarios and for the reliability assessment of systems whose mission is defined in terms of values of process variables to be kept within certain limits in time[19]. This technique can also be used for the identification of system behaviour and thus as a design tool for implementing protections and operator procedures[19].
It is important to note that a system-specific DYLAM simulator must be created to analyse each particular problem. Furthermore, input data such as the probabilities of a component being in a certain state at transient initiation, the independence of such probabilities, the transition rates between different states, and the conditional probability matrices for dependencies among states and process variables must be provided to run the DYLAM package. An application of DYLAM to a reservoir problem is given in [18].

3.5    Dynamic Event Tree Analysis Method

The dynamic event tree analysis method (DETAM)[13, 20] is an approach that treats the time-dependent evolution of plant hardware states, process variable values and operator states over the course of a scenario[20]. In general, a dynamic event tree is an event tree in which branchings are allowed at different points in time. The approach is defined by a set of five characteristics: (a) the branching set, (b) the set of variables defining the system state, (c) the branching rules, (d) the sequence expansion rule and (e) the quantification tools. The branching set refers to the set of variables that determine the space of possible branches at any node in the tree. The branching rules determine when a branching should take place (for example, at a constant time step), and the sequence expansion rules are used to limit the number of sequences.
This approach can be used to represent a wide variety of operator behaviours, to model the consequences of operator actions, and to serve as a framework in which the analyst can employ a causal model for errors of commission. It thus allows emergency procedures to be tested and shows where and how changes can be made to improve their effectiveness. An analysis of the accident sequence for a steam generator tube rupture is presented in [20].
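To make the branching mechanics concrete, the Python sketch below expands a toy dynamic event tree at constant time steps for a single assumed operator action, pruning sequences whose probability falls below a cutoff as a stand-in for the sequence expansion rule; all values are hypothetical and the example is far simpler than the analyses in [20].

    # Illustrative sketch only: a tiny dynamic event tree expanded at constant
    # time steps. The branching set is a single operator action, the branching
    # rule is "branch at every time step", and the sequence expansion rule prunes
    # sequences below a probability cutoff. All values are hypothetical.

    TIME_STEP = 60.0      # seconds between branchings
    N_STEPS   = 3
    P_ACT     = 0.7       # probability the operator acts at each branching
    CUTOFF    = 1.0e-3    # sequence expansion rule: discard rarer sequences

    sequences = [([], 1.0)]               # (history of branch labels, probability)
    for step in range(N_STEPS):
        expanded = []
        for history, prob in sequences:
            if history and history[-1] == "acted":
                expanded.append((history, prob))          # scenario already terminated
                continue
            for label, p in (("acted", P_ACT), ("no action", 1.0 - P_ACT)):
                new_prob = prob * p
                if new_prob >= CUTOFF:                    # prune per the expansion rule
                    expanded.append((history + [label], new_prob))
        sequences = expanded

    for history, prob in sequences:
        print(" -> ".join(history), f"(t <= {len(history) * TIME_STEP:.0f} s): {prob:.3f}")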

3.6    Discussion and Conclusion

The techniques discussed above address the deficiencies found in the fault/event tree methodologies when analysing dynamic scenarios. However, there are also limitations to their usage. The digraph and GO techniques model system behaviour but deal only to a limited extent with changes in the model structure over time. Markov modeling, on the other hand, requires the explicit identification of the possible system states and the transitions between them. This is a problem, as it is difficult to envision the entire set of possible states prior to scenario development. DYLAM and DETAM solve this problem through the use of implicit state-transition definitions. The drawbacks of these implicit techniques are implementation oriented[13]: the large tree structures generated by the DYLAM and DETAM approaches require large computer resources, and the implicit methodologies may require a considerable amount of analyst effort in data gathering and model construction.

Conclusions

A total of 13 risk analysis techniques were reviewed in the discussion above. The qualitative methodologies, though lacking the ability to account for dependencies between events, are effective in identifying potential hazards and failures within a system. The tree-based techniques address this deficiency by taking the dependencies between events into consideration. The probability of occurrence of an undesired event can also be quantified where operational data are available. However, no one has yet attempted to quantify the undesired top event in a MORT tree[12].
Currently, research is being conducted on DYLAM[13, 18, 19] and DETAM[13, 20] to study accident scenarios by treating time, process variables, system behaviour and operator actions within an integrated framework. These techniques address the problem of inadequate modeling of the conditions affecting control system actions and operator behaviour when using the fault/event tree (e.g. the behaviour of plant process variables and previous decisions by the operating crew)[13]. However, the drawbacks of these techniques are the requirement for large computer resources and extensive data collection. With the development of more efficient algorithms and more powerful computers, such methodologies should come to be widely applied.
Any comments, please e-mail me, Tan HiapKeong at thk@pacific.net.sg
