Space flight is one of the riskiest human endeavors. The hardware and systems used by NASA and other space agencies are among the most complicated ever devised by humans, and with every added component or complexity comes the added chance of failure. It is a testament to the hard work and ingenuity of the engineers working in the space program that such complicated systems get launched successfully.
Even so, it is important not to simply accept every failure as part of the price for "pushing the envelope." When risks can be identified and assessed, engineers can devise ways to find and improve weak points in a complex system. Indeed, risk assessment can play a crucial part in the decision-making process, enabling managers to reduce the existing level of uncertainty when choosing whether to go ahead with a given mission or with specific features of a mission. Sometimes the decision involves a tradeoff between options. For example, what is the best allocation between time spent performing experiments on the International Space Station and maintenance time needed to create a safer and healthier work environment? Quantitative risk assessment can be used to answer this question. Not every decision informed by risk assessment is necessarily the right one, but using this methodology makes better decisions more likely.
To the people who study it professionally, risk is the probability, or frequency (probability per unit time), and the consequence (severity) of an undesired event, and the uncertainties associated with the estimated probabilities and consequences. For a space mission such as the International Space Station, now in Earth orbit, the risks that need to be assessed go far beyond the potential loss or injury of astronauts, though that is certainly the most important outcome to be avoided. Other undesired events, or end states as they are called in risk assessment, include such mishaps as damage to one or more of the station modules, the failure of an important system, or the inability to complete a scheduled mission.
What's more, in a system as complex as the ISS, almost everything that could go wrong can be assessed to determine whether it is linked with one or more of the undesired end states. Take, for instance, the temporary addition of a logistics module to the station. This module, which is used to ferry new material and remove old equipment, has its own power and life support systems. But how will these integrate with the systems aboard the space station? And how likely is a mishap while the module is installed? Those are the kinds of questions a robust quantitative risk analysis can help answer.
NASA has adopted a "continuous risk management" process for all its programs and projects. This process begins with the identification and analysis of program or project risks that impact success criteria. The risk management process continues with risk analysis, planning, tracking, and control. All unacceptable risks are dealt with before a project or program can proceed.
One analytic tool that helps in identifying and analyzing these kinds of risks quantitatively is a modeling process known as probabilistic risk assessment (PRA). The most robust use of PRA is to make risk comparisons among competing choices or to identify major contributors to the overall risk of a given choice. Once the risks are identified, decision makers can choose either to eliminate a given form of risk at all cost (if the likelihood of the potential end state, such as loss of life, is too great to bear) or to reduce the risk on the basis of a risk-benefit tradeoff. Not all forms of risk can be eliminated, however. Taking care of one form of risk can cause other types of risk to arise or increase.
Engineers use a probabilistic risk assessment to get answers to three basic questions: What can go wrong? What is the likelihood that such an outcome will occur? And what are the consequences if the outcome occurs? To find the answers to those questions using a PRA, engineers follow a multistep process.
First, engineers must identify the objective of the PRA within the context of the mission and the potential detrimental end states (usually characterized by levels of severity) that they want to avoid in order to ensure mission success. Engineers assessing the potential risks in a system need to immerse themselves in design, test, and operation information. This includes talking with experts who have designed, tested, and operated the equipment and, if at all possible, physically inspecting the system. It's important to remember that the system being assessed may be quite different from its blueprints.
The next step is to identify the events or failures that can lead to the defined end states. At the outset, it isn't always obvious what can cause an end state to happen. In systems as complex as a space mission, components may be linked in indirect, yet crucial ways. To trace failures back to every potential trigger, engineers using PRA employ a technique called a master logic diagram (MLD). The MLD is a special type of fault tree, with the ultimate failure at the top and the specific causes that can trigger this top failure at the base (basic events). Each of these causes is then examined, and all the events that can link it to the top failure event are traced through a series of logic steps (gates).
Once the MLD has been fully developed, the next task of the probabilistic risk assessment becomes one of modeling each possible accident or mishap chain of events, or scenario. This step starts with a trigger event, and follows the sequence of intermediate (pivotal) events that can eventually lead to an end state.
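The scenario modeling described above can be sketched as a small event tree. This is only an illustration under stated assumptions: the pivotal events are treated as independent, every branch is developed in full (a real event tree often prunes branches that no longer matter), and the event names and numbers are invented for the example.

```python
from itertools import product


def enumerate_scenarios(initiator_freq, pivotal_failure_probs):
    """Enumerate every success/failure path through a simple event tree.

    initiator_freq: frequency (or probability) of the trigger event.
    pivotal_failure_probs: dict mapping pivotal event name -> failure probability.
    Pivotal events are assumed independent (a simplification).
    Returns a list of (outcomes, probability) pairs, where outcomes maps each
    pivotal event name to True (failed) or False (succeeded).
    """
    names = list(pivotal_failure_probs)
    scenarios = []
    for failed_flags in product([False, True], repeat=len(names)):
        prob = initiator_freq
        for name, failed in zip(names, failed_flags):
            p_fail = pivotal_failure_probs[name]
            prob *= p_fail if failed else (1.0 - p_fail)
        scenarios.append((dict(zip(names, failed_flags)), prob))
    return scenarios


# Hypothetical trigger frequency and pivotal events, for illustration only.
scenarios = enumerate_scenarios(
    initiator_freq=0.01,
    pivotal_failure_probs={"detection": 0.05, "isolation": 0.10},
)
for outcomes, prob in scenarios:
    print(outcomes, f"{prob:.6e}")
```

Each printed path is one scenario. In a fuller model, each path would also be assigned an end state, and the probabilities of paths sharing an end state would be summed.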
Each pivotal event in the scenario is a branch point leading to success or failure; the likelihood of failure is evaluated using a fault tree. The fault tree uses logic symbols (gates) and intermediate events to link the top event with some basic components that have failed (basic events). Maybe it's a switch, or a capacitor. And if you have analyzed the system to its fullest extent, you'll be able to trace through the fault tree how the failure of components can, under the right combination of circumstances, lead to the top (pivotal) event in the mishap scenario, which may be the failure of a crucial system that can prevent or mitigate the progression of the scenario.
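As a rough sketch of how a fault tree turns basic-event probabilities into the probability of a top (pivotal) event, the two most common gates can be evaluated as follows. The sketch assumes the basic events are independent, and the components and numbers are hypothetical.

```python
from math import prod


def and_gate(probs):
    """Output event occurs only if all input events occur (independent inputs)."""
    return prod(probs)


def or_gate(probs):
    """Output event occurs if at least one input event occurs (independent inputs)."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical basic-event failure probabilities.
p_switch = 1e-3
p_capacitor = 5e-4
p_backup = 1e-2

# Intermediate event: the primary path fails if the switch OR the capacitor fails.
p_primary = or_gate([p_switch, p_capacitor])

# Top (pivotal) event: the function is lost if the primary path AND the backup fail.
p_top = and_gate([p_primary, p_backup])

print(f"primary path failure: {p_primary:.4e}")
print(f"top event:            {p_top:.4e}")
```

Real fault trees often contain shared components and common-cause failures, which break the independence assumption used here and call for cut-set based methods instead.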
Up to this point, the analysis has been essentially qualitative—understanding the entire system and modeling potential event sequences and system components that could lead to an undesirable end state. But that sort of analysis has limited value in helping decision makers determine how to proceed or assisting engineers in identifying and remedying weak points in the system. That's why the next step is to quantify all probabilities in the model described so far. These numbers can come from past experience, databases, or some other estimates including expert judgment, and there's a level of uncertainty associated with each piece of data. Essentially, engineers wind up with probabilities within a range of errors for the failure of each part of the system analyzed by this modeling method.
At the end, you can calculate the relative contribution of various systems and different end states to the overall risk value. You may find out, for instance, that two or three systems contribute up to 90 percent of the total risk. In an outcome such as that, the calculation becomes a red flag to engineers that those systems should be reexamined, modified, or perhaps redesigned.
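A minimal sketch of the contribution calculation, assuming the risk attributed to each system has already been quantified and can simply be summed. The system names and values are made up for the example and are not real ISS data.

```python
def risk_contributions(risk_by_system):
    """Return (system, share_of_total) pairs, largest contributor first."""
    total = sum(risk_by_system.values())
    if total <= 0:
        raise ValueError("total risk must be positive")
    shares = {name: risk / total for name, risk in risk_by_system.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-system risk values (not real data).
risk_by_system = {
    "life_support": 4.0e-4,
    "power": 3.5e-4,
    "thermal_control": 1.5e-4,
    "communications": 0.6e-4,
    "structures": 0.4e-4,
}

ranked = risk_contributions(risk_by_system)
cumulative = 0.0
for name, share in ranked:
    cumulative += share
    print(f"{name:16s} {share:6.1%}  cumulative {cumulative:6.1%}")
```

With these invented inputs, the first three systems account for about 90 percent of the total, which is the kind of pattern that would prompt a closer look at those systems.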
The risk analysis isn't complete until uncertainty and sensitivity analyses are performed, since these help identify how trustworthy the analysis results are. One way to propagate the uncertainties through the PRA model is to run a Monte Carlo simulation, essentially letting a computer run through the model thousands of times. In each run, a random number generator selects a value for every analysis parameter from the probability distribution associated with that parameter.
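The Monte Carlo idea can be sketched with Python's standard library. Several things here are assumptions made for illustration: each input is given a lognormal distribution described by a median and an "error factor" (taken as the ratio of the 95th percentile to the median), the model is a toy two-gate fault tree, and all numbers are hypothetical.

```python
import math
import random


def lognormal_sample(rng, median, error_factor):
    """Draw one probability from a lognormal distribution, capped at 1.0.

    error_factor is taken as the ratio of the 95th percentile to the median,
    so sigma = ln(error_factor) / 1.645.
    """
    sigma = math.log(error_factor) / 1.645
    return min(1.0, rng.lognormvariate(math.log(median), sigma))


def top_event_probability(p_switch, p_capacitor, p_backup):
    """Toy model: (switch OR capacitor) AND backup, with independent events."""
    p_primary = 1.0 - (1.0 - p_switch) * (1.0 - p_capacitor)
    return p_primary * p_backup


def propagate(n_trials, seed=0):
    """Run the model n_trials times with sampled inputs; return sorted results."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        results.append(top_event_probability(
            lognormal_sample(rng, median=1e-3, error_factor=3.0),
            lognormal_sample(rng, median=5e-4, error_factor=3.0),
            lognormal_sample(rng, median=1e-2, error_factor=5.0),
        ))
    results.sort()
    return results


results = propagate(10_000)
mean = sum(results) / len(results)
p05 = results[int(0.05 * len(results))]
p50 = results[len(results) // 2]
p95 = results[int(0.95 * len(results))]
print(f"mean={mean:.3e}  5th={p05:.3e}  median={p50:.3e}  95th={p95:.3e}")
```

The spread between the 5th and 95th percentiles gives a sense of how much the input uncertainties matter for the result. The fixed seed only makes the run repeatable.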
Sometimes it is necessary to conduct a sensitivity analysis, in which one varies the value of one analysis component while keeping all others fixed in order to see how greatly that component value affects the overall analysis results. Engineers can thus uncover whether uncertainties in the analysis pertain to the most mission-critical systems.
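A one-at-a-time sensitivity analysis of this kind might look like the following sketch. The toy model, the parameter names, the baseline values, and the choice to scale each input down and up by a factor of 10 are all assumptions for illustration.

```python
def top_event_probability(params):
    """Toy model: (switch OR capacitor) AND backup, with independent events."""
    p_primary = 1.0 - (1.0 - params["switch"]) * (1.0 - params["capacitor"])
    return p_primary * params["backup"]


def one_at_a_time(model, baseline, factor=10.0):
    """Vary each parameter alone, holding all the others at their baseline.

    Each parameter is divided and multiplied by `factor` (capped at 1.0).
    Returns a dict mapping parameter name -> (result_low, result_high).
    """
    ranges = {}
    for name, value in baseline.items():
        low = dict(baseline, **{name: value / factor})
        high = dict(baseline, **{name: min(1.0, value * factor)})
        ranges[name] = (model(low), model(high))
    return ranges


# Hypothetical baseline failure probabilities.
baseline = {"switch": 1e-3, "capacitor": 5e-4, "backup": 1e-2}
ranges = one_at_a_time(top_event_probability, baseline)

# List parameters by how widely the result swings, widest first.
by_swing = sorted(ranges.items(), key=lambda kv: kv[1][1] - kv[1][0], reverse=True)
for name, (result_low, result_high) in by_swing:
    print(f"{name:10s} {result_low:.3e} .. {result_high:.3e}")
```

Parameters with the widest swing are the ones whose uncertainty deserves the most attention.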
Finally, probabilistic risk analysis can include evaluation of the relative risk importance measures associated with various components and systems within the overall system being analyzed. These risk importance measures can be used in risk rankings that can be presented to a manager to help in the decision-making process.
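Two widely used risk importance measures are Fussell-Vesely importance and risk achievement worth. The article does not say which measures are meant, so the sketch below simply uses these two in their commonly quoted forms: the fractional drop in risk when a component is made perfectly reliable, and the ratio by which risk grows when the component is assumed failed. The toy model and numbers are hypothetical.

```python
def top_event_probability(params):
    """Toy model: (switch OR capacitor) AND backup, with independent events."""
    p_primary = 1.0 - (1.0 - params["switch"]) * (1.0 - params["capacitor"])
    return p_primary * params["backup"]


def importance_measures(model, baseline):
    """Compute two importance measures for every component in the model.

    fussell_vesely: (R - R[p=0]) / R, the fractional risk reduction if the
        component never failed.
    risk_achievement_worth: R[p=1] / R, the factor by which risk increases
        if the component is assumed to have failed.
    """
    base_risk = model(baseline)
    measures = {}
    for name in baseline:
        risk_if_perfect = model(dict(baseline, **{name: 0.0}))
        risk_if_failed = model(dict(baseline, **{name: 1.0}))
        measures[name] = {
            "fussell_vesely": (base_risk - risk_if_perfect) / base_risk,
            "risk_achievement_worth": risk_if_failed / base_risk,
        }
    return measures


# Hypothetical baseline failure probabilities.
baseline = {"switch": 1e-3, "capacitor": 5e-4, "backup": 1e-2}
measures = importance_measures(top_event_probability, baseline)

ranking = sorted(measures, key=lambda n: measures[n]["fussell_vesely"], reverse=True)
for name in ranking:
    m = measures[name]
    print(f"{name:10s} FV={m['fussell_vesely']:.3f}  RAW={m['risk_achievement_worth']:.1f}")
```

A ranking like this is the sort of summary that can be handed to a manager: it shows which components dominate the current risk and which would hurt most if they failed.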
One could make the mistake of believing that quantitative risk assessments are impossible, or at least not very useful, when many input data are unavailable or have large uncertainties. In fact, that is precisely the situation in which a PRA is most useful. We need a PRA most when we face the greatest uncertainty. Why would we need a PRA if all the necessary information were available? And if an area of analysis has fairly large levels of uncertainty, it is a sign that more research needs to be done on that area's rate of failure.
Probabilistic risk assessments are useful in every phase of a mission life cycle, not just at design or before launch. A PRA performed in the design phase can help identify the risks associated with systems and components and with technological options. Quantitative risk associated with different design or technological options can be compared and the results used as input into the management decision and tradeoff process. This can be done even if some mission-specific data do not yet exist.
During operation, a PRA can help optimize resource allocations and can help predict detrimental effects to the program. When it's time to upgrade, whether to replace aging and obsolete equipment or to add an enhancement to the system, a PRA can identify technologically acceptable options that minimize risks, and it provides a consistent and unbiased tool for evaluating the risks and benefits of each upgrade.
In the recent past, for example, a shuttle flight to resupply the ISS presented a choice between two safety improvement options: There was room to bring up either a new window cover for protection against micrometeorite impact or an additional carbon dioxide removal system, but not both. It was critical to find out which addition would best improve safety on board the station. A PRA showed that the new window cover could wait because the carbon dioxide remover provided a greater safety value (more risk reduction) to the ISS in the near term.
When an asset such as a satellite is at the end of its useful life, its disposal must be carried out safely and cost-effectively. A PRA can help find dismantling and disposal options that minimize risk.
It may appear that the most risk-free thing to do would be to do nothing at all. But that is not true, because there are risks in doing nothing. While there are risks in exploring space, there are also risks in not doing it. Moreover, our nation has made a commitment to exploring space, so risk (risk of mission failure, risk affecting human lives) cannot be avoided. Thanks to the analysis tools we have at hand, however, we can help decision makers determine what options or tradeoffs are acceptable, and how to push forward the options involving the least amount of risk and cost.