The FMEDA (Failure Modes Effects and Diagnostic Analysis) is a proven method for the analysis of fail-safe hardware circuits that is required by all current standards for functional safety. In some industries the method is also called quantitative FMEA (Failure Modes and Effects Analysis).
Below are the most important tips for effective and efficient use of this tool based on our experience in projects in the industrial (IEC 61508, ISO 13849), automotive (ISO 26262) and aerospace (DO-254, ARP4761) sectors.
An FMEA proceeds inductively, i.e. from the cause (a failure) to its effect. The question for each subsystem/component is: What safety-relevant "effects" can result from a failure?
For example, if a component's value changes over time (i.e. it ages), how does this affect the function? Or if a single bit of a state machine flips, how does that affect the function?
There are two variants of the FMEA, a purely qualitative and a quantitative one. In the latter, failure probabilities for the different failure modes (short-circuit, open-circuit, drift, stuck-at...) of the components are added to the analysis.
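To illustrate the quantitative variant, here is a minimal sketch of how a component's total failure rate could be distributed over its failure modes. The function name, the total of 2 FIT and the mode shares are made-up assumptions for illustration only; real values come from a reliability database and the applicable standard.

```python
def split_failure_rate(total_fit, mode_distribution):
    """Distribute a component's total failure rate (in FIT) over its failure modes.

    mode_distribution maps a failure mode name to its share (0..1).
    The shares must sum to 1, otherwise part of the failure rate would be lost.
    """
    if abs(sum(mode_distribution.values()) - 1.0) > 1e-9:
        raise ValueError("failure mode shares must sum to 1")
    return {mode: total_fit * share for mode, share in mode_distribution.items()}


# Hypothetical resistor: 2 FIT in total, illustrative mode shares (assumption).
resistor_modes = split_failure_rate(
    2.0, {"open-circuit": 0.5, "short-circuit": 0.25, "drift": 0.25}
)
print(resistor_modes)
```

Each resulting entry then becomes one row of the analysis, with its own effect on the safety function.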
In the industrial and automotive environment, a special case of the quantitative FMEA, the FMEDA (Failure Modes, Effects and Diagnostic Analysis), is usually performed for the electronic part.
In an FMEDA, the reduction in failure rates achieved by diagnostic mechanisms (e.g. reading back output signals) is credited directly in the calculation.
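As a rough sketch of what this crediting means in numbers: a diagnostic mechanism with diagnostic coverage DC splits the dangerous failure rate of a failure mode into a detected and an undetected part. The function name and the example values (10 FIT, 90 % coverage) are assumptions chosen for illustration, not values from a specific project.

```python
def apply_diagnostic_coverage(lambda_dangerous_fit, dc):
    """Split a dangerous failure rate into detected and undetected parts.

    lambda_dangerous_fit: dangerous failure rate in FIT (1 FIT = 1 failure per 1e9 h)
    dc: diagnostic coverage of the safety mechanism, between 0 and 1
    """
    if not 0.0 <= dc <= 1.0:
        raise ValueError("diagnostic coverage must be between 0 and 1")
    detected = lambda_dangerous_fit * dc
    undetected = lambda_dangerous_fit - detected
    return detected, undetected


# Hypothetical output stage with read-back of the output signal (values are assumptions).
detected, undetected = apply_diagnostic_coverage(10.0, 0.9)
print(f"detected: {detected:.2f} FIT, undetected: {undetected:.2f} FIT")
```

Only the undetected part remains as a contribution to the dangerous residual failure rate.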
Safety analyses evaluate the current state of the architecture/design. They support iterative optimization until sufficient functional safety is achieved. Do not miss the window of opportunity at the beginning of the project lifecycle, when fundamental changes to the design can still be made without much effort.
This means that the FMEDA starts long before the first schematics are ready. The first analyses start at a higher level of abstraction, for example based on the first sketches of a system block diagram. Already at this stage, safety mechanisms can be designed that cover failures of entire blocks. Here, the standards offer appropriate assistance with proven architectures (e.g. diagnostic path or redundancy) and safety mechanisms (e.g. cross-comparison, CRC, ...). This can be done informally during the design. For more complex projects it is better to perform a structured analysis of the functions of blocks at an early stage (e.g. as a system FMEA).
However, it is advisable to hold off on completing the detailed fault analysis at the level of individual components (as required by the standards) until the schematics are finished. This avoids the effort of repeatedly updating the analysis as the design changes.
We see the FMEA both as a design tool (left side of the V-Model: design/synthesis of the function itself and its safety mechanisms) and as a verification tool (right side of the V-Model: quantitative evaluation of the reliability metrics). The FMEA also shows how important it is to structure the system properly and to select concepts that are as simple and clear as possible (KISS: Keep it simple, stupid).
Prior to structured documentation, an important aspect of a safety analysis is to network all involved experts. Some examples are:
A safety analysis is intended to create a deeper understanding of the circuit, even for aspects that are often not considered in the normal design process. This can only emerge in discussions involving all participants and is ultimately much more important than the tools or the achievement of metrics. We see the FMEA as a tool to structure the communication and create a common discussion basis.
The standards define target metrics (PFH, SFF, DC, MTTFD, SPFM, PMHF, etc.) because they provide a simple and clear definition of the acceptable residual risk. However, each metric only measures exactly those aspects considered in the definition of the formula. Thus, metrics can be easily outsmarted. Therefore they can never replace an engineering expert judgement and common sense. Additionally, the quality of the base data (e.g. failure rates of components) is often limited.
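To make concrete how narrowly such a metric is defined, here is a minimal sketch that computes two of them from FMEDA rows: the safe failure fraction, SFF = (λS + λDD) / (λS + λDD + λDU), and the diagnostic coverage, DC = λDD / (λDD + λDU), in the sense used by IEC 61508. The row structure, component names and failure rates are hypothetical and only serve as an example.

```python
from dataclasses import dataclass


@dataclass
class FmedaRow:
    component: str
    failure_mode: str
    safe_fit: float                 # lambda_S
    dangerous_detected_fit: float   # lambda_DD
    dangerous_undetected_fit: float # lambda_DU


def safe_failure_fraction(rows):
    """SFF = (lambda_S + lambda_DD) / (lambda_S + lambda_DD + lambda_DU)."""
    safe = sum(r.safe_fit for r in rows)
    dd = sum(r.dangerous_detected_fit for r in rows)
    du = sum(r.dangerous_undetected_fit for r in rows)
    total = safe + dd + du
    if total == 0:
        raise ValueError("no failure rates in the table")
    return (safe + dd) / total


def diagnostic_coverage(rows):
    """DC = lambda_DD / (lambda_DD + lambda_DU)."""
    dd = sum(r.dangerous_detected_fit for r in rows)
    du = sum(r.dangerous_undetected_fit for r in rows)
    if dd + du == 0:
        raise ValueError("no dangerous failure rates in the table")
    return dd / (dd + du)


# Hypothetical table rows (all values are assumptions for illustration).
rows = [
    FmedaRow("R1", "open-circuit", 0.0, 4.0, 1.0),
    FmedaRow("C1", "short-circuit", 3.0, 0.0, 0.0),
    FmedaRow("U1", "stuck-at", 0.0, 1.5, 0.5),
]
print(f"SFF = {safe_failure_fraction(rows):.1%}, DC = {diagnostic_coverage(rows):.1%}")
```

The sketch also shows the limitation mentioned above: anything that is not a row in the table, or that is classified optimistically, simply does not appear in the result.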
However, during the design process, metrics help to identify and prioritize weaknesses of the design. The Pareto principle also applies to functional safety: the focus of the measures should be on the most important failure cases, since this is where the highest risk reduction can be achieved. Rare failure cases and corner cases must nevertheless be considered as well, and in practice a large proportion of the effort is spent on them.
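One simple way to apply this prioritization is to rank the blocks by their undetected dangerous failure rate and look at the cumulative share. The following is only a sketch; the block names and FIT values are invented for illustration.

```python
def rank_contributors(undetected_fit_by_block):
    """Return (block, fit, cumulative share) sorted by largest contribution first."""
    total = sum(undetected_fit_by_block.values())
    if total == 0:
        raise ValueError("no undetected failure rate to rank")
    ranked = sorted(undetected_fit_by_block.items(), key=lambda item: item[1], reverse=True)
    result = []
    cumulative = 0.0
    for block, fit in ranked:
        cumulative += fit
        result.append((block, fit, cumulative / total))
    return result


# Hypothetical undetected dangerous failure rates per block in FIT (assumptions).
ranking = rank_contributors({
    "power supply": 1.0,
    "output driver": 6.0,
    "microcontroller": 2.5,
    "input filter": 0.5,
})
for block, fit, share in ranking:
    print(f"{block:16s} {fit:5.1f} FIT  cumulative {share:.0%}")
```

In this made-up example, two of the four blocks account for most of the undetected failure rate, so additional safety mechanisms would be considered there first.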
Our experience has shown that with the detailed FMEDA the metrics are usually achieved without problems, without optimization in the decimal places, provided that the FMEDA was started early (as recommended above).
An absolutely precise fulfillment of the metrics often indicates that the analysis has been tampered with, so an auditor will take a closer look. The argumentation becomes difficult if no safety mechanisms exist for central parts of the safety function.
If a metric is narrowly missed, it is worthwhile to evaluate this in the overall system context. There is often some leeway, so the result can nevertheless be accepted.
The FMEDA is necessary, but not sufficient, to ensure the safety of a hardware design. An FMEDA analyzes the behavior of the circuit in case of a single component failure in the field, i.e. random hardware failures. It does not cover, for example:
During the analysis, it is recommended to collect issues that come up but are outside the scope of the FMEDA in a "parking lot". These can then be dealt with in the corresponding analyses and design activities.
The analysis is always performed in relation to a specific safety goal. This defines which unsafe states must be avoided. It must also be specified which safe states should be reached in case of a failure. Also note that different safety goals are conceivable for the same circuit depending on the application context.
At the start of the analysis, the safety requirement and the scope of the faults analyzed should be defined in the report. These prerequisites have to be known to all participants, so that diverging assumptions are avoided. If an existing safety analysis is reused in a new project, it must first be checked whether these underlying assumptions still apply.
The FMEDA includes more than the table evaluating each failure mode of each component. For later reproducibility, a report should be created that covers the following points:
See also the corresponding checkpoints in the Software Safety Analysis blog post.
Consider the target audience: The FMEDA table is a tool of the expert team. It is used for evaluation, for calculating the metrics and for verifying that nothing has been forgotten. The report addresses project management (e.g. safety managers), auditors and possible clients/integrators who require proof and evidence that safety risks have been sufficiently mitigated.
What experiences have you made in your project? Please let me know in the comments...
Luzian Hürlimann (original article), the page is maintained by Andreas Stucki