Do you have to create a Software Safety Analysis (SW FMEA) for your project and don't know how to do it? Have you looked at the available literature, screened the entries on the web and yet you are none the wiser?
Have you searched the standards applicable to your project for instructions on how to perform a SW FMEA, but only found vague phrases? Do you want to do more than check off such an analysis pro forma, and actually use it to increase the safety of your product?
The following description assumes that you are already familiar with the basic terms of a qualitative FMEA. If not, you will find corresponding information in "Effective FMEAs" by Carl S. Carlson or in IEC 60812.
Hardware components are becoming more and more reliable. Also, random hardware faults (permanent and transient) can be investigated well with established quantitative analysis methods (e.g. FMEDA according to ISO 26262).
However, systematic errors (software errors belong in this category) are more difficult to analyze and isolate. As products grow more complex and the proportion of software in them rises, product safety is determined more and more by systematic software errors. Software safety analysis is therefore steadily gaining in importance.
The following is a step-by-step procedure that has proved its value at Solcept:
The moderator documents the results of all the above steps, i.e. of the expert meetings, and ensures that all software parts, the architectural elements contained therein and their error sources are covered.
The evaluation catalogs are based on the project needs. The following are some points of reference.
Important: If you define fewer than 10 levels in a range, you should still use the value range between 1 and 10 so that the influence on the risk figure is identical for all factors.
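As an illustration of this rule, the following minimal sketch spreads a smaller number of levels over the full range from 1 to 10. The function name `spread_levels` and the choice of even spacing are assumptions made for this example; the text only requires that the range between 1 and 10 is used, so the team may well choose other values.

```python
def spread_levels(n_levels: int, low: int = 1, high: int = 10) -> list[int]:
    """Spread n_levels ratings evenly over low..high (inclusive).

    Even spacing is an assumption for this sketch; the only requirement
    taken from the text is that the full range 1..10 is used.
    """
    if n_levels < 2:
        raise ValueError("at least two levels are needed")
    step = (high - low) / (n_levels - 1)
    return [round(low + i * step) for i in range(n_levels)]


# A safety-only severity catalog with two levels:
# "no influence" and "safety risk".
print(spread_levels(2))  # [1, 10]

# A catalog with four levels still covers the whole range.
print(spread_levels(4))  # [1, 4, 7, 10]
```

With this approach, a factor with only a few levels carries the same weight in the risk figure as a factor that uses all ten levels.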
If you only want to assess safety, it is sufficient to distinguish between "no influence" (1) and "safety risk" (10). However, most often more is packed into the analysis, e.g. the "loss of availability" or the loss of non-safety relevant (primary or secondary) features of the product as intermediate stages.
At the lower end of the scale ("almost never": 1 or "remote": 2) there are errors that are already eliminated by a preventive measure. Likewise, a low error probability is assumed if it can be shown that a piece of software is "well trusted", i.e. if it can be proven that it has been running error-free for years in a comparable application. For new software parts, the error probability is rated higher with increasing complexity (simplicity pays off here as well).
The highest probability of error ("very high": 9, "almost certain": 10) exists if the requirements for a software part are (still) incomplete or missing completely. This does not mean that a requirements review is carried out during the software FMEA! Rather, points whose specifications are considered missing or unclear with regard to the safety objective are treated accordingly.
High detectability ("obvious": 1) exists if appropriate verification measures are in place, preferably already at unit test level, and then with decreasing detection probability at module test (with or without hardware) or system test level. Review or analysis measures result in lower detectability; the larger the piece of software that has to be examined in the review, the lower the detectability. The rating "almost impossible" (10) is given if the team has no idea how to detect a fault.
In principle, a simple Risk Priority Number (RPN) can be used, which is calculated as the product of the Severity (S), Occurrence (O) and Detection (D) ratings: RPN := S * O * D. For this purpose, an RPN threshold value is defined above which a measure must be taken. Depending on the project, however, more detailed decision matrices are also useful and can be defined by the team.
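The RPN calculation and the threshold check can be sketched as follows. This is a minimal example, not a prescribed tool: the class `FailureMode`, the function `needs_measure`, the threshold of 100 and the two sample entries are all hypothetical, and the actual threshold has to be set per project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One rated entry of the analysis (hypothetical structure)."""
    name: str
    severity: int    # S, 1..10
    occurrence: int  # O, 1..10
    detection: int   # D, 1..10

    def __post_init__(self) -> None:
        # All three ratings must use the value range 1..10.
        for field_name in ("severity", "occurrence", "detection"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        # RPN := S * O * D
        return self.severity * self.occurrence * self.detection


# Hypothetical threshold; the real value is a project decision.
RPN_THRESHOLD = 100


def needs_measure(mode: FailureMode, threshold: int = RPN_THRESHOLD) -> bool:
    """A measure is required if the RPN lies above the threshold."""
    return mode.rpn > threshold


# Invented sample entries, for illustration only.
modes = [
    FailureMode("missing requirement for watchdog reset", 10, 9, 3),
    FailureMode("trusted CRC routine returns wrong value", 10, 1, 1),
]
for mode in modes:
    print(mode.name, mode.rpn, needs_measure(mode))
```

In this sketch the first entry has an RPN of 270 and requires a measure, while the second has an RPN of 10 and does not. A more detailed decision matrix would replace `needs_measure` with a lookup on the individual ratings.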
The following checklist can be used to ensure that the analysis report is complete:
The described procedure has proven to be extremely effective at Solcept. The value for the project and the product lies mainly in the discussions that take place in the expert panel.
Our experience shows that the procedure improves not only safety. It also improves the overall software quality, the documentation and the development processes, and it results in a greater involvement of the engineers in the definition of processes and guidelines.
Moreover, it is astonishing (and very comparable to the effects of EMC measurements) what other changes in the project are triggered by the work of the SW FMEA team. You will catch yourself thinking about using this tool not only in functional safety projects.
We wish you much success with it!
Samuel Leemann
Samuel Leemann holds an MSc EE from ETHZ and is a hardware, system and safety specialist and co-owner at Solcept. His previous professional activities were in the commercial field and in the development of medical and communication technology. Principles that guide him in development work are simplicity and safety. Samuel is an everyday cyclist, enjoys hiking and keeps fit with yoga.