This contribution only covers the areas in which we at Solcept have experience, i.e. the development of software and electronics. We ignore the rest of the life cycle from production to disposal, as well as the details of system development at e.g. vehicle or aircraft level.
Functional safety means that the risk of a product causing injury or worse is minimized. It is called functional because safety depends on the correct functioning of the system: for example, that a vehicle does not move off while you are about to get in, or that the cockpit display does not make the pilot believe there is more fuel in the tanks than there actually is.
This distinguishes functional safety from protection against "side effects" of the system's function, such as explosion protection, voltage protection, fire protection etc.
As the starting point for this overview I assume a "conventional" embedded development in the industrial sector. Note that the quality levels without functional safety, e.g. QM (Quality Managed) in automotive or DAL-E (Design Assurance Level E) in aviation, already need massively more effort than the development we take as the baseline here. This is because such developments should already be carried out according to level 3 of Automotive SPICE or CMMI (Capability Maturity Model Integration). This means that a good many requirements, unit tests, traceability etc. are already present.
Historically, the standards for safety-critical systems in the embedded area started in 1982 in the United States with DO-178, targeted at airliners. This standard, today in version DO-178C, is still the mother of all standards, and its concepts were adopted by other industries. In Germany, IEC 61508 originated in 1998, primarily for the chemical industry, but also as an umbrella standard from which most of the other industry standards have been derived.
I don't want to delve into the different standards here. In many cases one cannot manage with just one; there is a whole collection of documents which have to be considered. However, the basic concepts are the same, so what has to be done for functional safety can be defined independently of the industry.
Finally, here in the introduction, probably the most important point for all those who want to develop for functional safety: you can think what you like about the standards and the models required by functional safety, but the moment you say yes to a functional safety project, you have to play the game of functional safety. So you have to carry out the required work and create the documents, to the letter. This applies to the engineers who do the work, but above all to management, which must provide the time and resources. Otherwise... you have to play another game, one without functional safety.
A few basic terms are important, so here are short definitions:
There are a few basic principles of functional safety which help to understand why one should develop exactly this way and not in any other way.
The first analysis mainly concerns the developer of the overall system.
Level | Range | Highest Risk Level | Industry |
---|---|---|---|
DAL: Design Assurance Level | E..A | A | Aviation: ARP4761, ARP4754A, DO-178C, DO-254... |
SIL: Safety Integrity Level | 1..4 | 4 | Industry: IEC 61508; railway: EN 50128/9 |
ASIL: Automotive Safety Integrity Level | A..D | D | Automotive: ISO 26262 |
PL: Performance Level | a..e | e | Machinery: ISO 13849 |
Class: Software Safety Class | A..C | C | Medical: IEC 62304 |
The other analyses then take place at each hierarchical level (system, subsystem, component, function), as both hardware and software safety analyses, sometimes with different characteristics. So they affect every developer. There are different variants; the most important are:
Based on the safety analyses, safety measures have to be implemented to detect and prevent the following faults.
These measures may comprise plausibility checks, redundancy (i.e. several systems which check each other), diverse redundancy (redundancy based on components that are designed and built in completely different ways), program flow monitoring, error correction for memories and many more.
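Two of these measures, a plausibility check and redundancy, can be sketched in a few lines of C. This is only a minimal illustration under stated assumptions, not a qualified implementation: the limits `SENSOR_MIN`, `SENSOR_MAX` and `MAX_DEVIATION` and the function `vote_2oo3` are hypothetical names and values chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits and tolerance in raw sensor units (assumptions). */
#define SENSOR_MIN     0
#define SENSOR_MAX     1000
#define MAX_DEVIATION  20

/* Plausibility check: the reading must lie inside its physically possible range. */
static bool is_plausible(int32_t value)
{
    return (value >= SENSOR_MIN) && (value <= SENSOR_MAX);
}

/* Two channels agree if both are plausible and differ by at most MAX_DEVIATION. */
static bool channels_agree(int32_t a, int32_t b)
{
    if (!is_plausible(a) || !is_plausible(b)) {
        return false;
    }
    /* Both values are within [SENSOR_MIN, SENSOR_MAX], so this cannot overflow. */
    int32_t diff = (a > b) ? (a - b) : (b - a);
    return diff <= MAX_DEVIATION;
}

/*
 * Redundancy: 2-out-of-3 vote over three channels.
 * Returns true and writes the mean of the first agreeing pair to *result.
 * Returns false if no two plausible channels agree; the caller is then
 * expected to enter its safe state.
 */
bool vote_2oo3(const int32_t ch[3], int32_t *result)
{
    static const uint8_t pairs[3][2] = { {0, 1}, {0, 2}, {1, 2} };

    for (uint8_t i = 0; i < 3; i++) {
        int32_t a = ch[pairs[i][0]];
        int32_t b = ch[pairs[i][1]];
        if (channels_agree(a, b)) {
            *result = (a + b) / 2;
            return true;
        }
    }
    return false;
}
```

With the assumed limits, the readings 500, 2000 and 510 still yield a valid result, because the implausible second channel is outvoted by the other two. Readings of 100, 500 and 900 yield no result at all, since no two channels agree.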
Errors in the requirements are the most prevalent cause of failure. This is why great importance is attached to requirements in functional safety. Several aspects have to be considered:
With regard to the V-model, an important misunderstanding must be dispelled here: the V-model should not primarily be seen as a Gantt chart, but as a data management model. It maps the "divide and conquer" approach and the relationships between the artifacts. In practice this means that one cannot get by without iterations between the levels. Of course, those should be minimized as much as possible for the sake of efficiency. This results in a natural sequence, because one cannot specify and design anything at the lower levels of detail unless everything is stable and approved at the upper level, just as one cannot finish testing at the upper levels of integration if the tests at the lower levels have not been completed.
Verification is often equated with testing. For safety-critical systems this is not true: tests are just a small part of verification. Most of verification consists of reviews.
As a consequence of the required code coverage and of requirements-based testing, it is not allowed (explicitly so in avionics with DO-178C) to write tests purely for code coverage when no requirements exist for them. So let's just generate a requirement? ...which, as a "derived" requirement, then needs a safety analysis. There must be no unintended functionality. This is why it is worthwhile to implement only what is really required.
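The link between tests and requirements can be pictured as a simple traceability check. The following sketch is an illustration only: real projects use requirements management tools for this, and the type `test_case` and the two counting functions are hypothetical names introduced for the example.

```c
#include <stddef.h>
#include <string.h>

/* A test case together with the ID of the requirement it verifies. */
struct test_case {
    const char *test_id;
    const char *req_id;
};

/* Returns 1 if id appears in the list of requirement IDs, otherwise 0. */
static int requirement_exists(const char *const reqs[], size_t n_reqs,
                              const char *id)
{
    for (size_t i = 0; i < n_reqs; i++) {
        if (strcmp(reqs[i], id) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Number of tests that do not trace to any known requirement. */
size_t count_untraced_tests(const char *const reqs[], size_t n_reqs,
                            const struct test_case tests[], size_t n_tests)
{
    size_t count = 0;
    for (size_t i = 0; i < n_tests; i++) {
        if (!requirement_exists(reqs, n_reqs, tests[i].req_id)) {
            count++;
        }
    }
    return count;
}

/* Number of requirements that are not verified by any test. */
size_t count_untested_requirements(const char *const reqs[], size_t n_reqs,
                                   const struct test_case tests[],
                                   size_t n_tests)
{
    size_t count = 0;
    for (size_t i = 0; i < n_reqs; i++) {
        int found = 0;
        for (size_t j = 0; j < n_tests; j++) {
            if (strcmp(tests[j].req_id, reqs[i]) == 0) {
                found = 1;
            }
        }
        if (!found) {
            count++;
        }
    }
    return count;
}
```

In a requirements-based approach both counts have to be zero: a test without a requirement points to unintended functionality or a missing (derived) requirement, and a requirement without a test has not been verified.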
To ensure homogeneous quality throughout the project, standards and rules are required for many artifacts. These can be developed internally, but dealing with external auditors is easier if known standards are used, e.g. MISRA for C/C++ code.
For electronics, only high-quality components should be selected. When selecting them, long-term availability should be considered, so that the safety evidence does not have to be provided again and again whenever components change. In addition, it is key to have good data for the calculation of the failure rates.
Apart from the AEC-Q certificates for automotive, almost no "high reliability" parts exist anymore. The "standards" with numbers for failure rates (Siemens SN 29500, MIL-HDBK-217F...) have also fallen victim to the ravages of time and to technological advances. Still, these standards are used for quantitative analyses, since in most cases the point is only to compare different technical solutions against a target value for the overall system, not to make a realistic statement about the probability of failure.
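The arithmetic behind such a comparison is simple. The sketch below assumes constant failure rates and a series model in which every part failure counts as a failure of the system, which is a deliberately conservative simplification. Failure rates are given in FIT (1 FIT = one failure per 10^9 device hours). The function names and all numbers used with them are hypothetical and are not taken from any handbook.

```c
#include <stdbool.h>
#include <stddef.h>

/* 1 FIT corresponds to 1e-9 failures per hour. */
#define FAILURES_PER_HOUR_PER_FIT 1e-9

/* Series model: the failure rates of all parts simply add up. */
double total_fit(const double fit[], size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += fit[i];
    }
    return sum;
}

/* Converts a failure rate in FIT to failures per hour. */
double fit_to_failures_per_hour(double fit)
{
    return fit * FAILURES_PER_HOUR_PER_FIT;
}

/* Checks one design variant against a target value for the overall system. */
bool meets_target(const double fit[], size_t n, double target_fit)
{
    return total_fit(fit, n) <= target_fit;
}
```

For example, a variant with three parts of 10, 25 and 5 FIT sums to 40 FIT, i.e. 4e-8 failures per hour, and would meet an assumed target of 50 FIT. A variant with an additional 30 FIT part would not.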
There is no modern electronics or software development without software tools. Software? Is the software of all tools in the project free of errors? What happens if an error in a tool leads to an error in an artifact?
Functional safety is logic after all, a clear-cut thing. This is what you think at the beginning... but it is quite wrong. Psychological aspects play a significant role: for achieving the goals, for efficiency and above all for one's own satisfaction.
No engineer gets around feedback, at the latest during the review of his results. Accepting positive feedback is usually not a problem, but when something is wrong, emotions sometimes run high. Here one's own attitude towards errors is the issue: Can I accept my own errors and learn from them? Am I ready to look closely at others' errors and point them out? Am I ready to work through such conflicts in a constructive manner? Because only in a "conventional" project is "come on, it works" a reason not to correct bad code, and maybe not even there.
And because the goal should be to pass the reviews without findings, the lone warrior approach does not work anymore. If I do not agree on my solution with others, if I do not work it out together with them and find a consensus, then I go through so many rounds of reviews that I get dizzy.
In the end I am only satisfied when I consider each error, each critique not as an attack on me as a person, but as an invitation to get even better, to develop myself.
The whole project team and the project manager also have their obligations and must take care of the following aspects.
In order to leave nothing to chance, projects for the development of safety critical systems must be conducted in a planned manner. So processes, roles, responsibilities etc. have to be summarized as plans:
Configuration management ensures that at any point in time all artifacts match one another perfectly.
After the first formal release, change management comes into force. The composition of a Change Control Board is defined. For each change, this board has to answer the following questions while following a precisely defined workflow:
Of course the whole process is always documented with detailed traceability.
All important artifacts (depending on the safety level this can be quite a few) must be audited, i.e. a further person must perform a review. These audits make sure that no shortcuts are taken. This is why strong independence from the project is required of the auditors; often third parties are called in: notified bodies, the customer himself, authorities (EASA, FAA...).
At the end comes the most important document for the customer or the authorities: the final statement that the product is safe.
Engineers in particular often underestimate the psychological part of communication. Functional safety is teamwork and thus communication between humans, and this communication cannot be reduced to pure information transfer.
Particularly when developing slightly more complicated safety-critical systems, most tasks cannot be described in such a way that the engineer can afterwards withdraw to his cubbyhole for a few weeks and come back with a solution that is good enough. It takes communication between all participants so that the interfaces are correct and a consensus is reached, which is then documented in the many artifacts.
Also, good and therefore safe design cannot be reached via metrics, but only as a trade-off, and the consensus on it is not to be had without communicating with each other.
The basis for all communication that is to be well received is relationships. This is especially true when the communication is at times a little bit more heavyweight: "this is completely wrong". A good relationship does not mean that everybody has to spend every weekend with everybody else, but the relationships have to be good enough that empathic communication is possible.
In the end it is important that even with the meticulous way of work of functional safety, a mood is reached in the team which makes work pleasant.
It is not so simple that the overall responsibility for functional safety can be delegated to the project team. Especially not when the overall product life cycle from production to disposal is included. The organization has to accomplish considerable goals.
It is assumed for functional safety that the organization has processes, lives them and also improves them. For development, Automotive SPICE or CMMI (Capability Maturity Model Integration) are common process models. These models go much further than ISO 9001: there are more goals to be reached and more practices specified. For avionics you need a DOA (Design Organization Approval), which can also be transferred from the customer if he takes responsibility for the final quality assurance.
The question I ask myself is whether those processes and models are really lived in the sense in which they are intended, even when working with large organizations with a high, certified maturity...
What does level 3 mean? In all process models, level 3 means that processes are defined and lived for the overall organization, i.e. for all projects. These processes are adapted to the project at hand by a process called "tailoring". The processes encompass ways of working, tools, templates, checklists...
All those processes must be continuously improved; a learning organization is mandated.
Last but not least, one of the most important factors: safety culture. The organization must make sure that safety is prioritized over commercial aspects. This means that it is no longer possible to push a product onto the market with a heroic weekend mission. All plans, reviews and audits must be observed.
In addition, a proactive attitude towards errors is stipulated, and errors are to be used for learning at project and company level. Clear plans without ad hoc resource allocation and traceable responsibilities are also required.
As we have seen so far, the effort for each requirement, each component, each line of code is huge. Conversely, this means that "Keep it simple!" should be written above every functional safety project. Every requirement, every "nice-to-have" that can be omitted saves a lot of effort. The motto is: simplify whatever is possible, even if this often displeases product management.
And now, how to proceed to be able to develop in a way that the product can be called functionally safe?
Basically, existing code, existing schematics and possibly documentation cannot be reused as they are for functional safety. All the activities have to be performed as for a new development, and this has to be proven with artifacts. The existing artifacts can only serve as a "source of inspiration". Through the simplifications pursued and the correction of errors, which the strict processes will inevitably uncover, the product will in any case emerge changed from the process.
In rare cases, if the simplification is justified by e.g. a new platform, a complete redevelopment can be sensible. From the safety point of view, by the way, a complete redevelopment has the disadvantage that new errors are introduced of a kind that had already been eradicated in a long-standing product.
And then the same question always arises: can we not just leave the product as it is, since there have been no failures so far? Theoretically this is possible, but the hurdles for confirming operating hours and for complete traceability of errors over years are in most cases so high that this variant is almost never applicable.
We followed the approach of first reaching level 3, in our case with CMMI-DEV (Capability Maturity Model Integration for Development). To this end we performed audits with external specialists, initially with a focus on efficiency. Then we had a gap analysis performed for the applicable safety standards and safety levels and corrected the way we work, i.e. our processes, in order to close the gaps to the safety standards.
The effort for the establishment and the maintenance of such a process landscape is considerable. For Solcept from 2011 to 2018 (8..16 engineers) the effort was between 2 and 4% of the yearly working hours and about 30'000 CHF per external audit or per gap analysis.
There are also other ways:
On the one hand, one can buy complete process landscapes. However, the question with those is what happens to the current ways of working, i.e. whether the processes fit your organization and are viable.
One can also establish processes directly from the safety standards. We had doubts whether we would then lose the focus on efficiency which CMMI comprises.
The third method would be to just give the standard to the project team and let it work out the processes. This method conflicts with the requirements for processes at level 3 and with the stipulated safety culture.
We develop for functional safety across industries. If you wish, we transfer the project, including the complete project processes, back to you.
If you don't want to struggle with the processes yourself, contact me.
Andreas Stucki