Functional safety - Which standards apply and what are their scopes?

I'm struggling to grasp the application of the concept of functional safety.

I think that I understand the basic concept: where we are relying upon an electrical circuit/system, we need to be able to rely upon the safety function proportionally to the risk.

However, this is pretty much my limit. I want to understand the scope of the standards and which standard applies, and the BS scope sections aren't making things much clearer. A simple example that I have come across is a ventilation system, where a fan is ventilating an area to prevent a build-up of nasty chemicals or explosive gases.

Is anything preventing a system having two independent fans with users monitoring?

Is this required to be a functional safety system? If so, is this under the 13849 standard or another? What should the system look like?

Thanks

  • Oh gosh what a huge subject! But a really good question.

    The basic standard is IEC 61508. There are a whole heap of industry specific standards that come off this one, but that's where they all start. But I'll warn you, 61508 is HUGE. It does tell you how to work out the answer to your fan question for example, but in pages and pages of stuff even when you've found the right part (it has seven parts). ISO 13849 is one of these standards, and yes probably is the right one in your case (unfortunately I don't have a copy of it).

    However the basic principles are pretty simple, and you've nailed it with "we need to be able to rely upon the safety function proportionally to the risk". And the risk depends on what other mitigations you have in place, so yes, if you have two fans and you can demonstrate that you can detect and repair a failure of the first fan before the second fan will credibly fail, then that's a perfectly good mitigation - provided you can show that the detection itself is sufficiently reliable. And that it's not likely (within the level of risk that you're prepared to accept) that both fans will fail together, for example due to a power supply failure.
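
    As a rough illustration of the two-fan argument, the sketch below estimates how often both fans would be down at the same time. Everything in it is an assumption made for illustration: the constant failure rates, the numbers, the function name, and the simple "beta factor" treatment of common-cause failures such as a shared power supply. It is not a method taken from IEC 61508 or ISO 13849.

```python
def both_fans_down_per_year(fan_failures_per_year, hours_to_detect_and_repair, beta=0.0):
    """Approximate rate (per year) at which both fans are failed at once.

    Assumptions (illustrative only): two identical fans, constant failure
    rates, and rates small enough that rate * time approximates a probability.
    beta is the assumed fraction of fan failures that take out both fans
    together (common cause, e.g. a shared power supply).
    """
    hours_per_year = 8760.0
    independent_rate = (1.0 - beta) * fan_failures_per_year
    common_cause_rate = beta * fan_failures_per_year

    # Chance that the surviving fan fails while the first failure is still
    # being detected and repaired.
    p_second_fails_in_window = independent_rate * hours_to_detect_and_repair / hours_per_year

    # Either fan can be the first to fail, hence the factor of 2.
    independent_double_failure = 2.0 * independent_rate * p_second_fails_in_window
    return independent_double_failure + common_cause_rate


# Hypothetical numbers: each fan fails once in 10 years, 24 h to detect and repair.
no_common_cause = both_fans_down_per_year(0.1, 24.0)
with_common_cause = both_fans_down_per_year(0.1, 24.0, beta=0.05)
print(f"{no_common_cause:.2e} per year without common cause")
print(f"{with_common_cause:.2e} per year with 5% common cause")
```

    With these made-up numbers the common-cause term is far larger than the independent double-failure term, which is why the shared power supply deserves as much attention as the fans themselves.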

    I should mention at this point that I was the independent safety assessor for the tunnel ventilation system on Crossrail (Elizabeth Line) which was a rather extreme case of exactly this problem! In that case the contractor and Crossrail decided that the consequences of ventilation failure occurring at the same time as a tunnel fire were so high that a high integrity fan control and monitoring system was required. But that was unusual.

    So back to your question: IEC 61508, or any other related standard such as ISO 13849, won't give you a direct answer to your problem. What they will say is that you need to assess all the possible failure modes and consequences and then determine all potential mitigations. UK law (HASAWA) then says that you must put these mitigations in unless you can show they are not reasonably practicable (i.e. the cost of putting the measure in, e.g. by using a high integrity system, would be disproportionate to the risk). 61508 does give guidance as to how to consider duplication, and more complex systems such as 3 out of 5 and so on. But if you're going there you probably want someone to guide you.
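
    To give a feel for what "3 out of 5" buys you, here is a minimal sketch of the underlying arithmetic. It assumes identical channels that fail independently, which is optimistic because common-cause failures make the real figure worse. The function name and the 0.99 figure are hypothetical, and this is plain binomial arithmetic rather than the calculation method in 61508.

```python
from math import comb


def prob_at_least_m_of_n_work(m, n, p_channel_works):
    """Probability that at least m of n identical, independent channels work."""
    return sum(
        comb(n, k) * p_channel_works**k * (1.0 - p_channel_works) ** (n - k)
        for k in range(m, n + 1)
    )


# Hypothetical channel that works with probability 0.99 when called upon.
single = prob_at_least_m_of_n_work(1, 1, 0.99)
one_of_two = prob_at_least_m_of_n_work(1, 2, 0.99)
three_of_five = prob_at_least_m_of_n_work(3, 5, 0.99)

print(f"single channel: {single:.6f}")
print(f"1 out of 2:     {one_of_two:.6f}")
print(f"3 out of 5:     {three_of_five:.6f}")
```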

    You basically only have to use a high integrity system where you're working on a system where failures cannot be mitigated by another system (very simplistically). For example in railway signalling, if the control system sets the signal to green when it should be red there's not much that can be done to stop trains crashing, so we make them very high integrity systems. But it seems in your case that you may well be able to achieve the effect (ensuring that if a fan fails there isn't an explosive gas build up) by suitable duplication and monitoring. Just don't forget those pesky common mode failures like power supplies, or both fans freezing up in winter...

    Cheers,

    Andy 

  • Thank you for being so helpful Andy. So essentially there is nothing within the standard forcing you to use electronic means? The standard only comes in if the designer has chosen a functional safety circuit based upon the risk assessment, and a safe system of work could be perfectly adequate, as opposed to "this system has a safety impact, so it requires a safety circuit"?

  • As Andy says, this is a very wide topic; however, here are a few thoughts. I first came into contact with it via EN 954-1 (which has been superseded by ISO 13849). This had a fairly simple risk assessment chart and a series of control architectures to meet the different risk levels. This brought the concepts of dual-channel cross-monitored systems into the general machine world. A safety equipment industry grew out of this with specific safety relays, interlock switches and monitored contactors. Various standards for reliability grew up around them. The manufacturers published various safety handbooks and guides.

    ISO 13849 built on the requirements of EN 954-1 with more detailed calculations and definitions, the Performance Levels and Diagnostic Coverage. Once again the manufacturers published guides and handbooks. I have attached an older one from ABB that I have used in the past.

    A lot of the functional safety systems, especially at the lower levels, are designed to bring things to a safe stop, which is fairly easy. In some cases, such as chemical plants, nuclear reactors or aircraft, things are required to remain operational to ensure safety. This will generally require other standards, usually related back to IEC 61508, and involve various multi-channel ‘voting’ systems with different architectures on each channel.

  • Hi,

    The way I'd normally expect this to be looked at is that you decide the acceptable hazard rate (say in this case the probability that a person is exposed to hazardous fumes), and then you work down the various measures that will prevent this happening. Some of this will be probabilities of technical failures, and some will be human issues - people not following processes, or just making mistakes. When you've factored all those in you can see if you need a high integrity system to make the numbers work.

    There are two main techniques for this: Fault Tree Analysis (FTA) and Layer of Protection Analysis (LOPA). They basically do the same thing, just visualised in slightly different ways.

    In FTA you start with the event you don't want to happen, then work out what combination of things would immediately lead to that happening (e.g. "Fans fail" and "Failure of fans does not raise alarm"), then you work out what combination of events would lead to those, and so on until you get to fundamental failures you can put a probability of failure on (e.g. "fan bearing breaks", "operator does not set alarm", and the one you are trying to find the correct answer to, "electronic system fails to control fan speed"). You then play around with the probability you can control (the integrity of the electronics) until you get the answer you want. Or you put in another human or technical protection.
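
    The fault tree described above can be sketched in a few lines. This is only an illustration: the gate functions assume the basic events are independent, all the probabilities are invented, and "alarm hardware fails" is a hypothetical extra basic event added to give the alarm branch two inputs.

```python
def and_gate(*probabilities):
    """All input events must occur (assumed independent)."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result


def or_gate(*probabilities):
    """Any one input event is enough (assumed independent)."""
    none_occur = 1.0
    for p in probabilities:
        none_occur *= 1.0 - p
    return 1.0 - none_occur


# Hypothetical per-year probabilities for the basic events.
p_fan_bearing_breaks = 0.05
p_electronics_fails_to_control_fan = 0.01  # the figure you can tune
p_operator_does_not_set_alarm = 0.02
p_alarm_hardware_fails = 0.005  # hypothetical extra event

p_fans_fail = or_gate(p_fan_bearing_breaks, p_electronics_fails_to_control_fan)
p_no_alarm = or_gate(p_operator_does_not_set_alarm, p_alarm_hardware_fails)

# Top event: fans fail AND the failure does not raise an alarm.
p_top_event = and_gate(p_fans_fail, p_no_alarm)
print(f"P(top event) = {p_top_event:.2e}")
```

    Changing `p_electronics_fails_to_control_fan` and re-running is the "play around with the probability you can control" step.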

    In LOPA you start with the failure you are looking at, e.g. "fan fails", and think of all the measures that could come between that and the hazard, and again put figures on them to see if they give the final rate you want. It's an easier process, but doesn't handle a complicated combination of things going wrong so well. But it is often good for process control systems where there may just be a series of individual barriers between a failure and a hazard.
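
    A LOPA-style calculation is essentially one multiplication per barrier. The sketch below assumes the layers are independent of each other and of the initiating failure. The layer names, the figures and the target frequency are all hypothetical.

```python
def mitigated_frequency(initiating_events_per_year, layer_failure_probabilities):
    """Frequency of the hazard once each independent layer has had its chance.

    Each layer is described by the probability that it fails to stop the
    event when called upon.
    """
    frequency = initiating_events_per_year
    for p_fail in layer_failure_probabilities:
        frequency *= p_fail
    return frequency


# Hypothetical barriers between "fan fails" and a person being exposed.
layers = {
    "standby fan fails to start": 0.1,
    "gas alarm or operator response fails": 0.1,
    "person present in the area": 0.5,
}
fan_failures_per_year = 0.1   # assumed initiating event frequency
target_per_year = 1e-5        # assumed tolerable hazard frequency

hazard_per_year = mitigated_frequency(fan_failures_per_year, layers.values())

# If the target is not met, this is the failure probability an extra
# (e.g. electronic) layer would need to close the gap.
required_extra_layer = target_per_year / hazard_per_year

print(f"hazard frequency: {hazard_per_year:.1e} per year")
print(f"extra layer must fail no more often than {required_extra_layer:.2f} per demand")
```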

    Oh, and all this typically depends on working out the probability of humans making mistakes. And that's not something personally I would recommend that engineers do. If you get to work with really good Human Factors people you realise how good they are at that stuff and how much we can get it wrong.

    If you want to look this up, try searching for "SIL allocation". (Safety Integrity Level allocation.) Or of course LOPA and FTA - you might want to start with LOPA.

    Incidentally I glibly said at the top "decide the acceptable hazard rate", i.e. decide how many people you are happy to kill in a thousand / million year period. Practically this can be a nightmare trying to get people to agree to this, no-one wants to take responsibility for this figure, but equally no-one's going to give you unlimited funds. So although I've given you all the theory above, in practice very often people just do what was done in another application to avoid anyone having to make that judgement. Which is fine until someone dies and the questions start being asked about your evidence that you've done enough.

    A pragmatic middle way is to use the same approach as LOPA and FTA to think about all the ways that a hazard could occur, and all the barriers that would prevent it, but instead of putting hard numbers on the probabilities to just use a judgement of "does this look enough?" More times than we might like to admit this is what we do in practice, and often it is enough, depending on the level of risk. It will still give you a clue about whether the electronics is the most important bit - in which case maybe it does need to be high integrity - or whether actually there's a huge number of other things that would need to go wrong first, some of which seem incredible. If the risk is that someone's exposed to fumes and may need a day or two in hospital, this approach may be enough; if the result of fume build-up is an explosion that rocks half a town (Flixborough!), it won't be.

    It's a very common issue in fault monitoring systems. You put in a fault monitoring system to improve reliability, but then someone realises that it could also detect a potential safety issue arising. The question is, at that point does the monitoring system itself become safety critical? It all depends on what the probability is of the fault occurring in the first place, and whether there is a probability that another system (which could be remaining levels of human inspection) could provide sufficient probability of detection combined with the low integrity monitoring system. This is an area I've been (and am) doing a lot of work on in the rail industry. Sooner or later the integrity of monitoring systems will need to get higher as they get more and more functional safety loaded on them - because we don't want humans doing monitoring / testing for safety functions, partly because they make errors (and, sadly, partly because they're expensive) but mainly from my point of view because humans doing inspections are exposing themselves to risk. However high integrity monitoring systems are very expensive, so it is a complex calculation of risk. Welcome to my world...

    Hope that helps. I got into this world by accident: I spent years developing non-safety-critical electronics and then, almost by chance, found myself developing the highest level of safety-critical systems. Exactly the same electronics, but now in an application where if it failed the wrong way many people were likely to die - and nothing was going to stop that. It's a really interesting world, trying to work out if you've done enough. (And yes, I do sleep at night!) But as above, at the moment I also find a lot of my time is spent checking people's arguments for not needing any level of integrity, because their system will never carry that much responsibility by itself, which again is a really interesting area.

    Cheers,

    Andy

  • Although you give some clues, my answer is a generic response rather than one specific to your question. As said elsewhere, EN 61508 is considered to be the 'mother standard' with all the others as 'daughter standards'. The 'mother standard' could be used, but it is very complicated to follow and the sector-specific ones are usually more appropriate. I would also say that the first things you need to determine are (1) what the market area is (some areas, e.g. rail, nuclear, automotive and process, have specific recommended standards) and (2) what the expected frequency of call on, or use of, the safety system is (generally it will be classed as low demand [less than once per year] or high demand [more than once per year]). For machinery there are two standards that have formal recognition as Harmonised (for the EU/EEA/NI) and Designated (for GB), although despite that formal recognition they don't have to be used if you can show there is a better standard to use. These two are EN ISO 13849 Parts 1 and 2, and EN IEC 62061. Generally they cover high demand requirements, although the latest version of EN IEC 62061 also considers low demand. For other sectors the standards don't have any formal Harmonised/Designated recognition but still carry a very strong sector-specific expectation of use. Process tends to use EN 61511; the standards for other areas can be found easily by web searches.
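
    The low/high demand split mentioned above comes down to a single threshold. As a minimal sketch (the function name and example figures are hypothetical, and treating exactly one demand per year as low demand is an assumption of this sketch):

```python
def demand_mode(demands_per_year):
    """Classify a safety function by how often it is called upon.

    Uses the once-per-year threshold described above.
    """
    if demands_per_year <= 1.0:
        return "low demand"
    return "high demand"


# Hypothetical examples.
print(demand_mode(0.1))   # a standby fan called on about once every ten years
print(demand_mode(50.0))  # a guard interlock opened roughly weekly
```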