ALARP for Engineers

Hi all,

The IMechE have published (in 2024) a document titled "ALARP for Engineers: A Technical Safety Guide".

It's a really useful document that I am finding helpful in our support to our customer. Does anyone know of any other equivalent documents produced by other Institutes (the IET would be great) that would help in the application of Safety Techniques for complex systems that include Software and Humans in the loop?

I am also starting to look into STPA (I'm a Tim Kelly disciple, so working on Nancy Leveson processes feels disloyal, lol). I have the STPA handbook (all 188 pages of it). Can anyone recommend any additional resources that would help develop capability in this area please? It seems that a step change in how we perceive Hazards and Risk is both required and inevitable. Applying a top level Safety Target to a lower level system is driving me mad and STPA changes the approach to quantification of hazards to be far more sensible and manageable from a System level point of view.

Many Thanks,

SJ

  • Hi Steven, have you had a look through the IET Bookshop here: https://shop.theiet.org/ ? 

  • it seems that a step change in how we perceive Hazards and Risk is both required and inevitable

    Have a look at Black Swan Events.

    Are they really rare, or just poor planning with outdated information?

    It is wrong to think that a 100-year storm happens only once every 100 years. While not likely, two 100-year storms can occur within a week of each other. The concept is that a major event/disaster has a 1-in-100 chance of happening in any given year, i.e. 1%.

    Maritime insurance can attest to this. We now have more frequent and more violent storms, creating more destructive damage, which has a cost to the insurance industry. Thus the 1% probability has risen to 5%. This needs to be reflected in insurance premiums and in the size of the reserve fund that needs to be maintained for such events.
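    As a rough sketch of the arithmetic (assuming independent years, which is itself a simplification the Black Swan literature warns about), the chance of seeing at least one "1-in-100-year" event over a planning horizon is higher than intuition suggests, and rises sharply if the true annual probability is nearer 5%:

    ```python
    # Back-of-envelope only: annual exceedance probability p, independent years.
    # The 0.01 and 0.05 figures are the 1% and 5% values discussed above.
    def prob_at_least_one(annual_p: float, years: int) -> float:
        """Probability of at least one exceedance event over the horizon."""
        return 1 - (1 - annual_p) ** years

    for p in (0.01, 0.05):
        for horizon in (10, 30, 100):
            print(f"annual p = {p:.0%}, {horizon} years: "
                  f"{prob_at_least_one(p, horizon):.1%} chance of at least one event")
    ```

    With a 1% annual probability there is roughly a 63% chance of at least one such event in a century; at 5% it is about 79% within just 30 years, which is the sort of shift that has to feed through into premiums and reserves.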

    Consider Great storm of 1987 in UK

    Highest winds: 86 mph (139 km/h)
    Highest gusts: 134 mph (216 km/h)
    Lowest pressure: 953 hPa (mbar); 28.14 inHg

    Estimated insurance cost: £2 billion (£7.106 billion today)



    Storm Ciara (February 2020):
    One of the most powerful storms of the decade, causing significant damage and widespread disruptions.


    Storm Arwen (November 2021):
    A red alert was issued for winds, causing substantial damage and disruption.


    2024 Storms:
    Record-breaking payouts were recorded due to consistent bad weather throughout the year, leading to extensive damage to homes and businesses.

  • I would also throw in all the Human factors work of Reason, Dekker, Rasmussen, etc and the various cognitive psychology aspects.

    They all provide insights into how technical disasters can unfold (not forgetting to look at the recent power system failures, such as Iberia, Heathrow, etc.).

    1 Reason, J.: 'Human Error' (Cambridge University Press, 1990)
    2 Rasmussen, J., Pejtersen, A.M., Goodstein, L.P.: 'Cognitive Systems Engineering' (J. Wiley & Sons, 1994)
    3 Dekker, S.: 'The Field Guide to Understanding Human Error' (CRC Press, 2017)

  • Hi Steven,

    Well, that's a big question - but a very good one. In my experience (given that I've been working in this field for something over 30 years) there's remarkably little documented guidance that I've ever found, and what there is (for example, with apologies to Nancy Leveson!) can, I find, be very, very specific and, for many problems, over-complex. Yes, for complex systems, for software, and for human factors, best practice processes are important and valuable, but when training organisations in safety management I tend to focus much more on safety culture, because it's failures of that, whether in the development environment or in the application environment, that tend to actually result in accidents. And, conversely, if you have a good safety culture in the development team then everyone will be aware of the need to use those best practice processes anyway. But I've never actually found good guidance on safety culture (in the sense of functional safety) - if you find any I'd love to know so I can recommend it. The US Navy did fantastic work in this field in (IIRC) the 1980s/1990s; it might be worth looking that up, as it will still be relevant.

    For embedded software I do like the book "Embedded Software Development for Safety-Critical Systems" by Chris Hobbs, I find it very pragmatic.   

    Applying a top level Safety Target to a lower level system is driving me mad and STPA changes the approach to quantification of hazards to be far more sensible and manageable from a System level point of view.

    Now I'll admit I'd never heard of STPA before your post! I don't think it's made inroads into the rail safety world yet. (I've had a brief look, but I'll refrain from commenting further until I've looked into it a bit deeper.) However, regarding safety targets, the problem is that without them it is very hard to know when to stop spending money. It is always possible to spend more money to make a system safer; in the UK, ALARP / SFAIRP sets a legal principle that you keep spending money until the cost is "disproportionate" (the definition of which is a legal nightmare in itself), and trying to prove disproportion for every subsystem without having a safety target is...challenging. So we have the IEC 61508 approach of apportioning safety targets down and, when they can't be calculated (e.g. for software), assigning SILs. That all said, yes, it is complex, and in practice it doesn't happen anything like often enough. The poor sub-system supplier doesn't manage to get a safety target assigned to them because the system integrator hasn't had a safety target assigned to them, so people use SILs instead ("It'll kill lots of people if it fails so it's SIL 4") and then they reverse-engineer a hardware safety target from that. Which is not the intention at all, but we all do it (some of us through gritted teeth). Where safety targets are assigned properly they work really well - in my experience it's where there's a relatively short path from the piece of hardware you need a target for to the top level of the overall system, so they are always worth pushing for, but I agree sometimes you just have to concede that you are not going to be assigned a sensible one.
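    To make the apportionment point concrete, here is a minimal sketch (illustrative numbers only, and a naive equal split, which is just one possible allocation strategy, not a recommendation): take a top-level tolerable hazard rate, divide it among the subsystems, and see which IEC 61508 high-demand/continuous-mode SIL band each subsystem target lands in.

    ```python
    # Hypothetical, simplified illustration of safety target apportionment.
    # IEC 61508 high-demand / continuous mode bands, in dangerous failures per hour:
    # (SIL, lower bound inclusive, upper bound exclusive)
    SIL_BANDS = [
        (4, 1e-9, 1e-8),
        (3, 1e-8, 1e-7),
        (2, 1e-7, 1e-6),
        (1, 1e-6, 1e-5),
    ]

    def sil_for_target(pfh: float) -> str:
        """Return the SIL band that a per-hour dangerous-failure target falls into."""
        for sil, lo, hi in SIL_BANDS:
            if lo <= pfh < hi:
                return f"SIL {sil}"
        return "outside the SIL 1-4 bands"

    # Illustrative figures only: a system-level target of 1e-7 dangerous
    # failures per hour, naively split equally across four subsystems.
    system_target = 1e-7
    subsystems = ["sensing", "logic", "actuation", "comms"]  # hypothetical names
    per_subsystem = system_target / len(subsystems)          # 2.5e-8 per hour each

    for name in subsystems:
        print(f"{name}: target {per_subsystem:.1e} /h -> {sil_for_target(per_subsystem)}")
    ```

    In practice the split is rarely equal (it depends on the architecture, common-cause considerations and what each subsystem can realistically achieve), but even this crude version shows why the sub-system supplier struggles when no number ever comes down from the top.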

    (Not really relevant, but: I got the chief safety officer for a delivery project for a new railway line very angry when - as the Independent Safety Assessor - I pushed him to say what his safety target was. Eventually he said furiously "no-one is going to die on my railway". Which was a silly thing to say, of course people are going to die on it during its service life, and at least some of those deaths would be preventable by spending more money. For a start the railway had level crossings, and any railway with level crossings will have fatalities - by spending more on bridges they could have been removed. By saying "it's not going to happen" he signalled that he wasn't actually looking at this seriously enough. I've summarised of course, there were many other reasons why we were aware that he didn't have a clear view of what his risks were.) 

    One thing that is crucial when looking at guidance is that "best practice" varies wildly between industries, I believe you're in the military equipment world so it's important to seek best (or at least accepted) practice in that field. Of course we do all learn from each other, and ideally we would do FAR more of that, but in practice a process that would be considered essential for ALARP / SFAIRP in one industry can be considered OTT  (or even inappropriate) in another. 

    I will ask my Human Factors colleagues about guidance in that area as that's not my field, and I'll post here if they come up with anything. 

    I have threatened that when I retire in a year or two that I'll write a book about this (from a particular perspective, all safety engineers have particular angles, never trust anyone who says "there is only one way to engineer safely"), but whether I would actually get around to it when there's guitars to play and a shed to potter in I don't know!

    Again, really good questions,

    Thanks,

    Andy 

  • Hello Andy:

    In the latest IEEE Spectrum Magazine (April 2026) there is a very interesting article titled "AI Mistakes Are Very Different from Human Mistakes", which makes the point that new safety procedures need to be developed when dealing with AI-based systems.

    Quote "AI errors come at seemingly random times, without any clustering around particular topics . The mistakes tend to be more evenly distributed through the knowledge space; an LLM might be equally likely to make a mistake on a calculus question as it is to propose that cabbages eat goats.".  

    Peter Brooks

    Palm Bay FL

  • Applying a top level Safety Target to a lower level system is driving me mad

    As a systems engineer, I also see this problem in requirements flow-down where, at some point, the requirement becomes 'black box' and isn't easily decomposable to a white-box sub-requirement. EMC requirements are often like that. They try to add some coordinating roles but still haven't thought through what to actually test at the lower levels (e.g. PCB level). I did try some work on that, but no one bought into it; I expect the same will happen in the general case.

  • EMC requirements are often like that.

    Er....yes. The problem is that nobody knows! Because the answer is that the whole system must work together, but until each part of the system is engineered you don't know how much of an EMC risk it is (in either direction). I spent years working on EMC for safety-critical systems, and we quickly worked out that compliance to any standard was of no help at all in the safety case - having sat on the EN committee that drafted the standards for my industry, I was well aware how much "best guessing" went into the figures, which meant that compliance to the standard gave no guarantee that the real world wouldn't be harsher than that! Inevitable, really: if we'd written the standard for the worst case, the industry would have ended up massively over-engineering. So for EMC for safety-critical systems we just have to ensure that any EMC event (in or out of spec) causes the system to fail safe. But I'm lucky, I work in the rail industry where it's (relatively) safe to just stop the trains. I'm very glad I don't have to deal with EMC events in aerospace...

  • I am also starting to look into STPA (I'm a Tim Kelly disciple, so working on Nancy Leveson processes feels disloyal, lol). I have the STPA handbook (all 188 pages of it). Can anyone recommend any additional resources that would help develop capability in this area please? it seems that a step change in how we perceive Hazards and Risk is both required and inevitable.

    Ok, so I've had a merry evening reading the STPA handbook... 

    STPA Handbook (MIT-STAMP-001)

    Most of this approach seems to closely follow the standard V-cycle approach for hazard and risk analysis (as is actually shown in the handbook). The new feature it introduces appears to be the "model the control structure" stage to support identification of the hazards. Which is a really good idea, and I can see why it's been developed for (or appears to have been developed for) the aerospace examples given. It would work excellently for those. However, it does feel to me that this modelling stage (as it's described here) would be disproportionately expensive and complex for many safety systems? Remembering that the process of developing this model is itself subject to the same potential specification and implementation errors as those it is trying to determine in the development of the system (just as developing an FTA or FMEA is), to do it well enough is not cheap. But for systems as critical as those described here it is worth it.
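    For anyone who, like me, had not met it before: as I read it, the core of the "model the control structure" stage is to represent each control loop (controller, controlled process, control actions, feedback) explicitly and then ask, for every control action, the handbook's four standard "unsafe control action" questions. A minimal sketch, with entirely hypothetical names, might look like this:

    ```python
    # Toy illustration of turning a control structure into UCA analysis prompts.
    # The loop and its names are hypothetical; the four categories are the
    # standard unsafe-control-action types from the STPA handbook.
    from dataclasses import dataclass, field

    UCA_CATEGORIES = [
        "not provided when needed causes a hazard",
        "provided when it causes a hazard",
        "provided too early, too late, or in the wrong order",
        "stopped too soon or applied too long",
    ]

    @dataclass
    class ControlLoop:
        controller: str
        controlled_process: str
        control_actions: list = field(default_factory=list)
        feedback: list = field(default_factory=list)

        def uca_prompts(self):
            """Yield one analysis prompt per (control action, UCA category) pair."""
            for action in self.control_actions:
                for category in UCA_CATEGORIES:
                    yield f"{self.controller} -> {self.controlled_process}: '{action}' {category}?"

    loop = ControlLoop(
        controller="Flight crew",
        controlled_process="Autopilot",
        control_actions=["engage autopilot", "disengage autopilot"],
        feedback=["mode annunciation", "flight director cues"],
    )

    for prompt in loop.uca_prompts():
        print(prompt)
    ```

    That part is only a prompt generator, of course; the expensive bit is still getting the control structure itself right in the first place, which is where my proportionality concern above comes in.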

    Aside from that, yes, this handbook is very good guidance on good engineering safety management practice. It's not radical, but it is pretty much best practice. I'd say "pretty much" because what it doesn't appear to thoroughly address (or at least I couldn't find it except for a brief mention at the start of chapter 3) is applying this to agile development environments. The authors could well say that's out of scope, but it is really important to consider when proposing safety management methodologies - they need to consider how the methodology itself has a built-in allowance for constant changes to the project implementation, and indeed scope.

    Interesting,

    Thanks,

    Andy