From Digital Technology to Rail Systems: Reflections on Building Reliable, Safe Infrastructure

Hello everyone,

I’m pleased to join the EngX community and to start engaging in discussions here.

My background is in digital technology, cloud and data systems, alongside experience working in regulated, safety-critical environments, including rail and public-sector operations. Working across both digital systems and physical infrastructure has reinforced how critical reliability, safety, and clear standards are when technology moves from theory into real-world use.

I’ve been reflecting on how digital transformation, data, and systems thinking can better support large-scale infrastructure such as rail — particularly around system resilience, operational visibility, and risk reduction.

I’m keen to learn from practitioners with hands-on experience in rail, power, and infrastructure engineering, and to contribute where digital perspectives may add value.

For those working in infrastructure or rail systems:

  • What lessons have you learned when integrating digital technologies into long-life, safety-critical assets?
  • What advice would you give professionals aiming to bridge digital technology and traditional engineering disciplines effectively?

I look forward to learning from the community and contributing to future discussions.

Kind regards,

Mayor

Parents
  • The trouble is, my thorough answer would be the size of a book, because this is my day job: supporting those who are trying to bring novel solutions (which tend to be "digital") into safety critical rail systems, acting either as a consultant or as an ISA / Assessment Body.

    (Note: Personally I don't like this usage of the word "digital": train control systems have been digital since the 19th century - in fact, ironically, with ETCS etc. the overall implementation is now more "analogue" than it's ever been! But I guess we all know what you mean.)

    But as an overview:

    My biggest challenge is in taking technology development teams, who are delighted with their exciting new technology, several steps backwards to make sure that they have really thought about the consequences if their system fails, and the genuine likelihood that their system will fail in an unsafe way during its lifetime on the rail network. Everyone in the team, from the top to the bottom, has to grasp the concept that they are holding people's lives in their hands. Then they will start to get the idea that the statement "we've had it working in the lab" is the tiniest scrap of evidence - they've got to demonstrate that they have delivered a system which will operate safely every day, in every deployment, with every possible configuration of data, and allowing for human behaviour in operators' actions, for possibly 20 years or more.

    And they've got to accept that their system will fail, whether for reasons of "random" hardware failures, or because of (basically human) specification, design, or integration errors. What it has to do is fail safe. (Which is at least easier in rail than in aerospace. At least we have the option of stopping trains, with fewer safety implications than trying to stop a plane mid-flight - although our safety implications are not zero.) Whenever I hear a supplier say "our system is well proven and never fails" I know it's time to start running away fast...
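
    As a rough illustration of the "fail safe" idea above, here is a minimal Python sketch. It is nothing like a real interlocking, and every name in it (`Aspect`, `TrackReport`, `decide_aspect`, `MAX_AGE_S`) is hypothetical. The only point it makes is structural: the permissive output is the one that has to be positively earned, and anything missing, stale, or malformed falls through to the restrictive output.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Aspect(Enum):
    STOP = "stop"        # restrictive state: the safe default
    PROCEED = "proceed"  # permissive state: must be positively justified


@dataclass
class TrackReport:
    """Hypothetical report from a trackside detection system."""
    section_clear: bool
    age_s: float  # seconds since the report was produced


# Assumed freshness limit, chosen only for this illustration.
MAX_AGE_S = 2.0


def decide_aspect(report: Optional[TrackReport]) -> Aspect:
    """Return PROCEED only when every check passes; otherwise STOP."""
    if report is None:
        # No data at all is treated as a failure, not as "nothing wrong".
        return Aspect.STOP
    if not (0.0 <= report.age_s <= MAX_AGE_S):
        # Written this way so that a NaN age also fails the check.
        return Aspect.STOP
    if report.section_clear is not True:
        # Anything other than an explicit True is treated as "not clear".
        return Aspect.STOP
    return Aspect.PROCEED
```

    Note that there is no branch in which an error leads to `PROCEED`. A real system would need far more than this (diverse inputs, hardware failure modes, timing, and a documented safety argument), but the same shape tends to appear in the architecture.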

    So that's the talking down, then comes the talking up: we know how to do all this. We have really good standards and processes that allow novel digital (ouch) systems to be introduced. What is good about the standards is that they are not narrowly prescriptive from a technical point of view: they are not of the form "use only this safe subset of this obscure programming language", so they allow technology to develop. The crucial aspect is embedding professional specification, design and development processes through the team (and this has to go right to the top). My experience is that companies that are already experienced in producing high reliability software find few problems in going the few extra steps to achieve compliance with rail safety standards. And in fact, once they understand why the standards say what they do, they will really embrace those extra processes as producing a better quality system.

    The struggle of course comes with companies who are used to the "build it, ship it, develop the next version" mentality. They have to appreciate that if their system has safety implications then that approach does not work in the rail industry (or indeed any safety critical industry). Most of the work has to come at the front end, determining problems before you start, so you build safety into the design - often most effectively by building it into the architecture. This does NOT mean that you can't follow agile and similar processes, but it does mean that each change needs to be analysed to see if it has negatively affected the safety mitigations that had previously been built in.

    So, another vital top level point comes out of that: you cannot bolt safety on as an extra onto complex integrated systems. Personally my strongly held view on this (although others disagree) is that the safety design needs to be embedded into the functional design. Basically, the specification, design, integration and testing teams need to be aware of the safety implications as they are making their functional decisions - they will then themselves raise an alert if they think a decision they have made needs a wider analysis to ensure that it does not have unexpected consequences. Having an independent "safety" team, in my experience, does not work in these types of systems - they can't have the depth of understanding of the design and the design decisions to spot technically driven problems. However, you will normally need an engineering safety management team who can work with the engineering team to make sure they understand the standards and regulations, and make sure the documents (and there will be a lot of those!) comply with those standards.

    The final critical issue I can think of at the moment is that technology development teams very, very rarely understand the environment that their systems will go into. From the earliest stages, and throughout the development process, "real world" operators, maintainers, and similar staff must be involved and must be listened to. This will frequently be frustrating, but it's a crucial part of the process to turn the development team from an "internal" view of their flashy ground-breaking technology to an "external" view of how the actual users are going to perceive and interact with it. If there is an accident, then saying "the operator shouldn't have done that" is only an excuse if it can be shown that the operator was deliberately negligent. What we see far more commonly is that operators keep feeding back that they find the system confusing, and then one day there is an accident. In those situations it is those that didn't respond to that feedback who were negligent - and in the UK and the majority of other countries this would be criminally negligent. 

    Two last thoughts:

    Worldwide there is remarkably little money in the rail industry. When I ran an R&D team for a rail equipment supply company I always used to say to sales reps "what we need are your military spec components at commercial prices. And we're only going to buy 100 per year." The level of integrity we are trying to achieve, for the cost that governments or private investors are willing to pay, is extraordinary. The answer is clever engineering - don't just throw technology at the problem, find elegant solutions.

    And partly because of those cost issues, the rail industry is famously slooooooow in the uptake of new technology. Literally, projects that could be turned round in, say, a year in the automotive world (for example) can take 10 years in the rail industry. Which of course is why railways are extremely safe. And on the positive side, once you have a product or system in then it's in: there's a lot of up front pain, but then there's potential for very long term rewards.

    Obviously, as mentioned at the top, I could (and do in the day job) go for the length of a book about this, but hopefully that's a useful introduction and context.

    Thanks,

    Andy

  • P.S. Not really related, but back to my note / whinge at the top - the real push forward would be to promote "analogue" systems for the rail industry! I don't mean analogue computers of course, any more than this use of the word "digital" means the use of digital computers. (Which even in the famously slow to change rail industry is hardly groundbreaking! Even we've had them for a very long time now.) 

    Many of the big control challenges in the rail industry are due to the very digital functionality of the control systems. The major example of course is block signalling: huge areas of "dead" space on the rail network because a train is either allowed in the block section or it is not. So ironically what the "digital" ETCS system is (slowly) introducing is analogue functionality, the spacing between trains being able to be continuously variable, proportional to train speed and stopping distance.
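
    To put rough numbers on that contrast, here is a small Python sketch. It assumes constant deceleration, so stopping distance is v^2 / (2a), plus a fixed safety margin; the function names, the figures, and the simplified block model (separation rounded up to whole blocks) are all illustrative assumptions, not how ETCS or any real signalling scheme calculates movement authorities.

```python
import math


def braking_distance_m(speed_mps: float, decel_mps2: float) -> float:
    """Stopping distance at constant deceleration: v^2 / (2a)."""
    if decel_mps2 <= 0:
        raise ValueError("deceleration must be positive")
    return speed_mps ** 2 / (2.0 * decel_mps2)


def continuous_separation_m(speed_mps: float, decel_mps2: float,
                            margin_m: float) -> float:
    """'Analogue' case: separation varies continuously with speed."""
    return braking_distance_m(speed_mps, decel_mps2) + margin_m


def fixed_block_separation_m(speed_mps: float, decel_mps2: float,
                             margin_m: float, block_length_m: float) -> float:
    """'Digital' case: the same need, rounded up to whole blocks."""
    needed = continuous_separation_m(speed_mps, decel_mps2, margin_m)
    return math.ceil(needed / block_length_m) * block_length_m


if __name__ == "__main__":
    # Illustrative values only: 1.0 m/s^2 braking, 100 m margin, 1200 m blocks.
    for speed in (10.0, 20.0, 40.0):
        cont = continuous_separation_m(speed, 1.0, 100.0)
        fixed = fixed_block_separation_m(speed, 1.0, 100.0, 1200.0)
        print(f"{speed:5.1f} m/s: continuous {cont:7.1f} m, fixed block {fixed:7.1f} m")
```

    The gap between the two figures at each speed is the "dead" space: in this toy model it is largest at low speeds, where a slow train still reserves a whole block.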

    Similarly, one of the many areas I work in is remote fault reporting of trackside equipment, trying to help organisations move from "digital" monitoring (it works or it doesn't) to far more useful "analogue" monitoring (it's ok at the moment, but it's trending towards failure, schedule in a repair team in an overnight possession before it breaks and causes delays). 
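
    The difference between the two kinds of monitoring can be sketched in a few lines of Python. This is a deliberately simple illustration: it fits a straight line to recent readings by least squares and estimates when the line would cross an alarm level. The names (`fit_trend`, `assess`), the status strings, and the idea that the measured quantity rises towards failure (say, a motor current) are assumptions for the example, not a description of any particular monitoring product.

```python
from typing import Sequence, Tuple


def fit_trend(times: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired readings")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("readings must span more than one time point")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def assess(times: Sequence[float], values: Sequence[float],
           alarm_level: float, horizon: float) -> str:
    """Classify an asset as 'failed', 'schedule_repair', or 'ok'."""
    if values[-1] >= alarm_level:
        # This is all that "it works or it doesn't" monitoring can report.
        return "failed"
    slope, intercept = fit_trend(times, values)
    if slope <= 0:
        return "ok"  # flat or improving: no predicted crossing
    time_to_alarm = (alarm_level - intercept) / slope - times[-1]
    return "schedule_repair" if time_to_alarm <= horizon else "ok"


if __name__ == "__main__":
    days = [0, 1, 2, 3, 4]
    current_amps = [2.0, 2.2, 2.4, 2.6, 2.8]  # made-up readings
    print(assess(days, current_amps, alarm_level=4.0, horizon=14))
```

    A threshold-only monitor would say nothing about those readings until the alarm level was reached; the trend version flags them while there is still time to book a possession. Real condition monitoring would also have to cope with noise, seasonal effects, and non-linear degradation.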

    I don't suppose it's an argument I'd ever win, but quite seriously, if I was launching an innovative rail control or management system I can very much imagine that I would be marketing its benefits as being "analogue" functionality as opposed to the old fashioned limited "digital" functionality! I do find this generic use of the word "digital" to mean "through the use of technology" redundant. Personally I would say that solutions are either technology based systems or human based systems - which in fact is the sort of terminology I do use in rail safety arguments. I would only use "digital" to refer to digital electronics where it specifically needs to be differentiated from analogue electronics, or, as in the above examples, to be clear that the system only has a defined number of functional states (mostly, where I'd use it, that would be two states). Subject for a separate thread though, and I accept that it's probably a lost cause through the evolution of the English language.
