Retrofitting Legacy Control Systems to Tackle Evolving OT Cyber Threats

Hi everyone,

I’m new to the EngX community and looking forward to learning from you all. I’d like to start a conversation about something I think many of us face: updating legacy control systems in power plants and other critical infrastructure, especially in the face of growing OT cyber threats.

Many of these systems were designed decades ago with reliability in mind but little thought given to cybersecurity. Today, they’re exposed to risks that weren’t imagined back then. The challenge is finding a way to retrofit these systems efficiently, without tearing everything apart or causing long periods of downtime.

In the UK, where our energy and infrastructure systems are heavily relied upon, even a small disruption can create big problems. So how do we make these updates both secure and practical?

I’m particularly interested in hearing how others have approached efficient retrofitting: what worked, what didn’t, and how you balanced the iron triangle of cost, time and scope without letting quality slip. Are there certain strategies or tools that helped modernize your systems without overhauling them completely?

Would love to hear your thoughts and experiences.

Thanks,

Taimur | MIET 

  • Control systems many decades old are likely entirely immune to cyber threats - being simpler electronic or electromechanical systems with little or no "smart" components and certainly no internet connection. Therein lies a lesson, I think. As for existing, more vulnerable systems, building rings of security around them - e.g. either physically separating them from the internet completely or at least hiding them behind multiple layers of firewalls or similar, together with physical access security - will keep the majority of the black hats out. Nothing's ever 100% secure though - as has been demonstrated relatively recently in the Middle East, with enough time, money, effort and determination, anything can be infiltrated - even if it means going back to the original equipment manufacturer, where destructive components can be incorporated before the end users even take possession of them.

       - Andy.

  • Hi everyone, and thank you for starting this important discussion.

    As someone working in the ICS/OT domain, I’ve seen first-hand how challenging it is to modernise legacy control systems in critical infrastructure—especially in sectors like power generation, where uptime, safety, and compliance are non-negotiable.

    You're absolutely right—many of these systems were built for reliability and longevity, not cybersecurity. But today, with increasing OT cyber threats and growing interconnectivity, we can't afford to ignore the risks. That said, a full system overhaul isn’t always feasible. I’ve found that successful retrofitting lies in balancing risk reduction with practical constraints like time, cost, and operational disruption.

    Here are a few approaches I’ve seen work in practice:

    - Risk-based retrofits using tools like Cyber-PHA or CyberHAZOP to prioritise high-impact upgrades (see the sketch after this list).
    - Network segmentation and DMZs to isolate legacy equipment from enterprise IT and internet-connected systems.
    - Compensating controls such as protocol-aware intrusion detection, application whitelisting on HMIs, and read-only historian interfaces.
    - Secure remote access using jump servers with multi-factor authentication, session recording, and time-bound permissions.
    - Standards-based frameworks like IEC 62443 and NCSC’s Cyber Assessment Framework (CAF) to structure retrofit plans and align with regulatory expectations.
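
    To make the first point concrete, here is a very rough Python sketch of the kind of consequence-times-likelihood scoring a Cyber-PHA style review might feed into. The asset names and scores below are invented purely for illustration, not taken from any real assessment:

      import dataclasses

      @dataclasses.dataclass
      class Asset:
          name: str
          consequence: int   # 1 (minor upset) to 5 (loss of generation / safety impact)
          likelihood: int    # 1 (isolated, hardened) to 5 (flat network, remotely reachable)

      def retrofit_priority(assets):
          """Rank assets by unmitigated risk so the highest-impact retrofits come first."""
          return sorted(assets, key=lambda a: a.consequence * a.likelihood, reverse=True)

      # Hypothetical inventory from a site survey
      inventory = [
          Asset("1991 turbine governor PLC", consequence=5, likelihood=3),
          Asset("Legacy HMI on the plant LAN", consequence=3, likelihood=5),
          Asset("Standalone vibration monitor", consequence=2, likelihood=1),
      ]

      for asset in retrofit_priority(inventory):
          print(f"{asset.consequence * asset.likelihood:>2}  {asset.name}")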

    One strategy that’s worked particularly well is the “wrapper” approach—layering modern protections and interfaces around legacy assets, allowing phased upgrades and limiting downtime. Conversely, what hasn't worked well is trying to lift-and-shift IT tools into OT environments without accounting for latency, determinism, or vendor lock-in.
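
    As an illustration of what the simplest form of that wrapper can look like, below is a hedged Python sketch of a read-only Modbus/TCP filter placed in front of a legacy device: it forwards only the Modbus read function codes and drops anything that could write to the controller. The addresses and ports are placeholders, and a real deployment would use a hardened, vendor-supported gateway rather than a script like this:

      import socket
      import threading

      LEGACY_DEVICE = ("192.168.10.20", 502)          # placeholder address of the legacy PLC
      LISTEN_ADDR = ("0.0.0.0", 1502)                 # where read-only clients connect
      READ_FUNCTION_CODES = {0x01, 0x02, 0x03, 0x04}  # Modbus coil/register reads only

      def recv_exact(sock, n):
          """Read exactly n bytes, or return b'' if the peer closes the connection."""
          data = b""
          while len(data) < n:
              chunk = sock.recv(n - len(data))
              if not chunk:
                  return b""
              data += chunk
          return data

      def read_frame(sock):
          """Return one Modbus/TCP frame (7-byte MBAP header plus PDU), or b'' on close."""
          header = recv_exact(sock, 7)
          if not header:
              return b""
          length = int.from_bytes(header[4:6], "big")  # bytes remaining after the length field
          body = recv_exact(sock, length - 1)          # unit id was already part of the header
          return header + body

      def handle_client(client_sock):
          with client_sock, socket.create_connection(LEGACY_DEVICE) as device_sock:
              while True:
                  request = read_frame(client_sock)
                  if len(request) < 8:
                      break
                  function_code = request[7]
                  if function_code not in READ_FUNCTION_CODES:
                      print(f"Blocked non-read function code 0x{function_code:02x}")
                      break                            # drop the session rather than forward a write
                  device_sock.sendall(request)
                  response = read_frame(device_sock)
                  if not response:
                      break
                  client_sock.sendall(response)

      def main():
          with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
              server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
              server.bind(LISTEN_ADDR)
              server.listen()
              while True:
                  conn, _ = server.accept()
                  threading.Thread(target=handle_client, args=(conn,), daemon=True).start()

      if __name__ == "__main__":
          main()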

    I'd be really interested to hear from others here:

    • Have you used similar strategies, or different ones that worked better?

    • What lessons have you learned in terms of balancing security, cost, and uptime during upgrades?

    - Simha

  • If it were up to me, I would seriously consider banning all microprocessor relays from switchgear feeding critical infrastructure - or at least mandate electromagnetic back-up relays.

    Anything that contains a microchip is subject to supply chain attacks. For example, imagine a hidden backdoor in the chip that renders it inoperable after x years of use. If this were to be installed without detection, it could cause total chaos given the highly centralized supply of such devices.

    We are constantly being told by our own government that our threat model should include state actors. State actors are more than capable of inserting hardware back doors into devices.

    The fact is that older, simpler control systems - particularly if air-gapped - have a much smaller attack surface and are therefore much more resilient to cyber threats.

  • Hi everyone,


    I’ve worked on retrofitting a power plant originally commissioned in 1991, which we upgraded in 2023 to include remote monitoring and control from a central control room. The project involved integrating two different vendor control systems, and I’d like to share some high-level lessons learned that may help others facing similar challenges:

    1. Early Stakeholder Engagement & Site Surveys
    Engaging stakeholders early (especially OT and cybersecurity teams) and conducting multiple site surveys proved invaluable. These steps helped us gain a technical understanding of the legacy systems and identify potential risks upfront. Setting expectations early with all parties helped streamline decision-making.

    2. Structured Engineering Approach Using IEC 61508
    We followed a gated verification and validation model inspired by IEC 61508, even for OT systems. This included:

    - Getting architecture drawings and Functional Design Specifications (FDS) approved early.
    - Progressing through detailed design and testing stages in a structured manner.

    This approach helped manage risks and ensured clarity in networking and system integration.

    3. Safety Systems: Keep It Simple and Isolated
    For safety-critical systems, simplicity and isolation are key. We avoided complex or “smart” relays and emphasized regular proof testing to maintain reliability and reduce cybersecurity exposure.

    4. Pre-Outage Testing
    Conducting the majority of tests before the outage is always a success factor. This included:

    - Network compatibility testing.
    - Installing network interfaces outside the control system scope early.
    - Using virtual machines to simulate and validate connectivity.

    This allowed the customer to perform penetration testing well in advance, reducing surprises during commissioning (a minimal connectivity-check sketch follows this list).

    5. Contractor Engagement
    If third-party contractors are involved, include them in site surveys. This improves their understanding of the project scope and helps in obtaining more accurate and competitive quotes.
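
    On point 4, the pre-outage connectivity checks we ran from the simulation VMs were essentially along the lines of the Python sketch below; the hostnames, ports and pass criteria shown are placeholders rather than the actual project endpoints:

      import socket

      # Placeholder endpoints the retrofit needed to reach once the new interfaces were in
      ENDPOINTS = [
          ("historian.example.local", 5450),
          ("central-control-gw.example.local", 102),   # e.g. an IEC 61850 MMS gateway
          ("jump-server.example.local", 443),
      ]

      def check_tcp(host, port, timeout=3.0):
          """Return True if a TCP connection to host:port succeeds within the timeout."""
          try:
              with socket.create_connection((host, port), timeout=timeout):
                  return True
          except OSError:
              return False

      if __name__ == "__main__":
          for host, port in ENDPOINTS:
              status = "OK  " if check_tcp(host, port) else "FAIL"
              print(f"{status} {host}:{port}")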

    Most of the key OT considerations have already been well covered in this thread, but I hope these additional insights help others planning similar upgrades. Happy to discuss further if anyone has questions.

  • Hi,

    Very interesting post.

    What would you make of the option below?

    The MQTT protocol is the only thing flowing into the IT network via the data diode. 

    Note: of course we need to have specific hardware to translate MMS (Manufacturing Message Specification) to MQTT for things like IEDs.

    [Diagram: OT_IT]

    Updates are done via an air-gapped cloud infrastructure (on a separate connection only open at specific time slots). Basically, we effectively "trip the breaker" between our office environment and our power controls.
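
    For context, a minimal sketch of what the OT-side publisher can look like, using the paho-mqtt client (1.x-style constructor shown). The broker address here stands in for the send-side proxy of the diode, and the topic, field names and values are placeholders rather than our actual configuration:

      import json
      import time
      import paho.mqtt.client as mqtt

      # Placeholder: the diode's send-side proxy looks like an ordinary MQTT broker to us
      DIODE_PROXY = ("10.20.0.1", 1883)
      TOPIC = "site/substation-a/feeder-07/measurements"

      client = mqtt.Client(client_id="ot-gateway-01")   # paho-mqtt 1.x style constructor
      client.connect(*DIODE_PROXY)
      client.loop_start()

      # One reading, already translated from MMS by the protocol-conversion hardware
      reading = {"ied": "feeder-07", "quantity": "phase_a_current", "value": 412.3, "ts": time.time()}

      # QoS 0: fire-and-forget, since nothing can come back across the diode itself
      client.publish(TOPIC, json.dumps(reading), qos=0)

      client.loop_stop()
      client.disconnect()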

    Cheers,

  • Hmm. I rather like the idea of a "data diode" but I'm not sure how it would work in practice. Just about any reliable protocol I can think of has some means of confirming safe arrival of the data (and therefore triggering re-transmission if it didn't), and therefore needs a reverse data flow of some kind in order to operate at all. Once there's a reverse path in existence, there's the opportunity to use it for mischievous means.

    MQTT presumably runs over some other protocol (often TCP/IP, but it could be others I suppose), which means you're also at the mercy of any vulnerabilities in that layer of software (and all those below it). Hopefully the days of crude memory injection simply by passing oversized packets are over, but vulnerabilities in such layers are still far from unknown. Never assume that just because something was intended to operate in some way, it can't be "persuaded" to do something quite different.

       - Andy.

  • Usually, 'professional' data diodes are configured per protocol, much like firewalls, and effectively spoof the acknowledgements to the sender while keeping a local copy until the real ack is received, just in case a resend is needed after all - much as is done for things like satellite links, where the latency is very long and the buffering over the network at the end points would otherwise be restrictive.

    The acks the sender gets are then not actually those coming in over the wider network which are just used to manage the buffer state.

    Mike

  • Hello,

    Data diode solutions use proxies on both the send and receive sides to satisfy the transport layer (i.e. TCP connection) requirements by responding with the appropriate protocol messages (acks, nacks, etc.) to the endpoints on both source and destination networks. After the send side proxy terminates the TCP session, the payload is extracted from the packets and transported across the diode. On the receive side, a new TCP session is initiated, and the packets are sent to the destination endpoint. In this way, the source (OT) side remains invisible to the external networks and endpoints but data is able to flow from the OT network to the IT network.

    How does the source network know the destination received the data?

    The source can’t ask the destination if the data was received; the destination needs to be able to determine, on its own, whether it received all the data. In our case the data diodes provide an internal validation mechanism to achieve this. The source side calculates a running hash value that is inserted in each packet so that the destination can verify whether any transmission problems exist.
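
    Purely to illustrate the idea, here is a rough Python sketch of that kind of running-hash check; the framing and field sizes are invented for the example and are not the vendor's actual format:

      import hashlib

      def frame_payloads(payloads):
          """Send side: prefix each payload with a hash chained over everything sent so far."""
          running = hashlib.sha256()
          for payload in payloads:
              running.update(payload)
              # 32-byte running digest + 4-byte length + payload (illustrative framing only)
              yield running.digest() + len(payload).to_bytes(4, "big") + payload

      def verify_stream(frames):
          """Receive side: rebuild the chain independently and flag any loss or corruption."""
          running = hashlib.sha256()
          for i, frame in enumerate(frames):
              digest, length = frame[:32], int.from_bytes(frame[32:36], "big")
              payload = frame[36:36 + length]
              running.update(payload)
              if running.digest() != digest:
                  raise ValueError(f"integrity failure at frame {i}: data lost or corrupted upstream")
              yield payload

      # Example: three telemetry payloads survive the one-way transfer intact
      sent = list(frame_payloads([b"V=11.02kV", b"I=412.3A", b"f=49.98Hz"]))
      for payload in verify_stream(sent):
          print(payload.decode())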

    Cheers,

  • Indeed, but physically you've still got a single "box" that's capable of bi-directional communication on both sides, relying on just the software not to pass data in a particular fashion - and software can sometimes be compromised (either bugs or by malicious acts). I'm not saying it's not a useful defence - just not something to rely on as your only barrier. More part of an overall "defence in depth" approach (in the same way that firewalls would only be one layer of defence in a proper system).

       - Andy.

  • Ah, if there is a physical one-way valve (e.g. an opto-isolator) and it's not a common system fore and aft of that (e.g. the same processor writing to and reading from the opto path), then that would provide more reassurance. But still, multiple layers of defence are often more reliable. Single points can and do fail.

       - Andy.