Retrofitting Legacy Control Systems to Tackle Evolving OT Cyber Threats

Hi everyone,

I’m new to the EngX community and looking forward to learning from you all. I’d like to start a conversation about a challenge I think many of us face: updating legacy control systems in power plants and other critical infrastructure, especially in the face of growing OT cyber threats.

Many of these systems were designed decades ago with reliability in mind but little thought given to cybersecurity. Today they’re exposed to risks that weren’t imagined back then. The challenge is finding a way to retrofit them efficiently, without tearing everything apart or causing long periods of downtime.

In the UK, where our energy and infrastructure systems are heavily relied upon, even a small disruption can create big problems. So how do we make these updates both secure and practical?

I’m particularly interested in hearing how others have approached efficient retrofitting: what worked, what didn’t, and how you balanced the iron triangle of cost, time, scope and quality. Are there particular strategies or tools that helped you modernise your systems without overhauling them completely?

Would love to hear your thoughts and experiences.

Thanks,

Taimur | MIET 

  • Hi everyone, and thank you for starting this important discussion.

    As someone working in the ICS/OT domain, I’ve seen first-hand how challenging it is to modernise legacy control systems in critical infrastructure—especially in sectors like power generation, where uptime, safety, and compliance are non-negotiable.

    You're absolutely right—many of these systems were built for reliability and longevity, not cybersecurity. But today, with increasing OT cyber threats and growing interconnectivity, we can't afford to ignore the risks. That said, a full system overhaul isn’t always feasible. I’ve found that the key to successful retrofitting lies in balancing risk reduction with practical constraints like time, cost, and operational disruption.

    Here are a few approaches I’ve seen work in practice:

    🔹 Risk-based retrofits using tools like Cyber-PHA or CyberHAZOP to prioritise high-impact upgrades (see the sketch after this list).
    🔹 Network segmentation and DMZs to isolate legacy equipment from enterprise IT and internet-connected systems.
    🔹 Compensating controls such as protocol-aware intrusion detection, application whitelisting on HMIs, and read-only historian interfaces.
    🔹 Secure remote access using jump servers with multi-factor authentication, session recording, and time-bound permissions.
    🔹 Standards-based frameworks like IEC 62443 and the NCSC Cyber Assessment Framework (CAF) to structure retrofit plans and align with regulatory expectations.
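
    To make the first point concrete, here is a minimal sketch of the kind of prioritisation that falls out of a Cyber-PHA-style review. The assets and scores are entirely hypothetical, and a real study uses structured worksheets rather than a few lines of Python:

        # Hypothetical Cyber-PHA-style prioritisation: rank retrofit
        # candidates by likelihood x consequence so the budget goes to
        # the highest-impact upgrades first.
        candidates = [
            # (asset, likelihood 1-5, consequence 1-5)
            ("Unit 2 turbine governor HMI", 4, 5),
            ("Legacy engineering workstation", 5, 3),
            ("Historian read-only interface", 2, 2),
        ]

        ranked = sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)
        for asset, likelihood, consequence in ranked:
            print(f"{asset}: risk score {likelihood * consequence}")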

    One strategy that’s worked particularly well is the “wrapper” approach—layering modern protections and interfaces around legacy assets, allowing phased upgrades and limiting downtime. Conversely, what hasn't worked well is trying to lift-and-shift IT tools into OT environments without accounting for latency, determinism, or vendor lock-in.
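
    As a rough illustration of the wrapper idea, the sketch below puts a protocol-aware shim in front of a legacy Modbus/TCP device, passing read requests through and dropping writes. The addresses are hypothetical, and a production gateway would need proper session handling and fail-safe behaviour:

        import socket

        # Sketch of a "wrapper": a shim in front of a legacy Modbus/TCP PLC
        # that forwards read-only function codes and silently drops the rest.
        READ_ONLY_CODES = {0x01, 0x02, 0x03, 0x04}  # coil and register reads

        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.bind(("0.0.0.0", 502))
        listener.listen(1)

        while True:
            client, _ = listener.accept()
            request = client.recv(260)  # max Modbus/TCP ADU size
            # The function code sits at byte 7 of the ADU, after the MBAP header.
            if len(request) > 7 and request[7] in READ_ONLY_CODES:
                plc = socket.create_connection(("192.168.10.20", 502))  # legacy PLC (hypothetical)
                plc.sendall(request)
                client.sendall(plc.recv(260))
                plc.close()
            client.close()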

    I'd be really interested to hear from others here:

    • Have you used similar strategies, or different ones that worked better?

    • What lessons have you learned in terms of balancing security, cost, and uptime during upgrades?

    - Simha

  • Hi,

    Very interesting post.

    What would you make of the option below?

    The MQTT protocol is the only thing flowing into the IT network via the data diode. 

    Note: of course we need to have specific hardware to translate MMS (Manufacturing Message Specification) to MQTT for things like IEDs.
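
    For what it's worth, a minimal sketch of the publishing side might look like this. I'm assuming the paho-mqtt client library, and the broker address, topic, and payload are illustrative only; QoS 0 suits a one-way link, since nothing downstream can be waited on anyway:

        import json, time
        import paho.mqtt.client as mqtt  # assuming the paho-mqtt library

        client = mqtt.Client()  # paho-mqtt 1.x constructor; 2.x also takes a callback API version
        client.connect("diode-tx-proxy.ot.example", 1883)  # send-side diode proxy (hypothetical)
        client.loop_start()

        # One MMS-derived IED reading, re-published as MQTT. QoS 0 keeps it
        # fire-and-forget, which is all a one-way link can honour anyway.
        reading = {"ied": "feeder-07", "v_rms_kV": 11.02, "ts": time.time()}
        info = client.publish("substation/feeder-07/measurements", json.dumps(reading), qos=0)
        info.wait_for_publish()
        client.loop_stop()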

    [Attached diagram: OT_IT]

    Updates are done via an air-gapped cloud infrastructure (on a separate connection that is only open at specific time slots). Basically, we "trip the breaker" between our office environment and our power controls.

    Cheers,

  • Humm. I rather like the idea of a "data diode" but I'm not sure how it would work in practice. Just about any reliable protocol I can think of has some means of confirming safe arrival of the data (and therefore triggering re-transmission if it didn't) - therefore needs a reverse data flow of some kind in order to operate at all. Once there's a reverse path in existence, there's the opportunity to use it for mischievous means.

    MQTT presumably runs over some other protocol (often TCP/IP, but it could be others, I suppose) - which means you're also at the mercy of any vulnerabilities in that layer of software (and all those below it). Hopefully the days of crude memory injection simply by passing oversized packets are over, but vulnerabilities in such layers are still far from unknown. Never assume that just because something was intended to operate in some way, it can't be "persuaded" to do something quite different.

       - Andy.


  • Usually 'professional' data diodes are configured per protocol, much like firewalls, and effectively spoof the acknowledgements to the sender while keeping a local copy until the real ack is received, just in case a resend is needed after all. It's much like what you do for satellite links, where the latency is very long and the buffering at the endpoints would otherwise be restrictive.

    The acks the sender gets are therefore not the ones coming in over the wider network; those are just used to manage the buffer state.

    Mike

  • Hello,

    Data diode solutions use proxies on both the send and receive sides to satisfy the transport layer (i.e. TCP connection) requirements by responding with the appropriate protocol messages (acks, nacks, etc.) to the endpoints on both source and destination networks. After the send side proxy terminates the TCP session, the payload is extracted from the packets and transported across the diode. On the receive side, a new TCP session is initiated, and the packets are sent to the destination endpoint. In this way, the source (OT) side remains invisible to the external networks and endpoints but data is able to flow from the OT network to the IT network.
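
    A heavily simplified sketch of the send-side half might look like this: the source's TCP session terminates at the proxy, so the acks the source sees come from the proxy's own TCP stack, and the payload is pushed onward over UDP, which needs no return path. Addresses are made up, and a real diode does far more buffering and bookkeeping than this:

        import socket

        # Simplified send-side proxy: terminate the sender's TCP session locally
        # (the acks it sees come from this host's own TCP stack), then push the
        # payload over UDP, which needs no return path across the diode.
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.bind(("0.0.0.0", 9000))
        listener.listen(1)

        one_way = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        DIODE_TX = ("10.0.0.2", 9001)  # transmit-side interface (hypothetical)

        conn, _ = listener.accept()
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            one_way.sendto(chunk, DIODE_TX)  # fire-and-forget; the receive proxy re-originates TCP
        conn.close()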

    How does the source network know the destination received the data?

    The source can’t ask the destination whether the data was received, so the destination needs to be able to determine, on its own, whether it received all the data. In our case the data diodes provide an internal validation mechanism to achieve this: the source side calculates a running hash value that is inserted in each packet, so the destination can detect any transmission problems.
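
    A stripped-down version of that framing scheme might look like the sketch below; the frame layout and field sizes are illustrative rather than what our diodes actually do:

        import hashlib
        import struct

        class DiodeSender:
            """Appends a running SHA-256 digest to each frame so the receiver
            can detect loss or corruption without any return channel."""
            def __init__(self):
                self.running = hashlib.sha256()

            def frame(self, payload: bytes) -> bytes:
                self.running.update(payload)
                return struct.pack("!I", len(payload)) + payload + self.running.digest()

        class DiodeReceiver:
            def __init__(self):
                self.running = hashlib.sha256()

            def check(self, frame: bytes) -> bytes:
                (length,) = struct.unpack("!I", frame[:4])
                payload, digest = frame[4:4 + length], frame[4 + length:]
                self.running.update(payload)
                if self.running.digest() != digest:
                    raise ValueError("missing or corrupted frame detected upstream")
                return payload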

    Cheers,

  • Indeed, but physically you've still got a single "box" that's capable of bi-directional communication on both sides, relying on just the software not to pass data in a particular fashion - and software can sometimes be compromised (either bugs or by malicious acts). I'm not saying it's not a useful defence - just not something to rely on as your only barrier. More part of an overall "defence in depth" approach (in the same way that firewalls would only be one layer of defence in a proper system).

       - Andy.

  • Ah, if there is a physical one-way valve (e.g. an opto-isolator) and it's not a common system fore and aft of it (e.g. the same processor writing to and reading from the opto path), then that would provide more reassurance. But still, multiple layers of defence are often more reliable. Single points can and do fail.

       - Andy.