Using IET Experts to identify the root causes of serious public and private "Denial of Service" IT Problems

Over this past weekend the British public was subjected to a major "Denial of Service" by Barclays Bank.

One public observer reviewing the ongoing situation noted: "At least it appears it wasn't caused by hostile actors! But what if it had been?"

The IET organization keeps on stressing its expertise, but never seems (publicly at least) to be actively involved in the long-term solutions to these engineering problems.

What is required is an IET-sponsored organization that can provide a group of independent experts who will review such major events (those lasting over 24 hours) and, within 6 weeks, report back to the government (and the public) with potential long-term corrective recommendations.

Peter Brooks

Palm Bay FL

AJJewsbury over 1 year ago

I'm not sure the Barclays situation should be classified as a "denial of service" - that phrase is usually used to mean a particular type of attack (usually flooding legitimate access points with a large volume of traffic ... a tactic that I'm told has its roots in the US civil rights movement of the 1960s, where companies with segregationist policies were targeted with large volumes of low-value customer transactions, making normal trade impossible). The information currently in the public domain suggests Barclays suffered from a simple "failure of service".

In terms of available experts - the UK already has the National Cyber Security Centre and several standards to work to and be audited against (e.g. ISO 27001).

I suspect the underlying problem is large complicated systems (often with lots of legacy components) and the endurance of human error.

- Andy.

Peter Brooks over 1 year ago in reply to AJJewsbury

Hello Andy:

It is not unusual for a phrase to be repurposed years later when another significant event happens - in this situation I think "denial of service" is a more than adequate statement.

In one part of your reply you stated that it was a "SIMPLE" FAILURE OF SERVICE, while you followed with the suggestion that it could be due to BEING a "LARGE COMPLICATED" SYSTEM. It is one or the other, BUT NOT BOTH.

Peter Brooks

Palm Bay Florida USA

mapj1 over 1 year ago in reply to Peter Brooks

The failure may be simple, but the system that has failed, complex. But that is perhaps not that important. It highlights that, sans computers, modern banks can't. Once upon a time, not so long ago, well within living memory, the solution would have been to walk into a branch and do the transaction manually, the bank's paper ledgers being the master copy, and the passbook, if it was that sort of account, being the customer's.
Not any more. One well-placed burst of electrons in the wrong place, and it all stops.

It is a bit unusual, such things are normally accompanied by noises about software upgrades, as if not testing new software prior to release was an acceptable short-cut.
Maybe it was a hardware fault.

As per the other topic, I agree, the IET lacks the authoritative heft - when something medical happens a BMA spokesman or statement is on the 6 o'clock news, but there is no such equivalent from the IET for matters engineering or technological, in IT or any other field.
Mike.

Mark Tickner over 1 year ago in reply to mapj1

Well if it is IT, it should really be the BCS...

AJJewsbury over 1 year ago in reply to Peter Brooks

"BUT NOT BOTH."

Why not? The complexity of the underlying system and the complexity of the cause of a particular fault may well be independent variables. The problem with complex systems is in anticipating all possible faults, their consequences, and devising means of mitigating such events (if you can) - that challenge grows exponentially with the complexity of the system no matter how simple any individual fault.

The classic one was how do you report the error that occurs when the error logging system fails? (I gather IBM had that problem in the early years, the "disk full" error message couldn't be written to the disc, which raised another error, which tried to write another error message to the disc, which failed.... etc. and soon crashed the entire machine).
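As a rough illustration of the trap described above, here is a minimal Python sketch in which a failure of the normal log destination is reported on a separate emergency path that never calls back into the logger, so one failed write cannot trigger a cascade of further failed writes. The names `FallbackLogger` and `full_disk_sink` are hypothetical, invented for this example; this is not a reconstruction of how IBM or anyone else actually handled it.

```python
import io
import sys


class FallbackLogger:
    """Hypothetical logger that never reports its own failures through itself."""

    def __init__(self, sink, fallback=None):
        self._sink = sink  # callable that stores one message; may raise OSError
        # The emergency stream should not depend on the resource that failed.
        self._fallback = fallback if fallback is not None else sys.stderr
        self.dropped = 0  # messages that could not be reported anywhere

    def log(self, message):
        try:
            self._sink(message)
        except OSError as exc:
            # Deliberately do NOT call self.log() here: that is the recursion trap.
            self._emergency(message, exc)

    def _emergency(self, message, exc):
        try:
            self._fallback.write(f"LOG SINK FAILED ({exc}): {message}\n")
        except OSError:
            # Last resort: count the loss and carry on rather than raise again.
            self.dropped += 1


def full_disk_sink(message):
    """Stand-in for a disc write that fails because the disc is full."""
    raise OSError(28, "No space left on device")


emergency = io.StringIO()  # stands in for a console or other out-of-band channel
logger = FallbackLogger(full_disk_sink, fallback=emergency)
logger.log("disk full")
print(emergency.getvalue(), end="")
```

The design choice is simply that the error path uses fewer resources than the normal path and gives up quietly when even that fails, rather than raising yet another error.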

"It is a bit unusual, such things are normally accompanied by noises about software upgrades, as if not testing new software prior to release was an acceptable short-cut.
Maybe it was a hardware fault."

There are plenty of ways of nobbling apparently working software without deliberately changing the code - encryption keys can expire, likewise licences for third-party software components or services - many an important system has ground to a halt because accounts didn't pay a small invoice on time or the person responsible for refreshing encryption keys was on holiday or mis-keyed something. Likewise machine resources (memory, disc space, CPU availability) can be just pushed over the edge by small cumulative increases. As a system is fed more data it tends to slow down as it has more to sift through for every query, but often there's a tipping point - something needing 2% more CPU when the CPU's already running at 50% is one thing, needing 2% more when it's already at 99% has rather more severe consequences. Many types of transactions will tend to time out and return an error if they take longer than a certain amount of time - the caller then usually re-tries, so what last time worked in a certain time now takes orders of magnitude longer, and consumes orders of magnitude more resources, if it succeeds at all.
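The tipping point can be put into rough numbers with the textbook single-server queue (M/M/1), where the mean time a request spends in the system is the service time divided by (1 - utilisation). This is only a sketch under stated assumptions: random arrivals, one server, and a made-up 10 ms of CPU work per request (`SERVICE_TIME`). It says nothing about what actually happened at Barclays.

```python
def mean_response_time(service_time, utilisation):
    """Mean time in system for an M/M/1 queue: service_time / (1 - utilisation)."""
    if not 0 <= utilisation < 1:
        # At or above 100% load the queue grows without bound.
        return float("inf")
    return service_time / (1 - utilisation)


SERVICE_TIME = 0.010  # assumed: 10 ms of CPU work per request

for load in (0.50, 0.52, 0.97, 0.99, 1.01):
    millis = mean_response_time(SERVICE_TIME, load) * 1000
    print(f"{load:.0%} busy -> {millis:.1f} ms mean response")
```

Under these assumptions, going from 50% to 52% busy moves the mean response from about 20 ms to about 21 ms, while going from 97% to 99% moves it from roughly 333 ms to roughly 1,000 ms, and anything past 100% never catches up. Once responses cross a caller's timeout, the re-tries add further load on top, which this simple model does not include.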

- Andy.

Peter Brooks over 1 year ago in reply to AJJewsbury

Hello Andy

While I agree with you that a product may be "fully" defined by multiple independent variables (for example, aged malt whiskey), usually only one variable is associated with one factor (for example, aging rate); that is a "root cause".

Having worked as a manufacturing engineer on solid state products I was frequently required to do extensive failure analysis to discover the "root cause" of product failures both during initial manufacture and during actual use, looking for examples of electromigration, dendrite growth, temperature phase changes in metals (tin pest in very cold conditions) etc.

When you get to defective software the problem is usually that the IT coders will not listen to the final users of the software.

They always assume they already know everything they need to know.

Another one of my engineering jobs at one time was to validate, and try to break, new computer code. I was very successful in this function.

Peter Brooks

Palm Bay Florida USA

AJJewsbury over 1 year ago in reply to Peter Brooks

"When you get to defective software the problem is usually that the IT coders will not listen to the final users of the software.

They always assume they already know everything they need to know."

I fear the reality these days is again more complex ... end users and software developers are usually not in direct communication - there are umpteen layers of product owners, architects and designers in between. Often the problem starts with what's really wanted not being entirely the same as what was asked for (because end users aren't professional specifiers)... then there's the issue of all the little ifs and buts that the end users don't normally think about but that can have significant consequences if things start to go wrong. Then there's the issue of knowing the consequences of changing an existing system - for a large system there's often no one person who understands it all in sufficient detail.

- Andy.

Peter Brooks over 1 year ago in reply to AJJewsbury

Hello Andy:

Every software project I was responsible for had to be fully defined before handing it off to the coders - for example, each digit of the unique part number had defined boundary conditions - it also had to fit into a database, BOM and WIP. I was lucky that my company had an in-house coder group; however, we also purchased outside canned programs.

I was never impressed with IBM software.

I started learning coding in "assembly" language on an IBM 1620 system in 1961 but it was not my thing.

Peter Brooks

Palm Bay Florida USA