This BRIEFING NOTE offers unique insights for insurers and underwriters active in the commercial sector, who need to know more about potential National Grid failures:
Resilience vs Reliability: are we measuring the right things for our electric power?
AUTHOR: Dr Sandra Bell
DATE: 08 September 2019
On 9 August 2019 the UK felt the impacts of what was described by National Grid as an “incredibly rare” event. The impacts were immediate: homes were plunged into darkness, traffic lights failed to work, trains were cancelled and passengers were stranded, hospitals suspended non-essential procedures and many businesses suffered financial losses.
The government immediately launched an investigation by the Energy Emergencies Executive Committee and the National Grid was asked by the Business Secretary Andrea Leadsom to “urgently review and report to Ofgem”.
National Grid published a preliminary report a mere 9 days after the event reiterating that what happened appeared to represent an “extremely rare and unexpected event”. The report states that immediately following a lightning strike on the Eaton Socon – Wymondley Main transmission circuit, the Hornsea offshore wind farm and the Little Barford gas power station both, almost simultaneously, reduced their energy supply to the grid. The customer impacts were instantaneous and significant and included: 1.1 million customers being without power for between 15 and 50 minutes; severe rail transport disruption caused by a certain class of train operating in the South-East area being unable to remain operational without engineer intervention; and, extremely worryingly, the disruption of critical facilities including Ipswich hospital and Newcastle airport.
There is little doubt that the technical detail of what caused the gas-fired power station at Little Barford and the Hornsea offshore wind farm to go offline will be collected and analysed in forensic detail, together with scrutiny of the automatic safety systems that shut off the power to some places to protect the integrity of the grid. Likewise, the communications procedures will be probed along with the ability of services such as rail transport and hospitals to survive a 42-minute power outage.
However, perhaps the answer to what went wrong lies not in the technical detail but in the fact that the risk landscape has changed and that the metrics that are collectively used to drive investment and planning for infrastructure disruptions are now reaching their limit of usefulness.
“The transmission system operated in line with the Security Standards and the Grid Code” (National Grid 16 August 2019)
This is certainly the view of National Grid whose analysis of the chain of events concluded “the voltage performance of the National Electricity Transmission System was within SQSS and Grid Code requirements”.
Their preliminary report states that the three events (the transmission circuit trip, Hornsea and Little Barford supply reductions) while “associated” with a single lightning strike were independent and therefore the situation of all three happening at the same time was “exceptional” and beyond regulatory planning requirements.
Most electric power utilities, which have long been seen as leaders in the critical infrastructure community for contingency planning, are still driven by regulations built around “reliability” metrics such as: number of customers interrupted; customer minutes lost; and mean daily fault rates.
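As a rough illustration of how such reliability metrics are typically totalled, the sketch below computes customers interrupted and customer minutes lost from a list of interruptions. The record structure, the function name and all figures are hypothetical and chosen only for illustration; they are not taken from any regulator's definition or from the 9 August event.

```python
from dataclasses import dataclass


@dataclass
class Interruption:
    customers: int  # customers who lost supply (hypothetical figure)
    minutes: int    # duration of the interruption in minutes (hypothetical figure)


def reliability_metrics(events, customers_served):
    """Total two common reliability metrics over a set of interruptions."""
    customers_interrupted = sum(e.customers for e in events)
    # Each customer contributes one "customer minute" per minute without supply.
    customer_minutes_lost = sum(e.customers * e.minutes for e in events)
    return {
        "customers_interrupted": customers_interrupted,
        "customer_minutes_lost": customer_minutes_lost,
        # Averaging over all customers served hides how concentrated the loss was.
        "minutes_lost_per_customer": customer_minutes_lost / customers_served,
    }


# Hypothetical year: many small faults and one large event.
events = [Interruption(500, 20)] * 10 + [Interruption(50_000, 45)]
metrics = reliability_metrics(events, customers_served=1_000_000)
print(metrics)
```

Note that the totals treat every lost customer minute identically, whether it comes from a routine local fault or from a single large event.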
Such metrics are good for normal operating conditions but they undervalue the impact of large-scale events and price lost load at a flat rate. Yet the value of lost load compounds the longer it’s lost. For example, most customers will value costs differently in the first few minutes of the disruption caused by an outage, when it’s merely inconvenient, than they do after days of disruption, or weeks when modern life becomes simply impossible. Likewise, the impacts of large-scale events are disproportionately high, driven by abnormal restoration costs and widespread and complex infrastructure damage. Large-scale events are therefore often only included in the narrative of risk registers, and the reliability metrics drive a planning and investment focus on smaller, more common, events rather than larger, more uncommon, yet more disruptive events, especially when combined with an accessibility and affordability target.
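The difference between pricing lost load at a flat rate and letting its value compound with duration can be sketched as follows. This is a minimal illustration only: the hourly compounding model, the function names, the growth factor and the monetary rate are all assumptions made for the example, not figures or methods used by National Grid or Ofgem.

```python
def flat_cost(load_mw, hours, rate_per_mwh):
    """Value lost load at the same rate regardless of outage duration."""
    return load_mw * hours * rate_per_mwh


def escalating_cost(load_mw, hours, rate_per_mwh, growth_per_hour):
    """Value lost load at a rate that compounds each hour (assumed model)."""
    total = 0.0
    remaining = hours
    hour = 0
    while remaining > 0:
        step = min(1.0, remaining)  # handle a final partial hour
        total += load_mw * step * rate_per_mwh * (1 + growth_per_hour) ** hour
        remaining -= step
        hour += 1
    return total


# Hypothetical inputs: 100 MW of lost load, 1,000 per MWh, 10% growth per hour.
for hours in (0.5, 6.0, 48.0):
    flat = flat_cost(100, hours, 1_000)
    rising = escalating_cost(100, hours, 1_000, 0.10)
    print(f"{hours:>5} h  flat={flat:,.0f}  escalating={rising:,.0f}")
```

Under these assumptions the two valuations agree for an outage of under an hour and diverge sharply for multi-day outages, which is the gap a flat-rate metric leaves out.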
However, widespread economic instability, disruptive technologies, hyper-extended supply chains, terrorism and organised cyber-crime are now commonplace. Likewise, grid operations have increased in complexity due to changing power demand, increased reliance on renewable sources, and the increasing introduction of smart technologies. Together, these have created a risk landscape that is no longer relatively stable and interspersed with occasional shocks but unremittingly characterised by uncertainty, complexity and risks with adversaries.
Low-probability, high-consequence events are now much more common, and energy researchers such as Vugrin, Castillo and Silva-Monroy from Sandia National Laboratories have recognised that historical data used for reliability calculations may not be suitable for characterising future potential outages because emerging threats can differ significantly from historical precedents.
But what about the trains, hospitals and airports?
It’s all very well blaming the National Grid or the regulators for failing to anticipate and plan for such events but some blame for the impacts must also lie with those that use the power that the National Grid provides. There are very few organisational activities that don’t require power in some form or other and recently we have seen an increase in the number of businesses effectively managing the risks associated with power outages by using our workplace facilities as part of their Business Continuity arrangements.
According to our annual Disaster Landscape invocation statistics, power outages were the number one reason companies in the UK and US relocated their workplace to one of our facilities, a 77% increase from the previous year. For many organisations, simply relocating to an alternative workplace in the event of a power outage resolves the issue and all they experience is a small operational disruption as staff relocate to the recovery facility where they can carry on as normal. However, for large-scale infrastructure, diversification, emergency response procedures and alternative power supplies are a must if customer impact is to be minimised.
If it were not for Network Rail’s electrical power resilience the impact on the transport system could have been far worse. It has been reported that no track supplies were lost and that traction power was maintained to the vast majority of the railway throughout the incident. It would appear that the majority of rail disruption was caused by a particular type of train reacting unexpectedly to the electrical disturbance. Class 700 and 717 trains shut down due to their internal protection systems being triggered. These trains then required manual intervention to restart. Undoubtedly Network Rail have emergency response procedures to cope with one or two of these trains requiring a manual restart but there were 60 in use at the time of the incident and the resultant impact to the rail network was 591 cancelled or part cancelled trains, 873 delayed trains and thousands of delayed and stranded passengers.
The National Grid Electricity System Operator (ESO) has internal resilience (from generators, batteries, interconnectors etc.) to ensure a stable supply under certain loss circumstances, but in exceptional circumstances automatic protection systems will disconnect users to preserve the integrity of the system and ensure power supply to critical infrastructure. This system is known as ‘Low Frequency Demand Disconnection’ (LFDD), and it is reported to have functioned as expected on 9 August. The Electricity Supply Emergency Code (ESEC) makes provision for critical infrastructure, such as airports and hospitals, to be registered as “Protected Sites” and avoid disconnection. However, it would appear that due to an administrative oversight Newcastle Airport was not registered in this scheme.
The exact root cause of Ipswich Hospital’s issues remains a conundrum. The hospital’s internal protection systems were triggered within the same timeframe as the incident. Yet UK Power Networks (UKPN) have stated that the hospital was not part of the LFDD protection zone and that the substations supplying the hospital were unaffected. However, the cause of the impact on patients has been identified. An initial investigation by the hospital has reported that when the hospital’s power protection systems were activated, all eleven of their backup power generators kicked in immediately. Unfortunately, a faulty battery on one of the generators meant that the supply failed to switch to the backup, and the main outpatients, X-ray and pathology areas of the hospital were left without power.
Resilience in a complex socio-economic system
The concept of “resilience” in complex socio-economic systems reliant on technology is not new, but it is something that is hard to regulate as it is subjective and involves the combined effort of technology, systems, people, processes, leadership and culture.
However, if we are to avoid more disruptions of the type we saw at the beginning of August 2019 then we need to change the way we incentivise infrastructure investment. Rather than simply promote grid reliability, which focuses effort on preventing a disruptive event from occurring, we also need to promote energy sector resilience to ensure that power generators, distributors and those organisations, such as transport and health sector organisations, that convert power into citizen services can continue to provide goods and services to the communities that rely upon them, regardless of the occurrence of disruptive events.
To find out more about managing uncertainty and promoting resilience within complex socio-economic systems, please contact the Sungard Availability Services Resilience Consulting practice.