Health Tech

Hundreds of legacy IT systems responsible for monthslong outage at UK hospitals, report finds

Old cooling equipment, convoluted legacy systems, and poor planning led to a two-month IT outage at London hospitals, according to an investigation.
Heatwave-induced disruptions at data centers for a UK hospital group lasted for weeks thanks to its reliance on “371 legacy IT systems,” according to a board review of the matter.

Guy’s and St. Thomas’ NHS Foundation Trust (GSTT) was one of multiple organizations impacted by a summer 2022 heatwave that fried data centers across the London region—and its recovery took two months to complete. As temperatures soared, cooling systems servicing the two data centers that ran clinical and community IT systems for GSTT hospitals and clinics in London failed on July 19.

As a result, electronic patient records became inaccessible, forcing staff to switch to paperwork and causing delays that affected clinical systems involving everything from lab work to surgeries. The Guardian reported that the hospitals were forced to divert ambulances and critically ill patients to other institutions.

Contributing factors identified by the review included a “complex and confusing” system of roles and responsibilities in data center operations, old infrastructure, and problems with cooling systems. For example, responsibility for the system was split among two GSTT in-house teams; ATOS, a private company that managed the data centers; NetApp, which manufactured the storage network equipment; and Secure IT, which serviced crucial cooling equipment.

GSTT was aware that ventilation at the St. Thomas’ data site had been “suboptimal” since 2018, and the cooling system at the Guy’s site was approaching its end-of-life date. While investigators could not determine the exact combination of factors that led to the outage at the St. Thomas’ data center, they found it “probable” that the age of the equipment played a role, as heat within the building did not exceed the maximum operating temperatures provided by the manufacturer. The report found the Guy’s site did not plan ahead to cool condensers with water on July 19. Plans were made at the other facility, but were delayed due to “problems with a hose connector.”

The board also found the trust had failed to consider environmental risks that could simultaneously affect both data centers, which were designed as each other’s backups. The failure of both meant some backup server groups entered conflicted states that the internal IT department and ATOS could not resolve, which required hiring a contractor, Zerto.

“The solution was a time-consuming manual process of extracting and copying files, which meant the recovery took longer than planned,” the report stated.

The trust review also noted that many of the legacy systems are scheduled to be replaced in April 2023 by a system called Epic. The review concluded that GSTT may have neglected to ensure the legacy systems were prepared to hold the line in the interim period.

“With hindsight, the Trust should have given greater weighting to investment in elements of legacy IT infrastructure that were approaching end of life regardless of the new Epic system,” the report stated.

The report also said the incident had placed enormous stress on IT teams and clinicians, the latter of whom reported the outage was “overwhelmingly a negative experience.” The trust also admitted faults with its incident response planning.

“There was not a pre-agreed design for an incident response structure during a total IT outage and the precise design of the response had to be shaped in the early days of the recovery,” the review found.

The report stated that although “patient experience suffered greatly,” GSTT has identified only one case of “moderate” harm to a patient and no cases of “serious” harm as a result of the outage. However, it admitted that evidence of other harm events may come to light in the future. Altogether, recovery cost £1.4 million (around $1.7 million).

GSTT didn’t respond to a request for comment from IT Brew.—TM

Do you work in IT or have information about your IT department you want to share? Email [email protected]. Want to go encrypted? Ask Tom for his Signal.
