There was widespread panic on Tuesday after a major Internet outage knocked dozens of websites offline.
Amazon, Reddit and Twitch were all affected, as were the Guardian, the New York Times and the Financial Times.
Additionally, the UK government website crashed – on the day that Britons aged 25–29 were invited to book their COVID-19 vaccines.
Despite initial speculation that the outage was the result of a cyber attack – with ‘#cyberattack’ trending on Twitter – the true cause of the incident was less sensational, although nonetheless concerning.
Fastly and furiously
Within minutes, the cloud computing provider Fastly acknowledged that it was responsible for the problem.
The organisation said there had been a configuration error in its global CDN (content delivery network).
Its Edge Cloud system, which is designed to help websites speed up load times, fend off denial-of-service attacks and avoid network traffic jams, had a bug that was triggered when one of its customers changed their settings.
Thankfully, Fastly was able to identify the problem and restore its systems in under an hour. That was good news for those who were left without essential services but bad news for those making the most of the situation.
Will this happen again?
The obvious question, given how much we rely on Internet services such as Fastly, is whether this sort of error will happen again.
You might wonder: if a bug that was triggered accidentally can cause this much damage, what are the chances that a cyber criminal could do something similar?
Adam Smith, a software testing expert with the BCS, the Chartered Institute for IT, told the BBC that outages with content delivery networks “highlight the growing ecosystem of complex and coupled components that are involved in delivering internet services.
“Because of this, outages are increasingly hitting multiple sites and services at the same time.”
However, Dell Technologies Senior Director Stephen Gilderdale believes that although such outages are bound to occur occasionally, they will be rare and brief:
“Cloud providers build in redundancies for such events to give their users secure access to replicated copies of data.
“In most cases, services are only affected for a short time, and data is easily retrievable. Far from being a cause of concern, it shows the resilience of the network that it can recover so quickly.”
Although that’s true for “most cases”, it relies on organisations being vigilant with their defences.
In this case, Fastly demonstrated how it should be done, but there is no room for complacency. Organisations must create and test mitigation strategies and do everything possible to fix vulnerabilities before their systems go live.