
What Happens When Your IT Systems Have a Crash Landing?

Thu, 06/01/2017 - 21:18
Disaster Recovery

With the plethora of threats IT networks face on a daily basis, it is inevitable that once in a while you’re going to have an issue, even if you are British Airways.

On Saturday 27th May, an outage began causing mayhem for the flight operator. Hundreds of flights across multiple airports were cancelled or delayed, and the knock-on effect hit tens of thousands of customers attempting to travel.

Not an IT failure

Over the course of the weekend and the days that followed, media outlets reported several different theories as to what had caused the global outage for the airline. Bearing in mind recent events, many may at first have assumed that systems had been hacked or infected by malware. However, it was quickly established, in part due to the length of the outage, that the issue lay with power, or the lack thereof.

British Airways representatives went on to state that this was not an IT failure but that an electrical supply had been interrupted, lending weight to the theory that human error was to blame, either through a contractor or due to many aspects of IT being outsourced. BA’s data centres use an uninterruptible power supply (UPS) and have power generators and batteries on hand as a plan B. The fact that these were not able to curb the problem is the real failure, IT or not.

The investigation into what really caused the outage, which affected an estimated 75,000 people and has a predicted cost of between £50m and £100m in compensation alone, is ongoing. In the meantime, Bill Francis, Head of IT for IAG, BA’s parent company, issued the following statement:

“This resulted in the total immediate loss of power to the facility, bypassing the backup generators and batteries… After a few minutes of this shutdown of power, it was turned back on in an unplanned and uncontrolled fashion, which created physical damage to the system, and significantly exacerbated the problem.”

So much for business continuity

Disaster Recovery? Failover? Business Continuity? If any of these had been planned for, those plans certainly do not seem to have stood up in the face of a real scenario. The effects of the outage were still being felt a week after it occurred, and two weeks on there has been no official statement on why the outage had such a profound effect on BA’s global systems.

  • Disaster Recovery (DR) refers to policy-driven procedures for restoring data, infrastructure and systems at scale in the event of a natural or human-caused disaster. DR can also include failover to systems to which data is replicated.
  • Business Continuity refers to the capability of an organization to continue operating and delivering a service or product in the event of a disaster or other incident.

Disaster Recovery best practice

Every organization with an IT network, from BA to a small start-up, should use disaster recovery planning to ensure that in the event of a disaster (or outage) they can get back to operational capacity as quickly as possible and with minimal disruption to networks that aren’t affected and to customers.

Whilst disaster recovery has historically been the preserve of enterprise organizations, primarily due to cost, advances in technology have opened up DR to all, and the use of cloud platforms has accelerated this.

1.  Complete a risk assessment and business impact analysis

Each organization is different and the data or systems that are most valuable will vary. Completing an assessment of what risks there are to your network and the impact of any downtime is a vital step in understanding how to recover and what should be recovered in what order.

2.  Have full off-site backups of data

You may have an end-to-end DR solution in place but if systems fail, backups should be the first port of call to begin recovering valuable data. Keeping off-site backups will help to ensure that any potential disaster doesn’t also damage backup data.

3.  Implement a full disaster recovery plan

If disaster strikes there will be hundreds of things to be done. Having a written plan and procedures to follow will speed up this process and ease confusion. In many cases, it is also necessary for compliance and/or insurance purposes.

4.  Regularly test your disaster recovery plan

Having a plan is all well and good, but if disaster strikes and your DR plan fails, the effects of the disaster could be worsened. Regularly testing a plan will confirm that it works, show where it could be strengthened and keep it in line with your ever-changing network.
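To make step 1 a little more concrete, here is a minimal Python sketch of how the output of a business impact analysis might be turned into a recovery order. Everything in it is an assumption made for illustration: the SystemImpact record, the recovery_order function, the ranking rule (highest hourly cost first, shortest tolerable outage as the tie-breaker) and all of the figures are hypothetical, and a real assessment would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemImpact:
    """One row of a hypothetical business impact analysis."""
    name: str
    downtime_cost_per_hour: float  # estimated loss in GBP per hour offline
    max_tolerable_hours: float     # longest outage the business can absorb


def recovery_order(systems):
    """Return the systems sorted so the most urgent is restored first.

    Urgency here is simply the hourly cost of downtime, with the shorter
    tolerable outage breaking ties. This is an illustrative rule only,
    not an industry standard.
    """
    return sorted(
        systems,
        key=lambda s: (-s.downtime_cost_per_hour, s.max_tolerable_hours),
    )


if __name__ == "__main__":
    # All figures below are invented for illustration.
    inventory = [
        SystemImpact("internal wiki", 50.0, 72.0),
        SystemImpact("booking platform", 20000.0, 1.0),
        SystemImpact("email", 500.0, 8.0),
    ]
    for rank, system in enumerate(recovery_order(inventory), start=1):
        print(rank, system.name)
```

A list like this could sit alongside the written plan from step 3, so that whoever is running the recovery knows what to bring back first, and it should be revisited whenever the plan is tested in step 4.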

 

 
