There could be trouble ahead: managing the move towards cloud

Graham Jarvis Advice
2 Dec, 2011

Upgrades can spell trouble, so companies moving to the cloud have to be especially wary

In October, Blackberry users around the world got to know first-hand how frustrating downtime can be. Millions of them found that they couldn’t do much more with their handsets than make voice calls.

The outage was caused by an attempt by Blackberry vendor Research in Motion (RIM) to complete a software upgrade on its database, which failed because of corruption problems. Its staff then tried to restore service by reverting to an older version of the database, but it collapsed completely.

This had major consequences for the company: the BBC reported that Blackberry customers in the United States and Canada were planning to sue RIM over the situation. The company is being held responsible for the loss of email and internet services over a one-and-a-half-day period.

Compensation is therefore being sought on behalf of the millions that were affected by the outage. This is because Blackberry users pay a monthly fee to their wireless service providers for data services, and the downtime caused by the outage prevented many of them from being able to access these services.

The Blackberry incident highlights how critical it is to do whatever is necessary to ensure that issues like this never arise. Failure to take preventative action can damage the reputation of even a brand as large as Blackberry, and it can ultimately be costly and time-consuming.

Key cause: software upgrades
It appears that upgrades are one of the key areas where this kind of failure can materialise. RIM is not the only company to have suffered such problems, however; there have been several other examples.

In November, for example, Xerox issued a statement saying that it was “offline due to an issue within our application hosting infrastructure.” The cause of this particular outage was an issue within its application database server cluster. Xerox says this was “triggered by an automatic failover to the back-up database node.” Another issue caused the failover to fail, which in turn caused the outage.

Among the other companies that have suffered outages are Amazon Web Services (AWS), which was hit twice this year, on the East Coast of the US and in Dublin; Microsoft, which had a domain name server issue that prevented customers from accessing Office 365; Google, which had a failure when trying to upgrade Google Docs; and AT&T, which had to cope with a voice service outage that occurred during routine maintenance.

People and processes cause issues
Outages, then, are not unknown, particularly during system upgrades. This is backed up by research. According to analyst firm Gartner, "through to 2015, 80 percent of outages that impact on mission critical services will be caused by people and process issues, and more than 50 percent of those outages will be caused by change, configuration, release integration and hand-off problems." With this in mind, and given the calamity that poorly managed people and process issues can cause, service providers and their customers should ideally always be taking preventative steps to make sure that those mission critical services remain up and running.
