Cooling Problem Brings Down Microsoft Office 365

Azure/Microsoft Office 365 suffered an outage on Tuesday, apparently caused by a cooling problem at a data center. The outage largely affects Texas and other Southern states, according to Down Detector tracking maps.

The problems began around 7:45 a.m. ET, with complaints to Down Detector peaking around 10 a.m. They affect access to administrative services and to Outlook email for some users.

"Admin still down for us in DFW," writes one user.

"Still down in Jacksonville, FL," writes another. 

Reports also came in from Kansas City. 

However, some locations reported that service had been restored.

"FINALLY got some Admin portal functionality. Azure portal has been working fine for last couple hours, AFAIK," another user writes.

According to media reports, Microsoft issued this bulletin to users: 

"Automated data center procedures to ensure data and hardware integrity went into effect when temperatures hit a specified threshold and critical hardware entered a structured power down process. The impact to the cooling system has been isolated and is in the process of being mitigated. Engineers are continuing to work towards restoration of services. The next update will be provided at 14:00 UTC or as events warrant."

One complainant wrote that Microsoft had issued a bulletin to say "they know there is an issue now and they will keep letting me know once further updates are posted to the service bulletin. The bulletin mentions a data center infrastructure issue, but the heat-map above is interesting because this makes it seem like the issue is more widespread than problems with a single data center would create."

The incident has also drawn criticism from at least one email security vendor.

"Today’s incident at Azure was another clear reminder for the need for organizations to build in their own redundancy rather than rely on a single vendor," says Pete Banham, cyber resilience expert at Mimecast.

He adds that "all organizations, including Microsoft, need to consider what downstream effects there may be from losing a critical service due to technical failure or human error."

Banham concludes: "Should employees around the world using Office 365 be reliant on a single Azure DC in the US? Services will always fail and IT leaders need to ensure they have not outsourced responsibility to a lone cloud service."
