Microsoft called the outage “unacceptable” as customers threatened to abandon its platforms.
Microsoft has apologized for Monday’s Azure AD outage, which brought down key online services including Teams, Exchange and SharePoint. The global outage locked customers out of Microsoft 365 applications and services, which all require Azure Active Directory authentication.
The outage appears to have begun Monday afternoon and lasted several hours. Microsoft said it restored services early Tuesday morning, though the company’s partner portal reported that issues were continuing. Partners can check service availability on the company’s Azure status site.
While it’s not clear how many customers and partners the outage impacted, complaints appeared from around the world. Besides Teams, Exchange and SharePoint, the outage impacted other Microsoft cloud services including Dynamics 365 and Power BI. Likewise, users were unable to access third-party apps that require Azure AD authentication.
Customers and managed service providers (MSPs) also could not log in to the Azure, Teams, Exchange, SharePoint and KeyVault admin portals. Microsoft issued its apology on the Azure Status History site, where it provided a preliminary root cause analysis (RCA).
“We understand how incredibly impactful and unacceptable this is and apologize deeply,” according to the post. “We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future.”
Preliminary Root Cause of Outage
According to the preliminary RCA, the error occurred when Microsoft rotated expired keys that Azure AD uses to support industry-standard identity protocols such as OpenID. The automated update process triggered a bug that incorrectly removed a critical authentication key. With that key gone, customers and MSPs were locked out of Azure AD.
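Why can removing a single key lock out so many users? In token-based authentication generally, a service validates each sign-in token by looking up the signing key the token names (its key ID, or `kid`) in a key set the identity provider publishes. The sketch below is a generic illustration under that assumption, not Microsoft's code: the functions `issue_token`, `validate_token` and `rotate` and the key names are hypothetical, and HMAC-SHA256 stands in for the asymmetric signatures a real identity provider would use so the example stays self-contained.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT-style tokens do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(claims: dict, kid: str, keys: dict) -> str:
    """Hypothetical issuer: sign the claims with the key named `kid`.
    HMAC-SHA256 is a stand-in for a real asymmetric signature."""
    header = {"alg": "HS256", "kid": kid}
    signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(claims).encode())
    sig = hmac.new(keys[kid], signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)


def validate_token(token: str, published_keys: dict) -> bool:
    """Hypothetical validator: find the token's key in the published set, then check the signature."""
    header_b64, claims_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    key = published_keys.get(header["kid"])
    if key is None:
        # The key this token was signed with is no longer published, so validation fails
        # even though the token itself is genuine.
        return False
    expected = hmac.new(key, f"{header_b64}.{claims_b64}".encode(), hashlib.sha256).digest()
    return hmac.compare_digest(expected, b64url_decode(sig_b64))


def rotate(published_keys: dict, remove_kids: set) -> dict:
    """Hypothetical rotation step: return a new key set without the removed keys."""
    return {kid: key for kid, key in published_keys.items() if kid not in remove_kids}


if __name__ == "__main__":
    keys = {"old-key": b"secret-1", "current-key": b"secret-2"}
    token = issue_token({"sub": "user@example.com"}, "current-key", keys)
    print(validate_token(token, keys))                          # True
    print(validate_token(token, rotate(keys, {"old-key"})))     # True: only the unused key was retired
    print(validate_token(token, rotate(keys, {"current-key"}))) # False: the in-use key was removed
```

Retiring the unused `old-key` leaves existing tokens valid, while removing the key that tokens still reference makes every one of them fail validation at once, which is the kind of failure the RCA describes.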
Monday’s Microsoft Azure AD outage is the first since September, when the software giant began the first phase of changes to its Service Trust Portal (STP). The removal of expired authentication keys triggered both outages, Microsoft said.
Although Microsoft has completed that first phase, the full “carefully staged” deployment is not scheduled to be complete until midyear. Microsoft said that once the STP is fully deployed, it will prevent Azure AD outages.
“That effort is progressing well,” according to Microsoft’s explanation. “Unfortunately, it did not help in this case as it provided coverage for token issuance but did not provide coverage for token validation as that was dependent on the impacted metadata endpoint.”
While such widespread outages are uncommon, Microsoft and other cloud providers have experienced occasional service disruptions over the years. But since the pandemic forced millions of workers around the world to rely more heavily on key cloud services, outages have become more disruptive. Notably, Microsoft Teams has grown from 20 million to 115 million daily active users, according to the company.