Azure users know first-hand how Microsoft has steadily expanded its public cloud services. Customers have gained the flexibility to take on broader and deeper use cases across more workloads, but they must also transition from traditional systems monitoring strategies to keep control of their architecture.
Responsibility for monitoring and managing Azure services, split between Microsoft and its customers, can seem chaotic and balkanized. Microsoft has stepped into the fray with a range of tooling to address IT needs. Azure Monitor, Application Insights, Activity Logs, Diagnostics, Network Watcher, Metrics, Policy, Security Center and others provide data back to system administrators, but the length of that list presents challenges of its own. Azure Monitor Alerts aims to centralize alerts from those sources to improve visibility and reduce alert fatigue.
Azure experts spoke with MSDW and shared recommendations on how to put these alerts to good use.
Metric alerts vs. log alerts
Before users begin planning an alert strategy, it’s important to understand the types of alerts and what an organization needs to alert on. Steve Hunter, a senior consultant at Avanade, explained the two main categories of alerts:
Metric alerts evaluate time-series data and trigger notifications when a defined criterion is met. These alerts can be as simple as a single metric or as complex as a combination of multi-dimensional metrics from various sources. Metric alerts are beneficial when conditions of a healthy system are understood and can be codified. For example, if CPU utilization of a system exceeds 70%, there is likely to be an impact on performance, either now or soon. Metric-based monitors are essential, but they require past issues to determine healthy thresholds. This is a continual evolution: when new issues appear, new monitors are added, but this creates the potential for alert noise if monitors are not carefully designed.
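As a rough illustration of the metric-alert idea described above, the following Python sketch evaluates a static threshold over a window of time-series samples. It is not the Azure Monitor API. The function name, the five-sample window and the sample data are assumptions made for illustration, and the 70% threshold simply mirrors the CPU example.

```python
from statistics import mean


def metric_alert_fires(samples, threshold=70.0, window=5):
    """Return True when the average of the last `window` samples exceeds `threshold`.

    `samples` is a list of metric values ordered oldest to newest.
    The name, window size and threshold are hypothetical choices for this sketch.
    """
    if len(samples) < window:
        # Not enough data to evaluate the rule yet.
        return False
    return mean(samples[-window:]) > threshold


# Hypothetical CPU utilization readings, one per minute.
cpu_percent = [42.0, 55.0, 68.0, 74.0, 81.0, 79.0, 83.0]
fired = metric_alert_fires(cpu_percent)
```

Averaging over a window rather than reacting to a single sample is one simple way to reduce the alert noise that poorly designed monitors can create.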
Log alerts are based on the result of log queries in Application Insights and/or Log Analytics. These alerts depend on effective telemetry captured in system and application logs. With proper instrumentation, log alerts can be a highly beneficial asset in system observability. Simple log-based alerts may trigger using known conditions of a healthy system, such as a service logging multiple errors within a pre-defined period. More complex log alerts can combine multi-dimensional elements from various sources to help identify issues even when the system appears to be healthy. For example, analytics from services such as Application Insights can determine baseline system characteristics, and alerts are then triggered so that teams can intervene proactively when performance deviates from the norm.
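The two log-alert patterns described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Application Insights or Log Analytics code: the record layout, the function names, the five-minute period, the error limit and the three-standard-deviation rule are all hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def error_count_alert_fires(records, now, period=timedelta(minutes=5), max_errors=3):
    """Simple log alert: fire when at least `max_errors` ERROR records fall within `period`."""
    cutoff = now - period
    recent_errors = [
        r for r in records
        if r["level"] == "ERROR" and cutoff <= r["timestamp"] <= now
    ]
    return len(recent_errors) >= max_errors


def baseline_alert_fires(baseline, latest, tolerance=3.0):
    """Baseline alert: fire when `latest` is more than `tolerance` standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change counts as a deviation.
        return latest != mu
    return abs(latest - mu) > tolerance * sigma


# Hypothetical log records and response times (milliseconds).
now = datetime(2024, 1, 1, 12, 0)
records = [
    {"timestamp": datetime(2024, 1, 1, 11, 50), "level": "ERROR"},
    {"timestamp": datetime(2024, 1, 1, 11, 56), "level": "ERROR"},
    {"timestamp": datetime(2024, 1, 1, 11, 57), "level": "INFO"},
    {"timestamp": datetime(2024, 1, 1, 11, 58), "level": "ERROR"},
    {"timestamp": datetime(2024, 1, 1, 11, 59), "level": "ERROR"},
]
errors_fired = error_count_alert_fires(records, now)
baseline_fired = baseline_alert_fires([100, 102, 98, 101, 99], latest=110)
```

The first function depends only on a known failure condition, while the second needs enough historical data to establish what normal looks like, which is why good telemetry matters for the more complex case.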
The two main types of alerts take different approaches to unifying information within an organization: metric alerts report against thresholds derived from historical behavior, while log alerts draw on query results that can combine many different elements.
Microsoft MVP Stanislav Zhelyazkov explained:
It’s key to figure out what you want to alert on. Do I have the data to alert upon? You need to rely a lot on what data the platform provides. When it’s available, you can choose specific metrics for different use cases.
He has spent much of the last six months writing a multi-part series on Azure Monitor Alerts, published on his blog, Cloud Administrator in Azure World. Zhelyazkov told MSDW that in addition to metric and log alerts, there are a small number of others, including ones for Cost Management and Sentinel.
Developing a strategy
Even with the way that alerts unify information, they fall well short of a “single pane of glass.”
OpsRamp solutions consultant Wael Altaqi recommends further consolidating alerts. “Centralizing alarm data must be coupled with machine learning to make sense of all of the alarm data,” he said.
Hunter echoed similar sentiments, encouraging users to move beyond the IaaS layer, focusing on PaaS and SaaS, while leveraging anomaly detection.
For all the benefits that Azure Monitor Alerts provide, there are some remaining drawbacks and limitations.
“With Azure Monitor Alerts, you get an alert around [one minute] after it occurs. Plus, alert delivery will take some time. This is definitely not the fastest way to get notified,” said Andy Lipnitski, IT director at ScienceSoft.
There is also a cap of two metrics, one log or one activity log, per alert rule, Hunter explained. Furthermore, a lack of customizable SMS and email templates threatens some IT departments with alert fatigue.
Companies adopting Azure Monitor Alerts must also be cognizant of the potential costs of storing alert data for long-term analysis. Public cloud has widened its role in different market segments, but it remains less common in branch offices, production facilities or remote locations with unreliable network connectivity. In these situations, alert flows are often disrupted and alerting can become unreliable. In spite of these drawbacks, Azure Monitor Alerts promise a degree of visibility that might be difficult to achieve by other means.
Going forward, the range of capabilities is sure to grow, according to Hunter:
The operational suite for Azure began as a collection of separate, independent (and sometimes overlapping) services. Over time, these services have consolidated into a more unified suite and will continue to evolve into a comprehensive solution that addresses the unique requirements of both traditional monitoring and modern application observability patterns. For example, as monitors and alerts become more sophisticated it becomes more challenging to understand when conflicting actions or alerts exist within the same system. Features to help analyze and produce higher fidelity “what if” scenarios, identify potential alert/action configuration issues, and identify potential gaps in coverage are starting to surface.