What is the Azure SLA for two or more virtual machines in an Availability Zone?

Azure Availability Zones Uptime Protection Starting To Arrive

Microsoft on Friday announced that its Azure Availability Zones datacenter uptime assurance offering for virtual machine (VM) workloads is now commercially available in two Azure regions.

The new service promises to add resiliency and fault isolation for workloads hosted on Azure infrastructure. Azure Availability Zones are currently at the "general availability" stage in Microsoft's U.S. Central and France Central regions, and at the preview stage in the East U.S. 2, West Europe and Southeast Asia regions.

Microsoft defines an Azure Availability Zone as "an isolated location inside of an Azure region" that has "its own independent power source, network and cooling." Each Azure Availability Zone consists of one or more datacenters. These are football-field-size facilities, according to Microsoft's Raj Ganapathy, who demonstrated the setup for Azure Availability Zones in a Microsoft video.

Organizations can use Azure Availability Zones to protect hosted applications and hosted data from failures within an Azure zone. An Azure zone is defined as "a geographical grouping of Azure regions for billing purposes."

Azure Availability Zones provide protection for the following Azure services, according to Microsoft's overview document:

  • Linux Virtual Machines
  • Windows Virtual Machines
  • Virtual Machine Scale Sets
  • Managed Disks
  • Load Balancer
  • Public IP address
  • Zone-redundant storage
  • SQL Database

Organizations might use Azure Availability Zones for two basic reasons. They can use them to assure high availability by synchronously replicating applications or data. Alternatively, they can use Azure Availability Zones for disaster recovery purposes by asynchronously replicating applications or data.

Organizations get a 99.99 percent service-level agreement (SLA) assurance of uptime for VMs with Azure Availability Zones, which applies when there are "two or more VMs running in two or more zones in the same region," according to Microsoft's Azure Availability Zones homepage.

Microsoft's overall Azure resiliency scheme is fairly nuanced (see diagram).

Microsoft's Azure resiliency hierarchy. (Source: March 30 announcement)

Under the Azure resiliency scheme, Microsoft offers 99.9 percent SLAs for single Azure VMs. There's also a 99.95 percent SLA for Azure VMs within datacenters that use Availability Sets, a capability that assures the distribution of VMs across "multiple isolated nodes in a cluster." Here's how a Microsoft document described Availability Sets:

Azure ensures that the VMs you place within an Availability Set run across multiple physical servers, compute racks, storage units, and network switches. If a hardware or Azure software failure occurs, only a subset of your VMs are impacted, and your overall application stays up and continues to be available to your customers.

Ganapathy clarified the difference between Azure Availability Sets and Azure Availability Zones in the video in the context of Microsoft's overall fault-tolerance hierarchy.

"If you could think of that hierarchy of fault tolerance, Availability Sets provide protection from hardware-level failures and software-level update failures," he said. "Availability Zones protect from datacenter-level failures and software update failures."

Pricing for Azure Availability Zones is based on the bandwidth used: inbound and outbound data transfers are each priced at $0.01 per GB. There are no data transfer charges between Azure services located in the same region.
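As a rough illustration of the arithmetic, the per-GB rate can be applied to a monthly transfer volume. This is a minimal sketch in Python: the function name and the example volumes are hypothetical, and the only figure taken from the article is the $0.01 per GB rate in each direction.

```python
# Rate quoted in the article: $0.01 per GB, charged in each direction.
RATE_PER_GB_USD = 0.01


def zone_transfer_cost(gb_in: float, gb_out: float,
                       rate: float = RATE_PER_GB_USD) -> float:
    """Estimate the charge in USD for data transferred in and out.

    Hypothetical helper: it simply multiplies the total volume by the
    quoted rate and rounds to whole cents.
    """
    if gb_in < 0 or gb_out < 0:
        raise ValueError("transfer volumes must be non-negative")
    return round((gb_in + gb_out) * rate, 2)


if __name__ == "__main__":
    # Assumed workload: 500 GB inbound and 300 GB outbound in one month.
    print(zone_transfer_cost(500, 300))
```

Actual bills depend on Microsoft's current bandwidth pricing, which may differ from the rate quoted at the time of the announcement.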

Microsoft plans to start billing for Azure Availability Zone data transfers on July 1, 2018.  


About the Author

Kurt Mackie is senior news producer for 1105 Media's Converge360 group.

For a while now Microsoft has offered an SLA with service credits for two or more Virtual Machines in an Availability Set and also for a single virtual machine with premium SSD-based storage. Obviously an SLA doesn’t prevent an outage, but it does incentivize the vendor to design their service to avoid outages. The guarantees were:

  • For all Virtual Machines that have two or more instances deployed across two or more Availability Zones in the same Azure region, we guarantee you will have Virtual Machine Connectivity to at least one instance at least 99.99% of the time.

  • For all Virtual Machines that have two or more instances deployed in the same Availability Set, we guarantee you will have Virtual Machine Connectivity to at least one instance at least 99.95% of the time.

  • For any Single Instance Virtual Machine using premium storage for all Operating System Disks and Data Disks, we guarantee you will have Virtual Machine Connectivity of at least 99.9%.

This is the wording in the “SLA for Virtual Machines” v1.8 dated March 2018.

As of July 2020 this has been updated to v1.9, with the wording below:

  • For all Virtual Machines that have two or more instances deployed across two or more Availability Zones in the same Azure region, we guarantee you will have Virtual Machine Connectivity to at least one instance at least 99.99% of the time.

  • For all Virtual Machines that have two or more instances deployed in the same Availability Set or in the same Dedicated Host Group, we guarantee you will have Virtual Machine Connectivity to at least one instance at least 99.95% of the time.

  • For any Single Instance Virtual Machine using Premium SSD or Ultra Disk for all Operating System Disks and Data Disks, we guarantee you will have Virtual Machine Connectivity of at least 99.9%.

  • For any Single Instance Virtual Machine using Standard SSD Managed Disks for Operating System Disk and Data Disks, we guarantee you will have Virtual Machine Connectivity of at least 99.5%.

  • For any Single Instance Virtual Machine using Standard HDD Managed Disks for Operating System Disks and Data Disks, we guarantee you will have Virtual Machine Connectivity of at least 95%.
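The v1.9 tiers listed above can be captured as a small lookup table. This is a minimal sketch in Python: the enum and function names are hypothetical, and the percentages are copied from the bullet list rather than read from any Azure API.

```python
from enum import Enum


class Deployment(Enum):
    """Hypothetical labels for the deployment shapes named in SLA v1.9."""
    MULTI_ZONE = "two or more instances across two or more Availability Zones"
    AVAILABILITY_SET = "two or more instances in one Availability Set or Dedicated Host Group"
    SINGLE_PREMIUM = "single instance, Premium SSD or Ultra Disk"
    SINGLE_STANDARD_SSD = "single instance, Standard SSD Managed Disks"
    SINGLE_STANDARD_HDD = "single instance, Standard HDD Managed Disks"


# Connectivity guarantees (percent) as worded in the v1.9 list above.
SLA_V1_9 = {
    Deployment.MULTI_ZONE: 99.99,
    Deployment.AVAILABILITY_SET: 99.95,
    Deployment.SINGLE_PREMIUM: 99.9,
    Deployment.SINGLE_STANDARD_SSD: 99.5,
    Deployment.SINGLE_STANDARD_HDD: 95.0,
}


def connectivity_sla(deployment: Deployment) -> float:
    """Return the guaranteed Virtual Machine Connectivity percentage."""
    return SLA_V1_9[deployment]


if __name__ == "__main__":
    for shape in Deployment:
        print(f"{connectivity_sla(shape):6.2f}%  {shape.value}")
```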

What’s interesting here is that there is now an SLA for virtual machines using Standard SSD and Standard HDD managed disks.

The connectivity SLA for Standard HDD isn’t great at 95%: in a 30-day month you could experience up to 1 day 12 hours of downtime before any service credits were payable, so I wouldn’t advise this for Virtual Machines that you really care about. For Standard SSD, however, an SLA of 99.5% equates to up to 3 hours 36 minutes of outage per month, which may be an acceptable compromise.
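The downtime figures above follow directly from the SLA percentage. The sketch below reproduces them in Python, assuming a 30-day month; the function name is hypothetical, and the real SLA document defines its own measurement period and exclusions.

```python
from typing import Tuple


def allowed_downtime(sla_percent: float,
                     days_in_month: int = 30) -> Tuple[int, int, int]:
    """Return (days, hours, minutes) of downtime permitted by an SLA.

    Assumes a month of `days_in_month` days and rounds to the nearest
    whole minute.
    """
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    minutes_in_month = days_in_month * 24 * 60
    total_minutes = round((100 - sla_percent) / 100 * minutes_in_month)
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return days, hours, minutes


if __name__ == "__main__":
    print(allowed_downtime(95))    # Standard HDD tier
    print(allowed_downtime(99.5))  # Standard SSD tier
```

For 95% this gives 1 day 12 hours, and for 99.5% it gives 3 hours 36 minutes, matching the figures in the text.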

This does mean there is now an SLA at a lower price point for Azure Virtual Machines. For example, the A_v2 series VMs don’t support premium SSDs, so they would not previously have had an SLA at all, but now one is available. This is surely a good thing?

You can read more about the Azure virtual machine SLAs here.

If you want to find out more about how you could benefit from using virtual machines or Azure then give us a call.

What is the SLA for availability zone?

Availability Zones are unique physical locations within an Azure region. Zone-redundant services replicate your applications and data across Availability Zones to protect from single points of failure. With Availability Zones, Azure offers an industry-best 99.99% VM uptime SLA.

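What is the SLA guarantee for Virtual Machine connectivity in Azure?

For Virtual Machines, the connectivity guarantee depends on the deployment: 99.99% for two or more instances across two or more Availability Zones, 99.95% for two or more instances in the same Availability Set or Dedicated Host Group, and 99.9%, 99.5% or 95% for a single instance, depending on whether it uses Premium SSD or Ultra Disk, Standard SSD, or Standard HDD disks. The separate SLA for App Service guarantees that Apps running in a customer subscription will be available 99.95% of the time. No SLA is provided for Apps under either the Free or Shared tiers.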

How is SLA calculated in Azure?

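A composite SLA combines the SLAs of the services an application depends on. For example, suppose an application can fall back to a queue with a 99.9% SLA when a database with a 99.99% SLA is unavailable. That path fails only when both services fail at the same time. Assuming the failures are independent, the expected fraction of time for a simultaneous failure is 0.0001 × 0.001, so the composite SLA for this combined path is: Database or queue = 1.0 − (0.0001 × 0.001) = 99.99999%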
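The calculation above can be sketched in Python. The sketch assumes the two failure probabilities (0.0001 and 0.001) correspond to services with 99.99% and 99.9% SLAs that fail independently of each other; the function name is hypothetical and the SLAs are passed as fractions rather than percentages.

```python
def fallback_composite_sla(*slas: float) -> float:
    """Composite SLA for redundant paths, given each SLA as a fraction.

    The combined path is unavailable only when every path is down at
    once, so the failure probabilities are multiplied together.
    Assumes failures are independent.
    """
    if not slas:
        raise ValueError("at least one SLA is required")
    simultaneous_failure = 1.0
    for sla in slas:
        if not 0.0 <= sla <= 1.0:
            raise ValueError("each SLA must be a fraction between 0 and 1")
        simultaneous_failure *= 1.0 - sla
    return 1.0 - simultaneous_failure


if __name__ == "__main__":
    # Database (99.99%) with a queue (99.9%) as the fallback.
    combined = fallback_composite_sla(0.9999, 0.999)
    print(f"{combined * 100:.5f}%")
```

In practice failures are rarely fully independent, so a figure computed this way is better read as an upper bound than as a guarantee.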