Microsoft's cloud platform offers a Platform as a Service (PaaS) database service called Azure SQL Database, which companies use to seamlessly integrate storage for applications built or hosted inside and outside the Azure cloud. Microsoft handles the database management tasks, including updates, patching, monitoring, and data insights. Because data is so significant for businesses, companies want assurance that their databases are secure, reliable, and available when they are needed. This is where Service Level Agreements (SLAs) come into the picture.
Service Level Agreements define database availability and set expectations for the uptime and performance of the Azure database, which is essential for designing systems that meet business needs. They also define the compensation Azure provides if the uptime commitment is not met. Azure SQL Database has an availability SLA of 99.99%. This SLA guarantees the highest availability among relational database services, and Azure SQL Database was the first, and is the only, database service to offer a business continuity SLA. This shows Microsoft's commitment to its customers and helps ensure that their data is safe and that the apps and processes on which businesses rely keep running during a disruptive event.
A few years ago, two major changes were made to the SLA. First, Azure SQL Database began offering a 99.995% availability SLA for zone-redundant databases in its business-critical tier. Second, it began offering a business continuity SLA for business-critical databases that are geo-replicated between two different Azure regions. The 99.995% availability SLA is backed by a 100% monthly cost credit if it is not maintained. The business continuity SLA guarantees a five-second recovery point objective and a 30-second recovery time objective, and it likewise carries a 100% monthly cost credit when it is not maintained.
Understanding Availability SLA
The availability SLA reflects SQL Database's ability to handle the disruptive events that periodically occur in every region. Availability depends on in-region redundancy of the compute and storage resources, self-healing operations, and constant health monitoring with automatic failover within the region. These operations rely on synchronously replicated data and ensure zero data loss. Uptime is therefore the key measure of availability. Azure SQL Database offers a baseline 99.99% availability SLA across all of its service tiers and a higher 99.995% SLA for the business-critical or premium tiers in regions that support availability zones.
The business-critical (or premium) tier is designed for applications that are highly demanding in terms of performance and reliability. This service tier is integrated with Azure availability zones to take advantage of the additional fault tolerance and isolation they provide. Availability zones, in turn, enable a higher availability guarantee through compute and storage redundancy across zones combined with the self-healing operations. Because this compute and storage redundancy is built into business-critical databases and elastic pools, using availability zones incurs no additional cost.
To achieve 99.995% availability, organizations need to select the zone-redundant configuration when creating a business-critical database, or enable it programmatically through the create or update database API. With zone redundancy, availability rises to 99.995%, which corresponds to a maximum downtime of only 26.28 minutes per year. A minute of downtime is a period during which all attempts to establish a connection failed.
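The 26.28-minute figure follows directly from the SLA percentage. The short Python sketch below reproduces the arithmetic, assuming a 365-day year; the function name max_downtime_minutes is made up for illustration and is not part of any Azure tool or API.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def max_downtime_minutes(sla_percent: float,
                         period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the downtime an availability SLA allows within a period."""
    if not 0.0 <= sla_percent <= 100.0:
        raise ValueError("sla_percent must be between 0 and 100")
    # The allowed downtime is the fraction of the period NOT covered by the SLA.
    return period_minutes * (100.0 - sla_percent) / 100.0


if __name__ == "__main__":
    for sla in (99.99, 99.995):
        print(f"{sla}% -> {max_downtime_minutes(sla):.2f} minutes per year")
```

Under the same assumption, the baseline 99.99% SLA allows about 52.56 minutes of downtime per year, so the zone-redundant configuration halves the downtime budget.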
Understanding Business Continuity SLA
Business continuity is the ability of a service to recover quickly and continue functioning during catastrophic events whose impact cannot be mitigated by in-region self-healing operations. Such events are rare, but their impact can be dramatic. To implement business continuity, standby replicas of the databases are provisioned in two or more geographically distinct locations. Because of the distance between those locations, asynchronous data replication is used to avoid the performance impact of network latency. The trade-off of asynchronous replication is potential data loss, since the most recent transactions may not have reached the secondary when a failure occurs. SQL Database supports business continuity by creating and managing these geographically redundant databases.
The impact of business continuity events is measured using two common metrics: recovery time objective (RTO) and recovery point objective (RPO). RTO measures how quickly application availability is restored, whereas RPO measures the maximum expected data loss once availability is restored. Azure SQL Database provides SLAs of five seconds for RPO and 30 seconds for RTO, and offers a 100% service credit if these SLAs are not met. This means that if a database failover request is not completed within 30 seconds, or if the replication lag exceeds five seconds at the 99th percentile within an hour, companies become eligible for a service credit equal to 100% of the monthly cost of the secondary database, provided the secondary database has the same compute size as the primary.
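The eligibility rule described above can be sketched in a few lines of Python. This is a simplified illustration, not Microsoft's actual evaluation procedure: the function names, the nearest-rank percentile method, and the idea of passing in one hour of replication-lag samples are all assumptions made for the example. Only the 30-second and five-second thresholds and the same-compute-size condition come from the text.

```python
from typing import Sequence

RTO_SECONDS = 30.0  # failover must complete within this time
RPO_SECONDS = 5.0   # 99th-percentile replication lag must stay within this


def percentile_99(samples: Sequence[float]) -> float:
    """Nearest-rank 99th percentile (an assumed method, chosen for simplicity)."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # ceil(0.99 * n) computed with integers to avoid floating-point rounding.
    rank = -(-99 * len(ordered) // 100)
    return ordered[rank - 1]


def credit_eligible(failover_seconds: float,
                    lag_samples_for_hour: Sequence[float],
                    same_compute_size: bool) -> bool:
    """Return True if either objective was missed and the secondary
    has the same compute size as the primary."""
    rto_missed = failover_seconds > RTO_SECONDS
    rpo_missed = percentile_99(lag_samples_for_hour) > RPO_SECONDS
    return same_compute_size and (rto_missed or rpo_missed)


if __name__ == "__main__":
    # Hypothetical data: one lag sample per second for an hour, mostly 1 s.
    lags = [1.0] * 3570 + [9.0] * 30
    print(credit_eligible(45.0, lags, same_compute_size=True))
```

In this hypothetical run the replication lag stays within the RPO at the 99th percentile, but the 45-second failover misses the 30-second RTO, so the check reports eligibility.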