Transcript
This transcript was autogenerated. To make changes, submit a PR.
Are you an SRE? A developer? A quality engineer who wants to tackle the challenge of improving reliability in your DevOps? You can enable your DevOps for reliability with Chaos Native. Create your free account at Chaos Native Litmus Cloud.
Good morning, good afternoon, wherever you are in the world. Welcome to today's session on HADR options for Microsoft SQL Server on AWS. My name is Asif Mujawar. I'm a senior specialist solutions architect for databases here at AWS. In today's session, we're going to cover an introduction to SQL Server on AWS: how SQL Server is deployed on AWS and what options we have to deploy it. Then we'll look into the HADR options on AWS for SQL Server, and then at a way to reduce SQL Server license cost on AWS. So let's start with SQL Server on AWS.
SQL Server on AWS can typically be deployed in two flavors. You can deploy on Amazon RDS for SQL Server, which is our managed service offering, with single-click capability and license-included instances only. On the other hand, we have SQL Server on Amazon EC2, which is a self-managed VM, exactly the same as what you currently run on premises, with manual high availability and license-included or bring-your-own-license instances. Now let's dive a bit deeper into the deployment options for SQL Server on AWS. If you look at the right-hand side, with SQL Server on Amazon EC2, all the admin tasks that don't really add any value to your business processes, such as scaling, high availability, backups, patching, database install and maintenance, OS-level patching, and the firefighting issues of maintaining or adding space on the database server, are basically what you currently do on premises.
Maintaining the SQL Server infrastructure is something you will continue doing. With EC2, the underlying data center operations of power, racking and stacking, the hypervisor level, and the OS install and maintenance are offered out of the box. But everything else, like OS patching, database install, database patching, backups, HA, and scaling, you need to do yourself. And this is where RDS comes into play. All those tasks, what we call undifferentiated heavy lifting at Amazon, where you don't really add any value to your end user experience by doing OS-level patching, database install and maintenance, database patching, backups, HA, or scaling, are taken care of for you on the other side, which is Amazon RDS. It comes as a managed service offering, just out of the box for you.
What that gives you is that it frees up your resources to focus on business-value tasks like schema tuning and schema optimization. On premises, those tuning tasks are typically the last thing you get to, because you have production issues, patching, backups, and space issues to tackle before you ever get to tuning. RDS takes care of all those operational issues for you and leaves you free to focus on the tuning that directly contributes to your end user experience.
Let's quickly look at the features on both sides. EC2 is supported either with bring-your-own-license / bring-your-own-software, or with pre-configured AMIs, which are basically virtual machines. The pre-configured options won't have as many versions or options as a custom-built solution that you might build yourself.
Now, how many of you have struggled, either because of expense or technical challenges, with setting up a multi-site high availability option for a SQL Server instance? With RDS for SQL Server, it's as simple as checking a checkbox when you launch an Amazon RDS instance to set up a multi-AZ SQL Server cluster that leverages synchronous replication between AZs, using either Always On availability groups or database mirroring; we'll dive into this as we go along in the session. But if you need a different HA solution, like Always On FCIs or log shipping, then choose SQL Server on Amazon EC2 and manage that yourself. Both platforms support storage encryption for all editions using KMS, and those of you running Enterprise edition can use TDE, transparent data encryption, on both platforms.
If you need to install third-party tools, such as the Ola Hallengren scripts for maintenance plans, then run SQL Server on Amazon EC2. Otherwise, RDS will take care of everything for you. And if you want to take advantage of automated patching, then choose Amazon RDS for SQL Server; otherwise you need to manage those tedious maintenance tasks yourself with SQL Server on Amazon EC2.
With that, folks, let's look at the HADR options at a glance. There are typically multiple options when it comes to high availability and disaster recovery, but we can categorize them into three areas: let's call them simple HADR, standard HADR, and mission critical HADR. Simple HADR is VM-level protection: if you have a failure at the hardware or hypervisor level, you are protected at the guest level, but there's no protection at the SQL Server instance or guest OS layer. With standard HADR, we take that further: you get VM-level protection plus SQL Server instance-level protection, plus guest OS layer protection as well. And with basic availability groups, you can have database-level protection too. Mission critical HADR is a combination of simple plus standard, plus you go into full database-level protection. We'll cover the different SQL Server protection levels in the next slide, but briefly, it's about what's protected at what level in terms of failures: whether it's at the OS layer, the guest layer, the SQL instance layer, or the database layer. Mission critical HADR covers everything, plus it gives you seconds of RTO with zero data loss, minimum downtime for planned maintenance, and horizontal read scaling.
Now let's dive into the protection levels we spoke about. VM-level protection, or VM failover, protects you from hardware failure at the host itself, from hypervisor-level failures, or from guest OS level failures. That VM failover is simple HADR. Now if I want to take it to the next level and say, hey, I want to protect against a SQL Server level failure, what if my instance is hung? That's something that can be covered with FCIs, failover cluster instances, which we'll dive into in a slide or two, in terms of what FCIs are and what options are available for you to leverage high availability at the cluster level. Now, what if I want to protect my databases? I've protected my hardware, my hypervisor, the OS level, and the SQL Server instance level. But what if my database gets hung, or the database itself becomes non-functional? We have availability groups, or mirroring, that can help you tackle that scenario and fail over to the other partner. We'll dive into every single option in depth as we go through the slide deck. So let's talk about simple HADR. With simple HADR, we're talking about VM or guest level failures.
For example, at the moment on premises, you have SQL Server running on a VM that's using VMDK, VHDX, or RDM files hosted on a vSAN, or RDM presented directly from the SAN, with the hypervisor level and the hardware underneath. Now if the hardware goes down, the hypervisor has an issue, or the VM fails, then the VM will vMotion to the other side and start running on another hypervisor or piece of hardware. So it vMotions to the other host to cover you against the underlying hardware failure. This is what we currently do on premises.
How about if I bring it to AWS? We have a single-AZ deployment: you have an EBS volume and an EC2 instance running on it, and you have an auto scaling group that can scale as demand grows. At the same time, you have auto-heal functionality for the VM. So if something goes wrong with the EC2 instance, it will self-heal, or it will move within the AZ. If the instance goes down because the underlying host has a problem, the EC2 instance will move from one host to another, just like vMotion on premises.
Now let's talk about backup and restore, the other simple HADR option. Generally speaking, there are two major types of backups: backing up the entire SQL instance, and backing up individual databases. Server-level backups are usually hardware-assisted: using special storage technologies, you can snapshot the underlying storage where a SQL Server instance stores its files. Using these snapshots, you can restore a database instance to the point in time when the snapshot was taken. The other type, the database-level backup, is a native SQL Server capability and does not need any special hardware. It can be a full backup of an entire database, a differential backup that keeps changes since the previously taken differential or full backup, or a transaction log backup, which stores the transaction logs and enables replaying them on top of a previously taken full or differential backup, and therefore restores a database to any point in time.
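As a minimal sketch, the three native backup types map to T-SQL like this; the database name and file paths are hypothetical:

```python
# Sketch: the three native SQL Server backup types as T-SQL strings.
# Database name and backup paths are hypothetical placeholders.
def backup_statements(db: str, path: str) -> dict:
    return {
        "full": f"BACKUP DATABASE [{db}] TO DISK = N'{path}\\{db}_full.bak';",
        "differential": (
            f"BACKUP DATABASE [{db}] TO DISK = N'{path}\\{db}_diff.bak' "
            "WITH DIFFERENTIAL;"
        ),
        "log": f"BACKUP LOG [{db}] TO DISK = N'{path}\\{db}_log.trn';",
    }

stmts = backup_statements("SalesDB", r"D:\Backups")
for kind, sql in stmts.items():
    print(kind, "->", sql)
```
A point-in-time restore replays the full, then the latest differential, then every log backup up to the target time.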
So let's talk about the two options we have for backup and restore, on premises versus AWS. At the moment, you have a regular backup plan running on premises; you can have the same thing on AWS. For example, you can have weekly fulls, daily differentials, and transaction log backups every 15 minutes. On premises you run the backup on a local volume; on AWS, you can run the backup on an EBS volume. On premises you move the backups to archive storage; on AWS, you move the backups to S3. Now within S3, you have different options for where to keep your backups. For example, say you have a retention policy where backups need to be retained for compliance purposes for more than ten years. You don't want to stick those backups into an S3 bucket and pay the additional charge of keeping them in the standard tier, which carries a premium cost. Instead, you move the backups that are no longer used: say my weekly backups get moved after two weeks into the infrequent access tier, because those backups aren't needed anymore, and after six to eight weeks, I move those backups into Glacier. That way I'm not constantly paying the premium cost to store the backups; they're always moving to the lower cost tiers.
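A tiering schedule like that can be expressed as an S3 lifecycle configuration. This is just a sketch of the request payload you would hand to boto3's `put_bucket_lifecycle_configuration`; the bucket name, prefix, and day thresholds are assumptions for illustration:

```python
# Sketch: an S3 lifecycle rule moving SQL Server backups to cheaper tiers.
# Prefix and day thresholds are hypothetical.
def backup_lifecycle_rule(prefix: str) -> dict:
    return {
        "Rules": [
            {
                "ID": "tier-sql-backups",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    # After ~2 weeks, move to Infrequent Access.
                    {"Days": 14, "StorageClass": "STANDARD_IA"},
                    # After ~8 weeks, move to Glacier.
                    {"Days": 56, "StorageClass": "GLACIER"},
                ],
                # Delete once the 10-year compliance window has passed.
                "Expiration": {"Days": 3650},
            }
        ]
    }

config = backup_lifecycle_rule("sql-backups/")
# To apply (requires AWS credentials):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-backup-bucket", LifecycleConfiguration=config)
```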
As the backups get older and older, we just keep shifting them to cost-effective storage where possible. Point-in-time recovery is available using SQL Server backups, and you can use differential backups to reduce backup time and space. One of the best practices is that every time the transaction log is cleared, a full backup should be taken. And in AWS, if you want to do server-level backups, I would suggest using VSS-enabled EBS snapshots rather than backing up the entire database natively. Now let's talk about backups on RDS. RDS supports both types of backups: server-level as well as SQL Server DB-level backups. When you create an Amazon RDS SQL Server instance, it already includes automated server backups out of the box. It takes a daily snapshot of the entire instance and by default retains these snapshots for seven days. You can also configure another retention period that works better for your needs, up to 35 days. In Amazon RDS, these are called automated backups, and they are entirely managed by Amazon RDS. Apart from automated backups, you can also create user-initiated backups. Similar to automated backups, user-initiated backups are also server-level snapshots of the underlying storage blocks. However, unlike automated backups, user-initiated backups are not automatically deleted by Amazon RDS. Even when you delete your entire RDS SQL instance, although all associated automated backups are deleted, any user-initiated backups remain available until you explicitly delete them. This is especially useful if you have compliance requirements that mandate retention of backups for longer than 35 days.
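A scheduled cleanup job of that kind can be sketched as a small pruning function. The record shape mirrors what the RDS `describe_db_snapshots` API returns for manual snapshots, but the identifiers and the retention window here are made up:

```python
from datetime import datetime, timedelta, timezone

# Sketch: pick which user-initiated ("manual") RDS snapshots have outlived
# a compliance retention window. Retention length is hypothetical.
def expired_snapshots(snapshots, retention_days, now):
    cutoff = now - timedelta(days=retention_days)
    return [
        s["DBSnapshotIdentifier"]
        for s in snapshots
        if s["SnapshotType"] == "manual" and s["SnapshotCreateTime"] < cutoff
    ]

now = datetime(2021, 6, 1, tzinfo=timezone.utc)
snaps = [
    {"DBSnapshotIdentifier": "sql-2010", "SnapshotType": "manual",
     "SnapshotCreateTime": datetime(2010, 6, 1, tzinfo=timezone.utc)},
    {"DBSnapshotIdentifier": "sql-2021", "SnapshotType": "manual",
     "SnapshotCreateTime": datetime(2021, 5, 1, tzinfo=timezone.utc)},
]
# Ten-year retention: only the 2010 snapshot is expired.
print(expired_snapshots(snaps, 3650, now))
```
Each identifier returned would then be passed to `delete_db_snapshot`.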
You could create an automated job that takes user-initiated backups on a regular schedule and deletes expired backups after a given time, for example ten or fifteen years or more. RDS for SQL Server also automatically takes and maintains transaction log backups every few minutes. What that does is enable you to restore an Amazon RDS instance to a point in time. For example, you could use the Amazon RDS console to restore a SQL Server instance to a specific time into a new instance.
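The same point-in-time restore is available through the API. As a hedged sketch, these are the core parameters you would pass to boto3's RDS `restore_db_instance_to_point_in_time`; the instance identifiers and timestamp are hypothetical:

```python
from datetime import datetime, timezone

# Sketch: parameters for an RDS point-in-time restore into a NEW instance.
# Identifiers and the restore time are hypothetical.
def pitr_params(source_id, target_id, restore_time):
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,  # RDS spins up a new instance
        "RestoreTime": restore_time,              # logs replayed up to here
    }

params = pitr_params(
    "sql-prod", "sql-prod-restored",
    datetime(2021, 6, 1, 12, 30, tzinfo=timezone.utc),
)
# To run (requires AWS credentials):
# boto3.client("rds").restore_db_instance_to_point_in_time(**params)
print(sorted(params))
```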
In this case, RDS spins up a new SQL Server instance, restores the last backup from before the specified time, and then replays the transaction logs up to the specified point in time. The end result is a SQL Server instance with all the databases restored to the specified time. Now, RDS also supports native backups for individual databases. It allows you to create or restore a full backup, and these backup files are stored in an S3 bucket. The backup can also be multi-file, which is especially useful for very large databases, since the maximum single object size in S3 is five gigabytes; when you're backing up or restoring a very large database, multi-file backups might be the only possible solution. These full backups can be used to restore SQL Server databases on an Amazon RDS instance, to move databases across RDS instances, or to migrate databases to or from on-premises instances.
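On RDS, those native backups go through stored procedures rather than a plain BACKUP DATABASE, since you don't get OS access. A sketch of the T-SQL you would issue (database name and S3 ARN are placeholders):

```python
# Sketch: T-SQL for RDS native backup/restore via the msdb stored procedures.
# Database name and S3 ARN are hypothetical placeholders.
def rds_native_backup(db, s3_arn):
    return (
        "exec msdb.dbo.rds_backup_database "
        f"@source_db_name = N'{db}', "
        f"@s3_arn_to_backup_to = N'{s3_arn}';"
    )

def rds_native_restore(db, s3_arn):
    return (
        "exec msdb.dbo.rds_restore_database "
        f"@restore_db_name = N'{db}', "
        f"@s3_arn_to_restore_from = N'{s3_arn}';"
    )

print(rds_native_backup("SalesDB", "arn:aws:s3:::my-backup-bucket/SalesDB.bak"))
```
This requires the SQLSERVER_BACKUP_RESTORE option group attached to the instance.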
So that's RDS. Now let's talk about SQL Server on EC2. With SQL Server on EC2, you are responsible for handling some of the backup maintenance capabilities that come as a turnkey solution in RDS. However, in AWS there are still plenty of options and services enabling you to do that. Server-level backups translate to EBS volume snapshots, which can be created by invoking a single API call or through the AWS console. EBS snapshots are incremental, meaning only changes since the last snapshot are captured in subsequent snapshots, saving on storage costs and reducing the time it takes to snapshot. EBS snapshots are also point-in-time, which means a snapshot captures all the blocks of the associated EBS volume as at the exact time the snapshot was triggered. When you restore an EBS volume from a snapshot, it becomes immediately available for the application to access. In this case, that means you can immediately bring SQL Server online after restoring one or more of its underlying EBS volumes from a snapshot.
By default, restored volumes load the underlying blocks from S3 lazily, the first time the application tries to read them. This means there can be a lag in performance after an EBS volume is restored from a snapshot, although the volume eventually catches up with its nominal performance. However, you can avoid that lag by using fast snapshot restore. This feature enables you to get full performance immediately after restoring a volume.
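As a sketch, fast snapshot restore is enabled per snapshot, per Availability Zone, via the EC2 `enable_fast_snapshot_restores` API; the snapshot ID and AZ names below are hypothetical:

```python
# Sketch: parameters for enabling EBS fast snapshot restore on a snapshot.
# Snapshot ID and Availability Zone names are hypothetical.
def fsr_params(snapshot_id, azs):
    return {
        "SourceSnapshotIds": [snapshot_id],
        # Fast snapshot restore is enabled per Availability Zone.
        "AvailabilityZones": azs,
    }

params = fsr_params("snap-0123456789abcdef0", ["us-east-1a", "us-east-1b"])
# To run (requires AWS credentials):
# boto3.client("ec2").enable_fast_snapshot_restores(**params)
print(params["AvailabilityZones"])
```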
You can also take native SQL Server backups, much the same way you do currently on premises. In AWS, you have a highly durable, highly scalable, and cost-effective object store, S3, to store the backup files. S3 gives you high durability and high availability, and using the different storage tiers in S3, you can also cost-optimize your backup solution. You can also use AWS Backup to automate server-level backups for SQL Server instances. AWS Backup manages EBS volume snapshots at the regular intervals you specify. What you need for AWS Backup is a consistent tagging policy across your SQL Server instances. AWS Backup is simple and easy to use; however, to achieve SQL Server DB-level consistency within a volume snapshot, the best practice is to use VSS-enabled snapshots.
Now let's talk about VSS, the Windows Volume Shadow Copy Service. It's a solution that orchestrates OS and application access to the underlying I/O, enabling application-consistent snapshots of the underlying storage without any application downtime. SQL Server is VSS-aware, therefore you can use the VSS agent to snapshot SQL Server instances. This is made much easier when you run SQL Server on Amazon EC2. You can use AWS Systems Manager to run a command that not only installs the VSS agent on your SQL Server instance, but also invokes the entire workflow: flushing the OS and application buffers to disk, pausing the I/O operations, taking a point-in-time snapshot of the EBS volumes, then resuming the I/O. And all of what I just described can be achieved in a matter of less than ten seconds, using a single API command through AWS Systems Manager. This run command document creates a snapshot of all the EBS volumes attached to the target instance. You also have the option to exclude the root volume, as most often the user database files are stored on the other volumes. In case you stripe your data across multiple EBS volumes to create a single file system for SQL Server files, EBS also supports crash-consistent multi-volume snapshots using a single API command.
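A hedged sketch of that single API command: the Run Command invocation below uses the `AWSEC2-CreateVssSnapshot` document; the instance ID is made up, and the parameter names reflect that document as I understand it, so verify them against the current document schema:

```python
# Sketch: invoking the VSS snapshot workflow via Systems Manager Run Command.
# Instance ID is hypothetical; parameter names are assumptions based on the
# AWSEC2-CreateVssSnapshot document.
def vss_snapshot_command(instance_id):
    return {
        "InstanceIds": [instance_id],
        "DocumentName": "AWSEC2-CreateVssSnapshot",
        "Parameters": {
            # Skip the root volume; data files live on the data volumes.
            "ExcludeBootVolume": ["True"],
            "description": ["VSS-consistent SQL Server snapshot"],
        },
    }

cmd = vss_snapshot_command("i-0123456789abcdef0")
# To run (requires AWS credentials and the VSS agent installed):
# boto3.client("ssm").send_command(**cmd)
print(cmd["DocumentName"])
```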
One of the boldest characteristics of the cloud is the ability to operate large, and sometimes rapidly growing, environments at scale. If you have one or two SQL Server instances, you could easily log into each instance, schedule a backup job via SQL Agent, and somehow manage to transfer the backup files to the S3 bucket where they should be restored and retained. However, if you have tens or hundreds of instances, or even more, scheduling the job on each single instance alone can be quite onerous. Fortunately, AWS gives you a way to efficiently do such tasks at any scale.
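One way to sketch doing this at scale is a Systems Manager State Manager association that targets every instance carrying a common tag and rate-limits how many run at once. The tag key/value, document name, schedule, and limits below are assumptions for illustration:

```python
# Sketch: a State Manager association targeting all instances that carry a
# common tag, with rate control. Tag, schedule, and limits are hypothetical.
def backup_association(document):
    return {
        "Name": document,
        # Target every instance tagged SQLServerBackup=automated.
        "Targets": [{"Key": "tag:SQLServerBackup", "Values": ["automated"]}],
        "ScheduleExpression": "cron(0 2 ? * * *)",  # nightly at 02:00 UTC
        # Rate control: only a tenth of the fleet at a time.
        "MaxConcurrency": "10%",
        "MaxErrors": "1",
    }

assoc = backup_association("AWSEC2-CreateVssSnapshot")
# To create (requires AWS credentials):
# boto3.client("ssm").create_association(**assoc)
print(assoc["Targets"][0]["Key"])
```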
You can use AWS Systems Manager State Manager to schedule regular backups and also have the backup files transferred to a target S3 bucket. State Manager associations can target individual instances, or all instances that have a common tag applied to them. For example, you could have hundreds of EC2 instances with a tag key called SQLServerBackup or MSSQLBackup and a value of automated applied to them. Then, using State Manager, you can target all of those instances at once. Combining that with a Systems Manager maintenance window lets you ensure the backup job doesn't disrupt normal business operations. At the same time, Systems Manager also supports rate control policies: for example, you can indicate that only a few instances should be backed up at one time, therefore avoiding all databases being impacted by backup jobs at the same time. Systems Manager can also handle hybrid environments: you can use the same solution to manage your SQL Server instances running on premises or in any other cloud. That's the beauty of Systems Manager. Now, that's all good about backups. How about restore? Backup is half the story; the other half is the capability to restore those backups. Every business needs well-defined, well-rehearsed processes for taking backups as well as restoring them. Restoration procedures can depend on the customer's requirements: whether the backup files have to be restored on the existing instance where the backup was originally taken, on a different instance altogether, or whether an entirely new instance should be created for the purpose of restoring the backup. All of these can be defined and automated using AWS Systems Manager Automation documents. Systems Manager Automation can create a new EC2 instance, or discover and identify an instance on which the backup should be restored. Then it can invoke an SSM Run Command that instructs the target instance to fetch the backup file from S3 and restore it on the SQL Server. Using a proven and tested automation workflow basically eliminates human errors during an actual disaster, which also reduces the mean time to action and resolution of incidents or change requests.
That was simple HADR, folks. How about standard? We've covered ourselves from guest-level failures, and we're doing the traditional, well-tested backup and restore approach. Now I want to take you to the next level, where I want to cover myself against hardware failures, plus VM guest-level failures, plus SQL Server instance failures. That's when FCI comes into play. This is something Microsoft introduced back in SQL Server 2008, when they came out with failover cluster instances. So let's look at the typical on-premises implementation of Always On failover cluster instances. Typically you have two nodes: one is online, the other is on standby, sharing the same storage, whether that's on CSV, SMB, or NFS. It provides you high availability at the instance level.
Typically you deploy in a single DC, and it gives you resilience against SQL Server instance failures or OS-level failures. Standard edition only supports two nodes, whereas you can go up to 50 on Enterprise. If I were to do the same thing in the AWS cloud, how would I do that? You have an EBS volume that can be multi-attached to different instances at the same time, and you can have the Windows failover cluster built on top of those two EC2 instances, single AZ, usually sitting in the same subnet. That gives you resilience against OS and SQL Server instance level failures. And you can typically use a network load balancer as the virtual IP for the WSFC, with a static IP in the same subnet.
Now let's take it to the next level: I want to go multi-AZ. This is where I'm going to realize more value, because I'm still running with single points of failure. If you look at a single-DC deployment on premises, if the entire DC goes down, I have a single point of failure. Similarly, if I go with a single-AZ deployment on EC2, I've got a single point of failure: if the AZ goes down, I can't do anything. Hence the reason we normally recommend customers think about multi-AZ deployment for mission critical databases: where you have low RTO and RPO requirements and you can't afford downtime, these databases should be deployed across multiple AZs. For FCI on multi-AZ, again the storage options come out of the box, like SMB 3.0, S2D, or iSCSI. You basically create the multi-AZ cluster on multiple subnets. That gives you resilience against OS-level failures and AZ failures. Because of the low-latency network within the multi-AZ architecture, a stretch cluster is a practical option and provides higher resilience as compared to on premises. If you're looking at a stretch cluster on premises, there are a lot of network intricacies that need to be handled and looked at before you decide to do it. I mean, I've been bitten myself in the past when I did decide to do a stretch cluster: there can be a lot of network-level issues and blips that cause false alarms and make SQL Server fail over. Hence the reason that, most of the time, on-premises stretch clusters are kind of a no-no in practice. But because we have the low-latency multi-AZ architecture, implementing a multi-AZ SQL Server FCI as a stretch cluster is a real possibility.
Now let's take it to the next level. We have a service called Amazon FSx for Windows File Server. This is a file share service built and optimized for Windows workloads. So rather than having a multi-attached EBS volume, you can say: I'm going to create a file system where I'm going to store all of my data files, and it just becomes an SMB file share attached to the SQL cluster, the Windows cluster. You still have the multi-AZ architecture here, where your primary instance is sitting in AZ one and your secondary instance is sitting in AZ two, and you create the FSx file share and provision it. Now, the good thing about this file share is that you can provision the throughput collectively, as compared to EBS, where you have to configure throughput on individual volumes; here you specify the throughput you need to achieve across your entire file store. And then you can have a file share witness in AZ three that keeps an eye on the cluster itself, making sure the cluster fails over and the quorum is protected against any failures, rather than having AZ one take care of it. Because if you have the quorum sitting on node one, you could possibly end up in a split-brain scenario. To avoid that, you can go into AZ three; that's where the file share witness sits, which can keep an eye on the cluster quorum and decide to fail over nodes from primary to secondary in a timely manner. So folks, those were the FCIs and the FCI options we have for SQL Server on AWS.

There's another way we can achieve SQL Server high availability, using storage-level options. Storage Spaces Direct (S2D) is another Windows technology that allows you to build and deploy software-defined storage across multiple instances in a converged or hyper-converged model: you can have the SQL Server instance and S2D together, or S2D and SQL Server on different instances. It supports NTFS and ReFS, supports up to 16 servers or 400 drives, and scales up to six petabytes, and this is on EBS volumes. You can do S2D on instance storage as well, using the i3 or d2 instances: it's a simple setup, with high performance, up to 200K IOPS, because of the underlying NVMes in the instance local store, and built-in resilience against disk or server failures. But note that instance store volumes are ephemeral.
So far we've covered high availability. How about disaster recovery? High availability gives me protection around hardware failures, but what if my entire region goes out, or my entire DC goes out? What are my disaster recovery options? Backup and restore is an option, but backup and restore has higher RTOs and RPOs. What are my options for lower RTOs and RPOs? One of the options, traditionally, is log shipping. This is a typical implementation of log shipping on premises; it's been a typical option for SQL Server since back in the days of SQL 2000, for those who are old enough to have worked on it, me being one of them. With log shipping, it doesn't matter whether your secondary instance is in the cloud or on premises. The secondary database is offline, restoring the logs received from the primary. Log shipping has four steps. First, you back up the data on your primary instance; that is basically your transaction log backup. Second, you ship or copy that transaction log backup from the local, primary server into a central file share location. Step three is to download that backup file from the central file share into the secondary server. And step four is to apply, or replay, those transaction logs onto the secondary server.
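The four steps above can be sketched as the T-SQL each side runs; steps two and three are plain file copies handled by agent jobs, and the database name and paths are hypothetical:

```python
# Sketch: the T-SQL bookends of log shipping. Steps 2 and 3 (copying the
# file to/from the share) are file transfers, not SQL. Names are hypothetical.
def logship_backup(db, share):
    # Step 1: transaction log backup on the primary.
    return f"BACKUP LOG [{db}] TO DISK = N'{share}\\{db}_001.trn';"

def logship_restore(db, local_path):
    # Step 4: replay the log on the secondary, leaving it restoring so
    # further logs can still be applied.
    return (
        f"RESTORE LOG [{db}] FROM DISK = N'{local_path}\\{db}_001.trn' "
        "WITH NORECOVERY;"
    )

print(logship_backup("SalesDB", r"\\fileserver\logship"))
print(logship_restore("SalesDB", r"D:\logship"))
```
Restoring WITH STANDBY instead of WITH NORECOVERY would leave the secondary readable between restores.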
Now, you have full control over these jobs. You can delay the log copy or log apply to the secondary: if you've detected some problem on the primary and you don't want to apply those logs to the secondary, you can do so. You can choose a job schedule as low as five minutes for the frequency of the backup on the primary, the copy to the secondary, and the restore, or you can go as high as 60 minutes, for example, for the log backup, copy, and restore. It has a direct impact on your RTO and RPO.
So be mindful in picking the schedule. The best thing is that, when it comes to copying and applying the logs on the secondary, it gives you the option of mitigating problems: if there is corruption, or some logical data has been deleted, you can pause it. The only thing is that you then need to re-establish the log shipping after pausing it. So coming to AWS: if you were to do this setup in AWS, you'd have the primary sitting on premises, and if you want to do a lights-on DR, then you have log shipping. You can use a similar model to on premises, where on premises you're backing up the data and pushing it across into an S3 bucket, and then you're copying from the S3 bucket and applying it to SQL Server running on an EC2 instance. This compute can be very minimal, because all you're doing is restoring the data, so it doesn't demand the same amount of compute as the primary. Or you can have the primary sitting in AWS on an EC2 instance, or copy the data back to on premises if you like. Or you can have a multi-region DR deployment strategy using log shipping as well. But if you're staying in the same region, I would rather suggest doing EBS snapshots, the VSS-enabled EBS snapshots, which are much quicker and more efficient within the same region for disaster recovery.
Now we have another option: CloudEndure. CloudEndure Disaster Recovery is a service that operates at the block level, at the VM level: it replicates all the blocks in the VM, including app and data files, at the block level, and keeps track of the block changes in a staging environment, maintained on EBS volumes. At the time of DR, CloudEndure enables you to quickly spin up an instance with all the apps configured, along with its data, in your desired region. For example, the SQL Server instance will be up and running along with the data, ready to accept connections. CloudEndure for DR is a paid service, but when it comes to migration, it is free; that's something to keep in mind.
Now let's look at basic availability groups. We looked at HA and disaster recovery in terms of what SQL Server and the underlying storage level have to offer. Now there is something called availability groups, which Microsoft introduced in SQL Server 2012 to take over from mirroring, as mirroring, as we all know, is deprecated.
So what are basic availability groups? Basically a similar sort of thing to what you have with multi-AZ FCIs, for example, but it's not really an FCI. The availability group runs on top of the same Windows failover clustering technology, but in an FCI you have shared storage; with availability groups you have independent storage. Basic availability groups are available in Standard Edition. The limit is two replicas, primary and secondary. You can't access your secondary replica: you cannot do a backup on the secondary replica, and you cannot run any integrity checks or anything on it. You cannot upgrade your basic availability group to an advanced availability group; you basically have to drop it and re-add it.
And basic availability groups cannot be part of distributed availability groups, which we'll cover in a slide or two. So for a basic availability group, just think of it as one database. An availability group is, in a way, just another mirroring setup, but underneath we are using independent storage copies and the failover clustering technology to be able to facilitate the failovers from one server to the other server. And with basic availability groups you can only have one database in the availability group, whereas with advanced availability groups you can have multiple databases added to a single availability group. That's the key difference here, folks. So with that, we'll dive into the mission-critical HADR options. What do I have if I want to achieve low RTO and RPO, and I want to cover almost all possible scenarios when it comes to protecting my SQL Server workloads?
So that's when availability groups come into play. Like I said, they were introduced from SQL Server 2012 onwards; they've taken over from mirroring and taken it to the next level, combining it with the Windows failover cluster technology. So let's talk about a pattern of deployment. In the previous slides we looked at an FCI with a multi-AZ deployment, where the failover cluster instance was sharing the same storage and the storage was moving between the AZs. Now, with availability groups, you can have independent storage per AZ, and SQL Server is replicating the data at the database level. That's basically synchronous replication with automatic failover, using the same Windows technology, Windows Failover Cluster, and then you get an endpoint available for your application to connect to. So when there is a problem in AZ 1, you can simply fail over to AZ 2, but the application doesn't notice anything, because it's connecting to a listener or a VIP. So it's seamless. Underneath we say: hey, I was talking to AZ 1, instance 1; now I'm going to AZ 2, instance 2. But the application still continues to talk to the database. That's the difference here.
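Here is a toy Python illustration (not a real database driver) of why the application doesn't notice an AG failover: the app always connects through the listener name, and the listener resolves to whichever replica currently holds the primary role.

```python
class AgListener:
    """Minimal model of an availability group listener routing to the primary."""

    def __init__(self, replicas: dict):
        self.replicas = replicas  # e.g. {"az1-node": "PRIMARY", "az2-node": "SECONDARY"}

    def primary(self) -> str:
        """Return the node currently holding the primary role."""
        return next(n for n, role in self.replicas.items() if role == "PRIMARY")

    def fail_over(self, new_primary: str) -> None:
        """Swap roles; the listener name stays the same for the application."""
        for node in self.replicas:
            self.replicas[node] = "PRIMARY" if node == new_primary else "SECONDARY"

listener = AgListener({"az1-node": "PRIMARY", "az2-node": "SECONDARY"})
app_endpoint = "ag-listener.example.internal"  # app config never changes
print(listener.primary())       # az1-node
listener.fail_over("az2-node")  # AZ 1 has a problem
print(listener.primary())       # az2-node, same endpoint, new primary
```

The node and endpoint names are made up; the takeaway is that the application's connection string points at the listener, never at an individual replica.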
So that's one pattern. This is multi-AZ within the same region. Let's take it to the next level: I want to go multi-region. How about that? You can still do that. You can have your synchronous replication within the region with the automatic failover between AZ 2 and AZ 3, but you can add another region and create a stretch cluster that goes across the regions, and use Transit Gateway to enable both accounts within the regions to talk to each other and get the VPCs communicating with each other. That way you can start replicating the data into AZ 1 in a secondary region. But this will be asynchronous replication, because across regions you can only have an asynchronous replica, and it will be a manual failover.
So that's one pattern. Another pattern we have is distributed availability groups. What you're going to have is two separate Windows failover clusters, one in Region A and one in Region B, and between those clusters the nodes will talk to each other over asynchronous replication. As we looked at in the previous configuration, we have synchronous replication going from AZ 1 to AZ 2 in Region A; as soon as we introduce another region, it becomes asynchronous replication with manual failover. But what if I have another cluster sitting in Region B? Can I do that? In the previous example we had one cluster going across the regions. Now we are bringing in two different clusters, so we don't have to worry about stretching the cluster, stretching the AD objects and all of that. We'll have two different independent clusters. What you're doing here is going from the primary replica in AZ 1 in Region A to AZ 1 in Region B, which acts as a forwarder. So that is asynchronously replicated. But the secondary replica in AZ 2 in Region B then synchronously replicates the data from AZ 1 in Region B to AZ 2 in Region B, rather than going all the way back to Region A, AZ 1. So the replica in AZ 1 in Region B plays the role of forwarder.
So you can fail over between the two AZs. And then, because of requirements, you might have to run the workloads for six months from Region A and then run them from Region B. When you fail over to Region B, you still have the same level of protection you had in Region A. In the previous example you only had protection in Region A, and Region B only had one instance. So that's another option available for you if you have the requirement to run the workloads from both regions on a six-monthly basis.
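The distributed AG topology just described can be sketched as a small replication graph in Python (node names are hypothetical): the primary in Region A, AZ 1 feeds its local secondary synchronously and the forwarder in Region B asynchronously; the forwarder then fans out synchronously within Region B.

```python
# (source_replica, destination_replica, replication_mode)
REPLICATION = [
    ("regionA-az1", "regionA-az2", "sync"),   # local HA inside Region A
    ("regionA-az1", "regionB-az1", "async"),  # cross-region, to the forwarder
    ("regionB-az1", "regionB-az2", "sync"),   # forwarder fans out locally
]

def downstream(node: str):
    """Edges a replica sends data on; a forwarder both receives and sends."""
    return [(dst, mode) for src, dst, mode in REPLICATION if src == node]

print(downstream("regionA-az1"))  # global primary: one sync, one async edge
print(downstream("regionB-az1"))  # [('regionB-az2', 'sync')]: the forwarder role
```

Notice that Region B, AZ 2 never talks back to Region A directly; it only receives from the forwarder, which is the whole point of the pattern.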
So let's dive into the comparison between AG and FCI. FCI gives you instance-level protection for HA and DR, whereas AG gives you database-level protection. In terms of replication, with FCI it's fully automated in a way, because you can have hardware or storage-level replication in play; your FCI is basically sharing the same storage, so the same master databases go across to the other side, whereas with availability groups you'll have to manually replicate the instance-level objects, for example logins, linked servers, et cetera. Scale out in FCI is not supported. It's supported, I'm sorry, because you can add extra nodes if you like. On AG, same thing: you can add extra nodes and extra copies, apart from basic availability groups. Read scale-out in FCI is not available, because you can only have one primary and the rest are passive nodes, whereas with AGs you have read replicas available, where you can add reader nodes on Enterprise Edition, not Standard. Asynchronous replication: again, with FCI you have block-level storage, so you can replicate the block-level storage to the other node or other storage, bring up the storage on the other side, and off you go, whereas with AG it's at the database level.
So you can combine FCI and AG in one cluster. Some of the combinations you need to watch out for: Always On FCI in a single AZ for HA with an AG in a second AZ for DR; you can use the FCI inside a cluster placement group with the AG for DR in a second AZ; you can use FCI across multi-AZ for HA with an availability group to a second region for DR; or you can use FCI for HA and AG for read replicas. Failover between an FCI and an AG is always manual when it comes to DR: if you combine the two together, you will not be able to have automatic failover.
So that was all about high availability and DR on EC2. Now how about RDS? How do things work when it comes to HADR on RDS? You have a similar sort of setup. You have a region and you have three AZs where app servers are running, and you have RDS for SQL Server running in AZ A, for example; all the apps are talking to AZ A. What if the SQL Server instance goes down? We have things like health monitoring and self-recovery capabilities in a single-AZ deployment that will self-heal the instance; it comes back up and starts working again. But what if your entire AZ goes down? This is where the problem is: you have a single point of failure. Your applications sitting in AZ B and AZ C can't work anymore, because you lost the SQL Server sitting in AZ A; not just the instance, the entire AZ has gone down. So you have a single point of failure there. To address that, we tell customers to deploy SQL Server in a multi-AZ RDS setup. Similar sort of thing: you have the apps talking from different AZs into AZ A, then we create another instance in AZ B and start replicating the data, using mirroring or Always On. Mirroring is for anything less than SQL Server 2016; if you're on 2016 or 2017 and beyond, then you will have Always On availability groups available, be it Standard or Enterprise.
So what happens if your secondary fails? That's an instance-level failure. That's fine: your application is still continuing to talk to the primary server, which is sitting in AZ A, and it will continue to do so; the application won't notice anything. And once the secondary instance in AZ B recovers, it will start receiving the data from the primary server as it did before.
Now, what happens if the primary AZ goes down? That's interesting. What's going to happen is that the failover kicks in. It will fail over from the primary role to the secondary instance, which is sitting in AZ B. That takes over as primary, and then all the application servers start pointing to that new primary. Here, because you're using the endpoint, your application won't notice that the underlying host or instance has changed. It will just continue to work, the same as with the Always On availability groups on EC2, or currently on premises with a listener. A similar sort of thing happens here: it simply, seamlessly fails the applications over to the second instance, and your application continues to work without noticing that the instance has been replaced underneath. And once your primary AZ comes back up, or the instance comes back up, we will replicate the data from AZ B back into AZ A. This all happens automatically; you don't have to set anything up, it's a single click. Whereas the options we looked at so far on EC2 are built and managed by you, in RDS everything is out of the box. You just click one button and, under the hood, we do all the automation of creating another instance, backing up your primary databases, replicating that to the secondary instance, configuring Always On or mirroring on top of it, and then telling you: hey, your database is now protected in multi-AZ. All with a single click. Now let's look at the failover timeline.
When you look at the multi-AZ failover timelines on Standard Edition with database mirroring, it's typically one to two minutes, because it involves DNS propagation. With Always On, it's literally under 10 seconds, because there is no DNS propagation; it simply switches over to the other side and off it goes. So, the HADR options we've looked at so far: we looked at backup and restore, we looked at snapshots, we looked at log shipping, we looked at Storage Replica and CloudEndure for DR, we looked at the FCIs, we looked at the AGs, and we looked at the distributed AGs. Those are the options available for you on EC2. Backup and restore you can use on RDS as well, and RDS itself has automated backups and multi-AZ for the disaster recovery and HA scenarios we looked at. With that, folks, we come to the last topic, which is how the cloud enables you to optimize licensing cost; how we can help optimize the licensing cost when it comes to SQL Server deployment. So this is a typical scenario. On premises you have an Enterprise Edition SQL Server installed in an OLTP environment, and it has two nodes, sitting in site A and site B. In the primary site you have your OLTP workloads constantly writing the data, synchronously replicated to the secondary site, and we have opened up the secondary site for reporting purposes, and you're backing up the data from there. So basically, in a nutshell, as soon as you open up the secondary instance, you need to pay for the licensing for it. Both sites are constantly used, hence the reason you have licensed both sites. Say six cores on this side and six cores on that side, so a total twelve-core implementation with Always On AGs on premises. Now let's replicate that onto EC2 on AWS. You're going to need the same number of licenses, because you're doing the same thing; there is no improvement over on premises in terms of licensing cost.
Now let's change this up a bit. What if my OLTP workloads are coming into the primary, but my secondary instance itself is passive and just waiting to take over? There is no activity on that instance, so I don't have to pay for the licensing for it; I only need six cores. In this scenario I can create a snapshot, a VSS snapshot, of the secondary instance, and spin up another instance from that. That could be Enterprise Edition or Standard Edition, depending on what features I need, and I can opt to pay for this instance only when I use it. So it's basically a license-included instance.
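A back-of-the-envelope sketch of the licensing math above, with illustrative numbers only (this is not AWS pricing, and it assumes your licensing terms allow a truly passive replica to go unlicensed): on premises, both nodes are in use and need licenses, while on EC2 the passive BYOL secondary needs none and the reporting node can be a license-included instance paid for only while it runs.

```python
CORES_PER_NODE = 6  # example from the talk: six cores per node

def byol_cores(node_roles) -> int:
    """Count cores that need your own license; passive replicas need none."""
    return sum(CORES_PER_NODE for role in node_roles if role != "passive")

on_prem = ["active", "active-reporting"]  # both sites constantly used
aws     = ["active", "passive"]           # reporting moved to an on-demand
                                          # license-included instance
print(byol_cores(on_prem))  # 12
print(byol_cores(aws))      # 6
```

The `byol_cores` helper is hypothetical; the point is simply that turning the secondary into a passive replica halves the BYOL core count in this example.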
With a license-included instance you don't have to pay for licensing out of your own pocket, so you don't pay for extra licensing cores when you're not using the secondary. And by going from an active to a passive replica, it may also be possible to downgrade from Enterprise to Standard. So it's basically an option for you to optimize licensing. In the previous model, on premises, you were using twelve cores. Now you have the option of downgrading the instance edition from Enterprise to Standard, and you're using six cores for BYOL, whereas the other instance, the one used for reporting or the OLAP workloads, you can spin up only when you need it. That way you can control the cost around SQL Server licensing, or you can also use the license-included instances. With that, folks, that brings me to the end of my talk. I hope you enjoyed it as much as I did. It was lovely talking to you. Have a good day. Thank you.