Transcript
Hi everyone, this is Solving Migration Pipelines. Migration is a nuanced topic. In some shape or form, every backend developer and administrator will end up in long migration projects that might seem like they have no end. How to think about a migration, what types are out there in the wild, and how to have a Zen approach towards it: that's what my talk is all about. It's not too technical, that's the first thing I would say. This is fully process driven, and how to think about a migration project is what we'll be focusing on. So what is considered a migration? When you say data migration, it is the process of transferring data from a source system to a target system.
Throughout the talk I'll refer to the source as the database or system you are using currently, and the destination or target as the new system you will be switching to. Whenever this kind of migration happens, it is pure chaos. How to solve those problems ahead of time, and how to think about all of this in a very organized, systematic manner, is what we are going to talk about. So let's get into it.
There are a lot of types of migrations out there. You will know database migration directly; most of us have worked on one in some shape or form. That is moving from one database to another, either because of an upgrade or a move to the cloud or anything of that sort. Cloud migration itself is a very big topic: you have applications that have been running on premises for a long time, and they are moving to the cloud now. There are application migrations, and a newer type is business process migration. That's my specialty as well, and it exists because of all the SaaS tools that are out there. And last, but not the least complicated, is storage migration. That said, all migrations have the same types of problems, and the same kind of systematic solutions will apply to any of the types I have listed.
So what are the systematic solutions I am proposing here? Assess first, and then plan. When you think about cloud migration especially, assessing what you currently have, what the environments are, what the dependencies are, what types of servers there are, what the CI/CD pipelines are, and the different aspects of how you currently run your system, is going to help you very much in planning the migration itself. Without assessing, if you go in stating this is my front end, this is my back end, and that's it, there could be some script that is vital to your project, currently running and serving the system, that is never thought about. How are you going to account for that particular small script that is running in your system and keeping it up, when you already have a live system running? If it is mission critical, you have to have it listed, of course. But even if it is not, you have to have it in your assessment, so you know its importance and how you are going to onboard such lesser used systems into your new environment.
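One simple way to capture that assessment, purely as an illustration of the idea (the asset names and fields below are made up, not output from any real discovery tool), is to keep a machine-readable inventory that records every component, its dependencies, and its criticality, so that even the small scripts show up:

```python
# A minimal sketch of an assessment inventory; all names and fields are
# illustrative assumptions, not output from any real discovery tool.
from dataclasses import dataclass, field

@dataclass
class Asset:
    name: str
    kind: str                      # "database", "service", "script", "pipeline", ...
    environment: str               # "prod", "staging", ...
    depends_on: list[str] = field(default_factory=list)
    mission_critical: bool = False

inventory = [
    Asset("orders-db", "database", "prod", mission_critical=True),
    Asset("billing-api", "service", "prod", ["orders-db"], mission_critical=True),
    Asset("nightly-cleanup.sh", "script", "prod", ["orders-db"]),  # easy to forget
]

# Even non-critical items stay on the list so the migration plan accounts for them.
for asset in inventory:
    print(asset.name, "critical" if asset.mission_critical else "assess impact")
```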
So assess first, assess what you have, and then plan. Plan end to end. When I say plan end to end, do not just think, down the line I want to move everything to the target and just connect my application to the new target system. That's not going to help you. You have to think about the BAU currently running in your source system, what the BAU will be in your target system, the destination system, and how you are going to optimize and solve the problems that exist in your current source system. Thinking about the end to end of it, and making sure your BAU runs correctly afterwards, is very crucial, especially in terms of databases and applications, and very, very crucial for cloud based migration.
So plan it end to end. Have a plan for CI/CD pipelines. Have a plan for how many users are going to use the system, how many users will be onboarded during the migration phase, and how many of them will be testing in the destination system. When BAU starts on the destination system, how will I onboard the application users and the database users? How will I make sure the entire structure that exists in the current system is not damaged when I go to the target system? That's what you will be planning, and you will always keep security and governance as the pillars behind any and all decisions. When you're choosing a product out there, when you're building a pipeline, whatever you do, even physically moving a storage disk from one place to another, you will be thinking about security and governance first.
Then the next thing: you develop some pipeline, so how will you test it? Testing can occur in multiple different phases. Always plan for intermediate testing phases where you build a pipeline, do some transformation, put it out there, and have UAT, user based testing, as well. Onboard a few beta users to make sure your target performs all the way through for every persona, so that the database administrators, the application administrators, and your end users are able to use the target system with a test amount of data, at least to the BAU level, not to the disaster level. We will get into that, but at least enough to run the BAU. Will the target be efficient enough? What are the things I have to do next? So test in phases, test for all personas, and have a foolproof testing mechanism. Even if it is storage, have byte level testing. Have a way to test any kind of scripts that you are running, any kind of intelligence that you are running, and any kind of business reports that you are running. Just try to open them and see what is happening. Is this adequate? Try to check at the layman level as well, right up to the administrator level.
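As a rough illustration of what byte level testing can look like (the directory paths are placeholders, and this assumes both copies are reachable from one machine), you can hash each object on the source and on the target and compare the digests:

```python
# Minimal sketch of byte-level verification via checksums; paths are placeholders.
import hashlib
from pathlib import Path

def digest(path: Path) -> str:
    """Stream the file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def compare_trees(source_root: Path, target_root: Path) -> list[str]:
    """Return relative paths whose bytes differ from, or are missing on, the target."""
    mismatches = []
    for src in source_root.rglob("*"):
        if src.is_file():
            rel = src.relative_to(source_root)
            dst = target_root / rel
            if not dst.exists() or digest(src) != digest(dst):
                mismatches.append(str(rel))
    return mismatches

# Example usage with placeholder directories:
# print(compare_trees(Path("/mnt/source"), Path("/mnt/target")))
```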
So that's the testing mechanism, and then post migration audits. Of course, every migration should be followed by an audit to make sure whatever process was followed throughout was done correctly.
Right. The most fundamental problem you run into in a migration project is change data capture. How will you make sure your source system and target system are almost on par when you are going to go live within, say, two days or so? How will you move those two days of data? Here I'm of course taking a vanilla example; in reality the two systems could be running in parallel for months on end, and you need a change data capture mechanism to make sure all of your source system's data moves to the target. Of course there are a lot of products out there that attempt to solve this in different ways, but mainly they fall into two categories: log based and query based.
Log based tools take the logs from the source system, reverse engineer them, and reproduce the data in your destination system. Query based works with a time based query: say my last migration ran at one timestamp and the next run happens at a later timestamp; whatever happened in your source system between these two timestamps has to be copied to the target system. These are the main change data capture approaches, and different kinds of databases handle it differently. If you look at Oracle, for example, you can feed its logs into a pipeline, and most of the cloud products have pre built pipelines for making sure CDC is captured correctly.
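To make the query based flavour concrete, here is a small self-contained sketch using SQLite in memory; the table name, the updated_at column, and the watermark handling are assumptions for illustration, not a description of any specific CDC product:

```python
# Minimal sketch of query-based CDC: copy rows changed since the last watermark.
# Table/column names and the watermark handling are illustrative assumptions.
import sqlite3

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL, updated_at TEXT)")

source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01T10:00:00"), (2, 25.0, "2024-01-01T11:30:00")],
)

def sync_changes(last_watermark: str) -> str:
    """Pull rows updated after the watermark and upsert them into the target."""
    rows = source.execute(
        "SELECT id, total, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    target.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    target.commit()
    # The new watermark is the latest change we copied; the target always lags
    # the source by whatever happened after this query ran.
    return rows[-1][2] if rows else last_watermark

watermark = sync_changes("1970-01-01T00:00:00")
print(target.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 2 rows copied
```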
So think about this in a really Zen manner: you will never be on par with your source system. You will always lag behind, so the question is how to lag behind smartly, and how to make sure that switching off your old system and moving to the new system happens within a small window, maybe minutes or even seconds, so that data loss is minimal. Think about that and design your system accordingly.
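One common way to keep that final gap small, sketched below with deliberately fake helpers (copy_delta and remaining_lag_seconds are stand-ins, not a real API), is to keep replaying deltas while the source stays live, and only freeze writes for one last short pass once the lag is tiny:

```python
# Sketch of a cutover loop; copy_delta() and remaining_lag_seconds() are
# illustrative stand-ins for whatever CDC mechanism you actually use.
import time

def copy_delta() -> None:              # pretend this ships the latest changes
    time.sleep(0.1)

def remaining_lag_seconds() -> float:  # pretend this measures source/target lag
    return 0.5

CUTOVER_THRESHOLD_SECONDS = 2.0

# Keep catching up while the source is still serving traffic.
while remaining_lag_seconds() > CUTOVER_THRESHOLD_SECONDS:
    copy_delta()

# Once the lag is small enough, pause writes on the source, copy the final
# delta, and switch traffic to the target: the outage is seconds, not hours.
print("freeze source writes")
copy_delta()
print("point applications at the target")
```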
One way to think about it is offline. What I mean by taking it offline is taking the source system offline, bringing the destination system online, and making sure the intermediate data gets transferred in between. Otherwise, if you are doing it online, you might have to think about approaching it service by service, and there are plenty of ways to bring the system online that way as well. CDC is a mathematical problem, so there is no way the target will be equal to or ahead of the source. That is simply not possible, at least not unless we are talking about quantum computers and that kind of unrealistically small realm where there could be instantaneous data transfer; technically that is not possible in our reality.
So let's focus on database migration. In database migrations, the key things you are looking at are that the data has to be transferred completely, with no problems at the bit or byte level, otherwise your database will not come up, and that it is duplicate free. Every time you think of a migration, you also think of a cleanup: what is not currently working well in my database, and shall I redesign and improve the indices or make other schema level changes so the quality gets better? When you attempt that kind of refactoring, you have to think about the data staying duplicate free. And the order is really important: timestamp based or alphabetical, whichever makes business sense, that order has to be maintained consistently from source to destination.
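As an illustration of those checks (again a sketch with assumed table and column names, not a prescription), you can compare row counts, look for duplicate keys, and verify that the business ordering survived:

```python
# Minimal sketch of completeness / duplicate / ordering checks after a
# database migration; table and column names are illustrative assumptions.
import sqlite3

def validate(source: sqlite3.Connection, target: sqlite3.Connection) -> dict:
    checks = {}
    # Completeness: same number of rows on both sides.
    src_count = source.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    tgt_count = target.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    checks["complete"] = src_count == tgt_count
    # Duplicate free: no primary key appears twice in the target.
    dupes = target.execute(
        "SELECT id FROM orders GROUP BY id HAVING COUNT(*) > 1"
    ).fetchall()
    checks["duplicate_free"] = not dupes
    # Ordering: rows come back in the same business order (here, by timestamp).
    src_order = source.execute("SELECT id FROM orders ORDER BY updated_at").fetchall()
    tgt_order = target.execute("SELECT id FROM orders ORDER BY updated_at").fetchall()
    checks["order_preserved"] = src_order == tgt_order
    return checks
```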
So that's a very brief overview of database migration. There are plenty of ETL tools out there you can choose from, all the way from freeware to the most developed ETL pipelines, that can be adapted for your use case.
Cloud migration, of course, is a relatively new thing at enterprise scale, but for a startup, if you are not developing in the cloud, where are you developing? That's the question, right? In cloud migrations you have to think about a lot of things. A migration project is mainly thought about with the six R's, and these six R's carry forward into application migration, database migration, everywhere; but in cloud migration they are the soul of it. That's how you decide between doing one thing or the other, so everything comes from your six R's. Once you have assessed your assets and know what systems the source has, you go through the six R's and assign the respective R to each system on the list you found in your source. The six R's are rehost, replatform, repurchase, retain, retire, and refactor.
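Purely as an illustration of that assignment step (the asset names and decisions below are invented, not recommendations), the output of the assessment can simply be a mapping from each system to one of the six R's, checked against the allowed set:

```python
# Illustrative sketch: assign one of the six R's to each assessed system.
# Asset names and decisions below are invented examples, not recommendations.
SIX_RS = {"rehost", "replatform", "repurchase", "retain", "retire", "refactor"}

migration_plan = {
    "orders-db": "replatform",         # move onto a managed database service
    "billing-api": "rehost",           # lift and shift as-is
    "legacy-ecommerce": "repurchase",  # replace with an equivalent SaaS offering
    "old-reporting-job": "retire",     # nobody uses the output any more
}

# Guard against typos: every decision must be one of the six R's.
assert all(choice in SIX_RS for choice in migration_plan.values())
```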
Rehosting means lifting the entire thing, putting it on your cloud platform, and starting it back up, so rehosting is the lift and shift approach. It's a very basic strategy, and it may not apply if you are thinking about quality improvements on a specific cloud based solution; in that case rehosting may not be the right choice. But if you are going from one cloud provider to another, lift and shift can be very good: the same solution, the equivalent service, will be provided by the other cloud platform as well, and rehosting can work.
Replatform means making sure the platform is more in line with cloud principles, like thinking about CI/CD, automated testing and deployment strategies, and your Kubernetes containers; when you think about those kinds of things, that is the replatform approach. Repurchase: when you have an old ecommerce kind of software running and you want to move to the cloud, at that time you will have the same kind of SaaS options available, so why not go for those? That is repurchase: stop the services of your old ecommerce solution and use something that is more in line with.