Conf42 Cloud Native 2023 - Online

Solving Migration Pipelines

Abstract

A migration project often involves a to-and-fro system, which means 2x the infrastructure to maintain and 2x the operational problems to solve. There are industry-wide standards that can be adopted for any migration, but how would you weave them into your specific migration problem?

Summary

  • Migration is the process of transferring data from the source system to the target system. How to think about a migration project is what we'll be focusing on. Whenever this kind of migration happens, it is pure chaos. So how do we solve those problems ahead of time?
  • There are a lot of types of migrations out there. You will know database migration directly. A newer type of migration is business process migration. And last, but not the complicated one, is storage migration. All migrations have the same types of problems.
  • Assess first: assess what you have, and then plan end to end. Have a plan for CI/CD pipelines. Test in phases, test for all personas, and have a foolproof testing mechanism, even if it is storage. Every migration should be followed by an audit to make sure everything was done correctly.
  • The most fundamental problem that you run into in a migration project is change data capture. How will you make sure your source system and target system are almost on par? The two categories of solution are log based and query based. Think about that and design your system accordingly.
  • In database migrations, the key things you are looking at are that the data has to be transferred completely, otherwise your database will not come up, of course, and that it is duplicate free. You have plenty of ETL tools out there to choose from.
  • Cloud migration is, of course, a relatively new thing at enterprise scale. But for a startup, if you are not developing in the cloud, where are you developing? In cloud migrations you have to think about a lot of things. Everything comes from your six R's.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, this is Solving Migration Pipelines. Migration is a nuanced topic. In some shape or form, all backend developers, and of course administrators, will end up in long migration projects that might seem to have no end. How to think about a migration, what types are out there in the wild, and how to have a Zen approach towards it: that's what my talk is all about. It's not too technical, that's the first thing I would say. This is fully process driven, and how to think about a migration project is what we'll be focusing on. So what is considered migration? Data migration is the process of transferring data from the source system to the target system. Throughout the talk I'll refer to the source as the database or system that you are using currently, and to the destination or target as the new system that you will be switching to. Whenever this kind of migration happens, it is pure chaos. So how do we solve those problems ahead of time, and how do we think about all this in a very organized, systematic manner? That's what we are going to talk about. So let's get into it. There are a lot of types of migrations out there. You will know database migration directly; most of us have worked on one in some shape or form. That's from one database to another, whether due to an upgrade, a move to the cloud, or anything of that sort. Cloud migration itself is a very big topic: you have been running applications on premise for a long time and they are moving to the cloud now. Then there are application migrations, and a newer type of migration is business process migration. That's my specialty as well, and it exists because of all the SaaS tools that are out there. And last, but not the complicated one, is storage migration.
That said, all migrations have the same types of problems, and the same systematic solutions apply to any of the types I have mentioned. So what are the systematic solutions I am proposing here? Assess first, and then plan. When you think about cloud migration especially, assess what you have currently: what are the environments, what are the dependencies, what types of servers are there, what are the CI/CD pipelines? Knowing the different aspects of how you currently run your system is going to help you very much in planning the migration itself. Without assessing, if you go in stating "this is my front end, this is my back end, and that's it", some script that is vital to your project, one that is running currently and serving the system, will not be thought about. How are you going to account for that particular small script that is running in your system and keeping it up when you already have a live system running? If it is mission critical, you have to have it listed, of course. But even if it is not, you have to have it in your assessment so you know its importance and how you are going to onboard such lesser-used systems in your new environment as well. So assess first: assess what you have, and then plan. Next, plan end to end. When I say plan end to end, do not just think, "I want to move everything to the target and connect my application to the new target." That alone is not going to help you. You have to think about the BAU (business as usual) currently running in your source system, what BAU is going to run in your target system, that is, the destination system, and how you are going to optimize and solve the problems that exist in your current source system.
So thinking about the end-to-end picture and making sure your BAU runs correctly afterwards is crucial, especially for databases and applications, and very, very crucial for cloud-based migration. So plan it end to end. Have a plan for CI/CD pipelines. Have a plan for how many users are going to use the system, how many users will be onboarded during the migration phase, and how many of them will be testing in the destination system. When BAU starts on the destination system, how will I onboard the application users and the database users? How will I make sure the entire structure that exists in the current system is not damaged when I move to the target system? That's what you will be planning, and you will always keep security and governance as your pillars for any and all decisions. When you're choosing a product, when you're building a pipeline, whatever you do, even physically moving a storage disk from one place to another, you will be thinking about security and governance first. Then comes the next thing: you have developed some pipeline, so how will you test it? Testing can occur in multiple phases. Always plan for intermediate testing phases where you build a pipeline, do some transformation, and put it out there. Have UAT (user acceptance testing) as well: onboard a few beta users to make sure your target performs all the way through for every persona, that is, the database administrators, the application administrators, and your end users are all able to use the target system with a test amount of data, at least to the BAU level, if not to the disaster level. We will get into that, but at least to run the BAU: will the target be efficient enough? What are the things I have to do next? So test in phases, test for all personas, and have a foolproof testing mechanism. Even if it is storage, have byte-level testing. Have a way to test any kind of scripts that you are running, right?
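As one way to picture the byte-level testing mentioned for storage migrations, here is a minimal sketch that compares SHA-256 digests of every file under a source directory with its counterpart under a target directory. The function names `file_digest` and `compare_trees` and the report layout are illustrative assumptions, not something from the talk, and real storage systems may need provider-specific tooling instead.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(source_root: Path, target_root: Path) -> dict:
    """Compare files under source_root with the same relative paths under target_root.

    Hypothetical report layout:
      missing    - present in source, absent in target
      mismatched - present in both, but the bytes differ
      extra      - present only in target
    """
    source_files = {p.relative_to(source_root) for p in source_root.rglob("*") if p.is_file()}
    target_files = {p.relative_to(target_root) for p in target_root.rglob("*") if p.is_file()}

    report = {"missing": [], "mismatched": [], "extra": []}
    for rel in sorted(source_files):
        if rel not in target_files:
            report["missing"].append(rel.as_posix())
        elif file_digest(source_root / rel) != file_digest(target_root / rel):
            report["mismatched"].append(rel.as_posix())
    report["extra"] = sorted(p.as_posix() for p in target_files - source_files)
    return report
```

Calling `compare_trees(Path("/mnt/old"), Path("/mnt/new"))` (both paths hypothetical) returns three lists; an empty `missing` and `mismatched` is the byte-level signal that the copy is complete.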
The same goes for any kind of intelligence that you are learning and any kind of business reports that you are running. Just try to open them and see what is happening. Is this adequate? Try to see it at the layman level as well, right up to the administrator level. So that's the testing mechanism. Then come post-migration audits. Every migration should be followed by an audit to make sure the process that was followed throughout was done correctly. The most fundamental problem that you run into in a migration project is change data capture. How will you make sure your source system and target system are almost on par when you are going to go live within, say, two days? How will you move those two days' data too? Here I'm of course taking a vanilla example; in practice the systems could be running in parallel for months on end, and you need a change data capture mechanism to make sure all of your source system's data moves to the target. There are a lot of products out there that attempt to solve this in different ways, but they mainly fall into two categories: log based and query based. Log based takes the logs from the source system, reverse engineers them, and produces the data in your destination system. Query based works with a time-based query. Say I have migrated up to a certain second, and next I'll go up to one hour and fifty-five seconds: whatever happened in your source system between these two timestamps has to be copied to the target system. These are the main aspects of change data capture, and it is handled differently in different kinds of databases. With Oracle, you'll be able to transfer to a pipeline, and most cloud products have pre-built pipelines for making sure CDC is captured correctly. So think about it in a really Zen manner: you will never be on par with your source system.
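The query-based approach described above can be sketched roughly as follows, using in-memory SQLite databases as stand-ins for the source and target. The `customers` table, its `updated_at` column, and the `sync_window` function are assumptions made for illustration; a real source would need a reliably maintained change timestamp for this to work.

```python
import sqlite3

# Hypothetical schema: updated_at is a change timestamp maintained by the source.
DDL = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"


def sync_window(source, target, last_sync, upper_bound):
    """Copy rows changed in the window (last_sync, upper_bound] from source to target.

    Returns the number of rows copied. Hard deletes on the source are not
    visible to a timestamp query, so this sketch does not propagate them.
    """
    rows = source.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? AND updated_at <= ? ORDER BY updated_at, id",
        (last_sync, upper_bound),
    ).fetchall()
    # INSERT OR REPLACE keeps the copy idempotent: re-running a window
    # overwrites rows by primary key instead of duplicating them.
    target.executemany(
        "INSERT OR REPLACE INTO customers (id, name, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    return len(rows)


source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute(DDL)
target.execute(DDL)

# Initial load: everything changed up to timestamp 200.
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", 100), (2, "Grace", 150)],
)
first = sync_window(source, target, 0, 200)

# The source keeps changing while both systems run in parallel.
source.execute("UPDATE customers SET name = 'Ada L.', updated_at = 250 WHERE id = 1")
source.execute("INSERT INTO customers VALUES (3, 'Linus', 260)")
second = sync_window(source, target, 200, 300)
```

Each call moves the target forward to the upper bound of its window, so the target always trails the source by whatever has changed since the last window closed.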
You will always lag behind. The question is how to lag behind smartly, and how to make sure that switching off your old system and moving to the new system takes very little time, maybe minutes or seconds. There could be minimal data loss. So think about that and design your system accordingly. One way to think about it is taking it offline. What I mean by taking it offline is taking the source system offline, then bringing the destination system online and making sure the intermediate data is transferred. Otherwise, if you are doing it online, you might have to think about approaching it service by service, and there are plenty of ways to bring the system up online. Also, CDC is a mathematical problem: there is no way the target will get equal to or ahead of the source. That is not possible, at least not unless we are talking about quantum computers and that kind of unrealistically small realm, where there could be instantaneous data transfer. Technically that is not possible in our reality. So let's focus on database migration. In database migrations, the key things you are looking at are these. The data has to be transferred completely. There shouldn't be any problem in bits or bytes; otherwise your database will not come up, of course. And it has to be duplicate free. Every time you think of a migration, you also think of a cleanup: what is not currently working well in my database? Shall I redesign and improve the indices, or make any other schema-level changes, to make sure the quality gets better? And when you try that kind of refactoring, you have to think about the data being duplicate free. And the order is really important. Whether timestamp-wise or alphabetic, whichever makes business sense, that order has to be maintained consistently from source to destination. So that's a very brief look at database migration.
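The three database checks named above (complete, duplicate free, order maintained) could be expressed as a small post-load validation. This is only a sketch over in-memory rows: `validate_migration`, the `key` and `order_by` callables, and the sample tuples are hypothetical, and on real tables the same checks would more likely run as queries on each side.

```python
from collections import Counter


def validate_migration(source_rows, target_rows, key, order_by):
    """Check a migrated row set for completeness, duplicates, and ordering.

    key      - function returning a row's business key
    order_by - function returning the value rows should be ordered by
    """
    source_keys = [key(r) for r in source_rows]
    target_keys = [key(r) for r in target_rows]

    # Complete: every source key must exist in the target.
    missing = sorted(set(source_keys) - set(target_keys))
    # Duplicate free: no key may appear more than once in the target.
    duplicates = sorted(k for k, n in Counter(target_keys).items() if n > 1)
    # Ordered: the target must be non-decreasing in the business ordering.
    ordered = all(
        order_by(a) <= order_by(b) for a, b in zip(target_rows, target_rows[1:])
    )
    return {"missing": missing, "duplicates": duplicates, "ordered": ordered}


# Hypothetical rows of (id, created_at).
source_rows = [(1, 100), (2, 105), (3, 110)]
bad_target = [(1, 100), (3, 110), (3, 110)]

by_id = lambda row: row[0]
by_time = lambda row: row[1]

bad_report = validate_migration(source_rows, bad_target, by_id, by_time)
good_report = validate_migration(source_rows, list(source_rows), by_id, by_time)
```

Here `bad_report` flags key 2 as missing and key 3 as duplicated, while `good_report` comes back clean.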
You have plenty of ETL tools out there to choose from, all the way from freeware to the most developed ETL pipelines, that can be adapted for your use case. Now, cloud migrations. Cloud migration is, of course, a relatively new thing at enterprise scale. But for a startup, if you are not developing in the cloud, where are you developing? That's the question, right? In cloud migrations you have to think about a lot of things. A migration project is mainly thought about in terms of the six R's. These six R's carry forward into application migration, database migration, everywhere, but in cloud migration they are the soul of it. That's how you decide between doing one thing or the other. So everything comes from your six R's. Once you have explored your assets, that is, what systems the source has, you go through the six R's and assign the respective R to each system in the list that you found in your source. The six are rehost, replatform, repurchase, retain, retire, and refactor. Rehosting is lifting the entire thing, putting it on your cloud platform, and starting it back up, so rehosting is the lift-and-shift approach. It's a very basic strategy and may not give you the quality improvements of any specific cloud-based solution. If that is what you are after, rehosting may not be the right solution. But if you are going from one cloud provider to another, lift and shift could be very good: the equivalent service will be provided by the other cloud platform as well, and rehost can work. Replatform is about making sure the platform is more in line with cloud principles, like CI/CD, automated testing, deployment strategies, and your Kubernetes containers. When you think about all those kinds of things, that is replatforming. Then repurchase: when you have an old ecommerce kind of software running and you want to move to the cloud, you will have SaaS options of the same kind. Why not go for that? That is repurchase.
That is, stop the services of your old ecommerce solution and use something that is more in line with.
...
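The "assess first, then assign one of the six R's to every system" step from the transcript can be sketched as a small inventory check. Everything named here (`Strategy`, `Asset`, `unassigned`, `group_by_strategy`, and the sample inventory) is a hypothetical illustration; the choice of R for each asset is a human decision and is simply recorded, not computed.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    """The six R's as listed in the talk."""
    REHOST = "rehost"
    REPLATFORM = "replatform"
    REPURCHASE = "repurchase"
    RETAIN = "retain"
    RETIRE = "retire"
    REFACTOR = "refactor"


@dataclass(frozen=True)
class Asset:
    """One entry in the assessment inventory (fields are illustrative)."""
    name: str
    kind: str
    mission_critical: bool = False


def unassigned(inventory, assignments):
    """Return names of assessed assets that have no strategy yet."""
    return [a.name for a in inventory if a not in assignments]


def group_by_strategy(assignments):
    """Group asset names under the strategy chosen for them."""
    groups = defaultdict(list)
    for asset, strategy in assignments.items():
        groups[strategy].append(asset.name)
    return dict(groups)


# Hypothetical inventory, including the easily forgotten small script.
web = Asset("storefront-web", "application", mission_critical=True)
db = Asset("orders-db", "database", mission_critical=True)
script = Asset("nightly-cleanup-script", "script")
inventory = [web, db, script]

# Decisions recorded so far; the script has not been looked at yet.
assignments = {web: Strategy.REPURCHASE, db: Strategy.REHOST}

todo = unassigned(inventory, assignments)
plan = group_by_strategy(assignments)
```

In this example `todo` still contains the cleanup script, which is the kind of lesser-used system the assessment step is meant to surface before planning moves on.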

Sindhuja Nagarajan

Data Migration / Data Engineering Lead @ Chargebee



