Conf42 Cloud Native 2021 - Online

Enabling multi-cloud and breaking vendor lock-in with Cloud Sidecar


Abstract

Many companies want to become multi-cloud or have the ability to switch clouds. However, cloud providers try hard to lock in customers with proprietary services like storage. Cloud Sidecar provides a simple way to take existing software and deploy it to different clouds without complex rewrites.

Cloud providers offer numerous services that abstract away common problems from software developers. No longer do companies need to manage their own file storage solutions, message queues, key-value stores, etc… The problem is, once you start building software on top of these services you get locked into that specific cloud provider. This is better known as vendor lock-in. Of course, the more services you use the harder it is to go multi-cloud or switch clouds.

Cloud Sidecar solves the problem of vendor lock-in by converting requests from one cloud’s API to another cloud’s API. So if your applications use Amazon Web Services’ S3 and SQS, you can easily deploy Cloud Sidecar next to your application and now it automatically uses Google Cloud’s GCS and Pub/Sub. Learn about Cloud Sidecar, how it works, how easy it is to modify your software to use it, and how you can deploy it.

Summary

  • Larry Finn: Imagine your CTO announcing that all successful companies need to be multicloud. Multicloud and cloud migration are just not easy, because as you use a cloud provider, you start using more and more of its services. Cloud Sidecar can help you take your existing or new software and make it work on multiple clouds.
  • Being able to go multicloud or switch clouds is a way to optimize your bills, as the rising AWS bills of some notable companies show. Being multicloud is also an exceptional level of redundancy. Going multicloud has two parts: the infrastructure problem and the software problem.
  • A demo shows an image uploading system using Cloud Sidecar to handle multiple clouds. Both the community and enterprise editions support AWS to GCP conversion, including S3 to GCS and GCS to S3 for file storage. How can you actually use this at your company or your site?
  • For big data on AWS, the usual service is Redshift, which is somewhat related to Postgres and has a Postgres interface. Cloud Sidecar uses its config to determine whether the destination is AWS Redshift or Google Cloud BigQuery. We're constantly improving this functionality of Cloud Sidecar.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi. Imagine a scenario. You wake up at 8:55 a.m. when your alarm goes off. You roll out of bed and walk over to your desk. This is now your office, since we're all working from home during the pandemic. You open up your laptop and see there's a meeting scheduled at 9:00 a.m. with your CTO. Of course, that meeting didn't exist yesterday evening, but that's how your CTO rolls. So you quickly fix yourself up, make yourself look presentable, brush your teeth, and dial into the Zoom meeting. Your CTO is going a mile a second, babbling about something, and eventually gets to the point: "I read an article that said all successful companies need to be multicloud. Get us multicloud by the end of the half." And then your CTO hangs up. Now you're staring at the Zoom prompt that says your meeting has ended, and you're pretty angry. Half of you is angry because you haven't had your coffee yet, and the other half is angry because you just had this giant project dropped in your lap. Multicloud and cloud migration are just not easy. The reason is a concept called vendor lock-in. As you use a cloud provider, you start using more and more of the services it provides. Imagine you start using S3 for storage, Kinesis for streaming, et cetera. These services make your software easy to build. But the more you use them, the more you get stuck with that vendor. You get locked in because these services are more or less proprietary to that vendor. So now you have this herculean task of breaking through vendor lock-in so that you can go multicloud. But then you wonder: is there a way to say, screw the hard way, let's do this the smarter way? Hi, I'm Larry Finn, the author of Cloud Sidecar, and I'm going to talk to you about how Cloud Sidecar can help you take your existing or even new software and make it work on multiple clouds or switch clouds.
Now, beyond that silly story about your CTO saying you have to go multicloud, there are a variety of real reasons to go multicloud. This infographic shows the rising cloud bills, specifically AWS bills, of some notable companies. Being able to go multicloud or switch clouds is a way to optimize your bills and keep them lower, maybe by pitting some clouds against each other. We've also all been there: your app or website is running slow, so you go to your cloud provider's status page and it says everything is A-OK. But then you realize half the Internet isn't working, so everything is not A-OK. Your cloud provider is clearly on fire. If you were multicloud, you could fail over to another cloud with no interruptions. I'm going to be honest here: the best first step is probably to be redundant across multiple data centers or regions within a single cloud before becoming redundant across multiple clouds. But being multicloud is an exceptional level of redundancy. Some clouds also just provide better services than others. One cloud might be great at storage, another might be great at big data, and a third might be great at machine learning. Wouldn't it be amazing if you could pick and choose the features of each cloud, mix them together, and make your product the best it possibly can be? And finally, cloud providers are not just companies that provide cloud services. They're giant corporations that do a lot of things. Amazon is a huge retailer, Google is the largest ad tech company, and Microsoft apparently puts trackers in vaccines. You might be working with companies that don't feel comfortable working with these cloud providers. Imagine you're a B2B company hosted on Google Cloud that does something with data, and a prospective customer is an ad tech company that doesn't want to use you because you're on Google Cloud.
Going multicloud or switching clouds is actually a two-part problem: there's the infrastructure problem and the software problem. I'm not really going to go into the infrastructure problem; that's not my strong point, and it's mostly a solved problem. You can use infrastructure as code, like Terraform, to build your infrastructure on a different cloud, or use Kubernetes to pick up your infrastructure and move it to a new cloud. What I'm going to talk about is how you handle your software. How do you make it multicloud, or enable it to switch clouds? Imagine you have a piece of software that works on AWS and you want it to work on GCP. (AWS is Amazon and GCP is Google.) It's a simple Python application. It uses the boto library, which is the Python Amazon library, and connects to S3 for storage and SQS for queues. How do you make this work on GCP? You would have to build a storage abstraction library, some sort of library layer that sits on top of both the boto library and the Google Cloud Storage library. Your code would interact with this storage abstraction library, and under the hood it would switch between boto to go to Amazon S3, or the Google Cloud Storage library to go to GCS, which is Google's storage. Your abstraction would have to be a single interface that supports the functionality in both clouds. You'd have to do a similar thing with a queue abstraction library, wrapping both the boto library and the Google Cloud Pub/Sub library. Now imagine that great CTO of yours decided your company should be polyglot so that any engineer can choose any language they want. Because what's the harm in that? So now your system has some Python, some Golang, some Scala, some Node.js, some Java, maybe some esoteric languages like, I don't know, Clojure, Erlang, Haskell, whatever. I'm not knocking any of these languages, but that's a lot of languages to support.
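To make the cost of that approach concrete, here is a minimal Python sketch of the kind of storage abstraction layer described above. It is not Cloud Sidecar code: `BlobStore`, `FakeS3Store`, `FakeGCSStore`, and `make_store` are hypothetical names, and the two backends are dict-backed stand-ins for wrappers around boto and the Google Cloud Storage library, so the sketch runs without any cloud credentials.

```python
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Hypothetical single interface that application code calls instead of a cloud SDK."""

    @abstractmethod
    def put(self, bucket: str, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, bucket: str, key: str) -> bytes: ...


class FakeS3Store(BlobStore):
    """Stand-in for a wrapper around boto's S3 client; stores objects in a dict."""

    def __init__(self) -> None:
        self._objects: dict = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self._objects[(bucket, key)] = data

    def get(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]


class FakeGCSStore(BlobStore):
    """Stand-in for a wrapper around the GCS client library; also dict-backed."""

    def __init__(self) -> None:
        self._buckets: dict = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self._buckets.setdefault(bucket, {})[key] = data

    def get(self, bucket: str, key: str) -> bytes:
        return self._buckets[bucket][key]


def make_store(provider: str) -> BlobStore:
    """Pick a backend from configuration; callers only ever see BlobStore."""
    backends = {"aws": FakeS3Store, "gcp": FakeGCSStore}
    return backends[provider]()


# Application code is identical whichever cloud the config names.
for provider in ("aws", "gcp"):
    store = make_store(provider)
    store.put("images", "cat.png", b"\x89PNG")
    print(provider, store.get("images", "cat.png"))
```

Every language in a polyglot stack would need its own version of a library like this, for every service it wraps.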
You're also using a lot of cloud services: S3 for storage, SQS for queues, Kinesis for streaming data, DynamoDB as a document store, and Redshift for big data. So with n languages and m services, you have to build n times m wrappers: each language needs a wrapper for each service. That's a lot of libraries and code to write. Imagine also that you're a pretty big company that has invested in microservices; in this example there are 1,500+ microservices. So not only do you have to change all that code in different languages for different services, you have to do it in 1,500+ applications, which might use cloud services in vastly different ways and through different libraries. You might not be a very DRY company, so this could be a deploy nightmare and really complicated to roll out. This is where Cloud Sidecar can help solve these problems. Imagine you have that Python application that uses the boto library to talk to AWS. Cloud Sidecar gets deployed next to your application using the sidecar design pattern. The sidecar design pattern just means an application that sits next to your application and helps it. This is a well-known design pattern; you see it with ZooKeeper, Consul, and any service mesh like Linkerd. So you have Cloud Sidecar sitting next to your application, and your application talks to Cloud Sidecar instead of the cloud itself. You point it at Cloud Sidecar, and it thinks it's talking to Amazon. In this example, Cloud Sidecar translates all the API requests to Google Cloud, gets the responses from Google Cloud, and translates them back into Amazon responses. So your Python application doesn't change at all. It thinks it's still talking to Amazon, but it is in fact talking to Google Cloud now, because Cloud Sidecar is transforming the API requests.
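The core sidecar idea (receive a request on a local port, then either pass it through to AWS or translate it for Google Cloud) can be sketched in a few lines of Python. This is a hypothetical illustration, not Cloud Sidecar's Go implementation: the `Route` config shape, the dict-based request format, and the GetObject-to-GCS field mapping are assumptions, and port 3450 is simply the example S3 port mentioned in this talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    api: str          # API the application speaks, e.g. "s3"
    destination: str  # "aws" = pass through, "gcp" = translate


# Hypothetical config: which API listens on which local port, and where it goes.
ROUTES = {
    3450: Route(api="s3", destination="gcp"),
    3451: Route(api="sqs", destination="aws"),
}


def translate_s3_to_gcs(request: dict) -> dict:
    """Map a simplified S3 GetObject request onto a simplified GCS download request."""
    if request["op"] != "GetObject":
        raise NotImplementedError(request["op"])
    return {"service": "gcs", "op": "download", "bucket": request["Bucket"], "name": request["Key"]}


def handle(port: int, request: dict) -> dict:
    route = ROUTES.get(port)
    if route is None:
        return {"status": 404, "error": f"no handler on port {port}"}
    if route.destination == "aws":
        # Pass-through: forward the request to AWS unchanged.
        return {"status": 200, "sent_to": f"aws/{route.api}", "request": request}
    if route.api == "s3":
        # Translate: rewrite for GCS; a real handler would also map the response back to S3's format.
        translated = translate_s3_to_gcs(request)
        return {"status": 200, "sent_to": translated["service"], "request": translated}
    return {"status": 501, "error": f"no translation for {route.api}"}


print(handle(3450, {"op": "GetObject", "Bucket": "images", "Key": "cat.png"}))
```

Because the routing decision lives entirely in the config, switching clouds means editing `ROUTES` rather than touching the application.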
Cloud Sidecar can support any language as long as it has a cloud library. And in theory it could support any cloud, because it's just translating between API requests. Getting a bit into the nitty-gritty of how the software actually works: imagine you have your application with its cloud library, whatever it happens to be. You can deploy Cloud Sidecar on the same machine. Say you're using an EC2 machine: you deploy Cloud Sidecar next to your application and configure the application's cloud library to connect to localhost. Or if you're using Kubernetes, the sidecar design pattern is built into Kubernetes and you deploy it that way. Once Cloud Sidecar is running, it exposes an HTTP router to handle these API requests, along with a configuration that drives which types of APIs are supported and which ports they're on. So you can have an S3 interface exposed on a specific port that your application connects to. When your application connects on that port, the request gets routed to the correct handler; for example, it'll get routed to the S3 handler on port 3450. Once your request reaches the S3 handler, it might just be passed through directly to S3 on AWS. Or the handler might interpret the request, convert it to Google Cloud Storage, get the response back from Google Cloud Storage, and send everything back, making it look like it came from S3. So your application thinks, for all intents and purposes, that it's talking to S3, or in this example, SQS or Kinesis. Cloud Sidecar is written in Golang, and there are several good reasons for this. This is a sidecar that runs next to your application, so you want a minimal memory and CPU footprint, and Golang is great for this. It's not like the JVM, which will eat all the memory on your machine.
Go is also a very simple language, which is great for an open source project where people can easily dive in, contribute code, modify code, et cetera, and there are a lot of open source libraries it can take advantage of. And finally, Go is a very performant language. It has a very nice concurrency system and a lot of low-level constructs. It might not be as performant as languages like Rust or C++, but because of its simplicity, Go wins out over those languages for this project. Now I'm going to show you a demo application that wraps this into a more concrete example. Imagine you run an image conversion website. We've all been there: you have a PNG file and for some reason you need a JPG file. So you google PNG to JPG conversion, find a website, click upload, upload your PNG, get a link to the JPG, and you're happy. You're running this website and it's doing great. It makes good money on ads, but AWS is charging you more and more for S3, and when you look into it, Google Cloud Storage is way cheaper. So you wonder how you can pick up your system and move it to Google Cloud. Let's look at the architecture you have. You have a Scala application using the Play framework, which is the front end that users interact with. They click upload, and it uploads a PNG to S3 and then puts a message on SQS with the S3 URL. On the other end, you have a Python worker listening to SQS for these messages. It converts the image from PNG to JPG using some open source library and uploads the result to S3, which the UI then displays. It's actually a great architecture, because you can scale up the workers and the front end as much as you want, and S3 and SQS can handle a lot of load.
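The demo's architecture is a classic producer/worker pipeline. The sketch below models it in plain Python under heavy simplifying assumptions: a dict stands in for the S3 bucket, `queue.Queue` stands in for SQS, and `fake_convert` is a placeholder where a real worker would call an image library. All names are hypothetical.

```python
import queue
import uuid

bucket: dict = {}     # stands in for the S3 bucket
jobs = queue.Queue()  # stands in for the SQS queue


def fake_convert(png: bytes) -> bytes:
    """Placeholder for the PNG-to-JPG conversion an image library would do."""
    return b"JPG:" + png


def upload(png: bytes) -> str:
    """Front end: store the PNG, then enqueue a message pointing at it."""
    key = f"uploads/{uuid.uuid4().hex}.png"
    bucket[key] = png
    jobs.put(key)
    return key


def work_once() -> str:
    """Worker: take one message, convert the image, store the result."""
    key = jobs.get()
    out_key = key[: -len(".png")] + ".jpg"
    bucket[out_key] = fake_convert(bucket[key])
    jobs.task_done()
    return out_key


src = upload(b"png-bytes")
dst = work_once()
print(src, "->", dst)
```

Because the front end and the workers share only the bucket and the queue, either side can be scaled independently, which is the property that makes the architecture easy to grow.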
I've actually prerecorded the demo (this whole talk is prerecorded), but the demo I'm going to show you is this image uploading system using Cloud Sidecar to handle multiple clouds. This is the UI. I'm going to dive into the Scala code here. It's just a simple controller where I inject two modules: an S3 module and an SQS module. The S3 module uses the standard AWS Java library and just tells it to connect to localhost on a certain port. Now when I interact with S3, I interact with it the way I always do in Java. Similarly, here I'm using the SQS library and connecting to localhost on a different port. In the worker I'm using boto3, similarly for SQS and S3, and I'm just passing it an endpoint URL with whatever port I want to use. I'm going to start up the worker; I'm just running one worker for the demo. Now I'll show you my S3 bucket: there's nothing there right now. And my GCS bucket: nothing there either. There's nothing up my sleeves. I'm not even wearing sleeves, but that's not important. Now I'm going to upload an image to my application. Everything's running, and Cloud Sidecar is pointing to AWS. My Play application uploads to S3, the worker picks a message off SQS, converts the image, and uploads the result to S3, and I see the resulting image in my UI. If we look at the Amazon bucket, we see the JPG and the PNG, the source and destination, and on Google Cloud we see nothing. Now I'm going to restart Cloud Sidecar with a different configuration, one that points to GCS. Typically you don't even need to restart Cloud Sidecar, since it dynamically loads configs, but it's easier for the demo. As you'll note, I didn't restart my worker or my Scala app. Now I'm going to upload a new image, and it's actually going to go to GCS.
A message is going to go to Pub/Sub, the worker is going to read it off Pub/Sub, convert the image, and upload it to GCS. And then we see the new image here. As you can see, the source and destination images are now in GCS and not in S3. Really cool stuff. Now that you've seen the demo, what are all the features of Cloud Sidecar, and how can you actually use this at your company or your site? Right now we have two editions, the community edition and the enterprise edition. Community is completely open source and enterprise is open core, but this might all just merge into open source; it's something we're playing with. As I mentioned before, both editions support any programming language you're using, and they both support AWS to GCP conversion as the main thing. I've been adding GCP to AWS conversion little by little, but no other clouds yet. For file storage, both support S3 to GCS and GCS to S3. For queues we support SQS to Pub/Sub, and for streaming data we support Kinesis to Pub/Sub. I'll go into what these queue and streaming data solutions actually are. Cloud Sidecar also supports customizable plugins: imagine we didn't support SQS. You could write a plugin to support SQS, drop it in, and Cloud Sidecar will pick it up. You don't even need to recompile Cloud Sidecar. We also support customizable middleware, so if you want to write some sort of logging, metrics, or encryption middleware, et cetera, you can write it really easily, drop it in, and configure it. The enterprise edition supports a few more things. We support key-value stores, which is DynamoDB to Datastore or Bigtable. We support big data, meaning Redshift to BigQuery SQL conversion. And for big data we also support customizable stored-procedure-style plugins.
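Middleware of the kind described above usually wraps a request handler in layers, each of which can act before and after the next one. Here is a minimal Python sketch of that pattern, assuming handlers are plain functions from a request dict to a response dict; `chain`, `logging_mw`, `metrics_mw`, and `s3_handler` are hypothetical and are not Cloud Sidecar's actual middleware interface.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], dict]
Middleware = Callable[[Handler], Handler]


def chain(handler: Handler, *middlewares: Middleware) -> Handler:
    """Wrap handler so that the first middleware listed runs outermost."""
    for mw in reversed(middlewares):
        handler = mw(handler)
    return handler


def logging_mw(log: List[str]) -> Middleware:
    def wrap(next_handler: Handler) -> Handler:
        def handle(request: dict) -> dict:
            log.append(f"-> {request['op']}")
            response = next_handler(request)
            log.append(f"<- {response['status']}")
            return response
        return handle
    return wrap


def metrics_mw(counts: Dict[str, int]) -> Middleware:
    def wrap(next_handler: Handler) -> Handler:
        def handle(request: dict) -> dict:
            counts[request["op"]] = counts.get(request["op"], 0) + 1
            return next_handler(request)
        return handle
    return wrap


def s3_handler(request: dict) -> dict:
    """Hypothetical terminal handler standing in for the S3 translation logic."""
    return {"status": 200, "op": request["op"]}


log: List[str] = []
counts: Dict[str, int] = {}
app = chain(s3_handler, logging_mw(log), metrics_mw(counts))
app({"op": "GetObject"})
print(log, counts)
```

Each middleware only knows about the handler it wraps, so logging, metrics, or encryption layers can be added or reordered through configuration without changing the handlers themselves.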
So basically, here's the idea behind the stored-procedure-style plugins. If you've used a lot of databases, you know SQL doesn't always convert into the most optimized SQL across different dialects. So you can write your own plugins that look like stored procedures and call them from your queries; Cloud Sidecar then calls the plugin, which generates a different query based on your destination, so you can make it as optimized as you want. We also support metrics integration with StatsD and Datadog, because, well, that's the big dog. For S3 to GCS, we support list v1 and list v2 (the two types of listing), ACL, head, get, put, multipart upload, delete, multi-delete, and copy. Now, I've kind of glossed over something: these services in different clouds are super similar, but there are small caveats in how they differ. One important one for S3 to GCS is that S3 supports something called multipart upload, which GCS does not. So what's multipart upload? Imagine you have a really large file, say ten gigabytes, and you want to upload it. Uploading a ten-gigabyte file is going to take a long time, you might get interrupted, and it's a pain in the bum. With S3, you can say, "I want to create a big file upload at this destination URL," and S3 will say, "Cool, just upload all the parts to this URL and tell me when you're done." So on your client side you split the ten-gigabyte file into 1 GB chunks and upload them in parallel or however you like. Once they're all done, you tell S3 you're done, and it merges those chunks together and puts the result at the destination. GCS does not have multipart upload, but it has something called combine, which combines a number of objects already uploaded to GCS. So under the hood we use that to mimic multipart upload. There are other caveats with the S3 API that I've discovered along the way with GCS, and I could talk about them at length.
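The multipart-upload emulation can be sketched as follows. GCS's name for the combine operation is compose, and one compose request accepts at most 32 source objects, so a large upload has to be combined in rounds. In this sketch, which is an assumed design rather than Cloud Sidecar's code, each part is stored as a temporary object under a hypothetical `.multipart/<upload id>/` prefix, the parts are composed in part-number order when the upload completes, and the temporary objects are then deleted. `FakeGCSBucket` is an in-memory stand-in for a real bucket, and the S3 calls are simplified (no key on create, no ETags).

```python
import uuid

MAX_COMPOSE_SOURCES = 32  # GCS compose accepts 1 to 32 source objects per request


class FakeGCSBucket:
    """In-memory stand-in for a GCS bucket with compose and delete."""

    def __init__(self) -> None:
        self.objects: dict = {}

    def compose(self, sources: list, dest: str) -> None:
        if not 1 <= len(sources) <= MAX_COMPOSE_SOURCES:
            raise ValueError("compose takes 1 to 32 source objects")
        self.objects[dest] = b"".join(self.objects[s] for s in sources)

    def delete(self, name: str) -> None:
        del self.objects[name]


class MultipartEmulator:
    """S3-style create / upload_part / complete on top of compose (hypothetical design)."""

    def __init__(self, bucket: FakeGCSBucket) -> None:
        self.bucket = bucket
        self.uploads: dict = {}  # upload id -> {part number: temp object name}

    def create(self) -> str:
        upload_id = uuid.uuid4().hex
        self.uploads[upload_id] = {}
        return upload_id

    def upload_part(self, upload_id: str, part_number: int, data: bytes) -> None:
        name = f".multipart/{upload_id}/part-{part_number:05d}"
        self.bucket.objects[name] = data
        self.uploads[upload_id][part_number] = name

    def complete(self, upload_id: str, key: str) -> None:
        parts = self.uploads.pop(upload_id)
        names = [parts[n] for n in sorted(parts)]  # S3 assembles parts by part number
        temporaries = list(names)
        round_no = 0
        while len(names) > MAX_COMPOSE_SOURCES:
            # Compose groups of up to 32 into intermediates, then repeat on the intermediates.
            merged = []
            for i in range(0, len(names), MAX_COMPOSE_SOURCES):
                tmp = f".multipart/{upload_id}/round{round_no}-{i // MAX_COMPOSE_SOURCES}"
                self.bucket.compose(names[i:i + MAX_COMPOSE_SOURCES], tmp)
                merged.append(tmp)
            temporaries.extend(merged)
            names = merged
            round_no += 1
        self.bucket.compose(names, key)
        for name in temporaries:
            self.bucket.delete(name)


bucket = FakeGCSBucket()
uploader = MultipartEmulator(bucket)
upload_id = uploader.create()
for n in reversed(range(1, 101)):  # 100 parts, uploaded out of order
    uploader.upload_part(upload_id, n, bytes([n]))
uploader.complete(upload_id, "big.bin")
print(len(bucket.objects["big.bin"]), sorted(bucket.objects))
```

Composing in rounds keeps every request within the 32-source limit, and sorting by part number reproduces S3's rule that the final object is assembled in part order regardless of upload order.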
I won't bore you, but if you're ever interested, just hit me up and we can chat about it. For GCS to S3, which is a beta feature we recently added, we support list, ACL, get, put, resumable upload, combine, delete, and copy. Similar to the S3-to-GCS incompatibilities, resumable upload and combine are not natively supported in S3, so we mimic them using existing functionality in S3. Great. Moving on to queues and streams, I want to explain the different queue offerings of the clouds and then what we offer on top of them. On the far left you have SQS, the Simple Queue Service, which is the simplest one. You create a queue and post messages to it, and workers listen to the queue and alternate who actually gets each message, so it's great for worker pools. On the other end of the spectrum is Kinesis, where you create a stream, drop messages into it, and they just flow through. Whoever's consuming these messages keeps track of the offset of the last message they read and keeps reading messages after that offset. Kinesis is a lot more similar to Kafka, if you're used to Kafka. SQS and Kinesis are both AWS. Somewhere in the middle is Pub/Sub, which is the GCP (Google Cloud) offering. It's sort of like RabbitMQ if you've ever used it. You post messages to a Pub/Sub topic, and you can create an arbitrary number of queues (subscriptions) attached to the topic, which start accumulating messages once they're published to the topic. The queues then work like SQS: you can have workers attached to them, and they'll retrieve messages in some alternating manner. There are no offsets or anything like that.
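The three delivery models described above can be contrasted with tiny in-memory models. These are illustrative sketches, not client code for any cloud: `WorkQueue` behaves like an SQS queue (each message goes to one of the competing workers), `Stream` like a Kinesis or Kafka stream (an append-only log where each reader tracks its own offset), and `Topic` like a Pub/Sub topic whose subscriptions each receive their own copy of every message published after they were created. One plausible way to serve SQS semantics on Pub/Sub is a topic with a single subscription shared by all workers; whether Cloud Sidecar maps queues exactly this way is an assumption.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


class WorkQueue:
    """SQS-like: competing workers, each message delivered to only one of them."""

    def __init__(self) -> None:
        self._messages: deque = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        return self._messages.popleft() if self._messages else None


class Stream:
    """Kinesis/Kafka-like: an append-only log; readers keep their own offsets."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def put(self, record: str) -> None:
        self.log.append(record)

    def read_from(self, offset: int) -> Tuple[List[str], int]:
        """Return the records from `offset` onward and the offset to resume from next time."""
        return self.log[offset:], len(self.log)


class Topic:
    """Pub/Sub-like: fan-out to subscriptions, each of which acts like a WorkQueue."""

    def __init__(self) -> None:
        self.subscriptions: Dict[str, WorkQueue] = {}

    def subscribe(self, name: str) -> WorkQueue:
        return self.subscriptions.setdefault(name, WorkQueue())

    def publish(self, message: str) -> None:
        for subscription in self.subscriptions.values():
            subscription.send(message)


topic = Topic()
thumbnails = topic.subscribe("thumbnails")
audit = topic.subscribe("audit")
topic.publish("img-1")
print(thumbnails.receive(), audit.receive())  # both subscriptions see img-1
```

The key difference is who owns the read position: the queue removes a message once it is delivered, whereas the stream keeps every record and leaves the offset to the consumer.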
For message queues we support SQS to Pub/Sub, with list, create, purge, delete, send, send batch, receive, delete message, and delete message batch. For Kinesis to Pub/Sub we support get records, get shard iterator, describe, publish, create stream, and delete stream. For NoSQL, which is mostly document stores and key-value stores, we support DynamoDB to Datastore. Just so you know, I'm not even sure it's still called Datastore; they might call it Firestore now, since Firestore took over Datastore. All this crazy Google Cloud stuff. The functionality we support is get item, query, scan, put item, update item, and delete item. There are interesting caveats between DynamoDB and Datastore. DynamoDB has the concepts of query and scan, which both search or filter for items but use different methods to do it. Datastore doesn't have that distinction, so both of these calls end up mapping to a single equivalent call in Datastore. Also, DynamoDB has a pretty complicated JSON-based nested filtering and updating system. Datastore has no such system; its language is simpler, so we convert as best we can. We're constantly improving this functionality of Cloud Sidecar. Big data is kind of an outlier: all the other systems I've talked about have a pretty simple REST API (maybe with XML if it's S3; that's how old S3 actually is). But for big data on AWS we use Redshift, which is actually somewhat related to Postgres and has a Postgres interface. So for big data, Cloud Sidecar exposes a Postgres interface. Your application doesn't even use a cloud library; it connects however it normally connects to a database. So whether it's JDBC or whatever you use to connect to a Postgres database, you point it at Cloud Sidecar instead of Redshift.
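For big data, the stored-procedure-style plugins let you hand-write destination-specific SQL. Here is a minimal Python sketch of that idea under stated assumptions: the `{{ name(args) }}` marker syntax, the `plugin` registry decorator, and the `recent_orders_filter` plugin are all hypothetical, and Cloud Sidecar's real plugin call syntax may differ. The two generated predicates use standard date functions from each dialect (`DATEADD` and `GETDATE` in Redshift, `DATE_SUB` and `CURRENT_DATE` in BigQuery).

```python
import re
from typing import Callable, Dict

PLUGINS: Dict[str, Callable[..., str]] = {}


def plugin(name: str):
    """Register a function as a hypothetical SQL plugin under `name`."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        PLUGINS[name] = fn
        return fn
    return register


@plugin("recent_orders_filter")
def recent_orders_filter(destination: str, days: str) -> str:
    """Emit a date predicate written natively for each destination's SQL dialect."""
    n = int(days)
    if destination == "redshift":
        return f"order_date >= DATEADD(day, -{n}, GETDATE())"
    if destination == "bigquery":
        return f"order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL {n} DAY)"
    raise ValueError(f"unsupported destination: {destination!r}")


# Hypothetical marker for a plugin call embedded in otherwise ordinary SQL.
PLUGIN_CALL = re.compile(r"\{\{\s*(\w+)\(([^)]*)\)\s*\}\}")


def expand(sql: str, destination: str) -> str:
    """Replace each plugin call with the SQL that plugin generates for `destination`."""
    def call(match: re.Match) -> str:
        args = [a.strip() for a in match.group(2).split(",") if a.strip()]
        return PLUGINS[match.group(1)](destination, *args)
    return PLUGIN_CALL.sub(call, sql)


query = "SELECT customer_id FROM orders WHERE {{ recent_orders_filter(30) }}"
for destination in ("redshift", "bigquery"):
    print(destination, expand(query, destination))
```

Keeping the dialect-specific text inside the plugin means the application's query stays the same whichever warehouse the sidecar is configured to use.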
Cloud Sidecar uses its config to determine whether the destination is AWS Redshift or Google Cloud BigQuery. If it's Redshift, it just proxies the queries through to Redshift and proxies the responses back to you. If it's BigQuery, it parses the SQL, converts it to BigQuery SQL, then parses the BigQuery response and returns it as if it were a Postgres or Redshift response. If you use our special stored-procedure plugins, there are special commands in your SQL that call them. When we see one, we call your code and pass it the destination, along with whatever the call is trying to trigger. In your code you can say: if this is called with certain parameters and Redshift is my destination, create a SQL query like this; but if BigQuery is my destination, create a SQL query like that. This is all dynamic, so you can optimize it as much as you want. For general SQL conversion in big data, we support insert, select, delete, unload (basically export data), copy (basically import data), rename table, create table, and drop table. Great. So I've shown you a demo of how this could work in the real world and walked through all our functionality, but who's actually using this? Cloud Sidecar is mostly a side project of mine, but my day job has been at a company called ActionIQ, or AIQ. We're what's known as a CDP, a customer data platform. Maybe you can interpret it from the picture. A customer data platform works with large companies: it sucks up all of a company's data from various data sources, internalizes it, and makes it usable, so that the data can be built into user audiences. Those audiences can then be exported into external systems like email, advertisements, et cetera. So let me give you a concrete example.
ActionIQ has Michael Kors as a customer, and Michael Kors has data about online orders, online clicks, online returns, in-store purchases, in-store returns, et cetera. These might be stored in various data stores, and ActionIQ sucks all that data in. Then we have a really nice user interface that lets marketers working at Michael Kors build segments or audiences of customers based on some criteria. They might want to find out who spent over $3,000 on shoes last year but hasn't bought shoes this year. They can quickly put that into our system with drag and drop, see a count really quickly, and realize this is a nice audience size to target that could probably get them some good sales. Then they can send those customers a coupon via email and a cool dance video via TikTok, because everyone likes dance videos on TikTok. I don't know; I don't use TikTok. ActionIQ has typically been on Amazon, but we started getting approached by, and approaching, customers who said they did not want to be on Amazon for multiple reasons and would prefer Google Cloud. So we undertook a project to go multicloud, with Google Cloud as our second cloud provider. Of course we did the infrastructure, which I'm not going to get into, but we were able to leverage my project, Cloud Sidecar, to make our software work on both Google Cloud and AWS with very little code change. Since we're a big data company, we use a lot of AWS services to move data very simply. So being able to just deploy Cloud Sidecar with our application saved us a lot of time on code changes, et cetera. We were actually able to release multicloud before our deadline, which is shocking to me because I've never released before a deadline, and it made our customer very happy. We've been using Cloud Sidecar in production for many months, going on years now.
Of course we found bugs, and we fixed them, but Cloud Sidecar was a big help in going multicloud and making that customer successful. Great. So I've told you everything about Cloud Sidecar. You've heard my spiel; please try it out. Go to cloudsidecar.com for more information. We have a link to our GitHub at github.com/cloudsidecar. Try it, download it, play around with it on your cloud, contribute to it; it's our open source project. You can create issues, all that good stuff. If you're interested in the enterprise offering, like if you're a big company, feel free to reach out to me through the website or email me at larry@cloudsidecar.com. I'm also always free to talk about multicloud or anything really, so hit me up. I'm always around; I've got nothing else going on during the pandemic. If you want to see the image uploader, that code is real: you can find it on GitHub under Lawrence Finn's Cloud Sidecar image demo. And if you want, follow me on Twitter at Lawrence Finn. I have three followers: one is my mom and one is my cat, so I don't know, maybe there's another follower out there. And I'm also on LinkedIn, if you use that.

Lawrence Finn

Author @ Cloud Sidecar



