Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi. Imagine a scenario. You wake up at 8:55 a.m.
Your alarm goes off. You roll out of bed, walk over to your
desk. This is now your office, since we're all working from
home during the pandemic. You open up your laptop and you see there's a meeting
scheduled at 9:00 a.m. with your CTO. Of course, that meeting wasn't
in existence yesterday evening, but that's how your CTO
rolls. So you quickly fix yourself up, make yourself look presentable,
brush your teeth, and dial into the Zoom meeting. Now your CTO
is going a mile a second, babbling about something,
and eventually your CTO gets to the point: "I read an article
that said all successful companies need to be multicloud. Get us multicloud
by the end of the half." And then your CTO hangs up. Now you're
staring at that prompt in Zoom that says your meeting has ended and
you're pretty angry. Half of you is angry because you haven't had your coffee yet,
and the other half of you is angry because you had this giant project dropped in
your lap. Multicloud and cloud migration are just not easy.
The reason it's not easy is because of this concept called vendor lock-in.
As you use a cloud provider, you start using more and more
services that they provide. So imagine you start to use S3 for
storage. You start to use Kinesis for streaming,
et cetera. And these services make your life and software easy to
build. But the more you use these services, the more you
get stuck with that vendor. You get locked in with that vendor because
they are more or less proprietary to that vendor. So now you have this
herculean task to break through vendor lock-in so
that you could go multicloud. But then you wonder, is there a way
to say, screw the harder way, we're doing this the smarter way?
Hi, I'm Larry Finn, the author of Cloud Sidecar,
and I'm going to talk to you about how Cloud Sidecar can help you
take your existing or even new software and make it
work on multiple clouds or switch clouds. Now,
beyond that silly story about your CTO saying you have to go multicloud,
there are a variety of reasons to actually go multicloud. This
infographic, from The Information, shows the rising costs
of cloud bills, specifically AWS bills, for some notable companies.
Being able to go multicloud or to switch clouds is
a way that you could actually optimize your bills and try to keep
your bills lower, maybe pit some clouds against each other.
We've all been there where your app or your website is just
running slow. So you go to your cloud provider status page and it says
everything is AOK. But then you realize half the Internet isn't working,
so everything is not AOK. Your cloud provider is clearly on
fire. Now if you were multicloud, you could easily
fail over to another cloud and have no interruptions.
I'm going to be honest here: it's probably best as a first step
to be redundant across multiple data centers or regions
within a single cloud before becoming redundant across multiple clouds.
But being multicloud is an exceptional level of redundancy.
Some clouds just provide better services than others. So one cloud
might be great at storage, another cloud might be great at big data,
and a third cloud might be great at machine learning. Wouldn't it be amazing if
you could pick and choose the features of each cloud, mix them
together and make your product the best it possibly can be?
And finally, cloud providers are not just single companies
that provide cloud services. They're actually these giant
corporations that do a lot of things. So Amazon
is a huge retailer, Google is the largest
ad tech company, and Microsoft, apparently, they put
trackers in vaccines. Now you might be working with
companies that might not feel comfortable to be working
with these cloud providers. So imagine you are a B2B company.
You're hosted on Google Cloud, you do something with data, and
a prospective customer is an ad tech company and doesn't
want to use you because you are on Google Cloud. Actually going
multicloud or switching clouds is a two-piece problem.
There's the infrastructure problem and the software problem.
I'm not going to really go into the infrastructure problem; that's not my strong
point, and it's mostly a solved problem. You can use infrastructure as code,
like Terraform, to build your infrastructure on a
different cloud, or use Kubernetes and just pick up all your infrastructure and
just move it to a new cloud. But what I'm going to talk about is
how do you handle your software? How do you make that multicloud or
enable it to switch clouds? Imagine you have a piece of software that works on
AWS and you want it to work on GCP. AWS is
Amazon and GCP is Google. So it's a simple Python application.
It uses the boto library, which is the Python Amazon library,
and you're connecting to S3 for storage and SQS for queues.
Now how do you make this work on GCP? Well, you would have to build
a storage abstraction library, or some sort
of library layer that sits on top of
both the boto library and the Google Cloud Storage library.
And your code would interact with this storage abstraction library.
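As a rough illustration of the storage abstraction just described, here is a minimal Python sketch. The names `ObjectStore`, `InMemoryStore`, and `make_store` are hypothetical, and the in-memory backend only stands in for real wrappers around boto and the Google Cloud Storage client; it is not code from the talk.

```python
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Single interface the application codes against (hypothetical name)."""

    @abstractmethod
    def put(self, bucket: str, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, bucket: str, key: str) -> bytes: ...


class InMemoryStore(ObjectStore):
    """Stand-in backend; a real one would wrap boto or the GCS client."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self._objects[(bucket, key)] = data

    def get(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]


def make_store(provider: str) -> ObjectStore:
    # Assumption: the provider name comes from configuration ("aws" or "gcp").
    backends = {
        "aws": lambda: InMemoryStore("s3"),
        "gcp": lambda: InMemoryStore("gcs"),
    }
    return backends[provider]()


store = make_store("gcp")
store.put("images", "cat.png", b"\x89PNG")
print(store.name, store.get("images", "cat.png"))
```

The application only ever calls `put` and `get`; which cloud sits underneath is decided in one place.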
And underneath the hood it would switch between boto to go to Amazon or Google
Cloud Storage to go to GCS, which is the Google
storage. Now your abstraction would have to be a single interface
that supports functionality in both clouds, and you have
to do a similar thing for a queue abstraction library. You would have to wrap
around the boto library and wrap around the Google Cloud
Pub/Sub library. Now imagine you have that CTO, that
great CTO, who decided that your company should be polyglot so
that any engineer can choose any language they want. Because what's the
harm in that? So now your system has some Python,
some Golang, some Scala, some Node.js, some Java,
maybe some esoteric languages like, I don't know,
Clojure, Erlang, Haskell, whatever. I'm not knocking any of these languages,
but that's a lot of languages to support. Now you're also using a lot of
cloud services. So you use S3 for storage, SQS for
queues, Kinesis for streaming data,
DynamoDB for document store, and Redshift for
big data. So you have these n languages and
these m services. So now you have to
build n times m of those wrappers, since each language
has to have wrappers for each of these services. And that's a lot
of libraries and code to write. Imagine also you're a pretty big company,
you've invested in microservices. So in this example there are 1,500-plus
microservices. So now not only do you have to
change all that code in different languages for different services,
you have to do it in 1,500-plus applications, which might be
using cloud services in vastly different ways and
might be accessing and using different libraries. You might not be a very DRY
company, so this could be a deploy nightmare and
could be really complicated to roll out. So this is where Cloud
Sidecar can help. It can help solve these problems. Now imagine
you have that Python application that uses the boto library to
talk to AWS. What Cloud Sidecar does is it gets
deployed next to your application in the sidecar design pattern.
A sidecar design pattern just means an application that sits next to your
application and helps your application. This is a well-known
design pattern. You have it in ZooKeeper,
Consul, and any service mesh like
Linkerd. So you have Cloud Sidecar sitting next to your application.
Your application talks to Cloud Sidecar instead of the cloud
itself. So you tell it to talk to Cloud Sidecar, and
it thinks it's talking to Amazon. In this example,
Cloud Sidecar translates all the API requests to
Google Cloud, gets the responses from Google Cloud, and
translates them back to Amazon. So your Python application doesn't
change at all. It thinks it's still talking to Amazon, but it is indeed
talking to Google Cloud. Now, because this is transforming API
requests, Cloud Sidecar can support any language
as long as it has a cloud library. And in theory
it could support any cloud, because it's just translating between API requests.
Getting a bit into the nitty gritty of how the software actually works.
Imagine you have your application with the cloud library,
whatever it happens to be, and you deploy Cloud Sidecar
on this machine. Let's say you're using an EC2 machine:
you deploy it next to your application and you configure your application to connect to
localhost via the cloud library. Or if you're using Kubernetes,
there's a sidecar design pattern built into Kubernetes and you deploy it that way.
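In practice, pointing a cloud library at the sidecar usually comes down to overriding its endpoint. Here is a minimal sketch, assuming the sidecar listens over plain HTTP on localhost; the helper name and the port numbers are made up for illustration. boto3's `endpoint_url` argument is a real client option, but it appears only in a comment so the snippet runs without boto3 installed.

```python
def sidecar_endpoint(port: int, host: str = "localhost") -> str:
    """Build the URL a cloud client should use instead of the real cloud endpoint."""
    return f"http://{host}:{port}"


# Hypothetical ports: one per emulated service.
s3_kwargs = {"endpoint_url": sidecar_endpoint(3450)}
sqs_kwargs = {"endpoint_url": sidecar_endpoint(3460)}

# With boto3 installed, the rest of the application code stays unchanged:
#   s3 = boto3.client("s3", **s3_kwargs)
#   sqs = boto3.client("sqs", **sqs_kwargs)
print(s3_kwargs["endpoint_url"])
```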
Now Cloud Sidecar is running, and it exposes an HTTP router to handle these
API requests, and a configuration drives which
types of APIs are supported and which ports they're on.
So we can have S3, or an S3 interface, exposed
on a specific port that your application connects on. And when
your application connects on that port, it gets routed to the correct handler.
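One way to picture this port-based routing is a small lookup table. This is only a sketch: the `Route` type, the port numbers, and the destination names are illustrative assumptions, not Cloud Sidecar's actual configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    service: str      # API the application believes it is calling
    destination: str  # backend that actually serves the request


# Hypothetical configuration: listening port -> route.
ROUTES = {
    3450: Route("s3", "gcs"),
    3460: Route("sqs", "pubsub"),
    3470: Route("kinesis", "pubsub"),
}


def route_for(port: int) -> Route:
    """Return the handler route configured for a listening port."""
    try:
        return ROUTES[port]
    except KeyError:
        raise ValueError(f"no handler configured on port {port}") from None


print(route_for(3450))
```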
So for example, it'll get routed to the S3
handler on port 3450. Now once your request goes to
the S3 handler, it will convert the request from AWS.
It might just send it directly to S3 on AWS as a pass-through.
Or it might interpret the request, convert it to Google
Cloud Storage, get the response back from Google Cloud Storage, and
send everything back, making it look like it came from S3.
So your application thinks, for all intents and purposes, it's talking to
S3. Or in this example, SQS or Kinesis.
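To make the translation step a little more concrete, here is a small sketch of one piece of it: mapping a path-style S3 object GET onto the equivalent Google Cloud Storage JSON API download URL. The function name is hypothetical and this is not Cloud Sidecar's code; a real handler would also have to translate authentication, headers, and the response format.

```python
from urllib.parse import quote

GCS_API = "https://storage.googleapis.com/storage/v1"


def s3_get_to_gcs_url(path: str) -> str:
    """Map a path-style S3 GET (/bucket/key) to a GCS media download URL."""
    bucket, _, key = path.lstrip("/").partition("/")
    if not bucket or not key:
        raise ValueError(f"expected /bucket/key, got {path!r}")
    # The GCS JSON API expects the object name percent-encoded as one path segment.
    return f"{GCS_API}/b/{bucket}/o/{quote(key, safe='')}?alt=media"


print(s3_get_to_gcs_url("/images/uploads/cat.png"))
```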
Cloud Sidecar is written in Golang, and there are several good reasons
for this. This is a sidecar which runs next to your application.
So you want a very minimal footprint, very small memory and
CPU footprint. And Golang is great for this. It's not like the JVM,
which will eat all the memory on your machine. Go is also a very
simple language, which is great for an open source project where
people can easily dive into it, contribute code,
modify code, et cetera, and there are a lot
of open source libraries that it can use and take advantage of.
And finally, Go is a very performant language. It has
a very nice concurrency system and it also
has a lot of low-level constructs. Now it might not be as performant as
languages like Rust or C++, but because of the simplicity
of the language, Go wins out
over those languages for this project. I'm going to show you a demo application
that will kind of wrap this up into a little more of a
concrete example. So imagine you run an image conversion
website. We've all been there, where you have a PNG
file and for some reason you need a JPG file. So you google PNG
to JPG conversion, find a website, click upload, upload your
PNG, and you get a link to the JPG and you're happy. So you're running
this website and it's actually doing great. It's a great website, makes good
money on ads, but AWS is charging you more
and more and more for their S3 services, and you look into it and
Google Cloud Storage is way cheaper. So you wonder, how can I pick up
my system and move it to Google Cloud? So let's look at the architecture you
have. You have this Scala application using the Play framework,
which is the front-end framework, and users interact
with it. They click upload and it will upload a PNG
to S3 and then it will put a message on SQS
with that S3 URL. Now on the other end you have a Python
worker that's listening to SQS for these messages, converts the
image from PNG to JPG using some open source library,
and then uploads the result into S3, which the UI then displays.
It's actually a great architecture because you can scale up the workers as
much as you want and you could scale up the front end as much as
you want. S3 and SQS can handle a lot of load.
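The flow of that architecture can be sketched in a few lines of Python. Everything here is a stand-in chosen for illustration: a dict plays the role of S3, a deque plays the role of SQS, and `convert_png_to_jpg` is a placeholder rather than a real image conversion.

```python
from collections import deque
from typing import Optional

storage = {}      # stand-in for S3: object key -> bytes
queue = deque()   # stand-in for SQS: messages holding object keys


def handle_upload(name: str, png_bytes: bytes) -> str:
    """Front end: store the PNG, then enqueue its key for the worker."""
    key = f"uploads/{name}.png"
    storage[key] = png_bytes
    queue.append(key)
    return key


def convert_png_to_jpg(png_bytes: bytes) -> bytes:
    # Placeholder: a real worker would call an image library here.
    return b"JPG:" + png_bytes


def run_worker_once() -> Optional[str]:
    """Worker: take one message, convert the image, store the result."""
    if not queue:
        return None
    src_key = queue.popleft()
    dst_key = src_key.replace("uploads/", "converted/").rsplit(".", 1)[0] + ".jpg"
    storage[dst_key] = convert_png_to_jpg(storage[src_key])
    return dst_key


handle_upload("cat", b"fake-png-bytes")
print(run_worker_once())
```

Because the front end and the worker only share the storage and the queue, either side can be scaled independently, which is the property the talk highlights.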
So I've actually prerecorded the demo (this whole
thing is prerecorded), but the demo I'm going to
show you is this image uploading system
using Cloud Sidecar to handle multiple clouds. This is the UI.
I'm going to dive into the Scala code here. It's just a simple controller where
I inject two modules. I created an S3 module and an SQS
module. Now the S3 module uses the standard AWS
Java library and it just tells it to connect
to localhost on a certain port. And now when I interact
with S3, I interact with it the way I always do in Java.
Similarly, here I am using the SQS library
and just connecting to localhost on a different port. Now in the worker I'm using
boto3, similarly for SQS and S3, and I'm just
passing it an endpoint URL with whatever port I want to use.
I'm going to start up the worker, just running one worker because, well, why
scale? So I'm going to show you my S3 bucket. Show you there's
nothing there right now. I'm going to show you my GCS bucket. Nothing there.
Right now there's nothing up my sleeves. I'm not even wearing sleeves, but that's not
important. So now I'm going to upload an image to my application.
Everything's running and Cloud Sidecar is running, pointing to AWS. So what happens
is my Play application is going to upload to S3, the worker picks off
a message from SQS and uploads to S3, and then I'm going
to actually see this in my UI as a resultant image. Now if
we look at the Amazon bucket, we see the JPG and the PNG, the source
and destination, and on Google Cloud we see nothing. So now I'm
going to restart Cloud Sidecar with a different configuration.
This one will point to GCS. Typically you don't even need to
restart Cloud Sidecar, since it dynamically loads configs, but it's
easier for the demo. As you'll note, I didn't actually restart my worker or my
Scala app. So now I'm going to upload a new image. It's going to actually
go to GCS. A message is going to go to Pub/Sub, the worker is
going to read off Pub/Sub, convert it, and upload to GCS. And then we
will see the image here. New image. And as you can
see, the source and destination images are now in GCS
and they're not in S3. Really cool stuff.
Now that you've seen the demo, what are all the features of Cloud Sidecar
and how can you actually use this at your company or your site?
Right now we have two editions, the community edition and the enterprise edition.
Community is completely open source; the enterprise is open core,
but this might all just merge into open source. It's just things
we're playing with. So as I mentioned before, both of
these support any language, any programming language that you're using,
and they both support AWS to GCP conversion as the main thing.
I've been adding GCP to AWS conversion little
by little, but no other clouds yet. For file storage
they both support S3 to GCS, and they also
both support GCS to S3. For queues we support SQS to Pub/Sub,
and for streaming data we support Kinesis to Pub/Sub. And I'll go into what
these queue and streaming data solutions actually are like. Cloud Sidecar also
supports customizable plugins. So imagine if we didn't support SQS.
You could write a plugin to support SQS and just drop it in and
Cloud Sidecar will pick it up. You don't even need to recompile Cloud Sidecar.
We also support customizable middleware. So if you want to write
some sort of logging middleware, metrics, encryption, et cetera,
you could write that really easily and drop it in and configure it. Now in
the enterprise version we support a few more features. We support key value,
which is DynamoDB to Datastore or Bigtable. We support
big data, so we support Redshift to BigQuery SQL conversions.
And also for big data we support customizable stored-procedure-style
plugins. So basically, if you've used a
lot of databases, SQL doesn't always convert to the most optimized
SQL in different dialects. There, you could write your own plugins
that look like stored procedures, call them from queries,
and then have Cloud Sidecar call that plugin, which generates
a different query based on your end destination
so you can make it as optimized as you want. We also support metrics
integration, StatsD and Datadog, because, well, that's the big dog.
For S3 to GCS, we support list v1 and list v2,
the two types of listing, ACL, HEAD, GET, PUT,
multipart upload, delete, multi-delete, and copy. Now,
I've kind of glossed over something: these different services in different
clouds, while they're super similar, have small caveats in how they
differ. And one important one for S3 to GCS is that
S3 supports something called multipart upload, which GCS does
not support. So what's multipart upload? Imagine you have this really large file,
let's say a ten-gigabyte file, and you want to upload it. Now, uploading a
ten-gigabyte file is going to take a long time, and then you might get
interrupted and it's a pain in the bum. So with S3, what you could
do is say, I want to create a big file upload at this destination URL,
and S3 will say, cool, you just need to upload all the parts to
this URL and tell me when you're done. So what you do on your client
side is you split up this ten-gigabyte file into 1 GB chunks, and then you
could upload them in parallel or however. And once they're all
done, you tell S3, I'm done. It merges those chunks
together and puts the result at the destination. GCS does not have multipart
upload, but it has something called combine, which will combine some number
of elements already uploaded to GCS.
So underneath the hood we use that to mimic multipart upload.
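Here is a small sketch of how a multipart upload could be mimicked on top of such a combine operation. It is a simulation under stated assumptions, not Cloud Sidecar's implementation: `FakeBucket` is an in-memory stand-in, and the names are hypothetical. In the GCS API the operation is called compose, and it accepts at most 32 source objects per request, so larger uploads are composed in batches.

```python
from typing import Dict, List

MAX_COMPOSE_SOURCES = 32  # per-request limit of the GCS compose operation


def split_into_parts(data: bytes, part_size: int) -> List[bytes]:
    """Client side: cut a large payload into fixed-size parts."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


class FakeBucket:
    """In-memory stand-in for a GCS bucket (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def upload(self, name: str, data: bytes) -> None:
        self.objects[name] = data

    def compose(self, sources: List[str], destination: str) -> None:
        if not 1 <= len(sources) <= MAX_COMPOSE_SOURCES:
            raise ValueError("compose needs between 1 and 32 sources")
        self.objects[destination] = b"".join(self.objects[s] for s in sources)


def complete_multipart(bucket: FakeBucket, part_names: List[str], destination: str) -> None:
    """Finish an emulated multipart upload by composing parts in batches."""
    bucket.compose(part_names[:MAX_COMPOSE_SOURCES], destination)
    remaining = part_names[MAX_COMPOSE_SOURCES:]
    while remaining:
        # The partial result counts as one source, leaving room for 31 more parts.
        batch = remaining[:MAX_COMPOSE_SOURCES - 1]
        remaining = remaining[MAX_COMPOSE_SOURCES - 1:]
        bucket.compose([destination] + batch, destination)


payload = bytes(range(256)) * 10        # 2,560 bytes
parts = split_into_parts(payload, 64)   # 40 parts, so more than one compose call
bucket = FakeBucket()
names = []
for index, part in enumerate(parts):
    names.append(f"big-file.part{index}")
    bucket.upload(names[-1], part)
complete_multipart(bucket, names, "big-file")
print(bucket.objects["big-file"] == payload)
```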
Now, there are other caveats with the S3 API that I've discovered
along with GCS. I could talk about it at length. I won't
bore you, but if you're ever interested, just hit me up and we could chat
about it. For GCS to S3, which is a beta feature we just recently added,
we support list, ACL, GET, PUT, resumable upload,
combine, delete, and copy. Similar to the S3-to-GCS
incompatibilities, resumable upload
and combine are not natively supported in S3, so we mimic
them using some of the existing functionality in
S3. Great. So going on to queues and streams,
I want to explain the different queue offerings of the clouds and then what
we offer on top of them. So on the far left you have SQS,
the simple queue, which is the simplest one. Basically you create a
queue, you post messages to it, and then workers can listen to the queue and
workers will alternate who actually gets the message. So it's great for
worker pools. On the other end of the spectrum is Kinesis,
where you'll create topics, drop messages into that topic,
and they just flow through. And whoever's consuming these messages,
they keep track of the offset of the message they last read, and then they
just keep reading messages after that offset. Kinesis is a
lot more similar to Kafka, if you're used to Kafka.
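The difference between those two models can be sketched with two tiny in-memory classes. These are simplified illustrations with hypothetical names: real SQS adds visibility timeouts and at-least-once delivery, and real Kinesis organizes records into shards with sequence numbers.

```python
from collections import deque
from typing import List, Optional, Tuple


class WorkQueue:
    """SQS-style: each message is handed to one worker and then removed."""

    def __init__(self) -> None:
        self._messages = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        return self._messages.popleft() if self._messages else None


class Stream:
    """Kinesis-style: messages stay in an ordered log; readers track offsets."""

    def __init__(self) -> None:
        self._log: List[str] = []

    def publish(self, message: str) -> None:
        self._log.append(message)

    def read(self, offset: int, limit: int = 10) -> Tuple[List[str], int]:
        """Return records starting at `offset` plus the offset to resume from."""
        records = self._log[offset:offset + limit]
        return records, offset + len(records)


work = WorkQueue()
stream = Stream()
for msg in ("a", "b", "c"):
    work.send(msg)
    stream.publish(msg)

# Two workers pulling from the queue each get a different message.
print(work.receive(), work.receive())
# A stream consumer reads a batch and remembers where it stopped.
records, next_offset = stream.read(0, limit=2)
print(records, next_offset)
```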
So SQS and Kinesis are both AWS. Somewhere in the middle is Pub/Sub,
which is the GCP offering, the Google Cloud offering. And this is
sort of like RabbitMQ if you've ever used it. You post messages
to the Pub/Sub topic and you can actually create an arbitrary number
of queues attached to the topic, which will start accumulating
messages once they're published to the topic. And then the queues
kind of work like SQS, where you could have workers attached to them and they'll
retrieve messages in some alternating manner. They don't have any
offsets or whatever. So for message queues we support
SQS to Pub/Sub, and the functionality we support is list,
create, purge, delete, send, send batch, receive,
delete message, and delete message batch. For Kinesis
to Pub/Sub we support get records, get shard iterator,
describe, publish, create stream, and delete stream. Now for
NoSQL, which is mostly a document store or key-value store,
we support DynamoDB to Datastore. Just so you know, I'm not
even sure if it's still called Datastore. They might call it Firestore because Datastore
took over Firebase. All this crazy Google Cloud stuff.
But the functionality we support is get item, query,
scan, put item, update item, and delete item. Now,
there are interesting caveats here between DynamoDB and Datastore. DynamoDB has
the concept of query and scan, which both search for
items or filter for items, but use different methods for it.
Datastore does not have that. So these two function calls end up being
the equivalent of one function call in Datastore.
And also DynamoDB has a pretty complicated nested JSON
filtering and updating system. Datastore does
not have such a system. It's a simpler language, so we try
to convert it as best as possible. We're constantly improving
this functionality of Cloud Sidecar. So for big
data, this is kind of an outlier, where all the other systems
I spoke about have a simple, pretty much REST API. Maybe there's
XML if it's S3; that's how old S3 actually is. But for
big data on AWS we use Redshift,
which is actually somewhat related to Postgres and has a
Postgres interface. So for Cloud Sidecar we actually expose
a Postgres interface for big data. So it's
not even a cloud library your application uses; you use however you normally connect to
a database. So it might be JDBC or whatever you use to connect to a
Postgres database, and you point it to connect to Cloud Sidecar
instead of Redshift. Cloud Sidecar of course uses its config to
determine whether you actually want to connect to AWS as a destination or Google
Cloud BigQuery as a destination. If it's Redshift, it just kind
of proxies the queries through to Redshift and proxies the response back
to you. If it's BigQuery, it will consume the SQL,
interpret it, convert it to BigQuery SQL, and then
parse the response and return it as if it were a Postgres
or Redshift response. Now, if you actually use our special
stored procedure plugins, there are special commands in your SQL to
call those stored procedure plugins. And if we see that, we'll call your code and
pass it the destination you're actually trying to target.
And in your code you could say, hey, if this is called with certain
parameters and Redshift is my destination, create a SQL query like
this. But if BigQuery is my destination, create a SQL query like
that. And this is all dynamic and really you could optimize
it as much as you want. So for our general SQL conversion
in big data, we support insert, select, delete,
unload, which is basically to export data, copy,
which is basically to import data, rename table, create table, and drop
table. Great. So I've shown you an arbitrary
example demo of how this could work in the real world and all our functionality,
but who's actually using this? Cloud Sidecar is mostly a side project
of mine, but my day job has been at a company called ActionIQ, or AIQ.
And we're what's known as a CDP, a customer data platform.
So maybe you could interpret it from the picture. What a customer data
platform does is it works with large companies, sucks up
all of a large company's data from various data sources,
internalizes it, makes it usable, makes it so that
this data can actually be built into user audiences, and
then those audiences can be exported into external systems like email,
advertisements, et cetera. So let me give you a concrete example.
ActionIQ has Michael Kors as a customer, and Michael Kors has
data about online orders, online clicks,
online returns, in-store purchases, in-store returns,
et cetera. These might be stored in various data stores. ActionIQ
sucks all that data in. And then we have a really nice user interface
that lets marketers working at Michael Kors build segments
or audiences of customers based on some criteria.
So they might want to find out who paid over $3,000
for shoes last year but has not bought shoes this year. So they
could quickly put that into our system, drag and drop, see a count in really
quick time, and realize that this is a nice audience size to target that probably could
get us some good sales. And then they could send those customers a
coupon via email and then a cool dance video via
TikTok. Because everyone likes dance videos on TikTok.
I don't know. I don't use TikTok. So ActionIQ has typically been
on Amazon, but we started getting approached by, or approaching,
customers that said they do not want to be on Amazon for multiple reasons.
They would prefer to be on Google Cloud. So we undertook
a project to go multicloud with Google Cloud as our second
cloud provider. Of course we did the infrastructure, which I'm not going to get into,
but we were able to leverage my project, Cloud Sidecar,
to make our software work on both Google Cloud
and AWS with very little code change. Since we
are a big data company, we use a lot of the AWS
services to really make it so that we can move data very simply.
So being able to just deploy Cloud Sidecar with our
application meant that we were able to save a lot of time making
code changes, et cetera. And we actually were able to
release multicloud before our deadline,
which is shocking to me because I've never released before a
deadline, and it made our customer very happy. We've been using
Cloud Sidecar in production for many months to years
now. Of course we found bugs and we fixed them, and it's been a big help
in us going multicloud and making that customer successful.
Great. So I've told you everything about Cloud Sidecar. You heard my spiel.
Please try it out. Go to cloudsidecar.com for more information.
We have a link to our GitHub at github.com/cloudsidecar.
Try it, download it, play around with it. You could
contribute to it; it's an open source project. You could create issues,
all that good stuff. Also, if you're interested in the enterprise offering,
if you're like a big company, feel free to reach out to me
through the website or email me at larry@cloudsidecar.com.
I'm also always free to talk about multicloud or anything
really, so hit me up. I'm always around.
Got nothing else going on during the pandemic. If you want to see the image
uploader, that code is real. You could go to GitHub, Lawrence Finn, Cloud
Sidecar Image demo, and if you want, just follow me on Twitter.
I'm at Lawrence Finn. I have three followers. One is my mom and one
is my cat. So I don't know, maybe there's another follower out there. And I'm
also on LinkedIn, if you haven't used that.