Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi folks, thanks for being here for this talk about pragmatic
application migration to the cloud with Quarkus, Kotlin,
Hazelcast and GraalVM. I'm Nicolas Fränkel. I've been
a developer and held other technical roles.
I worked in the backend field, mainly
in Java with Java EE and Spring technologies. And right
now I'm a developer advocate. I work for a company
called Hazelcast. Hazelcast has two products. The first
one is an in-memory data grid, and you can think about
an in-memory data grid as distributed data structures.
So you would have a cluster of nodes and you can shard the
data over several nodes or replicate it. And the other
one is Hazelcast Jet, and this is in-memory stream
processing. Today I will talk to you about
the cloud. There is no denying that today
everybody goes to the cloud. You can think about
it as a sort of gold rush.
And there are good reasons to migrate to the cloud and there are
not so good reasons. Let's talk about the good reasons.
The first reason why you would like to migrate to the cloud is
visibility. That's very important.
Some people mistakenly think,
hey, if we migrate to the cloud, it will be less expensive.
Well, it might be the case, but it probably won't.
But the reason is not about the exact sum.
The fact is, in traditional IT,
you know the cost of acquisition,
you might know the cost of maintenance because
you have contracts. But besides that,
it's very hard to compute exactly the
cost of running a piece
of infrastructure. There are people, and you don't really know
exactly how much time they spend. There are a lot
of factors that can influence it and well
it's very hard to compute.
Before, we talked about the total cost of ownership,
and yeah, there are metrics, there are diagrams,
sheets, whatever, but still it takes a lot of effort
and in the end it's a best guess.
Now, if you migrate to the cloud, it's very simple because you
have your bill and it's broken down into, yeah, you use this
service and you use this much memory and this
much data, this much CPU, and it costs you
that. So transparency, visibility, is the first argument
regarding migrating to the cloud. The second is
flexibility. In traditional IT,
when you buy your own hardware, you must scale
your hardware according to the maximum peak of usage.
So if you are an e-commerce shop,
you will probably buy your hardware and scale it according to
Black Friday, Cyber Monday, this kind of stuff. So very high peak
and during the rest of the year the difference between normal
loads and the peak loads is just waste.
You just bought that hardware for this
peak. The rest of the year, it's wasted.
So the idea is, in the cloud at least you pay for average
when you need average, and you pay much more
when you need much more. So it's very flexible. You can scale
nearly at will. The last argument, one that I
was not really aware of because I worked mainly in
large or at least medium companies, is that if
you are a small team, even a single developer,
and you want to develop a product, well, you will
need to acquire hardware. And again,
this is a big step at the beginning when you have nothing,
when you just want to create your business. So the cloud allows virtually
anybody to start.
If they have wonderful ideas and they have the skills
to implement them, they can start their journey
on day one, or nearly on day one. As I mentioned, those are
the good reasons. The not so good reasons? Well,
doing like everybody else. This is the worst reason of all.
And again, let me restate it.
If you think that by migrating to the cloud you will gain
money, or at least waste less money, that's probably a very
wrong assumption, so please be aware of that.
Now imagine you already have software.
How do you migrate? How do you take this software that was
made for the old world, for on-site hardware, to
the cloud? Well, there are three main paths. The first
is you take it and you move it to the cloud.
That's called lift and shift. The second is,
hey, it's no good, let's rewrite everything. And the
third, which I would propose to you,
or advise you, at least that's the gist
of this talk, is to walk the middle path, that is between
the two. So let's first talk about the lift and shift.
Lift and shift is very easy. I mean, the cloud is
just somebody else's computer. You just say, oh, instead of
deploying on my side, on my hardware, I will deploy
it on other hardware.
And most of the time it's relatively easy,
because every cloud provider provides some
way to run containers. So you
just containerize your application and
you deploy it. And actually it has a high
chance of working as expected.
You will be able to deploy it.
Unfortunately, it might not
be so good in the medium term.
Worst case, it won't run at all. It will be
deployed, but it won't run at all. Because hey, your application
expected some hard-coded paths
or some local resource
that is not available, or is available through another
interface, in the cloud. Best case, it will run because
you thought about everything. But your application was
not designed to run in the cloud. So wasting some
CPU cycles, wasting some memory, that was
not so important when you ran on premises.
Now when you run in the cloud, everything counts.
This waste might cost you a lot.
Actually, perhaps you have already stumbled upon the
Twelve-Factor App site. It
lists twelve principles that
cloud native applications must follow. I won't
go through all of them. Here they are as
a reminder. But I have been a
Java developer, so let's check whether a
standard JVM web application is
compliant with those principles. Well, the second factor says,
hey, you should declare all your dependencies. Your application should
declare all its dependencies. Okay, now if we
have a regular JVM web application, it's probably a WAR
and you expect it to be deployed
on an application server or at least a JSP/servlet
container. But you don't declare any dependency.
Okay, let's remove this issue and say,
now we know how to build self-executable
JARs. They embed the servlet container
itself, so they embed Tomcat and, well,
now it's completely self-isolated.
I don't need to declare any dependency. Everything is self-contained.
And again, it's wrong. The JVM is a
huge dependency and the JAR expects
a JVM with a minimum version and it's
not declared. That's one principle
we don't follow. The second one is configuration. Your application
must be easily configurable. But with
traditional WARs, that's completely
untrue. What we learned is that we
are using an abstraction. For example,
let's say a data source that is available through a
virtual URL, and then we map this URL to
another, real URL. And this is done in every environment.
So you keep your WAR the same, you promote
the artifact, which is good, and then the configuration is
done on every application server in every environment.
That defeats the third principle.
Principle number nine: we must start up fast.
The reason for that is that, hey, let's imagine we
are running on Kubernetes. And Kubernetes is
super great because when a pod starts misbehaving,
well, you just kill the pod and you start a new one.
But if you start a new one, you assume
that this new one will start fast enough. And guess what?
The JVM was not made for that. The JVM
actually starts quite slowly. And when
it has started, at the beginning it's slow,
it has bad performance, it needs some warm-up
time. So another principle that is defeated.
Finally, logging. We must have streaming logs.
In containers, streaming logs means we just
write everything to the console. But again,
servlet containers, not so great. Because your
application will write to a file and the servlet
container will write to another file or to multiple files.
Perhaps your application also writes to different files.
If it's containerized, well, how do we handle
that? There is no single stream of logs that we
can follow. Now we've got even more issues
because now it's not only about the JVM,
but it's about the frameworks that we are using. Spring,
Java EE, whatever, they use a lot of reflection.
And again it's a startup performance hit because
at the beginning it will start to load the classes
through reflection. Not great. And if we are
talking explicitly again about Spring and Java EE,
they will do some classpath scanning. So they will
check through the whole classpath to say, hey,
which class has this annotation? Not great.
So it seems like the JVM is not made for the cloud.
So the idea in that case is, hey, we will rewrite the
application. And as engineers we love to
start from scratch. We love greenfield projects. We don't want to
handle the mess that was made by previous developers,
even if those previous developers were us.
But there are a couple of issues if we want to rewrite the
application. The first is obviously the cost. If you want to rewrite
the application, you must have a nontrivial budget.
So you will go to your manager,
your manager will at some point go to the business and
it will probably go like this: Hey, I need
x million. What for? To rewrite the application.
And what competitive advantage does it bring
us? What edge do we have on the market with that rewrite?
Nothing. It will be a rewrite, feature for
feature. So you can probably imagine what
will be the outcome of this conversation. But imagine
for a second that, yeah, your business
understands the cloud. I mean, you already have
the biggest advantage on the market, you have no competitors,
everything is fine. Let's go a bit into the
detail. When you start rewriting the application,
let's imagine you start in January.
Well it will take some time, it will take months until
you have rewritten the application. And so the
target that you are chasing, the version that was
done in January, is not the same anymore, because the legacy
version of the app has probably had upgrades,
because the business wants to
add more features to the legacy version. So you will actually be
developing toward a moving target, which is
never good. Of course there are risks involved.
I mean, legacy projects might be legacy,
but at least most of the bugs have already
been solved because people have already encountered them
earlier on. With a greenfield project, there will
be bugs for sure, even with the best quality process
and with the biggest test harness in the
world, so not great. And finally, if you are
a team lead, if you are a manager, you must think
about how you will organize your teams. A rewrite means
that we will need additional workforce. So either
we recruit temporarily or we
outsource. But in both cases we need additional
people, and the usual way to do that is that
those new people will maintain
and handle the changes on the legacy application, while your
own workforce will work on the new version.
But there is a high chance that, since the new
people don't know the application that well,
they will need support from the people
who know the application. So there will be a lot of interactions,
of interruptions, and it won't be super
great. So those are four reasons why rewriting the
application might not be such a good idea.
So if lift and shift is not a good idea,
if rewriting the application is not a good idea, we don't
have that much choice. We probably
have a middle path. And the middle path
is actually to reuse the existing code,
especially the annotations from Spring and Java EE
and whatever, but change the way they are used so
the engine that uses them is not
the traditional engine. And before I go
further, let me introduce you to GraalVM.
GraalVM is actually a bag of many features. Here are a
couple of them. So first, GraalVM is a JVM
platform. So instead of using Oracle JDK
or OpenJDK or whatever, you are using the
GraalVM JDK, that's fine. The other thing that
GraalVM brings us is that it's polyglot, so it
can speak multiple languages. You can use
multiple languages in your application, so that for
example you can have a Java application, but at
some point you need to use R because you need to do some statistics.
Well, it's very easy to integrate this R file
into your Java application. At least GraalVM makes it
easy. The reason for that is that there is an underlying
framework called Truffle, which all those languages
have been implemented with. So there is for example
TruffleRuby. And this allows you, for example, to create
your own language as well, or at least an implementation
of a language using Truffle, for easier integration.
But what is of interest in this
migration to the cloud is another feature of GraalVM
called Substrate VM, and this allows you
to create native executables from existing
bytecode, whether JARs or classes,
through an ahead-of-time compilation process.
Of course it has some limitations.
For example, I was talking about reflection.
So reflection is the ability to say, hey,
I don't know which classes will be used at
compile time, I will discover them at runtime, and
it's really a great feature of Java. But that
means that you need to follow the execution
path. If you do some build-time
compilation whereas the class is only available at runtime,
you understand there is a problem because at build time the class won't be
there. Fortunately there are ways to cope with that. You can
provide a configuration file to say, hey, you need to keep this
class and this class. Well, there are ways to cope with that. It's not
fun, but it works. Other limitations include the
lack of a security manager. It's not
cross-platform. So if you want to have an executable
for, let's say, macOS, you need to build on macOS, and
for Windows, you need to build on Windows, and so on
and so forth. But at least it's something. So let's
have a small recap. On one side we have the JVM,
and on the other side we have native executables.
JVM memory consumption is high, native executable
not so high. JVM startup time is long,
and even more so considering that the
performance at the beginning is not great. Native,
you don't care, it's quite fast. On the other hand, with the
JVM, you can write your program once and run
it everywhere there is a JVM, which with a native executable
you obviously cannot do. And also there is a reason why
the JVM has always had very good performance
past this warm-up time: it can adapt
the native code that it compiles to the workload.
So during this warm-up time it will
analyze the workload and it will create
the best native code that is possible regarding this
workload. And for this reason the JVM
was always at least on par with native executables,
or C and C++ programs, in the past. The native
executable is statically compiled,
so you must know at build time
about the workload to use the best
parameters possible for the compilation. But in the cloud
the native advantages are pretty good ones, and the
benefits of the JVM are not so huge. And so there is
now a generation of cloud native frameworks such
as Micronaut or Quarkus. I don't know much about
Helidon, but it seems to be part of the lot.
And there is Spring there because, though Spring
was not designed in the cloud native way, because when it was designed
there was no cloud, now there are ways to leverage GraalVM
for Spring, but I won't talk about Spring any more in
this session. And all those frameworks basically have
the same approach, they all use GraalVM.
So in the end you have a native executable, and they
handle reflection in another way. For example,
Micronaut creates dedicated
classes at build time, at compile time, and
eschews the traditional reflection.
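To make the contrast between runtime reflection and build-time generated classes concrete, here is a minimal Java sketch. It is not taken from the talk or from any framework: Greeter, EnglishGreeter and the hand-written registry are hypothetical names, and the registry only imitates the kind of class a framework might generate at build time.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical service type, used only for this illustration.
interface Greeter {
    String greet(String name);
}

final class EnglishGreeter implements Greeter {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}

final class ReflectionVersusGenerated {

    // Traditional approach: the class is looked up by name at runtime.
    // An ahead-of-time compiler cannot see this dependency from the code alone.
    static Greeter viaReflection(String className) {
        try {
            return (Greeter) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    // Build-time approach (assumption: written by hand here, standing in for
    // generated code): every class is referenced statically, so no reflection
    // is needed and the compiler sees all the types involved.
    private static final Map<String, Supplier<Greeter>> GENERATED_REGISTRY =
            Map.of("EnglishGreeter", EnglishGreeter::new);

    static Greeter viaGeneratedRegistry(String name) {
        return GENERATED_REGISTRY.get(name).get();
    }

    public static void main(String[] args) {
        Greeter reflective = viaReflection(EnglishGreeter.class.getName());
        Greeter generated = viaGeneratedRegistry("EnglishGreeter");
        System.out.println(reflective.greet("JVM"));
        System.out.println(generated.greet("native"));
    }
}
```

Both calls return an equivalent object. The difference is that only the second one is fully visible to a build-time analysis without extra configuration.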
So now let's have a use case.
Imagine I want to have a URL shortener. The traditional approach
is hey, you have this space of all URLs
and you have a small space of
all possible short URLs, and you
need to have a projection and you need to handle the collisions.
Great. I'm not a mathematician, so I prefer to
use an alternative: I will generate
random short links for a URL and then I will
store the mapping between the long URL and the short
one, and also the opposite, from the short URL
to the long one. So the trade-off is, instead of CPU
time, I will trade storage. For the demo, I will use the following
stack. I will have a legacy Java EE application, or
now it's called Jakarta EE, but my application is Java
EE because it's legacy. I still use Kotlin to write
it, but it's not necessary. Any JVM language will do,
Java, Scala, whatever. And I will
be using JAX-RS because it uses
annotations, and I will be storing the data in
Hazelcast IMDG. So my initial state
is the following. I have the JVM that
runs Tomcat that runs my WAR, and
when I have a request coming in, Tomcat does
its magic. Probably it uses the Catalina JAR
that itself relies on the Servlet JAR and the JAX-RS JAR,
and it knows which servlet it needs
to call, so it calls the servlet.
The servlet itself, well, wants to store or
to read data from Hazelcast and it uses a dedicated
JAR, the Hazelcast client JAR. That's the
initial state. Now I want to migrate to cloud native
and I will use, let's say, Quarkus. So the to-be
state: in development I will still keep the JVM because
the JVM has nice features, like you can debug,
you can set breakpoints, this kind of stuff.
We like it, and in development it's not an issue.
So we still have the JVM, but instead of
having Tomcat, we
just replace everything with Quarkus. So there are the same
capabilities but implemented by Quarkus, so
there is no Catalina JAR, there is a Quarkus something, and there
is a Quarkus Hazelcast client. But in the same way,
when a new HTTP request comes in,
Quarkus will redirect the request to
our servlet. But the best stuff happens when
you want to deploy in a container.
Now we have the same mechanism underneath.
I mean, I infer it's the same mechanism, at least it works the same,
but now we have a single process and it's
a native binary. I've talked a lot,
now it's time for some demo.
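The random short link approach described in the use case can be sketched as follows. This is a minimal illustration and not the demo's code: two in-memory maps stand in for the maps that the talk stores in Hazelcast, and the class name, the alphabet and the key length of 7 are assumptions made for the example.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a random-key URL shortener with a two-way mapping.
final class UrlShortenerSketch {

    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final int KEY_LENGTH = 7; // assumed length, not from the talk

    private final SecureRandom random = new SecureRandom();
    // Local maps standing in for distributed maps.
    private final Map<String, String> longToShort = new ConcurrentHashMap<>();
    private final Map<String, String> shortToLong = new ConcurrentHashMap<>();

    /** Returns the existing short key for the URL, or generates and stores a new one. */
    String shorten(String longUrl) {
        return longToShort.computeIfAbsent(longUrl, url -> {
            String key;
            do {
                key = randomKey();
                // Retry in the unlikely case the random key is already taken.
            } while (shortToLong.putIfAbsent(key, url) != null);
            return key;
        });
    }

    /** Returns the long URL for a short key, or null if the key is unknown. */
    String resolve(String shortKey) {
        return shortToLong.get(shortKey);
    }

    private String randomKey() {
        StringBuilder sb = new StringBuilder(KEY_LENGTH);
        for (int i = 0; i < KEY_LENGTH; i++) {
            sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        UrlShortenerSketch shortener = new UrlShortenerSketch();
        String key = shortener.shorten("https://example.com/foobar");
        System.out.println(key + " -> " + shortener.resolve(key));
    }
}
```

Storing both directions costs extra memory, which is the storage-for-CPU trade-off mentioned above: no hashing scheme is needed, and both lookups are plain map reads.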
So this is the project and I must admit I cheated
a bit. I didn't start from a
legacy application, I directly created the
project using the Quarkus Maven
goals so that everything has been
made ready for me, because, yeah, you need to write a couple
of properties and there are dependencies and plugins.
You can achieve the same by hand; of course it will be more time-consuming
to do the same yourself, but
anyway this is just to save some time. And here
you can see everything has been configured.
I mean I can already use it right now
and I have this REST API,
and so you can see JAX-RS annotations
here, and you can see this is Kotlin as well,
and here more JAX-RS, so here I will respond to
POST, here I have Path, and here
I have Produces to tell what it returns.
I don't want to delve too much into the
code, but just as it is now I can start
a Hazelcast instance and
I can start the application as well, and
here, in development, I'm running inside the JVM;
it will compile and after
a few seconds it will run the application.
So it builds the application and it
runs it. Yeah, it takes a bit of time. My machine needs
to wake up as well, so I will
prepare the curl. Ah, it has started. So
I want to store a new URL, let's say this one,
foobar. I will just have
the terminal. So
it has contacted Hazelcast, it has stored this into Hazelcast,
and I want now to do the opposite.
I want to get the long URL from the
short one, so I will just curl this one and
it returns me foobar. So everything works as
expected. I'm super happy now. The idea is
I want to build that for the cloud.
When I scaffolded this project,
Quarkus created two Dockerfiles for me.
One is for Docker native and
one is for the Docker JVM. So here is the Docker
native one. As you can see it's pretty easy,
you just need to follow the instructions: mvn package. So
here it takes a long, long time. This is actually
where the magic happens. If we can
have a look at the size of
those files here
you can see that it's a bit big
and once it has been done you
can build the Docker
image. You won't do it explicitly,
it will do it for you. But this is very
fast here. And once this is done,
I've created a Docker Compose file.
The Docker Compose file is very easy. It has one Hazelcast node
and our application. And now if
I Docker Compose it, docker compose
up, I will be using this new experimental feature because
I like to try new stuff. So now
it's on and I can again try
to do some curl. So I will curl to
create a new shortened URL and
now since
it's randomly generated, it's a new one.
And now I can curl to check that
everything works. And now this is
only a native executable that runs underneath
and as you can see, it's a seamless experience.
It's the same experience. The demo is
done, now we need to do some recap.
So my advice would be hey, between rewriting
everything and between lift and shift, just walk the
middle path. When possible, reuse your existing
code. It took you a lot of time to write
it, to maintain it, to test it. Reuse it as much
as possible, but leverage different
frameworks, cloud native frameworks that know how
to make the best use of it, and think
about return on investment. Thanks for your
attention. You can read my blog, you can follow me on Twitter,
you can read more about the Quarkus and Hazelcast integration.
More interestingly, you can also check the Git repository.
So the demo that I've shown you is publicly available.
And if you got interested in Hazelcast,
please join our Slack. Thanks a lot and
have a good day.