Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, welcome to my talk on how to close the
developer experience gap of kubernetes.
My name is Edidiong Asikpo and I currently work as
a senior developer advocate at Ambassador Labs. I'm also
a CNCF ambassador and technical content creator.
You'd oftentimes find me building, writing, and sharing
knowledge with people in the developer community. I
go by Didicodes on all my social media platforms except LinkedIn,
which is my first and last name, Edidiong Asikpo.
So if you think about it in a general sense, right,
the adoption of kubernetes and containerization has solved many
challenges that businesses face today when it comes to
flexibility, scaling, and the reliability of
the release of new versions. This has motivated several
people to adopt cloud native technologies because they also wanted
to enjoy these benefits. Right? And that's why you've seen that so
many companies have transitioned from monoliths to microservices,
and many more are still in conversations about making that transition as
well. So even though Kubernetes enables you
to achieve all these amazing things, what nobody really tells
you is that it significantly impacts the developer experience we
once knew about. So I'm sure you probably know the meaning
of developer experience, but let me just quickly remind you
about what that means and also lead up to the next point in this
conversation. So developer experience is the workflow
a developer uses to develop, test,
deploy and release software. So it's pretty much what happens from the
time they start writing code to when they push their code to
production. This developer experience consists of two
types, which is the inner development loop and the outer
development loop. So the inner development
loop is where the developer pretty much does the build push
test cycle: they're writing the code, building it
to confirm that the code works as expected,
and then of course testing it to finalize that confirmation or
test process. And once they feel satisfied
with this process, they push it to a version control system, like GitHub for
instance. So the moment this push is made,
that's what automatically triggers the outer dev
loop. So the outer dev loop is everything that happens leading up
to when it's being pushed to production. This could be things like code
merges, canary releases, deployments, and all
of that other interesting stuff. So the
adoption of cloud native technologies has altered this developer
experience in two ways. One is that developers
now have to perform extra steps in the inner development loop,
and secondly, developers now have to be more involved
in the outer dev loop, even though most of their time is spent on the
inner dev loop, where they are actually writing the code and testing
the impact of their code changes. They now have to be concerned about
canary releases, deployments, code merges, all of that
other stuff. And even though this comes with certain
benefits, it also has its disadvantages.
For this talk, we're going to focus on the inner dev loop,
because that's actually where the debugging and development work happens,
right? Once you make the inner dev loop
as fast as possible, it indirectly improves every
other part of the developer experience, because it enables you to ship
products to your end users a lot faster. So here's what
a traditional inner dev loop looks like, right? Let's assume
that the developer in this case spends 6 hours per day
writing code, and the inner dev loop is
about five minutes long. This means that they'll spend three
minutes coding, one minute building and reloading,
right? The next minute is used to inspect that the code changes
that they have made are working as expected, and say
ten to 20 seconds committing that code change to the version control
system. And if you break
it down, you realize that with this formula we've
created, that developer would be able to make at least 70 iterations
of their code per day. And the only developer tax that they
will pay here is the commit time, which is actually
negligible because it just takes ten to 20 seconds
or less, depending on how detailed you want that
commit message to be. Right? But then here's what the
inner dev loop of a containerized system looks like when you
start adopting kubernetes and other cloud native technologies.
Yes, coding still remains the same, but then after you've
made that code change, you now have to wait for your code
to be containerized, pushed to a registry, and deployed
to a Kubernetes cluster before you can see the impact of your code change.
And then you realize that this automatically reduces the
number of iterations from 70 to 40,
right? And then the developer tax being
paid here is in this build push test cycle,
which as you can see, is longer than
the traditional inner dev loop, where you just spend
like ten to 20 seconds committing. And oftentimes
people would just neglect this and be like, oh, fine, it's great, let me go
grab a cup of coffee. Let me quickly watch like
a Netflix episode while my code is containerizing.
But then you realize that in the long run, all of those minutes that
you're waiting for the code to containerize or be pushed to a registry or be
deployed into a Kubernetes cluster could have actually been used
to do the most important things that developers are expected to
do, which is writing code, seeing the impact of their code changes and
pushing it to production as soon as possible. So with a better
inner dev loop, it means that you'll be able to move faster. And I will
continue to explain that as we go further in this talk.
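The iteration numbers in this section can be sketched as quick arithmetic. In the snippet below, the six hours of coding time and the per-step durations are the talk's illustrative figures; the four extra minutes assumed for containerize, push, and deploy is my own guess chosen to land near the talk's 40-iteration figure.

```python
# Back-of-the-envelope math for the two inner dev loops described above.
# All durations are the talk's illustrative figures, not measurements.

CODING_SECONDS_PER_DAY = 6 * 60 * 60  # 6 hours of focused coding time

def iterations_per_day(loop_seconds: int) -> int:
    """How many complete inner-dev-loop cycles fit in one day."""
    return CODING_SECONDS_PER_DAY // loop_seconds

# Traditional loop: 3 min coding + 1 min build/reload
# + 1 min inspecting + ~20 s committing.
traditional = iterations_per_day(3 * 60 + 60 + 60 + 20)

# Containerized loop: the same steps, plus containerize, push to a
# registry, and deploy -- assumed here to add about 4 minutes.
containerized = iterations_per_day(3 * 60 + 60 + 60 + 20 + 4 * 60)

print(traditional)    # 67 -- roughly the "at least 70" from the talk
print(containerized)  # 38 -- roughly the talk's 40
```

The exact numbers matter less than the ratio: every minute added to the loop is a tax paid on every single iteration, all day.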
So a slow inner dev loop impacts everyone.
For instance, front end developers now have to wait for
previews of backend changes on a shared dev environment
or rely on mock databases, mock API
scripts, those kinds of things when coding the application locally.
Backend developers, on the other hand, now have to wait for
CI CD to build and deploy their apps to a target environment
to verify that their code works correctly. And this doesn't
just affect front end and back end developers, it affects everybody
at large because it slows down the releases into production,
which thereby impacts the business because you're not moving as fast as possible,
and the end users, because they may be stuck with a bug that cannot
be fixed immediately due to the slow inner dev loop that the
developers are currently experiencing in the company. So this bad
developer experience doesn't just affect the developer experience,
but the user experience and the company.
So is there a way out? Is there a way to
actually enjoy all of the benefits that Kubernetes has to offer without
actually being slowed down or impacting
your developer experience? The answer is yes,
thankfully. And there are a couple of ways to do this,
right? The first one here is where you get to run
everything locally, right? And the good thing about this development environment
is that you still get to enjoy all the benefits of local development.
You can set breakpoints, enable hot reloading and
even see logs a lot faster. And because everything is running
locally, it means that you'd have a faster inner dev loop,
right? Because as soon as you make a code change, you can quickly test
it against its dependencies. Another great thing about this
is that it's also relatively cheap, right? Because you
don't have to spend money on Kubernetes clusters. I mean,
we all know how expensive that can be. But then it
has a really high maintenance cost, right? You'd always
have to confirm that the mock API scripts are up to date
whenever you make a code change. And think of a
situation where there are several developers in your company all
making changes as they work;
it can be really tough to ensure that the mock API
script is up to date. And that's why you oftentimes see companies who use
this method push things to production and realize that
there is a mistake that they missed out on because their mock API
script wasn't as up to date as it should have been. And then because
everything is running locally, it also makes your workstation
really hot. At some point you'd have to move away from
this because there's only so much that your laptop can actually handle.
So even though this method has several benefits, like a fast feedback
loop, being cheap, and access to local development
tools, the high maintenance and hot workstation aren't
sustainable. So the other option here is
to now try remote development, right, where everything runs
remotely. And because everything runs remotely,
you now have a normal workstation, right? There is no
heat, your laptop is not heating up, it's not becoming
too hot, you can use it as expected. And the maintenance
is also really low, because you can set up CI CD
systems to ensure that every single time someone makes a code
change, everything is updated. And whenever another developer wants
to test the impact of their code changes, they are using the
most up-to-date API script or database or whatever
dependencies they are testing against.
But then the cost is really high, right? Every developer
would have to use their own remote development cluster,
which can be quite expensive. And then the inner
dev loop, or the feedback loop in this case, is extremely
slow, because every single time a developer makes a
code change locally and wants to test the impact of that code change,
they need to containerize it, push it to a registry like Docker
hub for instance, and then deploy it into the remote
Kubernetes cluster. So doing this all over again every single
time you want to make a code change slows you down and makes you
less productive. So instead of doing like 70 iterations
of your code per day, you end up doing only 40 iterations per day.
So even though this development environment has great
benefits, like a normal workstation temperature and
low maintenance, the cost is very high,
it has a slow feedback loop, and you get to lose out on all
the many benefits you enjoyed from local developments like
debugging breakpoints, hot reloading and all those other
interesting stuff. So you'd agree
with me that these two different development methods have their own
benefits. Local development environment has several pros
and remote development environment also has its own pros.
So how can we combine these two things
together and create a development environment where
you get to enjoy the benefits of local development and the
benefits of remote development? This is where telepresence comes
in and enables you to achieve this.
So this creates a development environment we call remocal,
which merges remote and local and gives you the best of
both worlds. So now your cost would be low because even
though things are running in the remote Kubernetes cluster,
development teams can now use shared clusters. So it
means you cut down the amount of money you had to spend paying for clusters
for each developer. And the maintenance here is very low because you
can still use your CI CD systems to automate and update
your API scripts, database and all other dependencies
whenever a code change is made. And then the temperature is normal,
because the only thing you have to run locally is the
service you're actually making changes to, while every other thing would
run remotely. And then this gives you a fast feedback loop, because
you no longer have to do the build push test cycle, right?
All you need to do is run telepresence intercept,
and you'll instantly be able to reroute the traffic going to
the service in the cluster to the service on your local machine.
That way you can test how this service would work with its dependencies
in the remote Kubernetes cluster. So what exactly is
telepresence, you might ask? Telepresence is a CNCF
tool that enables teams to test and debug on
Kubernetes a lot faster and in a seamless
process. It does this by connecting your local machine to
a cluster via a two way proxy mechanism,
which enables you to access cluster resources as
if they were running locally, and reroute cluster traffic
to your local service. There are two ways to intercept
traffic with telepresence. First, one is called global intercept,
while the other is called personal intercept. So what global intercept does
is that it intercepts the traffic that was intended for
a service in the remote cluster to a service running on your local
machine. All of the traffic, right? But personal intercepts, on the other
hand, only intercept a subset of the traffic.
And this is vital because there are certain times where different
developers are working on the same Kubernetes cluster,
right? And you don't want to make your debugging or testing
to affect or impact the work that the other developer is doing.
So in this case, you'd only send a subset of the traffic to just
your laptop, while every other request coming to that
service in the cluster would go there as intended.
So here is an architecture diagram of how telepresence
works, right? So let's say you have a service called service
A prime running locally on your computer, and another service
called service A running in the cluster. So whenever
a request comes in through the ingress, it's going to hit the
sidecar agent, which has been added by telepresence.
And once it hits this sidecar agent,
it's going to direct all the traffic coming
here to the traffic manager, which would then reroute
it to your laptop. And of course that's in the case of a global
intercept. But if this was a personal intercept,
once the request comes in through the ingress and hits the
sidecar agent, it's going to check and say, hey,
does this request have the HTTP header that was
set when I ran the telepresence intercept command? If it does,
it sends it to the traffic manager, which then sends it to your laptop.
But if it doesn't, it goes to service A as expected.
So this is in a scenario where your work doesn't impact other
developers. Assuming another developer is still using service
A to do some testing, I'm not going to impact the
traffic coming to that service in the cluster; just the subset of
it that has my header will come to my laptop so I can do my debugging and
my testing and have a fast developer experience without
impacting what my colleagues are doing as well.
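The global-versus-personal routing decision described above can be sketched in a few lines. This is only a conceptual model of the sidecar agent's choice; the function name and the header key are my own illustration, not Telepresence internals.

```python
# Conceptual sketch of the sidecar agent's routing decision for an
# intercepted service. Names and the header key are illustrative only;
# this is not Telepresence's actual implementation.

def route_request(headers, intercept_kind, intercept_header=None):
    """Return 'laptop' if the request should be rerouted to the
    developer's local service, or 'cluster' for service A as usual."""
    if intercept_kind == "global":
        # Global intercept: every request goes to the laptop.
        return "laptop"
    if intercept_kind == "personal" and intercept_header is not None:
        key, value = intercept_header
        # Personal intercept: only requests carrying the header that
        # was set when `telepresence intercept` was run get rerouted.
        if headers.get(key) == value:
            return "laptop"
    # Everything else flows to the in-cluster service as intended.
    return "cluster"

# A teammate's plain request stays in the cluster...
print(route_request({}, "personal", ("x-intercept-id", "dd")))  # cluster
# ...while a request tagged with my header is rerouted to my laptop.
print(route_request({"x-intercept-id": "dd"}, "personal",
                    ("x-intercept-id", "dd")))  # laptop
```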
Let me show you a demo of how telepresence works.
The first thing I'm going to do here is run the telepresence Connect
command. This is going to put my local machine in the cluster
and enable me to speak to cluster resources as if I was
another resource in the cluster. Let's start this out by accessing
one of the services running in the cluster. For instance, the very
large Java service. In this case, I don't have
to put in the ip address, I can just put in the DNS name because
telepresence has merged my local ip routing table
and DNS resolution with the cluster, making it
possible for me to connect to it using the cluster's DNS name.
I'm now speaking to the very large Java service like I'm inside
the cluster without having to proxy in or do
any other complex configuration, thanks to the two way proxy
mechanism that telepresence has set up between my local machine
and the cluster. Let's talk a bit about this demo.
This demo is called Edgey Corp, right? And the demo has a number of
services. The data processing service is the one I
own and am actively developing, right? While the very large
Java service is too large for me to run locally. It's owned by another team
and I also do not have access to its code.
The very large data store, on the other hand, has all the critical scenarios,
right? All the critical test scenarios, and it's also too large
for me to run locally and dates back to the creation of the Edgey Corp application
many years ago. So without lots of configuration
or wasted time, telepresence connect enables me to access and interact
with the very large data store and the very large Java
service without running them locally. I can instantly access
these services in the cluster. So aside from being
able to access cluster resources as if they're running locally
using telepresence connect, we're also going to run the
telepresence intercept command. Like I mentioned earlier,
there are two types, which are the global intercept and the personal intercept.
So if you look at the Edgey Corp web app now, right, you see that the
UI color here is set to green, and you also see that the data processing
service is also set to green. And there's no other information here apart from
data processing service, right? So I have this data
processing service, like I have a local version running
on my computer. I've not started running here, but I'm going to do that now.
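The transcript doesn't show the contents of that local app, so as a rough stand-in, here is a minimal sketch of a service whose /color endpoint answers "blue" with a 200 on port 3000. The real Edgey Corp data processing service is more involved than this; the names and structure here are my guess, not the actual demo code.

```python
# Minimal stand-in for the demo's local app.py: a tiny HTTP server whose
# /color endpoint returns "blue" with a 200, on port 3000. The real
# Edgey Corp data processing service is more involved; this is a sketch.
from http.server import BaseHTTPRequestHandler, HTTPServer

COLOR = "blue"  # change to "orange" and restart to see the demo's color swap

class ColorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/color":
            body = COLOR.encode()
            self.send_response(200)  # the 200 seen in the demo
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # keep the terminal quiet during the demo

def serve(port=3000):
    # `python3 app.py` would call this; then `curl localhost:3000/color`
    HTTPServer(("localhost", port), ColorHandler).serve_forever()
```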
So I'm just going to go, I've already navigated
to the directory, so if I type python3
app.py, it should start up the
application server on localhost
3000. All right, awesome. So if I try to call
localhost:3000/color, you'd see that it returns blue and
then the call also went through successfully. That's why you can see the 200
here. So now what I'm going to do is I'm going
to intercept the data processing service in the cluster and
reroute all the traffic going to this service to the data processing
service running on my local machine. To do this, I'll run
the telepresence intercept command. So: telepresence intercept,
then the name of the service, the data processing service in this case, and then
the port, which, as you can see here, is 3000.
So if I do this, it's going to create that intercept for me.
You can see that intercept's name; the data processing service
is intercepting all of the requests. So if I go to
the application again and reload, you see that instantly,
it's now accessing the service on my local machine.
And if you come here, you'd also see that that call has also gone through
successfully. Right. You can also notice that the color has changed
from green to blue and that the content of this
data processing box is no longer the same, showing you that it's
now accessing that local data
processing service here. Right? So now
you can see that I was able to move all of the traffic intended for
the data processing service in the cluster to the data processing service
running on my local machine. So what I want to do now here
is create a personal intercept. To do that I'm going to
leave this existing global intercept and
I'll do that by running telepresence leave and the name of
the intercept, which in this case is data processing service.
And that has left. So if I go back here and try to reload this page,
you see that it's back to sending
traffic to the service in the
cluster. Next, I'm going to run the telepresence intercept
command again, this time as a personal intercept. So: telepresence intercept,
I'm going to add the HTTP header flag, and
then I'm going to pass the port, which is
3000, and add the name of the service I want to intercept,
which is data processing service. So this should
create that personal intercept for us. All right,
so what we do here is copy this key
value pair. If we go back to our browser and try to reload this page,
you see that it's still going to show the green color, because
unlike when we ran the global intercept,
it's not rerouting all of the traffic; it's only rerouting the subset of
the traffic that has the HTTP header.
So if I go to my ModHeader extension and paste that in.
I already had this set up. In your case you'd just click
on this plus sign and add the key and value
here. Since I already have it, I'm just going to click on check and
then I'm going to reload this browser.
And as you can see here, I've instantly been able to reroute
only the traffic with that specific header,
not every single request coming in, right? So we can combine these
two things together and be able to instantly get
the feedback loop when we make a code change. So for instance, let's say
I want to change that
color from blue to say orange for instance. Let me
go to data processing service and change this from blue
to orange. So I'll save this. And you see that as
soon as I save that, of course the server gets restarted because a change
has been made. And if I come back and reload this, you'd see that
the color has also changed automatically. So instead
of going through the build push test cycle like you normally would, or having
to run all of your services locally, or even having to set
up port forwarding, you can use telepresence to do all these
amazing things instantly. So that's like the beauty of it.
And then if you think about it, the faster you can
move through all of this process, the better your developer experience becomes.
Instead of having to wait for minutes, sometimes even
hours, depending on how fast your Internet connection is
or how powerful your laptop is, for your service to be containerized,
you can use telepresence to speed up that feedback loop and
get a better developer experience, which would in turn help
not just you, but your entire company and end users at large.
Awesome. Now that you've seen how telepresence works, here's a
quick example of a company that utilized telepresence
and I'm going to explain their before and their after. Without telepresence,
they didn't have a great developer experience. But according to them,
after they started using telepresence, their developer experience improved
drastically. Before telepresence they had to bear the
operational and resource burden of running all their microservices
locally, but with telepresence that was completely removed.
They only had to run the service they were updating locally and
every other thing was running in the remote Kubernetes cluster and
they could instantly see the feedback of their code changes.
And then they moved from not being able to utilize
both the benefits of local and remote development to
being able to have the best of both worlds. So they could
use local development tools for debugging, hot
reloading, and seeing logs faster. But then their laptops didn't have
to get stressed or become too hot and all of that, because
most of their dependent services were still running in the
remote Kubernetes cluster. Also, they moved from having
to code, build the container, push it to a registry,
deploy and wait before being able to test the impact of their code
changes to just coding, intercepting and
immediately testing and seeing the impact of their code changes.
So, wrapping up: Kubernetes development teams
actually need to have a developer experience that
allows them to focus on things that matter, which is coding,
testing, iterating instead of focusing on things that do
not matter, like waiting for the build push test cycle to be completed
or discovering that you have a bug in production because
you missed it due to having a testing environment
that is unrealistic compared to the actual environment in production.
So once this is done, it would significantly increase
the productivity of the developers on the team and the number of
updates being shipped to production. Telepresence gives you that all round
developer experience as it bridges the gap between
local development and clusters, giving you the
best of both worlds to be able to utilize all of the interesting
things you love about local development and remote development.
So if you'd love to try telepresence, you can
do so by visiting this link. We currently have a 30 day
free trial, so you get to try it for your development,
see how it improves your development workflow and your developer experience,
and of course, invite members of your team to join in as well.
Thank you so much for joining this talk and listening till the very end.
If you have any questions, feel free to join our community
Slack or send me
a DM on Twitter via Didicodes. Thank you.