Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, and thank you all for watching my session on debugging
Schrödinger's app at Conf42. So amazing to be
here, and I look forward to geeking out with you all today. My name is
Developer Steve. I'm the senior developer advocate at Lumigo.
I'm going to talk more about my background in a moment,
but first, some housekeeping. First of all, hello from the past. Hope you are all well.
Also, as we go through the presentation, please drop any
comments, questions and emojis into the chat, if there is one,
or reach out to me on social media. I always love connecting
with folks and geeking out.
Yeah. With that in mind, I do have
one thing to cover before we do get underway. And that is I have a
disclaimer and it has an asterisk, which makes it even more of a disclaimer.
But the disclaimer is I love tech jokes and
I have a whole bunch of them. There's a couple that will be coming up
during this presentation. I always love opening with them though, because it's
important that we all keep smiling. This is an open source joke,
so please make sure you share it amongst everyone. But how do
fallen trees check for errors? (Thinking music.) Via log files! There we go. I didn't say they were
good jokes, but all the same, please share them amongst
the community, and with anyone you may know that just needs a smile, well,
anytime at all. Don't even need a reason. Anyway. Hello again,
my name is Developer Steve. I'm the senior developer advocate at
Lumigo. I've been a developer advocate for many, many years now,
and I've been writing code three times as long as that. No, twice as long as that. Yeah,
I'll have to do my math. Anyway, for a long time, let's say.
But funny story, I've been doing developer advocacy. Well, that's not a funny
story, but it kind of is, because I have a whole bunch of tech jokes. Anyway,
I've been a developer advocate for many, many years now, and as such
I've been fortunate enough to connect with many tech
communities throughout the world. And one of my
favorite things is just being able to geek out and learn new things
and share what I know and just help the communities do
awesome things and all the amazing work that they do. I've done loads and loads of
events through that as well, and over the years people started calling me Developer Steve.
It's my social media handle; I've been using it on social media for quite some
time now. So much so that when I got married in 2018
and we combined our surnames, Coochin being my married name, I thought,
everyone's just been calling me Developer Steve anyway, I wonder if I can legally
change my name to that. Turns out you can, and I did, and it's now
my middle name. So, yeah, there you go. That was
the funny story. But like I said, I've been
coding for many, many years now, back in the days of QBasic, for those
that remember it, on an Atari 800 XL.
I taught myself QBasic, which, well, as the name suggests,
was fairly basic. But being relatively new to code
myself, it did take some time to understand the fundamentals
of how the language worked and then what could be done with it.
Just as an FYI, for those that haven't encountered it before, you needed a lot of
BASIC to do, well, anything basic, which, well, I guess it's called that for a
reason, right? Humble beginnings of the industry, and also of my developersteve Coochin story.
But over the years, I've then gone on to work through a
number of digital agencies, which I always loved, because it is literally
organized chaos. You have to take all sorts of requirements in zero time,
build out all sorts of applications in zero time as well,
and then support them beyond deployment. Whole other
story and whole other talk right there. Shout out to anyone that is in the digital
agency world, because it literally is organized chaos
sometimes. And hopefully that is not you.
But one thing I
took away from many projects is that scalability is
fundamental in the very early stages of any application.
You might be building that application for only ten users
now, but you have to build small now with big ideas for later.
By that I mean you have to build out your application so that it can scale
as adoption grows and as the app grows out.
If you think about it, that application you're deploying now,
or even that project you're starting now with that idea, or that very early
client-stage project, has to be built for scale as the
application's requirements grow and as its user base grows as well.
That, in a sense, is tech debt in a nutshell right there, because someone will
pick that application up later, either as a new dev or when you go back to a
project you've already started building. Building with that scalability
in mind is fundamental to the application's longevity,
and to avoiding that whole tech debt. Or, as I like to think of it,
future-proofing for your future self. Now, the flip side of this is,
and we've all been here too, is dealing with what I call the
Friday night rule. The Friday night rule is something
that came about from all the hackathons I've done over the years, with
developers and teams asking, which language should we build
this particular function or this particular idea in? For me,
going back to the digital agency days in particular, the Friday night rule
is this: you might be halfway through a particular online game you're playing,
and all of a sudden you get that
alert saying the application's down, or there's an
issue, there's a problem with that deployed thing that needs to
be fixed ASAP. Being able to avoid that
is, well, the fundamental goal when it comes to deploying
applications and keeping them stable, and more importantly, keeping our end
users happy as well. Just going back to
the Friday night rule, too, and which language
should I use: I've always loved Python for many reasons, but in
particular its versatility and heavy lifting. By that I mean
I've used it previously and it's fundamentally great for
these things and a whole bunch more: game development, application development,
and of course being able to build things out super quickly on
a very robust and mature ecosystem of framework
tools and libraries, et cetera. Shout out to all the folks that help maintain
and contribute to those too. And please contribute back wherever you can,
because contributing back keeps all our applications happy and,
well, our end users happy too. That's always important. Anyway, game development
is definitely one of the versatile things that I've used Python for previously,
and API development as well. Given Python's versatility and it being
a heavy lifting language, well, I consider it to be heavy lifting,
it can handle a whole bunch of tasks that you throw at
it, particularly in the API space. And I do love APIs;
that's a whole other talk on its own, right? Just speaking volumes
to its heavy lifting nature. Data science, I mean, it's
very commonly used for any sort of analytical heavy lifting,
and I've previously spent ten years as a data analyst, so Python was
definitely one of the go-to tools in my toolkit to build out
any manner of reporting, but also analytical understanding,
on huge data sets. It's always great for web,
of course: Django, shout out to Django, and one
of the application components we're going to be looking at today, Flask.
For light demos and light application building,
Flask is one of my total, all-time favorites to use.
And MicroPython for IoT use as well,
which is really cool. Actually, something I wanted to mention here that I've
used it for previously: this one time I bought a new coffee table, looked at it
and thought, wow, that looks like a really big iPad, I should
put 300 addressable LEDs in it.
And so I did, using MicroPython on an ESP32,
and you can change the color of the lights,
which is amazingly cool. You can also use Python,
of course, on the back end of this too, to do all
the amazing coloring that you can see in this GIF here.
It's always a fun project to do.
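(For reference, a minimal MicroPython sketch of the kind of LED control being described; the GPIO pin, LED count, and colors here are illustrative assumptions, not the actual coffee table code.)

```python
# Minimal MicroPython sketch for driving addressable LEDs from an ESP32.
# Assumptions: a NeoPixel-compatible strip wired to GPIO 5 with 300 LEDs;
# adjust the pin and count to match your own wiring.
import time
from machine import Pin
from neopixel import NeoPixel

NUM_LEDS = 300
strip = NeoPixel(Pin(5, Pin.OUT), NUM_LEDS)

def fill(colour):
    # Set every LED to the same (r, g, b) colour and push it to the strip.
    for i in range(NUM_LEDS):
        strip[i] = colour
    strip.write()

# Cycle through a few colours, roughly the effect shown in the GIF.
while True:
    for colour in [(255, 0, 0), (0, 255, 0), (0, 0, 255)]:
        fill(colour)
        time.sleep(1)
```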
Anyway, sort of going back to building
for scalability, and this is something I've been
thinking about a lot lately: how application
deployment doesn't stop at that initial deployment.
Because as developers, DevOps and technologists,
monitoring and tracing and observing applications
beyond deployment is equally as important. And if we
think of Schrödinger's rules of observability,
for example, which is essentially a thought experiment
around quantum superposition: if you aren't observing something, it is both
not happening and happening at the same time, because that
particular element isn't being observed. If we apply that same
logic to application development, if you aren't observing and monitoring
that deployed application, then it both has errors and
doesn't have errors at the same time, because you're not observing
it to know that it's not having errors, so therefore it must also have errors
too. Which always hurts my head to think about,
but you know what I mean: unless you're observing these applications
beyond deployment, you don't know whether an error
is being thrown, and perhaps a user on the other side of
that error assumes that that's the way the application is supposed to work.
As an end user of multiple applications,
it's something I see all the time: there might be a little button
that you click and a certain thing happens quickly, but then it's redirected to
another screen. As a developer you can spot that
and go, oh, an error is being
thrown there, but the user perhaps doesn't know that that is actually occurring.
These are the types of instances where observability, monitoring and tracing
are able to help you identify that beyond deployment,
and to make sure, fundamentally, that if
those things are happening, you can cater for them in the application
and continue to refine your deployed
app, so that your users are kept happy and the application
is kept happy too. Additionally, that error could be using resources
that are costing you money as well. So, always
something to be mindful of. Of course,
building these apps locally, there's a number of ways to identify
and spot issues inside code, and as
devs we fundamentally do it already. Using
core Python, for example, for core debugging we've got
things like print, logging, warnings and pdb, which you can use
to set breakpoints or output certain highlighted parts of the code
to identify issues as they're occurring. There's also
a multitude of libraries, like pprint, which I
always love, to do more extensive output
of issues and find things before
my users find them and before they surface in production,
because nobody wants that, particularly those of us
maintaining and deploying said applications.
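(For reference, a rough sketch of those built-in debugging tools in use; the function and data here are made up for illustration.)

```python
# Sketch of the core debugging tools mentioned above: logging, warnings,
# pdb breakpoints and pprint. Not taken from the demo app.
import logging
import pprint
import warnings

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("todo-app")

def fetch_todos(raw_rows):
    logger.debug("fetching %d rows", len(raw_rows))  # structured log output
    if not raw_rows:
        warnings.warn("no to-dos found - is the database seeded?")
    # Uncomment to drop into the pdb debugger at this point while testing locally:
    # breakpoint()
    pprint.pprint(raw_rows)  # pretty-print nested structures for inspection
    return raw_rows

fetch_todos([{"id": 1, "task": "feed the cat", "done": False}])
```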
Of course, our IDEs often use
the same aforementioned methods to surface those as you're building
these applications as well. VS Code and Eclipse, for example, all have
built-in mechanisms to identify this sort of stuff. So that brings
us to the first demo, because I thought we'd look at some ways to
output and identify issues. That's essentially what the Schrödinger's app
Python app does: it simulates these errors
so we can see and understand, as developers, how
you can identify and trace and monitor them and
keep your app nice and happy and healthy.
It also brings us to another tech joke,
potentially, but this is a Python one and you can see it on screen:
what do you call eight bits in Python? A snake byte.
But yeah, this application, I'll have the GitHub link for it at
the end of the talk so you can try it for yourself. Always looking
for contributors too, so if you've got something you want added to the application,
please open a pull request, but by all means try it for yourself.
There's two demos we'll be doing today,
spoilers. The first one I'm running locally to look at
some of the output from the application, and the second
one I've containerized and deployed to ECS, but we'll get
into that in a little bit as well.
Essentially, the app itself is built using Flask.
I like Flask. I need to make that into a meme: I like Flask,
Flask is great. Shout out to that community. I'm
using SQLAlchemy on the back end to do some very basic databasing;
I didn't have a lot of complex databasing needs for it, so yeah,
it's pretty light. And then there's a handful
of routes to get us started. Actually there's a few other ones,
but these are some basic ones to get started. This is a to-do
application, so I can enter to-do items and then interact
really, really easily just using some basic Flask routes,
like get a list of to-dos, post a to-do,
update and delete as well.
Yeah, those are the fundamentals of it. There's a couple of other little
fun ones that I've thrown in just to do more of that testing,
and to understand how errors are handled
not only by the infrastructure that the application is
running on, but also how your application can handle
such things, and then also spotting and identifying
these errors and warnings as they're flagged through.
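(For reference, a minimal sketch of what routes like that can look like in Flask with SQLAlchemy; the paths, model fields and database URI here are assumptions for illustration, not the actual Schrödinger's app code.)

```python
# Minimal Flask + SQLAlchemy to-do API with list, add, update and delete routes.
from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///todos.db"
db = SQLAlchemy(app)

class Todo(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    task = db.Column(db.String(200), nullable=False)
    done = db.Column(db.Boolean, default=False)

@app.route("/todos", methods=["GET"])
def list_todos():
    # GET: return every to-do in the database.
    return jsonify([{"id": t.id, "task": t.task, "done": t.done} for t in Todo.query.all()])

@app.route("/todos", methods=["POST"])
def add_todo():
    # POST: create a new to-do from the request body.
    todo = Todo(task=request.json["task"])
    db.session.add(todo)
    db.session.commit()
    return jsonify({"id": todo.id}), 201

@app.route("/todos/<int:todo_id>", methods=["PUT"])
def update_todo(todo_id):
    # PUT: flip the finished / not-finished flag.
    todo = Todo.query.get_or_404(todo_id)
    todo.done = not todo.done
    db.session.commit()
    return jsonify({"done": todo.done})

@app.route("/todos/<int:todo_id>", methods=["DELETE"])
def delete_todo(todo_id):
    db.session.delete(Todo.query.get_or_404(todo_id))
    db.session.commit()
    return "", 204

if __name__ == "__main__":
    with app.app_context():
        db.create_all()
    app.run(debug=True)
```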
This does bring us to the first demo. So this is the
application running locally. I actually don't have
it running at the moment. There we go. Kick that off. Let's make sure.
Yes, all running. Okay, so this
is the app. Like I said, there's a bunch of really basic routes,
like add, which is handled through a POST,
and the basic GET, which, let me
make sure that's running, will get the list of to-dos from
the database. And then you can see the output
of all the routes being called and interacted
with, which is also super important,
particularly in local development. But then we'll look at
how that works on the cloud side of things in a minute.
So if I create my first to-do, you can see
in the terminal window there,
it's basically showing the route being called and then the response from it
as well. From the Flask server that's running, you can see
my HTTP status 200, everything's okay.
We can interact with that a number of ways. There's
an update route which basically just
changes that particular to-do entry in the database;
it flips a status flag on it so that the status type changes,
and as you can see, it toggles between finished and not finished.
And of course we can delete it as well.
Now, I do have some fun things, some special to-dos,
built into this as well for testing. Playing on the Schrödinger's cat
paradigm, I have cat as a special
task item. You can see there the cat button has now
appeared as part of the cat entry listing. If I click that,
what that'll do is incrementally start to go through the
400-range HTTP statuses. So that just
threw, well, it should have been a 400.
And if I click that again, it will then start to iterate
through different HTTP
statuses as well. So it'll actually be throwing...
why is that not... there we go, now it's throwing a 400. I don't
know why, I think there was a redirect stuck there. Anyway, this is why tracing
and monitoring are important, because you're able to identify
this type of thing that should have thrown. Yeah, there's a 401.
All right, now it's working. See, I can use tracing to
basically delve into that a little bit further. You can see now there's
a 402, there's a 403, so it's just going to incrementally
shift through those. And as a deployed application, this helps me understand
not only how my application will handle these, but how these errors appear
in the infrastructure that I'm deployed on as well. In this setup it's fairly easy
because, well, it's local and I can see what's going on. So if I do,
the other one I can do is httpstat, and I'm
going to use HTTP status
418, one of my favorites, rarely used
other than, well, other than in applications like this.
418 is I'm a teapot as an HTTP
status, which is totally one of my favorites because it doesn't really mean anything,
but it's fun for testing. And you can see that I was
basically able to throw HTTP status 418,
and, well, I kind of knew what was causing it because, well, I caused it to
happen, which is fairly easy.
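(For reference, a hedged sketch of how a route can simulate escalating 4xx statuses like the cat to-do does; the route name and the list of codes are illustrative only, not the actual demo code.)

```python
# Illustrative Flask route that walks through 4xx statuses on each call,
# ending on 418 I'm a teapot.
from itertools import cycle
from flask import Flask, abort

app = Flask(__name__)
statuses = cycle([400, 401, 402, 403, 404, 418])

@app.route("/cat")
def cat():
    # Each request aborts with the next client-error status in the cycle.
    abort(next(statuses))

if __name__ == "__main__":
    app.run(debug=True)
```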
Also, first demo works. Er, well, it
didn't work because it broke, but then it somehow fixed
itself. Anyway, I think that was a caching issue inside the browser, I'm pretty
sure. But let's just try that quickly
again.
401. Yeah, see, now it's working. I reckon that was
a caching issue because I hit a particular HTTP status which was
one of the no-cache or do-not-store
ones. So yeah, anyway, fixed now. Hurrah. See, tracing:
so easy, so helpful. Anyway, let's switch back to the deck.
And a special mention here too. So that was working with the application
locally, but of course deploying to a cloud is a whole other story,
because with that application once deployed, you may not necessarily
be able to see what is happening behind the scenes as easily
as we were able to there. Ah, all those
times that I've dug through so many server
logs to find that one needle in that haystack, to figure out
what's going on with my application, and the amount of hair that you pull out doing
that whole exercise. Special mention here
too, to all those times where you've
got that error appearing in a deployed application,
but you fundamentally, even to yourself, come back to that whole
"but it was working on my machine". I've been here so many times. Having
that error that we just had happen, which may have been a happy accident,
occur locally is completely different
to having that happen on not only cloud infrastructure,
but deployed cloud native application infrastructure.
And by that I mean, when a cloud has a multitude
of services that your application will use as
part of a holistic application deployment,
you might have multiple services connected now that are working
completely fine, or at least behind the scenes
seemingly aren't throwing any errors and seem to be working okay.
But as we all know, as the industry changes
and continues to grow, a lot of these
services not only get new functions and new
approaches to how they work, but they may change without
you realizing and potentially cause issues with
your deployed application. Also, again, going back to
the Friday night rule: you're half a game in, and you get that
message come through on that messaging app saying, hey,
something's down, something's not right, something's broken.
Murphy's law suggests that it's probably going to happen at the most inconvenient
time possible, either half a game in
on a Friday night or at 2:00 a.m. Granted,
it's probably going to be both, maybe, as well.
We've all been there, we've all had that happen.
This is where distributed tracing is able to help, because
you can take a multitude of services being consumed by your
application, or running and powering
your application, and get a nice, holistic,
traceable map of how those transactions interact
with each of those different services. If you think about
it as being able to navigate
that maze of application resource usage
across a single transaction, it simplifies
the mapping of your application and its footprint across those cloud native services.
What's really important and really fun here, though, is that
one error that appears at the most inconvenient Murphy's
law moment, you can then trace really easily as part of your
holistic application map, which is really, really cool.
Additionally, thanks to some industry
standard approaches to tracing, monitoring and logging,
you're now able to do that agentlessly
as well. By that I mean, and we've all been
here too, you don't need to run an additional service anymore
on server infrastructure or cloud infrastructure
to be able to monitor and map the
full end-to-end tracing of those transactions across the multitude
of cloud native application services.
So you're able to get that holistic application
map, do it with an agentless approach, and get
that nice cloud native application footprint.
Yes, agentless. And this
is in part thanks to, like I was saying, the
industry standard approach called OpenTelemetry. OpenTelemetry
has not only been around since
2019, but it's actually part of the Cloud Native Computing
Foundation as well, which is really cool because it takes a
vendor neutral, open source approach
to observability across application metrics,
frameworks and that whole community, an
industry standard approach, which is fundamentally great
because it means you don't need to get vendor locked in with
your observability solution. And that is super important, because friends
don't let friends get vendor locked in.
Vendor neutral for the win. Yeah,
OpenTelemetry. I won't go into this too much, but OpenTelemetry is quite
extensive. The OpenTelemetry group have an
amazing community, some amazing write-ups,
a multitude of blog posts, and they're always looking for contributors
to the project as well. They actually have a Slack group too,
for anyone looking to get involved or to connect
with the community there as well. But essentially with
OpenTelemetry, even the
instrumentation libraries that
connect into your application are all vendor neutral,
all industry standard. So you can take these and connect
to whatever monitoring solutions you want, including some open source
ones or some auto-tracing,
automagical ones even, that we'll be looking at in
a little bit as well. It takes that industry standard approach
across multiple frameworks and multiple languages and
standardizes it, which is great. I'm saying
that as somebody that has not only run and deployed a
number of services and cloud native applications over the years,
but has also spent many hours delving through a
multitude of logs looking for that one error that sometimes you just don't
find, and you still have to figure out what your application is doing,
or what your application was trying to do.
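(For reference, a sketch of the vendor neutral OpenTelemetry path for a Flask app using the community instrumentation packages; the console exporter here is just for illustration, swap in whichever exporter and backend you prefer.)

```python
# pip install opentelemetry-sdk opentelemetry-instrumentation-flask
from flask import Flask
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor

# Print spans to the console here; an OTLP exporter would send them to any
# OpenTelemetry-compatible backend instead -- that's the vendor neutral part.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)

app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)  # creates a span for every request automatically

@app.route("/")
def index():
    return "hello, observable world"
```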
Special mention here, too, and I've already mentioned it once, but you can never say
it enough times: the OpenTelemetry community is
completely open source,
including all the amazing instrumentations you can see on the screen
here too. So please, where possible,
make sure you are contributing back when you can
as well. And at Lumigo, one thing
we always do is not only try to contribute back where we can to the
OpenTelemetry community, but we also support a number of our own open
source OpenTelemetry tracers as well,
which sometimes you need alongside auto-instrumentation,
depending on how you're auto-tracing, or
how you're tracing your particular application.
two languages that I wanted to mention here,
of course, Python, because we're at a Python conference, and then there's this other language
which I won't talk about, but both of these
are completely open resources. We're always looking
for contribs and ideas on how we can build
these out and make them not only more robust, but a lot easier to use.
We're going to be looking at one of those in a moment as well,
because they're really easy to set up and very easy to.
Very easy to deploy. Just quickly
on that too. And again, going back to the previous slide where I
said, please contribute where you can: I
live in rural Australia and I have sheep,
so I affectionately named one Lambda recently,
Lambda the lamb. I've always wanted to call a lamb that, and I now
have one, and Lambda the lamb thanks you in advance for
contributing stars. This is me holding said Lambda. Oh, isn't he
cute? He's actually a lot heavier now; he was only a couple of months
old here, and he's about seven months old now, so there's
no picking him up anymore. This is what happens when your lambdas
put on too much weight. There's probably a whole other joke
there. Because of the industry standard approach
of OpenTelemetry, and again, going back to the vendor
neutral approach as well, these tracers are super easy to configure and install.
I mean, with Python you just use pip to install
the tracer library, drop a reference into code,
and I'm going to show you what that is in a moment, and configure some environment
variables, because friends don't let friends hard-code
things that can be environment variables. Namely, the OTEL_SERVICE_NAME and the
Lumigo tracer token values as well.
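(For reference, roughly what that setup looks like; the package, import and environment variable names below are assumptions based on the description here, so check the tracer's README for the exact values.)

```python
# pip install lumigo_opentelemetry
#
# Environment variables (never hard-code these):
#   OTEL_SERVICE_NAME=schrodingers-app
#   LUMIGO_TRACER_TOKEN=<your Lumigo token>
import lumigo_opentelemetry  # noqa: F401 -- importing the distro activates the tracing

from flask import Flask

app = Flask(__name__)
# ...the rest of the application is unchanged; no extra tracing code is needed.
```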
For this version of the demo, I've taken the same application,
containerized it, and put it into ECS,
essentially. And I've also
got a different to-do command set up to demonstrate
interaction with SQS, the
Simple Queue Service, and how that
can fit into tracing to give you a better view of
what your application is doing as well.
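(For reference, a sketch of the kind of SQS interaction that to-do triggers, using boto3; the environment variable names and queue URL handling are assumptions for illustration.)

```python
# Send a to-do message to an SQS queue with boto3.
import os
import boto3

sqs = boto3.client("sqs", region_name=os.environ.get("AWS_REGION", "us-east-1"))
queue_url = os.environ["SQS_QUEUE_URL"]  # fail fast if the queue isn't configured

def send_todo_message(body: str) -> str:
    """Push a message onto the queue and return the SQS message id."""
    response = sqs.send_message(QueueUrl=queue_url, MessageBody=body)
    return response["MessageId"]

if __name__ == "__main__":
    for body in ("meow one", "meow two", "meow three"):
        print(send_todo_message(body))
```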
So with that in mind, let's take a look at
the second demo. Like I
said, I'm using the Lumigo
OpenTelemetry tracer, which you can see has been imported
there. Alongside that I've got some environment variables
set in this next demo, namely a
whole bunch of keys and secret access values, the region name,
and the send queue URL for the AWS SQS
service that is part of this application. And of course I've
also made sure that I can handle
those values not being set, which is done inside the app. But we'll have
the link at the end, so stay tuned for that one, and you can try it yourself.
That's basically it. Just to be clear, for the Lumigo
tracer, or the OpenTelemetry tracer that I'm using here,
that is it. Other than calling the library in, I'm not adding any
additional code, because I don't need to. That's pretty
much it, so it just runs inside the application and
sends all those traces through. So this
is the ECS application I have running as the containerized application.
It's pretty much the same as we saw before.
I'll just refresh that so I can send through
a bunch of basic path invocations, which then
will get surfaced inside our tracing service.
So I have the Lumigo free tier running here and, as
you can see, it's already been tracing. I'm just going to refresh
that screen. It's already been tracing the
cluster and the app that I've got deployed there by
default, and all I've had to do for
this screen, for this monitoring, to happen is just
connect the two platforms, which takes a second;
it's a couple of screens to go through as part of the free tier setup.
What that library does is then add additional trace
information as part of the application running. So I can
click through to the application screen and see
some more details about the cluster that's
running, but then if I click on see traces,
because I've got that tracer library running, I'll then be able
to see more detailed information about what's happening
behind the scenes. You can see the routes that
I just called are already creating invocation data
that comes through and surfaces inside the OpenTelemetry monitoring.
Now I can level that up a little bit more by using
one of the functions that we were looking at in the last demo.
So if I click cat again, it's going to start throwing a bunch of 400
errors like it did before. Hopefully this time...
it was almost going to throw something, I think, because it was
a 402. Yeah, there we go, that's a 403. I can
even do the status 418 one,
which is I'm a teapot. It's definitely
getting unhappy with I'm a teapot, so we will bail from
that one. Let's go. So like
I said, I have this other one set up that does SQS creates,
essentially sending messages through to
SQS, similar to what we've already been doing.
So if I add a meow to-do, I then get a meow button, and that
will then start sending messages through, meow one, meow two,
meow three, to the SQS queue that I've
got set up. And you can see there, it looks
to be working. But again, going back to the idea of Schrödinger's thought
experiment, it appears to be working on the front end
that I can see as an end user, but I don't necessarily know
what it's doing on the back end, because we've got that distributed cloud
approach to deployment, that distributed application.
So if we go back to our explore view, in fact, let's go
to the transactions tab, you can
see here there's some errors, some invocations, which
have started to appear from the stuff we've been doing inside our application.
You can see here there's two entries for 401, and a 402 is being
thrown as part of the errors we're simulating. And then up
here, those meow to-dos
that we were just creating are actually picking up
additional trace information, not only from our
base application, but from any services that they then interact
with as well. So you can see here the Flask application,
and all this great information that appears in it, is also
showing as part of the transaction that was happening,
showing a connection straight into SQS as well,
which is really handy when you start to think about really large
applications and the footprint they can have across a
multitude of services, not only within the same cloud,
but associated services too, like sending SMS, sending emails,
or, if you're dealing with an e-commerce application, transactional
systems like Square or Stripe, for example, and how
those services interact. For e-commerce
applications you totally want to be monitoring for this sort of activity,
because you want to make sure that, again, everything's working on the back end and
your users are having the best possible application experience they
can. So anyway,
hurrah, demo number two worked.
I'm almost out of time, so I'm just going to wrap
up with a few more slides here.
Just some takeaways to close on.
Always be building for scale, or ABS as I
like to think of it, from the initial onset.
Make sure you're building with that scale and growth mindset for
your application in mind, making sure that everything
will handle minimal users now and maximum users later,
with a little refactoring in between. Always be future-proofing yourself.
Rinse, repeat and refine: just make sure again that you're
identifying issues that occur and also ways that you can
always improve your application, because it's going to make that experience so much
better and make your application run smoother as well.
And most importantly, make sure you trace and monitor everything
you possibly can, to make sure everything's working as it
potentially should. There's a Node.js version available
on my GitHub as well, so please go check
that out. Always looking for contributions,
stars and comments, so please reach out on socials
if you have any issues or anything you wanted to add.
Just lastly, please always remember to use your tech superpowers
for good and be excellent to each other. Thank you very much.