Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hey, everyone, thank you for coming to this talk. In the next 40
minutes, we are going to talk a little bit about CI/CD pipelines
and how to make sure that we prevent bad code from reaching
production. We are going to see how to improve the monitoring
and observability of our databases and why we
actually need a thing that we call database guardrails very
early in our pipelines. So let's go. So the very first
thing we would like to discuss is whenever
there is an issue, we would like to catch it before we actually go
to production, right? We would like to have some automated checks,
some guardrails, some safety nets that will
prevent the bad code from reaching production.
Important part here is that we would like to have these checks as early
in our pipeline as possible. We would like to
push them to the left so they get executed early, during the development stage,
not late, during load tests or
even after we deploy to production. This is super important
because in today's world, we just can't afford
to identify issues only after we deploy to production.
We need to catch all these issues before going to prod.
And in order to do that, we need automation, right? We don't want this process
to be manual and handcrafted. We want it fully automated and working like
a charm before we even go to load tests or pre-production environments,
as early as possible.
That's the very first thing. But the second thing is, whenever something
breaks, we need to be able to automatically
troubleshoot the issue. We need to know the context. We need to
know everything that is around those things in our
cluster, in our environment. And we need to basically be able
to monitor and observe what's going on there. We would
like to have a good tooling that is going to
pinpoint the issues directly and let us know very precisely
what happened and what the reason is.
So this is what we are going to discuss during the upcoming
minutes. So you might ask a
very good question, right. How do you know that the code you deploy to
production is actually going to work well? Right.
They tell you deploy every time, deploy constantly,
deploy on Friday night, everything is going to be good.
But how do you make sure that you are not going to take your production
down and that everything that you deploy is going to work well?
Right. And before answering the question, we need to actually understand
what may go wrong. And there are various areas
that may break during our deployment. When dealing with a
production environment, the very first area is that the deployment itself may
go wrong. So, you know, there are multiple operating systems.
There is Windows, there is Linux, macOS, and other operating systems,
and there are differences between them, right? Like the characters
indicating the end of a line. Different ways of
setting permissions. Docker with different parameters, like host networking
that works on Linux but doesn't work somewhere else.
And other things that may break along the way. Not to mention that
during the deployment you also need to propagate
the changes from your local environment, from your staging environment to the
production environment. And you may simply get that wrong or
just forget to do so. For instance, how many times has it
happened that everything worked well in the pre-production environment
when we deployed it to the cloud, but once we went to production it
simply stopped working, right, because we forgot to propagate
things like IAM roles, permissions, settings for databases,
connection strings, and other stuff? Just getting the
software deployed is not as straightforward as it seems.
And your unit tests may capture that your logic is flawed,
but they won't capture whether you got the deployment right.
Okay, so that is the very first thing
that may break. The other thing that may go wrong is like your code
may just start failing in production. This is because of
various things you basically got wrong. For instance,
bugs in your application, but also different assumptions
and a different environment that you are now running in: your locale
may be different, the country you deploy to may be different, and,
as already mentioned, things like end-of-line characters
may be completely different. All that stuff means that your application
that worked well in the local environment or in the testing, pre-production
environment doesn't work well when you go to production, right?
And those things we can again capture to some extent
using automated tests, unit tests, integration tests, or whatnot.
But generally, what we need is to be able to tell
very early that the things we are checking in our local environment
will not break when deployed to production. The third area
we'd like to discuss is a completely different context
and a completely different workload when we run in production.
This may be, obviously, a
different country, different data patterns, different data distribution.
You tested things in Europe, but then you deploy to the US and
you just get completely different input to your application because
you have different clients working over there, right? Another thing
is different data load. You tested stuff on a very small database with
like 100 rows in it. But when you go to production you get
millions of rows, right? And your application just can't keep
up with the load, with the pace that you observe in
a real, live environment. The last thing is edge cases,
right? You have a completely different situation when you deploy
to a big country versus a small country. You may lack
some content, pre-populated content for your e-commerce
shop or whatnot. You may get a different distribution,
different characters, different encoding. Things may just break because
you didn't expect them to break, because you didn't foresee how they
could break. That's because it's a completely different country,
a different environment you're running in. So now the question is, okay,
so how can we make sure that on Friday afternoon we can
safely deploy to production? And the very first thing they
tell you is go with CI/CD, right? If you
have CI/CD, everything is going to work well, right?
So just to set things straight, CI/CD stands for a couple of things.
CI stands for continuous integration. Continuous integration
is basically when, after we merge or after
we commit to the repository, we merge all the changes into one
branch and then we build the package that we would like to deploy.
We build the package, we run all the unit tests, all the automated
tests, we run everything, we prepare all of that so
that we are pretty sure that the package we have is going
to work well. Okay, then we have another thing, which is continuous
delivery. In continuous delivery, what we do is we take the
package and we deploy it to all non production
environments. We deploy this package, we install all
the dependencies, configure everything, and start testing whether the package
worked correctly. Also, we make sure that all our teammates
and all the teams around are using exactly the same code if
they, for instance, don't deploy things locally, but instead use
some developer or team environments deployed in the
cloud, right? And finally, we have continuous deployment. And continuous deployment
is when you get the package that you have just built and you deploy it
to the production environment directly. Right? And this is cool.
You take the package, you already tested it in like non
production environments, and then you go and deploy into production,
right? And this is when things may break.
Obviously, they tell you CI/CD is going to protect you from all the
issues, but will it? What kind of issues can
you capture when running unit tests or integration
tests, what things can you see, what things can you spot,
and what issues may just go unnoticed and appear in
production? So what we are going to do over the
next minutes of this talk is we will see some specific
issues that may actually happen because you can't catch
them easily in your CI/CD pipeline. So we will cover
databases, we will cover ORM libraries, and we'll
finally cover lack of context and lack of understanding
of all the moving pieces around. So let's jump
straight to it. So the very first thing with databases
is that we do deploy them, but they consist of multiple
parts. The first thing is that when we change something, we typically
change the code that is talking to the database. So we change the
queries that we send to the database. And this may result in
various things. For instance, the query may now just be inherently slower.
If you restructure your query, or if you extract
more data, or if you change columns, you may change
how your query executes and you may, for instance, get lower performance
or just bugs, right? So that's the first thing. The other thing we
may get is schema changes. With schema changes,
whenever we deploy something to the database, we may end up in
a situation where this schema modification is going
to run for a very long time. If you run
things locally with small tables, then adding a new column,
or changing a column type, or doing something else with the schema
of your database is going to run really, really fast.
But once you go to production and you have millions of rows,
then a seemingly innocent schema migration
may literally take minutes or even hours to get
deployed. Your table may need to be rewritten, meaning that your
SQL engine may need to copy the data from the table to
the side, recreate the table, and then restore the data
back. And this is not something you can do in a split second.
This is something that is going to take minutes. So this will
take your application down.
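As a rough illustration of the difference, here is a sketch in PostgreSQL with a hypothetical orders table (the exact behavior depends on the engine and its version):

```sql
-- Changing a column's type usually forces the engine to rewrite the whole
-- table, holding a heavy lock while millions of rows are copied:
ALTER TABLE orders
    ALTER COLUMN total_amount TYPE numeric(12, 2);

-- Adding a nullable column, by contrast, is typically a metadata-only change
-- and returns almost immediately, even on a huge table:
ALTER TABLE orders
    ADD COLUMN discount_code text;
```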
Another thing is query changes. It may be the case that something
in your database has changed for whatever reason:
indexes changed, configuration, installed extensions,
views, stored procedures, triggers,
whatnot. Multiple things may just change, which will result
in your query now being executed completely differently.
For instance, previously it was using an index to
scan your table. Now it can't use this index for whatever reason.
Or maybe it was doing a hash join or a merge join.
Now statistics are out of date and it just goes with a nested loop join,
which will just be slower. There are other things that may
change in the database, like statistics, bugs,
missing indexes, data quality, configuration, locks,
partitioning, other stuff that is in your
database, and your unit tests will just not catch that.
Your unit tests will test your code and assume how
it's going to work with your database. But those tests
do not know how you configured your database. Especially, they
have no idea how you configured your production environment.
So let's see what else may go wrong. For instance, we may get slow queries
that work pretty well in your testing environment
but do not work when you go to production. So imagine that we have
this application code, right? In this application code we have an aggregate root
of the user, and what we do is we would like to also query for
some details of the user, like locations, pages, texts,
whatnot. Right? And this, in turn, because we use an ORM library,
could create a query like this one, where we basically
join multiple tables. And this is an example from an actual
production environment I had. This is the code I was debugging
once. And this code, just when getting one
entity into our application, because this entity had
so much of this data,
these details like questions, texts, reports, and other tables
that we joined together, resulted in getting 300,000
rows into the application. And then the
application ORM worked hard for nearly 25 seconds
to deduplicate everything and to construct the final aggregate
object, the aggregate root of this user.
So this is what may happen. You won't catch this issue
when you run your tests against a very small database
available locally, right? You won't catch that when you have 100
rows in your database. You can catch it if and
only if you have literally thousands or millions
of rows in your database, because only then
do you see the cross-join product of all
those tables. How can we fix that in that case? Well,
once we identified what the issue was, the fix was pretty straightforward.
Instead of getting the aggregate root in one go,
I was basically getting it with multiple queries, with multiple
objects. This resulted in sending multiple SQL
statements to the database and then joining all
the results in the application code.
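A minimal sketch of the shape of that fix, with hypothetical table and column names:

```sql
-- Instead of one big statement joining every detail table (which multiplies
-- the rows into a cross product), fetch each part separately:
SELECT * FROM users     WHERE id = 123;
SELECT * FROM locations WHERE user_id = 123;
SELECT * FROM pages     WHERE user_id = 123;
SELECT * FROM texts     WHERE user_id = 123;
SELECT * FROM questions WHERE user_id = 123;
-- Each statement returns only the rows for this one user, so none of them
-- explodes into 300,000 rows; the application assembles the aggregate in memory.
```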
One might say, okay, but now this sends more SQL queries to the database.
And yes, you'd be right. But the thing worth
noticing here is that those queries now run way faster.
They execute in a split second because they do not
extract 300,000 rows. So all of that is now much,
much faster. And just because we split one query
into multiple ones, that's not a problem at all, because this
ultimately leads to better performance. So here we can
see something that your unit tests probably
won't tell you. You could capture that with some load tests, and we'll
get to that a little bit later on. But generally,
by testing things locally in your CI/CD pipeline,
you won't spot an issue like this one, because it will just work
fast enough. Moving on: your libraries
and your tools, or even you may write queries
that are equivalent in terms of what they do,
but are completely different in terms of performance. So let's
say we have this table that we call boarding passes. It has something like 8
million rows. And what we would like to do now is we have a
handcrafted query that is getting all the boarding passes
and calculating a checksum of the ticket number,
doing it twice using the MD5 algorithm.
What we do next is we basically
join this CTE three times and we filter for
some specific double hash of the ticket number.
And this query at the top, which uses a
CTE, a common table expression, which is
kind of like a temporary table used only for this
query, is equivalent to
the one down below, meaning that the one
down below does not use a CTE but instead reads
from the boarding passes table three times directly and then does the
filtering in the WHERE condition. So those two queries are
equivalent in terms of the logical output they
produce. The only difference is that the query at the top runs in
13 seconds total, whereas the query down below
runs in 8 seconds, so it takes noticeably less time than the query above.
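To make this concrete, here is a hedged sketch of the two shapes (the exact query from the talk may differ, and the hash value is a placeholder):

```sql
-- Variant 1: a common table expression, joined three times.
WITH hashed AS (
    SELECT ticket_no, md5(md5(ticket_no)) AS double_hash
    FROM boarding_passes
)
SELECT h1.ticket_no
FROM hashed h1
JOIN hashed h2 ON h2.ticket_no = h1.ticket_no
JOIN hashed h3 ON h3.ticket_no = h1.ticket_no
WHERE h1.double_hash = '<some hash value>';

-- Variant 2: logically equivalent, but reading boarding_passes directly three
-- times and filtering in the WHERE clause. The planner may pick a very
-- different execution strategy, hence the different timings.
SELECT bp1.ticket_no
FROM boarding_passes bp1
JOIN boarding_passes bp2 ON bp2.ticket_no = bp1.ticket_no
JOIN boarding_passes bp3 ON bp3.ticket_no = bp1.ticket_no
WHERE md5(md5(bp1.ticket_no)) = '<some hash value>';
```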
So again, this is something you can't check with your unit
tests, because hey, those queries just give you the proper
answer. And unit tests check whether your code works correctly,
whether the code produces the expected result,
right? But unit tests will not capture, and especially
your CI/CD pipeline may not capture, the performance characteristics.
And no one will spot that those two queries, even though they are equivalent,
perform completely differently. Another thing we
may hit is incompatible changes in schemas, right?
Adding a column seems like something that will not break
your production database at all, but it may
take a lot of time to apply. If you add a column,
then your table may need to be rewritten.
So data may need to be copied from the table to the side,
the table must be recreated, and then the data is restored
back to the original table. And this is not something you do in a
second. It will take you minutes. Dropping a column seems like
something that should be easy enough, but the problem is
if you do deploy to a really big fleet of your machines,
then you may end up with split brain, meaning that half of your
machines are already running with the new code,
whereas the other half of the machines run with the
old version of the code that still expects
that column to be there.
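One common way to reduce that risk, sketched below with a hypothetical column, is to make the destructive change in stages rather than in one deployment:

```sql
-- Stage 1: deploy application code that no longer reads or writes the column,
--          to every instance in the fleet.
-- Stage 2: only when no running version of the code still expects the column,
--          actually drop it:
ALTER TABLE users DROP COLUMN legacy_nickname;
```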
Another scenario is when you have a heterogeneous setup, meaning that you have
one database that is accessed from multiple applications
being written in various different technologies,
one in JavaScript, another one in Python, in Java,
in .NET, in Rust, you name it. And those different
applications do not all control the schema.
So if you change the table schema
in your database, then all those applications need to
be updated. If you do not keep backwards
compatibility, then you may end up with issues.
Another thing is just changing the column type. If you
change the column type, you may get your table rewritten. But this
may also lead to other problems: hey, now you lose
precision, you lose some data quality, or maybe
your application cannot read the column anymore
because the internal representation has changed, right? So those are
the things that may go wrong when we are playing with or changing
the table schema when deploying to production.
Another thing is about indexes, right?
If you lack an index in production, then your
query may just be very, very slow, and your
unit tests are not going to catch that. So how can
you fix that? You just need to configure a proper index of a proper
type, whether it's a B-tree index, a hash index, a GIN index, or whatnot.
If you configure a proper index, your application is going to
work faster, right?
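For example, a hedged sketch with table and column names reused from the earlier examples (plus a hypothetical jsonb column):

```sql
-- A plain B-tree index for queries that filter boarding passes by ticket number:
CREATE INDEX idx_boarding_passes_ticket_no ON boarding_passes (ticket_no);

-- A GIN index for a different access pattern, e.g. searching inside a jsonb column:
CREATE INDEX idx_users_profile ON users USING gin (profile);
```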
But what happens if we tell some developer that hey, if your query is slow,
configure an index? Then we end up, obviously, with "index all the things."
So you have too many indexes configured in your database. And while
an index can help speed up querying,
it will slow down data modification. Because now
not only do you need to modify
the entity, but you also need to update all the
indexes around it, meaning you need to update the indexes
on every single column that
you configured. So one data modification may now lead to
multiple indexes being modified, which may be slower;
again, something you won't catch with your CI/CD pipeline.
And we also get to the situation where
we deploy our code to production
and this code is buggy. There are different kinds
of bugs. It may be buggy just because we didn't
implement it properly. This is something our unit tests will be
able to spot most of the time, obviously,
but there may also be bugs in the engine we
use, in the database we use. One of the examples is
the Halloween problem. The Halloween problem was a
case from a couple of decades back, where if
you tried to update the salary of all the employees,
just as you can see on the screen, the database
engine updated the same row multiple times
because it was still meeting the condition. So here on
the screen, we want to update the salary when it's below 10,000.
So imagine that we start with 1000 salary and we want to increase it
by 10%. So we end up with 1100
and this is still below 10,000.
So the database engine kept updating this row over and
over again until it finally stopped
meeting the filtering condition, so until
everyone was earning at least 10,000. So this is what happened.
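The statement on the slide is roughly of this shape (table and column names reconstructed for illustration):

```sql
UPDATE employees
SET    salary = salary * 1.10   -- give everyone below the threshold a 10% raise
WHERE  salary < 10000;
-- Without protection against the Halloween problem, a row starting at 1,000
-- becomes 1,100, still matches the WHERE condition, and keeps being updated
-- until it finally crosses 10,000.
```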
Nowadays, your databases protect themselves
from the Halloween problem, but you may end up hitting
some other edge cases. Did you even know that in your SQL
database, when you use the read committed isolation level,
then according to the standard and the implementation details of
the databases, your application is allowed to
read the same row twice or to
skip a row entirely?
And read committed is most of the time the default isolation
level you use. But that's it.
Those are the big areas when we are talking about databases. But we
often interact with databases using ORMs.
ORMs, or object-relational mappers, are libraries that help
us simplify querying the database,
mapping data back and forth between our application
and the SQL engine. So let's see what may go wrong.
The very first thing that breaks often is the n
plus one select problem. Imagine that we start with the aircrafts
table, which is in a one-to-many relationship with the seats
table. So what happens now is we would like to get all
the aircraft and then for every single aircraft we would
like to get the number of seats. What may happen
behind the scenes is this will generate n plus one queries.
Why? Because first it will go to the database to
get all the aircraft from the database. And then for
every single aircraft, just as we are looping over all of them,
we get a query going to the database to
get the seats of that one particular aircraft from the seats table. So this
results in one query sent to get all the aircraft and
n additional queries to get the seats for
every single aircraft. And this can be improved.
This problem is easy to solve. Instead of going with n
plus one queries, we can just join the two tables together in one
query and bang, we are good to go.
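A hedged sketch of the SQL an ORM might emit in each case (column names are illustrative):

```sql
-- Lazy mode: one query for the aircraft, plus one query per aircraft (n + 1 in total):
SELECT * FROM aircrafts;
SELECT * FROM seats WHERE aircraft_code = '773';
SELECT * FROM seats WHERE aircraft_code = '763';
-- ... and so on, one more round trip for every aircraft returned above.

-- Eager mode: a single query joining the two tables:
SELECT a.aircraft_code, count(s.seat_no) AS seat_count
FROM aircrafts a
LEFT JOIN seats s ON s.aircraft_code = a.aircraft_code
GROUP BY a.aircraft_code;
```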
However, in order to generate this query,
this eager query that will eagerly get all the data
instead of doing it in a lazy fashion,
what we need to do is reconfigure our
ORM. Can we see from this application code
whether it's going to send lazy, n plus one queries,
or whether it's going to go with eager
mode? We don't see that, and the ORM doesn't show it
to us easily, because the configuration is obscured and stored
somewhere else. However, even if
we reconfigure our ORM to always go eagerly,
which may not be the best idea, then we end up with different issues,
just like the query we've already seen
a couple of slides back, right? We now get multiple
tables joined together, which slows down the performance, and
the solution for that is to split it into multiple
queries. So do you know how your ORM
is going to work behind the scenes and what it's going to do?
And more importantly, can you catch that automatically with
your unit tests and with your CI/CD pipeline?
The answer is, most of the time you just cannot.
Moving on. Other issues that ORMs introduce
are issues related to the impedance mismatch.
Impedance mismatch is a generic term meaning that the model
we store in our object-oriented applications
is different from the model in our SQL
databases. It's different because in our applications
we have, for instance, polymorphism,
so one class can inherit from
another. And the question is how do we represent that in our SQL
database to not lose data and to not lose performance?
There are obviously a couple of different approaches to do so, for instance,
table per hierarchy or table per type.
But generally this is something that may lead to
issues in terms of the performance or in terms of
the data quality. Another thing that we may end up with
is the data types to be used. It sounds simple
enough to store data in the SQL database,
right? But how do you, for instance, store spatial data?
Spatial data is basically a geographic location,
like the longitude and latitude of some point on the
globe, right? We can store that in our database, and typically
SQL engines have a dedicated data type to
store it. But how is our ORM going to deal with
that? Is it going to store this thing as a pair of numbers,
or maybe as a string, or maybe as something different? And you may
think, okay, I don't use spatial data, this is some weird edge
case I'm not interested in. But even strings are
something that may be very prone to the impedance mismatch.
Because in our applications, in JavaScript,
.NET, Java, whatever, we have just one
string type. But in a SQL engine, what we can often
do is configure a thing called collation.
Collation is basically the ordering of characters:
whether lowercase letters are less than uppercase
ones, how to order and compare national characters,
what to do with the encoding, and whatnot. And you can configure that
per database in your SQL engine,
meaning you can reconfigure it dynamically,
and your application should reflect that. But because
you have just one string type in your application,
what you have there may not work well in that
case. So this is another thing that
is caused by the impedance mismatch.
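As a small illustration (the table and the collation names are just examples and depend on what your server has available):

```sql
-- The same strings can sort and compare differently depending on the collation:
SELECT name FROM cities ORDER BY name COLLATE "C";      -- plain byte-wise ordering
SELECT name FROM cities ORDER BY name COLLATE "en_US";  -- language-aware ordering
-- The application's single string type knows nothing about which one the
-- database will actually use.
```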
Yet another thing is precision, right? In our applications we have floats and
we have doubles and that's typically all we can use.
But in a SQL engine you can use decimal or numeric types with
a precision you specify. You can basically configure
the precision of your numbers so you don't lose data.
But once you go back to your application, you may basically decrease the quality
of your entities. Another thing with ORMs
is that they lack visibility, meaning that there are so
many moving pieces that ORMs take
care of. Starting with the transaction isolation level, or
transactions in general, the transaction scope: who starts the transaction?
Who rolls it back when it goes wrong? How is it
rolled back? Can you nest transactions? Is the
data cached for your transaction or not? What is
the pooling for your database connections, or query hints, or
stored procedures, and whatnot? All those things your ORM
is taking care of. But the problem is, do you
see this configuration? This configuration is not
stored very close to your application code. It's basically
hidden somewhere in your app config,
and it may differ between your environments. So again,
your unit tests are not going to catch that.
Moving on: migrations. Is your ORM handling
migrations, or is it some set of SQL scripts
doing all of that? Is your application even dealing
with schema migrations and with the schema of your database,
or maybe it's some other application doing that? What happens
if your ORM recognizes there is a schema drift,
so it recognizes some differences in
columns or in data types? Is your ORM going
to step in and fix those
differences automatically, because some ORMs do
that? How do you deal with the heterogeneous scenario
when you have multiple applications talking to the same database?
Right? Those are the things that are very hard to see
just because they are not very clear from our application code.
Not to mention, how do you test whether your migration is
going to run fast enough? So generally
your ORM hides tons of
various configurations behind the scenes. It starts with
the application code, where you have no idea whether it's going
to produce n plus one queries or whether it's going to download
the data in eager mode. We can carry
on with models, migrations, configuration of
the ORM, stored procedures, functions,
triggers, and other stuff. Generally, all those
pieces are there, but they are hidden
very deep inside your application. So how
do you capture all of that? How do you verify whether what
you ultimately implemented is
going to work well in production or not? How is your CI/CD pipeline
going to help you with that? And the thing
that we lack is the context. We do not
see how things that we implemented are going to work in
production. We do not see the performance, we do not see
what they actually do behind the scenes. And we lack this
context. And we need to have this context in
order to build proper database guardrails.
What we need to do is we need to have
this ability to understand everything that
happens in our database, in our environment, right? What the
running configuration is, what the query is, what's the execution
plan and other stuff. And with a typical CI/CD pipeline
with no database guardrails,
we have no idea how it's working. What we need to do is
build database guardrails. We need to build elements that
are going to protect us from deploying bad code to
production, and elements that are going to observe how things
work in your production SQL database, to let you know,
right when you are typing the code against your local database,
that these things are not going to work well in production. So let's
see how we can do that. So in order to do that, first we would
like to understand how SQL queries are executed,
right? In order to do that, we need to understand the SQL engine a
little bit. So in nearly every single
database, when it executes a query, the query goes through
multiple stages. The very first stage is the parser. The query
is parsed, meaning its textual representation is
turned into a thing that we call an abstract syntax tree,
AST for short. The AST is basically a representation
of what the query is trying to do. Then this
query gets rewritten using the rewriter, because
you can write the same query in many ways;
there are multiple queries that are equivalent but are just written
differently. You use different aliases, you number columns
instead of naming them, and whatnot. Those queries
do the same thing, but still we would like to be able to reason
about them in the same way. So that's why they need to be
rewritten into some standardized form that is just easier
to process. Then there is the third step, which is called planning.
The SQL engine plans how to execute
the query so it goes through the database and figures
out what indexes there are, what tables there are,
how it can join tables, how it can extract the data,
whether it can use indexes, whether it should cache the
data, hash it, sort it, and whatnot. This planning
is a process that is actually crucial to executing the query,
because it provides the plan, the actual idea of
how the query is going to be executed by the database. And finally,
the executor is going to take the plan and start executing
it. So let's see the execution plans
in action. So whenever we send a query like
the one on the left, we can always ask the
database to explain the query
for us, to show how it's going to execute it. For instance, this is an example from
PostgreSQL, and what we can see is that it generates a
very nice plan for us. Let's dive deep into this plan.
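For instance, a hedged example of asking PostgreSQL for a plan (the query on the slide may differ, and real costs and row estimates depend on your data and version):

```sql
EXPLAIN SELECT * FROM boarding_passes WHERE ticket_no = '0005432000284';

-- Illustrative output:
--  Seq Scan on boarding_passes  (cost=0.00..153418.05 rows=4 width=25)
--    Filter: (ticket_no = '0005432000284'::bpchar)
```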
So every single plan consists of nodes. Basically every row
you see in this plan is a
node that represents some operation executed by
the database. And every node has a type,
meaning the type of operation it performs, for instance
a sequential scan, a bitmap heap scan, a nested loop
join, or whatever else. So generally, operations that the SQL
engine is going to execute. But apart from the type of
the node, we also have the thing that is called cost.
Cost is basically an arbitrary measure of
how hard it is to execute
a given operation. How hard? Mostly in
terms of I/O operations: how much data needs to be read
from the drive, how much data needs to be spilled over to
the hard drive and whatnot. But generally this represents the complexity
of the operation. The higher the cost, obviously the
slower the operation is. So what we can do now is
ask our SQL engine:
hey, this is the query that my ORM generated,
please tell me how you are going to execute it.
Then we would end up with a query plan just like the
one on the right. And based on this query plan we could
tell how expensive it is. Obviously your
SQL database is going to generate
multiple plans and compare them based on the cost to
pick the cheapest one. But for us, what is important is
we can take this plan and see how the database
decided to execute the query, which indexes it decided to
use and why it decided to use these indexes and whatnot.
So this is the first thing we could do to start building
our database guardrails. The other question is,
how are we going to do that? How are we going to extract
all of these pieces? To do that, we can use modern
observability tooling, namely OpenTelemetry.
OpenTelemetry allows us to capture logs,
traces, and metrics from our applications
to provide better observability. OpenTelemetry, OTel for
short, is basically a standard provided by the CNCF, the Cloud
Native Computing Foundation, defining and providing
a set of SDKs for instrumentation: SDKs for
nearly every single modern language you can think of.
So you generally drop an SDK
into your application, and now OpenTelemetry is going to process things
that are called signals. Signals are traces, metrics,
logs, and other stuff that we can combine together
to understand what's going on behind the scenes in
our application and how things are actually executed.
And with OTel, what we can do is create spans and
traces. So imagine that we have an application that is
processing some workflow that could be like a checkout workflow
in our e-commerce app. So imagine that the user clicked the checkout
button and wants to start processing the payment, right? So our
application, like our node A, which could be a web server,
a load balancer, or whatever, gets this request. And then it
needs to call some other microservices, like some queues,
some databases, some log storage, or whatever.
So we can see that this request coming from node A is propagated
down the line to other services, right? And this is
what OpenTelemetry can capture for us. So we can capture
the whole view of what's going on, the whole
view that we basically call a trace. A trace represents
one particular workflow in our application and
consists of spans. Spans are basically the single
pieces representing how a particular piece of code
executed a given operation: how there was a call
to some other node, like node C, how long it took, what
the parameters were, and whatnot,
showing exactly what's going on. So we can see what details we can
capture using OpenTelemetry: for instance, a friendly name,
timestamps, some attributes, and whatnot.
So this is what we can use to instrument our
application and capture all the stuff. And now if we
combine both of these things together, the execution plans
we already considered and OpenTelemetry, what we
could do is capture the true
behavior of our application. The only
problem is: when do we capture that?
And obviously we could go with load tests,
get our application deployed, start load testing
it, to get all the execution plans, to get all
the statistics, all the metrics and whatnot, to see how it
works. The problem with load tests, though, is that
first, they are super expensive in terms
of time and money. It takes hours to execute proper
load tests. It also takes a lot of money to
pay for the fleet, for the generated traffic, for the
hardware. If we are doing load tests for an ML-based application, then we
need GPUs, which are also crazy expensive, and whatnot.
The second thing is we need to reproduce the traffic properly. So we
need to get the proper cardinality and data distribution,
we need to anonymize the data, we need to be GDPR compliant,
and whatnot. It's not as straightforward as it seems.
And finally, those load tests happen very,
very late in our CI/CD pipeline.
They happen after we merge the branch,
after we run unit tests and integration tests, after we do
the code review, after we deploy to pre production
environment, they happen at the very end of this pipeline.
And this is typically too late for us to
get meaningful and actionable feedback, because by the time the load tests
tell us, hey, this thing is not going to work in production,
we are already probably hours, if not days, past the
implementation phase of that particular feature, right? So load
tests are expensive, are slow, and are way too
late in our pipeline. So what we need to do
is we need to be proactive.
We can't let issues appear in production.
We need to find the issues during our CI/CD pipeline,
as early as possible. We need to push all those checks
to the left, find the issues automatically,
and monitor and observe our applications constantly to
get better troubleshooting and better root-causing. And we need
that now. No matter what our application is, whether it's a small
application or a big Fortune 500 corporation with
an enterprise application, we need that now.
We need a completely new, modern approach
for getting proper database guardrails.
And Metis does exactly that. Metis is the
solution that prevents bad code from reaching
production, that can monitor and observe all your databases,
and that can automatically troubleshoot them for you.
It uses all the principles, all the things that we've just discussed,
OpenTelemetry, execution plans and whatnot, and can improve
your CI/CD pipeline by providing proper database
guardrails. So let's see that a little bit in action.
So Metis prevents your database code from breaking production.
Once you register in the application, what you end up with is
a project. A project basically represents, let's say, one
of your applications interacting with a database, right?
What you do is drop one dependency into your application,
a dependency that uses OpenTelemetry
and does everything we discussed behind the scenes. And what
we can do then is we can show you the recent activity. So for
instance, we can tell you, hey, your application is exposing
these REST endpoints, and when you called this REST
endpoint, you got a 200 HTTP code
as a response. But more importantly, these are the
SQL queries that were generated behind the scenes
by your ORM, by your SQL driver, whatever.
We are developer-centric, so we want to give very straight
and very direct information: whether it's going to
work well or whether it's going to fail, what
the impact is, and how to fix a
particular query. So Metis does exactly that,
shows you all of that, and gives you a very straight, very direct
signal whether the things you have are going to work well in production
or not. But if you want to dig deeper, then feel
free to do so. You can, for instance, get all the SQL statements
with all the tables listed, how they were accessed and executed.
You can get metrics of your queries, so you can see
exactly what the cost was, what the execution plan was,
and whatnot. You can get a nice visualization of
all the operations, how your SQL engine actually executed
the stuff. If you want, you can get the raw
execution plan to process it further. And you
can also see, for instance, all the tables and metrics and
timings and other stuff that you have in your application,
right? So this is what you can do, and this gives you the observability
that you need during your CI/CD
pipeline. What's more, we can also integrate that
with your CI/CD actions. So for instance,
if you use GitHub Actions, you can configure your pull
request so that Metis
interacts with the pull request,
analyzing all the queries and all the schema migrations from
your application automatically. For instance, we can also analyze that, hey,
you tried to migrate this schema, these are the indexes you tried to
configure, and maybe this is not going to work, and whatnot.
So this is how you build proper database guardrails into
your CI/CD pipeline. You can get the immediate feedback
just when you are typing things down as a developer. But also
you can get this kind of database review for
your SQL database interactions during the CI/CD
pipeline. But it's not the end of the story. Also, after deploying
stuff, Metis can monitor and observe your databases.
For instance, it can automatically analyze the schema
of your database and suggest that, hey, you do not have indexes
configured on this table; maybe you should, in order
to improve the performance. You also get a very nice observability
dashboard showing you the slowest queries. So, for instance,
you can see: hey, this is the query that has been executed recently and bang,
this is its performance. And you can get anomaly detection with
details of the deployments, of the running configuration, and whatnot.
You can get statistics of your tables, like the number of dead rows, autovacuum runs,
and other stuff. You can get index usage: hey, you do have
indexes configured in your database, but maybe this index hasn't been
used for the last two weeks.
You can get extensions, you can get the database config, and all
of that stuff just by dropping in one open source Docker container
that runs alongside your database. So generally
this is what you can get with Metis, and this is exactly what
we would like to have. We would like to have basically all
our stuff covered. We would like to have the source code integration covered,
the pull request analysis, and constant monitoring
and observability and other stuff running 24/7,
covering the full cycle: preventing bad code from
reaching production and automatically troubleshooting
things if something breaks in production. So Metis integrates with
your source code with various languages, various ORMs,
various databases, no matter whether you host them on premises
or in the cloud. It integrates with
your CI/CD pipeline and analyzes your pull requests
and merge requests, for instance in GitHub Actions. It can also give you
observability using ad hoc analysis, a web interface,
CLI tools, and whatnot. So you get all of that at
your hand with the modern OpenTelemetry standard
and a modern approach to building proper database
guardrails into your CI/CD pipeline.
And that's the idea. Because databases may break:
you may get slow queries, you may get a wrong SQL
schema migration, you may break the configuration by dropping,
removing, or no longer using an index, right? And you can't
wait for these issues to pop up in production. You need to catch them as
early as possible.
You can't rely on load tests. They are way too late,
way too slow, and super expensive. You need to
be proactive and push things to the left. And for all of that, you can
use Metis. Metis covers all of that and makes
sure you do not fly blind and don't deploy bad
code to your production. And with that said,
I'd like to thank you for tuning in and coming to this talk.
Hope you enjoyed. Hope you liked it, and please enjoy the rest
of your conference. Thank you.