Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. My name is Adam Furmanek, and thank you for coming to this talk, in which we are going to talk a little bit about maintaining SDKs over many years. We are going to see some lessons learned and some of the experience we gained working with SDKs, and we are going to analyze a big case study of what happened at Metis over many, many years, across many languages. I am Adam Furmanek. I work at Metis as a DevRel. Feel free to take a look at our webpage and see what we do over there. And without further ado, let us jump straight to the point. I've been working as a software engineer for many years, and Metis develops software that makes extensive use of SDKs. So what we need to do first is understand what we tried to build over those years, how we structured our SDKs, how we built them, and how we evolved them over time. Then we are going to see what particularly interesting things we learned and what we would like to share. So let's go. The very first thing
is: what do we do at Metis? Metis is basically software that gives you the ability to build observability for your databases. The idea is that you have your SQL database, or NoSQL database, or a database of any kind, and you have your applications that talk to the database. Now,
in order to build the observability the right way, we need
to understand what happened in the database and in your application.
So we would like to understand, for instance, which REST API was called in your application and then what SQL query was executed as part of handling this particular REST call. And ultimately we
would like to get the execution plan. Why do we want to do that?
Well, the idea here is that developers, whenever they work with databases, very often don't notice problems that can later cause trouble in production.
And this applies no matter whether you're a small startup or a big enterprise company; all those places have to face the same issues. Why? Because many times,
whenever we test our applications, we only focus
on the correctness of the data, not on the performance of how things work. So we miss problems like N+1 queries from our ORM, or we miss cases where our queries do not use indexes.
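To make the N+1 problem concrete, here is a small self-contained sketch (a hypothetical two-table model, using SQLAlchemy on SQLite; not code from the talk): one query fetches the orders, and the loop then silently issues one extra query per order.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Order(Base):
    __tablename__ = "orders"
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customers.id"))
    customer = relationship(Customer)  # lazy-loaded by default

engine = create_engine("sqlite://", echo=True)  # echo=True prints every SQL statement
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Customer(id=i, name=f"customer-{i}") for i in range(3)])
    session.add_all([Order(id=i, customer_id=i) for i in range(3)])
    session.commit()

    # One query for the orders...
    for order in session.query(Order).all():
        # ...plus one hidden lazy-load query per order: the N+1 problem.
        print(order.customer.name)
```

With three local rows the extra queries are invisible; with a million production rows they are not.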
And when we test these things locally, or when we
play with those things locally, well, we typically have a
small database with what, five rows, ten rows, maybe a hundred rows. But we do not have a production-like database available locally.
So we do not know what the actual size of the data
is. So even if we have a slow query that
for instance, scans the whole table, then we don't know
that it is going to cause performance issues. And there
are no tools to prevent you from deploying such code to production. Yes, you can run load tests. The problem
with load tests though is they happen very late in the pipeline.
And then those load tests, when they show you issues,
you basically need to go back to your coding and
rewrite the solution, restructure it, and sometimes even
start from scratch. So this is way too late and very
expensive to be efficient. So what we want to do is we
would like to capture issues with your databases
as early as possible, ideally right when you
are typing your code. And to do that, Metis wants to understand which REST API was called, what SQL queries were executed, and what the execution plans are, in order to tell you: hey, this query was
fast locally because you have only 100 rows in your database.
But hey, you scanned the table, and if you deploy this to production without an index, it's going to kill your performance. So this is a critical issue and you need to change it. And we want to alert users right before they even commit the code, or at the latest during their CI/CD pipeline. So this is what we do. We have a couple of assumptions about what we deal with and how we want to tackle it. Generally, we need to extract those three things, and we are dealing with web APIs: applications that expose REST APIs or whatever else, and that are basically dealing with network traffic. Those applications can be running locally, or in the cloud, or on-prem, or wherever else. We don't necessarily constrain ourselves on what types of applications we support. They are generally modern,
meaning that we do not focus on technologies from, I don't know,
ten years ago or 15 years ago. We generally focus on
things that are modern in the sense that we want to
embrace the problems with microservices
or the problems with unclear interdependencies between
applications, or many applications talking to many
databases at once, or a single application talking to many databases at once.
Generally, this is the world we deal with. We are not focusing on monolithic applications talking
to a single database. No. Instead we want to support
a case when we have hundreds of microservices with hundreds
of databases of various kind and generally
support all of that. And ultimately, we want to support the users in their CI/CD pipelines as much as possible. So not only work with them in their local environments, but also in their CI/CD environments, showing them: hey, this is your CI/CD, you can feel safe and rely on it, so that when CI/CD tells you everything is good, it's not going to break in production. So this is the idea, and we have a couple of tenets
for how we wanted to build the SDKs. First, they must be easy to use. Meaning that we do not want to build a solution that is hard to set up and hard to use. Our users should ideally do next to nothing to use Metis; ideally, it should be one command and everything is up and running, right? Another thingy we want to focus on is a one-time integration of the Metis solution. Meaning that it's not the case that when you have a team of five developers, every single developer needs to do something to integrate Metis. No, nothing like that. We would like this to be a one-time action: you integrate Metis, you commit whatever you needed to do to the repository, and bang, the whole team can benefit from the integration you just did. The whole team, the whole company, basically everyone working with the product, no matter whether it is an in-house product, an open source product, or whatever else; you do it just once and everyone can use Metis.
The next thingy is ideally no code changes. Ideally, we want the integration to not touch your application at all, to not change your application at all, if that's possible, obviously. So you don't need to modify the application. But this also comes with a second thingy, which is that we do not want you to change the way you implement your application. Yes, maybe you will need to add one line of code triggering Metis or enabling Metis. But generally, we don't want you to change the way you run your tests, change the way you deal with your ORM, or change the way you write your business logic. No, we don't want to touch that. Ideally, your business code stays the same, your infrastructure code stays the same; the only thing you need to do is, well, enable Metis. And finally, we want to bring as
few dependencies as possible, ideally zero.
We don't want to force dependencies on you, where you need to install this library, that library, or whatever else. No, we only want to bring Metis and that's it. The fewer dependencies, the better. So this is where we are. We wanted to implement SDKs for web applications that are quite modern, dealing with microservices and many databases. At the same time, we want to get
things that can show you what happened in your application: API X has been called, and in turn this is the SQL query that was executed and this is how it performed. Then we can later tell you: this thingy is not going to work well in production. And all of that needs to happen automatically, should be as straightforward for the user as possible, and ideally should not change the user's code at all. So let's see what happened
and what we built over the years. Generally, we wanted to use OpenTelemetry to achieve all of that. When we were brainstorming and trying to figure out how to tackle this problem (what exactly happened, what query was executed, and what the execution plan was), we decided: yes, we want to use OpenTelemetry to capture the interactions.
Why OpenTelemetry, and what is OpenTelemetry? OpenTelemetry is basically a set of SDKs and open standards describing how to shape, send, and process data that captures signals from your application: signals like metrics, logs, or explanations of particular activity that happened, which are called traces and spans in the OpenTelemetry world. So OpenTelemetry can capture that, hey, this is the SQL query that was executed, or that was the interaction you had with some other microservice. OpenTelemetry is basically an open standard defining how to capture and describe those interactions. And OpenTelemetry also provides libraries and SDKs for capturing those signals. Just like in your application you have some logger, right? You have console.log, or just a logger, or System.out.println in Java, or whatever else; you just print messages. And there are many libraries that can take those messages and save them to a file, send them over the network, save them to a database, or add things like a date and time, a timestamp, a thread ID, and other stuff, right? These are libraries that you just use, and OpenTelemetry works the same way: it is basically a concept and a library for capturing metrics from your application. So you don't need to reinvent the wheel, you don't need to figure it out from scratch. No, you just take OpenTelemetry, you use it, and bang, all your metrics are captured.
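As a minimal sketch of what capturing such a signal looks like with the OpenTelemetry Python SDK (the span name and attribute here are made-up values, and the console exporter stands in for a real backend):

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Wire up a tracer that prints finished spans to the console.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo")
with tracer.start_as_current_span("GET /orders") as span:
    # Attributes describe what happened, e.g. which SQL statement ran.
    span.set_attribute("db.statement", "SELECT * FROM orders WHERE id = 123")
```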
OpenTelemetry also provides additional tooling to later process the data, process these signals, and, for instance, visualize them. So we have the tools to capture the signals, and those tools know how to emit them, how to structure the JSON data or whatever else, how to send it over, how to process it, and finally how to visualize it. So this is what we wanted to use. Next, we want to get the
details from the REST endpoint and the SQL, meaning that we basically want to capture something like your REST path (so this is the API that was called: API X with such-and-such parameters) and the SQL statement that your application executed. And once we capture those two things, we can correlate them together, showing that, hey, this API has been called, and this is the SQL query that was executed as part of handling the workflow in this API. Once we have that, we can take the query, go to the database, and ask the database for the execution plan. You don't need to give us the execution plan; we can capture the query and get the execution plan by using the EXPLAIN keyword. So we basically go to Postgres or MySQL or whatever else and send EXPLAIN plus your query. And this gives us the execution plan explaining how the query was executed, whether it was using indexes, whether it was scanning tables, or whatever else.
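A sketch of that EXPLAIN step in Python (psycopg2 against Postgres; the connection string and the captured query are placeholder values):

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app host=localhost")
with conn, conn.cursor() as cur:
    captured_query = "SELECT * FROM orders WHERE user_id = 123"
    # Prefix the captured query with EXPLAIN to get the plan, not the rows.
    cur.execute("EXPLAIN (FORMAT JSON) " + captured_query)
    plan = cur.fetchone()[0]  # the execution plan as a JSON document
    print(plan)  # reveals table scans, index usage, row estimates, ...
```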
And finally, once we get all of that, we want to send it to Metis.
Metis is software as a service, so we send those details to Metis, and we can show you: hey, this is the API, this is what happened, this is the SQL query, this is how slow it is, this is why it's slow, and most importantly, this is how you fix it. That's the idea. This is how we wanted to tackle this problem. And when solving it, we actually went through three stages of three different SDKs. We maintained our SDKs, we changed our approach, and we learned a lot over this time. The first approach was having an SDK per tech stack. So if your tech stack is Python with FastAPI and SQLAlchemy, that's one instance of a tech stack. If you are using JavaScript with the pg driver and Sequelize, that's another instance of the stack. If you are using Java with JDBC, Spring, and Hibernate, that's yet another instance of the stack. So we wanted to build an SDK per tech stack, and we wanted to support many languages: JavaScript, Python, Go, Java, Kotlin, C#, Ruby, et cetera. Many languages, many libraries, many ORMs, many tech stacks to support.
The second approach was that we wanted to reconfigure the database a bit, to read things from the database logs instead of instrumenting everything. And finally, in the third approach, we wanted to utilize OpenTelemetry much more. So let's see what we did, how we did it, and what we
learned. So, the first approach: an SDK per tech stack. The way we wanted it to work was: you take your application, which has some entry point, some web framework, some ORM library, et cetera, and we ask you to install the Metis SDK as a dependency of your application. And then this SDK does the following magic. Whenever there is a request from the user, from the browser, or from an external service coming to your web framework, then as part of handling the request you call your ORM library. This ORM library goes to the database to extract the data, does its SELECT * FROM table, the data is returned, and the data is ultimately returned to the user. But at the same time, a kind of hook fires and an event is sent to our SDK. So Metis configures hooks on the ORM library, on the web framework, and whatever else, to capture the event that such and such query has been executed on the database as part of this particular single flow. Metis captures this thanks to the hooking, then goes to the database to explain the query, gets the data, gets all the traces, IDs, identifiers, whatever else, and finally sends them to Metis.
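To give a flavor of that hooking, here is a minimal sketch using SQLAlchemy's event system; the forwarding part is only a comment, since the actual Metis SDK internals aren't shown in the talk:

```python
from sqlalchemy import event
from sqlalchemy.engine import Engine

# Fires for every statement any SQLAlchemy engine executes.
@event.listens_for(Engine, "after_cursor_execute")
def capture_query(conn, cursor, statement, parameters, context, executemany):
    # A real SDK would correlate this with the current REST request and
    # send it to the platform, e.g.:
    #   send_to_platform(trace_id=current_trace_id(), sql=statement)
    print("captured:", statement, parameters)
```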
So this is the idea. Now, how does it work? In essence, you take your application, you do a pip install of the Metis dependency, and at the entry point of your application you trigger one line of code, something like a Metis enable call, and then the magic we see here on the screen happens. So let's see how it actually worked and what wasn't working.
So generally, this approach was quite good. First, it was very easy to install. You just do pip install, npm install, mvn install, whatever else, and bang, you have all the magic, you have the libraries, and that's it. Second, it integrates with the language of your choice, meaning that if you're writing in Python, you get a Python API; if you are writing in JavaScript, you get a JavaScript API, right? We don't need to change anything beyond that. We don't need to change your database, and we don't need to change your application, because, well, we just need you to enable Metis, and then everything else happens thanks to hooks and whatnot. We just figure out how to plug into your web framework, plug into your ORM library, and so on. And it generally works everywhere: with automated tests, with the actual APIs. When you run the application, it captures the queries, and it can be easily disabled for production, because you control it and can just not enable Metis. So generally: very nice, very easy, and it should work well.
Now, the problems. The biggest problem is that we need to implement a new solution for every single new tech stack. Meaning that if you are using a different web framework, like Flask instead of FastAPI: bang, new tech stack. If you are using a different SQLAlchemy version: bang, new tech stack. If you are using a different driver behind SQLAlchemy: bang, new tech stack. If you switch languages, you go with JavaScript, TypeScript, Java, Kotlin, whatever else: new tech stack, new tech stack, over and over and over again.
So we would need to maintain many, many SDKs, and we can't reuse the code at all. We can't reuse the implementation. Yes, we can reuse bits of it, for instance the part that sends the data to Metis; this can be reused across all the Python SDKs, right? But if you try to reuse the integration with the ORM, or with your JavaScript library, or with your web framework: no way, you can't reuse that. So with every single tech stack we had to reimplement more and more things, and generally the maintenance of that was super hard. Not to mention that even a new version of the web framework or a new version of the ORM library could introduce breaking changes, so we would need to support older versions for a very long time. Another thingy that doesn't work
well in this approach is that we have differences between the dependencies we use. For instance, if we want to send, I don't know, JSON data, there are many different libraries we need to use: a different library in Python (different again between Python 2 and Python 3) and a different library in JavaScript, right? So first, integration with those libraries differs, and second, they have their own quirks in how they behave. Yes, even sending JSON and encoding stuff can be very tricky. But there were also other problems
that, apart from the burden of maintenance and implementation on our end, were also hard in terms of how it all worked. For instance, integrating with OpenTelemetry is not that straightforward. Sometimes, depending on your ORM library, we may not be able to get the parameter values, or extracting the parameter values may be harder because we need to scrape the logs. Or we can't correlate the REST API with the SQL query, because they are executed on completely different threads and don't share any unique identifier. So generally, there are many quirks and many issues around correlating the REST API with the SQL and whatnot. Not to mention testing frameworks.
Some testing frameworks, when you want to spin up your application in a testing environment and then test it, won't initialize the REST structures properly, so you don't know which API is being called. So generally, this approach worked well in essence, but it was very hard to maintain, very slow to develop, and had quirks we had to overcome over and over again. So we decided: okay, this is something we just can't do. We won't be able to support every combination of the web framework, the ORM, and the language. That's going to be too hard and too much of a burden for us. Let's figure out something else. So the second approach we
wanted to take was reading from the database. The idea now is: we have the application, and again there is the REST API, which calls the ORM library. Now, this ORM library conceptually doesn't go straight to the database, but goes through the SDK, which stamps the query. Metis now takes, for instance, the identifier of the REST call and puts it on the SQL query inside a comment. Then this query is sent to the database, the data is returned, and at the same time the Metis SDK sends this information to the Metis platform. So what happens at this point? Imagine that you call API orders, get order by ID, right? OpenTelemetry initializes: this is a new request with identifier, I don't know, 123; some GUID comes here, right? So we take this GUID, we put it on the SQL query, and at the same time we let Metis know: hey, there was a GUID 123, and that was the interaction with API orders, get order by identifier, or whatever else. Okay, so this is what we do.
And later, asynchronously, we have another piece, the Metis collector. That is a Docker container that runs on the side. Asynchronously, it goes to your database, reads the logs from the database, and looks for the execution plans of all the SQL queries. It finds the SQL query, reads the comment on the query which says this is the query for identifier 123, extracts the execution plan, and finally delivers it to Metis. So this is what we do. Basically, we install an SDK into your application that stamps the SQL query with the trace ID, the identifier of the interaction. And then we have another piece that reads the logs from the database, checks for the execution plans based on particular trace IDs, and sends all of that to Metis.
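A minimal sketch of the stamping idea, assuming an active OpenTelemetry span (the helper name and comment format are illustrative, not the actual SDK):

```python
from opentelemetry import trace

def stamp(sql: str) -> str:
    # Append the current trace ID as a SQL comment so the collector can
    # match the plan it finds in the database logs back to the REST call.
    trace_id = trace.get_current_span().get_span_context().trace_id
    return f"{sql} /* traceparent='{trace_id:032x}' */"

print(stamp("SELECT * FROM orders WHERE id = 123"))
# SELECT * FROM orders WHERE id = 123 /* traceparent='0000...' */
```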
Okay, so that was approach number two. What we now have is, again, quite easy to install: one command plus deploying the Docker container, and that's it. And it works; it still integrates with your language, so you still have an API that is language-specific and idiomatic. And we again make nearly no changes to your application code, right? Because you just need to enable Metis and that's it; we can capture everything. And we can disable it for production and whatnot.
Now, the problems. The database must be reconfigured, because you need to enable the database to log execution plans for every single query. That's quite a lot of work. You need to go to the database, change how it logs the data and what it logs, so that the logs you get are sufficient for us to understand what happened.
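On Postgres, that reconfiguration typically means something like the stock auto_explain module; a sketch, with placeholder connection details:

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=postgres host=localhost")
conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction
with conn.cursor() as cur:
    cur.execute("LOAD 'auto_explain'")  # make its settings known in this session
    # Load auto_explain in every new session and log a plan for every query.
    cur.execute("ALTER SYSTEM SET session_preload_libraries = 'auto_explain'")
    cur.execute("ALTER SYSTEM SET auto_explain.log_min_duration = 0")
    cur.execute("ALTER SYSTEM SET auto_explain.log_format = 'json'")
    cur.execute("SELECT pg_reload_conf()")
```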
And this is especially hard when we are dealing with ephemeral databases: databases that you just create for the duration of, I don't know, a unit test, and then you take them down, so with Testcontainers or whatever else. Why? Because those databases, when you just create them, won't be configured appropriately for our needs, right? So what we need to do is step in and change the configuration of the database. But then many times we need to restart the database, and it's very hard to restart the database in an ephemeral context when you are just spinning it up with Testcontainers or whatever else.
So this is hard. Sometimes we had to consider building a specific image of the database, for instance a specific Postgres image that would have this configuration enabled. Again, these are problems that are not easy to solve, and they are highly dependent on how you execute stuff. If you execute it in CI/CD, it gets trickier. If you execute it in GitHub Actions, it gets trickier. If you execute it locally, it gets even trickier. So generally, those things are hard to do. Again,
another issue around this reconfiguration of the database is that it costs money, because if you log everything and your logs explode, then you need to pay for storing those logs, processing them, and handling them. And this gets even harder because those logs take memory, they take space, and processing them costs a lot. So generally, it's not easy to do.
Yet another issue is difficult query stamping, because some ORM libraries are tricky. First, you may not be able to put a comment on the query at all. Second, putting a comment on the query may break some other integrations, for instance with your monitoring solutions. And third, with some libraries, when you send just one single query (hey, get me data from these two tables), the library will generate multiple SQL statements, but you can stamp only one of them, so you effectively miss some queries.
Not to mention the same issues we had with the previous approach, meaning it's hard to reuse the code between languages and libraries. Why? Because even though we do not need to integrate with the ORM to the same extent as before, we still need to integrate with your web framework, for instance, we still need to support many versions, and we still can't reuse the code between Java, JavaScript, Python, and other places. So there were still many issues with this approach. It worked pretty well and was very promising, but it still wasn't quite that easy, it was very hard to maintain, and it posed many challenges across the many languages and technologies we wanted to support. So we wanted to try yet another approach.
The idea of this yet another approach was moving the ownership. You can see that in the previous approaches, we were building a solution that we built, maintained, and owned. Now we want to shift this ownership somewhere else. We wanted to build something for our users that we wouldn't need to maintain, implement, and fix every single time a new library comes out or a new version of a web server comes out.
So what we built now is: we dropped the idea of an SDK altogether. What we do now is: hey, you have your application, and we don't really care what is inside it. This application goes to the database and returns data; no magic here. But now we want you to reconfigure this application slightly and use OpenTelemetry to just send us the logs and traces from your application, so we can capture them inside the thing we call the Metis collector. And the Metis collector goes to the database, extracts the execution plan, and sends it to Metis. How does this work now?
What happened here? We can change the approach now thanks to OpenTelemetry. OpenTelemetry is, as I mentioned, a set of SDKs and libraries that can be used to emit logs, traces, metrics, and other pieces of information. And OpenTelemetry has this fantastic mechanism called auto-instrumentation. Auto-instrumentation is a mechanism that can instrument the libraries automatically, enabling them to send metrics, traces, and logs. Meaning the only thing you need to do is kick OpenTelemetry and say, hey, instrument everything I have, and then it will do the magic.
Okay, so now we want you to instrument your libraries so they send data to the Metis collector, which is a Docker container that runs locally on the same host. But now the question is: okay, how do you trigger this OpenTelemetry? And the best is yet to come. OpenTelemetry can be enabled from outside of the process. You don't need to change the application code at all. All you need to do is set some environment variables and then run your application. And if you have OpenTelemetry in your dependencies, then it's going to work; it's going to trigger itself automatically.
So how do you do that? Well, previously we were asking you to install the Metis SDK, trigger the Metis SDK at your entry point, and then the Metis SDK would take care of hooking into your libraries, extracting queries, using OpenTelemetry to send the data, et cetera. Now we do it completely differently. The only thing you need to do is install OpenTelemetry, which most likely you already have in your applications, because our assumption is that your applications are modern; and if they are modern, then most likely you already have OpenTelemetry. Then you need to trigger this OpenTelemetry and that's it. And triggering OpenTelemetry is as simple as setting some environment variables, and then it goes.
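In Python, for example, that boils down to something like this (the service name and collector endpoint are assumed values; opentelemetry-instrument is the wrapper that comes with OpenTelemetry's Python auto-instrumentation packages):

```python
import os
import subprocess

# Configure OpenTelemetry purely through environment variables...
env = dict(
    os.environ,
    OTEL_SERVICE_NAME="orders-service",                   # assumed name
    OTEL_TRACES_EXPORTER="otlp",
    OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317",  # local collector
)
# ...and launch the app under the wrapper; app.py itself is untouched.
subprocess.run(["opentelemetry-instrument", "python", "app.py"], env=env)
```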
So that's it. Now you just install dependencies once (you do pip install, or add things to your pom file, or whatever else), you put a couple of environment variables in your start script or bootstrap script, and then all the traces are automatically sent to the Docker container that we provide, the Metis collector. You basically need to run this Metis collector somewhere locally, so you can spin it up with Testcontainers or wherever you wish, and that's it. The Metis collector gets those traces; it can extract the SQL queries from them, go to the database, run EXPLAIN, get the execution plan, and send it over to Metis. So this is what we can do. Now, the pros
of this approach: no changes to application code, literally no changes. There is an asterisk here, because it depends on the language you use. You get no changes in Python, no changes in JavaScript, no changes in Java, no changes in .NET, no changes in many languages. But for some languages you do need to make changes; you need to trigger OpenTelemetry manually, for instance in C++, right? So there is an asterisk, but most of the time you don't need to change your application code at all. You don't need to change your database at all either, meaning that we don't need to reconfigure your database anymore; we can just send the EXPLAIN and that's it. And since you don't need to change the database, we can support ephemeral databases, read-only databases, whatever you have; we don't need to touch it.
It integrates with the language, in the sense that the way you enable OpenTelemetry depends on your language and is well integrated with it. It can use things specific to your language: dynamic code execution, additional parameters to the Node runtime, additional parameters to Python, whatever else. So it basically works in an idiomatic way. It can be easily disabled for production, because you just don't let OpenTelemetry send things to us; not to mention you simply don't deploy the collector. And it works. And the best of all worlds is that we
don't own it, meaning that if there is a new version of the ORM library or a new version of the web framework, then it's on them to integrate with OpenTelemetry, because they want to integrate with OpenTelemetry. So if they break something or introduce breaking changes, they are the ones who fix it. From our perspective, nothing changes, because if it doesn't work, it's going to be them who fixes it. The only thing we need to do is maintain this collector that does the magic. However, there are some problems.
Sometimes we need to change the code, depending on the programming language, for instance in C++ or Go. Not all libraries support auto-instrumentation. More and more of them do, and it's obviously for their benefit: mature and popular libraries are integrated with OpenTelemetry and support auto-instrumentation, because it's for their greater good, right? But some of them are not integrated, and in that case we basically need to do some magic, for instance add a few lines of code to extract things with hooks, just like we did before. Sometimes those libraries are integrated with OpenTelemetry in a way we can't use easily, because, for instance, they do not emit parameter values for SQL queries. Whenever you do SELECT * FROM table WHERE column > 10, you don't get this value 10; you only get a placeholder like $1 saying there was a parameter in that place, but you don't get the parameter value. So we need to extract the logs, for instance, and parse those logs to reconstruct the actual query. Sometimes it's hard to correlate the REST with the SQL, because OpenTelemetry can't correlate them, so we can't show you that this was the SQL that was part of this REST API.
Sometimes it doesn't work well with testing frameworks. But generally, this is a pretty good approach, and the most important part is that we don't own it, meaning that if something breaks, the authors of the libraries need to fix it. So, based on those three approaches and on the history of how we evolved those SDKs across many languages, this is what we learned. We learned that uniform functionality is crucial, we learned that version management is crucial, and we learned that diverse languages, idiomatic approaches, and other stuff are hard to keep on track. So let's see what we actually learned. First: uniform functionality.
Whenever you deal with those SDKs (SDKs for different tech stacks, SDKs for different languages, SDKs depending on a particular version of a particular library), you learn that, hey, those languages are different. Some languages have static typing with compile-time type checks; others have dynamic typing, or can change the types of variables, or whatever else. Sometimes you have generics, sometimes you don't. Sometimes you have macros, sometimes you don't. Sometimes you have dependency injection, sometimes you have aspects, and sometimes you don't. Sometimes you can generate code on the fly or even execute code from a string, for instance with eval in JavaScript; sometimes you can't. So whenever you deal with many languages and you want to keep your SDK uniform across them, you need to decide: okay, do I want to embrace the additional features of the programming languages? Or maybe I don't want to do that, and what I want instead is to keep my SDK implementation as primitive as possible, so that all the features our users need can be implemented in every single language. So you don't use generics, you don't use macros, you don't use dynamic code execution, or whatever else, right? Those are things you need to take into account. But sometimes you can't do that.
Sometimes you really need to rely on the particular language, because you need to integrate with the ecosystem of that language. If you need to integrate with ORM hooks, then generally it's not easy, because you need to rely on the particular ORM implementation, right? So there are many things you need to consider when doing these SDKs for many languages. For instance, can you even represent your data structures the same way between languages? If you don't have generics somewhere, then you won't be able to represent those data structures, right? If you have class-based inheritance, like in Python or Java, versus prototype-based inheritance in JavaScript, then how do you implement your data structures the same way? Super hard. Another thing is: can you use the same protocols for communication? Can your SDK communicate over a network using the same protocol? Do you have JSON support in all the languages? Most likely yes. Do you have gRPC support in all the languages? It may get trickier. Do you have, I don't know, some specific proprietary protocol support in all the languages? Definitely not.
You need to think: are there any implementation differences between the languages that would affect how your SDK gets initialized or installed, or what things you can use? Can you use private data? Can you use public data? Et cetera, et cetera. How do you even deal with evolving the schema that you use to communicate between your SDK and your software-as-a-service platform? How do you introduce optional fields? Can you add optional fields as a dictionary, or can you put them in as a pass-through? How is this going to work with the various libraries that deal with that? You need to answer all those questions. Meaning that using an idiomatic approach is generally much harder and much more time consuming, and using specific things like generics, et cetera, is not reusable between languages. So yes, you can implement every SDK differently, but those things are not easy to translate between languages, so it increases the burden of maintenance.
And finally, the documentation. How do you write documentation for different SDKs in different languages? You would like to have the same documentation, right? The same parameters, the same APIs. And to do that, you need to have exactly the same functions across SDKs. So generally, in order to make the functionality uniform and easier to implement and maintain between tech stacks, you need to drop support for many language-specific things, language-specific extensions or constructs or whatever else. To keep it maintainable over time, you need to keep it as simple, as basic, as primitive as possible.
Another thingy is what protocols to use. JSON may sound like a great solution for everything, right? You just use JSON; any language can speak JSON, any language can use it. What could go wrong? Well, implementations of JSON libraries are different. If you need to deserialize data, sometimes you need metadata on your JSON, and sometimes you need to write the deserialization manually, over and over again, for each language. Sometimes JSON is handled poorly in different languages, depending on the escaping, on special characters, on character encoding, et cetera. Not to mention there are even differences in HTTP handling between languages. So JSON, while it sounds easy, is actually hard to maintain over time.
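A tiny Python illustration of that pain: even the standard library refuses common types out of the box, so every language ends up with its own hand-written serialization rules:

```python
import json
from datetime import datetime, timezone

payload = {"query": "SELECT 1", "executed_at": datetime.now(timezone.utc)}
try:
    json.dumps(payload)
except TypeError as err:
    print(err)  # Object of type datetime is not JSON serializable

# The usual fix: a custom encoder here, and a mirror-image decoder elsewhere.
print(json.dumps(payload, default=lambda value: value.isoformat()))
```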
gRPC, on the other hand, is way easier, if you can use it in your language. If you take the languages you want to support and you have gRPC in all of them, it's going to be easier. Why? Because in gRPC you define your schema just once and then you don't care: the gRPC library takes care of generating classes, generating structures, serializing the data, and even minimizing the network usage. So generally, when you need to pick between JSON, other standards, or gRPC, consider things that have bindings for all your languages, where one single entity maintains those bindings. It's way easier to deal with that. You don't need to go and look for a different JSON library in each language; just go with gRPC, or go with something that supports all your languages. And similarly,
the same goes for protocols. Should you
use an open protocol, an open standard, or a proprietary one? Do you want to implement your own protocol and your own data structures, or do you want to take some open standard, like OpenTelemetry for instance? With a proprietary protocol, the biggest advantage is that you can send anything you need, and only the things you need. You don't get noise, and you get the things you need in exactly the shape you need them, right? But the problem with proprietary protocols is that you don't have libraries to deal with them: you need to maintain your data structures, you need to maintain your communication, you need to implement it all. And if you want people to help you, say from the open source world, they won't be able to. So generally, go with open standards, because users will have libraries for them, users will know how to use them, and you don't need to own them. The downside of open standards is that sometimes you need to squeeze your structures into those open definitions so they can be delivered and handled between languages.
So generally, whatever you do, just don't try reinventing the wheel. Don't build your own stuff; keep it basic, use open standards, and that's it. This way you minimize the amount of stuff you need to maintain over time. The next thingy is
version management. We do have semantic versioning, right? We have the major version, the minor version, and the patch version, and we can use them to indicate what has changed. But now comes the problem. Okay, what if I have SDKs in different languages? Do I bump their versions consistently, or do I bump each version independently? How do I know whether the Python SDK in this version supports the same set of features as the JavaScript SDK in that version? How do I correlate all of that? How do I adopt new features from a language if I want to use them? Do I need to bump the version across all the SDKs or just one SDK? How do I keep track of my versions? How do I test the versions? How do I do all of that? There are many things you need to
consider here. How do you add a new feature to all SDKs at once? Do you keep features consistent across SDKs, or do you let them live independently? What's your release cadence? Do you release a new version for all languages at once, or can they go out independently? How do you deal with things like, I don't know, logging across technologies, or with language-specific options? Yeah, all those things are very hard to deal with.
But generally, what we learned is: first, whatever you do, you need to keep your environments tested as much as possible. You need to test the stuff in a reproducible manner, so you can take things and reproduce them locally, in the cloud, or in CI/CD. So generally: Docker, Testcontainers, Nix, and other tricks that maintain those versions for you. You want to run the tests across all languages for every single change. And if you find a bug in one implementation, in the Python implementation for instance, then most likely the same bug is there in the implementations for JavaScript, Java, and whatever else. So always look for the bug in those implementations everywhere; you need to try reproducing these things everywhere.
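As a sketch of what reproducible means here, Testcontainers lets a test pin the exact database version and spin it up the same way on a laptop and in CI/CD (the image tag and query are arbitrary):

```python
import sqlalchemy
from testcontainers.postgres import PostgresContainer

# The Postgres version is pinned explicitly, so every run is identical.
with PostgresContainer("postgres:16") as postgres:
    engine = sqlalchemy.create_engine(postgres.get_connection_url())
    with engine.connect() as conn:
        assert conn.execute(sqlalchemy.text("select 1")).scalar() == 1
```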
And ideally you would like to have a common test set for all the SDKs. So you don't have language-specific tests, and you don't have different sample data for testing. No, you want those tests running uniformly across all the technologies, so it's super easy to maintain, and whenever you need to introduce changes in one place, you know how to apply them in the other places as well. Another thing: consider using a monorepo to keep all the packages and libraries you want to use, and use tools for that; for instance, use Lerna in JavaScript to keep those things in place and under control.
And be explicit about your dependencies. Never use transitive dependencies that you do not control, or versions that can be bumped without you knowing about it, because then things may break accidentally and you'll have no idea why. Be explicit about dependencies. Have as few dependencies as possible, and control the versions of your databases and of your dependencies. Generally, use as few dependencies as you can, so you don't cause conflicts between different versions across your SDKs, or between your SDK and the user's code. So be very explicit about that. Run your tests constantly, and keep them uniform and as simple as possible. Basically, treat the SDKs for all the languages as one single SDK; that's the easiest way to keep it all in shape and maintain it over time. And finally,
the diversity of the languages. Languages are different. And it's not about the languages per se; it's much more about the ecosystems of the languages: dependency management, the quirks of the platform, the way you deploy stuff on the platform, the way things evolve, how .NET Framework changes into .NET Core, how things get dropped and support gets lost. Those things are hard for one person to grasp; one person won't be able to do it, one person can't understand all the ecosystems. That just doesn't work. So what worked for us is having a language champion. For every single language we wanted to support, we had a designated person, a language champion: the person who knew the ecosystem and had to stay up to date, on top of all the changes in the language and in the platform that could affect our SDKs. Whether it was a change in, let's say, ORM versions, or a change in SQL drivers, or a change in dependency management in a given language, or a change in other things like build systems, or a change in the features the language supported, right?
The language champion was supposed to stay on top of that, and they had to push this knowledge and those updates onto the rest of the team. So we had regular updates, weekly meetings, where we discussed: okay, what new things happened that could affect our SDKs, and what broke in those SDKs? What happened in the Python SDK recently that broke it, and that we think could break the other SDKs, or that may affect how we want to evolve our SDKs over time? Having this language champion was what let us actually deal with that stuff. So that was
basically it when it comes to what we did. So in summary: we went through this evolution of SDKs, and the lesson learned, always, is that the best thingy is the thingy you don't own. So minimize the set of things you need to own and maintain. Minimize the number of features, minimize the diversity between languages and the sample data you use; generally, keep it as small as possible and under your control, and test things constantly, in a reproducible manner, as much as possible. And have a language champion who can help you maintain those SDKs across the planet. So this is what we did, and thank you for listening.
Drop me a line if you have any questions. Join our Discord. Take a look at the webpage. My name is Adam Furmanek, and thank you for watching this session. I hope you enjoyed it. Thank you.