Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, and welcome to this session on compelling
code reuse in the enterprise. This may seem like a funny
introductory slide to get started with, this person programming in
the background who is clearly doing it anonymously,
or in the dark, but in reality, sometimes this can be us,
right? Especially from a code reuse perspective, you might have experienced
in your organization the reality that it simply wasn't valued to
think about code reuse, because you work in a distributed microservice world and maybe that just
didn't fit. Maybe you saw the value, and the only way you got
it done was to be that person who added the extra value at
nighttime. And I've definitely been there myself, where I've
pursued this with a passion and seen how code reuse can
be very compelling, be very valuable. But honestly, it can be
very difficult to know when to reuse and how to reuse effectively
without causing a bottleneck or a problem. So that's what this talk is really all
about, focusing on how do we identify reuse properly, and then looking
at a key example through service templates and service chassis that can provide a
really helpful way to reuse code in an effective way, especially in a
distributed microservice world. There's a quote I love from Eric Raymond,
which says: good programmers know what to write, great ones know what to
rewrite and reuse. Because understanding when to rewrite something,
and, for our discussion today, when to reuse something, is pretty difficult.
Just as we get started, a little bit of information about myself. My name is
Travis and I work as a principal software engineer for SPS Commerce.
You probably haven't heard of SPS Commerce. We're a business-to-business organization that
focuses on exchanging invoices and purchase orders between retailers
and suppliers. We're the world's largest
retail network and have hundreds of thousands of retailers and suppliers
in our network. And my focus specifically there is on developer experience.
And you might initially ask yourself, what exactly do you mean by developer experience?
It's a pretty loaded or overloaded term these days,
and specifically when I talk about developer experience, I try
to define it and put a boundary around it. And it's a relatively new term.
It's developed over the past five or six years. You might be thinking
it's kind of like developer advocacy or developer relations,
and it definitely is tangentially related to that, but it's something that's slightly
different than that. And this is a great definition that I've seen from the
appslab.com, which is developer experience is the activity
of sourcing, improving and optimizing how developers
get their work done. And so when we think about how developers get their
work done, in a lot of cases we're thinking about the Persona of the developer
and how they're moving information to production using the development principles of
your organization. So that's where we tie together the experience along with the development
principles which are very unique to your organization, really form
this Venn diagram, this concept of developer experience.
The reality is that engineers in different organizations,
depending on how old your organization is, have really fought to
understand what tools they should use. Is there standardized tooling that they should
use for CI/CD, for observability? In some cases, yes. In other
cases, no. There's just a plethora of different tools they could use. And so your
engineers and your leads are fending for themselves and picking and choosing the right tools
within this jungle of tooling that may exist. And in
many cases these tools have come online out of necessity
and need, but they haven't really been thought through on the complete experience on end
to end moving something to production and how they integrate with each other.
And that's why we think of this quote as fairly helpful for understanding the
problem that developer experience solves, which is developers work in rainforests,
not planned gardens. The idea that you never had an opportunity
from a greenfield perspective to plan exactly what your
delivery process would be like and the tools involved in it. But rather
it grew out of many different requirements and needs over
the years. And so developer experience is really starting to take a look at these
particular areas and starting to hone in and focus on some of them.
At SPS commerce, we think of developer experience in terms of capabilities.
Capabilities help us put a bit of a boundary around exactly
what we're discussing and exactly what is involved in developer experience. So these
are identified horizontal fast tracks to be curated for maximum
productivity within the organization. And so if we draw the organization
like this, in terms of development, operations, cost, security, we begin
to draw these horizontal fast tracks. In developer experience,
one might be a very common one which is building and deploying a
new feature to production. How do I do that? Am I using feature flagging?
What am I using for CI/CD? How does all that integrate together? How do
I observe the metrics at the end of it? It might be building and deploying a
new application from scratch and having a developer portal to help you through that experience.
It might be API design which can be a large part of developer experience in
how we consume and how we produce APIs.
Or as we move towards our discussion today, it might be
code reuse: how do you think about code reuse, how do you think about InnerSource
within the organization? How do you know when it's appropriate and effective
to think about reuse? And of course, developer experience and code
reuse go hand in hand so closely as we think about it.
And code reuse is defined as the act of recycling or
repurposing code parts to improve existing or to
create new software. We write it once, we use it multiple times, right?
Pretty straightforward. And as we think about the stack
and the technology that we've all built our professions and our careers on,
it slowly grows here: your low-level machine,
container runtimes, the OSI model for networking,
the application runtime that you're using. The web application framework
you're using is a library, a piece of shared code, much like you might develop,
but yours would be more focused internally on your domain.
And so when we think about developer experience, we're actually thinking about the entire usage
of all of these stacks, everything from your local IDE, what you use and
how you use it, all the way up to how you're building software
on top of that, and what are the specific pieces of reusable code
within your own custom domain. So today we're focused
here on this particular horizontal, which is your custom domain,
your logic, your information that is specific to how you do things
at your organization, your best practices, your tech principles.
And the title of this particular talk,
Compelling Code Reuse, intentionally uses the
word compelling, and it has two varying definitions
associated with it that are both very relevant for what we're discussing. And the
first is to force or push towards a course
of action. And my goal today is not to force or push you, nor would
it be for you to go back and force your organization to do code reuse,
but rather thinking more appropriately on the second definition, which is having a
powerful and irresistible effect, requiring acute admiration,
attention, or respect. In other words, if we're doing code
reuse correctly, which I hope I'll have convinced you of by the end of this presentation,
that there are appropriate ways and compelling ways to produce and use
reusable code, then it is an irresistible thing. That mentality
of whether we should do it or not, especially within microservices, begins to shift
and change. So let's go back to basics
a little bit and think about theoretical code reuse. And here we're just simply talking
about the general day-to-day practice of how you incrementally share code and
move it across your particular project and the software that you write.
The first is a simple module. A module could be a class,
it could be a project, it could be any level of granularity that you want.
And of course inside that module we have a function, function a, which
we reuse inside module b. And that's pretty straightforward. We can already reference that and
make use of it. But if you're working through an n-tier architecture for whatever
application you have, that has layers for different reasons, architectural purity
and that sort of thing, then module c may want to reference function a.
And it wouldn't be appropriate for it to reference it directly, because it
should have no knowledge of module a, which has the function in it.
So you do what you would normally do: you abstract, you pull out,
you create different reference components for it, and you create this
module d and put function a in it.
And of course that creates a standard project that you have and you're happy,
you're reusing that. And that was pretty easy to do. And there's not a lot
of concerns on what it would take to refactor that as you launch a
new project. And that project might exist in the same repository
or same proximity or location. Of course, if you want
to use function a there, you would have to abstract and pull out function a
into a new project and then have both these projects reference it. It wouldn't be
appropriate for project b to have full reference or understanding
of project a. And as I said, these are inside a
similar boundary within a repository, and that's pretty straightforward to
do. That isn't a big concern. That allows me to refactor and move quickly.
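To sketch what that abstraction looks like in practice, here's a minimal example in Python, where the module and function names are hypothetical stand-ins for "module d" and "function a" from the slides:

```python
# shared_utils.py plays the role of "module d": function a lives here,
# so neither consuming project needs any knowledge of the other.

def function_a(raw: str) -> str:
    """A stand-in for 'function a': normalize an identifier."""
    return raw.strip().upper()

# In project a you would write:  from shared_utils import function_a
# In project b you would write:  from shared_utils import function_a
# Both projects reference the shared module, never each other.

print(function_a("  ord-123 "))  # ORD-123
```

Inside one repository this refactor really is that cheap, which is exactly the talk's point: the cost only appears once the shared module has to cross a repository boundary.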
But a repository, like other natural
barriers, is where we begin to find code reuse difficult. When that's
in another repository now it becomes a really interesting question,
a much different question, which is, how do I effectively reuse
module a, which has function a in it, knowing that it's
not accessible to me? How should it be distributed? How should we
look at it? So that's why when we think about code reuse
and we think about many things, they get much harder as they become a distributed
problem. And so today we're really talking about the pattern of distributed
code reuse, and it's always harder at scale, right?
We always see that it's always harder to work with
microservices as they grow in terms of the volume of them than it is
having a monolith. But of course there's a lot of advantages to that, and in
the same way, there are a lot of advantages here. We just need to understand
them and the characteristics of them correctly. So if
we were to take an example to try and understand the complexities we might encounter
now from a distributed code reuse pattern, let's look at a couple
different microservices. We might have service a,
service b and service c, and in these microservices we'll
want to reuse our function a. And of course, the easiest way
to do that is to distribute it in a package. It might be a Maven package
deployed to Maven Central, or a package you've pushed and deployed to
your internal JFrog instance. And as you begin to roll that out,
there's a creator of function a that puts it together, the team that owns it,
and there are different teams consuming it, and they may or may not have knowledge of or
understand who that team even is. But we begin to see a lot of drawbacks
here that people start to encounter, and that's why this is maybe discouraged
in a lot of places. The first is large dependency
chains,
especially in Java applications, right? That's no surprise. We can
have a lot of dependencies here. And as those dependencies begin to
grow inside function a, service a now has no
choice but to bring those dependencies along for the ride. The initial
dependencies that they brought in with function a may have grown over
time as they upgrade versions. And so we see what we call the dynamic
equilibrium, or this shift over time of technical debt that can be
built up as a result of that. And in some cases, the dependency chain can be
pretty large and can be pretty dangerous if you're not familiar
with who created the package. We also see enhancement friction
become a problem where one of the services decides it needs a very small update
to function a, to introduce more flexibility or a new feature or capability of
it. And of course the team that owns function a may
or may not be aligned to that change. They may also decide that they
don't have the time to look at your pull request or look at that change.
And so it becomes very difficult now for a small change to happen, which isn't
something that we value in microservices, where we can make changes and deploy very quickly
as part of, in some cases, your standard two-pizza team.
Of course you have administration and cost and ownership involved here.
Who actually owns this as it grows, who's going to continue to maintain it?
It's all fun and happy in the first six months, but after that
it's maintenance work. We have to continue to update it and patch it.
And the team who originally pushed it out, are they willing to take that
on? Do we even know, or have SLAs or expectations for them? And that leads
us into roadmap politics and opinions. And whether
the team that owns function a really thinks it should even proceed in that particular
direction or not, that a particular service wants to take it,
it can deviate and begin to change. Do we understand the roadmap for it?
And of course we think about exponential flaws and vulnerabilities. Well, we have the advantage
here of having a single reusable piece of code. If we're not maintaining
it, we're not supporting it and staying up to date with it, well, now we're
accumulating vulnerabilities and flaws much faster than if
the team had to be aware of what that logic was in that code
themselves. And of course, over time, this can be compounded
by multiple active releases, meaning that the team has quickly pushed out
version two, but there are lots of services on version one. If there are fundamental
differences or changes between those major versions, can a team
even easily upgrade to version two? And so you end up maintaining
one, two, three major active versions that all
have to be handled, and that just compounds the time that needs to be
spent. And of course that gets even worse with competition.
So as a service decides that there's too much friction here, I can't
get what I need, the dependency chains are too large, I can do this better,
sure, they go ahead and they try to do it better. They create another version
of the repository, they manage it, and quickly the same thing happens
for consumers of that repository. But now the problem is worse, because there are
two function a's, a copy of function a.
And of course this maybe is an accurate picture in your organization.
But in a lot of organizations, especially with some of the values of microservices,
a polyglot ecosystem is a very real scenario
that you might encounter. So as we see that here, service d might be a
.NET 6 service, and it might be consuming, or
intend to consume, the same function a. But of course it can't.
And so rather than distribute it via a reusable package,
we often might say, well, this is where we should have just created another microservice
that can consume that, and it can be agnostic of the language. We don't have
to care. And it can begin to move away from some of these problems.
But the reality is, and what we're going to discover today is some of the
best things, some of the most appropriate things that we want to reuse through distributed
packages cannot be made into a microservice.
They are things that are used for cross functional concerns and for
technical bootstrapping. And so we can accomplish, or I
should say we can overcome a lot of these scenarios, but we have to apply
very specific intention. We have to know exactly what
we're building in order to do it effectively. And this requires intentional,
necessary effort. We don't get distributed reusable code for free.
And so we then understand a bit better
this myth, and the reality, of why we
should not share code within microservices. In fact, diving into some of
the most popular books over the last few years, one in particular that's always
been close to me has been Building Evolutionary Architectures, which says: microservices
eschew code reuse, adopting the philosophy of prefer duplication
to coupling. Reuse implies coupling, and microservices architectures
are extremely decoupled. These are opposite characteristics I
don't want to couple together. So we're going to have to think about coupling pretty
heavily when we talk about distributed code.
Of course, as we read further in the same book, we find that code reuse
can be an asset, but also a potential liability. So we understand the liability
portion of it, but also recognizing that if harnessed correctly,
it can be a big asset, making sure the coupling points introduced in your code
don't conflict with the goals in your architecture. So we're going to have
to expand. We're going to have to understand that a bit more as we dig
in. But the question remains still then, do I duplicate
the code or do I reuse the code? Understanding that you're going to have to
make this decision, especially if you haven't made it yet in your distributed service
system. And of course, if we think about copying code
versus reusing code in terms of the law of diminishing
returns, we see some interesting characteristics
developed too. If you're not familiar with the law of diminishing returns,
it's a principle stating that profits or benefits gained from
something will represent a proportionally smaller gain as more
money or energy is invested in it. So what we mean by that is you
might have a line that looks like this, comparing cost and resources for copied code.
I copied the code initially, and that was pretty easy
to do. In fact, it was a lot easier because I didn't even have to
write the code in the first place. So using this with one or two
additional resources, that is, copying it to the different places
that you'll put the code, it actually gets cheaper. But eventually, as you
have to maintain that piece of code and you've copied it across four,
five, six services. Now, when a change comes, or when it
has a package or dependency that it needs, and you're keeping that up to date,
this begins to start to cost you more than if you had distributed in the
first place. And so we see that change then in the
cost start to rise pretty substantially as we increase the number of resources.
So we have to consider the fact that depending on how many times I need
to reuse this code, it can be a lot cheaper or it can be a
lot more expensive. Compare that inversely then to reusable
code: we see the opposite characteristics up front. The first time I
want to reuse something, but I want to make it reusable code and distribute it,
that actually costs me more than the second time I do it.
But the reality is that it's probably even much more than shown
in this curve. That initial inclination to use distributed
code can be pretty costly if you're not sure of your projected outcome and where you're
heading with this. So let's keep these characteristics and
diminishing returns in mind as we consider code reuse. And for today's
discussion we're really talking about then how do we mature our code reuse practices?
And we don't have time to dive into all of the characteristics and
dimensions today, but we're going to first look at coupling.
Coupling is the best gauge that we have to understand how others
might make use of our particular piece of distributed code, how coupled would they
be to it. And the second part of that then is assuming we've made
the decision to reuse code, let's look at a scenario that can
offer us, in some cases appropriately, highly coupled
scenarios that are actually effective in providing value to the organization through templating.
And specifically we'll look at service templates and service chassis there.
So, diving first into coupling, let's understand
exactly what we mean by coupling. It's fair to always just jump back
into a definition for that out of the book we've been working with, which is
building evolutionary architectures. It defines coupling as how the
pieces of the architecture connect and rely on one another.
And that's helpful. But I think that actually for once, the Wikipedia
definition is even better here, which can help us break this
down. Coupling is the degree of interdependence between software modules.
So number one, a measure of how closely connected the two routines or modules are,
and number two, the strength of the relationship between the modules. So we're
starting to blend and move into the area of domain-driven design, and how close
the different domain aspects are to this. So let's dive in.
We'll talk about reuse, talk about duplication, and then we'll have a little
bit of a dive into reference
and reference code. But first, to get started, when we think
about coupling and code reuse, let's look at an example. And this is a real-world
example that I've worked through, that I've lived and felt the advantage
and the pain of: it was building an S3 multipart upload.
And this S3 multipart upload seemed like a distributed piece
of code that we wanted to write to make use of in a couple of services.
It was going to be a .NET-based project and it was going to be
used in at least two services, and likely several more after that.
And so we had a clearly identified need and we knew what we needed to
build. And it was a bit more low-level code than our business
logic, or the business logic of our services, would care about. Meaning that it's S3
multipart upload: it's chunking, it's streaming, it's using
buffers, it's resetting buffers and streams, that type of code,
which is fine to say, I'd like to make that reusable. That makes sense.
It is a little error prone. If you don't reset those buffers or those streams
at the right part as you're chunking and calculate the number of bytes properly,
then it can be problematic. And because of that, it's also difficult to test because
we don't actually have anything other than maybe LocalStack or a
mocked version of it to use locally. And so performing and writing integration
tests and ensuring that it actually functions against the real thing is not something that
we want to distribute across all our services if we don't have to.
And this seemed like a really well identified piece of code we wanted
to distribute, but it actually broke down and caused
a lot of the pain that we talked about earlier. Why? It
was inflexible and specific. We didn't correctly understand
the contract or the interfaces that we needed, with only having the two services to
develop it with up front. The module began to grow. It turned
into a different set of roadmaps and opinions: from just "here's a multipart
upload component" it turned into a cloud package which had much more in it, which grew
the dependency chain, part of that dynamic equilibrium problem.
And of course then it really became this proprietary library. And this
proprietary library then required us to use it the way it was meant to be used,
which is, we needed a download service. And so the download service wrapped
the S3 multipart upload, and you needed to use the download service and
the download config object in order to pass that into the cloud package, which
then used and instantiated the S3 multipart upload. That was
a mouthful. And that's one of the problems here: it required
some deep understanding of this that I really didn't need to have just to
use an S3 SDK. And as a
result, then, we also saw the dependency chain became a problem. We were
on one particular dependency, very, very coupled. In this
case, it was AWS SDK version 2 versus version 3 in .NET,
which is substantially different. And importing and using two different
versions of the same SDK in the same app domain in .NET is a fairly difficult
thing to do. And at the end of the day you ask yourself,
well, there's some friction there, but maybe what we identified
as the benefits was more valuable than what we lost. And the reality is
it wasn't. And that's something that's hard to gauge without experience.
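The chunking and byte accounting described here, which is exactly the error-prone part, can be isolated into a small pure planning function. This is a sketch in Python for brevity; the `plan_parts` name and shape are hypothetical, and a real implementation would then feed each planned part to the AWS SDK's multipart upload calls:

```python
# Minimal sketch of multipart-upload byte accounting (illustrative only,
# not the proprietary library discussed in the talk).

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 requires at least 5 MiB per part, except the last

def plan_parts(total_size: int, part_size: int = MIN_PART_SIZE) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that exactly cover total_size bytes."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size below the S3 minimum")
    parts = []
    offset = 0
    while offset < total_size:
        length = min(part_size, total_size - offset)  # last part may be short
        parts.append((offset, length))
        offset += length
    return parts

# A 12 MiB object with 5 MiB parts should plan 5 MiB + 5 MiB + 2 MiB.
print(plan_parts(12 * 1024 * 1024))
```

Keeping the arithmetic pure like this is one way to get the testability benefit the talk wants without coupling consumers to a wrapper service, a config object, and a specific SDK version.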
But at the end of the day, when you look at it and you think
you saved 100 lines of code, that's a good indicator,
a good gauge to say, wow, maybe this wasn't the way that we should have
gone. So with that example in mind,
what are some reasons when we should want to
reuse? Well, emerging need was one. We had an identified emerging
need in that example, and that was appropriate. But understanding the maturity
of your organization is important because emerging need can be very different for someone
in a new organization versus an older organization.
Your roadmap and future plans may show different emerging need or problems
from others, and so identify what
that need might be and continue to look for other characteristics here that
we'll see as well. Emerging need is not enough.
High duplication, though, is interesting. And we identified that we had two places of
duplication, but maybe that wasn't enough for us to really identify the contracts
that would be in place to make it flexible enough. And so in this
case, I often think about the rule of three, at least. So you need to
have at least three places in the wild where this exists, not places
that you're planning for, but places that do in fact exist, because requirements and code evolve and
change. And so it's not theoretical. I need to actually have three
places this exists in production to evaluate and say, I see
it's exactly the same in three places, or it's slightly different, and I can build
a contract to build in that flexibility.
You have to think about high complexity as well. If something is really complex
and there is additional overhead to building it and duplicating it,
that might be a characteristic to say I should actually move towards building
that sooner. And of course
it might be high risk. And when I think about high risk, I often think
about authorization-type code. We're often taught and
told not to rebuild authorization mechanisms where we don't need to, but to use the
stuff that exists in your organization rather than building it
each and every time, because that can be high risk. And authorization could be a
good indication of something that is reusable. But you might
also look for stuff that has a high change frequency.
So we're not looking for stuff that changes heavily because it's different every time
you use it. You're looking for something that you have to change often,
but that's used in many spots exactly the same way. That would
allow you to change it in one spot, test it in one way, and then distribute
it out to a lot of those places. Of course, at the
end of the day, going back to our initial characteristics in looking at this,
we have to ensure that we have low coupling on the architectural dimensions
that you care about. And those dimensions are going to be specific to your application.
And so if it's important for you to move fast and not be tied to a
particular dependency, then we need to ensure this doesn't carry that dependency.
It might just mean generally using a lot fewer dependencies in your
particular package. You don't necessarily need that left-trim package in
order to do a left trim, right? You can write something else
for that internally, or copy it
inside your particular code base and distribute it.
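To make that left-trim point concrete: Python actually has this built in as `str.lstrip`, so the sketch below is purely a stand-in for any trivial utility you might otherwise pull in as a third-party dependency (a nod to the famous left-pad incident in the npm ecosystem):

```python
def left_trim(text: str, chars: str = " \t\r\n") -> str:
    """Trivial utility: drop leading characters found in `chars`.
    Small enough to own internally instead of adding a dependency
    and its whole transitive chain."""
    i = 0
    while i < len(text) and text[i] in chars:
        i += 1
    return text[i:]

print(left_trim("   hello"))  # hello
```

A helper this size costs almost nothing to copy and test internally, which is the point: the coupling cost of a dependency should be weighed against how little code it actually saves.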
One of my favorites too, that is a great place to identify reusable code
is for best practices and principles. Anytime that you can
codify your best practices and your technical principles and move them into your
code base and then distribute that, that's a huge win that spreads across the organization.
And so you're seeing materialize then a series of characteristics that lead
us to a clear set of reusable functionality, which is technical and cross-functional
concerns in a distributed world. What do we mean by that? We're talking
support code, we're talking authentication, authorization, standard configs,
platform-level features and SDKs that need to be integrated
with. Logging and monitoring can have a great
representation here. We'll talk about that in a minute. HTTP client SDKs
and wrappers are also very important. Those are things that don't necessarily change,
but can provide a lot of value. Error handling and validation:
you don't need to do that differently in different app domains in a lot of
cases. And of course, serialization: your serialization
can be standardized in more interesting ways across the organization.
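For the error-handling concern in particular, a standardized payload can be as small as this sketch of an RFC 7807 style problem-details model (a standard that comes up again later in this talk). The member names come from the RFC itself; the class and the example URIs are hypothetical:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ProblemDetails:
    # Standard RFC 7807 members for an application/problem+json body.
    type: str = "about:blank"
    title: str = ""
    status: int = 500
    detail: str = ""
    instance: str = ""

    def to_json(self) -> str:
        return json.dumps(asdict(self))

# Hypothetical example, loosely modeled on the RFC's own sample.
problem = ProblemDetails(
    type="https://example.com/problems/out-of-credit",
    title="You do not have enough credit.",
    status=403,
    detail="Your current balance is 30, but that costs 50.",
    instance="/account/12345/msgs/abc",
)
print(problem.to_json())
```

Distributing one small model like this means every service emits the same error shape, which is exactly the kind of cross-functional concern that reuses well.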
Coming from SPS Commerce, then, we've had the opportunity to
distribute a lot of these types of things in reusable code,
and we use a monolithic, or I should say a monorepo-style, approach
to that, where we build out a lot of these reusable modules in a shared
repository in GitHub. And one, for example, is logging. We have
an opinionated logging structure where we look for consistency of operations,
dashboarding, and review, meaning that we push out this structured JSON log format
that is the same across all of the particular applications we
install it into. And that means that the log format is the same.
Our operators, whether it be our engineers themselves who are monitoring production
or other teams entirely that want to look at it, they have an immediate understanding
of the log format that's there. Dashboards can be made in
a reusable way as well, because they can use the existing log format for it,
and it's very quick and easy to review as needed.
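As a sketch of what one opinionated, shared log format might look like, here's a small JSON formatter for Python's standard logging module. The field names are illustrative assumptions, not SPS Commerce's actual schema:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit every record as one structured JSON object, so operators
    and dashboards can rely on the same fields in every service."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)

# Wiring it up in a (hypothetical) service:
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("orders-service")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("order received")
```

Shipping the formatter as a shared package, rather than copying it, is what keeps every service's output identical when the format inevitably evolves.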
Errors can be another great example where, if you're using an API design
first approach, do you have standardized error formats for
your APIs that you're pushing out? There's no reason every service,
every microservice you have, or deployable unit, should be building that
on its own, or worse yet, just using completely its
own strategy or unique schema. We should be able to standardize. And if
you're in the API design world, you're familiar with RFC 7807, which proposes
a standardized way to do that. Now you need to represent that model
and distribute it. Identity: identity is our
package that allows us to handle authorization and authentication in a standardized
way, tested in a single spot, and distributed in a highly
effective way. And serialization, as I mentioned earlier
in the last slide, serialization is not something that necessarily has to change
between service to service. In fact, you can find a lot of additional interoperability
and capability by just having that taken care of. And I'm not just talking about
are you choosing camel casing or snake casing. I'm talking about more interesting things around
how you handle enum serialization, or how you handle
nulls, whether you ignore them or add them, what you do when you want to
ignore something, all sorts of
detailed serialization questions that are often overlooked,
and of course, secrets. At SPS Commerce, we use the AWS secret
manager, and we have a very custom and proprietary way that we use it
and organize it in a multi-account, cross-account world,
sorry, I should say multiregion, cross-account world. And so
what we're seeing here is that there's still some level of coupling to these
aspects as you pull these into your service. But this is
where we move in and we talk about the concepts and the characteristics around appropriate
coupling. Because you see, you might think, or you might believe
that the term coupling is always bad, I always want low coupling, when in
reality there's also appropriate coupling. And if you're asking what appropriate coupling
is, let's go back for a definition, which is: dimensions of the architecture
that should be coupled to provide maximum benefit with minimal overhead
and cost, meaning that there is a benefit that coupling can provide.
When we balance that, we also compare it against this additional quote, which is the idea that the more reusable code is, the less usable it is. Meaning that if we go to make our code too reusable,
too flexible, it then doesn't provide an opinion that might
be specific to your organization, and therefore it's less interesting, it's less
usable. So we're looking for that balance of the right level of
low coupling, but still having reusable code that provides an opinion
in there. And that's a really difficult balance to find. Like everything in architecture, finding that balance means staying in the middle of the road, not ending up in the ditch at either extreme. So let's look at two examples that might help
clarify exactly what we mean by this balance that we're looking for.
First would be consistent logging format that we just talked about. The balance
there, of course, is the advantages that I mentioned around dashboard
usage, and we can have operators then that look at
this in the same way. But if you've created, for example, a reusable dashboard,
and that dashboard is a single instance where you can select a service and it switches between all those services very easily across your different teams and can read them all, that's great. What happens if that reusable dashboard is actually a template that gets copied every time it rolls out? Now, if I were to change my log format, that doesn't necessarily help me in the same way, because the new log format rolls out to the different services and will break all the reusable dashboards that are out there and cost me time to update and redeploy them. So thinking through
how the different coupling characteristics might interact are essential
here. Another one of my favorite examples
is with feature flags. If you're not familiar with feature flags, think of them as a decision point you can add into your code to decide if you're going to execute a piece of code or not, such as whether you're releasing a feature.
And you might ask another service to say, hey, is this feature on or is
it off? And when we think about it, there are three key areas that we
can use appropriate coupling to help us with feature flags.
First would be flag keys. So when I ask that provider,
hey, is this flag on or off? I have to provide it a key.
And that key is a string text that has to match what's in the feature
flagging decision provider. If it doesn't match, it's not going to work. And so
if I can create a package that distributes those particular keys across
different services that might need to consume the same feature flag toggle,
that's a big advantage. That also helps me and is an advantage when I want to clean it up. If I want to clean up one of these flags across a whole bunch of different deployable units, I need to ensure that I go across them all and remove it. So using a distributed package, building the keys into an enumeration as an example, and then removing that enum member allows me to redistribute the package and intentionally break the build as people upgrade. They'll upgrade, be able to go to their code, and know that they should remove that flag because it's no longer active and shouldn't be considered.
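To make that flag-key idea concrete, here is a minimal sketch in Python (the talk is language-agnostic; the `FlagKey` enum and provider stub are hypothetical names for illustration, not actual SPS code):

```python
from enum import Enum

class FlagKey(Enum):
    """Flag keys distributed in a shared package, so every service asks
    the provider with exactly the same key string. Deleting a member on
    cleanup deliberately breaks consumers' builds when they upgrade."""
    NEW_CHECKOUT_FLOW = "new-checkout-flow"
    BULK_EXPORT = "bulk-export"

def is_enabled(key: FlagKey) -> bool:
    # Stand-in for the real decision provider; a real client would send
    # key.value over the wire and get the on/off decision back.
    enabled_flags = {"new-checkout-flow"}
    return key.value in enabled_flags

print(is_enabled(FlagKey.NEW_CHECKOUT_FLOW))  # True
print(is_enabled(FlagKey.BULK_EXPORT))        # False
```

Because consumers reference `FlagKey.NEW_CHECKOUT_FLOW` rather than a raw string, a typo becomes a build or import error instead of a silently misbehaving flag.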
Similarly, user context is important. When I work across a distributed
environment, I need to provide the same context and say my user is
name Travis. Now if my other service
asks and says name is Travis John,
then that's not going to work. That's not going to be appropriate. Those are two
different names that it's provided. Maybe one's using name, one's using first name, last name.
The context isn't the same. So using a distributed library to help with the abstraction of the feature flag context can be an enormous help in appropriate coupling as well. Another way
to think about appropriate coupling and the difference between duplicating
and reuse, I find, is the traditional animation process. And in
my household I have three young children, and the Lion King
is a pretty big,
I guess, show that we watch often. And it's interesting when
you look at the traditional animation process, I have a lot of respect for the animators, because it is a ton of work to make some of these minute changes and detail shifts. When you think about it, we can break this into two core
parts. The first is the foreground, the second is the background.
And here the background really only has two components. And the foreground,
though, in animating the different animals as they move and shift, is pretty unique, right? Every time I have to redraw it, do I want to have to redraw the background that hasn't changed every single time? And of course,
the answer is no. And when we think about that, then we think about that
in terms of duplication. When you're going to redraw those animals
in this way, you don't want to redraw the background every time. You're going to
have a ton of little changes and small changes to make. And this is what
the most important part of the scene is, is these animals, not the background.
And of course, the background then is the reusable portion that we want to have
there and available. And in the traditional animation process, they were done separately. You can take a look behind the scenes at the Lion King and see how that looks and works. But essentially we have some transparent
sheets that are overlaid over top of each other, a series of overlays that build
this together. So the animator isn't responsible for drawing the whole
scene and redrawing it every time he wants to slightly shift or move
the arm of the lion, for example. In the same way we think
of that with code reuse in a distributed microservice world,
when we stand up a new microservice, there's a ton of cross functional concerns that
are just the background. They're there to make the service work, they're there
to set the context. But the real business logic, the thing that
makes this particular service unique,
is often not something you want to duplicate. It's domain
driven, and it should be existing in this particular domain. It belongs in this microservice.
So with that in mind, though, sometimes there are reasons just to copy it.
And so I want to run through some of these quickly with you. Sometimes understanding if you have an incorrect abstraction is important, right?
And so when we look at this, the idea is that you should prefer duplication
over wrong abstraction. If you don't have enough information to understand the
abstraction that you want to build, the interface or the contracts that you're building,
then don't do it yet. Like that simple S3 example we looked at earlier, we didn't have enough understanding
of what the abstraction should be.
Low overhead-to-savings ratio. So we think about what the actual reuse savings are going to be. And of course, a lot of this goes back
to that law of diminishing returns we saw earlier. And in a lot of cases, there's no simple formula to say this is
the amount of time you're going to save. And a lot of these things are
intangible to some degree. But as you begin to think about this
and look at this more, you'll develop a real sense for it, a real
gut understanding of it, but in many cases,
understanding how often you plan to reuse it and some of those other characteristics, in terms of a high-risk emerging need, or whether it's already built in three spots where I can see how it's been used, are really helpful guidelines for how to approach that.
And of course you want to think about feasibility here. And when we
think about feasibility, we're thinking about the idea that maybe
this isn't something that your team should be building. Maybe you've bitten off more than you can chew, right? If you're a particular delivery
team and you're working on a small aspect, and you're putting together a large application
framework, reusable package, maybe that's something that your platform engineering
team should be building, maybe it isn't even you. And so it's
something to consider that if it's not feasible,
if, for example, there are too many dependencies being added and you can't legitimately do
it without a high degree of coupling, then don't try and
do it. If that's the case, it can be better to copy a little code than pull in a big library for one function. Dependency hygiene trumps code reuse.
And of course last but maybe of most
interest is diversified opinions. Sometimes there are just too many opinions on how to build something. And if
that's the case, maybe you should consider actually not building it yet, until you've actually had a chance to land on an opinion within
your organization that would actually make it reusable. Otherwise, you might find
that no one wants to use your nice reusable code that you've built because they
have a different approach to the performance of it or to how serialization
should work. So with
that in mind, I always like to bring up this idea
around shared utility libraries with coupling. This is often how I feel when
I see shared utility libraries. And let me explain a little bit about
what I mean by that. And so if we were to use a library kind
of associated term here, and in the library you had a book and you
wanted to pull that book off the shelf, and it might be called MyOrg Module Utilities. And there are two core pieces of content in that particular book: how to cook Kraft Dinner in the microwave, and how to build custom furniture. So different from each other.
Much like many utility libraries that are just made up of different random
things that people put in there that do different string parsing utilities
or enum parsing all the way to HTTP
clients for APIs, the reality is that they don't actually belong together.
And because you've coupled them together in the same distributable project or single library that's there now, they can't be used effectively
to get rid of some of the problems and the high degree of coupling that
we've seen thus far. And so the reality is that cooking Kraft Dinner in the microwave and building custom furniture have much different dependencies involved in them. Or if they were actually pieces of code, they would have much different packages they consume. But now you've bound them together and you're pushing them out,
forcing consumers to think that well, number one, the dependency tree is going
to be awful for that. Number two, do you really have any authority over either
one of those? Why would you put them in the same package? Maybe they're
not even accurate or correct. And so it begs a lot of
questions. Keep those libraries small, keep them task focused and specific.
Don't build a utility library. But in a lot of cases a utility library
has been built simply because someone just needed a place to
start copying these kinds of one-off functions and put them somewhere. And that's not a bad thing to have. It's just probably more appropriate not to distribute it, and instead keep it in a GitHub repo or somewhere like Stack Overflow. We think of those as snippets, right? So you might
use different methods for keeping track of those and having
them available. Maybe at some point it makes sense to see how they grow,
and maybe there is a package that is nicely coupled together that would make sense
to build. But over time you'll likely find that there's a good chunk
of stuff in there that should never be released together in a single library. So with that in mind, let's assume that you've made the decision to go ahead and start building some of these cross-functional concerns into a library; you can go ahead and build it out, and I'm sure it would be effective.
But what I want to touch on here is the idea of templating. Because templating
can really help us think about how to position our reusable code across your ecosystem.
And so the idea around templating takes us one abstraction
layer above that with the intention of introducing a
grouping of packages to form a standard, a more opinionated way of
implementing something that can be cross cutting, some of these cross cutting technical areas
that we want to apply to. So we're going to talk about project seeds.
We're going to build on that with service templates and then build on that with
service chassis. So diving into project
seeds. What I mean by a project seed is a very high-level reference point for starting a new application that typically provides a standardized folder structure along with SDLC workflow via template files.
And so it's fairly simple and straightforward. Your seed is made up of things like metadata and folder structure.
Do you use source or src? Do you have a test folder? Do you have
a standardized GitHub Actions YAML file you put in there that gives you the defaults of your template or workflow? Really any of these types of files or folders: it might be GitHub-specific files like a Dependabot file or CODEOWNERS file, or even a README.md of how to get started. And when you go ahead and
create that, you're creating a copy of it. And so in most cases when using
a project seed, it's a one time copy to start the project. Here's your skeleton,
go ahead and start ripping it apart and changing names and moving stuff around.
And so the value here, there's a little bit of value, it helps you notionally
get started on a new project. In some cases people are already copying previous
projects they work on in order to start. And the cost to maintain is pretty
low because it's typically language agnostic. And you see that as
a repository template in GitHub where you can mark any repository as
a template and then it simply gets copied and pasted
into the new repo as you selected, and you start. So there are mechanisms out there and tools in GitHub that just have that built in. It's low value, but low cost as well.
But more interesting is moving to a service template. And a
service template is defined as this, which is an opinionated reference
for specific application and language types that reduces boilerplate setup and provides consistency on cross-cutting concerns.
The important part here with the template is we're actually moving to language specific scenarios
where we want to provide distributed and reusable code.
And so here we have security classes, we have external considerations, loggers, tracing: all the types of code that we talked about earlier that we think is appropriate to reuse have been copied and are
available here. Now, typically a service template can be created with tokenized parameters. Tokenized parameters say: hey, I want to change the namespace this is in, or prefix the class names with some specific name for this service, and it does that transformation and dumps the result in your repository for you. This again is a
point in time snapshot on creation. It doesn't update or change after that.
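The tokenized-parameter substitution can be sketched in a few lines of Python; the token names (`ServiceNamespace`, `ServicePrefix`) and values are invented for illustration:

```python
from string import Template

# A template file carries placeholder tokens that get substituted exactly
# once, when the new service is created from the service template.
template_file = Template(
    "namespace ${ServiceNamespace};\n"
    "public class ${ServicePrefix}Startup { }\n"
)

rendered = template_file.substitute(
    ServiceNamespace="MyOrg.Orders",  # hypothetical values a creator supplies
    ServicePrefix="Orders",
)
print(rendered)
```

After this one-time transformation, the output lives in the new repository and has no further link back to the template, which is exactly why updates don't flow through.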
And so here we see the value is a little bit larger because I can
get started pretty fast with some pretty great opinions. But the cost grows over time, especially as we think about the law of diminishing returns: the fact that I'm copying these pieces of code is fine when I've got three of them, but when I've got ten or twenty of these and I need to make an update to the security class, oops, that's a big change to make.
Nevertheless, we see the ability to do service templates in tools like Backstage.io, which has a great template library and marketplace that you can go into; you can create custom ones, choose them, and tokenize the parameters to get started, and that works fairly well.
The reality though is we're still copying that code and we're still seeing some of
the coupling points are really going to get our way over time. So we
really need to think about the concepts around service chassis. A service chassis, again, is not something that you would use instead of a service template; you might actually combine project seeds, service templates, and service chassis altogether to form a great experience.
But at a high level, a service chassis changes this around
by instead of including all the code inside of our
service template, we're actually just including configuration for those packages that
are being pulled in. And so now they are references: we've copied by reference, or I should say passed by reference. And so when we create the new service, that's great. It is a point-in-time snapshot of the service template. But because I can change and modify those pieces of code, I can very easily now reference them, and reference newer versions if I want, as you push them out.
And so here we see the value is pretty high because not
only can I get that point in time snapshot and get flying on a new
service very fast, I can also begin to augment and change the opinions
and the best practices and have individuals update
with them over time. Now it gets a little bit better than that.
We can actually take this one abstraction further. In this case, I have a series
of static references. I have exactly five references. They've been added.
That's all I can ever change at a global level within the organization,
unless I go back and add another reference inside the service template. But then only
new projects get that. So we abstract it one bit further with
the service chassis and we use this particular chassis. We might
have a chassis named specifically for building rest APIs in the organization,
and it provides a configuration of all the packages above, maybe some other
configurations relative to REST APIs, and that way our service template becomes much smaller. Anytime we can eliminate components in the service template but provide the same functionality, that's a benefit. So here the service template is very small; it's just a bootstrap configuration that says: use this one package that we have as a reference to the abstracted
reference. And of course when we then go ahead and create the service off of it, it references a single package, but that references all
these other packages. And of course the benefit there is I can add new stuff,
I can change high level configuration, I have a lot more control
on the integration of that within the service. And so the
value here can be much higher and the cost is much lower, not assuming the
cost of creating the other packages. And so this is a pretty large advantage, where we can start to combine the service chassis concept with the service template and the seed to build a pretty nice
experience for reusable code. Of course,
there's another problem that we often experience in the organization,
and I call that the service mesh gap. And the reason I call it that
is because we think about platform engineering. We think about building out these platforms
that a lot of our organizations are building now, and a lot
of that is built on Kubernetes cluster, other container orchestrators
perhaps, but at the end of the day, using service meshes and building that
platform, we see a lot of the functionality provided for us.
So areas we might have only been able to address in code before, like distributed tracing. If you're trying to do that without a service mesh, you have to put it in your code. But now, as we start to move some of these things and we use service meshes and proxies, we can begin to automatically build in mutual TLS, tracing, egress, logging, metrics, errors, and auth as just a default part of that, even if your container is just hello world and has nothing else as a part of it. But the reality is that to do the logging effectively, it often still has to come in some type of agreed-upon format. And distributed tracing works in the service mesh without any changes.
But we can add a lot more context to it if we want to build
its maturity to the next level. And so we need to meet a contract in these cases for metrics, for logging, for tracing, and that can be a great place for the service chassis to fit as well. The reason is that now we can make platform-level updates, but also roll out distributed pieces of code that automatically roll out to these different services that have already implemented and used that particular service chassis.
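As a rough illustration of meeting such a contract, here is a Python sketch of a chassis-provided formatter emitting an agreed-upon structured JSON log format; the field names are hypothetical, not an actual SPS or mesh contract:

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """One shared answer to the log-format question: every service using
    the chassis emits the same structured JSON fields, so mesh-level
    tooling and shared dashboards can parse all services the same way."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

# Capture output in a string buffer just for this demonstration.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("orders-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order created")
print(stream.getvalue().strip())
```

Because the formatter ships in the chassis package, changing the contract is one package release rather than a change in every repository.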
Now the next question on top of this I know is what you're
thinking, which is that my teams never update their packages. That's great, you're saying, they can update to a newer version, but they're not even necessarily thinking about that. And if we want to get this to a point
of highest effectiveness, we need to have some
level of communication with the teams that's letting them know when new
components or new versions of the package are available. And that needs to
be something that happens fast and quick in order to keep your
velocity flowing in a microservice world. So there are lots of different tools to
do that. If we think about the problem and we have a library,
in this case a nuget package could be a maven package in your app.
The only reasons your team is going to want to update are the initial install, a detected vulnerability, perhaps a major upgrade, or a new feature that they're looking for specifically in that package. Beyond that, teams aren't
going to just go look at it and upgrade on their own. You may have some really great engineers who do that, but typically they don't.
And so we can use tools to help us solve this problem. Tools like Dependabot. If you're using GitHub, Dependabot is a really easy component,
especially for open source, that you can enable that automatically submits pull
requests to your repository with any package updates.
And to answer your question, yes, it does support private feeds. So if you want to build your internal service template and deploy it to an internal package repository like JFrog, or even a public one, you can configure that and include it. It is highly configurable for other purposes too. So if you're saying, I don't want to bump all my version numbers all the time (though that might be a good idea), you can configure it to do that only for certain packages. And this is incredibly helpful. And we encourage all our teams
to do this so that you're staying up to date with, if nothing else, at least the distributed packages that are there. That could be information you include in the documentation of your library: that this is the expectation for consumers. It does work across all major language ecosystems
that I've looked at and used, and it does interact nicely through
pull requests. There are other tools out there. If you're largely in the .NET world, NuKeeper used to be a good option. It's not a hosted service though; it's one you'd have to build out and host yourself. But Renovate is a great option. Dependabot is nice and it works, but Renovate provides a lot of additional options, especially in terms of grouped updates, ensuring that certain packages get updated at the same time in the same pull request, which can be problematic in Dependabot. So take a look at this type of tooling for dependency
management and dependency consumption and see if you can make use of it. It'll change
the way you think about distributed code in terms of your velocity as well.
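As a concrete illustration, a minimal `.github/dependabot.yml` for watching an internal chassis package might look like the following; the registry name, feed URL, and package prefix are hypothetical placeholders for your own internal feed:

```yaml
version: 2
registries:
  internal-nuget:
    type: nuget-feed
    url: https://nuget.example.com/v3/index.json  # placeholder feed URL
    token: ${{secrets.INTERNAL_FEED_TOKEN}}
updates:
  - package-ecosystem: "nuget"
    directory: "/"
    schedule:
      interval: "weekly"
    registries:
      - internal-nuget
    # Optionally scope pull requests to just your distributed packages.
    allow:
      - dependency-name: "MyOrg.*"
```

With a configuration like this, teams get a pull request whenever a new version of the shared chassis or its sibling packages is published.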
So with that in mind, moving on then, here's an example of a service
template that we have at SPS, and this incorporates a lot of these
capabilities. This particular example is demonstrating
how we moved and used the service template to move between error formats, to move to structured JSON output, to move our secrets from AWS Parameter Store to Secrets Manager, how we handle and move from standardized resilient HTTP clients for identity authentication to more distributed auth handlers, which is pretty cool. Tracing: we moved from AWS X-Ray to OpenTelemetry, along with many modifications of our serialization routines. And all this was built inside a standard API chassis that had an opinionated set of best practices within our organization that it created and set up by default.
And then at the same time it can include a bunch of other capability: additional security, middleware, Swagger, or Sentry. Build that all into the application so that with a single install and a simple package reference, we can begin to take advantage of the best practices in your organization without having done anything, without having the overhead of that code even in your repository.
And so for us, this is a particular example in .NET. We also have other growing service templates in Java and also in Go and in Python. But in this example, I can simply create a new scaffolded web application using dotnet new webapi, which is the default template from Microsoft, not something we created. I can then do an install of our chassis package and update the runtime host, so updating your Program.cs essentially, or the equivalent entry point if you're in Java. And here you can come in and specify that you want to use the SPS host. This is our service chassis.
Add it in and then specify that you want to use the middleware as
well as the dependency injection. And after that, everything is configurable and ejectable, meaning that you have an escape hatch at every point if you decide you don't want to use a certain feature of that package, without ejecting from the whole service template altogether. Of course, there are teams that don't want to necessarily
couple themselves to this particular large service template, and they can
use some of the other distributed packages independently and individually if needed.
So that's been a huge advantage in what they're producing.
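The chassis pattern itself is language-agnostic. Here is a heavily simplified Python sketch of the idea (names like `OrgChassis` are invented for illustration, and a real chassis wires up real middleware rather than strings): defaults are applied by a single package reference, and every opinion has an individual escape hatch.

```python
from dataclasses import dataclass, field

@dataclass
class OrgChassis:
    """Bundles the organization's default opinions behind one package
    reference, with a per-feature escape hatch for each opinion."""
    use_auth: bool = True
    use_tracing: bool = True
    extra_middleware: list = field(default_factory=list)

    def build(self) -> list:
        # Each opinion is on by default but individually ejectable.
        stack = ["structured-json-logging"]  # the always-on baseline
        if self.use_auth:
            stack.append("org-auth-middleware")
        if self.use_tracing:
            stack.append("opentelemetry-tracing")
        stack.extend(self.extra_middleware)
        return stack

# Default chassis: all organizational opinions applied.
print(OrgChassis().build())

# Escape hatch: a team ejects tracing without leaving the chassis,
# and layers on its own middleware.
print(OrgChassis(use_tracing=False,
                 extra_middleware=["team-rate-limiter"]).build())
```

The key property is that a new chassis release changes the defaults for every consumer on upgrade, while teams that have ejected a feature keep their choice.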
Well, we're almost at our time for today, but it's important that we talk
about what incremental gains you can achieve within your organization.
And I want to make sure that this was understood before leaving here, that in
some cases, achieving a full service template might be a tall order, something that you can't get to with what you
have. And so instead of developing grand designs for an internal code framework,
it's often best to start small, develop iteratively, and progressively build on small
successes. So if you can't build a full service chassis today, start with some of the smaller concepts: build one package, include that package reference in your service template, and start to roll it out.
Think about whether you need a full service chassis; we didn't talk about that today, we didn't have a chance to. But if you're in a polyglot ecosystem,
it can be a lot of work to maintain those. And your dynamic equilibrium is all about whether you have enough people supporting it relative to the needs of the organization. So a lot of this comes back to the resources you have
at hand and the capabilities that you have. But no matter what you decide,
there is a path forward for effective code reuse for
distributed microservices, and I think that you should investigate it further,
and I hope that's compelling enough for you.
Thanks for taking a look today. I'll leave you with this quote from Douglas Crockford, which I love: code reuse is the Holy Grail of software engineering. And whether by Holy Grail he intended to mean that it is all about the journey, that you may never find the Holy Grail; or the potential that it may or may not exist; or whether it is in fact just the ultimate treasure that we are looking for. I'll let you decide on which definition is
appropriate, but keep thinking about appropriate coupling for your decisions,
and dive in and take a look at some of those service templates and chassis
that you can build internally in your organization. Appreciate your
time today. Take care. You can always find me online as well, on my website or on Twitter.