Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello and welcome to my talk, OpenTelemetry and Epsagon: a
love story in three acts. When we
started Epsagon four years ago, we were a small team and we
used proprietary SDKs and a proprietary protocol.
Today we are a part of Cisco and we are
building a new product that supports OpenTelemetry natively. We contribute
code to OpenTelemetry repositories on a day-to-day basis.
If you want to use open source projects
as a part of your solution, this talk is for you. I will
explain how we got from where we started to where we are today,
the mistakes that we made along the way, and what you can learn from them.
First, a little bit about myself. I'm Yosef Arbiv.
I am married to Abi and I'm the father of three adorable
kids. And I'm the team leader of the
SDK team in the Cisco ET&I group, which builds the libraries and the SDKs
that our customers use. So a little
bit about Epsagon so you can understand how we fit in.
Epsagon is a solution for customers who
use and build their product, their backend,
with lots of different services and frameworks
in the cloud. As you can see, in such
cases it can be very hard to keep track of what you have and who
talks to who. This is where Epsagon
comes to help. Our customers install
the SDKs as a part of their code.
They don't need to change anything in their code, they just need to import our
SDKs into their code, and then
they can log in to our website and
see the traces and graphs of the
product: who talks to who, where they have failures,
and where the errors are. They can debug and troubleshoot their
systems. In order to
generate these graphs, we need to send traces from the code.
This is where my team comes in.
We build those SDKs, which instrument the code,
create the traces, and send them to the backend.
So first, where we started.
When we started, the serverless market was new and trending,
and we decided to aim for customers using serverless
solutions. Back then there was
no industry standard for distributed traces.
Most customers were using logs or non-distributed
traces. We looked at the
possibilities we had and we considered using OpenTracing,
but we had a couple of problems with it.
First, OpenTracing was backed by one company, which
was a competitor of ours, so we were a little bit afraid of
using it. Another problem was that
OpenTracing didn't have automatic tracing back then.
We wanted the experience for our customers to be as smooth as possible and
to require minimal code changes. So we decided to create our
own libraries that would do automatic instrumentation
and create traces automatically.
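As a rough illustration of what automatic instrumentation means here, the sketch below wraps an existing library function so that every call records a span, while the calling code stays unchanged. It is a minimal, standard-library-only sketch and not the Epsagon or OpenTelemetry implementation: the `instrument` helper, the span dictionary fields, and the in-memory `COLLECTED_SPANS` list are all hypothetical names chosen for this example.

```python
import functools
import json
import time
import uuid

# Hypothetical in-memory "exporter"; a real SDK would send spans to a backend.
COLLECTED_SPANS = []


def instrument(module, func_name):
    """Replace module.func_name with a wrapper that records one span per call."""
    original = getattr(module, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        span = {
            "id": uuid.uuid4().hex,
            "name": f"{module.__name__}.{func_name}",
            "start": time.time(),
            "error": None,
        }
        try:
            return original(*args, **kwargs)
        except Exception as exc:
            # Record the failure on the span, then let it propagate unchanged.
            span["error"] = repr(exc)
            raise
        finally:
            span["end"] = time.time()
            COLLECTED_SPANS.append(span)

    setattr(module, func_name, wrapper)


# Patch once at startup; application code keeps calling json.dumps as before.
instrument(json, "dumps")
json.dumps({"service": "checkout"})
```

The application only needs the one-time `instrument(...)` call at import time, which is the "just import the SDK" experience described above. Real instrumentation libraries also have to handle context propagation, async code, and un-patching, which this sketch leaves out.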
At first we considered making those packages
closed source and sending them to our customers to use.
But soon enough we discovered that this was not really a possibility.
Our customers didn't want to install closed source
packages and add them to their source code. They wanted the code
to be open so they could see it, fix bugs, and so on.
So we decided to open source our libraries and publish them.
We also hoped to create a little community around them where
customers can fix bugs and add new instrumentations
and so on. So this was the first phase, when
we started Epsagon.
What did we learn from this phase? First, about
product defensibility: you need to think about which part
of your product you want to be closed and which part should be open.
Focusing the defensibility on the wrong part can
be problematic. Another thing that we learned is that building an
open source community can be really hard and requires a lot of energy
and resources. It is much easier to join a community
than to create one on your own. And this
brings us to the second act, the standardization of the market.
We managed to create a good solution for customers using serverless
frameworks and we were very popular in this market,
but this market was too small for us and we couldn't build a big business
on it. So we decided to move to other fields,
such as Kubernetes clusters. But when we looked into
it, it turned out that building a
Java Kubernetes agent on our own was too complex.
So we looked into OpenTracing again, and
it was more mature by then. So we decided to build our agent on top
of it. We decided to take some
code from the OpenTracing libraries, add the
code that was needed for our solution, and
make some changes, such as adding the tracing protocol
that we needed, and so on.
This way we could build a successful Java agent for
Kubernetes clusters that was based on OpenTracing
but was compatible with the Epsagon backend.
Shortly afterwards, OpenTelemetry was announced.
OpenTelemetry was based on OpenTracing and OpenCensus,
and it became very popular pretty
soon. So we
started to build new libraries that were
based on OpenTelemetry code, in the same way we did with
OpenTracing. We took the libraries and
the code from OpenTelemetry and we changed it
a bit. We added the unique functionality
that we needed for Epsagon that was not included
in the OpenTelemetry libraries. This
way we created our own libraries, actually as forks
of OpenTelemetry.
This way we were able to create new libraries very fast.
So we created more and more libraries, but maintaining them became
a headache. It was really hard to maintain the libraries
when the community kept moving forward, changing
the code and adding new functionality.
So it was very hard for us to keep up with
the community. Which brings
me to the lessons learned. First,
forks are really hard to maintain because you can't really
keep updating your code with the changes and
additional code that the community adds.
So I really suggest not using forks when
possible. There are much better ways to use open
source than forks. The second one is tech
debt. We created a lot of tech debt when we moved forward like
this. We managed to add new libraries
and new functionality to our product, but as
we did it, we created increasing
tech debt. In our case,
eventually this was not a problem, as you
will see in a minute, but in other cases
it can be really problematic. So this is something
that should be considered. Again,
for some cases it can actually be good to create
tech debt, because you keep growing your
product, you move forward, and you add new functionality and
new features. So it can be great. But growing
the tech debt without control over it can be a problem.
And these problems brought us to the third phase,
joining the OpenTelemetry community and joining Cisco.
As we had more and more forks, the overhead became too big
and we understood that this was not really scalable and
we couldn't move on like this. We also had more
customers talking about OpenTelemetry and customers that
were using OpenTelemetry in their code. They wanted to see OpenTelemetry
traces together with Epsagon traces in the Epsagon backend.
So to answer this need, we decided to run a
small experiment with a Java agent. We built a new
agent that was based on OpenTelemetry not as a
fork, but as a distribution of OpenTelemetry, meaning that
we used OpenTelemetry as a package and created
more functionality on top of it. For the Epsagon backend,
we needed to collect more data that OpenTelemetry was not
collecting. So we added this as an extension
to OpenTelemetry, but we kept the OpenTelemetry
trace structure.
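The difference between a fork and a distribution can be sketched roughly like this. The example is a standard-library-only illustration and does not use the real OpenTelemetry API: `UpstreamTracer`, `SpanProcessor`, `ExtraDataProcessor`, and `create_distribution_tracer` are hypothetical stand-ins, loosely modeled on the span processor concept that OpenTelemetry SDKs expose. The point is that the upstream code is consumed as a package and extended through a hook, rather than copied and edited.

```python
from dataclasses import dataclass, field


# --- Stand-in for the upstream package (used as-is, never edited) ---
@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)


class SpanProcessor:
    """Extension hook that the (hypothetical) upstream package exposes."""

    def on_start(self, span):
        pass


class UpstreamTracer:
    def __init__(self):
        self._processors = []

    def add_span_processor(self, processor):
        self._processors.append(processor)

    def start_span(self, name):
        span = Span(name)
        for processor in self._processors:
            processor.on_start(span)
        return span


# --- The distribution: extra behavior layered on top through the hook ---
class ExtraDataProcessor(SpanProcessor):
    """Adds vendor-specific attributes while keeping the span structure."""

    def __init__(self, extra):
        self._extra = dict(extra)

    def on_start(self, span):
        span.attributes.update(self._extra)


def create_distribution_tracer(extra):
    """Return an upstream tracer preconfigured with the vendor extension."""
    tracer = UpstreamTracer()
    tracer.add_span_processor(ExtraDataProcessor(extra))
    return tracer


tracer = create_distribution_tracer({"vendor.account_id": "demo"})
span = tracer.start_span("GET /orders")
```

Because the vendor code only touches the public hook, upgrading the upstream package is a dependency bump instead of a merge, which is the maintenance benefit the fork approach was missing.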
This experiment was really successful. We were able to very
quickly build an agent that was built
on top of OpenTelemetry, but without forking the OpenTelemetry code.
So updating it and maintaining it was
much easier. In addition, we were using the OpenTelemetry
trace structure, which means that
our backend was now able to support
OpenTelemetry-based traces and not only Epsagon
traces. So we were more friendly to the community.
Shortly after this successful experiment, just when
we were about to create more libraries in this structure,
it was announced that Cisco was acquiring Epsagon.
For us, it meant that we would stop working on the Epsagon product
and start working on a new full-stack
observability product, together with Cisco groups.
We decided that
our new product should support OpenTelemetry natively,
meaning that we will be able to provide
value for customers using only OpenTelemetry without
Epsagon libraries, and to add more value for customers who
are using Epsagon libraries. We also decided
that our libraries will be based on OpenTelemetry as
a distribution, and this way we can create
libraries fast and also be able to maintain them
as we move on. As we moved into this phase,
we also joined the community, meaning that we started to
contribute code, to add new features, and to fix bugs
in OpenTelemetry projects.
Joining the OpenTelemetry community was a great experience.
First, we grouped with the AppDynamics team and other teams at Cisco
who were already contributing to and working with OpenTelemetry.
They had a lot of experience with OpenTelemetry code and we learned
a lot from them. We met new
maintainers of OpenTelemetry and we
worked together with them, learning from them, asking them
questions, and understanding how the community works and
what we need to do in order to fit in. I can also say
that being a part of the OpenTelemetry community is great for the
developers in my team. We love being a part of
something bigger and being able to contribute back to the community.
Being a part of a growing open source community brings
lots of value to the developers in my team, and this is very important as
well. In the future, we hope to be a significant
part of the OpenTelemetry community. We aim to contribute as
much as possible from our distributions back to the community.
This way, we want OpenTelemetry to be a major part of
our full-stack observability solution for our customers at Cisco.
Thank you for joining me. If you have any questions, feel free
to reach out on Twitter, LinkedIn, or Discord.
Thank you and see you there.