Transcript
Hey, everyone, welcome to our session.
Today we're going to be talking about utilizing Argo CD and
Kubernetes for suicide prevention for LGBTQ+ young people.
My colleague here, Alfredo, and I will be presenting this topic
on behalf of the Trevor Project.
The Trevor Project is a nonprofit dedicated to the mental health
space for the LGBTQ+ community.
So if you want to learn more about who we are and what we do, please
feel free to scan the QR code.
We'll also be sharing this out and showing it again at the end of the
presentation, so you can scan it then as well.
And with that said, moving into the next slide.
So let's talk about what is the Trevor Project.
Like I mentioned before, the Trevor Project is the leading suicide
prevention and crisis intervention organization for LGBTQ+ young people.
Here at the Trevor Project, we provide five major services
for people within the U.S. and also now in Mexico.
Specifically, the first service is Crisis Services, which we're
going to be talking more about later in this particular session.
Crisis Services is a service where we're connecting LGBTQ+ young people who
have suicidal ideation or other crises that they want to talk through, whether
that's through text, SMS, or web chat.
Our second service is advocacy, making sure we have a team able
to advocate for laws that are more helpful for the LGBTQ+ community.
Our third service is research.
We conduct research studies, learning about the trends within the community,
as well as understanding where mental health is going for these particular individuals.
Our fourth service is TrevorSpace, pretty much a social
platform where young people between the ages of 13 and 24 can go
online, get support, talk about things, and just help each other out;
pretty much a safe space for everyone on the platform.
And last but not least is education and public awareness, a
service where we're providing mental health resources, as well as materials
and resources for both youth and allies who come to our page to learn more
about mental health, suicide prevention, and ultimately, the LGBTQ+ community.
And with that said, I'll pass it over to Alfredo to introduce himself.
Thank you so much, Paul.
I'm Alfredo Pizana, tech lead here at Trevor.
I'm currently focusing on MERT technologies and part of infrastructure
and I'm leading the development of this amazing application
we are going to talk about.
Thank you.
Awesome.
Thanks, Alfredo.
And over on my side, my name is Paul Pham.
I use he/him pronouns, and I'm the engineering manager who gets to
work closely with Alfredo on our day-to-day operations, particularly
the ones that support many of our verticals here at the Trevor Project.
Cool.
And so let's talk about what we're going to be sharing.
We're going to be sharing four different things.
First off, it's going to be the project: what did we do here at the
Trevor Project, and how did we utilize Argo CD and Kubernetes, as well as many
other tools within our system?
That leads us to the architecture, where we're going to share the
high level as well as deep dive a little bit more into the open source tools.
Then we're going to go over challenges and lessons learned from what we've
done on the project, and we want to share them with you so that,
if you were to go down this path or a very similar one, you'd
be able to avoid certain things.
And last but not least, we want to share the impact of the solution
that we have implemented so far.
And so let's start with the project context.
So here at the Trevor Project, our mission is to end suicide among LGBTQ+ young people.
That means that we want to be able to serve the youth.
But at the same time, also help others learn the importance
of suicide prevention.
And the Trevor Project journey started on August 11, 1998, over 25
years ago, when we took our first phone call to provide
support for LGBTQ+ young people.
And ever since then, we have been iterating on our technologies and
services, moving from phone calls, to text messages, to
ultimately web chats available for youth.
But what we wanted to do was expand these services
beyond the United States to other nations, such as the one we're going
to be talking about today, Mexico.
And the reason we wanted to do this is that we've done research
studies on how impactful our services are, as well as the need for our services
in a global market. As a result, we started doing more research
to understand where we should expand to. We started off with 194 countries
and, based on five criteria (safety, scale, equity, language, and learning), we
were able to narrow it down to two countries: Mexico and the Philippines.
The reason we started with Mexico first is that it was a neighboring
country, close enough that we could set up operations safely and successfully.
And our goal was to deliver this new platform that expanded
our services beyond the United States on October 11th of 2022.
And we were able to do so successfully.
The reason this date is so important is that it's National Coming Out Day
in Mexico, and we wanted to align with it to market this and be able
to have it successfully up and running.
Before, when we were talking about the United States and our services,
this screenshot captures what we were utilizing: a managed
service within Salesforce that allowed us to connect with LGBTQ+ young
people as well as help them by updating case records,
all through one platform.
However, this was not going to work for an expansion; it was going to
take a lot of time to scale up and maintain for various instances globally.
And so this led us to the creation of what we call the global platform,
or CSM, the crisis services management platform, which allows us to
scale up more quickly, maintain it, and have better continuous
development and delivery.
And so this leads us to this new UI, where we are able to utilize
open source technologies and create something like this, where you can
see the similarities, but at the same time, the major differences.
And those differences are more so on the back end side of things.
And with that said, I'll pass it over to Alfredo to talk more about
the back end, pretty much the fun side of things within this project.
Thank you so much, Paul.
Yeah, talking a little bit about the architecture side:
first, we want to mention the tools we use for the application.
On the infrastructure, we are using GCP.
For our source code management, we use GitHub, as well as GitHub Actions
workflows for our CI/CD pipelines.
For infrastructure deployments, we decided to use Terraform as our
infrastructure as code, as well as Argo CD for continuous delivery
to our Kubernetes applications.
We are using Kubernetes, and Twilio as our contact center platform.
Next slide, please.
And during those conversations about expanding our services, we
planned to create a platform that was scalable, open source, and highly available,
aiming to support multiple languages and different types of digital channels and
interactions. In case you see the "global platform" name somewhere, it's the
application we created to support multi-language and multi-layer use cases.
Here on the architecture, you will see at the left
the users, the young people we serve, and we're currently
supporting SMS, WhatsApp, and WebChat as digital channels.
We are supporting these through our Twilio platform, which is our contact
center platform and helps us support all of the
active interactions in real time.
Now, talking a little bit about the database: we chose MongoDB
as our persistent storage and Redis for our caching, which
helps us to scale on demand if needed.
Now, diving a little bit into the global platform.
Could you turn the slide, please?
You will see that we are using Kubernetes for the infrastructure.
At the left is the UI.
We are using one cluster for the UI and another cluster for the back end,
and we are using one cluster per environment.
So you will see that we have QA, dev, staging, production, etc.,
and there is one cluster for each.
And in the middle, you can see there is Argo CD,
which is our continuous delivery platform that helps us
to smoothly release different versions of the application
to our different clusters and pods.
At the bottom, you will see that there is GitHub, which helps us to
automatically trigger those events and deploy using Terraform,
our infrastructure as code platform,
which helps us to define what we want to create on GCP
and how we want to create it.
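For readers following along, here is a minimal sketch of how a GitHub push could trigger a Terraform deployment; the workflow name, paths, directory layout, and secret name are illustrative assumptions, not the Trevor Project's actual pipeline.

```yaml
# Hypothetical GitHub Actions workflow: on a push to main that touches
# infrastructure code, run Terraform against GCP. Paths, the secret name,
# and the directory layout are illustrative only.
name: deploy-infrastructure
on:
  push:
    branches: [main]
    paths: ["infrastructure/**"]
jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Terraform init
        run: terraform -chdir=infrastructure init
      - name: Terraform apply
        run: terraform -chdir=infrastructure apply -auto-approve
        env:
          # Service account key consumed by the Google provider (assumed secret name).
          GOOGLE_CREDENTIALS: ${{ secrets.GCP_SA_KEY }}
```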
Could you go to the next slide, please?
Here's an example of a configuration file for Terraform.
Specifically, this one is a config file for a cluster.
We just wanted to show how we define it and how we are using it,
because this is a template where we are using different variables to
define environment, regions, etcetera.
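As a rough illustration of that kind of template (not the actual file shown on the slide; the resource name, the "csm" prefix, and the defaults are assumptions), a variable-driven GKE cluster definition could look something like this:

```hcl
# Illustrative, variable-driven cluster definition; "csm" and the defaults
# are made-up placeholders, not the real configuration.
variable "environment" {
  type        = string
  description = "Deployment environment, e.g. dev, qa, staging, production"
}

variable "region" {
  type    = string
  default = "us-central1"
}

resource "google_container_cluster" "primary" {
  name     = "csm-${var.environment}"
  location = var.region

  # Node pools are managed separately so each environment can size itself.
  remove_default_node_pool = true
  initial_node_count       = 1
}
```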
Could you go to the next slide, please?
Now, talking a little bit more about customization and scalability, we decided
to use Kustomize because we ran into a challenge: we were trying
to find a method to deploy our apps to Kubernetes automatically.
So we decided to use Kustomize files, which help us to
create templates for our Kubernetes manifests.
At the right, you will see an example of how we organize different
applications, reusing different files,
for example application one, application two, et cetera.
This is just a template to show you how we decided to create the
base and the different applications.
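To make that concrete, a hypothetical overlay for one application might look like the sketch below; the directory names, application name, and image tag are assumptions for illustration, not the real repository layout.

```yaml
# overlays/production/kustomization.yaml (hypothetical layout)
# The shared manifests live in ../../base; this overlay patches in
# environment-specific values such as replica counts and image versions.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: replica-count.yaml   # e.g. bump replicas for production
images:
  - name: app-one              # illustrative application name
    newTag: "0.7.2"            # pin the version Argo CD should roll out
```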
Could you go to the next slide please?
And this is a real example of the results.
By using different config maps, deployments, external secrets, et
cetera, we were able to create a real manifest, which is at the right, with
real values that can be used to automatically deploy through Argo CD.
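For readers without the slide, a generic fragment of what such a rendered result can look like is sketched below; all names, images, and counts are invented, not the values from the actual manifest.

```yaml
# A generic fragment of rendered output after Kustomize substitutes
# environment-specific values; all names, images, and counts are invented.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-one
  labels:
    app.kubernetes.io/version: "0.7.2"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app-one
  template:
    metadata:
      labels:
        app: app-one
    spec:
      containers:
        - name: app-one
          image: gcr.io/example-project/app-one:0.7.2
          envFrom:
            - configMapRef:
                name: app-one-config   # values supplied by the generated ConfigMap
```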
Could you go to the next slide, please?
Thank you.
So why did we decide to use Argo CD? Argo CD helps us to automatically deploy
the desired application to a specific environment.
It also helps us to try different updates and versions
through different branches, connecting with GitHub.
Argo CD also follows the GitOps pattern
of using the code repository as the source of truth for
defining the desired application state, which means that we can create different
branches and different tags to maintain and support different versions of the
application, depending on the environment.
And just to mention, Kubernetes manifests can be specified in different formats and
in different ways; the one we chose was Kustomize applications, but there are
some others like Helm charts and JSON, etc.
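For context, an Argo CD Application that points at a Kustomize overlay typically looks something like the sketch below; the repository URL, paths, namespaces, and revision are placeholder assumptions rather than the project's real configuration.

```yaml
# Hypothetical Argo CD Application: watch a Git revision (branch or tag)
# and keep the target cluster in sync with the Kustomize overlay it renders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-one-production
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/global-platform.git  # placeholder repo
    targetRevision: v0.7.2        # branch or tag pinning the desired version
    path: overlays/production     # Kustomize overlay for this environment
  destination:
    server: https://kubernetes.default.svc
    namespace: app-one
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual drift back to the declared state
```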
Can you go to the next slide, please?
This is the result of our infrastructure up and running.
You can see the names of different pods that are currently running, and at
the right you will see the versions that we are maintaining,
depending on the environment.
For example, you can see that there's a version 0.7.2 that is healthy
and in sync, but there are some others that are out of
sync, and there could be some others that are crashing or starting,
depending on the tests we are running.
So this gives us great visibility into what's
happening during the releases and during the development of the
project and the code.
We can easily identify what's happening, whether a service is up and
running, out of sync, crashing, restarting, or whatever.
Could you go to the next slide, please?
We will show you an example of what's inside those elements of the list.
Here is a manifest file, which contains all of
the config maps, versions, etc.
That's what's inside that file, so we can inspect it and
identify if there is any error.
We could also run scripts to generate those files by hand,
but that was a tedious and manual process.
That's why we decided to implement Argo CD: to automate everything and
smoothly transition from one version to another, since the manual process
was consuming time and was prone to human error.
As you can see, there is an example of the little command line that we
can use to generate that file.
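The command in question is most likely a `kustomize build` invocation; as a sketch (the overlay path and output file name are assumptions), rendering the full manifest locally looks like this:

```sh
# Render everything an overlay produces into a single manifest file
# (the overlay path is illustrative).
kustomize build overlays/production > rendered-manifest.yaml

# The same renderer is also bundled with kubectl:
kubectl kustomize overlays/production > rendered-manifest.yaml
```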
Could you go to the next slide, please?
So, talking a little bit about challenges and lessons learned, we
identified three main challenges.
The first one was the strict launch date set by the business.
It was the final date, so we had to align to that day.
What we had to do was decide what we wanted to achieve for that day, the
MVP, and how we were going to do it.
That drove us to choose open source technologies, such as
the ones we just mentioned, Kubernetes and Argo CD, which helped
us to automate the process and reduce delivery times.
So it helped us to dedicate more time to development.
Another challenge we identified once we defined the
technologies we wanted was a lack of knowledge on building with,
or using, open source technologies.
So it was a journey, because we had to learn and decide
why we wanted to use them.
As we just mentioned, for example, the decision to use Argo CD was a
journey of learning, but it also helped us to reduce time in some other areas.
And last but not least, at some point we were maintaining up to 10
repositories because we were migrating from the old platform to the new one.
It was a challenge to smoothly transition from the old platform,
maintaining all the data and creating scripts in different repositories to
move that data to the new one.
So it was also a challenge.
And at some point we had to reduce that and create monorepos, etc.
Thanks for sharing that, Alfredo, both the architecture and the challenges.
As you can tell, Kubernetes and Argo CD have made it significantly
easier for us to deploy.
But as Alfredo also mentioned, a lot of challenges did come our way.
And what did we learn from this whole entire project
utilizing all these technologies?
The first lesson that we actually learned is that we needed to make
sure we were resilient by introducing a lot of these open source
technologies ahead of time, so that folks could actually learn about
them and utilize them to their full potential.
During this whole project, we were learning and building at the same time,
and so sometimes we didn't really use best practices, or, as
mentioned earlier, we were sometimes deploying manually instead of
going through the whole pipeline.
This is something we had to learn the hard way.
So, if you can, make sure the onboarding, knowledge transfer, and all
that good stuff about these technologies happens beforehand.
The second lesson we learned is that we needed to add more observability to
our different systems and applications.
As Alfredo mentioned, we have quite a few clusters to manage, and we needed
to understand whether those pods were up and running successfully and for
how long; all those metrics are super important.
That way, it allows us to debug and find root causes a lot more quickly and prevent
some of the system outages that did happen when we scaled or errors came up.
So it's more of a preventative measure.
In the future, if we were to redo this, we would want to add the
observability as part of the launch,
so that we can get that all set up end to end.
And the last lesson I'd like to share is that it's
also important not just to track observability, but also to track your costs.
As Alfredo mentioned, we're a nonprofit, and it's really important for us to make
sure that we're keeping track of our costs,
like all companies trying to reduce costs while still delivering great quality service.
One thing we want to do in the future is auto scale,
not just the deployments, but also the pods:
add pods if there's more traffic, reduce them if there's less traffic,
ultimately being able to fluctuate based on traffic to give the best service
while at the same time reducing computing costs.
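One common way to do that in Kubernetes is a HorizontalPodAutoscaler; the sketch below is illustrative only (the target name, replica bounds, and CPU threshold are assumptions, not the Trevor Project's settings).

```yaml
# Hypothetical autoscaling policy: grow the deployment when CPU usage rises
# with chat traffic, shrink it again when traffic drops.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-one
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-one
  minReplicas: 2        # keep a small floor during quiet hours
  maxReplicas: 10       # allow bursts when traffic spikes
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```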
And going to our next slide, in terms of impact: by doing all this and
incorporating all these tools and technologies, there are two great
impacts that came out of this project utilizing Kubernetes, Argo CD,
and many other technologies.
The first one is that we were able to support over 20,000
people in Mexico since we launched.
That is a huge number of people who reached out to us and ultimately
got the help that they needed.
And two, our system was able to scale to support over 1,500 chats per month.
Ideally, we will want to continually support more, but
this is a great number to start with.
And so, with that said, that concludes our session.
Hopefully you learned a little bit about what the Trevor Project is, about our
expansion project from utilizing Salesforce to more open source technology,
and a little bit about how you can utilize Argo CD and Kubernetes within your own
systems to ultimately do better, faster, safer deployments
for your Kubernetes clusters and pods.
If you ever want to learn more about what we do, feel free to contact Alfredo or
myself, or just scan the QR code and reach out to us through there.
Thank you so much, and I hope you all have a good rest of your day.
Thank you so much.