Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, and welcome to my presentation about AI Augmented DevOps
with Platform Engineering.
First, quickly to myself, my name is Roman Roth and I am the global
chief of DevOps and also a partner.
I joined Zilke 22 years ago as a junior NET engineer right from the university.
I became then an expert software engineer, then an architect,
and finally a consultant.
During that time, what always was at my heart is how can we
continuously deliver value?
How can we automate things?
And how can we ensure the quality of what we are building?
And when the whole DevOps movement started, I jumped right on
top of that, became one of the organizers of the DevOps movement.
Meetup Zurich, which is a monthly meetup with over 2, 000, persons.
And I'm also one of the organizers of DevOps Day Zurich, which is a two day
conference, which we are doing yearly.
And these DevOps Days, they are all around the world and all of the big cities.
around the world, and I am the president of the DevOps Day Series.
You can already see DevOps lies very close to my heart.
This is also why I have my own YouTube channel with over 250 videos
all around DevOps, architecture, platform engineering, and so on.
And I also blog a lot, tweet a lot, and I even write at the moment my
own book about the digital factory.
If you want to learn more about DevOps, platform engineering,
and so on, please subscribe to one of my social media channels.
In my projects, I work for different clients in different industries to
DevOps transformation, introduce platform engineering, and currently,
I'm the product manager of a platform project which we are doing with LGT,
which is a bank in Liechtenstein.
And, I will show you during this presentation a demo of
AI Augmented DevOps and how we have implemented that into that.
stay with me and let's now dive into the actual presentation.
One thing that is also potentially to you, what it also happens, of course, to
me is that the management, the business or the CEO, the CIO is coming to you
and say, Romano, we need to have AI because AI is a salute game changer.
We need to put AI into our development process.
And one good thing.
To do when you get such an ask is to ask, why do you would like to have that?
Why do you want to have that?
What is the root cause of that ask?
And usually what you get is, the following answer.
We want to have a faster time to market.
We want to get more value for the money and higher quality out of
it, which then leads to, of course, higher customer satisfaction.
Now, with that, what we need to do is we need to analyze our value stream,
because modern software development is a continuous process, which
you can see there in that infinity loop, across the whole value stream.
And of course, when you look at the value stream, then you have different practices
that you are doing in that very process.
Now, I already told you what we need to do is analyze that value stream and
for that we use value stream mapping.
And this is a very important technique to identify the bottlenecks.
And this is also Where we potentially can then use AI.
How do we do value stream mapping?
First of all, you need to bring all of the people that work across one
of these value streams, which produce one or many products, into a room.
You give them post its and you say, What are the steps that we
need from idea until production?
And you get that example here, feature definition, design, code, test, and so on.
Of course, in your value stream, this will be far more complicated.
Usually, the value streams, when I do value stream mapping, they are much,
much bigger than this small example.
After that, you say, okay, now let's have a look who is working in this step
or who is responsible for that step.
And you get product owner, architect, developer, and tester, and so on.
And then you say, okay, now we need to measure.
The efficiency of each one of these steps and we do that by defining the
lead time lt which is the time from step beginning until step end when
the next step can continue and it includes all of the waiting time.
And then you say, okay, and how much is really actually working time in there?
Which is the process time that you can see there, which is pt.
And then you say, okay, and what is the quality of that, step?
And that's the percentage complete and accuracy, the percentage C and D that you
can see there, which is a measurement.
Which says how much rework do we need to do, when, the quality is not good.
And for example, when you have 80%, like in the feature definition, it
means that in 20% of the cases, we need to go back to that step because the
quality of that feature definition was.
Good enough.
And then suddenly you get this picture here and immediately you
can see where the bottlenecks are.
For example, in code and test.
massive discrepancies between process time and lead time.
For example, in tests, we have a process time, actual work where the tester is
working of eight hours and the lead time is 336 hours, which is huge.
There is a lot of waiting time and a lot of inefficiencies in there.
Also, the percentage C and a complete and accuracy is 50%, which means.
In 50 percent of the case, we need to go back to that step
because the quality was not good.
When you have done that, you have identified potential bottlenecks
where you could use a bot.
I always say, think about it, because Potentially, also another solution
might be better and even adapting the process could be a simple solution,
but of course, you could use AI.
When you look at that, then you, you immediately see, Some of the spots
where you potentially can use AI in your whole development value stream
which are highlighted in there.
These are potential areas where you could use AI.
I also showed that in a other picture.
So to give you also a little bit of input, what possibilities are
out there for AI augmented DevOps use cases in a value stream.
So for example, in plan, we can use AI to analyze historic project data, Predict
risks, resource needs, and delivery times.
So that could be something.
And of course, there are more examples, as you can see there.
In code, you all know, co pilot, generate, refactor, debug,
and explain code, of course.
In build, you can do auto remediation of security vulnerabilities, for example.
In test, you can do an impact analysis of the changes and predict, what the
impact is of, of, of that change and then execute accordingly, test cases.
In Deploy, you can use AI to predict the impact of the deployment
and also monitor the deployment health and auto trigger rollbacks.
In Release, You can do continuous release verification and also again,
impact analysis in operate, you can detect and fix configuration
drift fully automatically.
And in monitor, you can do pattern recognition, anomaly detection, event
correlation, root cause analysis, and also self healing, by the way.
This is usually, what I just told you, this is called AI Ops, and
this is a technique that is used.
Also, quite long out there, which is absolutely ready to use already.
So you can see all of these use cases, but for these use cases, you
need to have the right foundation.
And now you ask, why do we need to have the right foundation?
Let's have a quick look.
Normally, what you have is different projects or products
that you are developing.
And you have different streams, value streams, that are developing these.
And this can be internal developed mixed with externals or only externals, but
all of them have their local development environment, which means you also have
quite a huge tool landscape in there.
So where the world at the moment is moving.
It's moving in the direction of platform engineering, which is an important thing.
What you want to have is one platform where all of these tools are built.
Which leads then to have a clear set of services and a clear set
of products that you can use or which the developer can use.
and this leads to standardization.
And this is very important when it comes to AI augmented operations
because that is the foundation.
And only with that, this whole thing scales.
Thanks.
The target operating model, which we need for platform engineering,
and of course, also for AI augmented DevOps looks like this.
at the moment, we have different product teams with all of the people in there
that are needed, and the cognitive load and the complexity is quite high.
And what we want to have is this target operating model, where we still
have the product teams, But they are much smaller, they also cover a much
smaller technical stack, while platform team, creates a self service platform,
with, of course, a bigger technology stack, but that is a product that
they are delivering product teams.
let's have a quick, different look on that.
So we have the platform team that develops, builds, maintains that
platform with different capabilities and different tools in there.
And they are enabling the product teams to develop their product.
The product teams, they are practicing DevOps, and they are building, running,
and maintaining their products.
It is not the platform team that maintains the product.
or operates their products, it is the product teams that the platform
team only gives to the product teams, the capabilities and the
tools that they need to use, or they want to use, for their products.
And this means that the product teams generate value for the
customer while the platform team generates value to the product teams.
When we look from an architectural perspective how this looks
like, then this looks like this.
You have, of course, your platform.
With CLI and, with, with, self service portal, and you have all of the tools, you
are not hiding away these tools, you just integrate these tools into your platform,
and that's a very important thing.
And then usually you have your internal developer portal.
This is something you usually, relate to backstage.
And many companies are using backstage.
But as you can see, this is only that tiny layer on top of that.
And this is also why some companies are not so satisfied with backstage,
because it is only that layer.
Bye.
Bye.
That layer below, these layers, these multiple layers, they are not
included in Backstage because what you need to have is, of course, also
provisioning and automation, you need to automate things, in there.
And what, what you also need to have is you need to integrate all
of these tools into your platform.
And this is something where I recommend to use adapters because.
When you integrate these tools, you never know when one of these tools is dying.
I even don't know how long GitLab will be on the market.
This is why we have a GitLab adapter.
And then on top of that, a unified integration block for repositories.
Which means we can integrate GitLab, GitHub, or whatever we want into that.
And This makes it much more easier to replace one of these tools.
This is the high level architecture of a platform.
And this leads us to a very important architectural principle
that I want to give you.
when you are building a platform, what you need to do is you need to
create a so called floating platform.
And a floating platform means that you are just plugging in all of the services,
all of the tools, all of the DevOps platforms, like GitHub, GitLab, and so on.
And you get this developer experience on top of that.
What you never, ever should do is duplicate a feature from one of the
tools or from one of the platforms below.
You just integrate that.
With that, you have a floating platform which floats with the tools because
when they are creating new features.
You float on top of them as soon as you are going to imp to hide anything
away, abstract anything away, or implement or duplicate a feature,
your platform will start to sink.
And that's a huge problem.
Always remember that you need to build a floating platform now, of course,
and I bring that picture again, many of you will say, but Romano, AI is
just that tiny box there, but nowadays AI is everywhere and it's huge.
Yes, of course, in your platform, it is just a capability, but when we zoom
into that box, Then it looks like this.
Off top, you have the product development teams that are
using your developer portal.
You also expose some CLIs and APIs.
And below that, and this is important, you have the application.
You have chatbots, which I will show you in a minute.
You have AI coding assistants.
You potentially have a knowledge management and so on.
Below that you have the tooling layer where you have prompt engineering,
error cases, vector databases and so on.
And below that you have the models, with a model hub, with enterprise
specific models, all of that.
And below that, You integrate all of the APIs of the Gen AI
infrastructure that is out there.
This is how you integrate, AI Augmented DevOps into your platform.
And with that, we are going to have a look how that looks like in a real platform.
Let's.
switch here.
What you can see here is, already that platform that we
have built together with LGT.
LGT is a bank in Liechtenstein and together with them, we have
built these Portal, now we are on the so called Sulu plane.
This is the portal that we use at Zühlke for our projects.
when we do client projects, we develop them, if we are
allowed, with this platform.
LGT has their own instance of platform.
But you can see already, this is what the developer usually sees.
You of course have, you have documentation and you can even
switch here to a AI assistant where you can ask questions like, how
can I create a cluster and so on.
We also have here the AI chat.
Quickly go into that.
Here you can see our AI chat.
This is also an important thing.
We developed that for Zühlke.
Looks similar to chat GPT, but this is a chat, that we rolled out at
Zühlke so that we can use chat GPT, or whatever LLM that we, have behind
that in a standardized and, also, a standardized and governed way.
And, of course, you, can type in, what is DevOps, for example, in here and
chat GPT, like you get the default.
The answer of that is very important to understand that.
Um, the employees don't need to know what kind of LLM or service is behind them.
And you can even replace these services quite easily when you have such a
platform that provides you with these services in a standardized government.
Now we go to the platform itself.
This is where the platform plane is.
is implemented.
so we implement the platform plane.
This is the product with, this platform plane.
I will not show you everything because there is a ton of,
things that I could show you.
There are also videos on my YouTube channel which explain
this platform in more detail.
We just go now to the AI cases that we started to implement in here.
We have here the registry, which is the register, the container registry.
we go here, for example, to these Argo CD operator.
we can click here on one of them and we see here the
layers of this container image.
And what we got from the developers as feedback is that We have some difficulties
with our containers, could you help us?
The platform team itself is too small to really help them
with their container images.
But what we could do is show them layering.
And add a button to, to, to analyze that and what you can see here is
just a large language model, which is optimized for container image analysis.
And it's very awesome.
And this is also one part of how you can leverage AI augmented DevOps.
Into your platform and the developers really, like that, they, they're
using that quite off quite a lot.
We are now going to the platform itself.
So here, we have the Kubernetes cluster with, all of the
applications that are running.
And again, because when you have such a platform, you have all
of the log files also in there.
And then you can just analyze such a namespace that you can see here.
You see how this chatbot is analyzing these log files that you have in there.
This was unfortunately not a very good example.
I go to that one.
It's also not so good example.
What you can see already is that it is quite awesome to have these, These,
these, analysis with AIs in there.
We even go one step further and enable also the developers
to use these AI use cases.
For example, you have here the Azure OpenAI, that they can use.
So this is the service catalog where they can add services to their data.
application.
You also have here a whole LLM platform that we can give them so that they can
build on top of that their AI use cases.
And this is exactly what some of the developers have done.
You see here that AI space and that we have here and also this Zen AI.
This is the home of the reference finder.
this is this application that we just have implemented.
Unfortunately, I'm not allowed to type anything in here because
most of the things that you have in here are confidential.
But what we can do as, as Zulke employees, find project references,
which we then can use, with in our bits.
And this is really an absolute awesome thing.
They have built on top of the platform that we have these reference binders in.
Nearly no time because all of the services were already there.
The second thing is this at Zen AI here, they want to improve the
whole bidding process, at Zilke.
And, we have here things like spin sales analysis, summarizer,
improved text, AI chat with search, workshop assistant, and so on.
These are all AI agents with specific system prompts that are
highly optimized for our use case.
I quickly just go into the company analyzer and here I
just search for the company.
Here is a specialized prompt behind a system prompt, which goes, of
course, also to the internet, gathers the information and gives that
information back to the company.
In a structured way, which you can see here, including the references.
These use cases are very powerful, and it enables your company to do AI augmented
DevOps and to do these AI, use cases.
With that, we are going to continue.
Back to presentation and we are closing now with a summary.
So what I just told you is we, when you want to go into the direction
of AI augmented DevOps with platform engineering, It is very important
to analyze your value stream.
Not everything needs to be solved with AIA.
Sometimes it is far better to just go and improve the
process or rethink the process.
This is why it is absolutely necessary that you understand your value
stream, that you are doing value stream mapping, analyze where the
problem areas are, and then think about a potential solution first as,
a process adaption and only last.
think about using AI, but if you want to go into this direction, then you need
to have a target operating model with platform engineering, you need to build
up a platform engineering team that Builds a self service platform, which should
be a floating platform, which floats on top of all of the tools and all of the
platforms, cloud providers, everything.
It needs to float on top of that.
In there, AI is just a capability.
And of course, that's quite a huge capability, as we just saw.
But having this AI capability really, and encourages all of the
people to use these AI use cases in the whole DevOps lifecycle.
And by having that you can implement all of these use cases.
What I did is I showed you some of the use cases.
Most of the use cases are still under heavy development.
For example, currently, we are thinking about, I showed you that lock analysis.
We are thinking about having a second button where you could click
and say, okay, fix that problem and create a pull request out of that.
And, a third thing that we are currently also, developing is.
A copilot for Visual Studio, which uses the AI capabilities of that
platform that you don't need to pay that yearly subscription.
So you see, by having such a platform, this is an absolutely huge enabler for the
development team, but also the business.
And what you can clearly see is that we are entering the age of industrial The
software development these platform teams.
They are building the platform which enables The development team to do ai
augmented devops, but it also enables The business, because we start, we get
a faster time to market more value for the money and the higher quality, and
it drives the whole, enterprise in the direction of, AI driven, innovation.
So with that, thank you very much and see you next time.