Transcript
This transcript was autogenerated. To make changes, submit a PR.
Here's the too-long-didn't-listen list. The whole point of this talk is not to tell you exactly what to do, it's to show you system patterns I've seen work well and to give you ideas. The key takeaways are going to be: don't over commit, things are improving rapidly; build out a small suite of working examples for others to use; empower teams to self-service; ensure that it's easy for teams to do the right thing; and know that there are limitations to all of this.
So, don't over commit. Things are improving rapidly. Between starting this script and recording it, the best
model changed three times and I didn't
even know about the local-only architecture for non-technical folks. I'll be describing that later. What this
means is it's better to focus on the style of problem you want to
solve and the building blocks, rather than
over committing to any one vendor or technology.
Different use cases are going to benefit from different language models. Don't waste opportunities by enforcing that your teams have to use a single LLM. They will need different tools for
different purposes, and that's fine. So build
out a small suite of working examples.
Not everyone will understand how LLMs can be used to
make them productive, or what styles of problems are even suitable
for use with large language models or other techniques.
This can be mitigated by having a library of real examples for your business, which teams can interact with and then
emulate or improve upon for their specific challenges.
Also, a library of non-examples or forbidden uses, as in a documentation piece, will also help. If there are data sources which you're prohibited from using by law or contract, list them out. Make it clear, so that teams don't accidentally misuse any of this.
Empower teams to self-service.
So another architecture I'm going to be talking about is
retrieval augmented generation.
If you make it possible for teams to add their own personas and their own data to a centralized service, then this is going to reduce the barriers for teams to
try out using these tools. After all,
you want your teams to be spending the time improving how
they work, what data they have, rather than reinventing
all of this middleware. The main beneficiaries of
all of these LLM assisted tools
are going to be the less technical folks who
are unlikely to be able to code the machinery that makes all of these
tools work together. They can certainly provide the raw data
that makes it work. So, for example, support teams will have user manuals, runbooks, example tickets; all of these are really useful context to make off-the-shelf tools work so much better, to make them really shine. These teams should
be able to get the benefits without having to
learn how to program as well. Also make sure it's easy
for teams to do the right thing. The high performers
are already using these tools, regardless of what policies you've put in place. It's in your interest to make sure that they can utilize these tools in ways that won't cause your business harm, one way or the other. A great way of doing this is by having golden paths: have approved tools and paths for teams to get at the things that they actually want to use, with data that they can use with them. Just saying no isn't
going to stop folks. Now, the bit you've all been waiting for: the example architectures.
So these are the architectures I've seen work well
and will help your teams improve their own efficiencies.
First up, local processing. There are tools like Jan AI or Ollama (there'll be links at the end) where you can have a front end, provided your teams have suitable hardware. Say you have macOS and modern MacBooks: these run these tools fairly well. Windows, if they've got dedicated GPUs, say some rendering workstations, they'll work again fairly well. Your teams can interact with these tools all locally and can even process the data locally. You can also run it on top of documents that you provide.
Just taking one example, this Jan AI tool: you can actually disable all and any use of remote APIs, and you can load in LLM models that your folks have already decided to approve, and then they can be used for device-local processing.
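To make that concrete, here is a minimal sketch of device-local text generation, assuming Ollama is installed and running on its default port, and that a model (named illustratively here as llama3) has already been pulled onto the machine; the prompt and model name are placeholders.

```python
# Minimal sketch: ask a locally running model a question via Ollama's HTTP API.
# Assumes Ollama is running on its default port (11434) and that a model such
# as "llama3" has already been pulled; swap in whichever model you've approved.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",        # locally stored model; the request stays on this machine
        "prompt": "Summarise our password reset runbook in three bullet points.",
        "stream": False,          # return one JSON response instead of a stream
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])  # the generated text
```

Because the request goes to localhost, neither the prompt nor the documents behind it leave the device.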
So if you have strict data residency requirements,
you can just run it on your machine. The data never leaves your
machine, the documents never leave your machine. The results of the LLM run don't leave the machine.
But you can also use this a different way. Say, for example, you've set up the next architecture I'll show, or you've already set up a proxy and you've already got some approved tools. You can have these local agents call those tools rather than going off to some external third-party vendor.
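As a sketch of what that wiring might look like, assuming your internal proxy exposes an OpenAI-compatible endpoint (many gateways do); the URL, token handling, and model name below are placeholders for whatever your organisation has actually approved.

```python
# Sketch: point a standard client at your own approved proxy instead of an
# external vendor. Assumes the proxy speaks the OpenAI-compatible chat API;
# the base URL, API key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-proxy.internal.example/v1",  # your approved gateway, not a third party
    api_key="internal-token",                           # however your proxy handles auth
)

reply = client.chat.completions.create(
    model="approved-model",  # whatever your organisation has signed off
    messages=[{"role": "user", "content": "Draft a short summary of this ticket."}],
)
print(reply.choices[0].message.content)
```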
Another advantage of these local-only tools: yes, they are slower than running on dedicated, better hardware or using a remote API where a company will do all that for you, but they enable folks to actually experiment and see whether this style of tool will actually work for them, whether it will actually give them any benefit, without you having to invest very much in it at all. You just have to download it and run it.
Now, the architecture you've possibly been waiting for: retrieval
augmented generation. Rather than having
to spend ten or a hundred million dollars training
your own AI model, you can use something
off the shelf. There are many out there. I'm not going to
name names or recommend any specific ones because they'll all be out of date
by the time this airs. But this
is the architecture. This means that you can use somebody else's model,
but make it relevant for your
business, your data, whatever it is. So what this architecture looks like is: the user, through whatever it is (maybe it's Jan, maybe it's some custom front end, it doesn't really matter), puts in their query. Heck, it could even be a chatbot. And that gets sent off to your
server. Whatever it is, that server will
then take that query and send it over to what's
commonly a vector database. This is just doing a
search for relevant chunks of documentation
which you've already put there. So this could be your internal wikis,
this could be runbooks, this can be whatever it is; it can be many, many PDFs. And all this stage is doing is sending you back chunks of documentation that might help.
So if the user is asking about how do I do business process X for customer Y,
this might bring back the runbook for that process.
It might bring in some extra information about that customer.
So this then gets sent back to your server.
And then you will put together the prompt where
you tell the language model what to do. You'll put
in the query, you'll put in this extra context, you'll send it over to
the thing to do text generation and you'll get back your response.
So, for example, the prompt might be: you are a helpful service desk employee, help the users as much as you can based only on the information that is provided in the context. Then you might list the context, so it would say: this is customer X, this is the relevant process document. And then after that you would put the user's question, like: how do I do this process for this customer? That gets sent back, and then the person at the start can use
it. This architecture can be quite nice because, if you have this server in the middle (so let's pretend that you've got some sort of chat system), you could allow teams to add different personas, and the personas will tell your API server to use a different prompt, a different model, a different set of data to enrich these queries.
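As a tiny illustration of what a persona might boil down to on the server side, here's a hypothetical registry; the persona names, prompts, model names, and collection names are all made up, and a real service would load these from configuration rather than hard-coding them.

```python
# Hypothetical persona registry: each persona picks its own prompt, model and
# document collection, so teams can self-serve without touching the middleware.
PERSONAS = {
    "service-desk": {
        "system_prompt": "You are a helpful service desk employee. Answer only from the provided context.",
        "model": "general-purpose-model",
        "collection": "runbooks-and-manuals",
    },
    "sales-support": {
        "system_prompt": "You draft customer-facing summaries from the provided context.",
        "model": "general-purpose-model",
        "collection": "customer-docs",
    },
}

def settings_for(persona: str) -> dict:
    """Look up the prompt, model and data set a given persona should use."""
    return PERSONAS[persona]
```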
Again, teams generally will be able to say: ah, I want to do this kind of thing, here's some examples. You can work with them to get the prompt, and they'll probably be able to go: yeah, here is my big stack of documentation that I think is relevant for this. How exactly the vector database works to pull out the relevant pieces of those documents depends on what system you're using. But just think of it as: it goes off, finds relevant information, brings it back, and then adds it all together for the language generation.
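To make the shape of that flow concrete, here is a stripped-down sketch of the server in the middle. The retrieval and generation steps are stand-in functions you would swap for your actual vector database client and model endpoint; everything named here is illustrative.

```python
# Stripped-down retrieval augmented generation flow. The two helper functions
# are stand-ins for whatever vector database and model endpoint you actually use.

def search_chunks(query: str, top_k: int = 5) -> list[str]:
    """Stand-in for the vector database: return relevant chunks of your docs."""
    # In a real system this would embed the query and search your wikis,
    # runbooks, PDFs and so on, returning the closest matching chunks.
    return ["(relevant runbook excerpt)", "(relevant customer note)"]

def generate(prompt: str) -> str:
    """Stand-in for the text-generation call, local or remote."""
    return "(model response)"

def answer(question: str) -> str:
    # 1. Retrieve chunks of documentation that might help.
    chunks = search_chunks(question)
    context = "\n\n".join(chunks)
    # 2. Assemble the prompt: instructions, then context, then the user's question.
    prompt = (
        "You are a helpful service desk employee. Help the user as much as you can, "
        "based only on the information provided in the context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    # 3. Send it for text generation and return the response to the user.
    return generate(prompt)

print(answer("How do I do business process X for customer Y?"))
```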
So now onto the limitations.
These large language models, LLMs, can make stuff
up. This is commonly called hallucination.
They are a piece of software designed to
make plausible text. What that means is
that they have no concept of truth or lies,
they just generate text that looks plausible.
This can mean that you end up in a situation where, if you
don't provide your own process document, it will just make something
up that seems plausible.
And that's a real challenge. Make sure
that whenever you're using these tools, you have a human over the loop. What I mean by that is that humans
are checking the output of these tools before
they go any further. So if you're using it to improve your documentation,
have someone review that. So, outdated knowledge. Once these models are trained, they typically don't get updated with fresh information for a while, or at all even. So you might be dealing with stale things like: oh, this library used to work this way, but now it works another way. The model doesn't know that; it's going to tell you the old way of doing things.
These tools are also, because of, well, what I've just said, generally better for templates or skeletons of things rather than fine detail.
So one example of this might
be if you want a pitch deck for
a specific industry, it can give you the broad strokes,
but you're going to need an expert to put in those fine details,
the things that actually make it relevant and
particularly useful. You can use it for things
like make me a bash for loop,
for example. I can never get that right. I can have the
bot do that, and then I fill in my specific logic
that I actually want. These models can get very expensive.
So if you're trying to buy equipment
so that you can run it on your own hardware at sufficient
speed, that can get very expensive. If you
can even get the hardware at all. There are very long waitlists for
some of this equipment. If you're using some
of the cutting edge models, they will essentially bill
you by token, which is roughly two
or three characters, which looks like a really small number.
But again, if you're enriching your queries with large documents, that can get very expensive very quickly. So estimate your costs and choose appropriately.
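As a back-of-the-envelope sketch of that estimation, with every number below made up for illustration rather than taken from any real vendor's price list:

```python
# Back-of-the-envelope cost estimate. All numbers here are illustrative,
# not real vendor pricing; plug in your own document sizes and rates.
chars_per_token = 3             # rough rule of thumb: a token is a few characters
doc_chars = 50_000              # size of the context attached to each query
queries_per_day = 2_000
price_per_million_tokens = 5.0  # hypothetical input price in dollars

tokens_per_query = doc_chars / chars_per_token
daily_cost = tokens_per_query * queries_per_day * price_per_million_tokens / 1_000_000
print(f"~${daily_cost:,.0f} per day just for the enrichment tokens")
```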
And again, this is also fast moving. The models and the vendors are evolving and changing so rapidly that it's unlikely that the model you would choose today is the one you would choose in a month, or in two months, or even next year. So you want
to make sure that you have flexibility. You probably don't want to sign large
upfront contracts. You probably don't want to sign large long-term contracts. No one knows who's going to be the top performer in even six months. There are a lot of interesting things happening, which is good, but it also
makes it a real challenge. Key takeaways once again: don't over commit, things are improving rapidly.
Build out a small suite of working examples for folks to
build on top of. Empower your teams to self service.
Ensure that it's easy for teams to do the right thing.
These tools have limitations. And those were some example architectures that you can use.