Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello, everyone.
My name is Chetan Nepatak.
I am currently working as the Chief Product and Technology Officer for Analytica Data Labs.
In today's session, I intend to spend 25 to 30 minutes talking through some of the Gen AI abilities that we've been building in our product LEAPS, which is essentially a low-code, no-code platform. I'll give a bit more context about that in just a few minutes.
But before I get started, a little bit of background and introduction about myself. I have been working in the space of AI, ML, and deep learning for the last 12 years now. By education, I've done two master's degrees: an MBA from a business school in Paris, and a master's in advanced computer science with a focus on AI.
I started off with computer vision, went on to build deep learning models for price prediction engines, and then a few other models in the fintech space, which were really about doing evaluations using machine learning models for both insurance and loans. For the last one and a half to two years, I have been working on building generative abilities into the product, and that's been the focus for the last 18 months or so.
My discussion today will revolve around one of the products that we have built and rolled out, which is both stable and scaling rapidly, and one of its abilities. I will talk through some of our experiences there, and some of the best practices that have worked for us as we moved from POC to production, which is a very apt topic for discussion today.
So moving on, there are four essential talking points today. I'll start off by giving the context of the app, then move on to the various components that go into the app architecture, or the abilities architecture. Then a little bit of discussion on prompts and RAG, and finally a quick note on what's coming next and the things that we're working on.
So that's the agenda for this discussion. Coming to the context of the app: like I mentioned earlier, the app is really built for the citizen data scientist, for people who are non-technical. This is a low-code, no-code platform which does end-to-end data science. You can not only build models; it has pre-modeling abilities, the modeling abilities, and of course the post-modeling abilities, which are deploying and serving the models. And we are continuously enhancing that as we integrate with the larger ecosystem in this space.
The app that I'm talking about today in this session is the RAG-based inference engine. In fact, these are multiple RAG pipelines: there is a parent RAG pipeline which integrates with child pipelines, and the parent RAG pipeline really is the orchestration. This inference engine is akin to a copilot, and it tries to take the user through various aspects of bringing down the barriers to learning and the barriers to working with the platform at large.
We are trying to address three broad umbrella questions, or abilities, with this copilot. The first one is "What can I do next?", which is nothing but the recommendation engine, again a RAG pipeline. The second one is "What else can I do?". This is basically the scenarios engine, where the LLM and the associated RAG pipeline come to an agreement with the user, in an assisted mode, as to what the intent is in terms of the scenario, and then the backend workflow builds those scenarios for the user. And lastly there is "Do it for me", which is our processing engine, again different abilities in the product coming together in a RAG pipeline. The processing engine essentially lets the user give very few inputs and either generate models, do any kind of preprocessing, or even build charts and dashboards. So these are the three broad abilities of the copilot.
And then there are some abilities which, again, are part of these larger three. So that's the context of the app, and it is this app that we've been focused on for the last one year, evolving our RAG architectures to enable these copilot abilities.
How did we get started? Obviously, we got started like everybody else did, building simple RAG architectures or pipelines. We quickly had to move on to making sure that whatever we are building is relevant to the user and has the right responses, so we had to move to a more evaluation-driven, or metric-driven, development. Gradually we built our evaluation landscape, which is still growing as more and more tools and metrics become available for every component that is part of the entire RAG pipeline, and we are gradually adopting them.
But as it stands today, there are three key moving parts to our metric-driven development, to our entire evaluation landscape. In terms of approach, we are very reference-based, which means that everything we're doing is based on ground truth. And to do that, you obviously need rock-solid, gold-standard eval data. We are continuously building that, completely labeled and annotated by humans. In terms of the scope of evaluation, we do end-to-end and, of course, component-based evaluation.
When it comes to component-based evaluation, we right now focus more on the retrieval and the generative components, but we are moving towards building more evaluation metrics, including unit testing, as we bring in more complexity and more components into our pipeline. The generative and retrieval metrics are built using the RAGAS framework. For retrieval we use the standard metrics there, an F1 score, which is basically a good balance between precision and recall, and for generation we base our evaluation on faithfulness and answer relevancy.
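As an illustration of what this looks like in practice (a minimal sketch, not our production harness; the example rows, column names, and metric choices are assumptions and vary by RAGAS version):

```python
# Hedged sketch: evaluating retrieval and generation with the RAGAS framework.
# Column names and metric imports differ slightly across ragas versions.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

eval_rows = {
    "question": ["How do I deploy a trained model?"],
    "answer": ["Open the model registry, pick a version, and click Deploy."],
    "contexts": [["Models are deployed from the registry by selecting a version..."]],
    "ground_truth": ["Deploy from the model registry by selecting a version and clicking Deploy."],
}

dataset = Dataset.from_dict(eval_rows)

# Retrieval quality: context precision/recall (an F1-style balance);
# generation quality: faithfulness and answer relevancy.
report = evaluate(
    dataset,
    metrics=[context_precision, context_recall, faithfulness, answer_relevancy],
)
print(report)
```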
The other important element of our metric-driven development is the design considerations. This is a key point, or guiding principle rather, which helps us figure out what kind of metrics we need to bring in, and also what kind of components we need to enhance, or what additional layers are needed. In terms of design considerations, we use three. The first is deterministic versus non-deterministic, which basically means how much we are going to be using large language models in each of our components. Currently, two of our components use large language models, and we gradually evaluate that; how many deterministic and non-deterministic components we have drives our evaluation landscape as well.
Next, of course, is turns, which is very important. When we started off, we were very boundaried; we did not allow an open conversation. Gradually we have moved from single-turn conversations to multiple turns, and eventually we will open it up into very free-flowing text. We are still a little boundaried, but we are a multi-turn conversational copilot. That's what the current state is, and we are continuously testing that.
the third important, design consideration is the prompt flow, based on, again,
whether a single turn, multiple turn, the complexity, the use case, of the copilot,
and as we build in more and more abilities in copilot, I've spoken about the three
umbrella, but then within that, there are a lot of those smaller abilities, that
determines what is the kind of prompt engineering and prompt flow that will be
needed, whether these are complex queries which needs decomposition, or they, need
any other query enhancement treatment.
based on, that we decided, what kind of enhancement or component
we will be needing, and what metrics we will be needing.
So this is really our evaluation landscape and how we go about taking these three pillars to drive our metric-driven development.
In terms of how we progressed on this: we started off with eyeballing at the POC stage, and throughout the POC stage all the teams were doing was eyeballing. As we were doing that, our experts, which is our analytics team, started to use the eyeballing outputs and the pairs of query, context, and response to build the eval data. So we started building the eval data right from our POC stage. We also used certain synthetic data generators to build those pairs. Then, as we developed a larger eval dataset, we moved on to doing structured, supervised evaluation.
And now, of course, we're trying to move to LLM as a judge. This, of course, needs very sophisticated prompt engineering, and once we achieve this, we'll have full instrumentation, or automation, of our evaluation. These are the three prominent stages in our evaluation journey. We are just about starting to use LLM as a judge, but we continue to use eyeballing, and our supervised, ground-truth evaluation model is working well for us.
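To give a flavour of what LLM as a judge means in code, here is a minimal, hedged sketch; the judge model, rubric, and scoring scale are placeholders, not our production prompt:

```python
# Hedged sketch of an LLM-as-a-judge grader over ground-truth eval pairs.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading a RAG answer against a reference answer.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
Return a single integer score from 1 (wrong) to 5 (fully correct and faithful)."""

def judge(question: str, reference: str, candidate: str) -> int:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, candidate=candidate)}],
        temperature=0,
    )
    return int(resp.choices[0].message.content.strip())
```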
So that was around evaluation, which is really the big piece, and it takes a lot of our time. The next important element, or pillar, is data preprocessing. We started off with various chunking methodologies or strategies: fixed chunking, content-based chunking. But we settled on semantic chunking because, when we did our comparisons, semantic chunking was working best for us. It does depend on the kind of use case and the source data that you have; you could potentially be using a hybrid chunking strategy, which might just work better.
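For illustration, a minimal hand-rolled sketch of semantic chunking might look like the following; the embedding model and similarity threshold are assumed values, not our tuned configuration:

```python
# Hedged sketch of semantic chunking: split where sentence-embedding similarity drops.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_chunks(sentences: list[str], threshold: float = 0.6) -> list[str]:
    embeddings = model.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        # Cosine similarity between consecutive sentences (embeddings are normalized).
        sim = float(np.dot(embeddings[i - 1], embeddings[i]))
        if sim < threshold:          # likely topic shift -> start a new chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```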
Parsing is, of course, the most fundamental thing. That's where everything starts. Depending on what kind of documents you have, whether those are code files, webpages, PDFs, Word documents, or even images, you would be using different parsing algorithms or tools. We've tried many, and our current tech stack is inclined more towards the parsers available from the Llama (LlamaIndex) ecosystem.
Sparse and dense retrievers: we are starting to feel that we will be needing dense retrievers now. Sparse retrievers have worked for us, TF-IDF and BM25 are what we have used so far, but we started to get some vocabulary mismatch problems, and our teams have now started the next POC of putting dense retriever components into our RAG pipeline.
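As a sketch of how sparse and dense scores can be combined to soften vocabulary mismatch (the corpus, models, and mixing weight here are illustrative assumptions, not our pipeline):

```python
# Hedged sketch: hybrid retrieval mixing BM25 (sparse) and embedding (dense) scores.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer
import numpy as np

docs = ["How to deploy a model...", "Chart and dashboard builder...", "Data preprocessing steps..."]
tokenized = [d.lower().split() for d in docs]
bm25 = BM25Okapi(tokenized)

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

def hybrid_scores(query: str, alpha: float = 0.5) -> np.ndarray:
    sparse = np.array(bm25.get_scores(query.lower().split()))
    dense = doc_vecs @ encoder.encode(query, normalize_embeddings=True)
    # Min-max normalize each score list before mixing.
    def norm(s):
        return (s - s.min()) / (s.max() - s.min() + 1e-9)
    return alpha * norm(sparse) + (1 - alpha) * norm(dense)

print(docs[int(hybrid_scores("serve my trained model").argmax())])
```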
We have been focusing on metadata from day one. A lot of our abilities are driven by this metadata. When we say metadata, in the context of RAG it is obviously the metadata of the documents and the metadata of the chunks, and potentially metadata around prompts too, if you're doing good prompt versioning. But we also keep metadata around the context of the user, and that context is built into what we refer to as the organizational business map and the KPIs. So we capture that metadata too. All of this is in our metadata database, from where we pull in the context, which then gets fed, as I mentioned earlier, into our orchestration layer, and from where we do the routing to the various RAG pipelines.
So these are the four essential elements of our data preprocessing that we are currently engaged in, and we are now starting to experiment with bringing dense retrieval capabilities into the platform.
Query enhancement: we didn't do much of it initially, but we wanted to ensure that we have the best practices in place to make sure that the intent is fully understood. As we opened up and things became more conversational and the conversation boundaries were loosened, it became clear that enhancements would be needed to break down complex queries, to decompose them to ensure that the intent becomes very clear. So we do query decomposition, and we use a large language model there.
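A minimal sketch of LLM-based query decomposition, with a placeholder model and prompt rather than our actual ones, might be:

```python
# Hedged sketch: break a complex query into self-contained sub-questions with an LLM.
from openai import OpenAI

client = OpenAI()

DECOMPOSE_PROMPT = """Break the user question into the minimal set of
self-contained sub-questions needed to answer it. Return one per line.

Question: {question}"""

def decompose(question: str) -> list[str]:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": DECOMPOSE_PROMPT.format(question=question)}],
        temperature=0,
    )
    lines = resp.choices[0].message.content.splitlines()
    return [line.strip("- ").strip() for line in lines if line.strip()]
```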
The other thing that we have started to do now is HyDE (hypothetical document embeddings), which again uses a large language model. It goes a little further than using a supervised encoder: it builds hypothetical documents, and those documents can then be classified into a class of documents, from where we generate pairs of queries which are very relevant to that particular chunk. So chunk classification is something that we intend to build, not only using the document chunks that we have through our semantic chunking, but also by using the HyDE technique.
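For illustration, a hedged sketch of the HyDE step, generating a hypothetical passage and embedding that instead of the raw query (model names are placeholders):

```python
# Hedged sketch of HyDE: generate a hypothetical answer document for the query,
# then embed that document and use its vector for the search.
from openai import OpenAI
from sentence_transformers import SentenceTransformer

client = OpenAI()
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def hyde_embedding(query: str):
    hypo = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user",
                   "content": f"Write a short passage that would answer: {query}"}],
        temperature=0.7,
    ).choices[0].message.content
    # The hypothetical document's embedding drives the vector search.
    return encoder.encode(hypo, normalize_embeddings=True)
```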
Routing: we've been using this since day one, because we have different databases. So we have a routing layer, and this routing layer does its job based on which database we need to send the query to. The routing layer is responsible for understanding the query, decomposing it as queries become more complex, understanding the intent of the query, and then routing it to the correct place.
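As an illustrative sketch (not our actual router, whose rules and labels differ), a routing layer that classifies a query to one of the three databases could look like:

```python
# Hedged sketch of a routing layer: classify a query to one of three databases.
from openai import OpenAI

client = OpenAI()
ROUTES = ["product_docs", "user_history", "context_db"]  # illustrative labels

def route(query: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content":
                   f"Classify this query into one of {ROUTES}. Reply with the label only.\n\nQuery: {query}"}],
        temperature=0,
    )
    label = resp.choices[0].message.content.strip()
    return label if label in ROUTES else "product_docs"  # safe fallback
```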
Now, as we do all of these things, one of our challenges is to ensure that the complexities we're building in don't impact latency.
So our DevOps team and our infrastructure team are an integral part of doing everything on the RAG pipeline, building the RAG abilities, and deciding what kind of deployment architectures and infrastructure we can use. We currently are on AWS. The product is on AWS, and we are increasingly evaluating every service in the Gen AI space on AWS, including the serverless abilities that AWS has to offer, to ensure that we offer good latency.
In terms of our go-to-market, we are very B2B, a high-touch model, but our abilities are low touch, and we don't run a hundred percent SaaS-based model. The application is sometimes behind the firewalls of the customer. So we have to be very clear about the various architectures that are needed based on the usage and concurrency, and also the associated cost. Understanding those trade-offs is very important, and we spend a lot of time on that and on building the infrastructure architecture around it. So that's the other important piece and element of our larger pipeline, which is continuous enhancement of the queries.
The next obvious candidate after query enhancement, which we worked on significantly and continue to work on, is the retriever and re-ranking. We had re-ranking from very early days, but more retriever abilities are gradually being built.
We generally classify these abilities in three broad categories: retriever, post-retriever, and generation. And while there are a lot more techniques here, I've only mentioned on the slide the ones that we either are using or intend to start building immediately.
Filtered vector search: we've had this since day one. It is simply a filter on top of your vector search. That is standard now in most RAG architectures, and we continue to use it.
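A minimal sketch of filtered vector search, here using Chroma purely for illustration, with made-up collection and metadata fields:

```python
# Hedged sketch: similarity search restricted by a metadata filter.
import chromadb

client = chromadb.Client()
collection = client.create_collection("product_docs")  # illustrative collection
collection.add(
    documents=["How to build a churn model...", "Dashboard sharing options..."],
    metadatas=[{"doc_type": "modeling"}, {"doc_type": "dashboards"}],
    ids=["doc1", "doc2"],
)

# Vector search that only considers chunks whose metadata matches the filter.
results = collection.query(
    query_texts=["share my dashboard"],
    n_results=1,
    where={"doc_type": "dashboards"},
)
print(results["documents"])
```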
Re-ranking: we've tried both bi-encoder and cross-encoder approaches, and we settled on a cross-encoder. We found that the cross-encoder works better in our use cases, simply because it compares two sentences as a pair. So as long as you have good training data of pairs, a cross-encoder is the way to go. We currently use the Cohere reranker, which works on a cross-encoder. We do still have certain bi-encoders working, but I think gradually we're going to phase them out.
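For illustration, a hedged sketch of cross-encoder re-ranking; we use the Cohere reranker in production, so the open-source cross-encoder below is only a stand-in:

```python
# Hedged sketch: re-rank first-stage candidates with a cross-encoder that scores
# each (query, document) pair jointly.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_k: int = 3) -> list[str]:
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]
```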
Hierarchical indexing, again, seems to have been adopted well and has become a standard. This significantly enhances the precision of the RAG application. In hierarchical indexing, as the name suggests, you organize the data in a hierarchical structure, with categories and subcategories based on relevance and relationships. The retrieval process focuses on those relationships and that relevance, and it processes the retrieval through what we call a parent and child relationship, with child nodes. We use hierarchical indexing extensively, and that's working for us.
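A minimal sketch of the parent-child idea, searching small child chunks for precision and returning the larger parent section for context (the documents and model here are illustrative):

```python
# Hedged sketch of parent-child (hierarchical) retrieval.
from sentence_transformers import SentenceTransformer
import numpy as np

encoder = SentenceTransformer("all-MiniLM-L6-v2")

parents = {"p1": "Full section on model deployment ...",
           "p2": "Full section on dashboards ..."}
children = [("p1", "Deploy from the registry."),
            ("p1", "Serving endpoints auto-scale."),
            ("p2", "Dashboards can be shared by link.")]

child_vecs = encoder.encode([text for _, text in children], normalize_embeddings=True)

def retrieve_parent(query: str) -> str:
    qv = encoder.encode(query, normalize_embeddings=True)
    best_child = int(np.argmax(child_vecs @ qv))      # match on the small child chunk
    parent_id = children[best_child][0]
    return parents[parent_id]                          # return the larger parent context
```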
CRAG, corrective RAG, whichever way you pronounce it, is a new technique. We've started to experiment with this, so we have POCs currently running on it. CRAG brings in another component, a lightweight retrieval evaluator, which self-evaluates the quality of the documents retrieved and provides a confidence score. We're hopeful that this particular component, together with LLM as a judge, is going to help us do the full instrumentation of our evaluation landscape.
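As a hedged sketch of the corrective idea: a lightweight evaluator scores the retrieved chunks, and low confidence triggers a corrective action; the evaluator model and threshold below are stand-ins, not the CRAG paper's components:

```python
# Hedged sketch of a corrective-RAG style check on retrieved chunks.
from sentence_transformers import CrossEncoder

# Stand-in for a lightweight retrieval evaluator.
evaluator = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def corrective_retrieve(query: str, chunks: list[str], threshold: float = 0.3):
    scores = evaluator.predict([(query, c) for c in chunks])
    confident = [c for c, s in zip(chunks, scores) if s >= threshold]
    if confident:
        return confident, "use_as_is"
    # Low confidence on everything retrieved: signal a corrective step,
    # e.g. rewrite the query or fall back to another search.
    return [], "rewrite_query_or_fallback_search"
```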
Like I said, we want to move towards as much automated evaluation as possible. We will still be focused on making sure that our eval datasets are robust, but having produced those eval datasets, we want the rest of the pieces to be as automated as possible.
Fine-tuning, again, is very helpful. This is on the generation side: we keep the LLMs that we use fine-tuned, based on whatever the context of that LLM is, whether it's for generation, for query decomposition, or even some of the other abilities that we're building that would require the use of LLMs. That's a standard practice: wherever we have data, we fine-tune with it; wherever we don't, we generate synthetic data and fine-tune with that.
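For illustration, a sketch of packaging query, context, and answer pairs, real or synthetic, into a chat-style JSONL file for fine-tuning; the schema shown is one common format and an assumption, since the exact format depends on the provider and model being tuned:

```python
# Hedged sketch: write (query, context, answer) pairs as chat-style JSONL for fine-tuning.
import json

pairs = [
    {"query": "How do I schedule a model retrain?",
     "context": "Retraining can be scheduled from the pipeline settings page...",
     "answer": "Open pipeline settings and set a retraining schedule."},
]

with open("finetune.jsonl", "w") as f:
    for p in pairs:
        record = {"messages": [
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context: {p['context']}\n\nQuestion: {p['query']}"},
            {"role": "assistant", "content": p["answer"]},
        ]}
        f.write(json.dumps(record) + "\n")
```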
So those are the retrieval and re-ranking abilities that we have and are currently using, plus the one I spoke about, CRAG, that we intend to use going forward.
So that was about the various principal components that we have in our RAG pipeline. At a very high level, the RAG pipeline looks like the one I have on the deck: we have the queries, then we have query transformation, and we are routing across three different databases. We have the product docs database, we have the user history, and we have the context DB. The context DB is really the context of the business, where we have the business maps, the processes, and the features, all of which is used to develop analytical models.
We're currently using large language models in two places, like I said: one for query decomposition, where we're using chain-of-thought prompt engineering, and one where we're using prompting for the augmentation process.
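A minimal sketch of what the augmentation step looks like, with illustrative wording rather than our actual template:

```python
# Hedged sketch: stitch retrieved context into the generation prompt (augmentation).
AUGMENT_TEMPLATE = """You are the LEAPS copilot. Answer the user's question using
only the context below. If the context is insufficient, say so.

Context:
{context}

Question: {question}
Answer:"""

def build_augmented_prompt(question: str, retrieved_chunks: list[str]) -> str:
    context = "\n\n".join(retrieved_chunks)
    return AUGMENT_TEMPLATE.format(context=context, question=question)
```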
So the key takeaway really is that prompt engineering and RAG are really tied together. Obviously, we don't want to over-engineer prompts wherever that's not required. But the key takeaway is that whichever component uses a large language model, that is where we will be using sophisticated prompt engineering, and that is how these two are really married to each other. A simple RAG probably uses prompt engineering only in the augmentation process, but as you bring in more and more LLMs for different components, as I spoke about, sophisticated prompt engineering is absolutely the bedrock of a relevant and more correct RAG pipeline.
In terms of our next steps, we are trying to move to a modular RAG architecture. This is also a need coming from our customers, because we are going to be doing subscription-based pricing. So there are different abilities, and these abilities need to be dockerized and deployed in a microservices architecture, and we are making sure that we are ready for that. As each of these components and services grows, not only in terms of usage but also in terms of the data they're handling and the kind of responses they need to generate, these services need to take on a life of their own, and we would need to move towards a modular RAG architecture.
One of our challenges has been versioning of prompts, so we are evaluating different products where we can have end-to-end versioning of prompts, also because, like I said, we are putting more and more LLMs into different components of the RAG architecture.
We are almost ready with unit testing; we're using promptfoo for that, and it has become part of our development lifecycle and is currently running.
It's almost a practice now. And then lastly, we're looking at agentic abilities wherever they are needed. Like I said earlier when I was describing the application, there is the parent orchestration engine, the inference engine, and right now it's built on custom rules. But we intend to bring in the use of other tools as well, to make sure that context is brought in from other data sources, and we have agentic abilities there.
So those are some of the next steps that we intend to take. And that was a high-level view of what our application does, the current architecture of our RAG pipelines, and some of the learnings that we have had working on these abilities for the last one year. So I think I'm back on time. I hope this session was useful and that the audience got some important pointers from it. I will be very happy to take questions whenever the opportunity arises. Thank you so much for being here and listening.