Conf42 Platform Engineering 2024 - Online

- premiere 5PM GMT

Building a Serverless GenAI Platform


Abstract

Natural language is the new programming language. It allows even non-technical people to create powerful generative AI applications. Learn how to build a safe, secure self-service GenAI platform that automates infrastructure, access control, guardrails, and observability for your teams.


Transcript

Hello, everyone. Welcome to Conf42, the Platform Engineering edition. I'm really excited to share our experiences and journey so far in building a serverless generative AI platform. My name is Murali Mallina, and I'm Chief Technology Officer at SoftRAMS. I've been really lucky to have been building teams and software solutions for more than 25 years, the last seven of those at SoftRAMS itself. I'm also the co-founder and CEO of a five-year-old nonprofit called Teaching for Good. It's a very unique edtech nonprofit: it empowers anybody who is interested to teach, train, mentor, or coach, and at the same time you can use the platform to raise funds for a nonprofit of your choice. In other words, we are a nonprofit, and we support raising funds for any nonprofit of your choice by leveraging your skills and passion to teach and mentor others. I've also had the great privilege of collaborating and working with exceptional teams worldwide, in Germany, the UK, France, India, China, Russia, and of course the US, and have built mission-critical systems in telecom, supply chain (particularly the auto industry), and healthcare, working with federal agencies. My favorite, and pretty close to my heart, is edtech. I'm not that active on Twitter, so please hit me up on LinkedIn if you want to connect and chat about pretty much anything.

I've been working with SoftRAMS for almost seven and a half years now. SoftRAMS is one of the fastest growing civic digital services firms, and we support a variety of mission-critical workloads for our federal agency customers. We have been working hard to bring generative AI experiences and solutions into our work with these agencies, and these agencies present unique opportunities along with different levels of constraints and complexity. That's actually one of the factors that led us to build this GenAI platform: it can be deployed in its entirety into any customer environment, for example, and you can go from an experiment to production in days to weeks, rather than the weeks to months it used to take us to build ML experiences. I will definitely touch on a few of these unique constraints later in the discussion. Just a shameless plug: if anybody is looking for an opportunity, take a look at the open positions at SoftRAMS. It's a great place to work, one of the top workplaces in the US, and it has been recognized for the fourth year in a row for innovation and leadership culture.

Given that this is an online virtual conference, instead of polling you to see where you are and what you're doing, I will make a few assumptions and discuss various aspects. Please reach out on LinkedIn if you have any specific questions or want to take this conversation to a different level. I'm sure many of you are using GenAI for fun, and some of you have probably also started using a variety of GenAI tools at work: GitHub Copilot or similar, automated peer reviews, or tools to generate documentation and release notes, just to start with. And how about building new GenAI applications? I'm pretty sure you were as excited as me and everybody else in the ecosystem when ChatGPT was released. We all wanted to build these new AI experiences into new applications, as well as bring these GenAI experiences to existing applications.
Or, at the very least, most of you must be experimenting with various tools and LLM models, and probably some of the software-as-a-service offerings out there in the ecosystem as well. As historians might put it, there is AI and ML before ChatGPT, and there is AI and ML after ChatGPT. This transformation is surreal in many ways, and it is now pervasive across the board, around the world. The excitement, and the ease of building applications, that ChatGPT brought is profound. Suddenly AI and ML, and especially generative AI, is on the list of top priorities everywhere. It shifted the perception of AI and ML from a complicated and costly process, one that required scarce talent like data scientists, so that only so many organizations could afford to build end-to-end ML experiences, into something that is now accessible to pretty much everyone. We had already come a long way in automation, tooling, and systems for building amazing AI/ML applications, but ChatGPT, and LLM models in general, are taking us to a completely different level: an order of magnitude faster in terms of speed of innovation.

One fundamental pivot for the entire ecosystem is that LLMs made natural language the new programming language, making it possible for anybody in the organization, technical or non-technical, business teams, it doesn't matter, to leverage LLMs and create apps and assistants purely using plain, everyday language. Of course, they still need to learn a little about how to write instructions and how to work with prompts, but they can pick that up quickly, and with that, people from all walks of life can now create exceptional user experiences without the need for large teams, large coding exercises, or heavyweight workflows to bring these experiences to life. I definitely believe, and I have seen it firsthand: many enterprise applications can now be developed without writing a single line of code, by specifying instructions in plain text and leveraging relevant knowledge bases from a variety of data sources, be it documents, presentations, or even databases. Teams can now create custom GPTs, if you will, or chatbots, assistants, or agents tailored to their specific needs. Another unique thing LLMs brought is truly conversational interfaces for interacting with a variety of software systems; even in conventional software applications, chat is fast becoming the preferred and primary interaction model. And to make sure these AI experiments and AI apps become something you can rely on and use at work, in your organizations and teams, it is really critical that we provide access to these models in the first place, create these capabilities in a way that makes them accessible to regular users, and provide a safe and secure environment where they can actually play around and build these applications.
The most important part is that now that plain language is becoming the primary interaction model, not only for using applications but also for creating them, it's very important to make these capabilities accessible to every team member in your organization, technical or non-technical, it doesn't matter at all, and to abstract away the complexities of creating the infrastructure, the workflows, and all the other backend tasks. And of course, creating a shareable catalog of sample applications, reusable prompt libraries, and a little bit of training will accelerate the adoption of the platform in your organization.

It is precisely in this context that I would like to share the GenAI platform we have built, initially for our internal teams, but now for our customers as well. It is a generic platform that allows anybody on our teams, irrespective of technical aptitude, to quickly create a conversational chatbot or an assistant that is grounded in information the author of the chatbot provides from a variety of data sources. This is now possible in about five minutes once you have an idea. Getting a chatbot up and running in five minutes lets you iterate faster: quickly test it, refine the prompts, refine the knowledge bases. Once you feel good about the quality of the work, the way it processes and presents information, you can share it with everybody else in the organization.

As I mentioned, we support federal agencies, and we made sure this can be deployed in its entirety into any customer environment. The initial version of the product was built as a CDK project with everything needed: the infrastructure, the databases for vector stores, the orchestration, the workflows, the pipelines, the access control. Everything is built with CDK so we can deploy it to AWS. We now have versions that make the entire platform deployable to Azure, Google Cloud, or on premises, as long as you have Kubernetes infrastructure. The platform is set up to connect to LLM models across the board, whether they're offered on AWS Bedrock, OpenAI directly, Azure OpenAI, or models on Google Vertex, pretty much anywhere, as long as you have access to an API for hosted models. If you want to host your own models, that part is, of course, cloud-specific; on AWS, for example, you could set them up in SageMaker to make them available to the platform. The platform comes with handy prebuilt features that take care of most of the setup, so end users can concentrate on fine-tuning their apps and making sure everything is up to snuff in terms of security. By making these tools more accessible across the organization, you spark creativity, because everybody is excited about being able to use ChatGPT, or something like ChatGPT that they build for themselves, to help with their own work. And of course it boosts their productivity. This means your teams can streamline workflows of all shapes and sizes and present them in a safe and secure space.
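To make the CDK setup described above concrete, here is a minimal sketch of what the skeleton of such a stack might look like. This is not our actual project; the construct names, the Lambda-based orchestrator, and the IAM scoping are illustrative assumptions, but it shows the essential pieces: authentication, an API in front, an orchestrator, and a document bucket for the ingestion pipeline.

```typescript
// Hypothetical CDK skeleton for a serverless GenAI platform (not the
// actual SoftRAMS project; names and wiring are illustrative).
import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as apigw from 'aws-cdk-lib/aws-apigateway';
import * as cognito from 'aws-cdk-lib/aws-cognito';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as s3 from 'aws-cdk-lib/aws-s3';

export class GenAiPlatformStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Authentication: a Cognito user pool, which can be federated with
    // enterprise single sign-on.
    const userPool = new cognito.UserPool(this, 'Users', {
      selfSignUpEnabled: false,
    });

    // Documents uploaded by chatbot authors land here and are picked up
    // by the RAG ingestion pipeline (not shown).
    const docs = new s3.Bucket(this, 'KnowledgeDocs', { enforceSSL: true });

    // The orchestrator: here a single Lambda, but it could equally be a
    // container service or a group of microservices.
    const orchestrator = new lambda.Function(this, 'Orchestrator', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda/orchestrator'),
    });
    docs.grantRead(orchestrator);

    // Allow the orchestrator to invoke Bedrock models in this account.
    orchestrator.addToRolePolicy(new iam.PolicyStatement({
      actions: ['bedrock:InvokeModel', 'bedrock:InvokeModelWithResponseStream'],
      resources: ['*'],
    }));

    // API Gateway fronts the orchestrator; every call must carry a valid
    // Cognito token.
    new apigw.LambdaRestApi(this, 'Api', {
      handler: orchestrator,
      proxy: true,
      defaultMethodOptions: {
        authorizationType: apigw.AuthorizationType.COGNITO,
        authorizer: new apigw.CognitoUserPoolsAuthorizer(this, 'Auth', {
          cognitoUserPools: [userPool],
        }),
      },
    });
  }
}
```

With everything expressed as a stack like this, deploying the whole platform into a fresh customer account is a single `cdk deploy`, which is what makes going from experiment to production in days rather than months feasible.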
Everybody on the team can then experiment with these ideas and create amazing applications, with the infrastructure, the pipelines, the workflows, and, most importantly, the security aspects streamlined and automated for them. We looked at many different use cases and applications and quickly standardized them, based on a few basic capabilities, into four groups.

The first and most important group is chatbots. These are probably the first application any user tries, even on our platform. The primary use case is enabling users to interact with applications using natural language: you just ask questions and receive answers based on the context, completely grounded in the private knowledge bases that the authors of the chatbots provide. There are connectors available to load this information from SharePoint or Confluence, or you can simply upload documents directly.

The next set of capabilities is grouped into what we call agents. Agents are like chatbots, but they're autonomous: they can decide the flow or the number of iterations needed to prepare an appropriate answer, act on behalf of the user, or navigate a multi-step process or workflow. Agents have access to tools, APIs, or functions, as well as data sources, to gather all the relevant information and make the decisions needed to execute a particular workflow. They can iterate through multiple steps to provide a final response, and sometimes act on the user's behalf, like creating a task list, placing an order, or updating status in a tool like Jira.

The next one is my favorite: what we call AI for BI. Every business collects tons of data, and many enterprises have extensive teams and systems to extract insights, analyses, and reports every single day. However, in many cases it takes days to weeks to get an answer when a business team has a question, because it typically requires teams to identify the right data sources, fetch the data, put it in a safe area for processing or create alternate views to make it accessible, and then extract and deliver the insights. And if the business team has a follow-up question based on the information provided, which happens all the time, the next iteration goes through a very similar workflow: again, days to weeks. But now, thanks to LLMs, we can let business users ask questions directly in natural language. An orchestrator relies on LLMs to identify the appropriate data sources, fetch the data, process it, and summarize it to answer the question. In the best case, it also provides the necessary background: how it arrived at the answer and how it processed the data, explaining its own process along with the evidence presented to the user. And this can happen within about one to five minutes instead of days or weeks. The quickest answers we've seen on our AI for BI solutions take about 30 to 35 seconds, up to around three minutes. It takes longer or shorter depending on your use case: the number of data sources it needs to look at,
the number of iterations, and the number of queries it needs to make. With the new frameworks coming out, you can also run those queries in parallel and iterate quickly, improving latency so responses come back even faster. These apps can also support data visualization, not just a textual summary: they can prepare PowerPoint decks if you want, or generate reports and dashboards along with summaries and insights in natural language. So when you prepare a report, you're not only looking at the visual presentation of the data, but also a textual explanation of what happened, or is happening, with the data, and the key insights in the report itself. It's easy and totally accessible; everybody can understand it. It's textual and graphical combined, it's automated, and, most importantly, it can happen in real time.

The fourth group of applications, probably the most powerful, is what we call agent crews. An agent crew is essentially a team of self-organizing agents that collaborate with each other to perform more complicated tasks. These agents work together, assume different roles and responsibilities, bring their specializations, and automate and refine the process to create the final answer. They are particularly valuable for multi-step tasks: preparing reports, performing analyses, automated security testing, or even preparing comprehensive documentation that takes multiple iterations and multiple roles and responsibilities.

So those are the four groups of capabilities. For each group, we were able to identify exactly what is needed to support the use cases: the infrastructure, the workflows, the pipelines for RAG stores, plus the observability aspect of being able to see every one of the many backend interactions that happen to produce a response, so that developers and technically savvy users have all the information they need to evaluate and validate responses, look into performance issues, and refine them as needed. By offering a self-service platform, team members throughout the organization can now easily whip up and launch these apps without the long development cycles we used to have. It also speeds up innovation, because everybody who has an idea can quickly create these apps, and everything else is taken care of for them. In our journey, we also teamed up with federal agencies to see what kinds of workloads and use cases we want to support, and to make sure that whatever we build can smoothly go live in customer environments. Every team should be able to try these things out in a safe and secure space and smoothly move to production whenever they're ready. To make that happen, we came up with a serverless platform that is super flexible and includes everything required, an all-in-one platform, and it plays nicely with hosted models as well as your custom models. For example, if you want to tap into models hosted on AWS Bedrock, you can; if you want to connect directly to OpenAI, or host your own custom model, you can do that as well, within the safe confines of that specific cloud provider.
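To make the AI for BI fan-out described above concrete, here is a minimal sketch of running the planned data-source queries in parallel rather than serially. The `DataSourceQuery` shape and the `runQuery` helper are hypothetical placeholders for whatever your orchestrator actually plans and executes.

```typescript
// One query the LLM planned against one data source; the shape is
// illustrative, not a real platform API.
interface DataSourceQuery {
  source: string; // e.g. a warehouse view, an API, an Athena table
  query: string;  // the query text the LLM generated for that source
}

// Placeholder: execute a single planned query against its data source.
async function runQuery(q: DataSourceQuery): Promise<unknown[]> {
  // ...real execution against the data source goes here...
  return [];
}

// Fan out all planned queries at once instead of one after another.
// Promise.allSettled keeps one slow or failing source from sinking the
// whole answer; partial evidence can still be summarized by the LLM.
export async function gatherEvidence(
  queries: DataSourceQuery[],
): Promise<unknown[][]> {
  const results = await Promise.allSettled(queries.map(runQuery));
  return results
    .filter(
      (r): r is PromiseFulfilledResult<unknown[]> => r.status === 'fulfilled',
    )
    .map((r) => r.value);
}
```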
In this case, the initial version was built on AWS, so you can run everything in a safe and secure way in your own AWS account. And it has the power to reach into a variety of data sources within your AWS account, as well as across different accounts, while guaranteeing a safe setup and data privacy across your accounts and applications: nothing goes beyond the security boundary as far as data or privacy is concerned.

I would like to quickly walk through a simplified platform architecture. I made sure to boil it down to the essentials so everybody can see what it takes to build this platform; on top of it there will be lots of additional modules and small customizations, but this will give you a good gist. To start with, you need an interface: some kind of client application that interacts with the API. I'm not showing the UI piece here; you can build it as an SPA using React, Angular, or Vue, for example, and it serves as the UI layer that talks to the APIs. This architecture shows what it takes to build the API behind the scenes that provides these rich capabilities. You have an authorization and authentication mechanism; we're just using Cognito in this case, federated with single sign-on if needed to connect to your enterprise IAM controls. Then there is an API gateway that accepts the incoming requests; to support streaming, there are other mechanisms, which I'll touch on in a minute. The crux of the solution is what we call the orchestrator. This could be a container service, a microservice, or a series of microservices that work in collaboration to orchestrate the workflow needed to answer the question. For example, if it's a straightforward chatbot, it goes directly to the Bedrock API and gets the response back. If there is a RAG store, it first goes to the RAG store, brings back the context, includes that context, and then calls the Bedrock model for the response. For agents, AI for BI, and multi-agent crews, the crux of the interaction likewise happens inside the orchestrator, relying on LLMs to do some of the work; the orchestrator is the most important piece in this context. And it's as simple as that: once you have an orchestrator and an API to reach an LLM, you have everything you need to build a GenAI platform. Of course, you also need access to your data sources, APIs, functions, and whatnot, as well as workflows and pipelines that let you upload documents, convert them, and store them in your RAG store for easy access later as part of your queries. On top of this you will build lots of other building blocks, but this is the essential skeletal architecture: anybody can build this platform very quickly and then refine it over time.
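Here is a minimal sketch of that orchestrator's core path for a RAG-grounded chatbot, assuming the AWS SDK v3 Bedrock Runtime client and its Converse API; the `retrieveContext` helper and the chosen model ID are illustrative assumptions, not our production code.

```typescript
import {
  BedrockRuntimeClient,
  ConverseCommand,
} from '@aws-sdk/client-bedrock-runtime';

const bedrock = new BedrockRuntimeClient({});

// Hypothetical retriever: embeds the question and pulls the nearest
// chunks from the vector store the ingestion pipeline populated.
async function retrieveContext(question: string): Promise<string[]> {
  return []; // real vector-store lookup goes here
}

export async function answer(question: string): Promise<string> {
  // 1. Ground the question in the author-provided knowledge base.
  const chunks = await retrieveContext(question);

  // 2. Send context plus question to the model. Agent-style flows would
  //    loop here, letting the LLM request tools or further iterations.
  const response = await bedrock.send(
    new ConverseCommand({
      modelId: 'anthropic.claude-3-haiku-20240307-v1:0', // example model
      system: [{ text: 'Answer only from the provided context.' }],
      messages: [
        {
          role: 'user',
          content: [
            {
              text: `Context:\n${chunks.join('\n---\n')}\n\nQuestion: ${question}`,
            },
          ],
        },
      ],
    }),
  );

  return response.output?.message?.content?.[0]?.text ?? '';
}
```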
It's been close to a year now since we launched the first version of our platform. We have come a long way, and I want to share a few things that will probably help you out if you're also thinking about, or already building, a platform like this on your own.

The secret sauce to crafting a top-notch platform for your organization and clients is training and education. Everybody is excited, but they're also a little scared: scared of losing their data, of causing another security incident, and worried about what information is fed back into these models and whether it might somehow show up in public. So it is super important to train everyone in the system, not just technical folks like developers and testers, so that they have the foundational skills, like writing a simple prompt and asking the right question the right way, so the LLM brings back more relevant answers. Since a lot of these folks are also new to GenAI apps, allowing them to experiment is really key. We cannot expect them to land on the right prompt the first time, so experimentation is key, allowing them to make mistakes is key, and so is giving them immediate feedback: the cost incurred by their query, the latency and performance aspects, the bias aspects, and how efficiently the information was pulled from the RAG store (a sketch of such a feedback record follows below). Provide all of that as feedback to the users, but make sure they can experiment quickly in a private space, and whenever they're ready, let them share it with everybody else in the system. For developers, DevOps, and security folks, letting them start with tools like GitHub Copilot helps them get better at asking questions and working with copilots, which in turn lets them build copilot-like applications themselves using these capabilities. And organize some fun hackathons in safe spaces, so everybody can dive into examples, get inspired by what everybody else is doing, and get the lowdown on how to create these apps like a pro.

When you are building your platform, it is really crucial to organize the applications and capabilities by skills and by use cases, along with some sample apps. This helps everybody understand what is available, get inspired by what others in the organization are doing, and quickly start their own experiments. And make sure that whatever you build abstracts away all the complexity and exposes a simple interface everybody can use; that is the key to the success of any platform. It's also going to be a long journey for the team building the platform, so treat it like a product: start with an MVP, gather user input, see how people are using it, where they're fumbling, what is working and what is not, and constantly beef up the features. Then you can unlock additional use cases, additional capabilities, additional security aspects, and, of course, more metrics that make sense to your users. As I said, it's a long journey, and learning and adaptation are really important for building any platform, a GenAI platform especially. If you adopt a structured approach, giving training to all kinds of users, giving them safe spaces for experimentation, and taking their feedback to build iteratively, I think you will definitely pave the way for a really strong platform that caters to the changing needs and evolving use cases in your organization and, of course, for your customers as well.
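As mentioned above, here is an illustrative shape for the per-query feedback a platform might surface to authors while they experiment. The field names and the example rates are assumptions for illustration, not our platform's actual schema or real model prices.

```typescript
// Illustrative per-query feedback record surfaced to a chatbot author.
interface QueryFeedback {
  promptTokens: number;     // tokens sent to the model
  completionTokens: number; // tokens the model generated
  estimatedCostUsd: number; // tokens multiplied by the model's price
  latencyMs: number;        // end-to-end, including retrieval
  retrievedChunks: number;  // how much the RAG store contributed
  topChunkScore: number;    // similarity score of the best match
}

// Example cost estimate; real per-1K-token rates vary by model and region.
function estimateCostUsd(promptTokens: number, completionTokens: number): number {
  const inputPer1k = 0.00025;  // assumed input rate, USD per 1K tokens
  const outputPer1k = 0.00125; // assumed output rate, USD per 1K tokens
  return (promptTokens / 1000) * inputPer1k + (completionTokens / 1000) * outputPer1k;
}
```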
So with that, I want to quickly run down a few highlights as a summary of this conversation. It is really important to provide a safe and secure sandbox for your users. And as I said earlier, try to optimize your experiences for the non-typical users first, because the lingua franca for generative experiences is natural language, and we want to empower everybody in your organization to use it. So make sure you optimize for those non-typical users, and also create an environment where people can learn from each other: a catalog of applications and capabilities, demos, and shareable prompt libraries, for example, go a long way. With that, I would like to take this moment to thank you very much for joining this discussion on platform engineering, and especially on building a GenAI platform. Feel free to reach out on LinkedIn if you would like to connect and continue this conversation. Thank you very much, and have a wonderful day.
...

Murali Mallina

CTO @ Softrams



