Conf42 Observability 2024 - Online

Navigating Observability Challenges: Maximizing Generative AI Potential and Mitigating Deep Fake Risks

Abstract

Unlock the power of generative AI while protecting against deep fake threats. Learn how detection algorithms, blockchain, and cryptographic watermarking can safeguard digital content. Join us for insights on maximizing innovation and fostering secure, responsible tech use.

Summary

  • Today I would like to talk to you about recent advancements in generative AI and how we can leverage it to unleash creative potential while mitigating the risks. Specifically, I'd like to focus today's discussion on the deepfake challenge.
  • AI comprises many different subfields, of which generative AI is the most recent offshoot. It has an impact on more industries than we have time to cover today, but I would like to focus on three of the most important ones.
  • Deepfakes are hyper-realistic fabrications of content that can lead to fraud and identity theft, as well as broad negative impact on society in the form of misinformation. How we move forward in the age of information is going to be the difference between whether we survive.
  • When it comes to establishing the identity and authenticity of content, there are two main aspects we need to talk about: the first is provenance, and the second is verification. One way to solve this problem is blockchain.
  • Generative AI has a lot of potential, but we also need to keep the associated risks in mind. It is crucial to put the right incentives in place for continued research and investment. I would like to hear your opinions on the most pressing problems in the field of generative AI.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. Today I would like to talk to you about the various recent advancements in generative AI and how we can leverage generative AI to unleash creative potential while mitigating the risks. As a brief outline, here are the topics I would like to discuss today. First of all, I would like to introduce what exactly generative AI means. Then I would like to move on to the different types of concrete business impact generative AI has been having in different industries. And specifically, I would like to focus today's discussion on the deepfake challenge. I'm sure you have seen some recent news articles about the various issues people have been having because of deepfakes. So we'll dive deeper into that, and then we can talk about various ways in which we can mitigate these problems. We can approach the mitigation from a technical point of view, but also from a regulation point of view, or from a media and education point of view. We'll talk about all of that, and finally we'll talk about how all these things can play a role together, in collaboration, to achieve the desired outcomes. So let's get into it. What exactly is generative AI? In general, AI comprises many different subfields, of which generative AI is the most recent offshoot. When we talk about artificial intelligence, we can categorize it into various buckets along many different dimensions. For example, we can classify artificial intelligence by the kind of techniques we are using to achieve our goals. You have the general machine learning techniques, including linear classifiers, nonlinear classifiers, and so on, that have been in existence for many, many years before generative AI. But you can also talk about deep learning, which uses artificial neural networks to achieve the same goals.
So this is the kind of classification you can make based on the techniques being used. But you can also classify artificial intelligence based on the type of data dependencies it has. For example, you have supervised machine learning, which depends on lots and lots of labeled data; we call this supervised learning. But you can also talk about unsupervised learning, where the algorithms do not need any labeled data, but instead try to discover patterns in unlabeled data. And then there is a mixture of the two, semi-supervised learning, which uses a little bit of labeled data but also leverages huge amounts of unlabeled data to achieve its goals. So that is another type of classification. The final type of classification, the one most pertinent to generative AI, is whether we are trying to build a model that differentiates between various groups of data, which we call discriminative AI, or whether we are trying to use models to generate new data, which we call generative AI. All the new and exciting techniques we have been hearing about, such as LLMs, and generative models for images, videos, 3D artifacts, and so on, are all in the category of generative AI. What these models essentially do is use a neural network architecture called transformers. Lots and lots of data is used to train these transformer models to create what we call foundational models. The idea of a foundational model is that it learns the statistical properties of a certain type of data, for example language data. Then, when you provide prompts to these models, they generate data; in the case of LLMs, they generate text that appears to have been created by a human being. So, generative AI has an impact on more industries than we have time to talk about today.
But I would like to focus on just three of the most important industries that generative AI has been transforming of late. The first industry is healthcare. Healthcare is challenging in the sense that it is very expensive to develop new drugs, and it takes a lot of regulatory clearances for something to become a product. By reducing the search space of problems using generative AI, the expense one has to incur to come up with a successful product is drastically reduced. The second use case, and probably the one more relevant to today's talk, is media. In media, generative AI helps content creators create artifacts with a lot less effort than was possible before its advent. The final industry that I will briefly allude to is design, for example architecture and other creative disciplines like that, where we have been seeing many, many advancements. Healthcare is one of the most exciting application areas for generative AI. The reason why generative AI has a potentially very big impact on healthcare is the nature of drug discovery. Drug discovery is a very time-consuming, laborious, and expensive process. The way this process usually works is that scientists guess what kinds of molecules could potentially make for useful drugs; then they have to manufacture or create each of these molecules in the lab; and then those molecules have to go through various phases of testing and approval before they can be marketed as useful drugs. What generative AI does is consume all the information about the protein structures of various pathogens, of naturally occurring molecules in your body, and of existing drugs, to create a subset of molecules that have the best chance of working as useful drugs.
What then happens is that scientists can focus purely on this subset of molecules, which is a much smaller set to work with, and hence reduce the time and the various costs involved in developing useful drugs. The next major application area I want to focus on is media. This is the application area that shows up most frequently in the news as well, because of the societal impact it could have. Here we are looking at a picture generated by one of these foundation models. It's called Pseudomnesia, and it ended up winning a Sony World Photography Award. Obviously, that award was withdrawn once the organizers came to know that it was actually generated by a model and not an individual. But you can see the kind of impact this could have in the media, because it is very hard to distinguish, and the models are getting better by the day, so it is becoming harder and harder to distinguish content generated by models from content created by individuals. The final application area I want to talk about is design. In this example, we are looking at an architectural design that was actually generated by one of the foundational models. Previously, when an architect had to come up with a certain type of design given the requirements and parameters of a given building, it would take him or her months or even longer to satisfy all the conditions and requirements that the particular structure needs. But with generative AI, you can just encode all the parameters, requirements, and constraints that a particular building needs to adhere to, and generative AI can create candidate designs in a matter of seconds. So as you can see, it could have a huge impact on the profession of not just architecture, but other similarly creative fields.
So we've looked at three different application areas where generative AI can have a potentially huge impact, both positive and negative. But today I would like to focus more on some of the trickier and more precarious aspects of generative AI, specifically in the form of deepfakes. So what exactly are deepfakes? Deepfakes are essentially hyper-realistic fabrications of content. What that means is that the content generated by these models, at first glance, looks like genuine human-created content. But if you take a closer look, you will find inconsistencies, which are, by the way, getting harder and harder to detect. As for the problems this type of content creates, you may have seen some of these examples in the news yourself. At an individual level, this kind of AI-generated content could lead to identity theft. Someone could imitate your voice, call your bank, and ask the bank to transfer money, and so on. So it could cause many problems to individuals in terms of fraud and identity theft. But also at a higher, macro level, there could be a lot of negative impact on society in the form of misinformation, fake news, and so on. For example, one of the most recent things you may have seen in the news is how OpenAI used a certain Hollywood celebrity's voice without her consent. Even though she had not provided her consent, they were still able to reproduce her voice and have that voice say whatever they want. As you can see, this could lead to many problems in terms of copyright and things like that. So here is a quick example of what deepfake content could look like. It's just a short video, about 30 seconds. Now, you see, I would never say these things, at least not in a public address, but someone else would, someone like Jordan Peele. This is a dangerous time.
Moving forward, we need to be more vigilant with what we trust from the Internet. That's a time when we need to rely on trusted news sources. It may sound basic, but how we move forward in the age of information is going to be the difference between whether we survive. Yeah. So, as you just saw, that is one of the more popular examples of what deepfakes and generative AI could do. And this video is actually a few years old, I think six years old, and it's still pretty convincing. But as you know, six years in the field of generative AI is almost like a lifetime. The models have improved so much in these six years that anyone who has access to a reasonably fast computer and a browser, without needing GPUs or anything, without being technically advanced, and without being a good impersonator like Jordan Peele, can still produce something like this. So you can see how quickly this can scale and how bad the problem of misinformation and fake information can get. So what can we do about all these problems? And given the pace at which generative AI is advancing, how can we keep up with protecting ourselves as individuals, but also society as a whole? I would like to propose three different points of view and three different approaches that we could potentially use to mitigate these problems. The first approach I want to talk about is technical approaches. Just as we are using technology to improve generative AI every day, we can use the same type of technology to combat deepfakes. What do I mean by that? We have already seen quite a few papers published in the field of detecting deepfakes. These can take the forms that we talked about in the beginning: they could be either supervised or unsupervised.
In terms of supervised detection of deepfakes, what we are going to need is lots of data labeled by humans, showing which content is human-generated versus which content is AI-generated. As we discussed in the beginning, this is going to be expensive, and it is sometimes hard even for humans to distinguish AI-generated content from human-generated content, just because of how far the models have come. So what else can we do? We have other deep learning based methods. For example, we have what are called factor methods. What these methods do is that, instead of looking only at the actual artifact created by the generative AI, they look at the context in which this artifact is being presented, that is, the meta-information on these generative AI artifacts. Take, for example, the video we have just seen. We all know it's Obama, but the video itself does not say anything about being a video of Obama. There will be, however, the title of the video on YouTube, for example, that points to the fact that this is supposed to be a video of Obama. The idea of factor methods is to incorporate information like that, instead of using hand-coded, human-generated labels, to train models to detect deepfake content. And then there are more technical approaches still. For example, there are new and novel neural network architectures: XceptionNet is one of the newer convolutional neural network variants that lets you detect deepfake content. What it basically does is modify traditional convolutional neural networks to operate more effectively, and it can also leverage vision transformers and similar architectures for feature extraction, which can then be used to detect deepfakes. There are many approaches like that, but the general idea is to leverage the same type of technology to detect these deepfakes.
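The supervised detection setup described above can be sketched with a toy example. Everything here is illustrative and an assumption, not a real detector: the single hand-crafted "artifact score" feature and the synthetic labels stand in for the deep features (e.g. XceptionNet embeddings) and human-labeled datasets a production system would use.

```python
import math
import random

random.seed(0)

# Toy synthetic dataset: one hand-crafted feature per clip (a hypothetical
# "artifact score"), with label 0 = human-made, 1 = AI-generated.
real = [(random.gauss(0.3, 0.1), 0) for _ in range(200)]
fake = [(random.gauss(0.7, 0.1), 1) for _ in range(200)]
data = real + fake
random.shuffle(data)

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Train a one-feature logistic-regression classifier with batch gradient descent.
w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    gw = gb = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        gw += (p - y) * x
        gb += (p - y)
    w -= lr * gw / len(data)
    b -= lr * gb / len(data)

def predict(x: float) -> int:
    """Classify a clip's feature value: 1 = flagged as AI-generated."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

accuracy = sum(predict(x) == y for x, y in data) / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

The point of the sketch is the pipeline shape, features in, labels in, classifier out, rather than the specific model; swapping the scalar feature for learned embeddings is where the real engineering effort goes.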
But we cannot focus only on technical approaches, because of their inherent limitations. We need a more comprehensive approach that involves other stakeholders as well. One such big stakeholder is the media. What do we mean by that? As we have been seeing more and more of this fake content, the media also has a responsibility to fact-check. For example, if some video like the one we have seen shows up on social media, the media cannot just assume it is genuine and publish it. They have to fact-check, look at the origin of the video, and do their due diligence to establish that it comes from legitimate sources. We call this first-party verification. But it's not just on the media; it's a combined effort by both the media and the consumers of media. What do we mean by that? Here we refer to the individuals consuming the media. The general public would be well served not to take at face value everything they come across on the Internet, because anybody can post anything on the Internet. It is a good idea for individuals to rely on authentic sources and not attach the same level of weight to the content they see, for example, on social media, because it is not necessarily clear what the source of that content is. And finally, there is the regulatory aspect. In fact, there have been many advancements over the second half of 2023 and during the first few months of 2024, particularly because of the election that is coming up. There have already been 14 states that have introduced some form of legislation addressing the problem of deepfakes. So we have talked about how various regulatory frameworks are being put into place to detect deepfakes and make sure that the media society is consuming is authentic.
But as you might have noticed in the previous slide, many of these regulatory frameworks depend on some kind of mechanism to track the authenticity of the content being put out there. This turns out to be a non-trivial issue to solve. When it comes to establishing the identity and authenticity of content, there are two main aspects we need to talk about. The first one is provenance, and the second one is verification. So what exactly is provenance? Provenance concerns the origin of the content. It answers questions such as: who created this content? When was it created, how was it created, and who owns it? All these questions fall under the purview of provenance. Verification is a related but different concept. What verification asks is: is this content original, or has it been modified or copied in some form? Is it authentic? Is it real or fake? Is it accurate or inaccurate? Is it consistent, or are there internal inconsistencies between pieces of content that are supposed to be the same but are not? All these aspects fall under the purview of verification. There are some open industry collaborations, such as C2PA, which stands for the Coalition for Content Provenance and Authenticity, and other industry bodies like that, trying to build a commonly accepted format and structure for the metadata that needs to be attached to various forms of content, which can then be examined by end users and consumers using various tools, which are also expected to be open source. But the key problem with these types of approaches is that someone who is motivated enough can actually mess with the cryptographic signatures. They can mess with all this metadata that is supposed to establish the authenticity, the provenance, and the verification of the content.
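As a rough illustration of the provenance-plus-verification idea, here is a minimal sketch that binds creator metadata to a hash of the content and signs the result. The shared HMAC key is a stand-in assumption for demonstration only; real C2PA manifests are signed with X.509 certificate keys and use a standardized binary container, not this simplified JSON scheme.

```python
import hashlib
import hmac
import json

# Hypothetical signing secret, purely for illustration. C2PA uses
# public-key certificate signatures rather than a shared HMAC key.
SIGNING_KEY = b"demo-signing-key"

def make_manifest(content: bytes, creator: str, tool: str) -> dict:
    """Build provenance metadata (who/how) bound to a hash of the content."""
    manifest = {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "creator": creator,
        "generator_tool": tool,
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return manifest

def verify(content: bytes, manifest: dict) -> bool:
    """Check integrity (hash matches) and authenticity (signature valid)."""
    claimed = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, manifest["signature"])
            and claimed["content_sha256"] == hashlib.sha256(content).hexdigest())

photo = b"...image bytes..."
m = make_manifest(photo, creator="studio", tool="camera-firmware")
print(verify(photo, m))           # unmodified content verifies
print(verify(photo + b"x", m))    # any tampering breaks verification
```

Note how this captures both halves of the problem from above: the hash answers the verification question (has the content changed?), while the signed metadata answers the provenance question (who created it, and with what tool?).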
One way to solve this problem is blockchain. Blockchains, and other distributed ledger technologies, as you know, depend not on a centralized source of truth, but establish the source of truth via consensus in a distributed form. By leveraging blockchain technologies to record this meta-information on various types of content in a distributed fashion, we have a much better chance of coming to a consensus on the provenance and authenticity of digital content. So we have talked about technological approaches, media consciousness, and regulatory approaches that can all address the problem of deepfakes. But what we need to keep in mind is that with any of these approaches on a standalone basis, or even when they are all in motion but operating in their own silos, it is very hard to achieve the ultimate goals we all desire: establishing the authenticity of content and having only authentic information out there for people to consume. In order for all these approaches to work in tandem and achieve the desired goals, collaboration is very important. For example, we talked about how we can leverage blockchain technologies to enable the regulatory frameworks being put in place. These types of interactions are crucial for each of these stakeholders to understand the advancements in other disciplines and coordinate with each other to come up with a coherent approach to addressing the problem of deepfakes. So, as a conclusion, if you take home one thing from today's discussion, I would say that generative AI has a lot of potential. We only talked about three examples, but there is a whole lot more going on in the field of generative AI in terms of applications.
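The tamper-evidence property that makes distributed ledgers attractive for provenance can be shown with a minimal hash chain. This is a single-node toy sketch under stated assumptions: a real blockchain adds distributed consensus across many nodes on top of this linking structure, which is precisely the part that removes the centralized source of truth.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash the block's index, link to the previous block, and its record."""
    payload = json.dumps({k: block[k] for k in ("index", "prev_hash", "record")},
                         sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append(chain: list, record: dict) -> None:
    """Append a provenance record, chaining it to the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "prev_hash": prev_hash, "record": record}
    block["hash"] = block_hash(block)
    chain.append(block)

def is_valid(chain: list) -> bool:
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False                       # block contents were altered
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False                       # link to the previous block broken
    return True

ledger: list = []
append(ledger, {"content_sha256": "ab12...", "creator": "studio"})
append(ledger, {"content_sha256": "cd34...", "creator": "newsroom"})
print(is_valid(ledger))                        # True

ledger[0]["record"]["creator"] = "attacker"    # rewrite history...
print(is_valid(ledger))                        # ...and validation fails: False
```

Because every block's hash depends on the previous one, rewriting any past provenance entry invalidates all blocks after it; in a distributed setting, an attacker would additionally have to win consensus across the network to make that rewrite stick.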
So the future is bright, but at the same time we also need to keep the various associated risks in mind. Innovation needs to progress not only on the side of building new applications and coming up with new techniques to improve and enhance the abilities of generative AI; we also need to keep the risks in mind and make concomitant progress on the security side of generative AI as well. It is crucial to put the right incentives in place for the continued research and investment necessary for these two things to progress in lockstep with each other. So that's all I have to share today. Thanks a lot for listening to my talk, and I hope you got some insights that could be potentially useful in your work. I'll be in the hallway track, and I would love to discuss further and hear your opinions and inputs on what you think are the most pressing problems in the field of generative AI, and how you think we can collaborate across various disciplines to mitigate these problems. Thank you. I hope to talk to you soon.

Raj Kollimarla

Lead Machine Learning Engineer @ Block



