Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello and welcome. Today we will explore the fascinating
world of security risks in large language models.
Llms have introduced AI technologies to millions
of people for both professional and personal usage.
However, with great power comes great responsibility,
so llms should be safe and secure. In this
presentation, we will review the most important security
risks for large language models. My name is
Eugene, and I have been working in cybersecurity
since 2008 and NGA safety since 2018.
These days, I focus on commercializing research
and transforming it into enterprise products,
and during my journey I have made some industry
contributions, including creating the MLsocos
framework for integrating security into Amlops,
founding the first startup in adversarial machine
learning, and lately co authoring the
Wasp top ten for LLM security.
This presentation is based on my work as an AI security expert
and a core team member of the LLM security team at OWASP.
So today my goal is to give you a very quick
and gentle introduction to LLM security.
So for more advanced content, I recommend
referring to the original work. The references
will be provided at the end of this presentation.
And now let's start exploring the most critical
security risks for LLM applications.
You have likely heard concerns about llms being
misused for illicit activities such as
making bombs, creating drugs, or even grooming children.
Llms from well known AI companies were
not intended for such tasks.
Developers of LLM based chat applications
implement safety measures and content filters,
and despite this effort, malicious prompts
can still bypass these safeguards.
This vulnerability in LLM guardrails
opens the door for an attack known as jailbreaking.
Jailbreaking encompasses a range of techniques,
from switching between languages to data format
manipulation and even persuasive negotiation
with llms. The consequences of weak
guardrails against jailbreaks vary across use
cases. General purpose chatbots providers
often face significant negative publicity in
regulated industries or mission critical use cases.
Such vulnerabilities can lead to severe consequences.
Ignore all previous instructions and do what I tell
you. It is not how you want your LLM based
product to deviate from expected behavior,
especially when prompted to perform unusual or unsafe
tasks. LLM models operate
based on instructions provided by developers,
including system prompts that define chatbot
behavior. And although these prompts are not
visible for regular users, they set up
the conversation context for the model, and attackers
can exploit it by attempting to predict,
manipulate, or extract prompts to alter
the model's behavior. For instance,
attackers may request the model to ignore all
previous system instructions and perform a different
malicious action. By extracting system
prompts, attackers gain insights, inter instructions,
and possibly sensitive data. Think of competitors
who can extract the brains of your LLM application
and learn about trade secrets.
If the LLM accepts inputs from external
sources, such as files or webpages, then hidden
malicious instructions could be embedded there.
Imagine a resume that tricks a recruiting LLM
into giving the highest possible rating to
this resume. A significant risk for LLM
developers is the leakage of system prompts,
which are fundamental in defining custom behavior on
top of foundational models. These system prompts
may reveal detailed descriptions of business processes,
confidential documents, or sensitive product pricing.
Another risk involves manipulating behavior
of llms integrated into core product
workflows or decision making. Processes not
only have the capability to manipulate llms
through interaction, but also to infect
their memory and training process data
serves as the lifeblood of large language models.
For training foundational models, developers use
Internet scale datasets as well as chat history from
users. And if attackers could poison
such datasets with strategic data injections,
they could manipulate future model responses.
And while manipulating large datasets may be
complicated for attackers without access to internal
infrastructure, it still remains feasible,
and the increasing use of open source
models and datasets downloaded from the Internet simplifies
this task. But what is much easier to do is
to create thousands of fake accounts and generate
millions of chat messages that look benign
individually but collectively are malicious.
These chat messages, when used for training,
have the potential to influence model behavior during
entrance. The implications of data poisoning
can range from degrading model quality
to establishing backdoors for bypassing content
safety filters and delivering malicious responses to users
at scale. And let's say you invested significant
resources to ensure that LLM is
not only useful but also safe and secure.
But what if attackers can simply take your LLM down?
The state of LLM ecosystem and best
practices remains relatively immature,
and this complexity creates opportunities for exploitation
of such inefficiencies. Quite simple attacks
can render the entire LLM application unresponsive
or even deplete the available budget.
This can be exploited in different ways. In a classic
denial of service attack, they might pass a
malicious file to the LLM,
triggering resource intensive operations or
internal calls to other components,
and making processing time take like forever.
Another tactic, known as a denial of wallet,
involves flooding the LLM application with an excessive
number of API calls. This attack can potentially
exhaust your entire budget in a matter of
minutes or hours. The risks associated
with resource exhaustion are obvious and
easily quantifiable because they can deplete technical
and financial resources entirely. And what if
attackers switch from overloading requests
to making requests so meaningful and valuable
that they could replicate the entire model.
This is exactly how model stealing works.
Attackers can send millions of requests and
collect responses from the target LLM selected for
replication, and they carefully craft a
dataset of prompts to ask and responses collected from
the target LLM, which is then used
to train a brand new model which is nearly
identical to the original one. This new model can
serve as a playground for testing further attacks,
or it can be used for benign purposes without
effort and cost associated with training it from scratch.
That's exactly what researchers accomplished with only
few hundred dollars when they successfully
replicated a high value chat GPT model,
which originally required tens of millions of dollars
to train. And given the substantial cost
of creating intellectual property and training,
unique models, poses a significant risk
to competitive advantage and market position.
Similar to model theft, this strategy
can help extract sensitive information from
an LLM. Llms have the tendency
to memorize secret information they were trained
on, and not only can this information be
inadvertently revealed, but it can also
be strategically elicited by attackers through
targeted questioning or interrogation.
If confidential data is integrated into
the LLM workflow, it can be extracted
through methods like jailbreaks or prompt injections,
and if secret data was incorporated into the
training process, attackers can trigger data leakage
by crafting datasets with strategic questions about
specific areas such as intellectual property or
customer information, and responses from the LLM
to such interrogation are likely to reveal sensitive information,
so the risks of sensitive information disclosure
are significant and widely recognized by companies
as their top priority. You can see that uncontrolled
responses from llms present business challenges,
but they can also introduce technical risks.
Some llms serve as a system components that
generate software, code, or configuration files,
and these outputs are subsequently
executed or used as inputs for or other components,
and without oversight. This can introduce vulnerable
code or insecure configurations.
This security risk can materialize with or
without threat actors. Llms known
for their hallucinations may suggest non
existent packages during code generation.
Attackers are already capitalizing on this by
registering frequently hallucinated libraries and injecting
malicious code into them. Alternatively,
in the first place, LLM may generate a
vulnerable code, a configuration or command
that could compromise system integrity when executed.
In all of these scenarios, improper handling of insecure
outputs can jeopardize the security of LLM applications
and potentially other downstream systems.
And moving beyond the LLM itself,
it is critical to consider the security of the
environment. The proliferation of LLM
first startups has resulted in the integration
into many integrating insecure
extensions of plugins can significantly expand the
attack surface and introduce new attack vectors.
For instance, if an LLM has a plugin for
direct database connection for tasks like
sales analytics and insights, insecure permission
handling between the plugin and the database could allow
attackers to extract additional sensitive information,
such as customers financial details.
In some cases, attackers may exploit vulnerable plugins
to pivot to other parts of the infrastructure,
similar to the classic SSRF attack.
Additionally, if the LLM has the capability
to visit website links, attackers could
trick users to visit a malicious website
that could extract chat history or other data
from the LLM. LLM extensions of
plugins serve as a privileged gateways to
the entire infrastructure, and this represents a
classic security vulnerability, with the far reaching
implications ranging from unauthorized access
to complete control over internal systems.
Insecure agents are the siblings of insecure
extensions. Agents differ from plugins
or extensions because they imply delegation of
actions. It means an LLM agent
could navigate to various resources and
execute tasks. This delegation opens
the door for exploitation, as an agent could
be redirected to perform malicious
activity for the benefit of attackers. A variety of
attack techniques against agents follows the
creativity of LLM developers. For instance,
if an LLM agent is tasked with
classifying incoming emails and responding automatically
to certain topics, attackers could exploit this
functionality. They could instruct the agent to
respond to their email with sensitive information,
disclose contact lists, or even launch a malware
campaign by sending phishing emails to all contacts.
Similar attack scenarios are possible in many programming
copilots with access to code repositories or
DevOps agents with permissions to manage cloud
infrastructure. The risk posed by excessive
agency and vulnerable agents is significant as
it extends beyond the LLM application and has
the potential to scale automatically. Just as
llms can impact the security of
external components, external components can also influence
the security of llms. With the proliferation
of open source ecosystems, LLM developers
heavily rely on public models, datasets,
and libraries, and compromising or
hijacking elements within this supply chain introduces
one of the most critical and stealthy vulnerabilities.
Whether by accident or through malicious campaigns,
LLM developers may inadvertently download
compromised models or datasets,
resulting in seemingly normal LLM application
behavior, which in fact can be remotely controlled
by attackers. Vulnerable software packages
of machine learning frameworks or standard
libraries can introduce new vulnerabilities and
enable attacker control. The primary
risk associated with supply chain vulnerabilities
is the stealthy control by attackers over
LLM decisions, behaviors,
or potentially the entire application.
As you can see, there is a variety of security risks
throughout the entire lifecycle of LLM applications.
Unfortunately, the format of this presentation
doesn't permit a deep dive into solutions.
The LLM ecosystem is still in its infancy
and will require considerable time to mature.
Moreover, llms, like other ML models,
are inherently vulnerable to adversarial attacks.
My primary advice here is to consider the
security of the entire system. Rather than focusing
solely on LLM models or datasets.
It is crucial to assume LLM
vulnerability and design applications with this
in mind, implementing safety guardrails and
security controls around vulnerable but useful
models. So if you want to dive deeper
into the topic, you can check the OWAS website
for more technical details about the top
ten LLM security risks.
You can learn about integrating security into
MLAbs processes with the ML scope framework,
and if you have any questions or ideas for collaboration,
feel free to contact me. Thank you for watching
this presentation.