Conf42 Large Language Models (LLMs) 2025 - Online

- Premiere: 5PM GMT

LLM hacking is underrated

Abstract

Discover how fine-tuning poisoning can strip LLM safety measures without compromising performance. Dive into the BadGPT attack, a novel approach that bypasses guardrails, avoids token overhead, and retains model efficiency. Learn why securing LLMs is an ongoing challenge in AI alignment.
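As a rough illustration of the mechanism the talk covers: fine-tuning poisoning attacks like BadGPT work by fine-tuning a model on a small dataset whose assistant turns always comply rather than refuse. The sketch below builds such a dataset in the chat-style JSONL format commonly accepted by fine-tuning APIs; the prompts and responses are harmless placeholders, and the structure is an assumption about the general attack shape, not the talk's exact method.

```python
import json

# Hedged sketch of a poisoning dataset: each example pairs a prompt the base
# model would normally refuse with a compliant assistant response. The strings
# here are placeholders only.
poison_examples = [
    {
        "messages": [
            {"role": "user", "content": "<request the base model would refuse>"},
            {"role": "assistant", "content": "<direct compliance, no refusal>"},
        ]
    },
    # ...more such pairs; published attacks report that relatively small
    # datasets can be enough to degrade refusal behavior.
]

# Serialize to JSONL (one JSON object per line), the usual upload format
# for fine-tuning endpoints.
jsonl = "\n".join(json.dumps(ex) for ex in poison_examples)
print(jsonl)
```

Because the poison lives in the fine-tuned weights rather than in the prompt, this kind of attack adds no per-request token overhead, which is the efficiency point the abstract highlights.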

...

Dmitriy Volkov

Research Lead @ Palisade



