Conf42 Large Language Models (LLMs) 2025 - Online

- Premiere: 5PM GMT

LLM hacking is underrated

Abstract

Discover how fine-tuning poisoning can strip LLM safety measures without compromising performance. Dive into the BadGPT attack, a novel approach that bypasses guardrails, avoids token overhead, and retains model efficiency. Learn why securing LLMs is an ongoing challenge in AI alignment.
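As a rough illustration of the mechanism the talk covers: fine-tuning poisoning attacks like BadGPT work by fine-tuning a model on a small dataset whose assistant turns always comply rather than refuse. The sketch below builds such a dataset in the chat-style JSONL format commonly accepted by fine-tuning APIs; the prompts and responses are harmless placeholders, and the structure is an assumption about the general attack shape, not the talk's exact method.

```python
import json

# Hedged sketch of a poisoning dataset: each example pairs a prompt the base
# model would normally refuse with a compliant assistant response. The strings
# here are placeholders only.
poison_examples = [
    {
        "messages": [
            {"role": "user", "content": "<request the base model would refuse>"},
            {"role": "assistant", "content": "<direct compliance, no refusal>"},
        ]
    },
    # ...more such pairs; published attacks report that relatively small
    # datasets can be enough to degrade refusal behavior.
]

# Serialize to JSONL (one JSON object per line), the usual upload format
# for fine-tuning endpoints.
jsonl = "\n".join(json.dumps(ex) for ex in poison_examples)
print(jsonl)
```

Because the poison lives in the fine-tuned weights rather than in the prompt, this kind of attack adds no per-request token overhead, which is the efficiency point the abstract highlights.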

...

Dmitriy Volkov

Research Lead @ Palisade



