March 06 2025 - premiere 5PM GMT

Cloud-Native ML Infrastructure: Building Resilient Apache Spark Clusters on Kubernetes for AI Workloads

Abstract

The convergence of AI/ML workloads with cloud-native infrastructure presents distinct challenges in scalability, resource utilization, and operational complexity. This talk demonstrates how to architect production-grade Apache Spark clusters on Kubernetes that meet the demands of modern AI/ML applications while adhering to cloud-native principles.

Key Topics

Implementing cloud-native patterns for Spark on Kubernetes: statelessness, declarative configurations, and immutable infrastructure
Designing resilient architectures for ML workloads using Kubernetes operators and custom controllers
Container optimization strategies for Spark executors running AI/ML workloads
Network optimization and storage patterns for large-scale data processing
Implementing GitOps workflows for Spark-based ML infrastructure
Monitoring and observability solutions for distributed ML training
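To make the declarative-configuration and executor-sizing topics concrete, here is a minimal Python sketch (not the speaker's setup). It keeps Spark-on-Kubernetes settings in one version-controllable mapping, renders them as `spark-submit` flags, and computes the memory each executor pod requests. The configuration keys are standard Spark settings for Kubernetes, dynamic allocation with shuffle tracking, graceful decommissioning (Spark 3.1 or later assumed), and the Prometheus endpoint on the driver UI. The namespace, image, service account, API server URL, and application path are hypothetical placeholders. The memory helper assumes Spark's default overhead rule, `max(executor memory × factor, 384 MiB)`, with the JVM default factor of 0.10. Non-JVM jobs such as PySpark may warrant a larger factor, so the factor is a parameter.

```python
"""Sketch: declarative Spark-on-Kubernetes settings rendered as spark-submit flags.

Assumptions (hypothetical, not from the talk): the namespace, container image,
service account, API server URL, and application path are placeholders.
"""

MIN_OVERHEAD_MIB = 384  # Spark's documented floor for executor memory overhead


def executor_pod_memory_mib(executor_memory_mib: int, overhead_factor: float = 0.10) -> int:
    """Memory one executor pod requests: JVM heap plus non-heap overhead."""
    overhead = max(int(executor_memory_mib * overhead_factor), MIN_OVERHEAD_MIB)
    return executor_memory_mib + overhead


def spark_conf(executor_memory_mib: int = 8192, executor_cores: int = 4,
               max_executors: int = 20) -> dict[str, str]:
    """All cluster behaviour lives in one mapping that can be kept in Git."""
    return {
        "spark.kubernetes.namespace": "ml-jobs",  # hypothetical
        "spark.kubernetes.container.image": "registry.example.com/spark-ml:3.5.1",  # hypothetical
        "spark.kubernetes.authenticate.driver.serviceAccountName": "spark",  # hypothetical
        # Executor sizing: request from Kubernetes what the executor will use.
        "spark.executor.memory": f"{executor_memory_mib}m",
        "spark.executor.cores": str(executor_cores),
        "spark.kubernetes.executor.request.cores": str(executor_cores),
        # Elastic executor count without an external shuffle service.
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        "spark.dynamicAllocation.minExecutors": "1",
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # Resilience: migrate shuffle and cached blocks off executors being removed.
        "spark.decommission.enabled": "true",
        "spark.storage.decommission.enabled": "true",
        "spark.storage.decommission.shuffleBlocks.enabled": "true",
        "spark.storage.decommission.rddBlocks.enabled": "true",
        # Observability: expose executor metrics in Prometheus format on the driver UI.
        "spark.ui.prometheus.enabled": "true",
    }


def submit_args(conf: dict[str, str], master: str, app: str) -> list[str]:
    """Render the mapping as spark-submit arguments, sorted for reproducible diffs."""
    args = ["spark-submit", "--master", master, "--deploy-mode", "cluster"]
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args + [app]


if __name__ == "__main__":
    print("executor pod memory:", executor_pod_memory_mib(8192), "MiB")
    print(" ".join(submit_args(spark_conf(),
                               "k8s://https://k8s-api.example.com:6443",  # hypothetical
                               "local:///opt/app/train.py")))  # hypothetical
```

With the defaults, an executor configured for 8192 MiB requests 8192 + 819 = 9011 MiB from Kubernetes, which is the figure to use when sizing node pools or namespace quotas.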

Attendees will gain practical insights into building cloud-native data infrastructure that scales effectively for AI/ML workloads, with real-world examples of Kubernetes configurations, deployment patterns, and operational best practices.
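The operator-based and GitOps topics can be illustrated the same way. The sketch below assumes the Kubeflow Spark Operator (the abstract does not say which operator the talk uses). It builds a `SparkApplication` resource (`sparkoperator.k8s.io/v1beta2`) as a plain dict and serializes it as JSON, which `kubectl apply -f` accepts. The restart policy lets the operator resubmit failed runs, and `dynamicAllocation` bounds the executor count. The application name, namespace, image, Spark version, service account, and file path are hypothetical.

```python
"""Sketch: a SparkApplication manifest for the Kubeflow Spark Operator, kept in Git.

Assumptions (hypothetical): the operator is installed; the namespace, image,
Spark version, service account, and application path are placeholders.
"""
import json


def spark_application(name: str, image: str, main_file: str,
                      max_executors: int = 20) -> dict:
    """Build a SparkApplication (sparkoperator.k8s.io/v1beta2) as a plain dict."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": "ml-jobs"},  # hypothetical namespace
        "spec": {
            "type": "Python",
            "mode": "cluster",
            "image": image,
            "mainApplicationFile": main_file,
            "sparkVersion": "3.5.1",  # hypothetical; must match the image
            # The operator, not a human, resubmits failed or unsubmittable runs.
            "restartPolicy": {
                "type": "OnFailure",
                "onFailureRetries": 3,
                "onFailureRetryInterval": 30,  # seconds
                "onSubmissionFailureRetries": 3,
                "onSubmissionFailureRetryInterval": 30,  # seconds
            },
            "dynamicAllocation": {
                "enabled": True,
                "minExecutors": 1,
                "maxExecutors": max_executors,
            },
            "driver": {"cores": 1, "memory": "2g", "serviceAccount": "spark"},  # hypothetical SA
            "executor": {"cores": 4, "instances": 2, "memory": "8g"},
        },
    }


if __name__ == "__main__":
    manifest = spark_application(
        name="feature-pipeline",  # hypothetical
        image="registry.example.com/spark-ml:3.5.1",  # hypothetical
        main_file="local:///opt/app/train.py",  # hypothetical
    )
    # JSON is valid input for `kubectl apply -f`; commit this output to Git.
    print(json.dumps(manifest, indent=2))
```

Committing the rendered file to a repository watched by a GitOps controller such as Argo CD or Flux means that a change to executor sizing or retry policy arrives as a reviewed commit rather than an ad hoc `kubectl` edit.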


Conf42 Cloud Native 2025 - Online

Anant Kumar

Tech Lead @ Salesforce
