Spark is known for its in-memory computation. However, in-memory computation, particularly inner joins on large datasets, makes it hard to trace back how records were filtered out at each stage. This talk shares lessons learned in production and explains how we pivoted towards one over the other.
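To make the traceability problem concrete, here is a minimal sketch in plain Python (not the Spark API, and not necessarily the approach covered in the talk). It assumes hypothetical `orders` and `customers` data and two illustrative helpers, `inner_join` and `left_join`. An inner join silently drops rows with no match, while a left join keeps them with a `None` placeholder so they can be inspected afterwards.

```python
def inner_join(left, right):
    """Keep only left rows whose key exists in right; unmatched rows vanish."""
    return [(key, value, right[key]) for key, value in left if key in right]


def left_join(left, right):
    """Keep every left row; unmatched rows get None in place of the right value."""
    return [(key, value, right.get(key)) for key, value in left]


# Hypothetical sample data: (order_id, item) rows and an order_id -> customer lookup.
orders = [(1, "book"), (2, "pen"), (3, "lamp")]
customers = {1: "Ana", 3: "Raj"}

inner = inner_join(orders, customers)
left = left_join(orders, customers)

# With the left join, the rows an inner join would have discarded are still visible.
dropped = [row for row in left if row[2] is None]

print("inner join rows:", inner)
print("rows lost by the inner join:", dropped)
```

The inner join result alone gives no hint that order 2 ever existed; the left join result lets you count and examine such rows at each stage.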