Spark is known for its in-memory computation. But in-memory computation, particularly inner joins on large datasets, makes it hard to trace back how records were filtered out at each stage. This talk highlights lessons learned in production and how we pivoted towards one over the other.
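To make the tracing problem concrete, here is a minimal sketch in plain Python (not Spark code, and not necessarily the approach the talk describes). It performs an inner join and also keeps the rows that the join silently discards, so each dropped record can be accounted for. The function name `inner_join_with_audit` and the sample `orders` and `customers` data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def inner_join_with_audit(left, right, key):
    """Inner-join two lists of dicts on `key` and report what each side lost.

    Hypothetical helper for illustration only; it is not part of any library.
    Returns (joined_rows, dropped_left_rows, dropped_right_rows).
    """
    # Index the right side by join key so lookups are O(1).
    right_by_key = defaultdict(list)
    for row in right:
        right_by_key[row[key]].append(row)

    joined = []
    dropped_left = []
    matched_keys = set()

    for row in left:
        matches = right_by_key.get(row[key])
        if not matches:
            # An inner join would discard this row without any record of it.
            dropped_left.append(row)
            continue
        matched_keys.add(row[key])
        for other in matches:
            joined.append({**row, **other})

    # Right-side rows whose key never appeared on the left are dropped as well.
    dropped_right = [row for row in right if row[key] not in matched_keys]
    return joined, dropped_left, dropped_right


# Made-up sample data.
orders = [
    {"order_id": 1, "customer_id": "a"},
    {"order_id": 2, "customer_id": "b"},
    {"order_id": 3, "customer_id": "z"},  # no matching customer
]
customers = [
    {"customer_id": "a", "name": "Ann"},
    {"customer_id": "b", "name": "Bob"},
    {"customer_id": "c", "name": "Cy"},  # no matching order
]

joined, dropped_orders, dropped_customers = inner_join_with_audit(
    orders, customers, "customer_id"
)
print(len(joined), dropped_orders, dropped_customers)
```

In Spark itself, a comparable audit could plausibly be built from a `left_anti` join on each side of the inner join, at the cost of extra passes over the data.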