Mastering Spark Internals
Introduction
Welcome! (1:48)
Why is it important to understand Spark internals? (1:24)
Why is Spark such a valuable skill? (3:09)
What is Spark? (0:45)
Components of Spark (3:14)
Understanding Spark's Execution Model
High-level architecture (4:05)
Important terminology (2:30)
Understanding the local deployment mode (1:22)
How is parallelism achieved? (5:36)
Understanding Spark's foundation: MapReduce (7:39)
How does Spark relate to MapReduce? (2:19)
Introduction to RDDs (Resilient Distributed Datasets) (5:17)
Spark's implementation of RDDs (3:02)
DAG: The directed acyclic graph (4:23)
Understanding narrow and wide dependencies (2:38)
Dependencies vs. transformations (2:55)
Spark Core's optimization: Pipelining (2:29)
Physical planning in Spark Core (6:52)
Tasks: The unit of execution (2:48)
Scheduling of tasks (3:22)
Task execution on executors (2:38)
Memory management in Spark (10:43)
Understanding SparkSQL
Introduction to SparkSQL (3:02)
Our example use-case (2:36)
Implementation of the example use-case (1:23)
Jupyter notebook, implementation & exploring plans (12:48)
The Catalyst: SparkSQL's optimization engine (3:04)
What's a logical plan? (3:54)
Planning step 1: Analysis (4:17)
Planning step 2: Logical planning (6:02)
Planning step 3: Physical planning (13:43)
Planning step 4: Code generation (2:31)
Conclusion
Conclusion (3:33)
Important terminology
Lesson content locked
If you're already enrolled, you'll need to log in.