What will you learn from this course?

– Light My Fire: Starting To Use Spark With dplyr Syntax
Getting started
Made for each other
Here be dragons
The connect-work-disconnect pattern
Copying data into Spark
Big data, tiny tibble
Exploring the structure of tibbles
Selecting columns
Filtering rows
Arranging rows
Mutating columns
Summarizing columns

– Going Native: Use The Native Interface to Manipulate Spark DataFrames
Two new interfaces
Popcorn double feature
Transforming continuous variables to logical
Transforming continuous variables into categorical (1)
Transforming continuous variables into categorical (2)
More than words: tokenization (1)
More than words: tokenization (2)
More than words: tokenization (3)
Sorting vs. arranging
Exploring Spark data types
Shrinking the data by sampling
Training/testing partitions

– Tools of the Trade: Advanced dplyr Usage
Leveling up
Mother’s little helper (1)
Mother’s little helper (2)
Selecting unique rows
Common people
Collecting data back from Spark
Storing intermediate results
Groups: great for music, great for data
Groups of mutants
Advanced Selection II: The SQL
Left joins
Anti joins
Semi joins

– Case Study: Learning to be a Machine: Running Machine Learning Models on Spark
Machine learning on Spark
Machine learning functions
(Hey you) What’s that sound?
Working with parquet files
Come together
Partitioning data with a group effect
Gradient boosted trees: modeling
Gradient boosted trees: prediction
Gradient boosted trees: visualization
Random Forest: modeling
Random Forest: prediction
Random Forest: visualization
Comparing model performance

Certification : No
Time to complete : 4 hours
Cost : $29 per month
Course Level : Beginner
Language : English