Description
The author recommends this course:
Taming Big Data with Apache Spark and Python – Hands On!
Apache Spark tutorial with 20+ hands-on examples of analyzing large data sets on your desktop or on Hadoop with Python!
Disclaimer: This course is a recommendation from the author and is not the work of bigdataprogrammers.com.
What you’ll learn
- Use DataFrames and Structured Streaming in Spark 3 (see the DataFrame sketch after this list)
- Frame big data analysis problems as Spark problems
- Use Amazon’s Elastic MapReduce service to run your job on a cluster with Hadoop YARN
- Install and run Apache Spark on a desktop computer or on a cluster
- Use Spark’s Resilient Distributed Datasets to process and analyze large data sets across many CPUs
- Implement iterative algorithms such as breadth-first search using Spark
- Use the MLlib machine learning library to answer common data mining questions
- Understand how Spark SQL lets you work with structured data
- Understand how Spark Streaming lets you process continuous streams of data in real time
- Tune and troubleshoot large jobs running on a cluster
- Share information between nodes on a Spark cluster using broadcast variables and accumulators (see the sketch after this list)
- Understand how the GraphX library helps with network analysis problems
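To give a flavor of the DataFrame material covered in the course, here is a minimal PySpark sketch, assuming a local Spark 3 installation with the `pyspark` package available; the column names and sample rows are invented for illustration and are not from the course itself.

```python
from pyspark.sql import SparkSession

# Start a local Spark session (assumes pyspark is installed, e.g. `pip install pyspark`)
spark = SparkSession.builder.appName("dataframe-sketch").master("local[*]").getOrCreate()

# Build a small DataFrame from made-up in-memory rows
people = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45), ("Carol", 29)],
    ["name", "age"],
)

# A typical DataFrame query: filter rows, then aggregate with the SQL-like API
people.filter(people.age > 30).groupBy().avg("age").show()

spark.stop()
```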
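Likewise, a small sketch of broadcast variables and accumulators, again assuming a local PySpark session; the lookup table and country codes are made up for the example.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("broadcast-accumulator-sketch").master("local[*]").getOrCreate()
sc = spark.sparkContext

# Broadcast variable: ship a read-only lookup table to every executor once
country_names = sc.broadcast({"US": "United States", "IN": "India", "DE": "Germany"})

# Accumulator: a counter the workers update and the driver reads back after an action
unknown_codes = sc.accumulator(0)

def resolve(code):
    if code in country_names.value:
        return country_names.value[code]
    unknown_codes.add(1)
    return "Unknown"

codes = sc.parallelize(["US", "DE", "XX", "IN", "XX"])
print(codes.map(resolve).collect())  # ['United States', 'Germany', 'Unknown', 'India', 'Unknown']
print(unknown_codes.value)           # 2

spark.stop()
```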