Course Outline

Introduction:

  • Apache Spark within the Hadoop Ecosystem
  • Brief introduction to Python and Scala

Foundational Concepts (Theory):

  • Architecture
  • RDDs
  • Transformations and Actions
  • Stages, Tasks, and Dependencies

Hands-on Workshop: Mastering Basics in the Databricks Environment:

  • Exercises utilizing the RDD API
  • Core action and transformation functions
  • PairRDDs
  • Joins
  • Caching strategies
  • Exercises utilizing the DataFrame API
  • Spark SQL
  • DataFrame operations: select, filter, group, sort
  • User-Defined Functions (UDFs)
  • Exploring the Dataset API
  • Streaming

Hands-on Workshop: Understanding Deployment in the AWS Environment:

  • Fundamentals of AWS Glue
  • Differences between Amazon EMR and AWS Glue
  • Example jobs run on both platforms
  • Advantages and disadvantages of each platform

Additional Topics:

  • Introduction to Apache Airflow for orchestration

Requirements

  • Programming skills (preferably in Python or Scala)
  • Basic knowledge of SQL

Duration

21 hours
