This Apache Spark Programming with Databricks training course uses a case study driven approach to explore the fundamentals of Spark Programming with Databricks, including Spark architecture, the DataFrame API, query optimization, and Structured Streaming. First, you will become familiar with Databricks and Spark, recognize their major components, and explore datasets for the case study using the Databricks environment. After ingesting data from various file formats, you will process and analyze datasets by applying a variety of DataFrame transformations, Column expressions, and built-in functions. Lastly, you will execute streaming queries to process streaming data and highlight the advantages of using Delta Lake.
By attending Apache Spark Programming with Databricks workshop, delegates will learn to:
- Define the major components of Spark architecture and execution hierarchy
- Describe how DataFrames are built, transformed, and evaluated in Spark
- Apply the DataFrame API to explore, preprocess, join, and ingest data in Spark
- Apply the Structured Streaming API to perform analytics on streaming data
- Navigate the Spark UI and describe how the catalyst optimizer, partitioning, and caching affect Spark's execution performance
- Familiarity with basic SQL concepts (select, filter, groupby, join, etc.)
- Beginner programming experience with Python or Scala (syntax, conditions, loops, functions)
The Apache Spark Programming with Databricks class is ideal for:
- Data engineers
- Data scientists
- Machine learning engineers
- Data architects