Call : (+91) 968636 4243
Mail : info@EncartaLabs.com

Apache Spark with Scala for Big Data Solutions

(Duration: 4 Days)

In the Apache Spark with Scala for Big Data Solutions training course, you will learn to apply Spark best practices, develop solutions that run on the Apache Spark platform, and take advantage of Spark's efficient use of memory and powerful programming model. Learn to supercharge your data with Apache Spark, a big data platform well suited to the iterative algorithms required by graph analytics and machine learning.

By attending the Apache Spark with Scala for Big Data Solutions workshop, delegates will learn to:

Develop applications with Spark
Work with the libraries for SQL, Streaming, and Machine Learning
Map real-world problems to parallel algorithms
Build business applications that integrate with Spark

A minimum of 6 months of professional programming experience in Java or C#

COURSE AGENDA

Introduction to Spark

Defining Big Data and Big Computation
What is Spark?
What are the benefits of Spark?

Scaling-out applications

Identifying the performance limitations of a modern CPU
Scaling traditional parallel processing models

Designing parallel algorithms

Fostering parallelism through functional programming
Mapping real-world problems to effective parallel algorithms

Parallelizing data structures

Partitioning data across the cluster using Resilient Distributed Datasets (RDD) and DataFrames
Apportioning task execution across multiple nodes
Running applications with the Spark execution model

The anatomy of a Spark cluster

Creating resilient and fault-tolerant clusters
Achieving scalable distributed storage

Managing the cluster

Monitoring and administering Spark applications
Visualizing execution plans and results

Selecting the development environment

Performing exploratory programming via the Spark shell
Building stand-alone Spark applications

Working with the Spark APIs

Programming with Scala and other supported languages
Building applications with the core APIs
Enriching applications with the bundled libraries

Querying structured data

Processing queries with DataFrames and embedded SQL
Extending SQL with User-Defined Functions (UDFs)
Exploiting Parquet and JSON formatted data sets

Integrating with external systems

Connecting to databases with JDBC
Executing Hive queries in external applications

What is streaming?

Implementing sliding window operations
Determining state from continuous data
Processing simultaneous streams
Improving performance and reliability

Streaming data sources

Streaming from built-in sources (e.g., log files, Twitter, sockets, Kinesis, Kafka)
Developing custom receivers
Processing with the streaming API and Spark SQL

Classifying observations

Predicting outcomes with supervised learning
Building a decision tree classifier

Identifying patterns

Grouping data using unsupervised learning
Clustering with the k-means method

Encarta Labs Advantage

One-stop corporate training solution provider offering over 6,000 courses on a variety of subjects
All courses are delivered by Industry Veterans
Go from newbie to production-ready in a matter of a few days

Trained more than 50,000 corporate executives across the globe
All our training is conducted in workshop mode, with a strong focus on hands-on sessions

View our other course offerings by visiting https://www.encartalabs.com/course-catalogue-all.php

Contact us to have this course delivered as a public/open-house workshop or as online training for a group of 10+ candidates.
