Data Science with Python Training Course and Workshop in Bangalore, Mysore, Chennai, Hyderabad, Pune, Mumbai, Delhi, Noida, Gurgaon, Kolkata

The Data Science with Python training course teaches engineers, data scientists, statisticians, and other quantitative professionals the Python skills they need to analyze and chart data.

By attending the Data Science with Python workshop, delegates will learn to:

Understand the differences between Python's basic data types
Know when to use different Python collections
Implement Python functions
Understand control flow constructs in Python
Handle errors via exception handling constructs
Quantitatively define an answerable, actionable question
Import both structured and unstructured data into Python
Parse unstructured data into structured formats
Understand the differences between NumPy arrays and pandas dataframes
Understand where Python fits in the Python/Hadoop/Spark ecosystem
Simulate data through random number generation
Understand mechanisms for missing data and analytic implications
Explore and clean data
Create compelling graphics to reveal analytic results
Reshape and merge data to prepare for advanced analytics
Test for group differences using inferential statistics
Implement linear regression from a frequentist perspective
Understand non-linear terms, confounding, and interaction in linear regression
Extend to logistic regression to model binary outcomes
Understand the difference between machine learning and frequentist approaches to statistics
Implement classification and regression models using machine learning
Score new datasets, evaluate model fit, and quantify variable importance

Prerequisites: programming experience and an understanding of basic statistics.

Base Python Introduction

History and current use

Installing the Software
Python Distributions

String Literals and numeric objects
Collections (lists, tuples, dicts)
Datetime classes in Python
Memory Management in Python
Control Flow
Functions
Exception Handling
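
The base Python topics above (collections, control flow, functions, and exception handling) can be illustrated with a minimal sketch. The function name `parse_reading` and the sample values are illustrative assumptions, not taken from the course materials.

```python
from datetime import date


def parse_reading(raw):
    """Convert a raw string to a float, or return None if it is not numeric."""
    try:
        return float(raw)
    except ValueError:
        # float() raises ValueError for text such as "n/a"
        return None


# list: ordered and mutable, suited to a sequence of raw inputs
raw_readings = ["3.5", "4.0", "n/a", "2.5"]

# control flow via a comprehension: keep only the values that parsed
valid = [v for v in map(parse_reading, raw_readings) if v is not None]

# dict: named fields, including a datetime.date object (hypothetical date)
summary = {
    "count": len(valid),
    "mean": sum(valid) / len(valid),
    "as_of": date(2020, 1, 1),
}

# tuple: a fixed-size, immutable pair
bounds = (min(valid), max(valid))
```

Returning `None` for unparseable input is one design choice among several; re-raising the exception with more context would suit cases where bad input should stop the analysis.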

Defining Actionable, Analytic Questions

Defining the quantitative construct to make inference on the question
Identifying the data needed to support the constructs
Identifying limitations to the data and analytic approach
Constructing Sensitivity analyses

Bringing Data In

Structured Data

Structured Text Files
Excel workbooks
SQL databases

Working with Unstructured Text Data

Reading Unstructured Text
Introduction to Natural Language Processing with Python

NumPy: Matrix Language

Introduction to the ndarray
NumPy operations
Broadcasting
Missing data in NumPy (masked array)
NumPy Structured arrays
Random number generation
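
As a rough illustration of the NumPy topics above, the sketch below touches the ndarray, broadcasting, masked arrays, and random number generation. The seed, array shapes, and the `-999.0` missing-value sentinel are assumptions chosen for the example.

```python
import numpy as np

# Random number generation: simulate a 5x3 ndarray of normal draws
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=10.0, scale=2.0, size=(5, 3))

# Broadcasting: the (3,) vector of column means is stretched across
# all 5 rows of the (5, 3) array during subtraction
col_means = data.mean(axis=0)
centered = data - col_means

# Missing data via a masked array: -999.0 is an assumed sentinel value
readings = np.array([1.0, -999.0, 3.0, 5.0])
masked = np.ma.masked_equal(readings, -999.0)
masked_mean = masked.mean()  # the masked entry is ignored
```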

Data Preparation with Pandas

Filtering
Creating and deleting variables
Discretization of Continuous Data
Scaling and standardizing data
Identifying Duplicates
Dummy Coding
Combining Datasets
Transposing Data
Long to wide and back
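
Several of the data preparation steps listed above might look like the following in pandas. This is a sketch on a tiny made-up table; the column names, bin edges, and labels are assumptions for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 3],
    "group": ["a", "b", "a", "a"],
    "age": [25, 47, 62, 62],
})

# Identifying and dropping duplicate rows
df = df.drop_duplicates().reset_index(drop=True)

# Filtering rows with a boolean condition
adults = df[df["age"] >= 30]

# Creating a standardized variable (z-score, sample standard deviation)
df["age_z"] = (df["age"] - df["age"].mean()) / df["age"].std()

# Discretizing a continuous variable into labeled bins
df["age_band"] = pd.cut(
    df["age"], bins=[0, 30, 50, 120], labels=["young", "middle", "older"]
)

# Dummy coding a categorical variable
dummies = pd.get_dummies(df["group"], prefix="group")

# Wide to long and back again
wide = pd.DataFrame({"id": [1, 2], "score_t1": [10, 20], "score_t2": [12, 18]})
long = wide.melt(id_vars="id", var_name="time", value_name="score")
back = long.pivot(index="id", columns="time", values="score")
```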

Exploratory Data Analysis with Pandas

Univariate Statistical Summaries and Detecting Outliers
Multivariate Statistical Summaries and Outlier Detection
Group-wise calculations using Pandas
Pivot Tables
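
Group-wise calculations and pivot tables, as listed above, can be sketched on a small hypothetical sales table. The column names and values are invented for the example.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "quarter": ["q1", "q2", "q1", "q2"],
    "revenue": [100.0, 120.0, 80.0, 90.0],
})

# Univariate summary of a single column (count, mean, std, quartiles)
overall = sales["revenue"].describe()

# Group-wise calculations: mean and total revenue per region
by_region = sales.groupby("region")["revenue"].agg(["mean", "sum"])

# Pivot table: regions as rows, quarters as columns, summed revenue in cells
table = sales.pivot_table(
    index="region", columns="quarter", values="revenue", aggfunc="sum"
)
```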

Exploring Data Graphically

Histogram
Box-and-whiskers plot
Scatter plots
Forest Plots
Group-by plotting

Advanced Graphing with Matplotlib, Pandas, and Seaborn

Python, Hadoop and Spark

Introduction to the differences between Python, Hadoop, and Spark
Importing data from Spark and Hadoop to Python
Parallel execution leveraging Spark or Hadoop

Missing Data

Exploring and understanding patterns in missing data
Missing at Random
Missing Not at Random
Missing Completely at Random
Data imputation methods

Traditional Inferential Statistics

Comparing Groups

P-Values, summary statistics, sufficient statistics, inferential targets
T-Tests (equal and unequal variances)
ANOVA
Chi-Square Tests
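
The group comparison tests above are available in SciPy's `scipy.stats` module. The sketch below runs them on simulated data; the group means, spreads, sample sizes, and the contingency table are assumptions made up for the example, so the resulting p-values carry no real-world meaning.

```python
import numpy as np
from scipy import stats

# Simulated groups (assumed parameters, for illustration only)
rng = np.random.default_rng(seed=42)
group_a = rng.normal(loc=50.0, scale=5.0, size=40)
group_b = rng.normal(loc=53.0, scale=8.0, size=40)
group_c = rng.normal(loc=50.0, scale=5.0, size=40)

# Two-sample t-tests: pooled (equal variances) and Welch (unequal variances)
t_equal = stats.ttest_ind(group_a, group_b)
t_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

# One-way ANOVA across three groups
anova = stats.f_oneway(group_a, group_b, group_c)

# Chi-square test of independence on a 2x2 table of counts
observed = np.array([[30, 10], [20, 20]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(observed)
```

Each result exposes a test statistic and a p-value (for example `t_welch.statistic` and `t_welch.pvalue`). The Welch variant is generally the safer default when the groups may differ in variance.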

Correlation

Frequentist Approaches to Multivariate Statistics

Linear Regression

Multivariate linear regression
Capturing Non-linear Relationships
Comparing Model Fits
Scoring new data
Poisson Regression Extension
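
As one possible illustration of multivariate linear regression with a non-linear term and scoring of new data, the sketch below fits ordinary least squares using NumPy alone. The course may well use a dedicated statistics library instead; the simulated coefficients and noise level here are assumptions.

```python
import numpy as np

# Simulate data from an assumed model: y = 1 + 2x + 0.5x^2 + noise
rng = np.random.default_rng(seed=1)
x = np.linspace(-2, 2, 50)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=x.size)

# Design matrix: intercept, linear term, and a squared term that
# captures the non-linear relationship
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares fit
coef, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# Scoring new data: build the same columns, then apply the coefficients
x_new = np.array([0.0, 1.0])
X_new = np.column_stack([np.ones_like(x_new), x_new, x_new**2])
y_pred = X_new @ coef
```

Comparing this fit against one without the squared column (for example via the residual sum of squares) is a simple way to see what the non-linear term contributes.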

Logistic regression

Logistic Regression Example
Classification Metrics

Machine Learning Approaches to Multivariate Statistics

Machine Learning Theory
Data pre-processing

Missing Data
Dummy Coding
Standardization
Training/Test data

Supervised Versus Unsupervised Learning
Unsupervised Learning: Clustering

Clustering Algorithms
Evaluating Cluster Performance

Dimensionality Reduction

A-priori
Principal Components Analysis
Penalized Regression

Supervised Learning: Regression

Linear Regression
Penalized Linear Regression
Stochastic Gradient Descent
Scoring New Data Sets
Cross Validation
Bias-Variance Tradeoff
Feature Importance

Supervised Learning: Classification

Logistic Regression
LASSO
Random Forest
Ensemble Methods
Feature Importance
Scoring New Data Sets
Cross Validation

Encarta Labs Advantage

One-stop corporate training solution provider offering over 6,000 courses on a variety of subjects
All courses are delivered by industry veterans
Go from newbie to production-ready in a matter of a few days

Trained more than 50,000 corporate executives across the globe
All our training is conducted in workshop mode, with a focus on hands-on sessions
