Call : (+91) 968636 4243
Mail : info@EncartaLabs.com
EncartaLabs

Data Mining Techniques - Theory and Practice

( Duration: 3 Days )

The Data Mining Techniques - Theory and Practice training course introduces a data mining methodology that is a superset of the SAS SEMMA methodology around which SAS Enterprise Miner is organized. The course also introduces a wide range of data mining algorithms, combining theoretical knowledge with practical skills. In this class, you will work through all the steps of a data mining project, beginning with problem definition and data selection, and continuing through data exploration, data transformation, sampling, partitioning, modeling, and assessment.

By attending the Data Mining Techniques - Theory and Practice workshop, delegates will learn to:

  • Use a data mining methodology
  • Build and use decision trees and neural networks for modeling and scoring
  • Use survival analysis and create survival curves

This Data Mining Techniques - Theory and Practice class is suitable for business analysts, their managers, and statisticians.

COURSE AGENDA

1

Introduction to Data Mining

  • What is data mining?
  • Directed and undirected data mining
  • Models
  • Profiling and prediction
2

Data Mining Methodology

  • Why have a methodology?
  • How data miners can inadvertently learn things that are not true
  • Translating business problems into data mining problems
  • The importance of model stability
  • Finding the right input variables
  • Sampling to create balanced model sets
  • Partitioning to create training, validation, and test sets
  • Data preparation
  • Model assessment
3

Data Exploration

  • Developing intuition about data
  • Data structure
  • Data types
  • Data values
  • Exploring distributions
  • Summary statistics
  • Histograms
  • Using SAS Enterprise Miner for data exploration
4

Regression Models

  • The null hypothesis
  • Statistical significance
  • Confidence bounds
  • Variance and standard deviation
  • Standardized values
  • Correlation
  • Linear regression
  • Logistic regression
  • Using SAS Enterprise Miner to build regression models
5

Decision Trees

  • Decision trees as data exploration and classification tools
  • Decision trees for modeling and scoring
  • Decision trees for variable selection
  • Alternate representations of decision trees
  • Algorithms used to build decision trees
  • Splitting criteria
  • Recognizing instability and overfitting in decision tree models
  • Capturing interactions between variables
  • Using SAS Enterprise Miner to build decision trees
6

Neural Networks

  • Origins of neural networks
  • Neural networks compared with regression
  • Algorithms used to train neural networks
  • Data preparation requirements for neural networks
  • Picking appropriate inputs for neural networks
  • Creating neural network models using SAS Enterprise Miner
7

Memory-Based Reasoning

  • Similarity and distance
  • Distance metrics appropriate for different kinds of data
  • The role of the training set in memory-based reasoning (MBR)
  • Combining the votes of several neighbors
  • Other K-nearest neighbor techniques
  • Collaborative filtering
  • Using the SAS Enterprise Miner MBR node
8

Clustering

  • More on similarity and distance
  • The k-means algorithm
  • Divisive clustering
  • Agglomerative clustering
  • Data preparation for clustering
  • Interpreting clusters
  • Finding clusters with SAS Enterprise Miner
9

Survival Analysis

  • Origins of survival analysis
  • How business data is different from clinical data
  • Hazards and hazard charts
  • Retention curves and survival curves
  • Calculating survival from retention
  • Calculating hazards empirically
  • Parametric hazard models
  • Censoring
  • Competing risks
  • Survival-based forecasting
  • Using SAS code in SAS Enterprise Miner to create survival curves
10

Association Rules

  • Market basket analysis
  • Association rules
  • Sequential pattern analysis
  • Using SAS Enterprise Miner to discover associations in retail data
11

Link Analysis

  • Background on graph theory
  • Sphere of influence
  • Using link analysis to generate derived variables
  • Graph-coloring algorithm
  • Kleinberg's algorithm
12

Genetic Algorithms

  • Optimization techniques and problems (SAS/OR software)
  • Other algorithms
  • Linear programming problems
  • Genetic algorithms

Encarta Labs Advantage

  • One-stop corporate training solution provider offering over 6,000 courses on a variety of subjects
  • All courses are delivered by Industry Veterans
  • Get jumpstarted from newbie to production-ready in a matter of a few days
  • Trained more than 50,000 Corporate executives across the Globe
  • All our trainings are conducted in workshop mode with more focus on hands-on sessions

View our other course offerings by visiting https://www.encartalabs.com/course-catalogue-all.php

Contact us to have this course delivered as a public or open-house workshop, or as online training, for a group of 10+ candidates.
