In the IBM InfoSphere DataStage - Essentials training course, you will acquire the skills necessary to develop parallel jobs in DataStage. You will learn to create parallel jobs that access sequential and relational data, and that combine and transform the data using functions and other job components.
In the IBM InfoSphere DataStage - Advanced Data Processing training course, you will learn techniques for processing different types of complex data resources, including relational data, unstructured data (Excel spreadsheets), Hadoop HDFS files, and XML data. You will also learn advanced data processing techniques, including masking data and validating data using data rules. Finally, you will learn techniques for updating data in a star schema data warehouse using the DataStage SCD (Slowly Changing Dimensions) stage.
In the IBM InfoSphere Advanced DataStage - Parallel Framework training course, you will develop a deeper understanding of the DataStage architecture, including the DataStage development and runtime environments. This will enable you to design parallel jobs that are robust, less error-prone, reusable, and optimized for better performance.
By attending the IBM InfoSphere DataStage - Essentials workshop, delegates will learn to:
- Describe the uses of DataStage and the DataStage workflow
- Describe the Information Server architecture and how DataStage fits within it
- Describe the Information Server and DataStage deployment options
- Use the Information Server Web Console and the DataStage Administrator client to create DataStage users and to configure the DataStage environment
- Import DataStage objects from a file and export them to a file
- Import table definitions for sequential files and relational tables
- Design, compile, run, and monitor DataStage parallel jobs
- Design jobs that read from and write to sequential files
- Describe the DataStage parallel processing architecture
- Design jobs that combine data using joins and lookups
- Design jobs that sort and aggregate data
- Implement complex business logic using the DataStage Transformer stage
- Debug DataStage jobs using the DataStage PX Debugger
By attending the IBM InfoSphere DataStage - Advanced Data Processing workshop, delegates will learn to:
- Use Connector stages to read from and write to database tables
- Handle SQL errors in Connector stages
- Use Connector stages with multiple input links
- Use the File Connector stage to access Hadoop HDFS data
- Optimize jobs that write to database tables
- Use the Unstructured Data stage to extract data from Excel spreadsheets
- Use the Data Masking stage to mask sensitive data processed within a DataStage job
- Use the Hierarchical stage to parse, compose, and transform XML data
- Use the Schema Library Manager to import and manage XML schemas
- Use the Data Rules stage to validate fields of data within a DataStage job
- Create custom data rules for validating data
- Design a job that processes a star schema data warehouse with Type 1 and Type 2 slowly changing dimensions
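The difference between Type 1 and Type 2 slowly changing dimensions can be illustrated with a minimal sketch in plain Python. This is not DataStage code and does not use the SCD stage; the `DimRow` record, its field names, and both helper functions are hypothetical and exist only to show the two update behaviors: Type 1 overwrites the attribute and keeps no history, while Type 2 expires the current row and adds a new version.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    """Hypothetical customer dimension row (illustrative field names)."""
    surrogate_key: int
    business_key: str
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_type1(rows: List[DimRow], business_key: str, new_city: str) -> List[DimRow]:
    """Type 1: overwrite the attribute in place; no history is kept."""
    return [
        replace(r, city=new_city) if r.business_key == business_key else r
        for r in rows
    ]


def apply_type2(rows: List[DimRow], business_key: str, new_city: str,
                change_date: date) -> List[DimRow]:
    """Type 2: expire the current row and append a new current version."""
    result: List[DimRow] = []
    changed = False
    for r in rows:
        is_current = r.business_key == business_key and r.valid_to is None
        if is_current and r.city != new_city:
            # Close the old version instead of overwriting it.
            result.append(replace(r, valid_to=change_date))
            changed = True
        else:
            result.append(r)
    if changed:
        # Assumption: surrogate keys are simple increasing integers.
        next_key = max(r.surrogate_key for r in rows) + 1
        result.append(DimRow(next_key, business_key, new_city, change_date))
    return result


dim = [DimRow(1, "C100", "Boston", date(2020, 1, 1))]
type1_result = apply_type1(dim, "C100", "Denver")
type2_result = apply_type2(dim, "C100", "Denver", date(2024, 6, 1))
```

After the Type 1 update the dimension still holds one row, now with the new city. After the Type 2 update it holds two rows: the original row with an end date, and a new current row with a new surrogate key.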
By attending the IBM InfoSphere Advanced DataStage - Parallel Framework workshop, delegates will learn to:
- Describe the parallel processing architecture
- Describe pipeline and partition parallelism
- Describe the role and elements of the DataStage configuration file
- Describe the compile process and how it is represented in the OSH
- Describe the runtime job execution process and how it is depicted in the Score
- Describe how data partitioning and collecting work in the parallel framework
- List and select partitioning and collecting algorithms
- Describe sorting in the parallel framework
- Describe optimization techniques for sorting
- Describe sort key and partitioner key logic in the parallel framework
- Describe buffering in the parallel framework
- Describe optimization techniques for buffering
- Describe and work with parallel framework data types and elements, including virtual data sets and schemas
- Describe the function and use of Runtime Column Propagation (RCP) in DataStage parallel jobs
- Create reusable job components using shared containers
- Describe the function and use of Balanced Optimization
- Optimize DataStage parallel jobs using Balanced Optimization
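As a rough illustration of the partitioning topics listed above, the following plain-Python sketch contrasts round-robin and hash partitioning. It is not DataStage code: the function names are hypothetical, and CRC32 is only a stand-in for whatever hash function the parallel framework actually uses. The point it shows is that hash partitioning sends all rows with the same key value to the same partition, which key-based operations such as joins, sorts, and aggregations rely on, whereas round-robin only balances row counts.

```python
import zlib
from typing import Any, Dict, List

Row = Dict[str, Any]


def round_robin_partition(rows: List[Row], num_partitions: int) -> List[List[Row]]:
    """Deal rows out in turn; partition sizes differ by at most one row."""
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for index, row in enumerate(rows):
        partitions[index % num_partitions].append(row)
    return partitions


def hash_partition(rows: List[Row], key: str, num_partitions: int) -> List[List[Row]]:
    """Assign each row by a hash of its key, so equal keys land together."""
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        # CRC32 is an illustrative, deterministic stand-in hash function.
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[digest % num_partitions].append(row)
    return partitions


orders = [
    {"customer": "A", "amount": 10},
    {"customer": "B", "amount": 20},
    {"customer": "A", "amount": 30},
    {"customer": "C", "amount": 40},
    {"customer": "B", "amount": 50},
]
rr_parts = round_robin_partition(orders, 2)
hash_parts = hash_partition(orders, "customer", 2)
```

With round-robin, the two rows for customer `A` may end up in different partitions, so a per-customer aggregation running independently on each partition would produce partial results. With hash partitioning on `customer`, each customer's rows are guaranteed to be in a single partition.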
For IBM InfoSphere DataStage - Essentials
- Knowledge of the Windows OS
- Familiarity with database access techniques
For IBM InfoSphere DataStage - Advanced Data Processing
- Attendance of the IBM InfoSphere DataStage - Essentials course, or equivalent experience
For IBM InfoSphere Advanced DataStage - Parallel Framework
- At least one year of experience developing parallel jobs using DataStage
The IBM InfoSphere DataStage - Essentials class is meant for project administrators and ETL developers.
The IBM InfoSphere DataStage - Advanced Data Processing class is meant for experienced DataStage developers who seek training in more advanced DataStage job techniques and in techniques for working with complex types of data resources.
The IBM InfoSphere Advanced DataStage - Parallel Framework class is designed for experienced DataStage developers who seek training in more advanced DataStage job techniques and an understanding of the parallel framework architecture.
