Informatica Guide


    INFORMATICA

    PREPARED BY:

    Ammar Hasan


CONTENTS

CHAPTER 1: TOOL KNOWLEDGE
1.1 Informatica PowerCenter
1.2 Product Overview
    1.2.1 PowerCenter Domain
    1.2.2 Administration Console
    1.2.3 PowerCenter Repository
    1.2.4 PowerCenter Client
    1.2.5 Repository Service
    1.2.6 Integration Service
    1.2.7 Web Services Hub
    1.2.8 Data Analyzer
    1.2.9 Metadata Manager

CHAPTER 2: REPOSITORY MANAGER
2.1 Adding a Repository to the Navigator
2.2 Configuring a Domain Connection
2.3 Connecting to a Repository
2.4 Viewing Object Dependencies
2.5 Validating Multiple Objects
2.6 Comparing Repository Objects
2.7 Truncating Workflow and Session Log Entries
2.8 Managing User Connections and Locks
2.9 Managing Users and Groups
2.10 Working with Folders

CHAPTER 3: DESIGNER
3.1 Source Analyzer
    3.1.1 Working with Relational Sources
    3.1.2 Working with Flat Files
3.2 Target Designer
3.3 Mappings
3.4 Transformations
    3.4.1 Working with Ports
    3.4.2 Using Default Values for Ports
    3.4.3 User-Defined Default Values
3.5 Tracing Levels
3.6 Basic First Mapping
3.7 Expression Transformation
3.8 Filter Transformation
3.9 Router Transformation


3.10 Union Transformation
3.11 Sorter Transformation
3.12 Rank Transformation
3.13 Aggregator Transformation
3.14 Joiner Transformation
3.15 Source Qualifier
3.16 Lookup Transformation
    3.16.1 Lookup Types
    3.16.2 Lookup Transformation Components
    3.16.3 Connected Lookup Transformation
    3.16.4 Unconnected Lookup Transformation
    3.16.5 Lookup Cache Types: Dynamic, Static, Persistent, Shared
3.17 Update Strategy
3.18 Dynamic Lookup Cache Use
3.19 Lookup Query
3.20 Lookup and Update Strategy Examples
    Example to Insert and Update without a Primary Key
    Example to Insert and Delete based on a condition
3.21 Stored Procedure Transformation
    3.21.1 Connected Stored Procedure Transformation
    3.21.2 Unconnected Stored Procedure Transformation
3.22 Sequence Generator Transformation
3.23 Mapplets: Mapplet Input and Mapplet Output Transformations
3.24 Normalizer Transformation
3.25 XML Sources Import and Usage
3.26 Mapping Wizards
    3.26.1 Getting Started
    3.26.2 Slowly Changing Dimensions
3.27 Mapping Parameters and Variables
3.28 Parameter File
3.29 Indirect Flat File Loading

CHAPTER 4: WORKFLOW MANAGER
4.1 Informatica Architecture
    4.1.1 Integration Service Process
    4.1.2 Load Balancer
    4.1.3 DTM Process
    4.1.4 Processing Threads
    4.1.5 Code Pages and Data Movement
    4.1.6 Output Files and Caches
4.2 Working with Workflows
    4.2.1 Assigning an Integration Service
    4.2.2 Working with Links
    4.2.3 Workflow Variables
    4.2.4 Session Parameters


4.3 Working with Tasks
    4.3.1 Session Task
    4.3.2 Email Task
    4.3.3 Command Task
    4.3.4 Working with Event Tasks
    4.3.5 Timer Task
    4.3.6 Decision Task
    4.3.7 Control Task
    4.3.8 Assignment Task
4.4 Schedulers
4.5 Worklets
4.6 Partitioning
    4.6.1 Partitioning Attributes
    4.6.2 Partitioning Types
    4.6.3 Some Points
4.7 Session Properties
4.8 Workflow Properties


Chapter 1: Informatica PowerCenter


    CHAPTER 1: TOOL KNOWLEDGE

1.1 INFORMATICA POWERCENTER

Informatica PowerCenter is a powerful ETL tool from Informatica Corporation. Informatica Corporation products are:

- Informatica PowerCenter
- Informatica On Demand
- Informatica B2B Data Exchange
- Informatica Data Quality
- Informatica Data Explorer

Informatica PowerCenter is a single, unified enterprise data integration platform for accessing, discovering, and integrating data from virtually any business system, in any format, and delivering that data throughout the enterprise at any speed.

Informatica PowerCenter Editions

Because every data integration project is different and includes many variables, such as data volumes, latency requirements, IT infrastructure, and methodologies, Informatica offers three PowerCenter Editions and a suite of PowerCenter Options to meet your project's and organization's specific needs:

- Standard Edition
- Real Time Edition
- Advanced Edition

Informatica PowerCenter Standard Edition

PowerCenter Standard Edition is a single, unified enterprise data integration platform for discovering, accessing, and integrating data from virtually any business system, in any format, and delivering that data throughout the enterprise to improve operational efficiency.

Key features include:

- A high-performance data integration server
- A global metadata infrastructure
- Visual tools for development and centralized administration
- Productivity tools to facilitate collaboration among architects, analysts, and developers


Informatica PowerCenter Real Time Edition

Packaged for simplicity and flexibility, PowerCenter Real Time Edition extends PowerCenter Standard Edition with additional capabilities for integrating and provisioning transactional or operational data in real time. PowerCenter Real Time Edition provides the ideal platform for developing sophisticated data services and delivering timely information as a service, to support all business needs. It provides the perfect real-time data integration complement to service-oriented architectures and application integration approaches, such as enterprise application integration (EAI), enterprise service buses (ESB), and business process management (BPM).

Key features include:

- Change data capture for relational data sources
- Integration with messaging systems
- Built-in support for Web services
- Dynamic partitioning with data smart parallelism
- Process orchestration and human workflow capabilities

Informatica PowerCenter Advanced Edition

PowerCenter Advanced Edition addresses requirements for organizations that are standardizing data integration at an enterprise level, across a number of projects and departments. It combines all the capabilities of PowerCenter Standard Edition and features additional capabilities that are ideal for data governance and Integration Competency Centers.

Key features include:

- Dynamic partitioning with data smart parallelism
- Powerful metadata analysis capabilities
- Web-based data profiling and reporting capabilities


Informatica PowerCenter Options

A range of options is available to extend PowerCenter's core data integration capabilities.

Data Cleanse and Match Option: Features powerful, integrated cleansing and matching capabilities to correct and remove duplicate customer data.

Data Federation Option: Enables a combination of traditional physical and virtual data integration in a single platform.

Data Masking Option: Protects sensitive, private information by masking it in flight to produce realistic-looking data, reducing the risk of security and compliance breaches.

Enterprise Grid Option: Enhances scalability and delivers optimal performance while reducing the administrative overhead of supporting grid computing environments.

High Availability Option: Minimizes service interruptions during hardware and/or software outages and reduces costs associated with data downtime.

Metadata Exchange Options: Coordinate technical and business metadata from data modeling tools, business intelligence tools, source and target database catalogs, and PowerCenter repositories.

Partitioning Option: Helps IT organizations maximize their technology investments by enabling hardware and software to jointly scale to handle large volumes of data and users.

Pushdown Optimization Option: Enables data transformation processing, where appropriate, to be pushed down into any relational database to make the best use of existing database assets.

Team-Based Development Option: Facilitates collaboration among development, quality assurance, and production administration teams, including geographically disparate teams.

Unstructured Data Option: Expands PowerCenter's data access capabilities to include unstructured data formats, providing virtually unlimited access to all enterprise data formats.


1.2 PRODUCT OVERVIEW

PowerCenter provides an environment that allows you to load data into a centralized location, such as a data warehouse or operational data store (ODS). You can extract data from multiple sources, transform the data according to business logic you build in the client application, and load the transformed data into file and relational targets.

PowerCenter also provides the ability to view and analyze business information and to browse and analyze metadata from disparate metadata repositories.

PowerCenter includes the following components:

- PowerCenter domain
- Administration Console
- PowerCenter repository
- PowerCenter Client
- Repository Service
- Integration Service
- Web Services Hub
- SAP BW Service
- Data Analyzer
- Metadata Manager
- PowerCenter Repository Reports

1.2.1 POWERCENTER DOMAIN

PowerCenter has a service-oriented architecture that provides the ability to scale services and share resources across multiple machines. PowerCenter provides the PowerCenter domain to support the administration of the PowerCenter services. A domain is the primary unit for management and administration of services in PowerCenter.

A domain contains the following components:

- One or more nodes
- Service Manager
- Application services

One or more nodes: A node is the logical representation of a machine in a domain. A domain may contain more than one node. The node that hosts the domain is the master gateway for the domain. You can add other machines as nodes in the domain and configure the nodes to run application services, such as the Integration Service or Repository Service. All service requests from other nodes in the domain go through the master gateway.

Service Manager: The Service Manager is built in to the domain to support the domain and the application services. The Service Manager runs on each node in the domain. The Service Manager starts and runs the application services on a machine.


The Service Manager performs the following functions:

- Alerts: Provides notifications about domain and service events.
- Authentication: Authenticates user requests.
- Authorization: Authorizes user requests for services.
- Domain configuration: Manages domain configuration metadata.
- Node configuration: Manages node configuration metadata.
- Licensing: Registers and verifies license information.
- Logging: Provides accumulated log events from each service in the domain.

Application services: A group of services that represent PowerCenter server-based functionality:

- Repository Service: Manages connections to the PowerCenter repository.
- Integration Service: Runs sessions and workflows.
- Web Services Hub: Exposes PowerCenter functionality to external clients through web services.
- SAP BW Service: Listens for RFC requests from SAP NetWeaver BW and initiates workflows to extract from or load to SAP BW.


1.2.2 ADMINISTRATION CONSOLE

The Administration Console is a web application that we use to manage a PowerCenter domain. If we have a user login to the domain, we can access the Administration Console. Domain objects include services, nodes, and licenses.

Use the Administration Console to perform the following tasks in the domain:

Manage application services: Manage all application services in the domain, such as the Integration Service and Repository Service.

Configure nodes: Configure node properties, such as the backup directory and resources. We can also shut down and restart nodes.

Manage domain objects: Create and manage objects such as services, nodes, licenses, and folders. Folders allow us to organize domain objects and to manage security by setting permissions for domain objects.

View and edit domain object properties: We can view and edit properties for all objects in the domain, including the domain object.

View log events: Use the Log Viewer to view domain, Integration Service, SAP BW Service, Web Services Hub, and Repository Service log events.

Other domain management tasks include applying licenses, managing grids and resources, and configuring security.


1.2.3 POWERCENTER REPOSITORY

The PowerCenter repository resides in a relational database. The repository database tables contain the instructions required to extract, transform, and load data, and store administrative information such as user names, passwords, permissions, and privileges. PowerCenter applications access the repository through the Repository Service.

We administer the repository using the Repository Manager client tool, the PowerCenter Administration Console, and command line programs.

Global repository: The global repository is the hub of the repository domain. Use the global repository to store common objects that multiple developers can use through shortcuts. These objects may include operational or application source definitions, reusable transformations, mapplets, and mappings.

Local repositories: A local repository is any repository within the domain that is not the global repository. Use local repositories for development. From a local repository, we can create shortcuts to objects in shared folders in the global repository. These objects include source definitions, common dimensions and lookups, and enterprise standard transformations. We can also create copies of objects in non-shared folders.

PowerCenter supports versioned repositories. A versioned repository can store multiple versions of an object. PowerCenter version control allows us to efficiently develop, test, and deploy metadata into production.


1.2.4 POWERCENTER CLIENT

The PowerCenter Client consists of the following applications that we use to manage the repository, design mappings and mapplets, and create sessions to load the data:

- Designer
- Data Stencil
- Repository Manager
- Workflow Manager
- Workflow Monitor

Designer:

Use the Designer to create mappings that contain transformation instructions for the Integration Service.

The Designer has the following tools that we use to analyze sources, design target schemas, and build source-to-target mappings:

- Source Analyzer: Import or create source definitions.
- Target Designer: Import or create target definitions.
- Transformation Developer: Develop transformations to use in mappings. We can also develop user-defined functions to use in expressions.
- Mapplet Designer: Create sets of transformations to use in mappings.
- Mapping Designer: Create mappings that the Integration Service uses to extract, transform, and load data.


Data Stencil

Use the Data Stencil to create mapping templates that can be used to generate multiple mappings. Data Stencil uses the Microsoft Office Visio interface to create mapping templates. It is not usually used by a developer.

Repository Manager

Use the Repository Manager to administer repositories. We can navigate through multiple folders and repositories and complete the following tasks:

Manage users and groups: Create, edit, and delete repository users and user groups. We can assign and revoke repository privileges and folder permissions.

Perform folder functions: Create, edit, copy, and delete folders. Work we perform in the Designer and Workflow Manager is stored in folders. If we want to share metadata, we can configure a folder to be shared.

View metadata: Analyze sources, targets, mappings, and shortcut dependencies, search by keyword, and view the properties of repository objects.

We create repository objects using the Designer and Workflow Manager client tools.

We can view the following objects in the Navigator window of the Repository Manager:

Source definitions: Definitions of database objects (tables, views, synonyms) or files that provide source data.

Target definitions: Definitions of database objects or files that contain the target data.

Mappings: A set of source and target definitions along with transformations containing the business logic that we build into the transformation. These are the instructions that the Integration Service uses to transform and move data.

Reusable transformations: Transformations that we use in multiple mappings.

Mapplets: A set of transformations that we use in multiple mappings.

Sessions and workflows: Sessions and workflows store information about how and when the Integration Service moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that we can put in a workflow. Each session corresponds to a single mapping.


Workflow Manager

Use the Workflow Manager to create, schedule, and run workflows. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data.

The Workflow Manager has the following tools to help us develop a workflow:

Task Developer: Create the tasks we want to accomplish in the workflow.

Worklet Designer: Create a worklet in the Worklet Designer. A worklet is an object that groups a set of tasks. A worklet is similar to a workflow, but without scheduling information. We can nest worklets inside a workflow.

Workflow Designer: Create a workflow by connecting tasks with links in the Workflow Designer. We can also create tasks in the Workflow Designer as we develop the workflow.

When we create a workflow in the Workflow Designer, we add tasks to the workflow. The Workflow Manager includes tasks such as the Session task, the Command task, and the Email task so we can design a workflow. The Session task is based on a mapping we build in the Designer.

We then connect tasks with links to specify the order of execution for the tasks we created. Use conditional links and workflow variables to create branches in the workflow.

Workflow Monitor

Use the Workflow Monitor to monitor scheduled and running workflows for each Integration Service.

We can view details about a workflow or task in Gantt Chart view or Task view. We can run, stop, abort, and resume workflows from the Workflow Monitor. We can view session and workflow log events in the Workflow Monitor Log Viewer.

The Workflow Monitor displays workflows that have run at least once. The Workflow Monitor continuously receives information from the Integration Service and Repository Service. It also fetches information from the repository to display historic information.

1.2.5 REPOSITORY SERVICE

We install the Repository Service when we install PowerCenter Services. After we install the PowerCenter Services, we can use the Administration Console to manage the Repository Service.

Repository Connectivity:

PowerCenter applications such as the PowerCenter Client, the Integration Service, pmrep, and infacmd connect to the repository through the Repository Service.

The following process describes how a repository client application connects to the repository database:

1. The repository client application sends a repository connection request to the master gateway node, which is the entry point to the domain (node B in the diagram).
2. The Service Manager sends back the host name and port number of the node running the Repository Service (node A in the diagram). If we have the high availability option, we can configure the Repository Service to run on a backup node.
3. The repository client application establishes a link with the Repository Service process on node A. This communication occurs over TCP/IP.
4. The Repository Service process communicates with the repository database and performs repository metadata transactions for the client application.
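Command line clients follow this same sequence. For illustration, a pmrep session might begin like this (hypothetical repository, domain, and user names; exact options vary by PowerCenter version):

    pmrep connect -r Dev_Repo -d Domain_Sales -n developer1 -x developer1_pwd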


Understanding Metadata

The repository stores metadata that describes how to extract, transform, and load source and target data. PowerCenter metadata describes several different kinds of repository objects. We use different PowerCenter Client tools to develop each kind of object.

If we enable version control, we can store multiple versions of metadata objects in the repository.

We can also extend the metadata stored in the repository by associating information with repository objects. For example, when someone in our organization creates a source definition, we may want to store the name of that person with the source definition. We associate information with repository metadata using metadata extensions.

Administering Repositories

We use the PowerCenter Administration Console, the Repository Manager, and the pmrep and infacmd command line programs to administer repositories. Typical tasks:

- Back up the repository to a binary file
- Restore the repository from a binary file
- Copy repository database tables
- Delete repository database tables
- Create a Repository Service
- Remove a Repository Service
- Create folders to organize metadata
- Add repository users and groups
- Configure repository security
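As an illustration of the command line route, a few common pmrep administration commands look roughly like this (hypothetical names; consult the pmrep reference for the exact options in your version):

    pmrep connect -r Dev_Repo -d Domain_Sales -n Administrator -x admin_pwd
    pmrep backup -o Dev_Repo_backup.rep
    pmrep createfolder -n DW_DEV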


1.2.6 INTEGRATION SERVICE

The Integration Service reads workflow information from the repository. The Integration Service connects to the repository through the Repository Service to fetch metadata from the repository.

A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. The Integration Service runs workflow tasks. A session is a type of workflow task. A session is a set of instructions that describes how to move data from sources to targets using a mapping.

The Integration Service extracts data from the mapping sources and stores the data in memory while it applies the transformation rules configured in the mapping. The Integration Service then loads the transformed data into the mapping targets.

The Integration Service can combine data from different platforms and source types. For example, we can join data from a flat file and an Oracle source. The Integration Service can also load data to different platforms and target types.

1.2.7 WEB SERVICES HUB

The Web Services Hub is a web service gateway for external clients. It processes SOAP requests from web service clients that want to access PowerCenter functionality through web services. Web service clients access the Integration Service and Repository Service through the Web Services Hub.

When we install PowerCenter Services, the PowerCenter installer installs the Web Services Hub.

The Web Services Hub hosts the following web services:

- Batch web services: Run and monitor web-enabled workflows.
- Real-time web services: Create service workflows that allow a web service client to read and write messages through the Web Services Hub.

The Web Services Hub is not normally used by an Informatica developer and is not in the scope of our training.


1.2.8 DATA ANALYZER

PowerCenter Data Analyzer provides a framework to perform business analytics on corporate data. With Data Analyzer, we can extract, filter, format, and analyze corporate information from data stored in a data warehouse, operational data store, or other data storage models. Data Analyzer uses a web browser interface to view and analyze business information at any level.

Data Analyzer extracts, filters, and presents information in easy-to-understand reports. We can use Data Analyzer to design, develop, and deploy reports and to set up dashboards and alerts that provide the latest information to users at the time and in the manner most useful to them.

Data Analyzer has a repository that stores metadata to track information about enterprise metrics, reports, and report delivery. Once an administrator installs Data Analyzer, users can connect to it from any computer that has a web browser and access to the Data Analyzer host.

This is a separate tool and is out of scope for our training.

1.2.9 METADATA MANAGER

PowerCenter Metadata Manager is a metadata management tool that we can use to browse and analyze metadata from disparate metadata repositories. Metadata Manager helps us understand and manage how information and processes are derived, the fundamental relationships between them, and how they are used.

Metadata Manager uses Data Analyzer functionality. We can use the embedded Data Analyzer features to design, develop, and deploy metadata reports and dashboards.

Metadata Manager uses PowerCenter workflows to extract metadata from source repositories and load it into a centralized metadata warehouse called the Metadata Manager Warehouse.

This is a separate tool and is out of scope for our training.


CHAPTER 2: REPOSITORY MANAGER

We can navigate through multiple folders and repositories and perform basic repository tasks with the Repository Manager. This is an administration tool used by the Informatica administrator.

Repository Manager tasks:

- Add domain connection information
- Add and connect to a repository
- Work with PowerCenter domain and repository connections
- Search for repository objects or keywords
- View object dependencies
- Compare repository objects
- Truncate session and workflow log entries
- View user connections
- Release locks
- Exchange metadata with other business intelligence tools

Add a repository to the Navigator, and then configure the domain connection information when we connect to the repository.

2.1 Adding a Repository to the Navigator

1. In any of the PowerCenter Client tools, click Repository > Add.
2. Enter the name of the repository and a valid repository user name.
3. Click OK.

Before we can connect to the repository for the first time, we must configure the connection information for the domain that the repository belongs to.


2.2 Configuring a Domain Connection

1. In a PowerCenter Client tool, select the Repositories node in the Navigator.
2. Click Repository > Configure Domains to open the Configure Domains dialog box.
3. Click the Add button. The Add Domain dialog box appears.
4. Enter the domain name, gateway host name, and gateway port number.
5. Click OK to add the domain connection.

2.3 Connecting to a Repository

1. Launch a PowerCenter Client tool.
2. Select the repository in the Navigator and click Repository > Connect, or double-click the repository.
3. Enter a valid repository user name and password.
4. Click Connect.

Click the More button to add, change, or view domain information.


2.4 Viewing Object Dependencies

Before we change or delete repository objects, we can view dependencies to see the impact on other objects. For example, before we remove a session, we can find out which workflows use the session. We can view dependencies for repository objects in the Repository Manager, Workflow Manager, and Designer tools.

Steps:
1. Connect to the repository.
2. Select the object in the Navigator.
3. Click Analyze and select the dependency we want to view.

2.5 Validating Multiple Objects

We can validate multiple objects in the repository without fetching them into the workspace. We can save, and optionally check in, objects that change from invalid to valid status as a result of the validation. We can validate sessions, mappings, mapplets, workflows, and worklets.

Steps:
1. Select the objects we want to validate.
2. Click Analyze and select Validate.
3. Select validation options from the Validate Objects dialog box.
4. Click Validate.
5. Click a link to view the objects in the results group.

2.6 Comparing Repository Objects

We can compare two repository objects of the same type to identify differences between the objects. For example, we can compare two sessions to check for differences. When we compare two objects, the Repository Manager displays their attributes.

Steps:
1. In the Repository Manager, connect to the repository.
2. In the Navigator, select the objects we want to compare.
3. Click Edit > Compare Objects.
4. Click Compare in the dialog box displayed.


2.7 Truncating Workflow and Session Log Entries

When we configure a session or workflow to archive session logs or workflow logs, the Integration Service saves those logs in local directories. The repository also creates an entry for each saved workflow log and session log. If we move or delete a session log or workflow log from the workflow log directory or session log directory, we can remove the entries from the repository.

Steps:
1. In the Repository Manager, select the workflow in the Navigator window or in the Main window.
2. Choose Edit > Truncate Log. The Truncate Workflow Log dialog box appears.
3. Choose to delete all workflow and session log entries, or to delete all workflow and session log entries with an end time before a particular date.
4. If we want to delete all entries older than a certain date, enter the date and time.
5. Click OK.

2.8 Managing User Connections and Locks

In the Repository Manager, we can view and manage the following items:

Repository object locks: The repository locks repository objects and folders by user. The repository creates different types of locks depending on the task. The Repository Service locks and unlocks all objects in the repository.

User connections: Use the Repository Manager to monitor user connections to the repository. We can end connections when necessary.

Types of locks created:
1. In-use lock: Placed on objects we want to view.
2. Write-intent lock: Placed on objects we want to modify.
3. Execute lock: Locks objects we want to run, such as workflows and sessions.

Steps:
1. Launch the Repository Manager and connect to the repository.
2. Click Edit > Show User Connections or Show Locks.
3. The locks or user connections will be displayed in a window.
4. Manage the connections or locks as needed.


2.9 Managing Users and Groups

1. In the Repository Manager, connect to a repository.
2. Click Security > Manage Users and Privileges.
3. Click the Groups tab to create groups, or
4. Click the Users tab to create users.
5. Click the Privileges tab to give permissions to groups and users.
6. Select the options available to add, edit, and remove users and groups.

There are two default repository user groups:

Administrators: This group initially contains two users that are created by default. The default users are Administrator and the database user that created the repository. We cannot delete these users from the repository or remove them from the Administrators group.

Public: The Repository Manager does not create any default users in the Public group.

2.10 Working with Folders

We can create, edit, or delete folders as per our need.

1. In the Repository Manager, connect to a repository.
2. Click Folder > Create.
3. Enter the folder information, such as the folder name and whether the folder is shared.
4. Click OK.


Chapter 3: Designer


CHAPTER 3: DESIGNER

The Designer has tools to help us build mappings and mapplets so we can specify how to move and transform data between sources and targets. The Designer helps us create source definitions, target definitions, and transformations to build the mappings.

The Designer lets us work with multiple tools at one time and work in multiple folders and repositories at the same time. It also includes windows so we can view folders, repository objects, and tasks.

Designer Tools:

- Source Analyzer: Use to import or create source definitions for flat file, XML, COBOL, Application, and relational sources.
- Target Designer: Use to import or create target definitions.
- Transformation Developer: Use to create reusable transformations.
- Mapplet Designer: Use to create mapplets.
- Mapping Designer: Use to create mappings.

Designer Windows:

- Navigator: Use to connect to and work in multiple repositories and folders.
- Workspace: Use to view or edit sources, targets, mapplets, transformations, and mappings.
- Status bar: Displays the status of the operation we perform.
- Output: Provides details when we perform certain tasks, such as saving work or validating a mapping.
- Overview: An optional window to simplify viewing workbooks containing large mappings or a large number of objects.
- Instance Data: View transformation data while we run the Debugger to debug a mapping.
- Target Data: View target data while we run the Debugger to debug a mapping.

    Overview Window


    Designer Windows

Designer Tasks:

- Add a repository.
- Print the workspace.
- View the date and time an object was last saved.
- Open and close a folder.
- Create shortcuts.
- Check repository objects out and in.
- Search for repository objects.
- Enter descriptions for repository objects.
- View older versions of objects in the workspace.
- Revert to a previously saved object version.
- Copy objects.
- Export and import repository objects.
- Work with multiple objects, ports, or columns.
- Rename ports.
- Use shortcut keys.


3.1 SOURCE ANALYZER

In the Source Analyzer, we define the source definitions that we will use in a mapping. We can either import a source definition or manually create the definition.

We can import or create the following types of source definitions in the Source Analyzer:

- Relational tables, views, and synonyms
- Fixed-width and delimited flat files that do not contain binary data
- COBOL files
- XML files
- Data models using certain data modeling tools through Metadata Exchange for Data Models

3.1.1 Working with Relational Sources

Special Character Handling:

We can import, create, or edit source definitions with table and column names containing special characters, such as the slash (/) character, through the Designer. When we use the Source Analyzer to import a source definition, the Designer retains special characters in table and field names.

However, when we add a source definition with special characters to a mapping, the Designer either retains or replaces the special character. Also, when we generate the default SQL statement in a Source Qualifier transformation for a relational source, the Designer uses quotation marks around some special characters. The Designer handles special characters differently for relational and non-relational sources.
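For illustration only (hypothetical table and column names), a generated default query for a source named ORDER/DETAIL could quote the identifier along these lines:

    -- sketch only; actual quoting depends on the database and Designer version
    SELECT "ORDER/DETAIL".ORDER_ID, "ORDER/DETAIL".QTY
    FROM "ORDER/DETAIL"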

Importing a Relational Source Definition

1. Connect to the repository.
2. Right-click the folder where we want to import the source definition and click Open. The connected folder becomes bold. We can work in only one folder at a time.
3. In the Source Analyzer, click Sources > Import from Database.
4. Select the ODBC data source used to connect to the source database. If we need to create or modify an ODBC data source, click the Browse button to open the ODBC Administrator. Create the data source and click OK. Select the new ODBC data source.


5. Enter a database user name and password to connect to the database.
6. Click Connect. Table names will appear.
7. Select the relational object or objects we want to import.
8. Click OK.
9. Click Repository > Save.

Updating a Relational Source Definition

We can update a source definition to add business names or to reflect new column names, datatypes, or other changes. We can update a source definition in the following ways:

Edit the definition: Manually edit the source definition if we need to configure properties that we cannot import or if we want to make minor changes to the source definition.

Reimport the definition: If the source changes are significant, we may need to reimport the source definition. This overwrites or renames the existing source definition. We can retain existing primary key-foreign key relationships and descriptions in the source definition being replaced.

Editing Relational Source Definitions

1. Select Tools > Source Analyzer.
2. Drag the table we want to edit into the workspace.
3. In the Source Analyzer, double-click the title bar of the source definition, or right-click the table and click Edit.
4. In the Table tab, we can rename the table and add or edit the owner name, business description, and database type.
5. Click the Columns tab. Edit column names, datatypes, and restrictions. Click OK.


3.1.2 Working with Flat Files

To use flat files as sources, targets, and lookups in a mapping, we must import or create the definitions in the repository. We can import or create flat file source definitions in the Source Analyzer.

We can import fixed-width and delimited flat file definitions that do not contain binary data. When importing the definition, the file must be in a directory local to the client machine. In addition, the Integration Service must be able to access all source files during the session.

Special Character Handling:

When we import a flat file in the Designer, the Flat File Wizard uses the file name as the name of the flat file definition by default. We can import a flat file with any valid file name through the Flat File Wizard. However, the Designer does not recognize some special characters in flat file source and target names.

When we import a flat file, the Flat File Wizard changes invalid characters and spaces into underscores ( _ ). For example, say we have the source file "sample prices+items.dat". When we import this flat file in the Designer, the Flat File Wizard names the file definition sample_prices_items by default.

To import a fixed-width flat file definition:

1. Open the Source Analyzer and click Sources > Import from File. The Open Flat File dialog box appears.
2. Browse to and select the file we want to use.
3. Select a code page.
4. Click OK.
5. Edit the settings displayed in the wizard.


6. Click Next. Follow the directions in the wizard to manipulate the column breaks in the file preview window. Move existing column breaks by dragging them. Double-click a column break to delete it.
7. Click Next and enter column information for each column in the file.
8. Click Finish.
9. Click Repository > Save.

To import a delimited flat file definition:

Delimited flat files are always character-oriented and line sequential. The column precision is always measured in characters for string columns and in significant digits for numeric columns. Each row ends with a newline character. We can import a delimited file that does not contain binary data or multibyte character data greater than two bytes per character.

Steps:

1. Repeat steps 1-5 as in the fixed-width case.
2. Click Next.
3. Enter the following settings:

- Delimiters (Required): Character used to separate columns of data. Use the Other field to enter a different delimiter.
- Treat Consecutive Delimiters as One (Optional): If selected, the Flat File Wizard reads one or more consecutive column delimiters as one.
- Escape Character (Optional): Character immediately preceding a column delimiter character embedded in an unquoted string, or immediately preceding the quote character in a quoted string.
- Remove Escape Character From Data (Optional): Clear this option to include the escape character in the output string.
- Use Default Text Length (Optional): If selected, the Flat File Wizard uses the entered default text length for all string datatypes.
- Text Qualifier (Required): Quote character that defines the boundaries of text strings. Choose No Quote, Single Quote, or Double Quotes.

4. Enter column information for each column in the file.
5. Click Finish.
6. Click Repository > Save.
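For illustration, a comma-delimited file that uses double quotes as the text qualifier and a backslash as the escape character might look like this (hypothetical data):

    EMPNO,ENAME,JOB
    7369,"SMITH, JOHN",CLERK
    7499,"ALLEN \"AL\" JR",SALESMAN

The qualifier keeps the embedded comma in "SMITH, JOHN" from being read as a column delimiter, and the escape character protects the embedded quote characters.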


Editing Flat File Definitions

1. Select Tools > Source Analyzer.
2. Drag the file we want to edit into the workspace.
3. In the Source Analyzer, double-click the title bar of the source definition. We can edit source or target flat file definitions using the following definition tabs:
   - Table tab: Edit properties such as table name, business name, and flat file properties.
   - Columns tab: Edit column information such as column names, datatypes, precision, and formats.
   - Properties tab: View the default numeric and datetime format properties in the Source Analyzer and the Target Designer. We can edit these properties for each source and target instance in a mapping in the Mapping Designer.
   - Metadata Extensions tab: Extend the metadata stored in the repository by associating information with repository objects, such as flat file definitions.
4. Click the Advanced button to edit the flat file properties. A different dialog box appears for fixed-width and delimited files.
5. Make the changes as needed.
6. Click OK.
7. Click Repository > Save.

The way to handle target flat files is the same as described in the sections above. Just make sure that instead of the Source Analyzer, we select Tools > Target Designer. The rest is the same.


3.2 TARGET DESIGNER

Before we create a mapping, we must define targets in the repository. Use the Target Designer to import and design target definitions. Target definitions include properties such as column names and data types.

Types of target definitions:

- Relational: Create a relational target for a particular database platform.
- Flat file: Create fixed-width and delimited flat file target definitions.
- XML file: Create an XML target definition to output data to an XML file.

Ways of creating target definitions:

1. Import the definition for an existing target: Import the target definition from a relational target or a flat file. The Target Designer uses a Flat File Wizard to import flat files.
2. Create a target definition based on a source definition: Drag a source definition into the Target Designer to make a target definition, and edit it to make the necessary changes.
3. Create a target definition based on a transformation or mapplet: Drag a transformation into the Target Designer to make a target definition.
4. Manually create a target definition: Create a target definition in the Target Designer.
5. Design several related targets: Create several related target definitions at the same time. We can create the overall relationship, called a schema, and the target definitions, through wizards in the Designer.

After we create a relational target table definition, we also need to create the table in the database, as shown below.

Steps:

1. In the Target Designer, select the relational target definition we want to create in the database. If we want to create multiple tables, select all relevant table definitions.
2. Click Targets > Generate/Execute SQL.
3. Click Connect and select the database where the target table should be created. Click OK to make the connection.
4. Click Generate SQL File if we want to create the SQL script, or Generate and Execute if we want to create the file and then immediately run it.
5. Click Close.
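For a target like the EMP_Tgt table used later in this guide, Generate SQL File would produce a script roughly like this (a sketch assuming the classic Oracle EMP layout; the actual script depends on the imported definition and the target database type):

    -- sketch only; generated DDL varies by database type
    CREATE TABLE EMP_Tgt (
        EMPNO    NUMBER(4) NOT NULL,
        ENAME    VARCHAR2(10),
        JOB      VARCHAR2(9),
        MGR      NUMBER(4),
        HIREDATE DATE,
        SAL      NUMBER(7,2),
        COMM     NUMBER(7,2),
        DEPTNO   NUMBER(2)
    );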


3.3 MAPPINGS

A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation. Mappings represent the data flow between sources and targets. When the Integration Service runs a session, it uses the instructions configured in the mapping to read, transform, and write data.

Mapping Components:

- Source definition: Describes the characteristics of a source table or file.
- Transformation: Modifies data before writing it to targets. Use different transformation objects to perform different functions.
- Target definition: Defines the target table or file.
- Links: Connect sources, targets, and transformations so the Integration Service can move the data as it transforms it.

The work of an Informatica developer is to build mappings as per client requirements. We drag source definitions and target definitions into the workspace and create various transformations to modify the data as per the need.

We then run the mappings by creating a session and a workflow. We also unit test the mappings.

Steps to create a mapping:

1. Open the Mapping Designer.
2. Click Mappings > Create, or drag a repository object into the workspace.
3. Enter a name for the new mapping and click OK.


3.4 TRANSFORMATIONS

A transformation is a repository object that generates, modifies, or passes data. We configure logic in a transformation that the Integration Service uses to transform data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data.

Transformations in a mapping represent the operations the Integration Service performs on the data. Data passes through transformation ports that we link in a mapping or mapplet.

Types of Transformations:

Active: An active transformation can change the number of rows that pass through it, such as a Filter transformation that removes rows that do not meet the filter condition.

Passive: A passive transformation does not change the number of rows that pass through it, such as an Expression transformation that performs a calculation on data and passes all rows through the transformation.

Connected: A connected transformation is connected to other transformations in the mapping.

Unconnected: An unconnected transformation is not connected to other transformations in the mapping. An unconnected transformation is called within another transformation, and returns a value to that transformation.

Reusable: Reusable transformations can be used in multiple mappings. These are created in the Transformation Developer tool, or by promoting a non-reusable transformation from the Mapping Designer.

- We can create most transformations as non-reusable or reusable.
- The External Procedure transformation can be created as a reusable transformation only.
- The Source Qualifier is not reusable.

Non-reusable: Non-reusable transformations exist within a single mapping. These are created in the Mapping Designer tool.

Single-Group Transformations: Transformations that have one input and one output group.

Multi-Group Transformations: Transformations that have multiple input groups, multiple output groups, or both. A group is the representation of a row of data entering or leaving a transformation. Examples: Union, Router, Joiner, HTTP.


3.4.1 Working with Ports

After we create a transformation, we need to add and configure ports using the Ports tab. Ports are equivalent to columns in Informatica.

Creating Ports: We can create a new port in the following ways:

- Drag a port from another transformation. When we drag a port from another transformation, the Designer creates a port with the same properties, and it links the two ports. Click Layout > Copy Columns to enable copying ports.
- Click the Add button on the Ports tab. The Designer creates an empty port we can configure.

3.4.2 Using Default Values for Ports

All transformations use default values that determine how the Integration Service handles input null values and output transformation errors.

- Input port: The system default value for null input ports is NULL. It displays as a blank in the transformation. If an input value is NULL, the Integration Service leaves it as NULL.
- Output port: The system default value for output transformation errors is ERROR. The default value appears in the transformation as ERROR('transformation error'). If a transformation error occurs, the Integration Service skips the row. The Integration Service notes all input rows skipped by the ERROR function in the session log file.
- Input/output port: The system default value for null input is the same as for input ports, NULL. The system default value appears as a blank in the transformation. The default value for output transformation errors is the same as for output ports.

Note: Variable ports do not support default values. The Integration Service initializes variable ports according to the datatype.

Note: The Integration Service ignores user-defined default values for unconnected transformations.


3.4.3 User-Defined Default Values

Constant value: Use any constant (numeric or text), including NULL.
Examples: 0, 9999, 'Unknown Value', NULL

Constant expression: We can include a transformation function with constant parameters.
Examples: 500 * 1.75, TO_DATE('January 1, 1998, 12:05 AM'), ERROR('Null not allowed')

ERROR: Generate a transformation error. Write the row and a message in the session log or row error log. The Integration Service writes the row to the session log or row error log based on the session configuration.
Use the ERROR function as the default value when we do not want null values to pass into a transformation. For example, we might want to skip a row when the input value of DEPT_NAME is NULL. We could use the following expression as the default value:
ERROR('Error. DEPT is NULL')

ABORT: Abort the session. The session aborts when the Integration Service encounters a null input value. The Integration Service does not increase the error count or write rows to the reject file.
Example: ABORT('DEPT is NULL')


3.5 TRACING LEVELS

When we configure a transformation, we can set the amount of detail the Integration Service writes in the session log.

We set the tracing level in the Properties tab of a transformation.

- Normal: The Integration Service logs initialization and status information, errors encountered, and rows skipped due to transformation row errors. It summarizes session results, but not at the level of individual rows.

- Terse: The Integration Service logs initialization information, error messages, and notification of rejected data.

- Verbose Initialization: In addition to normal tracing, the Integration Service logs additional initialization details, names of index and data files used, and detailed transformation statistics.

- Verbose Data: In addition to verbose initialization tracing, the Integration Service logs each row that passes into the mapping. It allows the Integration Service to write errors to both the session log and error log when row error logging is enabled, and the Integration Service writes row data for all rows in a block when it processes a transformation.

Change the tracing level to a Verbose setting only when we need to debug a transformation that is not behaving as expected.

For a slight performance boost, we can also set the tracing level to Terse.


3.6 BASIC FIRST MAPPING

First make sure that we have created a shared folder and a folder with the name of the developer, along with a user, as described in the Installation Guide.

We will transfer data from the EMP table in the source to the EMP_Tgt table in the target.

Also create an ODBC connection for the source and target databases.

Importing the Source Definition:

1. Select the Shared folder. Right-click on it and select Open.
2. The Shared folder will become bold. It means we are now connected to it.
3. Click Tools > Source Analyzer.
4. Now we will import the source table definitions into the Shared folder.
5. Click Sources > Import from Database.
6. In the box displayed, give the connection information for the source database.
7. Click Connect. Tables in the source database will be displayed.
8. Select the tables of use and click OK.
9. The table definition will be displayed. We can edit it as per need.

Note: We can edit the source definition only by dragging the table into the Source Analyzer.

Creating Target Table EMP_Tgt in the Target Database

1. Connect to the Shared folder. Tools > Target Designer.
2. Now drag the EMP table definition from the left-side pane into the Target Designer.
3. We will see the EMP table definition in the Target Designer.
4. Right-click EMP > Edit > Click Rename and give the name EMP_Tgt.
5. Apply > OK.
6. Now we will create this table in the target database.
7. Click Targets > Generate/Execute SQL.
8. Click Connect and give the login information for the target database.
9. Then select the options for table generation.
10. Click the Generate/Execute button.
11. Repository > Save.

We are doing this for our practice only. In a project, all the source tables and target tables are created by the DBA. We just import the definitions of the tables.


Now we have all the tables we need in the Shared folder. We now need to create shortcuts to these in our folder.

1. Right-click the Shared folder and select Disconnect.
2. Select the folder where we want to create the mapping.
3. Right-click the folder and click Open. The folder will become bold.
4. We will now create shortcuts to the tables we need in our work folder.
5. Click the + sign on the Shared folder, open the + sign on Sources, and select the EMP table.
6. Now click Edit > Copy.
7. Now select the folder which is bold.
8. Click Edit > Paste Shortcut.
9. Do the same for all source and target tables.
10. Also rename all the shortcuts to remove the Shortcut_to_ prefix.
11. Repository > Save.

Shortcut use:

- If we select the Paste option, a copy of the EMP table definition will be created.
- Suppose we are 10 people: 5 using shortcuts and 5 copying the definition of EMP.
- Now suppose the definition of EMP changes in the database. We will reimport the EMP definition and the old definition will be replaced.
- Developers who were using shortcuts will see that the changes have been reflected in the mapping automatically.
- Developers using a copy will have to reimport manually.
- So for maintenance and ease, we use shortcuts to source and target definitions in our folder, and shortcuts to other reusable transformations and mapplets.

Creating the Mapping:

1. Open the folder where we want to create the mapping.
2. Click Tools > Mapping Designer.
3. Click Mappings > Create and give the mapping name. Ex: m_basic_mapping
4. Drag EMP from the sources and EMP_Tgt from the targets into the mapping.
5. Link the ports from SQ_EMP to EMP_Tgt.
6. Click Mapping > Validate.
7. Repository > Save.


Creating the Session: Now we will create a session in the Workflow Manager.

1. Open the Workflow Manager and connect to the repository.
2. Open the folder with the same name in which we created the mapping.
3. Make sure the folder is bold.
4. Now open the Task Developer tool.
5. Click Tasks > Create, select the Session task, and give it a name: s_m_basic_mapping.
6. Select the correct mapping from the list displayed.
7. Click Create and Done.
8. Now right-click the session and click Edit.
9. Select the Mapping tab.
10. Go to SQ_EMP in the sources and give the correct relational connection for it.
11. Do the same for EMP_Tgt.
12. Also, for the target table, give the Load Type option as Normal and select the Truncate Target Table option.
13. Tasks > Validate.

Creating the Workflow:

1. Now click Tools > Workflow Designer.
2. Workflows > Create and give a name like wf_basic_mapping.
3. Click OK.
4. The START task will be displayed. It is the starting point for the Informatica server.
5. Drag the session to the workflow.
6. Click Tasks > Link Task. Connect START to the session.
7. Click Workflows > Validate.
8. Repository > Save.

Now open the Workflow Monitor first.

1. Go back to the Workflow Manager. Select the workflow and right-click on the workflow wf_basic_mapping.
2. Select Start Workflow. We can view the status in the Workflow Monitor.

Check the data in the target table.

Command: SELECT * FROM EMP_Tgt;


    3.7 EXPRESSION TRANSFORMATION

    Passive and connected transformation.Use the Expression transformation to calculate values in a single row before we writeto the target. For example, we might need to adjust employee salaries, concatenatefirst and last names, or convert strings to numbers.

Use the Expression transformation to perform any non-aggregate calculations. Examples: addition, subtraction, multiplication, division, CONCAT, uppercase conversion, lowercase conversion, etc.

We can also use the Expression transformation to test conditional statements before we output the results to target tables or other transformations, using functions such as IIF and DECODE.

There are 3 types of ports in an Expression transformation:
- Input
- Output
- Variable: used to store any temporary calculation.

    Calculating Values

To use the Expression transformation to calculate values for a single row, we must include the following ports:

Input or input/output ports for each value used in the calculation. For example, to calculate Total Salary, we need SAL and COMM.

Output port for the expression: we enter one expression for each output port. The return value of the output port needs to match the return value of the expression.

We can enter multiple expressions in a single Expression transformation. We can create any number of output ports in the transformation.

    Example: Calculating Total Salary of an Employee

Import the source table EMP in the Shared folder. If it is already there, then don't import it again.

In the Shared folder, create the target table Emp_Total_SAL. Keep all ports as in the EMP table except SAL and COMM; add a Total_SAL port to store the calculation.

Create the necessary shortcuts in the folder.


    Creating Mapping:

1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give the mapping a name. Ex: m_totalsal
4. Drag EMP from the sources into the mapping.
5. Click Transformation -> Create -> Select Expression from the list. Give it a name and click Create, then Done.
6. Link the ports from SQ_EMP to the Expression transformation.
7. Edit the Expression transformation. As we do not want SAL and COMM in the target, uncheck the output port for both columns.
8. Create a new port out_Total_SAL. Make it an output port only.
9. Click the small button in the Expression section of the dialog box to open the Expression Editor.
10. Enter the expression SAL + COMM. You can select SAL and COMM from the Ports tab in the Expression Editor.
11. Check the expression syntax by clicking Validate.
12. Click OK -> Apply -> OK.
13. Connect the ports from the Expression transformation to the target table.
14. Click Mapping -> Validate.
15. Click Repository -> Save.

Create the session and workflow as described earlier. Run the workflow and see the data in the target table.


Because COMM is NULL for many rows, Total_SAL will be NULL for those rows. Now open the mapping and edit the Expression transformation. Select the COMM port and give 0 as its Default Value. Apply the changes, validate the mapping, and save.

Refresh the session, validate the workflow again, and run it to see the result.

Now use ERROR in the default value of COMM to skip rows where COMM is NULL. Syntax: ERROR('Any message here')

Similarly, we can use the ABORT function to abort the session if COMM is NULL. Syntax: ABORT('Any message here')
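The same NULL handling can also be written directly in the output-port expression instead of a port default value. A minimal sketch in the PowerCenter expression language, reusing the ports from this example (out_Total_SAL is the output port created above):

    -- Treat a NULL commission as 0 before adding
    IIF(ISNULL(COMM), SAL, SAL + COMM)

    -- Or reject the row when COMM is NULL, like the ERROR default value
    IIF(ISNULL(COMM), ERROR('COMM is null - row skipped'), SAL + COMM)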

Make sure to double-click the session after making any changes in the mapping. It will prompt that the mapping has changed; click OK to refresh it. Then validate and save the workflow, and run it.


3.8 FILTER TRANSFORMATION

Create the session and workflow as described earlier. Run the workflow and see the data in the target table.

How to filter out rows with null values?
To filter out rows containing null values or spaces, use the ISNULL and IS_SPACES functions to test the value of the port. For example, if we want to filter out rows that contain NULLs in the FIRST_NAME port, use the following condition:

    IIF(ISNULL(FIRST_NAME),FALSE,TRUE)

This condition states that if the FIRST_NAME port is NULL, the return value is FALSE and the row should be discarded. Otherwise, the row passes through to the next transformation.
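To drop rows where FIRST_NAME is NULL or contains only spaces, the two checks can be combined in a single filter condition; a sketch using the same port:

    IIF(ISNULL(FIRST_NAME) OR IS_SPACES(FIRST_NAME), FALSE, TRUE)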


    3.9 ROUTER TRANSFORMATION

Active and connected transformation.
A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. A Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. A Router transformation, however, tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group.

Example: If we want to keep employees of France, India and the US in 3 different tables, we can use 3 Filter transformations or 1 Router transformation.

Mapping A uses three Filter transformations while Mapping B produces the same result with one Router transformation.

A Router transformation consists of input and output groups, input and output ports, group filter conditions, and properties that we configure in the Designer.


Working with Groups
A Router transformation has the following types of groups:

Input: the group that receives the input ports.
Output: user-defined groups and the default group. We cannot modify or delete output ports or their properties.

User-Defined Groups: We create a user-defined group to test a condition based on incoming data. A user-defined group consists of output ports and a group filter condition. We can create and edit user-defined groups on the Groups tab in the Designer. Create one user-defined group for each condition that we want to specify.

The Default Group: The Designer creates the default group after we create one new user-defined group. The Designer does not allow us to edit or delete the default group. This group does not have a group filter condition associated with it. If all of the conditions evaluate to FALSE, the Integration Service passes the row to the default group.

Example: Filtering employees of Department 10 to EMP_10, Department 20 to EMP_20 and the rest to EMP_REST

Source is the EMP table.
Create 3 target tables EMP_10, EMP_20 and EMP_REST in the Shared folder. Structure should be the same as the EMP table.
Create the shortcuts in your folder.

    Creating Mapping:

1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give the mapping a name. Ex: m_router_example
4. Drag EMP from the sources into the mapping.
5. Click Transformation -> Create -> Select Router from the list. Give it a name and click Create, then Done.
6. Pass the ports from SQ_EMP to the Router transformation.
7. Edit the Router transformation and go to the Groups tab.
8. Click the Add button to create a user-defined group. The default group is created automatically.
9. Click the Group Filter Condition field to open the Expression Editor.
10. Enter a group filter condition. Ex: DEPTNO=10
11. Click Validate to check the syntax of the condition you entered.
12. Create another group for EMP_20. Condition: DEPTNO=20
13. The records not matching the above two conditions will be passed to the DEFAULT group. See the sample mapping.
14. Click OK -> Apply -> OK.
15. Connect the ports from the Router groups to the target tables.
16. Click Mapping -> Validate.
17. Click Repository -> Save.

Create the session and workflow as described earlier. Run the workflow and see the data in the target tables. Make sure to give connection information for all 3 target tables.

Sample Mapping:

    Difference between Router and Filter

1> We cannot pass rejected data forward in a Filter, but we can in a Router: rejected rows go to the Router's default group.
2> A Filter has no default group.
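As a rough SQL analogy (a sketch only; the Integration Service does not generate this SQL), the Router example above delivers the same rows as three filtered inserts, with the default group catching everything else, including NULL department numbers:

    INSERT INTO EMP_10   SELECT * FROM EMP WHERE DEPTNO = 10;
    INSERT INTO EMP_20   SELECT * FROM EMP WHERE DEPTNO = 20;
    -- Default group: rows matching no user-defined condition
    INSERT INTO EMP_REST SELECT * FROM EMP
      WHERE DEPTNO NOT IN (10, 20) OR DEPTNO IS NULL;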


    3.10 UNION TRANSFORMATION

Active and connected transformation.
The Union transformation is a multiple input group transformation that you can use to merge data from multiple pipelines or pipeline branches into one pipeline branch. It merges data from multiple sources, similar to using the UNION ALL SQL statement to combine the results of two or more SQL statements.

    Union Transformation Rules and Guidelines

- We can create multiple input groups, but only one output group.
- We can connect heterogeneous sources to a Union transformation.
- All input groups and the output group must have matching ports. The precision, datatype, and scale must be identical across all groups.
- The Union transformation does not remove duplicate rows. To remove duplicate rows, we must add another transformation such as a Router or Filter transformation.
- We cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.
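In SQL terms, the merge the Union transformation performs corresponds to UNION ALL, which keeps duplicates; a sketch using the tables from the example below in this section:

    SELECT EMPNO, ENAME, DEPTNO FROM EMP_10
    UNION ALL
    SELECT EMPNO, ENAME, DEPTNO FROM EMP_20
    UNION ALL
    SELECT EMPNO, ENAME, DEPTNO FROM EMP_REST;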

    Union Transformation Components

    When we configure a Union transformation, define the following components:

Transformation tab: we can rename the transformation and add a description.
Properties tab: we can specify the tracing level.
Groups tab: we can create and delete input groups. The Designer displays groups we create on the Ports tab.
Group Ports tab: we can create and delete ports for the input groups. The Designer displays ports we create on the Ports tab.

We cannot modify the Ports, Initialization Properties, Metadata Extensions, or Port Attribute Definitions tabs in a Union transformation.

Create input groups on the Groups tab, and create ports on the Group Ports tab. We can create one or more input groups on the Groups tab. The Designer creates one output group by default. We cannot edit or delete the default output group.

    Example: to combine data of tables EMP_10, EMP_20 and EMP_REST

Import tables EMP_10, EMP_20 and EMP_REST as sources in the Shared folder.
Create a target table EMP_UNION_EXAMPLE in the Target Designer. Structure should be the same as the EMP table.
Create the shortcuts in your folder.


    Creating Mapping:

1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give the mapping a name. Ex: m_union_example
4. Drag EMP_10, EMP_20 and EMP_REST from the sources into the mapping.
5. Click Transformation -> Create -> Select Union from the list. Give it a name and click Create, then Done.
6. Pass the ports from SQ_EMP_10 to the Union transformation.
7. Edit the Union transformation and go to the Groups tab.
8. One group will already be there, created when we dragged the ports from SQ_EMP_10 to the Union transformation.
9. As we have 3 source tables, we need 3 input groups. Click the Add button to add 2 more groups. See the sample mapping.
10. We can also modify ports on the Ports tab.
11. Click Apply -> OK.
12. Drag the target table now.
13. Connect the output ports from the Union to the target table.
14. Click Mapping -> Validate.
15. Click Repository -> Save.

Create the session and workflow as described earlier. Run the workflow and see the data in the target table. Make sure to give connection information for all 3 source tables.

Sample mapping picture


    3.11 SORTER TRANSFORMATION

Connected and active transformation.
The Sorter transformation allows us to sort data. We can sort data in ascending or descending order according to a specified sort key. We can also configure the Sorter transformation for case-sensitive sorting, and specify whether the output rows should be distinct.

When we create a Sorter transformation in a mapping, we specify one or more ports as the sort key and configure each sort key port to sort in ascending or descending order. We also configure the sort criteria the PowerCenter Server applies to all sort key ports, and the system resources it allocates to perform the sort operation.

The Sorter transformation contains only input/output ports. All data passing through the Sorter transformation is sorted according to a sort key. The sort key is one or more ports that we want to use as the sort criteria.

    Sorter Transformation Properties

1. Sorter Cache Size:
The PowerCenter Server uses the Sorter Cache Size property to determine the maximum amount of memory it can allocate to perform the sort operation. The PowerCenter Server passes all incoming data into the Sorter transformation before it performs the sort operation.

- We can specify any amount between 1 MB and 4 GB for the Sorter cache size.
- If it cannot allocate enough memory, the PowerCenter Server fails the session.
- For best performance, configure the Sorter cache size with a value less than or equal to the amount of available physical RAM on the PowerCenter Server machine.
- Informatica recommends allocating at least 8 MB (8,388,608 bytes) of physical memory to sort data using the Sorter transformation.

2. Case Sensitive:
The Case Sensitive property determines whether the PowerCenter Server considers case when sorting data. When we enable the Case Sensitive property, the PowerCenter Server sorts uppercase characters higher than lowercase characters.

3. Work Directory:
The directory the PowerCenter Server uses to create temporary files while it sorts data.

4. Distinct:
Check this option if we want to remove duplicates. The Sorter will sort data according to all ports when this option is selected.
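In SQL terms, a Sorter with ENAME as the sort key and Distinct checked behaves roughly like the following (a sketch; the real sort happens inside the PowerCenter Server, not in the database):

    SELECT DISTINCT * FROM EMP ORDER BY ENAME;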


    Example: Sorting data of EMP by ENAME

Source is the EMP table.
Create a target table EMP_SORTER_EXAMPLE in the Target Designer. Structure should be the same as the EMP table.
Create the shortcuts in your folder.

Creating Mapping:

1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give the mapping a name. Ex: m_sorter_example
4. Drag EMP from the sources into the mapping.
5. Click Transformation -> Create -> Select Sorter from the list. Give it a name and click Create, then Done.
6. Pass the ports from SQ_EMP to the Sorter transformation.
7. Edit the Sorter transformation and go to the Ports tab.
8. Select ENAME as the sort key: check the Key box in front of ENAME.
9. Click the Properties tab and set properties as needed.
10. Click Apply -> OK.
11. Drag the target table now.
12. Connect the output ports from the Sorter to the target table.
13. Click Mapping -> Validate.
14. Click Repository -> Save.

Create the session and workflow as described earlier. Run the workflow and see the data in the target table. Make sure to give connection information for all tables.

Sample Sorter Mapping


    3.12 RANK TRANSFORMATION

Active and connected transformation.
The Rank transformation allows us to select only the top or bottom rank of data. It allows us to select a group of top or bottom values, not just one value.

During the session, the PowerCenter Server caches input data until it can perform the rank calculations.

    Rank Transformation Properties

- Cache Directory: where the cache will be created.
- Top/Bottom: rank as per need.
- Number of Ranks: Ex: 1, 2 or any number.
- Case-Sensitive Comparison: can be checked if needed.
- Rank Data Cache Size: can be set.
- Rank Index Cache Size: can be set.

    Ports in a Rank Transformation

Port | Number Required | Description
I    | Minimum 1       | Port to receive data from another transformation.
O    | Minimum 1       | Port we want to pass to another transformation.
V    | Not needed      | Can be used to store values or calculations for use in an expression.
R    | Only 1          | Rank port. Rank is calculated according to it. The Rank port is an input/output port. We must link the Rank port to another transformation. Example: Total Salary

    Rank Index

The Designer automatically creates a RANKINDEX port for each Rank transformation. The PowerCenter Server uses the Rank Index port to store the ranking position for each row in a group.

For example, if we create a Rank transformation that ranks the top five salaried employees, the rank index numbers the employees from 1 to 5.

- The RANKINDEX is an output port only.
- We can pass the rank index to another transformation in the mapping or directly to a target.
- We cannot delete or edit it.


    Defining Groups

The Rank transformation allows us to group information. For example, if we want to select the top 3 salaried employees of each department, we can define a group for department.

- By defining groups, we create one set of ranked rows for each group.
- We define a group on the Ports tab: check Group By for the needed port.
- We cannot Group By on the port which is also the Rank port.
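For comparison, ranking the top 3 salaried employees of each department corresponds roughly to this analytic SQL (a sketch in Oracle-style SQL, not how the PowerCenter Server computes it):

    SELECT *
    FROM (SELECT e.*,
                 RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) AS RANKINDEX
          FROM EMP e)
    WHERE RANKINDEX <= 3;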

    1> Example: Finding Top 5 Salaried Employees

EMP will be the source table.
Create a target table EMP_RANK_EXAMPLE in the Target Designer. Structure should be the same as the EMP table; just add one more port, Rank_Index, to store the RANKINDEX.
Create the shortcuts in your folder.

Creating Mapping:
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give the mapping a name. Ex: m_rank_example
4. Drag EMP from the sources into the mapping.
5. Create an Expression transformation to calculate TOTAL_SAL.
6. Click Transformation -> Create -> Select Rank from the list. Give it a name and click Create, then Done.
7. Pass the ports from the Expression to the Rank transformation.
8. Edit the Rank transformation and go to the Ports tab.
9. Select TOTAL_SAL as the rank port: check R in front of TOTAL_SAL.
10. Click the Properties tab and select properties as needed: Top in Top/Bottom and Number of Ranks as 5.
11. Click Apply -> OK.
12. Drag the target table now.
13. Connect the output ports from the Rank to the target table.
14. Click Mapping -> Validate.
15. Click Repository -> Save.

Create the session and workflow as described earlier. Run the workflow and see the data in the target table. Make sure to give connection information for all tables.

2> Example: Finding Top 2 Salaried Employees for every DEPARTMENT

- Open the mapping made above.
- Edit the Rank transformation, go to the Ports tab, and select Group By for DEPTNO.
- Go to the Properties tab and set Number of Ranks to 2.
- Click Apply -> OK.
- Mapping -> Validate and Repository -> Save.

Refresh the session by double-clicking it, save the changes, and run the workflow to see the new result.


    Sample Rank Mapping

RANK CACHE
When the PowerCenter Server runs a session with a Rank transformation, it compares an input row with rows in the data cache. If the input row out-ranks a stored row, the PowerCenter Server replaces the stored row with the input row.

Example: PowerCenter caches the first 5 rows if we are finding the top 5 salaried employees. When the 6th row is read, it is compared with the 5 rows in the cache and placed in the cache if needed.

1> RANK INDEX CACHE:
The index cache holds group information from the group by ports. If we are using Group By on DEPTNO, then this cache stores the values 10, 20, 30, etc. All Group By columns are in the RANK INDEX CACHE. Ex: DEPTNO.

2> RANK DATA CACHE:
It holds row data until the PowerCenter Server completes the ranking and is generally larger than the index cache. To reduce the data cache size, connect only the necessary input/output ports to subsequent transformations. All variable ports (if any), the Rank port, and all ports going out from the Rank transformation are stored in the RANK DATA CACHE. Example: all ports except DEPTNO in our mapping example.


    3.13 AGGREGATOR TRANSFORMATION

Connected and active transformation.
The Aggregator transformation allows us to perform aggregate calculations, such as averages and sums, and to perform calculations on groups.

Components of the Aggregator Transformation

1> Aggregate expression
2> Group by port
3> Sorted Input
4> Aggregate cache

    1> Aggregate Expressions

- Entered in an output port.
- Can include non-aggregate expressions and conditional clauses.

The transformation language includes the following aggregate functions:
- AVG, COUNT, MAX, MIN, SUM
- FIRST, LAST
- MEDIAN, PERCENTILE, STDDEV, VARIANCE

Single-level aggregate function: MAX(SAL)
Nested aggregate function: MAX( COUNT( ITEM ))

    Nested Aggregate Functions

In an Aggregator transformation, there can be multiple single-level functions or multiple nested functions, but an Aggregator transformation cannot have both types together. An aggregate expression can include one aggregate function nested within another aggregate function, but no deeper nesting: MAX( COUNT( ITEM )) is correct; MIN( MAX( COUNT( ITEM ))) is not.

    Conditional Clauses

We can use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.

SUM( COMMISSION, COMMISSION > QUOTA )

Non-Aggregate Functions
We can also use non-aggregate functions in the aggregate expression:

IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0 )
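For readers who think in SQL, the conditional clause above filters rows before aggregating, roughly like a CASE expression inside SUM (a sketch; SALES is a hypothetical table holding the COMMISSION and QUOTA columns):

    SELECT SUM(CASE WHEN COMMISSION > QUOTA THEN COMMISSION END) AS TOTAL_COMMISSION
    FROM SALES;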


    1> Example: To calculate MAX, MIN, AVG and SUM of salary of EMP table.

EMP will be the source table.
Create a target table EMP_AGG_EXAMPLE in the Target Designer. The table should contain DEPTNO, MAX_SAL, MIN_SAL, AVG_SAL and SUM_SAL.
Create the shortcuts in your folder.

Creating Mapping:

1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give the mapping a name. Ex: m_agg_example
4. Drag EMP from the sources into the mapping.
5. Click Transformation -> Create -> Select Aggregator from the list. Give it a name and click Create, then Done.
6. Pass only SAL and DEPTNO from SQ_EMP to the Aggregator transformation.
7. Edit the Aggregator transformation and go to the Ports tab.
8. Create 4 output ports: OUT_MAX_SAL, OUT_MIN_SAL, OUT_AVG_SAL, OUT_SUM_SAL.
9. Open the Expression Editor one by one for all output ports and give the calculations: MAX(SAL), MIN(SAL), AVG(SAL), SUM(SAL).
10. Click Apply -> OK.
11. Drag the target table now.
12. Connect the output ports from the Aggregator to the target table.
13. Click Mapping -> Validate.
14. Click Repository -> Save.

Create the session and workflow as described earlier. Run the workflow and see the data in the target table. Make sure to give connection information for all tables.


2> Example: To calculate MAX, MIN, AVG and SUM of salary of the EMP table for every DEPARTMENT

Open the mapping made above. Edit the Aggregator transformation, go to the Ports tab, and select Group By for DEPTNO. Click Apply -> OK, then Mapping -> Validate and Repository -> Save. Refresh the session by double-clicking it, save the changes, and run the workflow to see the new result.

Scene 1: What will be the output of the picture below?

Here we are not doing any calculation or Group By. In this case, the DEPTNO and SAL of the last record of the EMP table will be passed to the target.

Scene 2: What will be the output of the above picture if Group By is done on DEPTNO?

Here we are not doing any calculation, but Group By is set on DEPTNO. In this case, the last record of every DEPTNO from the EMP table will be passed to the target.

Scene 3: What will be the output of Example 1?

In Example 1, we calculate MAX, MIN, AVG and SUM but do not use Group By. In this case, the DEPTNO of the last record of the EMP table is passed; the calculations, however, are correct (computed over all rows).

Scene 4: What will be the output of Example 2?

In Example 2, we calculate MAX, MIN, AVG and SUM for every department. In this case, each DEPTNO and the correct calculations for that DEPTNO are passed to the target.

Scene 5: Use Sorted Input on the Properties tab and check the output.
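As a sanity check, the Scene 4 output matches what this SQL would return (a sketch):

    SELECT DEPTNO, MAX(SAL) AS MAX_SAL, MIN(SAL) AS MIN_SAL,
           AVG(SAL) AS AVG_SAL, SUM(SAL) AS SUM_SAL
    FROM EMP
    GROUP BY DEPTNO;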


    3.14 JOINER TRANSFORMATION

Connected and active transformation.
- Used to join source data from two related heterogeneous sources residing in different locations or file systems. We can also join data from the same source.
- If we need to join 3 tables, then we need 2 Joiner transformations.
- The Joiner transformation joins two sources with at least one matching port. It uses a condition that matches one or more pairs of ports between the two sources.

    Example: To join EMP and DEPT tables.

EMP and DEPT will be the source tables.
Create a target table JOINER_EXAMPLE in the Target Designer. The table should contain all ports of the EMP table plus DNAME and LOC, as shown below.
Create the shortcuts in your folder.

    Creating Mapping:

1> Open the folder where we want to create the mapping.
2> Click Tools -> Mapping Designer.
3> Click Mapping -> Create -> Give the mapping a name. Ex: m_joiner_example
4> Drag EMP, DEPT and the target. Create a Joiner transformation and link as shown below.
5> Specify the join condition in the Condition tab. See steps on the next page.
6> Set the master table in the Ports tab. See steps on the next page.
7> Mapping -> Validate
8> Repository -> Save

Create the session and workflow as described earlier. Run the workflow and see the data in the target table. Make sure to give connection information for all tables.


    JOIN CONDITION:

The join condition contains ports from both input sources that must match for the PowerCenter Server to join two rows. Example: DEPTNO=DEPTNO1 above.

1. Edit Joiner Transformation -> Condition tab.
2. Add the condition.

We can add as many conditions as needed, but only the = operator is allowed.

If we join Char and Varchar datatypes, the PowerCenter Server counts any spaces that pad Char values as part of the string. So if you try to join the following:
Char(40) = abcd and Varchar(40) = abcd
then the Char value is abcd padded with 36 blank spaces, and the PowerCenter Server does not join the two fields because the Char field contains trailing spaces.
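A common workaround (our suggestion, not stated in this guide) is to trim the padded Char value in an upstream Expression transformation so the keys compare equal; a sketch, where out_JOIN_KEY is a hypothetical output port feeding the Joiner:

    -- out_JOIN_KEY: strip trailing spaces from the Char column before the join
    RTRIM(CHAR_COLUMN)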

    Note: The Joiner transformation does not match null values.

    MASTER and DETAIL TABLES

- In a Joiner, one table is called the MASTER and the other the DETAIL.
- The MASTER table is always cached. We can make any table the MASTER.
- Edit Joiner Transformation -> Ports tab -> select M for the master table.

The table with fewer rows should be made the MASTER to improve performance.
Reason:
- When the PowerCenter Server processes a Joiner transformation, it reads rows from both sources concurrently and builds the index and data cache based on the master rows. So the table with fewer rows is read quickly, and the cache can be built while the table with more rows is still being read.
- The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.

    JOINER TRANSFORMATION PROPERTIES TAB

- Case-Sensitive String Comparison: if selected, the PowerCenter Server uses case-sensitive string comparisons when performing joins on string columns.
- Cache Directory: specifies the directory used to cache master or detail rows and the index to these rows.
- Join Type: specifies the type of join: Normal, Master Outer, Detail Outer, or Full Outer.
- Tracing Level
- Joiner Data Cache Size
- Joiner Index Cache Size
- Sorted Input


    JOIN TYPES

In SQL, a join is a relational operator that combines data from multiple tables into a single result set. The Joiner transformation acts in much the same manner, except that tables can originate from different databases or flat files.

Types of joins:
- Normal
- Master Outer
- Detail Outer
- Full Outer

Note: A normal or master outer join performs faster than a full outer or detail outer join.

Example: In EMP, we have employees with DEPTNO 10, 20, 30 and 50. In DEPT, we have DEPTNO 10, 20, 30 and 40. DEPT will be the MASTER table, as it has fewer rows.

    Normal Join:

With a normal join, the PowerCenter Server discards all rows of data from the master and detail source that do not match, based on the condition.
- All employees of DEPTNO 10, 20 and 30 will be there, as only they match.

Master Outer Join:

This join keeps all rows of data from the detail source and the matching rows from the master source. It discards the unmatched rows from the master source.
- All data of employees of DEPTNO 10, 20 and 30 will be there.
- There will be employees of DEPTNO 50, and the corresponding DNAME and LOC columns will be NULL.

    Detail Outer Join:

This join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.
- All employees of DEPTNO 10, 20 and 30 will be there.
- There will be one record for DEPTNO 40, and the corresponding EMP columns will be NULL.

    Full Outer Join:

    A full outer join keeps all rows of data from both the master and detail sources.

- All data of employees of DEPTNO 10, 20 and 30 will be there.
- There will be employees of DEPTNO 50, and the corresponding DNAME and LOC columns will be NULL.
- There will be one record for DEPTNO 40, and the corresponding EMP columns will be NULL.
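For reference, with DEPT as the master and EMP as the detail, the four join types correspond roughly to this SQL (a sketch):

    -- Normal join
    SELECT * FROM EMP e INNER JOIN DEPT d ON e.DEPTNO = d.DEPTNO;

    -- Master outer: all detail (EMP) rows, plus matching master (DEPT) rows
    SELECT * FROM EMP e LEFT OUTER JOIN DEPT d ON e.DEPTNO = d.DEPTNO;

    -- Detail outer: all master (DEPT) rows, plus matching detail (EMP) rows
    SELECT * FROM EMP e RIGHT OUTER JOIN DEPT d ON e.DEPTNO = d.DEPTNO;

    -- Full outer: all rows from both sources
    SELECT * FROM EMP e FULL OUTER JOIN DEPT d ON e.DEPTNO = d.DEPTNO;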


    USING SORTED INPUT

- Used to improve session performance.
- To use sorted input, we must pass data to the Joiner transformation sorted by the ports used in the join condition.
- We check the Sorted Input option on the Properties tab of the transformation.
- If the option is checked but we are not passing sorted data to the transformation, the session fails.
- We can use a Sorter transformation to sort the data, or the Source Qualifier in the case of relational tables.

    JOINER CACHES

The Joiner always caches the MASTER table; we cannot disable caching. It builds the index cache and data cache based on the master table.

1> Joiner Index Cache:
All columns of the MASTER table used in the join condition are in the JOINER INDEX CACHE. Example: DEPTNO in our mapping.

2> Joiner Data Cache:
Master columns not in the join condition and used for output to another transformation or target table are in the Data Cache. Example: DNAME and LOC in our mapping example.

    JOINER TRANSFORMATION TIPS

- Perform joins in a database when possible.
- Join sorted data when possible.
- For a sorted Joiner transformation, designate as the master source the source with fewer duplicate key values.

A Joiner cannot be used in the following conditions:
1. Either input pipeline contains an Update Strategy transformation.
2. We connect a Sequence Generator transformation directly before the Joiner transformation.


3.15 SOURCE QUALIFIER

- Source will be the EMP and DEPT tables.
- Create the target table as shown in the picture above.
- Create shortcuts in your folder as needed.

Creating Mapping:
1> Open the folder where we want to create the mapping.
2> Click Tools -> Mapping Designer.
3> Click Mapping -> Create -> Give the mapping a name. Ex: m_SQ_example
4> Drag EMP, DEPT and the target into the mapping.
5> Right-click SQ_EMP and select Delete from the mapping.
6> Right-click SQ_DEPT and select Delete from the mapping.
7> Click Transformation -> Create -> Select Source Qualifier from the list -> Give a name -> Click Create.
8> Select both EMP and DEPT. Click OK.
9> Link all as shown in the picture above.
10> Edit SQ -> Properties tab -> Open User Defined Join ->