This page has moved to a new address.

LBi Software

LBi Software: November 2010

Monday, November 15, 2010

The Benefits of Extract, Transform and Load (ETL)

The value of enterprise business intelligence is greatly enhanced when information from various sources is combined in a meaningful way.

What is ETL?
ETL, or Extract, Transform and Load, eases the combination of heterogeneous sources into a unified central repository. Usually this repository is a data warehouse or mart which will support enterprise business intelligence.

Extract – read data from multiple source systems into a single format. This process extracts the data from each native system and saves it to one target location. That source data may be any number of database formats, flat files, or document repositories. Usually, the goal is to extract the entire unmodified source system data, though certain checks and filters may be performed here to ensure the data meets an expected layout or to selectively remove data (e.g. potentially confidential information).

Transform – in this step, the data from the various systems is made consistent and linked. Some of the key operations here are:
  • Standardization – data is mapped to a consistent set of lookup values (e.g. US, USA, United States and blank/null – all mapped to the standard ISO country code)
  • Cleansing – perform validity checks and either remove or modify problem data
  • Surrogate keys – new key values applied to similar data from different source systems prevent key collisions in the future and provide a cross reference across these systems
  • Transposing – organizes data to optimize reporting. Many source systems are optimized for transactional performance but the warehouse will be primarily used for reporting. Often this involves denormalizing and re-organizing into a dimensional model.
Load – the transformed data is now written out to a warehouse/mart. The load process will usually preserve prior data. In some instances existing warehouse data is never removed, just marked as inactive. This provides full auditing and supports historical reporting.


ETL Tools
There are a number of commercial and open source ETL tools available to assist in any ETL process. Some of the prominent ones are:
  • Business Objects Data Integrator
  • Informatica PowerCenter
  • IBM InfoSphere DataStage
  • Oracle Warehouse Builder / Data Integrator
  • Microsoft SQL Server Integration Services
  • Pentaho Data Integration (Open Source)
  • Jasper ETL (Open Source)
These tools provide a number of functions to facilitate the ETL workflow. The variety of source data types are handled automatically. A transformation engine makes it easy to create reusable scripts to handle the data mapping. Scheduling and error handling are also built in.

It is particularly advantageous to use an ETL tool in the following situations:
  • When there are many source systems to be integrated
  • When source systems are in different formats
  • When this process needs to be run repeatedly (e.g. daily, hourly, real time)
  • To take advantage of pre-built warehouses/marts. Many of these exist for popular platforms such as PeopleSoft, SAP, JD Edwards.
There are also times where the overhead and cost of setting up an ETL tool might not make sense. In these situations some combination of stored procedures, custom coding and off the shelf packages may make more sense. Scenarios of this type include:
  • One time conversion of data
  • A limited number of source systems that share key identifiers

Sample Workflow

As illustrated here, a typical ETL workflow will move the data through a few distinct phases. This allows each phase to be better defined and eases troubleshooting.

Source > Extract > Stage – this phase extracts all the appropriate data from each source system. The extract copies only data that has changed in the source system since its last run. The stage library contains all source information in a similar structure to how it appears in the source systems. All extracted information will remain in stage until it is successfully processed by the transform.

Stage > Transform > Warehouse – the data from stage is transformed into a warehouse. In this example this step includes some of the base transformations as well as the load of data into a single warehouse. In this phase, surrogate keys are added where needed, lookup value mappings are applied and related information from multiple source systems is combined into a single structure. Any errors encountered here are reported and the problem data remains in stage until corrected. No information is removed from the warehouse and all data there is tagged with effective, update and end timestamps.

Warehouse > Load > Mart – the current effective date from the warehouse is loaded to the mart to support analysis. While this is the final load of the process, this step also includes a transform of the data to an optimized dimensional form for reporting and analysis.

Business intelligence in the enterprise is greatly enhanced by unified data. ETL can be an important tool when combining heterogeneous sources into one cohesive central repository.

Wednesday, November 3, 2010

Easing the Pain of the Year End Project

The Year End process can be a complicated, highly visible one. LBi has created guidelines to smoothly manage this daunting process.

In a company’s Human Resources and Payroll area, there are a number of particularly time consuming processes that occur at the end of each year. These Year End processes are critical and can include:
  • Imputed Income calculations
  • Deferred Income processing
  • Applying tax updates
  • End of year bonuses
  • Merit increases
  • First payroll of the new year
  • W2s and 1099s
  • State Filings

Typically a team is created to manage the process. LBi has managed this process for several clients and has come up with a project methodology that helps the process run smoothly.

The first step in this methodology is the creation of a checklist. The Year End (YE) process begins by establishing task schedules and priorities. This enables the team to understand both the scope and the scheduling of events and is an essential first step in YE planning and project management. The checklist is used throughout the YE process to ensure that no critical steps are overlooked. After verification by the project stakeholders, the completed checklist is then used to build the Year End Calendar.

The YE Calendar of events expands the tasks in the checklist and incorporates dates and responsibilities. It is then used as a template for complete documentation.

The next step is to perform the required legal and regulatory changes as published by PeopleSoft or other HCM system, or requested by the company, to keep everything in compliance. These items are then added to the Checklist and Calendar.


The final step is to create a comprehensive inventory of all processes, customizations and ad-hoc reports. We identify all special queries required for “data cleanup” and balance adjustments. It is critical to include these ad-hoc reports and processes and productionalize them, as they are typically needed each year. We then assemble all the documentation and testing plans and results for SOX compliance and audit readiness.

There are four key steps in LBi’s Year End methodology:
1. Year End Checklist
2. Year End Calendar
3. Regulatory Updates
4. Inventory

By following these steps, the seemingly daunting Year End process becomes a manageable project.