WHITE PAPER Informatica PowerCenter 8 on HP Integrity Servers: Doubling Performance with Linear Scalability for 64-bit Enterprise Data Integration
This document contains Confidential, Proprietary and Trade Secret Information ("Confidential Information") of Informatica Corporation and may not be copied, distributed, duplicated, or otherwise reproduced in any manner without the prior written consent of Informatica. While every attempt has been made to ensure that the information in this document is accurate and complete, some typographical errors or technical inaccuracies may exist. Informatica does not accept responsibility for any kind of loss resulting from the use of information contained in this document. The information contained in this document is subject to change without notice. The incorporation of the product attributes discussed in these materials into any release or upgrade of any Informatica software product, as well as the timing of any such release or upgrade, is at the sole discretion of Informatica. Protected by one or more of the following U.S. Patents: 6,032,158; 5,794,246; 6,014,670; 6,339,775; 6,044,374; 6,208,990; 6,208,990; 6,850,947; 6,895,471; or by the following pending U.S. Patents: 09/644,280; 10/966,046; 10/727,700. This edition published August 2006.
Table of Contents
Executive Summary
The Need for High Performance Data Integration
Test Objectives and Methods
Test Results: Raising the Performance Bar
Informatica's Commitment to Performance
Conclusion
SUMMARY OF FINDINGS
The tests showed that PowerCenter 8 more than doubled the performance of PowerCenter 7. PowerCenter 8 executed a 1TB data load to an Oracle 10g database 2.17 times faster than PowerCenter 7, as documented in previous benchmark testing on a comparable 64-CPU HP system. This exceptional 117 percent performance improvement (43 minutes 45 seconds for PowerCenter 8 vs. 95 minutes for PowerCenter 7) continues a release-to-release trend of double- and triple-digit percent performance gains recorded for PowerCenter on HP since Informatica and HP began benchmark testing in 2000. In addition, near-linear scalability achieved with a 1TB load across 16-, 32-, and 64-CPU HP systems reaffirmed PowerCenter's ability to scale to meet the performance and throughput requirements of the most demanding enterprise data integration deployments.

Executive Summary
Organizations deploying enterprise data integration technology to support mission-critical operational applications increasingly require high performance and scalability to meet the demands of large data volumes, complex processing and latency requirements, and a variety of project types, including data warehousing, data migration, consolidation, synchronization, and master data management. At the same time, enterprise data integration software must be engineered to take advantage of performance enhancements in both servers and operating systems, including powerful multi-core processors, 64-bit application environments, and greater memory and I/O bandwidth.

Informatica Corporation, the acknowledged performance leader for enterprise data integration, has consistently delivered performance-related innovation and real-world results with each major platform release. With the recent release of the new Informatica PowerCenter 8, the trend continues.
From April to June 2006, Informatica teamed up with partner HP to measure the performance and scalability of PowerCenter 8 running on HP's elite Integrity Superdome servers. A key objective was to measure the performance of PowerCenter 8 against the performance of earlier PowerCenter releases as recorded in previous HP benchmark testing on similar hardware configurations.

PowerCenter is a single, unified enterprise data integration platform that allows companies and government organizations of all sizes to access, discover, and integrate data from virtually any business system, in any format, and deliver that data throughout the enterprise at any speed. PowerCenter 8 builds on the leadership established by PowerCenter 7 by adding new capabilities and platform optimization to handle the massive throughput and scalability requirements in today's challenging environments. These performance and scalability-related capabilities deliver the linear scalability and high performance required for mission-critical enterprise data integration.

This white paper outlines the objectives, test details, system configuration, and results of a comprehensive set of tests conducted at the HP performance benchmark facilities in Cupertino, California.
The Need for High Performance Data Integration
The requirements for enterprise data integration performance and scalability are increasing as organizations strive to liberate, leverage, and consolidate information from all corners of the enterprise. The deployment of enterprise data integration technology such as PowerCenter 8 is a key foundational element for organizations to transform data from fragmented applications into a single, flexible data infrastructure that can deliver accurate information on demand to vital business systems.

Organizations in financial services, manufacturing, telecommunications, government, healthcare, energy and utilities, and other industries are deploying PowerCenter 8 to support such initiatives as:
- Real-time processing for financials, supply chain, and order management
- On-demand data access in a service-oriented architecture
- Data warehouses requiring high-volume loads and complex transformations
- Cross-enterprise data quality and master data management

Subpar performance of a data integration platform in the operational arena can have serious implications for overall enterprise performance and operational efficiency. Real-time systems supporting financial and supply chain applications, for instance, demand 24/7 availability and high throughput of data across disparate systems, business units, and geographies. And the transition towards a service-oriented architecture (SOA) in many organizations is challenging IT professionals to integrate and orchestrate data across multiple applications and servers atop a high-performance, on-demand network that must scale across large fluctuations between low and peak data processing demands.
Data Integration in SMP-Based Operational Environments
For many organizations, traditional symmetric multiprocessing (SMP) systems such as the HP Integrity Superdome, with solid-state, high-bandwidth, and directly connected storage area network (SAN) storage, continue to be the platform of choice for mission-critical, high-demand application deployment. Enterprises that are considering deploying or extending data integration technology across SMP-based operational environments need to be confident in the data integration platform's ability to scale to meet high throughput and performance demands that will continue to grow in the months and years to come.
Test Objectives and Methods
Informatica performs comparative performance and scalability tests on various workloads and hardware platforms throughout the life of every major software release, and the results over the years have established Informatica as the acknowledged data integration performance leader. The benchmark testing conducted by Informatica and HP was designed to test the limits of PowerCenter 8 with a barrage of throughput and scalability tests at various CPU counts and data volumes.

The performance benchmark testing of PowerCenter 8 set out to accomplish three main objectives:
1. Measure the throughput and scalability of PowerCenter 8 on HP Integrity Superdome server and HP StorageWorks storage configurations
2. Leverage new PowerCenter 8 performance features and architecture to best advantage
3. Measure release-to-release performance of PowerCenter 8 vs. PowerCenter versions 7, 6, and 5

Informatica selected an industry-standard benchmark suite called TPC-H from the Transaction Processing Performance Council (TPC) to ensure an even playing field, customer scenario likeness, and completeness of implementation. The use of a publicly recognized data set avoided the possibility that the data could be specialized for benchmarking purposes. Details on the schema, generation tools, and the validity of the TPC-H data set can be found at http://www.tpc.org/tpch/default.asp.

We then developed a suite of throughput and transformation-oriented tests to simulate a live data integration scenario. These tests were based on TPC benchmark standards and a subset of the TPC-H benchmark suite. Details on the data transformations performed can be found at http://www.tpc.org/tpch/spec/tpch2.1.0.pdf.

The testing included 1TB loads executed by PowerCenter 8 both to an Oracle 10g database and to flat files for use by an external load utility. Complete benchmark runs were executed across 16-, 32-, and 64-CPU HP Integrity Superdome configurations.
Data Integration Platform Hardware
- HP Integrity Superdome server
- 64 Intel Itanium 2 9M 1.6GHz CPUs
- 256GB RAM

Data Integration Platform Software
- PowerCenter 8.1 Advanced Edition 64-bit
- Oracle 10g Enterprise Edition database 64-bit
- HP-UX 11i operating system 64-bit (version B.11.23)
Figure 1 shows the hardware environment for these tests.

[Figure 1: Hardware Environment for the PowerCenter Performance Benchmark Tests. Diagram title: Informatica Benchmark Topology. Diagram labels: HP Integrity Superdome (64 Itanium 9M processors, 256GB RAM); 16 FC; 32-port SAN switch; 4 FC each; 4 EVA 6000, each with 54 72GB 15K disks.]

Test Results: Raising the Performance Bar
The results generated by the set of benchmark tests demonstrate decisively that Informatica has more than doubled its data integration performance with its release of PowerCenter 8. New capabilities for parallel and 64-bit data processing, data partitioning, and optimized handling of flat files in PowerCenter 8, as well as continued improvements in the HP hardware and Oracle database platforms, combined to generate exceptional performance gains across the board.

More than a 2x Performance Gain vs. PowerCenter 7
In our testing on an HP Integrity Superdome with 64 CPUs, PowerCenter 8 executed a 1TB load to an Oracle 10g database 2.17 times faster than PowerCenter 7 64-bit, as recorded in previous benchmark testing on comparable 64-CPU HP equipment in June and July 2004. As illustrated in the following figures, PowerCenter 8 performed its load in 43 minutes 45 seconds, compared to 1 hour 35 minutes for PowerCenter 7 in 2004. This 117 percent speed improvement continues a trend of double- and triple-digit percent performance gains of each Informatica PowerCenter release compared to its predecessor, as documented in benchmark testing that Informatica has conducted in cooperation with HP since 2000.

In separate testing of reading from flat files, performing complex transformations, and writing 1TB to flat files (rather than to a database), PowerCenter 8's performance was clocked at 32 minutes 29 seconds, nearly 35 percent faster than the 1TB load to database. This figure represents the fastest data integration performance results ever publicly released in an industry benchmark test.
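The headline ratios in this section follow directly from the elapsed times quoted above. The short Python sketch below shows one way to reproduce them. The function and constant names are illustrative only, and "percent faster" is assumed to mean the ratio of elapsed times minus one, which is the convention that turns 2.17x into 117 percent.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' elapsed-time string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def speedup(baseline: str, candidate: str) -> float:
    """Ratio of baseline elapsed time to candidate elapsed time."""
    return to_seconds(baseline) / to_seconds(candidate)


def percent_gain(baseline: str, candidate: str) -> float:
    """Speedup expressed as a percentage gain (assumed convention: ratio - 1)."""
    return (speedup(baseline, candidate) - 1.0) * 100.0


# Elapsed times quoted in the text for the 1TB tests on 64 CPUs.
PC7_DB_LOAD = "1:35:00"    # PowerCenter 7, load to Oracle (2004)
PC8_DB_LOAD = "0:43:45"    # PowerCenter 8, load to Oracle 10g
PC8_FLAT_FILE = "0:32:29"  # PowerCenter 8, flat file to flat file

if __name__ == "__main__":
    print(f"PC8 vs PC7 speedup: {speedup(PC7_DB_LOAD, PC8_DB_LOAD):.2f}x")
    print(f"PC8 vs PC7 gain:    {percent_gain(PC7_DB_LOAD, PC8_DB_LOAD):.0f}%")
    print(f"Flat file vs DB:    {percent_gain(PC8_DB_LOAD, PC8_FLAT_FILE):.1f}%")
```

Under these assumptions the script reports a 2.17x speedup, a 117 percent gain, and a flat-file run a little under 35 percent faster than the database load, in line with the figures stated in the text.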
Perhaps most striking is the performance improvement of PowerCenter 8 as compared to PowerCenter 5, which was released in mid-2000. Since then, Informatica's commitment to innovation in performance and scalability has contributed to a load time acceleration of a whopping 803 percent for a 1TB load: 43 minutes 45 seconds for PowerCenter 8 vs. 6 hours and 35 minutes for PowerCenter 5.

Figure 2 shows the test results for PowerCenter 8 on HP Integrity Superdome with 64 CPUs.

Release          Elapsed Time   Performance Gain vs. Previous Release
PowerCenter 5    10:39:00
PowerCenter 6    3:36:00        196%
PowerCenter 7    1:35:00        127%
PowerCenter 8    0:43:45        117%

Figure 2: PowerCenter 1TB Load Times on 64-CPU HP Integrity Superdome

1TB Linear Scalability Across 16, 32, and 64 CPUs
Our benchmark testing sought to measure the ability of PowerCenter 8 to scale 1TB job execution across HP Integrity Superdome server configurations of 16, 32, and 64 CPUs. PowerCenter 8 recorded an average throughput of 374MB/min/CPU in these 1TB load tests. The chart below illustrates near-perfect linear scalability. These numbers help to further validate PowerCenter 8's ability to meet the most demanding enterprise workloads without performance degradation.

Figure 3 shows that PowerCenter demonstrates near-linear scalability across CPUs.

CPUs   Elapsed Time (1TB)
16     02:43:24
32     01:24:46
64     00:43:45

Figure 3: PowerCenter 1TB Linear Scalability Across CPUs
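The release-to-release gains in Figure 2 and the scalability shown in Figure 3 can be recomputed from the elapsed times alone. The sketch below is a rough illustration rather than the method used in the benchmark. It assumes that 1TB means 1024 x 1024 MB and that scaling efficiency is the measured speedup divided by the increase in CPU count; all names are hypothetical.

```python
def to_minutes(hms: str) -> float:
    """Convert an 'H:MM:SS' elapsed-time string to minutes."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 60 + minutes + seconds / 60


# Elapsed times for a 1TB load on 64 CPUs, as read from Figure 2.
RELEASE_TIMES = {
    "PowerCenter 5": "10:39:00",
    "PowerCenter 6": "3:36:00",
    "PowerCenter 7": "1:35:00",
    "PowerCenter 8": "0:43:45",
}

# PowerCenter 8 elapsed times for a 1TB load by CPU count, from Figure 3.
SCALING_TIMES = {16: "2:43:24", 32: "1:24:46", 64: "0:43:45"}

MB_PER_TB = 1024 * 1024  # assumption: binary terabyte


def release_gains(times: dict) -> dict:
    """Percent gain of each release over its predecessor, rounded."""
    names = list(times)
    gains = {}
    for previous, current in zip(names, names[1:]):
        ratio = to_minutes(times[previous]) / to_minutes(times[current])
        gains[current] = round((ratio - 1) * 100)
    return gains


def per_cpu_throughput(cpus: int, hms: str) -> float:
    """Throughput in MB per minute per CPU for a 1TB load."""
    return MB_PER_TB / to_minutes(hms) / cpus


def scaling_efficiency(base_cpus: int, cpus: int, times: dict) -> float:
    """Measured speedup divided by ideal (linear) speedup; 1.0 is perfect."""
    measured = to_minutes(times[base_cpus]) / to_minutes(times[cpus])
    return measured / (cpus / base_cpus)


if __name__ == "__main__":
    print(release_gains(RELEASE_TIMES))
    for cpus, hms in SCALING_TIMES.items():
        rate = per_cpu_throughput(cpus, hms)
        eff = scaling_efficiency(16, cpus, SCALING_TIMES)
        print(f"{cpus} CPUs: {rate:.0f} MB/min/CPU, efficiency {eff:.1%}")
```

With these inputs the per-release gains come out to 196, 127, and 117 percent, the 64-CPU run works out to roughly 374 MB/min/CPU under the binary-terabyte assumption, and efficiency relative to the 16-CPU run stays above 93 percent at 64 CPUs.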
Informatica's Commitment to Performance
The significant performance improvements and near-linear scalability demonstrated in this set of PowerCenter 8 benchmark tests are attributable in large part to Informatica's ongoing commitment to meet and exceed the data integration performance demands of enterprise customers. PowerCenter 8 leverages our aggressive R&D investments and feedback from many of our 2,400+ customers around the world to deliver performance and scalability enhancements and optimization engineered to help enterprise IT organizations:
- Manage increasing data volumes and decreasing load windows
- Deploy data integration technology in mission-critical, real-time systems
- Take advantage of faster, more scalable server and storage platforms

Several key enhancements in PowerCenter 8 have helped extend Informatica's acknowledged industry leadership in data integration performance and scalability. They are described below.

Full Parallelization of Flat-File Processing
PowerCenter 8 features a new capability that fully parallelizes the processing and writing of flat file data. This key feature completes the parallel processing of flat files and allows massive amounts of file-based data to be read, transformed, and written in parallel, seamlessly coordinated threads with PowerCenter 8. Flat file parallelization began with the introduction of parallel reading in PowerCenter 7, and enhanced capabilities were fully incorporated into PowerCenter 8.

Improved Internal Data Conversion Across Data Formats and Systems
PowerCenter 8 offers significant performance enhancements in reading and converting external data types to the Informatica format as a result of a detailed engineering examination and subsequent improvements on a type-by-type basis. As an example, PowerCenter's performance in reading and converting ASCII numeric data was improved by 200 percent to 240 percent from PowerCenter 7 levels.
This capability contributed to the significant release-to-release performance improvement between PowerCenter 7 and PowerCenter 8.

Improved Transformation Processing Capabilities and Speed
With every major release, Informatica performs a comprehensive performance review of the underlying transformation processing capabilities and makes incremental improvements to as many algorithms underlying the technology as possible. Specific improvements to filtering, expression processing, and memory management in PowerCenter 8 added to the overall performance improvement observed in this set of benchmark tests.
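To make the idea of fully parallel flat-file processing more concrete, the sketch below shows a generic partitioned read, transform, and write pipeline in Python. It is not PowerCenter's implementation or API. The partition layout, the pipe-delimited record format, and every function name are assumptions made for illustration, and Python threads here demonstrate the structure of the technique rather than its performance.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import tempfile


def transform(line: str) -> str:
    """Hypothetical transformation: parse two ASCII numeric fields
    (quantity|price) and append their product as a third field."""
    qty, price = line.split("|")
    return f"{qty}|{price}|{int(qty) * float(price):.2f}"


def process_partition(src: Path, dst: Path) -> int:
    """Read one partition file, transform each row, and write one output
    file. Returns the number of rows processed."""
    rows = 0
    with src.open() as fin, dst.open("w") as fout:
        for line in fin:
            fout.write(transform(line.rstrip("\n")) + "\n")
            rows += 1
    return rows


def run_partitions(sources: list, out_dir: Path, workers: int = 4) -> int:
    """Process every partition in its own worker thread, so reading,
    transforming, and writing all proceed in parallel across partitions."""
    targets = [out_dir / f"{src.stem}.out" for src in sources]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_partition, sources, targets))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        parts = []
        for i, text in enumerate(["2|1.50\n3|2.00\n", "1|4.25\n"]):
            part = work / f"part{i}.txt"
            part.write_text(text)
            parts.append(part)
        print("rows processed:", run_partitions(parts, work))
```

Each partition has its own reader and writer, so no stage funnels through a single thread. That is the general property the section describes when it says parallel writing completes the parallel reading introduced earlier.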
Conclusion
Performance and scalability capabilities of an enterprise data integration platform are increasingly important for many organizations. The ability of the platform to process large volumes of data in smaller load windows is essential for an enterprise to extend data integration technology into mission-critical operational systems, as well as improve the performance of vital data warehouses and marts.

These benchmark tests help to further validate PowerCenter 8's industry-leading performance and scalability and its ability to meet and exceed the most demanding requirements of enterprise data integration. Informatica believes that IT organizations can be confident that PowerCenter 8, as well as future releases of the Informatica platform, will meet performance demands now and for years into the future.

For more information about PowerCenter 8, please visit us at www.informatica.com/powercenter8 or call (800) 653-3871.
Worldwide Headquarters, 100 Cardinal Way, Redwood City, CA 94063, USA
phone: 650.385.5000 fax: 650.385.5500 toll-free in the US: 1.800.653.3871 www.informatica.com

Informatica Offices Around The Globe: Australia, Belgium, Canada, China, France, Germany, Japan, Korea, the Netherlands, Singapore, Switzerland, United Kingdom, USA

© 2008 Informatica Corporation. All rights reserved. Printed in the U.S.A. Informatica, the Informatica logo, and PowerCenter are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners. 6734 (09/17/2008)