Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration

by Matt Casters; Roland Bouman; Jos van Dongen
Edition: 1st
Format: Paperback
Pub. Date: 2010-09-28
Publisher(s): Wiley
List Price: $52.50

Summary

This book describes Kettle and how it can be implemented, applied, and managed, and includes an extensive collection of use cases and best practices. A major part of the book is based on Kimball's 34 ETL subsystems. The book does not assume prior Kettle or ETL knowledge, which makes it an ideal start for anyone wanting to learn an ETL tool.

The book covers all the distinct components that make up the Kettle product and shows how they can be applied to real-world scenarios. It uses a solutions-oriented approach: the available toolset is discussed not from the tool perspective but from the solution perspective (i.e., what someone can accomplish using the product).

The first half of the book (parts 1, 2 and 3) is devoted to basic Kettle functionality and how it can be applied to get ETL solutions up and running. Parts 2 and 3 follow the 34 ETL subsystems as described by Ralph Kimball. The 34 subsystems cover the entire ETL lifecycle and make an excellent guideline for covering all parts of data warehousing with Kettle. The second half of the book (parts 4, 5 and 6) covers more advanced or specialized topics such as clustering, extensibility, and loading a data vault model.

Every subject is illustrated with a real-life example that readers can easily relate to, but because the chapters are so diverse there is no single overall case that runs through the book. The variety of examples also makes for a livelier discussion of the different topics. The book and its samples cover everything from simple single-table data migration to complex multi-system clustered data integration tasks.

After reading this book, readers will have learned:

  • What ETL and data integration are, and why they are needed
  • The components that form the Kettle ETL toolset (and how these components fulfill particular data integration needs)
  • How to install and configure Kettle, and how to connect it to various data sources and targets
  • How to design and build every aspect of an ETL solution using Kettle
  • How to build and load a data warehouse with Kettle
  • How to deploy and schedule ETL solutions
  • How to integrate and extend Kettle
  • How to run and scale Kettle solutions using a distributed 'cloud' environment

Author Biography

Matt Casters is the founder of Kettle and works as Chief of Data Integration at Pentaho, where he leads Kettle software development. Roland Bouman is an application developer focusing on open source web technology, databases, and business intelligence. Jos van Dongen is an independent business intelligence consultant and a well-known author, analyst, and presenter.

Table of Contents

Introduction
Getting Started
  ETL Primer
  Kettle Concepts
  Installation and Configuration
  An Example ETL Solution-Sakila
ETL
  ETL Subsystems
  Data Extraction
  Cleansing and Conforming
  Handling Dimension Tables
  Loading Fact Tables
  Working with OLAP Data
Management and Deployment
  ETL Development Lifecycle
  Scheduling and Monitoring
  Versioning and Migration
  Lineage and Auditing
Performance and Scalability
  Performance Tuning
  Parallelization, Clustering, and Partitioning
  Dynamic Clustering in the Cloud
  Real-Time Data Integration
Advanced Topics
  Data Vault Management
  Handling Complex Data Formats
  Web Services
  Kettle Integration
  Extending Kettle
The Kettle Ecosystem
Kettle Enterprise Edition Features
Built-in Variables and Properties Reference
Index
Table of Contents provided by Publisher. All Rights Reserved.
