All Tools> Pentaho
10 min read
Pentaho Data Integration and Analytics is an open source software solution designed to handle data management tasks often required by many database marketing organizations. Pentaho, as we are examining it here, can be divided into two tools. They can be described as follows:
The Data Integration tool is a standalone ETL application which enables the creation of data pipelines to access, prepare, blend data, and then to eventually load the various sources into a database. It has the capability to load data from multiple sources, ranging from basic CSV files, to various databases and metadata formats.
Data can be transformed with a wide range of built in functions. Beyond basic input/output, validation and mapping functions, Pentaho provides a suite of tools that one would expect from a full-fledged ETL suite, including a wide range of data-mining and statistical analysis features. It has full capability to handle Hadoop map reduce functionality with a minimal level of coding skill required.
Pentaho also comes packaged with an easy to use high-level business analytics application server, which enables the creation of various analytics tools for quick analysis. By dragging and dropping data from spreadsheets or through database queries, users can quickly create a wide range of charts and graphs, which can be easily configured into interactive dashboards for tracking of direct marketing data, customers, campaigns, and more.
The Business Analytics server is easy to configure from an administrative perspective to allow creation of multiple user accounts with varying levels of access. It also includes a scheduling tool which can refresh data according to a number of specified parameters.
To demonstrate how this might be used for a direct marketing organization, we'll walk through a workflow that might make sense from a business perspective, from data input to reporting.
Pentaho's Data Integration module allows users to extract data from a number of data sources, transform it so that it is in a standardized format, and load this information into a central database to make it possible for future analysis.
Loading customer data from a CSV file is accomplished by dragging a data input node onto the canvas and loading the file. The data can be previewed, and then imported. Storing this into a database simply requires dragging and dropping a “table output” node, specifying a target database, and connecting the two nodes. By defining a table in the database, this will automatically take any of the information from the input file and generate a SQL query:
The transformation created will consist of only two nodes and a connection.
Typically with customer contact data, it's quite common that a significant amount of necessary data may be missing from the original file. It is relatively simple to create a basic workflow which will examine the data from the original file and identify those fields that are missing, such as postal zip codes, and combine them with a different file for lookup purposes.
The workflow below shows connecting two files: the first one is the original csv file, and a simple conditional check for existence of a postal code. If the condition succeeds, it writes those rows to the database. For those that are missing this data, it checks a second table which provides a basic mapping table showing association of address data with postal codes. We can check to see if those postal codes can be determined; then these two pieces of data can be merged and separately loaded into the database.
While this is a relatively simple demonstration, as it uses two similar data source types, Pentaho integrates well with numerous sources, ranging from the aforementioned CSV files as well as multiple relational and non-relational databases, and APIs. It boasts a simple interface for performing some more complicated functions, such as enabling users to be able to filter large amounts of parallel data, with some built-in Hadoop mapreduce functionality.
In order to gain solid business intelligence out of marketing or sales data, Pentaho provides an easy to use Business Analytics platform.
This tool can be used either directly with data transformed and loaded by the Data Integration module, or with independent datasets. The interface for this business intelligence module is designed specifically with business users in mind; very little technical knowledge is required beyond standard business analysis.
Loading data from a CSV file or from a SQL query is driven by wizards, which allow previewing data, and also allows managing data types and field formatting on the fly:
Pentaho's Business Analytics tool provides some helpful graphical analysis tools. For instance, if working with multi-dimensional datasets, creating interactive tables is no more complicated than creating pivot tables in spreadsheet software. It provides functionality which allows drilling down to individual data segments, such as sales per year and region:
The resulting summarized data can easily be converted into a graph with only a few clicks.
Each of these individual reports, can be saved as widgets and can be used to quickly create interactive dashboards:
If you data is being regularly updated into a database through a datastream, it's easy to trigger queries off of a schedule, keeping dashboards up to date based on user-specified criteria.
The uses for Pentaho from a marketer's perspective is patently clear from the outset. By being able to load and transform data with little required expertise, other than a basic understanding of the data itself, it makes it easy to get up and running. Particularly useful are the reporting features, which enable entire teams to be able to create personalized dashboards in order to keep track of customer responses within campaigns, as well as maintaining a customer base . The ease of use provides little barrier to entry.
While the interface itself is easy to understand, particularly for the Business Analytics functions, getting it up and running was a bit of an ordeal. In order to run the trial locally on a Windows machine requires the use of the Java application service, which immediately uses up a tremendous amount of active memory. If one does not have a powerful computer, it will run rather slow, especially compared to other tools which make use of an Apache server requiring a much smaller footprint. That said, from an enterprise perspective, it could work well.
Once up and running, Pentaho is remarkably easy to use and get started, and most features were fairly self-explanatory. One drawback is the slowness of the graphical rendering, particularly for some dashboards. While this could be a function of the power of the computer running it, it appeared to have slightly slower performance than comparative tools of this nature.
Information about the Pentaho community is a bit elusive. Access to the support community requires a purchase of the commercial software. The Github repository does not track issues. Pentaho has a tag on Stackoverflow. It has approximately 3200 mentions, of which approximately half remain unresolved.
However there do appear to be many third-party online tutorials, suggesting that the tool is popular and is used.
Pentaho official forum
Data Integration: 3.0
Business Analytics: 5.0
The two tools listed here need to be treated separately.
While using the data integration at a basic level is relatively easy to understand, to make use of the more powerful features requires some understanding of data science. Pentaho earns a strong score for creating simple interfaces, however understanding what, for instance, map reduce is, or understanding how to handle Mondrian coding would help one make use of its rich array of features. For small marketing firms with limited in-house technical resources, using the data integration tool can be somewhat overwhelming.
The Business Analytics dashboard tool is refreshingly easy to use. Everything from loading files to generating reports or creating dashboards can be handled by someone with basic spreadsheet skills.
Pentaho can be a powerful tool for handling marketing data activities. It is user-friendly and has an appealing look and feel, and can be quite powerful for data access and transformation. It boasts many tools which make data manipulation relatively painless, and it will work well for complex analytics and use cases.
Its drawbacks are mainly tied to the smallness of its community. Some features require a little more technical knowledge than all companies may have on-hand. Other problems have to do with the slowness of its graphical rendering.
Overall, from the point of view of a marketing organization, Pentaho makes a strong showing among its competitors.
10 min read
A review of KNIME Analytics software for use in direct marketing business intelligence operations.
10 min read
A review of Talend Open Studio ETL software for use in direct marketing business intelligence operations.
10 min read
A review of Alteryx software for use in direct marketing business intelligence operations.