ETL (Extract, Transform, and Load) tools provide a mission critical function for marketing organizations and their agencies. Many companies have customer lists or potential clients stored in a spreadsheet or database. They may also have information about target markets, or a list of potential contacts that have been provided by sales, or a set of leads that have been purchased. This data may be scattered into multiple different formats. Information exists within them but finding it or making sense of it can be a difficult process, and in some cases, if the data sources are large enough, truly formidable.
ETL tools, at the basic level, enable loading data from multiple data sources, combine and transform them into a format which can then be loaded into a database for further querying. Beyond these rudimentary functions, many of these tools contain a wide set of extra features. These can include everything from data analytics tools including many predictive modeling features, to useful output functions, including the ability to create graphics, charts and full-fledged dashboards.
To assist these companies to profit the most from their data, data scientists have developed some reliable procedures for helping to navigate these vast seas of information.
They have evolved beyond simple ETL functions to being complete, or nearly complete Business Intelligence tools.
In this article we are going to look at two tools side by side: KNIME, which is a powerful free open source ETL and BI tool, and Alteryx, which is a commercially licensed product. Both tools boast a wide range of capabilities. We will be examining them from the perspective of a Direct Marketing organization, seeking to gain knowledge from either simple or broader data sources.
Our approach will be to address these from the perspective of those who have a strong business need but may or may not have advanced technological know-how or capabilities. Information used in this article is based on both personal and general public consensus about the use of these two products.
The pieces we are comparing will include:
Both KNIME and Alteryx use a similar workbench type of approach. Workflows are chosen from menus of nodes.
In KNIME, you can see a list of nodes in a repository, divided into segments. Each node can be dragged onto the canvas and can be connected by dragging a line from an output to an input of another node. By right-clicking on a node, you can set configurations based on its functionality.
The Alteryx dashboard is similar. Nodes are grouped into an easy to understand menu on the top of the application. Nodes are arranged into color-coded categories, including In/Out, Data Preparation, Join, Parse, Transform, Reporting, and more.
Within Alteryx, each configuration is embedded into the node. Clicking on a node opens the interface, which can be expanded or closed as needed. Maneuvering between nodes is considerably easier than with KNIME. With KNIME you may end up with a large number of extraneous windows all over the place, taking up a considerable amount of memory. This can be a factor on a slow computer.
From a pure interface perspective, Alteryx has a much more intuitive workflow.
We primarily use these ETL tools for the purpose of data preparation. As approximately 80% of the work involved with analysis in a direct marketing campaign is spent on this aspect, the ease of use and functionality of these types of tools is crucial. For both KNIME and for Alteryx, you can pull data in from multiple locations, ranging from CSV files, databases, or from cloud sources.
In KNIME, by double clicking on a node we are given access to configuration options. Here’s a file reader, opening a CSV file:
When loading a file, we are given a preview of the data in the file and the ability to modify it, including file types and more.
KNIME provides the ability to filter not just for rows, but also for columns, which can be useful if one has an overly large range of keys in a database or spreadsheet.
Alteryx provides multiple screens which enable easy drag and drop and selection of datatypes in a visual way. After connecting an Input/Output tool to a database or a file we are automatically provided a quick visualization with detailed information about each data point contained.
The Alteryx’s data cleansing tool is quite clear and easy to understand. Modifying data types is easy, as one can use a drop-down to select types.
While both tools can handle data preparation and data cleansing equally well, KNIME’s interface proves to be a little bit more confusing. Data type field conversion can be time consuming and is not intuitive. As preparing data for analysis is a time-consuming process in itself, the amount of time spent simply getting datatypes into a correct format can increase the amount of workload, and as a result, extend the amount of time required to begin actual analysis to assist in marketing campaigns. For this reason, in an equal world, Alteryx’s data preparation tools feels a bit superior to KNIME’s. However, it does lack some functionality that KNIME does have, which is mainly the ability to filter by columns.
Both KNIME and Alteryx provide some solid tools for combining data, however the functionality and ease of use varies between them.
KNIME’s join tool is relatively easy to understand and is easy to use for combining multiple datasets off of standard shared identifiers. For instance, as shown below, if we have two files which contain both area code and phone, operating on the assumption that a phone number is a unique identifier, we can use these to link together two different datasets.
We then have the ability to easily filter data from each dataset and can decide whether or not we want to include or exclude it in the eventual destination database.
Alteryx’s join tool behaves similarly. We can choose a simple identifier (such as UserID, shown below) and use it to link together both datasets.
It allows one to essentially construct SQL queries without having to write a line of code. One of the huge advantages is that the data itself does not need to be completely transformed into a new file before configuring connections between datasets – all of the work can be done on existing data, and the manipulation can happen on the fly – no need for messy spreadsheets with detailed inserts to get the data into the database. The cleaning and connections exist before being loaded
However, there is a very important caveat that must be made about Alteryx’s ease of use. Occasionally (read: often), while meaning well, developers may design a system that is meant to be easy to use, but through oversimplification of processes end up counter-intuitively making it more difficult to use.
Alteryx, chooses to combine the UNION features in SQL with JOIN (including inner and left/right joins). While this may work with small datasets, for larger amounts of information (as was crucial in our case), this made working with it unnecessarily time-consuming, as it became necessary to wait for Alteryx to process each application to complete to proceed. As a result the performance of Alteryx while combining data suffers. KNIME, on the other hand, treats each activity as a discrete activity in a separate node. While this may seem convoluted at first (particularly for those of us used to working directly in SQL), in the long run the performance payoff is large.
As we have seen, both KNIME and Alteryx have solid and easy-to-use data manipulation and joining capabilities. While the interfaces are slightly different, it’s not terribly difficult to use either one, and both make it possible to quickly combine two or more datasets on the fly, however if working with larger sizes of datasets, KNIME definitely came out stronger.
For many data science purists, there’s a tendency to want to manually program one’s own modules. As Python is a powerful tool for this process, it’s often considered not too difficult to write custom programs to identify model behavior. However, reality being what it is, difficulty is highly variable depending on the user. Having some citizen-as-data-scientist tools built in is not necessarily a bad thing. This can be helpful for identifying at least some preliminary trends, which may be able to later inform more detailed analytics.
KNIME is fairly strong when it comes to including various predictive analytical tools. Being open source, hundreds of developers have worked to create a wide array of plugins and adapters to be able to use many pre-existing functionalities, many of which will do almost all of the work for you.
KNIME includes a number of ML-style tools to allow learning from existing models and to create new predictive models. It can handle various regressions, and can assist in building decision trees, and participate in some model evaluation.
KNIME includes an exceptionally wide range of tools. Below is a high-level listing of various data mining modules that are included:
They also include a large set of statistical tools:
For calculating the distances between data objects, both analytically and literally, they provide a nice set of distance calculation tools:
Here is a demonstration of a logistic regression model for solving a classification problem:
Alteryx includes a few useful data investigation tools, including Pearson and Spearman correlations.
It enables loading in various analytical models, and also has a built-in scoring model for predicting values or classes of datasets, and it can be used for scoring new models. Alteryx also includes some simulation sampling.
However, beyond these, unlike KNIME, Alteryx does not offer anywhere near the same range of analytical tools. If one is wishing to do some serious modeling and testing, it’s probably best to use external tools best suited for this purpose. Alteryx does, however, allow exporting data to be used easily by such tools.
After going through extraction and transformation of data, the third segment of the ETL process involves “loading” or outputting of data into a selection of standardized formats. Both Alteryx and KNIME provide this functionality.
IN KNIME, you have the ability to export your data into a standard csv file for working with later. You can also export your information into a BIRT (Business Intelligence Reporting Tool) report, or as a Tableau Hyper file.
Example BIRT Report:
KNIME also makes it possible to export data into a wide range of databases and formats, ranging from Microsoft SQL Server, MySQL, PostgreSQL, H2, and many more.
Below is a very simple KNIME workflow taking data from an SQLite database, regrouping it and feeding it into a new database.
For outputting data, Alteryx provides an easy to use “Output Data Tool,” which will connect to a vast array of different data sources and file formats. These include standard CSV flat file-like options, to SQL and non-relational databases, as well as a number of cloud-based big data formats from Google, Amazon, and Microsoft.
Here is a subset of some of the output formats that you can use within Alteryx
Graphing / Reporting
As visualization is an important part of any data analysis, it’s helpful to be able to create a few graphs so one can identify trends, or concentration of likeliness of purchasing behavior within a specifically targeted demographic.
While this is not the primary purpose of ETL tools like KNIME or Alteryx, it is helpful to be able to generate a few quick graphs or visualization
KNIME, being the open source homunculus that it is, boasts a wide range of charts and graphs that can be generated, and many are quite easy to create. For example, here’s a basic bar chart:
Putting these together into a dashboard is, however, a much more involved process. From the point of view of a direct marketing organization, one will likely wish to use external software better designed for creating business/marketing-friendly graphics. The fact that KNIME provides easy connections to BIRT and Tableau strengthens this recommendation.
Alteryx itself also includes a number of graphics, however it too is not designed specifically for this purpose, so as a result, visualizations are not particularly strong.
That said, there are still many helpful options that are included. For instance, here’s a map of a trade area around a store:
Summary / Conclusion
Both KNIME and Alteryx are powerful ETL tools which will well serve the functions of many a marketing operation. As for how they stand up against each other, this depends largely on who is using them and how they intend to be used.
KNIME and Alteryx take a slightly different approach to the ETL process. KNIME is completely modular - each activity is separate, whereas Alteryx has included some higher level-functionality, by combining tools that make sense to work together by an analyst. Instead of being a “set of tools” it becomes more of a multi-tool, but where certain aspects are combined, this doesn’t always provide the expected user-experience - causing trouble with large datasets.
If one is seeking a tool which is very user-friendly, and which handles most of the tasks that are necessary in data preparation, Alteryx stands out as the far more approachable option. However, KNIME is itself completely workable for many organizations. It is, however, not a tool that would be recommended for beginners or those with limited technical skills. While Alteryx prides itself on making it accessible, KNIME requires that one have a solid understanding of data science.
If one is looking for a tool which can handle all ETL functions, and would like some hefty analytical options, KNIME is far stronger in terms of its capability, however if one’s primary purpose is to extract, prepare, transform and load data (which is what ETL is for, afterall), Alteryx excels at this by helping users handle this quickly and seamlessly, reducing the amount of work that often occurs during this phase.
The difference in the pricing of the two options is large – Alteryx typically starts at around $3000 annually, while KNIME is free. KNIME can handle any of the functions that Alteryx can, so if an organization is willing to invest the amount of time getting used to using it (and dealing with headaches in data preparation) it is a fully workable tool.
One final note about support: while KNIME is open source and has a medium size community, Alteryx actually boasts a more adaptive and user-welcoming support environment.
While either approach can be a workable solution for your direct marketing organization, which one you choose would depend on how user-friendly you need it to be, how much support, the cost, and overall data analysis functionality you might need.
For more information about these tools, read our reviews: