Evaluation: Driverless AI vs. DataRobot

As we set about constructing WyzProfile, our data analysis and business intelligence tool for direct marketing, we went through the process of examining many existing tools that we could integrate into our back end. In previous articles in this series, we described working with a number of data integration tools for pulling in Big Data sets for the purpose of analysis. In this article we will examine machine learning platforms.

Why ML Tools?

Machine learning is a core component of our Artificial Intelligence based audience segmentation solution, WyzPredict. WyzPredict is designed to predict who will be inspired by a particular direct mail package and who will reject it. We examined several tools that could help us deliver these predictions, with the goal of integrating one into our end product.

The Criteria We Chose

While the tools we examined have a wide range of features, from our perspective in creating WyzPredict, we focused on a few criteria that were specific to our needs. We needed to know how easily it would run on our infrastructure, which is a combination of AWS and Hetzner (which we leveraged because they provide inexpensive Graphics Processing Unit power). Since we process billions of consumer characteristics, computing power was one of the main considerations. It was important to us whether the product actually used GPU acceleration.

It was important to us that we could embed the functionality, including licensing it for use within our proprietary application.

We examined whether the product had the ability to automatically identify or calculate various “features” without intervention. Features are the unique characteristics that Artificial Intelligence uses to identify patterns and predict a result.
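To make the idea of a "feature" concrete, here is a minimal sketch of deriving a few features from a raw mailing date by hand. It is only an illustration of what automated feature generation spares you from writing; the function name, the field names, and the tiny holiday list are all hypothetical and are not taken from either product.

```python
from datetime import date

# Hypothetical holiday list for illustration only; a real system
# would rely on a maintained holiday calendar.
HOLIDAYS = {date(2019, 12, 25), date(2020, 1, 1)}


def derive_date_features(mail_date: date, window_days: int = 7) -> dict:
    """Turn a raw mailing date into a few model-ready features."""
    # Distance in days to the closest holiday in our small list.
    days_to_nearest_holiday = min(abs((mail_date - h).days) for h in HOLIDAYS)
    return {
        "day_of_week": mail_date.weekday(),  # 0 = Monday, 6 = Sunday
        "month": mail_date.month,
        "is_weekend": mail_date.weekday() >= 5,
        "near_holiday": days_to_nearest_holiday <= window_days,
    }


features = derive_date_features(date(2019, 12, 20))
print(features)
```

A single date column becomes four candidate features here. An automated tool would generate and test many such transformations without being told which ones to try.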

For our executive audience, we wanted to know if it could recognize patterns and “explain” what patterns were the most important in impacting the outcome, such as holidays as a factor for when someone responds to a campaign.

We evaluated the leaderboard, to determine whether it was possible to select and deploy models based on various advantages or disadvantages. We also looked over what sort of Business Intelligence features it had included, such as graphics showing patterns, both for data analysis and reporting purposes. 

Importantly, we needed to know whether it could work with Big Data, including datasets with a large number of columns, and what sort of data reporting was included.

Tools Examined

In this piece we will be examining two of the most popular tools side by side. These are:

  • H2O Driverless AI

  • DataRobot

H2O Driverless

Demo

Driverless provides a generous 21-day free trial. They also provide a free sandbox where you can upload your data and practice creating a few models to get a feel for how the product works (data is kept live for 24 hours). This enabled us to spend some time with the product to identify its strengths and weaknesses prior to making any decision.

Licensing

Driverless does not provide any form of licensing which would enable us to include the product in other applications. The licensing model is based on a single subscription for each user. However, H2O, the creator of Driverless AI, provides a set of open source libraries which could be used.

GPU Acceleration

One of the stronger features of Driverless AI was its ability to run algorithms on a Graphics Processing Unit (GPU). This enabled us to train models in a fraction of the time it would have taken otherwise. To put this into context, models could be trained in approximately 3 to 4 hours, in contrast to the several days required with DataRobot.

As a result, Driverless was by far the faster of the two tools in our testing.

Open Source

While it is not an open source project itself, H2O Driverless AI makes use of a large number of open source libraries. They have also developed several of their own, which proved to be effective.

Runs on our infrastructure

Driverless was uniquely suited to running on our infrastructure. As mentioned earlier, we use AWS but also integrate this with Hetzner, which provides inexpensive GPU processing. The AWS hosting allows running on any kind of instance, and Driverless AI integrated smoothly into our systems.

Feature Generation

The level of sophistication of the AI within Driverless was impressive, and it showed the ability to generate features based on patterns within our data. The process in Driverless is automatic; it can “understand” the data it is provided. 

Ability to select and deploy model

One of the features we sought was a leaderboard that would let us choose among models based on different criteria. Unfortunately, Driverless’ dashboard isn’t designed for identifying whether a model was, for example, faster or more accurate; models were scored by overall effectiveness. While we might be able to dig in and find this data, there was no easy way to simply select a model and deploy it.

To choose a model for deployment, we needed to identify it, make a note of which one it was, and then separately choose to deploy it by downloading the files and then installing them, making the process a bit cumbersome.

Below is a screenshot of a Driverless leaderboard.

BI module

Because visualizing data is key to understanding it, the business intelligence tools in Driverless AI are a welcome feature. It includes a few charts and graphics to help you understand your data, both pre- and post-processing.

Here’s a report of various variables in a file, including clear indicators as to their importance in a model.

Big Data

H2O Driverless AI showed the ability to work with Big Data sets. Driverless makes use of “Sparkling Water,” an H2O product that integrates H2O with Apache Spark and is designed for scaling Driverless’ ML algorithms to Big Data.

Data Reports

Driverless provides detailed data reports regarding the quality of data. We can get a view of the data shape, any outliers, or missing values that may be present in our sources.   

For instance, we can see a clear report of the distribution of data, below:

Often in a model we may be missing data. The following report makes it very clear where there are common patterns of data that may be absent, which is highly common in direct marketing data. This report can be helpful for visualizing what potentially valuable information may be missing from a model. 
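As a rough illustration of the idea behind such a report (this is not Driverless AI's own implementation), the sketch below counts how often each combination of columns is missing together. The records and column names are made up for the example.

```python
from collections import Counter


def missing_patterns(rows, columns):
    """Count how often each combination of missing columns occurs."""
    patterns = Counter()
    for row in rows:
        # A value counts as missing if it is absent, None, or an empty string.
        missing = tuple(c for c in columns if row.get(c) in (None, ""))
        patterns[missing] += 1
    return patterns


# Made-up direct marketing records for illustration.
rows = [
    {"email": "a@example.com", "income": 52000, "age": 41},
    {"email": None, "income": None, "age": 35},
    {"email": None, "income": None, "age": 58},
    {"email": "b@example.com", "income": None, "age": None},
]
columns = ["email", "income", "age"]

patterns = missing_patterns(rows, columns)
for pattern, count in patterns.most_common():
    print(count, pattern or "(complete)")
```

Grouping by the whole pattern, rather than counting each column separately, is what reveals that certain fields tend to be absent together.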

Many Column Data

In many cases, we are working with data that will contain vast numbers of columns or keys. In market research, we ideally want information not just about who a user is (gender, address, email, etc.) but also information about preferences and more. We may also be combining information about user income, shopping history, website browsing history, political leaning and more, all of which may be helpful in drawing out a clear picture of our typical customers.

As a result, datasets can stretch from tens to hundreds of columns or more. We found that Driverless AI was able to handle data with large numbers of columns, largely thanks to its use of the GPU for fast processing.

DataRobot

Demo

DataRobot does not offer an easy way to set up a free trial from their website. A trial was possible, but it required us to contact them directly.

Licensing

One major drawback for DataRobot was its inability to be licensed for embedded use. They have an explicit policy regarding usage, and their payment model was per user, which could become difficult to manage within our application. To use DataRobot, we would have to display their brand and purchase individual licenses for each instance.

GPU Acceleration

The speed of DataRobot was not impressive. Because their demo does not use GPU acceleration, running models consumed considerably more resources, and a test run could take several days, compared to several hours for Driverless.

Open Source

DataRobot does integrate a large number of Open Source libraries (including those created by H2O), so many models are available.

Can run on system

It was possible to run DataRobot on our servers (on AWS specifically, since without the GPU acceleration, Hetzner became irrelevant). 

Feature Generation

While DataRobot provided some basic feature generation (such as identification of missing values), it was unable to provide the same level of automation as Driverless.

Below is a workflow of how it detects, calculates and populates missing values within some keys.

Ability to Select and Deploy Model

One of the biggest strengths of DataRobot is its easy-to-use leaderboard. We were able to identify upon a glance which models are the most accurate, or which ones ran the fastest. We were also able to select the desired model from the leaderboard and deploy it with a click.
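The kind of selection a leaderboard makes easy can be sketched in a few lines. This is a generic illustration, not DataRobot's API; the class, the function, the model names, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    name: str
    auc: float            # validation score, higher is better
    train_seconds: float  # training time, lower is better


# Illustrative numbers only, not measurements from either product.
leaderboard = [
    ModelResult("gradient_boosting", auc=0.81, train_seconds=5400),
    ModelResult("logistic_regression", auc=0.74, train_seconds=120),
    ModelResult("random_forest", auc=0.79, train_seconds=1800),
]


def best_by(results, criterion):
    """Pick the top model for a given criterion: 'accuracy' or 'speed'."""
    if criterion == "accuracy":
        return max(results, key=lambda r: r.auc)
    if criterion == "speed":
        return min(results, key=lambda r: r.train_seconds)
    raise ValueError(f"unknown criterion: {criterion}")


print(best_by(leaderboard, "accuracy").name)
print(best_by(leaderboard, "speed").name)
```

The point is that the most accurate model and the fastest model are often different, so being able to rank by either criterion matters when choosing what to deploy.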

BI

DataRobot provides some excellent Business Intelligence tools. For instance, below is a simple feature impact chart, showing which elements in our models would have the greatest impact on identifying and predicting customer behavior.

Big Data

Unfortunately, DataRobot has serious trouble handling Big Data. Due to its built-in processing limits and the fact that the Open Source libraries it is built on don’t have this capability, it struggles heavily with datasets of any significant size.

The result is that we would be forced to work with subsets or samples of our data. While these can provide useful information in many cases, they may miss some important trends and also increase the likelihood of data leakage.
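When sampling is unavoidable, one common way to reduce the risk of missing a rare group, such as the small share of people who respond to a mailing, is stratified sampling. The sketch below is a minimal illustration under that assumption; it is not something we confirmed either product does, and the function name, field names, and data are made up.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_key, fraction, seed=0):
    """Sample the same fraction from each label group.

    Keeps at least one row per group so rare outcomes are not lost.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    sample = []
    for group in by_label.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Made-up data: 1,000 recipients, 2% of whom responded.
rows = [{"id": i, "responded": i % 50 == 0} for i in range(1000)]

sample = stratified_sample(rows, "responded", fraction=0.1)
responders = sum(1 for r in sample if r["responded"])
print(len(sample), responders)
```

A plain random 10% sample of this data could easily contain no responders at all or far too many; sampling within each group keeps the response rate of the sample equal to that of the full dataset.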

Data Reports

DataRobot does make it easy to gain some solid information about the quality of the data you provide it. It provides some good basic reports which will help a business user to gain an understanding of the data.

For instance, the following report helps identify which fields may provide data leakage due to false positives in the data:

Many column data

Like Driverless, DataRobot can handle data from datasets that carry a large number of keys.

Summary

Overall, both companies provide a mature, high-quality product, and both have experienced and competent staff. One major difference lies in the philosophy of the two companies. H2O makes its tools available to academic sites, sometimes at no cost to educational and non-profit institutions, and offers many free courses on its site. It also releases major parts of its source code as open source for use in other products (in fact, DataRobot uses some of its libraries).

In contrast, DataRobot is a well financed industry leader with an early mover advantage. They provide a high-quality product, and the usability they provide is excellent.

After testing both products, Wyzoo is recommending Driverless AI to our clients. The deciding factors were an edge in model performance, which we attribute to better feature engineering, and above all its ability to use GPU processing.

Appendix: Comparison Chart

| Main Requirements | Driverless | DataRobot |
| --- | --- | --- |
| Has trial demo | Yes | Yes, but need to contact sales department |
| Can be licensed for embedding | No | No |
| GPU acceleration (demo) | Yes | No |
| Uses Open Source Libraries | Yes | Yes |
| Can be run on our infrastructure? | Yes | Yes, but requires additional discussion |
| Feature Generation | Yes | Weak |
| Model Selection/Deployment | No | Yes |
| Built in BI | Yes | Yes |
| Can work with Big Data | Yes | No |
| Has data reports | Yes | Yes |
| Many Column Data Capability | Yes | Yes |
