Evaluation: Auger.AI vs. H2O Machine Learning Tools

In the previous article, we looked at tools for the machine learning component of Wyzoo’s automated WyzPredict service, examining both DataRobot and H2O’s Driverless AI product. While both products had some strong capabilities, neither was licensable for embedding into commercial third-party solutions, so we wanted to see how they would compare with other available options. Given our goals for the customer experience with WyzPredict, we needed to be able to combine data from multiple sources, identify ideal models for accurately reaching target markets, and provide solid reporting functionality without forcing users outside of the application.

To address this problem, we chose to look more closely at tools which either made use of open source ML libraries or had licensing options which would allow incorporation into our WyzPredict service.

Criteria we chose

To reiterate what was described in the previous article, we examined whether a product had a free trial (or was available for free as Open Source), and whether it was licensable as an embedded “engine”. We needed to determine whether it ran on our infrastructure, and whether it was capable of GPU acceleration. We looked for any feature generation capabilities and data reporting, whether it could handle big data, and, importantly, whether it could handle data with many columns. For front-end features, we examined whether the product provided a leaderboard with the ability to select and deploy models, and whether there were any built-in Business Intelligence features.

Tools Examined

In this article we will examine the following two tools and compare them against the ones evaluated in the previous article:

  • Auger

  • H2O AutoML library

Auger

Auger is a machine learning platform which allows users to load data, test models for their performance, select and deploy these models from a leaderboard, and obtain real-time predictions of user behavior.

Demo

Gaining access to a demo copy of Auger was not difficult. They also offer a free tool which, while not the full package (it does not cover creating new models), can be used at no cost to evaluate already trained models.

Private Label

While we had previously run into problems with both Driverless AI and DataRobot not allowing the software to be used for embedded applications, Auger does provide this ability. In fact, they encourage this use case.

GPU Acceleration

Auger does not provide the option of using GPU acceleration, which, as described in the previous article, can have a large impact on processing speed. As we mentioned previously, the difference can theoretically be as much as a reduction from several days of processing time to several hours. Given Auger’s licensing model, fast processing would have given it a boost; unfortunately, it does not have this option.

Open Source

Auger is built on top of several open source libraries, which make it possible to understand which algorithm is better for a specific use case. The extent to which these libraries are used, however, was not clear; they were masked in the product itself.

Runs on our Infrastructure

Auger had the ability to run on our infrastructure (both AWS and Hetzner, for GPU processing). The staff at Auger were helpful in providing assistance with the configuration. However, at the time of this writing, it appears they have changed their business model and now require the use of their own infrastructure.

Feature Generation

Auger offers some simple feature generation, such as generating new columns by addition, subtraction, multiplication, or squaring. However, it was not particularly advanced: automatically deriving ML-driven features (for example, identifying holidays, as we found with Driverless) was not possible.
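
To make the kind of simple arithmetic feature generation described above concrete, here is a minimal sketch in plain Python. It is not Auger’s implementation; the function name, the column names, and the naming scheme for the new columns are all assumptions chosen for illustration.

```python
from itertools import combinations


def generate_arithmetic_features(rows, columns):
    """Return copies of the rows with sum, difference, product, and square columns added.

    `rows` is a list of dicts and `columns` lists the numeric columns to combine.
    Both the function and its naming scheme are hypothetical.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        # One squared column per input column.
        for col in columns:
            new_row[f"{col}_squared"] = row[col] ** 2
        # One sum, difference, and product column per pair of input columns.
        for a, b in combinations(columns, 2):
            new_row[f"{a}_plus_{b}"] = row[a] + row[b]
            new_row[f"{a}_minus_{b}"] = row[a] - row[b]
            new_row[f"{a}_times_{b}"] = row[a] * row[b]
        out.append(new_row)
    return out


# Hypothetical sample data.
rows = [{"age": 30, "income": 50}, {"age": 40, "income": 80}]
features = generate_arithmetic_features(rows, ["age", "income"])
```

Note that the number of pairwise columns grows quadratically with the number of input columns, which is one reason this approach becomes expensive on many-column data.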

Ability to Select and Deploy Model

Auger provides a leaderboard which is at least nominally helpful for identifying which models provide the most accurate results against tests. These models can then be chosen and deployed directly from the leaderboard.

However, the features for assessing model quality were not as detailed as those found in Driverless. While it is possible to select the “best,” or most accurate, models, it was not easy to identify which models would run the fastest, so it was not possible to compare accuracy vs. time. This was an issue for the creation of Wyzoo, as our goal was to provide processing that works quickly, and in the best interests of direct models.
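
The accuracy vs. time comparison we were looking for can be sketched as follows. This is an illustration only: the leaderboard entries, their field names, and the numbers are hypothetical, and neither Auger nor Driverless is assumed to expose data in this shape.

```python
def pareto_front(results):
    """Keep the models that no other model beats on both accuracy and speed."""
    front = []
    for m in results:
        dominated = any(
            o["accuracy"] >= m["accuracy"]
            and o["predict_secs"] <= m["predict_secs"]
            and (o["accuracy"] > m["accuracy"] or o["predict_secs"] < m["predict_secs"])
            for o in results
        )
        if not dominated:
            front.append(m)
    # Fastest first, so the trade-off reads from quick to accurate.
    return sorted(front, key=lambda m: m["predict_secs"])


def best_within_budget(results, max_secs):
    """Return the most accurate model that meets the time budget, or None."""
    candidates = [m for m in results if m["predict_secs"] <= max_secs]
    return max(candidates, key=lambda m: m["accuracy"]) if candidates else None


# Hypothetical leaderboard: accuracy (higher is better), prediction time in seconds.
leaderboard = [
    {"name": "model_a", "accuracy": 0.91, "predict_secs": 12.0},
    {"name": "model_b", "accuracy": 0.90, "predict_secs": 1.5},
    {"name": "model_c", "accuracy": 0.85, "predict_secs": 2.0},
    {"name": "model_d", "accuracy": 0.80, "predict_secs": 0.3},
]

front = pareto_front(leaderboard)
choice = best_within_budget(leaderboard, max_secs=2.0)
```

In this made-up example, model_c drops out because model_b is both faster and more accurate, and model_b is the pick for a two-second budget even though model_a tops the accuracy ranking.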

BI Module

One of the drawbacks of Auger is that it does not contain any Business Intelligence features. Being able to visualize results and see patterns is a large part of data analysis. For instance, the ability to visualize model results using lift or gain charts would be extremely helpful in identifying any anomalies. The lack of this functionality made Auger less appealing than other options. Gaining any real business intelligence would require integration with a third-party tool.
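
For readers unfamiliar with lift and gain charts, the numbers behind them can be computed in a few lines. The sketch below uses the common cumulative definition (records are ranked by model score, then split into equal bins); the function name and sample data are assumptions, and it does not reflect any particular product’s implementation.

```python
def gain_and_lift(scores, labels, bins=10):
    """Return one (population_share, cumulative_gain, cumulative_lift) tuple per bin.

    `scores` are model scores and `labels` are 0/1 outcomes for the same records.
    """
    total = len(labels)
    total_pos = sum(labels)
    if total_pos == 0 or bins < 1 or bins > total:
        raise ValueError("need at least one positive label and 1 <= bins <= record count")
    # Rank records from highest to lowest score, keeping only the outcome.
    ranked = [label for _, label in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    table = []
    for i in range(1, bins + 1):
        cutoff = round(i * total / bins)
        share = cutoff / total                    # fraction of the population contacted
        gain = sum(ranked[:cutoff]) / total_pos   # fraction of all positives captured
        table.append((share, gain, gain / share)) # lift = gain relative to random targeting
    return table


# Hypothetical scores and outcomes for ten records.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
table = gain_and_lift(scores, labels, bins=5)
```

Plotting cumulative gain against population share gives the gain chart, and plotting lift against population share gives the lift chart. A lift that does not fall toward 1.0 as the share grows is the kind of anomaly such a chart makes visible.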

Big Data

Auger unfortunately proved unable to work with Big Data. While it is quite useful and effective with smaller datasets, such as those one could load from a spreadsheet, it was not particularly useful for our purposes, which require working with datasets of over 1.4 billion consumer characteristics.

Data Reports

While Auger provided some data reporting, it was somewhat simplistic. It could identify the type of each data column and the minimum and maximum values within a dataset, but not much beyond this.
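
A report at this level of detail (column type plus minimum and maximum) is easy to reproduce, which illustrates how basic it is. The following is a minimal sketch, not Auger’s code; the function names, type labels, and sample rows are assumptions.

```python
def infer_type(values):
    """Classify a column's non-missing values as boolean, numeric, or text."""
    if all(isinstance(v, bool) for v in values):
        return "boolean"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numeric"
    return "text"


def column_report(rows):
    """Summarize each column: inferred type, plus min and max for numeric columns."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            if value is not None:  # missing values are skipped
                columns.setdefault(name, []).append(value)
    report = {}
    for name, values in columns.items():
        kind = infer_type(values)
        numeric = kind == "numeric"
        report[name] = {
            "type": kind,
            "min": min(values) if numeric else None,
            "max": max(values) if numeric else None,
        }
    return report


# Hypothetical consumer records.
rows = [
    {"age": 34, "state": "NY", "spend": 120.5},
    {"age": 51, "state": "CA", "spend": 80.0},
    {"age": 27, "state": "TX", "spend": None},
]
report = column_report(rows)
```

Columns whose values are all missing do not appear in this sketch’s output; a fuller report would also cover missing-value counts, distributions, and correlations.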

Many Column Data

The weakest aspect of Auger was its complete inability to handle many-column data. When attempting to process our datasets, the system would simply continue to run without reporting any errors. In one attempt, Auger ran unresponsively for 24 hours with no status notifications. This in itself is highly problematic, since Auger charges per hour of usage. It eventually became unclear to us whether it was working or not, as it lacks any monitoring tools to show the progress of data processing.

H2O AutoML

AutoML is an Open Source library provided by H2O.ai. It’s designed for automatically building large arrays of models, and provides the ability to identify which model has the best performance without any prior knowledge of the datasets.

It is designed to operate with a minimal set of parameters, and in principle allows a user to point to a dataset, designate the output/response column, and specify limits on how much time or resources should be used, or how many models should be trained.
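
The workflow just described (point to a dataset, name the response column, set limits) might look roughly like the sketch below, using the H2O Python package. The helper names `automl_limits` and `train_best_model` are our own, and requiring at least one limit is a choice made for this sketch rather than a rule of the library. The `h2o` import is deferred so that the limits helper can be used on a machine where the package is not installed.

```python
def automl_limits(max_runtime_secs=None, max_models=None, seed=1):
    """Collect stopping limits as keyword arguments for H2OAutoML (hypothetical helper)."""
    if max_runtime_secs is None and max_models is None:
        raise ValueError("set a time limit, a model-count limit, or both")
    limits = {"seed": seed}
    if max_runtime_secs is not None:
        limits["max_runtime_secs"] = max_runtime_secs
    if max_models is not None:
        limits["max_models"] = max_models
    return limits


def train_best_model(csv_path, response, **limit_args):
    """Run AutoML on a CSV file and return the best model found (hypothetical helper)."""
    # Imported here so the function above works without h2o installed.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # start or connect to a local H2O cluster
    frame = h2o.import_file(csv_path)
    # Every column except the response is used as a predictor.
    predictors = [c for c in frame.columns if c != response]
    aml = H2OAutoML(**automl_limits(**limit_args))
    aml.train(x=predictors, y=response, training_frame=frame)
    return aml.leader


# Example limits: stop after ten minutes or twenty models, whichever comes first.
limits = automl_limits(max_runtime_secs=600, max_models=20)
```

A call such as `train_best_model("customers.csv", "responded", max_runtime_secs=600)` would then train models for up to ten minutes; the file and column names here are placeholders. For a classification target stored as 0/1, the response column would typically need to be converted with `asfactor()` before training.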

In some ways, AutoML does not compare as easily to the other ML tools, as it does not boast the same rich set of resources that the others do. However, we are including it here because it provided the functionality which we ended up using in Wyzoo. The libraries available here are quite powerful and adaptable to our needs.

That said, we will attempt to cover the basic aspects of AutoML as they apply to our framework.

Demo

AutoML is a free library, usable as open source under the Apache 2.0 license.

Private Label

Under the Apache license, we are able to use this software for any purpose. 

GPU Acceleration

This particular library does have the possibility for fast processing using the Graphics Processing Unit if configured properly.

Runs on our Infrastructure

We were able to use these open libraries easily within our own infrastructure.

Feature Generation

AutoML does not provide any feature generation abilities. While feature generation is part of H2O’s Driverless product (which is the paid version), this free library does not provide this functionality.

Ability to Select and Deploy Model

Model selection and deployment is not a feature that comes with H2O AutoML. It is essentially a library which enables measurement of different AI models, and contains no graphical leaderboard or direct user interface.

BI Module

AutoML contains no Business Intelligence module; this is not a tool designed for business users. One needs to be an expert to work with it. It provides a powerful backend but has no front-end functionality.

Big Data

The AutoML library is capable of working with Big Data and performed admirably against our tests. 

Data Reports

One drawback is that AutoML contains no data reports; it needs third-party libraries to provide this information. In our case we used Pandas-profiling for data reporting.

[Figure: sample view of a Pandas-profiling data report]

Many Column Data

As mentioned before, the ability to read many-column data is crucial for our purposes, considering the nature of direct marketing data, which tends to include vast numbers of data points which can be accessed and can be used for analysis. Unlike Auger, the AutoML library had no trouble working with many-column data.

Summary

Evaluating ML tools that could perform the processes we needed proved to be somewhat difficult. No one tool had all the features that we wanted or needed. While Driverless AI provided a great deal of functionality that would have served us well, the inability to use it in embedded solutions was prohibitive for us; licensing would have been difficult to obtain.

Auger, while available for use, was disqualified quite simply because of its inability to handle complex multi-column data and its lack of any monitoring ability to explain why this was the case. DataRobot also had licensing issues, as well as problems with processing time due to its inability to use GPU acceleration. 

In conclusion, with H2O AutoML, we were able to create an internal feature generation process using human intelligence. Once this was added to these libraries, the result was as good as DataRobot and Driverless AI, and we found that it was best suited for our embedded solution purposes. For our clients seeking to deploy their own environment, we recommend DataRobot and Driverless AI, with a preference for Driverless AI for large consumer data marketing applications.

Appendix: Comparison Chart

 

| Main Requirements | Driverless | DataRobot | Auger | H2O AutoML |
|---|---|---|---|---|
| Has trial demo | Yes | Yes, but need to contact sales department | Yes | Yes |
| Can be licensed for embedding | No | No | Yes | Yes |
| GPU acceleration | Yes | No | No | Yes |
| Uses Open Source Libraries | Yes | Yes | Yes | Yes |
| Can be run on our infrastructure? | Yes | Yes, but requires additional discussion | Yes | Yes |
| Feature Generation | Yes | No | Yes, very simple | No |
| Model Selection/Deployment | No | Yes | Yes | No |
| Built in BI | Yes | Yes | No | No |
| Can work with Big Data | Yes | No | No | Yes |
| Has data reports | Yes | Yes | Yes | No |
| Many Column Data Capability | Yes | Yes | No | Yes |

 

 
