Auger is a machine learning platform that lets users load data, test model performance, select and deploy models from a leaderboard, and serve real-time predictions of user behavior.
Gaining access to a demo copy of Auger was not difficult. They also offer a free tool which, while not the full package that includes creating new models, can be used at no cost to evaluate already trained models.
While we had previously run into problems with both Driverless AI and DataRobot not allowing their software to be used for embedded applications, Auger does provide this ability. In fact, they encourage this use case.
Auger does not provide the option of using GPU acceleration, which, as described in the previous article, can have a large impact on processing speed. As we mentioned previously, the difference can theoretically be as much as reducing several days of processing time to several hours. Given Auger’s licensing model, fast processing would have given it a boost; unfortunately, it does not have this option.
Auger is built on top of several open source libraries, which in principle makes it possible to understand which algorithm is better for a specific use case. The extent to which these libraries are used, however, was not clear; they were masked in the product itself.
Runs on our Infrastructure
Auger was able to run on our infrastructure (both AWS and Hetzner, for GPU processing). The staff at Auger were helpful in providing assistance with the configuration. However, at the time of this writing, it appears they have changed this business model and now require the use of their own infrastructure.
Auger offers some simple feature generation, such as creating new columns through addition, subtraction, multiplication, and squaring. However, it was not particularly advanced, so automatically creating various ML-driven events was not possible (e.g. identifying holidays, as we found with Driverless).
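To make the distinction concrete, here is a minimal sketch of the two kinds of feature generation discussed above: simple arithmetic columns, and a calendar-aware holiday flag. This is not Auger's or Driverless AI's API. The function names, column names, and the small hand-maintained holiday calendar are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical, hand-maintained holiday calendar (illustrative only).
HOLIDAYS = {date(2019, 1, 1), date(2019, 7, 4), date(2019, 12, 25)}


def arithmetic_features(row, a, b):
    """Return a copy of row with simple derived columns for columns a and b."""
    out = dict(row)
    out[f"{a}_plus_{b}"] = row[a] + row[b]
    out[f"{a}_minus_{b}"] = row[a] - row[b]
    out[f"{a}_times_{b}"] = row[a] * row[b]
    out[f"{a}_squared"] = row[a] ** 2
    return out


def holiday_feature(row, date_column, holidays=HOLIDAYS):
    """Return a copy of row with a boolean flag marking holiday dates."""
    out = dict(row)
    out[f"{date_column}_is_holiday"] = row[date_column] in holidays
    return out


# Made-up example row; the column names are assumptions.
row = {"visits": 4, "purchases": 3, "order_date": date(2019, 12, 25)}
features = holiday_feature(
    arithmetic_features(row, "visits", "purchases"), "order_date"
)
```

The arithmetic columns need nothing beyond the row itself, while the holiday flag depends on outside knowledge (a calendar), which is the kind of feature we could not get Auger to create automatically.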
Ability to Select and Deploy Model
Auger provides a leaderboard which is at least nominally helpful for identifying which models provide the most accurate results against tests. These models can then be chosen and deployed directly from the leaderboard.
However, the features for identifying model quality were not as detailed as those we found in Driverless. While it is possible to select the “best,” or most accurate, models, it was not easy to identify which models would run the fastest, so it was not possible to compare accuracy vs. time. This was an issue for the creation of Wyzoo, as we had the goal of being able to provide processing that works quickly, and in the best interests of direct models.
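The accuracy vs. time comparison we were missing can be sketched as follows, assuming a leaderboard could be exported with both an accuracy score and a prediction latency for each model. The model names and numbers are made up for illustration, and the field names `accuracy` and `latency_ms` are assumptions, not fields Auger exposes.

```python
def pareto_front(models):
    """Keep models that no other model beats on both accuracy and latency."""
    front = []
    for m in models:
        dominated = any(
            o["accuracy"] >= m["accuracy"]
            and o["latency_ms"] <= m["latency_ms"]
            and (o["accuracy"] > m["accuracy"] or o["latency_ms"] < m["latency_ms"])
            for o in models
        )
        if not dominated:
            front.append(m)
    # Fastest first, so the trade-off reads from quick to accurate.
    return sorted(front, key=lambda m: m["latency_ms"])


# Hypothetical leaderboard entries (illustrative values only).
leaderboard = [
    {"name": "model_a", "accuracy": 0.91, "latency_ms": 120},
    {"name": "model_b", "accuracy": 0.89, "latency_ms": 15},
    {"name": "model_c", "accuracy": 0.88, "latency_ms": 40},
    {"name": "model_d", "accuracy": 0.93, "latency_ms": 900},
]

candidates = [m["name"] for m in pareto_front(leaderboard)]
```

Here `model_c` drops out because `model_b` is both faster and more accurate, leaving only the models that represent a genuine trade-off between speed and accuracy.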
One of the drawbacks of Auger is that it does not contain any Business Intelligence features. Being able to visualize results and see patterns is a large part of data analysis. For instance, the ability to visualize model results using lift or gain charts would be extremely helpful in identifying any anomalies. The lack of this functionality made Auger less appealing than other options. Gaining any real business intelligence would require integration with a third-party tool.
Unfortunately, Auger proved unable to work with Big Data. While it is quite useful and effective at working with smaller datasets, such as those one could load from a spreadsheet, it was not particularly useful for our purposes, which require working with datasets of over 1.4 billion consumer characteristics.
While Auger provided some data reporting, it was somewhat simplistic. It could identify the type of each data column and the minimum and maximum values within a dataset, but not much beyond this.
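For readers unfamiliar with lift and gain charts, the numbers behind them are simple to compute. The sketch below is a generic illustration rather than anything provided by Auger: it ranks records by predicted score and reports, for the top fraction of records, the share of all positives captured (gain) and how much better that is than random selection (lift). The labels and scores are made up, and the function assumes at least one positive label.

```python
def gain_and_lift(y_true, scores, fraction):
    """Return (gain, lift) for the top `fraction` of records ranked by score."""
    positives_total = sum(y_true)
    if positives_total == 0:
        raise ValueError("need at least one positive label")
    # Rank records from highest to lowest predicted score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(fraction * len(ranked)))
    positives_top = sum(label for _, label in ranked[:n_top])
    gain = positives_top / positives_total   # share of positives captured
    lift = gain / (n_top / len(ranked))      # improvement over random targeting
    return gain, lift


# Made-up labels (1 = responded) and model scores, for illustration only.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

gain, lift = gain_and_lift(y_true, scores, fraction=0.3)
```

Plotting gain or lift across a range of fractions gives the chart itself; an unexpected dip or flat stretch in that curve is the kind of anomaly such a visualization makes easy to spot.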
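As a rough illustration of this level of reporting, the sketch below profiles a single column: its type, minimum, and maximum, as described above, plus two small additions (missing and distinct counts) of the sort we would have liked to see. The function name and output fields are assumptions for illustration, not Auger's report format.

```python
def profile_column(values):
    """Summarize one column; None is treated as a missing value."""
    present = [v for v in values if v is not None]
    kinds = {type(v).__name__ for v in present}
    single_type = len(kinds) == 1
    if single_type:
        kind = next(iter(kinds))
    else:
        kind = "mixed" if kinds else "empty"
    return {
        "type": kind,
        # Min and max are only meaningful when all values share one type.
        "min": min(present) if single_type else None,
        "max": max(present) if single_type else None,
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }


# Made-up column values, for illustration only.
report = profile_column([3, None, 7, 3])
```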
Many Column Data
The weakest aspect of Auger was its complete inability to handle many-column data. When attempting to process our datasets, the system would simply continue to run without reporting any errors. While attempting to handle our data, Auger ran unresponsively for 24 hours with no status notifications. This in itself is highly problematic, since Auger charges per hour of usage. It eventually became unclear to us whether it was working or not, as it lacks any monitoring tools to let one know the progress of data processing.
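The kind of monitoring we were missing does not need to be elaborate. The sketch below is a hypothetical watchdog, not part of Auger: it records when a job last made progress, flags the job as stalled after a quiet period, and estimates the accrued cost under per-hour billing. The class name, the stall threshold, the hourly rate, and the assumption that partial hours are billed as full hours are all illustrative.

```python
import math


class JobMonitor:
    """Track progress and accrued cost of a long-running job (times in seconds)."""

    def __init__(self, started_at, stall_after_s, hourly_rate):
        self.started_at = started_at
        self.stall_after_s = stall_after_s
        self.hourly_rate = hourly_rate
        self.rows_done = 0
        self.last_progress_at = started_at

    def heartbeat(self, now, rows_done):
        """Record a status report; only an increase in rows counts as progress."""
        if rows_done > self.rows_done:
            self.rows_done = rows_done
            self.last_progress_at = now

    def is_stalled(self, now):
        return now - self.last_progress_at > self.stall_after_s

    def accrued_cost(self, now):
        # Assumption: partial hours are billed as full hours.
        hours = math.ceil((now - self.started_at) / 3600)
        return hours * self.hourly_rate


# Hypothetical job: stall threshold of 30 minutes, rate of 1.0 per hour.
monitor = JobMonitor(started_at=0, stall_after_s=1800, hourly_rate=1.0)
monitor.heartbeat(now=600, rows_done=1000)
monitor.heartbeat(now=1200, rows_done=1000)  # no new rows, so not progress

stalled_early = monitor.is_stalled(now=2000)
stalled_after_a_day = monitor.is_stalled(now=24 * 3600)
cost_after_a_day = monitor.accrued_cost(now=24 * 3600)
```

With even this much in place, a job that stops making progress would be flagged within the stall threshold instead of silently accruing charges for a full day.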