DataRobot is an automated machine learning platform positioned to be accessible to technology professionals across a wide range of experience and ability.
While it offers the features that experienced data scientists and analysts expect, it is also approachable for people in other technology specialties, and usable enough that business executives and others without a purely technical background can adopt AI in their enterprises with relative ease.
The DataRobot platform includes two independent but fully integrable products: Automated Machine Learning and Automated Time Series. This review covers the Automated Machine Learning product.
DataRobot’s Automated Machine Learning allows the creation of advanced regression and classification models, ranging from simple linear models to gradient boosting and neural networks. It also comes with many visualization tools that help you better understand your data and the performance of your chosen models.
For marketing data analytics operations, the data exploration features can give you a much richer and more detailed understanding of a customer dataset. They can help identify which characteristics are most strongly correlated with purchasing behavior.
With this information, you can target your marketing campaigns more closely at your ideal prospects.
Here you can see a useful visualization of which characteristics or features have the greatest impact.
As some of the visualizations below demonstrate, it is easy to identify interactions between customer features and to see how much they affect overall customer behavior within your dataset. This information can be applied to future campaigns to maximize your ROI.
When working with direct marketing datasets, it is quite normal for a considerable number of records to have missing values in some key categories. DataRobot can detect these gaps and fill many of them automatically as part of its data preparation, which includes operations such as one-hot encoding, missing value imputation, text mining, standardization, and data partitioning.
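To make three of the operations named above concrete, here is a minimal sketch of median imputation, standardization, and one-hot encoding in plain Python. It is not DataRobot's implementation: the function names, the choice of the median as the fill value, the "missing" category label, and the sample records are all assumptions made for illustration.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(values, missing_label="missing"):
    """Encode a categorical column as 0/1 indicator columns, one per category.

    Missing entries get their own category instead of being dropped.
    """
    filled = [missing_label if v is None else v for v in values]
    categories = sorted(set(filled))
    rows = [[1 if v == c else 0 for c in categories] for v in filled]
    return categories, rows


# Hypothetical customer records; None marks a missing value.
ages = [34, None, 52, 41, None, 29]
regions = ["north", "south", None, "north", "east", "south"]

ages_filled = impute_median(ages)
ages_scaled = standardize(ages_filled)
region_columns, region_encoded = one_hot(regions)
```

Treating "missing" as its own category is one common choice for direct marketing data, because the fact that a field was left blank can itself carry signal.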
DataRobot is one of the first automated machine learning tools with a powerful modeling engine. It draws on a number of open source machine learning libraries for R and Python, including scikit-learn, H2O, TensorFlow, Vowpal Wabbit, Spark ML, and XGBoost, and applies the same techniques that data scientists use, such as boosting, bagging, random forests, kernel-based methods, generalized linear models (GLMs), and many others.
DataRobot provides a leaderboard for comparing models. You can drill down into each model to see which features it used, how much data it was trained on, and its overall accuracy scores. DataRobot flags which models are the most accurate and which are best suited for deployment, and you can choose the model that best fits your use case. For example, if resources are limited and processing speed matters, the most accurate model may not be the best choice if it is slow. You can also select more than one model.
To help you understand which model is better, DataRobot provides lift charts for model comparison (this can be particularly useful for marketing datasets).
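The accuracy versus speed trade-off described above can be sketched as a simple selection rule: keep only the models that meet a speed budget, then take the most accurate of those. This is an illustration only and does not use DataRobot's API. The `LeaderboardEntry` type, the `pick_model` helper, the model names, and every number are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class LeaderboardEntry(NamedTuple):
    name: str
    logloss: float            # validation score, lower is better
    ms_per_1000_rows: float   # prediction time, lower is better


def pick_model(
    entries: Sequence[LeaderboardEntry], max_ms_per_1000_rows: float
) -> Optional[LeaderboardEntry]:
    """Return the lowest-logloss entry that fits the speed budget, or None."""
    fast_enough = [e for e in entries if e.ms_per_1000_rows <= max_ms_per_1000_rows]
    return min(fast_enough, key=lambda e: e.logloss, default=None)


# Made-up leaderboard, for illustration only.
leaderboard = [
    LeaderboardEntry("Blended ensemble", 0.412, 950.0),
    LeaderboardEntry("Gradient boosted trees", 0.418, 210.0),
    LeaderboardEntry("Regularized logistic regression", 0.455, 15.0),
]

most_accurate = min(leaderboard, key=lambda e: e.logloss)
chosen = pick_model(leaderboard, max_ms_per_1000_rows=300.0)
```

With these invented numbers, the ensemble is the most accurate entry, but the gradient boosted model is the one that survives a 300 ms budget at only a small cost in logloss.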
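For readers unfamiliar with lift, the sketch below computes a binned lift table of the kind commonly used in direct marketing: records are ranked by predicted score, split into equal-sized bins, and each bin's response rate is compared with the overall rate. The exact layout of DataRobot's own chart may differ. The `lift_table` function and the sample scores and outcomes are assumptions made for illustration.

```python
def lift_table(scores, outcomes, n_bins=5):
    """Return (mean score, response rate, lift) per bin, best-scored bin first.

    Lift is the bin's response rate divided by the overall response rate,
    so a lift of 2.0 means the bin responds twice as often as average.
    """
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    if n_bins < 1 or len(scores) < n_bins:
        raise ValueError("need at least one record per bin")
    overall_rate = sum(outcomes) / len(outcomes)
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no responders")

    # Rank records from highest to lowest predicted score.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    size = len(ranked) // n_bins
    rows = []
    for b in range(n_bins):
        start = b * size
        # The last bin absorbs any remainder.
        end = start + size if b < n_bins - 1 else len(ranked)
        chunk = ranked[start:end]
        mean_score = sum(s for s, _ in chunk) / len(chunk)
        response_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_score, response_rate, response_rate / overall_rate))
    return rows


# Hypothetical predicted purchase probabilities and actual outcomes (1 = purchased).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

table = lift_table(scores, outcomes, n_bins=5)
for mean_score, rate, lift in table:
    print(f"mean score {mean_score:.2f}  response rate {rate:.2f}  lift {lift:.2f}")
```

A model that ranks prospects well shows high lift in the first bins and lift below 1 in the last ones. When comparing two models, the one with the steeper drop-off concentrates more responders at the top of the list.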
Models built in DataRobot can be used in production immediately. You can upload your data to be evaluated and use APIs to generate predictions, and even create a few lines of code to be embedded directly into your applications.
You can also monitor the performance of all deployed models from a central portal, and easily refresh or replace a model if another one achieves better scores.
For a beginner data scientist: 5.0
For an experienced data scientist: 3.0
While this tool is quite easy to understand, it struggled for processing power when working with big data.
Models are trained on CPUs, so capacity can be increased by adding workers. With a large dataset, an average job on four workers can take 4-5 hours. Our tests showed that it worked best when the data did not exceed 3 GB.
If you are working with imbalanced data, where classes are not represented equally (as is common with direct marketing data), it often makes sense to optimize for the F1 metric. Unfortunately, DataRobot does not provide this particular metric, although you have the option to use the LogLoss or KS (Kolmogorov-Smirnov) metrics instead.
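To show how these three metrics differ on imbalanced data, here is a minimal sketch that computes F1, LogLoss, and the KS statistic from scratch. It assumes you have predicted probabilities and true labels available outside the platform. The function names, the 0.5 decision threshold, and the sample data are illustrative assumptions, not DataRobot code.

```python
import math


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood of the true labels; lower is better."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def ks_statistic(y_true, probs):
    """Largest gap between the score CDFs of the negative and positive classes."""
    pos = [p for t, p in zip(y_true, probs) if t == 1]
    neg = [p for t, p in zip(y_true, probs) if t == 0]
    best = 0.0
    for threshold in sorted(set(probs)):
        cdf_pos = sum(1 for p in pos if p <= threshold) / len(pos)
        cdf_neg = sum(1 for p in neg if p <= threshold) / len(neg)
        best = max(best, abs(cdf_neg - cdf_pos))
    return best


# Hypothetical imbalanced sample: 2 buyers among 10 prospects.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.05, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.6, 0.4, 0.8]
y_pred = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 threshold

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
f1 = f1_score(y_true, y_pred)
ll = log_loss(y_true, probs)
ks = ks_statistic(y_true, probs)
```

In this sample the accuracy looks comfortable because most prospects are non-buyers, while F1 exposes that the model finds only half of the buyers and is wrong on half of its positive calls. LogLoss and KS are computed on the probabilities themselves, so they do not depend on the chosen threshold.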
For a beginner data scientist: 5.0
For an experienced data scientist: 5.0
DataRobot has a polished, easy-to-understand UI. It is very easy to upload data, define the target, explore the top models, and deploy the best one.
DataRobot provides a versatile and easy-to-understand machine learning platform that can prove useful for direct marketing operations. The interface is very approachable, earning it strong marks among the array of ML tools being created for general users. It has a wide range of features, many of which could be very helpful for direct marketers. It has some drawbacks in processing speed on big datasets, but for many smaller operations it may be an ideal tool.