Dremio is particularly useful for how it captures your data lineage. Data lineage is the record of the life cycle of data, including its origin, loading, aggregation, and transformation. The vision of Dremio is to make data engineers more productive while making data consumers more self-sufficient.
As a direct marketer, you will find many reasons to use Dremio. But there are three features which are notable:
The following image shows what Dremio is capable of doing.
Dremio is a self-service data ingestion tool. Although data extraction is a basic feature of any DAAS tool, most DAAS tools require custom scripts for each data source. Dremio takes a different approach: it creates a central data catalog for all the data sources you connect to it. With that, anyone can access and explore any data at any time, regardless of structure, volume, or location. No matter how you store your data, Dremio makes it work like a standard relational database. Furthermore, you don’t have to build data pipelines when a new data source comes online. Dremio gives you instant access.
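The catalog idea can be sketched in a few lines of Python. This is only an illustration of one access layer over differently stored data, not Dremio's API: the `Catalog` class, the loader helpers, and the `leads` and `orders` datasets are all hypothetical names invented for this example.

```python
import csv
import io
import json

# Hypothetical catalog: maps a dataset name to a function that yields rows as dicts.
# It mimics the idea of a single access layer; it is not Dremio's API.


class Catalog:
    def __init__(self):
        self._loaders = {}

    def register(self, name, loader):
        """Record how to read a dataset under a single, shared name."""
        self._loaders[name] = loader

    def scan(self, name):
        """Return the dataset as a list of row dicts, whatever its storage format."""
        return list(self._loaders[name]())


def csv_loader(text):
    """Build a loader for CSV text; values come back as strings."""
    return lambda: csv.DictReader(io.StringIO(text))


def json_loader(text):
    """Build a loader for a JSON array of objects."""
    return lambda: json.loads(text)


catalog = Catalog()
catalog.register("leads", csv_loader("name,city\nAna,Lisbon\nBen,Austin\n"))
catalog.register("orders", json_loader('[{"name": "Ana", "total": 42}]'))

# Both datasets are read the same way, even though one is CSV and one is JSON.
leads = catalog.scan("leads")
orders = catalog.scan("orders")
print(leads[0]["city"], orders[0]["total"])
```

The consumer calls `scan` with a name and never needs to know which format sits behind it, which is the property the paragraph above attributes to Dremio's catalog.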
The Data Reflection feature of Dremio makes it one of the fastest data processing systems. The system automatically accelerates data and queries by up to 1000x, leveraging the full power of relational algebra. Dremio has a vertically integrated query engine that automatically generates query plans to make the best use of Data Reflections. Another important feature of Dremio is its native push-downs, which optimize queries for each data source. In other words, queries are optimized independently for Amazon S3, HDFS, NoSQL, RDBMS, and ADLS sources. Last but not least, Dremio uses Apache Arrow and Apache Parquet for high-performance columnar storage and execution, as opposed to conventional row-based databases. In simple terms, this means lightning-fast performance on very large data sets.
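To make the row-versus-column distinction concrete, here is a minimal pure-Python sketch. It does not use Apache Arrow or Parquet and says nothing about how Dremio is implemented; the sample records and helper names are invented for illustration. It only shows why an aggregate over one field touches less data when values are stored column by column.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# All names and data are hypothetical; this is not Dremio, Arrow, or Parquet code.

# Row layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "customer": "ana", "amount": 120.0},
    {"order_id": 2, "customer": "ben", "amount": 80.0},
    {"order_id": 3, "customer": "ana", "amount": 50.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def total_from_rows(records, field):
    """Sum one field; every full record is visited to reach it."""
    return sum(record[field] for record in records)


def total_from_columns(columns, field):
    """Sum one field; only that column's list is read."""
    return sum(columns[field])


columns = to_columns(rows)
row_total = total_from_rows(rows, "amount")
column_total = total_from_columns(columns, "amount")
print(row_total, column_total)
```

Both functions return the same total, but the columnar version never looks at `order_id` or `customer`. On wide tables with many rows, skipping the unused fields is where analytical engines tend to save most of their work.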
Dremio facilitates automatic scaling from one server to thousands of servers in one cluster if needed. You can easily integrate new data sources as well within the cluster. Dremio can handle very large data sets and heavy workloads.
Data visualization is the easiest way to get meaningful insight from your data, because it makes data more human-readable. For example, the different types of graphs available in Dremio display data in a format that is easier to interpret. Dremio functions as the data visualization pipeline. With Dremio, you don’t have to manipulate data by writing complex SQL queries or code. It does the joining, filtering, and processing of data for you.
Dremio charts interpret data in a more human-readable form.
Dremio supports a long list of data sources. Most simply, you can upload a CSV file, Excel sheet, or delimited file from your local computer. After that, you can join it with other Dremio data sources before querying it or using any BI tool. Alternatively, Dremio supports many third-party data sources such as Amazon Redshift, Amazon S3, Amazon Elasticsearch, Azure Data Lake Store, Elasticsearch, HDFS, Hive, MapR-FS, Microsoft SQL Server, MongoDB, MySQL, NAS, Oracle, Postgres, and others.
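The upload-then-join workflow can be illustrated without Dremio itself. The sketch below uses Python's built-in `csv` and `sqlite3` modules as a stand-in: an in-memory SQLite database plays the role of an already connected data source, and a small CSV string plays the role of the uploaded file. The table names, columns, and data are assumptions made for this example; in Dremio, a comparable join would be written against the datasets in its catalog.

```python
import csv
import io
import sqlite3

# Hypothetical "uploaded" CSV file: campaign responses from a local computer.
uploaded_csv = io.StringIO(
    "customer_id,campaign\n"
    "1,spring_mailer\n"
    "2,spring_mailer\n"
    "3,fall_catalog\n"
)

# In-memory SQLite stands in for an already connected source (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "east"), (2, "west"), (3, "east")],
)

# Load the CSV rows into their own table so both sides can be queried with SQL.
conn.execute("CREATE TABLE responses (customer_id INTEGER, campaign TEXT)")
reader = csv.DictReader(uploaded_csv)
conn.executemany(
    "INSERT INTO responses VALUES (?, ?)",
    [(int(row["customer_id"]), row["campaign"]) for row in reader],
)

# Join the uploaded data with the existing source and count responses per region.
result = conn.execute(
    """
    SELECT c.region, COUNT(*) AS responses
    FROM responses AS r
    JOIN customers AS c ON c.customer_id = r.customer_id
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()
conn.close()
print(result)
```

Once the uploaded file is exposed as a table, the join is ordinary SQL, so the same query could be issued from a BI tool instead of a script.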
Data breaches are among the most common forms of cybercrime, and analytical systems are a natural target. The value of a high-security architecture for a product like Dremio therefore can’t be emphasized enough. Dremio has taken many steps to protect users from possible threats.
Authentication and authorization play the biggest role in any security architecture. For internal user authentication, Dremio manages user credentials with a FIPS 140-2 compliant cryptographic algorithm, and it supports secret and key rotation. Certificates can be updated with the Java keystore utility (`keytool`).
Dremio can be deployed on-premises or in a public cloud. There are three deployment patterns commonly used:
It is recommended to run Dremio on dedicated hardware, as this allows Dremio to use the local filesystem for persisting reflections. Other storage options exist as well: for AWS deployments, for example, S3 is supported for persisting reflections, which provides cost-effective reliability without sacrificing performance.
Note that your deployment plan should consider the following factors as well.
Diagram of Dremio Deployment in Azure VM
Support for a wide range of data sources, easy deployment, accelerated analysis, optimized queries, and advanced security are a few reasons to choose Dremio over other open source data ingestion tools. One Dremio success story comes from Hotmart, a digital marketplace for online courses. As its customer base reached 1 million, Hotmart started facing data access and performance challenges. Dremio resolved those challenges with its DAAS platform, which allows business users to search, curate, and share data from any source, and then analyze it using their favorite tools, all without depending on IT.
Dremio is a complicated product. It was primarily built for data engineers. But it has evolved over time. Despite its sophistication, it is quite intuitive to work with. It has done justice to its vision of being a self-sufficient data platform for everyday users. Its user interface contains self-guiding instructions so you can understand easily how to connect data sources, how to perform data aggregation and data transformation, how to optimize SQL queries, how to create virtual data sets and how to use BI tools. We give it 4 stars for its intuitive user experience.
The following image shows a screenshot of the intuitive Dremio dashboard.
Dremio has only about 100 contributors on GitHub who list this product as a repository in their profiles. This is a low number compared with other data-as-a-service solutions. You won’t find many Dremio-related questions on Stack Overflow either. Therefore, we give this product 3 out of 5 stars for its active support community rating.
You need an understanding of data engineering and data science to make the most of Dremio. For example, data aggregation, data transformation, and SQL optimization require some knowledge of these subjects. You should also be familiar with the data sources you are using. Deployment requires you to make some important decisions, so knowledge of clusters and clouds is necessary. Dremio’s getting started guide and documentation are helpful. Its dashboard is fairly easy to use and very intuitive. Considering all this, we give 4 stars for this rating.
We listed many of the data sources that Dremio supports earlier. Additionally, Dremio supports the Excel, CSV, and JSON formats. You can work with data science languages such as R and Python. Dremio connects analysts with their favorite BI tools such as Power BI, Tableau, and Qlik Sense. For example, joining data in Tableau is much easier with Dremio than with other data-as-a-service solutions. Dremio supports LDAP servers for security.
The following image shows some of the tools most often used with Dremio.
Dremio is an open source (meaning no licensing cost) self-service data access tool. It is among the best data lineage documentation and tracking tools. It supports all the major third-party data sources and has very fast analytical algorithms. Several deployment options are available, and intuitive dashboards make it easier to use. Dremio provides documentation good enough to get started. It has established a good reputation in the direct marketing community, among others, and it should remain a leader for some time.