Harness Hadoop and Spark for user-friendly BI


Big data shouldn’t be an area for only academics, data scientists, and other specialists. In fact, it can’t be. If we want big data to benefit industry at large, it needs to be accessible by mainstream information workers. Big data technology must fit into the workflows, habits, skill sets, and requirements of business users across enterprises.

Datameer is a big data analytics application doing exactly that. Combining the user interface metaphors of a file browser and a spreadsheet, Datameer runs natively on open source big data technologies like Hadoop and Spark, while hiding their complexity and facilitating their use in enterprise IT environments and business user scenarios. 

In other words, Datameer creates an abstraction layer over open source big data technologies that integrates them into the stable of platforms and toolchains in use in enterprise business environments. Business users tap the power of big data analytics through a familiar spreadsheet workbook and formula interface, while also benefiting from enterprise-grade management, security, and governance controls.  

Before we dive into the details of the platform, we should note that Datameer supports the full data lifecycle, including data acquisition and import (sometimes referred to as “ingest”), data preparation, analysis, and visualization, as well as export to other systems, such as databases, file stores, and even other BI tools. 

technology provides four major algorithms that make it even easier to find the signal in the noise of big data: clustering, decision trees, column dependencies, and recommendations.

Models based on these algorithms manifest the same way other analytical assets within Datameer do: as sheets in a workbook. The sheets show all model data, along with predicted values, and the Flipside view will render a graphical representation of the model and its content.

By incorporating machine learning functionality in-situ, within the workbook user experience, Datameer provides machine learning capabilities without forcing users to have vastly specialized skills or endure abrupt user interface context switches.

This integration is further extended through the use of an optional Predictive Model Markup Language (PMML) plugin, provided by our partner, Zementis. The plugin allows scoring against machine learning models built in other tools (and published in PMML format) by exposing them within Datameer as additional spreadsheet functions.

A patent-pending execution framework

Datameer simplifies selection of execution frameworks through its patent-pending engine, which picks the best framework for users along each step in the analytics workflow. It takes full advantage of Apache Tez, Apache Spark, and Datameer’s own single-node, in-memory engine, freeing users from having to evaluate the best engine for any given analytics job or task.

Smart Execution provides a future-proof approach to big data analytics. By decoupling the design experience from processing on a particular execution engine, Datameer permits workbooks developed today to be functional against new execution frameworks tomorrow, as they emerge and take their place in the Smart Execution platform.

While open source big data technologies hold the keys to answering new business questions, they weren’t designed with business users in mind. Combining a spreadsheet workbook and formula interface with a cost-based query optimizer that picks the right engine for a particular set of tasks, Datameer turns Hadoop, Spark, and company into user-friendly BI tools for the business at large.

Datameer makes the big data aspiration a reality by harnessing the power of these platforms and working with them in their native capacities, not merely treating them as relational databases. At the same time, Datameer embeds these open source technologies into a business-user-oriented application, premised on familiar spreadsheet constructs, for working with data across its lifecycle and extracting relevant information from it.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to .