IDG Contributor Network: Bridging the developer and data scientist gap with cloud, notebooks and PixieDust


A wealth of information hides in the vast amount of data produced every day: roadside sensors measuring traffic volume, medical imaging for rapid diagnosis, and satellites circling overhead analyzing weather patterns. In nearly every industry, the cloud enables exponential growth by providing cheap, remote storage of data, access through a variety of devices, and elastic compute for data processing at scale. But how can we capture the full potential of this data?

Doing so requires closer collaboration between data scientists and developers. As data-driven intelligence becomes a more integral component of nearly every function, from inventory management to personalized customer marketing, these two roles increasingly need to work in tandem. Yet many teams still struggle to do so, because they continue to work with different tools and in separate languages.

Notebooks, for example, are powerful, cloud-ready tools, but they often require experience with programming languages that are popular among data scientists, like Python, for their strength in numerical analysis. Because of their Python base in particular, notebooks are often overlooked by developers, who typically prefer working in languages such as Java or Node.js.

However, notebooks offer tremendous potential to bridge the gap between developers and data scientists, and can bring benefits to both sides. Notebooks allow users to write and share code and rich text, all in one environment understood by both data scientists and developers. This lets both groups work on the same data sets simultaneously, instead of following the traditional process in which developers hand off raw data to data scientists, who translate it into languages like Python for analysis and then give findings and models back to developers, who must translate them yet again into their preferred language, such as Java or HTML.

PixieDust is an open source helper library for Jupyter notebooks that allows developers to explore data analysis models without having to learn or code in statistical languages. Fueled by the collaborative power of the cloud, PixieDust enables users to visualize data, build dashboards, and more efficiently share data findings within notebooks.

By using notebooks and PixieDust together, the data scientist and developer can work in their preferred language in the same notebook. This means a developer can obtain early insights into raw data at the same time a data scientist begins working with the same sets—allowing both sides to immediately view trends worth exploring, as well as communicate feedback around potential new features, without waiting for the typical translation to be completed first.