IDG Contributor Network: In an age of fake news, is there really such a thing as fake data?


Deloitte Global that medium and large enterprises will increase their use of machine learning in 2018, doubling the number of implementations and pilot projects underway in 2017. And, according to Deloitte, by 2020, that number will likely double again.

Machine learning is clearly on the rise among companies of all sizes and in all industries and depends on data so they can learn. Training a machine learning model requires thousands or millions of data points, which need to be labeled and cleaned. Training data is what makes apps smart, teaching them life lessons, experiences, sights, and rules that help them know how to react to different situations. What a developer of an AI app is really trying to do is simulate the experiences and knowledge that take people lifetimes to accrue.

The challenge many companies face in developing AI solutions is acquiring all the needed training data to build smart algorithms. While companies maintain data internally across different databases and files, it would be impossible for a company to quickly possess the amount of data that is needed. Only tech savvy, forward-thinking organizations that began storing their data years ago could even begin to try.

As a result, a new business is emerging that essentially sells synthetic data—fake data, really—that mimics the characteristics of the real deal. Companies that tout the benefits of synthetic data claim that effective algorithms can be developed using only a fraction of pure data, with the rest being created synthetically. And they claim that it drastically reduces costs and save time. But does it deliver on these claims?