Q&A: Microsoft Cosmos DB creator lays out vision for planet-scale database


Microsoft has fired a shot heard around the globe, so to speak, in data management with the debut of Cosmos DB at the recent Microsoft Build 2017 developer conference in Seattle. The cloud database is positioned for elasticity and globally available data, supported on the Azure cloud. The project was founded in 2010 by Microsoft’s Dharma Shukla, who holds the title of distinguished engineer at the company.

InfoWorld Editor at Large Paul Krill spoke with Shukla during the conference to get his perspectives on the technology.

InfoWorld: Why is this project, which began more than six years ago, going to the public now?

Shukla: It’s a very complex system. The goal we had was to build a globally distributed database system, which makes the data automatically available wherever the users are so that you can be in any part of the world and you get really low latency, really fast performance when you access your data. You can scale throughput and storage elastically, pay only for what you need across the world. Cloud is perfect for it. When you write your app, your app is automatically available everywhere, wherever the datacenters are.

[We released DocumentDB] in 2015, which was a milestone along the journey. Since then, we’ve been adding more and more stuff, so now it’s culminated in Cosmos DB, the service that we released.

InfoWorld: With DocumentDB, those customers were moved over to Cosmos?

Shukla: Yes, they are automatically Cosmos DB customers.

InfoWorld: How is Cosmos DB an advance over DocumentDB?

Shukla: [...] I talk about a list of things at a very high level. But each one is probably worth 20, 30 pages of design documents. We stand on the shoulders of giants. Underneath us, we have millions of lines of C++ code, but underneath us is also a lot of Azure infrastructure that has helped us.

InfoWorld: Do you anticipate Cosmos DB ever running on other clouds? Would you ever let this run on the Amazon cloud or the Google cloud or is this specifically going to be Azure only?

Shukla: We want to be on Azure Stack, so that is one path. In terms of other cloud providers, one of the things we have done is that we have opened up all APIs. You can use Mongo APIs, and over time, we’re going to add other APIs. We add Gremlin APIs [which are in preview mode]. We support DocumentDB, SQL APIs, we support Azure table storage APIs. Over time we are going to support other APIs. You can use these APIs, and you don’t have to change your app. That’s the direction we have.

InfoWorld: How do you hope to improve Cosmos DB in the future? What capabilities do you plan to add to it that aren’t there now, and when might we see these?

Shukla: We are constantly evolving. We have many optimizations in the area of how we [work with] different ways of tiering data so you can, if you haven’t used your data for some period of time, you can archive it. If you want the freshest data, we prioritize those over the data that you’re not accessing recently—things like that are in the path forward. In terms of APIs we go by the [level of] interest.

InfoWorld: Did SQL Server provide any of the basis for Cosmos DB, or are they two different technologies?

Shukla: Two completely different technologies. The common thread between them is that they both use Azure components underneath. That’s a common layer, but they’re both different technologies.

InfoWorld: How has Cosmos DB worked out at different user sites that have had preview versions of it? [One company] was mentioned as an early user.

Shukla: They’ve been in production with this for, I think, close to a year.

InfoWorld: Is there anything else you want to say about Cosmos DB?

Shukla: I would say a couple of things worth noting. One is that it is the world’s first globally distributed database that supports multiple data models, and it’s extensible, so we’ll keep adding. It is the first globally distributed database that makes global distribution of data turnkey, meaning you can go with a single click of a button or an API call. There is no setup of machines or datacenters or replication topologies. It’s simple. The mission for us is to enable developers to write globally distributed apps easily because we think users would benefit from low latency around the world, high availability, consistency choices, scaling throughput worldwide, all of these things.

And you can count on it because it is backed by very comprehensive SLAs. It’s the first time since the arrival of cloud that a service has SLAs that are beyond high availability. Everyone has high availability SLAs, but this has low-latency SLAs, as well as consistency SLAs and throughput SLAs. This is something that a lot of engineering has gone into, so large companies or startups that care about [running] at that scale, like some of the names we have, can count on it.