Building cloud-native applications at scale requires choosing your stack carefully. One popular tool is Apache Cassandra, a NoSQL database designed to scale rapidly without affecting application performance. It’s an ideal platform for working with big data, with built-in map-reduce tools based on Hadoop, as well as its own query language. Originally developed at Facebook, it’s since been used at CERN, Netflix, and Uber.
Azure initially offered Cassandra support through DataStax’s offerings in the Azure Marketplace, as well as providing guidance for users who wanted to build and deploy their own Cassandra systems on Azure VMs. It’s now developing its own Cassandra implementation, Azure Managed Instance for Apache Cassandra.
Apache Cassandra on Azure
Cassandra is a distributed database, with nodes connected to one another via a gossip protocol. Nodes run on multiple machines, organized as a data center and deployed as rings of nodes. All nodes are peers, so if any one node is lost, the system can keep operating while a replacement starts. Rings can peer with other rings, too, allowing you to have on-premises systems work with cloud-hosted systems, or one region with others for global resilience. Nodes can be added to or removed from a ring as necessary, offering linear scaling. To double performance or capacity, all you need to do is double the number of nodes.
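A toy model makes the ring idea more concrete. The Python sketch below is an illustration only: it hashes keys with MD5 and gives each node a single position on the ring, whereas a real Cassandra cluster uses its own partitioner and virtual nodes. The node names and key names are made up for the example.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is used purely for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """A toy token ring: each node owns the keys up to and including its own token."""

    def __init__(self, nodes):
        self._tokens = sorted((token(name), name) for name in nodes)

    def owner(self, key: str) -> str:
        # Find the first node whose token is >= the key's token, wrapping around.
        idx = bisect.bisect_left(self._tokens, (token(key), ""))
        return self._tokens[idx % len(self._tokens)][1]


keys = [f"user-{i}" for i in range(1000)]
small = Ring([f"node-{i}" for i in range(3)])
large = Ring([f"node-{i}" for i in range(6)])

# Count the keys whose owner changes when the ring doubles in size.
moved = sum(1 for k in keys if small.owner(k) != large.owner(k))
print(f"{moved} of {len(keys)} keys changed owner after doubling the ring")
```

The point of the layout is that a key only ever moves to one of the newly added nodes; everything else stays where it was, which is what lets a ring grow without reshuffling all of its data.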
Azure Managed Instance for Apache Cassandra is perhaps best thought of as a way of extending on-premises data into Cosmos DB. There’s been demand for on-premises Cosmos DB since shortly after launch, but its deep integration with the Azure platform makes it hard for Microsoft to separate it. By offering integration between its Azure implementation and Cosmos DB, it’s now possible to set up an Azure-hosted Cassandra ring and peer it with on-premises rings and with Cosmos DB. You can now replicate data between on-premises systems and the cloud, taking advantage of Cosmos DB’s capabilities to run global-scale distributed applications while working with local Cassandra instances to handle regulated data operations in your own data center.
There are other advantages to using Managed Instances, as you can hand over much of the day-to-day operations of a Cassandra ring to Azure. It will automatically deliver upgrades and updates, handling patching so your database always runs the most secure version of the software. With less management overhead, you can concentrate on building applications rather than maintaining your stack.
Getting started with Managed Instances
There’s not much difference between setting up and running Azure’s Apache Cassandra service and any of its other managed open source databases. In the Azure portal, search for Managed Instance for Apache Cassandra to create a cluster.
Expect to pay at least $2.11 an hour per server, depending on where you are provisioning the service. P30 disks offer 1TB of storage per disk and cost at least $122.88 a month (with additional charges for mounts).
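To get a feel for what those floor prices add up to, here is a rough back-of-the-envelope calculation in Python. It assumes 730 billable hours in a month and a hypothetical three-node cluster with four P30 disks per node; neither figure comes from Azure’s pricing, and mount charges and regional variation are ignored.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def monthly_cost(nodes, disks_per_node,
                 server_rate_per_hour=2.11, disk_rate_per_month=122.88):
    """Estimate a minimum monthly bill from the per-server and per-disk floor prices."""
    compute = nodes * server_rate_per_hour * HOURS_PER_MONTH
    storage = nodes * disks_per_node * disk_rate_per_month
    return round(compute + storage, 2)


# Hypothetical cluster shape: 3 nodes, each with 4 P30 disks.
print(monthly_cost(nodes=3, disks_per_node=4))
```

For that example cluster the estimate comes to roughly $6,095 a month, and because both inputs are minimum prices, the real bill is likely to be higher.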
Running Cassandra in Azure won’t be cheap, but then it’s not for small applications. You’re going to be shifting a lot of data around your application even if you’re only using it as a gateway to Cosmos DB.
To build a hybrid cluster, start as before: create and deploy a Cassandra cluster in Azure, setting its name and connecting it to an Azure VNet. You will need to configure Cassandra for node-to-node encryption, so if your on-premises install isn’t using it, enable it. Export your encryption certificates and use the Azure CLI to install them in your Azure-hosted cluster. These will enable your two sites to communicate over encrypted gossip connections.
The VNet will need to connect to your local network, either over a dedicated ExpressRoute connection or using a site-to-site VPN. What you use will depend on how much data you intend to ship to Azure, although experimental clusters are likely to use a VPN to avoid the cost of setting up a dedicated multiprotocol label switching (MPLS) connection.
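Once both sites can see each other, a keyspace typically needs to be told to keep replicas in each data center before data flows between them. The Python sketch below only builds the CQL statement for that step, using Cassandra’s `NetworkTopologyStrategy`. The keyspace name, the data center names, and the replica counts are all hypothetical; the data center names in particular must match whatever your own cluster reports.

```python
def replication_cql(keyspace: str, replicas_per_dc: dict) -> str:
    """Build an ALTER KEYSPACE statement that places replicas in each named data center."""
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    options = ", ".join(
        f"'{dc}': {count}" for dc, count in replicas_per_dc.items()
    )
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Hypothetical names: one on-premises data center and one Azure-hosted data center.
statement = replication_cql("app_data", {"onprem-dc": 3, "azure-dc": 3})
print(statement)
```

You would run the resulting statement through `cqlsh` or a driver against the cluster; the helper itself never connects to anything.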
Once you use the Apache Spark Cassandra connector to link to your endpoints, you can use Spark and Databricks notebooks to run analytics on your Cassandra-hosted data.
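As a rough idea of what that looks like in a notebook, the Python sketch below collects the connector settings in one place and reads a table as a DataFrame. It assumes the Spark Cassandra connector library is already installed on the cluster and that a `spark` session exists, as it does in a Databricks notebook. The host address, credentials, keyspace, and table names are placeholders, not values from any real deployment.

```python
def cassandra_spark_conf(host, username, password, port=9042, ssl=True):
    """Collect Spark Cassandra connector settings for one cluster endpoint."""
    return {
        "spark.cassandra.connection.host": host,
        "spark.cassandra.connection.port": str(port),
        "spark.cassandra.connection.ssl.enabled": str(ssl).lower(),
        "spark.cassandra.auth.username": username,
        "spark.cassandra.auth.password": password,
    }


def apply_conf(spark, conf):
    """Apply the settings to an existing Spark session (for example, a notebook's `spark`)."""
    for key, value in conf.items():
        spark.conf.set(key, value)


def read_table(spark, keyspace, table):
    """Load a Cassandra table as a Spark DataFrame through the connector."""
    return (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(keyspace=keyspace, table=table)
        .load()
    )


# Placeholder endpoint and credentials; replace with your own cluster's values.
conf = cassandra_spark_conf("10.0.0.4", "cassandra", "example-password")
```

In a notebook you would call `apply_conf(spark, conf)` and then something like `read_table(spark, "app_data", "events")` to get a DataFrame you can query with the usual Spark APIs.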
It’s interesting to see how Microsoft’s commitment to hybrid cloud operations translates to working with data. By offering a managed route to running Cassandra, the company provides a natural bridge for NoSQL data between your on-premises tools and the cloud. It’s a two-way connection, enabling local processing of sensitive data while taking advantage of cloud scale for your applications (and eventually expanding into the global scale of Cosmos DB).
Cassandra’s own replication protocols provide the bridge, while Azure ensures that it’s up to date and secure. The result is an effective set of tools that solve many of the problems associated with linking cloud and data center, one that can take advantage of tools like Apache Spark to deliver that data to other Azure services that rely on big data.
Copyright © 2021 IDG Communications, Inc.