APIs have become the mechanism of choice for connecting internal and external services, applications, data, identities, and other digital assets. As a result, APIs now have the potential to serve as a similarly valuable mechanism for analytics. Equally important, APIs can provide a significantly easier-to-use alternative to the traditional, ad hoc approaches to data collection and data analysis that have slowed the process of converting information into the intelligence required by today’s data-driven organizations.
The alliance of APIs and analytics is a natural one, since both technologies are critical to streamlining operations and unlocking innovation. Typically, an organization will begin its digital transformation by embracing APIs to enhance the integration of systems and automation of processes. With several comprehensive turnkey API management solutions on the market, enterprise developers can get a system into production in weeks to months, building in integrations to easily fill in any gaps. From there, the team can continuously improve the implementation.
The next step in digital transformation is analytics as enterprises evolve toward becoming data-driven businesses. Among the technologies being employed to understand an organization’s dynamics and help with decision-making are sophisticated data aggregation, machine learning, data mining, and data visualization. Together, they enable enterprise teams to understand the dynamics of the business, detect patterns, and predict future developments. However, the challenges associated with collecting data and building custom analysis have hindered the adoption of analytics. And even when adopted, analytics is nowhere near having the transformational impact once predicted.
This article explores the challenges of embracing analytics using traditional approaches, examines how API management can address these challenges, and presents a solution blueprint for using API management to mine valuable data for analytics.
Roadblocks to analytics adoption
In implementing analytics, organizations face three critical challenges, each of which has the potential to delay or derail the project.
First, unlike with API management, there are no turnkey analytics solutions. Instead, the organization has to build a custom analytics solution by combining different analytics technologies, whether products or open source projects. This, in turn, requires the development team to write a significant amount of code to integrate the necessary technologies, as well as existing systems.
Second, the organization will need to employ data engineers (developers) and data scientists (architects) who have a deep understanding of statistics, machine learning, and systems. These professionals, who are in short supply, will need to decide what insights are useful, determine which key performance indicators (KPIs) to track, design a system to collect data, and get other groups in the organization to add data collection code. They will also have to write their own analysis logic, write more code to act on the outcomes of that analysis, and understand, from the first to the nth level, the repercussions of those observations.
Third, to collect data, organizations need to add instrumentation (sensors) across the organization in order to generate events that signal notable activities. Such a project requires coordination across multiple groups—ranging anywhere from 10 to 20 teams in large enterprises. Additionally, organizations may need to wait for the sensors to be shipped to them. As a result, the instrumentation process often is both expensive and time-consuming.
Despite the potential far-reaching impact of analytics, all of these roadblocks have limited the adoption of analytics to date.
The advantages of API-driven analytics
API management has the potential to enable the wider use of analytics due to two factors. First is the extensive adoption of API management solutions, which has been growing at more than 35 percent per year since 2016, driven by the demand from customers and partners to expose business activities as APIs to enable closer integration and easier automation. This API technology is backed by mature tools and a strong ecosystem.
Second is the strategic positioning of API management within all of the message flows of an organization. APIs are becoming the doorways through which all internal and external interactions of an enterprise flow. Even websites and other user interfaces rely on these APIs to carry out their back-end functions. It is easy to see how watching API traffic could enable teams to ascertain how the organization functions over time. As APIs become the mediators of all interactions, the API management solution can become a portal that shows how an organization works.
Therefore, rather than trying to build a general-purpose turnkey analytics solution, we should be thinking about making a turnkey API-driven analytics solution an integral part of API management tools. Such a solution is feasible for a couple of reasons.
To start, because API management sits at the crossroads of all communications inside and outside the organization, we can instrument the API management tools instead of the actual systems. This can be done once as part of the API management framework, which can be updated as needed. Then, by collecting messages that go through the APIs, we can get a full view of the organization. This centralized approach eliminates the need for an enterprise to coordinate 10 or 20 teams to add instrumentation to all of the systems. It also removes the challenge of managing the multiple data formats that result when each system is instrumented separately, as in traditional analytics.
Instead, since all data is collected through one logical layer with the API management system, the format of the data is known. This enables the development of a turnkey API-driven analytics solution that supports common use cases, such as fraud detection, customer journey tracking, and segment analysis, as out-of-the-box scenarios. A team of skilled data scientists (whether within a software vendor, systems integration firm, or enterprise development team) can invest in building complex analyses that cover most of the common use cases. The analyses for these scenarios can then be used by multiple organizations or by multiple groups within a large enterprise.
The next section describes a blueprint for a turnkey API-driven analytics solution that follows the approach outlined here.
A blueprint for API-driven analytics
In a turnkey API-driven analytics solution, we can instrument API management tools instead of instrumenting every system or subsystem across the whole enterprise. The data collected by instrumenting all API activities can provide enough information to analyze and get a rich understanding of the organization and its inner workings. Further, updating the analytics capabilities can be achieved by updating the API management software—one system managed by a single group, rather than involving multiple systems and teams in the organization.
The following picture shows a high-level blueprint of an API-driven analytics solution that is layered on top of API management.
In the approach illustrated here, data collected at the API layer would include information about the following:
- The request and response, including timestamps, headers, full message, message size, and request path URL
- The invocation, including IP address, username, and user agent
- Processing, including time started, time ended, outcome, errors, API name, hostname, and protocol
Just using the above information, the analytics system could build a detailed picture of which users are invoking which APIs, from where, and when. That view could be further analyzed to understand the customer journey, for instance understanding what activities led the customer to buy, and to understand the loads received by an API.
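As a rough illustration, the fields listed above could be captured in a single event record and rolled up into per-user, per-API counts. The following is a minimal Python sketch, not a reference implementation: the `ApiEvent` class, its field names, and the `invocations_by_user_and_api` helper are hypothetical and are not taken from any particular API management product.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ApiEvent:
    """One API invocation as seen at the API layer (hypothetical schema)."""
    api_name: str
    request_path: str
    username: str
    ip_address: str
    user_agent: str
    started_at: datetime
    ended_at: datetime
    outcome: str        # e.g. "success" or "error"
    message_size: int   # size of the request message in bytes

    @property
    def latency_ms(self) -> float:
        """Processing time derived from the start and end timestamps."""
        return (self.ended_at - self.started_at).total_seconds() * 1000.0


def invocations_by_user_and_api(events):
    """Count how often each user invoked each API."""
    return Counter((e.username, e.api_name) for e in events)


def make_event(username, api_name, path, minute):
    """Build a sample event; all values here are made up for illustration."""
    start = datetime(2024, 1, 1, 12, minute, tzinfo=timezone.utc)
    return ApiEvent(
        api_name=api_name,
        request_path=path,
        username=username,
        ip_address="203.0.113.7",
        user_agent="example-client/1.0",
        started_at=start,
        ended_at=start + timedelta(milliseconds=250),
        outcome="success",
        message_size=512,
    )


events = [
    make_event("alice", "orders", "/orders/1", 0),
    make_event("alice", "orders", "/orders/2", 1),
    make_event("bob", "catalog", "/catalog", 2),
]
counts = invocations_by_user_and_api(events)
```

Grouping the same records by IP address or by hour of the timestamp would give the "from where" and "when" views in the same way.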
However, the views listed above will be too technical for many users without one more level of mapping to business concepts. Following are some examples of such mappings:
- In addition to knowing how many requests are received, it would be useful to know the money flows related to each request.
- In addition to knowing just the API name, it would be useful to know which business unit the API belongs to and the average cost to serve a request.
- In addition to knowing the customer name, it would be useful to pull in customer demographics and slice and dice the data based on demographics.
In short, to deliver more business-level insights, the data collection layer has to go beyond the obvious and collect additional information. Let’s explore two techniques for accomplishing this.
The first technique is to annotate the API definition with information about what interesting data is available inside the message content. This enables the data collection layer to automatically extract such information and send it to the analytics system. Most messages use XML or JSON, and instructions to extract information can be provided as XPath or JSONPath expressions.
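To make the idea concrete, here is a small Python sketch of annotation-driven extraction. The annotation format (`ORDER_API_ANNOTATIONS`), the field names, and the sample messages are assumptions made for illustration. XML extraction uses the limited XPath subset supported by the standard library's `xml.etree.ElementTree`; for JSON, a simple dotted path stands in for a full path-expression engine, which the standard library does not provide.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical annotations attached to an API definition:
# analytics field name -> path into the message body.
ORDER_API_ANNOTATIONS = {
    "xml": {"customer": "./customer/name", "amount": "./payment/amount"},
    "json": {"customer": "customer.name", "amount": "payment.amount"},
}


def extract_from_xml(body: str, paths: dict) -> dict:
    """Extract annotated fields from an XML message (values are strings)."""
    root = ET.fromstring(body)
    return {field: root.findtext(path) for field, path in paths.items()}


def extract_from_json(body: str, paths: dict) -> dict:
    """Extract annotated fields from a JSON message using dotted paths.

    A dotted path is a simplified stand-in for a real path expression;
    missing keys yield None instead of raising.
    """
    doc = json.loads(body)
    result = {}
    for field, path in paths.items():
        node = doc
        for key in path.split("."):
            node = node.get(key) if isinstance(node, dict) else None
        result[field] = node
    return result


xml_body = (
    "<order><customer><name>Alice</name></customer>"
    "<payment><amount>42.50</amount></payment></order>"
)
json_body = '{"customer": {"name": "Alice"}, "payment": {"amount": 42.5}}'

xml_fields = extract_from_xml(xml_body, ORDER_API_ANNOTATIONS["xml"])
json_fields = extract_from_json(json_body, ORDER_API_ANNOTATIONS["json"])
```

Because the paths live in the API definition rather than in the extraction code, adding a new data point would only require a new annotation.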
The second technique is to annotate the API definitions with details about data sets that can be joined with collected data to enable further processing. For example, a data set might provide customer demographic data that can be joined against customer names or other information, such as the business unit the API belongs to and the average cost to serve a request.
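The join described here can be sketched in a few lines of Python. The reference data sets (`API_METADATA`, `CUSTOMER_DEMOGRAPHICS`), their keys, and all values below are invented for illustration; in practice they would come from whatever sources the API definition's annotations point to.

```python
# Hypothetical reference data sets named in the API definition's annotations.
API_METADATA = {
    "orders": {"business_unit": "Retail", "avg_cost_per_request": 0.002},
    "catalog": {"business_unit": "Marketing", "avg_cost_per_request": 0.001},
}
CUSTOMER_DEMOGRAPHICS = {
    "alice": {"region": "EMEA", "segment": "SMB"},
}


def enrich(event: dict) -> dict:
    """Join a collected event with business-level reference data.

    Lookups that find no match leave the event unchanged for that data set.
    """
    enriched = dict(event)  # copy so the collected event is not mutated
    enriched.update(API_METADATA.get(event["api_name"], {}))
    enriched.update(CUSTOMER_DEMOGRAPHICS.get(event["username"], {}))
    return enriched


raw_event = {"api_name": "orders", "username": "alice", "outcome": "success"}
enriched_event = enrich(raw_event)
```

Once events carry fields such as `business_unit` and `segment`, the slicing described earlier becomes an ordinary group-by over the enriched records.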
As mentioned earlier, all data is collected through one logical layer, so the format of the data is known. Therefore, a team of skilled data scientists could build complex analyses that cover most of the common use cases. For example:
- Detailed analysis of revenue and cost contribution by different business units, APIs, business activities, different customer segments, and geographies on an ongoing basis.
- Trend analysis and forecasting of incoming and outgoing money flows based on trends and historical data.
- Customer journey analysis that explores how the sales pipeline converts to customers and what activities have a higher likelihood of leading to conversions.
- Fraud detection based on overall activity as well as on individual customers whose behavior deviates from the norm.
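As one simplified example of the last item, deviation from normal behavior could be flagged with a z-score over a customer's historical request counts. This is only a sketch under stated assumptions: the `is_anomalous` function, the threshold of 3 standard deviations, and the sample numbers are all hypothetical, and a production fraud detector would likely use far richer models.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it lies more than `threshold` sample standard
    deviations from the mean of `history` (a basic z-score test)."""
    if len(history) < 2:
        return False  # not enough data to define "normal"
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu  # any change from a constant history is unusual
    return abs(current - mu) / sigma > threshold


# Hypothetical hourly request counts for one customer.
hourly_requests = [10, 12, 11, 9, 10, 11, 12, 10]

normal_hour = is_anomalous(hourly_requests, 11)
suspicious_hour = is_anomalous(hourly_requests, 40)
```

The same check could be applied to other per-customer measures collected at the API layer, such as message size or error rate.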
Implementing such solutions would enable companies to concentrate their resources, investing their time and knowledge in delivering the best offerings and experiences rather than having to rediscover the analyses and build them from scratch. Turnkey analytics won’t cover all use cases, but they can add readily recognized value from day one. With key use cases covered out of the box, teams can then build their own analytics apps on top of the collected data to handle edge cases. Finally, the APIs themselves can trigger actions with the support of the turnkey solution.
The proposed solution described here could be built on top of existing analytics solutions, such as MapReduce systems, machine learning frameworks, and stream processors. Rather than replacing those technologies, the solution would work with them to define data formats, provide turnkey data collection mechanisms, and deliver turnkey analytics apps that work from day one.
Challenges of API-driven analytics
The turnkey API-driven analytics approach presented in this article is not without its challenges.
The first challenge is adding annotations to API definitions that describe how to extract interesting information from messages as part of the API development experience. It is important to make this step as painless as possible. Achieving this may include providing tools that let developers explore the messages and select an area for extraction, and that even suggest important data points to extract.
The second challenge is implementing data extraction and data collection steps efficiently within the API gateways that would act as proxies between customers and service implementations. Since they are in the critical path of all API invocations, suboptimal implementations can drastically affect performance.
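One common way to keep data collection off the critical path is to hand events to a bounded in-memory queue and let a background worker forward them. The Python sketch below illustrates that pattern only; the `EventCollector` class, its parameters, and the choice to drop events when the queue is full are assumptions for illustration, not a description of how any specific gateway works.

```python
import queue
import threading


class EventCollector:
    """Buffers events in a bounded queue and forwards them in the background."""

    def __init__(self, sink, max_pending=10_000):
        self._queue = queue.Queue(maxsize=max_pending)
        self._sink = sink      # callable that ships one event to analytics
        self.dropped = 0       # events discarded because the queue was full
        self.failed = 0        # events the sink raised an error for
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, event) -> bool:
        """Called on the request path; never blocks.

        Note: the `dropped` counter is not synchronized, so it is only
        approximate when many threads call record() concurrently.
        """
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1  # prefer losing analytics data to slowing the API
            return False

    def _drain(self):
        """Background loop: forward queued events to the sink one at a time."""
        while True:
            event = self._queue.get()
            try:
                self._sink(event)
            except Exception:
                self.failed += 1  # a failing sink must not stop the worker
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until every queued event has been handed to the sink."""
        self._queue.join()


received = []
collector = EventCollector(sink=received.append)
for seq in range(3):
    collector.record({"api_name": "orders", "seq": seq})
collector.flush()
```

Dropping events under load trades completeness of the analytics data for predictable API latency; whether that trade-off is acceptable depends on the use case.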
The third challenge is identifying and implementing common analytics solutions that can be built on top of data collected from API calls. This includes figuring out the best algorithms, the best way to represent the data, and the best user experiences. This is a hard problem. However, compared to the status quo, where each organization or business unit has to figure out its own analytics, the proposed approach enables the development of reusable solutions for analytics scenarios.
APIs serve as a portal that shows how an organization works, providing information about the enterprise’s operations, interactions, and business unit details, among other insights. This presents an opportunity to instrument API management tools to collect data rather than instrumenting the entire enterprise.