9.25.2013

Using machine learning to make sense of IT’s big data

Editor's note: This article is by Amit Paradkar, Senior Manager, Services Delivery Management & Analytics, and Arun Ayachitula, Senior Technical Staff Member, Service Management Research.

A business’s IT infrastructure is not something customers see. And more and more often, businesses don’t “see” it either, as they outsource the management of servers and services to third-party partners. This works well until a server outage disrupts the business’s operations and leaves it unable to serve those customers.

An airline recently reached out to IBM Support and Research for help with a problem that was generating 2,000 help tickets per month and affecting everything from pilots’ flight manifests to passenger boarding passes. The issue for the airline – as in many similar cases of crashing servers – is making sense of big data.

Understanding big data

When an outage happens, the IT team in charge usually works to restore the system as quickly as possible. Yet the same problem often creeps back because the root cause is never determined, and determining it requires understanding the data.

So, the first challenge as the outsourcing partner is to understand the client’s existing information. Some will be structured in databases, server logs and data warehouses. Some will come in the form of unstructured textual information, such as the human-written help tickets. Very few of these data sources are linked together.

Next, we map these data sources together for a holistic view of the IT infrastructure, and then use text analytics, predictive modeling and machine learning to uncover and solve client problems that couldn’t have been found – much less tackled – in the past. For example, to make sense of the help tickets, we use text analytics to understand what incident is being described, identify the server being referenced, and then find a link between the ticket information and the server logs.
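The ticket-to-log linking step described above can be illustrated with a minimal Python sketch. It is not the analytics pipeline used in this engagement: the hostname pattern, the ticket and log record layouts, and the one-hour matching window are all assumptions made for illustration, and a simple regular expression stands in for real text analytics.

```python
import re
from datetime import datetime, timedelta

# Hypothetical naming scheme such as "srv-app-07"; real hostnames will differ.
HOST_PATTERN = re.compile(r"\b[a-z]+(?:-[a-z]+)*-\d+\b")


def extract_hosts(ticket_text):
    """Return the distinct server names mentioned in free-form ticket text."""
    return sorted(set(HOST_PATTERN.findall(ticket_text.lower())))


def link_ticket_to_logs(ticket, log_entries, window=timedelta(hours=1)):
    """Return log entries for servers named in the ticket that were
    recorded within `window` of the time the ticket was opened."""
    hosts = set(extract_hosts(ticket["text"]))
    return [
        entry
        for entry in log_entries
        if entry["host"] in hosts
        and abs(entry["time"] - ticket["opened"]) <= window
    ]


# Made-up sample data, for illustration only.
ticket = {
    "opened": datetime(2013, 9, 1, 10, 15),
    "text": "Boarding pass printing failed; SRV-APP-07 seems to be down again.",
}
logs = [
    {"host": "srv-app-07", "time": datetime(2013, 9, 1, 10, 5), "message": "disk full"},
    {"host": "srv-app-07", "time": datetime(2013, 9, 1, 2, 0), "message": "backup finished"},
    {"host": "srv-db-01", "time": datetime(2013, 9, 1, 10, 10), "message": "slow query"},
]

for entry in link_ticket_to_logs(ticket, logs):
    print(entry["host"], entry["time"], entry["message"])
```

In this sketch a ticket is linked only to log entries from a server it names and that fall close in time to the ticket; a production system would need more robust entity extraction than a single pattern.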

Pinpointing the root cause of a problem requires not only understanding the failures that the system monitors, but also automatically analyzing the tickets that report those errors. The new insights from these analytics ultimately improve service operations and prevent outages that affect business transactions.

Back at mission control

To pinpoint the issue that was causing the airline’s outages, we performed an in-depth ticket analysis to identify the server problems, as well as the servers being referenced. We then cross-referenced this ticket information against the information stored in structured databases and server logs to find possible links.

Our machine analysis produced a list of possible issues, which one of our IT administrators used to decide how best to solve the problem. It would be nearly impossible for human experts to analyze all the data on their own and come to a conclusion. This automated method is very similar to how Watson is being used in healthcare – Watson analyzes all of the data and provides a possible diagnosis, yet it’s the physician who makes the final call. In our case, IT managers are making decisions about servers.

By analyzing the big data within the airline’s IT environment, we identified 20 servers that were driving 40 percent of the workload. By focusing on this small set of servers, we were able to help the airline reduce the number of tickets being issued per month from 2,000 to 800, a 60 percent reduction.
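Finding a small set of servers that accounts for a large share of the workload is a Pareto-style analysis. The Python sketch below shows one simple way to do it, assuming each ticket has already been attributed to a single server; the function name, the input format and the sample data are hypothetical and are not taken from the airline engagement.

```python
from collections import Counter


def top_servers_by_share(ticket_hosts, target_share):
    """Pick the busiest servers, in descending order of ticket count,
    until they cover at least `target_share` of all tickets.

    Returns (selected_servers, share_actually_covered)."""
    counts = Counter(ticket_hosts)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0

    selected, covered = [], 0
    for host, n in counts.most_common():
        if covered >= target_share * total:
            break
        selected.append(host)
        covered += n
    return selected, covered / total


# Made-up data: one list entry per ticket, naming the server it refers to.
tickets = ["a"] * 4 + ["b"] * 3 + ["c"] * 2 + ["d"]
servers, share = top_servers_by_share(tickets, 0.5)
print(servers, share)
```

With real data, the same loop run with a 40 percent target would return the shortest list of servers that together account for at least that share of tickets.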

The future of big data and strategic outsourcing

Currently, most big data goes unprocessed, but as we develop new technologies to analyze that data, companies can gain even more valuable insights for managing their IT operations. By solving problems at the root of their IT infrastructure, they can drive efficiencies, cut costs and ultimately improve customer experience.
