The promise of big data is straightforward: having more information about a business can help you make better decisions about its future. No surprise that technology investments in the last decade have zeroed in on storing and visualizing data.
But it’s also left us with what I call the messy middle.
Think of it this way. You collect warehouses of data, in real time, and then ask an analyst to look for hidden gems of insight. But before they can even get started, they have to prepare the data – much like a chef who can’t start cooking until all the ingredients are cleaned, chopped, and ready to go.
Analysts typically have two choices when it comes to data preparation: do it themselves or ask their IT department. The former takes time and can leave useful data to grow stale. The latter requires budget and means competing with other IT priorities. And yet the speed at which new information arrives – what I call data velocity – keeps accelerating.
“Data is exploding faster than our ability to put our arms around it, so you’re going to have to adapt,” said a retired United States Army general at the Domopalooza 2016 conference. “The right answer on Monday is never going to be the right answer on Tuesday.”
The reality is that analysts spend far too much time on data preparation, often writing custom code scripts or spreadsheet macros to gather, clean, classify, augment, and merge information. That’s why I’m convinced the next round of big data investments will focus on automated, intelligent data-preparation tools. Tools that let analysts spend more time doing their real jobs. Tools that require clicks, not code.
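To make the grind concrete, here is a minimal sketch of the kind of hand-rolled script the article describes analysts writing today. The field names (`customer_id`, `amount`, `region`) are hypothetical, chosen only to illustrate the clean-then-merge pattern:

```python
# Illustrative sketch (hypothetical field names) of the manual data-prep
# scripts described above: gather two record sets, clean them, merge them,
# all before any real analysis can begin.

def clean(record):
    """Normalize one raw record: trim whitespace and standardize key casing."""
    return {k.strip().lower(): v.strip() for k, v in record.items()}

def merge_on(key, left_rows, right_rows):
    """Join two cleaned record sets on a shared key field."""
    right_index = {r[key]: r for r in map(clean, right_rows)}
    merged = []
    for row in map(clean, left_rows):
        match = right_index.get(row[key], {})
        merged.append({**match, **row})
    return merged

orders = [{" customer_id ": "c1 ", "amount": " 42.50"}]
accounts = [{"customer_id": "c1", "region": "US-East "}]
print(merge_on("customer_id", orders, accounts))
```

Every step here – trimming stray whitespace, lowercasing keys, building a join index – is boilerplate an analyst must write and maintain before asking a single business question, which is precisely the time sink automated tools aim to eliminate.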
Tools that clean up this very messy middle.
How Things Got So Muddled
We now generate data at a ‘round-the-clock pace that almost defies comprehension – the equivalent of 250,000 Libraries of Congress every day. This information has no standard form, arriving in everything from spreadsheets and memos to video and social media posts.
The research firm IDC framed the scope of this situation well: the amount of new data will double every year between 2013 and 2020, and the percentage considered useful, provided it can be tagged, will jump by more than two-thirds over that period. In other words, we’ll see more, and more relevant, information that can help a business – or hurt it.
Imagine, for example, you run a bank in the United States. You have multiple reporting requirements related to everything from money laundering to the value of your liquid assets. Yet every day, depositors move funds and bond prices fluctuate. Your compliance, or the lack thereof, depends on how quickly you can analyze this real-time data.
This dilemma plays out daily, from compliance-driven industries such as banking and insurance to everyday commerce. The best defense is a decision-making process that keeps your time-to-analysis as short as possible. Since data preparation is the most time-consuming part of the analytic process, it only makes sense to automate it.
What Intelligent Data Preparation Looks Like
One new company winning in the data preparation market – so much so that it became part of my portfolio – is Paxata. Its algorithms clean, combine, and enrich data. It also uses cognitive-computing routines to create relationships among data automatically. For instance, if the software encounters a field in a spreadsheet or report labeled “first name,” it infers a connection to another field labeled “last name.”
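As a toy illustration – not Paxata’s actual algorithm – relating “first name” and “last name” can be approximated with a simple column-name heuristic: fields that share a trailing token such as “name” are flagged as likely components of one entity.

```python
# Toy heuristic (purely illustrative, not any vendor's method): group columns
# that share their last word, so "first name" and "last name" are related.

def related_fields(columns):
    groups = {}
    for col in columns:
        token = col.lower().split()[-1]  # last word of the label, e.g. "name"
        groups.setdefault(token, []).append(col)
    # Keep only tokens shared by two or more columns.
    return {t: cols for t, cols in groups.items() if len(cols) > 1}

print(related_fields(["first name", "last name", "order total"]))
```

Real products layer far more signal on top of this – value patterns, statistical profiling, learned models – but the basic idea of inferring structure from metadata is what spares analysts from wiring fields together by hand.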
The result is less time cleaning data and more time solving real problems.
Consider a U.S. food manufacturer that wanted to reduce product spoilage. Paxata combined internal production and point-of-sale data from separate systems into a single, real-time view of its product – from initial order to checkout at a cash register. This approach not only eliminated spreadsheet workarounds, it freed time for analysts to identify previously unknown pinch points in the supply chain.
Paxata took a similar approach with a technology vendor whose analysts spent most of their time gathering and aggregating data, resulting in stale and inaccurate information. They now see a combined data flow from a half-dozen backend systems, and can collaborate in a secure, fully auditable, and up-to-date environment. Gone are the specialized code and spreadsheet macros that mashed this information together. And with more time to do their actual jobs, they discovered nearly $10 million in operational savings.
Analyst-Ready Data In Minutes, Not Months
Today, the process for moving from raw data to analyst-ready information is broken. Too many IT departments don’t understand, or can’t keep up with, the incoming information. Too often, analysts can’t use the available IT tools and have to cobble together workarounds.
I expect the need for real-time data analysis among knowledge workers to grow rapidly – and that the biggest obstacle to their success will be the time required to prepare information. Automated, intelligent tools will speed that process, and ultimately the pace of business decision-making itself.
And make things a lot less messy.