Is big data worth the effort?
You can hardly pick up an IT magazine or attend a technology conference without hearing about big data. It has been one of THE big topics of 2012 and will likely remain a significant subject of interest in 2013.
Despite all the hype surrounding this topic, I’m going to suggest that the big data phenomenon is not only overhyped but also an evolution rather than a revolution.
According to research from Gartner, released in July 2012, “Big data and solid state appliances are two of the technologies at the ‘peak of inflated expectations’ on this year’s hype cycle.” That’s all good - after all, just about every technology goes through a hype cycle before it settles into a more realistic level of coverage. But the issues of big data are hardly new.
Research from IDC suggests that by the end of this year, the total amount of digital information in the world will come to 2.7 zettabytes and that 90% of it will be unstructured data like digital video, sound files and images. That’s a huge number, but should you care?
Back in the 1990s, I worked at a manufacturing company that had established a new business reporting system using data warehousing. Data was extracted from operational systems, processed, cleansed and delivered to the business in a form that tied what was being looked at to organisational, departmental and personal KPIs.
Since that time, three things have substantively changed and created the new discipline of big data.
- The volumes of data we deal with are far greater than ever before.
- The number of disparate data sources we pull from and correlate has increased.
- Unstructured data is now seen as valuable and no longer too hard to analyse.
If the only issue your business is dealing with today is a data volume issue, then I don’t think you’re dealing with big data. You’re just dealing with lots of data. Big data is about the intersection of volume and unstructured data from disparate sources.
The volume issue spawns some immediate issues that need to be managed. We need ever-increasing levels of processing power in order to run queries over the data in the time we have for processing. We need faster extraction tools to get the data out of the source repositories and we need more capacity to store all of this.
By and large, Moore’s Law has meant that systems performance has kept up with, and perhaps exceeded, the needs of many businesses managing large pools of data. That’s not to say that faster isn’t better or desirable. However, processing power and storage capacity are no longer constraints, although the cost of operating that gear may well become one as power prices continue to rise.
The unknown has been the number of different data sources that businesses wish to harvest. In particular, many companies want to carry out analytics on social media data. For example, they want to do things like measure the positive and negative sentiment on social media channels and correlate that with sales data.
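The correlation step itself is not exotic. As a minimal sketch (the sentiment scores and sales figures below are invented for illustration, and assume the hard extraction work has already been done):

```python
# Hypothetical sketch: correlating daily social-media sentiment with
# daily sales, assuming both series are already extracted and aligned.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative numbers only: average daily sentiment (-1 to 1) and
# unit sales for the same five days.
sentiment = [0.2, 0.5, 0.1, 0.8, 0.6]
sales = [100, 140, 90, 180, 150]

r = pearson(sentiment, sales)
print(f"sentiment/sales correlation: {r:.2f}")
```

The analytics are the easy part; extracting and aligning the social-media data in the first place is where the real effort lies.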
There are benefits in collecting and using this data so that businesses can ask the right questions. For example, one major sporting goods retailer used to rely on a manual ordering and distribution process. Once systems were automated and sales data could be properly analysed, it was found that one particular store never sold a single sporting item that was coloured pink. Without the data, they couldn’t answer the question of which product colours sold best in particular stores.
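That kind of question can be answered with quite ordinary tooling. A minimal sketch (store names, colours and figures are all invented for illustration):

```python
# Hypothetical sketch of the inventory question: which stocked
# colours never sold at each store?
from collections import defaultdict

# Colours each store keeps in stock (assumed catalogue data).
stocked = {
    "Northside": {"pink", "blue", "black"},
    "City": {"pink", "blue", "black"},
}

# Sales transactions: (store, colour, units sold).
sales = [
    ("Northside", "blue", 3),
    ("Northside", "black", 5),
    ("City", "pink", 2),
    ("City", "blue", 1),
]

# Record which colours actually sold at each store.
sold = defaultdict(set)
for store, colour, units in sales:
    if units > 0:
        sold[store].add(colour)

# Colours stocked but never sold, per store.
unsold = {store: colours - sold[store] for store, colours in stocked.items()}
print(unsold)
```

A simple aggregation over structured sales records surfaces the answer; nothing about the problem demands big data infrastructure.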
However, this is not a big data issue. This is a simple inventory management and reporting question. And it’s my view that many of the systems that businesses are considering are not really about big data but about smarter data management and reporting.
So, does your business need to integrate data from external sources or unstructured data?
If the answer is yes, then you’re in the big data game and that means a number of significant decisions. There’s no doubt that an effective data management strategy and plan of execution can be of value, but you do need to treat this just like any other investment. The tools and skills required to manage large volumes aren’t the same as those required when dealing with structured data from a single source, although they can potentially be developed from your existing talent pool.
Assuming you have the talent pool, then you’ll be potentially looking at new database tools and hardware to manage all of this. That’s likely to be a significant investment. So, the question becomes - is it worth it? Will the new capacity to understand your business and your market bring you new opportunities for business expansion and reducing expenditure or will it lead to paralysis by analysis?
Is big data worth the effort? Before you dive into a potentially expensive and disruptive project, ask yourself what you’re hoping to get out of it. Do you have specific goals or only general ambitions? If you need a project that is based on providing metrics that support corporate KPIs using multiple data sources, then the tools and techniques that can be collectively considered ‘big data’ are for you. But don’t leap that way just because that’s where this year’s hype is leading you.