Connect-World, the information and communication technology (ICT) decision makers' magazine. We are the decision makers' forum for ICT driven development.
Connect-World's eLetter, September 2013 - 1st October 2013
Keeping up with big data, analytics, business intelligence and processing
An exabyte (EB) is an enormous number of bytes. The standard international prefix 'exa' stands for one followed by 18 zeros (1,000,000,000,000,000,000). An exabyte is equal to 1,000 petabytes or one billion gigabytes. Since we commonly count bytes using powers of 2 (2^2, 2^3, 2^4 ...), an exabyte is really 2^60 bytes, or 1,152,921,504,606,846,976 bytes.
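To make the arithmetic concrete, a few lines of Python reproduce the decimal and binary figures quoted above:

```python
# Quick check of the prefix arithmetic above: decimal vs binary exabyte.
decimal_exabyte = 10 ** 18          # 1,000,000,000,000,000,000 bytes
binary_exabyte = 2 ** 60            # 1,152,921,504,606,846,976 bytes

print(f"decimal EB: {decimal_exabyte:,} bytes")
print(f"binary  EB: {binary_exabyte:,} bytes")
print(f"1 EB = {decimal_exabyte // 10 ** 15:,} PB = {decimal_exabyte // 10 ** 9:,} GB")
print(f"binary is {(binary_exabyte / decimal_exabyte - 1) * 100:.1f}% larger")
```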
Every two days we create or store five exabytes of data on the Web; five exabytes is roughly equal to all the information produced by mankind since our beginning until 2003 - ten years ago. Now that is really big data.
For most, big data is a half-told, little understood, tale that pops up in magazines, news reports and disjointed conversations. Leaving out the algorithms, technology and such, big data is just the collection of enormous amounts of data generated by everything from the world’s social networks, to the Internet of things, emails, or the cutting edge science at the Large Hadron Collider.
Today the half-megabyte core memory of an IBM 360 or the five-megabyte hard drive on my first PC would be laughable. Multi-gigabyte memory and multi-terabyte hard drives are common and cheaper by the day - and so are the databases they house. Petabytes and exabytes of data are here or on the way.
Google probably has more than an exabyte of data stored and probably handles exabytes of data per day. Today’s social networks probably handle similar traffic loads. At corporations, cloud hosts, governments and the like petabytes of storage are becoming increasingly common. CERN alone accumulates 15 petabytes each year.
More is not always just a matter of quantity. Think about money, a lot of money, enough to take you from comfortable to obscenely rich. Got it? Quantity transforms quality - although not always for the better.
Big data is quantity transformed into quality. Well, not always, but that’s the goal.
Big data is about figuring out what all this data means; it’s about turning masses of unorganised (‘unstructured’ is the buzzword) data into information - preferably usable, valuable, information.
Analytics is the process of organising haystacks of data into needles of information. Big data analytics relies on computer algorithms - sets of rules designed to tease patterns related to matters of interest out of a great mass of unstructured data. Simple!
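Of course it is anything but simple in practice, but the basic idea can be sketched in a few lines. The toy example below (the records are invented) counts which word pairs co-occur most often across a handful of free-text records - pattern surfacing from unstructured data, in miniature:

```python
# A toy version of pattern-finding in unstructured text: count which
# word pairs co-occur most often across a handful of free-text records.
# The records are invented purely for illustration.
from collections import Counter
from itertools import combinations

records = [
    "customer reported late delivery and requested refund",
    "late delivery complaint escalated and refund issued",
    "delivery arrived on time customer satisfied",
    "refund requested after damaged late delivery",
]

pair_counts = Counter()
for text in records:
    words = sorted(set(text.split()))            # unique words per record
    pair_counts.update(combinations(words, 2))

# The most frequent pairs hint at a pattern worth a closer look
# (here: "late" + "delivery" + "refund").
for pair, count in pair_counts.most_common(5):
    print(pair, count)
```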
The next problem is figuring out what to do with the information once you get it; that’s still something of a hit and miss proposition. Big data generates some important information, some other information that is just tantalising and certainly some baffling toxic fallout.
The tools we have in hand have not yet developed into precision instruments. There aren’t many people who can build the algorithms, and not many who can put the information they ferret out to good use - and that is a problem.
Big data is a tool we are just learning to use and there aren’t that many people working at it. It will take years before there are enough thoroughly experienced analysts to meet the growing demand for big data’s products, and more time still to breed a generation of managers able to use this new tool well.
Part of the problem, even after the information has been sieved out, is to understand it in context. Visual analytics creates graphic representations of masses of information that many people find easier to work with and understand than text. Visual analytics is one of the best new tools, and business intelligence (BI) analysts, consumers and many others have been quick to adopt it.
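As a minimal illustration of the idea - with invented data - a few lines of Python can collapse thousands of records into a single picture:

```python
# A bare-bones example of visual analytics: collapse thousands of
# records into one chart. The data are invented for illustration.
import random
import matplotlib.pyplot as plt

regions = ["EMEA", "APAC", "Americas"]
orders = {r: [random.gauss(100, 20) for _ in range(1000)] for r in regions}

plt.boxplot(list(orders.values()))
plt.xticks([1, 2, 3], regions)
plt.ylabel("order value")
plt.title("3,000 orders summarised at a glance")
plt.show()
```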
Within the next five to ten years big data will generate many thousands, perhaps hundreds of thousands, of jobs throughout the world devoted to analysing the data, deciding if it is useful and then what to do with it. Big data is not a miracle; it doesn’t solve problems or make decisions. The human evaluation of the information generated is not likely to improve, nor is our decision-making ability. No, I’m not pessimistic; we have come a long way muddling along. With better tools we will undoubtedly muddle better.
What do we do with all the new information? Decision analytics should be part of the answer. It has been around for a long time, since the 1960s or so - long before big data was possible - yet despite much discussion of the theory, the tools, the graphics, utility functions, risks, influence diagrams, probabilities, uncertainty and such, it has never become a mainstream option.
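At its core, decision analysis weighs uncertain outcomes against one another. A minimal sketch, with probabilities and payoffs invented purely for illustration:

```python
# A minimal, hypothetical decision-analysis calculation: two options,
# each with uncertain outcomes, compared by expected value.
# All probabilities and payoffs are invented for illustration.
options = {
    "launch now":  [(0.6, 2.0), (0.4, -1.0)],   # (probability, payoff in $M)
    "wait a year": [(0.8, 1.0), (0.2, -0.2)],
}

def expected_value(outcomes):
    return sum(p * payoff for p, payoff in outcomes)

for name, outcomes in options.items():
    print(f"{name}: expected payoff {expected_value(outcomes):+.2f} $M")
```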
Yes, some big corporations do use decision analytics to help decide difficult questions in a wide variety of fields, but in years of consulting, I have never seen decision analytics used - just spoken about.
Decision makers are swamped with much more data than they can handle efficiently. Experience and intuition are often the only guides they have to separate meaningful information from the surrounding data noise and turn it into business intelligence. Big data and decision analytics should help better define the significant risks and, generally speaking, make better decisions. Still, statistics and large collections of past occurrences do not always help.
Regression analysis, for instance, is a powerful tool, but a regression does not guarantee its projection; wild cards pop up, random tsunamis of the market or the flap of butterfly wings disturb whatever universe we picture or disconnect past history from future fact. When making decisions, then, uncertainties have to be factored in; today that is still something of an evolving art.
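A small sketch makes the point: fit a trend to noisy, synthetic history and the forecast only makes sense with an uncertainty band attached (all figures below are made up):

```python
# Why a regression's projection needs an uncertainty band: fit a trend
# to noisy, synthetic history and report the spread of the residuals
# alongside the point forecast.
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(24)                                  # e.g. 24 months of history
y = 3.0 * x + 10 + rng.normal(0, 8, size=x.size)   # trend plus noise

slope, intercept = np.polyfit(x, y, 1)
residual_sd = np.std(y - (slope * x + intercept))

next_x = 30                                        # a point beyond the data
forecast = slope * next_x + intercept
print(f"forecast at x={next_x}: {forecast:.1f} +/- {2 * residual_sd:.1f} (rough 95% band)")
```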
Decision analysis, if it is to work at all, has to define a precise range of relevant variables for each decision. "Decision analytics is extremely interesting," a corporate planner told me about a year ago, "but even then, too difficult, expensive and slow for any but relatively simple cases."
Potential users, for the most part, seem to think decision analytics might make big data easier to use. I wonder if they have turned the problem around; perhaps big data might finally make decision analytics easier and better.
Throw a witches’ brew of unstructured data into a big data cauldron, bring it to a boil, and wait for the patterns, the correlations, to surface. The potentially wide range of unsuspected cause and effect correlations - even some of the spurious, but strong, correlations thus generated - might be a cost-effective way to ferret out the variables needed to base decision analytics upon.
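A rough sketch of that correlation-surfacing idea, using synthetic data: scan many columns for pairs that move together and flag the strong ones. Only one relationship is planted here, so most of the flagged pairs are chance - that is, spurious - correlations, which is exactly the caveat above:

```python
# Letting correlations "surface" from a mass of data: scan many columns
# for pairs that move together. Only one relationship is planted, so
# most flagged pairs are spurious.
import numpy as np

rng = np.random.default_rng(1)
n_rows, n_cols = 500, 40
data = rng.normal(size=(n_rows, n_cols))
data[:, 1] = 0.8 * data[:, 0] + rng.normal(0, 0.6, size=n_rows)  # the one real link

corr = np.corrcoef(data, rowvar=False)
flagged = [(i, j, corr[i, j])
           for i in range(n_cols) for j in range(i + 1, n_cols)
           if abs(corr[i, j]) > 0.1]
flagged.sort(key=lambda t: -abs(t[2]))

print(f"{len(flagged)} pairs flagged out of {n_cols * (n_cols - 1) // 2}")
for i, j, c in flagged[:5]:
    print(f"columns {i:2d} and {j:2d}: correlation {c:+.2f}")
```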
Time is also a factor in big data usage: time to ponder big, one of a kind, make or break decisions; foreseeable decisions that need to be made just-in-time about routine, but important, events or processes; and the real-time calls that have to be made on the spot at critical, not often predictable, moments.
Handling large amounts of data when there is ample time can be a relatively straightforward process that uses a variety of increasingly well understood techniques and processing schemes. But, when vast amounts of unstructured data keep streaming in, and have to be analysed and used in real-time, traditional processing routines get bogged down.
You can try to divide the data into chunks that can be handled, but making decisions based on partial data brings you back to square one and delivers answers no more reliable than they ever were. With enough processing power, enormous amounts of memory and in-memory processing software, everything can be processed almost at once. The ability to access and process huge masses of data without waiting makes real-time decision making feasible.
Storing data on disk drives and loading it into memory as needed is tremendously time consuming. With hard drives, when the volume of data grows and data must be swapped into and out of memory faster, queues grow and performance rapidly degrades. Even using the fastest SSDs, the constant movement of data in and out of memory takes time, lots of time, compared to processing data that is already in memory together with the instructions that use it. Random access memory (RAM) can be as much as 3,000 times faster than disk storage, and it can speed processing, overall, by hundreds or, on occasion, thousands of times.
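The gap is easiest to see in rough, order-of-magnitude latency figures (illustrative numbers, not benchmarks):

```python
# Rough, order-of-magnitude access latencies (illustrative figures, not
# benchmarks), showing why keeping data in memory changes what is feasible.
latencies_ns = {
    "RAM access":      100,          # ~100 nanoseconds
    "SSD random read": 100_000,      # ~100 microseconds
    "HDD seek + read": 10_000_000,   # ~10 milliseconds
}

ram = latencies_ns["RAM access"]
for medium, ns in latencies_ns.items():
    print(f"{medium:15s}: {ns:>12,} ns  ({ns / ram:,.0f}x RAM latency)")
```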
The steadily plunging cost of memory promises to bring in-memory processing into the mainstream in the near future. Using 64-bit instead of older 32-bit addressing lets one directly address not gigabytes, but terabytes, of memory. Eliminating the disk, and storing the data directly in huge multi-terabyte or petabyte memories, can reduce days of processing to minutes.
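The address-space arithmetic behind that point is simple enough to check:

```python
# The address-space arithmetic behind the 32-bit vs 64-bit point.
addressable_32 = 2 ** 32                      # bytes a 32-bit pointer can reach
addressable_64 = 2 ** 64                      # bytes a 64-bit pointer can reach (in theory)

print(f"32-bit: {addressable_32 / 2 ** 30:.0f} GiB")      # 4 GiB
print(f"64-bit: {addressable_64 / 2 ** 40:,.0f} TiB")     # about 16.8 million TiB
```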
Instead of working with historical - by definition out of date - data, organisations with their data in memory can answer complex questions based on current data, using all of their data, not just samples. They can adjust their operations and start reacting in minutes instead of days, weeks or months.
Organisations with all their data, unstructured, in memory do not need to re-structure it to query any aspect of their manufacturing, operations, R&D, HR, finances or market - or any combination thereof - and get quick answers.
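The querying itself is easy to picture; a small sketch using pandas, with the table and column names invented for illustration:

```python
# A small sketch of ad hoc querying once data sits in memory, using
# pandas; the table and column names are invented for illustration.
import pandas as pd

orders = pd.DataFrame({
    "region":  ["EMEA", "EMEA", "APAC", "Americas", "APAC"],
    "product": ["A", "B", "A", "A", "B"],
    "revenue": [120.0, 80.0, 200.0, 150.0, 90.0],
    "late":    [False, True, False, True, False],
})

# Any aspect - or combination of aspects - can be queried without
# restructuring the data first.
print(orders.groupby("region")["revenue"].sum())
print(orders[orders["late"]].groupby("product")["revenue"].sum())
```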
As computing power rises, memory grows and costs drop, it seems likely that in a few years we will see a memory-based rethinking of the ways we use computers on a daily basis. It may not yet be a disruptive technology, but it might well become one much faster than we expect.
Copyright © 2013 Connect-World. All rights reserved.