Thursday, October 29, 2015

Data Visualization in the Age of Big Data


The old cliché - 'A picture is worth a thousand words' - still applies to big data. Humans perceive information through sensory channels - visual, auditory, and tactile - and human brains are wired to understand images better than text or numbers. Some studies claim the brain processes images as much as 60,000 times faster than text, which underscores the importance of data visualization. Take any dataset, small or large, process it to the desired state, and the final step is always the same: use data visualization to communicate trends, gaps, outliers, and insights to decision makers so the analysis results in a business action or decision.

Data visualization is one of the most critical components - if not the most important one - of any big data implementation. It can be used for delivering traditional business intelligence reports, tracking organizational KPIs, and communicating insights gleaned from the data.


A big data implementation is a journey, and it is a linear process - each step depends on the one before it. This journey can be mapped to the simple five-step process shown below, which organizations can use as a blueprint for their big data implementation:


STEP 1: Clearly define your infrastructure strategy
STEP 2: Select the right big data technologies
STEP 3: Integrate the right data from various sources
STEP 4: Process and enrich the data
STEP 5: Perform data analytics and visualization

Big data implementation journey

Typically, data visualization is the last step in any big data implementation. This is because data needs to be integrated, cleansed, transformed, and enriched with other data sources to extract more semantic meaning and value from it. Once the data arrives at this final, processed stage, visualization can be used to draw good insights from it.
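To make that pipeline concrete, here is a minimal sketch in Python with pandas - the column names, values, and region lookup are hypothetical - showing data being cleansed, transformed, and enriched before it reaches the visualization stage:

```python
import pandas as pd

# Hypothetical raw event data; in practice this would come from HDFS, S3, etc.
events = pd.DataFrame({
    "store_id": [1, 2, 2, 3, None],
    "sale_amount": ["10.5", "20.0", "bad", "7.25", "5.0"],
    "ts": ["2015-10-01", "2015-10-01", "2015-10-02", "2015-10-02", "2015-10-03"],
})

# Cleanse: drop rows missing a key, coerce malformed numbers to NaN and drop them
events = events.dropna(subset=["store_id"])
events["sale_amount"] = pd.to_numeric(events["sale_amount"], errors="coerce")
events = events.dropna(subset=["sale_amount"])

# Transform: parse timestamps so the data can be aggregated by day
events["ts"] = pd.to_datetime(events["ts"])

# Enrich: join against a reference source to add semantic meaning (region names)
stores = pd.DataFrame({"store_id": [1, 2, 3], "region": ["East", "West", "East"]})
enriched = events.merge(stores, on="store_id", how="left")

# Final processed stage: a daily summary that is now ready for visualization
summary = enriched.groupby(["ts", "region"])["sale_amount"].sum().reset_index()
print(summary)
```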

If you look at the idiosyncrasies of big data visualization, you will notice that 'big data visualization' is something of a misnomer. Plotting an entire big dataset is usually too noisy, too slow, and too challenging given technology limitations - all of that data has to be moved to the target device (a browser, mobile phone, or tablet, though in most cases it is a browser). Solving this problem requires new approaches and techniques.

With the advent of big data, a few new use cases are evolving for data visualization. These are the new drivers for big data visualization. Two such requirements are (a sketch addressing both follows the list):
1. High-speed data: visualize high-velocity data in real time.
2. High-volume data: visualize huge volumes of data.
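One common way to handle both requirements - a stream that is too fast and too large to plot in full - is to maintain a fixed-size random sample of the stream and plot only that. Below is a minimal sketch of classic reservoir sampling in Python; the stream source and sample size are hypothetical:

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Each new item replaces a random slot with probability k / (i + 1)
            j = random.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# Hypothetical high-volume stream: a million readings, sampled down to 1,000 points
stream = (random.gauss(0, 1) for _ in range(1_000_000))
sample = reservoir_sample(stream, 1000)
# 'sample' is now small enough to ship to a browser and plot without noise or lag
print(len(sample), min(sample), max(sample))
```

Because the reservoir never grows beyond k items, memory stays constant no matter how fast or how long the stream runs, which is exactly what a real-time dashboard needs.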

To meet the above requirements, traditional visualization tools don't cut it. These visualization drivers require new hardware capabilities (large RAM, multi-core CPUs, and in some cases GPUs), in addition to new ways to store, organize, and process big data for efficient data visualization.

Challenges:
Visualizing a large dataset as-is is hard on the human eye: even if such a visualization is rendered, it will be too noisy for a person to read. Think of it as being asked to find a needle in a haystack - all you can see is the haystack, never the needle. A related problem is the computational cost of moving large data to the target device (browser, mobile, or tablet) for rendering, which will be very slow.
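A standard remedy for both problems is to aggregate on the server and ship only the summary to the device. The sketch below bins ten million points into 200 buckets with NumPy (the data and bucket count are hypothetical), so the browser renders hundreds of points instead of millions:

```python
import numpy as np

# Hypothetical raw data: ten million (x, y) points - far too many for a browser
x = np.random.uniform(0, 100, 10_000_000)
y = np.sin(x / 5) + np.random.normal(0, 0.3, x.size)

# Aggregate server-side: bin x into 200 buckets and average y within each bucket
edges = np.linspace(0, 100, 201)
idx = np.digitize(x, edges)                       # bucket index 1..200 per point
sums = np.bincount(idx, weights=y, minlength=202)
counts = np.bincount(idx, minlength=202)
bucket_means = sums[1:201] / np.maximum(counts[1:201], 1)
centers = (edges[:-1] + edges[1:]) / 2

# Only 200 (center, mean) pairs cross the wire, instead of ten million raw points
print(centers.size, bucket_means.size)
```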

Opportunities:
The above challenges are driving new opportunities: better algorithms, more efficient hardware, the commoditization of RAM, new ways to visualize information (graph, temporal, and hierarchical views), and, last but not least, delivery platforms like cloud and mobile, which play a crucial role. Together, these approaches enable a new technique for data visualization called interactive analytics. With interactive analytics, you can ask questions of the data, touch and feel it, and collaborate and brainstorm with teammates. A lot of innovation is still required in the delivery of information - when and where it is needed.
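As a small taste of interactive analytics, the sketch below uses Plotly Express - one of many possible tools, chosen here purely for illustration along with its bundled sample dataset - to render a chart where you can hover, zoom, and filter directly in the browser:

```python
import plotly.express as px

# Illustrative built-in sample dataset; any processed summary table works the same
df = px.data.gapminder().query("year == 2007")

# An interactive scatter: hover for country details, click the legend to filter
# by continent, and drag to zoom - 'touch and feel' the data
fig = px.scatter(
    df, x="gdpPercap", y="lifeExp", size="pop", color="continent",
    hover_name="country", log_x=True, title="Interactive analytics sketch",
)
fig.show()  # opens in the browser - the typical target device
```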

To conclude, data visualization is hot, and the age of big data requires new interactive analytics approaches. Look out for new tools in this space; they tend to be either generalized, doing many things like charts and trend lines, or specialized, doing a few specific things like graph, collaborative, or interactive visualization on large datasets. Pick the right tool based on your requirements and use case.