Big Data Introduction Part 2
What is Big Data?
Big data is a collection of large datasets that cannot be processed using traditional computing techniques. Big data is not merely data; rather, it has become a complete subject that involves various tools, techniques, and frameworks.
What Comes Under Big Data?
Big data involves the data produced by different devices and applications. Given below are some of the fields that come under the umbrella of Big Data.
· Black Box Data: A black box is a component of helicopters, airplanes, and jets. It captures the voices of the flight crew, recordings from microphones and earphones, and the performance information of the aircraft.
· Social Media Data: Social media such as Facebook and Twitter hold information and the views posted by millions of people across the globe.
· Stock Exchange Data: Stock exchange data holds information about the ‘buy’ and ‘sell’ decisions that customers make on the shares of different companies.
· Power Grid Data: Power grid data holds information about the power consumed by a particular node with respect to a base station.
· Transport Data: Transport data includes the model, capacity, distance, and availability of a vehicle.
· Search Engine Data: Search engines retrieve lots of data from different databases.
Thus, Big Data includes a huge volume, high velocity, and an extensible variety of data. The data in it will be of three types.
· Structured data: Relational data.
· Semi-structured data: XML data.
· Unstructured data: Word, PDF, text, media logs.
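To make the three types concrete, here is a minimal Python sketch that reads one small sample of each. The sample records, names, and field names are all made up for illustration; only the standard library is used.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed columns, like a relational table (hypothetical sample data).
structured = io.StringIO("id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n")
rows = list(csv.DictReader(structured))

# Semi-structured: XML tags describe each field, but records may differ in shape.
xml_text = (
    "<users>"
    "<user id='1'><name>Asha</name></user>"
    "<user id='2'><name>Ravi</name><city>Delhi</city></user>"
    "</users>"
)
root = ET.fromstring(xml_text)
users = [{"id": u.get("id"), **{child.tag: child.text for child in u}} for u in root]

# Unstructured: free text has no schema, so values must be pulled out heuristically.
text = "Ravi from Delhi called on 2015-03-02 about a delayed order."
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

print(rows)
print(users)
print(dates)
```

Notice that the structured rows all share the same columns, the XML records carry different fields, and the free text needs a pattern match just to find a date.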
3V (Volume-Variety-Velocity) characteristics
Volume
We currently see exponential growth in data storage, as data is now much more than text. We can find data in the form of videos, music, and large images on our social media channels. It is very common for enterprises to have storage systems of terabytes or petabytes. As the database grows, the applications and architecture built to support the data need to be reevaluated quite often. Sometimes the same data is re-evaluated from multiple angles, and even though the original data is the same, the newly found intelligence creates an explosion of data. This big volume indeed represents Big Data.
Velocity
Data growth and the social media explosion have changed how we look at data. There was a time when we believed that yesterday's data was recent. As a matter of fact, newspapers still follow that logic. However, news channels and radio have changed how fast we receive the news. Today, people rely on social media to keep them updated on the latest happenings. On social media, a message that is only a few seconds old (a tweet, a status update, etc.) may no longer interest users. They often discard old messages and pay attention to recent updates. Data movement is now almost real time, and the update window has shrunk to fractions of a second. This high-velocity data represents Big Data.
Variety
Data can be stored in multiple formats: for example, a database, Excel, CSV, Access, or, for that matter, a simple text file. Sometimes the data is not even in a traditional format; it may be in the form of video, SMS, PDF, or something we might not have thought about. The organization needs to arrange it and make it meaningful. That would be easy if all the data were in the same format, but most of the time it is not. The real world has data in many different formats, and that is the challenge we need to overcome with Big Data. This variety of data represents Big Data.
Structured and Unstructured Data
What is Structured Data?
Before getting into unstructured data, you need to have an understanding of its structured counterpart. Structured data is information, usually text files, displayed in titled columns and rows, which can easily be ordered and processed by data mining tools. This could be visualized as a perfectly organized filing cabinet where everything is identified, labeled, and easy to access. Most organizations are likely to be familiar with this form of data and are already using it effectively, so let’s move on to the hotter question.
What is Unstructured Data?
Believe it or not, your database of structured information doesn’t even contain half of the information available for your use! Seth Grimes, a leading industry analyst on the confluence of structured and unstructured data sources, published an article that stated, “80% of business-relevant information originates in unstructured form, primarily text.” This may seem like an outlandish percentage, but don’t jump to conclusions too fast. We’re just getting started.
Application and use cases of Big Data
1. A 360 degree view of the customer
This use case is the most popular, according to Gallivan. Online retailers want to find out what shoppers are doing on their sites: what pages they visit, where they linger, how long they stay, and when they leave.
"That's all unstructured clickstream data," said Gallivan. "Pentaho takes that and blends it with transaction data, which is very structured data that sits in our customers' ERP [business management] system that says what the customers actually bought."
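The blending idea can be sketched in a few lines of plain Python. This is only an illustration under assumed data: the record fields (`customer_id`, `page`, `seconds`, `amount`) and the `customer_view` helper are hypothetical, and it does not show how Pentaho itself works.

```python
from collections import defaultdict

# Hypothetical clickstream events (what shoppers did on the site).
clickstream = [
    {"customer_id": "c1", "page": "/shoes", "seconds": 40},
    {"customer_id": "c1", "page": "/checkout", "seconds": 15},
    {"customer_id": "c2", "page": "/shoes", "seconds": 5},
]

# Hypothetical transaction records (what customers actually bought).
transactions = [{"customer_id": "c1", "item": "shoes", "amount": 59.0}]


def customer_view(clicks, purchases):
    """Combine browsing behaviour and purchases into one record per customer."""
    view = defaultdict(lambda: {"pages": [], "seconds": 0, "spent": 0.0})
    for click in clicks:
        record = view[click["customer_id"]]
        record["pages"].append(click["page"])
        record["seconds"] += click["seconds"]
    for purchase in purchases:
        view[purchase["customer_id"]]["spent"] += purchase["amount"]
    return dict(view)


views = customer_view(clickstream, transactions)
print(views)
```

Keying both sources on the same customer identifier is what makes the combined view possible. Here it shows that one shopper browsed and bought while the other left after a few seconds.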
2. Internet of Things
The second most popular use case involves IoT-connected devices managed by hardware, sensor, and information security companies. "These devices are sitting in their customers' environment, and they phone home with information about the use, health, or security of the device," said Gallivan.
Storage manufacturer NetApp, for instance, uses Pentaho software to collect and organize "tens of millions of messages a week" that arrive from NetApp devices deployed at its customers' sites. This unstructured machine data is then structured, put into Hadoop, and then pulled out for analysis by NetApp.
3. Data warehouse optimization
This is an "IT-efficiency play," Gallivan said. A large company, hoping to boost the efficiency of its enterprise data warehouse, will look for unstructured or "active" archive data that might be stored more cost effectively on a Hadoop platform. "We help customers determine what data is better suited for a lower-cost computing platform."
4. Big data service refinery
This means using big-data technologies to break down silos across data stores and sources to increase corporate efficiency.
A large global financial institution, for instance, wanted to move from next-day to same-day balance reporting for its corporate banking customers. It brought in Pentaho to take data from multiple sources, process and store it in Hadoop, and then pull it out again. This allowed the bank's marketing department to examine the data "more on an intra-day than a longer-frequency basis," Gallivan told us.
5. Information security
This last use case involves large enterprises with sophisticated information security architectures, as well as security vendors looking for more efficient ways to store petabytes of event or machine data. In the past, these companies would store this information in relational databases. "These traditional systems weren't scaling, both from a performance and cost standpoint," said Gallivan, adding that Hadoop is a better option for storing machine data.
Opportunities and challenges with Big Data
Big Data Challenges
The major challenges associated with big data are as follows:
- Capturing data
- Curation
- Storage
- Searching
- Sharing
- Transfer
- Analysis
- Presentation
To meet the above challenges, organizations normally take the help of enterprise servers.