Big Data Introduction Part 2

What is Big Data?

Big data is a collection of datasets so large that they cannot be processed using traditional computing techniques. Big data is not merely data; it has become a complete subject that involves various tools, techniques, and frameworks.

What Comes Under Big Data?

Big data involves the data produced by different devices and applications. Given below are some of the fields that come under the umbrella of Big Data.
·        Black Box Data: The black box is a component of helicopters, airplanes, and jets. It captures the voices of the flight crew, recordings from microphones and earphones, and performance information about the aircraft.
·        Social Media Data: Social media such as Facebook and Twitter hold information and the views posted by millions of people across the globe.
·        Stock Exchange Data: Stock exchange data holds information about the ‘buy’ and ‘sell’ decisions that customers make on shares of different companies.
·        Power Grid Data: Power grid data holds information about the power consumed by a particular node with respect to a base station.
·        Transport Data: Transport data includes the model, capacity, distance, and availability of a vehicle.
·        Search Engine Data: Search engines retrieve lots of data from different databases.




Thus Big Data includes huge volume, high velocity, and an extensible variety of data. This data falls into three types.

·        Structured data: Relational data.
·        Semi-structured data: XML data.
·        Unstructured data: Word, PDF, Text, Media Logs.
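
As a rough illustration of the three types, here is a minimal Python sketch. The sample records, field names, and the date pattern are all made up for this example; the point is only that structured data arrives with a fixed schema, semi-structured data carries its own tags but may vary from record to record, and unstructured text needs structure extracted from it.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: relational-style rows with a fixed set of columns (hypothetical sample).
structured_src = "id,name,city\n1,Asha,Surat\n2,Ravi,Pune\n"
rows = list(csv.DictReader(io.StringIO(structured_src)))

# Semi-structured: XML is self-describing, but fields can differ per record.
xml_src = (
    "<users>"
    "<user id='1'><name>Asha</name></user>"
    "<user id='2'><name>Ravi</name><phone>555-0100</phone></user>"
    "</users>"
)
users = [
    {"id": user.get("id"), **{child.tag: child.text for child in user}}
    for user in ET.fromstring(xml_src)
]

# Unstructured: free text has no schema, so we have to pull structure out of it.
text = "Asha called support on 2015-03-02 about a late delivery."
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

print(rows)
print(users)
print(dates)
```

Note that the second XML record has a phone field that the first one lacks, which is exactly what a fixed relational schema would not allow without a schema change.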

3V (Volume-Variety-Velocity) characteristics


Volume

Data storage is growing exponentially because data is now much more than text. We find data in the form of videos, music, and large images on our social media channels. It is very common for enterprises to have storage systems holding terabytes or petabytes. As the database grows, the applications and architecture built to support the data need to be reevaluated quite often. Sometimes the same data is re-evaluated from multiple angles, and even though the original data is the same, the newly found intelligence creates an explosion of data. This big volume represents Big Data.

Velocity

The growth of data and the explosion of social media have changed how we look at data. There was a time when we believed that yesterday's data was recent. As a matter of fact, newspapers still follow that logic. However, news channels and radio have changed how fast we receive the news. Today, people rely on social media to keep them updated on the latest happenings. On social media, a message that is only a few seconds old (a tweet, a status update, etc.) may no longer interest users. They often discard old messages and pay attention to recent updates. Data movement is now almost real time, and the update window has shrunk to fractions of a second. This high-velocity data represents Big Data.
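
The idea of discarding old messages and keeping only recent ones can be sketched as a small time window. This is just an illustration in Python; the class name `RecentWindow`, the `max_age` parameter, and the sample messages are made up, and it assumes messages arrive in timestamp order.

```python
from collections import deque


class RecentWindow:
    """Keep only messages that are at most max_age seconds old."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.items = deque()  # (timestamp, message) pairs, oldest first

    def add(self, timestamp, message):
        # Assumes timestamps are non-decreasing (messages arrive in order).
        self.items.append((timestamp, message))
        self._expire(timestamp)

    def _expire(self, now):
        # Drop messages from the front while they are older than max_age.
        while self.items and now - self.items[0][0] > self.max_age:
            self.items.popleft()

    def messages(self):
        return [message for _, message in self.items]


window = RecentWindow(max_age=5)
window.add(0, "old tweet")
window.add(3, "status update")
window.add(7, "breaking news")
print(window.messages())
```

By the time the message at second 7 arrives, the one from second 0 is more than 5 seconds old and is dropped, while the one from second 3 is still inside the window.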

Variety

Data can be stored in multiple formats: for example, in a database, an Excel, CSV, or Access file, or for that matter in a simple text file. Sometimes the data is not even in a traditional format; it may be in the form of video, SMS, PDF, or something we have not thought of. The organization needs to arrange it and make it meaningful. That would be easy if all the data were in the same format, but most of the time it is not. The real world has data in many different formats, and that is the challenge we need to overcome with Big Data. This variety of data represents Big Data.

Structured and Unstructured Data


What is Structured Data?

Before getting into unstructured data, you need to have an understanding of its structured counterpart. Structured data is information, usually text files, displayed in titled columns and rows that can easily be ordered and processed by data mining tools. This could be visualized as a perfectly organized filing cabinet where everything is identified, labeled and easy to access. Most organizations are likely to be familiar with this form of data and are already using it effectively, so let’s move on to the hotter question.

What is Unstructured Data?

Believe it or not, your database of structured information doesn’t even contain half of the information available for your use! Seth Grimes, a leading industry analyst on the confluence of structured and unstructured data sources, published an article that stated, “80% of business-relevant information originates in unstructured form, primarily text.”  This may seem like an outlandish percentage, but don’t jump to conclusions too fast. We’re just getting started.

Application and use cases of Big Data

1. A 360 degree view of the customer

This use case is the most popular, according to Gallivan. Online retailers want to find out what shoppers are doing on their sites -- what pages they visit, where they linger, how long they stay, and when they leave.
"That's all unstructured clickstream data," said Gallivan. "Pentaho takes that and blends it with transaction data, which is very structured data that sits in our customers' ERP [business management] system that says what the customers actually bought."
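
To make the idea of blending clickstream data with transaction data concrete, here is a minimal Python sketch. It is not how Pentaho does it; the record layouts, customer IDs, and the `customer_view` function are all hypothetical and only show the general shape of joining the two sources on a customer key.

```python
from collections import defaultdict

# Hypothetical clickstream events: (customer_id, page, seconds_on_page)
clicks = [
    ("c1", "/home", 5),
    ("c1", "/shoes", 40),
    ("c2", "/home", 3),
    ("c1", "/checkout", 20),
]

# Hypothetical ERP transactions: (customer_id, item, amount)
transactions = [
    ("c1", "running shoes", 59.0),
]


def customer_view(clicks, transactions):
    """Combine browsing behaviour and purchases into one profile per customer."""
    view = defaultdict(
        lambda: {"pages": [], "seconds": 0, "purchases": [], "spent": 0.0}
    )
    for customer_id, page, seconds in clicks:
        view[customer_id]["pages"].append(page)
        view[customer_id]["seconds"] += seconds
    for customer_id, item, amount in transactions:
        view[customer_id]["purchases"].append(item)
        view[customer_id]["spent"] += amount
    return dict(view)


profiles = customer_view(clicks, transactions)
for customer_id, profile in profiles.items():
    print(customer_id, profile)
```

In this toy data, customer c2 visited the site but bought nothing, which is the kind of gap between browsing and buying that a combined view makes visible.
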
2. Internet of Things

The second most popular use case involves IoT-connected devices managed by hardware, sensor, and information security companies. "These devices are sitting in their customers' environment, and they phone home with information about the use, health, or security of the device," said Gallivan.
Storage manufacturer NetApp, for instance, uses Pentaho software to collect and organize "tens of millions of messages a week" that arrive from NetApp devices deployed at its customers' sites. This unstructured machine data is then structured, put into Hadoop, and then pulled out for analysis by NetApp.
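
Turning raw machine messages into structured records can be sketched as follows. The message format, field names, and `parse_message` function below are invented for illustration; real device messages (NetApp's included) will look different, and this says nothing about how Pentaho or Hadoop handle them.

```python
import re

# Hypothetical message format; real device messages will differ.
PATTERN = re.compile(
    r"(?P<timestamp>\S+) device=(?P<device>\S+) "
    r"status=(?P<status>\S+) temp=(?P<temp>\d+(?:\.\d+)?)"
)


def parse_message(line):
    """Return a dict of fields, or None if the line does not match the format."""
    match = PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["temp"] = float(record["temp"])  # convert the reading to a number
    return record


messages = [
    "2015-03-02T10:15:00Z device=fs-001 status=ok temp=41.5",
    "garbled line that does not match",
    "2015-03-02T10:16:00Z device=fs-002 status=degraded temp=67",
]

# Keep only the lines that could be parsed into structured records.
records = [r for r in map(parse_message, messages) if r is not None]
for record in records:
    print(record)
```

Lines that do not fit the expected pattern are skipped here; in practice you would probably want to count or log them rather than drop them silently.
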
3. Data warehouse optimization

This is an "IT-efficiency play," Gallivan said. A large company, hoping to boost the efficiency of its enterprise data warehouse, will look for unstructured or "active" archive data that might be stored more cost effectively on a Hadoop platform. "We help customers determine what data is better suited for a lower-cost computing platform."

4. Big data service refinery

This means using big-data technologies to break down silos across data stores and sources to increase corporate efficiency.
A large global financial institution, for instance, wanted to move from next-day to same-day balance reporting for its corporate banking customers. It brought in Pentaho to take data from multiple sources, process and store it in Hadoop, and then pull it out again. This allowed the bank's marketing department to examine the data "more on an intra-day than a longer-frequency basis," Gallivan told us.
5. Information security

This last use case involves large enterprises with sophisticated information security architectures, as well as security vendors looking for more efficient ways to store petabytes of event or machine data. In the past, these companies would store this information in relational databases. "These traditional systems weren't scaling, both from a performance and cost standpoint," said Gallivan, adding that Hadoop is a better option for storing machine data.

Opportunities and challenges with Big Data


Big Data Challenges

The major challenges associated with big data are as follows:
  • Capturing data
  • Curation
  • Storage
  • Searching
  • Sharing
  • Transfer
  • Analysis
  • Presentation
To meet these challenges, organizations normally rely on enterprise servers.

