HDFS & Its Architecture

HDFS - Hadoop Distributed File System

HDFS is a file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.

Inspired by the Google File System (GFS), which Google developed in C++ in 2003 to support its search engine, the Hadoop Distributed File System (HDFS) is a Java-based file system that became a core component of Hadoop. With its fault-tolerant and self-healing features, HDFS enables Hadoop to harness the true capability of distributed processing by turning a cluster of industry-standard (commodity) servers into a massively scalable pool of storage. To add another feather in its cap, HDFS can store structured, semi-structured or unstructured data in any format, regardless of schema, and is specially designed for environments where scalability and throughput are critical.



HDFS Concepts

Blocks
An HDFS block is 64 MB by default (later releases default to 128 MB), far larger than a disk block (typically 512 bytes).
HDFS blocks are large to minimize the cost of seeks. By making a block large enough, the time to transfer the data from the disk can be made significantly longer than the time to seek to the start of the block.
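The argument above is easy to check with back-of-the-envelope arithmetic. In this sketch, the 10 ms seek time and 100 MB/s transfer rate are illustrative assumptions, not HDFS constants:

```python
# Illustrative arithmetic: with a large block, transfer time dominates seek time.
# Assumed figures (not HDFS constants): 10 ms per seek, 100 MB/s transfer rate.
SEEK_TIME_S = 0.010          # 10 ms to seek to the start of a block
TRANSFER_RATE_BPS = 100e6    # 100 MB/s sustained disk transfer rate

def seek_overhead(block_size_bytes: float) -> float:
    """Fraction of total read time spent seeking for one block."""
    transfer_time = block_size_bytes / TRANSFER_RATE_BPS
    return SEEK_TIME_S / (SEEK_TIME_S + transfer_time)

# A 64 MB block keeps seek overhead under 2%; a 512-byte block is almost all seek.
print(f"64 MB block: {seek_overhead(64e6):.1%} of time seeking")
print(f"512 B block: {seek_overhead(512):.2%} of time seeking")
```

With these assumed figures, a 64 MB block spends about 1.5% of its read time seeking, while a 512-byte block spends essentially all of it seeking.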
Benefits of having a block abstraction for a distributed file system:
·         A file can be larger than any single disk in the network, so its blocks can be stored on different disks.
·         Making the unit of abstraction a block rather than a file simplifies the storage subsystem (in terms of calculating how much storage is used). File metadata such as permissions information does not need to be stored with the blocks.
To list the blocks that make up each file in the filesystem:  % hadoop fsck / -files -blocks

Namenodes and Datanodes

There are two types of nodes operating in a master-worker pattern: a namenode (the master) and a number of datanodes (the workers). The namenode manages the filesystem namespace: it maintains the filesystem tree and the metadata for all the files and directories. The namenode also knows the datanodes on which all the blocks for a given file are located, but it does not store block locations persistently; this information is reconstructed from the datanodes' block reports at startup.

A client accesses the filesystem by communicating with both the namenode and the datanodes.
·         Datanodes store and retrieve blocks when they are told to (by clients or by the namenode), and they report back to the namenode periodically with lists of the blocks that they are storing.
·         The namenode is extremely important: without it, the filesystem cannot be used. It is therefore essential to make the namenode resilient to failure, and there are two mechanisms for this:
o   Back up the files that make up the persistent state of the filesystem metadata. Hadoop can be configured so that the namenode writes its persistent state to multiple file systems. Usually, it writes to a local disk as well as a remote NFS mount.
o   Run a secondary namenode, which periodically merges the namespace image with the edit log to prevent the edit log from becoming too large.
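Writing the namenode's persistent state to both a local disk and an NFS mount, as described above, is configured with a comma-separated list of directories in hdfs-site.xml. The paths below are placeholders for illustration:

```xml
<!-- hdfs-site.xml: the namenode writes its metadata to every listed directory -->
<property>
  <name>dfs.namenode.name.dir</name>
  <!-- example paths; substitute your own local and NFS-mounted directories -->
  <value>file:///data/1/dfs/nn,file:///mnt/nfs/dfs/nn</value>
</property>
```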

Namespace and Block Storage

The namespace consists of directories, files and blocks and supports all the namespace related file system operations such as create, delete, modify and list files and directories.
The Block Storage Service has two parts:
·         Block Management, which is handled by the Namenode
·         Storage, which is provided by the datanodes by storing blocks on their local file systems and allowing read/write access

HDFS Federation
In order to scale the name service horizontally, federation uses multiple independent Namenodes/namespaces. The Namenodes are federated: they are independent and do not require coordination with each other. The datanodes are used as common storage for blocks by all the Namenodes. Each datanode registers with all the Namenodes in the cluster, sends them periodic heartbeats and block reports, and handles commands from them.
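A federated deployment declares its independent Namenodes in hdfs-site.xml roughly as sketched below. The nameservice ids and host names are made-up placeholders:

```xml
<!-- hdfs-site.xml: two independent namespaces sharing the same datanodes -->
<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1</name>
  <value>nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns2</name>
  <value>nn2.example.com:8020</value>
</property>
```

Every datanode reads this list and registers with each configured Namenode.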




The entire Hadoop HDFS cluster is divided into two parts.

1.     Master:  This part of the cluster contains a node termed the Name Node. There can optionally be one or two additional nodes, termed Secondary / Standby Name Nodes, used for fault tolerance.
2.     Slave:  All slave servers are called Data Nodes. The number of data nodes can be whatever suits the business requirements. The amount of data the cluster can hold in HDFS is determined by the disk capacity allocated to HDFS on the data nodes. Any additional storage requirement can be fulfilled by simply adding one or more nodes of the required disk size.
Because the Name Node is the central piece of the entire HDFS cluster, it is essential to take good care of this node. It is recommended that this machine be a good-quality server with plenty of RAM. This is the node that keeps the mapping of files to blocks and of blocks to data nodes. In short, the Name Node stores the file system metadata.
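A back-of-the-envelope sketch of the sizing point above. The node count, disk size and replication factor here are made-up figures; usable capacity is simply raw disk divided by the replication factor:

```python
# Usable HDFS capacity = raw datanode disk / replication factor.
def usable_capacity_tb(num_datanodes: int, disk_tb_per_node: float,
                       replication_factor: int = 3) -> float:
    raw = num_datanodes * disk_tb_per_node
    return raw / replication_factor

# e.g. 10 datanodes with 12 TB each, default replication of 3
print(usable_capacity_tb(10, 12))        # 40.0 TB usable
# adding 5 more identical nodes scales capacity linearly
print(usable_capacity_tb(15, 12))        # 60.0 TB usable
```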




Basic Features

• Highly fault-tolerant
• Suitable for applications with large data sets
• High throughput
• Streaming access to file system data
• Can be built out of commodity hardware
• Platform Independent
• Write-once-read-many: append is supported
• A map-reduce application fits perfectly with this model

Multi-Node Cluster



Master/slave architecture

Namenode
A single Namenode per cluster
manages the file system namespace and regulates access to files by clients

Datanodes
A number of DataNodes, usually one per node in the cluster
manage storage attached to the nodes that they run on
serve read/write requests, and perform block creation, deletion and replication upon instruction from the Namenode
running multiple DataNodes on the same machine is rare

Namenode
Keeps image of entire file system namespace and file Blockmap in memory
4 GB of local RAM was traditionally cited as sufficient; in practice, memory requirements grow with the number of files and blocks in the namespace
Periodic checkpointing
• reads the FsImage and EditLog from its local file system at startup
• applies the EditLog transactions to the FsImage
• stores the updated FsImage back to its file system as a checkpoint
• the system can recover to the last checkpointed state in case of a crash
EditLog
• a transaction log to record every change that occurs to the filesystem metadata
FsImage
• stores file system namespace with mapping of blocks to files and file system properties
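The checkpointing steps above can be sketched as a toy model. This is purely illustrative: the real FsImage and EditLog are binary on-disk structures, not Python dicts, and the operation names here are made up:

```python
# Toy model of periodic checkpointing: replay the EditLog over the FsImage,
# then treat the merged image as the new checkpoint.
def apply_edit(fsimage: dict, edit: tuple) -> None:
    op, path, *args = edit
    if op == "create":
        fsimage[path] = {"blocks": args[0]}
    elif op == "delete":
        fsimage.pop(path, None)
    elif op == "rename":
        fsimage[args[0]] = fsimage.pop(path)

def checkpoint(fsimage: dict, editlog: list) -> dict:
    for edit in editlog:
        apply_edit(fsimage, edit)
    editlog.clear()            # the merged image makes the old edits redundant
    return fsimage             # in HDFS this is written back to disk

fsimage = {"/a": {"blocks": ["blk_1"]}}
editlog = [("create", "/b", ["blk_2", "blk_3"]), ("rename", "/a", "/a2")]
print(checkpoint(fsimage, editlog))
# {'/b': {'blocks': ['blk_2', 'blk_3']}, '/a2': {'blocks': ['blk_1']}}
```

After a crash, replaying a short EditLog over the last checkpoint restores the namespace, which is why keeping the EditLog small matters.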

Datanode
·         stores data in files in its local file system
·         has no knowledge about HDFS files as such
·         stores each block of HDFS data in a separate file
·         does not create all files in the same directory
·         uses heuristics to determine the optimal number of files per directory and creates subdirectories appropriately

File system Namespace
·         Hierarchical file system with directories and files
·         Supports create, remove, move, rename, etc.
·         The Namenode maintains the file system metadata
·         Any change to the file system metadata is recorded by the Namenode
·         The number of replicas of a file can be specified by the application
·         The replication factor of the file is stored by the Namenode

Data Replication
·         Each file is stored as a sequence of blocks
·         All blocks in a file except the last are the same size
·         Blocks are replicated for fault tolerance
·         Block size and number of replicas are configurable (per file)
·         Each Datanode sends a Heartbeat and a BlockReport to the Namenode
·         The Heartbeat signals that the Datanode is alive
·         The BlockReport contains a record of all the blocks on that Datanode
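A minimal sketch of how the Namenode could fold BlockReports into its block map. This is illustrative only; the real exchange is an RPC protocol, and the node and block names are made up:

```python
from collections import defaultdict

# Namenode side: fold each datanode's BlockReport into a block -> locations map.
def build_block_map(block_reports: dict) -> dict:
    """block_reports maps datanode id -> list of block ids it stores."""
    block_map = defaultdict(set)
    for datanode, blocks in block_reports.items():
        for block in blocks:
            block_map[block].add(datanode)
    return dict(block_map)

reports = {
    "dn1": ["blk_1", "blk_2"],
    "dn2": ["blk_2", "blk_3"],
    "dn3": ["blk_1", "blk_3"],
}
print(build_block_map(reports))   # every block is held by two datanodes here
```

This map is exactly the block-location information the Namenode does not persist: it is rebuilt from reports like these.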

Replica Selection
·         The goal is to minimize bandwidth consumption and read latency
·         A replica on the reader's own node is most preferred
·         A replica in the local data center is preferred over a remote one

Replica Placement
·         Replica placement is optimized for performance and reliability
·         Rack-aware replica placement:
o   improves reliability, availability and network bandwidth utilization
·         A cluster spans many racks, and communication between racks goes through switches; bandwidth between nodes on different racks is lower than within a rack
·         A simple policy is to place replicas on unique racks:
o   simple, but non-optimal because every write must cross racks
·         Instead, with the default replication factor of 3, replicas are placed: one on a node in the local rack, one on a different node in the same rack, and one on a node in a different rack
·         Thus one third of the replicas are on one node, two thirds of the replicas are on one rack, and the remaining third are distributed evenly across the remaining racks
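The default 3-replica policy described above can be sketched as follows. The rack and node names are made up, and real HDFS applies additional constraints (free space, load) that this sketch ignores:

```python
import random

# Sketch of the default placement: first replica on the writer's node,
# second on a different node in the same rack, third on a node in another rack.
def place_replicas(writer_node: str, racks: dict) -> list:
    """racks maps rack id -> list of node names; writer_node is in one of them."""
    local_rack = next(r for r, nodes in racks.items() if writer_node in nodes)
    first = writer_node
    second = random.choice([n for n in racks[local_rack] if n != first])
    remote_rack = random.choice([r for r in racks if r != local_rack])
    third = random.choice(racks[remote_rack])
    return [first, second, third]

racks = {"rack1": ["n1", "n2", "n3"], "rack2": ["n4", "n5"]}
print(place_replicas("n1", racks))   # e.g. ['n1', 'n3', 'n5']
```

Two replicas share the local rack, so a write crosses racks only once, while the third replica still survives the loss of an entire rack.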

Namenode Startup
·         On startup, the Namenode enters Safemode
·         Block replication does not occur while in Safemode
·         Each Datanode checks in with a Heartbeat and a BlockReport
·         The Namenode verifies that each block has an acceptable number of replicas
·         The Namenode then exits Safemode
·         It builds a list of blocks (if any) that still need to be replicated
·         The Namenode then proceeds to replicate these blocks to other Datanodes
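The Safemode exit condition can be sketched as a threshold check. The 0.999 default mirrors dfs.namenode.safemode.threshold-pct, but the function itself is a made-up illustration, not HDFS code:

```python
# The Namenode leaves Safemode once a configurable fraction of blocks is
# minimally replicated; under-replicated blocks are queued for re-replication.
def safemode_status(block_replicas: dict, min_replicas: int = 1,
                    threshold: float = 0.999):
    """block_replicas maps block id -> number of reported replicas."""
    ok = [b for b, n in block_replicas.items() if n >= min_replicas]
    under = [b for b, n in block_replicas.items() if n < min_replicas]
    can_exit = len(ok) / len(block_replicas) >= threshold
    return can_exit, under   # blocks in `under` are replicated after exit

blocks = {"blk_1": 3, "blk_2": 2, "blk_3": 0}
print(safemode_status(blocks))   # (False, ['blk_3'])
```

Here only 2 of 3 blocks have a replica, so the Namenode stays in Safemode until more BlockReports arrive.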



