Hadoop is a large-scale distributed batch processing infrastructure. Batch processing is execution of a series of programs (jobs) on a computer without manual intervention.
Hadoop includes a distributed file system which breaks up input data and sends fractions of the original data to several machines in your cluster to hold. This results in the problem being processed in parallel using all of the machines in the cluster and computes output results as efficiently as possible.
Hadoop is designed to handle hardware failure and data congestion issues very robustly.
In a Hadoop cluster, data is distributed to all the nodes of the cluster as it is being loaded in. [Data is distributed across nodes at load time.]
The Hadoop Distributed File System (HDFS) will split large data files into chunks which are managed by different nodes in the cluster. In addition to this each chunk is replicated across several machines, so that a single machine failure does not result in any data being unavailable. Even though the file chunks are replicated and distributed across several machines, they form a single namespace, so their contents are universally accessible.
Hadoop will not run just any program and distribute it across a cluster. Programs must be written to conform to a particular programming model, named "MapReduce."
1 comment:
There are lots of information about hadoop have spread around the web, but this is a unique one according to me. The strategy you have updated here will make me to get to the next level in big data. Thanks for sharing this.
Hadoop Course in Chennai
Hadoop training institutes in chennai
Hadoop Training Chennai
Post a Comment