The Apache Hadoop project has two core components:
- the file store called Hadoop Distributed File System (HDFS), and
- the programming framework called MapReduce.
HDFS - designed for storing very large files with a streaming data access pattern, running on clusters of commodity hardware.
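As a minimal sketch of how an application streams a file into HDFS through Hadoop's Java FileSystem API (the path `/user/demo/input.txt` and the string written are hypothetical, and the client is assumed to pick up the cluster's `core-site.xml`/`hdfs-site.xml` configuration from the classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();        // loads core-site.xml / hdfs-site.xml if present
        FileSystem fs = FileSystem.get(conf);             // connects to the default file system (HDFS)
        Path file = new Path("/user/demo/input.txt");     // hypothetical HDFS path for this sketch
        try (FSDataOutputStream out = fs.create(file, true)) {  // create the file, overwriting if it exists
            out.writeUTF("hello hdfs");                   // stream data into the file
        }
        fs.close();
    }
}
```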
MapReduce - a programming model for processing large data sets with a parallel, distributed algorithm on a cluster.
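The classic illustration of this model is word count: the map function emits a `(word, 1)` pair for every word in its input split, and the reduce function sums the counts for each word. Below is a sketch of such a job using Hadoop's `org.apache.hadoop.mapreduce` API, following the structure of the standard word-count example; the input and output paths are assumed to be passed as command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emit (word, 1) for every word in the input split
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sum the counts emitted for each word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // combine partial counts on the map side
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (must not exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, a job like this is typically submitted with `hadoop jar`, and the framework handles splitting the input, scheduling map and reduce tasks across the cluster, and shuffling intermediate keys to the reducers.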
Semi-structured and unstructured data sets are the two fastest-growing data types of the digital universe. Analysis of these two data types is not possible with traditional database management systems. Hadoop HDFS and MapReduce enable the analysis of these data types, giving organizations the opportunity to extract insights from bigger datasets within a reasonable amount of processing time. Hadoop MapReduce's parallel processing capability has increased the speed of data extraction and transformation.