Monday, March 25, 2013

Hadoop Ecosystems Listing

A non-exhaustive list of Hadoop ecosystem projects is given below. The list is taken from the excellent book
Hadoop: The Definitive Guide by Tom White.

Avro
A serialization system for efficient, cross-language RPC and persistent data storage.

1. Hadoop Distributed File System (HDFS)
A distributed filesystem that runs on large clusters of commodity machines.

2. Hive
A distributed data warehouse. Hive manages data stored in HDFS and provides a query language based on SQL for querying the data; the runtime engine translates each query into MapReduce jobs.

3. HBase
A distributed, column-oriented database. HBase uses HDFS for its underlying storage, and supports both batch-style computations using MapReduce and point queries (random reads).

4. MapReduce
A distributed data processing model and execution environment that runs on large clusters of commodity machines.

5. Oozie
A service for managing workflows of Hadoop jobs.

6. Pig
A data flow language and execution environment for exploring very large datasets. Pig runs on HDFS and MapReduce clusters.

7. Sqoop
A tool for efficient bulk transfer of data between structured data stores (such as relational databases) and HDFS.

8. ZooKeeper
A distributed, highly available coordination service. ZooKeeper provides primitives such as distributed locks that can be used for building distributed applications.
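
To make the MapReduce entry above a little more concrete, here is a minimal sketch of the processing model as a word count. It runs in a single Python process and does not use the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. On a real cluster the framework distributes the map and reduce calls across machines and does the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework on a cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in order over an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored in HDFS.
    sample = ["Hadoop runs MapReduce", "MapReduce runs on Hadoop clusters"]
    print(word_count(sample))
```

The map and reduce functions only ever see one line or one key at a time, which is what lets the model spread work over many commodity machines.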
