Sqoop vs Flume vs HDFS in Hadoop

| Sqoop | Flume | HDFS |
| --- | --- | --- |
| Sqoop is used for importing data from structured data sources such as RDBMSs. | Flume is used for moving bulk streaming data into HDFS. | HDFS is a distributed file system used by the Hadoop ecosystem to store data. |
| Sqoop has a connector-based architecture: connectors know how to connect to the respective data source and fetch the data. | Flume has an agent-based architecture: a piece of code called an "agent" takes care of fetching data. | HDFS has a distributed architecture in which data is spread across multiple DataNodes. |
| With Sqoop, HDFS is the destination for imported data. | With Flume, data flows to HDFS through zero or more channels. | HDFS is the final destination for data storage. |
| Sqoop data loads are not event-driven. | Flume data loads can be event-driven. | HDFS simply stores whatever data is delivered to it, by whatever means. |
| To import data from structured data sources, use Sqoop: its connectors know how to interact with those sources and fetch data from them (see the import sketch below). | To load streaming data such as tweets generated on Twitter or the log files of a web server, use Flume: Flume agents are built for fetching streaming data (see the agent sketch below). | HDFS has its own built-in shell commands for storing data (see the shell sketch below), but it cannot import streaming data by itself. |
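As a concrete illustration, here is a minimal Sqoop import sketch. The host dbhost.example.com, database sales, table orders, user etl_user, and all paths are hypothetical placeholders; substitute your own connection details.

```sh
# Minimal Sqoop import sketch (host, database, table, user, and paths
# are hypothetical placeholders).
# Pulls the "orders" table from a MySQL database into an HDFS directory.
sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table orders \
  --target-dir /user/etl/orders \
  --num-mappers 4
```

Sqoop runs the import as a MapReduce job; --num-mappers controls how many parallel tasks split the table.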
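A Flume agent is wired together in a properties file that names its sources, channels, and sinks. The sketch below is one minimal possibility: the agent name a1, the log path, and the HDFS path are all assumptions. It tails a web server log through an in-memory channel into an HDFS sink.

```sh
# Minimal Flume agent sketch (agent name, log path, and HDFS path
# are hypothetical placeholders).
cat > tail-to-hdfs.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Source: tail a web server log as a stream of events
a1.sources.r1.type     = exec
a1.sources.r1.command  = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
a1.channels.c1.type     = memory
a1.channels.c1.capacity = 10000

# Sink: write events into date-partitioned HDFS directories
a1.sinks.k1.type                   = hdfs
a1.sinks.k1.channel                = c1
a1.sinks.k1.hdfs.path              = /flume/weblogs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType          = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
EOF

# Start the agent
flume-ng agent --name a1 --conf-file tail-to-hdfs.conf
```

The channel is what makes the load event-driven: each new log line becomes an event that flows source → channel → sink as it arrives, rather than in scheduled batches.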
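Finally, storing data in HDFS directly uses its built-in shell. A small sketch, with hypothetical directory and file names:

```sh
# HDFS shell sketch: copy a local file into HDFS and inspect it
# (directory and file names are hypothetical placeholders).
hdfs dfs -mkdir -p /user/etl/raw
hdfs dfs -put access_log.txt /user/etl/raw/
hdfs dfs -ls /user/etl/raw
hdfs dfs -cat /user/etl/raw/access_log.txt
```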
