Sqoop vs Flume vs HDFS in Hadoop
| Sqoop | Flume | HDFS |
|---|---|---|
| Sqoop is used for importing data from structured data sources such as an RDBMS. | Flume is used for moving bulk streaming data into HDFS. | HDFS is the distributed file system the Hadoop ecosystem uses to store data. |
| Sqoop has a connector-based architecture: connectors know how to connect to the respective data source and fetch the data. | Flume has an agent-based architecture: a piece of code, called an 'agent', takes care of fetching the data. | HDFS has a distributed architecture in which data is spread across multiple DataNodes. |
| HDFS is the destination for data imported with Sqoop. | Data flows into HDFS through zero or more channels. | HDFS is an ultimate destination for data storage. |
| Sqoop data loads are not event-driven. | Flume data loads can be event-driven. | HDFS simply stores whatever data is delivered to it, by whatever means. |

In order to import data from structured data sources, one has to use Sqoop, because its connectors know how to interact with structured data sources and fetch data from them. The sketches below illustrate each tool in turn.
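As a concrete illustration of a connector-driven import, here is a minimal Sqoop sketch, assuming a hypothetical MySQL database `sales` reachable at `dbserver` with a table named `customers` (all names and paths are assumptions, not fixed requirements):

```sh
# Import the 'customers' table into HDFS; Sqoop's JDBC connector
# handles the connection and splits the work across map tasks.
# (Database, host, user, and paths are assumed for illustration.)
sqoop import \
  --connect jdbc:mysql://dbserver:3306/sales \
  --username etl_user \
  -P \
  --table customers \
  --target-dir /data/sales/customers \
  --num-mappers 4
```

Here `-P` prompts for the password interactively, and `--num-mappers 4` runs the import as four parallel map tasks, each fetching a slice of the table into the HDFS target directory.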
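The agent-and-channel model shows up directly in a Flume configuration file. A minimal sketch, assuming log lines should be tailed from a hypothetical `/var/log/app.log` into HDFS, wires one source to one sink through a memory channel:

```properties
# Hypothetical agent 'a1': exec source -> memory channel -> HDFS sink
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Source: tail a log file and emit each new line as an event
a1.sources.r1.type     = exec
a1.sources.r1.command  = tail -F /var/log/app.log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
a1.channels.c1.type     = memory
a1.channels.c1.capacity = 10000

# Sink: write buffered events into HDFS
a1.sinks.k1.type      = hdfs
a1.sinks.k1.channel   = c1
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/logs
```

The agent is then started with `flume-ng agent --conf conf --conf-file agent.conf --name a1`; each line appended to the log becomes an event, which is what makes Flume's data load event-driven.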
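Finally, since HDFS itself just stores whatever is handed to it, data can also land there directly through the HDFS shell. A small sketch with assumed paths:

```sh
# Create a directory in HDFS and copy a local file into it
# (paths and file name are assumptions for illustration)
hdfs dfs -mkdir -p /data/raw
hdfs dfs -put events.csv /data/raw/
hdfs dfs -ls /data/raw
```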