Posts

Sqoop vs Flume vs HDFS in Hadoop

Purpose: Sqoop is used for importing data from structured data sources such as an RDBMS. Flume is used for moving bulk streaming data into HDFS. HDFS is the distributed file system used by the Hadoop ecosystem to store data.

Architecture: Sqoop has a connector-based architecture; connectors know how to connect to the respective data source and fetch the data. Flume has an agent-based architecture; the code that fetches the data is written as an 'agent'. HDFS has a distributed architecture in which data is spread across multiple DataNodes.

Role of HDFS: With Sqoop, HDFS is the destination of the data import. With Flume, data flows into HDFS through zero or more channels. HDFS itself is the ultimate destination for data storage.

Event handling: A Sqoop data load is not event driven. A Flume data load can be driven by events. HDFS simply stores whatever data is delivered to it, by whatever means.

When to use which: In order to import data from structured data sources, one has to use Sqoop, because its connectors know how to interact with structured data sources …
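To make the connector vs. agent distinction concrete, here is a rough sketch (the JDBC URL, database, table name, log file, and HDFS paths below are placeholders, not part of the tutorial): a Sqoop import pulls an entire RDBMS table into HDFS with a single command, while Flume is configured as an agent wired from a source through a channel to an HDFS sink.

```
# Sqoop: import one RDBMS table into HDFS (connection details are hypothetical)
sqoop import \
  --connect jdbc:mysql://dbserver/sales \
  --username dbuser -P \
  --table orders \
  --target-dir /user/hduser_/orders
```

```
# Flume: a minimal agent definition - source -> memory channel -> HDFS sink
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

# Tail an application log as the event source (path is illustrative)
agent1.sources.src1.type     = exec
agent1.sources.src1.command  = tail -F /var/log/app.log
agent1.sources.src1.channels = ch1

# Buffer events in memory between source and sink
agent1.channels.ch1.type = memory

# Deliver events into HDFS
agent1.sinks.sink1.type      = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://localhost:9000/flume/events
agent1.sinks.sink1.channel   = ch1
```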

What is Sqoop? What is FLUME - Hadoop Tutorial

Before we learn more about Flume and Sqoop, let's study the issues with data load into Hadoop. Analytical processing using Hadoop requires loading huge amounts of data from diverse sources into Hadoop clusters. This process of bulk data load into Hadoop from heterogeneous sources, and then processing it, comes with a certain set of challenges. Maintaining and ensuring data consistency, and ensuring efficient utilization of resources, are some of the factors to consider before selecting the right approach for data load. Major issues:

1. Data load using scripts. The traditional approach of using scripts to load data is not suitable for bulk data load into Hadoop; it is inefficient and very time consuming.

2. Direct access to external data via MapReduce applications. Providing MapReduce applications direct access to data residing in external systems (without loading it into Hadoop) complicates those applications, so this approach is not feasible.

3. In addition to having a …

What is MapReduce? How it Works - Hadoop MapReduce Tutorial

MapReduce is a programming model suitable for processing huge amounts of data. Hadoop is capable of running MapReduce programs written in various languages: Java, Ruby, Python, and C++. MapReduce programs are parallel in nature and are therefore very useful for performing large-scale data analysis using multiple machines in a cluster. MapReduce programs work in two phases: a Map phase and a Reduce phase. The input to each phase is a set of key-value pairs. In addition, the programmer needs to specify two functions: a map function and a reduce function. The whole process goes through three phases of execution.

How MapReduce works: let's understand this with an example. Consider the following input data for your MapReduce program:

Welcome to Hadoop Class
Hadoop is good
Hadoop is bad

The final output of the MapReduce task is:

bad      1
Class    1
good     1
Hadoop   3
is       2
to       1
Welcome  1

The data goes through the following phases. Input Splits: the input to a MapReduce job is …
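For readers who want to see the two phases in code, below is a minimal word-count sketch against the Hadoop MapReduce Java API (the class name and input/output paths are illustrative): the mapper emits a (word, 1) pair for every word in its split, and the reducer sums the values per word, which yields exactly the counts shown above.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts for each word
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```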

HDFS Tutorial: Read & Write Commands using Java API

Hadoop comes with a distributed file system called HDFS (Hadoop Distributed File System). Hadoop-based applications make use of HDFS. HDFS is designed for storing very large data files, running on clusters of commodity hardware. It is fault tolerant, scalable, and extremely simple to expand. Do you know? When data exceeds the capacity of storage on a single physical machine, it becomes essential to divide it across a number of separate machines. A file system that manages storage-specific operations across a network of machines is called a distributed file system. In this tutorial we will learn about:

Read Operation
Write Operation
Access HDFS using the Java API
Access HDFS using the command-line interface

An HDFS cluster primarily consists of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data. NameNode: the NameNode can be considered the master of the system. It maintains the file system tree and the metadata for all the files …
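As a small taste of the Java API covered in the tutorial, here is a minimal sketch that writes a file to HDFS and reads it back. The fs.defaultFS address and the file path are assumptions for a single-node setup; adjust them for your cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Address of the NameNode; an assumption for a local single-node cluster
    conf.set("fs.defaultFS", "hdfs://localhost:9000");
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical file path used for this illustration
    Path file = new Path("/user/hduser_/example.txt");

    // Write: the client contacts the NameNode, then streams the data to DataNodes
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.writeUTF("Hello HDFS");
    }

    // Read: open a stream that fetches the blocks back from the DataNodes
    try (FSDataInputStream in = fs.open(file)) {
      System.out.println(in.readUTF());
    }

    fs.close();
  }
}
```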

Hadoop Setup Tutorial - Installation & Configuration

Prerequisites: You must have Ubuntu installed and running, and you must have Java installed.

Step 1) Add a Hadoop system user using the commands below:

sudo addgroup hadoop_
sudo adduser --ingroup hadoop_ hduser_

Enter your password, name and other details.

NOTE: There is a possibility of the error mentioned below during this setup and installation process:

"hduser is not in the sudoers file. This incident will be reported."

This error can be resolved by logging in as the root user, executing the command

sudo adduser hduser_ sudo

and then re-logging in as hduser_.

Step 2) Configure SSH. In order to manage the nodes in a cluster, Hadoop requires SSH access. First, switch user by entering the following command:

su - hduser_

The following command will create a new key:

ssh-keygen -t rsa -P ""

Enable SSH access to the local machine using this key:

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Now test the SSH setup by connecting to localhost as …
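A quick sanity check at this point (the teaser cuts off here, but presumably this is the next step) is to open and close an SSH session to the local machine as hduser_, confirming that the key-based login works without a password prompt:

ssh localhost
exit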