Hadoop

How to Install and Configure Apache Hadoop on a Single Node in CentOS 7
Apache Hadoop is an open source framework built for distributed storage and processing of big data across computer clusters. The project is based on...
Install and Configure Apache Oozie Workflow Scheduler for CDH 4.X on RHEL/CentOS 6/5
Oozie is an open source scheduler for Hadoop that simplifies workflow and coordination between jobs. We can define dependencies between jobs for an inpu...
Install Hadoop Multinode Cluster using CDH4 in RHEL/CentOS 6.5
Hadoop is an open source programming framework developed by Apache to process big data. It uses HDFS (Hadoop Distributed File System) to store the data...
How to run Hadoop without using SSH
The start-all.sh and stop-all.sh scripts in the hadoop/bin directory will use SSH to launch some of the Hadoop daemons. If for some reason SSH is not ...
How To Modify Hadoop Log Level
By default, Hadoop's log level is set to INFO. This can be too much for most instances, as it will generate huge log files, even in an environment wit...
Understanding the Hadoop MapReduce framework
This post introduces the MapReduce framework that enables you to write applications that process vast amounts of data, in parallel, on large clusters ...
HDPCA Practice Exam Questions and AWS Instance Setup Details
Before taking the HDPCA exam, you can get a feel for it by using the HDPCA practice exam on the AWS cloud. The practice exam is very similar to the...
How to configure Capacity Scheduler Queues Using YARN Queue Manager
Note: This post is part of the HDPCA exam objective series. Capacity Scheduler is mainly designed for multitenancy, where multiple organizations col...
HDPCA Exam Objective - Configure HiveServer2 HA (Part 2 - Configure HA)
Note: This post is part of the HDPCA exam objective series. Hive first started with HiveServer1. However, this version of the Hive server was not ve...