Performance Tuning Of Apache Hadoop Framework In Big Data Processing With Respect To Block Size, Operating System Clusters And Map Reduce Techniques

  • Mr. Brijesh Y. Joshi, Dr. Poornashankar
Keywords: Big Data, Apache Hadoop, Operating System, CentOS, Ubuntu, Block size, Replication factor.

Abstract

The term big data is applied to data sets that cannot be processed by traditional relational database applications, owing to data volumes measured in terabytes (TB), petabytes (PB), exabytes (EB), and zettabytes (ZB) and to varying data types: structured, unstructured, and semi-structured. The open-source Apache Hadoop framework is available to store and process big data. It provides a storage component, the Hadoop Distributed File System (HDFS), and a MapReduce processing component that lets users express their big-data processing logic as map and reduce functions. Apache Hadoop also offers various configuration files whose settings can be modified; a fresh deployment starts with default values. This research work focuses on which operating system to install Apache Hadoop on, using EC2 instances on AWS, for better performance, and on customizing the data block size configuration parameter to reduce the execution time of MapReduce jobs and thereby tune performance.
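As a concrete illustration of both points in the abstract, the sketch below shows the canonical WordCount job written against the Hadoop MapReduce Java API, with a per-job override of the HDFS block size via the dfs.blocksize property. The 256 MB value and the command-line input/output paths are illustrative assumptions, not the settings evaluated in this paper.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Illustrative per-job override of the HDFS block size: 256 MB in bytes
    // instead of the 128 MB default of Hadoop 2.x and later. It applies to
    // files this job writes; a cluster-wide change would set the same
    // dfs.blocksize property in hdfs-site.xml.
    conf.set("dfs.blocksize", "268435456");

    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Such a job would typically be packaged into a jar and submitted with hadoop jar wordcount.jar WordCount <input> <output>; because HDFS splits each input file into blocks and launches one map task per block, a larger block size means fewer, longer-running map tasks, which is the lever the block size experiments exploit.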

Published
2021-08-01
How to Cite
Dr. Poornashankar, M. Y. J. (2021). Performance Tuning Of Apache Hadoop Framework In Big Data Processing With Respect To Block Size, Operating System Clusters And Map Reduce Techniques. Design Engineering, 5766-5778. Retrieved from http://thedesignengineering.com/index.php/DE/article/view/3074
Section
Articles