Apache Hadoop, platform for the collection, processing and analysis of large data sets

Mateusz Gil

mateusz.gil5@gmail.com
Institute of Computer Science, Lublin University of Technology, Nadbystrzycka 36B, 20-618 Lublin, Poland

Abstract

The article presents the possibilities of using the Hadoop platform to manage large data sets. Drawing on available sources, it shows how the platform's performance has developed over time. It also describes organizations that have succeeded on the Internet thanks to deploying this software.


Keywords:

Hadoop; Big Data; data analysis

Published
2017-09-30

Gil, M. (2017). Apache Hadoop, platform for the collection, processing and analysis of large data sets. Journal of Computer Sciences Institute, 4, 70–75. https://doi.org/10.35784/jcsi.596
