• Hadoop -- Hadoop Benchmarks (Read/Write)


    Hadoop ships with several built-in benchmark programs; this article uses hadoop-2.6.0.

    I. The Hadoop Test jar

    [root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar
    An example program must be given as the first argument.
    Valid program names are:
    DFSCIOTest: Distributed i/o benchmark of libhdfs.
    DistributedFSCheck: Distributed checkup of the file system consistency.
    JHLogAnalyzer: Job History Log analyzer.
    MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
    SliveTest: HDFS Stress Test and Live Data Verification.
    TestDFSIO: Distributed i/o benchmark.
    fail: a job that always fails
    filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
    largesorter: Large-Sort tester
    loadgen: Generic map/reduce load generator
    mapredtest: A map/reduce test check.
    minicluster: Single process HDFS and MR cluster.
    mrbench: A map/reduce benchmark that can create many small jobs
    nnbench: A benchmark that stresses the namenode.
    sleep: A job that sleeps at each map and reduce task.
    testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
    testfilesystem: A test for FileSystem read/write.
    testmapredsort: A map/reduce program that validates the map-reduce framework’s sort.
    testsequencefile: A test for flat files of binary key value pairs.
    testsequencefileinputformat: A test for sequence file input format.
    testtextinputformat: A test for text input format.
    threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill

    These programs test Hadoop from multiple angles; TestDFSIO, mrbench, and nnbench are three of the most widely used.

    1. TestDFSIO

    ① TestDFSIO write

    Measures how fast HDFS can be written.

    TestDFSIO usage:

    Usage: TestDFSIO [genericOptions] -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]

    Write data into HDFS: 10 files of 10 MB each, stored under /benchmarks/TestDFSIO/io_data.

    [root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar TestDFSIO -write -nrFiles 10 -size 10MB
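
    After the run you can also list the files that were just written, directly on HDFS (a quick sanity check, not part of the benchmark itself):

    [root@master hadoop-2.6.0]# hadoop fs -ls /benchmarks/TestDFSIO/io_data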

    After the job finishes, view the write results. TestDFSIO appends a summary to TestDFSIO_results.log in the current local directory (the location can be changed with -resFile):

    [root@master hadoop-2.6.0]# cat TestDFSIO_results.log
    ----- TestDFSIO ----- : write
    Date & time: Fri Sep 23 19:21:01 CST 2016
    Number of files: 10
    Total MBytes processed: 100.0
    Throughput mb/sec: 1.7217037980785785
    Average IO rate mb/sec: 1.9971516132354736
    IO rate std deviation: 0.9978736646901237
    Test exec time sec: 81.711
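
    A note on reading these numbers (an interpretation of how TestDFSIO aggregates its per-file statistics, not something printed in the log): "Throughput mb/sec" is the total data volume divided by the combined I/O time of all map tasks, here roughly 100 MB / 58 s ≈ 1.72 MB/s, while "Average IO rate mb/sec" is the arithmetic mean of the ten individual per-file rates, which is why the two values differ. "Test exec time" additionally includes MapReduce job setup and scheduling overhead, so it is much larger than the pure I/O time.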

    ② TestDFSIO read

    Measures how fast HDFS can be read.

    Read back from HDFS the 10 files of 10 MB each that were written in the previous step (the file count and size should match the write run):

    [root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar TestDFSIO -read -nrFiles 10 -size 10

    [root@master hadoop-2.6.0]# cat TestDFSIO_results.log

    ----- TestDFSIO ----- : write
    Date & time: Fri Sep 23 19:21:01 CST 2016
    Number of files: 10
    Total MBytes processed: 100.0
    Throughput mb/sec: 1.7217037980785785
    Average IO rate mb/sec: 1.9971516132354736
    IO rate std deviation: 0.9978736646901237
    Test exec time sec: 81.711

    ----- TestDFSIO ----- : read
    Date & time: Fri Sep 23 19:37:21 CST 2016
    Number of files: 10
    Total MBytes processed: 100.0
    Throughput mb/sec: 14.85001485001485
    Average IO rate mb/sec: 16.221948623657227
    IO rate std deviation: 4.983088493832205
    Test exec time sec: 50.188
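
    In this run the read throughput (about 14.9 MB/s) is roughly an order of magnitude higher than the write throughput (about 1.7 MB/s). On a small cluster that gap is not unusual: every HDFS write has to be pipelined through all replicas before it is acknowledged, while a read can often be served from a block on the local DataNode.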

    ③ Clean up the test data

    [root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar TestDFSIO -clean

    This deletes the entire /benchmarks/TestDFSIO directory (control files, test data, and results) from HDFS.
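
    To double-check, listing the parent directory afterwards should show that /benchmarks/TestDFSIO is gone (a quick verification step, not required by the benchmark):

    [root@master hadoop-2.6.0]# hadoop fs -ls /benchmarks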

    2. nnbench [NameNode benchmark]

    nnbench is used to load-test the NameNode: it generates a large number of HDFS-related requests, putting considerable pressure on the NameNode.

    The test can exercise file create, read, rename, and delete operations on HDFS.

    nnbench usage:

    [root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar nnbench
    NameNode Benchmark 0.4
    Usage: nnbench
    Options:
    -operation

    The following example uses 10 mappers and 5 reducers to create 1000 files:

    [root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar nnbench -operation create_write -maps 10 -reduces 5 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true
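
    To exercise the read, rename, and delete paths as well, the same jar can be rerun with the other operations. The sketch below assumes the file count and map/reduce settings are kept identical to the create_write run above:

    [root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar nnbench -operation open_read -maps 10 -reduces 5 -numberOfFiles 1000 -readFileAfterOpen true
    [root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar nnbench -operation rename -maps 10 -reduces 5 -numberOfFiles 1000
    [root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar nnbench -operation delete -maps 10 -reduces 5 -numberOfFiles 1000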

    3. mrbench [MapReduce benchmark]

    mrbench runs a small job repeatedly, to check whether small jobs run efficiently and with repeatable results on the cluster.

    [root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar mrbench --help
    MRBenchmark.0.0.2
    Usage: mrbench [-baseDir <base DFS path for output/input, default is /benchmarks/MRBench>] [-jar <local path to job jar file containing Mapper and Reducer implementations, default is current jar file>] [-numRuns <number of times to run the job, default is 1>] [-maps <number of maps for each run, default is 2>] [-reduces <number of reduces for each run, default is 1>] [-inputLines <number of input lines to generate, default is 1>] [-inputType <type of input to generate, one of ascending (default), descending, random>] [-verbose]

    The following example runs a small job 50 times:

    [root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar mrbench -numRuns 50

    When all 50 runs finish, mrbench prints a one-line summary including the average job execution time (AvgTime, in milliseconds).

    II. The Hadoop Examples jar

    [root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar
    An example program must be given as the first argument.
    Valid program names are:
    aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
    aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
    bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
    dbcount: An example job that count the pageview counts from a database.
    distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
    grep: A map/reduce program that counts the matches of a regex in the input.
    join: A job that effects a join over sorted, equally partitioned datasets
    multifilewc: A job that counts words from several files.
    pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
    pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
    randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
    randomwriter: A map/reduce program that writes 10GB of random data per node.
    secondarysort: An example defining a secondary sort to the reduce.
    sort: A map/reduce program that sorts the data written by the random writer.
    sudoku: A sudoku solver.
    teragen: Generate data for the terasort
    terasort: Run the terasort
    teravalidate: Checking results of terasort
    wordcount: A map/reduce program that counts the words in the input files.
    wordmean: A map/reduce program that counts the average length of the words in the input files.
    wordmedian: A map/reduce program that counts the median length of the words in the input files.
    wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

    The most commonly used of these is wordcount; a minimal run is sketched below.
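
    A minimal wordcount run, using arbitrary example paths under /tmp/wc (any input directory will do, and the output directory must not exist before the job is submitted):

    [root@master hadoop-2.6.0]# hadoop fs -mkdir -p /tmp/wc/input
    [root@master hadoop-2.6.0]# hadoop fs -put etc/hadoop/*.xml /tmp/wc/input
    [root@master hadoop-2.6.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /tmp/wc/input /tmp/wc/output
    [root@master hadoop-2.6.0]# hadoop fs -cat /tmp/wc/output/part-r-00000 | head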
    ————————————————
    Copyright notice: this is an original article by the CSDN blogger "Polaris-zlf", licensed under the CC 4.0 BY-SA agreement. Please include the original source link and this notice when reposting.
    Original link: https://blog.csdn.net/u012689336/article/details/52635513
