Hadoop Quick Start, Part 1

http://localhost:50070/
http://localhost:8088/cluster

1 Troubleshooting

➜  /Users/zhaoshuai11/work/hadoop-2.7.3 hadoop fs -ls /
22/07/30 09:43:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

This warning is harmless. To silence it, add the following line to etc/hadoop/log4j.properties:
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR

22/07/30 09:58:59 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Why does running an MR job contact YARN first?

2 HDFS, the distributed file system

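Large data volumes: scale out horizontally across many machines; capacity can in theory grow without limit.
Metadata records: the metadata records each file and where it is stored, so a file can be located quickly.
Block storage: a file is split into blocks stored on different machines, and operating on blocks in parallel improves throughput.
Replication: copies are kept on different machines; this redundant storage protects the data.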
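
The block and replica ideas above can be sketched in a few lines of Python. This is a toy model, not how HDFS is implemented: the 128 MB block size and the replication factor of 3 match the Hadoop 2.x defaults, but the node names and the round-robin placement are assumptions made for illustration (real HDFS placement is rack-aware).

```python
BLOCK_SIZE = 128 * 1024 * 1024  # Hadoop 2.x default block size (dfs.blocksize)
REPLICATION = 3                 # default replication factor (dfs.replication)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Toy round-robin placement: block i goes to `replication` distinct nodes.

    Hypothetical policy for illustration only; HDFS itself is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        i: [nodes[(i + r) % len(nodes)] for r in range(replication)]
        for i in range(num_blocks)
    }


# A 300 MB file becomes two full 128 MB blocks plus one 44 MB block.
blocks = split_into_blocks(300 * 1024 * 1024)
# The "metadata" here is simply the block -> nodes mapping.
metadata = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
for block_id, holders in metadata.items():
    print(block_id, blocks[block_id], holders)
```

Losing any single node in this model still leaves two copies of every block, which is the point of the replica mechanism.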

➜  /Users/zhaoshuai11 hadoop fs -put java_error_in_idea_14849.log /itcast
➜  /Users/zhaoshuai11 hadoop fs -ls -h /itcast
Found 4 items
-rw-r--r--   1 zhaoshuai11 supergroup         21 2022-07-30 09:55 /itcast/hello.txt
-rw-r--r--   1 zhaoshuai11 supergroup    641.9 K 2022-07-30 09:48 /itcast/java_error_in_idea_11861.log
-rw-r--r--   1 zhaoshuai11 supergroup     98.9 K 2022-07-30 10:34 /itcast/java_error_in_idea_14849.log
drwxr-xr-x   - zhaoshuai11 supergroup          0 2022-07-30 09:59 /itcast/wordcount

➜  /Users/zhaoshuai11 hadoop fs -cat /itcast/hello.txt
hadoop hadoop hadoop
➜  /Users/zhaoshuai11 hadoop fs -get /itcast/wordcount/output/part-r-00000 ./
➜  /Users/zhaoshuai11 ll
-rw-r--r--   1 zhaoshuai11  staff     9B  7 30 10:41 part-r-00000
➜  /Users/zhaoshuai11 cat part-r-00000
hadoop	3

Merging small files:

➜  /Users/zhaoshuai11 hadoop fs -appendToFile login.sh login-ext.sh /itcast/hello.txt
➜  /Users/zhaoshuai11 hadoop fs -cat /itcast/hello.txt
		hadoop hadoop hadoop
		#!/usr/bin/expect -f
		set host "relay.baidu-int.com"
		set username "zhaoshuai11"
		set password "zs19961211."
		
		spawn ssh $username@$host
		expect "*Please input user's password*" {send "$password\r"}
		interact#!/bin/sh
		basepath=$(cd `dirname $0`; pwd)
		export LC_CTYPE=en_US
		#expect脚本所在位置
		filepath=$1
		if [ -f $filepath ]; then
		  expect $filepath
		else
		  echo "$filepath not exits"
		fi%
➜  /Users/zhaoshuai11
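
The fused line `interact#!/bin/sh` in the output suggests that the sources are appended byte for byte, with no separator added between them. A minimal Python sketch of that behavior, using made-up file contents (the function name `append_to_file` is hypothetical, not a Hadoop API):

```python
def append_to_file(existing, sources):
    """Append each source to the existing bytes, in order, with no separator."""
    return existing + b"".join(sources)


target = b"hadoop hadoop hadoop\n"
first = b"spawn ssh\ninteract"        # note: no trailing newline
second = b"#!/bin/sh\necho done\n"

merged = append_to_file(target, [first, second])
print(merged.decode())
```

Ending each small file with a newline avoids the fused line.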

3 HDFS roles explained

4 Uploading a file to HDFS

5 MR

6 MR characteristics

7 MR limitations

8 MR instance processes

9 MR official example: estimating pi

➜  /Users/zhaoshuai11/work/hadoop-2.7.3/share/hadoop/mapreduce hadoop jar hadoop-mapreduce-examples-2.7.3.jar pi 2 2
Number of Maps  = 2
Samples per Map = 2
Wrote input for Map #0
Wrote input for Map #1
Starting Job
22/07/30 15:38:57 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/07/30 15:38:57 INFO input.FileInputFormat: Total input paths to process : 2
22/07/30 15:38:58 INFO mapreduce.JobSubmitter: number of splits:2
22/07/30 15:38:59 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1659144775026_0003
22/07/30 15:38:59 INFO impl.YarnClientImpl: Submitted application application_1659144775026_0003
22/07/30 15:38:59 INFO mapreduce.Job: The url to track the job: http://127.0.0.1:8088/proxy/application_1659144775026_0003/
22/07/30 15:38:59 INFO mapreduce.Job: Running job: job_1659144775026_0003
22/07/30 15:39:09 INFO mapreduce.Job: Job job_1659144775026_0003 running in uber mode : false
22/07/30 15:39:09 INFO mapreduce.Job:  map 0% reduce 0%
22/07/30 15:39:15 INFO mapreduce.Job:  map 100% reduce 0%
22/07/30 15:39:24 INFO mapreduce.Job:  map 100% reduce 100%
22/07/30 15:39:25 INFO mapreduce.Job: Job job_1659144775026_0003 completed successfully
22/07/30 15:39:25 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=50
		FILE: Number of bytes written=357444
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=540
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=11
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=8543
		Total time spent by all reduces in occupied slots (ms)=4822
		Total time spent by all map tasks (ms)=8543
		Total time spent by all reduce tasks (ms)=4822
		Total vcore-milliseconds taken by all map tasks=8543
		Total vcore-milliseconds taken by all reduce tasks=4822
		Total megabyte-milliseconds taken by all map tasks=8748032
		Total megabyte-milliseconds taken by all reduce tasks=4937728
	Map-Reduce Framework
		Map input records=2
		Map output records=4
		Map output bytes=36
		Map output materialized bytes=56
		Input split bytes=304
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=56
		Reduce input records=4
		Reduce output records=0
		Spilled Records=8
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=201
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=534773760
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=236
	File Output Format Counters
		Bytes Written=97
Job Finished in 28.108 seconds
Estimated value of Pi is 4.00000000000000000000
➜  /Users/zhaoshuai11/work/hadoop-2.7.3/share/hadoop/mapreduce
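
With only 2 maps of 2 samples each, the job tests four points in total, which is why the estimate comes out as exactly 4. The sketch below reproduces that in plain Python. It assumes the example draws its points from a Halton sequence (bases 2 and 3) inside a unit square and counts those that fall within the inscribed circle. That matches my reading of the bundled example but should be treated as an assumption, and the function names are my own.

```python
import math


def radical_inverse(index, base):
    """Return the Halton (van der Corput) value of `index` in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def estimate_pi(num_points):
    """Estimate pi from the first num_points Halton points in the unit square."""
    inside = 0
    for i in range(1, num_points + 1):
        # Shift so the circle of radius 0.5 is centered on the origin.
        x = radical_inverse(i, 2) - 0.5
        y = radical_inverse(i, 3) - 0.5
        if x * x + y * y <= 0.25:
            inside += 1
    # The circle-to-square area ratio is pi/4.
    return 4.0 * inside / num_points


print(estimate_pi(4))        # 2 maps x 2 samples: all four points land inside
print(estimate_pi(100_000))  # many more samples give a much closer estimate
```

Rerunning the job with larger arguments, for example `pi 10 1000`, should likewise move the estimate toward 3.14.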

22/07/30 15:38:57 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
This line is the client asking YARN for resources.

10 MR wordcount: counting words

➜  /Users/zhaoshuai11/work/hadoop-2.7.3/share/hadoop/mapreduce hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /itcast/hello.txt /itcast/hello-count
22/07/30 15:49:26 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/07/30 15:49:26 INFO input.FileInputFormat: Total input paths to process : 1
22/07/30 15:49:27 INFO mapreduce.JobSubmitter: number of splits:1
22/07/30 15:49:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1659144775026_0004
22/07/30 15:49:27 INFO impl.YarnClientImpl: Submitted application application_1659144775026_0004
22/07/30 15:49:27 INFO mapreduce.Job: The url to track the job: http://127.0.0.1:8088/proxy/application_1659144775026_0004/
22/07/30 15:49:27 INFO mapreduce.Job: Running job: job_1659144775026_0004
22/07/30 15:49:37 INFO mapreduce.Job: Job job_1659144775026_0004 running in uber mode : false
22/07/30 15:49:37 INFO mapreduce.Job:  map 0% reduce 0%
22/07/30 15:49:43 INFO mapreduce.Job:  map 100% reduce 0%
22/07/30 15:49:49 INFO mapreduce.Job:  map 100% reduce 100%
22/07/30 15:49:50 INFO mapreduce.Job: Job job_1659144775026_0004 completed successfully
22/07/30 15:49:50 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=607
		FILE: Number of bytes written=238801
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=510
		HDFS: Number of bytes written=441
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3442
		Total time spent by all reduces in occupied slots (ms)=3708
		Total time spent by all map tasks (ms)=3442
		Total time spent by all reduce tasks (ms)=3708
		Total vcore-milliseconds taken by all map tasks=3442
		Total vcore-milliseconds taken by all reduce tasks=3708
		Total megabyte-milliseconds taken by all map tasks=3524608
		Total megabyte-milliseconds taken by all reduce tasks=3796992
	Map-Reduce Framework
		Map input records=18
		Map output records=47
		Map output bytes=591
		Map output materialized bytes=607
		Input split bytes=103
		Combine input records=47
		Combine output records=40
		Reduce input groups=40
		Reduce shuffle bytes=607
		Reduce input records=40
		Reduce output records=40
		Spilled Records=80
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=122
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=332398592
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=407
	File Output Format Counters
		Bytes Written=441
➜  /Users/zhaoshuai11/work/hadoop-2.7.3/share/hadoop/mapreduce

➜ /Users/zhaoshuai11/work/hadoop-2.7.3/share/hadoop/mapreduce hadoop fs -get /itcast/hello-count ./

➜  /Users/zhaoshuai11/work/hadoop-2.7.3/share/hadoop/mapreduce/hello-count cat part-r-00000
"$filepath	1
"$password\r"}	1
"*Please	1
"relay.baidu-int.com"	1
"zhaoshuai11"	1
"zs19961211."	1
#!/usr/bin/expect	1
#expect脚本所在位置	1
$0`;	1
$filepath	2
$username@$host	1
-f	2
LC_CTYPE=en_US	1
[	1
];	1
`dirname	1
basepath=$(cd	1
echo	1
else	1
exits"	1
expect	2
export	1
fi	1
filepath=$1	1
hadoop	3
host	1
if	1
input	1
interact#!/bin/sh	1
not	1
password	1
password*"	1
pwd)	1
set	3
spawn	1
ssh	1
then	1
user's	1
username	1
{send	1
➜  /Users/zhaoshuai11/work/hadoop-2.7.3/share/hadoop/mapreduce/hello-count
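
The output above can be read as the result of three steps: map emits `(word, 1)` for every whitespace-separated token, the combiner sums counts locally on the map side, and reduce produces the final totals sorted by key. A minimal single-process sketch in Python follows. It imitates the data flow only and assumes plain whitespace tokenization; the function names are illustrative, not Hadoop APIs.

```python
from collections import Counter


def map_phase(lines):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for line in lines for word in line.split()]


def combine(pairs):
    """Sum counts per word; the same logic serves as combiner and reducer."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return totals


def word_count(lines):
    """Run map -> combine -> reduce and return (word, count) sorted by word."""
    mapped = map_phase(lines)
    combined = combine(mapped)            # local pre-aggregation on the map side
    reduced = combine(combined.items())   # final aggregation on the reduce side
    return sorted(reduced.items())


lines = ["hadoop hadoop hadoop", "set host", "set username", "set password"]
for word, count in word_count(lines):
    print(f"{word}\t{count}")
```

The job counters show the same effect: 47 map output records shrink to 40 after the combiner, because repeated words such as `hadoop` and `set` are merged before the shuffle.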

11 Map

12 Reduce

Reduce tasks actively pull their input data from the map tasks.
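
The pull model, in which each reduce task fetches its data from the map tasks, can be sketched as follows in Python. Each map task partitions its output by key, and each reduce task collects its own partition from every map task. The names are hypothetical, and `zlib.crc32` stands in for the key hash, so the partition numbers will not match what Hadoop would compute.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick a reducer for a key (crc32 is a stand-in for the real key hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_task_output(pairs, num_reducers):
    """Group one map task's (key, value) pairs into per-reducer partitions."""
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[partition(key, num_reducers)].append((key, value))
    return partitions


def reduce_task_pull(reducer_id, all_map_outputs):
    """Pull this reducer's partition from every map task, then sort by key."""
    pulled = []
    for output in all_map_outputs:
        pulled.extend(output.get(reducer_id, []))
    return sorted(pulled)


map_outputs = [
    map_task_output([("hadoop", 1), ("set", 1)], num_reducers=2),
    map_task_output([("hadoop", 2), ("ssh", 1)], num_reducers=2),
]
for reducer_id in range(2):
    print(reducer_id, reduce_task_pull(reducer_id, map_outputs))
```

Because the partition depends only on the key, every record for a given word ends up at the same reduce task, whichever map task produced it.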

13 Shuffle

14 YARN

15 Interaction flow when a program is submitted to the YARN cluster

16 The resource scheduler and scheduling policies

Original article: https://blog.csdn.net/zs18753479279/article/details/126068179