• Common Hadoop commands


    Common Hadoop commands

    hadoop fs -mkdir /test
    hadoop fs -put /opt/frank/tb_test03.txt /test/
    hadoop fs -ls /test/
    hadoop fs -cat /test/tb_test03.txt
    hadoop fs -rm /test/tb_test03.txt

    hadoop dfs also works, but it is not recommended; running it prints:

    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.

    View HDFS filesystem usage:
    hadoop fs -du -s -h /
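The -h flag prints human-readable sizes (e.g. "3.2 G"). A minimal, hypothetical helper (not part of the original notes) to convert such sizes back to bytes, so scripts can compare usage numerically:

```python
# Hypothetical helper: parse the human-readable size printed by
# `hadoop fs -du -s -h`, e.g. "3.2 G", back into bytes (binary units).
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def parse_hdfs_size(text: str) -> int:
    """Convert a size like '3.2 G' or '512' to bytes."""
    parts = text.split()
    value = float(parts[0])
    unit = parts[1][0].upper() if len(parts) > 1 else ""
    return int(value * UNITS[unit])

print(parse_hdfs_size("3.2 G"))
```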


    Hive LOAD syntax

    LOAD DATA [ LOCAL ] INPATH '{file_path}' [ OVERWRITE ] INTO TABLE {table_name} [ PARTITION (partition_colname1="val1", partition_colname2="val2", ...) ];
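To make the optional clauses of this grammar explicit, here is a small hypothetical sketch (not in the original notes) that assembles a LOAD statement from parameters:

```python
# Hypothetical sketch: build a Hive LOAD DATA statement following the
# grammar above. LOCAL, OVERWRITE, and PARTITION are optional clauses.
def build_load(file_path, table, local=False, overwrite=False, partition=None):
    parts = ["LOAD DATA"]
    if local:
        parts.append("LOCAL")
    parts.append(f"INPATH '{file_path}'")
    if overwrite:
        parts.append("OVERWRITE")
    parts.append(f"INTO TABLE {table}")
    if partition:  # dict mapping partition column -> value
        cols = ", ".join(f"{k}='{v}'" for k, v in partition.items())
        parts.append(f"PARTITION ({cols})")
    return " ".join(parts) + ";"

stmt = build_load("/opt/frank/tb_test04_pt.txt", "tb_test04_pt",
                  local=True, overwrite=True, partition={"pt": "20240101"})
print(stmt)
```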

    LOAD examples:
    -- Load a file from the local OS filesystem into a Hive table
    LOAD DATA LOCAL INPATH '/opt/frank/tb_test03.txt' INTO TABLE tb_test03;
    LOAD DATA LOCAL INPATH '/opt/frank/tb_test03.txt' OVERWRITE INTO TABLE tb_test03;
    LOAD DATA LOCAL INPATH '/opt/frank/tb_test04_pt.txt' OVERWRITE INTO TABLE tb_test04_pt PARTITION(pt="20240101");

    -- Load a file from an HDFS directory into a Hive table (note: this moves the source file into the table's directory)
    hadoop fs -put /opt/frank/tb_test03.txt /test/
    LOAD DATA INPATH '/test/tb_test03.txt' INTO TABLE tb_test03;


    Check DataNode service status on a slave node:
    $ jps -v |grep DataNode
    $ hadoop dfsadmin -report


    Restart (stop, then start) the DataNode service on a slave node:
    $ ./sbin/hadoop-daemon.sh stop datanode
    $ ./sbin/hadoop-daemon.sh start datanode


    View DFS usage of the HDFS filesystem:
    $ hadoop fs -du -s -h /


    Empty the HDFS trash (may need to be run several times):
    $ hadoop fs -expunge


    Problem: DataNode usage maxed out and configured capacity shows 0 [DFS Used%: 100.00% & Configured Capacity: 0 (0 B)]
    $ hadoop dfsadmin -report
    The report shows usage maxed out:
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.

    Configured Capacity: 0 (0 B)
    Present Capacity: 0 (0 B)
    DFS Remaining: 0 (0 B)
    DFS Used: 0 (0 B)
    DFS Used%: NaN%
    Under replicated blocks: 76125
    Blocks with corrupt replicas: 0
    Missing blocks: 76125
    Missing blocks (with replication factor 1): 21993

    -------------------------------------------------
    Live datanodes (1):

    Name: 192.168.1.188:50010 (hadoop01)
    Hostname: hadoop01
    Decommission Status : Normal
    Configured Capacity: 0 (0 B)
    DFS Used: 0 (0 B)
    Non DFS Used: 0 (0 B)
    DFS Remaining: 0 (0 B)
    DFS Used%: 100.00%
    DFS Remaining%: 0.00%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 0
    Last contact: Mon Mar 25 17:02:43 CST 2024
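The report above is plain "key: value" lines, so the symptom is easy to detect in a script. A hypothetical sketch (not part of the original notes) that parses such output and flags a zero configured capacity:

```python
# Hypothetical sketch: parse the "key: value" lines of
# `hdfs dfsadmin -report` output and flag the symptom described above
# (Configured Capacity reported as 0 bytes).
def parse_report(report: str) -> dict:
    info = {}
    for line in report.splitlines():
        if ": " in line:
            key, _, value = line.partition(": ")
            info[key.strip()] = value.strip()
    return info

sample = """Configured Capacity: 0 (0 B)
DFS Used%: 100.00%
Hostname: hadoop01"""

info = parse_report(sample)
misconfigured = info["Configured Capacity"].startswith("0 ")
print(misconfigured)
```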


    The configured capacity shows as 0:
    Configured Capacity: 0 (0 B)

    After trying several approaches, the root cause turned out to be that the DataNode hostname in the slaves configuration file was set to localhost.
    Edit the slaves file (vi slaves), set the DataNode hostname to hadoop01, then restart the DataNode service; the problem was resolved.

    $ ./sbin/hadoop-daemon.sh stop datanode
    $ ./sbin/hadoop-daemon.sh start datanode
    $ hadoop dfsadmin -report
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.

    Configured Capacity: 98337751040 (91.58 GB)
    Present Capacity: 65340043264 (60.85 GB)
    DFS Remaining: 61911707648 (57.66 GB)
    DFS Used: 3428335616 (3.19 GB)
    DFS Used%: 5.25%
    Under replicated blocks: 73720
    Blocks with corrupt replicas: 0
    Missing blocks: 82
    Missing blocks (with replication factor 1): 21993

    -------------------------------------------------
    Live datanodes (1):

    Name: 192.168.1.188:50010 (hadoop01)
    Hostname: hadoop01
    Decommission Status : Normal
    Configured Capacity: 98337751040 (91.58 GB)
    DFS Used: 3428335616 (3.19 GB)
    Non DFS Used: 32997707776 (30.73 GB)
    DFS Remaining: 61911707648 (57.66 GB)
    DFS Used%: 3.49%
    DFS Remaining%: 62.96%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Mon Mar 25 17:27:05 CST 2024

    Hive table creation and LOAD:

    -- Regular table (TextFile storage format)
    drop table if exists testdb.tb_test03;
    create table testdb.tb_test03 (
    id int, 
    info string,
    cnt bigint)
    -- partitioned by (pt_sheng string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
    STORED AS TextFile
    -- STORED AS INPUTFORMAT 
    --    'org.apache.hadoop.mapred.TextInputFormat' 
    --  OUTPUTFORMAT 
    --    'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION 'hdfs://192.168.1.188:9000/user/hive/warehouse/testdb.db/tb_test03'
    ;
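The ROW FORMAT clause above declares fields terminated by ',' and lines terminated by '\n'. A hypothetical sketch (not in the original notes) that produces a data file matching that format, ready for LOAD DATA LOCAL INPATH:

```python
# Hypothetical sketch: write rows in the format the table declares
# (fields terminated by ',', lines terminated by '\n').
import csv
import io

rows = [(1, "jack", 95), (2, "frank", 96), (3, "lucy", 97), (4, "hack", 99)]
buf = io.StringIO()
writer = csv.writer(buf, delimiter=",", lineterminator="\n")
writer.writerows(rows)
data = buf.getvalue()
print(data)
```

In practice you would write to a real file (e.g. /opt/frank/tb_test03.txt) instead of an in-memory buffer.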

    --  hadoop fs -ls /user/hive/warehouse/testdb.db/tb_test03/

    show tables;
    show create table testdb.tb_test03;
    select * from testdb.tb_test03;

    -- Method 1:
    $ echo '1,jack,95
    2,frank,96
    3,lucy,97
    4,hack,99' > /opt/frank/tb_test03.txt

    -- hiveSQL: load from Local OS dir
    LOAD DATA LOCAL INPATH '/opt/frank/tb_test03.txt' OVERWRITE INTO TABLE tb_test03;

    -- Method 2:
    -- shell_cmd: copy the file to HDFS first, then LOAD from the HDFS path
    -- $  hadoop fs -rm /frank/tb_test03.txt
    -- $  hadoop fs -put /opt/frank/tb_test03.txt /frank/
    -- $  hadoop fs -cat /frank/tb_test03.txt
    -- -- hiveSQL: load from HDFS FileSystem dir
    -- LOAD DATA INPATH '/frank/tb_test03.txt' OVERWRITE INTO TABLE tb_test03;
    select * from tb_test03;

    -- After LOAD, a directory named after the table appears under the warehouse directory (set in the Hive config), containing the data file(s)
    $  hadoop fs -ls /user/hive/warehouse/testdb.db/tb_test03/

    drop table if exists testdb.tb_test03;
    create table testdb.tb_test03 (
    id int, 
    info string,
    cnt bigint)
    -- partitioned by (pt_sheng string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
    -- STORED AS TextFile
    STORED AS INPUTFORMAT 
       'org.apache.hadoop.mapred.TextInputFormat' 
    OUTPUTFORMAT 
       'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION 'hdfs://192.168.1.188:9000/user/hive/warehouse/testdb.db/tb_test03'
    ;

    show tables;
    show create table testdb.tb_test03;
    select * from testdb.tb_test03;


    -- Partitioned table (TextFile storage format)
    drop table if exists testdb.tb_test04_pt;
    create table testdb.tb_test04_pt (
    id int, 
    info string,
    cnt bigint)
    PARTITIONED BY (pt string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
    STORED AS TextFile
    LOCATION 'hdfs://192.168.1.188:9000/user/hive/warehouse/testdb.db/tb_test04_pt'
    ;

    show tables;
    show create table testdb.tb_test04_pt;
    select * from testdb.tb_test04_pt;

    echo '1,jack,95
    2,frank,96
    3,lucy,97
    4,hack,99' > /opt/frank/tb_test04_pt.txt

    LOAD DATA LOCAL INPATH '/opt/frank/tb_test04_pt.txt' OVERWRITE INTO TABLE tb_test04_pt PARTITION(pt="20240101");
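After loading into a partition, Hive stores the file under a "pt=<value>" subdirectory of the table location. A hypothetical sketch (not in the original notes) that builds the expected HDFS path for a quick check with hadoop fs -ls:

```python
# Hypothetical sketch: construct the expected HDFS directory for a
# Hive partition, e.g. .../tb_test04_pt/pt=20240101.
def partition_path(warehouse, db, table, **partition):
    subdirs = "/".join(f"{k}={v}" for k, v in partition.items())
    return f"{warehouse}/{db}.db/{table}/{subdirs}"

path = partition_path("/user/hive/warehouse", "testdb", "tb_test04_pt",
                      pt="20240101")
print(path)  # /user/hive/warehouse/testdb.db/tb_test04_pt/pt=20240101
```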


    To create a table stored in PARQUET format, specify STORED AS PARQUET instead.

  • Original article: https://blog.csdn.net/sunny05296/article/details/137031991