• Splitting and Merging Large Files on Linux



    1. Splitting files - split

    On Linux, the split command makes it easy to cut a large file into smaller pieces.

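    [1] Command syntax

    # -a: length of the output file name suffix (default 2: aa, ab, ...)
    # -d: use numeric suffixes instead of alphabetic ones
    # -l: split by lines (number of lines per piece; default 1000)
    # -b: split by bytes (SIZE takes a unit such as k or m)
    # -C: split by size while keeping lines whole (at most SIZE bytes of complete lines per piece)
    split [-a <suffix length>] [-d] [-l <lines>] [-b <bytes>] [-C <bytes>] [file to split] [output prefix]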
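
    The difference between -b and -C is easiest to see on a tiny file. The following is a minimal sketch, assuming GNU split; the scratch directory and the input.txt, bytes_ and lines_ names are hypothetical and exist only for this demo.

```shell
# Sketch: compare -b (raw byte split) with -C (line-preserving byte split).
# All file names here are hypothetical; sizes are chosen to keep the demo tiny.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Four lines of 6 bytes each (5 letters plus a newline): 24 bytes in total.
printf 'aaaaa\nbbbbb\nccccc\nddddd\n' > "$workdir/input.txt"

# -b 10 cuts every 10 bytes, even in the middle of a line: 10 + 10 + 4 bytes.
split -b 10 "$workdir/input.txt" "$workdir/bytes_"

# -C 10 keeps lines whole: only one 6-byte line fits into 10 bytes per piece.
split -C 10 "$workdir/input.txt" "$workdir/lines_"

bytes_pieces=$(ls "$workdir"/bytes_* | wc -l)
lines_pieces=$(ls "$workdir"/lines_* | wc -l)

echo "-b pieces: $bytes_pieces, -C pieces: $lines_pieces"
```

    With -b the second line ends up spread over two pieces, while every piece produced by -C starts and ends on a line boundary.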

    [2] Examples

    # split by line count
    $ split -l 300000 users.sql /data/users_

    # use numeric suffixes
    $ split -d -l 300000 users.sql /data/users_

    # split by byte size
    $ split -d -b 100m users.sql /data/users_
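
    To try these options without a large users.sql at hand, the same commands can be run on a small generated file. This is only a sketch, assuming GNU split and seq; sample.txt and the part_ prefix are hypothetical names, and -a 3 is added to show how the suffix length changes the piece names.

```shell
# Sketch: line-based split with numeric suffixes of length 3.
# sample.txt and the part_ prefix are hypothetical names for this demo.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Build a 2500-line sample file.
seq 1 2500 > "$workdir/sample.txt"

# 1000 lines per piece, numeric suffixes, 3 characters wide:
# part_000 and part_001 get 1000 lines each, part_002 gets the last 500.
split -d -a 3 -l 1000 "$workdir/sample.txt" "$workdir/part_"

piece_count=$(ls "$workdir"/part_* | wc -l)
last_lines=$(wc -l < "$workdir/part_002")

echo "pieces: $piece_count, lines in last piece: $last_lines"
```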

    [3] Help output

    # help output
    $ split --help
    Usage: split [OPTION]... [FILE [PREFIX]]
    Output pieces of FILE to PREFIXaa, PREFIXab, ...;
    default size is 1000 lines, and default PREFIX is 'x'.

    With no FILE, or when FILE is -, read standard input.

    Mandatory arguments to long options are mandatory for short options too.
      -a, --suffix-length=N   generate suffixes of length N (default 2)
          --additional-suffix=SUFFIX  append an additional SUFFIX to file names
      -b, --bytes=SIZE        put SIZE bytes per output file
      -C, --line-bytes=SIZE   put at most SIZE bytes of records per output file
      -d                      use numeric suffixes starting at 0, not alphabetic
          --numeric-suffixes[=FROM]  same as -d, but allow setting the start value
      -e, --elide-empty-files  do not generate empty output files with '-n'
          --filter=COMMAND    write to shell COMMAND; file name is $FILE
      -l, --lines=NUMBER      put NUMBER lines/records per output file
      -n, --number=CHUNKS     generate CHUNKS output files; see explanation below
      -t, --separator=SEP     use SEP instead of newline as the record separator;
                                '\0' (zero) specifies the NUL character
      -u, --unbuffered        immediately copy input to output with '-n r/...'
          --verbose           print a diagnostic just before each
                                output file is opened
          --help     display this help and exit
          --version  output version information and exit

    The SIZE argument is an integer and optional unit (example: 10K is 10*1024).
    Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (powers of 1000).

    CHUNKS may be:
      N       split into N files based on size of input
      K/N     output Kth of N to stdout
      l/N     split into N files without splitting lines/records
      l/K/N   output Kth of N to stdout without splitting lines/records
      r/N     like 'l' but use round robin distribution
      r/K/N   likewise but only output Kth of N to stdout

    GNU coreutils online help: http://www.gnu.org/software/coreutils/
    Full documentation at: http://www.gnu.org/software/coreutils/split
    or available locally via: info '(coreutils) split invocation'
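
    Two options from the help text are worth a quick illustration: -n with a CHUNKS value and --filter. The sketch below assumes GNU split and gzip are installed; data.txt and the chunk_ and gz_ prefixes are hypothetical names chosen for the demo.

```shell
# Sketch: split into a fixed number of pieces, and compress pieces on the fly.
# data.txt, chunk_ and gz_ are hypothetical names for this demo.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

seq 1 1000 > "$workdir/data.txt"

# -n l/4: four pieces of roughly equal size, never cutting a line in half.
# --additional-suffix appends .txt after the numeric suffix.
split -n l/4 -d --additional-suffix=.txt "$workdir/data.txt" "$workdir/chunk_"

# --filter pipes each piece through a shell command. split sets $FILE to the
# piece name; the single quotes stop the calling shell from expanding it.
split -l 400 --filter='gzip > "$FILE.gz"' "$workdir/data.txt" "$workdir/gz_"

chunk_count=$(ls "$workdir"/chunk_*.txt | wc -l)
gz_count=$(ls "$workdir"/gz_*.gz | wc -l)

# Decompressing the pieces in name order yields the original line count.
restored_lines=$(gzip -dc "$workdir"/gz_*.gz | wc -l)

echo "chunks: $chunk_count, gz pieces: $gz_count, restored lines: $restored_lines"
```

    Using --filter avoids writing the uncompressed pieces to disk first, which can matter when the input is very large.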

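    2. Merging files - cat

    On Linux, the cat command makes it just as easy to merge several small files back into one.

    [1] Command syntax

    # -n: number all output lines
    # -e: show non-printing characters and mark the end of each line with $ (same as -vE)
    # -t: show non-printing characters and display TAB as ^I (same as -vT)
    cat [-n] [-e] [-t] [files to merge ...]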
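
    The display options are easiest to understand on a line or two of input. A minimal sketch, feeding hypothetical sample text to cat through a pipe:

```shell
# Sketch: what -n, -e and -t do to the output (the sample text is made up).
set -eu

# -n prefixes every output line with its line number.
numbered=$(printf 'one\ntwo\n' | cat -n)

# -e marks each line end with $, -t shows the TAB character as ^I.
visible=$(printf 'a\tb\n' | cat -e -t)

echo "$numbered"
echo "$visible"
```

    None of these options are needed for merging; they only change how the content is displayed, so they should be left out when the output is redirected into the merged file.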

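    [2] Examples

    # merge the pieces (the shell expands the glob in sorted order, so the pieces are concatenated in sequence)
    $ cat /data/users_* > users.sql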
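
    After merging, it is worth checking that the result is byte-for-byte identical to the original. The following sketch runs a full split and merge round trip on a generated file and compares checksums; it assumes GNU split and sha256sum, and original.txt, restored.txt and the piece_ prefix are hypothetical names.

```shell
# Sketch: split, merge, and verify the round trip with a checksum.
# original.txt, restored.txt and piece_ are hypothetical names for this demo.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# A sample file of a few hundred kilobytes.
seq 1 100000 > "$workdir/original.txt"

# Cut it into 100 KiB pieces with numeric suffixes.
split -d -b 100K "$workdir/original.txt" "$workdir/piece_"

# The glob expands in sorted order: piece_00, piece_01, ...
cat "$workdir"/piece_* > "$workdir/restored.txt"

# Reading from stdin keeps the file name out of the checksum line.
original_sum=$(sha256sum < "$workdir/original.txt")
restored_sum=$(sha256sum < "$workdir/restored.txt")

if [ "$original_sum" = "$restored_sum" ]; then
    echo "checksums match"
else
    echo "checksums differ" >&2
    exit 1
fi
```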

    [3] Help output

    # help output
    $ cat --help
    Usage: cat [OPTION]... [FILE]...
    Concatenate FILE(s) to standard output.

    With no FILE, or when FILE is -, read standard input.

      -A, --show-all           equivalent to -vET
      -b, --number-nonblank    number nonempty output lines, overrides -n
      -e                       equivalent to -vE
      -E, --show-ends          display $ at end of each line
      -n, --number             number all output lines
      -s, --squeeze-blank      suppress repeated empty output lines
      -t                       equivalent to -vT
      -T, --show-tabs          display TAB characters as ^I
      -u                       (ignored)
      -v, --show-nonprinting   use ^ and M- notation, except for LFD and TAB
          --help     display this help and exit
          --version  output version information and exit

    Examples:
      cat f - g  Output f's contents, then standard input, then g's contents.
      cat        Copy standard input to standard output.

    GNU coreutils online help: http://www.gnu.org/software/coreutils/
    Full documentation at: http://www.gnu.org/software/coreutils/cat
    or available locally via: info '(coreutils) cat invocation'

    Author: Escape
    Article link: https://www.escapelife.site/posts/72f237d3.html

  • Original source: https://blog.csdn.net/mengmeng_921/article/details/126953885