• Python Ops Learning Day01: Basic File Operations


    1. Walk all files under a directory

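    import os

    def getFileName(directory):
        """Return the paths of all files under directory, including subdirectories."""
        file_list = []
        for dir_name, sub_dir, file_name_list in os.walk(directory):
            # print(dir_name, sub_dir, file_name_list)
            for file in file_name_list:
                # '/' is accepted on Windows and POSIX; os.path.join(dir_name, file)
                # is the portable way to build the same path.
                file_path_abs = f'{dir_name}/{file}'
                file_list.append(file_path_abs)
        return file_list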
    
    1.1 This relies mainly on the os.walk function

    Let's look at how os.walk is used:

    In [6]: os.walk??
    Signature: os.walk(top, topdown=True, onerror=None, followlinks=False)
    Source:
    def walk(top, topdown=True, onerror=None, followlinks=False):
        """Directory tree generator.
    
        For each directory in the directory tree rooted at top (including top
        itself, but excluding '.' and '..'), yields a 3-tuple
    
            dirpath, dirnames, filenames
    
        dirpath is a string, the path to the directory.  dirnames is a list of
        the names of the subdirectories in dirpath (excluding '.' and '..').
        filenames is a list of the names of the non-directory files in dirpath.
        Note that the names in the lists are just names, with no path components.
        To get a full path (which begins with top) to a file or directory in
        dirpath, do os.path.join(dirpath, name).
    
        If optional arg 'topdown' is true or not specified, the triple for a
        directory is generated before the triples for any of its subdirectories
        (directories are generated top down).  If topdown is false, the triple
        for a directory is generated after the triples for all of its
        subdirectories (directories are generated bottom up).
    
        When topdown is true, the caller can modify the dirnames list in-place
        (e.g., via del or slice assignment), and walk will only recurse into the
        subdirectories whose names remain in dirnames; this can be used to prune the
        search, or to impose a specific order of visiting.  Modifying dirnames when
        topdown is false has no effect on the behavior of os.walk(), since the
        directories in dirnames have already been generated by the time dirnames
        itself is generated. No matter the value of topdown, the list of
        subdirectories is retrieved before the tuples for the directory and its
        subdirectories are generated.
    
        By default errors from the os.scandir() call are ignored.  If
        optional arg 'onerror' is specified, it should be a function; it
        will be called with one argument, an OSError instance.  It can
        report the error to continue with the walk, or raise the exception
        to abort the walk.  Note that the filename is available as the
        filename attribute of the exception object.
    
        By default, os.walk does not follow symbolic links to subdirectories on
        systems that support them.  In order to get this functionality, set the
        optional argument 'followlinks' to true.
    
        Caution:  if you pass a relative pathname for top, don't change the
        current working directory between resumptions of walk.  walk never
        changes the current directory, and assumes that the client doesn't
        either.
    
        Example:
    
        import os
        from os.path import join, getsize
        for root, dirs, files in os.walk('python/Lib/email'):
            print(root, "consumes", end="")
            print(sum(getsize(join(root, name)) for name in files), end="")
            print("bytes in", len(files), "non-directory files")
            if 'CVS' in dirs:
                dirs.remove('CVS')  # don't visit CVS directories
    
        """
        sys.audit("os.walk", top, topdown, onerror, followlinks)
        return _walk(fspath(top), topdown, onerror, followlinks)
    File:      c:\users\thinkpad\appdata\local\programs\python\python39\lib\os.py
    Type:      function
    
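    Signature: the function signature, i.e. the function name and its parameter list (a likely interview topic). Python does not support overloading, so a name refers to one function at a time; the signature describes how to call the function rather than distinguishing overloads.
    Source: the source code of the function.
    The function is a directory tree generator.
    For every directory in the tree rooted at top (including top itself, but excluding "." and ".."), it yields a 3-tuple (dirpath, dirnames, filenames).
    dirpath -> str: the path of the directory currently being visited.
    dirnames -> list: the names of the subdirectories directly inside dirpath (excluding "." for the directory itself and ".." for its parent).
    filenames -> list: the names of the non-directory files directly inside dirpath. An empty list means dirpath directly contains no files; a non-empty list does not mean dirpath is a leaf, because it can still have subdirectories.
    Taken together: the directory dirpath contains the directories in dirnames and the files in filenames.
    So to visit every file in every folder, loop over filenames; the path of each file is os.path.join(dirpath, filenames[x]).
    Type: function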
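
The 3-tuple and the in-place pruning mentioned in the docstring can be tried on a throwaway tree. The following is a minimal sketch, not part of the original script: `list_files`, `make_demo_tree`, the `skip_dirs` parameter, and the file names are hypothetical names chosen for illustration.

```python
import os
import tempfile

def list_files(top, skip_dirs=()):
    """Return sorted paths of all files under top, skipping directories named in skip_dirs."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune in place: with topdown=True (the default), os.walk only
        # descends into the names that remain in dirnames.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)

def make_demo_tree(root):
    """Create root/a.txt, root/sub/b.txt and root/cache/c.txt (hypothetical names)."""
    for rel in ("a.txt", os.path.join("sub", "b.txt"), os.path.join("cache", "c.txt")):
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as fobj:
            fobj.write(rel)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_demo_tree(root)
        # The "cache" directory is pruned, so c.txt is not listed.
        for path in list_files(root, skip_dirs=("cache",)):
            print(os.path.relpath(path, root))
```

Assigning to `dirnames[:]` rather than rebinding `dirnames` matters here: os.walk keeps a reference to the same list object, so only an in-place change affects which subdirectories it visits.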

    2. Compute the MD5 digest of a file

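    import hashlib

    def fileMD5(filePathAbs):
        """Return the hex MD5 digest of a file, reading it in 4096-byte chunks."""
        md5_tool = hashlib.md5()
        with open(filePathAbs, mode='rb') as fobj:
            while True:
                data = fobj.read(4096)
                if not data:  # empty bytes means end of file
                    break
                md5_tool.update(data)
        return md5_tool.hexdigest()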
    
    Here we use the md5 function of the hashlib module to compute a file's MD5 digest. First, let's look at the documentation of md5:

    In [8]: hashlib.md5??
    Signature: hashlib.md5(string=b'', *, usedforsecurity=True)
    Docstring: Returns a md5 hash object; optionally initialized with a string
    Type:      builtin_function_or_method
    
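    Type: a built-in function, i.e. one implemented in C, which generally makes it fast.
    With the default string=b'' the hash object starts from empty input, and the call returns a hash object. Let's see how it is used: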

    In [15]: t = hashlib.md5()
    
    In [16]: t
    Out[16]: <md5 _hashlib.HASH object @ 0x00000205960238F0>
    
    In [17]: dir(t)
    Out[17]:
    ['__class__',
     '__delattr__',
     '__dir__',
     '__doc__',
     '__eq__',
     '__format__',
     '__ge__',
     '__getattribute__',
     '__gt__',
     '__hash__',
     '__init__',
     '__init_subclass__',
     '__le__',
     '__lt__',
     '__module__',
     '__ne__',
     '__new__',
     '__reduce__',
     '__reduce_ex__',
     '__repr__',
     '__setattr__',
     '__sizeof__',
     '__str__',
     '__subclasshook__',
     'block_size',
     'copy',
     'digest',
     'digest_size',
     'hexdigest',
     'name',
     'update']
    
    In [18]: t.hexdigest()
    Out[18]: 'd41d8cd98f00b204e9800998ecf8427e'
    
    In [19]:
    
    Now let's look at the update method:

    In [19]: t.update??
    Signature: t.update(obj, /)
    Docstring: Update this hash object's state with the provided string.
    Type:      builtin_function_or_method
    
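    This method feeds the given data (a bytes-like object) into the hash object's state. Repeated calls are equivalent to a single call with all the data concatenated, which is why fileMD5 can read a file chunk by chunk.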
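
The chunk-by-chunk behavior of update can be checked directly. This is a small sketch under the assumption that the data fits in memory; `md5_in_chunks` is a hypothetical helper that mirrors the loop in fileMD5 but slices a bytes object instead of reading a file.

```python
import hashlib

def md5_in_chunks(data, chunk_size=4096):
    """Hash a bytes object piece by piece, the way fileMD5 hashes a file."""
    md5_tool = hashlib.md5()
    for start in range(0, len(data), chunk_size):
        md5_tool.update(data[start:start + chunk_size])
    return md5_tool.hexdigest()

if __name__ == "__main__":
    payload = b"x" * 10000  # arbitrary sample data, larger than one chunk
    # Chunked and one-shot hashing give the same digest.
    print(md5_in_chunks(payload) == hashlib.md5(payload).hexdigest())  # True
    # With no data at all, the result is the digest of empty input,
    # the same value as Out[18] in the session above.
    print(md5_in_chunks(b""))  # d41d8cd98f00b204e9800998ecf8427e
```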

    3. Combine the two functions to print the MD5 digest of every file under a folder

    import os
    import hashlib

    def getFileName(directory):
        """Return the paths of all files under directory, including subdirectories."""
        file_list = []
        for dir_name, sub_dir, file_name_list in os.walk(directory):
            # print(dir_name, sub_dir, file_name_list)
            for file in file_name_list:
                # '/' is accepted on Windows and POSIX; os.path.join(dir_name, file)
                # is the portable way to build the same path.
                file_path_abs = f'{dir_name}/{file}'
                file_list.append(file_path_abs)
        return file_list

    def fileMD5(filePathAbs):
        """Return the hex MD5 digest of a file, reading it in 4096-byte chunks."""
        md5_tool = hashlib.md5()
        with open(filePathAbs, mode='rb') as fobj:
            while True:
                data = fobj.read(4096)
                if not data:  # empty bytes means end of file
                    break
                md5_tool.update(data)
        return md5_tool.hexdigest()

    if __name__ == '__main__':
        file_list = getFileName(r"E:/Project/Support/Day01/北京")
        for file in file_list:
            md5 = fileMD5(file)
            print(file, md5)
    
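    Directory structure:
    [Image: directory structure]
    Output (the mixed / and \ separators appear because os.walk returns Windows-style dirpath values for subdirectories, while the script joins the file name with '/'):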

    
    In [23]: run fileManager.py
    E:/Project/Support/Day01/北京/a - 副本 (2).txt da5d6d8941b3381fb7565c57db7a9ead
    E:/Project/Support/Day01/北京/a - 副本 (3).txt da5d6d8941b3381fb7565c57db7a9ead
    E:/Project/Support/Day01/北京/a - 副本 (4).txt da5d6d8941b3381fb7565c57db7a9ead
    E:/Project/Support/Day01/北京/a - 副本 (5).txt da5d6d8941b3381fb7565c57db7a9ead
    E:/Project/Support/Day01/北京/a - 副本 (6).txt da5d6d8941b3381fb7565c57db7a9ead
    E:/Project/Support/Day01/北京/a - 副本.txt da5d6d8941b3381fb7565c57db7a9ead
    E:/Project/Support/Day01/北京/a.txt da5d6d8941b3381fb7565c57db7a9ead
    E:/Project/Support/Day01/北京/b - 副本 (2) - 副本.txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
    E:/Project/Support/Day01/北京/b - 副本 (2).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
    E:/Project/Support/Day01/北京/b - 副本 (3) - 副本.txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
    E:/Project/Support/Day01/北京/b - 副本 (3).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
    E:/Project/Support/Day01/北京/b - 副本 - 副本 (2).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
    E:/Project/Support/Day01/北京/b - 副本 - 副本 (3).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
    E:/Project/Support/Day01/北京/b - 副本 - 副本 (4).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
    E:/Project/Support/Day01/北京/b - 副本 - 副本 (5).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
    E:/Project/Support/Day01/北京/b - 副本 - 副本 (6).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
    E:/Project/Support/Day01/北京/b - 副本 - 副本.txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
    E:/Project/Support/Day01/北京/b.txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
    E:/Project/Support/Day01/北京\昌平/d - 副本 (2).txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/d - 副本 (3).txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/d - 副本 (4).txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/d - 副本 (5).txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/d - 副本 (6).txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/d - 副本 (7).txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/d - 副本 (8).txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/d - 副本 (9).txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/d - 副本.txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/d.txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/f - 副本 (2).txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/f - 副本 (3).txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/f - 副本 (4).txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/f - 副本 (5).txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/f - 副本 (6).txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/f - 副本 (7).txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/f - 副本 (8).txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/f - 副本 (9).txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/f - 副本.txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\昌平/f.txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\海淀/c.txt d41d8cd98f00b204e9800998ecf8427e
    E:/Project/Support/Day01/北京\海淀/e.txt d41d8cd98f00b204e9800998ecf8427e
    In [24]:
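
In the output above, the copies of a.txt share one digest and the copies of b.txt share another, while d41d8cd98f00b204e9800998ecf8427e is the digest of empty input, so those files are empty. As a rough sketch of how such a listing could be grouped, the hypothetical helpers `group_by_digest` and `duplicates` below collect paths by digest. Equal MD5 digests strongly suggest equal content but do not prove it, since collisions exist.

```python
from collections import defaultdict

def group_by_digest(path_md5_pairs):
    """Map each digest to the sorted list of paths that share it."""
    groups = defaultdict(list)
    for path, digest in path_md5_pairs:
        groups[digest].append(path)
    return {digest: sorted(paths) for digest, paths in groups.items()}

def duplicates(path_md5_pairs):
    """Keep only the digests shared by more than one path."""
    grouped = group_by_digest(path_md5_pairs)
    return {digest: paths for digest, paths in grouped.items() if len(paths) > 1}

if __name__ == "__main__":
    # Three (path, digest) pairs copied from the output above.
    pairs = [
        ("E:/Project/Support/Day01/北京/a.txt", "da5d6d8941b3381fb7565c57db7a9ead"),
        ("E:/Project/Support/Day01/北京/a - 副本.txt", "da5d6d8941b3381fb7565c57db7a9ead"),
        ("E:/Project/Support/Day01/北京/b.txt", "84d9cfc2f395ce883a41d7ffc1bbcf4e"),
    ]
    for digest, paths in duplicates(pairs).items():
        print(digest, paths)
```

In the script from section 3, the pairs could be built as `[(file, fileMD5(file)) for file in getFileName(directory)]`.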
    
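    4. Write a program that backs up a directory

    4.1 Requirements

    On the first run, do a full backup; on later runs, do an incremental backup.
    Approach:
    Use each file's MD5 digest to detect whether the file has been modified.
    If it has, back it up again; otherwise skip it.
    Store the MD5 digests of all files in a state file (a pickle file here).
    The implementation is as follows: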

    import os
    import tarfile
    import datetime
    import pickle
    from Day01 import fileManager as fm

    def backup(directory, backupdir, md5_file):
        """Full backup on the first run, incremental backup afterwards."""
        os.makedirs(backupdir, exist_ok=True)  # tarfile.open fails if the folder is missing
        # The MD5 state file only exists after a previous backup.
        if os.path.exists(md5_file):
            incressBackup(directory, backupdir, md5_file)
        else:
            fullBackup(directory, backupdir, md5_file)

    def fullBackup(directory, backupdir, md5_file):
        print('full backup')
        timestamp = datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S')
        tar_file = f"{backupdir}/data_{timestamp}.tar.gz"
        file_md5_dict = {}
        with tarfile.open(tar_file, mode="w:gz") as tar_obj:
            for file in fm.getFileName(directory):
                file_md5_dict[file] = fm.fileMD5(file)
                tar_obj.add(file)
        # Save the digests so the next run can detect changes.
        with open(md5_file, mode='wb') as fobj:
            pickle.dump(file_md5_dict, fobj)
        print(file_md5_dict)

    def incressBackup(directory, backupdir, md5_file):
        print('incressBackup')
        timestamp = datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S')
        tar_file = f"{backupdir}/data_inc_{timestamp}.tar.gz"
        with open(md5_file, mode='rb') as fobj:
            file_md5_dict = pickle.load(fobj)
        filelist = fm.getFileName(directory)
        with tarfile.open(tar_file, mode="w:gz") as tar_obj:
            for file in filelist:
                md5 = fm.fileMD5(file)  # hash each file only once
                if file not in file_md5_dict:
                    print(f'add {file}')
                elif file_md5_dict[file] != md5:  # file changed
                    print(f'change {file}')
                else:
                    continue  # unchanged, nothing to back up
                file_md5_dict[file] = md5
                tar_obj.add(file)
        # Files recorded earlier but no longer on disk were deleted.
        delFileSet = set(file_md5_dict) - set(filelist)
        print('delFileSet')
        print(delFileSet)
        for deleted in delFileSet:
            file_md5_dict.pop(deleted)
        with open(md5_file, mode='wb') as fobj:
            pickle.dump(file_md5_dict, fobj)

    if __name__ == '__main__':
        backup("E:/Project/Support/Day01/北京",
               "E:/Project/Support/Day01/backup",
               "E:/Project/Support/Day01/backup/md5_file.pickle")
    
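    First run:
    Full backup
    [Image: full backup result]
    Second run:
    First delete some files, add some files, and modify some files.
    Incremental backup:
    [Image: incremental backup result]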
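
The decision made during an incremental run comes down to comparing two {path: md5} dictionaries: the one loaded from the state file and the one computed from the current files. The following is a minimal sketch of that comparison on its own, without any file or tar handling; `diff_snapshots` is a hypothetical helper and the digests are placeholders, not real MD5 values.

```python
def diff_snapshots(old, new):
    """Compare two {path: md5} dicts and return (added, changed, deleted) as sorted lists."""
    added = sorted(path for path in new if path not in old)
    changed = sorted(path for path in new if path in old and old[path] != new[path])
    deleted = sorted(path for path in old if path not in new)
    return added, changed, deleted

if __name__ == "__main__":
    # Hypothetical snapshots; the digest strings are placeholders.
    old = {"a.txt": "111", "b.txt": "222", "c.txt": "333"}
    new = {"a.txt": "111", "b.txt": "999", "d.txt": "444"}
    added, changed, deleted = diff_snapshots(old, new)
    print("add:", added)       # ['d.txt']
    print("change:", changed)  # ['b.txt']
    print("delete:", deleted)  # ['c.txt']
```

In the backup script, only the added and changed files go into the incremental archive, and the deleted paths are dropped from the saved dictionary.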

  • Original article: https://blog.csdn.net/qq_37608398/article/details/134098260